ZT oracle全文索引
- 索引類別:
到10GR2為止,共有四種全文搜尋型別:
Index Type | Description | Supported Preferences and Parameters | Query Operator | Notes |
---|---|---|---|---|
CONTEXT | Use this index to build a text retrieval application when your text consists of large coherent documents. You can index documents of different formats such as MS Word, HTML or plain text. With a context index, you can customize your index in a variety of ways. This index type requires | All CREATE INDEX preferences and parameters supported except for INDEX SET . These supported parameters include the index partition clause, and the format, charset, and language columns. | CONTAINS Grammar is called the The | Supports all documents services and query services. Supports indexing of partitioned text tables. |
CTXCAT | Use this index type for better mixed query performance. Typically, with this index type, you index small documents or text fragments. Other columns in the base table, such as item names, prices and descriptions can be included in the index to improve mixed query performance. This index type is transactional, automatically updating itself after DML to base table. No | INDEX SET
Format, charset, and language columns not supported. Table and index partitioning not supported. | CATSEARCH Grammar is called The Theme querying is supported. | This index is larger and takes longer to build than a CONTEXT index. The size of a The |
CTXRULE | Use CTXRULE index to build a document classification or routing application. The CTXRULE index is an index created on a table of queries, where the queries define the classification or routing criteria. | See . | MATCHES | Single documents (plain text, HTML, or XML) can be classified using the MATCHES operator, which turns a document into a set of queries and finds the matching rows in the CTXRULE index. |
CTXPATH | Create this index when you need to speed up existsNode() queries on an XMLType column. | STORAGE | Use with existsNode() | Can only create this index on XMLType column. Although this index type can be helpful for |
- 建立語法
CREATE INDEX [schema.]index ON [schema.]table(column) INDEXTYPE IS
ctxsys.context [ONLINE] [LOCAL [(PARTITION [partition] [PARAMETERS('paramstring')]
[, PARTITION [partition] [PARAMETERS('paramstring')]])]
[PARAMETERS(paramstring)] [PARALLEL n] [UNUSABLE]];
這裡的context還可以換成ctxcat、ctxrule、ctxpath幾個型別,支援分割槽索引.這裡很重要的部分
是PARAMETERS部分,定義了索引的各種建立條件.
The syntax for paramstring
is as follows:
paramstring =
'[DATASTORE datastore_pref] 定義要所索引的資料來自哪裡, [FILTER filter_pref] 定義過濾器, [CHARSET COLUMN charset_column_name] 當選用CHARSET_FILTER過濾器時,顯示指定
一個過濾器需要的列 [FORMAT COLUMN format_column_name] 過濾器需要格式化的列 [LEXER lexer_pref] 定義一個此法分析器 [LANGUAGE COLUMN language_column_name] 多語言環境中,顯示指定lexer分析的多語言列 [WORDLIST wordlist_pref] 定義一個常用搜尋詞表 [STORAGE storage_pref] 定義索引的儲存引數 [STOPLIST stoplist] 定義不需要index的片語列表 [SECTION GROUP section_group] 定義一個索引區域(通常用在索引html、xml的環境) [MEMORY memsize] 分配索引建立需要的記憶體空間(in 10G) [POPULATE | NOPOPULATE] 是否建立一個空的索引 [[METADATA] SYNC (MANUAL | EVERY "interval-string" | ON COMMIT)] 定義索引同步時間(in 10G) [TRANSACTIONAL]' 是否開啟事務更新(in 10G)
索引建立
[oracle@ique ~]$ sqlplus mayp/mayp
SQL*Plus : Release 10.2.0.4.0 - Production on Thu Aug 21 19:45:43 2008
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing option
SQL> create table oratext(id number,name varchar2(30),address varchar2(200))
pctfree 10 tablespace assm/ 2
Table created.
SQL> alter table oratext add constraint pk_oratext primary key(id) using index/ 2
Table altered.
SQL> exec ctx_ddl.create_preference('mayp_storage','basic_storage');Elapsed: 00:00:00.22
SQL> exec ctx_ddl.set_attribute('mayp_storage','I_TABLE_CLAUSE','TABLESPACE TEST');Elapsed: 00:00:00.11SQL> exec ctx_ddl.set_attribute('mayp_storage','K_TABLE_CLAUSE','TABLESPACE TEST');Elapsed: 00:00:00.04SQL> exec ctx_ddl.set_attribute('mayp_storage','R_TABLE_CLAUSE','TABLESPACE TEST');Elapsed: 00:00:00.06
SQL> set feedback onSQL> exec ctx_ddl.set_attribute('mayp_storage','N_TABLE_CLAUSE','TABLESPACE TEST');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.01SQL> exec ctx_ddl.set_attribute('mayp_storage','I_INDEX_CLAUSE','TABLESPACE TEST COMPRESS 2');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.01SQL> exec ctx_ddl.set_attribute('mayp_storage','P_TABLE_CLAUSE','TABLESPACE TEST');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.04SQL> exec ctx_ddl.create_preference('mayp_lexer','chinese_lexer')
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.02SQL> exec ctx_ddl.create_preference('mayp_datastore','direct_datastore');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.06
SQL> create index oratext_address on oratext(address) indextype is ctxsys.context
parameters('lexer mayp_lexer datastore mayp_datastore storage mayp_storage')/ 2
Index created.
Elapsed: 00:00:04.48
這裡我重新定了了三個索引的引數:mayp_datastore(要索引的資料的儲存方式),mayp_storage(索引的儲存位置),
mayp_lexer(詞法分割器).在索引建立完之後,會發現多了以dr$index_name開頭的四張表和
一個dr$索引和oratext_address索引:
SQL> select table_name,tablespace_name from user_tables;
Tab.Name TABLESPACE_NAME------------------------------ ------------------------------DR$ORATEXT_ADDRESS$I TESTDR$ORATEXT_ADDRESS$R TESTSYS_IOT_OVER_45158 MSSMSYS_IOT_MAP_45158 MSSMORATEXT ASSMDR$ORATEXT_ADDRESS$NDR$ORATEXT_ADDRESS$K
SQL> select index_name,tablespace_name from user_indexes;
Ind.Name TABLESPACE_NAME------------------------------ ------------------------------SYS_IL0000011906C00002$$ TESTPK_P TESTSYS_IL0000011907C00002$$ TESTDR$ORATEXT_ADDRESS$X TESTSYS_IL0000046825C00006$$ TESTSYS_IL0000046830C00002$$ TESTORATEXT_ADDRESSSYS_IL0000046726C00006$$ TESTSYS_IL0000046731C00002$$ TEST
其中DR$ORATEXT_ADDRESS$I這張表最關鍵,它儲存被lexer分割後的token,也就是我們可以利用搜尋的關鍵字 .
TOKEN_TEXT----------------------------------------------------------------1128工業海街號街南路南蘇桐蘇州市桐路星海園區
13 rows selected.
Elapsed: 00:00:00.01
透過一個查詢來看看:
SQL> select name,address from oratext where contains(address,'園區')>0;
NAME ADDRESS------------------------------ --------------------------------------------------王皓 蘇州市工業園區星海街南11號司馬格達 蘇州市工業園區蘇桐路28號張磊 蘇州市工業園區蘇桐路28號
3 rows selected.
Elapsed: 00:00:01.49
SQL> select name,address from oratext where contains(address,'蘇桐路')>0;
NAME ADDRESS------------------------------ --------------------------------------------------司馬格達 蘇州市工業園區蘇桐路28號張磊 蘇州市工業園區蘇桐路28號
2 rows selected.
Elapsed: 00:00:00.07
SQL> select name,address from oratext where contains(address,'星海街')>0;
NAME ADDRESS------------------------------ --------------------------------------------------王皓 蘇州市工業園區星海街南11號
1 row selected.
索引維護
ctxsys.context全文索引在發生DML語句後,不同自動同步索引,必須透過ctx_ddl.sync_index同步,
可以定義一個job進行同步,
SQL> insert into oratext values(4,'馬成','蘇州市觀前街7號');
1 row created.
Elapsed: 00:00:00.08SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL> select name,address from oratext where contains(address,'觀前街')>0;
no rows selected
Elapsed: 00:00:00.09
SQL> exec ctx_ddl.sync_index('oratext_address','2m');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.18
SQL> select name,address from oratext where contains(address,'觀前街')>0;
NAME ADDRESS------------------------------ --------------------------------------------------馬成 蘇州市觀前街7號
1 row selected.
Elapsed: 00:00:00.02
在索引的同步過程中,可能會產生碎片,我們可以定時透過ctx_ddl.optimize_index維護索引.
ctx_report package可以報告全文索引的一些統計資訊:
ctx_report.index_size:報告索引的大小
Elapsed: 00:00:00.00SQL> select ctx_report.index_size('oratext_address') from dual;
CTX_REPORT.INDEX_SIZE('ORATEXT_ADDRESS')--------------------------------------------------------------------------------=========================================================================== INDEX SIZE FOR MAYP.ORATEXT_ADDRESS===========================================================================TABLE: MAYP.DR$ORATEXT_ADDRESS$ITABLESPACE NAME: TESTBLOCKS ALLOCATED: 8BLOCKS USED: 3BYTES ALLOCATED: 65,536 (64.00 KB)BYTES USED: 24,576 (24.00 KB)
LOB SEGMENT: MAYP.SYS_LOB0000046825C00006$$TABLE NAME: MAYP.DR$ORATEXT_ADDRESS$ILOB COLUMN: TOKEN_INFOTABLESPACE NAME: TESTBLOCKS ALLOCATED: 8BLOCKS USED: 2BYTES ALLOCATED: 65,536 (64.00 KB)BYTES USED: 16,384 (16.00 KB)
INDEX (NORMAL): MAYP.DR$ORATEXT_ADDRESS$XTABLE NAME: MAYP.DR$ORATEXT_ADDRESS$ITABLESPACE NAME: TEST.........................
TOTALS FOR INDEX MAYP.ORATEXT_ADDRESS---------------------------------------------------------------------------TOTAL BLOCKS ALLOCATED: 72TOTAL BLOCKS USED: 19TOTAL BYTES ALLOCATED: 589,824 (576.00 KB)TOTAL BYTES USED: 155,648 (152.00 KB)
ctx_report.create_index_script:建立索引定義指令碼
SQL> select ctx_report.create_index_script('oratext_address') from dual;
CTX_REPORT.CREATE_INDEX_SCRIPT('ORATEXT_ADDRESS')--------------------------------------------------------------------------------begin ctx_ddl.create_preference('"ORATEXT_ADDRESS_DST"','DIRECT_DATASTORE');end;/
begin ctx_ddl.create_preference('"ORATEXT_ADDRESS_FIL"','NULL_FILTER');end;..........................
create index "MAYP"."ORATEXT_ADDRESS" on "MAYP"."ORATEXT" ("ADDRESS") indextype is ctxsys.context parameters(' datastore "ORATEXT_ADDRESS_DST" filter "ORATEXT_ADDRESS_FIL" section group "ORATEXT_ADDRESS_SGP" lexer "ORATEXT_ADDRESS_LEX" wordlist "ORATEXT_ADDRESS_WDL" stoplist "ORATEXT_ADDRESS_SPL" storage "ORATEXT_ADDRESS_STO" ')/
begin ctx_output.end_log;end;/1 row selected.
Elapsed: 00:00:00.23
ctx_report.index_stats:報告索引的內部結構,已經碎片的程度
SQL> declare l_index_stats clob; begin ctx_report.index_stats(index_name=>'oratext_address',report=>l_index_stats); dbms_output.put_line(l_index_stats); end; / 2 3 4 5 6 7 =========================================================================== STATISTICS FOR"MAYP"."ORATEXT_ADDRESS"===========================================================================
indexed documents: 3allocated docids:4$I rows: 17
--------------------------------------------------------------------------- TOKENSTATISTICS---------------------------------------------------------------------------
unique tokens: 17average $I rows per token:1.00tokens with most $I rows: 園區 (0:TEXT) 1 星海 (0:TEXT) 1 桐路 (0:TEXT)1 蘇州市 (0:TEXT) 1 蘇桐 (0:TEXT) 1 前街 (0:TEXT)1
...........
--------------------------------------------------------------------------- FRAGMENTATIONSTATISTICS---------------------------------------------------------------------------
total size of $I data: 93
$I rows:17estimated $I rows if optimal: 17estimated row fragmentation: 0 %
garbage docids:1estimated garbage size: 19
most fragmented tokens: 園區 (0:TEXT) 0 % 星海 (0:TEXT)0 % 桐路 (0:TEXT) 0 % 蘇州市 (0:TEXT) 0 % 蘇桐 (0:TEXT)0 % 前街 (0:TEXT) 0 % 南 (0:TEXT) 0 % 路 (0:TEXT)0 % 街南 (0:TEXT) 0 % 街 (0:TEXT) 0 % 號 (0:TEXT)0 % 海街 (0:TEXT) 0 % 觀前 (0:TEXT) 0 % 工業 (0:TEXT)0 % 7 (0:TEXT) 0 % 28 (0:TEXT) 0 % 11 (0:TEXT)0 %
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.03這個索引沒有碎片.
--結束
參考文件:
Oracle Text Application Developer's Guide Release 9.2
Oracle Text Reference Release 9.2
Oracle® Text Application Developer's Guide 10g Release 2 (10.2)
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/82387/viewspace-1017150/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 不錯的關於Oracle 全文索引的文章(zt)Oracle索引
- Oracle:全文索引Oracle索引
- Oracle的全文索引Oracle索引
- oracle 全文索引的配置Oracle索引
- ORACLE TEXT DATASTORE PREFERENCE(一) 【ORACLE 全文索引】OracleAST索引
- oracle全文索引之STORAGE PREFERENCEOracle索引
- oracle全文索引之WORDLIST PREFERENCEOracle索引
- oracle全文索引的簡單配置Oracle索引
- oracle enqueue(zt)OracleENQ
- ORACLE LARGE MEMORY(zt)Oracle
- oracle job管理(zt)Oracle
- oracle event 2 (zt)Oracle
- [zt] Oracle LogMinerOracle
- zt_oracle indexOracleIndex
- Oracle RAC TAF [zt]Oracle
- Oracle的MTS (ZT)Oracle
- oracle block 格式 (zt)OracleBloC
- ORACLE ROWID (zt)Oracle
- [zt] ORACLE RAC原理Oracle
- jboss oracle rac (zt)Oracle
- oracle全文索引之幾個關鍵表Oracle索引
- oracle全文索引之commit與DML操作Oracle索引MIT
- oracle全文索引之如何實現查詢Oracle索引
- 【轉載】oracle全文索引的建立和使用Oracle索引
- oracle全文索引之STOPLIST_4_MULTI_STOPLISTOracle索引
- oracle全文索引之STOPLIST_3_DEFAULT_STOPLISTOracle索引
- oracle全文索引之STOPLIST_2_EMPTY_STOPLISTOracle索引
- oracle全文索引之STOPLIST_1_BASIC_STOPLISTOracle索引
- oracle全文索引之LEXER_4_MULTI_LEXEROracle索引
- oracle全文索引之LEXER_3_DEFAULT_LEXEROracle索引
- oracle全文索引之LEXER_2_CHINESE_LEXEROracle索引
- oracle全文索引之LEXER_1_BASIC_LEXEROracle索引
- oracle全文索引之FILTER_4_PROCEDURE_FILTEROracle索引Filter
- oracle全文索引之FILTER_3_FORMAT_COLUMNOracle索引FilterORM
- oracle全文索引之FILTER_1_NULL_FILTEROracle索引FilterNull
- oracle全文索引之datastore_6_NESTED_DATASTOREOracle索引AST
- oracle全文索引之datastore_5_detail_datastoreOracle索引ASTAI
- oracle全文索引之datastore_4_URL_DATASTOREOracle索引AST