oracle全文索引之About_INDEX_THEMES操作

jolly10發表於2009-05-28
Oracle的全文索引除了支援文字匹配查詢,還支援對文章涵義的查詢。這是透過ABOUT操作實現的。[@more@]
Oracle預設情況下支援英文和法文的文章涵義支援。其他語言可以新增透過使用者自定義的語言庫來實現相應的功能。

在這之前需要先安裝Companion CD 的Oracle Database 10g Products Installation Type,其中包括了Oracle Text Supplied Knowledge Bases,否則在建立含有about的索引時會出錯。Companion CD可直接在官網下載,安裝也非常簡單。


下面看看英文環境下簡單的ABOUT操作用法:

SQL> conn myuser/myuser
Connected.
SQL> CREATE TABLE T (ID NUMBER PRIMARY KEY, DOCS VARCHAR2(1000));

Table created.

SQL> INSERT INTO T VALUES (1, 'You can augment the knowledge base to define concepts and terms specific to your

industry or query application. When you do so, ABOUT queries are more precise for the added concepts.');

1 row created.

SQL> INSERT INTO T VALUES (2, 'ABOUT queries perform best when you create a theme component in your index. Theme

components are created by default for English and French.');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> CREATE INDEX IND_T_DOCS ON T(DOCS) INDEXTYPE IS CTXSYS.CONTEXT;

Index created.

SQL> SELECT ID FROM T WHERE CONTAINS(DOCS, 'ABOUT(queries)') > 0;

ID
----------
1
2

SQL> SELECT ID FROM T WHERE CONTAINS(DOCS, 'queries') > 0;

ID
----------
1
2

預設的LEXER是BASIC_LEXER,因此,可以ABOUT操作已經可以起作用了,但是由於採用預設設定,ABOUT查詢的精度不高。雖然可以查詢到,但是這個單詞本身就是在文章中出現的,如果使用文章中沒有出現的單詞,ABOUT還是查詢不到。

SQL> SELECT ID FROM T WHERE CONTAINS(DOCS, 'ABOUT(definition)') > 0;

no rows selected

SQL> SELECT ID FROM T WHERE CONTAINS(DOCS, 'definition') > 0;

no rows selected


如果希望提高ABOUT操作的查詢精度,可以設定BASIC_LEXER中的INDEX_THEMES屬性的值為YES。

SQL> BEGIN
2 CTX_DDL.CREATE_PREFERENCE('TEST_ABOUT', 'BASIC_LEXER');
3 CTX_DDL.SET_ATTRIBUTE('TEST_ABOUT', 'INDEX_THEMES', 'YES');
4 CTX_DDL.SET_ATTRIBUTE('TEST_ABOUT', 'INDEX_TEXT', 'YES');
5 END;
6 /


SQL> DROP INDEX IND_T_DOCS;

Index dropped.

SQL> CREATE INDEX IND_T_DOCS ON T(DOCS) INDEXTYPE IS CTXSYS.CONTEXT
2 PARAMETERS ('LEXER TEST_ABOUT');

Index created.

SQL> SELECT ID FROM T WHERE CONTAINS(DOCS, 'ABOUT(definition)') > 0;

ID
----------
1

SQL> SELECT ID FROM T WHERE CONTAINS(DOCS, 'specifics') > 0;

no rows selected


現在已經可以看出差別來了,文章中並沒有包含definition單詞,但是Oracle自動生成了關於definition的主題。

下面再插入一些測試資料,試試主題搜尋的使用

SQL> Insert into MYUSER.T (ID, DOCS)
2 Values (3, 'Los Angeles is a city of american');

1 row created.

SQL> Insert into MYUSER.T (ID, DOCS)
2 Values (4, 'Los Angeles');

1 row created.

SQL> Insert into MYUSER.T (ID, DOCS)
2 Values (5, 'San Francisco is a city of american too');

1 row created.

SQL> Insert into MYUSER.T (ID, DOCS)
2 Values (6, 'ABOUT queries perform best when you create a theme component in your index.San Francisco is a

city of american too');

1 row created.

SQL> Insert into MYUSER.T (ID, DOCS)
2 Values (7, 'is Los Angeles');

1 row created.

SQL> commit;

Commit complete.


SQL> exec ctx_ddl.sync_index('IND_T_DOCS');

PL/SQL procedure successfully completed.


SQL> SELECT ID FROM T WHERE CONTAINS(DOCS, 'ABOUT(California)') > 0;

ID
---
3
5
6

由於Los Angeles和San Francisco是所屬California洲,所以當查California時,Los Angeles和San Francisco的資料被查出來,但為何第4和第7筆資料不符合條件?可能是因為這些資料沒有意義。


順便再簡單說一下STOPTHEME,和STOPWORD類似,Oracle提供了停用主題STOPTHEME,可以人為的遮蔽不希望被ABOUT操作查詢到的主題。


SQL> SELECT ID FROM T WHERE CONTAINS(DOCS, 'ABOUT(definition)') > 0;

ID
---
1

SQL> SELECT ID FROM T WHERE CONTAINS(DOCS, 'ABOUT(specifics)') > 0;

ID
---
1


SQL> drop index IND_T_DOCS;

Index dropped.

SQL> BEGIN
2 CTX_DDL.CREATE_STOPLIST('TEST_BASIC', 'BASIC_STOPLIST');
3 CTX_DDL.ADD_STOPTHEME('TEST_BASIC','DEFINITION');
4 END;
5 /


SQL> CREATE INDEX IND_T_DOCS ON T(DOCS) INDEXTYPE IS CTXSYS.CONTEXT
2 PARAMETERS ('LEXER TEST_ABOUT STOPLIST TEST_BASIC');

Index created.


SQL> SELECT ID FROM T WHERE CONTAINS(DOCS, 'ABOUT(definition)') > 0;

no rows selected


SQL> SELECT ID FROM T WHERE CONTAINS(DOCS, 'ABOUT(specifics)') > 0;

ID
---
1


上面的例子中,將DEFINITION新增到了TEST_BASIC停用表中,重建索引後,停用主題開始生效。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/271283/viewspace-1022558/,如需轉載,請註明出處,否則將追究法律責任。

相關文章