Oracle Full-Text Search: the CONTEXT Index

Posted by cow977 on 2011-04-12

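Today I ran the tests from Xu Jinting's (徐進挺) article "Oracle全文檢索" ("Oracle Full-Text Search") and found that a CONTEXT-type index does not segment Chinese text into words. The results are below: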

Create table docs (id number primary key, text varchar2(200));
Insert into docs values(1, 'california is a state in the us.');
Insert into docs values(2, 'paris is a city in france.');
Insert into docs values(3, 'france is in europe.');
Commit;
/
--create the CONTEXT index
Create index idx_docs on docs(text)
indextype is ctxsys.context parameters
('filter ctxsys.null_filter section group ctxsys.html_section_group');
--query
Column text format a40;
Select id, text from docs where contains(text, 'france') > 0;

        ID TEXT
---------- ----------------------------------------
         2 paris is a city in france.
         3 france is in europe.
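Conceptually, a CONTEXT index is an inverted index: the lexer breaks each row's text into tokens, and the index maps every token to the rows that contain it, so CONTAINS(text, 'france') becomes a lookup instead of a scan of the table. The Python sketch below is a toy model of that idea, not Oracle Text's real storage. The tokenizer (lowercase, split on non-alphanumeric characters) is an assumption that only loosely mirrors the default lexer, and stopwords are not modeled.

```python
import re
from collections import defaultdict

# The three rows inserted before the index was created
DOCS = {
    1: "california is a state in the us.",
    2: "paris is a city in france.",
    3: "france is in europe.",
}

def tokenize(text):
    # Assumed tokenizer: lowercase, split on anything that is not a letter or digit
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def build_index(docs):
    # token -> sorted list of row ids containing it (a "posting list")
    postings = defaultdict(set)
    for row_id, text in docs.items():
        for token in tokenize(text):
            postings[token].add(row_id)
    return {tok: sorted(ids) for tok, ids in postings.items()}

def contains(index, term):
    # Rough analogue of CONTAINS(text, term) > 0 for a single word
    return index.get(term.lower(), [])

index = build_index(DOCS)
print(contains(index, "france"))  # [2, 3]
```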

        
--insert more rows
Insert into docs values(4, 'los angeles is a city in california.');
Insert into docs values(5, 'mexico city is big.');
commit;
--the newly inserted rows are not returned
Select id, text from docs where contains(text, 'city') > 0;

        ID TEXT
---------- ----------------------------------------
         2 paris is a city in france.

--synchronize the index ('2m' is the memory, 2 MB, used for the sync)
exec ctx_ddl.sync_index('idx_docs', '2m');


--now the new rows are found
Select id, text from docs where contains(text, 'city') > 0;
        ID TEXT
---------- ----------------------------------------------------------------------
         2 paris is a city in france.
         4 los angeles is a city in california.
         5 mexico city is big.
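Rows 4 and 5 stay invisible until ctx_ddl.sync_index runs because, by default, DML does not update a CONTEXT index directly: inserts and updates only queue the affected rows (they can be inspected in the CTX_USER_PENDING view), and synchronization tokenizes the queued rows and merges them into the index. The toy model below illustrates that life cycle. ToyContextIndex and its methods are hypothetical names, the tokenizer is the same assumed one as before, and updates and deletes are not modeled.

```python
import re

def tokenize(text):
    # Assumed tokenizer: lowercase, split on non-alphanumeric characters
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

class ToyContextIndex:
    """Toy model of a CONTEXT index that is only refreshed on sync()."""

    def __init__(self):
        self.postings = {}  # token -> set of row ids (what queries see)
        self.pending = {}   # row id -> text, queued by DML but not yet indexed

    def insert(self, row_id, text):
        # DML only queues the row; queries do not see it yet
        self.pending[row_id] = text

    def sync(self):
        # Analogue of ctx_ddl.sync_index: index the queued rows, empty the queue
        for row_id, text in self.pending.items():
            for token in tokenize(text):
                self.postings.setdefault(token, set()).add(row_id)
        self.pending.clear()

    def contains(self, term):
        return sorted(self.postings.get(term.lower(), set()))

idx = ToyContextIndex()
for row_id, text in [(1, "california is a state in the us."),
                     (2, "paris is a city in france."),
                     (3, "france is in europe.")]:
    idx.insert(row_id, text)
idx.sync()  # stands in for CREATE INDEX, which indexes the existing rows

idx.insert(4, "los angeles is a city in california.")
idx.insert(5, "mexico city is big.")
print(idx.contains("city"))  # [2]  (rows 4 and 5 are still pending)
idx.sync()
print(idx.contains("city"))  # [2, 4, 5]
```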

Select id, text from docs where contains(text, 'city or state ') > 0;

        ID TEXT
---------- ----------------------------------------------------------------------
         1 california is a state in the us.
         2 paris is a city in france.
         4 los angeles is a city in california.
         5 mexico city is big.

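--AND: no row contains both words
Select id, text from docs where contains(text, 'city and state ') > 0;
--no operator: treated as a phrase, and no row contains 'city' immediately followed by 'state'
Select id, text from docs where contains(text, 'city state ') > 0;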
        ID TEXT
---------- ----------------------------------------------------------------------
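The three query forms above differ only in how the per-word row lists are combined: OR takes the union, AND takes the intersection, and a bare multi-word string such as 'city state' is a phrase, so the words must also be adjacent and in order. A toy sketch, assuming a positional index (token, then row, then word positions); the helper names are hypothetical, and Oracle Text's query parser handles much more (stoplists, operator precedence, scoring).

```python
import re
from collections import defaultdict

DOCS = {
    1: "california is a state in the us.",
    2: "paris is a city in france.",
    3: "france is in europe.",
    4: "los angeles is a city in california.",
    5: "mexico city is big.",
}

def tokenize(text):
    # Assumed tokenizer: lowercase, split on non-alphanumeric characters
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

# Positional inverted index: token -> {row id -> [word positions]}
index = defaultdict(dict)
for row_id, text in DOCS.items():
    for pos, token in enumerate(tokenize(text)):
        index[token].setdefault(row_id, []).append(pos)

def rows(term):
    return set(index.get(term, {}))

def query_or(a, b):        # like 'a or b'
    return sorted(rows(a) | rows(b))

def query_and(a, b):       # like 'a and b'
    return sorted(rows(a) & rows(b))

def query_phrase(*words):  # like 'a b': words adjacent and in order
    hits = []
    for row_id in set.intersection(*(rows(w) for w in words)):
        starts = index[words[0]][row_id]
        if any(all(p + i in index[w][row_id] for i, w in enumerate(words))
               for p in starts):
            hits.append(row_id)
    return sorted(hits)

print(query_or("city", "state"))      # [1, 2, 4, 5]
print(query_and("city", "state"))     # []
print(query_phrase("city", "state"))  # []
```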

SQL> Select SCORE(1),id, text from docs where contains(text, 'city or state',1) > 0;

  SCORE(1)         ID TEXT
---------- ---------- ----------------------------------------------------------------------
         5          1 california is a state in the us.
         4          2 paris is a city in france.
         4          4 los angeles is a city in california.
         4          5 mexico city is big.

--now some Chinese text
Insert into docs values(14, '新華網東京4月12日電(記者藍建中)日本經濟產業省原子能安全保安院與日本原子能安全委員會12日上午舉行聯合新聞發佈會,正式宣佈根據國際核事件分級表,將福島第一核電站事故的嚴重程度評價提高到最高級別7級。');
Insert into docs values(15, '原子能安全保安院宣佈,福島第一核電站向大氣洩漏的放射性物質已達到37萬萬億貝克勒爾,而原子能安全委員會推斷為63萬萬億貝克勒爾,雖然數值存在差異,但都已經遠遠超過核電站事故7級的標準。');
Insert into docs values(16, '國際核事件分級表規定,如果放射性物質向外部的洩漏量達到數萬萬億貝克勒爾,就應定為7級。');
commit;

SQL> Select id, text from docs where contains(text, '原子能') > 0;

        ID TEXT
---------- ----------------------------------------------------------------------

SQL> Select id, text from docs;

        ID TEXT
---------- ----------------------------------------------------------------------
         1 california is a state in the us.
         2 paris is a city in france.
         3 france is in europe.
         4 los angeles is a city in california.
         5 mexico city is big.
        14 原子能安全保安院宣佈,福島第一核電站向大氣洩漏的放射性物質已達到37萬萬
           億貝克勒爾,而原子能安全委員會推斷為63萬萬億貝克勒爾,雖然數值存在差異
           ,但都已經遠遠超過核電站事故7級的標準。');
          
        15 原子能安全保安院宣佈,福島第一核電站向大氣洩漏的放射性物質已達到37萬萬
           億貝克勒爾,而原子能安全委員會推斷為63萬萬億貝克勒爾,雖然數值存在差異
           ,但都已經遠遠超過核電站事故7級的標準。

        16 國際核事件分級表規定,如果放射性物質向外部的洩漏量達到數萬萬億貝克勒爾
           ,就應定為7級。


8 rows selected
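Two factors likely combine in this last test. First, rows 14 to 16 were inserted and committed, but ctx_ddl.sync_index was not run again, so, as with rows 4 and 5 earlier, the index had not yet seen them. Second, and more fundamentally, the index was created without a lexer preference, so it used the database's default lexer, which here appears to split text only at whitespace and punctuation. A run of Chinese characters contains no spaces, so it becomes one long token, and a query word such as 原子能 never equals any token. Oracle Text provides Chinese-aware lexers (CHINESE_VGRAM_LEXER, CHINESE_LEXER) for this case. The Python sketch below contrasts whitespace/punctuation splitting with character-bigram splitting, which is roughly the idea behind a "vgram" lexer; it illustrates the principle and is not Oracle's algorithm.

```python
import re

# First sentence of row 15 above
TEXT = "原子能安全保安院宣佈,福島第一核電站向大氣洩漏的放射性物質已達到37萬萬億貝克勒爾"

def whitespace_tokens(text):
    # Assumed default-lexer behaviour: split only on whitespace and punctuation
    return [t for t in re.split(r"[\s,,。.、;;::!!??()()]+", text) if t]

def bigram_tokens(text):
    # Overlapping character bigrams over each punctuation-free run
    grams = []
    for run in whitespace_tokens(text):
        if len(run) == 1:
            grams.append(run)
        grams.extend(run[i:i + 2] for i in range(len(run) - 1))
    return grams

def matches(query, doc_tokens, tokenizer):
    # A query matches if every one of its tokens occurs in the document
    wanted = tokenizer(query)
    return bool(wanted) and all(t in doc_tokens for t in wanted)

ws = whitespace_tokens(TEXT)
bg = bigram_tokens(TEXT)
print(ws)                                       # two long tokens, one per clause
print(matches("原子能", ws, whitespace_tokens))  # False
print(matches("原子能", bg, bigram_tokens))      # True
```

Plain bigram matching can produce false positives (both bigrams present but not adjacent), which is one reason real lexers also record token positions.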

From the ITPUB blog, link: http://blog.itpub.net/81227/viewspace-692318/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
