Oracle Text Algorithm Model
Oracle Text scoring algorithm, Salton's formula
Scoring Algorithm for Word Queries
To calculate a relevance score for a returned document in a word query, Oracle uses an inverse frequency algorithm based on Salton's formula.
Inverse frequency scoring assumes that frequently occurring terms in a document set are noise terms, and so these terms are scored lower. For a document to score high, the query term must occur frequently in the document but infrequently in the document set as a whole.
The following table illustrates Oracle's inverse frequency scoring. The first column shows the number of documents in the document set, and the second column shows the number of times the query term must occur in the document for it to score 100.
This table assumes that only one document in the set contains the query term.
Number of Documents in Document Set | Occurrences of Term in Document Needed to Score 100 |
---|---|
1 | 34 |
5 | 20 |
10 | 17 |
50 | 13 |
100 | 12 |
500 | 10 |
1,000 | 9 |
10,000 | 7 |
100,000 | 5 |
1,000,000 | 4 |
The table illustrates that if only one document contained the query term and there were five documents in the set, the term would have to occur 20 times in the document to score 100. If there were 1,000,000 documents in the set, however, the term would have to occur only 4 times in the document to score 100.
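The relationship in the table can be sketched in a few lines of Python. The excerpt above does not spell out the formula, so this sketch assumes the form usually quoted for Oracle Text, 3f(1 + log(N/n)), where f is the number of occurrences of the term in the document, N is the number of documents in the set, and n is the number of documents containing the term. The base-10 logarithm and the names `salton_score` and `occurrences_needed` are assumptions made for illustration, not Oracle's implementation.

```python
import math


def salton_score(f, N, n):
    """Unclamped score under the assumed form 3f(1 + log10(N/n)).

    f: occurrences of the term in the document
    N: number of documents in the set
    n: number of documents that contain the term
    """
    return 3 * f * (1 + math.log10(N / n))


def occurrences_needed(N, n=1, target=100):
    """Smallest f whose assumed score reaches the target."""
    f = 1
    while salton_score(f, N, n) < target:
        f += 1
    return f


# Only one document contains the term (n=1), as the table assumes.
for N in (1, 5, 10, 50, 100, 500, 1000, 10000):
    print(N, occurrences_needed(N))
```

Under these assumptions the sketch agrees with the table for document sets of up to 10,000 documents. For 100,000 and 1,000,000 documents it yields 6 and 5 rather than the 5 and 4 listed, so Oracle's actual computation presumably differs in rounding or scaling. Treat the sketch as an illustration of the trend only.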
Example
You have 5000 documents dealing with chemistry in which the term chemical occurs at least once in every document. The term chemical thus occurs frequently in the document set.
You have a document that contains 5 occurrences of chemical and 5 occurrences of the term hydrogen. No other document contains the term hydrogen. The term hydrogen thus occurs infrequently in the document set.
Because chemical occurs so frequently in the document set, its score for the document is lower than that of hydrogen, which is infrequent in the document set as a whole. The score for hydrogen is therefore higher than that of chemical, even though both terms occur 5 times in the document.
Note: Even if the relatively infrequent term hydrogen occurred 4 times in the document, and chemical occurred 5 times in the document, the score for hydrogen might still be higher, because chemical occurs so frequently in the document set (at least 5000 times).
Inverse frequency scoring also means that adding documents that contain hydrogen lowers the score for that term in the document, and adding more documents that do not contain hydrogen raises the score.
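The chemistry example can be checked numerically with a short Python sketch. It assumes the commonly quoted form of the formula, 3f(1 + log(N/n)), with a base-10 logarithm and a cap at 100. The function name `salton_score`, the cap, and the choice of 100 added documents are all illustrative assumptions, and the printed values are not what Oracle Text itself would return.

```python
import math


def salton_score(f, N, n):
    """Assumed form 3f(1 + log10(N/n)), capped at 100.

    f: occurrences of the term in the document
    N: number of documents in the set
    n: number of documents that contain the term
    """
    return min(100.0, 3 * f * (1 + math.log10(N / n)))


N = 5000  # documents in the chemistry set

# "chemical" appears in every document, "hydrogen" in only one.
chemical = salton_score(5, N, 5000)
hydrogen = salton_score(5, N, 1)
hydrogen_4 = salton_score(4, N, 1)  # the case described in the note

# Add 100 documents that all contain "hydrogen", or 100 that do not.
with_term = salton_score(5, N + 100, 1 + 100)
without_term = salton_score(5, N + 100, 1)

print(f"chemical (5 occurrences): {chemical:.1f}")
print(f"hydrogen (5 occurrences): {hydrogen:.1f}")
print(f"hydrogen (4 occurrences): {hydrogen_4:.1f}")
print(f"hydrogen after adding docs with the term: {with_term:.1f}")
print(f"hydrogen after adding docs without the term: {without_term:.1f}")
```

Under these assumptions, hydrogen outscores chemical even with 4 occurrences against 5. Adding documents that contain hydrogen lowers its score, and adding documents that do not contain it raises the score slightly.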
Source: ITPUB blog, http://blog.itpub.net/46332/viewspace-1006935/. If you repost this, please credit the source; otherwise legal action may be pursued.