Solr: Key Points of schema.xml, Translated

Posted by BtWangZhi on 2017-08-27

Original: https://my.oschina.net/HuifengWang/blog/307471

<?xml version="1.0" encoding="UTF-8" ?>
  (omitted)...
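<!--
This is the Solr schema file. It should be named schema.xml and placed in the
conf directory under the Solr home (by default, ./solr/conf/schema.xml).

 For details on customizing this file for your needs, see:
 http://wiki.apache.org/solr/SchemaXml

 Performance note: this schema includes many optional features that a real
 application does not need. To improve performance, you can:
  - set stored="false" on every field that is only searched and never returned;
  - set indexed="false" on every field that is only returned and never searched;
  - remove all unneeded copyField statements;
  - for the best index size and search performance, set indexed="false" on all
    general text fields, use copyField to copy them into the catchall field
    named "text", and search against that field;
  - run the JVM in server mode and raise the log level so that not every
    request is logged.
-->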

<schema name="example" version="1.5">
  (omitted)...

 <fields>
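   <!-- Field attributes:
     name: required. The name of the field.
     type: required. The name of a field type defined in the <types> section.
     indexed: true if the field should be indexed (searchable or sortable).
     stored: true if the field's content should be retrievable.
     docValues: true if the field should have doc values. Doc values are useful
           for faceting, grouping, sorting and function queries. They are not
           required, and they can make the index larger and slower to build,
           but they make the index faster to load, more NRT-friendly and more
           memory-efficient. They come with limitations: currently only
           StrField, UUIDField and all Trie*Fields support them, and depending
           on the field type the field may have to be single-valued, required,
           or have a default value.
     multiValued: true if the field may contain multiple values per document.
     termVectors: [false] true to store the term vector for the field.
           When using MoreLikeThis, fields used for similarity should be
           stored for best performance.
     termPositions: store position information with the term vector; this
           increases storage costs.
     termOffsets: store offset information with the term vector; this
           increases storage costs.
     required: the field must have a value; otherwise an error is thrown.
     default: a value to use when a document is added without a value for the
           field, so that the field is never empty.
   -->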
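
   <!-- Illustrative sketch, not part of the original schema: how the
      attributes above might combine on a field used for sorting and
      faceting. The field name "price" and the default value are
      assumptions; check the field type's documentation for its exact
      docValues requirements before enabling it.
    -->
   <!--
     <field name="price" type="float" indexed="true" stored="true"
            docValues="true" multiValued="false" default="0.0"/>
    -->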

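   <!-- Field names should consist of letters, digits and underscores only, and
      must not start with a digit. Names with both a leading and a trailing
      underscore (for example, _version_) are reserved.
    -->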

   <field name="id" type="string" indexed="true" stored="true" 
           required="true" multiValued="false" /> 

   <field name="title" type="text_general" indexed="true" 
           stored="true" multiValued="true"/>
   <field name="description" type="text_general" indexed="true" stored="true"/>
   <field name="author" type="text_general" indexed="true" stored="true"/>
   <field name="keywords" type="text_general" indexed="true" stored="true"/>
   <field name="category" type="text_general" indexed="true" stored="true"/>
   <field name="url" type="text_general" indexed="true" stored="true"/>
   <field name="last_modified" type="date" indexed="true" stored="true"/>
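   <!-- Note: to save space, this field is not indexed by default, because
      copyField copies it into the field named "text". Use this field for
      returning and highlighting content; use the "text" field for searching.
   -->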
   <field name="content" type="text_general" indexed="false" 
           stored="true" multiValued="true"/>

   <!-- Catchall field containing the other searchable fields (implemented via copyField) -->
   <field name="text" type="text_general" indexed="true" 
           stored="false" multiValued="true"/>

   <!-- Reserved field; do not delete it, or Solr will report an error -->
   <field name="_version_" type="long" indexed="true" stored="true"/>

 </fields>


 <!-- The unique identifier of a document, comparable to a primary key. Unless the field is marked with required="false", its value must not be empty -->
 <uniqueKey>id</uniqueKey>

  <!-- Copy the fields that should be searchable into the catchall field -->
   <copyField source="title" dest="text"/>
   <copyField source="author" dest="text"/>
   <copyField source="description" dest="text"/>
   <copyField source="keywords" dest="text"/>
   <copyField source="content" dest="text"/>
   <copyField source="url" dest="text"/>
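
   <!-- Illustrative sketch, not part of the original schema: copyField also
      accepts an optional maxChars attribute that caps how many characters
      are copied, which can keep the catchall field small. The choice of
      "category" as the source and the limit of 256 are assumptions.
    -->
   <!--
     <copyField source="category" dest="text" maxChars="256"/>
    -->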

  <types>
    <!-- Field type definitions -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" 
        positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" 
        positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" 
        positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" 
        positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" 
        positionIncrementGap="0"/>
      (omitted)...
    <!-- Thai field type -->
    <fieldType name="text_th" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ThaiWordFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" 
            words="lang/stopwords_th.txt" />
      </analyzer>
    </fieldType>

    <!-- Turkish field type -->
    <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.TurkishLowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="false" 
            words="lang/stopwords_tr.txt" />
        <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
      </analyzer>
    </fieldType>

    <!-- Chinese: must be configured by ourselves; this is where the mmseg4j integration goes -->
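    <!-- Illustrative sketch, not part of the original schema: one possible
       shape for a Chinese field type backed by mmseg4j. It assumes the
       mmseg4j Solr jar is on the classpath and that a dictionary directory
       named "dic" exists under the Solr home; the type name text_zh, the
       tokenizer class name and its mode and dicPath attributes should be
       checked against the mmseg4j version in use.
      -->
    <!--
      <fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"
              mode="complex" dicPath="dic"/>
        </analyzer>
      </fieldType>
      -->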
 </types>

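  <!-- Similarity is the scoring routine that ranks each document against a
     query. A custom Similarity or SimilarityFactory can be specified here,
     but the default is fine for most applications. See:
     http://wiki.apache.org/solr/SchemaXml#Similarity
    -->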
  <!--
     <similarity class="com.example.solr.CustomSimilarityFactory">
       <str name="paramkey">param value</str>
     </similarity>
    -->
</schema>
