Intelligent File Search

Posted by zwbsoft on 2024-04-09
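  1. Configure pinyin search:

    Copy the two jar files pinyin4j-2.5.0.jar and pinyinAnalyzer.jar into the solr-8.5.0/server/solr-webapp/webapp/WEB-INF/lib directory, then edit the managed-schema file under solr-8.5.0/server/solr/conf.

    Add the following to that file. Here the fieldType is given the name text_pinyin: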

    <fieldType name="text_pinyin" class="solr.TextField" positionIncrementGap="0">
    <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.ik.IKTokenizerFactory"/>
    <filter class="com.shentong.search.analyzers.PinyinTransformTokenFilterFactory" minTermLenght="2"/>
    <filter class="com.shentong.search.analyzers.PinyinNGramTokenFilterFactory" maxGram="20" minGram="1"/>
    </analyzer>
    <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.ik.IKTokenizerFactory"/>
    <filter class="com.shentong.search.analyzers.PinyinTransformTokenFilterFactory" minTermLenght="2"/>
    <filter class="com.shentong.search.analyzers.PinyinNGramTokenFilterFactory" maxGram="20" minGram="1"/>
    </analyzer>
    </fieldType>
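
    A fieldType has no effect until a field uses it. The following is a minimal sketch of how text_pinyin might be wired up in the same managed-schema; the field names name and name_pinyin are assumptions for illustration and are not part of the original configuration.

    ```xml
    <!-- Hypothetical field names: "name" is assumed to be an existing field holding the Chinese text. -->
    <field name="name_pinyin" type="text_pinyin" indexed="true" stored="false" multiValued="true"/>
    <!-- Copy the source text into the pinyin field at index time so it is analyzed by text_pinyin. -->
    <copyField source="name" dest="name_pinyin"/>
    ```

    With a setup like this, pinyin queries would be directed at name_pinyin, while name keeps its original analysis.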
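  2. Configure SuggestComponent: SuggestComponent provides users with automatic suggestions for query terms. Its main features are a pluggable lookup implementation, a pluggable term dictionary (so you can choose the dictionary implementation freely), and distributed support.

    The first step is to add a search component to solrconfig.xml and tell it to use SuggestComponent.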

    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">mySuggester</str>
        <str name="lookupImpl">FuzzyLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">cat</str>
        <str name="weightField">price</str>
        <str name="suggestAnalyzerFieldType">string</str>
        <str name="buildOnStartup">false</str>
      </lst>
    </searchComponent>

    After adding the search component, a request handler must also be added to solrconfig.xml:

    <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.count">10</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>
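
    As a sketch, assuming the suggester name mySuggester from the component above, the handler defaults can also name the dictionary through the suggest.dictionary parameter, so that clients do not have to pass it on every request. Because buildOnStartup is set to false in the component, the suggester still has to be built once before it returns results, for example by sending suggest.build=true with a request.

    ```xml
    <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.count">10</str>
        <!-- Assumption: default to the suggester defined in the searchComponent above. -->
        <str name="suggest.dictionary">mySuggester</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>
    ```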
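  3. Configure spell checking:

    The SpellCheck component is designed to provide inline query suggestions based on other, similar terms.

    These suggestions can be based on terms in a Solr field, on an externally created text file, or on a field in another Lucene index.

    Use the following configuration in solrconfig.xml: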

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">name</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="distanceMeasure">internal</str>
        <float name="accuracy">0.5</float>
        <int name="maxEdits">2</int>
        <int name="minPrefix">1</int>
        <int name="maxInspections">5</int>
        <int name="minQueryLength">4</int>
        <int name="maxQueryLength">40</int>
        <float name="maxQueryFrequency">0.01</float>
        <float name="thresholdTokenFrequency">.01</float>
      </lst>
    </searchComponent>
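
    A spellcheck component only produces suggestions when a request handler invokes it. The following is a minimal sketch of such a handler; the handler name /spell and the chosen default values are assumptions, not part of the original setup.

    ```xml
    <!-- Hypothetical handler name "/spell"; adjust to the handler actually used for searches. -->
    <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="spellcheck">on</str>
        <!-- Refers to the spellchecker named "default" in the component above. -->
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck.count">5</str>
      </lst>
      <!-- Run spellcheck after the standard search components. -->
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>
    ```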

    To use an external file as the spelling dictionary, use FileBasedSpellChecker:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="classname">solr.FileBasedSpellChecker</str>
        <str name="name">file</str>
        <str name="sourceLocation">spellings.txt</str>
        <str name="characterEncoding">UTF-8</str>
        <str name="spellcheckIndexDir">./spellcheckerFile</str>
      </lst>
    </searchComponent>
