Java + Lucene Chinese Word Segmentation: How a Search Engine Matches Search Terms

Posted by huangxie on 2016-05-17

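I suspect that anyone who has learned databases, whether MySQL or SQL Server, instinctively reaches for the LIKE keyword as soon as queries come up. Quzhuanpan (in category mode) used the same approach at first. The bad news is that LIKE matching leaves a large share of useful resources in the database unreachable. I won't spell out the reasons here (think it through yourself); let's go straight to the evidence.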
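As a rough illustration of the two LIKE behaviours discussed in this post, here is a minimal plain-Java sketch. It does not touch a real database: `containsScan` stands in for `LIKE '%kw%'`, which has to inspect every row, and `prefixRange` stands in for `LIKE 'kw%'` on an indexed column, which is fast but only finds rows that start with the keyword. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class LikeScanSketch {

    // Plays the role of: WHERE name LIKE '%kw%'. Every row must be inspected.
    static List<String> containsScan(List<String> rows, String kw) {
        List<String> hits = new ArrayList<String>();
        for (String row : rows) {
            if (row.contains(kw)) {
                hits.add(row);
            }
        }
        return hits;
    }

    // Plays the role of: WHERE name LIKE 'kw%' on an indexed column.
    // A sorted structure can jump to the matching range, but rows that
    // merely contain the keyword somewhere else are never returned.
    static List<String> prefixRange(TreeSet<String> sorted, String prefix) {
        return new ArrayList<String>(sorted.subSet(prefix, prefix + Character.MAX_VALUE));
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<String>();
        rows.add("hello world");
        rows.add("say hello");
        rows.add("goodbye");
        System.out.println(containsScan(rows, "hello"));                      // [hello world, say hello]
        System.out.println(prefixRange(new TreeSet<String>(rows), "hello"));  // [hello world]
    }
}
```

The prefix lookup misses "say hello" entirely, which is the kind of unreachable record the rest of this post is about.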

Search Quzhuanpan for "hello" (any word will do), like this:

http://www.quzhuanpan.com/sou…

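Page through the results and you will see that every entry containing the word hello has been found. LIKE does not give you this. If you don't believe it, take another look: luckily I haven't yet had time to change the matching algorithm on the movie site Talaishuo, so the old behaviour is still visible there: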

http://www.talaishuo.com/sear…

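You will find that only entries whose field begins with hello are matched. That is the problem: a large share of the resources in the database go to waste because they can never be found. Fortunately people are clever and invented word segmentation. I won't cover the theory here (look it up yourself) and will go straight to the code. Note that you need four jar packages as tools. I have uploaded them to Quzhuanpan, so download them first if you want to try segmentation: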

Segmentation package download link 1

Segmentation package download link 2
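Since the theory is skipped above, here is a toy sketch of the idea behind a segmented index, written in plain Java without Lucene. Text is cut into tokens, and each token maps to the set of documents containing it, so a lookup by token finds a document wherever the word appears in the text. The tokenizer is a deliberate simplification (lowercase, split on anything that is not a letter or digit); real Chinese segmentation, as done by IKAnalyzer, is dictionary-based and is not reproduced here. All names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndexSketch {

    // token -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();

    // Simplified stand-in for an analyzer: lowercase, split on non letters/digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    void add(int docId, String text) {
        for (String token : tokenize(text)) {
            Set<Integer> ids = postings.get(token);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(token, ids);
            }
            ids.add(docId);
        }
    }

    // One map lookup, regardless of where the word sits inside the text.
    Set<Integer> search(String term) {
        Set<Integer> ids = postings.get(term.toLowerCase());
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Hello World.flv");
        index.add(2, "say hello again");
        index.add(3, "goodbye");
        System.out.println(index.search("hello")); // [1, 2]
    }
}
```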

Here is the code:

package com.tray.indexData;
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
 
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Fieldable;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;
 
import com.tray.bean.SerachResult;
import com.tray.common.tools.DateFormater;
 
public class LuceneSearch {
     
    private static String DISC_URL = "/home/indexData/data";
     
    static {
        String os = System.getProperty("os.name");  
        if(os.toLowerCase().startsWith("win")){  
            DISC_URL = "E:\\indexData\\data"; // backslashes must be escaped in a Java string literal
        }
        else{
            DISC_URL ="/home/indexData/data";
        }
    }
         
    // Analyzer (tokenizer) used for indexing
    private Analyzer analyzer=new IKAnalyzer(); 
    private static Directory directory;
    // IndexWriter configuration
    private static IndexWriterConfig iwConfig;
    // Shared IndexWriter
    private static IndexWriter writer;  
    private static File indexFile = null;  
     
    private static Version version = Version.LUCENE_36;
     
    private final int PAPGESIZE=10;
 
    /**
     * Open (or create) the index directory and the shared IndexWriter
     * @Author haoning
     */
    public void init() throws Exception {
        indexFile = new File(DISC_URL);
        if (!indexFile.exists()) {
            indexFile.mkdirs(); // also creates missing parent directories
        }
        directory=FSDirectory.open(indexFile);
        // Configure the IndexWriter
        iwConfig = new IndexWriterConfig(version,analyzer);
        iwConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
        // Create the index writer; failures now reach the caller instead of being silently swallowed
        writer = new IndexWriter(directory,iwConfig);
    }
     
    public void closeWriter(){
        try {
            writer.close();
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
     
    public void commit(){
         
        try {
            writer.commit();
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
     
    /**
     * Index a single document
     * @Author haoning
     */
    public void singleIndex(Document doc) throws Exception {
        writer.addDocument(doc);
    }
     
    /**
     * Update a single document, keyed by url
     * @Author haoning
     */
    public void singleUpdate(Document doc) throws Exception {
        Term term = new Term("url", doc.get("url"));
        writer.updateDocument(term,doc);
    }
     
    /**
     * Full re-index: clear the index, then add every document
     * @Author haoning
     */
    public void fullIndex(Document[] documentes) throws Exception {
         
        writer.deleteAll();
        for (Document document : documentes) {
            writer.addDocument(document);
        }
        writer.commit();
    }
     
    /**
     * Delete a document from the index by its unique key (url)
     * @Author haoning
     */
    public void deleteIndex(Document document)throws Exception{
        Term term = new Term("url", document.get("url")); // url, not id, is the unique key
        writer.deleteDocuments(term);
        writer.commit();
    }
     
    /**
     * Incremental index: update documents by their unique key (url)
     * @Author haoning
     */
    public void updateIndex(Document[] documentes) throws Exception{
        for (Document document : documentes) {
            Term term = new Term("url", document.get("url"));
            writer.updateDocument(term, document);
        }
        writer.commit();
    }
     
    /**
     * Exact term query
     * @Author haoning
     */
    public void simpleSearch(String filedStr,String queryStr,int page, int pageSize) throws Exception{
        File indexDir = new File(DISC_URL);  
        // Index directory
        Directory dir=FSDirectory.open(indexDir);  
        // Open an index reader on that directory
        IndexReader reader = IndexReader.open(dir);  
        // Create the searcher
        IndexSearcher searcher = new IndexSearcher(reader);
        TopScoreDocCollector topCollector = TopScoreDocCollector.create(searcher.maxDoc(), false);
         
        Term term = new Term(filedStr, queryStr);
        Query query = new TermQuery(term);
        searcher.search(query, topCollector);
        ScoreDoc[] docs = topCollector.topDocs((page-1)*pageSize, pageSize).scoreDocs;
         
        printScoreDoc(docs, searcher);
    }
     
    /**
     * Search with highlighted matches
     * @Author haoning
     */
    public Map<String, Object> highLightSearch(String filed,String keyWord,int curpage, int pageSize) throws Exception{
        List<SerachResult> list=new ArrayList<SerachResult>();
        Map<String,Object> map = new HashMap<String,Object>();
        if (curpage <= 0) {
            curpage = 1;
        }
        if (pageSize <= 0 || pageSize>20) {
             pageSize = PAPGESIZE;
        }
        File indexDir = new File(DISC_URL); // index directory
        Directory dir=FSDirectory.open(indexDir);
        IndexReader reader = IndexReader.open(dir); // index reader opened on that directory
        IndexSearcher searcher = new IndexSearcher(reader);
         
        int start = (curpage - 1) * pageSize;
         
        Analyzer analyzer = new IKAnalyzer(true);
        QueryParser queryParser = new QueryParser(Version.LUCENE_36, filed, analyzer);
        queryParser.setDefaultOperator(QueryParser.AND_OPERATOR);
        Query query = queryParser.parse(keyWord);
         
        int hm = start + pageSize;
        TopScoreDocCollector res = TopScoreDocCollector.create(hm, false);
        searcher.search(query, res);
         
        // Single quotes keep the style attribute valid HTML inside the Java string
        SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
        Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
         
        long amount = res.getTotalHits();
        //long pages = (rowCount - 1) / pageSize + 1; // compute the total number of pages
         
        map.put("amount",amount); // total number of matching records
         
        TopDocs tds = res.topDocs(start, pageSize);
        ScoreDoc[] sd = tds.scoreDocs;
         
        for (int i = 0; i < sd.length; i++) {
            Document doc = searcher.doc(sd[i].doc);
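            String temp=doc.get("name");
            // Highlight the matched terms
            TokenStream ts = analyzer.tokenStream("name", new StringReader(temp));
             
            SerachResult record=new SerachResult();
            String name = highlighter.getBestFragment(ts,temp); 
            if (name == null) {
                name = temp; // no fragment matched the query; fall back to the plain name
            }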
            String skydirverName=doc.get("skydirverName");
            String username=doc.get("username");
            String shareTime=doc.get("shareTime");
            String describ=doc.get("describ");
            String typeId=doc.get("typeId");
            String id=doc.get("id");
            String url=doc.get("url");
             
            record.setName(name);
            record.setSkydriverName(skydirverName);
            record.setUsername(username);
            record.setShareTime(DateFormater.getFormatDate(shareTime,"yyyy-MM-dd HH:mm:ss"));
            record.setDescrib(describ);
            record.setTypeId(Integer.parseInt(typeId));
            record.setId(new BigInteger(id));
            record.setUrl(url);
            list.add(record);
             
            /*System.out.println("name:"+name);
            System.out.println("skydirverName:"+skydirverName);
            System.out.println("username:"+username);
            System.out.println("shareTime:"+shareTime);
            System.out.println("describ:"+describ);
            System.out.println("typeId:"+typeId);
            System.out.println("id:"+id);
            System.out.println("url:"+url);*/
        }
        map.put("source",list);
        return map;
    }
     
    /**
     * Prefix query
     * @Author haoning
     */
    public void prefixSearch(String filedStr,String queryStr) throws Exception{
        File indexDir = new File(DISC_URL);  
        // Index directory
        Directory dir=FSDirectory.open(indexDir);  
        // Open an index reader on that directory
        IndexReader reader = IndexReader.open(dir);  
        // Create the searcher
        IndexSearcher searcher = new IndexSearcher(reader);
         
        Term term = new Term(filedStr, queryStr);
        Query query = new PrefixQuery(term);
         
        ScoreDoc[] docs = searcher.search(query, 3).scoreDocs;
        printScoreDoc(docs, searcher);
    }
     
    /**
     * Wildcard query
     * @Author haoning
     */
    public void wildcardSearch(String filedStr,String queryStr) throws Exception{
        File indexDir = new File(DISC_URL);  
        // Index directory
        Directory dir=FSDirectory.open(indexDir);  
        // Open an index reader on that directory
        IndexReader reader = IndexReader.open(dir);  
        // Create the searcher
        IndexSearcher searcher = new IndexSearcher(reader);
         
        Term term = new Term(filedStr, queryStr);
        Query query = new WildcardQuery(term);
        ScoreDoc[] docs = searcher.search(query, 3).scoreDocs;
        printScoreDoc(docs, searcher);
    }
     
    /**
     * Analyzed (segmented) query
     * @Author haoning
     */
    public void analyzerSearch(String filedStr,String queryStr) throws Exception{
        File indexDir = new File(DISC_URL);  
        // Index directory
        Directory dir=FSDirectory.open(indexDir);  
        // Open an index reader on that directory
        IndexReader reader = IndexReader.open(dir);  
        // Create the searcher
        IndexSearcher searcher = new IndexSearcher(reader);
         
        QueryParser queryParser = new QueryParser(version, filedStr, analyzer);
        Query query = queryParser.parse(queryStr);
         
        ScoreDoc[] docs = searcher.search(query, 3).scoreDocs;
        printScoreDoc(docs, searcher);
    }
     
    /**
     * Analyzed (segmented) query across multiple fields
     * @Author haoning
     */
    public void multiAnalyzerSearch(String[] filedStr,String queryStr) throws Exception{
        File indexDir = new File(DISC_URL);  
        // Index directory
        Directory dir=FSDirectory.open(indexDir);  
        // Open an index reader on that directory
        IndexReader reader = IndexReader.open(dir);  
        // Create the searcher
        IndexSearcher searcher = new IndexSearcher(reader);
        QueryParser queryParser = new MultiFieldQueryParser(version, filedStr, analyzer);
        Query query = queryParser.parse(queryStr);
         
        ScoreDoc[] docs = searcher.search(query, 3).scoreDocs;
        printScoreDoc(docs, searcher);
    }
     
    public void printScoreDoc(ScoreDoc[] docs,IndexSearcher searcher)throws Exception{
        for (int i = 0; i < docs.length; i++) {
            List<Fieldable> list = searcher.doc(docs[i].doc).getFields();
            for (Fieldable fieldable : list) {
                String fieldName = fieldable.name();
                String fieldValue = fieldable.stringValue();
                System.out.println(fieldName+" : "+fieldValue);
            }
        }
    }
}
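Note that Quzhuanpan (http://www.quzhuanpan.com) is deployed on Linux, so DISC_URL changes according to the operating system. I use the url to decide whether an index entry is unique; you could use the id instead. Decide case by case.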
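One possible way to keep the per-OS index path out of a static initializer is a small helper like the sketch below. It keeps the same two default paths as the code above and adds an optional override. The class name and the `lucene.index.dir` system property are hypothetical and not part of the original project.

```java
import java.util.Locale;

public class IndexDirSketch {

    // Hypothetical property name: allows overriding the location without recompiling.
    static final String OVERRIDE_PROPERTY = "lucene.index.dir";

    // Takes the OS name as a parameter so the choice is easy to test.
    static String resolve(String osName, String override) {
        if (override != null && !override.trim().isEmpty()) {
            return override.trim();
        }
        boolean windows = osName != null
                && osName.toLowerCase(Locale.ROOT).startsWith("win");
        return windows ? "E:\\indexData\\data" : "/home/indexData/data";
    }

    public static void main(String[] args) {
        String dir = resolve(System.getProperty("os.name"),
                System.getProperty(OVERRIDE_PROPERTY));
        System.out.println("index directory: " + dir);
    }
}
```

Running with `-Dlucene.index.dir=/tmp/index` would then point the index somewhere else on any platform.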
package com.tray.indexData;
 
import java.sql.SQLException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import com.mysql.jdbc.Connection;
import com.mysql.jdbc.ResultSet;
import com.mysql.jdbc.Statement;
 
public class IndexFile {
     
     private static Connection conn = null;     
     private static Statement stmt = null;  
     private final int NUM=500000;
     private LuceneSearch ls;
     private long count=0;
      
     public ResultSet deal6SourceTable(String tableName) throws SQLException{
           // '-1' is a string literal; backticks would make MySQL read it as a column name
           String sql = "SELECT distinct `NAME`,SKYDRIVER_NAME,USERNAME,SHARE_TIME,DESCRIB,TYPE_ID,ID,URL FROM "+tableName+" where STATUS=1 and TYPE_ID !='-1' and (TYPE_NAME is null or TYPE_NAME!=1) limit "+NUM;
           //System.out.println(sql);
           ResultSet rs = (ResultSet) stmt.executeQuery(sql);
           return rs;
     }
      
     public void update6SourceTable(String tableName) throws SQLException{
           Statement st = (Statement) conn.createStatement();
           String sql = "update "+tableName+" set TYPE_NAME=1 where STATUS=1 and TYPE_ID !='-1' and (TYPE_NAME is null or TYPE_NAME!=1) limit "+NUM;
           //System.out.println("update"+sql);
            try {
                st.executeUpdate(sql);
            } catch (SQLException e) {
                e.printStackTrace();
            }
     }
      
     public void indexInit(){// initialize the database connection and Lucene
        conn = (Connection) JdbcUtil.getConnection();     
        if(conn == null) {     
            try {
                throw new Exception("Database connection failed!");
            } catch (Exception e) {
                e.printStackTrace();
            }     
        }
        ls=new LuceneSearch();
        try {
            ls.init();
        } catch (Exception e2) {
            e2.printStackTrace();
        }
     }
      
     public void indexEnd(){// close Lucene and the database connection
          
         ls.closeWriter();
         try {
                conn.close(); // close the database connection
             } catch (SQLException e) {
                e.printStackTrace();
          }
     }
      
     public void Index6Data() throws SQLException{   
            try {
                stmt = (Statement) conn.createStatement();
            } catch (SQLException e1) {
                e1.printStackTrace();
            }
             
            // Source tables in mark order "1".."6" (see createIndex)
            String[] tables = {"film_and_tv_info", "music_and_mv_info", "e_book_info",
                    "bt_file_info", "characteristic_software_info", "source_code_info"};
            for (int i = 0; i < tables.length; i++) {
                boolean stop;
                do{
                     ResultSet rs=deal6SourceTable(tables[i]);
                     // Index one batch. Run this once as the initial build rather than on every start;
                     // later data changes can be picked up by a background index update.
                     stop=this.createIndex(rs,ls,String.valueOf(i+1));
                     if(!stop){
                         ls.commit(); // commit only when a batch was actually indexed
                     }
                }while(!stop);
            }
     }
      
     public ResultSet deal2Share(String tableName) throws SQLException{
        String sql = "SELECT  distinct NAME,SKYDRIVER_NAME,USERNAME,SHARE_TIME,DESCRIB,TYPE_ID,ID,SHORTURL from "+tableName+" where STATUS=1  and FS_ID ='1' limit "+NUM; // reuses the otherwise unused FS_ID column as the "not yet indexed" flag
        ResultSet rs = (ResultSet) stmt.executeQuery(sql);
        return rs;
    }
     
    public ResultSet deal3Share(String tableName) throws SQLException{
        String sql = "SELECT  distinct title,channel,uid,ctime,description,port,id,shorturl from "+tableName+" where name ='1' limit "+NUM;  
        ResultSet rs = (ResultSet) stmt.executeQuery(sql);
        return rs;
    }
     
    public void Index3Data() throws SQLException{
            try {
                stmt = (Statement) conn.createStatement();
            } catch (SQLException e1) {
                e1.printStackTrace();
            }
             
            // Share tables in mark order "7".."9" (see createIndex)
            String[] tables = {"share1", "share2", "share3"};
            for (int i = 0; i < tables.length; i++) {
                boolean stop;
                do{
                     // share1 and share2 have the same columns; share3 has its own column names
                     ResultSet rs=(i<2)?deal2Share(tables[i]):deal3Share(tables[i]);
                     // Index one batch. Run this once as the initial build rather than on every start;
                     // later data changes can be picked up by a background index update.
                     stop=this.createIndex(rs,ls,String.valueOf(i+7));
                     if(!stop){
                         ls.commit(); // commit only when a batch was actually indexed
                     }
                }while(!stop);
            }
        }
     
        public void update2ShareTable(String tableName) throws SQLException{
            Statement st = (Statement) conn.createStatement();
           String sql = "update "+tableName+" set FS_ID=0 where STATUS=1  and FS_ID ='1' limit "+NUM; // reuses the otherwise unused FS_ID column as the "not yet indexed" flag
           //System.out.println("update"+sql);
            try {
                st.executeUpdate(sql);
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
         
        public void update3ShareTable(String tableName) throws SQLException{
            Statement st = (Statement) conn.createStatement();
           String sql = "update "+tableName+" set name=0 where name ='1' limit "+NUM;  
           //System.out.println("update"+sql);
            try {
                st.executeUpdate(sql);
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
            
        public boolean createIndex(ResultSet rs,LuceneSearch ls,String mark) {
            try {
                String tableName=null;
                if(mark.equals("1")){
                    tableName="film_and_tv_info";
                }
                if(mark.equals("2")){
                    tableName="music_and_mv_info";
                }
                if(mark.equals("3")){
                    tableName="e_book_info";
                }
                if(mark.equals("4")){
                    tableName="bt_file_info";
                }
                if(mark.equals("5")){
                    tableName="characteristic_software_info";
                }
                if(mark.equals("6")){
                    tableName="source_code_info";
                }
                if(mark.equals("7")){
                    tableName="share1";
                }
                if(mark.equals("8")){
                    tableName="share2";
                }
                if(mark.equals("9")){
                    tableName="share3";
                }
 
                boolean isNull=rs.next();
                //System.out.println("hehe"+isNull);
                if(isNull==false){
                    return true; // nothing left to process
                }
                while(isNull){
                    if(Integer.parseInt(mark)>=1&&Integer.parseInt(mark)<=8){
                        Document doc = new Document();  
                        //System.out.println("name"+rs.getString("NAME"));        
                        Field name = new Field("name",rs.getString("NAME"),Field.Store.YES,Field.Index.ANALYZED);
                        String skName=rs.getString("SKYDRIVER_NAME");
                        if(skName==null){
                            skName="百度";
                        }
                        Field skydirverName = new Field("skydirverName",skName, Field.Store.YES,Field.Index.NOT_ANALYZED);
                        Field username = new Field("username",rs.getString("USERNAME"),Field.Store.YES, Field.Index.ANALYZED);    
                        Field shareTime = new Field("shareTime",rs.getString("SHARE_TIME"), Field.Store.YES,Field.Index.NOT_ANALYZED);
                        String desb=rs.getString("DESCRIB");
                        if(desb==null){
                            desb="-1";
                        }
                        Field describ = new Field("describ",desb,Field.Store.NO,Field.Index.NOT_ANALYZED);     
                        Field typeId = new Field("typeId",rs.getString("TYPE_ID"), Field.Store.YES,Field.Index.NOT_ANALYZED); 
                        Field id = new Field("id",rs.getString("ID"),Field.Store.YES,Field.Index.NOT_ANALYZED);
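                        Field url =null;
                        // url is the key used by updateDocument/deleteDocuments, so index it as one
                        // un-analyzed term; an analyzed url may be split into tokens that the exact Term never matches
                        if(Integer.parseInt(mark)>=7&&Integer.parseInt(mark)<=8){
                            url = new Field("url",rs.getString("SHORTURL"), Field.Store.YES,Field.Index.NOT_ANALYZED); 
                        }
                        else{
                            url = new Field("url",rs.getString("URL"), Field.Store.YES,Field.Index.NOT_ANALYZED);  
                        }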
                        doc.add(name);
                        doc.add(skydirverName);
                        doc.add(username);
                        doc.add(shareTime);
                        doc.add(describ);
                        doc.add(typeId);
                        doc.add(id);
                        doc.add(url);
                        ls.singleUpdate(doc); // update rather than add, so re-indexing a url replaces the old entry
                        isNull=rs.next();
                    }
                    else{
                        Document doc = new Document();  
                        //System.out.println("title"+rs.getString("title"));        
                        Field name = new Field("name",rs.getString("title"),Field.Store.YES,Field.Index.ANALYZED);
                        String skName=rs.getString("channel");
                        Field skydirverName = new Field("skydirverName",skName, Field.Store.YES,Field.Index.NOT_ANALYZED);
                        Field username = new Field("username",rs.getString("uid"),Field.Store.YES, Field.Index.ANALYZED);     
                        Field shareTime = new Field("shareTime",rs.getString("ctime"), Field.Store.YES,Field.Index.NOT_ANALYZED);
                        String desb=rs.getString("description");
                        if(desb==null){
                            desb="-1";
                        }
                        Field describ = new Field("describ",desb,Field.Store.NO,Field.Index.NOT_ANALYZED);     
                        Field typeId = new Field("typeId",rs.getString("port"), Field.Store.YES,Field.Index.NOT_ANALYZED);
                        Field id = new Field("id",rs.getString("id"),Field.Store.YES,Field.Index.NOT_ANALYZED);    
                        // un-analyzed so that the exact url Term used by updateDocument can match
                        Field url = new Field("url",rs.getString("shorturl"), Field.Store.YES,Field.Index.NOT_ANALYZED);  
                         
                        doc.add(name);
                        doc.add(skydirverName);
                        doc.add(username);
                        doc.add(shareTime);
                        doc.add(describ);
                        doc.add(typeId);
                        doc.add(id);
                        doc.add(url);
                        ls.singleUpdate(doc); // update rather than add, so re-indexing a url replaces the old entry
                        isNull=rs.next();
                    }
                    count=count+1;
                }
                if(Integer.parseInt(mark)>=1&&Integer.parseInt(mark)<=6){
                    update6SourceTable(tableName); // flag the batch as processed
                }
                else if(Integer.parseInt(mark)>=7&&Integer.parseInt(mark)<=8){
                    update2ShareTable(tableName); // flag the batch as processed
                }
                else{
                    update3ShareTable(tableName); // flag the batch as processed
                }
                System.out.println("Indexed "+count+" records so far; latest batch from table "+tableName);
                 
            } catch (Exception e) {
                e.printStackTrace();
            }
            return false;
        }
}
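Don't worry about the database details; just follow the approach and swap in your own tables and queries where needed. I won't say more about that here.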
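Stripped of the database details, the approach in IndexFile is a loop: fetch a batch of rows that are not yet flagged, index them, flag them, commit, and repeat until a fetch comes back empty. The sketch below shows that loop with an in-memory list standing in for the table, so it runs without MySQL or Lucene. All names are made up, and unlike the original, which issues a separate UPDATE with the same LIMIT, this version flags exactly the rows it fetched.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchIndexSketch {

    // Stand-in for one table row; "indexed" plays the role of the flag column.
    static class Row {
        final String url;
        boolean indexed;
        Row(String url) { this.url = url; }
    }

    // Plays the role of: SELECT ... WHERE <not yet indexed> LIMIT batchSize
    static List<Row> fetchPending(List<Row> table, int batchSize) {
        List<Row> batch = new ArrayList<Row>();
        for (Row r : table) {
            if (!r.indexed && batch.size() < batchSize) {
                batch.add(r);
            }
        }
        return batch;
    }

    // Indexes everything in batches and returns how many batches were needed.
    static int indexAll(List<Row> table, int batchSize, List<String> index) {
        int batches = 0;
        while (true) {
            List<Row> batch = fetchPending(table, batchSize);
            if (batch.isEmpty()) {
                return batches;          // nothing left to process
            }
            for (Row r : batch) {
                index.add(r.url);        // stands in for updating the Lucene index
                r.indexed = true;        // stands in for the UPDATE that sets the flag
            }
            batches++;                   // this is where a commit would go
        }
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<Row>();
        for (int i = 1; i <= 5; i++) {
            table.add(new Row("http://example.invalid/" + i));
        }
        List<String> index = new ArrayList<String>();
        System.out.println(indexAll(table, 2, index) + " batches, " + index.size() + " rows");
        // prints: 3 batches, 5 rows
    }
}
```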

Now the final part:

package com.tray.indexData;
 
import java.sql.SQLException;
 
 
 
public class Application {
     
    public static void main(String[] args){
        /*IndexFile indexFile=new IndexFile();
        indexFile.indexInit();
        try {
            indexFile.Index6Data();
        } catch (SQLException e1) {
            e1.printStackTrace();
        }
        indexFile.indexEnd();*/
         
        IndexFile indexFile1=new IndexFile();
        indexFile1.indexInit();
        try {
            indexFile1.Index3Data();
        } catch (SQLException e1) {
            e1.printStackTrace();
        }
        indexFile1.indexEnd();
         
        LuceneSearch lch=new LuceneSearch();
        try {
            long a = System.currentTimeMillis();
            lch.highLightSearch("name", "flv", 1,3);
            long b = System.currentTimeMillis();
            long c = b - a;
            System.out.println("[Advanced search took " + c + " ms]");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

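You can start indexing from a standalone application, or write a timer to index on a schedule, depending on your needs. The code above is the author's hard work; please do not modify it when reposting. I vouch that the code is fully usable. I have set up a QQ group for technical discussion, and everyone is welcome: group number 512245829. If you prefer Weibo, follow 轉盤娛樂.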
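For the timer option, one way to do it with only the JDK is a `ScheduledExecutorService`, sketched below. The indexing job is passed in as a plain `Runnable`; in this project it would presumably wrap the `indexInit()`, `Index6Data()` and `indexEnd()` calls, but that wiring is an assumption and is not shown. The class name and the 100 ms period in `main` are for demonstration only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledIndexSketch {

    // Runs indexJob immediately, then again periodMillis after each run finishes.
    static ScheduledExecutorService schedule(final Runnable indexJob, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                try {
                    indexJob.run();
                } catch (RuntimeException e) {
                    // An exception escaping run() would cancel all later runs, so log and continue.
                    e.printStackTrace();
                }
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch threeRuns = new CountDownLatch(3);
        ScheduledExecutorService scheduler = schedule(new Runnable() {
            public void run() {
                System.out.println("indexing pass");
                threeRuns.countDown();
            }
        }, 100);
        threeRuns.await(5, TimeUnit.SECONDS); // let the demo run a few passes
        scheduler.shutdownNow();
    }
}
```

`scheduleWithFixedDelay` waits for one pass to finish before timing the next, so a slow indexing run does not overlap with the following one.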
