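Sphinx is a SQL-based full-text search engine. It can be paired with MySQL or PostgreSQL to provide more specialized search than the database offers on its own, and it also ships a storage engine plugin designed specifically for MySQL. Time to say goodbye to LIKE-style fuzzy queries.
A single Sphinx index can hold up to 100 million records, and a query against 10 million records returns in 0.x seconds (a fraction of a second). Sphinx builds an index over 1 million records in just 3 to 4 minutes and over 10 million records within 50 minutes, while rebuilding an incremental index that covers only the newest 100,000 records takes a few tens of seconds.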
1. Installation
Environment: CentOS 6.5
yum install sphinx -y
The default configuration path is /etc/sphinx/, which contains the configuration file sphinx.conf. Here is my configuration:
# Data source; MySQL is configured here
source src1
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = beego_blog
    sql_port = 3306 # optional, default is 3306
    # SQL used to fetch rows from the database when the index is built
    sql_query = \
        SELECT id, userid, UNIX_TIMESTAMP(posttime) AS posttime, title, content, tags \
        FROM tb_post
    sql_attr_uint = userid
    sql_attr_timestamp = posttime
    sql_query_info = SELECT * FROM tb_post WHERE id=$id
}

# Index 1
index test1
{
    # Data source to index
    source = src1
    # Index file path
    path = /var/lib/sphinx/test1
    # How document info is stored: extern
    docinfo = extern
    charset_type = sbcs
}

# Index 2
index testrt
{
    type = rt
    rt_mem_limit = 32M
    path = /var/lib/sphinx/testrt
    charset_type = utf-8
    rt_field = title
    rt_field = content
    rt_attr_uint = userid
}

indexer
{
    mem_limit = 32M
}

searchd
{
    # Address on which the search service is exposed
    listen = 0.0.0.0:9312
    listen = 9306:mysql41
    log = /var/log/sphinx/searchd.log
    query_log = /var/log/sphinx/query.log
    read_timeout = 5
    max_children = 30
    pid_file = /var/run/sphinx/searchd.pid
    max_matches = 1000
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old = 1
    workers = threads # for RT to work
    binlog_path = /var/lib/sphinx
}
Next, generate the index. We use the index test1 configured above to pull data from MySQL, so first create the table and some rows in MySQL:
CREATE TABLE `tb_post` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `userid` mediumint(8) unsigned NOT NULL DEFAULT '0' COMMENT 'user id',
  `title` varchar(100) NOT NULL DEFAULT '' COMMENT 'title',
  `content` mediumtext NOT NULL COMMENT 'content',
  `tags` varchar(100) NOT NULL DEFAULT '' COMMENT 'tags',
  `posttime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00' COMMENT 'publish time',
  PRIMARY KEY (`id`)
);
INSERT INTO `tb_post` VALUES ('1', '1', 'epoll邊沿觸發漏報訊息包問題', '開發一個即時通訊後臺,底層的網路收發使用 epoll + main loop實現網路事件', ',技術,', '2016-08-05 11:50:02');
INSERT INTO `tb_post` VALUES ('2', '1', 'epoll 邊沿觸發和水平觸發區別實戰講解', 'epoll,看結果發現只接入了兩條,還有3條沒接入。說明高併發時,會出現客戶端連線不上的問題。', ',技術,', '2016-08-05 22:03:23');
INSERT INTO `tb_post` VALUES ('3', '1', '快速排序演算法', '快速排序演算法是一個挺經典的演算法,值得我們學習', ',技術,', '2016-08-05 23:08:00');
Build the index:
[root@centos6 data]# indexer test1
Sphinx 2.0.8-id64-release (r3831)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc ()
using config file '/etc/sphinx/sphinx.conf'...
indexing index 'test1'...
collected 37 docs, 0.8 MB
sorted 0.1 Mhits, 100.0% done
total 37 docs, 833156 bytes
total 0.082 sec, 10061176 bytes/sec, 446.81 docs/sec
total 3 reads, 0.000 sec, 57.7 kb/call avg, 0.0 msec/call avg
total 9 writes, 0.000 sec, 40.2 kb/call avg, 0.0 msec/call avg
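The output shows that 37 documents were indexed (presumably this run was against my full table rather than only the three sample rows above). We can test the result from the command line: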
[root@centos6 libertyblog]# search epoll|more
Sphinx 2.0.8-id64-release (r3831)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc ()
using config file '/etc/sphinx/sphinx.conf'...
index 'test1': query 'epoll ': returned 2 matches of 2 total in 0.000 sec
displaying matches:
1. document=59, weight=2831, userid=1, posttime=Fri Aug 5 22:03:23 2016
id=59
userid=1
title=epoll ???????????????
content=開發一個即時通訊後臺,底層的網路收發使用 epoll + main loop實現網路事件
......
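Two records matched; for brevity they are not all listed here. In the line "1. document=59, weight=2831", 59 is the ID of the indexed document and 2831 is its weight. Everything so far was done on the command line. To offer search as a service, we also need to start the searchd daemon: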
[root@centos6 data]# service searchd start
Starting searchd: Sphinx 2.0.8-id64-release (r3831)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc ()
using config file '/etc/sphinx/sphinx.conf'...
WARNING: compat_sphinxql_magics=1 is deprecated; please update your application and config
listening on 127.0.0.1:9312
listening on all interfaces, port=9306
precaching index 'test1'
precaching index 'testrt'
precached 2 indexes in 0.002 sec
[  OK  ]
The daemon started successfully and is bound to port 9312. Let's check its status:
[root@centos6 data]# searchd --status
Sphinx 2.0.8-id64-release (r3831)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc ()
using config file '/etc/sphinx/sphinx.conf'...
searchd status
--------------
uptime: 252
connections: 1
maxed_out: 0
command_search: 0
command_excerpt: 0
command_update: 0
command_keywords: 0
command_persist: 0
command_status: 1
command_flushattrs: 0
agent_connect: 0
agent_retry: 0
queries: 0
dist_queries: 0
query_wall: 0.000
query_cpu: OFF
dist_wall: 0.000
dist_local: 0.000
dist_wait: 0.000
query_reads: OFF
query_readkb: OFF
query_readtime: OFF
avg_query_wall: 0.000
avg_query_cpu: OFF
avg_dist_wall: 0.000
avg_dist_local: 0.000
avg_dist_wait: 0.000
avg_query_reads: OFF
avg_query_readkb: OFF
avg_query_readtime: OFF
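Besides the native API on port 9312, the configuration above also opens port 9306 with the mysql41 protocol, so a regular MySQL client or driver can send SphinxQL statements to searchd. The following is only a rough sketch of how such a statement could be assembled in Go. The helper names escapeMatch and buildSphinxQL are made up for this illustration, and the escaping only protects the string literal; it does not neutralize Sphinx's full-text query operators.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeMatch escapes backslashes and single quotes so that the keyword can
// be placed inside a single-quoted SphinxQL string literal.
// (Hypothetical helper, not part of any Sphinx client library.)
func escapeMatch(keyword string) string {
	r := strings.NewReplacer(`\`, `\\`, `'`, `\'`)
	return r.Replace(keyword)
}

// buildSphinxQL returns a SphinxQL statement that runs a full-text search
// for keyword against the given index and returns at most limit ids.
// The index name is assumed to come from trusted configuration.
func buildSphinxQL(index, keyword string, limit int) string {
	return fmt.Sprintf("SELECT id FROM %s WHERE MATCH('%s') LIMIT %d",
		index, escapeMatch(keyword), limit)
}

func main() {
	// prints: SELECT id FROM test1 WHERE MATCH('epoll') LIMIT 10
	fmt.Println(buildSphinxQL("test1", "epoll", 10))
}
```

The resulting string could then be sent through any MySQL-protocol connection pointed at port 9306; which driver to use is left open here.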
Now let's access the service from a third-party client (written in Go):
package main

import (
	"fmt"
	"log"

	"github.com/yunge/sphinx"
)

func main() {
	SphinxClient := sphinx.NewClient().SetServer("localhost", 0).SetConnectTimeout(5000)
	if err := SphinxClient.Error(); err != nil {
		log.Fatal(err)
	}
	// Query: the first argument is the keyword to search for, the second is
	// the index name test1, and the third is a comment.
	res, err := SphinxClient.Query("epoll", "test1", "search article!")
	if err != nil {
		log.Fatal(err)
	}
	// Collect the ids of all matching documents into a comma-separated string.
	var articleIDs string
	for _, match := range res.Matches {
		articleIDs += fmt.Sprintf("%d,", match.DocId)
	}
	log.Println(articleIDs)
	SphinxClient.Close()
}
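The printed result contains the ids 1 and 2 but not 3, which shows that the index lookup is accurate: document 3 does not contain the word epoll, while documents 1 and 2 both do. That completes our test, and this feature can now be hooked up to the search box of your own site. Previously we relied on fuzzy queries in the database such as LIKE '%keyword%', which is very inefficient and makes users wait a long time once there is a lot of data. With a dedicated external index, searches are several orders of magnitude faster.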
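When wiring this into a site, the ids returned by Sphinx usually have to be turned back into full rows from MySQL. One possible way, sketched here under the assumption that the ids arrive as uint64 values and that only id and title are needed, is to build a parameterized IN query. The function name buildInQuery and the selected columns are illustrative choices, not something the original setup prescribes.

```go
package main

import (
	"fmt"
	"strings"
)

// buildInQuery returns a parameterized SQL statement plus its arguments for
// loading the rows whose ids were returned by the search.
// It returns an empty statement when there are no ids, because "IN ()" is
// not valid SQL. The table name is assumed to be a trusted constant.
func buildInQuery(table string, ids []uint64) (string, []interface{}) {
	if len(ids) == 0 {
		return "", nil
	}
	placeholders := make([]string, len(ids))
	args := make([]interface{}, len(ids))
	for i, id := range ids {
		placeholders[i] = "?"
		args[i] = id
	}
	query := fmt.Sprintf("SELECT id, title FROM %s WHERE id IN (%s)",
		table, strings.Join(placeholders, ","))
	return query, args
}

func main() {
	query, args := buildInQuery("tb_post", []uint64{1, 2})
	// prints: SELECT id, title FROM tb_post WHERE id IN (?,?) [1 2]
	fmt.Println(query, args)
}
```

Using placeholders instead of concatenating the ids keeps the statement safe to pass to a database/sql style Query call, and it avoids the trailing comma that a simple string join in a loop leaves behind.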
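When new rows are inserted or existing rows are updated, the index has to be refreshed, which I call an incremental index here. Strictly speaking, the command below re-indexes all of test1 and then signals searchd to swap in the new files without a restart. It is simple: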
[root@centos6 data]# indexer --rotate test1
Sphinx 2.0.8-id64-release (r3831)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc ()
using config file '/etc/sphinx/sphinx.conf'...
indexing index 'test1'...
collected 37 docs, 0.8 MB
sorted 0.1 Mhits, 100.0% done
total 37 docs, 833156 bytes
total 0.081 sec, 10184036 bytes/sec, 452.26 docs/sec
total 3 reads, 0.000 sec, 57.7 kb/call avg, 0.0 msec/call avg
total 9 writes, 0.000 sec, 40.2 kb/call avg, 0.1 msec/call avg
rotating indices: successfully sent SIGHUP to searchd (pid=12074).
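It is best to put this re-indexing command into crontab so that it runs on a schedule and keeps the index up to date, for example once a day at 2 a.m.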
Source: ITPUB blog, link: http://blog.itpub.net/30239065/viewspace-2722033/. Please credit the source when reposting; otherwise legal liability may be pursued.