I've had to work with ES recently while grinding away at my day job, but I wasn't familiar with querying it, so I've spent the past two weeks studying ES and skimming Elasticsearch: The Definitive Guide. Another day of happily slacking off.
ES is a real-time distributed search and analytics engine built on Lucene. Today we won't talk about its use cases; instead, let's look at creating, modifying, and deleting indexes.
Environment: CentOS 7, Elasticsearch 6.8.3, JDK 8
(The latest ES is version 7, which requires JDK 11 or above, so I installed 6.8.3.)
All of the examples below use a student index.
1. Creating an index
PUT http://192.168.197.100:9200/student
{
  "mappings": {
    "_doc": {            // "_doc" is the type; an ES 6 index may contain only one type
      "properties": {
        "id": { "type": "keyword" },
        "name": {
          "type": "text",
          "index": true,            // true (the default) makes the field searchable
          "analyzer": "standard"
        },
        "age": {
          "type": "integer",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "birthday": { "type": "date" },
        "gender": { "type": "keyword" },
        "grade": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "class": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1,      // number of primary shards
    "number_of_replicas": 1     // number of replicas per primary shard
  }
}
The difference between the text and keyword types:
(1) a text field is analyzed (split into tokens) both when indexed and when searched, and is used for full-text search
(2) a keyword field is stored as a single un-analyzed term, and is used for exact matching, sorting, and aggregations
The index attribute controls whether a field is indexed. In ES 6 it is a boolean:
(1) true (the default): the field is indexed and searchable; a text field is analyzed first, so it supports fuzzy matching similar to SQL's like, while a keyword field is matched exactly, like SQL's "="
(2) false: the field is not indexed and provides no search at all
(The string values analyzed, not_analyzed, and no are legacy ES 2.x syntax; since ES 5 they have been replaced by the text/keyword types plus the boolean index flag.)
The analyzer attribute selects the analyzer; for Chinese this is usually the IK analyzer, and you can also define a custom one.
The number_of_shards setting is the number of primary shards; the default in ES 6 is 5, and it cannot be changed after the index is created.
The number_of_replicas setting is the number of replicas per primary shard; the default is 1, and it can be changed at any time.
On success, ES returns the following JSON:
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "student"
}
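For reference, the same create-index request can be assembled from Python. Below is a minimal sketch using only the standard library; the host and index name are this post's examples, and the actual HTTP call is left commented out so the snippet runs offline:

```python
import json

# Request body for PUT /student, mirroring the mapping above (ES 6.x: one type, "_doc")
create_index_body = {
    "mappings": {
        "_doc": {
            "properties": {
                "id":       {"type": "keyword"},
                "name":     {"type": "text", "analyzer": "standard"},
                "age":      {"type": "integer"},
                "birthday": {"type": "date"},
                "gender":   {"type": "keyword"},
                "grade":    {"type": "text",
                             "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
                "class":    {"type": "text",
                             "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
            }
        }
    },
    "settings": {
        "number_of_shards": 1,     # fixed after creation
        "number_of_replicas": 1,   # adjustable later via /_settings
    },
}

payload = json.dumps(create_index_body)
# To actually create the index (not run here), PUT the payload with a JSON content type:
# urllib.request.Request("http://192.168.197.100:9200/student", data=payload.encode(),
#                        method="PUT", headers={"Content-Type": "application/json"})
```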
How do we view an index's details after creating it?
GET http://192.168.197.100:9200/student/_mapping
(This returns just the mappings; GET http://192.168.197.100:9200/student returns the mappings and settings together.)
In ES 6, an index may contain only one type, such as the "_doc" type above.
Comparing ES with a relational database, the usual analogy is:
index ↔ database
type ↔ table
document ↔ row
field ↔ column
mapping ↔ schema
2. Modifying an index
// change the number of replicas to 2
PUT http://192.168.197.100:9200/student/_settings
{
  "number_of_replicas": 2
}
3. Deleting an index
// delete a single index
DELETE http://192.168.197.100:9200/student

// delete all indices (use with extreme care)
DELETE http://192.168.197.100:9200/_all
4. The default standard analyzer vs. the IK analyzer
ES's default analyzer is standard. It splits English on spaces and punctuation and lowercases the tokens, but it splits Chinese into individual characters, so it is not suitable as a Chinese analyzer.
For example, standard on English text:
// this API shows how a piece of text would be tokenized
POST http://192.168.197.100:9200/_analyze
{
  "text": "the People's Republic of China",
  "analyzer": "standard"
}
The result:
{
  "tokens": [
    { "token": "the",      "start_offset": 0,  "end_offset": 3,  "type": "<ALPHANUM>", "position": 0 },
    { "token": "people's", "start_offset": 4,  "end_offset": 12, "type": "<ALPHANUM>", "position": 1 },
    { "token": "republic", "start_offset": 13, "end_offset": 21, "type": "<ALPHANUM>", "position": 2 },
    { "token": "of",       "start_offset": 22, "end_offset": 24, "type": "<ALPHANUM>", "position": 3 },
    { "token": "china",    "start_offset": 25, "end_offset": 30, "type": "<ALPHANUM>", "position": 4 }
  ]
}
And on Chinese:
POST http://192.168.197.100:9200/_analyze
{
  "text": "中華人民共和國萬歲",
  "analyzer": "standard"
}
The result:
{
  "tokens": [
    { "token": "中", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 },
    { "token": "華", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 },
    { "token": "人", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2 },
    { "token": "民", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3 },
    { "token": "共", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4 },
    { "token": "和", "start_offset": 5, "end_offset": 6, "type": "<IDEOGRAPHIC>", "position": 5 },
    { "token": "國", "start_offset": 6, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 6 },
    { "token": "萬", "start_offset": 7, "end_offset": 8, "type": "<IDEOGRAPHIC>", "position": 7 },
    { "token": "歲", "start_offset": 8, "end_offset": 9, "type": "<IDEOGRAPHIC>", "position": 8 }
  ]
}
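The two standard-analyzer outputs above can be imitated with a few lines of Python. This is a rough local approximation for illustration only, not Lucene's actual StandardAnalyzer: it lowercases the input, keeps runs of Latin letters and apostrophes together as one token, and emits every CJK character as its own token:

```python
import re

def standard_like_tokens(text):
    """Roughly imitate ES's standard analyzer: lowercase the input, keep
    runs of Latin letters/apostrophes as single tokens, and emit each CJK
    character as a separate one-character token."""
    return re.findall(r"[a-z']+|[\u4e00-\u9fff]", text.lower())

print(standard_like_tokens("the People's Republic of China"))
print(standard_like_tokens("中華人民共和國萬歲"))
```

The first call yields the same five tokens as the English _analyze response, and the second yields nine single characters, matching the Chinese response.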
The IK analyzer, by contrast, segments Chinese into proper words. It ships two analyzers: ik_smart and ik_max_word.
(1) ik_smart: the coarsest-grained segmentation, splitting the text into as few, as long, words as possible
For example:
POST http://192.168.197.100:9200/_analyze
{
  "text": "中華人民共和國萬歲",
  "analyzer": "ik_smart"
}
The result:
{
  "tokens": [
    { "token": "中華人民共和國", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "萬歲", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 1 }
  ]
}
(2) ik_max_word: the finest-grained segmentation, splitting the text into as many words as possible
For example:
POST http://192.168.197.100:9200/_analyze
{
  "text": "中華人民共和國萬歲",
  "analyzer": "ik_max_word"
}
The result:
{
  "tokens": [
    { "token": "中華人民共和國", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "中華人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "中華", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2 },
    { "token": "華人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
    { "token": "人民共和國", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4 },
    { "token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5 },
    { "token": "共和國", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6 },
    { "token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7 },
    { "token": "國", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8 },
    { "token": "萬歲", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9 },
    { "token": "萬", "start_offset": 7, "end_offset": 8, "type": "TYPE_CNUM", "position": 10 },
    { "token": "歲", "start_offset": 8, "end_offset": 9, "type": "COUNT", "position": 11 }
  ]
}
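Using the token lists returned by the two analyzers above, the granularity difference can be checked mechanically. This is just a small sanity check; the lists are copied verbatim from the _analyze responses:

```python
# Tokens copied from the ik_smart and ik_max_word responses above
ik_smart_tokens = ["中華人民共和國", "萬歲"]
ik_max_word_tokens = ["中華人民共和國", "中華人民", "中華", "華人", "人民共和國",
                      "人民", "共和國", "共和", "國", "萬歲", "萬", "歲"]

# ik_max_word emits every ik_smart token plus many finer-grained ones
assert set(ik_smart_tokens) <= set(ik_max_word_tokens)
print(len(ik_smart_tokens), len(ik_max_word_tokens))  # → 2 12
```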
IK on English text:
POST http://192.168.197.100:9200/_analyze
{
  "text": "the People's Republic of China",
  "analyzer": "ik_smart"
}
The result: IK drops the less important stop words (the, of) that the standard analyzer keeps. (My English has decayed to the point where I can't even say what part of speech a/an/the are; as a Chinese speaker, that's nothing to be proud of.)
{
  "tokens": [
    { "token": "people",   "start_offset": 4,  "end_offset": 10, "type": "ENGLISH", "position": 0 },
    { "token": "s",        "start_offset": 11, "end_offset": 12, "type": "ENGLISH", "position": 1 },
    { "token": "republic", "start_offset": 13, "end_offset": 21, "type": "ENGLISH", "position": 2 },
    { "token": "china",    "start_offset": 25, "end_offset": 30, "type": "ENGLISH", "position": 3 }
  ]
}
5. Adding a document
Fields do not have to match the mapping; you can add arbitrary new fields and ES will map them dynamically.
// 1 is the value of "_id"; it must be unique, or you can let ES generate one
POST http://192.168.197.100:9200/student/_doc/1
{
  "id": 1,
  "name": "tom",
  "age": 20,
  "gender": "male",
  "grade": "7",
  "class": "1"
}
6. Updating a document
POST http://192.168.197.100:9200/student/_doc/1/_update
{
  "doc": {
    "name": "jack"
  }
}
7. Deleting a document
// 1 is the value of "_id"
DELETE http://192.168.197.100:9200/student/_doc/1
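The three document operations above share one URL scheme, which a small helper makes explicit. This is a sketch: the host is this post's example host, and the actual HTTP calls are shown only as comments so the snippet runs offline:

```python
ES_HOST = "http://192.168.197.100:9200"  # the example host used throughout this post

def doc_url(index, doc_id, suffix=""):
    """Build the ES 6.x document URL: /{index}/_doc/{id}[/{suffix}]."""
    url = f"{ES_HOST}/{index}/_doc/{doc_id}"
    return f"{url}/{suffix}" if suffix else url

# POST   doc_url("student", 1)             with the document body  -> create
# POST   doc_url("student", 1, "_update")  with {"doc": {...}}     -> partial update
# DELETE doc_url("student", 1)                                     -> delete
print(doc_url("student", 1, "_update"))  # → http://192.168.197.100:9200/student/_doc/1/_update
```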
That covers the basics of creating, modifying, and deleting ES indexes and adding, updating, and deleting documents. To keep this post from getting too long, document queries will be covered in the next one.