[Elasticsearch Series] Elasticsearch: From First Look to Installation and Use

Posted by LHBlog on 2018-03-26

Original URL: https://www.cnblogs.com/LHWorldBlog/p/8527015.html

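I. Introduction

Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface, and it is widely used for full-text search in enterprise systems.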

II. Common Concepts

cluster

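Represents a cluster. A cluster contains multiple nodes, one of which is the master node; the master is chosen by election. The master/non-master distinction only matters inside the cluster. One of Elasticsearch's ideas is decentralization: seen from outside there is no central node, because the cluster is logically a single whole, and talking to any one node is equivalent to talking to the entire cluster. Start several nodes on the same network segment and, by default, they discover one another and form a cluster automatically. (This describes multicast discovery, the default before 2.0; the installation later in this article targets 2.0.1, disables multicast, and lists the hosts explicitly through unicast.)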
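To illustrate that any node can act as the entry point to the cluster, here is a minimal sketch that asks a single node for the health of the whole cluster through the `_cluster/health` REST endpoint. The helper names `health_url` and `fetch_cluster_health` are hypothetical, the node addresses are borrowed from the unicast host list used later in this article, and 9200 is the HTTP port set in the configuration.

```python
import json
from urllib.request import urlopen

# Node addresses taken from this article's unicast host list; replace with your own.
NODES = ["192.168.133.101", "192.168.133.102", "192.168.133.103"]


def health_url(host, port=9200):
    """Build the cluster health URL for one node."""
    return "http://{}:{}/_cluster/health".format(host, port)


def fetch_cluster_health(host, port=9200, timeout=5):
    """Ask a single node for the health of the whole cluster (needs a running node)."""
    with urlopen(health_url(host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Every node exposes the same endpoint, so any of these URLs can be queried.
for node in NODES:
    print(health_url(node))
```

Against a running cluster, `fetch_cluster_health(NODES[0])` and `fetch_cluster_health(NODES[1])` should report the same `cluster_name` and `status`, since both describe the cluster rather than the individual node.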

shards
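Represents index shards. Elasticsearch can split a complete index into multiple shards, so that a large index is broken up and distributed across different nodes, which is what makes distributed search possible. The number of shards must be specified when the index is created and cannot be changed afterwards.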
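One way to see why the shard count is fixed: a document is routed to a shard by hashing a routing value (by default the document ID) modulo the number of primary shards. The sketch below is only an illustration under stated assumptions: `pick_shard` is a hypothetical helper, and `zlib.crc32` stands in for the hash function Elasticsearch uses internally.

```python
import zlib


def pick_shard(doc_id, number_of_primary_shards):
    """Map a document ID to a 0-based shard number.

    zlib.crc32 is only a stand-in for Elasticsearch's internal hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    # Compare where each ID lands with 5 shards versus 6 shards.
    for doc_id in ["1", "2", "3", "user-42"]:
        print(doc_id, pick_shard(doc_id, 5), pick_shard(doc_id, 6))
```

Because the shard number depends on the modulus, changing the number of shards would send existing IDs to different shards, and documents that were already indexed could no longer be found where the routing expects them.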

replicas
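Represents index replicas. Elasticsearch lets you configure multiple replicas of an index. Replicas serve two purposes. First, they improve fault tolerance: when a shard on some node is damaged or lost, it can be recovered from a replica. Second, they improve query efficiency, because Elasticsearch automatically load-balances search requests.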
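As a rough sketch of how shards and replicas fit together, the snippet below builds the JSON body that the create-index API accepts (`number_of_shards` and `number_of_replicas` under `settings`) and counts the resulting shard copies. The helper names and the values 5 and 1 are illustrative assumptions, not settings taken from this article.

```python
import json


def index_settings(number_of_shards, number_of_replicas):
    """Build the JSON body for creating an index with the given layout."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard plus all of its replica copies."""
    return number_of_shards * (1 + number_of_replicas)


if __name__ == "__main__":
    print(json.dumps(index_settings(5, 1), indent=2))
    print(total_shard_copies(5, 1))  # 5 primaries + 5 replicas = 10
```

The body would be sent with a PUT request to the index name. With one replica, every shard exists on two nodes, which is what allows recovery when a node is lost.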

recovery
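Represents data recovery, also called data redistribution. When a node joins or leaves the cluster, Elasticsearch reallocates index shards according to machine load. A failed node also goes through data recovery when it restarts.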

river
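Represents a data source for Elasticsearch, and a way to synchronize data from other storage systems (such as databases) into Elasticsearch. A river runs as a plugin service inside Elasticsearch: it reads data from the source and indexes it. The official rivers cover CouchDB, RabbitMQ, Twitter, and Wikipedia. Note that rivers were deprecated in Elasticsearch 1.5 and removed in 2.0, so they do not apply to the 2.0.1 release installed later in this article.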

gateway
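Represents how snapshots of the index are stored. By default Elasticsearch first keeps the index in memory and persists it to the local disk when memory fills up. The gateway stores index snapshots, and when the cluster is shut down and restarted, the index backup data is read back from the gateway. Elasticsearch supports several gateway types: the local file system (the default), a distributed file system, Hadoop HDFS, and Amazon S3 cloud storage.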

To persist data to Hadoop HDFS, first install the elasticsearch/elasticsearch-hadoop plugin, then add the following to elasticsearch.yml:

gateway.type: hdfs
gateway.hdfs.uri: hdfs://localhost:9000

discovery.zen
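Represents Elasticsearch's automatic node discovery mechanism. Elasticsearch is a peer-to-peer system: it first looks for existing nodes by broadcast, then uses a multicast protocol for communication between nodes, and it also supports point-to-point interaction.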

Transport
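Represents how nodes inside the cluster, or the cluster and its clients, communicate. Internally the nodes use TCP by default. HTTP (with JSON), Thrift, servlet, memcached, ZeroMQ and other transport protocols are also supported, integrated as plugins.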

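index
An index is a collection of documents that have somewhat similar characteristics. For example, you can have one index for customer data, another for a product catalog, and yet another for order data. An index is identified by a name (which must be all lowercase), and this name is used whenever you index, search, update, or delete the documents in that index. Within a single cluster you can define as many indexes as you want.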

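type
Within an index you can define one or more types. A type is a logical category/partition of your index whose semantics are entirely up to you. In general, a type is defined for documents that share a common set of fields. For example, suppose you run a blogging platform and store all your data in a single index. In that index you might define one type for user data, another for blog data, and another for comment data.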

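document
A document is the basic unit of information that can be indexed. For example, you can have a document for a single customer, another for a single product, and another for a single order. Documents are expressed in JSON (JavaScript Object Notation), a ubiquitous internet data interchange format. Within an index/type you can store as many documents as you want. Note that although a document physically resides in an index, it must actually be indexed into (assigned to) a type inside that index.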
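The index/type/document hierarchy maps directly onto the REST path `/index/type/id` used in Elasticsearch 2.x. The sketch below builds such a path and a JSON document for the blogging example. `document_path` is a hypothetical helper, and the index name `blog`, the type `user`, and the field values are made up for illustration.

```python
import json
from urllib.parse import quote


def document_path(index, doc_type, doc_id):
    """Return the REST path /index/type/id for one document."""
    if index != index.lower():
        raise ValueError("index names must be lowercase: %r" % index)
    parts = (index, doc_type, doc_id)
    return "/" + "/".join(quote(str(part), safe="") for part in parts)


if __name__ == "__main__":
    # Hypothetical blog-platform example: one index, a "user" type, document 1.
    path = document_path("blog", "user", 1)
    body = json.dumps({"name": "sxt", "role": "author"})
    print("PUT", path)  # PUT /blog/user/1
    print(body)
```

Sending that body with an HTTP PUT to the printed path on any node would index the document. The lowercase check mirrors the naming rule for indexes described above.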

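III. Installation Guide

1. Prerequisites

Elasticsearch must be run by a regular user, not by root.

Note: because Elasticsearch can execute scripts remotely, it is an easy target for trojans and other malware, so it refuses to start as root. Grant the required permissions to a regular user and start it as that user.

To make the node reachable from other machines or network interfaces, set network.host; otherwise it can only be accessed through 127.0.0.1 or localhost. Here it is set to the machine's own LAN IP.

Note that in configuration files ending in .yml, every colon must be followed by a space.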

2. Install an unzip tool first:

yum install unzip

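3. Create the same directory and user on every node

Download elasticsearch-2.0.1.tar.zip. Do not extract it as root; switch to the regular user first.

As root, create the es directory on all nodes (shared mode, i.e. the same command on every node):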

mkdir -p /opt/soft/es

Add the user and set its password on all 3 nodes at once (shared command mode):

useradd sxt

echo sxt | passwd --stdin sxt

Change the ownership of the directory (shared command mode):

chown sxt:sxt /opt/soft/es

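4. Extract and configure

Switch to the sxt user.
As sxt, extract the archive, enter the config directory under the es directory, and edit config/elasticsearch.yml. Note: on every line of this file, the colon must be followed by a space!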
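Since a missing space after the colon is an easy mistake to make, a small check can catch it before starting the node. The following is a heuristic sketch for flat `key: value` lines like the ones in this article, not a YAML parser; `lines_missing_space` is a hypothetical helper name.

```python
import re

# A key followed by ':' and then a non-space character, e.g. "http.port:9200".
_MISSING_SPACE = re.compile(r"^\s*[A-Za-z_][\w.\-]*:(?=\S)")


def lines_missing_space(text):
    """Return (line_number, line) pairs where 'key:value' lacks the space."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if _MISSING_SPACE.match(line):
            problems.append((number, line))
    return problems


if __name__ == "__main__":
    sample = (
        "cluster.name: CKL_elasticsearch\n"
        "node.name:node01\n"
        "# http.port:9200\n"
        "http.port: 9200\n"
    )
    print(lines_missing_space(sample))  # [(2, 'node.name:node01')]
```

To check a real file, read config/elasticsearch.yml into a string and pass it to the function; an empty list means no line matched the pattern.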

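To build a cluster, cluster.name must be identical in the Elasticsearch configuration on every node; once the nodes are started, they form a cluster automatically. If you leave it unchanged, the default is cluster.name: elasticsearch.

node.name can be anything, but it must be unique within the cluster.

Host and port

Add the split-brain protection settings. Without them the cluster does not know exactly how many nodes it should have, which makes split-brain hard to control: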

discovery.zen.ping.multicast.enabled: false

discovery.zen.ping.unicast.hosts: ["192.168.133.101","192.168.133.102", "192.168.133.103"]

discovery.zen.ping_timeout: 120s

client.transport.ping_timeout: 60s
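The configuration template shown in this article gives the rule for the related setting discovery.zen.minimum_master_nodes: the total number of nodes / 2 + 1. As a small worked example under that rule (the helper name is hypothetical, and integer division is assumed), the value for a given cluster size can be computed like this:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: total number of master-eligible nodes / 2 + 1 (integer division)."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "nodes -> discovery.zen.minimum_master_nodes:", minimum_master_nodes(n))
```

For the three nodes listed in the unicast hosts above, this gives 2: a partition that contains only one node cannot elect its own master, which is the point of the setting.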

The complete configuration is as follows:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please see the documentation for further information on configuration options:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: CKL_elasticsearch
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node01
#
# Add custom attributes to the node:
#
# node.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
# path.data: /path/to/data
#
# Path to log files:
#
# path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
# bootstrap.mlockall: true
#
# Make sure that the `ES_HEAP_SIZE` environment variable is set to about half the memory
# available on the system and that the owner of the process is allowed to use this limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
# network.host: 192.168.0.1
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html>
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
# discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):
#
# discovery.zen.minimum_master_nodes: 3
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html>
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
# gateway.recover_after_nodes: 3
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html>
#
# ---------------------------------- Various -----------------------------------
#
# Disable starting multiple nodes on a single system:
#
# node.max_local_storage_nodes: 1
#
# Require explicit names when deleting indices:
#
# action.destructive_requires_name: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.133.101","192.168.133.102", "192.168.133.103"]
discovery.zen.ping_timeout: 120s
client.transport.ping_timeout: 60s