Preface
I recently needed to set up a Kafka cluster for work. There are plenty of tutorials online, and following them step by step does get the job done, but they tend to be somewhat tedious. So I wrote this article to help you get a Kafka cluster installed quickly.
Installation Steps
Prepare several servers, preferably an odd number of them (e.g. 3, 5, or 7), running CentOS 7 or later.
This example uses 3 servers with the IPs 192.168.1.1, 192.168.1.2, and 192.168.1.3. Update the IP addresses in the script below, copy it to each of the 3 servers, and run it on each one to complete the installation.
```shell
#!/bin/bash

# Modify the link if you want to download another version
KAFKA_DOWNLOAD_URL="https://dlcdn.apache.org/kafka/3.1.0/kafka_2.13-3.1.0.tgz"
# Please use your own server IPs
SERVERS=("192.168.1.1" "192.168.1.2" "192.168.1.3")

ID=0
MACHINE_IP=$(hostname -i)
echo "Machine IP: ${MACHINE_IP}"
LENGTH=${#SERVERS[@]}
for (( i=0; i<${LENGTH}; i++ )); do
  if [ "${SERVERS[$i]}" = "${MACHINE_IP}" ]; then
    ID=$((i+1))
  fi
done
echo "ID: ${ID}"
if [ "${ID}" -eq "0" ]; then
  echo "Machine IP does not match any entry in the server list"
  exit 1
fi

ZOOKEEPER_CONNECT=$(printf ",%s:2181" "${SERVERS[@]}")
ZOOKEEPER_CONNECT=${ZOOKEEPER_CONNECT:1}
echo "Zookeeper Connect: ${ZOOKEEPER_CONNECT}"

echo "---------- Update yum ----------"
yum update -y
yum install -y wget

echo "---------- Install java ----------"
yum -y install java-1.8.0-openjdk
java -version

echo "---------- Create kafka user & group ----------"
groupadd -r kafka
useradd -g kafka -r kafka -s /bin/false

echo "---------- Download kafka ----------"
cd /opt
wget ${KAFKA_DOWNLOAD_URL} -O kafka.tgz
mkdir -p kafka
tar -xzf kafka.tgz -C kafka --strip-components=1
chown -R kafka:kafka /opt/kafka

echo "---------- Install and start zookeeper ----------"
mkdir -p /data/zookeeper
chown -R kafka:kafka /data/zookeeper
echo "${ID}" > /data/zookeeper/myid

# zookeeper config
# https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_configuration
cat <<EOF > /opt/kafka/config/zookeeper-cluster.properties
# the directory where the snapshot is stored
dataDir=/data/zookeeper
# the port at which the clients will connect
clientPort=2181
# setting number of connections to unlimited
maxClientCnxns=0
# keeps a heartbeat of zookeeper in milliseconds
tickTime=2000
# time for initial synchronization
initLimit=10
# how many ticks can pass before timeout
syncLimit=5
# define server ips and internal ports for zookeeper
EOF
for (( i=0; i<${LENGTH}; i++ )); do
  INDEX=$((i+1))
  echo "server.${INDEX}=${SERVERS[$i]}:2888:3888" >> /opt/kafka/config/zookeeper-cluster.properties
done

# zookeeper.service
cat <<EOF > /usr/lib/systemd/system/zookeeper.service
[Unit]
Description=Apache Zookeeper server (Kafka)
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper-cluster.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start zookeeper && systemctl enable zookeeper

echo "---------- Install and start kafka ----------"
mkdir -p /data/kafka
chown -R kafka:kafka /data/kafka

# kafka config
# https://kafka.apache.org/documentation/#configuration
cat <<EOF > /opt/kafka/config/server-cluster.properties
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=${ID}
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
advertised.listeners=PLAINTEXT://${MACHINE_IP}:9092
# A comma separated list of directories under which to store log files
log.dirs=/data/kafka
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state".
# For anything other than development testing, a value greater than 1 is recommended to ensure availability, such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated list of host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=${ZOOKEEPER_CONNECT}/kafka
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=60000
EOF

# kafka.service
cat <<EOF > /usr/lib/systemd/system/kafka.service
[Unit]
Description=Apache Kafka server (broker)
Documentation=http://kafka.apache.org/documentation.html
Requires=network.target remote-fs.target
After=network.target remote-fs.target zookeeper.service

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server-cluster.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start kafka && systemctl enable kafka
```
Basic Operations
```shell
# Start zookeeper
systemctl start zookeeper
# Stop zookeeper
systemctl stop zookeeper
# Restart zookeeper
systemctl restart zookeeper
# View zookeeper status and recent logs
systemctl status zookeeper -l
# Start kafka
systemctl start kafka
# Stop kafka
systemctl stop kafka
# Restart kafka
systemctl restart kafka
# View kafka status and recent logs
systemctl status kafka -l
```
Quick Test
```shell
# Enter the kafka bin directory
cd /opt/kafka/bin/
# Create a topic
./kafka-topics.sh --create --topic test --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092
# Describe the topic
./kafka-topics.sh --topic test --describe --bootstrap-server localhost:9092
# Start a producer, then type some messages
./kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
# Start a consumer to read the messages
./kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
# Delete the topic
./kafka-topics.sh --topic test --delete --bootstrap-server localhost:9092
```
Script Walkthrough
1. The following lines set the Kafka version to download and the list of server IPs; adjust both to your environment.
```shell
# Modify the link if you want to download another version
KAFKA_DOWNLOAD_URL="https://dlcdn.apache.org/kafka/3.1.0/kafka_2.13-3.1.0.tgz"
# Please use your own server IPs
SERVERS=("192.168.1.1" "192.168.1.2" "192.168.1.3")
```
2. The following lines generate the zookeeper id and kafka broker id, and build the zookeeper connection string used in the kafka configuration. The local IP is matched against the server list: if it equals the first server IP, the ID is 1; the second, 2; the third, 3; and so on. If the local IP is not in the list, the installation aborts.
```shell
ID=0
MACHINE_IP=$(hostname -i)
echo "Machine IP: ${MACHINE_IP}"
LENGTH=${#SERVERS[@]}
for (( i=0; i<${LENGTH}; i++ )); do
  if [ "${SERVERS[$i]}" = "${MACHINE_IP}" ]; then
    ID=$((i+1))
  fi
done
echo "ID: ${ID}"
if [ "${ID}" -eq "0" ]; then
  echo "Machine IP does not match any entry in the server list"
  exit 1
fi
ZOOKEEPER_CONNECT=$(printf ",%s:2181" "${SERVERS[@]}")
ZOOKEEPER_CONNECT=${ZOOKEEPER_CONNECT:1}
```
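This matching and string-joining logic can be exercised on its own, without root or a real cluster. In the sketch below, MACHINE_IP is a hypothetical stand-in for the value `hostname -i` would return:

```shell
#!/bin/bash
# Standalone sketch of the id-matching and connect-string logic.
SERVERS=("192.168.1.1" "192.168.1.2" "192.168.1.3")
MACHINE_IP="192.168.1.2"   # assumption: stands in for $(hostname -i)
ID=0
for (( i=0; i<${#SERVERS[@]}; i++ )); do
  if [ "${SERVERS[$i]}" = "${MACHINE_IP}" ]; then
    ID=$((i+1))
  fi
done
# printf prepends a comma to every element, so strip the leading one.
ZOOKEEPER_CONNECT=$(printf ",%s:2181" "${SERVERS[@]}")
ZOOKEEPER_CONNECT=${ZOOKEEPER_CONNECT:1}
echo "ID: ${ID}"
echo "Zookeeper Connect: ${ZOOKEEPER_CONNECT}"
```

Running this prints ID 2 (the second entry matched) and the joined string 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181.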
3. Update the yum repositories and install the wget download tool.
```shell
yum update -y
yum install -y wget
```
4. Install Java 8.
```shell
yum -y install java-1.8.0-openjdk
java -version
```
5. Create the kafka user and group.
```shell
groupadd -r kafka
useradd -g kafka -r kafka -s /bin/false
```
6. Download and extract the kafka binaries.
```shell
cd /opt
wget ${KAFKA_DOWNLOAD_URL} -O kafka.tgz
mkdir -p kafka
tar -xzf kafka.tgz -C kafka --strip-components=1
chown -R kafka:kafka /opt/kafka
```
7. Create the zookeeper data directory and write the zookeeper id file.
```shell
mkdir -p /data/zookeeper
chown -R kafka:kafka /data/zookeeper
echo "${ID}" > /data/zookeeper/myid
```
8. Generate the zookeeper configuration file; see https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_configuration for details.
```shell
cat <<EOF > /opt/kafka/config/zookeeper-cluster.properties
# the directory where the snapshot is stored
dataDir=/data/zookeeper
# the port at which the clients will connect
clientPort=2181
# setting number of connections to unlimited
maxClientCnxns=0
# keeps a heartbeat of zookeeper in milliseconds
tickTime=2000
# time for initial synchronization
initLimit=10
# how many ticks can pass before timeout
syncLimit=5
# define server ips and internal ports for zookeeper
EOF
for (( i=0; i<${LENGTH}; i++ )); do
  INDEX=$((i+1))
  echo "server.${INDEX}=${SERVERS[$i]}:2888:3888" >> /opt/kafka/config/zookeeper-cluster.properties
done
```
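Each appended `server.N` line names one quorum member: port 2888 is used for follower-to-leader connections and 3888 for leader election. To preview the generated list without touching /opt/kafka, the loop can be pointed at a temporary file (the three example IPs are the same placeholders used throughout this article):

```shell
#!/bin/bash
# Sketch: write the zookeeper server list to a temporary file instead of
# /opt/kafka/config/zookeeper-cluster.properties.
SERVERS=("192.168.1.1" "192.168.1.2" "192.168.1.3")
CONF=$(mktemp)
for (( i=0; i<${#SERVERS[@]}; i++ )); do
  echo "server.$((i+1))=${SERVERS[$i]}:2888:3888" >> "${CONF}"
done
cat "${CONF}"
# server.1=192.168.1.1:2888:3888
# server.2=192.168.1.2:2888:3888
# server.3=192.168.1.3:2888:3888
```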
9. Create the zookeeper systemd unit file, then start zookeeper and enable it at boot.
```shell
cat <<EOF > /usr/lib/systemd/system/zookeeper.service
[Unit]
Description=Apache Zookeeper server (Kafka)
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper-cluster.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start zookeeper && systemctl enable zookeeper
```
10. Create the kafka data directory.
```shell
mkdir -p /data/kafka
chown -R kafka:kafka /data/kafka
```
11. Generate the kafka configuration file; see https://kafka.apache.org/documentation/#configuration for details.
```shell
cat <<EOF > /opt/kafka/config/server-cluster.properties
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=${ID}
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
advertised.listeners=PLAINTEXT://${MACHINE_IP}:9092
# A comma separated list of directories under which to store log files
log.dirs=/data/kafka
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state".
# For anything other than development testing, a value greater than 1 is recommended to ensure availability, such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated list of host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=${ZOOKEEPER_CONNECT}/kafka
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=60000
EOF
```
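Note that the heredoc delimiter EOF is unquoted, so the shell expands variables like the broker id and machine IP at the moment the file is written; each server therefore ends up with its own values baked into the config. A minimal sketch of this mechanism, using hypothetical values and a temporary file:

```shell
#!/bin/bash
# Sketch: unquoted heredocs expand shell variables at write time.
ID=2
MACHINE_IP="192.168.1.2"   # hypothetical; the real script uses $(hostname -i)
CONF=$(mktemp)
cat <<EOF > "${CONF}"
broker.id=${ID}
advertised.listeners=PLAINTEXT://${MACHINE_IP}:9092
EOF
cat "${CONF}"
# broker.id=2
# advertised.listeners=PLAINTEXT://192.168.1.2:9092
```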
12. Create the kafka systemd unit file, then start kafka and enable it at boot.
```shell
cat <<EOF > /usr/lib/systemd/system/kafka.service
[Unit]
Description=Apache Kafka server (broker)
Documentation=http://kafka.apache.org/documentation.html
Requires=network.target remote-fs.target
After=network.target remote-fs.target zookeeper.service

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server-cluster.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start kafka && systemctl enable kafka
```
Summary
Following the steps above, you should have a Kafka cluster up and running quickly. If you run into any problems, feel free to leave a comment on this article.