Apache Kafka is a distributed streaming platform.
Basic Kafka setup:
Step1: Download and unpack Kafka
wget
tar zxvf kafka_2.11-
cd kafka_2.11-
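Step2: Start the server
Kafka uses ZooKeeper, so if you do not already have a ZooKeeper server you need to start one first. The script packaged with Kafka gives you a quick single-node ZooKeeper instance.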
bin/zookeeper-server-start.sh config/zookeeper.properties
Now you can start the Kafka server:
bin/kafka-server-start.sh config/server.properties
Step3: Create a topic
Next we create a topic named test with a single partition and a single replica:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
[kason@kason kafka_2.11-]$ bin/kafka-topics.sh --list --zookeeper localhost:2181
test
Step4: Send messages (producer)
Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message.
[kason@kason kafka_2.11-]$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Hello
Step5: Receive messages (consumer)
Kafka also has a command line consumer that will dump out messages to standard output.
[kason@kason kafka_2.11-]$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Hello
That covers a single-broker cluster; next we build a multi-broker one.
Step6: Set up a multi-broker cluster
So far we have been running against a single broker, but that's no fun. For Kafka, a single broker is just a cluster of size one, so nothing much changes other than starting a few more broker instances. But just to get a feel for it, let's expand our cluster to three nodes (still all on our local machine). In short, we set up three nodes on the local host to form a Kafka broker cluster.
cd /home/kason/kafka/kafka_2.11-
su
cp server.properties server-1.properties
cp server.properties server-2.properties
server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dir=/tmp/kafka-logs-1
server-2.properties:
broker.id=2
listeners=PLAINTEXT://:9094
log.dir=/tmp/kafka-logs-2
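Editing each copy by hand is fine for two brokers, but the three overrides follow a simple pattern, so they can also be generated. The sketch below is only an illustration: it works on a trimmed stand-in for server.properties in a scratch directory, and the helper name make_broker_config and the "port = 9092 + id" rule are assumptions of mine, not something Kafka ships. Check the key names in your own copy of the file before adapting it.

```shell
# Scratch directory so nothing under the real config/ is touched.
cfgdir=$(mktemp -d)

# Stand-in for config/server.properties (hypothetical, trimmed contents).
cat > "$cfgdir/server.properties" <<'EOF'
broker.id=0
listeners=PLAINTEXT://:9092
log.dir=/tmp/kafka-logs
EOF

# make_broker_config ID: write server-ID.properties with a unique
# broker.id, listener port (9092 + ID) and log directory.
make_broker_config() {
    id=$1
    port=$((9092 + id))
    sed -e "s|^broker\.id=.*|broker.id=$id|" \
        -e "s|^listeners=.*|listeners=PLAINTEXT://:$port|" \
        -e "s|^log\.dir=.*|log.dir=/tmp/kafka-logs-$id|" \
        "$cfgdir/server.properties" > "$cfgdir/server-$id.properties"
}

make_broker_config 1
make_broker_config 2
cat "$cfgdir/server-1.properties" "$cfgdir/server-2.properties"
```

Each broker needs its own id, port and log directory because all three run on the same machine; otherwise they would try to register under the same id or overwrite each other's data.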
bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties
[kason@kason kafka_2.11-]$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
Created topic "my-replicated-topic".
But now that we have a cluster, how do we know which broker is doing what? Run the describe topics command to find out:
[kason@kason kafka_2.11-]$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
[kason@kason kafka_2.11-]$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
Topic:test PartitionCount:1 ReplicationFactor:1 Configs:
Topic: test Partition: 0 Leader: 0 Replicas: 0 Isr: 0
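leader: the node responsible for all reads and writes for the given partition. Each node is the leader for a randomly selected share of the partitions.
replicas: the list of nodes that replicate the log for this partition, whether or not they are the leader or even currently alive.
isr: the set of in-sync replicas, that is, the subset of the replicas list that is currently alive and caught up with the leader.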
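To make the three fields concrete, here is a small sketch that pulls them out of one partition line from the describe output for my-replicated-topic and compares the size of the ISR with the size of the replica list. The helpers field_after and count_ids are names I made up for this illustration; they only parse text and do not talk to Kafka.

```shell
# One partition line, copied from the describe output for my-replicated-topic.
line='Topic: my-replicated-topic Partition: 0 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0'

# field_after LABEL TEXT: print the token that follows "LABEL:" in TEXT.
field_after() {
    printf '%s\n' "$2" | awk -v label="$1:" '
        { for (i = 1; i < NF; i++) if ($i == label) { print $(i + 1); exit } }'
}

# count_ids LIST: number of broker ids in a comma-separated list.
count_ids() {
    printf '%s\n' "$1" | awk -F, '{ print NF }'
}

leader=$(field_after Leader "$line")
replicas=$(field_after Replicas "$line")
isr=$(field_after Isr "$line")
echo "leader=$leader replicas=$replicas isr=$isr"

# Fewer ids in the ISR than in the replica list means some replica is
# down or lagging behind the leader.
if [ "$(count_ids "$isr")" -lt "$(count_ids "$replicas")" ]; then
    echo "under-replicated"
else
    echo "fully in sync"
fi
```

For the line above this prints leader=1 replicas=1,2,0 isr=1,2,0 followed by fully in sync, since all three replicas are in the ISR.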
Send messages to this broker cluster (producer):
[kason@kason kafka_2.11-]$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
my test message 1
hello world
hello kafka
Receive the messages (consumer):
[kason@kason kafka_2.11-]$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
my test message 1
hello world
hello kafka
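Since this is a cluster, we can test its fault tolerance. From the describe output above we know the leader is broker 1, so the test is to look up its process ID and kill it by hand. I will not walk through the example here.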
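For readers who want to try it anyway, the lookup step could be sketched roughly as follows. The ps lines are invented for illustration (the PIDs are made up), and the helper name broker_pid is mine. The sketch assumes the broker's Java command line contains the main class kafka.Kafka followed by the config file it was started with.

```shell
# Hypothetical `ps aux` lines, invented for illustration (PIDs are made up).
sample_ps='kason 7564 1.0 5.1 java -Xmx1G kafka.Kafka config/server-1.properties
kason 7601 1.0 5.0 java -Xmx1G kafka.Kafka config/server-2.properties'

# broker_pid CONFIG: read `ps aux`-style lines on stdin and print the PID
# (second column) of the Kafka process started with CONFIG.
# Writing the dot as [.] keeps this awk command from matching its own
# command line when it is fed real `ps aux` output.
broker_pid() {
    awk -v cfg="$1" '$0 ~ /kafka[.]Kafka/ && index($0, cfg) { print $2 }'
}

pid=$(printf '%s\n' "$sample_ps" | broker_pid server-1.properties)
echo "would run: kill -9 $pid"
```

Against a live cluster the input would come from ps aux | broker_pid server-1.properties, followed by the actual kill. Running the describe command again afterwards should show a different leader and an ISR without the killed broker, and the consumer should still be able to read the earlier messages.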
Step7: Use Kafka to import or export data
Writing data from the console and writing it back to the console is a convenient place to start, but you'll probably want to use data from other sources or export data from Kafka to other systems. For many systems, instead of writing custom integration code you can use Kafka Connect to import or export data.
Kafka Connect is a tool included with Kafka that imports and exports data to Kafka. It is an extensible tool that runs connectors, which implement the custom logic for interacting with an external system. In this quickstart we'll see how to run Kafka Connect with simple connectors that import data from a file to a Kafka topic and export data from a Kafka topic to a file.
First, create some seed data to test with:
echo -e "foo\nbar" > test.txt
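Next we start two connectors in standalone mode, meaning they run in a single local process. We pass three configuration files as arguments. The first is the configuration for the Kafka Connect process itself and holds common settings such as the Kafka brokers to connect to and the serialization format for the data. Each of the remaining files describes one connector to create and includes a unique connector name.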
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
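To show what the two connector files roughly look like, the sketch below writes the seed data and a pair of connector configs into a scratch directory. The property values are modelled on the sample files shipped in Kafka's config directory as I remember them, so treat them as assumptions and compare with your own copies. The sketch only creates files; it does not start Kafka Connect.

```shell
connectdir=$(mktemp -d)

# Seed data: two lines. printf handles \n the same way in every POSIX
# shell, which `echo -e` does not.
printf 'foo\nbar\n' > "$connectdir/test.txt"

# Source connector: reads lines from a file and publishes them to a topic.
cat > "$connectdir/connect-file-source.properties" <<'EOF'
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
EOF

# Sink connector: reads the same topic and appends each message to a file.
cat > "$connectdir/connect-file-sink.properties" <<'EOF'
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=connect-test
EOF

# Show the connector names; each one must be unique.
grep -h '^name=' "$connectdir"/connect-file-*.properties
```

With files like these, the source connector would read test.txt into the connect-test topic and the sink connector would write that topic out to test.sink.txt, so the two files should end up with the same lines.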
Spark Streaming
Spark Streaming Code
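package com.scala.action.streaming

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Created by kason_zhang on 4/11/2017.
  */
object MyKafkaSparkStreaming {
  def main(args: Array[String]): Unit = {
    // Run locally with 3 threads; the Kafka receiver occupies one of them.
    val conf = new SparkConf().setAppName("MyKafkaStreamingDemo").setMaster("local[3]")
    // Process the stream in 5-second batches.
    val ssc = new StreamingContext(conf, Seconds(5))
    // Arguments: streaming context, ZooKeeper quorum (left empty in the source),
    // consumer group id, and a map of topic name -> number of receiver threads.
    val topicLines = KafkaUtils.createStream(ssc, "", "StreamKafkaGroupId", Map("spark" -> 1))
    // Each record is a (key, message) pair; keep the message and split it into words.
    topicLines.map(_._2).flatMap(str => str.split(" ")).print()
    ssc.start()
    ssc.awaitTermination()
  }
}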
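To get a feel for what the job prints, the map and flatMap steps can be approximated on the command line: take each message and emit one word per line. This is only a rough stand-in written for illustration (split_words is my own helper name, and the two sample messages are made up); it does not involve Spark or Kafka, and it differs from Scala's split in edge cases such as trailing spaces.

```shell
# split_words: read messages on stdin (one per line) and print one word
# per line, roughly what flatMap(str => str.split(" ")) produces.
split_words() {
    tr ' ' '\n'
}

# Two sample messages, as if typed into the console producer for topic "spark".
printf '%s\n' 'hello world' 'hello kafka' | split_words
```

In the real job, print() shows the first ten elements of every 5-second batch, so words from messages sent within the same batch appear together.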
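Because I did not set
listeners = in server.properties on the CentOS Kafka server, the default listeners are used, which means the broker's host is taken from the CentOS machine's hostname. My Spark Streaming program, however, is developed on Windows, which cannot resolve that hostname, so I had to add 10.64.24.78 kason to the hosts file on the C: drive so that the name resolves.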
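Adding the mapping is a one-line edit, but it is easy to add it twice. The sketch below shows one way to append the entry only when the name is not mapped yet. It works on a scratch copy rather than the real hosts file (on Windows that is normally C:\Windows\System32\drivers\etc\hosts and needs administrator rights to edit), and ensure_host_entry is a helper name I made up.

```shell
# ensure_host_entry FILE IP NAME: append "IP NAME" to FILE unless a
# non-comment line already lists NAME as a hostname.
ensure_host_entry() {
    file=$1
    ip=$2
    name=$3
    if ! awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Scratch file standing in for the real hosts file.
hosts_copy=$(mktemp)
printf '127.0.0.1 localhost\n' > "$hosts_copy"

ensure_host_entry "$hosts_copy" 10.64.24.78 kason
ensure_host_entry "$hosts_copy" 10.64.24.78 kason   # second call is a no-op
cat "$hosts_copy"
```

After both calls the file contains the original localhost line plus a single 10.64.24.78 kason line.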
Source: ITPUB Blog, link: http://blog.itpub.net/3402/viewspace-2818826/. If you repost this article, please credit the source; otherwise legal action may be taken.
