http://blog.csdn.net/blwinner/article/details/53637932
Apache Kafka 0.10.0 has been officially released, bringing a series of new features and bug fixes. This article introduces one of the new features: Kafka Streams.
I. Overview
Kafka Streams is a library that gives Apache Kafka stream-processing capability: business logic is written against the Kafka Streams API, and the results are written back to Kafka or to other systems. Kafka Streams builds on several important stream-processing concepts: a strict distinction between event time and processing time, support for window functions, and application state management. The barrier to entry is very low; for example, you can run small-scale functional tests on a single machine without first starting services on other machines (running a Storm topology, by contrast, requires starting Nimbus and Supervisor, although Storm also supports a local mode). Kafka Streams' concurrency model load-balances work across multiple instances of a single application.
II. Main Features
1. Lightweight, embeds directly into Java applications
2. No external dependencies other than the Kafka Streams client library
3. Fault-tolerant local state, enabling very efficient stateful operations such as joins and window functions
4. Low-latency message processing, with support for event-time-based window operations
5. Provides the necessary stream-processing primitives, a high-level Stream DSL, and a low-level Processor API
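Features 1 and 2 above can be illustrated with a minimal sketch of a Kafka Streams (0.10.0 API) application embedded in an ordinary Java program. The application id, broker address, and topic names below are placeholder assumptions, not values from this article:

```java
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class MinimalStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-stream-app");      // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder

        KStreamBuilder builder = new KStreamBuilder();
        // pipe records from one topic to another, unchanged
        builder.stream("src-topic").to("sink-topic");

        // the whole stream-processing engine runs inside this JVM
        KafkaStreams streams = new KafkaStreams(builder, props);
        streams.start();
    }
}
```

The program is launched like any other Java application; no cluster framework has to be started first, which is the contrast with Storm's Nimbus/Supervisor drawn in the overview.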
III. Developer Guide
Core Concepts
- Stream Processing Topology
1. A stream is the most important abstraction in Kafka Streams: it represents an unbounded, continuously updating data set. A stream is an ordered, replayable, fault-tolerant sequence of immutable records.
2. A stream processing application consists of one or more processor topologies, where each processor topology is a graph of stream processors (nodes) connected by streams (edges).
3. A stream processor is a node in a processor topology. It represents a single processing step in a stream: it receives records from upstream processors, applies some processing, and then sends one or more records to downstream processors.
Kafka Streams provides two APIs for building a stream processing topology:
1. The high-level Stream DSL, which provides common data operations such as map and filter
2. The lower-level Processor API, which lets you define and connect custom processors and interact with state stores (introduced below)
- Time
Time is an important concept in stream processing; in window operations, for example, time defines the two boundaries of a window.
1. Event time: the point in time at which a record was originally produced or created.
2. Processing time: the point in time at which a record happens to be processed by the application, e.g. when it is consumed from Kafka; it may trail event time by milliseconds, hours, or days. Processing time is always later than event time.
Kafka Streams assigns a timestamp to every record through the TimestampExtractor interface. A concrete implementation may obtain the timestamp from a field inside the record, giving event-time semantics, or return the wall-clock time at the moment of processing, leaving processing-time semantics to the developer's program. Developers can even enforce other notions of time when defining event time and processing time.
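As a sketch of the TimestampExtractor mechanism just described (in the 0.10.0 API the interface receives the raw ConsumerRecord; the convention that the event time rides in the message value as a Long is an assumption made here for illustration):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Event-time extractor: prefer a timestamp embedded in the message itself,
// fall back to the timestamp Kafka attached to the record.
public class EmbeddedTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        if (record.value() instanceof Long) {
            return (Long) record.value();   // event time carried in the payload
        }
        return record.timestamp();          // producer/broker timestamp otherwise
    }
}
```

The extractor is plugged in through configuration, e.g. `props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, EmbeddedTimestampExtractor.class.getName())`.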
- States
Some stream applications need no state, because each record can be processed independently of the others. Maintaining state, however, is very useful in complex stream processing applications: it makes it possible to join, group, and aggregate data streams, and the Kafka Streams DSL provides these operations.
Kafka Streams uses state stores to store and query data within a stream application. It ships with several state store implementations, accessible through the API; a state store can be backed by a persistent key-value engine, an in-memory hash map, or some other data structure. Kafka Streams provides fault tolerance and automatic recovery for local state stores.
The Two APIs
1. Low-level Processor API
- Processor
A developer implements the Processor interface, providing the process and punctuate methods: process is called once for every record, while punctuate is called periodically.
public class MyProcessor implements Processor<String, String> {
    private ProcessorContext context;
    private KeyValueStore<String, Integer> kvStore;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        this.context.schedule(1000);
        this.kvStore = (KeyValueStore<String, Integer>) context.getStateStore("Counts");
    }

    @Override
    public void process(String dummy, String line) {
        String[] words = line.toLowerCase().split(" ");
        for (String word : words) {
            Integer oldValue = this.kvStore.get(word);
            if (oldValue == null) {
                this.kvStore.put(word, 1);
            } else {
                this.kvStore.put(word, oldValue + 1);
            }
        }
    }

    @Override
    public void punctuate(long timestamp) {
        KeyValueIterator<String, Integer> iter = this.kvStore.all();
        while (iter.hasNext()) {
            KeyValue<String, Integer> entry = iter.next();
            context.forward(entry.key, entry.value.toString());
        }
        iter.close();
        context.commit();
    }

    @Override
    public void close() {
        this.kvStore.close();
    }
}
- Processor Topology
Use TopologyBuilder to assemble processors into a topology:
TopologyBuilder builder = new TopologyBuilder();

builder.addSource("SOURCE", "src-topic")
    .addProcessor("PROCESS1", MyProcessor1::new /* the ProcessorSupplier that can generate MyProcessor1 */, "SOURCE")
    .addProcessor("PROCESS2", MyProcessor2::new /* the ProcessorSupplier that can generate MyProcessor2 */, "PROCESS1")
    .addProcessor("PROCESS3", MyProcessor3::new /* the ProcessorSupplier that can generate MyProcessor3 */, "PROCESS1")
    .addSink("SINK1", "sink-topic1", "PROCESS1")
    .addSink("SINK2", "sink-topic2", "PROCESS2")
    .addSink("SINK3", "sink-topic3", "PROCESS3");
1. First, a source node named "SOURCE" is added to the topology; it consumes a Kafka topic named "src-topic".
2. Next, three processor nodes are added: "PROCESS1" is a child of "SOURCE", while "PROCESS2" and "PROCESS3" are children of "PROCESS1".
3. Finally, three sink nodes are added, each writing the output of one of the processor nodes to its own Kafka topic.
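A topology assembled with TopologyBuilder does nothing until it is handed to a KafkaStreams instance and started. Here is a driver sketch using a trivial pass-through topology for brevity; the application id and broker address are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.TopologyBuilder;

public class TopologyDriver {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.addSource("SOURCE", "src-topic")
               .addSink("SINK", "sink-topic", "SOURCE"); // pass-through, for brevity

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "processor-api-demo"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder

        KafkaStreams streams = new KafkaStreams(builder, props);
        streams.start();

        // close the instance cleanly on JVM shutdown
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Starting a second instance of the same program with the same application id load-balances the topic's partitions between the two instances, which is the concurrency model mentioned in the overview.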
- Local State Store
Besides processing records one after another, the Processor API can also keep state. Developers can create a local state store and associate it with a processor node using TopologyBuilder.addStateStore, or connect an already-created local state store to a processor node using TopologyBuilder.connectProcessorAndStateStores.
TopologyBuilder builder = new TopologyBuilder();

builder.addSource("SOURCE", "src-topic")
    .addProcessor("PROCESS1", MyProcessor1::new, "SOURCE")
    // create the in-memory state store "COUNTS" associated with processor "PROCESS1"
    .addStateStore(Stores.create("COUNTS").withStringKeys().withStringValues().inMemory().build(), "PROCESS1")
    .addProcessor("PROCESS2", MyProcessor2::new /* the ProcessorSupplier that can generate MyProcessor2 */, "PROCESS1")
    .addProcessor("PROCESS3", MyProcessor3::new /* the ProcessorSupplier that can generate MyProcessor3 */, "PROCESS1")
    // connect the state store "COUNTS" with processor "PROCESS2"
    .connectProcessorAndStateStores("PROCESS2", "COUNTS")
    .addSink("SINK1", "sink-topic1", "PROCESS1")
    .addSink("SINK2", "sink-topic2", "PROCESS2")
    .addSink("SINK3", "sink-topic3", "PROCESS3");
2. High-level Stream DSL
To build a processor topology with the Stream DSL, developers use the KStreamBuilder class, which extends TopologyBuilder. Below is an example from the official documentation; the complete source code can be found in the streams/examples package.
- Create Source Streams from Kafka
A KStream can be created from one or more Kafka topics, while a KTable can only be created from a single topic:
KStreamBuilder builder = new KStreamBuilder();

KStream<String, GenericRecord> source1 = builder.stream("topic1", "topic2");
KTable<String, GenericRecord> source2 = builder.table("topic3");
- Transform a stream
KStream and KTable each provide a set of transformation operations. Each transformation produces one or more KStream or KTable objects and can forward its output to one or more downstream processors in the topology; transformations can be chained to form an arbitrarily complex processor topology. Because KStream and KTable are strongly typed, every transformation is defined as a generic function in which the developer specifies the input and output data types.
Transformations such as filter, map, and mapValues are stateless; they can be applied to both KStream and KTable, and take a user-supplied function as an argument, e.g.
// written in Java 8+, using lambda expressions
KStream<String, Object> mapped = source1.mapValues(record -> record.get("category"));
Stateful transformations, such as join and aggregate, rely on a connected state store to produce their results:
// written in Java 8+, using lambda expressions
KTable<Windowed<String>, Long> counts = source1.aggregateByKey(
    () -> 0L, // initial value
    (aggKey, value, aggregate) -> aggregate + 1L, // aggregating value
    HoppingWindows.of("counts").with(5000L).every(1000L) // intervals in milliseconds
);

KStream<String, String> joined = source1.leftJoin(source2,
    (record1, record2) -> record1.get("user") + "-" + record2.get("region")
);
- Write streams back to Kafka
Finally, developers can write the resulting streams back to Kafka using KStream.to and KTable.to:
joined.to("topic4");

// equivalent to:
//
// joined.to("topic4");
// materialized = builder.stream("topic4");
KStream<String, String> materialized = joined.through("topic4");
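Putting the DSL pieces together, here is a word-count sketch against the 0.10.0 API, modeled loosely on the official streams/examples WordCount demo; the topic names, serde configuration, and the "Counts" store name are assumptions for illustration:

```java
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

public class WordCountSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-sketch");   // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> lines = builder.stream("text-input");

        KTable<String, Long> counts = lines
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split(" ")))
            .map((key, word) -> new KeyValue<>(word, word))
            .countByKey("Counts");  // backed by a state store named "Counts"

        counts.to(Serdes.String(), Serdes.Long(), "wordcount-output");

        new KafkaStreams(builder, props).start();
    }
}
```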