Spark and Kafka integration: WordCount example (spark-streaming-kafka)
package hgs.spark.streaming

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.HashPartitioner
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.serializer.StringDecoder

/*
 * Add to pom.xml:
 * <dependency>
 *   <groupId>org.apache.spark</groupId>
 *   <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
 *   <version>2.1.1</version>
 * </dependency>
 */
object SparkStreamingKafkaReciverWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaWordCount").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(4))
    // updateStateByKey needs a checkpoint directory to keep state across batches
    ssc.checkpoint("d:\\checkpoint")

    // State update function: for each key, add the current batch's counts to the previous total
    val updateFunc = (iter: Iterator[(String, Seq[Int], Option[Int])]) => {
      //iter.flatMap(it => Some(it._2.sum + it._3.getOrElse(0)).map((it._1, _)))      // option 1
      //iter.flatMap { case (x, y, z) => Some(y.sum + z.getOrElse(0)).map((x, _)) }   // option 2
      iter.flatMap(it => Some((it._1, it._2.sum + it._3.getOrElse(0))))               // option 3
    }

    // Note: the Map below must declare its type parameters explicitly, otherwise createStream will not compile
    // Kafka consumer parameters
    val props = Map[String, String](
      "bootstrap.servers"       -> "bigdata01:9092,bigdata02:9092,bigdata03:9092",
      "group.id"                -> "group_test",
      "enable.auto.commit"      -> "true",
      "auto.commit.interval.ms" -> "2000",
      "auto.offset.reset"       -> "smallest",
      "zookeeper.connect"       -> "bigdata01:2181,bigdata02:2181,bigdata03:2181")
    // topics and the number of receiver threads per topic
    val topics = Map[String, Int]("test" -> 1)

    val rds = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, props, topics, StorageLevel.MEMORY_AND_DISK)

    val words = rds.flatMap(x => x._2.split(" "))
    val wordscount = words.map((_, 1))
      .updateStateByKey(updateFunc, new HashPartitioner(sc.defaultMinPartitions), true)
    wordscount.print()

    // start the streaming job
    ssc.start()
    ssc.awaitTermination()
  }
}
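For comparison, below is a minimal sketch of the same word count using the receiver-less direct approach (KafkaUtils.createDirectStream) from the same spark-streaming-kafka-0-8 artifact. It assumes the same broker and topic names used above; it counts per batch with reduceByKey instead of keeping state with updateStateByKey, so no checkpoint directory is needed.

package hgs.spark.streaming

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Sketch: direct (receiver-less) Kafka stream; per-batch counts only, no cross-batch state
object SparkStreamingKafkaDirectWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaDirectWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(4))

    // The direct API reads offsets from the brokers themselves, so no zookeeper.connect is required
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> "bigdata01:9092,bigdata02:9092,bigdata03:9092", // assumed brokers, same as above
      "group.id"             -> "group_test",
      "auto.offset.reset"    -> "smallest")
    val topics = Set("test") // assumed topic, same as above

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Count words within each 4-second batch
    stream.flatMap(_._2.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Either variant can be fed from a shell with the standard kafka-console-producer.sh pointed at the test topic; each line typed becomes one message and appears in the next batch's counts.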
Source: ITPUB blog, http://blog.itpub.net/31506529/viewspace-2216851/