Spark Streaming: Read Kafka Messages, Combine with Spark SQL, and Save the Results to HBase
I worked this out hands-on and am sharing it with everyone. This is an original article; please credit the source when reposting.
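Each Kafka message is expected to be a comma-separated record matching the Person case class defined at the end of the listing: gender, height, age, and years of driving experience. A hypothetical example message (the values are made up for illustration):

male,1.75,30,5.5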
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.spark.SparkConf
import org.apache.spark.sql._
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.hadoop.hbase.client.{Mutation, Put}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.OutputFormat
/**
* Created by sunyulong on 16/9/19.
*/
object OBDSQL extends App {
  // Kafka topics and the number of consumer threads per topic
  val topics = List(("aaa", 1)).toMap
  // ZooKeeper quorum
  val zk = "10.1.11.71,10.1.11.72,10.1.11.73"
  val conf = new SparkConf().setMaster("yarn-cluster").setAppName("SparkStreamingETL")
  // create the streaming context with a 1-second batch interval
  val ssc = new StreamingContext(conf, Seconds(1))
  // receive messages from Kafka; each record is a (key, value) pair, keep only the value
  val lines = KafkaUtils.createStream(ssc, zk, "sparkStreaming", topics).map(_._2)
  // reuse the underlying SparkContext
  val sc = ssc.sparkContext
  // create the SQL context
  val sqlContext = new SQLContext(sc)
  // process each RDD and save the result to an HBase table
  lines.foreachRDD(rdd => {
    // implicit conversions for turning case classes into DataFrames
    import sqlContext.implicits._
    // skip empty RDDs
    if (!rdd.isEmpty) {
      // parse the CSV records and register a temp table
      rdd.map(_.split(","))
        .map(p => Person(p(0), p(1).trim.toDouble, p(2).trim.toInt, p(3).trim.toDouble))
        .toDF.registerTempTable("oldDriver")
      // run the aggregation with Spark SQL
      val rs = sqlContext.sql("select count(1) from oldDriver")
      // create the HBase configuration
      val hconf = HBaseConfiguration.create()
      hconf.set("hbase.zookeeper.quorum", zk)
      hconf.set("hbase.zookeeper.property.clientPort", "2181")
      hconf.set("hbase.defaults.for.version.skip", "true")
      hconf.set(TableOutputFormat.OUTPUT_TABLE, "obd_pv")
      hconf.setClass("mapreduce.job.outputformat.class", classOf[TableOutputFormat[String]], classOf[OutputFormat[String, Mutation]])
      // convert each result row into an HBase Put
      rs.rdd.map(line => (System.currentTimeMillis(), line(0))).map(line => {
        // row key: the current timestamp
        val put = new Put(Bytes.toBytes(line._1))
        // column family "pv", qualifier "pv", value: the count
        put.addColumn(Bytes.toBytes("pv"), Bytes.toBytes("pv"), Bytes.toBytes(line._2.toString))
        // the (key, Put) pair expected by TableOutputFormat
        (new ImmutableBytesWritable, put)
      }).saveAsNewAPIHadoopDataset(hconf) // save to the HBase table
    }
  })
  // start streaming and wait for termination
  ssc.start()
  ssc.awaitTermination()
}

// the Person entity that defines the schema for Spark SQL
case class Person(gender: String, tall: Double, age: Int, driverAge: Double)
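Before submitting the job, the target table must already exist in HBase with the column family the code writes to. A minimal sketch in the HBase shell (the table name obd_pv and column family pv are taken from the code above; pre-splitting and other options are omitted):

create 'obd_pv', 'pv'

The job can then be packaged and submitted to YARN. A hypothetical command, assuming the assembly jar is named sparkstreaming-etl.jar (the jar name is an assumption; the class name and master mirror the code):

spark-submit --class OBDSQL --master yarn-cluster sparkstreaming-etl.jar

Once a few batches have run, the written counts can be checked with scan 'obd_pv', {LIMIT => 5} in the HBase shell.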