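Reading Kafka messages with Spark Streaming, querying them with Spark SQL, and saving the results to HBase
I worked this out myself and am sharing it here. This is an original article; please credit the source if you repost it.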
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.spark.SparkConf
import org.apache.spark.sql._
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.hadoop.hbase.client.{Mutation, Put}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.mapreduce.OutputFormat
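/**
 * Created by sunyulong on 16/9/19.
 */
// Note: Spark's documentation recommends defining a main() method rather than extending scala.App.
object OBDSQL extends App {
  // Kafka topic -> number of consumer threads
  val topics = List(("aaa", 1)).toMap
  // ZooKeeper quorum, used below for both the Kafka consumer and HBase
  val zk = "10.1.11.71,10.1.11.72,10.1.11.73"
  val conf = new SparkConf().setMaster("yarn-cluster").setAppName("SparkStreamingETL")
  // create the streaming context with a 1-second batch interval
  val ssc = new StreamingContext(conf, Seconds(1))
  // read from Kafka as consumer group "sparkStreaming" and keep only the message value
  val lines = KafkaUtils.createStream(ssc, zk, "sparkStreaming", topics).map(_._2)
  // get the Spark context
  val sc = ssc.sparkContext
  // get the SQL context
  val sqlContext = new SQLContext(sc)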
  // process each RDD and save the result to an HBase table
  lines.foreachRDD(rdd => {
    // implicits needed to convert an RDD of case classes to a DataFrame
    import sqlContext.implicits._
    // skip empty RDDs
    if (!rdd.isEmpty) {
      // parse each comma-separated line and register the result as a temp table
      rdd.map(_.split(","))
        .map(p => Persion(p(0), p(1).trim.toDouble, p(2).trim.toInt, p(3).trim.toDouble))
        .toDF()
        .registerTempTable("oldDriver")
      // query the temp table with Spark SQL
      val rs = sqlContext.sql("select count(1) from oldDriver")
      // create the HBase configuration
      val hconf = HBaseConfiguration.create()
      hconf.set("hbase.zookeeper.quorum", zk)
      hconf.set("hbase.zookeeper.property.clientPort", "2181")
      hconf.set("hbase.defaults.for.version.skip", "true")
      hconf.set(TableOutputFormat.OUTPUT_TABLE, "obd_pv")
      hconf.setClass("mapreduce.job.outputformat.class", classOf[TableOutputFormat[String]], classOf[OutputFormat[String, Mutation]])
      val jobConf = new JobConf(hconf)
      // convert each result row to an HBase Put
      rs.rdd
        .map(line => (System.currentTimeMillis(), line(0)))
        .map(line => {
          // create the Put; the row key is the timestamp encoded as a long
          val put = new Put(Bytes.toBytes(line._1))
          // column family "pv", qualifier "pv", value: the count as a string
          put.addColumn(Bytes.toBytes("pv"), Bytes.toBytes("pv"), Bytes.toBytes(line._2.toString))
          // return the (key, Put) pair; TableOutputFormat ignores the key
          (new ImmutableBytesWritable, put)
        })
        .saveAsNewAPIHadoopDataset(jobConf) // save to the HBase table
    }
  })
  // start streaming and block until the job is terminated
  ssc.start()
  ssc.awaitTermination()
}
// the entity class used as the Spark SQL schema
case class Persion(gender: String, tall: Double, age: Int, driverAge: Double)
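
The listing above assumes every Kafka message is a well-formed line with four comma-separated fields. A line with too few fields or a non-numeric value would throw an exception in the `map` step. As a minimal sketch of one way to guard against that, the parsing could be moved into a helper that returns an `Option`. The `LineParser` object and its `parseLine` method are hypothetical names introduced here for illustration and are not part of the original job; the `Persion` case class is repeated only so the snippet is self-contained.

```scala
import scala.util.Try

// Same shape as the Persion case class in the listing above.
case class Persion(gender: String, tall: Double, age: Int, driverAge: Double)

// Hypothetical helper (an assumption, not part of the original job).
object LineParser {
  // Returns Some(Persion) for a line with at least four comma-separated
  // fields whose numeric fields parse, and None otherwise.
  def parseLine(line: String): Option[Persion] = {
    val p = line.split(",")
    if (p.length < 4) None
    else Try(Persion(p(0), p(1).trim.toDouble, p(2).trim.toInt, p(3).trim.toDouble)).toOption
  }
}
```

In the streaming job, the two `map` calls that build `Persion` could then probably be replaced with something like `rdd.flatMap(line => LineParser.parseLine(line))`, so that malformed lines are dropped instead of failing the task. Whether silently dropping bad records is acceptable depends on the pipeline; counting or logging them may be preferable.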