Writing to a Hudi table with Spark Structured Streaming
1. Create the Hudi table via spark-sql
```sql
create table if not exists hudi_table3 (
  id int,
  name string,
  price double
) using hudi
options (
  'type' = 'mor',
  'primaryKey' = 'id',
  'hoodie.datasource.hive_sync.enable' = 'false',
  'hoodie.datasource.meta.sync.enable' = 'false',
  'hoodie.datasource.write.precombine.field' = 'price'
)
```
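If you prefer to stay inside a single Spark application, the same DDL can be issued through `spark.sql()`. A minimal sketch, assuming the Hudi Spark bundle is on the classpath; the session options mirror the streaming job in step 2, and the object name is just for illustration:

```scala
import org.apache.spark.sql.SparkSession

object CreateHudiTable {
  def main(args: Array[String]): Unit = {
    // Session configured with the Hudi SQL extension, as in the streaming job below.
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    spark.sql(
      """create table if not exists hudi_table3 (
        |  id int, name string, price double
        |) using hudi
        |options (
        |  'type' = 'mor',
        |  'primaryKey' = 'id',
        |  'hoodie.datasource.write.precombine.field' = 'price'
        |)""".stripMargin)

    spark.stop()
  }
}
```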
2. Code that writes to Hudi
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()
  .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

// Each Kafka message is expected to be a CSV line: id,name,price
val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "test")
  .option("group.id", "test-1")
  .load()

import spark.implicits._

val query = df
  .selectExpr("split(cast(value as string), ',') as sp")
  .selectExpr("cast(sp[0] as int) as id", "sp[1] as name", "cast(sp[2] as double) as price")
  .writeStream
  .format("hudi")
  .trigger(Trigger.ProcessingTime(5000L))
  .option("checkpointLocation", "file:///Users/haoguangshi/mysoft/ck")
  .option("path", "/Users/haoguangshi/workspace/hudi-lrn/spark-warehouse/hudi_table3")
  // When two records share the same key, this field decides which row is kept (PRECOMBINE_FIELD_OPT_KEY)
  .option("hoodie.datasource.write.precombine.field", "price")
  // Record key (primary key) of the table (RECORDKEY_FIELD_OPT_KEY)
  .option("hoodie.datasource.write.recordkey.field", "id")
  .start()

query.awaitTermination()
spark.stop()
```
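Once the stream has committed a few batches, the table can be read back in batch mode to confirm the result. A minimal verification sketch (the path is the one used by the streaming job above; the object name is just for illustration); rows sharing an `id` should have been deduplicated using `price` as the ordering field:

```scala
import org.apache.spark.sql.SparkSession

object VerifyHudiTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    // Batch read of the Hudi table written by the streaming query above.
    spark.read
      .format("hudi")
      .load("/Users/haoguangshi/workspace/hudi-lrn/spark-warehouse/hudi_table3")
      .select("id", "name", "price")
      .orderBy("id")
      .show(false)

    spark.stop()
  }
}
```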
From the ITPUB blog, link: http://blog.itpub.net/31506529/viewspace-2865291/. Please credit the source when reprinting; otherwise legal liability may be pursued.