Flink Table and SQL

Posted by TUJC on 2020-10-10


1. Introduction to Flink Table and SQL

Apache Flink offers two relational APIs, the Table API and SQL, which unify stream and batch processing.

The Table API is a query API for Scala and Java that lets you compose queries from relational operators such as select, filter, and join in a very intuitive way.

Flink's SQL support is based on Apache Calcite, which implements the SQL standard. Whether the input is batch (DataSet) or streaming (DataStream), a query specified in either API has the same semantics and produces the same result.

The Table API and SQL interfaces are tightly integrated with each other, and with Flink's DataStream and DataSet APIs, so we can easily switch between all of these APIs and the libraries built on top of them.

Note that, as of the version covered here (Flink 1.8.x), many Table API and SQL features are still under development. Not every combination of [Table API, SQL] and [stream, batch] input supports every operation.
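To make the relationship concrete, here is a minimal sketch (Flink 1.8-era Scala API; the orderTable name, Order case class, and sample values are illustrative assumptions) that expresses the same filter once with the Table API and once with SQL. Both Table objects describe the same result:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.Table
import org.apache.flink.table.api.scala._

object TableApiVsSql {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = StreamTableEnvironment.create(env)

    // register a small in-memory stream as the table "orderTable"
    val orders: DataStream[Order] = env.fromElements(Order(1, 50), Order(2, 200))
    tableEnv.registerDataStream("orderTable", orders)

    // Table API: relational operators composed as method calls
    val viaTableApi: Table = tableEnv.scan("orderTable").filter("amount > 100").select("id, amount")

    // SQL: the equivalent query as a string, parsed by Apache Calcite
    val viaSql: Table = tableEnv.sqlQuery("select id, amount from orderTable where amount > 100")

    // both queries have the same semantics and produce the same result
    tableEnv.toAppendStream[Order](viaTableApi).print()
    tableEnv.toAppendStream[Order](viaSql).print()
    env.execute("table-api-vs-sql")
  }
}

case class Order(id: Int, amount: Int)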

2. Why SQL

SQL is a language that almost everyone knows. If an engine offers SQL, it is much easier for users to adopt, and this has become a very common pattern in the industry.

The Table API is a relational, SQL-like API that lets users manipulate data as if they were working with tables, which is very intuitive and convenient.

The Table & SQL API has another responsibility as well: it is the unified API layer for stream and batch processing.

3. Developing with the Flink Table & SQL API

See the official documentation for more details.

Flink's Table API lets us develop both streaming and batch jobs in a SQL-like way. Since a DataStream or a DataSet can be converted into a Table, we can conveniently ingest data from anywhere, turn it into a Table, and then process it with the Table API or SQL.

Flink Table API and SQL programs can connect to external systems to read and write both batch and streaming tables. A table source provides access to data stored in an external system such as a database, a key-value store, a message queue, or a file system. A table sink emits a table to an external storage system.

1. Reading CSV data with Flink SQL and querying it
Requirement: read a CSV file (see the flinksql.csv file in the course material), query the people older than 18, and write the result to another CSV file.
Step 1: add the Maven dependencies

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.11</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-scala-bridge_2.11</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-scala_2.11</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-common</artifactId>
    <version>1.8.1</version>
</dependency>

Step 2: write the code that reads the CSV file and runs the query

import org.apache.flink.core.fs.FileSystem.WriteMode
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.{Table, Types}
import org.apache.flink.table.api.scala.StreamTableEnvironment
import org.apache.flink.table.sinks.CsvTableSink
import org.apache.flink.table.sources.CsvTableSource

object FlinkStreamSQL {
  def main(args: Array[String]): Unit = {
    // streaming execution environment
    val streamEnvironment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // streaming table environment
    val tableEnvironment: StreamTableEnvironment = StreamTableEnvironment.create(streamEnvironment)
    // build the CSV table source
    val source: CsvTableSource = CsvTableSource.builder()
      .field("id", Types.INT)
      .field("name", Types.STRING)
      .field("age", Types.INT)
      .fieldDelimiter(",")
      .ignoreFirstLine()
      .ignoreParseErrors()
      .lineDelimiter("\r\n")
      .path("D:\\開課吧課程資料\\Flink實時數倉\\datas\\flinksql.csv")
      .build()
    // register the table source as a table named "user"
    tableEnvironment.registerTableSource("user", source)
    // query the people older than 18
    val result: Table = tableEnvironment.scan("user").filter("age > 18")
    // print the table's schema, i.e. its field information
    result.printSchema()
    // write the query result to a CSV file
    val sink = new CsvTableSink("D:\\開課吧課程資料\\Flink實時數倉\\datas\\sink.csv", "===", 1, WriteMode.OVERWRITE)
    result.writeToSink(sink)
    streamEnvironment.execute()
  }
}

2. Converting between DataStream and Table
Converting a DataStream into a Table:

We can also turn a DataStream into a table and then query it with SQL. In this example we read data from a socket, count the people whose age is greater than 10, and save the result to a local file. The socket sends data in the following format:

101,zhangsan,18
102,lisi,20
103,wangwu,25
104,zhaoliu,8

To convert a DataStream into a Table we need both a StreamExecutionEnvironment and a StreamTableEnvironment.
Obtain the StreamTableEnvironment object, then call fromDataStream or registerDataStream to convert the DataStream into a Table, as in the sketch below.
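A minimal sketch of both conversion paths (Flink 1.8-era Scala API; the SketchUser case class and the in-memory data are illustrative stand-ins for the socket stream used in the full example below):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.Table
import org.apache.flink.table.api.scala._

object FromDataStreamSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = StreamTableEnvironment.create(env)

    // a small in-memory stream instead of the socket, purely for illustration
    val users: DataStream[SketchUser] = env.fromElements(SketchUser(101, "zhangsan", 18), SketchUser(104, "zhaoliu", 8))

    // fromDataStream returns a Table object directly; the fields are selected via expressions
    val userTable: Table = tableEnv.fromDataStream(users, 'id, 'name, 'age)

    // registerDataStream registers the stream under a name so it can be referenced from SQL and scan()
    tableEnv.registerDataStream("userTable", users)

    userTable.printSchema()

    // convert back to a DataStream and print, so the job has a sink to execute
    tableEnv.toAppendStream[SketchUser](userTable).print()
    env.execute("from-datastream-sketch")
  }
}

case class SketchUser(id: Int, name: String, age: Int)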

Converting a Table into a DataStream:
We can also convert a Table back into a DataStream once processing is done. There are two modes for this conversion.

First mode: Append mode
The table is appended to the stream. This mode can only be used if the dynamic table is modified exclusively by INSERT changes, i.e. it is append-only and previously emitted results are never updated. If the query contains update or delete operations, converting in append mode fails with an error.

Second mode: Retract mode
This mode can always be used. Each emitted record carries a Boolean flag that marks the row as an insert or a retraction: true means the row is inserted, false means a previously emitted row is retracted. The sketch below illustrates the difference.
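A minimal sketch (assumed names and in-memory data) showing why retract mode exists: a grouped count updates previously emitted results, so it cannot be converted in append mode, while toRetractStream wraps every record in a (Boolean, row) pair:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.Table
import org.apache.flink.table.api.scala._

object RetractStreamSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = StreamTableEnvironment.create(env)

    // "flink" appears twice, so its count is updated from 1 to 2 while the job runs
    val words: DataStream[(String, Int)] = env.fromElements(("flink", 1), ("spark", 1), ("flink", 1))
    tableEnv.registerDataStream("wordTable", words, 'word, 'cnt)

    // a grouped aggregation produces updates, so converting this table in append mode would fail
    val counts: Table = tableEnv.sqlQuery("select word, sum(cnt) as total from wordTable group by word")

    // retract mode: true marks an insert, false marks the retraction of an earlier result
    val retractStream: DataStream[(Boolean, (String, Int))] = tableEnv.toRetractStream[(String, Int)](counts)
    retractStream.print()

    env.execute("retract-sketch")
  }
}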
Step 1: write the code
Note: Flink development requires importing the implicit conversions

import org.apache.flink.api.scala._
For Flink Table API or SQL development, also import the implicit conversions
import org.apache.flink.table.api._

import org.apache.flink.core.fs.FileSystem.WriteMode
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.table.api._
import org.apache.flink.api.scala._
import org.apache.flink.table.api.scala.StreamTableEnvironment
import org.apache.flink.table.sinks.CsvTableSink

object FlinkStreamSQL {
  def main(args: Array[String]): Unit = {
    val environment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    val streamSQLEnvironment: StreamTableEnvironment = StreamTableEnvironment.create(environment)
    // read lines such as 101,zhangsan,18 from the socket
    val socketStream: DataStream[String] = environment.socketTextStream("node01", 9000)
    val userStream: DataStream[User] = socketStream.map(x => User(x.split(",")(0).toInt, x.split(",")(1), x.split(",")(2).toInt))
    // register the stream as a table
    streamSQLEnvironment.registerDataStream("userTable", userStream)

    // query with the Table API
    // val table: Table = streamSQLEnvironment.scan("userTable").filter("age > 10")
    // query with SQL
    val table: Table = streamSQLEnvironment.sqlQuery("select * from userTable")
    val sink3 = new CsvTableSink("D:\\開課吧課程資料\\Flink實時數倉\\datas\\sink3.csv", "===", 1, WriteMode.OVERWRITE)
    table.writeToSink(sink3)

    // append mode: only valid for insert-only results, not for updating aggregations such as sum, count or avg
    val appendStream: DataStream[User] = streamSQLEnvironment.toAppendStream[User](table)
    // retract mode: every record is emitted as a (Boolean, User) pair
    val retractStream: DataStream[(Boolean, User)] = streamSQLEnvironment.toRetractStream[User](table)
    environment.execute()
  }
}
case class User(id:Int,name:String,age:Int)

Step 2: send data to the socket

101,zhangsan,18
102,lisi,20
103,wangwu,25
104,zhaoliu,8

3. Converting between DataSet and Table
We can also register a DataSet as a Table and then query it; likewise, a Table can be converted back into a DataSet.

import org.apache.flink.api.scala._
import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.core.fs.FileSystem.WriteMode
import org.apache.flink.table.api._
import org.apache.flink.table.api.scala.BatchTableEnvironment
import org.apache.flink.table.sinks.CsvTableSink

object FlinkBatchSQL {
  def main(args: Array[String]): Unit = {
    val environment: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val batchSQL: BatchTableEnvironment = BatchTableEnvironment.create(environment)

    val sourceSet: DataSet[String] = environment.readTextFile("D:\\開課吧課程資料\\Flink實時數倉\\datas\\dataSet.csv")

    val userSet: DataSet[User2] = sourceSet.map(x => {
      println(x)
      val line: Array[String] = x.split(",")
      User2(line(0).toInt, line(1), line(2).toInt)
    })

    batchSQL.registerDataSet("user", userSet)
    // query with the Table API
    // val table: Table = batchSQL.scan("user").filter("age > 18")
    // note: user is a reserved keyword in Flink SQL, so it has to be escaped with backticks
    val table: Table = batchSQL.sqlQuery("select id,name,age from `user`")
    val sink = new CsvTableSink("D:\\開課吧課程資料\\Flink實時數倉\\datas\\batchSink.csv", "===", 1, WriteMode.OVERWRITE)
    table.writeToSink(sink)

    // convert the Table back into a DataSet
    val tableSet: DataSet[User2] = batchSQL.toDataSet[User2](table)

    // print() triggers execution of the batch program, so no extra execute() call is needed here
    tableSet.map(x => x.age).print()
  }
}
case class User2(id:Int,name:String,age:Int)

More reserved keywords defined by Flink:
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/sql.html

A, ABS, ABSOLUTE, ACTION, ADA, ADD, ADMIN, AFTER, ALL, ALLOCATE,
ALLOW, ALTER, ALWAYS, AND, ANY, ARE, ARRAY, AS, ASC, ASENSITIVE,
ASSERTION, ASSIGNMENT, ASYMMETRIC, AT, ATOMIC, ATTRIBUTE, ATTRIBUTES,
AUTHORIZATION, AVG, BEFORE, BEGIN, BERNOULLI, BETWEEN, BIGINT, BINARY,
BIT, BLOB, BOOLEAN, BOTH, BREADTH, BY, C, CALL, CALLED, CARDINALITY,
CASCADE, CASCADED, CASE, CAST, CATALOG, CATALOG_NAME, CEIL, CEILING,
CENTURY, CHAIN, CHAR, CHARACTER, CHARACTERISTICS, CHARACTERS,
CHARACTER_LENGTH, CHARACTER_SET_CATALOG, CHARACTER_SET_NAME,
CHARACTER_SET_SCHEMA, CHAR_LENGTH, CHECK, CLASS_ORIGIN, CLOB, CLOSE,
COALESCE, COBOL, COLLATE, COLLATION, COLLATION_CATALOG,
COLLATION_NAME, COLLATION_SCHEMA, COLLECT, COLUMN, COLUMN_NAME,
COMMAND_FUNCTION, COMMAND_FUNCTION_CODE, COMMIT, COMMITTED, CONDITION,
CONDITION_NUMBER, CONNECT, CONNECTION, CONNECTION_NAME, CONSTRAINT,
CONSTRAINTS, CONSTRAINT_CATALOG, CONSTRAINT_NAME, CONSTRAINT_SCHEMA,
CONSTRUCTOR, CONTAINS, CONTINUE, CONVERT, CORR, CORRESPONDING, COUNT,
COVAR_POP, COVAR_SAMP, CREATE, CROSS, CUBE, CUME_DIST, CURRENT,
CURRENT_CATALOG, CURRENT_DATE, CURRENT_DEFAULT_TRANSFORM_GROUP,
CURRENT_PATH, CURRENT_ROLE, CURRENT_SCHEMA, CURRENT_TIME,
CURRENT_TIMESTAMP, CURRENT_TRANSFORM_GROUP_FOR_TYPE, CURRENT_USER,
CURSOR, CURSOR_NAME, CYCLE, DATA, DATABASE, DATE,
DATETIME_INTERVAL_CODE, DATETIME_INTERVAL_PRECISION, DAY, DEALLOCATE,
DEC, DECADE, DECIMAL, DECLARE, DEFAULT, DEFAULTS, DEFERRABLE,
DEFERRED, DEFINED, DEFINER, DEGREE, DELETE, DENSE_RANK, DEPTH, DEREF,
DERIVED, DESC, DESCRIBE, DESCRIPTION, DESCRIPTOR, DETERMINISTIC,
DIAGNOSTICS, DISALLOW, DISCONNECT, DISPATCH, DISTINCT, DOMAIN, DOUBLE,
DOW, DOY, DROP, DYNAMIC, DYNAMIC_FUNCTION, DYNAMIC_FUNCTION_CODE,
EACH, ELEMENT, ELSE, END, END-EXEC, EPOCH, EQUALS, ESCAPE, EVERY,
EXCEPT, EXCEPTION, EXCLUDE, EXCLUDING, EXEC, EXECUTE, EXISTS, EXP,
EXPLAIN, EXTEND, EXTERNAL, EXTRACT, FALSE, FETCH, FILTER, FINAL,
FIRST, FIRST_VALUE, FLOAT, FLOOR, FOLLOWING, FOR, FOREIGN, FORTRAN,
FOUND, FRAC_SECOND, FREE, FROM, FULL, FUNCTION, FUSION, G, GENERAL,
GENERATED, GET, GLOBAL, GO, GOTO, GRANT, GRANTED, GROUP, GROUPING,
HAVING, HIERARCHY, HOLD, HOUR, IDENTITY, IMMEDIATE, IMPLEMENTATION,
IMPORT, IN, INCLUDING, INCREMENT, INDICATOR, INITIALLY, INNER, INOUT,
INPUT, INSENSITIVE, INSERT, INSTANCE, INSTANTIABLE, INT, INTEGER,
INTERSECT, INTERSECTION, INTERVAL, INTO, INVOKER, IS, ISOLATION, JAVA,
JOIN, K, KEY, KEY_MEMBER, KEY_TYPE, LABEL, LANGUAGE, LARGE, LAST,
LAST_VALUE, LATERAL, LEADING, LEFT, LENGTH, LEVEL, LIBRARY, LIKE,
LIMIT, LN, LOCAL, LOCALTIME, LOCALTIMESTAMP, LOCATOR, LOWER, M, MAP,
MATCH, MATCHED, MAX, MAXVALUE, MEMBER, MERGE, MESSAGE_LENGTH,
MESSAGE_OCTET_LENGTH, MESSAGE_TEXT, METHOD, MICROSECOND, MILLENNIUM,
MIN, MINUTE, MINVALUE, MOD, MODIFIES, MODULE, MONTH, MORE, MULTISET,
MUMPS, NAME, NAMES, NATIONAL, NATURAL, NCHAR, NCLOB, NESTING, NEW,
NEXT, NO, NONE, NORMALIZE, NORMALIZED, NOT, NULL, NULLABLE, NULLIF,
NULLS, NUMBER, NUMERIC, OBJECT, OCTETS, OCTET_LENGTH, OF, OFFSET, OLD,
ON, ONLY, OPEN, OPTION, OPTIONS, OR, ORDER, ORDERING, ORDINALITY,
OTHERS, OUT, OUTER, OUTPUT, OVER, OVERLAPS, OVERLAY, OVERRIDING, PAD,
PARAMETER, PARAMETER_MODE, PARAMETER_NAME, PARAMETER_ORDINAL_POSITION,
PARAMETER_SPECIFIC_CATALOG, PARAMETER_SPECIFIC_NAME,
PARAMETER_SPECIFIC_SCHEMA, PARTIAL, PARTITION, PASCAL, PASSTHROUGH,
PATH, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, PLACING, PLAN,
PLI, POSITION, POWER, PRECEDING, PRECISION, PREPARE, PRESERVE,
PRIMARY, PRIOR, PRIVILEGES, PROCEDURE, PUBLIC, QUARTER, RANGE, RANK,
READ, READS, REAL, RECURSIVE, REF, REFERENCES, REFERENCING, REGR_AVGX,
REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE, REGR_SXX,
REGR_SXY, REGR_SYY, RELATIVE, RELEASE, REPEATABLE, RESET, RESTART,
RESTRICT, RESULT, RETURN, RETURNED_CARDINALITY, RETURNED_LENGTH,
RETURNED_OCTET_LENGTH, RETURNED_SQLSTATE, RETURNS, REVOKE, RIGHT,
ROLE, ROLLBACK, ROLLUP, ROUTINE, ROUTINE_CATALOG, ROUTINE_NAME,
ROUTINE_SCHEMA, ROW, ROWS, ROW_COUNT, ROW_NUMBER, SAVEPOINT, SCALE,
SCHEMA, SCHEMA_NAME, SCOPE, SCOPE_CATALOGS, SCOPE_NAME, SCOPE_SCHEMA,
SCROLL, SEARCH, SECOND, SECTION, SECURITY, SELECT, SELF, SENSITIVE,
SEQUENCE, SERIALIZABLE, SERVER, SERVER_NAME, SESSION, SESSION_USER,
SET, SETS, SIMILAR, SIMPLE, SIZE, SMALLINT, SOME, SOURCE, SPACE,
SPECIFIC, SPECIFICTYPE, SPECIFIC_NAME, SQL, SQLEXCEPTION, SQLSTATE,
SQLWARNING, SQL_TSI_DAY, SQL_TSI_FRAC_SECOND, SQL_TSI_HOUR,
SQL_TSI_MICROSECOND, SQL_TSI_MINUTE, SQL_TSI_MONTH, SQL_TSI_QUARTER,
SQL_TSI_SECOND, SQL_TSI_WEEK, SQL_TSI_YEAR, SQRT, START, STATE,
STATEMENT, STATIC, STDDEV_POP, STDDEV_SAMP, STREAM, STRUCTURE, STYLE,
SUBCLASS_ORIGIN, SUBMULTISET, SUBSTITUTE, SUBSTRING, SUM, SYMMETRIC,
SYSTEM, SYSTEM_USER, TABLE, TABLESAMPLE, TABLE_NAME, TEMPORARY, THEN,
TIES, TIME, TIMESTAMP, TIMESTAMPADD, TIMESTAMPDIFF, TIMEZONE_HOUR,
TIMEZONE_MINUTE, TINYINT, TO, TOP_LEVEL_COUNT, TRAILING, TRANSACTION,
TRANSACTIONS_ACTIVE, TRANSACTIONS_COMMITTED, TRANSACTIONS_ROLLED_BACK,
TRANSFORM, TRANSFORMS, TRANSLATE, TRANSLATION, TREAT, TRIGGER,
TRIGGER_CATALOG, TRIGGER_NAME, TRIGGER_SCHEMA, TRIM, TRUE, TYPE,
UESCAPE, UNBOUNDED, UNCOMMITTED, UNDER, UNION, UNIQUE, UNKNOWN,
UNNAMED, UNNEST, UPDATE, UPPER, UPSERT, USAGE, USER,
USER_DEFINED_TYPE_CATALOG, USER_DEFINED_TYPE_CODE,
USER_DEFINED_TYPE_NAME, USER_DEFINED_TYPE_SCHEMA, USING, VALUE,
VALUES, VARBINARY, VARCHAR, VARYING, VAR_POP, VAR_SAMP, VERSION, VIEW,
WEEK, WHEN, WHENEVER, WHERE, WIDTH_BUCKET, WINDOW, WITH, WITHIN,
WITHOUT, WORK, WRAPPER, WRITE, XML, YEAR, ZONE

4. Processing JSON data from Kafka with Flink SQL

Flink SQL can also read data directly from Kafka: we register the Kafka topic as a table and then query it with SQL. If the data in Kafka is in JSON format, that is not a problem either; Flink integrates with JSON and can parse the JSON records directly.
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/connect.html
Step 1: add the Maven dependencies

<dependency>
     <groupId>org.apache.flink</groupId>
     <artifactId>flink-json</artifactId>
     <version>1.8.1</version>
 </dependency>

 <!--
 The Kafka dependencies below were already added for the earlier Flink Streaming + Kafka integration, so they do not need to be added again.
 <dependency>
     <groupId>org.apache.flink</groupId>
     <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
     <version>1.8.1</version>
 </dependency>
 <dependency>
     <groupId>org.apache.kafka</groupId>
     <artifactId>kafka-clients</artifactId>
     <version>1.1.0</version>
 </dependency>

 <dependency>
     <groupId>org.slf4j</groupId>
     <artifactId>slf4j-api</artifactId>
     <version>1.7.25</version>
 </dependency>

 <dependency>
     <groupId>org.slf4j</groupId>
     <artifactId>slf4j-log4j12</artifactId>
     <version>1.7.25</version>
 </dependency>
-->

Step 2: create the Kafka topic
Run the following commands on node01 to create the topic:

cd /kkb/install/kafka_2.11-1.1.0
bin/kafka-topics.sh --create --topic kafka_source_table --partitions 3 --replication-factor 1 --zookeeper node01:2181,node02:2181,node03:2181

Step 3: query the Kafka data with Flink

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.core.fs.FileSystem.WriteMode
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api._
import org.apache.flink.table.api.scala.StreamTableEnvironment
import org.apache.flink.table.descriptors.{Json, Kafka, Schema}
import org.apache.flink.table.sinks.CsvTableSink

object KafkaJsonSource {
  def main(args: Array[String]): Unit = {
    val streamEnvironment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // optional checkpoint configuration (commented out)
    /* streamEnvironment.enableCheckpointing(100);
    streamEnvironment.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
    streamEnvironment.getCheckpointConfig.setMinPauseBetweenCheckpoints(500);
    streamEnvironment.getCheckpointConfig.setCheckpointTimeout(60000);
    streamEnvironment.getCheckpointConfig.setMaxConcurrentCheckpoints(1);
    streamEnvironment.getCheckpointConfig.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
    */
    val tableEnvironment: StreamTableEnvironment = StreamTableEnvironment.create(streamEnvironment)
    // describe the Kafka connector: version, topic, start position and consumer properties
    val kafka: Kafka = new Kafka()
      .version("0.11")
      .topic("kafka_source_table")
      .startFromLatest()
      .property("group.id", "test_group")
      .property("bootstrap.servers", "node01:9092,node02:9092,node03:9092")

    // JSON format: do not fail on missing fields, derive the format from the table schema
    val json: Json = new Json().failOnMissingField(false).deriveSchema()
    // sample record: {"userId":1119,"day":"2017-03-02","begintime":1488326400000,"endtime":1488327000000,"data":[{"package":"com.browser","activetime":120000}]}
    val schema: Schema = new Schema()
      .field("userId", Types.INT)
      .field("day", Types.STRING)
      .field("begintime", Types.LONG)
      .field("endtime", Types.LONG)
    // register the Kafka topic as the table "user_log"
    tableEnvironment
      .connect(kafka)
      .withFormat(json)
      .withSchema(schema)
      .inAppendMode()
      .registerTableSource("user_log")
    // query the data with SQL (day is a reserved keyword, so it is escaped with backticks)
    val table: Table = tableEnvironment.sqlQuery("select userId,`day`,begintime,endtime from user_log")
    table.printSchema()
    // define the sink that the result is written to
    val sink = new CsvTableSink("D:\\開課吧課程資料\\Flink實時數倉\\datas\\flink_kafka.csv", "====", 1, WriteMode.OVERWRITE)
    // register the sink under the name "csvSink"
    tableEnvironment.registerTableSink("csvSink",
      Array[String]("f0", "f1", "f2", "f3"),
      Array[TypeInformation[_]](Types.INT, Types.STRING, Types.LONG, Types.LONG), sink)
    // insert the query result into the registered sink
    table.insertInto("csvSink")
    streamEnvironment.execute("kafkaSource")
  }
}

Step 4: send data to Kafka
Use the Kafka console producer to send data:

cd /kkb/install/kafka_2.11-1.1.0
bin/kafka-console-producer.sh  --topic kafka_source_table --broker-list node01:9092,node02:9092,node03:9092 

Send data in the following format:

{"userId":1119,"day":"2017-03-02","begintime":1488326400000,"endtime":1488327000000}
{"userId":1120,"day":"2017-03-02","begintime":1488326400000,"endtime":1488327000000}
{"userId":1121,"day":"2017-03-02","begintime":1488326400000,"endtime":1488327000000}
{"userId":1122,"day":"2017-03-02","begintime":1488326400000,"endtime":1488327000000}
{"userId":1123,"day":"2017-03-02","begintime":1488326400000,"endtime":1488327000000}
