Core code
import java.time.LocalDateTime;

public class TrackLog {
    private Integer entityId;
    // Time field for Flink; use LocalDateTime
    private LocalDateTime statDateTime;

    public Integer getEntityId() {
        return entityId;
    }

    public void setEntityId(Integer entityId) {
        this.entityId = entityId;
    }

    public LocalDateTime getStatDateTime() {
        return statDateTime;
    }

    public void setStatDateTime(LocalDateTime statDateTime) {
        this.statDateTime = statDateTime;
    }
}
SideOutputDataStream<TrackLog> patrolStream = traceStream.getSideOutput(outputLogTag);
Table table = tableEnv.fromDataStream(patrolStream);
table.printSchema();
This prints:
(
`entityId` INT,
`statDateTime` RAW('java.time.LocalDateTime', '...')
)
Problem 1: a private field isDup is added to the POJO class (TrackLog) without a getter or setter
public class TrackLog {
    private Integer entityId;
    // Time field for Flink; use LocalDateTime
    private LocalDateTime statDateTime;
    private boolean isDup = false;

    public Integer getEntityId() {
        return entityId;
    }

    public void setEntityId(Integer entityId) {
        this.entityId = entityId;
    }

    public LocalDateTime getStatDateTime() {
        return statDateTime;
    }

    public void setStatDateTime(LocalDateTime statDateTime) {
        this.statDateTime = statDateTime;
    }
}
Running it again prints:
(
`f0` RAW('com.tide.entity.TrackLog', '...')
)
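The schema now contains a single field, f0, whose type is TrackLog. In other words, something went wrong when the POJO's fields were mapped to table columns: Flink no longer recognizes TrackLog as a POJO and falls back to treating it as a generic RAW type, so the whole object ends up in one column.
This was puzzling, and it took a long debugging session to find the cause.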
Problem 2: a schema is defined, but it has one column fewer than the POJO class has fields. The program throws an error.
Code:
public class TrackLog {
    private Integer entityId;
    // Time field for Flink; use LocalDateTime
    private LocalDateTime statDateTime;
    private boolean isDup = false;

    public Integer getEntityId() {
        return entityId;
    }

    public void setEntityId(Integer entityId) {
        this.entityId = entityId;
    }

    public LocalDateTime getStatDateTime() {
        return statDateTime;
    }

    public void setStatDateTime(LocalDateTime statDateTime) {
        this.statDateTime = statDateTime;
    }

    public boolean isDup() {
        return isDup;
    }

    public void setDup(boolean dup) {
        isDup = dup;
    }
}
SideOutputDataStream<TrackLog> patrolStream = traceStream.getSideOutput(outputLogTag);
Schema schema = Schema.newBuilder()
        .column("entityId", DataTypes.INT())
        .column("statDateTime", DataTypes.TIMESTAMP())
        .build();
Table table = tableEnv.fromDataStream(patrolStream, schema);
The program fails with:
Caused by: org.apache.flink.table.api.ValidationException: Unable to find a field named 'entityId' in the physical data type derived from the given type information for schema declaration. Make sure that the type information is not a generic raw type. Currently available fields are: [f0]
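My guess at this point: the problem is not that the POJO class has one extra field, but that the extra field is a Boolean. I could not see why a Boolean field would cause trouble.
Lessons
1. When the POJO's fields match the table columns exactly, there is no need to specify a Schema.
2. A Boolean field in the POJO class may cause problems. When the following is added to our POJO class: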
{
    private Boolean isDup = false;

    public Boolean isDup() {
        return isDup;
    }

    public void setDup(boolean dup) {
        isDup = dup;
    }
}
and no schema is specified, the output is:
(
`f0` RAW('com.tide.entity.TrackLog', '...')
)
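With this field removed, the table schema comes out normal again.
The answer
After a lot of trial and error, it turned out that the problem is not the Boolean type itself. The getter and setter that IDEA generates for a boolean field do not follow the naming that Flink expects.
public Boolean isDup() {
    return isDup;
}

public void setDup(boolean dup) {
    isDup = dup;
}
Changed to:
public boolean getIsDup() {
    return isDup;
}

public void setIsDup(boolean dup) {
    isDup = dup;
}
After that, everything worked.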
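One way to see the naming mismatch without running Flink is to ask the JDK's own JavaBeans introspector which property the IDE-generated accessors describe. The sketch below is only an analogy: Flink has its own POJO check and does not necessarily go through java.beans. The class names BeanPropertyNames, IdeaStyle and ExplicitStyle are made up for illustration.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BeanPropertyNames {

    // Accessors in the style the IDE generated for a boolean field named isDup.
    public static class IdeaStyle {
        private boolean isDup;
        public boolean isDup() { return isDup; }
        public void setDup(boolean dup) { isDup = dup; }
    }

    // Accessors after the rename described above.
    public static class ExplicitStyle {
        private boolean isDup;
        public boolean getIsDup() { return isDup; }
        public void setIsDup(boolean dup) { isDup = dup; }
    }

    // Returns the sorted JavaBeans property names derived from the accessor methods.
    static List<String> propertyNames(Class<?> type) {
        try {
            List<String> names = new ArrayList<>();
            // Stop at Object so the "class" property from getClass() is excluded.
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(type, Object.class).getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("IDE style:      " + propertyNames(IdeaStyle.class));
        System.out.println("explicit style: " + propertyNames(ExplicitStyle.class));
    }
}
```

By the JavaBeans convention, isDup()/setDup() describe a property called dup, while the field is called isDup. With getIsDup()/setIsDup() the property name and the field name agree, which is the situation a field-based mapping needs.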
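So the POJO conventions matter a great deal:
- Every private field must have a standard getter and setter.
- The method names must be derived from the full field name. For a field named isDup, use getIsDup and setIsDup rather than the IDE-generated isDup and setDup.
When I find the time, I will look into how Flink maps a POJO class to a table schema (most likely via reflection).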
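In the meantime, here is a rough sketch of what such a reflection-based check might look like. It is not Flink's code. It is a simplified approximation based on my reading of the POJO rules, and it assumes that names are compared case-insensitively with underscores ignored, that a getter may be called get<field>, is<field> or just <field>, and that a setter must be called set<field>. The real check also looks at parameter and return types and requires a public class with a public no-argument constructor, which this sketch skips. PojoAccessorCheck and the sample classes are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PojoAccessorCheck {

    // Problem 1: private field with no accessors at all.
    public static class NoAccessors {
        private boolean isDup;
    }

    // Problem 2: IDE-generated accessors for a boolean field named isDup.
    public static class IdeaStyle {
        private Integer entityId;
        private boolean isDup;
        public Integer getEntityId() { return entityId; }
        public void setEntityId(Integer entityId) { this.entityId = entityId; }
        public boolean isDup() { return isDup; }
        public void setDup(boolean dup) { isDup = dup; }
    }

    // The renamed accessors from the fix.
    public static class ExplicitStyle {
        private boolean isDup;
        public boolean getIsDup() { return isDup; }
        public void setIsDup(boolean dup) { isDup = dup; }
    }

    // Assumed normalization: lower case, underscores removed.
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT).replace("_", "");
    }

    static boolean hasGetter(Class<?> type, Field field) {
        String name = normalize(field.getName());
        for (Method m : type.getMethods()) {
            String method = normalize(m.getName());
            boolean nameMatches = method.equals("get" + name)
                    || method.equals("is" + name)
                    || method.equals(name);
            if (nameMatches && m.getParameterCount() == 0) {
                return true;
            }
        }
        return false;
    }

    static boolean hasSetter(Class<?> type, Field field) {
        String name = normalize(field.getName());
        for (Method m : type.getMethods()) {
            if (normalize(m.getName()).equals("set" + name) && m.getParameterCount() == 1) {
                return true;
            }
        }
        return false;
    }

    // Lists every non-public instance field that lacks a matching getter or setter.
    static List<String> problems(Class<?> type) {
        List<String> result = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            int mod = f.getModifiers();
            if (f.isSynthetic() || Modifier.isStatic(mod)
                    || Modifier.isTransient(mod) || Modifier.isPublic(mod)) {
                continue; // these fields need no accessors under the assumed rule
            }
            if (!hasGetter(type, f)) {
                result.add(f.getName() + ": no getter");
            }
            if (!hasSetter(type, f)) {
                result.add(f.getName() + ": no setter");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(problems(NoAccessors.class));
        System.out.println(problems(IdeaStyle.class));
        System.out.println(problems(ExplicitStyle.class));
    }
}
```

Under these assumptions, isDup() is accepted as a getter for the field isDup because the names match exactly, but setDup() does not match the expected setIsDup. If that reading is right, the setter name alone was enough to make Flink reject the class as a POJO. A check like this could also run in a unit test so that a renamed accessor is caught before the job is deployed.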