Pitfalls when converting a Flink DataStream of POJOs to a Table

Posted by 耗子哥信徒 on 2024-08-02

Core code

import java.time.LocalDateTime;

public class TrackLog {
    private Integer entityId;
    // For Flink time types, LocalDateTime must be used
    private LocalDateTime statDateTime;

    public Integer getEntityId() {
        return entityId;
    }

    public void setEntityId(Integer entityId) {
        this.entityId = entityId;
    }

    public LocalDateTime getStatDateTime() {
        return statDateTime;
    }

    public void setStatDateTime(LocalDateTime statDateTime) {
        this.statDateTime = statDateTime;
    }
}

SideOutputDataStream<TrackLog> patrolStream = traceStream.getSideOutput(outputLogTag);
Table table = tableEnv.fromDataStream(patrolStream);
table.printSchema();

This prints:

(
  `entityId` INT,
  `statDateTime` RAW('java.time.LocalDateTime', '...')
)

Problem 1: a private field isDup is added to the POJO class (TrackLog) without defining a getter (or setter) for it

public class TrackLog {
    private Integer entityId;
    // For Flink time types, LocalDateTime must be used
    private LocalDateTime statDateTime;

    // New field, with no getter or setter
    private boolean isDup = false;

    public Integer getEntityId() {
        return entityId;
    }

    public void setEntityId(Integer entityId) {
        this.entityId = entityId;
    }

    public LocalDateTime getStatDateTime() {
        return statDateTime;
    }

    public void setStatDateTime(LocalDateTime statDateTime) {
        this.statDateTime = statDateTime;
    }
}

Running it again prints:

(
  `f0` RAW('com.tide.entity.TrackLog', '...')
)

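The schema now has a single field, f0, whose type is TrackLog. In other words, something went wrong when the POJO's fields were mapped to table columns: Flink no longer treats TrackLog as a POJO and falls back to a single RAW column.
This was puzzling, and it took a long debugging session to find the cause.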
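To catch this kind of mistake before it reaches Flink, a small reflection check can list the fields that have no matching accessors. The sketch below is only a simplified approximation of the rule, not Flink's actual POJO analysis; the class `PojoAccessorCheck`, its method `fieldsMissingAccessors`, and the two sample classes are hypothetical names made up for this illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only: a simplified approximation,
// not Flink's actual POJO analysis.
class PojoAccessorCheck {

    // Mirrors the working TrackLog (one field kept for brevity).
    public static class Complete {
        private Integer entityId;
        public Integer getEntityId() { return entityId; }
        public void setEntityId(Integer entityId) { this.entityId = entityId; }
    }

    // Mirrors Problem 1: isDup has neither a getter nor a setter.
    public static class MissingAccessors {
        private Integer entityId;
        private boolean isDup = false;
        public Integer getEntityId() { return entityId; }
        public void setEntityId(Integer entityId) { this.entityId = entityId; }
    }

    /** Returns the non-public instance fields that lack get<Field>/set<Field> methods. */
    static List<String> fieldsMissingAccessors(Class<?> clazz) {
        List<String> missing = new ArrayList<>();
        for (Field field : clazz.getDeclaredFields()) {
            int mod = field.getModifiers();
            if (field.isSynthetic() || Modifier.isStatic(mod)
                    || Modifier.isTransient(mod) || Modifier.isPublic(mod)) {
                continue; // only non-public instance fields need accessors
            }
            String name = field.getName();
            String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            boolean hasGetter = hasPublicMethod(clazz, "get" + suffix, 0);
            boolean hasSetter = hasPublicMethod(clazz, "set" + suffix, 1);
            if (!hasGetter || !hasSetter) {
                missing.add(name);
            }
        }
        return missing;
    }

    // True if the class exposes a public method with this name and parameter count.
    private static boolean hasPublicMethod(Class<?> clazz, String name, int paramCount) {
        for (Method method : clazz.getMethods()) {
            if (method.getName().equals(name) && method.getParameterCount() == paramCount) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(fieldsMissingAccessors(Complete.class));         // []
        System.out.println(fieldsMissingAccessors(MissingAccessors.class)); // [isDup]
    }
}
```

Running such a check in a unit test against each class that is passed to fromDataStream would flag the field by name, instead of leaving only a RAW f0 column as a hint.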

Problem 2: a schema is defined, but it has one column fewer than the POJO class has fields. The program throws an error

Code:

public class TrackLog {
    private Integer entityId;
    // For Flink time types, LocalDateTime must be used
    private LocalDateTime statDateTime;

    private boolean isDup = false;

    public Integer getEntityId() {
        return entityId;
    }

    public void setEntityId(Integer entityId) {
        this.entityId = entityId;
    }

    public LocalDateTime getStatDateTime() {
        return statDateTime;
    }

    public void setStatDateTime(LocalDateTime statDateTime) {
        this.statDateTime = statDateTime;
    }

    public boolean isDup() {
        return isDup;
    }

    public void setDup(boolean dup) {
        isDup = dup;
    }
}

SideOutputDataStream<TrackLog> patrolStream = traceStream.getSideOutput(outputLogTag);
Schema schema = Schema.newBuilder()
        .column("entityId", DataTypes.INT())
        .column("statDateTime", DataTypes.TIMESTAMP())
        .build();
Table table = tableEnv.fromDataStream(patrolStream, schema);

This fails with:

Caused by: org.apache.flink.table.api.ValidationException: Unable to find a field named 'entityId' in the physical data type derived from the given type information for schema declaration. Make sure that the type information is not a generic raw type. Currently available fields are: [f0]

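First guess: the problem is not that the POJO class has one extra field, but that the extra field is a Boolean. At this point it was unclear why a Boolean field would cause trouble.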

Lessons

1. When the POJO's fields match the table's columns exactly, there is no need to specify a Schema.

2. A Boolean field in the POJO class may cause problems. When the following is added to our POJO class

{
    private Boolean isDup = false;
    public Boolean isDup() {
        return isDup;
    }

    public void setDup(boolean dup) {
        isDup = dup;
    }
}

and no schema is specified, the output is:

(
  `f0` RAW('com.tide.entity.TrackLog', '...')
)

With this field removed, the printed table schema is normal again.

The answer

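After a lot of trial and error, it turned out that the problem is not the Boolean type itself. The getter and setter that IDEA generates for a boolean field do not follow the naming Flink expects. For a field named isDup, IDEA follows the JavaBeans convention and generates isDup() and setDup(...), whereas Flink looks for accessors named after the field itself, i.e. getIsDup() and setIsDup(...).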

public Boolean isDup() {
    return isDup;
}

public void setDup(boolean dup) {
    isDup = dup;
}

Changed to:

public boolean getIsDup() {
    return isDup;
}

public void setIsDup(boolean dup) {
    isDup = dup;
}

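After that, everything worked.
So following the POJO conventions is essential:

  1. Every private field must have a getter and a setter.
  2. The accessors must use the names Flink expects: for a field called foo, getFoo() and setFoo(...). IDE-generated accessors for boolean fields whose names start with "is" need a manual check.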
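The mismatch can be made visible with the JDK's own bean introspection. The sketch below uses java.beans.Introspector to print which property name each accessor pair describes; it only shows the JavaBeans side of the story and does not call Flink. The class names `BeanPropertyNames`, `IdeStyle` and `FieldNameStyle` are hypothetical, chosen for this illustration.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: shows how the JavaBeans convention names the property
// behind each accessor pair. It does not involve Flink.
class BeanPropertyNames {

    // Accessors as generated by the IDE for a field named isDup.
    public static class IdeStyle {
        private boolean isDup;
        public boolean isDup() { return isDup; }
        public void setDup(boolean dup) { isDup = dup; }
    }

    // Accessors named after the field itself, as in the fix above.
    public static class FieldNameStyle {
        private boolean isDup;
        public boolean getIsDup() { return isDup; }
        public void setIsDup(boolean dup) { isDup = dup; }
    }

    /** Returns the JavaBeans property names of a class, sorted, excluding those inherited from Object. */
    static Set<String> propertyNames(Class<?> clazz) {
        try {
            Set<String> names = new TreeSet<>();
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(clazz, Object.class).getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + clazz, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(propertyNames(IdeStyle.class));       // [dup]
        System.out.println(propertyNames(FieldNameStyle.class)); // [isDup]
    }
}
```

The IDE-generated pair describes a property called dup, while the field is called isDup. A framework that pairs accessors with fields by the field's name therefore finds no setIsDup for that field, which is consistent with the RAW f0 schema seen earlier.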

When time allows, I will look into how Flink maps a POJO class to a table schema (most likely through reflection).
