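Orders are one of the most important objects in statistical analysis, and many statistics are sliced by order-related dimensions such as user, region, product, category, and brand. To make later calculations simpler and to avoid repeated joins between large tables, the real-time pipeline merges the data surrounding an order into a single order wide table. So which data needs to be merged with the order?
As the figure above shows, the earlier step (BaseDbTask) already split the data into fact data and dimension data: fact data (green) goes into Kafka streams (the DWD layer), and dimension data (blue) is stored long-term in HBase. In the DWM layer we therefore need to join the fact data with the dimension data to form the wide table. Two kinds of joins are involved: fact-to-fact and fact-to-dimension.
- A fact-to-fact join is a join between two streams.
- A fact-to-dimension join is a lookup against an external data source from within the stream computation.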
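The second kind of join can be pictured as a per-record lookup. The sketch below is plain Java, not Flink code: a `HashMap` stands in for the dimension table kept in HBase, and the class name, the method name, and the sample province rows are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class DimLookupSketch {
    // Hypothetical stand-in for a province dimension table stored in HBase.
    static final Map<Long, String> PROVINCE_DIM = new HashMap<>();

    static {
        PROVINCE_DIM.put(1L, "Beijing");
        PROVINCE_DIM.put(2L, "Shanghai");
    }

    // Enrich one fact record (an order) with a field looked up from the dimension table.
    static String enrich(long orderId, long provinceId) {
        String provinceName = PROVINCE_DIM.getOrDefault(provinceId, "unknown");
        return "order=" + orderId + ", province_name=" + provinceName;
    }

    public static void main(String[] args) {
        System.out.println(enrich(100L, 1L));
        System.out.println(enrich(101L, 9L)); // no matching dimension row
    }
}
```

In a real job the lookup would hit the external store for each record (or a cache in front of it), which is why this kind of join is treated differently from a stream-to-stream join.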
import lombok.Data;
import java.math.BigDecimal;
/**
* @author zhangbao
* @date 2021/10/25 19:55
* @desc Order
*/
@Data
public class OrderInfo {
Long id;
Long province_id;
String order_status;
Long user_id;
BigDecimal total_amount;
BigDecimal activity_reduce_amount;
BigDecimal coupon_reduce_amount;
BigDecimal original_total_amount;
BigDecimal feight_fee;
String expire_time;
String create_time;
String operate_time;
String create_date; // derived from other fields
String create_hour;
Long create_ts;
}
import lombok.Data;
import java.math.BigDecimal;
/**
* @author zhangbao
* @date 2021/10/25 19:55
* @desc Order detail
*/
@Data
public class OrderDetail {
Long id;
Long order_id ;
Long sku_id;
BigDecimal order_price ;
Long sku_num ;
String sku_name;
String create_time;
BigDecimal split_total_amount;
BigDecimal split_activity_amount;
BigDecimal split_coupon_amount;
Long create_ts;
}
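3. Consuming fact data from Kafka
In the dwm package, create the job OrderWideApp.java, which converts the order and order detail records into objects. Some ETL can also be done at this stage.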
import cn.hutool.core.date.DateTime;
import cn.hutool.core.date.DateUnit;
import cn.hutool.core.date.DateUtil;
import com.alibaba.fastjson.JSONObject;
import com.zhangbao.gmall.realtime.bean.OrderDetail;
import com.zhangbao.gmall.realtime.bean.OrderInfo;
import com.zhangbao.gmall.realtime.utils.MyKafkaUtil;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
/**
* @author zhangbao
* @date 2021/10/25 19:58
* @desc
*/
public class OrderWideApp {
public static void main(String[] args) {
//web UI mode; requires an extra pom dependency
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(new Configuration());
// StreamExecutionEnvironment env1 = StreamExecutionEnvironment.createLocalEnvironment();
//set the parallelism
env.setParallelism(4);
//configure checkpointing
// env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE);
// env.getCheckpointConfig().setCheckpointTimeout(60000);
// env.setStateBackend(new FsStateBackend("hdfs://hadoop101:9000/gmall/flink/checkpoint/uniqueVisit"));
// //user that accesses the HDFS files
// System.setProperty("HADOOP_USER_NAME","zhangbao");
//read orders and order details from the Kafka DWD topics
String orderInfoTopic = "dwd_order_info";
String orderDetailTopic = "dwd_order_detail";
String orderWideTopic = "dwm_order_wide";
String orderWideGroup = "order_wide_group";
//order data
FlinkKafkaConsumer<String> orderInfoSource = MyKafkaUtil.getKafkaSource(orderInfoTopic, orderWideGroup);
DataStreamSource<String> orderInfoDs = env.addSource(orderInfoSource);
//order detail data
FlinkKafkaConsumer<String> orderDetailSource = MyKafkaUtil.getKafkaSource(orderDetailTopic, orderWideGroup);
DataStreamSource<String> orderDetailDs = env.addSource(orderDetailSource);
//convert the order data
SingleOutputStreamOperator<OrderInfo> orderInfoObjDs = orderInfoDs.map(new RichMapFunction<String, OrderInfo>() {
@Override
public OrderInfo map(String jsonStr) throws Exception {
System.out.println("order info str >>> "+jsonStr);
OrderInfo orderInfo = JSONObject.parseObject(jsonStr, OrderInfo.class);
DateTime createTime = DateUtil.parse(orderInfo.getCreate_time(), "yyyy-MM-dd HH:mm:ss");
orderInfo.setCreate_ts(createTime.getTime());
return orderInfo;
}
});
//convert the order detail data
SingleOutputStreamOperator<OrderDetail> orderDetailObjDs = orderDetailDs.map(new RichMapFunction<String, OrderDetail>() {
@Override
public OrderDetail map(String jsonStr) throws Exception {
System.out.println("order detail str >>> "+jsonStr);
OrderDetail orderDetail = JSONObject.parseObject(jsonStr, OrderDetail.class);
DateTime createTime = DateUtil.parse(orderDetail.getCreate_time(), "yyyy-MM-dd HH:mm:ss");
orderDetail.setCreate_ts(createTime.getTime());
return orderDetail;
}
});
//print the converted streams to check the conversion
orderInfoObjDs.print("order info >>>");
orderDetailObjDs.print("order detail >>>");
try {
env.execute("order wide task");
} catch (Exception e) {
e.printStackTrace();
}
}
}
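The conversion step in the job above derives create_ts from create_time, while the create_date and create_hour fields of OrderInfo are declared but not populated in the code shown. A minimal sketch of how all three could be derived with the JDK's `java.time` API instead of hutool follows. The class and method names are hypothetical, the output formats for date and hour are assumptions, and the time zone is passed explicitly so the result does not depend on the machine's default zone.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class CreateTsSketch {
    // Same pattern as the create_time strings parsed in the job.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter HOUR_FMT = DateTimeFormatter.ofPattern("HH");

    // Epoch milliseconds for a create_time string interpreted in the given zone.
    static long toEpochMillis(String createTime, ZoneId zone) {
        return LocalDateTime.parse(createTime, FMT).atZone(zone).toInstant().toEpochMilli();
    }

    // Date part, e.g. "2021-10-25" (assumed format for create_date).
    static String createDate(String createTime) {
        return LocalDateTime.parse(createTime, FMT).toLocalDate().toString();
    }

    // Two-digit hour, e.g. "19" (assumed format for create_hour).
    static String createHour(String createTime) {
        return LocalDateTime.parse(createTime, FMT).format(HOUR_FMT);
    }

    public static void main(String[] args) {
        String createTime = "2021-10-25 19:55:00";
        System.out.println(toEpochMillis(createTime, ZoneId.of("UTC")));
        System.out.println(createDate(createTime) + " " + createHour(createTime));
    }
}
```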
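4. Preparing for the two-stream join
Stream joins in Flink fall broadly into two kinds. One is the window-based join, such as join and coGroup. The other is the interval join (intervalJoin), which buffers elements of both streams in state and matches them within a relative time range. We use intervalJoin here because it is simpler to use than a window join and it avoids the problem of records that should match falling into different windows.
The one current limitation of intervalJoin is that it does not support left joins. Here, however, we are joining the order header table with its order detail table, which does not need a left join, so intervalJoin is the better choice.
Official documentation: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/joining/#interval-join
Assign timestamps and watermarks first, then key the streams.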
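To make the difference between the two kinds of join concrete, here is a small plain-Java simulation (not Flink code). It assumes the interval join matches records with the same key when `left.ts + lower <= right.ts <= left.ts + upper`, with both bounds inclusive, and models a tumbling-window join as "same key, same fixed window". The class, the `Event` type, and the sample timestamps are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IntervalJoinSketch {
    /** A keyed event with an event-time timestamp in milliseconds. */
    static final class Event {
        final long key;
        final long ts;

        Event(long key, long ts) {
            this.key = key;
            this.ts = ts;
        }
    }

    /** Interval join: same key, and right.ts within [left.ts + lowerMs, left.ts + upperMs]. */
    static List<String> intervalJoin(List<Event> left, List<Event> right, long lowerMs, long upperMs) {
        List<String> out = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                if (l.key == r.key && r.ts >= l.ts + lowerMs && r.ts <= l.ts + upperMs) {
                    out.add(l.key + ":" + l.ts + "/" + r.ts);
                }
            }
        }
        return out;
    }

    /** Tumbling-window join: same key, and both timestamps fall into the same fixed window. */
    static List<String> tumblingWindowJoin(List<Event> left, List<Event> right, long windowMs) {
        List<String> out = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                if (l.key == r.key && l.ts / windowMs == r.ts / windowMs) {
                    out.add(l.key + ":" + l.ts + "/" + r.ts);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Order 1 arrives just before a 10 s window boundary, its detail just after it.
        // Order 2 has no detail at all.
        List<Event> orders = Arrays.asList(new Event(1L, 9_900L), new Event(2L, 20_000L));
        List<Event> details = Arrays.asList(new Event(1L, 10_100L));

        System.out.println(intervalJoin(orders, details, -5_000L, 5_000L)); // [1:9900/10100]
        System.out.println(tumblingWindowJoin(orders, details, 10_000L));   // []
    }
}
```

The pair that straddles the window boundary is found by the interval join but missed by the window join. Order 2, which has no detail record, produces no output in either case, which is the inner-join behaviour described above.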
//assign event-time timestamps and watermarks
//event-time field for orders
SingleOutputStreamOperator<OrderInfo> orderInfoWithTsDs = orderInfoObjDs.assignTimestampsAndWatermarks(
WatermarkStrategy
.<OrderInfo>forBoundedOutOfOrderness(Duration.ofSeconds(3))
.withTimestampAssigner(new SerializableTimestampAssigner<OrderInfo>() {
@Override
public long extractTimestamp(OrderInfo orderInfo, long l) {
return orderInfo.getCreate_ts();
}
})
);
//event-time field for order details
SingleOutputStreamOperator<OrderDetail> orderDetailWithTsDs = orderDetailObjDs.assignTimestampsAndWatermarks(
WatermarkStrategy.<OrderDetail>forBoundedOutOfOrderness(Duration.ofSeconds(3))
.withTimestampAssigner(new SerializableTimestampAssigner<OrderDetail>() {
@Override
public long extractTimestamp(OrderDetail orderDetail, long l) {
return orderDetail.getCreate_ts();
}
})
);
//key both streams by order id (order details are keyed by order_id, not by their own id)
KeyedStream<OrderInfo, Long> orderInfoKeysDs = orderInfoWithTsDs.keyBy(OrderInfo::getId);
KeyedStream<OrderDetail, Long> orderDetailKeysDs = orderDetailWithTsDs.keyBy(OrderDetail::getOrder_id);
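The `forBoundedOutOfOrderness(Duration.ofSeconds(3))` strategy used above tolerates records that arrive up to 3 seconds late. A minimal plain-Java sketch of the bookkeeping behind such a strategy follows, assuming the watermark is computed as the largest timestamp seen so far, minus the bound, minus 1 ms. The class and method names are hypothetical; this is an illustration, not Flink's own class.

```java
class BoundedOutOfOrdernessSketch {
    private final long outOfOrdernessMs;
    private long maxTimestamp;

    BoundedOutOfOrdernessSketch(long outOfOrdernessMs) {
        this.outOfOrdernessMs = outOfOrdernessMs;
        // Start low enough that the first watermark does not underflow.
        this.maxTimestamp = Long.MIN_VALUE + outOfOrdernessMs + 1;
    }

    // Called for every record: remember the largest event timestamp seen so far.
    void onEvent(long eventTimestampMs) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestampMs);
    }

    // Watermark: no record with a timestamp at or below this value is expected any more.
    long currentWatermark() {
        return maxTimestamp - outOfOrdernessMs - 1;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessSketch wm = new BoundedOutOfOrdernessSketch(3_000L);
        wm.onEvent(10_000L);
        System.out.println(wm.currentWatermark()); // 6999
        wm.onEvent(8_000L); // out of order, but within the 3 s bound
        System.out.println(wm.currentWatermark()); // 6999
        wm.onEvent(12_000L);
        System.out.println(wm.currentWatermark()); // 8999
    }
}
```

An out-of-order record never moves the watermark backwards; it only advances when a larger timestamp is seen.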
5. Creating the order wide table
import lombok.AllArgsConstructor;
import lombok.Data;
import org.apache.commons.lang3.ObjectUtils;
import java.math.BigDecimal;
/**
* @author zhangbaohpu
* @date 2021/11/13 11:10
* @desc Order wide table
*/
@Data
@AllArgsConstructor
public class OrderWide {
Long detail_id;
Long order_id ;
Long sku_id;
BigDecimal order_price ;
Long sku_num ;
String sku_name;
Long province_id;
String order_status;
Long user_id;
BigDecimal total_amount;
BigDecimal activity_reduce_amount;
BigDecimal coupon_reduce_amount;
BigDecimal original_total_amount;
BigDecimal feight_fee;
BigDecimal split_feight_fee;
BigDecimal split_activity_amount;
BigDecimal split_coupon_amount;
BigDecimal split_total_amount;
String expire_time;
String create_time;
String operate_time;
String create_date; // derived from other fields
String create_hour;
String province_name; // looked up from the dimension tables
String province_area_code;
String province_iso_code;
String province_3166_2_code;
Integer user_age ;
String user_gender;
Long spu_id; // dimension data to be joined in
Long tm_id;
Long category3_id;
String spu_name;
String tm_name;
String category3_name;
public OrderWide(OrderInfo orderInfo, OrderDetail orderDetail){
mergeOrderInfo(orderInfo);
mergeOrderDetail(orderDetail);
}
public void mergeOrderInfo(OrderInfo orderInfo ) {
if (orderInfo != null) {
this.order_id = orderInfo.id;
this.order_status = orderInfo.order_status;
this.create_time = orderInfo.create_time;
this.create_date = orderInfo.create_date;
this.activity_reduce_amount = orderInfo.activity_reduce_amount;
this.coupon_reduce_amount = orderInfo.coupon_reduce_amount;
this.original_total_amount = orderInfo.original_total_amount;
this.feight_fee = orderInfo.feight_fee;
this.total_amount = orderInfo.total_amount;
this.province_id = orderInfo.province_id;
this.user_id = orderInfo.user_id;
}
}
public void mergeOrderDetail(OrderDetail orderDetail ) {
if (orderDetail != null) {
this.detail_id = orderDetail.id;
this.sku_id = orderDetail.sku_id;
this.sku_name = orderDetail.sku_name;
this.order_price = orderDetail.order_price;
this.sku_num = orderDetail.sku_num;
this.split_activity_amount=orderDetail.split_activity_amount;
this.split_coupon_amount=orderDetail.split_coupon_amount;
this.split_total_amount=orderDetail.split_total_amount;
}
}
public void mergeOtherOrderWide(OrderWide otherOrderWide){
this.order_status =
ObjectUtils.firstNonNull( this.order_status ,otherOrderWide.order_status);
this.create_time =
ObjectUtils.firstNonNull(this.create_time,otherOrderWide.create_time);
this.create_date =
ObjectUtils.firstNonNull(this.create_date,otherOrderWide.create_date);
this.coupon_reduce_amount =
ObjectUtils.firstNonNull(this.coupon_reduce_amount,otherOrderWide.coupon_reduce_amount);
this.activity_reduce_amount =
ObjectUtils.firstNonNull(this.activity_reduce_amount,otherOrderWide.activity_reduce_amount);
this.original_total_amount =
ObjectUtils.firstNonNull(this.original_total_amount,otherOrderWide.original_total_amount);
this.feight_fee = ObjectUtils.firstNonNull( this.feight_fee,otherOrderWide.feight_fee);
this.total_amount =
ObjectUtils.firstNonNull( this.total_amount,otherOrderWide.total_amount);
this.user_id = ObjectUtils.<Long>firstNonNull(this.user_id,otherOrderWide.user_id);
this.sku_id = ObjectUtils.firstNonNull( this.sku_id,otherOrderWide.sku_id);
this.sku_name = ObjectUtils.firstNonNull(this.sku_name,otherOrderWide.sku_name);
this.order_price =
ObjectUtils.firstNonNull(this.order_price,otherOrderWide.order_price);
this.sku_num = ObjectUtils.firstNonNull( this.sku_num,otherOrderWide.sku_num);
this.split_activity_amount = ObjectUtils.firstNonNull(this.split_activity_amount, otherOrderWide.split_activity_amount);
this.split_coupon_amount = ObjectUtils.firstNonNull(this.split_coupon_amount, otherOrderWide.split_coupon_amount);
this.split_total_amount = ObjectUtils.firstNonNull(this.split_total_amount, otherOrderWide.split_total_amount);
}
}
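The merge rule in mergeOtherOrderWide is "keep my own value if it is set, otherwise take the other record's value". The sketch below re-implements that rule in plain Java so it can be run without commons-lang3. The class is hypothetical, and the helper is a local stand-in written to behave like `ObjectUtils.firstNonNull`, not the library method itself.

```java
class FirstNonNullSketch {
    // Return the first argument that is not null, or null if all of them are null.
    @SafeVarargs
    static <T> T firstNonNull(T... values) {
        if (values != null) {
            for (T value : values) {
                if (value != null) {
                    return value;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String mine = null;
        String other = "1001";
        // The local value is missing, so the other record's value is used.
        System.out.println(firstNonNull(mine, other));
        // The local value is present, so it wins.
        System.out.println(firstNonNull("1002", other));
    }
}
```

Note that the rule only fills gaps: a value that is already set is never overwritten by the other record.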
6. Two-stream join
With the data wrapped into objects and the watermarks assigned, we can now join the order stream with the order detail stream.
import cn.hutool.core.date.DateTime;
import cn.hutool.core.date.DateUtil;
import com.alibaba.fastjson.JSONObject;
import com.zhangbao.gmall.realtime.bean.OrderDetail;
import com.zhangbao.gmall.realtime.bean.OrderInfo;
import com.zhangbao.gmall.realtime.bean.OrderWide;
import com.zhangbao.gmall.realtime.utils.MyKafkaUtil;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.Collector;
import java.time.Duration;
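/**
* @author zhangbao
* @date 2021/10/25 19:58
* @desc
* Startup order
* zk > kf > maxwell > hdfs > hbase > baseDbTask > OrderWideApp > mysql config table
* Business flow
* generate mock data
* maxwell monitors the mysql data
* kafka receives the data sent by maxwell and stores it in the ODS layer (ods_base_db_m)
* baseDbTask consumes the kafka topic and splits the stream
* read the config table from mysql
* cache the config in a map
* check whether the table exists in phoenix (the SQL layer over hbase)
* route each table's data to its own DWD-layer topic
*/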
public class OrderWideApp {
public static void main(String[] args) {
//web UI mode; requires an extra pom dependency
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(new Configuration());
// StreamExecutionEnvironment env1 = StreamExecutionEnvironment.createLocalEnvironment();
//set the parallelism
env.setParallelism(4);
//configure checkpointing
// env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE);
// env.getCheckpointConfig().setCheckpointTimeout(60000);
// env.setStateBackend(new FsStateBackend("hdfs://hadoop101:9000/gmall/flink/checkpoint/uniqueVisit"));
// //user that accesses the HDFS files
// System.setProperty("HADOOP_USER_NAME","zhangbao");
//read orders and order details from the Kafka DWD topics
String orderInfoTopic = "dwd_order_info";
String orderDetailTopic = "dwd_order_detail";
String orderWideTopic = "dwm_order_wide";
String orderWideGroup = "order_wide_group";
//order data
FlinkKafkaConsumer<String> orderInfoSource = MyKafkaUtil.getKafkaSource(orderInfoTopic, orderWideGroup);
DataStreamSource<String> orderInfoDs = env.addSource(orderInfoSource);
//order detail data
FlinkKafkaConsumer<String> orderDetailSource = MyKafkaUtil.getKafkaSource(orderDetailTopic, orderWideGroup);
DataStreamSource<String> orderDetailDs = env.addSource(orderDetailSource);
//convert the order data
SingleOutputStreamOperator<OrderInfo> orderInfoObjDs = orderInfoDs.map(new RichMapFunction<String, OrderInfo>() {
@Override
public OrderInfo map(String jsonStr) throws Exception {
System.out.println("order info str >>> "+jsonStr);
OrderInfo orderInfo = JSONObject.parseObject(jsonStr, OrderInfo.class);
DateTime createTime = DateUtil.parse(orderInfo.getCreate_time(), "yyyy-MM-dd HH:mm:ss");
orderInfo.setCreate_ts(createTime.getTime());
return orderInfo;
}
});
//convert the order detail data
SingleOutputStreamOperator<OrderDetail> orderDetailObjDs = orderDetailDs.map(new RichMapFunction<String, OrderDetail>() {
@Override
public OrderDetail map(String jsonStr) throws Exception {
System.out.println("order detail str >>> "+jsonStr);
OrderDetail orderDetail = JSONObject.parseObject(jsonStr, OrderDetail.class);
DateTime createTime = DateUtil.parse(orderDetail.getCreate_time(), "yyyy-MM-dd HH:mm:ss");
orderDetail.setCreate_ts(createTime.getTime());
return orderDetail;
}
});
orderInfoObjDs.print("order info >>>");
orderDetailObjDs.print("order detail >>>");
//assign event-time timestamps and watermarks
//event-time field for orders
SingleOutputStreamOperator<OrderInfo> orderInfoWithTsDs = orderInfoObjDs.assignTimestampsAndWatermarks(
WatermarkStrategy
.<OrderInfo>forBoundedOutOfOrderness(Duration.ofSeconds(3))
.withTimestampAssigner(new SerializableTimestampAssigner<OrderInfo>() {
@Override
public long extractTimestamp(OrderInfo orderInfo, long l) {
return orderInfo.getCreate_ts();
}
})
);
//event-time field for order details
SingleOutputStreamOperator<OrderDetail> orderDetailWithTsDs = orderDetailObjDs.assignTimestampsAndWatermarks(
WatermarkStrategy.<OrderDetail>forBoundedOutOfOrderness(Duration.ofSeconds(3))
.withTimestampAssigner(new SerializableTimestampAssigner<OrderDetail>() {
@Override
public long extractTimestamp(OrderDetail orderDetail, long l) {
return orderDetail.getCreate_ts();
}
})
);
//key both streams by order id
KeyedStream<OrderInfo, Long> orderInfoKeysDs = orderInfoWithTsDs.keyBy(OrderInfo::getId);
KeyedStream<OrderDetail, Long> orderDetailKeysDs = orderDetailWithTsDs.keyBy(OrderDetail::getOrder_id);
/**
* interval-join
* https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/joining/#interval-join
*/
SingleOutputStreamOperator<OrderWide> orderWideDs = orderInfoKeysDs.intervalJoin(orderDetailKeysDs)
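// create_ts has one-second resolution, so the bounds are given in seconds; tune them to the expected delay between the two streams
.between(Time.seconds(-5), Time.seconds(5))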
.process(new ProcessJoinFunction<OrderInfo, OrderDetail, OrderWide>() {
@Override
public void processElement(OrderInfo orderInfo, OrderDetail orderDetail, ProcessJoinFunction<OrderInfo, OrderDetail, OrderWide>.Context context, Collector<OrderWide> out) throws Exception {
out.collect(new OrderWide(orderInfo, orderDetail));
}
});
orderWideDs.print("order wide ds >>>");
try {
env.execute("order wide task");
} catch (Exception e) {
e.printStackTrace();
}
}
}