Preface
This article takes a look at how Flink provides compatibility with Storm's StormTopology.
Example
@Test
public void testStormWordCount() throws Exception {
    //NOTE 1 build Topology the Storm way
    final TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new RandomWordSpout(), 1);
    builder.setBolt("count", new WordCountBolt(), 5)
            .fieldsGrouping("spout", new Fields("word"));
    builder.setBolt("print", new PrintBolt(), 1)
            .shuffleGrouping("count");

    //NOTE 2 convert StormTopology to FlinkTopology
    FlinkTopology flinkTopology = FlinkTopology.createTopology(builder);

    //NOTE 3 execute program locally using FlinkLocalCluster
    Config conf = new Config();
    // only required to stabilize integration test
    conf.put(FlinkLocalCluster.SUBMIT_BLOCKING, true);

    final FlinkLocalCluster cluster = FlinkLocalCluster.getLocalCluster();
    cluster.submitTopology("stormWordCount", conf, flinkTopology);
    cluster.shutdown();
}
- FlinkLocalCluster.getLocalCluster() is used here to create or obtain a FlinkLocalCluster; FlinkLocalCluster.submitTopology is then called to submit the topology, and FlinkLocalCluster.shutdown shuts the cluster down at the end
- The RandomWordSpout built here extends Storm's BaseRichSpout, WordCountBolt extends Storm's BaseBasicBolt, and PrintBolt extends Storm's BaseRichBolt (because Flink uses its own checkpoint mechanism and does not translate Storm's ack operations, there is no particular requirement to use BaseBasicBolt over BaseRichBolt); minimal sketches of these classes follow this list
- The topology passed to FlinkLocalCluster.submitTopology is the FlinkTopology converted from the StormTopology
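These user classes are not part of flink-storm, so here is a minimal sketch of what RandomWordSpout and WordCountBolt could look like (the class bodies, the word list, and everything beyond the "word"/"count" fields are assumptions inferred from the test; PrintBolt, a BaseRichBolt that simply logs each tuple, is omitted for brevity):

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Emits one random word per nextTuple() call.
class RandomWordSpout extends BaseRichSpout {
    private static final String[] WORDS = {"flink", "storm", "spout", "bolt"};
    private final Random rnd = new Random();
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        collector.emit(new Values(WORDS[rnd.nextInt(WORDS.length)]));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // matches fieldsGrouping("spout", new Fields("word")) in the test
        declarer.declare(new Fields("word"));
    }
}

// Keeps a running count per word and emits (word, count) pairs.
class WordCountBolt extends BaseBasicBolt {
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String word = input.getStringByField("word");
        Integer count = counts.merge(word, 1, Integer::sum);
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}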
LocalClusterFactory
flink-release-1.6.2/flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkLocalCluster.java
// ------------------------------------------------------------------------
//  Access to default local cluster
// ------------------------------------------------------------------------

// A different {@link FlinkLocalCluster} to be used for execution of ITCases
private static LocalClusterFactory currentFactory = new DefaultLocalClusterFactory();

/**
 * Returns a {@link FlinkLocalCluster} that should be used for execution. If no cluster was set by
 * {@link #initialize(LocalClusterFactory)} in advance, a new {@link FlinkLocalCluster} is returned.
 *
 * @return a {@link FlinkLocalCluster} to be used for execution
 */
public static FlinkLocalCluster getLocalCluster() {
    return currentFactory.createLocalCluster();
}

/**
 * Sets a different factory for FlinkLocalClusters to be used for execution.
 *
 * @param clusterFactory
 *      The LocalClusterFactory to create the local clusters for execution.
 */
public static void initialize(LocalClusterFactory clusterFactory) {
    currentFactory = Objects.requireNonNull(clusterFactory);
}

// ------------------------------------------------------------------------
//  Cluster factory
// ------------------------------------------------------------------------

/**
 * A factory that creates local clusters.
 */
public interface LocalClusterFactory {

    /**
     * Creates a local Flink cluster.
     * @return A local Flink cluster.
     */
    FlinkLocalCluster createLocalCluster();
}

/**
 * A factory that instantiates a FlinkLocalCluster.
 */
public static class DefaultLocalClusterFactory implements LocalClusterFactory {

    @Override
    public FlinkLocalCluster createLocalCluster() {
        return new FlinkLocalCluster();
    }
}
- Flink provides a static method getLocalCluster in FlinkLocalCluster for obtaining a FlinkLocalCluster; it creates the FlinkLocalCluster through a LocalClusterFactory
- The LocalClusterFactory used here is the DefaultLocalClusterFactory implementation, whose createLocalCluster method simply news up a FlinkLocalCluster
- With the current implementation, every call to FlinkLocalCluster.getLocalCluster creates a new FlinkLocalCluster, which is worth keeping in mind when calling it (see the sketch below for one way to change this behavior)
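Because the factory is pluggable via initialize, code that wants every getLocalCluster() call to return the same instance could install its own factory. This is a hypothetical sketch; SharedLocalClusterFactory is not part of flink-storm:

import org.apache.flink.storm.api.FlinkLocalCluster;

// Hands out one shared FlinkLocalCluster instead of a fresh instance per call.
class SharedLocalClusterFactory implements FlinkLocalCluster.LocalClusterFactory {
    private FlinkLocalCluster instance;

    @Override
    public synchronized FlinkLocalCluster createLocalCluster() {
        if (instance == null) {
            instance = new FlinkLocalCluster();
        }
        return instance;
    }
}

// install it before any getLocalCluster() call:
FlinkLocalCluster.initialize(new SharedLocalClusterFactory());
FlinkLocalCluster first = FlinkLocalCluster.getLocalCluster();
FlinkLocalCluster second = FlinkLocalCluster.getLocalCluster(); // same instance as `first`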
FlinkTopology
flink-release-1.6.2/flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java
/**
 * Creates a Flink program that uses the specified spouts and bolts.
 * @param stormBuilder The Storm topology builder to use for creating the Flink topology.
 * @return A {@link FlinkTopology} which contains the translated Storm topology and may be executed.
 */
public static FlinkTopology createTopology(TopologyBuilder stormBuilder) {
    return new FlinkTopology(stormBuilder);
}

private FlinkTopology(TopologyBuilder builder) {
    this.builder = builder;
    this.stormTopology = builder.createTopology();
    // extract the spouts and bolts
    this.spouts = getPrivateField("_spouts");
    this.bolts = getPrivateField("_bolts");

    this.env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Kick off the translation immediately
    translateTopology();
}
- FlinkTopology provides a static factory method createTopology for creating a FlinkTopology
- FlinkTopology first stores the TopologyBuilder, then uses getPrivateField (which calls getDeclaredField reflectively, sketched below) to read the private _spouts and _bolts fields and keeps them for the later topology translation
- It then obtains the ExecutionEnvironment and finally calls translateTopology to translate the whole StormTopology
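A minimal sketch of such a reflective helper follows; the actual getPrivateField in FlinkTopology.java may differ in details (for instance, it may also copy the returned map):

import java.lang.reflect.Field;
import java.util.Map;

// Reads a private field such as "_spouts" or "_bolts" from the TopologyBuilder.
@SuppressWarnings("unchecked")
private <T> Map<String, T> getPrivateField(String fieldName) {
    try {
        Field f = builder.getClass().getDeclaredField(fieldName);
        f.setAccessible(true); // the field is private in TopologyBuilder
        return (Map<String, T>) f.get(builder);
    } catch (NoSuchFieldException | IllegalAccessException e) {
        throw new RuntimeException("Couldn't get " + fieldName + " from TopologyBuilder", e);
    }
}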
translateTopology
flink-release-1.6.2/flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/FlinkTopology.java
/**
 * Creates a Flink program that uses the specified spouts and bolts.
 */
private void translateTopology() {

    unprocessdInputsPerBolt.clear();
    outputStreams.clear();
    declarers.clear();
    availableInputs.clear();

    // Storm defaults to parallelism 1
    env.setParallelism(1);

    /* Translation of topology */

    for (final Entry<String, IRichSpout> spout : spouts.entrySet()) {
        final String spoutId = spout.getKey();
        final IRichSpout userSpout = spout.getValue();

        final FlinkOutputFieldsDeclarer declarer = new FlinkOutputFieldsDeclarer();
        userSpout.declareOutputFields(declarer);
        final HashMap<String, Fields> sourceStreams = declarer.outputStreams;
        this.outputStreams.put(spoutId, sourceStreams);
        declarers.put(spoutId, declarer);

        final HashMap<String, DataStream<Tuple>> outputStreams = new HashMap<String, DataStream<Tuple>>();
        final DataStreamSource<?> source;

        if (sourceStreams.size() == 1) {
            final SpoutWrapper<Tuple> spoutWrapperSingleOutput = new SpoutWrapper<Tuple>(userSpout, spoutId, null, null);
            spoutWrapperSingleOutput.setStormTopology(stormTopology);

            final String outputStreamId = (String) sourceStreams.keySet().toArray()[0];

            DataStreamSource<Tuple> src = env.addSource(spoutWrapperSingleOutput, spoutId,
                    declarer.getOutputType(outputStreamId));

            outputStreams.put(outputStreamId, src);
            source = src;
        } else {
            final SpoutWrapper<SplitStreamType<Tuple>> spoutWrapperMultipleOutputs = new SpoutWrapper<SplitStreamType<Tuple>>(
                    userSpout, spoutId, null, null);
            spoutWrapperMultipleOutputs.setStormTopology(stormTopology);

            @SuppressWarnings({ "unchecked", "rawtypes" })
            DataStreamSource<SplitStreamType<Tuple>> multiSource = env.addSource(
                    spoutWrapperMultipleOutputs, spoutId,
                    (TypeInformation) TypeExtractor.getForClass(SplitStreamType.class));

            SplitStream<SplitStreamType<Tuple>> splitSource = multiSource
                    .split(new StormStreamSelector<Tuple>());
            for (String streamId : sourceStreams.keySet()) {
                SingleOutputStreamOperator<Tuple> outStream = splitSource.select(streamId)
                        .map(new SplitStreamMapper<Tuple>());
                outStream.getTransformation().setOutputType(declarer.getOutputType(streamId));
                outputStreams.put(streamId, outStream);
            }
            source = multiSource;
        }
        availableInputs.put(spoutId, outputStreams);

        final ComponentCommon common = stormTopology.get_spouts().get(spoutId).get_common();
        if (common.is_set_parallelism_hint()) {
            int dop = common.get_parallelism_hint();
            source.setParallelism(dop);
        } else {
            common.set_parallelism_hint(1);
        }
    }

    /**
     * 1. Connect all spout streams with bolts streams
     * 2. Then proceed with the bolts stream already connected
     *
     * <p>Because we do not know the order in which an iterator steps over a set, we might process a consumer before
     * its producer
     * ->thus, we might need to repeat multiple times
     */
    boolean makeProgress = true;
    while (bolts.size() > 0) {
        if (!makeProgress) {
            StringBuilder strBld = new StringBuilder();
            strBld.append("Unable to build Topology. Could not connect the following bolts:");
            for (String boltId : bolts.keySet()) {
                strBld.append("\n  ");
                strBld.append(boltId);
                strBld.append(": missing input streams [");
                for (Entry<GlobalStreamId, Grouping> streams : unprocessdInputsPerBolt
                        .get(boltId)) {
                    strBld.append("`");
                    strBld.append(streams.getKey().get_streamId());
                    strBld.append("` from `");
                    strBld.append(streams.getKey().get_componentId());
                    strBld.append("`; ");
                }
                strBld.append("]");
            }

            throw new RuntimeException(strBld.toString());
        }
        makeProgress = false;

        final Iterator<Entry<String, IRichBolt>> boltsIterator = bolts.entrySet().iterator();
        while (boltsIterator.hasNext()) {

            final Entry<String, IRichBolt> bolt = boltsIterator.next();
            final String boltId = bolt.getKey();
            final IRichBolt userBolt = copyObject(bolt.getValue());

            final ComponentCommon common = stormTopology.get_bolts().get(boltId).get_common();

            Set<Entry<GlobalStreamId, Grouping>> unprocessedBoltInputs = unprocessdInputsPerBolt.get(boltId);
            if (unprocessedBoltInputs == null) {
                unprocessedBoltInputs = new HashSet<>();
                unprocessedBoltInputs.addAll(common.get_inputs().entrySet());
                unprocessdInputsPerBolt.put(boltId, unprocessedBoltInputs);
            }

            // check if all inputs are available
            final int numberOfInputs = unprocessedBoltInputs.size();
            int inputsAvailable = 0;
            for (Entry<GlobalStreamId, Grouping> entry : unprocessedBoltInputs) {
                final String producerId = entry.getKey().get_componentId();
                final String streamId = entry.getKey().get_streamId();
                final HashMap<String, DataStream<Tuple>> streams = availableInputs.get(producerId);
                if (streams != null && streams.get(streamId) != null) {
                    inputsAvailable++;
                }
            }

            if (inputsAvailable != numberOfInputs) {
                // traverse other bolts first until inputs are available
                continue;
            } else {
                makeProgress = true;
                boltsIterator.remove();
            }

            final Map<GlobalStreamId, DataStream<Tuple>> inputStreams = new HashMap<>(numberOfInputs);

            for (Entry<GlobalStreamId, Grouping> input : unprocessedBoltInputs) {
                final GlobalStreamId streamId = input.getKey();
                final Grouping grouping = input.getValue();

                final String producerId = streamId.get_componentId();

                final Map<String, DataStream<Tuple>> producer = availableInputs.get(producerId);

                inputStreams.put(streamId, processInput(boltId, userBolt, streamId, grouping, producer));
            }

            final SingleOutputStreamOperator<?> outputStream = createOutput(boltId,
                    userBolt, inputStreams);

            if (common.is_set_parallelism_hint()) {
                int dop = common.get_parallelism_hint();
                outputStream.setParallelism(dop);
            } else {
                common.set_parallelism_hint(1);
            }

        }
    }
}
- The translation converts the spouts first and then the bolts; the spouts and bolts information it relies on was extracted from Storm's TopologyBuilder via reflection in the constructor
- Flink uses FlinkOutputFieldsDeclarer (which implements Storm's OutputFieldsDeclarer interface) to carry the declareOutputFields information configured in Storm's IRichSpout and IRichBolt; note, though, that Flink does not support direct emit. Calling userSpout.declareOutputFields copies the original spout's declaration information into the FlinkOutputFieldsDeclarer
- Flink wraps each spout in a SpoutWrapper, turning it into a RichParallelSourceFunction, and handles the spout differently depending on whether it declares more than one output stream; the RichParallelSourceFunction is then passed to StreamExecutionEnvironment.addSource to create a Flink DataStreamSource, which is added to availableInputs, and the DataStreamSource's parallelism is set from the spout's parallelism hint
- For bolt translation, unprocessdInputsPerBolt is maintained, keyed by boltId, with the GlobalStreamIds and Grouping modes the bolt connects to as values. Since iteration happens over a map, bolts may be visited out of order: when all of a bolt's input GlobalStreamIds are available it is translated and removed from bolts, and when some input is not yet in availableInputs the bolt is skipped for now and not removed. Because the outer loop keeps running while bolts is non-empty, this is exactly the mechanism that copes with the out-of-order traversal
- A key method in bolt translation is processInput, which translates a bolt's grouping into the corresponding operation on the producer's DataStream (for example, shuffleGrouping becomes rebalance, fieldsGrouping becomes keyBy, globalGrouping becomes global, and allGrouping becomes broadcast; see the sketch after this list). createOutput is then called to translate the bolt's execution logic: it wraps the bolt in a BoltWrapper or MergedInputsBoltWrapper to obtain a Flink OneInputStreamOperator, passes that to transform on the input stream to get back a Flink SingleOutputStreamOperator, adds that SingleOutputStreamOperator to availableInputs, and finally sets its parallelism from the bolt's parallelism hint
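As a simplified sketch of that grouping dispatch (a hypothetical helper, not the actual processInput, which additionally handles declarer bookkeeping, split streams, and unsupported groupings):

import java.util.List;

import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.storm.generated.Grouping;
import org.apache.storm.tuple.Fields;

// Translate one Storm Grouping into the equivalent DataStream operation.
static DataStream<Tuple> applyGrouping(Grouping grouping, DataStream<Tuple> input,
        Fields producerOutputFields) {
    if (grouping.is_set_shuffle()) {
        return input.rebalance(); // shuffleGrouping -> round-robin redistribution
    }
    if (grouping.is_set_fields()) {
        List<String> groupingFields = grouping.get_fields();
        if (!groupingFields.isEmpty()) {
            // fieldsGrouping -> keyBy on the corresponding field positions
            int[] keyIndexes = new int[groupingFields.size()];
            for (int i = 0; i < keyIndexes.length; i++) {
                keyIndexes[i] = producerOutputFields.fieldIndex(groupingFields.get(i));
            }
            return input.keyBy(keyIndexes);
        }
        // Storm encodes globalGrouping as a fields grouping with no fields
        return input.global();
    }
    if (grouping.is_set_all()) {
        return input.broadcast(); // allGrouping -> every subtask receives every tuple
    }
    throw new UnsupportedOperationException("grouping not covered by this sketch");
}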
FlinkLocalCluster
flink-storm_2.11-1.6.2-sources.jar!/org/apache/flink/storm/api/FlinkLocalCluster.java
/**
 * {@link FlinkLocalCluster} mimics a Storm {@link LocalCluster}.
 */
public class FlinkLocalCluster {

    /** The log used by this mini cluster. */
    private static final Logger LOG = LoggerFactory.getLogger(FlinkLocalCluster.class);

    /** The Flink mini cluster on which to execute the programs. */
    private FlinkMiniCluster flink;

    /** Configuration key to submit topology in blocking mode if flag is set to {@code true}. */
    public static final String SUBMIT_BLOCKING = "SUBMIT_STORM_TOPOLOGY_BLOCKING";

    public FlinkLocalCluster() {
    }

    public FlinkLocalCluster(FlinkMiniCluster flink) {
        this.flink = Objects.requireNonNull(flink);
    }

    @SuppressWarnings("rawtypes")
    public void submitTopology(final String topologyName, final Map conf, final FlinkTopology topology)
            throws Exception {
        this.submitTopologyWithOpts(topologyName, conf, topology, null);
    }

    @SuppressWarnings("rawtypes")
    public void submitTopologyWithOpts(final String topologyName, final Map conf, final FlinkTopology topology, final SubmitOptions submitOpts) throws Exception {
        LOG.info("Running Storm topology on FlinkLocalCluster");

        boolean submitBlocking = false;
        if (conf != null) {
            Object blockingFlag = conf.get(SUBMIT_BLOCKING);
            if (blockingFlag instanceof Boolean) {
                submitBlocking = ((Boolean) blockingFlag).booleanValue();
            }
        }

        FlinkClient.addStormConfigToTopology(topology, conf);

        StreamGraph streamGraph = topology.getExecutionEnvironment().getStreamGraph();
        streamGraph.setJobName(topologyName);

        JobGraph jobGraph = streamGraph.getJobGraph();

        if (this.flink == null) {
            Configuration configuration = new Configuration();
            configuration.addAll(jobGraph.getJobConfiguration());

            configuration.setString(TaskManagerOptions.MANAGED_MEMORY_SIZE, "0");
            configuration.setInteger(TaskManagerOptions.NUM_TASK_SLOTS, jobGraph.getMaximumParallelism());

            this.flink = new LocalFlinkMiniCluster(configuration, true);
            this.flink.start();
        }

        if (submitBlocking) {
            this.flink.submitJobAndWait(jobGraph, false);
        } else {
            this.flink.submitJobDetached(jobGraph);
        }
    }

    public void killTopology(final String topologyName) {
        this.killTopologyWithOpts(topologyName, null);
    }

    public void killTopologyWithOpts(final String name, final KillOptions options) {
    }

    public void activate(final String topologyName) {
    }

    public void deactivate(final String topologyName) {
    }

    public void rebalance(final String name, final RebalanceOptions options) {
    }

    public void shutdown() {
        if (this.flink != null) {
            this.flink.stop();
            this.flink = null;
        }
    }

    //......
}
- FlinkLocalCluster's submitTopology method delegates to submitTopologyWithOpts, which mainly sets a few parameters, calls topology.getExecutionEnvironment().getStreamGraph() to generate the StreamGraph from the transformations, obtains the JobGraph from it, then creates and starts a LocalFlinkMiniCluster, and finally submits the JobGraph using LocalFlinkMiniCluster's submitJobAndWait or submitJobDetached
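Read standalone, the StreamGraph/JobGraph derivation from the code above condenses to this sketch (toJobGraph is a hypothetical helper name, not a flink-storm API):

import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.storm.api.FlinkTopology;
import org.apache.flink.streaming.api.graph.StreamGraph;

// FlinkTopology -> StreamExecutionEnvironment -> StreamGraph -> JobGraph
static JobGraph toJobGraph(FlinkTopology topology, String jobName) {
    StreamGraph streamGraph = topology.getExecutionEnvironment().getStreamGraph();
    streamGraph.setJobName(jobName);
    return streamGraph.getJobGraph();
}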
Summary
- Through FlinkTopology, Flink offers a degree of compatibility with Storm, which is very helpful when migrating from Storm to Flink
- Running a Storm topology on Flink takes a few steps: build the native Storm TopologyBuilder, convert the StormTopology into a FlinkTopology via FlinkTopology.createTopology(builder), and finally submit the FlinkTopology through the submitTopology method of FlinkLocalCluster (local mode) or FlinkSubmitter (remote submission; a sketch closes this post)
- FlinkTopology is the core of Flink's Storm compatibility. It translates the StormTopology into the corresponding Flink constructs: SpoutWrapper turns each spout into a RichParallelSourceFunction, which is added to the StreamExecutionEnvironment to create a DataStream; each bolt's grouping is translated into the corresponding operation on the producer's DataStream (shuffleGrouping to rebalance, fieldsGrouping to keyBy, globalGrouping to global, allGrouping to broadcast); and BoltWrapper or MergedInputsBoltWrapper turns each bolt into a Flink OneInputStreamOperator, which is passed to transform on the stream
- Once the FlinkTopology is built, it is submitted with FlinkLocalCluster for local execution or with FlinkSubmitter for remote execution
- FlinkLocalCluster's submitTopology method essentially generates the StreamGraph from the StreamExecutionEnvironment backing the FlinkTopology, obtains the JobGraph from it, creates and starts a LocalFlinkMiniCluster, and finally submits the JobGraph through the LocalFlinkMiniCluster
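For completeness, here is a hedged sketch of remote submission with FlinkSubmitter, reusing the example classes from the beginning; the JobManager host/port values are assumptions for your environment:

import org.apache.flink.storm.api.FlinkSubmitter;
import org.apache.flink.storm.api.FlinkTopology;
import org.apache.storm.Config;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class RemoteSubmitExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new RandomWordSpout(), 1);
        builder.setBolt("count", new WordCountBolt(), 5)
                .fieldsGrouping("spout", new Fields("word"));

        Config conf = new Config();
        // FlinkSubmitter reuses Storm's nimbus config keys for the Flink
        // JobManager; the concrete values here are placeholders.
        conf.put(Config.NIMBUS_HOST, "jobmanager-host");
        conf.put(Config.NIMBUS_THRIFT_PORT, 6123);

        FlinkSubmitter.submitTopology("stormWordCount", conf,
                FlinkTopology.createTopology(builder));
    }
}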