Two examples (from the book Storm實戰 構建大資料實時計算)
Example 1: simulate a website computing per-user PV (page views)
The topology (original diagram omitted): LogReader spout → LogStat bolt (fields grouping on "user") → LogWriter bolt
1. Write the Topology
public class TopoMain {
    public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("log-reader", new LogReader(), 1);
        // Fields grouping on "user": all tuples for the same user go to the same
        // LogStat task, so each task's per-user counts stay consistent.
        builder.setBolt("log-stat", new LogStat(), 2).fieldsGrouping("log-reader", new Fields("user"));
        builder.setBolt("log-writer", new LogWriter(), 1).shuffleGrouping("log-stat");
        Config config = new Config();
        config.setNumWorkers(5);
        StormSubmitter.submitTopology("log-topology", config, builder.createTopology());
    }
}
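StormSubmitter requires a running Storm cluster. For quick experiments the same topology can be run in-process instead; a minimal sketch, assuming the classic backtype.storm API (LocalCluster, Utils) that these examples use:

// Local-mode alternative to StormSubmitter: runs the topology in-process.
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("log-topology", config, builder.createTopology());
Utils.sleep(10000);                    // let the topology run for a while
cluster.killTopology("log-topology");
cluster.shutdown();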
2. Write the Spout
public class LogReader extends BaseRichSpout {
    private SpoutOutputCollector _collector;
    private Random _rand = new Random();
    private int _count = 100;
    private String[] _users = {"userA", "userB", "userC", "userD", "userE"};
    private String[] _urls = {"url1", "url2", "url3", "url4", "url5"};

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void nextTuple() {
        try {
            Thread.sleep(1000);
            // Emits the whole batch of random log records in a single call;
            // once _count is exhausted, later calls emit nothing.
            while (_count-- >= 0) {
                _collector.emit(new Values(System.currentTimeMillis(),
                        _users[_rand.nextInt(5)], _urls[_rand.nextInt(5)]));
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("time", "user", "url"));
    }
}
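Note that nextTuple() above blocks for a second and then emits the whole batch in one call. Storm invokes nextTuple() in a loop on the spout's executor thread and recommends against blocking in it; a sketch of the more conventional one-tuple-per-call form (assuming backtype.storm.utils.Utils and the same fields):

// One tuple per nextTuple() call; back off briefly when there is nothing to emit.
@Override
public void nextTuple() {
    if (_count <= 0) {
        Utils.sleep(50);
        return;
    }
    _count--;
    _collector.emit(new Values(System.currentTimeMillis(),
            _users[_rand.nextInt(5)], _urls[_rand.nextInt(5)]));
}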
3. Write the Bolts
public class LogStat extends BaseRichBolt {
    private OutputCollector _collector; // note: this is OutputCollector, not SpoutOutputCollector
    private Map<String, Integer> _pvMap = new HashMap<String, Integer>();

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String user = input.getStringByField("user");
        if (_pvMap.containsKey(user)) {
            _pvMap.put(user, _pvMap.get(user) + 1);
        } else {
            _pvMap.put(user, 1);
        }
        // Emit the user's latest running PV to the next node.
        _collector.emit(new Values(user, _pvMap.get(user)));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("user", "pv"));
    }
}
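The containsKey/put branch in execute() can be collapsed into one line; a minimal equivalent, assuming Java 8+:

// Same counting logic: start from 0 when the user is new, then add 1.
_pvMap.put(user, _pvMap.getOrDefault(user, 0) + 1);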
public class LogWriter extends BaseRichBolt {
    private FileWriter writer = null;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        try {
            // "/" + this resolves to a file at the filesystem root named after this
            // task instance's toString(), so each task writes to its own file.
            writer = new FileWriter("/" + this);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void execute(Tuple input) {
        try {
            writer.write(input.getStringByField("user") + " : " + input.getIntegerByField("pv"));
            writer.write("\n");
            writer.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: emits nothing downstream.
    }
}
Analysis of the result:
Every incoming tuple produces the user's latest running PV, e.g. userA: 1, userB: 1, userA: 2, userA: 3, ...
Example 2: improve the example above so that each user's page-view total is written only once
1. Topology
public class TopoMain {
    public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("log-reader", new LogReader(), 1);
        // The "log" stream carries the log records and is fields-grouped on "user";
        // the "stop" stream tells downstream that emission is finished, and
        // allGrouping broadcasts that tuple to every LogStat task.
        builder.setBolt("log-stat", new LogStat(), 2)
                .fieldsGrouping("log-reader", "log", new Fields("user"))
                .allGrouping("log-reader", "stop");
        builder.setBolt("log-writer", new LogWriter(), 1).shuffleGrouping("log-stat");
        Config config = new Config();
        config.setNumWorkers(5);
        StormSubmitter.submitTopology("log-topology", config, builder.createTopology());
    }
}
2. Spout
public class LogReader extends BaseRichSpout {
    private SpoutOutputCollector _collector;
    private Random _rand = new Random();
    private int _count = 100;
    private String[] _users = {"userA", "userB", "userC", "userD", "userE"};
    private String[] _urls = {"url1", "url2", "url3", "url4", "url5"};

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void nextTuple() {
        try {
            Thread.sleep(1000);
            while (_count-- >= 0) {
                if (_count == 0) {
                    // Send an empty-string tuple on the "stop" stream to signal the downstream bolts.
                    _collector.emit("stop", new Values(""));
                }
                _collector.emit("log", new Values(System.currentTimeMillis(),
                        _users[_rand.nextInt(5)], _urls[_rand.nextInt(5)]));
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream("log", new Fields("time", "user", "url"));
        declarer.declareStream("stop", new Fields("flag"));
    }
}
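One subtlety: the stop tuple is emitted inside the loop at the point where _count reaches 0, so two more log tuples still follow it, and those views may arrive after a LogStat task has already dumped its map. A sketch that sends the stop signal strictly after the last log tuple (same streams and fields as above):

// Emit all log tuples first, then signal completion exactly once.
if (_count >= 0) {
    while (_count-- > 0) {
        _collector.emit("log", new Values(System.currentTimeMillis(),
                _users[_rand.nextInt(5)], _urls[_rand.nextInt(5)]));
    }
    _collector.emit("stop", new Values(""));  // _count is now -1, so this runs only once
}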
3. Bolts
public class LogStat extends BaseRichBolt {
    private OutputCollector _collector; // note: this is OutputCollector, not SpoutOutputCollector
    private Map<String, Integer> _pvMap = new HashMap<String, Integer>();

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String streamId = input.getSourceStreamId();
        if (streamId.equals("log")) {
            String user = input.getStringByField("user");
            if (_pvMap.containsKey(user)) {
                _pvMap.put(user, _pvMap.get(user) + 1);
            } else {
                _pvMap.put(user, 1);
            }
        }
        if (streamId.equals("stop")) {
            // End of input: emit the accumulated totals from the map.
            Iterator<Entry<String, Integer>> it = _pvMap.entrySet().iterator();
            while (it.hasNext()) {
                Entry<String, Integer> entry = it.next();
                _collector.emit(new Values(entry.getKey(), entry.getValue()));
            }
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("user", "pv"));
    }
}
public class LogWriter extends BaseRichBolt {
    private FileWriter writer = null;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        try {
            // As in Example 1: each task writes to its own file at the filesystem root.
            writer = new FileWriter("/" + this);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void execute(Tuple input) {
        try {
            writer.write(input.getStringByField("user") + " : " + input.getIntegerByField("pv"));
            writer.write("\n");
            writer.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: emits nothing downstream.
    }
}
Example 3: simulate a simplified e-commerce site and compute source-attributed transaction value in real time
1. Topology
public class TopoMain {
    public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("log-vspout", new VSpout(), 1);
        builder.setSpout("log-bspout", new BSpout(), 1);
        // Two named streams, "visit" and "business", both fields-grouped on "user"
        // so that a user's visit and purchase records meet in the same merge task.
        builder.setBolt("log-merge", new LogMergeBolt(), 2)
                .fieldsGrouping("log-vspout", "visit", new Fields("user"))
                .fieldsGrouping("log-bspout", "business", new Fields("user"));
        builder.setBolt("log-stat", new LogStatBolt(), 2).fieldsGrouping("log-merge", new Fields("srcid"));
        builder.setBolt("log-writer", new LogWriter()).globalGrouping("log-stat");
        Config conf = new Config();
        // This real-time computation does not need reliable messaging, so disable
        // the ackers to save communication resources.
        conf.setNumAckers(0);
        // Number of worker JVMs. Usually set equal to (or above) the total task count
        // of all spouts and bolts, so each task runs in its own JVM and GC does not
        // become a bottleneck from many tasks sharing one JVM.
        conf.setNumWorkers(8);
        StormSubmitter.submitTopology("ElectronicCom_Top", conf, builder.createTopology());
    }
}
2. Spouts
public class BSpout extends BaseRichSpout {
    private SpoutOutputCollector _collector;
    private String[] _users = {"userA", "userB", "userC", "userD", "userE"};
    private String[] _pays = {"100", "200", "300", "400", "200"};
    private int count = 5;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void nextTuple() {
        for (int i = 0; i < count; i++) {
            try {
                // Sleep longer than VSpout so the visit records reach the
                // downstream LogMergeBolt before the matching business records.
                Thread.sleep(1500);
                _collector.emit("business", new Values(System.currentTimeMillis(), _users[i], _pays[i]));
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream("business", new Fields("time", "user", "pay"));
    }
}
public class VSpout extends BaseRichSpout {
    private SpoutOutputCollector _collector;
    private String[] _users = {"userA", "userB", "userC", "userD", "userE"};
    private String[] _srcid = {"s1", "s2", "s3", "s1", "s1"};
    private int count = 5;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void nextTuple() {
        for (int i = 0; i < count; i++) {
            try {
                Thread.sleep(1000);
                _collector.emit("visit", new Values(System.currentTimeMillis(), _users[i], _srcid[i]));
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream("visit", new Fields("time", "user", "srcid"));
    }
}
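Because nextTuple() is invoked repeatedly, both spouts above replay the same five sample records on every call, so the downstream totals keep growing. If each record should be emitted only once, a small guard suffices; a sketch for VSpout, where the _done flag is an addition for illustration (assumes backtype.storm.utils.Utils):

private boolean _done = false;

@Override
public void nextTuple() {
    if (_done) {
        Utils.sleep(100);   // nothing left to emit; avoid busy-spinning
        return;
    }
    for (int i = 0; i < count; i++) {
        Utils.sleep(1000);
        _collector.emit("visit", new Values(System.currentTimeMillis(), _users[i], _srcid[i]));
    }
    _done = true;
}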
3. Bolts
public class LogMergeBolt extends BaseRichBolt {
    private OutputCollector _collector;
    // Temporarily holds each user's visit record until the purchase arrives.
    private HashMap<String, String> srcMap;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
        if (srcMap == null) {
            srcMap = new HashMap<String, String>();
        }
    }

    @Override
    public void execute(Tuple input) {
        String streamID = input.getSourceStreamId();
        if (streamID.equals("visit")) {
            String user = input.getStringByField("user");
            String srcid = input.getStringByField("srcid");
            srcMap.put(user, srcid);
        }
        if (streamID.equals("business")) {
            String user = input.getStringByField("user");
            String pay = input.getStringByField("pay");
            String srcid = srcMap.get(user);
            if (srcid != null) {
                // Value order must match the fields declared below: ("user", "srcid", "pay").
                _collector.emit(new Values(user, srcid, pay));
                srcMap.remove(user);
            } else {
                // Normally happens only when a business record arrives before its visit record.
            }
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("user", "srcid", "pay"));
    }
}
public class LogStatBolt extends BaseRichBolt {
    private OutputCollector _collector;
    private HashMap<String, Long> srcpay;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
        if (srcpay == null) {
            srcpay = new HashMap<String, Long>();
        }
    }

    @Override
    public void execute(Tuple input) {
        String pay = input.getStringByField("pay");
        String srcid = input.getStringByField("srcid");
        if (srcpay.containsKey(srcid)) {
            srcpay.put(srcid, srcpay.get(srcid) + Long.parseLong(pay.trim()));
        } else {
            srcpay.put(srcid, Long.parseLong(pay.trim()));
        }
        _collector.emit(new Values(srcid, srcpay.get(srcid)));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("srcid", "pay"));
    }
}
public class LogWriter extends BaseRichBolt {
    private HashMap<String, Long> counts = null;
    private FileWriter writer = null;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.counts = new HashMap<String, Long>();
        try {
            // "/" + this resolves to a file at the filesystem root named after this
            // task instance's toString(), so each task writes to its own file.
            writer = new FileWriter("/" + this);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void execute(Tuple input) {
        // Keep only the latest running total per source; cleanup() writes the final values.
        String srcid = input.getStringByField("srcid");
        Long pay = input.getLongByField("pay");
        counts.put(srcid, pay);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: emits nothing downstream.
    }

    @Override
    public void cleanup() {
        Iterator<Entry<String, Long>> it = counts.entrySet().iterator();
        while (it.hasNext()) {
            Entry<String, Long> entry = it.next();
            try {
                writer.write(entry.getKey() + " : " + entry.getValue());
                writer.write("\n");
                writer.flush();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
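A caveat on relying on cleanup(): Storm only reliably calls cleanup() when a topology is killed in local mode; on a real cluster the worker JVM can be killed without it ever running. A more robust pattern is to flush the totals periodically using tick tuples; a sketch, assuming Storm 0.8+ and backtype.storm.Constants:

// Ask Storm to send this bolt a system "tick" tuple every 10 seconds.
@Override
public Map<String, Object> getComponentConfiguration() {
    Config conf = new Config();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
    return conf;
}

// In execute(): when this returns true, write out the current totals
// instead of waiting for cleanup().
private static boolean isTickTuple(Tuple tuple) {
    return Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
            && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
}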