Storm topology takes too long to start

Posted by 滄南 on 2017-03-01

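Recently, after a topology was submitted to our Storm cluster, its workers took a long time to start successfully. The UI showed individual workers repeatedly trying slots on different machines, and the worker log contained the following error: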

2017-03-01T18:47:51.785+0800 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.1.3.41/10.1.3.41:2181, sessionid = 0x25a44c27616369d, negotiated timeout = 120000
2017-03-01T18:47:51.785+0800 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2017-03-01T18:47:51.817+0800 b.s.d.worker [INFO] Reading Assignments.
2017-03-01T18:47:51.968+0800 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
2017-03-01T18:47:52.119+0800 b.s.d.worker [INFO] Launching receive-thread for 21e824b9-6ed0-471f-a34d-278d2648ddc0:6719
2017-03-01T18:47:52.131+0800 b.s.m.n.Server [INFO] Create Netty Server Netty-server-localhost-6719, buffer_size: 5242880, maxWorkers: 1
2017-03-01T18:47:52.154+0800 b.s.d.worker [ERROR] Error on initialization of server mk-worker
org.apache.storm.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:6719
    at org.apache.storm.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.messaging.netty.Server.<init>(Server.java:130) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.messaging.netty.Context.bind(Context.java:75) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.messaging.loader$launch_receive_thread_BANG_.doInvoke(loader.clj:68) ~[storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.RestFn.invoke(RestFn.java:668) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$launch_receive_thread.invoke(worker.clj:378) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.worker$fn__6959$exec_fn__1103__auto____6960.invoke(worker.clj:413) ~[storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.5.1.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
    at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$fn__6959$mk_worker__7015.doInvoke(worker.clj:391) [storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$_main.invoke(worker.clj:502) [storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.5.jar:0.9.5]
    java.net.BindException: 地址已在使用
    at sun.nio.ch.Net.bind0(Native Method) ~[na:1.7.0_67]
    at sun.nio.ch.Net.bind(Net.java:444) ~[na:1.7.0_67]
    at sun.nio.ch.Net.bind(Net.java:436) ~[na:1.7.0_67]
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) ~[na:1.7.0_67]
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) ~[na:1.7.0_67]
    at org.apache.storm.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[storm-core-0.9.5.jar:0.9.5]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_67]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_67]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_67]

When this worker tried to bind port 6719, it failed with java.net.BindException: 地址已在使用 ("Address already in use").

Check which process is using port 6719:

netstat -anp|grep 6719
tcp        0      0 ::ffff:10.1.3.57:6719       ::ffff:10.1.24.102:2181     ESTABLISHED 16816/java
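Note that grep matches 6719 anywhere on the line, including the remote address column. A minimal sketch, assuming the Linux net-tools column layout shown above, that filters on the local port only and extracts the owning PID; the names Conn, parse_netstat_line and owners_of_local_port are hypothetical helpers, not part of Storm or netstat:

```python
from typing import List, NamedTuple, Optional


class Conn(NamedTuple):
    local_port: int
    remote: str
    state: str
    pid: Optional[int]
    program: Optional[str]


def parse_netstat_line(line: str) -> Conn:
    """Parse one TCP line of `netstat -anp` output (assumed net-tools layout)."""
    # Columns: proto, recv-q, send-q, local address, foreign address, state, pid/program
    fields = line.split()
    local, remote, state = fields[3], fields[4], fields[5]
    # The port is whatever follows the last colon (also works for ::ffff:a.b.c.d:port).
    local_port = int(local.rsplit(":", 1)[1])
    pid, program = None, None
    # The last column is "-" when netstat lacks permission to see the owner.
    if len(fields) > 6 and "/" in fields[6]:
        pid_text, program = fields[6].split("/", 1)
        pid = int(pid_text)
    return Conn(local_port, remote, state, pid, program)


def owners_of_local_port(netstat_output: str, port: int) -> List[Conn]:
    """Return the TCP connections whose LOCAL port equals `port`."""
    result = []
    for line in netstat_output.splitlines():
        if not line.startswith("tcp"):
            continue  # skip headers, UDP and UNIX socket lines
        conn = parse_netstat_line(line)
        if conn.local_port == port:
            result.append(conn)
    return result
```

Fed the output line above, owners_of_local_port(output, 6719) yields a single connection owned by PID 16816 whose remote end is port 2181, while asking for 2181 yields nothing because that port appears only on the remote side.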

Details of process 16816:

root     16816 71.3  1.1 12586988 2262020 ?    Sl   17:16  62:58 /usr/local/webserver/jdk1.7.0_67/bin/java -Dlogfile.name=worker-6705.log -Dstorm.home=/usr/local/webserver/apache-storm-0.9.5 -Dstorm.conf.file= -Dstorm.options= -Dstorm.log.dir=/data/logs/storm -Dlogback.configurationFile=/usr/local/webserver/apache-storm-0.9.5/logback/cluster.xml  -Dworker.port=6705 -cp /data/server/storm/supervisor/stormdist/***/stormjar.jar backtype.storm.daemon.worker logtrack_online@di-667-1484818913 21e824b9-6ed0-471f-a34d-278d2648ddc0

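The startup arguments show that this process is another worker, assigned to port 6705 (-Dworker.port=6705). In the netstat output for 6719, that port is only the local end of an outbound connection from this process to 10.1.24.102:2181 (2181 is ZooKeeper's default client port), so the existing worker was holding the port that the new worker needed to bind. The connections on port 6705 belong to the same process: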

netstat -anp | grep 6705

tcp        0      0 ::ffff:10.1.3.57:6705       ::ffff:10.1.3.52:2776       ESTABLISHED 16816/java          
tcp        0      0 ::ffff:10.1.3.57:6705       ::ffff:10.1.3.51:2349       ESTABLISHED 16816/java          
tcp        0      0 ::ffff:10.1.3.57:6705       ::ffff:10.1.3.52:1812       ESTABLISHED 16816/java          
tcp        0      0 ::ffff:10.1.3.57:6705       ::ffff:10.1.3.46:57767      ESTABLISHED 16816/java          
tcp     1551      0 ::ffff:10.1.3.57:6705       ::ffff:10.1.3.58:7088       ESTABLISHED 16816/java

After killing that worker, port 6719 could be used normally again.
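The post does not say why an outbound connection was given local port 6719. One plausible explanation, stated here as an assumption, is that the worker slot ports fall inside the kernel's ephemeral port range, so any outbound connection can be assigned one of them. The sketch below checks for that overlap; on Linux the range is exposed in /proc/sys/net/ipv4/ip_local_port_range, and the function names and sample values are hypothetical:

```python
from typing import Iterable, List, Tuple

# Linux exposes the ephemeral (local) port range here as "low<TAB>high".
PORT_RANGE_PATH = "/proc/sys/net/ipv4/ip_local_port_range"


def parse_port_range(text: str) -> Tuple[int, int]:
    """Parse the two whitespace-separated integers of ip_local_port_range."""
    low, high = text.split()
    return int(low), int(high)


def read_local_port_range(path: str = PORT_RANGE_PATH) -> str:
    """Read the raw range text from procfs (Linux only)."""
    with open(path) as f:
        return f.read()


def slots_in_ephemeral_range(range_text: str, slot_ports: Iterable[int]) -> List[int]:
    """Return the slot ports that the kernel may also hand out as ephemeral ports."""
    low, high = parse_port_range(range_text)
    return [port for port in slot_ports if low <= port <= high]
```

On a supervisor host this could be called as slots_in_ephemeral_range(read_local_port_range(), ports), with ports taken from supervisor.slots.ports in storm.yaml. A non-empty result means the collision seen here can recur; moving the slot ports outside the range, or listing them in the net.ipv4.ip_local_reserved_ports sysctl, are the usual ways to avoid it.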
