Monitoring Spark with Ganglia

Published by 筆尖的痕 on 2016-03-13
In the earlier post 《Spark Metrics配置詳解》 on this blog, we walked through configuring Spark Metrics and mentioned that Spark's monitoring supports a Ganglia sink.

  Ganglia is an open-source cluster monitoring project started at UC Berkeley. It is mainly used to monitor system performance: CPU and memory usage, disk utilization, I/O load, network traffic, and so on. Its graphs make it easy to see the state of every node at a glance, which is a great help when tuning and allocating cluster resources to improve overall performance.

  Because of license restrictions (the Ganglia integration is LGPL), it is not included in Spark's default build; if you need it, you have to build it in yourself. When building Spark with Maven, add the -Pspark-ganglia-lgpl profile so that the Ganglia-related classes are packaged into spark-assembly-x.x.x-hadoopx.x.x.jar. For example:

[iteblog@iteblog spark]$ ./make-distribution.sh --tgz -Phadoop-2.4 -Pyarn -DskipTests \
    -Dhadoop.version=2.4.0 -Pspark-ganglia-lgpl

  If you build with SBT instead, set SPARK_GANGLIA_LGPL=true; the full command is:

[iteblog@iteblog spark]$ SPARK_HADOOP_VERSION=2.4.0 SPARK_YARN=true \
    SPARK_GANGLIA_LGPL=true sbt/sbt assembly

  Alternatively, when submitting a job, add the Ganglia dependency separately via the --jars option:

--jars lib/spark-ganglia-lgpl_2.10-x.x.x.jar  ...

  Once the dependency is in place, add the following configuration to $SPARK_HOME/conf/metrics.properties:

*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host=www.iteblog.com
*.sink.ganglia.port=8080
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
*.sink.ganglia.ttl=1
*.sink.ganglia.mode=multicast

The host and port settings are the address of the Ganglia daemon (gmond) that receives the metrics (gmond listens on port 8649 by default), and mode supports two modes: 'unicast' and 'multicast'.
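For comparison, a unicast setup would look roughly like the sketch below; the host here is a placeholder for a single gmond instance configured with a unicast udp_recv_channel, not a value from this cluster:

*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host=192.168.1.10
*.sink.ganglia.port=8649
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
*.sink.ganglia.mode=unicast

In unicast mode the ttl setting (which controls the multicast time-to-live) is not relevant, so it is omitted here.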

  If you see an exception like the following:
15/06/11 23:35:14 ERROR MetricsSystem: Sink class org.apache.spark.metrics.sink.GangliaSink cannot be instantialized
java.lang.ClassNotFoundException: org.apache.spark.metrics.sink.GangliaSink
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:191)
        at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:138)
        at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:134)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:134)
        at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:84)
        at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:171)
        at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:106)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at akka.util.Reflect$.instantiate(Reflect.scala:65)
        at akka.actor.Props.newActor(Props.scala:337)
        at akka.actor.ActorCell.newActor(ActorCell.scala:534)
        at akka.actor.ActorCell.create(ActorCell.scala:560)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:425)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
        at akka.dispatch.Mailbox.run(Mailbox.scala:218)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

  check whether org.apache.spark.metrics.sink.GangliaSink was actually packaged into your Spark build, or double-check your configuration file carefully against the one provided above.
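A quick way to confirm the class made it into the assembly is to list the jar's entries and look for the GangliaSink class file. A minimal sketch (the jar path in the comment is hypothetical; point it at your actual assembly jar):

```python
import zipfile

def has_ganglia_sink(jar_path):
    """Return True if the GangliaSink class file is present in the given jar.

    A jar is just a zip archive, so zipfile can list its entries directly.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return any(
            name.endswith("org/apache/spark/metrics/sink/GangliaSink.class")
            for name in jar.namelist()
        )

# Example (hypothetical path -- substitute your own build's jar):
# has_ganglia_sink("lib/spark-assembly-x.x.x-hadoopx.x.x.jar")
```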

  Once the configuration is done, start your Spark cluster, then open the Ganglia web UI at http://www.iteblog.com/ganglia-web to check whether everything works; you should see something like the screenshot below:



  Besides the master.apps and master.workers metrics shown in the screenshot above, Ganglia also reports information like the following:

{
    "version": "3.0.0",
    "gauges": {
        "jvm.PS-MarkSweep.count": {
            "value": 0
        },
        "jvm.PS-MarkSweep.time": {
            "value": 0
        },
        "jvm.PS-Scavenge.count": {
            "value": 186
        },
        "jvm.PS-Scavenge.time": {
            "value": 375
        },
        "jvm.heap.committed": {
            "value": 536412160
        },
        "jvm.heap.init": {
            "value": 536870912
        },
        "jvm.heap.max": {
            "value": 536412160
        },
        "jvm.heap.usage": {
            "value": 0.315636349481712
        },
        "jvm.heap.used": {
            "value": 169311176
        },
        "jvm.non-heap.committed": {
            "value": 37879808
        },
        "jvm.non-heap.init": {
            "value": 24313856
        },
        "jvm.non-heap.max": {
            "value": 184549376
        },
        "jvm.non-heap.usage": {
            "value": 0.19970542734319513
        },
        "jvm.non-heap.used": {
            "value": 36855512
        },
        "jvm.pools.Code-Cache.usage": {
            "value": 0.031689961751302086
        },
        "jvm.pools.PS-Eden-Space.usage": {
            "value": 0.9052384254331968
        },
        "jvm.pools.PS-Old-Gen.usage": {
            "value": 0.02212668565200476
        },
        "jvm.pools.PS-Perm-Gen.usage": {
            "value": 0.26271122694015503
        },
        "jvm.pools.PS-Survivor-Space.usage": {
            "value": 0.5714285714285714
        },
        "jvm.total.committed": {
            "value": 574291968
        },
        "jvm.total.init": {
            "value": 561184768
        },
        "jvm.total.max": {
            "value": 720961536
        },
        "jvm.total.used": {
            "value": 206166688
        },
        "master.apps": {
            "value": 0
        },
        "master.waitingApps": {
            "value": 0
        },
        "master.workers": {
            "value": 0
        }
    },
    "counters": { },
    "histograms": { },
    "meters": { },
    "timers": { }
}
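The snapshot above follows the standard Codahale/Dropwizard Metrics JSON layout (top-level gauges, counters, histograms, meters, timers maps), so individual values are easy to pull out programmatically. A small sketch using a trimmed copy of the snapshot above rather than fetching it from a live master:

```python
import json

# A trimmed subset of the metrics snapshot shown above.
snapshot = json.loads("""
{
    "version": "3.0.0",
    "gauges": {
        "jvm.heap.used":  {"value": 169311176},
        "master.apps":    {"value": 0},
        "master.workers": {"value": 0}
    }
}
""")

def gauge(metrics, name):
    """Read a single gauge value out of a Dropwizard-style metrics snapshot."""
    return metrics["gauges"][name]["value"]

# Convert heap usage from bytes to megabytes for readability.
heap_mb = gauge(snapshot, "jvm.heap.used") / 1024.0 / 1024.0
print("workers: %d, heap used: %.1f MB"
      % (gauge(snapshot, "master.workers"), heap_mb))
```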
