Troubleshooting a Slow-Starting Java Application -- A Puzzle from the JDK's SecureRandom

By 等你歸去來, published 2019-05-09

  While developing a project, I set up a test environment for a requirement. Done in no time!

  Later, as lots of new features went into the code, I had to restart the application over and over, and that is when I noticed startup had become painfully slow. For an application that does not do much to begin with, this added a lot of friction.

  I decided to change that: find the root cause of the slow startup and make it fast again!

Line of attack:

  1. Startup is slow, so check the logs first: any exceptions?
  2. Is GC the problem? Is it eating a lot of time?
  3. The log output stalls: is some network request failing to get through?
  4. While it is stuck, check what the application is actually doing?
  5. Print the thread stacks: what are the threads doing? And dump the heap for a look?
  6. If all else fails and local debug is not possible, try another test machine?
  7. If that still does not work, remote debug it?
  8. Once the spot is located, think about a good fix?
  9. Beyond fixing this problem, what else can we take away?

Let me walk through the process step by step!

1. Startup is slow, so check the logs first: any exceptions?

  This one is answered quickly: look at the logs, and nothing shows up while you wait!

  But we tend to linger at this stage for quite a while, because we always want to print something more through other means!

  So we tweak the log level, wrap things in try... catch..., and so on, and only then admit: fine, let's think of something else! Personally I still think this instinct is reasonable!

 

2. Is GC the problem? Is it eating a lot of time?

  The catch here is that GC logging may not have been enabled before, so you have to go back and add GC output to the startup arguments, like so:

-Xloggc:/opt/logs/a_gc.log -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/logs/ -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+ParallelRefProcEnabled -XX:ErrorFile=/run/dump/hs_err_pid%p.log -XX:HeapDumpPath=/run/dump

  With that turned on, the GC log you get probably looks like the one below. You try to find something in it, there is no frequent Full GC or anything of the sort, and you walk away disappointed!

10.224: [GC concurrent-root-region-scan-start]
10.249: [GC concurrent-root-region-scan-end, 0.0252691 secs]
10.249: [GC concurrent-mark-start]
10.250: [GC concurrent-mark-end, 0.0005479 secs]
10.250: [GC remark 10.251: [Finalize Marking, 0.0001862 secs] 10.251: [GC ref-proc, 0.0026452 secs] 10.253: [Unloading, 0.0125200 secs], 0.0156381 secs]
 [Times: user=0.06 sys=0.00, real=0.02 secs]
10.266: [GC cleanup 33M->33M(1024M), 0.0012152 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]
13.768: [GC pause (G1 Evacuation Pause) (young), 0.0411037 secs]
   [Parallel Time: 34.5 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 13768.7, Avg: 13769.8, Max: 13773.1, Diff: 4.4]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 1.8, Max: 4.1, Diff: 4.1, Sum: 7.3]
      [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
      [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5]
      [Code Root Scanning (ms): Min: 0.0, Avg: 3.1, Max: 8.8, Diff: 8.8, Sum: 12.5]
      [Object Copy (ms): Min: 21.1, Avg: 28.0, Max: 30.5, Diff: 9.4, Sum: 111.9]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
         [Termination Attempts: Min: 1, Avg: 12.2, Max: 22, Diff: 21, Sum: 49]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 29.9, Avg: 33.2, Max: 34.3, Diff: 4.4, Sum: 132.6]
      [GC Worker End (ms): Min: 13803.0, Avg: 13803.0, Max: 13803.0, Diff: 0.0]
   [Code Root Fixup: 0.4 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 5.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 4.6 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.9 ms]
   [Eden: 544.0M(544.0M)->0.0B(573.0M) Survivors: 31.0M->41.0M Heap: 575.0M(1024.0M)->41.0M(1024.0M)]
 [Times: user=0.13 sys=0.02, real=0.04 secs]
17.178: [GC pause (G1 Evacuation Pause) (young), 0.0648366 secs]
   [Parallel Time: 59.7 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 17177.7, Avg: 17181.1, Max: 17184.4, Diff: 6.6]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 3.9, Max: 12.7, Diff: 12.7, Sum: 15.6]
      [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
      [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.2, Sum: 0.6]
      [Code Root Scanning (ms): Min: 0.0, Avg: 6.5, Max: 19.4, Diff: 19.4, Sum: 26.0]
      [Object Copy (ms): Min: 34.9, Avg: 45.5, Max: 51.2, Diff: 16.3, Sum: 182.1]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.3]
         [Termination Attempts: Min: 1, Avg: 16.5, Max: 24, Diff: 23, Sum: 66]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
      [GC Worker Total (ms): Min: 52.9, Avg: 56.2, Max: 59.5, Diff: 6.6, Sum: 224.8]
      [GC Worker End (ms): Min: 17237.3, Avg: 17237.3, Max: 17237.3, Diff: 0.0]
   [Code Root Fixup: 0.5 ms]
20.648: [GC pause (G1 Evacuation Pause) (young), 0.0653569 secs]
   [Parallel Time: 55.9 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 20648.5, Avg: 20650.4, Max: 20652.3, Diff: 3.7]
      [Ext Root Scanning (ms): Min: 0.1, Avg: 2.9, Max: 9.4, Diff: 9.4, Sum: 11.7]
      [Update RS (ms): Min: 0.0, Avg: 0.2, Max: 0.5, Diff: 0.5, Sum: 0.7]
         [Processed Buffers: Min: 0, Avg: 2.5, Max: 7, Diff: 7, Sum: 10]
      [Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.6, Diff: 0.6, Sum: 1.3]
      [Code Root Scanning (ms): Min: 0.0, Avg: 3.2, Max: 10.8, Diff: 10.8, Sum: 12.9]
      [Object Copy (ms): Min: 39.4, Avg: 46.4, Max: 52.1, Diff: 12.8, Sum: 185.5]
      [Termination (ms): Min: 0.0, Avg: 0.9, Max: 1.5, Diff: 1.5, Sum: 3.7]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [GC Worker Total (ms): Min: 52.1, Avg: 54.0, Max: 55.8, Diff: 3.7, Sum: 215.8]
      [GC Worker End (ms): Min: 20704.4, Avg: 20704.4, Max: 20704.4, Diff: 0.0]
   [Code Root Fixup: 0.2 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 9.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 8.5 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.2 ms]
   [Eden: 38.0M(8192.0K)->0.0B(610.0M) Survivors: 43.0M->4096.0K Heap: 83.3M(1024.0M)->54.4M(1024.0M)]
 [Times: user=0.24 sys=0.02, real=0.06 secs]
2826.037: [GC pause (Metadata GC Threshold) (young) (initial-mark), 0.0101504 secs]
   [Parallel Time: 7.1 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 2826037.4, Avg: 2826037.5, Max: 2826037.5, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 1.7, Avg: 2.0, Max: 2.2, Diff: 0.5, Sum: 7.8]
      [Update RS (ms): Min: 0.9, Avg: 1.0, Max: 1.0, Diff: 0.1, Sum: 3.8]
         [Processed Buffers: Min: 7, Avg: 14.2, Max: 25, Diff: 18, Sum: 57]
      [Scan RS (ms): Min: 0.3, Avg: 0.7, Max: 0.9, Diff: 0.6, Sum: 2.8]
      [Code Root Scanning (ms): Min: 0.2, Avg: 0.4, Max: 0.8, Diff: 0.6, Sum: 1.5]
      [Object Copy (ms): Min: 2.7, Avg: 2.9, Max: 3.2, Diff: 0.5, Sum: 11.7]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 6.9, Avg: 7.0, Max: 7.0, Diff: 0.1, Sum: 27.8]
      [GC Worker End (ms): Min: 2826044.4, Avg: 2826044.4, Max: 2826044.4, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 2.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.6 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.0 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.5 ms]
   [Eden: 354.0M(438.0M)->0.0B(603.0M) Survivors: 4096.0K->11.0M Heap: 408.3M(1024.0M)->61.4M(1024.0M)]
 [Times: user=0.03 sys=0.00, real=0.01 secs]
2826.047: [GC concurrent-root-region-scan-start]
2826.054: [GC concurrent-root-region-scan-end, 0.0066742 secs]
2826.054: [GC concurrent-mark-start]
2826.116: [GC concurrent-mark-end, 0.0616457 secs]
2826.116: [GC remark 2826.116: [Finalize Marking, 0.0002202 secs] 2826.116: [GC ref-proc, 0.0031923 secs] 2826.120: [Unloading, 0.0131458 secs], 0.0169584 secs]
 [Times: user=0.05 sys=0.00, real=0.01 secs]
2826.134: [GC cleanup 62M->62M(1024M), 0.0010641 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]

  Either way, you now know memory is not the problem. Then again, there probably are not many applications that already have memory trouble right at startup!
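
  Besides reading the log file, you can also pull the accumulated GC time from inside the JVM through the standard GarbageCollectorMXBean API. A minimal sketch: if startup takes minutes but the totals printed below stay tiny, GC is clearly not the culprit.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeCheck {
    public static void main(String[] args) {
        // Each bean reports the number of collections and the total time spent since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}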

 

3. The log output stalls: is some network request failing to get through?

  Generally our applications log something around every key operation, so when the log output stalls, something has probably gone wrong at that point!

  Having ruled out a Full GC pause, we can now suspect a network problem.

  The essential tools for chasing network problems: tcpdump, lsof ...

  Usage is simple. Even with no information at all, we can start capturing packets to get a hint about where to look:

tcpdump -iany -XX     # shows the application's network I/O; if there is too much output, drop the -XX option and just watch the traffic flow

  The output looks roughly like this:

16:54:25.770463 IP 10.5.3.215.63773 > zt-d-xx.ssh: Flags [.], ack 125340, win 552, length 0
16:54:25.770478 IP zt-d-xx.ssh > 10.5.3.215.63773: Flags [P.], seq 127152:127332, ack 1, win 561, length 180
16:54:25.770482 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 125340, win 552, options [nop,nop,sack 1 {125504:125684}], length 0
16:54:25.770487 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 125684, win 556, length 0
16:54:25.770554 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 127332:127864, ack 1, win 561, length 532
16:54:25.770579 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 127864:127916, ack 1, win 561, length 52
16:54:25.770612 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 127916:128208, ack 1, win 561, length 292
16:54:25.771813 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 125684, win 556, options [nop,nop,sack 1 {125864:126044}], length 0
16:54:25.771822 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 128208:128388, ack 1, win 561, length 180
16:54:25.771837 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 126044, win 555, length 0
16:54:25.771840 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 126792, win 552, length 0
16:54:25.771868 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 128388:128808, ack 1, win 561, length 420
16:54:25.771884 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 128808:128972, ack 1, win 561, length 164
16:54:25.771891 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 127152, win 556, length 0
16:54:25.771934 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 128972:129376, ack 1, win 561, length 404
16:54:25.771959 IP zt-d-xx.ssh > 10.5.2.215.63773: Flags [P.], seq 129376:129556, ack 1, win 561, length 180
16:54:25.771982 IP zt-d-xx.ssh > 10.5.2.215.63773: Flags [P.], seq 129556:129736, ack 1, win 561, length 180
16:54:25.779454 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 127864, win 553, length 0
16:54:25.779469 IP zt-d-xx.ssh > 10.5.2.215.63773: Flags [P.], seq 129736:129916, ack 1, win 561, length 180
16:54:25.779474 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 128208, win 552, length 0
16:54:25.779476 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 128808, win 556, length 0
16:54:25.779478 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 129376, win 554, length 0
16:54:25.779480 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 129736, win 553, length 0
16:54:25.779529 IP zt-d-xx.ssh > 10.5.3.215.63773: Flags [P.], seq 129916:130496, ack 1, win 561, length 580
16:54:25.779567 IP zt-d-xx.ssh > 10.5.3.215.63773: Flags [P.], seq 130496:130660, ack 1, win 561, length 164

  Here we mainly care about the Flags field. man tcpdump explains it:

  Tcpflags are some combination of S (SYN), F (FIN), P (PUSH), R (RST), U (URG), W (ECN CWR), E (ECN-Echo) or '.' (ACK)

  In more detail:

    # SYN: establish a connection;
    # FIN: close a connection;
    # ACK: acknowledgement;
    # PSH: data is being pushed;
    # RST: connection reset;
    # URG: urgent pointer;

  Do we need to watch all of them here? Not really. I think we can start by checking for RST (connection reset): if it shows up, that connection is not getting through, and we have basically pinned it down as a network problem! For example, this capture:

17:30:41.635937 IP zt-d-xx.43062 > 172.1.0.17.http: Flags [S], seq 3503889751, win 29200, options [mss 1460,sackOK,TS val 1904549734 ecr 0,nop,wscale 7], length 0
17:30:41.636084 IP 172.1.0.17.http > zt-d-xx.43062: Flags [R.], seq 0, ack 3503889752, win 0, length 0

  Also check whether some request only ever shows [S] packets and never a [P] packet: that means one side keeps trying to establish the connection while the other side never responds, again a network problem! For example, a request to a non-existent address:

17:13:57.744349 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903545842 ecr 0,nop,wscale 7], length 0
17:13:58.745584 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903546844 ecr 0,nop,wscale 7], length 0
17:14:00.749570 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903548848 ecr 0,nop,wscale 7], length 0
17:14:04.757571 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903552856 ecr 0,nop,wscale 7], length 0
17:14:12.765568 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903560864 ecr 0,nop,wscale 7], length 0
17:14:28.797584 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903576896 ecr 0,nop,wscale 7], length 0
17:15:00.861591 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903608960 ecr 0,nop,wscale 7], length 0

   n [S] packets were sent and none of them got a response, so the path is simply unreachable.

  With this check we can usually rule out network problems!
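
  If tcpdump is not handy, a quick connectivity probe from inside the JVM gives a similar yes-or-no answer. This is just a sketch; the host and port below are placeholders for whichever dependency you suspect:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectivityProbe {
    public static void main(String[] args) {
        String host = "172.1.0.17";   // placeholder: the address you suspect is unreachable
        int port = 80;
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 3000); // 3-second timeout
            System.out.println("reachable: " + host + ":" + port);
        } catch (IOException e) {
            // A timeout matches the repeated-[S]-with-no-response pattern above;
            // "connection refused" matches the RST case.
            System.out.println("unreachable: " + host + ":" + port + " (" + e + ")");
        }
    }
}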

 

4. While it is stuck, check what the application is actually doing?

  This step could actually be done earlier: use top to check memory and CPU usage!

  Use vmstat to check virtual memory; it also reports on processes, memory, I/O and the overall state of the system!

  Use lsof -p <pid> to see which files the process has open! You should spot a few surprises, such as files you did not know were open or connections you did not know were being made!

lsof -i   # list all network connections
lsof -p <pid>     # list the files opened by a process
lsof -i4:8080 # list IPv4 connections on port 8080
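
  For completeness, the JVM can also report some basic CPU information about itself through the standard OperatingSystemMXBean, though it is no replacement for top and vmstat. A minimal sketch:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // getSystemLoadAverage() returns -1.0 when the value is not available (e.g. on Windows).
        System.out.println("cpus=" + os.getAvailableProcessors()
                + ", loadAvg=" + os.getSystemLoadAverage());
    }
}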

 

5. Print the thread stacks: what are the threads doing? And dump the heap for a look?

  If we still have not found anything, we come back to the JVM itself and see where the threads are stuck!

  Use jstack to print the thread stacks and jmap to dump the heap, then analyze:

jstack <pid> | less        # inspect the threads; is there a deadlock?
jmap -dump:format=b,file=/tmp/dumpid.dump <pid>        # dump the heap; any suspicious objects?
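
  If running jstack on the box is awkward, much the same information can be pulled from inside the process with the standard ThreadMXBean API. A minimal sketch:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class InProcessStackDump {
    public static void dumpAllStacks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // true, true: include monitor and synchronizer information, which is what
        // lets you see which lock a blocked thread is waiting on.
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.out.print(info);   // prints a jstack-like (truncated) stack for each thread
        }
        long[] deadlocked = threads.findDeadlockedThreads();   // null when there is no deadlock
        System.out.println("deadlocked threads: " + (deadlocked == null ? 0 : deadlocked.length));
    }
}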

 

6. If all else fails and local debug is not possible, try another test machine?

  As for local debugging, all I can say is that the environment is usually not reachable from a local machine, and the scenario is usually hard to reproduce locally anyway!

  The point of switching machines is to confirm whether the problem reproduces everywhere, because problems tied to a particular machine are usually very hard to track down (for those without the relevant experience, I mean).

  So grab another machine and run the code there; if everything else behaves normally, the problem is machine-specific.

  And what do you do with a machine-specific bug? It still comes down to analyzing the specific problem!

  Fortunately we also have remote debug: just attach to the process and start debugging, and the problem naturally becomes much clearer!

 

7. If that still does not work, remote debug it?

  For remote debug, see the previous article: 排障利器之遠端除錯與監控 -- jmx & remote debug

  Once attached, make an educated guess about the likely trouble spots and set breakpoints there. Base the guess on the earlier findings, the thread state, and so on!

  With that rough estimate, single-step through the suspect area; wherever it hangs, that is where the problem lives. Then step inside and repeat the same process!

  The upside of debugging this way is that you are guaranteed to find the exact place where the problem occurs. The downsides: you may need to reproduce the problem again and again (which here means restarting again and again), it does not feel very sophisticated, and when the problem is hard to reproduce it is difficult to catch the right moment.

  So experience really matters!

  The stack at the moment it hangs looks like this:

"main@1" prio=5 tid=0x1 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at java.io.FileInputStream.readBytes(FileInputStream.java:-1)
      at java.io.FileInputStream.read(FileInputStream.java:255)
      at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedBytes(SeedGenerator.java:539)
      at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:144)
      at sun.security.provider.SecureRandom$SeederHolder.<clinit>(SecureRandom.java:203)
      at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:221)
      - locked <0x1af3> (a sun.security.provider.SecureRandom)
      at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
      at java.security.SecureRandom.next(SecureRandom.java:491)
      at java.util.Random.nextLong(Random.java:424)
      at com.taobao.notify.utils.StrongRandom.<init>(StrongRandom.java:45)
      at com.taobao.notify.utils.UniqId.<init>(UniqId.java:38)
      at com.taobao.notify.utils.UniqId.<clinit>(UniqId.java:36)
      at com.taobao.notify.client.impl.DefaultNotifyClient.<init>(DefaultNotifyClient.java:91)
      at com.taobao.notify.client.impl.DefaultNotifyClient.<init>(DefaultNotifyClient.java:97)
      at com.taobao.notify.client.NotifyClientFactory.<clinit>(NotifyClientFactory.java:13)
      at com.taobao.notify.remotingclient.DefaultNotifyManager.<init>(DefaultNotifyManager.java:38)
      at com.taobao.notify.remotingclient.NotifyManagerBean.init(NotifyManagerBean.java:90)
      - locked <0x1a78> (a com.alipay.common.event.tbnotify.adapter.UniformEventPublisherAdapter)
      at com.alipay.common.event.tbnotify.client.NotifyManagerBeanEx.init(NotifyManagerBeanEx.java:64)
      at com.alipay.common.event.tbnotify.adapter.EventSendFacadeTBNotifyImpl.init(EventSendFacadeTBNotifyImpl.java:99)
      at com.alipay.common.event.tbnotify.client.NotifyManagerBeanEx.init(NotifyManagerBeanEx.java:84)
      at com.alipay.common.event.tbnotify.adapter.UniformEventPublisherAdapter.init(UniformEventPublisherAdapter.java:80)
      at com.alipay.boot.dms.client.util.InitializeUtil.initializePublisher(InitializeUtil.java:14)
      at com.alipay.boot.dms.spring.factory.DmsPublisherFactoryBean.afterPropertiesSet(DmsPublisherFactoryBean.java:59)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1642)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1579)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:553)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
      at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
      at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
      - locked <0x1af4> (a java.util.concurrent.ConcurrentHashMap)
      at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
      at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
      at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:351)
      at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:108)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1486)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1231)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:551)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
      at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
      at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
      at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
      at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.autowireResource(CommonAnnotationBeanPostProcessor.java:522)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.getResource(CommonAnnotationBeanPostProcessor.java:496)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor$ResourceElement.getResourceToInject(CommonAnnotationBeanPostProcessor.java:627)
      at org.springframework.beans.factory.annotation.InjectionMetadata$InjectedElement.inject(InjectionMetadata.java:169)
      at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:88)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.postProcessPropertyValues(CommonAnnotationBeanPostProcessor.java:318)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1219)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:551)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
      at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
      at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
      at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
      at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
      at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:207)
      at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1131)
      at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1059)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.autowireResource(CommonAnnotationBeanPostProcessor.java:518)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.getResource(CommonAnnotationBeanPostProcessor.java:496)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor$ResourceElement.getResourceToInject(CommonAnnotationBeanPostProcessor.java:627)
      at org.springframework.beans.factory.annotation.InjectionMetadata$InjectedElement.inject(InjectionMetadata.java:169)
      at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:88)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.postProcessPropertyValues(CommonAnnotationBeanPostProcessor.java:318)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1219)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:551)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
      at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
      at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
      at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
      at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
      at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:754)
      at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:866)
      at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:542)
      - locked <0x1af5> (a java.lang.Object)
      at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.refresh(EmbeddedWebApplicationContext.java:122)
      at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:761)
      at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:371)
      at org.springframework.boot.SpringApplication.run(SpringApplication.java:315)
      at org.springframework.boot.SpringApplication.run(SpringApplication.java:1186)
      at org.springframework.boot.SpringApplication.run(SpringApplication.java:1175)
      at com.xx.SOFABootSpringApplication.main(SOFABootSpringApplication.java:15)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
      at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
      at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
      at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:58)

  Based on a rough guess from that stack, the walk through the code went like this:

    // Starting point:
    // com.taobao.notify.remotingclient.NotifyManagerBean
    public synchronized void init() {
        if (inited.get()) {
            return;
        }

        if (StringUtil.isEmpty(this.name) || StringUtil.isEmpty(this.description)) {
            throw new RuntimeException("The name and description property is required.");
        }
        if (null == notifyManager) {

            AllNotifyClientProperties properties = new AllNotifyClientProperties();
            properties.setCheckMessageTPConfig(checkMessageTPConfig);
            properties.setMessageTPConfig(messageTPConfig);
            properties.setMessageProperties(messageProperties);
            properties.setAntVipConfig(antVipConfig);
            
            // The problem starts here: constructing this object triggers a cascade of class loading, and one of those classes is the problematic one
            notifyManager = new DefaultNotifyManager(groupId, appName, name, description,
                messageListener, checkMessageListener, 1, enableCrc, properties);

            notifyManager.subscribeAllMessages(this.subscribeMessages, this.bindingList);

            /** If the groupId is reused, remove the previously published topics first */
            //notifyManager.resetPublishTopics(publishTopics);
            for (String publishTopic : publishTopics) {
                notifyManager.addPublishTopic(publishTopic);
            }

            // On initialization, reset the serverTag-to-url mapping list
            notifyManager.resetServerTagUrls();
        }

        inited.set(true);
    }
    
    // com.taobao.notify.remotingclient.DefaultNotifyManager -- this constructor itself is fine
    public DefaultNotifyManager(String groupId, String appName, String name, String description,
                                MessageListener msgListener, CheckMessageListener checkMsgListener,
                                int connCount, boolean enableCrc,
                                AllNotifyClientProperties properties) {
        initNotifyClient(properties);

        if (StringUtil.isBlank(groupId)) {
            throw new IllegalArgumentException("GroupID cannot be blank.");
        }
        if (StringTools.containsWhitespace(groupId)) {
            throw new IllegalArgumentException("GroupID cannot contain the blank character. ["
                                               + groupId + "]");
        }
        this.groupId = groupId.trim();
        this.appName = StringUtil.trim(appName);

        if (null == properties) {
            properties = new AllNotifyClientProperties();
        }
        this.notifyclient.addGroup(this.groupId, this.appName, name, description, msgListener,
            checkMsgListener, properties.getMessageProperties(), properties.getWaitForConnTime(),
            enableCrc, properties.getAntVipConfig());
        this.notifyclient.setConnectionCount(connCount);
    }
    // But the problem lies in the initialization of a static field
    // com.taobao.notify.remotingclient.DefaultNotifyManager
    public class DefaultNotifyManager implements NotifyManager, DefaultNotifyManagerMBean {
        // logger
        private static final Logger logger                   = MsgBrokerClientLoggerFactory
                                                                .getLogger(DefaultNotifyManager.class);

        // This looks like a singleton client
        private NotifyClient        notifyclient             = NotifyClientFactory
                                                               .getSingletonNotifyClient();
    }
    // com.taobao.notify.client.NotifyClientFactory
    public class NotifyClientFactory implements FactoryBean {
    
        // Initializing the class eagerly creates a singleton instance first
        static NotifyClient notifyClient = new DefaultNotifyClient();

        /**
         * 
         * @return
         */
        public static NotifyClient getSingletonNotifyClient() {
            return notifyClient;
        }
    }
    // So far everything still looks fine, so let's keep going
    // com.taobao.notify.client.impl.DefaultNotifyClient
    public class DefaultNotifyClient implements NotifyClient {
        private static final String               LogPrefix                 = LoggerPrefix
                                                                                .makeLogPrefix(DefaultNotifyClient.class);
        static final Logger                       logger                    = MsgBrokerClientLoggerFactory
                                                                                .getLogger(DefaultNotifyClient.class);

        private volatile NotifyClientConfig       notifyClientConfig        = null;

        private static final int                  MAX_TIMES                 = 3;

        protected final NotifyGroupManager        notifyGroupManager        = new NotifyGroupManager();

        private final RemotingService             remotingService;

        protected final ClientSubscriptionManager clientSubscriptionManager = new ClientSubscriptionManager();

        protected final PublishTopicsManager      publishTopicsManager      = new PublishTopicsManager();

        private final ThreadPoolExecutor          asynSendMessageWorkTP;

        private final LoggingService              loggingService            = LoggingService
                                                                                .getInstance();

        private final ReliableAsynSendManager     reliableAsynSendManager;

        private final MessageProperties           reliableMessageProperties = new MessageProperties();

        // The problem lies in how the UniqId is created
        private final UniqId                      uniqIdInstance            = UniqId.getInstance();

        /**
         * 
         */
        public DefaultNotifyClient() {
            this(new NotifyClientConfig());
        }

        /**
         * @param notifyClientConfig
         */
        public DefaultNotifyClient(final NotifyClientConfig notifyClientConfig) {
            if (null == this.notifyClientConfig) {
                this.notifyClientConfig = notifyClientConfig;
            }
            if (notifyClientConfig.isDebug()) {
                this.remotingService = new IntegratedMockRemotingService(
                    this.clientSubscriptionManager, notifyClientConfig.getDebugRemoteSubscribers(),
                    this.notifyGroupManager, notifyClientConfig.getDebugLocalPort());
                logger.warn(LogPrefix + "Note that, this is a MsgbrokerClient in the DEBUG mode.");
                logger.warn(LogPrefix + "In the DEBUG mode, local port ["
                            + notifyClientConfig.getDebugLocalPort() + "]");
                logger.warn(LogPrefix + "In the DEBUG mode, connections ["
                            + notifyClientConfig.getDebugRemoteSubscribers() + "]");
            } else {
                this.remotingService = new DefaultRemotingService(
                    this.notifyClientConfig.getRemotingType(), this.notifyGroupManager,
                    notifyClientConfig);
            }

            // Disable compression
            this.reliableMessageProperties.setCompressSize(Integer.MAX_VALUE);
            this.reliableMessageProperties.setMaxStringMessageSize(Integer.MAX_VALUE);

            this.asynSendMessageWorkTP = new ManagedThreadPoolExecutor(notifyClientConfig
                .getAsynSendMessageTPConfig().getCorePoolSize(), notifyClientConfig
                .getAsynSendMessageTPConfig().getMaxPoolSize(), notifyClientConfig
                .getAsynSendMessageTPConfig().getKeepAliveTime(), notifyClientConfig
                .getAsynSendMessageTPConfig().getMaxQueueSize(), "asynSendMsgTP-" + this.hashCode(),
                new ThreadPoolExecutor.AbortPolicy());

            this.reliableAsynSendManager = new ReliableAsynSendManager(this, notifyGroupManager,
                notifyClientConfig);
            if (notifyClientConfig.isPreInitializeReliableAsynSendManager()) {
                this.reliableAsynSendManager.init();
            }

            // There can be more than one DefaultNotifyClient instance; each needs to be assigned a different ID
            try {
                ThreadPoolConfig messageTPConfig = notifyClientConfig.getMessageTPConfig();
                Lookout.registry().info(
                    Lookout.registry().createId("msgbroker.client.group.infos")
                        .withTag("hashcode", String.valueOf(this.hashCode()))
                        .withTag("corePoolSize", String.valueOf(messageTPConfig.getCorePoolSize()))
                        .withTag("maxPoolSize", String.valueOf(messageTPConfig.getMaxPoolSize()))
                        .withTag("queueSize", String.valueOf(messageTPConfig.getMaxQueueSize()))
                        .withTag("channelSize", String.valueOf(messageTPConfig.getChannelSize())),
                    new Info<Set<String>>() {
                        public Set<String> value() {
                            return notifyGroupManager.getGroupIds();
                        }
                    });
            } catch (Exception e) {
                logger.warn("register msgbroker.client.group.infos failed, " + e.getMessage());
            }
        }
    }
    // UniqId is created as follows
    // com.taobao.notify.utils.UniqId
    public class UniqId {
        private static final Logger            log       = MsgBrokerClientLoggerFactory.getLogger(UniqId.class);
        private static final String            LogPrefix = LoggerPrefix.makeLogPrefix(UniqId.class);

        private static char[]                  digits    = { '0', '1', '2', '3', '4', '5', '6', '7',
                '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'  };

        private static Map<Character, Integer> rDigits   = new HashMap<Character, Integer>(16);
        static {
            for (int i = 0; i < digits.length; ++i) {
                rDigits.put(digits[i], i);
            }
        }

        private static UniqId                  me        = new UniqId();
        private String                         hostAddr;
        // The actual problem is right here: StrongRandom
        private Random                         random    = new StrongRandom();
        private MessageDigest                  mHasher;
        private UniqTimer                      timer     = new UniqTimer();

        private ReentrantLock                  opLock    = new ReentrantLock();

        private UniqId() {
            try {
                InetAddress addr = InetAddress.getLocalHost();

                hostAddr = addr.getHostAddress();
            } catch (IOException e) {
                log.error(LogPrefix + "Get HostAddr Error", e);
                hostAddr = String.valueOf(System.currentTimeMillis());
            }

            if (Util.isBlank(hostAddr) || "127.0.0.1".equals(hostAddr)) {
                hostAddr = String.valueOf(System.currentTimeMillis());
            }

            if (log.isDebugEnabled()) {
                log.debug(LogPrefix + "hostAddr is:" + hostAddr);
            }

            try {
                mHasher = MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException nex) {
                mHasher = null;
                log.error(LogPrefix + "new MD5 Hasher error", nex);
            }
        }

        /**
         * Get the UniqId instance
         * 
         * @return UniqId
         */
        public static UniqId getInstance() {
            return me;
        }
    }
    
    // StrongRandom is slow to construct
    // com.taobao.notify.utils.StrongRandom
    public StrongRandom() {
        try {
            // This operation may block on some systems with low entropy. See
            // this page
            // for workaround suggestions:
            // http://docs.codehaus.org/display/JETTY/Connectors+slow+to+startup
            random = SecureRandom.getInstance(SESSION_ID_RANDOM_ALGORITHM);
        } catch (NoSuchAlgorithmException e) {
            try {
                random = SecureRandom.getInstance(SESSION_ID_RANDOM_ALGORITHM_ALT);
                weakRandom = false;
            } catch (NoSuchAlgorithmException e_alt) {
                random = new Random();
                weakRandom = true;
            }
        }
        // This is the slow part: it hangs for about a minute, and you cannot step into it with the debugger
        random.setSeed(random.nextLong() ^ System.currentTimeMillis() ^ hashCode()
                       ^ Runtime.getRuntime().freeMemory());
    }
    

  And so, finally, we have located the problem: random number generation!
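
  To see the blocking behavior in isolation, here is a minimal repro sketch. It assumes a Linux machine where securerandom.source still points at /dev/random and the entropy pool is low; on such a box the first call can hang, just like the startup above:

import java.security.SecureRandom;

public class SecureRandomBlockDemo {
    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // The same provider implementation the stack trace above ends up in.
        SecureRandom random = SecureRandom.getInstance("SHA1PRNG");
        // The first request for random bytes triggers self-seeding via SeedGenerator,
        // which reads the configured seed source and blocks until enough entropy is available.
        random.nextBytes(new byte[16]);
        System.out.println("first nextBytes() took " + (System.currentTimeMillis() - start) + " ms");
    }
}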

 

8. Once the spot is located, think about a good fix?

  As shown above, we now know the problem occurs when SecureRandom generates random numbers!

  So how do we fix it? The best I could come up with was searching online, since at this point there was nothing else to go on.

  Search for: SecureRandom slow to generate random numbers, how to fix? And the answer is right there!

  Of course you can also ask colleagues who have run into this before; that is even faster!

  The cause:

When Java generates random numbers, it can use two device files:
1. /dev/random: high-quality randomness tied to the real physical environment, but it blocks. (The slow startup in this article comes from exactly this.)
2. /dev/urandom: pseudo-random and non-blocking, with weaker randomness than /dev/random.
So, since the blocking random source is the problem, the fix is simply to switch away from it.

  The JDK provides a way to swap it out. First look at $JAVA_HOME/jre/lib/security/java.security, which contains the random-source setting, documented as follows:

#
# Sun Provider SecureRandom seed source.
#
# Select the primary source of seed data for the "SHA1PRNG" and
# "NativePRNG" SecureRandom implementations in the "Sun" provider.
# (Other SecureRandom implementations might also use this property.)
#
# On Unix-like systems (for example, Solaris/Linux/MacOS), the
# "NativePRNG" and "SHA1PRNG" implementations obtains seed data from
# special device files such as file:/dev/random.
#
# On Windows systems, specifying the URLs "file:/dev/random" or
# "file:/dev/urandom" will enable the native Microsoft CryptoAPI seeding
# mechanism for SHA1PRNG.
#
# By default, an attempt is made to use the entropy gathering device
# specified by the "securerandom.source" Security property.  If an
# exception occurs while accessing the specified URL:
#
#     SHA1PRNG:
#         the traditional system/thread activity algorithm will be used.
#
#     NativePRNG:
#         a default value of /dev/random will be used.  If neither
#         are available, the implementation will be disabled.
#         "file" is the only currently supported protocol type.
#
# The entropy gathering device can also be specified with the System
# property "java.security.egd". For example:
#
#   % java -Djava.security.egd=file:/dev/random MainClass
#
# Specifying this System property will override the
# "securerandom.source" Security property.
#
# In addition, if "file:/dev/random" or "file:/dev/urandom" is
# specified, the "NativePRNG" implementation will be more preferred than
# SHA1PRNG in the Sun provider.
#
securerandom.source=file:/dev/random

  In other words, you can either edit this file directly or pass a command-line argument, and the command-line argument takes precedence, so we just use the command line:

    -Djava.security.egd=file:/dev/./urandom # add this startup argument to the project's JAVA_OPTS to switch the seed source to urandom

  With that in place, I verified again and startup was back to normal. Problem done.
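
  To double-check which seed source is actually in effect after the change, you can read the settings back at runtime. A small sanity-check sketch using the standard property names shown above:

import java.security.SecureRandom;
import java.security.Security;

public class SeedSourceCheck {
    public static void main(String[] args) {
        // The value configured in $JAVA_HOME/jre/lib/security/java.security
        System.out.println("securerandom.source = " + Security.getProperty("securerandom.source"));
        // The value passed via -Djava.security.egd=... (null if the flag was not set)
        System.out.println("java.security.egd   = " + System.getProperty("java.security.egd"));

        // With file:/dev/./urandom in effect, this first call should return almost instantly.
        long start = System.currentTimeMillis();
        new SecureRandom().nextBytes(new byte[16]);
        System.out.println("first nextBytes() took " + (System.currentTimeMillis() - start) + " ms");
    }
}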

 

9. Beyond fixing this problem, what else can we take away?

  

For more on why SecureRandom is slow, see this article:
https://blog.csdn.net/zengdeqing2012/article/details/78133370

 

The old saying: speaking up may not get you anywhere, but staying silent definitely won't.
