硬解析帶來高CPU消耗的診斷
客戶的系統進行壓力測試時發現CPU壓力非常大,正常情況下CPU負載在30-40%,而峰值期間CPU基本上100%,以至於事務無法處理完成,甚至無法登陸。
經過客戶的描述,這個壓力測試實際上壓力應該不大,僅僅設定了100個客戶端的併發,而且單個客戶端整個功能模組完成實際上僅需要25秒鐘,但併發一多,CPU負載就非常高,以至於單個功能模組完成時間被延長至幾百秒。
到客戶現場,在作業系統級察看,CPU大量消耗在user上,io的CPU等待很少,確定CPU是消耗在應用層面。
[oracle@localhost ~]$ sar -u 3 50 Linux 2.6.18-92.el5 (localhost.localdomain) 2009年06月15日 _x86_64_ (8 CPU)
14時56分50秒 CPU %user %nice %system %iowait %steal %idle 14時56分53秒 all 93.29 0.00 1.08 0.00 0.00 5.62 14時56分56秒 all 90.54 0.00 1.33 0.00 0.00 8.12 14時56分59秒 all 93.00 0.00 1.08 0.00 0.00 5.93 14時57分02秒 all 92.34 0.00 0.92 0.00 0.00 6.75 14時57分05秒 all 93.25 0.00 0.87 0.00 0.00 5.88 14時57分08秒 all 92.06 0.00 1.02 0.00 0.00 6.92 14時57分11秒 all 91.95 0.00 0.87 0.00 0.00 7.18 14時57分14秒 all 92.71 0.00 0.92 0.00 0.00 6.38 14時57分17秒 all 92.19 0.00 0.87 0.00 0.00 6.94 14時57分20秒 all 94.96 0.00 0.79 0.00 0.00 4.25 14時57分23秒 all 92.91 0.00 0.83 0.00 0.00 6.26 14時57分26秒 all 92.06 0.00 0.70 0.00 0.00 7.24 |
由於客戶描述的情況中併發量與CPU消耗有直接關係,因此懷疑有應用級併發的競爭,於是收集了一下Statspack報告,在命中率方面:
Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: |
99.98 |
Redo NoWait %: |
99.98 |
Buffer Hit %: |
100.00 |
In-memory Sort %: |
100.00 |
Library Hit %: |
74.95 |
Soft Parse %: |
72.85 |
Execute to Parse %: |
0.27 |
Latch Hit %: |
97.18 |
Parse CPU to Parse Elapsd %: |
9.47 |
% Non-Parse CPU: |
7.65 |
庫快取記憶體的命中率很低,硬解析很高,而硬解析與CPU消耗有直接關係。
在等待事件方面:
Top 5 Timed Events
Event |
Waits |
Time(s) |
Avg Wait(ms) |
% Total Call Time |
Wait Class |
latch: library cache |
5,679,771 |
363,979 |
64 |
86.1 |
Concurrency |
CPU time |
|
41,658 |
|
9.9 |
|
latch: shared pool |
1,908,510 |
6,061 |
3 |
1.4 |
Concurrency |
latch free |
192,585 |
5,334 |
28 |
1.3 |
Other |
latch: cache buffers chains |
19,228 |
823 |
43 |
.2 |
Concurrency |
很明顯,主要等待都集中在與庫快取記憶體相關的Latch競爭上,庫快取記憶體是主要的效能問題,而且硬解析也主要在消耗CPU,看來效能瓶頸已經表現出來了,通過下面會話級的直接查詢,也發現在大量會話中同時都存在著latch: library cache 競爭:
select event from v$session_wait where event not in ('SQL*Net message from client','rdbms ipc message')
EVENT ---------------------------------------------------------------- SQL*Net message to client latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: shared pool latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache 。。。。。。 smon timer pmon timer |
目前客戶的共享池有544M,而在程式執行過程中共享池的剩餘空間還有59M,同時從報告中的共享池大小建議上看,對於目前的應用來說,共享池只需要400多M就足夠,因此造成庫快取記憶體命中率低,硬解析高,CPU大量消耗的主要原因很可能是SQL程式碼沒有共享。
Shared Pool Size(M) |
SP Size Factr |
Est LC Size (M) |
Est LC Mem Obj |
Est LC Time Saved (s) |
Est LC Time Saved Factr |
Est LC Load Time (s) |
Est LC Load Time Factr |
Est LC Mem Obj Hits |
416 |
0.76 |
67 |
7,820 |
3,652 |
0.98 |
466,989 |
1.00 |
3,518,320 |
480 |
0.88 |
130 |
15,164 |
3,710 |
1.00 |
466,931 |
1.00 |
3,523,512 |
544 |
1.00 |
193 |
22,507 |
3,727 |
1.00 |
466,914 |
1.00 |
3,523,751 |
608 |
1.12 |
256 |
29,890 |
3,730 |
1.00 |
466,911 |
1.00 |
3,523,816 |
672 |
1.24 |
319 |
37,269 |
3,731 |
1.00 |
466,910 |
1.00 |
3,523,863 |
736 |
1.35 |
382 |
44,593 |
3,731 |
1.00 |
466,910 |
1.00 |
3,523,915 |
800 |
1.47 |
445 |
51,975 |
3,732 |
1.00 |
466,909 |
1.00 |
3,523,952 |
864 |
1.59 |
508 |
59,424 |
3,733 |
1.00 |
466,908 |
1.00 |
3,524,016 |
928 |
1.71 |
571 |
66,882 |
3,735 |
1.00 |
466,906 |
1.00 |
3,524,077 |
992 |
1.82 |
634 |
74,314 |
3,744 |
1.00 |
466,897 |
1.00 |
3,524,148 |
1,056 |
1.94 |
697 |
81,781 |
3,744 |
1.00 |
466,897 |
1.00 |
3,524,190 |
1,120 |
2.06 |
760 |
89,216 |
3,748 |
1.01 |
466,893 |
1.00 |
3,524,267 |
查詢記憶體中的所有SQL語句,發現有兩條SQL語句執行的條數佔了記憶體所有SQL語句的80%,而這些語句的文字基本相同,僅僅是where子句中的值不同。
select sql_text from v$sql order by sql_text;
select * from olat_examinationresult where examlog_id =243035 and question_id=1961 select * from olat_examinationresult where examlog_id =243035 and question_id=1974 select * from olat_examinationresult where examlog_id =243035 and question_id=2078 select * from olat_examinationresult where examlog_id =243035 and question_id=2114 select * from olat_examinationresult where examlog_id =243035 and question_id=2121 select * from olat_examinationresult where examlog_id =243035 and question_id=2330 select * from olat_examinationresult where examlog_id =243035 and question_id=2349 select * from olat_examinationresult where examlog_id =243035 and question_id=2389 select * from olat_examinationresult where examlog_id =243035 and question_id=2390 select * from olat_examinationresult where examlog_id =243035 and question_id=2420 select * from olat_examinationresult where examlog_id =243035 and question_id=2519 select * from olat_examinationresult where examlog_id =243035 and question_id=2520 select * from olat_examinationresult where examlog_id =243035 and question_id=2809 select * from olat_examinationresult where examlog_id =243035 and question_id=2826 select * from olat_examinationresult where examlog_id =243035 and question_id=2837 select * from olat_examinationresult where examlog_id =243035 and question_id=2839 select * from olat_examinationresult where examlog_id =243035 and question_id=2847 select * from olat_examinationresult where examlog_id =243035 and question_id=2871 select * from olat_examinationresult where examlog_id =243035 and question_id=2893 select * from olat_examinationresult where examlog_id =243035 and question_id=2895 select * from olat_examinationresult where examlog_id =243035 and question_id=3168 select * from olat_examinationresult where examlog_id =243035 and question_id=3264 。。。。。。
select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242499 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242500 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242501 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242502 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242503 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242504 。。。。。。 |
問題已經很明顯了,應用中存在兩條迴圈執行的SQL語句,這兩條語句都存在沒有使用繫結變數的問題,因此每次使用不同的值都不得不硬解析,因此併發越多,需要硬解析的SQL語句越多,從而帶來更多的CPU消耗,因此讓使用者將這兩條語句的值換成繫結變數,CPU消耗的峰值降低到了15%以下。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/22235/viewspace-606501/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 公司某資料子系統定期cpu過高的診斷
- MySQL 5.7定位消耗CPU高的SQLMySql
- Spark效能優化:診斷記憶體的消耗Spark優化記憶體
- 一招教你成為大內密探:診斷優化找出消耗CPU效能的內鬼優化
- 一次.net code中的placeholder導致的高cpu診斷
- Xcode自帶的超好用的診斷工具XCode
- oracle之 redo過高診斷Oracle
- 帶你上手阿里開源的 Java 診斷利器:Arthas阿里Java
- 如何使用Apple診斷程式檢查Mac硬體問題APPMac
- 用更雲原生的方式做診斷|大規模 K8s 叢集診斷利器深度解析K8S
- ORACLE 如何診斷高水位爭用(enq: HW – contention)OracleENQ
- 用 Arthas 神器來診斷 HBase 異常程式
- 如何使用 dotTrace 來診斷 netcore 應用的效能問題NetCore
- [JVM] 應用診斷工具之Fastthread(線上診斷)JVMASTthread
- php-fpm 記憶體過高,CPU佔有率過高帶來的最佳化和調整PHP記憶體
- 記一次K8S叢集Node節點CPU消耗高故障K8S
- ORACLE診斷案例Oracle
- 網路診斷工具的使用
- Java診斷利器ArthasJava
- SQL問題診斷SQL
- SQLServer如何查詢近3分鐘最消耗CPU的SQLSQLServer
- 聊聊智慧診斷模型的構建模型
- 臻和科技獨家引進乳腺癌精準診斷利器MammaPrint®,為國內患者帶來福音
- 免費網站seo診斷:從哪些維度進行診斷呢?網站
- Oracle資料庫 11.2.0.4 EMON程式持續消耗CPUOracle資料庫
- 使用Android sdk自帶的jsonReader來解析jsonAndroidJSON
- Oracle診斷事件列表(轉)Oracle事件
- Java執行緒診斷Java執行緒
- 效能分析(5)- 軟中斷導致 CPU 使用率過高的案例
- 診斷子事務的瑞士軍刀
- 診斷叢集的潛在問題
- 基於等待事件的效能診斷(轉)事件
- 【譯】.NET 5 中的診斷改進
- 幾個常用的網路診斷命令
- 系統診斷小技巧(12):如何確定執行緒是否因CPU資源波動執行緒
- [20190402]對比_mutex_wait_scheme不同模式cpu消耗.txtMutexAIScheme模式
- 執行sed命令卡死CPU消耗100%一例分析
- 遊戲測試 Perfdog 實戰之減少 CPU 消耗遊戲
- AI診斷心臟病比人類更準?但這只是識圖,不是診斷AI