硬解析帶來高CPU消耗的診斷
客戶的系統進行壓力測試時發現CPU壓力非常大,正常情況下CPU負載在30-40%,而峰值期間CPU基本上100%,以至於事務無法處理完成,甚至無法登陸。
經過客戶的描述,這個壓力測試實際上壓力應該不大,僅僅設定了100個客戶端的併發,而且單個客戶端整個功能模組完成實際上僅需要25秒鐘,但併發一多,CPU負載就非常高,以至於單個功能模組完成時間被延長至幾百秒。
到客戶現場,在作業系統級察看,CPU大量消耗在user上,io的CPU等待很少,確定CPU是消耗在應用層面。
[oracle@localhost ~]$ sar -u 3 50 Linux 2.6.18-92.el5 (localhost.localdomain) 2009年06月15日 _x86_64_ (8 CPU)
14時56分50秒 CPU %user %nice %system %iowait %steal %idle 14時56分53秒 all 93.29 0.00 1.08 0.00 0.00 5.62 14時56分56秒 all 90.54 0.00 1.33 0.00 0.00 8.12 14時56分59秒 all 93.00 0.00 1.08 0.00 0.00 5.93 14時57分02秒 all 92.34 0.00 0.92 0.00 0.00 6.75 14時57分05秒 all 93.25 0.00 0.87 0.00 0.00 5.88 14時57分08秒 all 92.06 0.00 1.02 0.00 0.00 6.92 14時57分11秒 all 91.95 0.00 0.87 0.00 0.00 7.18 14時57分14秒 all 92.71 0.00 0.92 0.00 0.00 6.38 14時57分17秒 all 92.19 0.00 0.87 0.00 0.00 6.94 14時57分20秒 all 94.96 0.00 0.79 0.00 0.00 4.25 14時57分23秒 all 92.91 0.00 0.83 0.00 0.00 6.26 14時57分26秒 all 92.06 0.00 0.70 0.00 0.00 7.24 |
由於客戶描述的情況中併發量與CPU消耗有直接關係,因此懷疑有應用級併發的競爭,於是收集了一下Statspack報告,在命中率方面:
Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: |
99.98 |
Redo NoWait %: |
99.98 |
Buffer Hit %: |
100.00 |
In-memory Sort %: |
100.00 |
Library Hit %: |
74.95 |
Soft Parse %: |
72.85 |
Execute to Parse %: |
0.27 |
Latch Hit %: |
97.18 |
Parse CPU to Parse Elapsd %: |
9.47 |
% Non-Parse CPU: |
7.65 |
庫快取記憶體的命中率很低,硬解析很高,而硬解析與CPU消耗有直接關係。
在等待事件方面:
Top 5 Timed Events
Event |
Waits |
Time(s) |
Avg Wait(ms) |
% Total Call Time |
Wait Class |
latch: library cache |
5,679,771 |
363,979 |
64 |
86.1 |
Concurrency |
CPU time |
|
41,658 |
|
9.9 |
|
latch: shared pool |
1,908,510 |
6,061 |
3 |
1.4 |
Concurrency |
latch free |
192,585 |
5,334 |
28 |
1.3 |
Other |
latch: cache buffers chains |
19,228 |
823 |
43 |
.2 |
Concurrency |
很明顯,主要等待都集中在與庫快取記憶體相關的Latch競爭上,庫快取記憶體是主要的效能問題,而且硬解析也主要在消耗CPU,看來效能瓶頸已經表現出來了,通過下面會話級的直接查詢,也發現在大量會話中同時都存在著latch: library cache 競爭:
select event from v$session_wait where event not in ('SQL*Net message from client','rdbms ipc message')
EVENT ---------------------------------------------------------------- SQL*Net message to client latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: shared pool latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache latch: library cache 。。。。。。 smon timer pmon timer |
目前客戶的共享池有544M,而在程式執行過程中共享池的剩餘空間還有59M,同時從報告中的共享池大小建議上看,對於目前的應用來說,共享池只需要400多M就足夠,因此造成庫快取記憶體命中率低,硬解析高,CPU大量消耗的主要原因很可能是SQL程式碼沒有共享。
Shared Pool Size(M) |
SP Size Factr |
Est LC Size (M) |
Est LC Mem Obj |
Est LC Time Saved (s) |
Est LC Time Saved Factr |
Est LC Load Time (s) |
Est LC Load Time Factr |
Est LC Mem Obj Hits |
416 |
0.76 |
67 |
7,820 |
3,652 |
0.98 |
466,989 |
1.00 |
3,518,320 |
480 |
0.88 |
130 |
15,164 |
3,710 |
1.00 |
466,931 |
1.00 |
3,523,512 |
544 |
1.00 |
193 |
22,507 |
3,727 |
1.00 |
466,914 |
1.00 |
3,523,751 |
608 |
1.12 |
256 |
29,890 |
3,730 |
1.00 |
466,911 |
1.00 |
3,523,816 |
672 |
1.24 |
319 |
37,269 |
3,731 |
1.00 |
466,910 |
1.00 |
3,523,863 |
736 |
1.35 |
382 |
44,593 |
3,731 |
1.00 |
466,910 |
1.00 |
3,523,915 |
800 |
1.47 |
445 |
51,975 |
3,732 |
1.00 |
466,909 |
1.00 |
3,523,952 |
864 |
1.59 |
508 |
59,424 |
3,733 |
1.00 |
466,908 |
1.00 |
3,524,016 |
928 |
1.71 |
571 |
66,882 |
3,735 |
1.00 |
466,906 |
1.00 |
3,524,077 |
992 |
1.82 |
634 |
74,314 |
3,744 |
1.00 |
466,897 |
1.00 |
3,524,148 |
1,056 |
1.94 |
697 |
81,781 |
3,744 |
1.00 |
466,897 |
1.00 |
3,524,190 |
1,120 |
2.06 |
760 |
89,216 |
3,748 |
1.01 |
466,893 |
1.00 |
3,524,267 |
查詢記憶體中的所有SQL語句,發現有兩條SQL語句執行的條數佔了記憶體所有SQL語句的80%,而這些語句的文字基本相同,僅僅是where子句中的值不同。
select sql_text from v$sql order by sql_text;
select * from olat_examinationresult where examlog_id =243035 and question_id=1961 select * from olat_examinationresult where examlog_id =243035 and question_id=1974 select * from olat_examinationresult where examlog_id =243035 and question_id=2078 select * from olat_examinationresult where examlog_id =243035 and question_id=2114 select * from olat_examinationresult where examlog_id =243035 and question_id=2121 select * from olat_examinationresult where examlog_id =243035 and question_id=2330 select * from olat_examinationresult where examlog_id =243035 and question_id=2349 select * from olat_examinationresult where examlog_id =243035 and question_id=2389 select * from olat_examinationresult where examlog_id =243035 and question_id=2390 select * from olat_examinationresult where examlog_id =243035 and question_id=2420 select * from olat_examinationresult where examlog_id =243035 and question_id=2519 select * from olat_examinationresult where examlog_id =243035 and question_id=2520 select * from olat_examinationresult where examlog_id =243035 and question_id=2809 select * from olat_examinationresult where examlog_id =243035 and question_id=2826 select * from olat_examinationresult where examlog_id =243035 and question_id=2837 select * from olat_examinationresult where examlog_id =243035 and question_id=2839 select * from olat_examinationresult where examlog_id =243035 and question_id=2847 select * from olat_examinationresult where examlog_id =243035 and question_id=2871 select * from olat_examinationresult where examlog_id =243035 and question_id=2893 select * from olat_examinationresult where examlog_id =243035 and question_id=2895 select * from olat_examinationresult where examlog_id =243035 and question_id=3168 select * from olat_examinationresult where examlog_id =243035 and question_id=3264 。。。。。。
select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242499 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242500 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242501 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242502 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242503 select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_ time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_ time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o lat_examination t2 on t1.examination_id = t2.id where t1.id=242504 。。。。。。 |
問題已經很明顯了,應用中存在兩條迴圈執行的SQL語句,這兩條語句都存在沒有使用繫結變數的問題,因此每次使用不同的值都不得不硬解析,因此併發越多,需要硬解析的SQL語句越多,從而帶來更多的CPU消耗,因此讓使用者將這兩條語句的值換成繫結變數,CPU消耗的峰值降低到了15%以下。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/22235/viewspace-606501/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- MySQL 5.7定位消耗CPU高的SQLMySql
- 公司某資料子系統定期cpu過高的診斷
- 診斷和解決CPU利用率高的問題(zt)
- Strace診斷CPU跑高問題(java/php網站)JavaPHP網站
- 一招教你成為大內密探:診斷優化找出消耗CPU效能的內鬼優化
- 獲得消耗cpu較高的topsqlSQL
- Spark效能優化:診斷記憶體的消耗Spark優化記憶體
- 一次.net code中的placeholder導致的高cpu診斷
- 透過top命令抓取cpu高消耗的sqlSQL
- 通過top命令抓取cpu高消耗的sqlSQL
- oracle之 redo過高診斷Oracle
- 診斷與分析itpub壇友提出關於為何awr cpu usage非常高
- 一次異常記憶體消耗的診斷與解決記憶體
- AWR報告分析之一:高 DB CPU 消耗的效能根源-eygle
- Xcode自帶的超好用的診斷工具XCode
- 醫療視覺化為醫療診斷方式帶來巨大變革視覺化
- 查詢 分析硬解析較高的sql,SQL
- latch 相關效能問題診斷: latch: row cache objects等待事件導致CPU負載高Object事件負載
- 帶你上手阿里開源的 Java 診斷利器:Arthas阿里Java
- Oracle系統執行慢及資源消耗多問題診斷 [final]Oracle
- 用更雲原生的方式做診斷|大規模 K8s 叢集診斷利器深度解析K8S
- 如何使用Apple診斷程式檢查Mac硬體問題APPMac
- oracle診斷事件及深入解析10053事件Oracle事件
- php-fpm 記憶體過高,CPU佔有率過高帶來的最佳化和調整PHP記憶體
- 用 Arthas 神器來診斷 HBase 異常程式
- ORACLE 如何診斷高水位爭用(enq: HW – contention)OracleENQ
- 記一次K8S叢集Node節點CPU消耗高故障K8S
- 透過shell指令碼來檢視Undo中資源消耗高的sql指令碼SQL
- 通過shell指令碼來檢視Undo中資源消耗高的sql指令碼SQL
- 使用Dstat來進行Linux綜合效能診斷Linux
- 使用awr來診斷資料庫效能問題資料庫
- [JVM] 應用診斷工具之Fastthread(線上診斷)JVMASTthread
- 檢視當前最消耗CPU/Memory的oracle程式Oracle
- 蘇姿豐確認 RX 7000 顯示卡帶來新功能,支援高解析度、高幀率遊戲遊戲
- 找出消耗CPU最高的程式對應的SQL語句SQL
- ORACLE診斷案例Oracle
- Oracle故障診斷Oracle
- ORACLE診斷事件Oracle事件