硬解析帶來高CPU消耗的診斷

Ora-600發表於2009-06-16

客戶的系統進行壓力測試時發現CPU壓力非常大,正常情況下CPU負載在30-40%,而峰值期間CPU基本上100%,以至於事務無法處理完成,甚至無法登陸。

經過客戶的描述,這個壓力測試實際上壓力應該不大,僅僅設定了100個客戶端的併發,而且單個客戶端整個功能模組完成實際上僅需要25秒鐘,但併發一多,CPU負載就非常高,以至於單個功能模組完成時間被延長至幾百秒。

到客戶現場,在作業系統級察看,CPU大量消耗在user上,ioCPU等待很少,確定CPU是消耗在應用層面。

[oracle@localhost ~]$ sar -u 3 50

Linux 2.6.18-92.el5 (localhost.localdomain)     20090615  _x86_64_        (8 CPU)

 

145650     CPU     %user     %nice   %system   %iowait    %steal     %idle

145653     all     93.29      0.00      1.08      0.00      0.00      5.62

145656     all     90.54      0.00      1.33      0.00      0.00      8.12

145659     all     93.00      0.00      1.08      0.00      0.00      5.93

145702     all     92.34      0.00      0.92      0.00      0.00      6.75

145705     all     93.25      0.00      0.87      0.00      0.00      5.88

145708     all     92.06      0.00      1.02      0.00      0.00      6.92

145711     all     91.95      0.00      0.87      0.00      0.00      7.18

145714     all     92.71      0.00      0.92      0.00      0.00      6.38

145717     all     92.19      0.00      0.87      0.00      0.00      6.94

145720     all     94.96      0.00      0.79      0.00      0.00      4.25

145723     all     92.91      0.00      0.83      0.00      0.00      6.26

145726     all     92.06      0.00      0.70      0.00      0.00      7.24

 

由於客戶描述的情況中併發量與CPU消耗有直接關係,因此懷疑有應用級併發的競爭,於是收集了一下Statspack報告,在命中率方面:

Instance Efficiency Percentages (Target 100%)

Buffer Nowait %:

99.98

Redo NoWait %:

99.98

Buffer Hit %:

100.00

In-memory Sort %:

100.00

Library Hit %:

74.95

Soft Parse %:

72.85

Execute to Parse %:

0.27

Latch Hit %:

97.18

Parse CPU to Parse Elapsd %:

9.47

% Non-Parse CPU:

7.65

庫快取記憶體的命中率很低,硬解析很高,而硬解析與CPU消耗有直接關係。

在等待事件方面:

Top 5 Timed Events

Event

Waits

Time(s)

Avg Wait(ms)

% Total Call Time

Wait Class

latch: library cache

5,679,771

363,979

64

86.1

Concurrency

CPU time

 

41,658

 

9.9

 

latch: shared pool

1,908,510

6,061

3

1.4

Concurrency

latch free

192,585

5,334

28

1.3

Other

latch: cache buffers chains

19,228

823

43

.2

Concurrency

很明顯,主要等待都集中在與庫快取記憶體相關的Latch競爭上,庫快取記憶體是主要的效能問題,而且硬解析也主要在消耗CPU,看來效能瓶頸已經表現出來了,通過下面會話級的直接查詢,也發現在大量會話中同時都存在著latch: library cache 競爭:

select event from v$session_wait

where event not in ('SQL*Net message from client','rdbms ipc message')

 

EVENT

----------------------------------------------------------------

SQL*Net message to client

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: shared pool

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

latch: library cache

。。。。。。

smon timer

pmon timer

 

目前客戶的共享池有544M,而在程式執行過程中共享池的剩餘空間還有59M,同時從報告中的共享池大小建議上看,對於目前的應用來說,共享池只需要400M就足夠,因此造成庫快取記憶體命中率低,硬解析高,CPU大量消耗的主要原因很可能是SQL程式碼沒有共享。

Shared Pool Size(M)

SP Size Factr

Est LC Size (M)

Est LC Mem Obj

Est LC Time Saved (s)

Est LC Time Saved Factr

Est LC Load Time (s)

Est LC Load Time Factr

Est LC Mem Obj Hits

416

0.76

67

7,820

3,652

0.98

466,989

1.00

3,518,320

480

0.88

130

15,164

3,710

1.00

466,931

1.00

3,523,512

544

1.00

193

22,507

3,727

1.00

466,914

1.00

3,523,751

608

1.12

256

29,890

3,730

1.00

466,911

1.00

3,523,816

672

1.24

319

37,269

3,731

1.00

466,910

1.00

3,523,863

736

1.35

382

44,593

3,731

1.00

466,910

1.00

3,523,915

800

1.47

445

51,975

3,732

1.00

466,909

1.00

3,523,952

864

1.59

508

59,424

3,733

1.00

466,908

1.00

3,524,016

928

1.71

571

66,882

3,735

1.00

466,906

1.00

3,524,077

992

1.82

634

74,314

3,744

1.00

466,897

1.00

3,524,148

1,056

1.94

697

81,781

3,744

1.00

466,897

1.00

3,524,190

1,120

2.06

760

89,216

3,748

1.01

466,893

1.00

3,524,267

 

查詢記憶體中的所有SQL語句,發現有兩條SQL語句執行的條數佔了記憶體所有SQL語句的80%,而這些語句的文字基本相同,僅僅是where子句中的值不同。

select sql_text from v$sql order by sql_text;

 

select * from olat_examinationresult where examlog_id =243035 and question_id=1961                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=1974                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2078                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2114                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2121                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2330                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2349                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2389                                      

select * from olat_examinationresult where examlog_id =243035 and question_id=2390                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2420                                      

select * from olat_examinationresult where examlog_id =243035 and question_id=2519                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2520                                      

select * from olat_examinationresult where examlog_id =243035 and question_id=2809                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2826                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2837                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2839                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2847                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2871                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2893                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=2895                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=3168                                     

select * from olat_examinationresult where examlog_id =243035 and question_id=3264                                     

。。。。。。       

 

select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_

time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_

time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o

lat_examination t2 on t1.examination_id = t2.id where t1.id=242499                                         

select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_

time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_

time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o

lat_examination t2 on t1.examination_id = t2.id where t1.id=242500                                         

select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_

time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_

time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o

lat_examination t2 on t1.examination_id = t2.id where t1.id=242501                                         

select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_

time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_

time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o

lat_examination t2 on t1.examination_id = t2.id where t1.id=242502                                        

select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_

time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_

time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o

lat_examination t2 on t1.examination_id = t2.id where t1.id=242503                                          

select t1.id, t1.examination_id, t1.user_id,to_char(t1.start_time,'yyyy-mm-dd hh24:mi:ss')as start_time, to_char(t1.end_

time,'yyyy-mm-dd hh24:mi:ss')as end_time, state, t1.right_rate, t1.result,t1.check_user, t1.isnot_pass, to_char(t1.last_

time,'yyyy-mm-dd hh24:mi:ss') as last_time, t1.USE_TIME,t1.TESTPAPER_ID,t2.name from olat_examinationlog t1 inner join o

lat_examination t2 on t1.examination_id = t2.id where t1.id=242504                                 

。。。。。。

 

問題已經很明顯了,應用中存在兩條迴圈執行的SQL語句,這兩條語句都存在沒有使用繫結變數的問題,因此每次使用不同的值都不得不硬解析,因此併發越多,需要硬解析的SQL語句越多,從而帶來更多的CPU消耗,因此讓使用者將這兩條語句的值換成繫結變數,CPU消耗的峰值降低到了15%以下。

 

                              

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/22235/viewspace-606501/,如需轉載,請註明出處,否則將追究法律責任。

相關文章