系統cpu佔用超高故障分析一例

浪漫雙魚發表於2011-02-22

下班的時候突然接到電話,通知一臺主機資源佔用超過,同時手機簡訊也不停的報session數,趕快連上伺服器檢視具體原因:

1、  通過top可以發現系統CPU資源佔用100%

查當前程式數,發現比平時多了100

Select * from v$license;

SESSIONS_MAX SESSIONS_WARNING SESSIONS_CURRENT SESSIONS_HIGHWATER  USERS_MAX

------------ ---------------- ---------------- ------------------ ----------

           0                0              234                246          0

session等待事件,大部分session都在做latch free等待

SQL> select sid,event,P1TEXT,state from v$session_wait;

 

   SID EVENT                          P1TEXT                         STATE

------ ------------------------------ ------------------------------ -------------------

    32 latch free                     address                        WAITED KNOWN TIME

    38 latch free                     address                        WAITED KNOWN TIME

    41 latch free                     address                        WAITING

    57 latch free                     address                        WAITING

    89 latch free                     address                        WAITED KNOWN TIME

    93 latch free                     address                        WAITED KNOWN TIME

   111 latch free                     address                        WAITED KNOWN TIME

   118 latch free                     address                        WAITING

   137 latch free                     address                        WAITED KNOWN TIME

   201 latch free                     address                        WAITING

   200 latch free                     address                        WAITED KNOWN TIME

   194 latch free                     address                        WAITED KNOWN TIME

   186 latch free                     address                        WAITED KNOWN TIME

   182 latch free                     address                        WAITED KNOWN TIME

   177 latch free                     address                        WAITING

   163 latch free                     address                        WAITING

   147 latch free                     address                        WAITED KNOWN TIME

   146 latch free                     address                        WAITED KNOWN TIME

   238 latch free                     address                        WAITED KNOWN TIME

   236 latch free                     address                        WAITED KNOWN TIME

   233 latch free                     address                        WAITED KNOWN TIME

   225 latch free                     address                        WAITED KNOWN TIME

   221 latch free                     address                        WAITED KNOWN TIME

   211 latch free                     address                        WAITING

   209 latch free                     address                        WAITED KNOWN TIME

   207 latch free                     address                        WAITED KNOWN TIME

   204 latch free                     address                        WAITING

   261 latch free                     address                        WAITED KNOWN TIME

   257 latch free                     address                        WAITED KNOWN TIME

   255 latch free                     address                        WAITED KNOWN TIME

   253 latch free                     address                        WAITED KNOWN TIME

   251 latch free                     address                        WAITED KNOWN TIME

   241 latch free                     address                        WAITING

   239 latch free                     address                        WAITED KNOWN TIME

   119 latch free                     address                        WAITED KNOWN TIME

   113 latch free                     address                        WAITED KNOWN TIME

    97 latch free                     address                        WAITING

    92 latch free                     address                        WAITED KNOWN TIME

    88 latch free                     address                        WAITED KNOWN TIME

    87 latch free                     address                        WAITED KNOWN TIME

    68 latch free                     address                        WAITED KNOWN TIME

  

2、  查詢佔用cpu的程式情況,大量程式佔用都很高,如3720.

SQL> SELECT /*+ ordered */ p.spid, s.sid, s.serial#, s.username, s.program,s.status,TO_CHAR(s.logon_time, 'mm-dd-yyyy hh24:mi') logon_time, s.last_call_et, st.value, s.sql_hash_value, s.sql_address, sq.child_number ,sq.sql_text  

  2  FROM v$statname sn, v$sesstat st, v$process p, v$session s, v$sql sq     

  3  WHERE s.paddr=p.addr        

  4  AND s.sql_hash_value = sq.hash_value and s.sql_Address = sq.address

  5  AND s.sid = st.sid

  6  AND st.STATISTIC# = sn.statistic#        

  7  AND sn.NAME = 'CPU used by this session'

  8  AND p.spid = &osPID -- parameter to restrict for a specific PID

  9  -- AND s.status = 'ACTIVE'

 10  ORDER BY st.value desc;

Enter value for ospid: 3720

 

SPID            SID SERIAL# USERNAME   PROGRAM                          STATUS   LOGON_TIME       LAST_CALL_ET      VALUE SQL_HASH_VALUE SQL_ADDRESS      CHILD_NUMBER

------------ ------ ------- ---------- -------------------------------- -------- ---------------- ------------ ---------- -------------- ---------------- ------------

SQL_TEXT

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

3720             97   17856 CCATSUPT   JDBC Thin Client                 ACTIVE   02-21-2011 15:28         3413      16088     1113672989 C000000354738950            0

SELECT /*+ INDEX (C, I_SVR_PUB_DA_MAINQUEUE_HIS_FRT) */ COUNT(1)  FROM Svr_pub_da_MainQueue_his c,pub_Specialty k  WHERE c.Business in ( 'D46C2BC08404D1211DFA6F7BA8DCB9DB', '293C7B04CD7B0FFD56B255CC2585E16F')   AND c.specialty = k.specialtyid AND c.FirstReceptTime BETWEEN TO_DATE('2011-01-21 00:00:00', 'yyyy-MM-dd HH24:MI:SS') AND TO_DATE('2011-02-22 00:00:00', 'yyyy-MM-dd HH24:MI:SS')  AND ( k.TreeCode LIKE '0109%')  AND EXISTS (SELECT 1 FROM org_unit ou                             WHERE c.sourcedept = ou.unitid                               AND ou.TreeCode LIKE '0001000301970003%')

3、檢視該sql執行計劃,存在開銷極大的nested loop,檢查各個表的資料量後,懷疑是統計資訊出問題,走了不該走的索引。

 

-------------------------------------------------------------------------------------------------------------------------

| Id   | Operation                            | Name                           |  Rows | Bytes |  Cost | Pstart | Pstop |

-------------------------------------------------------------------------------------------------------------------------

|    0 | SELECT STATEMENT                     |                                |       |       | 34498 |        |       |

|    1 |  SORT AGGREGATE                      |                                |     1 |   203 |       |        |       |

| *  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID | SVR_PUB_DA_MAINQUEUE_HIS       |     1 |   101 | 34493 |  ROW L | ROW L |

|    3 |    NESTED LOOPS                      |                                |     1 |   203 | 34498 |        |       |

|    4 |     MERGE JOIN CARTESIAN             |                                |     1 |   102 |     5 |        |       |

|    5 |      TABLE ACCESS BY INDEX ROWID     | PUB_SPECIALTY                  |     1 |    46 |     3 |        |       |

| *  6 |       INDEX RANGE SCAN               | I_PUB_SPECIALTY_TR             |     1 |       |     2 |        |       |

|    7 |      BUFFER SORT                     |                                |     1 |    56 |     2 |        |       |

|    8 |       SORT UNIQUE                    |                                |       |       |       |        |       |

|    9 |        TABLE ACCESS BY INDEX ROWID   | ORG_UNIT                       |     1 |    56 |     2 |        |       |

| * 10 |         INDEX RANGE SCAN             | ORG_UNIT_I01                   |     1 |       |     1 |        |       |

| * 11 |     INDEX RANGE SCAN                 | I_SVR_PUB_DA_MAINQUEUE_HIS_FRT | 40241 |       |   119 |        |       |

-------------------------------------------------------------------------------------------------------------------------

4、更新統計資訊

analyze table PUB_SPECIALTY compute statistics for table for all indexed columns for all indexes;

analyze table org_unit compute statistics for table for all indexed columns for all indexes;

5、此時再次執行sql,檢視執行計劃,發現已經已無nested loop

 

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT ptimizer=CHOOSE (Cost=39980 Card=1 Bytes=2

          00)

 

   1    0   SORT (AGGREGATE)

   2    1     HASH JOIN (Cost=39980 Card=1811 Bytes=362200)

   3    2       TABLE ACCESS (FULL) OF 'PUB_SPECIALTY' (Cost=4 Card=17

          0 Bytes=7480)

 

   4    2       HASH JOIN (Cost=39975 Card=2059 Bytes=321204)

   5    4         SORT (UNIQUE)

   6    5           TABLE ACCESS (FULL) OF 'ORG_UNIT' (Cost=10 Card=21

           Bytes=1155)

 

   7    4         TABLE ACCESS (BY GLOBAL INDEX ROWID) OF 'SVR_PUB_DA_

          MAINQUEUE_HIS' (Cost=39951 Card=16097 Bytes=1625797)

 

   8    7           INDEX (RANGE SCAN) OF 'I_SVR_PUB_DA_MAINQUEUE_HIS_

          FRT' (NON-UNIQUE) (Cost=158 Card=40241)

6、查詢等待事件,latch freesession仍然存在,由於走錯了執行計劃,所以決定殺掉這些session,殺掉session後系統恢復正常

 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/11088128/viewspace-687675/,如需轉載,請註明出處,否則將追究法律責任。

相關文章