1.1現象
存在一套11.2.0.4 RAC 2節點,資料庫存在5000個會話數量,其中active正在執行的會話500個,其餘均為非活躍會話。
大量inactive會話過多給Oracle資料庫帶來什麼樣的影響?
[活躍說明資料庫存在大量併發,正常情況下說明是業務負載這塊只能拆庫,常規無法優化套路】
1) 記憶體消耗,每個會話對應資料庫來說都是一個pga process 需要單獨分配一塊記憶體區域,並且其中會話保留的遊標越多,越消耗記憶體;
2) cpu資源消耗,每個會話對應作業系統來說都是一個Oracle使用者程式,雖然大量屬於inactive 但是對於作業系統來說還是一個程式執行;
1.2相關引數
SESSIONS Default value Derived: (1.5 * PROCESSES) + 22 Range of values 1 to 216 (which is 1 to 65536) SESSIONS specifies the maximum number of sessions that can be created in the system. Because every login requires a session, this parameter effectively determines
the maximum number of concurrent users in the system. You should always set this parameter explicitly to a value equivalent to your estimate of the maximum number of concurrent users,
plus the number of background processes, plus approximately 10% for recursive sessions. PROCESSES Default value 100 Range of values 6 to operating system dependent PROCESSES specifies the maximum number of operating system user processes that can simultaneously connect to Oracle. Its value should allow for all background processes such as locks,
job queue processes, and parallel execution processes. # vi /etc/security/limits.conf oracle soft nproc 32 oracle hard nproc 64 #ulimit -a
1.3 方法論
狀態 1.活躍會話基本上無法處理,Oracle資料庫拆分[極端情況例如:會話擠壓等異常情況,處理資料庫異常後,活躍會話趨於平穩】; 2.非活躍會話可以定期清理 參考文件 https://www.anbob.com/archives/5837.html 非活躍會話方法論: A:dead session Dead connection detection(DCD)是Oracle網路功能,主要用於客戶端關閉其系統電源或客戶端計算機意外崩潰而又無法正常關閉Oracle資料庫連線的環境。 如果客戶端計算機因正確斷開會話而突然崩潰,則這些會話鎖定的資源將繼續被鎖定,從而導致環境效能問題。由於這種情況可以在任何環境中發生,因此Oracle網路服務提供了DCD的功能,可以在早期檢測到這種情況,從而可以快速恢復鎖定的資源。 為了解決這種情況並檢測死連線,Oracle在網路會話(12c中的tcp)層引入了一個新概念。伺服器程式將SQL * Net Probe資料包傳送到客戶端,以檢查在sqlnet.expire_time【單位:分鐘】引數指定的每個固定時間間隔內連線是否仍然可用。
如果通過探測資料包的通訊失敗,則會返回錯誤,導致伺服器程式退出。 SQLNET.EXPIRE_TIME = 10 B:idel session 1.WAS等中介軟體或程式軟體,存在會話一定時間不工作自動斷開設定; 2.MAX_IDLE_TIME指定會話可以空閒的最大分鐘數。 此後會話將自動終止。12.2 引入的新引數。這引數是整個db或CDB級,不可以alter session. 單位分鐘,超過idle上限時間後會收到 ORA-03113錯誤。 3.編寫plsql指令碼,定期執行[如下詳細說明] C:idle blocker session 1.目前常規基本上都是DBA人為參與處理; 2.如果一個idle session阻塞了其他會話,在19c 20c中可以使用引數max_idle_blocker_time 自動終止blocker, 當會話持有其他會話所需的資源時,該會話被視為阻塞會話. 1)如 該會話持有另一個會話所需的鎖。 2)該會話是並行操作,並且其使用者組,PDB或資料庫已達到其最大並行伺服器限制或已排隊的並行操作。 3)會話的PDB或資料庫例項即將達到其SESSIONS或PROCESSES限制。 此引數與MAX_IDLE_TIME引數的不同之處在於,MAX_IDLE_TIME適用於所有會話(阻塞和非阻塞),而MAX_IDLE_BLOCKING_TIME僅適用於阻塞會話。 因此,為了使MAX_IDLE_BLOCKING_TIME有效,其限制必須小於MAX_IDLE_TIME限制。
1.4 模擬人工處理
作業系統併發20進行登入,使用1000併發測試主機hang
#!/bin/bash i=1 while (($i<=20)) do sqlplus -S scott/tiger & let "i++" echo $1 done
測試環境process 400達到上限報錯
ERROR: ORA-00020: maximum number of processes (400) exceeded Thu Oct 22 02:24:15 2020 ORA-00020: maximum number of processes (400) exceeded ORA-20 errors will not be written to the alert log for the next minute. Please look at trace files to see all the ORA-20 errors. Process m000 submission failed with error = 20 Process m000 submission failed with error = 20 Process m000 submission failed with error = 20 Process m000 submission failed with error = 20 Thu Oct 22 02:24:18 2020
DB 什麼檢視記錄歷史登入情況達到上限值???
SQL> SELECT * FROM GV$RESOURCE_LIMIT WHERE RESOURCE_NAME='sessions' or RESOURCE_NAME='processes';
手工方法輸出KILL語句 SELECT STATUS,COUNT(*) FROM GV$SESSION GROUP BY STATUS; select username,round(LAST_CALL_ET/3600) as "HH24",count(*) from gv$session where status='INACTIVE' group by username,round(LAST_CALL_ET/3600) order by 2,3,1; select 'ALTER SYSTEM KILL session '''||sid||','||serial#||''' IMMEDIATE;' AS "KILL_SQL" from v$session where status='INACTIVE' AND LAST_CALL_ET>600 AND USERNAME='CC' union select 'ALTER SYSTEM KILL session '''||sid||','||serial#||''' IMMEDIATE;' AS "KILL_SQL" from v$session where status='INACTIVE' AND LAST_CALL_ET>36000 AND USERNAME='CB';
plsql執行kill inactive session SET SERVEROUTPUT ON DECLARE P_SID NUMBER; P_SERIAL NUMBER; P_STATUS VARCHAR2(32); KILL_SQLTEXT VARCHAR2(4000); CURSOR CUR_APP_ID IS SELECT SID, SERIAL# FROM V$SESSION WHERE STATUS = 'INACTIVE' AND USERNAME IN('SCOTT') AND STATUS = 'INACTIVE' AND LAST_CALL_ET >6 ; BEGIN OPEN CUR_APP_ID; LOOP FETCH CUR_APP_ID INTO P_SID,P_SERIAL; EXIT WHEN CUR_APP_ID%NOTFOUND; SELECT STATUS INTO P_STATUS FROM V$SESSION WHERE SID = P_SID AND SERIAL# = P_SERIAL AND USERNAME IN('SCOTT') AND STATUS = 'INACTIVE' AND LAST_CALL_ET >6 ; IF P_STATUS = 'INACTIVE' THEN KILL_SQLTEXT := 'ALTER SYSTEM KILL SESSION '||CHR(39)||P_SID || ',' || P_SERIAL ||CHR(39)||' IMMEDIATE'; EXECUTE IMMEDIATE KILL_SQLTEXT; DBMS_OUTPUT.PUT_LINE(KILL_SQLTEXT); END IF; END LOOP; CLOSE CUR_APP_ID; END; /
1.5 配置定時任務
!對於RAC 2節點,需要每個節點都配置這個指令碼,建議時間錯開一定時間,例如30分鐘。 通過實際執行發現kill 1300 inactive session話費時間3分鐘左右。
[oracle@test script]$ chmod +x kill_inactive_session.sh [oracle@test script]$ cat kill_inactive_session.sh #!/bin/bash export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1/ export ORACLE_SID=tt export PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin sqlplus / as sysdba <<EOF spool /home/oracle/script/kill_session.log select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') as "date" from dual; SET SERVEROUTPUT ON DECLARE P_SID NUMBER; P_SERIAL NUMBER; P_STATUS VARCHAR2(32); KILL_SQLTEXT VARCHAR2(4000); CURSOR CUR_APP_ID IS SELECT SID, SERIAL# FROM V\$SESSION WHERE STATUS = 'INACTIVE' AND USERNAME IN('SCOTT') AND STATUS = 'INACTIVE' AND LAST_CALL_ET >6 ; BEGIN OPEN CUR_APP_ID; LOOP FETCH CUR_APP_ID INTO P_SID,P_SERIAL; EXIT WHEN CUR_APP_ID%NOTFOUND; SELECT STATUS INTO P_STATUS FROM V\$SESSION WHERE SID = P_SID AND SERIAL# = P_SERIAL AND USERNAME IN('SCOTT') AND STATUS = 'INACTIVE' AND LAST_CALL_ET >6 ; IF P_STATUS = 'INACTIVE' THEN KILL_SQLTEXT := 'ALTER SYSTEM KILL SESSION '||CHR(39)||P_SID || ',' || P_SERIAL ||CHR(39)||' IMMEDIATE'; EXECUTE IMMEDIATE KILL_SQLTEXT; DBMS_OUTPUT.PUT_LINE(KILL_SQLTEXT); END IF; END LOOP; CLOSE CUR_APP_ID; END; / exit; EOF [oracle@test script]$ crontab -l 01 07,21 * * * sh /home/oracle/script/kill_inactive_session.sh & # tail -200f //var/log/cron Oct 22 07:00:01 test crond[4834]: (oracle) CMD (sh /home/oracle/script/kill_inactive_session.sh &)