記一次資料庫重啟後歸檔急劇增加的問題
在本地的環境中測試外部表的效能,由於空間有限,不一會兒歸檔的空間就爆了。然後檔案貌似出現了系統級的問題,剛剛rm掉的歸檔日誌檔案。隔了幾秒鐘再ls,就出現了。怎麼刪都刪不掉。嘗試了多次之後,無奈嘗試shutdown immediate結果等了好半天還是沒反應,然後採用shutdown abort後重啟,庫直接起不來了。報了ora錯誤,然後庫就起不來了。
檢視日誌顯示,和之前碰到的歸檔空間不足導致的問題一致。清除有問題的歸檔之後,重啟庫就起來了。可以參見日誌:http://blog.itpub.net/23718752/viewspace-1167163/
SMON: enabling cache recovery
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC1: Encountered disk I/O error 19502
ARC1: Closing local archive destination LOG_ARCHIVE_DEST_1: '/u03/ora11g/product/11.2.0/dbhome_1/dbs/arch1_339_837590339.dbf' (error 19502) (TEST01)
ARC1: I/O error 19502 archiving log 3 to '/u03/ora11g/product/11.2.0/dbhome_1/dbs/arch1_339_837590339.dbf'
ARC2: Encountered disk I/O error 19502
ARC2: Closing local archive destination LOG_ARCHIVE_DEST_1: '/u03/ora11g/product/11.2.0/dbhome_1/dbs/arch1_337_837590339.dbf' (error 19502) (TEST01)
ARC2: I/O error 19502 archiving log 2 to '/u03/ora11g/product/11.2.0/dbhome_1/dbs/arch1_337_837590339.dbf'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance TEST01 - Archival Error
ORA-16038: log 3 sequence# 339 cannot be archived
ORA-19502: write error on file "", block number (block size=)
ORA-00312: online log 3 thread 1: '/u03/ora11g/oradata/TEST01/redo03.log'
Sun Jun 08 00:37:11 2014
[2867] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:1051584 end:1055564 diff:3980 (39 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
ARC3: Encountered disk I/O error 19502
ARC3: Closing local archive destination LOG_ARCHIVE_DEST_1: '/u03/ora11g/product/11.2.0/dbhome_1/dbs/arch1_337_837590339.dbf' (error 19502) (TEST01)
No Resource Manager plan active
Sun Jun 08 00:37:13 2014
Non critical error ORA-48181 caught while writing to trace file "/u03/ora11g/diag/rdbms/test01/TEST01/trace/TEST01_dbrm_2840.trc"
Error message: Linux-x86_64 Error: 28: No space left on device
Additional information: 1
然而這僅僅是個開始。我檢查檔案的使用情況,馬上發現有一個目錄下空間只剩下幾百k了,排查空間的使用情況,最後定為是Undo的自增長造成的,本來500M左右的Undo現在漲到了900多M。
因為庫是剛起來的,也沒什麼其他的操作,於是就做了Undo檔案的resize,結果讓我大跌眼鏡。
SQL> alter database datafile '/u03/ora11g/oradata/TEST01/undotbs01.dbf' resize 500M;
alter database datafile '/u03/ora11g/oradata/TEST01/undotbs01.dbf' resize 500M
*
ERROR at line 1:
ORA-03297: file contains used data beyond requested RESIZE value
resize不行,再也沒有其他的多餘空間,而且目前遇到的情況更奇怪,歸檔生成的極為頻繁。本來空間就緊張的虛擬機器幾乎不能做任何操作。
我最後嘗試更改歸檔路徑,把歸檔指到空間稍大的一個分割槽。然後再查檢視到底是什麼在後臺消耗了這麼多的資源,這個庫自啟動以來沒做任何其他的操作。
先更改了歸檔路徑,然後shutdown immediate還是沒反應,嘗試shutdown abort重啟。這次重啟沒有其他的問題。庫起來了,
但是在短時間內生成了相當多的歸檔檔案。有很多已經被自己手工刪除了。
rw-r----- 1 ora11g dba 99933184 Jun 8 01:04 1_358_837590339.dbf
-rw-r----- 1 ora11g dba 88097280 Jun 8 01:05 1_359_837590339.dbf
-rw-r----- 1 ora11g dba 99902976 Jun 8 01:06 1_360_837590339.dbf
-rw-r----- 1 ora11g dba 93864960 Jun 8 01:07 1_361_837590339.dbf
-rw-r----- 1 ora11g dba 97522176 Jun 8 01:08 1_362_837590339.dbf
-rw-r----- 1 ora11g dba 95759360 Jun 8 01:09 1_363_837590339.dbf
-rw-r----- 1 ora11g dba 92051968 Jun 8 01:10 1_364_837590339.dbf
[ora11g@rac1 arch]$
檢視系統級的程式。裡面有好幾個並行相關的程式,目前沒有其他的操作,是在做後臺的回滾嗎?
我記得重啟之前做資料載入測試的時候是用了並行。
top - 01:10:00 up 1:06, 3 users, load average: 3.52, 3.27, 2.99
Tasks: 181 total, 1 running, 180 sleeping, 0 stopped, 0 zombie
Cpu(s): 10.8%us, 4.6%sy, 0.0%ni, 15.1%id, 68.7%wa, 0.2%hi, 0.7%si, 0.0%st
Mem: 2030124k total, 1965484k used, 64640k free, 2408k buffers
Swap: 4063224k total, 0k used, 4063224k free, 1578300k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3176 ora11g 20 0 530m 76m 74m D 10.9 3.9 2:16.24 ora_p003_TEST01
3158 ora11g 20 0 531m 79m 77m D 9.6 4.0 2:11.00 ora_p001_TEST01
3174 ora11g 20 0 530m 73m 71m S 5.3 3.7 2:13.23 ora_p002_TEST01
3128 ora11g 20 0 549m 91m 71m D 3.6 4.6 0:50.43 ora_dbw0_TEST01
3936 ora11g 20 0 14940 1248 904 R 1.0 0.1 0:00.15 top -c
3130 ora11g 20 0 545m 22m 20m D 0.7 1.2 0:09.51 ora_lgwr_TEST01
2655 grid 20 0 97440 1556 744 S 0.3 0.1 0:00.45 sshd: grid@pts/0
3114 ora11g -2 0 529m 15m 13m S 0.3 0.8 0:01.41 ora_vktm_TEST01
3140 ora11g 20 0 530m 31m 28m S 0.3 1.6 0:00.89 ora_mmnl_TEST01
檢視當前是否有session在操作
SELECT s.USERNAME,s.SID,s.SERIAL#,t.UBAFIL "UBA filenum", t.UBABLK
"UBA Block number",t.USED_UBLK "Number os undo Blocks Used",
t.START_TIME,t.STATUS,t.START_SCNB,t.XIDUSN RollID,r.NAME RollName
FROM v$session s,v$transaction t,v$rollname r
WHERE s.SADDR=t.SES_ADDR AND t.XIDUSN=r.usn;
no rows selected
檢視undo的使用情況
SELECT
seg.tablespace_name "Tablespace Name",
ts.bytes/1024/1024 "TS Size(MB)",
ue.status "UNDO Status",
count(*) "Used Extents",
round(sum(ue.bytes)/1024/1024, 2) "Used Size(MB)",
round(sum(ue.bytes)/ts.bytes*100, 2) "Used Rate(%)"
FROM dba_segments seg, DBA_UNDO_EXTENTS ue,
(SELECT tablespace_name, sum(bytes) bytes
FROM dba_data_files GROUP BY tablespace_name) ts
WHERE ue.segment_NAME=seg.segment_NAME and seg.tablespace_name=ts.tablespace_name
GROUP BY seg.tablespace_name, ts.bytes, ue.status ORDER BY seg.tablespace_name;
Tablespace Name TS Size(MB) UNDO Stat Used Extents Used Size(MB) Used Rate(%)
------------------------------ ----------- --------- ------------ ------------- ------------
UNDOTBS 933.125 ACTIVE 283 929.88 99.65
UNDOTBS 933.125 EXPIRED 12 .75 .08
UNDOTBS 933.125 UNEXPIRED 9 1.5 .16
使用ash來檢視一些到底在那幾分鐘裡發生了什麼。
Sid,Srl# (Inst) % Activity SQL ID Event % Event
--------------- ---------- ------------- ----------------------------- --------
User Program
-------------------- ------------------------------
243, 1(1) 75.35 wait for a undo record 21.62
SYS oracle@rac1 (SMON)
db file sequential read 18.32
SYS oracle@rac1 (SMON)
wait for stopper event to be 16.08
SYS oracle@rac1 (SMON)
-------------------------------------------------------------
Top DB Objects DB/Inst: TEST01/TEST01 (Jun 08 01:04 to 01:09)
-> With respect to Application, Cluster, User I/O and buffer busy waits only.
Object ID % Activity Event % Event
--------------- ---------- ------------------------------ ----------
Object Name (Type) Tablespace
----------------------------------------------------- -------------------------
15390 7.00 db file sequential read 6.83
N1.T (TABLE) POOL_DATA
15391 3.92 db file sequential read 2.86
N1.SYS_C005621 (INDEX) POOL_DATA
buffer busy waits 1.06
-------------------------------------------------------------
Top DB Files DB/Inst: TEST01/TEST01 (Jun 08 01:04 to 01:09)
-> With respect to Cluster and User I/O events only.
File ID % Activity Event % Event
--------------- ---------- ------------------------------ ----------
File Name Tablespace
----------------------------------------------------- -------------------------
3 9.30 db file sequential read 8.63
/u03/ora11g/oradata/TEST01/undotbs01.dbf UNDOTBS
10 2.13 db file sequential read 2.02
/u01/ora11g/pool_data06.dbf POOL_DATA
8 1.46 db file sequential read 1.46
/u03/ora11g/oradata/TEST01/pool_data04.dbf POOL_DATA
11 1.40 db file sequential read 1.40
/u01/ora11g/pool_data07.dbf POOL_DATA
7 1.34 db file sequential read 1.34
/u03/ora11g/oradata/TEST01/pool_data02.dbf POOL_DATA
-------------------------------------------------------------
Top Latches DB/Inst: TEST01/TEST01 (Jun 08 01:04 to 01:09)
No data exists for this section of the report.
-------------------------------------------------------------
Activity Over Time DB/Inst: TEST01/TEST01 (Jun 08 01:04 to 01:09)
-> Analysis period is divided into smaller time slots
-> Top 3 events are reported in each of those slots
-> 'Slot Count' shows the number of ASH samples in that slot
-> 'Event Count' shows the number of ASH samples waiting for
that event in that slot
-> '% Event' is 'Event Count' over all ASH samples in the analysis period
Slot Event
Slot Time (Duration) Count Event Count % Event
-------------------- -------- ------------------------------ -------- -------
01:04:00 (0 secs) 7 write complete waits 3 0.17
db file async I/O submit 1 0.06
free buffer waits 1 0.06
01:04:00 (1.0 min) 343 wait for a undo record 81 4.54
db file sequential read 67 3.75
wait for stopper event to be i 59 3.31
01:05:00 (1.0 min) 330 wait for a undo record 69 3.87
db file async I/O submit 58 3.25
wait for stopper event to be i 55 3.08
01:06:00 (1.0 min) 363 wait for a undo record 97 5.43
db file async I/O submit 59 3.31
wait for stopper event to be i 52 2.91
01:07:00 (1.0 min) 377 db file sequential read 87 4.87
wait for a undo record 77 4.31
wait for stopper event to be i 61 3.42
01:08:00 (1.0 min) 365 db file sequential read 80 4.48
wait for a undo record 61 3.42
db file async I/O submit 59 3.31
-------------------------------------------------------------
這樣看果然一目瞭然,全部的問題都指向了一個表t和對應的索引。
這個表是做資料載入測試使用的表,載入的資料量有千萬。會產生很多的redo。
檢視n1.t這個表的情況,看錶裡面,目前是沒有資料,但是查詢會持續相當長的時間。簡單驗證一下。
SQL> select count(*)from t where rownum<2;
COUNT(*)
----------
0
然後檢視錶t的統計資訊。還是現實千萬的資料條數。
*******************************************
OWNER TABLE_NAME
------------------------------ ------------------------------
N1 T
*******************************************
********** TABLE GENERAL INFO *****************
TABLE_NAME PAR TABLESPACE STATUS NUM_ROWS BLOCKS EMPTY_BLOCKS LOG MON ROW_MOVE LAST_ANAL
------------------------------ --- ---------- ------ ---------- ---------- ------------ --- --- -------- ---------
T NO POOL_DATA VALID 13240320 74364 0 NO YES DISABLED 07-JUN-14
********** TABLE STORAGE INFO *****************
INITEXT NXTEXT MINEXT MAXEXT FREELISTS AVG_SPACE CHAIN_CNT AVG_ROW_LEN CACHE T DEPENDEN COMPRES
--------- --------- ---------- ----------- ---------- ---------- ---------- ----------- -------------------- - -------- ----------
65536 1048576 1 2147483645 0 0 36 N N DISABLED DISABLED
在我查完問題之後,問題也好像自動解決了,歸檔也不在生成了。檢視alert檔案,確實是在做事務的並行回滾,不過剛剛做完。
Thread 1 advanced to log sequence 360 (LGWR switch)
Current log# 3 seq# 360 mem# 0: /u03/ora11g/oradata/TEST01/redo03.log
Sun Jun 08 01:05:28 2014
Archived Log entry 304 added for thread 1 sequence 359 ID 0xd9428f03 dest 1:
Sun Jun 08 01:06:09 2014
Thread 1 cannot allocate new log, sequence 361
Checkpoint not complete
Current log# 3 seq# 360 mem# 0: /u03/ora11g/oradata/TEST01/redo03.log
Sun Jun 08 01:06:22 2014
Thread 1 advanced to log sequence 361 (LGWR switch)
Current log# 1 seq# 361 mem# 0: /u03/ora11g/oradata/TEST01/redo01.log
Sun Jun 08 01:06:26 2014
Archived Log entry 305 added for thread 1 sequence 360 ID 0xd9428f03 dest 1:
Sun Jun 08 01:07:13 2014
Thread 1 advanced to log sequence 362 (LGWR switch)
Current log# 2 seq# 362 mem# 0: /u03/ora11g/oradata/TEST01/redo02.log
Sun Jun 08 01:07:16 2014
Archived Log entry 306 added for thread 1 sequence 361 ID 0xd9428f03 dest 1:
Sun Jun 08 01:08:17 2014
Thread 1 cannot allocate new log, sequence 363
Checkpoint not complete
Current log# 2 seq# 362 mem# 0: /u03/ora11g/oradata/TEST01/redo02.log
Sun Jun 08 01:08:30 2014
Thread 1 advanced to log sequence 363 (LGWR switch)
Current log# 3 seq# 363 mem# 0: /u03/ora11g/oradata/TEST01/redo03.log
Sun Jun 08 01:08:33 2014
Archived Log entry 307 added for thread 1 sequence 362 ID 0xd9428f03 dest 1:
Sun Jun 08 01:09:28 2014
Thread 1 advanced to log sequence 364 (LGWR switch)
Current log# 1 seq# 364 mem# 0: /u03/ora11g/oradata/TEST01/redo01.log
Sun Jun 08 01:09:30 2014
Archived Log entry 308 added for thread 1 sequence 363 ID 0xd9428f03 dest 1:
Sun Jun 08 01:10:20 2014
Thread 1 advanced to log sequence 365 (LGWR switch)
Current log# 2 seq# 365 mem# 0: /u03/ora11g/oradata/TEST01/redo02.log
Sun Jun 08 01:10:24 2014
Archived Log entry 309 added for thread 1 sequence 364 ID 0xd9428f03 dest 1:
Sun Jun 08 01:10:27 2014
SMON: Parallel transaction recovery tried
我現在要做的就是把表t的高水位線放下來。
SQL> truncate table t;
Table truncated.
SQL> set timing on
select count(*)from t
SQL> /
COUNT(*)
----------
0
Elapsed: 00:00:00.00
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/23718752/viewspace-1178198/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- RAC開啟資料庫歸檔資料庫
- 記錄一次重啟伺服器 Redis 資料丟失問題伺服器Redis
- Oracle資料庫日常問題-歸檔異常增長Oracle資料庫
- 記一次生產資料庫“意外”重啟的經歷資料庫
- 記一次 Kafka 重啟失敗問題排查Kafka
- 記一次 Mac 意外重啟導致的 Homestead 問題Mac
- 資料庫主機重啟卡住問題處理分享資料庫
- Oracle資料庫歸檔模式的開啟和關閉Oracle資料庫模式
- 記一次資料庫查詢超時優化問題資料庫優化
- 對歸檔模式下CLEAR 未歸檔日誌後恢復資料庫的一點看法模式資料庫
- SQL資料庫怎麼進行資料歸檔和歸檔管理?SQL資料庫
- iptables 重啟後ftp 策略失效的問題FTP
- 資料庫管理-第118期 記一次開啟附加日誌導致的效能問題(202301129)資料庫
- 記一次 oracle 資料庫在當機後的恢復Oracle資料庫
- Jmeter 連結資料庫後批次使用的問題JMeter資料庫
- Polardb-O資料庫歸檔配置資料庫
- Oracle資料庫歸檔模式的切換ELOracle資料庫模式
- 為VNPY增加資料庫記錄交易資料功能資料庫
- 記一次華為雲重啟伺服器後遠端登入提示emergency mode(緊急模式)的解決方式伺服器模式
- 記一次開啟資料庫慢原因分析過程資料庫
- 歸檔路徑更改後,如何對資料庫進行恢復(轉)資料庫
- Oracle資料庫啟動問題彙總(一)Oracle資料庫
- Sybase或SQLServer資料庫分批歸檔方案SQLServer資料庫
- 升級到Spring 5.3.x之後,GC次數急劇增加,我TM人傻了SpringGC
- 資料檢視的重複問題
- MySql資料庫ibtmp1檔案增長問題處理記錄MySql資料庫
- 記憶體洩漏引起的 資料庫效能問題記憶體資料庫
- 記錄一次刪除檔案失敗的問題
- 達夢資料庫歸檔方式及其配置方法資料庫
- oracle資料庫歸檔日誌量陡增分析Oracle資料庫
- Oralce資料庫關閉歸檔日誌並且刪除歸檔日誌資料庫
- linux的centos版本修改時間重啟後無效的問題LinuxCentOS
- 記錄一次手動升級達夢後DmAPService無法啟動問題
- 關於一個歸檔問題?
- DM7使用dmrestore工具利用不同資料庫的歸檔恢復資料庫REST資料庫
- MYSQL速度慢的問題 記錄資料庫語句MySql資料庫
- 好文分享 | 記一次Oracle12c資料庫SQL短暫緩慢問題分析Oracle資料庫SQL
- SSM解決中文存入資料庫亂碼問題(記錄自己的問題)SSM資料庫
- 記一次docker容器啟動/關閉,非常慢的問題Docker