100 Things You Probably Didn't Know About Oracle Database
Preface: I recently wanted to explore Oracle blogs from outside China, and I came across Arup's blog just as he was starting this series – 100 Things You Probably Didn't Know About Oracle Database. Reading it carefully, I found it very good: the content is basic, but the reasoning is clear and the explanations thorough. Later installments also include hands-on verification steps, somewhat in the style of Tom Kyte. The comments and discussion after each post are well worth reading too. Sharing it here.
100 Things You Probably Didn't Know About Oracle Database
Author: Arup (original link)
Recently, while delivering a presentation on Cache Fusion at the New York Oracle Users Group, the regional user group where I have been a long-time member, I was surprised to hear from many participants some beliefs they had held for a long time that were completely wrong. I always thought these were as obvious as they come; but of course I was dead wrong. What was even more surprising was that most of these believers were veterans in Oracle Database technologies, not newbies. Part of the problem – I think – lies with a system that focuses on execution rather than learning, and part of it with the lack of clear documentation. During that discussion some encouraged me to write about these. I immediately agreed it was a great idea and merited serious attention. Here is the product: my attempt at explaining some of the “mysteries” of how Oracle Database operates. I will cover 100 such nuggets of information, roughly once a week.
Before you start, however, I would like to bring your attention to an important point. You may already be aware of these facts. I was; so it is reasonable to believe that a vast majority would be as well. Under no circumstances am I claiming these to be groundbreaking or awe-inspiring. If you are already familiar with this specific knowledge, I am not at all surprised. Please feel free to skip ahead. For those who read on, I hope you find these helpful and will take a moment to write to me about how you felt.
Part 1: Myth of Commit Causing Buffer to be Flushed to the Disk
Consider a scenario: In the EMP table I updated my salary from 1000 to 2000, and committed immediately. The very instant after I issued the commit, if I check the datafile on disk for the tablespace where this table is located, which value should I see – 1000 or 2000? (Remember, the value was committed.)
Did you answer 2000 (perhaps because it was committed)? If so, then consider a normal application where commits are issued up to tens of thousands of times every minute. In a traditional database, the weakest link in the chain is always I/O. If Oracle issued an update to the datafile every time someone committed, the database would grind to a halt.
Did you answer 1000? Well, in that case, consider a case when the instance crashes. The datafile would have had 1000; not 2000 – the value that was committed. In such a case the instance must bring back the value committed (2000, in this case) to the datafile. How?
Let’s examine a different scenario. Suppose I did not issue a commit after the update (perhaps I was not sure of the implications of giving myself a pay hike, or perhaps I had a pang of conscience). I left the session as is and left for the day. The next day I was sick and didn’t come to work. Twenty-four hours have passed since I updated the record. At that point, if someone reads the datafile, what value would they see – 1000 or 2000?
Did you answer 1000 – a logical choice, since the read consistency model of Oracle Database guarantees that other sessions will see the pre-change data for uncommitted transactions?
Question #3 in this scenario: if you check the redo log file (not the datafile), what value will you find there – 1000 or 2000? Remember, it has not been committed. Did you answer 1000? It sort of makes sense; the changes are not committed, so there is no reason for them to be in the redo log file, which is a very important part of the recovery process. If you answered 2000, then how would you explain the recovery process? In case of instance failure, recovery must read the redo log file, and since the transaction was not committed, it must roll the value back to the previous one – 1000. How would it do that if the redo log file contains 2000, not 1000?
The answers, if you can’t wait any longer: 1000 for the first question, 2000 for the second and 2000 for the third. How so? Please read on.
Explanation
To understand the mechanics of the process, let’s go over the buffer management process of the Oracle database. It’s a rudimentary detail but quite vital on the path to understanding the myth here. Consider a very small table in an equally small tablespace:

SQL> create tablespace testts datafile '/tmp/testts_01.dbf' size 1M;
SQL> create table mythbuster1 (col1 varchar2(200)) tablespace testts;
Insert a row:

SQL> insert into mythbuster1 values ('ORIGINAL_VALUE');
SQL> Commit;
Shut down and restart the database so that the buffer cache is completely devoid of this table. You could also issue ALTER SYSTEM FLUSH BUFFER_CACHE; but I want to make sure all traces of this table (and the value of the column inside) vanish from all memory areas – buffer cache, shared pool, PGA, whatever. You can now check the presence of the value in the datafile:

$ strings /tmp/testts_01.dbf
}|{z
-N?pD112D2
TESTTS
1j)
w>!
ORIGINAL_VALUE
The value is right there. Now suppose a user issues a statement like this from SQL*Plus:

SQL> select * from mythbuster1;
Oracle creates a process – called a “server process” – on behalf of this user session to service the requests from the session. This process is named, in Unix and similar OSes, oracle. Here is how you can find it:

$ ps -aef | grep sqlplus
oracle 14257 14214 0 13:42 pts/2 00:00:00 sqlplus as sysdba
$ ps -aef | grep 14257
oracle 14257 14214 0 13:42 pts/2 00:00:00 sqlplus as sysdba
oracle 14258 14257 0 13:42 ? 00:00:00 oracleD112D2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

The process 14258 is the server process. The SQL*Plus process is known as the user process, which can be any process a user executes – a Java program, a Pro*C program, a TOAD session, and so on. It’s the server process that handles all the interaction with the Oracle database; not the user process. This is why Oracle database interaction is said to be based on a Two Task Architecture: there are always two tasks – the user task that a regular user has written and the server task that performs the database operations. This is an important concept, established early in the foundations of the Oracle database, to protect the database from errant code in the user task introduced either maliciously or inadvertently.
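You can also see this pairing from inside the database. Here is a minimal sketch (my addition, not from the article), assuming the session has access to v$session and v$process:

SQL> REM spid is the OS process id of this session's server process
SQL> select p.spid
  2  from v$session s, v$process p
  3  where s.paddr = p.addr
  4  and s.sid = sys_context('userenv', 'sid');

The SPID returned should match the oracle process (14258 above) that ps reported.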
The server process then identifies the block the row exists in. Since the database instance just came up, the buffer cache is empty and the block will not be found there. Therefore the server process issues a read call from the datafile for that specific block. The block is read from the disk into the buffer cache. Until the loading of the block from the disk to the buffer cache is complete, the session waits on the event db file scattered read. In this case the session issues a full table scan; had it performed an index scan, the session would have waited on the event db file sequential read. [I know, I know – it seems to defy conventional logic a little bit. I would have assumed index scans to be named scattered reads and full table scans sequential reads.]

Once this process is complete, the buffer cache holds a copy of the block of the table mythbuster1. Subsequent sessions, if they select from the table, will simply get the data from this buffer; not from the disk.
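As a hedged aside (not in the original text), you can check which of these read events your own session has actually waited on:

SQL> REM covers both 'db file scattered read' and 'db file sequential read'
SQL> select event, total_waits
  2  from v$session_event
  3  where sid = sys_context('userenv', 'sid')
  4  and event like 'db file s%read';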
Now, suppose the session issues the statement:

SQL> update mythbuster1 set col1 = 'CHANGED_VALUE';

And commits:

SQL> commit;

Immediately afterwards, check for the presence of the values in the datafile:

$ strings /tmp/testts_01.dbf
}|{z
-N?pD112D2
TESTTS
1j)
w>!
ORIGINAL_VALUE

The old value, not the new value, is found. The datafile on the disk still has the old value, even though the transaction has been committed. The update statement actually updated only the buffer in the cache; not the disk. So, when is the data on the disk updated?
The datafile gets updated by a process known as the Database Writer (a.k.a. Database Buffer Writer). It’s named DBW0. Actually, there may be more than one such process, named DBW0, DBW1, etc. – more conveniently addressed as DBWn. For the purpose of the discussion here, let’s assume only one process – DBW0. It has only one responsibility: to update the datafiles with the most up-to-date buffers from the buffer caches. [Note: I used buffer caches – plural. This is not a typo. There may be more than one buffer cache in the database – keep, recycle, default and other block sizes – but that’s for another day.] A buffer that has been updated is known as a dirty buffer, since its contents differ from the block on the disk. The DBW0 process writes the contents of the buffer to the disk – making it clean again.
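If you would like to watch a buffer turn dirty, here is a minimal sketch (my own addition, assuming DBA access to v$bh and dba_objects) that counts the not-yet-flushed buffers of our test table:

SQL> REM dirty = 'Y' marks buffers whose contents differ from the block on disk
SQL> select count(*) as dirty_buffers
  2  from v$bh b, dba_objects o
  3  where b.objd = o.data_object_id
  4  and o.object_name = 'MYTHBUSTER1'
  5  and b.dirty = 'Y';

Run it right after the update and the count is non-zero; after a checkpoint it drops back to zero.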
But the big question is: when does DBW0 write the dirty buffers to the disk? Ah, that’s the very question we are pondering here. There are several “triggering” events that cause DBW0 to copy the buffers to the disk – also called flushing of the buffers. By the way, DBW0 is a lazy process; it does not flush buffers by itself or on a regular basis. It sleeps most of the time and must be woken up by another process to perform its duties. One such watchdog process is called the Checkpoint process (you can check its existence with ps -aef | grep ckpt on Unix systems). Checkpoint does not actually perform the flushing (also called checkpointing activity); it calls the DBW0 process to do it. How often does the Checkpoint process perform a checkpoint? It depends on various conditions – the biggest of all being the MTTR setting, which we will cover later in a different installment.
Next, let's examine a different scenario. Drop the table, create it again, recycle the database to remove all buffers of the table, and then perform the update; but do not commit. Then flush the buffers from the cache to the disk. You can trigger the checkpointing activity manually instead of waiting for the checkpoint process. Here is how to do it:

SQL> alter system checkpoint;

After that statement completes, check the presence of the values in the datafile again:

$ strings /tmp/testts_01.dbf
}|{z
-N?pD112D2
TESTTS
1j)
w>!
CHANGED_VALUE,
ORIGINAL_VALUE

The old value is still there, but that is an artifact; it will eventually be gone. The new value has been written to the datafile. But do you remember a very important fact – the transaction is still not committed? In a different session, if you check the data in the COL1 column, you will see the value ORIGINAL_VALUE. Where does Oracle get that value from? It gets it from the undo segments in the undo tablespace, which contain the pre-change value.
Well, now you may wonder how on earth the server process knows that the data is uncommitted and therefore that the undo segment is to be checked. Good question. Let me add yet another wrinkle: the datafile contains the new value, not the old one. How does Oracle even know which rows to return in their pre-change form? It gets that information from the header of the block, where the transactions are recorded – called the Transaction Table or, a little bit differently, the Interested Transaction List (ITL). I will cover that in detail in a future installment of this series. For the time being, please bear in mind that the block header holds that information. When the server process accesses the buffer (or the block on the disk) to get the column value, it consults the transaction table, sees that there is an uncommitted transaction against it, and gets the undo information from there. Finally, it creates a different copy of the buffer as it would have looked had the update statement not been issued. This process is called Consistent Read (CR) Processing.
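If you are curious, you can peek at the ITL yourself with a block dump. This is only a sketch under my own assumptions – the file and block numbers below are hypothetical; derive the real ones with DBMS_ROWID and read the resulting trace file in the diagnostic trace directory:

SQL> REM find where the row lives; the relative file number usually equals
SQL> REM the absolute one in a small database
SQL> select dbms_rowid.rowid_relative_fno(rowid) as file_no,
  2         dbms_rowid.rowid_block_number(rowid) as block_no
  3  from mythbuster1;
SQL> REM dump that block (substitute the file/block numbers returned above)
SQL> alter system dump datafile 7 block 131;

In the trace file, the ITL entries appear near the top of the block dump, one slot per (potentially) interested transaction.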
Now back to our original discussion. Since DBW0 does not immediately flush the buffers to the datafile, the datafile becomes inconsistent with the committed data. Won’t that compromise the recovery process? What would happen if the instance crashed before the flushing occurred? Since the transaction was committed, recovery must bring the committed value back to the datafile. Where does that information come from? The undo tablespace? No; the undo tablespace is just another set of datafiles; it gets flushed in the same manner, so it may not have those values. Besides, it holds the pre-change values, not the new one.
Redo Stream
This is where the other leg of the database’s guarantee of committed transactions comes in. When changes occur in the table, Oracle also records the information in another pool in memory called the Log Buffer. Compared to the buffer caches, which could be several terabytes, this buffer is tiny – often just a few MB. The update statement records the pre- and post-change values in the log buffer (not in the log file, mind you). But the log buffer is just an area of memory; it also goes away when the instance crashes. So how does Oracle use this information to protect the committed data?
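To put the size difference in perspective, here is a small sketch (my addition) comparing the two areas on your own instance:

SQL> show parameter log_buffer
SQL> REM v$sgainfo lists both areas; 'Redo Buffers' is the log buffer
SQL> select name, bytes
  2  from v$sgainfo
  3  where name in ('Buffer Cache Size', 'Redo Buffers');

However big or small, the log buffer is still volatile memory – which brings us back to the question above.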
This is where the redo log files (a.k.a. online redo logs) come into the picture. When the session commits, the contents of the log buffer are flushed to the redo log files. Until the flushing is complete, the session waits on various wait events depending on conditions, the most common of which are log file sync and log file parallel write. But does the log buffer flushing occur only when a commit occurs? No. There are other triggering events as well:
(1) When one third of the log buffer is full
(2) When 1 MB of log buffer is written
(3) Every three seconds
There are other events as well, but these are the major ones. Since the commit statement flushes the log buffer to the redo log file, even if the instance crashes the information is stored in the redo log file and can easily be read by the instance recovery processes. In a RAC database, a single instance may have crashed; the instance recovery is done by one of the surviving instances, which must read the redo entries of the crashed instance to reconstruct the blocks on the disk. This is why the redo log files, although each belongs to only one instance, must be visible to all nodes.
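Before moving on, one way to convince yourself that log buffer flushes and commits are not one-to-one is to compare two system statistics. This is my own hedged illustration, not part of the original article; it assumes SELECT access to v$sysstat:

SQL> REM 'redo writes' counts LGWR flushes; 'user commits' counts commits.
SQL> REM The two need not match: the 1/3-full, 1 MB and 3-second triggers
SQL> REM flush without commits, and one flush can cover several commits.
SQL> select name, value
  2  from v$sysstat
  3  where name in ('redo writes', 'user commits');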
Even if a commit is not issued, the other triggering events flush the contents of the log buffer to the redo log files as well. The presence of redo entries in the redo log files is independent of commits. From the previous discussion you learned that the checkpoint flushes the buffers from the cache to the disk, likewise regardless of whether a commit was issued. Therefore, these are the interesting possibilities after a session updates the data (which is updated in the buffer cache):
Scenario  Session Committed?  Log Buffer Flushed  Checkpoint Occurred  Datafile Updated  Redo Log Updated  Comment
1         No                  No                  No                   No                No
2         No                  Yes                 No                   No                Yes
3         No                  No                  Yes                  Yes               No
4         No                  Yes                 Yes                  Yes               Yes
5         Yes                 Yes                 No                   No                Yes               Commit forces a log buffer flush
6         Yes                 Yes                 Yes                  Yes               Yes
Looking at the table above you may see some interesting conundrums – the redo log has the changed data but the datafile does not, and vice versa. How does Oracle know when and what exactly to recover, since the presence of a record in the redo log file is not a guarantee that the data was committed?
To address that issue, Oracle places a special marker, called a Commit Marker, in the redo stream that goes into the log buffer. When instance recovery is required, Oracle doesn’t just recover everything present in the redo log files; it looks for a commit marker. If one is not found, the changes are deemed uncommitted, and therefore Oracle rolls them back. If the changes are not found in the redo log at all, then they are uncommitted – guaranteed (remember, a commit definitely flushes the log buffer to the redo log). In that case, if the changes have made it to the datafiles, Oracle reverses them there – a process known as rolling back. When the changes are found in the redo log (along with the commit marker) but the corresponding changes are not in the datafile (scenario #5), Oracle applies the changes to the datafile from the redo entries – a process known as rolling forward. Recovery consists of both rolling forward and rolling back.
To put it all together, here is a rough algorithm for the actions of the recovery process:
Read the redo log entries, starting with the oldest one.
Check the SCN of each change and look for a commit marker.
If a commit marker is found, the data has been committed:
    Has the change been reflected in the datafile (checked via the SCN)?
    If yes, then move on.
    If no, then apply the change to the datafile (roll forward).
If no commit marker is found, the data is uncommitted:
    Is the change found in the datafile?
    If no, then move on.
    If yes, then update the datafile with the pre-change data (roll back).
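Incidentally, you can see the commit marker for yourself with LogMiner. The following is only a sketch under assumed values – the redo log path is hypothetical for this database, and it assumes the privileges to run DBMS_LOGMNR are in place:

SQL> REM register a redo log file (path is hypothetical) and start LogMiner
SQL> exec dbms_logmnr.add_logfile('/u01/oradata/D112D2/redo01.log', dbms_logmnr.new)
SQL> exec dbms_logmnr.start_logmnr(options => dbms_logmnr.dict_from_online_catalog)
SQL> REM committed transactions show a COMMIT record - the marker - in the stream
SQL> select scn, operation, sql_redo
  2  from v$logmnr_contents
  3  where operation in ('UPDATE', 'COMMIT');
SQL> exec dbms_logmnr.end_logmnr

An UPDATE with no matching COMMIT record in the stream is exactly the uncommitted case discussed above.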
Takeaways
Let me reiterate some of the lessons from this installment.
(1) Data buffers are flushed to the disk from the buffer cache independently of the commit statement. Commit does not flush the buffers to the disk.
(2) If the buffer is modified in the buffer cache but not yet flushed to the disk, it is known as a dirty buffer.
(3) If a buffer is clean (i.e. not dirty), it does not mean that the data changes have been committed.
(4) When a commit occurs, the log buffer (not the buffer cache) is flushed to the disk.
(5) The log buffer may already have been flushed to the disk due to other triggering events. So if a change is found in the redo log file, the change is not necessarily committed.
(6) A commit statement puts a special “commit marker” in the redo log, which is the guarantee of a commit.
(7) The frequency of the flushing of the buffer cache to the datafiles is controlled by the MTTR setting and whether free buffers are needed in the cache due to incoming blocks from the datafiles.
How do you use this information? There are several things for you to consider:
(1) The more you commit, the more often the log buffer is flushed to the disk – not very good for I/O.
(2) The more aggressive the MTTR target, the less time recovery will take if the instance crashes, but the more frequently buffers will be flushed to the datafiles – causing more I/O (see the sketch after this list).
(3) The MTTR target has nothing to do with commit frequency; they are two independent activities. So, reducing commit frequency will not cause a reduction in flushing frequency.
(4) If your buffer cache is small, buffers will need to be flushed more often.
(5) Exactly how small is “small”? There is no fixed formula; it depends on how much of the data in the buffer cache is updated.
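Here is the sketch referred to in point (2) above – my own illustration, not the author's. v$instance_recovery shows how the MTTR target translates into checkpointing pressure, assuming FAST_START_MTTR_TARGET is set:

SQL> show parameter fast_start_mttr_target
SQL> REM target_mttr is what Oracle aims for; actual_redo_blks is the amount
SQL> REM of redo recovery would have to apply if the instance died right now
SQL> select target_mttr, estimated_mttr, actual_redo_blks
  2  from v$instance_recovery;

The lower the target, the smaller ACTUAL_REDO_BLKS is kept – by flushing dirty buffers to the datafiles more aggressively.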
I hope you enjoyed this installment of “100 Things …”. In future installments I will explain some of the other nuances of the Oracle database that you may not have been aware of. As always, I will highly appreciate it if you could drop me a line with your feedback – good, bad, and anything in between.