【思考點亮生命之光】AWR之父Graham Wood專訪

tianya_2011發表於2017-10-12


Graham Wood  

Oracle RWP資深架構師


Graham 在效能最佳化方面有30多年的經驗,設計和最佳化過眾多大使用者量資料量的高階系統,建立及參與建立了包括 Statspack 在內的很多監控工具以及資料庫效能相關的特性,簡化了資料庫監控的流程,是 AWR,ASH,ADDM 的架構師,被業界尊稱為 AWR之 父。Graham 曾多次在 Oracle Open World 以及世界各地的第三方會議上演講,分享如何高效的使用 Oracle 的強大產品。

It’s 22 Mar, less than a week before the RWP China Tour, another technical seminar about Oracle database will also be held at the end of this month in Beijing, on cooperation with RWP and ACOUG.

由 RWP 團隊和 ACOUG 聯合主辦的 RWP 中國之旅將於3月28日在京隆重舉辦,倒數計時一週,一場圍繞 Oracle 效能的技術盛會將全面襲來,大家是否充滿期待?

 

We are all so excited about this conference, and the other reason we are gathering here today is (now you can prepare your applause if you like) that I am going to introduce you Mr. Wood, the father of AWR, the senior architect of Oracle Real-World Performance.

下面為大家隆重介紹:AWR 之父,Graham Wood,同時也是 Oracle RWP 團隊的資深架構師。


Q1:Hello, Mr. Wood. It’s an honor to have this conversation with you, Iguess most people work on database have known you well, but for new friends whomight also read this, please tell something about yourself, say hello toeverybody.

問題一:Mr. Wood您好,很榮幸能對您做這次採訪,首先請您簡單介紹一下自己,跟大家打個招呼吧!

Graham:Hi, I am Graham Wood. I’ve been working at Oracle for over 30 years now and almost all of that time has been working on performance related. Almost as soon as I started working with the Oracle database I was working on projects for which performance was key. As well as working with customers worldwide, I was the architect for AWR, ASH and ADDM which were introduced in Oracle 10g with the intention of making it easier to diagnosis the root cause of performance problems. Now I spend most of my time using those tools.

嗨,我是Graham Wood,已經在 Oracle 工作了30多年,幾乎一直從事與與效能有關的工作。 幾乎從我開始從事 Oracle,我主要負責的就是效能最佳化。在與全球客戶一起工作的同時,我還是 AWR,ASH 和 ADDM 的架構師,這些工具在 Oracle 10g 的時候被引入,目的是使診斷效能問題的根本原因更容易一些。這也是我現在的主要做的事情。


I’m not only interested in the performance of computer systems. I took a flight on Concorde before it was retired and last time I was in China I took the world’s fastest train to Shanghai Airport. I also like fast cars and fast motorcycles.

我不僅僅是對計算機系統效能感興趣,在協和飛機退役之前,我還曾乘坐過一次;上次在中國,我也乘坐世界上最快的火車前往上海機場; 快車和快速摩托車也是我最愛的。

2. It has been two years since last RWP China Tour, which was held in Oct 2015. No need to mention that it was a real great success, 300 technology enthusiasts gather together, they sharing, discussing and of course progressing. It was so much more than collision of technology.

上一次的RWP中國之旅是在2015年舉辦的,那是一次成功的技術盛會。廣大的Oracle技術愛好者歡聚一堂,思維碰撞,共同成長!3月28,我們將再次在北京相聚,我相信大家都已經迫不及待要再次一睹RWP團隊的風采,並在其最新研究中感悟技術之美!


Q2:Mr. Wood, would you like to share some interesting memory or the most unforgettable part about last RWP China Tour in Beijing, and what’s your biggest expectation about this time, what subject will you share in conference?

問題二:關於上次的RWP中國之旅您還記得嗎?有沒有什麼有趣或者難忘的經歷想和大家分享的呢?這次RWP之行,您有什麼期待,您將會分享什麼內容給大家呢?

Graham:I think that we were all very impressed by the enthusiasm and thirst for knowledge from the attendees. For me the most unforgettable part was the age of the attendees. When presenting at conferences I used to give out pens to attendees who were not born in 1986 when I joined Oracle. It would be very expensive to do that in China!

與會者的熱情和對知識的渴望讓我們印象深刻。 對我來說,最難忘的是與會者的年齡差。之前的會議上,我們會給在1986年還沒有出生的參會者贈送禮物,那一年剛好是我加入Oracle。但在中國,竟然有大部分的參會者都是1986年之後出生的年輕人。


Obviously technology has moved on in the last couple of years and we have seen new patterns emerging in the systems that we look at. Unfortunately new technology is not automatically better in all cases, and we will be sharing example of misusing development approaches that can severely limit system throughput.

顯然,技術在過去的幾年中已經發生了變化,系統中出現了新的模式。但很不幸,在所有情況下新技術都不會自我完善,所以我們將分享一些由於開發不當導致嚴重限制系統吞吐量的示例。

Q3:AWR is very important content for Oracle database, it has been helping operators to quickly recognize and resolve the performance issues of database since Oracle 10g, brought a lot of convenience. As the father of AWR, what is your original intention of designing this product, and what’s the major change about AWR in 12c release 2, if you don’t mind, we would also like to know how will it develop in future?

問題三:AWR是Oracle資料庫中很重要的一部分,自從10g 中推出該功能以後,它幫助廣大的運維者能夠快速排查和處理效能問題,作為AWR之父,您最初是出於什麼樣的初衷設計這樣一個工具的?在12.2中AWR發生的最重要的變化您可以跟可以介紹一下嗎? 如果您不介意的話,我們也想知道AWR今後將會朝著什麼樣的方向發展。

Graham:The goal of AWR was to have a single, unified repository populated by ‘always on’ data collection about the operation of an Oracle database. By having this data collected continuously we are never in the position that we have to guess about the reason when we have a performance issue on a system, or have to reproduce a problem before we can diagnose it. The data is always there to allow us to diagnose a problem the very first time it happens. Key to the success of the diagnosis is the concept of DB Time, focusing on the only thing that really matters in performance, time. It allows us to determine the amount of leverage that any one issue has in the overall performance of the database and also allows for some automation of the diagnostic process by ADDM.

AWR 的目標是擁有一個統一的儲存庫,資料來源於 Oracle 資料庫操作中“始終開啟”的資料採集功能。透過不斷累積資料,我們則無需在系統出現效能問題時揣測原因,或者在診斷前必須將問題重現。 歷史資料總能幫我們在故障發生的第一時間將原因確診。診斷成功的關鍵是 DB Time 概念,關注的重點永遠是效能問題的最主要因素 - 時間。 它能讓我們確定任何一個因素在資料庫整體效能上的比例,還能透過 ADDM 進行自動化診斷。


With each new version of the database there are new challenges and requirements for AWR. For Oracle Database 12c release 2 there were obviously major new features that had to be addressed in AWR, most prominently the pluggable multi-tenant databases, while continuing to enhance and improve for existing features, such Active Data Guard, capturing SQL Monitor data into AWR and incorporating data from ASH and ADDM into the AWR report, so we no longer need to ask for multiple files.

AWR 對每個新的資料庫版本都有新的挑戰和要求。對於 Oracle 12.2,最明顯的主要新功能是必須在 AWR 中解決,最突出就是可插拔的多租戶資料庫,不斷增強和改進了現有功能,如 Active Data Guard,將 SQL 監控資料捕獲到 AWR 並整合來自 ASH 和 ADDM 的資料到 AWR 報告中,所以我們無需再要多個檔案。

Q4:With the era of automation and intelligent operation coming, many people want to rescue DBA from repeat cumbersome work, so they have designed a variety of performance monitoring and optimization software. What do you think makes a good monitoring or optimization software,and with so many of these  software and tools out there, would you like to analyse the best application scenario for these tools/softwares, how to make the most of them ?

問題四:隨著自動化和智慧化運維的發展,很多人都想把DBA從繁冗重複的工作中解救出來,為此出現了很多各類的效能監控和最佳化工具,Oracle也有很多自帶的效能工具,您能跟大家分析一下這些工具的最佳使用場景嗎?如何才能更好地發揮這些工具的價值呢?

Graham:There certainly are many tools out there and the key tousing them effectively is make sure that you have correctly scoped the problem before using the right tool. In RWP we always take a top down approach, normally starting with AWR/ADDM to look at a database wide view. From there we will drill down into the issues found. If the issue comes down to SQL, then we will drill down into SQL Monitor reports to get the most detailed information about how and why a statement is consuming resources. Key data sources that we look at in SQL Monitor are the Estimated Rows and Actual Rows which will often lead us to understanding poor plans caused by cardinality mismatches. If we are trying to understand the flow of an application (often in the case where the developer has moved on) then we will use SQL Trace to give a complete database side view of the application. SQL Trace can also give us lots of details about how much time an application is spending outside the database and lead to a ‘not a database problem’ diagnosis where the majority of the time is outside the database. Other scenarios may require us to look at other data. For example diagnosing locking issues we would look at ASH data to identify the activity (if any) of the lock holder and waiters. We can do this easily through the Enterprise Manager Top Activity screen or directly by writing SQL against the ASH data in AWR.

工具肯定是有很多的,有效使用的關鍵是確保在使用之前,你已經正確地定位了問題。在 RWP 中,我們始終採取自上而下的方法,通常從 AWR / ADDM 開始全面檢查資料庫,從而可以深入瞭解發現的根源。如果問題歸結於 SQL,那麼我們將深入瞭解 SQL 監控報告,以獲取如何以及為何某一語句消耗資源的相關詳細資訊。我們在 SQL 監控中看到的關鍵資料來源是估計行和實際行,這些通常會使我們瞭解基數不匹配導致的不良計劃。如果我們試圖瞭解應用程式的流程(通常在開發人員已經移動的情況下),那麼我們將使用 SQL 跟蹤來提供應用程式的完整資料庫端檢視。 SQL 跟蹤還可以提供關於應用程式在資料庫之外花費時間的細節,並給出“非資料庫問題”的診斷。其他情況可能要求我們檢視其他資料,例如診斷鎖定問題,可以檢視 ASH 資料,以確定鎖的持有和等待活動(如果有的話)。可以透過企業管理器頂部活動螢幕輕鬆完成,也可以直接透過在 AWR 中針對 ASH 資料編寫 SQL。

Q5: In the past few years, the traditional industry has been under huge impact of Internet,in fact they need to changes their IT architecture to adapt the trend ,so how do you think the IT architecture ofinternet or traditional industry would change in like 20 or more years?

問題五:在最近這幾年,網際網路給傳統行業帶來了巨大的影響和改變, 很多傳統行業做了IT系統架構的改變去迎接潮流,在您看來,今後比如說十年或者20年的時間裡,傳統行業的IT架構將會有什麼樣的發展趨勢呢?

Graham:20 years! I don’t think anyone has a crystal ball that good! I can certainly say that in that time, applications will come and go, but data will always be there and processing that data will always be major way that companies can add value. The way that users interact with systems will change. When I started with Oracle users interacted with green screens connected to main frames, now a high proportion of interactions are through mobile applications, and I’m sure that in 20 years time many of the interactions will be through voice commands as we are seeing with Apple’s Siri and Amazon’s Alexa (a voice activated assistant that links to home automation tools) But underlying all of these interactions will still be data and solid, rigorous SQL databases will still be needed to manage that data.

20年!我覺得技術發展這麼快,20年是很難預測的! 但我可以肯定,在那個時候,應用程式會發生很大的變化,但是資料的核心地位卻是不變的。資料治理將永遠是企業增值的主要方式。 使用者與系統互動的方式將會改變。 當我開始透過綠色螢幕開始與 Oracle 使用者進行互動時,絕大部分是透過移動應用程式來進行;而且我相信20年後,許多互動將透過語音命令進行,就像與 Apple Siri 和 Amazon Alexa (一個連線到家庭自動化工具的語音啟用助手,)但是,所有這些互動的基礎仍然是資料,而且仍然需要嚴格的 SQL 資料庫來管理資料。

 

Q6:During your 30 years database performance tuning experience, what’sthe most reason of the performance issues you’ve encountered result from, whatis the ideal system like and how to build a perfect database system?

問題六:在您30多年的效能最佳化的經驗裡,您遇到的最多的導致效能問題的原因是什麼呢?一個效能良好的完美系統應該是什麼樣的呢?如何才能搭建這樣一套系統呢?

Graham:Well over 30 years all of the hardware related issues have changed completely. Moore’sLaw has continued and more recently we have seen a big change from spinning disk to solid state technology for permanent storage. But all the way through the single most common reason for performance issues has been suboptimal design and build of applications. A well designed database and interfaces makes it much simpler build a well performing application, and a poorly designed database and interface make it impossible.

30多年來,所有硬體相關問題都已經完全改變。 摩爾定律還在持續中,最近我們已經看到從磁碟到固態儲存的巨大變化。 但自始至終,效能問題的最常見原因,一直都是是次優設計和應用程式的構建。精心設計的資料庫和介面使構建良好的應用程式變得簡單很多,反之則成為不可能。


As far as an ideal system, it is really all about balance. A balanced system is one that can fully utilize all hardware resources, a system that has enough IO bandwidth to be able to keep the CPUs busy or has enough memory that you don’t need to do much IO. This has been a big part of the reason for the success of the Oracle engineered systems such as Exadata. But still, having a powerful, well balanced system can only go so far to help an application that is poorly designed whereas it can give exceptional performance with a well designed one.

就一個理想的系統來說,平衡才是核心。 一個平衡的系統能夠充分利用所有硬體資源,並且具有足夠的IO頻寬使CPU持續工作,此外還擁有足夠的記憶體,而無需對IO做過多處理。 這已經成為Oracle工程系統成功的主要原因,例如 Exadata。 但是,擁有一個功能強大並且平衡的系統,到目前為止,對一個設計不善的應用程式的幫助微乎其微,而對一個精心設計的應用程式則可以提供卓越的效能改善。

 

Q7:There is a new feature in Oracle 12.2 that is about ADG&AWR, describe as following:

(The Oracle Diagnostics Pack can be used with an Active Data Guard standby database that is open for read-only access. This enables you to capture the performance data tothe Automatic Workload Repository (AWR) for an Active Data Guard standby database and to run Automatic Database Diagnostic Monitor (ADDM) analysis on the AWR data. This feature enables performance tuning for read-only workloads executing on an Active Data Guard standby database.)


I assume you are familiar with that, we would like to know while using this new feature, is there anything we should pay attention to, does it have limit or restraints?

問題七:在Oracle 12.2中,有一個關於AWR的新特性,現在在ADG的standby庫上也能夠生成AWR報告,這樣就可以避免對Primary的影響。

我想您對這個特性應該很熟悉,想請您介紹一下,在使用這個新特性的時候有什麼要注意的問題嗎?存在什麼樣的限制嗎?

Graham:When ADG was first introduced in Oracle 11g it came with some limitations. As the ADG was real only it was not possible for it to capture AWR snapshots. The work around that was put in place was to modify the old Statspack code to allow capture of performance data from the ADG back into the primary database. With Oracle 12c release 2 the full set of AWR data is now available for ADG. The only limitation that I am aware of is that a database bounce is required for the transition from ADG to primary to complete. This istemporary limitation and we expect that the limitation will be removed in a patch set.

當 ADG首次在Oracle 11g中被引入時有一些限制。 由於 ADG 不可能捕獲AWR 快照。 實施的解決方法是修改舊的 Statspack 程式碼,以便將 ADG 中的效能資料捕獲回主資料庫。 在 Oracle 12.2中,ADG 已經可用 AWR 的全套資料。而我知道的唯一限制是從 ADG 到開始到完成的過渡需要資料庫反彈。 這個限制是暫時的,有望在補丁中被移除。


Q8: The PaaS is the least developed cloud service in the three, in past few years, many company lose interest for PaaS just because the flexibility of programming languages and infrastructure, so in future, how do you think this problem would be solved?  

問題八:PaaS 是三種雲服務模式中發展最慢的一個,在過去的這些年,很多企業因為PaaS在程式語言和架構上的靈活性不夠而放棄使用,您覺得以後這些問題會怎樣解決呢?

Graham:PaaS is certainly the least developed compared to IaaS and SaaS. I think that the key to success in that market is ensuring the PaaS offerings can support the ‘flavor of the month’ development tools on top of solid software stack so that developers can quickly try things out. I think it has to change more quickly, be more ‘agile’ than the other two offerings.

與 IaaS 和 SaaS 相比,PaaS 肯定是發展最不成熟的。我認為,在市場上取得成功的關鍵是確保 PaaS 產品能夠在堅實的軟體堆疊之上支援當下最新的開發工具,以便開發人員能夠快速嘗試。我認為它必須要更加善變,比其他兩個產品也要更加“敏捷”。


Q9:In your experience of database performance tuning over these years, what issue or what kind of issue is the most you’ve ever Encountered, for each kind of performance problem, does the portion change over these years, How and why, Would you like to share with us?

問題九:在您這麼多年的效能最佳化經驗裡,遇到的哪一類問題比較多,這些效能問題的型別所佔的比例,在這些年有什麼樣的變化呢 ?

Graham:Well, we certainly see less IO bound systems these days. It used to be the most common symptom seen in poorly performing systems but now that IO subsystem can deliver many GBs per second of bandwidth it is less frequently the issue. But once you really got to the root cause it always came down to schema design and the effective use of SQL. That certainly hasn’t changed over the years even though the languages and interfaces used by developers have changed completely.

現在我們看到更少的IO繫結系統。它曾經是在效能不佳的系統中最常見的症狀,但現在IO子系統可以每秒提供許多GB的頻寬,不太常見。 一旦深入到根本原因,總是會歸結到架構設計和SQL的有效使用。很多年過去了,儘管開發人員使用的語言和介面已經完全改變,但這依然沒變。

 

Q10:People who work on database especially oracle database area they admire your work a lot, some of them might be there on conference site 28 Mar, would you like to share your study methods to them, or any suggestions?

問題十:在28號的RWP會上,將會有很多RWP的粉絲到現場,大家對你們團隊的工作充滿了敬佩,您願意跟大家分享一下您的學習方法嗎,或者關於如何做好效能最佳化,您有什麼建議給他們嗎?

Graham:The approach that we take is top-down, looking at the entire systems (not just the database) and finding where the time is actually being spent, which is where there is the biggest leverage for improvement. Of course, that involves looking at data, rather than guessing and jumping to potential solutions, and thinking. As one of my colleagues in RWP often states it ‘Thinking is so important’.

我們採取的方法是自上而下的,檢視整個系統(而不僅僅是資料庫),並查明實際花費的時間,哪裡有最大改進。當然,這涉及到檢視資料,而不是猜測和跳轉到潛在的解決方案和想法。就像我一位 RWP 的同事經常說的一樣 “生命在於思考”。



來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/25373498/viewspace-2145870/,如需轉載,請註明出處,否則將追究法律責任。

相關文章