Oracle Performance Tuning 11g2 (2-1)

yuntui發表於2016-11-03

在windows live writer中編輯的時候,以及預覽的時候都是整整齊齊的,但是一上傳到部落格中就變得烏七八黑了

看的時候覺得很簡單,但是一翻譯起來就頭大了,這樣一篇檔案,花了2天才翻譯完。2天時間足夠看4篇文章了。總結一下就是看的時候不一定真的看懂了,大概的明白意思就是了。但是翻譯的時候就要逐字逐字的,逐詞逐句的思考,有時還要前後對照著翻譯。常幹這活,會折壽的!

 

延續上一篇的Oracle Performance Tuning 11g2 (2-0)

2.5 Application Design Principles         應用程式設計的原則

This section describes the following design decisions that are involved in building applications: 下面介紹設計的一些方法:

  • Simplicity In Application Design                 簡化應用設計

  • Data Modeling                                         資料模型

  • Table and Index Design                           表和索引設計

  • Using Views                                           使用檢視

  • SQL Execution Efficiency                         SQL執行的效率

  • Implementing the Application                   編寫程式碼

  • Trends in Application Development           掌控應用開發進度

2.5.1 Simplicity In Application Design              簡化應用設計

Applications are no different than any other designed and engineered product. Well-designed structures, computers, and tools are usually reliable, easy to use and maintain, and simple in concept. In the most general terms, if the design looks correct, then it probably is. This principle should always be kept in mind when building applications.

應用設計和其他的工程設計是一樣的。被良好設計的架構,計算機,工具是通常可以依賴的,是容易使用、維護,和理解的,至少在概念上是這樣子的。在多數的專案組中,假如架構設計看起來正確的話,專案也會基本上會成功。這種理念應該永記心底。 --->簡單來講,要向喬大爺學習,簡單的想甜一下!(或者是美的想甜一下)

Consider the following design issues:

  • If the table design is so complicated that nobody can fully understand it, then the table is probably poorly designed.

  • If SQL statements are so long and involved that it would be impossible for any optimizer to effectively optimize it in real time, then there is probably a bad statement, underlying transaction, or table design.

  • If there are indexes on a table and the same columns are repeatedly indexed, then there is probably a poor index design.

  • If queries are submitted without suitable qualification for rapid response for online users, then there is probably a poor user interface or transaction design.

  • If the calls to the database are abstracted away from the application logic by many layers of software, then there is probably a bad software development method.

  • 如果表設計的沒人能看懂,就說明設計太差了

  • 如果SQL語句寫的又臭又長,以至於最佳化器都不知道如何去實時的最佳化,那麼這是一個很差的語句,很差的事務和表的設計

  • 假如一個表上的某一個列存在在兩個索引上,(兩個引數都引用了一個列的話),那這個索引設計就是有問題的。(個人經驗是表設計的問題)

  • 如果一個查詢條件無法快速的響應給聯機使用者,那麼這個使用者介面或事務設計是糟糕的。(oracle自己的certview慢的能死人的,還好意思在這裡講)

  • 假如資料庫訪問功能被許多中間層包裹,太多的抽象以至於脫離了應用邏輯,那這個應用設計是糟糕的(這個一般容易出現在面嚮物件語言中,特別是JAVA中)


2.5.2 Data Modeling

Data modeling is important to successful relational application design. You must perform this modeling in a way that quickly represents the business practices. Heated debates may occur about the correct data model. The important thing is to apply greatest modeling efforts to those entities affected by the most frequent business transactions. In the modeling phase, there is a great temptation to spend too much time modeling the non-core data elements, which results in increased development lead times. Use of modeling tools can then rapidly generate schema definitions and can be useful when a fast prototype is required.

在關係型資料庫中,資料模型是非常重要的。你必須要能快速的將這個模型反應到業務需求上。在設計這個資料模型過程出現爭論是正常的(沒有討論的模型設計,結果一般比較差,維護類專案中新增功能除外)。最重要的事情是這個模型要能代表最頻繁,最關鍵業務的應用。在這個模型設計階段,很容易出現在非關鍵業務資料上花費過多的時間,導致開發延期(在銀行的專案開發中這事經常出現的)。使用一些模型開發工具可以加快原型設計。

 



2.5.3 Table and Index Design

Table design is largely a compromise between flexibility and performance of core transactions. To keep the database flexible and able to accommodate unforeseen workloads, the table design should be very similar to the data model, and it should be normalized to at least 3rd normal form. However, certain core transactions required by users can require selective denormalization for performance purposes.

Examples of this technique include storing tables pre-joined, the addition of derived columns, and aggregate values. Oracle Database provides numerous options for storage of aggregates and pre-joined data by clustering and materialized view functions. These features allow a simpler table design to be adopted initially.

Again, focus and resources should be spent on the business critical tables, so that optimal performance can be achieved. For non-critical tables, shortcuts in design can be adopted to enable a more rapid application development. However, if prototyping and testing a non-core table becomes a performance problem, then remedial design effort should be applied immediately.

Index design is also a largely iterative process, based on the SQL generated by application designers. However, it is possible to make a sensible start by building indexes that enforce primary key constraints and indexes on known access patterns, such as a person's name. As the application evolves, and as you perform testing on realistic amounts of data, you may need to improve the performance of specific queries by building a better index. Consider the following list of indexing design ideas when building a new index:

表的設計多數是一個在效能和靈活性之間做的一個折衷方案結果,應該主要關注那些核心業務(次要業務交給畢業生設計都可以)。為了保持資料庫的靈活性,以及能夠適應不可預知的壓力,表的設計應該儘可能的和資料模型相似,同時應該至少符合第三正規化(大學學完就忘完了,當年那個用數學算啊算的,數字一忘,哪裡還記得怎麼做)。然而在一些核心的業務上,使用者需要極高的響應時間時,可以為了效能可以不按常規來設計。

具個例子就是透過表的預先合併,加上額外附加列或者是根據某種規則預先演算出列值,聚合值。oracle透過cluster和MV提供了支援此功能的函式。這些特性簡化了表的設計。

再次說明:一定要把精力和各種人力等資源放到業務的關鍵業務的核心表設計上,這樣就能保證基本上業務系統能正常運轉。對於非核心業務,雖然設計階段未完整設計,但是可以在後期進行加快處理。如果原型和非核心表的設計上出現效能問題,透過補救設計來立刻修正它。(在銀行核心系統業務中,80%的時間是和行方在那些非核心業務上週旋。也許覺得不可思議,但是行方的理念通常是核心業務本來就應該是你們做好的,所以他們只能在八杆子打不著的地方亂捅!)

索引的設計也是一個迭代的過程。但是從一開始就可以在一些很明顯的列上,比如姓名上,加上主鍵約束或索引。隨著專案的推進,各種功能的增加,以及在上線前對系統在相當數量的真實資料上進行測試時(將生產資料導到測試環境中),可以去增加或調整索引。當設計時使用下面的方法進行設計:

  • Appending Columns to an Index or Using Index-Organized Tables   將一些列加到索引上,或者使用IOT表

  • Using a Different Index Type                    根據不同的情況使用不同的索引型別

  • Finding the Cost of an Index                    找出成本最高的索引

  • Serializing within Indexes                         透過索引進行序列化

  • Ordering Columns in an Index                  透過索引進行排序

2.5.3.1 Appending Columns to an Index or Using Index-Organized Tables

One of the easiest ways to speed up a query is to reduce the number of logical I/Os by eliminating a table access from the execution plan. This can be done by appending to the index all columns referenced by the query. These columns are the select list columns, and any required join or sort columns. This technique is particularly useful in speeding up online applications response times when time-consuming I/Os are reduced. This is best applied when testing the application with properly sized data for the first time.

The most aggressive form of this technique is to build an index-organized table (IOT). However, you must be careful that the increased leaf size of an IOT does not undermine the efforts to reduce I/O.

加快查詢速度的最容易的辦法之一就是,在執行計劃中透過避免全表掃描來以最大程式的減少IO消耗。可以透過將查詢的列全部放到索引裡面。在索引裡的這些列是你select時要查詢的那些列,加上需要排序或者join時用到的列。這種技術透過減少IO消耗,加速聯機應用程式的響應時間非常有幫助。當資料量達到一定程式時再去使用。

聚集度最高的方法是建立IOT表(排好的資料擠在一個或相鄰的塊中,掃描非常快)。但是這種方法對於要經常插入資料的業務而言,所造成的IO開銷也是不少的。(也就是說對於IOT表而言,最好是以查詢為主的應用或者是參數列的應用)

 



2.5.3.2 Using a Different Index Type

There are several index types available, and each index has benefits for certain situations. The following list gives performance ideas associated with each index type.

oracle給我們提供許多型別的索引,每一種都有它強勁的地方。下面給出每種索引他們的優點



2.5.3.2.1 B-Tree Indexes

These indexes are the standard index type, and they are excellent for primary key and highly-selective indexes. Used as concatenated indexes, the database can use B-tree indexes to retrieve data sorted by the index columns.

B樹索引:這是最標準的索引型別,它們對於需要按主鍵或者選擇度很高的列進行查詢時,這種索引表現出色。透過使用索引,資料庫可以直接透過B樹,獲取到已經排序好資料。

注: 選擇度最高的就是主鍵的,因為他們從來不會重複;一般而言在OLTP中,這個選擇度要儘可能地達到或接近主鍵級別,一般有2~5條記錄重複是正常的。

比如一個POS機系統(去超市時刷卡的機器),每天產生200W的流水,但是多數客戶每天可能只有一筆消費記錄,或者稱消費流水;有錢人可能會有好幾筆,甚至幾十筆。這都沒有問題,因為消費再多,與200萬相比也是非常少的。在銀行中會有一個風險控制系統(簡稱風控系統),這個系統會根據POS的編號去檢視一個POS機每天的記錄數(當然從各種維度下去檢查的,不至這一個條件的),同時會對每張卡消費都做記錄,因此如果一個人瘋狂刷卡時,銀行的風控系統就會發現的。一旦某個人交易數太多,或者POS刷卡數太多,達到銀行設定的警戒線,就會被發現。但是如好又多之類的大超市,他們客流量大,所以交易量也大,銀行會對這些客戶做額外的閾值設定。對於這些大客戶,他們交易量比較大,所以選擇度就低、效率也低嗎? 非也,因為在查詢的時候都有一個條件----時間,一般精確到秒的,時間域也是在索引列中放置的,所以在OLTP系統中,對於大客戶一樣是選擇度非常高的。慢的地方是在做批次處理時,要將這些資訊取出來,會稍慢一些的。

 



2.5.3.2.2 Bitmap Indexes

These indexes are suitable for low cardinality data. Through compression techniques, they can generate a large number of rowids with minimal I/O. Combining bitmap indexes on non-selective columns allows efficient AND and OR operations with a great number of rowids with minimal I/O. Bitmap indexes are particularly efficient in queries with COUNT(), because the query can be satisfied within the index.

點陣圖索引:這種索引適合建立在低基數資料列上(一個列可能只有個別幾個值,如男或女,狗或貓等等)。透過壓縮的技術,它們可以透過最小的IO獲取到最大的ROWID值。如果查詢條件是選擇度低比較低,同時又使用AND/OR去過濾時,oracle會透過合併bitmap索引(或是位運算)達到減少IO,又提高檢索ROWID的效果。bitmap索引對於要查詢count值的SQL語句時效率額外的高,因為他們都直接透過索引去獲取了(不會再透過rowid去反查原表資料了。這裡講的是如果select count(*) from t1 where id1 = ‘狗男女’這樣的方式時,就一定會走到索引上,而且交易非常的高。如果什麼條件也沒有,那還是要全表掃描的了。那麼這種效率有多高呢?我覺得是第二高,第一名是直接把count結果透過group by放到一個mv中,oracle透過query rewrite enable直接跳到mv中,將查詢結果取出來)

 



2.5.3.2.3 Function-based Indexes

These indexes allow access through a B-tree on a value derived from a function on the base data. Function-based indexes have some limitations with regards to the use of nulls, and they require that you have the query optimizer enabled.

Function-based indexes are particularly useful when querying on composite columns to produce a derived result or to overcome limitations in the way data is stored in the database. An example is querying for line items in an order exceeding a certain value derived from (sales price - discount) x quantity, where these were columns in the table. Another example is to apply the UPPER function to the data to allow case-insensitive searches.

函式索引: 這種索引和B樹索引是一樣的,只不過它允許在查詢的列上使用函式(即由函式根據基值演算出的值)。它有許多的限制,特別是NULL值,同時需要我們開啟查詢最佳化器來使用。

函式索引對於那些需要根據一些組合列算出一個值,這個值已經存在在資料庫中時它非常有用。具個例子:當需要根據【(銷售金額-回扣)*銷售量】來從訂單中獲取商品值時,因為這個銷售的價格已經是在銷售後就存在在資料庫中了。另外一個例子是不知道某個欄位是大寫還是小寫的,這時直接將所有的都按UPPER來查詢,函式索引就派上用場了

注:理解了如何建立之後,都不用看上面的話了。函式索引有兩種建立方法,一種是傳統的方式,一種是11g中新增的虛擬列(這個應該是學DB2的,因為DB2沒有函式索引,只能使用稱之為生成列的方式實現,這個生成列和虛擬列除了要不要儲存到硬碟的區別外,其他一樣,連建立的語法都是generated always as,呵呵。有很大可能是oracle抄DB2的。這樣DB2的專案在搬遷到oracle時就不需要再做太多的改動了)

 



2.5.3.2.4 Partitioned Indexes

Partitioning a global index allows partition pruning to take place within an index access, which results in reduced I/Os. By definition of good range or list partitioning, fast index scans of the correct index partitions can result in very fast query times.

全域性索引分割槽允許在訪問索引時就進行分割槽裁減,這樣大大減少了IO。透過合理的定義範圍(range)或者值(list)分割槽,可以在極快的速度進行快速索引掃描(還有hash分割槽的)。

注:不知道哪個DBA以前告訴我們領導說分割槽對效能沒什麼提升,導致後來我們一說要分割槽,領導就覺得在乾沒意義的工作!花了我N長的演示他才相信以前的DBA錯了。但是他還是覺得以前的DBA很好,看來情商才是最重要的啊,智商算個屌啊!

 



2.5.3.2.5 Reverse Key Indexes

These indexes are designed to eliminate index hot spots on insert applications. These indexes are excellent for insert performance, but they are limited because the database cannot use them for index range scans.

這種索引的使用是為了在插入資料時消除索引的熱區競爭。這種索引在插入密集的應用中表現出色,但是它們缺點是無法進行範圍查詢。

 



2.5.3.3 Finding the Cost of an Index

Building and maintaining an index structure can be expensive, and it can consume resources such as disk space, CPU, and I/O capacity. Designers must ensure that the benefits of any index outweigh the negatives of index maintenance.

Use this simple estimation guide for the cost of index maintenance: each index maintained by an INSERT, DELETE, or UPDATE of the indexed keys requires about three times as much resource as the actual DML operation on the table. Thus, if you INSERT into a table with three indexes, then the insertion is approximately 10 times slower than an INSERT into a table with no indexes. For DML, and particularly for INSERT-heavy applications, the index design should be seriously reviewed, which might require a compromise between the query and INSERT performance.

See Also:

Oracle Database Administrator's Guide to learn how to monitor index usage

建立和維護索引是一種非常昂貴的工作,它會消耗IO,CPU,以及硬碟空間。設計人員要確保你建立這個索引帶來的收益要遠遠高於維護它的成本時再使用。

使用這個簡單的方法來評估索引維護成本:在插入刪除更新索引列值時,大概會比不使用時多消耗3倍的資源。因此當你插入一個有3個索引的表時(1個表上有3個索引),通常會比沒有索引時慢10倍。因此,對於那些需要大量插入的表或者應用,設計索引時要非常的謹慎的,你需要在查詢與插入之間做一個權衡。

注:我曾經在給一個銀行做維護時,把一個索引幹掉之後,效能立刻提升了100筆左右。但是專案組人員很不滿意,覺得這個索引未來可能會有用,真是無語了!如果你知道他們在這個表上建了10個索引的話,而且是要做大量插入修改動作,你會怎麼想呢?

如果監控索引呢?使用alter index xxxx_index_name monitoring usage; 透過查詢V$OBJECT_USAGE檢視看索引是否真的被執行計劃引用到

這個檢視的解釋如下: V$OBJECT_USAGE displays statistics about index usage gathered from the database for the indexes owned by the current user. You can use this view to monitor index usage. All indexes that have been used at least once can be monitored and displayed in this view.

 



2.5.3.4 Serializing within Indexes

Use of sequences, or timestamps, to generate key values that are indexed themselves can lead to database hotspot problems, which affect response time and throughput. This is usually the result of a monotonically growing key that results in a right-growing index. To avoid this problem, try to generate keys that insert over the full range of the index. This results in a well-balanced index that is more scalable and space efficient. You can achieve this by using a reverse key index or using a cycling sequence to prefix and sequence values.

如果一個索引列使用sequence,或者時間域的話,就可能會造成資料庫“熱區”問題,這是會影響響應時間和吞吐量的。這是因為它們不斷的單調性增長造成索引的右側傾斜(sequence從小到大的增長,時間也是)。為了解決這個問題,儘可能將索引值能分佈到整個index範圍內,而不是這種連續性的。當索引平衡時,系統的空間使用及效率都會更好。你可以透過使用反向索引或再建立一個迴圈sequence加到以前自己的索引頭上組成一個新值來達到平衡的目的。

 



2.5.3.5 Ordering Columns in an Index

Designers should be flexible in defining any rules for index building. Depending on your circumstances, use one of the following two ways to order the keys in an index:

  • Order columns with most selectivity first. This method is the most commonly used because it provides the fastest access with minimal I/O to the actual rowids required. This technique is used mainly for primary keys and for very selective range scans.

  • Order columns to reduce I/O by clustering or sorting data. In large range scans, I/Os can usually be reduced by ordering the columns in the least selective order, or in a manner that sorts the data in the way it should be retrieved. See Chapter 14, "Using Indexes and Clusters".

設計人員應該在定義index時保持足夠的靈活性。根據你的環境使用下面的兩種方法來在index中排序:

在索引中的第一列應該是最具有選擇度的(也就是說這一列最好能過濾過95%的資料)。這樣就能以最快的方式,以最少的IO,根據ROWID獲取資料。這種技術主要用於根據主鍵或者有極強的選擇度的範圍查詢

透過cluster或者排序資料減少需要排序列的IO消耗。在一個大的範圍查詢中,IO可能被儘可能降低,透過對低選擇度的列排序或者使用需要按順序取資料時預先排這些資料。(低選擇度不是說你要取表中30%資料這樣的概念,如果是這樣的話,oracle會直接表全表掃描的)

 



2.5.4 Using Views

Views can speed up and simplify application design. A simple view definition can mask data model complexity from the programmers whose priorities are to retrieve, display, collect, and store data.

However, while views provide clean programming interfaces, they can cause sub-optimal, resource-intensive queries. The worst type of view use is when a view references other views, and when they are joined in queries. In many cases, developers can satisfy the query directly from the table without using a view. Usually, because of their inherent properties, views make it difficult for the optimizer to generate the optimal execution plan.

檢視可以加速和簡化應用的開發。一個簡單的檢視定義可以隱藏複雜的資料模型,讓主要從事增刪改查的程式設計師不要過多的關心。

雖然檢視可以提供一個簡單明瞭的介面,但是他們也可能造成低效,特別是資源消耗敏感的查詢語句。最差的檢視是一個檢視又使用了另一個檢視,以及檢視之間的join查詢。許多時候開發人員可以直接從表裡取資料而不是透過檢視做。通常來說,由於檢視與生俱來的特點,最佳化器很難去做出最優的執行計劃。

 



2.5.5 SQL Execution Efficiency

In the design and architecture phase of any system development, care should be taken to ensure that the application developers understand SQL execution efficiency. To achieve this goal, the development environment must support the following characteristics:

在系統開發的任何設計階段,請注意一定要確保開發人員要理解SQL的執行效率。為了達到這個目標,開發環境中必須要滿足以下幾個特點:

  • Good database connection management            良好的資料庫連線管理模組

    Connecting to the database is an expensive operation that is highly unscalable. Therefore, the number of concurrent connections to the database should be minimized as much as possible. A simple system, where a user connects at application initialization, is ideal. However, in a Web-based or multitiered application, where application servers are used to multiplex database connections to users, this can be difficult. With these types of applications, design efforts should ensure that database connections are pooled and are not reestablished for each user request.

        連線資料庫是一個非常昂貴的過程(連線mysql除外)。因此併發連線資料庫應該越少越好。只有一個使用者連線上來的簡單的系統是最理想的。然而在一個WEB或者多層應用架構中,應用伺服器多路連線到資料庫給使用者使,情況變得非常困難。在這種型別的應用中,設計人員應該確保資料庫連線是使用“連線池”,並且確保每個使用者連線上來時不是真正的再重新連線到資料庫上(也就是從連線池中取一個連線使用)。

注:其實java連線基本上由weblogic解決,已經實現了連線池了;C程式基本上都是長連線,不會輕易斷開的,除了shell指令碼定期會連線一下,這個問題不大,畢竟幾分鐘才連線一次嘛。php好像在連線上是會有些問題的,不知道有沒有框架去解決,這個尚未去研究。不過PHP作為網站語言,主要和MYSQL使用。和oracle打交道並不多。

  • Good cursor usage and management                 良好的遊標使用和管理

    Maintaining user connections is equally important to minimizing the parsing activity on the system. Parsing is the process of interpreting a SQL statement and creating an execution plan for it. This process has many phases, including syntax checking, security checking, execution plan generation, and loading shared structures into the shared pool. There are two types of parse operations:  

    最大程度的降低SQL解析的重要性不亞於維護連線管理。解析就是翻譯SQL語句以及建立一個執行計劃的過程。這個過程有很多的階段,包括了語法的檢查,安全檢查,執行計劃產生 ,載入共享結構到shared pool中。oracle有兩種型別的解析:

    • Hard parsing     硬解析

      A SQL statement is submitted for the first time, and no match is found in the shared pool. Hard parses are the most resource-intensive and unscalable, because they perform all the operations involved in a parse.          當SQL語句第一次執行的時候,在shared pool中找不到些執行歷史記錄。硬解析是非常消耗資源的,但又不可避免,因為處女解析總是要來一次的。

    • Soft parsing     軟解析

      A SQL statement is submitted for the first time, and a match is found in the shared pool. The match can be the result of previous execution by another user. The SQL statement is shared, which is good for performance. However, soft parses are not ideal, because they still require syntax and security checking, which consume system resources.                               當SQL語句第一次執行的時候,在shared pool中找到了歷史執行的記錄。這個執行的記錄有可能是其他使用者執行的。為了效能考慮,這些SQL語句是共享的。然而軟解析不是最理想的,因為他們仍然需要語法和安全檢查,消耗資源。

    Because parsing should be minimized as much as possible, application developers should design their applications to parse SQL statements once and execute them many times. This is done through cursors. Experienced SQL programmers should be familiar with the concept of opening and re-executing cursors.

    Application developers must also ensure that SQL statements are shared within the shared pool. To achieve this goal, use bind variables to represent the parts of the query that change from execution to execution. If this is not done, then the SQL statement is likely to be parsed once and never re-used by other users. To ensure that SQL is shared, use bind variables and do not use string literals with SQL statements. For example:

        因為解析的過程應該儘可能的少,所以應用開發人員應該做到一次解析,多次執行。這個是可以透過遊標來實現的。高手應該會非常熟悉這個概念的,包括了開啟和重新執行遊標。應用開發人員同樣應該確保SQL是共享在shared pool中的。為了實現這個目標,繫結變數就代表了查詢條件中那麼變來變去的條件。假如不是使用繫結變數的話,可能一個SQL執行完後再也不會去執行了(OLTP中很多時候最多3次)。為了確保SQL是共享的,確保使用繫結變數和不要使用字串值作為SQL的查詢條件。(可問題是開發人員很多都不知道什麼叫繫結變數,或許他們在使用,但是有可能是從網上抄的一個用法,國內公司培訓培訓啊)例如:

    Statement with string literals:        

    SELECT * FROM employees 
      WHERE last_name LIKE 'KING'; --->使用字面值的做法

    Statement with bind variables:

    SELECT * FROM employees 
      WHERE last_name LIKE :1; --->使用繫結變數的做法

    The following example shows the results of some tests on a simple OLTP application: 下面的例子顯示了一個在OLTP在的測試結果

    Test                         #Users Supported
    No Parsing all statements           270 
    Soft Parsing all statements         150
    Hard Parsing all statements          60
    Re-Connecting for each Transaction   30

    These tests were performed on a four-CPU computer. The differences increase as the number of CPUs on the system increase. See Chapter 16, "SQL Tuning Overview" for information about optimizing SQL statements.  上面的測試是在一個4核電腦上測試的。隨著CPU個數的增加,差異也會增加

 



2.5.6 Implementing the Application       應用開發實現

The choice of development environment and programming language is largely a function of the skills available in the development team and architectural decisions made when specifying the application. There are, however, some simple performance management rules that can lead to scalable, high-performance applications.

應用開發實現:開發環境和程式語言的選擇很大程度上是由開發團隊的技能決定的,以及當指定了開發環境時架構決定的。然而有些簡單的效能管理方法可以讓應用程式變得更高效。

  1. Choose a development environment suitable for software components, and do not let it limit your design for performance decisions. If it does, then you probably chose the wrong language or environment.  為軟體模組選擇一個合理的開發環境,不要限制你的設計和效能決策。如果不能的話,那可能是選擇了錯誤的開發語言和環境(在銀行除了C和JAVA很難做選擇的。想用C++?前提是銀行願意為你買C++編譯器;想用bash,python?前提是銀行願意給你安裝這些軟體;因此很多時候沒得選擇的,除了C和JAVA沒得選擇)

    • User interface       使用者介面

      The programming model can vary between HTML generation and calling the windowing system directly. The development method should focus on response time of the user interface code. If HTML or Java is being sent over a network, then try to minimize network volume and interactions.

            開發模型或者是使用html的或者是使用傳統的windows視窗。開發的重點應該集中在響應時間上。假如使用JAVA語言BS架構來開發,請確保儘量降低每次網頁獲取的資料量和互動的次數。簡單說就是別一開啟任何視窗或功能時,都先從資料庫裡掃一堆資料回來展示,那樣業務人員在使用介面時就很不順暢,不順暢他們就會罵人。

    • Business logic      程式邏輯

      Interpreted languages, such as Java and PL/SQL, are ideal to encode business logic. They are fully portable, which makes upgrading logic relatively easy. Both languages are syntactically rich to allow code that is easy to read and interpret. If business logic requires complex mathematical functions, then a compiled binary language might be needed. The business logic code can be on the client computer, the application server, and the database server. However, the application server is the most common location for business logic.

            像JAVA和PLSQL這種解釋型語言(C是編譯型語言,編譯語言通常要會解釋型語言執行快幾倍到幾十幾百倍),是非常適合去做這種邏輯業務處理的。他們非常的輕便,所以修改升級都很容易。兩種語言擁有完善的語法,非常方便閱讀和解析。如果業務需要非常複雜的數學級運算或者非常精確的功能,那麼編譯型語言(C,C++)將更合適。業務邏輯處理可以在客戶端電腦(PC上,如JAVASCRIPT語言),應用伺服器上或者資料庫伺服器上。通常來說應用伺服器上處理是最普遍的。

    • User requests and resource allocation     使用者請求和資源分配

      Most of this is not affected by the programming language, but tools and fourth generation languages that mask database connection and cursor management might use inefficient mechanisms. When evaluating these tools and environments, check their database connection model and their use of cursors and bind variables.

            這種情況一般不會影響程式語言的選擇,但是一些工具和第4代語言(彙編是第一代,C是第二代,JAVA第三代,PLSQL是第四代語言),他們掩蓋了資料庫的連線機制以及遊標管理,有可能會導致低效的執行。當評估這些工具和環境時,請檢查一下他們的連線管理、遊標、繫結變數等使用機制。

    • Data management and transactions        資料及事務管理

      Most of this is not affected by the programming language.        這種對程式語言的選擇是沒有任何影響的

  2. When implementing a software component, implement its function and not the functionality associated with other components. Implementing another component's functionality results in sub-optimal designs and implementations. This applies to all components.

    當實現一個軟體模組時, 儘可能只要實現這個功能就行了,不要去和其他模組有太多的連線(我們通常稱之為弱耦合). 因為其他模組設計和實現可能不太理想. 耦合度高的話,這被應用到所有的模組中,即所有模組效率都低了。


  3. Do not leave gaps in functionality or have software components under-researched in design, implementation, or testing. In many cases, gaps are not discovered until the application is rolled out or tested at realistic volumes. This is usually a sign of poor architecture or initial system specification. Data archival and purge modules are most frequently neglected during initial system design, build, and implementation.

    不要在模組中留下一些缺陷,或者導致軟體模組需要重新設計,開發和測試的情況. 在許多情況下, 缺口或許直到應用程式上線或者在生產環境中測試時才能被發現.這通常是一個差勁的架構或者最初系統設計有問題的一個徵兆.  資料歸檔以及清理模組通常是在最初設計、開發和實現中最容易被忽略的. (通常這種情況發生在批次程式中, 批次被拉起來以後,通常就先做歷史歸檔,再清理舊資料,再跑當天資料)


  4. When implementing procedural logic, implement in a procedural language, such as C, Java, or PL/SQL. When implementing data access (queries) or data changes (DML), use SQL. This rule is specific to the business logic modules of code where procedural code is mixed with data access (nonprocedural SQL) code. There is great temptation to put procedural logic into the SQL access. This tends to result in poor SQL that is resource-intensive. SQL statements with DECODE case statements are very often candidates for optimization, as are statements with a large amount of OR predicates or set operators, such as UNION and MINUS.

    當實現一個程式邏輯時,使用例如C,java,或者plsql來做。 當實現查詢、修改時使用SQL做。這種特殊的邏輯處理方法由各種語言混合組成(即讓各種語言做他們擅長的事情)。但會出現將程式邏輯搬到SQL語言的風險。因為SQL語言是資源敏感型的,所以這時寫的SQL語句就可能比較差。使用DECODE方式的SQL通常可以被最佳化成包括大量的OR謂詞或者union和minus組成的操作。(也就是說DECODE效率低,應該用or 加上 union或minus去寫一個等價的SQL語句。那為什麼開發人員會寫大量的DECODE呢?因為DECODE的邏輯和程式開發的IF/ELSEIF/ELSE是很類似的,他們更喜歡這種邏輯處理的方式,而忘記了SQL就是SQL,不能把其他的東西加進來。可能有人會說,那你oracle為什麼要提供這些呢?不提供的話開發人員不就不去用了嗎?這就好比菜刀一樣,在中國很危險,要實名制;而在美國,多數人吃飯全是刀叉卻沒出現人在飯桌上被殺一樣。不能因為它危險所以就不提供。)


  5. Cache frequently accessed, rarely changing data that is expensive to retrieve on a repeated basis. However, make this cache mechanism easy to use, and ensure that it is indeed cheaper than accessing the data in the original method. This is applicable to all modules where frequently used data values should be cached or stored locally, rather than be repeatedly retrieved from a remote or expensive data store.

    當一些資料要被頻繁訪問的話儘可能去快取它,很少去修改的資料從資料庫訪問是非常昂貴的。這種實現快取的機制實現起來是非常簡單的,確保它的確是直接訪問資料庫要效率更高時再使用。這種機制主要是針對那些所有模組都頻繁去訪問的資料應該快取到本地來,而不是頻繁地從遠端或者昂貴的資料儲存中獲取。(11G中多了一些result cache,就是對付這些懶人的;但是問題又來了,DBA透過配置將訪問的效能提升後,有些專案經理竟然認為實現快取倒不如直接從資料庫上取,沒文化的領導最可怕啊!)


    The most common examples of candidates for local caching include the following:   下面是一些需要本地快取的例子:

    • Today's date. SELECT SYSDATE FROM DUAL can account for over 60% of the workload on a database.  今天的日期,sysdate很消耗資源的

    • The current user name.      當前的使用者名稱

    • Repeated application variables and constants, such as tax rates, discounting rates, or location information. 重複的常量如稅率,貼現率,或位置資訊

    • Caching data locally can be further extended into building a local data cache into the application server middle tiers. This helps take load off the central database servers. However, care should be taken when constructing local caches so that they do not become so complex that they cease to give a performance gain.

      可以將快取的機制放到中間層實現。這可以減緩資料庫的壓力。但是設計的時候要注意別設計的太複雜,否則為了那一點的效能而得不償失。

    • Local sequence generation.       本地的sequence生成(如果是多模組,多機器併發時,使用資料庫更方便些)

    The design implications of using a cache should be considered. For example, if a user is connected at midnight and the date is cached, then the user's date value becomes invalid.

    但是如果使用快取的話,設計時要注意些。比如快取了日期的話,結果到了晚上12:00左右時,如果直接訪問快取的話,這個日期就可能會失效了。(一般情況下,我們很少去快取這個日期,但是如果真要想去快取的話,最好的辦法是讓應用伺服器和資料庫伺服器做時間同步,就像RAC一樣,那直接從本地取就更方便一些了)

  6. Optimize the interfaces between components, and ensure that all components are used in the most scalable configuration. This rule requires minimal explanation and applies to all modules and their interfaces.

    當設計模組之間的介面時,確保所有的模組可以執行在合適的範圍內。這不需要再解釋了,oracle說你懂的。


  7. Use foreign key references. Enforcing referential integrity through an application is expensive. You can maintain a foreign key reference by selecting the column value of the child from the parent and ensuring that it exists. The foreign key constraint enforcement supplied by Oracle—which does not use SQL—is fast, easy to declare, and does not create network traffic.

    使用外來鍵約束。使用程式去管理這種約束通常是比較昂貴的。透過約束可以保證資料的一致性。透過oracle的約束機制--不是使用SQL實現的,所以執行速度很快,定義有很簡單,還不會造成頻繁訪問時的網路問題。(但是中興和廣發兩個變態的銀行全用DB2資料庫,我的程式就可能要執行在DB2資料庫的?如何遷移呢?DB2慢的話我還要重寫程式邏輯。所以在銀行中多數專案基本上看不到約束的。----如果是參數列的定義,基本上放到JAVA端去做驗證,後臺C在執行時也會再驗證一次,確保沒問題。如果是業務邏輯處理的話,全由C去驗證的。很多臭長臭長的程式碼都在處理這種一致性,如果大家都使用oracle該多好,oracle自己應該也這麼想)


  8. Consider setting up action and module names in the application to use with End-to-End Application Tracing. This allows greater flexibility in tracing workload problems. See "End-to-End Application Tracing".

    考慮使用action和module名(這兩個變數在v$session中,可以由我們來定義),以便進行end-to-end應用跟蹤。這增強了應用程式的跟蹤測試靈活性。(以前從來沒用到啊,未來一定去用一下,畢竟是DBA級別的人了,不能再跟小三那樣混)


 



2.5.7 Trends in Application Development

The two biggest challenges in application development today are the increased use of Java to replace compiled C or C++ applications, and increased use of object-oriented techniques, influencing the schema design.

Java provides better portability of code and availability to programmers. However, there are several performance implications associated with Java. Because Java is an interpreted language, it is slower at executing similar logic than compiled languages, such as C. As a result, resource usage of client computers increases. This requires more powerful CPUs to be applied in the client or middle-tier computers and greater care from programmers to produce efficient code.

Because Java is an object-oriented language, it encourages insulation of data access into classes not performing the business logic. As a result, programmers might invoke methods without knowledge of the efficiency of the data access method being used. This tends to result in minimal database access and uses the simplest and crudest interfaces to the database.

在現在的應用設計中,有兩個大的挑戰,一個是JAVA不斷的在替換C/C++之類的編譯型語言(不是C不好,主要開發效率低,開發人員能力差),一個是物件導向的設計理念。

JAVA語言開發遷移都比較的方便靈活。但是JAVA作為一個解釋型語言,它要比C語言這種編譯型語言慢很多。因此,客戶端的程式資源佔用就會增加。這就要求客戶端擁有更強的CPU以及開發人員要編碼更好的程式碼。(很多JAVA人員他們認為JAVA的效能和C已經差不多了,說這話的基本上我都認為他們水平很差。有次我給他們演示效能差距有多大:我處理一個好像5萬多行的XML檔案。使用C語言處理總共需要0.7秒,使用python需要1.2秒,使用JAVA處理需要4.2秒。因為PYTHON使用了C的底層庫,所以和C是一路貨,效能還可以。但是JAVA慢了剛好6倍,結果那JAVA的哥們說,4秒多鍾你還接受不了啊,這就是他們的思維!不做解釋的!)

因為JAVA是一個物件導向的語言,它鼓勵將資料訪問的封閉,而不執行業務邏輯(先封閉再透過方法執行)。結果程式設計師可能在沒有任何高效的中間資料訪問層知識的情況下去呼叫“方法”處理這些資料,也就是實現業務邏輯。這降低了資料庫的訪問以及使用最簡單的資料庫介面。

 

With this type of software design, queries do not always include all the WHERE predicates to be efficient, and row filtering is performed in the Java program. This is very inefficient. In addition, for DML operations—and especially for INSERTs—single INSERTs are performed, making use of the array interface impossible. In some cases, this is made more inefficient by procedure calls. More resources are used moving the data to and from the database than in the actual database calls.

In general, it is best to place data access calls next to the business logic to achieve the best overall transaction design.

The acceptance of object-orientation at a programming level has led to the creation of object-oriented databases within the Oracle Server. This has manifested itself in many ways, from storing object structures within BLOBs and only using the database effectively as an indexed card file to the use of the Oracle Database object-relational features.

If you adopt an object-oriented approach to schema design, then ensure that you do not lose the flexibility of the relational storage model. In many cases, the object-oriented approach to schema design ends up in a heavily denormalized data structure that requires considerable maintenance and REF pointers associated with objects. Often, these designs represent a step backward to the hierarchical and network database designs that were replaced with the relational storage method.

In summary, if you are storing your data in your database for the long-term, and if you anticipate a degree of ad hoc queries or application development on the same schema, then the relational storage method probably gives the best performance and flexibility.

使用這種型別的軟體設計,查詢語句不總是包括那麼高效的where條件,直接透過JAVA程式進行資料庫過濾。這種方式效率是不高的。另外,對於像insert之類的DML操作,通常都是單行插入操作,使用陣列介面不太容易。有些情況下,儲存過程的呼叫更加低效。因為多數的資源被“傳送資料到資料庫”和“從資料庫接收”這種工作中消耗了。

通常來說,最好的方式最將資料訪問層移到業務邏輯處理中實現,以達到最佳的總體事務設計

這種物件導向的設計已經引領著資料庫伺服器上建立物件導向的資料庫。(不太理解什麼是物件導向資料庫,postgres好像是這種資料庫,我沒什麼研究)這已經透過許多方式證明了,從儲存到BLOBS中的物件欄位和高效的索引卡目錄到使用物件導向的資料庫特性。

假如你採用了物件導向的設計模型,那麼確信你沒有丟失關係儲存模型的靈活性。在許多情況下,物件導向的設計常常以需要相當大維護和REF指標關聯物件的此類違反正常資料結構設計收場。(具體涵義我也不太懂,大致應該是設計層次過多,引用過多,最終導致無法維護,C++多級多層繼承,容易出現混亂)很多時候,這種設計模型像倒退到以前的那種層次型或網狀層的資料庫管理系統一樣。(我沒見過層次型資料庫,但是我見過IBM以前的一些文件,他們將表從層次型升級到關係型時還有一些是此類的圖表在裡面)

總的來說,假如你希望能將資料儲存的久一些,同時希望能按需去查詢或設計相似的應用,那麼關係儲存方式能給你最大的效能和靈活性。

 



2.6 Workload Testing, Modeling, and Implementation

This section describes workload estimation, modeling, implementation, and testing. This section covers the following topics:

  • Sizing Data

  • Estimating Workloads

  • Application Modeling

  • Testing, Debugging, and Validating a Design

2.6.1 Sizing Data

You could experience errors in your sizing estimates when dealing with variable length data if you work with a poor sample set. As data volumes grow, your key lengths could grow considerably, altering your assumptions for column sizes.

When the system becomes operational, it becomes more difficult to predict database growth, especially for indexes. Tables grow over time, and indexes are subject to the individual behavior of the application in terms of key generation, insertion pattern, and deletion of rows. The worst case is where you insert using an ascending key, and then delete most rows from the left-hand side but not all the rows. This leaves gaps and wasted space. If you have index use like this, then ensure that you know how to use the online index rebuild facility.

DBAs should monitor space allocation for each object and look for objects that may grow out of control. A good understanding of the application can highlight objects that may grow rapidly or unpredictably. This is a crucial part of both performance and availability planning for any system. When implementing the production database, the design should attempt to ensure that minimal space management takes place when interactive users are using the application. This applies for all data, temp, and rollback segments.

在處理很差的可變長的樣板資料集時,你也許有錯誤估算的經歷。隨著資料量的增長,你關鍵資料長度可能相應增加,超出了你預想的列長度。(沒看懂,自個看上面原文理解)

當你係統上線後,它可能變得更加難以預測,特別是索引值。表不斷的增長,索引也增長,索引結點不斷的產生,插入或刪除。最差的情況是當使用一個遞增的關鍵字作為索引,然後卻又從最小的值開始刪除一部分資料(不是刪除所有的值)。這將導致缺口或空間浪費。假如你的索引是這種方式建立的,那你一定要知道如何去線上的重建索引。(alter index xxxx rebuild online; 即可)

DBA應該監控每個物件的空間分配情況以確保沒有超出控制。如果對應用程式有較好的理解,就可以瞭解重要的物件快速增長或不可預測情況。這個是對於效能和可用性方面都非常重要的設計內容之一。在配置生產資料庫時,當有互動式使用者使用此應用時要確保要最少的管理空間。包括了所有的資料檔案,臨時檔案,UNDO段。

 



2.6.2 Estimating Workloads

Considering the number of variables involved, estimation of workloads for capacity planning and testing purposes is extremely difficult. However, designers must specify computers with CPUs, memory, and disk drives, and eventually roll out an application. There are several techniques used for sizing, and each technique has merit. When sizing, it is best to use the following two methods to validate your decision-making process and provide supporting documentation:

考慮到有各種的可能性,評估壓力容量計劃和測試是相當困難的。然而設計人員必須要計劃CPU,記憶體,硬碟,最終展開應用。有許多方法可以去做評估,每種都自己的優點。可以使用下面的兩種方法來幫助評估:

  • Extrapolating From a Similar System    從類似的系統中推斷

  • Benchmarking                                    用基準問題測試

2.6.2.1 Extrapolating From a Similar System

This is an entirely empirical approach where an existing system of similar characteristics and known performance is used as a basis system. The specification of this system is then modified by the sizing specialist according to the known differences. This approach has merit in that it correlates with an existing system, but it provides little assistance when dealing with the differences.

This approach is used in nearly all large engineering disciplines when preparing the cost of an engineering project, such as a large building, a ship, a bridge, or an oil rig. If the reference system is an order of magnitude different in size from the anticipated system, then some components may have exceeded their design limits.

這是一個完全靠經驗的做的方式,當存在一個相似的系統,根據這個類似系統的效能就大概推斷一下自己系統效能是多少了。然後根據兩個系統不同的處再進行相應的修正一下。這個方法對於那麼有相似處的地方可以借鑑一下,但是對於差異處就不好做判斷了!

這種方法在許多大型的專案預算中廣泛的使用,比如大廈建設,船舶,大橋,石油鑽探裝置等等。假如比較的是一個不同數量級的專案,那麼有些模組就有可能超出設計的範疇。

注:這個比較的方法在軟體中用於更加多了,基本上架構師認為一個模組,比如根據平安銀行經驗,可能在2天內能做完,那就按2天的工作量報價了。平安銀行假如說專案金額是120萬,那麼在廣州銀行中相似的專案,如果多增加了一些其他的功能,報價就加到150萬等等。但是有時候也不一定,因為要看銷售人員的水平了,如果他們只想著完成任務,不管專案人員的死活,可能80W就敢做的。

 



2.6.2.2 Benchmarking   基準測試(這部分我翻譯很差勁,最好自己去看原文)

The benchmarking process is both resource and time consuming, and it might not produce the correct results. By simulating an application in early development or prototype form, there is a danger of measuring something that has no resemblance to the actual production system. This sounds strange, but over the many years of benchmarking customer applications with the database development organization, Oracle has yet to see reliable correlation between the benchmark application and the actual production system. This is mainly due to the number of application inefficiencies introduced in the development process.

However, benchmarks have been used successfully to size systems to an acceptable level of accuracy. In particular, benchmarks are very good at determining the actual I/O requirements and testing recovery processes when a system is fully loaded.

Benchmarks by their nature stress all system components to their limits. As the benchmark stresses all components, be prepared to see all errors in application design and implementation manifest themselves while benchmarking. Benchmarks also test database, operating system, and hardware components. Because most benchmarks are performed in a rush, expect setbacks and problems when a system component fails. Benchmarking is a stressful activity, and it takes considerable experience to get the most out of a benchmarking exercise.

基準測試的過程是一個資源和時間消耗的過程,同時也可能不會產生一個正確的結果。透過模擬一個早期開發的應用程式或者設計原型,存在著可能測試了一個與生產系統沒有相似處的風險(測試的很high,但是生產上根本不用這個功能)。聽起來很奇怪,但是根據從事多年的基準測試應用和開發人員的經驗,oracle已經能可靠地識別出基準與實際生產的關係。這主要是由於前面開發階段介紹的低效的應用。

然而,基準測試已經被成功地使用在相對準確的可接受的評測系統中。特別是基準測試對於評測實際的IO請求和測試系統滿荷進行時恢復程式數是非常有效的。

基準測試根據自己的特點,對系統中所有的模組進行加壓。透過這個壓力測試,去發現應用程式中設計時要求的錯誤日誌資訊及看看自己到底是半斤還是八兩。基準測試也會去測試資料庫,作業系統,和硬體部件。因為多數的基準測試時間都使用狂風暴雨般的方式期望能測出一點問題出來。基準測試是一個壓力測試行動,它需要許多的經驗。

 



2.6.3 Application Modeling

Modeling the application can range from complex mathematical modeling exercises to the classic simple calculations performed on the back of an envelope. Both methods have merit, with one attempting to be very precise and the other making gross estimates. The downside of both methods is that they do not allow for implementation errors and inefficiencies.

The estimation and sizing process is an imprecise science. However, by investigating the process, some intelligent estimates can be made. The whole estimation process makes no allowances for application inefficiencies introduced by poor SQL, index design, or cursor management. A sizing engineer should build in margin for application inefficiencies. A performance engineer should discover the inefficiencies and make the estimates look realistic. The Oracle performance method describes how to discover the application inefficiencies.

設計模型可以是一個非常複雜的數學模型,或者是一個粗略的估算(或計劃)。兩者各有優點,一個希望極度的精確,一個是總的估算。兩者的底線是都不允許出現低效和錯誤。

這個評估和估算過程是一個不太準確的方式。通審查這個過程,就可以大概的得出一些評估。整個評估的過程不對那麼低效的SQL,索引,遊標管理做限制。一個分析工程師應該建立一個低效應用的界線。一個效能工程師應該能發現這些低效以及使這些評估更加真實。oracle效能最佳化技術介紹瞭如何去發現這些低效的應用。

 



2.6.4 Testing, Debugging, and Validating a Design

The testing process mainly consists of functional and stability testing. At some point in the process, performance testing is performed.

The following list describes some simple rules for performance testing an application. If correctly documented, then this list provides important information for the production application and the capacity planning process after the application has gone live.

測試主要集中在功能和穩定性上。效能或壓力測試有時也是在這個時候才做的。

下面幾項描述了一些簡單的方法去測試應用程式的效能。假如準確地書面化,那麼下面的幾項提供了應用開發中非常重要的一些資訊和已經上線系統的計劃處理。

  • Use the Automatic Database Diagnostic Monitor (ADDM) and SQL Tuning Advisor for design validation  使用ADDM和STA去驗證

  • Test with realistic data volumes and distributions     使用真實的資料量和分佈情況進行測試

    All testing must be done with fully populated tables. The test database should contain data representative of the production system in terms of data volume and cardinality between tables. All the production indexes should be built and the schema statistics should be populated correctly.  所有的測試都應該覆蓋到整個表。測試資料庫應該包括能代表生產系統的相對真實資料和相應的資料量。所有生產上的索引也應該建立起來,以及收集了相應的統計資料。(注:如果測試的資料和生產資料差別比較大的話,這個索引有時候就失效了。比如計劃的是查詢類交易可能會佔30%左右,但是結果在壓力測試時90%都是金融類交易,這樣就無法體現查詢交易的真實情況,如果此時再使用AWR,ADDM,STA都越做越偏離的遠)

  • Use the correct optimizer mode       使用正確的最佳化方式

    Perform all testing with the optimizer mode that you plan to use in production. All Oracle Database research and development effort is focused on the query optimizer. Therefore, the use of the query optimizer is recommended.  使用你計劃在生產上使用的那些最佳化模式去做相應測試。所有的資料庫研究和開發都應該集中在查詢的最佳化上。因此建議使用查詢最佳化器。  (這個是方法論的概念了!)

  • Test a single user performance       測試單一使用者的效能

    Test a single user on an idle or lightly-used database for acceptable performance. If a single user cannot achieve acceptable performance under ideal conditions, then multiple users cannot achieve acceptable performance under real conditions. 在一個小型資料庫上測試單一使用者時可接受的效能。假如在理想情況下單使用者都達不到要求,那麼在真實條件下多使用者更不可能滿足要求了。

  • Obtain and document plans for all SQL statements   獲取和記錄所有語句的執行計劃

    Obtain an execution plan for each SQL statement. Use this process to verify that the optimizer is obtaining an optimal execution plan, and that the relative cost of the SQL statement is understood in terms of CPU time and physical I/Os. This process assists in identifying the heavy use transactions that require the most tuning and performance work in the future.     獲取所有語句的執行計劃。這個確認最佳化器使用了最佳的執行計劃,以及相關的CPU和IO等資源消耗情況。這個過程有助於瞭解那些重要的或者是重量級的事務,以便於在未來重點去除錯。

  • Attempt multiuser testing              嘗試多使用者測試

    This process is difficult to perform accurately, because user workload and profiles might not be fully quantified. However, transactions performing DML statements should be tested to ensure that there are no locking conflicts or serialization problems.  這個過程很難精確的執行,因為使用者壓力和屬性無法完整的描述。然而透過大量的DML事務處理應該能確認沒有鎖的衝突或者導致序列價效比問題。

  • Test with the correct hardware configuration     測試硬體配置

    Test with a configuration as close to the production system as possible. Using a realistic system is particularly important for network latencies, I/O subsystem bandwidth, and processor type and speed. Failing to use this approach may result in an incorrect analysis of potential performance problems.    儘可能和生產一樣去配置系統。使用一個較真實系統對於測試網路延遲,IO頻寬,CPU速度是非常重要的。如果沒有這樣做的話可能導致不正確的分析潛在的效能問題。(這個在銀行是很難做到的,基本上測試環境很糟糕,或者有的銀行是測試機非常的厲害如16核768G記憶體,但是生產系統8核16G記憶體等等)

  • Measure steady state performance                   測量穩定狀態效能

    When benchmarking, it is important to measure the performance under steady state conditions. Each benchmark run should have a ramp-up phase, where users are connected to the application and gradually start performing work on the application. This process allows for frequently cached data to be initialized into the cache and single execution operations—such as parsing—to be completed before the steady state condition. Likewise, at the end of a benchmark run, there should be a ramp-down period, where resources are freed from the system and users cease work and disconnect.    當基準測試時,在一個穩定的狀態下進行效能測試是非常重要的。基準測試開始時都有一個上升階段,使用者連線到系統上並開始執行一些工作。這個過程允許那些需要被頻繁訪問的資料現在開始初始化,也就是開始快取進來並進行單一使用者執行操作,例如解析過程,在真正進入穩定狀態時已經全部做過一次了。同樣的,在基準測試進入尾聲,將會有一個下降的過程,因為使用者停止進行測試以及斷開連線,資源將會被釋放出來。

 



2.7 Deploying New Applications   應用程式部署

This section describes the following design decisions involved in deploying applications:

  • Rollout Strategies             展示或推廣策略

  • Performance Checklist       效能檢查項

2.7.1 Rollout Strategies

When new applications are rolled out, two strategies are commonly adopted:

  • Big Bang approach - all users migrate to the new system at once                     大爆炸方式,所有使用者一次性遷移過來

  • Trickle approach - users slowly migrate from existing systems to the new one   涓涓細流方式,使用者慢慢地慢慢地遷移

Both approaches have merits and disadvantages. The Big Bang approach relies on reliable testing of the application at the required scale, but has the advantage of minimal data conversion and synchronization with the old system, because it is simply switched off. The Trickle approach allows debugging of scalability issues as the workload increases, but might mean that data must be migrated to and from legacy systems as the transition takes place.

It is difficult to recommend one approach over the other, because each method has associated risks that could lead to system outages as the transition takes place. Certainly, the Trickle approach allows profiling of real users as they are introduced to the new application, and allows the system to be reconfigured while only affecting the migrated users. This approach affects the work of the early adopters, but limits the load on support services. This means that unscheduled outages only affect a small percentage of the user population.

The decision on how to roll out a new application is specific to each business. Any adopted approach has its own unique pressures and stresses. The more testing and knowledge that you derive from the testing process, the more you realize what is best for the rollout.

每種方式都有自己的優缺點。快速全部的遷移方式需要對系統有強有力的測試,以及有系統穩定的信念,這種方式的優點是需要最少的資料轉換工程和與舊系統的同步過程,因為一開始遷移時就全部遷移完了。而逐步遷移的方式則能在慢慢遷移中發現系統的不足,這樣可以保證舊系統在這段時間內還可以發揮自己的餘熱,慢慢過渡。

建議使用哪種方式是比較難的,每種方式都有自己過渡時停運的風險。慢慢遷移方式允許對真實的使用者做一些試執行,這樣在系統重新更新時僅影響小部件使用者。這種方式僅影響前期的試用者,但是對於壓力是不夠的。反正就是僅僅影響個別使用者。

採用哪種方式進行遷移,要根據自己的業務需要決定。任何一種決定的途徑都有自己的壓力及著重點。隨著你對測試過程瞭解的越多,你將會知道哪種遷移方式最合理。

(在銀行中,這兩種方式我都使用過。本質上沒有什麼太大區別。慢慢遷移可以做到緩解自己壓力,因為新系統上線很難做到不出問題的,這樣給自己的一個緩衝期是最明智的了;但是許多銀行人員比較喜歡一次性上線,這樣省得下次還得再遷移,再遷移,挺麻煩的(上線停運通常都是晚上23點以後,許多系統還是星期六做,要人命的)。不過我還使用過第三種方式,就是將生產資料同時在新舊系統上跑,新系統僅僅是為了驗證使用,等跑了個把個月之後,發現和系統對比沒有區別時,如果有問題就迅速修正,等一切都穩定之後,再將舊系統停掉,這樣就無縫的遷移過來了)

 



2.7.2 Performance Checklist

To assist in the rollout, build a list of tasks that increase the chance of optimal performance in production and enable rapid debugging of the application. Do the following:  為了協助升遷,按下面的方式建立一個任務列表,增加你的上線成功機率,以及快速的問題跟蹤:

  1. When you create the control file for the production database, allow for growth by setting MAXINSTANCES, MAXDATAFILES, MAXLOGFILES, MAXLOGMEMBERS, and MAXLOGHISTORY to values higher than what you anticipate for the rollout. This technique results in more disk space usage and larger control files, but saves time later should these need extension in an emergency.      當你建立控制檔案時,增加MAXINSTANCES, MAXDATAFILES, MAXLOGFILES, MAXLOGMEMBERS, and MAXLOGHISTORY這些值引數,也就是儘量引數調大一些。雖然一開始多佔用了一些空間,但是如果在未來需要緊急調整時,這將會節約不少時間。

  2. Set block size to the value used to develop the application. Export the schema statistics from the development or test environment to the production database if the testing was done on representative data volumes and the current SQL execution plans are correct. block size設定要和開發測試機一致。如果測試環境上的資料量和生產上是一致的,同時當年的SQL執行計劃是正確的話,將測試機統計資料匯出來,然後導到生產系統上。

  3. Set the minimal number of initialization parameters. Ideally, most other parameters should be left at default. If there is more tuning to perform, then this appears when the system is under load. See Chapter 4, "Configuring a Database for Performance" for information about parameter settings in an initial instance configuration.   設定最小資料的初始化引數。理想情況下,其他的引數預設設定就可以了。假如有許多需要調優的地方,那麼在相應的壓力下很快問題就可以會出現。

  4. Be prepared to manage block contention by setting storage options of database objects. Tables and indexes that experience high INSERT/UPDATE/DELETE rates should be created with automatic segment space management. To avoid contention of rollback segments, use automatic undo management. See Chapter 4, "Configuring a Database for Performance" for information about undo and temporary segments.   設定儲存引數,管理塊競爭。儘量使用ASSM以及自動UNDO_MANAGEMENT引數。

  5. All SQL statements should be verified to be optimal and their resource usage understood.  所有的SQL語句都應該檢查過,並且知道他們對資源的消耗程式。(在銀行裡,任何一個小的賬務系統都要比像阿里巴巴之類的網際網路企業要複雜的多,有可能SQL語句有上千個之多,所以這個相對是比較難的。但是通常情況下,在開發階段要迭代式的對SQL進行檢查)

  6. Validate that middleware and programs that connect to the database are efficient in their connection management and do not logon or logoff repeatedly.   檢查一下你的中間層軟體,看看他們在連線資料庫的策略是不是高效,確保他們不會頻繁的連線退出,連線退出。

  7. Validate that the SQL statements use cursors efficiently. The database should parse each SQL statement once and then execute it multiple times. The most common reason this does not happen is because bind variables are not used properly and WHERE clause predicates are sent as string literals. If you use precompilers to develop the application, then make sure to reset the parameters MAXOPENCURSORS, HOLD_CURSOR, and RELEASE_CURSOR from the default values before precompiling the application.  驗證你的SQL語句是否有效的使用了遊標。資料庫應該一次解析,多次執行。很多情況下,由於開發人員沒有使用繫結變數以及直接在where查詢條件中使用字面值引數導致。如果你使用預編譯方式開發應用程式,確保在重新編譯時重置了MAXOPENCURSORS, HOLD_CURSOR, and RELEASE_CURSOR 值

  8. Validate that all schema objects have been correctly migrated from the development environment to the production database. This includes tables, indexes, sequences, triggers, packages, procedures, functions, Java objects, synonyms, grants, and views. Ensure that any modifications made in testing are made to the production system.   確保你所有的表、索引、檢視、sequence、synonym、儲存過程、函式、觸發器、授權等等已經從開發環境遷移到了生產環境。確保測試環境中的任何小補丁都在生產環境中同樣應用。(一般我們是在上線前把所有的建表、索引等語句全部像mysql裡建立庫一樣,寫到xxx.sql檔案中,然後在測試環境中建立一個“上線演練環境”,這個環境主要是演練上線執行指令碼的,不對業務做測試的。執行完xxx.sql指令碼後,對比一下UAT測試環境就知道了。)(銀行裡每個專案組都有:開發環境:供開發人員隨意的修改的;SIT環境,相對穩定的開發環境,一般是開發模組穩定後,可以和其他人編寫的模組進行聯調時就放到這個環境下,供自己專案組的測試人員測試,這個模組下的程式碼要使用SVN進行控制的;UAT環境:供行方測試人員測試,這個階段程式碼控制更嚴格;上線演練環境:對上線前的執行指令碼或者遷移指令碼進行演練,這個時候程式碼已經被凍結,一般不會讓改動的; 壓力測試環境:這個更簡單了,就是做壓力測試的,通常是有經驗的專家來做,他可以直接對程式碼修改,以查詢調優效能,最終將新的修改合併到SIT上重新進行測試。     通常情況下,開發環境,SIT,UAT,上線演練環境這幾個都在開發伺服器上,透過不同的使用者區分開來;而壓力測試環境會抽一個效能比較好的機器臨時供專案組進行壓力測試。一般來說,除了全新的專案會在上線前進行壓力測試,其他的維護類專案,比如到了二期,三期時專案基本上已經穩定,沒人會再去做壓力測試了,一二個新增功能即使效率低也不會有太大影響,所以直接在測試環境中大概測一下,通常SQL執行計劃走到自己預想的索引或全表掃描上,就算成功了。具體的效能如何都是直接透過觀察生產環境來做最終的判斷。)

  9. As soon as the system is rolled out, establish a baseline set of statistics from the database and operating system. This first set of statistics validates or corrects any assumptions made in the design and rollout process.    一旦成功的上線後,建立一個作業系統和資料庫的統計基線。首次的統計資料通常是用來驗證或者修正在開發中的那個預想(假設)是否正確。(一般只要不出現大的效能和功能上的問題,都是在系統業務量和功能穩定後再做的)

  10. Start anticipating the first bottleneck (which is inevitable) and follow the Oracle performance method to make performance improvement. For more information, see Chapter 3, "Performance Improvement Methods".    開始檢視你的效能瓶頸(一般是肯定有的),然後按照oracle效能方法去提升。(這個不一定的,我們一定要講究一次性成功的,偶爾出問題的地方是上線指令碼可能有問題,或者是引數設計問題導致某個業務運轉不正常,很少有效能方面問題。可能有人覺得不可思議,怎麼可能一個新專案沒效能問題,畢竟你們的環境都不一樣? 答案是在評估伺服器時通常要比測試環境高几倍的方式估算,以應對未來業務量的增長。比如在8核的測試環境中效能是單程式每秒鐘2000筆交易,單伺服器起10程式,那麼一臺伺服器一秒鐘可以併發2000*10筆交易。上線後銀行用2臺機器做叢集,每臺機器是16核,那麼通常情況一秒鐘交易量基本上可以按5-6萬來算了;有些非常重要的系統,叢集數更多。我聽過一個聯通的系統,每天處理4000萬交易,用了20多臺機器做叢集;前方8臺機器做轉發,後臺N臺處理)

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/30633755/viewspace-2127646/,如需轉載,請註明出處,否則將追究法律責任。

相關文章