熟悉SQL tuning

sembh發表於2010-07-08

--最佳化總原則:按語句執行順序,先過濾掉最容易過濾到的記錄.

--最佳化規則:1:資料比較多的表應該先進行連線,資料比較少的表應該儘可能後進行連線。
--原理:你從一堆沙子裡找針難,還是幾粒沙子裡面找針難。

--2 from後面寫表的順序時,選擇小表作為驅動表,放在右端。如果有3個以上的表連線查詢,
--那就需要選擇交叉表作為基礎表,交叉表是指被其他表所引用的表.

--3 where後的條件:表之間的連線必須寫在其它where條件之前.那些可以過濾掉最大數量記錄
--的條件必須寫在where字句的末尾.一般tablename.columnname = '多少'字句放到最後面.

--4 子查詢中,要減少對同一個表的查詢次數.

--5 儘量用exists代替in,not exists 代替not in.因為使用not in語句將執行一個內部的排序和合並,而且它對子
--查詢中的表執行了一個全表遍歷),可以用外連線(+)(次高效)或exits代替(最高效)

--6使用索引提高效率.但是索引需要空間來儲存,也需要定期維護,每當有記錄在表中增減或索引列
--被修改時,索引本身也會被修改,這意味著每條記錄的insert,update,delete將為此多付出4,5次的磁碟I/O.

--建分割槽表會不會需要額外的儲存空間來儲存?如果不需要的話,那都可以考慮了。(一般千萬級以上的表可以考慮建分割槽表)

/*oracle排序通常發生在以下情況:
1 order by 字句
2 group by 字句
3 select distinct 字句
4 建立索引時
5 union、minus 操作
6 呼叫排序合併連線
**/

--sql最佳化方法:
--1檢視explain,對錶作統計
--2嘗試在查詢欄位上建立索引
--3檢視資料庫初始引數是否有不合理的地方,調整。

--and not s.area_id like '2%'改成and substr(s.area_id,1,1) <> '2'
--有group by了,distinct 和order by多餘

--------------------------------------講join...-------------------------------------------

1 WHEN uses NESTED LOOP joins?
WHEN joining SMALL NUMBER OF ROWS,WITH a good driving condition BETWEEN two TABLES.you drive FROM THE
OUTER LOOP TO THE INNER LOOP,so THE ORDER OF THE TABLES IN THE execution PLAN IS important.
THE OUTER LOOP IS THE driving ROW SOURCE.it produces a SET OF ROWS FOR driving THE JOIN condition.
THE ROW SOURCE can be a TABLE ACCESSED USING an INDEX SCAN OR a FULL TABLE SCAN.(小表驅動原理)

THE INNER LOOP IS iterated FOR every ROW returned FROM THE OUTER LOOP,ideally BY an INDEX SCAN.
IF THE ACCESS path FOR THE INNER LOOP IS NOT dependent ON THE OUTER LOOP,you should USER
other JOIN methods WHEN two indepdent ROW sources are joined together.
(如果存取路徑對於內部迴圈來說,不依賴於外部迴圈的時候,那麼,你就應該使用其他的連線方法,也就是在
兩個獨立的行資源連線在一起的時候.)

2 WHEN uses HASH joins ?
HASH joins are used FOR large DATA SETS.uses THE smaller OF two TABLES OR DATA sources TO BUILD a
HASH TABLE ON THE JOIN KEY IN memory.it THEN scans THE large TABLE,probing THE HASH TABLE TO find THE joined ROWS.

具體包括下面兩個情況:
1 a large amount OF DATA needs TO be joined.
2 a large fraction OF a SMALL TABLE needs TO be joind.


3 WHEN uses SORT MERGE joins ?
can be used TO JOIN ROWS FROM two independent sources.

SORT MERGE joins are useful WHEN THE JOIN condition BETWEEN two TABLES i s an inequality condition
LIKE , OR >=. SORT MERGE joins perform better THAN NESTED LOOP joins FOR large DATA SETS.

--------------表掃描方式:
1 FULL TABLE scans:READS ALL ROWS FROM a TABLE AND filters OUT that do NOT meet SELECT criteria.During a fulll TABLE SCAN,ALL BLOCK IN THE TABLE that UNDER THE HIGH water mark are scanned.THE hith water mark indicate THE amount OF used SPACE.
和初始化引數db_file_multiblock_read_count相關.USING multiblock READS means a FULL TABLE scans can be performed very efficiently.

a FULL TABLE SCAN IS faster FOR accessing large amounts OF DATA.
FULL TABLE SCAN IS cheaper THEN INDEX RANGE SCAN WHEN accessing a large fraction OF THE blocks IN a TABLE.
(當存取一個表的大部分資料快的時候,全表掃描比索引範圍掃描花費的代價要小.)
this IS because FULL TABLE SCAN can USE large I/O calls,AND marking fewer large i/o calls IS cheaper THAN marking many smaller calls.

最佳化器在以下情況會使用full TABLE SCAN
(1) 缺乏索引
(2) large amount DATA
如果當最佳化器認為查詢存取了表的大部分資料庫塊的時候,即使有索引有用的情況下,還是會使用full TABLE SCAN.
(3) SMALL TABLE
IF a TABLE contains LESS THAN db_file_multiblock_read_count blocks UNDER THE HIGH water mark
(4) HIGH DEGREE OF parallelism


2 ROWID scans:
THE ROWID OF a ROW specifies THE DATAFILE AND DATA BLOCK containing THE ROW AND THE LOCATION OF THE ROW IN that BLOCK.THE exact LOCATION OF THE ROW IN THE DATABASE IS specified.

3 INDEX scans:
INDEX UNIQUE scans:
INDEX RANGE scans:
FULL scans:
FAST FULL INDEX scans:
INDEX joins:
BITMAP INDEXES:
4 CLUSTER ACCESS:
5 HASH ACCESS:
6 SAMPLE TABLE scans:

---------------performance tuning
cpu
memory:
i/o subsystem
NETWORK

SE IN,exists在子查詢中

IN certain cirsumstances, it IS better TO USE IN rather THEN EXISTS.
IN general, IF THE selective predicate IS IN THE subquery,THEN USE IN.
IF THE selective predicate IS IN THE PARENT QUERY,THEN USE EXISTS.


IN 適合於外表大而內表小的情況;exists適合於外表小而內表大的情況。

IN 和exists的區別 :如果子查詢得出的結果記錄較少,主查詢中的表較大且又有索引時應該用in,
反之如果外層的主查詢記錄較少,子查詢的表大,又有索引時使用exists.區別主要是驅動順序的改變,
如果是exists,那麼以外層表為驅動表,先被訪問,如果是in,那麼先執行子查詢。

/*
1: 減少不必要的全表掃描,快20倍
*/


/*
2:建立基於函式的索引
*/


/*
3:透過第三方工具,調整sql語句
*/

/*
從程式庫快取和stats$qls_summary抽取不友好的sql語句,並快速定位高頻執行的語句,並首先對它們進行調整.
*/

/*
將全表掃描替換為索引範圍掃描總體方針:
1 對於所有記錄排序的表:讀取表記錄數40%以下的查詢應使用索引範圍掃描
2 讀取表資料塊數目7%以下的查詢應該使用索引範圍掃描
*/

/*
oracle並行查詢只適用於執行全表掃描的查詢,索引是並行查詢的死對頭
*/

當建立一個新的索引時,應該考慮的因素:
1 :THE most aggresive form OF this technique IS TO BUILD an INDEX-organized TABLE.but,
you must be careful that THE increased leaf SIZE OF an IOT does NOT undermine THE efforts TO reduce I/O.
2: USING a different INDEX TYPE:b-tree INDEX, BITMAP INDEX, FUNCTION-based INDEX,partitioned INDEX,REVERSE INDEX,
考慮應該建立哪類索引.
3 finding THE COST OF an INDEX:building AND maintaining an INDEX STRUCTURE can be expansive,AND it can
consume RESOURCE sucn AS DISK SPACE,cpu, AND i/o capacity.designer must CONSIDER that THE benefit OF ANY INDEX outweigh THE negatives OF INDEX maintaince.

IF you INSERT INTO a TABLE WITH three INDEXES,THEN it will be approximately 10 times slower THAN an INSERT INTO a TABLE WITH NO INDEX.
FOR DML,AND particulary fo INSERT-heavy application,THE INDEX design should be seriously reviewed.
經常說的建索引能夠提高select,UPDATE,delete的效率,但不一定能提高insert的效率.所以對於一張經常要被insert的表,建索引時,要考慮清楚.

4 USING a REVERSE KEY INDEX OR USING a cycling sequenct TO prefix AND SEQUENCE VALUES.

5 ordering COLUMNS IN an INDEX(索引排序): (1) ORDER COLUMNS WITH most SELECTIVITY FIRST.
(2)ORDER COLUMNS TO reduce I/O BY clustering OR sorting DATA.

6 USING views:

7 開發人員必須有效地理解執行計劃,必須要準備的事情:(1)good DATABASE connection managemtn (2) good CURSOR USAGE AND managemt

8實現應用:
(1) USER interface
(2) business logic code can be ON THE client server,AND THE DATABASE server.but,THE application server IS THE most common loaction FOR business logic code.
(3) SQL statements WITH decode CASE statments are very ofen candidates FOR optimization.


AT a peak period ON a web site,THE resoponse TIME will NOT execeed five seconds FOR a page REFRESH.


---------------------------------top 10 mistakes found in oracle systems

1 bad connection MANAGEMENT
2 bad USE OF cursors AND teh SHARED pool
3 bad SQL:USE ADDM TO 看high load SQL AND THE SQL tuning advisor can be used TO provide recommendations FOR improvement.
4 USE OF nonstandard initializtion PARAMETERS:
5 getting DATABASE I/o wrong
6 redo LOG setup problems
7 serializtion OF DATA blocks IN THE buffer CACHE due TO lack OF free LISTS,free LIST GROUPS,transactions slots(INITRANS),OR shortage OF ROLLBACK segments.
8 LONG FULL TABLE scans
9 HIGH amounts OF recursive(SYS)遞迴 SQL
10 遷移過程當中出現的問題:例如缺乏indexes OR incorrent STATISTICS.
addm會顯示高負載的sql.

[@more@]

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/24214296/viewspace-1035093/,如需轉載,請註明出處,否則將追究法律責任。

相關文章