並行執行的學習與測試

gdutllf2006發表於2010-07-26


參考文件<>

#################################################################
1 Introduction to Parallel Execution <1>
#################################################################
1.1 Oracle的並行執行不適合於(事務多,時間短)的OLTP系統,在這類系統中,從並行執行得到的好處,遠小於並行執行帶來的額外開銷.

1.2 The most common example of using parallel execution is for DSS,Data warehouses,Complex queries, 批處理的動作,如資料的遷移.

1.3 Parallelism is the idea of breaking down a task so that, instead of on process doing all of the work in a query,
many processes do part of the work at the same time.

 

#################################################################
2 When (not) to implement parallel execution
#################################################################
2.1 業務時間不合適做並行動作。

2.2 Parallel execution improves performance for:
1) Queries
2) Creating of Large indexes
3) Bulk inserts, updates and deletes,
4) Aggregations and copying

and Parallel execution benefits systems that have all of the following characteristics:
1) SMP, Clusters, multiple CPUS.(在單CPU的機器中不能從並行技術中得到任何好處,其實也可以得到一些好處的,充分利用CPU資源.)
2) Sufficient I/O bandwidth (足夠的I/O頻寬)
3) Under-utilized or intermittently used CPUS(less than 30%,CPU相對空閒)
4) Sufficient Memory to support additional memory-intensive processes such as sorts, hashing, and I/O bufffers

2.3 使用並行執行的兩個前提:
1) 工作量大,如掃描一個50G的大表
2) 系統資源(CPU,IO, Memory)充足

 

#################################################################
3 How Parallel Execution Works<1>
#################################################################
when parallel execution is not used, a single server process performs all necessary processing for the sequential
execution of a SQL statement.For example, to perform. a full table scan(select * from employees), one process performs
the entire operation.(一個Client Connection,對應一個服務端的程式,這個程式在沒用使用並行機制的情況下,只由一個CPU來處理,
是這樣嗎?是的,那小機中的這麼多CPU不是沒發揮到作用???多個程式的併發)

Parallel execution performs these operations in parallel using multiple parallel processes. One process, known as the parallel
execution coordinator(協調者),dispatches the execution of a statement to several parallel execution servers and coordinates the
result from all of the server processes to send the results back to the user.(由一個協調者來分配任務和綜合結果,返回給終端使用者)

Take Parallel Table Scan employees for example, The table  is divided dynamically(dynamic partitioning) into load units called granules
and each granule is read by a single parallel execution server. The granules are generated by the coordinator. Each granules is a
range of physical blocks of the table. The mapping of granules to execution servers is not static, but is determined at execution
time. When an execution server finishes reading the rows of the table employees corresponding to a granule,it gets another granule
from the coordinator if there are any granules remaining. This continues till all granules are exhausted, in other words, the entire
table employees has been read. The parallel execution servers send results back to the parallel execution coordinator, which assembles
the pieces into the desired full table scan.
(按表的物理塊分成幾個Granule,每一個服務程式取一個,一旦完成,則再在剩餘的Granule中取一個,直到所有的Granule都完成.)

Given a query plan for a SQL query, the parallel execution coordinator breaks down each operator in a SQL query into parallel pieces.
runs them in the right order as specified in the query, and then integrates the partial results produced by the parallel execution
servers executing the operators. The number of parallel execution servers assigned to a single operation is the degree of parallelism(DOP)
for an operation. Multiple operations within the same SQL statement all have the same degree of parallelism.

 


#################################################################
4 Parallelized SQL Statement <1>
#################################################################

4.1 Tips
1) Each SQL statement undergoes an optimization and parallelization process when it is parsed.
(Oracle會動態地選擇適合的並行操作)

2) After the optimizer determines the execution plan of a statement, the parallel execution coordinator determines the parallelization method
for each operation in the execution plan.
在最佳化器確定執行計劃後,並行協調者必須確定並行的方法,並行度(parallelism)。

3) Parallelism Between  Operations

Intra-operation parallelism:  操作內部的並行
Parallelization of an individual operation where the same operation is performed on smaller sets of rows by parallel execution servers.

Inter-operation parallelism: 操作間的並行
When two operations run concurrently on different sets of parallel execution servers with data flowing from one operation into the other,
we achieve what is termed inter-operation parallelism.

Consider the following statement:

Select * from employees order by employee_id.

這個語句的執行計劃分為兩步:
1) Full table scan the table employees
2) Order by employee_id
假定這個查詢語句的並行度設為4.


Each of the two operations(scan and sort) performed concurrently is given its own set of parallel execution servers.
兩個操作內部的並行度都設為4,所以,對於這個查詢的執行並行度為4+4=8; This is because a parent and child operator can be performed at the same time.
(inter-operation parallelism) 排序操作不必等到所有的掃描操作完成後,才開始。但inter-operation不能超過2,No more than two sets of parallel execution servers
can run simultaneously.

示例圖參考<>

 

#################################################################
5 Degree of Parallelism <1>
#################################################################

1) The number of parallel execution servers associated with a single operation is known as the degree of parallelism.

2) Note that the degree of parallelism applies directly only to intra-operation parallelism. If inter-operation parallelism is possible, the total number of
parallel execution servers for a statement can be twice the specified degree of parallelism.But, No more than two sets of parallel execution servers can run simultaneously.Only two sets of parallel execution servers need to be active to guarantee optimal inter-operation parallelism.
(並行執行度針對於Intra-operation,對於一個語句的操作,並行度可能是兩倍於Intra-operation parallelism)

3) Server ways to manage resource utilizations
限制資源的幾種方法:

1> PARALLEL_ADAPTIVE_MULTI_USER
2> User resource limits and profiles, which allow you to set limits on resource to each user
3> Database Resource Manager,which lets you allocate resource to different groups of users.


#################################################################
6 SQL Operations That Can Be Parallelized
#################################################################

6.1 Parallel Query
You can parallelize queries and subqueries in SELECT statement, as well as the query portions of DDL statements and DML statements.
However, you cannot parallelize the query portion of a DDL or DML statement if it references a remote object.
(Select,或DDL中的查詢部分,或DDL中的查詢部分都可並行,但如果DDL,DML中的查詢部分引用了遠端物件,則會自動變成序列. 如果select引用了遠端物件呢?應該可以的)

示例:
select /*+ parallel(big_table, 4) parallel(big_table_bak, 4) */ 
max(a.col1), avg(a.col1)
from big_table a, big_table_bak b
where a.col1 = b.col1
group by a.col2;

 

6.2 Parallel DDL
You can normally use parallel DDL when you use regular DDL. However, cannot be used on tables with object or LOB columns
(DDL不能用於帶物件或LOB列的表)

DDL Statements that can be parallized:

1) Create table as select ...

2) Create table IOT can be parallelized either with or without an AS Select Clause.
(索引組織表可以不帶AS Select ...)

3) Create index ..

4) Alter index rebuild ..

if table is partitioned:
5) Alter table move or [split or coalesce]

6) Alter index rebuild or [split] partitioned index.

Different parallelism is used for different operations. Parallel Create (partitioned) table as select and parallel create (partitioned) index
run with a degree of parallelism equal to the number of partitions.
(不同的操作有不同的並行度,對於分割槽表,分割槽索引,並行度一般等於分割槽數)

Parallel operations require accurate statistics to perform. optimally.
並行DDL需要準確的統計資料


示例:
Create table big_table_bak parallel as select * from big_table;

6.3 Parallel DML
You can normally use parallel DML where you use regular DML.

並行DML,除了並行物件(Table,index)本身需要設定並行屬性外,還必須顯式開啟會話的並行選項

示例:
ALTER SESSION ENABLE PARALLEL DML;

Alter table big_table_bak parallel 4;


insert into big_table_bak select * from big_table;

insert into big_table_bak select /*+ parallel(big_table 4) */ * from big_table;

 

 

6.4 SQL*Loader

SQLLOAD USERID=SCOTT/TIGER CONTROL=LOAD1.CTL DIRECT=TRUE PARALLEL=TRUE

An important point to remember is that indexes are not maintained during a parallel load.
不維護索引

具體的語法要參考<>

 

#################################################################
7 How to Make a Statement Run in Parallel
#################################################################

7.1 Parallel Query
To achieve parallelism for SQL query statements, one or more of the tables being scanned should have a parallel attribute.
(新增表的並行屬性,並行Hint select /*+ parallel(employee 4) */ from employee;)

7.2 Parallel DDL
To achieve parallelism for SQL DDL statements, the parallel clause should be specified.

7.3 Parallel DML

PDML: first
"alter session enable parallel DML;"
then any DML issued against a table with a parallel attribute will occur in parallel, if no PDML restrictions are violated.
alter table xxx paralle;
insert into xxx select * from xxx;

 


#################################################################
8 並行相關的檢視
#################################################################

SQL> show parameter parallel_max_servers

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
parallel_max_servers                 integer     7


SQL> select index_name, degree,instances from user_indexes;

INDEX_NAME                     DEGREE                                   INSTANCES
------------------------------ ---------------------------------------- ----------------------------------------
IND_PK                         10                                       1


執行
select  col1,col2,col3 from big_table where col1 in (select distinct col1 from big_table);
oracle會啟動並行程式來掃描索引.

1) v$px_process;

SQL> select * from v$px_process;

SERVER_NAME STATUS           PID SPID                SID    SERIAL#
----------- --------- ---------- ------------ ---------- ----------
P000        IN USE            21 13395                24          6
P001        IN USE            23 13399                25          1
P002        IN USE            24 13401                29          1
P003        IN USE            25 13403                26          1
P004        IN USE            26 13405                27          1
P005        IN USE            27 13407                28          1
P006        IN USE            28 13409                30          1
P007        AVAILABLE         29 13411                  

8 rows selected

起了7個並行程式.


2) v$px_session
SQL> select * from v$px_session;

SADDR                   SID    SERIAL#      QCSID  QCSERIAL#  QCINST_ID SERVER_GROUP SERVER_SET    SERVER#     DEGREE REQ_DEGREE
---------------- ---------- ---------- ---------- ---------- ---------- ------------ ---------- ---------- ---------- ----------
000000005B2D73B0         24          6         23         14          1            1          1          1         14         20
000000005B2D6920         23         14         23                                                                    
000000005B2D7E40         25          1         23         14          1            1          1          2         14         20
000000005B2DA880         29          1         23         14          1            1          1          3         14         20
000000005B2D88D0         26          1         23         14          1            1          1          4         14         20
000000005B2D9360         27          1         23         14          1            1          1          5         14         20
000000005B2D9DF0         28          1         23         14          1            1          1          6         14         20
000000005B2DB310         30          1         23         14          1            1          1          7         14         20

8 rows selected

起了7個並行程式.
000000005B2D6920         23         14         23
這是Coordinator Process.

#################################################################
9 並行執行步驟的執行計劃
#################################################################

這些都是什麼意思呢?


PARALLEL_TO_SERIAL: Parallel execution; output of step is returned to serial "query coordinator" process.
什麼意思呢?

blank Serial execution.

SERIAL_FROM_REMOTE (S -> R)
Serial from remote Serial execution at a remote site.

SERIAL_TO_PARALLEL (S -> P)
Serial to parallel Serial execution; output of step is partitioned or broadcast to parallel execution servers.

PARALLEL_TO_PARALLEL (P -> P)
Parallel to parallel Parallel execution; output of step is repartitioned to second set of parallel execution servers.

PARALLEL_TO_SERIAL (P -> S)
Parallel to serial Parallel execution; output of step is returned to serial "query coordinator" process.

PARALLEL_COMBINED_WITH_PARENT (PWP)
Parallel combined with parent Parallel execution; output of step goes to next step in same parallel process. No interprocess communication to parent.

PARALLEL_COMBINED_WITH_CHILD (PWC)
Parallel combined with child Parallel execution; input of step comes from prior step in same parallel process. No interprocess communication from child.

 

#################################################################
10  Hints for Parallel Execution 並行執行的Hint
#################################################################


1) PARALLEL
Select /*+ PARALLEL(table_alias, degree,rac_option) */ xxx from xxx;

示例:
SELECT /*+ FULL(hr_emp) PARALLEL(hr_emp, 5) */ last_name FROM hr.employees hr_emp;

SELECT /*+ FULL(hr_emp) PARALLEL(hr_emp, DEFAULT,DEFAULT) */ last_name FROM hr.employees hr_emp;

第二個Default Specifies that the table should be split among all of the  available instance, with the of parallelism on each instance.
(RAC選項)


2) NOPARALLEL

示例:
SELECT /*+ NOPARALLEL(hr_emp) */ last_name FROM hr.employees hr_emp;

3) PQ_DISTRIBUTE
The PQ_DISTRIBUTE hint improves the performance of parallel join operations.

4) PARALLEL_INDEX
The PARALLEL_INDEX hint specifies the desired number of concurrent servers that can be used to parallelize index range scans for partitioned indexes.

SELECT /*+ PARALLEL_INDEX(table1, index1, 3, 2) +/ * from xxxx;

table1 -- table name or alias name of table
index1 -- index name which an index scan is to be performed.
3 -- Degree of parallelism
2 -- Two nodes of RAC

5) NOPARALLEL_INDEX

The NOPARALLEL_INDEX hint overrides a PARALLEL attribute setting on an index to avoid a parallel index scan operation.

 


#################################################################
 問題
#################################################################

1 一旦啟動的並行程式,什麼時候會關閉?
Session退出後?
不會killed.仍會保留供其它並行使用。

2 收集ITPub.net上的並行相關的帖子,看有沒其它的注意事項?

3 建議使用Parallel Hint而不是修改物件的Parallel屬性來使用並行執行。


參考文件
1 <> Chapter 20

2 <>for more information about parallel

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/10248702/viewspace-669510/,如需轉載,請註明出處,否則將追究法律責任。

相關文章