原文地址：http://mikixiyou.iteye.com/blog/1673672

在plsql開發中，會涉及到一些大資料量表的資料處理，如將某記錄數超億的表的記錄經過處理轉換插入到另外一張或幾張表。

常規的操作方法固然可以實現，但時間、磁碟IO、redo日誌等等都非常大。Oracle 提供了一種高階函式，可以將這種資料處理的效能提升到極限。這種函式稱為管道函式。

在實際專案中，管道函式會和表函式、資料流函式（即表函式和CURSOR結合）、資料集合、並行度一起使用，達到大資料處理的效能頂峰。

下面是一個例子，將表t_ss_normal的記錄插入到表t_target中，插入過程中有部分轉換操作。

我分成四個方法來實現這個資料處理操作。

第一個方法，也是最常規的方法，程式碼如下：

create table T_SS_NORMAL

(

owner VARCHAR2(30),

object_name VARCHAR2(128),

subobject_name VARCHAR2(30),

object_id NUMBER,

data_object_id NUMBER,

object_type VARCHAR2(19),

created DATE,

last_ddl_time DATE,

timestamp VARCHAR2(19),

status VARCHAR2(7),

temporary VARCHAR2(1),

generated VARCHAR2(1),

secondary VARCHAR2(1)

);

/

create table T_TARGET

(

owner VARCHAR2(30),

object_name VARCHAR2(128),

comm VARCHAR2(10)

);

這是源表和目標表的表結構。現在源表有200W條，其資料來自dba_objects檢視。

create or replace package pkg_test is

procedure load_target_normal;

end pkg_test;

create or replace package body pkg_test is

procedure load_target_normal is

begin

insert into t_target (owner, object_name, comm)

select owner, object_name, 'xxx' from t_ss_normal;

commit;

end;

begin

null;

end pkg_test;

一個insert into select語句搞定這個資料處理，簡單。

第二方法，採用管道函式實現這個資料處理。

create type obj_target as object(

owner VARCHAR2(30), object_name VARCHAR2(128), comm varchar2(10)

);

/

create or replace type typ_array_target as table of obj_target;

/

create or replace package pkg_test is

function pipe_target(p_source_data in sys_refcursor) return typ_array_target

pipelined;

procedure load_target;

end pkg_test;

首先建立兩個自定義的型別。obj_target的定義和t_target的表結構一致，用於儲存每一條目標表記錄。typ_array_target用於管道函式的返回值。

接著定義一個管道函式。

普通函式的結尾加一個pipelined關鍵字，就是管道函式。這個函式的返回引數型別為集合，這是為了使其能作為表函式使用。表函式就是在from子句中以table(v_resultset)呼叫的，v_resultset就是一個集合型別的引數。

最後定義一個呼叫儲存過程。

在包體中定義該管道函式和呼叫儲存過程。管道函式pipe_target的傳入引數一個sys_refcursor型別。這是一個遊標，可以理解為使用select * from table才能得到的結果集。

你也可以不用這個傳入的遊標，取而代之，在函式中定義一個遊標，也一樣使用。

function pipe_target(p_source_data in sys_refcursor) return typ_array_target

pipelined is

r_target_data obj_target := obj_target(null, null, null);

r_source_data t_ss%rowtype;

begin

loop

fetch p_source_data

into r_source_data;

exit when p_source_data%notfound;

r_target_data.owner := r_source_data.owner;

r_target_data.object_name := r_source_data.object_name;

r_target_data.comm := 'xxx';

pipe row(r_target_data);

end loop;

close p_source_data;

return;

end;

procedure load_target is

begin

insert into t_target

(owner, object_name, comm)

select owner, object_name, comm

from table(pipe_target(cursor(select * from t_ss_normal)));

commit;

end;

關鍵字 pipe row 的作用是將obj_target插入到typ_array_target型別的陣列中，管道函式自動返回這些資料。

因為源表的資料量會非常大，所以在fetch取值時會使用bulk collect ，實現批次取值。這樣做可以減少plsql引擎和sql引擎的控制轉換次數。這種轉換稱為上下文切換。

function pipe_target_array(p_source_data in sys_refcursor,

p_limit_size in pls_integer default c_default_limit)

return typ_array_target

pipelined is

r_target_data obj_target := obj_target(null, null, null);

type typ_source_data is table of t_ss%rowtype index by pls_integer;

aa_source_data typ_source_data;

begin

loop

fetch p_source_data bulk collect

into aa_source_data;

exit when aa_source_data.count = 0;

for i in 1 .. aa_source_data.count loop

r_target_data.owner := aa_source_data(i).owner;

r_target_data.object_name := aa_source_data(i).object_name;

r_target_data.comm := 'xxx';

pipe row(r_target_data);

end loop;

close p_source_data;

return;

end;

procedure load_target_array is

begin

insert into t_target

(owner, object_name, comm)

select owner, object_name, comm

from table(pipe_target_array(cursor (select * from t_ss_normal),

100));

commit;

end;

還可以使用並行度，使得管道函式可以多程式同時執行。並行度還有一個好處，就是將資料插入方式從常規路徑轉換為直接路徑。直接路徑可以大量減少redo日誌的生成量。

function pipe_target_parallel(p_source_data in sys_refcursor,

p_limit_size in pls_integer default c_default_limit)

return typ_array_target

pipelined

parallel_enable(partition p_source_data by any) is

r_target_data obj_target := obj_target(null, null, null);

type typ_source_data is table of t_ss%rowtype index by pls_integer;

aa_source_data typ_source_data;

begin

loop

fetch p_source_data bulk collect

into aa_source_data;

exit when aa_source_data.count = 0;

for i in 1 .. aa_source_data.count loop

r_target_data.owner := aa_source_data(i).owner;

r_target_data.object_name := aa_source_data(i).object_name;

r_target_data.comm := 'xxx';

pipe row(r_target_data);

end loop;

close p_source_data;

return;

end;

procedure load_target_parallel is

begin

execute immediate 'alter session enable parallel dml';

insert /*+parallel(t,4)*/

into t_target t

(owner, object_name, comm)

select owner, object_name, comm

from table(pipe_target_array(cursor (select /*+parallel(s,4)*/

*

from t_ss_normal s),

100));

commit;

end;

在測試過程中，我測試200W記錄的操作，時間從24秒降到到8秒，重做日誌也降低更多。

Oracle的pipelined函式實現高效能大資料處理

相關文章