Documentation Notes -- Oracle Data Pump 2

Posted by zecaro on 2010-12-29

These are just notes jotted down while reading the documentation; usually I only cover the overview chapter.

Oracle® Database Utilities
10g Release 2 (10.2)

Part Number B14215-01

What Happens During Execution of a Data Pump Job?

Data Pump jobs use a master table, a master process, and worker processes to perform the work and keep track of progress.

【A Data Pump job uses a master table, a master process, and worker processes to perform the work and keep track of its progress.】

Coordination of a Job

For every Data Pump Export job and Data Pump Import job, a master process is created. The master process controls the entire job, including communicating with the clients, creating and controlling a pool of worker processes, and performing logging operations.

【For every Data Pump job, a master process is created to control the entire job, including communicating with clients, creating and controlling a pool of worker processes, and performing logging.】

Tracking Progress Within a Job

While the data and metadata are being transferred, a master table is used to track the progress within a job. The master table is implemented as a user table within the database. The specific function of the master table for export and import jobs is as follows:

【The master table is implemented as a user table within the database; its functions are as follows:】

  • For export jobs, the master table records the location of database objects within a dump file set. Export builds and maintains the master table for the duration of the job. At the end of an export job, the content of the master table is written to a file in the dump file set.

    【During export, the master table records the locations of database objects within the dump file set. Export builds and maintains the master table throughout the job, and at the end of the export it is written to the dump file set.】

  • For import jobs, the master table is loaded from the dump file set and is used to control the sequence of operations for locating objects that need to be imported into the target database.

    【During import, the master table is loaded from the dump file set and used to control the sequence of operations for locating the objects that need to be imported into the target database.】

The master table is created in the schema of the current user performing the export or import operation. Therefore, that user must have sufficient tablespace quota for its creation. The name of the master table is the same as the name of the job that created it. Therefore, you cannot explicitly give a Data Pump job the same name as a preexisting table or view.

【The master table is created in the schema of the user performing the export or import, so that user must have sufficient tablespace quota. The master table has the same name as the job that created it, so you cannot give a Data Pump job the same name as an existing table or view.】

For all operations, the information in the master table is used to restart a job.

【The information in the master table is used to restart a job.】

The master table is either retained or dropped, depending on the circumstances, as follows:

【The master table is either retained or dropped, depending on the following circumstances:】

  • Upon successful job completion, the master table is dropped.
  • If a job is stopped using the STOP_JOB interactive command, the master table is retained for use in restarting the job (see the example after this list).
  • If a job is killed using the KILL_JOB interactive command, the master table is dropped and the job cannot be restarted.
  • If a job terminates unexpectedly, the master table is retained. You can delete it if you do not intend to restart the job.
  • If a job stops before it starts running (that is, it is in the Defining state), the master table is dropped.
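
As a quick sketch of the restart behavior described in this list (the job name hr_exp1, the schema, and the directory object are hypothetical), a job stopped with STOP_JOB can later be re-attached and restarted because its master table was retained:

> expdp hr/hr SCHEMAS=hr DIRECTORY=dpump_dir1 DUMPFILE=hr.dmp JOB_NAME=hr_exp1
Export> STOP_JOB=IMMEDIATE

> expdp hr/hr ATTACH=hr_exp1
Export> START_JOB
Export> CONTINUE_CLIENT

Had KILL_JOB been issued instead of STOP_JOB, the master table would have been dropped and ATTACH/START_JOB would not be possible.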


Filtering Data and Metadata During a Job

Within the master table, specific objects are assigned attributes such as name or owning schema. Objects also belong to a class of objects (such as TABLE,INDEX, or DIRECTORY). The class of an object is called its object type. You can use the EXCLUDE and INCLUDE parameters to restrict the types of objects that are exported and imported. The objects can be based upon the name of the object or the name of the schema that owns the object. You can also specify data-specific filters to restrict the rows that are exported and imported.

【Use the EXCLUDE and INCLUDE parameters to restrict which objects are exported and imported. You can also apply data-specific filters to choose which rows are exported or imported.】
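
For example (a minimal sketch; the schema, file, and directory names are hypothetical), the metadata filters and a row filter could be collected in a parameter file, say filters.par, which also avoids operating-system quoting problems with the QUERY clause:

SCHEMAS=hr
DIRECTORY=dpump_dir1
DUMPFILE=hr_filtered.dmp
EXCLUDE=INDEX
EXCLUDE=STATISTICS
QUERY=hr.employees:"WHERE department_id > 50"

> expdp hr/hr PARFILE=filters.par

This exports the HR schema without indexes or statistics, and exports only the employees rows whose department_id is greater than 50.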



Transforming Metadata During a Job

When you are moving data from one database to another, it is often useful to perform transformations on the metadata for remapping storage between tablespaces or redefining the owner of a particular set of objects. This is done using the following Data Pump Import parameters: REMAP_DATAFILE, REMAP_SCHEMA, REMAP_TABLESPACE, and TRANSFORM.

【Use the parameters above to transform metadata, for example to remap storage between tablespaces or to redefine the owner of a particular set of objects.】
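
For example (a sketch; the target schema and tablespace names are hypothetical), an import could change both the owning schema and the tablespace of the objects as they are loaded:

> impdp system/password DIRECTORY=dpump_dir1 DUMPFILE=hr.dmp REMAP_SCHEMA=hr:hr_test REMAP_TABLESPACE=users:users_test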


Maximizing Job Performance

To improve throughput of a job, you can use the PARALLEL parameter to set a degree of parallelism that takes maximum advantage of current conditions. For example, to limit the effect of a job on a production system, the database administrator (DBA) might wish to restrict the parallelism. The degree of parallelism can be reset at any time during a job. For example, PARALLEL could be set to 2 during production hours to restrict a particular job to only two degrees of parallelism, and during nonproduction hours it could be reset to 8. The parallelism setting is enforced by the master process, which allocates work to be executed to worker processes that perform the data and metadata processing within an operation. These worker processes operate in parallel. In general, the degree of parallelism should be set to no more than twice the number of CPUs on an instance.

【To improve throughput, use the PARALLEL parameter; it can be adjusted at any time while the job is running.】
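
For example (a sketch; the job name and directory object are hypothetical), a full export could be started with a low degree of parallelism during production hours:

> expdp system/password FULL=Y DIRECTORY=dpump_dir1 DUMPFILE=full%U.dmp PARALLEL=2 JOB_NAME=nightly_full

Later, during nonproduction hours, you could attach to the job and raise the degree of parallelism from interactive-command mode:

> expdp system/password ATTACH=nightly_full
Export> PARALLEL=8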


Note:

The ability to adjust the degree of parallelism is available only in the Enterprise Edition of Oracle Database.

【The PARALLEL parameter is available only in the Enterprise Edition.】


Loading and Unloading of Data

The worker processes are the ones that actually unload and load metadata and table data in parallel. Worker processes are created as needed until the number of worker processes is equal to the value supplied for the PARALLEL command-line parameter. The number of active worker processes can be reset throughout the life of a job.

【The number of worker processes can be adjusted throughout the life of a job.】


Note:
The value of PARALLEL is restricted to 1 in the Standard Edition of Oracle Database 10g.



When a worker process is assigned the task of loading or unloading a very large table or partition, it may choose to use the external tables access method to make maximum use of parallel execution. In such a case, the worker process becomes a parallel execution coordinator. The actual loading and unloading work is divided among some number of parallel I/O execution processes (sometimes called slaves) allocated from the instancewide pool of parallel I/O execution processes.

Monitoring Job Status

【Job status can be viewed in interactive-command mode (which gives more detail on request) or in logging mode. A log file can also be generated, and status can additionally be queried through the related data dictionary views.】

The Data Pump Export and Import utilities can be attached to a job in either interactive-command mode or logging mode. In logging mode, real-time detailed status about the job is automatically displayed during job execution. The information displayed can include the job and parameter descriptions, an estimate of the amount of data to be exported, a description of the current operation or item being processed, files used during the job, any errors encountered, and the final job state (Stopped or Completed).



Job status can be displayed on request in interactive-command mode. The information displayed can include the job description and state, a description of the current operation or item being processed, files being written, and a cumulative status.
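
For example (a sketch; the job name is hypothetical), you can attach to a running job, request its status on demand, and then return to logging mode:

> expdp hr/hr ATTACH=hr_exp1
Export> STATUS
Export> CONTINUE_CLIENT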



A log file can also be optionally written during the execution of a job. The log file summarizes the progress of the job, lists any errors that were encountered along the way, and records the completion status of the job.

An alternative way to determine job status or to get other information about Data Pump jobs is to query the DBA_DATAPUMP_JOBS, USER_DATAPUMP_JOBS, or DBA_DATAPUMP_SESSIONS views.
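
A minimal sketch of such a query (the column selection is illustrative):

SQL> SELECT owner_name, job_name, operation, job_mode, state
  2  FROM dba_datapump_jobs;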

Monitoring the Progress of Executing Jobs

Data Pump operations that transfer table data (export and import) maintain an entry in the  V$SESSION_LONGOPS  dynamic performance view indicating the job progress (in megabytes of table data transferred). The entry contains the estimated transfer size and is periodically updated to reflect the actual amount of data transferred.

Note:

The usefulness of the estimate value for export operations depends on the type of estimation requested when the operation was initiated, and it is updated as required if exceeded by the actual transfer amount. The estimate value for import operations is exact.

The V$SESSION_LONGOPS columns that are relevant to a Data Pump job are as follows:

USERNAME - job owner

OPNAME - job name

TARGET_DESC - job operation

SOFAR - megabytes (MB) transferred thus far during the job

TOTALWORK - estimated number of megabytes (MB) in the job

UNITS - 'MB'

MESSAGE - a formatted status message of the form:

'job_name: operation_name : nnn out of mmm MB done'
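
Putting these columns together, a monitoring query might look like the following sketch (the OPNAME filter assumes the job was named HR_EXP1, and the last condition hides completed operations):

SQL> SELECT username, opname, target_desc, sofar, totalwork, message
  2  FROM v$session_longops
  3  WHERE opname = 'HR_EXP1' AND sofar <> totalwork;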

 

File Allocation

There are three types of files managed by Data Pump jobs:

  • Dump files  to contain the data and metadata that is being moved

  • Log files  to record the messages associated with an operation

  • SQL files to record the output of a SQLFILE operation. A SQLFILE operation is invoked using the Data Pump Import SQLFILE parameter and results in all of the SQL DDL that Import would otherwise execute, based on the other parameters, being written to a SQL file.

An understanding of how Data Pump allocates and handles these files will help you to use Export and Import to their fullest advantage.

Specifying Files and Adding Additional Dump Files

For export operations, you can specify dump files at the time the job is defined, as well as at a later time during the operation. For example, if you discover that space is running low during an export operation, you can add additional dump files by using the Data Pump Export  ADD_FILE  command in interactive mode.

【If you find that space is running low during an export, you can add more dump files with the ADD_FILE command in interactive mode.】
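
For example (a sketch; the file and directory names are hypothetical), from interactive-command mode you could add a dump file in the current directory object, or in a different one by prefixing the directory object name:

Export> ADD_FILE=hr2.dmp
Export> ADD_FILE=dpump_dir2:hr3.dmp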

For import operations, all dump files  must  be specified at the time the job is defined.

Log files and SQL files will overwrite previously existing files. Dump files will never overwrite previously existing files; instead, an error is generated.

Default Locations for Dump, Log, and SQL Files

Because Data Pump is  server-based, rather than client-based, dump files, log files, and SQL files are accessed relative to server-based directory paths. Data Pump requires you to specify directory paths as directory objects. A directory object maps a name to a directory path on the file system.

For example, the following SQL statement creates a directory object named dpump_dir1 that is mapped to a directory located at /usr/apps/datafiles.

SQL> CREATE DIRECTORY dpump_dir1 AS '/usr/apps/datafiles';

The reason that a directory object is required is to ensure data security and integrity. For example:

  • If you were allowed to specify a directory path location for an input file, you might be able to read data that the server has access to but that you should not be able to see.

  • If you were allowed to specify a directory path location for an output file, the server might overwrite a file that you might not normally have privileges to delete.

On Unix and Windows NT systems, a default directory object,  DATA_PUMP_DIR,  is created at database creation or whenever the database dictionary is upgraded. By default, it is available only to privileged users.

If you are not a privileged user, before you can run Data Pump Export or Data Pump Import, a directory object must be created  by a database administrator (DBA) or by any user with the CREATE ANY DIRECTORY privilege.

After a directory is created, the user creating the directory object needs to  grant READ or WRITE permission on the directory to other users.  For example, to allow the Oracle database to read and write files on behalf of user hr in the directory named by dpump_dir1, the DBA must execute the following command:

SQL> GRANT READ, WRITE ON DIRECTORY dpump_dir1 TO hr;

Note that READ or WRITE permission to a directory object only means that the Oracle database will read or write that file on your behalf. You are not given direct access to those files outside of the Oracle database unless you have the appropriate operating system privileges. Similarly, the Oracle database requires permission from the operating system to read and write files in the directories.

Data Pump Export and Import use the following order of precedence to determine a file's location:

【The order of precedence for determining a file's location. Note that DATA_PUMP_DIR in items 3 and 4 below refers to two different things: a client-side environment variable and a default directory object inside the database.】

  1. If a directory object is specified as part of the file specification, then the location specified by that directory object is used. (The directory object must be separated from the filename by a colon.)

  2. If a directory object is not specified for a file, then the directory object named by the  DIRECTORY parameter  is used.

  3. If a directory object is not specified, and if no directory object was named by the  DIRECTORY parameter, then the value of the  environment variable, DATA_PUMP_DIR, is used. This environment variable is defined using operating system commands on the client system where the Data Pump Export and Import utilities are run. The value assigned to this client-based environment variable must be the name of a server-based directory object, which must first be created on the server system by a DBA. For example, the following SQL statement creates a directory object on the server system. The name of the directory object is DUMP_FILES1, and it is located at '/usr/apps/dumpfiles1'.

    SQL> CREATE DIRECTORY DUMP_FILES1 AS '/usr/apps/dumpfiles1';
    
    

    Then, a user on a UNIX-based client system using csh can assign the value DUMP_FILES1 to the environment variable DATA_PUMP_DIR. The DIRECTORY parameter can then be omitted from the command line. The dump file employees.dmp, as well as the log file export.log, will be written to '/usr/apps/dumpfiles1'.

    %setenv DATA_PUMP_DIR DUMP_FILES1
    %expdp hr/hr TABLES=employees DUMPFILE=employees.dmp
    
    
  4. If none of the previous three conditions yields a directory object and you are a privileged user, then Data Pump attempts to use the value of the default server-based directory object, DATA_PUMP_DIR. This directory object is automatically created at database creation or when the database dictionary is upgraded. You can use the following SQL query to see the path definition for DATA_PUMP_DIR:

    SQL> SELECT directory_name, directory_path FROM dba_directories
    2 WHERE directory_name='DATA_PUMP_DIR';
    
    

    If you are not a privileged user, access to the DATA_PUMP_DIR directory object must have previously been granted to you by a DBA.

    Do not confuse the default DATA_PUMP_DIR directory object with the client-based environment variable of the same name.

Using Directory Objects When Automatic Storage Management Is Enabled

If you use Data Pump Export or Import with Automatic Storage Management (ASM) enabled, you must define the directory object used for the dump file so that the ASM disk-group name is used (instead of an operating system directory path). A separate directory object, which points to an operating system directory path, should be used for the log file. For example, you would create a directory object for the ASM dump file as follows:

SQL> CREATE or REPLACE DIRECTORY dpump_dir as '+DATAFILES/';

Then you would create a separate directory object for the log file:

SQL> CREATE or REPLACE DIRECTORY dpump_log as '/homedir/user1/';

To enable user hr to have access to these directory objects, you would assign the necessary privileges, for example:

SQL> GRANT READ, WRITE ON DIRECTORY dpump_dir TO hr;
SQL> GRANT READ, WRITE ON DIRECTORY dpump_log TO hr;

You would then use the following Data Pump Export command:

> expdp hr/hr DIRECTORY=dpump_dir DUMPFILE=hr.dmp LOGFILE=dpump_log:hr.log


Setting Parallelism

For export and import operations, the parallelism setting (specified with the PARALLEL parameter) should be less than or equal to the number of dump files in the dump file set. If there are not enough dump files, the performance will not be optimal because multiple threads of execution will be trying to access the same dump file.

【The PARALLEL parameter should be less than or equal to the number of dump files.】

The PARALLEL parameter is valid only in the Enterprise Edition of Oracle Database 10g.

Using Substitution Variables

Instead of, or in addition to, listing specific filenames, you can use the DUMPFILE parameter during export operations to specify multiple dump files, by using a  substitution variable (%U) in the filename. This is called a dump file template. The new dump files are created as they are needed,  beginning with 01 for %U, then using 02, 03, and so on.  Enough dump files are created to allow all processes specified by the current setting of the PARALLEL parameter to be active. If one of the dump files becomes full because its size has reached the maximum size specified by the FILESIZE parameter, it is closed, and a new dump file (with a new generated name) is created to take its place.

【Substitution variables; examples follow.】

If multiple dump file templates are provided, they are used to generate dump files in a round-robin fashion. For example, if expa%U, expb%U, and expc%U were all specified for a job having a parallelism of 6, the initial dump files created would be expa01.dmp, expb01.dmp, expc01.dmp, expa02.dmp, expb02.dmp, and expc02.dmp.

For  import  and  SQLFILE  operations, if dump file specifications expa%U, expb%U, and expc%U are specified, then the operation will begin by attempting to open the dump files expa01.dmp,expb01.dmp, and expc01.dmp. If the dump file containing the master table is not found in this set, the operation expands its search for dump files by  incrementing the substitution variable  and looking up the new filenames (for example, expa02.dmp, expb02.dmp, and expc02.dmp). The search continues  until the dump file containing the master table is located. If a dump file does not exist, the operation stops incrementing the substitution variable for the dump file specification that was in error. For example, if expb01.dmp and expb02.dmp are found but expb03.dmp is not found, then no more files are searched for using the expb%U specification. Once the master table is found, it is used to determine whether all dump files in the dump file set have been located.
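
For example (a sketch; the names are hypothetical), the following export uses a single dump file template together with PARALLEL and FILESIZE, so that files exp01.dmp, exp02.dmp, and so on are created as they are needed:

> expdp hr/hr SCHEMAS=hr DIRECTORY=dpump_dir1 DUMPFILE=exp%U.dmp FILESIZE=2G PARALLEL=4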

Moving Data Between Different Database Versions

Because most Data Pump operations are performed on the server side, if you are using any version of the database other than COMPATIBLE, you must provide the server with specific version information. Otherwise, errors may occur. To specify version information, use the VERSION parameter.
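
For example (a sketch; the names are hypothetical), to create a dump file set on a 10.2 database that can be imported into a 10.1 database:

> expdp hr/hr SCHEMAS=hr DIRECTORY=dpump_dir1 DUMPFILE=hr_101.dmp VERSION=10.1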

Keep the following information in mind when you are using Data Pump Export and Import to move data between different database versions:

【Things to keep in mind when moving data between different database versions:】

  • If you specify a database version that is older than the current database version, certain features may be unavailable. For example, specifying VERSION=10.1 will cause an error if data compression is also specified for the job because compression was not supported in 10.1.

  • On a Data Pump  export, if you  specify a database version that is older than the current database version, then a dump file set is created that  you can import into that older version of the database. However, the dump file set will  not  contain any objects that the older database version does not support. For example, if you export from a version 10.2 database to a version 10.1 database, comments on indextypes will not be exported into the dump file set.

  • Data Pump  Import  can  always read  dump file sets created by  older versions  of the database.

  • Data Pump Import cannot read dump file sets created by a database version that is newer than the current database version, unless those dump file sets were created with the version parameter set to the version of the target database. Therefore, the best way to perform a downgrade is to perform your Data Pump export with the VERSION parameter set to the version of the target database.

  • When operating across a network link, Data Pump requires that the remote database version be either the same as the local database or one version older, at the most. For example, if the local database is version 10.2, the remote database must be either version 10.1 or 10.2. If the local database is version 10.1, then 10.1 is the only version supported for the remote database.

Original Export and Import Versus Data Pump Export and Import

【Comparison of the original utilities with Data Pump.】

If you are familiar with the original Export (exp) and Import (imp) utilities, it is important to understand that many of the concepts behind them  do not apply to  Data Pump Export (expdp) and Data Pump Import (impdp). In particular:

  • Data Pump Export and Import operate  on a group of files  called a dump file set rather than on a  single sequential  dump file.

  • Data Pump Export and Import access files  on the server  rather than  on the client. This results in improved performance. It also means that directory objects are required when you specify file locations.

  • The Data Pump Export and Import modes operate symmetrically, whereas original export and import did not always exhibit this behavior.

    【So original export and import were not always symmetric?】

    For example, suppose you perform an export with FULL=Y, followed by an import using SCHEMAS=HR. This will produce the same results as if you performed an export with SCHEMAS=HR, followed by an import with FULL=Y (see the sketch after this list).

  • Data Pump Export and Import use parallel execution rather than a single stream of execution, for improved performance. This means that the order of data within dump file sets and the information in the log files is more variable.

  • Data Pump Export and Import represent metadata in the dump file set as XML documents rather than as DDL commands. This provides improved flexibility for transforming the metadata at import time.

  • Data Pump Export and Import are  self-tuning utilities. Tuning parameters that were used in original Export and Import, such as BUFFER and RECORDLENGTH, are neither required nor supported by Data Pump Export and Import.

    【Self-tuning, so parameters such as BUFFER and RECORDLENGTH are neither needed nor supported.】

  • At import time there is no option to perform interim commits during the restoration of a partition. This was provided by the COMMIT parameter in original Import.

  • There is  no option to merge extents when you re-create tables. In original Import, this was provided by the COMPRESS parameter. Instead, extents are reallocated according to storage parameters for the target table.

  • Sequential media, such as tapes and pipes, are  not supported.

  • The Data Pump method for moving data between different database versions is different from the method used by original Export/Import. With original Export, you had to run an older version of Export (exp) to produce a dump file that was compatible with an older database version. With Data Pump, you can use the current Export (expdp) version and simply use the VERSION parameter to specify the target database version.

  • When you are importing data into an existing table using either APPEND or TRUNCATE, if any row violates an active constraint, the load is discontinued and no data is loaded. This is different from original Import, which logs any rows that are in violation and continues with the load.

  • Data Pump Export and Import  consume more undo tablespace  than original Export and Import. This is due to additional metadata queries during export and some relatively long-running master table queries during import. As a result, for databases with large amounts of metadata, you may receive an ORA-01555: snapshot too old error. To avoid this, consider adding additional undo tablespace or increasing the value of the UNDO_RETENTION initialization parameter for the database.

  • If a table has compression enabled, Data Pump Import attempts to compress the data being loaded. In contrast, the original Import utility loaded data in such a way that, even if a table had compression enabled, the data was not compressed upon import.

  • Data Pump supports character set conversion for both direct path and external tables. Most of the restrictions that exist for character set conversions in the original Import utility do not apply to Data Pump. The one case in which character set conversion is not supported under Data Pump is when using transportable tablespaces.
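
As a sketch of the mode symmetry mentioned in the list above (directory, file, and credential values are hypothetical), the following two sequences load the same HR objects into the target database:

> expdp system/password FULL=Y DIRECTORY=dpump_dir1 DUMPFILE=full%U.dmp
> impdp system/password SCHEMAS=hr DIRECTORY=dpump_dir1 DUMPFILE=full%U.dmp

> expdp system/password SCHEMAS=hr DIRECTORY=dpump_dir1 DUMPFILE=hr.dmp
> impdp system/password FULL=Y DIRECTORY=dpump_dir1 DUMPFILE=hr.dmp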


From the ITPUB blog. Link: http://blog.itpub.net/23650854/viewspace-682835/. If you wish to republish, please credit the source; otherwise legal liability may be pursued.
