使用CSSCAN 工具在源庫檢查資料。
指定掃描的schame，只能指定一個schame
$ csscan \"sys/@ as sysdba\" LOG=/tmp/expcheck USER=scott CAPTURE=Y TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=3
指定掃描的表，在unix環境下當指定多個表時，' 是必須的，否則會報錯：0403-057 Syntax error at line 21 : `(' is not expected. ):
$ csscan \"sys/@ as sysdba\" LOG=/tmp/expcheck TABLE='(SCOTT.DEPT,SCOTT.EMP)' CAPTURE=Y TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=2
在windows環境下，' 可以省略！
c:\>csscan \"sys/@ as sysdba\" LOG=c:\temp\expcheck TABLE=(SCOTT.DEPT,SCOTT.EMP) CAPTURE=Y TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=2
如果你有很多表要掃描，或許要避免不同os環境下語法的麻煩。就可以使用引數檔案。
Example contents of csscan.par:
LOG=/tmp/expcheck
TABLE=(SCOTT.DEPT,SCOTT.EMP)
FULL=N
CAPTURE=Y
TOCHAR=AL32UTF8
PROCESS=6
ARRAY=1024000

使用引數檔案執行：
$ csscan \"sys/@ as sysdba\" PARFILE=/tmp/csscan.par
or
C:\>csscan \"sys/@ as sysdba\" PARFILE=c:\temp\csscan.par

Csscan will create 3 files :
expcheck.out csscan 輸出的日誌 a log of the output of csscan
expcheck.txt 資料庫掃描概要the Database Scan Summary Report
expcheck.err 掃描時出錯的記錄，包含了出錯的表，欄位（contains the rowid's of the Convertible , Truncation and Lossy rows reported in expcheck.txt）
下面是一些注意點：
You can only specify on user at the time, if you are exporting the majority of a database then use FULL=Y
$ csscan \"sys/@ as sysdba\" FULL=Y TOCHAR=AL32UTF8 LOG=expcheck CAPTURE=Y ARRAY=1000000 PROCESS=2
* Always run Csscan connecting with a 'sysdba' connection/user, do not use the "system" or "csmig" user.
* The PROCESS= parameter influences the load on your system, the higher this is (6 or 8 for example) the faster Csscan will be done, the lower this is the less impact it will have on your system. Adapt if needed.
* Do not specify the TONCHAR or FROMNCHAR csscan parameters , those are to change the NLS_NCHAR_CHARACTERSET. Again they are not needed and should not be specified.
* The csscan SUPPRESS parameter limits the size of the .err file by limiting the amount of information logged / table. Using SUPPRESS=1000 will log max 1000 rows for each table in the .err file. It will not affect the information in the .txt file. It WILL affect the data logged in the .err file. This is mainly useful for the first scan of big databases, if you have no idea how much "Convertible" or "Lossy" there is in a database then this will avoid that the .err file becomes 100's of MB big and it limits also the space used by the csscan tables under the Csmig schema.
note that to have correct result of the following select you should NOT use the Csscan SUPPRESS option.

Once Csscan has been run you then need to check the .txt file

If there is any "Lossy" data - this is data that CANNOT be converted to the new NLS_CHARACTERSET
- you need to check further and see why this data is "Lossy", discussing "Lossy" is outside the scope of this note。
This select will give all the lossy objects found in the last Cssan run:
conn / AS sysdba
SELECT DISTINCT z.owner_name
|| '.'
|| z.table_name
|| '('
|| z.column_name
|| ') - '
|| z.column_type
|| ' ' LossyColumns
FROM csmig.csmv$errors z
WHERE z.error_type ='DATA_LOSS'
ORDER BY LossyColumns
/

For solving the ORA-01401 / ORA-12899 error you need to know which columns are logged as "Truncation" and what the new size of the data will be after import.

You can find that in the expcheck.err file as "Max Post Conversion Data Size"
For example, check in the expcheck.txt file wich table has "Truncation", let's assume you have there a row that say's:

-- snip from expcheck.txt
[Distribution of Convertible, Truncated and Lossy Data by Table]

USER.TABLE Convertible Truncation Lossy
--------------------- ---------------- ---------------- ----------------
...
SCOTT.TESTUTF8 69 6 0
...

then look in the expcheck.err file for "TESTUTF8" until the "Max Post Conversion Data Size" is bigger then the column size for that table.

-- snip from expcheck.err
User : SCOTT
Table : TESTUTF8
Column: ITEM_NAME
Type : VARCHAR2(80)
Number of Exceptions : 6
Max Post Conversion Data Size: 81
the max size after going to AL32UTF8 will be 81 bytes for this column.

Or you can use this select to have a list of the columns that have "Truncation" and the new size in bytes that is minimally needed:

conn / AS sysdba
SET serveroutput ON
DECLARE
newmaxsz NUMBER;
BEGIN
FOR rec IN
( SELECT DISTINCT u.owner_name,
u.table_name,
u.column_name ,
u.column_type,
u.owner_id,
u.table_id,
u.column_id,
u.column_intid
FROM csmv$errors u
WHERE u.error_type='EXCEED_SIZE'
ORDER BY u.owner_name,
u.table_name,
u.column_name
)
LOOP
SELECT MAX(cnvsize)
INTO newmaxsz
FROM csm$errors
WHERE usr# =rec.owner_id
AND obj# =rec.table_id
AND col# =rec.column_id
AND intcol#=rec.column_intid;

DBMS_OUTPUT.PUT_LINE(rec.owner_name ||'.'|| rec.table_name||' ('|| rec.column_name ||') - '|| rec.column_type ||' - '|| newmaxsz || ' Bytes');
END LOOP;
END;
/

oracle的字符集檢查工具CSSCAN(一)

相關文章