Csscan output explained [ID 444701.1]

Posted by maojinyu on 2011-10-20

Modified: 23-JUN-2011    Type: REFERENCE    Status: PUBLISHED

Applies to:

Oracle Server - Enterprise Edition - Version: 8.1.7.4 and later [Release: 8.1.7 and later ]
Information in this document applies to any platform.
Oracle Server Enterprise Edition - Version: 8.1.7.4 to 11.2.0.1.0

Purpose

Whenever a characterset conversion is attempted or otherwise involved, you should always run the Csscan utility.

To install Csscan please see:
Installing and Configuring Csscan in 8i and 9i (Database Character Set Scanner)
Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner)

For 8i and 9i please install Csscan version 1.2; do not use the old Csscan versions included with 8i/9i.

This tool analyzes the code points in the database and verifies that they are known in the source database characterset and in the target database characterset. The output of the Csscan utility is vital to determining the health of the data in the database. This becomes especially crucial when changing charactersets, as well as during import and export operations or database link transfers where the source and target database differ in characterset. Knowing what the Csscan output shows, what it means and what its limits are helps you to ensure the integrity of the data.

Scope

This document is intended for all DBAs performing a characterset conversion or an import/export between databases with different charactersets, and for those who wish to learn more about Csscan output. It covers the basic conventions of the output files, what they mean, and what can be done to resolve the different scenarios that can arise.

Csscan output explained

A) Csscan output files.

Csscan generates 3 output files on each run. The names of the files can be set by the user at runtime using the csscan LOG= parameter; for the sake of this example we will assume LOG=charcheck is used. The filenames are then:

charcheck.out - Scan log -- provides a complete list of all scanned items
charcheck.txt - Database Scan Summary -- Summary of scan results
charcheck.err - Individual Exceptions -- Excerpted view of scan exceptions.
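
For reference, a typical invocation that would produce these files looks like the one below. This is a sketch only: the connect string and the parameter values are placeholders/assumptions, while FULL, FROMCHAR, TOCHAR, LOG, CAPTURE, ARRAY and PROCESS are the Csscan parameters whose values are also echoed in the [Database Scan Parameters] section shown later.

csscan \"sys/<password> as sysdba\" FULL=Y FROMCHAR=US7ASCII TOCHAR=AL32UTF8 LOG=charcheck CAPTURE=Y ARRAY=1000000 PROCESS=8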

If you log an SR about a characterset change then please always provide the .txt and .err files. Zip them before uploading. If the .err file is really too big then please provide at least the part up to the [Application data individual exceptions] header.

A.1) The .out file

The scan.out file shows a log of the Csscan run; it is advisable to check the .out file for ORA- errors after the Csscan run.

A.2) The .txt file

The scan.txt file gives a snapshot view of the state of the database data.
Here is a sample output (some sections shortened) of a Csscan TOCHAR=AL32UTF8 scan:

Database Scan Summary Report

Time Started : 2008-12-05 10:03:07
Time Completed: 2008-12-06 05:52:53

Process ID Time Started Time Completed
---------- -------------------- --------------------
1 2008-12-05 10:18:53 2008-12-06 05:52:11
..
7 2008-12-05 10:18:54 2008-12-06 05:52:11
8 2008-12-05 10:18:54 2008-12-06 05:52:11
---------- -------------------- --------------------

[Database Size]

Tablespace Used Free Total Expansion
------------------------- --------------- --------------- --------------- ---------------
SYSTEM 12,586.25M 5,413.75M 18,000.00M 343.00K
ACED 9,807.38M 2,192.63M 12,000.00M 13.38M
....
SERUSD 267.44M 1,732.56M 2,000.00M .00K
SERUSX 64.00K 1,999.94M 2,000.00M .00K
XXLAAPPS 7,285.88M 714.13M 8,000.00M .00K
XXCPX 640.06M 1,359.94M 2,000.00M .00K
UNDO_TS1 2,141.13M 93,858.88M 96,000.00M .00K
UNDO_TS2 24,519.00M 35,481.00M 60,000.00M .00K
------------------------- --------------- --------------- --------------- ---------------
Total 3,581,509.00M 586,521.98M 4,168,030.98M 586.55M

The size of the largest CLOB is 6354008 bytes

[Database Scan Parameters]

Parameter Value
------------------------------ ------------------------------------------------
CSSCAN Version v1.2
Instance Name SSJDV1
Database Version 9.2.0.7.0
Scan type Full database
Scan CHAR data? YES
Database character set US7ASCII
FROMCHAR US7ASCII
TOCHAR AL32UTF8
Scan NCHAR data? NO
Array fetch buffer size 1000000
Number of processes 8
Capture convertible data? YES
------------------------------ ------------------------------------------------

[Scan Summary]

Some character type data in the data dictionary are not convertible to the new character set
Some character type application data are not convertible to the new character set

[Data Dictionary Conversion Summary]

Datatype Changeless Convertible Truncation Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2 136,913,559 0 0 154
CHAR 7 0 0 0
LONG 6,614,799 0 0 0
CLOB 58 81 0 0
--------------------- ---------------- ---------------- ---------------- ----------------
Total 143,528,423 81 0 154
Total in percentage 100% 0% 0% 0%

[Application Data Conversion Summary]

Datatype Changeless Convertible Truncation Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2 189,317,244,480 0 14,326 7,455,330
CHAR 977,410,022 0 0 0
LONG 36,450,616 0 0 648
CLOB 585,086,356 203,702 0 523
--------------------- ---------------- ---------------- ---------------- ----------------
Total 190,916,191,474 203,702 14,326 7,456,501
Total in percentage 100% 0% 0% 0%

[Distribution of Convertible Data per Table]

USER.TABLE Convertible Truncation Lossy
-------------------------------------------------- ---------------- ---------------- ----------------
ACEREP.CXCOPYLOGMSG 0 0 4
..
APPLSYS.ALR_OUTPUT_HISTORY 0 0 8
APPLSYS.FND_COMPILED_DESCRIPTIVE_FLEXS 0 0 1
APPLSYS.FND_CONCURRENT_PROGRAMS_TL 0 0 1
APPLSYS.FND_CONC_PROG_ANNOTATIONS 20 0 0
APPLSYS.FND_DESCR_FLEX_COL_USAGE_TL 0 0 2
APPLSYS.FND_DOCUMENTS_LONG_TEXT 0 0 623
APPLSYS.FND_DOCUMENTS_SHORT_TEXT 0 10 6,414
APPLSYS.FND_DOCUMENTS_TL 0 8 4,694
APPLSYS.FND_FLEX_VALUES_TL 0 0 20
APPLSYS.FND_LOBS 0 1 3,894
APPLSYS.FND_OAM_CONTEXT_FILES 6 0 0
APPLSYS.FND_OAM_CSA_COLLECTIONS 3 0 1
APPLSYS.FND_OAM_DOC_LINK 1 0 0
APPLSYS.FND_REGISTRATIONS 0 0 4
APPLSYS.FND_REQUEST_GROUPS 0 1 2
APPLSYS.FND_USER_PREFERENCES 0 0 2
APPLSYS.JDR_ATTRIBUTES 0 0 1
APPLSYS.WF_ITEM_ATTRIBUTE_VALUES 41,967 0 204
APPLSYS.WF_LOCAL_ROLES 0 0 5,019
APPLSYS.WF_NOTIFICATION_ATTRIBUTES 0 0 857
APPLSYS.WF_NOTIFICATION_OUT 15 0 0
APPLSYS.WF_ROUTING_RULES 0 0 2
..
SYS.HISTGRM$ 0 0 6
SYS.METASTYLESHEET 58 0 0
SYS.RULE$ 23 0 0
SYS.SOURCE$ 0 0 147
SYSCASE.SDW_PROGRAM_LOCATIONS 0 0 1
SYSTEM.CRTBOX 0 55 55
-------------------------------------------------- ---------------- ---------------- ----------------

[Distribution of Convertible Data per Column]

USER.TABLE|COLUMN Convertible Truncation Lossy
-------------------------------------------------- ---------------- ---------------- ----------------
ACEREP.CXCOPYLOGMSG|MESSAGE 0 0 4
ACEREP.CXEXP|EXPSTRING 0 0 10
ACEREP.CXEXPORTLOGMSG|MESSAGE 0 0 1
ACEREP.CXKBSBLOB|KBSBLOB 0 0 23
ACEREP.CXPURGEKBSELEMENT043004|PURGE_LOG 17,804 0 0
...
SYS.HISTGRM$|EPVALUE 0 0 6
SYS.METASTYLESHEET|STYLESHEET 58 0 0
SYS.RULE$|CONDITION 23 0 0
SYS.SOURCE$|SOURCE 0 0 147
-------------------------------------------------- ---------------- ---------------- ----------------

[Indexes to be Rebuilt]

USER.INDEX on USER.TABLE(COLUMN)
-----------------------------------------------------------------------------------------
APPLSYS.FND_CONCURRENT_PROGRAMS_TL_U2 on APPLSYS.FND_CONCURRENT_PROGRAMS_TL(APPLICATION_ID)
...
GCTA.COUNTRY_CODE_IDX on GCTA.CFI_HTS_MASS_APPROVE(COUNTRY_CODE)
-----------------------------------------------------------------------------------------


The .txt file shows:
* Time Started / Time Completed: duration of the Csscan run. Csscan will do a fetch of all character data, so running time is in most cases at least the time to do a full export.
* [Database Size]: the size of the data within the database. The Expansion column (if applicable) gives an estimation of how much more space you need in the current tablespace when going to the new characterset. The Tablespace Expansion for tablespace X is calculated as the grand total of the differences between the byte length of a string converted to the target character set and the original byte length of this string, over all strings scanned in tables in X. The distribution of values in blocks, PCTFREE, free extents, etc. are not taken into account.
* [Database Scan Parameters]: the parameters used to run Csscan
* [Scan Summary]: gives you directly an idea of whether you can do a full exp/imp, use Csalter or use "Alter Database Character Set" as described in point D)
* [Data Dictionary Conversion Summary]: gives an overview of the amount of Changeless, Convertible, Truncation or Lossy data there is in the Data Dictionary.
* [Application Data Conversion Summary]: gives an overview of the amount of Changeless, Convertible, Truncation or Lossy data there is in User data.
* [Distribution of Convertible Data per Table]: gives a breakdown on table basis.
* [Distribution of Convertible Data per Column]: gives a breakdown on column basis.
* [Indexes to be Rebuilt]: lists which indexes are going to be affected by convertible data. The name of the section is a bit misleading. When using full export/import there is nothing to do for those indexes. When using Csalter/Alter Database Character Set together with a partial export/import it depends on the amount of 'convertible' data in the underlying columns. If only a few rows in the underlying columns are 'convertible' then there is nothing to do (the indexes do not need to be rebuilt as such). But if you have a lot of 'convertible' data in the underlying columns it might be a good idea to drop and recreate the indexes after the import, simply for performance reasons. The only exception is an index on a CHAR/NCHAR column that you need to adapt for "truncation". In that case all key values of a CHAR/NCHAR index key have to be padded with blanks to the new length and it may be more efficient to drop and recreate the index.
* [Truncation Due To Character Semantics]: (not often seen) This can be seen if you use CHAR semantics in the current database. The Truncation Due To Character Semantics section identifies the number of data cells that would be truncated if they were converted to the target character set (for example, by the SQL CONVERT function or another inline conversion process) before the database character set is updated with the Csalter script. If the data conversion occurs after the database character set is changed (i.e. you use export/import for convertible data), then this section can be ignored.

A.3) The .err file.

The output in the .err file depends on the CAPTURE=Y or CAPTURE=N Csscan parameter.

If you use CAPTURE=N then the .err file will only log rows that are Lossy or Truncation. Convertible data is not logged in the .err file.

If you use CAPTURE=Y then the .err file will log rows that are Convertible, Lossy or Truncation. Using CAPTURE=Y may also increase the running time of Csscan (and the space needed for the Csmig tables) if there is a lot of 'Convertible' data, because for every 'Convertible' row an insert is done into a table in the csmig schema.

The SUPPRESS parameter limits the size of the .err file by limiting the amount of information logged per table. Using SUPPRESS=1000 will log at most 1000 rows per table in the .err file and also reduce the space needed for the Csscan Csmig tables. It will not affect the information in the .txt file.
This parameter may be useful for the first scan done on big databases, to limit the size of the .err file and the Csmig tables.
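
A first scan on a big database could therefore be run along these lines (a sketch only; the connect string, log name and parameter values are assumptions):

csscan \"sys/<password> as sysdba\" FULL=Y TOCHAR=AL32UTF8 LOG=firstscan CAPTURE=N SUPPRESS=1000 PROCESS=4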

Sample .err output using CAPTURE=N:

Database Scan Individual Exception Report


[Database Scan Parameters]

Parameter Value
------------------------------ ------------------------------------------------
Scan type Full database
Scan CHAR data? YES
Current database character set US7ASCII
New database character set AL32UTF8
Scan NCHAR data? NO
Array fetch buffer size 102400
Number of processes 3
Capture convertible data? NO
------------------------------ ------------------------------------------------

[Data Dictionary individual exceptions]

User : SYS
Table : AUD$
Column: NEW$OWNER
Type : VARCHAR2(30)
Number of Exceptions : 1
Max Post Conversion Data Size: 30

ROWID Exception Type Size Cell Data(first 30 bytes)
------------------ ------------------ ----- ------------------------------
AAAFccAAaAAAAyhAAL lossy conversion COMP_PROD
------------------ ------------------ ----- ------------------------------

[Application data individual exceptions]


User : COMP_PROD
Table : CIS_DEMO
Column: SHORT_NAME1
Type : VARCHAR2(30)
Number of Exceptions : 13
Max Post Conversion Data Size: 32

ROWID Exception Type Size Cell Data(first 30 bytes)
------------------ ------------------ ----- ------------------------------
AACQtnAC3AAAp2XAAF lossy conversion XT¬Q QVRLTQ
AACQtnAC3AAAqHsAAG lossy conversion VO¬C KTLUOC
AACQtnAC3AAAsMZAAD lossy conversion TAPIA QUI¬ONES EZEQUIEL
AACQtnAC5AAAH0/AAC lossy conversion UäBTA OLBQ ETEONOQ
AACQtnAC5AAAInqAAD lossy conversion VH¬TA ENQHROC XQJLOEOC
AACQtnAC5AAALzVAAB exceed column size 32 BTROVQ KQNRQ¬Q KONGOQ X RT YQ
AACQtnAC5AAALzVAAB lossy conversion BTROVQ KQNRQ¬Q KONGOQ X RT YQ
AACQtnAC5AAAMDDAAE lossy conversion ZHO¬CVTK EQNODJC U
AACQtnACvAAAM/mAAB lossy conversion UCRCS PHQV TRHQLRC UC¬O
AACQtnACvAAAOkLAAC lossy conversion ZHO¬CVTK QNOEOQ B
AACQtnACvAAAPUJAAB lossy conversion RHT¬QK NHOK
AACQtnACvAAARSTAAA lossy conversion IQNNQJJ WTLJIQ LCKQ QEH¬Q
AACQtnACvAAARsKAAF lossy conversion BH¬CA QVJCVOC
AACQtnACvAAASiPAAF lossy conversion XT¬Q NHOK BQLOQ
------------------ ------------------ ----- ------------------------------

In the .err file this is listed:

* [Database Scan Parameters]: the Csscan parameters used.
* [Data Dictionary individual exceptions]: exceptions (Lossy, Truncation and Convertible) for Data Dictionary objects.
* [Application data individual exceptions]: exceptions (Lossy, Truncation and, if CAPTURE=Y, Convertible) for User objects.

For each affected column it lists

* User : user name
* Table : table name
* Column: column name
* Type : data type of the column and the defined column length
* Number of Exceptions : the number of rows in this column that are lossy, convertible or truncation.
* Max Post Conversion Data Size: the maximum size of the data in this column after conversion to the new characterset.

Then for each row the ROWID, Exception Type, Size and Cell Data (first 30 bytes) are logged. The Cell Data in the .err file is not converted; it is simply dumped from the column/row.
The "exceed column size" message means this row is "truncation" and the new size of that row is given in the "Size" column. In practice you are only interested in the "Max Post Conversion Data Size" for the whole column.

Sometimes you see, as in this example:
Type : CLOB
Number of Exceptions : 0
Max Post Conversion Data Size: 0
This means the CLOB has no actual data but is initialised - hence it needs to be handled by Csalter.

B) The meaning of the status of data in the Conversion Summary of the .txt file:

The "Conversion summary" in the .txt file is the first important thing to look at. This is where you will see 4 columns, each representing a possible status for the data:

B.1) CHANGELESS data.

This data will not change code point during the characterset conversion. In other words, all those characters use the SAME codes/code points in the new characterset as in the old characterset. When using Csalter, or "Alter Database Character Set" for 8i/9i, this data needs no action.

B.2) CONVERTIBLE data.

This data is valid, but the characters will change to a different code point in the new characterset. When using Csalter / "Alter database character set" any User/Application Data that is "convertible" needs to be exported and truncated/deleted before the change of the characterset and imported afterwards.

For exp/imp the NLS_LANG setting is simply AMERICAN_AMERICA.<source NLS_CHARACTERSET>, see
NLS considerations in Import/Export - Frequently Asked Questions
Expdp/impdp do not use the NLS_LANG for data conversion.
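
As an illustration on Unix (a sketch; the password and dump file name are placeholders, and the characterset is the US7ASCII source database from the sample scan above):

$ export NLS_LANG=AMERICAN_AMERICA.US7ASCII
$ exp system/<password> FILE=convertible.dmp TABLES=<tables with convertible data>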

When using Csalter / "Alter Database Character Set", Convertible Data Dictionary CHAR, VARCHAR2 and LONG data needs to be addressed (export is only applicable to a few tables; by default do NOT use export to solve this).
Convertible Data Dictionary CLOB in 8i/9i NEEDS to be addressed (export is only applicable to a few tables; by default do NOT use export to solve this).
Convertible Data Dictionary CLOB in 10g and up will be handled by Csalter.

The actual action on Data Dictionary convertible data depends largely on which table is reported.
Convertible data in data dictionary: Workarounds when changing character set
is a good place to start.

Note that in 10g and up 'Convertible' Data Dictionary CLOB data is also listed in the Csscan output but needs NO action, so if you need to know which Data Dictionary objects you need to take action on, you first need to filter out the convertible Data Dictionary CLOB data. There are selects available to make this easier; a minimal sketch follows below.
Before exporting and truncating User data, check for constraint definitions on the tables (Script: To report Table Constraints). That note also has a select to make a nice list of all "User/Application Data" that needs action.
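
A minimal sketch of such a select for the User/Application data, assuming the csmv$errors view populated by the last Csscan run (the same view used by the procedure in section B.3 below) and assuming 'CONVERTIBLE' is the error_type value used for convertible rows:

SELECT DISTINCT z.owner_name, z.table_name, z.column_name, z.column_type
FROM csmv$errors z
WHERE z.error_type = 'CONVERTIBLE'
ORDER BY z.owner_name, z.table_name, z.column_name;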

B.3) TRUNCATION data.

This is a special case of "convertible" data, meaning that the amount of BYTES the data uses will expand in the conversion and the field it presently occupies is not large enough to hold the data post-conversion. Typically this happens when going from a single byte characterset (where one character is one byte) to a multi byte characterset (where one character can be one, two or more bytes) like Unicode (AL32UTF8/UTF8). If you do not take action, you will see ORA-01401: inserted value too large for column or, from 10g onwards, ORA-12899: value too large for column during import.
Truncation in Data Dictionary objects is very rare and will be solved by using the steps for convertible Data Dictionary data.

Note that "Truncation" data is a special case of "Convertible" data , so "Truncation" data also needs to be exported and truncated/deleted before the change of the characterset and imported afterwards.

The Csscan utility gives information to help prepare for this scenario. In the scan.err file you can see something like the following:

User : RDOXNG
Table : T_RDOX_DOCTEXT
Column: VER
Type : CHAR(10)
Number of Exceptions : 1
Max Post Conversion Data Size: 12

What we see here is that some data will grow to 12 bytes in size (from the field 'Max Post Conversion Data Size'), and be 2 bytes larger than the column size of 10. Hence if you enlarge this column to CHAR(12) the 'Truncation' situation is resolved.
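
For this example, a minimal sketch of that fix, using the table and column names from the sample .err output above and explicit BYTE semantics:

ALTER TABLE RDOXNG.T_RDOX_DOCTEXT MODIFY (VER CHAR(12 BYTE));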

You can also use this procedure to get an overview of the "Max Post Conversion Data Size" for all columns that have "Truncation" data, based on the last Csscan result:

Note that when using the Csscan SUPPRESS parameter this procedure may give incomplete results (not all tables, or not the correct minimal needed data size).

conn / AS sysdba
SET serveroutput ON
DECLARE
  newmaxsz NUMBER;
BEGIN
  -- loop over every column flagged as "Truncation" (EXCEED_SIZE) in the last Csscan run
  FOR rec IN ( SELECT DISTINCT u.owner_name,
                               u.table_name,
                               u.column_name,
                               u.column_type,
                               u.owner_id,
                               u.table_id,
                               u.column_id,
                               u.column_intid
                 FROM csmv$errors u
                WHERE u.error_type = 'EXCEED_SIZE'
                ORDER BY u.owner_name, u.table_name, u.column_name )
  LOOP
    -- for each column, take the largest post-conversion byte size over all its rows
    SELECT MAX(cnvsize)
      INTO newmaxsz
      FROM csm$errors
     WHERE usr#    = rec.owner_id
       AND obj#    = rec.table_id
       AND col#    = rec.column_id
       AND intcol# = rec.column_intid;

    DBMS_OUTPUT.PUT_LINE(rec.owner_name ||'.'|| rec.table_name ||' ('|| rec.column_name ||') - '|| rec.column_type ||' - '|| newmaxsz ||' Bytes');
  END LOOP;
END;
/

This will give the minimal number of BYTES each column needs in order to accommodate the expansion.
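
For the truncation example above the output would be along these lines (illustrative only; the exact column_type string comes from the csmv$errors view):

RDOXNG.T_RDOX_DOCTEXT (VER) - CHAR - 12 Bytes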

To address this, the column must be altered to fit the new data size prior to the actual conversion, either:
* by enlarging the column size (in BYTES) to at least the "Max Post Conversion Data Size", or
* by using CHAR semantics. Note that when using CHAR semantics there are certain limits, see Examples and limits of BYTE and CHAR semantics usage.

Note that in some cases the expansion in BYTES is bigger than the maximum data length of the datatype, and then using CHAR semantics will not help either. This limit is 2000 BYTES for CHAR and 4000 BYTES for VARCHAR2.
In that case you need to either reduce the actual data or change to a datatype (like CLOB) that allows you to store that length.

B.3.A) When changing to CHAR semantics:

When using Csalter or Alter Database Character Set and going to CHAR semantics, this should be done after the characterset change but before you import back the exported convertible data (using for example the script in Note 313175.1 Changing columns to CHAR length semantics).

When using (full) export/import into another database with the new characterset, going to CHAR semantics can be done:
- before exporting the convertible data (using for example the script in Changing columns to CHAR length semantics), or
- before importing the data into the new database, by pre-creating all the tables with CHAR semantics in the new database and running imp with IGNORE=Y.

Note that simply setting NLS_LENGTH_SEMANTICS=CHAR in the init.ora will NOT work.

While it is enough for the actual conversion to change only the columns reported to have "Truncation" to CHAR semantics, it's recommended to use CHAR semantics for all columns when going to a varying width characterset like AL32UTF8.
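
Applied to the truncation example from section B.3, a minimal sketch of such a change (the defined length stays 10, but is now counted in characters rather than bytes):

ALTER TABLE RDOXNG.T_RDOX_DOCTEXT MODIFY (VER CHAR(10 CHAR));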

B.3.B) When enlarging the columns in BYTES:


When using Csalter or Alter Database Character Set and enlarging the columns using BYTE semantics this can be done:
- after the characterset change but before you import back the exported convertible data.
- before the characterset change and before exporting the convertible data.

When using export/import into another database with the new characterset, enlarging the columns using BYTE semantics can be done:
- before exporting the convertible data, or
- before importing the data into the new database, by pre-creating all the tables with the wider columns in BYTES and running imp with IGNORE=Y (see the sketch below).
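
A minimal sketch of the pre-create approach, reusing the truncation example from section B.3 (the column list is illustrative; IGNORE=Y makes imp load the data into the pre-created table instead of failing on the existing object):

-- in the target database, before the import:
CREATE TABLE RDOXNG.T_RDOX_DOCTEXT
( VER CHAR(12 BYTE)  -- widened to the Max Post Conversion Data Size
  -- , plus the remaining columns of the source table
);

$ imp system/<password> FILE=convertible.dmp FULL=Y IGNORE=Y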

B.4) LOSSY data.

This data does not have a valid code point in the source NLS_CHARACTERSET, or the target characterset you specified does not define that character. If no action is taken then this data will be 'lost' in the conversion: every "lossy" character will be converted to the "default replacement character". What that "default replacement character" is depends on the characterset; for US7ASCII this is a question mark "?", for the xx8ISO8859Px and xx8MSWIN125x charactersets this is an inverted question mark "¿".
Please note that every lossy character is converted to the same "default replacement character", hence once you have done a conversion that resulted in lossy data there is no way to "recover" the original data. If you have for example a US7ASCII database that contains the string "é" (not a valid US7ASCII code point), it will simply become "?" after the conversion, and afterwards there is no way to tell which character the "?" originally was.
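
To list the columns affected by lossy data you can again query the Csscan result tables. A minimal sketch, assuming the same csmv$errors view as above and assuming 'DATA_LOSS' is the error_type value used for lossy rows:

SELECT DISTINCT z.owner_name, z.table_name, z.column_name, z.column_type
FROM csmv$errors z
WHERE z.error_type = 'DATA_LOSS'
ORDER BY z.owner_name, z.table_name, z.column_name;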
