Csscan output explained [ID 444701.1]

Posted by maojinyu on 2011-10-20

Modified: 23-JUN-2011    Type: REFERENCE    Status: PUBLISHED

Applies to:

Oracle Server - Enterprise Edition - Version: 8.1.7.4 and later [Release: 8.1.7 and later ]
Information in this document applies to any platform.
Oracle Server Enterprise Edition - Version: 8.1.7.4 to 11.2.0.1.0

Purpose

Whenever a characterset conversion is attempted or otherwise involved, you should always run the Csscan utility.

To install Csscan please see:
Installing and Configuring Csscan in 8i and 9i (Database Character Set Scanner)
Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner)

For 8i and 9i please install Csscan version 1.2; do not use the old Csscan versions included with 8i/9i.

This tool analyzes the code points in the database and verifies that they are known in the source database characterset and in the target database characterset. The output of the Csscan utility is vital to determining the health of the data in the database. This becomes especially crucial when changing charactersets, as well as during import and export operations or database link transfers where the source and target database differ in characterset. Knowing what the Csscan output shows, what it means and what its limits are helps you to ensure the integrity of the data.

Scope

This document is intended for all DBAs performing a characterset conversion or an import/export between databases with different charactersets, and for those who wish to learn more about Csscan output. It covers the basic conventions of the output files, what they mean, and what can be done to resolve the different scenarios that can arise.

Csscan output explained

A) Csscan output files.

Csscan generates 3 output files on each run. The names of the files can be set by the user at runtime using the csscan LOG= parameter; for the sake of this example we will assume LOG=charcheck is used. The filenames are then:

charcheck.out - Scan log -- provides a complete list of all scanned items
charcheck.txt - Database Scan Summary -- Summary of scan results
charcheck.err - Individual Exceptions -- Excerpted view of scan exceptions.
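
For reference, a typical invocation that would produce these files looks like the one below. This is a sketch only: the connect string and the parameter values are placeholders/assumptions, while FULL, FROMCHAR, TOCHAR, LOG, CAPTURE, ARRAY and PROCESS are the Csscan parameters whose values are also echoed in the [Database Scan Parameters] section shown later.

csscan \"sys/<password> as sysdba\" FULL=Y FROMCHAR=US7ASCII TOCHAR=AL32UTF8 LOG=charcheck CAPTURE=Y ARRAY=1000000 PROCESS=8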

If you log an SR about a characterset change then please always provide the .txt and .err files. Zip them before uploading. If the .err file is really too big then please provide at least the part up to the [Application data individual exceptions] header.

A.1) The .out file

The scan.out file shows a log of the Csscan run; it is advisable to check the .out file for ORA- errors after the Csscan run.

A.2) The .txt file

The scan.txt file gives a snapshot view of the state of the database data.
Here is a sample output (some sections shortened) of a Csscan TOCHAR=AL32UTF8 scan:

Database Scan Summary Report

Time Started : 2008-12-05 10:03:07
Time Completed: 2008-12-06 05:52:53

Process ID Time Started Time Completed
---------- -------------------- --------------------
1 2008-12-05 10:18:53 2008-12-06 05:52:11
..
7 2008-12-05 10:18:54 2008-12-06 05:52:11
8 2008-12-05 10:18:54 2008-12-06 05:52:11
---------- -------------------- --------------------

[Database Size]

Tablespace Used Free Total Expansion
------------------------- --------------- --------------- --------------- ---------------
SYSTEM 12,586.25M 5,413.75M 18,000.00M 343.00K
ACED 9,807.38M 2,192.63M 12,000.00M 13.38M
....
SERUSD 267.44M 1,732.56M 2,000.00M .00K
SERUSX 64.00K 1,999.94M 2,000.00M .00K
XXLAAPPS 7,285.88M 714.13M 8,000.00M .00K
XXCPX 640.06M 1,359.94M 2,000.00M .00K
UNDO_TS1 2,141.13M 93,858.88M 96,000.00M .00K
UNDO_TS2 24,519.00M 35,481.00M 60,000.00M .00K
------------------------- --------------- --------------- --------------- ---------------
Total 3,581,509.00M 586,521.98M 4,168,030.98M 586.55M

The size of the largest CLOB is 6354008 bytes

[Database Scan Parameters]

Parameter Value
------------------------------ ------------------------------------------------
CSSCAN Version v1.2
Instance Name SSJDV1
Database Version 9.2.0.7.0
Scan type Full database
Scan CHAR data? YES
Database character set US7ASCII
FROMCHAR US7ASCII
TOCHAR AL32UTF8
Scan NCHAR data? NO
Array fetch buffer size 1000000
Number of processes 8
Capture convertible data? YES
------------------------------ ------------------------------------------------

[Scan Summary]

Some character type data in the data dictionary are not convertible to the new character set
Some character type application data are not convertible to the new character set

[Data Dictionary Conversion Summary]

Datatype Changeless Convertible Truncation Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2 136,913,559 0 0 154
CHAR 7 0 0 0
LONG 6,614,799 0 0 0
CLOB 58 81 0 0
--------------------- ---------------- ---------------- ---------------- ----------------
Total 143,528,423 81 0 154
Total in percentage 100% 0% 0% 0%

[Application Data Conversion Summary]

Datatype Changeless Convertible Truncation Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2 189,317,244,480 0 14,326 7,455,330
CHAR 977,410,022 0 0 0
LONG 36,450,616 0 0 648
CLOB 585,086,356 203,702 0 523
--------------------- ---------------- ---------------- ---------------- ----------------
Total 190,916,191,474 203,702 14,326 7,456,501
Total in percentage 100% 0% 0% 0%

[Distribution of Convertible Data per Table]

USER.TABLE Convertible Truncation Lossy
-------------------------------------------------- ---------------- ---------------- ----------------
ACEREP.CXCOPYLOGMSG 0 0 4
..
APPLSYS.ALR_OUTPUT_HISTORY 0 0 8
APPLSYS.FND_COMPILED_DESCRIPTIVE_FLEXS 0 0 1
APPLSYS.FND_CONCURRENT_PROGRAMS_TL 0 0 1
APPLSYS.FND_CONC_PROG_ANNOTATIONS 20 0 0
APPLSYS.FND_DESCR_FLEX_COL_USAGE_TL 0 0 2
APPLSYS.FND_DOCUMENTS_LONG_TEXT 0 0 623
APPLSYS.FND_DOCUMENTS_SHORT_TEXT 0 10 6,414
APPLSYS.FND_DOCUMENTS_TL 0 8 4,694
APPLSYS.FND_FLEX_VALUES_TL 0 0 20
APPLSYS.FND_LOBS 0 1 3,894
APPLSYS.FND_OAM_CONTEXT_FILES 6 0 0
APPLSYS.FND_OAM_CSA_COLLECTIONS 3 0 1
APPLSYS.FND_OAM_DOC_LINK 1 0 0
APPLSYS.FND_REGISTRATIONS 0 0 4
APPLSYS.FND_REQUEST_GROUPS 0 1 2
APPLSYS.FND_USER_PREFERENCES 0 0 2
APPLSYS.JDR_ATTRIBUTES 0 0 1
APPLSYS.WF_ITEM_ATTRIBUTE_VALUES 41,967 0 204
APPLSYS.WF_LOCAL_ROLES 0 0 5,019
APPLSYS.WF_NOTIFICATION_ATTRIBUTES 0 0 857
APPLSYS.WF_NOTIFICATION_OUT 15 0 0
APPLSYS.WF_ROUTING_RULES 0 0 2
..
SYS.HISTGRM$ 0 0 6
SYS.METASTYLESHEET 58 0 0
SYS.RULE$ 23 0 0
SYS.SOURCE$ 0 0 147
SYSCASE.SDW_PROGRAM_LOCATIONS 0 0 1
SYSTEM.CRTBOX 0 55 55
-------------------------------------------------- ---------------- ---------------- ----------------

[Distribution of Convertible Data per Column]

USER.TABLE|COLUMN Convertible Truncation Lossy
-------------------------------------------------- ---------------- ---------------- ----------------
ACEREP.CXCOPYLOGMSG|MESSAGE 0 0 4
ACEREP.CXEXP|EXPSTRING 0 0 10
ACEREP.CXEXPORTLOGMSG|MESSAGE 0 0 1
ACEREP.CXKBSBLOB|KBSBLOB 0 0 23
ACEREP.CXPURGEKBSELEMENT043004|PURGE_LOG 17,804 0 0
...
SYS.HISTGRM$|EPVALUE 0 0 6
SYS.METASTYLESHEET|STYLESHEET 58 0 0
SYS.RULE$|CONDITION 23 0 0
SYS.SOURCE$|SOURCE 0 0 147
-------------------------------------------------- ---------------- ---------------- ----------------

[Indexes to be Rebuilt]

USER.INDEX on USER.TABLE(COLUMN)
-----------------------------------------------------------------------------------------
APPLSYS.FND_CONCURRENT_PROGRAMS_TL_U2 on APPLSYS.FND_CONCURRENT_PROGRAMS_TL(APPLICATION_ID)
...
GCTA.COUNTRY_CODE_IDX on GCTA.CFI_HTS_MASS_APPROVE(COUNTRY_CODE)
-----------------------------------------------------------------------------------------


The .txt file shows:
* Time Started / Time Completed: duration of the Csscan run. Csscan will do a fetch of all character data, so running time is in most cases at least the time to do a full export.
* [Database Size]: the size of the data within the database. The Expansion column (if applicable) gives an estimation of how much more space you need in the current tablespace when going to the new characterset. The Tablespace Expansion for tablespace X is calculated as the grand total of the differences between the byte length of a string converted to the target character set and the original byte length of this string, over all strings scanned in tables in X. The distribution of values in blocks, PCTFREE, free extents, etc. are not taken into account.
* [Database Scan Parameters]: the parameters used to run Csscan
* [Scan Summary]: gives you directly an idea of whether you can do a full exp/imp, use Csalter or use "Alter Database Character Set" as described in point D)
* [Data Dictionary Conversion Summary]: gives an overview of the amount of Changeless, Convertible, Truncation or Lossy data there is in the Data Dictionary.
* [Application Data Conversion Summary]: gives an overview of the amount of Changeless, Convertible, Truncation or Lossy data there is in User data.
* [Distribution of Convertible Data per Table]: gives a breakdown on table basis.
* [Distribution of Convertible Data per Column]: gives a breakdown on column basis.
* [Indexes to be Rebuilt]: lists which indexes are going to be affected by convertible data. The name of the section is a bit misleading. When using full export/import there is nothing to do for those indexes. When using Csalter/Alter Database Character Set together with a partial export/import it depends on the amount of 'convertible' data in the underlying columns. If only a few rows in the underlying columns are 'convertible' then there is nothing to do (the indexes do not need to be rebuilt as such). But if you have a lot of 'convertible' data in the underlying columns it might be a good idea to drop and recreate the indexes after the import, simply for performance reasons. The only exception is an index on a CHAR/NCHAR column that you need to adapt for "truncation". In that case all key values of a CHAR/NCHAR index key have to be padded with blanks to the new length and it may be more efficient to drop and recreate the index.
* [Truncation Due To Character Semantics]: (not often seen) This can be seen if you use CHAR semantics in the current database. The Truncation Due To Character Semantics section identifies the number of data cells that would be truncated if they were converted to the target character set (for example, by the SQL CONVERT function or another inline conversion process) before the database character set is updated with the Csalter script. If the data conversion occurs after the database character set is changed (i.e. you use export/import for convertible data), then this section can be ignored.

A.3) The .err file.

The output in the .err file depends on the CAPTURE=Y or CAPTURE=N Csscan parameter.

If you use CAPTURE=N then the .err file will only log rows that are Lossy or Truncation. Convertible data is not logged in the .err file.

If you use CAPTURE=Y then the .err file will log rows that are Convertible, Lossy or Truncation. Using CAPTURE=Y may also increase the running time of Csscan (and the space needed for the Csmig tables) if there is a lot of 'Convertible' data, because for every 'Convertible' row an insert is done into a table in the csmig schema.

The SUPPRESS parameter limits the size of the .err file by limiting the amount of information logged per table. Using SUPPRESS=1000 will log at most 1000 rows per table in the .err file and also reduce the space needed for the Csscan Csmig tables. It will not affect the information in the .txt file.
This parameter may be useful for the first scan done on big databases, to limit the size of the .err file and the Csmig tables.
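
A first scan on a big database could therefore be run along these lines (a sketch only; the connect string, log name and parameter values are assumptions):

csscan \"sys/<password> as sysdba\" FULL=Y TOCHAR=AL32UTF8 LOG=firstscan CAPTURE=N SUPPRESS=1000 PROCESS=4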

Sample .err output using CAPTURE=N:

Database Scan Individual Exception Report


[Database Scan Parameters]

Parameter Value
------------------------------ ------------------------------------------------
Scan type Full database
Scan CHAR data? YES
Current database character set US7ASCII
New database character set AL32UTF8
Scan NCHAR data? NO
Array fetch buffer size 102400
Number of processes 3
Capture convertible data? NO
------------------------------ ------------------------------------------------

[Data Dictionary individual exceptions]

User : SYS
Table : AUD$
Column: NEW$OWNER
Type : VARCHAR2(30)
Number of Exceptions : 1
Max Post Conversion Data Size: 30

ROWID Exception Type Size Cell Data(first 30 bytes)
------------------ ------------------ ----- ------------------------------
AAAFccAAaAAAAyhAAL lossy conversion COMP_PROD
------------------ ------------------ ----- ------------------------------

[Application data individual exceptions]


User : COMP_PROD
Table : CIS_DEMO
Column: SHORT_NAME1
Type : VARCHAR2(30)
Number of Exceptions : 13
Max Post Conversion Data Size: 32

ROWID Exception Type Size Cell Data(first 30 bytes)
------------------ ------------------ ----- ------------------------------
AACQtnAC3AAAp2XAAF lossy conversion XT¬Q QVRLTQ
AACQtnAC3AAAqHsAAG lossy conversion VO¬C KTLUOC
AACQtnAC3AAAsMZAAD lossy conversion TAPIA QUI¬ONES EZEQUIEL
AACQtnAC5AAAH0/AAC lossy conversion UäBTA OLBQ ETEONOQ
AACQtnAC5AAAInqAAD lossy conversion VH¬TA ENQHROC XQJLOEOC
AACQtnAC5AAALzVAAB exceed column size 32 BTROVQ KQNRQ¬Q KONGOQ X RT YQ
AACQtnAC5AAALzVAAB lossy conversion BTROVQ KQNRQ¬Q KONGOQ X RT YQ
AACQtnAC5AAAMDDAAE lossy conversion ZHO¬CVTK EQNODJC U
AACQtnACvAAAM/mAAB lossy conversion UCRCS PHQV TRHQLRC UC¬O
AACQtnACvAAAOkLAAC lossy conversion ZHO¬CVTK QNOEOQ B
AACQtnACvAAAPUJAAB lossy conversion RHT¬QK NHOK
AACQtnACvAAARSTAAA lossy conversion IQNNQJJ WTLJIQ LCKQ QEH¬Q
AACQtnACvAAARsKAAF lossy conversion BH¬CA QVJCVOC
AACQtnACvAAASiPAAF lossy conversion XT¬Q NHOK BQLOQ
------------------ ------------------ ----- ------------------------------

In the .err file this is listed:

* [Database Scan Parameters]: the Csscan parameters used.
* [Data Dictionary individual exceptions]: exceptions (Lossy, Truncation and Convertible) for Data Dictionary objects.
* [Application data individual exceptions]: exceptions (Lossy, Truncation and, if CAPTURE=Y, Convertible) for User objects.

For each affected column it lists

* User : user name
* Table : table name
* Column: column name
* Type : data type of the column and the defined column length
* Number of Exceptions : the number of rows in this column that are lossy, convertible or truncation.
* Max Post Conversion Data Size: the maximum size of the data in this column after conversion to the new characterset.

Then for each row the ROWID, Exception Type, Size and Cell Data (first 30 bytes) are logged. The Cell Data in the .err file is not converted; it is simply dumped from the column/row.
The "exceed column size" message means this row is "truncation" and the new size of that row is given in the "Size" column. In practice you are only interested in the "Max Post Conversion Data Size" for the whole column.

Sometimes you see, as in this example:
Type : CLOB
Number of Exceptions : 0
Max Post Conversion Data Size: 0
This means the CLOB has no actual data but is initialised - hence it needs to be handled by Csalter.

B) The meaning of the status of data in the Conversion Summary of the .txt file:

The "Conversion summary" in the .txt file is the first important thing to look at. This is where you will see 4 columns, each representing a possible status for the data:

B.1) CHANGELESS data.

This data will not change code point during the characterset conversion. In other words, all those characters use the SAME codes/code points in the new characterset as in the old characterset. When using Csalter, or "Alter Database Character Set" for 8i/9i, this data needs no action.

B.2) CONVERTIBLE data.

This data is valid, but the characters will change to a different code point in the new characterset. When using Csalter / "Alter database character set" any User/Application Data that is "convertible" needs to be exported and truncated/deleted before the change of the characterset and imported afterwards.

For exp/imp the NLS_LANG setting is simply AMERICAN_AMERICA.<source NLS_CHARACTERSET>, see
NLS considerations in Import/Export - Frequently Asked Questions
Expdp/impdp do not use the NLS_LANG for data conversion.
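
As an illustration on Unix (a sketch; the password and dump file name are placeholders, and the characterset is the US7ASCII source database from the sample scan above):

$ export NLS_LANG=AMERICAN_AMERICA.US7ASCII
$ exp system/<password> FILE=convertible.dmp TABLES=<tables with convertible data>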

When using Csalter / "Alter Database Character Set", Convertible Data Dictionary CHAR, VARCHAR2 and LONG data needs to be addressed (export is only applicable to a few tables; by default do NOT use export to solve this).
Convertible Data Dictionary CLOB in 8i/9i NEEDS to be addressed (export is only applicable to a few tables; by default do NOT use export to solve this).
Convertible Data Dictionary CLOB in 10g and up will be handled by Csalter.

The actual action on Data Dictionary convertible data depends largely on which table is reported.
Convertible data in data dictionary: Workarounds when changing character set
is a good place to start.

Note that in 10g and up 'Convertible' Data Dictionary CLOB data is also listed in the Csscan output but needs NO action, so if you need to know which Data Dictionary objects you need to take action on, you first need to filter out the convertible Data Dictionary CLOB data. There are selects available to make this easier; a minimal sketch follows below.
Before exporting and truncating User data, check for constraint definitions on the tables (Script: To report Table Constraints). That note also has a select to make a nice list of all "User/Application Data" that needs action.
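
A minimal sketch of such a select for the User/Application data, assuming the csmv$errors view populated by the last Csscan run (the same view used by the procedure in section B.3 below) and assuming 'CONVERTIBLE' is the error_type value used for convertible rows:

SELECT DISTINCT z.owner_name, z.table_name, z.column_name, z.column_type
FROM csmv$errors z
WHERE z.error_type = 'CONVERTIBLE'
ORDER BY z.owner_name, z.table_name, z.column_name;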

B.3) TRUNCATION data.

This is a special case of "convertible" data, meaning that the amount of BYTES the data uses will expand in the conversion and the field it presently occupies is not large enough to hold the data post-conversion. Typically this happens when going from a single byte characterset (where one character is one byte) to a multi byte characterset (where one character can be one, two or more bytes) like Unicode (AL32UTF8/UTF8). If you do not take action, you will see ORA-01401: inserted value too large for column or, from 10g onwards, ORA-12899: value too large for column during import.
Truncation in Data Dictionary objects is very rare and will be solved by using the steps for convertible Data Dictionary data.

Note that "Truncation" data is a special case of "Convertible" data , so "Truncation" data also needs to be exported and truncated/deleted before the change of the characterset and imported afterwards.

The Csscan utility gives information to help prepare for this scenario. In the scan.err file you can see something like the following:

User : RDOXNG
Table : T_RDOX_DOCTEXT
Column: VER
Type : CHAR(10)
Number of Exceptions : 1
Max Post Conversion Data Size: 12

What we see here is that some data will grow to 12 bytes in size (from the field 'Max Post Conversion Data Size'), and be 2 bytes larger than the column size of 10. Hence if you enlarge this column to CHAR(12) the 'Truncation' situation is resolved.
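
For this example, a minimal sketch of that fix, using the table and column names from the sample .err output above and explicit BYTE semantics:

ALTER TABLE RDOXNG.T_RDOX_DOCTEXT MODIFY (VER CHAR(12 BYTE));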

You can also use this procedure to get an overview of the "Max Post Conversion Data Size" for all columns that have "Truncation" data, based on the last Csscan result:

Note that when using the Csscan SUPPRESS parameter this procedure may give incomplete results (not all tables, or not the correct minimal needed data size).

conn / AS sysdba
SET serveroutput ON
DECLARE
  newmaxsz NUMBER;
BEGIN
  -- loop over every column flagged as "Truncation" (EXCEED_SIZE) in the last Csscan run
  FOR rec IN ( SELECT DISTINCT u.owner_name,
                               u.table_name,
                               u.column_name,
                               u.column_type,
                               u.owner_id,
                               u.table_id,
                               u.column_id,
                               u.column_intid
                 FROM csmv$errors u
                WHERE u.error_type = 'EXCEED_SIZE'
                ORDER BY u.owner_name, u.table_name, u.column_name )
  LOOP
    -- for each column, take the largest post-conversion byte size over all its rows
    SELECT MAX(cnvsize)
      INTO newmaxsz
      FROM csm$errors
     WHERE usr#    = rec.owner_id
       AND obj#    = rec.table_id
       AND col#    = rec.column_id
       AND intcol# = rec.column_intid;

    DBMS_OUTPUT.PUT_LINE(rec.owner_name ||'.'|| rec.table_name ||' ('|| rec.column_name ||') - '|| rec.column_type ||' - '|| newmaxsz ||' Bytes');
  END LOOP;
END;
/

This will give the minimal number of BYTES each column needs in order to accommodate the expansion.
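
For the truncation example above the output would be along these lines (illustrative only; the exact column_type string comes from the csmv$errors view):

RDOXNG.T_RDOX_DOCTEXT (VER) - CHAR - 12 Bytes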

To address this, the column must be altered to fit the new data size prior to the actual conversion, either:
* by enlarging the column size (in BYTES) to at least the "Max Post Conversion Data Size", or
* by using CHAR semantics. Note that when using CHAR semantics there are certain limits, see Examples and limits of BYTE and CHAR semantics usage.

Note that in some cases the expansion in BYTES is bigger than the maximum data length of the datatype, and then using CHAR semantics will not help either. This limit is 2000 BYTES for CHAR and 4000 BYTES for VARCHAR2.
In that case you need to either reduce the actual data or change to a datatype (like CLOB) that allows you to store that length.

B.3.A) When changing to CHAR semantics:

When using Csalter or Alter Database Character Set and going to CHAR semantics, this should be done after the characterset change but before you import back the exported convertible data (using for example the script in Note 313175.1 Changing columns to CHAR length semantics).

When using (full) export/import into another database with the new characterset, going to CHAR semantics can be done:
- before exporting the convertible data (using for example the script in Changing columns to CHAR length semantics), or
- before importing the data into the new database, by pre-creating all the tables with CHAR semantics in the new database and running imp with IGNORE=Y.

Note that simply setting NLS_LENGTH_SEMANTICS=CHAR in the init.ora will NOT work.

While it is enough for the actual conversion to change only the columns reported to have "Truncation" to CHAR semantics, it's recommended to use CHAR semantics for all columns when going to a varying width characterset like AL32UTF8.
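
Applied to the truncation example from section B.3, a minimal sketch of such a change (the defined length stays 10, but is now counted in characters rather than bytes):

ALTER TABLE RDOXNG.T_RDOX_DOCTEXT MODIFY (VER CHAR(10 CHAR));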

B.3.B) When enlarging the columns in BYTES:


When using Csalter or Alter Database Character Set and enlarging the columns using BYTE semantics this can be done:
- after the characterset change but before you import back the exported convertible data.
- before the characterset change and before exporting the convertible data.

When using export/import into another database with the new characterset, enlarging the columns using BYTE semantics can be done:
- before exporting the convertible data, or
- before importing the data into the new database, by pre-creating all the tables with the wider columns in BYTES and running imp with IGNORE=Y (see the sketch below).
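
A minimal sketch of the pre-create approach, reusing the truncation example from section B.3 (the column list is illustrative; IGNORE=Y makes imp load the data into the pre-created table instead of failing on the existing object):

-- in the target database, before the import:
CREATE TABLE RDOXNG.T_RDOX_DOCTEXT
( VER CHAR(12 BYTE)  -- widened to the Max Post Conversion Data Size
  -- , plus the remaining columns of the source table
);

$ imp system/<password> FILE=convertible.dmp FULL=Y IGNORE=Y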

B.4) LOSSY data.

This data does not have a valid code point in the source NLS_CHARACTERSET, or the target characterset you specified does not define that character. If no action is taken then this data will be 'lost' in the conversion: every "lossy" character will be converted to the "default replacement character". What that "default replacement character" is depends on the characterset; for US7ASCII this is a question mark "?", for the xx8ISO8859Px and xx8MSWIN125x charactersets this is an inverted question mark "¿".
Please note that every lossy character is converted to the same "default replacement character", hence once you have done a conversion that resulted in lossy data there is no way to "recover" the original data. If you have for example a US7ASCII database that contains the string "é" (not a valid US7ASCII code point), it will simply become "?" after the conversion, and afterwards there is no way to tell which character the "?" originally was.
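
To list the columns affected by lossy data you can again query the Csscan result tables. A minimal sketch, assuming the same csmv$errors view as above and assuming 'DATA_LOSS' is the error_type value used for lossy rows:

SELECT DISTINCT z.owner_name, z.table_name, z.column_name, z.column_type
FROM csmv$errors z
WHERE z.error_type = 'DATA_LOSS'
ORDER BY z.owner_name, z.table_name, z.column_name;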
