[20231013]CLOB型別的編碼問題.txt

lfree發表於2023-10-16

[20231013]CLOB型別的編碼問題.txt

--//昨天想看一串unicode的內容,看了一些連結:



National Character Set(NLS_NCHAR_CHARACTERSET) defines the encoding of NCHAR, NVARCHAR2, and NCLOB columns and is in 9i
and up consistently Unicode. eg. AL16UTF16
國家字符集(NLS_NCHAR_CHARACTERSET)定義了NCHAR、NVARCHAR2和NCLOB列的編碼,並在9i中表示和一致的統一碼。例如AL16UTF16

Character Set(NLS_CHARACTERSET) defines the encoding of CHAR, VARCHAR2, LONG, and CLOB columns, these can also be used
for storing Unicode. eg AL32UTF8 or UTF8
字符集(NLS_CHARACTERSET)定義了CHAR、VARCHAR2、LONG和CLOB列的編碼,也可以使用它們用於儲存Unicode。如AL32UTF8或UTF8
--//注:感覺這裡不對,clob列的編碼好像也是AL16UTF16.我給測試看看是否這樣的情況.

The national Character set (NLS_NCHAR_CHARACTERSET) is used for data stored in NCHAR, NVARCHAR2, and NCLOB datatypes and
is a character set that is defined in addition to the (standard) database character set (NLS_CHARACTERSET), which is
used for CHAR, VARCHAR2, LONG and CLOB datatypes.
國家字符集(NLS_NCHAR_CHARACTERSET)用於儲存在NCHAR、NVARCHAR2和NCLOB資料型別和中的資料,和是在(標準)資料庫字符集
(NLS_CHARACTERSET)之外定義的字符集,即用於CHAR、VARCHAR2、LONG和CLOB資料型別。

1.環境:
$ env | grep NLS
NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
NLS_TIME_TZ_FORMATx=HH24.MI.SSXFF TZH:TZM
NLS_TIMESTAMP_TZ_FORMAT=YYYY-MM-DD HH24:MI:SS.FF TZH:TZM
NLS_TIMESTAMP_FORMAT=YYYY-MM-DD HH24:MI:SS.FF
NLS_DATE_FORMAT=YYYY-MM-DD HH24:MI:SS

SCOTT@book> @ ver1
PORT_STRING                    VERSION        BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

SCOTT@book> select parameter,value from NLS_DATABASE_PARAMETERS where parameter in ('NLS_CHARACTERSET','NLS_NCHAR_CHARACTERSET');
PARAMETER              VALUE
---------------------- ----------
NLS_CHARACTERSET       ZHS16GBK
NLS_NCHAR_CHARACTERSET AL16UTF16

2.測試:
SCOTT@book> CREATE TABLE t1 ( id number,vc varchar2(30),nc nvarchar2(30),cc clob);
Table created.

SCOTT@book> insert into t1 values(1,'a測試中文a','a測試中文b','a測試中文c');
1 row created.

SCOTT@book> commit ;
Commit complete.

SCOTT@book> alter system checkpoint;
System altered.

SCOTT@book> select rowid from t1;
ROWID
------------------
AAAWsoAAEAAAAKvAAB

SCOTT@book> @ rowid AAAWsoAAEAAAAKvAAB
    OBJECT       FILE      BLOCK        ROW ROWID_DBA            DBA                  TEXT
---------- ---------- ---------- ---------- -------------------- -------------------- ----------------------------------------
     92968          4        687          1  0x10002AF           4,687                alter system dump datafile 4 block 687 ;

SCOTT@book> alter system dump datafile 4 block 687 ;
System altered.

3.檢查跟蹤檔案:
Block header dump:  0x010002af
 Object id on Block? Y
 seg/obj: 0x16b28  csc: 0x03.1873920b  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x10002a8 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.004.0000d5b4  0x00c00bf1.2efa.1e  --U-    1  fsc 0x0000.18739220
0x02   0x000a.00e.0000d5dd  0x00c00bf1.2efa.1d  --U-    1  fsc 0x0043.1873920d
bdba: 0x010002af
data_block_dump,data header at 0x7fe7c95a4264
===============
tsiz: 0x1f98
hsiz: 0x16
pbl: 0x7fe7c95a4264
     76543210
flag=--------
ntab=1
nrow=2
frre=-1
fsbo=0x16
fseo=0x1f04
avsp=0x1eee
tosp=0x1f33
0xe:pti[0]  nrow=2  offs=0
0x12:pri[0] offs=0x1f53
0x14:pri[1] offs=0x1f04
block_row_dump:
tab 0, row 0, @0x1f53
tl: 2 fb: --HDFL-- lb: 0x2
tab 0, row 1, @0x1f04
tl: 79 fb: --H-FL-- lb: 0x1  cc: 4
col  0: [ 2]  c1 02
col  1: [10]  61 b2 e2 ca d4 d6 d0 ce c4 61
col  2: [12]  00 61 6d 4b 8b d5 4e 2d 65 87 00 62
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
col  3: [48]
 00 54 00 01 02 0c 80 00 00 02 00 00 00 01 00 00 06 72 79 80 00 1c 09 00 00
 00 00 00 00 0c 00 00 00 00 00 01 00 61 6d 4b 8b d5 4e 2d 65 87 00 63
                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
LOB
Locator:
  Length:        84(48)
  Version:        1
  Byte Length:    2
  LobID: 00.00.00.01.00.00.06.72.79.80
  Flags[ 0x02 0x0c 0x80 0x00 ]:
    Type: CLOB
    Storage: BasicFile
    Enable Storage in Row
    Characterset Format: IMPLICIT
    Partitioned Table: No
    Options: VaringWidthReadWrite
  Inode:
    Size:     28
    Flag:     0x09 [ Valid DataInRow ]
    Future:   0x00 (should be '0x00')
    Blocks:   0
    Bytes:    12
    Version:  00000.0000000001
    Inline data[12]
Dump of memory from 0x00007FE7C95A61AB to 0x00007FE7C95A61B7

--//61 b2 e2 ca d4 d6 d0 ce c4 61 = a測試中文a , varchar2能對上.
--//注意看下劃線可以發現nvarchar2,clob的編碼一樣的.都是unicode big endian編碼.

$ echo fffe 00 61 6d 4b 8b d5 4e 2d 65 87 00 63 | xxd -r -p  > a.txt
$ file a.txt
a.txt: Big-endian UTF-16 Unicode character data, with no line terminators

--//再上傳到windows的機器,使用記事本開啟就可以驗證編碼就是Big-endian UTF-16 Unicode character data.
--//我有點無法理解,intel系列機器不是採用Little endian嗎,為什麼nvarchar2,COLB使用的卻是Big endian.

SCOTT@book> select * from V$TRANSPORTABLE_PLATFORM;
PLATFORM_ID PLATFORM_NAME                     ENDIAN_FORMAT
----------- --------------------------------- --------------
          1 Solaris[tm] OE (32-bit)           Big
          2 Solaris[tm] OE (64-bit)           Big
          7 Microsoft Windows IA (32-bit)     Little
         10 Linux IA (32-bit)                 Little
          6 AIX-Based Systems (64-bit)        Big
          3 HP-UX (64-bit)                    Big
          5 HP Tru64 UNIX                     Little
          4 HP-UX IA (64-bit)                 Big
         11 Linux IA (64-bit)                 Little
         15 HP Open VMS                       Little
          8 Microsoft Windows IA (64-bit)     Little
          9 IBM zSeries Based Linux           Big
         13 Linux x86 64-bit                  Little
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~        
         16 Apple Mac OS                      Big
         12 Microsoft Windows x86 64-bit      Little
         17 Solaris Operating System (x86)    Little
         18 IBM Power Based Linux             Big
         19 HP IA Open VMS                    Little
         20 Solaris Operating System (x86-64) Little
         21 Apple Mac OS (x86-64)             Little
20 rows selected.

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/267265/viewspace-2989216/,如需轉載,請註明出處,否則將追究法律責任。

相關文章