RFC 1952 (GZIP file format specification), partial excerpt (repost)

What follows is only part of RFC 1952; for the rest, refer to the original document.
2. Detailed specification

2.1. Overall conventions

In the diagrams below, a box like this represents one byte:

+---+
|   |
+---+

A box like this represents a variable number of bytes:

+==============+
|              |
+==============+

Bytes stored within a computer do not have a "bit order", since
they are always treated as a unit. However, a byte considered as
an integer between 0 and 255 does have a most- and least-
significant bit, and since we write numbers with the most-
significant digit on the left, we also write bytes with the most-
significant bit on the left. In the diagrams below, we number the
bits of a byte so that bit 0 is the least-significant bit, i.e.,
the bits are numbered:

+--------+
|76543210|
+--------+
This document does not address the issue of the order in which
bits of a byte are transmitted on a bit-sequential medium, since
the data format described here is byte- rather than bit-oriented.

Within a computer, a number may occupy multiple bytes. All
multi-byte numbers in the format described here are stored with
the least-significant byte first (at the lower memory address).
For example, the decimal number 520 is stored as:

    0        1
+--------+--------+
|00001000|00000010|
+--------+--------+
 ^        ^
 |        |
 |        + more significant byte = 2 x 256
 + less significant byte = 8
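
The byte order described above can be checked with a short sketch. It uses Python's standard struct module, which is this example's choice and not something the RFC prescribes; the variable names are illustrative only.

```python
import struct

# "<H" means: little-endian, unsigned 16-bit integer.
# The least-significant byte (8) is written first, then the
# more significant byte (2, worth 2 x 256).
encoded = struct.pack("<H", 520)

# Rebuild the number by hand from the two stored bytes.
rebuilt = encoded[0] + 256 * encoded[1]

# Or let struct do the same thing.
decoded = struct.unpack("<H", encoded)[0]
```

Here encoded holds the two bytes 0x08 and 0x02, in that order, matching the diagram.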

2.2. File format
A gzip file consists of a series of "members" (compressed data
sets). The format of each member is specified in the following
section. The members simply appear one after another in the file,
with no additional information before, between, or after them.
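
Because members are simply concatenated, two independently compressed streams joined end to end should form a valid gzip file. The sketch below illustrates this with Python's standard gzip module (an assumption of this example, not part of the RFC); the sample strings are arbitrary.

```python
import gzip

# Compress two pieces of data separately; each result is one complete member.
first_member = gzip.compress(b"hello, ")
second_member = gzip.compress(b"world")

# A gzip file is just the members one after another, with nothing in between.
combined = first_member + second_member

# Python's gzip.decompress reads every member and joins the outputs.
restored = gzip.decompress(combined)
```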

2.3. Member format
Each member has the following structure:

+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+

(if FLG.FEXTRA set)

+---+---+=================================+
| XLEN  |...XLEN bytes of "extra field"...| (more-->)
+---+---+=================================+

(if FLG.FNAME set)

+=========================================+
|...original file name, zero-terminated...| (more-->)
+=========================================+

(if FLG.FCOMMENT set)

+===================================+
|...file comment, zero-terminated...| (more-->)
+===================================+

(if FLG.FHCRC set)

+---+---+
| CRC16 |
+---+---+

+=======================+
|...compressed blocks...| (more-->)
+=======================+

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
|     CRC32     |     ISIZE     |
+---+---+---+---+---+---+---+---+
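
As a rough illustration of the first diagram, the ten fixed bytes at the start of a member can be unpacked in one call. This is a minimal sketch using Python's struct module; the function name parse_fixed_header and the sample bytes are made up for the example.

```python
import struct

def parse_fixed_header(data: bytes) -> dict:
    """Unpack ID1, ID2, CM, FLG, MTIME, XFL and OS from the first 10 bytes."""
    if len(data) < 10:
        raise ValueError("a gzip member starts with at least 10 header bytes")
    # "<" = little-endian, no padding; B = 1 byte, I = 4 bytes (MTIME).
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", data[:10])
    return {"ID1": id1, "ID2": id2, "CM": cm, "FLG": flg,
            "MTIME": mtime, "XFL": xfl, "OS": os_code}

# Hand-built sample header: deflate, no flags, MTIME = 520, XFL = 2, OS = 3.
sample = bytes([31, 139, 8, 0]) + struct.pack("<I", 520) + bytes([2, 3])
fields = parse_fixed_header(sample)
```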

2.3.1. Member header and trailer
ID1 (IDentification 1)
ID2 (IDentification 2)
These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
(0x8b, \213), to identify the file as being in gzip format.

CM (Compression Method)
This identifies the compression method used in the file. CM
= 0-7 are reserved. CM = 8 denotes the "deflate"
compression method, which is the one customarily used by
gzip and which is documented elsewhere.
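
A reader might check the three identifying bytes roughly as follows. This is only a sketch: the function name check_identification is hypothetical, and rejecting every CM other than 8 is this example's choice for a deflate-only reader.

```python
def check_identification(header: bytes) -> None:
    """Raise ValueError unless the data starts with ID1, ID2 and CM = 8."""
    if len(header) < 3:
        raise ValueError("too short to contain ID1, ID2 and CM")
    if header[0] != 0x1F or header[1] != 0x8B:
        raise ValueError("ID1/ID2 mismatch: not gzip format")
    if header[2] != 8:
        # CM = 0-7 are reserved; only "deflate" (8) is handled here.
        raise ValueError("unsupported compression method %d" % header[2])

# Passes silently for a well-formed start of a member.
check_identification(b"\x1f\x8b\x08")
```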

FLG (FLaGs)
This flag byte is divided into individual bits as follows:

bit 0 FTEXT
bit 1 FHCRC
bit 2 FEXTRA
bit 3 FNAME
bit 4 FCOMMENT
bit 5 reserved
bit 6 reserved
bit 7 reserved
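
The bit assignments above translate directly into bit masks. The sketch below decodes a FLG byte into named booleans; the constant names mirror the RFC's flag names, while decode_flags and RESERVED_MASK are names invented for this example.

```python
# Bit n of the FLG byte has the value 1 << n.
FTEXT    = 1 << 0
FHCRC    = 1 << 1
FEXTRA   = 1 << 2
FNAME    = 1 << 3
FCOMMENT = 1 << 4
RESERVED_MASK = 0xE0  # bits 5, 6 and 7

def decode_flags(flg: int) -> dict:
    """Return each named flag as a boolean, plus the raw reserved bits."""
    return {
        "FTEXT": bool(flg & FTEXT),
        "FHCRC": bool(flg & FHCRC),
        "FEXTRA": bool(flg & FEXTRA),
        "FNAME": bool(flg & FNAME),
        "FCOMMENT": bool(flg & FCOMMENT),
        "reserved": flg & RESERVED_MASK,
    }

# 0x08 has only bit 3 set, so only FNAME is reported as present.
flags = decode_flags(0x08)
```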

If FTEXT is set, the file is probably ASCII text. This is
an optional indication, which the compressor may set by
checking a small amount of the input data to see whether any
non-ASCII characters are present. In case of doubt, FTEXT
is cleared, indicating binary data. For systems which have
different file formats for ASCII text and binary data, the
decompressor can use FTEXT to choose the appropriate format.
We deliberately do not specify the algorithm used to set
this bit, since a compressor always has the option of
leaving it cleared and a decompressor always has the option
of ignoring it and letting some other program handle issues
of data conversion.

If FHCRC is set, a CRC16 for the gzip header is present,
immediately before the compressed data. The CRC16 consists
of the two least significant bytes of the CRC32 for all
bytes of the gzip header up to and not including the CRC16.
[The FHCRC bit was never set by versions of gzip up to
1.2.4, even though it was documented with a different
meaning in gzip 1.2.4.]
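
The CRC16 rule can be sketched as taking the low 16 bits of an ordinary CRC-32 over the preceding header bytes. The example below uses Python's zlib.crc32; the helper name header_crc16 and the hand-built header are assumptions made for illustration.

```python
import struct
import zlib

def header_crc16(header_bytes: bytes) -> int:
    """Low 16 bits of the CRC-32 of all header bytes before the CRC16 field."""
    return zlib.crc32(header_bytes) & 0xFFFF

# Hypothetical minimal header: FLG has only FHCRC (bit 1) set,
# MTIME = 0, XFL = 0, OS = 255 (unknown).
header = bytes([31, 139, 8, 0x02]) + struct.pack("<I", 0) + bytes([0, 255])

# Like every multi-byte number in the format, it is stored
# least-significant byte first.
crc16_field = struct.pack("<H", header_crc16(header))
```

If optional fields (extra field, file name, comment) were present, their bytes would be part of header_bytes as well, since they also precede the CRC16.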

If FEXTRA is set, optional extra fields are present, as
described in a following section.

If FNAME is set, an original file name is present,
terminated by a zero byte. The name must consist of ISO
8859-1 (LATIN-1) characters; on operating systems using
EBCDIC or any other character set for file names, the name
must be translated to the ISO LATIN-1 character set. This
is the original name of the file being compressed, with any
directory components removed, and, if the file being
compressed is on a file system with case insensitive names,
forced to lower case. There is no original file name if the
data was compressed from a source other than a named file;
for example, if the source was stdin on a Unix system, there
is no file name.
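
Reading the zero-terminated name amounts to scanning for the first zero byte and decoding what precedes it as ISO 8859-1. A minimal sketch follows; the helper name read_zero_terminated and the sample bytes are hypothetical.

```python
def read_zero_terminated(data: bytes, offset: int):
    """Return (text, next_offset) for a zero-terminated LATIN-1 string."""
    end = data.index(b"\x00", offset)          # raises ValueError if unterminated
    text = data[offset:end].decode("latin-1")  # every byte value maps to one character
    return text, end + 1                       # skip the terminating zero byte

# 0xE9 is "e with acute accent" in ISO 8859-1.
name, next_offset = read_zero_terminated(b"caf\xe9.txt\x00rest", 0)
```

The same helper would work for the file comment described next, since it uses the same encoding and terminator.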

If FCOMMENT is set, a zero-terminated file comment is
present. This comment is not interpreted; it is only
intended for human consumption. The comment must consist of
ISO 8859-1 (LATIN-1) characters. Line breaks should be
denoted by a single line feed character (10 decimal).

Reserved FLG bits must be zero.

MTIME (Modification TIME)
This gives the most recent modification time of the original
file being compressed. The time is in Unix format, i.e.,
seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this
may cause problems for MS-DOS and other systems that use
local rather than Universal time.) If the compressed data
did not come from a file, MTIME is set to the time at which
compression started. MTIME = 0 means no time stamp is
available.
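
Decoding MTIME could look like the sketch below, which treats 0 as "no time stamp" and otherwise converts the seconds count to a UTC datetime. The function name decode_mtime is an assumption of this example.

```python
import struct
from datetime import datetime, timezone

def decode_mtime(raw: bytes):
    """Convert the 4 MTIME bytes to an aware UTC datetime, or None if MTIME = 0."""
    seconds = struct.unpack("<I", raw)[0]   # unsigned, least-significant byte first
    if seconds == 0:
        return None                         # no time stamp is available
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# 86400 seconds after the epoch is midnight on Jan. 2, 1970.
one_day = decode_mtime(struct.pack("<I", 86400))
missing = decode_mtime(b"\x00\x00\x00\x00")
```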

XFL (eXtra FLags)
These flags are available for use by specific compression
methods. The "deflate" method (CM = 8) sets these flags as
follows:

XFL = 2 - compressor used maximum compression,
          slowest algorithm
XFL = 4 - compressor used fastest algorithm

OS (Operating System)
This identifies the type of file system on which compression
took place. This may be useful in determining end-of-line
convention for text files. The currently defined values are
as follows:

0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
1 - Amiga
2 - VMS (or OpenVMS)
3 - Unix
4 - VM/CMS
5 - Atari TOS
6 - HPFS filesystem (OS/2, NT)
7 - Macintosh
8 - Z-System
9 - CP/M
10 - TOPS-20
11 - NTFS filesystem (NT)
12 - QDOS
13 - Acorn RISCOS
255 - unknown
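
For display purposes, the table above maps naturally onto a dictionary lookup. This is only a sketch: the names OS_NAMES and os_name are invented here, and how to report a value missing from the table is this example's choice.

```python
OS_NAMES = {
    0: "FAT filesystem (MS-DOS, OS/2, NT/Win32)",
    1: "Amiga",
    2: "VMS (or OpenVMS)",
    3: "Unix",
    4: "VM/CMS",
    5: "Atari TOS",
    6: "HPFS filesystem (OS/2, NT)",
    7: "Macintosh",
    8: "Z-System",
    9: "CP/M",
    10: "TOPS-20",
    11: "NTFS filesystem (NT)",
    12: "QDOS",
    13: "Acorn RISCOS",
    255: "unknown",
}

def os_name(code: int) -> str:
    """Return the table entry for an OS byte, or a note that it is not listed."""
    return OS_NAMES.get(code, "not listed (%d)" % code)

label = os_name(3)
```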

XLEN (eXtra LENgth)
If FLG.FEXTRA is set, this gives the length of the optional
extra field. See below for details.

CRC32 (CRC-32)
This contains a Cyclic Redundancy Check value of the
uncompressed data computed according to CRC-32 algorithm
used in the ISO 3309 standard and in section 8.1.1.6.2 of
ITU-T recommendation V.42. (See for
ordering ISO documents. See for an
online version of ITU-T V.42.)

ISIZE (Input SIZE)
This contains the size of the original (uncompressed) input
data modulo 2^32.
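
The two trailer fields can be compared against the decompressed data along these lines. The sketch assumes a file containing a single member, so that the trailer is the last 8 bytes; the function name check_trailer is hypothetical, and zlib.crc32 is Python's CRC-32 implementation.

```python
import gzip
import struct
import zlib

def check_trailer(member: bytes, uncompressed: bytes) -> bool:
    """Compare the CRC32 and ISIZE fields with values computed from the data."""
    stored_crc32, stored_isize = struct.unpack("<II", member[-8:])
    expected_crc32 = zlib.crc32(uncompressed) & 0xFFFFFFFF
    expected_isize = len(uncompressed) % 2**32   # size modulo 2^32
    return stored_crc32 == expected_crc32 and stored_isize == expected_isize

data = b"example data for the trailer check"
member = gzip.compress(data)
trailer_ok = check_trailer(member, data)
```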

2.3.1.1. Extra field
If the FLG.FEXTRA bit is set, an "extra field" is present in
the header, with total length XLEN bytes. It consists of a
series of subfields, each of the form:

+---+---+---+---+==================================+
|SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
+---+---+---+---+==================================+

SI1 and SI2 provide a subfield ID, typically two ASCII letters
with some mnemonic value. Jean-Loup Gailly
<gzip@prep.ai.mit.edu> is maintaining a registry of subfield
IDs; please send him any subfield ID you wish to use. Subfield
IDs with SI2 = 0 are reserved for future use. The following
IDs are currently defined:

SI1 SI2 Data
---------- ---------- ----
0x41 ('A') 0x70 ('P') Apollo file type information

LEN gives the length of the subfield data, excluding the 4
initial bytes.
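
Walking the subfields of an extra field might be sketched as below. Each step reads the 4 initial bytes (SI1, SI2, two-byte LEN) and then LEN bytes of data. The function name parse_extra_field and the subfield ID "XY" are made up for the example and are not registered IDs.

```python
import struct

def parse_extra_field(extra: bytes) -> list:
    """Split an extra field into (two-byte ID, subfield data) pairs."""
    subfields = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        si1, si2, length = struct.unpack_from("<BBH", extra, pos)
        pos += 4                      # LEN does not count these 4 bytes
        if pos + length > len(extra):
            raise ValueError("subfield data runs past XLEN")
        subfields.append((bytes([si1, si2]), extra[pos:pos + length]))
        pos += length
    return subfields

# Two hypothetical subfields: "XY" with 3 data bytes, "XZ" with none.
extra = b"XY" + struct.pack("<H", 3) + b"abc" + b"XZ" + struct.pack("<H", 0)
parsed = parse_extra_field(extra)
```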

2.3.1.2. Compliance
A compliant compressor must produce files with correct ID1,
ID2, CM, CRC32, and ISIZE, but may set all the other fields in
the fixed-length part of the header to default values (255 for
OS, 0 for all others). The compressor must set all reserved
bits to zero.


A compliant decompressor must check ID1, ID2, and CM, and
provide an error indication if any of these have incorrect
values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
at least so it can skip over the optional fields if they are
present. It need not examine any other part of the header or
trailer; in particular, a decompressor may ignore FTEXT and OS
and always produce binary output, and still be compliant. A
compliant decompressor must give an error indication if any
reserved bit is non-zero, since such a bit could indicate the
presence of a new field that would cause subsequent data to be
interpreted incorrectly.
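
The decompressor requirements above can be sketched as a function that validates the mandatory fields and skips the optional ones to find where the compressed blocks begin. This is an illustrative outline, not a complete decoder: the name skip_header is hypothetical, treating any CM other than 8 as an error is a choice suited to a deflate-only reader, and the header CRC16 is skipped without being verified.

```python
import gzip
import struct
import zlib

FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10
RESERVED_MASK = 0xE0

def skip_header(member: bytes) -> int:
    """Return the offset of the first byte of the compressed blocks."""
    if len(member) < 10:
        raise ValueError("truncated header")
    id1, id2, cm, flg = member[0], member[1], member[2], member[3]
    if (id1, id2) != (31, 139):
        raise ValueError("incorrect ID1/ID2")
    if cm != 8:
        raise ValueError("incorrect or unsupported CM")
    if flg & RESERVED_MASK:
        raise ValueError("reserved FLG bit is non-zero")
    pos = 10                                    # MTIME, XFL and OS are ignored
    if flg & FEXTRA:
        (xlen,) = struct.unpack_from("<H", member, pos)
        pos += 2 + xlen                         # XLEN itself plus the extra field
    if flg & FNAME:
        pos = member.index(b"\x00", pos) + 1    # skip the zero-terminated name
    if flg & FCOMMENT:
        pos = member.index(b"\x00", pos) + 1    # skip the zero-terminated comment
    if flg & FHCRC:
        pos += 2                                # skip CRC16 (not verified here)
    return pos

member = gzip.compress(b"payload")
start = skip_header(member)
# Raw deflate data sits between the header and the 8-byte trailer.
output = zlib.decompress(member[start:-8], wbits=-15)
```

Note that FTEXT and OS are never consulted, which the text above explicitly allows.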
