RFC 1951: Partial Translation and Original Text (1/2) (Repost)

What follows covers only sections 1, 2, and 3 of the document; for the rest, please refer to the original.
1. Introduction

1.1. Purpose

The purpose of this specification is to define a lossless
compressed data format that:

* Is independent of CPU type, operating system, file system,
and character set, and hence can be used for interchange;

* Can be produced or consumed, even for an arbitrarily long
sequentially presented input data stream, using only an a
priori bounded amount of intermediate storage, and hence
can be used in data communications or similar structures
such as Unix filters;

* Compresses data with efficiency comparable to the best
currently available general-purpose compression methods,
and in particular considerably better than the "compress"
program;

* Can be implemented readily in a manner not covered by
patents, and hence can be practiced freely;

* Is compatible with the file format produced by the current
widely used gzip utility, in that conforming decompressors
will be able to read data produced by the existing
compressor.

The data format defined by this specification does not attempt to:

* Allow random access to compressed data;

* Compress specialized data (e.g., raster graphics) as well
as the best currently available specialized algorithms.

A simple counting argument shows that no lossless compression
algorithm can compress every possible input data set. For the
format defined here, the worst case expansion is 5 bytes per 32K-
byte block, i.e., a size increase of 0.015% for large data sets.
English text usually compresses by a factor of 2.5 to 3;
executable files usually compress somewhat less; graphical data
such as raster images may compress much more.
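
The 0.015% figure is simply the 5-byte overhead divided by the 32K-byte block size. As a rough check, the sketch below redoes that arithmetic and then looks at the overhead of an uncompressed block in practice. It assumes Python's standard `zlib` module (not something the specification mentions) and assumes that compression level 0 with `wbits=-15` yields a raw deflate stream of stored blocks; `store_only` is a hypothetical helper name.

```python
import zlib

# Worst-case expansion quoted above: 5 bytes of overhead per 32K-byte block.
BLOCK_SIZE = 32 * 1024
OVERHEAD = 5
expansion_percent = OVERHEAD / BLOCK_SIZE * 100  # roughly 0.015


def store_only(data: bytes) -> bytes:
    """Hypothetical helper: wrap data in deflate blocks without compressing it.

    Assumption: level 0 with wbits=-15 asks zlib for a raw deflate stream
    (no zlib or gzip wrapper) made of stored blocks.
    """
    compressor = zlib.compressobj(0, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


payload = b"hello"
stored = store_only(payload)
overhead_bytes = len(stored) - len(payload)

# A raw deflate stream is read back by passing the same negative wbits.
restored = zlib.decompress(stored, -15)
```

For an input small enough to fit in a single stored block, `overhead_bytes` should come out as 5, matching the per-block overhead the text quotes.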

1.2. Intended audience
This specification is intended for use by implementors of software
to compress data into "deflate" format and/or decompress data from
"deflate" format.

The text of the specification assumes a basic background in
programming at the level of bits and other primitive data
representations. Familiarity with the technique of Huffman coding
is helpful but not required.

1.3. Scope
The specification specifies a method for representing a sequence
of bytes as a (usually shorter) sequence of bits, and a method for
packing the latter bit sequence into bytes.
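
To make the "packing" step concrete, here is a minimal sketch of turning a bit sequence into bytes. It assumes the convention the RFC gives later, in its section 3.1.1 (outside this excerpt), that data elements fill each byte starting from the least-significant bit. `pack_bits` is a hypothetical helper name, and padding the final byte with zero bits is a choice made for this illustration.

```python
def pack_bits(bits):
    """Pack an iterable of 0/1 values into bytes.

    Each byte is filled from its least-significant bit upward
    (assumed convention, see the lead-in).
    """
    out = bytearray()
    current = 0  # byte being assembled
    count = 0    # number of bits already placed in `current`
    for bit in bits:
        current |= (bit & 1) << count
        count += 1
        if count == 8:
            out.append(current)
            current = 0
            count = 0
    if count:
        # Final partial byte: the unused high bits stay zero.
        out.append(current)
    return bytes(out)


# Ten bits: the first eight fill byte 0, the last two start byte 1.
packed = pack_bits([1, 0, 0, 0, 0, 0, 0, 0, 1, 1])
```

Here the first bit lands in the lowest position of the first byte, and the two leftover bits occupy the two lowest positions of the second byte.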

1.4. Compliance

Unless otherwise indicated below, a compliant decompressor must be
able to accept and decompress any data set that conforms to all
the specifications presented here; a compliant compressor must
produce data sets that conform to all the specifications presented
here.

1.5. Definitions of terms and conventions used

Byte: 8 bits stored or transmitted as a unit (same as an octet).
For this specification, a byte is exactly 8 bits, even on machines
which store a character on a number of bits different from eight.
See below, for the numbering of bits within a byte.

String: a sequence of arbitrary bytes.

1.6. Changes from previous versions

There have been no technical changes to the deflate format since
version 1.1 of this specification. In version 1.2, some
terminology was changed. Version 1.3 is a conversion of the
specification to RFC style.

2. Compressed representation overview

A compressed data set consists of a series of blocks, corresponding
to successive blocks of input data. The block sizes are arbitrary,
except that non-compressible blocks are limited to 65,535 bytes.

Each block is compressed using a combination of the LZ77 algorithm
and Huffman coding. The Huffman trees for each block are independent
of those for previous or subsequent blocks; the LZ77 algorithm may
use a reference to a duplicated string occurring in a previous block,
up to 32K input bytes before.

Each block consists of two parts: a pair of Huffman code trees that
describe the representation of the compressed data part, and a
compressed data part. (The Huffman trees themselves are compressed
using Huffman encoding.) The compressed data consists of a series of
elements of two types: literal bytes (of strings that have not been
detected as duplicated within the previous 32K input bytes), and
pointers to duplicated strings, where a pointer is represented as a
pair <length, backward distance>. The representation used in the
"deflate" format limits distances to 32K bytes and lengths to 258
bytes, but does not limit the size of a block, except for
uncompressible blocks, which are limited as noted above.
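
The two element types can be illustrated with a small expansion routine. This is only a sketch of the idea, not the bit-level decoder the specification defines: the elements are assumed to have been decoded already into Python values, with an `int` standing for a literal byte and a `(length, distance)` tuple standing for a pointer. The names `expand`, `MAX_DISTANCE`, and `MAX_LENGTH` are hypothetical.

```python
MAX_DISTANCE = 32 * 1024  # distance limit stated above
MAX_LENGTH = 258          # length limit stated above


def expand(elements):
    """Rebuild the original bytes from literals and <length, distance> pairs."""
    out = bytearray()
    for element in elements:
        if isinstance(element, int):
            # Literal byte: copied to the output unchanged.
            out.append(element)
            continue
        length, distance = element
        if not 1 <= distance <= min(len(out), MAX_DISTANCE):
            raise ValueError("distance points outside the available history")
        # Only the upper bound on length is stated in the text above.
        if not 1 <= length <= MAX_LENGTH:
            raise ValueError("length out of range")
        # Copy one byte at a time so that a pointer may overlap the bytes
        # it is producing (length greater than distance).
        for _ in range(length):
            out.append(out[-distance])
    return bytes(out)


# Two literals, then "copy 4 bytes starting 2 bytes back".
result = expand(list(b"ab") + [(4, 2)])
```

Copying byte by byte is what lets a short repeating pattern be described by a pointer longer than its distance: here the pair (4, 2) extends "ab" to "ababab".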

Each type of value (literals, distances, and lengths) in the
compressed data is represented using a Huffman code, using one code
tree for literals and lengths and a separate code tree for distances.
The code trees for each block appear in a compact form just before
the compressed data for that block.
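
The split into two code trees can be sketched with toy prefix codes. Everything in the tables below is made up for illustration and is NOT the coding deflate actually defines: real deflate codes are built per block, lengths and distances carry extra bits, and the end-of-block marker is specified in a later section of the RFC. The point is only the control flow: a symbol from the literal/length code is read first, and a distance code is read only when that symbol is a length.

```python
# Toy prefix codes (hypothetical, for illustration only).
LITLEN_CODES = {
    "0": ("lit", ord("a")),
    "10": ("lit", ord("b")),
    "110": ("len", 3),
    "111": ("end", None),
}
DIST_CODES = {"0": 1, "1": 2}


def read_symbol(bits, pos, table):
    """Read bits from position `pos` until they match a code in `table`."""
    code = ""
    while pos < len(bits):
        code += bits[pos]
        pos += 1
        if code in table:
            return table[code], pos
    raise ValueError("ran out of bits inside a code")


def decode(bits):
    """Decode a string of '0'/'1' characters into literals and pairs."""
    elements = []
    pos = 0
    while True:
        (kind, value), pos = read_symbol(bits, pos, LITLEN_CODES)
        if kind == "end":
            return elements
        if kind == "lit":
            elements.append(value)
        else:
            # A length is always followed by a code from the distance tree.
            distance, pos = read_symbol(bits, pos, DIST_CODES)
            elements.append((value, distance))


# literal 'a', literal 'b', <length 3, distance 2>, end marker
decoded = decode("0" + "10" + "110" + "1" + "111")
```

Because literals and lengths share one code, the decoder never needs a separate flag to tell which kind of element comes next.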
