Java IO - Reader

Posted by 劍西樓 on 2017-03-21

Preface

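Java IO comes in two flavors, byte streams and readers/writers (character streams), and each has an input and an output side, giving four class families: InputStream, OutputStream, Reader, and Writer. Byte streams work in units of one byte and operate on bytes and byte arrays. Character streams work in units of one char, a 16-bit UTF-16 code unit, and operate on characters, character arrays, and strings.
Java stores characters internally as Unicode (UTF-16), so the character-stream classes are responsible for converting between externally encoded text and Java's internal representation. InputStreamReader and OutputStreamWriter are the bridges between byte streams and character streams. For text, a buffered character stream that handles a whole buffer per call is usually more efficient than reading one byte or one character at a time.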
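As a rough illustration of the byte/character distinction described above, the sketch below reads the same data once as raw bytes and once as decoded characters. The class name ByteVsCharDemo and its helper methods are made up for this example, and the two-character sample string is written with Unicode escapes so the source compiles regardless of file encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
class ByteVsCharDemo {
    // Counts the units returned by a byte stream: one byte per read().
    static int countBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts the units returned by a character stream: one char per read(),
    // after the bytes have been decoded as UTF-8.
    static int countChars(byte[] data) {
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two CJK characters; each takes three bytes in UTF-8.
        byte[] data = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data)); // 6
        System.out.println(countChars(data)); // 2
    }
}
```

The byte stream sees six units while the reader sees two, which is the conversion that InputStreamReader performs.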

0. Contents

  1. Reader
  2. BufferedReader
  3. InputStreamReader
  4. FileReader
  5. Summary

1. Reader

Reader is an abstract class. Its official description reads:

Abstract class for reading character streams. The only methods that a subclass must implement are read(char[], int, int) and close(). Most subclasses, however, will override some of the methods defined here in order to provide higher efficiency, additional functionality, or both.

1.1 Main methods

abstract void   close()
Closes the stream and releases any system resources associated with it.

void    mark(int readAheadLimit)
Marks the present position in the stream.

boolean markSupported()
Tells whether this stream supports the mark() operation.

int read()
Reads a single character.

int read(char[] cbuf)
Reads characters into an array.

abstract int    read(char[] cbuf, int off, int len)
Reads characters into a portion of an array.

int read(CharBuffer target)
Attempts to read characters into the specified character buffer.

boolean ready()
Tells whether this stream is ready to be read.

void    reset()
Resets the stream.

long    skip(long n)
Skips characters.

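Note that, unlike a stream, which reads into a byte[], a Reader reads into a char[].
Its common subclasses include BufferedReader and InputStreamReader. FileReader, a subclass of InputStreamReader, is also widely used. Each is introduced briefly below.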
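To make the method list above more concrete, here is a small sketch that exercises read(), read(char[], int, int), skip(), mark(), and reset() on a StringReader, which is a JDK Reader that supports marking. The class ReaderBasicsDemo and the method walk are illustrative names only, and the sample assumes the input has at least eight characters.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the JDK.
class ReaderBasicsDemo {
    // Walks through a Reader and records what each call returned.
    // Assumes text has at least 8 characters so no read hits end of stream.
    static String walk(String text) {
        StringBuilder log = new StringBuilder();
        try (Reader r = new StringReader(text)) {
            log.append((char) r.read());            // read(): one character
            char[] buf = new char[4];
            int n = r.read(buf, 0, buf.length);     // fill buf[0..3]
            log.append('|').append(buf, 0, n);
            if (r.markSupported()) {
                r.mark(16);                         // remember this position
                r.skip(2);                          // jump over two characters
                log.append('|').append((char) r.read());
                r.reset();                          // return to the mark
                log.append('|').append((char) r.read());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(walk("abcdefghij")); // a|bcde|h|f
    }
}
```

Note that read() returns an int so that -1 can signal end of stream. The cast to char is only safe after that check, or when the input length is known as it is here.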

2. BufferedReader

BufferedReader extends Reader, and its own API is very simple. The official description reads:

Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines.
The buffer size may be specified, or the default size may be used. The default is large enough for most purposes.

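In short:

BufferedReader reads text from a character stream and buffers it for efficiency. It can read single characters, arrays, and whole lines.
The buffer size can be specified, or the default can be used. In most cases the default is large enough.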

2.1. Constructors

BufferedReader has two constructors, declared as follows:

BufferedReader(Reader in)
Creates a buffering character-input stream that uses a default-sized input buffer.

BufferedReader(Reader in, int sz)
Creates a buffering character-input stream that uses an input buffer
 of the specified size.

The first takes only a Reader; the second also takes the buffer size.

Common ways to create one:

BufferedReader fileIn = new BufferedReader(new FileReader("d:/123.txt"));
BufferedReader stdIn = new BufferedReader(new InputStreamReader(System.in));

The first reads from a file; the second reads from standard input.
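The two constructors differ only in buffer size, which affects performance but not the characters returned. The sketch below reads the same text with a tiny explicit buffer and with the default one. BufferSizeDemo and readAll are hypothetical names, and treating a non-positive size as "use the default" is a convention of this example, not of BufferedReader (which rejects sizes of zero or less).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the JDK.
class BufferSizeDemo {
    // Reads everything from source through a BufferedReader.
    // bufferSize <= 0 means "use the default-sized buffer" (example convention).
    static String readAll(Reader source, int bufferSize) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = bufferSize > 0
                ? new BufferedReader(source, bufferSize)
                : new BufferedReader(source)) {
            char[] chunk = new char[64];
            int n;
            // read(char[], int, int) returns -1 at end of stream.
            while ((n = br.read(chunk, 0, chunk.length)) != -1) {
                sb.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "buffer size changes performance, not content";
        String small = readAll(new StringReader(text), 4);
        String dflt = readAll(new StringReader(text), 0);
        System.out.println(small.equals(dflt)); // true
    }
}
```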

2.2. Main methods

void    close()
Closes the stream and releases any system resources associated with it.

void    mark(int readAheadLimit)
Marks the present position in the stream.

boolean markSupported()
Tells whether this stream supports the mark() operation, which it does.

int read()
Reads a single character.

int read(char[] cbuf, int off, int len)
Reads characters into a portion of an array.

String  readLine()
Reads a line of text.

boolean ready()
Tells whether this stream is ready to be read.

void    reset()
Resets the stream to the most recent mark.

long    skip(long n)
Skips characters.

It offers three ways to read data: read(), read(char[] cbuf, int off, int len), and readLine(). Of these, readLine() is the most commonly used.
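Since readLine() is the method used most often, here is a minimal sketch of the usual loop. The names ReadLineDemo and readAllLines are invented for this example, and a StringReader stands in for a real file or socket.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the JDK.
class ReadLineDemo {
    // Collects every line from source. readLine() strips the line
    // terminator and returns null only at end of stream.
    static List<String> readAllLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line); // a blank line arrives as "", not null
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines =
                readAllLines(new StringReader("first\n\nthird\r\nfourth"));
        System.out.println(lines.size()); // 4
        System.out.println(lines);        // [first, , third, fourth]
    }
}
```

The loop tests only for null. Treating an empty string as the end of input would stop at the first blank line.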

3. InputStreamReader

The official description of InputStreamReader reads:

An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters using a specified charset. The charset that it uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.
Each invocation of one of an InputStreamReader's read() methods may cause one or more bytes to be read from the underlying byte-input stream. To enable the efficient conversion of bytes to characters, more bytes may be read ahead from the underlying stream than are necessary to satisfy the current read operation.
For top efficiency, consider wrapping an InputStreamReader within a BufferedReader. For example:

 BufferedReader in
   = new BufferedReader(new InputStreamReader(System.in));

In other words, InputStreamReader decodes bytes into characters.

3.1. Constructors

InputStreamReader(InputStream in)
Creates an InputStreamReader that uses the default charset.

InputStreamReader(InputStream in, Charset cs)
Creates an InputStreamReader that uses the given charset.

InputStreamReader(InputStream in, CharsetDecoder dec)
Creates an InputStreamReader that uses the given charset decoder.

InputStreamReader(InputStream in, String charsetName)
Creates an InputStreamReader that uses the named charset.

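As shown, every constructor except the first lets you specify the character encoding. InputStreamReader is therefore the usual fix for garbled text (mojibake): decode the bytes with the charset they were actually written in.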
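The effect of the charset argument can be seen by decoding the same bytes twice, once with the right charset and once with a wrong one. This is only a sketch: CharsetDecodeDemo and decode are hypothetical names, and ISO-8859-1 is chosen as the "wrong" charset simply because it maps every byte to exactly one character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
class CharsetDecodeDemo {
    // Decodes data with the given charset via InputStreamReader.
    static String decode(byte[] data, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (InputStreamReader isr =
                new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            int c;
            while ((c = isr.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two CJK characters
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Correct charset: the text round-trips.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(original)); // true

        // Wrong charset: each of the 6 bytes becomes its own character.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).length()); // 6
    }
}
```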

3.2. Main methods

void    close()
Closes the stream and releases any system resources associated with it.

String  getEncoding()
Returns the name of the character encoding being used by this stream.

int read()
Reads a single character.

int read(char[] cbuf, int offset, int length)
Reads characters into a portion of an array.

boolean ready()
Tells whether this stream is ready to be read.

3.3. Example code

InputStreamReader is often used to fix garbled text. Example code:

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    private void test() throws IOException {
        // try-with-resources closes the reader and the streams it wraps.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream("d:/123.txt"), StandardCharsets.UTF_8))) {
            String lineMsg;
            // readLine() returns null at end of stream. An empty string is
            // just a blank line, so it must not end the loop.
            while ((lineMsg = in.readLine()) != null) {
                System.out.println(lineMsg);
            }
        }
    }

The common charset names are listed at the end of this article.
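To see how the charset names listed at the end of this article differ in practice, the following sketch prints the bytes that a single letter encodes to under several of them. CharsetBytesDemo and hex are names made up for this example. Only the charsets exposed as constants in java.nio.charset.StandardCharsets are used, since those are available on every Java platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
class CharsetBytesDemo {
    // Encodes text with cs and renders the bytes as space-separated hex.
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("A", StandardCharsets.US_ASCII));   // 41
        System.out.println(hex("A", StandardCharsets.UTF_8));      // 41
        System.out.println(hex("A", StandardCharsets.UTF_16BE));   // 00 41
        System.out.println(hex("A", StandardCharsets.UTF_16LE));   // 41 00
        // When encoding, Java's "UTF-16" writes a big-endian byte-order mark first.
        System.out.println(hex("A", StandardCharsets.UTF_16));     // FE FF 00 41
    }
}
```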

4. FileReader

Its official description reads:

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
FileReader is meant for reading streams of characters. For reading streams of raw bytes, consider using a FileInputStream.
In short, FileReader is a convenience class for reading character files.

4.1. Constructors

FileReader(File file)
Creates a new FileReader, given the File to read from.

FileReader(FileDescriptor fd)
Creates a new FileReader, given the FileDescriptor to read from.

FileReader(String fileName)
Creates a new FileReader, given the name of the file to read from.

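As shown, each FileReader constructor identifies the file to read, by File, FileDescriptor, or file name. Since Java 11 there are also overloads that take a Charset.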
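A minimal sketch of FileReader in use follows. It writes a temporary file and reads its first line back. FileReaderDemo, writeSample, and firstLine are hypothetical names, and the sample content is plain ASCII so that the result does not depend on the platform's default charset, which is what these FileReader constructors use.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the JDK.
class FileReaderDemo {
    // Creates a small temporary text file to read from.
    static Path writeSample() {
        try {
            Path tmp = Files.createTempFile("reader-demo", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, "hello reader\nsecond line\n"
                    .getBytes(StandardCharsets.US_ASCII));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the first line using FileReader(File), wrapped for buffering.
    static String firstLine(Path file) {
        try (BufferedReader br =
                new BufferedReader(new FileReader(file.toFile()))) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine(writeSample())); // hello reader
    }
}
```

For non-ASCII files, passing an explicit charset (through InputStreamReader, or the Charset overloads on newer JDKs) is usually the safer choice.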

5. Summary

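1. BufferedReader reads text more efficiently by buffering.

2. InputStreamReader decodes bytes with an explicit charset, which is how garbled text is fixed.

3. FileReader is a convenient way to read a text file directly.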

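4. Common encodings

  • Seven-bit ASCII, also known as ISO646-US and as the Basic Latin block of the Unicode character set
    "US-ASCII"
  • ISO Latin Alphabet No. 1, also known as ISO-LATIN-1
    "ISO-8859-1"
  • Eight-bit UCS Transformation Format
    "UTF-8"
  • Sixteen-bit UCS Transformation Format, big-endian byte order (the high-order byte is stored at the lowest address)
    "UTF-16BE"
  • Sixteen-bit UCS Transformation Format, little-endian byte order (the low-order byte is stored at the lowest address)
    "UTF-16LE"
  • Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
    "UTF-16"
  • Extended Chinese character set (a superset of GB2312)
    "GBK"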