7. File Operations in Python

Posted by 曲~線 on 2018-09-17


1. Basic workflow of file operations

A computer system consists of three parts: hardware, the operating system, and application programs.

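When an application written in Python or any other language needs to store data permanently, the data must be saved to disk, which means the application has to operate hardware. As everyone knows, applications cannot operate hardware directly, and this is where the operating system comes in. The operating system wraps complex hardware operations in simple interfaces for users and applications. A file is the virtual concept the operating system gives applications for operating the disk; by operating on files, users and applications can store their data permanently.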

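With the concept of a file, we no longer need to think about the details of operating the disk. We only need to follow the workflow for operating on files: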

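# 1. Open the file, obtain a file handle, and assign it to a variable
f = open('a.txt', 'r', encoding='utf-8')  # the default open mode is 'r'

# 2. Operate on the file through the handle
data = f.read()

# 3. Close the file
f.close()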

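Notes on closing a file:

An open file involves two kinds of resources: the file opened at the operating-system level, and the variable in the application. When you are done with a file, both must be reclaimed, without missing either one. The way to reclaim them is: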

1. f.close()  # reclaims the file opened at the operating-system level

2. del f      # reclaims the application-level variable

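Here del f must come after f.close(); otherwise the name is gone while the file opened at the operating-system level may not have been closed yet, tying up resources for nothing.

Python's automatic garbage collection means we do not need to think about del f, which leaves one requirement: after finishing with a file, always remember f.close().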
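A minimal sketch of how the three steps can be written so that step 3 still runs when step 2 fails. The helper name read_text and the throwaway demo file are assumptions made for illustration, not part of the original text.

```python
import os
import tempfile


def read_text(path):
    """Open a file, read all of it, and close it even if read() raises."""
    f = open(path, 'r', encoding='utf-8')  # 1. open and keep the handle
    try:
        return f.read()                    # 2. operate through the handle
    finally:
        f.close()                          # 3. runs on success and on error


# Demo on a throwaway file; the directory and file name are hypothetical.
demo_path = os.path.join(tempfile.mkdtemp(), 'a.txt')
f = open(demo_path, 'w', encoding='utf-8')
f.write('hello')
f.close()

print(read_text(demo_path))  # prints: hello
```

The try/finally pair is what guarantees the close; without it, an exception raised by read() would skip f.close().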

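Even so, plenty of people still shamelessly forget f.close(). For the forgetful, we recommend the foolproof approach: use the with keyword and let it manage the context for us.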

with open('a.txt', 'w') as f:
    pass

with open('a.txt', 'r') as read_f, open('b.txt', 'w') as write_f:
    data = read_f.read()
    write_f.write(data)
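To see what with actually does, the following sketch checks the closed attribute of the handle after the block ends, once after a normal exit and once after an exception. The throwaway path and the variable names are assumptions for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'a.txt')  # hypothetical throwaway file

with open(path, 'w', encoding='utf-8') as f:
    f.write('hello')
closed_after_normal_exit = f.closed  # the with block closed the file on exit

try:
    with open(path, 'r', encoding='utf-8') as g:
        raise ValueError('simulated failure while the file is open')
except ValueError:
    pass
closed_after_exception = g.closed    # closed even though the body raised

print(closed_after_normal_exit, closed_after_exception)  # True True
```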


2. File encoding

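f=open(…) has the operating system open the file. If we do not pass an encoding to open, the default is whatever the operating system's locale dictates (the value of locale.getpreferredencoding(False)): typically GBK on a Simplified Chinese Windows system, and usually UTF-8 on Linux.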

# To avoid garbled text, open the file with the same encoding it was saved in.
f = open('a.txt', 'r', encoding='utf-8')
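The effect of a mismatched encoding can be reproduced in a few lines. This is a sketch under assumptions: the sample text, the throwaway path, and the choice of GBK as the wrong encoding are all picked for illustration.

```python
import locale
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'a.txt')  # hypothetical throwaway file
text = '你好'

# The encoding open() falls back to when none is given; it varies by system.
default_encoding = locale.getpreferredencoding(False)

with open(path, 'w', encoding='utf-8') as f:  # saved as UTF-8
    f.write(text)

with open(path, 'r', encoding='utf-8') as f:  # opened as UTF-8: matches
    same = f.read()

# Opened as GBK: the bytes are misread. errors='replace' keeps the demo from
# raising if some byte sequence happens not to be valid GBK.
with open(path, 'r', encoding='gbk', errors='replace') as f:
    garbled = f.read()

print(default_encoding)
print(same == text)     # True
print(garbled == text)  # False
```

Passing encoding explicitly on both the write and the read side removes the dependence on the system default.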

3. File open modes

file_handle = open('file path', 'mode')

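#1. The modes for opening a file are (text mode by default):

r: read-only mode [the default mode; the file must exist, otherwise an exception is raised]

w: write-only mode [not readable; the file is created if it does not exist; its content is emptied if it does]

a: append-only mode [not readable; the file is created if it does not exist; if it does, content is only appended]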

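#2. For non-text files we can only use b mode. "b" means operating on bytes (all files are stored as bytes anyway, so in this mode there is no need to think about the character encoding of a text file, the jpg format of an image file, or the avi format of a video file)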

rb

wb

ab

Note: when a file is opened in b mode, what you read is of type bytes, what you write must also be bytes, and an encoding cannot be specified.
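A short sketch of what the note means in practice. The file name a.bin and the flag variables are assumptions for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'a.bin')  # hypothetical throwaway file

with open(path, 'wb') as f:
    f.write('你好'.encode('utf-8'))  # encode str to bytes yourself before writing

with open(path, 'rb') as f:
    raw = f.read()                   # bytes, not str

text = raw.decode('utf-8')           # decode yourself when you need text

# Writing a str to a binary handle is rejected.
with open(path, 'ab') as f:
    try:
        f.write('text')
        str_write_rejected = False
    except TypeError:
        str_write_rejected = True

# So is passing an encoding together with b mode.
try:
    open(path, 'rb', encoding='utf-8')
    encoding_rejected = False
except ValueError:
    encoding_rejected = True

print(raw)   # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(text)  # 你好
print(str_write_rejected, encoding_rejected)  # True True
```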

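#3. '+' mode (adds one extra capability)

r+: read and write [readable, writable]

w+: write and read [writable, readable]

a+: append and read [writable, readable]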

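#4. The read/write, write/read, and append/read modes operating on bytes

r+b: read and write [readable, writable]

w+b: write and read [writable, readable]

a+b: append and read [writable, readable]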
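The behaviour of the modes listed above can be checked with a small experiment. This is a sketch, assuming a throwaway directory; the variable names are made up for illustration.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'a.txt')  # hypothetical throwaway file

# 'r' requires the file to exist.
try:
    open(path, 'r', encoding='utf-8')
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

# 'w' creates the file, and empties it again on every later open.
with open(path, 'w', encoding='utf-8') as f:
    f.write('first\n')
with open(path, 'w', encoding='utf-8') as f:
    f.write('second\n')

# 'a' keeps the existing content and adds to the end.
with open(path, 'a', encoding='utf-8') as f:
    f.write('third\n')

# A handle opened with 'a' (or 'w') cannot be read from.
with open(path, 'a', encoding='utf-8') as f:
    try:
        f.read()
        append_readable = True
    except io.UnsupportedOperation:
        append_readable = False

# 'r+' can do both; after read() the cursor sits at the end of the file.
with open(path, 'r+', encoding='utf-8') as f:
    before = f.read()
    f.write('fourth\n')

with open(path, 'r', encoding='utf-8') as f:
    final = f.read()

print(missing_raises, append_readable)  # True False
print(repr(before))                     # 'second\nthird\n'
print(repr(final))                      # 'second\nthird\nfourth\n'
```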

4. File methods

4.1 Common methods

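read(3):

  1. When the file is opened in text mode, it reads 3 characters

  2. When the file is opened in b mode, it reads 3 bytes

All other cursor movement within a file is measured in bytes, for example seek, tell, and truncate.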
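The difference between counting characters and counting bytes shows up as soon as the text is not ASCII. A small sketch, assuming a UTF-8 file holding four Chinese characters (three bytes each); the path and variable names are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'a.txt')  # hypothetical throwaway file

with open(path, 'w', encoding='utf-8') as f:
    f.write('你好世界')  # 4 characters, 12 bytes in UTF-8

with open(path, 'r', encoding='utf-8') as f:
    three_chars = f.read(3)  # text mode counts characters

with open(path, 'rb') as f:
    three_bytes = f.read(3)  # b mode counts bytes
    position = f.tell()      # byte offset of the cursor

print(three_chars)                  # 你好世
print(three_bytes)                  # b'\xe4\xbd\xa0'
print(three_bytes.decode('utf-8'))  # 你
print(position)                     # 3
```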

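Note:

  1. seek has three ways of moving, selected by the whence values 0, 1, and 2. Values 1 and 2 with a non-zero offset must be used in b mode, but in every mode the movement is counted in bytes.

  2. truncate cuts the file short, so the file must be opened in a writable mode. Do not open it with w or w+, though, because those modes empty the file immediately; try truncate in r+, a, or a+ mode to see its effect.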
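Both notes can be tried out on a ten-byte file. This is a sketch with a hypothetical throwaway path; the single-letter result names are only for the demo.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'a.txt')  # hypothetical throwaway file

with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    f.seek(3)        # whence=0: 3 bytes from the start
    a = f.read(2)    # cursor moves from 3 to 5
    f.seek(2, 1)     # whence=1: 2 bytes forward from the current position
    b = f.read(1)    # reads the byte at offset 7
    f.seek(-2, 2)    # whence=2: 2 bytes back from the end
    c = f.read()

# In text mode a non-zero offset relative to the current position is refused.
with open(path, 'r', encoding='utf-8') as f:
    try:
        f.seek(2, 1)
        text_relative_seek_rejected = False
    except io.UnsupportedOperation:
        text_relative_seek_rejected = True

# truncate needs a writable mode that does not empty the file first.
with open(path, 'r+b') as f:
    f.truncate(4)    # keep only the first 4 bytes

with open(path, 'rb') as f:
    d = f.read()

print(a, b, c)                      # b'34' b'7' b'89'
print(text_relative_seek_rejected)  # True
print(d)                            # b'0123'
```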

4.2 All methods (for reference)

class file(object):

    def close(self):  # real signature unknown; restored from __doc__
        # Close the file
        """
        close() -> None or (perhaps) an integer. Close the file.

        Sets data attribute .closed to True. A closed file cannot be used for
        further I/O operations. close() may be called more than once without
        error. Some kinds of file objects (for example, opened by popen())
        may return an exit status upon closing.
        """

    def fileno(self):  # real signature unknown; restored from __doc__
        # File descriptor
        """
        fileno() -> integer "file descriptor".

        This is needed for lower-level file interfaces, such os.read().
        """
        return 0

    def flush(self):  # real signature unknown; restored from __doc__
        # Flush the file's internal buffer
        """ flush() -> None. Flush the internal I/O buffer. """
        pass

    def isatty(self):  # real signature unknown; restored from __doc__
        # Check whether the file is a tty device
        """ isatty() -> true or false. True if the file is connected to a tty device. """
        return False

    def next(self):  # real signature unknown; restored from __doc__
        # Get the next line; raises an error if there is none
        """ x.next() -> the next value, or raise StopIteration """
        pass

    def read(self, size=None):  # real signature unknown; restored from __doc__
        # Read the specified number of bytes
        """
        read([size]) -> read at most size bytes, returned as a string.

        If the size argument is negative or omitted, read until EOF is reached.
        Notice that when in non-blocking mode, less data than what was requested
        may be returned, even if no size parameter was given.
        """
        pass

    def readinto(self):  # real signature unknown; restored from __doc__
        # Read into a buffer; do not use, it is going to be dropped
        """ readinto() -> Undocumented. Don't use this; it may go away. """
        pass

    def readline(self, size=None):  # real signature unknown; restored from __doc__
        # Read a single line only
        """
        readline([size]) -> next line from the file, as a string.

        Retain newline. A non-negative size argument limits the maximum
        number of bytes to return (an incomplete line may be returned then).
        Return an empty string at EOF.
        """
        pass

    def readlines(self, size=None):  # real signature unknown; restored from __doc__
        # Read all data and return it as a list split on line breaks
        """
        readlines([size]) -> list of strings, each a line from the file.

        Call readline() repeatedly and return a list of the lines so read.
        The optional size argument, if given, is an approximate bound on the
        total number of bytes in the lines returned.
        """
        return []

    def seek(self, offset, whence=None):  # real signature unknown; restored from __doc__
        # Set the position of the file pointer
        """
        seek(offset[, whence]) -> None. Move to new file position.

        Argument offset is a byte count. Optional argument whence defaults to
        0 (offset from start of file, offset should be >= 0); other values are 1
        (move relative to current position, positive or negative), and 2 (move
        relative to end of file, usually negative, although many platforms allow
        seeking beyond the end of a file). If the file is opened in text mode,
        only offsets returned by tell() are legal. Use of other offsets causes
        undefined behavior.
        Note that not all file objects are seekable.
        """
        pass

    def tell(self):  # real signature unknown; restored from __doc__
        # Get the current pointer position
        """ tell() -> current file position, an integer (may be a long integer). """
        pass

    def truncate(self, size=None):  # real signature unknown; restored from __doc__
        # Truncate the data, keeping only what comes before the given position
        """
        truncate([size]) -> None. Truncate the file to at most size bytes.

        Size defaults to the current file position, as returned by tell().
        """
        pass

    def write(self, p_str):  # real signature unknown; restored from __doc__
        # Write content
        """
        write(str) -> None. Write string str to file.

        Note that due to buffering, flush() or close() may be needed before
        the file on disk reflects the data written.
        """
        pass

    def writelines(self, sequence_of_strings):  # real signature unknown; restored from __doc__
        # Write a list of strings to the file
        """
        writelines(sequence_of_strings) -> None. Write the strings to the file.

        Note that newlines are not added. The sequence can be any iterable object
        producing strings. This is equivalent to calling write() for each string.
        """
        pass

    def xreadlines(self):  # real signature unknown; restored from __doc__
        # Can be used to read the file line by line instead of all at once
        """
        xreadlines() -> returns self.

        For backward compatibility. File objects now include the performance
        optimizations previously implemented in the xreadlines module.
        """
        pass

(The stubs above are the Python 2.x file class.)

class TextIOWrapper(_TextIOBase):
    """
    Character and line based layer over a BufferedIOBase object, buffer.

    encoding gives the name of the encoding that the stream will be
    decoded or encoded with. It defaults to locale.getpreferredencoding(False).

    errors determines the strictness of encoding and decoding (see
    help(codecs.Codec) or the documentation for codecs.register) and
    defaults to "strict".

    newline controls how line endings are handled. It can be None, '',
    '\n', '\r', and '\r\n'. It works as follows:

    * On input, if newline is None, universal newlines mode is
      enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
      these are translated into '\n' before being returned to the
      caller. If it is '', universal newline mode is enabled, but line
      endings are returned to the caller untranslated. If it has any of
      the other legal values, input lines are only terminated by the given
      string, and the line ending is returned to the caller untranslated.

    * On output, if newline is None, any '\n' characters written are
      translated to the system default line separator, os.linesep. If
      newline is '' or '\n', no translation takes place. If newline is any
      of the other legal values, any '\n' characters written are translated
      to the given string.

    If line_buffering is True, a call to flush is implied when a call to
    write contains a newline character.
    """

    def close(self, *args, **kwargs):  # real signature unknown
        # Close the file
        pass

    def fileno(self, *args, **kwargs):  # real signature unknown
        # File descriptor
        pass

    def flush(self, *args, **kwargs):  # real signature unknown
        # Flush the file's internal buffer
        pass

    def isatty(self, *args, **kwargs):  # real signature unknown
        # Check whether the file is a tty device
        pass

    def read(self, *args, **kwargs):  # real signature unknown
        # Read the specified number of characters
        pass

    def readable(self, *args, **kwargs):  # real signature unknown
        # Whether the file is readable
        pass

    def readline(self, *args, **kwargs):  # real signature unknown
        # Read a single line only
        pass

    def seek(self, *args, **kwargs):  # real signature unknown
        # Set the position of the file pointer
        pass

    def seekable(self, *args, **kwargs):  # real signature unknown
        # Whether the pointer can be moved
        pass

    def tell(self, *args, **kwargs):  # real signature unknown
        # Get the pointer position
        pass

    def truncate(self, *args, **kwargs):  # real signature unknown
        # Truncate the data, keeping only what comes before the given position
        pass

    def writable(self, *args, **kwargs):  # real signature unknown
        # Whether the file is writable
        pass

    def write(self, *args, **kwargs):  # real signature unknown
        # Write content
        pass

    def __getstate__(self, *args, **kwargs):  # real signature unknown
        pass

    def __init__(self, *args, **kwargs):  # real signature unknown
        pass

    @staticmethod  # known case of __new__
    def __new__(*args, **kwargs):  # real signature unknown
        """ Create and return a new object. See help(type) for accurate signature. """
        pass

    def __next__(self, *args, **kwargs):  # real signature unknown
        """ Implement next(self). """
        pass

    def __repr__(self, *args, **kwargs):  # real signature unknown
        """ Return repr(self). """
        pass

    buffer = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    closed = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    encoding = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    errors = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    line_buffering = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    name = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    newlines = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    _CHUNK_SIZE = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    _finalizing = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default

(The stubs above are the Python 3.x TextIOWrapper class.)
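A few of the Python 3.x methods and attributes in the listing can be exercised directly on a handle returned by open. This is a sketch on a hypothetical throwaway file; the tuple names are made up for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'a.txt')  # hypothetical throwaway file

with open(path, 'w', encoding='utf-8') as f:
    write_caps = (f.readable(), f.writable(), f.seekable())
    f.write('line 1\nline 2\n')

with open(path, 'r', encoding='utf-8') as f:
    read_caps = (f.readable(), f.writable(), f.seekable())
    first = next(f)         # __next__: one line, newline kept
    rest = f.readlines()    # the remaining lines as a list
    encoding = f.encoding   # the encoding the wrapper decodes with

print(write_caps)        # (False, True, True)
print(read_caps)         # (True, False, True)
print(repr(first))       # 'line 1\n'
print(rest)              # ['line 2\n']
print(encoding, f.closed)  # utf-8 True
```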

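5. Modifying files

File data lives on disk, so there is only overwriting; there is no such thing as modifying in place. The file editing we see every day is a simulated effect, and there are two ways to implement it:

Method 1: load the entire content of the file from disk into memory, make the changes in memory, and then write the result from memory back over the disk copy (this is what word, vim, notepad++ and other editors do)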

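import os  # operating-system interface module

with open('a.txt') as read_f, open('.a.txt.swap', 'w') as write_f:
    data = read_f.read()  # read everything into memory; slow if the file is large
    data = data.replace('alex', 'SB')  # make the change in memory
    write_f.write(data)  # write it all to the new file in one go

os.remove('a.txt')  # delete the original file
os.rename('.a.txt.swap', 'a.txt')  # rename the new file to the original name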

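Method 2: read the content of the file from disk into memory one line at a time, write each line to a new file as soon as it has been modified, and finally replace the original file with the new one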

import os

with open('a.txt') as read_f, open('.a.txt.swap', 'w') as write_f:
    for line in read_f:
        line = line.replace('alex', 'SB')
        write_f.write(line)

os.remove('a.txt')
os.rename('.a.txt.swap', 'a.txt')
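Method 2 can be wrapped in a reusable function. The sketch below is one possible variant, not the code from this article: the function name replace_in_file, the swap-file naming, and the explicit encoding parameter are assumptions, and it uses os.replace in place of os.remove followed by os.rename so that the original file is swapped out in a single step.

```python
import os
import tempfile


def replace_in_file(path, old, new, encoding='utf-8'):
    """Rewrite the file at path line by line, replacing old with new."""
    swap_path = path + '.swap'  # hypothetical naming for the temporary copy
    try:
        with open(path, 'r', encoding=encoding) as read_f, \
                open(swap_path, 'w', encoding=encoding) as write_f:
            for line in read_f:
                write_f.write(line.replace(old, new))
        # Both files are closed here; now put the new copy in place.
        os.replace(swap_path, path)
    except BaseException:
        # Leave the original untouched and clean up the half-written copy.
        if os.path.exists(swap_path):
            os.remove(swap_path)
        raise


# Demo on a throwaway file.
demo_path = os.path.join(tempfile.mkdtemp(), 'a.txt')
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('alex one\nalex two\n')

replace_in_file(demo_path, 'alex', 'SB')

with open(demo_path, 'r', encoding='utf-8') as f:
    result = f.read()

print(repr(result))  # 'SB one\nSB two\n'
```

Only one line is held in memory at a time, so the approach also works for files too large to read in one go.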

6. Exercises

1. The file a.txt has the content below; each line holds a product's name, price, and quantity.

apple 10 3

tesla 100000 1

mac 3000 2

lenovo 30000 3

chicken 10 3

Using code, build the following data structure: [{'name':'apple','price':10,'amount':3},{'name':'tesla','price':100000,'amount':1}……] and compute the total price.
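One possible approach to exercise 1, offered as a sketch rather than the reference answer. The function names load_goods and total_price are made up, and the demo writes the sample data to a throwaway file instead of assuming a.txt already exists.

```python
import os
import tempfile


def load_goods(path):
    """Parse 'name price amount' lines into a list of dicts."""
    goods = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            name, price, amount = fields
            goods.append({'name': name, 'price': int(price), 'amount': int(amount)})
    return goods


def total_price(goods):
    """Sum price * amount over all items."""
    return sum(item['price'] * item['amount'] for item in goods)


# Demo with the sample data from the exercise.
demo_path = os.path.join(tempfile.mkdtemp(), 'a.txt')
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('apple 10 3\ntesla 100000 1\nmac 3000 2\nlenovo 30000 3\nchicken 10 3\n')

goods = load_goods(demo_path)
print(goods[0])            # {'name': 'apple', 'price': 10, 'amount': 3}
print(total_price(goods))  # 196060
```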

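2. Given the following file:

----------

alex has a lump on his head.

alex is secretly a drag queen.

Who says alex is sb?

You guys are hilarious; however awesome alex gets, he cannot hide his veteran-loser vibe.

----------

Replace every alex in the file with uppercase SB.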

