Python multiprocessing Process: do parent and child processes share a file object?

Published 2017-02-15

multiprocessing is Python's multi-process module, and Process is its workhorse. The problem discussed today is still one that deserves our attention.

Straight to the code:

# Python 2.7
from multiprocessing import Process

err_file = 'error1.log'
err_fd = open(err_file, 'w')  # opened in the parent, default (full) buffering

def put(fd):
    print "PUT"
    fd.write("hello, func put write\n")
    print "END"

if __name__ == '__main__':
    p_list = []
    for i in range(1):
        p_list.append(Process(target=put, args=(err_fd,)))
    for p in p_list:
        p.start()
    for p in p_list:
        p.join()

The intent of the code above is clear: use multiprocessing.Process to start one child process that runs put. What put does is equally clear: it prints PUT and END, and writes "hello, func put write" to the file error1.log.

So in theory the output should be exactly that: PUT and END on the terminal, and error1.log containing the line "hello, func put write". But things rarely go as expected. The actual result is:

[root@iZ23pynfq19Z ~]# py27 2.py ; cat error1.log 
PUT
END
[root@iZ23pynfq19Z ~]#

What!? Why is error1.log empty!?

Let's tweak the code slightly and watch something magical happen:

# Python 2.7
from multiprocessing import Process

err_file = 'error1.log'
err_fd = open(err_file, 'w')  # opened in the parent, default (full) buffering

def put(fd):
    print "PUT"
    fd.write("hello, func put write\n")
    fd.write("o" * 4075)  # the magic line
    print "END"

if __name__ == '__main__':
    p_list = []
    for i in range(1):
        p_list.append(Process(target=put, args=(err_fd,)))
    for p in p_list:
        p.start()
    for p in p_list:
        p.join()

Output:

[root@iZ23pynfq19Z ~]# py27 2.py ; cat error1.log 
PUT
END
hello, func put write
o....(4075 of them)
[root@iZ23pynfq19Z ~]#

Feeling a little dumbfounded?

Two questions now come to mind:

Why can't the first program write that line, while the second one can?
What on earth is that 4075?

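Before answering, we need to be clear about the three buffering modes of the standard I/O library: fully buffered, line buffered, and unbuffered.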

For details, see my earlier post: https://my.oschina.net/u/2291453/blog/806102

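Because we are writing to a regular file, standard I/O uses full buffering. In other words, data sits in a user-space buffer and is only pushed to the system's write queue once the buffer has been filled.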

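That settles both questions at once. The mysterious 'o' characters filled the whole buffer, so our content was flushed to the write queue. And where does 4075 come from? With a 4096-byte buffer, it is 4096 - len("hello, func put write\n") + 1 = 4096 - 22 + 1 = 4075. Why the +1? Because exactly filling the buffer is not enough: the data has to exceed it before a write is triggered.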
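The arithmetic can be checked with a tiny sketch. The 4096-byte buffer size is an assumption taken from the explanation above (it is what the author's machine used; other systems may differ), and `padding_to_overflow` is a hypothetical helper name used only for illustration.

```python
# Assumption: the stdio buffer on the author's machine is 4096 bytes.
BUFFER_SIZE = 4096
message = "hello, func put write\n"

def padding_to_overflow(buffer_size, already_written):
    """Smallest number of extra bytes that pushes the total past buffer_size."""
    return buffer_size - already_written + 1

print(len(message))                                     # 22
print(padding_to_overflow(BUFFER_SIZE, len(message)))   # 4075
```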

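So we already have an answer. If we want to write a file from a multiprocessing.Process in the way shown above, there are three ways to make it work:

Fill the buffer
Call flush() manually
Make the file object unbuffered

The first was demonstrated above and the second needs little explanation, so let's briefly look at the third: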

From the official Python documentation:
open(name[, mode[, buffering]])
  ...
  The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 
  1 means line buffered, any other positive value means use a buffer of (approximately) that 
  size (in bytes). A negative buffering means to use the system default, which is usually line 
  buffered for tty devices and fully buffered for other files. If omitted, the system default is 
  used. [2]

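In other words, open lets us set buffering to 0, which selects unbuffered mode. Every write then goes straight to the write queue instead of into a buffer. (This is the slowest of the three options.)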
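Here is a minimal sketch of the second and third options, written for Python 3 rather than the Python 2.7 used in this post. Several things in it are assumptions made for illustration: the "fork" start method (Unix only), the temporary directory, binary mode (Python 3 refuses buffering=0 for text files), and the helper name `write_from_child`.

```python
import multiprocessing
import os
import tempfile

MESSAGE = b"hello, func put write\n"

def put(fd, do_flush):
    fd.write(MESSAGE)
    if do_flush:
        fd.flush()  # hand the buffered bytes to the kernel before the child exits

def write_from_child(buffering, do_flush):
    """Open a file in the parent, write to it from a forked child,
    and return what actually reached the file."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-only start method
    path = os.path.join(tempfile.mkdtemp(), "error1.log")
    fd = open(path, "wb", buffering=buffering)  # -1 = default buffering, 0 = unbuffered
    try:
        p = ctx.Process(target=put, args=(fd, do_flush))
        p.start()
        p.join()
    finally:
        fd.close()  # the parent's own buffer is empty, so this adds nothing
    with open(path, "rb") as f:
        return f.read()

if __name__ == "__main__":
    print(write_from_child(-1, False))  # buffered, no flush
    print(write_from_child(-1, True))   # buffered, explicit flush
    print(write_from_child(0, False))   # unbuffered
```

Only the first call should come back empty; the other two should contain the message, because the bytes left the child's user-space buffer before it exited.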

---------------- divider ----------------

Now that we have covered the symptom and the workarounds, let's dig a little deeper.

We have all seen a file get written correctly even though we never explicitly closed the file object or explicitly called flush. So how does that work?

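In fact, when a program shuts down normally, it does some tidying up for us on the way out: closing open files, cleaning up temporary files, releasing memory, and so on. Thanks to this good habit, our buffered data is flushed to the write queue when the file is closed, and the file contents are not lost.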

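With that in mind, let's revisit the problem. When the child process runs put, presumably the file is never explicitly closed as the child exits, so the data still sitting in the buffer is simply lost.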

Let's follow the trail and look at how Process is implemented:

multiprocessing/process.py
    def start(self):
        '''
        Start child process
        '''
        assert self._popen is None, 'cannot start a process twice'
        assert self._parent_pid == os.getpid(), \
               'can only start a process object created by current process'
        assert not _current_process._daemonic, \
               'daemonic processes are not allowed to have children'
        _cleanup()
        if self._Popen is not None:
            Popen = self._Popen
        else:
            from .forking import Popen
        self._popen = Popen(self)
        _current_process._children.add(self)

Now let's see what Popen does:

multiprocessing/forking.py
    class Popen(object):

        def __init__(self, process_obj):
            sys.stdout.flush()
            sys.stderr.flush()
            self.returncode = None

            self.pid = os.fork()
            if self.pid == 0:
                if 'random' in sys.modules:
                    import random
                    random.seed()
                code = process_obj._bootstrap()
                sys.stdout.flush()
                sys.stderr.flush()
                os._exit(code)

The key is the final os._exit(code). Why is it the key? Because the way the process exits decides which cleanup work gets done.

So what is os._exit? It is simply the C library's _exit, which gives us a good excuse to read up on it:

https://my.oschina.net/u/2291453/blog/813259

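From the link above we can see fairly clearly that _exit() and exit() are quite different things. _exit() is blunt: it discards whatever is in user space and goes straight into the kernel. exit(), on the other hand, patiently cleans up for us first.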
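The difference is easy to reproduce without multiprocessing at all. The sketch below (Python 3) runs a small throwaway script in a separate interpreter and ends it either with os._exit() or with sys.exit(). The script text, the "hard"/"soft" labels, and the helper name `run` are all made up for this illustration.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child script: write one line, then exit in one of two ways.
CHILD = """
import os, sys
f = open(sys.argv[1], "w")
f.write("hello, func put write\\n")
if sys.argv[2] == "hard":
    os._exit(0)   # leave immediately: the buffered line is never flushed
sys.exit(0)       # normal interpreter shutdown: the file is flushed and closed
"""

def run(how):
    """Run the child script and return what ended up in its log file."""
    path = os.path.join(tempfile.mkdtemp(), "error1.log")
    subprocess.run([sys.executable, "-c", CHILD, path, how], check=True)
    with open(path) as fh:
        return fh.read()

if __name__ == "__main__":
    print(repr(run("hard")))
    print(repr(run("soft")))
```

The "hard" run should leave an empty file and the "soft" run should leave the full line, which matches the _exit() versus exit() behaviour described above.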

So here is a hypothesis to test: what would happen if Popen did not exit through os._exit()?

Luckily, sys.exit() gives us the exit()-style shutdown we want. No time to lose, let's try it!

multiprocessing/forking.py
    class Popen(object):

        def __init__(self, process_obj):
            sys.stdout.flush()
            sys.stderr.flush()
            self.returncode = None

            self.pid = os.fork()
            if self.pid == 0:
                if 'random' in sys.modules:
                    import random
                    random.seed()
                code = process_obj._bootstrap()
                sys.stdout.flush()
                sys.stderr.flush()
                #os._exit(code)
                sys.exit(code)

For the test, we go back to the very first version of the script, the one without the 'o' padding:

[root@iZ23pynfq19Z ~]# python 2.py ; cat error1.log 
PUT
END
hello, func put write

As we can see, the line really does get written this time, which shows that the explanation above holds up.

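That said, it is best not to go patching the library source. It is the result of years of refinement by experienced developers, and they may well have written it this way on purpose to avoid other problems. It is better to keep our own code disciplined and steer clear of patterns like this one that do not look quite right in the first place.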
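One way to read "keep our own code disciplined" is to pass the file name to the child and let the child open and close the file itself. This is only a sketch of that idea in Python 3, not something from the original experiment; the "fork" start method, the append mode, and the helper name `run` are assumptions chosen for the example.

```python
import multiprocessing
import os
import tempfile

def put(path):
    # The with block closes the file, and therefore flushes it,
    # before the child process reaches os._exit().
    with open(path, "a") as fd:
        fd.write("hello, func put write\n")

def run(path):
    """Start one child that writes to path, then return the file contents."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-only start method
    p = ctx.Process(target=put, args=(path,))
    p.start()
    p.join()
    with open(path) as fh:
        return fh.read()

if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "error1.log")
    print(repr(run(log_path)))
```

Because no open file object crosses the process boundary, nothing depends on how the child happens to exit.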
