The difference between os.walk and os.path.walk

Posted by perfychi on 2013-03-24
1. Directory structure:
/home/oracle/python
     |-- os_path_walk.py
     |-- os_walk.py
     |-- script1.py
     |-- script1.pyc
     |-- who.py.bak
     |-- test1
     |     |-- who.py
     |-- test2
           |-- who.py

[oracle@shanxi python]$ ls -lR /home/oracle/python/
/home/oracle/python/:
total 28
-rw-r--r--  1 oracle dba  341 Mar 24 09:47 os_path_walk.py
-rw-r--r--  1 oracle dba  384 Mar 24 09:44 os_walk.py
-rw-r--r--  1 oracle dba   53 Mar 15 12:04 script1.py
-rw-r--r--  1 oracle dba  187 Mar  8 16:45 script1.pyc
drwxr-xr-x  2 oracle dba 4096 Mar 23 19:08 test1
drwxr-xr-x  2 oracle dba 4096 Mar 23 19:10 test2
-rw-r--r--  1 oracle dba  144 Mar 23 10:00 who.py.bak

/home/oracle/python/test1:
total 4
-rw-r--r--  1 oracle dba 144 Mar 23 19:08 who.py

/home/oracle/python/test2:
total 4
-rw-r--r--  1 oracle dba 144 Mar 23 19:10 who.py


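Note: below, "loop" always means looping over the levels of the recursive directory hierarchy, not the kind of loop that takes values out of a list. You get the idea ^_^
2. The function os.path.walk()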

>>> help(os.path.walk)
Help on function walk in module ntpath:

walk(top, func, arg)
    Directory tree walk with callback function.

    For each directory in the directory tree rooted at top (including top
    itself, but excluding '.' and '..'), call func(arg, dirname, fnames).
    dirname is the name of the directory, and fnames a list of the names of
    the files and subdirectories in dirname (excluding '.' and '..').  func
    may modify the fnames list in-place (e.g. via del or slice assignment),
    and walk will only recurse into the subdirectories whose names remain in
    fnames; this can be used to implement a filter, or to impose a specific
    order of visiting.  No semantics are defined for, or required of, arg,
    beyond that arg is always passed to func.  It can be used, e.g., to pass
    a filename pattern, or a mutable object designed to accumulate
    statistics.  Passing None for arg is common.
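To make the callback protocol in this help text concrete, here is a minimal Python 3 sketch of roughly how os.path.walk() behaves. os.path.walk() was removed in Python 3 in favor of os.walk(), so `path_walk` and `skip_test2` are hypothetical names and the sketch is an approximation, not the library's own code. It lists a directory, hands the list to the callback, then recurses only into the subdirectories still present in that list. That is why deleting an entry inside the callback prunes the walk.

```python
import os
import tempfile

def path_walk(top, func, arg):
    """Hypothetical Python 3 approximation of Python 2's os.path.walk()."""
    try:
        names = os.listdir(top)  # '.' and '..' are never included
    except OSError:
        return  # unreadable directories are silently skipped
    func(arg, top, names)  # func may edit names in place to prune or reorder
    for name in names:
        path = os.path.join(top, name)
        # Recurse only into real directories; symbolic links are not followed.
        if os.path.isdir(path) and not os.path.islink(path):
            path_walk(path, func, arg)

def skip_test2(visited, dirname, names):
    """Callback: record each directory, and prune 'test2' by editing names in place."""
    visited.append(os.path.basename(dirname))
    if "test2" in names:
        names.remove("test2")  # path_walk will never enter test2

with tempfile.TemporaryDirectory() as root:
    for sub in ("test1", "test2"):
        os.mkdir(os.path.join(root, sub))
    visited = []
    path_walk(root, skip_test2, visited)
    top_name = os.path.basename(root)

print(visited[1:])  # ['test1']
```

Here `arg` is a mutable list used to accumulate results, one of the uses the help text suggests for it.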


For example:
[oracle@shanxi python]$ more  os_path_walk.py      
import os,os.path
def VisitDir(arg,dirname,names):
        print "%s, arg=%s" % (type(arg), arg)
        print "%s, dirname=%s" % (type(dirname), dirname)
        print "%s, names=%s" % (type(names), names)
        print "\n"
        for filespath in names:
                print os.path.join(dirname,filespath)
        print "-----------------"
if __name__=="__main__":
        path="/home/oracle/python"
        os.path.walk(path,VisitDir,('chi'))


[oracle@shanxi python]$ python os_path_walk.py
<type 'str'>, arg=chi
<type 'str'>, dirname=/home/oracle/python
<type 'list'>, names=['os_path_walk.py', 'test1', 'test2', 'who.py.bak', 'script1.py', 'script1.pyc', 'os_walk.py']


/home/oracle/python/os_path_walk.py
/home/oracle/python/test1
/home/oracle/python/test2
/home/oracle/python/who.py.bak
/home/oracle/python/script1.py
/home/oracle/python/script1.pyc
/home/oracle/python/os_walk.py
-----------------
<type 'str'>, arg=chi
<type 'str'>, dirname=/home/oracle/python/test1
<type 'list'>, names=['who.py']


/home/oracle/python/test1/who.py
-----------------
<type 'str'>, arg=chi
<type 'str'>, dirname=/home/oracle/python/test2
<type 'list'>, names=['who.py']


/home/oracle/python/test2/who.py
-----------------

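Notes:
(1) As you can see, the names argument of VisitDir(arg, dirname, names) contains both the subdirectories and the files of the current directory. This argument alone cannot tell you which entries are files and which are directories unless you add a check (os.path.isfile or os.path.isdir).
(2) The callback VisitDir(arg, dirname, names) contains no loop over directories. It produces output for several directories only because os.path.walk() calls the callback once for each directory it visits.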
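Note (1) can be illustrated with a short Python 3 sketch. In Python 2's os.path.walk() the names list passed to the callback is the result of os.listdir() on that directory, so the sketch calls os.listdir() directly. The helper `split_names` is a hypothetical name. Be aware that os.path.isdir() follows symbolic links, so a link to a directory is counted as a directory here.

```python
import os
import tempfile

def split_names(dirname, names):
    """Split a mixed list of entry names into (subdirectories, files), both sorted."""
    dirs, files = [], []
    for name in names:
        # names holds bare names, so join them with dirname before testing.
        if os.path.isdir(os.path.join(dirname, name)):
            dirs.append(name)
        else:
            files.append(name)
    return sorted(dirs), sorted(files)

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "test1"))
    open(os.path.join(root, "script1.py"), "w").close()
    open(os.path.join(root, "who.py.bak"), "w").close()
    dirs, files = split_names(root, os.listdir(root))

print(dirs, files)  # ['test1'] ['script1.py', 'who.py.bak']
```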


3. The function os.walk()
>>> help(os.walk)
Help on function walk in module os:

walk(top, topdown=True, onerror=None, followlinks=False)
    Directory tree generator.

    For each directory in the directory tree rooted at top (including top
    itself, but excluding '.' and '..'), yields a 3-tuple

        dirpath, dirnames, filenames

    dirpath is a string, the path to the directory.  dirnames is a list of
    the names of the subdirectories in dirpath (excluding '.' and '..').
    filenames is a list of the names of the non-directory files in dirpath.
    Note that the names in the lists are just names, with no path components.
    To get a full path (which begins with top) to a file or directory in
    dirpath, do os.path.join(dirpath, name).

    If optional arg 'topdown' is true or not specified, the triple for a
    directory is generated before the triples for any of its subdirectories
    (directories are generated top down).  If topdown is false, the triple
    for a directory is generated after the triples for all of its
    subdirectories (directories are generated bottom up).

    When topdown is true, the caller can modify the dirnames list in-place
    (e.g., via del or slice assignment), and walk will only recurse into the
    subdirectories whose names remain in dirnames; this can be used to prune
    the search, or to impose a specific order of visiting.  Modifying
    dirnames when topdown is false is ineffective, since the directories in
    dirnames have already been generated by the time dirnames itself is
    generated.

    By default errors from the os.listdir() call are ignored.  If
    optional arg 'onerror' is specified, it should be a function; it
    will be called with one argument, an os.error instance.  It can
    report the error to continue with the walk, or raise the exception
    to abort the walk.  Note that the filename is available as the
    filename attribute of the exception object.

    By default, os.walk does not follow symbolic links to subdirectories on
    systems that support them.  In order to get this functionality, set the
    optional argument 'followlinks' to true.

    Caution:  if you pass a relative pathname for top, don't change the
    current working directory between resumptions of walk.  walk never
    changes the current directory, and assumes that the client doesn't
    either.

    Example:

    import os
    from os.path import join, getsize
    for root, dirs, files in os.walk('python/Lib/email'):
        print root, "consumes",
        print sum([getsize(join(root, name)) for name in files]),
        print "bytes in", len(files), "non-directory files"
        if 'CVS' in dirs:
            dirs.remove('CVS')  # don't visit CVS directories
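The example in this help text is Python 2 code. As a hedged Python 3 sketch, the helper below exercises three behaviors the help text describes: pruning by editing dirnames in place, top-down versus bottom-up order, and the onerror callback. `collect` is a hypothetical name, and the directory names (a, a/b, CVS) are made up for the demo.

```python
import os
import tempfile

def collect(top, skip=(), topdown=True):
    """Walk top with os.walk and return (visited, errors).

    visited lists (dirpath relative to top, sorted filenames) in visiting order;
    errors collects the OSError instances os.walk passes to onerror.
    """
    visited, errors = [], []
    for dirpath, dirnames, filenames in os.walk(top, topdown=topdown,
                                                onerror=errors.append):
        if topdown:
            # Slice assignment changes the list os.walk itself holds, so pruned
            # names are never entered; sorting also fixes the visiting order.
            dirnames[:] = sorted(d for d in dirnames if d not in skip)
        visited.append((os.path.relpath(dirpath, top), sorted(filenames)))
    return visited, errors

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    os.mkdir(os.path.join(root, "CVS"))
    for rel in ("script1.py", os.path.join("a", "b", "who.py"),
                os.path.join("CVS", "Entries")):
        open(os.path.join(root, rel), "w").close()

    top_down, _ = collect(root, skip={"CVS"})
    bottom_up, _ = collect(root, topdown=False)
    _, errors = collect(os.path.join(root, "no_such_dir"))

print(top_down)
# [('.', ['script1.py']), ('a', []), ('a/b', ['who.py'])] on POSIX
print(bottom_up[-1])  # ('.', ['script1.py']): with topdown=False the top comes last
print(type(errors[0]).__name__)  # FileNotFoundError
```

Rebinding with `dirnames = [...]` instead of `dirnames[:] = [...]` would not prune anything, because os.walk keeps its own reference to the original list.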

For example:
[oracle@shanxi python]$ more os_walk.py
import os
def VisitDir(path):
        for root,dirs,files in os.walk(path):
                print "%s, root = %s"  % (type(root),root)
                print "%s, dirs = %s" % (type(dirs),dirs)
                print "%s, files = %s " % (type(files),files)
                print "\n"
                for filespath in files:
                        print os.path.join(root,filespath)
                print "--------------------------"
if __name__=="__main__":
        path="/home/oracle/python"
        VisitDir(path)

[oracle@shanxi python]$ python os_walk.py
<type 'str'>, root = /home/oracle/python
<type 'list'>, dirs = ['test1', 'test2']
<type 'list'>, files = ['os_path_walk.py', 'who.py.bak', 'script1.py', 'script1.pyc', 'os_walk.py']


/home/oracle/python/os_path_walk.py
/home/oracle/python/who.py.bak
/home/oracle/python/script1.py
/home/oracle/python/script1.pyc
/home/oracle/python/os_walk.py
--------------------------
<type 'str'>, root = /home/oracle/python/test1
<type 'list'>, dirs = []
<type 'list'>, files = ['who.py']


/home/oracle/python/test1/who.py
--------------------------
<type 'str'>, root = /home/oracle/python/test2
<type 'list'>, dirs = []
<type 'list'>, files = ['who.py']


/home/oracle/python/test2/who.py
--------------------------

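Notes:
(1) os.walk() returns an iterator of 3-tuples. In "for root, dirs, files in os.walk(path)", dirs holds only directories and files holds only non-directory files, so files and directories are already separated by position. By contrast, the third argument of the os.path.walk() callback func (names) lumps directories and files together without distinguishing them. You can of course separate them with a further check (os.path.isfile, os.path.isdir).
(2) Because os.walk() returns an iterator (specifically, a generator), you need a for loop to pull out the results one directory level at a time. os.path.walk() needs no loop on the caller's side: it does the recursion itself and calls the single-level callback func once for each directory.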
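Note (2) can be seen directly in Python 3. Calling os.walk() only creates a generator. Nothing is read until a for loop, or next(), asks for the next triple. A common idiom built on this is next(os.walk(path)), which reads only the top level. The directory layout below is made up for the demo.

```python
import os
import tempfile
import types

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "test1"))
    open(os.path.join(root, "test1", "who.py"), "w").close()
    open(os.path.join(root, "script1.py"), "w").close()

    walker = os.walk(root)  # no directory has been listed yet
    is_generator = isinstance(walker, types.GeneratorType)
    top, dirs, files = next(walker)  # pull only the top-level triple
    rest = list(walker)  # drain the remaining levels (here just test1)

print(is_generator)  # True
print(dirs, files)   # ['test1'] ['script1.py']
print(len(rest))     # 1
```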



From the ITPUB blog, link: http://blog.itpub.net/27042095/viewspace-756940/. If you repost this article, please credit the source; otherwise legal liability will be pursued.