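3.2 More on Functions

Although functions were introduced earlier, few details were given on how they work at a deeper level. This section fills in those gaps and discusses matters such as calling conventions and scoping rules.

Calling a Function

Consider this function: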

def read_prices(filename, debug):
    ...

You can call the function with positional arguments:

prices = read_prices('prices.csv', True)

Or you can call it with keyword arguments:

prices = read_prices(filename='prices.csv', debug=True)

Default Arguments

Sometimes you want an argument to be optional. If so, assign a default value in the function definition.

def read_prices(filename, debug=False):
    ...

If a default value is assigned, the argument is optional in function calls.

d = read_prices('prices.csv')
e = read_prices('prices.dat', True)

Note: Arguments with default values must appear at the end of the argument list (all non-optional arguments go first).
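The calling styles described above can be tried with a stand-in function. This is only a sketch: the real body of read_prices() is not shown in this section, so the version here is an assumption that simply returns its arguments.

```python
# Stand-in for read_prices(); the real body is not given in this section,
# so this version just hands its arguments back for inspection.
def read_prices(filename, debug=False):
    return filename, debug

a = read_prices('prices.csv')                        # debug falls back to False
b = read_prices('prices.csv', True)                  # both positional
c = read_prices('prices.csv', debug=True)            # positional + keyword
d = read_prices(filename='prices.csv', debug=True)   # all keywords
e = read_prices(debug=True, filename='prices.csv')   # keyword order does not matter

# A parameter without a default may not follow one that has a default.
# compile() is used so the error can be caught instead of stopping the script.
try:
    compile("def bad(debug=False, filename): pass", "<example>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```

The exact wording of the SyntaxError message differs between Python versions.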

Prefer Keyword Arguments for Optional Arguments

Compare these two calling styles:

parse_data(data, False, True) # ?????

parse_data(data, ignore_errors=True)
parse_data(data, debug=True)
parse_data(data, debug=True, ignore_errors=True)

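In most cases, keyword arguments improve code clarity, especially for arguments that serve as flags or that relate to optional features.

Design Best Practices

Always give function arguments short but meaningful names.

Someone using the function may want to use the keyword calling style.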

d = read_prices('prices.csv', debug=True)

Python development tools show these names in help features and documentation.
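One way to see which names tools pick up is the standard inspect module. The sketch below uses a stand-in read_prices() with an assumed body and docstring; calling help(read_prices) at the interactive prompt shows the same argument names.

```python
import inspect

def read_prices(filename, debug=False):
    '''Read prices from a file (stand-in body for illustration).'''
    return {}

# The signature object exposes the argument names and defaults
# that help() and editors display.
sig = inspect.signature(read_prices)
print(sig)                     # (filename, debug=False)
print(list(sig.parameters))    # ['filename', 'debug']
```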

Returning Values

The return statement returns a value:

def square(x):
    return x * x

If no return value is given, or the return statement is missing, None is returned:

def bar(x):
    statements
    return

a = bar(4)      # a = None

# OR
def foo(x):
    statements  # No `return`

b = foo(4)      # b = None
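A runnable version of the two cases, with the placeholder statements replaced by trivial bodies, might look like this:

```python
def bar(x):
    return            # bare return, no value given

def foo(x):
    pass              # no return statement at all

a = bar(4)
b = foo(4)
print(a is None, b is None)    # True True
```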

Multiple Return Values

A function can only return one value. However, a function may return multiple values by returning them in a tuple:

def divide(a,b):
    q = a // b      # Quotient
    r = a % b       # Remainder
    return q, r     # Return a tuple

Usage examples:

x, y = divide(37,5) # x = 7, y = 2

x = divide(37, 5)   # x = (7, 2)
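As a quick check that a single tuple object really is what comes back, the function can be run as follows. The comparison with the built-in divmod() is an addition for illustration and is not part of the original example.

```python
def divide(a, b):
    q = a // b      # Quotient
    r = a % b       # Remainder
    return q, r     # Packs q and r into one tuple

result = divide(37, 5)
print(type(result).__name__, result)    # tuple (7, 2)

q, r = divide(37, 5)                    # Unpack the tuple into two names
print(q * 5 + r)                        # 37

# The built-in divmod() returns the same pair.
print(divmod(37, 5) == result)          # True
```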

Variable Scope

Programs assign values to variables:

x = value # Global variable

def foo():
    y = value # Local variable

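Variable assignments occur both outside and inside function definitions. Variables defined outside a function are "global". Variables assigned inside a function are "local".

Local Variables

Variables assigned inside functions are private.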

def read_portfolio(filename):
    portfolio = []
    for line in open(filename):
        fields = line.split(',')
        s = (fields[0], int(fields[1]), float(fields[2]))
        portfolio.append(s)
    return portfolio

In this example, filename, portfolio, line, fields, and s are local variables. They are not retained and are not accessible after the function call.

>>> stocks = read_portfolio('portfolio.csv')
>>> fields
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'fields' is not defined
>>>

Local variables also cannot conflict with variables found elsewhere.
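A small sketch of this isolation: the function name split_line and the module-level fields variable below are made up for illustration.

```python
fields = 'module-level value'

def split_line(line):
    fields = line.split(',')    # A separate, local `fields`
    return len(fields)

n = split_line('AA,100,32.20')
print(n)          # 3
print(fields)     # module-level value
```

The local fields exists only while split_line() runs, so the module-level name is left untouched.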

Global Variables

Functions can freely access the values of global variables defined in the same file.

name = 'Dave'

def greeting():
    print('Hello', name)  # Using `name` global variable

However, a function cannot change what a global variable refers to with a plain assignment:

name = 'Dave'

def spam():
    name = 'Guido'

spam()
print(name) # prints 'Dave'

Remember: all assignments inside a function are local.
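One consequence worth seeing once: because an assignment anywhere in a function makes that name local for the whole function body, reading the name before the assignment fails instead of falling back to the global. The sketch below is an illustration of that rule, not part of the original example.

```python
name = 'Dave'

def spam():
    print(name)       # `name` is local here because of the assignment below
    name = 'Guido'

try:
    spam()
except UnboundLocalError as err:
    caught = err
    print('UnboundLocalError:', err)

print(name)    # Dave
```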

Modifying Global Variables

If you must modify a global variable, declare it like this:

name = 'Dave'

def spam():
    global name
    name = 'Guido' # Changes the global name above

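The global declaration must appear before the name is used, and the corresponding variable must live in the same file as the function. Having seen this, know that it is considered poor form. In fact, avoid global entirely if you can. If a function needs to modify some kind of state outside of the function, it is better to use a class instead (more on this later).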
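As a rough preview of the class-based alternative, the sketch below keeps the changing state on an object. The Greeter class and its method names are hypothetical and chosen only for this illustration; classes are covered properly later in the course.

```python
class Greeter:
    def __init__(self, name):
        self.name = name            # State lives on the object

    def rename(self, new_name):
        self.name = new_name        # No global declaration needed

    def greeting(self):
        return 'Hello ' + self.name

g = Greeter('Dave')
g.rename('Guido')
print(g.greeting())    # Hello Guido
```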

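Argument Passing

When you call a function, the argument variables are names that refer to the passed values. The values are not copied (see section 2.7). If mutable data types are passed (e.g. lists, dicts), they can be modified in place.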

def foo(items):
    items.append(42)    # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a)                # [1, 2, 3, 42]

Key point: functions do not receive a copy of the input arguments.
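The identity operator makes this visible. The helper names same_object and foo_copy below are made up for illustration; the second one shows one possible way to avoid changing the caller's list when that is not wanted, by copying explicitly.

```python
def same_object(items):
    return items                # Hand back exactly what was received

a = [1, 2, 3]
print(same_object(a) is a)      # True

def foo_copy(items):
    items = list(items)         # Explicit shallow copy
    items.append(42)            # Only the copy is modified
    return items

b = foo_copy(a)
print(a)    # [1, 2, 3]
print(b)    # [1, 2, 3, 42]
```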

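Reassignment vs Modifying

Make sure you understand the subtle difference between modifying a value and reassigning a variable name.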

def foo(items):
    items.append(42)    # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a)                # [1, 2, 3, 42]

# VS
def bar(items):
    items = [4,5,6]    # Changes local `items` variable to point to a different object

b = [1, 2, 3]
bar(b)
print(b)                # [1, 2, 3]

Reminder: variable assignment never overwrites memory. The name is merely bound to a new value.
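The rebinding can be observed with id(), which identifies the object a name currently refers to. This variant of bar() returns the two ids purely for illustration.

```python
def bar(items):
    before = id(items)      # Object the caller passed in
    items = [4, 5, 6]       # Rebinds the local name to a new list
    after = id(items)       # Identity of that new list
    return before, after

b = [1, 2, 3]
before, after = bar(b)
print(before == id(b))      # True
print(after == id(b))       # False
print(b)                    # [1, 2, 3]
```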

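Exercises

This set of exercises has you implement what is perhaps the most powerful and most difficult part of the course. There are a lot of steps, and many concepts from past exercises are put together all at once. The final solution is only about 25 lines of code, but take your time and make sure you understand each part.

A central part of report.py is devoted to reading CSV files. For example, the read_portfolio() function reads a file containing portfolio data, and the read_prices() function reads a file containing price data. Both functions contain a lot of low-level "fiddly" work and share similar features. For example, both open a file, process it with the csv module, and convert various fields into new types.

If you were parsing a lot of files for real, you would probably want to clean some of this up and make it more general purpose. That is our goal.

Start this exercise by opening the file Work/fileparse.py. This is where we will write the code.

Exercise 3.3: Reading CSV Files

To start, let's focus only on the problem of reading a CSV file into a list of dictionaries. In fileparse.py, define a function that looks like this: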

# fileparse.py
import csv

def parse_csv(filename):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)
        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            record = dict(zip(headers, row))
            records.append(record)

    return records

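This function reads a CSV file into a list of dictionaries while hiding the details of opening the file, processing it with the csv module, ignoring blank lines, and so forth.

Try it out:

Hint: python3 -i fileparse.py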

>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
>>>

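This is good, except that you can't do any kind of useful calculation with the data because everything is represented as a string. We'll fix this shortly, but let's keep building on it first.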
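To see the problem concretely, take one record in the form shown above and try to compute a cost. This is only an illustration; the explicit int() and float() calls are what the later type-conversion exercise automates.

```python
record = {'name': 'AA', 'shares': '100', 'price': '32.20'}

# Multiplying two strings is not defined, so this raises TypeError.
try:
    cost = record['shares'] * record['price']
except TypeError as err:
    print('TypeError:', err)

# Converting each field first makes the arithmetic work.
cost = int(record['shares']) * float(record['price'])
print(cost)
```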

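Exercise 3.4: Building a Column Selector

In many cases, you are only interested in selected columns from a CSV file, not all of the data. Modify the parse_csv() function so that it optionally lets the user pick out arbitrary columns, as follows: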

>>> # Read all of the data
>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]

>>> # Read only some of the data
>>> shares_held = parse_csv('Data/portfolio.csv', select=['name','shares'])
>>> shares_held
[{'name': 'AA', 'shares': '100'}, {'name': 'IBM', 'shares': '50'}, {'name': 'CAT', 'shares': '150'}, {'name': 'MSFT', 'shares': '200'}, {'name': 'GE', 'shares': '95'}, {'name': 'MSFT', 'shares': '50'}, {'name': 'IBM', 'shares': '100'}]
>>>

An example of a column selector was given in Exercise 2.23.

However, here is one way to do it:

# fileparse.py
import csv

def parse_csv(filename, select=None):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)

        # If a column selector was given, find indices of the specified columns.
        # Also narrow the set of headers used for resulting dictionaries
        if select:
            indices = [headers.index(colname) for colname in select]
            headers = select
        else:
            indices = []

        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            # Filter the row if specific columns were selected
            if indices:
                row = [ row[index] for index in indices ]

            # Make a dictionary
            record = dict(zip(headers, row))
            records.append(record)

    return records

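There are a number of tricky bits to this part. Probably the most important one is the mapping of the selected column names to the index positions within each row. For example, suppose the input file has the following headers: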

>>> headers = ['name', 'date', 'time', 'shares', 'price']
>>>

Now, suppose the selected columns are as follows:

>>> select = ['name', 'shares']
>>>

To perform the proper selection, you have to map the selected column names to column indices in the file. That is what this step does:

>>> indices = [headers.index(colname) for colname in select ]
>>> indices
[0, 3]
>>>

In other words, "name" is column 0 and "shares" is column 3.

When you read a row of data from the file, the indices are used to filter it:

>>> row = ['AA', '6/11/2007', '9:50am', '100', '32.20' ]
>>> row = [ row[index] for index in indices ]
>>> row
['AA', '100']
>>>

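Exercise 3.5: Performing Type Conversion

Modify the parse_csv() function so that it optionally allows type conversions to be applied to the returned data. For example: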

>>> portfolio = parse_csv('Data/portfolio.csv', types=[str, int, float])
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]

>>> shares_held = parse_csv('Data/portfolio.csv', select=['name', 'shares'], types=[str, int])
>>> shares_held
[{'name': 'AA', 'shares': 100}, {'name': 'IBM', 'shares': 50}, {'name': 'CAT', 'shares': 150}, {'name': 'MSFT', 'shares': 200}, {'name': 'GE', 'shares': 95}, {'name': 'MSFT', 'shares': 50}, {'name': 'IBM', 'shares': 100}]
>>>

You already explored this in Exercise 2.24. You will need to insert the following fragment of code into your solution:

...
if types:
    row = [func(val) for func, val in zip(types, row) ]
...
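Outside of parse_csv(), the fragment can be tried on a single row. The sample row and header names below are taken from the portfolio data shown earlier.

```python
types = [str, int, float]
row = ['AA', '100', '32.20']

# Pair each conversion function with the matching field and apply it.
converted = [func(val) for func, val in zip(types, row)]
print(converted)    # ['AA', 100, 32.2]

record = dict(zip(['name', 'shares', 'price'], converted))
print(record)       # {'name': 'AA', 'shares': 100, 'price': 32.2}
```

Inside parse_csv(), one natural place for the fragment is after the column filter and before the dictionary is built, so that the entries in types line up with the selected columns. Treat that placement as a suggestion rather than the only option.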

Exercise 3.6: Working Without Headers

Some CSV files do not include any header information. For example, the file prices.csv looks like this:

"AA",9.22
"AXP",24.85
"BA",44.85
"BAC",11.27
...

Modify the parse_csv() function so that it can work with such files by creating a list of tuples instead. For example:

>>> prices = parse_csv('Data/prices.csv', types=[str,float], has_headers=False)
>>> prices
[('AA', 9.22), ('AXP', 24.85), ('BA', 44.85), ('BAC', 11.27), ('C', 3.72), ('CAT', 35.46), ('CVX', 66.67), ('DD', 28.47), ('DIS', 24.22), ('GE', 13.48), ('GM', 0.75), ('HD', 23.16), ('HPQ', 34.35), ('IBM', 106.28), ('INTC', 15.72), ('JNJ', 55.16), ('JPM', 36.9), ('KFT', 26.11), ('KO', 49.16), ('MCD', 58.99), ('MMM', 57.1), ('MRK', 27.58), ('MSFT', 20.89), ('PFE', 15.19), ('PG', 51.94), ('T', 24.79), ('UTX', 52.61), ('VZ', 29.26), ('WMT', 49.74), ('XOM', 69.35)]
>>>

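To make this change, you will need to modify the code so that the first line of data is not interpreted as a header line. Also, you will need to make sure you do not create dictionaries, since there are no longer any column names to use as keys.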
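The two changes described above can be sketched in isolation. The helper below is hypothetical and is not the course's solution: it is named parse_rows, it takes an already open file-like object instead of a filename so that it can be fed in-memory text, and it leaves out column selection.

```python
import csv
import io

def parse_rows(f, types=None, has_headers=True):
    '''Sketch: parse CSV rows from an open file-like object.'''
    rows = csv.reader(f)

    # Only consume the first line as headers when the file has them
    headers = next(rows) if has_headers else []

    records = []
    for row in rows:
        if not row:                 # Skip rows with no data
            continue
        if types:
            row = [func(val) for func, val in zip(types, row)]
        if has_headers:
            record = dict(zip(headers, row))
        else:
            record = tuple(row)     # No column names, so use a tuple
        records.append(record)
    return records

text = '"AA",9.22\n"AXP",24.85\n\n"BA",44.85\n'
prices = parse_rows(io.StringIO(text), types=[str, float], has_headers=False)
print(prices)    # [('AA', 9.22), ('AXP', 24.85), ('BA', 44.85)]
```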

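Exercise 3.7: Picking a Different Column Delimiter

Although CSV files are very common, you may also encounter files that use a different column separator, such as a tab or a space. For example, the file Data/portfolio.dat looks like this: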

name shares price
"AA" 100 32.20
"IBM" 50 91.10
"CAT" 150 83.44
"MSFT" 200 51.23
"GE" 95 40.37
"MSFT" 50 65.10
"IBM" 100 70.44

The csv.reader() function allows a different column delimiter to be given as follows:

rows = csv.reader(f, delimiter=' ')
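The delimiter argument can be tried without a file by wrapping a few sample lines in io.StringIO. The use of in-memory text here is an assumption made to keep the example self-contained.

```python
import csv
import io

text = 'name shares price\n"AA" 100 32.20\n"IBM" 50 91.10\n'

# Split on single spaces; quoted fields are still unquoted by the reader.
rows = list(csv.reader(io.StringIO(text), delimiter=' '))
print(rows[0])    # ['name', 'shares', 'price']
print(rows[1])    # ['AA', '100', '32.20']
```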

Modify the parse_csv() function so that it also allows the delimiter to be changed.

For example:

>>> portfolio = parse_csv('Data/portfolio.dat', types=[str, int, float], delimiter=' ')
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>>

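Commentary

If you have made it this far, you have created a genuinely useful library function. You can use it to parse arbitrary CSV files, select the columns of interest, and perform type conversions, all without having to worry much about the inner workings of files or the csv module.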


Note: the complete Chinese translation is available at https://github.com/codists/practical-python-zh

Translated from: Practical Python Programming, 03_02_More_functions