資料處理02：Python數值計算包NumPy

weixin_34253539發表於2019-02-23

原文網址 : https://blog.csdn.net/weixin_34253539/article/details/88508178

NumPy 是一個專用於數值計算的第三方包，它提供了實現高效儲存與快速操作的多維陣列物件，可以說是整個 Python 資料科學生態系統的核心，專案官網 https://www.numpy.org/

NumPy 已包含於 Anaconda 中，匯入模組時請按慣例使用 np 作為別名：

In [1]: import numpy as np

In [2]: np.__version__  # 檢視版本號
Out[2]: '1.15.4'

NumPy 多維陣列類名為 ndarray，其中的元素必須型別一致。建立陣列的基本方式是呼叫 array 函式並以一個列表作為引數——如果傳入的列表以若干相同長度列表為組成元素，則會建立一個二維陣列（巢狀更多層將生成更高維度的陣列）：

In [3]: np.array([1, 2, 3, 4, 5])
Out[3]: array([1, 2, 3, 4, 5])

In [4]: np.array([[4, 9, 2], [3, 5, 7], [8, 1, 6]])
Out[4]: 
array([[4, 9, 2],
       [3, 5, 7],
       [8, 1, 6]])

NumPy 多維陣列屬於可變的序列物件，支援標準的索引操作；對於多維陣列可以用索引取到子陣列，並支援用逗號分隔的多組索引；甚至還可以用陣列作為“花式索引”：

In [5]: a1, a2 = Out[3], Out[4]  # IPython 的一個小技巧

In [6]: a1[0]
Out[6]: 1

In [7]: a2[1]
Out[7]: array([4, 9, 2])

In [8]: a2[0][-1]
Out[8]: 2

In [9]: a2[0, -1]
Out[9]: 2

In [10]: a2[1:, :2]
Out[10]: 
array([[3, 5],
       [8, 1]])

In [11]: a1[np.array([0, 2, 4])]  # 用陣列進行索引
Out[11]: array([1, 3, 5])

NumPy 多維陣列包含一些資料屬性，例如維度 ndim，形狀 shape（每個維度的項數），大小 size（總項數）以及資料型別 dtype——由於元素型別必須一致，賦值時如果遇到不同型別將會自動轉換（這有可能丟失資訊）：

In [12]: print("a1維度:", a1.ndim)
    ...: print("a1形狀:", a1.shape)
    ...: print("a1大小:", a1.size)
    ...: print("a1資料型別:", a1.dtype)
a1維度: 1
a1形狀: (5,)
a1大小: 5
a1資料型別: int64

In [13]: print("a2維度:", a2.ndim)
    ...: print("a2形狀:", a2.shape)
    ...: print("a2大小:", a2.size)
    ...: print("a2資料型別:", a2.dtype)
a2維度: 2
a2形狀: (3, 3)
a2大小: 9
a2資料型別: int64

In [14]: a1[-1] = 3.14  # 浮點數將自動轉換為整數

In [15]: a1
Out[15]: array([1, 2, 3, 4, 3])

NumPy 多維陣列也包含許多方法屬性用來執行各種操作，例如轉換型別 astype() 、改變形狀 reshape()，求和 sum()，累積求和 cumsum()，均值 mean() 等等：

In [16]: a1 = a1.astype(float)  # 轉換資料型別為浮點數

In [17]: a1[-1] = 3.14

In [18]: a1
Out[18]: array([1.  , 2.  , 3.  , 4.  , 3.14])

In [19]: a1.reshape(5, 1)  # 改變形狀為5行1列
Out[19]: 
array([[1.  ],
       [2.  ],
       [3.  ],
       [4.  ],
       [3.14]])

In [20]: a1.sum()  # 元素求和
Out[20]: 13.14

In [21]: a1.cumsum()  # 元素累積求和
Out[21]: array([ 1.  ,  3.  ,  6.  , 10.  , 13.14])

In [22]: a1.mean()  # 元素均值
Out[22]: 2.628

In [23]: a2.cumsum(0)  # 按第0維（行）累積求和
Out[23]: 
array([[ 4,  9,  2],
       [ 7, 14,  9],
       [15, 15, 15]], dtype=int32)

In [24]: a2.cumsum(1)  # 按第1維（列）累積求和
Out[24]: 
array([[ 4, 13, 15],
       [ 3,  8, 15],
       [ 8,  9, 15]], dtype=int32)

In [25]: a2.mean(0)  # 按第0維（行）求均值
Out[25]: array([5., 5., 5.])

NumPy 中除了 array 等普通函式，還有許多特殊的“通用函式”（ufunc），專門用於對陣列中的每個值執行同樣運算即所謂“向量化”（Vectorize），這種操作非常高效。基本算術類通用函式也可直接寫算術運算子，會在底層自動變為對應的通用函式：

In [26]: np.add(a2, a2)  # 使用加法通用函式
Out[26]: 
array([[ 8, 18,  4],
       [ 6, 10, 14],
       [16,  2, 12]])

In [27]: a2 + a2  # 使用加法運算子
Out[27]: 
array([[ 8, 18,  4],
       [ 6, 10, 14],
       [16,  2, 12]])

In [28]: a1 * 2  # 乘法
Out[28]: array([2.  , 4.  , 6.  , 8.  , 6.28])

In [29]: np.power(3, a1)  # 乘方
Out[29]: array([ 3.        ,  9.        , 27.        , 81.        , 31.48913565])

In [30]: np.sin(a1)  # 正弦
Out[30]: array([ 0.84147098,  0.90929743,  0.14112001, -0.7568025 ,  0.00159265])

以上程式碼中陣列乘以單個數值即其所有元素都乘以該數值，這種機制稱為“廣播”（Broadcast）——當二元運算發現形狀不一致時就會嘗試進行廣播：首先在較小形狀的左邊補上項數為 1 的維度，然後擴充套件各個項數為 1 的維度來匹配形狀，如果無法匹配則將引發異常：

In [31]: np.arange(3)[:, np.newaxis]  # 生成序列陣列並用索引方式改變形狀
Out[31]: 
array([[0],
       [1],
       [2]])

In [32]: a1 + Out[31]  # 形狀為(5,)和(3,1)的兩陣列相加得到陣列形狀為(3,5)
Out[32]: 
array([[1.  , 2.  , 3.  , 4.  , 3.14],
       [2.  , 3.  , 4.  , 5.  , 4.14],
       [3.  , 4.  , 5.  , 6.  , 5.14]])

In [33]: a1 + np.arange(3)  # 形狀為(5,)和(3,)的兩個陣列相加將引發異常
Traceback (most recent call last):

  File "<ipython-input-33-02f149f97b98>", line 1, in <module>
    a1 + np.arange(3)

ValueError: operands could not be broadcast together with shapes (5,) (3,)

下面是之前曼德布羅分形圖繪製程式的改進版，雖然還是同樣的演算法，但使用 NumPy 並配合輔助模組 Numba 將耗時的例程“即時編譯”（JIT）為原生機器碼，執行速度有成數量級的提升。Numba 官網 https://numba.pydata.org/

"""使用NumPy繪製曼德布羅分形圖
"""
import time
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import colors
from numba import jit, guvectorize, complex64, int32


@jit(int32(complex64, int32))
def mandelbrot(c, maxiter):
    nreal = 0
    real = 0
    imag = 0
    for n in range(maxiter):
        nreal = real*real - imag*imag + c.real
        imag = 2 * real*imag + c.imag
        real = nreal
        if real * real + imag * imag > 4.0:
            return n
    return 0


@guvectorize([(complex64[:], int32[:], int32[:])], "(n),()->(n)", target="parallel")
def mandelbrot_numpy(c, maxit, output):
    maxiter = maxit[0]
    for i in range(c.shape[0]):
        output[i] = mandelbrot(c[i], maxiter)


def mandelbrot_set(xmin, xmax, ymin, ymax, width, height, maxiter):
    r1 = np.linspace(xmin, xmax, width, dtype=np.float32)
    r2 = np.linspace(ymin, ymax, height, dtype=np.float32)
    c = r1 + r2[:, None]*1j
    n3 = mandelbrot_numpy(c, maxiter)
    return (r1, r2, n3.T)


def mandelbrot_image(xmin, xmax, ymin, ymax, width=10, height=10, maxiter=256, cmap="jet", gamma=0.3):
    dpi = 72
    img_width = dpi * width
    img_height = dpi * height
    x, y, z = mandelbrot_set(xmin, xmax, ymin, ymax,
                             img_width, img_height, maxiter)
    plt.figure(figsize=(width, height), dpi=dpi)
    ticks = np.arange(0, img_width, 3*dpi)
    x_ticks = xmin + (xmax-xmin)*ticks/img_width
    plt.xticks(ticks, x_ticks)
    y_ticks = ymin + (ymax-ymin)*ticks/img_width
    plt.yticks(ticks, y_ticks)
    plt.axis("off")
    norm = colors.PowerNorm(gamma)
    plt.imshow(z.T, cmap=cmap, norm=norm, origin="lower")


if __name__ == "__main__":
    t1 = time.process_time()
    mandelbrot_image(-2, 0.5, -1.25, 1.25, cmap="hot")
    print(f"執行耗時：{time.process_time() - t1}秒。")

02_numpy-mandelbrot.png

要了解 NumPy 的更多細節，請檢視官方文件 https://docs.scipy.org/doc/

——程式設計原來是這樣……

【Python資料分析基礎】: 資料缺失值處理
2018-07-28
Python
Python資料分析基礎: 資料缺失值處理
2020-10-31
Python
python計算對數值
2020-09-26
Python
Python資料處理（二）：處理 Excel 資料
2019-02-16
PythonExcel
python 處理資料
2020-10-29
Python
pandas 處理資料和crc16計算
2020-09-26
使用流式計算引擎 eKuiper 處理 Protocol Buffers 資料
2022-08-18
UIProtocol
Python資料分析 – numpy
2019-02-16
Python
python-----------------numpy計數模組
2024-03-13
Python
js處理浮點數計算誤差
2018-12-07
JS
[資料處理]python基礎
2019-02-02
Python
Python資料處理典型用法
2024-11-03
Python
Python資料處理-pandas用法
2020-12-17
Python
[python][科學計算][numpy]使用指南
2019-04-20
Python
Python技巧:用isnumeric等代替數值異常處理
2019-02-19
Python
Python資料處理(一)：處理 JSON、XML、CSV 三種格式資料
2019-01-27
PythonJSONXML
python 計算 sin 值
2024-11-01
Python
Python資料分析--Numpy常用函式介紹(6)--Numpy中與股票成交量有關的計算
2022-06-07
Python函式
Python資料分析之numpy
2018-07-23
Python
Python 計算機視覺（十五）—— 影像特效處理
2021-11-22
Python計算機視覺特效
python-資料分析-Numpy-3、陣列的運算
2024-06-09
Python陣列
Python 影像處理 OpenCV （6）：影像的閾值處理
2020-06-05
PythonOpenCV
C# 優雅的處理TCP資料（心跳，超時，粘包斷包，SSL加密，資料處理等）
2024-03-21
C#TCP加密
JavaScript 資料處理 - 運算元組
2022-03-05
JavaScript
pandas（進階操作）-- 處理非數值型資料 -- 資料分析三劍客(核心)
2023-10-02
Python數學建模-02.資料匯入
2021-05-30
Python
Python文字資料分析與處理
2018-08-29
Python
Python使用xlrd處理excel資料
2020-11-19
PythonExcel
Python 影像處理 OpenCV （2）：畫素處理與 Numpy 操作以及 Matplotlib 顯示影像
2020-05-19
PythonOpenCV
php 處理浮點數精度運算數字處理等
2021-05-12
PHP
【Pandas學習筆記02】-資料處理高階用法
2021-12-01
筆記
【Pandas學習筆記02】處理資料實用操作
2021-11-26
筆記
python 包之 Pillow 影像處理教程
2022-04-11
Python
Python科學計算之Numpy陣列生成與運算
2020-09-26
Python陣列
CANN訓練：模型推理時資料預處理方法及歸一化引數計算
2023-04-03
模型
【數值計算方法】數值積分&微分
2024-07-30
Python資料分析 numpy 筆記
2024-04-09
Python筆記
【Python資料科學】之Numpy
2019-04-29
Python資料科學

資料處理02：Python數值計算包NumPy

相關文章