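Lesson 17: NumPy Statistical Functions

Posted by 反差萌er on 2022-02-16

Original URL: https://www.cnblogs.com/BlairGrowing/p/15900578.html

NumPy Tutorial Contents

1 NumPy Statistical Functions

NumPy provides many statistical functions for finding the minimum, maximum, percentiles, standard deviation, variance, and other statistics of the elements in an array. The functions are summarized below.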

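1.1 Order Statistics

method	description
amin(a[, axis, out, keepdims, initial, where])	Return the minimum of an array, or the minimum along an axis.
amax(a[, axis, out, keepdims, initial, where])	Return the maximum of an array, or the maximum along an axis.
nanmin(a[, axis, out, keepdims])	Return the minimum of an array, or the minimum along an axis, ignoring any NaNs.
nanmax(a[, axis, out, keepdims])	Return the maximum of an array, or the maximum along an axis, ignoring any NaNs.
ptp(a[, axis, out, keepdims])	Range of values (maximum - minimum) along an axis.
percentile(a, q[, axis, out, …])	Compute the q-th percentile of the data along the specified axis.
nanpercentile(a, q[, axis, out, …])	Compute the q-th percentile of the data along the specified axis, ignoring NaN values.
quantile(a, q[, axis, out, overwrite_input, …])	Compute the q-th quantile of the data along the specified axis.
nanquantile(a, q[, axis, out, …])	Compute the q-th quantile of the data along the specified axis, ignoring NaN values.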

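1.2 Averages and Variances

method	description
median(a[, axis, out, overwrite_input, keepdims])	Compute the median along the specified axis.
average(a[, axis, weights, returned])	Compute the weighted average along the specified axis.
mean(a[, axis, dtype, out, keepdims])	Compute the arithmetic mean along the specified axis.
std(a[, axis, dtype, out, ddof, keepdims])	Compute the standard deviation along the specified axis.
var(a[, axis, dtype, out, ddof, keepdims])	Compute the variance along the specified axis.
nanmedian(a[, axis, out, overwrite_input, …])	Compute the median along the specified axis, ignoring NaNs.
nanmean(a[, axis, dtype, out, keepdims])	Compute the arithmetic mean along the specified axis, ignoring NaNs.
nanstd(a[, axis, dtype, out, ddof, keepdims])	Compute the standard deviation along the specified axis, ignoring NaNs.
nanvar(a[, axis, dtype, out, ddof, keepdims])	Compute the variance along the specified axis, ignoring NaNs.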

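1.3 Correlation Coefficients

method	description
corrcoef(x[, y, rowvar, bias, ddof])	Return Pearson product-moment correlation coefficients.
correlate(a, v[, mode])	Cross-correlation of two one-dimensional sequences.
cov(m[, y, rowvar, bias, ddof, fweights, …])	Estimate a covariance matrix, given data and weights.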

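1.4 Histograms

method	description
histogram(a[, bins, range, normed, weights, …])	Compute the histogram of a dataset.
histogram2d(x, y[, bins, range, normed, …])	Compute the two-dimensional histogram of two data samples.
histogramdd(sample[, bins, range, normed, …])	Compute the multidimensional histogram of some data.
bincount(x[, weights, minlength])	Count the number of occurrences of each value in an array of non-negative integers.
histogram_bin_edges(a[, bins, range, weights])	Compute only the bin edges used by the histogram function.
digitize(x, bins[, right])	Return the index of the bin to which each value in the input array belongs.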

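2 Order Statistics Examples

2.1 numpy.amin()

numpy.amin() computes the minimum of the array elements along the specified axis.

Example:

import numpy as np

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])
print(a)
print(np.amin(a))          # minimum over all elements
print(np.amin(a, axis=0))  # minimum of each column
print(np.amin(a, axis=1))  # minimum of each row
"""
[[3 7 5]
 [8 4 3]
 [2 4 9]]
2
[2 4 3]
[3 3 2]
"""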
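
The signature in section 1.1 lists a keepdims argument that the example above does not exercise. As a minimal sketch (the subtraction is only an illustrative use, not something from the original lesson), keeping the reduced axis lets the result broadcast back against the input:

```python
import numpy as np

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# keepdims=True keeps the reduced axis with length 1, giving shape (3, 1)
row_min = np.amin(a, axis=1, keepdims=True)
print(row_min.shape)  # (3, 1)

# The (3, 1) result broadcasts against the (3, 3) input,
# so each row is shifted by its own minimum.
shifted = a - row_min
print(shifted)
```

Without keepdims the result would have shape (3,), and the subtraction would be applied along columns instead of rows.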

2.2 numpy.amax()

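numpy.amax() computes the maximum of the array elements along the specified axis.

Example:

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])
print(a)
print(np.amax(a))          # maximum over all elements
print(np.amax(a, axis=0))  # maximum of each column
print(np.amax(a, axis=1))  # maximum of each row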
"""
[[3 7 5]
 [8 4 3]
 [2 4 9]]
9
[8 7 9]
[7 8 9]
"""

2.3 numpy.nanmin()

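numpy.nanmin(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>) returns the minimum of an array, or the minimum along an axis, ignoring any NaNs. When an all-NaN slice is encountered, a RuntimeWarning is raised and NaN is returned for that slice.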

Example：

a = np.array([[1, 2], [3, np.nan],[3, -np.nan]])
print(np.amin(a))
print(np.nanmin(a))
print(np.nanmin(a,axis=0))
print(np.nanmin(a,axis=1))
"""
nan
1.0
[1. 2.]
[1. 3. 3.]
"""
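
The all-NaN case mentioned in the description can be observed directly. The following is a small sketch, assuming the standard library warnings module is used to capture the warning; the array is made up for illustration:

```python
import warnings
import numpy as np

# The second row contains only NaN, so it is an all-NaN slice along axis=1.
a = np.array([[1.0, np.nan], [np.nan, np.nan]])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    row_min = np.nanmin(a, axis=1)

print(row_min)  # first entry is 1.0, second entry is nan
print([w.category.__name__ for w in caught])
```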

2.4 numpy.nanmax()

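numpy.nanmax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>) returns the maximum of an array, or the maximum along an axis, ignoring any NaNs. When an all-NaN slice is encountered, a RuntimeWarning is raised and NaN is returned for that slice.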

Example：

a = np.array([[1, 2], [3, np.nan],[3, -np.nan]])
print(np.amax(a))
print(np.nanmax(a))
print(np.nanmax(a,axis=0))
print(np.nanmax(a,axis=1))
"""
nan
3.0
[3. 2.]
[2. 3. 3.]
"""

2.5 numpy.ptp()

numpy.ptp(a, axis=None, out=None, keepdims=<no value>) returns the range of values (maximum - minimum) along an axis. The name stands for "peak to peak".

Example：

x = np.array([[4, 9, 2, 10],
              [6, 9, 7, 12]])
print(np.ptp(x))
print(np.ptp(x,axis=0))
print(np.ptp(x,axis=1))
"""
10
[2 0 5 2]
[8 6]
"""
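
Since ptp is defined as maximum minus minimum, the result can be cross-checked by hand. A short sketch using the same array:

```python
import numpy as np

x = np.array([[4, 9, 2, 10],
              [6, 9, 7, 12]])

# Peak-to-peak range computed by hand along each row
manual = np.max(x, axis=1) - np.min(x, axis=1)
print(manual)             # [8 6]
print(np.ptp(x, axis=1))  # [8 6]
```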

2.6 numpy.percentile()

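numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=False, *, interpolation=None) computes the q-th percentile of the data along the specified axis. A percentile is a measure used in statistics: the value below which a given percentage of the observations fall.

Parameters:

- a: input array
- q: the percentile (or percentiles) to compute, between 0 and 100
- axis: the axis along which the percentiles are computed

First, a precise definition of a percentile:

The $q$-th percentile is a value such that at least $q$% of the data items are less than or equal to it, and at least $(100-q)$% of the data items are greater than or equal to it.

For example, entrance examination scores for colleges and universities are often reported as percentiles. Suppose a candidate's raw score on the language section of the exam is 54. From the raw score alone it is hard to tell how the candidate did relative to the other students who took the same exam. But if a raw score of 54 corresponds to the 70th percentile, we know that about 70% of the students scored lower and about 30% scored higher.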

Example：

a = np.array([[10, 7, 4], [3, 2, 1]])
print('Our array is:')
print(a)
print('Calling percentile():')
# The 50th percentile is the median of the sorted elements of a
print(np.percentile(a, 50))
# axis=0: computed down each column
print(np.percentile(a, 50, axis=0))
# axis=1: computed across each row
print(np.percentile(a, 50, axis=1))
# keepdims=True keeps the reduced axis with length 1
print(np.percentile(a, 50, axis=1, keepdims=True))
"""
Our array is:
[[10  7  4]
 [ 3  2  1]]
Calling percentile():
3.5
[6.5 4.5 2.5]
[7. 2.]
[[7.]
 [2.]]
"""
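
The overall result 3.5 is not an element of the array: with the default method='linear', a percentile that falls between two sorted values is interpolated. The sketch below reproduces that rule by hand. The helper percentile_linear is hypothetical, written only for illustration, and it assumes the default linear method on flattened data:

```python
import numpy as np

def percentile_linear(values, q):
    """Hypothetical helper: q-th percentile by linear interpolation."""
    data = np.sort(np.asarray(values, dtype=float).ravel())
    # Fractional position of the q-th percentile in the sorted data
    pos = (data.size - 1) * q / 100.0
    lo = int(np.floor(pos))
    hi = int(np.ceil(pos))
    frac = pos - lo
    # Interpolate between the two nearest sorted values
    return data[lo] + frac * (data[hi] - data[lo])

a = np.array([[10, 7, 4], [3, 2, 1]])
for q in (0, 25, 50, 90, 100):
    print(q, percentile_linear(a, q), np.percentile(a, q))
```

For q = 50 the position is 2.5 in the sorted data [1, 2, 3, 4, 7, 10], halfway between 3 and 4, which gives 3.5.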

2.7 numpy.quantile()

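numpy.quantile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=False, *, interpolation=None) computes the q-th quantile of the data along the specified axis.

Note

Given a vector V of length N, the q-th quantile of V is the value q of the way from the minimum to the maximum in a sorted copy of V. If the normalized ranking does not match the location of q exactly, the values and distances of the two nearest neighbors, together with the interpolation method, determine the quantile. This function is the same as the median if q = 0.5, the same as the minimum if q = 0.0, and the same as the maximum if q = 1.0.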

Example：

>>> a = np.array([[10, 7, 4], [3, 2, 1]])
>>> a
array([[10,  7,  4],
       [ 3,  2,  1]])
>>> np.quantile(a, 0.5)
3.5
>>> np.quantile(a, 0.5, axis=0)
array([6.5, 4.5, 2.5])
>>> np.quantile(a, 0.5, axis=1)
array([7.,  2.])
>>> np.quantile(a, 0.5, axis=1, keepdims=True)
array([[7.],
       [2.]])
>>> m = np.quantile(a, 0.5, axis=0)
>>> out = np.zeros_like(m)
>>> np.quantile(a, 0.5, axis=0, out=out)
array([6.5, 4.5, 2.5])
>>> m
array([6.5, 4.5, 2.5])
>>> b = a.copy()
>>> np.quantile(b, 0.5, axis=1, overwrite_input=True)
array([7.,  2.])
>>> assert not np.all(a == b)
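
The special cases listed in the note, and the relationship to percentile(), can be checked with a few lines. This is only a sanity-check sketch on the same array:

```python
import numpy as np

a = np.array([[10, 7, 4], [3, 2, 1]])

print(np.isclose(np.quantile(a, 0.5), np.median(a)))  # True
print(np.isclose(np.quantile(a, 0.0), np.min(a)))     # True
print(np.isclose(np.quantile(a, 1.0), np.max(a)))     # True

# quantile takes q in [0, 1]; percentile takes q in [0, 100]
print(np.isclose(np.quantile(a, 0.25), np.percentile(a, 25)))  # True
```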

3 Averages and Variances

3.1 numpy.median()

numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False) computes the median along the specified axis.

Example：

a = np.array([[10, 7, 4], [3, 2, 1]])
print(a)
print(np.median(a))  # median of all elements
print(np.median(a, axis=0))
print(np.median(a, axis=1))
"""
[[10  7  4]
 [ 3  2  1]]
3.5
[6.5 4.5 2.5]
[7. 2.]
"""

Example：

m = np.median(a, axis=0)
out = np.zeros_like(m)
print(np.median(a, axis=0, out=m))
print(m)
b = a.copy()
print(np.median(b, axis=1, overwrite_input=True))
assert not np.all(a==b)
b = a.copy()
print(np.median(b, axis=None, overwrite_input=True))
assert not np.all(a==b)
"""
[6.5 4.5 2.5]
[6.5 4.5 2.5]
[7. 2.]
3.5
"""

3.2 numpy.average()

numpy.average(a, axis=None, weights=None, returned=False) computes the weighted average along the specified axis.

It is computed as: avg = sum(a * weights) / sum(weights)

Example：

data = np.arange(1, 5)
print(data)
print(np.average(data))
print(np.average(np.arange(1, 11), weights=np.arange(10, 0, -1)))
"""
[1 2 3 4]
2.5
4.0
"""

Example：

data = np.arange(6).reshape((3,2))
print(data)
print(np.average(data, axis=1, weights=[1./4, 3./4]))
"""
[[0 1]
 [2 3]
 [4 5]]
[0.75 2.75 4.75]
"""
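
The formula avg = sum(a * weights) / sum(weights) can be verified against np.average() directly. The sketch below also uses returned=True, which makes np.average() return the sum of the weights alongside the average:

```python
import numpy as np

data = np.arange(1, 11)
weights = np.arange(10, 0, -1)

# Weighted average computed from the formula
manual = np.sum(data * weights) / np.sum(weights)
print(manual)  # 4.0

# returned=True gives a tuple (average, sum_of_weights)
avg, weight_sum = np.average(data, weights=weights, returned=True)
print(avg, weight_sum)
```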

3.3 numpy.mean()

numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>) computes the arithmetic mean along the specified axis.

Example：

a = np.array([[1, 2], [3, 4]])
print(np.mean(a))
print(np.mean(a, axis=0))
print(np.mean(a, axis=1))
"""
2.5
[2. 3.]
[1.5 3.5]
"""

3.4 numpy.std()

numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>, *, where=<no value>) computes the standard deviation along the specified axis.

Example：

a = np.array([[1, 2], [3, 4]])
print( np.std(a))
print(np.std(a, axis=0))
print(np.std(a, axis=1))
"""
1.118033988749895
[1. 1.]
[0.5 0.5]
"""
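
The ddof argument in the signature controls the divisor: the sum of squared deviations is divided by N - ddof. A minimal sketch comparing the default (ddof=0, population standard deviation) with ddof=1 (sample standard deviation), computed by hand for comparison:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
n = a.size
squared_dev = np.sum((a - a.mean()) ** 2)

# ddof=0 (default): divide by N
manual_pop = np.sqrt(squared_dev / n)
# ddof=1: divide by N - 1
manual_sample = np.sqrt(squared_dev / (n - 1))

print(manual_pop, np.std(a))
print(manual_sample, np.std(a, ddof=1))
```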

3.5 numpy.var()

numpy.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>, *, where=<no value>) computes the variance along the specified axis.

Example：

a = np.array([[1, 2], [3, 4]])
print( np.var(a))
print(np.var(a, axis=0))
print(np.var(a, axis=1))
"""
1.25
[1. 1.]
[0.25 0.25]
"""
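
With the default ddof=0, the variance is the mean squared deviation from the mean, and it equals the square of the standard deviation. A short sketch that checks both on the same array:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Variance as the mean squared deviation from the mean
manual = np.mean((a - np.mean(a)) ** 2)
print(manual)  # 1.25
print(np.isclose(np.var(a), np.std(a) ** 2))  # True

# Per-column check; keepdims lets the column means broadcast against a
col_mean = np.mean(a, axis=0, keepdims=True)
print(np.mean((a - col_mean) ** 2, axis=0))  # [1. 1.]
```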

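4 Correlation Coefficients

4.1 numpy.corrcoef()

numpy.corrcoef(x, y=None, rowvar=True, bias=<no value>, ddof=<no value>, *, dtype=None) returns the Pearson product-moment correlation coefficients. With rowvar=True (the default), each row of x is a variable and each column is an observation.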

Example：

rng = np.random.default_rng(seed=42)
xarr = rng.random((3, 3))
print(xarr)
"""
[[0.77395605 0.43887844 0.85859792]
 [0.69736803 0.09417735 0.97562235]
 [0.7611397  0.78606431 0.12811363]]
"""
R1 = np.corrcoef(xarr)
print(R1)
"""
[[ 1.          0.99256089 -0.68080986]
 [ 0.99256089  1.         -0.76492172]
 [-0.68080986 -0.76492172  1.        ]]
"""
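
The correlation matrix R is related to the covariance matrix C by R_ij = C_ij / sqrt(C_ii * C_jj). The sketch below rebuilds R from np.cov() on the same seeded data and compares it with np.corrcoef():

```python
import numpy as np

rng = np.random.default_rng(seed=42)
xarr = rng.random((3, 3))

C = np.cov(xarr)               # covariance matrix, rows are variables
d = np.sqrt(np.diag(C))        # standard deviation of each variable
R_manual = C / np.outer(d, d)  # R_ij = C_ij / sqrt(C_ii * C_jj)

print(np.allclose(R_manual, np.corrcoef(xarr)))  # True
```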

4.2 numpy.correlate()

numpy.correlate(a, v, mode='valid') computes the cross-correlation of two one-dimensional sequences.

Example：

print(np.correlate([1, 2, 3], [0, 1, 0.5]))
print(np.correlate([1, 2, 3], [0, 1, 0.5], "same"))
print(np.correlate([1, 2, 3], [0, 1, 0.5], "full"))
"""
[3.5]
[2.  3.5 3. ]
[0.5 2.  3.5 3.  0. ]
"""
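
Two observations help in reading these outputs. With equal-length inputs, 'valid' mode reduces to a single dot product, and for real-valued inputs the 'full' cross-correlation equals a convolution with the second sequence reversed. A small sketch on the same inputs:

```python
import numpy as np

a = np.array([1, 2, 3])
v = np.array([0, 1, 0.5])

# "valid" mode with equal-length inputs is a single dot product
print(np.dot(a, v))  # 3.5

# For real inputs, full cross-correlation equals convolution with v reversed
conv = np.convolve(a, v[::-1], mode="full")
corr = np.correlate(a, v, mode="full")
print(conv)
print(corr)
```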

4.3 numpy.cov()

numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None, *, dtype=None) estimates a covariance matrix, given data and weights.

Example：

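# Manual computation of a weighted covariance for a single variable
m = np.arange(10, dtype=np.float64)  # observations
f = np.arange(10) * 2                # frequency weights (fweights)
a = np.arange(10) ** 2.              # observation weights (aweights)
ddof = 1
w = f * a                            # combined weight of each observation
v1 = np.sum(w)
v2 = np.sum(w * a)
m -= np.sum(m * w, axis=None, keepdims=True) / v1  # subtract the weighted mean
cov = np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)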
print(cov)
"""
2.368621947484198
"""

Example：

x = np.array([[0, 2], [1, 1], [2, 0]]).T
print(x)
print(np.cov(x))
"""
[[0 1 2]
 [2 1 0]]
[[ 1. -1.]
 [-1.  1.]]
"""
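
For unweighted data, the matrix returned by np.cov() can be reproduced by centering each variable and dividing the product of the centered data with its transpose by N - 1, which is the default normalization. A minimal sketch on the same x:

```python
import numpy as np

x = np.array([[0, 2], [1, 1], [2, 0]]).T  # 2 variables, 3 observations

# Subtract each variable's mean (rows are variables)
centered = x - x.mean(axis=1, keepdims=True)
n_obs = x.shape[1]

manual = centered @ centered.T / (n_obs - 1)
print(manual)
print(np.allclose(manual, np.cov(x)))  # True
```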

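5 Histograms

5.1 numpy.histogram()

numpy.histogram(a, bins=10, range=None, normed=None, weights=None, density=None) computes the histogram of a dataset. Note that normed is a legacy argument that recent NumPy releases no longer accept; use density instead.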

Example：

print( np.histogram([1, 2, 1], bins=[0, 1, 2, 3]))
print( np.histogram(np.arange(4), bins=np.arange(5), density=True))
print( np.histogram([[1, 2, 1], [1, 0, 1]], bins=[0,1,2,3]))
"""
(array([0, 2, 1], dtype=int64), array([0, 1, 2, 3]))
(array([0.25, 0.25, 0.25, 0.25]), array([0, 1, 2, 3, 4]))
(array([1, 4, 1], dtype=int64), array([0, 1, 2, 3]))
"""

Example：

a = np.arange(5)
hist, bin_edges = np.histogram(a, density=True)
print(hist)
print(hist.sum())
print(np.sum(hist * np.diff(bin_edges)))
"""
[0.5 0.  0.5 0.  0.  0.5 0.  0.5 0.  0.5]
2.4999999999999996
1.0
"""
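
Section 1.4 also lists digitize() and bincount(), which have no example in this lesson. As a rough sketch of how they relate to histogram(), the counts from the first example can be rebuilt by assigning each value to a bin and counting the assignments:

```python
import numpy as np

data = np.array([1, 2, 1])
edges = np.array([0, 1, 2, 3])

# digitize returns, for each value x, the index i with edges[i-1] <= x < edges[i]
idx = np.digitize(data, edges)
print(idx)  # [2 3 2]

# Shift to 0-based bin numbers and count occurrences per bin
counts = np.bincount(idx - 1, minlength=len(edges) - 1)
print(counts)  # [0 2 1]

hist, _ = np.histogram(data, bins=edges)
print(np.array_equal(counts, hist))  # True
```

This equivalence holds only when no value equals the last edge: histogram() treats the last bin as closed on the right, whereas digitize() would place such a value past the final bin.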

Example：

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(10)  # deterministic random data
a = np.hstack((rng.normal(size=1000),
               rng.normal(loc=5, scale=2, size=1000)))
_ = plt.hist(a, bins='auto')  # arguments are passed to np.histogram
plt.title("Histogram with 'auto' bins")
plt.show()

Output: [figure: histogram of a, titled "Histogram with 'auto' bins"]
