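[Task01] NumPy Study Check-in

Posted by Xiao_Spring on 2020-10-20

Original URL: https://blog.csdn.net/Xiao_Spring/article/details/109191998

Note: to save space, every code snippet assumes import numpy as np has already been run, so it is not repeated. Any additional package is imported at the top of the snippet that needs it.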

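I. Constants

Preface:

NumPy defines several constants:
np.inf, np.NINF, np.PZERO & np.NZERO, np.nan, np.e, np.pi, np.euler_gamma, np.newaxis

1. Positive infinity

In NumPy 1.x, positive infinity has five spellings: Inf = inf = infty = Infinity = PINF. (NumPy 2.0 removed the aliases and keeps only np.inf.)

NumPy follows the IEEE standard for binary floating-point arithmetic (IEEE 754).
Positive infinity is not equal to negative infinity, but plain "infinity" means positive infinity.

[Example I.1] Compare Inf, inf, infty, Infinity, and PINF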

print(np.Inf == np.inf)
print(np.inf == np.infty)
print(np.infty == np.Infinity)
print(np.Infinity == np.PINF)

Output:

True
True
True
True

2. Negative infinity

[Example I.2] Print negative infinity

print(np.NINF)

Output:

-inf

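3. Positive and negative zero

[Example I.3] Print positive and negative zero

Positive and negative zero are both considered finite numbers.

print(np.PZERO)
print(np.NZERO)

Output: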

0.0
-0.0
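
The two zeros print differently, but how do they compare? The following is a minimal sketch, not part of the original post. It uses plain float literals instead of np.PZERO and np.NZERO, because those aliases were removed in NumPy 2.0.

```python
import numpy as np

pos_zero = 0.0
neg_zero = -0.0

# The two zeros compare equal...
zeros_equal = (pos_zero == neg_zero)

# ...but the sign bit still tells them apart.
sign_bits = (bool(np.signbit(pos_zero)), bool(np.signbit(neg_zero)))

# Both are finite values.
both_finite = bool(np.isfinite(pos_zero) and np.isfinite(neg_zero))

print(zeros_equal, sign_bits, both_finite)
```

So code that needs to distinguish the sign of a zero should test the sign bit rather than use ==.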

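4. Not a Number (NaN)

NaN has three spellings: nan, NaN, NAN

Not a Number is not the same as infinity.

[Example I.4-1] Check whether two NaN values are equal

# NaN never compares equal, not even to itself
print(np.nan == np.nan)
print(np.NAN == np.NAN)
print(np.NaN == np.NaN)

Output: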

False
False
False

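So a NaN is never equal to another NaN.

[Example I.4-2] Compare nan, NaN, and NAN with each other

# Compare the different spellings of NaN
print(np.nan == np.NAN)
print(np.nan == np.NaN)
print(np.NAN == np.NaN)

Output:

False
False
False

The three names are aliases for the same value; the comparisons are False only because NaN never compares equal to anything, itself included.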
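
Because == is useless for NaN, tests for it usually go through np.isnan. The sketch below is an illustration under that assumption, using only np.nan (the spelling that exists in every NumPy version). The equal_nan keyword of np.array_equal is available in NumPy 1.19 and later.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])

# Element-wise equality treats NaN as unequal to everything, itself included.
self_equal = (a == a)

# np.isnan is the reliable element-wise test.
nan_mask = np.isnan(a)

# array_equal can be told to treat NaNs in matching positions as equal.
same_with_nan = np.array_equal(a, a, equal_nan=True)

print(self_equal, nan_mask, same_with_nan)
```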

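[Example I.4-3] Count the NaN values in an ndarray

# Print the array
x = np.array([1.2, 5, np.nan, True, False])
print(x)
# Test each element for NaN
y = np.isnan(x)
print(y)
# Count the nonzero (True) entries
print(np.count_nonzero(y))
# Wrap the two steps in a function that counts the NaNs in an array
def countNaN(array):
    return np.count_nonzero(np.isnan(array))
print(countNaN(x))
# The aliases need the np. prefix just like np.nan
z = np.array([np.NaN, 5, np.nan, np.NAN, False])
print(countNaN(z))

Output:

[1.2 5.  nan 1.  0. ]
[False False  True False False]
1
1
3

The custom countNaN function first calls NumPy's isnan, which returns an ndarray of booleans, and then calls count_nonzero, which returns the number of entries that are not 0 (that is, not False).

In short, countNaN returns the number of NaN values in an ndarray.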

5. The natural constant e

[Example I.5-1] Print e

print(np.e)

Output:

2.718281828459045


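[Example I.5-2] How many significant digits does NumPy's e have?

print(np.e == 2.718281828459045)
print(np.e == 2.7182818284590452353602874)

Output:

True
True

Out of curiosity, I compared np.e with the printed floating-point value. The comparison is True whether the literal has 15 or 25 digits after the decimal point. The reason is that a float64 holds only about 15 to 17 significant decimal digits, so both literals round to the same stored value.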
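
The comparison result comes down to float64 precision. The sketch below, which is an illustration and not from the original post, checks that the short and long literals are parsed to the same double and reads the precision figures from np.finfo.

```python
import math
import numpy as np

# Both literals are parsed to the nearest float64, which is the same value.
short_literal = 2.718281828459045
long_literal = 2.7182818284590452353602874
same_float = (short_literal == long_literal)

# float64 has a 53-bit significand (52 stored bits plus an implicit one).
decimal_digits = np.finfo(np.float64).precision
significand_bits = np.finfo(np.float64).nmant + 1

# np.e is the same double that the standard library exposes as math.e.
same_as_math = (np.e == math.e)

print(same_float, decimal_digits, significand_bits, same_as_math)
```

Digits beyond what a double can store are simply discarded when the literal is read, so they cannot influence the comparison.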

6. Pi (π)

[Example I.6] Print π

print(np.pi)

Output:

3.141592653589793

7. The Euler-Mascheroni constant γ

[Example I.7] Print γ

print(np.euler_gamma)

Output:

0.5772156649015329

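8. None (newaxis)

As the name suggests, numpy.newaxis creates a new axis; in other words, it expands the dimensions of an array.

[Example I.8-1] The relationship between None and newaxis

print(None == np.newaxis)

Output:

True

So None and np.newaxis are in fact the same constant. The examples below show why NumPy provides the newaxis name:

axis (n.): the imaginary center line about which an object rotates; a fixed reference line on a chart (a coordinate axis); a line of symmetry that divides an object in two.

[Example I.8-2] Using newaxis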

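>>>x = np.array([1, 0, 1, 9])
>>>print(x.shape)
# x is a one-dimensional ndarray
(4,)

>>>x1 = x[np.newaxis,:]
>>>print(x1)
>>>print(x1.shape)
# x1 is a two-dimensional ndarray, 1*4
[[1 0 1 9]]
(1, 4)

>>>x2 = x[:,np.newaxis]
>>>print(x2)
>>>print(x2.shape)
# x2 is a two-dimensional ndarray, 4*1
[[1]
 [0]
 [1]
 [9]]
(4, 1)

This makes the role of np.newaxis clear: it adds a dimension to an ndarray. Placed first, it adds the new dimension in front (the original 1-D array becomes the single row of a 2-D array). Placed last, it adds the new dimension at the end (each element of the original 1-D array becomes its own row of length 1).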
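
A short sketch of the two placements, together with two equivalent spellings (None and np.expand_dims) and one typical use, broadcasting a column against a row. The outer-sum use case is an added illustration, not something the original post covers.

```python
import numpy as np

x = np.array([1, 0, 1, 9])

row = x[np.newaxis, :]      # shape (1, 4): new axis in front
col = x[:, np.newaxis]      # shape (4, 1): new axis at the end

# None and np.expand_dims give the same result as np.newaxis.
col_none = x[:, None]
col_expand = np.expand_dims(x, axis=1)

# Broadcasting a column against a row builds a 4x4 table of pairwise sums.
outer_sum = col + row

print(row.shape, col.shape, outer_sum.shape)
```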

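II. Data Types

Preface

NumPy supports many more data types than Python's built-ins. They map closely onto C data types, and some correspond to Python built-in types. The tables below list the commonly used NumPy basic types. To distinguish them from native Python types, the names bool, int, float, complex, and str carry a trailing "_":

1. Common data types

1) Signed integers: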

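Type	Width	Description	Character code
int8 = byte	8 bits	signed integer	'i1'
int16 = short	16 bits	signed integer	'i2'
int32 = intc	32 bits	signed integer	'i4'
int_ = int64 = long = int0 = intp	64 bits (on most 64-bit platforms)	signed integer	'i8'

2) Unsigned integers:

Type	Width	Description	Character code
uint8 = ubyte	8 bits	unsigned integer	'u1'
uint16 = ushort	16 bits	unsigned integer	'u2'
uint32 = uintc	32 bits	unsigned integer	'u4'
uint64 = uintp = uint0 = uint	64 bits (on most 64-bit platforms)	unsigned integer	'u8'

3) Floating point:

Type	Width	Description	Character code
float16 = half	16 bits	floating point	'f2'
float32 = single	32 bits	floating point	'f4'
float_ = float64 = double	64 bits	floating point	'f8'

4) Other types:

Type	Width	Description	Character code
bool_ = bool8	8 bits	boolean	'b1'
str_ = unicode_ = str0 = unicode		Unicode string	'U'
datetime64		date and time	'M'
timedelta64		interval between two times	'm'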

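2. Creating data types

What print(numpy.dtype) shows is a NumPy data type, not a native Python type.
Each one is actually an instance of the dtype class.

# dtype class (simplified stub of the signature)
class dtype(object):
    def __init__(self, obj, align=False, copy=False):
        pass

[Example II.2] Create different data types

Here we define a function genDtype that builds a dtype and prints its details, then run it over a batch of type codes:

def genDtype(code):
    a = np.dtype(code)
    print(a.type)      # scalar type, e.g. <class 'numpy.bool_'>
    print(a.itemsize)  # size of one element in bytes

# Batch test
strList = ['b1','i1','i2','i4','i8','u1','u2','u4','u8','f2','f4','f8','S','S3','U3']
for x in strList:
    genDtype(x)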
<class 'numpy.bool_'>
1
<class 'numpy.int8'>
1
<class 'numpy.int16'>
2
<class 'numpy.int32'>
4
<class 'numpy.int64'>
8
<class 'numpy.uint8'>
1
<class 'numpy.uint16'>
2
<class 'numpy.uint32'>
4
<class 'numpy.uint64'>
8
<class 'numpy.float16'>
2
<class 'numpy.float32'>
4
<class 'numpy.float64'>
8
<class 'numpy.bytes_'>
0
<class 'numpy.bytes_'>
3
<class 'numpy.str_'>
12

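Note that itemsize is measured in bytes. The dtype 'S' has size 0 while 'S3' has size 3, and each character of a 'U' dtype takes 4 bytes (32 bits), which is why 'U3' reports 12.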
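
The sizes above can be checked directly. The sketch below also shows, as an added illustration, that when an array of strings is built without an explicit dtype, NumPy infers the width from the longest element.

```python
import numpy as np

# 'U3' holds 3 code points at 4 bytes each; 'S3' holds 3 bytes.
u3_size = np.dtype('U3').itemsize
s3_size = np.dtype('S3').itemsize

# A bare 'S' has no length yet, so its itemsize is 0 until data fixes it.
s_size = np.dtype('S').itemsize

# When an array is built, the width is inferred from the longest element.
words = np.array(['a', 'abc', 'ab'])
inferred = words.dtype
inferred_size = words.dtype.itemsize

print(u3_size, s3_size, s_size, inferred, inferred_size)
```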

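3. Data type information

A Python float is usually a 64-bit floating-point number, nearly identical to np.float64.

NumPy and Python integers behave very differently on overflow. Unlike NumPy's, Python's int is flexible: it grows to hold any integer and never overflows. NumPy integers, by contrast, have a fixed width, so let us look at the range each data type can represent.

# iinfo class (simplified stub)
class iinfo(object):
    def __init__(self, int_type):
        pass
    def min(self):
        pass
    def max(self):
        pass

The initializer of iinfo takes one argument, int_type, which the caller uses to name the integer type to inspect.

[Example II.3-1] Range of int16 and int32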

ii16 = np.iinfo(np.int16)
print(ii16.min)
print(ii16.max)

ii32 = np.iinfo(np.int32)
print(ii32.min)
print(ii32.max)

Output:

-32768
32767
-2147483648
2147483647
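
The overflow difference between Python and NumPy integers mentioned earlier can be seen in a few lines. This is a minimal sketch using int8 so that the limit is easy to reach; the same wrap-around applies to the wider types.

```python
import numpy as np

# Python integers grow as needed.
big_python_int = 2 ** 100

# A fixed-width NumPy array wraps around silently when it overflows.
small = np.array([127], dtype=np.int8)
wrapped = small + np.int8(1)       # 127 + 1 does not fit in int8

# np.iinfo gives the limits to check against before computing.
limits = np.iinfo(np.int8)

print(big_python_int, wrapped, limits.min, limits.max)
```

Checking values against np.iinfo before an operation, or choosing a wider dtype, are two common ways to avoid the silent wrap.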

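Floating-point numbers in NumPy have range limits as well. First the definition:

# finfo class (simplified stub)
class finfo(object):
    def __init__(self, dtype):
        pass

[Example II.3-2] Range of float16 and float32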

ff16 = np.finfo(np.float16)
print(ff16.bits)
print(ff16.min)
print(ff16.max)
print(ff16.eps)

ff32 = np.finfo(np.float32)
print(ff32.bits)
print(ff32.min)
print(ff32.max)
print(ff32.eps)

16
-65500.0
65500.0
0.000977
32
-3.4028235e+38
3.4028235e+38
1.1920929e-07

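Here eps is the machine epsilon: the gap between 1.0 and the next larger representable number of that type, a very small positive value. Because a denominator must not be 0, a common trick is to add eps to a denominator that might be zero (or to substitute eps for it), so that the division does not fail.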

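III. Datetimes and Timedeltas

Introduction

This chapter covers NumPy's datetime type (datetime64) and time-difference type (timedelta64).

NumPy makes it easy to convert a string into the datetime64 type (the name datetime was already taken by the date and time library that ships with Python).

datetime64 is a datetime type that carries a unit. The units are:

Date unit	Meaning	Time unit	Meaning
Y	year	h	hour
M	month	m	minute
W	week	s	second
D	day	ms	millisecond
-	-	us	microsecond
-	-	ns	nanosecond
-	-	ps	picosecond
-	-	fs	femtosecond
-	-	as	attosecond

Second, millisecond, microsecond, nanosecond, picosecond, femtosecond, attosecond: each adjacent pair differs by a factor of 1000.

One attosecond is roughly the time light takes to cross three hydrogen atoms.

For scale, an attosecond is to a second what a second is to about 31.71 billion years, roughly twice the age of the universe.

1. Using datetime64:

[Example III.1-1] datetime64 picks the matching unit automatically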

>>>a = np.datetime64('2020-10')
>>>print(a,a.dtype)
2020-10 datetime64[M]

>>>b = np.datetime64('2020-10-20')
>>>print(b,b.dtype)
2020-10-20 datetime64[D]

>>>c = np.datetime64('2020-10-20 19')
>>>print(c,c.dtype)
2020-10-20T19 datetime64[h]

>>>d = np.datetime64('2020-10-20 19:37')
>>>print(d,d.dtype)
2020-10-20T19:37 datetime64[m]

>>>e = np.datetime64('2020-10-20 19:37:21')
>>>print(e,e.dtype)
2020-10-20T19:37:21 datetime64[s]
...

[Example III.1-2] datetime64 with an explicit unit

One example is enough here:

>>>a = np.datetime64('2020-10', 'D')
>>>print(a)
2020-10-01

Now compare '2020-10' with '2020-10-01':

>>>print(np.datetime64('2020-10') == np.datetime64('2020-10-01'))
True

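As the example shows, 2020-10 and 2020-10-01 represent the same point in time.
In fact, two datetime64 objects with different units may still denote the same instant, and converting from a coarser unit (such as months) to a finer one (such as days) is safe.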
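
The two directions of unit conversion can be sketched with astype. Going from months to days loses nothing, while going from days to months truncates the day of the month.

```python
import numpy as np

month = np.datetime64('2020-10')          # datetime64[M]
day = np.datetime64('2020-10-20')         # datetime64[D]

# Coarse to fine: the month becomes its first day, nothing is lost.
month_as_day = month.astype('datetime64[D]')

# Fine to coarse: the day of the month is truncated away.
day_as_month = day.astype('datetime64[M]')

# Comparison converts both sides to the finer unit first.
same_instant = bool(month == np.datetime64('2020-10-01'))

print(month_as_day, day_as_month, same_instant)
```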

[Example III.1-3] Create a datetime array from strings (the finest unit wins)

>>>a = np.array(['2020-10', '2020-10-20', '2020-10-20 20:00'], dtype='datetime64')
>>>print(a)
>>>print(a.dtype)
['2020-10-01T00:00' '2020-10-20T00:00' '2020-10-20T20:00']
datetime64[m]

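When the strings in the list have mixed units, the whole array is expressed in the finest unit present (the finest-unit rule).

[Example III.1-4] Use with the arange function

Combined with arange, datetime64 can generate a range of dates: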

>>>a = np.arange('2020-10', '2020-11', dtype='datetime64[D]')
>>>print(a)
['2020-10-01' '2020-10-02' '2020-10-03' '2020-10-04' '2020-10-05'
 '2020-10-06' '2020-10-07' '2020-10-08' '2020-10-09' '2020-10-10'
 '2020-10-11' '2020-10-12' '2020-10-13' '2020-10-14' '2020-10-15'
 '2020-10-16' '2020-10-17' '2020-10-18' '2020-10-19' '2020-10-20'
 '2020-10-21' '2020-10-22' '2020-10-23' '2020-10-24' '2020-10-25'
 '2020-10-26' '2020-10-27' '2020-10-28' '2020-10-29' '2020-10-30'
 '2020-10-31']

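Year-to-year and month-to-month ranges work the same way, so they are not repeated here.

This approach also follows the finest-unit rule, for example: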

>>>a = np.arange('2020-10-01 20', '2020-10-03', dtype='datetime64')
>>>print(a)
['2020-10-01T20' '2020-10-01T21' '2020-10-01T22' '2020-10-01T23'
 '2020-10-02T00' '2020-10-02T01' '2020-10-02T02' '2020-10-02T03'
 '2020-10-02T04' '2020-10-02T05' '2020-10-02T06' '2020-10-02T07'
 '2020-10-02T08' '2020-10-02T09' '2020-10-02T10' '2020-10-02T11'
 '2020-10-02T12' '2020-10-02T13' '2020-10-02T14' '2020-10-02T15'
 '2020-10-02T16' '2020-10-02T17' '2020-10-02T18' '2020-10-02T19'
 '2020-10-02T20' '2020-10-02T21' '2020-10-02T22' '2020-10-02T23']

In the code above the endpoints have units h and D, and the range is still expanded in h.

[Example III.1-5] An interesting start date

>>>a = np.datetime64('2020-10-20', 'W')
>>>b = np.datetime64('2020-10-22', 'W')
>>>print(a, b)
2020-10-15 2020-10-22

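Unlike [Example III.1-2], specifying 'W' for a day-precision datetime64 follows this rule:

1) If the date is a Thursday, that day is returned;
2) Otherwise, the preceding Thursday is returned.

The reason is that weeks are counted from the epoch 1970-01-01, which was a Thursday.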
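
The Thursday rule follows from how week units are stored: as a count of whole 7-day blocks since 1970-01-01. The sketch below reproduces the truncation by hand; it uses astype rather than the two-argument constructor, purely as a choice for the illustration.

```python
import datetime
import numpy as np

# The epoch 1970-01-01 was a Thursday (weekday() == 3, with Monday == 0).
epoch_weekday = datetime.date(1970, 1, 1).weekday()

# Casting to weeks floors the day count to a multiple of 7 since the epoch,
# so every 'W' value lands on a Thursday.
tuesday = np.datetime64('2020-10-20')
week_start = tuesday.astype('datetime64[W]').astype('datetime64[D]')

# The same floor computed by hand from the integer day count.
days_since_epoch = int(tuesday.astype('int64'))
floored = np.datetime64(days_since_epoch // 7 * 7, 'D')

print(epoch_weekday, week_start, floored)
```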

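2. Arithmetic with datetime64

[Example III.2-1] datetime64 and timedelta64 arithmetic

A timedelta64 represents the difference between two datetime64 values. It also carries a unit, which matches the finer of the two units in the subtraction (the finest-unit rule).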

>>>a = np.datetime64('2020-10-20') - np.datetime64('2020-10-19')
>>>b = np.datetime64('2020-10-20') - np.datetime64('2020-10-19 09:00')
>>>c = np.datetime64('2020-10-20') - np.datetime64('2020-10-18 23:00', 'D')
>>>print(a, a.dtype)
>>>print(b, b.dtype)
>>>print(c, c.dtype)

1 days timedelta64[D]
900 minutes timedelta64[m]
2 days timedelta64[D]

The third line works like [Example III.1-2]: 'D' truncates everything finer than a day.

Next, adding a timedelta64 to a datetime64:

>>>a = np.datetime64('2020-10') + np.timedelta64(20, 'D')
>>>print(a,a.dtype)
2020-10-21 datetime64[D]

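[Example III.2-2] Arithmetic on timedelta64 alone

# Create timedelta64 values directly
>>>a = np.timedelta64(1, 'Y')    # option 1: from a number and a unit
>>>b = np.timedelta64(a, 'M')    # option 2: from another timedelta64, converting the unit
>>>print(a)
>>>print(b)
1 years
12 months

# Add, subtract, multiply, and divide timedelta64 values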
>>>a = np.timedelta64(2, 'Y')
>>>b = np.timedelta64(5, 'M')
>>>print(a + b)
>>>print(a - b)
>>>print(2 * a)
>>>print(a / b)
29 months
19 months
4 years
4.8

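Years ('Y') and months ('M') receive special treatment: they cannot be converted to or combined with the other units, because the number of days in a year or a month is not fixed. For example: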

>>>a = np.timedelta64(1, 'M')
>>>b = np.timedelta64(a, 'D')
TypeError: Cannot cast NumPy timedelta64 scalar from metadata [M] to [D] according to the rule 'same_kind'
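
A month has no fixed length as a timedelta, but the length of a specific calendar month is well defined. One way to get it, sketched below, is to subtract two dates. The helper name days_in_month is hypothetical and introduced only for this illustration.

```python
import numpy as np

def days_in_month(month):
    """Days in a calendar month given as 'YYYY-MM' (hypothetical helper)."""
    start = np.datetime64(month, 'M')
    # Stepping to the next month is well defined, and so is
    # converting a concrete date from months to days.
    end = start + np.timedelta64(1, 'M')
    span = end.astype('datetime64[D]') - start.astype('datetime64[D]')
    return int(span / np.timedelta64(1, 'D'))

print(days_in_month('2020-02'), days_in_month('2020-10'), days_in_month('2021-02'))
```

The conversion works here because it is applied to dates anchored on the calendar, not to a free-floating duration.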

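[Example III.2-3] Converting between numpy.datetime64 and datetime.datetime

The datetime module is Python's standard set of classes for handling dates and times.

It supports date and time arithmetic, but its implementation focuses on efficient attribute extraction for output formatting and data manipulation.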

>>>import datetime

>>>dt = datetime.datetime(2020, 10, 20)
>>>dt64 = np.datetime64(dt, 'D')
>>>print(dt64, dt64.dtype)

>>>dt2 = dt64.astype(datetime.datetime)
>>>print(dt2,type(dt2))
2020-10-20 datetime64[D]
2020-10-20 <class 'datetime.date'>

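As shown, np.datetime64(dt, unit) converts a datetime.datetime into a numpy.datetime64, and datetime64.astype(datetime.datetime) converts back the other way. Note that with the 'D' unit the result of the second conversion is a datetime.date, as the output shows.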
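
The Python type that comes back from the round trip depends on the unit of the datetime64. A minimal sketch comparing the day and second units:

```python
import datetime
import numpy as np

dt = datetime.datetime(2020, 10, 20, 19, 37, 21)

# With a day unit the time of day is dropped, and the round trip gives a date.
as_day = np.datetime64(dt, 'D').astype(datetime.datetime)

# With a second unit the round trip gives back a full datetime.
as_second = np.datetime64(dt, 's').astype(datetime.datetime)

print(type(as_day).__name__, as_day)
print(type(as_second).__name__, as_second)
```

Choosing a unit at least as fine as the data (seconds or microseconds) keeps the time of day through the conversion.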

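3. Business-day functions

To allow datetimes to be used in contexts where only certain days of the week are valid, NumPy includes a set of "busday" (business day) functions.

busday_offset applies the given offsets, counted in business days, to a date in days ('D'); a positive offset moves toward later dates.

[Example III.3-1] Check whether a date is a business day

# 2020-10-20 is a Tuesday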
>>>a = np.is_busday('2020-10-18')
>>>b = np.is_busday('2020-10-20')
>>>print(a)  
>>>print(b)  

False
True

[Example III.3-2] Get the n-th next business day

# 2020-10-20 is a Tuesday
>>>a = np.busday_offset('2020-10-20', offsets=1)
>>>print(a)
2020-10-21

>>>a = np.busday_offset('2020-10-20', offsets=2)
>>>print(a)
2020-10-22

>>>a = np.busday_offset('2020-10-18', offsets=2)
>>>print(a)
ValueError: Non-business day date in busday_offset


>>>a = np.busday_offset('2020-10-18', offsets=0, roll='forward')
>>>b = np.busday_offset('2020-10-18', offsets=0, roll='backward')
>>>print(a) 
>>>print(b) 
2020-10-19
2020-10-16

>>>a = np.busday_offset('2020-10-18', offsets=1, roll='forward')
>>>b = np.busday_offset('2020-10-18', offsets=1, roll='backward')
>>>print(a)  
>>>print(b)  
2020-10-20
2020-10-19

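An offset of 0 returns the nearest business day before or after the given date; if the date is itself a business day, it is returned unchanged. If the date is not a business day, the default is to raise an error, which the forward or backward roll rule avoids.
(Note: forward means toward later dates and backward means toward earlier dates; the literal wording is easy to mix up.)

Tip: first check whether the date is a business day. If it is, the offset is applied directly; if it is not, the roll rule (forward or backward) decides which business day the offset starts from.

[Example III.3-3] Count the business days in a `datetime64[D]` array.

# 2020-10-20 is a Tuesday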
>>>begindates = np.datetime64('2020-10-20')
>>>enddates = np.datetime64('2020-11-01')
>>>a = np.arange(begindates, enddates, dtype='datetime64')
>>>b = np.count_nonzero(np.is_busday(a))
>>>c = np.busday_count(begindates, enddates)
>>>print(a)
>>>print(b)
>>>print(c)

['2020-10-20' '2020-10-21' '2020-10-22' '2020-10-23' '2020-10-24'
 '2020-10-25' '2020-10-26' '2020-10-27' '2020-10-28' '2020-10-29'
 '2020-10-30' '2020-10-31']
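9
9

Combining a and b gives the same result as c: both count the business days in a date range that includes the start and excludes the end.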

[Example III.3-4] Custom week mask: specify which days of the week are business days.

# 2020-10-20 is a Tuesday
>>>a = np.is_busday('2020-10-20', weekmask=[1, 0, 1, 0, 1, 0, 1])
>>>b = np.is_busday('2020-10-20', weekmask='0100000')
>>>c = np.is_busday('2020-10-20', weekmask='Tue')
>>>print(a)  
>>>print(b)  
>>>print(c)
False
True
True

As shown, weekmask can be given in three forms. Note that the weekdays line up with the real calendar, so 1970-01-01 is a Thursday. Interesting, isn't it?
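
A week mask can be bundled with a holiday list into an np.busdaycalendar object and passed to the busday functions through their busdaycal parameter. The sketch below is an added illustration; the holiday date is hypothetical and chosen only to show the effect.

```python
import numpy as np

# Hypothetical calendar: Monday to Friday are working days,
# and 2020-10-26 is treated as a holiday purely for illustration.
cal = np.busdaycalendar(weekmask='1111100', holidays=['2020-10-26'])

holiday_is_busday = bool(np.is_busday('2020-10-26', busdaycal=cal))

# Business days in [2020-10-20, 2020-11-01), now skipping the holiday.
count = int(np.busday_count('2020-10-20', '2020-11-01', busdaycal=cal))

# One business day after Friday 2020-10-23 skips the weekend and the holiday.
next_day = np.busday_offset('2020-10-23', 1, busdaycal=cal)

print(holiday_is_busday, count, next_day)
```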

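IV. Creating Arrays

Introduction

The most important object defined in NumPy is the N-dimensional array type called ndarray, which describes a collection of elements of the same type. Every element of an ndarray is described by the same data-type object (dtype), and every element occupies a block of memory of the same size.

ndarray stands for "the N-dimensional array".

1. Creating an ndarray from existing data

[Example IV.1-1] Create with the array() function.

def array(p_object, dtype=None, copy=True, order='K', subok=False, ndmin=0):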


>>>datas = [x for x in range(5)]

# Create one-dimensional arrays
>>>a = np.array(datas)
>>>b = np.array(tuple(datas))
>>>print(a, type(a))
>>>print(b, type(b))
>>>print(a.shape,b.shape)
[0 1 2 3 4] <class 'numpy.ndarray'>
[0 1 2 3 4] <class 'numpy.ndarray'>
(5,) (5,)


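# Create a two-dimensional array
>>>c = np.array([a]*2)
>>>print(c, type(c), c.shape)
[[0 1 2 3 4]
 [0 1 2 3 4]] <class 'numpy.ndarray'> (2, 5)

As shown, array() accepts either a list or a tuple. A handy rule of thumb: for an n-dimensional array, the number of opening brackets at the start of the array() argument equals n.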

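[Example IV.1-2] Create with the asarray() function

Both array() and asarray() convert structured data into an ndarray. The main difference appears when the source is already an ndarray: array() still makes a copy that occupies new memory, whereas asarray() does not, as long as the dtype is unchanged.

If the dtype does change, for example from int to float, asarray() has to allocate new memory and returns a converted copy.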

def asarray(a, dtype=None, order=None):
    return array(a, dtype, copy=False, order=order)

# Both `array()` and `asarray()` convert structured data into an ndarray

>>>x = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
>>>y = np.array(x)
>>>z = np.asarray(x)
>>># Modify the data in x
>>>x[1][2] = 2
>>>print(x,type(x))
[[1, 1, 1], [1, 1, 2], [1, 1, 1]] <class 'list'>

>>>print(y,type(y))
[[1 1 1]
 [1 1 1]
 [1 1 1]] <class 'numpy.ndarray'>

>>>print(z,type(z))
[[1 1 1]
 [1 1 1]
 [1 1 1]] <class 'numpy.ndarray'>

# When the data source is an ndarray

>>>x = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
>>>y = np.array(x)
>>>z = np.asarray(x)
>>>w = np.asarray(x, dtype=int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent
>>>x[1][2] = 2
>>>print(x,type(x),x.dtype)
[[1 1 1]
 [1 1 2]
 [1 1 1]] <class 'numpy.ndarray'> int32

>>>print(y,type(y),y.dtype)
[[1 1 1]
 [1 1 1]
 [1 1 1]] <class 'numpy.ndarray'> int32

>>>print(z,type(z),z.dtype)
[[1 1 1]
 [1 1 2]
 [1 1 1]] <class 'numpy.ndarray'> int32

>>>print(w,type(w),w.dtype)
[[1 1 1]
 [1 1 2]
 [1 1 1]] <class 'numpy.ndarray'> int32

So w changes along with x: it refers to the same memory rather than to a new copy. (The dtype printed is the platform's default integer; int32 is shown here, and many platforms print int64 instead.)
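
Instead of inferring the sharing from printed values, it can be tested directly. The sketch below uses np.shares_memory, which is not used in the original post, to compare array() and asarray(), including the case where asarray() is asked for a different dtype.

```python
import numpy as np

x = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])

copied = np.array(x)                          # a new buffer by default
aliased = np.asarray(x)                       # same dtype: x itself is returned
converted = np.asarray(x, dtype=np.float64)   # different dtype: a copy is required

x[1, 2] = 2

print(aliased is x)
print(np.shares_memory(x, copied))
print(np.shares_memory(x, converted))
print(copied[1, 2], aliased[1, 2], converted[1, 2])
```

Only the aliased array sees the later change to x; the two copies keep the old value.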

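# When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array

>>>x = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]], dtype=np.int32)  # int32 stated explicitly so the example is reproducible
>>>print(x, x.dtype)
[[1 1 1]
 [1 1 1]
 [1 1 1]] int32
>>>x.dtype = np.float64  # np.float was removed in NumPy 1.24; it meant the 64-bit float

# ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.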
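
The size rule in that message is plain arithmetic on bytes. The sketch below assumes an int32 array and uses view, which reinterprets the buffer in the same way as assigning to dtype, so that the original array is left untouched.

```python
import numpy as np

x = np.ones((3, 3), dtype=np.int32)

# Each row (the last axis) occupies 3 * 4 = 12 bytes.
row_bytes = x.shape[-1] * x.itemsize
fits = (row_bytes % np.dtype(np.float64).itemsize == 0)   # 12 % 8 is not 0

# Reinterpreting the buffer as 8-byte floats therefore fails...
try:
    x.view(np.float64)
    view_failed = False
except ValueError:
    view_failed = True

# ...while a row of 4 int32 values (16 bytes) splits cleanly into 2 float64 slots.
y = np.ones((3, 4), dtype=np.int32)
reinterpreted_shape = y.view(np.float64).shape

# To convert values rather than reinterpret bytes, astype is the usual tool.
as_float = x.astype(np.float64)

print(row_bytes, fits, view_failed, reinterpreted_shape, as_float.dtype)
```

Reinterpreting bytes changes the meaning of the stored bits, so astype is normally what is wanted when the goal is int to float conversion.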

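How should the error above be read? Each row holds 3 int32 values, 12 bytes in total, which cannot be divided evenly into 8-byte float64 items. A diagram makes this clear: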