Practice Assignment

Posted by qq_47718334 on 2020-12-02

Original article: https://blog.csdn.net/qq_47718334/article/details/110456544

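This exercise uses the iris dataset, .\iris.data, which covers three iris species: Iris Setosa, Iris Versicolour, and Iris Virginica. Each species has 50 samples, so the dataset contains 150 samples in total.
sepallength: sepal length
sepalwidth: sepal width
petallength: petal length
petalwidth: petal width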

Load the iris dataset, keeping the text fields unchanged.

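import numpy as np
outfile = r'.\iris.data'
# dtype=object keeps every field as text; skiprows=1 skips a header row.
# The raw UCI iris.data file has no header, so this assumes a local copy that has one.
iris_data = np.loadtxt(outfile, dtype=object, delimiter=',', skiprows=1)
print(iris_data[0:10])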
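
When the numbers are needed as floats, one option is to read the numeric columns and the species column separately. The sketch below uses an in-memory three-row excerpt in place of .\iris.data; the header line and the column name `species` are assumptions made for illustration, not taken from the actual file.

```python
import io
import numpy as np

# Hypothetical excerpt with a header line, standing in for .\iris.data.
sample = io.StringIO(
    "sepallength,sepalwidth,petallength,petalwidth,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# Numeric columns as floats; skiprows=1 drops the header.
measurements = np.loadtxt(sample, dtype=float, delimiter=",",
                          skiprows=1, usecols=[0, 1, 2, 3])

sample.seek(0)  # rewind the in-memory file before reading it a second time
species = np.loadtxt(sample, dtype=str, delimiter=",",
                     skiprows=1, usecols=[4])

print(measurements.shape)
print(species)
```

Keeping the two arrays separate avoids repeated `astype(float)` conversions later on.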

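Compute the mean, median, and standard deviation of the iris sepal length (column 1, sepallength). [Topic: statistics] How do you compute the mean, median, and standard deviation of a NumPy array?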

import numpy as np 
outfile = r'.\iris.data' 
sepalLength = np.loadtxt(outfile, dtype=float, delimiter=',', skiprows=1, usecols=[0]) 
print(sepalLength[0:10]) # [5.1 4.9 4.7 4.6 5. 5.4 4.6 5. 4.4 4.9] 
print(np.mean(sepalLength)) # 5.843333333333334 
print(np.median(sepalLength)) # 5.8
print(np.std(sepalLength)) # 0.8253012917851409
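
Note that `np.std` divides by n by default (population standard deviation); passing `ddof=1` divides by n - 1 instead. A small sketch on the first ten sepal lengths printed above shows the difference; the variable names are chosen here for illustration.

```python
import numpy as np

# First ten sepal lengths, as printed by the snippet above.
sepal = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])

mean = np.mean(sepal)
median = np.median(sepal)
pop_std = np.std(sepal)             # ddof=0: divide by n
sample_std = np.std(sepal, ddof=1)  # ddof=1: divide by n - 1

print(mean, median, pop_std, sample_std)
```

The two standard deviations differ by a factor of sqrt(n / (n - 1)), which matters most for small samples.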

Create a normalized form of the iris sepal length whose values lie between 0 and 1, so that the minimum is 0 and the maximum is 1 (column 1, sepallength).

import numpy as np 
outfile = r'.\iris.data' 
sepalLength = np.loadtxt(outfile, dtype=float, delimiter=',', skiprows=1, usecols=[0]) 
# Method 1
aMax = np.amax(sepalLength)
aMin = np.amin(sepalLength)
x = (sepalLength - aMin) / (aMax - aMin)
print(x[0:10])
# [0.22222222 0.16666667 0.11111111 0.08333333 0.19444444 0.30555556
# 0.08333333 0.19444444 0.02777778 0.16666667]
# Method 2
x = (sepalLength - aMin) / np.ptp(sepalLength)
print(x[0:10])
# [0.22222222 0.16666667 0.11111111 0.08333333 0.19444444 0.30555556
# 0.08333333 0.19444444 0.02777778 0.16666667]
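
The same min-max scaling can be wrapped in a small helper. This is a sketch only: the function name `min_max_scale` and the choice to return zeros for a constant array are assumptions made here, not part of the original exercise.

```python
import numpy as np

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    span = np.ptp(values)  # max - min
    if span == 0:
        # All values identical: scaling is undefined, so return zeros
        # (a choice made for this sketch).
        return np.zeros_like(values)
    return (values - values.min()) / span

# First ten sepal lengths from the exercise above.
sepal = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
scaled = min_max_scale(sepal)
print(scaled)
```

Because only ten values are used here, the minimum and maximum differ from those of the full column, so the scaled values differ from the output shown above.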

Find the 5th and 95th percentiles of the iris sepal length (column 1, sepallength).

import numpy as np 
outfile = r'.\iris.data' 
sepalLength = np.loadtxt(outfile, dtype=float, delimiter=',', skiprows=1, usecols=[0]) 
x = np.percentile(sepalLength, [5, 95]) 
print(x) # [4.6 7.255]
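
By default `np.percentile` interpolates linearly between neighbouring sorted values, which is why a result such as 7.255 need not appear in the data. The sketch below reproduces that rule by hand on a made-up five-element array; `percentile_linear` is a hypothetical helper written for this illustration.

```python
import numpy as np

def percentile_linear(sorted_values, q):
    """Linear-interpolation percentile on an already sorted 1-D array."""
    rank = q / 100 * (len(sorted_values) - 1)  # fractional index into the sorted data
    lo = int(np.floor(rank))
    hi = int(np.ceil(rank))
    frac = rank - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # toy data, already sorted
p5, p95 = np.percentile(values, [5, 95])

print(p5, percentile_linear(values, 5))
print(p95, percentile_linear(values, 95))
```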

Set 20 random positions in the iris_data dataset to np.nan.

import numpy as np 
outfile = r'.\iris.data' 
# Method 1
iris_data = np.loadtxt(outfile, dtype=object, delimiter=',', skiprows=1)
i, j = iris_data.shape
np.random.seed(20200621)
iris_data[np.random.randint(i, size=20), np.random.randint(j, size=20)] = np.nan
print(iris_data[0:10])
# Method 2
iris_data = np.loadtxt(outfile, dtype=object, delimiter=',', skiprows=1)
i, j = iris_data.shape
np.random.seed(20200620)
iris_data[np.random.choice(i, size=20), np.random.choice(j, size=20)] = np.nan
print(iris_data[0:10])
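
Both methods draw row and column indices independently with replacement, so the same cell can be picked twice and fewer than 20 distinct cells may end up as nan. If exactly 20 distinct cells are wanted, one possible approach is to sample flat indices without replacement. The sketch below uses a zero-filled placeholder array with the shape of the four numeric columns rather than the real file.

```python
import numpy as np

rng = np.random.default_rng(20200621)
data = np.zeros((150, 4))  # placeholder with the shape of the numeric iris columns

# Sample 20 distinct flat positions, then convert them to (row, column) pairs.
flat_idx = rng.choice(data.size, size=20, replace=False)
rows, cols = np.unravel_index(flat_idx, data.shape)
data[rows, cols] = np.nan

print(np.isnan(data).sum())  # 20
```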

Find the number and positions of missing values in the sepallength column (column 1) of iris_data.

import numpy as np 
outfile = r'.\iris.data' 
iris_data = np.loadtxt(outfile, dtype=float, delimiter=',', skiprows=1, usecols=[0, 1, 2, 3])
i, j = iris_data.shape
np.random.seed(20200621)
iris_data[np.random.randint(i, size=20), np.random.randint(j, size=20)] = np.nan
sepallength = iris_data[:, 0]
x = np.isnan(sepallength)
print(sum(x)) # 6
print(np.where(x))
# (array([ 26, 44, 55, 63, 90, 115], dtype=int64),)
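
An equivalent way to count and locate the missing entries uses `np.count_nonzero` and `np.flatnonzero`, which return a plain count and a 1-D index array. The five-element column below is made up for illustration.

```python
import numpy as np

col = np.array([5.1, np.nan, 4.7, np.nan, 5.0])  # toy column with two gaps

mask = np.isnan(col)
count = np.count_nonzero(mask)    # number of nan entries
positions = np.flatnonzero(mask)  # their indices as a 1-D array

print(count, positions)
```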

Filter the rows of iris_data with sepallength (column 1) < 5.0 and petallength (column 3) > 1.5.

import numpy as np 
outfile = r'.\iris.data' 
iris_data = np.loadtxt(outfile, dtype=float, delimiter=',', skiprows=1, usecols=[0, 1, 2, 3]) 
sepallength = iris_data[:, 0] 
petallength = iris_data[:, 2]
index = np.where(np.logical_and(petallength > 1.5, sepallength < 5.0))
print(iris_data[index])
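
The same filter can be written with a boolean mask and the `&` operator, skipping `np.where`. The four rows below are made-up values in the iris column layout, used only to show which rows survive.

```python
import numpy as np

# Illustrative rows: sepallength, sepalwidth, petallength, petalwidth.
data = np.array([
    [4.8, 3.4, 1.9, 0.2],
    [5.1, 3.5, 1.4, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [4.9, 2.4, 3.3, 1.0],
])

# Parentheses are required: & binds more tightly than < and >.
mask = (data[:, 0] < 5.0) & (data[:, 2] > 1.5)
selected = data[mask]

print(selected)
```

Row 3 is excluded because its petal length equals 1.5 and the comparison is strict.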

Select the rows of iris_data that contain no nan values.

import numpy as np 
outfile = r'.\iris.data' 
iris_data = np.loadtxt(outfile, dtype=float, delimiter=',', skiprows=1, usecols=[0, 1, 2, 3]) 
i, j = iris_data.shape
np.random.seed(20200621)
iris_data[np.random.randint(i, size=20), np.random.randint(j, size=20)] = np.nan
x = iris_data[np.sum(np.isnan(iris_data), axis=1) == 0]
print(x[0:10])
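
A slightly more direct spelling of the same selection negates an `any` test over each row. The array below is a made-up example with two incomplete rows.

```python
import numpy as np

data = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [np.nan, 3.0, 1.4, 0.2],
    [4.7, 3.2, np.nan, 0.2],
    [4.6, 3.1, 1.5, 0.2],
])

# Keep a row only if none of its entries is nan.
complete = data[~np.isnan(data).any(axis=1)]

print(complete)
```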

Compute the correlation coefficient between sepalLength (column 1) and petalLength (column 3) in iris_data.

import numpy as np 
outfile = r'.\iris.data' 
iris_data = np.loadtxt(outfile, dtype=float, delimiter=',', skiprows=1, usecols=[0, 1, 2, 3]) 
sepalLength = iris_data[:, 0] 
petalLength = iris_data[:, 2] 
# Method 1
m1 = np.mean(sepalLength)
m2 = np.mean(petalLength)
cov = np.dot(sepalLength - m1, petalLength - m2)
std1 = np.sqrt(np.dot(sepalLength - m1, sepalLength - m1))
std2 = np.sqrt(np.dot(petalLength - m2, petalLength - m2))
print(cov / (std1 * std2)) # 0.8717541573048712
# Method 2
x = np.mean((sepalLength - m1) * (petalLength - m2))
y = np.std(sepalLength) * np.std(petalLength)
print(x / y) # 0.8717541573048712
# Method 3
x = np.cov(sepalLength, petalLength, ddof=0)
y = np.std(sepalLength) * np.std(petalLength)
print(x[0, 1] / y) # 0.8717541573048716
# Method 4
x = np.corrcoef(sepalLength, petalLength)
print(x)
# [[1.         0.87175416]
#  [0.87175416 1.        ]]
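
All four methods evaluate the Pearson correlation coefficient; they differ only in where the common factor 1/n is applied, and that factor cancels between numerator and denominator. Writing x for sepalLength and y for petalLength (symbols introduced here for illustration), with population standard deviations as computed by `np.std`:

```latex
r
= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
       {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
= \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \, \sigma_y},
\qquad
\sigma_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

The first form corresponds to Method 1 and the second to Methods 2 and 3; the tiny difference in the last digits of Method 3's output is floating-point rounding.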

Check whether iris_data has any missing values.

import numpy as np 
outfile = r'.\iris.data' 
iris_data = np.loadtxt(outfile, dtype=float, delimiter=',', skiprows=1, usecols=[0, 1, 2, 3]) 
x = np.isnan(iris_data) 
print(np.any(x)) # False

Replace every nan in a NumPy array with 0.

import numpy as np 
outfile = r'.\iris.data' 
iris_data = np.loadtxt(outfile, dtype=float, delimiter=',', skiprows=1, usecols=[0, 1, 2, 3]) 
i, j = iris_data.shape
np.random.seed(20200621)
iris_data[np.random.randint(i, size=20), np.random.randint(j, size=20)] = np.nan
iris_data[np.isnan(iris_data)] = 0 
print(iris_data[0:10])
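
If the original array should stay untouched, `np.nan_to_num` returns a filled copy instead of modifying in place. A minimal sketch on a made-up 2x2 array follows; note that `np.nan_to_num` also replaces infinities with large finite numbers, which may or may not be wanted.

```python
import numpy as np

data = np.array([[5.1, np.nan],
                 [np.nan, 0.2]])

# Returns a new array; `data` itself still contains the nan entries.
filled = np.nan_to_num(data, nan=0.0)

print(filled)
```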

Find the unique values among the iris species and the count of each.

import numpy as np
outfile = r'.\iris.data' 
iris_data = np.loadtxt(outfile, dtype=object, delimiter=',', skiprows=1, usecols=[4]) 
x = np.unique(iris_data, return_counts=True) 
print(x) 
# (array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object), array([50, 50, 50], dtype=int64))
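
The tuple returned by `np.unique` is often easier to read as a name-to-count mapping. A small sketch on a made-up five-element species array:

```python
import numpy as np

species = np.array([
    "Iris-setosa", "Iris-virginica", "Iris-setosa",
    "Iris-versicolor", "Iris-setosa",
])

names, counts = np.unique(species, return_counts=True)
tally = dict(zip(names.tolist(), counts.tolist()))

print(tally)
```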

Display the petal length of iris_data (column 3) as a categorical variable. Definition: less than 3 --> 'small'; 3 to 5 --> 'medium'; >= 5 --> 'large'.

import numpy as np 
outfile = r'.\iris.data' 
iris_data = np.loadtxt(outfile, dtype=float, delimiter=',', skiprows=1, usecols=[0, 1, 2, 3]) 
petal_length_bin = np.digitize(iris_data[:, 2], [0, 3, 5, 10])
label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
petal_length_cat = [label_map[x] for x in petal_length_bin]
print(petal_length_cat[0:10])
# ['small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small']
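
`np.digitize` returns, for each value, the index of the bin it falls into. With only the two inner edges [3, 5], the result is 0, 1, or 2 and can index a label array directly, which avoids the dictionary lookup. The petal lengths below are made up to hit each boundary.

```python
import numpy as np

petal = np.array([1.4, 3.0, 4.9, 5.0, 6.1])  # toy values around the bin edges
labels = np.array(["small", "medium", "large"])

# Bin 0: x < 3, bin 1: 3 <= x < 5, bin 2: x >= 5.
bins = np.digitize(petal, [3, 5])
categories = labels[bins]

print(categories)
```

Unlike the dictionary version above, this variant has no upper limit at 10, so any value of 5 or more is labelled 'large'.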

Create a new column in iris_data, where volume is (pi x petallength x sepallength ^ 2) / 3.

import numpy as np 
outfile = r'.\iris.data' 
iris_data = np.loadtxt(outfile, dtype=object, delimiter=',', skiprows=1) 
sepalLength = iris_data[:, 0].astype(float) 
petalLength = iris_data[:, 2].astype(float) 
volume = (np.pi * petalLength * sepalLength ** 2) / 3 
volume = volume[:, np.newaxis] 
iris_data = np.concatenate([iris_data, volume], axis=1)
print(iris_data[0:10])
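
When the data is already a float array, `np.column_stack` appends the 1-D volume vector without the explicit `np.newaxis` step. A sketch on the first two iris rows:

```python
import numpy as np

# Columns: sepallength, sepalwidth, petallength, petalwidth.
data = np.array([[5.1, 3.5, 1.4, 0.2],
                 [4.9, 3.0, 1.4, 0.2]])

volume = np.pi * data[:, 2] * data[:, 0] ** 2 / 3

# column_stack treats the 1-D vector as one extra column.
with_volume = np.column_stack([data, volume])

print(with_volume.shape)
```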

Randomly sample iris species so that Iris-setosa is drawn twice as often as either Iris-versicolor or Iris-virginica.

import numpy as np 
species = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
species_out = np.random.choice(species, 10000, p=[0.5, 0.25, 0.25])
print(np.unique(species_out, return_counts=True))
# (array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype='<U15'), array([4927, 2477, 2596], dtype=int64))
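
The counts above vary from run to run because no seed is set. For a reproducible draw, one option is NumPy's `Generator` interface; the seed value 0 below is arbitrary and chosen only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
species = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])

draws = rng.choice(species, size=10000, p=[0.5, 0.25, 0.25])
names, counts = np.unique(draws, return_counts=True)
share = counts / counts.sum()  # observed proportion of each species

print(dict(zip(names.tolist(), share.tolist())))
```

The ratio is only approximately 2:1:1; the probabilities constrain the expected counts, not the exact ones.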

Sort the dataset by the sepallength column.

import numpy as np 
outfile = r'.\iris.data' 
iris_data = np.loadtxt(outfile, dtype=object, delimiter=',', skiprows=1) 
sepalLength = iris_data[:, 0].astype(float)  # cast to float so the sort is numeric, not text-based
index = np.argsort(sepalLength) 
print(iris_data[index][0:10])
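
With `dtype=object` the column holds text, and text sorts character by character rather than by numeric value. The made-up values below show how the two orders can differ once a value has two digits before the decimal point.

```python
import numpy as np

as_text = np.array(["9.0", "10.0", "4.3"])  # toy values, stored as strings

text_order = np.argsort(as_text)                   # compares character by character
numeric_order = np.argsort(as_text.astype(float))  # compares numeric values

print(text_order, numeric_order)
```

For the iris sepal lengths, which all have a single digit before the decimal point, both orders would be expected to coincide, but converting to float first removes the dependence on that coincidence.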

Find the most frequent petal length value (column 3) in the iris dataset.

import numpy as np 
outfile = r'.\iris.data' 
iris_data = np.loadtxt(outfile, dtype=object, delimiter=',', skiprows=1) 
petalLength = iris_data[:, 2] 
vals, counts = np.unique(petalLength, return_counts=True)
print(vals[np.argmax(counts)]) # 1.5
print(np.amax(counts)) # 14
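
The same pattern on a small made-up float array makes the mechanics easier to follow. Because `np.unique` returns sorted values and `np.argmax` returns the first maximum, a tie is resolved in favour of the smallest value.

```python
import numpy as np

petal = np.array([1.4, 1.5, 1.5, 1.3, 1.4, 1.5])  # toy petal lengths

vals, counts = np.unique(petal, return_counts=True)
mode = vals[np.argmax(counts)]  # most frequent value
mode_count = counts.max()       # how often it occurs

print(mode, mode_count)
```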

In the petalwidth column (column 4) of the iris dataset, find the position of the first value greater than 1.0.

import numpy as np 
outfile = r'.\iris.data' 
iris_data = np.loadtxt(outfile, dtype=float, delimiter=',', skiprows=1, usecols=[0, 1, 2, 3]) 
petalWidth = iris_data[:, 3] 
index = np.where(petalWidth > 1.0) 
print(index) 
print(index[0][0]) # 50
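
Indexing `index[0][0]` raises an IndexError when no value exceeds the threshold. A guarded variant could look like the sketch below; `first_index_greater` is a hypothetical helper name, and returning None for "not found" is a choice made here.

```python
import numpy as np

def first_index_greater(values, threshold):
    """Return the index of the first value above threshold, or None if there is none."""
    mask = np.asarray(values) > threshold
    if not mask.any():
        return None
    # argmax on a boolean array returns the position of the first True.
    return int(np.argmax(mask))

print(first_index_greater([0.2, 0.2, 1.4, 1.5], 1.0))
print(first_index_greater([0.2, 0.3], 1.0))
```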
