Pandas Key Points Summary (2): Boolean Indexing

Posted by tony0087 on 2021-09-09

Dataset location:

1. Computing statistics on Boolean values

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 

# Show up to 50 columns when printing a DataFrame
pd.options.display.max_columns = 50
# Read movie.csv and use movie_title as the row index
movie = pd.read_csv("./data/movie.csv", index_col='movie_title')

# Flag the movies that run longer than two hours (Figure1)
movie_2_hours = movie['duration'] > 120

# Count the movies longer than two hours
print(movie_2_hours.sum())  # result: 1039
# Proportion of movies longer than two hours
print(movie_2_hours.mean())
# Proportions of False and True
print(movie_2_hours.value_counts(normalize=True))
# Compare two columns of the same DataFrame, dropping rows with missing values first
actors = movie[['actor_1_facebook_likes', 'actor_2_facebook_likes']].dropna()
print((actors['actor_1_facebook_likes'] > actors['actor_2_facebook_likes']).mean())  # Figure2

Output:
[image: Figure1]
[image: Figure2]
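One detail worth checking in the statistics above is how missing values behave. The sketch below uses a small made-up DataFrame (the name `toy` and its values are assumptions for illustration, not rows from movie.csv) to show that a comparison against NaN yields False, so the mean over all rows can differ from the mean over rows with a known duration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration; this is not the movie.csv dataset.
toy = pd.DataFrame(
    {"duration": [95.0, 130.0, np.nan, 150.0, 110.0]},
    index=["A", "B", "C", "D", "E"],
)

# Comparing NaN with a number gives False, so row "C" is not flagged.
long_mask = toy["duration"] > 120

# True counts as 1 and False as 0, so sum() is a count and mean() is a share.
n_long = long_mask.sum()
share_all = long_mask.mean()          # share over all 5 rows, NaN row included

# Drop missing durations first to get the share among known durations only.
share_known = (toy["duration"].dropna() > 120).mean()

print(n_long, share_all, share_known)  # prints: 2 0.4 0.5
```

This is the same reason the two-column comparison above calls dropna() before comparing: without it, rows with a missing value would silently count as False.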

2. Building multiple Boolean conditions

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 

# Show up to 50 columns when printing a DataFrame
pd.options.display.max_columns = 50
# Read movie.csv and use movie_title as the row index
movie = pd.read_csv("./data/movie.csv", index_col='movie_title')

# Create several Boolean conditions
criteria1 = movie.imdb_score > 8
criteria2 = movie.content_rating == "PG-13"
# Each comparison needs its own parentheses because | binds tighter than < and >=
criteria3 = (movie.title_year < 2000) | (movie.title_year >= 2010)

"""
print(criteria1.head())
print(criteria2.head())
print(criteria3.head())
Output: Figure1
"""

# Combine the conditions into a single one
criteria_final = criteria1 & criteria2 & criteria3

print(criteria_final.head())
# Output: Figure2

Output:
[image: Figure1]
[image: Figure2]
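The parentheses around each comparison in the conditions above matter. As a rough sketch on made-up data (the `toy` frame and its values are assumptions, not taken from movie.csv), the following shows the element-wise operators `&`, `|` and `~`, and what happens if the Python keyword `and` is used on two Series instead.

```python
import pandas as pd

# Hypothetical toy data for illustration; this is not the movie.csv dataset.
toy = pd.DataFrame(
    {
        "imdb_score": [8.5, 7.0, 9.0, 8.2],
        "content_rating": ["PG-13", "PG-13", "R", "PG-13"],
        "title_year": [1995, 2012, 2005, 2004],
    }
)

high_score = toy["imdb_score"] > 8
pg13 = toy["content_rating"] == "PG-13"
# & and | bind tighter than < and >=, so each comparison gets parentheses.
outside_2000s = (toy["title_year"] < 2000) | (toy["title_year"] >= 2010)

combined = high_score & pg13 & outside_2000s   # element-wise AND
not_combined = ~combined                       # element-wise NOT

# The keyword `and` asks for the truth value of a whole Series,
# which pandas refuses to guess, so it raises ValueError.
and_error = None
try:
    high_score and pg13
except ValueError as err:
    and_error = str(err)

print(combined.tolist())   # prints: [True, False, False, False]
```

Only the first row passes all three tests here: the second fails on score, the third on rating, and the fourth on year.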

3. Filtering with Boolean indexing

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 

# Show up to 50 columns when printing a DataFrame
pd.options.display.max_columns = 50
# Read movie.csv and use movie_title as the row index
movie = pd.read_csv("./data/movie.csv", index_col='movie_title')
# Build the first set of Boolean conditions
crit_a1 = movie.imdb_score > 8
crit_a2 = movie.content_rating == 'PG-13'
crit_a3 = (movie.title_year < 2000) | (movie.title_year > 2009)
final_crit_a = crit_a1 & crit_a2 & crit_a3

# Build the second set of Boolean conditions
crit_b1 = movie.imdb_score < 5
crit_b2 = movie.content_rating == 'R'
crit_b3 = (movie.title_year >= 2000) & (movie.title_year <= 2010)
final_crit_b = crit_b1 & crit_b2 & crit_b3

# Combine the two sets with a logical OR
final_crit_all = final_crit_a | final_crit_b
print(final_crit_all.head())  # Figure1

# Filter the data with the final Boolean condition
print(movie[final_crit_all].head())  # Figure2

Output:
[image: Figure1]

[image: Figure2]
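A quick sanity check for a filter like the one above is that the number of rows kept equals the number of True values in the mask. The sketch below illustrates this on made-up data (the `toy` frame, its index labels and its values are assumptions, not rows from movie.csv).

```python
import pandas as pd

# Hypothetical toy data for illustration; this is not the movie.csv dataset.
toy = pd.DataFrame(
    {
        "imdb_score": [8.5, 4.2, 6.0, 9.1],
        "content_rating": ["PG-13", "R", "R", "PG-13"],
    },
    index=["A", "B", "C", "D"],
)

# Two groups of conditions joined with OR, mirroring the structure above.
group_a = (toy["imdb_score"] > 8) & (toy["content_rating"] == "PG-13")
group_b = (toy["imdb_score"] < 5) & (toy["content_rating"] == "R")
mask = group_a | group_b

# Indexing with a Boolean Series keeps exactly the rows where it is True.
filtered = toy[mask]

print(list(filtered.index))        # prints: ['A', 'B', 'D']
print(len(filtered) == mask.sum())  # prints: True
```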

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 

# Show up to 50 columns when printing a DataFrame
pd.options.display.max_columns = 50
# Read movie.csv and use movie_title as the row index
movie = pd.read_csv("./data/movie.csv", index_col='movie_title')
# Build the first set of Boolean conditions
crit_a1 = movie.imdb_score > 8
crit_a2 = movie.content_rating == 'PG-13'
crit_a3 = (movie.title_year < 2000) | (movie.title_year > 2009)
final_crit_a = crit_a1 & crit_a2 & crit_a3

# Build the second set of Boolean conditions
crit_b1 = movie.imdb_score < 5
crit_b2 = movie.content_rating == 'R'
crit_b3 = (movie.title_year >= 2000) & (movie.title_year <= 2010)
final_crit_b = crit_b1 & crit_b2 & crit_b3

# Combine the two sets with a logical OR
final_crit_all = final_crit_a | final_crit_b

# Use loc to filter rows and keep only the columns used in the conditions,
# which makes it easy to see whether the filter worked
cols = ['imdb_score', 'content_rating', 'title_year']
movie_filtered = movie.loc[final_crit_all, cols]
print(movie_filtered.head(10))

Output:
[image]
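As a small illustration of the loc call above, the sketch below compares selecting rows and columns in one step with `.loc[mask, cols]` against doing it in two chained steps. The data is made up (the `toy` frame and its values are assumptions, not rows from movie.csv); both forms return the same result for reading, and the single `.loc` call is usually the clearer choice.

```python
import pandas as pd

# Hypothetical toy data for illustration; this is not the movie.csv dataset.
toy = pd.DataFrame(
    {
        "imdb_score": [8.5, 4.2, 6.0, 9.1],
        "content_rating": ["PG-13", "R", "R", "PG-13"],
        "title_year": [1995, 2005, 2008, 2012],
        "duration": [120, 95, 101, 140],
    },
    index=["A", "B", "C", "D"],
)

mask = (toy["imdb_score"] > 8) & (toy["content_rating"] == "PG-13")
cols = ["imdb_score", "content_rating", "title_year"]

# One step: rows picked by the mask, columns picked by name.
one_step = toy.loc[mask, cols]

# Two steps: filter rows first, then select columns.
two_step = toy[mask][cols]

print(one_step.equals(two_step))  # prints: True
print(list(one_step.columns))     # prints: ['imdb_score', 'content_rating', 'title_year']
```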

Reference tutorial:

From the ITPUB blog, link: http://blog.itpub.net/1747/viewspace-2824422/. If you repost this article, please credit the source; otherwise legal action may be pursued.
