Datawhale Midterm Assessment
Problem statement: http://datawhale.club/t/topic/579/4
Task 1: Diversity of company revenue
Code
import pandas as pd
import numpy as np
df1 = pd.read_csv('company.csv')
df2 = pd.read_csv('company_data.csv')
df1.head(5)
df2.head(5)
df11 = df1.copy()
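df11['證券程式碼'] = df11['證券程式碼'].str[1:].astype('int64')  # strip the leading character so the security codes match in both tables
df2['日期'] = df2['日期'].str[:4].astype('int64')  # keep only the year so the dates match in both tables
def entropy(x):
    # Shannon entropy (base 2) of the revenue shares; NaN when no revenue is recorded
    if x.any():
        p = x/x.sum()
        return -(p*np.log2(p)).sum()
    return np.nan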
res = df11.merge(df2, on=['證券程式碼','日期'], how='left').groupby(['證券程式碼','日期'])['收入額'].apply(entropy).reset_index()
res.head(5)
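df1['收入熵指標'] = df11.merge(res, on=['證券程式碼','日期'], how='left')['收入額'].to_numpy()  # align by key rather than by row position, since groupby sorts the keys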
df1
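To make the entropy step concrete, here is a minimal sketch on made-up numbers. The `toy` frame, its `firm` and `revenue` columns, and the figures are hypothetical and are not taken from company_data.csv. This variant also drops zero shares before taking the logarithm, which is an added safeguard rather than part of the code above.

```python
import numpy as np
import pandas as pd

def entropy(x):
    """Shannon entropy (base 2) of the shares x / x.sum(); NaN if x has no non-zero value."""
    if x.any():
        p = x / x.sum()
        p = p[p > 0]  # drop zero shares so log2 never receives 0
        return -(p * np.log2(p)).sum()
    return np.nan

# Hypothetical data: firm 'A' has four equal revenue streams, firm 'B' has a single one
toy = pd.DataFrame({
    'firm': ['A', 'A', 'A', 'A', 'B'],
    'revenue': [25.0, 25.0, 25.0, 25.0, 100.0],
})

# One entropy value per firm
toy_entropy = toy.groupby('firm')['revenue'].apply(entropy)
print(toy_entropy)
```

Firm A splits its revenue evenly over four streams, so its entropy is log2(4) = 2 bits. Firm B has a single stream, so its entropy is zero. The more evenly revenue is spread, the higher the indicator.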
Results
Task 2: Reshaping the team-learning information table
df = pd.read_excel('team_data.xlsx')  # the workbook was renamed to team_data, hence this file name
df.drop(columns='所在群', inplace=True)  # the 所在群 (chat group) column is not used, so drop it
df.head(5)
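# Suffixes end in 1 for the leader and 0 for members, so the last character later serves as the leader flag
col_1 = np.array(['隊伍名稱','編號_leader01','暱稱_leader01'])
col_2 = np.array([[f'編號_member{i}0', f'暱稱_member{i}0'] for i in range(1,11)]).flatten()
df.columns = np.r_[col_1,col_2]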
df.head(5)
res = pd.wide_to_long(df.reset_index(),
                      stubnames=['暱稱','編號'],
                      i=['index','隊伍名稱'],
                      j='是否隊長',
                      sep='_',
                      suffix='.+').dropna().reset_index().drop(columns='index')
res
res['是否隊長'], res['編號'] = res['是否隊長'].str[-1], res['編號'].astype('int64')
res.reindex(columns=['是否隊長','隊伍名稱','暱稱','編號'])
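The reshaping above depends on how `pd.wide_to_long` splits column names into a stub, a separator, and a suffix. The following is a small sketch on a hypothetical two-team table. The column names `team`, `id_*`, `name_*` and all values are invented for illustration, and the leader flag is derived by comparing the suffix rather than by taking its last character.

```python
import pandas as pd

# Hypothetical wide table: one row per team, one id/name column pair per person
wide = pd.DataFrame({
    'team': ['Alpha', 'Beta'],
    'id_leader': [1, 4],
    'name_leader': ['Ann', 'Dan'],
    'id_member1': [2, 5],
    'name_member1': ['Bob', 'Eve'],
    'id_member2': [3, None],      # team Beta has no second member
    'name_member2': ['Cat', None],
})

tidy = (
    pd.wide_to_long(
        wide,
        stubnames=['id', 'name'],  # prefixes shared by the wide columns
        i='team',                  # uniquely identifies each wide row
        j='role',                  # new column that receives the suffix
        sep='_',
        suffix='.+',               # suffixes are non-numeric, so match any text
    )
    .dropna()                      # drop the empty slot of team Beta
    .reset_index()
)
tidy['id'] = tidy['id'].astype('int64')
tidy['is_leader'] = (tidy['role'] == 'leader').astype(int)
print(tidy)
```

Each person becomes one row with `team`, `role`, `id`, and `name`. The `suffix='.+'` argument matters because the default pattern only matches numeric suffixes.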
Results
Task 3: US presidential election voting
Code
df1=pd.read_csv('president_county_candidate.csv')
df2=pd.read_csv('county_population.csv')
df1.head(5)
df2.head(5)
sum_vote=df1.groupby(['county','state'])['total_votes'].sum()
sum_vote=sum_vote.to_frame().reset_index()
US_county='.'+sum_vote['county']+', '+sum_vote['state']
df3=sum_vote.copy()
df3.head(5)
df4=df3.drop(['county','state'],axis=1).copy()
df4['US County']=US_county
df_12=df2.merge(df4,on='US County',how='left')
df_12[df_12['total_votes']/df_12['Population']>0.5].count(0)
columns=df1.groupby('candidate')['total_votes'].sum().sort_values(ascending = False).index
result=df1.pivot_table(index='state',columns='candidate',values='total_votes',aggfunc='sum')  # sum over counties; the default aggfunc is the mean
result.reindex(columns=columns)
df1.groupby(['state','county'])['total_votes'].transform('sum')
df1['縣總票數']=df1.groupby(['state','county'])['total_votes'].transform('sum')
df1['縣得票率']=df1['total_votes']/df1['縣總票數']
df_bt=df1.pivot(index=['state','county'],columns='candidate',values='縣得票率')
s_bt=df_bt['Joe Biden']-df_bt['Donald Trump']
df3=s_bt.to_frame()
result3=df3.rename(columns={0:'BT指標'}).reset_index()
def function(x):
    if x.median()>0:
        return 'Biden State'
    else:
        return 'Not Biden State'
result=result3.groupby('state')['BT指標'].transform(function)
result3[result=='Biden State']['state'].drop_duplicates().reset_index(drop=True)
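The Task 3 steps can be checked on a tiny hand-made table. The sketch below uses hypothetical states `X` and `Y` with two counties each; every number is invented. It selects states by aggregating the median per state directly, which is a slightly different route from the `transform` plus `drop_duplicates` approach above.

```python
import pandas as pd

# Hypothetical votes: two states, two counties each, two candidates
votes = pd.DataFrame({
    'state': ['X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'],
    'county': ['x1', 'x1', 'x2', 'x2', 'y1', 'y1', 'y2', 'y2'],
    'candidate': ['Joe Biden', 'Donald Trump'] * 4,
    'total_votes': [60, 40, 55, 45, 30, 70, 45, 55],
})

# State-by-candidate totals; aggfunc='sum' because pivot_table averages by default
state_totals = votes.pivot_table(index='state', columns='candidate',
                                 values='total_votes', aggfunc='sum')

# Share of each candidate within a county
county_total = votes.groupby(['state', 'county'])['total_votes'].transform('sum')
votes['share'] = votes['total_votes'] / county_total

# BT indicator: Biden share minus Trump share, one value per county
shares = votes.pivot(index=['state', 'county'], columns='candidate', values='share')
bt = (shares['Joe Biden'] - shares['Donald Trump']).rename('BT')

# A state is a "Biden State" when the median of its county-level BT values is positive
state_median = bt.groupby(level='state').median()
biden_states = state_median[state_median > 0].index.tolist()
print(biden_states)
```

In this toy data the BT values are 0.2 and 0.1 for state X and -0.4 and -0.1 for state Y, so only X qualifies. Passing a list to `index` in `DataFrame.pivot` requires pandas 1.1 or later.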