👋 Hi, your WeChat friends report is ready～

Posted by AwesomeTang on 2020-05-15

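All of the code has been uploaded to my KLab project, "Hi, your WeChat friends report is ready～". Fork it and run it to generate your own WeChat friends report～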

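This project collects and displays the following statistics:

- Geographic distribution of friends
- Gender breakdown
- Share of friends with a remark name
- First-letter statistics
- Most-used emoji
- Signature word cloud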


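Logging in to WeChat

KLab cannot launch another application to open the QR code image, so this step uses two threads:

- Thread 1: itchat fetches the QR code image and waits for the scan to complete;
- Thread 2: reads the local QR code image and displays it in KLab via matplotlib.

Note: some readers were previously unable to log in by scanning the code. That is because WeChat restricts web login for some accounts (especially newly registered ones).

The code is below; it is not complicated～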

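import os
import threading
import time

import itchat
import matplotlib.pyplot as plt
from PIL import Image

code_path = os.path.join('/home/kesci/work', 'QR.png')

def show_qrcode():
    # Wait for the image to download
    time.sleep(3)
    # Poll until the QR code image exists, sleeping between checks
    while not os.path.exists(code_path):
        time.sleep(0.5)
    img = Image.open(code_path)
    plt.figure(figsize=(15, 8))
    plt.imshow(img)
    plt.axis('off')  # hide the axes
    plt.show()


# Daemon thread: the main thread does not wait for it to finish
t_show = threading.Thread(target=show_qrcode, daemon=True)
t_show.start()

# Pass the function and its arguments separately; writing
# Thread(target=itchat.login(...)) would call login right away in the main thread
t_login = threading.Thread(target=itchat.login, kwargs={'picDir': code_path})
t_login.start()
t_login.join()  # block until the login has completed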
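
The polling idea behind thread 2 can be sketched with the standard library alone. The helper `wait_for_file` below is a hypothetical name, not part of itchat or KLab; it assumes the file eventually appears at the given path and gives up after a timeout instead of looping forever. The `demo` function simulates the download with a second thread.

```python
import os
import tempfile
import threading
import time


def wait_for_file(path, timeout=10.0, poll_interval=0.1):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_interval)  # sleep between checks instead of spinning
    return os.path.exists(path)


def demo():
    """Simulate a slow download in one thread while another waits for the file."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "QR.png")
        result = {}

        def writer():
            time.sleep(0.3)  # stand-in for the QR code download
            with open(target, "wb") as fh:
                fh.write(b"fake image bytes")

        def waiter():
            result["found"] = wait_for_file(target, timeout=5.0)

        threads = [threading.Thread(target=writer),
                   threading.Thread(target=waiter)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return result["found"]


if __name__ == "__main__":
    print(demo())
```

A timeout is a design choice here: if login fails before the image is written, the waiting thread returns False rather than hanging.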

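Geographic distribution

The friend information returned by WeChat includes two fields, Province and City, with two caveats:

- For the four direct-administered municipalities (Beijing and the others), Province holds the city name and City holds the district;
- Friends located abroad are all grouped into a single category here, with the Province value used as the second-level category.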
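
As a rough illustration of these two rules, here is a minimal sketch written as a pure function. The name `normalize_location`, the "Abroad" label, and the sample values are assumptions made for this example only; Chongqing is listed in both Simplified and Traditional spelling because it is not certain which form the data uses.

```python
import re

# The four municipalities; Chongqing appears in both spellings
MUNICIPALITIES = {"北京", "上海", "重庆", "重慶", "天津"}


def normalize_location(province, city):
    """Map a raw (Province, City) pair to the (parent, child) pair for the chart."""
    if not province:
        return None  # no location information
    if re.match(r"[a-zA-Z]", province):
        # Abroad: one shared parent, with the raw Province as the child
        return ("Abroad", province)
    if province in MUNICIPALITIES:
        # Municipality: City holds a district, so use the city name instead
        return (province, province)
    return (province, city)


if __name__ == "__main__":
    for pair in [("北京", "海淀"), ("California", "Los Angeles"), ("", "")]:
        print(pair, "->", normalize_location(*pair))
```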

Data processing

import re

import pandas as pd

friends = itchat.get_friends(update=True)
df_friends = pd.DataFrame(list(friends))

f_loc = df_friends.groupby(
    ['Province', 'City'])['UserName'].count().reset_index()
# Drop friends with no location information
f_loc = f_loc[f_loc.Province != ''].copy()

# Group friends abroad into one category, keeping the original
# Province value as the second-level category
for idx, row in f_loc.iterrows():
    if re.match('[a-zA-Z]', row.Province):
        f_loc.loc[idx, 'Province'] = 'Abroad'
        f_loc.loc[idx, 'City'] = row['Province']

# For the four municipalities, City holds the district, so overwrite it
# with the city name (Chongqing is listed in both Simplified and
# Traditional spelling)
for name in ['北京', '上海', '重庆', '重慶', '天津']:
    f_loc.loc[f_loc.Province == name, 'City'] = name

# Re-aggregate and sum
f_loc = f_loc.groupby(['Province', 'City'])['UserName'].sum().reset_index()
f_loc.columns = ['Province', 'City', 'num']

data_pair = []

for province in f_loc.Province.unique().tolist():
    t_data = (f_loc[f_loc.Province == province]
              .sort_values(by="num", ascending=False)
              .reset_index(drop=True))
    t_dict = {"name": province,
              "label": {"show": False},
              "children": []}
    # Parent level: show the label when the province has more than 15 friends
    if t_data.num.sum() > 15:
        t_dict['label']['show'] = True

    # There are too many cities to show them all, so a city gets its own
    # sector only if (1) it has the most friends within its parent, or
    # (2) it has more than 10 friends; the rest go into "Other cities"
    else_num = 0
    for idx, row in t_data.iterrows():
        num = int(row.num)
        if idx == 0 or num > 10:
            # Child level: show the label when the city has more than 10 friends
            t_dict['children'].append(
                {"name": row.City, "value": num, "label": {"show": num > 10}})
        else:
            else_num += num

    if else_num:
        t_dict['children'].append(
            {"name": 'Other cities', "value": else_num,
             "label": {"show": else_num > 10}})

    data_pair.append(t_dict)
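
The grouping rule (keep the largest city and any city above the threshold, merge the rest) can be checked in isolation. The sketch below uses a plain dict of made-up counts instead of the DataFrame; `bucket_cities` and its parameters are hypothetical names introduced for illustration.

```python
def bucket_cities(city_counts, threshold=10, other_label="Other cities"):
    """Keep the largest city and any city above `threshold`; merge the rest."""
    ranked = sorted(city_counts.items(), key=lambda kv: kv[1], reverse=True)
    children, other_total = [], 0
    for rank, (city, num) in enumerate(ranked):
        if rank == 0 or num > threshold:
            children.append({"name": city, "value": num,
                             "label": {"show": num > threshold}})
        else:
            other_total += num
    if other_total:
        children.append({"name": other_label, "value": other_total,
                         "label": {"show": other_total > threshold}})
    return children


if __name__ == "__main__":
    # Made-up counts for a single province
    sample = {"Shenzhen": 25, "Guangzhou": 12, "Zhuhai": 6, "Foshan": 5, "Dongguan": 3}
    for child in bucket_cities(sample):
        print(child)
```

With these sample counts, the three small cities are merged into a single "Other cities" entry of 14, which is itself large enough to get a visible label.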

Visualization

from pyecharts import options as opts
from pyecharts.charts import Sunburst

c = (Sunburst(
        init_opts=opts.InitOpts(
            theme='light',
            width="1000px",
            height="1000px"))
    .add(
        "",
        data_pair=data_pair,
        highlight_policy="ancestor",
        radius=[0, "100%"],
        sort_='null',
        levels=[
            {},
            {
                "r0": "20%",
                "r": "45%",
                "itemStyle": {"borderColor": 'rgb(220,220,220)', "borderWidth": 2}
            },
            {"r0": "45%", "r": "80%", "label": {"align": "right"},
                "itemStyle": {"borderColor": 'rgb(220,220,220)', "borderWidth": 1}}
        ],
    )
    .set_global_opts(title_opts=opts.TitleOpts(title="Friends\n\nby Region",
                                               pos_left="center",
                                               pos_top="center",
                                               title_textstyle_opts=opts.TextStyleOpts(font_style='oblique', color="black", font_size=30),))
    .set_series_opts(label_opts=opts.LabelOpts(font_size=18, formatter="{b}: {c}"))
)

c.render_notebook()

Friend gender ratio

from pyecharts.charts import Pie

f_sex = df_friends.groupby(['Sex'])['UserName'].count().reset_index()
# Sex is coded as 1 = male, 2 = female, 0 = information missing
f_sex['f_sex'] = f_sex['Sex'].map({1: 'Male', 2: 'Female', 0: 'Unknown'})

background_color_js = """new echarts.graphic.RadialGradient(0.5, 0.5, 1, [{
                                        offset: 0,
                                        color: '#696969'
                                    }, {
                                        offset: 1,
                                        color: '#000000'
                                    }])"""


pie = (Pie(init_opts=opts.InitOpts(theme='light', width='1000px', height='800px'))
       .add('WeChat️', [(row['f_sex'], row['UserName']) for _, row in f_sex.iterrows()],
            radius=["50%", "75%"])
       .set_global_opts(title_opts=opts.TitleOpts(title="Friend gender ratio", 
                                                  pos_left="center",
                                                  title_textstyle_opts=opts.TextStyleOpts(color="black", font_size=20),     ),
                        legend_opts=opts.LegendOpts(is_show=True, pos_top='5%'))
       .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {d}%", font_size=18),
                        tooltip_opts=opts.TooltipOpts(trigger="item", formatter="{a} <br/>{b}: {c} ({d}%)"),)
      )
pie.render_notebook()

Share of friends with a remark name

Are you in the habit of giving your friends remark names❓


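from pyecharts.charts import Grid, Liquid
from pyecharts.commons.utils import JsCode

has_remark = df_friends.RemarkName != ''

# Female friends (Sex == 2): how many have a remark name, out of how many
remark_num_f = int((has_remark & (df_friends.Sex == 2)).sum())
# max(..., 1) avoids a division by zero below when there are no such friends
total_num_f = max(int((df_friends.Sex == 2).sum()), 1)

# Male friends (Sex == 1)
remark_num_m = int((has_remark & (df_friends.Sex == 1)).sum())
total_num_m = max(int((df_friends.Sex == 1).sum()), 1)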

l1 = Liquid(
    init_opts=opts.InitOpts(
        theme='light',
        width='1000px',
        height='800px'))
l1.add("", [remark_num_f/total_num_f],
       center=["70%", "50%"],
       label_opts=opts.LabelOpts(font_size=50,
                                 formatter=JsCode(
                                     """function (param) {
                            return (Math.floor(param.value * 10000) / 100) + '%';
                        }"""),
                                 position="inside",
                                 ))
l1.set_global_opts(
    title_opts=opts.TitleOpts(
        title="Share of female friends with a remark name",
        pos_left='62%',
        pos_top='8%'))
l1.set_series_opts(tooltip_opts=opts.TooltipOpts(is_show=False))

l2 = Liquid(
    init_opts=opts.InitOpts(
        theme='light',
        width='1000px',
        height='800px'))
l2.add("",
       [remark_num_m/total_num_m],
       center=["25%", "50%"],
       label_opts=opts.LabelOpts(font_size=50,
                                 formatter=JsCode(
                                     """function (param) {
                        return (Math.floor(param.value * 10000) / 100) + '%';
                    }"""),
                                 position="inside",
                                 ),)
l2.set_global_opts(
    title_opts=opts.TitleOpts(
        title="Share of male friends with a remark name",
        pos_left='16%',
        pos_top='8%'))
l2.set_series_opts(tooltip_opts=opts.TooltipOpts(is_show=False))


grid = Grid().add(
    l1, grid_opts=opts.GridOpts()).add(
        l2, grid_opts=opts.GridOpts())
grid.render_notebook()
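
The arithmetic behind the two gauges is small enough to check by hand. Below is a sketch in plain Python with made-up remark names; `remark_ratio` and `format_percent` are hypothetical helpers, and `format_percent` only mirrors the truncation used in the JS formatter (multiply by 10000, floor, divide by 100), not its exact string output.

```python
import math


def remark_ratio(remark_names):
    """Fraction of non-empty remark names; 0.0 for an empty list."""
    if not remark_names:
        return 0.0
    return sum(1 for name in remark_names if name != "") / len(remark_names)


def format_percent(value):
    """Truncate a ratio to a percentage with at most two decimals."""
    return f"{math.floor(value * 10000) / 100}%"


if __name__ == "__main__":
    names = ["Tom", "", "", "Ann"]  # made-up data: two of four have a remark
    print(remark_ratio(names))
    print(format_percent(0.123456))
```

Truncating rather than rounding means a ratio such as 0.99999 never shows up as a full 100%.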

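First-letter distribution

This statistic differs slightly from the grouping in WeChat's Contacts list: Contacts uses the remark name first, whereas this one depends only on each friend's WeChat nickname.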

first_letter = []
for item in df_friends.PYQuanPin:
    # Strip the emoji markup left over in the pinyin field
    item = re.sub('spanclassemojiemoji[a-z0-9]{5}?|span', '' , item)
    
    try:
        if re.match('[A-Z]', item.upper()[0]):
            first_letter.append(item.upper()[0])
        else:
            first_letter.append('#')
    except IndexError:
        first_letter.append('#')
    

letters = [chr(i) for i in range(65,91)]
letters.append('#')
data_pair = [(w, first_letter.count(w)) for w in letters]
data_pair = sorted(data_pair, key=lambda x: x[1], reverse=True)

pie = (Pie(init_opts=opts.InitOpts(theme='light', width='1000px', height='800px'))
       .add("Wechat", data_pair,
            radius=["50%", "75%"])
       .set_global_opts(title_opts=opts.TitleOpts(title="First letter of WeChat nickname",
                                                  pos_left="center",
                                                  title_textstyle_opts=opts.TextStyleOpts(color="black", font_size=20),),
                        legend_opts=opts.LegendOpts(is_show=False, pos_top='5%'))
       .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {d}%", font_size=18),
                        tooltip_opts=opts.TooltipOpts(trigger="item", formatter="{a} <br/>{b}: {c} ({d}%)"),)
                        )

pie.render_notebook()
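
The first-letter rule can also be written as a small standalone function and tried on a few strings. This is a sketch under the assumption that the pinyin field looks like the samples below (which are made up); `first_letter` is a hypothetical name, and the pattern is the same one used in the loop above.

```python
import re
from collections import Counter

# Emoji markup as it appears once punctuation has been stripped from the field
EMOJI_MARKUP = re.compile(r"spanclassemojiemoji[a-z0-9]{5}|span")


def first_letter(pinyin):
    """Return the uppercase first letter of a pinyin string, or '#' if none."""
    cleaned = EMOJI_MARKUP.sub("", pinyin)
    head = cleaned[:1].upper()
    if re.fullmatch(r"[A-Z]", head):
        return head
    return "#"


if __name__ == "__main__":
    samples = ["zhangsan", "spanclassemojiemoji1f604spanlisi", "007", ""]
    print(Counter(first_letter(s) for s in samples))
```

One limitation of this pattern: it also removes the letters "span" when they occur inside a real name.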

Emoji

This covers emoji in both WeChat nicknames and signatures～

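import json
from collections import Counter

from pyecharts.charts import Bar

# Nicknames: collect characters outside the Basic Multilingual Plane
emoji_list = []
for name in df_friends.NickName:
    emoji_list.extend(re.findall('[\U00010000-\U0010ffff]', name))

with open('/home/kesci/input/emoji6441/emoji.json', 'r', encoding='utf-8') as f:
    emoji_code = json.load(f)

def find_emoji(code):
    # Look up the character for a hex code; returns None if it is not listed
    for item in emoji_code:
        if item['codes'] == code.upper():
            return item['char']
    return None

# Signatures: emoji appear as markup containing "emoji" plus a 5-character code
for sig in df_friends.Signature:
    chars = [find_emoji(code) for code in re.findall('emoji([a-z0-9]{5})', sig)]
    # Skip codes that are missing from the lookup table
    emoji_list.extend(ch for ch in chars if ch)


counter = Counter(emoji_list).most_common(18)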

bar = (Bar(init_opts=opts.InitOpts(theme='light', width='1000px', height='800px'))
       .add_xaxis([x for x, y in counter[::-1]])
       .add_yaxis('Count', [y for x, y in counter[::-1]])
       .set_global_opts(title_opts=opts.TitleOpts(title="Most-used emoji",
                                                  pos_left="center",
                                                  title_textstyle_opts=opts.TextStyleOpts(color="black",
                                                                                          font_size=20)),
                        legend_opts=opts.LegendOpts(is_show=False),
                        xaxis_opts=opts.AxisOpts(is_show=False,),
                        yaxis_opts=opts.AxisOpts(
           axistick_opts=opts.AxisTickOpts(is_show=False),
           axisline_opts=opts.AxisLineOpts(is_show=False)))
       .set_series_opts(label_opts=opts.LabelOpts(is_show=True,
                                                  position='right',
                                                  font_style='italic'),
                        itemstyle_opts={"normal": {
                            "color": JsCode(
                                """new echarts.graphic.LinearGradient(1, 1, 0, 0, [{
                                                offset: 0,
                                                color: 'rgba(0, 244, 255, 1)'
                                            }, {
                                                offset: 1,
                                                color: 'rgba(0, 77, 167, 1)'
                                            }], false)"""
                            ),
                            "barBorderRadius": [30, 30, 30, 30],
                            "shadowColor": "rgb(0, 160, 221)",
                        }
       }
).reversal_axis())

bar.render_notebook()
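
For emoji that are a single Unicode code point, the hex code in the markup can be turned into the character directly with `chr(int(code, 16))`, without a lookup table. The sketch below assumes exactly that (one five-digit hex code per emoji); the JSON table used above may cover cases this does not. `code_to_char` and `emoji_from_markup` are hypothetical helper names, and the sample string is made up.

```python
import re


def code_to_char(hex_code):
    """Convert a hexadecimal code point such as '1f604' to its character."""
    return chr(int(hex_code, 16))


def emoji_from_markup(text):
    """Extract emoji from text containing 'emoji' followed by five hex digits."""
    return [code_to_char(code) for code in re.findall(r"emoji([0-9a-f]{5})", text)]


if __name__ == "__main__":
    signature = "good day emoji1f604"  # made-up signature with emoji markup
    print(emoji_from_markup(signature))
```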

Signature word cloud

Which words come up most often in signatures❓



import jieba
import numpy as np
from PIL import Image
from wordcloud import ImageColorGenerator, WordCloud

# Load the mask image as an array
back_color = np.array(Image.open('/home/kesci/work/font/wechat_logo.jpeg'))
wc = WordCloud(background_color='white',  # background color
               max_words=1000,  # maximum number of words
               mask=back_color,  # shape to draw in; width and height are ignored when a mask is set
               max_font_size=100,  # largest font size
               font_path="/home/kesci/work/font/simhei.ttf",  # a CJK font, so Chinese text does not render as boxes
               random_state=42,  # fixed seed for a reproducible layout
               )

# Keep only Chinese characters from the signatures
pattern = "[\u4e00-\u9fa5]"
text = ''.join(''.join(re.findall(pattern, x)) for x in df_friends['Signature'])

def word_cloud(texts):
    # jieba.cut returns a generator of tokens; drop single-character words
    words_list = [word for word in jieba.cut(texts, cut_all=False) if len(word) > 1]
    return ' '.join(words_list)


text = word_cloud(text)

wc.generate(text)
# Color the words based on the colors of the mask image
image_colors = ImageColorGenerator(back_color)
plt.figure(figsize=(15, 15))
plt.axis('off')
# Draw the word cloud
plt.imshow(wc.recolor(color_func=image_colors))
# Display the figure
plt.show()
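
The text preparation (keep Chinese characters, drop single-character words, count what remains) can be tried without jieba or wordcloud. In the sketch below the tokens are split by hand, whereas the code above relies on `jieba.cut` for segmentation; `han_runs` and `top_words` are hypothetical names and the sample strings are made up.

```python
import re
from collections import Counter

# Same character range as above, matched as runs of consecutive characters
HAN = re.compile("[\u4e00-\u9fa5]+")


def han_runs(text):
    """Return the runs of consecutive Chinese characters found in `text`."""
    return HAN.findall(text)


def top_words(tokens, n=3, min_len=2):
    """Count tokens of at least `min_len` characters; return the `n` most common."""
    return Counter(t for t in tokens if len(t) >= min_len).most_common(n)


if __name__ == "__main__":
    print(han_runs("Hello, 世界! 2020 加油"))
    print(top_words(["世界", "加油", "世界", "的"]))
```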

Putting this together took some effort, so likes and support are welcome～
