Reading notes on 《Python專案開發案例集錦》 (A Collection of Python Project Development Cases)

Posted by 每天看一遍，防止戀愛&&墮落 on 2020-03-20

Original URL: https://blog.csdn.net/zengraoli/article/details/104982044

Python notes

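Overview

This book is essentially a collection of cases. The only example that interests me personally is Chapter 13, "開心麻花影視作品分析" (an analysis of Mahua FunAge film and TV works), so the rest of these notes covers that chapter.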

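Getting the data

The approach described online, using Chrome's F12 tools to emulate a phone and fetch Maoyan movie comments, no longer works, although the link itself still responds. I don't know how it used to behave; the offset number does not seem to be a page index.

In my tests, going through offsets up to 99 yielded only 124 comments after removing duplicates.

That does not get in the way of working through this chapter, though.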

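The code in this part does the following:

Request the Maoyan comment link for 夏洛特煩惱 (Goodbye Mr. Loser), which returns JSON
Parse the JSON data and store it in a pandas DataFrame
Deduplicate the data in the DataFrame and write it to an Excel file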
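
The parse-and-deduplicate steps can be tried offline first. The sketch below uses a made-up payload; it assumes only the field names that the fetching code in this post reads (`cmts`, `time`, `score`, `content`, `nick`, and an optional `cityName`) and is not real Maoyan output. The helper `to_rows` is a hypothetical name introduced for illustration.

```python
import json

import pandas as pd

# Hypothetical sample response; the values are invented for illustration.
sample = json.dumps({
    "total": 3,
    "cmts": [
        {"time": "2020-03-01 10:00", "score": 5, "content": "great", "nick": "a", "cityName": "北京"},
        {"time": "2020-03-01 10:00", "score": 5, "content": "great", "nick": "a", "cityName": "北京"},
        {"time": "2020-03-02 09:30", "score": 4, "content": "fun", "nick": "b"},
    ],
})


def to_rows(items):
    """Flatten comment dicts into the five columns used in this post."""
    return [
        {
            "date": item["time"].split(" ")[0],  # keep only the date part
            "score": item["score"],
            "city": item.get("cityName", ""),    # the city can be missing
            "comment": item["content"],
            "nick": item["nick"],
        }
        for item in items
    ]


payload = json.loads(sample)
df = pd.DataFrame(to_rows(payload["cmts"]),
                  columns=["date", "score", "city", "comment", "nick"])
# Rows that match on every column count as duplicates.
unique_df = df.drop_duplicates(keep="first").reset_index(drop=True)
print(len(df), len(unique_df))
```

Building the DataFrame once from a list of dicts avoids growing it row by row, which matters because `DataFrame.append` was removed in pandas 2.0.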

The code that fetches the comments:

def getData(totalPage):
    header = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Encoding": "gzip, deflate", "Accept-Language": "zh-CN,zh;q=0.9",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36"}
    columns = ['date', 'score', 'city', 'comment', 'nick']
    # Collect rows in a list and build the DataFrame once at the end;
    # DataFrame.append was removed in pandas 2.0.
    rows = []
    # Offsets 1 .. totalPage - 1, as in the original while loop
    for i in range(1, totalPage):
        url = 'http://m.maoyan.com/mmdb/comments/movie/' + filmId + '.json?_v_=yes&offset=' + str(i)
        print(url)
        try:
            response = requests.get(url, headers=header)
            payload = json.loads(response.text)
            if payload['total'] == 0:
                # No comments left: stop the loop
                break
            # The response carries two comment lists, 'cmts' and 'hcmts'
            for key in ('cmts', 'hcmts'):
                for item in payload[key]:
                    rows.append(
                        {'date': item['time'].split(' ')[0], 'city': item.get("cityName", ""),
                         'score': item['score'], 'comment': item['content'],
                         'nick': item['nick']})
            time.sleep(2)
        except Exception as ee:
            # Report the failure and move on to the next offset
            print(ee, url)
            continue
    tomato = pd.DataFrame(rows, columns=columns)
    # Drop duplicate rows
    tomato = tomato.drop_duplicates(subset=columns, keep='first')
    # Write the xlsx file
    tomato.to_excel(moveName + '.xlsx', sheet_name='data')

[Figure: contents of the resulting Excel file]

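Generating a bar chart and a line chart from the comments

With the comment data in hand, we only need to read it back and use it to build the bar chart and line chart we want. pandas lets us group and aggregate a column: we group by city, aggregate score to get mean and count, and pass the results to the corresponding pyecharts charts.

pyecharts saves the result to an HTML file.

The code for this part: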
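
The group-and-aggregate step is easy to check on a tiny table. This is a minimal sketch with invented scores, assuming the same `city` and `score` column names as the Excel file produced earlier.

```python
import pandas as pd

# Invented scores for illustration only.
df = pd.DataFrame({
    "city": ["北京", "北京", "上海", "上海", "上海", "廣州"],
    "score": [5, 4, 3, 4, 5, 2],
})

# One row per city with the mean score and the number of comments.
city_stats = df.groupby("city")["score"].agg(["mean", "count"]).reset_index()
city_stats["mean"] = city_stats["mean"].round(2)

# Most-commented cities first; keep the top 2.
top = city_stats.sort_values("count", ascending=False).head(2)
print(top)
```

The `count` column feeds the bar series and the `mean` column feeds the line series, so sorting by `count` before slicing keeps both series aligned on the same cities.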

def buildChart():
    # getData(1000)
    # Read the comments written by getData
    tomato_com = pd.read_excel(moveName + '.xlsx')
    grouped = tomato_com.groupby(['city'])
    grouped_pct = grouped['score']  # the score column of each group
    city_com = grouped_pct.agg(['mean', 'count'])  # aggregate score into mean and count
    # reset_index turns the city index back into a column and restores the default integer index
    city_com.reset_index(inplace=True)
    # Round the mean score to two decimal places
    city_com['mean'] = city_com['mean'].round(2)
    # Keep the 10 cities with the most comments
    city_main = city_com.sort_values('count', ascending=False)[0:10]
    attr = city_main['city']
    v1 = city_main['count']
    v2 = city_main['mean']

    # Bar chart
    bar = Bar(init_opts=opts.InitOpts(theme=ThemeType.MACARONS))
    bar.add_xaxis(attr.tolist())
    bar.extend_axis(
            yaxis=opts.AxisOpts(
                # name="評分",
                type_="value",
                min_=0,
                max_=5,
                interval=1,
                axislabel_opts=opts.LabelOpts(formatter="{value} 分"),
            )
        )

    bar.set_global_opts(
        title_opts=opts.TitleOpts(title="主要城市評論數"),
        tooltip_opts=opts.TooltipOpts(
            is_show=True, trigger="axis", axis_pointer_type="cross"  # crosshair pointer
        ),
        xaxis_opts=opts.AxisOpts(
            # name="城市名",
            type_="category",
            axispointer_opts=opts.AxisPointerOpts(is_show=True, type_="shadow"),
        ),
        yaxis_opts=opts.AxisOpts(
            name="評論數量",
            type_="value",
            min_=0,
            max_=10,
            interval=2,
            axislabel_opts=opts.LabelOpts(formatter="{value}"),
            axistick_opts=opts.AxisTickOpts(is_show=True),
            splitline_opts=opts.SplitLineOpts(is_show=True),
        ),
    )

    bar.add_yaxis("評論數", v1.tolist())

    # Line chart, drawn against the second y axis (yaxis_index=1)
    line = (
        Line()
        .add_xaxis(xaxis_data=attr.tolist())
        .add_yaxis(
            series_name="評分",
            yaxis_index=1,
            y_axis=v2.tolist(),
            label_opts=opts.LabelOpts(is_show=True),
            # linestyle_opts=opts.LineStyleOpts(color="#fff"),
            # itemstyle_opts=opts.ItemStyleOpts(
            #     color="red", border_color="#fff", border_width=3
            # ),
            markpoint_opts=opts.MarkPointOpts(
                data=[
                    opts.MarkPointItem(type_="max", name="最大值"),
                    opts.MarkPointItem(type_="min", name="最小值"),
                ]
            ),
        )
        .set_global_opts(
            yaxis_opts=opts.AxisOpts(
                type_="category",
                boundary_gap=False,
                axislabel_opts=opts.LabelOpts(margin=30, color="#ffffff63"),
                axisline_opts=opts.AxisLineOpts(is_show=False),
                axistick_opts=opts.AxisTickOpts(
                    is_show=True,
                    length=250,
                    linestyle_opts=opts.LineStyleOpts(color="#ffffff1f"),
                ),
                splitline_opts=opts.SplitLineOpts(
                    is_show=True, linestyle_opts=opts.LineStyleOpts(color="#ffffff1f")
                ),
            ),
        )
    )
    bar.overlap(line).render(outputHtmlName)

[Figure: the resulting HTML chart]

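Generating a word cloud from the comments

A word cloud first needs word segmentation, done here with jieba. 詞雲背景.jpg is read in as the mask template; without a mask, the result is a plain rectangular image.

Once segmentation is done, count how many times each word occurs: the more often a word appears, the larger it is drawn in the word cloud.

Then render the cloud in memory and save it to a file.

The complete code: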
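
The counting step on its own looks like this. It is a small sketch with a hand-written token list standing in for the segmenter's output, so it runs without jieba; the tokens are invented.

```python
import collections

# Stand-in for the output of a word segmenter; invented for illustration.
tokens = ["電影", "好看", "的", "電影", "搞笑", "好看", "電影", "了"]

# Drop single-character tokens, which are mostly particles.
tokens = [t for t in tokens if len(t) > 1]

# Counter maps each word to its number of occurrences.
freq = collections.Counter(tokens)
print(freq.most_common(2))
```

Because `Counter` is a `dict` subclass, the resulting word-to-count mapping can be handed to `WordCloud.generate_from_frequencies` as is.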

def buildWordCloud():
    tomato_com = pd.read_excel(moveName + '.xlsx')
    # Comment text; drop empty cells and cast to str so join() cannot fail on NaN or numbers
    tomato_str = ' '.join(tomato_com['comment'].dropna().astype(str))
    # Segment with jieba (search-engine mode) and keep words longer than one character
    words_list = [k for k in jieba.cut_for_search(tomato_str) if len(k) > 1]
    back_color = imageio.imread(wcMaskFileName)  # load the mask image as an array
    wc = WordCloud(background_color='white',  # background color
                   max_words=200,  # maximum number of words
                   mask=back_color,  # shape to draw in; when set, width and height are ignored
                   max_font_size=300,  # largest font size
                   font_path=fontFileName,  # font file
                   # random_state=42,  # seed for a reproducible layout and colors
                   # width=800,  # image width
                   # height=600  # image height
                   )
    tomato_count = collections.Counter(words_list)  # count occurrences of each word
    wc.generate_from_frequencies(tomato_count)  # draw the words sized by frequency
    # Take the word colors from the mask image
    image_colors = ImageColorGenerator(back_color)
    # Draw the word cloud
    plt.figure()
    plt.imshow(wc.recolor(color_func=image_colors))
    # Hide the axes
    plt.axis('off')
    # Save the word cloud image
    wc.to_file(outputWcFileName)

[Figure: the generated word cloud]

Remaining code

# -*- coding: utf-8 -*-

import json
import requests
import time
import collections

import pandas as pd
import matplotlib.pyplot as plt
from pyecharts.charts import Line, Bar
from pyecharts import options as opts
from pyecharts.globals import ThemeType
from wordcloud import WordCloud, ImageColorGenerator

import jieba
import imageio


filmId = "246082"
moveName = "夏洛特煩惱"
outputHtmlName = "bar_base.html"
wcMaskFileName = "詞雲背景.jpg"
fontFileName = "STFANGSO.ttf"
outputWcFileName = "詞雲.png"


if __name__ == "__main__":
    # Fetch the data from Maoyan
    getData(100)

    # Generate the HTML file with the bar chart
    buildChart()

    # Generate the word cloud
    buildWordCloud()

Download the full project code

CSDN link: https://download.csdn.net/download/zengraoli/12257686
