【Data Visualization】Douban Short Reviews of Jay Chou's New Song "Mojito"

Posted by helaoshi on 2020-06-26

Original article: http://blog.itpub.net/69977871/viewspace-2700670/

Analysis of Douban Music Short Reviews

Data source: Douban Music short reviews

https://music.douban.com/subject/35093585/comments/

Number of records: 498.

Loading the Data

In [1]:

# Import packages
import json  # needed by get_sentiment_score (json.dumps)
import numpy as np
import pandas as pd
import jieba
import time
import requests
from pyecharts.charts import Pie, Bar, Line, Page
from pyecharts import options as opts
from pyecharts.globals import SymbolType

In [2]:

# Load the data
df_douban = pd.read_csv('/home/kesci/input/mojito6931/Mojito豆瓣短評資料6.12.csv')
df_douban.head()

Out[2]:

	user_name	user_url	rating_num	comment_time	content	vote_count
0	銳利修蕊	https://www.douban.com/people/ruilixiurui/	user-stars allstar20 rating	2020-06-12	比前幾首玩票性質的單曲當然是認真了很多。但是這個歌有任何令人驚喜的地方嗎？沒有，好久沒聽到能...	1238
1	twotwo	https://www.douban.com/people/GuanRenWoYao/	user-stars allstar20 rating	2020-06-12	建議周杰倫老老實實過婚後生活吧，演唱會水時長，發新歌感覺純粹為了去古巴旅遊順便拍了個vlog...	1038
2	Costi	https://www.douban.com/people/costi/	user-stars allstar50 rating	2020-06-12	二十年鐵粉夏日落淚。一個有趣的小知識，如果有人吐槽「周郎才盡」足夠早，到今天也差不多十五年了...	930
3	Santé	https://www.douban.com/people/trueGugi/	NaN	2020-06-12	屬於那種Spotify 新歌推送，聽到會直接切走的歌	259
4	月山行	https://www.douban.com/people/xmns/	NaN	2020-06-12	比前幾首好很多了，但還沒有達到周杰倫的正常水平。感覺他的問題是生活太滋潤了，沒有早年的傷春悲...	122

In [3]:

# Count duplicate rows
df_douban.duplicated().sum()

Out[3]:

In [4]:

df_douban.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 498 entries, 0 to 497
Data columns (total 6 columns):
user_name       498 non-null object
user_url        498 non-null object
rating_num      467 non-null object
comment_time    498 non-null object
content         498 non-null object
vote_count      498 non-null int64
dtypes: int64(1), object(5)
memory usage: 23.4+ KB
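The info() output shows rating_num with 467 non-null values out of 498, so 31 reviews carry no star rating. A minimal sketch of counting missing values per column with isna().sum(), run on a small hypothetical frame (the rows are made up for illustration, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one review has no star rating (NaN), as in rating_num
toy = pd.DataFrame({
    'user_name': ['a', 'b', 'c'],
    'rating_num': ['user-stars allstar20 rating', np.nan, 'user-stars allstar50 rating'],
})

# Missing values per column; on the real data rating_num would give 498 - 467 = 31
missing = toy.isna().sum()
print(missing)
```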

Data Preprocessing

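Drop unneeded columns: user_url and comment_time (rating_num is dropped as well once the star rating has been extracted)
rating_num: extract the star rating
content: word segmentation and keyword extraction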

In [5]:

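# Extract the star rating: the first digit, e.g. 'allstar20' -> '2'
# expand=False returns a Series rather than a one-column DataFrame
df_douban['star'] = df_douban.rating_num.str.extract(r'(\d)', expand=False)
# Drop columns
df_douban = df_douban.drop(['rating_num', 'user_url', 'comment_time'], axis=1)
df_douban.head(3)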

Out[5]:

	user_name	content	vote_count	star
0	銳利修蕊	比前幾首玩票性質的單曲當然是認真了很多。但是這個歌有任何令人驚喜的地方嗎？沒有，好久沒聽到能...	1238	2
1	twotwo	建議周杰倫老老實實過婚後生活吧，演唱會水時長，發新歌感覺純粹為了去古巴旅遊順便拍了個vlog...	1038	2
2	Costi	二十年鐵粉夏日落淚。一個有趣的小知識，如果有人吐槽「周郎才盡」足夠早，到今天也差不多十五年了...	930	5
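rating_num holds strings such as 'user-stars allstar20 rating', and the pattern r'(\d)' captures only the first digit, so 'allstar20' becomes '2'. The small sketch below reproduces that on sample values shaped like the table above; a missing rating stays NaN. The extracted value is a string, which is why the later filters compare star against '4' and '5' rather than integers.

```python
import numpy as np
import pandas as pd

rating_num = pd.Series(['user-stars allstar20 rating',
                        'user-stars allstar50 rating',
                        np.nan])

# One capture group + expand=False -> a Series of strings (NaN passes through)
star = rating_num.str.extract(r'(\d)', expand=False)
print(star.tolist())  # ['2', '5', nan]
```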

In [6]:

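# Anomalous values: a comment whose entire text is '?' (presumably an emoji lost
# during scraping) is replaced with 微笑 ("smile")
df_douban['content'] = df_douban.content.replace('?', '微笑')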
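Series.replace matches whole cell values, so the line above only rewrites comments consisting of '?' alone; a '?' inside a longer comment is left untouched. If substring replacement were wanted instead, Series.str.replace with regex=False would do it. A quick comparison on made-up strings:

```python
import pandas as pd

content = pd.Series(['?', 'great song?', 'great song'])

# Whole-value match: only an element equal to '?' changes
whole = content.replace('?', '微笑')
# Substring match: every literal '?' is replaced
sub = content.str.replace('?', '微笑', regex=False)

print(whole.tolist())  # ['微笑', 'great song?', 'great song']
print(sub.tolist())    # ['微笑', 'great song微笑', 'great song']
```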

In [7]:

# Enter your API Key and Secret Key
ak = 'YOUR_API_KEY'     # placeholder: use your own Baidu AI API Key
sk = 'YOUR_SECRET_KEY'  # placeholder: use your own Secret Key
host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id={}&client_secret={}'.format(ak, sk)
# Send the request
r = requests.post(host)
# Get the access token
token = r.json()['access_token']

def get_sentiment_score(text):
    """
    Take a piece of text and return its sentiment scores.
    """
    url = 'https://aip.baidubce.com/rpc/2.0/nlp/v1/sentiment_classify?charset=UTF-8&access_token={}'.format(token)
    data = {
        'text': text
    }
    data = json.dumps(data)  # dict -> JSON string
    # Send the request; on failure, wait 3 seconds and try once more
    try:
        res = requests.post(url, data=data, timeout=3)
        items_score = res.json()['items']
    except Exception:
        time.sleep(3)
        res = requests.post(url, data=data, timeout=3)
        items_score = res.json()['items']
    return items_score
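get_sentiment_score retries exactly once after a 3-second pause, and a second failure propagates to the caller. A slightly more general pattern is to retry up to a fixed number of attempts. Below is a minimal sketch; with_retries is a hypothetical helper (not part of requests or any Baidu SDK), exercised here with a fake flaky function so it runs without network access or API keys.

```python
import time


def with_retries(func, *args, attempts=3, delay=3.0, **kwargs):
    """Hypothetical helper: call func(*args, **kwargs) up to `attempts` times.

    Sleeps `delay` seconds between failed attempts and re-raises the last
    exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_exc = None
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # broad, mirroring the notebook's except clause
            last_exc = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_exc


# Usage with a fake call that fails twice before succeeding
calls = {'n': 0}

def flaky(text):
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('temporary failure')
    return [{'positive_prob': 0.9, 'negative_prob': 0.1}]

result = with_retries(flaky, 'some comment', attempts=3, delay=0)
print(result, calls['n'])
```

In the notebook, such a wrapper would replace the try/except block around the request-and-parse step rather than wrap get_sentiment_score itself, which already retries once.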

In [8]:

score_list = []
step = 0
for i in df_douban['content']:
    score = get_sentiment_score(i)
    # Print progress
    step += 1
    print('Fetching score #{}'.format(step), end='\r')
    score_list.append(score)

Fetching score #498

In [9]:

# Extract the positive/negative probabilities
positive_prob = [i[0]['positive_prob'] for i in score_list]
negative_prob = [i[0]['negative_prob'] for i in score_list]
df_douban['positive_prob'] = positive_prob
df_douban['negative_prob'] = negative_prob
# Label each comment positive (正向) or negative (負向)
df_douban['label'] = ['正向' if i > 0.5 else '負向' for i in df_douban.positive_prob]
df_douban.head()

Out[9]:

	user_name	content	vote_count	star	positive_prob	negative_prob	label
0	銳利修蕊	比前幾首玩票性質的單曲當然是認真了很多。但是這個歌有任何令人驚喜的地方嗎？沒有，好久沒聽到能...	1238	2	0.989113	0.010887	正向
1	twotwo	建議周杰倫老老實實過婚後生活吧，演唱會水時長，發新歌感覺純粹為了去古巴旅遊順便拍了個vlog...	1038	2	0.000569	0.999431	負向
2	Costi	二十年鐵粉夏日落淚。一個有趣的小知識，如果有人吐槽「周郎才盡」足夠早，到今天也差不多十五年了...	930	5	0.353984	0.646016	負向
3	Santé	屬於那種Spotify 新歌推送，聽到會直接切走的歌	259	NaN	0.958526	0.041474	正向
4	月山行	比前幾首好很多了，但還沒有達到周杰倫的正常水平。感覺他的問題是生活太滋潤了，沒有早年的傷春悲...	122	NaN	0.997686	0.002314	正向
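The labeling rule maps positive_prob > 0.5 to 正向 (positive) and everything else, including exactly 0.5, to 負向 (negative). The same rule can be written in vectorized form with numpy.where; this sketch uses the five probabilities from the output above and reproduces the label column.

```python
import numpy as np
import pandas as pd

# positive_prob values copied from Out[9]
positive_prob = pd.Series([0.989113, 0.000569, 0.353984, 0.958526, 0.997686])

# Same threshold as the list comprehension: strictly greater than 0.5 is positive
label = np.where(positive_prob > 0.5, '正向', '負向')
print(label.tolist())  # ['正向', '負向', '負向', '正向', '正向']
```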

Data Visualization

Share of Douban Short-Review Star Ratings

In [10]:

# Count reviews per star rating
star_num = df_douban.star.value_counts()
star_num = star_num.sort_index()
star_num

Out[10]:

1     20
2     83
3    185
4    102
5     77
Name: star, dtype: int64
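The pie chart's {d} placeholder displays each slice's percentage of the total. Those shares can be checked by hand from the counts in Out[10]; the 31 reviews without a rating are not included, because value_counts drops NaN by default, so the denominator is 467 rather than 498.

```python
import pandas as pd

# Star counts copied from Out[10] (467 rated reviews)
star_num = pd.Series([20, 83, 185, 102, 77], index=['1', '2', '3', '4', '5'])

# Percentage of rated reviews per star level, rounded to 2 decimals
share = (star_num / star_num.sum() * 100).round(2)
print(share.to_dict())  # {'1': 4.28, '2': 17.77, '3': 39.61, '4': 21.84, '5': 16.49}
```

So roughly 40% of the rated reviews give three stars.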

In [11]:

# Data pairs: [label, count]; '星' means "star", e.g. '3星'
data_pair = [list(z) for z in zip([i + '星' for i in star_num.index], star_num.values.tolist())]
# Donut-style pie chart
pie1 = Pie(init_opts=opts.InitOpts(width='1350px', height='750px'))
pie1.add('', data_pair, radius=['35%', '60%'])
pie1.set_global_opts(title_opts=opts.TitleOpts(title='豆瓣短評評分佔比'),
                     legend_opts=opts.LegendOpts(orient='vertical', pos_top='15%', pos_left='2%')
                    )
pie1.set_series_opts(label_opts=opts.LabelOpts(formatter='{b}:{d}%'))
pie1.render_notebook()

Out[11]:

Comment Sentiment Scores from Baidu AI

In [12]:

# Bin positive_prob into ten intervals of width 0.1 and count each bin
bins = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
score_num = pd.cut(df_douban.positive_prob, bins=bins)
score_num = score_num.value_counts().sort_index()
score_num

Out[12]:

(0.0, 0.1]    161
(0.1, 0.2]     24
(0.2, 0.3]     12
(0.3, 0.4]     13
(0.4, 0.5]     20
(0.5, 0.6]     12
(0.6, 0.7]     16
(0.7, 0.8]     11
(0.8, 0.9]     31
(0.9, 1.0]    198
Name: positive_prob, dtype: int64
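pd.cut builds right-closed intervals by default, so the first bin (0.0, 0.1] excludes 0.0 itself: a comment with positive_prob exactly 0 would fall outside every bin and become NaN. Here the ten counts add up to 498, so no comment was lost, but passing include_lowest=True makes the first bin include 0 as well. A sketch on made-up probabilities:

```python
import pandas as pd

bins = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
probs = pd.Series([0.0, 0.05, 0.5, 1.0])  # made-up values

default = pd.cut(probs, bins=bins)                         # 0.0 -> NaN
inclusive = pd.cut(probs, bins=bins, include_lowest=True)  # 0.0 kept in the first bin

print(default.isna().sum(), inclusive.isna().sum())  # 1 0
```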

In [13]:

# Bar chart
bar1 = Bar(init_opts=opts.InitOpts(width='1350px', height='750px'))
bar1.add_xaxis(score_num.index.astype('str').tolist())
bar1.add_yaxis('', score_num.values.tolist(), category_gap='0%')
bar1.set_global_opts(title_opts=opts.TitleOpts(title='基於百度AI評論情感得分'),
                     visualmap_opts=opts.VisualMapOpts(max_=200))
bar1.render_notebook()

Out[13]:

Word Cloud of Positive Douban Ratings: Why Did Reviewers Rate It Highly?

In [14]:

def get_cut_words(content_series):
    # Load the stop-word list (Harbin Institute of Technology list)
    stop_words = []

    with open(r"/home/kesci/input/stop6931/哈工大停用詞表.txt", 'r', encoding='gb18030') as f:
        lines = f.readlines()
        for line in lines:
            stop_words.append(line.strip())
    # Add custom keywords to the jieba dictionary
    my_words = ['周杰倫', '一首歌']
    for i in my_words:
        jieba.add_word(i)
    # Custom stop words
    my_stop_words = ['歌有', '真的', '這首', '一首', '一點',
                     '反正', '一段', '一句', '首歌']
    stop_words.extend(my_stop_words)
    # Segment all comments joined into one string
    word_num = jieba.lcut(content_series.str.cat(sep='。'), cut_all=False)
    # Keep tokens that are not stop words and have at least 2 characters
    word_num_selected = [i for i in word_num if i not in stop_words and len(i) >= 2]

    return word_num_selected
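After segmentation, get_cut_words keeps tokens that are not stop words and have at least two characters; the word cloud then sizes each word by how often it occurs. The stdlib-only sketch below shows that filter-and-count step, assuming the text is already segmented (the token list and stop words are hypothetical stand-ins for jieba.lcut output and the real list). Converting the stop words to a set makes each membership test constant-time, which helps when the list is long.

```python
from collections import Counter


def filter_tokens(tokens, stop_words, min_len=2):
    """Drop stop words and tokens shorter than min_len (same rule as get_cut_words)."""
    stop = set(stop_words)  # set lookups instead of scanning a list
    return [t for t in tokens if t not in stop and len(t) >= min_len]


# Hypothetical segmented text and stop words
tokens = ['二十年', '鐵粉', '的', '夏日', '落淚', '真的', '鐵粉', '有趣']
stop_words = ['的', '真的']

kept = filter_tokens(tokens, stop_words)
freq = Counter(kept)
print(kept)                 # ['二十年', '鐵粉', '夏日', '落淚', '鐵粉', '有趣']
print(freq.most_common(1))  # [('鐵粉', 2)]
```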

In [15]:

# Positive reviews: 4 or 5 stars
text1 = get_cut_words(content_series=df_douban[(df_douban.star=='4')|(df_douban.star=='5')]['content'])
text1[:5]

Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.803 seconds.
Prefix dict has been built succesfully.

Out[15]:

['二十年', '鐵粉', '夏日', '落淚', '有趣']

In [16]:

! pip install stylecloud

Successfully installed fire-0.3.1 icon-font-to-png-0.4.1 palettable-3.3.0 stylecloud-0.5.1 tinycss-0.4

In [17]:

import stylecloud
from IPython.display import Image  # display local images in Jupyter Lab
# Draw the word cloud
stylecloud.gen_stylecloud(text=' '.join(text1),
                          max_words=1000,
                          collocations=False,
                          font_path=r'/home/kesci/input/經典綜藝體簡.TTF',
                          icon_name='fas fa-thumbs-up',
                          size=612,
                          output_name='豆瓣正向評分詞雲圖.png')
Image(filename='豆瓣正向評分詞雲圖.png')

Out[17]:

Word Cloud of Negative Douban Ratings: Why Did Reviewers Rate It Poorly?

In [18]:

# Negative reviews: 1 or 2 stars
text2 = get_cut_words(content_series=df_douban[(df_douban.star=='1')|(df_douban.star=='2')]['content'])
text2[:5]

Out[18]:

['比前', '幾首', '玩票性質', '單曲', '當然']

In [19]:

# Draw the word cloud
stylecloud.gen_stylecloud(text=' '.join(text2),
                          max_words=1000,
                          collocations=False,
                          font_path=r'/home/kesci/input/經典綜藝體簡.TTF',
                          icon_name='fas fa-thumbs-down',
                          size=612,
                          output_name='豆瓣負向評分詞雲圖.png')
Image(filename='豆瓣負向評分詞雲圖.png')

Out[19]:

In [20]:

# Combine the pie and bar charts into one HTML page
page = Page()
page = page.add(pie1, bar1)
page.render('Mojito豆瓣資料分析.html')

Source: ITPUB blog, http://blog.itpub.net/69977871/viewspace-2700670/. Please credit the source when reposting; otherwise legal action may be taken.
