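Analyzing 《三國演義》 (Romance of the Three Kingdoms) with Python: who is the book's real protagonist? (Includes a way to count specified characters)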

Posted by 小貝書屋 on 2021-05-18

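In an earlier post I counted how often each character appears in Jin Yong's novel 《倚天屠龍記》 (The Heaven Sword and Dragon Saber) and sorted the characters by frequency: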

https://www.cnblogs.com/becks/p/11421214.html

After that I used pyecharts to analyze the bullet comments (danmaku) of a Bilibili video and render them as a word cloud:

https://www.cnblogs.com/becks/p/14743080.html

 

This time, let's count how often each character appears in 《三國演義》 and render the result as a word cloud, to see who the real protagonist is.

 

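First, we need to list the characters who appear in the book. We only want statistics on characters, so words that are not character names should not be counted.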

角色 = {'劉備', '諸葛亮', '關羽', '張飛', '劉禪', '孫權', '趙雲', '司馬懿', '周瑜', '曹操', '袁紹', '馬超', '魏延',
        '黃忠', '姜維', '馬岱', '龐德', '孟獲', '劉表', '董卓', '孫策',
        '魯肅', '司馬昭', '夏侯淵', '王平', '劉璋', '袁術', '呂蒙', '甘寧', '鄧艾', '曹仁',
        '陸遜', '許褚', '龐統', '曹洪', '李典', '曹丕', '廖化', '曹真', '呂布'}

Next, read the text of 《三國演義》, prepared in advance as a .txt file, and segment its contents with the jieba library.

# -*- coding: utf-8 -*-

import os
import sys

import jieba  # Chinese word segmentation
from pyecharts.charts import WordCloud  # word cloud chart

# Directory that contains this script
path = os.path.abspath(os.path.dirname(sys.argv[0]))

# Read the text of 《三國演義》; os.path.join keeps the path portable across operating systems
with open(os.path.join(path, '171182.txt'), 'r', encoding='utf-8') as f:
    txt = f.read()

words = jieba.lcut(txt)  # segment the text into a list of words
counts = {}

Then count how many times each of the specified character names appears.

for word in words:
    if len(word) <= 1:  # skip single-character tokens
        continue
    if word in 角色:  # count only the listed character names
        counts[word] = counts.get(word, 0) + 1
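
The same tally can also be written with `collections.Counter` from the standard library. The following is only a minimal sketch: `count_names` and `sample_tokens` are hypothetical names introduced for illustration, and the token list is written by hand rather than produced by jieba.

```python
from collections import Counter


def count_names(tokens, names):
    """Count how often each token that belongs to `names` occurs in `tokens`."""
    return Counter(token for token in tokens if token in names)


# Hypothetical, hand-written tokens standing in for the output of jieba.lcut(txt)
sample_tokens = ["曹操", "曰", "劉備", "曹操", "天下", "關羽"]
sample_counts = count_names(sample_tokens, {"曹操", "劉備", "關羽"})

# most_common() lists the (name, count) pairs from most to least frequent
print(sample_counts.most_common())
```

No separate length check is needed in this variant, because a single-character token can never be a member of a set that holds only multi-character names.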

Draw the word cloud:

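items = list(counts.items())  # convert the dict to a list of (name, count) pairs
wordcloud = WordCloud()
wordcloud.add("", items, word_size_range=[15, 80], rotate_step=30, shape='cardioid')
wordcloud.render(os.path.join(path, 'wordcloud.html'))  # write the chart next to the script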
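
Word size in the cloud only hints at the underlying numbers, so it can help to look at the counts directly. The sketch below ranks a `counts`-style dict by frequency; `top_items` and `example_counts` are hypothetical, and the numbers are made up for illustration, not taken from the book.

```python
def top_items(counts, n=10):
    """Return the n most frequent (name, count) pairs, highest count first."""
    ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Made-up counts for illustration only
example_counts = {"曹操": 5, "劉備": 3, "關羽": 4}

for name, count in top_items(example_counts, n=2):
    print(name, count)
```

The returned list has the same shape as `list(counts.items())`, so it could presumably be passed to the word cloud in its place when only the top few names should be drawn.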

Run the script, then open the generated file.

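The name 曹操 (Cao Cao) is drawn largest, which means it appears more often than any other name in the book. That can't be right: Luo Guanzhong was a Liu Bei fan.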

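Then it occurred to me that in the Three Kingdoms era, calling someone directly by their given name was an insult. The so-called upright characters are all referred to by courtesy names or honorifics, such as 臥龍 (Wolong) or 諸葛 (Zhuge).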

So I changed the code to match these alternative names as well:

劉備 = {'玄德', '玄德曰', '先主', '劉豫州', '劉皇叔', '劉玄德', '劉使君'}
諸葛亮 = {'孔明', '孔明曰', '臥龍', '臥龍先生', '諸葛先生', '孔明先生', '諸葛丞相', '諸葛'}
關羽 = {'關公', '雲長', '漢壽亭侯', '關雲長'}
曹操 = {'孟德', '曹孟德', '曹操'}
張飛 = {'張翼德', '翼德'}
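
Because the counting step maps each token to a single character, an alias should appear in only one of these sets. The sketch below is one possible sanity check, not part of the original script: `shared_aliases` is a hypothetical helper, and the example data deliberately lists 先主 twice to trigger it.

```python
from collections import defaultdict


def shared_aliases(alias_sets):
    """Return {alias: sorted owner names} for aliases listed under more than one character."""
    owners = defaultdict(list)
    for name, forms in alias_sets.items():
        for form in forms:
            owners[form].append(name)
    return {form: sorted(names) for form, names in owners.items() if len(names) > 1}


# Hypothetical data: "先主" is listed under two characters on purpose
example = {
    "劉備": {"玄德", "先主"},
    "劉禪": {"後主", "先主"},
}

print(shared_aliases(example))
```

An empty result means every alias belongs to exactly one character, so the order of the checks in the counting loop cannot change the totals.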

The counting step was adjusted accordingly:

for word in words:  # walk through the segmented tokens
    if len(word) <= 1:  # a name has at least two characters, so skip single-character tokens
        continue
    elif word in 劉備:  # any alias of 劉備 is counted under 劉備
        counts["劉備"] = counts.get("劉備", 0) + 1
    elif word in 諸葛亮:
        counts["諸葛亮"] = counts.get("諸葛亮", 0) + 1
    elif word in 曹操:
        counts["曹操"] = counts.get("曹操", 0) + 1
    elif word in 關羽:
        counts["關羽"] = counts.get("關羽", 0) + 1
    elif word in 張飛:
        counts["張飛"] = counts.get("張飛", 0) + 1
    elif word in 其他:  # 其他 holds the remaining character names; it is defined in the complete code
        counts[word] = counts.get(word, 0) + 1

 

Run it again and, well, Zhuge Liang (諸葛亮) is the king: he appears 1350 times in total, while Liu Bei (劉備) appears 1271 times.
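
To see how such a total is made up, the alias sets can be turned into a single lookup table that also records which form was matched. This is a sketch under stated assumptions, not the script that produced the figures above: `build_lookup` and `count_with_breakdown` are hypothetical names, the alias data is only a subset of the sets in the post, and the token list is written by hand. Unlike the sets in the post, the lookup also maps each character's own name to that character, so its totals would not necessarily match the reported ones.

```python
from collections import Counter, defaultdict

# Character name -> alternative forms (a subset of the sets used in the post)
aliases = {
    "劉備": {"玄德", "玄德曰", "先主"},
    "諸葛亮": {"孔明", "孔明曰", "臥龍"},
}


def build_lookup(aliases):
    """Map every alias, and the character's own name, to that character."""
    lookup = {}
    for name, forms in aliases.items():
        lookup[name] = name
        for form in forms:
            lookup[form] = name
    return lookup


def count_with_breakdown(tokens, lookup):
    """Return (totals, breakdown): counts per character and per matched form."""
    totals = Counter()
    breakdown = defaultdict(Counter)
    for token in tokens:
        name = lookup.get(token)
        if name is not None:
            totals[name] += 1
            breakdown[name][token] += 1
    return totals, breakdown


# Hypothetical tokens standing in for the output of jieba.lcut(txt)
tokens = ["孔明", "曰", "玄德曰", "孔明", "諸葛亮", "臥龍"]
totals, breakdown = count_with_breakdown(tokens, build_lookup(aliases))

print(totals["諸葛亮"], dict(breakdown["諸葛亮"]))
```

A dict lookup replaces the chain of `elif` branches, so adding a character only requires a new entry in `aliases`.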

 

 

 

 

The complete code:

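# -*- coding: utf-8 -*-

import os
import sys

import jieba  # Chinese word segmentation
from pyecharts.charts import WordCloud  # word cloud chart

# Directory that contains this script
path = os.path.abspath(os.path.dirname(sys.argv[0]))

# Read the text of the book; os.path.join keeps the path portable across operating systems
with open(os.path.join(path, '三國演義.txt'), 'r', encoding='utf-8') as f:
    txt = f.read()

words = jieba.lcut(txt)  # segment the text into a list of words
counts = {}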

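# Alternative names by which each main character is addressed in the text
劉備 = {'玄德', '玄德曰', '先主', '劉豫州', '劉皇叔', '劉玄德', '劉使君'}
諸葛亮 = {'孔明', '孔明曰', '臥龍', '臥龍先生', '諸葛先生', '孔明先生', '諸葛丞相', '諸葛'}
關羽 = {'關公', '雲長', '漢壽亭侯', '關雲長'}
劉禪 = {'後主'}
曹操 = {'孟德', '曹孟德', '曹操'}
張飛 = {'張翼德', '翼德'}

# Remaining characters, counted under their own names only.
# Note: 劉備, 諸葛亮, 關羽 and 張飛 are in none of these sets, so those literal names
# are not counted; only the aliases listed above are.
其他 = {'孫權', '趙雲', '司馬懿', '周瑜', '劉禪', '袁紹', '馬超', '魏延', '黃忠', '姜維', '馬岱', '龐德', '孟獲', '劉表', '董卓', '孫策',
        '魯肅', '司馬昭', '夏侯淵', '王平', '劉璋', '袁術', '呂蒙', '甘寧', '鄧艾', '曹仁', '陸遜', '許褚', '龐統', '曹洪', '李典', '曹丕', '廖化', '曹真', '呂布'}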
for word in words:  # walk through the segmented tokens
    if len(word) <= 1:  # a name has at least two characters, so skip single-character tokens
        continue
    elif word in 劉備:  # any alias of 劉備 is counted under 劉備
        counts["劉備"] = counts.get("劉備", 0) + 1
    elif word in 諸葛亮:
        counts["諸葛亮"] = counts.get("諸葛亮", 0) + 1
    elif word in 曹操:
        counts["曹操"] = counts.get("曹操", 0) + 1
    elif word in 關羽:
        counts["關羽"] = counts.get("關羽", 0) + 1
    elif word in 張飛:
        counts["張飛"] = counts.get("張飛", 0) + 1
    elif word in 劉禪:  # the 劉禪 alias set was defined above but never checked; count 後主 under 劉禪
        counts["劉禪"] = counts.get("劉禪", 0) + 1
    elif word in 其他:  # every other listed character is counted under the matched name
        counts[word] = counts.get(word, 0) + 1

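items = list(counts.items())  # convert the dict to a list of (name, count) pairs

wordcloud = WordCloud()
wordcloud.add("", items, word_size_range=[15, 80], rotate_step=30, shape='cardioid')
wordcloud.render(os.path.join(path, 'wordcloud.html'))  # write the chart next to the script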

 
