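Counting how often elements occur in a sequence
Case 1:
Given a random sequence such as [1,2,3,4,44,2,3,8...], find the 3 elements that occur most often and count how many times each one occurs.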
# eg_v1
from random import randint

data = [randint(1, 20) for _ in range(30)]
print(data)
# [19, 15, 4, 18, 18, 7, 18, 13, 18, 20, 18, 3, 5, 6, 7, 19, 2, 15, 3, 6, 13, 4, 14, 20, 1, 18, 13, 2, 11, 4]

# Start every distinct element at 0, then add 1 for each occurrence
c = dict.fromkeys(data, 0)
for x in data:
    c[x] += 1
print(c)
# {1: 1, 2: 2, 3: 2, 4: 3, 5: 1, 6: 2, 7: 2, 11: 1, 13: 3, 14: 1, 15: 2, 18: 6, 19: 2, 20: 2}

# Sort the (element, count) pairs by count, in ascending order
d = sorted(c.items(), key=lambda item: item[1])
print(d)
# [(1, 1), (5, 1), (11, 1), (14, 1), (2, 2), (3, 2), (6, 2), (7, 2), (15, 2), (19, 2), (20, 2), (4, 3), (13, 3), (18, 6)]

# The sort is ascending, so the 3 most frequent elements are the last 3 pairs
print(d[-3:])
# [(4, 3), (13, 3), (18, 6)]
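Sorting every (element, count) pair does more work than necessary when only the top few are wanted. As a minimal sketch, assuming a small fixed list in place of the random data so the result is reproducible, `heapq.nlargest` from the standard library can pick the three highest counts directly. The names `counts` and `top3` and the sample values are illustrative only.

```python
import heapq

# Fixed sample data (an assumption, chosen so the result is reproducible)
data = [4, 18, 18, 7, 18, 13, 4, 13, 4, 18, 2]

# Count occurrences by hand, the same way eg_v1 does
counts = dict.fromkeys(data, 0)
for x in data:
    counts[x] += 1

# Select the 3 pairs with the largest counts without sorting the whole list
top3 = heapq.nlargest(3, counts.items(), key=lambda item: item[1])
print(top3)
# [(18, 4), (4, 3), (13, 2)]
```

The result comes back in descending order of count, so no slicing or reversing is needed.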
# eg_v2: using collections.Counter, a dict subclass
"""
Pass the sequence to the Counter constructor; the resulting Counter object
is a dictionary that maps each element to its frequency.
Counter.most_common(n) returns a list of the n most frequent elements
together with their counts.
"""
from random import randint
from collections import Counter

data = [randint(1, 20) for _ in range(30)]
print(data)
# [5, 13, 2, 9, 9, 20, 10, 9, 1, 14, 10, 1, 9, 12, 14, 3, 8, 20, 10, 7, 10, 4, 7, 18, 15, 10, 17, 5, 5, 16]

c2 = Counter(data)
print(c2)
# Counter({10: 5, 9: 4, 5: 3, 1: 2, 7: 2, 14: 2, 20: 2, 2: 1, 3: 1, 4: 1, 8: 1, 12: 1, 13: 1, 15: 1, 16: 1, 17: 1, 18: 1})

n = c2.most_common(3)
print(n)
# [(10, 5), (9, 4), (5, 3)]
Case 2:
Count word frequencies in an English article: find the 10 words that occur most often and how many times each one occurs.
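import re
from collections import Counter

# Read the whole article; the with block closes the file afterwards
with open("Alice.txt") as f:
    file_txt = f.read()
# print(file_txt)

# \s matches any whitespace character (space, tab, form feed, and so on);
# for ASCII text it is equivalent to [ \f\n\r\t\v].
# The raw string avoids an invalid-escape warning, and + treats a run of
# whitespace as one separator so empty strings are not counted as words.
rst = re.split(r"\s+", file_txt)
# print(rst)

c3 = Counter(rst)
# print(c3)

c4 = c3.most_common(10)
print(c4)
# [('I', 31), ('the', 13), ("I'll", 11), ('When', 8), ('stop', 6), ('down', 6), ('me', 5), ('myself', 5), ('get', 5), ('to', 5)]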
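Splitting on whitespace leaves punctuation attached to words and treats 'When' and 'when' as different words. As a rough sketch, assuming that case-insensitive counting is wanted and that a word is a run of letters and apostrophes, the text can be lowercased and tokenized with `re.findall` before counting. The inline sample text stands in for the file contents and is purely illustrative.

```python
import re
from collections import Counter

# Inline sample text (an assumption; stands in for the contents of a file)
text = "When I go, I stop. When I stop, I'll go down; I stop."

# Lowercase first, then keep runs of letters and apostrophes as words
words = re.findall(r"[a-z']+", text.lower())

word_counts = Counter(words)
print(word_counts.most_common(2))
# [('i', 4), ('stop', 3)]
```

Whether contractions such as "I'll" should count as one word or be split is a design choice; this pattern keeps them whole.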