Counting palindrome words in English text with Python

Published by 老瀟的摸魚日記 on 2020-05-13

Original URL: https://www.cnblogs.com/victorxiao/p/12884047.html

1. Requirements:

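Given a text written entirely in English, compute the proportion of palindrome words in it and print those palindrome words. The text is as follows: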

This is Everyday Grammar. I am Madam Lucija
And I am Kaveh. Why the title, Lucija?
Well, it is a special word. Madam?
Yeah, maybe I should spell it for you forward or backward?
I am lost. The word Madam is a Palindrome.
I just learned about them the other day and I am having a lot of fun!
Palindrome, huh? Let me try!
But first, we need to explain what a Palindrome is.
That is easy! Palindromes are words, phrases or numbers that read the same back and forward, like DAD.
So, Palindromes can be serious or just silly.
Yup, like, A nut for a jar of tuna.
Or, Borrow or Rob. Probably borrow!
And if you are hungry, you can always have a Taco cat?
That is gross. What about this one?
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal panama!
That is a real tongue twister. But I prefer Italy. Amore Roma!
So how do we make palindromes?
One, read words backwards and see if they make sense.
Two, try to make palindromes where even the spacing between words is consistent. Like, NotATon.
And three, you can always check the internet for great palindromes!
And that is Everyday Grammar.

Note:

Matching is case-sensitive: the uppercase and lowercase forms of the same word are treated as different words.
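To make the case-sensitivity rule concrete, here is a minimal sketch. The helper name `is_palindrome` is hypothetical (it does not appear in the original post); it compares characters exactly, so under this rule `Madam` from the sample text is not a palindrome, while `DAD` is.

```python
# Hypothetical helper: exact, case-sensitive palindrome check.
def is_palindrome(word: str) -> bool:
    # A word is a palindrome if it equals its own reverse,
    # character for character ('M' and 'm' count as different).
    return word == word[::-1]


for w in ["Madam", "madam", "DAD", "Rob", "I"]:
    print(w, is_palindrome(w))
# Madam False
# madam True
# DAD True
# Rob False
# I True
```

Single-character words such as `I` and `a` pass this check, which matches the behavior of the two-pointer loop in the post: for a one-letter word the loop body never runs and the word is kept as a palindrome.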

2. Analysis

The approach is straightforward; the basic steps are:

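Step 1: Read in the text and strip the newline characters.
Step 2: Remove the punctuation from the text produced by step 1. A regular expression keeps only the words in the text, which effectively strips the punctuation. The words from every line are then collected into a single list.
Step 3: Count word frequencies from the preprocessed text. Because a text can contain many repeated words, it is enough to check whether each word in the resulting frequency dictionary is a palindrome, rather than checking every occurrence.
Step 4: Iterate over the keys of the dictionary and check whether each one is a palindrome; see the code for the implementation.
Step 5: Compute the palindrome ratio from the palindrome words found: the total number of occurrences of palindrome words divided by the total number of words in the text.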
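Steps 2 and 3 can be illustrated on a single line of the sample text. This is a small sketch that assumes the same `\b\w+\b` pattern the post's `process` function uses; it shows that the commas and the final `!` disappear, and that the frequency dictionary keeps `a` and `A` as separate keys.

```python
import re
from collections import Counter

# One line from the sample text, including punctuation.
line = "A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal panama!"

# Step 2: keep only runs of word characters; commas and '!' are dropped.
words = re.findall(r'\b\w+\b', line)
print(words[:4])  # ['A', 'man', 'a', 'plan']

# Step 3: count frequencies so each distinct word is checked only once.
freq = Counter(words)
print(freq['a'], freq['A'])    # 7 1  (case-sensitive keys)
print(len(words), len(freq))   # 17 11  (total words vs. distinct words)
```

Here 17 tokens collapse to 11 distinct keys, so step 4 only has to test 11 words.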

3. Code

import re
from collections import Counter


# Preprocess the text; returns a flat word list such as ['This', 'is', 'Everyday']
def process(path):
    token = []
    with open(path, 'r', encoding='utf-8') as f:
        for row_text in f:
            row_text_prod = row_text.rstrip('\n')  # strip the trailing newline
            # Keep only runs of word characters, which drops the punctuation
            token.extend(re.findall(r'\b\w+\b', row_text_prod))
    return token


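# Find the palindrome words and their share of all words
def palindrome(processed_text):
    c = Counter(processed_text)  # word-frequency dictionary
    palindrome_word = []  # palindrome words
    not_palindrome_word = []  # non-palindrome words
    # Iterate over the distinct words (keys of the frequency dictionary)
    for word in c.keys():
        flag = True
        i, j = 0, len(word)-1
        # Two-pointer check: compare characters from both ends toward the middle
        while i < j:
            if word[i] != word[j]:
                not_palindrome_word.append(word)  # not a palindrome
                flag = False
                break
            i += 1
            j -= 1
        if flag:
            palindrome_word.append(word)  # a palindrome
    print("Palindrome words:")
    print(palindrome_word)
    print("Non-palindrome words:")
    print(not_palindrome_word)
    # Ratio = total occurrences of palindrome words / total number of words
    total_palindrome_word = 0
    for word in palindrome_word:
        total_palindrome_word += c[word]
    if processed_text:  # avoid dividing by zero on an empty file
        print("Palindrome word ratio: {:.3f}".format(total_palindrome_word / len(processed_text)))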


def main():
    text_path = 'test.txt'
    processed_text = process(text_path)
    palindrome(processed_text)


if __name__ == '__main__':
    main()
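For comparison, here is a compact variant of the same pipeline. It is a sketch, not the author's code: the function name `palindrome_ratio` is hypothetical, it takes the text as a string instead of reading `test.txt`, and it swaps the two-pointer loop for slice reversal (`w[::-1]`). The counting rule is unchanged: case-sensitive, with the ratio taken over all word occurrences.

```python
import re
from collections import Counter


def palindrome_ratio(text):
    """Return (palindrome words, ratio) using case-sensitive matching.

    Hypothetical variant of the post's process()/palindrome() pair:
    it takes the text directly instead of a file path and uses
    slice reversal instead of the two-pointer loop.
    """
    words = re.findall(r'\b\w+\b', text)  # same tokenizer as process()
    freq = Counter(words)
    # A distinct word is a palindrome if it equals its reverse.
    pal = [w for w in freq if w == w[::-1]]
    if not words:  # empty input: avoid division by zero
        return pal, 0.0
    # The ratio counts every occurrence, not just distinct words.
    return pal, sum(freq[w] for w in pal) / len(words)


pal, ratio = palindrome_ratio("I am lost. The word Madam is a Palindrome.")
print(pal)               # ['I', 'a']
print(f"{ratio:.3f}")    # 0.222
```

Both versions give the same answer, because a string equals its reverse exactly when every mirrored pair of characters matches; the slice simply builds a reversed copy, which is negligible for single English words.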

