陳志俠 (Chen Zhixia): Second Assignment

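Course this assignment belongs to	https://edu.cnblogs.com/campus/zjlg/rjjc
Goal of this assignment	Write a command-line program that counts text statistics
Name - Student ID	陳志俠 (Chen Zhixia) 2022329301009
Gitee repository	https://gitee.com/chen-zhixia666/second-assignment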

I. Project Overview and Usage

1.1 Project Overview

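This project implements, in Python using PyCharm, a command-line text statistics program that correctly counts the characters, words and sentences in an imported plain-English .txt file. The extended version also counts lines, empty lines, comment lines, punctuation marks and occurrences of a specific word. The user selects a statistic with a command-line option and specifies the text file to process.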

1.2 Usage

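Example of the required command-line interface:
Command format: wc.exe [option] [filename]

wc.exe -c file.txt    count characters
wc.exe -w file.txt    count words
Extended features (bonus): count code lines, empty lines, comment lines, etc., and provide the corresponding command options.

In this project the program is run as a Python script (python v0.2.py or python v0.3.py) rather than as wc.exe, following the same [option] [filename] pattern.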

II. File List and Description

2.1 File List

Each version number marks one stage of development:

v0.1	Empty project
v0.2	Basic features completed
v0.3	Extended features (bonus) completed

v0.2 code:

import sys
import re

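def count_characters(text):
    return len(text)  # number of characters, including spaces and newlines

def count_words(text):
    words = text.split()
    return len(words)  # number of words, split on whitespace

def count_sentences(text):
    sentences = re.split(r'[.!?]', text)
    # number of sentences: '.', '!' and '?' mark the end of a sentence,
    # and whitespace-only fragments (e.g. a trailing newline) are not counted
    return len([s for s in sentences if s.strip()])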

def main():
    if len(sys.argv) != 3:  # check that the number of command-line arguments is correct
        print("Silly you, the correct usage is: python v0.2.py -c, -w or -s <txt filename>")
        print("Options: -c (characters), -w (words), -s (sentences)")
        sys.exit(1)

    option = sys.argv[1]
    filename = sys.argv[2]

    try:
        with open(filename, 'r', encoding='utf-8') as file:
            text = file.read()
    except FileNotFoundError:  # the file does not exist
        print("Error!!! This file doesn't even exist!!!")
        sys.exit(1)

    if option == "-c":
        num_characters = count_characters(text)
        print(f"Characters: {num_characters}")
    elif option == "-w":
        num_words = count_words(text)
        print(f"Words: {num_words}")
    elif option == "-s":
        num_sentences = count_sentences(text)
        print(f"Sentences: {num_sentences}")
    else:
        print("Invalid option, please use -c, -w or -s")
        sys.exit(1)

if __name__ == "__main__":
    main()
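Splitting on `[.!?]` leaves a whitespace-only fragment after the final terminator whenever the file ends with a newline, which is common for .txt files. A filter of `if s` counts that fragment as an extra sentence, while `if s.strip()` does not. The sketch below compares the two; the function names are illustrative and not part of the project.

```python
import re

def count_sentences_naive(text):
    # keeps every non-empty fragment, including whitespace-only ones
    sentences = re.split(r'[.!?]', text)
    return len([s for s in sentences if s])

def count_sentences_stripped(text):
    # ignores fragments that contain only whitespace (e.g. a trailing newline)
    sentences = re.split(r'[.!?]', text)
    return len([s for s in sentences if s.strip()])

sample = "Hello world. How are you?\n"
print(count_sentences_naive(sample))     # 3
print(count_sentences_stripped(sample))  # 2
```

Both variants still treat abbreviations such as "Mr." as sentence ends, so the count is an approximation either way.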

v0.3 code:

import sys
import re

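def count_characters(text):
    return len(text)  # number of characters, including spaces and newlines

def count_words(text):
    words = text.split()
    return len(words)  # number of words, split on whitespace

def count_sentences(text):
    sentences = re.split(r'[.!?]', text)
    # number of sentences: '.', '!' and '?' mark the end of a sentence,
    # and whitespace-only fragments (e.g. a trailing newline) are not counted
    return len([s for s in sentences if s.strip()])

def count_lines(text):
    return len(text.splitlines())  # number of lines; a trailing newline does not add an extra line

def count_empty_lines(text):
    return len([line for line in text.splitlines() if not line.strip()])  # lines that are empty or whitespace-only

def count_comment_lines(text):
    return len([line for line in text.splitlines() if line.strip().startswith('#')])  # comment lines start with '#'

def count_punctuation(text):
    punctuation = re.findall(r'[.,;:?!-]', text)
    return len(punctuation)  # number of punctuation marks among . , ; : ? ! -

def count_specific_word(text, word):
    words = re.findall(r'\b{}\b'.format(re.escape(word)), text)
    return len(words)  # occurrences of the given word (whole-word, case-sensitive)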

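def main():
    if len(sys.argv) < 3 or len(sys.argv) > 4:  # check that the number of command-line arguments is correct
        print("Silly you, the correct usage is: python v0.3.py -c, -w, -s, -e, -m, -a or -p <txt filename>")
        print("Options: -c (characters), -w (words), -s (sentences), -e (lines), -m (empty lines), -a (comment lines), -p (punctuation marks)")
        print("To count occurrences of a specific word: python v0.3.py -x <txt filename> <word>")
        sys.exit(1)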

    option = sys.argv[1]
    filename = sys.argv[2]
    specific_word = sys.argv[3] if len(sys.argv) == 4 else None  # the specific word, if one was given

    try:
        with open(filename, 'r', encoding='utf-8') as file:
            text = file.read()
    except FileNotFoundError:  # the file does not exist
        print("Error!!! This file doesn't even exist!!!")
        sys.exit(1)

    if option == "-c":
        num_characters = count_characters(text)
        print(f"Characters: {num_characters}")
    elif option == "-w":
        num_words = count_words(text)
        print(f"Words: {num_words}")
    elif option == "-s":
        num_sentences = count_sentences(text)
        print(f"Sentences: {num_sentences}")
    elif option == "-e":
        num_lines = count_lines(text)
        print(f"Lines: {num_lines}")
    elif option == "-m":
        num_empty_lines = count_empty_lines(text)
        print(f"Empty lines: {num_empty_lines}")
    elif option == "-a":
        num_comment_lines = count_comment_lines(text)
        print(f"Comment lines: {num_comment_lines}")
    elif option == "-p":
        num_punctuation = count_punctuation(text)
        print(f"Punctuation marks: {num_punctuation}")
    elif option == "-x":
        if specific_word is None:
            print("Come on, please tell me which word you want to count")
            sys.exit(1)
        num_specific_word = count_specific_word(text, specific_word)
        print(f"Occurrences of '{specific_word}': {num_specific_word}")
    else:
        print("Invalid option, please use -c, -w, -s, -e, -m, -a, -p or -x")
        sys.exit(1)

if __name__ == "__main__":
    main()
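Line counting has a similar edge case when the text is split on `'\n'`: a file that ends with a newline produces one extra empty element, which inflates both the line count and the empty-line count, and an empty file is reported as one line. `str.splitlines()` avoids both. The helper names below are hypothetical, used only for the comparison.

```python
def count_lines_split(text):
    # split('\n') yields an extra empty element when text ends with '\n'
    return len(text.split('\n'))

def count_lines_splitlines(text):
    # splitlines() does not produce that trailing empty element
    return len(text.splitlines())

def count_empty_lines_splitlines(text):
    # a line counts as empty if it contains only whitespace
    return len([line for line in text.splitlines() if not line.strip()])

sample = "import sys\n\n# comment\nprint(1)\n"
print(count_lines_split(sample))             # 5
print(count_lines_splitlines(sample))        # 4
print(count_empty_lines_splitlines(sample))  # 1
```

`splitlines()` also breaks on other boundaries such as `\r`, `\v` and `\f`; for files read in text mode, `\r\n` has already been translated to `\n`.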

2.2 Program Description

2.2.1 Functions

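(1) count_characters(text):

Purpose: counts and returns the number of characters in the text, including spaces and newlines.
Parameter: text (str): the input text.
Returns: int, the number of characters.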
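Because `count_characters` is just `len(text)`, spaces and newlines are counted as characters. If a "visible characters" figure were wanted, a variant could skip whitespace; `count_visible_characters` below is a hypothetical addition, not one of the project's options.

```python
def count_characters(text):
    # len() counts every character, including spaces and newlines
    return len(text)

def count_visible_characters(text):
    # hypothetical variant: skip whitespace (spaces, tabs, newlines)
    return sum(1 for ch in text if not ch.isspace())

sample = "Hi there.\n"
print(count_characters(sample))          # 10
print(count_visible_characters(sample))  # 8
```

When the file is opened in text mode with the default newline handling, Windows `\r\n` line endings are read as a single `\n`, so each line break counts as one character.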

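(2) count_specific_word(text, word):

Purpose: counts and returns the number of occurrences of a specific word in the text. Matching is case-sensitive and whole-word only (a word that merely contains the target is not counted).
Parameters:
- text (str): the input text.
- word (str): the word to count.
Returns: int, the number of occurrences of the word.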
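The `\b` anchors are what make the match whole-word: `the` matches in "saw the" but not inside "other", and case sensitivity keeps "There" from matching. The sketch below adds an `ignore_case` parameter, which is a hypothetical extension and not part of v0.3, to show both behaviors.

```python
import re

def count_specific_word(text, word, ignore_case=False):
    # \b anchors restrict matches to whole words; re.escape treats word literally
    pattern = r'\b{}\b'.format(re.escape(word))
    flags = re.IGNORECASE if ignore_case else 0  # ignore_case is a hypothetical option
    return len(re.findall(pattern, text, flags))

text = "The cat saw the other cat. There was the end."
print(count_specific_word(text, "the"))                    # 2
print(count_specific_word(text, "the", ignore_case=True))  # 3
print(count_specific_word(text, "cat"))                    # 2
```

Because `\b` sits between a word character and a non-word character, a search term that begins or ends with punctuation, such as `C++`, may not match as expected.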

The other count functions work similarly and are not described individually here.
(3) main():

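Purpose: the entry point; processes the command-line options (-c, -w, -s, -e, etc.) and calls the corresponding count function.
Parameters: none directly; the arguments are read from the command line (sys.argv).
Returns: nothing; prints the statistic or an error message.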
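`main()` maps each option to its count function through a chain of `if`/`elif` branches. A dictionary-based dispatch table is one common alternative: adding an option becomes one new entry, and the usage message could be generated from the keys. The `OPTIONS` table and `dispatch` helper below are a hypothetical sketch that covers only three options.

```python
import re

# hypothetical option table: each flag maps to (label, counting function)
OPTIONS = {
    "-c": ("Characters", len),
    "-w": ("Words", lambda text: len(text.split())),
    "-s": ("Sentences", lambda text: len([s for s in re.split(r'[.!?]', text) if s.strip()])),
}

def dispatch(option, text):
    """Return the report line for option, or None if the option is unknown."""
    if option not in OPTIONS:
        return None
    label, func = OPTIONS[option]
    return f"{label}: {func(text)}"

print(dispatch("-w", "one two three"))  # Words: 3
print(dispatch("-z", "one two three"))  # None
```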

2.2.2 Options

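-c: count characters.
-w: count words.
-s: count sentences.
-e: count lines (all lines, including empty and comment lines).
-m: count empty lines.
-a: count comment lines.
-p: count punctuation marks.
-x: count occurrences of a specific word.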
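These options could also be parsed with the standard-library `argparse` module, which generates `-h` help text and usage errors automatically. The sketch below is a hypothetical alternative, not the project's interface; note that it takes the word directly after `-x` (`-x WORD FILE`) instead of after the filename.

```python
import argparse

def build_parser():
    # hypothetical argparse version of the v0.3 command line; flag letters match the list above
    parser = argparse.ArgumentParser(description="Count statistics in a text file")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-c", action="store_true", help="count characters")
    group.add_argument("-w", action="store_true", help="count words")
    group.add_argument("-s", action="store_true", help="count sentences")
    group.add_argument("-e", action="store_true", help="count lines")
    group.add_argument("-m", action="store_true", help="count empty lines")
    group.add_argument("-a", action="store_true", help="count comment lines")
    group.add_argument("-p", action="store_true", help="count punctuation marks")
    group.add_argument("-x", metavar="WORD", help="count occurrences of WORD")
    parser.add_argument("filename", help="path to the .txt file")
    return parser

args = build_parser().parse_args(["-x", "the", "sample.txt"])
print(args.x, args.filename)  # the sample.txt
```

The required, mutually exclusive group enforces that exactly one statistic is chosen. On invalid input, `parse_args` prints a usage message and exits with status 2.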

III. Sample Runs and Results

3.1 Preparation

Before running, prepare a plain-English .txt file that contains characters, several sentences and some comment lines.
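A test file can also be generated from Python so that runs are reproducible. The content and the `sample.txt` name below are hypothetical; they simply include sentences, an empty line and `#` comment lines so that every option has something to count.

```python
from pathlib import Path

# hypothetical sample content: sentences, punctuation, an empty line and '#' comment lines
SAMPLE = (
    "# This is a comment line.\n"
    "Hello world. How are you?\n"
    "\n"
    "I am fine, thanks!\n"
    "# Another comment.\n"
)

def write_sample(path="sample.txt"):
    # write the sample as UTF-8 and return the path for convenience
    Path(path).write_text(SAMPLE, encoding="utf-8")
    return path

write_sample()
```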

3.2 Execution Flow

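Check the command-line arguments:
- The program first checks that the number of command-line arguments is correct (3 or 4, counting the script name).
- If the number is wrong, it prints a usage message and exits.
Read the file:
- Using the filename supplied by the user, it tries to open and read the file.
- If the file does not exist, it prints an error message and exits.
Run the statistic for the chosen option:
- According to the option (such as -c, -w or -s), it calls the corresponding count function.
- Each count function processes the file contents and returns the result.
Output the result:
- The result is printed to the console.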
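The four steps above can be sketched as a function that returns an exit code instead of calling `sys.exit()`, which makes the flow easier to test. `run` is a hypothetical name, and only `-c` and `-w` are handled to keep the sketch short.

```python
def run(argv):
    """Hypothetical testable version of main(): returns an exit code instead of exiting."""
    # Step 1: check the argument count (script name + option + filename [+ word])
    if len(argv) not in (3, 4):
        print("Usage: python v0.3.py OPTION FILE [WORD]")
        return 1
    option, filename = argv[1], argv[2]
    # Step 2: read the file, reporting a missing file instead of crashing
    try:
        with open(filename, "r", encoding="utf-8") as file:
            text = file.read()
    except FileNotFoundError:
        print(f"Error: file '{filename}' does not exist")
        return 1
    # Step 3: compute the statistic for the chosen option (two options only)
    if option == "-c":
        result = f"Characters: {len(text)}"
    elif option == "-w":
        result = f"Words: {len(text.split())}"
    else:
        print(f"Unknown option: {option}")
        return 1
    # Step 4: print the result to the console
    print(result)
    return 0
```

In a script, the entry point could then be `sys.exit(run(sys.argv))`.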

3.3 Results

3.3.1 Output of v0.2.py

3.3.2 Output of v0.3.py

IV. Summary

4.1 What I Learned

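Learned string-processing methods and how to apply regular expressions.
Learned how to handle command-line arguments.
Improved my ability to organize and debug code.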

4.2 Room for Improvement

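More statistics could be added, such as counting words by type.
Performance could be optimized, especially when processing large files.
More error handling could be added to make the program more robust.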
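For the performance point, one option is to read the file line by line instead of loading it with a single `read()`, so memory use stays roughly constant for large files. The sketch below (hypothetical `stream_counts`) gathers the per-line statistics in one pass; sentence counting is left out because sentences can span line breaks.

```python
def stream_counts(path):
    """Count characters, lines, empty lines, comment lines and words, one line at a time."""
    counts = {"chars": 0, "lines": 0, "empty": 0, "comments": 0, "words": 0}
    with open(path, "r", encoding="utf-8") as file:
        for line in file:  # the file object yields lines lazily, so memory stays small
            stripped = line.strip()
            counts["chars"] += len(line)  # includes the line's trailing newline
            counts["lines"] += 1
            if not stripped:
                counts["empty"] += 1
            elif stripped.startswith("#"):
                counts["comments"] += 1
            counts["words"] += len(line.split())
    return counts
```

The same loop is a natural place for extra error handling, such as catching `UnicodeDecodeError` for files that are not valid UTF-8.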

This assignment gave me a deeper understanding of the Python language, and I enjoyed working on it. It was a great experience :)