Personal Project ---- Word Frequency Count (Completing the Features)

Posted by YangXiaomoo on 2016-10-09

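Estimated and actual time spent on each feature (and/or sub-feature)

| Feature | Estimated time (min) | Actual time (min) |
|---|---|---|
| File storage, word splitting, word counting | 60 | 82 |
| Sorting by word frequency | 20 | 27 |
| Listing the books in a directory | 15 | 26 |
| Main-function design | 50 | 74 |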

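Word frequency count: PSP

| Date | Type | Task | Start | End | Interruptions (min) | Planned (min) | Actual (min) |
|---|---|---|---|---|---|---|---|
| 2016.10.07 | Requirements analysis | Read the spec and analyze what each feature requires | 14:59 | 15:38 | 3 | 30 | 36 |
| 2016.10.07 | Coding and learning | Design file storage, word splitting and word counting; read a classmate's code | 15:44 | 17:11 | 5 | 60 | 82 |
| 2016.10.07 | Coding and learning | Sorting by frequency, listing the books in a directory, main-function design | 19:00 | 21:26 | 19 | 85 | 127 |
| 2016.10.08 | Coding and learning | Learn about redirection | 15:01 | 15:39 | 2 | 30 | 36 |
| 2016.10.08 | Code review | Write the blog post, debug the program output | 15:45 | 17:12 | 6 | 30 | 81 |
| 2016.10.08 | Code review | Write the blog post, debug the program output | 17:53 | 18:26 | 3 | 30 | 30 |
| 2016.10.09 | PSP summary | Total the time spent, write up lessons learned, publish the blog post | 9:48 | 10:57 | 7 | 30 | 62 |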

Comparison and analysis

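I had put this assignment off for a long time and finally managed to partly catch up during this holiday. I had collected and studied some material earlier but could not figure out where to start. Only after reading a classmate's code did I understand the overall approach. The gap between my estimates and the actual times has three main causes.

After reading the material, the program flow seemed clear. Actually writing it, however, raised many small problems. For example, running the program from the DOS prompt produced errors such as "wordcounta.java:13: error: cannot find symbol".

While writing the program, I made errors with types, with malformed class declarations, and with incorrect method calls. Fixing these took a lot of time.

Studying the code raised many questions. For example: what argument split() needs when splitting a line into words, how BufferedReader and FileReader are used, and how to store a Map's key/value pairs in an ArrayList.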
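The two points that caused the most confusion can be shown in a few lines. First, split() takes a regular expression, so "[^a-zA-Z]+" splits a line at every run of non-letter characters. Second, new ArrayList<>(map.entrySet()) copies a Map's key/value pairs into a list that can then be sorted. The sketch below is only an illustration. The class SplitDemo and its methods words and entries are hypothetical names, not part of the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SplitDemo {
    // split() takes a regex: "[^a-zA-Z]+" means "one or more non-letter characters"
    static List<String> words(String line) {
        List<String> out = new ArrayList<>();
        for (String w : line.split("[^a-zA-Z]+")) {
            if (!w.isEmpty()) {          // a leading separator yields one empty string
                out.add(w);
            }
        }
        return out;
    }

    // copy the Map's key/value pairs into an ArrayList so they can be sorted later
    static List<Map.Entry<String, Integer>> entries(Map<String, Integer> m) {
        return new ArrayList<>(m.entrySet());
    }

    public static void main(String[] args) {
        System.out.println(words("  Hello, world! It's 2016."));  // [Hello, world, It, s]
        Map<String, Integer> m = new TreeMap<>();
        m.put("b", 2);
        m.put("a", 1);
        System.out.println(entries(m));                           // [a=1, b=2]
    }
}
```

Note that with this pattern "It's" becomes the two words "It" and "s", and digits are dropped. That is the same behaviour as the split("[^a-zA-Z]") call used in the project.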

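Requirements analysis

The assignment requires four features.

First, count the words in a small file supplied by the user, printing the total number of words and the frequency of each word. The same routine can then be reused wherever the other requirements need this functionality.

Second, the user can type the name of a file, and that file is counted. The output is again the total number of words and the frequency of each word.

Third, the user enters a directory. All .txt files in that directory are listed, and then each file is counted. The output shows the total number of words and the number of distinct words. When there are many books, printing the frequency of every word would be extremely long and hard to use. So only the ten most frequent words of each document are shown.

Fourth, the user supplies the input through redirection, and the redirected file is counted. The output is the total number of words and the frequency of each word.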
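The third requirement only needs the ten most frequent words per document. The project sorts the whole list and prints the first ten. A common alternative keeps a min-heap of size k, so at most k entries are held while scanning. The sketch below shows that technique. The class TopK, the method topK, and the tie-breaking rule (alphabetical order among equal counts) are assumptions made for illustration, not part of the original design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopK {
    // Returns the k most frequent entries, highest count first, ties in alphabetical order.
    // A min-heap of size k holds the best k entries seen so far.
    static List<Map.Entry<String, Integer>> topK(Map<String, Integer> freq, int k) {
        // heap head = "worst" entry: lowest count, or the alphabetically later word on a tie
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<Map.Entry<String, Integer>>(
            (a, b) -> a.getValue().equals(b.getValue())
                ? b.getKey().compareTo(a.getKey())
                : Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll();            // evict the current worst entry
            }
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        // heap iteration order is unspecified, so sort the survivors for printing
        result.sort((a, b) -> a.getValue().equals(b.getValue())
            ? a.getKey().compareTo(b.getKey())
            : Integer.compare(b.getValue(), a.getValue()));
        return result;
    }
}
```

For a single book the saving over a full sort is small. The heap mainly pays off when the vocabulary is large and k is tiny.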

Implementation

The wordcount class implements the core word-counting logic. It has three methods:

- public Map<String, Integer> map(File dir)
- public ArrayList<Map.Entry<String,Integer>> SortMap(Map<String,Integer> oldmap)
- public File[] Outputlist(Scanner sc)

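public Map<String, Integer> map(File dir): reads the given file line by line. It splits each line at every non-letter character, which drops spaces, punctuation and digits, and stores the words in the list ls. It then counts the words in ls into the Map wc. In Map<String,Integer>, the String key is the word and the Integer value is the number of times it occurs. Counting is case-sensitive, so "The" and "the" are separate entries.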

```java
// Method of class wordcount (the file needs: import java.io.*; import java.util.*;)
public Map<String, Integer> map(File dir) throws IOException {
    List<String> ls = new ArrayList<String>();
    // try-with-resources closes the reader even if reading throws
    try (BufferedReader reader = new BufferedReader(new FileReader(dir))) {
        String readLine;
        while ((readLine = reader.readLine()) != null) {
            String[] wordsArr1 = readLine.split("[^a-zA-Z]+");  // split at runs of non-letters
            for (String word : wordsArr1) {
                if (word.length() != 0) {   // skip the empty string produced by a leading separator
                    ls.add(word);           // store each word in the list
                }
            }
        }
    }

    // count word frequencies; TreeMap keeps the words in sorted order
    Map<String, Integer> wc = new TreeMap<String, Integer>();
    for (String li : ls) {
        Integer count = wc.get(li);         // current count, or null if the word is new
        wc.put(li, count == null ? 1 : count + 1);
    }
    return wc;
}
```
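Since Java 8, the counting loop can be written more compactly with Map.merge, and the intermediate list ls is not needed. The sketch below also lowercases every word so that "The" and "the" share one count. That is an assumption, since the requirements do not say whether counting should ignore case. It reads from any Reader, so it can be tried on a StringReader. The class WordFreq and method count are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class WordFreq {
    // Counts words from any Reader (a FileReader in the project, a StringReader here).
    // Assumption: case-insensitive counting, so "The" and "the" share one entry.
    static Map<String, Integer> count(Reader source) throws IOException {
        Map<String, Integer> wc = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.split("[^a-zA-Z]+")) {
                    if (!word.isEmpty()) {
                        // merge: store 1 if the word is new, otherwise add 1 to the old count
                        wc.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
                    }
                }
            }
        }
        return wc;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(count(new StringReader("The cat saw the dog.\nthe end")));
        // {cat=1, dog=1, end=1, saw=1, the=3}
    }
}
```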

public ArrayList<Map.Entry<String,Integer>> SortMap(Map<String,Integer> oldmap): copies the Map's key/value pairs into an ArrayList and sorts them by their Integer value (the count) in descending order.

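```java
// Method of class wordcount (the file needs: import java.util.*;)
public ArrayList<Map.Entry<String, Integer>> SortMap(Map<String, Integer> oldmap) {
    // copy the key/value pairs into a list so they can be sorted
    ArrayList<Map.Entry<String, Integer>> list = new ArrayList<Map.Entry<String, Integer>>(oldmap.entrySet());

    // sort by count, descending; Collections.sort is stable, so words with equal
    // counts keep the sorted key order they had in the TreeMap
    Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
        @Override
        public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
            return o2.getValue().compareTo(o1.getValue());
        }
    });
    return list;
}
```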
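SortMap compares counts only. Ties come out in a predictable order here only because the input is a TreeMap and the sort is stable. If the map were a HashMap, words with equal counts would appear in arbitrary order. The sketch below adds an explicit secondary key using the Map.Entry.comparingByValue and comparingByKey comparators from the JDK. The class name SortByCount and the constant BY_COUNT_DESC_THEN_WORD are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class SortByCount {
    // Highest count first; words with equal counts in ascending (lexicographic) order.
    static final Comparator<Map.Entry<String, Integer>> BY_COUNT_DESC_THEN_WORD =
        Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                 .thenComparing(Map.Entry.<String, Integer>comparingByKey());

    static ArrayList<Map.Entry<String, Integer>> sortMap(Map<String, Integer> oldmap) {
        ArrayList<Map.Entry<String, Integer>> list = new ArrayList<>(oldmap.entrySet());
        list.sort(BY_COUNT_DESC_THEN_WORD);
        return list;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();   // a HashMap has no inherent order
        m.put("pear", 2);
        m.put("apple", 2);
        m.put("fig", 5);
        System.out.println(sortMap(m));             // [fig=5, apple=2, pear=2]
    }
}
```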

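public File[] Outputlist(Scanner sc): reads a directory path from the Scanner and finds the .txt documents in it. It prints their names and returns them as a File array, so that each document can then be counted.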

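```java
// Method of class wordcount (the file needs: import java.io.*; import java.util.*;)
public File[] Outputlist(Scanner sc) throws IOException {
    File file = new File(sc.nextLine());
    // keep only regular files whose name ends in .txt, as the requirement asks
    File[] tempList = file.listFiles(f -> f.isFile() && f.getName().toLowerCase().endsWith(".txt"));
    if (tempList == null) {   // listFiles returns null when the path is not a directory
        throw new IOException("Not a directory: " + file);
    }
    System.out.println("Books in this directory:");
    for (File f : tempList) {
        System.out.println(f.getName());
    }
    return tempList;
}
```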
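The java.nio.file API offers another way to list only the .txt files. Files.newDirectoryStream accepts a glob such as "*.txt" and throws NotDirectoryException for a path that is not a directory, instead of returning null. The sketch below uses it. The class TxtLister and method listTxt are hypothetical. Whether the glob match is case-sensitive depends on the platform.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TxtLister {
    // Returns the regular *.txt files directly inside dir, sorted by name.
    static List<Path> listTxt(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {   // skip a sub-directory that happens to be named x.txt
                    result.add(p);
                }
            }
        }
        Collections.sort(result);               // directory order is unspecified
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (Path p : listTxt(Paths.get(args.length > 0 ? args[0] : "."))) {
            System.out.println(p.getFileName());
        }
    }
}
```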

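The four classes wordcounta, wordcountb, wordcountc and wordcountd each implement one of the four requirements. wordcounta handles the first (a file path typed by the user), wordcountc the second (a book name looked up in a fixed directory), wordcountb the third (every file in a directory, top ten words each), and wordcountd the fourth (input redirection).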
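All four classes repeat the same steps: add up the counts to get the total number of words, then print "word: count" lines. One way to remove that duplication is a small shared helper. The sketch below is hypothetical (the class Report and the methods total and lines are not in the project). Here, limit 10 would serve wordcountb, and a negative limit prints everything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class Report {
    // Total number of words = sum of all frequencies.
    static int total(List<Map.Entry<String, Integer>> list) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : list) {
            sum += e.getValue();
        }
        return sum;
    }

    // Formats at most `limit` entries as "word: count" lines; a negative limit means all.
    static List<String> lines(List<Map.Entry<String, Integer>> list, int limit) {
        int n = limit < 0 ? list.size() : Math.min(limit, list.size());
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(list.get(i).getKey() + ": " + list.get(i).getValue());
        }
        return out;
    }
}
```

A main method would then call total(list) once and print each string from lines(list, ...).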

wordcounta:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;
import java.util.Scanner;

public class wordcounta {
    public static void main(String[] args) throws IOException {
        Scanner input = new Scanner(System.in);
        wordcount yl = new wordcount();
        File file = new File(input.nextLine());                        // path of the file to count
        input.close();
        Map<String, Integer> wc = yl.map(file);                          // word -> frequency
        ArrayList<Map.Entry<String, Integer>> list = yl.SortMap(wc);     // sorted by frequency, descending

        int j = 0;  // total number of words
        for (Map.Entry<String, Integer> e : list) {
            j += e.getValue();
        }
        System.out.println("Total number of words: " + j);
        for (Map.Entry<String, Integer> e : list) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Run results:

wordcountb:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;
import java.util.Scanner;

public class wordcountb {
    public static void main(String[] args) throws IOException {
        wordcount yl = new wordcount();
        Scanner inputxt = new Scanner(System.in);
        File[] tempList = yl.Outputlist(inputxt);     // files in the directory typed by the user
        inputxt.close();
        for (File f : tempList) {                      // count every file in the directory
            if (!f.isFile()) {
                continue;                              // FileReader cannot open a sub-directory
            }
            System.out.println(f.getName());
            Map<String, Integer> wc = yl.map(f);                           // word frequencies
            ArrayList<Map.Entry<String, Integer>> list = yl.SortMap(wc);   // sorted by frequency
            int j = 0;                                                     // total number of words
            for (Map.Entry<String, Integer> e : list) {
                j += e.getValue();
            }
            System.out.println("Total number of words: " + j + "  Distinct words: " + list.size());
            int top = Math.min(10, list.size());       // show at most the ten most frequent words
            for (int m = 0; m < top; m++) {
                System.out.println(list.get(m).getKey() + ": " + list.get(m).getValue());
            }
            if (list.size() < 10) {
                System.out.println("This document has fewer than ten distinct words.");
            }
        }
    }
}
```

Run results:

wordcountc:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;
import java.util.Scanner;

public class wordcountc {
    public static void main(String[] args) throws IOException {
        Scanner input = new Scanner(System.in);
        String path = "D:\\小說\\";                // fixed directory where the books are stored
        path += input.nextLine().trim();            // book name without ".txt"; nextLine allows spaces
        input.close();
        File file = new File(path + ".txt");
        wordcount yl = new wordcount();
        Map<String, Integer> wc = yl.map(file);                          // word -> frequency
        ArrayList<Map.Entry<String, Integer>> list = yl.SortMap(wc);     // sorted by frequency

        int j = 0;  // total number of words
        for (Map.Entry<String, Integer> e : list) {
            j += e.getValue();
        }
        System.out.println("Total number of words: " + j);
        for (Map.Entry<String, Integer> e : list) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Results:
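wordcountc builds the path by string concatenation. java.nio.file.Paths.get joins path components with the platform's separator, and Files.isRegularFile lets the program report a missing book instead of failing with a FileNotFoundException. The sketch below shows that. The class BookPath and method bookPath are hypothetical, and baseDir stands in for the author's D:\小說 folder.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BookPath {
    // Builds <baseDir>/<name>.txt; baseDir stands in for the fixed book folder.
    static Path bookPath(String baseDir, String name) {
        return Paths.get(baseDir, name.trim() + ".txt");
    }

    public static void main(String[] args) {
        Path p = bookPath(args.length > 0 ? args[0] : ".", args.length > 1 ? args[1] : "book");
        if (Files.isRegularFile(p)) {
            System.out.println("Found " + p);
        } else {
            System.out.println("No such book: " + p);  // clearer than a FileNotFoundException
        }
    }
}
```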

wordcountd:

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;
import java.util.Scanner;

public class wordcountd {
    public static void main(String[] args) throws IOException {
        // the redirected input is first copied into this file, which is then counted
        File file = new File("D:\\小說\\new.txt");
        if (args.length == 0) {
            // with "java wordcountd < book.txt", System.in reads the redirected file
            try (Scanner in = new Scanner(System.in);
                 FileWriter out = new FileWriter(file)) {
                while (in.hasNextLine()) {       // hasNextLine (not hasNext) also keeps blank lines
                    out.write(in.nextLine() + System.lineSeparator());
                }
            }
        }
        wordcount yl = new wordcount();
        Map<String, Integer> wc = yl.map(file);                          // word -> frequency
        ArrayList<Map.Entry<String, Integer>> list = yl.SortMap(wc);     // sorted by frequency

        int j = 0;  // total number of words
        for (Map.Entry<String, Integer> e : list) {
            j += e.getValue();
        }
        System.out.println("Total number of words: " + j);
        for (Map.Entry<String, Integer> e : list) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Results:
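wordcountd copies the redirected input into D:\小說\new.txt and then reads that file again. With "java wordcountd < book.txt", System.in already delivers the file's contents, so the words could be counted straight from the stream without a temporary file. The sketch below does that. The class StdinCount is hypothetical, and it assumes the input is UTF-8 (plain ASCII English text also works). It counts case-sensitively like map() but prints in alphabetical order. Passing its result to SortMap would give the frequency order used elsewhere.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class StdinCount {
    // Counts words from a stream; with "java StdinCount < book.txt" the stream is the file.
    // Assumption: the input is UTF-8 encoded.
    static Map<String, Integer> count(InputStream in) throws IOException {
        Map<String, Integer> wc = new TreeMap<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            for (String word : line.split("[^a-zA-Z]+")) {
                if (!word.isEmpty()) {
                    wc.merge(word, 1, Integer::sum);    // case-sensitive, like map()
                }
            }
        }
        return wc;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> wc = count(System.in);     // no temporary file needed
        int total = 0;
        for (int n : wc.values()) {
            total += n;
        }
        System.out.println("Total number of words: " + total);
        wc.forEach((w, n) -> System.out.println(w + ": " + n));
    }
}
```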

HTTP: https://git.coding.net/YangXiaomoo/wordCountNO.1.git

SSH: git@git.coding.net:YangXiaomoo/wordCountNO.1.git

GIT: git://git.coding.net/YangXiaomoo/wordCountNO.1.git
