Word Frequency Count: Web Version

Posted by blogli on 2016-10-19

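Requirement: migrate the program to the web platform and accept input by having the user upload a TXT file. Keeping and maintaining the console version is recommended (but not required), because it makes testing easier.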

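I put a file upload control on the page and receive the upload in a servlet. What the servlet gets is a byte stream, which is converted to characters and then counted by the existing code.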
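The byte-to-character step can be sketched on its own. This is a minimal illustration, not the code from the project: the class name `StreamDecodeSketch` and the helper `readAll` are hypothetical, and UTF-8 is an assumption based on the charset declared in the JSP pages. It collects all bytes first and decodes once, so a multi-byte character cannot be split by a buffer boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class StreamDecodeSketch {

    // Reads the whole stream into memory, then decodes it in a single step
    // with an explicit charset instead of the platform default.
    static String readAll(InputStream input, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int len;
        while ((len = input.read(buf)) != -1) {
            bytes.write(buf, 0, len);
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Simulates an uploaded file; assumes the upload is UTF-8 text.
        byte[] uploaded = "caf\u00e9 latte".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(uploaded), StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

Reading everything into memory is only reasonable for small uploads; for large files a `Reader` that decodes incrementally is the safer choice.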

The code for the JSP upload page is as follows:

<%@ page language="java" contentType="text/html; charset=utf-8"
    pageEncoding="utf-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Insert title here</title>
</head>
<body>
 <table>
     <tr>
         <td>
             <form action="server/CountWordServlet" method="post" enctype="multipart/form-data">
             Please upload the file to count: <input type="file" name="sourceFile"/>
                     <input type="submit" value="Upload">
             </form>
         </td>
     </tr>
 </table>
</body>
</html>

The page that displays the results is as follows:

<%@page import="com.server.servlet.Word"%>
<%@page import="java.util.ArrayList"%>
<%@ page language="java" contentType="text/html; charset=utf-8"
    pageEncoding="utf-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<%ArrayList<Word> list=(ArrayList<Word>)request.getAttribute("list"); %>
<title>Insert title here</title>
</head>
<body>
 <table>
         
             <%
             if(list!=null&&list.size()!=0){
                 %>
                 <tr> <td>Word</td><td>Count</td> </tr>
                 <% 
                 for(int i=0;i<list.size();i++){
                      String word=((Word)list.get(i)).getWord();
                      int num=((Word)list.get(i)).getNum();
                      %><tr>
                          <td><%=word%></td>
                          <td><%=num%></td>
                      </tr> 
                      <%  
                  }
             }else{  %>
                 <tr><td colspan="2">This file contains no words, or the file does not exist.</td></tr>
         <%     }
          %>
          
 </table>
</body>
</html>

The servlet code is as follows:

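import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Assumes Apache Commons FileUpload 1.x package names.
import org.apache.commons.fileupload.FileItemIterator;
import org.apache.commons.fileupload.FileItemStream;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

public class CountWordServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        request.setCharacterEncoding("utf-8");
        response.setContentType("text/html;charset=utf-8");
        try {
            ServletFileUpload upload = new ServletFileUpload(new DiskFileItemFactory());
            FileItemIterator iterator = upload.getItemIterator(request);
            while (iterator.hasNext()) {
                FileItemStream item = iterator.next();
                if (item.isFormField()) {
                    continue; // skip ordinary form fields; only the uploaded file is counted
                }
                // Close the upload stream once the counting is done.
                try (InputStream input = item.openStream()) {
                    WordCountFreq wcf = new WordCountFreq();
                    ArrayList<Word> list = (ArrayList<Word>) wcf.sortAndOutput(input);
                    request.setAttribute("list", list);
                }
            }
            System.out.println("Upload processed successfully.");
        } catch (FileUploadException e) {
            // The "list" attribute stays unset, so show.jsp displays its fallback message.
            e.printStackTrace();
        }
        request.getRequestDispatcher("/show.jsp").forward(request, response);
    }
}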

The key method of the counting process, sortAndOutput(), is shown below:

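// Needs java.io.BufferedReader, java.io.InputStreamReader, java.io.Reader
// and java.nio.charset.StandardCharsets.
public List<Word> sortAndOutput(InputStream input) throws IOException {
    // Decode through a Reader so that a multi-byte character split across two
    // reads is handled correctly. UTF-8 is assumed, matching the JSP pages.
    Reader reader = new BufferedReader(new InputStreamReader(input, StandardCharsets.UTF_8));
    char[] buf = new char[1024];
    int len;
    String lastWord = ""; // trailing partial word carried over to the next chunk
    while ((len = reader.read(buf)) != -1) {
        // Prepend the partial word left over from the previous chunk.
        String temp = lastWord + new String(buf, 0, len);
        // Find where the trailing run of letters starts. The bounds check keeps
        // the scan from running past index 0 when the whole chunk is letters.
        int cut = temp.length();
        while (cut > 0 && Character.isLetter(temp.charAt(cut - 1))) {
            cut--;
        }
        // The trailing letters may be an unfinished word: hold them back.
        lastWord = temp.substring(cut);
        temp = temp.substring(0, cut);
        root = generateCharTree(temp);
    }
    // Note: a word still held in lastWord here has not been counted yet.
    // ...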
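The carry-over idea in the loop above (hold back a word that may be cut off at the end of a chunk and prepend it to the next chunk) can be tried in isolation. The sketch below is not the project's implementation: it swaps the character tree built by `generateCharTree` for a plain `HashMap`, and the names `ChunkedWordCount`, `count` and `addWords` are hypothetical. It also assumes UTF-8 input and lowercases every word, which the original may not do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class, for illustration only.
class ChunkedWordCount {

    // Counts words chunk by chunk and returns entries sorted by count (descending),
    // ties broken alphabetically.
    static List<Map.Entry<String, Integer>> count(InputStream input) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        Reader reader = new InputStreamReader(input, StandardCharsets.UTF_8);
        char[] buf = new char[1024];
        String carry = ""; // possibly unfinished word from the previous chunk
        int len;
        while ((len = reader.read(buf)) != -1) {
            String chunk = carry + new String(buf, 0, len);
            int cut = chunk.length();
            while (cut > 0 && Character.isLetter(chunk.charAt(cut - 1))) {
                cut--;
            }
            carry = chunk.substring(cut);            // hold back trailing letters
            addWords(chunk.substring(0, cut), counts);
        }
        addWords(carry, counts);                     // flush the final word, if any

        List<Map.Entry<String, Integer>> result = new ArrayList<>(counts.entrySet());
        result.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return result;
    }

    // Splits text on non-letter characters and adds each word to the map.
    private static void addWords(String text, Map<String, Integer> counts) {
        StringBuilder word = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            if (i < text.length() && Character.isLetter(text.charAt(i))) {
                word.append(Character.toLowerCase(text.charAt(i)));
            } else if (word.length() > 0) {
                counts.merge(word.toString(), 1, Integer::sum);
                word.setLength(0);
            }
        }
    }
}
```

Calling `ChunkedWordCount.count(...)` on a stream containing `the cat and the hat` puts `the` first with a count of 2. The final `addWords(carry, counts)` call matters: without it, a file that ends in a letter would lose its last word.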

An example run is shown below: [screenshot]

 

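Before the web version, the program simply took a file path and processed that file. After moving to the web, the one small difficulty was converting the byte stream into characters for processing; a bit of searching solved that quickly.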
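One way to keep a console version alive next to the web version is to let both entry points feed the same stream-based method: the servlet passes the upload stream, the console version opens the file at the given path. The sketch below only illustrates that shape. `ConsoleEntrySketch`, `countWords` and `countFile` are hypothetical names, and the counting core is a deliberately tiny stand-in (it counts runs of ASCII letters) rather than the project's `sortAndOutput`.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class, for illustration only.
class ConsoleEntrySketch {

    // Stand-in for the shared core: counts runs of ASCII letters in a stream.
    // A servlet could call this with the upload stream directly.
    static int countWords(InputStream input) throws IOException {
        int words = 0;
        boolean inWord = false;
        int b;
        while ((b = input.read()) != -1) {
            boolean letter = (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z');
            if (letter && !inWord) {
                words++; // a new word starts here
            }
            inWord = letter;
        }
        return words;
    }

    // Console-side wrapper: turns a file path into a stream and reuses the core.
    static int countFile(Path path) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
            return countWords(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ConsoleEntrySketch <file.txt>");
            return;
        }
        System.out.println(countFile(Paths.get(args[0])));
    }
}
```

With this split, the counting logic can be tested from the command line or from a unit test without deploying the web application.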

 

ssh:git@git.coding.net:muziliquan/GUIVersion.git

git:git://git.coding.net/muziliquan/GUIVersion.git
