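Using HttpClient with HtmlParser to log in to a system automatically and extract page information
The HtmlParser API has changed quite a bit between versions, so I wrote an example against the latest one. Without further ado, here is the code for everyone to use.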
/*
 * Main.java
 *
 * Created on January 19, 2007, 9:14 AM
 *
 * To change this template, choose Tools | Template Manager
 * and open the template in the editor.
 */
package wapproxy;

import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;
import org.apache.commons.httpclient.params.HttpMethodParams;
import java.io.*;
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.tags.*;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

/**
 * Logs in with HttpClient, fetches a page with the same client,
 * and extracts one table cell from it with HtmlParser.
 *
 * @author xcz
 */
public class Main {

    /** Creates a new instance of Main */
    public Main() {
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws Exception {
        // Create an instance of HttpClient. The same instance is reused for
        // the GET below so that the session state from the login carries over.
        HttpClient client = new HttpClient();

        // Step 1: submit the login form.
        PostMethod post_method = new PostMethod("http://localhost/rcpq/");
        NameValuePair[] data = {
            new NameValuePair("username", "admin"),
            new NameValuePair("password", "admin"),
            new NameValuePair("dologin", "1"),
        };
        post_method.setRequestBody(data);
        try {
            // Execute the method.
            int statusCode = client.executeMethod(post_method);
            // A login form may answer with a redirect instead of 200,
            // so a non-200 status is only logged here, not treated as fatal.
            if (statusCode != HttpStatus.SC_OK) {
                System.err.println("Method failed: " + post_method.getStatusLine());
            }
            // Read the response body.
            //byte[] responseBody = post_method.getResponseBody();
            // Deal with the response.
            // Use caution: ensure correct character encoding and is not binary data
            //System.out.println(new String(responseBody));
        } catch (HttpException e) {
            System.err.println("Fatal protocol violation: " + e.getMessage());
            e.printStackTrace();
        } catch (IOException e) {
            System.err.println("Fatal transport error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            // Release the connection.
            post_method.releaseConnection();
        }

        // Step 2: fetch the page that is only available after login.
        byte[] responseBody = null;
        GetMethod get_method = new GetMethod("http://localhost/rcpq/unit.php");
        // Provide a custom retry handler if necessary.
        get_method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler(3, false));
        try {
            // Execute the method.
            int statusCode = client.executeMethod(get_method);
            if (statusCode != HttpStatus.SC_OK) {
                System.err.println("Method failed: " + get_method.getStatusLine());
            }
            // Read the response body.
            //responseBody = get_method.getResponseBody();
            // Read the page through a stream here.
            InputStream in = get_method.getResponseBodyAsStream();
            if (in != null) {
                byte[] tmp = new byte[4096];
                int bytesRead = 0;
                ByteArrayOutputStream buffer = new ByteArrayOutputStream(1024);
                while ((bytesRead = in.read(tmp)) != -1) {
                    buffer.write(tmp, 0, bytesRead);
                }
                responseBody = buffer.toByteArray();
            }
            // Deal with the response.
            // Use caution: ensure correct character encoding and is not binary data
            //System.out.println(new String(responseBody));
        } catch (HttpException e) {
            System.err.println("Fatal protocol violation: " + e.getMessage());
            e.printStackTrace();
        } catch (IOException e) {
            System.err.println("Fatal transport error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            // Release the connection.
            get_method.releaseConnection();
        }

        // The GET may have failed or returned no body; stop before parsing.
        if (responseBody == null) {
            System.err.println("No response body to parse.");
            return;
        }

        // Step 3: parse the page. The page is decoded as GBK here.
        Parser parser = Parser.createParser(new String(responseBody, "GBK"), "GBK");
        String filterStr = "table";
        NodeFilter filter = new TagNameFilter(filterStr);
        NodeList tables = parser.extractAllNodesThatMatch(filter);
        //System.out.println(tables.elementAt(17).toString());
        // Locate the table that holds the unit list. The indexes used below
        // (table 17, row 3, column 2) are specific to the layout of this page.
        if (tables.size() <= 17) {
            System.err.println("Expected at least 18 tables, found " + tables.size());
            return;
        }
        TableTag tabletag = (TableTag) tables.elementAt(17);
        TableRow row = tabletag.getRow(3);
        TableColumn[] cols = row.getColumns();
        //System.out.println("Unit name: " + cols[2].toHtml());
        System.out.println("Unit name: " + cols[2].childAt(0).getText());
    }
}
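The listing above reads the response stream into a byte array and then decodes it with a hard-coded "GBK". As a rough sketch of how those two steps could be pulled into a reusable helper, the class below uses only the JDK. The class name ResponseDecoding and its methods readAll, charsetOf and decode are hypothetical names made up for this illustration; they are not part of HttpClient or HtmlParser. It assumes the caller passes in the Content-Type header value (with HttpClient this could presumably be taken from the response headers of get_method) and falls back to a caller-chosen charset when the header names none.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the original post.
class ResponseDecoding {

    /** Reads the stream to the end and returns every byte that was read. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    /** Returns the charset named in a Content-Type value, or the fallback. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Reads the whole body and decodes it; a null stream yields an empty string. */
    static String decode(InputStream in, String contentType, Charset fallback)
            throws IOException {
        if (in == null) {
            return "";
        }
        return new String(readAll(in), charsetOf(contentType, fallback));
    }

    public static void main(String[] args) throws IOException {
        // Simulate a GBK-encoded response body instead of a real HTTP call.
        Charset gbk = Charset.forName("GBK");
        byte[] body = "<td>\u5355\u4f4d</td>".getBytes(gbk);
        String html = decode(new ByteArrayInputStream(body), "text/html; charset=GBK", gbk);
        System.out.println(html);
    }
}
```

The string returned by decode could then be handed to Parser.createParser in place of new String(responseBody, "GBK"). Returning an empty string for a null stream is only one possible choice; throwing an exception would make a missing body harder to overlook.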