Web Page Active Probing Tool (Bug Fixes)
http://blog.itpub.net/29254281/viewspace-1344706/
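The program's structure is settled, but the program itself still has some bugs.
First, ParseHandler's parsing had a problem. The original code dropped every path that contained a colon, but that also dropped absolute URLs.
The original intent was only to filter out addresses such as "mvboxmtv:url=http://d.mvbox.cn/util/getPlayer.htm?appId=9306".
To fix this, I added an isValid field to the Task class and handle the case in the Default filter:
in the last link of the filter chain, if the path still contains a colon, the Task's valid flag is set to false.
When the filters finish in createNewTask, only Task objects whose valid flag is true are put back on the connection queue.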
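The decision order described above can be condensed into a single predicate. This is only a minimal sketch: the class LinkFilterSketch, the method isCrawlable, and the DOMAINS list are hypothetical names for illustration, not part of the program, which spreads the same checks over its filter classes.

```java
import java.util.Arrays;
import java.util.List;

class LinkFilterSketch {
    // Hypothetical whitelist, playing the role of the program's domain list.
    static final List<String> DOMAINS = Arrays.asList("news.163.com");

    /** Returns true if the raw link should become a new task. */
    static boolean isCrawlable(String path) {
        if (path.startsWith("http://")) {
            // Absolute URL: accept it only when it points at a whitelisted host.
            for (String domain : DOMAINS) {
                if (path.startsWith("http://" + domain + "/")) {
                    return true;
                }
            }
            return false;
        }
        if (path.startsWith("/") || path.startsWith("../")) {
            // Root-relative and parent-relative paths are resolved earlier in the chain.
            return true;
        }
        // Last link of the chain: a colon left at this point means an unknown scheme.
        return !path.contains(":");
    }

    public static void main(String[] args) {
        System.out.println(isCrawlable("http://news.163.com/a.html")); // true
        System.out.println(isCrawlable(
                "mvboxmtv:url=http://d.mvbox.cn/util/getPlayer.htm?appId=9306")); // false
    }
}
```

Because the colon test runs last, absolute URLs are judged by the whitelist first and are no longer rejected just for containing "http:".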
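Another fix is in the Task object's setCurrentPath method.
This method extracts the file extension. For "test.html?name=xx" or "test.html"
it correctly extracts the extension html.
But I overlooked one case, "test.html#category",
where the path contains a hash sign.
The fix, visible in setCurrentPath in the listing below, looks for "?" first and, failing that, for "#", before falling back to the end of the path.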
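As a sketch of the extension extraction discussed here, the helper below cuts the path at whichever of "?" or "#" comes first and then takes the text after the last dot of the last path segment. The name extensionOf is hypothetical, and this is deliberately a little stricter than the method in the listing, which starts from the first dot in the whole path.

```java
class ExtensionSketch {
    /** Returns the file extension of a URL path, or "" if there is none. */
    static String extensionOf(String path) {
        int end = path.length();
        int query = path.indexOf('?');
        if (query != -1) {
            end = Math.min(end, query);
        }
        int fragment = path.indexOf('#');
        if (fragment != -1) {
            end = Math.min(end, fragment);
        }
        // Everything before the query string or fragment.
        String file = path.substring(0, end);
        int slash = file.lastIndexOf('/');
        int dot = file.lastIndexOf('.');
        // A dot that is missing or sits in a directory name is not an extension.
        if (dot <= slash) {
            return "";
        }
        return file.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("test.html?name=xx"));  // html
        System.out.println(extensionOf("test.html#category")); // html
    }
}
```

Taking the last dot of the last segment also avoids an out-of-range substring when a "?" appears before the first dot, for example in "page?next=a.html".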
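Two more errors came from my understanding of Java.
The parsing step uses the chain-of-responsibility pattern.
The initial design pushed the Task object into the chain, and if a link in the chain decided the Task did not qualify, it set the object to null.
The idea was to use null as a status flag for whether the Task was acceptable.
On reflection, this is a very basic mistake.
Inside a chain method the reference is just a local variable (Java passes references by value), so the outer method keeps referring to the same Task object no matter what the chain methods assign.
The fix is easy: add an isValid status flag to the Task class.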
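The difference between reassigning a parameter and mutating the object it points to can be shown in a few lines. This is a minimal sketch; the nested Task class and the two reject methods are hypothetical stand-ins, not the program's own classes.

```java
class ReferenceSketch {
    // Hypothetical stand-in for the program's Task class.
    static class Task {
        boolean valid = true;
    }

    // Reassigns only the local copy of the reference; the caller is unaffected.
    static void rejectByNull(Task t) {
        t = null;
    }

    // Mutates the shared object; the caller sees the change.
    static void rejectByFlag(Task t) {
        t.valid = false;
    }

    public static void main(String[] args) {
        Task task = new Task();
        rejectByNull(task);
        System.out.println(task != null); // true
        rejectByFlag(task);
        System.out.println(task.valid);   // false
    }
}
```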
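There was also a concurrency bug.
The initial design used a concurrent container to store the links already visited, so that cycles between pages could not keep the crawl from terminating.
private static Set<String> SET = new ConcurrentSkipListSet<String>();
But the parsing handler method made a classic concurrency mistake.
Although the container's contains method is thread-safe, a contains check followed by a separate add is not atomic, so the code as a whole is not thread-safe.
Suppose one thread has passed the contains check but has not yet reached SET.add; if another thread calls SET.contains at that moment, it passes the check too.
So the same link could be processed twice.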
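One way to close this check-then-act gap, sketched here under my own assumptions rather than taken from the program, is to drop the separate contains call and rely on the boolean returned by add, which a concurrent set evaluates atomically. The names VisitedSketch, claim, and raceForKey are hypothetical. The listing in this post takes a different route and guards a plain HashSet with a synchronized block, which is equally valid.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class VisitedSketch {
    private static final Set<String> VISITED = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first caller that claims the key. */
    static boolean claim(String key) {
        return VISITED.add(key); // test and insert happen as one atomic step
    }

    /** Lets several threads race for one key and counts how many of them win. */
    static int raceForKey(String key, int threads) {
        AtomicInteger winners = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all threads at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (claim(key)) {
                    winners.incrementAndGet();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println(raceForKey("news.163.com/index.html", 8)); // 1
    }
}
```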
The new version also adds exception handling: if a Socket times out, the Task object is put back on the connection queue.
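A detail worth checking when retrying on timeouts is which exception class actually arrives. In the standard library, ConnectException (raised, for example, when the operating system reports that a connection attempt timed out) is a subclass of SocketException, but SocketTimeoutException (raised when a timeout set with setSoTimeout or a connect timeout expires) is not. The sketch below shows a predicate that covers both; shouldRequeue is a hypothetical helper, not part of the program.

```java
import java.net.ConnectException;
import java.net.SocketException;
import java.net.SocketTimeoutException;

class RetrySketch {
    /** Hypothetical helper: should a failed task go back on the connection queue? */
    static boolean shouldRequeue(Exception e) {
        // SocketTimeoutException extends InterruptedIOException, so it needs its own check.
        return e instanceof SocketException || e instanceof SocketTimeoutException;
    }

    public static void main(String[] args) {
        System.out.println(shouldRequeue(new ConnectException("Connection timed out"))); // true
        System.out.println(
                SocketException.class.isAssignableFrom(SocketTimeoutException.class)); // false
    }
}
```

A catch block that names only SocketException would therefore miss read timeouts if a socket timeout were configured later.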
The code after these changes is as follows:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.InetAddress;
import java.net.Socket;
import java.net.SocketException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Probe {
private static final BlockingQueue<Task> CONNECTLIST = new LinkedBlockingQueue<>();
private static final BlockingQueue<Task> PARSELIST = new LinkedBlockingQueue<>();
private static final BlockingQueue<Task> PERSISTENCELIST = new LinkedBlockingQueue<>();
private static ExecutorService CONNECTTHREADPOOL;
private static ExecutorService PARSETHREADPOOL;
private static ExecutorService PERSISTENCETHREADPOOL;
private static final List<String> DOMAINLIST = new CopyOnWriteArrayList<>();
static {
CONNECTTHREADPOOL = Executors.newFixedThreadPool(100);
PARSETHREADPOOL = Executors.newSingleThreadExecutor();
PERSISTENCETHREADPOOL = Executors.newFixedThreadPool(1);
DOMAINLIST.add("news.163.com");
}
public static void main(String args[]) throws Exception {
long start = System.currentTimeMillis();
CONNECTLIST.put(new Task("news.163.com", 80, "/index.html"));
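// Note: the pool has only 100 threads and each handler loops forever,
// so the last 20 of these 120 submissions never get to run.
for (int i = 0; i < 120; i++) {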
CONNECTTHREADPOOL
.submit(new ConnectHandler(CONNECTLIST, PARSELIST));
}
PARSETHREADPOOL.submit(new ParseHandler(CONNECTLIST, PARSELIST,
PERSISTENCELIST, DOMAINLIST));
PERSISTENCETHREADPOOL.submit(new PersistenceHandler(PERSISTENCELIST));
while (true) {
Thread.sleep(1000);
long end = System.currentTimeMillis();
// Divide by a float so that fractions of a second are not truncated.
float interval = (end - start) / 1000f;
int connectTotal = ConnectHandler.GETCOUNT();
int parseTotal = ParseHandler.GETCOUNT();
int persistenceTotal = PersistenceHandler.GETCOUNT();
int connectps = Math.round(connectTotal / interval);
int parseps = Math.round(parseTotal / interval);
int persistenceps = Math.round(persistenceTotal / interval);
System.out.print("\rConnections: " + connectTotal + " \tconnections/s: " + connectps
+ "\tconnect queue: " + CONNECTLIST.size() + " \tparsed: "
+ parseTotal + " \tparses/s: " + parseps + "\tparse queue: "
+ PARSELIST.size() + " \tpersisted: " + persistenceTotal
+ " \tpersists/s: " + persistenceps + "\tpersist queue: "
+ PERSISTENCELIST.size());
}
}
}
class Task {
public Task() {
}
public void init(String host, int port, String path) {
this.setCurrentPath(path);
this.host = host;
this.port = port;
}
public Task(String host, int port, String path) {
init(host, port, path);
}
private String host;
private int port;
private String currentPath;
private long taskTime;
private String type;
private String content;
private int state;
private boolean isValid = true;
public boolean isValid() {
return isValid;
}
public void setValid(boolean isValid) {
this.isValid = isValid;
}
public int getState() {
return state;
}
public void setState(int state) {
this.state = state;
}
public String getCurrentPath() {
return currentPath;
}
public void setCurrentPath(String currentPath) {
this.currentPath = currentPath;
int i = 0;
if (currentPath.indexOf("?") != -1) {
i = currentPath.indexOf("?");
} else {
if (currentPath.indexOf("#") != -1) {
i = currentPath.indexOf("#");
} else {
i = currentPath.length();
}
}
this.type = currentPath.substring(currentPath.indexOf(".") + 1, i);
}
public long getTaskTime() {
return taskTime;
}
public void setTaskTime(long taskTime) {
this.taskTime = taskTime;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
public String getHost() {
return host;
}
public int getPort() {
return port;
}
public String getContent() {
return content;
}
public void setContent(String content) {
this.content = content;
}
}
class ParseHandler implements Runnable {
// Links already seen; every access goes through synchronized (SET).
private static final Set<String> SET = new HashSet<>();
public static int GETCOUNT() {
return COUNT.get();
}
private static final AtomicInteger COUNT = new AtomicInteger();
private BlockingQueue<Task> connectlist;
private BlockingQueue<Task> parselist;
private BlockingQueue<Task> persistencelist;
List<String> domainlist;
private interface Filter {
void doFilter(Task fatherTask, Task newTask, String path, Filter chain);
}
private class FilterChain implements Filter {
private List<Filter> list = new ArrayList<>();
{
addFilter(new TwoLevel());
addFilter(new OneLevel());
addFilter(new FullPath());
addFilter(new Root());
addFilter(new Default());
}
private void addFilter(Filter filter) {
list.add(filter);
}
// Created after the initializer block above has filled the list.
private Iterator<Filter> it = list.iterator();
@Override
public void doFilter(Task fatherTask, Task newTask, String path,
Filter chain) {
if (it.hasNext()) {
it.next().doFilter(fatherTask, newTask, path, chain);
}
}
}
private class TwoLevel implements Filter {
@Override
public void doFilter(Task fatherTask, Task newTask, String path,
Filter chain) {
if (path.startsWith("../../")) {
String prefix = getPrefix(fatherTask.getCurrentPath(), 3);
newTask.init(fatherTask.getHost(), fatherTask.getPort(),
path.replace("../../", prefix));
} else {
chain.doFilter(fatherTask, newTask, path, chain);
}
}
}
private class OneLevel implements Filter {
@Override
public void doFilter(Task fatherTask, Task newTask, String path,
Filter chain) {
if (path.startsWith("../")) {
String prefix = getPrefix(fatherTask.getCurrentPath(), 2);
newTask.init(fatherTask.getHost(), fatherTask.getPort(),
path.replace("../", prefix));
} else {
chain.doFilter(fatherTask, newTask, path, chain);
}
}
}
private class FullPath implements Filter {
@Override
public void doFilter(Task fatherTask, Task newTask, String path,
Filter chain) {
if (path.startsWith("http://")) {
Iterator<String> it = domainlist.iterator();
boolean flag = false;
while (it.hasNext()) {
String domain = it.next();
if (path.startsWith("http://" + domain + "/")) {
newTask.init(domain, fatherTask.getPort(),
path.replace("http://" + domain + "/", "/"));
flag = true;
break;
}
}
if (!flag) {
newTask.setValid(false);
}
} else {
chain.doFilter(fatherTask, newTask, path, chain);
}
}
}
private class Root implements Filter {
@Override
public void doFilter(Task fatherTask, Task newTask, String path,
Filter chain) {
if (path.startsWith("/")) {
newTask.init(fatherTask.getHost(), fatherTask.getPort(), path);
} else {
chain.doFilter(fatherTask, newTask, path, chain);
}
}
}
private class Default implements Filter {
@Override
public void doFilter(Task fatherTask, Task newTask, String path,
Filter chain) {
if (path.contains(":")) {
newTask.setValid(false);
return;
}
String prefix = getPrefix(fatherTask.getCurrentPath(), 1);
newTask.init(fatherTask.getHost(), fatherTask.getPort(), prefix
+ "/" + path);
}
}
public ParseHandler(BlockingQueue<Task> connectlist,
BlockingQueue<Task> parselist, BlockingQueue<Task> persistencelist,
List<String> domainlist) {
this.connectlist = connectlist;
this.parselist = parselist;
this.persistencelist = persistencelist;
this.domainlist = domainlist;
}
private Pattern pattern = Pattern.compile("\"[^\"]+\\.htm[^\"]*\"");
private void handler() {
try {
Task task = parselist.take();
parseTaskState(task);
if (200 == task.getState()) {
Matcher matcher = pattern.matcher(task.getContent());
while (matcher.find()) {
String path = matcher.group();
if (!path.contains(" ") && !path.contains("\t")
&& !path.contains("(") && !path.contains(")")) {
path = path.substring(1, path.length() - 1);
createNewTask(task, path);
}
}
}
task.setContent(null);
persistencelist.put(task);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
private void parseTaskState(Task task) {
if (task.getContent().startsWith("HTTP/1.1")) {
task.setState(Integer.parseInt(task.getContent().substring(9, 12)));
} else {
task.setState(Integer.parseInt(task.getContent().substring(19, 22)));
}
}
/**
* @param fatherTask
* @param path
* @throws Exception
*/
private void createNewTask(Task fatherTask, String path) throws Exception {
Task newTask = new Task();
FilterChain filterchain = new FilterChain();
filterchain.doFilter(fatherTask, newTask, path, filterchain);
if (newTask.isValid()) {
synchronized (SET) {
if (SET.contains(newTask.getHost() + newTask.getCurrentPath())) {
return;
}
SET.add(newTask.getHost() + newTask.getCurrentPath());
}
connectlist.put(newTask);
}
}
private String getPrefix(String s, int count) {
String prefix = s;
while (count > 0) {
prefix = prefix.substring(0, prefix.lastIndexOf("/"));
count--;
}
return "".equals(prefix) ? "/" : prefix;
}
@Override
public void run() {
while (true) {
this.handler();
COUNT.addAndGet(1);
}
}
}
class ConnectHandler implements Runnable {
public static int GETCOUNT() {
return COUNT.get();
}
private static final AtomicInteger COUNT = new AtomicInteger();
private BlockingQueue<Task> connectlist;
private BlockingQueue<Task> parselist;
public ConnectHandler(BlockingQueue<Task> connectlist,
BlockingQueue<Task> parselist) {
this.connectlist = connectlist;
this.parselist = parselist;
}
private void handler() {
Task task = null;
try {
task = connectlist.take();
long start = System.currentTimeMillis();
getHtml(task);
long end = System.currentTimeMillis();
task.setTaskTime(end - start);
parselist.put(task);
} catch (SocketException e) {
if (task != null) {
try {
connectlist.put(task);
} catch (InterruptedException e1) {
e1.printStackTrace();
}
}
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
private void getHtml(Task task) throws Exception {
StringBuilder sb = new StringBuilder(2048);
InetAddress addr = InetAddress.getByName(task.getHost());
// Open a Socket
Socket socket = new Socket(addr, task.getPort());
// Send the request: just the HTTP request line and headers written to the Socket's output stream (see the HTTP protocol for details)
BufferedWriter wr = new BufferedWriter(new OutputStreamWriter(
socket.getOutputStream(), "UTF-8"));
wr.write("GET " + task.getCurrentPath() + " HTTP/1.0\r\n");
wr.write("HOST:" + task.getHost() + "\r\n");
wr.write("Accept:*/*\r\n");
wr.write("\r\n");
wr.flush();
// Read the response returned over the Socket and collect it into the buffer
BufferedReader rd = new BufferedReader(new InputStreamReader(
socket.getInputStream()));
String line;
while ((line = rd.readLine()) != null) {
sb.append(line);
}
wr.close();
rd.close();
task.setContent(sb.toString());
socket.close();
}
@Override
public void run() {
while (true) {
this.handler();
COUNT.addAndGet(1);
}
}
}
class PersistenceHandler implements Runnable {
static {
try {
Class.forName("oracle.jdbc.OracleDriver");
} catch (ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static int GETCOUNT() {
return COUNT.get();
}
private static final AtomicInteger COUNT = new AtomicInteger();
private BlockingQueue<Task> persistencelist;
public PersistenceHandler(BlockingQueue<Task> persistencelist) {
this.persistencelist = persistencelist;
try {
conn = DriverManager.getConnection(
"jdbc:oracle:thin:@127.0.0.1:1521:orcl", "edmond", "edmond");
ps = conn
.prepareStatement("insert into probe(id,host,path,state,tasktime,type) values(seq_probe_id.nextval,?,?,?,?,?)");
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
private Connection conn;
private PreparedStatement ps;
@Override
public void run() {
while (true) {
this.handler();
COUNT.addAndGet(1);
}
}
private void handler() {
try {
Task task = persistencelist.take();
ps.setString(1, task.getHost());
ps.setString(2, task.getCurrentPath());
ps.setInt(3, task.getState());
ps.setLong(4, task.getTaskTime());
ps.setString(5, task.getType());
ps.executeUpdate();
conn.commit();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
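This program still has room for optimization.
For example, it cannot yet stop on its own,
and it binds one thread to each Socket, which consumes a lot of thread resources.
Much of the exception handling is also incomplete.
I will improve it gradually.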
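For the automatic-stop problem, one possible approach (my own assumption, not something the program does) is to count every task that is queued or in flight and let workers exit once the count reaches zero. In the sketch below, ShutdownSketch, submit, and drain are hypothetical names, and the page fetch is replaced by a placeholder that discovers two links from "/index.html".

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownSketch {
    // Number of tasks that are queued or currently being processed.
    private final AtomicInteger pending = new AtomicInteger();
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    void submit(String path) {
        pending.incrementAndGet(); // count the task before it becomes visible
        queue.add(path);
    }

    /** Worker loop: returns once nothing is queued or in flight. */
    int drain() {
        int processed = 0;
        while (pending.get() > 0) {
            String path;
            try {
                // Poll with a timeout so the loop can re-check the counter.
                path = queue.poll(100, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            if (path == null) {
                continue;
            }
            try {
                // Placeholder for fetch and parse: the start page yields two new links.
                if (path.equals("/index.html")) {
                    submit("/a.html");
                    submit("/b.html");
                }
                processed++;
            } finally {
                // Decrement only after the children are counted, so the
                // counter cannot reach zero while work remains.
                pending.decrementAndGet();
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        ShutdownSketch sketch = new ShutdownSketch();
        sketch.submit("/index.html");
        System.out.println(sketch.drain()); // 3
    }
}
```

The same counter works with several worker threads, since a parent task registers its children before it gives up its own slot.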
From the "ITPUB Blog", link: http://blog.itpub.net/29254281/viewspace-1347985/. If you repost this article, please credit the source; otherwise legal liability will be pursued.