Active Web Page Probing Tool: NIO Optimization
This post continues from the earlier articles:
http://blog.itpub.net/29254281/viewspace-1344706/
http://blog.itpub.net/29254281/viewspace-1347985/
Table creation statements:
CREATE SEQUENCE seq_probe_id INCREMENT BY 1 START WITH 1 NOMAXvalue NOCYCLE CACHE 2000;
create table probe(
id int primary key,
host varchar(40) not null,
path varchar(500) not null,
state int not null,
taskTime int not null,
type varchar(10) not null,
createtime date default sysdate not null
) ;
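I have wanted for a long time to rewrite this program with NIO and squeeze more out of the available resources.
Unfortunately, the NIO-plus-multithreading examples I found online were not very reliable, and learning it on my own was a headache, so the work dragged on for more than a year.
The new program works in three stages.
First, a thread pool continuously sends connection requests but does not handle the responses. It only registers a SelectionKey.OP_READ key for each channel.
Second, a separate single-threaded task keeps selecting the channels that are ready and hands them to another thread pool, which receives and parses the data. (Receiving and parsing are merged into one step.)
Finally, a single-threaded task keeps flushing the results into the database in batches. This is an optimization in its own right: switching from single-row inserts to batch inserts saved at least one CPU core's worth of processing.
The persistence and parsing stages mostly reuse the original code.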
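The batching idea in the last stage can be tried without a database. The sketch below is only an illustration under assumptions: `BatchDrainer`, `drainOnce` and the `sink` callback are hypothetical names that do not appear in the original program, and the sink stands in for the JDBC `addBatch()` / `executeBatch()` / `commit()` calls. It waits briefly for one queued result, then drains up to a full batch without blocking, so a partly filled batch is still written when the queue goes quiet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: turns a stream of queued results into batches. */
final class BatchDrainer {

    /**
     * Waits up to timeoutMillis for a first element, then takes up to
     * maxBatch - 1 more without blocking and hands the batch to the sink.
     * Returns the batch size, or 0 if nothing arrived in time.
     */
    static <T> int drainOnce(BlockingQueue<T> queue, int maxBatch, long timeoutMillis,
                             Consumer<List<T>> sink) throws InterruptedException {
        T first = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (first == null) {
            return 0;
        }
        List<T> batch = new ArrayList<>(maxBatch);
        batch.add(first);
        queue.drainTo(batch, maxBatch - 1);
        sink.accept(batch);
        return batch.size();
    }

    /** Drains 1200 queued items in batches of at most 500 and records each batch size. */
    static List<Integer> demo() {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 1200; i++) {
            queue.add(i);
        }
        List<Integer> sizes = new ArrayList<>();
        try {
            for (int round = 0; round < 4; round++) {
                // In the crawler, the sink would execute the JDBC batch and commit.
                sizes.add(drainOnce(queue, 500, 10, batch -> { }));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [500, 500, 200, 0]
    }
}
```

Using a timed `poll` rather than `take` is a design choice of this sketch: it gives the writer a chance to flush the tail of the queue instead of waiting indefinitely for the batch to fill.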
Complete source code:
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.charset.Charset;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Probe {
    private static final int REQUESTTHREADCOUNT = 10;
    // Tasks waiting to be requested, and finished tasks waiting to be written to the database.
    private static final BlockingQueue<Task> CONNECTLIST = new LinkedBlockingQueue<>();
    private static final BlockingQueue<Task> PERSISTENCELIST = new LinkedBlockingQueue<>();
    private static final ExecutorService REQUESTTHREADPOOL = Executors.newFixedThreadPool(REQUESTTHREADCOUNT);
    // One of these three threads runs the selector loop; the other two read and parse responses.
    private static final ExecutorService RESPONSETHREADPOOL = Executors.newFixedThreadPool(3);
    private static final ExecutorService PERSISTENCETHREADPOOL = Executors.newFixedThreadPool(1);
    // Only links that point to one of these domains are followed.
    private static final List<String> DOMAINLIST = new CopyOnWriteArrayList<>();
    private static final Selector SELECTOR;

    static {
        DOMAINLIST.add("news.163.com");
        try {
            SELECTOR = Selector.open();
        } catch (IOException e) {
            // Nothing can run without a selector, so fail fast instead of continuing with null.
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Seed the crawl with the start page.
        CONNECTLIST.put(new Task("news.163.com", 80, "/index.html"));
        for (int i = 0; i < REQUESTTHREADCOUNT; i++) {
            REQUESTTHREADPOOL.submit(new RequestHandler(CONNECTLIST, SELECTOR));
        }
        RESPONSETHREADPOOL
                .submit(new ResponseHandler(SELECTOR, CONNECTLIST, PERSISTENCELIST, DOMAINLIST, RESPONSETHREADPOOL));
        PERSISTENCETHREADPOOL.submit(new PersistenceHandler(PERSISTENCELIST));
        // Print running statistics once per second on a single console line.
        while (true) {
            Thread.sleep(1000);
            // Elapsed seconds as a float, so the rates are not skewed by integer division.
            float interval = (System.currentTimeMillis() - start) / 1000f;
            int connectTotal = ResponseHandler.GETCOUNT();
            int persistenceTotal = PersistenceHandler.GETCOUNT();
            int connectps = Math.round(connectTotal / interval);
            int persistenceps = Math.round(persistenceTotal / interval);
            System.out.print("\rConnections: " + connectTotal + " \tper second: " + connectps
                    + "\tconnect queue: " + CONNECTLIST.size() + " \tpersisted: " + persistenceTotal
                    + " \tper second: " + persistenceps + "\tpersistence queue: " + PERSISTENCELIST.size());
        }
    }
}

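// Stage 1: connects, sends the HTTP request, and registers the channel for reading.
class RequestHandler implements Runnable {
    private final BlockingQueue<Task> connectlist;
    private final Selector selector;

    public RequestHandler(BlockingQueue<Task> connectlist, Selector selector) {
        this.connectlist = connectlist;
        this.selector = selector;
    }

    @Override
    public void run() {
        while (true) {
            SocketChannel socketChannel = null;
            try {
                Task task = (Task) connectlist.take();
                task.setStarttime(System.currentTimeMillis());
                // Use the task's own port instead of a hard-coded 80.
                SocketAddress addr = new InetSocketAddress(task.getHost(), task.getPort());
                // open(addr) connects in blocking mode.
                socketChannel = SocketChannel.open(addr);
                String request = "GET " + task.getCurrentPath() + " HTTP/1.0\r\n"
                        + "Host: " + task.getHost() + "\r\n"
                        + "Accept: */*\r\n"
                        + "\r\n";
                // Wrapping the encoded request avoids overflowing a fixed-size buffer on long paths.
                ByteBuffer byteBuffer = ByteBuffer.wrap(request.getBytes("UTF-8"));
                // Send the whole request before switching to non-blocking mode.
                while (byteBuffer.hasRemaining()) {
                    socketChannel.write(byteBuffer);
                }
                socketChannel.configureBlocking(false);
                // The response is read later by ResponseHandler; the task travels as the attachment.
                socketChannel.register(selector, SelectionKey.OP_READ, task);
                selector.wakeup();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } catch (Exception e) {
                e.printStackTrace();
                // Do not leak the socket if connecting, writing, or registering failed.
                if (socketChannel != null) {
                    try {
                        socketChannel.close();
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    }
                }
            }
        }
    }
}
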
// Stage 2: selects readable channels and hands each one to a worker that reads and parses it.
class ResponseHandler implements Runnable {
    private static final AtomicInteger COUNT = new AtomicInteger();
    private final Selector selector;
    private final BlockingQueue<Task> connectlist;
    private final BlockingQueue<Task> persistencelist;
    private final List<String> domainlist;
    private final ExecutorService threadPool;
    // The pages of the crawled site are assumed to be GBK-encoded.
    private final Charset gbkcharset = Charset.forName("gbk");

    public static int GETCOUNT() {
        return COUNT.get();
    }

    public ResponseHandler(Selector selector, BlockingQueue<Task> connectlist,
            BlockingQueue<Task> persistencelist, List<String> domainlist, ExecutorService threadpool) {
        this.selector = selector;
        this.connectlist = connectlist;
        this.persistencelist = persistencelist;
        this.domainlist = domainlist;
        this.threadPool = threadpool;
    }

    @Override
    public void run() {
        while (true) {
            try {
                // selectNow() never blocks, so this loop busy-polls and keeps a core busy even when idle.
                int n = selector.selectNow();
                if (n == 0)
                    continue;
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    final SelectionKey key = it.next();
                    it.remove();
                    // Check isValid() first: isReadable() throws CancelledKeyException on a cancelled key.
                    if (key.isValid() && key.isReadable()) {
                        // Stop selecting this channel while a worker thread is reading it.
                        key.interestOps(key.interestOps() & (~SelectionKey.OP_READ));
                        threadPool.submit(() -> readResponse(key));
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    // Runs on a worker thread: reads whatever is available and parses the page once it is complete.
    private void readResponse(SelectionKey key) {
        try {
            Task task = (Task) key.attachment();
            ByteBuffer byteBuffer = ByteBuffer.allocate(2400);
            SocketChannel channel = (SocketChannel) key.channel();
            int length;
            while ((length = channel.read(byteBuffer)) > 0) {
                byteBuffer.flip();
                // Each chunk is decoded on its own, so a multi-byte character split across two reads is garbled.
                task.appendContent(gbkcharset.decode(byteBuffer).toString());
                byteBuffer.clear();
            }
            if (length == -1) {
                // End of stream: the HTTP/1.0 server closed the connection, so the response is complete.
                channel.close();
                task.setEndtime(System.currentTimeMillis());
                COUNT.incrementAndGet();
                new ParseHandler(task, connectlist, persistencelist, domainlist).handler();
            } else {
                // More data is expected. Registering an already registered channel resets its interest set.
                channel.register(selector, SelectionKey.OP_READ, task);
            }
            key.selector().wakeup();
        } catch (Exception e) {
            try {
                key.cancel();
                key.channel().close();
            } catch (IOException e1) {
                e1.printStackTrace();
            }
            e.printStackTrace();
        }
    }
}

// Parses one downloaded page: extracts the status code and queues every new .htm link.
class ParseHandler {
    // Host + path of every URL already queued, shared by all parser threads.
    private static final Set<String> SET = new HashSet<>();
    // Matches quoted strings that contain ".htm", such as "a/b.html" or "c.htm?x=1".
    private static final Pattern PATTERN = Pattern.compile("\"[^\"]+\\.htm[^\"]*\"");
    private final BlockingQueue<Task> connectlist;
    private final BlockingQueue<Task> persistencelist;
    private final List<String> domainlist;
    private final Task task;

    private interface Filter {
        void doFilter(Task fatherTask, Task newTask, String path, Filter chain);
    }

    // Chain of responsibility: each filter either resolves the link or passes it to the next filter.
    private class FilterChain implements Filter {
        private final List<Filter> list = new ArrayList<>();
        private final Iterator<Filter> it;

        FilterChain() {
            list.add(new TwoLevel());
            list.add(new OneLevel());
            list.add(new FullPath());
            list.add(new Root());
            list.add(new Default());
            it = list.iterator();
        }

        @Override
        public void doFilter(Task fatherTask, Task newTask, String path, Filter chain) {
            if (it.hasNext()) {
                it.next().doFilter(fatherTask, newTask, path, chain);
            }
        }
    }

    // "../../x.html": two directories above the parent page.
    private class TwoLevel implements Filter {
        @Override
        public void doFilter(Task fatherTask, Task newTask, String path, Filter chain) {
            if (path.startsWith("../../")) {
                String dir = getPrefix(fatherTask.getCurrentPath(), 3);
                newTask.init(fatherTask.getHost(), fatherTask.getPort(), join(dir, path.substring(6)));
            } else {
                chain.doFilter(fatherTask, newTask, path, chain);
            }
        }
    }

    // "../x.html": one directory above the parent page.
    private class OneLevel implements Filter {
        @Override
        public void doFilter(Task fatherTask, Task newTask, String path, Filter chain) {
            if (path.startsWith("../")) {
                String dir = getPrefix(fatherTask.getCurrentPath(), 2);
                newTask.init(fatherTask.getHost(), fatherTask.getPort(), join(dir, path.substring(3)));
            } else {
                chain.doFilter(fatherTask, newTask, path, chain);
            }
        }
    }

    // "http://host/x.html": followed only if the host is in the domain list.
    private class FullPath implements Filter {
        @Override
        public void doFilter(Task fatherTask, Task newTask, String path, Filter chain) {
            if (path.startsWith("http://")) {
                boolean matched = false;
                for (String domain : domainlist) {
                    String base = "http://" + domain + "/";
                    if (path.startsWith(base)) {
                        // Keep the "/" that ends the base as the first character of the path.
                        newTask.init(domain, fatherTask.getPort(), path.substring(base.length() - 1));
                        matched = true;
                        break;
                    }
                }
                if (!matched) {
                    newTask.setValid(false);
                }
            } else {
                chain.doFilter(fatherTask, newTask, path, chain);
            }
        }
    }

    // "/x.html": absolute path on the same host.
    private class Root implements Filter {
        @Override
        public void doFilter(Task fatherTask, Task newTask, String path, Filter chain) {
            if (path.startsWith("/")) {
                newTask.init(fatherTask.getHost(), fatherTask.getPort(), path);
            } else {
                chain.doFilter(fatherTask, newTask, path, chain);
            }
        }
    }

    // "x.html": relative to the directory of the parent page.
    private class Default implements Filter {
        @Override
        public void doFilter(Task fatherTask, Task newTask, String path, Filter chain) {
            // Anything that still contains ':' is another scheme (https:, javascript:, mailto:) and is skipped.
            if (path.contains(":")) {
                newTask.setValid(false);
                return;
            }
            String dir = getPrefix(fatherTask.getCurrentPath(), 1);
            newTask.init(fatherTask.getHost(), fatherTask.getPort(), join(dir, path));
        }
    }

    public ParseHandler(Task task, BlockingQueue<Task> connectlist, BlockingQueue<Task> persistencelist,
            List<String> domainlist) {
        this.connectlist = connectlist;
        this.task = task;
        this.persistencelist = persistencelist;
        this.domainlist = domainlist;
    }

    protected void handler() {
        try {
            parseTaskState(task);
            if (200 == task.getState()) {
                Matcher matcher = PATTERN.matcher(task.getContent());
                while (matcher.find()) {
                    String path = matcher.group();
                    if (!path.contains(" ") && !path.contains("\t") && !path.contains("(") && !path.contains(")")) {
                        // Strip the surrounding quotes.
                        path = path.substring(1, path.length() - 1);
                        createNewTask(task, path);
                    }
                }
            }
            // The page body is no longer needed once its links have been extracted.
            task.dropContent();
            persistencelist.put(task);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void parseTaskState(Task task) {
        // "HTTP/1.0 200 OK" and "HTTP/1.1 200 OK" both carry the status code at offsets 9 to 11.
        task.setState(Integer.parseInt(task.getContent().substring(9, 12)));
    }

    // Resolves a link found on fatherTask's page and queues it unless it is invalid or already seen.
    private void createNewTask(Task fatherTask, String path) throws InterruptedException {
        Task newTask = new Task();
        FilterChain filterchain = new FilterChain();
        filterchain.doFilter(fatherTask, newTask, path, filterchain);
        if (newTask.isValid()) {
            String url = newTask.getHost() + newTask.getCurrentPath();
            synchronized (SET) {
                // add() returns false if the URL was already queued.
                if (!SET.add(url)) {
                    return;
                }
            }
            connectlist.put(newTask);
        }
    }

    // Removes the last `count` path segments; the first one removed is the file name.
    private static String getPrefix(String s, int count) {
        String prefix = s;
        while (count > 0) {
            int slash = prefix.lastIndexOf('/');
            if (slash < 0) {
                break; // already at the root, so stop instead of throwing
            }
            prefix = prefix.substring(0, slash);
            count--;
        }
        return prefix.isEmpty() ? "/" : prefix;
    }

    // Joins a directory and a relative path with exactly one "/" between them.
    private static String join(String dir, String rest) {
        return dir.endsWith("/") ? dir + rest : dir + "/" + rest;
    }
}

// One URL to fetch, together with the state collected while fetching it.
class Task {
    private String host;
    private int port;
    private String currentPath;
    private String type;
    private long starttime;
    private long endtime;
    private StringBuilder content = new StringBuilder(2400);
    private int state;
    private boolean isValid = true;

    public Task() {
    }

    public Task(String host, int port, String path) {
        init(host, port, path);
    }

    public void init(String host, int port, String path) {
        this.setCurrentPath(path);
        this.host = host;
        this.port = port;
    }

    public long getStarttime() {
        return starttime;
    }

    public void setStarttime(long starttime) {
        this.starttime = starttime;
    }

    public long getEndtime() {
        return endtime;
    }

    public void setEndtime(long endtime) {
        this.endtime = endtime;
    }

    public boolean isValid() {
        return isValid;
    }

    public void setValid(boolean isValid) {
        this.isValid = isValid;
    }

    public int getState() {
        return state;
    }

    public void setState(int state) {
        this.state = state;
    }

    public String getCurrentPath() {
        return currentPath;
    }

    public void setCurrentPath(String currentPath) {
        this.currentPath = currentPath;
        // The path proper ends at the first '?' or '#', whichever comes first.
        int end = currentPath.length();
        int query = currentPath.indexOf('?');
        int fragment = currentPath.indexOf('#');
        if (query != -1) {
            end = query;
        }
        if (fragment != -1 && fragment < end) {
            end = fragment;
        }
        // The type is the extension after the last '.' of the file name, for example "html".
        int slash = currentPath.lastIndexOf('/', end - 1);
        int dot = currentPath.lastIndexOf('.', end - 1);
        if (dot > slash) {
            this.type = currentPath.substring(dot + 1, end);
        } else {
            // No extension: mark the task invalid so it is skipped rather than stored without a type.
            this.type = "";
            this.isValid = false;
        }
    }

    // Elapsed milliseconds between sending the request and receiving the complete response.
    public long getTaskTime() {
        return getEndtime() - getStarttime();
    }

    public String getType() {
        return type;
    }

    public void setType(String type) {
        this.type = type;
    }

    public String getHost() {
        return host;
    }

    public int getPort() {
        return port;
    }

    public String getContent() {
        return content.toString();
    }

    // Releases the page body; getContent() must not be called afterwards.
    public void dropContent() {
        this.content = null;
    }

    public void appendContent(String content) {
        this.content.append(content);
    }
}

// Stage 3: writes finished tasks to the database in batches.
class PersistenceHandler implements Runnable {
    private static final int BATCHSIZE = 500;
    private static final AtomicInteger COUNT = new AtomicInteger();

    static {
        try {
            Class.forName("oracle.jdbc.OracleDriver");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    private final BlockingQueue<Task> persistencelist;
    private final Connection conn;
    private final PreparedStatement ps;
    // Rows added to the current batch but not yet executed.
    private int pending;

    public static int GETCOUNT() {
        return COUNT.get();
    }

    public PersistenceHandler(BlockingQueue<Task> persistencelist) {
        this.persistencelist = persistencelist;
        try {
            // The Oracle thin URL needs '@' before the host.
            conn = DriverManager.getConnection("jdbc:oracle:thin:@127.0.0.1:1521:orcl", "edmond", "edmond");
            // Commit explicitly once per batch instead of once per statement.
            conn.setAutoCommit(false);
            ps = conn.prepareStatement(
                    "insert into probe(id,host,path,state,tasktime,type) values(seq_probe_id.nextval,?,?,?,?,?)");
        } catch (SQLException e) {
            // Fail fast: without a connection every later call would throw NullPointerException.
            throw new IllegalStateException("cannot open database connection", e);
        }
    }

    @Override
    public void run() {
        try {
            while (true) {
                // Wait up to one second for the next result; on timeout, flush whatever is pending.
                Task task = (Task) persistencelist.poll(1, java.util.concurrent.TimeUnit.SECONDS);
                try {
                    if (task != null) {
                        ps.setString(1, task.getHost());
                        ps.setString(2, task.getCurrentPath());
                        ps.setInt(3, task.getState());
                        ps.setLong(4, task.getTaskTime());
                        ps.setString(5, task.getType());
                        ps.addBatch();
                        pending++;
                        COUNT.incrementAndGet();
                    }
                    if (pending >= BATCHSIZE || (task == null && pending > 0)) {
                        ps.executeBatch();
                        conn.commit();
                        pending = 0;
                    }
                } catch (SQLException e) {
                    e.printStackTrace();
                    pending = 0;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

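The read loop in ResponseHandler decodes every received chunk on its own, so a multi-byte character whose bytes arrive in two separate reads cannot be decoded correctly. A streaming `CharsetDecoder` avoids this by leaving an incomplete trailing sequence in the input buffer until the rest arrives. The sketch below is a hedged illustration, not part of the original program: `ChunkDecoder` and its methods are hypothetical names, and the demo uses UTF-8 only because that charset is guaranteed to be available (the same calls apply to GBK where the JDK provides it).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decodes a byte stream that arrives in arbitrary chunks. */
final class ChunkDecoder {
    private final CharsetDecoder decoder;
    private final StringBuilder text = new StringBuilder();
    // Bytes of a character that was cut off at the end of the previous chunk.
    private ByteBuffer leftover = ByteBuffer.allocate(0);

    ChunkDecoder(Charset charset) {
        this.decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    /** Feeds one chunk. Pass endOfInput = true only for the last chunk; do not call again after that. */
    void feed(ByteBuffer chunk, boolean endOfInput) {
        // Prepend the bytes left over from the previous chunk.
        ByteBuffer in = ByteBuffer.allocate(leftover.remaining() + chunk.remaining());
        in.put(leftover).put(chunk).flip();
        CharBuffer out = CharBuffer.allocate((int) (in.remaining() * decoder.maxCharsPerByte()) + 1);
        // With endOfInput = false an incomplete trailing sequence stays unread in `in`.
        decoder.decode(in, out, endOfInput);
        if (endOfInput) {
            decoder.flush(out);
        }
        out.flip();
        text.append(out);
        leftover = in;
    }

    String text() {
        return text.toString();
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587abc"; // two 3-byte characters followed by ASCII
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        ChunkDecoder decoder = new ChunkDecoder(StandardCharsets.UTF_8);
        // Split in the middle of the first character.
        decoder.feed(ByteBuffer.wrap(bytes, 0, 2), false);
        decoder.feed(ByteBuffer.wrap(bytes, 2, bytes.length - 2), true);
        System.out.println(original.equals(decoder.text())); // prints true
    }
}
```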
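The program crawls roughly 170-200 pages per second.
That rate is limited by the company's bandwidth.
The CPU is also close to fully loaded.
There is still room for optimization. The main thing I have not figured out yet is how blocking and wake-up interact among the following calls: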
socketChannel.register(selector, SelectionKey.OP_READ, task);
int n = selector.select();
key.selector().wakeup();
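One commonly used arrangement for these three calls, offered here as a sketch under assumptions rather than as the fix for this program: `select()` blocks until a channel is ready, `wakeup()` is called, or the thread is interrupted, and `wakeup()` makes the current or the next `select()` return immediately. On older JDKs, `register()` is documented as possibly blocking while another thread is inside `select()` on the same selector (as far as I know this was relaxed in JDK 11). A pattern that works either way is to let only the selector thread call `register()`: other threads queue the request and then call `wakeup()`. The names `SelectorLoop`, `registerForRead` and `ReadHandler` are hypothetical, and the demo uses a `Pipe` instead of a socket so that it runs without a network.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: only the selector thread itself ever calls register(). */
final class SelectorLoop implements Runnable {
    interface ReadHandler {
        void onReadable(SelectionKey key) throws IOException;
    }

    private static final class Registration {
        final SelectableChannel channel;
        final Object attachment;
        Registration(SelectableChannel channel, Object attachment) {
            this.channel = channel;
            this.attachment = attachment;
        }
    }

    private final Selector selector = Selector.open();
    private final Queue<Registration> pending = new ConcurrentLinkedQueue<>();
    private final ReadHandler handler;

    SelectorLoop(ReadHandler handler) throws IOException {
        this.handler = handler;
    }

    /** Safe from any thread: queue the request first, then wake the selector thread. */
    void registerForRead(SelectableChannel channel, Object attachment) {
        pending.add(new Registration(channel, attachment));
        selector.wakeup();
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                selector.select(); // blocks until a channel is ready or wakeup() is called
                Registration r;
                while ((r = pending.poll()) != null) { // apply the queued registrations
                    r.channel.register(selector, SelectionKey.OP_READ, r.attachment);
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isValid() && key.isReadable()) {
                        handler.onReadable(key);
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try { selector.close(); } catch (IOException ignored) { }
        }
    }

    /** Registers a pipe from another thread, writes "ping", and returns what the loop read. */
    static String demo() {
        StringBuilder received = new StringBuilder();
        CountDownLatch done = new CountDownLatch(1);
        try {
            SelectorLoop loop = new SelectorLoop(key -> {
                ByteBuffer buf = ByteBuffer.allocate(64);
                if (((ReadableByteChannel) key.channel()).read(buf) > 0) {
                    buf.flip();
                    received.append(StandardCharsets.US_ASCII.decode(buf));
                    if (received.length() >= 4) done.countDown();
                }
            });
            Thread thread = new Thread(loop, "selector");
            thread.setDaemon(true);
            thread.start();
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            loop.registerForRead(pipe.source(), null); // the loop may already be blocked in select()
            pipe.sink().write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.US_ASCII)));
            boolean ok = done.await(5, TimeUnit.SECONDS);
            thread.interrupt(); // makes select() return so the loop can exit
            return ok ? received.toString() : null;
        } catch (IOException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the loop blocks in `select()` instead of polling with `selectNow()`, the selector thread stays idle when no channel is ready, which may also relieve some of the CPU load mentioned above.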
Source: ITPUB Blog, link: http://blog.itpub.net/29254281/viewspace-2134876/. If you repost this article, please credit the source; otherwise legal action may be taken.