Using a Java crawler
Caused by: java.lang.RuntimeException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
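This error is caused by a certificate problem: the JVM could not build a trust chain from the site's certificate to any CA in its truststore.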
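To see what the JVM compares the server's chain against, you can list the trust anchors of the default trust manager. This is a minimal diagnostic sketch; the class name `DefaultTrustStoreInfo` is made up for illustration. It assumes the standard behavior that `TrustManagerFactory.init((KeyStore) null)` loads the default truststore (normally `$JAVA_HOME/lib/security/cacerts`, unless `javax.net.ssl.trustStore` is set).

```java
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

public class DefaultTrustStoreInfo {

    // Returns the CA certificates that the JVM's default X509 trust manager accepts.
    // A null KeyStore makes the factory fall back to the default truststore.
    public static X509Certificate[] defaultTrustAnchors() throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init((KeyStore) null);
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                return ((X509TrustManager) tm).getAcceptedIssuers();
            }
        }
        return new X509Certificate[0];
    }

    public static void main(String[] args) throws Exception {
        X509Certificate[] anchors = defaultTrustAnchors();
        System.out.println("Default truststore has " + anchors.length + " trusted CAs");
        // Print a few subjects; the site's issuing CA (or its root) must chain to one of these
        for (int i = 0; i < Math.min(5, anchors.length); i++) {
            System.out.println("  " + anchors[i].getSubjectX500Principal().getName());
        }
    }
}
```

If the site's root CA (or the intermediate it failed to send) is not reachable from this list, `PKIX path building failed` is the expected result.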
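Workaround: when using HttpURLConnection to send requests over HTTPS, make it trust the server's SSL certificate (and skip the host name check).
Reference: https://blog.csdn.net/zz153417230/article/details/80271155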
package novel.spider.impl;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.TrustManager;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import novel.spider.entitys.Chapter;
import novel.spider.interfaces.IChapterSpider;
import novel.spider.junit.MyX509TrustManager;
public class AbstractChapterSpider implements IChapterSpider {
    protected String crawl(String url) throws Exception {
        // First argument: protocol; second: provider (optional, the default provider is used if omitted)
        SSLContext sslcontext = SSLContext.getInstance("SSL", "SunJSSE");
        TrustManager[] tm = {new MyX509TrustManager()};
        sslcontext.init(null, tm, new SecureRandom());
        // Only consulted when the host name does not match the certificate; returning true accepts it anyway
        HostnameVerifier ignoreHostnameVerifier = new HostnameVerifier() {
            public boolean verify(String s, SSLSession sslsession) {
                System.out.println("WARNING: Hostname is not matched for cert.");
                return true;
            }
        };
        // These static setters change the defaults for every HttpsURLConnection created afterwards in this JVM
        HttpsURLConnection.setDefaultHostnameVerifier(ignoreHostnameVerifier);
        HttpsURLConnection.setDefaultSSLSocketFactory(sslcontext.getSocketFactory());
        URL url2 = new URL(url);
        HttpURLConnection conn = (HttpURLConnection) url2.openConnection();
        // Note: conn.getContentEncoding() returns the Content-Encoding header (e.g. gzip), not the charset,
        // so the page is decoded as UTF-8 here.
        StringBuilder resp = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream
        try (InputStream in = conn.getInputStream();
             BufferedReader breader = new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
            String line;
            while ((line = breader.readLine()) != null) {
                resp.append(line).append('\n');
            }
        }
        return resp.toString();
    }
    @Override
    public List<Chapter> getsChapter(String url) {
        try {
            String result = crawl(url);
            // Pass the page URL as the base URI so relative links can be resolved
            Document doc = Jsoup.parse(result, url);
            //System.err.println(doc);
            Elements as = doc.select("div li a");
            List<Chapter> chapters = new ArrayList<>();
            for (Element a : as) {
                Chapter chapter = new Chapter();
                chapter.setTitle(a.text());
                // Resolve the link against the crawled page instead of a hard-coded host
                chapter.setUrl(a.absUrl("href"));
                chapters.add(chapter);
            }
            return chapters;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
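In `crawl` the charset is fixed to UTF-8; the commented-out `conn.getContentEncoding()` line would not help, because that method returns the Content-Encoding header (for example `gzip`), not the character set. The charset is a parameter of the Content-Type header, available via `conn.getContentType()`. A minimal sketch, assuming a hypothetical helper class `ContentTypeCharset`, could look like this:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=gbk". Returns the fallback when the header is null,
    // has no charset parameter, or names an illegal/unsupported charset.
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers IllegalCharsetNameException and UnsupportedCharsetException
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=GBK", StandardCharsets.UTF_8));
        System.out.println(charsetOf("text/html", StandardCharsets.UTF_8));
    }
}
```

In `crawl` it would be used as `new InputStreamReader(in, ContentTypeCharset.charsetOf(conn.getContentType(), StandardCharsets.UTF_8))`. Pages that declare their charset only in a `<meta>` tag would still need separate handling.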
package novel.spider.junit;
import java.util.List;
import org.junit.Test;
import novel.spider.entitys.Chapter;
import novel.spider.impl.DefaultChapterSpider;
import novel.spider.interfaces.IChapterSpider;
public class Testcase {
    @Test
    public void test1() throws Exception {
        IChapterSpider spider = new DefaultChapterSpider();
        List<Chapter> chapters = spider.getsChapter("https://www.266ks.com/0_5/");
        for (Chapter chapter : chapters) {
            System.out.println(chapter);
        }
    }
}
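The spider imports `novel.spider.junit.MyX509TrustManager`, but its source is not shown. The sketch below assumes it is the usual accept-everything `X509TrustManager`; this is a reconstruction, not the author's code, and the package line is omitted so the file compiles on its own.

```java
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Assumed reconstruction: trusts every certificate chain without checking it.
public class MyX509TrustManager implements X509TrustManager {

    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType) {
        // Intentionally empty: any client certificate is accepted.
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType) {
        // Intentionally empty: any server certificate is accepted, which is what
        // suppresses the "PKIX path building failed" exception.
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        // No specific issuers are advertised
        return new X509Certificate[0];
    }

    public static void main(String[] args) throws Exception {
        // Same initialization pattern as in crawl()
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, new TrustManager[] {new MyX509TrustManager()}, new SecureRandom());
        System.out.println("SSLContext protocol: " + ctx.getProtocol());
    }
}
```

To limit the effect to a single request instead of changing the JVM-wide defaults, the socket factory and hostname verifier can be set on one connection with `HttpsURLConnection#setSSLSocketFactory` and `HttpsURLConnection#setHostnameVerifier`.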