Crawling the web page source of foreign-language industrial technology journals (for personal use)

Published by 右介 on 2018-03-12

Original URL: https://www.cnblogs.com/zhangtianyuan/p/8547559.html

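# coding=utf-8
import datetime

import requests
from lxml import etree
from pymongo import MongoClient

# The original post called db.authenticate("", "") with blank credentials.
# Database.authenticate() was removed in PyMongo 4; if the server requires
# authentication, pass username=... and password=... to MongoClient instead.
client = MongoClient("localhost", 27017)

db = client["wanfang"]

collection = db["journal_name"]           # journal names and hit counts (input)
collection1 = db["journal_foreign_2014"]  # fetched page source (output)

# The second document in journal_name supplies name_list and number_list.
cursor = collection.find()[1]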

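for i in range(2645):  # 2645 is hard-coded in the original, presumably the length of name_list
    name = cursor['name_list'][i]

    # Each number_list entry wraps the hit count in one character on each side; strip both.
    num = int(cursor['number_list'][i][1:-1])
    # Pages of 50 results, rounded up. Floor division keeps count an int on Python 3.
    count = (num + 49) // 50

    # Use a separate name for the page number so it does not shadow the journal index i.
    for page in range(1, count + 1):

        url = "http://new.wanfangdata.com.cn/search/searchList.do?searchType=perio&pageSize=50&page=" + str(page) + u"&searchWord= 摘要:is 起始年:2014 結束年:2014 刊名:" + name + "&order=correlation&showType=detail&isCheck=check&isHit=&isHitUnit=&firstAuthor=false&rangeParame=all"

        result = requests.post(url)
        html = result.text
        tree = etree.HTML(html)
        # href of the element immediately following <strong> inside each title div.
        table = tree.xpath("//div[@class='title']/strong/following-sibling::*[1]/@href")

        for j in table:
            bson = {}
            url1 = "http://new.wanfangdata.com.cn" + j
            # Request the detail page (url1); the original requested the list page (url) again.
            result1 = requests.post(url1)
            html1 = result1.text
            bson['date'] = datetime.datetime.now()
            bson['url'] = url1
            bson['html'] = html1
            bson['year'] = "2014"
            # insert_one replaces Collection.insert, which PyMongo 4 removed.
            collection1.insert_one(bson)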
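
The script's pagination arithmetic can be checked in isolation, without touching MongoDB or the network. The following is a minimal sketch, assuming the entries in `number_list` are strings such as "(120)" (the script strips the first and last character before converting to an integer). The helper names `parse_count` and `page_count` are hypothetical and do not appear in the original script.

```python
PAGE_SIZE = 50  # matches pageSize=50 in the search URL


def parse_count(raw):
    """Drop one wrapping character from each end and convert the rest to int."""
    return int(raw[1:-1])


def page_count(total, page_size=PAGE_SIZE):
    """Number of result pages needed for `total` hits (ceiling division)."""
    return (total + page_size - 1) // page_size


if __name__ == "__main__":
    # Assumed sample values, not data from the original post.
    for raw in ["(0)", "(50)", "(51)", "(120)"]:
        print(raw, page_count(parse_count(raw)))
```

Floor division (`//`) is what keeps the result an integer on Python 3, where `num/50` would yield a float and `range()` would reject it.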
