Python crawler: fetching information on the academicians of the Chinese Academy of Engineering

Posted by Jocks5 on 2021-12-04

Original URL: https://blog.csdn.net/Jocks5/article/details/121716308

import re
import os
from urllib.request import urlopen

dstDir = 'YuanShi'
if not os.path.isdir(dstDir):
    os.mkdir(dstDir)

startUrl = r'http://www.cae.cn/cae/html/main/col48/column_48_1.html'
with urlopen(startUrl) as fp:
    content = fp.read().decode()

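# Extract the link to each academician's page, then visit them one by one
# (non-greedy groups keep each match inside a single <li> element)
pattern = r'<li class="name_list"><a href="(.+?)" target="_blank">(.+?)</a></li>'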
result = re.findall(pattern, content)
for item in result:
    perUrl, name = item
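    # Debug output: confirm that the link was captured
    print(perUrl)
    # Refinement based on the first crawl: some names are wrapped in <h3> tags
    name = name.replace('<h3>', '').replace('</h3>', '')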
    name = os.path.join(dstDir, name)
    perUrl = r'http://www.cae.cn/' + perUrl
    with urlopen(perUrl) as fp:
        content = fp.read().decode()

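    # Extract the biography paragraphs
    introPattern = r'<p>(.+?)</p>'
    paragraphs = re.findall(introPattern, content)  # findall returns every match as a list of strings
    if paragraphs:
        # Strip embedded links and the &ensp; / &nbsp; space entities
        intro = re.sub(r'(<a.+?</a>)|(&ensp;)|(&nbsp;)', '', '\n'.join(paragraphs))
        with open(name + '.txt', 'w', encoding='utf8') as fp:
            fp.write(intro)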
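
The two regex steps in the script (pulling the name links out of the index page, then cleaning the biography paragraphs) can be tried offline before hitting the site. The following is a minimal sketch, not part of the original post: the helper names `extract_links` and `clean_intro` are hypothetical, and the sample markup and relative URLs are made up to match the shape the patterns expect, so the real pages may differ.

```python
import re

# Same shapes as the patterns used in the script, compiled once.
LINK_PATTERN = re.compile(
    r'<li class="name_list"><a href="(.+?)" target="_blank">(.+?)</a></li>'
)
PARAGRAPH_PATTERN = re.compile(r'<p>(.+?)</p>')
NOISE_PATTERN = re.compile(r'<a.+?</a>|&ensp;|&nbsp;')


def extract_links(page):
    """Return (relative_url, name) pairs, with <h3> tags stripped from names."""
    return [
        (url, name.replace('<h3>', '').replace('</h3>', ''))
        for url, name in LINK_PATTERN.findall(page)
    ]


def clean_intro(page):
    """Join all <p> paragraphs and drop embedded links and space entities."""
    paragraphs = PARAGRAPH_PATTERN.findall(page)
    return NOISE_PATTERN.sub('', '\n'.join(paragraphs))


# Made-up sample markup (assumption): two index entries and one profile page.
sample_index = (
    '<li class="name_list"><a href="cae/html/main/colys/1.html" '
    'target="_blank">Zhang San</a></li>\n'
    '<li class="name_list"><a href="cae/html/main/colys/2.html" '
    'target="_blank"><h3>Li Si</h3></a></li>'
)
sample_profile = (
    '<p>&ensp;&ensp;Born in 1950.&nbsp;<a href="#">more</a></p>\n'
    '<p>Elected in 2001.</p>'
)

print(extract_links(sample_index))
print(clean_intro(sample_profile))
```

Running the patterns against a saved copy of a page in this way makes it easier to see when the site's markup has changed, since an empty result from `extract_links` points at the pattern rather than at the network.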
