Scraping Data with BeautifulSoup

Posted by xushaobo on 2018-01-05

Install BeautifulSoup 4 either from the system package manager or from pip (the pip package for bs4 is beautifulsoup4, not the legacy BeautifulSoup package):

sudo apt-get install python3-bs4
pip install beautifulsoup4
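After installing, a quick check (a minimal sketch, not part of the original script) confirms that the bs4 package imports and parses HTML:

from bs4 import BeautifulSoup

# parse a tiny snippet and pull out the text; should print "hello"
print(BeautifulSoup('<p>hello</p>', 'html.parser').p.get_text())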

import urllib.request
from bs4 import BeautifulSoup
import re
from math import ceil
import time

def qiyeinfo(picurl):
    """Fetch one company's detail page and append its info to a text file."""
    time.sleep(1)  # throttle requests so we don't hammer the site
    info = {}
    qiyeid = picurl.split('/')[-2]  # company id embedded in the URL
    picurl = picurl + 'company_detail.html'
    useragent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
    headers = {'User-Agent': useragent}
    req = urllib.request.Request(picurl, headers=headers)
    html1 = urllib.request.urlopen(req, timeout=5)
    # the site serves GB-encoded pages, so tell BeautifulSoup to decode as gb18030
    bsObj = BeautifulSoup(html1, 'html.parser', from_encoding='gb18030')
    html1.close()
    try:
        qiyedata = bsObj.find('div', {'class': 'data'})  # <div class="data"> holds the company fields
        tel = bsObj.find('div', {'class': 'telephone'}).get_text()
        qiyename = qiyedata.p.get_text()
        contactsname = bsObj.findAll('div', {'class': 'l-content'})[1].a.get_text()
        with open(r'F:\test.txt', 'a+') as f:
            f.write('企業url: ' + picurl + '\n')
            f.write('企業名稱:' + qiyename + '\n')
            f.write('聯絡人:' + contactsname + '\n')
            f.write('手機: ' + tel + '\n')
            for i in qiyedata.find('ul').findAll('li'):
                f.write(i.get_text() + '\n')
            f.write('\n')
    except Exception:
        # skip company pages whose layout doesn't match the selectors above
        pass

def qiyelist(picurl):
    """Walk the paginated listing and hand each company URL to qiyeinfo()."""
    useragent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
    headers = {'User-Agent': useragent}
    req = urllib.request.Request(picurl, headers=headers)
    html = urllib.request.urlopen(req, timeout=10)
    bsObj = BeautifulSoup(html, 'html.parser', from_encoding='gb18030')
    html.close()
    listnum = bsObj.find('div', {'class': 'tit tit2'}).em.get_text()  # total number of listings
    a = int(listnum) / len(bsObj.findAll('h4'))  # rough page count; unused, the range below is hardcoded
    for page in range(15, 25):  # only listing pages 15-24 are fetched here
        listurl = '%s/pn%s' % (picurl, page)  # listing pages are paginated as .../pnN
        req = urllib.request.Request(listurl, headers=headers)
        html = urllib.request.urlopen(req, timeout=5)
        bsObj = BeautifulSoup(html, 'html.parser', from_encoding='gb18030')
        html.close()
        for h4 in bsObj.findAll('h4'):  # each <h4> links to one company detail page
            qiyeurl = h4.a.attrs['href']
            qiyeinfo(qiyeurl)

if __name__ == '__main__':
    qiyelist('http://b2b.huangye88.com/jiangxi/food')
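The scraping above boils down to a handful of BeautifulSoup calls: find() locates the first tag matching a class, findAll() collects every match, get_text() extracts the visible text, and attrs exposes attributes such as href. The following self-contained sketch shows the same pattern on an inline HTML string; the markup and the example.com URLs are made up for illustration and are not taken from huangye88.com:

from bs4 import BeautifulSoup

# hypothetical markup standing in for one listing page
sample = '''
<div class="tit tit2"><em>2</em></div>
<h4><a href="http://example.com/company/1001/">Company A</a></h4>
<h4><a href="http://example.com/company/1002/">Company B</a></h4>
'''

soup = BeautifulSoup(sample, 'html.parser')

# find() returns the first matching tag; get_text() strips the markup
total = soup.find('div', {'class': 'tit tit2'}).em.get_text()
print('total listings:', total)

# findAll() returns every match; attrs holds the tag's attributes
for h4 in soup.findAll('h4'):
    print(h4.a.get_text(), '->', h4.a.attrs['href'])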
