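I recently noticed that kennethreitz, the author of requests, had released a new library, requests-html, so I tried it out for practice. The library aims to make parsing HTML (for example, scraping web pages) as simple and intuitive as possible.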
Official documentation: http://html.python-requests.org/
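Let's scrape the results of NetEase's 11-choose-5 (11選5) lottery. First open the site, then use the browser's developer tools to locate the HTML that holds the results.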
```python
from requests_html import HTMLSession

session = HTMLSession()

def getData():
    response = session.get('http://caipiao.163.com/award/11xuan5/')
    # The results table sits inside <section class="main">
    content = response.html.find('section.main', first=True)
    tbodies = content.find('tbody')
    itemDicts = dict()
    for tbody in tbodies:
        # Each td.start cell carries the draw number and result as data-* attributes
        cells = tbody.find('td.start')
        for td in cells:
            try:
                period = td.attrs['data-period']
                award = td.attrs['data-award']
                print("No.: " + td.text + "  Draw: " + period + "  Winning numbers: " + award)
                itemDicts[period] = award
            except KeyError as e:
                # Draws not yet held have no data-award attribute
                print('except: ', e)
```
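The `td.start` selector matches `<td>` elements whose class list includes `start`, and the scraper reads two `data-*` attributes from each match. Below is a minimal offline sketch of the same idea that uses the standard library's `html.parser` in place of requests-html. The sample markup and the draw numbers in it are invented for illustration; only the class name `start` and the attribute names `data-period` / `data-award` come from the scraper.

```python
from html.parser import HTMLParser

# Invented markup: only the class "start" and the data-period / data-award
# attribute names come from the scraper; structure and values are assumed.
SAMPLE = """
<section class="main">
  <table><tbody>
    <tr>
      <td class="start" data-period="18031001" data-award="01 05 07 09 11">01</td>
      <td class="start" data-period="18031002">02</td>
    </tr>
  </tbody></table>
</section>
"""


class StartCellParser(HTMLParser):
    """Collect the attributes of every <td> whose class list contains 'start'."""

    def __init__(self):
        super().__init__()
        self.cells = []  # one dict of attributes per matching <td>

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # A valueless attribute comes through as None, hence the `or ""`
        classes = (attr_map.get("class") or "").split()
        if tag == "td" and "start" in classes:
            self.cells.append(attr_map)


parser = StartCellParser()
parser.feed(SAMPLE)
for cell in parser.cells:
    # The second cell has no data-award, so .get() returns None for it
    print(cell.get("data-period"), cell.get("data-award"))
```

The second sample cell mimics a draw that has not been held yet: it has a draw number but no result, which is exactly the case the `try...except KeyError` in the scraper guards against.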
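Some draws have not been held yet and have no winning numbers, so the attribute lookup is wrapped in try...except. The page also lays the results out as a table, so they are not collected in draw order; we need to sort them by draw number.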
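Instead of catching `KeyError`, the pending draws can also be skipped with `dict.get`. This sketch assumes `td.attrs` behaves like a plain dict (requests-html documents it as returning a dictionary of the element's attributes); `collect_awards` is a hypothetical helper name and the sample cells are made up.

```python
def collect_awards(cells):
    """Map draw number -> winning numbers, skipping draws with no result yet.

    `cells` is assumed to be a list of attribute dicts, such as the
    value of `td.attrs` for each matching cell.
    """
    awards = {}
    for attrs in cells:
        period = attrs.get("data-period")
        award = attrs.get("data-award")
        if period is None or award is None:
            continue  # pending draw: no winning numbers published yet
        awards[period] = award
    return awards


# Made-up sample cells
cells = [
    {"data-period": "18031002"},  # pending draw, no data-award
    {"data-period": "18031001", "data-award": "01 05 07 09 11"},
]
print(collect_awards(cells))  # {'18031001': '01 05 07 09 11'}
```

Both approaches skip the same cells; `.get` just makes "missing means not drawn yet" explicit instead of routing it through an exception handler.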
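```python
# Still inside getData(), after the loop: sort the results by draw number
sortItemDict = sorted(itemDicts.keys(), reverse=False)
# print(sortItemDict)
for key in sortItemDict:
    print("Draw:", key, " Winning numbers:", itemDicts[key])
```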
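`sorted(itemDicts.keys())` compares the draw numbers as strings. That yields the correct order only when every draw number is a digit string of the same width, which appears to hold for this page but is an assumption. Sorting with `key=int` removes that dependency. The data below is made up.

```python
# Made-up draw numbers and results
awards = {
    "18031010": "02 04 06 08 10",
    "18031002": "01 03 05 07 09",
    "18031001": "01 05 07 09 11",
}

# Plain string sort: correct here only because every key has the same width
by_string = sorted(awards)

# Numeric sort: still correct if widths differ (assumes keys are all digits)
by_number = sorted(awards, key=int)

for period in by_number:
    print("Draw:", period, " Winning numbers:", awards[period])

# When widths differ, the two orders diverge
print(sorted(["9", "10"]))           # ['10', '9']
print(sorted(["9", "10"], key=int))  # ['9', '10']
```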
The final output:
Complete code (requests-html saved quite a bit of work here, since you can call `find` on elements directly):

```python
from requests_html import HTMLSession

session = HTMLSession()


def getData():
    response = session.get('http://caipiao.163.com/award/11xuan5/')
    # The results table sits inside <section class="main">
    content = response.html.find('section.main', first=True)
    tbodies = content.find('tbody')
    itemDicts = dict()
    for tbody in tbodies:
        # Each td.start cell carries the draw number and result as data-* attributes
        cells = tbody.find('td.start')
        for td in cells:
            try:
                period = td.attrs['data-period']
                award = td.attrs['data-award']
                print("No.: " + td.text + "  Draw: " + period + "  Winning numbers: " + award)
                itemDicts[period] = award
            except KeyError as e:
                # Draws not yet held have no data-award attribute
                print('except: ', e)
    # Sort by draw number so the results print in draw order
    sortItemDict = sorted(itemDicts.keys(), reverse=False)
    # print(sortItemDict)
    for key in sortItemDict:
        print("Draw:", key, " Winning numbers:", itemDicts[key])


if __name__ == '__main__':
    getData()
```