題目網址:https://match.yuanrenxue.cn/match/19
解題步驟
-
看觸發的資料包。
-
有這麼好的事情,沒有加密的引數,url非常簡單,直接寫程式碼訪問。
import requests url = "https://match.yuanrenxue.cn/api/match/19?page=1" headers = {'Host': 'match.yuanrenxue.cn', 'Connection': 'keep-alive', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'sec-ch-ua-platform': '"Windows"', 'X-Requested-With': 'XMLHttpRequest', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'sec-ch-ua': '"Chromium";v="130", "Google Chrome";v="130", "Not?A_Brand";v="99"', 'sec-ch-ua-mobile': '?0', 'Sec-Fetch-Site': 'same-origin', 'Sec-Fetch-Mode': 'cors', 'Sec-Fetch-Dest': 'empty', 'Referer': 'https://match.yuanrenxue.cn/match/16', 'Accept-Encoding': 'gzip, deflate, br, zstd', 'Accept-Language': 'zh-CN,zh;q=0.9', 'Cookie': 'Hm_lvt_9bcbda9cbf86757998a2339a0437208e=1730505879; HMACCOUNT=5B8060E4DC36D34F; Hm_lvt_c99546cf032aaa5a679230de9a95c7db=1730505879; qpfccr=true; no-alert3=true; tk=-5370204167750759641; sessionid=39ahw8ftocq8eghmui7twey3qbw7lek8; Hm_lpvt_9bcbda9cbf86757998a2339a0437208e=1730505890; Hm_lpvt_c99546cf032aaa5a679230de9a95c7db=1730506377', } resp = requests.get(url, headers=headers) print(resp.text)
執行一下,發現啥也獲取不到。
-
再看資料包,是個http2.0協議,嘗試用
httpx
庫訪問。
import httpx url = "https://match.yuanrenxue.cn/api/match/19?page=1" headers = {'Host': 'match.yuanrenxue.cn', 'Connection': 'keep-alive', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'sec-ch-ua-platform': '"Windows"', 'X-Requested-With': 'XMLHttpRequest', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'sec-ch-ua': '"Chromium";v="130", "Google Chrome";v="130", "Not?A_Brand";v="99"', 'sec-ch-ua-mobile': '?0', 'Sec-Fetch-Site': 'same-origin', 'Sec-Fetch-Mode': 'cors', 'Sec-Fetch-Dest': 'empty', 'Referer': 'https://match.yuanrenxue.cn/match/16', 'Accept-Encoding': 'gzip, deflate, br, zstd', 'Accept-Language': 'zh-CN,zh;q=0.9', 'Cookie': 'Hm_lvt_9bcbda9cbf86757998a2339a0437208e=1730505879; HMACCOUNT=5B8060E4DC36D34F; Hm_lvt_c99546cf032aaa5a679230de9a95c7db=1730505879; qpfccr=true; no-alert3=true; tk=-5370204167750759641; sessionid=39ahw8ftocq8eghmui7twey3qbw7lek8; Hm_lpvt_9bcbda9cbf86757998a2339a0437208e=1730505890; Hm_lpvt_c99546cf032aaa5a679230de9a95c7db=1730506377', } client = httpx.Client(http2=True) resp = client.get(url, headers=headers) print(resp.text)
執行發現還是啥都獲取不到。
-
真無厘頭啊,嘗試用fiddler工具抓包。
也只有一個資料包,重放一下,發現還是可以正常獲取資料。
-
經過查資料,大概這裡是判斷了tls指紋。
tls指紋可參考https://developer.baidu.com/article/details/3348512
python中可以透過安裝
curl_cffi
庫來模擬瀏覽器的tls指紋(在請求時指定 impersonate 關鍵字引數即可)
安裝命令:pip install curl_cffi
-
編寫程式碼嘗試獲取第一頁的資料。
from curl_cffi import requests url = "https://match.yuanrenxue.cn/api/match/19?page=1" headers = {'Host': 'match.yuanrenxue.cn', 'Connection': 'keep-alive', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'sec-ch-ua-platform': '"Windows"', 'X-Requested-With': 'XMLHttpRequest', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'sec-ch-ua': '"Chromium";v="130", "Google Chrome";v="130", "Not?A_Brand";v="99"', 'sec-ch-ua-mobile': '?0', 'Sec-Fetch-Site': 'same-origin', 'Sec-Fetch-Mode': 'cors', 'Sec-Fetch-Dest': 'empty', 'Referer': 'https://match.yuanrenxue.cn/match/16', 'Accept-Encoding': 'gzip, deflate, br, zstd', 'Accept-Language': 'zh-CN,zh;q=0.9', 'Cookie': 'Hm_lvt_9bcbda9cbf86757998a2339a0437208e=1730505879; HMACCOUNT=5B8060E4DC36D34F; Hm_lvt_c99546cf032aaa5a679230de9a95c7db=1730505879; qpfccr=true; no-alert3=true; tk=-5370204167750759641; sessionid=39ahw8ftocq8eghmui7twey3qbw7lek8; Hm_lpvt_9bcbda9cbf86757998a2339a0437208e=1730505890; Hm_lpvt_c99546cf032aaa5a679230de9a95c7db=1730506377', } resp = requests.get(url, headers=headers, impersonate="chrome110") print(resp.text)
執行成功獲取到頁面資料。
-
看來我們的思路沒錯,完整爬蟲程式碼如下。
from curl_cffi import requests import re res_sum = 0 pattern = '{"value": (.*?)}' for i in range(1, 6): url = "https://match.yuanrenxue.cn/api/match/19?page={}".format(i) headers = {'Host': 'match.yuanrenxue.cn', 'Connection': 'keep-alive', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'sec-ch-ua-platform': '"Windows"', 'X-Requested-With': 'XMLHttpRequest', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'sec-ch-ua': '"Chromium";v="130", "Google Chrome";v="130", "Not?A_Brand";v="99"', 'sec-ch-ua-mobile': '?0', 'Sec-Fetch-Site': 'same-origin', 'Sec-Fetch-Mode': 'cors', 'Sec-Fetch-Dest': 'empty', 'Referer': 'https://match.yuanrenxue.cn/match/16', 'Accept-Encoding': 'gzip, deflate, br, zstd', 'Accept-Language': 'zh-CN,zh;q=0.9', 'Cookie': 'Hm_lvt_9bcbda9cbf86757998a2339a0437208e=1730505879; HMACCOUNT=5B8060E4DC36D34F; Hm_lvt_c99546cf032aaa5a679230de9a95c7db=1730505879; qpfccr=true; no-alert3=true; tk=-5370204167750759641; sessionid=39ahw8ftocq8eghmui7twey3qbw7lek8; Hm_lpvt_9bcbda9cbf86757998a2339a0437208e=1730505890; Hm_lpvt_c99546cf032aaa5a679230de9a95c7db=1730506377', } resp = requests.get(url, headers=headers, impersonate="chrome110") string = resp.text findall = re.findall(pattern, string) for item in findall: res_sum += int(item) print(res_sum)
執行得到結果。
-
提交結果,成功透過。