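Python Web Scraping in Practice: bilibili

Posted by SilenceHL on 2021-04-04

Original URL: https://learnku.com/articles/55954

Python web scraping

Note: everything below reflects my own understanding. If you spot a mistake or have a question, feel free to contact me so we can work through it together.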

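About the Scraper

About the Site

The target this time is bilibili, a well-known video site in China built around bullet comments (danmaku). It carries new anime as it airs, has an active ACG community and creative uploaders (Up主), and there is a lot of fun to be found there.

Why Write This Scraper

bilibili has grown from the "small broken site" (小破站) of its early days into a hugely popular, diversified community site. I am scraping it here as a representative case, to show one way of approaching the various kinds of CAPTCHA you may run into.

For sites like this there is actually a much simpler route: log in by hand beforehand, copy the cookie, and then send your requests to the target pages with that cookie. This works well for personal scrapers and saves coding time. In a company setting, however, you may need to scrape on behalf of many accounts, and logging in to each one by hand to collect its cookie becomes tedious. Automating the login with Selenium is far more efficient in that case.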
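The manual-cookie shortcut can be sketched as follows. This is only a minimal sketch, assuming you have copied the value of the `Cookie` request header from your browser's developer tools after logging in. The helper name `parse_cookie_string` and the cookie names and values in the example are hypothetical placeholders, not bilibili's real ones.

```python
def parse_cookie_string(raw):
    """Turn a 'name1=value1; name2=value2' header string into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        # partition splits on the first "=" only, so values may contain "="
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    # Placeholder string standing in for what the browser shows
    raw = "session_id=abc123; token=x=y"
    print(parse_cookie_string(raw))
```

The resulting dict can then be handed to `requests.get(url, cookies=...)` for each page you want to fetch.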

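Selenium

Overview

As the official tagline puts it, "Selenium automates browsers. That's it!" It is a tool that drives a real browser automatically and can imitate what a person would do in it.

Tutorial

I recommend learning from the Selenium Chinese site (Selenium中文網); it is very thorough.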

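CAPTCHA Analysis

Slider CAPTCHAs

bilibili's previous CAPTCHA was a slider. The main idea is to find the gap and determine its coordinates, then use Selenium to drag the slider to that position. Most Alibaba properties, such as Fliggy (飛豬), Taobao, and Tmall, use something similar, although they do not ask for verification every time, so you have to handle it case by case.

For this kind, you find the position of the right-hand end and simply slide there.

For this kind, you first locate the whole image, then find the outline of the gap before sliding. Both are variations on the same idea.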
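The "find the gap, then slide" idea can be sketched in plain Python. This is an illustration under assumptions, not the method any particular site requires: it assumes you can obtain both the complete background and the background with the gap as same-sized grayscale pixel grids, and the function names `find_gap_x` and `build_track` and the default threshold are hypothetical.

```python
def find_gap_x(full, gapped, threshold=60, start_x=0):
    """Return the first column where the two images differ noticeably.

    `full` and `gapped` are same-sized 2D lists of grayscale values
    (0-255), indexed as image[y][x]. Returns None if no column differs.
    """
    height, width = len(full), len(full[0])
    for x in range(start_x, width):
        for y in range(height):
            if abs(full[y][x] - gapped[y][x]) > threshold:
                return x
    return None


def build_track(distance):
    """Split a drag distance into steps that start large and end small.

    The steps always add up to `distance`, so the slider ends on target.
    """
    track = []
    remaining = distance
    while remaining > 0:
        step = max(1, remaining // 2)
        track.append(step)
        remaining -= step
    return track


if __name__ == "__main__":
    full = [[100] * 6 for _ in range(3)]
    gapped = [row[:] for row in full]
    gapped[1][4] = gapped[1][5] = 0  # pretend the gap starts at x = 4
    gap_x = find_gap_x(full, gapped)
    print(gap_x, build_track(gap_x))
```

With Selenium, each step of the track could be replayed through `ActionChains` using `click_and_hold` on the slider, one `move_by_offset(step, 0)` per step, and a final `release`.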

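Type-the-Answer CAPTCHAs

Examples include Eastmoney's online trading site (東方財富網上交易), bigquant, and others. This kind is relatively easy: download the image, preprocess it as the particular CAPTCHA requires, and hand it to the OCR service of one of the major cloud providers. They all offer a free trial quota, so pick one according to your needs and preferences, or try several and compare the results.

Providers: Baidu, Tencent, Alibaba, Youdao Zhiyun (有道智雲)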
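The preprocessing step before OCR depends on the CAPTCHA, but a common first pass is to binarize the image and drop stray noise pixels. The sketch below works on a grayscale pixel grid in plain Python; the function names, the threshold of 128, and the 4-neighbour noise rule are all assumptions chosen for illustration.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel to white (255) or black (0)."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


def remove_isolated_pixels(binary):
    """Turn black pixels white when every 4-neighbour is white."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 0:
                continue
            neighbours = [
                binary[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
            ]
            if all(value == 255 for value in neighbours):
                cleaned[y][x] = 255
    return cleaned


if __name__ == "__main__":
    gray = [
        [200, 200, 200, 200],
        [200, 30, 200, 200],   # a lone dark speck
        [200, 200, 40, 50],    # two dark pixels side by side
        [200, 200, 200, 200],
    ]
    for row in remove_isolated_pixels(binarize(gray)):
        print(row)
```

The lone speck is removed while the two adjacent dark pixels survive, which is the behaviour you want when thin noise dots sit next to thicker character strokes.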

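Click-on-the-Image CAPTCHAs

This kind has become much more common. The difficulty is that it varies a lot: the targets are not limited to Chinese characters and digits and may also be pictures. You can work out your own solution, but it becomes painful as soon as the site changes its strategy. A practical alternative is to use a CAPTCHA-solving platform to recognize the content and then act on what it returns.

Examples include 易雲打碼, 快識別, 斐斐打碼, and others.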

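Analyzing the bilibili Login

bilibili's latest CAPTCHA belongs to the third kind. Clicking the login button brings up a CAPTCHA box. We need to download its image, send it to a solving platform for recognition, get the coordinates back, and then click those positions with Selenium.

[Figure: the bilibili CAPTCHA]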
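Two small pieces of this flow can be checked in isolation: pulling the image URL out of the element's inline style, and turning the platform's reply into click points. This is a minimal sketch, assuming the style contains a `url("...")` entry and the reply looks like `x1,y1|x2,y2`. The helper names and the example URL are hypothetical.

```python
import re


def extract_image_url(style):
    """Return the URL inside url("...") in an inline style, or None."""
    match = re.search(r'url\("(.*?)"\)', style)
    return match.group(1) if match else None


def parse_click_points(result):
    """Convert 'x1,y1|x2,y2' into [(x1, y1), (x2, y2)] with integer values."""
    points = []
    for chunk in result.split("|"):
        x, y = chunk.split(",")
        points.append((int(x), int(y)))
    return points


if __name__ == "__main__":
    style = 'background-image: url("https://example.com/captcha.jpg");'
    print(extract_image_url(style))
    print(parse_click_points("10,20|30,40"))
```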

Writing the Code

Simulating the Login with Selenium

import re
import time
import base64
import requests
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


class Bilibili(object):
    def __init__(self):
        chrome_options = Options()
        # Uncomment to run without a browser window
        # chrome_options.add_argument('--headless')
        # Selenium 4 takes the driver path through a Service object
        self.driver = webdriver.Chrome(service=Service('./chromedriver'), options=chrome_options)

    def login(self, username, password):
        # Open the login page
        self.driver.get("https://passport.bilibili.com/login")
        # Enter the username and password
        self.driver.find_element(By.ID, 'login-username').send_keys(username)
        self.driver.find_element(By.ID, 'login-passwd').send_keys(password)
        # Click the login button
        self.driver.find_element(By.CLASS_NAME, 'btn-login').click()
        # Wait for the CAPTCHA to appear
        # self.driver.implicitly_wait(10)
        time.sleep(5)
        # The image URL is stored in the element's inline style
        img = self.driver.find_element(By.CLASS_NAME, 'geetest_tip_img')
        img_style = img.get_attribute('style')
        # Extract the image URL with a regular expression
        url = re.findall(r'url\("(.*?)"\)', img_style)[0]
        # Download the image with requests
        response = requests.get(url).content
        # Save the image locally
        with open('./captcha.png', 'wb') as f:
            f.write(response)
        # Send the image to the solving platform for recognition
        result = self.captcha_recognition()
        # Recognition succeeded
        if result != "":
            # Selenium 4.3+ measures offsets from the element's center, so shift
            # the coordinates (assumed here to be relative to the top-left corner)
            half_w = img.size['width'] // 2
            half_h = img.size['height'] // 2
            # The result looks like "x1,y1|x2,y2": one point per click
            for point in result.split('|'):
                x, y = point.split(',')
                # Move to each point on the image element (not its style string) and click
                ActionChains(self.driver).move_to_element_with_offset(
                    img, int(x) - half_w, int(y) - half_h).click().perform()
            # Click the confirm button
            self.driver.find_element(By.CLASS_NAME, 'geetest_commit').click()
            # Give the login a moment to finish before reading cookies (the delay is a guess)
            time.sleep(3)
        # Collect the cookies as "name=value" strings
        cookie = [item["name"] + "=" + item["value"] for item in self.driver.get_cookies()]
        self.driver.close()
        return cookie

    def captcha_recognition(self):
        """Send the saved CAPTCHA image to the solving platform and return its result."""
        # Account on the solving platform (placeholders)
        username = 'username'
        password = 'password'
        with open('./captcha.png', 'rb') as f:
            b64 = base64.b64encode(f.read()).decode()
        data = {"username": username, "password": password, "typeid": 27, "image": b64}
        result = requests.post("http://api.ttshitu.com/predict", json=data).json()
        if result['success']:
            return result["data"]["result"]
        else:
            print(result["message"])
            return ""
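Once `login` has returned its list of `name=value` strings, the list still has to be attached to later requests. The sketch below shows one way to do that with the standard library only; the helper names, the example URL, and the cookie values are hypothetical, and nothing is actually sent over the network.

```python
import urllib.request


def cookies_to_header(cookie_pairs):
    """Join ['a=1', 'b=2'] into the single string a Cookie header expects."""
    return "; ".join(cookie_pairs)


def build_request(url, cookie_pairs):
    """Prepare (but do not send) a request that carries the login cookies."""
    return urllib.request.Request(url, headers={"Cookie": cookies_to_header(cookie_pairs)})


if __name__ == "__main__":
    # Placeholder list in the same shape that login() returns
    pairs = ["session_id=abc123", "token=xyz"]
    request = build_request("https://example.com/profile", pairs)
    print(request.get_header("Cookie"))
```

The same header string also works with requests, for example `requests.get(url, headers={"Cookie": cookies_to_header(pairs)})`.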

This work is licensed under a CC license (《CC 協議》); reposts must credit the author and link to this article.
