The selenium Library in Python

Posted by Kun on 2022-02-28

Original URL: https://www.cnblogs.com/liuzhongkun/p/15947920.html

selenium Basic Syntax

I. Environment Setup

1. Installing the Environment

Install the third-party selenium package:

pip install selenium

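Download a browser driver:

Firefox driver: geckodriver
Chrome driver: chromedriver (a taobao mirror is available as a backup)
IE driver: IEDriverServer
Edge driver: MicrosoftWebDriver
Opera driver: operadriver
PhantomJS driver: phantomjs

Put the driver executable in the Scripts folder of your Python installation, or in any other directory on the PATH.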

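2. Configuration Options

Each time selenium launches Chrome, the browser starts clean, with no extensions, bookmarks, or history. To keep startup as fast as possible, selenium launches a bare browser. Sometimes a bare browser is not enough, and that is when configuration options are needed.

selenium accepts startup options through the ChromeOptions class, which is created as follows: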

from selenium import webdriver
option = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=option)  # options= replaces the deprecated chrome_options= keyword

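After creating the ChromeOptions object, add settings to it. There are a few dedicated methods, one for each kind of setting:

from selenium import webdriver
option = webdriver.ChromeOptions()

# Add a command-line startup argument
option.add_argument('--headless')

# Add an extension, either from a .crx file path or from a Base64-encoded string
# option.add_extension(path_to_crx_file)
# option.add_encoded_extension(base64_encoded_extension)

# Add an experimental option (name, value)
option.add_experimental_option('excludeSwitches', ['enable-automation'])

# Set the debugger address; this is a property, not a method
# option.debugger_address = '127.0.0.1:9222'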

Commonly used options:

from selenium import webdriver
options = webdriver.ChromeOptions()

# Set the User-Agent
options.add_argument('--user-agent=MQQBrowser/26 Mozilla/5.0 (Linux; U; Android 2.3.7; zh-cn; MB200 Build/GRJ22; CyanogenMod-7) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1')

# Set the browser window size as width,height
options.add_argument('--window-size=1920,3000')

# Google's documentation mentions adding this flag to work around a bug
options.add_argument('--disable-gpu')

# Hide scrollbars, useful for some special pages
options.add_argument('--hide-scrollbars')

# Do not load images, to speed things up
options.add_argument('--blink-settings=imagesEnabled=false')

# Run without a visible window. On Linux without a display, startup fails without this flag
options.add_argument('--headless')

# Disable the sandbox (usually needed when running as root, for example in a container)
options.add_argument('--no-sandbox')

# Manually specify the location of the browser binary
options.binary_location = r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"

# Add a crx extension (raw string, so the backslashes are not treated as escapes)
options.add_extension(r'd:\crx\AdBlock_v2.17.crx')

# Disable JavaScript
options.add_argument("--disable-javascript")

# Start in developer mode (drop the enable-automation switch), so the webdriver property reports its normal value
options.add_experimental_option('excludeSwitches', ['enable-automation'])

# Block browser notification pop-ups
prefs = {
    'profile.default_content_setting_values': {
        'notifications': 2
    }
}
options.add_experimental_option('prefs', prefs)

# Use a proxy
options.add_argument("--proxy-server=http://XXXXX.com:80")

driver = webdriver.Chrome(options=options)
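
The startup arguments above can be assembled by a small helper so that each run enables only what it needs. The sketch below is one possible arrangement and not part of selenium: `build_chrome_arguments` and `make_chrome` are hypothetical names, and the default values are assumptions. selenium is imported inside `make_chrome`, so the list-building part can be checked without a browser installed.

```python
def build_chrome_arguments(headless=True, window_size=(1920, 1080),
                           proxy=None, load_images=True):
    """Return a list of Chrome command-line switches for the given settings."""
    arguments = []
    if headless:
        arguments.append("--headless")
        arguments.append("--disable-gpu")
    if window_size is not None:
        width, height = window_size
        arguments.append(f"--window-size={width},{height}")
    if proxy:
        arguments.append(f"--proxy-server={proxy}")
    if not load_images:
        arguments.append("--blink-settings=imagesEnabled=false")
    return arguments


def make_chrome(arguments):
    """Start Chrome with the given switches (needs selenium and chromedriver)."""
    from selenium import webdriver  # imported lazily; only needed to launch a browser

    options = webdriver.ChromeOptions()
    for argument in arguments:
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

A call such as `make_chrome(build_chrome_arguments(load_images=False))` would then start a headless Chrome that skips images.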

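Other configuration switches

Several of these switches come from older Chrome releases and may be ignored by current versions.

--user-data-dir="[PATH]"
# Path of the User Data directory; lets you keep user data such as bookmarks outside the system partition

--disk-cache-dir="[PATH]"
# Path of the cache directory

--disk-cache-size=
# Maximum cache size, in bytes

--first-run
# Reset to the initial state, as on the first run

--incognito
# Start in incognito mode

--disable-javascript
# Disable JavaScript; add this if pages feel slow

--omnibox-popup-count="num"
# Change the number of suggestions in the address bar popup to num

--user-agent="xxxxxxxx"
# Change the User-Agent string in HTTP request headers; check the result on the about:version page

--disable-plugins
# Do not load any plugins, which can speed things up; check the result on the about:plugins page

--disable-java
# Disable Java

--start-maximized
# Start maximized

--no-sandbox
# Disable sandbox mode

--single-process
# Run in a single process

--process-per-tab
# Use a separate process for each tab

--process-per-site
# Use a separate process for each site

--in-process-plugins
# Do not run plugins in a separate process

--disable-popup-blocking
# Disable pop-up blocking

--disable-images
# Disable images

--enable-udd-profiles
# Enable the account switching menu

--proxy-pac-url
# Use a PAC proxy

--lang=zh-CN
# Set the language to Simplified Chinese

--media-cache-size
# Maximum media cache size, in bytes

--bookmark-menu
# Add a bookmark button to the toolbar

--enable-sync
# Enable bookmark sync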

3. Common Option Combinations

Creating a headless browser

# First form
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=chrome_options)

# Second form
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=options)

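Evading detection

A website may detect that a request comes from selenium and refuse access; this is one kind of anti-scraping mechanism.
To evade the detection:

from selenium import webdriver
from selenium.webdriver import ChromeOptions

options = ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-automation'])
driver = webdriver.Chrome(options=options)

Note: the options object must be passed with the options keyword here.

If you need other settings as well, add each one with its own call.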

4. Launching Different Browsers

from selenium import webdriver


driver = webdriver.Firefox()   # Firefox
# driver = webdriver.Firefox(executable_path="path/to/driver")

driver = webdriver.Chrome()    # Chrome

driver = webdriver.Ie()        # Internet Explorer

driver = webdriver.Edge()      # Edge

driver = webdriver.Opera()     # Opera (not available in recent Selenium 4 releases)

driver = webdriver.PhantomJS()   # PhantomJS (not available in Selenium 4)

II. Basic Syntax

1. Locating Elements

Element location syntax

Common methods:

find_element_by_id()
find_element_by_name()
find_element_by_class_name()
find_element_by_tag_name()
find_element_by_link_text()
find_element_by_partial_link_text()
find_element_by_xpath()
find_element_by_css_selector()

Replacing element with elements in a method name (for example find_elements_by_id()) returns a list of all matching elements.
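
In Selenium 4 the find_element_by_* helpers were deprecated and later removed in favor of find_element(by, value) and find_elements(by, value). The sketch below maps the old method suffixes to the locator strategy strings defined on the By class (for instance, By.ID is "id"). `LEGACY_STRATEGIES` and `find` are hypothetical names, and `driver` is assumed to be any object that exposes find_element and find_elements, such as a WebDriver or a WebElement.

```python
# Locator strategy strings; they match the constants on
# selenium.webdriver.common.by.By (By.ID == "id", By.XPATH == "xpath", ...).
LEGACY_STRATEGIES = {
    "id": "id",
    "name": "name",
    "class_name": "class name",
    "tag_name": "tag name",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "xpath": "xpath",
    "css_selector": "css selector",
}


def find(driver, legacy_suffix, value, many=False):
    """Look up element(s) using the suffix of an old find_element_by_* name."""
    strategy = LEGACY_STRATEGIES[legacy_suffix]
    if many:
        return driver.find_elements(strategy, value)
    return driver.find_element(strategy, value)
```

With this helper, `find(driver, "id", "kw")` plays the role of `driver.find_element_by_id("kw")`.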

2. Controlling the Browser

Setting the browser window size

driver.set_window_size(480, 800)

Going forward and back

Forward: driver.forward()
Back: driver.back()

Refreshing

driver.refresh()

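3. Operating on Elements

3.1 Clicking and Typing

driver.find_element_by_id("kw").clear() # clear the text
driver.find_element_by_id("kw").send_keys("selenium") # simulate keyboard input
driver.find_element_by_id("su").click() # click the element

3.2 Submitting

Simulate pressing Enter in a search box

search_text = driver.find_element_by_id('kw')
search_text.send_keys('selenium')
search_text.submit()  # simulate pressing Enter

3.3 Others

element.size  # size of the element
element.text  # text of the element
element.get_attribute(name)  # value of the named attribute
element.is_displayed()  # whether the element is visible to the user
driver.page_source  # source of the current page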

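4. Mouse Operations

WebDriver wraps the mouse-related methods in the ActionChains class.

The ActionChains class provides these common methods:

click(on_element=None): click the left mouse button

click_and_hold(on_element=None): press the left mouse button without releasing it

context_click(on_element=None): click the right mouse button

double_click(on_element=None): double-click the left mouse button

drag_and_drop(source, target): drag the source element onto the target element and release

drag_and_drop_by_offset(source, xoffset, yoffset): drag the source element by the given offset and release

key_down(value, element=None): press a key on the keyboard without releasing it

key_up(value, element=None): release a key

move_by_offset(xoffset, yoffset): move the mouse from its current position by the given offset

move_to_element(to_element): move the mouse to an element

move_to_element_with_offset(to_element, xoffset, yoffset): move the mouse to a position offset from an element (measured from its top-left corner in Selenium 3; newer releases measure from the element's center)

perform(): execute all actions queued in the chain

release(on_element=None): release the left mouse button on an element

send_keys(*keys_to_send): send keys to the element that currently has focus

send_keys_to_element(element, *keys_to_send): send keys to the given element

Syntax:

from selenium.webdriver.common.action_chains import ActionChains

# Locate the elements
menu = driver.find_element_by_css_selector(".nav")
hidden_submenu = driver.find_element_by_css_selector(".nav #submenu1")

# Chained form
ActionChains(driver).move_to_element(menu).click(hidden_submenu).perform()

# Step-by-step form
actions = ActionChains(driver)
actions.move_to_element(menu)
actions.click(hidden_submenu)
actions.perform()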

5. Keyboard Operations

To use keyboard events in selenium, first import the Keys class; note that Keys starts with a capital letter. Keys provides constants for nearly every key on the keyboard, and they can be combined into shortcuts such as Ctrl+A and Ctrl+C.

Usage:

from selenium.webdriver.common.keys import Keys

element.send_keys(key)  # key is one of the constants below

# Common keys
Keys.BACK_SPACE    # Backspace
Keys.TAB           # Tab
Keys.ENTER         # Enter
Keys.SHIFT         # Shift
Keys.CONTROL       # Ctrl
Keys.ALT           # Alt
Keys.ESCAPE        # Esc
Keys.SPACE         # Space
Keys.PAGE_UP       # Page Up
Keys.PAGE_DOWN     # Page Down
Keys.END           # End
Keys.HOME          # Home
Keys.LEFT          # Left arrow
Keys.UP            # Up arrow
Keys.RIGHT         # Right arrow
Keys.DOWN          # Down arrow
Keys.INSERT        # Insert
Keys.DELETE        # Delete
Keys.NUMPAD0 ~ Keys.NUMPAD9    # Numeric keypad 0-9
Keys.F5            # F5 (refresh)
Keys.F1 ~ Keys.F12             # F1 to F12
element.send_keys(Keys.CONTROL, 'a')    # Ctrl+A, select all
element.send_keys(Keys.CONTROL, 'c')    # Ctrl+C, copy
element.send_keys(Keys.CONTROL, 'x')    # Ctrl+X, cut
element.send_keys(Keys.CONTROL, 'v')    # Ctrl+V, paste

Other keys can be found in the source of the Keys class.

6. Getting Information for Assertions

title = driver.title # title of the current page
now_url = driver.current_url # URL of the current page
user = driver.find_element_by_class_name('nums').text # number of results

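7. Waiting for the Page to Load

7.1 Explicit Waits

An explicit wait makes WebDriver wait until a condition holds before continuing. If the condition is still not met when the maximum time is reached, a timeout exception (TimeoutException) is raised.

Example:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

driver = webdriver.Firefox()
driver.get("http://www.baidu.com")

# presence_of_element_located() checks whether the element is present in the DOM
element = WebDriverWait(driver, 5, 0.5).until(
    expected_conditions.presence_of_element_located((By.ID, "kw"))
)
element.send_keys('selenium')
driver.quit()

WebDriverWait is the wait class provided by WebDriver. Within the given time it checks at a regular interval whether the condition is met, and raises an exception if the time runs out first.

Syntax:

WebDriverWait(driver, timeout, poll_frequency=0.5, ignored_exceptions=None)

Parameters:

driver: the browser driver

timeout: the maximum time to wait, in seconds

poll_frequency: the interval between checks, 0.5 s by default

ignored_exceptions: exceptions to ignore while polling; by default only NoSuchElementException is ignored

WebDriverWait() is normally used together with until() or until_not():

until(method, message=''): repeatedly calls method with the driver as its argument until the return value is truthy

until_not(method, message=''): repeatedly calls method with the driver as its argument until the return value is falsy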
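
The polling behaviour described above can be sketched in plain Python. This illustrates the idea only and is not selenium's actual source: `wait_until` and `WaitTimeout` are hypothetical names, and `condition` is assumed to be a zero-argument callable, whereas WebDriverWait passes the driver to it.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition stays falsy for the whole timeout."""


def wait_until(condition, timeout, poll_frequency=0.5, ignored_exceptions=()):
    """Call condition() repeatedly until it returns a truthy value."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored_exceptions:
            pass  # treat an ignored exception like "not ready yet"
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)
```

The condition is always checked at least once, so a timeout of 0 still succeeds when the condition already holds.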

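7.2 Implicit Waits

If some elements are not immediately available, an implicit wait tells WebDriver to keep looking for a given amount of time when it searches for an element. The default wait is 0 seconds. Once set, the implicit wait applies for the lifetime of the WebDriver instance.

from selenium import webdriver

driver = webdriver.Firefox()
driver.implicitly_wait(10) # implicit wait of 10 s
driver.get("http://www.baidu.com")
myDynamicElement = driver.find_element_by_id("myDynamicElement")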

8. Switching Pages

driver.switch_to.window("windowName")  # switch to a window
driver.switch_to.frame("frameName")  # switch into a frame
driver.switch_to.default_content()  # leave the frame

Example

# First locate the iframe with XPath
xf = driver.find_element_by_xpath('//*[@id="x-URS-iframe"]')
# Then pass the located element to switch_to.frame()
driver.switch_to.frame(xf)
driver.switch_to.default_content()  # leave the frame
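
Forgetting to leave a frame is an easy mistake, because later lookups then run inside the wrong document. One possible guard is a context manager, sketched below with the `driver.switch_to.frame(...)` and `driver.switch_to.default_content()` form that newer selenium releases use in place of the switch_to_* methods. `inside_frame` is a hypothetical name; `frame` can be whatever switch_to.frame accepts (a name, an index, or a located element).

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into a frame for the duration of a with-block."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Always return to the top-level document, even after an error.
        driver.switch_to.default_content()
```

Inside `with inside_frame(driver, xf):` every lookup targets the iframe, and the driver is back on the main page once the block ends.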

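9. Handling Dialogs and Drop-downs

9.1 Handling Alerts

Syntax:

alert = driver.switch_to.alert

Members of the alert object:

text: the text shown in the alert/confirm/prompt dialog

accept(): accept the open dialog

dismiss(): dismiss the open dialog

send_keys(keysToSend): type text into the dialog, where keysToSend is the text to send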
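
The members listed above are typically used together: read the message, optionally type into a prompt, then accept or dismiss. A minimal sketch of that sequence follows; `handle_alert` is a hypothetical name, and it assumes a dialog is already open when it is called.

```python
def handle_alert(driver, accept=True, prompt_text=None):
    """Read the open dialog's text, optionally type into it, then close it."""
    alert = driver.switch_to.alert  # a property, not a method call
    message = alert.text
    if prompt_text is not None:
        alert.send_keys(prompt_text)  # only meaningful for prompt() dialogs
    if accept:
        alert.accept()
    else:
        alert.dismiss()
    return message
```

The text is read before the dialog is closed, since it is no longer reachable afterwards.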

9.2 Selecting from Drop-downs

9.2.1 Methods of the Select Class

9.2.1.1 Selecting

from selenium import webdriver
from selenium.webdriver.support.select import Select

driver = webdriver.Chrome()
driver.implicitly_wait(10)  # implicit wait
driver.get('http://www.baidu.com')
sel = driver.find_element_by_xpath("//select[@id='nr']")
"""
There are three ways to select an option:
select_by_value(value)  select by the option's value attribute
select_by_index(index)  select by index, starting from 0
select_by_visible_text(text)  select by the text the option displays
"""
Select(sel).select_by_value(value)  # value: the value attribute of the option to select

9.2.1.2 Deselecting

"""
deselect_all()  clear all selections
deselect_by_value(value)  deselect by value attribute
deselect_by_index(index)  deselect by index
deselect_by_visible_text(text)  deselect by displayed text
These only work on a select element that allows multiple selections.
"""
# Usage
Select(sel).deselect_by_value(value)

9.2.2 Locating the select First, Then the option

# Locate the drop-down
selector = driver.find_element_by_id("selectdemo")
# selector = driver.find_element_by_xpath(".//*[@id='selectdemo']")

# Choose "basketball player"; the leading dot keeps the search inside selector
selector.find_element_by_xpath(".//option[@value='210103']").click()
# selector.find_elements_by_tag_name("option")[2].click()

9.2.3 Locating Directly with an XPath Hierarchy

# Locate and choose "basketball player" directly with XPath
driver.find_element_by_xpath(".//*[@id='selectdemo']/option[3]").click()

10. Uploading Files

driver.find_element_by_name("file").send_keys('D:\\upload_file.txt')  # locate the upload input and give it a local file
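
Uploading through send_keys only works when the path points at a real local file, and an absolute path is the safe choice. The sketch below adds that check before sending; `upload_file` is a hypothetical name, and `file_input` is assumed to be the located file input element.

```python
import os


def upload_file(file_input, path):
    """Send an absolute file path to a file input element."""
    absolute_path = os.path.abspath(path)
    if not os.path.isfile(absolute_path):
        raise FileNotFoundError(absolute_path)
    file_input.send_keys(absolute_path)
    return absolute_path
```

Failing early with FileNotFoundError tends to be easier to debug than an upload that silently does nothing.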

11. Working with Cookies

WebDriver methods for cookies:

get_cookies(): return all cookies.
get_cookie(name): return the cookie whose name is name, as a dict.
add_cookie(cookie_dict): add a cookie. cookie_dict is a dict that must contain name and value.
delete_cookie(name): delete the cookie with the given name.
delete_all_cookies(): delete all cookies.

Reference: https://www.jianshu.com/p/773c58406bdb

Obtain the page's cookies manually, serialize them, and save them locally.
To write them back:

for item in cookies:
    driver.add_cookie(item)

Unlike adding {'Cookie': ' '} to the request headers, this approach adds the cookies one at a time, each as a dict with fields such as name, value, path, and domain.
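
The save-and-restore workflow described above might look like the following sketch, which uses JSON as the serialization format. `save_cookies` and `load_cookies` are hypothetical names, and the file path is whatever the caller chooses; only get_cookies() and add_cookie() are called on the driver.

```python
import json


def save_cookies(driver, path):
    """Serialize the current session's cookies to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(driver.get_cookies(), handle)


def load_cookies(driver, path):
    """Add every cookie from a JSON file to the current session."""
    with open(path, encoding="utf-8") as handle:
        cookies = json.load(handle)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

When restoring, the browser generally needs to be on a page of the cookies' domain first, so calling driver.get() on the site before load_cookies() is the usual order.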

12. Running JavaScript

js="window.scrollTo(100,450);"
driver.execute_script(js) # set the window's scroll position with JavaScript

The execute_script() method runs JavaScript code, used here to move the scrollbar.
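
execute_script() also accepts extra arguments after the script, which the script reads as arguments[0], arguments[1], and so on, and it returns whatever the script returns. The sketch below uses both features for scrolling; `scroll_to` and `scroll_to_bottom` are hypothetical helper names.

```python
def scroll_to(driver, x, y):
    """Scroll the window to (x, y), passing the numbers as script arguments."""
    driver.execute_script("window.scrollTo(arguments[0], arguments[1]);", x, y)


def scroll_to_bottom(driver):
    """Scroll to the bottom of the page and return the page height."""
    return driver.execute_script(
        "window.scrollTo(0, document.body.scrollHeight);"
        "return document.body.scrollHeight;"
    )
```

Passing values as arguments avoids building the script by string concatenation.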

13. Taking Screenshots

driver.get_screenshot_as_file("D:\\baidu_img.png") # capture the current window and save it to the given PNG file

13.1 Example: Capturing a Captcha Image

# encoding:utf-8
from PIL import Image
from selenium import webdriver

url = 'https://weixin.sogou.com/antispider/?from=http%3A%2F%2Fweixin.sogou.com%2Fweixin%3Ftype%3D2%26query%3Dpython'
driver = webdriver.Chrome()
driver.maximize_window()  # maximize the browser
driver.get(url)
# Capture the current page, which contains the captcha, and save it on drive D as printscreen
driver.save_screenshot('D:\\python371\\python_wordspace\\img\\printscreen.png')
imgelement = driver.find_element_by_id('seccodeImage')  # locate the captcha
location = imgelement.location  # x and y coordinates of the captcha
print(location)
size = imgelement.size  # width and height of the captcha
print(size)
rangle = (int(location['x']+110), int(location['y']+60), int(location['x'] + size['width']+165),
          int(location['y'] + size['height']+90))  # the region to crop; the offsets were tuned by hand
i = Image.open("D:\\python371\\python_wordspace\\img\\printscreen.png")  # open the screenshot
frame4 = i.crop(rangle)  # crop the captcha region out of the screenshot with Image.crop
frame4 = frame4.convert('RGB')
frame4.save('D:\\python371\\python_wordspace\\img\\save.jpg') # save the captcha image for recognition later

driver.close()
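
The hand-tuned offsets in the example above depend on the author's screen. One possible reason such offsets are needed is that the screenshot is taken at a device pixel ratio other than 1; that is an assumption here, not something the original states. Under that assumption the crop box can be computed from the element's location and size, as sketched below. `crop_box` and `pixel_ratio` are hypothetical names.

```python
def crop_box(location, size, pixel_ratio=1.0):
    """Return (left, upper, right, lower) for PIL's Image.crop()."""
    left = location["x"] * pixel_ratio
    upper = location["y"] * pixel_ratio
    right = (location["x"] + size["width"]) * pixel_ratio
    lower = (location["y"] + size["height"]) * pixel_ratio
    return tuple(int(round(value)) for value in (left, upper, right, lower))
```

The ratio could be read from the page with driver.execute_script("return window.devicePixelRatio"), and the result passed as `i.crop(crop_box(location, size, ratio))`.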

14. Closing the Browser

driver.close()  # close the current window
driver.quit()  # close all windows and end the session
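
If a script raises an exception before reaching quit(), the browser and driver processes can be left running. A context manager is one way to make the cleanup unconditional. In the sketch below `managed_driver` is a hypothetical name, and `factory` is assumed to be any zero-argument callable that returns a driver, for example webdriver.Chrome.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always quit it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()  # closes every window and ends the session
```

Usage would look like `with managed_driver(webdriver.Chrome) as driver: driver.get(url)`.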

III. Summary

Reference article: https://selenium-python-zh.readthedocs.io/en/latest/installation.html
