Python Web Automation Crawler: selenium / CAPTCHA Handling / XPath

Victooor_swd, published 2024-07-18

Original URL: https://www.cnblogs.com/Vicrooor/p/18309861

# coding:utf-8
import time
import random
from csv import writer
from selenium import webdriver
from selenium.webdriver.common.by import By
from chaojiying import Chaojiying_Client
from selenium.webdriver import ActionChains


driver = webdriver.Chrome()

# Open the search page for one person
def open_web(search_name):
    driver.get("https://www.izaiwen.cn/pro/sonE-stwE08?psnname={}".format(search_name))
    time.sleep(6)
    # Load cookies. Alternatively, read them from a JSON file (this needs "import json"):
    #    with open("cookies.json", "r") as file:
    #        cookies = json.load(file)
    cookies = [{'name': 'HMACCOUNT', 'value': 'xxxxxxxxx'},
    {'name': 'heiheihei', 'value': 'xxxxxxxxx'},
    {'name': 'hahaha', 'value': 'xxxxxxx'},
    {'name': '_', 'value': 'xxxxxxxxxxx'},
    {'name': 'acw', 'value': 'xxxxxxxxxxxx'},
    {'name': 'cer', 'value': 'xxxxxxxxxx'},
    {'name': 'ession', 'value': 'xxxxxxxxxx'},
    {'name': 'userId', 'value': 'xxxxx'},
    {'name': 'uuid', 'value': 'xxxxxxxxxxxxxxxxxxx'},
    ]
    for cookie in cookies:
        driver.add_cookie(cookie)
    time.sleep(5)
    driver.refresh()  # reload so the page is requested with the cookies
    time.sleep(random.randrange(5,10))

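# Check whether the browser is on the CAPTCHA page
def check_condition():
    header = driver.title  # read the page title
    # header = driver.find_element(By.XPATH, '/html/head/title').text does not work here:
    # .text returns only visible text, so for <title> it comes back as an empty string
    print(header)
    # The literal below is compared exactly, so it must match the site's page title
    if header == '請完成安全驗證':  # on the CAPTCHA page: return 'f'
        return 'f'
    else:
        return 't'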

# Locate elements by XPath and extract the data
def get_information():
    # Note: this function reads the module-level variable "name" set by the main loop
    parent_element = driver.find_elements(By.XPATH, './/div[@class="item-box layui-card "]')
    for child_element in parent_element:
        target_element = child_element.find_elements(By.XPATH, './/div[@class="layui-col-xs4"]')
        print(name)
        info = ''
        for n in target_element:
            info += n.text  # text content of the element
            info += ','
        print(info)
        list_data = [name, info]
        # Append the row to the CSV file (utf-8-sig so that Excel detects the encoding)
        with open("資訊.csv", "a", newline="", encoding="utf-8-sig") as f_object:
            writer_object = writer(f_object)
            writer_object.writerow(list_data)
        time.sleep(5)
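# Screenshot the CAPTCHA and submit it to the Chaojiying recognition platform
def anti_anti_spider():
    # Find the element that contains the CAPTCHA
    img = driver.find_element(By.XPATH, './/div[@id="aliyunCaptcha-window-embed"]')
    # Take a screenshot of that element
    img.screenshot('D:/SeleniumX/yzm.png')
    # Newer Selenium versions measure the click offset from the element's center,
    # so compute half the element's size to convert coordinates measured from the top-left corner
    img_half_width = float(img.rect['width'])/2
    img_half_height = float(img.rect['height'])/2
    # Initialize the Chaojiying client; its code must be downloaded from the official site,
    # placed in the same folder as this file, and imported
    chaojiying = Chaojiying_Client('', '', '')  # account, password, software ID
    # Submit the image to the platform and get the result
    with open('D:/SeleniumX/yzm.png', 'rb') as f:
        im = f.read()
    yzm_result = chaojiying.PostPic(im, 9101)['pic_str']
    time.sleep(10)
    print(yzm_result)
    # for index in yzm_result.split('|'):  # multiple results are separated by "|"; only one is returned here, so no loop is needed
    x = float(yzm_result.split(',')[0])  # x coordinate
    y = float(yzm_result.split(',')[1])  # y coordinate
    # Simulate the click with an action chain
    action = ActionChains(driver)  # create the action chain
    action.move_to_element_with_offset(img, x-img_half_width, y-img_half_height).click().perform()
    time.sleep(10)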

    

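# Main program
list_name = []  # names to search for; each one is used to build the page URL
for name in list_name:
    open_web(name)  # open the page
    flag = check_condition()  # check whether the CAPTCHA was triggered
    print(flag)
    if flag == 'f':  # CAPTCHA triggered: recognize and click it first
        time.sleep(30)
        anti_anti_spider()
        time.sleep(15)
        get_information()
    else:
        time.sleep(5)
        get_information()
    time.sleep(random.randrange(10,30))
    print(name)
driver.quit()  # close the browser when all names are done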
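
The coordinate handling in anti_anti_spider() can be tried without a browser. The following is a minimal sketch, not part of the original script: the helper names parse_click_points and to_center_offset are hypothetical, and the sample result string and element size are made up. It assumes the recognition result has the form "x,y", with several points separated by "|" as the commented-out loop in the script suggests, and that the coordinates are measured from the top-left corner of the screenshot.

```python
def parse_click_points(pic_str):
    """Parse 'x1,y1|x2,y2' into a list of (x, y) float tuples (top-left origin)."""
    points = []
    for chunk in pic_str.split('|'):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty pieces such as a trailing "|"
        x_text, y_text = chunk.split(',')
        points.append((float(x_text), float(y_text)))
    return points


def to_center_offset(point, width, height):
    """Convert a top-left-based point into an offset from the element's center."""
    x, y = point
    return (x - width / 2, y - height / 2)


if __name__ == "__main__":
    # Made-up recognition result and element size, for illustration only.
    for point in parse_click_points("120,45|30,80"):
        print(to_center_offset(point, width=300, height=200))
```

Each resulting pair could then be passed to move_to_element_with_offset in place of the x-img_half_width and y-img_half_height expressions, looping once per point if the platform returns more than one.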

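The commented-out lines in open_web() hint at loading the cookies from cookies.json instead of hard-coding them. One possible way to do that is sketched below. The function name load_cookies is hypothetical, and the file is assumed to hold a JSON list of objects that each have at least "name" and "value" keys; other keys are dropped here to mirror the name/value pairs the script passes to driver.add_cookie.

```python
import json


def load_cookies(path):
    """Read a JSON list of cookie dicts and keep only complete name/value pairs."""
    with open(path, "r", encoding="utf-8") as file:
        raw = json.load(file)
    cookies = []
    for item in raw:
        # Entries missing either key are skipped rather than passed to the browser.
        if "name" in item and "value" in item:
            cookies.append({"name": item["name"], "value": str(item["value"])})
    return cookies


# Possible use inside open_web(), after driver.get(...):
#     for cookie in load_cookies("cookies.json"):
#         driver.add_cookie(cookie)
```

Keeping the cookies in a separate file means the login values stay out of the script itself.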