(Part 1) Python 3 + Selenium 3: Learn Selenium from Its Framework Implementation and Get Twice the Result with Half the Effort

Posted by 1_bit on 2020-05-24

Thanks to the following documents for reference:
Selenium-Python Chinese documentation
Selenium Documentation
WebDriver reference

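If you spot any errors, please point them out in the comments and I will fix them promptly. This article was moved over from my personal CSDN blog, so the images still carry watermarks.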

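Environment

  • Operating system: Windows 7 SP1 64-bit
  • Python version: 3.7.7
  • Browser: Google Chrome
  • Browser version: 80.0.3987 (64-bit)
  • Chrome driver (ChromeDriver): the driver version must match the browser version, and each browser needs its own matching driver version; download it from the ChromeDriver download page
  • If you use Firefox, check your Firefox version and download the driver from the GitHub Firefox driver (geckodriver) download page (if English is hard for you, right-click and use the browser's translate option; each release states which browser versions it supports, so check carefully before downloading)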

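Introduction

Selenium is an umbrella project covering a range of tools and libraries that support web browser automation. When it automates a browser, the actions it performs behave like those of a real user.

Selenium has had three major versions: Selenium 1.0, Selenium 2.0, and Selenium 3.0.

Selenium 1.0 mainly worked by injecting JavaScript into the browser. Selenium's original author, Jason Huggins, developed JavaScriptTestRunner as a testing tool and demonstrated it to several colleagues (he is quite an interesting character). As the name suggests, the tool tested through JavaScript. It was Selenium's predecessor.

Selenium 2.0 drives the browser's elements through the API provided by WebDriver. WebDriver can be described as a testing framework, or as an integrated API library.

Selenium 3.0 builds on Selenium 2.0 and differs little from it. This article uses Selenium 3.0 throughout.

The official introduction describes browser support as follows: "Through WebDriver, Selenium supports all major browsers on the market such as Chrom(ium), Firefox, Internet Explorer, Opera, and Safari."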

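Getting Started

Once the environment is set up, let's use Selenium to make the browser open the CSDN home page.
Note when configuring the environment: the driver executable must be on the system PATH, or be placed in your Python installation directory (this works as long as that directory is itself on PATH).

First, import webdriver: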

from selenium.webdriver import Chrome

Alternatively:

from selenium import webdriver

Which import style to use is a matter of preference. Each style then creates the driver instance in a slightly different way:

from selenium.webdriver import Chrome
driver = Chrome()

or

from selenium import webdriver
driver = webdriver.Chrome()

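General Python syntax will not be explained below.
As mentioned above, the driver needs to be on the system PATH. If for some reason its location cannot be added to PATH, here is a workaround: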

from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'F:\python\dr\chromedriver_win32\chromedriver.exe')

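Here executable_path specifies where the driver is stored; the path above is where it lives on my machine. You can put the driver wherever suits you and build the path more flexibly. This article passes an absolute path only to keep the explanation clear.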
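Instead of hard-coding an absolute path, one option is to look the driver up on PATH first and fall back to a known location only if it is not found. This is a minimal sketch. The helper resolve_driver and its fallback argument are hypothetical and not part of Selenium:

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_driver(name: str = "chromedriver", fallback: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper (not a Selenium API): return a usable driver path or None.

    1. If `name` is found on PATH, return its full path.
    2. Otherwise, if `fallback` points to an existing file, return its absolute path.
    3. Otherwise return None so the caller can report a clear error.
    """
    found = shutil.which(name)  # searches the directories listed in PATH
    if found:
        return found
    if fallback and Path(fallback).is_file():
        return str(Path(fallback).resolve())
    return None
```

The returned path would then be passed as executable_path, for example webdriver.Chrome(executable_path=path) in Selenium 3. A None result means neither location contains the driver.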

Firefox:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://www.csdn.net")

Chrome:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.csdn.net")

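Firefox and Chrome differ only in how the driver is instantiated; all other operations are the same.

Import webdriver at the top of the code and instantiate a browser object. Then call get with a URL to open that page.

Implementation Walkthrough

Let's look at the implementation in webdriver.py. After from selenium import webdriver, the name webdriver.Chrome refers to the WebDriver class defined in selenium/webdriver/chrome/webdriver.py: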

import warnings

from selenium.webdriver.remote.webdriver import WebDriver as RemoteWebDriver
from .remote_connection import ChromeRemoteConnection
from .service import Service
from .options import Options


class WebDriver(RemoteWebDriver):
    """
    Controls the ChromeDriver and allows you to drive the browser.

    You will need to download the ChromeDriver executable from
    http://chromedriver.storage.googleapis.com/index.html
    """

    def __init__(self, executable_path="chromedriver", port=0,
                 options=None, service_args=None,
                 desired_capabilities=None, service_log_path=None,
                 chrome_options=None, keep_alive=True):
        """
        Creates a new instance of the chrome driver.

        Starts the service and then creates new instance of chrome driver.

        :Args:
         - executable_path - path to the executable. If the default is used it assumes the executable is in the $PATH
         - port - port you would like the service to run, if left as 0, a free port will be found.
         - options - this takes an instance of ChromeOptions
         - service_args - List of args to pass to the driver service
         - desired_capabilities - Dictionary object with non-browser specific
           capabilities only, such as "proxy" or "loggingPref".
         - service_log_path - Where to log information from the driver.
         - chrome_options - Deprecated argument for options
         - keep_alive - Whether to configure ChromeRemoteConnection to use HTTP keep-alive.
        """
        if chrome_options:
            warnings.warn('use options instead of chrome_options',
                          DeprecationWarning, stacklevel=2)
            options = chrome_options

        if options is None:
            # desired_capabilities stays as passed in
            if desired_capabilities is None:
                desired_capabilities = self.create_options().to_capabilities()
        else:
            if desired_capabilities is None:
                desired_capabilities = options.to_capabilities()
            else:
                desired_capabilities.update(options.to_capabilities())

        self.service = Service(
            executable_path,
            port=port,
            service_args=service_args,
            log_path=service_log_path)
        self.service.start()

        try:
            RemoteWebDriver.__init__(
                self,
                command_executor=ChromeRemoteConnection(
                    remote_server_addr=self.service.service_url,
                    keep_alive=keep_alive),
                desired_capabilities=desired_capabilities)
        except Exception:
            self.quit()
            raise
        self._is_remote = False

    def launch_app(self, id):
        """Launches Chrome app specified by id."""
        return self.execute("launchApp", {'id': id})

    def get_network_conditions(self):
        return self.execute("getNetworkConditions")['value']

    def set_network_conditions(self, **network_conditions):
        self.execute("setNetworkConditions", {
            'network_conditions': network_conditions
        })

    def execute_cdp_cmd(self, cmd, cmd_args):
        return self.execute("executeCdpCommand", {'cmd': cmd, 'params': cmd_args})['value']

    def quit(self):
        try:
            RemoteWebDriver.quit(self)
        except Exception:
            # We don't care about the message because something probably has gone wrong
            pass
        finally:
            self.service.stop()

    def create_options(self):
        return Options()

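The docstring says it "creates a new instance of the chrome driver": it starts the service and then creates a new instance of the chrome driver.

Only the parameter used in this article is listed here:

  • executable_path: path to the driver executable. If the default ("chromedriver") is used, the executable is assumed to be on the PATH, i.e. in one of the directories listed in the system PATH environment variable.

Starting the service is a necessary step in Selenium automation. In the __init__ method we find the following code: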

self.service = Service(
    executable_path,
    port=port,
    service_args=service_args,
    log_path=service_log_path)
self.service.start()

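This code instantiates the Service class with the relevant arguments and then starts the service. The most important argument here is executable_path, i.e. the driver to launch. Let's look at the Service class (selenium.webdriver.chrome.service):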

from selenium.webdriver.common import service


class Service(service.Service):
    """
    Object that manages the starting and stopping of the ChromeDriver
    """

    def __init__(self, executable_path, port=0, service_args=None,
                 log_path=None, env=None):
        """
        Creates a new instance of the Service

        :Args:
         - executable_path : Path to the ChromeDriver
         - port : Port the service is running on
         - service_args : List of args to pass to the chromedriver service
         - log_path : Path for the chromedriver service to log to"""

        self.service_args = service_args or []
        if log_path:
            self.service_args.append('--log-path=%s' % log_path)

        service.Service.__init__(self, executable_path, port=port, env=env,
                                 start_error_message="Please see https://sites.google.com/a/chromium.org/chromedriver/home")

    def command_line_args(self):
        return ["--port=%d" % self.port] + self.service_args
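To make the argument handling concrete, here is a standalone sketch of the command line that Service ends up launching. It puts --port first, then any service_args, and appends --log-path when a log path is given. The names free_port and build_service_cmd are my own. In Selenium 3 the base Service picks a free port itself when port is 0, and the sketch imitates that by binding a socket to port 0:

```python
import socket
from typing import List, Optional


def free_port() -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 means "let the OS pick a free port"
        return s.getsockname()[1]


def build_service_cmd(executable_path: str, port: int = 0,
                      service_args: Optional[List[str]] = None,
                      log_path: Optional[str] = None) -> List[str]:
    """Hypothetical helper mirroring Service.__init__ + command_line_args()."""
    args = list(service_args or [])  # copy, so the caller's list is not mutated
    if log_path:
        args.append("--log-path=%s" % log_path)
    if port == 0:
        port = free_port()
    # Same order as command_line_args(): the port flag first, then the extra args.
    return [executable_path, "--port=%d" % port] + args
```

Unlike Service.__init__, the sketch copies service_args before appending --log-path, so the caller's list stays unchanged. As with any "find a free port, then use it later" approach, another process could grab the port in between.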

Now the start method of its base class (the base class is long, so only start is shown; it lives in selenium.webdriver.common.service):

def start(self):
        """
        Starts the Service.

        :Exceptions:
         - WebDriverException : Raised either when it can't start the service
           or when it can't connect to the service
        """
        try:
            cmd = [self.path]
            cmd.extend(self.command_line_args())
            self.process = subprocess.Popen(cmd, env=self.env,
                                            close_fds=platform.system() != 'Windows',
                                            stdout=self.log_file,
                                            stderr=self.log_file,
                                            stdin=PIPE)
        except TypeError:
            raise
        except OSError as err:
            if err.errno == errno.ENOENT:
                raise WebDriverException(
                    "'%s' executable needs to be in PATH. %s" % (
                        os.path.basename(self.path), self.start_error_message)
                )
            elif err.errno == errno.EACCES:
                raise WebDriverException(
                    "'%s' executable may have wrong permissions. %s" % (
                        os.path.basename(self.path), self.start_error_message)
                )
            else:
                raise
        except Exception as e:
            raise WebDriverException(
                "The executable %s needs to be available in the path. %s\n%s" %
                (os.path.basename(self.path), self.start_error_message, str(e)))
        count = 0
        while True:
            self.assert_process_still_running()
            if self.is_connectable():
                break
            count += 1
            time.sleep(1)
            if count == 30:
                raise WebDriverException("Can not connect to the Service %s" % self.path)

The key part is:

try:
    cmd = [self.path]
    cmd.extend(self.command_line_args())
    self.process = subprocess.Popen(cmd, env=self.env,
                                    close_fds=platform.system() != 'Windows',
                                    stdout=self.log_file,
                                    stderr=self.log_file,
                                    stdin=PIPE)
except TypeError:
    raise
except OSError as err:
    if err.errno == errno.ENOENT:
        raise WebDriverException(
            "'%s' executable needs to be in PATH. %s" % (
                os.path.basename(self.path), self.start_error_message)
        )
    elif err.errno == errno.EACCES:
        raise WebDriverException(
            "'%s' executable may have wrong permissions. %s" % (
                os.path.basename(self.path), self.start_error_message)
        )
    else:
        raise
except Exception as e:
    raise WebDriverException(
        "The executable %s needs to be available in the path. %s\n%s" %
        (os.path.basename(self.path), self.start_error_message, str(e)))

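The command built from self.path and command_line_args() is launched as a subprocess, which starts the driver. If launching fails, the exception is caught and re-raised as a WebDriverException. A missing executable (ENOENT) produces the familiar "'chromedriver' executable needs to be in PATH" message, and a permission problem (EACCES) produces a "wrong permissions" message. After that, start polls once per second until the driver's port accepts connections and gives up after 30 attempts. Note that this step only starts the driver. The browser window itself is opened later, when RemoteWebDriver.__init__ creates a new session.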
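The waiting loop at the end of start can be reproduced with plain sockets. This is a sketch that assumes the driver listens on 127.0.0.1. Unlike the original, it returns False instead of raising. It also does not check whether the child process is still alive, whereas the real loop calls assert_process_still_running on every iteration:

```python
import socket
import time


def is_connectable(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds right now."""
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        return False


def wait_until_connectable(port: int, attempts: int = 30, interval: float = 1.0) -> bool:
    """Poll like Service.start: try to connect, sleep, give up after `attempts` tries."""
    for attempt in range(attempts):
        if is_connectable(port):
            return True
        if attempt < attempts - 1:  # no point sleeping after the last try
            time.sleep(interval)
    return False
```

With the defaults (30 attempts, 1 second apart) this matches the roughly 30-second budget the original loop gives the driver to come up.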

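From the exception handling, we now know how Selenium starts the service. Next, let's follow how get requests a URL.
In the WebDriver base class (selenium.webdriver.remote.webdriver), we find the get method: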

def get(self, url):
    """
    Loads a web page in the current browser session.
    """
    self.execute(Command.GET, {'url': url})

def execute(self, driver_command, params=None):
        """
        Sends a command to be executed by a command.CommandExecutor.

        :Args:
         - driver_command: The name of the command to execute as a string.
         - params: A dictionary of named parameters to send with the command.

        :Returns:
          The command's JSON response loaded into a dictionary object.
        """
        if self.session_id is not None:
            if not params:
                params = {'sessionId': self.session_id}
            elif 'sessionId' not in params:
                params['sessionId'] = self.session_id

        params = self._wrap_value(params)
        response = self.command_executor.execute(driver_command, params)
        if response:
            self.error_handler.check_response(response)
            response['value'] = self._unwrap_value(
                response.get('value', None))
            return response
        # If the server doesn't send a response, assume the command was
        # a success
        return {'success': 0, 'value': None, 'sessionId': self.session_id}

So get simply calls execute, passing Command.GET and the url.
The Command class (selenium.webdriver.remote.command) holds constants for the standard WebDriver commands. The GET constant is:

GET = "get"
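Before such a command name is sent anywhere, the first part of execute makes sure the current session id travels with the parameters. Here are the same rules as a standalone function (add_session_id is a hypothetical name, not a Selenium API):

```python
from typing import Any, Dict, Optional


def add_session_id(params: Optional[Dict[str, Any]],
                   session_id: Optional[str]) -> Optional[Dict[str, Any]]:
    """Same rules as the start of RemoteWebDriver.execute."""
    if session_id is not None:
        if not params:
            # No parameters at all: the session id becomes the only one.
            params = {"sessionId": session_id}
        elif "sessionId" not in params:
            # Existing parameters: add the session id unless one is already present.
            params["sessionId"] = session_id
    return params
```

Like the original, it mutates a non-empty params dict in place rather than copying it.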

Judging from selenium.webdriver.remote.command, that module just defines the names of the commands that can be executed.
To summarize the flow so far:

  • Start the service → call get

Within get:

  • get calls execute with Command.GET and the url. Command.GET is one of the standard command constants. execute then forwards the command to the driver.

The core of execute is:

params = self._wrap_value(params)
response = self.command_executor.execute(driver_command, params)
if response:
    self.error_handler.check_response(response)
    response['value'] = self._unwrap_value(
        response.get('value', None))
    return response

The line that matters most is:

self.command_executor.execute(driver_command, params)

command_executor is set when the driver is initialized. In the derived Chrome WebDriver class (selenium.webdriver.chrome.webdriver), command_executor is created as follows:

RemoteWebDriver.__init__(
    self,
    command_executor=ChromeRemoteConnection(
        remote_server_addr=self.service.service_url,
        keep_alive=keep_alive),
    desired_capabilities=desired_capabilities)

Let's look at the ChromeRemoteConnection class (selenium.webdriver.chrome.remote_connection):

from selenium.webdriver.remote.remote_connection import RemoteConnection


class ChromeRemoteConnection(RemoteConnection):

    def __init__(self, remote_server_addr, keep_alive=True):
        RemoteConnection.__init__(self, remote_server_addr, keep_alive)
        self._commands["launchApp"] = ('POST', '/session/$sessionId/chromium/launch_app')
        self._commands["setNetworkConditions"] = ('POST', '/session/$sessionId/chromium/network_conditions')
        self._commands["getNetworkConditions"] = ('GET', '/session/$sessionId/chromium/network_conditions')
        self._commands['executeCdpCommand'] = ('POST', '/session/$sessionId/goog/cdp/execute')
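ChromeRemoteConnection follows a simple pattern: inherit the generic command-to-route table from the base class, then register browser-specific routes on top. Here is a hypothetical sketch of that pattern. BaseConnection and ChromeLikeConnection are illustrative names, and only two generic routes are included:

```python
from typing import Dict, Tuple


class BaseConnection:
    """Hypothetical stand-in for RemoteConnection: owns a command -> route table."""

    def __init__(self, server_url: str):
        self.server_url = server_url
        # command name -> (HTTP method, URL path template)
        self._commands: Dict[str, Tuple[str, str]] = {
            "get": ("POST", "/session/$sessionId/url"),
            "getTitle": ("GET", "/session/$sessionId/title"),
        }


class ChromeLikeConnection(BaseConnection):
    """Adds a vendor-specific route on top of the generic ones."""

    def __init__(self, server_url: str):
        super().__init__(server_url)  # build the generic table first
        self._commands["executeCdpCommand"] = ("POST", "/session/$sessionId/goog/cdp/execute")
```

Because the extra routes are added per instance, other connection classes built on the same base never see the Chrome-only commands.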

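ChromeRemoteConnection calls the base class initializer and then registers a few Chrome-specific commands. The execute method of the base class RemoteConnection (selenium.webdriver.remote.remote_connection) is implemented as follows: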

    def execute(self, command, params):
        """
        Send a command to the remote server.

        Any path subtitutions required for the URL mapped to the command should be
        included in the command parameters.

        :Args:
         - command - A string specifying the command to execute.
         - params - A dictionary of named parameters to send with the command as
           its JSON payload.
        """
        command_info = self._commands[command]
        assert command_info is not None, 'Unrecognised command %s' % command
        path = string.Template(command_info[1]).substitute(params)
        if hasattr(self, 'w3c') and self.w3c and isinstance(params, dict) and 'sessionId' in params:
            del params['sessionId']
        data = utils.dump_json(params)
        url = '%s%s' % (self._url, path)
        return self._request(command_info[0], url, body=data)

    def _request(self, method, url, body=None):
        """
        Send an HTTP request to the remote server.

        :Args:
         - method - A string for the HTTP method to send the request with.
         - url - A string for the URL to send the request to.
         - body - A string for request body. Ignored unless method is POST or PUT.

        :Returns:
          A dictionary with the server's parsed JSON response.
        """
        LOGGER.debug('%s %s %s' % (method, url, body))

        parsed_url = parse.urlparse(url)
        headers = self.get_remote_connection_headers(parsed_url, self.keep_alive)
        resp = None
        if body and method != 'POST' and method != 'PUT':
            body = None

        if self.keep_alive:
            resp = self._conn.request(method, url, body=body, headers=headers)

            statuscode = resp.status
        else:
            http = urllib3.PoolManager(timeout=self._timeout)
            resp = http.request(method, url, body=body, headers=headers)

            statuscode = resp.status
            if not hasattr(resp, 'getheader'):
                if hasattr(resp.headers, 'getheader'):
                    resp.getheader = lambda x: resp.headers.getheader(x)
                elif hasattr(resp.headers, 'get'):
                    resp.getheader = lambda x: resp.headers.get(x)

        data = resp.data.decode('UTF-8')
        try:
            if 300 <= statuscode < 304:
                return self._request('GET', resp.getheader('location'))
            if 399 < statuscode <= 500:
                return {'status': statuscode, 'value': data}
            content_type = []
            if resp.getheader('Content-Type') is not None:
                content_type = resp.getheader('Content-Type').split(';')
            if not any([x.startswith('image/png') for x in content_type]):

                try:
                    data = utils.load_json(data.strip())
                except ValueError:
                    if 199 < statuscode < 300:
                        status = ErrorCode.SUCCESS
                    else:
                        status = ErrorCode.UNKNOWN_ERROR
                    return {'status': status, 'value': data.strip()}

                # Some of the drivers incorrectly return a response
                # with no 'value' field when they should return null.
                if 'value' not in data:
                    data['value'] = None
                return data
            else:
                data = {'status': 0, 'value': data}
                return data
        finally:
            LOGGER.debug("Finished Request")
            resp.close()

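So execute sends a request to the remote server, which in this local setup is the ChromeDriver service started earlier (its address is self.service.service_url). execute looks up the HTTP method and URL path template for the command, fills in $sessionId, and serializes the parameters to JSON. It then calls _request, which sends the HTTP request and returns the parsed response. The driver carries out the command in the browser.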
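The request-building half of execute can be run on its own, without a driver. In this sketch, the base URL in the usage below is made up (9515 is ChromeDriver's usual default port when started by hand). The route strings mirror entries in Selenium 3's RemoteConnection command table, but check them against your installed version. The function name build_request is hypothetical:

```python
import json
import string
from typing import Any, Dict, Tuple

# A small subset of the command table: command name -> (HTTP method, path template).
COMMANDS: Dict[str, Tuple[str, str]] = {
    "get": ("POST", "/session/$sessionId/url"),
    "getTitle": ("GET", "/session/$sessionId/title"),
    "getCurrentUrl": ("GET", "/session/$sessionId/url"),
    "getPageSource": ("GET", "/session/$sessionId/source"),
}


def build_request(base_url: str, command: str, params: Dict[str, Any],
                  w3c: bool = True) -> Tuple[str, str, str]:
    """Return (method, url, json_body) the way RemoteConnection.execute builds them."""
    method, template = COMMANDS[command]                  # KeyError for unknown commands
    path = string.Template(template).substitute(params)  # fills in $sessionId
    body_params = dict(params)                            # do not modify the caller's dict
    if w3c:
        body_params.pop("sessionId", None)                # W3C mode drops it from the body
    return method, base_url + path, json.dumps(body_params)
```

For example, build_request("http://127.0.0.1:9515", "get", {"sessionId": "abc", "url": "http://www.csdn.net"}) yields a POST to http://127.0.0.1:9515/session/abc/url whose body only carries the url. In the real _request, the body is discarded for any method other than POST and PUT.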

The official documentation explains how this works:

At its minimum, WebDriver talks to a browser through a driver.
Communication is two way: WebDriver passes commands to the browser through the driver, and receives information back via the same route.
The driver is specific to the browser, such as ChromeDriver for Google’s Chrome/Chromium, GeckoDriver for Mozilla’s Firefox, etc. The driver runs on the same system as the browser. This may, or may not be, the same system where the tests themselves are executing.
This simple example above is direct communication. Communication to the browser may also be remote communication through Selenium Server or RemoteWebDriver. RemoteWebDriver runs on the same system as the driver and the browser.

In short, we talk to the browser through WebDriver, and the browser responds.

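The walkthrough above shows that execute reaches the browser through WebDriver when it sends a request, and that sending one of the predefined command constants returns related information.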
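That information comes back through the response side of _request, which boils down to a few status-code rules. Here is a simplified sketch that skips the HTTP call and redirect handling. In it, 0 stands in for ErrorCode.SUCCESS and 13 for ErrorCode.UNKNOWN_ERROR (its legacy JSON Wire status code). The name parse_response is hypothetical:

```python
import json
from typing import Any, Dict


def parse_response(status_code: int, text: str,
                   content_type: str = "application/json") -> Dict[str, Any]:
    """Simplified version of the response branch in RemoteConnection._request.

    3xx redirects are not handled here; the real code re-issues a GET to the Location header.
    """
    if 399 < status_code <= 500:
        # Client and server errors are returned raw for the error handler to inspect.
        return {"status": status_code, "value": text}
    if any(part.strip().startswith("image/png") for part in content_type.split(";")):
        # Screenshots are passed through without JSON parsing.
        return {"status": 0, "value": text}
    try:
        data = json.loads(text.strip())
    except ValueError:
        # Not JSON: treat a 2xx as success, anything else as an unknown error.
        status = 0 if 199 < status_code < 300 else 13
        return {"status": status, "value": text.strip()}
    if "value" not in data:
        data["value"] = None  # some drivers omit "value" instead of sending null
    return data
```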

Since the object we created is a WebDriver instance, let's check its base class (selenium.webdriver.remote.webdriver) for functions that return information. We find the following:

@property
def title(self):
    """Returns the title of the current page.

    :Usage:
        title = driver.title
    """
    resp = self.execute(Command.GET_TITLE)
    return resp['value'] if resp['value'] is not None else ""
@property
def current_url(self):
    """
    Gets the URL of the current page.

    :Usage:
        driver.current_url
    """
    return self.execute(Command.GET_CURRENT_URL)['value']
@property
def page_source(self):
    """
    Gets the source of the current page.

    :Usage:
        driver.page_source
    """
    return self.execute(Command.GET_PAGE_SOURCE)['value']

This is not the complete list. Let's try these properties; their usage is shown in the docstrings. We read title (page title), current_url (current URL), and page_source (page HTML):

from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.csdn.net")
print(driver.title)
print(driver.current_url)
print('Author blog: https://blog.csdn.net/A757291228')
# Please support original work; include a link to the original when reposting
# print(driver.page_source)

The page title and current URL are printed successfully.
Now try page_source:

from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.csdn.net")
print(driver.title)
print(driver.current_url)
print('Author blog: https://blog.csdn.net/A757291228')
# Please support original work; include a link when reposting
print(driver.page_source)

The page source is printed successfully.
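title, current_url and page_source are thin properties: each access sends one command through execute and unwraps "value". To see that without launching a browser, here is a sketch with a hypothetical StubDriver whose execute returns canned responses. The command names are the values of Command.GET_TITLE, Command.GET_CURRENT_URL and Command.GET_PAGE_SOURCE in Selenium 3:

```python
from typing import Any, Dict, List, Optional


class StubDriver:
    """Hypothetical stand-in for WebDriver: execute() returns canned responses
    instead of talking to ChromeDriver, so the property pattern runs offline."""

    def __init__(self, responses: Dict[str, Any]):
        self._responses = responses
        self.sent: List[str] = []  # records every command name passed to execute

    def execute(self, command: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        self.sent.append(command)
        return {"value": self._responses.get(command)}

    @property
    def title(self) -> str:
        resp = self.execute("getTitle")
        return resp["value"] if resp["value"] is not None else ""  # None becomes ""

    @property
    def current_url(self) -> Optional[str]:
        return self.execute("getCurrentUrl")["value"]

    @property
    def page_source(self) -> Optional[str]:
        return self.execute("getPageSource")["value"]
```

Because every access triggers a new command, reading driver.title twice sends two requests. Store the value in a variable if you need it repeatedly.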