Fixing the frequent "Traceback (most recent call last):" error in web crawlers

Posted by 滑冰選手庫裡 on 2019-04-25

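Problem description:

When crawling a website at high speed, the script often stops with a "Traceback (most recent call last):" error, which in this case means the connection failed. The main cause is an unstable network connection while requests are being sent in quick succession, so I wrote a function that retries the connection several times.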

 

Error output:

Traceback (most recent call last):
  File "E:/pycharm/PycharmProjects/爬蟲/BG5.py", line 118, in <module>
    main(j)
  File "E:/pycharm/PycharmProjects/爬蟲/BG5.py", line 84, in main
    response1 = getHTMLText(data[j][0])
  File "E:/pycharm/PycharmProjects/爬蟲/BG5.py", line 54, in getHTMLText
    response = requests.get(url, headers=kv, timeout=60)
  File "E:\pycharm\PycharmProjects\venv\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "E:\pycharm\PycharmProjects\venv\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "E:\pycharm\PycharmProjects\venv\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "E:\pycharm\PycharmProjects\venv\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "E:\pycharm\PycharmProjects\venv\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.wzfg.com', port=80): Max retries exceeded with url: /realweb/stat/ProjectListHouseAll.jsp?status=&projectid=9001708&permitNo=%E7%91%9E%E5%AE%89%E5%B8%82%E5%94%AE%E8%AE%B8%E5%AD%97(2017)%E7%AC%AC010%E5%8F%B7 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000000000D42E208>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',))

Solution:

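import requests


def getHTMLText(url):
    """Fetch url, retrying up to maxTryNum times; return None if every attempt fails."""
    maxTryNum = 20
    kv = {"user-agent": "Mozilla/5.0"}
    for tries in range(maxTryNum):
        try:
            response = requests.get(url, headers=kv, timeout=60)
            return response.text
        except requests.exceptions.RequestException:
            # Connection errors and timeouts land here; the loop retries
            # unless this was the last allowed attempt.
            if tries == maxTryNum - 1:
                print("Has tried %d times to access url %s, all failed!" % (maxTryNum, url))
    return None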
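
The function above retries immediately, which can hit an already struggling server 20 times in a row. One possible refinement, not part of the original solution, is to wait a little longer before each new attempt. The sketch below is a minimal, standard-library-only illustration; the helper name retry_with_backoff and its parameters are hypothetical and chosen only for this example.

```python
import time


def retry_with_backoff(func, max_tries=5, base_delay=1.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or max_tries attempts have failed.

    Hypothetical helper for illustration: the delay doubles after each
    failure (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    for attempt in range(max_tries):
        try:
            return func()
        except exceptions:
            if attempt == max_tries - 1:
                raise  # out of attempts: let the caller see the last error
            # Pause before the next attempt so the server gets some breathing room.
            sleep(base_delay * 2 ** attempt)
```

With requests, this could be called as retry_with_backoff(lambda: requests.get(url, headers=kv, timeout=60).text, exceptions=(requests.exceptions.RequestException,)). Unlike getHTMLText, this sketch re-raises the last error instead of returning None, so the caller decides how to handle a page that never loads.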

 
