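Scrape enough data from a site like this and your crawler will eventually hit anti-scraping measures. The most common one is an IP ban. These are easy to get around: route your requests through a proxy. The proxy code is largely the same for sites like this, so here it is for anyone who needs it: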

    # -*- coding: utf-8 -*-
    import random

    import requests

    # Target page to request
    targetUrl = "..."
    # Or a target HTTPS page
    # targetUrl = "..."

    # Proxy server (see the provider's website for details)
    proxyHost = "t.16yun.cn"
    proxyPort = "31111"

    # Proxy authentication credentials
    proxyUser = "username"
    proxyPass = "password"

    proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
        "host": proxyHost,
        "port": proxyPort,
        "user": proxyUser,
        "pass": proxyPass,
    }

    # Route both http and https requests through the HTTP proxy
    proxies = {
        "http": proxyMeta,
        "https": proxyMeta,
    }

    # Set the IP-switching header (provider-specific): a random Proxy-Tunnel value
    tunnel = random.randint(1, 10000)
    headers = {"Proxy-Tunnel": str(tunnel)}

    resp = requests.get(targetUrl, proxies=proxies, headers=headers)
    print(resp.status_code)
    print(resp.text)
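Building on the snippet above, here is a minimal sketch that moves the proxy setup into a helper and retries a request with a fresh `Proxy-Tunnel` value when the response looks like a ban. It assumes, as the original comment suggests, that the provider assigns a different exit IP for each tunnel value. It also treats HTTP 403 and 429 as ban signals, which is an assumption. The helper names `build_proxies` and `fetch_with_rotation` are hypothetical. Credentials are percent-encoded with `urllib.parse.quote` so that a password containing `@` or `:` does not break the proxy URL.

```python
import random
from urllib.parse import quote


def build_proxies(host, port, user, password):
    """Return a requests-style proxies dict that sends http and https through one HTTP proxy."""
    # Percent-encode the credentials so characters like '@' or ':' can't corrupt the URL.
    meta = "http://%s:%s@%s:%s" % (quote(user, safe=""), quote(password, safe=""), host, port)
    return {"http": meta, "https": meta}


def fetch_with_rotation(get, url, proxies, max_tries=3, ban_statuses=(403, 429)):
    """Call get() up to max_tries times, picking a new Proxy-Tunnel value on each attempt.

    Stops at the first response whose status is not in ban_statuses and returns it;
    if every attempt looks banned, returns the last response.
    """
    resp = None
    for _ in range(max_tries):
        # Hypothetical assumption: a new tunnel value makes the provider switch exit IPs.
        headers = {"Proxy-Tunnel": str(random.randint(1, 10000))}
        resp = get(url, proxies=proxies, headers=headers, timeout=10)
        if resp.status_code not in ban_statuses:
            break
    return resp
```

To use it with requests, pass `requests.get` as the `get` argument: `fetch_with_rotation(requests.get, targetUrl, build_proxies(proxyHost, proxyPort, proxyUser, proxyPass))`. Because the request function is a parameter, the retry logic can be tested without network access.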

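We'll share the final data-analysis results in detail next time. That's all for this simple example of scraping data through a proxy. Follow along for more crawler tips in future posts.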
