Web Crawler Notes 2: Getting Started with the requests Library, Part 2 (Programming Examples)

Published by 史努B on 2018-05-10

Experiment 1: Crawling JD.com

import requests

url = "http://item.jd.com/10460106645"
try:
    r = requests.get(url)
    r.raise_for_status()                  # raise HTTPError on 4xx/5xx responses
    r.encoding = r.apparent_encoding      # guess the encoding from the page content
    print(r.text[:1000])
except requests.RequestException:
    print("Crawl failed")
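The call to raise_for_status() is what routes failures into the except branch: it raises requests.HTTPError whenever the status code is 4xx or 5xx, and returns None otherwise. A minimal offline sketch (building a bare Response object by hand, so no network is needed):

```python
import requests

# Construct a Response manually instead of fetching one, so this runs offline.
r = requests.Response()
r.status_code = 404          # simulate a "Not Found" reply

try:
    r.raise_for_status()     # raises requests.HTTPError for 4xx/5xx codes
except requests.HTTPError as e:
    print("caught:", e)

ok = requests.Response()
ok.status_code = 200
ok.raise_for_status()        # 2xx codes raise nothing and return None
print("200 passed")
```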


Experiment 2: Crawling Amazon

import requests

url = "https://www.amazon.cn/dp/B00RT6LB9W/ref=cngwdyfloorv2_recs_0?pf_rd_p=05f2b7d6-37ec-49bf-8fcf-5d2fec23a061&pf_rd_s=desktop-2&pf_rd_t=36701&pf_rd_i=desktop&pf_rd_m=A1AJ19PSB66TGU&pf_rd_r=TYTEFRZ086W1AQREBTFK&pf_rd_r=TYTEFRZ086W1AQREBTFK&pf_rd_p=05f2b7d6-37ec-49bf-8fcf-5d2fec23a061"
try:
    # Change the User-Agent to that of a standard browser, because some sites
    # refuse requests from crawlers. Without the change, r.request.headers shows:
    # {'User-Agent': 'python-requests/2.18.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
    # and the site may block any visit whose User-Agent identifies it as Python.
    # After the change, r.request.headers shows:
    # {'user-agent': 'Mozilla/5.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
    kv = {'user-agent': 'Mozilla/5.0'}
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text)
except requests.RequestException:
    print("Crawl failed")
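The before/after headers quoted in the comments can be reproduced without contacting Amazon at all, by preparing a request locally. A small sketch (the URL here is only a placeholder; default_headers comes from requests' public requests.utils module):

```python
import requests
from requests.utils import default_headers

# The User-Agent that requests sends when you do not override it:
print(default_headers()["User-Agent"])       # e.g. 'python-requests/2.x.x'

# Preparing a request locally shows the headers that *would* be sent.
req = requests.Request("GET", "https://www.amazon.cn/",
                       headers={"user-agent": "Mozilla/5.0"})
prepared = req.prepare()
print(prepared.headers["User-Agent"])        # header lookup is case-insensitive
```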

Experiment 3: Submitting Search Keywords to Baidu and 360

Search engine keyword submission interfaces:

Baidu: http://www.baidu.com/s?wd=keyword

360: http://www.so.com/s?q=keyword

Full Baidu crawler code:

import requests

keyword = "Python"
try:
    kv = {'wd': keyword}
    r = requests.get("http://www.baidu.com/s", params=kv)
    print(r.request.url)      # the full URL actually requested
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Failed")

Full 360 crawler code:

import requests

keyword = "Python"
try:
    kv = {'q': keyword}
    r = requests.get("http://www.so.com/s", params=kv)
    print(r.request.url)      # the full URL actually requested
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Failed")
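In both scripts, printing r.request.url reveals how the params dict was merged into the URL. The same encoding can be inspected offline with PreparedRequest (the class requests uses internally to build URLs), including the automatic percent-encoding of non-ASCII keywords:

```python
from requests.models import PreparedRequest

# params are appended to the URL as a query string.
req = PreparedRequest()
req.prepare_url("http://www.baidu.com/s", {"wd": "Python"})
print(req.url)                 # http://www.baidu.com/s?wd=Python

# Non-ASCII keywords are UTF-8 percent-encoded automatically.
req2 = PreparedRequest()
req2.prepare_url("http://www.so.com/s", {"q": "网络爬虫"})
print(req2.url)
```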

Experiment 4: Crawling Images from the Web

Link format for web images: http://www.example.com/picture.jpg

import requests

path = "C:/zhj/abc.jpg"
url = "https://www.nationalgeographic.com/content/dam/travel/2018-digital/wild-wonders-of-europe/wild-wonders-of-europe-23.ngsversion.1525723673468.adapt.676.1.jpg"
r = requests.get(url)
print(r.status_code)          # 200 means the download succeeded

with open(path, 'wb') as f:   # 'wb': the image is binary data
    f.write(r.content)        # r.content holds the response body as bytes
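A slightly more reusable variant (a hypothetical helper, not from the original post) derives the local file name from the last URL segment and adds the same raise_for_status() check used in the earlier experiments:

```python
import os
import requests

def filename_from_url(url: str) -> str:
    # Use the last path segment, e.g. '.../picture.jpg' -> 'picture.jpg'.
    return url.split("/")[-1]

def download_image(url: str, save_dir: str = ".") -> str:
    path = os.path.join(save_dir, filename_from_url(url))
    r = requests.get(url, timeout=10)
    r.raise_for_status()              # fail loudly instead of saving an error page
    with open(path, "wb") as f:
        f.write(r.content)            # r.content is the raw image bytes
    return path

print(filename_from_url("http://www.example.com/picture.jpg"))  # picture.jpg
```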

Experiment 5: Automatic IP Address Geolocation Lookup

Query interface: http://m.ip138.com/ip.asp?ip=ipaddress


import requests

url = "http://m.ip138.com/ip.asp?ip="
try:
    r = requests.get(url + '202.116.65.13')
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])      # the location info appears near the end of the page
except requests.RequestException:
    print("Failed")
