Web Crawlers, Part 2

Posted by weixin_34138377 on 2018-01-31

Original URL: https://blog.csdn.net/weixin_34138377/article/details/87058809

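Request methods: GET and POST

Fetching AJAX-loaded content: with POST, the form data is passed in the Request's data= argument

Some page content is loaded through AJAX requests, so it cannot be obtained by fetching the page URL directly. Just remember that an AJAX request usually returns JSON to the page, so sending a POST or GET to the AJAX endpoint itself returns that JSON data.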
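To make the GET/POST distinction concrete, here is a minimal sketch using urllib. When a bytes payload is passed through data=, urllib sends the request as a POST; without it, the request is a GET and the parameters travel in the query string. The endpoint http://example.com/ajax and the helper names are placeholders assumed for illustration, not a real API, and the requests are only built here, not sent.

```python
import urllib.parse
import urllib.request


def build_get_request(url, params):
    """GET: the parameters are appended to the URL as a query string."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(url + '?' + query)


def build_post_request(url, params):
    """POST: the parameters are url-encoded, converted to bytes and passed as data=."""
    body = urllib.parse.urlencode(params).encode('utf-8')
    return urllib.request.Request(url, data=body)


# Hypothetical AJAX endpoint and parameters, used only for illustration
params = {'start': 0, 'limit': 20}
get_req = build_get_request('http://example.com/ajax', params)
post_req = build_post_request('http://example.com/ajax', params)

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Either Request object can then be passed to urllib.request.urlopen() to actually send it.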

Scraping Douban's top movies

import json
import urllib.parse
import urllib.request

start = 0   # offset of the first record on the current page
b = 1       # page counter, used in the output file name

while True:
    url_base = 'http://movie.douban.com/j/chart/top_list?'
    url_kw = {
        'type': 11,
        'interval_id': '100:90',
        'action': '',
        'start': start,
        'limit': 20
    }
    url_all = url_base + urllib.parse.urlencode(url_kw)
    print(url_all)

    request = urllib.request.Request(url=url_all)
    response = urllib.request.urlopen(url=request)
    context = response.read()

    # Save the raw response body first
    file_name = 'douban%s.html' % (b)
    with open(file_name, 'wb') as file:
        file.write(context)

    # Decode the bytes into a string
    ret1 = context.decode('utf-8')
    # The body is JSON, whose true/false literals are not valid Python,
    # so parse it with json.loads instead of patching the text and calling eval()
    ret3 = json.loads(ret1)
    print(ret3)
    print(len(ret3))

    if ret3 != []:
        # Overwrite the file with one record per line, then move to the next page
        with open(file_name, 'w', encoding='utf-8') as file:
            for i in ret3:
                file.write(str(i) + '\n')
        start = start + 20
        b += 1
    else:
        # An empty list means there are no more pages
        break
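The paging logic in the script above can also be separated from the network access, which makes it easier to test. The following is only a sketch under that assumption: the helper names page_url, fetch and iter_movies are made up for illustration, while the URL and query keys are the ones used above.

```python
import json
import urllib.parse
import urllib.request

URL_BASE = 'http://movie.douban.com/j/chart/top_list?'


def page_url(start, limit=20):
    """Build the URL for one page, using the same query keys as the script above."""
    params = {
        'type': 11,
        'interval_id': '100:90',
        'action': '',
        'start': start,
        'limit': limit,
    }
    return URL_BASE + urllib.parse.urlencode(params)


def fetch(url):
    """Download a URL and return the raw response bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def iter_movies(fetch_page=fetch, limit=20):
    """Yield records page by page until the endpoint returns an empty list."""
    start = 0
    while True:
        items = json.loads(fetch_page(page_url(start, limit)).decode('utf-8'))
        if not items:
            break
        yield from items
        start += limit
```

Calling list(iter_movies()) would walk every page over the network; passing a different fetch_page function lets the loop run against canned data instead.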

Custom opener objects

# Custom URL opener object
import urllib.request

# Create an HTTP handler; debuglevel=1 prints the HTTP traffic for debugging
http_handler = urllib.request.HTTPHandler(debuglevel=1)

# Build an opener object from the handler
http_opener = urllib.request.build_opener(http_handler)

request = urllib.request.Request('http://www.sina.com')

# Send the request and get the response
response = http_opener.open(request)
content = response.read()

with open('./12_1.html', 'wb') as file:
    file.write(content)
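An opener can also carry default headers, and it can be installed globally so that plain urllib.request.urlopen() calls go through it. The sketch below assumes a made-up helper name make_opener and a made-up User-Agent string; it only builds the opener and does not send anything.

```python
import urllib.request


def make_opener(user_agent, debuglevel=0):
    """Build an opener with an HTTP handler and a default User-Agent header."""
    http_handler = urllib.request.HTTPHandler(debuglevel=debuglevel)
    opener = urllib.request.build_opener(http_handler)
    # addheaders is a list of (name, value) pairs added to every request
    opener.addheaders = [('User-Agent', user_agent)]
    return opener


# 'my-crawler/0.1' is a placeholder value chosen for this example
opener = make_opener('my-crawler/0.1')

# Optional: make this opener the default used by urllib.request.urlopen()
# urllib.request.install_opener(opener)
# response = urllib.request.urlopen('http://www.sina.com')
```

Using opener.open() directly keeps the change local, whereas install_opener() affects every later urlopen() call in the process.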

Exception handling in urllib2 (urllib.request and urllib.error in Python 3)

import urllib.error
import urllib.request

# URLError: raised when the server cannot be reached at all
request = urllib.request.Request(url='http://www.iloveyou.com/')
try:
    response = urllib.request.urlopen(url=request)
except urllib.error.URLError as ex:
    print(ex)
else:
    content = response.read()
    print(content)
print('Done...')

print('*' * 100)

# HTTPError: raised when the server answers with an error status code
# request = urllib.request.Request(url='http://www.douyu.com/Jack_Cui.html')
request = urllib.request.Request(url='https://err.taobao.com/error1.html?c=404&u=https://www.taobao.com/markddddddddddddddddddddddddets/nvzhuang/dddddddddddddddtaobaonvzhuang?spm=a21bo.2017.201867-main.1.1819dddddddddddddddsac8a9XRYCTP&r=')
try:
    response = urllib.request.urlopen(url=request)
except urllib.error.HTTPError as ex:
    print(ex)
    print(dir(ex))
    print(ex.code)       # numeric status code
    print(ex.getcode())  # same value as ex.code
    print(ex.info())     # response headers
    print(ex.msg)
    print(ex.reason)
else:
    content = response.read()
    print(content)
print('Done...')
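Because HTTPError is a subclass of URLError, both cases can be handled in a single try block as long as the HTTPError clause comes first. The following is a minimal sketch; the helper names describe_error and fetch are assumptions made for this example.

```python
import urllib.error
import urllib.request


def describe_error(ex):
    """Turn a urllib exception into a short message."""
    if isinstance(ex, urllib.error.HTTPError):
        # The server answered, but with an error status
        return 'HTTP %d: %s' % (ex.code, ex.reason)
    # No usable answer at all: DNS failure, refused connection, missing file, ...
    return 'Request failed: %s' % ex.reason


def fetch(url, timeout=10):
    """Return the response body, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as ex:   # must come before URLError
        print(describe_error(ex))
    except urllib.error.URLError as ex:
        print(describe_error(ex))
    return None
```

If the URLError clause were listed first, it would also catch every HTTPError and the more specific branch would never run.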

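ProxyBasicAuthHandler (proxy authentication)

If we use the earlier code with a private proxy, it fails with an HTTP 407 error, which means the proxy rejected the request because no valid credentials were supplied:

urllib.error.HTTPError: HTTP Error 407: Proxy Authentication Required

So the code needs to be rewritten as follows: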

import urllib.request

# 1. Build a ProxyHandler whose proxy address carries the credentials
#    (replace username, password, IP and PORT with real values)
proxyauth_handler = urllib.request.ProxyHandler({"http": "username:password@IP:PORT"})

# 2. Pass the proxy handler to build_opener() to create a custom opener
opener = urllib.request.build_opener(proxyauth_handler)

# 3. Build the Request
request = urllib.request.Request("http://www.baidu.com/")

# 4. Send the request through the custom opener
response = opener.open(request)

# 5. Print the response body
print(response.read())
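The code above embeds the credentials in the proxy address. The handler named in the section title, ProxyBasicAuthHandler, is an alternative that keeps them in a password manager instead. The sketch below shows one way this might look; the helper name make_proxy_opener, the proxy address and the credentials are all placeholders assumed for illustration.

```python
import urllib.request


def make_proxy_opener(proxy_host, user, password):
    """Build an opener that routes HTTP through a proxy and answers 407 challenges."""
    # Store the credentials; realm=None means "use for any realm"
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, proxy_host, user, password)

    # Route HTTP traffic through the proxy (no credentials in the address)
    proxy_handler = urllib.request.ProxyHandler({'http': 'http://' + proxy_host})
    # Supply the stored credentials when the proxy asks for authentication
    auth_handler = urllib.request.ProxyBasicAuthHandler(password_mgr)

    return urllib.request.build_opener(proxy_handler, auth_handler)


# Placeholder proxy address and credentials, not a real proxy
opener = make_proxy_opener('127.0.0.1:8888', 'user', 'secret')
# response = opener.open('http://www.baidu.com/')
```

Keeping the password out of the proxy address avoids it showing up wherever that address is logged or printed.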
