Python Web Crawling: Scraping Taobao Search Pages (Runnable Code from the MOOC)

Posted by dream_網路安全 on 2018-11-24

Original URL: https://blog.csdn.net/weixin_42859280/article/details/84429707

The complete working code:

import requests
import re


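def getHTMLText(url):
    """Fetch a page and return its text, or an empty string on failure."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # Guess the encoding from the content rather than trusting the header.
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        # Network errors, timeouts and HTTP error statuses all end up here.
        return ""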

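def parsePage(ilt, html):
    """Append [price, title] pairs found in the page source to ilt."""
    # Capture groups return the quoted values directly, so eval() is not
    # needed on text that comes from the network.
    plt = re.findall(r'"view_price":"([\d.]*)"', html)
    tlt = re.findall(r'"raw_title":"(.*?)"', html)
    # zip() stops at the shorter list, so a mismatch cannot raise IndexError.
    for price, title in zip(plt, tlt):
        ilt.append([price, title])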

def printGoodsList(ilt):
    tplt = "{:4}\t{:8}\t{:16}"
    print(tplt.format("No.", "Price", "Title"))
    # enumerate(..., 1) numbers the rows starting from 1.
    for count, g in enumerate(ilt, 1):
        print(tplt.format(count, g[0], g[1]))
          
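def main():
    goods = '書包'  # search keyword ("schoolbag")
    depth = 2      # number of result pages to fetch
    start_url = 'https://s.taobao.com/search?q=' + goods
    infoList = []
    for i in range(depth):
        # The 's' parameter is the result offset; it advances by 44 per page.
        url = start_url + '&s=' + str(44 * i)
        html = getHTMLText(url)
        parsePage(infoList, html)
    printGoodsList(infoList)


if __name__ == "__main__":
    main()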

Sample run: [screenshot of the program's output]
Whatever site you crawl, check its robots.txt (the Robots Exclusion Protocol) first.
Taobao's robots.txt:

User-agent: *
Disallow: /

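These rules disallow every crawler on every path. Even so, a small practice script that requests pages about as often as a human visitor would is unlikely to cause trouble.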
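The rules can also be checked in code. Below is a minimal sketch using the standard library's urllib.robotparser. It parses the two lines quoted above instead of downloading the file, and the user agent name "my-crawler" is a placeholder chosen only for this example.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, parsed offline so no request is sent.
rules = ["User-agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(rules)

# "my-crawler" is a hypothetical user agent name used only for this example.
allowed = parser.can_fetch("my-crawler", "https://s.taobao.com/search?q=test")
print(allowed)  # False: every path is disallowed for every user agent
```

To check a live site instead, set_url() and read() on the same parser would fetch the real robots.txt before can_fetch() is called.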
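Program structure:
Step 1: Submit the product search request and loop over the result pages
Step 2: For each page, extract the product title and price
Step 3: Print the information to the screen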
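Step 1 boils down to generating one URL per result page. The sketch below shows one way to do that; build_search_urls and ITEMS_PER_PAGE are hypothetical names introduced here, and the step of 44 is taken from the code in this post. It uses urlencode so that the keyword is percent-encoded explicitly.

```python
from urllib.parse import urlencode

ITEMS_PER_PAGE = 44  # offset step used by the code in this post


def build_search_urls(goods, depth):
    """Return one search URL per result page (hypothetical helper)."""
    base = "https://s.taobao.com/search"
    urls = []
    for page in range(depth):
        # urlencode percent-encodes the keyword and joins the parameters.
        query = urlencode({"q": goods, "s": ITEMS_PER_PAGE * page})
        urls.append(base + "?" + query)
    return urls


urls = build_search_urls("bag", 2)
for url in urls:
    print(url)
# https://s.taobao.com/search?q=bag&s=0
# https://s.taobao.com/search?q=bag&s=44
```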

Viewing the page source: the price is stored in the view_price field, and the product title in raw_title.
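To illustrate how those fields can be pulled out with regular expressions, here is a small sketch run against a made-up fragment. The sample string only imitates the "key":"value" layout that the patterns in this post expect; real page source may look different.

```python
import re

# Made-up fragment imitating the data embedded in the page source.
sample = (
    '{"raw_title":"Canvas backpack","view_price":"59.00"},'
    '{"raw_title":"Leather bag","view_price":"128.50"}'
)

# The capture group (...) returns only the value between the quotes.
prices = re.findall(r'"view_price":"([\d.]*)"', sample)
titles = re.findall(r'"raw_title":"(.*?)"', sample)

pairs = list(zip(prices, titles))
print(pairs)
# [('59.00', 'Canvas backpack'), ('128.50', 'Leather bag')]
```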

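Pay attention to indentation. When the statements in a block are aligned with each other, the code runs without errors; when one of them is misaligned, Python reports an error. [The two screenshots that illustrated this are not preserved.]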
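Since the screenshots are missing, the following sketch reproduces the effect under the assumption that they showed a misaligned statement inside a block. The two snippets and the compiles helper are invented for this illustration; compile() is used so the error can be observed without stopping the script.

```python
# Same loop twice: once aligned, once with the last line indented too far.
good = "for i in range(2):\n    x = i\n    y = i + 1\n"
bad = "for i in range(2):\n    x = i\n      y = i + 1\n"


def compiles(source):
    """Return True if the source parses, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


print(compiles(good))  # True
print(compiles(bad))   # False
```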
