Scraping Taobao and JD Product Information with a Python Crawler

Posted by 誅心人i on 2020-02-11

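I come from a science background and am not good at small talk, so I'll briefly cover how this works and then go straight to the code.

Tools used: Python, PyCharm 2019.3, selenium, XPath, and chromedriver. If you need PyCharm, feel free to message me privately. selenium is a browser-automation framework that can be installed with pip: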

pip install selenium -i https://pypi.tuna.tsinghua.edu.cn/simple/

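(Note) The -i option tells pip to use the Tsinghua mirror for this one install. The default pip index is slow, so I switched to the mirror.

chromedriver download: http://chromedriver.storage.googleapis.com/index.html
Pick the chromedriver version that matches your installed Chrome browser. The version mapping is listed in this article: https://blog.csdn.net/BinGISer/article/details/88559532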
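Matching chromedriver to Chrome generally comes down to comparing major versions. A minimal sketch of that check follows; the helper names `major_version` and `same_major_version` and the sample version strings are invented here for illustration and are not part of the original scripts.

```python
def major_version(version: str) -> int:
    """Return the leading integer of a dotted version string such as '80.0.3987.106'."""
    return int(version.strip().split(".")[0])


def same_major_version(chrome_version: str, driver_version: str) -> bool:
    """True when Chrome and chromedriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    print(same_major_version("80.0.3987.122", "80.0.3987.106"))
    print(same_major_version("80.0.3987.122", "79.0.3945.36"))
```

The Chrome version itself can be read from the browser's "About Chrome" page.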

The JD workflow is described below. Taobao works the same way, except that it adds a login verification step.
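For the Taobao login step, one possible approach is to log in by hand in the browser window and let the script wait until that is done. The sketch below is an assumption, not the author's method: it polls `driver.current_url` (a real Selenium property) and assumes the login page's URL contains the substring "login" while the search page's URL does not. The helper name `wait_for_manual_login` and its parameters are hypothetical.

```python
import time


def wait_for_manual_login(driver, marker="login", timeout=120.0, poll=1.0):
    """Block until driver.current_url no longer contains `marker`.

    Returns True once the browser has left the login page, False on timeout.
    `marker` is an assumed substring of the login page URL.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if marker not in driver.current_url:
            return True
        time.sleep(poll)
    return False
```

It would be called right after the search click, before reading the pager element, and the script should stop if it returns False.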

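Step 1: find the store's address: https://www.jd.com/

Step 2: simulate a normal search. The usual steps are: type the product name, click search, scroll down to view the products, then click next page to see more. I won't explain in detail how I worked this out (it works, and I'm too lazy to write a full tutorial), so here is the code. If you can't follow it, or really want to learn, add me on QQ to discuss: 2511217211 (put "討論", meaning "discussion", in the friend-request note).

Code for scraping JD product information: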

# Uses the Selenium 4 locator API: find_element(By.XPATH, ...)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from time import sleep
import re
import os


# Search for the product and return the total number of result pages
def search_products():
    # Type the product name into the search box
    driver.find_element(By.XPATH, '//*[@id="key"]').send_keys(keyword)
    # Click the search button
    driver.find_element(By.XPATH, '//*[@class="form"]/button').click()
    sleep(10)
    token = driver.find_element(By.XPATH, '//*[@id="J_bottomPage"]/span[2]/em[1]/b').text
    # group(1) is the first run of digits found in the pager text
    token = int(re.search(r'(\d+)', token).group(1))
    # Return the total number of pages
    return token


# Scroll down in steps so that lazily loaded items get rendered
def drop_down():
    for x in range(1, 11, 2):
        sleep(1)
        # Scroll to 10%, 30%, 50%, 70%, then 90% of the page height
        j = x / 10
        js = 'document.documentElement.scrollTop = document.documentElement.scrollHeight * %f' % j
        driver.execute_script(js)


# Extract and print the product information on the current page
def get_product():
    lis = driver.find_elements(By.XPATH, '//*[@class="gl-warp clearfix"]/li[@class="gl-item"]')
    for li in lis:
        price = li.find_element(By.XPATH, './/div[@class="p-price"]/strong/i').text + '元'
        info = li.find_element(By.XPATH, './/div[@class="p-name"]/a/em').text + li.find_element(
            By.XPATH, './/div[@class="p-name"]/a').get_attribute('title')
        p_commit = li.find_element(By.XPATH, './/div[@class="p-commit"]/strong/a').text
        p_shopnum = li.find_element(By.XPATH, './/div[@class="p-shopnum"]/*').text
        p_img = li.find_element(By.XPATH, './/div[@class="p-img"]/a/img').get_attribute('src')
        print(info, price, p_commit, p_shopnum, p_img, sep='|')


# Walk through every result page
def next_page():
    token = search_products()
    # Result page n is requested with page=2*n-1; the range includes the last page
    for num in range(1, token + 1):
        driver.get('https://search.jd.com/Search?keyword={}&page={}'.format(keyword, 2 * num - 1))
        driver.implicitly_wait(10)
        drop_down()
        get_product()


if __name__ == "__main__":
    keyword = input('Enter the product name to search for: ')
    # chromedriver.exe is expected in a "Drive" folder next to the working directory's parent
    driver_path = os.path.abspath(os.path.join(os.getcwd(), "..")) + "/Drive/chromedriver.exe"
    driver = webdriver.Chrome(service=Service(driver_path))
    # Maximize the window so that no data is lost
    driver.maximize_window()
    driver.get('https://www.jd.com/')
    next_page()
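The paging loop above builds each JD search URL from the keyword and a `page` value of `2 * num - 1`. A small sketch of that arithmetic on its own, with the keyword percent-encoded so that non-ASCII product names are safe in the URL; `jd_search_url` is a helper name invented here and is not part of the original script.

```python
from urllib.parse import quote


def jd_search_url(keyword: str, page_number: int) -> str:
    """Build the search URL for the 1-based result page `page_number`."""
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    # Result page n maps to page=2*n-1, as in the script above.
    return "https://search.jd.com/Search?keyword={}&page={}".format(
        quote(keyword), 2 * page_number - 1
    )


if __name__ == "__main__":
    for n in range(1, 4):
        print(jd_search_url("python", n))
```

Keeping the URL construction in one function makes the off-by-one cases (first and last page) easy to check without opening a browser.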

Code for scraping Taobao product information:

# Uses the Selenium 4 locator API: find_element(By.XPATH, ...)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from time import sleep
import re
import os


# Search for the product and return the total number of result pages
def search_products():
    # Type the product name into the search box
    driver.find_element(By.XPATH, '//*[@id="q"]').send_keys(keyword)
    # Click the search button
    driver.find_element(By.XPATH, '//*[@id="J_TSearchForm"]/div[1]/button').click()
    sleep(10)
    token = driver.find_element(By.XPATH, '//*[@id="mainsrp-pager"]/div/div/div/div[1]').text
    # group(1) is the first run of digits found in the pager text
    token = int(re.search(r'(\d+)', token).group(1))
    return token


# Scroll down in steps so that lazily loaded items get rendered
def drop_down():
    for x in range(1, 11, 2):
        sleep(1)
        # Scroll to 10%, 30%, 50%, 70%, then 90% of the page height
        j = x / 10
        js = 'document.documentElement.scrollTop = document.documentElement.scrollHeight * %f' % j
        driver.execute_script(js)


# Extract and print the product information on the current page
def get_product():
    lis = driver.find_elements(By.XPATH, '//div[@class="items"]/div[@class="item J_MouserOnverReq  "]')
    for li in lis:
        info = li.find_element(By.XPATH, './/div[@class="row row-2 title"]').text
        price = li.find_element(By.XPATH, './/a[@class="J_ClickStat"]').get_attribute('trace-price') + '元'
        deal = li.find_element(By.XPATH, './/div[@class="deal-cnt"]').text
        image = li.find_element(By.XPATH, './/div[@class="pic"]/a/img').get_attribute('src')
        name = li.find_element(By.XPATH, './/div[@class="shop"]/a/span[2]').text
        site = li.find_element(By.XPATH, './/div[@class="location"]').text
        print(info, price, deal, name, site, image, sep='|')


# Walk through every result page
def next_page():
    token = search_products()
    # The s parameter is the item offset: 0, 44, 88, ...
    for num in range(token):
        driver.get('https://s.taobao.com/search?q={}&s={}'.format(keyword, 44 * num))
        driver.implicitly_wait(10)
        drop_down()
        get_product()


if __name__ == "__main__":
    keyword = input('Enter the product name to search for: ')
    # chromedriver.exe is expected in a "Drive" folder next to the working directory's parent
    driver_path = os.path.abspath(os.path.join(os.getcwd(), "..")) + "/Drive/chromedriver.exe"
    driver = webdriver.Chrome(service=Service(driver_path))
    # Maximize the window so that no data is lost
    driver.maximize_window()
    driver.get('https://www.taobao.com/')
    next_page()
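Two pieces of the Taobao script can be checked without a browser: pulling the page count out of the pager text, and turning a page index into the `s` offset (a step of 44, as in the loop above). The sketch below isolates both. The helper names are invented here, and the sample pager text in the demo is an assumption about what the pager element contains.

```python
import re
from urllib.parse import quote

ITEMS_PER_PAGE = 44  # step of the `s` parameter used by the script above


def parse_total_pages(pager_text: str) -> int:
    """Return the first integer found in the pager text."""
    match = re.search(r"(\d+)", pager_text)
    if match is None:
        raise ValueError("no page count found in %r" % pager_text)
    return int(match.group(1))


def taobao_search_urls(keyword: str, total_pages: int) -> list:
    """Build one search URL per result page, with offsets 0, 44, 88, ..."""
    return [
        "https://s.taobao.com/search?q={}&s={}".format(quote(keyword), ITEMS_PER_PAGE * i)
        for i in range(total_pages)
    ]


if __name__ == "__main__":
    # Assumed sample of the pager text, for illustration only.
    pages = parse_total_pages("共 3 页，")
    for url in taobao_search_urls("python", pages):
        print(url)
```

Raising an error when no digits are found makes a changed page layout fail loudly instead of crashing on `None.group`.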
