Python: Scraping Bing Desktop Wallpapers

Posted by 星海千尋 on 2020-12-26

Original URL: https://blog.csdn.net/qq_29367075/article/details/111714545

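The desktop wallpapers come from Bing, via the Bing wallpaper site https://bing.ioliu.cn/
Each page shows 12 photos, each photo has a link for downloading the high-resolution version, and the gallery is split across multiple pages.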

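The annoying part is that F12 does nothing once the site is open, so I fetched the page directly with Python and found out why:
[image]

123 is the key code for F12. The site blocks F12, Ctrl+Shift+I, and Ctrl+S.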

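That does not get in our way, though: an HTTP client such as urllib.request (the code below uses requests) can retrieve the full page source.
The caption of each image is in an <h3> element.
The download link of each image is in an <a class="ctrl download"> element.
The total page count is in the <span> inside <div class="page">.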
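
To make the three lookups above concrete, here is a minimal sketch that uses only the standard library (urllib.request and html.parser) instead of requests and BeautifulSoup. The names WallpaperPageParser, parse_page, fetch_html and SAMPLE_HTML are made up for illustration, and the sample markup is a hypothetical stand-in for the real page, which may be structured differently.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class WallpaperPageParser(HTMLParser):
    """Collects <h3> captions, download links and the pager text of one page."""

    def __init__(self):
        super().__init__()
        self.titles = []      # text of every <h3>
        self.downloads = []   # href of every <a class="ctrl download">
        self.pager = ''       # text of the first <span> inside <div class="page">
        self._in_h3 = False
        self._in_page_div = False
        self._in_pager_span = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'h3':
            self._in_h3 = True
            self.titles.append('')
        elif tag == 'a' and 'ctrl' in classes and 'download' in classes:
            self.downloads.append(attrs.get('href'))
        elif tag == 'div' and 'page' in classes:
            self._in_page_div = True
        elif tag == 'span' and self._in_page_div and not self.pager:
            self._in_pager_span = True

    def handle_endtag(self, tag):
        # Simplification: nested <div>s inside the pager are not tracked.
        if tag == 'h3':
            self._in_h3 = False
        elif tag == 'span':
            self._in_pager_span = False
        elif tag == 'div':
            self._in_page_div = False

    def handle_data(self, data):
        if self._in_h3:
            self.titles[-1] += data
        elif self._in_pager_span:
            self.pager += data


def fetch_html(url):
    """Download a page with urllib.request, sending a browser-like User-Agent."""
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(req) as resp:
        return resp.read().decode('utf-8', errors='replace')


def parse_page(html):
    """Return (captions, download hrefs, total page count) for one page."""
    parser = WallpaperPageParser()
    parser.feed(html)
    total_pages = int(parser.pager.split('/')[1])
    return parser.titles, parser.downloads, total_pages


# Hypothetical markup, written only to exercise the parser offline.
SAMPLE_HTML = '''
<div class="item">
  <h3>示例標題 (© Example)</h3>
  <a class="ctrl download" href="/photo/Example_1?force=download">download</a>
</div>
<div class="page"><a href="/?p=1">prev</a><span>1 / 180</span><a href="/?p=2">next</a></div>
'''

if __name__ == '__main__':
    print(parse_page(SAMPLE_HTML))
```

Against the live site one would call parse_page(fetch_html('https://bing.ioliu.cn/?p=1')), assuming the page still matches the description above.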

The URL of each page looks like this:
https://bing.ioliu.cn/?p=1
https://bing.ioliu.cn/?p=2
https://bing.ioliu.cn/?p=3
https://bing.ioliu.cn/?p=4
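
Since only the p query parameter changes from page to page, the URLs can be generated rather than typed out. A small sketch follows; the helper names page_url and page_urls are illustrative and not part of the original script.

```python
from urllib.parse import urlencode

ROOT_URL = 'https://bing.ioliu.cn/?'


def page_url(page):
    """Build the URL of one gallery page from its 1-based page number."""
    return ROOT_URL + urlencode({'p': page})


def page_urls(total_pages):
    """Build the URLs of pages 1..total_pages, in order."""
    return [page_url(p) for p in range(1, total_pages + 1)]


if __name__ == '__main__':
    for url in page_urls(4):
        print(url)
```

Using urlencode rather than string concatenation keeps the query string correctly escaped if more parameters are added later.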

The full code is as follows:

import os
import re
from urllib.parse import urlencode

import requests
from bs4 import BeautifulSoup

rootrurl = 'https://bing.ioliu.cn/?'
save_dir = 'D:/estimages/'
headers = {
    "Referer": rootrurl,
    'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
    'Accept-Language': 'en-US,en;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive'
}  # Request headers that make the client look like a browser

def saveOneImg(dir_path, img_url, title):
    # Build fresh browser-like headers for every image, with the image URL
    # itself as the Referer. This avoids HTTP 403 responses caused by the
    # site's hotlink protection.
    new_headers = {
        "Referer": img_url,
        'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
        'Accept-Language': 'en-US,en;q=0.8',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive'
    }

    try:
        img = requests.get(img_url, headers=new_headers)  # request the actual image URL
        if img.status_code == 200:
            # Write the image bytes to a local file; the with block closes it.
            with open('{}/{}.jpg'.format(dir_path, title), 'wb') as jpg:
                jpg.write(img.content)
            print(img_url)
            return True
        else:
            return False
    except Exception as e:
        print('exception occurs: ' + img_url)
        print(e)
        return False


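def getSubTitleName(text):
    # Alternative that would also keep ASCII letters and digits:
    # cop = re.compile("[^\u4e00-\u9fa5a-zA-Z0-9]")
    cop = re.compile("[^\u4e00-\u9fa5]")  # matches every character that is not a Chinese character
    string1 = cop.sub('', text)  # strip the matched characters, leaving only the Chinese text
    return string1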


def getOnePage(page):
    params = {
        'p': page,
    }
    url = rootrurl + urlencode(params)
    print(url)
    html = BeautifulSoup(requests.get(url, headers=headers).text, features="html.parser")
    titles = html.find_all('h3')
    lis = html.find_all('a', {'class': 'ctrl download'})

    # The script assumes captions and download links appear in the same order.
    # rootrurl[:-2] drops the trailing "/?" so the site-relative href can be appended.
    for title, a in zip(titles, lis):
        saveOneImg(save_dir, rootrurl[:-2] + a.get('href'), getSubTitleName(title.get_text()))


def getNumOfPages():
    html = BeautifulSoup(requests.get(rootrurl, headers=headers).text, features="html.parser")
    # The pager text is split on '/' and the part after the slash is taken as the total.
    return int(html.find('div', {'class': 'page'}).find('span').get_text().split('/')[1])


if __name__ == '__main__':
    os.makedirs(save_dir, exist_ok=True)  # make sure the output folder exists
    getTotal = getNumOfPages()

    for i in range(1, getTotal+1):
        getOnePage(i)
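
One thing to watch in the script: the regex in getSubTitleName keeps only Chinese characters, so a caption that contains none becomes an empty file name, and two captions that reduce to the same text overwrite each other. The sketch below is one possible workaround and is not part of the original script; safe_title and unique_name are hypothetical helpers, and keeping ASCII letters and digits is a choice made here for illustration.

```python
import re

# Keep Chinese characters (U+4E00 to U+9FA5) plus ASCII letters and digits.
_DROP = re.compile(r'[^\u4e00-\u9fa5a-zA-Z0-9]')


def safe_title(raw_title, fallback):
    """Strip characters that are awkward in file names; never return ''."""
    cleaned = _DROP.sub('', raw_title)
    return cleaned or fallback


def unique_name(name, seen):
    """Append a counter when name was already used, then record the result."""
    candidate = name
    n = 1
    while candidate in seen:
        n += 1
        candidate = '{}_{}'.format(name, n)
    seen.add(candidate)
    return candidate


if __name__ == '__main__':
    seen = set()
    for index, caption in enumerate(['故宮角樓 (© Getty Images)', '© !!!', '故宮角樓 (© Getty Images)']):
        print(unique_name(safe_title(caption, 'image_{}'.format(index)), seen))
```

In the script, the result of unique_name would be passed to saveOneImg in place of getSubTitleName(...), with one seen set shared across all pages.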

The result looks like this:
[image]

[image]
