Scraping YouTube Playlist Information with Scrapy

Posted by libbit702 on 2019-02-16

Original URL: https://flycode.co/archives/78731

Scraping YouTube playlist data requires that the machine where Scrapy is deployed can reach the YouTube website.

The data stored in Mongo looks like this (excerpt):

{
    "playlist_id" : "PLEbPmOCXPYV67l45xFBdmodrPkhzuwSe9",
    "videos" : [
        {
            "playlist_id" : "PLEbPmOCXPYV67l45xFBdmodrPkhzuwSe9",
            "video_id" : "9pTwztLOvj4",
            "thumbnail" : [
                {
                    "url" : "https://i.ytimg.com/vi/9pTwztLOvj4/hqdefault.jpg?sqp=-oaymwEZCPYBEIoBSFXyq4qpAwsIARUAAIhCGAFwAQ==&rs=AOn4CLCmUXUPe-HgXiie0SRfL5cYz0JRrg",
                    "width" : 245,
                    "height" : 137
                }
            ],
            "title" : "Legend of the galactic heroes (1988) episode 1",
            "index" : 1,
            "length_seconds" : 1445,
            "is_playable" : true
        },
        {
            "playlist_id" : "PLEbPmOCXPYV67l45xFBdmodrPkhzuwSe9",
            "video_id" : "zzD1xU37Vtc",
            "thumbnail" : [
                {
                    "url" : "https://i.ytimg.com/vi/zzD1xU37Vtc/hqdefault.jpg?sqp=-oaymwEZCPYBEIoBSFXyq4qpAwsIARUAAIhCGAFwAQ==&rs=AOn4CLCnLCYaZVBeHnZR0T73rfEd_Dbyew",
                    "width" : 245,
                    "height" : 137
                }
            ],
            "title" : "Legend of the galactic heroes (1988) episode 2",
            "index" : 2,
            "length_seconds" : 1447,
            "is_playable" : true
        },
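
The post does not show how the item reaches Mongo. As a minimal sketch, assuming pymongo is used, a document of the shape above could be written by an item pipeline along these lines. The class name `MongoPlaylistPipeline` and the upsert-by-`playlist_id` policy are assumptions for illustration, not taken from the original project. The collection is passed in so the class can be tried with a stand-in object; wiring it to a real `pymongo` collection and to Scrapy's settings is left out.

```python
class MongoPlaylistPipeline:
    """Hypothetical item pipeline: one Mongo document per playlist."""

    def __init__(self, collection):
        # `collection` is assumed to behave like a pymongo Collection.
        self.collection = collection

    def process_item(self, item, spider):
        # Convert the item and its nested video items to plain dicts,
        # since Mongo stores plain documents.
        doc = dict(item)
        doc["videos"] = [dict(video) for video in doc.get("videos", [])]
        # Upsert keyed on playlist_id so a re-crawl replaces the old copy.
        self.collection.replace_one(
            {"playlist_id": doc["playlist_id"]}, doc, upsert=True
        )
        return item


class FakeCollection:
    """In-memory stand-in used only to demonstrate the pipeline."""

    def __init__(self):
        self.docs = {}

    def replace_one(self, filter, replacement, upsert=False):
        self.docs[filter["playlist_id"]] = replacement


collection = FakeCollection()
pipeline = MongoPlaylistPipeline(collection)
pipeline.process_item(
    {
        "playlist_id": "PLEbPmOCXPYV67l45xFBdmodrPkhzuwSe9",
        "videos": [{"video_id": "9pTwztLOvj4", "index": 1}],
    },
    spider=None,
)
```

Keying the upsert on `playlist_id` keeps a single document per playlist, which matches the nested `videos` array in the sample data.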

The spider code is as follows:

# -*- coding: utf-8 -*-
import scrapy
import re
import json
from scrapy import Selector
from knowsmore.items import YoutubePlaylistItem, YoutubePlaylistVideoItem
from ..common import *  # r1(), used below, is presumably defined in this module


class YoutubeListSpider(scrapy.Spider):
    name = 'youtube_list'
    allowed_domains = ['www.youtube.com']
    start_urls = ['https://www.youtube.com/playlist?list=PLEbPmOCXPYV67l45xFBdmodrPkhzuwSe9']

    def parse(self, response):
        # Extract the embedded JSON data with a regular expression.
        # The square brackets are escaped so they match literally, and
        # response.text (str) is used so a str pattern also works on Python 3.
        ytInitialData = r1(r'window\["ytInitialData"\] = (.*?)}};', response.text)
        if ytInitialData:
            # Put back the closing braces that the regex terminator consumed
            ytInitialData = '%s}}' % ytInitialData
            ytInitialDataObj = json.loads(ytInitialData)

            # Walk down to the renderer that holds the playlist's video list
            playListInfo = ytInitialDataObj['contents']['twoColumnBrowseResultsRenderer']['tabs'][0]['tabRenderer']['content']['sectionListRenderer']['contents'][0]['itemSectionRenderer']['contents'][0]['playlistVideoListRenderer']

            # Build the Scrapy item for the playlist
            playList = YoutubePlaylistItem(
                playlist_id = playListInfo['playlistId'],
                videos = []
            )

            # Append one video item per entry to the playlist's videos field
            for videoInfo in playListInfo['contents']:
                videoInfo = videoInfo['playlistVideoRenderer']
                videoItem = YoutubePlaylistVideoItem(
                    playlist_id = playListInfo['playlistId'],
                    video_id = videoInfo['videoId'],
                    thumbnail = videoInfo['thumbnail']['thumbnails'],
                    title = videoInfo['title']['simpleText'],
                    index = videoInfo['index']['simpleText'],
                    length_seconds = videoInfo['lengthSeconds'],
                    is_playable = videoInfo['isPlayable']
                )
                playList['videos'].append(videoItem)

            yield playList
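
The spider relies on `r1`, which comes from the project's `common` module and is not shown in the post. The following is only a guess at what such a helper might look like: it returns the first capture group of the first match, or `None`. The function `extract_initial_data` and the sample HTML are likewise made up for illustration. Note that the square brackets in the pattern are escaped so that they match literally.

```python
import json
import re


def r1(pattern, text):
    """Assumed behavior: first capture group of the first match, else None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def extract_initial_data(html):
    """Pull the ytInitialData JSON object out of a page's HTML, if present."""
    raw = r1(r'window\["ytInitialData"\] = (.*?)}};', html)
    if raw is None:
        return None
    # The lazy match stops just before the final "}};", so restore the braces.
    return json.loads(raw + "}}")


# Made-up miniature page; the real object is far larger.
sample_html = (
    '<script>window["ytInitialData"] = '
    '{"contents": {"tabs": {"n": 1}}};</script>'
)
data = extract_initial_data(sample_html)
no_data = extract_initial_data("<html></html>")
```

Because the pattern is lazy, it cuts at the first `}};` in the page, so this approach only holds while that sequence does not occur earlier inside the JSON itself.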

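The long chain of keys used to reach `playlistVideoListRenderer` raises `KeyError` or `IndexError` as soon as the page layout changes. One way to fail softly, sketched here with a hypothetical helper `dig` that is not part of the original spider, is to walk the path one step at a time and return a default when a step is missing. The `page` object and the `PL123` playlist ID are placeholders.

```python
def dig(data, path, default=None):
    """Follow a sequence of dict keys and list indexes; return default on a miss."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Same path the spider follows, written as a tuple of steps.
PLAYLIST_PATH = (
    "contents", "twoColumnBrowseResultsRenderer", "tabs", 0,
    "tabRenderer", "content", "sectionListRenderer", "contents", 0,
    "itemSectionRenderer", "contents", 0, "playlistVideoListRenderer",
)

# Tiny stand-in for the parsed ytInitialData object.
page = {"contents": {"twoColumnBrowseResultsRenderer": {"tabs": [
    {"tabRenderer": {"content": {"sectionListRenderer": {"contents": [
        {"itemSectionRenderer": {"contents": [
            {"playlistVideoListRenderer": {"playlistId": "PL123", "contents": []}}
        ]}}
    ]}}}}
]}}}

renderer = dig(page, PLAYLIST_PATH)
missing = dig({"contents": {}}, PLAYLIST_PATH)
```

In the spider, a `None` result could be logged and skipped instead of letting the callback crash.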