A Method for Scraping YouTube Videos That Still Works

Posted by Isakovsky on 2024-10-10

Original URL: https://www.cnblogs.com/isakovsky/p/18457028

import yt_dlp
import http.cookiejar
import time
import logging
import os
import random

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def load_cookies_from_netscape(cookie_file):
    """Parse a Netscape-format cookie file into a CookieJar."""
    cookie_jar = http.cookiejar.CookieJar()
    with open(cookie_file, 'r') as file:
        for line in file:
            # Skip comment lines and blank lines
            if line.startswith('#') or not line.strip():
                continue
            parts = line.rstrip('\r\n').split('\t')
            if len(parts) >= 7:
                # Field order: domain, include-subdomains flag, path,
                # secure flag, expiry timestamp, name, value
                domain, flag, path, secure, expires, name, value = parts[:7]
                cookie_jar.set_cookie(http.cookiejar.Cookie(
                    version=0,
                    name=name,
                    value=value,
                    port=None,
                    port_specified=False,
                    domain=domain,
                    domain_specified=flag == 'TRUE',
                    domain_initial_dot=domain.startswith('.'),
                    path=path,
                    path_specified=True,
                    secure=secure == 'TRUE',
                    expires=None,  # Expiry is ignored; each cookie is kept as a session cookie
                    discard=True,
                    comment=None,
                    comment_url=None,
                    rest={}
                ))
    return cookie_jar

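def download_youtube_videos(video_urls, cookie_file, output_path, user_agent):
    # Parse the cookie file up front so a bad export is noticed before downloading
    cookie_jar = load_cookies_from_netscape(cookie_file)
    logging.info(f"Loaded {len(cookie_jar)} cookies from {cookie_file}")

    ydl_opts = {
        # yt-dlp takes the path to a Netscape-format cookie file via 'cookiefile'
        'cookiefile': cookie_file,
        'outtmpl': os.path.join(output_path, '%(id)s.%(ext)s'),  # Save file as <video ID>.<ext>
        'http_headers': {
            'User-Agent': user_agent
        }
    }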

    for video_url in video_urls:
        max_attempts = 2  # Total number of attempts per video
        for attempt in range(1, max_attempts + 1):
            try:
                with yt_dlp.YoutubeDL(ydl_opts) as ydl:
                    ydl.download([video_url])  # Note: must pass a list
                break  # Stop retrying once the download succeeds
            except Exception as e:
                logging.error(f"Failed to download {video_url}: {e}")
                if attempt < max_attempts:
                    time.sleep(5)  # Delay before retrying
        else:
            # Reached only when no attempt ended with break
            logging.error(f"Exceeded maximum attempts for {video_url}")

# Example usage
video_urls = []
with open('mission.txt', 'r') as file:
    datas = file.readlines()
    random.shuffle(datas)  # Download in random order
    for data in datas:
        video_id = data.strip()  # Remove the newline and any extra whitespace
        if video_id:  # Skip blank lines
            video_urls.append('https://www.youtube.com/watch?v=' + video_id)

cookie_file = 'youtube.com_cookies.txt'  # Replace with your cookie file path  
output_path = './'  # Replace with your desired output path  
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.5845.96 Safari/537.36'  # Replace with your user agent string  

download_youtube_videos(video_urls, cookie_file, output_path, user_agent)
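
The step that turns mission.txt lines into URLs can be made a little more defensive. The sketch below is only an illustration: the helper name build_watch_urls is hypothetical, and the pattern assumes that a video ID is 11 characters drawn from letters, digits, '-' and '_'. It drops blank lines, malformed entries, and duplicates before building the watch URLs.

```python
import re

# Assumption: a video ID is 11 characters from letters, digits, '-' and '_'.
VIDEO_ID_PATTERN = re.compile(r'[A-Za-z0-9_-]{11}')


def build_watch_urls(lines):
    """Turn raw lines of video IDs into unique watch URLs, skipping bad entries."""
    seen = set()
    urls = []
    for line in lines:
        video_id = line.strip()
        # Skip blank lines, entries that do not look like an ID, and repeats
        if not VIDEO_ID_PATTERN.fullmatch(video_id) or video_id in seen:
            continue
        seen.add(video_id)
        urls.append('https://www.youtube.com/watch?v=' + video_id)
    return urls
```

In the script above, the list returned by build_watch_urls(datas) could stand in for the loop that fills video_urls.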

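A few points to note:

1. Use the yt_dlp library.

2. Set a User-Agent that mimics a normal user's browser.

3. Send cookies with the requests.

Here, mission.txt lists the IDs of the videos to download, one per line, and youtube.com_cookies.txt holds the cookies exported after logging in to YouTube. Each cookie line has seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry, name, value), in the following format: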

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file!  Do not edit.

.youtube.com	TRUE	/	TRUE ......
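
As an alternative to parsing this format by hand, Python's standard library can read it directly through http.cookiejar.MozillaCookieJar. The sketch below is a minimal illustration, not part of the original script; the function name load_cookies_with_stdlib is hypothetical. It expects the file to begin with the "# Netscape HTTP Cookie File" header shown above.

```python
import http.cookiejar


def load_cookies_with_stdlib(cookie_file):
    """Load a Netscape-format cookie file using the standard library parser."""
    jar = http.cookiejar.MozillaCookieJar()
    # Keep session cookies and expired entries so nothing is dropped silently
    jar.load(cookie_file, ignore_discard=True, ignore_expires=True)
    return jar
```

Calling len() on the returned jar gives the number of cookies read, which is a quick way to check that the export worked.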

A tool of this kind can be used to export the cookies:
