Using Playwright to drive a browser and convert web pages to PDF on the server

Posted by east_ebony on 2024-05-15

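Requirement

In practice, it is often necessary to convert a web page into a PDF file on the server and save it. Playwright can drive a headless Chromium browser to load the page and print it to PDF.

Code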

requirements.txt

playwright

convert_pdf.py

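import argparse

from playwright.sync_api import Playwright, sync_playwright


def run(playwright: Playwright, url: str, path: str, timeout: int) -> None:
    # page.pdf() is only supported in headless Chromium
    browser = playwright.chromium.launch()
    try:
        context = browser.new_context()
        page = context.new_page()
        # Load the page; timeout is in milliseconds
        page.goto(url, timeout=timeout)
        # page.pdf() already renders with print CSS; this call makes that explicit
        page.emulate_media(media="print")
        # Save as A4 with 35px margins and an embedded document outline (bookmarks)
        page.pdf(
            path=path,
            format="A4",
            outline=True,
            margin={"top": "35px", "right": "35px", "bottom": "35px", "left": "35px"},
        )
    finally:
        # Always release the browser process, even if loading or printing fails
        browser.close()


def main() -> None:
    parser = argparse.ArgumentParser(description="Convert a web page to PDF")
    parser.add_argument("-u", "--url", type=str, required=True, help="URL of the web page to convert")
    parser.add_argument("-p", "--path", type=str, required=True, help="output PDF file path")
    parser.add_argument("-t", "--timeout", type=int, default=30000,
                        help="page load timeout in milliseconds (default: 30000)")
    args = parser.parse_args()

    # Reject values that look like seconds instead of milliseconds;
    # parser.error() prints usage and exits with a non-zero status
    if args.timeout < 1000:
        parser.error("please enter the timeout in milliseconds (at least 1000)")

    # Start Playwright only after the arguments are known to be valid
    with sync_playwright() as playwright:
        run(playwright, url=args.url, path=args.path, timeout=args.timeout)


if __name__ == "__main__":
    main()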
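convert_pdf.py checks the timeout only after the arguments have been parsed. As a hedged alternative, the check can live inside argparse itself through a custom `type` function, so an invalid value produces argparse's standard usage error before any browser work starts. The names `milliseconds` and `build_parser` are hypothetical, and the 1000 ms floor simply mirrors the original script.

```python
import argparse


def milliseconds(value: str) -> int:
    """argparse type (hypothetical): parse a timeout in ms, rejecting values below 1000."""
    try:
        ms = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid integer: {value!r}") from None
    if ms < 1000:
        raise argparse.ArgumentTypeError("timeout must be in milliseconds (at least 1000)")
    return ms


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: same options as convert_pdf.py, validation done by the type."""
    parser = argparse.ArgumentParser(description="Convert a web page to PDF")
    parser.add_argument("-u", "--url", required=True, help="URL of the web page to convert")
    parser.add_argument("-p", "--path", required=True, help="output PDF file path")
    # The default is an int, so argparse does not pass it through milliseconds()
    parser.add_argument("-t", "--timeout", type=milliseconds, default=30000,
                        help="page load timeout in milliseconds (default: 30000)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-u", "https://example.com", "-p", "out.pdf", "-t", "5000"])
    print(args.timeout)
```

With this design, `-t 500` makes argparse print the usage line plus the error message and exit with status 2, the same behavior as any other malformed argument.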
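Install the dependencies with the following commands (the second one downloads the browser binaries that Playwright drives):

pip install -r requirements.txt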

playwright install

Once installation has finished, convert a page with the following command:
python convert_pdf.py -u https://www.baidu.com --path ./page8.pdf
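When the conversion runs unattended on a server, it can help to confirm that the output really is a PDF before using it. A minimal sketch, assuming only that a valid PDF file begins with the signature `%PDF-`; the function name `looks_like_pdf` is hypothetical, and this is a quick sanity check rather than full validation.

```python
import sys
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF signature b"%PDF-"."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(5) == b"%PDF-"


if __name__ == "__main__":
    # Defaults to the output path used in the example command above
    target = sys.argv[1] if len(sys.argv) > 1 else "./page8.pdf"
    print("ok" if looks_like_pdf(target) else "not a PDF")
```

A check like this catches an empty or missing output file, but not a PDF whose rendered content is wrong (for example, an error page).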
