Pitfalls and solutions when packaging selenium + chromedriver + chrome with Docker

ponponon, posted 2022-07-18
Docker Chrome

Dockerfile

FROM python:3.10-buster

# To use the Aliyun mirror, use the line below
# RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list)
# To use the Tsinghua mirror, use the line below
RUN (echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free" > /etc/apt/sources.list) 
RUN (apt update) && (apt upgrade -y)
RUN (apt install -y  lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*) 

WORKDIR /code
RUN mkdir /code/depends
# Download and install chrome. TIP: dpkg does not resolve dependencies, so install the deb with apt
RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)


COPY install.py /code/
RUN python install.py

RUN /usr/local/bin/python -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
COPY requirements-prd.txt /code/
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements-prd.txt
COPY config.yaml /code/
COPY . /code/

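Let's go through it line by line.

  • RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list) points apt at a Debian mirror inside China (aliyun here; the Dockerfile above actually enables the Tsinghua mirror and leaves aliyun commented out). The reason, of course, is the evil Great Firewall.
  • RUN (apt update) && (apt upgrade -y) refreshes the apt package index and upgrades the installed software. You can keep only apt update and drop apt upgrade; the latter is not required.
  • RUN (apt install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*) What are these packages for? Mostly Chinese fonts (wget is needed later to download Chrome). Why the fonts matter is covered below.
  • RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb) Remember to install chrome with apt, not dpkg.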

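Fixing Chinese text rendered as boxes:

On the Simplified-Chinese internet, people will teach you to download ttf files by hand, copy them into place, and then go through a whole pile of further steps. It leaves me speechless: do they really not understand Linux at all?

None of that hassle is needed. When you install a Linux desktop, doesn't it come with Chinese support out of the box? Does it make you go hunting online for font files?

It is simple: the apt repositories already ship ready-made fonts, so a single apt command installs them.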

apt-get install -y  lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*
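
To confirm inside the container that the fonts really landed, one option is to ask fontconfig which font families cover Chinese. The sketch below is only an illustration: it assumes the `fc-list` tool is present in the image (it returns an empty list if not), and the helper names `parse_font_families` and `installed_chinese_fonts` are made up for this example.

```python
import shutil
import subprocess


def parse_font_families(fc_list_output: str) -> list[str]:
    """Extract the sorted, de-duplicated primary family names from `fc-list` output."""
    families = set()
    for line in fc_list_output.splitlines():
        # Each line looks like "<file path>: <family>[,<alias>...]:style=<style>".
        parts = line.split(":")
        if len(parts) < 2:
            continue
        first_family = parts[1].split(",")[0].strip()
        if first_family:
            families.add(first_family)
    return sorted(families)


def installed_chinese_fonts() -> list[str]:
    """Return the font families that fontconfig reports as covering Chinese."""
    if shutil.which("fc-list") is None:
        return []  # assumption: no fontconfig in this image, nothing to report
    result = subprocess.run(
        ["fc-list", ":lang=zh"], capture_output=True, text=True, check=False
    )
    return parse_font_families(result.stdout)


if __name__ == "__main__":
    fonts = installed_chinese_fonts()
    print(fonts if fonts else "no Chinese fonts found; pages will render boxes")
```

Running this once during the image build or at container start makes a missing font package visible before the first screenshot comes out as boxes.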

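How do you install chrome in Docker?

The Simplified-Chinese internet loves installing chrome with dpkg, but that is a very foolish way to do it. Those authors probably understand neither Linux nor apt.

The right way: install chrome with apt, because apt resolves the dependencies for you automatically.

Fixing the zombie processes that Docker + selenium + chromedriver + chrome produces: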

      1   18042   18041   18041 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18046       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18047       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18060       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18062       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18095       1       1 ?             -1 Z        0   0:02 [chrome] <defunct>
      1   18116       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18117       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18119   18118   18118 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18123       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18124       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18140       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18141       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18171       1       1 ?             -1 Z        0   0:02 [chrome] <defunct>
      1   18193       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18194       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18196   18195   18195 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18200       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18201       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18216       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18218       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18248       1       1 ?             -1 Z        0   0:02 [chrome] <defunct>
      1   18271       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18272       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18274   18273   18273 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18278       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18279       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18293       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18295       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18328       1       1 ?             -1 Z        0   0:02 [chrome] <defunct>
      1   18350       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18351       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18353   18352   18352 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18357       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18358       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18373       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18375       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18406       1       1 ?             -1 Z        0   0:01 [chrome] <defunct>
      1   18428       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18429       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18431   18430   18430 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18435       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18436       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18450       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18451       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18483       1       1 ?             -1 Z        0   0:03 [chrome] <defunct>
      1   18507       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18508       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18510   18509   18509 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18514       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18515       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18530       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18532       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18562       1       1 ?             -1 Z        0   0:02 [chrome] <defunct>

A huge number of zombie processes will exhaust the PID table, and Chrome then fails with "Chrome failed to start: exited abnormally."

snapshot-consumer    | selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
snapshot-consumer    |   (unknown error: DevToolsActivePort file doesn't exist)
snapshot-consumer    |   (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
snapshot-consumer    | Stacktrace:
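
One possible way to keep such zombies from piling up is to reap them from the Python process itself. This is a minimal sketch under stated assumptions, not the fix used in this project: it assumes the Python program is PID 1 in the container (the listing above shows the zombies parented to PID 1, and the compose file runs `python main.py` directly), and `reap_zombies` is a name invented for this example.

```python
import os


def reap_zombies() -> int:
    """Collect every already-exited child without blocking.

    Returns the number of children that were reaped.
    """
    reaped = 0
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # This process has no children left at all.
            break
        if pid == 0:
            # Children exist, but none of them has exited yet.
            break
        reaped += 1
    return reaped
```

A natural place to call it would be between jobs, after `driver.quit()` has returned. Note that `os.waitpid(-1, ...)` also collects the exit status of children started through `subprocess`, so calling it while such a child is still in use can hide that child's return code. An alternative that needs no code change is to let Docker run a small init process as PID 1 (`docker run --init`, or `init: true` in compose file versions that support it), which takes over the reaping.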

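Fixing "session deleted because of page crash" caused by too little shm (shared memory)

The selenium + chrome + chromedriver stack needs a fair amount of shm (shared memory, mounted at /dev/shm), and by default Docker gives a container only 64 MB of it.

A single selenium + chrome + chromedriver instance needs roughly 20 MB of shm, so several instances running in one container can use it up quickly.

If you ignore this, you get the error below: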

snapshot-consumer    |   File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
snapshot-consumer    |     raise exception_class(message, screen, stacktrace)
snapshot-consumer    |           │               │        │       └ ['#0 0x556b82b0db13 <unknown>', '#1 0x556b8291451f <unknown>', '#2 0x556b8290193d <unknown>', '#3 0x556b82901355 <unknown>', ...
snapshot-consumer    |           │               │        └ None
snapshot-consumer    |           │               └ 'unknown error: session deleted because of page crash\nfrom tab crashed\n  (Session info: headless chrome=103.0.5060.114)'
snapshot-consumer    |           └ <class 'selenium.common.exceptions.WebDriverException'>
snapshot-consumer    | 
snapshot-consumer    | selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
snapshot-consumer    | from tab crashed
snapshot-consumer    |   (Session info: headless chrome=103.0.5060.114)
snapshot-consumer    | Stacktrace:
snapshot-consumer    | #0 0x556b82b0db13 <unknown>
snapshot-consumer    | #1 0x556b8291451f <unknown>
snapshot-consumer    | #2 0x556b8290193d <unknown>

How do we fix it? Set shm_size in docker-compose:

version: "3"
services:
  snapshot:
    container_name: snapshot
    image: ponponon/snapshot
    restart: always
    logging:
      driver: json-file
      options:
        max-size: "30m"
        max-file: "1"
    shm_size: "2048M"
    command: python main.py
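
How large should shm_size be? Watching it by eye, usage is typically around 50 MB, so 512M is more than enough (the example above is set to 2048M, which is generous).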
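
Rather than eyeballing it, the usage can be read from inside the container. The following is a small sketch that assumes shared memory is mounted at /dev/shm, as it is in a standard Docker container; `shm_usage_mb` is a hypothetical helper name.

```python
import shutil


def shm_usage_mb(path: str = "/dev/shm") -> tuple[float, float]:
    """Return (used, total) in MiB for the filesystem mounted at `path`."""
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return usage.used / mib, usage.total / mib


if __name__ == "__main__":
    used, total = shm_usage_mb()
    print(f"/dev/shm: {used:.1f} MiB used of {total:.1f} MiB")
```

Logging this value while the browser is busy gives a measured peak to size shm_size against, instead of a guess.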

Reference for the fix: https://developer.aliyun.com/article/833847

How to set shm-size in docker-compose: see https://stackoverflow.com/questions/30210362/how-to-increase-the-size-of-the-dev-shm-in-docker-container
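
On the Python side, the Chrome flags also influence how much the browser leans on /dev/shm. The sketch below is a hedged illustration rather than this project's code: `docker_chrome_args` and `new_driver` are hypothetical names, chromedriver is assumed to be on PATH, and `--disable-dev-shm-usage` is a Chrome switch that moves the shared-memory files to /tmp, which is sometimes used when raising shm_size is not possible.

```python
def docker_chrome_args(use_dev_shm: bool = True) -> list[str]:
    """Build the Chrome command-line switches for a headless run in a container."""
    # --no-sandbox: Chrome refuses to start as root (the default user in the
    # image above) unless its sandbox is switched off.
    args = ["--headless", "--no-sandbox"]
    if not use_dev_shm:
        # Keep shared-memory files in /tmp instead of /dev/shm.
        args.append("--disable-dev-shm-usage")
    return args


def new_driver(use_dev_shm: bool = True):
    """Start a Chrome WebDriver session using the switches above."""
    from selenium import webdriver  # imported lazily; requires selenium installed

    options = webdriver.ChromeOptions()
    for arg in docker_chrome_args(use_dev_shm):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

A typical use is `driver = new_driver()` followed by a `try: ... finally: driver.quit()` block, so that chrome and chromedriver are shut down even when a page load raises.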

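How to get a JPG screenshot

See: "Does JPG versus PNG have anything to do with the in-memory layout, or does the difference only appear when the image is saved to disk?"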


While I was at it, I put together an open-source tutorial on GitHub: ponponon/snapshot
