Dockerfile
FROM python:3.10-buster
# To use the Aliyun mirror, use the line below
# RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list)
# To use the Tsinghua mirror, use the line below
RUN (echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free" > /etc/apt/sources.list)
RUN (apt update) && (apt upgrade -y)
RUN (apt install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*)
WORKDIR /code
RUN mkdir /code/depends
# Download and install Chrome. TIP: dpkg does not resolve dependencies, so install the .deb with apt
RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)
COPY install.py /code/
RUN python install.py
RUN /usr/local/bin/python -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
COPY requirements-prd.txt /code/
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements-prd.txt
COPY config.yaml /code/
COPY . /code/
Let's go through it line by line.
RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list)
This line points apt at Aliyun's Debian repository; the reason, of course, is the evil Great Firewall.
RUN (apt update) && (apt upgrade -y)
This refreshes the apt package index and upgrades the installed packages. You can keep just apt-get update and drop apt-get upgrade; the upgrade is not required.
RUN (apt install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*)
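What are these packages for? Apart from lsb-release and wget (wget fetches the Chrome package in the next step), they install Chinese fonts; why that matters is explained below.
RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)
Remember to install Chrome with apt, not dpkg.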
Fixing Chinese text that renders as boxes:
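On the Simplified-Chinese internet, people will teach you to download .ttf files by hand, copy them into place, then do this and that, a whole pile of steps. It leaves me speechless: do they really not understand Linux at all?
None of that hassle is necessary. When you install a Linux desktop, doesn't Chinese support come with it? Did you have to go online and download font files yourself?
It's simple: the apt repositories already have ready-made font packages, so a single apt command installs them!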
apt-get install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*
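To confirm the fonts actually landed in the image, one option is to ask fontconfig which installed fonts cover Chinese. This is a minimal sketch that assumes the fc-list command from fontconfig is present in the container (usually true once a font package is installed, but still an assumption); the helper names parse_fc_families and chinese_font_families are hypothetical.

```python
import shutil
import subprocess


def parse_fc_families(output: str) -> list[str]:
    """Turn `fc-list ... family` output into a sorted list of unique family names.

    One output line may hold several comma-separated aliases for the same font,
    e.g. "WenQuanYi Zen Hei,文泉驛正黑,文泉驿正黑"; only the first alias is kept.
    """
    families = set()
    for line in output.splitlines():
        line = line.strip()
        if line:
            families.add(line.split(",")[0].strip())
    return sorted(families)


def chinese_font_families() -> list[str]:
    """Return the font families fontconfig reports as covering Chinese (zh)."""
    if shutil.which("fc-list") is None:
        # fontconfig is not installed, so there is nothing to query
        return []
    result = subprocess.run(
        ["fc-list", ":lang=zh", "family"],
        capture_output=True, text=True, check=False,
    )
    return parse_fc_families(result.stdout)


if __name__ == "__main__":
    fonts = chinese_font_families()
    print(fonts if fonts else "No Chinese fonts found; screenshots will show boxes")
```

An empty list inside the running container is a quick sign that screenshots will still show boxes.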
How do you install Chrome in Docker?
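On the Simplified-Chinese internet, people love installing Chrome with dpkg, but that is really foolish! They probably understand neither Linux nor apt.
The right way: install Chrome with apt, because apt resolves the dependencies for you automatically! Give apt a path to the .deb file (absolute, as in the Dockerfile above, or starting with ./) so it treats the argument as a local file rather than a package name.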
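After installing Chrome this way, its version matters, because chromedriver has to match Chrome's major version (the logs further down show headless chrome=103.0.5060.114). A minimal sketch for reading the installed version from Python, assuming the binary lives at /usr/bin/google-chrome as in the error log below; parse_chrome_version and chrome_major_version are hypothetical helper names.

```python
import re
import subprocess

# Path taken from the "started from chrome location" line in this post's error log
CHROME_BINARY = "/usr/bin/google-chrome"


def parse_chrome_version(text: str) -> tuple[int, ...]:
    """Extract the dotted version from output like 'Google Chrome 103.0.5060.114'."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def chrome_major_version(binary: str = CHROME_BINARY) -> int:
    """Run `<binary> --version` and return Chrome's major version number."""
    output = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    ).stdout
    return parse_chrome_version(output)[0]
```

The same parser works on chromedriver --version output, so comparing the two major versions at startup can catch a mismatch before Selenium does.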
Fixing the zombie processes left behind by Docker + selenium + chromedriver + chrome:
1 18042 18041 18041 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18046 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18047 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18060 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18062 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18095 1 1 ? -1 Z 0 0:02 [chrome] <defunct>
1 18116 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18117 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18119 18118 18118 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18123 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18124 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18140 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18141 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18171 1 1 ? -1 Z 0 0:02 [chrome] <defunct>
1 18193 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18194 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18196 18195 18195 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18200 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18201 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18216 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18218 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18248 1 1 ? -1 Z 0 0:02 [chrome] <defunct>
1 18271 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18272 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18274 18273 18273 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18278 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18279 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18293 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18295 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18328 1 1 ? -1 Z 0 0:02 [chrome] <defunct>
1 18350 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18351 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18353 18352 18352 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18357 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18358 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18373 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18375 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18406 1 1 ? -1 Z 0 0:01 [chrome] <defunct>
1 18428 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18429 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18431 18430 18430 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18435 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18436 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18450 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18451 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18483 1 1 ? -1 Z 0 0:03 [chrome] <defunct>
1 18507 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18508 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18510 18509 18509 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18514 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18515 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18530 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18532 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18562 1 1 ? -1 Z 0 0:02 [chrome] <defunct>
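The listing above appears to be ps ajx output (PPID, PID, PGID, SID, TTY, TPGID, STAT, UID, TIME, COMMAND), and every zombie has PPID 1: it was re-parented to the container's PID 1, presumably python main.py, which never reaps it. To watch the count from inside the container without ps, here is a small Linux-only sketch that reads /proc; parse_stat and count_zombies are hypothetical names.

```python
import os


def parse_stat(line: str) -> tuple[int, str, str, int]:
    """Parse a /proc/<pid>/stat line into (pid, comm, state, ppid).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' instead of on whitespace.
    """
    pid_part, rest = line.split(" (", 1)
    comm, after = rest.rsplit(")", 1)
    fields = after.split()
    return int(pid_part), comm, fields[0], int(fields[1])


def count_zombies(proc: str = "/proc") -> int:
    """Count processes in state 'Z' (zombie); returns 0 if `proc` does not exist."""
    if not os.path.isdir(proc):
        return 0
    count = 0
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                _, _, state, _ = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            # the process vanished between listdir() and open(), or odd format
            continue
        if state == "Z":
            count += 1
    return count
```

Logging count_zombies() periodically makes a steady climb visible long before the PID table runs out.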
So many zombie processes eventually exhaust the PID table, and Chrome then fails with Chrome failed to start: exited abnormally:
snapshot-consumer | selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
snapshot-consumer | (unknown error: DevToolsActivePort file doesn't exist)
snapshot-consumer | (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
snapshot-consumer | Stacktrace:
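One common way to stop the zombies from piling up, offered here as an assumption and not necessarily what ponponon/snapshot does, is to make sure PID 1 reaps its children. The simplest route is to let Docker inject a tiny init process (docker run --init, or init: true on the service in a compose file of format 2.2+ / 3.7+). Alternatively, the Python process itself can reap exited children whenever SIGCHLD arrives, as in this minimal sketch (reap_zombies and install_reaper are hypothetical names):

```python
import os
import signal


def reap_zombies() -> list[int]:
    """Reap every child that has already exited and return their PIDs.

    waitpid(-1, WNOHANG) collects any exited child, including orphans that
    were re-parented to this process because it runs as PID 1, without blocking.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def install_reaper() -> None:
    """Reap exited children on every SIGCHLD; call once at startup from the main thread."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_zombies())
```

install_reaper() would go at the top of main.py. Be aware that waitpid(-1, ...) also collects children that subprocess is tracking (Selenium starts chromedriver through subprocess); Popen then reports a return code of 0 for them, which is one reason the --init route is usually the less intrusive choice.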
Fixing session deleted because of page crash, caused by too little shared memory (/dev/shm):
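The selenium + chrome + chromedriver combination needs a fair amount of shm, yet Docker gives a container only 64 MB of /dev/shm by default.
A single selenium + chrome + chromedriver instance needs about 20 MB of shm.
If you leave it alone, you get the following error: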
snapshot-consumer | File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
snapshot-consumer | raise exception_class(message, screen, stacktrace)
snapshot-consumer | │ │ │ └ ['#0 0x556b82b0db13 <unknown>', '#1 0x556b8291451f <unknown>', '#2 0x556b8290193d <unknown>', '#3 0x556b82901355 <unknown>', ...
snapshot-consumer | │ │ └ None
snapshot-consumer | │ └ 'unknown error: session deleted because of page crash\nfrom tab crashed\n (Session info: headless chrome=103.0.5060.114)'
snapshot-consumer | └ <class 'selenium.common.exceptions.WebDriverException'>
snapshot-consumer |
snapshot-consumer | selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
snapshot-consumer | from tab crashed
snapshot-consumer | (Session info: headless chrome=103.0.5060.114)
snapshot-consumer | Stacktrace:
snapshot-consumer | #0 0x556b82b0db13 <unknown>
snapshot-consumer | #1 0x556b8291451f <unknown>
snapshot-consumer | #2 0x556b8290193d <unknown>
How do we fix it? Raise shm_size for the service in the docker-compose file:
version: "3"
services:
  snapshot:
    container_name: snapshot
    image: ponponon/snapshot
    restart: always
    logging:
      driver: json-file
      options:
        max-size: "30m"
        max-file: "1"
    shm_size: "2048M"
    command: python main.py
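How large should shm_size be? Eyeballing it, usage is generally around 50 MB, so 512M is more than enough (the compose file above uses an even roomier 2048M).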
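To see what a container actually got, and to do that eyeballing from Python, shutil.disk_usage works on /dev/shm because it is a tmpfs mount. A minimal sketch; shm_usage_mib is a hypothetical helper, and the 512 threshold simply mirrors the figure above.

```python
import shutil

MIB = 1024 * 1024


def shm_usage_mib(path: str = "/dev/shm") -> tuple[float, float]:
    """Return (total, used) size in MiB of the filesystem mounted at `path`."""
    usage = shutil.disk_usage(path)
    return usage.total / MIB, usage.used / MIB


if __name__ == "__main__":
    total, used = shm_usage_mib()
    print(f"/dev/shm: {used:.1f} MiB used of {total:.1f} MiB")
    if total < 512:
        print("shm looks smaller than 512M; Chrome tabs may crash")
```

Run inside the container, a total of about 64 MiB means shm_size was not applied.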
Solution reference: https://developer.aliyun.com/article/833847
How to set shm-size in docker-compose: see https://stackoverflow.com/questions/30210362/how-to-increase-the-size-of-the-dev-shm-in-docker-container
How to get a JPG screenshot
Reference: "JPG or PNG: does it have to do with the in-memory structure, or is there only a difference when saving to disk?"
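One way to get a JPEG straight from Chrome, instead of converting Selenium's PNG screenshot afterwards, is the DevTools Protocol command Page.captureScreenshot, which accepts format "jpeg" and a quality from 0 to 100 and returns base64 data. The sketch assumes Selenium 4's Chrome driver, which exposes execute_cdp_cmd; it is not necessarily how ponponon/snapshot does it, and capture_jpeg is a hypothetical name.

```python
import base64
from typing import Protocol


class CdpDriver(Protocol):
    """Anything with Selenium 4's Chromium execute_cdp_cmd(cmd, cmd_args) method."""

    def execute_cdp_cmd(self, cmd: str, cmd_args: dict) -> dict: ...


def capture_jpeg(driver: CdpDriver, quality: int = 80) -> bytes:
    """Screenshot the current viewport as JPEG bytes via the DevTools Protocol."""
    if not 0 <= quality <= 100:
        raise ValueError("quality must be between 0 and 100")
    result = driver.execute_cdp_cmd(
        "Page.captureScreenshot", {"format": "jpeg", "quality": quality}
    )
    return base64.b64decode(result["data"])


# Usage with a real driver (requires selenium and Chrome):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://example.com")
#   with open("page.jpg", "wb") as f:
#       f.write(capture_jpeg(driver, quality=85))
```

Typing the parameter as a Protocol keeps the function testable with a stand-in object that has no browser behind it.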
Along the way I also made an open-source tutorial and put it on GitHub: ponponon/snapshot