Getting to the bottom of software installation failures during docker build

Posted by 趙安家 on 2017-09-01

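I recently ran into a problem: when building an image from a Dockerfile, any step that installs software fails some of the time (roughly 2%-10% of builds). Take alpine as an example.

The failure log looks like this: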

Step 4/6 : RUN echo -e "https://mirrors.ustc.edu.cn/alpine/latest-stable/main\nhttps://mirrors.ustc.edu.cn/alpine/latest-stable/community" > /etc/apk/repositories &&     apk update &&     apk add tzdata &&     cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime &&     echo "Asia/Shanghai" >  /etc/timezone &&     rm -rf /var/cache/apk/*
 ---> Running in bd5d1dfd3ff4
fetch https://mirrors.ustc.edu.cn/alpine/latest-stable/main/x86_64/APKINDEX.tar.gz
fetch https://mirrors.ustc.edu.cn/alpine/latest-stable/community/x86_64/APKINDEX.tar.gz
v3.6.2-83-g1079181bed [https://mirrors.ustc.edu.cn/alpine/latest-stable/main]
v3.6.2-84-g6ee501e465 [https://mirrors.ustc.edu.cn/alpine/latest-stable/community]
OK: 8440 distinct packages available
(1/1) Installing tzdata (2017a-r0)
ERROR: tzdata-2017a-r0: temporary error (try again later)

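To reproduce the problem, build a simple Docker image based on alpine that installs tzdata and sets the time zone to Beijing time (Asia/Shanghai).

To speed up the build, the package repositories are switched to the USTC (中科大) mirror.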

Dockerfile

FROM alpine
RUN echo -e "https://mirrors.ustc.edu.cn/alpine/latest-stable/main\nhttps://mirrors.ustc.edu.cn/alpine/latest-stable/community" > /etc/apk/repositories && \
    apk update && \
    apk --no-cache add tzdata && \
    cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo "Asia/Shanghai" > /etc/timezone

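At first I was not using a domestic mirror at all, just the official repositories, but builds failed often. I assumed the Great Firewall was to blame, so I switched in turn to the Aliyun mirror, the Tsinghua mirror and the USTC mirror, and eventually set up my own mirror (GitHub repo anjia0532/alpine-package-mirror; see 三種方法解決docker構建失敗(alpine), "Three ways to fix docker build failures (alpine)"). All of them worked only intermittently, which badly hurt productivity.

Later, while watching the nginx access log, I noticed that nginx recorded no request at all when a build failed. That made me suspect the build was not sending any network request in the first place, so I brought out the trusty tcpdump to investigate further.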

Note

To reduce noise, no other Docker services were running on the test machine (otherwise there would be too much TCP traffic).

sudo tcpdump -i docker0

....

10:47:43.038404 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
10:47:43.114734 ARP, Request who-has 172.17.0.1 tell 172.17.0.2, length 28
10:47:43.114746 ARP, Reply 172.17.0.1 is-at 02:42:30:19:53:45 (oui Unknown), length 28
10:47:43.114750 IP 172.17.0.2.48223 > google-public-dns-a.google.com.domain: 18503+ A? alpine.xxx.com. (39)
10:47:43.114775 IP 172.17.0.2.48223 > google-public-dns-b.google.com.domain: 18503+ A? alpine.xxx.com. (39)
10:47:43.114827 IP 172.17.0.2.48223 > google-public-dns-a.google.com.domain: 18687+ AAAA? alpine.xxx.com. (39)
10:47:43.114833 IP 172.17.0.2.48223 > google-public-dns-b.google.com.domain: 18687+ AAAA? alpine.xxx.com. (39)
10:47:43.209679 IP google-public-dns-a.google.com.domain > 172.17.0.2.48223: 18503 1/0/0 A 172.60.20.6 (55)
10:47:43.229261 IP google-public-dns-a.google.com.domain > 172.17.0.2.48223: 18687 0/1/0 (106)

....
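To see at a glance which resolvers a capture like this one is hitting, the DNS query lines can be picked out with a small script. The following is only a sketch: the function name dns_queries is made up for this example, and the regular expression assumes tcpdump's default one-line text format as shown above (destination port printed as "domain", or "53" when tcpdump runs with -n).

```python
import re

# Matches outgoing DNS queries such as:
#   ... IP 172.17.0.2.48223 > google-public-dns-a.google.com.domain: 18503+ A? alpine.xxx.com. (39)
# Assumption: tcpdump's default text output; other formats will not match.
QUERY_RE = re.compile(
    r"IP \S+ > (\S+?)\.(?:domain|53): \d+\S* (A|AAAA)\? (\S+)"
)


def dns_queries(capture_text):
    """Return (resolver, record_type, name) for every DNS query line."""
    found = []
    for line in capture_text.splitlines():
        match = QUERY_RE.search(line)
        if match:
            found.append(match.groups())
    return found


if __name__ == "__main__":
    sample = (
        "10:47:43.114746 ARP, Reply 172.17.0.1 is-at 02:42:30:19:53:45 (oui Unknown), length 28\n"
        "10:47:43.114750 IP 172.17.0.2.48223 > google-public-dns-a.google.com.domain: 18503+ A? alpine.xxx.com. (39)\n"
    )
    for resolver, record_type, name in dns_queries(sample):
        print(resolver, record_type, name)
```

Feeding it the output of sudo tcpdump -i docker0 saved to a file lists one entry per query, which makes it easy to spot an unexpected resolver.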

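The capture shows that name resolution during the build goes through Google's DNS servers. For a number of reasons best left unsaid, Google services are essentially unusable from mainland China (Google Translate being the exception), so these lookups are unreliable. The Docker documentation explains where the Google servers come from: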

Filtering is necessary because all localhost addresses on the host are unreachable from the container’s network. After this filtering, if there are no more nameserver entries left in the container’s /etc/resolv.conf file, the daemon adds public Google DNS nameservers (8.8.8.8 and 8.8.4.4) to the container’s DNS configuration. If IPv6 is enabled on the daemon, the public IPv6 Google DNS nameservers will also be added (2001:4860:4860::8888 and 2001:4860:4860::8844).

The original text is in the official documentation: Embedded DNS server in user-defined networks
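The fallback described in the quoted passage can be sketched in Python. This only illustrates the documented behaviour and is not Docker's actual code: the function name container_nameservers and its parameters are made up for this example, and "localhost address" is approximated here as any loopback IP.

```python
import ipaddress

# Public Google resolvers named in the quoted documentation.
GOOGLE_DNS_V4 = ["8.8.8.8", "8.8.4.4"]
GOOGLE_DNS_V6 = ["2001:4860:4860::8888", "2001:4860:4860::8844"]


def container_nameservers(host_resolv_conf, ipv6_enabled=False):
    """Approximate the nameserver list a container ends up with.

    host_resolv_conf is the text of the host's /etc/resolv.conf.
    Hypothetical helper for illustration only.
    """
    kept = []
    for line in host_resolv_conf.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            try:
                address = ipaddress.ip_address(parts[1])
            except ValueError:
                continue  # skip entries this sketch cannot parse
            if not address.is_loopback:
                kept.append(parts[1])
    if kept:
        return kept
    # Nothing usable is left, so fall back to Google's public DNS.
    fallback = list(GOOGLE_DNS_V4)
    if ipv6_enabled:
        fallback += GOOGLE_DNS_V6
    return fallback


if __name__ == "__main__":
    print(container_nameservers("nameserver 127.0.0.1\n"))
```

Under these assumptions, a host whose resolv.conf lists only a local resolver (for example 127.0.0.1) yields the two Google addresses, which would be consistent with the queries seen in the capture.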

There are two solutions:

  1. Edit the host's hosts file and hard-code the IP
  2. Edit Docker's daemon.json file

I recommend the second; see the official documentation, DAEMON CONFIGURATION FILE#On Linux.

The more sensible option is to edit Docker's daemon.json: sudo vi /etc/docker/daemon.json

For example, to switch to the DNSPod DNS,

add "dns": ["119.29.29.29"]
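Because daemon.json has to remain a single valid JSON object, the "dns" key must be merged into whatever settings are already there rather than pasted in as a bare line. A minimal sketch of that merge in Python follows; the helper name with_dns is made up for this example, and the "debug" key in the demo merely stands in for any setting that might already be in the file.

```python
import json


def with_dns(config_text, servers):
    """Return daemon.json text with "dns" set, keeping all other keys.

    config_text is the current file content; an empty or missing file is
    treated as an empty configuration. Hypothetical helper for illustration.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    config["dns"] = list(servers)
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    existing = '{"debug": true}'  # stand-in for existing settings
    print(with_dns(existing, ["119.29.29.29"]))
```

For an otherwise empty configuration the result is simply an object containing the one "dns" entry.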

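Then restart the Docker daemon so that it rereads daemon.json: sudo systemctl restart docker (sudo systemctl daemon-reload on its own only reloads systemd's unit files). A new capture on docker0 shows the queries going to DNSPod, followed by the HTTP connection to the mirror: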

11:14:17.586559 ARP, Request who-has 172.17.0.1 tell 172.17.0.2, length 28
11:14:17.586577 ARP, Reply 172.17.0.1 is-at 02:42:30:19:53:45 (oui Unknown), length 28
11:14:17.586581 IP 172.17.0.2.43273 > pdns.dnspod.cn.domain: 53616+ A? alpine.xxx.com. (39)
11:14:17.586604 IP 172.17.0.2.43273 > pdns.dnspod.cn.domain: 53868+ AAAA? alpine.xxx.com. (39)
11:14:17.777921 IP pdns.dnspod.cn.domain > 172.17.0.2.43273: 53868 0/1/0 (106)
11:14:17.843875 IP pdns.dnspod.cn.domain > 172.17.0.2.43273: 53616 1/0/0 A 172.60.20.6 (55)
11:14:17.844028 IP 172.17.0.2.36810 > 172.60.20.6.http: Flags [S], seq 4032628285, win 42340, options [mss 1460,sackOK,TS val 1959807306 ecr 0,nop,wscale 11], length 0

The whole investigation is recorded in the issue I filed on USTC's GitHub: alpine 映象頻繁異常 ("alpine mirror fails frequently")

Blog: anjia.ml/2017/09/01/…
Juejin: juejin.im/post/59a8f9…
Jianshu: www.jianshu.com/p/1f4e62dff…
