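I set up a highly available k8s cluster with 3 master nodes. After 2 of the master nodes went down, the 1 remaining master could not work properly.
Running kubectl get nodes on it produced the following error: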
The connection to the server k8s-api:6443 was refused - did you specify the right host or port?
Note: k8s-api resolves to the local IP address of this master server.
Running netstat -lntp showed that kube-apiserver was not running at all, and that etcd and kube-proxy were not running either.
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:33807 0.0.0.0:* LISTEN 602/kubelet
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 572/rpcbind
tcp 0 0 127.0.0.1:10257 0.0.0.0:* LISTEN 3229/kube-controlle
tcp 0 0 127.0.0.1:10259 0.0.0.0:* LISTEN 3753/kube-scheduler
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 571/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1644/sshd
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 602/kubelet
tcp6 0 0 :::111 :::* LISTEN 572/rpcbind
tcp6 0 0 :::10250 :::* LISTEN 602/kubelet
tcp6 0 0 :::10251 :::* LISTEN 3753/kube-scheduler
tcp6 0 0 :::10252 :::* LISTEN 3229/kube-controlle
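The check done by eye above can be sketched as a small shell helper. This is only an illustration: has_listener is a hypothetical name, and it assumes netstat-style rows where the 4th field is the local address, as in the output above.

```shell
#!/bin/sh
# has_listener PORT (hypothetical helper): reads `netstat -lnt`-style rows
# on stdin and succeeds if some local address (4th field) ends in ":PORT".
has_listener() {
  awk -v port="$1" '
    $4 ~ (":" port "$") { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, netstat -lnt | has_listener 6443 || echo "kube-apiserver is not listening" would report the missing API server port, and the same call with 2379 would cover the etcd client port.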
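docker ps showed that the etcd, kube-apiserver, and kube-proxy containers were all not running. The etcd container was stuck in a loop of starting, failing, and restarting, and its logs contained the following errors: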
etcdserver: publish error: etcdserver: request timed out
rafthttp: health check for peer 611e58a32a3e3ebe could not connect: dial tcp 10.0.1.252:2380: i/o timeout (prober "ROUND_TRIPPER_SNAPSHOT")
rafthttp: health check for peer 611e58a32a3e3ebe could not connect: dial tcp 10.0.1.252:2380: i/o timeout (prober "ROUND_TRIPPER_RAFT_MESSAGE")
rafthttp: health check for peer cc00b4912b6442df could not connect: dial tcp 10.0.1.82:2380: i/o timeout (prober "ROUND_TRIPPER_SNAPSHOT")
rafthttp: health check for peer cc00b4912b6442df could not connect: dial tcp 10.0.1.82:2380: i/o timeout (prober "ROUND_TRIPPER_RAFT_MESSAGE")
raft: 12637f5ec2bd02b8 is starting a new election at term 254669
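Peer timeouts combined with a repeated "starting a new election" line are what loss of quorum looks like: an etcd cluster of N members needs a majority of N/2 + 1 (integer division) to elect a leader. The sketch below does that arithmetic and pulls the unreachable peer addresses out of log lines shaped like the ones above. The function names are hypothetical, and the sed pattern assumes exactly this log format.

```shell
#!/bin/sh
# quorum N: smallest majority of an N-member cluster (integer division).
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL DOWN: succeeds if the members still up form a majority.
has_quorum() {
  [ $(( $1 - $2 )) -ge "$(quorum "$1")" ]
}

# unreachable_peers: reads etcd log lines on stdin and prints each distinct
# peer address that failed a health check with an i/o timeout.
unreachable_peers() {
  sed -n 's|.*could not connect: dial tcp \([0-9.]*:[0-9]*\): i/o timeout.*|\1|p' | sort -u
}
```

With 3 members, quorum 3 prints 2, so has_quorum 3 2 fails: a single surviving member can never win an election. Piping the container log into unreachable_peers lists the two peers on port 2380 that the log above complains about.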
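etcd failed to start because it was still running as a 3-member cluster but could not reach the etcd members on the other 2 master nodes. With only 1 of 3 members up there is no majority, so no leader can be elected. The fix is to switch etcd to a single-member cluster. At first I did not know how to do that; later I found 2 flags online, --initial-cluster-state=new and --force-new-cluster. After adding these 2 flags to the etcd command in /etc/kubernetes/manifests/etcd.yaml and rebooting the server, the master node ran normally.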
containers:
- command:
  - etcd
  - --advertise-client-urls=https://10.0.1.81:2379
  - --cert-file=/etc/kubernetes/pki/etcd/server.crt
  - --client-cert-auth=true
  - --data-dir=/var/lib/etcd
  - --initial-advertise-peer-urls=https://10.0.1.81:2380
  - --initial-cluster=k8s-master0=https://10.0.1.81:2380
  - --initial-cluster-state=new
......
Once the master is running normally, remove the 2 etcd flags that were just added.
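To confirm the cleanup, a quick grep over the manifest can show whether either temporary flag is still present. This is a minimal sketch: leftover_flags is a hypothetical helper name, and it only matches the two flag spellings quoted in this post.

```shell
#!/bin/sh
# leftover_flags FILE (hypothetical helper): prints, with line numbers, every
# line of FILE that still carries one of the two temporary etcd flags.
# Exits 0 if any were found and non-zero if the file is clean.
leftover_flags() {
  grep -n -e '--force-new-cluster' -e '--initial-cluster-state=new' "$1"
}
```

Running leftover_flags /etc/kubernetes/manifests/etcd.yaml should print nothing once both flags have been removed.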