Tungsten Fabric Primer | All About Cluster Updates

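Published by the TF Chinese community on 2020-05-23
The Tungsten Fabric Primer series collects hands-on experience shared by seasoned practitioners, compiled and presented by the TF Chinese community, to help newcomers understand the full TF workflow, including operation, installation, integration, and debugging. If you have related experience or questions, you are welcome to get in touch and discuss them further with the community.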

Author: Tatsuya Naganawa  Translator: TF translation team


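Cluster-wide update is an important feature: it makes the latest features available in the cluster while keeping the SLA of a production cluster.
Because Tungsten Fabric uses protocols similar to those of MPLS-VPN, basic interoperability is kept, as far as I have tried, even when the control module version and the vRouter module version differ.
So the general idea is to first update the controllers one by one, and then update the vRouters one by one, using vMotion or maintenance mode where needed.
In addition, the Tungsten Fabric controller supports a fancy feature called ISSU, although I find the name a bit confusing, since the Tungsten Fabric controller is much closer to a route reflector than to a routing-engine.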

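So the basic idea is to first copy all the configs to newly created controllers (or route reflectors), and then update the vRouter settings (and the vRouter modules, if the server can be rebooted) to use those new controllers. With this procedure, rolling back a vRouter module update also becomes much easier.
I'll describe the procedure below.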

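  In-place update
Because ansible-deployer behaves idempotently, an update is not much different from an installation. Re-running the install playbook (install_contrail.yaml) updates all modules.
One limitation is that this restarts almost all nodes at nearly the same time, so it is not easy to restart controllers and vRouters one by one. Also, removing the other nodes from instances.yaml does not work, because updating one node needs certain parameters of the other nodes.
  • For example, a vRouter update needs the control nodes' IPs, which are derived from the nodes with the control role in instances.yaml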
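
To make that dependency concrete, here is a minimal sketch of deriving the control IPs from the control role. It is not ansible-deployer's actual code: `role_ips` is a hypothetical helper, the instances.yaml content is represented as a plain dict, and the addresses are made up.

```python
# Hypothetical helper: collect the IPs of all nodes that carry a given role.
def role_ips(instances_cfg, role):
    ips = []
    for node in instances_cfg.get("instances", {}).values():
        # Roles appear as keys with empty values, as in instances.yaml.
        if role in (node.get("roles") or {}):
            ips.append(node["ip"])
    return ips


# Dict form of a two-node instances.yaml (addresses are made up).
instances_cfg = {
    "instances": {
        "bms1": {"provider": "bms", "ip": "192.0.2.11",
                 "roles": {"config": None, "control": None, "webui": None}},
        "bms2": {"provider": "bms", "ip": "192.0.2.21",
                 "roles": {"vrouter": None, "k8s_node": None}},
    }
}

control_ips = role_ips(instances_cfg, "control")
print(control_ips)

# With bms1 removed from the file, no control IP can be derived any more.
vrouter_only = {"instances": {"bms2": instances_cfg["instances"]["bms2"]}}
print(role_ips(vrouter_only, "control"))
```

The second call returns an empty list, which is why a vRouter cannot be updated from an instances.yaml that lists only the vRouter itself.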

To overcome this, ziu.yaml was added in R2005 so that, at least for the control plane, nodes are updated one by one.

As far as I have tried, when the control plane is updated, the control processes are updated and restarted serially, so I saw no packet loss.

  • During install_contrail.yaml, the restart of the control processes is skipped, since they have already been updated.

  • Some packet loss is still seen when vrouter-agent is restarted, so migrating workloads beforehand is recommended where possible.


  ISSU
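ISSU can be used even when the container format differs a lot (for example, from 4.x to 5.x), because it creates a new controller cluster and copies the data into it.
First, I'll describe the simplest case, with one old controller and one new controller, to walk through the whole procedure. All commands are typed on the new controller.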

old-controller:
 ip: 172.31.2.209
 hostname: ip-172-31-2-209
new-controller:
 ip: 172.31.1.154
 hostname: ip-172-31-1-154

(both controllers are installed with this instances.yaml)
provider_config:
  bms:
   ssh_user: root
   ssh_public_key: /root/.ssh/id_rsa.pub
   ssh_private_key: /root/.ssh/id_rsa
   domainsuffix: local
   ntpserver: 0.centos.pool.ntp.org
instances:
  bms1:
   provider: bms
   roles:
      config_database:
      config:
      control:
      analytics:
      analytics_database:
      webui:
   ip: x.x.x.x  ## controller's ip
contrail_configuration:
  CONTRAIL_CONTAINER_TAG: r5.1
  KUBERNETES_CLUSTER_PROJECT: {}
  JVM_EXTRA_OPTS:  "-Xms128m -Xmx1g"
global_configuration:
  CONTAINER_REGISTRY: tungstenfabric

[commands]
1. stop batch jobs
docker stop config_devicemgr_1
docker stop config_schema_1
docker stop config_svcmonitor_1

2. register the new control in cassandra, and run bgp between them
docker exec -it config_api_1 bash
python /opt/contrail/utils/provision_control.py --host_name ip-172-31-1-154.local --host_ip 172.31.1.154 --api_server_ip 172.31.2.209 --api_server_port 8082 --oper add --router_asn 64512 --ibgp_auto_mesh

3. sync data between the controllers
vi contrail-issu.conf
(write down this)
[DEFAULTS]
old_rabbit_address_list = 172.31.2.209
old_rabbit_port = 5673
new_rabbit_address_list = 172.31.1.154
new_rabbit_port = 5673
old_cassandra_address_list = 172.31.2.209:9161
old_zookeeper_address_list = 172.31.2.209:2181
new_cassandra_address_list = 172.31.1.154:9161
new_zookeeper_address_list = 172.31.1.154:2181
new_api_info={ "172.31.1.154": [( "root"), ( "password")]}  ## ssh public-key can be used

image_id=`docker images | awk '/config-api/{print $3}' | head -1`

docker run --rm -it --network host -v  $(pwd)/contrail-issu.conf:/etc/contrail/contrail-issu.conf --entrypoint /bin/bash -v /root/.ssh:/root/.ssh $image_id -c  "/usr/bin/contrail-issu-pre-sync -c /etc/contrail/contrail-issu.conf"

4. start the process for real-time data sync
docker run --rm --detach -it --network host -v  $(pwd)/contrail-issu.conf:/etc/contrail/contrail-issu.conf --entrypoint /bin/bash -v /root/.ssh:/root/.ssh --name issu-run-sync $image_id -c  "/usr/bin/contrail-issu-run-sync -c /etc/contrail/contrail-issu.conf"

(check the log if needed)
docker exec -t issu-run-sync tail -f /var/log/contrail/issu_contrail_run_sync.log

5. (update vrouters)

6. after that, stop the job and sync all the data
docker rm -f issu-run-sync

image_id=`docker images | awk '/config-api/{print $3}' | head -1`
docker run --rm -it --network host -v  $(pwd)/contrail-issu.conf:/etc/contrail/contrail-issu.conf --entrypoint /bin/bash -v /root/.ssh:/root/.ssh --name issu-run-sync $image_id -c  "/usr/bin/contrail-issu-post-sync -c /etc/contrail/contrail-issu.conf"
docker run --rm -it --network host -v  $(pwd)/contrail-issu.conf:/etc/contrail/contrail-issu.conf --entrypoint /bin/bash -v /root/.ssh:/root/.ssh --name issu-run-sync $image_id -c  "/usr/bin/contrail-issu-zk-sync -c /etc/contrail/contrail-issu.conf"

7. remove the old nodes from cassandra, and add the new nodes
vi issu.conf
(write down this)
[DEFAULTS]
db_host_info={"172.31.1.154": "ip-172-31-1-154.local"}
config_host_info={"172.31.1.154": "ip-172-31-1-154.local"}
analytics_host_info={"172.31.1.154": "ip-172-31-1-154.local"}
control_host_info={"172.31.1.154": "ip-172-31-1-154.local"}
api_server_ip=172.31.1.154

docker cp issu.conf config_api_1:issu.conf
docker exec -it config_api_1 python /opt/contrail/utils/provision_issu.py -c issu.conf

8. start batch jobs
docker start config_devicemgr_1
docker start config_schema_1
docker start config_svcmonitor_1
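
The contrail-issu.conf in step 3 differs between setups only in the two controller addresses and the SSH credentials, so it can be rendered from those values. The following is a minimal sketch under that assumption. `render_issu_conf` is a hypothetical helper, not part of Tungsten Fabric, and the ports are simply the ones used in the example above.

```python
def render_issu_conf(old_ip, new_ip, ssh_user, ssh_password):
    """Return the text of contrail-issu.conf for one old and one new controller."""
    lines = [
        "[DEFAULTS]",
        f"old_rabbit_address_list = {old_ip}",
        "old_rabbit_port = 5673",
        f"new_rabbit_address_list = {new_ip}",
        "new_rabbit_port = 5673",
        f"old_cassandra_address_list = {old_ip}:9161",
        f"old_zookeeper_address_list = {old_ip}:2181",
        f"new_cassandra_address_list = {new_ip}:9161",
        f"new_zookeeper_address_list = {new_ip}:2181",
        # Doubled braces produce literal braces inside an f-string.
        f'new_api_info={{"{new_ip}": [("{ssh_user}"), ("{ssh_password}")]}}',
    ]
    return "\n".join(lines) + "\n"


conf_text = render_issu_conf("172.31.2.209", "172.31.1.154", "root", "password")
print(conf_text)
```

Writing `conf_text` to contrail-issu.conf in the current directory gives the file that the docker run commands in steps 3 to 6 mount into the container.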


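The following are possible checkpoints.
  1. After step 3, you can use contrail-api-cli ls -l \* to check that all data has been copied successfully, and ist.py ctr nei to check that iBGP between the controllers is up.

  2. After step 4, you can modify the old database to see whether the change propagates to the new one.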
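
The iBGP part of checkpoint 1 can be scripted by parsing the table that ist.py ctr nei prints. The sketch below assumes the table layout shown in the outputs later in this article (with fewer columns, for brevity). `parse_table` and `bgp_peers_down` are hypothetical helpers, not part of contrail-introspect-cli.

```python
def parse_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def bgp_peers_down(nei_output):
    """Return the names of BGP peers whose state is not Established."""
    return [
        row["peer"]
        for row in parse_table(nei_output)
        if row["encoding"] == "BGP" and row["state"] != "Established"
    ]


sample = """
+------------------------+---------------+----------+----------+-----------+-------------+-----------------+
| peer                   | peer_address  | peer_asn | encoding | peer_type | state       | send_state      |
+------------------------+---------------+----------+----------+-----------+-------------+-----------------+
| ip-172-31-13-9.local   | 172.31.13.9   | 64512    | BGP      | internal  | Idle        | not advertising |
| ip-172-31-25-102.local | 172.31.25.102 | 0        | XMPP     | internal  | Established | in sync         |
+------------------------+---------------+----------+----------+-----------+-------------+-----------------+
"""
print(bgp_peers_down(sample))
```

An empty result means every BGP peer in the output is Established. In the sample, the peer towards the new controller is still Idle, so it is reported.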


Next, I'll discuss a more realistic case, with an orchestrator and two vRouters.


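  Orchestrator integration
To illustrate the case combined with an orchestrator, I deployed two vRouters and kubernetes with ansible-deployer.
Even when combined with an orchestrator, the overall procedure is not that different.
The point to consider is when to switch kube-manager to the new one.
In a sense, since kube-manager dynamically subscribes to events from kube-apiserver and updates the Tungsten Fabric config-database, it is similar to the batch jobs such as schema-transformer, svc-monitor, and device-manager. So I stopped and started the old and new kube-manager (and webui, actually) at the same time as those batch jobs, but this may need to be changed for each setup.
The overall procedure in this example is as follows.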

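1. Set up one controller (with one kube-manager and kubernetes-master) and two vRouters
2. Set up one new controller (with one kube-manager, but using the same kubernetes-master as the old controller)
3. Stop the batch jobs, kube-manager, and webui on the new controller
4. Start the ISSU procedure and continue until run-sync has started
  -> iBGP will be established between the controllers
5. Update the vRouters one by one, using the ansible-deployer of the new controller
   -> when one vRouter is moved to the new controller, the new controller also gets the route target for k8s-default-pod-network, and ping between containers still works fine (the ist.py ctr route summary and ping results are attached later)
6. After all vRouters have been moved to the new controller, stop the batch jobs, kube-manager, and webui on the old controller
    Then continue the ISSU procedure, and start the batch jobs, kube-manager, and webui on the new controller
   -> from the start to the end of this phase you cannot change config-database manually, so a maintenance window may be needed
   (the whole phase may take 5 to 15 minutes; ping keeps working, but creating new containers does not work until the new kube-manager is started)
7. Finally, stop control, config, and config-database on the old node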


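When updating the vRouters, I used provider: bms-maint for the controller, the k8s_master, and the vRouters that had already been moved to the new controller, to avoid disturbance from container restarts. I attach the original instances.yaml and the instances.yaml used to update the vRouters, for further detail.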

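I also attach the results of ist.py ctr nei and ist.py ctr route summary at each stage, to show in detail what happens.
  • Note that in this example I did not actually update the modules, since this setup is mainly meant to highlight the ISSU procedure (ansible-deployer recreates the vrouter-agent containers even when the module version is the same, and the amount of packet loss would not differ much even with an actual module update).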

old-controller:  172.31.19.25
new-controller:  172.31.13.9
two-vRouters:  172.31.25.102, 172.31.33.175

before starting issu:

[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr nei
Introspect Host: 172.31.19.25
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer                   | peer_address  | peer_asn | encoding | peer_type | state       | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-25-102.local | 172.31.25.102 | 0        | XMPP     | internal  | Established | in sync    | 0          | n/a       |
| ip-172-31-33-175.local | 172.31.33.175 | 0        | XMPP     | internal  | Established | in sync    | 0          | n/a       |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr nei
Introspect Host: 172.31.13.9
[root@ip-172-31-13-9 ~]#
 -> iBGP is not configured yet

[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr route summary
Introspect Host: 172.31.19.25
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name                                               | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default-                            | 0        | 0     | 0             | 0               | 0                |
| project:__link_local__:__link_local__.inet.0       |          |       |               |                 |                  |
| default-domain:default-project:dci-                | 0        | 0     | 0             | 0               | 0                |
| network:__default__.inet.0                         |          |       |               |                 |                  |
| default-domain:default-project:dci-network:dci-    | 0        | 0     | 0             | 0               | 0                |
| network.inet.0                                     |          |       |               |                 |                  |
| default-domain:default-project:default-virtual-    | 0        | 0     | 0             | 0               | 0                |
| network:default-virtual-network.inet.0             |          |       |               |                 |                  |
| inet.0                                             | 0        | 0     | 0             | 0               | 0                |
| default-domain:default-project:ip-fabric:ip-       | 7        | 7     | 2             | 5               | 0                |
| fabric.inet.0                                      |          |       |               |                 |                  |
| default-domain:k8s-default:k8s-default-pod-network | 7        | 7     | 4             | 3               | 0                |
| :k8s-default-pod-network.inet.0                    |          |       |               |                 |                  |
| default-domain:k8s-default:k8s-default-service-    | 7        | 7     | 1             | 6               | 0                |
| network:k8s-default-service-network.inet.0         |          |       |               |                 |                  |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]#
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr route summary
Introspect Host: 172.31.13.9
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name                                               | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default-                            | 0        | 0     | 0             | 0               | 0                |
| project:__link_local__:__link_local__.inet.0       |          |       |               |                 |                  |
| default-domain:default-project:dci-                | 0        | 0     | 0             | 0               | 0                |
| network:__default__.inet.0                         |          |       |               |                 |                  |
| default-domain:default-project:dci-network:dci-    | 0        | 0     | 0             | 0               | 0                |
| network.inet.0                                     |          |       |               |                 |                  |
| default-domain:default-project:default-virtual-    | 0        | 0     | 0             | 0               | 0                |
| network:default-virtual-network.inet.0             |          |       |               |                 |                  |
| inet.0                                             | 0        | 0     | 0             | 0               | 0                |
| default-domain:default-project:ip-fabric:ip-       | 0        | 0     | 0             | 0               | 0                |
| fabric.inet.0                                      |          |       |               |                 |                  |
| default-domain:k8s-default:k8s-default-pod-network | 0        | 0     | 0             | 0               | 0                |
| :k8s-default-pod-network.inet.0                    |          |       |               |                 |                  |
| default-domain:k8s-default:k8s-default-service-    | 0        | 0     | 0             | 0               | 0                |
| network:k8s-default-service-network.inet.0         |          |       |               |                 |                  |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]#
 -> no routes are imported on the new controller

[root@ip-172-31-19-25 contrail-ansible-deployer]# kubectl get pod -o wide
NAME                                 READY   STATUS    RESTARTS   AGE     IP              NODE                                               NOMINATED NODE
cirros-deployment-75c98888b9-6qmcm   1/1     Running   0          4m58s   10.47.255.249   ip-172-31-25-102.ap-northeast-1.compute.internal   <none>
cirros-deployment-75c98888b9-lxq4k   1/1     Running   0          4m58s   10.47.255.250   ip-172-31-33-175.ap-northeast-1.compute.internal   <none>
[root@ip-172-31-19-25 contrail-ansible-deployer]#

# ip -o a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
13: eth0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue \    link/ether 02:6b:dc:98:ac:95 brd ff:ff:ff:ff:ff:ff
13: eth0    inet 10.47.255.249/12 scope global eth0\       valid_lft forever preferred_lft forever
# ping 10.47.255.250
PING 10.47.255.250 (10.47.255.250): 56 data bytes
64 bytes from 10.47.255.250: seq=0 ttl=63 time=2.155 ms
64 bytes from 10.47.255.250: seq=1 ttl=63 time=0.904 ms
^C
--- 10.47.255.250 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.904/1.529/2.155 ms

 -> each of the two vRouters hosts one container, and ping between the two containers works fine.


after provision_control:

[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr nei
Introspect Host: 172.31.19.25
+------------------------+---------------+----------+----------+-----------+-------------+-----------------+------------+-----------+
| peer                   | peer_address  | peer_asn | encoding | peer_type | state       | send_state      | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+-----------------+------------+-----------+
| ip-172-31-13-9.local   | 172.31.13.9   | 64512    | BGP      | internal  | Idle        | not advertising | 0          | n/a       |
| ip-172-31-25-102.local | 172.31.25.102 | 0        | XMPP     | internal  | Established | in sync         | 0          | n/a       |
| ip-172-31-33-175.local | 172.31.33.175 | 0        | XMPP     | internal  | Established | in sync         | 0          | n/a       |
+------------------------+---------------+----------+----------+-----------+-------------+-----------------+------------+-----------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr nei
Introspect Host: 172.31.13.9
[root@ip-172-31-13-9 ~]#
 -> iBGP is configured on the old controller, but the new controller does not have that config yet (it is copied to the new controller when pre-sync is run)


after run-sync:
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr nei
Introspect Host: 172.31.19.25
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer                   | peer_address  | peer_asn | encoding | peer_type | state       | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-13-9.local   | 172.31.13.9   | 64512    | BGP      | internal  | Established | in sync    | 0          | n/a       |
| ip-172-31-25-102.local | 172.31.25.102 | 0        | XMPP     | internal  | Established | in sync    | 0          | n/a       |
| ip-172-31-33-175.local | 172.31.33.175 | 0        | XMPP     | internal  | Established | in sync    | 0          | n/a       |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr nei
Introspect Host: 172.31.13.9
+-----------------------+--------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer                  | peer_address | peer_asn | encoding | peer_type | state       | send_state | flap_count | flap_time |
+-----------------------+--------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-19-25.local | 172.31.19.25 | 64512    | BGP      | internal  | Established | in sync    | 0          | n/a       |
+-----------------------+--------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]#
 -> iBGP is established. ctr route summary is unchanged, because the new controller has no route target for k8s-default-pod-network, so route-target filtering blocks the import of those prefixes.
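
The filtering effect described above can be pictured with a deliberately simplified model: a controller keeps a route only if the route carries at least one route target that the controller currently imports. This is a sketch of the general idea, not of the control node's implementation. `imported_prefixes` is a hypothetical function and the route-target value is a made-up placeholder.

```python
def imported_prefixes(routes, import_targets):
    """Keep only routes that carry at least one locally imported route target."""
    wanted = set(import_targets)
    return [r["prefix"] for r in routes if wanted & set(r["targets"])]


# Pod addresses are taken from the ping example; the route target is made up.
POD_NET_RT = "target:64512:8000001"
routes = [
    {"prefix": "10.47.255.249/32", "targets": [POD_NET_RT]},
    {"prefix": "10.47.255.250/32", "targets": [POD_NET_RT]},
]

# Before any vRouter attaches, the new controller imports no pod-network target.
print(imported_prefixes(routes, import_targets=[]))

# Once a vRouter with a pod in that network connects, the target is imported.
print(imported_prefixes(routes, import_targets=[POD_NET_RT]))
```

The first call returns nothing, which corresponds to the all-zero route summary on the new controller at this stage.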


after moving one vRouter to the new controller:

# ping 10.47.255.250
PING 10.47.255.250 (10.47.255.250): 56 data bytes
64 bytes from 10.47.255.250: seq=0 ttl=63 time=1.684 ms
64 bytes from 10.47.255.250: seq=1 ttl=63 time=0.835 ms
64 bytes from 10.47.255.250: seq=2 ttl=63 time=0.836 ms
(snip)
64 bytes from 10.47.255.250: seq=37 ttl=63 time=0.878 ms
64 bytes from 10.47.255.250: seq=38 ttl=63 time=0.823 ms
64 bytes from 10.47.255.250: seq=39 ttl=63 time=0.820 ms
64 bytes from 10.47.255.250: seq=40 ttl=63 time=1.364 ms
64 bytes from 10.47.255.250: seq=44 ttl=63 time=2.209 ms
64 bytes from 10.47.255.250: seq=45 ttl=63 time=0.869 ms
64 bytes from 10.47.255.250: seq=46 ttl=63 time=0.857 ms
64 bytes from 10.47.255.250: seq=47 ttl=63 time=0.855 ms
64 bytes from 10.47.255.250: seq=48 ttl=63 time=0.845 ms
64 bytes from 10.47.255.250: seq=49 ttl=63 time=0.842 ms
64 bytes from 10.47.255.250: seq=50 ttl=63 time=0.885 ms
64 bytes from 10.47.255.250: seq=51 ttl=63 time=0.891 ms
64 bytes from 10.47.255.250: seq=52 ttl=63 time=0.909 ms
64 bytes from 10.47.255.250: seq=53 ttl=63 time=0.867 ms
64 bytes from 10.47.255.250: seq=54 ttl=63 time=0.884 ms
64 bytes from 10.47.255.250: seq=55 ttl=63 time=0.865 ms
64 bytes from 10.47.255.250: seq=56 ttl=63 time=0.840 ms
64 bytes from 10.47.255.250: seq=57 ttl=63 time=0.877 ms
^C
--- 10.47.255.250 ping statistics ---
58 packets transmitted, 55 packets received, 5% packet loss
round-trip min/avg/max = 0.810/0.930/2.209 ms

 -> after the vrouter-agent restart, 3 packets are lost (between seq 40 and 44). Once the vRouter has been moved to the new controller, ping works fine.
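
Gaps like this one can be picked out of a long ping log automatically. The sketch below assumes the `seq=N` reply format shown above; `missing_sequences` is a hypothetical helper, and the sample is a short excerpt of the output.

```python
import re


def missing_sequences(ping_output):
    """Return sequence numbers absent between the first and last reply seen."""
    seen = sorted(int(n) for n in re.findall(r"seq=\s*(\d+)", ping_output))
    if not seen:
        return []
    present = set(seen)
    return [n for n in range(seen[0], seen[-1] + 1) if n not in present]


sample = """\
64 bytes from 10.47.255.250: seq=38 ttl=63 time=0.823 ms
64 bytes from 10.47.255.250: seq=39 ttl=63 time=0.820 ms
64 bytes from 10.47.255.250: seq=40 ttl=63 time=1.364 ms
64 bytes from 10.47.255.250: seq=44 ttl=63 time=2.209 ms
64 bytes from 10.47.255.250: seq=45 ttl=63 time=0.869 ms
"""
lost = missing_sequences(sample)
print(lost)
```

The function needs the full, unabridged output. Lines dropped by hand, such as the "(snip)" above, would otherwise be counted as lost packets.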


[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr nei
Introspect Host: 172.31.19.25
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer                   | peer_address  | peer_asn | encoding | peer_type | state       | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-13-9.local   | 172.31.13.9   | 64512    | BGP      | internal  | Established | in sync    | 0          | n/a       |
| ip-172-31-33-175.local | 172.31.33.175 | 0        | XMPP     | internal  | Established | in sync    | 0          | n/a       |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr nei
Introspect Host: 172.31.13.9
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer                   | peer_address  | peer_asn | encoding | peer_type | state       | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-19-25.local  | 172.31.19.25  | 64512    | BGP      | internal  | Established | in sync    | 0          | n/a       |
| ip-172-31-25-102.local | 172.31.25.102 | 0        | XMPP     | internal  | Established | in sync    | 0          | n/a       |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]#
 -> each controller has one XMPP connection, and iBGP is established

[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr route summary
Introspect Host: 172.31.19.25
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name                                               | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default-                            | 0        | 0     | 0             | 0               | 0                |
| project:__link_local__:__link_local__.inet.0       |          |       |               |                 |                  |
| default-domain:default-project:dci-                | 0        | 0     | 0             | 0               | 0                |
| network:__default__.inet.0                         |          |       |               |                 |                  |
| default-domain:default-project:dci-network:dci-    | 0        | 0     | 0             | 0               | 0                |
| network.inet.0                                     |          |       |               |                 |                  |
| default-domain:default-project:default-virtual-    | 0        | 0     | 0             | 0               | 0                |
| network:default-virtual-network.inet.0             |          |       |               |                 |                  |
| inet.0                                             | 0        | 0     | 0             | 0               | 0                |
| default-domain:default-project:ip-fabric:ip-       | 7        | 7     | 1             | 6               | 0                |
| fabric.inet.0                                      |          |       |               |                 |                  |
| default-domain:k8s-default:k8s-default-pod-network | 7        | 7     | 1             | 6               | 0                |
| :k8s-default-pod-network.inet.0                    |          |       |               |                 |                  |
| default-domain:k8s-default:k8s-default-service-    | 7        | 7     | 0             | 7               | 0                |
| network:k8s-default-service-network.inet.0         |          |       |               |                 |                  |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr route summary
Introspect Host: 172.31.13.9
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name                                               | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default-                            | 0        | 0     | 0             | 0               | 0                |
| project:__link_local__:__link_local__.inet.0       |          |       |               |                 |                  |
| default-domain:default-project:dci-                | 0        | 0     | 0             | 0               | 0                |
| network:__default__.inet.0                         |          |       |               |                 |                  |
| default-domain:default-project:dci-network:dci-    | 0        | 0     | 0             | 0               | 0                |
| network.inet.0                                     |          |       |               |                 |                  |
| default-domain:default-project:default-virtual-    | 0        | 0     | 0             | 0               | 0                |
| network:default-virtual-network.inet.0             |          |       |               |                 |                  |
| inet.0                                             | 0        | 0     | 0             | 0               | 0                |
| default-domain:default-project:ip-fabric:ip-       | 7        | 7     | 1             | 6               | 0                |
| fabric.inet.0                                      |          |       |               |                 |                  |
| default-domain:k8s-default:k8s-default-pod-network | 7        | 7     | 3             | 4               | 0                |
| :k8s-default-pod-network.inet.0                    |          |       |               |                 |                  |
| default-domain:k8s-default:k8s-default-service-    | 7        | 7     | 1             | 6               | 0                |
| network:k8s-default-service-network.inet.0         |          |       |               |                 |                  |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]#
 -> since each controller now has at least one container from k8s-default-pod-network, they exchange prefixes over iBGP and therefore hold the same prefixes
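
Comparing two route summaries by eye gets tedious, so here is a minimal sketch that compares the prefix counts of two outputs. It assumes the table layout shown above, where a long name wraps onto a continuation row whose numeric cells are empty. `parse_route_summary` and `differing_tables` are hypothetical helpers, and the sample table is trimmed to three columns.

```python
def parse_route_summary(text):
    """Map routing-table name -> prefix count, joining names wrapped over rows."""
    counts = {}
    pending = None  # (partial name, prefix count) of the entry being assembled
    for line in text.splitlines():
        if not line.strip().startswith("|"):
            continue  # borders, prompts, blank lines
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        name, prefixes = cells[0], cells[1]
        if name == "name":
            continue  # header row
        if prefixes:  # a row with numbers starts a new entry
            if pending is not None:
                counts[pending[0]] = pending[1]
            pending = (name, int(prefixes))
        elif pending is not None:  # continuation row: append the name part
            pending = (pending[0] + name, pending[1])
    if pending is not None:
        counts[pending[0]] = pending[1]
    return counts


def differing_tables(old_text, new_text):
    """Names of routing tables whose prefix count differs between two outputs."""
    old, new = parse_route_summary(old_text), parse_route_summary(new_text)
    return sorted(name for name in old if old[name] != new.get(name))


TEMPLATE = """
+----------------------------------------------------+----------+-------+
| name                                               | prefixes | paths |
+----------------------------------------------------+----------+-------+
| inet.0                                             | 0        | 0     |
| default-domain:k8s-default:k8s-default-pod-network | {n}        | {n}     |
| :k8s-default-pod-network.inet.0                    |          |       |
+----------------------------------------------------+----------+-------+
"""
old_out = TEMPLATE.format(n=7)
new_out = TEMPLATE.format(n=0)
print(differing_tables(old_out, new_out))
```

An empty list means both controllers report the same prefix count for every routing table, which is the state expected at this stage.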


after moving the second vRouter to the new controller:
# ping 10.47.255.250
PING 10.47.255.250 (10.47.255.250): 56 data bytes
64 bytes from 10.47.255.250: seq=0 ttl=63 time=1.750 ms
64 bytes from 10.47.255.250: seq=1 ttl=63 time=0.815 ms
64 bytes from 10.47.255.250: seq=2 ttl=63 time=0.851 ms
64 bytes from 10.47.255.250: seq=3 ttl=63 time=0.809 ms
(snip)
64 bytes from 10.47.255.250: seq=34 ttl=63 time=0.853 ms
64 bytes from 10.47.255.250: seq=35 ttl=63 time=0.848 ms
64 bytes from 10.47.255.250: seq=36 ttl=63 time=0.833 ms
64 bytes from 10.47.255.250: seq=37 ttl=63 time=0.832 ms
64 bytes from 10.47.255.250: seq=38 ttl=63 time=0.910 ms
64 bytes from 10.47.255.250: seq=42 ttl=63 time=2.071 ms
64 bytes from 10.47.255.250: seq=43 ttl=63 time=0.826 ms
64 bytes from 10.47.255.250: seq=44 ttl=63 time=0.853 ms
64 bytes from 10.47.255.250: seq=45 ttl=63 time=0.851 ms
64 bytes from 10.47.255.250: seq=46 ttl=63 time=0.853 ms
64 bytes from 10.47.255.250: seq=47 ttl=63 time=0.851 ms
64 bytes from 10.47.255.250: seq=48 ttl=63 time=0.855 ms
64 bytes from 10.47.255.250: seq=49 ttl=63 time=0.869 ms
64 bytes from 10.47.255.250: seq=50 ttl=63 time=0.833 ms
64 bytes from 10.47.255.250: seq=51 ttl=63 time=0.859 ms
64 bytes from 10.47.255.250: seq=52 ttl=63 time=0.866 ms
64 bytes from 10.47.255.250: seq=53 ttl=63 time=0.840 ms
64 bytes from 10.47.255.250: seq=54 ttl=63 time=0.841 ms
64 bytes from 10.47.255.250: seq=55 ttl=63 time=0.854 ms
^C
--- 10.47.255.250 ping statistics ---
56 packets transmitted, 53 packets received, 5% packet loss
round-trip min/avg/max = 0.799/0.888/2.071 ms
#
 -> 3 packets are lost (between seq 38 and 42)


[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr nei
Introspect Host: 172.31.19.25
+----------------------+--------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer                 | peer_address | peer_asn | encoding | peer_type | state       | send_state | flap_count | flap_time |
+----------------------+--------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-13-9.local | 172.31.13.9  | 64512    | BGP      | internal  | Established | in sync    | 0          | n/a       |
+----------------------+--------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr nei
Introspect Host: 172.31.13.9
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer                   | peer_address  | peer_asn | encoding | peer_type | state       | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-19-25.local  | 172.31.19.25  | 64512    | BGP      | internal  | Established | in sync    | 0          | n/a       |
| ip-172-31-25-102.local | 172.31.25.102 | 0        | XMPP     | internal  | Established | in sync    | 0          | n/a       |
| ip-172-31-33-175.local | 172.31.33.175 | 0        | XMPP     | internal  | Established | in sync    | 0          | n/a       |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]#
 -> the new controller now has two XMPP connections.

[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr route summary
Introspect Host: 172.31.19.25
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name                                               | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default-                            | 0        | 0     | 0             | 0               | 0                |
| project:__link_local__:__link_local__.inet.0       |          |       |               |                 |                  |
| default-domain:default-project:dci-                | 0        | 0     | 0             | 0               | 0                |
| network:__default__.inet.0                         |          |       |               |                 |                  |
| default-domain:default-project:dci-network:dci-    | 0        | 0     | 0             | 0               | 0                |
| network.inet.0                                     |          |       |               |                 |                  |
| default-domain:default-project:default-virtual-    | 0        | 0     | 0             | 0               | 0                |
| network:default-virtual-network.inet.0             |          |       |               |                 |                  |
| inet.0                                             | 0        | 0     | 0             | 0               | 0                |
| default-domain:default-project:ip-fabric:ip-       | 0        | 0     | 0             | 0               | 0                |
| fabric.inet.0                                      |          |       |               |                 |                  |
| default-domain:k8s-default:k8s-default-pod-network | 0        | 0     | 0             | 0               | 0                |
| :k8s-default-pod-network.inet.0                    |          |       |               |                 |                  |
| default-domain:k8s-default:k8s-default-service-    | 0        | 0     | 0             | 0               | 0                |
| network:k8s-default-service-network.inet.0         |          |       |               |                 |                  |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr route summary
Introspect Host: 172.31.13.9
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| name                                               | prefixes | paths | primary_paths | secondary_paths | infeasible_paths |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
| default-domain:default-                            | 0        | 0     | 0             | 0               | 0                |
| project:__link_local__:__link_local__.inet.0       |          |       |               |                 |                  |
| default-domain:default-project:dci-                | 0        | 0     | 0             | 0               | 0                |
| network:__default__.inet.0                         |          |       |               |                 |                  |
| default-domain:default-project:dci-network:dci-    | 0        | 0     | 0             | 0               | 0                |
| network.inet.0                                     |          |       |               |                 |                  |
| default-domain:default-project:default-virtual-    | 0        | 0     | 0             | 0               | 0                |
| network:default-virtual-network.inet.0             |          |       |               |                 |                  |
| inet.0                                             | 0        | 0     | 0             | 0               | 0                |
| default-domain:default-project:ip-fabric:ip-       | 7        | 7     | 2             | 5               | 0                |
| fabric.inet.0                                      |          |       |               |                 |                  |
| default-domain:k8s-default:k8s-default-pod-network | 7        | 7     | 4             | 3               | 0                |
| :k8s-default-pod-network.inet.0                    |          |       |               |                 |                  |
| default-domain:k8s-default:k8s-default-service-    | 7        | 7     | 1             | 6               | 0                |
| network:k8s-default-service-network.inet.0         |          |       |               |                 |                  |
+----------------------------------------------------+----------+-------+---------------+-----------------+------------------+
[root@ip-172-31-13-9 ~]#
 -> The old controller no longer has any prefixes.
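
The comparison of the two route summaries can also be done mechanically. Below is a minimal Python sketch, assuming the text printed by `ist.py ... ctr route summary` has been captured into a string. The names `parse_route_summary`, `has_no_prefixes` and `SAMPLE` are hypothetical helpers for illustration and are not part of contrail-introspect-cli; the sample table is shortened to three columns.

```python
def parse_route_summary(text):
    """Map each routing table name to its prefix count.

    Long names are wrapped over several rows in the table; a continuation
    row is recognized by its empty "prefixes" cell.
    """
    rows = []  # each entry is [name, prefixes]
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # borders, shell prompts, blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 2 or cells[0] == "name":
            continue  # header row
        if cells[1]:
            # A filled "prefixes" cell starts a new table entry.
            rows.append([cells[0], int(cells[1])])
        elif rows:
            # Empty "prefixes" cell: the name of the previous entry continues.
            rows[-1][0] += cells[0]
    return {name: prefixes for name, prefixes in rows}


def has_no_prefixes(summary):
    """True when every routing table reports zero prefixes."""
    return all(count == 0 for count in summary.values())


# Shortened sample in the same layout as the output shown above (assumption).
SAMPLE = """
+-----------------------------------------------+----------+-------+
| name                                          | prefixes | paths |
+-----------------------------------------------+----------+-------+
| inet.0                                        | 0        | 0     |
| default-domain:default-project:ip-fabric:ip-  | 7        | 7     |
| fabric.inet.0                                 |          |       |
+-----------------------------------------------+----------+-------+
"""

summary = parse_route_summary(SAMPLE)
print(summary)
print("no prefixes left:", has_no_prefixes(summary))
```

Running the same check against the captured output of each controller would show `True` only for the one that no longer holds any prefixes.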


After the ISSU process has finished and the new kube-manager has started:
[root@ip-172-31-19-25 ~]# kubectl  get pod -o wide
NAME                                  READY   STATUS    RESTARTS   AGE   IP              NODE                                               NOMINATED NODE
cirros-deployment-75c98888b9-6qmcm    1/1     Running   0          34m   10.47.255.249   ip-172-31-25-102.ap-northeast-1.compute.internal   <none>
cirros-deployment-75c98888b9-lxq4k    1/1     Running   0          34m   10.47.255.250   ip-172-31-33-175.ap-northeast-1.compute.internal   <none>
cirros-deployment2-648b98685f-b8pxw   1/1     Running   0          15s   10.47.255.247   ip-172-31-25-102.ap-northeast-1.compute.internal   <none>
cirros-deployment2-648b98685f-nv7z9   1/1     Running   0          15s   10.47.255.248   ip-172-31-33-175.ap-northeast-1.compute.internal   <none>
[root@ip-172-31-19-25 ~]# 
 -> New pods are created with new IPs (10.47.255.247 and 10.47.255.248 are new addresses that come from the new controller)

[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.19.25 ctr nei
Introspect Host: 172.31.19.25
+----------------------+--------------+----------+----------+-----------+--------+-----------------+------------+-----------------------------+
| peer                 | peer_address | peer_asn | encoding | peer_type | state  | send_state      | flap_count | flap_time                   |
+----------------------+--------------+----------+----------+-----------+--------+-----------------+------------+-----------------------------+
| ip-172-31-13-9.local | 172.31.13.9  | 64512    | BGP      | internal  | Active | not advertising | 1          | 2019-Jun-23 05:37:02.614003 |
+----------------------+--------------+----------+----------+-----------+--------+-----------------+------------+-----------------------------+
[root@ip-172-31-13-9 ~]# ./contrail-introspect-cli/ist.py --host 172.31.13.9 ctr nei
Introspect Host: 172.31.13.9
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| peer                   | peer_address  | peer_asn | encoding | peer_type | state       | send_state | flap_count | flap_time |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
| ip-172-31-25-102.local | 172.31.25.102 | 0        | XMPP     | internal  | Established | in sync    | 0          | n/a       |
| ip-172-31-33-175.local | 172.31.33.175 | 0        | XMPP     | internal  | Established | in sync    | 0          | n/a       |
+------------------------+---------------+----------+----------+-----------+-------------+------------+------------+-----------+
[root@ip-172-31-13-9 ~]#
 -> The new controller no longer has an iBGP neighbor entry for the old controller. The old controller still has its iBGP entry, although that process is about to be stopped anyway :)
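
To read the two neighbor tables without eyeballing them, one option is a small parser. This is a minimal Python sketch, assuming each `ist.py ... ctr nei` table has been captured as text; `parse_table`, `peer_states`, `OLD` and `NEW` are hypothetical names for illustration, and the sample rows are copied from the output above without the border lines.

```python
def parse_table(text):
    """Turn one ASCII table into a list of {column: value} dicts."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first "|" row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def peer_states(rows, encoding):
    """Return {peer: state} for peers that use the given encoding (BGP or XMPP)."""
    return {r["peer"]: r["state"] for r in rows if r["encoding"] == encoding}


# Output captured from the old controller (172.31.19.25).
OLD = """
| peer                 | peer_address | peer_asn | encoding | peer_type | state  | send_state      | flap_count | flap_time                   |
| ip-172-31-13-9.local | 172.31.13.9  | 64512    | BGP      | internal  | Active | not advertising | 1          | 2019-Jun-23 05:37:02.614003 |
"""

# Output captured from the new controller (172.31.13.9).
NEW = """
| peer                   | peer_address  | peer_asn | encoding | peer_type | state       | send_state | flap_count | flap_time |
| ip-172-31-25-102.local | 172.31.25.102 | 0        | XMPP     | internal  | Established | in sync    | 0          | n/a       |
| ip-172-31-33-175.local | 172.31.33.175 | 0        | XMPP     | internal  | Established | in sync    | 0          | n/a       |
"""

old_bgp = peer_states(parse_table(OLD), "BGP")
new_bgp = peer_states(parse_table(NEW), "BGP")
new_xmpp = peer_states(parse_table(NEW), "XMPP")
print("old controller BGP peers:", old_bgp)
print("new controller BGP peers:", new_bgp)
print("new controller XMPP peers:", new_xmpp)
```

An empty BGP result on the new controller together with two established XMPP peers matches the state described above: both vRouters are attached to the new controller, and only the old one still lists an iBGP neighbor.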

State after the old controller has been stopped:
[root@ip-172-31-19-25 ~]# kubectl  get pod -o wide
NAME                                  READY   STATUS    RESTARTS   AGE   IP              NODE                                               NOMINATED NODE
cirros-deployment-75c98888b9-6qmcm    1/1     Running   0          48m   10.47.255.249   ip-172-31-25-102.ap-northeast-1.compute.internal   <none>
cirros-deployment-75c98888b9-lxq4k    1/1     Running   0          48m   10.47.255.250   ip-172-31-33-175.ap-northeast-1.compute.internal   <none>
cirros-deployment2-648b98685f-b8pxw   1/1     Running   0          13m   10.47.255.247   ip-172-31-25-102.ap-northeast-1.compute.internal   <none>
cirros-deployment2-648b98685f-nv7z9   1/1     Running   0          13m   10.47.255.248   ip-172-31-33-175.ap-northeast-1.compute.internal   <none>
cirros-deployment3-68fb484676-ct9q9   1/1     Running   0          18s   10.47.255.245   ip-172-31-25-102.ap-northeast-1.compute.internal   <none>
cirros-deployment3-68fb484676-mxbzq   1/1     Running   0          18s   10.47.255.246   ip-172-31-33-175.ap-northeast-1.compute.internal   <none>
[root@ip-172-31-19-25 ~]# 
 -> New pods can still be created

[root@ip-172-31-25-102 ~]# contrail-status 
Pod      Service  Original Name           State    Id            Status         
vrouter  agent    contrail-vrouter-agent  running  9a46a1a721a7  Up 33 minutes  
vrouter  nodemgr  contrail-nodemgr        running  11fb0a7bc86d  Up 33 minutes  

vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: active

[root@ip-172-31-25-102 ~]#
 -> The vRouter works well with the new controller

# ping 10.47.255.250
PING 10.47.255.250 (10.47.255.250): 56 data bytes
64 bytes from 10.47.255.250: seq=0 ttl=63 time=1.781 ms
64 bytes from 10.47.255.250: seq=1 ttl=63 time=0.857 ms
^C
--- 10.47.255.250 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.857/1.319/1.781 ms
#
 -> Ping between the vRouters succeeds


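  Backward compatibility
Since there are several ways to update a cluster (in-place, ISSU, with or without ifdown vhost0), choosing among them is an important topic in its own right.
Before going into the details, let me first describe the behavior of vrouter-agent up / down and of ifup vhost0 / ifdown vhost0.
When vrouter-agent is restarted, one might assume that both the vrouter-agent container and vhost0 are recreated.
In fact this is not the case: vhost0 is tightly coupled with vrouter.ko and has to be deleted at the same time that vrouter.ko is unloaded from the kernel. So from an operational point of view, ifdown vhost0 is needed whenever not only vrouter-agent but also vrouter.ko has to be updated (ifdown vhost0 also runs rmmod vrouter internally).
Therefore, to discuss backward compatibility, the following three topics need to be examined.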

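1. Compatibility between the controller and vrouter-agent
  • If there is no backward compatibility, ISSU is needed


2. Compatibility between vrouter-agent and vrouter.ko
  • If there is no backward compatibility, ifdown vhost0 is needed, which causes at least 5-10 seconds of traffic loss, so in practice traffic has to be moved to other nodes, for example by live migration

  • Since vrouter-agent uses netlink to sync data with vrouter.ko, a schema change can cause unexpected behavior in vrouter-agent (for example, a vrouter-agent segmentation fault in the KSync logic)


3. Compatibility between vrouter.ko and the kernel
  • If there is no backward compatibility, the kernel has to be updated, which means traffic has to be moved to other nodes

  • When vrouter.ko relies on a different in-kernel API, it cannot be loaded by the kernel, and vhost0 and vrouter-agent cannot be created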


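For 2 and 3, a kernel update is unavoidable sooner or later for various reasons, so one feasible plan is to first pick a new kernel version, then pick a vrouter-agent / vrouter.ko that supports that kernel, and then check whether the vrouter-agent currently in use can work with that version of control.

  • If that works, use the in-place update; if it does not work for some reason, or if a rollback path is needed, use ISSU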
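
The three compatibility checks and their consequences can be summarized as a small decision helper. The following Python sketch is only an illustration of the reasoning above, under the assumption that each check has already been answered with yes or no; `Compatibility` and `plan_update` are hypothetical names, not part of any Tungsten Fabric tool.

```python
from dataclasses import dataclass


@dataclass
class Compatibility:
    controller_agent: bool  # 1. controller and vrouter-agent work together
    agent_module: bool      # 2. vrouter-agent and vrouter.ko work together
    module_kernel: bool     # 3. vrouter.ko loads on the running kernel


def plan_update(compat, need_rollback=False):
    """Translate the three checks into the actions named in the text."""
    plan = {
        # Without controller / vrouter-agent compatibility, or when a rollback
        # path is required, ISSU is used instead of the in-place update.
        "method": "ISSU" if (need_rollback or not compat.controller_agent) else "in-place",
        # An incompatible vrouter.ko has to be replaced, which needs ifdown vhost0.
        "ifdown_vhost0": not compat.agent_module,
        # A vrouter.ko that does not fit the kernel implies a kernel update.
        "kernel_update": not compat.module_kernel,
    }
    # Both ifdown vhost0 and a kernel update interrupt traffic on the node,
    # so workloads should be moved away first (for example by live migration).
    plan["migrate_workloads"] = plan["ifdown_vhost0"] or plan["kernel_update"]
    return plan


print(plan_update(Compatibility(True, True, True)))
print(plan_update(Compatibility(True, False, True)))
print(plan_update(Compatibility(False, True, False)))
```

The helper deliberately keeps the controller-side choice (in-place or ISSU) separate from the node-side actions (ifdown vhost0, kernel update), since the text treats them as independent questions.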


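For 1, ifmap maintains a white_list for each version when it imports the config-api definitions:

  • void IFMapGraphWalker::AddNodesToWhitelist():


As far as I have tried, it seems to have fairly good backward compatibility (since route updates work much like BGP, they should also work fine in most cases).

To verify this, I tried the setup with modules of different versions, and it still appeared to work.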

I-1. config 2002-latest, control 2002-latest, vrouter 5.0-latest, openstack queens
I-2. config 2002-latest, control 5.0-latest, vrouter 5.0-latest, openstack queens
II-1. config 2002-latest, control 2002-latest, vrouter r5.1, kubernetes 1.12

Note: unfortunately, this combination did not work well (cni could not get port information from vrouter-agent). I guess this is caused by the cni version change (0.2.0 -> 0.3.1) between 5.0.x and 5.1.

II-2. config 2002-latest, control 2002-latest, vrouter 5.0-latest, kubernetes 1.12

Therefore, even when the kernel and vRouter versions do not need to change right away, updating config / control fairly frequently is a good habit, so that fixes for possible bugs are picked up.



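Articles in the Tungsten Fabric primer series:

  1. First-time startup and running guide

  2. The seven "weapons" of the TF components

  3. Orchestrator integration

  4. Notes on installation (part 1)

  5. Notes on installation (part 2)

  6. Integration with mainstream monitoring tools

  7. Starting day-two operations

  8. 8 typical failures and troubleshooting tips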



From the ITPUB blog, link: http://blog.itpub.net/69957171/viewspace-2693907/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
