This article is shared from the Huawei Cloud community post "Calico BGP RouteReflector Strategy in Practice", by 可以交個朋友.
1 Background
The container networking component Calico supports several backend modes: the overlay modes IPIP and VXLAN, and the underlay, pure-routing BGP mode.
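Compared with the overlay model, an underlay network offers higher data-plane forwarding performance. In pure-routing mode there are two options. The first is the Calico BGP full-mesh scheme. It has limitations and suits only small Kubernetes clusters: every node peers with every other node, so a cluster of n nodes needs n(n-1)/2 BGP sessions, and each node added must open a session to every existing node. The session count therefore grows quadratically and consumes a large amount of network resources. The second option is Route Reflector mode, also called RR mode. In RR mode, one or more BGP speakers are designated as route reflectors. They establish sessions with the other speakers in the network, and each speaker only needs a BGP session with a route reflector to obtain the routes of the whole network.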
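To make the scaling difference concrete, the following is a minimal sketch that compares session counts. It assumes a full mesh needs one session per pair of nodes, and that in RR mode each node peers with each of r reflectors while sessions between the reflectors themselves are ignored. The function names are illustrative and not part of Calico.

```shell
#!/bin/sh
# Sessions in a BGP full mesh: one per pair of nodes, n*(n-1)/2.
full_mesh_sessions() {
  echo $(( $1 * ($1 - 1) / 2 ))
}

# Sessions in RR mode. Assumption: each of n nodes peers with each of
# r reflectors; sessions between the reflectors are not counted.
rr_sessions() {
  echo $(( $1 * $2 ))
}

# Compare both schemes for a few cluster sizes with 2 reflectors.
for n in 4 100 1000; do
  echo "nodes=$n full-mesh=$(full_mesh_sessions "$n") rr=$(rr_sessions "$n" 2)"
done
```

Under these assumptions the full-mesh count grows with the square of the node count, while the RR count grows linearly.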
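2 Calico BGP RouteReflector Mode Network Architecture
Without changing the internal network topology of the IDC, the access-layer switches establish BGP sessions with the core-layer switches and rely on the routing policy that already exists in the data center. Pod CIDRs are allocated according to each node's physical location, and every node advertises its Pod CIDR to the access-layer switch over BGP, which gives reachability across the whole network. The figure below explains this in detail, based on a Leaf-Spine architecture.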
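Networking principles:
- Each access-layer switch is connected at layer 2 to the nodes it manages, and together they form one AS. Every node runs a BGP service that advertises the node's own routes.
- Among the core-layer and access-layer switches, each router occupies its own AS. They are physically directly connected and run BGP. The core-layer switches learn the routes of the whole network, and each access-layer switch learns the routes of the nodes directly attached to it.
- Pods on the same host reach each other through the host's own routing (the Linux host is treated as a router).
- Pods on different nodes in the same rack communicate through the TOR (leaf) switch.
- Pods in different racks communicate through the core switches.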
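The last three principles amount to a simple decision rule, sketched below. The function name pod_path is hypothetical, and the node and rack names are the ones used in the lab in section 3.

```shell
#!/bin/sh
# pod_path NODE_A RACK_A NODE_B RACK_B
# Prints which device forwards traffic between two pods, following the
# networking principles listed above.
pod_path() {
  if [ "$1" = "$3" ]; then
    echo "host routing"          # same node: the Linux host acts as the router
  elif [ "$2" = "$4" ]; then
    echo "leaf (TOR) switch"     # same rack, different nodes
  else
    echo "spine (core) switch"   # different racks
  fi
}

pod_path calico-bgp-rr-worker rack0 calico-bgp-rr-worker rack0
pod_path calico-bgp-rr-control-plane rack0 calico-bgp-rr-worker rack0
pod_path calico-bgp-rr-worker rack0 calico-bgp-rr-worker2 rack1
```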
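3 Building a Simulated Production Network Environment
Prepare a machine running Ubuntu 22.04 in advance (8 vCPUs and 16 GB of memory are enough). The following software must be installed on the virtual machine: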
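- Docker
- Go development environment
- Kind (Kubernetes in Docker, a tool developed by a Kubernetes special interest group for quickly building Kubernetes test environments. Installing kind with Go requires Go on the host first. Version v0.20.0 is a suitable choice.)
- ContainerLab (a virtual network platform built with container technology. It can use the vyos image to build virtual switches and routers. Version v0.42.0 of containerlab is recommended.)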
3.1 Kubernetes environment setup
Kubernetes cluster version: 1.27.3
Cluster size: 1 master node and 3 worker nodes
The cluster build script, 1-setup-env.sh, is as follows:
#!/bin/bash
date
set -v

# 1.prep noCNI env
cat <<EOF | kind create cluster --name=calico-bgp-rr --image=kindest/node:v1.27.3 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.244.0.0/16"
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.5.10
        node-labels: "rack=rack0"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.5.11
        node-labels: "rack=rack0"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.8.10
        node-labels: "rack=rack1"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.8.11
        node-labels: "rack=rack1"
EOF

# 2.remove taints
kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/control-plane:NoSchedule-
kubectl get nodes -o wide

# 3. install tools
for i in $(docker ps -a --format "table {{.Names}}" | grep calico-bgp-rr)
do
    echo $i
    docker cp /usr/bin/ping $i:/usr/bin/ping
    docker cp /usr/local/bin/calicoctl $i:/usr/local/bin/
    # docker exec -it $i bash -c "apt-get -y update > /dev/null && apt-get -y install net-tools tcpdump lrzsz > /dev/null 2>&1"
done
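Run the script to create the cluster. Because no CNI component is installed yet, some pods will be stuck in states such as Pending and the nodes will be NotReady. This is expected and is resolved once the Calico CNI component is installed later.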
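A quick way to watch this state is to filter the node list, as in the sketch below. The helper name not_ready_nodes is hypothetical, and the sample lines are made-up illustrative input, not captured output.

```shell
#!/bin/sh
# Reads `kubectl get nodes --no-headers` output on stdin and prints the
# names of nodes whose STATUS column (2nd field) does not start with "Ready".
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Illustrative input only. On a real cluster, run:
#   kubectl get nodes --no-headers | not_ready_nodes
not_ready_nodes <<'EOF'
calico-bgp-rr-control-plane NotReady control-plane 2m v1.27.3
calico-bgp-rr-worker        Ready    <none>        1m v1.27.3
EOF
```

Once Calico is running, the function should print nothing.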
3.2 Creating the bridges
Create bridges on the host. Their main purpose is to connect the Kubernetes nodes created by kind with the switches built by containerlab.
brctl addbr br-leaf0;ifconfig br-leaf0 up;brctl addbr br-leaf1;ifconfig br-leaf1 up
3.3 Building layer-3 switches with containerlab and configuring BGP
The containerlab switch build script, 2-setup-clab.sh, is as follows:
#!/bin/bash
set -v

# Write the topology file first, then deploy it.
cat <<EOF > clab.yaml
name: calico-bgp-rr
topology:
  nodes:
    spine0:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/spine0-boot.cfg:/opt/vyatta/etc/config/config.boot
    spine1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/spine1-boot.cfg:/opt/vyatta/etc/config/config.boot
    leaf0:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/leaf0-boot.cfg:/opt/vyatta/etc/config/config.boot
    leaf1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/leaf1-boot.cfg:/opt/vyatta/etc/config/config.boot
    br-leaf0:
      kind: bridge
    br-leaf1:
      kind: bridge
    server1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-control-plane
      exec:
        - ip addr add 10.1.5.10/24 dev net0
        - ip route replace default via 10.1.5.1
    server2:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker
      exec:
        - ip addr add 10.1.5.11/24 dev net0
        - ip route replace default via 10.1.5.1
    server3:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker2
      exec:
        - ip addr add 10.1.8.10/24 dev net0
        - ip route replace default via 10.1.8.1
    server4:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker3
      exec:
        - ip addr add 10.1.8.11/24 dev net0
        - ip route replace default via 10.1.8.1
  links:
    - endpoints: ["br-leaf0:br-leaf0-net0", "server1:net0"]
    - endpoints: ["br-leaf0:br-leaf0-net1", "server2:net0"]
    - endpoints: ["br-leaf1:br-leaf1-net0", "server3:net0"]
    - endpoints: ["br-leaf1:br-leaf1-net1", "server4:net0"]
    - endpoints: ["leaf0:eth1", "spine0:eth1"]
    - endpoints: ["leaf0:eth2", "spine1:eth1"]
    - endpoints: ["leaf0:eth3", "br-leaf0:br-leaf0-net2"]
    - endpoints: ["leaf1:eth1", "spine0:eth2"]
    - endpoints: ["leaf1:eth2", "spine1:eth2"]
    - endpoints: ["leaf1:eth3", "br-leaf1:br-leaf1-net2"]
EOF

clab deploy -t clab.yaml
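After the script runs, containerlab reports that the topology was deployed successfully. The BGP configuration of the vyos switches is listed at the end of this document.
3.4 Installing the Calico CNI plugin
Calico installs in IPIP mode by default, so IPIP must be turned off manually. With neither IPIP nor VXLAN encapsulation enabled, Calico runs in pure BGP routing mode.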
kubectl apply -f calico.yaml
#kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.23/manifests/calico.yaml
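The article does not show how calico.yaml was edited. One possible way, sketched below, is to set the CALICO_IPV4POOL_IPIP and CALICO_IPV4POOL_VXLAN environment variables of the calico-node container to "Never" before applying the manifest. The helper name disable_encap and the output file name calico-bgp.yaml are assumptions for illustration.

```shell
#!/bin/sh
# Reads a Calico manifest on stdin and writes it to stdout with the value
# that follows CALICO_IPV4POOL_IPIP / CALICO_IPV4POOL_VXLAN set to "Never".
disable_encap() {
  awk '
    /name: CALICO_IPV4POOL_(IPIP|VXLAN) *$/ { print; pending = 1; next }
    pending && /value:/ { sub(/value:.*/, "value: \"Never\""); pending = 0 }
    { print }
  '
}

# Demonstration on a fragment shaped like the env section of the manifest.
disable_encap <<'EOF'
- name: CALICO_IPV4POOL_IPIP
  value: "Always"
- name: CALICO_IPV4POOL_VXLAN
  value: "Never"
EOF
```

On the real file this could be used as: disable_encap < calico.yaml > calico-bgp.yaml, followed by kubectl apply -f calico-bgp.yaml. Review the result before applying, since the layout of the manifest can differ between Calico versions.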
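Once the Calico components are installed, the nodes establish BGP sessions with each other in a full mesh.
3.5 Enabling Calico BGP RR mode
A full mesh does not scale to large clusters, so we disable the BGP full mesh and use BGP route reflectors instead.
The method is as follows: 3-disable-bgp-full-mesh.sh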
#!/bin/bash
set -v

# 1. disable bgp fullmesh
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: BGPConfiguration
  metadata:
    name: default
  spec:
    logSeverityScreen: Info
    nodeToNodeMeshEnabled: false
kind: BGPConfigurationList
metadata:
EOF
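3.6 Configuring BGP RR rules on the Calico nodes
The nodes of the Kubernetes cluster act as clients of the BGP route reflectors. Each node must be configured to peer with its route reflector so that routes are synchronized.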
#!/bin/bash
set -v

# 1.3. add bgp configuration for the nodes
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  annotations:
  labels:
    rack: rack0
  name: calico-bgp-rr-control-plane
spec:
  addresses:
  - address: 10.1.5.10
    type: InternalIP
  bgp:
    asNumber: 65005
    ipv4Address: 10.1.5.10/24
  orchRefs:
  - nodeName: calico-bgp-rr-control-plane
    orchestrator: k8s
EOF

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack0
  name: calico-bgp-rr-worker
spec:
  addresses:
  - address: 10.1.5.11
    type: InternalIP
  bgp:
    asNumber: 65005
    ipv4Address: 10.1.5.11/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker
    orchestrator: k8s
EOF

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack1
  name: calico-bgp-rr-worker2
spec:
  addresses:
  - address: 10.1.8.10
    type: InternalIP
  bgp:
    asNumber: 65008
    ipv4Address: 10.1.8.10/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker2
    orchestrator: k8s
EOF

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack1
  name: calico-bgp-rr-worker3
spec:
  addresses:
  - address: 10.1.8.11
    type: InternalIP
  bgp:
    asNumber: 65008
    ipv4Address: 10.1.8.11/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker3
    orchestrator: k8s
EOF

# 1.4. peer to leaf0 switch
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack0-to-leaf0
spec:
  peerIP: 10.1.5.1
  asNumber: 65005
  nodeSelector: rack == 'rack0'
EOF

# 1.5. peer to leaf1 switch
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-to-leaf1
spec:
  peerIP: 10.1.8.1
  asNumber: 65008
  nodeSelector: rack == 'rack1'
EOF
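The four Node resources differ only in name, address, AS number and rack label. As a sketch of how the repetition could be reduced, the hypothetical helper node_manifest below renders one manifest per row of a small table. The values are the ones from the script above, and the /24 prefix length is assumed to be the same for every node, as it is in this lab.

```shell
#!/bin/sh
# node_manifest NAME IP ASN RACK
# Prints a Calico Node resource for one node (assumes a /24 node subnet).
node_manifest() {
  cat <<EOF
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: $4
  name: $1
spec:
  addresses:
  - address: $2
    type: InternalIP
  bgp:
    asNumber: $3
    ipv4Address: $2/24
  orchRefs:
  - nodeName: $1
    orchestrator: k8s
EOF
}

# Table columns: name, IP, AS number, rack.
while read -r name ip asn rack; do
  echo "---"
  node_manifest "$name" "$ip" "$asn" "$rack"
done <<'EOF'
calico-bgp-rr-control-plane 10.1.5.10 65005 rack0
calico-bgp-rr-worker 10.1.5.11 65005 rack0
calico-bgp-rr-worker2 10.1.8.10 65008 rack1
calico-bgp-rr-worker3 10.1.8.11 65008 rack1
EOF
```

The loop only prints the manifests. To apply them, its output could be piped to calicoctl apply -f - in the same way as in the original script.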
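Log in to any node in the cluster and check the BGP status: the sessions are no longer a BGP full mesh. The peer type node specific means the session comes from a BGPPeer that applies to selected nodes rather than from the node-to-node mesh. Here the node is a client of the route reflector, and the remote end, the route reflector, is 10.1.5.1.
4 Verifying BGP by Accessing Pods from Outside the Cluster
Deploy a test workload: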
apiVersion: apps/v1
kind: DaemonSet
#kind: Deployment
metadata:
  labels:
    app: app
  name: app
spec:
  #replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
        name: nettoolbox
---
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  type: NodePort
  selector:
    app: app
  ports:
  - name: app
    port: 8080
    targetPort: 80
    nodePort: 32000
Log in to any cluster node and check the routing table.
For example, the entry 10.244.210.64/26 via 10.1.5.1 dev net0 proto bird is a route learned through BGP. bird is the BGP client used by Calico.
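To pick out only the BGP-learned routes from a longer routing table, the output of ip route can be filtered on the proto bird marker. The sketch below uses a hypothetical helper named bird_routes. Its input is the example route quoted above plus one made-up kernel route for contrast.

```shell
#!/bin/sh
# Reads `ip route` output on stdin and prints "destination next-hop" for
# routes that were installed by BIRD and have a "via" next hop.
bird_routes() {
  awk '/proto bird/ && $2 == "via" { print $1, $3 }'
}

# On a node, run: ip route | bird_routes
bird_routes <<'EOF'
10.244.210.64/26 via 10.1.5.1 dev net0 proto bird
10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10
EOF
```

Routes without a via next hop, such as local blackhole entries for the node's own address block, are intentionally skipped.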
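Log in to the leaf0 switch and check its BGP information and routes.
Check the routing table:
The leaf0 switch holds the pod routes of the Kubernetes cluster, which means the pods in the cluster are reachable from it.
Check the BGP information with show ip bgp: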
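The output clearly shows:
- The destinations 10.1.8.0/24, 10.244.192.0/26 and 10.244.210.64/26 each have two next hops, 10.1.12.2 and 10.1.10.2. These are EBGP routes and use ECMP.
- The destinations 10.244.81.64/26 and 10.244.205.64/26 have the next hops 10.1.5.10 and 10.1.5.11 respectively. These are IBGP routes.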
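Whether a session is IBGP or EBGP follows directly from the AS numbers: a session between two speakers in the same AS is IBGP, otherwise it is EBGP. The sketch below applies that rule to leaf0 using the AS numbers from the switch configuration at the end of this document. The function name bgp_session_type is illustrative.

```shell
#!/bin/sh
# bgp_session_type LOCAL_AS REMOTE_AS
# Same AS on both ends means iBGP, different AS numbers mean eBGP.
bgp_session_type() {
  if [ "$1" -eq "$2" ]; then
    echo iBGP
  else
    echo eBGP
  fi
}

echo "leaf0 (65005) <-> node 10.1.5.10 (65005): $(bgp_session_type 65005 65005)"
echo "leaf0 (65005) <-> spine0 (500): $(bgp_session_type 65005 500)"
echo "leaf0 (65005) <-> spine1 (800): $(bgp_session_type 65005 800)"
```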
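Access tests
Pod-to-pod access within the cluster
Access to cluster pods from the core switch
If the core switches also establish EBGP sessions with the public network and exchange routes, public traffic can enter the Kubernetes cluster as well.
5 vyos configuration files used to simulate the switches in containerlab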
spine0-boot.cfg:
interfaces {
    ethernet eth1 {
        address 10.1.10.2/24
        duplex auto
        speed auto
    }
    ethernet eth2 {
        address 10.1.34.2/24
        duplex auto
        speed auto
    }
    loopback lo {
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.10.0/24 {
                }
                network 10.1.34.0/24 {
                }
            }
        }
        neighbor 10.1.10.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 65005
        }
        neighbor 10.1.34.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 65008
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
        }
        system-as 500
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name spine0
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}
// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317
spine1-boot.cfg:
interfaces {
    ethernet eth1 {
        address "10.1.12.2/24"
        duplex "auto"
        mtu "9000"
        offload {
            gso {
            }
            sg {
            }
        }
        speed "auto"
    }
    ethernet eth2 {
        address "10.1.11.2/24"
        duplex "auto"
        mtu "9000"
        offload {
            gso {
            }
            sg {
            }
        }
        speed "auto"
    }
    loopback lo {
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.11.0/24 {
                }
                network 10.1.12.0/24 {
                }
            }
        }
        neighbor 10.1.11.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as "65008"
        }
        neighbor 10.1.12.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as "65005"
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax {
                    }
                }
            }
            router-id "10.1.8.1"
        }
        system-as "800"
    }
}
system {
    config-management {
        commit-revisions "100"
    }
    conntrack {
        modules {
            ftp {
            }
            h323 {
            }
            nfs {
            }
            pptp {
            }
            sip {
            }
            sqlnet {
            }
            tftp {
            }
        }
    }
    console {
        device ttyS0 {
            speed "9600"
        }
    }
    host-name "spine1"
    login {
        user vyos {
            authentication {
                encrypted-password "$6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/"
                plaintext-password ""
            }
        }
    }
    time-zone "UTC"
}
// Warning: Do not remove the following line.
//
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
//
// Release version: 1.4-rolling-202307070317
leaf0-boot.cfg:
interfaces {
    ethernet eth1 {
        address 10.1.10.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth2 {
        address 10.1.12.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth3 {
        address 10.1.5.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    loopback lo {
    }
}
nat {
    source {
        rule 100 {
            outbound-interface eth0
            source {
                address 10.1.0.0/16
            }
            translation {
                address masquerade
            }
        }
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.5.0/24 {
                }
                network 10.1.10.0/24 {
                }
                network 10.1.12.0/24 {
                }
            }
        }
        neighbor 10.1.5.10 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65005
        }
        neighbor 10.1.5.11 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65005
        }
        neighbor 10.1.10.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 500
        }
        neighbor 10.1.12.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 800
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id 10.1.5.1
        }
        system-as 65005
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name leaf0
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}
// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317
leaf1-boot.cfg:
interfaces {
    ethernet eth1 {
        address 10.1.34.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth2 {
        address 10.1.11.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth3 {
        address 10.1.8.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    loopback lo {
    }
}
nat {
    source {
        rule 100 {
            outbound-interface eth0
            source {
                address 10.1.0.0/16
            }
            translation {
                address masquerade
            }
        }
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.8.0/24 {
                }
                network 10.1.11.0/24 {
                }
                network 10.1.34.0/24 {
                }
            }
        }
        neighbor 10.1.8.10 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65008
        }
        neighbor 10.1.8.11 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65008
        }
        neighbor 10.1.11.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 800
        }
        neighbor 10.1.34.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 500
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id 10.1.8.1
        }
        system-as 65008
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name leaf1
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}
// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317