How to find the peer interface of a VEth device (VEth peer)

Posted by 南喬峰 on 2019-04-09

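Preface

Anyone familiar with container networking knows that containers talk to each other over VEth devices: one end of the VEth pair is attached to the host and the other end sits inside the container, which connects the host network namespace to the container network namespace. The VEth device acts as a virtual network cable between the two namespaces.

The host end of this "cable" shows up as a network interface on the host. You can see it directly with ip a, usually under a name of the form vethXXX (ip -d link show <interface name> also shows the device type). But when you are faced with a list of interfaces that all start with veth followed by a random string, it is easy to get lost. How do these interfaces map to the interfaces inside the containers? Which container is at the other end of each virtual cable?

Below are two methods I have put together. The first is the officially recommended approach; the second came to me in a flash of inspiration, so I am writing it down while it is fresh. I wonder whether anyone else has had the same idea : )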

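Test environment

Two Pod containers running on the same node. You can just as well create two containers with docker run; the exact setup does not matter here.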

[root@10-10-40-84 ~]# kubectl get pod -o wide
NAME       READY   STATUS    RESTARTS   AGE   IP           NODE          NOMINATED NODE   READINESS GATES
busybox    1/1     Running   0          47m   10.222.1.3   10-10-40-93   <none>           <none>
busybox2   1/1     Running   0          45m   10.222.1.4   10-10-40-93   <none>           <none>
[root@10-10-40-84 ~]#

Output of docker ps on the node:

[root@10-10-40-93 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
70eebe80845b af2f74c517aa "sleep 3600" 31 minutes ago Up 31 minutes k8s_busybox_busybox2_default_247b9265-59f5-11e9-9c05-faf63cb42000_1
2060ba52f6ed af2f74c517aa "sleep 3600" 34 minutes ago Up 34 minutes k8s_busybox_busybox_default_c7bf5185-59f4-11e9-9c05-faf63cb42000_1
bcb7f08f8707 registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 "/pause" 2 hours ago Up 2 hours k8s_POD_busybox2_default_247b9265-59f5-11e9-9c05-faf63cb42000_0
9a23d437bf97 registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 "/pause" 2 hours ago Up 2 hours k8s_POD_busybox_default_c7bf5185-59f4-11e9-9c05-faf63cb42000_0
[root@10-10-40-93 ~]#

First, look at the ip a output on the host where the two containers are running:

[root@10-10-40-93 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:e7:af:c1:b5:00 brd ff:ff:ff:ff:ff:ff
    inet 10.10.40.93/24 brd 10.10.40.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f8e7:afff:fec1:b500/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:83:4d:a3:4e:01 brd ff:ff:ff:ff:ff:ff
    inet 172.16.130.91/24 brd 172.16.130.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::f883:4dff:fea3:4e01/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:42:ad:df:4f brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
    link/ether be:1f:af:bb:6e:f5 brd ff:ff:ff:ff:ff:ff
    inet 10.222.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::bc1f:afff:febb:6ef5/64 scope link
       valid_lft forever preferred_lft forever
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP qlen 1000
    link/ether e6:36:8b:52:21:62 brd ff:ff:ff:ff:ff:ff
    inet 10.222.1.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::e436:8bff:fe52:2162/64 scope link
       valid_lft forever preferred_lft forever
8: vethf0808a3e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether b2:2f:ed:b3:d1:66 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::b02f:edff:feb3:d166/64 scope link
       valid_lft forever preferred_lft forever
9: vethd5962a6c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
    link/ether be:14:67:cb:39:79 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::bc14:67ff:fecb:3979/64 scope link
       valid_lft forever preferred_lft forever
[root@10-10-40-93 ~]#

The host has two VEth interfaces, vethf0808a3e and vethd5962a6c. ip -d link show confirms that both are indeed VEth devices:

[root@10-10-40-93 ~]# ip -d link show vethf0808a3e
8: vethf0808a3e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT
    link/ether b2:2f:ed:b3:d1:66 brd ff:ff:ff:ff:ff:ff link-netnsid 1 promiscuity 1
    veth
    bridge_slave state forwarding priority 32 cost 2 hairpin on guard off root_block off fastleave off learning on flood on port_id 0x8002 port_no 0x2 designated_port 32770 designated_cost 0 designated_bridge 8000.e6:36:8b:52:21:62 designated_root 8000.e6:36:8b:52:21:62 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on addrgenmode eui64
[root@10-10-40-93 ~]# ip -d link show vethd5962a6c
9: vethd5962a6c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT
    link/ether be:14:67:cb:39:79 brd ff:ff:ff:ff:ff:ff link-netnsid 2 promiscuity 1
    veth
    bridge_slave state forwarding priority 32 cost 2 hairpin on guard off root_block off fastleave off learning on flood on port_id 0x8003 port_no 0x3 designated_port 32771 designated_cost 0 designated_bridge 8000.e6:36:8b:52:21:62 designated_root 8000.e6:36:8b:52:21:62 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on addrgenmode eui64
[root@10-10-40-93 ~]#

brctl show shows that both VEth interfaces are attached to the bridge cni0:

[root@10-10-40-93 ~]# brctl show
bridge name	bridge id	STP enabled	interfaces
cni0	8000.e6368b522162	no	vethd5962a6c
       vethf0808a3e
docker0	8000.024242addf4f	no
[root@10-10-40-93 ~]#
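
If you want to use that port list in a script, the brctl show output can be parsed. Here is a minimal sketch, assuming the column layout shown above (a header row, then one row per bridge, with any extra ports on continuation rows). The function name bridge_ports is my own and not part of brctl.

```shell
#!/bin/sh
# bridge_ports BRIDGE: read `brctl show` output on stdin and print one
# attached interface per line for the named bridge.
# (bridge_ports is a hypothetical helper name, not a brctl feature.)
bridge_ports() {
    awk -v br="$1" '
        NR == 1 { next }                      # skip the header row
        NF >= 3 {                             # a row that starts a new bridge
            cur = $1
            if (cur == br && NF >= 4) print $4
            next
        }
        NF == 1 && cur == br { print $1 }     # continuation row: one more port
    '
}

# Usage on the host:
#   brctl show | bridge_ports cni0
```

The parser keys on the number of fields per row, so it would need adjusting if a brctl build prints a different layout.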

Finding the VEth peer through the interface indexes in the ip a output

Run ip a in each of the two Pods (containers) to see the network interfaces inside them:

[root@10-10-40-84 ~]# kubectl get pod -o wide
NAME       READY   STATUS    RESTARTS   AGE   IP           NODE          NOMINATED NODE   READINESS GATES
busybox    1/1     Running   0          47m   10.222.1.3   10-10-40-93   <none>           <none>
busybox2   1/1     Running   0          45m   10.222.1.4   10-10-40-93   <none>           <none>
[root@10-10-40-84 ~]# kubectl exec -it busybox -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether a6:d1:b0:67:6a:55 brd ff:ff:ff:ff:ff:ff
    inet 10.222.1.3/24 scope global eth0
       valid_lft forever preferred_lft forever
[root@10-10-40-84 ~]# kubectl exec -it busybox2 -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether 5a:d8:0d:16:64:5e brd ff:ff:ff:ff:ff:ff
    inet 10.222.1.4/24 scope global eth0
       valid_lft forever preferred_lft forever
[root@10-10-40-84 ~]#

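Inside the busybox container the interface appears as eth0@if8, which corresponds to the host interface with index 8, vethf0808a3e. Inside busybox2 the interface appears as eth0@if9, which corresponds to the host interface with index 9, vethd5962a6c. To verify this with a packet capture, ping from the busybox container and check on the host which VEth interface sees the ICMP packets: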

[root@10-10-40-84 ~]# kubectl exec -it busybox sh
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether a6:d1:b0:67:6a:55 brd ff:ff:ff:ff:ff:ff
    inet 10.222.1.3/24 scope global eth0
       valid_lft forever preferred_lft forever
/ # ping 10.222.1.4
PING 10.222.1.4 (10.222.1.4): 56 data bytes
^C
--- 10.222.1.4 ping statistics ---
49 packets transmitted, 0 packets received, 100% packet loss
/ #

While the ping is running, capture ICMP packets on the host on each of the two VEth interfaces in turn:

[root@10-10-40-93 ~]# tcpdump -nn -i vethf0808a3e icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vethf0808a3e, link-type EN10MB (Ethernet), capture size 262144 bytes
21:36:23.262196 IP 10.222.1.3 > 10.222.1.4: ICMP echo request, id 5888, seq 19, length 64
21:36:24.262413 IP 10.222.1.3 > 10.222.1.4: ICMP echo request, id 5888, seq 20, length 64
21:36:25.262565 IP 10.222.1.3 > 10.222.1.4: ICMP echo request, id 5888, seq 21, length 64
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
[root@10-10-40-93 ~]# tcpdump -nn -i vethd5962a6c icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vethd5962a6c, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
[root@10-10-40-93 ~]#

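Only vethf0808a3e, the host interface with index 8, captured ICMP packets (the echo requests show up there even though the ping itself got no replies), which confirms the mapping.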
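
The index lookup is easy to script. Here is a minimal sketch, assuming you already have the container-side label (such as eth0@if8) and can run ip -o link on the host. The function names peer_ifindex and host_if_by_index are my own, not part of iproute2.

```shell
#!/bin/sh
# peer_ifindex LABEL: extract the peer index N from a label such as "eth0@if8".
# (Hypothetical helper name; prints nothing if the label has no @ifN suffix.)
peer_ifindex() {
    printf '%s\n' "$1" | sed -n 's/^.*@if\([0-9][0-9]*\)$/\1/p'
}

# host_if_by_index INDEX: read `ip -o link` output on stdin and print the
# name of the interface with that index, with its own "@ifN" suffix removed.
host_if_by_index() {
    awk -F': ' -v idx="$1" '$1 == idx { sub(/@.*/, "", $2); print $2 }'
}

# Usage on the host, for the busybox container above:
#   ip -o link | host_if_by_index "$(peer_ifindex 'eth0@if8')"
```

Interface indexes are only unique within one network namespace (both host VEth interfaces above show @if3), so the second step has to run in the host namespace, not inside a container.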

Finding the VEth peer through the Linux Bridge forwarding table

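The other method, more of a trick, uses the MAC address table on the Linux Bridge to find the VEth peer. In this setup the host end of every VEth device is attached to the Linux Bridge, and since the bridge is the intermediary that forwards packets, it has to know what sits on each side. How else could it forward anything?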

  • Look at the mapping between MAC addresses and virtual switch ports on the Linux Bridge
[root@10-10-40-93 ~]# brctl show
bridge name	bridge id	STP enabled	interfaces
cni0	8000.e6368b522162	no	vethd5962a6c
       vethf0808a3e
docker0	8000.024242addf4f	no
[root@10-10-40-93 ~]#
[root@10-10-40-93 ~]# brctl showmacs cni0
port no	mac addr	is local?	ageing timer
  3	5a:d8:0d:16:64:5e	no	80.94
  2	a6:d1:b0:67:6a:55	no	72.95
  2	b2:2f:ed:b3:d1:66	yes	0.00
  2	b2:2f:ed:b3:d1:66	yes	0.00
  3	be:14:67:cb:39:79	yes	0.00
  3	be:14:67:cb:39:79	yes	0.00
[root@10-10-40-93 ~]#

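The Linux Bridge has two ports, port 2 and port 3. The first two entries, whose "is local?" flag is no, are the VEth peers, that is, the MAC addresses of the interfaces inside the containers. Entries that share a port number belong to the same VEth pair. Comparing these MAC addresses with the ip a output on the host and in the containers gives the same result as the first method, and it can be verified with a packet capture in the same way : )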
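
The pairing by port number can be scripted as well. Here is a minimal sketch, assuming the brctl showmacs column layout shown above. The function name pair_by_port is my own and not part of brctl. If a port has learned more than one non-local MAC, this sketch keeps only the last one it reads.

```shell
#!/bin/sh
# pair_by_port: read `brctl showmacs <bridge>` output on stdin and print
# "port host_mac peer_mac" for every port that has a learned (non-local) MAC.
# (pair_by_port is a hypothetical helper name, not a brctl feature.)
pair_by_port() {
    awk '
        $1 ~ /^[0-9]+$/ && $3 == "no"  { peer_mac[$1] = $2 }   # learned: container side
        $1 ~ /^[0-9]+$/ && $3 == "yes" { host_mac[$1] = $2 }   # local: host-side veth
        END { for (p in peer_mac) print p, host_mac[p], peer_mac[p] }
    ' | sort -n
}

# Usage on the host:
#   brctl showmacs cni0 | pair_by_port
```

For the table above this prints port 2 with b2:2f:ed:b3:d1:66 and a6:d1:b0:67:6a:55, and port 3 with be:14:67:cb:39:79 and 5a:d8:0d:16:64:5e. Learned entries age out of the table, so a container that has been silent for a while may not show up until it sends traffic again.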
