TC: A Powerful Tool for Simulating Network Conditions

Xiaomi Ops, published 2018-11-28

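This article introduces TC, a tool that can simulate a wide range of complex Internet transmission conditions, and shows how to set up each simulation.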

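In day-to-day production work, working out whether the network is behaving normally can be exhausting, not least because some less-than-friendly people like to blame their problems on "the network", leaving us to prove our innocence. Today we introduce a handy tool for reproducing network conditions: TC.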

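Any discussion of TC has to start with Netem (Network Emulator), a network emulation module provided by Linux kernel 2.6 and later. On an otherwise well-performing LAN, it can emulate complex Internet transmission conditions such as low bandwidth, transmission delay and packet loss.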

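TC, short for Traffic Control, is a user-space tool on Linux. TC is what configures how the Netem module behaves, so using Netem takes two things: the Netem module must be enabled in the kernel, and the corresponding user-space tool, TC, must be available. You can think of their relationship as similar to the one between the netfilter framework and iptables.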

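Let's look at what TC can do for us (TC has many features; today we only cover network emulation). Before starting the experiments, here is what the following subcommands mean: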

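add: adds a Netem configuration to the specified network interface.
change: changes an existing Netem configuration to new values.
replace: replaces the existing Netem configuration (creating it if it does not exist yet).
del: deletes the Netem configuration from the network interface.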
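The four subcommands can be wrapped in a small shell helper. This is a minimal sketch, not part of TC itself: the interface name eth0 and the function names (netem_add, netem_change and so on) are hypothetical, and by default the script only prints the tc commands; set DRY_RUN=0 to actually run them, which requires root.

```shell
#!/bin/sh
# Hypothetical wrappers around the tc qdisc subcommands for a root netem qdisc.
# IFACE defaults to eth0 (an assumption); DRY_RUN=1 (the default) only prints.
IFACE="${IFACE:-eth0}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

netem_add()     { run tc qdisc add dev "$IFACE" root netem "$@"; }      # fails if a root qdisc was already added
netem_change()  { run tc qdisc change dev "$IFACE" root netem "$@"; }   # needs an existing netem qdisc
netem_replace() { run tc qdisc replace dev "$IFACE" root netem "$@"; }  # creates or replaces
netem_del()     { run tc qdisc del dev "$IFACE" root; }                 # back to the default qdisc
netem_show()    { run tc qdisc show dev "$IFACE"; }

# Example session: add a delay, inspect it, adjust it, then clean up.
netem_add delay 50ms
netem_show
netem_change delay 50ms 20ms
netem_del
```

In scripts, using replace everywhere is a reasonable default, since it works whether or not a netem qdisc is already attached.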

1  Simulating transmission delay

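If you want to emulate long-distance latency inside a LAN, this is the method to use. For example, if real users see 51 ms of latency when accessing your site while your test network needs only 1 ms, simply add 50 ms of extra delay.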

[root@tj1-vm-search020 ~]# tc qdisc add dev eth0 root netem delay 50ms
[root@tj1-vm-search019 ~]# ping tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=50.0 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=50.0 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=50.0 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=50.0 ms
^C
--- tj1-vm-search020.kscn ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 50.037/50.044/50.063/0.223 ms
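The arithmetic of the 51 ms example fits in a tiny helper. A sketch under two assumptions: RTTs are whole milliseconds, and netem is attached on one host only. Because netem as the root qdisc acts on packets leaving the interface, the delay added on tj1-vm-search020 hits each echo reply once, so the whole difference goes on that one host (if you attach netem on both hosts, split it between them). extra_delay_ms is a hypothetical name.

```shell
#!/bin/sh
# extra_delay_ms TARGET_RTT_MS BASELINE_RTT_MS
# Prints how much netem delay (whole ms) to add on ONE host so that the
# measured round-trip time goes from BASELINE to roughly TARGET.
extra_delay_ms() {
  extra=$(( $1 - $2 ))
  # Netem cannot make packets faster, so never suggest a negative delay.
  if [ "$extra" -lt 0 ]; then
    extra=0
  fi
  echo "$extra"
}

# The example from the text: users see 51 ms, the test LAN gives 1 ms.
echo "tc qdisc add dev eth0 root netem delay $(extra_delay_ms 51 1)ms"
```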

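If you see perfectly stable latency on a network, there is very likely a timer somewhere, because real network paths are complex and transit times always vary. Netem takes this into account and provides extra parameters that control how the delay is distributed. The full syntax is: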

DELAY := delay TIME [ JITTER [ CORRELATION ]]
[ distribution { uniform | normal | pareto | paretonormal } ]

Besides the delay TIME, there are three optional parameters:

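  • JITTER: adds a random amount of time so that the delay falls within a range around TIME.

  • CORRELATION: how closely the delay of each packet follows the delay of the previous packet.

  • distribution: the distribution model of the delay. Possible values are uniform, normal, pareto and paretonormal.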
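Because the optional delay parameters are positional (a CORRELATION only makes sense after a JITTER), a script that builds them has to respect that order. Below is a hypothetical helper, netem_delay_args, that assembles the DELAY clause from the syntax above; an empty argument means "omitted". Rejecting a correlation or distribution without a jitter is a design choice of this sketch, since both only describe how the jitter behaves.

```shell
#!/bin/sh
# netem_delay_args TIME [JITTER] [CORRELATION] [DISTRIBUTION]
# Builds: delay TIME [ JITTER [ CORRELATION ]] [ distribution DIST ]
# Pass '' for any optional field you want to leave out.
netem_delay_args() {
  base=$1 jitter=${2:-} corr=${3:-} dist=${4:-}
  args="delay $base"
  if [ -n "$jitter" ]; then
    args="$args $jitter"
    if [ -n "$corr" ]; then
      args="$args $corr"
    fi
  elif [ -n "$corr" ] || [ -n "$dist" ]; then
    # CORRELATION and distribution both describe the jitter, so require one.
    echo "netem_delay_args: correlation/distribution need a JITTER value" >&2
    return 1
  fi
  if [ -n "$dist" ]; then
    args="$args distribution $dist"
  fi
  echo "$args"
}

# Usage: two of the delay settings used later in this section.
netem_delay_args 50ms 20ms 30%
netem_delay_args 50ms 20ms '' normal
```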

Let's start with JITTER. If it is set to 20ms, each packet's delay is chosen at random between 50ms - 20ms and 50ms + 20ms:

[root@tj1-vm-search020 ~]# tc qdisc replace dev eth0 root netem delay 50ms 20ms
[root@tj1-vm-search019 ~]# ping tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=69.4 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=51.9 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=66.3 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=57.4 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=5 ttl=64 time=46.0 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=6 ttl=64 time=33.8 ms
^C
--- tj1-vm-search020.kscn ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5007ms
rtt min/avg/max/mdev = 33.877/54.178/69.446/12.063 ms
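To confirm that the measured RTTs really stay within 50ms ± 20ms, you can scan the time= fields of ping's output. A small awk-based sketch; check_jitter is a hypothetical name, and it assumes the LAN's own RTT (about 0.04 ms on these machines) is negligible compared with the jitter.

```shell
#!/bin/sh
# check_jitter BASE_MS JITTER_MS < ping-output
# Counts the "time=" samples in ping output and how many fall outside
# BASE +/- JITTER (uniform jitter, as with "delay 50ms 20ms").
check_jitter() {
  awk -v base="$1" -v jit="$2" '
    match($0, /time=[0-9.]+/) {
      t = substr($0, RSTART + 5, RLENGTH - 5) + 0
      n++
      if (t < base - jit || t > base + jit) out++
    }
    END {
      printf "%d samples, %d outside [%g, %g] ms\n", n + 0, out + 0, base - jit, base + jit
    }'
}

# Usage, e.g.: ping -c 20 tj1-vm-search020.kscn | check_jitter 50 20
```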

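CORRELATION expresses how related consecutive delays are. Network conditions change smoothly, so within a short period the delays of adjacent packets should be similar rather than completely random. The value is a percentage: at 100% it degenerates into a fixed delay, and at 0% into a fully random delay.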

[root@tj1-vm-search020 ~]# tc qdisc replace dev eth0 root netem delay 50ms 20ms 30%
[root@tj1-vm-search019 ~]# ping tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=47.6 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=58.3 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=47.4 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=33.8 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=5 ttl=64 time=61.0 ms
^C
--- tj1-vm-search020.kscn ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 33.898/49.668/61.050/9.610 ms
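The effect of CORRELATION can be illustrated by blending each fresh random number with the previous one, weighted by the correlation. Netem's generator also mixes a new random value with the last one, but this is a simplified model written for illustration, not netem's kernel code; simulate_delays is a hypothetical name and relies on awk's rand(). It shows why 100% freezes the delay while 0% yields independent samples.

```shell
#!/bin/sh
# simulate_delays BASE_MS JITTER_MS CORR_PERCENT COUNT [SEED]
# Simplified model (not netem's exact arithmetic) of correlated uniform jitter.
simulate_delays() {
  awk -v base="$1" -v jit="$2" -v corr="$3" -v n="$4" -v seed="${5:-1}" 'BEGIN {
    srand(seed)
    w = corr / 100
    last = rand()
    for (i = 1; i <= n; i++) {
      # Blend a fresh uniform sample with the previous one, weighted by CORR%.
      u = (1 - w) * rand() + w * last
      last = u
      # Map u in [0,1) to a delay in [base - jit, base + jit).
      printf "%.1f\n", base - jit + 2 * jit * u
    }
  }'
}

# Compare: 0% gives independent samples, 100% repeats the first value forever.
simulate_delays 50 20 0 5
simulate_delays 50 20 100 5
```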

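Like many real-world phenomena, packet delays follow statistical patterns, the most common being the normal distribution. To get closer to reality, you can use the distribution parameter to choose the delay distribution model. For example, to make packet delays follow a normal distribution: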

[root@tj1-vm-search020 ~]# tc qdisc replace dev eth0 root netem delay 50ms 20ms distribution normal
[root@tj1-vm-search019 ~]# ping tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=41.7 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=44.3 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=50.7 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=57.2 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=5 ttl=64 time=37.6 ms
^C
--- tj1-vm-search020.kscn ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 37.675/46.350/57.249/6.912 ms

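As a result, most delays cluster around the mean and values far from it are rare (with a distribution, JITTER sets the spread of the distribution rather than a hard limit).
The other distributions are uniform, pareto and paretonormal; interested readers can look into them. In most cases, a random delay within a given range is good enough.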

2  Simulating packet loss

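Another common network anomaly is packet loss: lost packets trigger retransmissions, which increase both traffic and latency on the link. Netem's loss parameter simulates a packet loss rate. For example, a 50% loss rate for outgoing packets (I chose a very large value so the effect is easy to see with ping; real-world loss rates are usually much lower, e.g. 0.5%):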

[root@tj1-vm-search020 ~]# tc qdisc change dev eth0 root netem loss 50%
[root@tj1-vm-search019 ~]# ping -c 10 tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=0.038 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=7 ttl=64 time=0.036 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=8 ttl=64 time=0.037 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=9 ttl=64 time=0.035 ms
--- tj1-vm-search020.kscn ping statistics ---
10 packets transmitted, 5 received, 50% packet loss, time 9000ms
rtt min/avg/max/mdev = 0.035/0.039/0.049/0.005 ms

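The gaps in the icmp_seq numbers show that about half of the packets were dropped. As with delay, the loss rate can take a correlation, which describes how the loss probability of a packet depends on whether the previous packet was lost.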
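Two small helpers make loss experiments easier to interpret. ping_loss_percent recomputes the loss rate from ping's summary line, and expected_lost shows why a realistic rate such as 0.5% needs far more than 10 pings before any loss shows up at all. Both names are hypothetical; the commented tc line uses 25% only as an example correlation value.

```shell
#!/bin/sh
# ping_loss_percent < ping-output
# Recomputes the loss rate from ping's "N packets transmitted, M received" line.
ping_loss_percent() {
  awk '/packets transmitted/ {
    sent = $1; recv = $4
    if (sent > 0) printf "%.1f\n", (sent - recv) * 100 / sent
  }'
}

# expected_lost COUNT PERCENT
# Expected number of lost packets when sending COUNT packets at PERCENT loss.
expected_lost() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%.2f\n", n * p / 100 }'
}

# At 0.5% loss, 10 pings lose 0.05 packets on average; 2000 pings lose about 10.
expected_lost 10 0.5
expected_lost 2000 0.5
# Loss with a correlation (example values):
#   tc qdisc change dev eth0 root netem loss 0.5% 25%
```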

3  Simulating packet duplication

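Packet duplication takes parameters similar to packet loss: a duplication rate and a correlation. For example, to duplicate 50% of packets at random: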

[root@tj1-vm-search020 ~]# tc qdisc change dev eth0 root netem duplicate 50%
[root@tj1-vm-search019 ~]# ping -c 10 tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=0.039 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=0.044 ms (DUP!)
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=0.045 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=0.050 ms (DUP!)
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=0.033 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=0.037 ms (DUP!)
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=0.033 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=0.038 ms (DUP!)
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=5 ttl=64 time=0.037 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=6 ttl=64 time=0.036 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=6 ttl=64 time=0.039 ms (DUP!)
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=7 ttl=64 time=0.029 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=8 ttl=64 time=0.030 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=8 ttl=64 time=0.034 ms (DUP!)
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=9 ttl=64 time=0.037 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=10 ttl=64 time=0.036 ms
--- tj1-vm-search020.kscn ping statistics ---
10 packets transmitted, 10 received, +6 duplicates, 0% packet loss, time 9001ms
rtt min/avg/max/mdev = 0.029/0.037/0.050/0.007 ms
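With duplicate 50%, roughly half of the replies should arrive twice; in the run above, 6 of the 10 did. The sketch below tallies this from ping output; dup_stats is a hypothetical name. As with loss, a correlation can be appended, e.g. duplicate 50% 25%.

```shell
#!/bin/sh
# dup_stats < ping-output
# Counts unique replies and the duplicates ping marks with "(DUP!)".
dup_stats() {
  awk '/icmp_seq=/ {
         if ($0 ~ /\(DUP!\)/) dup++
         else uniq++
       }
       END {
         printf "%d unique, %d duplicates (%.0f%%)\n", uniq + 0, dup + 0, (uniq > 0 ? dup * 100 / uniq : 0)
       }'
}

# Usage, e.g.: ping -c 100 tj1-vm-search020.kscn | dup_stats
```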

4  Simulating packet corruption

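Packet corruption also takes similar parameters. For example, to corrupt 2% of packets at random (a single-bit error at a random position in the packet):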

[root@tj1-vm-search020 ~]# tc qdisc change dev eth0 root netem corrupt 2%
[root@tj1-vm-search019 ~]# ping tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=0.040 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=0.033 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=0.034 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=5 ttl=64 time=0.033 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=6 ttl=64 time=0.043 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=8 ttl=64 time=0.039 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=10 ttl=64 time=0.056 ms
wrong data byte #39 should be 0x27 but was 0xa7
#16    10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 a7 28 29 2a 2b 2c 2d 2e 2f
#48    30 31 32 33 34 35 36 37
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=11 ttl=64 time=0.046 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=12 ttl=64 time=0.036 ms
Warning: time of day goes back (-4773815605012725725us), taking countermeasures.
Warning: time of day goes back (-4773815605012725708us), taking countermeasures.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=13 ttl=64 time=0.000 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=15 ttl=64 time=0.045 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=16 ttl=64 time=0.043 ms
^C
--- tj1-vm-search020.kscn ping statistics ---
16 packets transmitted, 13 received, 18% packet loss, time 15001ms
rtt min/avg/max/mdev = 0.000/0.037/0.056/0.014 ms
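Corruption does not always show up as "wrong data byte". In the run above, sequence numbers 7, 9 and 14 never appear and ping reports 18% packet loss although no loss was configured; presumably those replies were damaged in a way that got them discarded before ping could report them. The "time of day goes back" warnings are consistent with a bit flip landing in the timestamp ping stores in the packet payload. A sketch for listing such gaps (missing_seqs is a hypothetical name; it cannot see packets lost after the last reply received):

```shell
#!/bin/sh
# missing_seqs < ping-output
# Lists icmp_seq numbers between 1 and the highest one seen that never arrived.
missing_seqs() {
  awk 'match($0, /icmp_seq=[0-9]+/) {
         s = substr($0, RSTART + 9, RLENGTH - 9) + 0
         seen[s] = 1
         if (s > max) max = s
       }
       END {
         for (i = 1; i <= max; i++)
           if (!(i in seen)) gaps = gaps (gaps == "" ? "" : " ") i
         print gaps
       }'
}

# Usage, e.g.: ping -c 50 tj1-vm-search020.kscn | missing_seqs
```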

5  Simulating packet reordering

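Networks do not guarantee in-order delivery. TCP at the transport layer puts segments back in order, so for TCP-based applications reordering usually does less harm than the problems above.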

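Reordering works differently from the previous parameters. Each of the problems above affects packets independently, so acting on one packet at a time is enough, whereas reordering concerns the relative order of several packets. Simulating reordering always requires a delay, because reordering is essentially achieved by delaying some packets while sending others immediately. Netem offers two ways to do this.
The first uses gap, so that a packet can only jump ahead after a fixed number of packets have been delayed.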

# gap 3: the first 2 packets of each cycle are delayed by 50ms; each later packet is sent immediately with 50% probability, which starts a new cycle.
[root@tj1-vm-search020 ~]# tc qdisc change dev eth0 root netem reorder 50% gap 3 delay 50ms
[root@tj1-vm-search019 ~]# ping -i 0.01 tj1-vm-search020.kscn | more
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=10.5 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=50.0 ms
wrong data byte #21 should be 0x15 but was 0x5
#16    10 11 12 13 14 5 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
#48    30 31 32 33 34 35 36 37
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=50.0 ms

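To see ping replies arrive out of order, the interval between pings must be shorter than the 50ms delay; here -i 0.01 sets it to 10ms. The "wrong data byte" line is most likely left over from the 2% corruption configured in the previous step (again a single flipped bit) rather than an effect of reordering, which is a good reason to run tc qdisc del dev eth0 root between experiments.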
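A simplified model of how the gap variant picks packets, based on our reading of the netem kernel logic (treat it as an assumption, not a copy of the kernel code): the first GAP-1 packets of a cycle are always delayed; after that, each packet is sent immediately with the reorder probability, which restarts the cycle. reorder_pattern is a hypothetical name; D marks a delayed packet and I a packet sent immediately. With GAP set to 1 every packet is a candidate, which matches the purely probabilistic second method (tc is understood to use gap 1 when none is given).

```shell
#!/bin/sh
# reorder_pattern COUNT GAP REORDER_PERCENT [SEED]
# Prints one letter per packet: D = delayed, I = sent immediately (jumps ahead).
reorder_pattern() {
  awk -v n="$1" -v gap="$2" -v p="$3" -v seed="${4:-1}" 'BEGIN {
    srand(seed)
    counter = 0
    for (i = 1; i <= n; i++) {
      if (gap == 0 || counter < gap - 1 || rand() >= p / 100) {
        line = line "D"; counter++      # delayed by the configured delay
      } else {
        line = line "I"; counter = 0    # sent at once, ahead of delayed packets
      }
    }
    print line
  }'
}

reorder_pattern 6 3 100   # prints DDIDDI
reorder_pattern 12 3 50   # random mix, like "reorder 50% gap 3"
```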
The second method is more random: it uses a probability to choose which packets are reordered.

$ tc qdisc change dev enp0s5 root netem reorder 50% 15% delay 300ms
[root@tj1-vm-search019 ~]# ping -i 0.01 tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=6 ttl=64 time=11.5 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=51.5 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=71.5 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=111 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=13 ttl=64 time=85.0 ms
wrong data byte #51 should be 0x33 but was 0x23
#16    10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
#48    30 31 32 23 34 35 36 37
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=12 ttl=64 time=105 ms

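50% of the packets (with a 15% correlation) are sent immediately and the remaining ones are delayed by 300ms; the delay is deliberately large so that the reordering is easy to see.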
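To quantify reordering in a ping run, count the replies whose sequence number is lower than one already received. A sketch; count_reordered is a hypothetical name. On the six replies shown above (sequence numbers 6, 4, 3, 1, 13, 12) it counts 4 late arrivals.

```shell
#!/bin/sh
# count_reordered < ping-output
# Counts replies that arrive after a reply with a higher icmp_seq.
count_reordered() {
  awk 'match($0, /icmp_seq=[0-9]+/) {
         s = substr($0, RSTART + 9, RLENGTH - 9) + 0
         if (s < max) late++
         if (s > max) max = s
       }
       END { print late + 0 }'
}

# Usage, e.g.: ping -i 0.01 -c 200 tj1-vm-search020.kscn | count_reordered
```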

Conclusion

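This article covered several ways to use TC to simulate network conditions. As Linux's advanced traffic control tool, TC supports many more advanced uses, such as SHAPING, SCHEDULING, POLICING and DROPPING, built from QDISCs (queueing disciplines), CLASSes and FILTERs. We can't cover them all here; we hope this gives you a basic understanding and encourages you to explore TC further.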

Source: "ITPUB Blog", link: http://blog.itpub.net/31559359/viewspace-2221929/. Please credit the source when reposting; otherwise legal liability will be pursued.

相關文章