1. Replacing arp, ifconfig, route and similar commands

Show network interfaces and IP addresses

root@openstack:~# ip link list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:31:50:43:57:fa brd ff:ff:ff:ff:ff:ff
4: br-ex: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 64:31:50:43:57:fa brd ff:ff:ff:ff:ff:ff
7: br-int: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether a2:99:53:93:1b:47 brd ff:ff:ff:ff:ff:ff
10: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether fe:54:00:68:e0:04 brd ff:ff:ff:ff:ff:ff
35: br-tun: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 42:9b:ec:6c:f6:41 brd ff:ff:ff:ff:ff:ff
71: qbrf38a666d-f5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 96:e0:4d:68:c2:6b brd ff:ff:ff:ff:ff:ff
72: qvof38a666d-f5: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 66:9e:9a:e1:25:37 brd ff:ff:ff:ff:ff:ff
73: qvbf38a666d-f5: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbrf38a666d-f5 state UP qlen 1000
    link/ether 96:e0:4d:68:c2:6b brd ff:ff:ff:ff:ff:ff
74: tapf38a666d-f5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbrf38a666d-f5 state UNKNOWN qlen 500
    link/ether fe:16:3e:3d:68:e4 brd ff:ff:ff:ff:ff:ff
root@openstack:~# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:31:50:43:57:fa brd ff:ff:ff:ff:ff:ff
    inet 16.158.165.152/22 brd 16.158.167.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6631:50ff:fe43:57fa/64 scope link
       valid_lft forever preferred_lft forever
4: br-ex: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 64:31:50:43:57:fa brd ff:ff:ff:ff:ff:ff
    inet 16.158.165.102/22 brd 16.158.167.255 scope global br-ex
       valid_lft forever preferred_lft forever
    inet6 fe80::905e:c9ff:fe4b:36ef/64 scope link
       valid_lft forever preferred_lft forever
7: br-int: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether a2:99:53:93:1b:47 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::9036:18ff:fe6f:39bb/64 scope link
       valid_lft forever preferred_lft forever
35: br-tun: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 42:9b:ec:6c:f6:41 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::90c0:c4ff:fed2:3cfd/64 scope link
       valid_lft forever preferred_lft forever
71: qbrf38a666d-f5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 96:e0:4d:68:c2:6b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a811:6aff:fe0f:667f/64 scope link
       valid_lft forever preferred_lft forever
72: qvof38a666d-f5: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 66:9e:9a:e1:25:37 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::649e:9aff:fee1:2537/64 scope link
       valid_lft forever preferred_lft forever
73: qvbf38a666d-f5: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbrf38a666d-f5 state UP qlen 1000
    link/ether 96:e0:4d:68:c2:6b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::94e0:4dff:fe68:c26b/64 scope link
       valid_lft forever preferred_lft forever
74: tapf38a666d-f5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbrf38a666d-f5 state UNKNOWN qlen 500
    link/ether fe:16:3e:3d:68:e4 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fe3d:68e4/64 scope link
       valid_lft forever preferred_lft forever

Show routes

root@openstack:~# ip route show
default via 16.158.164.1 dev br-ex
16.158.164.0/22 dev br-ex proto kernel scope link src 16.158.165.102
16.158.164.0/22 dev eth0 proto kernel scope link src 16.158.165.152
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
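The two connected routes for 16.158.164.0/22 exist because eth0 and br-ex both hold an address in that /22. The Python sketch below, using only the standard ipaddress module, recomputes the network and broadcast address of eth0's address and models a longest-prefix lookup over the main table shown above. It is a simplified model: metrics, scopes and the other tables are ignored, and among equal-length prefixes the first entry wins in this sketch.

```python
import ipaddress

# Routes copied from the `ip route show` output above: (prefix, device, preferred source)
MAIN_TABLE = [
    ("0.0.0.0/0", "br-ex", None),  # default via 16.158.164.1
    ("16.158.164.0/22", "br-ex", "16.158.165.102"),
    ("16.158.164.0/22", "eth0", "16.158.165.152"),
    ("192.168.122.0/24", "virbr0", "192.168.122.1"),
]

def lookup(dst: str):
    """Return (network, device, src) of the longest prefix containing dst (simplified)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, dev, src in MAIN_TABLE:
        net = ipaddress.ip_network(prefix)
        # strict '>' keeps the first entry when two prefixes have the same length
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev, src)
    return best

iface = ipaddress.ip_interface("16.158.165.152/22")    # eth0's address
print(iface.network, iface.network.broadcast_address)  # 16.158.164.0/22 16.158.167.255
print(lookup("192.168.122.61")[1])                      # virbr0
```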

Show the ARP table (neighbor cache)

root@openstack:~# ip neigh show
16.158.165.47 dev br-ex lladdr e4:11:5b:53:62:00 STALE
192.168.122.61 dev virbr0 lladdr 52:54:00:68:e0:04 STALE
16.158.164.1 dev br-ex lladdr 00:00:5e:00:01:15 DELAY
16.158.166.177 dev br-ex lladdr 00:26:99:d0:12:a9 STALE
16.158.164.3 dev br-ex lladdr 20:fd:f1:e4:c9:e8 STALE
16.158.165.87 dev br-ex lladdr 70:5a:b6:b3:dd:a5 STALE
16.158.166.150 dev br-ex FAILED
16.158.164.2 dev br-ex lladdr 20:fd:f1:e4:c9:b1 STALE

2. Rules: Routing Policy

There are actually three routing tables by default: local, main and default. The rules that select them, in priority order, are:

root@openstack:~# ip rule list
0: from all lookup local
32766: from all lookup main
32767: from all lookup default

The traditional route command modifies the main and local tables:

root@openstack:~# ip route list table local
broadcast 16.158.164.0 dev br-ex proto kernel scope link src 16.158.165.102
broadcast 16.158.164.0 dev eth0 proto kernel scope link src 16.158.165.152
local 16.158.165.102 dev br-ex proto kernel scope host src 16.158.165.102
local 16.158.165.152 dev eth0 proto kernel scope host src 16.158.165.152
broadcast 16.158.167.255 dev br-ex proto kernel scope link src 16.158.165.102
broadcast 16.158.167.255 dev eth0 proto kernel scope link src 16.158.165.152
broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
broadcast 192.168.122.0 dev virbr0 proto kernel scope link src 192.168.122.1
local 192.168.122.1 dev virbr0 proto kernel scope host src 192.168.122.1
broadcast 192.168.122.255 dev virbr0 proto kernel scope link src 192.168.122.1
root@openstack:~# ip route list table main
default via 16.158.164.1 dev br-ex
16.158.164.0/22 dev br-ex proto kernel scope link src 16.158.165.102
16.158.164.0/22 dev eth0 proto kernel scope link src 16.158.165.152
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
root@openstack:~# ip route list table default

Simple source policy routing

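Consider the following scenario. My home has two Internet uplinks: one to China Netcom (fiber) and one to China Telecom (dial-up). Both modems are connected to my NAT router. I rent out rooms to many housemates, and one of them only uses email and wants to pay less, so I want him to use only the Telecom line. How should I configure the NAT router?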

The original configuration looks like this:

[ahu@home ahu]$ ip route list table main 
195.96.98.253 dev ppp2  proto kernel  scope link  src 212.64.78.148 
212.64.94.1 dev ppp0  proto kernel  scope link  src 212.64.94.251 
10.0.0.0/8 dev eth0  proto kernel  scope link  src 10.0.0.1 
127.0.0.0/8 dev lo  scope link 
default via 212.64.94.1 dev ppp0

By default, all traffic takes the fast link (ppp0).

Next, create a new routing table named John:

# echo 200 John >> /etc/iproute2/rt_tables
# ip rule add from 10.0.0.10 table John
# ip rule ls
0:	from all lookup local 
32765:	from 10.0.0.10 lookup John
32766:	from all lookup main 
32767:	from all lookup default

The rule makes every packet coming from 10.0.0.10 (the housemate's machine) look up the John routing table.

Then add a route to the John table:

# ip route add default via 195.96.98.253 dev ppp2 table John
# ip route flush cache

The default route in table John goes over the slow link, which is exactly what I wanted.
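The rule/table walk in this example can be modelled in a few lines of Python. The sketch copies the rules and the relevant routes from the example, keeps the local table empty for brevity, and assumes the usual policy-routing behaviour: rules are tried in priority order, and a rule whose table has no matching route falls through to the next rule. All names in it are illustrative.

```python
import ipaddress

# Routing tables from the example (only the routes needed here; local left empty)
TABLES = {
    "local": [],
    "John": [("0.0.0.0/0", "195.96.98.253", "ppp2")],
    "main": [("10.0.0.0/8", None, "eth0"), ("0.0.0.0/0", "212.64.94.1", "ppp0")],
    "default": [],
}

# Rules as listed by `ip rule ls`: (priority, source selector or None for "from all", table)
RULES = [
    (0, None, "local"),
    (32765, "10.0.0.10/32", "John"),
    (32766, None, "main"),
    (32767, None, "default"),
]

def longest_match(table, dst):
    """Longest-prefix match of dst inside one table; None if nothing matches."""
    best = None
    for prefix, gw, dev in TABLES[table]:
        net = ipaddress.ip_network(prefix)
        if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gw, dev)
    return best

def route(src: str, dst: str):
    """Walk the rules by priority; the first table with a matching route wins."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for _prio, selector, table in sorted(RULES):
        if selector is not None and s not in ipaddress.ip_network(selector):
            continue  # rule does not apply to this source address
        hit = longest_match(table, d)
        if hit is not None:
            return table, hit[1], hit[2]
    return None

print(route("10.0.0.10", "8.8.8.8"))  # ('John', '195.96.98.253', 'ppp2')
print(route("10.0.0.20", "8.8.8.8"))  # ('main', '212.64.94.1', 'ppp0')
```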

Routing for multiple uplinks/providers

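$IF1 is the first interface, and its IP address is $IP1.

$IF2 is the second interface, and its IP address is $IP2.

$P1 is the gateway of Provider 1, and Provider 1's network is $P1_NET.

$P2 is the gateway of Provider 2, and Provider 2's network is $P2_NET.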

The first thing to do is split access.

Create two routing tables, T1 and T2, by adding them to /etc/iproute2/rt_tables.

	  ip route add $P1_NET dev $IF1 src $IP1 table T1
	  ip route add default via $P1 table T1
	  ip route add $P2_NET dev $IF2 src $IP2 table T2
	  ip route add default via $P2 table T2

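In T1: $P1_NET is reached through interface $IF1 (with source address $IP1), and everything else goes via gateway $P1.

In T2: $P2_NET is reached through interface $IF2 (with source address $IP2), and everything else goes via gateway $P2.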

Set up the main table:

	    ip route add $P1_NET dev $IF1 src $IP1
	    ip route add $P2_NET dev $IF2 src $IP2

	    ip route add default via $P1

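Add the rules, so that traffic with source address $IP1 uses T1 and traffic with source address $IP2 uses T2: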

	    ip rule add from $IP1 table T1
	    ip rule add from $IP2 table T2

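The second thing is load balancing.

The default gateway should not always be the same one; a multipath default route spreads outgoing traffic over both providers: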

ip route add default scope global nexthop via $P1 dev $IF1 weight 1 nexthop via $P2 dev $IF2 weight 1
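With two nexthops of weight 1, roughly half of the flows should leave through each provider. The Python sketch below is a toy model of weighted per-flow selection, with $P1/$IF1 and $P2/$IF2 kept as placeholder strings: hashing the address pair keeps one flow on one nexthop, and the weights set each nexthop's share of flows. The kernel's actual selection algorithm is different and has changed between kernel versions, so treat this only as an illustration of the idea.

```python
import hashlib

# Placeholder nexthops mirroring the command above: (gateway, interface, weight)
NEXTHOPS = [("$P1", "$IF1", 1), ("$P2", "$IF2", 1)]

def pick_nexthop(src, dst, nexthops=NEXTHOPS):
    """Toy model: hash the flow, then choose a nexthop in proportion to its weight."""
    total = sum(weight for _, _, weight in nexthops)
    digest = hashlib.sha256(f"{src}->{dst}".encode()).digest()
    h = int.from_bytes(digest[:4], "big") % total  # same flow -> same h
    for gw, dev, weight in nexthops:
        if h < weight:
            return gw, dev
        h -= weight
    raise AssertionError("unreachable: h is always below the total weight")

print(pick_nexthop("10.0.0.1", "203.0.113.7"))
```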

GRE tunneling

On Router A:

ip tunnel add netb mode gre remote 172.19.20.21 local 172.16.17.18 ttl 255
ip link set netb up
ip addr add 10.0.1.1 dev netb
ip route add 10.0.2.0/24 dev netb

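This creates a tunnel named netb in GRE mode, with remote endpoint 172.19.20.21 and local endpoint 172.16.17.18.

All packets destined for 10.0.2.0/24 are forwarded through this tunnel.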

On Router B:

ip tunnel add neta mode gre remote 172.16.17.18 local 172.19.20.21 ttl 255
ip link set neta up
ip addr add 10.0.2.1 dev neta
ip route add 10.0.1.0/24 dev neta
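A GRE tunnel wraps each inner IP packet in a new outer IPv4 header plus a GRE header, so less room is left for the inner packet. The Python sketch below packs the minimal 4-byte GRE header from RFC 2784 (no checksum, key or sequence number; the commands above enable none of these) and works out the resulting tunnel MTU on a 1500-byte link. It assumes an outer IPv4 header without options; 1476 is also the MTU Linux normally gives a plain GRE device. The function names are illustrative.

```python
import struct

IPV4_HEADER = 20   # outer IPv4 header, assumed to carry no options
GRE_HEADER = 4     # basic GRE header: flags/version + protocol type
ETH_P_IP = 0x0800  # protocol type of the encapsulated payload: IPv4

def gre_header(protocol=ETH_P_IP):
    """Pack a minimal GRE header: 16 bits of flags/version (all zero), 16-bit protocol."""
    return struct.pack("!HH", 0, protocol)

def tunnel_mtu(link_mtu=1500):
    """Largest inner IP packet that still fits after adding outer IPv4 + GRE headers."""
    return link_mtu - IPV4_HEADER - GRE_HEADER

print(gre_header().hex())  # 00000800
print(tunnel_mtu())        # 1476
```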

Queueing Disciplines for Bandwidth Management

With queueing we determine the way in which data is SENT. It is important to realise that we can only shape data that we transmit.

With the way the Internet works, we have no direct control of what people send us.

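We can only control what we send, not what we receive. Controlling transmission is called shaping: we decide the shape of our outgoing traffic. For incoming traffic we can only apply a policy, i.e. accept or drop it.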

Simple, classless Queueing Disciplines

pfifo_fast

First In, First Out

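Despite the name, a pfifo_fast queue actually contains three bands, each of which is FIFO. Band 0 has the highest priority: as long as it holds packets, band 1 is not serviced, and band 2 is serviced only after bands 0 and 1 are empty.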

The IP header carries a TOS (Type of Service) field. The packet priority is derived from it, and the priomap maps each priority to a band.

root@openstack:~# tc qdisc show dev eth0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
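To make the band logic concrete, here is a small Python model of pfifo_fast using the priomap printed above (index = packet priority 0-15, value = band). It assumes the priority has already been derived from the TOS before the qdisc sees the packet, and it only models strict priority between the three FIFO bands, not txqueuelen or drops. The class name is illustrative.

```python
from collections import deque

# priomap from `tc qdisc show dev eth0` above: index = packet priority, value = band
PRIOMAP = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

class PfifoFastModel:
    """Three FIFO bands; a lower-numbered band is always emptied first."""

    def __init__(self, priomap=PRIOMAP):
        self.priomap = priomap
        self.bands = [deque(), deque(), deque()]

    def enqueue(self, pkt, priority):
        band = self.priomap[priority & 0xF]  # only the low 4 bits index the priomap
        self.bands[band].append(pkt)
        return band

    def dequeue(self):
        for band in self.bands:  # band 0 first, then 1, then 2
            if band:
                return band.popleft()
        return None  # all bands empty

q = PfifoFastModel()
q.enqueue("bulk transfer", 2)  # TC_PRIO_BULK -> band 2
q.enqueue("web page", 0)       # TC_PRIO_BESTEFFORT -> band 1
q.enqueue("ssh keystroke", 6)  # TC_PRIO_INTERACTIVE -> band 0
print([q.dequeue() for _ in range(3)])
# ['ssh keystroke', 'web page', 'bulk transfer']
```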

txqueuelen

The length of this queue is gleaned from the interface configuration, which you can see and set with ifconfig and ip.

Token Bucket Filter

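Tokens arrive at a fixed rate, and each token carries away a piece of data (conceptually one packet; in the actual implementation tokens correspond to bytes). When packets arrive faster than tokens, they are sent at the token rate, so the output never exceeds it.

When packets arrive more slowly than tokens, tokens accumulate, but not without limit: they accumulate only up to the size of the bucket. Without that cap, a long quiet period would pile up so many tokens that a sudden flood of data could all be sent in one instant. With the bucket limit, even a full bucket only lets out one bucket's worth of data at once; after that, sending slows back down to the token rate.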
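The following Python sketch simulates the bucket described above, counting tokens in bytes. The function name and the FIFO, no-drop behaviour are assumptions of the sketch; it ignores peakrate and the queue limit. The example sends five 1000-byte packets at once through a 1000 bytes/s bucket of 3000 bytes, stays idle, then sends four more: after the idle period the bucket is full again, but because it is capped at 3000 bytes only three packets leave immediately.

```python
def tbf_schedule(arrivals, rate, bucket):
    """Simulate a byte-based token bucket.

    arrivals: list of (arrival_time_s, size_bytes), sorted by time.
    rate: token fill rate in bytes per second.
    bucket: bucket size in bytes (the cap on accumulated tokens).
    Returns each packet's departure time (FIFO, nothing dropped).
    """
    tokens = bucket  # start with a full bucket
    clock = 0.0      # time at which `tokens` was last updated
    departures = []
    for arrival, size in arrivals:
        if size > bucket:
            raise ValueError("a packet larger than the bucket can never be sent")
        now = max(arrival, clock)  # FIFO: wait for the previous packet to leave
        tokens = min(bucket, tokens + (now - clock) * rate)  # refill, capped
        if tokens < size:  # not enough tokens yet: wait until there are
            now += (size - tokens) / rate
            tokens = size
        tokens -= size
        clock = now
        departures.append(now)
    return departures

burst = [(0.0, 1000)] * 5 + [(10.0, 1000)] * 4
print(tbf_schedule(burst, rate=1000, bucket=3000))
# [0.0, 0.0, 0.0, 1.0, 2.0, 10.0, 10.0, 10.0, 11.0]
```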

limit or latency

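Limit is the number of bytes that can be queued waiting for tokens to become available. You can also set it the other way round with latency, which specifies the maximum amount of time a packet can sit in the TBF.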

burst/buffer/maxburst

Size of the bucket, in bytes.

rate

The speed knob.

peakrate

If tokens are available, and packets arrive, they are sent out immediately by default.
That may not be what you want, especially if you have a large bucket.
The peakrate can be used to specify how quickly the bucket is allowed to be depleted.

# tc qdisc add dev ppp0 root tbf rate 220kbit latency 50ms burst 1540
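The command above sets latency instead of limit. As far as I understand tc's TBF code, it converts latency into a byte limit of roughly rate × latency + burst (a peakrate, if given, can lower it further). A small Python sketch of that arithmetic with the command's values (220kbit = 220000 bit/s, 50ms, burst 1540 bytes); the function name is hypothetical and peakrate is ignored.

```python
def tbf_limit_bytes(rate_bit_s: int, latency_us: int, burst_bytes: int) -> int:
    """Approximate TBF queue limit: bytes drained at `rate` during `latency`, plus the bucket.

    Integer arithmetic: bit/s * microseconds / (8 bits * 1_000_000 us) = bytes.
    """
    return rate_bit_s * latency_us // 8_000_000 + burst_bytes

print(tbf_limit_bytes(220_000, 50_000, 1540))  # 2915
```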

Stochastic Fairness Queueing (SFQ)

A TCP/IP flow can be uniquely identified by the following parameters within a certain time period:
Source and Destination IP address
Source and Destination Port
Layer 4 Protocol (TCP/UDP/ICMP)

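There are many FIFO queues, and each TCP session or UDP stream is assigned to one of them. Packets are taken from the queues in round-robin order and sent.

This way a single session cannot take up all of the bandwidth.

However, not every session gets its own queue. Instead, a hash function maps the large number of sessions onto a limited number of queues.

As a result, two sessions may share one queue and affect each other.

The hash function is changed periodically, so the same sessions do not keep interfering with each other.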
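Here is a toy Python model of the idea: flows are hashed into a small number of FIFO buckets, the buckets are served round-robin, and changing the salt plays the role of perturb. It is a sketch under simplifying assumptions: real SFQ uses its own hash over the flow's addresses and ports and serves quantum bytes per turn, while this model uses zlib.crc32 and sends one packet per turn. The class and method names are illustrative.

```python
import zlib
from collections import deque

class SfqModel:
    """Toy SFQ: flows hashed into a few FIFO buckets, served round-robin."""

    def __init__(self, buckets=4, salt=0):
        self.queues = [deque() for _ in range(buckets)]
        self.salt = salt
        self.next = 0  # round-robin pointer

    def bucket_of(self, flow):
        # flow = (src_ip, dst_ip, src_port, dst_port, proto); the salt changes the mapping
        key = f"{self.salt}|{flow}".encode()
        return zlib.crc32(key) % len(self.queues)

    def perturb(self, new_salt):
        """Change the hash, as the real qdisc does every `perturb` seconds."""
        self.salt = new_salt

    def enqueue(self, flow, pkt):
        self.queues[self.bucket_of(flow)].append(pkt)

    def dequeue(self):
        for _ in range(len(self.queues)):  # visit each bucket at most once
            q = self.queues[self.next]
            self.next = (self.next + 1) % len(self.queues)
            if q:
                return q.popleft()
        return None  # every bucket is empty
```

With a busy flow and a quiet flow in different buckets, the quiet flow's packet is sent after at most one packet of the busy flow instead of waiting behind all of them.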

perturb

Reconfigure hashing once this many seconds.

quantum

Amount of bytes a stream is allowed to dequeue before the next queue gets a turn.

limit

The total number of packets that will be queued by this SFQ

# tc qdisc add dev ppp0 root sfq perturb 10
# tc -s -d qdisc ls
qdisc sfq 800c: dev ppp0 quantum 1514b limit 128p flows 128/1024 perturb 10sec 
 Sent 4812 bytes 62 pkts (dropped 0, overlimits 0)

The number 800c: is the automatically assigned handle number, limit means that 128 packets can wait in this queue. There are 1024 hashbuckets available for accounting, of which 128 can be active at a time (no more packets fit in the queue!) Once every 10 seconds, the hashes are reconfigured.

Classful Queueing Disciplines

When traffic enters a classful qdisc, it has to be classified into one of its classes. The filters attached to that qdisc return a decision, and the qdisc uses it to enqueue the packet into one of the classes. Each subclass may try other filters to see if further instructions apply. If not, the class enqueues the packet to the qdisc it contains.

The qdisc family: roots, handles, siblings and parents:

Each interface has one egress 'root qdisc'.

Each qdisc and class is assigned a handle, which can be used by later configuration statements to refer to that qdisc.

The handles of these qdiscs consist of two parts, a major number and a minor number : <major>:<minor>.
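tc parses both halves of a handle as hexadecimal, so classid 1:10 has minor number 0x10 (16), not 10, and the automatically assigned 800c: from the SFQ example is hex too. The sketch below packs a handle into its 32-bit value (major in the upper 16 bits, minor in the lower 16). The function name is hypothetical, and special names such as root are not handled.

```python
def parse_handle(text: str) -> int:
    """Turn a tc handle such as '1:', '1:10' or '800c:' into its 32-bit value.

    Both parts are hexadecimal; an omitted minor number means 0.
    """
    major, _, minor = text.partition(":")
    major_val = int(major, 16)
    minor_val = int(minor, 16) if minor else 0
    if not (0 <= major_val <= 0xFFFF and 0 <= minor_val <= 0xFFFF):
        raise ValueError(f"handle out of range: {text}")
    return (major_val << 16) | minor_val

print(hex(parse_handle("1:10")))   # 0x10010
print(hex(parse_handle("800c:")))  # 0x800c0000
```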

The PRIO qdisc

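It is very similar to pfifo_fast and also has several bands, but each band is actually a class, and the number of bands can be changed. The default is three bands.

A band is not necessarily FIFO either: it can contain any type of qdisc.

By default it also uses the TOS to decide which class a packet goes to. The bands are numbered 0-2, while the corresponding classes are numbered 1-3 (1:1, 1:2, 1:3).

Filters can of course also be used to decide which class a packet goes to.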

bands

Number of bands to create. Each band is in fact a class. If you change this number, you must also change:

priomap

If you do not provide tc filters to classify traffic, the PRIO qdisc looks at the TC_PRIO priority to decide how to enqueue traffic.

# tc qdisc add dev eth0 root handle 1: prio 
## This *instantly* creates classes 1:1, 1:2, 1:3
  
# tc qdisc add dev eth0 parent 1:1 handle 10: sfq
# tc qdisc add dev eth0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
# tc qdisc add dev eth0 parent 1:3 handle 30: sfq

# tc -s qdisc ls dev eth0 
qdisc sfq 30: quantum 1514b 
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 

 qdisc tbf 20: rate 20Kbit burst 1599b lat 667.6ms 
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 

 qdisc sfq 10: quantum 1514b 
 Sent 132 bytes 2 pkts (dropped 0, overlimits 0) 

 qdisc prio 1: bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 174 bytes 3 pkts (dropped 0, overlimits 0)

Hierarchical Token Bucket

# tc qdisc add dev eth0 root handle 1: htb default 30

# tc class add dev eth0 parent 1: classid 1:1 htb rate 6mbit burst 15k

# tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit burst 15k
# tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3mbit ceil 6mbit burst 15k
# tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1kbit ceil 6mbit burst 15k

The author then recommends attaching an SFQ qdisc beneath each of these classes:

# tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
# tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
# tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10

Add the filters which direct traffic to the right classes:

# U32="tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32"
# $U32 match ip dport 80 0xffff flowid 1:10
# $U32 match ip sport 25 0xffff flowid 1:20
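The two filters plus htb default 30 amount to a simple decision procedure. This hypothetical Python function mirrors it for plain TCP/UDP port numbers: destination port 80 goes to 1:10, source port 25 goes to 1:20, and anything unmatched falls through to 1:30. It ignores everything else u32 can match on.

```python
# Filters from the commands above: (header field, port, target class)
FILTERS = [("dport", 80, "1:10"), ("sport", 25, "1:20")]
DEFAULT_CLASS = "1:30"  # from `htb default 30` on qdisc 1:

def classify(sport: int, dport: int) -> str:
    """Return the HTB class a packet with these ports would be sent to."""
    ports = {"sport": sport, "dport": dport}
    for field, port, flowid in FILTERS:  # both filters use prio 1; checked in order here
        if ports[field] == port:
            return flowid
    return DEFAULT_CLASS  # unclassified traffic

print(classify(40000, 80))  # 1:10 (web requests)
print(classify(25, 40000))  # 1:20 (outgoing SMTP replies)
print(classify(40000, 22))  # 1:30 (everything else)
```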

HTB certainly looks wonderful - if 10: and 20: both have their guaranteed bandwidth, and more is left to divide, they borrow in a 5:3 ratio, just as you would expect.

Unclassified traffic gets routed to 30:, which has little bandwidth of its own but can borrow everything that is left over.

A fundamental part of the HTB qdisc is the borrowing mechanism. Child classes borrow tokens from their parents once they have exceeded rate. A child class will continue to attempt to borrow until it reaches ceil, at which point it will begin to queue packets for transmission until more tokens/ctokens are available. As there are only two primary types of classes that can be created with HTB (leaf classes and inner/root classes), the following table and diagram identify the various possible states and the behaviour of the borrowing mechanisms.

Table 2. HTB class states and potential actions taken

type of class	class state	HTB internal state	action taken
leaf	< rate	HTB_CAN_SEND	Leaf class will dequeue queued bytes up to available tokens (no more than burst packets)
leaf	> rate, < ceil	HTB_MAY_BORROW	Leaf class will attempt to borrow tokens/ctokens from parent class. If tokens are available, they will be lent in quantum increments and the leaf class will dequeue up to cburst bytes
leaf	> ceil	HTB_CANT_SEND	No packets will be dequeued. This will cause packet delay and will increase latency to meet the desired rate.
inner, root	< rate	HTB_CAN_SEND	Inner class will lend tokens to children.
inner, root	> rate, < ceil	HTB_MAY_BORROW	Inner class will attempt to borrow tokens/ctokens from parent class, lending them to competing children in quantum increments per request.
inner, root	> ceil	HTB_CANT_SEND	Inner class will not attempt to borrow from its parent and will not lend tokens/ctokens to children classes.
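The table reduces to comparing a class's current send rate against its rate and ceil. The small Python function below encodes just that mapping; how HTB actually measures the current rate (with its token and ctoken buckets) is not modelled, and the behaviour exactly at rate or ceil is an assumption of this sketch, since the table only uses < and >.

```python
def htb_state(current_rate: float, rate: float, ceil: float) -> str:
    """Map a class's current send rate to the HTB state from the table."""
    if current_rate < rate:
        return "HTB_CAN_SEND"    # within its guaranteed rate
    if current_rate < ceil:
        return "HTB_MAY_BORROW"  # above rate, below ceil: may borrow from the parent
    return "HTB_CANT_SEND"       # at or above ceil: packets wait

# Class 1:20 above: rate 3mbit, ceil 6mbit, currently sending 4mbit
print(htb_state(4e6, 3e6, 6e6))  # HTB_MAY_BORROW
```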

This diagram identifies the flow of borrowed tokens and the manner in which tokens are charged to parent classes. In order for the borrowing model to work, each class must have an accurate count of the number of tokens used by itself and all of its children. For this reason, any token used in a child or leaf class is charged to each parent class until the root class is reached.

Any child class which wishes to borrow a token will request a token from its parent class, which if it is also over its rate will request to borrow from its parent class until either a token is located or the root class is reached. So the borrowing of tokens flows toward the leaf classes and the charging of the usage of tokens flows toward the root class.

Note in this diagram that there are several HTB root classes. Each of these root classes can simulate a virtual circuit.

HTB class parameters

default

An optional parameter of every HTB qdisc. Its default value is 0, which causes any unclassified traffic to be dequeued at hardware speed, completely bypassing any of the classes attached to the root qdisc.

rate

Used to set the minimum desired speed to which to limit transmitted traffic. This can be considered the equivalent of a committed information rate (CIR), or the guaranteed bandwidth for a given leaf class.

ceil

Used to set the maximum desired speed to which to limit the transmitted traffic. The borrowing model should illustrate how this parameter is used. This can be considered the equivalent of “burstable bandwidth”.

burst

This is the size of the bucket (see Token Bucket Filter above). HTB will dequeue burst bytes before awaiting the arrival of more tokens.

cburst

This is the size of the ctoken bucket (see Token Bucket Filter above). HTB will dequeue cburst bytes before awaiting the arrival of more ctokens.

quantum

This is a key parameter used by HTB to control borrowing. Normally, the correct quantum is calculated by HTB, not specified by the user. Tweaking this parameter can have tremendous effects on borrowing and shaping under contention, because it is used both to split traffic between children classes over rate (but below ceil) and to transmit packets from these same classes.

r2q

Also usually calculated for the user, r2q is a hint to HTB to help determine the optimal quantum for a particular class.
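When quantum is not given, as I understand tc and the kernel, HTB derives it as the class rate in bytes per second divided by r2q (tc's default r2q is 10) and clamps the result to 1000..200000 bytes, printing a warning when it does. The Python sketch below applies that rule to the three classes of the example; it also shows where the 5:3 borrowing ratio mentioned earlier comes from. The function name is hypothetical.

```python
def htb_quantum(rate_bit_s: int, r2q: int = 10) -> int:
    """Default quantum in bytes: (rate in bytes/s) // r2q, clamped to [1000, 200000]."""
    quantum = rate_bit_s // 8 // r2q
    return max(1000, min(quantum, 200_000))

for name, rate in [("1:10", 5_000_000), ("1:20", 3_000_000), ("1:30", 1_000)]:
    print(name, htb_quantum(rate))
# 1:10 62500
# 1:20 37500
# 1:30 1000
```

62500 : 37500 is exactly 5 : 3, while the 1kbit class is raised to the 1000-byte minimum.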

mtu

prio

Netfilter & iproute - marking packets

We can set a mark on packets with iptables and then use that mark when routing.

This command marks all packets destined for port 25 (outgoing mail):

# iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 25 \
 -j MARK --set-mark 1

We've already marked the packets with a '1', we now instruct the routing policy database to act on this:

# echo 201 mail.out >> /etc/iproute2/rt_tables
# ip rule add fwmark 1 table mail.out
# ip rule ls
0:	from all lookup local 
32764:	from all fwmark        1 lookup mail.out 
32766:	from all lookup main 
32767:	from all lookup default

Now we generate a route to the slow but cheap link in the mail.out table:

# /sbin/ip route add default via 195.96.98.253 dev ppp2 table mail.out

The u32 classifier

The U32 filter is the most advanced filter available in the current implementation.

# tc filter add dev eth0 protocol ip parent 1:0 pref 10 u32 \
  match u32 00100000 00ff0000 at 0 flowid 1:10
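The u32 example compares the 32-bit word at offset 0 of the IP header with value 00100000 under mask 00ff0000. The mask keeps only the second byte, which is the TOS field, so the filter selects packets whose TOS is 0x10 ("minimize delay") and sends them to class 1:10. Below is a Python sketch of that comparison; the helper name and the sample header bytes are made up for illustration.

```python
import struct

def u32_match(packet: bytes, value: int, mask: int, offset: int) -> bool:
    """Model of `match u32 VALUE MASK at OFFSET`: read a big-endian 32-bit word
    at OFFSET from the start of the IP header, AND it with MASK, compare to VALUE."""
    (word,) = struct.unpack_from("!I", packet, offset)
    return (word & mask) == value

# First 4 bytes of an IPv4 header: version/IHL = 0x45, TOS, total length = 60
low_delay = struct.pack("!BBH", 0x45, 0x10, 60)
normal = struct.pack("!BBH", 0x45, 0x00, 60)

print(u32_match(low_delay, 0x00100000, 0x00FF0000, 0))  # True
print(u32_match(normal, 0x00100000, 0x00FF0000, 0))     # False
```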

iproute2 study notes