Understanding RFS

Posted by lxgeek on 2014-12-24

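1. Background

When a network card receives a packet, the packet goes through three stages:

- The network card raises a hardware interrupt to notify a CPU that a packet has arrived

- A softirq processes the packet

- A user-space program consumes the packet

On an SMP system, these three stages may end up running on three different CPUs.

The goal of RFS (Receive Flow Steering) is to raise the CPU cache hit rate and thereby reduce network latency. With RFS enabled, packet processing is steered to the CPU on which the consuming application runs.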

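2. How it works

When a user program calls recvmsg() or sendmsg(), RFS stores the id of the CPU the program is running on in a hash table.

When a packet belonging to that program later arrives, RFS tries to look up the corresponding CPU id in the hash table and places the packet on that CPU's queue, which is where the performance gain comes from.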
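The record-then-look-up idea above can be sketched in user space as follows. This is a simplified illustration under stated assumptions, not the kernel code: the names flow_table, flow_table_init, flow_table_record and flow_table_lookup, the table size of 8, and the NO_CPU marker are all hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <stdint.h>

#define FLOW_TABLE_SIZE 8u   /* hypothetical size; must be a power of two */
#define NO_CPU 0xffffu       /* hypothetical marker: no CPU recorded yet */

/* Toy flow table: index = flow hash & mask, value = CPU id. */
struct flow_table {
    uint32_t mask;
    uint16_t ents[FLOW_TABLE_SIZE];
};

static void flow_table_init(struct flow_table *t)
{
    /* Masking only works as a modulo when the size is a power of two. */
    assert((FLOW_TABLE_SIZE & (FLOW_TABLE_SIZE - 1)) == 0);
    t->mask = FLOW_TABLE_SIZE - 1;
    for (uint32_t i = 0; i < FLOW_TABLE_SIZE; i++)
        t->ents[i] = NO_CPU;
}

/* Application side (recvmsg/sendmsg path): remember where the app ran. */
static void flow_table_record(struct flow_table *t, uint32_t hash, uint16_t cpu)
{
    t->ents[hash & t->mask] = (uint16_t)cpu;
}

/* Receive side: which CPU should a packet with this flow hash go to? */
static uint16_t flow_table_lookup(const struct flow_table *t, uint32_t hash)
{
    return t->ents[hash & t->mask];
}
```

Because the table is indexed by a masked hash, two flows whose hashes share the same low bits overwrite each other's entry. In this sketch that only means a packet may be steered to a less ideal CPU; it does not affect correctness.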

3. Key data structures

/*
* The rps_sock_flow_table contains mappings of flows to the last CPU
* on which they were processed by the application (set in recvmsg).
*/
struct rps_sock_flow_table {
    unsigned int mask;
    u16 ents[0];
};
#define RPS_SOCK_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_sock_flow_table) + \
    ((_num) * sizeof(u16)))


The rps_sock_flow_table structure implements a hash table. RFS declares one global instance of it to record the CPU associated with every sock.

/*
* The rps_dev_flow structure contains the mapping of a flow to a CPU, the
* tail pointer for that CPU's input queue at the time of last enqueue, and
* a hardware filter index.
*/
struct rps_dev_flow {
    u16 cpu;     // CPU this flow last used
    u16 filter;
    unsigned int last_qtail;   // tail of that CPU's input queue at the last enqueue for this flow
};
#define RPS_NO_FILTER 0xffff

/*
* The rps_dev_flow_table structure contains a table of flow mappings.
*/
struct rps_dev_flow_table {
    unsigned int mask;
    struct rcu_head rcu;
    struct rps_dev_flow flows[0]; // the hash table entries
};     
#define RPS_DEV_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_dev_flow_table) + \
    ((_num) * sizeof(struct rps_dev_flow)))


The rps_dev_flow_table structure, by contrast, exists once per device receive queue.
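Both tables in this section share one layout: a mask followed by a trailing array, with a SIZE macro that computes how many bytes to allocate. The sketch below shows that pattern in user space. It is an illustration only: sock_flow_table, SOCK_FLOW_TABLE_SIZE and sock_flow_table_alloc are hypothetical names, malloc stands in for the kernel's allocator, and 0xffff is assumed as the "no CPU" value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct sock_flow_table {
    unsigned int mask;   /* number of entries minus one */
    uint16_t ents[];     /* C99 flexible array member (the kernel snippet writes ents[0]) */
};

/* Bytes needed for the header plus `num` trailing entries. */
#define SOCK_FLOW_TABLE_SIZE(num) \
    (sizeof(struct sock_flow_table) + (num) * sizeof(uint16_t))

/* Allocate a table with `num` entries; num must be a non-zero power of two. */
static struct sock_flow_table *sock_flow_table_alloc(unsigned int num)
{
    assert(num != 0 && (num & (num - 1)) == 0);

    struct sock_flow_table *t = malloc(SOCK_FLOW_TABLE_SIZE(num));
    if (!t)
        return NULL;

    t->mask = num - 1;
    for (unsigned int i = 0; i < num; i++)
        t->ents[i] = 0xffff;   /* assumed marker: no CPU recorded yet */
    return t;
}
```

Requiring a power-of-two entry count is what lets the lookup use `hash & mask` in place of a slower modulo operation.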

4. Implementation details

The CPU id is recorded when the user program calls recvmsg() or sendmsg().

int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
         size_t size, int flags)
{  
    struct sock *sk = sock->sk;
    int addr_len = 0;
    int err;
   
    sock_rps_record_flow(sk);   // record the current CPU id for this flow
   
    err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
                   flags & ~MSG_DONTWAIT, &addr_len);
    if (err >= 0)
        msg->msg_namelen = addr_len;
    return err;
}
EXPORT_SYMBOL(inet_recvmsg);


When a packet arrives on the receive path, get_rps_cpu() is called to choose a suitable CPU id. The key code is:

3117     hash = skb_get_hash(skb);
3118     if (!hash)
3119         goto done;
3120
3121     flow_table = rcu_dereference(rxqueue->rps_flow_table);     // per-rx-queue hash table
3122     sock_flow_table = rcu_dereference(rps_sock_flow_table);    // global hash table
3123     if (flow_table && sock_flow_table) {
3124         u16 next_cpu;
3125         struct rps_dev_flow *rflow;
3126
3127         rflow = &flow_table->flows[hash & flow_table->mask];
3128         tcpu = rflow->cpu;  
3129
3130         next_cpu = sock_flow_table->ents[hash & sock_flow_table->mask];   // CPU id where the user program last ran
3131
3132         /*
3133          * If the desired CPU (where last recvmsg was done) is
3134          * different from current CPU (one in the rx-queue flow
3135          * table entry), switch if one of the following holds:
3136          *   - Current CPU is unset (equal to RPS_NO_CPU).
3137          *   - Current CPU is offline.
3138          *   - The current CPU's queue tail has advanced beyond the
3139          *     last packet that was enqueued using this table entry.
3140          *     This guarantees that all previous packets for the flow
3141          *     have been dequeued, thus preserving in order delivery.
3142          */
3143         if (unlikely(tcpu != next_cpu) &&
3144             (tcpu == RPS_NO_CPU || !cpu_online(tcpu) ||
3145              ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
3146               rflow->last_qtail)) >= 0)) {
3147             tcpu = next_cpu;
3148             rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
3149         }
3150
3151         if (tcpu != RPS_NO_CPU && cpu_online(tcpu)) {
3152             *rflowp = rflow;
3153             cpu = tcpu;
3154             goto done;
3155         }
3156     }


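Line 3145 in the code above is the hardest part to follow. The softnet_data structure manages incoming and outgoing traffic for a CPU, and it has two key fields, input_queue_head and input_queue_tail: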

#ifdef CONFIG_RPS
    /* Elements below can be accessed between CPUs for RPS */
    struct call_single_data csd ____cacheline_aligned_in_smp;
    struct softnet_data *rps_ipi_next;
    unsigned int        cpu;
    unsigned int        input_queue_head;   // queue head, i.e. the dequeue position
    unsigned int        input_queue_tail;   // queue tail, i.e. the enqueue position
#endif


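The expression per_cpu(softnet_data, tcpu).input_queue_head gives the number of packets dequeued so far on CPU tcpu, while rflow->last_qtail records the tail position of that CPU's queue at the moment this flow last enqueued a packet. If the dequeue count has reached or passed that position (the difference is >= 0), every packet of this flow that was queued on tcpu has already been processed, so moving the flow to another CPU cannot cause out-of-order delivery. The cast to int keeps the comparison correct when the unsigned counters wrap around.

The if statement at line 3143 exists to prevent reordering. It matters because when several processes or threads handle the same socket, the CPU id recorded for that socket keeps changing.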
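The comparison on lines 3143 to 3146 can be isolated into a small user-space sketch. The function names flow_drained, should_switch_cpu and flow_drained_examples are hypothetical, NO_CPU is a stand-in for the kernel's RPS_NO_CPU, and the online state is passed in as a plain flag instead of calling cpu_online(). The sketch assumes the usual two's-complement conversion from uint32_t to int32_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_CPU 0xffffu   /* stand-in for the kernel's RPS_NO_CPU */

/*
 * True when `head` (packets dequeued so far) has reached or passed
 * `last_qtail`, even if the unsigned counters have wrapped around.
 */
static bool flow_drained(uint32_t head, uint32_t last_qtail)
{
    return (int32_t)(head - last_qtail) >= 0;
}

/* Mirrors the structure of the condition discussed above. */
static bool should_switch_cpu(uint16_t tcpu, uint16_t next_cpu,
                              bool tcpu_online,
                              uint32_t head, uint32_t last_qtail)
{
    if (tcpu == next_cpu)
        return false;   /* already on the desired CPU */
    return tcpu == NO_CPU || !tcpu_online || flow_drained(head, last_qtail);
}

/* A few cases, including one where the counter has wrapped. */
static void flow_drained_examples(void)
{
    assert(flow_drained(10, 10));          /* everything queued was dequeued */
    assert(!flow_drained(9, 10));          /* one packet still pending */
    assert(flow_drained(5, 0xfffffff0u));  /* head wrapped past last_qtail */
}
```

A plain `head >= last_qtail` would give the wrong answer in the wrapped case, which is why the subtraction is done first and only the sign of the result is tested.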

References:

http://www.pagefault.info/?p=115

http://syuu.dokukino.com/2013/05/linux-kernel-features-for-high-speed.html

https://www.kernel.org/doc/Documentation/networking/scaling.txt
