While running WAF concurrency stress tests some time ago, I hit a problem: tests against the traffic generator looked fine, but in real-environment validation, with timeouts and packet loss in play, the concurrent CPS was very low.
Checking cat /proc/net/netstat showed a very large OfoPruned counter. Only after reading the kernel source did I find that when memory runs short, or the socket's receive memory (sk_rmem_alloc) exceeds sk_rcvbuf, the kernel prunes the ofo (out-of-order) queue, and it frees the entire queue. At the time I changed it to free only the highest-sequence 50% instead of everything, and the improvement was obvious.
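For context, the old list-based tcp_prune_ofo_queue purged the whole queue in one shot. The snippet below is a simplified reconstruction of that behavior from memory of pre-4.9 sources; treat it as a sketch, not verbatim kernel code:

/* Simplified reconstruction of the old, list-based prune: everything goes. */
static bool tcp_prune_ofo_queue(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	bool res = false;

	if (!skb_queue_empty(&tp->out_of_order_queue)) {
		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_OFOPRUNED);
		/* The whole out-of-order queue is dropped at once. */
		__skb_queue_purge(&tp->out_of_order_queue);

		/* The dropped data may already have been SACKed,
		 * so SACK state has to be reset as well.
		 */
		if (tp->rx_opt.sack_ok)
			tcp_sack_reset(&tp->rx_opt);
		sk_mem_reclaim(sk);
		res = true;
	}
	return res;
}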
Looking at a newer kernel today, I found that upstream has since changed this as well.
During TCP receive processing, if the socket's receive buffer usage already exceeds the configured socket limit (sk_rcvbuf), or TCP's total memory usage exceeds the global threshold, the kernel calls tcp_prune_queue to try to reclaim memory held by the receive queues. It first uses tcp_collapse_ofo_queue to merge duplicate data in the out_of_order_queue, then uses tcp_collapse to fold the data in sk_receive_queue into fewer skb structures; finally, if receive memory usage is still too high, it calls tcp_prune_ofo_queue to drop packets from the out_of_order_queue.
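A condensed sketch of that call sequence, modeled on tcp_prune_queue in net/ipv4/tcp_input.c (abridged; exact statements and the window-clamping details vary across kernel versions):

static int tcp_prune_queue(struct sock *sk, const struct sk_buff *in_skb)
{
	struct tcp_sock *tp = tcp_sk(sk);

	NET_INC_STATS(sock_net(sk), LINUX_MIB_PRUNECALLED);

	/* Step 1: merge duplicate data inside the ofo queue. */
	tcp_collapse_ofo_queue(sk);

	/* Step 2: fold sk_receive_queue into fewer, fuller skbs. */
	if (!skb_queue_empty(&sk->sk_receive_queue))
		tcp_collapse(sk, &sk->sk_receive_queue, NULL,
			     skb_peek(&sk->sk_receive_queue), NULL,
			     tp->copied_seq, tp->rcv_nxt);

	if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf)
		return 0;

	/* Step 3: still over budget -- destructively drop ofo packets. */
	tcp_prune_ofo_queue(sk, in_skb);

	if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf)
		return 0;

	/* Truly out of options: the caller will drop the incoming data. */
	return -1;
}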
The principles behind the latest change are spelled out in the comment at the top of tcp_prune_ofo_queue (quoted in full in the listing below). High-sequence packets are dropped first, so that:
1) Holes get a chance to be filled: packets in the ooo queue are not dropped if their sequence is before the incoming packet's sequence.
2) Latency does not balloon when thousands of packets sit in the queue. (But if the application shrinks SO_RCVBUF, the whole queue may still end up being freed.)
3) At least 12.5% of sk_rcvbuf is dropped per pass (goal = sk->sk_rcvbuf >> 3, i.e. one eighth) to blunt malicious attacks.
The latest kernel code for this path is as follows:
/*
 * Clean the out-of-order queue to make room.
 * We drop high sequences packets to :
 * 1) Let a chance for holes to be filled.
 *    This means we do not drop packets from ooo queue if their sequence
 *    is before incoming packet sequence.
 * 2) not add too big latencies if thousands of packets sit there.
 *    (But if application shrinks SO_RCVBUF, we could still end up
 *     freeing whole queue here)
 * 3) Drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks.
 *
 * Return true if queue has shrunk.
 */
static bool tcp_prune_ofo_queue(struct sock *sk, const struct sk_buff *in_skb)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct rb_node *node, *prev;
	bool pruned = false;
	int goal;

	if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
		return false;

	goal = sk->sk_rcvbuf >> 3;
	node = &tp->ooo_last_skb->rbnode;

	do {
		struct sk_buff *skb = rb_to_skb(node);

		/* If incoming skb would land last in ofo queue, stop pruning. */
		if (after(TCP_SKB_CB(in_skb)->seq, TCP_SKB_CB(skb)->seq))
			break;
		pruned = true;
		prev = rb_prev(node);
		rb_erase(node, &tp->out_of_order_queue);
		goal -= skb->truesize;
		tcp_drop_reason(sk, skb, SKB_DROP_REASON_TCP_OFO_QUEUE_PRUNE);
		tp->ooo_last_skb = rb_to_skb(prev);
		if (!prev || goal <= 0) {
			if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
			    !tcp_under_memory_pressure(sk))
				break;
			goal = sk->sk_rcvbuf >> 3;
		}
		node = prev;
	} while (node);

	if (pruned) {
		NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
		/* Reset SACK state. A conforming SACK implementation will
		 * do the same at a timeout based retransmit. When a connection
		 * is in a sad state like this, we care only about integrity
		 * of the connection not performance.
		 */
		if (tp->rx_opt.sack_ok)
			tcp_sack_reset(&tp->rx_opt);
	}
	return pruned;
}
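To make the walk order concrete, here is a small self-contained userspace toy that mimics the loop (the names and sizes are invented for illustration; this is not kernel code). It walks from the highest sequence downward, stops once the incoming packet's sequence exceeds the current node's, and targets sk_rcvbuf >> 3 (12.5%) of freed truesize per pass:

#include <stdbool.h>
#include <stdio.h>

struct pkt { unsigned int seq; int truesize; };

int main(void)
{
	/* Toy ofo queue sorted by sequence; pruning starts at the tail
	 * (highest sequence), like the walk from tp->ooo_last_skb.
	 */
	struct pkt q[] = { {1000, 2048}, {3000, 2048}, {5000, 2048},
			   {7000, 2048}, {9000, 2048} };
	int n = (int)(sizeof(q) / sizeof(q[0]));
	unsigned int in_seq = 4000;	/* sequence of the incoming skb */
	int sk_rcvbuf = 65536;
	int goal = sk_rcvbuf >> 3;	/* free at least 12.5% per pass */
	bool pruned = false;
	int i;

	for (i = n - 1; i >= 0; i--) {
		/* Stop once the incoming skb would land last in the queue;
		 * the kernel uses the wrap-safe after() instead of ">".
		 */
		if (in_seq > q[i].seq)
			break;
		printf("drop seq=%u truesize=%d\n", q[i].seq, q[i].truesize);
		goal -= q[i].truesize;
		pruned = true;
		/* The real loop re-checks sk_rmem_alloc here and may refill
		 * goal for another pass; the toy just stops.
		 */
		if (goal <= 0)
			break;
	}
	printf("pruned=%s\n", pruned ? "true" : "false");
	return 0;
}

With these numbers it drops seq 9000, 7000, and 5000, then stops at 3000 because that sequence lies below the incoming packet's 4000: exactly principle 1) above.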
Two things worth noting:
1) The check /* If incoming skb would land last in ofo queue, stop pruning. */ means the walk starts from ooo_last_skb (the highest sequence) and stops as soon as the incoming skb would sort after everything left in the queue, so packets whose sequence lies below the incoming packet's are never dropped.
2) SACK state is reset only if the ofo queue was actually pruned (pruned == true).
Back when I patched the kernel, I simply had it drop 50% of the queue by default and left it at that.