free batches of packets in tcp_prune_ofo_queue()

Published by codestacklinuxer on 2024-06-01

While doing WAF concurrency stress testing a while back, I ran into a problem: the test instrument reported normal results, but when validating against timeouts and packet loss in a real environment, the concurrent CPS was very low.

Checking cat /proc/net/netstat showed a very large value for OfoPruned. Reading the kernel code revealed that when memory runs short, or rmem exceeds sk_rcvbuf, the kernel frees the ofo (out-of-order) queue, and it frees the entire queue. At the time I changed the full release to freeing only the highest-sequence 50%, which improved things noticeably.

Checking a newer kernel today, I found this code has since been changed as well.

When a TCP socket receives data, if the socket's receive buffer usage already exceeds its configured limit, or TCP's total memory usage exceeds the system-wide threshold, the kernel calls tcp_prune_queue() to try to reclaim memory held by the receive queues. It first uses tcp_collapse_ofo_queue() to merge duplicate data in the out_of_order_queue, then uses tcp_collapse() to fold the data in sk_receive_queue into fewer skbs; finally, if receive memory usage is still too high, it calls tcp_prune_ofo_queue() to drop packets from the out_of_order_queue.

The principles behind the current upstream logic are:

1) Give holes a chance to be filled. This means packets are not dropped from the ooo queue if their sequence is before the incoming packet's sequence.

2) Do not add too much latency if thousands of packets are sitting there. (But if the application shrinks SO_RCVBUF, the whole queue can still end up being freed here.)

3) Drop at least 12.5% of sk_rcvbuf to defend against malicious attacks.

The corresponding code in the current kernel is as follows:

/*
 * Clean the out-of-order queue to make room.
 * We drop high sequences packets to :
 * 1) Let a chance for holes to be filled.
 *    This means we do not drop packets from ooo queue if their sequence
 *    is before incoming packet sequence.
 * 2) not add too big latencies if thousands of packets sit there.
 *    (But if application shrinks SO_RCVBUF, we could still end up
 *     freeing whole queue here)
 * 3) Drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks.
 *
 * Return true if queue has shrunk.
 */
static bool tcp_prune_ofo_queue(struct sock *sk, const struct sk_buff *in_skb)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct rb_node *node, *prev;
	bool pruned = false;
	int goal;

	if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
		return false;

	goal = sk->sk_rcvbuf >> 3;
	node = &tp->ooo_last_skb->rbnode;

	do {
		struct sk_buff *skb = rb_to_skb(node);

		/* If incoming skb would land last in ofo queue, stop pruning. */
		if (after(TCP_SKB_CB(in_skb)->seq, TCP_SKB_CB(skb)->seq))
			break;
		pruned = true;
		prev = rb_prev(node);
		rb_erase(node, &tp->out_of_order_queue);
		goal -= skb->truesize;
		tcp_drop_reason(sk, skb, SKB_DROP_REASON_TCP_OFO_QUEUE_PRUNE);
		tp->ooo_last_skb = rb_to_skb(prev);
		if (!prev || goal <= 0) {
			if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
			    !tcp_under_memory_pressure(sk))
				break;
			goal = sk->sk_rcvbuf >> 3;
		}
		node = prev;
	} while (node);

	if (pruned) {
		NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
		/* Reset SACK state.  A conforming SACK implementation will
		 * do the same at a timeout based retransmit.  When a connection
		 * is in a sad state like this, we care only about integrity
		 * of the connection not performance.
		 */
		if (tp->rx_opt.sack_ok)
			tcp_sack_reset(&tp->rx_opt);
	}
	return pruned;
}

Points worth noting:

1) "If incoming skb would land last in ofo queue, stop pruning." The walk starts from the queue tail (highest sequences) and stops as soon as the incoming skb's sequence would place it after the remaining entries, so only sequences above the incoming packet's are ever dropped.

2) The SACK state is reset only if the out-of-order queue was actually pruned.

Back when I patched the kernel myself, I simply hard-coded dropping 50% of the queue by default.
