Troubleshooting the “Detected Tx Unit Hang” problem

Posted by lxgeek on 2014-10-22

Goal:

Make skb->data point at memory I have already allocated myself, instead of using alloc_malloc().

Part of the code is shown below:

 1             /*
 2              * build a new sk_buff
 3              */
 4             //struct sk_buff *send_skb = kmem_cache_alloc_node(skbuff_head_cache, GFP_ATOMIC & ~__GFP_DMA, NUMA_NO_NODE);
 5             struct sk_buff *send_skb = kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC & ~__GFP_DMA);
 6 
 7             if (!send_skb) {
 8                 //spin_unlock(&lock);
 9                 return NF_DROP;
10             }
11             
12             //printk("what2\n");
13             memset(send_skb, 0, offsetof(struct sk_buff, tail));
14             atomic_set(&send_skb->users, 2);
15             send_skb->cloned = 0;
16             
17             send_skb->head = mmap_buf + 1024;
18             send_skb->data = mmap_buf + 1024;
19             

On line 18, mmap_buf is memory that was allocated in advance.

With this code in place, the NIC driver writes the following errors to /var/log/messages:

Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <13>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <0>, <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <0>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <15>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <1>, <1eb>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1eb>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <1>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <14>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <0>, <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <0>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <4>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <0>, <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <0>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <12>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <5>, <1ef>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ef>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <5>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <2>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <2>, <1ec>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ec>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <2>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
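
As a rough aid for reading this log: TDH and TDT are the transmit descriptor ring's head (advanced by the NIC) and tail (advanced by the driver), and as far as I know ixgbe prints them in hex. The sketch below computes how many descriptors are still waiting on the hardware. It assumes a 512-entry ring, which is only a guess for this machine (`ethtool -g eth2` would show the real size); `tx_pending` and `TX_RING_SIZE` are names made up for this illustration, not driver symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ring size (hypothetical; the real value depends on the NIC setup). */
#define TX_RING_SIZE 512u

/*
 * Number of descriptors the driver has queued (tail) that the NIC
 * has not yet consumed (head). Both indices are positions in the ring.
 */
uint32_t tx_pending(uint32_t tdh, uint32_t tdt, uint32_t ring_size)
{
    assert(ring_size > 0 && tdh < ring_size && tdt < ring_size);
    /* The ring wraps around, so subtract modulo the ring size. */
    return (tdt + ring_size - tdh) % ring_size;
}
```

Under those assumptions, `tx_pending(0x0, 0x1ea, TX_RING_SIZE)` is 490 for queue 13, and the other queues in the log give the same figure: the driver kept queueing packets while the head barely moved, which is the pattern the hang detector reports.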

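After ruling out various other causes, I traced the problem to the way mmap_buf was allocated. Allocating it with vmalloc() was wrong; after switching to kmalloc() everything worked normally.

Section 12.5 of Linux Kernel Development (《Linux核心設計與實現》) explains the difference. The likely reason is that the NIC needs the memory it is handed to be physically contiguous, whereas vmalloc() only guarantees that the virtual addresses are contiguous.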
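
To make the contiguity point concrete, here is a small user-space toy model, not kernel code. It represents a buffer by the physical frame number behind each of its virtual pages and checks whether a byte range maps to one unbroken physical region. The 4096-byte page size, the frame numbers, and the names `phys_contiguous` and `TOY_PAGE_SIZE` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size for this toy model */

/*
 * frame[i] is the physical frame number backing the i-th virtual page
 * of a buffer. Returns true if bytes [offset, offset + len) lie in one
 * physically contiguous region.
 */
bool phys_contiguous(const uint64_t *frame, size_t npages,
                     size_t offset, size_t len)
{
    if (len == 0)
        return true;

    size_t first = offset / TOY_PAGE_SIZE;
    size_t last = (offset + len - 1) / TOY_PAGE_SIZE;
    assert(last < npages);

    /* Each page in the range must be followed by the next physical frame. */
    for (size_t i = first; i < last; i++) {
        if (frame[i + 1] != frame[i] + 1)
            return false;
    }
    return true;
}
```

With frames {100, 101, 102} (what a kmalloc()-style allocation looks like) an 8192-byte range starting at offset 1024 is contiguous; with scattered frames such as {100, 7, 52} (what vmalloc() may produce) the same range is not. The model covers only contiguity. As I understand it, streaming DMA mappings also expect addresses from the kernel's linear mapping, so a vmalloc() address can be unsuitable even when the data fits in a single page.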

 
