Reclaiming sockets stuck in an abnormal state (FIN_WAIT_2) on HP-UX

Posted by liuhaimiao on 2015-05-26

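Symptom: while the front-end machine was sending and receiving messages for one customer, transfers frequently stopped for no apparent reason. Checking with netstat showed that we had more than 1,000 connections to that customer, and the vast majority of them were in the FIN_WAIT_2 state.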

#netstat -an|grep 10.116.50.30
tcp        0      0  192.168.129.44.64306   10.116.50.30.53081       FIN_WAIT_2
tcp        0      0  192.168.129.44.49734   10.116.50.30.53660       FIN_WAIT_2
tcp        0      0  192.168.129.44.63611   10.116.50.30.57966       FIN_WAIT_2
tcp        0      0  192.168.129.44.63416   10.116.50.30.57946       FIN_WAIT_2
tcp        0      0  192.168.129.44.57835   10.116.50.30.49188       FIN_WAIT_2
tcp        0      0  192.168.129.44.57502   10.116.50.30.52615       ESTABLISHED
tcp        0      0  192.168.129.44.50387   10.116.50.30.58301       FIN_WAIT_2
tcp        0      0  192.168.129.44.53297   10.116.50.30.53943       FIN_WAIT_2
tcp        0      0  192.168.129.44.55202   10.116.50.30.54141       FIN_WAIT_2
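To see how the connections to one peer are spread across states, the netstat output can be summarized. The following is a minimal sketch, assuming the column layout shown above (remote address in the fifth field, state in the sixth); the function name `count_states` is hypothetical.

```shell
# count_states PEER_IP: read `netstat -an` output on stdin and print
# "<state> <count>" for every TCP connection whose remote address
# begins with PEER_IP. Assumes the layout shown above:
# proto recv-q send-q local-addr.port remote-addr.port state
count_states() {
    awk -v peer="$1" '
        # Match on "ip." as a prefix so 10.116.50.3 does not match 10.116.50.30
        $1 == "tcp" && index($5, peer ".") == 1 { n[$6]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}

# Typical use on the live system:
#   netstat -an | count_states 10.116.50.30
```

Fed the nine lines listed above, this would report one ESTABLISHED connection and eight in FIN_WAIT_2.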

For background on the TCP connection states, see the following quoted summary.
(Quoted) TCP states
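Every socket starts in the CLOSED state. When a client initiates a connection, it sends a SYN packet to the server and enters the SYN_SENT state.
The server receives the SYN and replies with a SYN-ACK. When the client receives it, the client replies with an ACK and moves to ESTABLISHED. If no SYN-ACK arrives for a long time, the client times out and returns to CLOSED.
When a server binds to a port and listens on it, the socket is in the LISTEN state. When a client tries to connect, the server receives a SYN, replies with a SYN-ACK, and moves to SYN_RCVD. When the client then sends its ACK, the server socket becomes ESTABLISHED.
A socket in the ESTABLISHED state can be closed in two ways: an active close or a passive close.
Active close: you send a FIN. When your program calls closesocket or shutdown (with the appropriate flag), it sends a FIN to the peer and your socket enters FIN_WAIT_1. The peer replies with an ACK and your socket enters FIN_WAIT_2. If the peer is also closing the connection, it sends a FIN to your machine; you reply with an ACK and move to TIME_WAIT.
TIME_WAIT is also called the 2MSL wait state. MSL stands for Maximum Segment Lifetime, the longest time a segment can exist in the network before it is discarded. (Every IP packet carries a TTL, time to live; each router decrements it by one before forwarding the packet, and the packet is discarded when the TTL reaches 0.) A socket in TIME_WAIT waits for two MSLs, which lets TCP resend the final ACK if that ACK was lost and the peer retransmits its FIN. Once the 2MSL wait is over, the socket enters CLOSED.
Passive close: when the program receives a FIN from the peer and replies with an ACK, its socket moves to CLOSE_WAIT. Because the peer has closed its side, the peer can no longer send data, but the program still can. To finish closing the connection, the program sends its own FIN to the peer, which moves its socket to LAST_ACK; when it receives the peer's ACK, the socket enters CLOSED.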
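The close-down transitions described above can be written out as a small lookup. This is only an illustrative sketch: the function name `next_state` and the event names (`close`, `recv_fin`, `recv_ack`, `timeout_2msl`) are made up for this example, and simultaneous close (the CLOSING state) is left out.

```shell
# next_state STATE EVENT: print the TCP state that follows STATE when
# EVENT occurs, covering only the close-down paths described above.
next_state() {
    case "$1:$2" in
        ESTABLISHED:close)      echo FIN_WAIT_1 ;;  # active close: we send FIN
        FIN_WAIT_1:recv_ack)    echo FIN_WAIT_2 ;;  # peer ACKed our FIN
        FIN_WAIT_2:recv_fin)    echo TIME_WAIT ;;   # peer sent its FIN, we ACK it
        TIME_WAIT:timeout_2msl) echo CLOSED ;;      # 2MSL wait has elapsed
        ESTABLISHED:recv_fin)   echo CLOSE_WAIT ;;  # passive close: we ACK the peer's FIN
        CLOSE_WAIT:close)       echo LAST_ACK ;;    # we send our own FIN
        LAST_ACK:recv_ack)      echo CLOSED ;;      # peer ACKed our FIN
        *)                      echo "$1" ;;        # event not modeled here: state unchanged
    esac
}

# Walk the active-close path one event at a time.
state=ESTABLISHED
for event in close recv_ack recv_fin timeout_2msl; do
    state=$(next_state "$state" "$event")
    echo "$event -> $state"
done
```

If the `recv_fin` event never arrives, the walk stops at FIN_WAIT_2, which is the situation seen in the netstat listing.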

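TCP tears down a connection with a four-way handshake, as shown in the diagram below.
FIN_WAIT_1: after the client sends its FIN, its state changes to FIN_WAIT_1. When the server receives the client's FIN, it moves to CLOSE_WAIT and immediately sends an ACK.
FIN_WAIT_2: in essence, this state means that only the first FIN/ACK exchange has completed and the client is still waiting for the server's FIN. If the peer is overloaded, or the connection was terminated abnormally, the peer never sends that FIN. The official description is: Socket closed, waiting for shutdown from remote.

Client                                                              Server
    1 ----------------------------FIN------------------------------>
FIN_WAIT_1                                                          CLOSE_WAIT
    2 <---------------------------ACK-------------------------------
FIN_WAIT_2
    3 <---------------------------FIN-------------------------------
                                                                    LAST_ACK
    4 ----------------------------ACK------------------------------>
TIME_WAIT                                                           CLOSED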

Impact:
If too many FIN_WAIT_2 connections build up, they can fill the space allocated for storing connection information and crash the kernel.

Resolution or workaround:
The right way to handle this problem is for the TCP/IP stack to have a FIN_WAIT_2 timer that shuts down sockets stuck in the FIN_WAIT_2 state.

How long FIN_WAIT_2 sockets stay in that state depends on the "tcp_fin_wait_2_timeout" TCP/IP parameter. By default, HP-UX keeps FIN_WAIT_2 sockets around forever. To find out the current value, run:
#ndd -get /dev/tcp tcp_fin_wait_2_timeout

To change this value to, say, 1 hour (the value is given in milliseconds), run:
#ndd -set /dev/tcp tcp_fin_wait_2_timeout 3600000
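Because the value is given in milliseconds (1 hour = 3600000), it is easy to be off by a factor of 1000. The following is a small sketch for deriving the number; the helper name `minutes_to_ms` and the 10-minute figure are illustrative assumptions, not recommendations from the original note, and the ndd command is only printed, not executed.

```shell
# minutes_to_ms MINUTES: convert a duration in minutes to milliseconds,
# the unit implied by the example above (1 hour = 3600000).
minutes_to_ms() {
    echo $(( $1 * 60 * 1000 ))
}

# Print (do not run) the ndd command for a hypothetical 10-minute timeout.
timeout_ms=$(minutes_to_ms 10)
echo "ndd -set /dev/tcp tcp_fin_wait_2_timeout $timeout_ms"
```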

A change made with the ndd command above takes effect immediately, but it is lost when the system is rebooted. To make the change permanent, edit the /etc/rc.config.d/nddconf file. With "tcp_fin_wait_2_timeout" set to 1 hour, FIN_WAIT_2 sockets are closed after 1 hour.
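For the persistent setting, an entry in /etc/rc.config.d/nddconf might look like the fragment below. This is a sketch under one assumption: index 0 is used only for illustration, so if the file already holds other entries, use the next unused index instead.

```shell
# /etc/rc.config.d/nddconf
# Index 0 is an assumption; pick the next free index if other
# TRANSPORT_NAME/NDD_NAME/NDD_VALUE entries already exist.
TRANSPORT_NAME[0]=tcp
NDD_NAME[0]=tcp_fin_wait_2_timeout
NDD_VALUE[0]=3600000
```

On HP-UX, `ndd -c` is documented to apply the entries in this file without a reboot; check the ndd(1M) man page on your release before relying on it.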

The FIN_WAIT_2 timer must be used with caution because when TCP is in the FIN_WAIT_2 state the remote is still allowed to send data. In addition, if the remote TCP would terminate normally (it is not hung nor terminating abnormally) and the connection is closed because of the FIN_WAIT_2 timer, the connection may be closed prematurely.

Data may be lost if the remote sends a window update or FIN after the local TCP has closed the connection. In this situation, the local TCP will send a RESET. According to the TCP protocol specification, the remote TCP should flush its receive queue when it receives the RESET. This may cause data to be lost.


Source: ITPUB blog, link: http://blog.itpub.net/223653/viewspace-1672341/. If you repost this article, please credit the source; otherwise legal liability may be pursued.
