We have a production issue with only one of our servers, and we have correlated the slow performance to an abundance of sockets in the TIME_WAIT state. Without drawing this question into a huge backstory, we basically know that every time the server is slow, about 80% of the server's sockets are in this TIME_WAIT state, which we can see by running netstat. Specifically, because TIME_WAIT sockets eventually time out and go away, when our server is slow we see these TIME_WAITs crop up very frequently (about every 5 - 10 minutes).
I did a little digging and see that a socket goes into TIME_WAIT when the server closes an active connection but keeps it around in case any delayed packets come through. Eventually the TIME_WAIT state times out.
Is there any way to see exactly why an individual socket went into the TIME_WAIT state to begin with? This is CentOS 5 - does Linux log this info anywhere under /var/log, or is there a way to run a tcpdump and look for a specific pattern that leads to a TIME_WAIT? Thanks in advance.
Answer
Short answer - it is due to an app. The app creates sockets for a short time, closes them, and then immediately needs to open another socket. The sluggishness comes from the process(es) running out of sockets (local ports) to use.
When creating a socket there are two options - SO_REUSEADDR and SO_REUSEPORT. They have somewhat similar functions, but I suspect SO_REUSEPORT is not available on CentOS 5. Either way, setting the option on the socket allows the local address and port to be reused immediately.
So, a commonly used fix is to recode the app. It is probably a network app that connects for a few seconds and then ends the session.
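For illustration, here is a minimal C sketch of setting SO_REUSEADDR before binding; the address, port, and error handling are placeholders and not taken from the original app.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        /* Create a TCP socket. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");
            return 1;
        }

        /* Ask the kernel to let us rebind this local address even if an
         * earlier connection on it is still sitting in TIME_WAIT. */
        int one = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0) {
            perror("setsockopt(SO_REUSEADDR)");
            close(fd);
            return 1;
        }

        /* Hypothetical local address/port - placeholders for whatever the
         * real application actually binds to. */
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            close(fd);
            return 1;
        }

        /* ... connect/listen, do the short-lived work, then close ... */
        close(fd);
        return 0;
    }

Note that the option has to be set before bind(); it will not help a socket that has already been bound.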