We have a production issue with only one of our servers and have correlated slow performance to an abundance of sockets in the TIME_WAIT state. Without drawing this question into a huge backstory, we basically know that every time the server is slow, about 80% of the server's sockets are in this TIME_WAIT state, which of course we see by running netstat. Specifically, because TIME_WAIT sockets time out and go away, when our server is slow we see these TIME_WAITs crop up very frequently (about every 5-10 minutes).
I did a little digging and see that TIME_WAITs occur when the server closes an active connection but keeps it around in case any delayed packets come through. Eventually the TIME_WAIT times out.
Is there any way to see exactly why an individual socket went into the TIME_WAIT state to begin with? This is CentOS 5 - does Linux log this info anywhere in /var/log, or is there any way to do a tcpdump and look for a specific pattern that leads to a TIME_WAIT? Thanks in advance.
Answer
Short answer - it is due to an app. The app creates sockets for a short time, closes them, then immediately needs to open another socket. The sluggishness is related to the process(es) running out of sockets to use.
When creating a socket there are two relevant options - SO_REUSEADDR and SO_REUSEPORT. They have somewhat similar functions, but I suspect that on CentOS 5 SO_REUSEPORT is not available. Either way, setting this option on the socket allows the port to be immediately reused.
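For illustration only, here is a minimal sketch of setting SO_REUSEADDR on a listening socket before bind(); this is not the original poster's code, and the port number is just a placeholder:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        /* Allow bind() to succeed even if the local address/port is still
           occupied by sockets lingering in TIME_WAIT. */
        int yes = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0) {
            perror("setsockopt(SO_REUSEADDR)");
            close(fd);
            return 1;
        }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);   /* placeholder port */

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            close(fd);
            return 1;
        }

        listen(fd, 128);
        /* ... accept loop would go here ... */
        close(fd);
        return 0;
    }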
So, a commonly used fix is to recode. It is probably a network app that connects for a few seconds and then ends the session.
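To make the "recode" suggestion concrete, here is a purely hypothetical sketch (we don't know what the app actually does) of reusing one long-lived connection for many requests instead of doing connect()/close() around each one; the host, port, and "PING" protocol are invented for the example:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    /* Instead of one socket per request (each close() leaving a TIME_WAIT
       behind on the side that closes first), connect once and reuse it. */
    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in srv;
        memset(&srv, 0, sizeof(srv));
        srv.sin_family = AF_INET;
        srv.sin_port = htons(8080);                      /* placeholder port */
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);  /* placeholder host */

        if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
            perror("connect");
            return 1;
        }

        /* Many requests over the same connection. */
        for (int i = 0; i < 10; i++) {
            const char *req = "PING\n";
            if (write(fd, req, strlen(req)) < 0) { perror("write"); break; }
            char buf[256];
            if (read(fd, buf, sizeof(buf)) <= 0) break;
        }

        close(fd);   /* only one TIME_WAIT, at the very end */
        return 0;
    }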