Chad David wrote:
> Has anyone noticed (or fixed) a bug in -current where socket connections
> on the local machine do not shut down properly?  During stress testing
> I'm seeing thousands (2316 right now) of these:
> 
> tcp4       0      0  192.168.1.2.8080       192.168.1.2.2215       FIN_WAIT_2
> tcp4       0      0  192.168.1.2.2215       192.168.1.2.8080       LAST_ACK
> 
> Both the client and the server are dead, but the connections stay in this
> state.
> 
> I tested with the server on -current and the client on another box, and
> all of the server sockets end up in TIME_WAIT.  Is there something delaying
> the last ack on local connections?

A connection goes into FIN_WAIT_2 when it has sent a FIN and
received the ACK of that FIN, but has not yet received (and
ACKed) a FIN from the other side; once it has, it enters the
TIME_WAIT state for 2MSL before proceeding to CLOSED.  This
happens on the side that initiated the close (here, the server).

A connection goes into LAST_ACK when, having received the other
side's FIN, it has sent its own FIN and is waiting for the ACK
of that FIN before proceeding to CLOSED.  This happens on the
side that closes in response (here, the client).
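
For reference, the close handshake from RFC 793, with the server
as the active closer (which is what your netstat output implies),
looks like this:

    server (closes first)               client (closes second)
    FIN_WAIT_1    -- FIN -->            CLOSE_WAIT
    FIN_WAIT_2    <-- ACK --            CLOSE_WAIT
                  <-- FIN --            LAST_ACK
    TIME_WAIT     -- ACK -->            CLOSED
    (CLOSED after 2MSL)

Read against your output: the client has sent its FIN (it is in
LAST_ACK), but the server has never seen it (it is still in
FIN_WAIT_2), so that FIN is going missing somewhere between the
two sockets.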

Since it's showing IP addresses, you appear to be using real
network connections, rather than loopback connections.

There are several ways to cause this:

1)      You have something on your network, like a dummynet,
        that is deterministically dropping either the ACK to
        the client when the server leaves FIN_WAIT_1, so that
        the server goes to CLOSING instead of FIN_WAIT_2
        (client closes first), or the FIN in the other
        direction, so that the server doesn't go from
        FIN_WAIT_2 to TIME_WAIT (server closes first).  A
        quick way to check for this is noted after this list.

2)      You have intentionally disabled KEEPALIVE, so that a
        close results in an RST instead of a normal shutdown
        of the TCP connection.  (I can't tell whether you are
        making a real call to "shutdown(2)", or just relying
        on the OS resource-tracking behaviour that is implicit
        in "close(2)"; see the sketch after this list.  The
        RST behaviour applies only if you don't set KEEPALIVE
        on the socket and have disabled the sysctl default,
        "net.inet.tcp.always_keepalive", of doing KEEPALIVE
        on every connection.)  In that case, it's possible
        that the RST was lost on the wire, and since RSTs are
        not retransmitted, you have shot yourself in the foot.

        Note:   You often see this type of foolish foot-
                shooting when running MAST, WAST, or
                webbench, which try to factor out response
                speed and measure connection speed, so that
                they benchmark the server rather than the FS
                or other OS latencies in the document delivery
                path (which is why these tools suck as real-
                world benchmarks).  You could also cause this
                (unlikely) with a bad firewall rule.

3)      You've exhausted your mbufs before exhausting the
        number of simultaneous connections you are permitted,
        because you have incorrectly tuned your kernel, and
        therefore all your connections are sitting in a
        starvation deadlock, waiting for packets that can never
        be sent because there are no mbufs available (a quick
        check is noted after this list).

4)      You've got local hacks that you aren't telling us
        about (shame on you!).

5)      You have found an introduced bug in -current.

        Note:   I personally think this one is unlikely.

6)      Maybe something I haven't thought of...

        Note:   I personally think this one is unlikely,
                too... ;^)
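
As for checking (1): "ipfw list" and "ipfw pipe show" will tell
you quickly whether any dummynet pipes or drop rules are
configured on the box (assuming ipfw is how you would have set
one up).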
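
Regarding (2): if you want the close to be explicit rather than
implicit, the usual pattern is a "shutdown(2)" followed by
draining the socket until the peer's FIN shows up.  A minimal
sketch (error handling trimmed; "fd" is assumed to be an
already-connected TCP socket):

    #include <sys/socket.h>
    #include <unistd.h>

    /*
     * Graceful close: send our FIN explicitly, drain until the
     * peer's FIN arrives (read(2) returns 0), then release the
     * descriptor.  Contrast with a bare close(2), which leaves
     * the FIN exchange to the OS resource tracking.
     */
    void
    graceful_close(int fd)
    {
            char buf[512];

            shutdown(fd, SHUT_WR);  /* we are done writing */
            while (read(fd, buf, sizeof(buf)) > 0)
                    ;               /* discard until peer's FIN */
            close(fd);
    }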
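
As a quick check for (3): "netstat -m" shows mbuf and mbuf
cluster usage against the limits, and "sysctl
kern.ipc.nmbclusters" shows what the kernel was tuned for; if
you are pegged at the limit, that's your answer.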

See RFC 793 (or Stevens) for details on the state machine for
both ends of the connection, and you will see how your machine
got into this mess in the first place.

-- Terry
