On (05/02/16 09:20), Santosh Shilimkar wrote:
> >	rds_conn_transition(conn, RDS_CONN_DOWN, RDS_CONN_CONNECTING);
> >+	if (rs_tcp->t_sock) {
> >+		/* Need to resolve a duelling SYN between peers.
> >+		 * We have an outstanding SYN to this peer, which may
> >+		 * potentially have transitioned to the RDS_CONN_UP state,
> >+		 * so we must quiesce any send threads before resetting
> >+		 * c_transport_data.
> >+		 */
> >+		wait_event(conn->c_waitq,
> >+			   !test_bit(RDS_IN_XMIT, &conn->c_flags));
> Would it be good to check the return value of rds_conn_transition()
> since if CONN is already UP above will fail and then send message
> might again race and we will let message through even though passive
> hasn't finished its connection.
No, that was the original issue I was running into, which needed
commit 241b2719. Prior to that commit, if the conn was already UP, we'd
end up doing an rds_conn_drop on a good connection, and both sides would
end up in a pair of infinite 3WH (three-way handshake) loops.

Even if we don't do an rds_conn_drop on the UP connection, we've just
(before rds_tcp_accept_one) sent out a SYN-ACK on the incoming SYN, and
now need to RST that SYN-ACK. The other side is going to receive the RST
and get confused about what to clean up, since there's already an UP
connection going on.

In short, when there is a duel, it's cleanest to have a deterministic
arbitration: both sides use the numeric values of saddr and faddr to
figure out which side is active and which side is passive. (That is the
basis for the BGP router-id based model in 241b2719.)

FWIW, much of this is actually a corner case: in practice, it's not
common for SYNs to cross each other at "almost the same time".

--Sowmini