On 21/10/19 - 14:27:24, Jason Baron wrote: > > > On 10/21/19 2:02 PM, Yuchung Cheng wrote: > > Thanks for the patch. Detailed comments below > > > > On Fri, Oct 18, 2019 at 4:58 PM Neal Cardwell <ncardw...@google.com> wrote: > >> > >> On Fri, Oct 18, 2019 at 3:03 PM Jason Baron <jba...@akamai.com> wrote: > >>> > >>> The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports > >>> whether > >>> or not data-in-SYN was ack'd on both the client and server side. We'd like > >>> to gather more information on the client-side in the failure case in order > >>> to indicate the reason for the failure. This can be useful for not only > >>> debugging TFO, but also for creating TFO socket policies. For example, if > >>> a middle box removes the TFO option or drops a data-in-SYN, we can > >>> can detect this case, and turn off TFO for these connections saving the > >>> extra retransmits. > >>> > >>> The newly added tcpi_fastopen_client_fail status is 2 bits and has 4 > >>> states: > >>> > >>> 1) TFO_STATUS_UNSPEC > >>> > >>> catch-all. > >>> > >>> 2) TFO_NO_COOKIE_SENT > >>> > >>> If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie > >>> was sent because we don't have one yet, its not in cache or black-holing > >>> may be enabled (already indicated by the global > >>> LINUX_MIB_TCPFASTOPENBLACKHOLE). > > > > It'd be useful to separate the two that cookie is available but is > > prohibited to use due to BH checking. We've seen users internally get > > confused due to lack of this info (after seeing cookies from ip > > metrics). > > > > ok, yeah i had been thinking about splitting these out but thought that > the LINUX_MIB_TCPFASTOPENBLACKHOLE counter could help differentiate > these cases - but I'm ok making it explicit. > > >>> > >>> 3) TFO_NO_SYN_DATA > >>> > >>> Data was sent with SYN, we received a SYN/ACK but it did not cover the > >>> data > >>> portion. Cookie is not accepted by server because the cookie may be > >>> invalid > >>> or the server may be overloaded. > >>> > >>> > >>> 4) TFO_NO_SYN_DATA_TIMEOUT > >>> > >>> Data was sent with SYN, we received a SYN/ACK which did not cover the data > >>> after at least 1 additional SYN was sent (without data). It may be the > >>> case > >>> that a middle-box is dropping data-in-SYN packets. Thus, it would be more > >>> efficient to not use TFO on this connection to avoid extra retransmits > >>> during connection establishment. > >>> > >>> These new fields certainly not cover all the cases where TFO may fail, but > >>> other failures, such as SYN/ACK + data being dropped, will result in the > >>> connection not becoming established. And a connection blackhole after > >>> session establishment shows up as a stalled connection. > >>> > >>> Signed-off-by: Jason Baron <jba...@akamai.com> > >>> Cc: Eric Dumazet <eduma...@google.com> > >>> Cc: Neal Cardwell <ncardw...@google.com> > >>> Cc: Christoph Paasch <cpaa...@apple.com> > >>> --- > >> > >> Thanks for adding this! > >> > >> It would be good to reset tp->fastopen_client_fail to 0 in > >> tcp_disconnect(). > >> > >>> +/* why fastopen failed from client perspective */ > >>> +enum tcp_fastopen_client_fail { > >>> + TFO_STATUS_UNSPEC, /* catch-all */ > >>> + TFO_NO_COOKIE_SENT, /* if not in TFO_CLIENT_NO_COOKIE mode */ > >>> + TFO_NO_SYN_DATA, /* SYN-ACK did not ack SYN data */ > >> > >> I found the "TFO_NO_SYN_DATA" name a little unintuitive; it sounded to > >> me like this means the client didn't send a SYN+DATA. What about > >> "TFO_DATA_NOT_ACKED", or something like that? > >> > >> If you don't mind, it would be great to cc: Yuchung on the next rev. > > TFO_DATA_NOT_ACKED is already available from the inverse of > > TCPI_OPT_SYN_DATA > > #define TCPI_OPT_SYN_DATA 32 /* SYN-ACK acked data in SYN sent or > > rcvd */ > > > > It occurs (3)(4) are already available indirectly from > > TCPI_OPT_SYN_DATA and tcpi_total_retrans together, but the socket must > > query tcpi_total_retrans right after connect/sendto returns which may > > not be preferred. > > > > How about an alternative proposal to the types to catch more TFO issues: > > > > TFO_STATUS_UNSPEC > > TFO_DISABLED_BLACKHOLE_DETECTED > > TFO_COOKIE_UNAVAILABLE > > TFO_SYN_RETRANSMITTED // use in conjunction w/ TCPI_OPT_SYN_DATA for (3)(4) > > Ok, that set works for me. I will re-spin with these states for v2. > Thanks for the suggestion!
Actually, longterm I hope we would be able to get rid of the blackhole-detection and fallback heuristics. In a far distant future where these middleboxes have been weeded out ;-) So, do we really want to eternalize this as part of the API in tcp_info ? Christoph