On Tue, Mar 2, 2021 at 1:37 PM Eric Dumazet <eduma...@google.com> wrote:
>
> On Tue, Mar 2, 2021 at 7:08 AM Jakub Kicinski <k...@kernel.org> wrote:
> >
> > When receiver does not accept TCP Fast Open it will only ack
> > the SYN, and not the data. We detect this and immediately queue
> > the data for (re)transmission in tcp_rcv_fastopen_synack().
> >
> > In DC networks with very low RTT and without RFS the SYN-ACK
> > may arrive before NIC driver reported Tx completion on
> > the original SYN. In which case skb_still_in_host_queue()
> > returns true and sender will need to wait for the retransmission
> > timer to fire milliseconds later.
> >
> > Revert back to non-fast clone skbs, this way
> > skb_still_in_host_queue() won't prevent the recovery flow
> > from completing.
> >
> > Suggested-by: Eric Dumazet <eduma...@google.com>
> > Fixes: 355a901e6cf1 ("tcp: make connect() mem charging friendly")
>
> Hmmm, not sure if this Fixes: tag makes sense.
>
> Really, if we delay TX completions by say 10 ms, other parts of the
> stack will misbehave anyway.
>
> Also, backporting this patch up to linux-3.19 is going to be tricky.
>
> The real issue here is that skb_still_in_host_queue() can give a false
> positive.
>
> I have mixed feelings here, as you can read my answer :/
>
> Maybe skb_still_in_host_queue() signal should not be used when a part
> of the SKB has been received/acknowledged by the remote peer
> (in this case the SYN part).
>
> Alternative is that drivers unable to TX complete their skbs in a
> reasonable time should call skb_orphan()
> to avoid skb_unclone() penalties (and this skb_still_in_host_queue() issue)
>
> If you really want to play and delay TX completions, maybe provide a
> way to disable skb_still_in_host_queue() globally,
> using a static key ?
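A minimal user-space model of the check under discussion may make the false positive concrete. It assumes, loosely following skb_fclone_busy() in include/linux/skbuff.h, that the skb kept in the retransmit queue counts as "still in host queue" while its state is SKB_FCLONE_ORIG and the fast clone's reference has not yet been dropped by TX completion (the real helper also checks that the clone still belongs to the same socket, which is omitted here). struct model_fclones, the syn_acked flag and the host_queue_check_enabled bool (a plain variable standing in for the suggested static key) are hypothetical names for illustration only, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the three existing fclone states in include/linux/skbuff.h. */
enum fclone_state {
	SKB_FCLONE_UNAVAILABLE,
	SKB_FCLONE_ORIG,
	SKB_FCLONE_CLONE,
};

/* Hypothetical user-space stand-in for struct sk_buff_fclones. */
struct model_fclones {
	enum fclone_state orig_state; /* state of the skb in the rtx queue */
	int fclone_ref;               /* 1: only orig alive, 2: clone not yet TX completed */
	bool syn_acked;               /* hypothetical: peer already ACKed the SYN part */
};

/* Plain bool standing in for a static key (hypothetical name). */
static bool host_queue_check_enabled = true;

/* Loosely follows skb_fclone_busy(): "busy" while the fast clone is alive. */
static bool model_fclone_busy(const struct model_fclones *f)
{
	assert(f->fclone_ref >= 1);
	return f->orig_state == SKB_FCLONE_ORIG && f->fclone_ref > 1;
}

/* Current behavior: retransmission is deferred whenever the clone is busy. */
static bool model_still_in_host_queue(const struct model_fclones *f)
{
	return host_queue_check_enabled && model_fclone_busy(f);
}

/* Variant: ignore the signal once part of the skb (the SYN) was ACKed. */
static bool model_still_in_host_queue_partial(const struct model_fclones *f)
{
	if (f->syn_acked)
		return false;
	return model_still_in_host_queue(f);
}
```

With the SYN already ACKed but the TX completion still pending (fclone_ref == 2), model_still_in_host_queue() returns true, which is the false positive described in the thread; both the partial-ACK variant and the global toggle avoid it, at the cost of losing the host-queue signal in those cases.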
The problem as I see it is that the original fclone isn't what we sent out on the wire, and that is confusing things. What we sent was a SYN with data, but what we have now is just a data frame that hasn't been put out on the wire yet. I wonder if we couldn't get away with adding a fourth option, SKB_FCLONE_MODIFIED, that we could apply to fastopen skbs? That would keep skb_still_in_host_queue() from triggering, since we would be changing the state from SKB_FCLONE_ORIG to SKB_FCLONE_MODIFIED for the skb we store in the retransmit queue. In addition, if we have to clone it again and the fclone reference count is 1, we could reset it back to SKB_FCLONE_ORIG.
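The SKB_FCLONE_MODIFIED idea can be sketched with a similar user-space model. This is a hypothetical illustration of the proposal, not existing kernel code: SKB_FCLONE_MODIFIED does not exist in mainline, and model_mark_fastopen_modified(), model_clone_for_xmit() and model_tx_complete() are invented names. The clone path loosely follows skb_clone(), which only reuses the companion fast clone when the state is SKB_FCLONE_ORIG and fclone_ref is 1, and otherwise falls back to allocating a regular clone.

```c
#include <assert.h>
#include <stdbool.h>

enum fclone_state {
	SKB_FCLONE_UNAVAILABLE,
	SKB_FCLONE_ORIG,
	SKB_FCLONE_CLONE,
	SKB_FCLONE_MODIFIED, /* proposed fourth state; hypothetical */
};

/* Hypothetical user-space stand-in for struct sk_buff_fclones. */
struct model_fclones {
	enum fclone_state orig_state; /* state of the skb in the rtx queue */
	int fclone_ref;               /* 2 while the fast clone is still alive */
};

/* Loosely follows skb_fclone_busy(): only SKB_FCLONE_ORIG can be "busy". */
static bool model_fclone_busy(const struct model_fclones *f)
{
	assert(f->fclone_ref >= 1);
	return f->orig_state == SKB_FCLONE_ORIG && f->fclone_ref > 1;
}

/* Hypothetical hook: the SYN was ACKed but not the data, so the rtx-queue
 * skb no longer matches what the in-flight clone carried. */
static void model_mark_fastopen_modified(struct model_fclones *f)
{
	if (f->orig_state == SKB_FCLONE_ORIG)
		f->orig_state = SKB_FCLONE_MODIFIED;
}

/* Returns true if the fast-clone slot was used, false if the caller
 * would need a regular (non-fast) clone instead. */
static bool model_clone_for_xmit(struct model_fclones *f)
{
	/* Old clone has been freed: safe to go back to SKB_FCLONE_ORIG. */
	if (f->orig_state == SKB_FCLONE_MODIFIED && f->fclone_ref == 1)
		f->orig_state = SKB_FCLONE_ORIG;

	if (f->orig_state == SKB_FCLONE_ORIG && f->fclone_ref == 1) {
		f->fclone_ref++; /* companion clone handed to the driver */
		return true;
	}
	return false;
}

/* Driver TX completion frees the fast clone, dropping its reference. */
static void model_tx_complete(struct model_fclones *f)
{
	assert(f->fclone_ref > 1);
	f->fclone_ref--;
}
```

Marking the skb as modified makes the busy check return false immediately, so the data could be queued for retransmission without waiting for the TX completion of the original SYN; once the driver frees the old clone, the next clone resets the state to SKB_FCLONE_ORIG and fast cloning resumes.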