On Wed, 28 Mar 2018, Yuchung Cheng wrote: > On Wed, Mar 28, 2018 at 7:14 AM, Yuchung Cheng <ych...@google.com> wrote: > > > > On Wed, Mar 28, 2018 at 5:45 AM, Ilpo Järvinen > > <ilpo.jarvi...@helsinki.fi> wrote: > > > On Tue, 27 Mar 2018, Yuchung Cheng wrote: > > > > > >> On Tue, Mar 27, 2018 at 7:23 AM, Ilpo Järvinen > > >> <ilpo.jarvi...@helsinki.fi> wrote: > > >> > On Mon, 26 Mar 2018, Yuchung Cheng wrote: > > >> > > > >> >> On Tue, Mar 13, 2018 at 3:25 AM, Ilpo Järvinen > > >> >> <ilpo.jarvi...@helsinki.fi> wrote: > > >> >> > > > >> >> > A miscalculation for the number of acknowledged packets occurs > > >> >> > during > > >> >> > RTO recovery whenever SACK is not enabled and a cumulative ACK > > >> >> > covers > > >> >> > any non-retransmitted skbs. The reason is that pkts_acked value > > >> >> > calculated in tcp_clean_rtx_queue is not correct for slow start > > >> >> > after > > >> >> > RTO as it may include segments that were not lost and therefore did > > >> >> > not need retransmissions in the slow start following the RTO. Then > > >> >> > tcp_slow_start will add the excess into cwnd bloating it and > > >> >> > triggering a burst. > > >> >> > > > >> >> > Instead, we want to pass only the number of retransmitted segments > > >> >> > that were covered by the cumulative ACK (and potentially newly sent > > >> >> > data segments too if the cumulative ACK covers that far). > > >> >> > > > >> >> > Signed-off-by: Ilpo Järvinen <ilpo.jarvi...@helsinki.fi> > > >> >> > --- > > >> >> > > >> >> My understanding is there are two problems > > >> >> > > >> >> 1) your fix: the reordering logic in tcp-remove_reno_sacks requires > > >> >> precise cumulatively acked count, not newly acked count? > > >> > > > >> > While I'm not entirely sure if you intented to say that my fix is > > >> > broken > > >> > or not, I thought this very difference alot while making the fix and I > > >> > believe that this fix is needed because of the discontinuity at RTO > > >> > (sacked_out is cleared as we set L-bits + lost_out). This is an > > >> > artifact > > >> > in the imitation of sacked_out for non-SACK but at RTO we can't keep > > >> > that > > >> > in sync because we set L-bits (and have no S-bits to guide us). Thus, > > >> > we > > >> > cannot anymore "use" those skbs with only L-bit for the reno_sacks > > >> > logic. > > >> > > > >> > In tcp_remove_reno_sacks acked - sacked_out is being used to calculate > > >> > tp->delivered, using plain cumulative acked causes congestion control > > >> > breakage later as call to tcp_cong_control will directly use the > > >> > difference in tp->delivered. > > >> > > > >> > This boils down the exact definition of tp->delivered (the one given in > > >> > the header is not detailed enough). I guess you might have better idea > > >> > what it exactly is since one of you has added it? There are subtle > > >> > things > > >> > in the defination that can make it entirely unsuitable for cc > > >> > decisions. > > >> > Should those segments that we (possibly) already counted into > > >> > tp->delivered during (potentially preceeding) CA_Recovery be added to > > >> > it > > >> > for _second time_ or not? This fix avoids such double counting (the > > >> Where is the double counting, assuming normal DUPACK behavior? > > >> > > >> In the non-sack case: > > >> > > >> 1. upon receiving a DUPACK, we assume one packet has been delivered by > > >> incrementing tp->delivered in tcp_add_reno_sack() > > > > > > 1b. RTO here. We clear tp->sacked_out at RTO (i.e., the discontinuity > > > I've tried to point out quite many times already)... > > > > > >> 2. upon receiving a partial ACK or an ACK that acks recovery point > > >> (high_seq), tp->delivered is incremented by (cumulatively acked - > > >> #dupacks) in tcp_remove_reno_sacks() > > > > > > ...and this won't happen correctly anymore after RTO (since non-SACK > > > won't keep #dupacks due to the discontinuity). Thus we end up adding > > > cumulatively acked - 0 to tp->delivered on those ACKs. > > > > > >> therefore tp->delivered is tracking the # of packets delivered > > >> (sacked, acked, DUPACK'd) with the most information it could have > > >> inferred. > > > > > > Since you didn't answer any of my questions about tp->delivered directly, > > > let me rephrase them to this example (non-SACK, of course): > > > > > > 4 segments outstanding. RTO recovery underway (lost_out=4, sacked_out=0). > > > Cwnd = 2 so the sender rexmits 2 out of 4. We get cumulative ACK for > > > three segments. How much should tp->delivered be incremented? 2 or 3? > > > > > > ...I think 2 is the right answer. > > > > > >> From congestion control's perspective, it cares about the delivery > > >> information (e.g. how much), not the sequences (what or how). > > > > > > I guess you must have missed my point. I'm talking about "how much" > > > whole the time. It's just about when can we account for it (and when not). > > > > > >> I am pointing out that > > >> > > >> 1) your fix may fix one corner CC packet accounting issue in the > > >> non-SACK and CA_Loss > > >> 2) but it does not fix the other (major) CC packet accounting issue in > > >> the SACK case > > > > > > You say major but it's major if and only if one is using one of the > > > affected cc modules which were not that many and IMHO not very mainstream > > > anyway (check my list from the previous email, not that I'd mind if they'd > > > get fixed too). The rest of the cc modules do not seem to use the > > > incorrect > > > value even if some of them have the pkts_acked callback. > > > > > > Other CC packet accounting (regardless of SACK) is based on tp->delivered > > > and tp->delivered is NOT similarly miscounted with SACK because the logic > > > depend on !TCPCB_SACKED_ACKED (that fails only if we have very high ACK > > > loss). > > > > > >> 3) it breaks the dynamic dupthresh / reordering in the non-SACK case > > >> as tcp_check_reno_reordering requires strictly cum. ack. > > > > > > I think that the current non-SACK reordering detection logic is not that > > > sound after RTO (but I'm yet to prove this). ...There seems to be some > > > failure modes which is why I even thought of disabling the whole thing > > > for non-SACK RTO recoveries and as it now seems you do care more about > > > non-SACK than you initial claimed, I might even have motivation to fix > > > more those corners rather than the worst bugs only ;-). ...But I'll make > > > the tp->delivered fix only in this patch to avoid any change this part of > > > the code. > > > > > >> Therefore I prefer > > >> a) using tp->delivered to fix (1)(2) > > >> b) perhaps improving tp->delivered SACK emulation code in the non-SACK > > >> case > > > > > > Somehow I get an impression that you might assume/say here that > > [resending in plaintext] > > That's wrong impression. Perhaps it's worth re-iterating what I agree > > and disagree > > > > 1. [agree] there's accounting issue in non-SACK as you discovered > > which causes CC misbehavior > > > > 2. [major disagree] adjusting pkts_acked for ca_ops->pkts_acked in non-sack > > => that field is not used by common C.C. (you said so too) > > > > 3. [disagree] adjusting pkts_acked may not affect reordering > > accounting in non-sack > > > > > > > > For cubic or reno, the main source is the "acked_sacked" passed into > > tcp_cong_avoid(). that variable is derived from tp->delivered. > > Therefore we need to fix that to address the problem in (1) > > > > I have yet to read your code. Will read later today. > Your patch looks good. Some questions / comments:
Just to be sure that we understand each other the correct way around this time, I guess it resolved both of your "disagree" points above? > 1. Why only apply to CA_Loss and not also CA_Recovery? this may > mitigate GRO issue I noted in other thread. Hmm, that's indeed a good idea. I'll give it some more thought but my initial impression is that it should work. I initially thought that the GRO issues just had to do with missing xmit opportunities and was confused by the use of "mitigate" here but then realized you meant also (or perhaps mainly?) that GRO causes similar bursts (once the too small sacked_out runs out). > 2. minor style nit: can we adjust tp->delivered directly in > tcp_clean_rtx_queue(). I am personally against calling "non-sack" reno > (e.g. tcp_*reno*()) because its really confusing w/ Reno congestion > control when SACK is used. One day I'd like to rename all these *reno* > to _nonsack_. That's what I actually did first but put ended up putting it into own function because of line lengths (not a particularly good reason). ...So yes, I can put it into the tcp_clean_rtx_queue in the next version. I'll also try to keep that "reno" thing in my mind. -- i.