On 6/2/2020 1:12 AM, Jakub Kicinski wrote:
This is a rare corner case anyway, where more than 1k TCP
connections sharing the same RX ring will request resync at the
same exact moment.

IDK about that. Certain applications are architected for max
capacity, not efficiency under steady load. So it matters a lot how
the system behaves under stress. What if this is the chain of
events:

overload -> drops -> TLS streams go out of sync -> all try to resync

I agree that this is not that rare, and it may be improved both in
future patches and in hardware. Do you think it is critical to improve
it now, rather than in a follow-up series?

It's not a blocker for me, although if this makes it into 5.8 there
will not be a chance to improve before net-next closes, so depends if
you want to risk it and support the code as is...


Hi Jakub,
Thanks for your comments.

This is just the beginning of this driver's offload support. I will continue working on enhancements and improvements in upcoming kernels.
We have several enhancements planned.

For now, if there are no real blockers, I think it's in good enough shape to start with and make it into the kernel.

IMHO, this specific issue of better handling resync failures in the driver can be addressed in stages:

1. As a fix: stop asking the stack for resync re-calls. If a resync attempt fails, terminate any further resync attempts for that specific connection. If there's room for a re-spin, I can provide one today. Otherwise, it is a simple fix that can be addressed in the early rc's in -net.
What do you think?
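To make the intent of stage 1 concrete, here is a minimal sketch of the terminate-on-failure semantics. The structure and function names are illustrative only, not the actual mlx5 code: once a resync request cannot be posted, the connection is marked and no further attempts are made, so the stack is never asked to re-call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection resync state; names are illustrative,
 * not the real driver structures. */
struct tls_rx_resync_state {
	bool resync_failed;	/* set once an attempt fails; never cleared */
	int  requests_sent;	/* resync requests successfully posted */
};

/* Simulate posting a resync request to the device. 'sq_has_room'
 * models whether the send queue can accept the request. */
static bool post_resync(struct tls_rx_resync_state *st, bool sq_has_room)
{
	if (st->resync_failed)
		return false;		/* terminated: no retries for this connection */
	if (!sq_has_room) {
		st->resync_failed = true;	/* fail once, stop asking the stack */
		return false;
	}
	st->requests_sent++;
	return true;
}
```

The key property is that a single failure permanently silences resync for that connection, trading recovery for simplicity until stage 2 lands.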

2. Recover: this is an enhancement for future kernels, where the driver internally and independently recovers from failed attempts and makes sure they are processed once there is enough room on the SQ again, without engaging the stack.
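Stage 2 could be sketched roughly as follows. This is only an assumed shape, with invented names, not the planned implementation: failed requests are parked on a per-ring pending list, and the driver's SQ completion path flushes them once space frees up, with no involvement from the TLS stack.

```c
#include <assert.h>
#include <stdbool.h>

#define PENDING_MAX 8	/* illustrative bound on parked requests */

/* Hypothetical per-ring recovery state. */
struct resync_recovery {
	int pending[PENDING_MAX];	/* connection ids awaiting resync */
	int npending;
	int completed;			/* requests eventually posted */
};

/* Entry point from the RX path: post now if the SQ has room,
 * otherwise park the request for later instead of dropping it. */
static void request_resync(struct resync_recovery *rr, int conn,
			   bool sq_has_room)
{
	if (sq_has_room) {
		rr->completed++;
		return;
	}
	if (rr->npending < PENDING_MAX)
		rr->pending[rr->npending++] = conn;
}

/* Called from the SQ completion path when 'room' slots free up:
 * drain parked requests without engaging the stack. */
static void flush_pending(struct resync_recovery *rr, int room)
{
	while (rr->npending > 0 && room-- > 0) {
		rr->npending--;
		rr->completed++;
	}
}
```

The design point is that the retry loop lives entirely inside the driver, driven by SQ completions, so the stack sees resync as fire-and-forget.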

Thanks,
Tariq
