On Thu, 28 May 2020 09:03:07 +0300 Boris Pismenny wrote: > On 20/05/2020 23:34, Jakub Kicinski wrote: > > On Wed, 20 May 2020 18:14:08 +0300 Tariq Toukan wrote: > >> From: Boris Pismenny <bor...@mellanox.com> > >> > >> In driver request resync, the hardware requests a resynchronization > >> request at some TCP sequence number. If that TCP sequence number does > >> not point to a TLS record header, then the resync attempt has failed. > >> > >> Failed resync should reset the resync request to avoid spurious resyncs > >> after the TCP sequence number has wrapped around. > >> > >> Fix this by resetting the resync request when the TLS record header > >> sequence number is not before the requested sequence number. > >> As a result, drivers may be called with a sequence number that is not > >> equal to the requested sequence number. > >> > >> Fixes: f953d33ba122 ("net/tls: add kernel-driven TLS RX resync") > >> Signed-off-by: Boris Pismenny <bor...@mellanox.com> > >> Signed-off-by: Tariq Toukan <tar...@mellanox.com> > >> Reviewed-by: Saeed Mahameed <sae...@mellanox.com> > >> --- > >> net/tls/tls_device.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c > >> index a562ebaaa33c..cbb13001b4a9 100644 > >> --- a/net/tls/tls_device.c > >> +++ b/net/tls/tls_device.c > >> @@ -714,7 +714,7 @@ void tls_device_rx_resync_new_rec(struct sock *sk, u32 > >> rcd_len, u32 seq) > >> seq += TLS_HEADER_SIZE - 1; > >> is_req_pending = resync_req; > >> > >> - if (likely(!is_req_pending) || req_seq != seq || > >> + if (likely(!is_req_pending) || before(seq, req_seq) || > > So the kernel is going to send the sync message to the device with at > > sequence number the device never asked about? > > Yes, although I would phrase it differently: the kernel would indicate to the > driver, > that the resync request is wrong, and that it can go back to searching for a > header. > If there are any drivers that need an extra check, then we can add it in the > driver itself.
I'd rather make the API clear and use a different op to indicate this is a reset rather than a valid sync response. sync callback already has the enum for sync type. > > Kernel usually can't guarantee that the notification will happen, > > (memory allocation errors, etc.) so the device needs to do the > > restarting itself. The notification should not be necessary. > > > > Usually, all is best effort, but in principle, reliability should be > guaranteed by higher layers to simplify the design. Since we're talking high level design perspectives here - IMO when you talk to FW on a device - it's a distributed system. The days when you could say driver on the host is higher layer ended when people started putting fat firmwares on the NICs. So no, restart has to be handled by the system making the request. In this case the NIC. > On the one hand, resync depends on packet arrival, which may take a while, > and implementing different heuristics in each driver to timeout is complex. > On the other hand, assuming the user reads the record data eventually, ktls > will be able to deliver the resync request, so implementing this in the tls > layer is simple. We definitely not want any driver logic here - the resync restart logic has to be implemented on the device. > In this case, I see no reason for the tls layer to fail --- did you have a > specific flow in mind? > AFAICT, there are no memory allocation/error flows that will prevent the > driver to receive a resync without an error on the socket (bad tls header). > If tls received the header, then the driver will receive the resync call, and > it will take responsibility for reliably delivering it to HW. So you're saying the request path and the response path for resync are both 100% lossless both on the NIC and the host? There is no scenario in which queue overflows, PCIe gets congested, etc.?