On Thu, 17 Oct 2019 16:48:25 -0700, Jakub Kicinski wrote:
> > The only patch that we have been able to make consistently work
> > without crashing and also without compromising performance, is the
> > previously submitted one where later thread bails out of
> > tls_tx_records. And as mentioned, it can perhaps be made more
> > efficient by rescheduling delayed work in the case where work handler
> > thread turns out to be the later thread that has to bail.
>
> Let me try to find a way to repro this reliably without any funky
> accelerators. The sleep in do_tcp_sendpages() should affect all cases.
> I should have some time today and tomorrow to look into this, bear with
> me..
Could you please try this?
---->8-----
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index c2b5e0d2ba1a..ab7b0af162a7 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1204,12 +1204,10 @@ static int tls_sw_do_sendpage(struct sock *sk, struct page *page,
 			goto alloc_payload;
 	}
-	if (num_async) {
-		/* Transmit if any encryptions have completed */
-		if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
-			cancel_delayed_work(&ctx->tx_work.work);
-			tls_tx_records(sk, flags);
-		}
+	/* Transmit if any encryptions have completed */
+	if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
+		cancel_delayed_work(&ctx->tx_work.work);
+		tls_tx_records(sk, flags);
 	}
 sendpage_end:
 	ret = sk_stream_error(sk, flags, ret);
@@ -2171,7 +2169,8 @@ static void tx_work_handler(struct work_struct *work)
 	if (!test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
 		return;
 	lock_sock(sk);
-	tls_tx_records(sk, -1);
+	if (!sk->sk_write_pending)
+		tls_tx_records(sk, -1);
 	release_sock(sk);
 }
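
For anyone following along: both hunks rely on the same pattern, where whichever context clears BIT_TX_SCHEDULED first is the one that goes on to call tls_tx_records(). Below is a rough user-space sketch of that claim step only, using C11 atomics in place of the kernel's test_and_clear_bit(). The names sketch_test_and_clear_bit(), sketch_claim_tx() and SKETCH_BIT_TX_SCHEDULED are made up for illustration and do not exist in net/tls; locking, cancel_delayed_work() and the sk_write_pending check are deliberately left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's BIT_TX_SCHEDULED bit index. */
#define SKETCH_BIT_TX_SCHEDULED 0

/*
 * User-space approximation of test_and_clear_bit(): atomically clear
 * bit 'nr' in 'mask' and report whether it was set beforehand.
 */
static bool sketch_test_and_clear_bit(unsigned int nr, atomic_ulong *mask)
{
	unsigned long bit = 1UL << nr;
	unsigned long old = atomic_fetch_and(mask, ~bit);

	return (old & bit) != 0;
}

/*
 * Returns true only for the caller that actually cleared the bit,
 * i.e. the one that should go on to transmit. A second caller racing
 * on the same bitmask sees false and backs off.
 */
static bool sketch_claim_tx(atomic_ulong *tx_bitmask)
{
	return sketch_test_and_clear_bit(SKETCH_BIT_TX_SCHEDULED, tx_bitmask);
}
```

With the bit set once, the first call to sketch_claim_tx() returns true and any later call returns false until the bit is set again, which is the property the sendpage path and tx_work_handler() both depend on.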