On Wed, Sep 18, 2019 at 5:45 PM Jakub Kicinski <jakub.kicin...@netronome.com> wrote: > > On Wed, 18 Sep 2019 17:37:44 -0400, Pooja Trivedi wrote: > > Hi Jakub, > > > > I have explained one potential way for the race to happen in my > > original message to the netdev mailing list here: > > https://marc.info/?l=linux-netdev&m=156805120229554&w=2 > > > > Here is the part out of there that's relevant to your question: > > > > ----------------------------------------- > > > > One potential way for race condition to appear: > > > > When under tcp memory pressure, Thread 1 takes the following code path: > > do_sendfile ---> ... ---> .... ---> tls_sw_sendpage ---> > > tls_sw_do_sendpage ---> tls_tx_records ---> tls_push_sg ---> > > do_tcp_sendpages ---> sk_stream_wait_memory ---> sk_wait_event > > Ugh, so do_tcp_sendpages() can also release the lock :/ > > Since the problem occurs in tls_sw_do_sendpage() and > tls_sw_do_sendmsg() as well, should we perhaps fix it at that level?
That won't do because tls_tx_records also gets called when completion callbacks schedule delayed work. That was the code path that caused the crash for my test. Cavium's nitrox crypto offload driver calling tls_encrypt_done, which calls schedule_delayed_work. Delayed work that was scheduled would then be processed by tx_work_handler. Notice in my previous reply, "Thread 2 code path: tx_work_handler ---> tls_tx_records" "Thread 2 code path: tx_work_handler ---> tls_tx_records"