On Wed, 20 May 2020 22:39:11 +0530 Vinay Kumar Yadav wrote: > On 5/20/2020 12:46 AM, David Miller wrote: > > From: Vinay Kumar Yadav <vinay.ya...@chelsio.com> > > Date: Tue, 19 May 2020 13:13:27 +0530 > > > >> + spin_lock_bh(&ctx->encrypt_compl_lock); > >> + pending = atomic_read(&ctx->encrypt_pending); > >> + spin_unlock_bh(&ctx->encrypt_compl_lock); > > The sequence: > > > > lock(); > > x = p->y; > > unlock(); > > > > Does not fix anything, and is superfluous locking. > > > > The value of p->y can change right after the unlock() call, so you > > aren't protecting the atomic'ness of the read and test sequence > > because the test is outside of the lock. > > Here, by using lock I want to achieve atomicity of following statements. > > pending = atomic_dec_return(&ctx->decrypt_pending); > if (!pending && READ_ONCE(ctx->async_notify)) > complete(&ctx->async_wait.completion); > > means, don't want to read (atomic_read(&ctx->decrypt_pending)) > in middle of two statements > > atomic_dec_return(&ctx->decrypt_pending); > and > complete(&ctx->async_wait.completion); > > Why am I protecting only read, not test ?
Protecting code, not data, is rarely correct, though. > complete() is called only if pending == 0 > if we read atomic_read(&ctx->decrypt_pending) = 0 > that means complete() is already called and its okay to > initialize completion (reinit_completion(&ctx->async_wait.completion)) > > if we read atomic_read(&ctx->decrypt_pending) as non zero that means: > 1- complete() is going to be called or > 2- complete() already called (if we read atomic_read(&ctx->decrypt_pending) > == 1, then complete() is called just after unlock()) > for both scenario its okay to go into wait (crypto_wait_req(-EINPROGRESS, > &ctx->async_wait)) First of all thanks for the fix, this completion code is unnecessarily complex and brittle if you ask me. That said I don't think your fix is 100%. Consider this scenario: # 1. writer queues first record on CPU0 # 2. encrypt completes on CPU1 pending = atomic_dec_return(&ctx->decrypt_pending); # pending is 0 # IRQ comes and CPU1 goes off to do something else with spin lock held # writer proceeds to encrypt next record on CPU0 # writer is done, enters wait smp_store_mb(ctx->async_notify, true); # Now CPU1 is back from the interrupt, does the check if (!pending && READ_ONCE(ctx->async_notify)) complete(&ctx->async_wait.completion); # and it completes the wait, even though the atomic decrypt_pending was # bumped back to 1 You need to hold the lock around the async_notify false -> true transition as well. The store no longer needs to have a barrier. For async_notify true -> false transitions please add a comment saying that there can be no concurrent accesses, since we have no pending crypt operations. Another way to solve this would be to add a large value to the pending counter to indicate that there is a waiter: if (atomic_add_and_fetch(&decrypt_pending, 1000) > 1000) wait(); else reinit(); atomic_sub(decrypt_pending, 1000) completion: if (atomic_dec_return(&decrypt_pending) == 1000) complete()