On 01/30/2019 09:01 PM, Alexei Starovoitov wrote: > On Wed, Jan 30, 2019 at 04:32:12PM -0500, Waiman Long wrote: >> On 01/30/2019 04:11 PM, Waiman Long wrote: >>> On 01/30/2019 03:10 PM, Alexei Starovoitov wrote: >>>> On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote: >>>>> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote: >>>>>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote: >>>>>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote: >>>>>>>> Lockdep warns about false positive: >>>>>>> This is not a false positive, and you probably also need to use >>>>>>> down_read_non_owner() to match this up_read_non_owner(). >>>>>>> >>>>>>> {up,down}_read() and {up,down}_read_non_owner() are not only different >>>>>>> in the lockdep annotation; there is also optimistic spin stuff that >>>>>>> relies on 'owner' tracking. >>>>>> Can you point out in the code the spin bit? >>>>>> As far as I can see sem->owner is debug only feature. >>>>>> All owner checks are done under CONFIG_DEBUG_RWSEMS. >>>>> No, sem->owner is mainly for performing optimistic spinning which is a >>>>> performance feature to make rwsem writer-lock performs similar to mutex. >>>>> The debugging part is just an add-on. It is not the reason for the >>>>> presence of sem->owner. >>>> I see. Got it. >>>> >>>>>> Also there is no down_read_trylock_non_owner() at the moment. >>>>>> We can argue about it for -next, but I'd rather silence lockdep >>>>>> with this patch today. >>>>>> >>>>> We can add down_read_trylock_non_owner() if there is a need for it. It >>>>> should be easy to do. >>>> Yes, but looking through the code it's not clear to me that it's safe >>>> to mix non_owner() versions with regular. >>>> bpf/stackmap.c does down_read_trylock + up_read. >>>> If we add new down_read_trylock_non_owner that set the owner to >>>> NULL | RWSEM_* bits is this safe with conccurent read/write >>>> that do regular versions? >>>> rwsem_can_spin_on_owner() does: >>>> if (owner) { >>>> ret = is_rwsem_owner_spinnable(owner) && >>>> owner_on_cpu(owner); >>>> that looks correct. >>>> For a second I thought there could be fault here due to non_owner. >>>> But there could be other places where it's assumed that owner >>>> is never null? >>> The content of owner is not the cause of the lockdep warning. The >>> lockdep code assumes that the task that acquires the lock will release >>> it some time later. That is not the case when you need to acquire the >>> lock by one task and released by another. In this case, you have to use >>> the non_owner version of down/up_read which disable the lockdep >>> acquire/release tracking. That will be the only difference between the >>> two set of APIs. >>> >>> Cheers, >>> Longman >> BTW, you may want to do something like that to make sure that the lock >> ownership is probably tracked. >> >> -Longman >> >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c >> index d43b145..79eef9d 100644 >> --- a/kernel/bpf/stackmap.c >> +++ b/kernel/bpf/stackmap.c >> @@ -338,6 +338,13 @@ static void stack_map_get_build_id_offset(struct >> bpf_stack_ >> } else { >> work->sem = ¤t->mm->mmap_sem; >> irq_work_queue(&work->irq_work); >> + >> + /* >> + * The irq_work will release the mmap_sem with >> + * up_read_non_owner(). The rwsem_release() is called >> + * here to release the lock from lockdep's perspective. >> + */ >> + rwsem_release(¤t->mm->mmap_sem.dep_map, 1, _RET_IP_); > are you saying the above is enough coupled with up_read_non_owner? > Looking at how amdgpu is using this api... I think they just use non_owner > version when doing things from different task. > So I don't think pairing non_owner with non_owner is strictly necessary. > It seems fine to use down_read_trylock() with above rwsem_release() hack > plus up_read_non_owner(). > Am I missing something? > The statement above is to clear the state for the lockdep so that it knows that the task no longer owns the lock. Without doing that, there is a possibility of some other kind of incorrect lockdep warning may be produced because the code will still think the task hold a read-lock on the mmap_sem. It is also possible no warning will be reported.
The bpf code is kind of special that it acquires the mmap_sem. Later on, it either releases it itself (non-NMI) or request irq_work to release it (NMI). So either, you use the _non_owner versions for both acquire and release or fake the release like the code segment above. Peter, please chime in if you have other suggestion. Cheers, Longman