On 3/31/26 01:27, Kuniyuki Iwashima wrote:
> On Mon, Mar 30, 2026 at 4:04 PM Michal Luczaj <[email protected]> wrote:
>> On 3/26/26 07:26, Martin KaFai Lau wrote:
>>> On 3/15/26 4:58 PM, Michal Luczaj wrote:
>>>>> Beside, from looking at the may_update_sockmap(), I don't know if it is
>>>>> even doable (or useful) to bpf_map_update_elem(unix_sk) in
>>>>> tc/flow_dissector/xdp. One possible path is the SOCK_FILTER when looking
>>>>> at unix_dgram_sendmsg() => sk_filter(). It was not the original use case
>>>>> when the bpf_map_update_elem(sockmap) support was added iirc.
>>>>
>>>> What about a situation when unix_sk is stored in a sockmap, then tc prog
>>>> looks it up and invokes bpf_map_update_elem(unix_sk)? I'm not sure it's
>>>> useful, but seems doable.
>>>
>>> [ Sorry for the late reply ]
>>>
>>> It is a bummer that the bpf_map_update_elem(unix_sk) path is possible
>>> from tc :(
>>>
>>> Then unix_state_lock() in its current form cannot be safely acquired in
>>> sock_map_update_elem(). It is currently a spin_lock() instead of
>>> spin_lock_bh().
>>
>> Is there a specific deadlock you have in your mind?
> 
> lockdep would complain if we used the same lock from
> different contexts.
> 
> e.g.)
> Process context holding unix_state_lock() with the normal spin_lock()
> -> BH interrupt
> -> tc prog trying to hold the same lock with spin_lock(). (_bh())
> -> deadlock

OK, I get it, thanks.

So here's one more idea: the null-ptr-deref issue is connect() racing
against sock_map_update_elem_*SYS*() coming from user-space, not the
can-be-called-from-BH sock_map_update_elem() variant. So can't we assume
that for any sock_map_update_elem(unix_sk) invoked by a tc prog, unix_sk
will always be "stable", i.e. in a state that cannot lead to that
null-ptr-deref?

IOW, if for a tc prog the only way to get hold of unix_sk is look it up in
a sockmap, then (by the fact that unix_sk _is_ in the sockmap) unix_sk will
be already safe to use by sock_map_update_elem() without taking the af_unix
state lock.

Long story short: take the unix state lock in sock_map_update_elem_sys(),
don't bother in sock_map_update_elem()?

>> ...
>> And sock_{map,hash}_seq_show() (being a part of bpf iter machinery) needs
>> to take lock_sock() just as well? Would that require a special-casing
>> (unix_state_lock()) for af_unix?
> 
> Right, lock_sock() + unix_state_lock() + SOCK_DEAD check
> should be best.

OK, got it.


Reply via email to