On 27.03.2024 11:28, Oleksii wrote:
> On Wed, 2024-03-27 at 08:40 +0100, Jan Beulich wrote:
> ...
> 
>>>>>>> +/* This is required to provide a full barrier on success.
>>>>>>> */
>>>>>>> +static inline int atomic_add_unless(atomic_t *v, int a,
>>>>>>> int u)
>>>>>>> +{
>>>>>>> +       int prev, rc;
>>>>>>> +
>>>>>>> +    asm volatile (
>>>>>>> +        "0: lr.w     %[p],  %[c]\n"
>>>>>>> +        "   beq      %[p],  %[u], 1f\n"
>>>>>>> +        "   add      %[rc], %[p], %[a]\n"
>>>>>>> +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
>>>>>>> +        "   bnez     %[rc], 0b\n"
>>>>>>> +        RISCV_FULL_BARRIER
>>>>>>
>>>>>> With this and no .aq on the load, why the .rl on the store?
>>>>> It is something that LKMM requires [1].
>>>>>
>>>>> This is not fully clear to me what is so specific in LKMM, but
>>>>> accoring
>>>>> to the spec:
>>>>>    Ordering Annotation Fence-based Equivalent
>>>>>    l{b|h|w|d|r}.aq     l{b|h|w|d|r}; fence r,rw
>>>>>    l{b|h|w|d|r}.aqrl   fence rw,rw; l{b|h|w|d|r}; fence r,rw
>>>>>    s{b|h|w|d|c}.rl     fence rw,w; s{b|h|w|d|c}
>>>>>    s{b|h|w|d|c}.aqrl   fence rw,w; s{b|h|w|d|c}
>>>>>    amo<op>.aq          amo<op>; fence r,rw
>>>>>    amo<op>.rl          fence rw,w; amo<op>
>>>>>    amo<op>.aqrl        fence rw,rw; amo<op>; fence rw,rw
>>>>>    Table 2.2: Mappings from .aq and/or .rl to fence-based
>>>>> equivalents.
>>>>>    An alternative mapping places a fence rw,rw after the
>>>>> existing 
>>>>>    s{b|h|w|d|c} mapping rather than at the front of the
>>>>>    l{b|h|w|d|r} mapping.
>>>>
>>>> I'm afraid I can't spot the specific case in this table. None of
>>>> the
>>>> stores in the right column have a .rl suffix.
>>> Yes, it is expected.
>>>
>>> I am reading this table as (f.e.) amo<op>.rl is an equivalent of
>>> fence
>>> rw,w; amo<op>. (without .rl) 
>>
>> In which case: How does quoting the table answer my original
>> question?
> Agree, it is starting to be confusing, so let me give an answer to your
> question below.
> 
>>
>>>>>    It is also safe to translate any .aq, .rl, or .aqrl
>>>>> annotation
>>>>> into
>>>>>    the fence-based snippets of
>>>>>    Table 2.2. These can also be used as a legal implementation
>>>>> of
>>>>>    l{b|h|w|d} or s{b|h|w|d} pseu-
>>>>>    doinstructions for as long as those instructions are not
>>>>> added
>>>>> to
>>>>>    the ISA.
>>>>>
>>>>> So according to the spec, it should be:
>>>>>  sc.w ...
>>>>>  RISCV_FULL_BARRIER.
>>>>>
>>>>> Considering [1] and how this code looks before, it seems to me
>>>>> that
>>>>> it
>>>>> is safe to use lr.w.aq/sc.w.rl here or an fence equivalent.
>>>>
>>>> Here you say "or". Then why dos the code use sc.?.rl _and_ a
>>>> fence?
>>> I confused this line with amo<op>.aqrl, so based on the table 2.2
>>> above
>>> s{b|h|w|d|c}.aqrl is transformed to "fence rw,w; s{b|h|w|d|c}", but
>>> Linux kernel decided to strengthen it with full barrier:
>>>    -              "0:\n\t"
>>>    -              "lr.w.aqrl  %[p],  %[c]\n\t"
>>>    -              "beq        %[p],  %[u], 1f\n\t"
>>>    -              "add       %[rc],  %[p], %[a]\n\t"
>>>    -              "sc.w.aqrl %[rc], %[rc], %[c]\n\t"
>>>    -              "bnez      %[rc], 0b\n\t"
>>>    -              "1:"
>>>    +               "0:     lr.w     %[p],  %[c]\n"
>>>    +               "       beq      %[p],  %[u], 1f\n"
>>>    +               "       add      %[rc], %[p], %[a]\n"
>>>    +               "       sc.w.rl  %[rc], %[rc], %[c]\n"
>>>    +               "       bnez     %[rc], 0b\n"
>>>    +               "       fence    rw, rw\n"
>>>    +               "1:\n"
>>> As they have the following issue:
>>>    implementations of atomics such as atomic_cmpxchg() and
>>>    atomic_add_unless() rely on LR/SC pairs,
>>>    which do not give full-ordering with .aqrl; for example, current
>>>    implementations allow the "lr-sc-aqrl-pair-vs-full-barrier" test
>>>    below to end up with the state indicated in the "exists" clause.
>>
>> Is that really "current implementations", not "the abstract model"?
>> If so, the use of an explicit fence would be more like a workaround
>> (and would hence want commenting to that effect).
>>
>> In neither case can I see my original question answered: Why the .rl
>> on the store when there is a full fence later?
> The good explanation for that was provided in the commit addressing a
> similar issue for ARM64 [
> https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/
> ].
> The same holds true for RISC-V since ARM also employs WMO.
> 
> I would also like to mention another point, as I indicated in another
> email thread
> [ https://lists.xen.org/archives/html/xen-devel/2024-03/msg01582.html ]
> , that now this fence can be omitted and .aqrl can be used instead.
> 
> This was confirmed by Dan (the author of the RVWMO spec)
> [https://lore.kernel.org/linux-riscv/[email protected]/
> ]
> 
> I hope this addresses your original question. Does it?

I think it does, thanks. Some of this will need putting in at least the
patch description, if not a code comment.

Jan

Reply via email to