https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117759

--- Comment #3 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Maciej W. Rozycki <ma...@gcc.gnu.org>:

https://gcc.gnu.org/g:3d4d82211c8cbfde0b852bde1603b5d549426df7

commit r15-9036-g3d4d82211c8cbfde0b852bde1603b5d549426df7
Author: Maciej W. Rozycki <ma...@orcam.me.uk>
Date:   Sun Mar 30 15:24:50 2025 +0100

    Alpha: Add option to avoid data races for sub-longword memory stores
[PR117759]

    With non-BWX Alpha implementations we have a problem of data races where
    a 8-bit byte or 16-bit word quantity is to be written to memory in that
    in those cases we use an unprotected RMW access of a 32-bit longword or
    64-bit quadword width.  If contents of the longword or quadword accessed
    outside the byte or word to be written are changed midway through by a
    concurrent write executing on the same CPU such as by a signal handler
    or a parallel write executing on another CPU such as by another thread
    or via a shared memory segment, then the concluding write of the RMW
    access will clobber them.  This is especially important for the safety
    of RCU algorithms, but is otherwise an issue anyway.

    To guard against these data races with byte and aligned word quantities
    introduce the `-msafe-bwa' command-line option (standing for Safe Byte &
    Word Access) that instructs the compiler to instead use an atomic RMW
    access sequence where byte and word memory access machine instructions
    are not available.  There is no change to code produced for BWX targets.

    It would be sufficient for the secondary reload handle to use a pair of
    scratch registers, as requested by `reload_out<mode>', but it would end
    with poor code produced as one of the scratches would be occupied by
    data retrieved and the other one would have to be reloaded with repeated
    calculations, all within the LL/SC sequence.

    Therefore I chose to add a dedicated `reload_out<mode>_safe_bwa' handler
    and ask for more scratches there by defining a 256-bit OI integer mode.
    While reload is documented in our manual to support an arbitrary number
    of scratches in reality it hasn't been implemented for IRA:

    /* ??? It would be useful to be able to handle only two, or more than
       three, operands, but for now we can only handle the case of having
       exactly three: output, input and one temp/scratch.  */

    and it seems to be the case for LRA as well.  Do what everyone else does
    then and just have one wide multi-register scratch.

    I note that the atomic sequences emitted are suboptimal performance-wise
    as the looping branch for the unsuccessful completion of the sequence
    points backwards, which means it will be predicted as taken despite that
    in most cases it will fall through.  I do not see it as a deficiency of
    this change proposed as it takes care of recording that the branch is
    unlikely to be taken, by calling `alpha_emit_unlikely_jump'.  Therefore
    generic code elsewhere should instead be investigated and adjusted
    accordingly for the arrangement to actually take effect.

    Add test cases accordingly.

    There are notable regressions between a plain `-mno-bwx' configuration
    and a `-mno-bwx -msafe-bwa' one:

    FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O0  execution test
    FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O1  execution test
    FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O2  execution test
    FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O3 -g  execution test
    FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -Os  execution test
    FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
    FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
    FAIL: g++.dg/init/array25.C  -std=c++17 execution test
    FAIL: g++.dg/init/array25.C  -std=c++98 execution test
    FAIL: g++.dg/init/array25.C  -std=c++26 execution test

    They come from the fact that these test cases play tricks with alignment
    and end up calling code that expects a reference to aligned data but is
    handed one to unaligned data.

    This doesn't cause a visible problem with plain `-mno-bwx' code, because
    the resulting alignment exception is fixed up by Linux.  There's no such
    handling currently implemented for LDL_L or LDQ_L instructions (which
    are first in the sequence) and consequently the offender is issued with
    SIGBUS instead.  Suitable handling will be added to Linux to complement
    this change that will emulate the trapping instructions[1], so these
    interim regressions are seen as harmless and expected.

    References:

    [1] "Alpha: Emulate unaligned LDx_L/STx_C for data consistency",
       
<https://lore.kernel.org/r/alpine.deb.2.21.2502181912230.65...@angie.orcam.me.uk/>

            gcc/
            PR target/117759
            * config/alpha/alpha-modes.def (OI): New integer mode.
            * config/alpha/alpha-protos.h (alpha_expand_mov_safe_bwa): New
            prototype.
            * config/alpha/alpha.cc (alpha_expand_mov_safe_bwa): New
            function.
            (alpha_secondary_reload): Handle TARGET_SAFE_BWA.
            * config/alpha/alpha.md (aligned_store_safe_bwa)
            (unaligned_store<mode>_safe_bwa, reload_out<mode>_safe_bwa)
            (reload_out<mode>_unaligned_safe_bwa): New expanders.
            (mov<mode>, movcqi, reload_out<mode>_aligned): Handle
            TARGET_SAFE_BWA.
            (reload_out<mode>): Guard against TARGET_SAFE_BWA.
            * config/alpha/alpha.opt (msafe-bwa): New option.
            * config/alpha/alpha.opt.urls: Regenerate.
            * doc/invoke.texi (Option Summary, DEC Alpha Options): Document
            the new option.

            gcc/testsuite/
            PR target/117759
            * gcc.target/alpha/stb.c: New file.
            * gcc.target/alpha/stb-bwa.c: New file.
            * gcc.target/alpha/stb-bwx.c: New file.
            * gcc.target/alpha/stba.c: New file.
            * gcc.target/alpha/stba-bwa.c: New file.
            * gcc.target/alpha/stba-bwx.c: New file.
            * gcc.target/alpha/stw.c: New file.
            * gcc.target/alpha/stw-bwa.c: New file.
            * gcc.target/alpha/stw-bwx.c: New file.
            * gcc.target/alpha/stwa.c: New file.
            * gcc.target/alpha/stwa-bwa.c: New file.
            * gcc.target/alpha/stwa-bwx.c: New file.

Reply via email to