https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hongyu Wang <hong...@gcc.gnu.org>:

https://gcc.gnu.org/g:4d281ff7ddd8f6365943c0a622107f92315bb8a6

commit r12-5265-g4d281ff7ddd8f6365943c0a622107f92315bb8a6
Author: Hongyu Wang <hongyu.w...@intel.com>
Date:   Fri Nov 12 10:50:46 2021 +0800

    PR target/103069: Relax cmpxchg loop for x86 target

    From the CPU's point of view, getting a cache line for writing is more
    expensive than reading.  See Appendix A.2 Spinlock in:

    https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

    A full compare-and-swap grabs the cache line in exclusive state and causes
    excessive cache line bouncing.

    The atomic_fetch_{or,xor,and,nand} builtins generate a cmpxchg loop under
    -march=x86-64 like:

            movl    v(%rip), %eax
    .L2:
            movl    %eax, %ecx
            movl    %eax, %edx
            orl     $1, %ecx
            lock cmpxchgl   %ecx, v(%rip)
            jne     .L2
            movl    %edx, %eax
            andl    $1, %eax
            ret
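
    For orientation, the listing above is the kind of code a function of
    roughly the following shape can compile to. This is a minimal sketch:
    the function name test_and_set_bit0 is hypothetical and the source is
    an assumption reconstructed from the assembly, not taken from the PR.
    The exact instructions depend on the compiler version and options.

```c
/* Global operated on atomically, matching the v(%rip) operand above. */
static int v;

/* Hypothetical example: atomically set bit 0 of v and report whether it
   was already set.  The old value is returned by the builtin and then
   masked, which corresponds to the trailing "andl $1, %eax".  */
static int test_and_set_bit0(void)
{
    return __atomic_fetch_or(&v, 1, __ATOMIC_SEQ_CST) & 1;
}
```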

    To relax the above loop, GCC should first emit a normal load, check the
    loaded value, and jump back to .L2 if cmpxchgl may fail. Before jumping to
    .L2, a PAUSE should be inserted to yield the CPU to another hyperthread and
    to save power, so the code looks like:

    .L84:
            movl    (%rdi), %ecx
            movl    %eax, %edx
            orl     %esi, %edx
            cmpl    %eax, %ecx
            jne     .L82
            lock cmpxchgl   %edx, (%rdi)
            jne     .L84
    .L82:
            rep nop
            jmp     .L84
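
    The relaxed loop can also be written by hand with the __atomic builtins.
    The following is a minimal sketch of the same idea, not the code the
    expander emits: the names cpu_relax and relaxed_fetch_or are
    hypothetical, and the memory orders are an assumption. A plain load is
    compared against the expected value first, and the locked cmpxchg is
    only attempted when the two match; otherwise the loop pauses and retries.

```c
#include <stdint.h>

/* Spin-wait hint: PAUSE ("rep nop") on x86, nothing on other targets. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

/* Hypothetical helper: fetch-or implemented as a relaxed cmpxchg loop.
   Returns the value that *ptr held before the OR was applied.  */
static uint32_t relaxed_fetch_or(uint32_t *ptr, uint32_t mask)
{
    uint32_t expected = __atomic_load_n(ptr, __ATOMIC_RELAXED);

    for (;;) {
        /* Read-only check: does not request the line for writing. */
        uint32_t observed = __atomic_load_n(ptr, __ATOMIC_RELAXED);

        if (observed != expected) {
            /* The cmpxchg would fail, so skip it, pause, and retry
               with the freshly observed value.  */
            expected = observed;
            cpu_relax();
            continue;
        }

        /* Values match: now attempt the locked compare-and-swap.  On
           failure the builtin stores the current value in expected.  */
        if (__atomic_compare_exchange_n(ptr, &expected, expected | mask,
                                        0, __ATOMIC_SEQ_CST,
                                        __ATOMIC_RELAXED))
            return expected;
    }
}
```

    A call such as relaxed_fetch_or(&flags, 1) behaves like
    __atomic_fetch_or(&flags, 1, __ATOMIC_SEQ_CST) from the caller's point
    of view; only the traffic on the cache line under contention differs.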

    This patch adds corresponding atomic_fetch_op expanders that insert the
    load, compare, and pause for all the atomic logic fetch builtins. It also
    adds the -mrelax-cmpxchg-loop flag to control whether the relaxed loop is
    generated.

    gcc/ChangeLog:

            PR target/103069
            * config/i386/i386-expand.c (ix86_expand_atomic_fetch_op_loop):
            New expand function.
            * config/i386/i386-options.c (ix86_target_string): Add
            -mrelax-cmpxchg-loop flag.
            (ix86_valid_target_attribute_inner_p): Likewise.
            * config/i386/i386-protos.h (ix86_expand_atomic_fetch_op_loop):
            New expand function prototype.
            * config/i386/i386.opt: Add -mrelax-cmpxchg-loop.
            * config/i386/sync.md (atomic_fetch_<logic><mode>): New expander
            for SI,HI,QI modes.
            (atomic_<logic>_fetch<mode>): Likewise.
            (atomic_fetch_nand<mode>): Likewise.
            (atomic_nand_fetch<mode>): Likewise.
            (atomic_fetch_<logic><mode>): New expander for DI,TI modes.
            (atomic_<logic>_fetch<mode>): Likewise.
            (atomic_fetch_nand<mode>): Likewise.
            (atomic_nand_fetch<mode>): Likewise.
            * doc/invoke.texi: Document -mrelax-cmpxchg-loop.

    gcc/testsuite/ChangeLog:

            PR target/103069
            * gcc.target/i386/pr103069-1.c: New test.
            * gcc.target/i386/pr103069-2.c: Ditto.
