Re: [PATCH v3] rs6000: Adding missed ISA 3.0 atomic memory operation instructions.

Segher Boessenkool Thu, 29 May 2025 08:37:23 -0700

Hi!

On Thu, May 29, 2025 at 10:36:12AM +0530, jeevitha wrote:
> Changes to amo.h include the addition of the following load atomic operations:
> Compare and Swap Not Equal, Fetch and Increment Bounded, Fetch and Increment
> Equal, and Fetch and Decrement Bounded. Additionally, Store Twin is added for
> store atomic operations.
> 
> 2025-05-29  Peter Bergner  <berg...@linux.ibm.com>
> 
> gcc/
>       * config/rs6000/amo.h: Add missing atomic memory operations.
>       * doc/extend.texi (PowerPC Atomic Memory Operation Functions):
>       Document new functions.


Add yourself to suthors as well?

> +/* Implementation of the LWAT/LDAT operations that take two input registers
> +   and modify one word or double-word of memory and return the value that was
> +   previously in the memory location.  The destination and two source
> +   registers are encoded with only one register number, so we need three
> +   consecutive GPR registers and there is no C/C++ type that will give
> +   us that, so we have to use register asm variables to achieve that.

That is the easiest (and probably best) way to do it, yeah.  Always
using the same hardcoded register numbers (8..10) isn't a big problem
the way these funtions are used (if they were ussed more often you
firstly don't want hardcoded #s so you can use multiple of those at the
same spot in the program, and secondly, hardcoding can restrict
scheduling, which in the end probably will mean the compiler will do a
whole bunch of register moves).

> +   The LWAT/LDAT opcode requires the address to be a single register,
> +   and that points to a suitably aligned memory location.  Asm volatile
> +   is used to prevent the optimizer from moving the operation.  */

That is not what asm volatile does.  asm volatile means the asm has an
unspecified side effect, so the asm instruction(s) should appear in
the output file exactly as in the input file, the asm cannot be
optimised away (if the output reg is unused), nor can two identical asms
(with the same inputs and outputs) be executes as just one insn, etc.

Typically you put "asm volatile" on instructions that touch memory with
some side effect (such as here), or on a "darn" or timestamp or similar
insn.

asm volatile does *not* mean an instruction cannot be moved, whatever
that may mean even.

"Load atomic has a side effect, so mark the asm as volatile"?  Something
like that.

> +#define _AMO_LD_CMPSWP(NAME, TYPE, OPCODE, FC)                               
> \
> +static __inline__ TYPE                                                       
> \
> +NAME (TYPE *_PTR, TYPE _COND, TYPE _VALUE)                           \

Please call it _ADDR here as well?  Or name things "ptr" elsewhere as
well, but addr is a better name.

> +{                                                                    \
> +  register TYPE _ret asm ("r8");                                     \
> +  register TYPE _cond asm ("r9") = _COND;                            \
> +  register TYPE _value asm ("r10") = _VALUE;                         \
> +  __asm__ __volatile__ (OPCODE " %[ret],%P[addr],%[code]"            \
> +                     : [addr] "+Q" (_PTR[0]), [ret] "=r" (_ret)      \
> +                     : "r" (_cond), "r" (_value), [code] "n" (FC));  \
> +  return _ret;                                                               
> \
> +}

Naming the operands is an extra indirection, and makes things way less
readable (which means *understandable*) as well.  Just use %0, %1, %2
please?  It's a single line, people will not lose track of what is what
anyway (and if they would, the code is then way too big for extended
asm, so named asm operands is always a code stench).

Please use *_PTR instead of _PTR[0] btw.  Yes, it means the same thing,
but there isn't (necessarily) an array here, let's not suggest there is
one.  It is shorter and more obvious anyway :-)

> +/* Implementation of the LWAT/LDAT fetch and increment operations.
> +
> +   The LWAT/LDAT opcode requires the address to be a single register that
> +   points to a suitably aligned memory location.  Asm volatile is used to
> +   prevent the optimizer from moving the operation.  */

Same things.  Why repeat yourself anyway?

> +#define _AMO_LD_INCREMENT(NAME, TYPE, OPCODE, FC)                    \
> +static __inline__ TYPE                                                       
> \
> +NAME (TYPE *_PTR)                                                    \
> +{                                                                    \
> +  TYPE _RET;                                                         \
> +  __asm__ volatile (OPCODE " %[ret],%P[addr],%[code]\n"                      
> \
> +                 : [addr] "+Q" (_PTR[0]), [ret] "=r" (_RET)          \
> +                 : "Q" (*(TYPE (*)[2]) _PTR), [code] "n" (FC));      \
> +  return _RET;                                                               
> \
> +}

I don't understand the [2].  Should it be [1]?  These instructions
can use the value at mem+s (as the ISA names things) as input, but not
mem+2*s.

> +/* Implementation of the STWAT/STDAT store twin operation that takes
> +   one register and modifies two words or double-wordxs of memory.

(typo, "double-words")

> diff --git a/gcc/testsuite/gcc.target/powerpc/amo3.c 
> b/gcc/testsuite/gcc.target/powerpc/amo3.c
> new file mode 100644
> index 00000000000..08928331a11
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/amo3.c
> @@ -0,0 +1,132 @@
> +/* { dg-do compile { target { lp64 } } } */

Hrm.  Many of these tests are for 32-bit only.  Split the tests maybe?

> +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
> +/* { dg-require-effective-target powerpc_vsx } */

That is a) not a good thing to test for here, none of the insns are VSX
in any way shape or foem anyway, and b) there is -mcpu= that guarantees
support for VSX anyway (unless the user has manually disabled things,
but that is not something we can accommodate)

> +/* Test whether the ISA 3.0 amo (atomic memory operations) functions perform 
> as
> +   expected.  */

That sounds like this is a run test, but it isn't.

> diff --git a/gcc/testsuite/gcc.target/powerpc/amo4.c 
> b/gcc/testsuite/gcc.target/powerpc/amo4.c
> new file mode 100644
> index 00000000000..fce85c8dc52
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/amo4.c
> @@ -0,0 +1,92 @@
> +/* { dg-do run { target { lp64 && p9vector_hw } } } */

Please use a more appropriate selector?

> +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
> +/* { dg-require-effective-target powerpc_vsx } */

And get rid of this last one please.

Thanks,


Segher

Re: [PATCH v3] rs6000: Adding missed ISA 3.0 atomic memory operation instructions.

Reply via email to