Re: Clobbers and Scratch Registers

Richard Sandiford Mon, 21 Aug 2017 10:33:30 -0700

Thanks for doing this.

Alan Modra <amo...@gmail.com> writes:
> This is a revised version of
> https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01562.html limited to
> showing just the scratch register aspect, as a followup to
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01174.html 
>
>       * doc/extend.texi (Extended Asm <Clobbers>): Rename to
>       "Clobbers and Scratch Registers".  Add paragraph on
>       alternative to clobbers for scratch registers and OpenBLAS
>       example.
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 940490e..0637672 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -8075,7 +8075,7 @@ A comma-separated list of C expressions read by the instructions in the
>  @item Clobbers
>  A comma-separated list of registers or other values changed by the 
>  @var{AssemblerTemplate}, beyond those listed as outputs.
> -An empty list is permitted.  @xref{Clobbers}.
> +An empty list is permitted.  @xref{Clobbers and Scratch Registers}.
>  
>  @item GotoLabels
>  When you are using the @code{goto} form of @code{asm}, this section contains 
> @@ -8435,7 +8435,7 @@ The enclosing parentheses are a required part of the syntax.
>  
>  When the compiler selects the registers to use to 
>  represent the output operands, it does not use any of the clobbered registers
> -(@pxref{Clobbers}).
> +(@pxref{Clobbers and Scratch Registers}).
>  
>  Output operand expressions must be lvalues. The compiler cannot check whether
>  the operands have data types that are reasonable for the instruction being 
> @@ -8671,7 +8671,8 @@ as input.  The enclosing parentheses are a required part of the syntax.
>  @end table
>  
>  When the compiler selects the registers to use to represent the input 
> -operands, it does not use any of the clobbered registers (@pxref{Clobbers}).
> +operands, it does not use any of the clobbered registers
> +(@pxref{Clobbers and Scratch Registers}).
>  
>  If there are no output operands but there are input operands, place two 
>  consecutive colons where the output operands would go:
> @@ -8722,9 +8723,10 @@ asm ("cmoveq %1, %2, %[result]"
>     : "r" (test), "r" (new), "[result]" (old));
>  @end example
>  
> -@anchor{Clobbers}
> -@subsubsection Clobbers
> +@anchor{Clobbers and Scratch Registers}
> +@subsubsection Clobbers and Scratch Registers
>  @cindex @code{asm} clobbers
> +@cindex @code{asm} scratch registers
>  
>  While the compiler is aware of changes to entries listed in the output 
>  operands, the inline @code{asm} code may modify more than just the outputs. For
> @@ -8853,6 +8855,65 @@ dscal (size_t n, double *x, double alpha)
>  @}
>  @end smallexample
>  
> +Rather than allocating fixed registers via clobbers to provide scratch
> +registers for an @code{asm} statement, an alternative is to define a
> +variable and make it an early-clobber output as with @code{a2} and
> +@code{a3} in the example below.  This gives the compiler register
> +allocator more freedom.  You can also define a variable and make it an
> +output tied to an input as with @code{a0} and @code{a1}, tied
> +respectively to @code{ap} and @code{lda}.
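
A minimal sketch of the early-clobber scratch idea, assuming an x86-64 or i386 target with AT&T syntax (other targets fall back to plain C). The function times3_plus is hypothetical and not part of the patch; the variable scratch exists only to obtain a register chosen by the compiler, where a clobber-based version would hard-code one.

```c
/* Hypothetical helper: returns 3 * x + y.  */
static int
times3_plus (int x, int y)
{
  int result;
  int scratch;  /* scratch register; its final value is not needed */
#if defined (__x86_64__) || defined (__i386__)
  /* Rather than naming a fixed scratch register in the clobber list
     (for example "ecx"), SCRATCH is an early-clobber output, so the
     register allocator chooses the register.  Both outputs are
     written before the asm has finished reading its inputs, hence
     "=&r" on each.  */
  __asm__ ("movl %2, %1\n\t"  /* scratch = x */
           "addl %1, %1\n\t"  /* scratch = 2 * x */
           "addl %2, %1\n\t"  /* scratch = 3 * x */
           "movl %1, %0\n\t"  /* result = 3 * x */
           "addl %3, %0"      /* result += y */
           : "=&r" (result), "=&r" (scratch)
           : "r" (x), "r" (y)
           : "cc");
#else
  /* Portable fallback computing the same value.  */
  scratch = 3 * x;
  result = scratch + y;
#endif
  (void) scratch;
  return result;
}
```

For example, times3_plus (4, 5) returns 17. The cost is one extra local variable; the gain is that the compiler can pick whichever register happens to be free instead of being forced to save and restore a fixed one.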


I think it's worth emphasising that tying operands doesn't change
whether an output needs an earlyclobber or not.  E.g. for:

  asm ("%0 = f(%1); use %2"
       : "=r" (a) : "0" (b), "r" (c));

the compiler can assign the same register to all three operands if
it can prove that b == c on entry.  Since %0 is modified before %2 is
read, the statement needs to be:

  asm ("%0 = f(%1); use %2"
       : "=&r" (a) : "0" (b), "r" (c));

instead.
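
To make that concrete, here is a minimal sketch, assuming an x86-64 or i386 target with AT&T syntax and a plain-C fallback elsewhere. The name f_then_use is hypothetical, f is taken to be "add 1", and "use %2" is taken to mean "add %2 to %0".

```c
/* Hypothetical instance of the "%0 = f(%1); use %2" pattern above,
   with f(v) = v + 1 and "use %2" meaning "add %2 to %0".
   Returns (b + 1) + c.  */
static int
f_then_use (int b, int c)
{
  int a;
#if defined (__x86_64__) || defined (__i386__)
  /* %1 is tied to %0 by the "0" matching constraint, so %0 starts out
     holding b.  %0 is written by the first instruction, before %2 is
     read by the second, so %0 must be early-clobber ("=&r").  With a
     plain "=r", a compiler that can prove b == c may give %2 the same
     register as %0, and the second instruction would then read b + 1
     instead of c.  */
  __asm__ ("addl $1, %0\n\t"  /* %0 = f (%1) */
           "addl %2, %0"      /* use %2 */
           : "=&r" (a)
           : "0" (b), "r" (c)
           : "cc");
#else
  /* Portable fallback computing the same value.  */
  a = (b + 1) + c;
#endif
  return a;
}
```

Calling f_then_use (5, 5) is exactly the b == c case: with "=&r" it returns 11, whereas with a plain "=r" the compiler would be free to produce code that returns 12.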

Thanks,
Richard

> +Of course, with tied
> +outputs your @code{asm} can't use the input value after modifying the
> +output register, since they are one and the same register.  Note also
> +that tying an input to an output is the way to set up an initialized
> +temporary register modified by an @code{asm} statement.  An input not
> +tied to an output is assumed by GCC to be unchanged; for example,
> +@code{"b" (16)} below sets up @code{%11} to 16, and GCC might use that
> +register in following code if the value 16 happened to be needed.  You
> +can even use a normal @code{asm} output for a scratch if all inputs
> +that might share the same register are consumed before the scratch is
> +written.  The VSX registers clobbered by the @code{asm} statement could
> +have used this technique too, were it not for GCC's limit on the number
> +of @code{asm} operands.
> +
> +@smallexample
> +static void
> +dgemv_kernel_4x4 (long n, const double *ap, long lda,
> +                  const double *x, double *y, double alpha)
> +@{
> +  double *a0;
> +  double *a1;
> +  double *a2;
> +  double *a3;
> +
> +  __asm__
> +    (
> +     /* lots of asm here */
> +     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
> +     "#a0=%3 a1=%4 a2=%5 a3=%6"
> +     :
> +       "+m" (*(double (*)[n]) y),
> +       "+r" (n),     // 1
> +       "+b" (y),     // 2
> +       "=b" (a0),    // 3
> +       "=b" (a1),    // 4
> +       "=&b" (a2),   // 5
> +       "=&b" (a3)    // 6
> +     :
> +       "m" (*(const double (*)[n]) x),
> +       "m" (*(const double (*)[]) ap),
> +       "d" (alpha),  // 9
> +       "r" (x),      // 10
> +       "b" (16),     // 11
> +       "3" (ap),     // 12
> +       "4" (lda)     // 13
> +     :
> +       "cr0",
> +       "vs32","vs33","vs34","vs35","vs36","vs37",
> +       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
> +     );
> +@}
> +@end smallexample
> +
>  @anchor{GotoLabels}
>  @subsubsection Goto Labels
>  @cindex @code{asm} goto labels
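
The point about initialized temporaries can be sketched the same way, again assuming an x86-64 or i386 target with AT&T syntax and a plain-C fallback elsewhere. In the hypothetical sum_1_to_n, the loop counter starts at n but is decremented by the asm, so it is declared as an output tied to the n input with the matching constraint "1" rather than passed as a plain "r" input.

```c
/* Hypothetical helper: returns 1 + 2 + ... + n, or 0 when n <= 0.  */
static int
sum_1_to_n (int n)
{
  int sum = 0;
  int counter;  /* initialized temporary, modified by the asm */
#if defined (__x86_64__) || defined (__i386__)
  /* COUNTER starts out equal to N because the input N is tied to
     operand 1 with the matching constraint "1".  The asm decrements
     it, which is allowed only because it is an output; decrementing a
     plain "r" (n) input would break GCC's assumption that untied
     inputs are unchanged.  */
  __asm__ ("testl %1, %1\n\t"
           "jle 2f\n"          /* nothing to add when n <= 0 */
           "1:\n\t"
           "addl %1, %0\n\t"   /* sum += counter */
           "decl %1\n\t"       /* counter -= 1 */
           "jnz 1b\n"          /* loop until counter reaches 0 */
           "2:"
           : "+r" (sum), "=r" (counter)
           : "1" (n)
           : "cc");
#else
  /* Portable fallback computing the same value.  */
  for (counter = n; counter > 0; counter--)
    sum += counter;
#endif
  (void) counter;
  return sum;
}
```

For example, sum_1_to_n (4) returns 10 and sum_1_to_n (0) returns 0. Numeric local labels (1:, 2:) are used so the asm stays valid if GCC duplicates it, for instance when inlining.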
