Re: [PATCH] Asm memory constraints

2017-08-20 Thread Segher Boessenkool
Hi Alan,

On Sat, Aug 19, 2017 at 12:19:35AM +0930, Alan Modra wrote:
> +Flushing registers to memory has performance implications and may be
> +an issue for time-sensitive code.  You can provide better information
> +to GCC to avoid this, as shown in the following examples.  At a
> +minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
> +need to be flushed.  Also, if GCC can prove that all of the outputs of
> +a non-volatile @code{asm} statement are unused, then the @code{asm}
> +may be deleted.  Removal of otherwise dead @code{asm} statements will
> +not happen if they clobber @code{"memory"}.

void f(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x) : "memory"); }
void g(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x)); }

Both f and g are completely removed by the first jump pass immediately
after expand (via delete_trivially_dead_insns).
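
For contrast, here is a minimal sketch of what does keep such an asm
alive.  It assumes GCC's documented behaviour that the volatile
qualifier stops the optimizers from discarding an asm.  The function
names are hypothetical, and the templates are left empty (instead of
the fictitious "hcf") so that the file builds on any target.

```c
/* Sketch only: hypothetical function names, empty asm templates.  */

/* Non-volatile asm, output unused: GCC may delete the whole statement.
   As the f/g testcase shows, the "memory" clobber does not prevent that.  */
void
dead_despite_clobber (int x)
{
  int z;
  __asm__ ("" : "=r" (z) : "r" (x) : "memory");
}

/* volatile asm: kept even though z is never read.  */
void
kept_by_volatile (int x)
{
  int z;
  __asm__ volatile ("" : "=r" (z) : "r" (x) : "memory");
}

/* Non-volatile asm whose output is used: it stays because the result
   is live.  The "0" constraint ties the input to output operand 0, so
   with an empty template the value passes through unchanged.  */
int
kept_by_use (int x)
{
  int z;
  __asm__ ("" : "=r" (z) : "0" (x));
  return z;
}
```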

Do you have a testcase for the behaviour you saw?


Segher


[committed] Xfail gcc.dg/ipa/ipcp-cstagg-7.c on 32-bit hppa targets

2017-08-20 Thread John David Anglin
Structures larger than 64 bits are passed by invisible reference on 32-bit
hppa targets.  As noted in PR ipa/77732, this is not currently handled, so
we need to xfail the test on these targets.

Dave
--
John David Anglin   dave.ang...@bell.net


2017-08-20  John David Anglin  

PR ipa/77732
* gcc.dg/ipa/ipcp-cstagg-7.c: Xfail on 32-bit hppa.

Index: gcc.dg/ipa/ipcp-cstagg-7.c
===
--- gcc.dg/ipa/ipcp-cstagg-7.c  (revision 251059)
+++ gcc.dg/ipa/ipcp-cstagg-7.c  (working copy)
@@ -62,4 +62,4 @@
   return bar (s, x);
 }
 
-/* { dg-final { scan-ipa-dump-times "Discovered an indirect call to a known target" 3 "cp" } } */
+/* { dg-final { scan-ipa-dump-times "Discovered an indirect call to a known target" 3 "cp" { xfail { hppa*-*-* && { ! lp64 } } } } } */


[committed] hpux: Fix testsuite/17_intro/names.cc failure

2017-08-20 Thread John David Anglin
On hpux, the namespace for 'd' and 'r' is not clean, so we need to undef
them in the test.  More info is available in PR testsuite/81056.

Dave
--
John David Anglin   dave.ang...@bell.net


2017-08-20  John David Anglin  

PR testsuite/81056
* testsuite/17_intro/names.cc: Undef 'd' and 'r' on __hpux__

Index: testsuite/17_intro/names.cc
===
--- testsuite/17_intro/names.cc (revision 248710)
+++ testsuite/17_intro/names.cc (working copy)
@@ -107,4 +107,9 @@
 #undef y
 #endif
 
+#ifdef __hpux__
+#undef d
+#undef r
+#endif
+
 #include 


New German PO file for 'gcc' (version 7.2.0)

2017-08-20 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-7.2.0.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] Asm memory constraints

2017-08-20 Thread Alan Modra
On Sun, Aug 20, 2017 at 08:00:53AM -0500, Segher Boessenkool wrote:
> Hi Alan,
> 
> On Sat, Aug 19, 2017 at 12:19:35AM +0930, Alan Modra wrote:
> > +Flushing registers to memory has performance implications and may be
> > +an issue for time-sensitive code.  You can provide better information
> > +to GCC to avoid this, as shown in the following examples.  At a
> > +minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
> > +need to be flushed.  Also, if GCC can prove that all of the outputs of
> > +a non-volatile @code{asm} statement are unused, then the @code{asm}
> > +may be deleted.  Removal of otherwise dead @code{asm} statements will
> > +not happen if they clobber @code{"memory"}.
> 
> void f(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x) : "memory"); }
> void g(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x)); }
> 
> Both f and g are completely removed by the first jump pass immediately
> after expand (via delete_trivially_dead_insns).
> 
> Do you have a testcase for the behaviour you saw?

Oh my.  I was sure that was how "memory" worked!  I see though that
every gcc I have lying around, going all the way back to gcc-2.95,
deletes the asm in your testcase.  I definitely don't want to put
something in the docs that is plain wrong, or just my idea of how
things ought to work, so the last two sentences quoted above need to
go.  Thanks for the correction.

Fixed in this revised patch.  The only controversial aspect now should
be whether those array casts ought to be officially blessed.  I've
checked that "=m" (*(T (*)[]) ptr), "=m" (*(T (*)[n]) ptr), and
"=m" (*(T (*)[10]) ptr) all generate reasonable MEM_ATTRS that appear
to be handled properly by alias.c and other code.

For example, at -O3 the following shows gcc moving the read of "val"
before the asm, while an asm using a "memory" clobber forces the read
to occur after the asm.

static int
f (double *x)
{
  int res;
  asm ("#%0 %1 %2" : "=r" (res) : "r" (x), "m" (*(double (*)[]) x));
  return res;
}

int val = 123;
double foo[10];

int
main ()
{
  int b = f (foo);
  __builtin_printf ("%d %d\n", val, b);
  return 0;
}
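
The three cast forms mentioned above can be compared side by side.
This is a minimal sketch, assuming GNU C; the function names are
hypothetical and the templates are empty, so only the operand
descriptions matter.

```c
#include <stddef.h>

/* Sketch only: hypothetical names; empty templates so no target-specific
   instructions are needed.  Each asm returns its tied input unchanged.  */

/* Size unknown: the asm may read any number of doubles starting at x.  */
int
reads_unknown_length (const double *x)
{
  int res;
  __asm__ ("" : "=r" (res) : "0" (0), "m" (*(const double (*)[]) x));
  return res;
}

/* Fixed size: the asm reads exactly ten doubles starting at x.  */
int
reads_ten (const double *x)
{
  int res;
  __asm__ ("" : "=r" (res) : "0" (10), "m" (*(const double (*)[10]) x));
  return res;
}

/* Run-time size: the asm may read and write n doubles starting at y.
   n must be greater than zero, since the cast names a variable-length
   array type.  */
void
updates_n (double *y, size_t n)
{
  __asm__ ("" : "+m" (*(double (*)[n]) y));
}
```

With a sized form, GCC is told which object the asm touches, so it
should be free to keep unrelated values (such as "val" in the example
above) in registers across the asm, which a "memory" clobber would not
allow.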


I'm also encouraged by comments like the following by rth in 2004
(gcc/c/c-typeck.c), which say that using non-kosher lvalues in memory
output constraints must continue to be supported.

  /* ??? Really, this should not be here.  Users should be using a
 proper lvalue, dammit.  But there's a long history of using casts
 in the output operands.  In cases like longlong.h, this becomes a
 primitive form of typechecking -- if the cast can be removed, then
 the output operand had a type of the proper width; otherwise we'll
 get an error.  Gross, but ...  */
  STRIP_NOPS (output);


* doc/extend.texi (Clobbers): Correct vax example.  Delete old
example of a memory input for a string of known length.  Move
commentary out of table.  Add a number of new examples
covering array memory inputs.
testsuite/
* gcc.target/i386/asm-mem.c: New test.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 649be01..940490e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8755,7 +8755,7 @@ registers:
 asm volatile ("movc3 %0, %1, %2"
: /* No outputs. */
: "g" (from), "g" (to), "g" (count)
-   : "r0", "r1", "r2", "r3", "r4", "r5");
+   : "r0", "r1", "r2", "r3", "r4", "r5", "memory");
 @end example
 
 Also, there are two special clobber arguments:
@@ -8786,14 +8786,72 @@ Note that this clobber does not prevent the @emph{processor} from doing
 speculative reads past the @code{asm} statement. To prevent that, you need 
 processor-specific fence instructions.
 
-Flushing registers to memory has performance implications and may be an issue 
-for time-sensitive code.  You can use a trick to avoid this if the size of 
-the memory being accessed is known at compile time. For example, if accessing 
-ten bytes of a string, use a memory input like: 
+@end table
 
-@code{@{"m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )@}}.
+Flushing registers to memory has performance implications and may be
+an issue for time-sensitive code.  You can provide better information
+to GCC to avoid this, as shown in the following examples.  At a
+minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
+need to be flushed.
 
-@end table
+Here is a fictitious sum of squares instruction, that takes two
+pointers to floating point values in memory and produces a floating
+point register output.
+Notice that @code{x}, and @code{y} both appear twice in the @code{asm}
+parameters, once to specify memory accessed, and once to specify a
+base register used by the @code{asm}.  You won't normally be wasting a
+register by doing this as GCC can use the same register for both
+purposes.  However, it would be foolish to use both @code{%1} and
+@code{%3} for @code{x} in this @code{a

Clobbers and Scratch Registers

2017-08-20 Thread Alan Modra
This is a revised version of
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01562.html limited to
showing just the scratch register aspect, as a followup to
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01174.html 

* doc/extend.texi (Extended Asm ): Rename to
"Clobbers and Scratch Registers".  Add paragraph on
alternative to clobbers for scratch registers and OpenBLAS
example.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 940490e..0637672 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8075,7 +8075,7 @@ A comma-separated list of C expressions read by the instructions in the
 @item Clobbers
 A comma-separated list of registers or other values changed by the 
 @var{AssemblerTemplate}, beyond those listed as outputs.
-An empty list is permitted.  @xref{Clobbers}.
+An empty list is permitted.  @xref{Clobbers and Scratch Registers}.
 
 @item GotoLabels
 When you are using the @code{goto} form of @code{asm}, this section contains 
@@ -8435,7 +8435,7 @@ The enclosing parentheses are a required part of the syntax.
 
 When the compiler selects the registers to use to 
 represent the output operands, it does not use any of the clobbered registers 
-(@pxref{Clobbers}).
+(@pxref{Clobbers and Scratch Registers}).
 
 Output operand expressions must be lvalues. The compiler cannot check whether 
 the operands have data types that are reasonable for the instruction being 
@@ -8671,7 +8671,8 @@ as input.  The enclosing parentheses are a required part of the syntax.
 @end table
 
 When the compiler selects the registers to use to represent the input 
-operands, it does not use any of the clobbered registers (@pxref{Clobbers}).
+operands, it does not use any of the clobbered registers
+(@pxref{Clobbers and Scratch Registers}).
 
 If there are no output operands but there are input operands, place two 
 consecutive colons where the output operands would go:
@@ -8722,9 +8723,10 @@ asm ("cmoveq %1, %2, %[result]"
: "r" (test), "r" (new), "[result]" (old));
 @end example
 
-@anchor{Clobbers}
-@subsubsection Clobbers
+@anchor{Clobbers and Scratch Registers}
+@subsubsection Clobbers and Scratch Registers
 @cindex @code{asm} clobbers
+@cindex @code{asm} scratch registers
 
 While the compiler is aware of changes to entries listed in the output 
 operands, the inline @code{asm} code may modify more than just the outputs. For
@@ -8853,6 +8855,65 @@ dscal (size_t n, double *x, double alpha)
 @}
 @end smallexample
 
+Rather than allocating fixed registers via clobbers to provide scratch
+registers for an @code{asm} statement, an alternative is to define a
+variable and make it an early-clobber output as with @code{a2} and
+@code{a3} in the example below.  This gives the compiler register
+allocator more freedom.  You can also define a variable and make it an
+output tied to an input as with @code{a0} and @code{a1}, tied
+respectively to @code{ap} and @code{lda}.  Of course, with tied
+outputs your @code{asm} can't use the input value after modifying the
+output register since they are one and the same register.  Note also
+that tying an input to an output is the way to set up an initialized
+temporary register modified by an @code{asm} statement.  An input not
+tied to an output is assumed by GCC to be unchanged, for example
+@code{"b" (16)} below sets up @code{%11} to 16, and GCC might use that
+register in following code if the value 16 happened to be needed.  You
+can even use a normal @code{asm} output for a scratch if all inputs
+that might share the same register are consumed before the scratch is
+used.  The VSX registers clobbered by the @code{asm} statement could
+have used this technique except for GCC's limit on the number of
+@code{asm} parameters.
+
+@smallexample
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+  const double *x, double *y, double alpha)
+@{
+  double *a0;
+  double *a1;
+  double *a2;
+  double *a3;
+
+  __asm__
+(
+ /* lots of asm here */
+ "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+ "#a0=%3 a1=%4 a2=%5 a3=%6"
+ :
+   "+m" (*(double (*)[n]) y),
+   "+r" (n),   // 1
+   "+b" (y),   // 2
+   "=b" (a0),  // 3
+   "=b" (a1),  // 4
+   "=&b" (a2), // 5
+   "=&b" (a3)  // 6
+ :
+   "m" (*(const double (*)[n]) x),
+   "m" (*(const double (*)[]) ap),
+   "d" (alpha),// 9
+   "r" (x),// 10
+   "b" (16),   // 11
+   "3" (ap),   // 12
+   "4" (lda)   // 13
+ :
+   "cr0",
+   "vs32","vs33","vs34","vs35","vs36","vs37",
+   "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+ );
+@}
+@end smallexample
+
 @anchor{GotoLabels}
 @subsubsection Goto Labels
 @cindex @code{asm} goto labels

-- 
Alan Modra
Australia Development Lab, IBM
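
As a smaller illustration of the scratch-register approach described in
the patch, here is a hedged sketch for x86 only (an assumption; the
patch's own example is PowerPC).  The function name is hypothetical,
AT&T syntax is assumed, and other targets fall back to plain C.

```c
/* Sketch only: hypothetical function, x86 AT&T syntax assumed.  */
long
add_with_scratch (long a, long b)
{
#if defined (__x86_64__) || defined (__i386__)
  long res;
  long tmp;  /* scratch: the register allocator picks the register */

  __asm__ ("mov %2, %1\n\t"   /* tmp = a; written while b is still needed */
           "add %3, %1\n\t"   /* tmp += b; last read of an input */
           "mov %1, %0"       /* res = tmp; all inputs already consumed */
           : "=r" (res),      /* 0: plain output, written last */
             "=&r" (tmp)      /* 1: early-clobber scratch */
           : "r" (a),         /* 2 */
             "r" (b)          /* 3 */
           : "cc");
  return res;
#else
  return a + b;               /* fallback for other targets */
#endif
}
```

Here tmp needs the early-clobber modifier because it is written before
b has been read, while res can be a normal output because nothing is
read after it is written.  Listing a fixed register as a clobber
instead would force the template to name that register explicitly.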


[patch, fortran] Bug 81296 - derived type I/o problem

2017-08-20 Thread Jerry DeLisle

Hi all,

The attached patch adds a check for a format label whose FORMAT statement
contains a "DT" format descriptor, so that the correct code is generated in
that case.  The patch also extends an existing test case to guard against
regressions.
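
The decision order described above can be sketched in isolation.  This
is only an illustration: the struct and function names are hypothetical
stand-ins, not gfortran's types, and the substring test uses strstr,
which is an assumption about the intent rather than the call the patch
itself makes (the patch keeps the existing strtok check).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parts of gfc_dt the patch looks at.  */
struct io_spec
{
  bool list_directed;       /* models dt->format_label == &format_asterisk */
  const char *format_expr;  /* models the text of dt->format_expr, or NULL */
  const char *label_text;   /* models the labelled FORMAT text, or NULL */
};

/* Return true when the formatted DTIO procedure should be called.  */
bool
wants_formatted_dtio (const struct io_spec *dt)
{
  const char *fmt = NULL;

  if (dt == NULL)
    return false;

  /* List-directed I/O always uses the formatted procedure.  */
  if (dt->list_directed)
    return true;

  /* Otherwise take the format text from the expression if present,
     else from the format label.  */
  if (dt->format_expr)
    fmt = dt->format_expr;
  else if (dt->label_text)
    fmt = dt->label_text;

  /* Assumption: "contains a DT descriptor" is modelled as a substring.  */
  return fmt != NULL && strstr (fmt, "DT") != NULL;
}
```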


Regression tested on x86_64-linux.

OK for trunk and backport to 7?

Regards,

Jerry

2017-08-21  Jerry DeLisle  

PR fortran/81296
* trans-io.c (get_dtio_proc): Add check for format label and set
formatted flag accordingly. Reorganize the code a little.
diff --git a/gcc/fortran/trans-io.c b/gcc/fortran/trans-io.c
index c3c56f29..aa974eb3 100644
--- a/gcc/fortran/trans-io.c
+++ b/gcc/fortran/trans-io.c
@@ -2214,18 +2214,24 @@ get_dtio_proc (gfc_typespec * ts, gfc_code * code, gfc_symbol **dtio_sub)
   bool formatted = false;
   gfc_dt *dt = code->ext.dt;
 
-  if (dt && dt->format_expr)
+  if (dt)
 {
-  char *fmt;
-  fmt = gfc_widechar_to_char (dt->format_expr->value.character.string,
-  -1);
-  if (strtok (fmt, "DT") != NULL)
+  char *fmt = NULL;
+
+  if (dt->format_label == &format_asterisk)
+	{
+	  /* List directed io must call the formatted DTIO procedure.  */
+	  formatted = true;
+	}
+  else if (dt->format_expr)
+	fmt = gfc_widechar_to_char (dt->format_expr->value.character.string,
+  -1);
+  else if (dt->format_label)
+	fmt = gfc_widechar_to_char (dt->format_label->format->value.character.string,
+  -1);
+  if (fmt && strtok (fmt, "DT") != NULL)
 	formatted = true;
-}
-  else if (dt && dt->format_label == &format_asterisk)
-{
-  /* List directed io must call the formatted DTIO procedure.  */
-  formatted = true;
+
 }
 
   if (ts->type == BT_CLASS)
diff --git a/gcc/testsuite/gfortran.dg/dtio_12.f90 b/gcc/testsuite/gfortran.dg/dtio_12.f90
index 213f7ebb..cf1bfe38 100644
--- a/gcc/testsuite/gfortran.dg/dtio_12.f90
+++ b/gcc/testsuite/gfortran.dg/dtio_12.f90
@@ -70,5 +70,11 @@ end module
   rewind (10)
   read (10, *) msg
   if (trim (msg) .ne. "77") call abort
+  rewind (10)
+  write (10,40) child (77) ! Modified using format label
+40 format(DT)
+  rewind (10)
+  read (10, *) msg
+  if (trim (msg) .ne. "77") call abort
   close(10)
 end