Re: [PATCH] Asm memory constraints
Hi Alan,

On Sat, Aug 19, 2017 at 12:19:35AM +0930, Alan Modra wrote:
> +Flushing registers to memory has performance implications and may be
> +an issue for time-sensitive code. You can provide better information
> +to GCC to avoid this, as shown in the following examples. At a
> +minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
> +need to be flushed. Also, if GCC can prove that all of the outputs of
> +a non-volatile @code{asm} statement are unused, then the @code{asm}
> +may be deleted. Removal of otherwise dead @code{asm} statements will
> +not happen if they clobber @code{"memory"}.

void f(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x) : "memory"); }
void g(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x)); }

Both f and g are completely removed by the first jump pass immediately
after expand (via delete_trivially_dead_insns).

Do you have a testcase for the behaviour you saw?


Segher
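The point above can be tried on any GCC target with a small sketch. It assumes an empty asm template in place of the fictitious "hcf" instruction so that it assembles everywhere; the function names unused_output, unused_output_volatile and check_variants are illustrative only and are not part of the thread. The volatile variant is added for contrast, since the GCC manual states that the volatile qualifier disables the removal of asm statements whose outputs are unused.

```c
#include <assert.h>

/* Non-volatile asm whose only output is never read.  GCC is free to
   delete the statement; per the message above, the "memory" clobber
   does not prevent that.  The template is empty (an assumption made
   here so the sketch assembles on every target). */
static int unused_output(int x)
{
    int z;
    __asm__ ("" : "=r" (z) : "r" (x) : "memory");
    (void) z;   /* silences "set but not used"; the value is still dead */
    return x;
}

/* Same statement marked volatile: the compiler must keep the asm. */
static int unused_output_volatile(int x)
{
    int z;
    __asm__ volatile ("" : "=r" (z) : "r" (x) : "memory");
    (void) z;
    return x;
}

/* Both variants behave identically at run time; the difference is
   only visible in the generated code. */
static void check_variants(void)
{
    assert(unused_output(5) == unused_output_volatile(5));
}
```

Comparing the RTL dumps of the two functions (for example with -fdump-rtl-all) is one way to see which statement survives.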
[committed] Xfail gcc.dg/ipa/ipcp-cstagg-7.c on 32-bit hppa targets
Structures larger than 64 bits are passed by invisible reference on
32-bit hppa targets. As noted in PR ipa/77732, this is not currently
handled. So, we need to xfail these targets.

Dave
--
John David Anglin  dave.ang...@bell.net

2017-08-20  John David Anglin

	PR ipa/77732
	* gcc.dg/ipa/ipcp-cstagg-7.c: Xfail on 32-bit hppa.

Index: gcc.dg/ipa/ipcp-cstagg-7.c
===
--- gcc.dg/ipa/ipcp-cstagg-7.c (revision 251059)
+++ gcc.dg/ipa/ipcp-cstagg-7.c (working copy)
@@ -62,4 +62,4 @@
   return bar (s, x);
 }
 
-/* { dg-final { scan-ipa-dump-times "Discovered an indirect call to a known target" 3 "cp" } } */
+/* { dg-final { scan-ipa-dump-times "Discovered an indirect call to a known target" 3 "cp" { xfail { hppa*-*-* && { ! lp64 } } } } } */
[committed] hpux: Fix testsuite/17_intro/names.cc failure
On hpux, the namespace for 'd' and 'r' is not clean, so we need to
undef them in the test. More info is available in PR testsuite/81056.

Dave
--
John David Anglin  dave.ang...@bell.net

2017-08-20  John David Anglin

	PR testsuite/81056
	* testsuite/17_intro/names.cc: Undef 'd' and 'r' on __hpux__

Index: testsuite/17_intro/names.cc
===
--- testsuite/17_intro/names.cc (revision 248710)
+++ testsuite/17_intro/names.cc (working copy)
@@ -107,4 +107,9 @@
 #undef y
 #endif
 
+#ifdef __hpux__
+#undef d
+#undef r
+#endif
+
 #include
New German PO file for 'gcc' (version 7.2.0)
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators. The file is available at:

    http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-7.2.0.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

    http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below. The tarball may be just a pretest or a
snapshot, it does not even have to compile. It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

    http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.
Re: [PATCH] Asm memory constraints
On Sun, Aug 20, 2017 at 08:00:53AM -0500, Segher Boessenkool wrote:
> Hi Alan,
>
> On Sat, Aug 19, 2017 at 12:19:35AM +0930, Alan Modra wrote:
> > +Flushing registers to memory has performance implications and may be
> > +an issue for time-sensitive code. You can provide better information
> > +to GCC to avoid this, as shown in the following examples. At a
> > +minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
> > +need to be flushed. Also, if GCC can prove that all of the outputs of
> > +a non-volatile @code{asm} statement are unused, then the @code{asm}
> > +may be deleted. Removal of otherwise dead @code{asm} statements will
> > +not happen if they clobber @code{"memory"}.
>
> void f(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x) : "memory"); }
> void g(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x)); }
>
> Both f and g are completely removed by the first jump pass immediately
> after expand (via delete_trivially_dead_insns).
>
> Do you have a testcase for the behaviour you saw?

Oh my. I was sure that was how "memory" worked! I see though that
every gcc I have lying around, going all the way back to gcc-2.95,
deletes the asm in your testcase. I definitely don't want to put
something in the docs that is plain wrong, or just my idea of how
things ought to work, so the last two sentences quoted above need to
go. Thanks for the correction. Fixed in this revised patch.

The only controversial aspect now should be whether those array casts
ought to be officially blessed. I've checked that "=m" (*(T (*)[]) ptr),
"=m" (*(T (*)[n]) ptr), and "=m" (*(T (*)[10]) ptr), all generate
reasonable MEM_ATTRS handled apparently properly by alias.c and other
code. For example, at -O3 the following shows gcc moving the read of
"val" before the asm, while an asm using a "memory" clobber forces the
read to occur after the asm.

static int
f (double *x)
{
  int res;
  asm ("#%0 %1 %2" : "=r" (res) : "r" (x), "m" (*(double (*)[]) x));
  return res;
}

int val = 123;
double foo[10];

int
main ()
{
  int b = f (foo);
  __builtin_printf ("%d %d\n", val, b);
  return 0;
}

I'm also encouraged by comments like the following by rth in 2004
(gcc/c/c-typeck.c), which say that using non-kosher lvalues in memory
output constraints must continue to be supported.

  /* ??? Really, this should not be here. Users should be using a
     proper lvalue, dammit. But there's a long history of using casts
     in the output operands. In cases like longlong.h, this becomes a
     primitive form of typechecking -- if the cast can be removed, then
     the output operand had a type of the proper width; otherwise we'll
     get an error. Gross, but ... */
  STRIP_NOPS (output);

	* doc/extend.texi (Clobbers): Correct vax example. Delete old
	example of a memory input for a string of known length. Move
	commentary out of table. Add a number of new examples
	covering array memory inputs.
testsuite/
	* gcc.target/i386/asm-mem.c: New test.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 649be01..940490e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8755,7 +8755,7 @@ registers:
 asm volatile ("movc3 %0, %1, %2"
                    : /* No outputs. */
                    : "g" (from), "g" (to), "g" (count)
-                   : "r0", "r1", "r2", "r3", "r4", "r5");
+                   : "r0", "r1", "r2", "r3", "r4", "r5", "memory");
 @end example
 
 Also, there are two special clobber arguments:
@@ -8786,14 +8786,72 @@ Note that this clobber does not prevent the @emph{processor} from doing
 speculative reads past the @code{asm} statement. To prevent that, you need
 processor-specific fence instructions.
 
-Flushing registers to memory has performance implications and may be an issue
-for time-sensitive code. You can use a trick to avoid this if the size of
-the memory being accessed is known at compile time. For example, if accessing
-ten bytes of a string, use a memory input like:
+@end table
 
-@code{@{"m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )@}}.
+Flushing registers to memory has performance implications and may be
+an issue for time-sensitive code. You can provide better information
+to GCC to avoid this, as shown in the following examples. At a
+minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
+need to be flushed.
 
-@end table
+Here is a fictitious sum of squares instruction, that takes two
+pointers to floating point values in memory and produces a floating
+point register output.
+Notice that @code{x}, and @code{y} both appear twice in the @code{asm}
+parameters, once to specify memory accessed, and once to specify a
+base register used by the @code{asm}. You won't normally be wasting a
+register by doing this as GCC can use the same register for both
+purposes. However, it would be foolish to use both @code{%1} and
+@code{%3} for @code{x} in this @code{a
Clobbers and Scratch Registers
This is a revised version of
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01562.html limited to
showing just the scratch register aspect, as a followup to
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01174.html

	* doc/extend.texi (Extended Asm ): Rename to "Clobbers
	and Scratch Registers". Add paragraph on alternative to clobbers
	for scratch registers and OpenBLAS example.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 940490e..0637672 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8075,7 +8075,7 @@ A comma-separated list of C expressions read by the instructions in the
 @item Clobbers
 A comma-separated list of registers or other values changed by the
 @var{AssemblerTemplate}, beyond those listed as outputs.
-An empty list is permitted. @xref{Clobbers}.
+An empty list is permitted. @xref{Clobbers and Scratch Registers}.
 
 @item GotoLabels
 When you are using the @code{goto} form of @code{asm}, this section contains
@@ -8435,7 +8435,7 @@ The enclosing parentheses are a required part of the syntax.
 
 When the compiler selects the registers to use to
 represent the output operands, it does not use any of the clobbered registers
-(@pxref{Clobbers}).
+(@pxref{Clobbers and Scratch Registers}).
 
 Output operand expressions must be lvalues. The compiler cannot check whether
 the operands have data types that are reasonable for the instruction being
@@ -8671,7 +8671,8 @@ as input. The enclosing parentheses are a required part of the syntax.
 @end table
 
 When the compiler selects the registers to use to represent the input
-operands, it does not use any of the clobbered registers (@pxref{Clobbers}).
+operands, it does not use any of the clobbered registers
+(@pxref{Clobbers and Scratch Registers}).
 
 If there are no output operands but there are input operands, place two
 consecutive colons where the output operands would go:
@@ -8722,9 +8723,10 @@ asm ("cmoveq %1, %2, %[result]"
    : "r" (test), "r" (new), "[result]" (old));
 @end example
 
-@anchor{Clobbers}
-@subsubsection Clobbers
+@anchor{Clobbers and Scratch Registers}
+@subsubsection Clobbers and Scratch Registers
 @cindex @code{asm} clobbers
+@cindex @code{asm} scratch registers
 
 While the compiler is aware of changes to entries listed in the output
 operands, the inline @code{asm} code may modify more than just the outputs. For
@@ -8853,6 +8855,65 @@ dscal (size_t n, double *x, double alpha)
 @}
 @end smallexample
 
+Rather than allocating fixed registers via clobbers to provide scratch
+registers for an @code{asm} statement, an alternative is to define a
+variable and make it an early-clobber output as with @code{a2} and
+@code{a3} in the example below. This gives the compiler register
+allocator more freedom. You can also define a variable and make it an
+output tied to an input as with @code{a0} and @code{a1}, tied
+respectively to @code{ap} and @code{lda}. Of course, with tied
+outputs your @code{asm} can't use the input value after modifying the
+output register since they are one and the same register. Note also
+that tying an input to an output is the way to set up an initialized
+temporary register modified by an @code{asm} statement. An input not
+tied to an output is assumed by GCC to be unchanged, for example
+@code{"b" (16)} below sets up @code{%11} to 16, and GCC might use that
+register in following code if the value 16 happened to be needed. You
+can even use a normal @code{asm} output for a scratch if all inputs
+that might share the same register are consumed before the scratch is
+used. The VSX registers clobbered by the @code{asm} statement could
+have used this technique except for GCC's limit on the number of
+@code{asm} parameters.
+
+@smallexample
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+                  const double *x, double *y, double alpha)
+@{
+  double *a0;
+  double *a1;
+  double *a2;
+  double *a3;
+
+  __asm__
+(
+  /* lots of asm here */
+  "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+  "#a0=%3 a1=%4 a2=%5 a3=%6"
+  :
+    "+m" (*(double (*)[n]) y),
+    "+r" (n), // 1
+    "+b" (y), // 2
+    "=b" (a0), // 3
+    "=b" (a1), // 4
+    "=&b" (a2), // 5
+    "=&b" (a3) // 6
+  :
+    "m" (*(const double (*)[n]) x),
+    "m" (*(const double (*)[]) ap),
+    "d" (alpha),// 9
+    "r" (x),// 10
+    "b" (16), // 11
+    "3" (ap), // 12
+    "4" (lda) // 13
+  :
+    "cr0",
+    "vs32","vs33","vs34","vs35","vs36","vs37",
+    "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+  );
+@}
+@end smallexample
+
 @anchor{GotoLabels}
 @subsubsection Goto Labels
 @cindex @code{asm} goto labels

--
Alan Modra
Australia Development Lab, IBM
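As a smaller illustration of the two approaches contrasted in the patch text above (a fixed register named in the clobber list versus an early-clobber output used as a scratch), here is a sketch for x86-64 only; the patch's own example targets PowerPC. The function names, the HAVE_X86_64_ASM macro and the operand names are hypothetical, chosen for this sketch, and other targets fall back to plain C.

```c
#include <assert.h>

#if defined(__x86_64__) && defined(__GNUC__)
#define HAVE_X86_64_ASM 1
#else
#define HAVE_X86_64_ASM 0
#endif

/* Variant 1: the scratch register is fixed (rcx) and announced to the
   compiler through the clobber list. */
static long long add_twice_clobber(long long a, long long b)
{
#if HAVE_X86_64_ASM
    long long result;
    __asm__ ("lea (%[b],%[b]), %%rcx\n\t"      /* rcx = 2*b     */
             "lea (%%rcx,%[a]), %[res]"        /* res = rcx + a */
             : [res] "=r" (result)
             : [a] "r" (a), [b] "r" (b)
             : "rcx");
    return result;
#else
    return a + 2 * b;   /* portable fallback */
#endif
}

/* Variant 2: the scratch is an early-clobber output, so the register
   allocator picks the register.  The '&' is needed because the scratch
   is written before input 'a' has been read. */
static long long add_twice_scratch(long long a, long long b)
{
#if HAVE_X86_64_ASM
    long long result;
    long long scratch;
    __asm__ ("lea (%[b],%[b]), %[tmp]\n\t"     /* tmp = 2*b     */
             "lea (%[tmp],%[a]), %[res]"       /* res = tmp + a */
             : [res] "=r" (result), [tmp] "=&r" (scratch)
             : [a] "r" (a), [b] "r" (b));
    return result;
#else
    return a + 2 * b;   /* portable fallback */
#endif
}

/* Both variants compute a + 2*b. */
static void check_add_twice(void)
{
    assert(add_twice_clobber(3, 4) == 11);
    assert(add_twice_scratch(3, 4) == 11);
}
```

The result operand can be a normal output in both variants because it is written only by the last instruction, after every input has been consumed.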
[patch, fortran] Bug 81296 - derived type I/o problem
Hi all,

The attached patch adds a check for the format label containing a "DT"
format descriptor and enables the generation of the correct code.

The patch modifies an existing test case as a future check on this.

Regression tested on x86_64-linux.

OK for trunk and backport to 7?

Regards,

Jerry

2017-08-21  Jerry DeLisle

	PR fortran/81296
	* trans-io.c (get_dtio_proc): Add check for format label and set
	formatted flag accordingly. Reorganize the code a little.

diff --git a/gcc/fortran/trans-io.c b/gcc/fortran/trans-io.c
index c3c56f29..aa974eb3 100644
--- a/gcc/fortran/trans-io.c
+++ b/gcc/fortran/trans-io.c
@@ -2214,18 +2214,24 @@ get_dtio_proc (gfc_typespec * ts, gfc_code * code, gfc_symbol **dtio_sub)
   bool formatted = false;
   gfc_dt *dt = code->ext.dt;
 
-  if (dt && dt->format_expr)
+  if (dt)
     {
-      char *fmt;
-      fmt = gfc_widechar_to_char (dt->format_expr->value.character.string,
-                                  -1);
-      if (strtok (fmt, "DT") != NULL)
+      char *fmt = NULL;
+
+      if (dt->format_label == &format_asterisk)
+        {
+          /* List directed io must call the formatted DTIO procedure. */
+          formatted = true;
+        }
+      else if (dt->format_expr)
+        fmt = gfc_widechar_to_char (dt->format_expr->value.character.string,
+                                    -1);
+      else if (dt->format_label)
+        fmt = gfc_widechar_to_char (dt->format_label->format->value.character.string,
+                                    -1);
+      if (fmt && strtok (fmt, "DT") != NULL)
         formatted = true;
-}
-  else if (dt && dt->format_label == &format_asterisk)
-{
-      /* List directed io must call the formatted DTIO procedure. */
-      formatted = true;
+
     }
 
   if (ts->type == BT_CLASS)
diff --git a/gcc/testsuite/gfortran.dg/dtio_12.f90 b/gcc/testsuite/gfortran.dg/dtio_12.f90
index 213f7ebb..cf1bfe38 100644
--- a/gcc/testsuite/gfortran.dg/dtio_12.f90
+++ b/gcc/testsuite/gfortran.dg/dtio_12.f90
@@ -70,5 +70,11 @@ end module
   rewind (10)
   read (10, *) msg
   if (trim (msg) .ne. "77") call abort
+  rewind (10)
+  write (10,40) child (77) ! Modified using format label
+40 format(DT)
+  rewind (10)
+  read (10, *) msg
+  if (trim (msg) .ne. "77") call abort
   close(10)
 end
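For readers unfamiliar with the C library call used in the patch above: strtok treats its second argument as a set of delimiter characters, not as a substring to search for. The sketch below is a standalone illustration and not GCC code; has_dt_substring, strtok_finds_token and check_dt_helpers are hypothetical names, and the substring test is deliberately naive (it ignores letter case and quoted strings inside a format).

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the literal substring "DT" occurs in fmt. */
static int has_dt_substring(const char *fmt)
{
    return strstr(fmt, "DT") != NULL;
}

/* Mirrors the shape of the test in the patch: returns 1 if strtok
   finds any token when 'D' and 'T' are used as delimiter characters.
   strtok modifies its argument, so this works on a local copy
   (inputs longer than 63 characters are truncated). */
static int strtok_finds_token(const char *fmt)
{
    char buf[64];
    strncpy(buf, fmt, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return strtok(buf, "DT") != NULL;
}

/* A few format strings run through both helpers. */
static void check_dt_helpers(void)
{
    assert(has_dt_substring("(DT)") == 1);
    assert(has_dt_substring("(I5)") == 0);
    assert(strtok_finds_token("(DT)") == 1);  /* first token is "(" */
    assert(strtok_finds_token("(I5)") == 1);  /* whole string is one token */
    assert(strtok_finds_token("DT") == 0);    /* only delimiters, no token */
}
```

With these definitions, "(I5)" yields 0 from has_dt_substring but 1 from strtok_finds_token, which is the difference between a substring search and a delimiter-set split.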