Re: [PATCH] For broken exception handling in GDB on AIX platform

2017-03-01 Thread Nitish Kumar Mishra
Hi,
I have opened a defect for the same here:
https://sourceware.org/bugzilla/show_bug.cgi?id=21187

Thanks and Regards,
Nitish K Mishra

On Wed, Mar 1, 2017 at 1:25 PM, Nitish Kumar Mishra
 wrote:
> Hi,
> The patch is for the broken exception handling in GDB on AIX platform.
> When linked statically with libstdc++ and libgcc on AIX platform, GDB
> is facing broken exception handling issues.
> Following is the error output when GDB is linked statically with
> mentioned libraries: (GDB-7.12.1, built with GCC-6.2, 64 bit mode, AIX
> platform):
>
> # ./gdb
> GNU gdb (GDB) 7.12.1
> Copyright (C) 2017 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "powerpc64-ibm-aix7.2.0.0".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
> For help, type "help".
> Type "apropos word" to search for commands related to "word".
> (gdb) kill
> terminate called after throwing an instance of 
> 'gdb_exception_RETURN_MASK_ERROR'
> IOT/Abort trap (core dumped)
>
>
> The issue has been discussed here:
> https://sourceware.org/ml/gdb/2017-02/msg00047.html
>
> I have manually built and tested GDB-7.12.1 with this patch on AIX-7.2
> and Ubuntu-16.04 with GCC-6.2 and GCC-4.8.5. On both operating systems
> GDB is working fine with the patch. I generated configure file using
> autoconf-2.64.
>
> The attached patch is for configure.ac file in binutils-gdb, in which
> one more option "--disable-staticlib" is implemented to link libstdc++
> and libgcc dynamically.
> I believe this issue is specific to AIX platform.
>
> Please find the attachment for patch file, and ChangeLog file.
>
> Thanks and Regards,
> Nitish K Mishra


Re: [RFA PATCH, i386]: Warn for 64-bit values in general-reg asm operands and error out for 8-bit values in invalid GR asm operand

2017-03-01 Thread Uros Bizjak
On Tue, Feb 28, 2017 at 4:34 PM, Uros Bizjak  wrote:
> On Tue, Feb 28, 2017 at 12:06 PM, Jakub Jelinek  wrote:
>> On Tue, Feb 28, 2017 at 11:41:56AM +0100, Richard Biener wrote:
>>> > 2017-02-28  Uros Bizjak  
>>> >
>>> > * config/i386/i386.c (print_reg): Warn for values of 64-bit size
>>> > in integer register on 32-bit targets.  Error out for values of
>>> > 8-bit size in invalid integer register.
>>> >
>>> > testsuite/ChangeLog:
>>> >
>>> > 2017-02-28  Uros Bizjak  
>>> >
>>> > * gcc.target/i386/invsize-1.c: New test.
>>> > * gcc.target/i386/invsize-2.c: Ditto.
>>> >
>>> > OK for mainline in stage 4?
>>>
>>> Yes.
>>
>> Have you tried to build say Linux kernel or firefox or similar large
>> codebase with lots of inline asm with that?
>
> No...
>
>> What constraint should people use for long long vars in 32-bit code?
>> "A" constraint is used a lot in 32-bit code (say for inline asm with
>> rdtsc), but what if you need more than one long long input?
>
> Hm, we don't guarantee DImode register pairs other than "A"
> constraint. But you are right, we have to allow "A" for 64bit eax/edx
> pair.

Some more thoughts on 64-bit reg on 32-bit targets warning.

Actually, we never *print* the register name for instructions that use the "A"
constraint, since %eax/%edx is always implicit.  The warning does not
deal with constraints, so unless we want to output a DImode register
name, there is no warning.

Running the complete testsuite on a patched compiler, the following
testcases trigger the warning on 32-bit targets:

FAIL: gcc.target/i386/pr66274.c (test for excess errors)
FAIL: gcc.target/i386/stackalign/asm-1.c -mstackrealign (test for excess errors)
FAIL: gcc.target/i386/stackalign/asm-1.c -mno-stackrealign (test for
excess errors)

Some analysis:

pr66274:

void f()
{
  asm ("push %0" : : "r" ((unsigned long long) 456));
}

compiles to:

f:
movl    $456, %eax
movl    $0, %edx
#APP
# 6 "pr66274.c" 1
push %eax
# 0 "" 2
#NO_APP

The warning is correct, we didn't push the whole long long value.

stackalign/asm-1.c:

void f(){asm("%0"::"r"(1.5F));}void g(){asm("%0"::"r"(1.5));}

resulting in:

g:
...
fldl    .LC1
fstpl   -16(%ebp)
movl    -16(%ebp), %eax
movl    -12(%ebp), %edx
#APP
# 7 "asm-1.c" 1
%eax
# 0 "" 2
#NO_APP

We want to handle a DFmode value (1.5) in the asm on a 32-bit target. Since
the value is 64-bit and we use a 32-bit register, the warning is correct
and beneficial in this case.

I will compile linux kernel with the patched compiler, but looking at
the above two cases, I'd say that warning is indeed helpful.
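
To make the pr66274 case concrete, here is a minimal, portable sketch (not GCC code; the helper names are made up for illustration) of what an asm template effectively sees when only the low register of a 64-bit operand is printed. The constant 456 survives by accident because its high half is zero; a value with a nonzero high half would be silently truncated.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: the two 32-bit halves that a 32-bit target
// would keep in a register pair such as %eax (low) and %edx (high).
inline std::uint32_t low_half (std::uint64_t v)
{
  return static_cast<std::uint32_t> (v);
}

inline std::uint32_t high_half (std::uint64_t v)
{
  return static_cast<std::uint32_t> (v >> 32);
}

// An asm template that prints only %0 operates on the low register
// alone, so the value is preserved only when the high half is zero.
inline bool survives_low_register_only (std::uint64_t v)
{
  return high_half (v) == 0;
}
```

With these helpers, 456 passes the check while 0x1000001C8 (the same low half with a high half of 1) does not.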

Uros.


Re: [RFA PATCH, i386]: Warn for 64-bit values in general-reg asm operands and error out for 8-bit values in invalid GR asm operand

2017-03-01 Thread Jakub Jelinek
On Wed, Mar 01, 2017 at 09:34:53AM +0100, Uros Bizjak wrote:
> Some more thoughts on 64-bit reg on 32-bit targets warning.
> 
> Actually, we never *print* register name for instruction that use "A"
> constraint, since %eax/%edx is always implicit.  The warning does not
> deal with constraints, so unless we want to output DImode register
> name, there is no warning.

Ah, indeed, we don't have a modifier that would print the high register
of a register pair (i.e. essentially print REGNO (x) + 1 instead of REGNO
(x)), guess that might be useful not just for 64-bit GPR operands in 32-bit
code, but also 128-bit GPR operands in 64-bit code.

While looking at ix86_print_operand, I've noticed duplication in the
comment:
   w -- print the operand as if it's a "word" (HImode) even if it isn't.
   s -- print a shift double count, followed by the assemblers argument
delimiter.
   b -- print the QImode name of the register for the indicated operand.
%b0 would print %al if operands[0] is reg 0.
   w --  likewise, print the HImode name of the register.
   k --  likewise, print the SImode name of the register.
   q --  likewise, print the DImode name of the register.
   x --  likewise, print the V4SFmode name of the register.
   t --  likewise, print the V8SFmode name of the register.
   g --  likewise, print the V16SFmode name of the register.

w is documented twice, guess the first line should be removed.

Jakub


Re: [RFA PATCH, i386]: Warn for 64-bit values in general-reg asm operands and error out for 8-bit values in invalid GR asm operand

2017-03-01 Thread Uros Bizjak
On Wed, Mar 1, 2017 at 9:48 AM, Jakub Jelinek  wrote:
> On Wed, Mar 01, 2017 at 09:34:53AM +0100, Uros Bizjak wrote:
>> Some more thoughts on 64-bit reg on 32-bit targets warning.
>>
>> Actually, we never *print* register name for instruction that use "A"
>> constraint, since %eax/%edx is always implicit.  The warning does not
>> deal with constraints, so unless we want to output DImode register
>> name, there is no warning.
>
> Ah, indeed, we don't have a modifier that would print the high register
> of a register pair (i.e. essentially print REGNO (x) + 1 instead of REGNO
> (x)), guess that might be useful not just for 64-bit GPR operands in 32-bit
> code, but also 128-bit GPR operands in 64-bit code.

The issue here is that (modulo ax/dx with "A" constraint) we don't
guarantee double-register sequence order, so any change in register
allocation order would break any assumptions. For implicit ax/dx, user
should explicitly use register name (e.g. DImode operand in "rdtscp;
mov %0, mem" asm should be corrected to use %%eax instead of %0).

And, yes - we should add similar warning for 128-bit GPRs. The only
way to use register pair with width > machine_mode is with implicit
operands or with explicit regnames.
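
As an illustration of avoiding a 64-bit operand altogether, one common idiom binds each half to its own 32-bit output. The sketch below is an assumption-laden example rather than anything from the patch: `combine_halves` and `read_cycle_counter` are hypothetical names, and the asm is only compiled for GCC-compatible compilers on x86.

```cpp
#include <cassert>
#include <cstdint>

// Rebuild a 64-bit value from explicitly named low and high halves.
inline std::uint64_t combine_halves (std::uint32_t lo, std::uint32_t hi)
{
  return (static_cast<std::uint64_t> (hi) << 32) | lo;
}

// Read the x86 time-stamp counter.  Each half gets its own 32-bit
// operand ("=a" is %eax, "=d" is %edx), so no 64-bit operand is ever
// printed through %0.  On other targets this sketch just returns 0.
inline std::uint64_t read_cycle_counter ()
{
#if defined (__GNUC__) && (defined (__i386__) || defined (__x86_64__))
  std::uint32_t lo, hi;
  __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
  return combine_halves (lo, hi);
#else
  return 0;
#endif
}
```

Because the register assignment is spelled out by the constraints, nothing here depends on the order in which the register allocator would place a double-register value.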

> While looking at ix86_print_operand, I've noticed duplication in the
> comment:
>    w -- print the operand as if it's a "word" (HImode) even if it isn't.
>    s -- print a shift double count, followed by the assemblers argument
> delimiter.
>    b -- print the QImode name of the register for the indicated operand.
> %b0 would print %al if operands[0] is reg 0.
>    w --  likewise, print the HImode name of the register.
>    k --  likewise, print the SImode name of the register.
>    q --  likewise, print the DImode name of the register.
>    x --  likewise, print the V4SFmode name of the register.
>    t --  likewise, print the V8SFmode name of the register.
>    g --  likewise, print the V16SFmode name of the register.
>
> w is documented twice, guess the first line should be removed.

Indeed. The first one should be removed.

Uros.


Re: [PATCH] Some more translation related tweaks

2017-03-01 Thread Tom de Vries

On 27/02/17 18:33, Jakub Jelinek wrote:

On Mon, Feb 27, 2017 at 12:47:09PM +, Joseph Myers wrote:

On Mon, 27 Feb 2017, Jakub Jelinek wrote:


On Mon, Feb 27, 2017 at 11:04:36AM +0100, Volker Reichelt wrote:

This is not -Wformat-security friendly, perhaps better
  pedwarn (EXPR_LOC_OR_LOC (outer_nelts, input_location), OPT_Wvla,
   typedef_variant_p (orig_type)
   ? "non-constant array new length must be specified "
 "directly, not by typedef"
   : "non-constant array new length must be specified "
 "without parentheses around the type-id");
?


Not quite. Like this the second string doesn't end up in the gcc.pot
file for translation. I had to wrap the second string in G_(...) to make
it work. (I'll have a look for other instances of this pattern and
prepare a separate patch.)


Looks like a xgettext bug or missing feature :(.  Joseph, shall we just
change all those to be G_() around the second string (well, some could be


Yes, it's generally the case that G_() is used whenever there's a
conditional expression for the msgid argument to a diagnostic function.


So, is this ok for trunk?  Shall I regenerate gcc.pot or will you?

2017-02-27  Jakub Jelinek  

* config/i386/i386.c (ix86_option_override_internal): Use
cond ? G_("...") : G_("...") instead of just cond ? "..." : "...".
* config/nvptx/nvptx.c (nvptx_goacc_validate_dims): Likewise.



--- gcc/config/nvptx/nvptx.c.jj 2017-02-21 15:36:03.0 +0100
+++ gcc/config/nvptx/nvptx.c	2017-02-27 15:48:20.031688240 +0100
@@ -4542,8 +4542,8 @@ nvptx_goacc_validate_dims (tree decl, in
   if (fn_level < 0 && dims[GOMP_DIM_VECTOR] >= 0)
warning_at (decl ? DECL_SOURCE_LOCATION (decl) : UNKNOWN_LOCATION, 0,
dims[GOMP_DIM_VECTOR]
-   ? "using vector_length (%d), ignoring %d"
-   : "using vector_length (%d), ignoring runtime setting",
+   ? G_("using vector_length (%d), ignoring %d")
+   : G_("using vector_length (%d), ignoring runtime setting"),
PTX_VECTOR_LENGTH, dims[GOMP_DIM_VECTOR]);
   dims[GOMP_DIM_VECTOR] = PTX_VECTOR_LENGTH;
   changed = true;


This breaks the nvptx build:
...
src/gcc-mainline/gcc/config/nvptx/nvptx.c: In function 'bool 
nvptx_goacc_validate_dims(tree, int*, int)':
src/gcc-mainline/gcc/config/nvptx/nvptx.c:4545:51: error: 'G_' was not 
declared in this scope

...

I suppose an
  #include "intl.h"
will fix that.
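
For readers unfamiliar with the marker, here is a minimal sketch of the idea. It assumes that `G_` expands to its argument unchanged and exists only so that xgettext can find both msgids of a conditional expression; the macro definition below is a stand-in for the one the header provides, and `pick_message` is a hypothetical function. If nothing defines the macro, the call fails to compile, which matches the error above.

```cpp
#include <cassert>
#include <cstring>

// Assumption for this sketch: an identity marker macro standing in for
// the one that the real header would provide.
#ifndef G_
# define G_(gmsgid) gmsgid
#endif

// Both branches are wrapped, so a message extractor that only looks at
// marked strings can see the second msgid as well as the first.
inline const char *pick_message (bool have_user_value)
{
  return have_user_value
         ? G_("using vector_length (%d), ignoring %d")
         : G_("using vector_length (%d), ignoring runtime setting");
}
```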

Thanks,
- Tom


Re: [RFA PATCH, i386]: Warn for 64-bit values in general-reg asm operands and error out for 8-bit values in invalid GR asm operand

2017-03-01 Thread Uros Bizjak
On Wed, Mar 1, 2017 at 10:00 AM, Uros Bizjak  wrote:
> On Wed, Mar 1, 2017 at 9:48 AM, Jakub Jelinek  wrote:
>> On Wed, Mar 01, 2017 at 09:34:53AM +0100, Uros Bizjak wrote:
>>> Some more thoughts on 64-bit reg on 32-bit targets warning.
>>>
>>> Actually, we never *print* register name for instruction that use "A"
>>> constraint, since %eax/%edx is always implicit.  The warning does not
>>> deal with constraints, so unless we want to output DImode register
>>> name, there is no warning.
>>
>> Ah, indeed, we don't have a modifier that would print the high register
>> of a register pair (i.e. essentially print REGNO (x) + 1 instead of REGNO
>> (x)), guess that might be useful not just for 64-bit GPR operands in 32-bit
>> code, but also 128-bit GPR operands in 64-bit code.
>
> The issue here is that (modulo ax/dx with "A" constraint) we don't
> guarantee double-register sequence order, so any change in register
> allocation order would break any assumptions. For implicit ax/dx, user
> should explicitly use register name (e.g. DImode operand in "rdtscp;
> mov %0, mem" asm should be corrected to use %%eax instead of %0).
>
> And, yes - we should add similar warning for 128-bit GPRs. The only
> way to use register pair with  width > machine_mode is with implicit
> operands or with explicit regnames.

Something like the following patch I'm testing:

--cut here--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2b11aa1..943b2a0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -17646,13 +17646,16 @@ print_reg (rtx x, int code, FILE *file)

   switch (msize)
 {
+case 16:
+case 12:
 case 8:
+  if (GENERAL_REGNO_P (regno) && msize > GET_MODE_SIZE (word_mode))
+   warning (0, "unsupported size for integer register");
+  /* FALLTHRU */
 case 4:
   if (LEGACY_INT_REGNO_P (regno))
putc (msize == 8 && TARGET_64BIT ? 'r' : 'e', file);
   /* FALLTHRU */
-case 16:
-case 12:
 case 2:
 normal:
   reg = hi_reg_name[regno];
--cut here--
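
The condition added by the patch can be restated outside the compiler as a small predicate. This is only a sketch with plain parameters standing in for GENERAL_REGNO_P (regno), msize and GET_MODE_SIZE (word_mode); the function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for the patch's test
//   GENERAL_REGNO_P (regno) && msize > GET_MODE_SIZE (word_mode)
// using plain values instead of GCC internals.
inline bool warns_unsupported_size (bool is_general_reg,
                                    unsigned operand_bytes,
                                    unsigned word_bytes)
{
  return is_general_reg && operand_bytes > word_bytes;
}
```

Under this reading, an 8-byte operand in a general register warns when the word size is 4 bytes but not when it is 8, and a 16-byte operand warns on both.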

Uros.


Re: [PATCH] Some more translation related tweaks

2017-03-01 Thread Jakub Jelinek
On Wed, Mar 01, 2017 at 11:20:37AM +0100, Tom de Vries wrote:
> > 2017-02-27  Jakub Jelinek  
> > 
> > * config/i386/i386.c (ix86_option_override_internal): Use
> > cond ? G_("...") : G_("...") instead of just cond ? "..." : "...".
> > * config/nvptx/nvptx.c (nvptx_goacc_validate_dims): Likewise.
> 
> > --- gcc/config/nvptx/nvptx.c.jj 2017-02-21 15:36:03.0 +0100
> > +++ gcc/config/nvptx/nvptx.c	2017-02-27 15:48:20.031688240 +0100
> > @@ -4542,8 +4542,8 @@ nvptx_goacc_validate_dims (tree decl, in
> >if (fn_level < 0 && dims[GOMP_DIM_VECTOR] >= 0)
> > warning_at (decl ? DECL_SOURCE_LOCATION (decl) : UNKNOWN_LOCATION, 0,
> > dims[GOMP_DIM_VECTOR]
> > -   ? "using vector_length (%d), ignoring %d"
> > -   : "using vector_length (%d), ignoring runtime setting",
> > +   ? G_("using vector_length (%d), ignoring %d")
> > +   : G_("using vector_length (%d), ignoring runtime setting"),
> > PTX_VECTOR_LENGTH, dims[GOMP_DIM_VECTOR]);
> >dims[GOMP_DIM_VECTOR] = PTX_VECTOR_LENGTH;
> >changed = true;
> 
> This breaks the nvptx build:
> ...
> src/gcc-mainline/gcc/config/nvptx/nvptx.c: In function 'bool
> nvptx_goacc_validate_dims(tree, int*, int)':
> src/gcc-mainline/gcc/config/nvptx/nvptx.c:4545:51: error: 'G_' was not
> declared in this scope
> ...

Oops, sorry, I've fixed it in i386.c and two other files, but missed
nvptx.c.  Fixed thusly, committed to trunk:

2017-03-01  Jakub Jelinek  

* config/nvptx/nvptx.c: Include intl.h.

--- gcc/config/nvptx/nvptx.c.jj 2017-02-28 16:24:05.0 +0100
+++ gcc/config/nvptx/nvptx.c	2017-03-01 11:24:04.178802355 +0100
@@ -69,6 +69,7 @@
 #include "tree-phinodes.h"
 #include "cfgloop.h"
 #include "fold-const.h"
+#include "intl.h"
 
 /* This file should be included last.  */
 #include "target-def.h"


Jakub


Re: [PATCH], PR target/79434, fix PowerPC recursive calls that can replaced at runtime

2017-03-01 Thread Segher Boessenkool
On Wed, Mar 01, 2017 at 01:37:14AM -0500, Michael Meissner wrote:
> This patch fixes PR target/79439, where a recursive call in 64-bit code
> compiled with -fpic doesn't have the NOP after the call.  It is
> possible for the function to be overridden at link time.  In such a case, the
> call should call the module that is overriding the call, rather than itself.
> 
> The following patch was tested on a little endian Power8 Linux system (64-bit
> only), a big endian Power8 Linux system (both 32-bit and 64-bit), and a big
> endian Power7 Linux system (both 32-bit and 64-bit).  There were no regressions
> in the test suite, and I verified that the new test ran successfully in 64-bit
> mode.  Can I check this patch into the trunk?

Yes, thanks!

> Since the bug was reported against GCC 6, can I apply the patch to GCC 6
> assuming the patch applies cleanly and has no regressions after a burn in
> period on the GCC 7 trunk?

Of course.  Also for GCC 5, if it is worth fixing it there?

Some questions/comments about the testcase:

> Index: gcc/testsuite/gcc.target/powerpc/pr79439.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/pr79439.c(revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr79439.c(revision 0)
> @@ -0,0 +1,26 @@
> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */

Is this enough?  Do all 64-bit ABIs have the insn to be patched after
call instructions?

> +/* { dg-require-effective-target powerpc_p8vector_ok } */

Why this?

> +/* Bug 79439 -- we should not eliminate NOP in 'rec' call because it can be
> +   interposed at link time for 64-bit ABIs.  We need -fpic to tell the compiler
> +   functions may be interposed.  */

That reads as "cannot be interposed on 32-bit ABIs", which isn't what
you mean I think.

> +/* { dg-final { scan-assembler-times {\mnop\M} 3 } } */

You can also check they follow a "bl" insn immediately (scan-assembler
does not scan single lines, but the whole output).  Something like

{ scan-assembler-times {\mbl \S+\s+nop\M} 3 }

Or maybe this is overkill here :-)
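
To show why matching across line breaks matters here, the sketch below counts "bl" instructions that are immediately followed by a "nop" in a block of assembly text. It is an approximation rather than the DejaGnu directive itself: it uses C++ std::regex, with ECMAScript \b standing in for the Tcl \m and \M word anchors, and `count_bl_nop` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count "bl <target>" immediately followed by "nop".  The \s+ between
// them matches the newline and indentation, so the whole text is
// scanned as one string rather than line by line.
inline std::ptrdiff_t count_bl_nop (const std::string &asm_text)
{
  static const std::regex pattern (R"(\bbl \S+\s+nop\b)");
  return std::distance (std::sregex_iterator (asm_text.begin (),
                                              asm_text.end (), pattern),
                        std::sregex_iterator ());
}
```

A call whose nop is separated from it by another instruction is not counted, which is exactly the extra strictness the stricter pattern buys over counting nops alone.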


Segher


Poll for option name (Was: [PATCH v6] add -fprolog-pad=N,M option)

2017-03-01 Thread Torsten Duwe
On Fri, Feb 17, 2017 at 11:30:29PM -0700, Sandra Loosemore wrote:
> >
> >+@item prolog_pad
> >+@cindex @code{prolog_pad} function attribute
> 
> I'm only a documentation maintainer so this is out of my area of
> responsibility, but I really wish we could rename the attribute and
> command-line option.  Per
> 
> https://gcc.gnu.org/codingconventions.html#Spelling
> 
> the correct spelling is "prologue".
> 
> >+@cindex extra NOP instructions at the function entry point
> >+In case the target's text segment can be made writable at run time
> >+by any means, padding the function entry with a number of NOPs can
> >+be used to provide a universal tool for instrumentation.  Usually,
> >+prolog padding is enabled globally using the @option{-fprolog-pad=N,M}
> 
> definitely s/prolog/prologue/ in the running text here.

Well, you're definitely right in both cases.

About 400 occurrences of "prolog" in the source without ChangeLogs,
mainly in gcc/tree-vect-loop-manip.c and in libgcc/config/libbid/bid_conf.h;
about 3000 lines with "prologue". However there is a "-mprolog-function"
switch. One might call this a broken window. I don't want to contribute
to that.

However, writing some more documentation and being asked for clarity,
I found it more descriptive to talk about the function entry point than
about the prologue. Also, this is about generic instrumentation, and it
surely involves NOPs.

So, hereby I'd like to start a small poll for a good name for this feature.
Anyone with a better idea please speak up now. Otherwise I'll just
s/prolog/prologue/g.

> The amount of space reserved is expressed as the number of NOP instructions
> to insert. On targets that have multiple instruction sizes, typically the
> smallest NOP instruction available for the current CPU mode is used to
> achieve the finest granularity.
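
As a rough illustration of handling an argument of the form N,M, here is a small parsing sketch. It is not taken from the patch: what N and M denote is not spelled out in this thread, treating M as optional with a default of 0 is an assumption, and `pad_spec` and `parse_pad_spec` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical holder for the two numbers of an "N,M" argument.
struct pad_spec
{
  unsigned long n;
  unsigned long m;
};

// Parse "N" or "N,M" (decimal digits only).  Returns false on
// malformed input; M defaults to 0 when omitted (an assumption).
inline bool parse_pad_spec (const char *arg, pad_spec *out)
{
  char *end = nullptr;
  if (*arg < '0' || *arg > '9')
    return false;
  out->n = std::strtoul (arg, &end, 10);
  out->m = 0;
  if (*end == '\0')
    return true;
  if (*end != ',')
    return false;
  const char *rest = end + 1;
  if (*rest < '0' || *rest > '9')
    return false;
  out->m = std::strtoul (rest, &end, 10);
  return *end == '\0';
}
```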

I've made another improvement which makes the code even more robust now.
+DEF_TARGET_INSN (nop, (void))
In gcc/target-insns.def. This way I can easily check whether there is a
(define_insn "nop" ...) in the target md. Currently, all CPUs have it, but
who knows.

This will also be the default instruction used (it can be overridden
in the target hook), so that rule has changed.

So, before the next version, any clever name suggestions?

Torsten



Re: Poll for option name (Was: [PATCH v6] add -fprolog-pad=N,M option)

2017-03-01 Thread Richard Earnshaw (lists)
On 01/03/17 11:26, Torsten Duwe wrote:
> On Fri, Feb 17, 2017 at 11:30:29PM -0700, Sandra Loosemore wrote:
>>>
>>> +@item prolog_pad
>>> +@cindex @code{prolog_pad} function attribute
>>
>> I'm only a documentation maintainer so this is out of my area of
>> responsibility, but I really wish we could rename the attribute and
>> command-line option.  Per
>>
>> https://gcc.gnu.org/codingconventions.html#Spelling
>>
>> the correct spelling is "prologue".
>>
>>> +@cindex extra NOP instructions at the function entry point
>>> +In case the target's text segment can be made writable at run time
>>> +by any means, padding the function entry with a number of NOPs can
>>> +be used to provide a universal tool for instrumentation.  Usually,
>>> +prolog padding is enabled globally using the @option{-fprolog-pad=N,M}
>>
>> definitely s/prolog/prologue/ in the running text here.
> 
> Well, you're definitely right in both cases.
> 
> About 400 occurrences of "prolog" in the source without ChangeLogs,
> mainly in gcc/tree-vect-loop-manip.c and in libgcc/config/libbid/bid_conf.h;
> about 3000 lines with "prologue". However there is a "-mprolog-function"
> switch. One might call this a broken window. I don't want to contribute
> to that.
> 
> However, writing some more documentation and being asked for clarity,
> I found it more descriptive to talk about the function entry point than
> about the prologue. Also, this is about generic instrumentation, and it
> surely involves NOPs.
> 
> So, hereby I'd like to start a small poll for a good name for this feature.
> Anyone with a better idea please speak up now. Otherwise I'll just
> s/prolog/prologue/g.

Hmm, I'd prefer the bike shed to be green :-)

How about -fpatchable-function-entry=?

> 
>> The amount of space reserved is expressed as the number of NOP instructions
>> to insert. On targets that have multiple instruction sizes, typically the
>> smallest NOP instruction available for the current CPU mode is used to
>> achieve the finest granularity.
> 
> I've made another improvement which makes the code even more robust now.
> +DEF_TARGET_INSN (nop, (void))
> In gcc/target-insns.def. This way I can easily check whether there is a
> (define_insn "nop" ...) in the target md. Currently, all CPUs have it, but
> who knows.

The mid-end already has direct calls to gen_nop with no guards on the
pattern existing.  So the compiler won't build without a NOP pattern.

> 
> This will also be the default instruction used (it can be overridden
> in the target hook), so that rule has changed.
> 
> So, before the next version, any clever name suggestions?
> 
>   Torsten
> 

R.


[PATCH] Suppress compiler warning in libgcc/unwind-seh.c

2017-03-01 Thread JonY
Patch OK?

ChangeLog:
* unwind-seh.c: Suppress warnings for RtlUnwindEx() calls.
Index: libgcc/unwind-seh.c
===
--- libgcc/unwind-seh.c	(revision 245806)
+++ libgcc/unwind-seh.c	(working copy)
@@ -221,7 +221,7 @@
 	 test is that we're the target frame.  */
   if (ms_exc->ExceptionInformation[1] == (_Unwind_Ptr) this_frame)
 	{
-	  RtlUnwindEx (this_frame, ms_exc->ExceptionInformation[2],
+	  RtlUnwindEx (this_frame, (PVOID) ms_exc->ExceptionInformation[2],
 		   ms_exc, gcc_exc, ms_orig_context,
 		   ms_disp->HistoryTable);
 	  abort ();
@@ -313,7 +313,7 @@
 	  ms_exc->ExceptionInformation[3] = gcc_context.reg[1];
 
 	  /* Begin phase 2.  Perform the unwinding.  */
-	  RtlUnwindEx (this_frame, gcc_context.ra, ms_exc,
+	  RtlUnwindEx (this_frame, (PVOID)gcc_context.ra, ms_exc,
 		   (PVOID)gcc_context.reg[0], ms_orig_context,
 		   ms_disp->HistoryTable);
 	}
@@ -365,7 +365,7 @@
   ms_context.ContextFlags = CONTEXT_ALL;
   RtlCaptureContext (&ms_context);
 
-  RtlUnwindEx ((void *) gcc_exc->private_[1], gcc_exc->private_[2],
+  RtlUnwindEx ((void *) gcc_exc->private_[1], (PVOID)gcc_exc->private_[2],
 	   &ms_exc, gcc_exc, &ms_context, &ms_history);
 
   /* Is RtlUnwindEx declared noreturn?  */
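
The kind of conversion the patch makes explicit can be sketched in isolation. libgcc is C, where passing an integer for a pointer parameter only draws a warning; in the C++ analogue below the cast is mandatory. The function names are hypothetical and nothing here touches the real RtlUnwindEx.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical callee taking an untyped pointer, like the PVOID
// parameter in the patch.
inline void *identity_pointer (void *p)
{
  return p;
}

// An address held in an integer is converted explicitly before the
// call, which documents the intent and silences the diagnostic.
inline void *address_to_pointer (std::uintptr_t addr)
{
  return identity_pointer (reinterpret_cast<void *> (addr));
}
```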


Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Sandiford
Sorry for the late reply, but:

Richard Biener  writes:
> On Mon, 7 Nov 2016, Richard Biener wrote:
>
>> 
>> Currently we force peeling for gaps whenever element overrun can occur
>> but for aligned accesses we know that the loads won't trap and thus
>> we can avoid this.
>> 
>> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
>> some testsuite fallout here so didn't bother to invent a new testcase).
>> 
>> Just in case somebody thinks the overrun is a bad idea in general
>> (even when not trapping).  Like for ASAN or valgrind.
>
> This is what I applied.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Richard.
[...]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 15aec21..c29e73d 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree vectype, 
> bool slp,
>/* If there is a gap at the end of the group then these optimizations
>would access excess elements in the last iteration.  */
>bool would_overrun_p = (gap != 0);
> +  /* If the access is aligned an overrun is fine.  */
> +  if (would_overrun_p
> +   && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> + would_overrun_p = false;
>if (!STMT_VINFO_STRIDED_P (stmt_info)
> && (can_overrun_p || !would_overrun_p)
> && compare_step_with_zero (stmt) > 0)

...is this right for all cases?  I think it only looks for single-vector
alignment, but the gap can in principle be vector-sized or larger,
at least for load-lanes.

E.g. say we have a 128-bit vector of doubles in a group of size 4
and a gap of 2 or 3.  Even if the access itself is aligned, the group
spans two vectors and we have no guarantee that the second one
is mapped.
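
The arithmetic behind that example can be sketched as follows. This is a simplified model, not vectorizer code: it assumes the whole group is loaded with full vectors, and the function name, the 4096-byte page size used in the note below and all parameters are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Could loading the whole group touch a page that the elements the
// loop actually needs do not touch?  The last `gap` elements of the
// group are the ones that are loaded but never used.
inline bool overrun_may_touch_new_page (std::uint64_t base,
                                        unsigned elem_bytes,
                                        unsigned group_size,
                                        unsigned gap,
                                        std::uint64_t page_bytes)
{
  assert (gap < group_size && elem_bytes > 0 && page_bytes > 0);
  std::uint64_t last_needed
    = base + std::uint64_t (group_size - gap) * elem_bytes - 1;
  std::uint64_t last_loaded
    = base + std::uint64_t (group_size) * elem_bytes - 1;
  return last_needed / page_bytes != last_loaded / page_bytes;
}
```

With doubles, a 16-byte-aligned base in the last 16 bytes of a 4096-byte page and a group of 4, a gap of 2 or 3 puts the unused tail in the next page, while a gap that stays within the last needed vector does not.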

I haven't been able to come up with a testcase though.  We seem to be
overly conservative when computing alignments.

Thanks,
Richard


[PATCH PR66768]Skip address type iv_use if base object can't be determined

2017-03-01 Thread Bin Cheng
Hi,
As reported in PR66768, IVOPTs drops address-space information.  The root cause
is that IVOPTs fails to preserve the base object when identifying and rewriting
address-type iv_uses for pointers converted from constant values.  This patch
simply skips an address-type iv_use if its base object can't be determined.  In
the future, better handling of base objects is needed to fix the issue.
Benchmark data show the added condition is triggered only twice in spec2k6, and
both cases are actually null-pointer references.  I believe that code is dead
in the benchmark and is never executed.

Bootstrapped and tested on x86_64.  Is it OK?

2017-02-27  Bin Cheng  

PR tree-optimization/66768
* tree-ssa-loop-ivopts.c (find_interesting_uses_address): Skip addr
iv_use if base object can't be determined.

2017-02-27  Bin Cheng  

PR tree-optimization/66768
* gcc.target/i386/pr66768.c: New test.

diff --git a/gcc/testsuite/gcc.target/i386/pr66768.c b/gcc/testsuite/gcc.target/i386/pr66768.c
new file mode 100644
index 000..9a8ad1f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66768.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef __seg_gs struct foo_s {
+  int a[20];
+} foo_t;
+
+int sum(void)
+{
+  const foo_t *p = (const foo_t *)0x1234;
+  int i, total=0;
+  for (i=0; i<20; i++)
+    total += p->a[i];
+  return total;
+}
+
+/* { dg-final { scan-assembler "add*.\[ \t\]%gs:" } } */
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index c9d16b2..f3ad373 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -2324,6 +2324,10 @@ find_interesting_uses_address (struct ivopts_data *data, gimple *stmt,
 }
 
   civ = alloc_iv (data, base, step);
+  /* Fail if base object of this memory reference is unknown.  */
+  if (civ->base_object == NULL_TREE)
+goto fail;
+
   record_group_use (data, op_p, civ, stmt, USE_ADDRESS);
   return;
 


Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Biener
On Wed, 1 Mar 2017, Richard Sandiford wrote:

> Sorry for the late reply, but:
> 
> Richard Biener  writes:
> > On Mon, 7 Nov 2016, Richard Biener wrote:
> >
> >> 
> >> Currently we force peeling for gaps whenever element overrun can occur
> >> but for aligned accesses we know that the loads won't trap and thus
> >> we can avoid this.
> >> 
> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
> >> some testsuite fallout here so didn't bother to invent a new testcase).
> >> 
> >> Just in case somebody thinks the overrun is a bad idea in general
> >> (even when not trapping).  Like for ASAN or valgrind.
> >
> > This is what I applied.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > Richard.
> [...]
> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> > index 15aec21..c29e73d 100644
> > --- a/gcc/tree-vect-stmts.c
> > +++ b/gcc/tree-vect-stmts.c
> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree 
> > vectype, bool slp,
> >/* If there is a gap at the end of the group then these optimizations
> >  would access excess elements in the last iteration.  */
> >bool would_overrun_p = (gap != 0);
> > +  /* If the access is aligned an overrun is fine.  */
> > +  if (would_overrun_p
> > + && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> > +   would_overrun_p = false;
> >if (!STMT_VINFO_STRIDED_P (stmt_info)
> >   && (can_overrun_p || !would_overrun_p)
> >   && compare_step_with_zero (stmt) > 0)
> 
> ...is this right for all cases?  I think it only looks for single-vector
> alignment, but the gap can in principle be vector-sized or larger,
> at least for load-lanes.
>
> E.g. say we have a 128-bit vector of doubles in a group of size 4
> and a gap of 2 or 3.  Even if the access itself is aligned, the group
> spans two vectors and we have no guarantee that the second one
> is mapped.

The check assumes that if aligned_access_p () returns true then the
whole access is aligned in a way that it can't cross page boundaries.
That's of course not the case if alignment is 16 bytes but the access
will be a multiple of that.
 
> I haven't been able to come up with a testcase though.  We seem to be
> overly conservative when computing alignments.

Not sure if we can run into this with load-lanes given that bumps the
vectorization factor.  Also does load-lane work with gaps?

I think that gap can never be larger than nunits-1 so it is by definition
in the last "vector" independent of the VF.

Classical gap case is

for (i=0; i

[PATCH] Add wide_int_storage::operator=

2017-03-01 Thread Richard Biener

In debugging a -Wuninitialized issue from ipa-cp.c which does

  vr.min = vr.max = wi::zero (INT_TYPE_SIZE);

I figured we are missing this operator and are thus copying possibly
uninitialized data.

This means instead of a plain assignment of wide_int_storage we
get a loop here.  So I'm not 100% sure this "omission" wasn't on
purpose.  Note there already is a copy constructor implemented
in terms of wi::copy.
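
To illustrate the trade-off, here is a toy sketch (not GCC's wide_int_storage; `toy_storage` and `toy_ref` are hypothetical types) of an assignment operator that copies only the `len` meaningful elements with a loop, instead of letting an implicit member-wise copy transfer the whole fixed-size buffer, including elements that were never initialized.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical read-only view: a pointer to the meaningful elements,
// how many there are, and the precision in bits.
struct toy_ref
{
  const std::int64_t *val;
  unsigned len;
  unsigned precision;
};

// Toy stand-in for a fixed-capacity integer buffer.
struct toy_storage
{
  std::int64_t val[4];
  unsigned len;
  unsigned precision;

  // Copy only the elements that carry data; the rest of the buffer is
  // left untouched, so no uninitialized element is ever read.
  toy_storage &operator= (const toy_ref &x)
  {
    assert (x.len <= 4);
    precision = x.precision;
    len = x.len;
    for (unsigned i = 0; i < x.len; ++i)
      val[i] = x.val[i];
    return *this;
  }
};
```

The cost is the loop in place of a plain block copy, which is the concern raised above.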

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Ok?

Thanks,
Richard.

2017-03-01  Richard Biener  

* wide-int.h (wide_int_storage::operator=): Implement in terms
of wi::copy.

Index: gcc/wide-int.h
===
--- gcc/wide-int.h  (revision 245803)
+++ gcc/wide-int.h  (working copy)
@@ -1019,6 +1019,9 @@ public:
   HOST_WIDE_INT *write_val ();
   void set_len (unsigned int, bool = false);
 
+  template <typename T>
+  wide_int_storage &operator = (const T &);
+
   static wide_int from (const wide_int_ref &, unsigned int, signop);
   static wide_int from_array (const HOST_WIDE_INT *, unsigned int,
  unsigned int, bool = true);
@@ -1058,6 +1061,18 @@ inline wide_int_storage::wide_int_storag
   wi::copy (*this, xi);
 }
 
+template <typename T>
+inline wide_int_storage&
+wide_int_storage::operator = (const T &x)
+{
+  { STATIC_ASSERT (!wi::int_traits<T>::host_dependent_precision); }
+  { STATIC_ASSERT (wi::int_traits<T>::precision_type != wi::CONST_PRECISION); }
+  WIDE_INT_REF_FOR (T) xi (x);
+  precision = xi.precision;
+  wi::copy (*this, xi);
+  return *this;
+}
+
 inline unsigned int
 wide_int_storage::get_precision () const
 {


Re: [PATCH PR66768]Skip address type iv_use if base object can't be determined

2017-03-01 Thread Richard Biener
On Wed, Mar 1, 2017 at 1:03 PM, Bin Cheng  wrote:
> Hi,
> As reported in PR66768, IVOPTs drops address-space information.  Root cause 
> is IVOPTs fails to preserve base-object during identifying/rewriting address 
> type iv_use for pointers converted from constant values.  This patch just 
> skips address type iv_use if base-object can't be determined.  In the future, 
> better handling of base-object is needed to fix the issue.  Benchmark data 
> show the added condition is only triggered twice in spec2k6, and the two 
> cases are actually null-pointer references.  I believe that code is dead in 
> benchmark and is never executed.
>
> Bootstrap and test on x86_64.  Is it OK?

Ok.

Richard.

> 2017-02-27  Bin Cheng  
>
> PR tree-optimization/66768
> * tree-ssa-loop-ivopts.c (find_interesting_uses_address): Skip addr
> iv_use if base object can't be determined.
>
> 2017-02-27  Bin Cheng  
>
> PR tree-optimization/66768
> * gcc.target/i386/pr66768.c: New test.


Re: [PATCH] Add wide_int_storage::operator=

2017-03-01 Thread Jakub Jelinek
On Wed, Mar 01, 2017 at 01:08:58PM +0100, Richard Biener wrote:
> 
> In debugging a -Wuninitialized issue from ipa-cp.c which does
> 
>   vr.min = vr.max = wi::zero (INT_TYPE_SIZE);

Note maybe it would be faster to:
vr.min = wi::zero (INT_TYPE_SIZE);
vr.max = wi::zero (INT_TYPE_SIZE);

That doesn't mean your wide-int.h change isn't useful.

Jakub


Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 1 Mar 2017, Richard Sandiford wrote:
>
>> Sorry for the late reply, but:
>> 
>> Richard Biener  writes:
>> > On Mon, 7 Nov 2016, Richard Biener wrote:
>> >
>> >> 
>> >> Currently we force peeling for gaps whenever element overrun can occur
>> >> but for aligned accesses we know that the loads won't trap and thus
>> >> we can avoid this.
>> >> 
>> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
>> >> some testsuite fallout here so didn't bother to invent a new testcase).
>> >> 
>> >> Just in case somebody thinks the overrun is a bad idea in general
>> >> (even when not trapping).  Like for ASAN or valgrind.
>> >
>> > This is what I applied.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >
>> > Richard.
>> [...]
>> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> > index 15aec21..c29e73d 100644
>> > --- a/gcc/tree-vect-stmts.c
>> > +++ b/gcc/tree-vect-stmts.c
>> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree 
>> > vectype, bool slp,
>> >/* If there is a gap at the end of the group then these 
>> > optimizations
>> > would access excess elements in the last iteration.  */
>> >bool would_overrun_p = (gap != 0);
>> > +  /* If the access is aligned an overrun is fine.  */
>> > +  if (would_overrun_p
>> > +&& aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
>> > +  would_overrun_p = false;
>> >if (!STMT_VINFO_STRIDED_P (stmt_info)
>> >  && (can_overrun_p || !would_overrun_p)
>> >  && compare_step_with_zero (stmt) > 0)
>> 
>> ...is this right for all cases?  I think it only looks for single-vector
>> alignment, but the gap can in principle be vector-sized or larger,
>> at least for load-lanes.
>>
>> E.g. say we have a 128-bit vector of doubles in a group of size 4
>> and a gap of 2 or 3.  Even if the access itself is aligned, the group
>> spans two vectors and we have no guarantee that the second one
>> is mapped.
>
> The check assumes that if aligned_access_p () returns true then the
> whole access is aligned in a way that it can't cross page boundaries.
> That's of course not the case if alignment is 16 bytes but the access
> will be a multiple of that.
>  
>> I haven't been able to come up with a testcase though.  We seem to be
>> overly conservative when computing alignments.
>
> Not sure if we can run into this with load-lanes given that bumps the
> vectorization factor.  Also does load-lane work with gaps?
>
> I think that gap can never be larger than nunits-1 so it is by definition
> in the last "vector" independent of the VF.
>
> Classical gap case is
>
> for (i=0; i<n; ++i)
>  {
>y[3*i + 0] = x[4*i + 0];
>y[3*i + 1] = x[4*i + 1];
>y[3*i + 2] = x[4*i + 2];
>  }
>
> where x has a gap of 1.  You'll get VF of 12 for the above.  Make
> the y's different streams and you should get the perfect case for
> load-lane:
>
> for (i=0; i<n; ++i)
>  {
>y[i] = x[4*i + 0];
>z[i] = x[4*i + 1];
>w[i] = x[4*i + 2];
>  } 
>
> previously we'd peel at least 4 iterations into the epilogue for
> the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
> ok.

The case I was thinking of was like the second, but with the
element type being DI or DF and with the + 2 statement removed.
E.g.:

double __attribute__((noinline))
foo (double *a)
{
  double res = 0.0;
  for (int n = 0; n < 256; n += 4)
res += a[n] + a[n + 1];
  return res;
}

(with -ffast-math).  We do use LD4 for this, and having "a" aligned
to V2DF isn't enough to guarantee that we can access a[n + 2]
and a[n + 3].
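
A rough way to put numbers on this, as a sketch only: assume the first
element of each group is aligned to align_bytes, and that an aligned
block of that size never straddles a page boundary.  Then the elements
of the group that are certainly mapped are the ones sharing that block
with the first element.  The helper name group_fits_aligned_block is
made up for illustration; this is not what aligned_access_p checks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of GCC.  Returns true if a whole group of
// GROUP_SIZE elements of ELEM_SIZE bytes lies inside the ALIGN_BYTES-aligned
// block that contains the group's first element.  Assumes the first element
// itself is ALIGN_BYTES-aligned.
bool
group_fits_aligned_block (std::size_t align_bytes, std::size_t elem_size,
                          std::size_t group_size)
{
  assert (elem_size > 0 && align_bytes % elem_size == 0);
  return group_size * elem_size <= align_bytes;
}
```

With 4-byte elements, a group of 4 and 16-byte alignment the group fits,
so reading the gap element cannot fault.  With doubles, a group of 4 and
the same 16-byte alignment it does not fit: only a[n] and a[n + 1] share
the aligned block, which is the case described above.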

Thanks,
Richard


Re: [PATCH] Add wide_int_storage::operator=

2017-03-01 Thread Richard Biener
On Wed, 1 Mar 2017, Jakub Jelinek wrote:

> On Wed, Mar 01, 2017 at 01:08:58PM +0100, Richard Biener wrote:
> > 
> > In debugging a -Wuninitialized issue from ipa-cp.c which does
> > 
> >   vr.min = vr.max = wi::zero (INT_TYPE_SIZE);
> 
> Note maybe it would be faster to:
>   vr.min = wi::zero (INT_TYPE_SIZE);
>   vr.max = wi::zero (INT_TYPE_SIZE);
> 
> That doesn't mean your wide-int.h change isn't useful.

Note that rewriting like above doesn't fix the warning.  The issue
is from

generic_wide_int<storage>& generic_wide_int<storage>::operator=(const T&) [with 
T = wi::hwi_with_prec; storage = wide_int_storage] (struct 
generic_wide_int * const this, const struct hwi_with_prec & x)
{
  struct wide_int_storage D.47458;
  struct generic_wide_int & D.52104;

  wide_int_storage::wide_int_storage (&D.47458, x);
  try
{
  this->D.16244 = D.47458;

where wide_int_storage::wide_int_storage (&D.47458, x)
doesn't initialize all of wide_int_storage (but only up to len).
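
The pattern can be shown with a small stand-in, assuming a fixed buffer
of which only the first len elements are live.  The type small_storage
and its members are invented for illustration and are not the wide-int.h
code.  A plain struct assignment would copy all of val, including the
part the constructor never wrote; an operator= that copies only len
elements avoids reading it.

```cpp
#include <cassert>

// Hypothetical stand-in for wide_int_storage, for illustration only.
struct small_storage
{
  static constexpr unsigned int MAX_ELTS = 4;
  long val[MAX_ELTS];
  unsigned int len;

  // Initializes only the first N elements; val[N..MAX_ELTS) stays
  // uninitialized, as in the constructor discussed above.
  small_storage (const long *src, unsigned int n) : len (n)
  {
    assert (n <= MAX_ELTS);
    for (unsigned int i = 0; i < n; ++i)
      val[i] = src[i];
  }

  // Copies only the live elements, so the uninitialized tail of X is
  // never read.
  small_storage &operator= (const small_storage &x)
  {
    len = x.len;
    for (unsigned int i = 0; i < x.len; ++i)
      val[i] = x.val[i];
    return *this;
  }
};
```

The price is the loop in place of a block copy, which is the trade-off
raised in the original patch mail.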

Richard.


Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Biener
On Wed, 1 Mar 2017, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >
> >> Sorry for the late reply, but:
> >> 
> >> Richard Biener  writes:
> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
> >> >
> >> >> 
> >> >> Currently we force peeling for gaps whenever element overrun can occur
> >> >> but for aligned accesses we know that the loads won't trap and thus
> >> >> we can avoid this.
> >> >> 
> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
> >> >> some testsuite fallout here so didn't bother to invent a new testcase).
> >> >> 
> >> >> Just in case somebody thinks the overrun is a bad idea in general
> >> >> (even when not trapping).  Like for ASAN or valgrind.
> >> >
> >> > This is what I applied.
> >> >
> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >> >
> >> > Richard.
> >> [...]
> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> >> > index 15aec21..c29e73d 100644
> >> > --- a/gcc/tree-vect-stmts.c
> >> > +++ b/gcc/tree-vect-stmts.c
> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree 
> >> > vectype, bool slp,
> >> >/* If there is a gap at the end of the group then these 
> >> > optimizations
> >> >   would access excess elements in the last iteration.  */
> >> >bool would_overrun_p = (gap != 0);
> >> > +  /* If the access is aligned an overrun is fine.  */
> >> > +  if (would_overrun_p
> >> > +  && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> >> > +would_overrun_p = false;
> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
> >> >&& (can_overrun_p || !would_overrun_p)
> >> >&& compare_step_with_zero (stmt) > 0)
> >> 
> >> ...is this right for all cases?  I think it only looks for single-vector
> >> alignment, but the gap can in principle be vector-sized or larger,
> >> at least for load-lanes.
> >>
> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
> >> and a gap of 2 or 3.  Even if the access itself is aligned, the group
> >> spans two vectors and we have no guarantee that the second one
> >> is mapped.
> >
> > The check assumes that if aligned_access_p () returns true then the
> > whole access is aligned in a way that it can't cross page boundaries.
> > That's of course not the case if alignment is 16 bytes but the access
> > will be a multiple of that.
> >  
> >> I haven't been able to come up with a testcase though.  We seem to be
> >> overly conservative when computing alignments.
> >
> > Not sure if we can run into this with load-lanes given that bumps the
> > vectorization factor.  Also does load-lane work with gaps?
> >
> > I think that gap can never be larger than nunits-1 so it is by definition
> > in the last "vector" independent of the VF.
> >
> > Classical gap case is
> >
> > for (i=0; i<n; ++i)
> >  {
> >y[3*i + 0] = x[4*i + 0];
> >y[3*i + 1] = x[4*i + 1];
> >y[3*i + 2] = x[4*i + 2];
> >  }
> >
> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
> > the y's different streams and you should get the perfect case for
> > load-lane:
> >
> > for (i=0; i<n; ++i)
> >  {
> >y[i] = x[4*i + 0];
> >z[i] = x[4*i + 1];
> >w[i] = x[4*i + 2];
> >  } 
> >
> > previously we'd peel at least 4 iterations into the epilogue for
> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
> > ok.
> 
> The case I was thinking of was like the second, but with the
> element type being DI or DF and with the + 2 statement removed.
> E.g.:
> 
> double __attribute__((noinline))
> foo (double *a)
> {
>   double res = 0.0;
>   for (int n = 0; n < 256; n += 4)
> res += a[n] + a[n + 1];
>   return res;
> }
> 
> (with -ffast-math).  We do use LD4 for this, and having "a" aligned
> to V2DF isn't enough to guarantee that we can access a[n + 2]
> and a[n + 3].

Yes, indeed.  It's safe when peeling for gaps would remove
N < alignof (ref) / sizeof (ref) scalar iterations.

Peeling for gaps simply subtracts one from the niter of the vectorized 
loop.

One should be able to construct a testcase w/o load-lanes by ensuring
a high enough VF.

Richard.

> Thanks,
> Richard
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


RE: [PATCH,testsuite] Skip gcc.dg/lto/pr60449_0.c for mips*-*-elf* targets.

2017-03-01 Thread Toma Tabacu
> From: Catherine Moore
> 
> Hi Toma,
> There are some MIPS ELF targets that do support gettimeofday.   Perhaps you
> could handle this with a dg_require_effective_target entry for gettimeofday.
> Thanks,
> Catherine
> 

Hi,

Thank you for your quick reply.

The patch below adds a dg_require_effective_target for gettimeofday.
Does it look good ? I'm having some doubts about the new directive's name.

Also, this patch makes the dg-skip-if for AVR redundant. Should I remove it ?

Regards,
Toma

gcc/testsuite/

* gcc.dg/lto/pr60449_0.c: Require gettimeofday support.
* lib/target-supports.exp (check_effective_target_gettimeofday): New.

diff --git a/gcc/testsuite/gcc.dg/lto/pr60449_0.c 
b/gcc/testsuite/gcc.dg/lto/pr60449_0.c
index 5b878a6..ad83938 100644
--- a/gcc/testsuite/gcc.dg/lto/pr60449_0.c
+++ b/gcc/testsuite/gcc.dg/lto/pr60449_0.c
@@ -1,5 +1,6 @@
 /* { dg-lto-do link } */
 /* { dg-skip-if "Needs gettimeofday" { "avr-*-*" } } */
+/* { dg-require-effective-target gettimeofday } */
 
 extern int printf (const char *__restrict __format, ...);
 typedef long int __time_t;
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 2766af4..29d61ca 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4342,6 +4342,32 @@ proc check_effective_target_mips_nanlegacy { } {
 } "-mnan=legacy"]
 }
 
+proc check_effective_target_gettimeofday { } {
+  return [check_no_compiler_messages gettimeofday executable {
+struct timeval
+  {
+   long int tv_sec;
+   long int tv_usec;
+  };
+
+struct timezone
+  {
+   int tz_minuteswest;
+   int tz_dsttime;
+  };
+
+extern int gettimeofday (struct timeval * __tv, struct timezone * __tz);
+
+int main ()
+{
+   struct timeval tv;
+   struct timezone tz;
+   gettimeofday (&tv, &tz);
+   return 0;
+}
+}]
+}
+
 # Return 1 if an MSA program can be compiled to object
 
 proc check_effective_target_mips_msa { } {



[PATCH,testsuite] MIPS: Force O32 ABI for inline-memcpy-3.c.

2017-03-01 Thread Toma Tabacu
Hi,

inline-memcpy-3.c fails when using -mabi=n64 and -mabi=n32 as a test-run option
because it does not impose a specific ABI in its test options.

As there already are variants of this test which force a specific ABI (N64 in
inline-memcpy-4.c and N32 in inline-memcpy-5.c), inline-memcpy-3.c should also
do so with the O32 ABI.

This patch forces the O32 ABI for this test by adding "-mabi=32" to the test
options.

Regards,
Toma

gcc/testsuite/

* gcc.target/mips/inline-memcpy-3.c (dg-options): Add -mabi=32.

diff --git a/gcc/testsuite/gcc.target/mips/inline-memcpy-3.c 
b/gcc/testsuite/gcc.target/mips/inline-memcpy-3.c
index 3bdb28b..a449107 100644
--- a/gcc/testsuite/gcc.target/mips/inline-memcpy-3.c
+++ b/gcc/testsuite/gcc.target/mips/inline-memcpy-3.c
@@ -1,4 +1,4 @@
-/* { dg-options "-fno-common isa_rev<=5 (REQUIRES_STDLIB)" } */
+/* { dg-options "-fno-common isa_rev<=5 -mabi=32 (REQUIRES_STDLIB)" } */
 /* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-Os"} { "" } } */
 /* { dg-final { scan-assembler-not "\tmemcpy" } } */
 /* { dg-final { scan-assembler-times "swl" 8 } } */



Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 1 Mar 2017, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >
>> >> Sorry for the late reply, but:
>> >> 
>> >> Richard Biener  writes:
>> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
>> >> >
>> >> >> 
>> >> >> Currently we force peeling for gaps whenever element overrun can occur
>> >> >> but for aligned accesses we know that the loads won't trap and thus
>> >> >> we can avoid this.
>> >> >> 
>> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
>> >> >> some testsuite fallout here so didn't bother to invent a new testcase).
>> >> >> 
>> >> >> Just in case somebody thinks the overrun is a bad idea in general
>> >> >> (even when not trapping).  Like for ASAN or valgrind.
>> >> >
>> >> > This is what I applied.
>> >> >
>> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >> >
>> >> > Richard.
>> >> [...]
>> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> >> > index 15aec21..c29e73d 100644
>> >> > --- a/gcc/tree-vect-stmts.c
>> >> > +++ b/gcc/tree-vect-stmts.c
>> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree 
>> >> > vectype, bool slp,
>> >> >/* If there is a gap at the end of the group then these 
>> >> > optimizations
>> >> >  would access excess elements in the last iteration.  */
>> >> >bool would_overrun_p = (gap != 0);
>> >> > +  /* If the access is aligned an overrun is fine.  */
>> >> > +  if (would_overrun_p
>> >> > + && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
>> >> > +   would_overrun_p = false;
>> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
>> >> >   && (can_overrun_p || !would_overrun_p)
>> >> >   && compare_step_with_zero (stmt) > 0)
>> >> 
>> >> ...is this right for all cases?  I think it only looks for single-vector
>> >> alignment, but the gap can in principle be vector-sized or larger,
>> >> at least for load-lanes.
>> >>
>> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
>> >> and a gap of 2 or 3.  Even if the access itself is aligned, the group
>> >> spans two vectors and we have no guarantee that the second one
>> >> is mapped.
>> >
>> > The check assumes that if aligned_access_p () returns true then the
>> > whole access is aligned in a way that it can't cross page boundaries.
>> > That's of course not the case if alignment is 16 bytes but the access
>> > will be a multiple of that.
>> >  
>> >> I haven't been able to come up with a testcase though.  We seem to be
>> >> overly conservative when computing alignments.
>> >
>> > Not sure if we can run into this with load-lanes given that bumps the
>> > vectorization factor.  Also does load-lane work with gaps?
>> >
>> > I think that gap can never be larger than nunits-1 so it is by definition
>> > in the last "vector" independent of the VF.
>> >
>> > Classical gap case is
>> >
>> > for (i=0; i<n; ++i)
>> >  {
>> >y[3*i + 0] = x[4*i + 0];
>> >y[3*i + 1] = x[4*i + 1];
>> >y[3*i + 2] = x[4*i + 2];
>> >  }
>> >
>> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
>> > the y's different streams and you should get the perfect case for
>> > load-lane:
>> >
>> > for (i=0; i<n; ++i)
>> >  {
>> >y[i] = x[4*i + 0];
>> >z[i] = x[4*i + 1];
>> >w[i] = x[4*i + 2];
>> >  } 
>> >
>> > previously we'd peel at least 4 iterations into the epilogue for
>> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
>> > ok.
>> 
>> The case I was thinking of was like the second, but with the
>> element type being DI or DF and with the + 2 statement removed.
>> E.g.:
>> 
>> double __attribute__((noinline))
>> foo (double *a)
>> {
>>   double res = 0.0;
>>   for (int n = 0; n < 256; n += 4)
>> res += a[n] + a[n + 1];
>>   return res;
>> }
>> 
>> (with -ffast-math).  We do use LD4 for this, and having "a" aligned
>> to V2DF isn't enough to guarantee that we can access a[n + 2]
>> and a[n + 3].
>
> Yes, indeed.  It's safe when peeling for gaps would remove
> N < alignof (ref) / sizeof (ref) scalar iterations.
>
> Peeling for gaps simply subtracts one from the niter of the vectorized 
> loop.

I think subtracting one is enough in all cases.  It's only the final
iteration of the scalar loop that can't access a[n + 2] and a[n + 3].

(Of course, subtracting one happens before peeling for niters, so it
only makes a difference if the original niters was a multiple of the VF,
in which case we peel a full vector's worth of iterations instead of
peeling none.)
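
As a simplified model of that arithmetic (the names loop_split and
split_iterations are made up here, and the real vectorizer does more
than this), the split between vector and epilogue iterations can be
sketched as:

```cpp
#include <cassert>

// Hypothetical model of the iteration split, for illustration only.
struct loop_split
{
  unsigned long vector_iters;    // iterations of the vectorized loop
  unsigned long epilogue_iters;  // scalar iterations left for the epilogue
};

// Peeling for gaps takes one scalar iteration away before the count is
// divided by the vectorization factor VF; whatever the vector loop does
// not cover runs in the scalar epilogue.
loop_split
split_iterations (unsigned long niters, unsigned long vf, bool peel_for_gaps)
{
  assert (vf > 0 && niters > 0);
  unsigned long avail = peel_for_gaps ? niters - 1 : niters;
  unsigned long vec = avail / vf;
  return { vec, niters - vec * vf };
}
```

For niters = 64 and VF = 4 this gives 16 vector iterations and no
epilogue without peeling for gaps, and 15 vector iterations plus 4
epilogue iterations with it.  For niters = 66 both variants give 16
vector iterations and 2 epilogue iterations.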

Thanks,
Richard


Re: [PATCH,testsuite] Skip gcc.dg/lto/pr60449_0.c for mips*-*-elf* targets.

2017-03-01 Thread Rainer Orth
Hi Toma,

> The patch below adds a dg_require_effective_target for gettimeofday.
> Does it look good ? I'm having some doubts about the new directive's name.

no, this has a couple of problems.  See below.

> Also, this patch makes the dg-skip-if for AVR redundant. Should I remove it ?

Of course: no reason to duplicate this, this would just be confusing.

> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 2766af4..29d61ca 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -4342,6 +4342,32 @@ proc check_effective_target_mips_nanlegacy { } {
>  } "-mnan=legacy"]
>  }
>  
> +proc check_effective_target_gettimeofday { } {

This proc needs a comment like all the others.

> +  return [check_no_compiler_messages gettimeofday executable {
> +struct timeval
> +  {
> + long int tv_sec;
> + long int tv_usec;
> +  };
> +
> +struct timezone
> +  {
> + int tz_minuteswest;
> + int tz_dsttime;
> +  };
> +
> +extern int gettimeofday (struct timeval * __tv, struct timezone * __tz);
> +
> +int main ()
> +{
> + struct timeval tv;
> + struct timezone tz;
> + gettimeofday (&tv, &tz);
> + return 0;
> +}
> +}]
> +}

This is very wrong: some targets use a void * second arg, and other
types for the struct timeval members.  Better use
check_function_available instead, as in several other examples.  And
please test the testcase on at least one non-mips-elf target that
actually *has* gettimeofday.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[wwwdocs+doc] Adjust doxygen.org links

2017-03-01 Thread Gerald Pfeifer
Instead of making this change (for top level URLs, the trailing slash
has been optional per the standards for a decade or two) I could have
tweaked a whitelist I am maintaining with a better regexp, but figured
why not simplify things?

Committed.

Gerald

Index: codingconventions.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/codingconventions.html,v
retrieving revision 1.78
diff -u -r1.78 codingconventions.html
--- codingconventions.html  3 Feb 2017 07:11:56 -   1.78
+++ codingconventions.html  1 Mar 2017 12:50:51 -
@@ -652,7 +652,7 @@
 first. 
 
 libstdc++-v3:  In docs/doxygen, comments in *.cfg.in are
-partially autogenerated from <a href="http://www.doxygen.org/">the
+partially autogenerated from <a href="http://www.doxygen.org">the
 Doxygen tool</a>.  In docs/html, the ext/lwg-* files are copied from <a href="http://www.open-std.org/jtc1/sc22/wg21/">the C++ committee homepage</a>,
 the 27_io/binary_iostream_* files are copies of Usenet postings, and most

2017-03-01  Gerald Pfeifer  

* doc/xml/manual/documentation_hacking.xml: Tweak link to
doxygen.org.

Index: doc/xml/manual/documentation_hacking.xml
===
--- doc/xml/manual/documentation_hacking.xml(revision 245807)
+++ doc/xml/manual/documentation_hacking.xml(working copy)
@@ -261,7 +261,7 @@
 
   
Prerequisite tools are Bash 2.0 or later,
-   <link xmlns:xlink="http://www.w3.org/1999/xlink" 
xlink:href="http://www.doxygen.org/">Doxygen</link>, and
+   <link xmlns:xlink="http://www.w3.org/1999/xlink" 
xlink:href="http://www.doxygen.org">Doxygen</link>, and
the <link xmlns:xlink="http://www.w3.org/1999/xlink" 
xlink:href="http://www.gnu.org/software/coreutils/">GNU
coreutils. (GNU versions of find, xargs, and possibly
sed and grep are used, just because the GNU versions make


Re: Poll for option name (Was: [PATCH v6] add -fprolog-pad=N,M option)

2017-03-01 Thread Torsten Duwe
On Wed, Mar 01, 2017 at 11:34:37AM +, Richard Earnshaw (lists) wrote:
> On 01/03/17 11:26, Torsten Duwe wrote:
> > 
> > However, writing some more documentation and being asked for clarity,
> > I found it more depicting to talk about the function entry point than
> > about the prologue. Also, this is about generic instrumentation, and it
> > surely involves NOPs.
> > 
> > So, hereby I'd like to start a small poll for a good name for this feature.
> > Anyone with a better idea please speak up now. Otherwise I'll just
> > s/prolog/prologue/g.
> 
> Hmm, I'd prefer the bike shed to be green :-)
> 
> How about --fpatchable-function-entry=?
> 
IMHO qualifies as "better". And green is best anyway :-]

> > I've made another improvement which makes the code even more robust now.
> > +DEF_TARGET_INSN (nop, (void))
> > In gcc/target-insns.def. This way I can easily check whether there is a
> > (define_insn "nop" ...) in the target md. Currently, all CPUs have it, but
> > who knows.
> 
> The mid-end already has direct calls to gen_nop with no guards on the
> pattern existing,  So the compiler won't build without a NOP pattern.

Richard told me "don't do that", and we found the DEF_TARGET_INSN. So far
I can see gen_nop only in target specifics and in cfgrtl.c -- admittedly
I don't know what that does.

So the v6 code is basically OK?

Names better than -fpatchable-function-entry anyone?

Torsten



Re: Poll for option name (Was: [PATCH v6] add -fprolog-pad=N,M option)

2017-03-01 Thread Richard Earnshaw (lists)
On 01/03/17 13:32, Torsten Duwe wrote:
> On Wed, Mar 01, 2017 at 11:34:37AM +, Richard Earnshaw (lists) wrote:
>> On 01/03/17 11:26, Torsten Duwe wrote:
>>>
>>> However, writing some more documentation and being asked for clarity,
>>> I found it more depicting to talk about the function entry point than
>>> about the prologue. Also, this is about generic instrumentation, and it
>>> surely involves NOPs.
>>>
>>> So, hereby I'd like to start a small poll for a good name for this feature.
>>> Anyone with a better idea please speak up now. Otherwise I'll just
>>> s/prolog/prologue/g.
>>
>> Hmm, I'd prefer the bike shed to be green :-)
>>
>> How about --fpatchable-function-entry=?
>>
> IMHO qualifies as "better". And green is best anyway :-]
> 
>>> I've made another improvement which makes the code even more robust now.
>>> +DEF_TARGET_INSN (nop, (void))
>>> In gcc/target-insns.def. This way I can easily check whether there is a
>>> (define_insn "nop" ...) in the target md. Currently, all CPUs have it, but
>>> who knows.
>>
>> The mid-end already has direct calls to gen_nop with no guards on the
>> pattern existing,  So the compiler won't build without a NOP pattern.
> 
> Richard told me "don't do that", and we found the DEF_TARGET_INSN. So far
> I can see gen_nop only in target specifics and in cfgrtl.c -- admittedly
> I don't know what that does.
> 
> So the v6 code is basically OK?
> 
I haven't reviewed it yet.  I'm not really planning to spend any more
time on this until stage1 re-opens.

R.

> Names better than -fpatchable-function-entry anyone?
> 
>   Torsten
> 



Re: [wwwdocs] RISC-V readings and features

2017-03-01 Thread Gerald Pfeifer
On Wed, 8 Feb 2017, Gerald Pfeifer wrote:
> Except http://riscv.org actually redirects to https://riscv.org . :-}

I now made the same update to readings.html as well.

Applied.

Gerald

Index: readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.268
diff -u -r1.268 readings.html
--- readings.html   26 Feb 2017 14:35:13 -  1.268
+++ readings.html   1 Mar 2017 13:37:57 -
@@ -245,8 +245,8 @@
 
  riscv
   Manufacturer: Many (open ISA standard)
-  <a href="http://riscv.org">RISC-V Foundation</a>
-  <a href="http://riscv.org/specifications/">ISA Specifications</a>
+  <a href="https://riscv.org">RISC-V Foundation</a>
+  <a href="https://riscv.org/specifications/">ISA Specifications</a>
  
  
  rs6000 (powerpc, powerpcle)


[PATCH] Fix PR79345, better uninit warnings for memory

2017-03-01 Thread Richard Biener

The following addresses a regression in uninit warnings that happens
because clobber stmts preclude the very simple-minded support we have
for memory.  The patch fixes this by instead implementing uninit
warnings for memory properly, using the alias oracle walk_aliased_vdefs
helper.

The patch adds better limiting to that interface and fixes one
false positive in fixed-value.c.  Two other false positives are
fixed by the wide-int.h patch posted a few hours ago and a patch
to genemit from Jakub.

Bootstrap and regtest running on x86_64-unknown-linux-gnu with those
prerequisites included.

One issue with the patch is duplicate warnings as TREE_NO_WARNING
doesn't work very well on tcc_reference trees which are not
shared.  A followup could use some sort of hash table to mitigate
this a bit.  OTOH for maybe-uninit uses multiple locations may
be in need to be fixed to silence the warning.  Another thing is
that we walk the function in random (BB) order and thus the
alias oracle walk limiting may result in warnings popping up
and going away in less predictable order (and also be reported
in odd order, like not all must-uninits first).

Comments?  I realize this may introduce (a lot of?) false positives
quite late in the game, more aggressive limiting, like to 2
disambiguations, could solve the testcase in the PR and give up
most of the times while preserving the non-walking case of the
old code.

Richard.

2017-03-01  Richard Biener  

* tree-ssa-alias.c (walk_aliased_vdefs_1): Take a limit
param and abort the walk, returning -1 if it is hit.
(walk_aliased_vdefs): Take a limit param and pass it on.
* tree-ssa-alias.h (walk_aliased_vdefs): Add a limit param,
defaulting to 0 and return a signed int.
* tree-ssa-uninit.c (struct check_defs_data): New struct.
(check_defs): New helper.
(warn_uninitialized_vars): Use walk_aliased_vdefs to warn
about uninitialized memory.

* fixed-value.c (fixed_from_string): Use ulow/uhigh to avoid
bogus uninitialized warning.
(fixed_convert_from_real): Likewise.

* g++.dg/warn/Wuninitialized-7.C: New testcase.
* c-c++-common/ubsan/bounds-2.c: Add -Wno-uninitialized.
* gcc.dg/uninit-pr19430-2.c: Add expected warning.
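
The limit convention described in the ChangeLog can be sketched on a
plain linear chain.  This is an illustration under assumptions, not the
alias-oracle code: the function walk_chain_with_limit, the int "defs"
and the std::function callback are all invented here, and PHI handling
is left out.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model of a limited walk, for illustration only.  Visits
// DEFS in order, counting each one.  Returns -1 as soon as the count
// reaches LIMIT, the count so far if WALKER returns true, and the total
// count otherwise.  A LIMIT of zero never aborts, because the count is
// already at least one when it is compared.
int
walk_chain_with_limit (const std::vector<int> &defs,
                       const std::function<bool (int)> &walker,
                       unsigned int limit)
{
  assert (walker);
  unsigned int cnt = 0;
  for (int def : defs)
    {
      ++cnt;
      if (cnt == limit)
        return -1;
      if (walker (def))
        return static_cast<int> (cnt);
    }
  return static_cast<int> (cnt);
}
```

Because the limit is checked before the callback runs, a walk that
would have stopped at exactly the limit-th statement is reported as
aborted instead.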

Index: gcc/tree-ssa-alias.c
===
--- gcc/tree-ssa-alias.c(revision 245803)
+++ gcc/tree-ssa-alias.c(working copy)
@@ -2897,13 +2897,15 @@ walk_non_aliased_vuses (ao_ref *ref, tre
PHI argument (but only one walk continues on merge points), the
return value is true if any of the walks was successful.
 
-   The function returns the number of statements walked.  */
+   The function returns the number of statements walked or -1 if
+   LIMIT stmts were walked and the walk was aborted at this point.
+   If LIMIT is zero the walk is not aborted.  */
 
-static unsigned int
+static int
 walk_aliased_vdefs_1 (ao_ref *ref, tree vdef,
  bool (*walker)(ao_ref *, tree, void *), void *data,
  bitmap *visited, unsigned int cnt,
- bool *function_entry_reached)
+ bool *function_entry_reached, unsigned limit)
 {
   do
 {
@@ -2925,14 +2927,22 @@ walk_aliased_vdefs_1 (ao_ref *ref, tree
  if (!*visited)
*visited = BITMAP_ALLOC (NULL);
  for (i = 0; i < gimple_phi_num_args (def_stmt); ++i)
-   cnt += walk_aliased_vdefs_1 (ref, gimple_phi_arg_def (def_stmt, i),
-walker, data, visited, 0,
-function_entry_reached);
+   {
+ int res = walk_aliased_vdefs_1 (ref,
+ gimple_phi_arg_def (def_stmt, i),
+ walker, data, visited, 0,
+ function_entry_reached, limit);
+ if (res == -1)
+   return -1;
+ cnt += res;
+   }
  return cnt;
}
 
   /* ???  Do we want to account this to TV_ALIAS_STMT_WALK?  */
   cnt++;
+  if (cnt == limit)
+   return -1;
   if ((!ref
   || stmt_may_clobber_ref_p_1 (def_stmt, ref))
  && (*walker) (ref, vdef, data))
@@ -2943,14 +2953,14 @@ walk_aliased_vdefs_1 (ao_ref *ref, tree
   while (1);
 }
 
-unsigned int
+int
 walk_aliased_vdefs (ao_ref *ref, tree vdef,
bool (*walker)(ao_ref *, tree, void *), void *data,
bitmap *visited,
-   bool *function_entry_reached)
+   bool *function_entry_reached, unsigned int limit)
 {
   bitmap local_visited = NULL;
-  unsigned int ret;
+  int ret;
 
   timevar_push (TV_ALIAS_STMT_WALK);
 
@@ -2959,7 +2969,7 @@ walk_aliased_vdefs (ao_ref *ref, tree vd
 
   ret = walk_aliased_vdefs_1 (ref, vdef, walker, data,
  visi

Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Biener
On Wed, 1 Mar 2017, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >> >
> >> >> Sorry for the late reply, but:
> >> >> 
> >> >> Richard Biener  writes:
> >> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
> >> >> >
> >> >> >> 
> >> >> >> Currently we force peeling for gaps whenever element overrun can 
> >> >> >> occur
> >> >> >> but for aligned accesses we know that the loads won't trap and thus
> >> >> >> we can avoid this.
> >> >> >> 
> >> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
> >> >> >> some testsuite fallout here so didn't bother to invent a new 
> >> >> >> testcase).
> >> >> >> 
> >> >> >> Just in case somebody thinks the overrun is a bad idea in general
> >> >> >> (even when not trapping).  Like for ASAN or valgrind.
> >> >> >
> >> >> > This is what I applied.
> >> >> >
> >> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >> >> >
> >> >> > Richard.
> >> >> [...]
> >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> >> >> > index 15aec21..c29e73d 100644
> >> >> > --- a/gcc/tree-vect-stmts.c
> >> >> > +++ b/gcc/tree-vect-stmts.c
> >> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree 
> >> >> > vectype, bool slp,
> >> >> >/* If there is a gap at the end of the group then these 
> >> >> > optimizations
> >> >> >would access excess elements in the last iteration.  */
> >> >> >bool would_overrun_p = (gap != 0);
> >> >> > +  /* If the access is aligned an overrun is fine.  */
> >> >> > +  if (would_overrun_p
> >> >> > +   && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> >> >> > + would_overrun_p = false;
> >> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
> >> >> > && (can_overrun_p || !would_overrun_p)
> >> >> > && compare_step_with_zero (stmt) > 0)
> >> >> 
> >> >> ...is this right for all cases?  I think it only looks for single-vector
> >> >> alignment, but the gap can in principle be vector-sized or larger,
> >> >> at least for load-lanes.
> >> >>
> >> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
> >> >> and a gap of 2 or 3.  Even if the access itself is aligned, the group
> >> >> spans two vectors and we have no guarantee that the second one
> >> >> is mapped.
> >> >
> >> > The check assumes that if aligned_access_p () returns true then the
> >> > whole access is aligned in a way that it can't cross page boundaries.
> >> > That's of course not the case if alignment is 16 bytes but the access
> >> > will be a multiple of that.
> >> >  
> >> >> I haven't been able to come up with a testcase though.  We seem to be
> >> >> overly conservative when computing alignments.
> >> >
> >> > Not sure if we can run into this with load-lanes given that bumps the
> >> > vectorization factor.  Also does load-lane work with gaps?
> >> >
> >> > I think that gap can never be larger than nunits-1 so it is by definition
> >> > in the last "vector" independent of the VF.
> >> >
> >> > Classical gap case is
> >> >
> >> > for (i=0; i<n; ++i)
> >> >  {
> >> >    y[3*i + 0] = x[4*i + 0];
> >> >    y[3*i + 1] = x[4*i + 1];
> >> >    y[3*i + 2] = x[4*i + 2];
> >> >  }
> >> >
> >> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
> >> > the y's different streams and you should get the perfect case for
> >> > load-lane:
> >> >
> >> > for (i=0; i<n; ++i)
> >> >  {
> >> >    y[i] = x[4*i + 0];
> >> >    z[i] = x[4*i + 1];
> >> >    w[i] = x[4*i + 2];
> >> >  }
> >> >
> >> > previously we'd peel at least 4 iterations into the epilogue for
> >> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
> >> > ok.
> >> 
> >> The case I was thinking of was like the second, but with the
> >> element type being DI or DF and with the + 2 statement removed.
> >> E.g.:
> >> 
> >> double __attribute__((noinline))
> >> foo (double *a)
> >> {
> >>   double res = 0.0;
> >>   for (int n = 0; n < 256; n += 4)
> >>     res += a[n] + a[n + 1];
> >>   return res;
> >> }
> >> 
> >> (with -ffast-math).  We do use LD4 for this, and having "a" aligned
> >> to V2DF isn't enough to guarantee that we can access a[n + 2]
> >> and a[n + 3].
> >
> > Yes, indeed.  It's safe when peeling for gaps would remove
> > N < alignof (ref) / sizeof (ref) scalar iterations.
> >
> > Peeling for gaps simply subtracts one from the niter of the vectorized 
> > loop.
> 
> I think subtracting one is enough in all cases.  It's only the final
> iteration of the scalar loop that can't access a[n + 2] and a[n + 3].
> 
> (Of course, subtracting one happens before peeling for niters, so it
> only makes a difference if the original niters was a multiple of the VF,
> in which case we peel a full vector's worth of iterations instead of
> peeling none.)

I think one could extend the gcc.dg/vect/group-no-gaps-1.c testcase
to cover the case with bigger VF, for example
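
Separately from that testcase idea, the alignment concern quoted above can
be modelled in a few lines of C++.  This is only a sketch under a simplifying
assumption that is not taken from the GCC sources: each group starts at an
address aligned to "align" bytes, and reading the unused tail of the group
is treated as safe only when that tail stays inside the same aligned block
as the last element the scalar loop really uses.  The function name and
parameters are hypothetical.

```cpp
#include <cstddef>

// Simplified model (an assumption, not GCC's implementation): a group of
// `group_size` elements of `elem_size` bytes starts at an address aligned to
// `align` bytes, and the last `gap` elements are not used by the scalar code.
// Over-reading the gap is treated as safe only if the final byte of the group
// lies in the same `align`-sized block as the last element that is used.
bool gap_overrun_is_safe(std::size_t align, std::size_t elem_size,
                         std::size_t group_size, std::size_t gap) {
  if (gap == 0) return true;            // nothing is over-read
  if (gap >= group_size) return false;  // no used element to anchor on
  std::size_t last_used_offset = (group_size - gap - 1) * elem_size;
  std::size_t last_byte_offset = group_size * elem_size - 1;
  return last_used_offset / align == last_byte_offset / align;
}
```

With 4-byte elements, a group of 4 and a gap of 1 under 16-byte alignment
the model reports the over-read as safe, while the 8-byte, group-of-4,
gap-of-2 case from the thread is reported as unsafe because the gap falls
into a second 16-byte block.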

Re: [PATCH docs] remove Java from GCC 7 release criteria

2017-03-01 Thread Gerald Pfeifer

On Tue, 28 Feb 2017, Martin Sebor wrote:

The GCC 7 release criteria page mentions Java even though
the front end has been removed.  The attached patch removes Java
from the criteria page.  While reviewing the rest of the text I
noticed a few minor typos that I corrected in the patch as well.


Thanks, Martin!

Two minor comments:

-bug reports for problems encountered building and using popular
+bug reports for problems encountered while building and using popular

I believe the original version was fine (and is shorter), so personally
I would have left it as is.  Your proposed one is correct, too, of course.

-quality or compilation time regression is sufficiently severe as to
+quality or compilation time regression is sufficiently severe to
merit blocking the release.

Same here, though here I like your edit better. :-)

Gerald


Re: C++ PATCH to fix wrong-code with pointer-to-data-members (PR c++/79687)

2017-03-01 Thread Marek Polacek
On Tue, Feb 28, 2017 at 01:12:38PM -1000, Jason Merrill wrote:
> On Tue, Feb 28, 2017 at 10:10 AM, Marek Polacek  wrote:
> > On Fri, Feb 24, 2017 at 11:11:05AM -0800, Jason Merrill wrote:
> >> On Fri, Feb 24, 2017 at 8:22 AM, Marek Polacek  wrote:
> >> > I had an interesting time tracking down some of the problems with this code.
> >> > Hopefully I've sussed out now how this stuff works.
> >> >
> >> > We've got
> >> >
> >> > struct A { char c; };
> >> > char A::*p = &A::c;
> >> > static char A::*const q = p;
> >> > and then
> >> > &(a.*q) - &a.c
> >> > which should evaluate to 0.  Here "p" will be 0, that's the offset from the
> >> > start of the struct to "c".  "q" is const-qualified and static and initialized
> >> > with "p", so we get to cp_fold_maybe_rvalue -> decl_constant_value ->
> >> > constant_value_1.  Now, NULL pointer-to-data-members are represented by -1, so
> >> > that a null pointer is distinguishable from an offset of the first member of a
> >> > struct (0).  So constant_value_1 looks at the DECL_INITIAL of "q", which is -1,
> >> > a constant, we fold "q" to -1, and sadness ensues.  I believe the -1 value is
> >> > only an internal representation and shouldn't be used like that.
> >>
> >> Since q is initialized from p, it shouldn't have a DECL_INITIAL of -1;
> >> that sounds like the bug.
> >
> > The DECL_INITIAL of -1 comes from cp_finish_decl:
> >      The memory occupied by any object of static storage
> >      duration is zero-initialized at program startup before
> >      any other initialization takes place.
> >
> >      We cannot create an appropriate initializer until after
> >      the type of DECL is finalized.  If DECL_INITIAL is set,
> >      then the DECL is statically initialized, and any
> >      necessary zero-initialization has already been performed.  */
> >   if (TREE_STATIC (decl) && !DECL_INITIAL (decl))
> >     DECL_INITIAL (decl) = build_zero_init (TREE_TYPE (decl),
> >                                            /*nelts=*/NULL_TREE,
> >                                            /*static_storage_p=*/true);
> 
> Ah, that makes sense.  We do want to do constant-initialization with
> -1 before we do dynamic initialization with p.
> 
> So we need to detect in constant_value_1 that the variable has a
> dynamic initializer and therefore return the variable rather than -1.
> DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P seems useful, perhaps in
> combination with DECL_NONTRIVIALLY_INITIALIZED.

Got it.  I think the following should be the real fix.  I ran g++ dg.exp
with some logging to see how often the new check triggers, and it only
triggered in the two new tests, so I'm fairly happy with that.

Bootstrapped/regtested on x86_64-linux, ok for trunk and 6?

2017-03-01  Marek Polacek  

PR c++/79687
* init.c (constant_value_1): Break if the variable has a dynamic
initializer.

* g++.dg/expr/ptrmem8.C: New test.
* g++.dg/expr/ptrmem9.C: New test.

diff --git gcc/cp/init.c gcc/cp/init.c
index 7ded37e..12e6bf4 100644
--- gcc/cp/init.c
+++ gcc/cp/init.c
@@ -2193,6 +2193,13 @@ constant_value_1 (tree decl, bool strict_p, bool return_aggregate_cst_ok_p)
       if (TREE_CODE (init) == CONSTRUCTOR
           && !DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl))
         break;
+      /* If the variable has a dynamic initializer, don't use its
+         DECL_INITIAL which doesn't reflect the real value.  */
+      if (VAR_P (decl)
+          && TREE_STATIC (decl)
+          && !DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl)
+          && DECL_NONTRIVIALLY_INITIALIZED_P (decl))
+        break;
       decl = unshare_expr (init);
     }
   return decl;
diff --git gcc/testsuite/g++.dg/expr/ptrmem8.C gcc/testsuite/g++.dg/expr/ptrmem8.C
index e69de29..c5a766a 100644
--- gcc/testsuite/g++.dg/expr/ptrmem8.C
+++ gcc/testsuite/g++.dg/expr/ptrmem8.C
@@ -0,0 +1,15 @@
+// PR c++/79687
+// { dg-do run }
+
+struct A
+{
+  char c;
+};
+
+int main()
+{
+  char A::* p = &A::c;
+  static char A::* const q = p;
+  A a;
+  return &(a.*q) - &a.c;
+}
diff --git gcc/testsuite/g++.dg/expr/ptrmem9.C gcc/testsuite/g++.dg/expr/ptrmem9.C
index e69de29..32ce777 100644
--- gcc/testsuite/g++.dg/expr/ptrmem9.C
+++ gcc/testsuite/g++.dg/expr/ptrmem9.C
@@ -0,0 +1,19 @@
+// PR c++/79687
+// { dg-do run }
+
+struct A
+{
+  char c;
+};
+
+int main()
+{
+  static char A::* p1 = &A::c;
+  char A::* const q1 = p1;
+
+  char A::* p2 = &A::c;
+  static char A::* const q2 = p2;
+
+  A a;
+  return (&(a.*q1) - &a.c) || (&(a.*q2) - &a.c);
+}

Marek
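
For readers less familiar with pointers to data members, the distinction
the thread relies on can be sketched in standalone C++.  The struct mirrors
the one in the tests above with one extra member added for illustration;
the helper names member_offset and is_null_member are hypothetical.  The
point is only that a pointer to the first member is a valid, non-null value
with offset 0, which is why the null value needs a different internal
encoding (-1, per the discussion above).

```cpp
#include <cstddef>

struct A
{
  char c;
  char d;  // extra member, added here only for illustration
};

// Byte distance from the start of an A object to the member selected by pm.
// pm must not be a null member pointer.
std::ptrdiff_t member_offset (char A::*pm)
{
  A a{};
  return &(a.*pm) - reinterpret_cast<char *> (&a);
}

// True if pm is the null pointer-to-member value.
bool is_null_member (char A::*pm)
{
  return pm == nullptr;
}
```

Here member_offset (&A::c) is 0 while is_null_member (&A::c) is false, so
an offset of 0 and "null" are different values at the language level.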


Re: [PATCH docs] remove Java from GCC 7 release criteria

2017-03-01 Thread Martin Sebor

On 03/01/2017 08:08 AM, Gerald Pfeifer wrote:

On Tue, 28 Feb 2017, Martin Sebor wrote:

The GCC 7 release criteria page mentions Java even though
the front end has been removed.  The attached patch removes Java
from the criteria page.  While reviewing the rest of the text I
noticed a few minor typos that I corrected in the patch as well.


Thanks, Martin!

Two minor comments:

-bug reports for problems encountered building and using popular
+bug reports for problems encountered while building and using popular

I believe the original version was fine (and is shorter), so personally
I would have left it as is.  Your proposed one is correct, too, of course.

-quality or compilation time regression is sufficiently severe as to
+quality or compilation time regression is sufficiently severe to
merit blocking the release.

Same here, though here I like your edit better. :-)


Thanks for the review!

I committed the first patch for now (without the P1/P2 numbers).
If there's consensus to add something more about those than what's
already there I'll post a new patch to add that.

Martin


RE: [PATCH,testsuite] Skip gcc.dg/lto/pr60449_0.c for mips*-*-elf* targets.

2017-03-01 Thread Toma Tabacu
Hi Rainer,

Thank you for the feedback.

As you suggested, I have added a check_gettimeofday_available proc in
target-supports.exp and a dg-require-gettimeofday proc in target-supports-dg.exp
which check for gettimeofday using the existing check_function_available proc.

The test still runs and passes on mips-mti-linux-gnu, which has support for
gettimeofday, but it is now skipped for mips-mti-elf, which doesn't support it
(in my configuration, at least).

I have also removed the dg-skip-if for AVR from the test (haven't tested it on
AVR, though).

What do you think ?

Catherine, would this interfere with the MIPS ELF targets which do support
gettimeofday ?

Regards,
Toma

gcc/testsuite/

* gcc.dg/lto/pr60449_0.c: Add dg-require-gettimeofday. Remove
dg-skip-if for AVR.
* lib/target-supports-dg.exp (dg-require-gettimeofday): New function.
* lib/target-supports.exp (check_gettimeofday_available): Likewise.

diff --git a/gcc/testsuite/gcc.dg/lto/pr60449_0.c b/gcc/testsuite/gcc.dg/lto/pr60449_0.c
index 5b878a6..2d3a900 100644
--- a/gcc/testsuite/gcc.dg/lto/pr60449_0.c
+++ b/gcc/testsuite/gcc.dg/lto/pr60449_0.c
@@ -1,5 +1,5 @@
 /* { dg-lto-do link } */
-/* { dg-skip-if "Needs gettimeofday" { "avr-*-*" } } */
+/* { dg-require-gettimeofday "" } */
 
 extern int printf (const char *__restrict __format, ...);
 typedef long int __time_t;
diff --git a/gcc/testsuite/lib/target-supports-dg.exp b/gcc/testsuite/lib/target-supports-dg.exp
index 6400d64..41369aa 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -236,6 +236,15 @@ proc dg-require-mkfifo { args } {
 }
 }
 
+# If this target does not have gettimeofday, skip this test.
+
+proc dg-require-gettimeofday { args } {
+if { ![check_gettimeofday_available] } {
+   upvar dg-do-what dg-do-what
+set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+}
+}
+
 # If this target does not use __cxa_atexit, skip this test.
 
 proc dg-require-cxa-atexit { args } {
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 2766af4..9794383 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2287,6 +2287,12 @@ proc check_mkfifo_available {} {
 return [check_function_available "mkfifo"]
 }
 
+# Returns true iff "gettimeofday" is available on the target system.
+
+proc check_gettimeofday_available {} {
+return [check_function_available "gettimeofday"]
+}
+
 # Returns true iff "__cxa_atexit" is used on the target system.
 
 proc check_cxa_atexit_available { } {



Re: [PATCH docs] remove Java from GCC 7 release criteria

2017-03-01 Thread Martin Sebor

On 02/28/2017 11:41 PM, Richard Biener wrote:

On March 1, 2017 3:34:46 AM GMT+01:00, Martin Sebor  wrote:

On 02/28/2017 01:41 PM, Richard Biener wrote:

On February 28, 2017 7:00:39 PM GMT+01:00, Jeff Law wrote:

On 02/28/2017 10:54 AM, Martin Sebor wrote:

The GCC 7 release criteria page mentions Java even though
the front end has been removed.  The attached patch removes Java
from the criteria page.  While reviewing the rest of the text I
noticed a few minor typos that I corrected in the patch as well.

Btw., as an aside, I read the page to see if I could find out more
about the "magic" bug counts that are being aimed for to decide when
to cut the release.  Can someone say what those are and where to
find them?  I understand from the document that they're not exact
but even ballpark numbers would be useful.


OK.

WRT the bug counts.  0 P1 regressions, < 100 P1-P3 regressions.  I'm not
sure if that's documented anywhere though.


Actually the only criterion is zero P1 regressions.  Those are
documented to block a release.

Yes, that is mentioned in the document.  Would it be fair to say
that the number of P2 bugs (or regressions) or their nature plays
into the decision in some way as well?  If so, what can the release
criteria say about it?


Ultimately P2 bugs do not play a role and 'time' will trump them.  OTOH we
never were in an uncomfortable situation with P2s at the desired point of 
release.

Also note that important P2 bugs can be promoted to P1 and not important P1 to 
P2.


I'm trying to get a better idea which bugs to work on and where
my help might have the biggest impact.  I think having better
visibility into the bug triage process (such as bug priorities
and how they impact the release schedule) might help others
focus too.


In order of importance:
- P1
- wrong-code, rejects-valid, ice-on-valid (even if not regressions, regressions 
more important)
- P2 regressions, more recent ones first (newest working version)


I see.  This is helpful, thanks.

The kinds of problems you mention are discussed in the document
so just to make the importance clear, would adding the following
after this sentence

  In general bugs blocking the release are marked with priority P1
  (Maintaining the GCC Bugzilla database).

accurately reflect what you described?

  As a general rule of thumb, within each priority level, bugs that
  result in incorrect code are considered more urgent than those
  that lead to rejecting valid code, which in turn are viewed as
  more severe than ice-on-valid code (compiler crashes).  More
  recently reported bugs are also prioritized over very old ones.

Martin


Re: [wwwdocs] RISC-V readings and features

2017-03-01 Thread Palmer Dabbelt
On Wed, 01 Mar 2017 05:38:42 PST (-0800), ger...@pfeifer.com wrote:
> On Wed, 8 Feb 2017, Gerald Pfeifer wrote:
>> Except http://riscv.org actually redirects to https://riscv.org . :-}
>
> I now made the same update to readings.html as well.
>
> Applied.

Thanks!


[PATCH rs6000 testsuite] Additional test in pr79544.c

2017-03-01 Thread Pat Haugen
Since I fixed both vec_sra and vec_vsrad, the testcase should be testing
both.  Committed as obvious.

-Pat


2017-03-01  Pat Haugen  

* gcc.target/powerpc/pr79544.c: Add test for vec_vsrad and fix up
scan string.


Index: gcc.target/powerpc/pr79544.c
===
--- gcc.target/powerpc/pr79544.c(revision 245811)
+++ gcc.target/powerpc/pr79544.c(working copy)
@@ -11,5 +11,11 @@ test_sra (vector unsigned long long x, v
   return vec_sra (x, y);
 }

-/* { dg-final { scan-assembler "vsrad" } } */
+vector unsigned long long
+test_vsrad (vector unsigned long long x, vector unsigned long long y)
+{
+  return vec_vsrad (x, y);
+}
+
+/* { dg-final { scan-assembler-times {\mvsrad\M} 2 } } */
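
A note on why the scan string changed: the new function is itself named
test_vsrad, so a plain substring search for "vsrad" would also hit the
label.  The C++ sketch below illustrates the difference using std::regex,
with \b standing in for Tcl's \m and \M word anchors (an approximation
chosen for this example, not what DejaGnu runs).  The assembly text is made
up for illustration and is not real compiler output; the helper names are
hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Made-up assembly text, for illustration only.
std::string sample_asm ()
{
  return "test_sra:\n\tvsrad 2,2,3\n\tblr\n"
         "test_vsrad:\n\tvsrad 2,2,3\n\tblr\n";
}

// Counts every (possibly overlapping) occurrence of needle in text.
std::size_t count_substring (const std::string &text, const std::string &needle)
{
  std::size_t n = 0;
  for (std::size_t pos = text.find (needle); pos != std::string::npos;
       pos = text.find (needle, pos + 1))
    ++n;
  return n;
}

// Counts occurrences of word delimited by word boundaries.  The word is
// assumed to contain no regex metacharacters.
std::size_t count_whole_word (const std::string &text, const std::string &word)
{
  const std::regex re ("\\b" + word + "\\b");
  return static_cast<std::size_t> (
    std::distance (std::sregex_iterator (text.begin (), text.end (), re),
                   std::sregex_iterator ()));
}
```

On the sample text the substring count is 3 (two instructions plus the
test_vsrad label) and the whole-word count is 2, since the underscore
before "vsrad" in the label is a word character.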



Re: [PATCH], PR target/79439, fix PowerPC recursive calls that can be replaced at runtime

2017-03-01 Thread Michael Meissner
On Wed, Mar 01, 2017 at 05:21:44AM -0600, Segher Boessenkool wrote:
> On Wed, Mar 01, 2017 at 01:37:14AM -0500, Michael Meissner wrote:
> > This patch fixes PR target/79439, where a recursive call in 64-bit code
> > compiled with -fpic doesn't have the NOP after the call.  It is possible
> > for the function to be overridden at link time.  In such a case, the
> > call should call the module that is overriding the call, rather than itself.
> > 
> > The following patch was tested on a little endian Power8 Linux system (64-bit
> > only), a big endian Power8 Linux system (both 32-bit and 64-bit), and a big
> > endian Power7 Linux system (both 32-bit and 64-bit).  There were no regressions
> > in the test suite, and I verified that the new test ran successfully in 64-bit
> > mode.  Can I check this patch into the trunk?
> 
> Yes, thanks!
> 
> > Since the bug was reported against GCC 6, can I apply the patch to GCC 6
> > assuming the patch applies cleanly and has no regressions after a burn in
> > period on the GCC 7 trunk?
> 
> Of course.  Also for GCC 5, if it is worth fixing it there?

Yeah, it can probably go into GCC 5 if the branch is still open.  The original
report was against GCC 6.

> Some questions/comments about the testcase:
> 
> > Index: gcc/testsuite/gcc.target/powerpc/pr79439.c
> > ===
> > --- gcc/testsuite/gcc.target/powerpc/pr79439.c  (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/pr79439.c  (revision 0)
> > @@ -0,0 +1,26 @@
> > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> 
> Is this enough?  Do all 64-bit ABIs have the insn to be patched after
> call instructions?

I think all do, but I restricted it to powerpc64*-*-linux to be sure.

> > +/* { dg-require-effective-target powerpc_p8vector_ok } */
> 
> Why this?

Because I forgot to remove it, when I cloned another test.

> > +/* Bug 79439 -- we should not eliminate NOP in 'rec' call because it can be
> > +   interposed at link time for 64-bit ABIs.  We need -fpic to tell the 
> > compiler
> > +   functions may be interposed.  */
> 
> That reads as "cannot be interposed on 32-bit ABIs", which isn't what
> you mean I think.

I rewrote the comment.

> > +/* { dg-final { scan-assembler-times {\mnop\M} 3 } } */
> 
> You can also check they follow a "bl" insn immediately (scan-assembler
> does not scan single lines, but the whole output).  Something like
> 
> { scan-assembler-times {\mbl \S+\s+nop\M} 3 }
> 
> Or maybe this is overkill here :-)

Does scan-assembler-times go past 1 line?
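
Segher's note above says the scan looks at the whole output rather than at
single lines.  What that means for a pattern like the one he suggests can
be sketched in C++ with std::regex, where \s also matches a newline.  This
is an approximation for illustration: \b stands in for Tcl's \m and \M,
the assembly strings are made up, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts "bl <symbol>" instructions whose next token, possibly on the
// following line, is "nop".  The whole text is searched as one string.
std::size_t count_bl_then_nop (const std::string &asm_text)
{
  static const std::regex re (R"(\bbl \S+\s+nop\b)");
  return static_cast<std::size_t> (
    std::distance (std::sregex_iterator (asm_text.begin (), asm_text.end (), re),
                   std::sregex_iterator ()));
}
```

For a made-up listing such as "\tbl f\n\tnop\n\tbl rec\n\tnop\n\tbl g\n\tnop\n\tblr\n"
this counts 3, and a call followed by some other instruction before the
nop is not counted.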

In any case, here is the diff for the changes I checked in:

[gcc]
2017-03-01  Michael Meissner  

PR target/79439
* config/rs6000/predicates.md (current_file_function_operand): Do
not allow self calls to be local if the function is replaceable.

[gcc/testsuite]
2017-03-01  Michael Meissner  

PR target/79439
* gcc.target/powerpc/pr79439.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 245812)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1110,7 +1110,8 @@ (define_predicate "current_file_function
   (and (match_code "symbol_ref")
(match_test "(DEFAULT_ABI != ABI_AIX || SYMBOL_REF_FUNCTION_P (op))
&& (SYMBOL_REF_LOCAL_P (op)
-   || op == XEXP (DECL_RTL (current_function_decl), 0))
+   || (op == XEXP (DECL_RTL (current_function_decl), 0)
+   && !decl_replaceable_p (current_function_decl)))
&& !((DEFAULT_ABI == ABI_AIX
  || DEFAULT_ABI == ABI_ELFv2)
 && (SYMBOL_REF_EXTERNAL_P (op)
Index: gcc/testsuite/gcc.target/powerpc/pr79439.c
===
--- gcc/testsuite/gcc.target/powerpc/pr79439.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79439.c  (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile { target { powerpc64*-*-linux* && lp64 } } } */
+/* { dg-options "-O2 -fpic" } */
+
+/* On the Linux 64-bit ABIs, we should not eliminate NOP in the 'rec' call if
+   -fpic is used because rec can be interposed at link time (since it is
+   external), and the recursive call should call the interposed function.  The
+   Linux 32-bit ABIs do not require NOPs after the BL instruction.  */
+
+int f (void);
+
+void
+g (void)
+{
+}
+
+int
+rec (int a)
+{
+  int ret = 0;
+  if (a > 10 && f ())
+ret += rec (a - 1);
+  g ();
+  return a + ret;
+}
+
+/* { dg-final { scan-assembler-times {\mbl f\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mbl g\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mbl rec\M} 1 } } */
+/*

[PATCH, rs6000] Document default code model for 64-bit Linux

2017-03-01 Thread Bill Schmidt
Hi,

The PowerPC documentation doesn't currently identify the default code model.
This is rather complicated due to all the various subtargets, but it is
valuable to at least document the common case for 64-bit Linux.

Verified on powerpc64le-unknown-linux-gnu.  Ok for trunk?

Thanks,
Bill


2017-03-01  Bill Schmidt  

* doc/invoke.texi: Document default code model for 64-bit Linux.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 245811)
+++ gcc/doc/invoke.texi (working copy)
@@ -21166,7 +21166,8 @@ Generate PowerPC64 code for the small model: The T
 @item -mcmodel=medium
 @opindex mcmodel=medium
 Generate PowerPC64 code for the medium model: The TOC and other static
-data may be up to a total of 4G in size.
+data may be up to a total of 4G in size.  This is the default for 64-bit
+Linux.
 
 @item -mcmodel=large
 @opindex mcmodel=large



[PATCH] -Wduplicated-branches -fopenmp ICE in inchash::add_expr (PR c++/79672)

2017-03-01 Thread Marek Polacek
The following testcase ICEd with -Wduplicated-branches and -fopenmp
because we tried to hash an omp_parallel expression that contained some
TREE_VECs, but those aren't handled in inchash::add_expr.  Handling
that is easy and fixes the ICE.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-03-01  Marek Polacek  

PR c++/79672
* tree.c (inchash::add_expr): Handle TREE_VEC.

* g++.dg/warn/Wduplicated-branches2.C: Fix PR.
* g++.dg/warn/Wduplicated-branches3.C: New test.

diff --git gcc/testsuite/g++.dg/warn/Wduplicated-branches2.C gcc/testsuite/g++.dg/warn/Wduplicated-branches2.C
index 4da2d54..7e14c5f 100644
--- gcc/testsuite/g++.dg/warn/Wduplicated-branches2.C
+++ gcc/testsuite/g++.dg/warn/Wduplicated-branches2.C
@@ -1,4 +1,4 @@
-// PR c/6427
+// PR c/64279
 // { dg-do compile { target c++11 } }
 // { dg-options "-Wduplicated-branches" }
 
diff --git gcc/testsuite/g++.dg/warn/Wduplicated-branches3.C gcc/testsuite/g++.dg/warn/Wduplicated-branches3.C
index e69de29..26dab85 100644
--- gcc/testsuite/g++.dg/warn/Wduplicated-branches3.C
+++ gcc/testsuite/g++.dg/warn/Wduplicated-branches3.C
@@ -0,0 +1,18 @@
+// PR c++/79672
+// { dg-do compile }
+// { dg-options "-Wduplicated-branches -fopenmp" }
+// { dg-require-effective-target fopenmp }
+
+template <int N> void foo()
+{
+  if (N > 0)
+  {
+#pragma omp parallel for
+for (int i = 0; i < 10; ++i) ;
+  }
+}
+
+void bar()
+{
+  foo<0>();
+}
diff --git gcc/tree.c gcc/tree.c
index 42c8a2d..8f87e7c 100644
--- gcc/tree.c
+++ gcc/tree.c
@@ -7865,6 +7865,10 @@ add_expr (const_tree t, inchash::hash &hstate, unsigned int flags)
  inchash::add_expr (tsi_stmt (i), hstate, flags);
return;
   }
+case TREE_VEC:
+  for (int i = 0; i < TREE_VEC_LENGTH (t); ++i)
+   inchash::add_expr (TREE_VEC_ELT (t, i), hstate, flags);
+  return;
 case FUNCTION_DECL:
   /* When referring to a built-in FUNCTION_DECL, use the __builtin__ form.
 Otherwise nodes that compare equal according to operand_equal_p might

Marek
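
The shape of the fix, a recursive hasher that has to know how to walk every
container-like node kind, can be sketched outside of GCC as follows.  All
of the types and names here (Kind, Node, constant, vec, hash_expr) are
hypothetical stand-ins and are far simpler than GCC's trees; the mixing
function is an arbitrary choice for the example and is not what inchash
uses.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified node kinds.
enum class Kind { Constant, Vec };

struct Node
{
  Kind kind;
  long value;               // meaningful for Kind::Constant
  std::vector<Node> elts;   // meaningful for Kind::Vec
};

Node constant (long v) { return Node{Kind::Constant, v, {}}; }
Node vec (std::vector<Node> elts) { return Node{Kind::Vec, 0, std::move (elts)}; }

// Arbitrary 64-bit mixing step, chosen only for this sketch.
void mix (std::uint64_t &h, std::uint64_t x)
{
  h = (h ^ x) * 1099511628211ULL;
}

// Hash the node kind first, then the payload.  A vector node has no payload
// of its own, so its elements are hashed recursively in order.
void add_expr (const Node &t, std::uint64_t &h)
{
  mix (h, static_cast<std::uint64_t> (t.kind));
  switch (t.kind)
    {
    case Kind::Constant:
      mix (h, static_cast<std::uint64_t> (t.value));
      return;
    case Kind::Vec:
      for (const Node &e : t.elts)
        add_expr (e, h);
      return;
    }
}

std::uint64_t hash_expr (const Node &t)
{
  std::uint64_t h = 1469598103934665603ULL;
  add_expr (t, h);
  return h;
}
```

Without the Kind::Vec case a vector node would have no handler at all,
which is the analogue of the missing TREE_VEC case behind the ICE.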


Re: [RFA PATCH, i386]: Warn for 64-bit values in general-reg asm operands and error out for 8-bit values in invalid GR asm operand

2017-03-01 Thread Uros Bizjak
On Wed, Mar 1, 2017 at 11:41 AM, Uros Bizjak  wrote:
> On Wed, Mar 1, 2017 at 10:00 AM, Uros Bizjak  wrote:
>> On Wed, Mar 1, 2017 at 9:48 AM, Jakub Jelinek  wrote:
>>> On Wed, Mar 01, 2017 at 09:34:53AM +0100, Uros Bizjak wrote:
>>>> Some more thoughts on 64-bit reg on 32-bit targets warning.
>>>>
>>>> Actually, we never *print* register name for instructions that use "A"
>>>> constraint, since %eax/%edx is always implicit.  The warning does not
>>>> deal with constraints, so unless we want to output DImode register
>>>> name, there is no warning.
>>>
>>> Ah, indeed, we don't have a modifier that would print the high register
>>> of a register pair (i.e. essentially print REGNO (x) + 1 instead of REGNO
>>> (x)), guess that might be useful not just for 64-bit GPR operands in 32-bit
>>> code, but also 128-bit GPR operands in 64-bit code.
>>
>> The issue here is that (modulo ax/dx with "A" constraint) we don't
>> guarantee double-register sequence order, so any change in register
>> allocation order would break any assumptions. For implicit ax/dx, user
>> should explicitly use register name (e.g. DImode operand in "rdtscp;
>> mov %0, mem" asm should be corrected to use %%eax instead of %0).
>>
>> And, yes - we should add similar warning for 128-bit GPRs. The only
>> way to use register pair with  width > machine_mode is with implicit
>> operands or with explicit regnames.
>
> Something like the following patch I'm testing:

Attached is the patch I have committed to mainline SVN after a full
bootstrap and regression test.

2017-03-01  Uros Bizjak  

* config/i386/i386.c (print_reg): Warn for values of
unsupported size in integer register.

testsuite/ChangeLog:

2017-03-01  Uros Bizjak  

* gcc.target/i386/invsize-2.c: New test.
* gcc.target/i386/invsize-3.c: Ditto.
* gcc.target/i386/invsize-4.c: Ditto.
* gcc.target/i386/pr66274.c: Expect "unsupported size" warning.
* gcc.target/i386/stackalign/asm-1.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline.

Uros.
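
The condition the patch adds to print_reg can be restated as a small
standalone predicate.  This is only a paraphrase of the hunk below with
plain parameters in place of GCC's macros; the function and parameter names
are hypothetical, and word_size stands for GET_MODE_SIZE (word_mode).

```cpp
// Returns true when the patched switch would emit the "unsupported size"
// warning: the operand is 8, 12 or 16 bytes, lives in a general register,
// and is wider than one machine word.
bool warns_unsupported_size (int msize, int word_size, bool general_reg)
{
  switch (msize)
    {
    case 16:
    case 12:
    case 8:
      return general_reg && msize > word_size;
    default:
      return false;
    }
}
```

Under this reading an 8-byte operand warns with a 4-byte word but not with
an 8-byte word, and a 16-byte operand in a general register warns with an
8-byte word, which lines up with the three new invsize tests.
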
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 245811)
+++ config/i386/i386.c  (working copy)
@@ -17646,13 +17646,16 @@ print_reg (rtx x, int code, FILE *file)
 
   switch (msize)
 {
+case 16:
+case 12:
 case 8:
+  if (GENERAL_REGNO_P (regno) && msize > GET_MODE_SIZE (word_mode))
+   warning (0, "unsupported size for integer register");
+  /* FALLTHRU */
 case 4:
   if (LEGACY_INT_REGNO_P (regno))
-   putc (msize == 8 && TARGET_64BIT ? 'r' : 'e', file);
+   putc (msize > 4 && TARGET_64BIT ? 'r' : 'e', file);
   /* FALLTHRU */
-case 16:
-case 12:
 case 2:
 normal:
   reg = hi_reg_name[regno];
Index: testsuite/gcc.target/i386/invsize-2.c
===
--- testsuite/gcc.target/i386/invsize-2.c   (nonexistent)
+++ testsuite/gcc.target/i386/invsize-2.c   (working copy)
@@ -0,0 +1,7 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "" } */
+
+void foo (long long x)
+{
+  __asm__ volatile ("# %0" : : "r" (x));
+} /* { dg-warning "unsupported size" }  */
Index: testsuite/gcc.target/i386/invsize-3.c
===
--- testsuite/gcc.target/i386/invsize-3.c   (nonexistent)
+++ testsuite/gcc.target/i386/invsize-3.c   (working copy)
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+void foo (long double x)
+{
+  __asm__ volatile ("# %0" : : "r" (x));
+} /* { dg-warning "unsupported size" }  */
Index: testsuite/gcc.target/i386/invsize-4.c
===
--- testsuite/gcc.target/i386/invsize-4.c   (nonexistent)
+++ testsuite/gcc.target/i386/invsize-4.c   (working copy)
@@ -0,0 +1,7 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "" } */
+
+void foo (__int128 x)
+{
+  __asm__ volatile ("# %0" : : "r" (x));
+} /* { dg-warning "unsupported size" }  */
Index: testsuite/gcc.target/i386/pr66274.c
===
--- testsuite/gcc.target/i386/pr66274.c (revision 245811)
+++ testsuite/gcc.target/i386/pr66274.c (working copy)
@@ -3,7 +3,7 @@
 
 void f()
 {
-  asm ("push %0" : : "r" ((unsigned long long) 456));
-}
+  asm ("push %0" : : "r" ((unsigned long long) 456 >> 32));
+} /* { dg-warning "unsupported size" }  */
 
-/* { dg-final { scan-assembler-not "push %r" } } */
+/* { dg-final { scan-assembler-not "push\[ \t]+%r" } } */
Index: testsuite/gcc.target/i386/stackalign/asm-1.c
===
--- testsuite/gcc.target/i386/stackalign/asm-1.c(revision 245811)
+++ testsuite/gcc.target/i386/stackalign/asm-1.c(working copy)
@@ -4,4 +4,4 @@
 
 /* This case is to detect a compile time regression introduced in stack
branch

[PATCH] Avoid UB in insn-emit.c (PR tree-optimization/79345)

2017-03-01 Thread Jakub Jelinek
On Wed, Mar 01, 2017 at 03:03:29PM +0100, Richard Biener wrote:
> The patch adds better limiting to that interface and fixes one
> false positive in fixed-value.c.  Two other false positives are
> fixed by the wide-int.h patch posted a few hours ago and a patch
> to genemit from Jakub.

Here is that patch.  Right now insn-emit.c for match_scratch
operands in expanders emits
  rtx operand7 ATTRIBUTE_UNUSED;
...
  rtx operands[8];
  operands[0] = operand0;
...
  // but no operands[7] = something here;
...
  // C code from *.md file (which typically doesn't touch operands[7])
...
  operand7 = operands[7];
  (void) operand7;
...
  // generated code that doesn't use operand7
This triggers -Wuninitialized warning with Richard's patch and is really UB,
even when we actually don't use it and so hopefully optimize away.
The following patch just removes all operand7 and operands[7] references
from the generated code (i.e. operands that are for match_scratch).

Usually match_scratch numbers come after match_operand/match_dup etc.
numbers, but as can be seen, there are few spots where that is not the case.
The patch adds verification of this requirement in genemit and then fixes
the issues it has diagnosed.

Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested with
make s-emit in a cross to
{powerpc64,aarch64,armv7hl,sparc64,s390x,cris,sh,ia64,hppa,mips}-linux, ok
for trunk?
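
For reference, the new genemit requirement boils down to a one-line
predicate.  The sketch below restates it in standalone C++; the struct only
mirrors the four pattern_stats fields the check reads, and the names
pattern_stats_sketch and scratch_numbers_ok are hypothetical.

```cpp
#include <algorithm>

// Subset of the statistics the check needs; -1 means "none found".
struct pattern_stats_sketch
{
  int max_opno;
  int max_dup_opno;
  int min_scratch_opno;
  int max_scratch_opno;
};

// True when every match_scratch number is above all match_operand and
// match_dup numbers, or when there is no match_scratch at all.
bool scratch_numbers_ok (const pattern_stats_sketch &s)
{
  if (s.min_scratch_opno == -1)
    return true;
  return s.min_scratch_opno > std::max (s.max_opno, s.max_dup_opno);
}
```

An expander with operands 0..6 and a scratch numbered 7 passes, while one
whose scratch is numbered 5 with a match_operand or match_dup numbered 5 or
higher is rejected; that is the situation the .md changes in the patch
address by swapping numbers.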

2017-03-01  Jakub Jelinek  

PR tree-optimization/79345
* gensupport.h (struct pattern_stats): Add min_scratch_opno field.
* gensupport.c (get_pattern_stats_1) <case MATCH_SCRATCH>: Update it.
(get_pattern_stats): Initialize it.
* genemit.c (gen_expand): Verify match_scratch numbers come after
match_operand/match_dup numbers.
* config/i386/i386.md (mul3_highpart): Swap match_dup and
match_scratch numbers.
* config/i386/sse.md (avx2_gathersi, avx2_gatherdi):
Likewise.
* config/s390/s390.md (trunctdsd2): Likewise.

--- gcc/gensupport.h.jj 2017-01-01 12:45:38.0 +0100
+++ gcc/gensupport.h2017-03-01 12:06:21.816440102 +0100
@@ -199,7 +199,8 @@ struct pattern_stats
   /* The largest match_dup, match_op_dup or match_par_dup number found.  */
   int max_dup_opno;
 
-  /* The largest match_scratch number found.  */
+  /* The smallest and largest match_scratch number found.  */
+  int min_scratch_opno;
   int max_scratch_opno;
 
   /* The number of times match_dup, match_op_dup or match_par_dup appears
--- gcc/gensupport.c.jj 2017-01-05 22:10:31.0 +0100
+++ gcc/gensupport.c2017-03-01 12:21:24.830327207 +0100
@@ -3000,6 +3000,10 @@ get_pattern_stats_1 (struct pattern_stat
   break;
 
 case MATCH_SCRATCH:
+  if (stats->min_scratch_opno == -1)
+   stats->min_scratch_opno = XINT (x, 0);
+  else
+   stats->min_scratch_opno = MIN (stats->min_scratch_opno, XINT (x, 0));
   stats->max_scratch_opno = MAX (stats->max_scratch_opno, XINT (x, 0));
   break;
 
@@ -3032,6 +3036,7 @@ get_pattern_stats (struct pattern_stats
 
   stats->max_opno = -1;
   stats->max_dup_opno = -1;
+  stats->min_scratch_opno = -1;
   stats->max_scratch_opno = -1;
   stats->num_dups = 0;
 
--- gcc/genemit.c.jj2017-01-01 12:45:35.0 +0100
+++ gcc/genemit.c   2017-03-01 12:16:28.391343302 +0100
@@ -448,6 +448,10 @@ gen_expand (md_rtx_info *info)
 
   /* Find out how many operands this function has.  */
   get_pattern_stats (&stats, XVEC (expand, 1));
+  if (stats.min_scratch_opno != -1
+  && stats.min_scratch_opno <= MAX (stats.max_opno, stats.max_dup_opno))
+fatal_at (info->loc, "define_expand for %s needs to have match_scratch "
+"numbers above all other operands", XSTR (expand, 0));
 
   /* Output the function name and argument declarations.  */
   printf ("rtx\ngen_%s (", XSTR (expand, 0));
@@ -479,8 +483,6 @@ gen_expand (md_rtx_info *info)
  make a local variable.  */
   for (i = stats.num_generator_args; i <= stats.max_dup_opno; i++)
 printf ("  rtx operand%d;\n", i);
-  for (; i <= stats.max_scratch_opno; i++)
-printf ("  rtx operand%d ATTRIBUTE_UNUSED;\n", i);
   printf ("  rtx_insn *_val = 0;\n");
   printf ("  start_sequence ();\n");
 
@@ -516,7 +518,7 @@ gen_expand (md_rtx_info *info)
 (unless we aren't going to use them at all).  */
   if (XVEC (expand, 1) != 0)
{
- for (i = 0; i < stats.num_operand_vars; i++)
+ for (i = 0; i <= MAX (stats.max_opno, stats.max_dup_opno); i++)
{
  printf ("operand%d = operands[%d];\n", i, i);
  printf ("(void) operand%d;\n", i);
--- gcc/config/i386/i386.md.jj  2017-02-22 18:15:48.0 +0100
+++ gcc/config/i386/i386.md 2017-03-01 12:23:22.882736837 +0100
@@ -7364,11 +7364,11 @@ (define_expand "mul3_highpart"
   (match_operand:SWI48 1 "nonimmediate_operand"))
 (any_extend:
   (match_operand:SWI48 2 "reg

[C++ PATCH] -Wunused-but-set-parameter fix followup (PR c++/79782)

2017-03-01 Thread Jakub Jelinek
Hi!

On Tue, Feb 28, 2017 at 02:47:31PM -1000, Nathan Sidwell wrote:
> On 02/28/2017 02:41 PM, Jason Merrill wrote:
> > On Tue, Feb 28, 2017 at 12:48 PM, Jakub Jelinek  wrote:
> > > The DR1659/DR1611 changes result in construct_virtual_base not being 
> > > called,
> > > but unfortunately the call generated in there was the spot that caused
> > > mark_exp_read on the arguments passed to the vbase construction (TREE_USED
> > > is set on these earlier already during parsing them).  That results
> > > in false positive -Wunused-but-set-parameter warnings.
> > > 
> > > The following patch tries to avoid the warning in that case by marking
> > > the arguments as read (essentially pretending they were read in the 
> > > omitted
> > > call).  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for 
> > > trunk?
> > 
> > I believe there's some question still about whether the DR1659/11
> > changes are what we want; I'll defer to Nathan on this patch.
> 
> Jakub's patch is OK.  The defect I've reported is how 1658 interacts with
> virtual destructors. (bug 79393)

Unfortunately my patch apparently broke the case where such ctor in virtual
class has no arguments (void_type_node is used in that case instead of a
TREE_LIST, it is a little bit weird (I'd have expected perhaps
void_list_node instead), but it is what it does).

Plus, as the following testcase shows, mark_exp_read really expects to be
called on quite narrow sets of expressions that are being emitted, while
in arguments because it is not really emitted we can have e.g. nested
CONSTRUCTORs etc. and mark_exp_read doesn't handle those.

So this patch in addition to not walking anything for
arguments == void_type_node
just walks the arguments and calls mark_exp_read on all PARM_DECLs in there
(I think that is all we care about, we can't have there VAR_DECLs or
RESULT_DECLs).
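
The idea of walking the whole argument expression and marking only the parameters can be sketched outside of GCC roughly as follows.  This is a toy model under stated assumptions: Node, Code and mark_parms_read are hypothetical and do not correspond to GCC's tree representation or to cp_walk_tree.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical miniature expression tree; not GCC's tree representation.
enum class Code { Parm, Constant, Constructor, Mult };

struct Node {
  Code code;
  bool read = false;
  std::vector<std::unique_ptr<Node>> operands;
  explicit Node(Code c) : code(c) {}
};

// Visit every node and mark parameters as read, however deeply they are
// nested inside other expressions (e.g. a constructor holding x * 0).
inline void mark_parms_read(Node &node) {
  if (node.code == Code::Parm)
    node.read = true;
  for (auto &op : node.operands)
    mark_parms_read(*op);
}
```

Only nodes with Code::Parm are touched; the enclosing constructor and arithmetic nodes are merely traversed, which mirrors why the callback in the patch checks for PARM_DECL before calling mark_exp_read.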

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-03-01  Jakub Jelinek  

PR c++/79782
* init.c (mark_exp_read_r): New function.
(emit_mem_initializers): Don't walk arguments if it is void_type_node.
Use cp_walk_tree with mark_exp_read_r instead of plain mark_exp_read.

* g++.dg/warn/Wunused-parm-10.C: New test.

--- gcc/cp/init.c.jj2017-03-01 09:35:19.0 +0100
+++ gcc/cp/init.c   2017-03-01 16:59:09.014981469 +0100
@@ -1127,6 +1127,17 @@ sort_mem_initializers (tree t, tree mem_
   return sorted_inits;
 }
 
+/* Callback for cp_walk_tree to mark all PARM_DECLs in a tree as read.  */
+
+static tree
+mark_exp_read_r (tree *tp, int *, void *)
+{
+  tree t = *tp;
+  if (TREE_CODE (t) == PARM_DECL)
+mark_exp_read (t);
+  return NULL_TREE;
+}
+
 /* Initialize all bases and members of CURRENT_CLASS_TYPE.  MEM_INITS
is a TREE_LIST giving the explicit mem-initializer-list for the
constructor.  The TREE_PURPOSE of each entry is a subobject (a
@@ -1217,12 +1228,12 @@ emit_mem_initializers (tree mem_inits)
/* C++14 DR1658 Means we do not have to construct vbases of
   abstract classes.  */
construct_virtual_base (subobject, arguments);
-  else
+  else if (arguments != void_type_node)
/* When not constructing vbases of abstract classes, at least mark
   the arguments expressions as read to avoid
   -Wunused-but-set-parameter false positives.  */
for (tree arg = arguments; arg; arg = TREE_CHAIN (arg))
- mark_exp_read (TREE_VALUE (arg));
+ cp_walk_tree (&TREE_VALUE (arg), mark_exp_read_r, NULL, NULL);
 
   if (inherited_base)
pop_deferring_access_checks ();
--- gcc/testsuite/g++.dg/warn/Wunused-parm-10.C.jj  2017-03-01 17:02:41.811195793 +0100
+++ gcc/testsuite/g++.dg/warn/Wunused-parm-10.C 2017-03-01 17:01:31.0 +0100
@@ -0,0 +1,12 @@
+// PR c++/79782
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wunused-but-set-parameter -Wunused-parameter" }
+
+struct E { virtual E *foo () const = 0; };
+struct F : virtual public E { };
+struct G : public virtual F { G (int x) : F () { } };  // { dg-warning "unused parameter" }
+struct H : virtual public E { H (int x, int y); };
+struct I : public virtual H { I (int x, int y) : H (x, y) { } };   // { dg-bogus "set but not used" }
+struct J : public virtual H { J (int x, int y) : H { x, y } { } }; // { dg-bogus "set but not used" }
+struct K : public virtual H { K (int x, int y) : H (x * 0, y + 1) { } };   // { dg-bogus "set but not used" }
+struct L : public virtual H { L (int x, int y) : H { x & 0, y | 1 } { } }; // { dg-bogus "set but not used" }


Jakub


Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 1 Mar 2017, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >
>> >> Richard Biener  writes:
>> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >> >
>> >> >> Sorry for the late reply, but:
>> >> >> 
>> >> >> Richard Biener  writes:
>> >> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
>> >> >> >
>> >> >> >> 
>> >> >> >> Currently we force peeling for gaps whenever element overrun can 
>> >> >> >> occur
>> >> >> >> but for aligned accesses we know that the loads won't trap and thus
>> >> >> >> we can avoid this.
>> >> >> >> 
>> >> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
>> >> >> >> some testsuite fallout here so didn't bother to invent a new 
>> >> >> >> testcase).
>> >> >> >> 
>> >> >> >> Just in case somebody thinks the overrun is a bad idea in general
>> >> >> >> (even when not trapping).  Like for ASAN or valgrind.
>> >> >> >
>> >> >> > This is what I applied.
>> >> >> >
>> >> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >> >> >
>> >> >> > Richard.
>> >> >> [...]
>> >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> >> >> > index 15aec21..c29e73d 100644
>> >> >> > --- a/gcc/tree-vect-stmts.c
>> >> >> > +++ b/gcc/tree-vect-stmts.c
>> >> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree 
>> >> >> > vectype, bool slp,
>> >> >> >/* If there is a gap at the end of the group then these 
>> >> >> > optimizations
>> >> >> >   would access excess elements in the last iteration.  */
>> >> >> >bool would_overrun_p = (gap != 0);
>> >> >> > +  /* If the access is aligned an overrun is fine.  */
>> >> >> > +  if (would_overrun_p
>> >> >> > +  && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
>> >> >> > +would_overrun_p = false;
>> >> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
>> >> >> >&& (can_overrun_p || !would_overrun_p)
>> >> >> >&& compare_step_with_zero (stmt) > 0)
>> >> >> 
>> >> >> ...is this right for all cases?  I think it only looks for 
>> >> >> single-vector
>> >> >> alignment, but the gap can in principle be vector-sized or larger,
>> >> >> at least for load-lanes.
>> >> >>
>> >> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
>> >> >> and a gap of 2 or 3.  Even if the access itself is aligned, the group
>> >> >> spans two vectors and we have no guarantee that the second one
>> >> >> is mapped.
>> >> >
>> >> > The check assumes that if aligned_access_p () returns true then the
>> >> > whole access is aligned in a way that it can't cross page boundaries.
>> >> > That's of course not the case if alignment is 16 bytes but the access
>> >> > will be a multiple of that.
>> >> >  
>> >> >> I haven't been able to come up with a testcase though.  We seem to be
>> >> >> overly conservative when computing alignments.
>> >> >
>> >> > Not sure if we can run into this with load-lanes given that bumps the
>> >> > vectorization factor.  Also does load-lane work with gaps?
>> >> >
>> >> > I think that gap can never be larger than nunits-1 so it is by 
>> >> > definition
>> >> > in the last "vector" independent of the VF.
>> >> >
>> >> > Classical gap case is
>> >> >
>> >> > for (i=0; i<n; ++i)
>> >> >  {
>> >> >y[3*i + 0] = x[4*i + 0];
>> >> >y[3*i + 1] = x[4*i + 1];
>> >> >y[3*i + 2] = x[4*i + 2];
>> >> >  }
>> >> >
>> >> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
>> >> > the y's different streams and you should get the perfect case for
>> >> > load-lane:
>> >> >
>> >> > for (i=0; i<n; ++i)
>> >> >  {
>> >> >y[i] = x[4*i + 0];
>> >> >z[i] = x[4*i + 1];
>> >> >w[i] = x[4*i + 2];
>> >> >  } 
>> >> >
>> >> > previously we'd peel at least 4 iterations into the epilogue for
>> >> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
>> >> > ok.
>> >> 
>> >> The case I was thinking of was like the second, but with the
>> >> element type being DI or DF and with the + 2 statement removed.
>> >> E.g.:
>> >> 
>> >> double __attribute__((noinline))
>> >> foo (double *a)
>> >> {
>> >>   double res = 0.0;
>> >>   for (int n = 0; n < 256; n += 4)
>> >> res += a[n] + a[n + 1];
>> >>   return res;
>> >> }
>> >> 
>> >> (with -ffast-math).  We do use LD4 for this, and having "a" aligned
>> >> to V2DF isn't enough to guarantee that we can access a[n + 2]
>> >> and a[n + 3].
>> >
>> > Yes, indeed.  It's safe when peeling for gaps would remove
>> > N < alignof (ref) / sizeof (ref) scalar iterations.
>> >
>> > Peeling for gaps simply subtracts one from the niter of the vectorized 
>> > loop.
>> 
>> I think subtracting one is enough in all cases.  It's only the final
>> iteration of the scalar loop that can't access a[n + 2] and a[n + 3].
>> 
>> (Of course, subtracting one happens before peeling for niters, so it
>> only makes a difference if the original niters was a multiple of the VF,
>> in which case we peel a full vector'

New Spanish PO file for 'gcc' (version 7.1-b20170226)

2017-03-01 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Spanish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/es.po

(This file, 'gcc-7.1-b20170226.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[patch, fortran] Enable FMA for AVX2 and AVX512F for matmul

2017-03-01 Thread Thomas Koenig

Hello world,

the attached patch enables FMA for the AVX2 and AVX512F variants of
matmul.  This should bring a very nice speedup (although I have
been unable to run benchmarks due to lack of a suitable machine).

Question: Is this still appropriate for the current state of trunk?
Or rather, OK for when gcc 8 opens (which might still be some time
in the future)?

2017-03-01  Thomas Koenig  

PR fortran/78379
* m4/matmul.m4: (matmul_'rtype_code`_avx2): Also generate for
reals.  Add fma to target options.
(matmul_'rtype_code`_avx512f): Add fma to target options.
(matmul_'rtype_code`):  Call AVX2 and AVX512F only if
FMA is available.
* generated/matmul_c10.c: Regenerated.
* generated/matmul_c16.c: Regenerated.
* generated/matmul_c4.c: Regenerated.
* generated/matmul_c8.c: Regenerated.
* generated/matmul_i1.c: Regenerated.
* generated/matmul_i16.c: Regenerated.
* generated/matmul_i2.c: Regenerated.
* generated/matmul_i4.c: Regenerated.
* generated/matmul_i8.c: Regenerated.
* generated/matmul_r10.c: Regenerated.
* generated/matmul_r16.c: Regenerated.
* generated/matmul_r4.c: Regenerated.
* generated/matmul_r8.c: Regenerated.

Regards

Thomas
Index: m4/matmul.m4
===
--- m4/matmul.m4	(Revision 245760)
+++ m4/matmul.m4	(Arbeitskopie)
@@ -75,14 +75,6 @@
 	int blas_limit, blas_call gemm);
 export_proto(matmul_'rtype_code`);
 
-'ifelse(rtype_letter,`r',dnl
-`#if defined(HAVE_AVX) && defined(HAVE_AVX2)
-/* REAL types generate identical code for AVX and AVX2.  Only generate
-   an AVX2 function if we are dealing with integer.  */
-#undef HAVE_AVX2
-#endif')
-`
-
 /* Put exhaustive list of possible architectures here here, ORed together.  */
 
 #if defined(HAVE_AVX) || defined(HAVE_AVX2) || defined(HAVE_AVX512F)
@@ -101,7 +93,7 @@
 `static void
 'matmul_name` ('rtype` * const restrict retarray, 
 	'rtype` * const restrict a, 'rtype` * const restrict b, int try_blas,
-	int blas_limit, blas_call gemm) __attribute__((__target__("avx2")));
+	int blas_limit, blas_call gemm) __attribute__((__target__("avx2,fma")));
 static' include(matmul_internal.m4)dnl
 `#endif /* HAVE_AVX2 */
 
@@ -110,7 +102,7 @@
 `static void
 'matmul_name` ('rtype` * const restrict retarray, 
 	'rtype` * const restrict a, 'rtype` * const restrict b, int try_blas,
-	int blas_limit, blas_call gemm) __attribute__((__target__("avx512f")));
+	int blas_limit, blas_call gemm) __attribute__((__target__("avx512f,fma")));
 static' include(matmul_internal.m4)dnl
 `#endif  /* HAVE_AVX512F */
 
@@ -138,7 +130,9 @@
 	{
   /* Run down the available processors in order of preference.  */
 #ifdef HAVE_AVX512F
-  	  if (__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX512F))
+  	  if ((__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX512F))
+	  && (__cpu_model.__cpu_features[0] & (1 << FEATURE_FMA)))
+
 	{
 	  matmul_p = matmul_'rtype_code`_avx512f;
 	  goto tailcall;
@@ -147,7 +141,8 @@
 #endif  /* HAVE_AVX512F */
 
 #ifdef HAVE_AVX2
-  	  if (__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX2))
+  	  if ((__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX2))
+	 && (__cpu_model.__cpu_features[0] & (1 << FEATURE_FMA)))
 	{
 	  matmul_p = matmul_'rtype_code`_avx2;
 	  goto tailcall;
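
The dispatch order after this change can be sketched in isolation as below.  This is a simplified model rather than the generated libgfortran code: the Feature bits, the Kernel enum and select_kernel are hypothetical, and the fallback to a plain AVX variant is an assumption that is not visible in the hunks above.

```cpp
#include <cassert>

// Hypothetical feature bits; the real code tests
// __cpu_model.__cpu_features[0] against FEATURE_* constants.
enum Feature : unsigned {
  kAvx = 1u << 0,
  kAvx2 = 1u << 1,
  kAvx512f = 1u << 2,
  kFma = 1u << 3,
};

enum class Kernel { Vanilla, Avx, Avx2Fma, Avx512fFma };

// Pick the most capable variant, requiring FMA alongside AVX2/AVX512F.
inline Kernel select_kernel(unsigned features) {
  const bool fma = (features & kFma) != 0;
  if ((features & kAvx512f) && fma)
    return Kernel::Avx512fFma;
  if ((features & kAvx2) && fma)
    return Kernel::Avx2Fma;
  if (features & kAvx)
    return Kernel::Avx;  // assumed fallback, not shown in the hunks above
  return Kernel::Vanilla;
}
```

The point of the sketch is that a CPU reporting AVX2 without FMA no longer selects the AVX2 variant, since that variant is now compiled with the fma target option.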


Re: [PATCH, rs6000] Document default code model for 64-bit Linux

2017-03-01 Thread Segher Boessenkool
On Wed, Mar 01, 2017 at 12:45:41PM -0600, Bill Schmidt wrote:
> The PowerPC documentation doesn't currently identify the default code model.
> This is rather complicated due to all the various subtargets, but it is
> valuable to at least document the common case for 64-bit Linux.
> 
> Verified on powerpc64le-unknown-linux-gnu.  Ok for trunk?

Sure.  Thanks!


Segher


[PATCH] handling address mode changes inside extract_bit_field

2017-03-01 Thread Jim Wilson
This is a proposed patch for the bug 79794 which I just submitted.
This isn't a regression, so this can wait for after the gcc 7 branch
if necessary.

The problem here is that a reg+offset MEM target is passed to
extract_bit_field with a vector register source.  On aarch64, we have
an instruction for this, but it accepts a reg address only, so the
address gets loaded into a reg inside extract_bit_field.  We then
return to expand_expr which does
  ! rtx_equal_p (temp, target)
which fails because of the address mode change, so we end up copying
target into a reg and then back to itself.

expand_expr has a solution for this problem.  There is an alt_rtl
variable that can be set when temp is logically the same as target.
This variable is currently not passed into extract_bit_field.  This
patch does that.

There is an additional complication that the actual address load into
a reg occurs inside maybe_expand_insn, and it doesn't seem reasonable
to pass alt_rtl into that.  However, I can grab a bit from the
expand_operand structure to indicate when an operand is the target,
and then clear it if target is replaced with a reg.

The resulting patch works, but ends up a bit more invasive than I
hoped.  The patch has passed a bootstrap and make check test on x86_64
and aarch64.
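
The flag-passing scheme described above can be modelled in a few lines.  This is only a sketch of the control flow: Operand, Outcome, legitimize and extract are hypothetical stand-ins, and plain integers play the role of rtx values.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are not the GCC expand_operand/rtx types.
struct Operand {
  int value;    // which "rtx" the operand currently refers to
  bool target;  // does it still denote the caller's original target?
};

enum class Outcome { Kept, AddressReloaded, ReplacedByReg };

// Stand-in for operand legitimization inside the insn expander.
inline void legitimize(Operand &op, Outcome outcome) {
  switch (outcome) {
    case Outcome::Kept:
      break;
    case Outcome::AddressReloaded:
      op.value += 1;  // a different rtx, but logically the same location
      break;
    case Outcome::ReplacedByReg:
      op.value = -1;  // a fresh register: no longer the target
      op.target = false;
      break;
  }
}

// Stand-in for the extraction step: report the original target through
// alt_rtl only if the operand was not swapped for a register.
inline int extract(int target, Outcome outcome, int *alt_rtl) {
  Operand op{target, true};
  legitimize(op, outcome);
  if (alt_rtl && op.target)
    *alt_rtl = target;
  return op.value;
}
```

In the AddressReloaded case the returned value differs from the target, yet alt_rtl still names the target, so a caller can tell that no copy back is needed.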

Jim
Proposed patch for RTL expand bug affecting aarch64 vector code.

	PR middle-end/79794
	* expmed.c (extract_bit_field_1): Add alt_rtl argument.  Before
	maybe_expand_insn call, set ops[0].target.  If still set after call,
	set alt_rtl.  Add extra arg to recursive calls.
	(extract_bit_field): Add alt_rtl argument.  Pass to
	extract_bit_field.
	* expmed.h (extract_bit_field): Fix prototype.
	* expr.c (emit_group_load_1, copy_blkmode_from_reg)
	(copy_blkmode_to_reg, read_complex_part, store_field): Pass extra NULL
	to extract_bit_field calls.
	(expand_expr_real_1): Pass alt_rtl to expand_expr_real instead of 0.
	Pass alt_rtl to extract_bit_field calls.
	* calls.c (store_unaligned_arguments_into_pseudos)
	(load_register_parameters): Pass extra NULL to extract_bit_field calls.
	* optabs.c (maybe_legitimize_operand): Clear op->target when calling
	gen_reg_rtx.
	* optabs.h (struct expand_operand): Add target bitfield.

Index: gcc/calls.c
===
--- gcc/calls.c	(revision 245764)
+++ gcc/calls.c	(working copy)
@@ -1161,7 +1161,7 @@ store_unaligned_arguments_into_pseudos (struct arg
 
 	args[i].aligned_regs[j] = reg;
 	word = extract_bit_field (word, bitsize, 0, 1, NULL_RTX,
-  word_mode, word_mode, false);
+  word_mode, word_mode, false, NULL);
 
 	/* There is no need to restrict this code to loading items
 	   in TYPE_ALIGN sized hunks.  The bitfield instructions can
@@ -2554,7 +2554,8 @@ load_register_parameters (struct arg_data *args, i
 		  unsigned int bitoff = (nregs - 1) * BITS_PER_WORD;
 		  unsigned int bitsize = size * BITS_PER_UNIT - bitoff;
 		  rtx x = extract_bit_field (mem, bitsize, bitoff, 1, dest,
-	 word_mode, word_mode, false);
+	 word_mode, word_mode, false,
+	 NULL);
 		  if (BYTES_BIG_ENDIAN)
 		x = expand_shift (LSHIFT_EXPR, word_mode, x,
   BITS_PER_WORD - bitsize, dest, 1);
Index: gcc/expmed.c
===
--- gcc/expmed.c	(revision 245764)
+++ gcc/expmed.c	(working copy)
@@ -1528,7 +1528,7 @@ static rtx
 extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		 unsigned HOST_WIDE_INT bitnum, int unsignedp, rtx target,
 		 machine_mode mode, machine_mode tmode,
-		 bool reverse, bool fallback_p)
+		 bool reverse, bool fallback_p, rtx *alt_rtl)
 {
   rtx op0 = str_rtx;
   machine_mode int_mode;
@@ -1604,10 +1604,13 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WI
   unsigned HOST_WIDE_INT pos = bitnum / GET_MODE_BITSIZE (innermode);
 
   create_output_operand (&ops[0], target, innermode);
+  ops[0].target = 1;
   create_input_operand (&ops[1], op0, outermode);
   create_integer_operand (&ops[2], pos);
   if (maybe_expand_insn (icode, 3, ops))
 	{
+	  if (alt_rtl && ops[0].target)
+	*alt_rtl = target;
 	  target = ops[0].value;
   	  if (GET_MODE (target) != mode)
 	return gen_lowpart (tmode, target);
@@ -1729,7 +1732,7 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WI
 	= extract_bit_field_1 (op0, MIN (BITS_PER_WORD,
 	 bitsize - i * BITS_PER_WORD),
    bitnum + bit_offset, 1, target_part,
-   mode, word_mode, reverse, fallback_p);
+   mode, word_mode, reverse, fallback_p, NULL);
 
 	  gcc_assert (target_part);
 	  if (!result_part)
@@ -1832,7 +1835,7 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WI
 	  xop0 = copy_to_reg (xop0);
 	  rtx result = extract_bit_field_1 (xop0, bitsize, bitpos,
 	unsignedp, target,
-	mode, tmode, reverse, false);
+	mode, tmode, reverse, false, NULL);
 	  if (result)
 	return result;
 

[PATCH] free MPFR caches in gimple-ssa-sprintf.c (PR 79699)

2017-03-01 Thread Martin Sebor

The uses of MPFR in gimple-ssa-sprintf.c apparently cause
the library to allocate some internal caches that it then leaks
on program exit, causing Valgrind memory leak errors.  The MPFR
manual "strongly advises to [call mpfr_free_cache] before
terminating a thread, or before exiting when using tools like
'valgrind' (to avoid memory leaks being reported)" so the attached
patch does just that.

It seems like an obvious fix that could presumably be committed
without a review or approval but I'd like to give others a chance
to comment on the placement of the call and whether it should be
guarded by ENABLE_VALGRIND_ANNOTATIONS.

Joseph, since you commented on the bug, do you have a suggestion
for a different site for it or a guard?  The only other call to
the function is in the Fortran FE and it's neither guarded nor
does it appear to ever be called.

Thanks
Martin
PR tree-optimization/79699 - small memory leak in MPFR

gcc/ChangeLog:

	PR tree-optimization/79699
	* gimple-ssa-sprintf.c (pass_sprintf_length::execute): Free MPFR
	caches to avoid a memory leak on program exit.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 7688439..0c00fa0 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -3612,6 +3612,8 @@ pass_sprintf_length::execute (function *fun)
   /* Clean up object size info.  */
   fini_object_sizes ();
 
+  /* Clean up MPFR caches (see bug 79699).  */
+  mpfr_free_cache ();
   return 0;
 }
 


Re: [PATCH] avoid using upper bound of width and precision in -Wformat-overflow (PR 79692)

2017-03-01 Thread Martin Sebor

So in some cases you use

+  /* For an indeterminate precision the lower bound must be assumed
+ to be zero.  */
+  if (prec[0] < 0 && prec[1] >= 0)

Note prec[1] >= 0

In other cases you have:

+/* The minimum output with unknown precision is a single byte
+   (e.g., "0") but the more likely output is 3 bytes ("0.0").  */
+if (dir.prec[0] < 0 && dir.prec[1] > 0)

Note  dir.prec[1] > 0

Shouldn't one of those be changed to be consistent with the other?


Thanks for the careful review!  The two tests determine two different
things so I'm not sure they need to be consistent. But considering
your question made me realize that the first conditional isn't
completely correct:

+  /* For an indeterminate precision the lower bound must be assumed
+ to be zero.  */
+  if (prec[0] < 0 && prec[1] >= 0)
+prec[0] = 0;

Precisions in a negative-positive range with an upper bound of less
than 6 must be assumed to have an upper bound of 6 because that's
the default when precision is negative.  E.g., given
snprintf (0, 0, "%.*f", p, 1.23456789) where p is in [-1, 0] results
either in:

  1.234568

when (p == -1) holds, or in:

  1

when (p == 0) holds.  So while the lower bound in the if statement
above must be set to zero, the upper bound may need to be adjusted
as well.
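
The two outputs above are easy to reproduce with a small helper.  The sketch below passes the precision through ".*" so that the int argument is taken as a precision; format_with_precision is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format 1.23456789 with a run-time precision.  A negative precision
// argument is treated as if no precision had been given at all, so the
// default of 6 digits applies.
inline std::string format_with_precision(int precision) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "%.*f", precision, 1.23456789);
  return buf;
}
```

With a precision of -1 the result is the 8-byte string "1.234568", while a precision of 0 yields the single byte "1", so a range of [-1, 0] cannot simply be treated as having an upper bound of 0.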

The patch I just committed in r245822 fixes that (and also changes
the conditional so that consistency is no longer an issue).

However, while reviewing the rest of the floating point handling
code it became clear that this is a bug that affects both floating
point formatting functions (i.e., the one that handles constants
as well as the non-constant one).  I also noticed some other minor
glitches in this area that should be fixed.  I'm testing another
patch that resolves those problems as well.


Similarly in known_width_and_precision.  Please review the patch to
ensure that we're as consistent as possible for these tests.


In known_width_and_precision the different inequalities are
deliberate.  Because the actual width is an absolute value
of the specified argument the range of unknown width is
[0, INT_MAX + 1] (printf("%*i", INT_MIN, i) specifies a width
of -INT_MIN, or INT_MAX + 1 (without overflow)).  But because
negative precisions are ignored, the largest upper bound is
INT_MAX.
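
The treatment of a negative width argument can be seen with small values as well.  A minimal sketch, where format_with_width is a hypothetical helper: a negative width is taken as the '-' flag followed by the corresponding positive width.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format VALUE with a run-time field width.  A negative width argument
// means "left-justify in a field of the absolute value of the width".
inline std::string format_with_width(int width, int value) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "%*i", width, value);
  return buf;
}
```

Both 5 and -5 produce five bytes of output and differ only in where the padding goes, which is why the output size depends on the absolute value of the width argument.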

Martin



Re: [PATCH] free MPFR caches in gimple-ssa-sprintf.c (PR 79699)

2017-03-01 Thread Joseph Myers
On Wed, 1 Mar 2017, Martin Sebor wrote:

> Joseph, since you commented on the bug, do you have a suggestion
> for a different site for it or a guard?  The only other call to
> the function is in the Fortran FE and it's neither guarded nor
> does it appear to ever be called.

I don't think a guard is needed.  Arguably it should be called from an 
atexit handler, but since we don't have such a handler calling it from the 
relevant pass seems reasonable (and I'm not sure what the right way to 
handle such freeing of memory in the JIT context is).
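
For comparison, the atexit alternative mentioned above might look roughly like this.  It is a sketch with a made-up cache rather than MPFR: constant_cache, free_constant_cache and register_cache_cleanup are hypothetical names, and only std::atexit is a real library call.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical library-internal cache standing in for MPFR's caches.
inline std::vector<double> &constant_cache() {
  static std::vector<double> cache;
  return cache;
}

// Hypothetical analogue of a "free cache" entry point.
inline void free_constant_cache() {
  constant_cache().clear();
  constant_cache().shrink_to_fit();
}

// Register the cleanup to run at normal program exit; true on success.
inline bool register_cache_cleanup() {
  // Construct the cache first so that it outlives the exit handler.
  constant_cache();
  return std::atexit(free_constant_cache) == 0;
}
```

Calling the cleanup directly from the code that used the cache, as the patch does, avoids having to reason about exit-handler ordering at all.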

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C++ PATCH] -Wunused-but-set-parameter fix followup (PR c++/79782)

2017-03-01 Thread Nathan Sidwell

On 03/01/2017 09:40 AM, Jakub Jelinek wrote:


Unfortunately my patch apparently broke the case where such ctor in virtual
class has no arguments (void_type_node is used in that case instead of a
TREE_LIST, it is a little bit weird (I'd have expected perhaps
void_list_node instead), but it is what it does).


Seems funky (but not your problem), which of the testcase(s) does it correspond 
to?


So this patch in addition to not walking anything for
arguments == void_type_node
just walks the arguments and calls mark_exp_read on all PARM_DECLs in there
(I think that is all we care about, we can't have there VAR_DECLs or
RESULT_DECLs).


I suppose someone could pass in a global VAR_DECL, but if the ctor's the only 
use of that decl, it's rather stupid.


Do you actually need to iterate over the arg list -- can't you just pass 
ARGUMENTS straight into cp_walk_tree?


Ok with or without that change.

nathan

--
Nathan Sidwell


Re: C++ PATCH for C++17 class template argument deduction issues

2017-03-01 Thread Jason Merrill
On Tue, Feb 28, 2017 at 1:56 PM, Jason Merrill  wrote:
> This patch implements some proposed resolutions to open issues with
> C++17 class template argument deduction.

And some more:
commit 41e5f38da5699736eb02a5b9c65549799c288714
Author: Jason Merrill 
Date:   Wed Mar 1 13:15:11 2017 -1000

Class template argument deduction in new-expression
* init.c (build_new): Handle deduction from no initializer.
* parser.c (cp_parser_new_expression): Don't require a single
expression for class template deduction.
* typeck2.c (cxx_incomplete_type_diagnostic): Fix diagnostic for
class template placeholder.
* pt.c (tsubst_copy) [TEMPLATE_DECL]: Handle dependent context.
(tsubst_copy_and_build) [TEMPLATE_ID_EXPR]: Handle SCOPE_REF.
(redeclare_class_template): Set TEMPLATE_TYPE_PARM_FOR_CLASS.
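
As a small user-level illustration of what deduction in a new-expression looks like (a sketch assuming a C++17 compiler; Holder, make_holder and make_pair_ptr are hypothetical names, not part of the patch or its testsuite):

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical class template used only for this illustration.
template <typename T>
struct Holder {
  T value;
  explicit Holder(T v) : value(v) {}
};

// The template arguments are deduced from the new-initializer (C++17).
inline Holder<int> *make_holder() { return new Holder(42); }

inline std::pair<int, double> *make_pair_ptr() {
  return new std::pair(1, 2.5);
}
```

In both functions the allocated type is written without template arguments and the compiler deduces Holder<int> and std::pair<int, double> respectively.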

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 7ded37e..191fe13 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3478,15 +3478,19 @@ build_new (vec **placement, tree type, 
tree nelts,
   if (type == error_mark_node)
 return error_mark_node;
 
-  if (nelts == NULL_TREE && vec_safe_length (*init) == 1
+  if (nelts == NULL_TREE
   /* Don't do auto deduction where it might affect mangling.  */
   && (!processing_template_decl || at_function_scope_p ()))
 {
   tree auto_node = type_uses_auto (type);
   if (auto_node)
{
- tree d_init = (**init)[0];
- d_init = resolve_nondeduced_context (d_init, complain);
+ tree d_init = NULL_TREE;
+ if (vec_safe_length (*init) == 1)
+   {
+ d_init = (**init)[0];
+ d_init = resolve_nondeduced_context (d_init, complain);
+   }
  type = do_auto_deduction (type, d_init, auto_node);
}
 }
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 50528e2..e684870 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -8228,7 +8228,8 @@ cp_parser_new_expression (cp_parser* parser)
  contain a new-initializer of the form ( assignment-expression )".
  Additionally, consistently with the spirit of DR 1467, we want to accept
  'new auto { 2 }' too.  */
-  else if (type_uses_auto (type)
+  else if ((ret = type_uses_auto (type))
+  && !CLASS_PLACEHOLDER_TEMPLATE (ret)
   && (vec_safe_length (initializer) != 1
   || (BRACE_ENCLOSED_INITIALIZER_P ((*initializer)[0])
   && CONSTRUCTOR_NELTS ((*initializer)[0]) != 1)))
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index ec9d53a..8144ca6 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -5732,6 +5732,9 @@ redeclare_class_template (tree type, tree parms, tree 
cons)
  gcc_assert (DECL_CONTEXT (parm) == NULL_TREE);
  DECL_CONTEXT (parm) = tmpl;
}
+
+  if (TREE_CODE (parm) == TYPE_DECL)
+   TEMPLATE_TYPE_PARM_FOR_CLASS (TREE_TYPE (parm)) = true;
 }
 
   // Cannot redeclare a class template with a different set of constraints.
@@ -14638,6 +14641,15 @@ tsubst_copy (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
 have to substitute this with one having context `D'.  */
 
  tree context = tsubst (DECL_CONTEXT (t), args, complain, in_decl);
+ if (dependent_scope_p (context))
+   {
+ /* When rewriting a constructor into a deduction guide, a
+non-dependent name can become dependent, so memtmpl
+becomes context::template memtmpl.  */
+ tree type = tsubst (TREE_TYPE (t), args, complain, in_decl);
+ return build_qualified_name (type, context, DECL_NAME (t),
+  /*template*/true);
+   }
  return lookup_field (context, DECL_NAME(t), 0, false);
}
   else
@@ -16621,6 +16633,14 @@ tsubst_copy_and_build (tree t,
if (targs == error_mark_node)
  return error_mark_node;
 
+   if (TREE_CODE (templ) == SCOPE_REF)
+ {
+   tree name = TREE_OPERAND (templ, 1);
+   tree tid = lookup_template_function (name, targs);
+   TREE_OPERAND (templ, 1) = tid;
+   return templ;
+ }
+
if (variable_template_p (templ))
  RETURN (lookup_and_finish_template_variable (templ, targs, complain));
 
@@ -25144,7 +25164,7 @@ do_class_deduction (tree ptype, tree tmpl, tree init, 
int flags,
   type = TREE_TYPE (most_general_template (tmpl));
 }
 
-  bool saw_default = false;
+  bool saw_ctor = false;
   bool saw_copy = false;
   if (CLASSTYPE_METHOD_VEC (type))
 // FIXME cache artificial deduction guides
@@ -25154,9 +25174,9 @@ do_class_deduction (tree ptype, tree tmpl, tree init, 
int flags,
tree guide = build_deduction_guide (fn, outer_args, complain);
cands = ovl_cons (guide, cands);
 
+   saw_ctor = true;
+
tree parms = FUNCTION_FIRST_USER_PARMTYPE (fn);
-   if (sufficient_parms_p (parms))
- saw_def

Re: [patch, fortran] Enable FMA for AVX2 and AVX512F for matmul

2017-03-01 Thread Jerry DeLisle

On 03/01/2017 01:00 PM, Thomas Koenig wrote:

Hello world,

the attached patch enables FMA for the AVX2 and AVX512F variants of
matmul.  This should bring a very nice speedup (although I have
been unable to run benchmarks due to lack of a suitable machine).

Question: Is this still appropriate for the current state of trunk?
Or rather, OK for when gcc 8 opens (which might still be some time
in the future)?


I think it may be appropriate now because you are making an adjustment to a 
feature that was only just added.


I would prefer that it was tested on the actual expected platform. Does anyone 
anywhere on this list have access to one of these machines to test?


Jerry




[PATCH] PR 79798 Fix incorrect use of std::result_of in std::bind

2017-03-01 Thread Jonathan Wakely
Another case of problems caused by incorrect use of result_of. Because
functions can't have top-level const on parameters, result_of<F(const T)>
is result_of<F(T)>, so it doesn't give you the answer to the question
you meant to ask.
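
The effect can be reproduced without std::bind.  The sketch below uses call_result, a hypothetical and much simplified stand-in for std::result_of, together with made-up types X and Y whose overloads return different types so that the selected overload is visible in the result type.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// A function type drops top-level const on its parameters.
static_assert(std::is_same<void(const int), void(int)>::value, "same type");

// Hypothetical, simplified stand-in for std::result_of.
template <typename Sig>
struct call_result;

template <typename F, typename... Args>
struct call_result<F(Args...)> {
  using type = decltype(std::declval<F>()(std::declval<Args>()...));
};

struct X {};

struct Y {
  int operator()(X &&);         // chosen for a non-const rvalue
  long operator()(const X &&);  // chosen for a const rvalue
};

// "Y&(const X)" is really "Y&(X)", so the const is lost and we get int.
using lost_const = call_result<Y &(const X)>::type;

// Spelling the argument as a reference keeps the const, so we get long.
using kept_const = call_result<Y &(const X &&)>::type;
```

Adding && to the argument types, as the patch does, turns the top-level const into a const on the referred-to type, where the function-type adjustment no longer removes it.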

PR libstdc++/79798
* include/std/functional (bind::_Res_type_impl): Fix incorrect use of
result_of that loses top-level cv-qualifiers.
* testsuite/20_util/bind/79798.cc: New test.


Tested powerpc64le-linux, committed to trunk.
commit d2d652a7e40288a89259bae17eddec9cde0e177c
Author: Jonathan Wakely 
Date:   Thu Mar 2 01:33:10 2017 +

PR 79798 Fix incorrect use of std::result_of in std::bind

PR libstdc++/79798
* include/std/functional (bind::_Res_type_impl): Fix incorrect use of
result_of that loses top-level cv-qualifiers.
* testsuite/20_util/bind/79798.cc: New test.

diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional
index 4f3d8b3..ea36dd0 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -502,7 +502,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
 
   template<typename _Fn, typename _CallArgs, typename... _BArgs>
using _Res_type_impl
- = typename result_of< _Fn&(_Mu_type<_BArgs, _CallArgs>...) >::type;
+ = typename result_of< _Fn&(_Mu_type<_BArgs, _CallArgs>&&...) >::type;
 
   template<typename _CallArgs>
using _Res_type = _Res_type_impl<_Functor, _CallArgs, _Bound_args...>;
diff --git a/libstdc++-v3/testsuite/20_util/bind/79798.cc b/libstdc++-v3/testsuite/20_util/bind/79798.cc
new file mode 100644
index 0000000..9780ff4
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/bind/79798.cc
@@ -0,0 +1,33 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do compile { target c++11 } }
+
+#include <functional>
+
+// PR libstdc++/79798
+
+struct X { };
+const X f(int);
+
+struct Y {
+  void operator()(X&&) = delete;
+  int operator()(const X&&);
+};
+
+auto b = std::bind(Y(), std::bind(f, std::placeholders::_1));
+auto i = b(1);


[PATCH] PR libstdc++/79789 fix non-reserved names in headers

2017-03-01 Thread Jonathan Wakely
Some of these are years old, most are more recent. The new testcase
should help prevent us trying to use these names again.

PR libstdc++/79789
* include/bits/hashtable_policy.h (__clp2): Use reserved names for
parameters and local variables.
* include/bits/ios_base.h (make_error_code, make_error_condition):
Likewise.
* include/bits/list.tcc (list::sort): Likewise.
* include/bits/mask_array.h (mask_array): Likewise.
* include/bits/regex.h (regex_token_iterator): Likewise.
* include/bits/slice_array.h (slice_array): Likewise.
* include/bits/stl_algo.h (__sample): Likewise.
* include/std/memory (undeclare_no_pointers): Likewise.
* include/std/type_traits (is_callable_v, is_nothrow_callable_v):
Likewise.
* libsupc++/exception_ptr.h (__dest_thunk): Likewise.
* testsuite/17_intro/headers/names.cc: New test.

Tested powerpc64le-linux, committed to trunk.

I'll backport pieces of this as appropriate to the branches.
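
For readers who don't know the __clp2 helper touched in the
hashtable_policy.h hunk: it rounds a size up to the closest power of
two by smearing the highest set bit into all lower positions. The
sketch below is a standalone illustration, not the library code. The
name round_up_pow2 and the fixed 64-bit width are assumptions, and it
deliberately uses ordinary identifiers, which is fine in user code but
not in a library header, where a user macro named n or x would break
it.

```cpp
#include <cassert>
#include <cstdint>

// Round n up to the closest power of two; a power of two maps to itself.
// Hypothetical standalone helper, fixed at 64 bits for simplicity.
inline std::uint64_t round_up_pow2(std::uint64_t n)
{
  std::uint64_t x = n - 1;  // so that exact powers of two are kept
  x |= x >> 1;              // copy the top set bit into the bit below it
  x |= x >> 2;              // now the top 2 bits are set, copy them down
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  x |= x >> 32;             // every bit below the top set bit is now 1
  return x + 1;             // all-ones plus one is the next power of two
}

// Small self-check with hand-computed values.
inline void round_up_pow2_examples()
{
  assert(round_up_pow2(1) == 1);
  assert(round_up_pow2(5) == 8);
  assert(round_up_pow2(1024) == 1024);
  assert(round_up_pow2(1025) == 2048);
}
```

Note that an input of 0 wraps around and yields 0 in this sketch, since
unsigned arithmetic is modular.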
commit 4831cf64a6d7a5ee155b88428768067f481f994b
Author: Jonathan Wakely 
Date:   Thu Mar 2 02:55:41 2017 +

PR libstdc++/79789 fix non-reserved names in headers

PR libstdc++/79789
* include/bits/hashtable_policy.h (__clp2): Use reserved names for
parameters and local variables.
* include/bits/ios_base.h (make_error_code, make_error_condition):
Likewise.
* include/bits/list.tcc (list::sort): Likewise.
* include/bits/mask_array.h (mask_array): Likewise.
* include/bits/regex.h (regex_token_iterator): Likewise.
* include/bits/slice_array.h (slice_array): Likewise.
* include/bits/stl_algo.h (__sample): Likewise.
* include/std/memory (undeclare_no_pointers): Likewise.
* include/std/type_traits (is_callable_v, is_nothrow_callable_v):
Likewise.
* libsupc++/exception_ptr.h (__dest_thunk): Likewise.
* testsuite/17_intro/headers/names.cc: New test.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index cfa27a3..8af8c49 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -521,24 +521,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Compute closest power of 2.
   _GLIBCXX14_CONSTEXPR
   inline std::size_t
-  __clp2(std::size_t n) noexcept
+  __clp2(std::size_t __n) noexcept
   {
 #if __SIZEOF_SIZE_T__ >= 8
-std::uint_fast64_t x = n;
+std::uint_fast64_t __x = __n;
 #else
-std::uint_fast32_t x = n;
+std::uint_fast32_t __x = __n;
 #endif
 // Algorithm from Hacker's Delight, Figure 3-3.
-x = x - 1;
-x = x | (x >> 1);
-x = x | (x >> 2);
-x = x | (x >> 4);
-x = x | (x >> 8);
-x = x | (x >>16);
+__x = __x - 1;
+__x = __x | (__x >> 1);
+__x = __x | (__x >> 2);
+__x = __x | (__x >> 4);
+__x = __x | (__x >> 8);
+__x = __x | (__x >>16);
 #if __SIZEOF_SIZE_T__ >= 8
-x = x | (x >>32);
+__x = __x | (__x >>32);
 #endif
-return x + 1;
+return __x + 1;
   }
 
   /// Rehash policy providing power of 2 bucket numbers. Avoids modulo
diff --git a/libstdc++-v3/include/bits/ios_base.h b/libstdc++-v3/include/bits/ios_base.h
index 965ec8a..f1ebfcc 100644
--- a/libstdc++-v3/include/bits/ios_base.h
+++ b/libstdc++-v3/include/bits/ios_base.h
@@ -207,12 +207,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   const error_category& iostream_category() noexcept;
 
   inline error_code
-  make_error_code(io_errc e) noexcept
-  { return error_code(static_cast<int>(e), iostream_category()); }
+  make_error_code(io_errc __e) noexcept
+  { return error_code(static_cast<int>(__e), iostream_category()); }
 
   inline error_condition
-  make_error_condition(io_errc e) noexcept
-  { return error_condition(static_cast<int>(e), iostream_category()); }
+  make_error_condition(io_errc __e) noexcept
+  { return error_condition(static_cast<int>(__e), iostream_category()); }
 #endif
 
   // 27.4.2  Class ios_base
diff --git a/libstdc++-v3/include/bits/list.tcc b/libstdc++-v3/include/bits/list.tcc
index d80d569..9623a13 100644
--- a/libstdc++-v3/include/bits/list.tcc
+++ b/libstdc++-v3/include/bits/list.tcc
@@ -500,8 +500,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
__catch(...)
  {
this->splice(this->end(), __carry);
-   for (int i = 0; i < sizeof(__tmp)/sizeof(__tmp[0]); ++i)
- this->splice(this->end(), __tmp[i]);
+   for (int __i = 0; __i < sizeof(__tmp)/sizeof(__tmp[0]); ++__i)
+ this->splice(this->end(), __tmp[__i]);
__throw_exception_again;
  }
   }
@@ -586,8 +586,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
__catch(...)
  {
this->splice(this->end(), __carry);
-   for (int i = 0; i < sizeof(__tmp)/sizeof(__tmp[0]); ++i)
- this->splice(this->end(), __tmp[i]);
+   for (int __i = 0; 

Re: Disable some concept checks in C++11

2017-03-01 Thread Jonathan Wakely
On 27 February 2017 at 20:11, François Dumont wrote:
> Hi
>
>
> I had some problems testing the pretty printers with concept checks
> activated. I noticed that std::deque already has the
> _SGIAssignableConcept check disabled when using C++11, so I propose to
> generalize this to all usages of this concept check.

The use in  seems to still be valid, that type wasn't
updated for C++11 and move semantics, so the original concept checks
still apply, don't they?

I think all the other changes to remove Assignable in C++11 are
correct, and I see no problem making this change now, because it only
relaxes some checks that are not used by most people and not used in
the default configuration.


Re: [patch, fortran] Enable FMA for AVX2 and AVX512F for matmul

2017-03-01 Thread Thomas Koenig

Hi Jerry,


> I would prefer that it was tested on the actual expected platform. Does
> anyone anywhere on this list have access to one of these machines to test?


If anybody wants to test who does not have --enable-maintainer-mode
activated, here is a patch that works "out of the box".
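
For anyone skimming the patch: the dispatch now requires both the
vector extension and FMA before picking a variant. The sketch below
shows the same idea in isolation. It is an illustration, not the
libgfortran code: MatmulKernel, pick_matmul_kernel and kernel_name are
made-up names, and it uses GCC's documented __builtin_cpu_supports
rather than reading __cpu_model directly as the patch does.

```cpp
#include <cassert>

// Hypothetical labels for the matmul variants.
enum class MatmulKernel { Vanilla, Avx, Avx2Fma, Avx512fFma };

// Pick the most capable variant the running CPU supports.
inline MatmulKernel pick_matmul_kernel()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  const bool has_fma = __builtin_cpu_supports("fma") != 0;

  // Run down the available processors in order of preference.  The FMA
  // variants are only safe when the CPU reports FMA as well.
  if (__builtin_cpu_supports("avx512f") && has_fma)
    return MatmulKernel::Avx512fFma;
  if (__builtin_cpu_supports("avx2") && has_fma)
    return MatmulKernel::Avx2Fma;
  if (__builtin_cpu_supports("avx"))
    return MatmulKernel::Avx;
#endif
  return MatmulKernel::Vanilla;  // generic fallback, also for non-x86
}

inline const char *kernel_name(MatmulKernel k)
{
  switch (k)
    {
    case MatmulKernel::Avx512fFma: return "avx512f,fma";
    case MatmulKernel::Avx2Fma:    return "avx2,fma";
    case MatmulKernel::Avx:        return "avx";
    default:                       return "vanilla";
    }
}
```

Printing kernel_name(pick_matmul_kernel()) on a test machine is a quick
way to see which variant would be selected there.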

Regards

Thomas
Index: generated/matmul_c10.c
===================================================================
--- generated/matmul_c10.c	(revision 245760)
+++ generated/matmul_c10.c	(working copy)
@@ -74,9 +74,6 @@ extern void matmul_c10 (gfc_array_c10 * const rest
 	int blas_limit, blas_call gemm);
 export_proto(matmul_c10);
 
-
-
-
 /* Put exhaustive list of possible architectures here here, ORed together.  */
 
 #if defined(HAVE_AVX) || defined(HAVE_AVX2) || defined(HAVE_AVX512F)
@@ -628,7 +625,7 @@ matmul_c10_avx (gfc_array_c10 * const restrict ret
 static void
 matmul_c10_avx2 (gfc_array_c10 * const restrict retarray, 
 	gfc_array_c10 * const restrict a, gfc_array_c10 * const restrict b, int try_blas,
-	int blas_limit, blas_call gemm) __attribute__((__target__("avx2")));
+	int blas_limit, blas_call gemm) __attribute__((__target__("avx2,fma")));
 static void
 matmul_c10_avx2 (gfc_array_c10 * const restrict retarray, 
 	gfc_array_c10 * const restrict a, gfc_array_c10 * const restrict b, int try_blas,
@@ -1171,7 +1168,7 @@ matmul_c10_avx2 (gfc_array_c10 * const restrict re
 static void
 matmul_c10_avx512f (gfc_array_c10 * const restrict retarray, 
 	gfc_array_c10 * const restrict a, gfc_array_c10 * const restrict b, int try_blas,
-	int blas_limit, blas_call gemm) __attribute__((__target__("avx512f")));
+	int blas_limit, blas_call gemm) __attribute__((__target__("avx512f,fma")));
 static void
 matmul_c10_avx512f (gfc_array_c10 * const restrict retarray, 
 	gfc_array_c10 * const restrict a, gfc_array_c10 * const restrict b, int try_blas,
@@ -2268,7 +2265,9 @@ void matmul_c10 (gfc_array_c10 * const restrict re
 	{
   /* Run down the available processors in order of preference.  */
 #ifdef HAVE_AVX512F
-  	  if (__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX512F))
+  	  if ((__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX512F))
+	  && (__cpu_model.__cpu_features[0] & (1 << FEATURE_FMA)))
+
 	{
 	  matmul_p = matmul_c10_avx512f;
 	  goto tailcall;
@@ -2277,7 +2276,8 @@ void matmul_c10 (gfc_array_c10 * const restrict re
 #endif  /* HAVE_AVX512F */
 
 #ifdef HAVE_AVX2
-  	  if (__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX2))
+  	  if ((__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX2))
+	 && (__cpu_model.__cpu_features[0] & (1 << FEATURE_FMA)))
 	{
 	  matmul_p = matmul_c10_avx2;
 	  goto tailcall;
Index: generated/matmul_c16.c
===================================================================
--- generated/matmul_c16.c	(revision 245760)
+++ generated/matmul_c16.c	(working copy)
@@ -74,9 +74,6 @@ extern void matmul_c16 (gfc_array_c16 * const rest
 	int blas_limit, blas_call gemm);
 export_proto(matmul_c16);
 
-
-
-
 /* Put exhaustive list of possible architectures here here, ORed together.  */
 
 #if defined(HAVE_AVX) || defined(HAVE_AVX2) || defined(HAVE_AVX512F)
@@ -628,7 +625,7 @@ matmul_c16_avx (gfc_array_c16 * const restrict ret
 static void
 matmul_c16_avx2 (gfc_array_c16 * const restrict retarray, 
 	gfc_array_c16 * const restrict a, gfc_array_c16 * const restrict b, int try_blas,
-	int blas_limit, blas_call gemm) __attribute__((__target__("avx2")));
+	int blas_limit, blas_call gemm) __attribute__((__target__("avx2,fma")));
 static void
 matmul_c16_avx2 (gfc_array_c16 * const restrict retarray, 
 	gfc_array_c16 * const restrict a, gfc_array_c16 * const restrict b, int try_blas,
@@ -1171,7 +1168,7 @@ matmul_c16_avx2 (gfc_array_c16 * const restrict re
 static void
 matmul_c16_avx512f (gfc_array_c16 * const restrict retarray, 
 	gfc_array_c16 * const restrict a, gfc_array_c16 * const restrict b, int try_blas,
-	int blas_limit, blas_call gemm) __attribute__((__target__("avx512f")));
+	int blas_limit, blas_call gemm) __attribute__((__target__("avx512f,fma")));
 static void
 matmul_c16_avx512f (gfc_array_c16 * const restrict retarray, 
 	gfc_array_c16 * const restrict a, gfc_array_c16 * const restrict b, int try_blas,
@@ -2268,7 +2265,9 @@ void matmul_c16 (gfc_array_c16 * const restrict re
 	{
   /* Run down the available processors in order of preference.  */
 #ifdef HAVE_AVX512F
-  	  if (__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX512F))
+  	  if ((__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX512F))
+	  && (__cpu_model.__cpu_features[0] & (1 << FEATURE_FMA)))
+
 	{
 	  matmul_p = matmul_c16_avx512f;
 	  goto tailcall;
@@ -2277,7 +2276,8 @@ void matmul_c16 (gfc_array_c16 * const restrict re
 #endif  /* HAVE_AVX512F */
 
 #ifdef HAVE_AVX2
-  	  if (__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX2))
+  	  if ((__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX2))
+	 && (__cpu_model.__cpu_features

Re: [C++ PATCH] -Wunused-but-set-parameter fix followup (PR c++/79782)

2017-03-01 Thread Jakub Jelinek
On Wed, Mar 01, 2017 at 03:10:15PM -1000, Nathan Sidwell wrote:
> On 03/01/2017 09:40 AM, Jakub Jelinek wrote:
> 
> > Unfortunately my patch apparently broke the case where such ctor in virtual
> > class has no arguments (void_type_node is used in that case instead of a
> > TREE_LIST, it is a little bit weird (I'd have expected perhaps
> > void_list_node instead), but it is what it does).
> 
> Seems funky (but not your problem), which of the testcase(s) does it 
> correspond to?

That is ICE on the last line of:

struct E { virtual E *foo () const = 0; };
struct F : virtual public E { };
struct G : public virtual F { G (int x) : F () { } };

where arguments == void_type_node:

cp_parser_mem_initializer has:

  if (!expression_list)
expression_list = void_type_node;

but it is documented:

   Returns a TREE_LIST.  The TREE_PURPOSE is the TYPE (for a base
   class) or FIELD_DECL (for a non-static data member) to initialize;
   the TREE_VALUE is the expression-list.  An empty initialization
   list is represented by void_list_node.  */

and e.g. pt.c has:

  if (TREE_VALUE (t) == void_type_node)
/* VOID_TYPE_NODE is used to indicate
   value-initialization.  */
{
  for (i = 0; i < len; i++)
TREE_VEC_ELT (expanded_arguments, i) = void_type_node;
}

> > So this patch in addition to not walking anything for
> > arguments == void_type_node
> > just walks the arguments and calls mark_exp_read on all PARM_DECLs in there
> > (I think that is all we care about, we can't have there VAR_DECLs or
> > RESULT_DECLs).
> 
> I suppose someone could pass in a global VAR_DECL, but if the ctor's the
> only use of that decl, it's rather stupid.

We don't track -Wunused-but-set-* for global VAR_DECLs, it is just for
automatic vars and parameters.

> Do you actually need to iterate over the arg list -- can't you just pass
> ARGUMENTS straight into cp_walk_tree?

You're right, that works too, and actually doesn't mind void_type_node
either.  So I'll retest:

2017-03-02  Jakub Jelinek  

PR c++/79782
* init.c (mark_exp_read_r): New function.
(emit_mem_initializers): Use cp_walk_tree with mark_exp_read_r on
whole arguments instead of plain mark_exp_read on TREE_LIST values.

* g++.dg/warn/Wunused-parm-10.C: New test.

--- gcc/cp/init.c.jj	2017-03-02 08:08:42.736368162 +0100
+++ gcc/cp/init.c   2017-03-02 08:15:51.805770171 +0100
@@ -1127,6 +1127,17 @@ sort_mem_initializers (tree t, tree mem_
   return sorted_inits;
 }
 
+/* Callback for cp_walk_tree to mark all PARM_DECLs in a tree as read.  */
+
+static tree
+mark_exp_read_r (tree *tp, int *, void *)
+{
+  tree t = *tp;
+  if (TREE_CODE (t) == PARM_DECL)
+mark_exp_read (t);
+  return NULL_TREE;
+}
+
 /* Initialize all bases and members of CURRENT_CLASS_TYPE.  MEM_INITS
is a TREE_LIST giving the explicit mem-initializer-list for the
constructor.  The TREE_PURPOSE of each entry is a subobject (a
@@ -1221,8 +1232,7 @@ emit_mem_initializers (tree mem_inits)
/* When not constructing vbases of abstract classes, at least mark
   the arguments expressions as read to avoid
   -Wunused-but-set-parameter false positives.  */
-   for (tree arg = arguments; arg; arg = TREE_CHAIN (arg))
- mark_exp_read (TREE_VALUE (arg));
+   cp_walk_tree (&arguments, mark_exp_read_r, NULL, NULL);
 
   if (inherited_base)
pop_deferring_access_checks ();
--- gcc/testsuite/g++.dg/warn/Wunused-parm-10.C.jj  2017-03-02 08:13:25.365661085 +0100
+++ gcc/testsuite/g++.dg/warn/Wunused-parm-10.C 2017-03-02 08:13:25.365661085 +0100
@@ -0,0 +1,12 @@
+// PR c++/79782
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wunused-but-set-parameter -Wunused-parameter" }
+
+struct E { virtual E *foo () const = 0; };
+struct F : virtual public E { };
+struct G : public virtual F { G (int x) : F () { } };  // { dg-warning "unused parameter" }
+struct H : virtual public E { H (int x, int y); };
+struct I : public virtual H { I (int x, int y) : H (x, y) { } };   // { dg-bogus "set but not used" }
+struct J : public virtual H { J (int x, int y) : H { x, y } { } }; // { dg-bogus "set but not used" }
+struct K : public virtual H { K (int x, int y) : H (x * 0, y + 1) { } };   // { dg-bogus "set but not used" }
+struct L : public virtual H { L (int x, int y) : H { x & 0, y | 1 } { } }; // { dg-bogus "set but not used" }


Jakub
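
The shape of the fix, walking the whole argument tree with a callback
instead of iterating over a list by hand, can be shown with a toy tree
outside GCC. Everything below is hypothetical (Kind, Node, walk_tree,
mark_parms_read); it only mimics the idea of cp_walk_tree with
mark_exp_read_r and is not GCC's tree API.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy stand-ins for tree codes; Empty plays the role of the
// void_type_node "no arguments" marker.
enum class Kind { Parm, Const, Op, Empty };

struct Node {
  Kind kind;
  bool read;
  std::vector<Node*> kids;

  explicit Node(Kind k, std::vector<Node*> c = {})
    : kind(k), read(false), kids(std::move(c)) { }
};

// Pre-order walk that applies CB to every node, like a tree walker.
template<typename Callback>
void walk_tree(Node* n, Callback cb)
{
  if (!n)
    return;
  cb(*n);
  for (Node* kid : n->kids)
    walk_tree(kid, cb);
}

// Mark every parameter found anywhere in ARGUMENTS as read.  Since the
// callback only reacts to Parm nodes, an Empty marker needs no special
// casing.
inline void mark_parms_read(Node* arguments)
{
  walk_tree(arguments, [](Node& n) {
    if (n.kind == Kind::Parm)
      n.read = true;
  });
}
```

With arguments like (x * 0, y), both parameters end up marked even
though x is nested inside an operation, which is the case a plain loop
over the top-level list values would miss.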


Re: [patch, fortran] Enable FMA for AVX2 and AVX512F for matmul

2017-03-01 Thread Janne Blomqvist
On Wed, Mar 1, 2017 at 11:00 PM, Thomas Koenig  wrote:
> Hello world,
>
> the attached patch enables FMA for the AVX2 and AVX512F variants of
> matmul.  This should bring a very nice speedup (although I have
> been unable to run benchmarks due to lack of a suitable machine).

In lieu of benchmarks, have you looked at the generated asm to verify
that fma is actually used?

> Question: Is this still appropriate for the current state of trunk?

Yes, looks pretty safe.



-- 
Janne Blomqvist


[PATCH] S/390: Change 2-byte NOPs

2017-03-01 Thread Robin Dapp
Hi,

the following patch changes "nopr %r7" to "nopr %r0" which is
advantageous from a hardware perspective. It will only be emitted for
hotpatching and should not impact normal code.

Bootstrapped and regression tested on s390 and s390x.

Regards
 Robin

gcc/ChangeLog:

2017-03-02  Robin Dapp  

* config/s390/s390.c (s390_asm_output_function_label): Use nopr %r0.
* config/s390/s390.md: Likewise.

gcc/testsuite/ChangeLog:

2017-03-02  Robin Dapp  

* gcc.target/s390/hotpatch-1.c: Check for nopr %r0.
* gcc.target/s390/hotpatch-10.c: Likewise.
* gcc.target/s390/hotpatch-11.c: Likewise.
* gcc.target/s390/hotpatch-12.c: Likewise.
* gcc.target/s390/hotpatch-13.c: Likewise.
* gcc.target/s390/hotpatch-14.c: Likewise.
* gcc.target/s390/hotpatch-15.c: Likewise.
* gcc.target/s390/hotpatch-16.c: Likewise.
* gcc.target/s390/hotpatch-17.c: Likewise.
* gcc.target/s390/hotpatch-18.c: Likewise.
* gcc.target/s390/hotpatch-19.c: Likewise.
* gcc.target/s390/hotpatch-2.c: Likewise.
* gcc.target/s390/hotpatch-26.c: Likewise.
* gcc.target/s390/hotpatch-27.c: Likewise.
* gcc.target/s390/hotpatch-28.c: Likewise.
* gcc.target/s390/hotpatch-3.c: Likewise.
* gcc.target/s390/hotpatch-4.c: Likewise.
* gcc.target/s390/hotpatch-5.c: Likewise.
* gcc.target/s390/hotpatch-6.c: Likewise.
* gcc.target/s390/hotpatch-7.c: Likewise.
* gcc.target/s390/hotpatch-8.c: Likewise.
* gcc.target/s390/hotpatch-9.c: Likewise.
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index f98eee7..e8265c6 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -7218,11 +7218,11 @@ s390_asm_output_function_label (FILE *asm_out_file, const char *fname,
   /* Add a trampoline code area before the function label and initialize it
 	 with two-byte nop instructions.  This area can be overwritten with code
 	 that jumps to a patched version of the function.  */
-  asm_fprintf (asm_out_file, "\tnopr\t%%r7"
+  asm_fprintf (asm_out_file, "\tnopr\t%%r0"
 		   "\t# pre-label NOPs for hotpatch (%d halfwords)\n",
 		   hw_before);
   for (i = 1; i < hw_before; i++)
-	fputs ("\tnopr\t%r7\n", asm_out_file);
+	fputs ("\tnopr\t%r0\n", asm_out_file);
 
   /* Note:  The function label must be aligned so that (a) the bytes of the
 	 following nop do not cross a cacheline boundary, and (b) a jump address
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index cbf8c0a..e86525b 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -10371,7 +10371,7 @@
 (define_insn "nop_2_byte"
   [(unspec_volatile [(const_int 0)] UNSPECV_NOP_2_BYTE)]
   ""
-  "nopr\t%%r7"
+  "nopr\t%%r0"
   [(set_attr "op_type" "RR")])
 
 (define_insn "nop_4_byte"
diff --git a/gcc/testsuite/gcc.target/s390/hotpatch-1.c b/gcc/testsuite/gcc.target/s390/hotpatch-1.c
index 55088b8..5f0f2e1 100644
--- a/gcc/testsuite/gcc.target/s390/hotpatch-1.c
+++ b/gcc/testsuite/gcc.target/s390/hotpatch-1.c
@@ -13,7 +13,7 @@ void hp1(void)
 /* Check number of occurences of certain instructions.  */
 /* { dg-final { scan-assembler-not "pre-label NOPs" } } */
 /* { dg-final { scan-assembler-not "post-label NOPs" } } */
-/* { dg-final { scan-assembler-not "nopr\t%r7" } } */
+/* { dg-final { scan-assembler-not "nopr\t%r0" } } */
 /* { dg-final { scan-assembler-not "nop\t0" } } */
 /* { dg-final { scan-assembler-not "brcl\t0, 0" } } */
 /* { dg-final { scan-assembler-not "alignment for hotpatch" } } */
diff --git a/gcc/testsuite/gcc.target/s390/hotpatch-10.c b/gcc/testsuite/gcc.target/s390/hotpatch-10.c
index d2cb9a2..2308d33 100644
--- a/gcc/testsuite/gcc.target/s390/hotpatch-10.c
+++ b/gcc/testsuite/gcc.target/s390/hotpatch-10.c
@@ -13,7 +13,7 @@ void hp1(void)
 /* Check number of occurences of certain instructions.  */
 /* { dg-final { scan-assembler-not "pre-label NOPs" } } */
 /* { dg-final { scan-assembler-not "post-label NOPs" } } */
-/* { dg-final { scan-assembler-not "nopr\t%r7" } } */
+/* { dg-final { scan-assembler-not "nopr\t%r0" } } */
 /* { dg-final { scan-assembler-not "nop\t0" } } */
 /* { dg-final { scan-assembler-not "brcl\t0, 0" } } */
 /* { dg-final { scan-assembler-not "alignment for hotpatch" } } */
diff --git a/gcc/testsuite/gcc.target/s390/hotpatch-11.c b/gcc/testsuite/gcc.target/s390/hotpatch-11.c
index cabb9d26..56b3596 100644
--- a/gcc/testsuite/gcc.target/s390/hotpatch-11.c
+++ b/gcc/testsuite/gcc.target/s390/hotpatch-11.c
@@ -13,6 +13,6 @@ void hp1(void)
 /* Check number of occurences of certain instructions.  */
 /* { dg-final { scan-assembler "pre-label.*(1 halfwords)" } } */
 /* { dg-final { scan-assembler-not "post-label NOPs" } } */
-/* { dg-final { scan-assembler-times "nopr\t%r7" 1 } } */
+/* { dg-final { scan-assembler-times "nopr\t%r0" 1 } } */
 /* { dg-final { scan-assembler-not "nop\t0" } } */
 /* { dg-final { scan-assembler-not "brcl\t0, 0" } 

Re: [patch, fortran] Enable FMA for AVX2 and AVX512F for matmul

2017-03-01 Thread Thomas Koenig

On 02.03.2017 at 08:32, Janne Blomqvist wrote:

> On Wed, Mar 1, 2017 at 11:00 PM, Thomas Koenig wrote:
> > Hello world,
> >
> > the attached patch enables FMA for the AVX2 and AVX512F variants of
> > matmul.  This should bring a very nice speedup (although I have
> > been unable to run benchmarks due to lack of a suitable machine).
>
> In lieu of benchmarks, have you looked at the generated asm to verify
> that fma is actually used?


Yes, I did.

Here's something from the new matmul_r8_avx2:

    156c:   c4 62 e5 b8 fd          vfmadd231pd %ymm5,%ymm3,%ymm15
    1571:   c4 c1 79 10 04 06       vmovupd (%r14,%rax,1),%xmm0
    1577:   c4 62 dd b8 db          vfmadd231pd %ymm3,%ymm4,%ymm11
    157c:   c4 c3 7d 18 44 06 10    vinsertf128 $0x1,0x10(%r14,%rax,1),%ymm0,%ymm0
    1583:   01
    1584:   c4 62 ed b8 ed          vfmadd231pd %ymm5,%ymm2,%ymm13
    1589:   c4 e2 ed b8 fc          vfmadd231pd %ymm4,%ymm2,%ymm7
    158e:   c4 e2 fd a8 ad 30 ff    vfmadd213pd -0x800d0(%rbp),%ymm0,%ymm5


... and here from matmul_r8_avx512f:

    1da8:   c4 a1 7b 10 14 d6       vmovsd (%rsi,%r10,8),%xmm2
    1dae:   c4 c2 b1 b9 f0          vfmadd231sd %xmm8,%xmm9,%xmm6
    1db3:   62 62 ed 08 b9 e5       vfmadd231sd %xmm5,%xmm2,%xmm28
    1db9:   62 62 ed 08 b9 ec       vfmadd231sd %xmm4,%xmm2,%xmm29
    1dbf:   62 62 ed 08 b9 f3       vfmadd231sd %xmm3,%xmm2,%xmm30
    1dc5:   c4 e2 91 99 e8          vfmadd132sd %xmm0,%xmm13,%xmm5
    1dca:   c4 e2 99 99 e0          vfmadd132sd %xmm0,%xmm12,%xmm4
    1dcf:   c4 e2 a1 99 d8          vfmadd132sd %xmm0,%xmm11,%xmm3
    1dd4:   c4 c2 a9 99 d1          vfmadd132sd %xmm9,%xmm10,%xmm2
    1dd9:   c4 c2 89 99 c1          vfmadd132sd %xmm9,%xmm14,%xmm0
    1dde:   0f 8e d3 fe ff ff       jle    1cb7



... so this is looking pretty good.

Regards

Thomas


[PATCH] Fix PR79777

2017-03-01 Thread Richard Biener

This fixes the case where we are only able to simplify an expression
late, during insertion (after we have re-instantiated range-info on all
SSA names).  We can't do anything here but give up, since we'd end up
with an SSA name with two values, which is sure to end up confusing
elimination eventually.

Hopefully for GCC 8 I can fix all this by re-writing SCCVN to RPO style
iteration.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-03-02  Richard Biener  

PR tree-optimization/79777
* tree-ssa-pre.c (eliminate_insert): Give up if we simplify
the to insert expression to sth existing.

* gcc.dg/torture/pr79777.c: New testcase.

Index: gcc/tree-ssa-pre.c
===================================================================
*** gcc/tree-ssa-pre.c  (revision 245803)
--- gcc/tree-ssa-pre.c  (working copy)
*************** eliminate_insert (gimple_stmt_iterator *
*** 4133,4143 ****
else
  res = gimple_build (&stmts, gimple_assign_rhs_code (stmt),
TREE_TYPE (val), leader);
!   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
!   VN_INFO_GET (res)->valnum = val;
  
!   if (TREE_CODE (leader) == SSA_NAME)
! gimple_set_plf (SSA_NAME_DEF_STMT (leader), NECESSARY, true);
  
pre_stats.insertions++;
if (dump_file && (dump_flags & TDF_DETAILS))
--- 4133,4174 ----
else
  res = gimple_build (&stmts, gimple_assign_rhs_code (stmt),
TREE_TYPE (val), leader);
!   if (TREE_CODE (res) != SSA_NAME
!   || SSA_NAME_IS_DEFAULT_DEF (res)
!   || gimple_bb (SSA_NAME_DEF_STMT (res)))
! {
!   gimple_seq_discard (stmts);
  
!   /* During propagation we have to treat SSA info conservatively
!  and thus we can end up simplifying the inserted expression
!at elimination time to sth not defined in stmts.  */
!   /* But then this is a redundancy we failed to detect.  Which means
!  res now has two values.  That doesn't play well with how
!we track availability here, so give up.  */
!   if (dump_file && (dump_flags & TDF_DETAILS))
!   {
! if (TREE_CODE (res) == SSA_NAME)
!   res = eliminate_avail (res);
! if (res)
!   {
! fprintf (dump_file, "Failed to insert expression for value ");
! print_generic_expr (dump_file, val, 0);
! fprintf (dump_file, " which is really fully redundant to ");
! print_generic_expr (dump_file, res, 0);
! fprintf (dump_file, "\n");
!   }
!   }
! 
!   return NULL_TREE;
! }
!   else
! {
!   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
!   VN_INFO_GET (res)->valnum = val;
! 
!   if (TREE_CODE (leader) == SSA_NAME)
!   gimple_set_plf (SSA_NAME_DEF_STMT (leader), NECESSARY, true);
! }
  
pre_stats.insertions++;
if (dump_file && (dump_flags & TDF_DETAILS))
Index: gcc/testsuite/gcc.dg/torture/pr79777.c
===================================================================
*** gcc/testsuite/gcc.dg/torture/pr79777.c  (nonexistent)
--- gcc/testsuite/gcc.dg/torture/pr79777.c  (working copy)
***************
*** 0 ****
--- 1,38 ----
+ /* { dg-do compile } */
+ 
+ typedef unsigned short __u16;
+ typedef unsigned int __u32;
+ typedef unsigned char u8;
+ typedef unsigned int u32;
+ typedef __u16 __le16;
+ typedef __u32 __le32;
+ typedef u32 secno;
+ struct bplus_internal_node {
+ __le32 file_secno;
+ __le32 down;
+ };
+ struct bplus_header {
+ u8 n_used_nodes;
+ __le16 first_free;
+ union {
+   struct bplus_internal_node internal[0];
+ }
+ u;
+ };
+ 
+ __u16 __fswab16(__u16 val);
+ __u32 __fswab32(__u32 val);
+ void hpfs_ea_remove (__u32);
+ 
+ void hpfs_truncate_btree(secno f, int fno, unsigned secs, struct bplus_header *btree)
+ {
+   int i, j;
+   for (i = 0; i < btree->n_used_nodes; i++)
+ if ((__builtin_constant_p((__u32)(( __u32)(__le32)(btree->u.internal[i].file_secno))) ? ((__u32)( (((__u32)(( __u32)(__le32)(btree->u.internal[i].file_secno)) & (__u32)0x000000ffUL) << 24) | (((__u32)(( __u32)(__le32)(btree->u.internal[i].file_secno)) & (__u32)0x0000ff00UL) << 8) | (((__u32)(( __u32)(__le32)(btree->u.internal[i].file_secno)) & (__u32)0x00ff0000UL) >> 8) | (((__u32)(( __u32)(__le32)(btree->u.internal[i].file_secno)) & (__u32)0xff000000UL) >> 24))) : __fswab32(( __u32)(__le32)(btree->u.internal[i].file_secno))) >= secs) goto f;
+   return;
+ f:
+   for (j = i + 1; j < btree->n_used_nodes; j++)
+ hpfs_ea_remove((__builtin_constant_p((__u32)(( __u32)(__le32)(btree->u.internal[j].down))) ? ((__u32)( (((__u32)(( __u32)(__le32)(btree->u.internal[j].down)) & (__u32)0x000000ffUL) << 24) | (((__u32)(( __u32)(__le32)(btree->u.internal[j].down)) & (__u32)0x0000ff00UL) << 8) | (((__u32)(( __u32)(__le32)(btree->u.internal[j].down)) & (__u32)0x00ff0000UL) >> 8) | (((__u32)(( __u32)(__le32)(btree->u.internal[j].down)) & (__u32)0xff000000UL) 

Re: [PATCH docs] remove Java from GCC 7 release criteria

2017-03-01 Thread Richard Biener
On Wed, 1 Mar 2017, Martin Sebor wrote:

> On 02/28/2017 11:41 PM, Richard Biener wrote:
> > On March 1, 2017 3:34:46 AM GMT+01:00, Martin Sebor wrote:
> > > On 02/28/2017 01:41 PM, Richard Biener wrote:
> > > > On February 28, 2017 7:00:39 PM GMT+01:00, Jeff Law wrote:
> > > > > On 02/28/2017 10:54 AM, Martin Sebor wrote:
> > > > > > The GCC 7 release criteria page mentions Java even though
> > > > > > the front end has been removed.  The attached patch removes Java
> > > > > > from the criteria page.  While reviewing the rest of the text I
> > > > > > noticed a few minor typos that I corrected in the patch as well.
> > > > > > 
> > > > > > Btw., as an aside, I read the page to see if I could find out more
> > > > > > about the "magic" bug counts that are being aimed for to decide
> > > > > > when to cut the release.  Can someone say what those are and where
> > > > > > to find them?  I understand from the document that they're not
> > > > > > exact but even ballpark numbers would be useful.
> > > > > OK.
> > > > > 
> > > > > WRT the bug counts.  0 P1 regressions, < 100 P1-P3 regressions.  I'm
> > > > > not sure if that's documented anywhere though.
> > > > 
> > > > Actually the only criterion is zero P1 regressions.  Those are
> > > > documented to block a release.
> > > 
> > > Yes, that is mentioned in the document.  Would it be fair to say
> > > that the number of P2 bugs (or regressions) or their nature plays
> > > into the decision in some way as well?  If so, what can the release
> > > criteria say about it?
> > 
> > Ultimately P2 bugs do not play a role and 'time' will trump them.  OTOH we
> > were never in an uncomfortable situation with P2s at the desired point of
> > release.
> > 
> > Also note that important P2 bugs can be promoted to P1 and unimportant P1
> > bugs demoted to P2.
> > 
> > > I'm trying to get a better idea which bugs to work on and where
> > > my help might have the biggest impact.  I think having better
> > > visibility into the bug triage process (such as bug priorities
> > > and how they impact the release schedule) might help others
> > > focus too.
> > 
> > In order of importance:
> > - P1
> > - wrong-code, rejects-valid, ice-on-valid (even if not regressions,
> > regressions more important)
> > - P2 regressions, more recent ones first (newest working version)
> 
> I see.  This is helpful, thanks.
> 
> The kinds of problems you mention are discussed in the document
> so just to make the importance clear, would adding the following
> after this sentence
> 
>   In general bugs blocking the release are marked with priority P1
>   (Maintaining the GCC Bugzilla database).
> 
> accurately reflect what you described?
> 
>   As a general rule of thumb, within each priority level, bugs that
>   result in incorrect code are considered more urgent than those
>   that lead to rejecting valid code, which in turn are viewed as
>   more severe than ice-on-valid code (compiler crashes).  More
>   recently reported bugs are also prioritized over very old ones.

I'd rather clarify things in bugs/management.html.  Note
that wrong-code, rejects-valid and ice-on-valid are equally important.
Less important would be accepts-invalid or ice-on-invalid or, of course,
missed-optimization.  Also, it's not more recently _reported_ bugs that
take priority: a [6/7 Regression] is more important to fix than a
[5/6/7 Regression] (this is also why [7 Regression]s are P1 by default).

Richard.

> Martin
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)