[PATCH] Fix PR69715

2016-02-09 Thread Richard Biener

The following fixes update-address-taken to properly reject rewriting
decls into SSA that require fixup of call lhs because that's not done.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-02-09  Richard Biener  

PR tree-optimization/69715
* tree-ssa.c (execute_update_addresses_taken): Mark non-decl
LHS on calls as non-rewritable.

* gcc.dg/torture/pr69715.c: New testcase.

Index: gcc/tree-ssa.c
===
*** gcc/tree-ssa.c  (revision 233211)
--- gcc/tree-ssa.c  (working copy)
*** execute_update_addresses_taken (void)
*** 1436,1442 
tree lhs = gimple_get_lhs (stmt);
if (lhs
  && TREE_CODE (lhs) != SSA_NAME
! && non_rewritable_lvalue_p (lhs))
{
  decl = get_base_address (lhs);
  if (DECL_P (decl))
--- 1443,1450 
tree lhs = gimple_get_lhs (stmt);
if (lhs
  && TREE_CODE (lhs) != SSA_NAME
! && ((code == GIMPLE_CALL && ! DECL_P (lhs))
! || non_rewritable_lvalue_p (lhs)))
{
  decl = get_base_address (lhs);
  if (DECL_P (decl))
Index: gcc/testsuite/gcc.dg/torture/pr69715.c
===
*** gcc/testsuite/gcc.dg/torture/pr69715.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr69715.c  (working copy)
***
*** 0 
--- 1,11 
+ /* { dg-do compile } */
+ 
+ struct __attribute__((may_alias)) S { long long low; int high; };
+ struct S foo (void);
+ long double
+ bar (void)
+ {
+   long double a;
+   *(struct S *)&a = foo ();
+   return a;
+ }


Re: [PATCH, PR69599] Fix GOMP/GOACC_parallel optimization in ipa-pta

2016-02-09 Thread Richard Biener
On Mon, 8 Feb 2016, Tom de Vries wrote:

> On 08/02/16 11:54, Richard Biener wrote:
> > On Mon, 8 Feb 2016, Tom de Vries wrote:
> > 
> > > Hi,
> > > 
> > > when compiling the fipa-pta tests in the libgomp testsuite
> > > (omp-nested-2.c, pr46032.c) with -flto -flto-partition=max, the tests
> > > fail in execution (PR69599).
> > > 
> > > The problem is related to the GOMP/GOACC_parallel optimization we do in
> > > fipa-pta, where we interpret a call GOMP_parallel (&foo._0, data) as a
> > > call foo._0 (data).
> > > 
> > > The problem is that this optimization is only legal in lto if both:
> > > - foo containing the call GOMP_parallel (&foo._0, data) and
> > > - foo._0
> > > are contained in the same partition.
> > > 
> > > In the case of -flto-partition=max, foo is contained in its own
> > > partition, and foo._0 is contained in another partition.  This means
> > > the data argument to the GOMP_parallel call appears unused, and the
> > > setting of the argument is optimized away, which causes the execution
> > > failure.
> > > 
> > > This patch fixes that by testing if foo and foo._0 are part of the same
> > > partition.
> > > 
> > > [ Note that the node_address_taken change in the patch has no effect,
> > > since nonlocal_p already tests for used_from_other_partition. But I
> > > thought it was clearer to state the conditions under which we are
> > > allowed to ignore node->address_taken explicitly. ]
> > > 
> > > Bootstrapped and reg-tested on x86_64.
> > > 
> > > Built for nvidia accelerator and reg-tested libgomp with various lto
> > > settings.
> > > 
> > > OK for trunk, stage4?
> > 
> > I don't like the in_lto_p checks, why's the check not working
> > for non-LTO?
> > 
> 
> I was not sure if the partition flags were valid outside lto.
> 
> Updated patch removes the in_lto_p checks.
> 
> Bootstrapped on x86_64.
> 
> Built and reg-tested libgomp testsuite.
> 
> OK?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


New German PO file for 'gcc' (version 6.1-b20160131)

2016-02-09 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-6.1-b20160131.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: testsuite fix

2016-02-09 Thread Jonathan Wakely

On 08/02/16 17:48 -0800, Mike Stump wrote:
> I’m running the pretty printer test cases on a target with status wrappers,
> and that system works by printing the return code on that output.  It is
> dependent upon the last line being terminated by “\n”, as the code that
> looks for the return code requires the return code at the start of a line.
>
> The below patch added newlines to the ends of all files, so that the status
> wrappers always work.
>
> Ok?

Yes, thanks.



[PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization

2016-02-09 Thread Bin Cheng
Hi,
When computing the cost of a loop invariant, GCC checks whether the invariant
can be propagated into its use site (a memory reference).  If it cannot be
propagated, we increase its cost so that it's expensive enough to be hoisted
out of the loop.  Currently we simply replace the loop invariant register in
the use site with its definition expression, then call validate_changes to
check whether the resulting insn is valid.  This is weak because
validate_changes doesn't take canonicalization into consideration.  Consider
the example below:

  Loop inv def:  
   69: r149:SI=r87:SI+const(unspec[`'] 1)
  REG_DEAD r87:SI
  Loop inv use:
   70: r150:SI=[r90:SI*0x4+r149:SI]
  REG_DEAD r149:SI

The address expression after propagation is "r90 * 0x4 + (r87 +
const(unspec[`']))".  Function validate_changes simply returns false for it.
As a matter of fact, the propagation is feasible if we canonicalize the
address expression into a form like "(r90 * 0x4 + r87) + const(unspec[`'])".

This patch fixes the problem by canonicalizing the address expression and
verifying whether the new address is valid.  The canonicalization follows
GCC's insn canonicalization rules.  The test case from the bugzilla PR is
also included.
As for the canonicalize_address interface, there is another
canonicalize_address in fwprop.c which only changes shift into mult.  I think
it would be good to factor out a common RTL interface in GCC, but that's
stage1 work.
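
To illustrate the reordering described above, here is a minimal standalone C
sketch.  It does not use GCC's rtx API: the term kinds, struct term,
canonicalize_terms and format_left_chained are hypothetical names made up for
this example, and the summands are plain strings.  It only shows the ordering
rule (MULT operand first, CONST operand last, PLUS chained to the left) on an
already flattened list of summands.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical term kinds, not GCC's rtx codes.  The enum order is the
   desired canonical order: MULT first, CONST last.  */
enum term_kind { TERM_MULT, TERM_REG, TERM_CONST };

struct term
{
  enum term_kind kind;
  const char *text;
};

/* Stable insertion sort of the N summands by kind.  */
static void
canonicalize_terms (struct term *t, int n)
{
  for (int i = 1; i < n; i++)
    {
      struct term cur = t[i];
      int j = i - 1;
      while (j >= 0 && t[j].kind > cur.kind)
        {
          t[j + 1] = t[j];
          j--;
        }
      t[j + 1] = cur;
    }
}

/* Append S to the NUL-terminated string in BUF without overflowing SIZE.  */
static void
append (char *buf, size_t size, const char *s)
{
  size_t used = strlen (buf);
  assert (used < size);
  strncat (buf, s, size - used - 1);
}

/* Write the summands as a left-chained sum, e.g. ((a + b) + c).  */
static void
format_left_chained (const struct term *t, int n, char *buf, size_t size)
{
  assert (n > 0 && size > 0);
  buf[0] = '\0';
  for (int i = 1; i < n; i++)
    append (buf, size, "(");
  append (buf, size, t[0].text);
  for (int i = 1; i < n; i++)
    {
      append (buf, size, " + ");
      append (buf, size, t[i].text);
      append (buf, size, ")");
    }
}
```

Feeding the three summands in any order and then formatting them gives the
left-chained shape that the address validity check is expected to accept in
the example above.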

Bootstrapped and tested on x86_64 and AArch64.  Is it OK?

Thanks,
bin

2016-02-09  Bin Cheng  

PR tree-optimization/69052
* loop-invariant.c (canonicalize_address): New function.
(inv_can_prop_to_addr_use): Check validity of address expression
which is canonicalized by above function.

gcc/testsuite/ChangeLog
2016-02-09  Bin Cheng  

PR tree-optimization/69052
* gcc.target/i386/pr69052.c: New test.
diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 707f044..157e273 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -754,6 +754,74 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
   return inv;
 }
 
+/* Returns a canonical version address for X.  It identifies
+   addr expr in the form of A + B + C.  Following instruction
+   canonicalization rules, MULT operand is moved to the front,
+   CONST operand is moved to the end; also PLUS operators are
+   chained to the left.  */
+
+static rtx
+canonicalize_address (rtx x)
+{
+  rtx op0, op1, op2;
+  machine_mode mode = GET_MODE (x);
+  enum rtx_code code = GET_CODE (x);
+
+  if (code != PLUS)
+return x;
+
+  /* Extract operands from A + B (+ C).  */
+  if (GET_CODE (XEXP (x, 0)) == PLUS)
+{
+  op0 = XEXP (XEXP (x, 0), 0);
+  op1 = XEXP (XEXP (x, 0), 1);
+  op2 = XEXP (x, 1);
+}
+  else if (GET_CODE (XEXP (x, 1)) == PLUS)
+{
+  op0 = XEXP (x, 0);
+  op1 = XEXP (XEXP (x, 1), 0);
+  op2 = XEXP (XEXP (x, 1), 1);
+}
+  else
+{
+  op0 = XEXP (x, 0);
+  op1 = XEXP (x, 1);
+  op2 = NULL_RTX;
+}
+
+  /* Move MULT operand to the front.  */
+  if (!REG_P (op1) && !CONST_INT_P (op1))
+std::swap (op0, op1);
+
+  /* Move CONST operand to the end.  */
+  if (CONST_INT_P (op0))
+std::swap (op0, op1);
+
+  if (op2 != NULL && CONST_INT_P (op1))
+{
+  /* Try to simplify CONST1 + CONST2 into one operand.  */
+  if (CONST_INT_P (op2))
+   {
+ rtx x = simplify_binary_operation (PLUS, mode, op1, op2);
+
+ if (x != NULL_RTX && CONST_INT_P (x))
+   {
+ op1 = x;
+ op2 = NULL_RTX;
+   }
+   }
+  else
+   std::swap (op1, op2);
+}
+  /* Chain PLUS operators to the left.  */
+  op0 = simplify_gen_binary (PLUS, mode, op0, op1);
+  if (op2 == NULL_RTX)
+return op0;
+  else
+return simplify_gen_binary (PLUS, mode, op0, op2);
+}
+
 /* Given invariant DEF and its address USE, check if the corresponding
invariant expr can be propagated into the use or not.  */
 
@@ -761,7 +829,7 @@ static bool
 inv_can_prop_to_addr_use (struct def *def, df_ref use)
 {
   struct invariant *inv;
-  rtx *pos = DF_REF_REAL_LOC (use), def_set;
+  rtx *pos = DF_REF_REAL_LOC (use), def_set, use_set;
   rtx_insn *use_insn = DF_REF_INSN (use);
   rtx_insn *def_insn;
   bool ok;
@@ -778,6 +846,29 @@ inv_can_prop_to_addr_use (struct def *def, df_ref use)
 
   validate_unshare_change (use_insn, pos, SET_SRC (def_set), true);
   ok = verify_changes (0);
+  /* Try harder with canonicalization in address expression.  */
+  if (!ok && (use_set = single_set (use_insn)) != NULL_RTX)
+{
+  rtx src, dest, mem = NULL_RTX;
+
+  src = SET_SRC (use_set);
+  dest = SET_DEST (use_set);
+  if (MEM_P (src))
+   mem = src;
+  else if (MEM_P (dest))
+   mem = dest;
+
+  if (mem != NULL_RTX
+ && !memory_address_addr_space_p (GET_MODE (mem),
+  XEXP (mem, 0),
+  MEM_ADD

Re: Combine simplify_set WORD_REGISTER_OPERATIONS

2016-02-09 Thread Alan Modra
On Mon, Feb 08, 2016 at 09:27:36AM -0700, Jeff Law wrote:
> On 01/31/2016 03:16 PM, Alan Modra wrote:
> >The comment says this test is supposed to prevent "a narrower
> >operation than requested", but it actually only allows a larger
> >subreg, not one the same size.  Fix that.
> >
> >Bootstrapped and regression tested powerpc64-linux.  OK for stage1?
> >
> >Note that this bug was found when investigating why gcc-6 does not
> >suffer from pr69548, ie. this bug was masking a powerpc backend bug.
> >
> > * combine.c (simplify_set): Correct WORD_REGISTER_OPERATIONS test.
> 
> Is there a strong need to apply this to gcc6?

No, better to wait for gcc-7, I think.

>  Can we construct a testcase
> where this makes a difference in the code we generate?

I instrumented the combine.c code in question with this

  if (!WORD_REGISTER_OPERATIONS
      && (GET_MODE_SIZE (GET_MODE (src))
          == GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))))
    {
      FILE *f = fopen ("/tmp/alan", "a");
      fprintf (f, "%s\n", main_input_filename);
      print_inline_rtx (f, src, 0);
      fprintf (f, "\n");
      fclose (f);
    }

to see what popped out when bootstrapping gcc on x86_64.  There were
quite a lot of hits, DI -> DF, SI -> SF, V4SF -> TI etc, especially in
the testsuite (I should have dumped options too).

Here's the first one:
/src/gcc.git/libgcc/config/libbid/bid64_div.c
(subreg:DF (plus:DI (subreg:DI (reg:DF 841) 0)
(const_int 1 [0x1])) 0)
This one resulted in using lea vs. add, so slightly better code.

One from the testsuite:
/src/gcc.git/gcc/testsuite/gcc.dg/sso/p4.c
(subreg:SF (bswap:SI (reg:SI 99 [ Local_R2.F ])) 0)
When compiling with -Og, this showed

before                          after
.loc 3 49 0                     .loc 3 49 0
movl    -32(%ebp), %eax         movl    -32(%ebp), %eax
bswap   %eax                    bswap   %eax
movl    %eax, -44(%ebp)         movl    %eax, -28(%ebp)
flds    -44(%ebp)
fstps   -28(%ebp)

Quite an improvement, if you care about -Og code.

I didn't see any worse code, except some cases that I think were
caused by register allocation differences.
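
For readers who want a feel for where an rtx like (subreg:SF (bswap:SI ...))
can come from, here is a small C sketch.  It is not taken from
gcc.dg/sso/p4.c; swap_bytes32 and load_swapped_float are hypothetical helpers
written for this illustration, and it assumes a 32-bit float.  It loads a
32-bit value, byte-swaps it, and then reinterprets the same bits as a float,
which is the same-size integer-to-float reinterpretation discussed above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Swap the four bytes of V; a portable stand-in for a bswap operation.  */
static uint32_t
swap_bytes32 (uint32_t v)
{
  return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8)
         | ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/* Read a float that was stored at P with its bytes reversed.  */
static float
load_swapped_float (const unsigned char *p)
{
  uint32_t bits;
  float f;

  assert (sizeof f == sizeof bits);
  memcpy (&bits, p, sizeof bits);
  bits = swap_bytes32 (bits);
  /* Same-size reinterpretation of the integer bits as a float.  */
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

Whether a given compiler keeps the swapped value in an integer register or
bounces it through the stack and the x87 unit is exactly the kind of
difference shown in the before/after listing.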

> My inclination would be to approve for gcc-7 as-is, but I'm more hesitant
> for gcc-6.
> 
> jeff

-- 
Alan Modra
Australia Development Lab, IBM


Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump

2016-02-09 Thread James Greenhalgh

On Mon, Feb 08, 2016 at 03:24:14PM +0100, Richard Biener wrote:
> On Mon, Feb 8, 2016 at 2:40 PM, James Greenhalgh
>  wrote:
> > On Mon, Feb 08, 2016 at 04:29:31PM +0300, Yuri Rumyantsev wrote:
> >> Hi James,
> >>
> >> Thanks for reporting this issue.
> >> I prepared slightly different patch since we don't need to add
> >> tree-vect dump option - it is on by default for all tests in /vect
> >> directory.
> >
> > Hm, I added that line as my test runs were showing:
> >
> >   UNRESOLVED: gcc.dg/vect/vect-mask-store-move-1.c: dump file does not exist
> >
> > I would guess the explicit
> >
> >   /* { dg-options "-O3" } */
> >
> > is clobbering the vect.exp setup of flags?
>
> Yes.  Use { dg-additional-options "-O3" } instead.

I don't see why this test needs anything more than the default vect
options anyway... In which case, the patch would look like this.

Tested on x86-64 where the test passes, and on AArch64 where it is
correctly skipped.

OK?

Thanks,
James

---
2016-02-09  James Greenhalgh  

* gcc.dg/vect/vect-mask-store-move-1.c: Drop dg-options directive,
gate check on x86_64/i?86.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
index e575f6d..f5cae4f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
@@ -1,5 +1,4 @@
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
 /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
 
 #define N 256
@@ -16,4 +15,4 @@ void foo (int n)
   }
 }
 
-/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" { target { i?86-*-* x86_64-*-* } } } } */


Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump

2016-02-09 Thread Richard Biener
On Tue, Feb 9, 2016 at 12:24 PM, James Greenhalgh
 wrote:
>
> On Mon, Feb 08, 2016 at 03:24:14PM +0100, Richard Biener wrote:
>> On Mon, Feb 8, 2016 at 2:40 PM, James Greenhalgh
>>  wrote:
>> > On Mon, Feb 08, 2016 at 04:29:31PM +0300, Yuri Rumyantsev wrote:
>> >> Hi James,
>> >>
>> >> Thanks for reporting this issue.
>> >> I prepared slightly different patch since we don't need to add
>> >> tree-vect dump option - it is on by default for all tests in /vect
>> >> directory.
>> >
>> > Hm, I added that line as my test runs were showing:
>> >
>> >   UNRESOLVED: gcc.dg/vect/vect-mask-store-move-1.c: dump file does not exist
>> >
>> > I would guess the explicit
>> >
>> >   /* { dg-options "-O3" } */
>> >
>> > is clobbering the vect.exp setup of flags?
>>
>> Yes.  Use { dg-additional-options "-O3" } instead.
>
> I don't see why this test needs anything more than the default vect
> options anyway... In which case, the patch would look like this.
>
> Tested on x86-64 where the test passes, and on AArch64 where it is
> correctly skipped.
>
> OK?

Ok.

Richard.

> Thanks,
> James
>
> ---
> 2016-02-09  James Greenhalgh  
>
> * gcc.dg/vect/vect-mask-store-move-1.c: Drop dg-options directive,
> gate check on x86_64/i?86.
>


Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump

2016-02-09 Thread Jakub Jelinek
On Tue, Feb 09, 2016 at 11:24:57AM +, James Greenhalgh wrote:

Also tested on i686-linux (32-bit), where it previously FAILed too.

> 2016-02-09  James Greenhalgh  
> 
>   * gcc.dg/vect/vect-mask-store-move-1.c: Drop dg-options directive,
>   gate check on x86_64/i?86.

Ok, thanks.

> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
> index e575f6d..f5cae4f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
> @@ -1,5 +1,4 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3" } */
>  /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
>  
>  #define N 256
> @@ -16,4 +15,4 @@ void foo (int n)
>}
>  }
>  
> -/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" { target { i?86-*-* x86_64-*-* } } } } */


Jakub


Re: [PATCH PR69652, Regression]

2016-02-09 Thread Richard Biener
On Fri, Feb 5, 2016 at 3:54 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> Here is the updated patch - I went back to moving call statements as well,
> since masked loads are represented by internal calls. I also assume that for
> the following simple loop
>   for (i = 0; i < n; i++)
>     if (b1[i])
>       a1[i] = sqrtf(a2[i] * a2[i] + a3[i] * a3[i]);
> motion must be done for all vector statements in semi-hammock including SQRT.
>
> Bootstrap and regression testing did not show any new failures.
> Is it OK for trunk?

The patch is incredibly hard to parse due to the re-indenting.  Please
consider sending diffs with -b.

This issue exposes that you are moving (masked) stores across loads without
checking aliasing.  In the specific case those loads are dead and thus
this is safe, but in general I thought we were checking that we are using
the same VUSE during the sinking operation.

Thus, I'd rather have

+ /* Check that LHS does not have uses outside of STORE_BB.  */
+ res = true;
+ FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+   {
+ gimple *use_stmt;
+ use_stmt = USE_STMT (use_p);
+ if (is_gimple_debug (use_stmt))
+   continue;
+ if (gimple_bb (use_stmt) != store_bb)
+   {
+ res = false;
+ break;
+   }
+   }

also check for the dead code case and DCE those stmts here.  Like so:

  if (has_zero_uses (lhs))
    {
      gsi_remove (&gsi_from, true);
      continue;
    }

before the above loop.

Richard.

> ChangeLog:
>
> 2016-02-05  Yuri Rumyantsev  
>
> PR tree-optimization/69652
> * tree-vect-loop.c (optimize_mask_stores): Move declaration of STMT1
> to nested loop, introduce new SCALAR_VUSE vector to keep vuse of all
> skipped scalar statements, introduce variable LAST_VUSE to keep
> vuse of LAST_STORE, add assertion that SCALAR_VUSE is empty in the
> beginning of current masked store processing, did source re-formatting,
> skip parsing of debug gimples, stop processing if a gimple with
> volatile operand has been encountered, save scalar statement
> with vuse in SCALAR_VUSE, skip processing debug statements in IMM_USE
> iterator, change vuse of all saved scalar statements to LAST_VUSE if
> it makes sense.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/torture/pr69652.c: New test.
>
> 2016-02-04 19:40 GMT+03:00 Jakub Jelinek :
>> On Thu, Feb 04, 2016 at 05:46:27PM +0300, Yuri Rumyantsev wrote:
>>> Here is a patch that cures the issues with incorrect vuse for scalar
>>> statements during code motion, i.e. if vuse of scalar statement is
>>> vdef of masked store which has been sunk to new basic block, we must
>>> fix it up.  The patch also addresses almost all remarks pointed out by
>>> Jakub.
>>>
>>> Bootstrapping and regression testing on x86-64 did not show any new failures.
>>> Is it OK for trunk?
>>>
>>> ChangeLog:
>>> 2016-02-04  Yuri Rumyantsev  
>>>
>>> PR tree-optimization/69652
>>> * tree-vect-loop.c (optimize_mask_stores): Move declaration of STMT1
>>> to nested loop, introduce new SCALAR_VUSE vector to keep vuse of all
>>> skipped scalar statements, introduce variable LAST_VUSE that has
>>> vuse of LAST_STORE, add assertion that SCALAR_VUSE is empty in the
>>> beginning of current masked store processing, did source re-formatting,
>>> skip parsing of debug gimples, stop processing when call or gimple
>>> with volatile operand have been encountered, save scalar statement
>>> with vuse in SCALAR_VUSE, skip processing debug statements in IMM_USE
>>> iterator, change vuse of all saved scalar statements to LAST_VUSE if
>>> it makes sense.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.dg/torture/pr69652.c: New test.
>>
>> Your mailer breaks ChangeLog formatting, so it is hard to check the
>> formatting of the ChangeLog entry.
>>
>> diff --git a/gcc/testsuite/gcc.dg/torture/pr69652.c b/gcc/testsuite/gcc.dg/torture/pr69652.c
>> new file mode 100644
>> index 000..91f30cf
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/torture/pr69652.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ffast-math -ftree-vectorize " } */
>> +/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
>> +
>> +void fn1(double **matrix, int column, int row, int n)
>> +{
>> +  int k;
>> +  for (k = 0; k < n; k++)
>> +if (matrix[row][k] != matrix[column][k])
>> +  {
>> +   matrix[column][k] = -matrix[column][k];
>> +   matrix[row][k] = matrix[row][k] - matrix[column][k];
>> +  }
>> +}
>> \ No newline at end of file
>>
>> Please make sure the last line of the test is a new-line.
>>
>> @@ -6971,6 +6972,8 @@ optimize_mask_stores (struct loop *loop)
>>gsi_next (&gsi))
>> {
>>   stmt = gsi_stmt (gsi);
>> + if (is_gimple_debug (stmt))
>> +   continue;
>>   if (is_gimple_call (stmt)
>>   && gimple_call_internal_p (stmt)

[PATCH] Fix PR69726

2016-02-09 Thread Richard Biener

It turns out if-conversion's poor job on

 if (a)
   x[i] = ...;
 else
   x[i] = ...;

results in bogus uninit warnings of x[i] for a variety of reasons.
First of all forwprop (aka match.pd simplification) doesn't fix up
all of if-conversion's poor job, as canonicalization sometimes
inverts the condition in [VEC_]COND_EXPRs and thus the existing
A ? B : (A ? X : C) -> A ? B : C pattern doesn't apply.  The match.pd
hunk fixes this (albeit in an awkward way - I don't feel like mucking
with genmatch at this stage, nor exactly for the poor [VEC_]COND_EXPR
IL we should rather fix).  Second, the late uninit pass is confused
by the left-over dead code, in this case a dead load feeding a dead
VEC_COND_EXPR.  Adding a DCE pass before late uninit as the comment
in passes.def suggests fixes this and also should avoid creating the dead
RTL I've sometimes seen.
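
As a sanity check on the inverted-condition simplification, the following
small C sketch compares the nested selects against the simplified one for both
values of the condition.  It is plain C, not match.pd syntax; the function
names are made up for this illustration, and the second nested form is my
reading of the then-arm case, A ? (!A ? X : B) : C.

```c
#include <stdbool.h>

/* Nested select in the else arm: A ? B : (!A ? C : X).  */
static int
nested_else_select (bool a, int b, int c, int x)
{
  return a ? b : (!a ? c : x);
}

/* Nested select in the then arm: A ? (!A ? X : B) : C.  */
static int
nested_then_select (bool a, int b, int c, int x)
{
  return a ? (!a ? x : b) : c;
}

/* The simplified form both of the above should reduce to: A ? B : C.  */
static int
simplified_select (bool a, int b, int c)
{
  return a ? b : c;
}

/* Return true if both nested forms agree with the simplified form for
   A == false and A == true, given the operand values B, C and X.  */
static bool
forms_agree (int b, int c, int x)
{
  for (int a = 0; a <= 1; a++)
    {
      int expected = simplified_select (a != 0, b, c);
      if (nested_else_select (a != 0, b, c, x) != expected
          || nested_then_select (a != 0, b, c, x) != expected)
        return false;
    }
  return true;
}
```

In both nested forms the X operand is unreachable, which is why the load
feeding it is dead once the simplification has been applied.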

Due to the PR69719 fix we're now over the alias-test limit for the
testcase (well, all alias tests are bogus, see PR69732), so I upped
that limit for the testcase.  I'm investigating the job done there.

Bootstrap and regtest is currently running on x86_64-unknown-linux-gnu.

Richard.

2016-02-09  Richard Biener  

PR tree-optimization/69726
* passes.def: Add DCE pass before late uninit.
* match.pd: Add A ? B : (!A ? C : X) -> A ? B : C patterns to
really fixup if-conversions job.

* gcc.dg/uninit-22.c: New testcase.

Index: gcc/passes.def
===
*** gcc/passes.def  (revision 233241)
--- gcc/passes.def  (working copy)
*** along with GCC; see the file COPYING3.
*** 322,336 
NEXT_PASS (pass_fold_builtins);
NEXT_PASS (pass_optimize_widening_mul);
NEXT_PASS (pass_tail_calls);
!   /* FIXME: If DCE is not run before checking for uninitialized uses,
 we may get false warnings (e.g., testsuite/gcc.dg/uninit-5.c).
 However, this also causes us to misdiagnose cases that should be
!real warnings (e.g., testsuite/gcc.dg/pr18501.c).
! 
!To fix the false positives in uninit-5.c, we would have to
!account for the predicates protecting the set and the use of each
!variable.  Using a representation like Gated Single Assignment
!may help.  */
/* Split critical edges before late uninit warning to reduce the
   number of false positives from it.  */
NEXT_PASS (pass_split_crit_edges);
--- 322,332 
NEXT_PASS (pass_fold_builtins);
NEXT_PASS (pass_optimize_widening_mul);
NEXT_PASS (pass_tail_calls);
!   /* If DCE is not run before checking for uninitialized uses,
 we may get false warnings (e.g., testsuite/gcc.dg/uninit-5.c).
 However, this also causes us to misdiagnose cases that should be
!real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
!   NEXT_PASS (pass_dce);
/* Split critical edges before late uninit warning to reduce the
   number of false positives from it.  */
NEXT_PASS (pass_split_crit_edges);
Index: gcc/match.pd
===
*** gcc/match.pd(revision 233241)
--- gcc/match.pd(working copy)
*** DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
*** 1717,1722 
--- 1717,1745 
   (simplify
(cnd @0 @1 (cnd @0 @2 @3))
(cnd @0 @1 @3))
+  /* A ? B : (!A ? C : X) -> A ? B : C.  */
+  /* ???  This matches embedded conditions open-coded because genmatch
+ would generate matching code for conditions in separate stmts only.
+ The following is still important to merge then and else arm cases
+ from if-conversion.  */
+  (simplify
+   (cnd @0 @1 (cnd @2 @3 @4))
+   (if (COMPARISON_CLASS_P (@0)
+&& COMPARISON_CLASS_P (@2)
+&& invert_tree_comparison
+(TREE_CODE (@0), HONOR_NANS (TREE_OPERAND (@0, 0))) == TREE_CODE (@2)
+&& operand_equal_p (TREE_OPERAND (@0, 0), TREE_OPERAND (@2, 0), 0)
+&& operand_equal_p (TREE_OPERAND (@0, 1), TREE_OPERAND (@2, 1), 0))
+(cnd @0 @1 @3)))
+  (simplify
+   (cnd @0 (cnd @1 @2 @3) @4)
+   (if (COMPARISON_CLASS_P (@0)
+&& COMPARISON_CLASS_P (@1)
+&& invert_tree_comparison
+(TREE_CODE (@0), HONOR_NANS (TREE_OPERAND (@0, 0))) == TREE_CODE (@1)
+&& operand_equal_p (TREE_OPERAND (@0, 0), TREE_OPERAND (@1, 0), 0)
+&& operand_equal_p (TREE_OPERAND (@0, 1), TREE_OPERAND (@1, 1), 0))
+(cnd @0 @3 @4)))
  
   /* A ? B : B -> B.  */
   (simplify
Index: gcc/testsuite/gcc.dg/uninit-22.c
===
*** gcc/testsuite/gcc.dg/uninit-22.c(revision 0)
--- gcc/testsuite/gcc.dg/uninit-22.c(revision 0)
***
*** 0 
--- 1,69 
+ /* { dg-do compile } */
+ /* { dg-options "-O3 -Wuninitialized --param vect-max-version-for-alias-checks=20" } */
+ 
+ #include 
+ 
+ #define A1  2896 /* (1/sqrt(2))<<12 */
+ 

Re: [PATCH] S/390: PR 69625: Add test case

2016-02-09 Thread Dominik Vogt
On Fri, Feb 05, 2016 at 05:07:57PM +0100, Dominik Vogt wrote:
> The attached patch adds a testcase for PR 69625.

Version 2 also runs with -m31.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
ChangeLog

* gcc.target/s390/pr69625.c: Add test case.
From 5c539cfea4292dc20bb5e7f854101997f11bc215 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 5 Feb 2016 15:13:08 +0100
Subject: [PATCH] S/390: PR 69625: Add test case.

---
 gcc/testsuite/gcc.target/s390/pr69625.c | 37 +
 1 file changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr69625.c

diff --git a/gcc/testsuite/gcc.target/s390/pr69625.c b/gcc/testsuite/gcc.target/s390/pr69625.c
new file mode 100644
index 000..f717183
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr69625.c
@@ -0,0 +1,37 @@
+/* Test for PR 69625; make sure that a leaf vararg function does not overwrite
+   the caller's r6.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+extern void abort (void);
+
+__attribute__ ((noinline))
+int
+foo (int x, ...)
+{
+  __builtin_va_list vl;
+  int i;
+
+  __asm__ __volatile__ ("lhi %%r6,1" : : : "r6");
+  __builtin_va_start(vl, x);
+  for (i = 2; i <= 6; i++)
+    x += __builtin_va_arg(vl, int);
+  __builtin_va_end (vl);
+
+  return x;
+}
+
+__attribute__ ((noinline))
+void
+bar (int r2, int r3, int r4, int r5, int r6)
+{
+  foo (r2, r3, r4, r5, r6);
+  if (r6 != 6)
+    abort ();
+}
+
+int
+main (void)
+{
+  bar (2, 3, 4, 5, 6);
+}
-- 
2.3.0



Re: Fix PR67639

2016-02-09 Thread Matthias Klose

On 08.02.2016 15:26, Bernd Schmidt wrote:
> On 12/21/2015 08:39 PM, Jeff Law wrote:
>> On 12/18/2015 11:38 AM, Bernd Schmidt wrote:
>>> In an earlier fix, the following change was made in varasm.c for invalid
>>> register variables:
>>>
>>> --- trunk/gcc/varasm.c  2014/08/26 14:59:59  214525
>>> +++ trunk/gcc/varasm.c  2014/08/26 17:06:31  214526
>>> @@ -1371,6 +1371,11 @@ make_decl_rtl (tree decl)
>>> /* As a register variable, it has no section.  */
>>> return;
>>>   }
>>> +  /* Avoid internal errors from invalid register
>>> + specifications.  */
>>> +  SET_DECL_ASSEMBLER_NAME (decl, NULL_TREE);
>>> +  DECL_HARD_REGISTER (decl) = 0;
>>> +  return;
>>>   }
>>>
>>> As seen in PR67639, this makes the IL inconsistent and triggers another
>>> internal error where we expect to see an SSA_NAME instead of a VAR_DECL.
>>>
>>> The following patch extends the above slightly, by also setting
>>> DECL_EXTERNAL to pretend that the erroneous variable is actually a
>>> global.
>>>
>>> Bootstrapped and tested on x86_64-linux, ok?
>>
>> OK.
>
> Turns out 65702 is a dup and this should go into gcc-5 as well. Ok to backport?

ChangeLog entry is not backported.



Re: [openacc] reference-typed data mappings

2016-02-09 Thread Cesar Philippidis
On 02/01/2016 09:57 AM, Cesar Philippidis wrote:

> This patch fixes a couple of bugs preventing c++ reference-typed
> variables from working in openacc data clauses. These fixes include:
> 
>  * Teach the gimplifier to filter out pointer data mappings for
>OACC_DATA, OACC_ENTER_DATA, OACC_EXIT_DATA and OACC_UPDATE regions.
>Along with using a firstprivate mapping for the array base pointers
>in OACC_DATA, OACC_PARALLEL and OACC_KERNELS regions.
> 
>  * Make the data mapping errors emitted by the c and c++ front ends
>more consistent with openacc by reporting data mapping errors, not
>omp-specific map errors.
> 
>  * Add some light checking for duplicate reference mappings in c++. The
>c++ FE still fails to detect duplicate component refs, but that's not
>working in openacc at the moment, anyway.
> 
> Jakub, the latter issue also affects openmp. I've added a simple openmp
> test case, but it could probably be more extensive. Can you add more
> test coverage or tell me what should be included?

While working on a different reduction problem, I noticed that both the
c and c++ front ends are treating reductions as generic data clauses.
That means, parallel reductions of the form

  #pragma acc parallel copy(foo) reduction(+:foo)

would get treated as an error. This patch fixes that, in addition to the
changes listed above.
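
For illustration, here is a minimal C sketch of a reduction variable that also
appears in a data clause.  It is not one of the new test cases; sum_array is a
hypothetical function, and the combined "parallel loop" construct and the
copyin clause for the array are my own choices for a complete example.  When
compiled without OpenACC support the pragma is ignored and the loop runs
sequentially with the same result.

```c
/* Sum the first N elements of V.  FOO is named in both a copy data clause
   and a reduction clause on the same construct.  */
static int
sum_array (const int *v, int n)
{
  int foo = 0;

#pragma acc parallel loop copyin(v[0:n]) copy(foo) reduction(+:foo)
  for (int i = 0; i < n; i++)
    foo += v[i];

  return foo;
}
```

With the behavior described above, a compiler that treats the reduction as a
generic data clause would reject this because foo is mapped twice.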

Is this patch ok for trunk?

Cesar
2016-02-09  Cesar Philippidis  

	gcc/c/
	* c-parser.c (c_parser_oacc_all_clauses): Update call to
	c_finish_omp_clauses.
	(c_parser_omp_all_clauses): Likewise.
	(c_parser_oacc_cache): Likewise.
	(c_parser_oacc_loop): Likewise.
	(omp_split_clauses): Likewise.
	(c_parser_omp_declare_target): Likewise.
	(c_parser_cilk_for): Likewise.
	* c-tree.h (c_finish_omp_clauses): Update prototype.
	* c-typeck.c (c_finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Allow OpenACC
	reductions variables to appear in data clauses.

	gcc/cp/
	* cp-tree.h (finish_omp_clauses): Update prototype.
	* parser.c (cp_parser_oacc_all_clauses): Update call to
	finish_omp_clauses.
	(cp_parser_omp_all_clauses): Likewise.
	(cp_parser_omp_for_loop): Likewise.
	(cp_omp_split_clauses): Likewise.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	(cp_parser_omp_declare_target): Likewise.
	(cp_parser_cilk_for): Likewise.
	* pt.c (tsubst_attribute): Update call to tsubst_omp_clauses.
	(tsubst_omp_clauses): New is_oacc argument.  Use it when calling
	finish_omp_clauses.
	(tsubst_omp_for_iterator): Update call to finish_omp_clauses.
	(tsubst_expr): Update calls to tsubst_omp_clauses.
	* semantics.c (finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Check for
	duplicate reference mappings.  Exclude "omp declare simd"-isms when
	processing OpenACC clauses.  Allow OpenACC reductions variables to
	appear in data clauses.
	(finish_omp_for): Update call to finish_omp_clauses.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Consider OACC_{DATA, PARALLEL,
	KERNELS} when setting target_firstprivatize_array_bases.  Consider
	OACC_{DATA, ENTER_DATA, EXIT_DATA, UPDATE} when filtering out pointer
	mappings.  Also filter out GOMP_MAP_POINTER.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Adjust test.
	* c-c++-common/goacc/declare-2.c: Likewise.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/parallel-reduction.c: New test.
	* c-c++-common/goacc/pcopy.c: Adjust test.
	* c-c++-common/goacc/pcopyin.c: Likewise.
	* c-c++-common/goacc/pcopyout.c: Likewise.
	* c-c++-common/goacc/pcreate.c: Likewise.
	* c-c++-common/goacc/present-1.c: Likewise.
	* c-c++-common/goacc/private-reduction-1.c: New test.
	* c-c++-common/goacc/reduction-5.c: New test.
	* g++.dg/goacc/data-1.C: New test.
	* g++.dg/goacc/data-2.C: New test.
	* g++.dg/goacc/reduction-1.C: New test.
	* g++.dg/goacc/update.C: New test.
	* g++.dg/gomp/template-data.C: New test.

	libgomp/
	* testsuite/libgomp.c++/non-scalar-data.C: New test.
	* testsuite/libgomp.oacc-c++/data-references.C: New test.
	* testsuite/libgomp.oacc-c++/data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/non-scalar-data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/update-reference.C: New test.
	* testsuite/libgomp.oacc-c++/update-template.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/nested-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/present-2.c: Likewise.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index eede3a7..20ff7da 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -13115,7 +13115,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask 

[PATCH] Fix gcc.dg/vect/vect-mask-store-move-1.c

2016-02-09 Thread Richard Biener

Tested on x86_64-linux.

2016-02-09  Richard Biener  

* gcc.dg/vect/vect-mask-store-move-1.c: Add missing space.

Index: gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
===
--- gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c  (revision 233244)
+++ gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c  (working copy)
@@ -15,4 +15,4 @@ void foo (int n)
   }
 }
 
-/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect"{ target { i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" { target { i?86-*-* x86_64-*-* } } } } */


openacc reference reductions

2016-02-09 Thread Cesar Philippidis
This patch teaches omp-lower how to handle reference-typed reductions,
which are common in Fortran subroutines. Unlike the implementation in
the gomp4 branch, this patch doesn't rewrite the reference reduction
variables as local variables. Instead, a local copy is created for the
reduction variable.

There are two things that stick out in this patch. First, I took care
not to remap any reduction variable appearing on a parallel directive
inside an offloaded region in order to keep it private. Second, you'll
notice that I'm creating quite a few temporary pointers inside
lower_oacc_reductions. Without those separate pointers, I'd get SSA
validation errors because those pointers get dereferenced multiple times.
I didn't investigate that problem further.

Is this patch ok for trunk?

Cesar
2016-02-09  Cesar Philippidis  

	gcc/
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-1.c: Add more test coverage.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-loop-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Add more test
	coverage.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d41688b..8a66760 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -308,6 +308,28 @@ is_oacc_kernels (omp_context *ctx)
 	  == GF_OMP_TARGET_KIND_OACC_KERNELS));
 }
 
+/* Ret

Re: openacc reference reductions

2016-02-09 Thread Nathan Sidwell

While I've not looked at the rest of the patch, this bit stood out:


+static bool
+is_oacc_parallel_reduction (tree var, omp_context *ctx)
+{
+  if (!is_oacc_parallel (ctx))
+return false;
+
+  tree clauses = gimple_omp_target_clauses (ctx->stmt);
+
+  /* Don't install a local copy of the decl if it used
+ inside a acc parallel reduction.  */


^^ comment is misleading -- this routine's not installing anything


+  if (is_oacc_parallel (ctx))


^^ already checked above.


+for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+ && OMP_CLAUSE_DECL (c) == var)
+   return true;
+
+  return false;
+}
+





RE: [PATCH] [ARC] Add single/double IEEE precission FPU support.

2016-02-09 Thread Claudiu Zissulescu
Please find attached a reworked patch. It doesn't contain the ABI modifications 
as I notified you earlier in an email.  Also, you may have extra comments 
regarding these original observations:

>+  /* ARCHS has 64-bit data-path which makes use of the even-odd paired
>+ registers.  */
>+  if (TARGET_HS)
>+{
>+  for (regno = 1; regno < 32; regno +=2)
>+   {
>+ arc_hard_regno_mode_ok[regno] = S_MODES;
>+   }
>+}
>+
>
>Does TARGET_HS with -mabi=default allow for passing DFmode / DImode 
>arguments
>in odd registers?  I fear you might run into reload trouble when trying to
>access the values.

The current ABI passes the DI-like modes in any register pair. This should not 
be an issue as the movdi_insn and movdf_insn should handle those exceptional 
cases. As for partial passing of arguments, move_block_from_reg() should take 
care of exceptional cases like DImode.

>+ if (!link_insn
>+ /* Avoid FPU instructions.  */
>+ || (GET_MODE (SET_DEST
>+   (PATTERN (link_insn))) == CC_FPUmode)
>+ || (GET_MODE (SET_DEST
>+   (PATTERN (link_insn))) == CC_FPU_UNEQmode)
>+ || (GET_MODE (SET_DEST
>+   (PATTERN (link_insn))) == CC_FPUEmode))
>
>It's pointless to search for the CC setter and then bail out this late.
>The mode is also accessible in the CC user, so after we have computed
>pc_target, we can check the condition code register in the comparison
>XEXP (pc_target, 1) for its mode.

In most cases, checking only the CC user may be sufficient. However, there 
are cases (I found only one) where the CC user has a different mode than 
the CC setter.  This happens when running the gcc.dg/pr56424.c test. Here, 
the CC_FPU mode cstore is simplified by the following steps, losing the CC_FPU 
mode:

In the expand:
   18: cc:CC_FPU=cmp(r159:DF,r162:DF)
   19: r163:SI=cc:CC_FPU<0
   20: r161:QI=r163:SI#0
   21: r153:SI=zero_extend(r161:QI)
   22: cc:CC_ZN=cmp(r153:SI,0)
   23: pc={(cc:CC_ZN!=0)?L28:pc}

Then after combine we get this:
   18: cc:CC_FPU=cmp(r2:DF,r4:DF)
  REG_DEAD r4:DF
  REG_DEAD r2:DF
   23: pc={(cc:CC_ZN<0)?L28:pc}
  REG_DEAD cc:CC_ZN
  REG_BR_PROB 6102

Ok to apply?
Claudiu


0001-ARC-Add-single-double-IEEE-precission-FPU-support.patch
Description: 0001-ARC-Add-single-double-IEEE-precission-FPU-support.patch


Re: [RFC] Combine vectorized loops with its scalar remainder.

2016-02-09 Thread Ilya Enkovich
2015-12-15 19:41 GMT+03:00 Yuri Rumyantsev :
> Hi Richard,
>
> I re-designed the patch to determine ability of loop masking on fly of
> vectorization analysis and invoke it after loop transformation.
> Test-case is also provided.
>
> what is your opinion?
>
> Thanks.
> Yuri.
>

Hi,

I'm going to start work on extending this patch to handle mixed mask sizes,
support vectorization of the peeled loop tail, and fix the profitability
estimation so that it chooses the proper loop tail processing. Here is a
short list of the planned changes:

1. Don't put any restriction on the mask type when checking whether a
statement can be masked. Instead, just store all required masks in
LOOP_VINFO_REQUIRED_MASKS. After all statements are checked, we additionally
check that all required masks can be produced (that we have proper
comparison, widening and narrowing support).

2. In vect_estimate_min_profitable_iters, compute the overhead of mask
creation, decide what we should do with the loop tail (nothing, vectorize,
or combine with the loop body), and additionally return the number of tail
iterations required for the chosen tail processing to be profitable.

3. In vect_transform_loop, depending on the chosen strategy, either mask the
whole loop or produce a vectorized tail. For now it's not fully clear to me
what the best way to get a vectorized tail is.

The first option is to just peel one iteration after the loop is vectorized.
But our masking functions use LOOP_VINFO and STMT_VINFO structures, which we
lose during peeling.

Another option is to peel the scalar loop and then just run the vectorizer
one more time to vectorize and mask it.

We could also peel the vectorized loop and use the original version (with all
STMT_VINFO still available) as the tail and the peeled version as the main
loop.

Currently I think the best option is to peel the scalar loop and run the
vectorizer one more time on it. This option is simpler and can also be used
to vectorize the loop tail with a smaller vector size when the target doesn't
support masking or masking is not profitable.
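
To make the two tail strategies concrete, here is a rough standalone sketch in plain C. It is not vectorizer output: VF is an assumed vectorization factor, the inner loops over lanes stand in for single vector operations, and the function names are made up for illustration.

```c
#include <stddef.h>

#define VF 4			/* Assumed vectorization factor.  */

/* Strategy 1: "vector" main loop in VF-sized chunks followed by a
   scalar remainder loop for the last n % VF elements.  */

static int
sum_scalar_tail (const int *a, size_t n)
{
  int sum = 0;
  size_t i = 0;

  for (; i + VF <= n; i += VF)
    for (size_t lane = 0; lane < VF; lane++)
      sum += a[i + lane];

  for (; i < n; i++)
    sum += a[i];

  return sum;
}

/* Strategy 2: the tail is combined with the loop body.  Every
   iteration handles VF lanes and a per-lane mask disables the lanes
   that would run past the end of the array.  */

static int
sum_masked (const int *a, size_t n)
{
  int sum = 0;

  for (size_t i = 0; i < n; i += VF)
    for (size_t lane = 0; lane < VF; lane++)
      {
	int mask = (i + lane) < n;
	if (mask)
	  sum += a[i + lane];
      }

  return sum;
}
```

Both functions compute the same result for any n. The difference is that the masked variant pays for mask computation in every iteration, which is the overhead the profitability estimate in item 2 would have to account for.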

Any comments?

Thanks,
Ilya


Re: openacc reference reductions

2016-02-09 Thread Cesar Philippidis
On 02/09/2016 07:33 AM, Nathan Sidwell wrote:
> While I've not looked at the rest of the patch, this bit stood out:
> 
>> +static bool
>> +is_oacc_parallel_reduction (tree var, omp_context *ctx)
>> +{
>> +  if (!is_oacc_parallel (ctx))
>> +return false;
>> +
>> +  tree clauses = gimple_omp_target_clauses (ctx->stmt);
>> +
>> +  /* Don't install a local copy of the decl if it used
>> + inside a acc parallel reduction.  */
> 
> ^^ comment is misleading -- this routine's not installing anything
> 
>> +  if (is_oacc_parallel (ctx))
> 
> ^^ already checked above.
> 
>> +for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>> +  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
>> +  && OMP_CLAUSE_DECL (c) == var)
>> +return true;
>> +
>> +  return false;
>> +}
>> +

Thanks for catching that. Those are artifacts from when this code used
to be located exclusively in scan_sharing_clauses. I've updated the
patch with those changes.
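
For readers without the updated attachment, the shape of the helper after both remarks are applied can be sketched as below. This is a standalone approximation, not the code from the patch: struct clause and struct context are hypothetical stand-ins for GCC's clause chain and omp_context, and decls are mocked as plain integers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's OMP clause chain; not the real
   tree API.  */
enum clause_code { CLAUSE_REDUCTION, CLAUSE_COPY };

struct clause
{
  enum clause_code code;
  int decl;			/* Mock decl id.  */
  struct clause *chain;
};

struct context
{
  bool is_oacc_parallel;
  struct clause *clauses;
};

/* Return true if VAR appears in a reduction clause of the OpenACC
   parallel region described by CTX.  */

static bool
is_oacc_parallel_reduction (int var, const struct context *ctx)
{
  /* Single guard; the clause walk below no longer repeats it.  */
  if (!ctx->is_oacc_parallel)
    return false;

  for (const struct clause *c = ctx->clauses; c; c = c->chain)
    if (c->code == CLAUSE_REDUCTION && c->decl == var)
      return true;

  return false;
}
```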

Cesar

2016-02-09  Cesar Philippidis  

	gcc/
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-1.c: Add more test coverage.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-loop-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Add more test
	coverage.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c

Re: [RFC] Combine vectorized loops with its scalar remainder.

2016-02-09 Thread Jeff Law

On 02/09/2016 09:09 AM, Ilya Enkovich wrote:


> Another option is to peel scalar loop and then just run vectorizer
> one more time to vectorize and mask it.
>
> Also we may peel vectorized loop and use original version (with all
> STMT_VINFO still available) as a tail and peeled version as a main
> loop.
>
> Currently I think the best option is to peel scalar loop and run
> vectorizer one more time for it. This option is simpler and can also
> be used to vectorize loop tail with a smaller vector size when target
> doesn't support masking or masking is not profitable.

In general, a path where we have peeling & masking as an option seems 
wise.  The sense I've gotten from rth was that there are going to be 
classes of loops where that's going to be the best option.


jeff


Re: [ARM] Use vector wide add for mixed-mode adds

2016-02-09 Thread Kyrill Tkachov

Hi Michael,

On 17/12/15 00:02, Michael Collison wrote:

Kyrill,

I have attached a patch that addresses your comments. The only change I would ask you to reconsider is renaming the function 'bool aarch32_simd_check_vect_par_cnst_half'. This function was copied from the aarch64 port and I thought it was important to match the naming for maintenance purposes. I did rename the function to 'bool arm_simd_check_vect_par_cnst_half_p'. I changed 'aarch32' to 'arm' and added '_p' per your suggestions. Is this okay?




Ok, that's fine with me.


I implemented all your other change suggestions.



Thanks, sorry it took a long time to get back to this, I was busy with 
regression-fixing patches as we're
in bug-fixing mode...


2015-12-16 Michael Collison  

* config/arm/neon.md (widen_sum): New patterns where
mode is VQI to improve mixed mode vectorization.
* config/arm/neon.md (vec_sel_widen_ssum_lo3): New
define_insn to match low half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_ssum_hi3): New
define_insn to match high half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_usum_lo3): New
define_insn to match low half of unsigned vaddw.
* config/arm/neon.md (vec_sel_widen_usum_hi3): New
define_insn to match high half of unsigned vaddw.
* config/arm/arm.c (arm_simd_vect_par_cnst_half): New function.
(arm_simd_check_vect_par_cnst_half_p): Likewise.
* config/arm/arm-protos.h (arm_simd_vect_par_cnst_half): Prototype
for new function.
(arm_simd_check_vect_par_cnst_half_p): Likewise.
* config/arm/predicates.md (vect_par_constant_high): Support
big endian and simplify by calling
arm_simd_check_vect_par_cnst_half
(vect_par_constant_low): Likewise.
* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
that arm neon support vector widen sum of HImode TO SImode.



I've tried this out and I have a few comments.
The arm.c hunk doesn't apply to current trunk anymore due to context.
Can you please rebase the patch?
I've fixed it up manually in my tree so I can build it.
With this patch I'm seeing two PASS->FAIL on arm-none-eabi:
FAIL: gcc.dg/vect/slp-reduc-3.c -flto -ffat-lto-objects scan-tree-dump-times vect 
"vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 1
My compiler is configured with --with-float=hard --with-cpu=cortex-a9 
--with-fpu=neon --with-mode=thumb
Can you please look into these? Maybe it's just the tests that need adjustment?

Also, I'm seeing the new tests give an error:
ERROR: gcc.target/arm/neon-vaddws16.c: Unrecognized option type: arm_neon_ok for " 
dg-add-options 3 arm_neon_ok "
UNRESOLVED: gcc.target/arm/neon-vaddws16.c: Unrecognized option type: arm_neon_ok for 
" dg-add-options 3 arm_neon_ok "

That's because the dg-add-options argument should be arm_neon rather than 
arm_neon_ok.
Also, since the new tests are compile-only the effective target check should be 
arm_neon_ok rather than arm_neon_hw.

I also see ./contrib/check_GNU_style.sh complaining about some minor style 
issues like trailing whitespace and
blocks of whitespace that should be replaced with tabs.

In any case, this patch is GCC 7 material at this point, so I think with the 
above issues resolved
(and the FAILs investigated) this should be in good shape.

Thanks,
Kyrill


Re: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian

2016-02-09 Thread Charles Baylis
On 8 February 2016 at 11:42, Kyrill Tkachov  wrote:
> Hi Charles,
>
>
> On 03/02/16 18:59, charles.bay...@linaro.org wrote:
>>

>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx
>> op1, rtx sel)
>> arm_expand_vec_perm_1 (target, op0, op1, sel);
>>   }
>>   +/* map lane ordering between architectural lane order, and GCC lane
>> order,
>> +   taking into account ABI.  See comment above output_move_neon for
>> details.  */
>> +static int
>> +neon_endian_lane_map (machine_mode mode, int lane)
>
>
> s/map/Map/
> New line between comment and function signature.

Done.

>> +{
>> +  if (BYTES_BIG_ENDIAN)
>> +  {
>> +int nelems = GET_MODE_NUNITS (mode);
>> +/* Reverse lane order.  */
>> +lane = (nelems - 1 - lane);
>> +/* Reverse D register order, to match ABI.  */
>> +if (GET_MODE_SIZE (mode) == 16)
>> +  lane = lane ^ (nelems / 2);
>> +  }
>> +  return lane;
>> +}
>> +
>> +/* some permutations index into pairs of vectors, this is a helper
>> function
>> +   to map indexes into those pairs of vectors.  */
>> +static int
>> +neon_pair_endian_lane_map (machine_mode mode, int lane)
>
>
> Similarly, s/some/Some/ and new line after comment.

Done.

>> +{
>> +  int nelem = GET_MODE_NUNITS (mode);
>> +  if (BYTES_BIG_ENDIAN)
>> +lane =
>> +  neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem);
>> +  return lane;
>> +}
>> +
>>   /* Generate or test for an insn that supports a constant permutation.
>> */
>> /* Recognize patterns for the VUZP insns.  */
>> @@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
>> unsigned int i, odd, mask, nelt = d->nelt;
>> rtx out0, out1, in0, in1;
>> rtx (*gen)(rtx, rtx, rtx, rtx);
>> +  int first_elem;
>> +  int swap;
>>
>
> Just make this a bool.

As discussed on IRC, this variable does contain an integer. I have
renamed it as swap_nelt, and changed the test on it below.

[snip]

>>   @@ -28258,10 +28296,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d
>> *d)
>>   in0 = d->op0;
>> in1 = d->op1;
>> -  if (BYTES_BIG_ENDIAN)
>> +  if (swap)
>>   {
>> std::swap (in0, in1);
>> -  odd = !odd;
>>   }
>
> remove the braces around the std::swap

Done. Also changed if (swap) to if (swap_nelt != 0)

[snip]

>> @@ -0,0 +1,24 @@
>> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
>> +
>> +#define SIZE 128
>> +unsigned short _Alignas (16) in[SIZE];
>> +
>> +extern void abort (void);
>> +
>> +__attribute__ ((noinline)) int
>> +test (unsigned short sum, unsigned short *in, int x)
>> +{
>> +  for (int j = 0; j < SIZE; j += 8)
>> +sum += in[j] * x;
>> +  return sum;
>> +}
>> +
>> +int
>> +main ()
>> +{
>> +  for (int i = 0; i < SIZE; i++)
>> +in[i] = i;
>> +  if (test (0, in, 1) != 960)
>> +abort ();
>
>
> AFAIK tests here usually prefer __builtin_abort ();
> That way you don't have to declare the abort prototype in the beginning.

Done.
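
As a reading aid for the lane mapping discussed above, here is a standalone sketch of the two helpers. It mirrors the logic of the quoted hunk but is not the patch itself: big_endian, nelems and q_reg are illustrative parameters standing in for BYTES_BIG_ENDIAN, GET_MODE_NUNITS (mode) and GET_MODE_SIZE (mode) == 16, and nelems is assumed to be a power of two.

```c
#include <stdbool.h>

/* Map LANE between architectural lane order and GCC lane order.
   Identity on little endian.  */

static int
lane_map (bool big_endian, int nelems, bool q_reg, int lane)
{
  if (big_endian)
    {
      /* Reverse lane order.  */
      lane = nelems - 1 - lane;
      /* Reverse D register order within a Q register.  */
      if (q_reg)
	lane ^= nelems / 2;
    }
  return lane;
}

/* Same mapping for an index into a pair of vectors: the low bits
   select the lane, the NELEMS bit selects which vector.  */

static int
pair_lane_map (bool big_endian, int nelems, bool q_reg, int lane)
{
  if (big_endian)
    lane = lane_map (big_endian, nelems, q_reg, lane & (nelems - 1))
	   + (lane & nelems);
  return lane;
}
```

For a four-element Q register on big endian this maps lanes 0, 1, 2, 3 to 1, 0, 3, 2: the order is reversed and then the two D halves are swapped back.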

Updated patch attached
From 99a536e2e10e3759a5de88422fadcabb22084b2f Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Tue, 9 Feb 2016 15:18:43 +
Subject: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian

gcc/ChangeLog:

2016-02-09  Charles Baylis  

	PR target/68532
	* config/arm/arm.c (neon_endian_lane_map): New function.
	(neon_pair_endian_lane_map): New function.
	(arm_evpc_neon_vuzp): Allow for big endian lane order.
	* config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big
	endian.
	(vuzpq_s16): Likewise.
	(vuzpq_s32): Likewise.
	(vuzpq_f32): Likewise.
	(vuzpq_u8): Likewise.
	(vuzpq_u16): Likewise.
	(vuzpq_u32): Likewise.
	(vuzpq_p8): Likewise.
	(vuzpq_p16): Likewise.

gcc/testsuite/ChangeLog:

2016-02-09  Charles Baylis  

	PR target/68532
	* gcc.c-torture/execute/pr68532.c: New test.

Change-Id: Ifd35d79bd42825f05403a1b96d8f34ef0f21dac3

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d8a2745..95ee9a5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28208,6 +28208,37 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
   arm_expand_vec_perm_1 (target, op0, op1, sel);
 }
 
+/* Map lane ordering between architectural lane order, and GCC lane order,
+   taking into account ABI.  See comment above output_move_neon for details.  */
+
+static int
+neon_endian_lane_map (machine_mode mode, int lane)
+{
+  if (BYTES_BIG_ENDIAN)
+  {
+int nelems = GET_MODE_NUNITS (mode);
+/* Reverse lane order.  */
+lane = (nelems - 1 - lane);
+/* Reverse D register order, to match ABI.  */
+if (GET_MODE_SIZE (mode) == 16)
+  lane = lane ^ (nelems / 2);
+  }
+  return lane;
+}
+
+/* Some permutations index into pairs of vectors, this is a helper function
+   to map indexes into those pairs of vectors.  */
+
+static int
+neon_pair_endian_lane_map (machine_mode mode, int lane)
+{
+  int nelem = GET_MODE_NUNITS (mode);
+  if (BYTES_BIG_ENDIAN)
+lane =
+  neon_endian

Re: [openacc] reference-typed data mappings

2016-02-09 Thread Cesar Philippidis
On 02/09/2016 07:00 AM, Cesar Philippidis wrote:
> On 02/01/2016 09:57 AM, Cesar Philippidis wrote:
> 
>> > This patch fixes a couple of bugs preventing c++ reference-typed
>> > variables from working in openacc data clauses. These fixes include:
>> > 
>> >  * Teach the gimplifier to filter out pointer data mappings for
>> >OACC_DATA, OACC_ENTER_DATA, OACC_EXIT_DATA and OACC_UPDATE regions.
>> >Along with using a firsptrivate mapping for the array base pointers
>> >in OACC_DATA, OACC_PARALLEL and OACC_KERNELS regions.
>> > 
>> >  * Make the data mapping errors emitted by the c and c++ front ends
>> >more consistent with openacc by reporting data mapping errors, not
>> >omp-specific map errors.
>> > 
>> >  * Add some light checking for duplicate reference mappings in c++. The
>> >c++ FE still fails to detect duplicate component refs, but that's not
>> >working in openacc at the moment, anyway.
>> > 
>> > Jakub, the latter issue also affects openmp. I've added a simple openmp
>> > test case, but it could probably be more extensive. Can you add more
>> > test coverage or tell me what should be included?
> While working on a different reduction problem, I noticed that both the
> c and c++ front ends are treating reductions as generic data clauses.
> That means, parallel reductions of the form
> 
>   #pragma acc copy(foo) reduction(+:foo)
> 
> would get treated as an error. This patch fixes that, in addition to the
> changes listed above.
> 
> Is this patch ok for trunk?

>   libgomp/
>   * testsuite/libgomp.c++/non-scalar-data.C: New test.

I copied the wrong test here. It should be testing omp target, not acc
*. This patch updates that test case.

Cesar

2016-02-09  Cesar Philippidis  

	gcc/c/
	* c-parser.c (c_parser_oacc_all_clauses): Update call to
	c_finish_omp_clauses.
	(c_parser_omp_all_clauses): Likewise.
	(c_parser_oacc_cache): Likewise.
	(c_parser_oacc_loop): Likewise.
	(omp_split_clauses): Likewise.
	(c_parser_omp_declare_target): Likewise.
	(c_parser_cilk_for): Likewise.
	* c-tree.h (c_finish_omp_clauses): Update prototype.
	* c-typeck.c (c_finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Allow OpenACC
	reductions variables to appear in data clauses.

	gcc/cp/
	* cp-tree.h (finish_omp_clauses): Update prototype.
	* parser.c (cp_parser_oacc_all_clauses): Update call to
	finish_omp_clauses.
	(cp_parser_omp_all_clauses): Likewise.
	(cp_parser_omp_for_loop): Likewise.
	(cp_omp_split_clauses): Likewise.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	(cp_parser_omp_declare_target): Likewise.
	(cp_parser_cilk_for): Likewise.
	* pt.c (tsubst_attribute): Update call to tsubst_omp_clauses.
	(tsubst_omp_clauses): New is_oacc argument.  Use it when calling
	finish_omp_clauses.
	(tsubst_omp_for_iterator): Update call to finish_omp_clauses.
	(tsubst_expr): Update calls to tsubst_omp_clauses.
	* semantics.c (finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Check for
	duplicate reference mappings.  Exclude "omp declare simd"-isms when
	processing OpenACC clauses.  Allow OpenACC reductions variables to
	appear in data clauses.
	(finish_omp_for): Update call to finish_omp_clauses.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Consider OACC_{DATA, PARALLEL,
	KERNELS} when setting target_firstprivatize_array_bases.  Consider
	OACC_{DATA, ENTER_DATA, EXIT_DATA, UPDATE} when filtering out pointer
	mappings.  Also filter out GOMP_MAP_POINTER.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Adjust test.
	* c-c++-common/goacc/declare-2.c: Likewise.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/parallel-reduction.c: New test.
	* c-c++-common/goacc/pcopy.c: Adjust test.
	* c-c++-common/goacc/pcopyin.c: Likewise.
	* c-c++-common/goacc/pcopyout.c: Likewise.
	* c-c++-common/goacc/pcreate.c: Likewise.
	* c-c++-common/goacc/present-1.c: Likewise.
	* c-c++-common/goacc/private-reduction-1.c: New test.
	* c-c++-common/goacc/reduction-5.c: New test.
	* g++.dg/goacc/data-1.C: New test.
	* g++.dg/goacc/data-2.C: New test.
	* g++.dg/goacc/reduction-1.C: New test.
	* g++.dg/goacc/update.C: New test.
	* g++.dg/gomp/template-data.C: New test.

	libgomp/
	* testsuite/libgomp.c++/non-scalar-data.C: New test.
	* testsuite/libgomp.oacc-c++/data-references.C: New test.
	* testsuite/libgomp.oacc-c++/data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/non-scalar-data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/update-reference.C: New test.
	* testsuite/libgomp.oacc-c++/update-template.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
	* testsuite/li

Re: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian

2016-02-09 Thread Charles Baylis
On 8 February 2016 at 11:42, Kyrill Tkachov  wrote:

> On 03/02/16 18:59, charles.bay...@linaro.org wrote:
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -28318,15 +28318,21 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
>> unsigned int i, high, mask, nelt = d->nelt;
>> rtx out0, out1, in0, in1;
>> rtx (*gen)(rtx, rtx, rtx, rtx);
>> +  int first_elem;
>> +  bool is_swapped;
>>   if (GET_MODE_UNIT_SIZE (d->vmode) >= 8)
>>   return false;
>>   +  is_swapped = BYTES_BIG_ENDIAN ? true : false;
>
>
> This is just "is_swapped = BYTES_BIG_ENDIAN;"

Done.

>> +
>> /* Note that these are little-endian tests.  Adjust for big-endian
>> later.  */
>
>
> I think you can remove this comment now, like in patch 1/2

Done.

>> +  first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped];
>> +
>> high = nelt / 2;
>> -  if (d->perm[0] == high)
>> +  if (first_elem == neon_endian_lane_map (d->vmode, high))
>>   ;
>> -  else if (d->perm[0] == 0)
>> +  else if (first_elem == neon_endian_lane_map (d->vmode, 0))
>>   high = 0;
>> else
>>   return false;
>> @@ -28334,11 +28340,16 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
>>   for (i = 0; i < nelt / 2; i++)
>>   {
>> -  unsigned elt = (i + high) & mask;
>> -  if (d->perm[i * 2] != elt)
>> +  unsigned elt =
>> +   neon_pair_endian_lane_map (d->vmode, i + high) & mask;
>> +  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i +
>> is_swapped)]
>> + != elt)
>> return false;
>> -  elt = (elt + nelt) & mask;
>> -  if (d->perm[i * 2 + 1] != elt)
>> +  elt =
>> +   neon_pair_endian_lane_map (d->vmode, i + nelt + high)
>> +   & mask;
>
>
> The "& mask" can go on the previous line.

Done

>> +  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i +
>> !is_swapped)]
>> + != elt)
>> return false;
>>   }
>>   @@ -28362,10 +28373,9 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d
>> *d)
>>   in0 = d->op0;
>> in1 = d->op1;
>> -  if (BYTES_BIG_ENDIAN)
>> +  if (is_swapped)
>>   {
>> std::swap (in0, in1);
>> -  high = !high;
>>   }
>
>
> remove the braces around the std::swap.

Done.

> Ok with these changes.
> I've tried out both patches and they do fix execution failures on big-endian
> and don't break any NEON intrinsics tests that I threw at them.

Attached for completeness, will commit once the VUZP patch is OKd.
From 469f82610a4e70284bf23c373b8a73685cad0ec1 Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Tue, 9 Feb 2016 15:18:44 +
Subject: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian

gcc/ChangeLog:

2016-02-09  Charles Baylis  

	PR target/68532
	* config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane
	order.
	* config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big
	endian.
	(vzipq_s16): Likewise.
	(vzipq_s32): Likewise.
	(vzipq_f32): Likewise.
	(vzipq_u8): Likewise.
	(vzipq_u16): Likewise.
	(vzipq_u32): Likewise.
	(vzipq_p8): Likewise.
	(vzipq_p16): Likewise.

Change-Id: I327678f5e73c1de2f413c1d22769ab42ce1d6c16

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 95ee9a5..5562baa 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28318,15 +28318,20 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
   unsigned int i, high, mask, nelt = d->nelt;
   rtx out0, out1, in0, in1;
   rtx (*gen)(rtx, rtx, rtx, rtx);
+  int first_elem;
+  bool is_swapped;
 
   if (GET_MODE_UNIT_SIZE (d->vmode) >= 8)
 return false;
 
-  /* Note that these are little-endian tests.  Adjust for big-endian later.  */
+  is_swapped = BYTES_BIG_ENDIAN;
+
+  first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped];
+
   high = nelt / 2;
-  if (d->perm[0] == high)
+  if (first_elem == neon_endian_lane_map (d->vmode, high))
 ;
-  else if (d->perm[0] == 0)
+  else if (first_elem == neon_endian_lane_map (d->vmode, 0))
 high = 0;
   else
 return false;
@@ -28334,11 +28339,15 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
 
   for (i = 0; i < nelt / 2; i++)
 {
-  unsigned elt = (i + high) & mask;
-  if (d->perm[i * 2] != elt)
+  unsigned elt =
+	neon_pair_endian_lane_map (d->vmode, i + high) & mask;
+  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + is_swapped)]
+	  != elt)
 	return false;
-  elt = (elt + nelt) & mask;
-  if (d->perm[i * 2 + 1] != elt)
+  elt =
+	neon_pair_endian_lane_map (d->vmode, i + nelt + high) & mask;
+  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + !is_swapped)]
+	  != elt)
 	return false;
 }
 
@@ -28362,11 +28371,8 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
 
   in0 = d->op0;
   in1 = d->op1;
-  if (BYTES_BIG_ENDIAN)
-{
-  std::swap (in0, in1);
-  high = !high;
-}
+  if (is_swapped)
+std::swap (in0, in1);
 
   out0 = d->target;
   out1 = gen_reg_rtx (d->vmode);
diff --git a/gcc/config/arm/arm_neon.h b/g

Re: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian

2016-02-09 Thread Kyrill Tkachov


On 09/02/16 17:00, Charles Baylis wrote:

On 8 February 2016 at 11:42, Kyrill Tkachov  wrote:

Hi Charles,


On 03/02/16 18:59, charles.bay...@linaro.org wrote:

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx
op1, rtx sel)
 arm_expand_vec_perm_1 (target, op0, op1, sel);
   }
   +/* map lane ordering between architectural lane order, and GCC lane
order,
+   taking into account ABI.  See comment above output_move_neon for
details.  */
+static int
+neon_endian_lane_map (machine_mode mode, int lane)


s/map/Map/
New line between comment and function signature.

Done.


+{
+  if (BYTES_BIG_ENDIAN)
+  {
+int nelems = GET_MODE_NUNITS (mode);
+/* Reverse lane order.  */
+lane = (nelems - 1 - lane);
+/* Reverse D register order, to match ABI.  */
+if (GET_MODE_SIZE (mode) == 16)
+  lane = lane ^ (nelems / 2);
+  }
+  return lane;
+}
+
+/* some permutations index into pairs of vectors, this is a helper
function
+   to map indexes into those pairs of vectors.  */
+static int
+neon_pair_endian_lane_map (machine_mode mode, int lane)


Similarly, s/some/Some/ and new line after comment.

Done.


+{
+  int nelem = GET_MODE_NUNITS (mode);
+  if (BYTES_BIG_ENDIAN)
+lane =
+  neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem);
+  return lane;
+}
+
   /* Generate or test for an insn that supports a constant permutation.
*/
 /* Recognize patterns for the VUZP insns.  */
@@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
 unsigned int i, odd, mask, nelt = d->nelt;
 rtx out0, out1, in0, in1;
 rtx (*gen)(rtx, rtx, rtx, rtx);
+  int first_elem;
+  int swap;


Just make this a bool.

As discussed on IRC, this variable does contain an integer. I have
renamed it as swap_nelt, and changed the test on it below.


This is ok.
Thanks,
Kyrill


[snip]


   @@ -28258,10 +28296,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d
*d)
   in0 = d->op0;
 in1 = d->op1;
-  if (BYTES_BIG_ENDIAN)
+  if (swap)
   {
 std::swap (in0, in1);
-  odd = !odd;
   }

remove the braces around the std::swap

Done. Also changed if (swap) to if (swap_nelt != 0)

[snip]


@@ -0,0 +1,24 @@
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#define SIZE 128
+unsigned short _Alignas (16) in[SIZE];
+
+extern void abort (void);
+
+__attribute__ ((noinline)) int
+test (unsigned short sum, unsigned short *in, int x)
+{
+  for (int j = 0; j < SIZE; j += 8)
+sum += in[j] * x;
+  return sum;
+}
+
+int
+main ()
+{
+  for (int i = 0; i < SIZE; i++)
+in[i] = i;
+  if (test (0, in, 1) != 960)
+abort ();


AFAIK tests here usually prefer __builtin_abort ();
That way you don't have to declare the abort prototype in the beginning.

Done.

Updated patch attached
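
For readers following the lane-mapping helpers reviewed above, here is a
minimal standalone sketch of the same arithmetic.  It is not the arm.c code:
BYTES_BIG_ENDIAN, GET_MODE_NUNITS and GET_MODE_SIZE are replaced by plain
parameters (big_endian, nelems, vec_bytes), and the names lane_map and
pair_lane_map are made up for illustration only.

```c
#include <assert.h>

/* Hypothetical stand-in for neon_endian_lane_map.  NELEMS is the number
   of lanes (assumed to be a power of two), VEC_BYTES the vector size in
   bytes, and BIG_ENDIAN models BYTES_BIG_ENDIAN.  */
static int
lane_map (int big_endian, int nelems, int vec_bytes, int lane)
{
  assert (lane >= 0 && lane < nelems);
  if (big_endian)
    {
      /* Reverse lane order.  */
      lane = nelems - 1 - lane;
      /* Reverse D register order for 128-bit vectors.  */
      if (vec_bytes == 16)
        lane ^= nelems / 2;
    }
  return lane;
}

/* Hypothetical stand-in for neon_pair_endian_lane_map: LANE indexes into
   a pair of vectors, so it ranges over 0 .. 2 * NELEMS - 1.  The low bits
   select the lane, the NELEMS bit selects which vector of the pair.  */
static int
pair_lane_map (int big_endian, int nelems, int vec_bytes, int lane)
{
  assert (lane >= 0 && lane < 2 * nelems);
  if (big_endian)
    lane = lane_map (big_endian, nelems, vec_bytes, lane & (nelems - 1))
           + (lane & nelems);
  return lane;
}
```

Under these assumptions, an 8-lane 128-bit vector maps lanes 0..7 to
3, 2, 1, 0, 7, 6, 5, 4 on big endian, a 4-lane 64-bit vector maps 0..3 to
3, 2, 1, 0, and little endian is the identity in both cases.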




[PATCH][ARM][RFC] PR target/65578 Fix gcc.dg/torture/stackalign/builtin-apply-4.c for single-precision fpus

2016-02-09 Thread Kyrill Tkachov

Hi all,

In this wrong-code PR the builtin-apply-4.c test fails with -flto but only when 
targeting an fpu
with only single-precision capabilities.

bar is a function returning a double. For non-LTO compilation the caller of bar 
reads the return value
from it from the s0 and s1 VFP registers like expected, but for -flto the 
caller seems to expect the
return value from the r0 and r1 regs.  The RTL dumps show that too.

Debugging the calls to arm_function_value shows that in the -flto compilation 
the function bar is deemed
to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS variant, 
whereas for the non-LTO (and non-breaking)
compilation it uses the ARM_PCS_AAPCS_VFP variant.

Further down in use_vfp_abi when deciding whether to use VFP registers for the 
result there is a bit of
logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL variant 
with a double precision value
on an FPU that is not TARGET_VFP_DOUBLE.

This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means that 
the function doesn't escape
the translation unit and we can thus use whatever variant we want. From what I 
understand we want to use the
VFP regs when possible for FP values.

So this patch removes that restriction and for the testcase the caller of bar 
correctly reads the return
value of bar from the VFP registers and everything works.

This patch has been bootstrapped and tested on arm-none-linux-gnueabihf 
configured with --with-fpu=fpv4-sp-d16.
The bootstrap was performed with LTO.
I didn't see any regressions.

It seems that this logic was put there in 2009 with r154034 as part of a large 
patch to enable support for half-precision
floating point.

I'm not very familiar with this part of the code, so is this a safe patch to do?
The patch should only ever change behaviour for single-precision-only fpus and 
only for static functions
that don't get called outside their translation units (or during LTO I suppose) 
so there shouldn't
be any ABI problems, I think.

Is this ok for trunk?

Thanks,
Kyrill

2016-02-09  Kyrylo Tkachov  

PR target/65578
* config/arm/arm.c (use_vfp_abi): Remove is_double argument.
Don't check for is_double and TARGET_VFP_DOUBLE.
(aapcs_vfp_is_call_or_return_candidate): Update callsite.
(aapcs_vfp_is_return_candidate): Likewise.
(aapcs_vfp_is_call_candidate): Likewise.
(aapcs_vfp_allocate_return_reg): Likewise.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e71c9f56cbe846fdddf2e42a9f4575bacee570c1..e1404c74f74d01eb9c3362c7250e2b30ba5e47e7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5696,7 +5696,7 @@ aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep)
 
 /* Return true if PCS_VARIANT should use VFP registers.  */
 static bool
-use_vfp_abi (enum arm_pcs pcs_variant, bool is_double)
+use_vfp_abi (enum arm_pcs pcs_variant)
 {
   if (pcs_variant == ARM_PCS_AAPCS_VFP)
 {
@@ -5715,8 +5715,7 @@ use_vfp_abi (enum arm_pcs pcs_variant, bool is_double)
   if (pcs_variant != ARM_PCS_AAPCS_LOCAL)
 return false;
 
-  return (TARGET_32BIT && TARGET_VFP && TARGET_HARD_FLOAT &&
-	  (TARGET_VFP_DOUBLE || !is_double));
+  return (TARGET_32BIT && TARGET_VFP && TARGET_HARD_FLOAT);
 }
 
 /* Return true if an argument whose type is TYPE, or mode is MODE, is
@@ -5758,7 +5757,7 @@ aapcs_vfp_is_call_or_return_candidate (enum arm_pcs pcs_variant,
 return false;
 
 
-  if (!use_vfp_abi (pcs_variant, ARM_NUM_REGS (new_mode) > 1))
+  if (!use_vfp_abi (pcs_variant))
 return false;
 
   *base_mode = new_mode;
@@ -5772,7 +5771,7 @@ aapcs_vfp_is_return_candidate (enum arm_pcs pcs_variant,
   int count ATTRIBUTE_UNUSED;
   machine_mode ag_mode ATTRIBUTE_UNUSED;
 
-  if (!use_vfp_abi (pcs_variant, false))
+  if (!use_vfp_abi (pcs_variant))
 return false;
   return aapcs_vfp_is_call_or_return_candidate (pcs_variant, mode, type,
 		&ag_mode, &count);
@@ -5782,7 +5781,7 @@ static bool
 aapcs_vfp_is_call_candidate (CUMULATIVE_ARGS *pcum, machine_mode mode,
 			 const_tree type)
 {
-  if (!use_vfp_abi (pcum->pcs_variant, false))
+  if (!use_vfp_abi (pcum->pcs_variant))
 return false;
 
   return aapcs_vfp_is_call_or_return_candidate (pcum->pcs_variant, mode, type,
@@ -5848,7 +5847,7 @@ aapcs_vfp_allocate_return_reg (enum arm_pcs pcs_variant ATTRIBUTE_UNUSED,
 			   machine_mode mode,
 			   const_tree type ATTRIBUTE_UNUSED)
 {
-  if (!use_vfp_abi (pcs_variant, false))
+  if (!use_vfp_abi (pcs_variant))
 return NULL;
 
   if (mode == BLKmode
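
To make the behaviour change easier to see, here is a small standalone model
of the predicate before and after the patch.  It is only a sketch: the enum,
the target_flags struct and the _old/_new function names are invented for
illustration, the TARGET_* macros are modelled as plain booleans, and the
body of the ARM_PCS_AAPCS_VFP branch (elided in the diff) is assumed to
simply return true.

```c
#include <stdbool.h>

/* Illustrative stand-ins; these are not the real GCC declarations.  */
enum pcs_variant { PCS_AAPCS, PCS_AAPCS_VFP, PCS_AAPCS_LOCAL };

struct target_flags
{
  bool is_32bit;    /* models TARGET_32BIT */
  bool has_vfp;     /* models TARGET_VFP */
  bool hard_float;  /* models TARGET_HARD_FLOAT */
  bool vfp_double;  /* models TARGET_VFP_DOUBLE */
};

/* Model of the predicate before the patch.  */
static bool
use_vfp_abi_old (const struct target_flags *t, enum pcs_variant v,
                 bool is_double)
{
  if (v == PCS_AAPCS_VFP)
    return true;  /* Assumption: the real branch body is not shown.  */
  if (v != PCS_AAPCS_LOCAL)
    return false;
  return t->is_32bit && t->has_vfp && t->hard_float
         && (t->vfp_double || !is_double);
}

/* Model of the predicate after the patch: the is_double restriction
   on the LOCAL variant is gone.  */
static bool
use_vfp_abi_new (const struct target_flags *t, enum pcs_variant v)
{
  if (v == PCS_AAPCS_VFP)
    return true;  /* Same assumption as above.  */
  if (v != PCS_AAPCS_LOCAL)
    return false;
  return t->is_32bit && t->has_vfp && t->hard_float;
}
```

With a single-precision-only FPU (vfp_double false), the old model answers
false for a double under the LOCAL variant, while the new model answers
true; every other combination gives the same answer in both.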


Re: [PATCH][ARM][RFC] PR target/65578 Fix gcc.dg/torture/stackalign/builtin-apply-4.c for single-precision fpus

2016-02-09 Thread Kyrill Tkachov


On 09/02/16 17:21, Kyrill Tkachov wrote:

Hi all,

In this wrong-code PR the builtin-apply-4.c test fails with -flto but only when 
targeting an fpu
with only single-precision capabilities.

bar is a function returning a double. For non-LTO compilation the caller of bar 
reads the return value
from it from the s0 and s1 VFP registers like expected, but for -flto the 
caller seems to expect the
return value from the r0 and r1 regs.  The RTL dumps show that too.

Debugging the calls to arm_function_value shows that in the -flto compilation 
the function bar is deemed
to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS variant, 
whereas for the non-LTO (and non-breaking)
compilation it uses the ARM_PCS_AAPCS_VFP variant.

Further down in use_vfp_abi when deciding whether to use VFP registers for the 
result there is a bit of
logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL variant 
with a double precision value
on an FPU that is not TARGET_VFP_DOUBLE.

This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means that 
the function doesn't escape
the translation unit and we can thus use whatever variant we want. From what I 
understand we want to use the
VFP regs when possible for FP values.

So this patch removes that restriction and for the testcase the caller of bar 
correctly reads the return
value of bar from the VFP registers and everything works.

This patch has been bootstrapped and tested on arm-none-linux-gnueabihf 
configured with --with-fpu=fpv4-sp-d16.
The bootstrap was performed with LTO.
I didn't see any regressions.

It seems that this logic was put there in 2009 with r154034 as part of a large 
patch to enable support for half-precision
floating point.

I'm not very familiar with this part of the code, so is this a safe patch to do?
The patch should only ever change behaviour for single-precision-only fpus and 
only for static functions
that don't get called outside their translation units (or during LTO I suppose) 
so there shouldn't
be any ABI problems, I think.

Is this ok for trunk?

Thanks,
Kyrill



Huh, I just realised I wrote completely the wrong PR number on this.
The PR I'm talking about here is PR target/69538

Sorry for the confusion.

Kyrill



2016-02-09 Kyrylo Tkachov  

PR target/65578
* config/arm/arm.c (use_vfp_abi): Remove is_double argument.
Don't check for is_double and TARGET_VFP_DOUBLE.
(aapcs_vfp_is_call_or_return_candidate): Update callsite.
(aapcs_vfp_is_return_candidate): Likewise.
(aapcs_vfp_is_call_candidate): Likewise.
(aapcs_vfp_allocate_return_reg): Likewise.




Re: [PATCH, rs6000] Add -maltivec=be semantics in LE mode for vec_ld and vec_st

2016-02-09 Thread Ulrich Weigand
Hi Bill,

> 2014-02-20  Bill Schmidt  
> 
>   * config/rs6000/altivec.md (altivec_lvxl): Rename as
>   *altivec_lvxl__internal and use VM2 iterator instead of
>   V4SI.
>   (altivec_lvxl_): New define_expand incorporating
>   -maltivec=be semantics where needed.

I just noticed that this:

> -(define_insn "altivec_lvxl"
> +(define_expand "altivec_lvxl_"
>[(parallel
> -[(set (match_operand:V4SI 0 "register_operand" "=v")
> -   (match_operand:V4SI 1 "memory_operand" "Z"))
> +[(set (match_operand:VM2 0 "register_operand" "=v")
> +   (match_operand:VM2 1 "memory_operand" "Z"))
>   (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
>"TARGET_ALTIVEC"
> -  "lvxl %0,%y1"
> +{
> +  if (!BYTES_BIG_ENDIAN && VECTOR_ELT_ORDER_BIG)
> +{
> +  altivec_expand_lvx_be (operands[0], operands[1], mode, 
> UNSPEC_SET_VSCR);
> +  DONE;
> +}
> +})
> +
> +(define_insn "*altivec_lvxl__internal"
> +  [(parallel
> +[(set (match_operand:VM2 0 "register_operand" "=v")
> +   (match_operand:VM2 1 "memory_operand" "Z"))
> + (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
> +  "TARGET_ALTIVEC"
> +  "lvx %0,%y1"
>[(set_attr "type" "vecload")])

now causes vec_ldl to emit the lvx instead of the lvxl instruction.
I assume this was not actually intended?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Combine simplify_set WORD_REGISTER_OPERATIONS

2016-02-09 Thread Segher Boessenkool
On Mon, Feb 01, 2016 at 08:46:42AM +1030, Alan Modra wrote:
> The comment says this test is supposed to prevent "a narrower
> operation than requested", but it actually only allows a larger
> subreg, not one the same size.  Fix that.
> 
> Bootstrapped and regression tested powerpc64-linux.  OK for stage1?

It looks good, but please post it again then.


Segher


[PATCH, i386]: Use gen_int_mode to truncate const_int operand

2016-02-09 Thread Uros Bizjak
Hello!

No need to go through all subreg processing, we already know we have
const_int here.

2016-02-09  Uros Bizjak  

* config/i386/i386.md (insv_1): Use gen_int_mode to
truncate const_int operand 1 to QImode.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32},
committed to mainline SVN.

Uros.

Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 233245)
+++ config/i386/i386.md (working copy)
@@ -2883,7 +2883,7 @@
   ""
 {
   if (CONST_INT_P (operands[1]))
-operands[1] = simplify_gen_subreg (QImode, operands[1], mode, 0);
+operands[1] = gen_int_mode (INTVAL (operands[1]), QImode);
   return "mov{b}\t{%b1, %h0|%h0, %b1}";
 }
   [(set_attr "isa" "*,nox64")
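
As a rough illustration of what truncating a constant to QImode means
numerically, here is a standalone sketch.  It assumes, as I understand it,
that gen_int_mode keeps the low bits that fit the mode and sign-extends the
result; the helper name trunc_to_8bit is invented and this is not the GCC
implementation.

```c
#include <stdint.h>

/* Hypothetical helper: keep the low 8 bits of VALUE (QImode-sized)
   and sign-extend them back to 64 bits.  */
static int64_t
trunc_to_8bit (int64_t value)
{
  int64_t low = value & 0xff;   /* Keep only the low byte.  */
  return (low ^ 0x80) - 0x80;   /* Sign-extend from bit 7.  */
}
```

So, under that assumption, 0x1ff becomes -1, 0x80 becomes -128, and values
already in the range -128..127 are unchanged.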


[PATCH], PR target/68404, Fix PowerPC fusion error

2016-02-09 Thread Michael Meissner
This patch fixes PR 68404, where the compiler created an insn for the fusion
operation when accessing an array with a large constant offset that the
downstream passes (regrename in particular) don't like.  Because fusion in
general adds so little to the performance of power8, I just stopped the
compiler from generating this case for GCC 6.  In the GCC 7 timeframe, I
likely will revisit fusion for
power9 support.  I ran a spec 2006 benchmark suite comparing the current
behavior and the fix for PR 68404, and it was in the noise level (mcf was 1%
slower, others ranged from 0.3% slower to 0.4% faster).

I did a bootstrap build, including a bootstrap profiled build with LTO (which
is how the problem was found) and it was fine.  I rewrote 2 of the 3 fusion
tests so that they use fusion from a medium code toc entry instead of accessing
an array element with a constant index over 65536 bytes.

Is this patch ok to apply?  If you would prefer, I can eliminate the code
inside of the fusion_gpr_addis predicate instead of using #if 0.

[gcc]
2016-02-08  Michael Meissner  

PR target/68404
* config/rs6000/predicates.md (fusion_gpr_addis): Prevent fusing
an ADDIS that adds a pointer to a large constant that sets the
upper16 bits with a load operation.

[gcc/testsuite]
2016-02-08  Michael Meissner  

PR target/68404
* gcc.target/powerpc/fusion.c: Rewrite test to use TOC fusion
instead of accessing a really large array.
* gcc.target/powerpc/fusion3.c: Likewise.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 233220)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1716,10 +1716,17 @@ (define_predicate "fusion_gpr_addis"
   if (CONST_INT_P (op))
 int_const = op;
 
+#if 0
+  /* PR 68404 -- regrename doesn't  like:
+
+   (mem (plus (plus (reg)
+(const_int))
+  (const_int  */
   else if (GET_CODE (op) == PLUS
   && base_reg_operand (XEXP (op, 0), Pmode)
   && CONST_INT_P (XEXP (op, 1)))
 int_const = XEXP (op, 1);
+#endif
 
   else
 return 0;
Index: gcc/testsuite/gcc.target/powerpc/fusion3.c
===
--- gcc/testsuite/gcc.target/powerpc/fusion3.c  (revision 233220)
+++ gcc/testsuite/gcc.target/powerpc/fusion3.c  (working copy)
@@ -4,15 +4,24 @@
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power7" } } */
 /* { dg-options "-mcpu=power7 -mtune=power9 -O3" } */
 
-#define LARGE 0x12345
+#define SIZE 4
+struct foo {
+  float f;
+  double d;
+};
 
-int fusion_float_read (float *p){ return p[LARGE]; }
-int fusion_double_read (double *p){ return p[LARGE]; }
+static struct foo st[SIZE];
+struct foo *ptr_st = &st[0];
 
-void fusion_float_write (float *p, float f){ p[LARGE] = f; }
-void fusion_double_write (double *p, double d){ p[LARGE] = d; }
+float fusion_float_read (void){ return st[SIZE].f; }
+double fusion_float_extend (void){ return (double)st[SIZE].f; }
+double fusion_double_read (void){ return st[SIZE].d; }
 
-/* { dg-final { scan-assembler "load fusion, type SF"  } } */
-/* { dg-final { scan-assembler "load fusion, type DF"  } } */
-/* { dg-final { scan-assembler "store fusion, type SF" } } */
-/* { dg-final { scan-assembler "store fusion, type DF" } } */
+void fusion_float_write (float f){ st[SIZE].f = f; }
+void fusion_float_truncate (double d){ st[SIZE].f = (float)d; }
+void fusion_double_write (double d){ st[SIZE].d = d; }
+
+/* { dg-final { scan-assembler-times "load fusion, type SF"  2 } } */
+/* { dg-final { scan-assembler-times "load fusion, type DF"  1 } } */
+/* { dg-final { scan-assembler-times "store fusion, type SF" 2 } } */
+/* { dg-final { scan-assembler-times "store fusion, type DF" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/fusion.c
===
--- gcc/testsuite/gcc.target/powerpc/fusion.c   (revision 233220)
+++ gcc/testsuite/gcc.target/powerpc/fusion.c   (working copy)
@@ -1,17 +1,28 @@
-/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power7" } } */
-/* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
+/* { dg-options "-mcpu=power7 -mtune=power8 -O3 -mcmodel=medium" } */
 
-#define LARGE 0x12345
+#define SIZE 4
+struct foo {
+  unsigned char uc;
+  signed char sc;
+  unsigned short us;
+  short ss;
+  int i;
+  unsigned u;
+};
 
-int fusion_uchar (unsigned char *p){ return p[LARGE]; }
-int fusion_schar

Re: [PR69634] fix debug_insn-inconsistent REG_N_CALLS_CROSSED

2016-02-09 Thread Jeff Law

On 02/06/2016 03:06 AM, Alexandre Oliva wrote:

The testcase has a debug insn referencing a pseudo right before an
insn that modifies the pseudo.

Without debug insns, REG_N_CALLS_CROSSED was zero for that pseudo, so
sched_analyze_reg added a dep between the pseudo setter and an earlier
(lib)call.

With debug insns, we miscomputed REG_N_CALLS_CROSSED as nonzero
because of the debug insn, and then no dep was added between the two
insns.  This was enough to change sched1's decisions about where to
place the pseudo setter.

REG_N_CALLS_CROSSED is computed by both regstat_bb_compute_ri and
regstat_bb_compute_calls_crossed, but although the former skipped
debug insns, the latter didn't.

Fixing this inconsistency was enough to fix the -fcompare-debug error.

Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR target/69634
* regstat.c (regstat_bb_compute_calls_crossed): Disregard
debug insns.

for  gcc/testsuite/ChangeLog

PR target/69634
* gcc.dg/pr69634.c: New.
I removed the explicit -m32.  It was there merely to try and exercise 
the test, even on an x86-64 configured toolchain.  As Uros noted, a 
standard multilib of x86-64 will test -m32, so the explicit -m32 isn't 
needed and it is in fact harmful.


Committed to the trunk with that change.  I'm going to add 4.9/5 
regression markers to the BZ since this affects those releases as well.


Jeff



Re: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian

2016-02-09 Thread Charles Baylis
Committed to trunk as r233252

On 9 February 2016 at 17:07, Charles Baylis  wrote:
> On 8 February 2016 at 11:42, Kyrill Tkachov  
> wrote:
>
>> On 03/02/16 18:59, charles.bay...@linaro.org wrote:
>>> --- a/gcc/config/arm/arm.c
>>> +++ b/gcc/config/arm/arm.c
>>> @@ -28318,15 +28318,21 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
>>> unsigned int i, high, mask, nelt = d->nelt;
>>> rtx out0, out1, in0, in1;
>>> rtx (*gen)(rtx, rtx, rtx, rtx);
>>> +  int first_elem;
>>> +  bool is_swapped;
>>>   if (GET_MODE_UNIT_SIZE (d->vmode) >= 8)
>>>   return false;
>>>   +  is_swapped = BYTES_BIG_ENDIAN ? true : false;
>>
>>
>> This is just "is_swapped = BYTES_BIG_ENDIAN;"
>
> Done.
>
>>> +
>>> /* Note that these are little-endian tests.  Adjust for big-endian
>>> later.  */
>>
>>
>> I think you can remove this comment now, like in patch 1/2
>
> Done.
>
>>> +  first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped];
>>> +
>>> high = nelt / 2;
>>> -  if (d->perm[0] == high)
>>> +  if (first_elem == neon_endian_lane_map (d->vmode, high))
>>>   ;
>>> -  else if (d->perm[0] == 0)
>>> +  else if (first_elem == neon_endian_lane_map (d->vmode, 0))
>>>   high = 0;
>>> else
>>>   return false;
>>> @@ -28334,11 +28340,16 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
>>>   for (i = 0; i < nelt / 2; i++)
>>>   {
>>> -  unsigned elt = (i + high) & mask;
>>> -  if (d->perm[i * 2] != elt)
>>> +  unsigned elt =
>>> +   neon_pair_endian_lane_map (d->vmode, i + high) & mask;
>>> +  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i +
>>> is_swapped)]
>>> + != elt)
>>> return false;
>>> -  elt = (elt + nelt) & mask;
>>> -  if (d->perm[i * 2 + 1] != elt)
>>> +  elt =
>>> +   neon_pair_endian_lane_map (d->vmode, i + nelt + high)
>>> +   & mask;
>>
>>
>> The "& mask" can go on the previous line.
>
> Done
>
>>> +  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i +
>>> !is_swapped)]
>>> + != elt)
>>> return false;
>>>   }
>>>   @@ -28362,10 +28373,9 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d
>>> *d)
>>>   in0 = d->op0;
>>> in1 = d->op1;
>>> -  if (BYTES_BIG_ENDIAN)
>>> +  if (is_swapped)
>>>   {
>>> std::swap (in0, in1);
>>> -  high = !high;
>>>   }
>>
>>
>> remove the braces around the std::swap.
>
> Done.
>
>> Ok with these changes.
>> I've tried out both patches and they do fix execution failures on big-endian
>> and don't break any NEON intrinsics tests that I threw at them.
>
> Attached for completeness, will commit once the VUZP patch is OKd.
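
For reference, the removed ('-') lines in the hunk quoted above amount to the
following little-endian check, sketched here as a standalone function.  The
function name and the return convention are invented, and the way mask is
computed is not shown in the quoted hunk, so the definition below is an
assumption.

```c
/* Hypothetical standalone version of the little-endian VZIP pattern test.
   PERM holds NELT selector indices.  Returns 0 if PERM interleaves the low
   halves of the inputs, 1 if it interleaves the high halves, and -1 if it
   is not a zip pattern.  */
static int
vzip_half (const unsigned char *perm, unsigned nelt, int one_vector)
{
  /* Assumption: indices wrap at NELT when both inputs are the same
     vector, and at 2 * NELT otherwise.  */
  unsigned mask = one_vector ? nelt - 1 : 2 * nelt - 1;
  unsigned high = nelt / 2;
  unsigned i;

  if (perm[0] == high)
    ;
  else if (perm[0] == 0)
    high = 0;
  else
    return -1;

  for (i = 0; i < nelt / 2; i++)
    {
      /* Even slots take element i + high of the first input ...  */
      unsigned elt = (i + high) & mask;
      if (perm[i * 2] != elt)
        return -1;
      /* ... and odd slots take the same element of the second input.  */
      elt = (elt + nelt) & mask;
      if (perm[i * 2 + 1] != elt)
        return -1;
    }
  return high != 0;
}
```

For a 4-lane, two-vector permutation, { 0, 4, 1, 5 } is recognised as the
low-half zip and { 2, 6, 3, 7 } as the high-half zip, while the identity
{ 0, 1, 2, 3 } is rejected.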


Re: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian

2016-02-09 Thread Charles Baylis
On 9 February 2016 at 17:08, Kyrill Tkachov  wrote:
>
> On 09/02/16 17:00, Charles Baylis wrote:
>>
>> On 8 February 2016 at 11:42, Kyrill Tkachov 
>> wrote:
>>>
>>> Hi Charles,
>>>
>>>
>>> On 03/02/16 18:59, charles.bay...@linaro.org wrote:

 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx
 op1, rtx sel)
  arm_expand_vec_perm_1 (target, op0, op1, sel);
}
+/* map lane ordering between architectural lane order, and GCC lane
 order,
 +   taking into account ABI.  See comment above output_move_neon for
 details.  */
 +static int
 +neon_endian_lane_map (machine_mode mode, int lane)
>>>
>>>
>>> s/map/Map/
>>> New line between comment and function signature.
>>
>> Done.
>>
 +{
 +  if (BYTES_BIG_ENDIAN)
 +  {
 +int nelems = GET_MODE_NUNITS (mode);
 +/* Reverse lane order.  */
 +lane = (nelems - 1 - lane);
 +/* Reverse D register order, to match ABI.  */
 +if (GET_MODE_SIZE (mode) == 16)
 +  lane = lane ^ (nelems / 2);
 +  }
 +  return lane;
 +}
 +
 +/* some permutations index into pairs of vectors, this is a helper
 function
 +   to map indexes into those pairs of vectors.  */
 +static int
 +neon_pair_endian_lane_map (machine_mode mode, int lane)
>>>
>>>
>>> Similarly, s/some/Some/ and new line after comment.
>>
>> Done.
>>
 +{
 +  int nelem = GET_MODE_NUNITS (mode);
 +  if (BYTES_BIG_ENDIAN)
 +lane =
 +  neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem);
 +  return lane;
 +}
 +
/* Generate or test for an insn that supports a constant permutation.
 */
  /* Recognize patterns for the VUZP insns.  */
 @@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d
 *d)
  unsigned int i, odd, mask, nelt = d->nelt;
  rtx out0, out1, in0, in1;
  rtx (*gen)(rtx, rtx, rtx, rtx);
 +  int first_elem;
 +  int swap;

>>> Just make this a bool.
>>
>> As discussed on IRC, this variable does contain an integer. I have
>> renamed it as swap_nelt, and changed the test on it below.
>
>
> This is ok.

Thanks. Committed to trunk as r233251


[PATCH] [target/65867] Fix bootstrap failure on mingw32

2016-02-09 Thread Jeff Law


This was actually approved by Kai in the BZ eons ago.  I've installed 
the patch on the trunk.


Essentially there's a missing #include for mingw32 that prevents libssp 
from building.


Jeff
commit d48dbf6568626d96cc948d8aaf7ef0265689a213
Author: law 
Date:   Tue Feb 9 19:16:30 2016 +

2015-04-25  Daniel Starke  

PR target/65867
* ssp.c: Added wincrypt.h include for Windows targets.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233253 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libssp/ChangeLog b/libssp/ChangeLog
index 47e7339..036257f 100644
--- a/libssp/ChangeLog
+++ b/libssp/ChangeLog
@@ -1,3 +1,8 @@
+2015-04-25  Daniel Starke  
+
+   PR target/65867
+   * ssp.c: Added wincrypt.h include for Windows targets.
+
 2015-05-13  Michael Haubenwallner  
 
* Makefile.in: Regenerated with automake-1.11.6.
diff --git a/libssp/ssp.c b/libssp/ssp.c
index 38e3ec8..69805bc 100644
--- a/libssp/ssp.c
+++ b/libssp/ssp.c
@@ -56,6 +56,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
to the console using  "CONOUT$"   */
 #if defined (_WIN32) && !defined (__CYGWIN__)
 #include <windows.h>
+#include <wincrypt.h>
 # define _PATH_TTY "CONOUT$"
 #else
 # define _PATH_TTY "/dev/tty"


New Finnish PO file for 'cpplib' (version 6.1-b20160131)

2016-02-09 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Finnish team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/fi.po

(This file, 'cpplib-6.1-b20160131.fi.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: C++ PATCH for c++/69657 (abs not inlined)

2016-02-09 Thread Rainer Orth
Jason Merrill  writes:

> The issue in this bug was that due to changes in the libstdc++ headers, the
> built-in abs declaration was getting hidden by a using-declaration, so that
> then when the built-in got an explicit declaration, the original
> declaration wasn't there anymore and so the new declaration didn't get
> marked as built-in.
>
> Fixed by overloading an anticipated built-in rather than clobbering it when
> we see a using-declaration of the same name.
>
> Tested x86_64-pc-linux-gnu, applying to trunk.

This patch broke Solaris bootstrap (seen on i386-pc-solaris2.12):

/var/gcc/reghunt/trunk/gcc/c-family/c-lex.c: In function 'c_fileinfo* get_fileinfo(const char*)':
/var/gcc/reghunt/trunk/gcc/c-family/c-lex.c:102:62: error: overloaded function with no contextual type information
 file_info_tree = splay_tree_new ((splay_tree_compare_fn) strcmp,
  ^~
/var/gcc/reghunt/trunk/gcc/c-family/c-lex.c:104:39: error: overloaded function with no contextual type information
  (splay_tree_delete_value_fn) free);
   ^~~~
make[3]: *** [c-family/c-lex.o] Error 1

Comparing c-lex.ii before and after the patch, I see that they differ
like this:

-# 508 "/var/gcc/reghunt/trunk/gcc/system.h"
+# 452 "/var/gcc/reghunt/trunk/gcc/system.h"
+extern void free (void *);
+# 496 "/var/gcc/reghunt/trunk/gcc/system.h"
+extern void *malloc (size_t);
+
+
+
+extern void *calloc (size_t, size_t);
+
+
+
+extern void *realloc (void *, size_t);
+
+
+
 }

and indeed auto-host.h changed:

--- /var/gcc/reghunt/of-no-cti/28675/gcc/auto-host.h 2016-02-09 19:53:23.603078417 +0100
+++ auto-host.h 2016-02-09 20:28:39.946901958 +0100
@@ -743 +743 @@
-#define HAVE_DECL_CALLOC 1
+#define HAVE_DECL_CALLOC 0
@@ -846 +846 @@
-#define HAVE_DECL_FREE 1
+#define HAVE_DECL_FREE 0
@@ -937 +937 @@
-#define HAVE_DECL_MALLOC 1
+#define HAVE_DECL_MALLOC 0
@@ -958 +958 @@
-#define HAVE_DECL_REALLOC 1
+#define HAVE_DECL_REALLOC 0

The calloc configure test now fails like this:

conftest.cpp: In function 'int main()':
conftest.cpp:150:28: error: overloaded function with no contextual type information
 char *(*pfn) = (char *(*)) calloc ;
^~

The test boils down to

typedef unsigned int size_t;
namespace std {
extern void *calloc(size_t, size_t);
}
using std::calloc;
int
main ()
{
  char *(*pfn) = (char *(*)) calloc ;
  return 0;
}

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] PR other/69554: avoid excessive source printing for widely-separated locations

2016-02-09 Thread David Malcolm
PR other/69554 describes a regression seen from the Fortran frontend
when issuing a diagnostic containing more than one location: if the
locations are within the same file (and hence not filtered by the
existing sanitization code), diagnostic_show_locus could print all
of the lines of the source file between the two locations, which
could be an excessive amount of output.

It's possible to see this from other frontends; for example, in the
C frontend, we emit errors like this:

 left_hand_side () + right_hand_side ()
 ~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~

and the three locations could potentially be separated by e.g. large
comments, leading to lots of diagnostic spew.

The solution seems to be to split the source file up when printing,
to print just "the source lines of interest", meaning those source
lines containing at least part of an underline or caret.

The attached patch implements this, in diagnostic-show-locus.
Previously, within class layout, the fields m_first_line and m_last_line
described a single "span" of source lines to be printed
(e.g. lines 3-12).  This patch replaces those fields with a vector of
line_span instances, so that we can print e.g. lines 3-5, then line 8,
then lines 10-12.

We need to tell the user which lines he/she is seeing, so the patch
prints locus information every time we change line spans.  This is
modelled on the existing output of the Fortran frontend.  For example,
we might print:

foo.c:8:5: error: insufficiently cromulent code
foo.c:3:1:
  left_hand_side (3, /* line 3 */
  ~~~~~~~~~~~~~~~~~~
                  4,  /* line 4 */
                  ~~
                  5)  /* line 5 */
                  ~~
foo.c:8:5:
     + /* line 8 */
     ^
foo.c:10:2:
  right_hand_side (10,  /* line 10 */
  ~~~~~~~~~~~~~~~~~~~~
                   11,  /* line 11 */
                   ~~~
                   12)  /* line 12 */
                   ~~~

(note the primary caret is at line 8 column 5, so the initial message
emitted by the C frontend describes that, but the initial span doesn't
contain the primary caret, so it gets a locus line "foo.c:3:1:")

Typically the source will be printed in a single span, and so there
won't be any extra locus lines; this is all about gracefully handling
the more awkward cases.

For Fortran, the locus line gets an extra newline (thus
restoring the GCC 5 behavior):

foo.F90:7:4:

 1000 continue ! first instance
1
foo.F90:11:4:

 1000 continue ! second instance
2
Error: Duplicate statement label 1000 at (1) and (2)

Given that the code to print the locus information varies slightly
for Fortran, the patch adds it as a new callback within the
diagnostic_context: "start_span", called from diagnostic_show_locus.

I added a "dg-locus" directive for detecting these locus lines from
test cases.

As far as I know, we currently have no test coverage for the Fortran
frontend's printing of caret and source code; the test suite implicitly
injects -fno-diagnostics-show-caret into options, and gfortran-dg.exp
expects this and rewrites the output somewhat accordingly.

This patch adds Fortran test cases that use -fdiagnostics-show-caret,
and adds support to gfortran-dg.exp to detect this, and to disable the
output rewriting, so that the textual output for this case can be
more directly tested.  This gives us test coverage of source-printing
of multi-location diagnostics emitted by the Fortran frontend.

The patch also adds similar test coverage for the C frontend.  In
both cases (C and Fortran), the test cases exercise a variety of
situations in which the lines can be all in one line-span, or split
between two or three.

The patch adds the first use of dg-begin/end-multiline-output for
Fortran.  Given that Fortran doesn't (to my knowledge) support
multiline comments, I enclosed the directives in a
#if 0/#endif pair (which requires the test cases to be .F90, rather
than .f90).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

Adds 28 new PASS results to gcc.sum and 8 new PASS results to
gfortran.sum.

OK for trunk in stage 4?  (PR 69554 is a regression)

gcc/ChangeLog:
PR other/69554
* diagnostic-show-locus.c (struct line_span): New struct.
(layout::get_first_line): Delete.
(layout::get_last_line): Delete.
(layout::get_num_line_spans): New member function.
(layout::get_line_span): Likewise.
(layout::print_heading_for_line_span_index_p): Likewise.
(layout::get_expanded_location): Likewise.
(layout::calculate_line_spans): Likewise.
(layout::m_first_line): Delete.
(layout::m_last_line): Delete.
(layout::m_line_spans): New field.
(layout::layout): Update comment.  Replace m_first_line and
m_last_line with m_line_spans, replacing their initialization
with a call to calculate_line_spans.
(diagnostic_show_locus): When printing source lines and
annotations, rather than looping over a single span
of lines, instea

Re: [PATCH] PR driver/69265: improved suggestions for various misspelled options

2016-02-09 Thread David Malcolm
Ping.  

This is a bug in a new feature, so it isn't a regression as such, but
it's fairly visible, and I believe the fix is relatively low-risk
(error-handling of typos of command-line options).

This also now covers PR driver/69453 (and its duplicate PR
driver/69642), so people *are* running into this.

On Wed, 2016-01-13 at 16:50 -0500, David Malcolm wrote:
> As of r230285 (b279775faf3c56b554ecd38159b70ea7f2d37e0b; PR
> driver/67613)
> the driver provides suggestions for misspelled options.
> 
> This works well for some options e.g.
> 
>  $ gcc -static-libfortran test.f95
>  gcc: error: unrecognized command line option '-static-libfortran';
>  did you mean '-static-libgfortran'?
> 
> but as reported in PR driver/69265 it can generate poor suggestions:
> 
>  $ c++ -sanitize=address foo.cc
>  c++: error: unrecognized command line option ‘-sanitize=address’;
>  did you mean ‘-Wframe-address’?
> 
> The root cause is that the current implementation only considers
> cl_options[].opt_text, and has no knowledge of the arguments to
> -fsanitize (and hence doesn't consider the "address" text when
> computing edit distances).
> 
> It also fails to consider the alternate ways of spelling options
> e.g. "-Wno-" vs "-W".
> 
> The following patch addresses these issues by building a vec of
> candidates from cl_options[].opt_text, rather than just using
> the latter.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu;
> adds 8 PASS results to gcc.sum.
> 
> OK for trunk in stage 3?
> 
> gcc/ChangeLog:
>   PR driver/69265
>   * gcc.c (suggest_option): Move 2nd half of existing
>   implementation into find_closest_string.  Build the list
>   of candidates using add_misspelling_candidates.  Special-case
>   OPT_fsanitize_ and OPT_fsanitize_recover_, making use of
>   the sanitizer_args array.  Clean up the list of candidates,
>   returning a copy of the suggestion.
>   (driver::handle_unrecognized_options): Free the result
>   of suggest_option.
>   * opts-common.c (add_misspelling_candidates): New function.
>   * opts.c (common_handle_option): Rename local "spec" array and
>   make it a global...
>   (sanitizer_args): ...here.
>   * opts.h (sanitizer_args): New array decl.
>   (add_misspelling_candidates): New function decl.
>   * spellcheck.c (find_closest_string): New function.
>   * spellcheck.h (find_closest_string): New function decl.
> 
> gcc/testsuite/ChangeLog:
>   PR driver/69265
>   * gcc.dg/spellcheck-options-3.c: New test case.
>   * gcc.dg/spellcheck-options-4.c: New test case.
>   * gcc.dg/spellcheck-options-5.c: New test case.
>   * gcc.dg/spellcheck-options-6.c: New test case.
> ---
>  gcc/gcc.c                                   | 85
>  gcc/opts-common.c                           | 41
>  gcc/opts.c                                  | 97
>  gcc/opts.h                                  | 11
>  gcc/spellcheck.c                            | 46
>  gcc/spellcheck.h                            |  4
>  gcc/testsuite/gcc.dg/spellcheck-options-3.c |  6
>  gcc/testsuite/gcc.dg/spellcheck-options-4.c |  6
>  gcc/testsuite/gcc.dg/spellcheck-options-5.c |  6
>  gcc/testsuite/gcc.dg/spellcheck-options-6.c |  6
>  10 files changed, 232 insertions(+), 76 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/spellcheck-options-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/spellcheck-options-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/spellcheck-options-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/spellcheck-options-6.c
> 
> diff --git a/gcc/gcc.c b/gcc/gcc.c
> index 319a073..8dcc356 100644
> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -7610,39 +7610,71 @@ driver::maybe_putenv_OFFLOAD_TARGETS () const
>  
> Given an unrecognized option BAD_OPT (without the leading dash),
> locate the closest reasonable matching option (again, without the
> -   leading dash), or NULL.  */
> +   leading dash), or NULL.
>  
> -static const char *
> +   If non-NULL, the string is a copy, which must be freed by the
> caller.  */
> +
> +static char *
>  suggest_option (const char *bad_opt)
>  {
> -  const cl_option *best_option = NULL;
> -  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +  /* We build a vec of candidates, using add_misspelling_candidates
> + to add copies of strings, without a leading dash.  */
> +  auto_vec  candidates;
>  
>for (unsigned int i = 0; i < cl_options_count; i++)
>  {
> -  edit_distance_t dist = levenshtein_distance (bad_opt,
> -   
>  cl_options[i].opt_text + 1);
> -  if (dist < best_distance)
> +  const char *opt_text = cl_options[i].opt_text;
> +  switch (i)
>   {
> -   best_distance = dist;
> -   best_option = &cl_options[i];
> + default:
> +   /* For most options, we simply consider the plain option
> t

[wwwdocs] Document null 'this' dereference issue in /gcc-6/porting_to.html

2016-02-09 Thread Jonathan Wakely

This adds a note to the porting document about the (shockingly
widespread) problem of calling member functions through null pointers,
which GCC 6 no longer tolerates.

Following some comments about (bool)os I'm also tweaking another part
of the doc to use the plusplusgood static_cast form.

Committed to CVS.
Index: htdocs/gcc-6/porting_to.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/porting_to.html,v
retrieving revision 1.6
diff -u -r1.6 porting_to.html
--- htdocs/gcc-6/porting_to.html	4 Feb 2016 16:50:41 -	1.6
+++ htdocs/gcc-6/porting_to.html	9 Feb 2016 19:35:59 -
@@ -106,13 +106,11 @@
 
 
 Such code must be changed to convert the iostream object to bool
-explicitly:
+explicitly, e.g. return (bool)os;
+or
+return static_cast<bool>(os);
 
 
-
-  bool valid(std::ostream& os) { return (bool)os; }
-
-
 Lvalue required as left operand of assignment with complex numbers
 
 
@@ -222,6 +220,25 @@
 the C++ standard library.
 
 
+Optimizations remove null pointer checks for this
+
+
+When optimizing, GCC now assumes the this pointer can never be
+null, which is guaranteed by the language rules. Invalid programs which 
+assume it is OK to invoke a member function through a null pointer (possibly
+relying on checks like this != NULL) may crash or otherwise fail
+at run-time if null pointer checks are optimized away.
+With the -Wnull-dereference option the compiler tries to warn
+when it detects such invalid code.
+
+
+
+If the program cannot be fixed to remove the undefined behaviour then the
+option -fno-delete-null-pointer-checks can be used to disable
+this optimization. That option also disables other optimizations involving
+pointers, not only those involving this.
+
+
 -Wmisleading-indentation
 
 A new warning -Wmisleading-indentation was added


Re: [PATCH], PR target/68404, Fix PowerPC fusion error

2016-02-09 Thread David Edelsohn
On Tue, Feb 9, 2016 at 9:49 AM, Michael Meissner
 wrote:
> This patch fixes PR 68404, where the compiler created an insn for the fusion
> operation when accessing an array with a large constant offset, which the
> downstream passes (regrename in particular) don't like.  Because fusion in
> general adds so little to the performance of power8, I just stopped the
> compiler from generating this case for GCC 6.  In the GCC 7 timeframe, I will
> likely revisit fusion for power9 support.  I ran a spec 2006 benchmark suite
> comparing the current behavior and the fix for PR 68404, and the difference was
> in the noise level (mcf was 1% slower, others ranged from 0.3% slower to 0.4%
> faster).
>
> I did a bootstrap build, including a bootstrap profiled build with LTO (which
> is how the problem was found) and it was fine.  I rewrote 2 of the 3 fusion
> tests so that they use fusion from a medium code toc entry instead of accessing
> an array element with a constant index over 65536 bytes.
>
> Is this patch ok to apply?  If you would prefer, I can eliminate the code
> inside of the fusion_gpr_addis predicate instead of using #if 0.
>
> [gcc]
> 2016-02-08  Michael Meissner  
>
> PR target/68404
> * config/rs6000/predicates.md (fusion_gpr_addis): Prevent fusing
> an ADDIS that adds a pointer to a large constant that sets the
> upper16 bits with a load operation.
>
> [gcc/testsuite]
> 2016-02-08  Michael Meissner  
>
> PR target/68404
> * gcc.target/powerpc/fusion.c: Rewrite test to use TOC fusion
> instead of accessing a really large array.
> * gcc.target/powerpc/fusion3.c: Likewise.

Please remove the code entirely, not #if 0.

Okay with that change.

Thanks, David


libgo patch committed: Change gcstack_size to size_t

2016-02-09 Thread Ian Lance Taylor
In PR 69511 Dominik Vogt sent in this patch to change the gcstack_size
field of the G struct from uintptr to size_t.  This is because the
address of the field is passed to __splitstack_find, which takes an
argument of type size_t*.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 233235)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-2ef5f1ca449b5cf07dbbd7b13a50910fb5567372
+4cec4c5db5b054c5536ec5c50ee7aebec83563bc
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/proc.c
===
--- libgo/runtime/proc.c (revision 232239)
+++ libgo/runtime/proc.c (working copy)
@@ -2267,7 +2267,7 @@ runtime_malg(int32 stacksize, byte** ret
}
*ret_stacksize = stacksize;
newg->gcinitial_sp = *ret_stack;
-   newg->gcstack_size = stacksize;
+   newg->gcstack_size = (size_t)stacksize;
 #endif
}
return newg;
Index: libgo/runtime/runtime.h
===
--- libgo/runtime/runtime.h (revision 233235)
+++ libgo/runtime/runtime.h (working copy)
@@ -200,7 +200,7 @@ struct  G
void*   exception;  // current exception being thrown
bool    is_foreign; // whether current exception from other language
void*   gcstack;    // if status==Gsyscall, gcstack = stackbase to use during gc
-   uintptr gcstack_size;
+   size_t  gcstack_size;
void*   gcnext_segment;
void*   gcnext_sp;
void*   gcinitial_sp;


[patch,libgfortran] Bug 69668 - [4.9/5/6 Regression] Error reading namelist opened with DELIM='NONE'

2016-02-09 Thread Jerry DeLisle
The attached patch reverts the guilty code. We were trying to honor delim=NONE
on namelist reads which is invalid.

Test cases updated. Regression tested on x86-64.

OK for trunk and back port in about a week?

Regards,

Jerry

2016-02-09  Jerry DeLisle  

PR libgfortran/69668
* io/list_read.c (read_character): Remove code related to DELIM_NONE.



2016-02-09 Jerry DeLisle 

PR libgfortran/69668
* gfortran.dg/namelist_38.f90: Update test.
* gfortran.dg/namelist_84.f90: Update test.
diff --git a/gcc/testsuite/gfortran.dg/namelist_38.f90 b/gcc/testsuite/gfortran.dg/namelist_38.f90
index 5578654e..1da41c09 100644
--- a/gcc/testsuite/gfortran.dg/namelist_38.f90
+++ b/gcc/testsuite/gfortran.dg/namelist_38.f90
@@ -5,6 +5,7 @@
 program main
   implicit none
   character(len=3) :: a
+  character(25) :: b
   namelist /foo/ a
 
   open(10, status="scratch", delim="quote")
@@ -25,12 +26,16 @@ program main
   if (a.ne."a'a") call abort
   close (10)
 
-  open(10, status="scratch", delim="none")
+  open(10, delim="none")
   a = "a'a"
   write(10,foo) 
-  rewind 10
-  a = ""
-  read (10,foo)
-  if (a.ne."a'a") call abort
   close (10)
+  open(10)
+  read(10,"(a)") b
+  if (b .ne. "&FOO") call abort
+  read(10,"(a)") b
+  if (b .ne. " A=a'a") call abort
+  read(10,"(a)") b
+  if (b .ne. " /") call abort
+  close(10, status="delete")
 end program main
diff --git a/gcc/testsuite/gfortran.dg/namelist_84.f90 b/gcc/testsuite/gfortran.dg/namelist_84.f90
index af139d91..14b68a44 100644
--- a/gcc/testsuite/gfortran.dg/namelist_84.f90
+++ b/gcc/testsuite/gfortran.dg/namelist_84.f90
@@ -17,12 +17,11 @@ program namelist_delim_none
write(10, mylist)
rewind(10)
mystring = "x"
-   read(10,mylist)
-   if (any(mystring /= (/ 'mon', 'tue', 'wed', 'thu', 'fri' /))) call abort
rewind(10)
do i=1,5
  read(10,'(a)') internal_unit
- if (scan(internal_unit,"""'").ne.0) call abort
+ if (i.eq.2 .and. internal_unit .ne. " MYSTRING=mon  tue  wed  thu  fri  ,") call abort
+ if (scan(internal_unit,"""'").ne.0) print *, internal_unit
end do
close(10)
 end program
diff --git a/libgfortran/io/list_read.c b/libgfortran/io/list_read.c
index 052219be..efbbcb6c 100644
--- a/libgfortran/io/list_read.c
+++ b/libgfortran/io/list_read.c
@@ -1131,21 +1131,6 @@ read_character (st_parameter_dt *dtp, int length __attribute__ ((unused)))
 default:
   if (dtp->u.p.namelist_mode)
 	{
-	  if (dtp->u.p.current_unit->delim_status == DELIM_NONE)
-	{
-	  /* No delimiters so finish reading the string now.  */
-	  int i;
-	  push_char (dtp, c);
-	  for (i = dtp->u.p.ionml->string_length; i > 1; i--)
-		{
-		  if ((c = next_char (dtp)) == EOF)
-		goto done_eof;
-		  push_char (dtp, c);
-		}
-	  dtp->u.p.saved_type = BT_CHARACTER;
-	  free_line (dtp);
-	  return;
-	}
 	  unget_char (dtp, c);
 	  return;
 	}


Re: [patch, fortran] PR56007 Remarkably bad error message with DO array=1,2

2016-02-09 Thread Jerry DeLisle
On 02/08/2016 11:28 AM, Harald Anlauf wrote:
> Hi,
> 
> the simple patch below rejects arrays as do loop index
> variable before another (confusing) error message is emitted.
> Two new testcases derived from the PR, plus adaption of one
> testcase that relies on the old error message.
> 
> Whoever wants to take it...
> 

I will commit this under the simple rules as soon as I get the namelist
regression fix committed on trunk.

Thanks Harald

Jerry


Re: [PATCH] [ARC] Add single/double IEEE precision FPU support.

2016-02-09 Thread Joern Wolfgang Rennecke



On 09/02/16 15:34, Claudiu Zissulescu wrote:

In most cases, checking only the CC user may be sufficient. However, there
are cases (I found only one) where the CC user has a different mode than
the CC setter.  This happens when running the gcc.dg/pr56424.c test. Here,
the CC_FPU mode cstore is simplified by the following steps, losing the CC_FPU
mode:

In the expand:
18: cc:CC_FPU=cmp(r159:DF,r162:DF)
19: r163:SI=cc:CC_FPU<0
20: r161:QI=r163:SI#0
21: r153:SI=zero_extend(r161:QI)
22: cc:CC_ZN=cmp(r153:SI,0)
23: pc={(cc:CC_ZN!=0)?L28:pc}

Then after combine we get this:
18: cc:CC_FPU=cmp(r2:DF,r4:DF)
   REG_DEAD r4:DF
   REG_DEAD r2:DF
23: pc={(cc:CC_ZN<0)?L28:pc}
   REG_DEAD cc:CC_ZN
   REG_BR_PROB 6102


That sounds like a bug.  Have you looked more closely what's going on?


Re: [PATCH] PR other/69554: avoid excessive source printing for widely-separated locations

2016-02-09 Thread Thomas Koenig

[In reply to https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00646.html ]

Hi David,

> OK for trunk in stage 4?  (PR 69554 is a regression)

The Fortran part of the patch is OK.

I would also appreciate it if the patch could go in.  The chances
of encountering this regression in Fortran are rather high
for an average user.

Thanks for the patch!

Thomas