date:20160209


The following fixes update-address-taken to properly reject rewriting
decls into SSA that require fixup of call lhs because that's not done.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-02-09  Richard Biener  

PR tree-optimization/69715
* tree-ssa.c (execute_update_addresses_taken): Mark non-decl
LHS on calls as non-rewritable.

* gcc.dg/torture/pr69715.c: New testcase.

Index: gcc/tree-ssa.c
===
*** gcc/tree-ssa.c  (revision 233211)
--- gcc/tree-ssa.c  (working copy)
*** execute_update_addresses_taken (void)
*** 1436,1442 
tree lhs = gimple_get_lhs (stmt);
if (lhs
  && TREE_CODE (lhs) != SSA_NAME
! && non_rewritable_lvalue_p (lhs))
{
  decl = get_base_address (lhs);
  if (DECL_P (decl))
--- 1443,1450 
tree lhs = gimple_get_lhs (stmt);
if (lhs
  && TREE_CODE (lhs) != SSA_NAME
! && ((code == GIMPLE_CALL && ! DECL_P (lhs))
! || non_rewritable_lvalue_p (lhs)))
{
  decl = get_base_address (lhs);
  if (DECL_P (decl))
Index: gcc/testsuite/gcc.dg/torture/pr69715.c
===
*** gcc/testsuite/gcc.dg/torture/pr69715.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr69715.c  (working copy)
***
*** 0 
--- 1,11 
+ /* { dg-do compile } */
+ 
+ struct __attribute__((may_alias)) S { long long low; int high; };
+ struct S foo (void);
+ long double
+ bar (void)
+ {
+   long double a;
+   *(struct S *)&a = foo ();
+   return a;
+ }

Re: [PATCH, PR69599] Fix GOMP/GOACC_parallel optimization in ipa-pta

On Mon, 8 Feb 2016, Tom de Vries wrote:

> On 08/02/16 11:54, Richard Biener wrote:
> > On Mon, 8 Feb 2016, Tom de Vries wrote:
> > 
> > > Hi,
> > > 
> > > when compiling the fipa-pta tests in the libgomp testsuite
> > > (omp-nested-2.c,
> > > pr46032.c) with -flto -flto-partition=max, the tests fail in execution
> > > (PR69599).
> > > 
> > > The problem is related to the GOMP/GOACC_parallel optimization we do in
> > > fipa-pta, where we interpret a call GOMP_parallel (&foo._0, data) as a
> > > call
> > > foo._0 (data).
> > > 
> > > The problem is that this optimization is only legal in lto if both:
> > > - foo containing the call GOMP_parallel (&foo._0, data) and
> > > - foo._0
> > > are contained in the same partition.
> > > 
> > > In the case of -flto-partition=max, foo is contained in its own
> > > partition,
> > > and foo._0 is contained in another partition.  This means the data
> > > argument to
> > > the GOMP_parallel call appears unused, and the setting of the argument is
> > > optimized away, which causes the execution failure.
> > > 
> > > This patch fixes that by testing if foo and foo._0 are part of the same
> > > partition.
> > > 
> > > [ Note that the node_address_taken change in the patch has no effect,
> > > since
> > > nonlocal_p already tests for used_from_other_partition. But I thought it
> > > was
> > > clearer to state the conditions under which we are allowed to ignore
> > > node->address_taken explicitly. ]
> > > 
> > > Bootstrapped and reg-tested on x86_64.
> > > 
> > > Build for nvidia accelerator and reg-tested libgomp with various lto
> > > settings.
> > > 
> > > OK for trunk, stage4?
> > 
> > I don't like the in_lto_p checks, why's the check not working
> > for non-LTO?
> > 
> 
> I was not sure if the partition flags were valid outside lto.
> 
> Updated patch removes the in_lto_p checks.
> 
> Bootstrapped on x86_64.
> 
> Build and reg-tested libgomp testsuite.
> 
> OK?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
> 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
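
For readers following the PR69599 thread above: the reason the data argument must stay live can be sketched with a toy model in plain C. This is only an illustration under stated assumptions. toy_parallel, outlined_body, job_data and run_region are hypothetical names, and toy_parallel takes fewer arguments than the real GOMP_parallel entry point. It models only the "GOMP_parallel (&foo._0, data) behaves like foo._0 (data)" interpretation described in the thread.

```c
struct job_data
{
  int input;
  int output;
};

/* Hypothetical stand-in for the outlined region body (foo._0 in the
   thread).  It reads the data block that the caller filled in.  */
static void
outlined_body (void *p)
{
  struct job_data *d = p;
  d->output = d->input * 2;
}

/* Hypothetical stand-in for the runtime entry point.  The model used
   by the optimization is simply "call FN with DATA".  */
static void
toy_parallel (void (*fn) (void *), void *data)
{
  fn (data);
}

/* Stand-in for foo.  The store to d.input is only visibly used if the
   optimizer can see outlined_body; if the body lives in another
   partition, treating the call as fn (data) is no longer justified
   and the store must not be considered dead.  */
int
run_region (int input)
{
  struct job_data d;
  d.input = input;
  d.output = 0;
  toy_parallel (outlined_body, &d);
  return d.output;
}
```

If the store to d.input were removed as dead, run_region would return garbage, which matches the execution failures reported with -flto-partition=max.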

New German PO file for 'gcc' (version 6.1-b20160131)

2016-02-09 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-6.1-b20160131.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: testsuite fix

2016-02-09 Thread Jonathan Wakely


On 08/02/16 17:48 -0800, Mike Stump wrote:

I’m running the pretty printer test cases on a target with status wrappers, and 
that system works by printing the return code on that output.  It is dependent 
upon the last line being terminated by “\n”, as the code that looks for the 
return code requires the return code at the start of a line.

The below patch added newlines to the ends of all files, so that the status 
wrappers always work.

Ok?


Yes, thanks.
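
The property Mike relies on (output whose last line is terminated by "\n", so that a status wrapper's return code starts at the beginning of a line) can be checked with a small helper. This is a minimal sketch, not part of the patch or the testsuite; ends_with_newline is a hypothetical name, and treating empty output as acceptable is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the LEN bytes of captured output in
   BUF leave the cursor at the start of a line, 0 otherwise.  Empty
   output is accepted because a status line appended to it would still
   start at column zero.  */
int
ends_with_newline (const char *buf, size_t len)
{
  if (len == 0)
    return 1;
  return buf[len - 1] == '\n';
}
```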

[PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization

2016-02-09 Thread Bin Cheng

Hi,
When counting cost for a loop invariant, GCC checks whether the invariant can
be propagated into its use site (a memory reference).  If it cannot be
propagated, we increase its cost so that it is expensive enough to be hoisted
out of the loop.  Currently we simply replace the loop-invariant register in
the use site with its definition expression, then call validate_changes to
check whether the resulting insn is valid.  This is weak because
validate_changes doesn't take canonicalization into consideration.  Consider
the example below:

  Loop inv def:  
   69: r149:SI=r87:SI+const(unspec[`'] 1)
  REG_DEAD r87:SI
  Loop inv use:
   70: r150:SI=[r90:SI*0x4+r149:SI]
  REG_DEAD r149:SI

The address expression after propagation is "r90 * 0x4 + (r87 + 
const(unspec[`']))".  Function validate_changes simply rejects it.  In fact,
the propagation is feasible if we canonicalize the address expression into
the form "(r90 * 0x4 + r87) + const(unspec[`'])".

This patch fixes the problem by canonicalizing address expression and verifying 
if the new addr is valid.  The canonicalization follows GCC insn 
canonicalization rules.  The test case from bugzilla PR is also included.
As for the canonicalize_address interface, there is another 
canonicalize_address in fwprop.c which only changes shift into mult.  I think 
it would be good to factor out a common RTL interface in GCC, but that's stage1 
work.
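
To make the reassociation described above concrete, here is a toy model in plain C. It is not GCC code and does not use RTL: struct expr, the EXPR_* kinds and toy_canonicalize_address are hypothetical names invented for this sketch. It only mirrors the idea of flattening A + B (+ C), putting the MULT operand first and the CONST operand last, and chaining the PLUS nodes to the left.

```c
#include <stddef.h>

enum expr_kind { EXPR_MULT, EXPR_REG, EXPR_CONST, EXPR_PLUS };

struct expr
{
  enum expr_kind kind;
  const struct expr *op0;	/* Used by EXPR_PLUS only.  */
  const struct expr *op1;	/* Used by EXPR_PLUS only.  */
};

/* Rewrite X, a sum of two or three leaves, as a left-chained sum with
   MULT leaves first and CONST leaves last.  SCRATCH provides storage
   for the (at most two) new PLUS nodes.  Anything else is returned
   unchanged.  */
const struct expr *
toy_canonicalize_address (const struct expr *x, struct expr scratch[2])
{
  const struct expr *ops[3];
  size_t n = 0;

  if (x->kind != EXPR_PLUS)
    return x;

  /* Flatten (A + B) + C, A + (B + C) or A + B into OPS.  */
  if (x->op0->kind == EXPR_PLUS)
    {
      ops[n++] = x->op0->op0;
      ops[n++] = x->op0->op1;
      ops[n++] = x->op1;
    }
  else if (x->op1->kind == EXPR_PLUS)
    {
      ops[n++] = x->op0;
      ops[n++] = x->op1->op0;
      ops[n++] = x->op1->op1;
    }
  else
    {
      ops[n++] = x->op0;
      ops[n++] = x->op1;
    }

  /* This toy handles leaves only; give up on deeper nests.  */
  for (size_t i = 0; i < n; i++)
    if (ops[i]->kind == EXPR_PLUS)
      return x;

  /* Insertion sort by kind: MULT < REG < CONST.  */
  for (size_t i = 1; i < n; i++)
    {
      const struct expr *key = ops[i];
      size_t j = i;
      while (j > 0 && ops[j - 1]->kind > key->kind)
	{
	  ops[j] = ops[j - 1];
	  j--;
	}
      ops[j] = key;
    }

  /* Chain PLUS nodes to the left: (ops[0] + ops[1]) + ops[2].  */
  scratch[0].kind = EXPR_PLUS;
  scratch[0].op0 = ops[0];
  scratch[0].op1 = ops[1];
  if (n == 2)
    return &scratch[0];
  scratch[1].kind = EXPR_PLUS;
  scratch[1].op0 = &scratch[0];
  scratch[1].op1 = ops[2];
  return &scratch[1];
}
```

Applied to a model of "mult + (reg + const)", the result has the shape "(mult + reg) + const", which is the form the example address needs in order to be recognized.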

Bootstrapped and tested on x86_64 and AArch64.  Is it OK?

Thanks,
bin

2016-02-09  Bin Cheng  

PR tree-optimization/69052
* loop-invariant.c (canonicalize_address): New function.
(inv_can_prop_to_addr_use): Check validity of address expression
which is canonicalized by above function.

gcc/testsuite/ChangeLog
2016-02-09  Bin Cheng  

PR tree-optimization/69052
* gcc.target/i386/pr69052.c: New test.
diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 707f044..157e273 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -754,6 +754,74 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
   return inv;
 }
 
+/* Returns a canonical version address for X.  It identifies
+   addr expr in the form of A + B + C.  Following instruction
+   canonicalization rules, MULT operand is moved to the front,
+   CONST operand is moved to the end; also PLUS operators are
+   chained to the left.  */
+
+static rtx
+canonicalize_address (rtx x)
+{
+  rtx op0, op1, op2;
+  machine_mode mode = GET_MODE (x);
+  enum rtx_code code = GET_CODE (x);
+
+  if (code != PLUS)
+    return x;
+
+  /* Extract operands from A + B (+ C).  */
+  if (GET_CODE (XEXP (x, 0)) == PLUS)
+    {
+      op0 = XEXP (XEXP (x, 0), 0);
+      op1 = XEXP (XEXP (x, 0), 1);
+      op2 = XEXP (x, 1);
+    }
+  else if (GET_CODE (XEXP (x, 1)) == PLUS)
+    {
+      op0 = XEXP (x, 0);
+      op1 = XEXP (XEXP (x, 1), 0);
+      op2 = XEXP (XEXP (x, 1), 1);
+    }
+  else
+    {
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+      op2 = NULL_RTX;
+    }
+
+  /* Move MULT operand to the front.  */
+  if (!REG_P (op1) && !CONST_INT_P (op1))
+    std::swap (op0, op1);
+
+  /* Move CONST operand to the end.  */
+  if (CONST_INT_P (op0))
+    std::swap (op0, op1);
+
+  if (op2 != NULL && CONST_INT_P (op1))
+    {
+      /* Try to simplify CONST1 + CONST2 into one operand.  */
+      if (CONST_INT_P (op2))
+        {
+          rtx x = simplify_binary_operation (PLUS, mode, op1, op2);
+
+          if (x != NULL_RTX && CONST_INT_P (x))
+            {
+              op1 = x;
+              op2 = NULL_RTX;
+            }
+        }
+      else
+        std::swap (op1, op2);
+    }
+  /* Chain PLUS operators to the left.  */
+  op0 = simplify_gen_binary (PLUS, mode, op0, op1);
+  if (op2 == NULL_RTX)
+    return op0;
+  else
+    return simplify_gen_binary (PLUS, mode, op0, op2);
+}
+
 /* Given invariant DEF and its address USE, check if the corresponding
invariant expr can be propagated into the use or not.  */
 
@@ -761,7 +829,7 @@ static bool
 inv_can_prop_to_addr_use (struct def *def, df_ref use)
 {
   struct invariant *inv;
-  rtx *pos = DF_REF_REAL_LOC (use), def_set;
+  rtx *pos = DF_REF_REAL_LOC (use), def_set, use_set;
   rtx_insn *use_insn = DF_REF_INSN (use);
   rtx_insn *def_insn;
   bool ok;
@@ -778,6 +846,29 @@ inv_can_prop_to_addr_use (struct def *def, df_ref use)
 
   validate_unshare_change (use_insn, pos, SET_SRC (def_set), true);
   ok = verify_changes (0);
+  /* Try harder with canonicalization in address expression.  */
+  if (!ok && (use_set = single_set (use_insn)) != NULL_RTX)
+    {
+      rtx src, dest, mem = NULL_RTX;
+
+      src = SET_SRC (use_set);
+      dest = SET_DEST (use_set);
+      if (MEM_P (src))
+        mem = src;
+      else if (MEM_P (dest))
+        mem = dest;
+
+      if (mem != NULL_RTX
+          && !memory_address_addr_space_p (GET_MODE (mem),
+                                           XEXP (mem, 0),
+                                           MEM_ADD

Re: Combine simplify_set WORD_REGISTER_OPERATIONS

2016-02-09 Thread Alan Modra

On Mon, Feb 08, 2016 at 09:27:36AM -0700, Jeff Law wrote:
> On 01/31/2016 03:16 PM, Alan Modra wrote:
> >The comment says this test is supposed to prevent "a narrower
> >operation than requested", but it actually only allows a larger
> >subreg, not one the same size.  Fix that.
> >
> >Bootstrapped and regression tested powerpc64-linux.  OK for stage1?
> >
> >Note that this bug was found when investigating why gcc-6 does not
> >suffer from pr69548, ie. this bug was masking a powerpc backend bug.
> >
> > * combine.c (simplify_set): Correct WORD_REGISTER_OPERATIONS test.
> 
> Is there a strong need to apply this to gcc6?

No, better to wait for gcc-7, I think.

>  Can we construct a testcase
> where this makes a difference in the code we generate?

I instrumented the combine.c code in question with this

  if (!WORD_REGISTER_OPERATIONS
      && (GET_MODE_SIZE (GET_MODE (src))
          == GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))))
    {
      FILE *f = fopen ("/tmp/alan", "a");
      fprintf (f, "%s\n", main_input_filename);
      print_inline_rtx (f, src, 0);
      fprintf (f, "\n");
      fclose (f);
    }

to see what popped out when bootstrapping gcc on x86_64.  There were
quite a lot of hits, DI -> DF, SI -> SF, V4SF -> TI etc, especially in
the testsuite (I should have dumped options too).

Here's the first one:
/src/gcc.git/libgcc/config/libbid/bid64_div.c
(subreg:DF (plus:DI (subreg:DI (reg:DF 841) 0)
(const_int 1 [0x1])) 0)
This one resulted in using lea vs. add, so slightly better code.

One from the testsuite:
/src/gcc.git/gcc/testsuite/gcc.dg/sso/p4.c
(subreg:SF (bswap:SI (reg:SI 99 [ Local_R2.F ])) 0)
When compiling with -Og, this showed

before                          after
.loc 3 49 0                     .loc 3 49 0
movl    -32(%ebp), %eax         movl    -32(%ebp), %eax
bswap   %eax                    bswap   %eax
movl    %eax, -44(%ebp)         movl    %eax, -28(%ebp)
flds    -44(%ebp)
fstps   -28(%ebp)

Quite an improvement, if you care about -Og code.

I didn't see any worse code, except some cases that I think were
caused by register allocation differences.
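
As a rough source-level picture of what (subreg:SF (bswap:SI ...)) expresses, the following sketch byte-swaps a 32-bit integer and reinterprets the bits as a float. It is only an illustration, not the code from gcc.dg/sso/p4.c. float_from_swapped_bits is a hypothetical name, the sketch assumes a 32-bit IEEE-754 float, and it uses the GCC/Clang builtin __builtin_bswap32.

```c
#include <stdint.h>
#include <string.h>

/* Byte-swap BITS and reinterpret the resulting 32 bits as a float.
   The memcpy is the portable way to spell the same-size
   integer-to-float reinterpretation that the SUBREG stands for.  */
float
float_from_swapped_bits (uint32_t bits)
{
  uint32_t swapped = __builtin_bswap32 (bits);
  float f;
  memcpy (&f, &swapped, sizeof f);
  return f;
}
```

In the "before" column above the swapped value takes an extra round trip through the x87 stack (flds/fstps); in the "after" column it is stored directly to its final slot.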

> My inclination would be to approve for gcc-7 as-is, but I'm more hesitant
> for gcc-6.
> 
> jeff

-- 
Alan Modra
Australia Development Lab, IBM

Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump

2016-02-09 Thread James Greenhalgh


On Mon, Feb 08, 2016 at 03:24:14PM +0100, Richard Biener wrote:
> On Mon, Feb 8, 2016 at 2:40 PM, James Greenhalgh
>  wrote:
> > On Mon, Feb 08, 2016 at 04:29:31PM +0300, Yuri Rumyantsev wrote:
> >> Hi James,
> >>
> >> Thanks for reporting this issue.
> >> I prepared slightly different patch since we don't need to add
> >> tree-vect dump option - it is on by default for all tests in /vect
> >> directory.
> >
> > Hm, I added that line as my test runs were showing:
> >
> >   UNRESOLVED: gcc.dg/vect/vect-mask-store-move-1.c: dump file does not exist
> >
> > I would guess the explicit
> >
> >   /* { dg-options "-O3" } */
> >
> > is clobbering the vect.exp setup of flags?
>
> Yes.  Use { dg-additional-options "-O3" } instead.

I don't see why this test needs anything more than the default vect
options anyway... In which case, the patch would look like this.

Tested on x86-64 where the test passes, and on AArch64 where it is
correctly skipped.

OK?

Thanks,
James

---
2016-02-09  James Greenhalgh  

* gcc.dg/vect/vect-mask-store-move-1.c: Drop dg-options directive,
gate check on x86_64/i?86.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
index e575f6d..f5cae4f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
@@ -1,5 +1,4 @@
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
 /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
 
 #define N 256
@@ -16,4 +15,4 @@ void foo (int n)
   }
 }
 
-/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" { target { i?86-*-* x86_64-*-* } } } } */

Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump

On Tue, Feb 9, 2016 at 12:24 PM, James Greenhalgh
 wrote:
>
> On Mon, Feb 08, 2016 at 03:24:14PM +0100, Richard Biener wrote:
>> On Mon, Feb 8, 2016 at 2:40 PM, James Greenhalgh
>>  wrote:
>> > On Mon, Feb 08, 2016 at 04:29:31PM +0300, Yuri Rumyantsev wrote:
>> >> Hi James,
>> >>
>> >> Thanks for reporting this issue.
>> >> I prepared slightly different patch since we don't need to add
>> >> tree-vect dump option - it is on by default for all tests in /vect
>> >> directory.
>> >
>> > Hm, I added that line as my test runs were showing:
>> >
>> >   UNRESOLVED: gcc.dg/vect/vect-mask-store-move-1.c: dump file does not exist
>> >
>> > I would guess the explicit
>> >
>> >   /* { dg-options "-O3" } */
>> >
>> > is clobbering the vect.exp setup of flags?
>>
>> Yes.  Use { dg-additional-options "-O3" } instead.
>
> I don't see why this test needs anything more than the default vect
> options anyway... In which case, the patch would look like this.
>
> Tested on x86-64 where the test passes, and on AArch64 where it is
> correctly skipped.
>
> OK?

Ok.

Richard.

> Thanks,
> James
>
> ---
> 2016-02-09  James Greenhalgh  
>
> * gcc.dg/vect/vect-mask-store-move-1.c: Drop dg-options directive,
> gate check on x86_64/i?86.
>

Re: [Patch] Gate vect-mask-store-move-1.c correctly, and actually output the dump

2016-02-09 Thread Jakub Jelinek

On Tue, Feb 09, 2016 at 11:24:57AM +, James Greenhalgh wrote:

Also tested on i686-linux (32-bit), where it previously FAILed too.

> 2016-02-09  James Greenhalgh  
> 
>   * gcc.dg/vect/vect-mask-store-move-1.c: Drop dg-options directive,
>   gate check on x86_64/i?86.

Ok, thanks.

> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
> index e575f6d..f5cae4f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
> @@ -1,5 +1,4 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3" } */
>  /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
>  
>  #define N 256
> @@ -16,4 +15,4 @@ void foo (int n)
>}
>  }
>  
> -/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" { target { i?86-*-* x86_64-*-* } } } } */


Jakub

Re: [PATCH PR69652, Regression]

On Fri, Feb 5, 2016 at 3:54 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> Here is the updated patch - I went back to moving call statements as well,
> since masked loads are represented by an internal call. I also assume that
> for the following simple loop
>   for (i = 0; i < n; i++)
>     if (b1[i])
>       a1[i] = sqrtf(a2[i] * a2[i] + a3[i] * a3[i]);
> motion must be done for all vector statements in the semi-hammock,
> including SQRT.
>
> Bootstrap and regression testing did not show any new failures.
> Is it OK for trunk?

The patch is incredibly hard to parse due to the re-indenting.  Please
consider sending diffs with -b.

This issue exposes that you are moving (masked) stores across loads without
checking aliasing.  In the specific case those loads are dead and thus this
is safe, but in general I thought we were checking that we are using the
same VUSE during the sinking operation.

Thus, I'd rather have

+  /* Check that LHS does not have uses outside of STORE_BB.  */
+  res = true;
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+    {
+      gimple *use_stmt;
+      use_stmt = USE_STMT (use_p);
+      if (is_gimple_debug (use_stmt))
+        continue;
+      if (gimple_bb (use_stmt) != store_bb)
+        {
+          res = false;
+          break;
+        }
+    }

also check for the dead code case and DCE those stmts here.  Like so:

  if (has_zero_uses (lhs))
    {
      gsi_remove (&gsi_from, true);
      continue;
    }

before the above loop.

Richard.

> ChangeLog:
>
> 2016-02-05  Yuri Rumyantsev  
>
> PR tree-optimization/69652
> * tree-vect-loop.c (optimize_mask_stores): Move declaration of STMT1
> to nested loop, introduce new SCALAR_VUSE vector to keep vuse of all
> skipped scalar statements, introduce variable LAST_VUSE to keep
> vuse of LAST_STORE, add assertion that SCALAR_VUSE is empty in the
> beginning of current masked store processing, did source re-formatting,
> skip parsing of debug gimples, stop processing if a gimple with
> volatile operand has been encountered, save scalar statement
> with vuse in SCALAR_VUSE, skip processing debug statements in IMM_USE
> iterator, change vuse of all saved scalar statements to LAST_VUSE if
> it makes sense.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/torture/pr69652.c: New test.
>
> 2016-02-04 19:40 GMT+03:00 Jakub Jelinek :
>> On Thu, Feb 04, 2016 at 05:46:27PM +0300, Yuri Rumyantsev wrote:
>>> Here is a patch that cures the issues with non-correct vuse for scalar
>>> statements during code motion, i.e. if vuse of scalar statement is
>>> vdef of masked store which has been sunk to new basic block, we must
>>> fix it up.  The patch also fixed almost all remarks pointed out by
>>> Jakub.
>>>
>>> Bootstrapping and regression testing on x86-64 did not show any new failures.
>>> Is it OK for trunk?
>>>
>>> ChangeLog:
>>> 2016-02-04  Yuri Rumyantsev  
>>>
>>> PR tree-optimization/69652
>>> * tree-vect-loop.c (optimize_mask_stores): Move declaration of STMT1
>>> to nested loop, introduce new SCALAR_VUSE vector to keep vuse of all
>>> skipped scalar statements, introduce variable LAST_VUSE that has
>>> vuse of LAST_STORE, add assertion that SCALAR_VUSE is empty in the
>>> beginning of current masked store processing, did source re-formatting,
>>> skip parsing of debug gimples, stop processing when call or gimple
>>> with volatile operand have been encountered, save scalar statement
>>> with vuse in SCALAR_VUSE, skip processing debug statements in IMM_USE
>>> iterator, change vuse of all saved scalar statements to LAST_VUSE if
>>> it makes sense.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.dg/torture/pr69652.c: New test.
>>
>> Your mailer breaks ChangeLog formatting, so it is hard to check the
>> formatting of the ChangeLog entry.
>>
>> diff --git a/gcc/testsuite/gcc.dg/torture/pr69652.c b/gcc/testsuite/gcc.dg/torture/pr69652.c
>> new file mode 100644
>> index 000..91f30cf
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/torture/pr69652.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ffast-math -ftree-vectorize " } */
>> +/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
>> +
>> +void fn1(double **matrix, int column, int row, int n)
>> +{
>> +  int k;
>> +  for (k = 0; k < n; k++)
>> +if (matrix[row][k] != matrix[column][k])
>> +  {
>> +   matrix[column][k] = -matrix[column][k];
>> +   matrix[row][k] = matrix[row][k] - matrix[column][k];
>> +  }
>> +}
>> \ No newline at end of file
>>
>> Please make sure the last line of the test is a new-line.
>>
>> @@ -6971,6 +6972,8 @@ optimize_mask_stores (struct loop *loop)
>>gsi_next (&gsi))
>> {
>>   stmt = gsi_stmt (gsi);
>> + if (is_gimple_debug (stmt))
>> +   continue;
>>   if (is_gimple_call (stmt)
>>   && gimple_call_internal_p (stmt)

[PATCH] Fix PR69726


It turns out if-conversion's poor job on

 if (a)
   x[i] = ...;
 else
   x[i] = ...;

results in bogus uninit warnings of x[i] for a variety of reasons.
First of all forwprop (aka match.pd simplification) doesn't fix up
all of if-conversion's poor job, as canonicalization sometimes
inverts the condition in [VEC_]COND_EXPRs and thus the existing
A ? B : (A ? X : C) -> A ? B : C pattern doesn't apply.  The match.pd
hunk fixes this (albeit in an awkward way - I don't feel like mucking
with genmatch at this stage, nor exactly for the poor [VEC_]COND_EXPR
IL we should rather fix).  Second, the late uninit pass is confused
by the left-over dead code, in this case a dead load feeding a dead
VEC_COND_EXPR.  Adding a DCE pass before late uninit as the comment
in passes.def suggests fixes this and also should avoid creating the dead
RTL I've sometimes seen.
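
The identity behind the new pattern, A ? B : (!A ? C : X) -> A ? B : C, can be checked at the source level with a small sketch. The function names select_nested and select_simplified are hypothetical and exist only for this illustration; the actual simplification happens on [VEC_]COND_EXPR trees in match.pd, not on C source.

```c
/* The nested form that if-conversion can leave behind: the inner
   condition is the inverse of the outer one, so X is never selected.  */
int
select_nested (int a, int b, int c, int x)
{
  return a ? b : (!a ? c : x);
}

/* The simplified form.  X no longer appears, so a dead load that only
   fed X can be removed and cannot trigger an uninit warning.  */
int
select_simplified (int a, int b, int c)
{
  return a ? b : c;
}
```

Both functions agree for every value of a, which is why dropping the X operand is safe.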

Due to the PR69719 fix we're now over the alias-test limit for the
testcase (well, all alias tests are bogus, see PR69732), so I upped
that limit for the testcase.  I'm investigating the job done there.

Bootstrap and regtest are currently running on x86_64-unknown-linux-gnu.

Richard.

2016-02-09  Richard Biener  

PR tree-optimization/69726
* passes.def: Add DCE pass before late uninit.
* match.pd: Add A ? B : (!A ? C : X) -> A ? B : C patterns to
really fixup if-conversions job.

* gcc.dg/uninit-22.c: New testcase.

Index: gcc/passes.def
===
*** gcc/passes.def  (revision 233241)
--- gcc/passes.def  (working copy)
*** along with GCC; see the file COPYING3.
*** 322,336 
NEXT_PASS (pass_fold_builtins);
NEXT_PASS (pass_optimize_widening_mul);
NEXT_PASS (pass_tail_calls);
!   /* FIXME: If DCE is not run before checking for uninitialized uses,
 we may get false warnings (e.g., testsuite/gcc.dg/uninit-5.c).
 However, this also causes us to misdiagnose cases that should be
!real warnings (e.g., testsuite/gcc.dg/pr18501.c).
! 
!To fix the false positives in uninit-5.c, we would have to
!account for the predicates protecting the set and the use of each
!variable.  Using a representation like Gated Single Assignment
!may help.  */
/* Split critical edges before late uninit warning to reduce the
   number of false positives from it.  */
NEXT_PASS (pass_split_crit_edges);
--- 322,332 
NEXT_PASS (pass_fold_builtins);
NEXT_PASS (pass_optimize_widening_mul);
NEXT_PASS (pass_tail_calls);
!   /* If DCE is not run before checking for uninitialized uses,
 we may get false warnings (e.g., testsuite/gcc.dg/uninit-5.c).
 However, this also causes us to misdiagnose cases that should be
!real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
!   NEXT_PASS (pass_dce);
/* Split critical edges before late uninit warning to reduce the
   number of false positives from it.  */
NEXT_PASS (pass_split_crit_edges);
Index: gcc/match.pd
===
*** gcc/match.pd(revision 233241)
--- gcc/match.pd(working copy)
*** DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
*** 1717,1722 
--- 1717,1745 
   (simplify
(cnd @0 @1 (cnd @0 @2 @3))
(cnd @0 @1 @3))
+  /* A ? B : (!A ? C : X) -> A ? B : C.  */
+  /* ???  This matches embedded conditions open-coded because genmatch
+ would generate matching code for conditions in separate stmts only.
+ The following is still important to merge then and else arm cases
+ from if-conversion.  */
+  (simplify
+   (cnd @0 @1 (cnd @2 @3 @4))
+   (if (COMPARISON_CLASS_P (@0)
+&& COMPARISON_CLASS_P (@2)
+&& invert_tree_comparison
+(TREE_CODE (@0), HONOR_NANS (TREE_OPERAND (@0, 0))) == TREE_CODE (@2)
+&& operand_equal_p (TREE_OPERAND (@0, 0), TREE_OPERAND (@2, 0), 0)
+&& operand_equal_p (TREE_OPERAND (@0, 1), TREE_OPERAND (@2, 1), 0))
+(cnd @0 @1 @3)))
+  (simplify
+   (cnd @0 (cnd @1 @2 @3) @4)
+   (if (COMPARISON_CLASS_P (@0)
+&& COMPARISON_CLASS_P (@1)
+&& invert_tree_comparison
+(TREE_CODE (@0), HONOR_NANS (TREE_OPERAND (@0, 0))) == TREE_CODE (@1)
+&& operand_equal_p (TREE_OPERAND (@0, 0), TREE_OPERAND (@1, 0), 0)
+&& operand_equal_p (TREE_OPERAND (@0, 1), TREE_OPERAND (@1, 1), 0))
+(cnd @0 @3 @4)))
  
   /* A ? B : B -> B.  */
   (simplify
Index: gcc/testsuite/gcc.dg/uninit-22.c
===
*** gcc/testsuite/gcc.dg/uninit-22.c(revision 0)
--- gcc/testsuite/gcc.dg/uninit-22.c(revision 0)
***
*** 0 
--- 1,69 
+ /* { dg-do compile } */
+ /* { dg-options "-O3 -Wuninitialized --param vect-max-version-for-alias-checks=20" } */
+ 
+ #include 
+ 
+ #define A1  2896 /* (1/sqrt(2))<<12 */
+

Re: [PATCH] S/390: PR 69625: Add test case

2016-02-09 Thread Dominik Vogt

On Fri, Feb 05, 2016 at 05:07:57PM +0100, Dominik Vogt wrote:
> The attached patch adds a testcase for PR 69625.

Version 2 also runs with -m31.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
ChangeLog

* gcc.target/s390/pr69625.c: Add test case.
From 5c539cfea4292dc20bb5e7f854101997f11bc215 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 5 Feb 2016 15:13:08 +0100
Subject: [PATCH] S/390: PR 69625: Add test case.

---
 gcc/testsuite/gcc.target/s390/pr69625.c | 37 +
 1 file changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr69625.c

diff --git a/gcc/testsuite/gcc.target/s390/pr69625.c b/gcc/testsuite/gcc.target/s390/pr69625.c
new file mode 100644
index 000..f717183
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr69625.c
@@ -0,0 +1,37 @@
+/* Test for PR 69625; make sure that a leaf vararg function does not overwrite
+   the caller's r6.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+extern void abort (void);
+
+__attribute__ ((noinline))
+int
+foo (int x, ...)
+{
+  __builtin_va_list vl;
+  int i;
+
+  __asm__ __volatile__ ("lhi %%r6,1" : : : "r6");
+  __builtin_va_start(vl, x);
+  for (i = 2; i <= 6; i++)
+    x += __builtin_va_arg(vl, int);
+  __builtin_va_end (vl);
+
+  return x;
+}
+
+__attribute__ ((noinline))
+void
+bar (int r2, int r3, int r4, int r5, int r6)
+{
+  foo (r2, r3, r4, r5, r6);
+  if (r6 != 6)
+    abort ();
+}
+
+int
+main (void)
+{
+  bar (2, 3, 4, 5, 6);
+}
-- 
2.3.0

Re: Fix PR67639

2016-02-09 Thread Matthias Klose


On 08.02.2016 15:26, Bernd Schmidt wrote:

On 12/21/2015 08:39 PM, Jeff Law wrote:

On 12/18/2015 11:38 AM, Bernd Schmidt wrote:

In an earlier fix, the following change was made in varasm.c for invalid
register variables:

--- trunk/gcc/varasm.c    2014/08/26 14:59:59    214525
+++ trunk/gcc/varasm.c    2014/08/26 17:06:31    214526
@@ -1371,6 +1371,11 @@ make_decl_rtl (tree decl)
/* As a register variable, it has no section.  */
return;
  }
+  /* Avoid internal errors from invalid register
+ specifications.  */
+  SET_DECL_ASSEMBLER_NAME (decl, NULL_TREE);
+  DECL_HARD_REGISTER (decl) = 0;
+  return;
  }

As seen in PR67639, this makes the IL inconsistent and triggers another
internal error where we expect to see an SSA_NAME instead of a VAR_DECL.

The following patch extends the above slightly, by also setting
DECL_EXTERNAL to pretend that the erroneous variable is actually a
global.

Bootstrapped and tested on x86_64-linux, ok?

OK.


Turns out 65702 is a dup and this should go into gcc-5 as well. Ok to backport?


ChangeLog entry is not backported.

Re: [openacc] reference-typed data mappings

On 02/01/2016 09:57 AM, Cesar Philippidis wrote:

> This patch fixes a couple of bugs preventing c++ reference-typed
> variables from working in openacc data clauses. These fixes include:
> 
>  * Teach the gimplifier to filter out pointer data mappings for
>OACC_DATA, OACC_ENTER_DATA, OACC_EXIT_DATA and OACC_UPDATE regions.
>Along with using a firstprivate mapping for the array base pointers
>in OACC_DATA, OACC_PARALLEL and OACC_KERNELS regions.
> 
>  * Make the data mapping errors emitted by the c and c++ front ends
>more consistent with openacc by reporting data mapping errors, not
>omp-specific map errors.
> 
>  * Add some light checking for duplicate reference mappings in c++. The
>c++ FE still fails to detect duplicate component refs, but that's not
>working in openacc at the moment, anyway.
> 
> Jakub, the latter issue also affects openmp. I've added a simple openmp
> test case, but it could probably be more extensive. Can you add more
> test coverage or tell me what should be included?

While working on a different reduction problem, I noticed that both the
c and c++ front ends are treating reductions as generic data clauses.
That means, parallel reductions of the form

  #pragma acc copy(foo) reduction(+:foo)

would get treated as an error. This patch fixes that, in addition to the
changes listed above.

Is this patch ok for trunk?

Cesar
2016-02-09  Cesar Philippidis  

	gcc/c/
	* c-parser.c (c_parser_oacc_all_clauses): Update call to
	c_finish_omp_clauses.
	(c_parser_omp_all_clauses): Likewise.
	(c_parser_oacc_cache): Likewise.
	(c_parser_oacc_loop): Likewise.
	(omp_split_clauses): Likewise.
	(c_parser_omp_declare_target): Likewise.
	(c_parser_cilk_for): Likewise.
	* c-tree.h (c_finish_omp_clauses): Update prototype.
	* c-typeck.c (c_finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Allow OpenACC
	reductions variables to appear in data clauses.

	gcc/cp/
	* cp-tree.h (finish_omp_clauses): Update prototype.
	* parser.c (cp_parser_oacc_all_clauses): Update call to
	finish_omp_clauses.
	(cp_parser_omp_all_clauses): Likewise.
	(cp_parser_omp_for_loop): Likewise.
	(cp_omp_split_clauses): Likewise.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	(cp_parser_omp_declare_target): Likewise.
	(cp_parser_cilk_for): Likewise.
	* pt.c (tsubst_attribute): Update call to tsubst_omp_clauses.
	(tsubst_omp_clauses): New is_oacc argument.  Use it when calling
	finish_omp_clauses.
	(tsubst_omp_for_iterator): Update call to finish_omp_clauses.
	(tsubst_expr): Update calls to tsubst_omp_clauses.
	* semantics.c (finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Check for
	duplicate reference mappings.  Exclude "omp declare simd"-isms when
	processing OpenACC clauses.  Allow OpenACC reductions variables to
	appear in data clauses.
	(finish_omp_for): Update call to finish_omp_clauses.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Consider OACC_{DATA, PARALLEL,
	KERNELS} when setting target_firstprivatize_array_bases.  Consider
	OACC_{DATA, ENTER_DATA, EXIT_DATA, UPDATE} when filtering out pointer
	mappings.  Also filter out GOMP_MAP_POINTER.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Adjust test.
	* c-c++-common/goacc/declare-2.c: Likewise.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/parallel-reduction.c: New test.
	* c-c++-common/goacc/pcopy.c: Adjust test.
	* c-c++-common/goacc/pcopyin.c: Likewise.
	* c-c++-common/goacc/pcopyout.c: Likewise.
	* c-c++-common/goacc/pcreate.c: Likewise.
	* c-c++-common/goacc/present-1.c: Likewise.
	* c-c++-common/goacc/private-reduction-1.c: New test.
	* c-c++-common/goacc/reduction-5.c: New test.
	* g++.dg/goacc/data-1.C: New test.
	* g++.dg/goacc/data-2.C: New test.
	* g++.dg/goacc/reduction-1.C: New test.
	* g++.dg/goacc/update.C: New test.
	* g++.dg/gomp/template-data.C: New test.

	libgomp/
	* testsuite/libgomp.c++/non-scalar-data.C: New test.
	* testsuite/libgomp.oacc-c++/data-references.C: New test.
	* testsuite/libgomp.oacc-c++/data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/non-scalar-data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/update-reference.C: New test.
	* testsuite/libgomp.oacc-c++/update-template.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/nested-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/present-2.c: Likewise.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index eede3a7..20ff7da 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -13115,7 +13115,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask

[PATCH] Fix gcc.dg/vect/vect-mask-store-move-1.c


Tested on x86_64-linux.

2016-02-09  Richard Biener  

* gcc.dg/vect/vect-mask-store-move-1.c: Add missing space.

Index: gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
===
--- gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c  (revision 233244)
+++ gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c  (working copy)
@@ -15,4 +15,4 @@ void foo (int n)
   }
 }
 
-/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect"{ target { i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" { target { i?86-*-* x86_64-*-* } } } } */

openacc reference reductions

This patch teaches omp-lower how to handle reference-typed reductions,
which are common in fortran subroutines. Unlike the implementation in
gomp4 branch, this patch doesn't rewrite the reference reduction
variables as local variables. Instead, a local copy is created for
the reduction variable.

There are two things that stick out in this patch. First, I took care
not to remap any reduction variable appearing on a parallel directive
inside an offloaded region in order to keep it private. Second, you'll
notice that I'm creating quite a few temporary pointers inside
lower_oacc_reductions. Without those separate pointers, I'd get SSA
validation errors because those pointers get dereferenced multiple times.
I didn't investigate that problem further.
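
To illustrate the local-copy idea described above, here is a minimal
plain-C sketch. It is not the omp-low.c implementation; the pointer
argument only stands in for a Fortran reference-typed variable, and the
name sum_by_reference is hypothetical.

```c
#include <stddef.h>

/* Hypothetical illustration: RESULT plays the role of a
   reference-typed reduction variable.  */
void
sum_by_reference (const int *data, size_t n, int *result)
{
  /* Local copy of the reduction variable; the reference itself is
     not dereferenced inside the loop.  */
  int local = *result;

  for (size_t i = 0; i < n; i++)
    local += data[i];

  /* Single write-back through the reference after the loop.  */
  *result = local;
}
```

The reference is read once before the loop and written once after it,
so the loop body only touches the private copy.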

Is this patch ok for trunk?

Cesar
2016-02-09  Cesar Philippidis  

	gcc/
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-1.c: Add more test coverage.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-loop-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Add more test
	coverage.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d41688b..8a66760 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -308,6 +308,28 @@ is_oacc_kernels (omp_context *ctx)
 	  == GF_OMP_TARGET_KIND_OACC_KERNELS));
 }
 
+/* Ret

Re: openacc reference reductions

2016-02-09 Thread Nathan Sidwell


While I've not looked at the rest of the patch, this bit stood out:


+static bool
+is_oacc_parallel_reduction (tree var, omp_context *ctx)
+{
+  if (!is_oacc_parallel (ctx))
+return false;
+
+  tree clauses = gimple_omp_target_clauses (ctx->stmt);
+
+  /* Don't install a local copy of the decl if it used
+ inside a acc parallel reduction.  */


^^ comment is misleading -- this routine's not installing anything


+  if (is_oacc_parallel (ctx))


^^ already checked above.


+for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+ && OMP_CLAUSE_DECL (c) == var)
+   return true;
+
+  return false;
+}
+
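
With both remarks applied, the predicate reduces to one early exit and
one walk over the clause chain. The following is a self-contained
sketch using mock types; struct clause, struct context and
is_parallel_reduction_var are hypothetical stand-ins, not GCC's tree or
omp_context.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for GCC's clause chain; these are NOT the real
   tree/omp_context types.  */
enum clause_code { CLAUSE_REDUCTION, CLAUSE_MAP };

struct clause
{
  enum clause_code code;
  const void *decl;
  struct clause *chain;
};

struct context
{
  bool is_oacc_parallel;
  struct clause *clauses;
};

/* Return true if VAR appears in a reduction clause of an acc
   parallel context.  */
bool
is_parallel_reduction_var (const void *var, const struct context *ctx)
{
  if (!ctx->is_oacc_parallel)
    return false;

  for (const struct clause *c = ctx->clauses; c; c = c->chain)
    if (c->code == CLAUSE_REDUCTION && c->decl == var)
      return true;

  return false;
}
```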

RE: [PATCH] [ARC] Add single/double IEEE precission FPU support.

2016-02-09 Thread Claudiu Zissulescu

Please find attached a reworked patch. It doesn't contain the ABI modifications 
as I notified you earlier in an email.  Also, you may have extra comments 
regarding these original observations:

>+  /* ARCHS has 64-bit data-path which makes use of the even-odd paired
>+ registers.  */
>+  if (TARGET_HS)
>+{
>+  for (regno = 1; regno < 32; regno +=2)
>+   {
>+ arc_hard_regno_mode_ok[regno] = S_MODES;
>+   }
>+}
>+
>
>Does TARGET_HS with -mabi=default allow for passing DFmode / DImode 
>arguments
>in odd registers?  I fear you might run into reload trouble when trying to
>access the values.

The current ABI passes the DI-like modes in any register pair. This should not 
be an issue as the movdi_insn and movdf_insn should handle those exceptional 
cases. As for partial passing of arguments, move_block_from_reg() should take 
care of exceptional cases like DImode.

>+ if (!link_insn
>+ /* Avoid FPU instructions.  */
>+ || (GET_MODE (SET_DEST
>+   (PATTERN (link_insn))) == CC_FPUmode)
>+ || (GET_MODE (SET_DEST
>+   (PATTERN (link_insn))) == CC_FPU_UNEQmode)
>+ || (GET_MODE (SET_DEST
>+   (PATTERN (link_insn))) == CC_FPUEmode))
>
>It's pointless to search for the CC setter and then bail out this late.
>The mode is also accessible in the CC user, so after we have computed
>pc_target, we can check the condition code register in the comparison
>XEXP (pc_target, 1) for its mode.

In most cases, checking only the CC user may be sufficient. However, there 
are cases (I found only one) where the CC user has a different mode than 
the CC setter.  This happens when running the gcc.dg/pr56424.c test. Here, 
the CC_FPU mode cstore is simplified by the following steps, losing the CC_FPU 
mode:

In the expand:
   18: cc:CC_FPU=cmp(r159:DF,r162:DF)
   19: r163:SI=cc:CC_FPU<0
   20: r161:QI=r163:SI#0
   21: r153:SI=zero_extend(r161:QI)
   22: cc:CC_ZN=cmp(r153:SI,0)
   23: pc={(cc:CC_ZN!=0)?L28:pc}

Then after combine we get this:
   18: cc:CC_FPU=cmp(r2:DF,r4:DF)
  REG_DEAD r4:DF
  REG_DEAD r2:DF
   23: pc={(cc:CC_ZN<0)?L28:pc}
  REG_DEAD cc:CC_ZN
  REG_BR_PROB 6102

Ok to apply?
Claudiu


0001-ARC-Add-single-double-IEEE-precission-FPU-support.patch

Re: [RFC] Combine vectorized loops with its scalar remainder.

2016-02-09 Thread Ilya Enkovich

2015-12-15 19:41 GMT+03:00 Yuri Rumyantsev :
> Hi Richard,
>
> I re-designed the patch to determine ability of loop masking on fly of
> vectorization analysis and invoke it after loop transformation.
> Test-case is also provided.
>
> what is your opinion?
>
> Thanks.
> Yuri.
>

Hi,

I'm going to start work on extending this patch to handle mixed mask
sizes, support vectorization of the peeled loop tail, and fix the
profitability estimation so that it chooses the proper loop tail
processing. Here is a short list of the planned changes:

1. Don't put any restriction on the mask type when checking whether a
statement can be masked. Instead, just store all required masks in
LOOP_VINFO_REQUIRED_MASKS. After all statements are checked, we
additionally check that all required masks can be produced (that we
have proper comparison, widening and narrowing support).

2. In vect_estimate_min_profitable_iters, compute the overhead of mask
creation, decide what we should do with the loop tail (nothing,
vectorize, or combine with the loop body), and additionally return the
number of tail iterations required for the chosen tail processing to be
profitable.

3. In vect_transform_loop, depending on the chosen strategy, either mask
the whole loop or produce a vectorized tail. For now it's not fully
clear to me what the best way to get a vectorized tail is.

The first option is to just peel one iteration after the loop is
vectorized. But our masking functions use the LOOP_VINFO and STMT_VINFO
structures, which we lose during peeling.

Another option is to peel the scalar loop and then just run the
vectorizer one more time to vectorize and mask it.

We may also peel the vectorized loop and use the original version (with
all STMT_VINFO still available) as the tail and the peeled version as
the main loop.

Currently I think the best option is to peel the scalar loop and run the
vectorizer one more time on it. This option is simpler and can also be
used to vectorize the loop tail with a smaller vector size when the
target doesn't support masking or masking is not profitable.
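
For readers less familiar with the two tail strategies, here is a
scalar C model of them. It is only an illustration, not vectorizer
output; the factor VF = 4 and both function names are assumptions made
for the example.

```c
#include <stddef.h>

#define VF 4  /* assumed vectorization factor */

/* Main "vector" loop followed by a peeled scalar tail.  */
int
sum_peeled_tail (const int *a, size_t n)
{
  int lanes[VF] = { 0 };
  size_t i = 0;

  for (; i + VF <= n; i += VF)
    for (size_t l = 0; l < VF; l++)
      lanes[l] += a[i + l];

  int sum = 0;
  for (size_t l = 0; l < VF; l++)
    sum += lanes[l];

  /* Scalar remainder: at most VF - 1 iterations.  */
  for (; i < n; i++)
    sum += a[i];
  return sum;
}

/* Whole loop masked: lanes past N are disabled in the last
   iteration, so no separate tail loop is needed.  */
int
sum_masked (const int *a, size_t n)
{
  int lanes[VF] = { 0 };

  for (size_t i = 0; i < n; i += VF)
    for (size_t l = 0; l < VF; l++)
      {
        int active = i + l < n;  /* per-lane mask bit */
        if (active)
          lanes[l] += a[i + l];
      }

  int sum = 0;
  for (size_t l = 0; l < VF; l++)
    sum += lanes[l];
  return sum;
}
```

Both functions compute the same result; the difference is whether the
leftover iterations run in a separate loop or inside the main loop
under a mask.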

Any comments?

Thanks,
Ilya

Re: openacc reference reductions

On 02/09/2016 07:33 AM, Nathan Sidwell wrote:
> While I've not looked at the rest of the patch, this bit stood out:
> 
>> +static bool
>> +is_oacc_parallel_reduction (tree var, omp_context *ctx)
>> +{
>> +  if (!is_oacc_parallel (ctx))
>> +return false;
>> +
>> +  tree clauses = gimple_omp_target_clauses (ctx->stmt);
>> +
>> +  /* Don't install a local copy of the decl if it used
>> + inside a acc parallel reduction.  */
> 
> ^^ comment is misleading -- this routine's not installing anything
> 
>> +  if (is_oacc_parallel (ctx))
> 
> ^^ already checked above.
> 
>> +for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>> +  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
>> +  && OMP_CLAUSE_DECL (c) == var)
>> +return true;
>> +
>> +  return false;
>> +}
>> +

Thanks for catching that. Those are artifacts from when this code used
to be located exclusively in scan_sharing_clauses. I've updated the
patch with those changes.

Cesar

2016-02-09  Cesar Philippidis  

	gcc/
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-1.c: Add more test coverage.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-loop-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Add more test
	coverage.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c

Re: [RFC] Combine vectorized loops with its scalar remainder.

2016-02-09 Thread Jeff Law


On 02/09/2016 09:09 AM, Ilya Enkovich wrote:


> Another option is to peel scalar loop and then just run vectorizer
> one more time to vectorize and mask it.
>
> Also we may peel vectorized loop and use original version (with all
> STMT_VINFO still available) as a tail and peeled version as a main
> loop.
>
> Currently I think the best option is to peel scalar loop and run
> vectorizer one more time for it. This option is simpler and can also
> be used to vectorize loop tail with a smaller vector size when target
> doesn't support masking or masking is not profitable.

In general, a path where we have peeling & masking as an option seems 
wise.  The sense I've gotten from rth was that there are going to be 
classes of loops where that's going to be the best option.


jeff

Re: [ARM] Use vector wide add for mixed-mode adds

Hi Michael,

On 17/12/15 00:02, Michael Collison wrote:

Kyrill,

I have attached a patch that addresses your comments. The only change I would ask you to reconsider is renaming the function 'bool aarch32_simd_check_vect_par_cnst_half'. This function was copied from the aarch64 port and I thought it was
important to match the naming for maintenance purposes. I did rename the function to 'bool arm_simd_check_vect_par_cnst_half_p'. I changed 'aarch32' to 'arm' and added '_p' per your suggestions. Is this okay?

Ok, that's fine with me.

I implemented all your other change suggestions.

Thanks, sorry it took a long time to get back to this, I was busy with
regression-fixing patches as we're
in bug-fixing mode...

2015-12-16 Michael Collison

* config/arm/neon.md (widen_sum): New patterns where
mode is VQI to improve mixed mode vectorization.
* config/arm/neon.md (vec_sel_widen_ssum_lo3): New
define_insn to match low half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_ssum_hi3): New
define_insn to match high half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_usum_lo3): New
define_insn to match low half of unsigned vaddw.
* config/arm/neon.md (vec_sel_widen_usum_hi3): New
define_insn to match high half of unsigned vaddw.
* config/arm/arm.c (arm_simd_vect_par_cnst_half): New function.
(arm_simd_check_vect_par_cnst_half_p): Likewise.
* config/arm/arm-protos.h (arm_simd_vect_par_cnst_half): Prototype
for new function.
(arm_simd_check_vect_par_cnst_half_p): Likewise.
* config/arm/predicates.md (vect_par_constant_high): Support
big endian and simplify by calling
arm_simd_check_vect_par_cnst_half
(vect_par_constant_low): Likewise.
* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
that arm neon support vector widen sum of HImode TO SImode.

I've tried this out and I have a few comments.
The arm.c hunk doesn't apply to current trunk anymore due to context.
Can you please rebase the patch?
I've fixed it up manually in my tree so I can build it.
With this patch I'm seeing two PASS->FAIL on arm-none-eabi:
FAIL: gcc.dg/vect/slp-reduc-3.c -flto -ffat-lto-objects scan-tree-dump-times vect
"vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "vectorizing stmts using
SLP" 1
My compiler is configured with --with-float=hard --with-cpu=cortex-a9
--with-fpu=neon --with-mode=thumb
Can you please look into these? Maybe it's just the tests that need adjustment?

Also, I'm seeing the new tests give an error:
ERROR: gcc.target/arm/neon-vaddws16.c: Unrecognized option type: arm_neon_ok for "
dg-add-options 3 arm_neon_ok "
UNRESOLVED: gcc.target/arm/neon-vaddws16.c: Unrecognized option type: arm_neon_ok for
" dg-add-options 3 arm_neon_ok "

That's because the dg-add-options argument should be arm_neon rather than
arm_neon_ok.
Also, since the new tests are compile-only the effective target check should be
arm_neon_ok rather than arm_neon_hw.
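
As a sketch of what I mean, a compile-only test header would look
roughly like this. The function body and the -O3 option are
placeholders I made up for illustration; they are not taken from the
patch.

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_neon_ok } */
/* { dg-options "-O3" } */
/* { dg-add-options arm_neon } */

/* Hypothetical body: a mixed-mode (short -> int) sum.  */
int
widen_sum (const short *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}
```

arm_neon_ok goes with dg-require-effective-target, while
dg-add-options takes plain arm_neon.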

I also see ./contrib/check_GNU_style.sh complaining about some minor style
issues like trailing whitespace and
blocks of whitespace that should be replaced with tabs.

In any case, this patch is GCC 7 material at this point, so I think with the
above issues resolved
(and the FAILs investigated) this should be in good shape.

Thanks,
Kyrill

Re: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian

On 8 February 2016 at 11:42, Kyrill Tkachov  wrote:
> Hi Charles,
>
>
> On 03/02/16 18:59, charles.bay...@linaro.org wrote:
>>

>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx
>> op1, rtx sel)
>> arm_expand_vec_perm_1 (target, op0, op1, sel);
>>   }
>>   +/* map lane ordering between architectural lane order, and GCC lane
>> order,
>> +   taking into account ABI.  See comment above output_move_neon for
>> details.  */
>> +static int
>> +neon_endian_lane_map (machine_mode mode, int lane)
>
>
> s/map/Map/
> New line between comment and function signature.

Done.

>> +{
>> +  if (BYTES_BIG_ENDIAN)
>> +  {
>> +int nelems = GET_MODE_NUNITS (mode);
>> +/* Reverse lane order.  */
>> +lane = (nelems - 1 - lane);
>> +/* Reverse D register order, to match ABI.  */
>> +if (GET_MODE_SIZE (mode) == 16)
>> +  lane = lane ^ (nelems / 2);
>> +  }
>> +  return lane;
>> +}
>> +
>> +/* some permutations index into pairs of vectors, this is a helper
>> function
>> +   to map indexes into those pairs of vectors.  */
>> +static int
>> +neon_pair_endian_lane_map (machine_mode mode, int lane)
>
>
> Similarly, s/some/Some/ and new line after comment.

Done.

>> +{
>> +  int nelem = GET_MODE_NUNITS (mode);
>> +  if (BYTES_BIG_ENDIAN)
>> +lane =
>> +  neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem);
>> +  return lane;
>> +}
>> +
>>   /* Generate or test for an insn that supports a constant permutation.
>> */
>> /* Recognize patterns for the VUZP insns.  */
>> @@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
>> unsigned int i, odd, mask, nelt = d->nelt;
>> rtx out0, out1, in0, in1;
>> rtx (*gen)(rtx, rtx, rtx, rtx);
>> +  int first_elem;
>> +  int swap;
>>
>
> Just make this a bool.

As discussed on IRC, this variable does contain an integer. I have
renamed it as swap_nelt, and changed the test on it below.
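
For anyone who wants to experiment with the lane mapping outside GCC,
the two helpers quoted above can be modelled standalone. In this
sketch the parameters big_endian_p, nelems and mode_size are my
stand-ins for BYTES_BIG_ENDIAN, GET_MODE_NUNITS and GET_MODE_SIZE, and
the model_ names are hypothetical.

```c
/* Standalone model of neon_endian_lane_map; the extra parameters
   replace macros that only exist inside GCC.  */
int
model_endian_lane_map (int big_endian_p, int nelems, int mode_size,
                       int lane)
{
  if (big_endian_p)
    {
      /* Reverse lane order.  */
      lane = nelems - 1 - lane;
      /* Reverse D register order for 16-byte modes.  */
      if (mode_size == 16)
        lane ^= nelems / 2;
    }
  return lane;
}

/* Standalone model of neon_pair_endian_lane_map: map the lane within
   one vector and keep the bit that selects the vector of the pair.  */
int
model_pair_endian_lane_map (int big_endian_p, int nelems, int mode_size,
                            int lane)
{
  if (big_endian_p)
    lane = model_endian_lane_map (big_endian_p, nelems, mode_size,
                                  lane & (nelems - 1))
           + (lane & nelems);
  return lane;
}
```

With four lanes in a 16-byte mode on big endian, lanes 0, 1, 2, 3 map
to 1, 0, 3, 2; on little endian both functions are the identity.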

[snip]

>>   @@ -28258,10 +28296,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d
>> *d)
>>   in0 = d->op0;
>> in1 = d->op1;
>> -  if (BYTES_BIG_ENDIAN)
>> +  if (swap)
>>   {
>> std::swap (in0, in1);
>> -  odd = !odd;
>>   }
>
> remove the braces around the std::swap

Done. Also changed if (swap) to if (swap_nelt != 0)

[snip]

>> @@ -0,0 +1,24 @@
>> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
>> +
>> +#define SIZE 128
>> +unsigned short _Alignas (16) in[SIZE];
>> +
>> +extern void abort (void);
>> +
>> +__attribute__ ((noinline)) int
>> +test (unsigned short sum, unsigned short *in, int x)
>> +{
>> +  for (int j = 0; j < SIZE; j += 8)
>> +sum += in[j] * x;
>> +  return sum;
>> +}
>> +
>> +int
>> +main ()
>> +{
>> +  for (int i = 0; i < SIZE; i++)
>> +in[i] = i;
>> +  if (test (0, in, 1) != 960)
>> +abort ();
>
>
> AFAIK tests here usually prefer __builtin_abort ();
> That way you don't have to declare the abort prototype in the beginning.

Done.
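
For reference, the expected value 960 in the quoted test follows from
0 + 8 + ... + 120 = 8 * (0 + 1 + ... + 15) = 8 * 120. Below is a
small sketch of the same check using __builtin_abort; the names
strided_sum and check_strided_sum are hypothetical, and this is not the
test as committed.

```c
#define SIZE 128

/* Same loop shape as the quoted test: every eighth element.  */
int
strided_sum (const unsigned short *in, int x)
{
  unsigned short sum = 0;
  for (int j = 0; j < SIZE; j += 8)
    sum += in[j] * x;
  return sum;
}

/* Returns 0 on success; aborts otherwise.  */
int
check_strided_sum (void)
{
  unsigned short in[SIZE];
  for (int i = 0; i < SIZE; i++)
    in[i] = i;
  /* 0 + 8 + ... + 120 = 8 * (0 + 1 + ... + 15) = 8 * 120 = 960.  */
  if (strided_sum (in, 1) != 960)
    __builtin_abort ();
  return 0;
}
```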

Updated patch attached
From 99a536e2e10e3759a5de88422fadcabb22084b2f Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Tue, 9 Feb 2016 15:18:43 +
Subject: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian

gcc/ChangeLog:

2016-02-09  Charles Baylis  

	PR target/68532
	* config/arm/arm.c (neon_endian_lane_map): New function.
	(neon_pair_endian_lane_map): New function.
	(arm_evpc_neon_vuzp): Allow for big endian lane order.
	* config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big
	endian.
	(vuzpq_s16): Likewise.
	(vuzpq_s32): Likewise.
	(vuzpq_f32): Likewise.
	(vuzpq_u8): Likewise.
	(vuzpq_u16): Likewise.
	(vuzpq_u32): Likewise.
	(vuzpq_p8): Likewise.
	(vuzpq_p16): Likewise.

gcc/testsuite/ChangeLog:

2016-02-09  Charles Baylis  

	PR target/68532
	* gcc.c-torture/execute/pr68532.c: New test.

Change-Id: Ifd35d79bd42825f05403a1b96d8f34ef0f21dac3

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d8a2745..95ee9a5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28208,6 +28208,37 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
   arm_expand_vec_perm_1 (target, op0, op1, sel);
 }
 
+/* Map lane ordering between architectural lane order, and GCC lane order,
+   taking into account ABI.  See comment above output_move_neon for details.  */
+
+static int
+neon_endian_lane_map (machine_mode mode, int lane)
+{
+  if (BYTES_BIG_ENDIAN)
+  {
+int nelems = GET_MODE_NUNITS (mode);
+/* Reverse lane order.  */
+lane = (nelems - 1 - lane);
+/* Reverse D register order, to match ABI.  */
+if (GET_MODE_SIZE (mode) == 16)
+  lane = lane ^ (nelems / 2);
+  }
+  return lane;
+}
+
+/* Some permutations index into pairs of vectors, this is a helper function
+   to map indexes into those pairs of vectors.  */
+
+static int
+neon_pair_endian_lane_map (machine_mode mode, int lane)
+{
+  int nelem = GET_MODE_NUNITS (mode);
+  if (BYTES_BIG_ENDIAN)
+lane =
+  neon_endian

Re: [openacc] reference-typed data mappings

On 02/09/2016 07:00 AM, Cesar Philippidis wrote:
> On 02/01/2016 09:57 AM, Cesar Philippidis wrote:
> 
>> > This patch fixes a couple of bugs preventing c++ reference-typed
>> > variables from working in openacc data clauses. These fixes include:
>> > 
>> >  * Teach the gimplifier to filter out pointer data mappings for
>> >OACC_DATA, OACC_ENTER_DATA, OACC_EXIT_DATA and OACC_UPDATE regions,
>> >and to use a firstprivate mapping for the array base pointers
>> >in OACC_DATA, OACC_PARALLEL and OACC_KERNELS regions.
>> > 
>> >  * Make the data mapping errors emitted by the c and c++ front ends
>> >more consistent with openacc by reporting data mapping errors, not
>> >omp-specific map errors.
>> > 
>> >  * Add some light checking for duplicate reference mappings in c++. The
>> >c++ FE still fails to detect duplicate component refs, but that's not
>> >working in openacc at the moment, anyway.
>> > 
>> > Jakub, the latter issue also affects openmp. I've added a simple openmp
>> > test case, but it could probably be more extensive. Can you add more
>> > test coverage or tell me what should be included?
> While working on a different reduction problem, I noticed that both the
> c and c++ front ends are treating reductions as generic data clauses.
> That means, parallel reductions of the form
> 
>   #pragma acc copy(foo) reduction(+:foo)
> 
> would get treated as an error. This patch fixes that, in addition to the
> changes listed above.
> 
> Is this patch ok for trunk?

>   libgomp/
>   * testsuite/libgomp.c++/non-scalar-data.C: New test.

I copied the wrong test here. It should be testing omp target, not acc
*. This patch updates that test case.
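
To make the reduction case from the quoted message concrete, here is a
small sketch of the form that used to be rejected. I am assuming the
construct is parallel loop, since the quoted pragma only shows the
clauses; the function name sum_array and the copyin clause are mine.

```c
/* Assumption: the construct is `parallel loop`; the quoted message
   only shows the copy and reduction clauses.  */
int
sum_array (const int *a, int n)
{
  int foo = 0;

  /* FOO appears both in a data clause and in a reduction clause.  */
#pragma acc parallel loop copyin(a[0:n]) copy(foo) reduction(+:foo)
  for (int i = 0; i < n; i++)
    foo += a[i];

  return foo;
}
```

Without -fopenacc the pragma is ignored and the loop runs serially,
which gives the same sum.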

Cesar

2016-02-09  Cesar Philippidis  

	gcc/c/
	* c-parser.c (c_parser_oacc_all_clauses): Update call to
	c_finish_omp_clauses.
	(c_parser_omp_all_clauses): Likewise.
	(c_parser_oacc_cache): Likewise.
	(c_parser_oacc_loop): Likewise.
	(omp_split_clauses): Likewise.
	(c_parser_omp_declare_target): Likewise.
	(c_parser_cilk_for): Likewise.
	* c-tree.h (c_finish_omp_clauses): Update prototype.
	* c-typeck.c (c_finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Allow OpenACC
	reductions variables to appear in data clauses.

	gcc/cp/
	* cp-tree.h (finish_omp_clauses): Update prototype.
	* parser.c (cp_parser_oacc_all_clauses): Update call to
	finish_omp_clauses.
	(cp_parser_omp_all_clauses): Likewise.
	(cp_parser_omp_for_loop): Likewise.
	(cp_omp_split_clauses): Likewise.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	(cp_parser_omp_declare_target): Likewise.
	(cp_parser_cilk_for): Likewise.
	* pt.c (tsubst_attribute): Update call to tsubst_omp_clauses.
	(tsubst_omp_clauses): New is_oacc argument.  Use it when calling
	finish_omp_clauses.
	(tsubst_omp_for_iterator): Update call to finish_omp_clauses.
	(tsubst_expr): Update calls to tsubst_omp_clauses.
	* semantics.c (finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Check for
	duplicate reference mappings.  Exclude "omp declare simd"-isms when
	processing OpenACC clauses.  Allow OpenACC reductions variables to
	appear in data clauses.
	(finish_omp_for): Update call to finish_omp_clauses.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Consider OACC_{DATA, PARALLEL,
	KERNELS} when setting target_firstprivatize_array_bases.  Consider
	OACC_{DATA, ENTER_DATA, EXIT_DATA, UPDATE} when filtering out pointer
	mappings.  Also filter out GOMP_MAP_POINTER.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Adjust test.
	* c-c++-common/goacc/declare-2.c: Likewise.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/parallel-reduction.c: New test.
	* c-c++-common/goacc/pcopy.c: Adjust test.
	* c-c++-common/goacc/pcopyin.c: Likewise.
	* c-c++-common/goacc/pcopyout.c: Likewise.
	* c-c++-common/goacc/pcreate.c: Likewise.
	* c-c++-common/goacc/present-1.c: Likewise.
	* c-c++-common/goacc/private-reduction-1.c: New test.
	* c-c++-common/goacc/reduction-5.c: New test.
	* g++.dg/goacc/data-1.C: New test.
	* g++.dg/goacc/data-2.C: New test.
	* g++.dg/goacc/reduction-1.C: New test.
	* g++.dg/goacc/update.C: New test.
	* g++.dg/gomp/template-data.C: New test.

	libgomp/
	* testsuite/libgomp.c++/non-scalar-data.C: New test.
	* testsuite/libgomp.oacc-c++/data-references.C: New test.
	* testsuite/libgomp.oacc-c++/data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/non-scalar-data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/update-reference.C: New test.
	* testsuite/libgomp.oacc-c++/update-template.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
	* testsuite/li

Re: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian

On 8 February 2016 at 11:42, Kyrill Tkachov  wrote:

> On 03/02/16 18:59, charles.bay...@linaro.org wrote:
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -28318,15 +28318,21 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
>> unsigned int i, high, mask, nelt = d->nelt;
>> rtx out0, out1, in0, in1;
>> rtx (*gen)(rtx, rtx, rtx, rtx);
>> +  int first_elem;
>> +  bool is_swapped;
>>   if (GET_MODE_UNIT_SIZE (d->vmode) >= 8)
>>   return false;
>>   +  is_swapped = BYTES_BIG_ENDIAN ? true : false;
>
>
> This is just "is_swapped = BYTES_BIG_ENDIAN;"

Done.

>> +
>> /* Note that these are little-endian tests.  Adjust for big-endian
>> later.  */
>
>
> I think you can remove this comment now, like in patch 1/2

Done.

>> +  first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped];
>> +
>> high = nelt / 2;
>> -  if (d->perm[0] == high)
>> +  if (first_elem == neon_endian_lane_map (d->vmode, high))
>>   ;
>> -  else if (d->perm[0] == 0)
>> +  else if (first_elem == neon_endian_lane_map (d->vmode, 0))
>>   high = 0;
>> else
>>   return false;
>> @@ -28334,11 +28340,16 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
>>   for (i = 0; i < nelt / 2; i++)
>>   {
>> -  unsigned elt = (i + high) & mask;
>> -  if (d->perm[i * 2] != elt)
>> +  unsigned elt =
>> +   neon_pair_endian_lane_map (d->vmode, i + high) & mask;
>> +  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i +
>> is_swapped)]
>> + != elt)
>> return false;
>> -  elt = (elt + nelt) & mask;
>> -  if (d->perm[i * 2 + 1] != elt)
>> +  elt =
>> +   neon_pair_endian_lane_map (d->vmode, i + nelt + high)
>> +   & mask;
>
>
> The "& mask" can go on the previous line.

Done

>> +  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i +
>> !is_swapped)]
>> + != elt)
>> return false;
>>   }
>>   @@ -28362,10 +28373,9 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d
>> *d)
>>   in0 = d->op0;
>> in1 = d->op1;
>> -  if (BYTES_BIG_ENDIAN)
>> +  if (is_swapped)
>>   {
>> std::swap (in0, in1);
>> -  high = !high;
>>   }
>
>
> remove the braces around the std::swap.

Done.

> Ok with these changes.
> I've tried out both patches and they do fix execution failures on big-endian
> and don't break any NEON intrinsics tests that I threw at them.

Attached for completeness, will commit once the VUZP patch is OKd.
From 469f82610a4e70284bf23c373b8a73685cad0ec1 Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Tue, 9 Feb 2016 15:18:44 +
Subject: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian

gcc/ChangeLog:

2016-02-09  Charles Baylis  

	PR target/68532
	* config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane
	order.
	* config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big
	endian.
	(vzipq_s16): Likewise.
	(vzipq_s32): Likewise.
	(vzipq_f32): Likewise.
	(vzipq_u8): Likewise.
	(vzipq_u16): Likewise.
	(vzipq_u32): Likewise.
	(vzipq_p8): Likewise.
	(vzipq_p16): Likewise.

Change-Id: I327678f5e73c1de2f413c1d22769ab42ce1d6c16

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 95ee9a5..5562baa 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28318,15 +28318,20 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
   unsigned int i, high, mask, nelt = d->nelt;
   rtx out0, out1, in0, in1;
   rtx (*gen)(rtx, rtx, rtx, rtx);
+  int first_elem;
+  bool is_swapped;
 
   if (GET_MODE_UNIT_SIZE (d->vmode) >= 8)
 return false;
 
-  /* Note that these are little-endian tests.  Adjust for big-endian later.  */
+  is_swapped = BYTES_BIG_ENDIAN;
+
+  first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped];
+
   high = nelt / 2;
-  if (d->perm[0] == high)
+  if (first_elem == neon_endian_lane_map (d->vmode, high))
 ;
-  else if (d->perm[0] == 0)
+  else if (first_elem == neon_endian_lane_map (d->vmode, 0))
 high = 0;
   else
 return false;
@@ -28334,11 +28339,15 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
 
   for (i = 0; i < nelt / 2; i++)
 {
-  unsigned elt = (i + high) & mask;
-  if (d->perm[i * 2] != elt)
+  unsigned elt =
+	neon_pair_endian_lane_map (d->vmode, i + high) & mask;
+  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + is_swapped)]
+	  != elt)
 	return false;
-  elt = (elt + nelt) & mask;
-  if (d->perm[i * 2 + 1] != elt)
+  elt =
+	neon_pair_endian_lane_map (d->vmode, i + nelt + high) & mask;
+  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + !is_swapped)]
+	  != elt)
 	return false;
 }
 
@@ -28362,11 +28371,8 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
 
   in0 = d->op0;
   in1 = d->op1;
-  if (BYTES_BIG_ENDIAN)
-{
-  std::swap (in0, in1);
-  high = !high;
-}
+  if (is_swapped)
+std::swap (in0, in1);
 
   out0 = d->target;
   out1 = gen_reg_rtx (d->vmode);
diff --git a/gcc/config/arm/arm_neon.h b/g
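
For readers following the hunks above, here is a minimal standalone sketch of the little-endian zip check that the patch generalises. It mirrors the removed lines only (perm[0] picks the low or high half, then each pair of selector entries must interleave the two inputs). The function name is_zip_perm and the plain unsigned array are hypothetical, and the mask value is an assumption (two distinct input vectors), since the mask computation is not shown in the quoted hunks. It does not model the big-endian lane remapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch only: return true if PERM (NELT selector entries indexing the
   concatenation of two NELT-element vectors) is a zip/interleave pattern
   in little-endian lane numbering.  On success *HIGH_OUT is 0 for the
   low-half interleave and NELT / 2 for the high-half interleave.  */
static bool
is_zip_perm (const unsigned *perm, unsigned nelt, unsigned *high_out)
{
  /* Assumption: NELT is an even power of two and both inputs are distinct,
     so selector values range over 0 .. 2 * NELT - 1.  */
  assert (nelt >= 2 && (nelt & (nelt - 1)) == 0);
  unsigned mask = 2 * nelt - 1;
  unsigned high = nelt / 2;

  if (perm[0] == high)
    ;
  else if (perm[0] == 0)
    high = 0;
  else
    return false;

  for (unsigned i = 0; i < nelt / 2; i++)
    {
      /* Even selector slots take element I + HIGH of the first input.  */
      unsigned elt = (i + high) & mask;
      if (perm[i * 2] != elt)
        return false;
      /* Odd selector slots take the same element of the second input.  */
      elt = (elt + nelt) & mask;
      if (perm[i * 2 + 1] != elt)
        return false;
    }

  *high_out = high;
  return true;
}
```

With nelt = 4, the selector {0, 4, 1, 5} is accepted with high = 0, {2, 6, 3, 7} is accepted with high = 2, and the identity selector {0, 1, 2, 3} is rejected.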

Re: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian



On 09/02/16 17:00, Charles Baylis wrote:

On 8 February 2016 at 11:42, Kyrill Tkachov  wrote:

Hi Charles,


On 03/02/16 18:59, charles.bay...@linaro.org wrote:

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx
op1, rtx sel)
 arm_expand_vec_perm_1 (target, op0, op1, sel);
   }
   +/* map lane ordering between architectural lane order, and GCC lane
order,
+   taking into account ABI.  See comment above output_move_neon for
details.  */
+static int
+neon_endian_lane_map (machine_mode mode, int lane)


s/map/Map/
New line between comment and function signature.

Done.


+{
+  if (BYTES_BIG_ENDIAN)
+  {
+int nelems = GET_MODE_NUNITS (mode);
+/* Reverse lane order.  */
+lane = (nelems - 1 - lane);
+/* Reverse D register order, to match ABI.  */
+if (GET_MODE_SIZE (mode) == 16)
+  lane = lane ^ (nelems / 2);
+  }
+  return lane;
+}
+
+/* some permutations index into pairs of vectors, this is a helper
function
+   to map indexes into those pairs of vectors.  */
+static int
+neon_pair_endian_lane_map (machine_mode mode, int lane)


Similarly, s/some/Some/ and new line after comment.

Done.


+{
+  int nelem = GET_MODE_NUNITS (mode);
+  if (BYTES_BIG_ENDIAN)
+lane =
+  neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem);
+  return lane;
+}
+
   /* Generate or test for an insn that supports a constant permutation.
*/
 /* Recognize patterns for the VUZP insns.  */
@@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
 unsigned int i, odd, mask, nelt = d->nelt;
 rtx out0, out1, in0, in1;
 rtx (*gen)(rtx, rtx, rtx, rtx);
+  int first_elem;
+  int swap;


Just make this a bool.

As discussed on IRC, this variable does contain an integer. I have
renamed it as swap_nelt, and changed the test on it below.


This is ok.
Thanks,
Kyrill


[snip]


   @@ -28258,10 +28296,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d
*d)
   in0 = d->op0;
 in1 = d->op1;
-  if (BYTES_BIG_ENDIAN)
+  if (swap)
   {
 std::swap (in0, in1);
-  odd = !odd;
   }

remove the braces around the std::swap

Done. Also changed if (swap) to if (swap_nelt != 0)

[snip]


@@ -0,0 +1,24 @@
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#define SIZE 128
+unsigned short _Alignas (16) in[SIZE];
+
+extern void abort (void);
+
+__attribute__ ((noinline)) int
+test (unsigned short sum, unsigned short *in, int x)
+{
+  for (int j = 0; j < SIZE; j += 8)
+sum += in[j] * x;
+  return sum;
+}
+
+int
+main ()
+{
+  for (int i = 0; i < SIZE; i++)
+in[i] = i;
+  if (test (0, in, 1) != 960)
+abort ();


AFAIK tests here usually prefer __builtin_abort ();
That way you don't have to declare the abort prototype in the beginning.

Done.

Updated patch attached
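
As an illustration of the two lane-mapping helpers quoted above, here is a standalone sketch that replaces the GCC macros with explicit parameters. The names lane_map and pair_lane_map, and the parameters nelems, vec_bytes and big_endian, are hypothetical stand-ins for GET_MODE_NUNITS, GET_MODE_SIZE and BYTES_BIG_ENDIAN; the arithmetic follows the quoted patch text.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of neon_endian_lane_map: map a lane number between architectural
   lane order and GCC lane order.  NELEMS is the element count and
   VEC_BYTES the vector size in bytes (both assumptions standing in for
   the machine_mode queries).  */
static int
lane_map (int nelems, int vec_bytes, bool big_endian, int lane)
{
  assert (lane >= 0 && lane < nelems);
  if (big_endian)
    {
      /* Reverse lane order.  */
      lane = nelems - 1 - lane;
      /* Reverse D register order for 128-bit vectors, to match the ABI.  */
      if (vec_bytes == 16)
        lane ^= nelems / 2;
    }
  return lane;
}

/* Sketch of neon_pair_endian_lane_map: the same mapping for an index into
   a pair of vectors.  The low bits are remapped within one vector and the
   bit selecting the vector is preserved.  Assumes NELEMS is a power of
   two.  */
static int
pair_lane_map (int nelems, int vec_bytes, bool big_endian, int lane)
{
  assert (lane >= 0 && lane < 2 * nelems);
  if (big_endian)
    lane = lane_map (nelems, vec_bytes, big_endian, lane & (nelems - 1))
           + (lane & nelems);
  return lane;
}
```

For a 16-byte vector of 8 elements on big endian, lanes 0..7 map to 3, 2, 1, 0, 7, 6, 5, 4, and pair index 8 (lane 0 of the second vector) maps to 11. On little endian both functions return the lane unchanged.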

[PATCH][ARM][RFC] PR target/65578 Fix gcc.dg/torture/stackalign/builtin-apply-4.c for single-precision fpus


Hi all,

In this wrong-code PR the builtin-apply-4.c test fails with -flto, but only
when targeting an FPU with only single-precision capabilities.

bar is a function returning a double. For non-LTO compilation the caller of
bar reads its return value from the s0 and s1 VFP registers as expected, but
for -flto the caller seems to expect the return value in the r0 and r1 regs.
The RTL dumps show that too.

Debugging the calls to arm_function_value shows that in the -flto compilation
the function bar is deemed to be a local function call and assigned the
ARM_PCS_AAPCS_LOCAL PCS variant, whereas the non-LTO (and non-breaking)
compilation uses the ARM_PCS_AAPCS_VFP variant.

Further down in use_vfp_abi, when deciding whether to use VFP registers for
the result, there is a bit of logic that rejects VFP registers when handling
the ARM_PCS_AAPCS_LOCAL variant with a double-precision value on an FPU that
is not TARGET_VFP_DOUBLE.

This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means that
the function doesn't escape the translation unit and we can thus use whatever
variant we want. From what I understand we want to use the VFP regs when
possible for FP values.

So this patch removes that restriction, and for the testcase the caller of bar
correctly reads the return value of bar from the VFP registers and everything
works.
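
The change in behaviour can be modelled outside GCC with a small sketch. The enum, the target_flags struct and both function names below are hypothetical stand-ins for the arm_pcs variants and the TARGET_* macros. The ARM_PCS_AAPCS_VFP branch is assumed to simply answer true; the real function has additional code there that the patch hunks do not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the arm_pcs variants named in the message.  */
enum pcs_variant { PCS_AAPCS, PCS_AAPCS_VFP, PCS_AAPCS_LOCAL };

/* Hypothetical stand-ins for TARGET_32BIT, TARGET_VFP, TARGET_HARD_FLOAT
   and TARGET_VFP_DOUBLE.  */
struct target_flags
{
  bool is_32bit;
  bool has_vfp;
  bool hard_float;
  bool vfp_double;
};

/* Model of the decision before the patch: a local function returning a
   double is denied VFP registers on a single-precision-only FPU.  */
static bool
use_vfp_abi_old (const struct target_flags *t, enum pcs_variant v,
                 bool is_double)
{
  if (v == PCS_AAPCS_VFP)
    return true;  /* Assumption: extra checks in the real code are omitted.  */
  if (v != PCS_AAPCS_LOCAL)
    return false;
  return (t->is_32bit && t->has_vfp && t->hard_float
          && (t->vfp_double || !is_double));
}

/* Model of the decision after the patch: the is_double restriction is
   gone, so the local variant agrees with the VFP variant.  */
static bool
use_vfp_abi_new (const struct target_flags *t, enum pcs_variant v)
{
  if (v == PCS_AAPCS_VFP)
    return true;  /* Same assumption as above.  */
  if (v != PCS_AAPCS_LOCAL)
    return false;
  return t->is_32bit && t->has_vfp && t->hard_float;
}
```

For a single-precision-only FPU (vfp_double false) and a double return value, the old model answers false for the local variant while the VFP variant answers true, which is the kind of caller/callee disagreement described above. The new model answers true for both.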

This patch has been bootstrapped and tested on arm-none-linux-gnueabihf
configured with --with-fpu=fpv4-sp-d16. The bootstrap was performed with LTO.
I didn't see any regressions.

It seems that this logic was put there in 2009 with r154034 as part of a large
patch to enable support for half-precision floating point.

I'm not very familiar with this part of the code, so is this a safe patch to
do? The patch should only ever change behaviour for single-precision-only FPUs
and only for static functions that don't get called outside their translation
units (or during LTO I suppose), so there shouldn't be any ABI problems, I
think.

Is this ok for trunk?

Thanks,
Kyrill

2016-02-09  Kyrylo Tkachov  

PR target/65578
* config/arm/arm.c (use_vfp_abi): Remove is_double argument.
Don't check for is_double and TARGET_VFP_DOUBLE.
(aapcs_vfp_is_call_or_return_candidate): Update callsite.
(aapcs_vfp_is_return_candidate): Likewise.
(aapcs_vfp_is_call_candidate): Likewise.
(aapcs_vfp_allocate_return_reg): Likewise.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e71c9f56cbe846fdddf2e42a9f4575bacee570c1..e1404c74f74d01eb9c3362c7250e2b30ba5e47e7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5696,7 +5696,7 @@ aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep)
 
 /* Return true if PCS_VARIANT should use VFP registers.  */
 static bool
-use_vfp_abi (enum arm_pcs pcs_variant, bool is_double)
+use_vfp_abi (enum arm_pcs pcs_variant)
 {
   if (pcs_variant == ARM_PCS_AAPCS_VFP)
 {
@@ -5715,8 +5715,7 @@ use_vfp_abi (enum arm_pcs pcs_variant, bool is_double)
   if (pcs_variant != ARM_PCS_AAPCS_LOCAL)
 return false;
 
-  return (TARGET_32BIT && TARGET_VFP && TARGET_HARD_FLOAT &&
-	  (TARGET_VFP_DOUBLE || !is_double));
+  return (TARGET_32BIT && TARGET_VFP && TARGET_HARD_FLOAT);
 }
 
 /* Return true if an argument whose type is TYPE, or mode is MODE, is
@@ -5758,7 +5757,7 @@ aapcs_vfp_is_call_or_return_candidate (enum arm_pcs pcs_variant,
 return false;
 
 
-  if (!use_vfp_abi (pcs_variant, ARM_NUM_REGS (new_mode) > 1))
+  if (!use_vfp_abi (pcs_variant))
 return false;
 
   *base_mode = new_mode;
@@ -5772,7 +5771,7 @@ aapcs_vfp_is_return_candidate (enum arm_pcs pcs_variant,
   int count ATTRIBUTE_UNUSED;
   machine_mode ag_mode ATTRIBUTE_UNUSED;
 
-  if (!use_vfp_abi (pcs_variant, false))
+  if (!use_vfp_abi (pcs_variant))
 return false;
   return aapcs_vfp_is_call_or_return_candidate (pcs_variant, mode, type,
 		&ag_mode, &count);
@@ -5782,7 +5781,7 @@ static bool
 aapcs_vfp_is_call_candidate (CUMULATIVE_ARGS *pcum, machine_mode mode,
 			 const_tree type)
 {
-  if (!use_vfp_abi (pcum->pcs_variant, false))
+  if (!use_vfp_abi (pcum->pcs_variant))
 return false;
 
   return aapcs_vfp_is_call_or_return_candidate (pcum->pcs_variant, mode, type,
@@ -5848,7 +5847,7 @@ aapcs_vfp_allocate_return_reg (enum arm_pcs pcs_variant ATTRIBUTE_UNUSED,
 			   machine_mode mode,
 			   const_tree type ATTRIBUTE_UNUSED)
 {
-  if (!use_vfp_abi (pcs_variant, false))
+  if (!use_vfp_abi (pcs_variant))
 return NULL;
 
   if (mode == BLKmode

Re: [PATCH][ARM][RFC] PR target/65578 Fix gcc.dg/torture/stackalign/builtin-apply-4.c for single-precision fpus



On 09/02/16 17:21, Kyrill Tkachov wrote:

Hi all,

In this wrong-code PR the builtin-apply-4.c test fails with -flto, but only
when targeting an FPU with only single-precision capabilities.

bar is a function returning a double. For non-LTO compilation the caller of
bar reads its return value from the s0 and s1 VFP registers as expected, but
for -flto the caller seems to expect the return value in the r0 and r1 regs.
The RTL dumps show that too.

Debugging the calls to arm_function_value shows that in the -flto compilation
the function bar is deemed to be a local function call and assigned the
ARM_PCS_AAPCS_LOCAL PCS variant, whereas the non-LTO (and non-breaking)
compilation uses the ARM_PCS_AAPCS_VFP variant.

Further down in use_vfp_abi, when deciding whether to use VFP registers for
the result, there is a bit of logic that rejects VFP registers when handling
the ARM_PCS_AAPCS_LOCAL variant with a double-precision value on an FPU that
is not TARGET_VFP_DOUBLE.

This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means that
the function doesn't escape the translation unit and we can thus use whatever
variant we want. From what I understand we want to use the VFP regs when
possible for FP values.

So this patch removes that restriction, and for the testcase the caller of bar
correctly reads the return value of bar from the VFP registers and everything
works.

This patch has been bootstrapped and tested on arm-none-linux-gnueabihf
configured with --with-fpu=fpv4-sp-d16. The bootstrap was performed with LTO.
I didn't see any regressions.

It seems that this logic was put there in 2009 with r154034 as part of a large
patch to enable support for half-precision floating point.

I'm not very familiar with this part of the code, so is this a safe patch to
do? The patch should only ever change behaviour for single-precision-only FPUs
and only for static functions that don't get called outside their translation
units (or during LTO I suppose), so there shouldn't be any ABI problems, I
think.

Is this ok for trunk?

Thanks,
Kyrill



Huh, I just realised I wrote completely the wrong PR number on this.
The PR I'm talking about here is PR target/69538

Sorry for the confusion.

Kyrill



2016-02-09 Kyrylo Tkachov  

PR target/65578
* config/arm/arm.c (use_vfp_abi): Remove is_double argument.
Don't check for is_double and TARGET_VFP_DOUBLE.
(aapcs_vfp_is_call_or_return_candidate): Update callsite.
(aapcs_vfp_is_return_candidate): Likewise.
(aapcs_vfp_is_call_candidate): Likewise.
(aapcs_vfp_allocate_return_reg): Likewise.

Re: [PATCH, rs6000] Add -maltivec=be semantics in LE mode for vec_ld and vec_st

2016-02-09  Ulrich Weigand

Hi Bill,

> 2014-02-20  Bill Schmidt  
> 
>   * config/rs6000/altivec.md (altivec_lvxl): Rename as
>   *altivec_lvxl__internal and use VM2 iterator instead of
>   V4SI.
>   (altivec_lvxl_): New define_expand incorporating
>   -maltivec=be semantics where needed.

I just noticed that this:

> -(define_insn "altivec_lvxl"
> +(define_expand "altivec_lvxl_"
>[(parallel
> -[(set (match_operand:V4SI 0 "register_operand" "=v")
> -   (match_operand:V4SI 1 "memory_operand" "Z"))
> +[(set (match_operand:VM2 0 "register_operand" "=v")
> +   (match_operand:VM2 1 "memory_operand" "Z"))
>   (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
>"TARGET_ALTIVEC"
> -  "lvxl %0,%y1"
> +{
> +  if (!BYTES_BIG_ENDIAN && VECTOR_ELT_ORDER_BIG)
> +{
> +  altivec_expand_lvx_be (operands[0], operands[1], mode, 
> UNSPEC_SET_VSCR);
> +  DONE;
> +}
> +})
> +
> +(define_insn "*altivec_lvxl__internal"
> +  [(parallel
> +[(set (match_operand:VM2 0 "register_operand" "=v")
> +   (match_operand:VM2 1 "memory_operand" "Z"))
> + (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
> +  "TARGET_ALTIVEC"
> +  "lvx %0,%y1"
>[(set_attr "type" "vecload")])

now causes vec_ldl to emit the lvx instead of the lvxl instruction.
I assume this was not actually intended?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: Combine simplify_set WORD_REGISTER_OPERATIONS

2016-02-09  Segher Boessenkool

On Mon, Feb 01, 2016 at 08:46:42AM +1030, Alan Modra wrote:
> The comment says this test is supposed to prevent "a narrower
> operation than requested", but it actually only allows a larger
> subreg, not one the same size.  Fix that.
> 
> Bootstrapped and regression tested powerpc64-linux.  OK for stage1?

It looks good, but please post it again then.


Segher

[PATCH, i386]: Use gen_int_mode to truncate const_int operand

2016-02-09  Uros Bizjak

Hello!

No need to go through all subreg processing, we already know we have
const_int here.

2016-02-09  Uros Bizjak  

* config/i386/i386.md (insv_1): Use gen_int_mode to
truncate const_int operand 1 to QImode.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32},
committed to mainline SVN.

Uros.

Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 233245)
+++ config/i386/i386.md (working copy)
@@ -2883,7 +2883,7 @@
   ""
 {
   if (CONST_INT_P (operands[1]))
-operands[1] = simplify_gen_subreg (QImode, operands[1], mode, 0);
+operands[1] = gen_int_mode (INTVAL (operands[1]), QImode);
   return "mov{b}\t{%b1, %h0|%h0, %b1}";
 }
   [(set_attr "isa" "*,nox64")
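
To make the effect of the replacement concrete: gen_int_mode produces a constant that is canonical for the given mode, which for QImode means keeping the low 8 bits and sign-extending them. The helper below is a hypothetical plain-C model of that truncation, not GCC code, and the name truncate_to_qi is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: keep the low 8 bits of VALUE and sign-extend the result, which
   is how a constant is canonicalised for an 8-bit integer mode.  */
static int64_t
truncate_to_qi (int64_t value)
{
  /* LOW is in the range 0 .. 255 after masking.  */
  int64_t low = (int64_t) ((uint64_t) value & 0xff);
  assert (low >= 0 && low <= 0xff);
  /* Values with bit 7 set represent negative 8-bit numbers.  */
  return (low & 0x80) ? low - 256 : low;
}
```

Under this model 0x7f stays 127, 0x80 becomes -128, 0x1ff becomes -1, and 0x100 becomes 0.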

[PATCH], PR target/68404, Fix PowerPC fusion error

2016-02-09  Michael Meissner

This patch fixes PR 68404, in which the compiler created an insn for the
fusion operation when accessing an array with a large constant offset that the
downstream passes (regrename in particular) don't like.  Because fusion in
general adds so little to the performance of power8, I just stopped the
compiler from generating this case for GCC 6.  In the GCC 7 timeframe, I
likely will revisit fusion for power9 support.  I ran a spec 2006 benchmark
suite comparing the current behavior and the fix for PR 68404, and the
difference was in the noise level (mcf was 1% slower, others ranged from 0.3%
slower to 0.4% faster).

I did a bootstrap build, including a bootstrap profiled build with LTO (which
is how the problem was found), and it was fine.  I rewrote 2 of the 3 fusion
tests so that they use fusion from a medium code toc entry instead of
accessing an array element with a constant index over 65536 bytes.

Is this patch ok to apply?  If you would prefer, I can eliminate the code
inside of the fusion_gpr_addis predicate instead of using #if 0.

[gcc]
2016-02-08  Michael Meissner  

PR target/68404
* config/rs6000/predicates.md (fusion_gpr_addis): Prevent fusing
an ADDIS that adds a pointer to a large constant that sets the
upper16 bits with a load operation.

[gcc/testsuite]
2016-02-08  Michael Meissner  

PR target/68404
* gcc.target/powerpc/fusion.c: Rewrite test to use TOC fusion
instead of accessing a really large array.
* gcc.target/powerpc/fusion3.c: Likewise.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 233220)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1716,10 +1716,17 @@ (define_predicate "fusion_gpr_addis"
   if (CONST_INT_P (op))
 int_const = op;
 
+#if 0
+  /* PR 68404 -- regrename doesn't  like:
+
+   (mem (plus (plus (reg)
+(const_int))
+  (const_int  */
   else if (GET_CODE (op) == PLUS
   && base_reg_operand (XEXP (op, 0), Pmode)
   && CONST_INT_P (XEXP (op, 1)))
 int_const = XEXP (op, 1);
+#endif
 
   else
 return 0;
Index: gcc/testsuite/gcc.target/powerpc/fusion3.c
===
--- gcc/testsuite/gcc.target/powerpc/fusion3.c  (revision 233220)
+++ gcc/testsuite/gcc.target/powerpc/fusion3.c  (working copy)
@@ -4,15 +4,24 @@
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power7" } } */
 /* { dg-options "-mcpu=power7 -mtune=power9 -O3" } */
 
-#define LARGE 0x12345
+#define SIZE 4
+struct foo {
+  float f;
+  double d;
+};
 
-int fusion_float_read (float *p){ return p[LARGE]; }
-int fusion_double_read (double *p){ return p[LARGE]; }
+static struct foo st[SIZE];
+struct foo *ptr_st = &st[0];
 
-void fusion_float_write (float *p, float f){ p[LARGE] = f; }
-void fusion_double_write (double *p, double d){ p[LARGE] = d; }
+float fusion_float_read (void){ return st[SIZE].f; }
+double fusion_float_extend (void){ return (double)st[SIZE].f; }
+double fusion_double_read (void){ return st[SIZE].d; }
 
-/* { dg-final { scan-assembler "load fusion, type SF"  } } */
-/* { dg-final { scan-assembler "load fusion, type DF"  } } */
-/* { dg-final { scan-assembler "store fusion, type SF" } } */
-/* { dg-final { scan-assembler "store fusion, type DF" } } */
+void fusion_float_write (float f){ st[SIZE].f = f; }
+void fusion_float_truncate (double d){ st[SIZE].f = (float)d; }
+void fusion_double_write (double d){ st[SIZE].d = d; }
+
+/* { dg-final { scan-assembler-times "load fusion, type SF"  2 } } */
+/* { dg-final { scan-assembler-times "load fusion, type DF"  1 } } */
+/* { dg-final { scan-assembler-times "store fusion, type SF" 2 } } */
+/* { dg-final { scan-assembler-times "store fusion, type DF" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/fusion.c
===
--- gcc/testsuite/gcc.target/powerpc/fusion.c   (revision 233220)
+++ gcc/testsuite/gcc.target/powerpc/fusion.c   (working copy)
@@ -1,17 +1,28 @@
-/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power7" } } */
-/* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
+/* { dg-options "-mcpu=power7 -mtune=power8 -O3 -mcmodel=medium" } */
 
-#define LARGE 0x12345
+#define SIZE 4
+struct foo {
+  unsigned char uc;
+  signed char sc;
+  unsigned short us;
+  short ss;
+  int i;
+  unsigned u;
+};
 
-int fusion_uchar (unsigned char *p){ return p[LARGE]; }
-int fusion_schar

Re: [PR69634] fix debug_insn-inconsistent REG_N_CALLS_CROSSED

2016-02-09  Jeff Law


On 02/06/2016 03:06 AM, Alexandre Oliva wrote:

The testcase has a debug insn referencing a pseudo right before an
insn that modifies the pseudo.

Without debug insns, REG_N_CALLS_CROSSED was zero for that pseudo, so
sched_analyze_reg added a dep between the pseudo setter and an earlier
(lib)call.

With debug insns, we miscomputed REG_N_CALLS_CROSSED as nonzero
because of the debug insn, and then no dep was added between the two
insns.  This was enough to change sched1's decisions about where to
place the pseudo setter.

REG_N_CALLS_CROSSED is computed by both regstat_bb_compute_ri and
regstat_bb_compute_calls_crossed, but although the former skipped
debug insns, the latter didn't.

Fixing this inconsistency was enough to fix the -fcompare-debug error.
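
The effect described above can be reproduced in a toy model. The sketch below is not regstat.c: the insn struct, the bitmask liveness and the function calls_crossed are all hypothetical simplifications, meant only to show why letting a debug insn make a pseudo live inflates the count of calls crossed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum insn_kind { INSN_NORMAL, INSN_CALL, INSN_DEBUG };

/* Toy insn: which pseudos (as bits) it reads and writes.  */
struct insn
{
  enum insn_kind kind;
  unsigned uses;
  unsigned defs;
};

/* Scan a block backwards, tracking liveness, and count the calls at which
   pseudo REG is live.  If SKIP_DEBUG is false, debug insns are treated
   like ordinary uses, which models the inconsistent behaviour.  */
static int
calls_crossed (const struct insn *insns, size_t n, unsigned live_out,
               unsigned reg, bool skip_debug)
{
  assert (reg < 32);
  unsigned bit = 1u << reg;
  unsigned live = live_out;
  int crossed = 0;

  for (size_t i = n; i-- > 0;)
    {
      const struct insn *in = &insns[i];
      if (in->kind == INSN_DEBUG && skip_debug)
        continue;
      /* Pseudos set here are not live before this insn.  */
      live &= ~in->defs;
      /* Whatever is still live is live across the call.  */
      if (in->kind == INSN_CALL && (live & bit))
        crossed++;
      /* Pseudos read here are live before this insn.  */
      live |= in->uses;
    }
  return crossed;
}
```

For the sequence call, debug use of pseudo 0, set of pseudo 0 (with pseudo 0 live on exit), the model returns 1 when debug insns are counted and 0 when they are skipped, matching the nonzero versus zero REG_N_CALLS_CROSSED values in the description.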

Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR target/69634
* regstat.c (regstat_bb_compute_calls_crossed): Disregard
debug insns.

for  gcc/testsuite/ChangeLog

PR target/69634
* gcc.dg/pr69634.c: New.

I removed the explicit -m32.  It was there merely to try and exercise 
the test, even on an x86-64 configured toolchain.  As Uros noted, a 
standard multilib of x86-64 will test -m32, so the explicit -m32 isn't 
needed and it is in fact harmful.


Committed to the trunk with that change.  I'm going to add 4.9/5 
regression markers to the BZ since this affects those releases as well.


Jeff

Re: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian

Committed to trunk as r233252

On 9 February 2016 at 17:07, Charles Baylis  wrote:
> On 8 February 2016 at 11:42, Kyrill Tkachov  
> wrote:
>
>> On 03/02/16 18:59, charles.bay...@linaro.org wrote:
>>> --- a/gcc/config/arm/arm.c
>>> +++ b/gcc/config/arm/arm.c
>>> @@ -28318,15 +28318,21 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
>>> unsigned int i, high, mask, nelt = d->nelt;
>>> rtx out0, out1, in0, in1;
>>> rtx (*gen)(rtx, rtx, rtx, rtx);
>>> +  int first_elem;
>>> +  bool is_swapped;
>>>   if (GET_MODE_UNIT_SIZE (d->vmode) >= 8)
>>>   return false;
>>>   +  is_swapped = BYTES_BIG_ENDIAN ? true : false;
>>
>>
>> This is just "is_swapped = BYTES_BIG_ENDIAN;"
>
> Done.
>
>>> +
>>> /* Note that these are little-endian tests.  Adjust for big-endian
>>> later.  */
>>
>>
>> I think you can remove this comment now, like in patch 1/2
>
> Done.
>
>>> +  first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped];
>>> +
>>> high = nelt / 2;
>>> -  if (d->perm[0] == high)
>>> +  if (first_elem == neon_endian_lane_map (d->vmode, high))
>>>   ;
>>> -  else if (d->perm[0] == 0)
>>> +  else if (first_elem == neon_endian_lane_map (d->vmode, 0))
>>>   high = 0;
>>> else
>>>   return false;
>>> @@ -28334,11 +28340,16 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
>>>   for (i = 0; i < nelt / 2; i++)
>>>   {
>>> -  unsigned elt = (i + high) & mask;
>>> -  if (d->perm[i * 2] != elt)
>>> +  unsigned elt =
>>> +   neon_pair_endian_lane_map (d->vmode, i + high) & mask;
>>> +  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i +
>>> is_swapped)]
>>> + != elt)
>>> return false;
>>> -  elt = (elt + nelt) & mask;
>>> -  if (d->perm[i * 2 + 1] != elt)
>>> +  elt =
>>> +   neon_pair_endian_lane_map (d->vmode, i + nelt + high)
>>> +   & mask;
>>
>>
>> The "& mask" can go on the previous line.
>
> Done
>
>>> +  if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i +
>>> !is_swapped)]
>>> + != elt)
>>> return false;
>>>   }
>>>   @@ -28362,10 +28373,9 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d
>>> *d)
>>>   in0 = d->op0;
>>> in1 = d->op1;
>>> -  if (BYTES_BIG_ENDIAN)
>>> +  if (is_swapped)
>>>   {
>>> std::swap (in0, in1);
>>> -  high = !high;
>>>   }
>>
>>
>> remove the braces around the std::swap.
>
> Done.
>
>> Ok with these changes.
>> I've tried out both patches and they do fix execution failures on big-endian
>> and don't break any NEON intrinsics tests that I threw at them.
>
> Attached for completeness, will commit once the VUZP patch is OKd.

Re: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian