Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check
On 11/06/2015 08:16 PM, Kyrill Tkachov wrote:
On 06/11/15 17:09, Kyrill Tkachov wrote:
On 06/11/15 17:07, Nikolai Bozhenov wrote:
On 11/06/2015 04:46 PM, Ramana Radhakrishnan wrote:

Hi! I faced the same issue but I had somewhat different RTL for the consumer:

(insn 20 15 21 2 (set (reg/i:SI 0 r0)
        (minus:SI (subreg:SI (reg:DI 117) 4)
            (mult:SI (reg:SI 123)
                (reg:SI 114)))) gasman.c:4 48 {*mulsi3subsi})

where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one cycle in this case?

If the accumulator can be forwarded (i.e. a SImode register), there isn't a reason why a subreg:SI (reg:DI) will not get forwarded. The subreg:SI is an artifact before register allocation, thus it's a representation issue that the patch is fixing here unless I misunderstand your question.

I mean, in my example it is not the multiplication result that is forwarded but its upper part. So, shouldn't we check that offset in a subreg expression is zero? Or is it ok to forward only the upper part of a multiplication?

Could you please post the full RTL instruction we're talking about here as it appears in the scheduler dump? So that we're all on the same page about which case we're talking about.

Sorry, missed the above instruction. This subreg is just a pre-register allocation representation of the instruction and will go away after reload. This particular function only really has a real effect in post-reload scheduling as it's only there when the final register numbers are known.

I see. aarch_accumulator_forwarding always returns 0 for virtual registers. But isn't it overly pessimistic to assume that accumulator forwarding is never possible at sched1? I wonder if it would be better to be more optimistic about register allocation outcome. I mean, in case of virtual registers we could assume forwarding from A to B if B is the only consumer of A's result.
Something like this:

  if (REGNO (dest) >= FIRST_VIRTUAL_REGISTER
      || REGNO (accumulator) >= FIRST_VIRTUAL_REGISTER)
    return (DF_REG_USE_COUNT (REGNO (dest)) == 1)
           && (DF_REF_INSN (DF_REG_USE_CHAIN (REGNO (dest))) == consumer);
  else
    return REGNO (dest) == REGNO (accumulator);

Thanks,
Nikolai
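The idea behind the proposed condition can be illustrated with a small standalone model. This is not GCC code; `struct reg_uses` and `may_forward_before_ra` are made-up names standing in for the df information (`DF_REG_USE_COUNT` / `DF_REG_USE_CHAIN`) that the real check would consult. Before register allocation the exact hard registers are unknown, so the heuristic predicts forwarding whenever the producer's result has exactly one use and that use is the consumer being queried.

```c
#include <assert.h>

/* Hypothetical stand-in for the dataflow info GCC tracks per register;
   not a real GCC structure.  */
struct reg_uses
{
  int use_count;      /* number of uses of the produced value */
  int sole_use_insn;  /* uid of the only consumer, valid if use_count == 1 */
};

/* Model of the proposed optimistic pre-RA check: with virtual registers
   we cannot compare hard register numbers, so assume the accumulator
   will be forwarded iff the consumer is the value's only use.  */
static int
may_forward_before_ra (const struct reg_uses *dest, int consumer_uid)
{
  return dest->use_count == 1 && dest->sole_use_insn == consumer_uid;
}
```

Whether this single-consumer condition predicts the register allocator's choices well enough is exactly the benchmarking question raised later in the thread.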
FW: [PATCH, ARM/testsuite] Fix thumb2-slow-flash-data.c failures
[Forwarding to gcc-patches, doh!]

Best regards,

Thomas

--- Begin Message ---

Hi,

The ARM-specific thumb2-slow-flash-data.c testcase shows 2 failures when running for arm-none-eabi with -mcpu=cortex-m7:

FAIL: gcc.target/arm/thumb2-slow-flash-data.c (test for excess errors)
FAIL: gcc.target/arm/thumb2-slow-flash-data.c scan-assembler-times movt 13

The first one is due to a missing type specifier in the declaration of labelref, while the second one is due to different constant synthesis as a result of a different tuning for the CPU selected. This patch fixes these issues by adding the missing type specifier and checking for .word and similar directives instead of the number of movt instructions. The new test passes for all of -mcpu=cortex-m{3,4,7} but fails when removing the -mslow-flash-data switch.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2015-11-04  Thomas Preud'homme

	* gcc.target/arm/thumb2-slow-flash-data.c: Add missing typespec for
	labelref and check use of constant pool by looking for .word and
	similar directives.

diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c
index 9852ea5..089a72b 100644
--- a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c
@@ -50,7 +50,7 @@ int foo (int a, int b)
 {
   int i;
-  volatile *labelref = &&label1;
+  volatile int *labelref = &&label1;
 
   if (a > b)
     {
@@ -70,5 +70,4 @@ label1:
   return a + b;
 }
 
-/* { dg-final { scan-assembler-times "movt" 13 } } */
-/* { dg-final { scan-assembler-times "movt.*LC0\\+4" 1 } } */
+/* { dg-final { scan-assembler-not "\\.(float|l\\?double|\d?byte|short|int|long|quad|word)\\s+\[^.\]" } } */

Is this ok for trunk?

Best regards,

Thomas

--- End Message ---
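To see what the new scan-assembler-not line rejects, here is a rough transcription of the test's Tcl regexp into POSIX ERE (in Tcl, `\d` means a digit and `\s` whitespace). The `uses_constant_pool` helper is only an illustration, not part of the testsuite: any data directive such as `.word` or `.4byte` whose operand does not begin with '.' (i.e. is not a label reference like `.LC0`) indicates a literal placed in the constant pool, which -mslow-flash-data is supposed to avoid.

```c
#include <assert.h>
#include <regex.h>

/* Approximate POSIX-ERE version of the Tcl pattern used by the test.
   Returns nonzero if the assembly line appears to emit a literal data
   value (constant pool use) rather than a label reference.  */
static int
uses_constant_pool (const char *asm_line)
{
  regex_t re;
  int r = regcomp (&re,
                   "\\.([0-9]?byte|float|l?double|short|int|long|quad|word)"
                   "[[:space:]]+[^.]",
                   REG_EXTENDED | REG_NOSUB);
  assert (r == 0);
  r = regexec (&re, asm_line, 0, 0, 0);
  regfree (&re);
  return r == 0;
}
```

This is why the rewritten test is robust against per-CPU differences in constant synthesis: it no longer counts movt instructions, it only insists that no literal ends up in flash.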
Re: [PING][PATCH][4.9]Backport "Fix register corruption bug in ree"
On Fri, 6 Nov 2015, Renlin Li wrote:

> Hi Richard,
>
> I am trying to come up with a simple test case, but have failed to do so.
>
> I have isolated the function where the bug is triggered. However, in order
> to make it a useful test case, the silently corrupted register should be
> used somewhere, and should affect the correctness of the program. I cannot
> create such a context.
>
> libstdc++-v3/testsuite/tr1/8_c_compatibility/complex/50880.cc is a concrete
> example which will trigger this bug.

Well, if that testcase exposes the bug you already have a testcase.

Richard.

> Here is a short description of why the test case failed:
>
> Wrong code-gen in the ree pass:
>
> (insn 279 278 531 2 (set (reg:TF 32 v0)
>         (zero_extend:TF (reg:DI 19 x19 [orig:150 __x+8 ] [150])))
>      /libstdc++-v3/include/complex:861 691 {aarch64_movtflow_di}
>      (nil))
> (insn 531 279 275 2 (set (reg:TF 3 x3)
>         (reg:TF 32 v0)) /libstdc++-v3/include/complex:861 -1
>      (nil))
> (insn 275 531 510 2 (set (reg:DI 23 x23 [orig:151 __y ] [151])
>         (mem:DI (plus:DI (reg/v/f:DI 0 x0 [orig:104 __z ] [104])
>                 (const_int 16 [0x10])) [9 MEM[(const long double &)__z_4(D) +
> 16]+0 S8 A128])) /libstdc++-v3/include/complex:859 34 {*movdi_aarch64}
>      (nil))
> (insn 510 275 19 2 (set (zero_extract:TF (reg:TF 32 v0)
>             (const_int 64 [0x40])
>             (const_int 64 [0x40]))
>         (zero_extend:TF (reg:DI 2 x2 [ __x+-8 ])))
>      /libstdc++-v3/include/complex:861 689 {aarch64_movtfhigh_di}
>      (nil))
>
> insn 531 clobbers x4 as well. That's exactly the same problem described in
> Wilco's email. The following is what the correct code should look like. It's
> generated after applying Wilco's fix.
> (insn 279 278 275 2 (set (reg:DI 3 x3 [orig:185 __x ] [185])
>         (reg:DI 19 x19 [orig:150 __x+8 ] [150]))
>      /libstdc++-v3/include/complex:861 34 {*movdi_aarch64}
>      (nil))
> (insn 275 279 509 2 (set (reg:DI 23 x23 [orig:151 __y ] [151])
>         (mem:DI (plus:DI (reg/v/f:DI 0 x0 [orig:104 __z ] [104])
>                 (const_int 16 [0x10])) [9 MEM[(const long double &)__z_4(D) +
> 16]+0 S8 A128])) /libstdc++-v3/include/complex:859 34 {*movdi_aarch64}
>      (nil))
> (insn 509 275 510 2 (set (reg:TF 32 v0)
>         (zero_extend:TF (reg:DI 3 x3 [orig:185 __x ] [185])))
>      /libstdc++-v3/include/complex:861 691 {aarch64_movtflow_di}
>      (nil))
> (insn 510 509 19 2 (set (zero_extract:TF (reg:TF 32 v0)
>             (const_int 64 [0x40])
>             (const_int 64 [0x40]))
>         (zero_extend:TF (reg:DI 2 x2 [ __x+-8 ])))
>      /libstdc++-v3/include/complex:861 689 {aarch64_movtfhigh_di}
>      (nil))
>
> Regards,
> Renlin Li
>
> On 29/10/15 16:33, Richard Biener wrote:
> > On October 29, 2015 4:37:08 PM GMT+01:00, Ramana Radhakrishnan wrote:
> > > On Thu, Jun 4, 2015 at 2:16 PM, Renlin Li wrote:
> > > > Ping ~
> > > >
> > > > Can somebody review it? Thank you!
> > > >
> > > > Regards,
> > > > Renlin Li
> > > >
> > > > On 16/04/15 10:32, Renlin Li wrote:
> > > > > Ping~
> > > > >
> > > > > Regards,
> > > > > Renlin Li
> > > > >
> > > > > On 16/04/15 10:09, wrote:
> > > > > > Ping~
> > > > > > Anybody has time to review it?
> > > > > >
> > > > > > Regards,
> > > > > > Renlin Li
> > > > > >
> > > > > > On 06/02/15 17:48, Renlin Li wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > This is a backport patch for branch 4.9. You can find the original
> > > > > > > patch here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00356.html
> > > > > > > And it has been committed on the trunk as r215205.
> > > > > > >
> > > > > > > This fixes a few libstdc++-v3 test suite failures.
> > > > > > > x86_64 bootstraps Okay, aarch64_be-none-elf libstdc++-v3 tested Okay.
> > > > > > > Okay to commit on branch 4.9?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Renlin Li
> > > > > > >
> > > > > > > 2015-02-06  Renlin Li
> > > > > > >
> > > > > > >     Backport from mainline
> > > > > > >     2014-09-12  Wilco Dijkstra
> > > > > > >
> > > > > > >     * gcc/ree.c (combine_reaching_defs): Ensure inserted copy don't
> > > > > > >     change the number of hard registers.
> > >
> > > richi - an RM question -
> > >
> > > Is this something that can be pulled back to the GCC 4.9 branch assuming
> > > testing still shows no regressions - it breaks aarch64 be on GCC 4.9
> >
> > If it is a regression or wrong-code fix it's OK with a test case.
> >
> > Richard.
> >
> > > ...
> > >
> > > regards
> > > Ramana

--
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
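The reason insn 531 clobbers x4 can be shown with a toy calculation. The numbers are assumptions about AArch64 (X registers hold 8 bytes, TFmode is 16 bytes), and `hard_regno_nregs_model` / `copy_is_safe` are illustrative names, not GCC functions: a TFmode value assigned to x3 needs two consecutive hard registers, x3 and x4, so the ree-inserted TFmode copy silently overwrites x4. This is why the backported fix checks that the inserted copy does not change the number of hard registers.

```c
#include <assert.h>

/* Assumed size of an AArch64 general register, in bytes.  */
enum { UNITS_PER_XREG = 8 };

/* How many consecutive hard registers a value of the given byte size
   occupies (a rough model of GCC's hard_regno_nregs).  */
static int
hard_regno_nregs_model (int mode_size)
{
  return (mode_size + UNITS_PER_XREG - 1) / UNITS_PER_XREG;
}

/* The essence of the fix: a copy inserted by ree is only safe if the
   destination covers the same number of hard registers as the source,
   otherwise neighbouring registers get clobbered.  */
static int
copy_is_safe (int src_mode_size, int dest_mode_size)
{
  return hard_regno_nregs_model (src_mode_size)
         == hard_regno_nregs_model (dest_mode_size);
}
```

In the dump above, insn 531 copies a 16-byte TFmode value into x3, so the model reports two registers occupied (x3 and x4), while the original DImode value needed only one; the fixed code sequence keeps the copy in DImode instead.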
Re: [gomp4, committed] Remove superfluous pass_expand_omp_ssa
Hi! On Sun, 8 Nov 2015 23:00:31 +0100, Tom de Vries wrote: > This patch removes a superfluous pass_expand_omp_ssa. [...] > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -122,7 +122,6 @@ along with GCC; see the file COPYING3. If not see > late. */ > NEXT_PASS (pass_split_functions); >POP_INSERT_PASSES () > - NEXT_PASS (pass_expand_omp_ssa); >NEXT_PASS (pass_release_ssa_names); >NEXT_PASS (pass_rebuild_cgraph_edges); >NEXT_PASS (pass_inline_parameters); Committed to gomp-4_0-branch in r229994: commit 38d123c26f4b1f53258742fd285df2a128150159 Author: tschwinge Date: Mon Nov 9 08:36:54 2015 + Adjust test case for removed pass_expand_omp_ssa gcc/testsuite/ * gcc.dg/gomp/notify-new-function-3.c: Scan "ompexpssa2" instead of "ompexpssa3". git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229994 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/testsuite/ChangeLog.gomp | 5 + gcc/testsuite/gcc.dg/gomp/notify-new-function-3.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp index a300e38..22f1c74 100644 --- gcc/testsuite/ChangeLog.gomp +++ gcc/testsuite/ChangeLog.gomp @@ -1,3 +1,8 @@ +2015-11-09 Thomas Schwinge + + * gcc.dg/gomp/notify-new-function-3.c: Scan "ompexpssa2" instead + of "ompexpssa3". + 2015-11-08 Tom de Vries * g++.dg/tree-ssa/copyprop-1.C: Update for extra pass_dce in pass list. diff --git gcc/testsuite/gcc.dg/gomp/notify-new-function-3.c gcc/testsuite/gcc.dg/gomp/notify-new-function-3.c index c3d097e..033a407 100644 --- gcc/testsuite/gcc.dg/gomp/notify-new-function-3.c +++ gcc/testsuite/gcc.dg/gomp/notify-new-function-3.c @@ -11,4 +11,4 @@ foo (int *__restrict a, int *__restrict b, int *__restrict c) /* Check for new function notification in ompexpssa dump. 
*/
-/* { dg-final { scan-tree-dump-times "Added new ssa gimple function foo\\.\[\\\$_\]loopfn\\.0 to callgraph" 1 "ompexpssa3" } } */
+/* { dg-final { scan-tree-dump-times "Added new ssa gimple function foo\\.\[\\\$_\]loopfn\\.0 to callgraph" 1 "ompexpssa2" } } */

Regards,
Thomas
[PATCH] Minor refactoring in tree-ssanames.c & freelists verifier
This is a piece of a larger refactoring. I'm postponing the larger refactoring until next stage1 -- I have some concerns about the compile-time impact that I'll need to measure. In the mean time I want to get knowledge of the freelists out of pass_release_ssa_names::execute. The larger refactoring will move the freelists into a class and pass_release_ssa_names won't have access to those private members. So the guts of that routine have been moved into what will ultimately be a method of the name manager. I'm also including a leak detector/verification routine for the freelists. This version will bootstrap on the trunk if the verifier is enabled at the end of flush_ssanames_freelist. However, the vectorizer leaks names horribly, thus it won't pass a regression test or a -O3 bootstrap. In the mean time, the verifier is still useful to help find leaks -- it's obviously not checking by default and the code is marked as a debug function. This wraps up the things I expect to address during this stage1 cycle. I've got a fair amount of patch review to cover in the immediate future as well as starting the bugfixing cycle. Bootstrapped and regression tested on x86_64-linux-gnu. Installed on the trunk. Jeff commit feba260ed75060c9acc265ff12a44557e280f5be Author: Jeff Law Date: Sun Nov 8 20:33:59 2015 -0700 [PATCH] Minor refactoring in tree-ssanames.c & freelists verifier * tree-into-ssa.c (names_to_release): No longer static. * tree-into-ssa.h (names_to_release): Declare. * tree-ssanames.c (verify_ssaname_freelists): New debug function. (release_free_names_and_compact_live_names): New function extracted from pass_release_ssa_names::execute. (pass_release_ssa_names::execute): Use it. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index dfb8350..c70ab87 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2015-11-09 Jeff Law + + * tree-into-ssa.c (names_to_release): No longer static. + * tree-into-ssa.h (names_to_release): Declare. 
+ * tree-ssanames.c (verify_ssaname_freelists): New debug function. + (release_free_names_and_compact_live_names): New function extracted + from pass_release_ssa_names::execute. + (pass_release_ssa_names::execute): Use it. + 2015-11-09 Alan Modra * gensupport.c (add_mnemonic_string): Make len param a size_t. @@ -11,7 +20,7 @@ * gcc/bb-reorder.c (reorder_basic_blocks_simple): Treat a conditional branch with only one successor just like unconditional branches. -2015-11-08 Jeff Law +2015-11-08 Jeff Law * tree-ssa-threadupdate.c (register_jump_thraed): Assert that a non-FSM path has no edges marked with EDGE_DFS_BACK. @@ -369,7 +378,7 @@ the dominance info; free it if we can't. (pass_call_cdce::execute): Don't free the dominance info here. -2015-11-06 Jeff Law +2015-11-06 Jeff Law * tree-ssa-threadedge.c (dummy_simplify): Remove. (thread_around_empty_blocks): Remove backedge_seen_p argument. @@ -402,7 +411,7 @@ (build_scop_scattering): Call build_pbb_minimal_scattering_polyhedrons. (build_poly_scop): Call build_scop_minimal_scattering. -2015-11-06 Jeff Law +2015-11-06 Jeff Law * cfg-flags.def (IGNORE): New edge flag. * tree-vrp.c (identify_jump_threads): Mark and clear edges @@ -1174,7 +1183,7 @@ * tree-ssa.c (gimple_uses_undefined_value_p): New. * tree-ssa.h (gimple_uses_undefined_value_p): Declare. -2015-11-02 Jeff Law +2015-11-02 Jeff Law * tree-ssa-threadupdate.c (valid_jump_thread_path): Also detect cases where the loop latch edge is in the middle of an FSM path. @@ -1238,7 +1247,7 @@ PR middle-end/68166 * fold-const.c: Include "md5.h". -2015-11-01 Jeff Law +2015-11-01 Jeff Law * vmsdbgout.c: Revert unused header file reduction patch. @@ -1287,7 +1296,7 @@ * tree-ssa-structalias.c (ipa_pta_execute): Use make_copy_constraint. 
-2015-10-30 Jeff Law +2015-10-30 Jeff Law Nathan Sidwell * config/nvptx/nvptx.h (HARD_REGNO_NREGS): Avoid warning on unused diff --git a/gcc/tree-into-ssa.c b/gcc/tree-into-ssa.c index 6533998..3086f82 100644 --- a/gcc/tree-into-ssa.c +++ b/gcc/tree-into-ssa.c @@ -94,7 +94,7 @@ static sbitmap interesting_blocks; /* Set of SSA names that have been marked to be released after they were registered in the replacement table. They will be finally released after we finish updating the SSA web. */ -static bitmap names_to_release; +bitmap names_to_release; /* vec of vec of PHIs to rewrite in a basic block. Element I corresponds the to basic block with index I. Allocated once per compilation, *not* diff --git a/gcc/tree-into-ssa.h b/gcc/tree-into-ssa.h index c053f78..aed1e95 100644 --- a/gcc/tree-into-ssa.h +++ b/gcc/tree-into-ssa.h @@ -48,5 +48,6 @@ extern void dump_names_replac
Re: [PATCH 1/4] [ARM] PR63870 Add qualifiers for NEON builtins
On 08/11/15 00:26, charles.bay...@linaro.org wrote: > From: Charles Baylis > > gcc/ChangeLog: > > Charles Baylis > > PR target/63870 > * config/arm/arm-builtins.c (enum arm_type_qualifiers): New enumerator > qualifier_struct_load_store_lane_index. > (builtin_arg): New enumerator NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX. > (arm_expand_neon_args): New parameter. Remove ellipsis. Handle NEON > argument qualifiers. > (arm_expand_neon_builtin): Handle new NEON argument qualifier. > * config/arm/arm.h (ENDIAN_LANE_N): New macro. > > Change-Id: Iaa14d8736879fa53776319977eda2089f0a26647 > --- > gcc/config/arm/arm-builtins.c | 48 > +++ > gcc/config/arm/arm.c | 1 + > gcc/config/arm/arm.h | 3 +++ > 3 files changed, 34 insertions(+), 18 deletions(-) > > diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c > index bad3dc3..6e3aad4 100644 > --- a/gcc/config/arm/arm-builtins.c > +++ b/gcc/config/arm/arm-builtins.c > @@ -67,7 +67,9 @@ enum arm_type_qualifiers >/* Polynomial types. */ >qualifier_poly = 0x100, >/* Lane indices - must be within range of previous argument = a vector. */ > - qualifier_lane_index = 0x200 > + qualifier_lane_index = 0x200, > + /* Lane indices for single lane structure loads and stores. */ > + qualifier_struct_load_store_lane_index = 0x400 > }; > > /* The qualifier_internal allows generation of a unary builtin from > @@ -1963,6 +1965,7 @@ typedef enum { >NEON_ARG_COPY_TO_REG, >NEON_ARG_CONSTANT, >NEON_ARG_LANE_INDEX, > + NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX, >NEON_ARG_MEMORY, >NEON_ARG_STOP > } builtin_arg; > @@ -2020,9 +2023,9 @@ neon_dereference_pointer (tree exp, tree type, > machine_mode mem_mode, > /* Expand a Neon builtin. */ > static rtx > arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, > - int icode, int have_retval, tree exp, ...) 
> + int icode, int have_retval, tree exp, > + builtin_arg *args) > { > - va_list ap; >rtx pat; >tree arg[SIMD_MAX_BUILTIN_ARGS]; >rtx op[SIMD_MAX_BUILTIN_ARGS]; > @@ -2037,13 +2040,11 @@ arm_expand_neon_args (rtx target, machine_mode > map_mode, int fcode, > || !(*insn_data[icode].operand[0].predicate) (target, tmode))) > target = gen_reg_rtx (tmode); > > - va_start (ap, exp); > - >formals = TYPE_ARG_TYPES (TREE_TYPE (arm_builtin_decls[fcode])); > >for (;;) > { > - builtin_arg thisarg = (builtin_arg) va_arg (ap, int); > + builtin_arg thisarg = args[argc]; > >if (thisarg == NEON_ARG_STOP) > break; > @@ -2079,6 +2080,18 @@ arm_expand_neon_args (rtx target, machine_mode > map_mode, int fcode, > op[argc] = copy_to_mode_reg (mode[argc], op[argc]); > break; > > + case NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX: > + gcc_assert (argc > 1); > + if (CONST_INT_P (op[argc])) > + { > + neon_lane_bounds (op[argc], 0, > + GET_MODE_NUNITS (map_mode), exp); > + /* Keep to GCC-vector-extension lane indices in the RTL. */ > + op[argc] = > + GEN_INT (ENDIAN_LANE_N (map_mode, INTVAL (op[argc]))); > + } > + goto constant_arg; > + > case NEON_ARG_LANE_INDEX: > /* Previous argument must be a vector, which this indexes. */ > gcc_assert (argc > 0); > @@ -2089,19 +2102,22 @@ arm_expand_neon_args (rtx target, machine_mode > map_mode, int fcode, > } > /* Fall through - if the lane index isn't a constant then >the next case will error. */ > + > case NEON_ARG_CONSTANT: > +constant_arg: > if (!(*insn_data[icode].operand[opno].predicate) > (op[argc], mode[argc])) > - error_at (EXPR_LOCATION (exp), "incompatible type for argument > %d, " > -"expected %", argc + 1); > + { > + error ("%Kargument %d must be a constant immediate", > + exp, argc + 1); > + return const0_rtx; > + } > break; > + > case NEON_ARG_MEMORY: > /* Check if expand failed. */ > if (op[argc] == const0_rtx) > - { > - va_end (ap); > return 0; > - } > gcc_assert (MEM_P (op[argc])); > PUT_MODE (op[argc], mode[argc]); > /* ??? 
arm_neon.h uses the same built-in functions for signed > @@ -2122,8 +2138,6 @@ arm_expand_neon_args (rtx target, machine_mode > map_mode, int fcode, > } > } > > - va_end (ap); > - >if (have_retval) > switch (argc) >{ > @@ -2235,6 +2249,8 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx > target) > >if (d->quali
Re: RFC: C++ delayed folding merge
On Mon, 9 Nov 2015, Jason Merrill wrote:

> I'm planning to merge the C++ delayed folding branch this week, but I need to
> get approval of the back end changes (the first patch attached). Most of
> these are the introduction of non-folding variants of convert_to_*, but there
> are a few others.
>
> One question: The branch changes 'convert' to not fold its result, and it's
> not clear to me whether that's part of the expected behavior of a front end
> 'convert' function or not.

History. convert is purely frontend (but shared, unfortunately between all frontends). I would expect that FEs that do not do delayed folding expect convert to fold.

> Also, I'm a bit uncertain about merging this at the end of stage 1, since it's
> a large internal change with relatively small user impact; it just improves
> handling of constant expression corner cases. I'm inclined to go ahead with
> it at this point, but I'm interested in contrary opinions.

I welcome this change as it should allow cleaning up the FE-middle-end interface a bit more. It should be possible to remove all NON_LVALUE_EXPR adding/removal from the middle-end folders.

Looks like the backend patch included frontend parts, but as far as I skimmed it only

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 5e32901..d754a90 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -2091,6 +2091,17 @@ fold_convert_const (enum tree_code code, tree type, tree arg1)
       else if (TREE_CODE (arg1) == REAL_CST)
 	return fold_convert_const_fixed_from_real (type, arg1);
     }
+  else if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (arg1) == VECTOR_CST
+	  && TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (TREE_TYPE (arg1))
+	  && TYPE_VECTOR_SUBPARTS (type) == VECTOR_CST_NELTS (arg1))
+	{
+	  tree r = copy_node (arg1);
+	  TREE_TYPE (arg1) = type;
+	  return r;
+	}
+    }

looks suspicious. The issue here is that the vector elements will have the wrong type after this simple handling.
If you fix that you can as well handle all kind of element type changes via recursing to fold_convert_const (that includes float to int / int to float changes). Not sure if we can even write a testcase for such conversions with the GCC vector extensions though. Thanks, Richard.
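The suggested recursion can be sketched with an ordinary C loop. This is a toy model only: the real fold_convert_const works on tree constants, not arrays, and `convert_vector_elements` is an invented name. The point is that converting a vector constant element by element gives every element the new element type, and the same recursion naturally covers int-to-float and float-to-int element changes, unlike retyping the VECTOR_CST wholesale.

```c
#include <assert.h>

enum { NELTS = 4 };

/* Toy model of recursing per element instead of retyping the whole
   vector constant: each element is converted individually (modeled
   here by a plain cast), so the elements end up with the right type
   too, and cross-kind conversions fall out for free.  */
static void
convert_vector_elements (double *dest, const int *src)
{
  int i;
  for (i = 0; i < NELTS; i++)
    dest[i] = (double) src[i];  /* per-element "fold_convert_const" */
}
```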
Re: [PATCH 2/4] [ARM] PR63870 Mark lane indices of vldN/vstN with appropriate qualifier
On 08/11/15 00:26, charles.bay...@linaro.org wrote: > From: Charles Baylis > > gcc/ChangeLog: > > Charles Baylis > > PR target/63870 > * config/arm/arm-builtins.c: (arm_load1_qualifiers) Use > qualifier_struct_load_store_lane_index. > (arm_storestruct_lane_qualifiers) Likewise. > * config/arm/neon.md: (neon_vld1_lane) Reverse lane numbers for > big-endian. > (neon_vst1_lane) Likewise. > (neon_vld2_lane) Likewise. > (neon_vst2_lane) Likewise. > (neon_vld3_lane) Likewise. > (neon_vst3_lane) Likewise. > (neon_vld4_lane) Likewise. > (neon_vst4_lane) Likewise. > > Change-Id: Ic39898d288701bc5b712490265be688f5620c4e2 > --- > gcc/config/arm/arm-builtins.c | 4 ++-- > gcc/config/arm/neon.md| 49 > +++ > 2 files changed, 28 insertions(+), 25 deletions(-) > > diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c > index 6e3aad4..113e3da 100644 > --- a/gcc/config/arm/arm-builtins.c > +++ b/gcc/config/arm/arm-builtins.c > @@ -152,7 +152,7 @@ arm_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS] > static enum arm_type_qualifiers > arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] >= { qualifier_none, qualifier_const_pointer_map_mode, > - qualifier_none, qualifier_immediate }; > + qualifier_none, qualifier_struct_load_store_lane_index }; > #define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers) > > /* The first argument (return type) of a store should be void type, > @@ -171,7 +171,7 @@ arm_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS] > static enum arm_type_qualifiers > arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] >= { qualifier_void, qualifier_pointer_map_mode, > - qualifier_none, qualifier_immediate }; > + qualifier_none, qualifier_struct_load_store_lane_index }; > #define STORE1LANE_QUALIFIERS (arm_storestruct_lane_qualifiers) > > #define v8qi_UP V8QImode > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md > index e5a2b0f..e8db020 100644 > --- a/gcc/config/arm/neon.md > +++ b/gcc/config/arm/neon.md > @@ -4261,8 +4261,9 @@ if (BYTES_BIG_ENDIAN) > 
UNSPEC_VLD1_LANE))] >"TARGET_NEON" > { > - HOST_WIDE_INT lane = INTVAL (operands[3]); > + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); > + operands[3] = GEN_INT (lane); >if (lane < 0 || lane >= max) > error ("lane out of range"); >if (max == 1) > @@ -4281,8 +4282,9 @@ if (BYTES_BIG_ENDIAN) > UNSPEC_VLD1_LANE))] >"TARGET_NEON" > { > - HOST_WIDE_INT lane = INTVAL (operands[3]); > + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); > + operands[3] = GEN_INT (lane); >int regno = REGNO (operands[0]); >if (lane < 0 || lane >= max) > error ("lane out of range"); > @@ -4367,8 +4369,9 @@ if (BYTES_BIG_ENDIAN) > UNSPEC_VST1_LANE))] >"TARGET_NEON" > { > - HOST_WIDE_INT lane = INTVAL (operands[2]); > + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); > + operands[2] = GEN_INT (lane); >if (lane < 0 || lane >= max) > error ("lane out of range"); >if (max == 1) > @@ -4387,7 +4390,7 @@ if (BYTES_BIG_ENDIAN) > UNSPEC_VST1_LANE))] >"TARGET_NEON" > { > - HOST_WIDE_INT lane = INTVAL (operands[2]); > + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[1]); >if (lane < 0 || lane >= max) > @@ -4396,8 +4399,8 @@ if (BYTES_BIG_ENDIAN) > { >lane -= max / 2; >regno += 2; > - operands[2] = GEN_INT (lane); > } > + operands[2] = GEN_INT (lane); >operands[1] = gen_rtx_REG (mode, regno); >if (max == 2) > return "vst1.\t{%P1}, %A0"; > @@ -4457,7 +4460,7 @@ if (BYTES_BIG_ENDIAN) > UNSPEC_VLD2_LANE))] >"TARGET_NEON" > { > - HOST_WIDE_INT lane = INTVAL (operands[3]); > + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[0]); >rtx ops[4]; > @@ -4466,7 +4469,7 @@ if (BYTES_BIG_ENDIAN) >ops[0] = gen_rtx_REG (DImode, regno); >ops[1] = gen_rtx_REG 
(DImode, regno + 2); >ops[2] = operands[1]; > - ops[3] = operands[3]; > + ops[3] = GEN_INT (lane); >output_asm_insn ("vld2.\t{%P0[%c3], %P1[%c3]}, %A2", ops); >return ""; > } > @@ -4482,7 +4485,7 @@ if (BYTES_BIG_ENDIAN) > UNSPEC_VLD2_LANE))] >"TARGET_NEON" > { > - HOST_WIDE_INT lane = INTVAL (operands[3]); > + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[0]); >rtx ops[4]; > @@ -4572,7 +4575,7 @@ if (BYTES_BIG_ENDIAN) > UNSPE
Re: FW: [PATCH, ARM/testsuite] Fix thumb2-slow-flash-data.c failures
On 09/11/15 08:21, Thomas Preud'homme wrote: [Forwarding to gcc-patches, doh!] Best regards, Thomas [PATCH, ARM/testsuite] Fix thumb2-slow-flash-data.c failures.eml Subject: [PATCH, ARM/testsuite] Fix thumb2-slow-flash-data.c failures From: Thomas Preud'homme Date: 09/11/15 08:02 To: Richard Earnshaw , Ramana Radhakrishnan , Kyrylo Tkachov Hi, ARM-specific thumb2-slow-flash-data.c testcase shows 2 failures when running for arm-none-eabi with -mcpu=cortex-m7: FAIL: gcc.target/arm/thumb2-slow-flash-data.c (test for excess errors) FAIL: gcc.target/arm/thumb2-slow-flash-data.c scan-assembler-times movt 13 The first one is due to a missing type specifier in the declaration of labelref while the second one is due to different constant synthesis as a result of a different tuning for the CPU selected. This patch fixes these issues by adding the missing type specifier and checking for .word and similar directive instead of the number of movt. The new test passes for all of -mcpu=cortex-m{3,4,7} but fail when removing the -mslow-flash-data switch. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2015-11-04 Thomas Preud'homme * gcc.target/arm/thumb2-slow-flash-data.c: Add missing typespec for labelref and check use of constant pool by looking for .word and similar directives. Ok. 
Thanks, Kyrill diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c index 9852ea5..089a72b 100644 --- a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c +++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c @@ -50,7 +50,7 @@ int foo (int a, int b) { int i; - volatile *labelref = &&label1; + volatile int *labelref = &&label1; if (a > b) { @@ -70,5 +70,4 @@ label1: return a + b; } -/* { dg-final { scan-assembler-times "movt" 13 } } */ -/* { dg-final { scan-assembler-times "movt.*LC0\\+4" 1 } } */ +/* { dg-final { scan-assembler-not "\\.(float|l\\?double|\d?byte|short|int|long|quad|word)\\s+\[^.\]" } } */ Is this ok for trunk? Best regards, Thomas
Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check
On 09/11/15 08:14, Nikolai Bozhenov wrote:
On 11/06/2015 08:16 PM, Kyrill Tkachov wrote:
On 06/11/15 17:09, Kyrill Tkachov wrote:
On 06/11/15 17:07, Nikolai Bozhenov wrote:
On 11/06/2015 04:46 PM, Ramana Radhakrishnan wrote:

Hi! I faced the same issue but I had somewhat different RTL for the consumer:

(insn 20 15 21 2 (set (reg/i:SI 0 r0)
        (minus:SI (subreg:SI (reg:DI 117) 4)
            (mult:SI (reg:SI 123)
                (reg:SI 114)))) gasman.c:4 48 {*mulsi3subsi})

where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one cycle in this case?

If the accumulator can be forwarded (i.e. a SImode register), there isn't a reason why a subreg:SI (reg:DI) will not get forwarded. The subreg:SI is an artifact before register allocation, thus it's a representation issue that the patch is fixing here unless I misunderstand your question.

I mean, in my example it is not the multiplication result that is forwarded but its upper part. So, shouldn't we check that offset in a subreg expression is zero? Or is it ok to forward only the upper part of a multiplication?

Could you please post the full RTL instruction we're talking about here as it appears in the scheduler dump? So that we're all on the same page about which case we're talking about.

Sorry, missed the above instruction. This subreg is just a pre-register allocation representation of the instruction and will go away after reload. This particular function only really has a real effect in post-reload scheduling as it's only there when the final register numbers are known.

I see. aarch_accumulator_forwarding always returns 0 for virtual registers. But isn't it overly pessimistic to assume that accumulator forwarding is never possible at sched1? I wonder if it would be better to be more optimistic about register allocation outcome. I mean, in case of virtual registers we could assume forwarding from A to B if B is the only consumer of A's result.
Something like this:

  if (REGNO (dest) >= FIRST_VIRTUAL_REGISTER
      || REGNO (accumulator) >= FIRST_VIRTUAL_REGISTER)
    return (DF_REG_USE_COUNT (REGNO (dest)) == 1)
           && (DF_REF_INSN (DF_REG_USE_CHAIN (REGNO (dest))) == consumer);
  else
    return REGNO (dest) == REGNO (accumulator);

Interesting... As far as I know sched1 tries to minimise live ranges before register allocation rather than trying to perform the most exact pipeline modelling, since we only know the exact registers used in sched2. What you're proposing is a heuristic, so it would need benchmarking results and analysis to be considered.

Thanks,
Kyrill

Thanks,
Nikolai
Re: RFC: C++ delayed folding merge
> One question: The branch changes 'convert' to not fold its result, and > it's not clear to me whether that's part of the expected behavior of a > front end 'convert' function or not. I don't think that you should change the behavior for front-ends that have an internal representation distinct from the GENERIC trees and thus do a global translation to GENERIC at the end; e.g. in the Ada compiler we'd rather have *more* folding than less during this translation. -- Eric Botcazou
Re: [PATCH][ARM] PR 68143 Properly update memory offsets when expanding setmem
On 06/11/15 10:46, Kyrill Tkachov wrote: Hi all, In this wrong-code PR the vector setmem expansion and arm_block_set_aligned_vect in particular use the wrong offset when calling adjust_automodify_address. In the attached testcase during the initial zeroing out we get two V16QI stores, but they both are recorded by adjust_automodify_address as modifying x+0 rather than x+0 and x+12 (the total size to be written is 28). This led to the scheduling pass moving the store from "x.g = 2;" to before the zeroing stores. This patch fixes the problem by keeping track of the offset to which stores are emitted and passing it to adjust_automodify_address as appropriate. From inspection I see arm_block_set_unaligned_vect also has this issue so I performed the same fix in that function as well. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? This bug appears on GCC 5 too and I'm currently testing this patch there. Ok to backport to GCC 5 as well? Bootstrap and testing for arm-none-linux-gnueabihf on GCC 5 branch looks ok too. Kyrill Thanks, Kyrill 2015-11-06 Kyrylo Tkachov PR target/68143 * config/arm/arm.c (arm_block_set_unaligned_vect): Keep track of offset from dstbase and use it appropriately in adjust_automodify_address. (arm_block_set_aligned_vect): Likewise. 2015-11-06 Kyrylo Tkachov PR target/68143 * gcc.target/arm/pr68143_1.c: New test.
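The scheduling hazard being fixed can be modeled with simple interval arithmetic. The 28-byte and 16-byte figures come from the description above; the offset 24 for the later scalar store is an illustrative assumption. With 28 bytes zeroed by two V16QI stores, the second store really covers bytes 12..27, but recording it at offset 0 via adjust_automodify_address made it look disjoint from a later store into the tail of the object, so the scheduler felt free to move that store above the zeroing.

```c
#include <assert.h>

/* Two stores conflict iff their [offset, offset + size) byte ranges
   intersect; alias analysis reasons about exactly such ranges, so a
   store recorded at the wrong offset can falsely appear independent.  */
static int
stores_overlap (int off_a, int size_a, int off_b, int size_b)
{
  return off_a < off_b + size_b && off_b < off_a + size_a;
}
```

With the corrected offset, the second vector store (bytes 12..27) visibly overlaps an assumed 4-byte store at offset 24 and cannot be reordered past it; with the wrong offset 0, the ranges 0..15 and 24..27 look independent, which is the wrong-code scenario the patch closes.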
Re: [PATCH v2] libitm: Support sized delete.
On 11/09/2015 01:24 AM, Torvald Riegel wrote: This patch supports the sized variants of operator delete. Some change compare to v1. Tested on x86_64-linux. Looks good. r~
[gomp4, committed] Cleanup formatting of pass_expand_omp_ssa::execute
Hi,

this patch makes the formatting of pass_expand_omp_ssa::execute identical to the one in trunk.

Committed to gomp-4_0-branch.

Thanks,
- Tom

Cleanup formatting of pass_expand_omp_ssa::execute

2015-11-09  Tom de Vries

	* omp-low.c (pass_expand_omp_ssa::execute): Cleanup formatting.

---
 gcc/omp-low.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 57ac2aa..b78a8d6 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13657,10 +13657,7 @@ public:
   { return !(fun->curr_properties & PROP_gimple_eomp); }
-  virtual unsigned int execute (function *)
-{
-  return execute_expand_omp ();
-}
+  virtual unsigned int execute (function *) { return execute_expand_omp (); }
   opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
 }; // class pass_expand_omp_ssa
-- 
1.9.1
[gomp4, committed] Remove TODO_remove_unused_locals from todo_flags_finish of pass_data_expand_omp_ssa
Hi,

this patch removes TODO_remove_unused_locals from todo_flags_finish of pass_data_expand_omp_ssa. I can't reproduce the problem TODO_remove_unused_locals is supposed to fix with the current state of gomp-4_0-branch.

Committed to gomp-4_0-branch.

Thanks,
- Tom

Remove TODO_remove_unused_locals from todo_flags_finish of pass_data_expand_omp_ssa

2015-11-09  Tom de Vries

	* omp-low.c (pass_data_expand_omp_ssa): Remove
	TODO_remove_unused_locals from todo_flags_finish.

---
 gcc/omp-low.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b78a8d6..4d6dc99 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13641,8 +13641,7 @@ const pass_data pass_data_expand_omp_ssa =
   PROP_gimple_eomp, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_cleanup_cfg | TODO_rebuild_alias
-  | TODO_remove_unused_locals, /* todo_flags_finish */
+  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
 };
 
 class pass_expand_omp_ssa : public gimple_opt_pass
-- 
1.9.1
[gomp4, committed] Provide PROP_gimple_eomp unconditionally in pass_expand_omp
Hi,

now that we don't postpone expanding the kernels region anymore, we can set PROP_gimple_eomp unconditionally.

Committed to gomp-4_0-branch.

Thanks,
- Tom

Provide PROP_gimple_eomp unconditionally in pass_expand_omp

2015-11-09  Tom de Vries

	* omp-low.c (pass_data_expand_omp): Add PROP_gimple_eomp to
	properties_provided.
	(pass_expand_omp::execute): Don't set PROP_gimple_eomp.

---
 gcc/omp-low.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 4d6dc99..cd9c9e6 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13589,7 +13589,7 @@ const pass_data pass_data_expand_omp =
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
-  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
+  PROP_gimple_eomp, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
   0, /* todo_flags_finish */
@@ -13603,14 +13603,12 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual unsigned int execute (function *fun)
+  virtual unsigned int execute (function *)
   {
     bool gate = ((flag_cilkplus != 0 || flag_openacc != 0 || flag_openmp != 0
		  || flag_openmp_simd != 0) && !seen_error ());
 
-    fun->curr_properties |= PROP_gimple_eomp;
-
     /* This pass always runs, to provide PROP_gimple_eomp.
        But often, there is nothing to do.  */
     if (!gate)
-- 
1.9.1
[gomp4, committed] Remove ssa support in expand_omp_target
Hi, now that we don't postpone expanding the kernels region anymore, we don't need ssa support in expand_omp_target. Committed to gomp-4_0-branch. Thanks, - Tom Remove ssa support in expand_omp_target 2015-11-09 Tom de Vries * omp-low.c (release_first_vuse_in_edge_dest): Remove (expand_omp_target): Remove ssa support. --- gcc/omp-low.c | 74 --- 1 file changed, 5 insertions(+), 69 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index cd9c9e6..ed7640a 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -6973,38 +6973,6 @@ expand_omp_build_assign (gimple_stmt_iterator *gsi_p, tree to, tree from, } } -/* Release the first vuse in bb E->dest, either normal or phi arg for - edge E. */ - -static void -release_first_vuse_in_edge_dest (edge e) -{ - gimple_stmt_iterator i; - basic_block bb = e->dest; - - for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i)) -{ - gimple *phi = gsi_stmt (i); - tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e); - - if (!virtual_operand_p (arg)) - continue; - - mark_virtual_operand_for_renaming (arg); - return; -} - - for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next_nondebug (&i)) -{ - gimple *stmt = gsi_stmt (i); - if (gimple_vuse (stmt) == NULL_TREE) - continue; - - mark_virtual_operand_for_renaming (gimple_vuse (stmt)); - return; -} -} - /* Expand the OpenMP parallel or task directive starting at REGION. */ static void @@ -12727,6 +12695,7 @@ expand_omp_target (struct omp_region *region) /* Supported by expand_omp_taskreg, but not here. 
*/ if (child_cfun != NULL) gcc_checking_assert (!child_cfun->cfg); + gcc_checking_assert (!gimple_in_ssa_p (cfun)); entry_bb = region->entry; exit_bb = region->exit; @@ -12735,7 +12704,7 @@ expand_omp_target (struct omp_region *region) mark_loops_in_oacc_kernels_region (region->entry, region->exit); basic_block entry_succ_bb = single_succ (entry_bb); - if (offloaded && !gimple_in_ssa_p (cfun)) + if (offloaded) { gsi = gsi_last_bb (entry_succ_bb); if (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_ENTRY_END) @@ -12793,25 +12762,8 @@ expand_omp_target (struct omp_region *region) gcc_assert (tgtcopy_stmt != NULL); arg = DECL_ARGUMENTS (child_fn); - if (!gimple_in_ssa_p (cfun)) - { - gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg); - gsi_remove (&gsi, true); - } - else - { - tree lhs = gimple_assign_lhs (tgtcopy_stmt); - gcc_assert (SSA_NAME_VAR (lhs) == arg); - /* We'd like to set the rhs to the default def in the child_fn, - but it's too early to create ssa names in the child_fn. - Instead, we set the rhs to the parm. In - move_sese_region_to_fn, we introduce a default def for the - parm, map the parm to it's default def, and once we encounter - this stmt, replace the parm with the default def. */ - gimple_assign_set_rhs1 (tgtcopy_stmt, arg); - gcc_assert (ssa_default_def (cfun, arg) == NULL); - update_stmt (tgtcopy_stmt); - } + gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg); + gsi_remove (&gsi, true); } /* Declare local variables needed in CHILD_CFUN. */ @@ -12854,23 +12806,11 @@ expand_omp_target (struct omp_region *region) stmt = gimple_build_return (NULL); gsi_insert_after (&gsi, stmt, GSI_SAME_STMT); gsi_remove (&gsi, true); - - /* A vuse in single_succ (exit_bb) may use a vdef from the region - which is about to be split off. Mark the vdef for renaming. */ - release_first_vuse_in_edge_dest (single_succ_edge (exit_bb)); } /* Move the offloading region into CHILD_CFUN. 
*/ - if (gimple_in_ssa_p (cfun)) - { - init_tree_ssa (child_cfun); - init_ssa_operands (child_cfun); - child_cfun->gimple_df->in_ssa_p = true; - block = NULL_TREE; - } - else - block = gimple_block (entry_stmt); + block = gimple_block (entry_stmt); new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block); if (exit_bb) @@ -12937,8 +12877,6 @@ expand_omp_target (struct omp_region *region) if (changed) cleanup_tree_cfg (); } - if (gimple_in_ssa_p (cfun)) - update_ssa (TODO_update_ssa); if (flag_checking && !loops_state_satisfies_p (LOOPS_NEED_FIXUP)) verify_loop_structure (); pop_cfun (); @@ -13257,8 +13195,6 @@ expand_omp_target (struct omp_region *region) gcc_assert (g && gimple_code (g) == GIMPLE_OMP_RETURN); gsi_remove (&gsi, true); } - if (gimple_in_ssa_p (cfun)) -update_ssa (TODO_update_ssa_only_virtuals); } /* Expand the parallel region tree rooted at REGION. Expansion -- 1.9.1
Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
On Mon, Nov 09, 2015 at 08:52:13AM +0100, Uros Bizjak wrote: > On Sun, Nov 8, 2015 at 9:58 PM, Segher Boessenkool > wrote: > > On Fri, Nov 06, 2015 at 04:00:08PM -0600, Segher Boessenkool wrote: > >> This patch stops combine from generating widening muls of anything else > >> but registers (immediates, memory, ...). This probably is a reasonable > >> tradeoff for all targets, even those (if any) that have such insns. > >> > >> > >I'll let you put it through it's paces on your setup :) > >> > >> > I'll let Segher give the final yes/no on this, but it generally looks > >> > good to me. > >> > >> It looks okay to me too. Testing now, combine patches have the tendency > >> to do unforeseen things on other targets ;-) > > > > Testing shows it makes a difference only very rarely. For many targets > > it makes no difference, for a few it is a small win. For 32-bit x86 it > > creates slightly bigger code. > > > > I think it looks good, but let's wait to hear Uros' opinion. > > >From the original patch submission, it looks that this patch would > also benefit x86_32. Yes, that is what I thought too. > Regarding the above code size increase - do you perhaps have a > testcase, to see what causes the difference? I could extract some. It happens quite rarely on usual code. > It isn't necessary due to > the patch, but perhaps some loads are moved to the insn and aren't > CSE'd anymore. Segher
Re: [hsa 4/12] OpenMP lowering/expansion changes (gridification)
Hi, On Thu, Nov 05, 2015 at 10:57:33PM +0100, Martin Jambor wrote: > ... > > For convenience of anybody reviewing the code, I'm attaching a very > simple testcase with selection of dumps that illustrate the whole > process. > My apologies, I have forgotten to attach the file, so let me quickly correct that now. The tar file consists of the source and a selection of dumps generated by a compilation with "-fopenmp -O -S -fdump-tree-all -fdump-tree-omplower-details" flags. Thanks, Martin plusone.tgz Description: application/compressed-tar
Re: Add a combined_fn enum
On 11/07/2015 01:22 PM, Richard Sandiford wrote: I'm working on a patch series that needs to be able to treat built-in functions and internal functions in a similar way. This patch adds a new enum, combined_fn, that combines the two together. It also adds utility functions for seeing which combined_fn (if any) is called by a given CALL_EXPR or gcall. Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi. OK to install? I see it's already acked, but have you considered just doing away with the builtin/internal function distinction? Bernd
Re: Add a combined_fn enum
Bernd Schmidt writes: > On 11/07/2015 01:22 PM, Richard Sandiford wrote: >> I'm working on a patch series that needs to be able to treat built-in >> functions and internal functions in a similar way. This patch adds a >> new enum, combined_fn, that combines the two together. It also adds >> utility functions for seeing which combined_fn (if any) is called by >> a given CALL_EXPR or gcall. >> >> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi. >> OK to install? > > I see it's already acked, but have you considered just doing away with > the builtin/internal function distinction? I think they're too different to be done away with entirely. built-in functions map directly to a specific C-level callable function and must have an fndecl, whereas no internal function should have an fndecl. Whether a built-in function is available depends on the selected language and what declarations the front-end has seen, while whether an internal function is available depends entirely on GCC internal information. We always have the option of not handling a built-in function specially and simply calling its function in the normal way, whereas for internal functions we always need to opencode the function in GCC. Thanks, Richard
[gomp4] Merge trunk r229831 (2015-11-06) into gomp-4_0-branch
Hi!

Committed to gomp-4_0-branch in r229996:

commit 16dbf91c2738167a38a4870c79015dfd619ef903
Merge: 38d123c 917fa02
Author: tschwinge
Date:   Mon Nov 9 09:07:26 2015 +

    svn merge -r 229816:229831 svn+ssh://gcc.gnu.org/svn/gcc/trunk

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229996 138bc75d-0d04-0410-961f-82ee72b054a4

Grüße
Thomas
Fix PR middle-end/68253
This apparently doesn't have any effect in practice, but valgrind rightfully complains, as reported by Martin L. Tested on x86_64-suse-linux, applied on the mainline as obvious.

2015-11-09  Eric Botcazou

	PR middle-end/68253
	* fold-const.c (fold_truth_andor_1): Initialize new variables to 0.

-- Eric Botcazou

Index: fold-const.c
===
--- fold-const.c (revision 229993)
+++ fold-const.c (working copy)
@@ -5561,6 +5561,7 @@ fold_truth_andor_1 (location_t loc, enum
       || (rcode != EQ_EXPR && rcode != NE_EXPR))
     return 0;
 
+  ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
   volatilep = 0;
   ll_inner = decode_field_reference (loc, ll_arg, &ll_bitsize, &ll_bitpos,
				     &ll_mode,
Re: Add a combined_fn enum
On 11/09/2015 11:24 AM, Richard Sandiford wrote: Bernd Schmidt writes: I see it's already acked, but have you considered just doing away with the builtin/internal function distinction? I think they're too different to be done away with entirely. built-in functions map directly to a specific C-level callable function and must have an fndecl, whereas no internal function should have an fndecl. Whether a built-in function is available depends on the selected language and what declarations the front-end has seen, while whether an internal function is available depends entirely on GCC internal information. Yes... but aren't these things fixable relatively easily (compared with what your patches are doing)? I also have the problem that I can't quite see where your patch series is going. Let's take "Add internal bitcount functions", it adds new internal functions but no users AFAICS. What is the end goal here (there doesn't seem to be a [0/N] description in my inbox)? Bernd
Re: [AARCH64][PATCH 2/3] Implementing vmulx_lane NEON intrinsic variants
On 03/11/15 11:16, James Greenhalgh wrote: On Fri, Oct 30, 2015 at 09:31:08AM +, Bilyan Borisov wrote: In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Hi, I have a small style comment below. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): New. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Refactored & moved. (vmulx_laneq_f32): New. (vmulx_laneq_f64): New. (vmulxq_laneq_f32): New. (vmulxq_laneq_f64): New. (vmulxs_lane_f32): New. (vmulxs_laneq_f32): New. (vmulxd_lane_f64): New. (vmulxd_laneq_f64): New. * config/aarch64/aarch64-simd.md (*aarch64_combine_dupfmulx1, VDQSF): New pattern. (*aarch64_combine_dupfmulx2, VDQF): New pattern. (*aarch64_combine_dupfmulx3): New pattern. (*aarch64_combine_vgetfmulx1, VDQF_DF): New pattern. I'm not sure I like the use of 1,2,3 for this naming scheme. Elsewhere in the file, this convention points to the number of operands a pattern requires (for example add3). I think elsewhere in the file we use: "*aarch64_mul3_elt" "*aarch64_mul3_elt_" "*aarch64_mul3_elt_to_128df" "*aarch64_mul3_elt_to_64v2df" Is there a reason not to follow that pattern? Thanks, James Hi, I've made the changes you've requested - the pattern names have been changed to follow better the naming conventions elsewhere in the file. 
Thanks, Bilyan diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 269e00237bb1153ebf42505906ec5b760b04aafe..5ff19094b2fb10b332d186a6de02752b31ed4141 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2880,6 +2880,79 @@ [(set_attr "type" "neon_fp_mul_")] ) +;; fmulxq_lane_f32, and fmulx_laneq_f32 + +(define_insn "*aarch64_mulx_elt_" + [(set (match_operand:VDQSF 0 "register_operand" "=w") + (unspec:VDQSF + [(match_operand:VDQSF 1 "register_operand" "w") + (vec_duplicate:VDQSF + (vec_select: + (match_operand: 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")])))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode, + INTVAL (operands[3]))); +return "fmulx\t%0, %1, %2.[%3]"; + } + [(set_attr "type" "neon_fp_mul__scalar")] +) + +;; fmulxq_laneq_f32, fmulxq_laneq_f64, fmulx_lane_f32 + +(define_insn "*aarch64_mulx_elt" + [(set (match_operand:VDQF 0 "register_operand" "=w") + (unspec:VDQF + [(match_operand:VDQF 1 "register_operand" "w") + (vec_duplicate:VDQF + (vec_select: + (match_operand:VDQF 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")])))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); +return "fmulx\t%0, %1, %2.[%3]"; + } + [(set_attr "type" "neon_fp_mul_")] +) + +;; fmulxq_lane_f64 + +(define_insn "*aarch64_mulx_elt_to_64v2df" + [(set (match_operand:V2DF 0 "register_operand" "=w") + (unspec:V2DF + [(match_operand:V2DF 1 "register_operand" "w") + (vec_duplicate:V2DF + (match_operand:DF 2 "register_operand" "w"))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +return "fmulx\t%0.2d, %1.2d, %2.d[0]"; + } + [(set_attr "type" "neon_fp_mul_d_scalar_q")] +) + +;; fmulxs_lane_f32, fmulxs_laneq_f32, fmulxd_lane_f64 == fmulx_lane_f64, +;; fmulxd_laneq_f64 == fmulx_laneq_f64 + +(define_insn "*aarch64_vgetfmulx" + [(set (match_operand: 0 
"register_operand" "=w") + (unspec: + [(match_operand: 1 "register_operand" "w") + (vec_select: + (match_operand:VDQF_DF 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")]))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); +return "fmulx\t%0, %1, %2.[%3]"; + } + [(set_attr "type" "fmul")] +) ;; q (define_insn "aarch64_" diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 265c26679089f5a1b30a41360687fb1e3c2a8e50..eb65c009a430ad1ad5e0eabd50b4968617379b5e 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -8509,32 +8509,6 @@ vmulq_n_u32 (uint32x4_t a, uint32_t b) return result; } -#define vmulxq_lane_f32(a, b, c)\ - __extension__ \ -({ \ -
Re: [AARCH64][PATCH 2/3] Implementing vmulx_lane NEON intrinsic variants + Changelog
On 09/11/15 11:03, Bilyan Borisov wrote: On 03/11/15 11:16, James Greenhalgh wrote: On Fri, Oct 30, 2015 at 09:31:08AM +, Bilyan Borisov wrote: In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Hi, I have a small style comment below. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): New. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Refactored & moved. (vmulx_laneq_f32): New. (vmulx_laneq_f64): New. (vmulxq_laneq_f32): New. (vmulxq_laneq_f64): New. (vmulxs_lane_f32): New. (vmulxs_laneq_f32): New. (vmulxd_lane_f64): New. (vmulxd_laneq_f64): New. * config/aarch64/aarch64-simd.md (*aarch64_combine_dupfmulx1, VDQSF): New pattern. (*aarch64_combine_dupfmulx2, VDQF): New pattern. (*aarch64_combine_dupfmulx3): New pattern. (*aarch64_combine_vgetfmulx1, VDQF_DF): New pattern. I'm not sure I like the use of 1,2,3 for this naming scheme. Elsewhere in the file, this convention points to the number of operands a pattern requires (for example add3). I think elsewhere in the file we use: "*aarch64_mul3_elt" "*aarch64_mul3_elt_" "*aarch64_mul3_elt_to_128df" "*aarch64_mul3_elt_to_64v2df" Is there a reason not to follow that pattern? Thanks, James Hi, I've made the changes you've requested - the pattern names have been changed to follow better the naming conventions elsewhere in the file. Thanks, Bilyan Hi, You can find the new updated Changelog for this patch below. Thanks, Bilyan --- In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. 
Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): Likewise. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Likewise. (vmulx_laneq_f32): New. (vmulx_laneq_f64): Likewise. (vmulxq_laneq_f32): Likewise. (vmulxq_laneq_f64): Likewise. (vmulxs_lane_f32): Likewise. (vmulxs_laneq_f32): Likewise. (vmulxd_lane_f64): Likewise. (vmulxd_laneq_f64): Likewise. * config/aarch64/aarch64-simd.md (*aarch64_mulx_elt_, VDQSF): New pattern. (*aarch64_mulx_elt, VDQF): Likewise. (*aarch64_mulx_elt_to_64v2df): Likewise. (*aarch64_vgetfmulx, VDQF_DF): Likewise. gcc/testsuite/ 2015-XX-XX Bilyan Borisov * gcc.target/aarch64/simd/vmulx_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulx_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulx_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: New. * gcc.target/aarch64/simd/vmulxq_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulxq_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulxq_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulxq_laneq_f64_1.c: New. * gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulxd_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: New.
[patch] Fix PR middle-end/68251
Hi, I don't understand how this didn't show up sooner given the amount of testing, but there is a nasty overloading for the new REF_REVERSE_STORAGE_ORDER flag: #define REF_REVERSE_STORAGE_ORDER(NODE) \ (TREE_CHECK2 (NODE, BIT_FIELD_REF, MEM_REF)->base.u.bits.saturating_flag) saturating_flag is part of the 'bits' structure, which is itself part of the 'u' union, which also contains the dependence_info structure, and: #define MR_DEPENDENCE_CLIQUE(NODE) \ (TREE_CHECK2 (NODE, MEM_REF, TARGET_MEM_REF)->base.u.dependence_info.clique) #define MR_DEPENDENCE_BASE(NODE) \ (TREE_CHECK2 (NODE, MEM_REF, TARGET_MEM_REF)->base.u.dependence_info.base) So there is a conflict on MEM_REF nodes. Therefore the attached patch changes REF_REVERSE_STORAGE_ORDER to use the default_def_flag instead. Tested on x86_64-suse-linux, OK for the mainline? I'll install the Fortran testcase once it is reduced because it takes a while to compile ATM. 2015-11-09 Eric Botcazou PR middle-end/68251 * tree-core.h (REF_REVERSE_STORAGE_ORDER): Move around. * tree.h (REF_REVERSE_STORAGE_ORDER): Change to default_def_flag. * tree-streamer-in.c (unpack_ts_base_value_fields): Adjust. * tree-streamer-out.c (pack_ts_base_value_fields): Likewise. 
-- Eric BotcazouIndex: tree-core.h === --- tree-core.h (revision 229993) +++ tree-core.h (working copy) @@ -1157,9 +1157,6 @@ struct GTY(()) tree_base { TYPE_SATURATING in other types - REF_REVERSE_STORAGE_ORDER in - BIT_FIELD_REF, MEM_REF - VAR_DECL_IS_VIRTUAL_OPERAND in VAR_DECL @@ -1174,6 +1171,9 @@ struct GTY(()) tree_base { default_def_flag: + TYPE_FINAL_P in + RECORD_TYPE, UNION_TYPE and QUAL_UNION_TYPE + TYPE_VECTOR_OPAQUE in VECTOR_TYPE @@ -1183,8 +1183,8 @@ struct GTY(()) tree_base { DECL_NONLOCAL_FRAME in VAR_DECL - TYPE_FINAL_P in - RECORD_TYPE, UNION_TYPE and QUAL_UNION_TYPE + REF_REVERSE_STORAGE_ORDER in + BIT_FIELD_REF, MEM_REF */ struct GTY(()) tree_typed { Index: tree.h === --- tree.h (revision 229993) +++ tree.h (working copy) @@ -927,7 +927,7 @@ extern void omp_clause_range_check_faile But, of course, the storage order must be preserved when the accesses themselves are rewritten or transformed. */ #define REF_REVERSE_STORAGE_ORDER(NODE) \ - (TREE_CHECK2 (NODE, BIT_FIELD_REF, MEM_REF)->base.u.bits.saturating_flag) + (TREE_CHECK2 (NODE, BIT_FIELD_REF, MEM_REF)->base.default_def_flag) /* These flags are available for each language front end to use internally. 
*/ #define TREE_LANG_FLAG_0(NODE) \ Index: tree-streamer-in.c === --- tree-streamer-in.c (revision 229993) +++ tree-streamer-in.c (working copy) @@ -143,7 +143,10 @@ unpack_ts_base_value_fields (struct bitp TYPE_ADDR_SPACE (expr) = (unsigned) bp_unpack_value (bp, 8); } else if (TREE_CODE (expr) == BIT_FIELD_REF || TREE_CODE (expr) == MEM_REF) -REF_REVERSE_STORAGE_ORDER (expr) = (unsigned) bp_unpack_value (bp, 1); +{ + REF_REVERSE_STORAGE_ORDER (expr) = (unsigned) bp_unpack_value (bp, 1); + bp_unpack_value (bp, 8); +} else if (TREE_CODE (expr) == SSA_NAME) { SSA_NAME_IS_DEFAULT_DEF (expr) = (unsigned) bp_unpack_value (bp, 1); Index: tree-streamer-out.c === --- tree-streamer-out.c (revision 229993) +++ tree-streamer-out.c (working copy) @@ -117,7 +117,10 @@ pack_ts_base_value_fields (struct bitpac bp_pack_value (bp, TYPE_ADDR_SPACE (expr), 8); } else if (TREE_CODE (expr) == BIT_FIELD_REF || TREE_CODE (expr) == MEM_REF) -bp_pack_value (bp, REF_REVERSE_STORAGE_ORDER (expr), 1); +{ + bp_pack_value (bp, REF_REVERSE_STORAGE_ORDER (expr), 1); + bp_pack_value (bp, 0, 8); +} else if (TREE_CODE (expr) == SSA_NAME) { bp_pack_value (bp, SSA_NAME_IS_DEFAULT_DEF (expr), 1);
[PATCH] Fix memory leaks and use a pool_allocator
Hi. This is follow-up of changes that Richi started on Friday. Patch can bootstrap on x86_64-linux-pc and regression tests are running. Ready for trunk? Thanks, Martin >From ece1ad32be127c1e00dd18c8d0358bf842fad2bd Mon Sep 17 00:00:00 2001 From: marxin Date: Mon, 9 Nov 2015 10:49:14 +0100 Subject: [PATCH] Fix memory leaks and use a pool_allocator gcc/ChangeLog: 2015-11-09 Martin Liska * gcc.c (record_temp_file): Release name string. * ifcvt.c (noce_convert_multiple_sets): Release temporaries vector. * lra-lives.c (free_live_range_list): Utilize lra_live_range_pool for allocation and deallocation. (create_live_range): Likewise. (copy_live_range): Likewise. (lra_merge_live_ranges): Likewise. (remove_some_program_points_and_update_live_ranges): Likewise. (lra_create_live_ranges_1): Release point_freq_vec that can be not freed from previous iteration of the function. * tree-eh.c (lower_try_finally_switch): Release a vector. * tree-sra.c (sra_deinitialize): Release all vectors in base_access_vec. * tree-ssa-dom.c (free_edge_info): Make the function extern. * tree-ssa-threadupdate.c (remove_ctrl_stmt_and_useless_edges): Release edge_info for a removed edge. (thread_through_all_blocks): Free region vector. * tree-ssa.h (free_edge_info): Declare function extern. --- gcc/gcc.c | 5 - gcc/ifcvt.c | 3 +++ gcc/lra-lives.c | 11 ++- gcc/tree-eh.c | 2 ++ gcc/tree-sra.c | 4 gcc/tree-ssa-dom.c | 2 +- gcc/tree-ssa-threadupdate.c | 6 +- gcc/tree-ssa.h | 1 + 8 files changed, 26 insertions(+), 8 deletions(-) diff --git a/gcc/gcc.c b/gcc/gcc.c index bbc9b23..8bbf5be 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -2345,7 +2345,10 @@ record_temp_file (const char *filename, int always_delete, int fail_delete) struct temp_file *temp; for (temp = always_delete_queue; temp; temp = temp->next) if (! 
filename_cmp (name, temp->name)) - goto already1; + { + free (name); + goto already1; + } temp = XNEW (struct temp_file); temp->next = always_delete_queue; diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index fff62de..eb6b7df 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) set_used_flags (targets[i]); } + temporaries.release (); + set_used_flags (cond); set_used_flags (x); set_used_flags (y); @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) } num_updated_if_blocks++; + targets.release (); return TRUE; } diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c index 1655c47..87003be 100644 --- a/gcc/lra-lives.c +++ b/gcc/lra-lives.c @@ -103,7 +103,7 @@ free_live_range_list (lra_live_range_t lr) while (lr != NULL) { next = lr->next; - delete lr; + lra_live_range_pool.remove (lr); lr = next; } } @@ -112,7 +112,7 @@ free_live_range_list (lra_live_range_t lr) static lra_live_range_t create_live_range (int regno, int start, int finish, lra_live_range_t next) { - lra_live_range_t p = new lra_live_range; + lra_live_range_t p = lra_live_range_pool.allocate (); p->regno = regno; p->start = start; p->finish = finish; @@ -124,7 +124,7 @@ create_live_range (int regno, int start, int finish, lra_live_range_t next) static lra_live_range_t copy_live_range (lra_live_range_t r) { - return new lra_live_range (*r); + return new (lra_live_range_pool) lra_live_range (*r); } /* Copy live range list given by its head R and return the result. 
*/ @@ -167,7 +167,7 @@ lra_merge_live_ranges (lra_live_range_t r1, lra_live_range_t r2) r1->start = r2->start; lra_live_range_t temp = r2; r2 = r2->next; - delete temp; + lra_live_range_pool.remove (temp); } else { @@ -1081,7 +1081,7 @@ remove_some_program_points_and_update_live_ranges (void) } prev_r->start = r->start; prev_r->next = next_r; - delete r; + lra_live_range_pool.remove (r); } } } @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p) dead_set = sparseset_alloc (max_regno); unused_set = sparseset_alloc (max_regno); curr_point = 0; + point_freq_vec.release (); point_freq_vec.create (get_max_uid () * 2); lra_point_freq = point_freq_vec.address (); int *post_order_rev_cfg = XNEWVEC (int, last_basic_block_for_fn (cfun)); diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c index 617d657..be140ee 100644 --- a/gcc/tree-eh.c +++ b/gcc/tree-eh.c @@ -1557,6 +1557,8 @@ lower_try_finally_switch (struct leh_state *state, struct leh_tf_state *tf) due to not wanting to process the same goto stmts twice. */ gimple_seq_add_stmt (&tf->top_p_seq, switch_stmt); gimple_seq_add_seq (&tf->top_p_seq, switch_body); + + case_label_vec.release (); } /* Decide whether or not we are going to duplicate the finally block. diff --git a/gcc
Re: [PATCH/RFC/RFA] Machine modes for address printing (all targets)
On Thu, 5 Nov 2015 11:22:04 +0100 Bernd Schmidt wrote:
> >  static void
> > -mcore_print_operand_address (FILE * stream, rtx x)
> > +mcore_print_operand_address (FILE * stream, machine_mode mode
> > ATTRIBUTE_UNUSED,
> > +rtx x)
>
> So apparently we're settling on writing the unused arg as just
> "machine_mode" without a name. Please change everywhere.
>
> > @@ -1754,7 +1754,7 @@ mmix_print_operand_punct_valid_p (unsign
> > /* TARGET_PRINT_OPERAND_ADDRESS. */
> >
> > static void
> > -mmix_print_operand_address (FILE *stream, rtx x)
> > +mmix_print_operand_address (FILE *stream, machine_mode mode, rtx x)
> > {
> >   if (REG_P (x))
> >     {
>
> The arg appears to be unused - I'd expect to see a warning here.

I've fixed those two, and a handful of other bits I missed.

> Other than that it looks OK. I'm not going to require that you test
> every target, but it would be good to have the full set built to cc1
> before and after, and please be on the lookout for fallout.

Thanks! I used the attached "build-all.sh" to test all the targets affected by the patch with "make all-gcc": those now all succeed (I'm sure I reinvented a wheel here, but perhaps the target list is useful to someone else).

Julian

ChangeLog

gcc/
	* final.c (output_asm_insn): Pass VOIDmode to output_address.
	(output_address): Add MODE argument.  Pass to print_operand_address
	hook.
	* targhooks.c (default_print_operand_address): Add MODE argument.
	* targhooks.h (default_print_operand_address): Update prototype.
	* output.h (output_address): Update prototype.
	* target.def (print_operand_address): Add MODE argument.
	* config/vax/vax.c (print_operand_address): Pass VOIDmode to
	output_address.
	(print_operand): Pass access mode to output_address.
	* config/mcore/mcore.c (mcore_print_operand_address): Add MODE
	argument.
	(mcore_print_operand): Update calls to mcore_print_operand_address.
	* config/fr30/fr30.c (fr30_print_operand): Pass VOIDmode to
	output_address.
* config/lm32/lm32.c (lm32_print_operand): Pass mode in calls to output_address. * config/tilegx/tilegx.c (output_memory_reference_mode): Remove global. (tilegx_print_operand): Don't set above global. Update calls to output_address. (tilegx_print_operand_address): Add MODE argument. Use instead of output_memory_reference_mode global. * config/frv/frv.c (frv_print_operand_address): Add MODE argument. (frv_print_operand): Pass mode to frv_print_operand_address calls. * config/mn10300/mn10300.c (mn10300_print_operand): Pass mode to output_address. * config/cris/cris.c (cris_print_operand_address): Add MODE argument. (cris_print_operand): Pass mode to output_address calls. * config/spu/spu.c (print_operand): Pass mode to output_address calls. * config/aarch64/aarch64.h (aarch64_print_operand) (aarch64_print_operand_address): Remove prototypes. * config/aarch64/aarch64.c (aarch64_memory_reference_mode): Delete global. (aarch64_print_operand): Make static. Update calls to output_address. (aarch64_print_operand_address): Add MODE argument. Use instead of aarch64_memory_reference_mode global. (TARGET_PRINT_OPERAND, TARGET_PRINT_OPERAND_ADDRESS): Define target hooks. * config/aarch64/aarch64.h (PRINT_OPERAND, PRINT_OPERAND_ADDRESS): Delete macro definitions. * config/pa/pa.c (pa_print_operand): Pass mode in output_address calls. * config/xtensa/xtensa.c (print_operand): Pass mode in output_address calls. * config/h8300/h8300.c (h8300_print_operand_address): Add MODE argument. (h83000_print_operand): Update calls to h8300_print_operand_address and output_address. * config/ia64/ia64.c (ia64_print_operand_address): Add MODE argument. * config/tilepro/tilepro.c (output_memory_reference_mode): Delete global. (tilepro_print_operand): Pass mode to output_address. (tilepro_print_operand_address): Add MODE argument. Use instead of output_memory_reference_mode. 
* config/nvptx/nvptx.c (output_decl_chunk, nvptx_assemble_integer) (nvptx_output_call_insn, nvptx_print_address_operand): Pass VOIDmode to output_address calls. (nvptx_print_operand_address): Add MODE argument. * config/alpha/alpha.c (print_operand): Pass mode argument in output_address calls. * config/m68k/m68k.c (print_operand): Pass mode argument in output_address call. * config/avr/avr.c (avr_print_operand_address): Add MODE argument. (avr_print_operand): Update calls to avr_print_operand_address. * config/sparc/sparc.c (sparc_print_operand_address): Add MODE argument. Update calls to output_address. (sparc_print_operand): Pass mode to output_address. * config/iq2000/iq2000.c (iq2000_print_operand_address): Add MODE argument. (iq2000_print_operand): Pass mode in output_address calls. * config/stormy16/stormy
[gomp4] Re: [openacc] tile, independent, default, private and firstprivate support in c/++
Hi! On Thu, 5 Nov 2015 18:10:49 -0800, Cesar Philippidis wrote: > I've applied this patch to trunk. It also includes the fortran and > template changes. [...] > Also, because of these reduction problems, I decided not to merge > combined_loops.f90 with combined-directives.f90 yet because the latter > relies on scanning which would fail with the errors detected during > gimplification. I'm planning on adding a couple more test cases [...] ACK. Merging Cesar's trunk r229832 into gomp-4_0-branch, I tried to replicate this part of his commit: > --- a/gcc/fortran/trans-openmp.c > +++ b/gcc/fortran/trans-openmp.c > @@ -3449,16 +3478,33 @@ gfc_trans_oacc_combined_directive (gfc_code *code) > [...] > + loop_clauses.lists[OMP_LIST_PRIVATE] > + = construct_clauses.lists[OMP_LIST_PRIVATE]; > + loop_clauses.lists[OMP_LIST_REDUCTION] > + = construct_clauses.lists[OMP_LIST_REDUCTION]; > [...] > + construct_clauses.lists[OMP_LIST_PRIVATE] = NULL; > [...] ... in the gcc/fortran/trans-openmp.c:gfc_filter_oacc_combined_clauses function that we're using on gomp-4_0-branch, but ran into ICEs that looked like "double free" (or similar; maybe related to the slightly different structure of this code due to device_type support), so I applied the following hack and XFAILs as part of the merge, which needs to be resolved later (on gomp-4_0-branch). (A few related TODOs also remain to be addressed in gcc/testsuite/c-c++-common/goacc/combined-directives.c.) 
--- gcc/fortran/trans-openmp.c +++ gcc/fortran/trans-openmp.c @@ -3691,12 +3691,14 @@ gfc_filter_oacc_combined_clauses (gfc_omp_clauses **orig_clauses, (*orig_clauses)->tile = false; (*loop_clauses)->tile_list = (*orig_clauses)->tile_list; (*orig_clauses)->tile_list = NULL; {+#if 0 /* TODO */+} (*loop_clauses)->lists[OMP_LIST_PRIVATE] = (*orig_clauses)->lists[OMP_LIST_PRIVATE]; (*orig_clauses)->lists[OMP_LIST_PRIVATE] = NULL; (*loop_clauses)->lists[OMP_LIST_REDUCTION] = (*orig_clauses)->lists[OMP_LIST_REDUCTION]; /* Don't reset (*orig_clauses)->lists[OMP_LIST_REDUCTION]. */ {+#endif+} (*loop_clauses)->device_types = (*orig_clauses)->device_types; --- gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 +++ gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 @@ -4,6 +4,9 @@ ! { dg-options "-fopenacc -fdump-tree-gimple" } ! TODO {+! Fix OMP_LIST_PRIVATE and OMP_LIST_REDUCTION splitting in+} {+! gcc/fortran/trans-openmp.c:gfc_filter_oacc_combined_clauses, and remove tree+} {+! scanning XFAILs.+} ! Enable and update tree scanning for reduction clauses. ! Enable/add/update device_type clauses and tree scanning. @@ -154,12 +157,12 @@ subroutine test ! !$acc end kernels loop end subroutine test ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. worker" 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. vector" 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. seq" 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. auto" 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. tile.2, 3" 2 "gimple" {+{ xfail *-*-* }+} } } ! 
{ dg-final { scan-tree-dump-times "acc loop private.i. independent" 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } } With that, committed to gomp-4_0-branch in r230012: commit 80ebc1e747e3422e0cc57c4c11387ec26bbf8814 Merge: 175e08b ef014f9 Author: tschwinge Date: Mon Nov 9 11:10:12 2015 + svn merge -r 229831:229832 svn+ssh://gcc.gnu.org/svn/gcc/trunk git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@230012 138bc75d-0d04-0410-961f-82ee72b054a4 Grüße Thomas
[PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
Hi all, In this PR we hit an ICE in simplify-rtx.c: #if TARGET_SUPPORTS_WIDE_INT == 0 /* This assert keeps the simplification from producing a result that cannot be represented in a CONST_DOUBLE but a lot of upstream callers expect that this function never fails to simplify something and so you if you added this to the test above the code would die later anyway. If this assert happens, you just need to make the port support wide int. */ gcc_assert (width <= HOST_BITS_PER_DOUBLE_INT); #endif The aarch64 port does not define TARGET_SUPPORTS_WIDE_INT. From what I understand, in order to define it we need to remove the instances in the backend where we use CONST_DOUBLE rtxes with VOIDmode (i.e. for storing CONST_INTs). From what I can see we don't create such instances, so this patch defines TARGET_SUPPORTS_WIDE_INT and cleans up a couple of places in the backend that try to hypothetically handle CONST_DOUBLEs as CONST_INTs if they are passed down as such from other places in the compiler. Bootstrapped and tested on aarch64-none-linux-gnu on trunk and GCC 5. No codegen differences on SPEC2006. Ok for trunk and GCC 5? Thanks, Kyrill 2015-11-09 Kyrylo Tkachov PR target/68129 * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1. * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE): Delete VOIDmode case. Assert that mode is not VOIDmode. * config/aarch64/predicates.md (const0_operand): Remove const_double match. 2015-11-09 Kyrylo Tkachov PR target/68129 * gcc.target/aarch64/pr68129_1.c: New test. 
commit 10562c44766a57e4762e926f876f5457f9899e33 Author: Kyrylo Tkachov Date: Wed Oct 28 10:49:44 2015 + [AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 47ccc32..389bfc0 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4401,11 +4401,10 @@ aarch64_print_operand (FILE *f, rtx x, char code) break; case CONST_DOUBLE: - /* CONST_DOUBLE can represent a double-width integer. - In this case, the mode of x is VOIDmode. */ - if (GET_MODE (x) == VOIDmode) - ; /* Do Nothing. */ - else if (aarch64_float_const_zero_rtx_p (x)) + /* Since we define TARGET_SUPPORTS_WIDE_INT we shouldn't ever + be getting CONST_DOUBLEs holding integers. */ + gcc_assert (GET_MODE (x) != VOIDmode); + if (aarch64_float_const_zero_rtx_p (x)) { fputc ('0', f); break; diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index b041a1e..0fac0a7 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -863,6 +863,8 @@ extern enum aarch64_code_model aarch64_cmodel; (aarch64_cmodel == AARCH64_CMODEL_TINY \ || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC) +#define TARGET_SUPPORTS_WIDE_INT 1 + /* Modes valid for AdvSIMD D registers, i.e. that fit in half a Q register. */ #define AARCH64_VALID_SIMD_DREG_MODE(MODE) \ ((MODE) == V2SImode || (MODE) == V4HImode || (MODE) == V8QImode \ diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 1bcbf62..8775460 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -32,7 +32,7 @@ (define_predicate "aarch64_call_insn_operand" ;; Return true if OP a (const_int 0) operand. 
(define_predicate "const0_operand" - (and (match_code "const_int, const_double") + (and (match_code "const_int") (match_test "op == CONST0_RTX (mode)"))) (define_predicate "aarch64_ccmp_immediate" diff --git a/gcc/testsuite/gcc.target/aarch64/pr68129_1.c b/gcc/testsuite/gcc.target/aarch64/pr68129_1.c new file mode 100644 index 000..112331e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr68129_1.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fno-split-wide-types" } */ + +typedef int V __attribute__ ((vector_size (8 * sizeof (int)))); + +void +foo (V *p, V *q) +{ + *p = (*p == *q); +}
Re: Unreviewed patch
Jeff Law writes: > On 11/06/2015 06:29 AM, Rainer Orth wrote: >> The following patch has remained unreviewed for a month: >> >> [build] Support init priority on Solaris >> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00716.html >> >> It needs build and ia64 maintainers and someone familiar with the init >> priority support to review. > This is OK. Thanks for pinging -- I didn't have it in my queue of things > to look at, so it'd definitely fallen through the cracks. Thanks, and no worries: I'd been away for 3 weeks, so I didn't ping before. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH][AArch64][cleanup] Remove uses of CONST_DOUBLE_HIGH, CONST_DOUBLE_LOW
Hi all, From what I understand, we shouldn't encounter the case where CONST_DOUBLE is used to hold a large integer value, accessed through CONST_DOUBLE_LOW and CONST_DOUBLE_HIGH. The aarch64 backend doesn't generate any such rtxes and the midend shouldn't be passing any such rtxes either. This allows for a bit of cleanup in aarch64_simd_valid_immediate. Bootstrapped and tested on aarch64. Ok for trunk? Thanks, Kyrill P.S. I propose this patch separate from my other patch that defines TARGET_SUPPORTS_WIDE_INT for aarch64 (https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00934.html) as this is not needed to fix the ICE in that case and just removes some dead code and is thus not appropriate for backporting in my opinion. 2015-11-09 Kyrylo Tkachov * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Remove integer CONST_DOUBLE handling. It should never occur. commit 4b6c35c0f171a4841cb219e86b9d9ea752e2e849 Author: Kyrylo Tkachov Date: Thu Oct 29 10:28:12 2015 + [AArch64][cleanup] Remove uses of CONST_DOUBLE_HIGH, CONST_DOUBLE_LOW diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 389bfc0..cbdff44 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -10080,32 +10080,16 @@ aarch64_simd_valid_immediate (rtx op, machine_mode mode, bool inverse, it must be laid out in the vector register in reverse order. */ rtx el = CONST_VECTOR_ELT (op, BYTES_BIG_ENDIAN ? 
(n_elts - 1 - i) : i); unsigned HOST_WIDE_INT elpart; - unsigned int part, parts; - if (CONST_INT_P (el)) -{ - elpart = INTVAL (el); - parts = 1; -} - else if (GET_CODE (el) == CONST_DOUBLE) -{ - elpart = CONST_DOUBLE_LOW (el); - parts = 2; -} - else -gcc_unreachable (); + gcc_assert (CONST_INT_P (el)); + elpart = INTVAL (el); + + for (unsigned int byte = 0; byte < innersize; byte++) + { + bytes[idx++] = (elpart & 0xff) ^ invmask; + elpart >>= BITS_PER_UNIT; + } - for (part = 0; part < parts; part++) -{ - unsigned int byte; - for (byte = 0; byte < innersize; byte++) -{ - bytes[idx++] = (elpart & 0xff) ^ invmask; - elpart >>= BITS_PER_UNIT; -} - if (GET_CODE (el) == CONST_DOUBLE) -elpart = CONST_DOUBLE_HIGH (el); -} } /* Sanity check. */
[PATCH][ARM] Use snprintf rather than sprintf where possible
Hi all, Judging by the thread at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01912.html I looked at replacing calls to sprintf with calls to snprintf in the arm backend. We use them a lot to print assembly mnemonics into static char buffers. This patch replaces the calls with snprintf and adds a size argument equal to the size of the buffer used. This way, if any of the format strings changes/increases past the size of the allocated buffer, snprintf will truncate it (and the assembler will catch it) rather than trying to write past the end of the buffer with unexpected results. I managed to replace all uses of sprintf in the arm backend except the one in aout.h: #define ASM_GENERATE_INTERNAL_LABEL(STRING, PREFIX, NUM) \ sprintf (STRING, "*%s%s%u", LOCAL_LABEL_PREFIX, PREFIX, (unsigned int)(NUM)) Here, ASM_GENERATE_INTERNAL_LABEL is used in various places in the midend to print labels to static buffers. I've seen those buffers have sizes ranging from 12 chars to 256 chars. The size of the buffer that ASM_GENERATE_INTERNAL_LABEL can depend on is not mandated in the documentation or passed down to the macro, so I think this is a bit dangerous. In practice, however, I don't think we print labels so long that it would cause an issue. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? Thanks, Kyrill 2015-11-09 Kyrylo Tkachov * config/arm/arm.c (arm_set_fixed_optab_libfunc): Use snprintf rather than sprintf. (arm_set_fixed_conv_libfunc): Likewise. (arm_option_override): Likewise. (neon_output_logic_immediate): Likewise. (neon_output_shift_immediate): Likewise. (arm_output_multireg_pop): Likewise. (vfp_output_vstmd): Likewise. (output_move_vfp): Likewise. (output_move_neon): Likewise. (output_return_instruction): Likewise. (arm_elf_asm_cdtor): Likewise. (arm_output_shift): Likewise. (arm_output_iwmmxt_shift_immediate): Likewise. (arm_output_iwmmxt_tinsr): Likewise. * config/arm/neon.md (*neon_mov, VDX): Likewise. (*neon_mov, VQXMOV): Likewise. 
(neon_vc_insn): Likewise. (neon_vc_insn_unspec): Likewise. commit bed313cb485dac52a5bb9984b55c9985af07ab35 Author: Kyrylo Tkachov Date: Wed Oct 21 14:40:07 2015 +0100 [ARM] Use snprintf rather than sprintf diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f86ebd9..fa34a7b 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2303,9 +2303,10 @@ arm_set_fixed_optab_libfunc (optab optable, machine_mode mode, char buffer[50]; if (num_suffix == 0) -sprintf (buffer, "__gnu_%s%s", funcname, modename); +snprintf (buffer, sizeof (buffer), "__gnu_%s%s", funcname, modename); else -sprintf (buffer, "__gnu_%s%s%d", funcname, modename, num_suffix); +snprintf (buffer, sizeof (buffer), "__gnu_%s%s%d", funcname, + modename, num_suffix); set_optab_libfunc (optable, mode, buffer); } @@ -2324,8 +2325,8 @@ arm_set_fixed_conv_libfunc (convert_optab optable, machine_mode to, && ALL_FRACT_MODE_P (from) == ALL_FRACT_MODE_P (to)) maybe_suffix_2 = "2"; - sprintf (buffer, "__gnu_%s%s%s%s", funcname, fromname, toname, - maybe_suffix_2); + snprintf (buffer, sizeof (buffer), "__gnu_%s%s%s%s", funcname, + fromname, toname, maybe_suffix_2); set_conv_libfunc (optable, to, from, buffer); } @@ -3036,7 +3037,8 @@ arm_option_override (void) if (!arm_selected_tune) arm_selected_tune = &all_cores[arm_selected_cpu->core]; - sprintf (arm_arch_name, "__ARM_ARCH_%s__", arm_selected_cpu->arch); + snprintf (arm_arch_name, sizeof (arm_arch_name), + "__ARM_ARCH_%s__", arm_selected_cpu->arch); insn_flags = arm_selected_cpu->flags; arm_base_arch = arm_selected_cpu->base_arch; @@ -12631,9 +12633,11 @@ neon_output_logic_immediate (const char *mnem, rtx *op2, machine_mode mode, gcc_assert (is_valid != 0); if (quad) -sprintf (templ, "%s.i%d\t%%q0, %%2", mnem, width); +snprintf (templ, sizeof (templ), "%s.i%d\t%%q0, %%2", + mnem, width); else -sprintf (templ, "%s.i%d\t%%P0, %%2", mnem, width); +snprintf (templ, sizeof (templ), "%s.i%d\t%%P0, %%2", + mnem, width); return templ; } @@ -12653,9 
+12657,11 @@ neon_output_shift_immediate (const char *mnem, char sign, rtx *op2, gcc_assert (is_valid != 0); if (quad) -sprintf (templ, "%s.%c%d\t%%q0, %%q1, %%2", mnem, sign, width); +snprintf (templ, sizeof (templ), + "%s.%c%d\t%%q0, %%q1, %%2", mnem, sign, width); else -sprintf (templ, "%s.%c%d\t%%P0, %%P1, %%2", mnem, sign, width); +snprintf (templ, sizeof (templ), + "%s.%c%d\t%%P0, %%P1, %%2", mnem, sign, width); return templ; } @@ -17672,9 +17678,7 @@ arm_output_multireg_pop (rtx *operands, bool return_pc, rtx cond, bool reverse, conditional = reverse ? "%?%D0" : "%?%d0"; if ((regno_base == SP_REGNUM) && update) -{ - sprintf (pattern, "pop%s\t{", conditional); -} +snprintf (patter
[AARCH64] Adding constant folding for __builtin_fmulx* with scalar 32 and 64 bit arguments
This patch adds an extension to aarch64_gimple_fold_builtin () that does constant folding on __builtin_fmulx* calls for 32 and 64 bit floating point scalar modes. We fold when both arguments are constant, as well as when only one is. The special cases of 0*inf, -0*inf, 0*-inf, and -0*-inf are also handled. The case for vector constant arguments will be dealt with in a future patch since the tests for that would be obscure and would unnecessarily complicate this patch. Added tests to check for proper handling of constant folding. Tested on targets aarch64-none-elf and aarch64_be-none-elf. --- gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin): Added constant folding. gcc/testsuite/ 2015-XX-XX Bilyan Borisov * gcc.target/aarch64/simd/vmulx.x: New. * gcc.target/aarch64/simd/vmulx_f64_2.c: Likewise. * gcc.target/aarch64/simd/vmulxd_f64_2.c: Likewise. * gcc.target/aarch64/simd/vmulxs_f32_2.c: Likewise. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index a1998ed550ac801e4d80baae122bf58e394a563f..339054d344900c942d5ce7c047479de3bbb4e61b 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -1362,7 +1362,7 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) if (fndecl) { int fcode = DECL_FUNCTION_CODE (fndecl); - int nargs = gimple_call_num_args (stmt); + unsigned nargs = gimple_call_num_args (stmt); tree *args = (nargs > 0 ? 
gimple_call_arg_ptr (stmt, 0) : &error_mark_node); @@ -1386,7 +1386,54 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) new_stmt = gimple_build_assign (gimple_call_lhs (stmt), REDUC_MIN_EXPR, args[0]); break; - + BUILTIN_GPF (BINOP, fmulx, 0) + { + gcc_assert (nargs == 2); + bool a0_cst_p = TREE_CODE (args[0]) == REAL_CST; + bool a1_cst_p = TREE_CODE (args[1]) == REAL_CST; + if (a0_cst_p || a1_cst_p) + { + if (a0_cst_p && a1_cst_p) + { + tree t0 = TREE_TYPE (args[0]); + real_value a0 = (TREE_REAL_CST (args[0])); + real_value a1 = (TREE_REAL_CST (args[1])); + if (real_equal (&a1, &dconst0)) + std::swap (a0, a1); + /* According to real_equal (), +0 equals -0. */ + if (real_equal (&a0, &dconst0) && real_isinf (&a1)) + { + real_value res = dconst2; + res.sign = a0.sign ^ a1.sign; + new_stmt = +gimple_build_assign (gimple_call_lhs (stmt), + REAL_CST, + build_real (t0, res)); + } + else + new_stmt = + gimple_build_assign (gimple_call_lhs (stmt), + MULT_EXPR, + args[0], args[1]); + } + else /* a0_cst_p ^ a1_cst_p. */ + { + real_value const_part = a0_cst_p + ? TREE_REAL_CST (args[0]) : TREE_REAL_CST (args[1]); + if (!real_equal (&const_part, &dconst0) + && !real_isinf (&const_part)) + new_stmt = + gimple_build_assign (gimple_call_lhs (stmt), + MULT_EXPR, args[0], args[1]); + } + } + if (new_stmt) + { + gimple_set_vuse (new_stmt, gimple_vuse (stmt)); + gimple_set_vdef (new_stmt, gimple_vdef (stmt)); + } + break; + } default: break; } diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulx.x b/gcc/testsuite/gcc.target/aarch64/simd/vmulx.x new file mode 100644 index ..8968a64a95cb40a466dd77fea4e9f9f63ad707dc --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulx.x @@ -0,0 +1,46 @@ +#define PASS_ARRAY(...) 
{__VA_ARGS__} + +#define SETUP_TEST_CASE_VEC(I, INTRINSIC, BASE_TYPE, TYPE1, TYPE2, \ + VALS1, VALS2, EXPS, LEN, FM, Q_LD, Q_ST, \ + V1, V2) \ + do \ +{ \ + int i##I;\ + BASE_TYPE vec##I##_1_data[] = VALS1; \ + BASE_TYPE vec##I##_2_data[] = VALS2; \ + V1 TYPE1 vec##I##_1 = vld1##Q_LD##_##FM (vec##I##_1_data); \ + V2 TYPE2 vec##I##_2 = vld1##Q_LD##_##FM (vec##I##_2_data); \ + TYPE1 actual##I##_v = INTRINSIC (vec##I##_1, vec##I##_2); \ + volatile BASE_TYPE expected##I[] = EXPS;\ + BASE_TYPE actual##I[LEN]; \ + vst1##Q_ST##_##FM (actual##I, actual##I##_v);\ + for (i##I = 0; i##I < LEN; ++i##I) \ +if (actual##I[i##I] != expected##I[i##I])\ + abort ();\ +} \ + while (0)\ + +#define SETUP_TEST_CASE_SCALAR(I, INTRINSIC, TYPE, VAL1, VAL2, EXP) \ + do \ +{ \ + TYPE vec_##I##_1 = VAL1; \ + TYPE vec_##I##_2 = VAL2; \ + TYPE expected_##I = EXP; \ + volatile TYPE actual_##I = INTRINSIC (vec_##I##_1, vec_##I##_2); \ + if (actual_##I != expected_##I) \ +abort ();\ +} \ + while (0)\ + +/* Functions used to return values that won't be optimised away. */ +float32_t __attribute__ ((noinline)) +foo32 () +{ + return 1.0; +} + +float64_t __attribute__
Re: [patch] Fix PR middle-end/68251
On Mon, Nov 9, 2015 at 12:13 PM, Eric Botcazou wrote: > Hi, > > I don't understand how this didn't show up sooner given the amount of testing, > but there is a nasty overloading for the new REF_REVERSE_STORAGE_ORDER flag: > > #define REF_REVERSE_STORAGE_ORDER(NODE) \ >(TREE_CHECK2 (NODE, BIT_FIELD_REF, MEM_REF)->base.u.bits.saturating_flag) > > saturating_flag is part of the 'bits' structure, which is itself part of the > 'u' union, which also contains the dependence_info structure, and: > > #define MR_DEPENDENCE_CLIQUE(NODE) \ > (TREE_CHECK2 (NODE, MEM_REF, TARGET_MEM_REF)->base.u.dependence_info.clique) > #define MR_DEPENDENCE_BASE(NODE) \ > (TREE_CHECK2 (NODE, MEM_REF, TARGET_MEM_REF)->base.u.dependence_info.base) > > So there is a conflict on MEM_REF nodes. Therefore the attached patch changes > REF_REVERSE_STORAGE_ORDER to use the default_def_flag instead. > > Tested on x86_64-suse-linux, OK for the mainline? I'll install the Fortran > testcase once it is reduced because it takes a while to compile ATM. Looks good to me. I wonder where you store the info on TARGET_MEM_REFs though? Or is IVOPTs prohibited from messing with such refs (pessimizing them)? Thanks, Richard. > > 2015-11-09 Eric Botcazou > > PR middle-end/68251 > * tree-core.h (REF_REVERSE_STORAGE_ORDER): Move around. > * tree.h (REF_REVERSE_STORAGE_ORDER): Change to default_def_flag. > * tree-streamer-in.c (unpack_ts_base_value_fields): Adjust. > * tree-streamer-out.c (pack_ts_base_value_fields): Likewise. > > -- > Eric Botcazou
Re: [vec-cmp, patch 3/6] Vectorize comparison
On 26 Oct 16:09, Richard Biener wrote: > On Wed, Oct 14, 2015 at 6:12 PM, Ilya Enkovich wrote: > > + > > + ops.release (); > > + vec_defs.release (); > > No need to release auto_vec<>s at the end of scope explicitely. Fixed > > > + vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2); > > + new_stmt = gimple_build_assign (mask, vec_compare); > > + new_temp = make_ssa_name (mask, new_stmt); > > + gimple_assign_set_lhs (new_stmt, new_temp); > > new_temp = make_ssa_name (mask); > gimple_build_assign (new_temp, code, vec_rhs1, vec_rhs2); > > for the 4 stmts above. Fixed > > > + > > + vec_oprnds0.release (); > > + vec_oprnds1.release (); > > Please use auto_vec<>s. These are used to hold vecs returned by vect_get_slp_defs. Thus can't use auto_vec. > > Ok with those changes. > > RIchard. > gcc/ 2015-11-09 Ilya Enkovich * tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var. (vect_create_destination_var): Likewise. * tree-vect-stmts.c (vectorizable_comparison): New. (vect_analyze_stmt): Add vectorizable_comparison. (vect_transform_stmt): Likewise. * tree-vectorizer.h (enum vect_var_kind): Add vect_mask_var. (enum stmt_vec_info_type): Add comparison_vec_info_type. (vectorizable_comparison): New. diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 11bce79..926752b 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -3790,6 +3790,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind var_kind, const char *name) case vect_scalar_var: prefix = "stmp"; break; + case vect_mask_var: +prefix = "mask"; +break; case vect_pointer_var: prefix = "vectp"; break; @@ -4379,7 +4382,11 @@ vect_create_destination_var (tree scalar_dest, tree vectype) tree type; enum vect_var_kind kind; - kind = vectype ? vect_simple_var : vect_scalar_var; + kind = vectype +? VECTOR_BOOLEAN_TYPE_P (vectype) +? vect_mask_var +: vect_simple_var +: vect_scalar_var; type = vectype ? 
vectype : TREE_TYPE (scalar_dest); gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME); diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index f1216c8..ee549f4 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -7416,6 +7416,185 @@ vectorizable_condition (gimple *stmt, gimple_stmt_iterator *gsi, return true; } +/* vectorizable_comparison. + + Check if STMT is comparison expression that can be vectorized. + If VEC_STMT is also passed, vectorize the STMT: create a vectorized + comparison, put it in VEC_STMT, and insert it at GSI. + + Return FALSE if not a vectorizable STMT, TRUE otherwise. */ + +bool +vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi, +gimple **vec_stmt, tree reduc_def, +slp_tree slp_node) +{ + tree lhs, rhs1, rhs2; + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + tree vectype1 = NULL_TREE, vectype2 = NULL_TREE; + tree vectype = STMT_VINFO_VECTYPE (stmt_info); + tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE; + tree new_temp; + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); + enum vect_def_type dts[2] = {vect_unknown_def_type, vect_unknown_def_type}; + unsigned nunits; + int ncopies; + enum tree_code code; + stmt_vec_info prev_stmt_info = NULL; + int i, j; + bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info); + vec vec_oprnds0 = vNULL; + vec vec_oprnds1 = vNULL; + gimple *def_stmt; + tree mask_type; + tree mask; + + if (!VECTOR_BOOLEAN_TYPE_P (vectype)) +return false; + + mask_type = vectype; + nunits = TYPE_VECTOR_SUBPARTS (vectype); + + if (slp_node || PURE_SLP_STMT (stmt_info)) +ncopies = 1; + else +ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; + + gcc_assert (ncopies >= 1); + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) +return false; + + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def + && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle + && reduc_def)) +return false; + + if (STMT_VINFO_LIVE_P (stmt_info)) +{ + if (dump_enabled_p ()) + dump_printf_loc 
(MSG_MISSED_OPTIMIZATION, vect_location, +"value used after loop.\n"); + return false; +} + + if (!is_gimple_assign (stmt)) +return false; + + code = gimple_assign_rhs_code (stmt); + + if (TREE_CODE_CLASS (code) != tcc_comparison) +return false; + + rhs1 = gimple_assign_rhs1 (stmt); + rhs2 = gimple_assign_rhs2 (stmt); + + if (!vect_is_simple_use (rhs1, stmt_info->vinfo, &def_stmt, + &dts[0], &vectype1)) +return false; + + if (!vect_is_simple_use (rhs2, stmt_info->vinfo, &def_stmt, + &dts[1], &vectype2)) +return false; + + if (vectype1 && vectype2 + && TYPE_VECTOR_SUBPARTS (vectype1) !
Re: [patch] Fix PR middle-end/68251
> Looks good to me. Thanks. > I wonder where you store the info on TARGET_MEM_REFs though? Or > is IVOPTs prohibited from messing with such refs (pessimizing them)? Yes, the latter: * tree-ssa-loop-ivopts.c (may_be_nonaddressable_p) : New. Return true if reverse storage order. : Likewise. : Likewise. : Likewise. : Likewise. (split_address_cost): Likewise. Bail out if reverse storage order. -- Eric Botcazou
Re: [vec-cmp, patch 4/6] Support vector mask invariants
On 26 Oct 16:21, Richard Biener wrote: > On Wed, Oct 14, 2015 at 6:13 PM, Ilya Enkovich wrote: > > - val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val); > > + { > > + /* Can't use VIEW_CONVERT_EXPR for booleans because > > +of possibly different sizes of scalar value and > > +vector element. */ > > + if (VECTOR_BOOLEAN_TYPE_P (type)) > > + { > > + if (integer_zerop (val)) > > + val = build_int_cst (TREE_TYPE (type), 0); > > + else if (integer_onep (val)) > > + val = build_int_cst (TREE_TYPE (type), 1); > > + else > > + gcc_unreachable (); > > + } > > + else > > + val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val); > > I think the existing code is fine with using fold_convert () here > which should also work > for the boolean types. So does just > > val = fold_convert (TREE_TYPE (type), val); > > work? It seems to work OK. > > > @@ -7428,13 +7459,13 @@ vectorizable_condition (gimple *stmt, > > gimple_stmt_iterator *gsi, > > gimple *gtemp; > > vec_cond_lhs = > > vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 0), > > - stmt, NULL); > > + stmt, NULL, comp_vectype); > > vect_is_simple_use (TREE_OPERAND (cond_expr, 0), stmt, > > loop_vinfo, >emp, &def, &dts[0]); > > > > vec_cond_rhs = > > vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 1), > > - stmt, NULL); > > + stmt, NULL, comp_vectype); > > vect_is_simple_use (TREE_OPERAND (cond_expr, 1), stmt, > > loop_vinfo, >emp, &def, &dts[1]); > > I still don't like this very much but I guess without some major > refactoring of all > the functions there isn't a better way to do it for now. > > Thus, ok with trying the change suggested above. > > Thanks, > Richard. > Here is an updated version. Thanks, Ilya -- gcc/ 2015-11-09 Ilya Enkovich * expr.c (const_vector_mask_from_tree): New. (const_vector_from_tree): Use const_vector_mask_from_tree for boolean vectors. * tree-vect-stmts.c (vect_init_vector): Support boolean vector invariants. (vect_get_vec_def_for_operand): Add VECTYPE arg. 
(vectorizable_condition): Directly provide vectype for invariants used in comparison. * tree-vectorizer.h (vect_get_vec_def_for_operand): Add VECTYPE arg. diff --git a/gcc/expr.c b/gcc/expr.c index 2b2174f..03936ee 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -11423,6 +11423,40 @@ try_tablejump (tree index_type, tree index_expr, tree minval, tree range, return 1; } +/* Return a CONST_VECTOR rtx representing vector mask for + a VECTOR_CST of booleans. */ +static rtx +const_vector_mask_from_tree (tree exp) +{ + rtvec v; + unsigned i; + int units; + tree elt; + machine_mode inner, mode; + + mode = TYPE_MODE (TREE_TYPE (exp)); + units = GET_MODE_NUNITS (mode); + inner = GET_MODE_INNER (mode); + + v = rtvec_alloc (units); + + for (i = 0; i < VECTOR_CST_NELTS (exp); ++i) +{ + elt = VECTOR_CST_ELT (exp, i); + + gcc_assert (TREE_CODE (elt) == INTEGER_CST); + if (integer_zerop (elt)) + RTVEC_ELT (v, i) = CONST0_RTX (inner); + else if (integer_onep (elt) + || integer_minus_onep (elt)) + RTVEC_ELT (v, i) = CONSTM1_RTX (inner); + else + gcc_unreachable (); +} + + return gen_rtx_CONST_VECTOR (mode, v); +} + /* Return a CONST_VECTOR rtx for a VECTOR_CST tree. 
*/ static rtx const_vector_from_tree (tree exp) @@ -11438,6 +11472,9 @@ const_vector_from_tree (tree exp) if (initializer_zerop (exp)) return CONST0_RTX (mode); + if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp))) +return const_vector_mask_from_tree (exp); + units = GET_MODE_NUNITS (mode); inner = GET_MODE_INNER (mode); diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index ee549f4..af203ab 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -1300,7 +1300,7 @@ vect_init_vector (gimple *stmt, tree val, tree type, gimple_stmt_iterator *gsi) if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val))) { if (CONSTANT_CLASS_P (val)) - val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val); + val = fold_convert (TREE_TYPE (type), val); else { new_temp = make_ssa_name (TREE_TYPE (type)); @@ -1328,16 +1328,18 @@ vect_init_vector (gimple *stmt, tree val, tree type, gimple_stmt_iterator *gsi) STMT_VINFO_VEC_STMT of the defining st
Re: [PATCH] Fix memory leaks and use a pool_allocator
On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška wrote: > Hi. > > This is follow-up of changes that Richi started on Friday. > > Patch can bootstrap on x86_64-linux-pc and regression tests are running. > > Ready for trunk? * tree-ssa-dom.c (free_edge_info): Make the function extern. ... * tree-ssa.h (free_edge_info): Declare function extern. declare this in tree-ssa-threadupdate.h instead and renaming it to sth less "public", like free_dom_edge_info. diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index fff62de..eb6b7df 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) set_used_flags (targets[i]); } + temporaries.release (); + set_used_flags (cond); set_used_flags (x); set_used_flags (y); @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) } num_updated_if_blocks++; + targets.release (); return TRUE; suspiciously look like candidates for an auto_vec<> (didn't check). @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p) dead_set = sparseset_alloc (max_regno); unused_set = sparseset_alloc (max_regno); curr_point = 0; + point_freq_vec.release (); point_freq_vec.create (get_max_uid () * 2); a truncate (0) instead of a release () should be cheaper, avoiding the re-allocation. @@ -674,6 +674,10 @@ sra_deinitialize (void) assign_link_pool.release (); obstack_free (&name_obstack, NULL); + for (hash_map >::iterator it = + base_access_vec->begin (); it != base_access_vec->end (); ++it) +(*it).second.release (); + delete base_access_vec; I wonder if the better fix is to provide a proper free method for the hash_map? A hash_map with 'auto_vec' looks suspicous - eventually a proper release was intented here via default_hash_map_traits <>? Anyway, most of the things above can be improved as followup of course. Thanks, Richard. > Thanks, > Martin
[PATCH][optabs][ifcvt][1/3] Define negcc, notcc optabs
Hi all, This is a rebase of the patch I posted at: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00154.html The patch has been ok'd by Jeff but I wanted to hold off committing it until my fixes for the ifcvt regressions on sparc and x86_64 were fixed. The rebase conflicts were due to Richard's optabs splitting patch. I've also noticed that in my original patch I had a comparison of branch cost with the magic number '2'. I removed it from this version as it's not really meaningful. The transformation this patch enables is, at the moment, only supported for arm and aarch64 where it is always beneficial. If/when we have a proper ifcvt costing model (perhaps for GCC 7?) we'll update this accordingly if needed. Jeff, sorry for taking so long to commit this, I just wanted to fix the other ifcvt fallout before proceeding with more new functionality. I have also uncovered a bug in the arm implementation of these optabs (patch 3/3 in the series), so I'll post an updated version of that patch as well soon. Ok to commit this updated version instead? Bootstrapped and tested on arm, aarch64 and x86_64. It has been sitting in my tree for a couple of months now with no issues. Thanks, Kyrill 2015-11-09 Kyrylo Tkachov * ifcvt.c (noce_try_inverse_constants): New function. (noce_process_if_block): Call it. * optabs.h (emit_conditional_neg_or_complement): Declare prototype. * optabs.def (negcc_optab, notcc_optab): Declare. * optabs.c (emit_conditional_neg_or_complement): New function. * doc/tm.texi (Standard Names): Document negcc, notcc names. commit 93cd987e9ab02ac68b44b2470bb5c4c6345efeca Author: Kyrylo Tkachov Date: Thu Aug 13 18:14:52 2015 +0100 [optabs][ifcvt][1/3] Define negcc, notcc optabs diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 619259f..c4e43f3 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5791,6 +5791,21 @@ move operand 2 or (operands 2 + operand 3) into operand 0 according to the comparison in operand 1. 
If the comparison is false, operand 2 is moved into operand 0, otherwise (operand 2 + operand 3) is moved. +@cindex @code{neg@var{mode}cc} instruction pattern +@item @samp{neg@var{mode}cc} +Similar to @samp{mov@var{mode}cc} but for conditional negation. Conditionally +move the negation of operand 2 or the unchanged operand 3 into operand 0 +according to the comparison in operand 1. If the comparison is true, the negation +of operand 2 is moved into operand 0, otherwise operand 3 is moved. + +@cindex @code{not@var{mode}cc} instruction pattern +@item @samp{not@var{mode}cc} +Similar to @samp{neg@var{mode}cc} but for conditional complement. +Conditionally move the bitwise complement of operand 2 or the unchanged +operand 3 into operand 0 according to the comparison in operand 1. +If the comparison is true, the complement of operand 2 is moved into +operand 0, otherwise operand 3 is moved. + @cindex @code{cstore@var{mode}4} instruction pattern @item @samp{cstore@var{mode}4} Store zero or nonzero in operand 0 according to whether a comparison diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 157a716..1e773d8 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -1179,6 +1179,83 @@ noce_try_store_flag (struct noce_if_info *if_info) } } + +/* Convert "if (test) x = -A; else x = A" into + x = A; if (test) x = -x if the machine can do the + conditional negate form of this cheaply. + Try this before noce_try_cmove that will just load the + immediates into two registers and do a conditional select + between them. If the target has a conditional negate or + conditional invert operation we can save a potentially + expensive constant synthesis. 
*/ + +static bool +noce_try_inverse_constants (struct noce_if_info *if_info) +{ + if (!noce_simple_bbs (if_info)) +return false; + + if (!CONST_INT_P (if_info->a) + || !CONST_INT_P (if_info->b) + || !REG_P (if_info->x)) +return false; + + machine_mode mode = GET_MODE (if_info->x); + + HOST_WIDE_INT val_a = INTVAL (if_info->a); + HOST_WIDE_INT val_b = INTVAL (if_info->b); + + rtx cond = if_info->cond; + + rtx x = if_info->x; + rtx target; + + start_sequence (); + + rtx_code code; + if (val_b != HOST_WIDE_INT_MIN && val_a == -val_b) +code = NEG; + else if (val_a == ~val_b) +code = NOT; + else +{ + end_sequence (); + return false; +} + + rtx tmp = gen_reg_rtx (mode); + noce_emit_move_insn (tmp, if_info->a); + + target = emit_conditional_neg_or_complement (x, code, mode, cond, tmp, tmp); + + if (target) +{ + rtx_insn *seq = get_insns (); + + if (!seq) + { + end_sequence (); + return false; + } + + if (target != if_info->x) + noce_emit_move_insn (if_info->x, target); + + seq = end_ifcvt_sequence (if_info); + + if (!seq) + return false; + + emit_insn_before_setloc (seq, if_info->jump, + INSN_LOCATION (if_info->insn_a)); + return true; +} + + end_sequence (); + return false; +} + + /* Convert "if (test) x =
GCC 6 Status Report (2015-11-09)
Status
======

We've pushed back the switch to Stage 3 to the end of Saturday,
Nov 14th.  This is to allow smooth draining of review queues.

Quality Data
============

Priority        #   Change from last report
--------      ---   -----------------------
P1              4   +  2
P2             84
P3            130   + 10
P4             83   -  5
P5             32
--------      ---   -----------------------
Total P1-P3   218   + 12
Total         333   +  7

Previous Report
===============

https://gcc.gnu.org/ml/gcc/2015-10/msg00113.html
Re: [PATCH] Fix memory leaks and use a pool_allocator
On Mon, Nov 09, 2015 at 01:11:48PM +0100, Richard Biener wrote: > On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška wrote: > > Hi. > > > > This is follow-up of changes that Richi started on Friday. > > > > Patch can bootstrap on x86_64-linux-pc and regression tests are running. > > > > Ready for trunk? > > * tree-ssa-dom.c (free_edge_info): Make the function extern. > ... > * tree-ssa.h (free_edge_info): Declare function extern. > > declare this in tree-ssa-threadupdate.h instead and renaming it to > sth less "public", like free_dom_edge_info. > > diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c > index fff62de..eb6b7df 100644 > --- a/gcc/ifcvt.c > +++ b/gcc/ifcvt.c > @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info > *if_info) >set_used_flags (targets[i]); > } > > + temporaries.release (); > + >set_used_flags (cond); >set_used_flags (x); >set_used_flags (y); > @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info > *if_info) > } > >num_updated_if_blocks++; > + targets.release (); >return TRUE; > > suspiciously look like candidates for an auto_vec<> (didn't check). I was about to say the same thing after a little checking (maybe the region one in tree-ssa-threadupdate.c to, but didn't check that) > > @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p) >dead_set = sparseset_alloc (max_regno); >unused_set = sparseset_alloc (max_regno); >curr_point = 0; > + point_freq_vec.release (); >point_freq_vec.create (get_max_uid () * 2); > > a truncate (0) instead of a release () should be cheaper, avoiding the > re-allocation. yeah, or even change it to just grow the array, afaict it doesn't expect the array to be cleared? 
> @@ -674,6 +674,10 @@ sra_deinitialize (void) >assign_link_pool.release (); >obstack_free (&name_obstack, NULL); > > + for (hash_map >::iterator it = > + base_access_vec->begin (); it != base_access_vec->end (); ++it) > +(*it).second.release (); > + >delete base_access_vec; > > I wonder if the better fix is to provide a proper free method for the > hash_map? > A hash_map with 'auto_vec' looks suspicous - eventually a proper release > was intented here via default_hash_map_traits <>? in fact I would expect that already works, but apparently not, so I'd say that's the bug. Trev > > Anyway, most of the things above can be improved as followup of course. > > Thanks, > Richard. > > > Thanks, > > Martin
Re: [PATCH] S/390: Fix warning in "*movstr" pattern.
On 11/04/2015 02:39 AM, Dominik Vogt wrote: > On Tue, Nov 03, 2015 at 06:47:28PM +0100, Ulrich Weigand wrote: >> Dominik Vogt wrote: >> >>> @@ -2936,7 +2936,7 @@ >>> (set (mem:BLK (match_operand:P 1 "register_operand" "0")) >>> (mem:BLK (match_operand:P 3 "register_operand" "2"))) >>> (set (match_operand:P 0 "register_operand" "=d") >>> - (unspec [(mem:BLK (match_dup 1)) >>> + (unspec:P [(mem:BLK (match_dup 1)) >>> (mem:BLK (match_dup 3)) >>> (reg:SI 0)] UNSPEC_MVST)) >>> (clobber (reg:CC CC_REGNUM))] >> >> Don't you have to change the expander too? Otherwise the >> pattern will no longer match ... > > Yes, you're right. This turned out to be a bit tricky to do > because the "movstr" expander doesn't allow variants with > different modes. :-/ > > New patch attached, including a test case that works on 31-bit and > 64-bit. Could you please check that the generated code doesn't change with a larger code base (e.g. speccpu)? It should not affect it but I really think we omitted the mode here for a reason (although I don't remember why). -Andreas-
[patch] backport PIE support for FreeBSD to gcc-49
Hi, any objections that I apply this patch to gcc-4.9? It is FreeBSD only. TIA, Andreas 2015-11-09 Andreas Tobler Backport from mainline 2015-05-18 Andreas Tobler * config/freebsd-spec.h (FBSD_STARTFILE_SPEC): Add the bits to build pie executables. (FBSD_ENDFILE_SPEC): Likewise. * config/i386/freebsd.h (STARTFILE_SPEC): Remove and use the one from config/freebsd-spec.h. (ENDFILE_SPEC): Likewise. 2015-11-02 Andreas Tobler * config/rs6000/freebsd64.h (ASM_SPEC32): Adust spec to handle PIE executables. Index: gcc/config/freebsd-spec.h === --- gcc/config/freebsd-spec.h (revision 230016) +++ gcc/config/freebsd-spec.h (working copy) @@ -66,8 +66,9 @@ "%{!shared: \ %{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \ %{!p:%{profile:gcrt1.o%s} \ -%{!profile:crt1.o%s \ - crti.o%s %{!shared:crtbegin.o%s} %{shared:crtbeginS.o%s}" +%{!profile: \ +%{pie: Scrt1.o%s;:crt1.o%s} \ + crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}" /* Provide a ENDFILE_SPEC appropriate for FreeBSD. Here we tack on the magical crtend.o file (see crtstuff.c) which provides part of @@ -76,7 +77,7 @@ `crtn.o'. */ #define FBSD_ENDFILE_SPEC \ - "%{!shared:crtend.o%s} %{shared:crtendS.o%s} crtn.o%s" + "%{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s" /* Provide a LIB_SPEC appropriate for FreeBSD as configured and as required by the user-land thread model. Before __FreeBSD_version Index: gcc/config/i386/freebsd.h === --- gcc/config/i386/freebsd.h (revision 230016) +++ gcc/config/i386/freebsd.h (working copy) @@ -59,29 +59,16 @@ #define SUBTARGET_EXTRA_SPECS \ { "fbsd_dynamic_linker", FBSD_DYNAMIC_LINKER } -/* Provide a STARTFILE_SPEC appropriate for FreeBSD. Here we add - the magical crtbegin.o file (see crtstuff.c) which provides part - of the support for getting C++ file-scope static object constructed - before entering `main'. 
*/ - -#undef STARTFILE_SPEC -#define STARTFILE_SPEC \ - "%{!shared: \ - %{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \ - %{!p:%{profile:gcrt1.o%s} \ -%{!profile:crt1.o%s \ - crti.o%s %{!shared:crtbegin.o%s} %{shared:crtbeginS.o%s}" +/* Use the STARTFILE_SPEC from config/freebsd-spec.h. */ -/* Provide a ENDFILE_SPEC appropriate for FreeBSD. Here we tack on - the magical crtend.o file (see crtstuff.c) which provides part of - the support for getting C++ file-scope static object constructed - before entering `main', followed by a normal "finalizer" file, - `crtn.o'. */ +#undef STARTFILE_SPEC +#define STARTFILE_SPEC FBSD_STARTFILE_SPEC -#undef ENDFILE_SPEC -#define ENDFILE_SPEC \ - "%{!shared:crtend.o%s} %{shared:crtendS.o%s} crtn.o%s" +/* Use the ENDFILE_SPEC from config/freebsd-spec.h. */ +#undef ENDFILE_SPEC +#define ENDFILE_SPEC FBSD_ENDFILE_SPEC + /* Provide a LINK_SPEC appropriate for FreeBSD. Here we provide support for the special GCC options -static and -shared, which allow us to link things in one of these three modes by applying the appropriate Index: gcc/config/rs6000/freebsd64.h === --- gcc/config/rs6000/freebsd64.h (revision 230016) +++ gcc/config/rs6000/freebsd64.h (working copy) @@ -130,7 +130,7 @@ #defineLINK_OS_FREEBSD_SPEC "%{m32:%(link_os_freebsd_spec32)}%{!m32:%(link_os_freebsd_spec64)}" #define ASM_SPEC32 "-a32 \ -%{mrelocatable} %{mrelocatable-lib} %{fpic:-K PIC} %{fPIC:-K PIC} \ +%{mrelocatable} %{mrelocatable-lib} %{fpic|fpie|fPIC|fPIE:-K PIC} \ %{memb} %{!memb: %{msdata=eabi: -memb}} \ %{!mlittle: %{!mlittle-endian: %{!mbig: %{!mbig-endian: \ %{mcall-freebsd: -mbig} \
[patch] backport remove of soft float for FreeBSD powerpc for gcc-4.9
Hi all, any objections when I apply the below patch to gcc-4.9? TIA, Andreas 2015-11-09 Andreas Tobler Backport from mainline 2015-03-04 Andreas Tobler * config/rs6000/t-freebsd64: Remove 32-bit soft-float multilibs. Index: gcc/config/rs6000/t-freebsd64 === --- gcc/config/rs6000/t-freebsd64 (revision 230016) +++ gcc/config/rs6000/t-freebsd64 (working copy) @@ -21,11 +21,9 @@ # On FreeBSD the 32-bit libraries are found under /usr/lib32. # Set MULTILIB_OSDIRNAMES according to this. -MULTILIB_OPTIONS= m32 msoft-float -MULTILIB_DIRNAMES = 32 nof +MULTILIB_OPTIONS= m32 +MULTILIB_DIRNAMES = 32 MULTILIB_EXTRA_OPTS = fPIC mstrict-align MULTILIB_EXCEPTIONS = -MULTILIB_EXCLUSIONS = !m32/msoft-float MULTILIB_OSDIRNAMES= ../lib32 -#MULTILIB_MATCHES= $(MULTILIB_MATCHES_FLOAT)
[PATCH] Improve BB vectorization dependence analysis
Currently BB vectorization computes all dependences inside a BB region and fails all vectorization if it cannot handle some of them. This is obviously not needed - BB vectorization can restrict the dependence tests to those that are needed to apply the load/store motion effectively performed by the vectorization (sinking all participating loads/stores to the place of the last one). With restructuring it that way it's also easy to not give up completely but only for the SLP instance we cannot vectorize (this gives a slight bump in my SPEC CPU 2006 testing to 756 vectorized basic block regions). But first and foremost this patch is to reduce the dependence analysis cost and somewhat mitigate the compile-time effects of the first patch. For fixing PR56118 only a cost model issue remains. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2015-11-09 Richard Biener PR tree-optimization/56118 * tree-vectorizer.h (vect_find_last_scalar_stmt_in_slp): Declare. * tree-vect-slp.c (vect_find_last_scalar_stmt_in_slp): Export. * tree-vect-data-refs.c (vect_slp_analyze_node_dependences): New function. (vect_slp_analyze_data_ref_dependences): Instead of computing all dependences of the region DRs just analyze the code motions SLP vectorization will perform. Remove SLP instances that cannot have their store/load motions applied. (vect_analyze_data_refs): Allow DRs without a vectype in BB vectorization. * gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c: Adjust. Index: gcc/tree-vectorizer.h === *** gcc/tree-vectorizer.h.orig 2015-11-09 11:01:55.688175321 +0100 --- gcc/tree-vectorizer.h 2015-11-09 11:02:18.987432840 +0100 *** extern void vect_detect_hybrid_slp (loop *** 1075,1080 --- 1075,1081 extern void vect_get_slp_defs (vec , slp_tree, vec > *, int); extern bool vect_slp_bb (basic_block); + extern gimple *vect_find_last_scalar_stmt_in_slp (slp_tree); /* In tree-vect-patterns.c. */ /* Pattern recognition functions. 
Index: gcc/tree-vect-data-refs.c === *** gcc/tree-vect-data-refs.c.orig 2015-11-09 10:22:33.140125722 +0100 --- gcc/tree-vect-data-refs.c 2015-11-09 11:33:05.503874719 +0100 *** vect_slp_analyze_data_ref_dependence (st *** 581,586 --- 581,629 } + /* Analyze dependences involved in the transform of SLP NODE. */ + + static bool + vect_slp_analyze_node_dependences (slp_instance instance, slp_tree node) + { + /* This walks over all stmts involved in the SLP load/store done + in NODE verifying we can sink them up to the last stmt in the + group. */ + gimple *last_access = vect_find_last_scalar_stmt_in_slp (node); + for (unsigned k = 0; k < SLP_INSTANCE_GROUP_SIZE (instance); ++k) + { + gimple *access = SLP_TREE_SCALAR_STMTS (node)[k]; + if (access == last_access) + continue; + stmt_vec_info access_stmt_info = vinfo_for_stmt (access); + gimple_stmt_iterator gsi = gsi_for_stmt (access); + gsi_next (&gsi); + for (; gsi_stmt (gsi) != last_access; gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + if (!STMT_VINFO_DATA_REF (stmt_info) + || (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)) + && DR_IS_READ (STMT_VINFO_DATA_REF (access_stmt_info + continue; + + ddr_p ddr = initialize_data_dependence_relation + (STMT_VINFO_DATA_REF (access_stmt_info), + STMT_VINFO_DATA_REF (stmt_info), vNULL); + if (vect_slp_analyze_data_ref_dependence (ddr)) + { + /* ??? If the dependence analysis failed we can resort to the +alias oracle which can handle more kinds of stmts. */ + free_dependence_relation (ddr); + return false; + } + free_dependence_relation (ddr); + } + } + return true; + } + + /* Function vect_analyze_data_ref_dependences. 
Examine all the data references in the basic-block, and make sure there *** vect_slp_analyze_data_ref_dependence (st *** 590,610 bool vect_slp_analyze_data_ref_dependences (bb_vec_info bb_vinfo) { - struct data_dependence_relation *ddr; - unsigned int i; - if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "=== vect_slp_analyze_data_ref_dependences ===\n"); ! if (!compute_all_dependences (BB_VINFO_DATAREFS (bb_vinfo), ! &BB_VINFO_DDRS (bb_vinfo), ! vNULL, true)) ! return false; ! FOR_EACH_VEC_ELT (BB_VINFO_DDRS (bb_vinfo), i, ddr) ! if (vect_slp_analyze_data_ref
Re: [PATCH 1/2] s/390: Implement "target" attribute.
On 11/02/2015 09:44 AM, Dominik Vogt wrote: > (@Uli: I'd like to hear your opinion on this issue. > Original message: > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03403.html). > > On Fri, Oct 30, 2015 at 03:09:39PM +0100, Andreas Krebbel wrote: >> Why do we need x_s390_arch_specified and x_s390_tune_specified? You >> should be able to use opts_set->x_s390_arch and opts_set->x_s390_tune >> instead? (patch attached, your tests keep working with that change). > > The idea was that -mtune on the command line is *not* overridden > by the "arch" target attribute. This would allow to change the > architecture for a specific function and keep the -mtune= option > from the command line. But as a matter of fact, the current patch > doesn't do it either (bug?). Your testcases even seem to check for this behavior so it looked intentional to me. But I agree that being able to keep the -mtune cmdline value for a function while only changing the used instruction set would be good. Could you please elaborate why implementing this requires the new flags? -Andreas-
Re: [PATCH] Fix memory leaks and use a pool_allocator
On 11/09/2015 01:11 PM, Richard Biener wrote: > On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška wrote: >> Hi. >> >> This is follow-up of changes that Richi started on Friday. >> >> Patch can bootstrap on x86_64-linux-pc and regression tests are running. >> >> Ready for trunk? > > * tree-ssa-dom.c (free_edge_info): Make the function extern. > ... > * tree-ssa.h (free_edge_info): Declare function extern. > > declare this in tree-ssa-threadupdate.h instead and renaming it to > sth less "public", like free_dom_edge_info. > > diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c > index fff62de..eb6b7df 100644 > --- a/gcc/ifcvt.c > +++ b/gcc/ifcvt.c > @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info > *if_info) >set_used_flags (targets[i]); > } > > + temporaries.release (); > + >set_used_flags (cond); >set_used_flags (x); >set_used_flags (y); > @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info > *if_info) > } > >num_updated_if_blocks++; > + targets.release (); >return TRUE; > > suspiciously look like candidates for an auto_vec<> (didn't check). > > @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p) >dead_set = sparseset_alloc (max_regno); >unused_set = sparseset_alloc (max_regno); >curr_point = 0; > + point_freq_vec.release (); >point_freq_vec.create (get_max_uid () * 2); > > a truncate (0) instead of a release () should be cheaper, avoiding the > re-allocation. > > @@ -674,6 +674,10 @@ sra_deinitialize (void) >assign_link_pool.release (); >obstack_free (&name_obstack, NULL); > > + for (hash_map >::iterator it = > + base_access_vec->begin (); it != base_access_vec->end (); ++it) > +(*it).second.release (); > + >delete base_access_vec; > > I wonder if the better fix is to provide a proper free method for the > hash_map? > A hash_map with 'auto_vec' looks suspicous - eventually a proper release > was intented here via default_hash_map_traits <>? > > Anyway, most of the things above can be improved as followup of course. 
> > Thanks, > Richard. > >> Thanks, >> Martin Hi. All suggested changes were applied, sending v2 and waiting for bootstrap and regression tests. Thanks, Martin >From c97270f2daadcca1efe6201adf1eb0df469ca91e Mon Sep 17 00:00:00 2001 From: marxin Date: Mon, 9 Nov 2015 10:49:14 +0100 Subject: [PATCH 1/2] Fix memory leaks and use a pool_allocator gcc/ChangeLog: 2015-11-09 Martin Liska * gcc.c (record_temp_file): Release name string. * ifcvt.c (noce_convert_multiple_sets): Use auto_vec instead of vec. * lra-lives.c (free_live_range_list): Utilize lra_live_range_pool for allocation and deallocation. (create_live_range): Likewise. (copy_live_range): Likewise. (lra_merge_live_ranges): Likewise. (remove_some_program_points_and_update_live_ranges): Likewise. (lra_create_live_ranges_1): Release point_freq_vec that can be not freed from previous iteration of the function. * tree-eh.c (lower_try_finally_switch): Use auto_vec instead of vec. * tree-sra.c (sra_deinitialize): Release all vectors in base_access_vec. * tree-ssa-dom.c (free_dom_edge_info): Make the function extern. * tree-ssa-threadupdate.c (remove_ctrl_stmt_and_useless_edges): Release edge_info for a removed edge. (thread_through_all_blocks): Free region vector. * tree-ssa.h (free_dom_edge_info): Declare function extern. --- gcc/gcc.c | 5 - gcc/ifcvt.c | 8 +--- gcc/lra-lives.c | 14 -- gcc/tree-eh.c | 2 +- gcc/tree-sra.c | 6 ++ gcc/tree-ssa-dom.c | 8 gcc/tree-ssa-threadupdate.c | 6 +- gcc/tree-ssa-threadupdate.h | 1 + 8 files changed, 34 insertions(+), 16 deletions(-) diff --git a/gcc/gcc.c b/gcc/gcc.c index bbc9b23..8bbf5be 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -2345,7 +2345,10 @@ record_temp_file (const char *filename, int always_delete, int fail_delete) struct temp_file *temp; for (temp = always_delete_queue; temp; temp = temp->next) if (! 
filename_cmp (name, temp->name)) - goto already1; + { + free (name); + goto already1; + } temp = XNEW (struct temp_file); temp->next = always_delete_queue; diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index fff62de..3401faa 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -3076,12 +3076,12 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) rtx_code cond_code = GET_CODE (cond); /* The true targets for a conditional move. */ - vec targets = vNULL; + auto_vec targets; /* The temporaries introduced to allow us to not consider register overlap. */ - vec temporaries = vNULL; + auto_vec temporaries; /* The insns we've emitted. */ - vec unmodified_insns = vNULL; + auto_vec unmodified_insns; int count = 0; FOR_BB_INSNS (then_bb, insn) @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_i
[PATCH] 02/N Fix memory leaks in IPA
Hi. Following changes were consulted with Martin Jambor to properly release memory in IPA. It fixes leaks which popped up in tramp3d with -O2. Bootstrap and regression tests have been running. Ready after it finishes? Thanks, Martin >From 85b63f738030dd7a901c228ba76e24f820d31c5d Mon Sep 17 00:00:00 2001 From: marxin Date: Mon, 9 Nov 2015 12:38:27 +0100 Subject: [PATCH 2/2] Fix memory leaks in IPA. gcc/ChangeLog: 2015-11-09 Martin Liska * ipa-inline-analysis.c (estimate_function_body_sizes): Call body_info release function. * ipa-prop.c (ipa_release_body_info): New function. (ipa_analyze_node): Call the function. (ipa_node_params::~ipa_node_params): Release known_csts. * ipa-prop.h (ipa_release_body_info): Declare. --- gcc/ipa-inline-analysis.c | 2 +- gcc/ipa-prop.c| 20 +++- gcc/ipa-prop.h| 2 +- 3 files changed, 17 insertions(+), 7 deletions(-) diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c index c07b0da..8c8b8e3 100644 --- a/gcc/ipa-inline-analysis.c +++ b/gcc/ipa-inline-analysis.c @@ -2853,7 +2853,7 @@ estimate_function_body_sizes (struct cgraph_node *node, bool early) inline_summaries->get (node)->self_time = time; inline_summaries->get (node)->self_size = size; nonconstant_names.release (); - fbi.bb_infos.release (); + ipa_release_body_info (&fbi); if (opt_for_fn (node->decl, optimize)) { if (!early) diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c index d15f0eb..f379ea7 100644 --- a/gcc/ipa-prop.c +++ b/gcc/ipa-prop.c @@ -2258,6 +2258,19 @@ analysis_dom_walker::before_dom_children (basic_block bb) ipa_compute_jump_functions_for_bb (m_fbi, bb); } +/* Release body info FBI. 
*/ + +void +ipa_release_body_info (struct ipa_func_body_info *fbi) +{ + int i; + struct ipa_bb_info *bi; + + FOR_EACH_VEC_ELT (fbi->bb_infos, i, bi) +free_ipa_bb_info (bi); + fbi->bb_infos.release (); +} + /* Initialize the array describing properties of formal parameters of NODE, analyze their uses and compute jump functions associated with actual arguments of calls from within NODE. */ @@ -2313,11 +2326,7 @@ ipa_analyze_node (struct cgraph_node *node) analysis_dom_walker (&fbi).walk (ENTRY_BLOCK_PTR_FOR_FN (cfun)); - int i; - struct ipa_bb_info *bi; - FOR_EACH_VEC_ELT (fbi.bb_infos, i, bi) -free_ipa_bb_info (bi); - fbi.bb_infos.release (); + ipa_release_body_info (&fbi); free_dominance_info (CDI_DOMINATORS); pop_cfun (); } @@ -3306,6 +3315,7 @@ ipa_node_params::~ipa_node_params () free (lattices); /* Lattice values and their sources are deallocated with their alocation pool. */ + known_csts.release (); known_contexts.release (); lattices = NULL; diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h index b69ee8a..2fe824d 100644 --- a/gcc/ipa-prop.h +++ b/gcc/ipa-prop.h @@ -775,7 +775,7 @@ bool ipa_modify_expr (tree *, bool, ipa_parm_adjustment_vec); ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *, ipa_parm_adjustment_vec, bool); - +void ipa_release_body_info (struct ipa_func_body_info *); /* From tree-sra.c: */ tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree, -- 2.6.2
Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
On 30 October 2015 at 16:52, Matthew Wahab wrote: > On 30/10/15 12:51, Christophe Lyon wrote: >> >> On 23 October 2015 at 14:26, Matthew Wahab >> wrote: >>> >>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions, >>> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and >>> vqrdmlsh for these instructions. The new intrinsics are of the form >>> vqrdml{as}h[q]_. >>> >>> Tested the series for aarch64-none-linux-gnu with native bootstrap and >>> make check on an ARMv8 architecture. Also tested aarch64-none-elf with >>> cross-compiled check-gcc on an ARMv8.1 emulator. >>> >> >> Is there a publicly available simulator for v8.1? QEMU or Foundation >> Model? >> > > Sorry, I don't know. > Matthew > So, what will happen to the testsuite once this is committed? Are we going to see FAILs when using QEMU? Thanks, Christophe.
RE: [RFC][PATCH] Preferred rename register in regrename pass
Hi Bernd,

Sorry for the late reply. The updated patch was bootstrapped on
x86_64-unknown-linux-gnu and cross tested on mips-img-linux-gnu using
r229786. The results below were generated for the CSiBE benchmark; the
numbers in the columns express bytes in the format 'net (gain/loss)' to
show the difference with and without the patch when the -frename-registers
switch is used.

I looked at the gains, especially for MIPS and 'teem', and it appears that
renaming registers affects the rtl_dce pass, i.e. makes it less effective.
However, on average the patch appears to reduce the code size slightly and
moves are genuinely removed. I haven't tested the performance extensively,
but the SPEC benchmarks showed almost the same results, which could be
just noise.

               | MIPS n64 -Os    | MIPS o32 -Os    | x86_64 -Os        |
---------------+-----------------+-----------------+-------------------+
bzip2-1.0.2    | -32   (0/-32)   | -24   (0/-24)   | -34    (1/-35)    |
cg_compiler    | -172  (0/-172)  | -156  (0/-156)  | -46    (0/-46)    |
compiler       | -36   (0/-36)   | -24   (0/-24)   | -6     (0/-6)     |
flex-2.5.31    | -68   (0/-68)   | -80   (0/-80)   | -98    (7/-105)   |
jikespg-1.3    | -284  (0/-284)  | -204  (0/-204)  | -127   (9/-136)   |
jpeg-6b        | -52   (8/-60)   | -20   (0/-20)   | -80    (11/-91)   |
libmspack      | -136  (0/-136)  | -28   (0/-28)   | -33    (23/-56)   |
libpng-1.2.5   | -72   (0/-72)   | -64   (0/-64)   | -176   (14/-190)  |
linux-2.4.23   | -700  (20/-720) | -384  (0/-384)  | -691   (44/-735)  |
lwip-0.5.3     | -4    (0/-4)    | -4    (0/-4)    | +4     (13/-9)    |
mpeg2dec-0.3.1 | -16   (0/-16)   |                 | -142   (6/-148)   |
mpgcut-1.1     | -24   (0/-24)   | -12   (4/-16)   | -2     (0/-2)     |
OpenTCP-1.0.4  | -28   (0/-28)   | -12   (0/-12)   | -1     (0/-1)     |
replaypc-0.4.0 | -32   (0/-32)   | -12   (0/-12)   | -4     (2/-6)     |
teem-1.6.0     | -88   (480/-568)| +108  (564/-456)| -1272  (117/-1389)|
ttt-0.10.1     | -24   (0/-24)   | -20   (0/-20)   | -16    (0/-16)    |
unrarlib-0.4.0 | -20   (0/-20)   | -8    (0/-8)    | -59    (9/-68)    |
zlib-1.1.4     | -12   (0/-12)   | -4    (0/-4)    | -23    (8/-31)    |

               | MIPS n64 -O2    | MIPS o32 -O2    | x86_64 -O2        |
---------------+-----------------+-----------------+-------------------+
bzip2-1.0.2    | -104  (0/-104)  | -48   (0/-48)   | -55    (0/-55)    |
cg_compiler    | -184  (4/-188)  | -232  (0/-232)  | -31    (5/-36)    |
compiler       | -32   (0/-32)   | -12   (0/-12)   | -4     (1/-5)     |
flex-2.5.31    | -96   (0/-96)   | -112  (0/-112)  | -12    (34/-46)   |
jikespg-1.3    | -540  (20/-560) | -476  (4/-480)  | -154   (30/-184)  |
jpeg-6b        | -112  (16/-128) | -60   (0/-60)   | -136   (84/-220)  |
libmspack      | -164  (0/-164)  | -40   (0/-40)   | -87    (32/-119)  |
libpng-1.2.5   | -120  (8/-128)  | -92   (4/-96)   | -140   (53/-193)  |
linux-2.4.23   | -596  (12/-608) | -320  (8/-328)  | -794   (285/-1079)|
lwip-0.5.3     | -8    (0/-8)    | -8    (0/-8)    | +2     (4/-2)     |
mpeg2dec-0.3.1 | -44   (0/-44)   | -4    (0/-4)    | -122   (8/-130)   |
mpgcut-1.1     | -8    (0/-8)    | -8    (0/-8)    | +28    (32/-4)    |
OpenTCP-1.0.4  | -4    (0/-4)    | -4    (0/-4)    | -2     (0/-2)     |
replaypc-0.4.0 | -20   (0/-20)   | -24   (0/-24)   | -13    (0/-13)    |
teem-1.6.0     | +100  (740/-640)| +84   (736/-652)| -1998  (168/-2166)|
ttt-0.10.1     | -16   (0/-16)   |                 |                   |
unrarlib-0.4.0 | -16   (0/-16)   | -8    (0/-8)    | +19    (37/-18)   |
zlib-1.1.4     | -12   (0/-12)   | -4    (0/-4)    | -15    (1/-16)    |

Regards,
Robert

> Hi Robert,
>
> > gcc/
> > * regrename.c (create_new_chain): Initialize terminated_dead,
> > renamed and tied_chain.
> > (find_best_rename_reg): Pick and check register from the tied chain.
> > (regrename_do_replace): Mark head as renamed.
> > (scan_rtx_reg): Tie chains in move insns. Set terminate_dead flag.
> > * regrename.h (struct du_head): Add tied_chain, renamed and
> > terminated_dead members.
>
> Thanks - this looks a lot better already. You didn't say how it was
> bootstrapped and tested; please include this information for future
> submissions. For a patch like this, some data on the improvement you got
> would also be appreciated.
>
> I'd still like to investigate the possibility of further simplification:
>
> > +    {
> > +      /* Find the input chain.  */
> > +      for (i = c->id - 1; id_to_chain.iterate (i, &head); i--)
> > +        if (head->last && head->last->insn == insn
> > +            && head->terminated_dead)
> > +          {
> > +            gcc_assert (head->regno == REGNO (recog_data.operand[1]));
> > +            c->tied_chain = head;
> > +            head->tied_chain = c;
> > +
> > +            if (dump_file)
> > +              fprintf (dump_file, "Tying chain %s (%d) with %s (%d)\n",
> > +                       reg_names[c->regno], c->id,
> > +                       reg_names[head->regno], head->
Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers
On 08/11/15 00:26, charles.bay...@linaro.org wrote:
> From: Charles Baylis
>
> Charles Baylis
>
> * config/arm/neon.md (neon_vld1_lane): Remove error for invalid
> lane number.
> (neon_vst1_lane): Likewise.
> (neon_vld2_lane): Likewise.
> (neon_vst2_lane): Likewise.
> (neon_vld3_lane): Likewise.
> (neon_vst3_lane): Likewise.
> (neon_vld4_lane): Likewise.
> (neon_vst4_lane): Likewise.

The only way we can get here is through the intrinsics - we do a check for
lane numbers earlier. If things go horribly wrong - the assembler will
complain, so it's ok to elide this internal_error here, thus OK.

regards
Ramana

> Change-Id: Id7b4b6fa7320157e62e5bae574b4c4688d921774
> ---
> gcc/config/arm/neon.md | 48
> 1 file changed, 8 insertions(+), 40 deletions(-)
>
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index e8db020..6574e6e 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -4264,8 +4264,6 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    operands[3] = GEN_INT (lane);
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
>    if (max == 1)
>      return "vld1.\t%P0, %A1";
>    else
> @@ -4286,9 +4284,7 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    operands[3] = GEN_INT (lane);
>    int regno = REGNO (operands[0]);
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>      {
>        lane -= max / 2;
>        regno += 2;
> @@ -4372,8 +4368,6 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    operands[2] = GEN_INT (lane);
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
>    if (max == 1)
>      return "vst1.\t{%P1}, %A0";
>    else
> @@ -4393,9 +4387,7 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[1]);
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>      {
>        lane -= max / 2;
>        regno += 2;
> @@ -4464,8 +4456,6 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[0]);
>    rtx ops[4];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
>    ops[0] = gen_rtx_REG (DImode, regno);
>    ops[1] = gen_rtx_REG (DImode, regno + 2);
>    ops[2] = operands[1];
> @@ -4489,9 +4479,7 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[0]);
>    rtx ops[4];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>      {
>        lane -= max / 2;
>        regno += 2;
> @@ -4579,8 +4567,6 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[1]);
>    rtx ops[4];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
>    ops[0] = operands[0];
>    ops[1] = gen_rtx_REG (DImode, regno);
>    ops[2] = gen_rtx_REG (DImode, regno + 2);
> @@ -4604,9 +4590,7 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[1]);
>    rtx ops[4];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>      {
>        lane -= max / 2;
>        regno += 2;
> @@ -4723,8 +4707,6 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[0]);
>    rtx ops[5];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
>    ops[0] = gen_rtx_REG (DImode, regno);
>    ops[1] = gen_rtx_REG (DImode, regno + 2);
>    ops[2] = gen_rtx_REG (DImode, regno + 4);
> @@ -4750,9 +4732,7 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[0]);
>    rtx ops[5];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>      {
>        lane -= max / 2;
>        regno += 2;
> @@ -4895,8 +4875,6 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[1]);
>    rtx ops[5];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
>    ops[0] = operands[0];
>    ops[1] = gen_rtx_REG (DImode, regno);
>    ops[2] = gen_rtx_REG (DImode, regno + 2);
> @@ -4922,9 +4900,7 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[1]);
>    rtx ops[5];
> -  if (lane < 0 || lane >= max)
> -err
Re: OpenACC Firstprivate
On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:
> Index: gcc/gimplify.c
> ===
> --- gcc/gimplify.c	(revision 229892)
> +++ gcc/gimplify.c	(working copy)
> @@ -108,9 +108,15 @@ enum omp_region_type
>    /* Data region with offloading.  */
>    ORT_TARGET = 32,
>    ORT_COMBINED_TARGET = 33,
> +
> +  ORT_ACC = 0x40,  /* An OpenACC region.  */
> +  ORT_ACC_DATA = ORT_ACC | ORT_TARGET_DATA,  /* Data construct.  */
> +  ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,  /* Parallel construct */
> +  ORT_ACC_KERNELS = ORT_ACC | ORT_TARGET | 0x80,  /* Kernels construct.  */
> +
>    /* Dummy OpenMP region, used to disable expansion of
>       DECL_VALUE_EXPRs in taskloop pre body.  */
> -  ORT_NONE = 64
> +  ORT_NONE = 0x100
> };

If you want to switch to hexadecimal, you should change all values in the
enum to hexadecimal for consistency.

>
> /* Gimplify hashtable helper.  */
> @@ -377,6 +383,12 @@ new_omp_context (enum omp_region_type re
>    else
>      c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
>
> +  c->combined_loop = false;
> +  c->distribute = false;
> +  c->target_map_scalars_firstprivate = false;
> +  c->target_map_pointers_as_0len_arrays = false;
> +  c->target_firstprivatize_array_bases = false;

Why this?  c is XCNEW allocated, so zero initialized.

> @@ -5667,11 +5682,13 @@ omp_add_variable (struct gimplify_omp_ct
>    /* We shouldn't be re-adding the decl with the same data
>       sharing class.  */
>    gcc_assert ((n->value & GOVD_DATA_SHARE_CLASS & flags) == 0);
> -  /* The only combination of data sharing classes we should see is
> -     FIRSTPRIVATE and LASTPRIVATE.  */
>    nflags = n->value | flags;
> -  gcc_assert ((nflags & GOVD_DATA_SHARE_CLASS)
> -              == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE)
> +  /* The only combination of data sharing classes we should see is
> +     FIRSTPRIVATE and LASTPRIVATE.  However, OpenACC permits
> +     reduction variables to be used in data sharing clauses.  */
> +  gcc_assert ((ctx->region_type & ORT_ACC) != 0
> +              || ((nflags & GOVD_DATA_SHARE_CLASS)
> +                  == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE))
>                || (flags & GOVD_DATA_SHARE_CLASS) == 0);

Are you sure you want to give up on any kind of consistency checks for
OpenACC?  If only reduction is special on OpenACC, perhaps you could tweak
the assert for that instead?  Something that can be done incrementally of
course.

> +
> +      /* OpenMP doesn't look in outer contexts to find an
> +         enclosing data clause.  */

I'm puzzled by the comment.  OpenMP does look in outer context for clauses
that need that (pretty much all clauses but private), that is the do_outer:
recursion in omp_notice_variable.  Say for firstprivate in order to copy
(or copy construct) the private variable one needs the access to the outer
context's var etc.).  So perhaps it would help to document what you are
doing here for OpenACC and why.

> +      struct gimplify_omp_ctx *octx = ctx->outer_context;
> +      if ((ctx->region_type & ORT_ACC) && octx)
> +        {
> +          omp_notice_variable (octx, decl, in_code);
> +
> +          for (; octx; octx = octx->outer_context)
> +            {
> +              if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)))
> +                break;
> +              splay_tree_node n2
> +                = splay_tree_lookup (octx->variables,
> +                                     (splay_tree_key) decl);
> +              if (n2)
> +                {
> +                  nflags |= GOVD_MAP;
> +                  goto found_outer;
> +                }
> +            }
> +        }
> -  else if (nflags == flags)
> -    nflags |= GOVD_MAP;
> +

The main issue I have is with the omp-low.c changes.  I see:
"2.5.9 private clause
The private clause is allowed on the parallel construct; it declares that
a copy of each item on the list will be created for each parallel gang.
2.5.10 firstprivate clause
The firstprivate clause is allowed on the parallel construct; it declares
that a copy of each item on the list will be created for each parallel
gang, and that the copy will be initialized with the value of that item on
the host when the parallel construct is encountered."
but looking at what you actually emit looks like a standard present_copyin
clause I think, with a private variable defined in the region where the
value of the present_copyin mapped variable is assigned to the private
one.  This I'm afraid often performs two copies rather than just one (one
to copy the host value to the present_copyin mapped value, another one in
the region), but more importantly, if the var is already mapped, you could
initialize the private var with old data.  Say

  int arr[64];
  // initialize arr
  #pragma acc data copyin (arr)
  {
    // modify arr on the host
  # pragma acc parallel firstprivate (arr)
Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
On 09/11/15 13:31, Christophe Lyon wrote:
> On 30 October 2015 at 16:52, Matthew Wahab wrote:
>> On 30/10/15 12:51, Christophe Lyon wrote:
>>> On 23 October 2015 at 14:26, Matthew Wahab wrote:
>>>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>>>> sqrdmlah and sqrdmlsh.  This patch adds the NEON intrinsics vqrdmlah
>>>> and vqrdmlsh for these instructions.  The new intrinsics are of the
>>>> form vqrdml{as}h[q]_.
>>>>
>>>> Tested the series for aarch64-none-linux-gnu with native bootstrap and
>>>> make check on an ARMv8 architecture.  Also tested aarch64-none-elf
>>>> with cross-compiled check-gcc on an ARMv8.1 emulator.
>>>
>>> Is there a publicly available simulator for v8.1?  QEMU or Foundation
>>> Model?
>>
>> Sorry, I don't know.
>> Matthew
>
> So, what will happen to the testsuite once this is committed?  Are we
> going to see FAILs when using QEMU?

No, the check at the top of the test files

+/* { dg-require-effective-target arm_v8_1a_neon_hw } */

should make this test UNSUPPORTED if the HW/simulator can't execute it.
(Support for this check is added in patch #5 in this series.)

Note that the aarch64-none-linux make check was run on ARMv8 HW, which
can't execute the test and correctly reported it as unsupported.

Matthew
Re: [vec-cmp, patch 2/6] Vectorization factor computation
2015-10-20 16:45 GMT+03:00 Richard Biener :
> On Wed, Oct 14, 2015 at 1:21 PM, Ilya Enkovich wrote:
>> 2015-10-13 16:37 GMT+03:00 Richard Biener :
>>> On Thu, Oct 8, 2015 at 4:59 PM, Ilya Enkovich wrote:
>>>> Hi,
>>>>
>>>> This patch handles statements with boolean result in vectorization
>>>> factor computation.  For comparison its operands type is used instead
>>>> of result type to compute VF.  Other boolean statements are ignored
>>>> for VF.
>>>>
>>>> Vectype for comparison is computed using type of compared values.
>>>> Computed type is propagated into other boolean operations.
>>>
>>> This feels rather ad-hoc, mixing up the existing way of computing
>>> vector type and VF.  I'd rather have turned the whole vector type
>>> computation around to the scheme working on the operands rather than
>>> on the lhs and then searching for smaller/larger types on the rhs'.
>>>
>>> I know this is a tricky function (heh, but you make it even worse...).
>>> And it needs a helper with knowledge about operations so one can
>>> compute the result vector type for an operation on its operands.  The
>>> seeds should be PHIs (handled like now) and loads, and yes, externals
>>> need special handling.
>>>
>>> Ideally we'd do things in two stages, first compute vector types in a
>>> less constrained manner (not forcing a single vector size) and then in
>>> a 2nd run promote to a common size also computing the VF to do that.
>>
>> This sounds like a refactoring, not a functional change, right?  Also I
>> don't see a reason to analyze DF to compute vectypes if we promote it
>> to a single vector size anyway.  For booleans we have to do it because
>> boolean vectors of the same size may have different number of elements.
>> What is the reason to do it for other types?
>
> For conversions and operators which support different sized operands

That's what we handle in vector patterns and use some helper functions to
determine vectypes there.  Looks like this refactoring would affect
patterns significantly.  Probably compute vectypes before searching for
patterns?

>
>> Shouldn't it be a patch independent from comparison vectorization
>> series?
>
> As you like.

I'd like to move on with vector comparison and consider VF computation
refactoring when it's stabilized.  This patch is the last one (except
target ones) not approved in all vector comparison related series.  Would
it be OK to go on with it in a current shape?

Thanks,
Ilya
Re: OpenACC Firstprivate
On 11/09/15 08:46, Jakub Jelinek wrote:
> On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:
>> Index: gcc/gimplify.c
>> ===
>
> If you want to switch to hexadecimal, you should change all values in
> the enum to hexadecimal for consistency.

ok.

>> /* Gimplify hashtable helper.  */
>> @@ -377,6 +383,12 @@ new_omp_context (enum omp_region_type re
>>    else
>>      c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
>>
>> +  c->combined_loop = false;
>> +  c->distribute = false;
>> +  c->target_map_scalars_firstprivate = false;
>> +  c->target_map_pointers_as_0len_arrays = false;
>> +  c->target_firstprivatize_array_bases = false;
>
> Why this?  c is XCNEW allocated, so zero initialized.

I presumed it necessary, as it was on the branch.  will remove.

>> @@ -5667,11 +5682,13 @@ omp_add_variable (struct gimplify_omp_ct
>>    /* We shouldn't be re-adding the decl with the same data
>>       sharing class.  */
>>    gcc_assert ((n->value & GOVD_DATA_SHARE_CLASS & flags) == 0);
>> -  /* The only combination of data sharing classes we should see is
>> -     FIRSTPRIVATE and LASTPRIVATE.  */
>>    nflags = n->value | flags;
>> +  /* The only combination of data sharing classes we should see is
>> +     FIRSTPRIVATE and LASTPRIVATE.  However, OpenACC permits
>> +     reduction variables to be used in data sharing clauses.  */
>> +  gcc_assert ((ctx->region_type & ORT_ACC) != 0
>> +              || ((nflags & GOVD_DATA_SHARE_CLASS)
>> +                  == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE))
>>                || (flags & GOVD_DATA_SHARE_CLASS) == 0);
>
> Are you sure you want to give up on any kind of consistency checks for
> OpenACC?  If only reduction is special on OpenACC, perhaps you could
> tweak the assert for that instead?  Something that can be done
> incrementally of course.

Will investigate (later)

>> +
>> +      /* OpenMP doesn't look in outer contexts to find an
>> +         enclosing data clause.  */
>
> I'm puzzled by the comment.  OpenMP does look in outer context for
> clauses that need that (pretty much all clauses but private), that is
> the do_outer: recursion in omp_notice_variable.  Say for firstprivate in
> order to copy (or copy construct) the private variable one needs the
> access to the outer context's var etc.).  So perhaps it would help to
> document what you are doing here for OpenACC and why.

Ok.  It seemed (and it may become clearer with default handling added)
that OpenACC and OpenMP scanned scopes in opposite orders.  I remember
trying to get the ACC code to scan in the same order, but came up blank.
Anyway, you're right, it should say what OpenACC is trying.

> The main issue I have is with the omp-low.c changes.  I see:
> "2.5.9 private clause
> The private clause is allowed on the parallel construct; it declares
> that a copy of each item on the list will be created for each parallel
> gang.
> 2.5.10 firstprivate clause
> The firstprivate clause is allowed on the parallel construct; it
> declares that a copy of each item on the list will be created for each
> parallel gang, and that the copy will be initialized with the value of
> that item on the host when the parallel construct is encountered."
> but looking at what you actually emit looks like a standard
> present_copyin clause I think, with a private variable defined in the
> region where the value of the present_copyin mapped variable is assigned
> to the private one.  This I'm afraid often performs two copies rather
> than just one (one to copy the host value to the present_copyin mapped
> value, another one in the region),

I don't think that can be avoided.  The host doesn't have control over
when the CTAs (a gang) start -- they may even be serialized onto the same
physical HW.  So each gang has to initialize its own instance.  Or did you
mean something else?

> but more importantly, if the var is already mapped, you could initialize
> the private var with old data.  Say
>
>   int arr[64];
>   // initialize arr
>   #pragma acc data copyin (arr)
>   {
>     // modify arr on the host
>   # pragma acc parallel firstprivate (arr)
>   {
>   ...
>   }
>   }

Hm, I suspect that is either ill formed or the std does not contemplate
it.

> Is that really what you want?  If not, any reason not to implement
> GOMP_MAP_FIRSTPRIVATE and GOMP_MAP_FIRSTPRIVATE_INT on the libgomp
> oacc-* side and just use the OpenMP firstprivate handling in omp-low.c?

I would have to investigate ...

nathan
Re: [PATCH] Use signed boolean type for boolean vectors
On 03 Nov 14:42, Richard Biener wrote:
> On Wed, Oct 28, 2015 at 4:30 PM, Ilya Enkovich wrote:
> > 2015-10-28 18:21 GMT+03:00 Richard Biener :
> >> On Wed, Oct 28, 2015 at 2:13 PM, Ilya Enkovich
> >> wrote:
> >>> Hi,
> >>>
> >>> Testing boolean vector conversions I found several runtime regressions
> >>> and investigation showed it's due to incorrect conversion caused by
> >>> unsigned boolean type.  When boolean vector is represented as an
> >>> integer vector on target it's a signed integer actually.  Unsigned
> >>> boolean type was chosen due to possible single bit values, but for
> >>> multiple bit values it causes wrong casting.  The easiest way to fix
> >>> it is to use signed boolean value.  The following patch does this and
> >>> fixes my problems with conversion.  Bootstrapped and tested on
> >>> x86_64-unknown-linux-gnu.  Is it OK?
> >>
> >> Hmm.  Actually formally the "boolean" vectors were always 0 or -1
> >> (all bits set).  That is also true for a signed boolean with precision 1
> >> but with higher precision what makes sure to sign-extend 'true'?
> >>
> >> So it's far from an obvious change, esp as you don't change the
> >> precision == 1 case.  [I still think we should have precision == 1
> >> for all boolean types]
> >>
> >> Richard.
> >>
> >
> > For 1 bit precision signed type value 1 is out of range, right?  This
> > might break in many places due to using 1 as the true value.
>
> For vectors -1 is true.  Did you try whether it breaks many places?
> build_int_cst (type, 1) should still work fine.
>
> Richard.
>

I tried it and didn't find any new failures.  So it looks like I was wrong
in assuming it should cause many failures.  Testing is not complete
because many SPEC benchmarks are failing to compile at -O3 for AVX-512 on
trunk.  But I think we may proceed with signed type and fix constant
generation issues if any are revealed.  This patch was bootstrapped and
regtested on x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2015-11-09  Ilya Enkovich

	* optabs.c (expand_vec_cond_expr): Always get sign from type.
	* tree.c (wide_int_to_tree): Support negative values for boolean.
	(build_nonstandard_boolean_type): Use signed type for booleans.


diff --git a/gcc/optabs.c b/gcc/optabs.c
index fdcdc6a..44971ad 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5365,7 +5365,6 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
       op0a = TREE_OPERAND (op0, 0);
       op0b = TREE_OPERAND (op0, 1);
       tcode = TREE_CODE (op0);
-      unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
     }
   else
     {
@@ -5374,9 +5373,9 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
       op0a = op0;
       op0b = build_zero_cst (TREE_TYPE (op0));
       tcode = LT_EXPR;
-      unsignedp = false;
     }
   cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));


   gcc_assert (GET_MODE_SIZE (mode) == GET_MODE_SIZE (cmp_op_mode)
diff --git a/gcc/tree.c b/gcc/tree.c
index 18d6544..6fb4c09 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -1437,7 +1437,7 @@ wide_int_to_tree (tree type, const wide_int_ref &pcst)
     case BOOLEAN_TYPE:
       /* Cache false or true.  */
       limit = 2;
-      if (hwi < 2)
+      if (IN_RANGE (hwi, 0, 1))
	ix = hwi;
       break;

@@ -8069,7 +8069,7 @@ build_nonstandard_boolean_type (unsigned HOST_WIDE_INT precision)

   type = make_node (BOOLEAN_TYPE);
   TYPE_PRECISION (type) = precision;
-  fixup_unsigned_type (type);
+  fixup_signed_type (type);

   if (precision <= MAX_INT_CACHED_PREC)
     nonstandard_boolean_type_cache[precision] = type;
Re: OpenACC Firstprivate
On 11/09/15 08:59, Nathan Sidwell wrote:
> On 11/09/15 08:46, Jakub Jelinek wrote:
>> On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:
>> Say
>>
>>   int arr[64];
>>   // initialize arr
>>   #pragma acc data copyin (arr)
>>   {
>>     // modify arr on the host
>>   # pragma acc parallel firstprivate (arr)
>>   {
>>   ...
>>   }
>>   }
>
> Hm, I suspect that is either ill formed or the std does not contemplate
> it.

just realized, there are two ways to consider the above.

1) it's ill formed.  Once you've transferred data to the device, modifying
it on the host is unspecified.  I'm having trouble finding words in the
std that actually say that though :(

2) on a system with shared physical global memory, the host modification
would be visible on the device (possibly at an arbitrary point due to lack
of a synchronization primitive?)

I don't think this changes the 'why not use OpenMP's ...' question,
because IIUC you think that can be made to DTRT anyway?

nathan
Re: RFC: C++ delayed folding merge
On 11/09/2015 04:24 AM, Eric Botcazou wrote:
>> One question: The branch changes 'convert' to not fold its result, and
>> it's not clear to me whether that's part of the expected behavior of a
>> front end 'convert' function or not.
>
> I don't think that you should change the behavior for front-ends that
> have an internal representation distinct from the GENERIC trees and thus
> do a global translation to GENERIC at the end; e.g. in the Ada compiler
> we'd rather have *more* folding than less during this translation.

Right, the change is just to the C++ front end 'convert'.

Jason
Re: OpenACC Firstprivate
On Mon, Nov 09, 2015 at 08:59:15AM -0500, Nathan Sidwell wrote:
> > This I'm afraid performs often two copies rather than just one (one to
> > copy the host value to the present_copyin mapped value, another one in
> > the region),
>
> I don't think that can be avoided.  The host doesn't have control over
> when the CTAs (a gang) start -- they may even be serialized onto the
> same physical HW.  So each gang has to initialize its own instance.  Or
> did you mean something else?

So, what is the scope of the private and firstprivate vars in OpenACC?
In OpenMP, if a variable is private or firstprivate on the target
construct, unless further privatized in inner constructs, it is really
shared among all the threads in all the teams (or one var per all
CTAs/workers in PTX terms).  Is that the case for OpenACC too, or are the
vars e.g. private to each CTA already, or to each thread in each CTA, or
something different?  If they are shared by all CTAs, then you should
hopefully be able to use GOMP_MAP_FIRSTPRIVATE{,_INT}; if not, then I'd
say you should at least use those to provide you the initializer data to
initialize your private vars from, as a cheaper alternative to mapping.

	Jakub
Re: [Patch AArch64] Switch constant pools to separate rodata sections.
On 08/11/15 11:42, Andreas Schwab wrote:
> This is causing a bootstrap comparison failure in gcc/go/gogo.o.

I'm looking into this - this is now PR68256.

regards
Ramana

>
> Andreas.
>
Re: [PATCH] Fix memory leaks and use a pool_allocator
On Mon, Nov 9, 2015 at 2:26 PM, Martin Liška wrote:
> On 11/09/2015 01:11 PM, Richard Biener wrote:
>> On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška wrote:
>>> Hi.
>>>
>>> This is follow-up of changes that Richi started on Friday.
>>>
>>> Patch can bootstrap on x86_64-linux-pc and regression tests are
>>> running.
>>>
>>> Ready for trunk?
>>
>> * tree-ssa-dom.c (free_edge_info): Make the function extern.
>> ...
>> * tree-ssa.h (free_edge_info): Declare function extern.
>>
>> declare this in tree-ssa-threadupdate.h instead and renaming it to
>> sth less "public", like free_dom_edge_info.
>>
>> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
>> index fff62de..eb6b7df 100644
>> --- a/gcc/ifcvt.c
>> +++ b/gcc/ifcvt.c
>> @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>>        set_used_flags (targets[i]);
>>      }
>>
>> +  temporaries.release ();
>> +
>>    set_used_flags (cond);
>>    set_used_flags (x);
>>    set_used_flags (y);
>> @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>>      }
>>
>>    num_updated_if_blocks++;
>> +  targets.release ();
>>    return TRUE;
>>
>> suspiciously look like candidates for an auto_vec<> (didn't check).
>>
>> @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
>>    dead_set = sparseset_alloc (max_regno);
>>    unused_set = sparseset_alloc (max_regno);
>>    curr_point = 0;
>> +  point_freq_vec.release ();
>>    point_freq_vec.create (get_max_uid () * 2);
>>
>> a truncate (0) instead of a release () should be cheaper, avoiding the
>> re-allocation.
>>
>> @@ -674,6 +674,10 @@ sra_deinitialize (void)
>>    assign_link_pool.release ();
>>    obstack_free (&name_obstack, NULL);
>>
>> +  for (hash_map >::iterator it =
>> +       base_access_vec->begin (); it != base_access_vec->end (); ++it)
>> +    (*it).second.release ();
>> +
>>    delete base_access_vec;
>>
>> I wonder if the better fix is to provide a proper free method for the
>> hash_map?
>> A hash_map with 'auto_vec' looks suspicious - eventually a proper
>> release was intended here via default_hash_map_traits <>?
>>
>> Anyway, most of the things above can be improved as followup of course.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Martin
>
> Hi.
>
> All suggested changes were applied, sending v2 and waiting for bootstrap
> and regression tests.

Ok.

Thanks,
Richard.

> Thanks,
> Martin
Re: [PATCH] 02/N Fix memory leaks in IPA
On Mon, Nov 9, 2015 at 2:29 PM, Martin Liška wrote:
> Hi.
>
> Following changes were consulted with Martin Jambor to properly release
> memory in IPA.  It fixes leaks which popped up in tramp3d with -O2.
>
> Bootstrap and regression tests have been running.
>
> Ready after it finishes?

Ok.

Richard.

> Thanks,
> Martin
Re: [PATCH] Use signed boolean type for boolean vectors
On Mon, Nov 9, 2015 at 3:03 PM, Ilya Enkovich wrote:
> On 03 Nov 14:42, Richard Biener wrote:
>> On Wed, Oct 28, 2015 at 4:30 PM, Ilya Enkovich wrote:
>> > 2015-10-28 18:21 GMT+03:00 Richard Biener :
>> >> On Wed, Oct 28, 2015 at 2:13 PM, Ilya Enkovich
>> >> wrote:
>> >>> Hi,
>> >>>
>> >>> Testing boolean vector conversions I found several runtime regressions
>> >>> and investigation showed it's due to incorrect conversion caused by
>> >>> unsigned boolean type.  When boolean vector is represented as an
>> >>> integer vector on target it's a signed integer actually.  Unsigned
>> >>> boolean type was chosen due to possible single bit values, but for
>> >>> multiple bit values it causes wrong casting.  The easiest way to fix
>> >>> it is to use signed boolean value.  The following patch does this and
>> >>> fixes my problems with conversion.  Bootstrapped and tested on
>> >>> x86_64-unknown-linux-gnu.  Is it OK?
>> >>
>> >> Hmm.  Actually formally the "boolean" vectors were always 0 or -1
>> >> (all bits set).  That is also true for a signed boolean with precision 1
>> >> but with higher precision what makes sure to sign-extend 'true'?
>> >>
>> >> So it's far from an obvious change, esp as you don't change the
>> >> precision == 1 case.  [I still think we should have precision == 1
>> >> for all boolean types]
>> >>
>> >> Richard.
>> >>
>> >
>> > For 1 bit precision signed type value 1 is out of range, right?  This
>> > might break in many places due to using 1 as the true value.
>>
>> For vectors -1 is true.  Did you try whether it breaks many places?
>> build_int_cst (type, 1) should still work fine.
>>
>> Richard.
>>
>
> I tried it and didn't find any new failures.  So it looks like I was
> wrong in assuming it should cause many failures.  Testing is not
> complete because many SPEC benchmarks are failing to compile at -O3 for
> AVX-512 on trunk.  But I think we may proceed with signed type and fix
> constant generation issues if any are revealed.  This patch was
> bootstrapped and regtested on x86_64-unknown-linux-gnu.  OK for trunk?

Ok.

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-09  Ilya Enkovich
>
>         * optabs.c (expand_vec_cond_expr): Always get sign from type.
>         * tree.c (wide_int_to_tree): Support negative values for boolean.
>         (build_nonstandard_boolean_type): Use signed type for booleans.
>
>
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index fdcdc6a..44971ad 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -5365,7 +5365,6 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
>        op0a = TREE_OPERAND (op0, 0);
>        op0b = TREE_OPERAND (op0, 1);
>        tcode = TREE_CODE (op0);
> -      unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
>      }
>    else
>      {
> @@ -5374,9 +5373,9 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
>        op0a = op0;
>        op0b = build_zero_cst (TREE_TYPE (op0));
>        tcode = LT_EXPR;
> -      unsignedp = false;
>      }
>    cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
> +  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
>
>
>    gcc_assert (GET_MODE_SIZE (mode) == GET_MODE_SIZE (cmp_op_mode)
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 18d6544..6fb4c09 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -1437,7 +1437,7 @@ wide_int_to_tree (tree type, const wide_int_ref &pcst)
>      case BOOLEAN_TYPE:
>        /* Cache false or true.  */
>        limit = 2;
> -      if (hwi < 2)
> +      if (IN_RANGE (hwi, 0, 1))
>          ix = hwi;
>        break;
>
> @@ -8069,7 +8069,7 @@ build_nonstandard_boolean_type (unsigned HOST_WIDE_INT precision)
>
>    type = make_node (BOOLEAN_TYPE);
>    TYPE_PRECISION (type) = precision;
> -  fixup_unsigned_type (type);
> +  fixup_signed_type (type);
>
>    if (precision <= MAX_INT_CACHED_PREC)
>      nonstandard_boolean_type_cache[precision] = type;
Re: [vec-cmp, patch 2/6] Vectorization factor computation
On Mon, Nov 9, 2015 at 2:54 PM, Ilya Enkovich wrote: > 2015-10-20 16:45 GMT+03:00 Richard Biener : >> On Wed, Oct 14, 2015 at 1:21 PM, Ilya Enkovich >> wrote: >>> 2015-10-13 16:37 GMT+03:00 Richard Biener : On Thu, Oct 8, 2015 at 4:59 PM, Ilya Enkovich wrote: > Hi, > > This patch handles statements with boolean result in vectorization factor > computation. For comparisons, the operands' type is used instead of the result > type to compute VF. Other boolean statements are ignored for VF. > > Vectype for comparison is computed using type of compared values. > Computed type is propagated into other boolean operations. This feels rather ad-hoc, mixing up the existing way of computing vector type and VF. I'd rather have turned the whole vector type computation around to the scheme working on the operands rather than on the lhs and then searching for smaller/larger types on the rhs'. I know this is a tricky function (heh, but you make it even worse...). And it needs a helper with knowledge about operations so one can compute the result vector type for an operation on its operands. The seeds should be PHIs (handled like now) and loads, and yes, externals need special handling. Ideally we'd do things in two stages, first compute vector types in a less constrained manner (not forcing a single vector size) and then in a 2nd run promote to a common size also computing the VF to do that. >>> >>> This sounds like a refactoring, not a functional change, right? Also I >>> don't see a reason to analyze DF to compute vectypes if we promote it >>> to a single vector size anyway. For booleans we have to do it because >>> boolean vectors of the same size may have different number of >>> elements. What is the reason to do it for other types? >> >> For conversions and operators which support different sized operands > > That's what we handle in vector patterns and use some helper functions > to determine vectypes there. Looks like this refactoring would affect > patterns significantly. 
Probably compute vectypes before searching for > patterns? > >> >>> Shouldn't it be a patch independent from comparison vectorization series? >> >> As you like. > > I'd like to move on with vector comparison and consider VF computation > refactoring when it's stabilized. This patch is the last one (except > target ones) not approved in all vector comparison related series. > Would it be OK to go on with it in a current shape? Yes. Thanks, Richard. > Thanks, > Ilya
Re: [vec-cmp, patch 3/6] Vectorize comparison
On Mon, Nov 9, 2015 at 1:07 PM, Ilya Enkovich wrote: > On 26 Oct 16:09, Richard Biener wrote: >> On Wed, Oct 14, 2015 at 6:12 PM, Ilya Enkovich >> wrote: >> > + >> > + ops.release (); >> > + vec_defs.release (); >> >> No need to release auto_vec<>s at the end of scope explicitely. > > Fixed > >> >> > + vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2); >> > + new_stmt = gimple_build_assign (mask, vec_compare); >> > + new_temp = make_ssa_name (mask, new_stmt); >> > + gimple_assign_set_lhs (new_stmt, new_temp); >> >> new_temp = make_ssa_name (mask); >> gimple_build_assign (new_temp, code, vec_rhs1, vec_rhs2); >> >> for the 4 stmts above. > > Fixed > >> >> > + >> > + vec_oprnds0.release (); >> > + vec_oprnds1.release (); >> >> Please use auto_vec<>s. > > These are used to hold vecs returned by vect_get_slp_defs. Thus can't > use auto_vec. Ok. Richard. >> >> Ok with those changes. >> >> RIchard. >> > > > gcc/ > > 2015-11-09 Ilya Enkovich > > * tree-vect-data-refs.c (vect_get_new_vect_var): Support > vect_mask_var. > (vect_create_destination_var): Likewise. > * tree-vect-stmts.c (vectorizable_comparison): New. > (vect_analyze_stmt): Add vectorizable_comparison. > (vect_transform_stmt): Likewise. > * tree-vectorizer.h (enum vect_var_kind): Add vect_mask_var. > (enum stmt_vec_info_type): Add comparison_vec_info_type. > (vectorizable_comparison): New. > > > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c > index 11bce79..926752b 100644 > --- a/gcc/tree-vect-data-refs.c > +++ b/gcc/tree-vect-data-refs.c > @@ -3790,6 +3790,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind > var_kind, const char *name) >case vect_scalar_var: > prefix = "stmp"; > break; > + case vect_mask_var: > +prefix = "mask"; > +break; >case vect_pointer_var: > prefix = "vectp"; > break; > @@ -4379,7 +4382,11 @@ vect_create_destination_var (tree scalar_dest, tree > vectype) >tree type; >enum vect_var_kind kind; > > - kind = vectype ? 
vect_simple_var : vect_scalar_var; > + kind = vectype > +? VECTOR_BOOLEAN_TYPE_P (vectype) > +? vect_mask_var > +: vect_simple_var > +: vect_scalar_var; >type = vectype ? vectype : TREE_TYPE (scalar_dest); > >gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME); > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > index f1216c8..ee549f4 100644 > --- a/gcc/tree-vect-stmts.c > +++ b/gcc/tree-vect-stmts.c > @@ -7416,6 +7416,185 @@ vectorizable_condition (gimple *stmt, > gimple_stmt_iterator *gsi, >return true; > } > > +/* vectorizable_comparison. > + > + Check if STMT is comparison expression that can be vectorized. > + If VEC_STMT is also passed, vectorize the STMT: create a vectorized > + comparison, put it in VEC_STMT, and insert it at GSI. > + > + Return FALSE if not a vectorizable STMT, TRUE otherwise. */ > + > +bool > +vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi, > +gimple **vec_stmt, tree reduc_def, > +slp_tree slp_node) > +{ > + tree lhs, rhs1, rhs2; > + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > + tree vectype1 = NULL_TREE, vectype2 = NULL_TREE; > + tree vectype = STMT_VINFO_VECTYPE (stmt_info); > + tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE; > + tree new_temp; > + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); > + enum vect_def_type dts[2] = {vect_unknown_def_type, vect_unknown_def_type}; > + unsigned nunits; > + int ncopies; > + enum tree_code code; > + stmt_vec_info prev_stmt_info = NULL; > + int i, j; > + bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info); > + vec vec_oprnds0 = vNULL; > + vec vec_oprnds1 = vNULL; > + gimple *def_stmt; > + tree mask_type; > + tree mask; > + > + if (!VECTOR_BOOLEAN_TYPE_P (vectype)) > +return false; > + > + mask_type = vectype; > + nunits = TYPE_VECTOR_SUBPARTS (vectype); > + > + if (slp_node || PURE_SLP_STMT (stmt_info)) > +ncopies = 1; > + else > +ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; > + > + gcc_assert (ncopies >= 1); > + if 
(!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > +return false; > + > + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def > + && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle > + && reduc_def)) > +return false; > + > + if (STMT_VINFO_LIVE_P (stmt_info)) > +{ > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > +"value used after loop.\n"); > + return false; > +} > + > + if (!is_gimple_assign (stmt)) > +return false; > + > + code = gimple_assign_rhs_code (stmt); > + > + if (TREE_CODE_CLASS (code) != tcc_comparison) > +return false; > + > + rhs1 = gimple_assign_rhs1 (stmt); > + rhs2 = gimple_assign_rhs
Re: [vec-cmp, patch 4/6] Support vector mask invariants
On Mon, Nov 9, 2015 at 1:11 PM, Ilya Enkovich wrote: > On 26 Oct 16:21, Richard Biener wrote: >> On Wed, Oct 14, 2015 at 6:13 PM, Ilya Enkovich >> wrote: >> > - val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val); >> > + { >> > + /* Can't use VIEW_CONVERT_EXPR for booleans because >> > +of possibly different sizes of scalar value and >> > +vector element. */ >> > + if (VECTOR_BOOLEAN_TYPE_P (type)) >> > + { >> > + if (integer_zerop (val)) >> > + val = build_int_cst (TREE_TYPE (type), 0); >> > + else if (integer_onep (val)) >> > + val = build_int_cst (TREE_TYPE (type), 1); >> > + else >> > + gcc_unreachable (); >> > + } >> > + else >> > + val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), >> > val); >> >> I think the existing code is fine with using fold_convert () here >> which should also work >> for the boolean types. So does just >> >> val = fold_convert (TREE_TYPE (type), val); >> >> work? > > It seems to work OK. > >> >> > @@ -7428,13 +7459,13 @@ vectorizable_condition (gimple *stmt, >> > gimple_stmt_iterator *gsi, >> > gimple *gtemp; >> > vec_cond_lhs = >> > vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 0), >> > - stmt, NULL); >> > + stmt, NULL, comp_vectype); >> > vect_is_simple_use (TREE_OPERAND (cond_expr, 0), stmt, >> > loop_vinfo, >emp, &def, &dts[0]); >> > >> > vec_cond_rhs = >> > vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 1), >> > - stmt, NULL); >> > + stmt, NULL, comp_vectype); >> > vect_is_simple_use (TREE_OPERAND (cond_expr, 1), stmt, >> > loop_vinfo, >emp, &def, &dts[1]); >> >> I still don't like this very much but I guess without some major >> refactoring of all >> the functions there isn't a better way to do it for now. >> >> Thus, ok with trying the change suggested above. >> >> Thanks, >> Richard. >> > > Here is an updated version. Ok. Thanks, Richard. > Thanks, > Ilya > -- > gcc/ > > 2015-11-09 Ilya Enkovich > > * expr.c (const_vector_mask_from_tree): New. 
> (const_vector_from_tree): Use const_vector_mask_from_tree > for boolean vectors. > * tree-vect-stmts.c (vect_init_vector): Support boolean vector > invariants. > (vect_get_vec_def_for_operand): Add VECTYPE arg. > (vectorizable_condition): Directly provide vectype for invariants > used in comparison. > * tree-vectorizer.h (vect_get_vec_def_for_operand): Add VECTYPE > arg. > > > diff --git a/gcc/expr.c b/gcc/expr.c > index 2b2174f..03936ee 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -11423,6 +11423,40 @@ try_tablejump (tree index_type, tree index_expr, > tree minval, tree range, >return 1; > } > > +/* Return a CONST_VECTOR rtx representing vector mask for > + a VECTOR_CST of booleans. */ > +static rtx > +const_vector_mask_from_tree (tree exp) > +{ > + rtvec v; > + unsigned i; > + int units; > + tree elt; > + machine_mode inner, mode; > + > + mode = TYPE_MODE (TREE_TYPE (exp)); > + units = GET_MODE_NUNITS (mode); > + inner = GET_MODE_INNER (mode); > + > + v = rtvec_alloc (units); > + > + for (i = 0; i < VECTOR_CST_NELTS (exp); ++i) > +{ > + elt = VECTOR_CST_ELT (exp, i); > + > + gcc_assert (TREE_CODE (elt) == INTEGER_CST); > + if (integer_zerop (elt)) > + RTVEC_ELT (v, i) = CONST0_RTX (inner); > + else if (integer_onep (elt) > + || integer_minus_onep (elt)) > + RTVEC_ELT (v, i) = CONSTM1_RTX (inner); > + else > + gcc_unreachable (); > +} > + > + return gen_rtx_CONST_VECTOR (mode, v); > +} > + > /* Return a CONST_VECTOR rtx for a VECTOR_CST tree. 
*/ > static rtx > const_vector_from_tree (tree exp) > @@ -11438,6 +11472,9 @@ const_vector_from_tree (tree exp) >if (initializer_zerop (exp)) > return CONST0_RTX (mode); > > + if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp))) > +return const_vector_mask_from_tree (exp); > + >units = GET_MODE_NUNITS (mode); >inner = GET_MODE_INNER (mode); > > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > index ee549f4..af203ab 100644 > --- a/gcc/tree-vect-stmts.c > +++ b/gcc/tree-vect-stmts.c > @@ -1300,7 +1300,7 @@ vect_init_vector (gimple *stmt, tree val, tree type, > gimple_stmt_iterator *gsi) >if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val))) > { > if (CONSTANT_CLASS_P (val)) > - val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type
Re: Extend tree-call-cdce to calls whose result is used
Hi, On Sat, 7 Nov 2015, Richard Sandiford wrote: > For -fmath-errno, builtins.c currently expands calls to sqrt to: > > y = sqrt_optab (x); > if (y != y) > [ sqrt (x); or errno = EDOM; ] > > - the call to sqrt is protected by the result of the optab rather > than the input. It would be better to check !(x >= 0), like > tree-call-cdce.c does. It depends. With fast-math (and hence without NaNs) you can trivially optimize away a (y != y) test. You can't do so with !(x>=0) at all. > - the branch isn't exposed at the gimple level and so gets little > high-level optimisation. > > - we do this for log too, but for log a zero input produces > -inf rather than a NaN, and sets errno to ERANGE rather than EDOM. > > This patch moves the code to tree-call-cdce.c instead, This somehow feels wrong. Dead-code elimination doesn't have anything to do with the transformation you want; it rather is rewriting all feasible calls into something else, like fold_builtins does. Also cdce currently doesn't seem to do any checks on the fast-math flags, so I wonder if some of the conditions that you now also insert for calls whose results are used stay until final code. > Previously the pass was only enabled by default at -O2 or above, but the > old builtins.c code was enabled at -O. The patch therefore enables the > pass at -O as well. The pass is somewhat expensive in that it removes dominator info and schedules a full ssa update. The transformation is trivial enough that dominators and SSA form can be updated on the fly; I think without that it's not feasible for -O. But as said I think this transformation should better be moved into builtin folding (or other call folding), at which point also the fast-math flags can be checked. The infrastructure routines of tree-call-cdce can be used there of course. If so moved the cdce pass would be subsumed by that btw. (because the dead call result will be trivially exposed), and that would be a good thing. Ciao, Michael.
[PATCH] Fix PR68248
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-11-09 Richard Biener PR tree-optimization/68248 * tree-vect-generic.c (expand_vector_operations_1): Handle scalar rhs2. * gcc.dg/torture/pr68248.c: New testcase. Index: gcc/tree-vect-generic.c === *** gcc/tree-vect-generic.c (revision 230003) --- gcc/tree-vect-generic.c (working copy) *** expand_vector_operations_1 (gimple_stmt_ *** 1527,1532 --- 1528,1535 tree srhs1, srhs2 = NULL_TREE; if ((srhs1 = ssa_uniform_vector_p (rhs1)) != NULL_TREE && (rhs2 == NULL_TREE + || (! VECTOR_TYPE_P (TREE_TYPE (rhs2)) + && (srhs2 = rhs2)) || (srhs2 = ssa_uniform_vector_p (rhs2)) != NULL_TREE) /* As we query direct optabs restrict to non-convert operations. */ && TYPE_MODE (TREE_TYPE (type)) == TYPE_MODE (TREE_TYPE (srhs1))) Index: gcc/testsuite/gcc.dg/torture/pr68248.c === *** gcc/testsuite/gcc.dg/torture/pr68248.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr68248.c (working copy) *** *** 0 --- 1,20 + /* { dg-do compile } */ + + int a, b, c, d; + + int + fn1 (int p1) + { + return a > 0 ? p1 : p1 >> a; + } + + void + fn2 () + { + char e; + for (; c; c++) + { + e = fn1 (!d ^ 2); + b ^= e; + } + }
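In GNU C terms, the case the fix covers can be sketched as follows (illustrative code, not the internal representation): the second operand (rhs2) of the operation is a scalar rather than a vector, the situation expand_vector_operations_1 now handles when lowering.

```c
/* Sketch: a vector shifted by a scalar count -- rhs2 here is a scalar.
   Valid GNU C; the vector type name is illustrative.  */
typedef int v4si __attribute__ ((vector_size (16)));

v4si
shift_by_scalar (v4si v, int amount)
{
  return v >> amount;  /* the scalar count applies to every lane */
}
```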
Re: OpenACC Firstprivate
On 11/09/15 09:10, Jakub Jelinek wrote: On Mon, Nov 09, 2015 at 08:59:15AM -0500, Nathan Sidwell wrote: This I'm afraid performs often two copies rather than just one (one to copy the host value to the present_copyin mapped value, another one in the region), I don't think that can be avoided. The host doesn't have control over when the CTAs (a gang) start -- they may even be serialized onto the same physical HW. So each gang has to initialize its own instance. Or did you mean something else? So, what is the scope of the private and firstprivate vars in OpenACC? In OpenMP if a variable is private or firstprivate on the target construct, unless further privatized in inner constructs it is really shared among all the threads in all the teams (or one var per all CTAs/workers in PTX terms). Is that the case for OpenACC too, or are the vars e.g. private to each CTA already or to each thread in each CTA, something different? If they are shared by all CTAs, then you should hopefully be able to use the GOMP_MAP_FIRSTPRIVATE{,_INT}; if not, then I'd say you should at least use those to provide you the initializer data to initialize your private vars from as a cheaper alternative to mapping. I'm going to try and get clarification, but I think the intent is to initialize with the value seen on the device. Consider:

int foo = 0;
#pragma acc data copyin(foo)
{
  #pragma acc parallel present(foo)
  {
    foo = 2;
  }
  if (expr)
    {
      #pragma acc update host(foo)
    }
  #pragma acc parallel firstprivate(foo)
  {
    // which initialization value?
  }
}

Here we copy data to the device, then set it a distinct value there. We conditionally update the host's instance from the device. My thinking is that the intent of the firstprivate is to initialize with the value known on the device (and behave as-if copyin, if it's not there). 
Not the value most recently seen on the host -- the update clause could change that, and may well be used as a debugging aid, so it seems bizarre that it can change program semantics in such a way.
Re: [PATCH v2 11/13] Test case for conversion from __seg_tls:0
On Tue, Oct 20, 2015 at 11:27 PM, Richard Henderson wrote: > --- > gcc/testsuite/gcc.target/i386/addr-space-3.c | 10 ++ > 1 file changed, 10 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/addr-space-3.c > > diff --git a/gcc/testsuite/gcc.target/i386/addr-space-3.c > b/gcc/testsuite/gcc.target/i386/addr-space-3.c > new file mode 100644 > index 000..63f1f03 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/addr-space-3.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O" } */ > +/* { dg-final { scan-assembler "[fg]s:0" } } */ Causes ERROR: (DejaGnu) proc "fg" does not exist. The error code is NONE The info on the error is: close: spawn id exp6 not open while executing "close -i exp6" invoked from within "catch "close -i $spawn_id"" > + > +void test(int *y) > +{ > + int *x = (int __seg_tls *)0; > + if (x == y) > +asm(""); > +} > -- > 2.4.3 >
Re: Add null identifiers to genmatch
On Mon, Nov 9, 2015 at 12:17 AM, Jeff Law wrote: > On 11/07/2015 07:31 AM, Pedro Alves wrote: >> >> Hi Richard, >> >> Passerby comment below. >> >> On 11/07/2015 01:21 PM, Richard Sandiford wrote: >>> >>> -/* Lookup the identifier ID. */ >>> +/* Lookup the identifier ID. Allow "null" if ALLOW_NULL. */ >>> >>> id_base * >>> -get_operator (const char *id) >>> +get_operator (const char *id, bool allow_null = false) >>> { >>> + if (allow_null && strcmp (id, "null") == 0) >>> +return null_id; >>> + >>> id_base tem (id_base::CODE, id); >> >> >> Boolean params are best avoided if possible, IMO. In this case, >> it seems this could instead be a new wrapper function, like: > > This hasn't been something we've required for GCC. I've come across this > recommendation a few times over the last several months as I continue to > look at refactoring and best practices for codebases such as GCC. > > By encoding the boolean in the function's signature, it (IMHO) does make the > code a bit easier to read, primarily because you don't have to go look up the > tense of the boolean. The problem is when the boolean is telling us some > property of an argument, but there's more than one argument and other similar > situations. > > I wonder if the real benefit is in the refactoring necessary to do things in > this way without a ton of code duplication. I think the patch is ok as-is. Thus ok. Thanks, Richard. > Jeff
Re: Add a combined_fn enum
Bernd Schmidt writes: > On 11/09/2015 11:24 AM, Richard Sandiford wrote: >> Bernd Schmidt writes: >>> I see it's already acked, but have you considered just doing away with >>> the builtin/internal function distinction? >> >> I think they're too different to be done away with entirely. built-in >> functions map directly to a specific C-level callable function and >> must have an fndecl, whereas no internal function should have an fndecl. >> Whether a built-in function is available depends on the selected >> language and what declarations the front-end has seen, while whether >> an internal function is available depends entirely on GCC internal >> information. > > Yes... but aren't these things fixable relatively easily (compared with > what your patches are doing)? I'm not sure what you mean by "fix" though. I don't think we can change any of the constraints above. > I also have the problem that I can't quite see where your patch series > is going. Let's take "Add internal bitcount functions", it adds new > internal functions but no users AFAICS. What is the end goal here (there > doesn't seem to be a [0/N] description in my inbox)? The main goal is to allow functions to be vectorised simply by defining the associated optab. At the moment you can get a scalar square root instruction simply by defining something like sqrtdf2. But if you want to have vectorised sqrt, you need to have a target-specific C-level built-in function for the vector form of sqrt, implement TARGET_BUILTIN_VECTORIZED_FUNCTION, and expand the sqrt in the same way that the target expands other directly-called built-in functions. That seems unnecessarily indirect, especially when in practice those target-specific functions tend to use patterns like sqrtv2df2 anyway. It seems better to have GCC use the vector optabs directly. This was prompted by the patch Dave Sherwood posted to support scalar and vector fmin() and fmax() even with -fno-fast-math on aarch64. 
As things stood we needed the same approach: use an optab for the scalar version and TARGET_BUILTIN_VECTORIZED_FUNCTION for the vector version. The problem is that at present there's no aarch64 built-in function that does what we want, so we'd have to define a new one. And that seems silly when GCC really ought to be able to use the vector version of the optab without any more help from the target. I'm hoping to post those patches later today. But the series has other side-benefits, like:

- allowing genmatch to fold calls to internal functions
- splitting the computational part of math functions from the fallback errno handling at an earlier point, so that they get more optimisation
- clearly separating out the call folds that we're prepared to do on gimple, rather than calling into builtins.c.

Thanks, Richard
Re: [PATCH] PR/67682, break SLP groups up if only some elements match
On 06/11/15 12:55, Richard Biener wrote: > >> + /* GROUP_GAP of the first group now has to skip over the second group >> too. */ >> + GROUP_GAP (first_vinfo) += group2_size; > > Please add a MSG_NOTE debug printf stating that we split the group and > at which element. Done. > I think you want to add && STMT_VINFO_GROUPED_ACCESS (vinfo_for_stmt (stmt)) > otherwise this could be SLP reductions where there is no way the split > group would enable vectorization. Ah, I had thought that the (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt))) check sufficed for that, as per e.g. the section above /* Create a node (a root of the SLP tree) for the packed grouped stores. */ But done. > Note that BB vectorization now also very aggressively falls back to > considering > non-matches being "external". > > Not sure why that doesn't trigger for your testcases ;) I tested against trunk r229944, on which all of my scan-tree-dump's were failing > I'm comfortable with the i < group_size half of the patch. For the other > piece > I'd like to see more compile-time / success data from, say, building > SPEC CPU 2006. Well, as a couple of quick data points, a compile of SPEC2000 on aarch64-none-linux-gnu (-Ofast -fomit-frame-pointer -mcpu=cortex-a57), I have: 3080 successes without patch; +79 successes from the "i < vectorization_factor" part of the patch (on its own) +90 successes from the (i>=vectorization_factor) && "i < group_size" part (on its own) +(79 from first) +(90 from second) + (an additional 62) from both parts together. And for SPEC2006, aarch64-linux-gnu (-O3 -fomit-frame-pointer -mcpu=cortex-a57): 11979 successes without patch; + 499 from the "i < vectorization_factor" part + 264 from the (i >= vectorization factor) && (i < group_size)" part + extra 336 if both parts combined. I haven't done any significant measurements of compile-time yet. (snipping this bit out-of-order) > Hmm. 
This is of course pretty bad for compile-time for the non-matching > case as that effectively will always split into N pieces if we feed it > garbage (that is, without being sure that at least one piece _will_ vectorize). > > OTOH with the current way of computing "matches" we'd restrict ourselves > to cases where the first stmt we look at (and match to) happens to be > the operation that in the end will form a matching group. ... > Eventually we'd want to improve the "matches" return > to include the affected stmts (ok, that'll be not very easy) so we can do > a cheap "if we split here will it fix that particular mismatch" check. Yes, I think there are a bunch of things we can do here, that would be more powerful than the simple approach I used here. The biggest limiting factor will probably be (lack of) permutation, i.e. if we only SLP stores to consecutive addresses. > So, please split the patch and I suggest to leave the controversial part > for next stage1 together with some improvement on the SLP build process > itself? Here's a reduced version with just the second case, bootstrapped+check-gcc/g++ on x86_64. gcc/ChangeLog: * tree-vect-slp.c (vect_split_slp_store_group): New. (vect_analyze_slp_instance): During basic block SLP, recurse on subgroups if vect_build_slp_tree fails after 1st vector. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-7.c (main1): Make subgroups non-isomorphic. * gcc.dg/vect/bb-slp-subgroups-1.c: New. * gcc.dg/vect/bb-slp-subgroups-2.c: New. * gcc.dg/vect/bb-slp-subgroups-3.c: New. * gcc.dg/vect/bb-slp-subgroups-4.c: New. 
--- gcc/testsuite/gcc.dg/vect/bb-slp-7.c | 10 ++-- gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c | 44 +++ gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c | 42 +++ gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 41 ++ gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c | 41 ++ gcc/tree-vect-slp.c| 74 +- 6 files changed, 246 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c index ab54a48..b8bef8c 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c @@ -16,12 +16,12 @@ main1 (unsigned int x, unsigned int y) unsigned int *pout = &out[0]; unsigned int a0, a1, a2, a3; - /* Non isomorphic. */ + /* Non isomorphic, even 64-bit subgroups. */ a0 = *pin++ + 23; - a1 = *pin++ + 142; + a1 = *pin++ * 142; a2 = *pin++ + 2; a3 = *pin++ * 31; - + *pout++ = a0 * x; *pout++ = a1 * y; *pout++ = a2 * x; @@ -29,7 +29,7 @@ main1 (unsigned int x, unsigned int y) /* Check results. */ if (out[0] != (in[0] +
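As context for the testcase changes, a hedged sketch of the shape of code PR67682 targets (the constants are illustrative, loosely modeled on bb-slp-7.c): the four grouped stores are not all isomorphic, but the first pair is, so splitting the group salvages a two-lane SLP opportunity.

```c
int out[4];

/* Illustrative only: stores 0-1 share the '+ then *' shape while
   stores 2-3 use '* then *', so the full group of four fails SLP
   matching but each split subgroup of two can still vectorize.  */
void
store_group (const int *pin, int x, int y)
{
  out[0] = (pin[0] + 23) * x;
  out[1] = (pin[1] + 142) * y;
  out[2] = (pin[2] * 2) * x;
  out[3] = (pin[3] * 31) * y;
}
```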
Re: [PATCH] Minor refactoring in tree-ssanames.c & freelists verifier
Hi, On Mon, 9 Nov 2015, Jeff Law wrote: +verify_ssaname_freelists (struct function *fun) +{ + /* Do nothing if we are in RTL format. */ + basic_block bb; + FOR_EACH_BB_FN (bb, fun) +{ + if (bb->flags & BB_RTL) + return; +} Couldn't this simply check gimple_in_ssa_p (fun)? + /* Then note the operands of each statement. */ + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); + gsi_next (&gsi)) + { + ssa_op_iter iter; + gimple *stmt = gsi_stmt (gsi); + FOR_EACH_SSA_TREE_OPERAND (t, stmt, iter, SSA_OP_ALL_OPERANDS) + if (TREE_CODE (t) == SSA_NAME) + bitmap_set_bit (names_in_il, SSA_NAME_VERSION (t)); + } t will always be an SSA_NAME here. Ciao, Michael.
Re: [Patch] Change to argument promotion in fixed conversion library calls
On Fri, Nov 6, 2015 at 8:14 PM, Bernd Schmidt wrote: > On 11/06/2015 08:04 PM, Steve Ellcey wrote: >> >> When I made this change I had one regression in the GCC testsuite >> (gcc.dg/fixed-point/convert-sat.c). I tracked this down to the >> fact that emit_library_call_value_1 does not do any argument promotion >> because it does not have the original tree type information for library >> calls. It only knows about modes. I can't change >> emit_library_call_value_1 >> to do the promotion because it does not know whether to do a signed or >> unsigned promotion, but expand_fixed_convert could do the conversion >> before calling emit_library_call_value_1 and that is what this patch does. > > > Hmm, difficult. I can see how there would be a problem, but considering how > many calls to emit_library_call_* we have, I'm slightly worried whether this > is really is a good approach. > > On the other hand, there seems to be precedent in this file: > > if (GET_MODE_PRECISION (GET_MODE (from)) < GET_MODE_PRECISION > (SImode)) > from = convert_to_mode (SImode, from, unsignedp); > >> The 'real' long term fix for this problem is to have tree types for >> builtin >> functions so the proper promotions can always be done but that is a fairly >> large change that I am not willing to tackle right now and it could >> probably >> not be done in time for GCC 6.0 anyway. > > > Yeah, but I agree that this is the real fix. We should aim to get rid of the > emit_library_call functions. Indeed. In the "great plan" of simplifying RTL expansion by moving stuff up to the GIMPLE level this could be done in a lowering stage lowering all operations that we need to do via libcalls to GIMPLE calls. Now, we'd either need proper function declarations for all libcalls of optabs for this or have the optab internal function stuff from Richard also provide the libcall fallback. 
In the expansion code for the as-libcall path we can then simply use the type of the incoming argument (as we could if emit_library_call_value_1 would in addition to the RTX operands also receive the original tree ones). Richard. >> + if (SCALAR_INT_MODE_P (from_mode)) >> +{ >> + /* If we need to promote the integer function argument we need to >> do > > > Extra space at the start of the comment. > >> + it here instead of inside emit_library_call_value because here >> we >> + know if we should be doing a signed or unsigned promotion. */ >> + >> + machine_mode arg_mode; >> + int unsigned_p = 0; >> + >> + arg_mode = promote_function_mode (NULL_TREE, from_mode, >> + &unsigned_p, NULL_TREE, 0); >> + if (arg_mode != from_mode) >> + { >> + from = convert_to_mode (arg_mode, from, uintp); >> + from_mode = arg_mode; >> + } >> +} > > > Move this into a separate function (prepare_libcall_arg)? I'll think about > it over the weekend and let others chime in if they want, but I think I'll > probably end up approving it with that change. > > > Bernd
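A hedged, target-neutral sketch of the promotion pitfall being fixed (pure C with hypothetical function names): when a narrow argument is widened for a call, the signedness of the source type decides between sign- and zero-extension, which is exactly the information emit_library_call_value_1 lacks when it only sees modes.

```c
/* Illustrative only: promoting a narrow value before a call.  The
   caller must pick sign- or zero-extension based on the *source*
   type; choosing wrongly produces a different promoted value.  */
int
promote_signed (signed char narrow)
{
  return (int) narrow;                  /* sign-extends: -5 stays -5 */
}

int
promote_unsigned (signed char narrow)
{
  return (int) (unsigned char) narrow;  /* zero-extends: -5 becomes 251 */
}
```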
Re: Extend tree-call-cdce to calls whose result is used
Michael Matz writes: > On Sat, 7 Nov 2015, Richard Sandiford wrote: >> For -fmath-errno, builtins.c currently expands calls to sqrt to: >> >> y = sqrt_optab (x); >> if (y != y) >> [ sqrt (x); or errno = EDOM; ] >> >> - the call to sqrt is protected by the result of the optab rather >> than the input. It would be better to check !(x >= 0), like >> tree-call-cdce.c does. > > It depends. With fast-math (and hence without NaNs) you can trivially > optimize away a (y != y) test. You can't do so with !(x>=0) at all. -ffast-math would already cause us to treat the function as not setting errno, so the code wouldn't be used. >> - the branch isn't exposed at the gimple level and so gets little >> high-level optimisation. >> >> - we do this for log too, but for log a zero input produces >> -inf rather than a NaN, and sets errno to ERANGE rather than EDOM. >> >> This patch moves the code to tree-call-cdce.c instead, > > This somehow feels wrong. Dead-code elimination doesn't have anything to > do with the transformation you want, it rather is rewriting all feasible > calls into something else, like fold_builtins does. Also cdce currently > doesn't seem to do any checks on the fast-math flags, so I wonder if some > of the conditions that you now also insert for calls whose results are > used stay until final code. Isn't that just a question of what you call the current pass though? Arguably "conditional DCE" was always a confusing name, since the pass only _adds_ code :-) Obviously the point is that it eliminates the external call at runtime in most cases, but so does the transformation I'm adding. I'd be happy to rename the file to something else if that would help. >> Previously the pass was only enabled by default at -O2 or above, but the >> old builtins.c code was enabled at -O. The patch therefore enables the >> pass at -O as well. > > The pass is somewhat expensive in that it removes dominator info and > schedules a full ssa update. 
> The transformation is trivial enough that > dominators and SSA form can be updated on the fly, I think without that > it's not feasible for -O. r229916 fixed that for the non-EH case. I posted a patch to update the vops for the non-EH case as well: https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03355.html If this aspect is going to be a blocker, maybe we can reconsider that, since the EH case should be rare. > But as said I think this transformation should better be moved into > builtin folding (or other call folding), at which point also the fast-math > flags can be checked. The infrastructure routines of tree-call-cdce can > be used there of course. If so moved the cdce pass would be subsumed by > that btw. (because the dead call result will be trivially exposed), and > that would be a good thing. Yeah, that was why I added the code to the cdce pass. I don't think it makes sense to have one pass that adds range checks for calls whose lhs is unused and a separate pass for calls whose lhs isn't used. But by "builtin folding", do you mean fold_builtin_n etc.? I thought the point was that we were trying to move away from relying on folding at that level. Or did you mean something else? Thanks, Richard
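At the source level, the guard being debated looks roughly like the sketch below (a hypothetical illustration with an invented `checked_sqrt` name, not GCC's actual expansion): the fast inline square root runs unconditionally, and a test decides whether the library call must be replayed so that errno gets set. The point of the thread is that guarding on the input with `!(x >= 0)` is robust, whereas the result-based test `y != y` is trivially folded away once NaNs are assumed absent.

```cpp
#include <cmath>
#include <cerrno>

// Hypothetical sketch of the -fmath-errno guard discussed above
// (invented helper name, not GCC's real expansion).
static double
checked_sqrt (double x)
{
  double y = __builtin_sqrt (x);  // fast inline path, no errno side effect
  if (!(x >= 0.0))                // input-based guard, unlike (y != y)
    return std::sqrt (x);         // slow path: library call sets errno
  return y;
}
```

For in-domain inputs the guard is false and only the inline instruction executes; the external call is reached only for out-of-domain arguments.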
Re: [PATCH PR52272]Be smart when adding iv candidates
On 11/08/2015 10:11 AM, Richard Biener wrote: On November 8, 2015 3:58:57 AM GMT+01:00, "Bin.Cheng" wrote: +inline bool +iv_common_cand_hasher::equal (const iv_common_cand *ccand1, + const iv_common_cand *ccand2) +{ + return ccand1->hash == ccand2->hash +&& operand_equal_p (ccand1->base, ccand2->base, 0) +&& operand_equal_p (ccand1->step, ccand2->step, 0) +&& TYPE_PRECISION (TREE_TYPE (ccand1->base)) + == TYPE_PRECISION (TREE_TYPE (ccand2->base)); Yes. Patch is OK then. Doesn't follow the formatting rules though in the quoted piece. Bernd
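The pattern in the quoted hasher — cache the hash in the record and compare the cached hashes before doing the expensive structural comparison — can be sketched generically (invented stand-in types; the real code compares tree operands with `operand_equal_p` and checks `TYPE_PRECISION`):

```cpp
#include <cstddef>
#include <string>
#include <functional>

// Generic sketch of the iv_common_cand_hasher::equal pattern: the cheap
// cached-hash test short-circuits before the structural comparisons.
struct iv_cand_key
{
  std::size_t hash;     // computed once when the candidate is recorded
  std::string base;     // stand-in for the base tree operand
  std::string step;     // stand-in for the step tree operand
  unsigned precision;   // stand-in for TYPE_PRECISION of the base type
};

static iv_cand_key
make_key (const std::string &base, const std::string &step, unsigned prec)
{
  std::size_t h = std::hash<std::string>{} (base)
                  ^ (std::hash<std::string>{} (step) << 1);
  return { h, base, step, prec };
}

static bool
iv_cand_equal (const iv_cand_key &a, const iv_cand_key &b)
{
  return a.hash == b.hash          // cheap reject first
         && a.base == b.base       // ~ operand_equal_p on base
         && a.step == b.step       // ~ operand_equal_p on step
         && a.precision == b.precision;
}
```

Comparing hashes first is safe because equal keys always have equal cached hashes; the structural tests only run for (rare) hash matches.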
Re: Add a combined_fn enum
On 11/09/2015 03:57 PM, Richard Sandiford wrote: I'm not sure what you mean by "fix" though. I don't think we can change any of the constraints above. I don't see why they couldn't be relaxed. What's stopping us from defining some builtins independently of the frontends, and don't all languages except Fortran use the normal builtins.def anyway? Also - if you can have an internal function without a fndecl, why wouldn't it work to allow that for builtins as well? But this is probably moot since I can now see a little more clearly what you want to do. The main goal is to allow functions to be vectorised simply by defining the associated optab. Ok. So these internal functions would support arbitrary modes like V[1248][SDX]F etc., unlike the builtins? That wasn't entirely clear from the patches I looked at, it seemed like you were just duplicating every builtin as an internal function (something to which I would have objected). Bernd
Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
On 9 November 2015 at 11:32, Kyrill Tkachov wrote: > 2015-11-09 Kyrylo Tkachov > > PR target/68129 > * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1. > * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE): > Delete VOIDmode case. Assert that mode is not VOIDmode. > * config/aarch64/predicates.md (const0_operand): Remove const_double > match. > > 2015-11-09 Kyrylo Tkachov > > PR target/68129 > * gcc.target/aarch64/pr68129_1.c: New test. Hi, This test isn't aarch64 specific, does it need to be in gcc.target/aarch64 ? Cheers /Marcus
[PATCH series, 16] Use parloops to parallelize oacc kernels regions
Hi, this patch series for stage1 trunk adds support to: - parallelize oacc kernels regions using parloops, and - map the loops onto the oacc gang dimension. The patch series contains these patches: 1 Insert new exit block only when needed in transform_to_exit_first_loop_alt 2 Make create_parallel_loop return void 3 Ignore reduction clause on kernels directive 4 Implement -foffload-alias 5 Add in_oacc_kernels_region in struct loop 6 Add pass_oacc_kernels 7 Add pass_dominator_oacc_kernels 8 Add pass_ch_oacc_kernels 9 Add pass_parallelize_loops_oacc_kernels 10 Add pass_oacc_kernels pass group in passes.def 11 Update testcases after adding kernels pass group 12 Handle acc loop directive 13 Add c-c++-common/goacc/kernels-*.c 14 Add gfortran.dg/goacc/kernels-*.f95 15 Add libgomp.oacc-c-c++-common/kernels-*.c 16 Add libgomp.oacc-fortran/kernels-*.f95 The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time. Bootstrapped and reg-tested on x86_64. Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). I'll post the individual patches in reply to this message. Thanks, - Tom --- 1 Insert new exit block only when needed in transform_to_exit_first_loop_alt 2015-06-30 Tom de Vries * tree-parloops.c (transform_to_exit_first_loop_alt): Insert new exit block only when needed. --- 2 Make create_parallel_loop return void 2015-11-09 Tom de Vries * tree-parloops.c (create_parallel_loop): Return void. --- 3 Ignore reduction clause on kernels directive 2015-11-08 Tom de Vries * c-omp.c (c_oacc_split_loop_clauses): Don't copy OMP_CLAUSE_REDUCTION, classify as loop clause. --- 4 Implement -foffload-alias 2015-11-03 Tom de Vries * common.opt (foffload-alias): New option. * flag-types.h (enum offload_alias): New enum. * omp-low.c (install_var_field): Handle flag_offload_alias. 
* doc/invoke.texi (@item Code Generation Options): Add -foffload-alias. (@item -foffload-alias): New item. * c-c++-common/goacc/kernels-loop-offload-alias-none.c: New test. * c-c++-common/goacc/kernels-loop-offload-alias-ptr.c: New test. --- 5 Add in_oacc_kernels_region in struct loop 2015-11-09 Tom de Vries * cfgloop.h (struct loop): Add in_oacc_kernels_region field. * omp-low.c (mark_loops_in_oacc_kernels_region): New function. (expand_omp_target): Call mark_loops_in_oacc_kernels_region. --- 6 Add pass_oacc_kernels 2015-11-09 Tom de Vries * tree-pass.h (make_pass_oacc_kernels): Declare. * tree-ssa-loop.c (gate_oacc_kernels): New static function. (pass_data_oacc_kernels): New pass_data. (class pass_oacc_kernels): New pass. (make_pass_oacc_kernels): New function. --- 7 Add pass_dominator_oacc_kernels 2015-11-09 Tom de Vries * tree-pass.h (make_pass_dominator_oacc_kernels): Declare. * tree-ssa-dom.c (class dominator_base): New class. Factor out of ... (class pass_dominator): ... here. (dominator_base::may_peel_loop_headers_p) (pass_dominator::may_peel_loop_headers_p): New function. (pass_dominator_oacc_kernels): New pass. (make_pass_dominator_oacc_kernels): New function. (dominator_base::execute): Use may_peel_loop_headers_p. --- 8 Add pass_ch_oacc_kernels 2015-11-09 Tom de Vries * tree-pass.h (make_pass_ch_oacc_kernels): Declare. * tree-ssa-loop-ch.c (pass_ch::pass_ch (pass_data, gcc::context)): New constructor. (pass_data_ch_oacc_kernels): New pass_data. (class pass_ch_oacc_kernels): New pass. (pass_ch_oacc_kernels::process_loop_p): New function. (make_pass_ch_oacc_kernels): New function. --- 9 Add pass_parallelize_loops_oacc_kernels 2015-11-09 Tom de Vries * omp-low.c (set_oacc_fn_attrib): Make extern. * omp-low.c (expand_omp_atomic_fetch_op): Release defs of update stmt. * omp-low.h (set_oacc_fn_attrib): Declare. * tree-parloops.c (struct reduction_info): Add reduc_addr field. (create_call_for_reduction_1): Handle case that reduc_addr is non-NULL. 
(create_parallel_loop, gen_parallel_loop, try_create_reduction_list): Add and handle function parameter oacc_kernels_p. (get_omp_data_i_param): New function. (ref_conflicts_with_region, oacc_entry_exit_ok_1) (oacc_entry_exit_single_gang, oacc_entry_exit_ok): New function. (parallelize_loops): Add and handle function parameter oacc_kernels_p. Calculate dominance info. Skip loops that are not in a kernels region in oacc_kernels_p mode. Skip inner loops of parallelized loops. (
Re: [PATCH v3 1/2] [PR debug/67192] Fix C loops' back-jump location
On 11/04/2015 05:17 PM, Andreas Arnez wrote: > Since r223098 ("Implement -Wmisleading-indentation") the backward-jump > generated for a C while- or for-loop can get the wrong line number. > This is because the check for misleading indentation peeks ahead one > token, advancing input_location to after the loop, and then > c_finish_loop() creates the back-jump and calls add_stmt(), which > assigns input_location to the statement by default. > > This patch swaps the check for misleading indentation with the finishing > of the loop, such that input_location still has the right value at the > time of any invocations of add_stmt(). Note that this does not fully > cover all cases where the back-jump gets the wrong location. > > gcc/c/ChangeLog: > > PR debug/67192 > * c-parser.c (c_parser_while_statement): Finish the loop before > parsing ahead for misleading indentation. > (c_parser_for_statement): Likewise. > > gcc/testsuite/ChangeLog: > > PR debug/67192 > * gcc.dg/guality/pr67192.c: New test. Applied. Thanks! -Andreas-
Re: [PATCH v3 2/2] [PR debug/67192] Further fix C loops' back-jump location
On 11/04/2015 05:18 PM, Andreas Arnez wrote: > After parsing an unconditional "while"- or "for"-loop, the C front-end > generates a backward-goto statement and implicitly sets its location to > the current input_location. But in some cases the parser peeks ahead > first, such that input_location already points to the line after the > loop and the generated backward-goto gets the wrong line number. > > One way this can occur is with a loop body consisting of an "if" > statement, because then the parser peeks for an optional "else" before > finishing the loop. > > Another way occurred after r223098 ("Implement > -Wmisleading-indentation"), even with a loop body enclosed in braces. > This was because the check for misleading indentation always peeks ahead > one token as well. > > This patch avoids the use of input_location and sets the location of the > backward-goto to the start of the loop body instead, or, if there is no > loop body, to the start of the loop. > > gcc/c/ChangeLog: > > PR debug/67192 > * c-typeck.c (c_finish_loop): For unconditional loops, set the > location of the backward-goto to the start of the loop body. > > gcc/testsuite/ChangeLog: > > PR debug/67192 > * gcc.dg/guality/pr67192.c (f3, f4): New functions. > (main): Invoke them. Applied. Thanks! -Andreas-
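For concreteness, the two problem shapes described above look like this (illustrative functions, not the actual gcc.dg/guality/pr67192.c testcase). In both, the parser has consumed a token past the loop before the backward-goto is generated, so an input_location-based line number would point past the loop:

```cpp
// Illustrative loop shapes for the back-jump location issue.
static int
f3 (int n)
{
  while (n > 3)
    if (n & 1)     // body is an if: parser peeks ahead for an "else"
      n -= 1;
    else
      n -= 2;
  return n;
}

static int
f4 (int n)
{
  for (; n > 3;)
    {              // braced body: -Wmisleading-indentation still peeks
      n -= 2;      // one token past the closing brace
    }
  return n;
}
```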
[PATCH, 1/16] Insert new exit block only when needed in transform_to_exit_first_loop_alt
On 09/11/15 16:35, Tom de Vries wrote: Hi, this patch series for stage1 trunk adds support to: - parallelize oacc kernels regions using parloops, and - map the loops onto the oacc gang dimension. The patch series contains these patches: 1 Insert new exit block only when needed in transform_to_exit_first_loop_alt 2 Make create_parallel_loop return void 3 Ignore reduction clause on kernels directive 4 Implement -foffload-alias 5 Add in_oacc_kernels_region in struct loop 6 Add pass_oacc_kernels 7 Add pass_dominator_oacc_kernels 8 Add pass_ch_oacc_kernels 9 Add pass_parallelize_loops_oacc_kernels 10 Add pass_oacc_kernels pass group in passes.def 11 Update testcases after adding kernels pass group 12 Handle acc loop directive 13 Add c-c++-common/goacc/kernels-*.c 14 Add gfortran.dg/goacc/kernels-*.f95 15 Add libgomp.oacc-c-c++-common/kernels-*.c 16 Add libgomp.oacc-fortran/kernels-*.f95 The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time. Bootstrapped and reg-tested on x86_64. Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). I'll post the individual patches in reply to this message. In transform_to_exit_first_loop_alt we insert a new exit block in between the new loop header and the old exit block. Currently, we also do this if this is not necessary. This patch figures out when we need to insert a new exit block, and only then inserts it. Thanks, - Tom Insert new exit block only when needed in transform_to_exit_first_loop_alt 2015-06-30 Tom de Vries * tree-parloops.c (transform_to_exit_first_loop_alt): Insert new exit block only when needed.
--- gcc/tree-parloops.c | 42 -- 1 file changed, 28 insertions(+), 14 deletions(-) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 3d41275..6a49aa9 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1695,10 +1695,15 @@ transform_to_exit_first_loop_alt (struct loop *loop, /* Set the latch arguments of the new phis to ivtmp/sum_b. */ flush_pending_stmts (post_inc_edge); - /* Create a new empty exit block, inbetween the new loop header and the old - exit block. The function separate_decls_in_region needs this block to - insert code that is active on loop exit, but not any other path. */ - basic_block new_exit_block = split_edge (exit); + + basic_block new_exit_block = NULL; + if (!single_pred_p (exit->dest)) +{ + /* Create a new empty exit block, inbetween the new loop header and the + old exit block. The function separate_decls_in_region needs this block + to insert code that is active on loop exit, but not any other path. */ + new_exit_block = split_edge (exit); +} /* Insert and register the reduction exit phis. */ for (gphi_iterator gsi = gsi_start_phis (exit_block); @@ -1706,17 +1711,24 @@ transform_to_exit_first_loop_alt (struct loop *loop, gsi_next (&gsi)) { gphi *phi = gsi.phi (); + gphi *nphi = NULL; tree res_z = PHI_RESULT (phi); + tree res_c; - /* Now that we have a new exit block, duplicate the phi of the old exit - block in the new exit block to preserve loop-closed ssa. 
*/ - edge succ_new_exit_block = single_succ_edge (new_exit_block); - edge pred_new_exit_block = single_pred_edge (new_exit_block); - tree res_y = copy_ssa_name (res_z, phi); - gphi *nphi = create_phi_node (res_y, new_exit_block); - tree res_c = PHI_ARG_DEF_FROM_EDGE (phi, succ_new_exit_block); - add_phi_arg (nphi, res_c, pred_new_exit_block, UNKNOWN_LOCATION); - add_phi_arg (phi, res_y, succ_new_exit_block, UNKNOWN_LOCATION); + if (new_exit_block != NULL) + { + /* Now that we have a new exit block, duplicate the phi of the old + exit block in the new exit block to preserve loop-closed ssa. */ + edge succ_new_exit_block = single_succ_edge (new_exit_block); + edge pred_new_exit_block = single_pred_edge (new_exit_block); + tree res_y = copy_ssa_name (res_z, phi); + nphi = create_phi_node (res_y, new_exit_block); + res_c = PHI_ARG_DEF_FROM_EDGE (phi, succ_new_exit_block); + add_phi_arg (nphi, res_c, pred_new_exit_block, UNKNOWN_LOCATION); + add_phi_arg (phi, res_y, succ_new_exit_block, UNKNOWN_LOCATION); + } + else + res_c = PHI_ARG_DEF_FROM_EDGE (phi, exit); if (virtual_operand_p (res_z)) continue; @@ -1724,7 +1736,9 @@ transform_to_exit_first_loop_alt (struct loop *loop, gimple *reduc_phi = SSA_NAME_DEF_STMT (res_c); struct reduction_info *red = reduction_phi (reduction_list, reduc_phi); if (red != NULL) - red->keep_res = nphi; + red->keep_res = (nphi != NULL + ? nph
Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
On 09/11/15 15:34, Marcus Shawcroft wrote: On 9 November 2015 at 11:32, Kyrill Tkachov wrote: 2015-11-09 Kyrylo Tkachov PR target/68129 * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1. * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE): Delete VOIDmode case. Assert that mode is not VOIDmode. * config/aarch64/predicates.md (const0_operand): Remove const_double match. 2015-11-09 Kyrylo Tkachov PR target/68129 * gcc.target/aarch64/pr68129_1.c: New test. Hi, This test isn't aarch64 specific, does it need to be in gcc.target/aarch64 ? Not really, here is the patch with the test in gcc.dg/ if that's preferred. Thanks, Kyrill 2015-11-09 Kyrylo Tkachov PR target/68129 * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1. * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE): Delete VOIDmode case. Assert that mode is not VOIDmode. * config/aarch64/predicates.md (const0_operand): Remove const_double match. 2015-11-09 Kyrylo Tkachov PR target/68129 * gcc.dg/pr68129_1.c: New test. Cheers /Marcus commit 623ffaa527b17ad01179c30c1d4a9911243f818a Author: Kyrylo Tkachov Date: Wed Oct 28 10:49:44 2015 + [AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ce155dc..927b72a 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4403,11 +4403,10 @@ aarch64_print_operand (FILE *f, rtx x, char code) break; case CONST_DOUBLE: - /* CONST_DOUBLE can represent a double-width integer. - In this case, the mode of x is VOIDmode. */ - if (GET_MODE (x) == VOIDmode) - ; /* Do Nothing. */ - else if (aarch64_float_const_zero_rtx_p (x)) + /* Since we define TARGET_SUPPORTS_WIDE_INT we shouldn't ever + be getting CONST_DOUBLEs holding integers. 
*/ + gcc_assert (GET_MODE (x) != VOIDmode); + if (aarch64_float_const_zero_rtx_p (x)) { fputc ('0', f); break; diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index b041a1e..0fac0a7 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -863,6 +863,8 @@ extern enum aarch64_code_model aarch64_cmodel; (aarch64_cmodel == AARCH64_CMODEL_TINY \ || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC) +#define TARGET_SUPPORTS_WIDE_INT 1 + /* Modes valid for AdvSIMD D registers, i.e. that fit in half a Q register. */ #define AARCH64_VALID_SIMD_DREG_MODE(MODE) \ ((MODE) == V2SImode || (MODE) == V4HImode || (MODE) == V8QImode \ diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 1bcbf62..8775460 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -32,7 +32,7 @@ (define_predicate "aarch64_call_insn_operand" ;; Return true if OP a (const_int 0) operand. (define_predicate "const0_operand" - (and (match_code "const_int, const_double") + (and (match_code "const_int") (match_test "op == CONST0_RTX (mode)"))) (define_predicate "aarch64_ccmp_immediate" diff --git a/gcc/testsuite/gcc.dg/pr68129_1.c b/gcc/testsuite/gcc.dg/pr68129_1.c new file mode 100644 index 000..112331e --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr68129_1.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fno-split-wide-types" } */ + +typedef int V __attribute__ ((vector_size (8 * sizeof (int)))); + +void +foo (V *p, V *q) +{ + *p = (*p == *q); +}
[PATCH, 2/16] Make create_parallel_loop return void
On 09/11/15 16:35, Tom de Vries wrote: Hi, this patch series for stage1 trunk adds support to: - parallelize oacc kernels regions using parloops, and - map the loops onto the oacc gang dimension. The patch series contains these patches: 1 Insert new exit block only when needed in transform_to_exit_first_loop_alt 2 Make create_parallel_loop return void 3 Ignore reduction clause on kernels directive 4 Implement -foffload-alias 5 Add in_oacc_kernels_region in struct loop 6 Add pass_oacc_kernels 7 Add pass_dominator_oacc_kernels 8 Add pass_ch_oacc_kernels 9 Add pass_parallelize_loops_oacc_kernels 10 Add pass_oacc_kernels pass group in passes.def 11 Update testcases after adding kernels pass group 12 Handle acc loop directive 13 Add c-c++-common/goacc/kernels-*.c 14 Add gfortran.dg/goacc/kernels-*.f95 15 Add libgomp.oacc-c-c++-common/kernels-*.c 16 Add libgomp.oacc-fortran/kernels-*.f95 The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time. Bootstrapped and reg-tested on x86_64. Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). I'll post the individual patches in reply to this message. This patch makes create_parallel_loop return void. The result is currently unused. Thanks, - Tom Make create_parallel_loop return void 2015-11-09 Tom de Vries * tree-parloops.c (create_parallel_loop): Return void. --- gcc/tree-parloops.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 6a49aa9..17415a8 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1986,10 +1986,9 @@ transform_to_exit_first_loop (struct loop *loop, /* Create the parallel constructs for LOOP as described in gen_parallel_loop. LOOP_FN and DATA are the arguments of GIMPLE_OMP_PARALLEL.
NEW_DATA is the variable that should be initialized from the argument - of LOOP_FN. N_THREADS is the requested number of threads. Returns the - basic block containing GIMPLE_OMP_PARALLEL tree. */ + of LOOP_FN. N_THREADS is the requested number of threads. */ -static basic_block +static void create_parallel_loop (struct loop *loop, tree loop_fn, tree data, tree new_data, unsigned n_threads, location_t loc) { @@ -2162,8 +2161,6 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data, /* After the above dom info is hosed. Re-compute it. */ free_dominance_info (CDI_DOMINATORS); calculate_dominance_info (CDI_DOMINATORS); - - return paral_bb; } /* Generates code to execute the iterations of LOOP in N_THREADS -- 1.9.1
Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
On 9 November 2015 at 15:45, Kyrill Tkachov wrote: > > On 09/11/15 15:34, Marcus Shawcroft wrote: >> >> On 9 November 2015 at 11:32, Kyrill Tkachov >> wrote: >> >>> 2015-11-09 Kyrylo Tkachov >>> >>> PR target/68129 >>> * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1. >>> * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE): >>> Delete VOIDmode case. Assert that mode is not VOIDmode. >>> * config/aarch64/predicates.md (const0_operand): Remove const_double >>> match. >>> >>> 2015-11-09 Kyrylo Tkachov >>> >>> PR target/68129 >>> * gcc.target/aarch64/pr68129_1.c: New test. >> >> Hi, This test isn't aarch64 specific, does it need to be in >> gcc.target/aarch64 ? > > > Not really, here is the patch with the test in gcc.dg/ if that's preferred. OK /Marcus
Re: [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions)
Hi, On Sun, Nov 08, 2015 at 07:36:16PM -0500, Michael Meissner wrote: > [gcc/testsuite] > * lib/target-supports.exp (check_p9vector_hw_available): Add > checks for power9 availability. > (check_effective_target_powerpc_p9vector_ok): Likewise. It's probably better not to use this for modulo; it is confusing and if you'll later need to untangle it it is much more work. > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ Lose this line? If Darwin cannot support modulo, the next line will catch that. +/* { dg-require-effective-target powerpc_p9vector_ok } */ > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { > "-mcpu=power9" } } */ > +/* { dg-options "-mcpu=power9 -O3" } */ Is -O3 needed? Why won't -O2 work? > +proc check_p9vector_hw_available { } { > +return [check_cached_effective_target p9vector_hw_available { > + # Some simulators are known to not support VSX/power8 instructions. > + # For now, disable on Darwin > + if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || > [istarget *-*-darwin*]} { Long line. > Index: gcc/config/rs6000/rs6000.md > === > --- gcc/config/rs6000/rs6000.md (revision 229972) > +++ gcc/config/rs6000/rs6000.md (working copy) > @@ -2885,9 +2885,9 @@ (define_insn_and_split "*div3_sra_ > (set_attr "cell_micro" "not")]) > > (define_expand "mod3" > - [(use (match_operand:GPR 0 "gpc_reg_operand" "")) > - (use (match_operand:GPR 1 "gpc_reg_operand" "")) > - (use (match_operand:GPR 2 "reg_or_cint_operand" ""))] > + [(set (match_operand:GPR 0 "gpc_reg_operand" "") > + (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "") > + (match_operand:GPR 2 "reg_or_cint_operand" "")))] You could delete the empty constraint strings while you're at it. > +;; On machines with modulo support, do a combined div/mod the old fashioned > +;; method, since the multiply/subtract is faster than doing the mod > instruction > +;; after a divide. 
You can instead have a "divmod" insn that is split to either of div, mod, or div+mul+sub depending on which of the outputs is unused. Peepholes do not get all cases. This can be a later improvement of course. Segher
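The "old fashioned method" referred to above is the classic identity a % b == a - (a / b) * b: when both the quotient and the remainder are live, one divide plus a multiply and subtract beats issuing a separate mod instruction after the divide. A sketch with an invented helper name:

```cpp
// Combined div/mod via the old-fashioned method: one hardware divide,
// then multiply-subtract recovers the remainder (a % b == a - (a/b)*b).
static void
divmod (long a, long b, long &q, long &r)
{
  q = a / b;      // single divide instruction
  r = a - q * b;  // mul + sub instead of a second mod instruction
}
```

This also matches C's truncating division, so the remainder carries the sign of the dividend.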
Re: [PATCH][AArch64][cleanup] Remove uses of CONST_DOUBLE_HIGH, CONST_DOUBLE_LOW
On 9 November 2015 at 11:34, Kyrill Tkachov wrote: > * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): > Remove integer CONST_DOUBLE handling. It should never occur. OK /Marcus
[PATCH, 3/16] Ignore reduction clause on kernels directive
On 09/11/15 16:35, Tom de Vries wrote: Hi, this patch series for stage1 trunk adds support to: - parallelize oacc kernels regions using parloops, and - map the loops onto the oacc gang dimension. The patch series contains these patches: 1 Insert new exit block only when needed in transform_to_exit_first_loop_alt 2 Make create_parallel_loop return void 3 Ignore reduction clause on kernels directive 4 Implement -foffload-alias 5 Add in_oacc_kernels_region in struct loop 6 Add pass_oacc_kernels 7 Add pass_dominator_oacc_kernels 8 Add pass_ch_oacc_kernels 9 Add pass_parallelize_loops_oacc_kernels 10 Add pass_oacc_kernels pass group in passes.def 11 Update testcases after adding kernels pass group 12 Handle acc loop directive 13 Add c-c++-common/goacc/kernels-*.c 14 Add gfortran.dg/goacc/kernels-*.f95 15 Add libgomp.oacc-c-c++-common/kernels-*.c 16 Add libgomp.oacc-fortran/kernels-*.f95 The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time. Bootstrapped and reg-tested on x86_64. Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). I'll post the individual patches in reply to this message. As discussed here ( https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00785.html ), the kernels directive does not allow the reduction clause. This patch fixes that. Thanks, - Tom Ignore reduction clause on kernels directive 2015-11-08 Tom de Vries * c-omp.c (c_oacc_split_loop_clauses): Don't copy OMP_CLAUSE_REDUCTION, classify as loop clause.
--- gcc/c-family/c-omp.c | 9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index 3e93b59..a3b99b2 100644 --- a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -867,7 +867,7 @@ c_omp_check_loop_iv_exprs (location_t stmt_loc, tree declv, tree decl, tree c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses) { - tree next, loop_clauses, t; + tree next, loop_clauses; loop_clauses = *not_loop_clauses = NULL_TREE; for (; clauses ; clauses = next) @@ -886,16 +886,11 @@ c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses) case OMP_CLAUSE_SEQ: case OMP_CLAUSE_INDEPENDENT: case OMP_CLAUSE_PRIVATE: + case OMP_CLAUSE_REDUCTION: OMP_CLAUSE_CHAIN (clauses) = loop_clauses; loop_clauses = clauses; break; - /* Reductions belong in both constructs. */ - case OMP_CLAUSE_REDUCTION: - t = copy_node (clauses); - OMP_CLAUSE_CHAIN (t) = loop_clauses; - loop_clauses = t; - /* Parallel/kernels clauses. */ default: OMP_CLAUSE_CHAIN (clauses) = *not_loop_clauses; -- 1.9.1
Re: Add a combined_fn enum
Bernd Schmidt writes: > On 11/09/2015 03:57 PM, Richard Sandiford wrote: > >> I'm not sure what you mean by "fix" though. I don't think we can change >> any of the constraints above. > > I don't see why they couldn't be relaxed. What's stopping us from > defining some builtins independently of the frontends, and don't all > languages except Fortran use the normal builtins.def anyway? I think the problem isn't so much with the functions that we define ourselves as those that simply mirror a standard library function. We can't assume that all the libm functions exist. It's also valid for the implementation to add target-specific attributes on top of what the standard says (e.g. "nomips16" on MIPS). So if a call starts out as a call to sinf and we treat it as a call to __builtin_sinf for optimisation purposes, there is always conceptually an underlying "physical" sinf function. We also can't convert calls to __builtin_sin to calls to __builtin_sinf without proof that a sinf definition exists (not guaranteed for C90). > Also - if you can have an internal function without a fndecl, why > wouldn't it work to allow that for builtins as well? But this is > probably moot since I can now see a little more clearly what you want > to do. I think that's the defining characteristic of builtins vs. internal functions though. Built-in functions are directly callable by source code and have a defined function signature. Internal functions are not directly callable by source code and have a flexible function signature. There are cases where this difference doesn't matter, which is why the combined_fn enum seemed like a good idea. But I don't think we can do away with the distinction entirely. >> The main goal is to allow functions to be vectorised simply by defining >> the associated optab. > > Ok. So these internal functions would support arbitrary modes like > V[1248][SDX]F etc., unlike the builtins? Yeah. They also never set errno.
> That wasn't entirely clear from the patches I looked at, it seemed > like you were just duplicating every builtin as an internal function > (something to which I would have objected). Yeah, sorry about that. I'll post the patches shortly. Thanks, Richard
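The combined_fn idea discussed in this thread can be pictured as one code space covering both kinds of function, with the internal-function codes offset past the last built-in code. The miniature below is a simplified sketch with invented enumerators, not GCC's actual declarations:

```cpp
// Miniature sketch of a combined function-code space: built-in codes
// come first, internal-function codes are offset past END_BUILTINS.
enum built_in_fn { BUILT_IN_SQRT, BUILT_IN_SIN, END_BUILTINS };
enum internal_fn { IFN_SQRT, IFN_SIN };

enum combined_fn
{
  CFN_SQRT     = BUILT_IN_SQRT,
  CFN_SIN      = BUILT_IN_SIN,
  CFN_IFN_SQRT = END_BUILTINS + IFN_SQRT,
  CFN_IFN_SIN  = END_BUILTINS + IFN_SIN
};

// Map either representation into the shared code space, so optimizers
// can ask "is this a sqrt?" without caring how the call is represented.
static combined_fn
as_combined_fn (built_in_fn f)
{
  return static_cast<combined_fn> (static_cast<int> (f));
}

static combined_fn
as_combined_fn (internal_fn f)
{
  return static_cast<combined_fn> (static_cast<int> (END_BUILTINS)
                                   + static_cast<int> (f));
}
```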
Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)
On Sun, Nov 08, 2015 at 07:37:53PM -0500, Michael Meissner wrote: > This patch adds support for scalar count trailing zeros instruction that is > being added to ISA 3.0 (power9). I bet you should change CTZ_DEFINED_VALUE_AT_ZERO as well. > +(define_insn "ctz2_hw" > + [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > + (ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))] > + "TARGET_CTZ" > + "cnttz %0,%1" > + [(set_attr "type" "cntlz")]) We should probably rename this attr value now. "cntz" maybe? Could be later of course. Segher
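The CTZ_DEFINED_VALUE_AT_ZERO remark matters because __builtin_ctz(0) is undefined at the source level, while a hardware count-trailing-zeros instruction typically returns the operand width for a zero input; the macro is how a backend tells the middle end it may rely on that value. A sketch of the semantics (hypothetical helper, zero-input value assumed to be the operand width):

```cpp
// What a cnttz-style instruction computes, including the zero-input
// case that CTZ_DEFINED_VALUE_AT_ZERO describes.  At the source level
// __builtin_ctz(0) is undefined, so the zero case is guarded explicitly.
static int
ctz32 (unsigned x)
{
  if (x == 0)
    return 32;                 // typical hardware result for zero input
  return __builtin_ctz (x);    // well-defined for nonzero x
}
```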
Re: [OpenACC] declare directive
Jakub, This is an update of the patch from: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00893.html. There was an unused variable in c/c-parser.c that was removed. I've included the ChangeLog as a convenience, but nothing was changed in the file. Thanks! Jim 2015-XX-XX James Norris Joseph Myers gcc/c-family/ * c-pragma.c (oacc_pragmas): Add entry for declare directive. * c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DECLARE. (enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and PRAGMA_OACC_CLAUSE_LINK. gcc/c/ * c-parser.c (c_parser_pragma): Handle PRAGMA_OACC_DECLARE. (c_parser_omp_clause_name): Handle 'device_resident' clause. (c_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and PRAGMA_OMP_CLAUSE_LINK. (c_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and PRAGMA_OACC_CLAUSE_LINK. (OACC_DECLARE_CLAUSE_MASK): New definition. (c_parser_oacc_declare): New function. gcc/cp/ * parser.c (cp_parser_omp_clause_name): Handle 'device_resident' clause. (cp_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and PRAGMA_OMP_CLAUSE_LINK. (cp_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and PRAGMA_OMP_CLAUSE_LINK. (OACC_DECLARE_CLAUSE_MASK): New definition. (cp_parser_oacc_declare): New function. (cp_parser_pragma): Handle PRAGMA_OACC_DECLARE. * pt.c (tsubst_expr): Handle OACC_DECLARE. gcc/ * gimple-pretty-print.c (dump_gimple_omp_target): Handle GF_OMP_TARGET_KIND_OACC_DECLARE. * gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DECLARE. (is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_DECLARE. * gimplify.c (gimplify_bind_expr): Prepend 'exit' stmt to cleanup. * omp-builtins.def (BUILT_IN_GOACC_DECLARE): New builtin. * omp-low.c (expand_omp_target): Handle GF_OMP_TARGET_KIND_OACC_DECLARE and BUILT_IN_GOACC_DECLARE. (build_omp_regions_1): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
(lower_omp_target): Handle GF_OMP_TARGET_KIND_OACC_DECLARE, GOMP_MAP_DEVICE_RESIDENT and GOMP_MAP_LINK. (make_gimple_omp_edges): Handle GF_OMP_TARGET_KIND_OACC_DECLARE. gcc/testsuite * c-c++-common/goacc/declare-1.c: New test. * c-c++-common/goacc/declare-2.c: Likewise. include/ * gomp-constants.h (enum gomp_map_kind): Add GOMP_MAP_DEVICE_RESIDENT and GOMP_MAP_LINK. libgomp/ * libgomp.map (GOACC_2.0.1): Export GOACC_declare. * oacc-parallel.c (GOACC_declare): New function. * testsuite/libgomp.oacc-c-c++-common/declare-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/declare-5.c: Likewise. diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c index ac11838..cd0cc27 100644 --- a/gcc/c-family/c-pragma.c +++ b/gcc/c-family/c-pragma.c @@ -1207,6 +1207,7 @@ static const struct omp_pragma_def oacc_pragmas[] = { { "atomic", PRAGMA_OACC_ATOMIC }, { "cache", PRAGMA_OACC_CACHE }, { "data", PRAGMA_OACC_DATA }, + { "declare", PRAGMA_OACC_DECLARE }, { "enter", PRAGMA_OACC_ENTER_DATA }, { "exit", PRAGMA_OACC_EXIT_DATA }, { "kernels", PRAGMA_OACC_KERNELS }, diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h index 953c4e3..c6a2981 100644 --- a/gcc/c-family/c-pragma.h +++ b/gcc/c-family/c-pragma.h @@ -30,6 +30,7 @@ enum pragma_kind { PRAGMA_OACC_ATOMIC, PRAGMA_OACC_CACHE, PRAGMA_OACC_DATA, + PRAGMA_OACC_DECLARE, PRAGMA_OACC_ENTER_DATA, PRAGMA_OACC_EXIT_DATA, PRAGMA_OACC_KERNELS, @@ -151,6 +152,7 @@ enum pragma_omp_clause { PRAGMA_OACC_CLAUSE_CREATE, PRAGMA_OACC_CLAUSE_DELETE, PRAGMA_OACC_CLAUSE_DEVICEPTR, + PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT, PRAGMA_OACC_CLAUSE_GANG, PRAGMA_OACC_CLAUSE_HOST, PRAGMA_OACC_CLAUSE_INDEPENDENT, @@ -175,7 +177,8 @@ enum pragma_omp_clause { PRAGMA_OACC_CLAUSE_FIRSTPRIVATE = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE, PRAGMA_OACC_CLAUSE_IF = PRAGMA_OMP_CLAUSE_IF, PRAGMA_OACC_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE, - PRAGMA_OACC_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION + PRAGMA_OACC_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION, + 
PRAGMA_OACC_CLAUSE_LINK = PRAGMA_OMP_CLAUSE_LINK }; extern struct cpp_reader* parse_in; diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 23d0107..8edf745 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -1231,6 +1231,7 @@ static vec *c_parser_expr_list (c_parser *, bool, bool, vec **, location_t *, tree *, vec *, unsigned int * = NULL); +static void c_parser_oacc_declare (c_parser *); static void c_parser_oacc_enter_exit_data (c_parser *, bool); static void c_parser_oacc_update (c_parser *); static void c_parser_omp_construct (c_parser *); @@ -9697,6 +
[PATCH, 4/16] Implement -foffload-alias
On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

 1. Insert new exit block only when needed in transform_to_exit_first_loop_alt
 2. Make create_parallel_loop return void
 3. Ignore reduction clause on kernels directive
 4. Implement -foffload-alias
 5. Add in_oacc_kernels_region in struct loop
 6. Add pass_oacc_kernels
 7. Add pass_dominator_oacc_kernels
 8. Add pass_ch_oacc_kernels
 9. Add pass_parallelize_loops_oacc_kernels
10. Add pass_oacc_kernels pass group in passes.def
11. Update testcases after adding kernels pass group
12. Handle acc loop directive
13. Add c-c++-common/goacc/kernels-*.c
14. Add gfortran.dg/goacc/kernels-*.f95
15. Add libgomp.oacc-c-c++-common/kernels-*.c
16. Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a patch
that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.

This patch addresses the problem that once the offloading region has been
split off from the original function, alias analysis can no longer use
information available in the original function that would allow it to do a
more precise analysis for the offloading function.

[ At some point we could use fipa-pta for that, as discussed in PR46032,
but that's not feasible now. ]

The basic idea behind the patch is that for typical usage, the base
pointers used in an offloaded region are non-aliasing.  The patch works by
adding restrict to the types of the fields used to pass data to an
offloading region.

The patch implements a new option -foffload-alias=.
The option -foffload-alias=none instructs the compiler to assume that
object references and pointer dereferences in an offload region do not
alias.

The option -foffload-alias=pointer instructs the compiler to assume that
object references in an offload region do not alias.

The option -foffload-alias=all instructs the compiler to make no
assumptions about aliasing in offload regions.

The default value is -foffload-alias=none.

Thanks,
- Tom

Implement -foffload-alias

2015-11-03  Tom de Vries

	* common.opt (foffload-alias): New option.
	* flag-types.h (enum offload_alias): New enum.
	* omp-low.c (install_var_field): Handle flag_offload_alias.
	* doc/invoke.texi (@item Code Generation Options): Add
	-foffload-alias.
	(@item -foffload-alias): New item.

	* c-c++-common/goacc/kernels-loop-offload-alias-none.c: New test.
	* c-c++-common/goacc/kernels-loop-offload-alias-ptr.c: New test.
---
 gcc/common.opt                                     | 16 ++
 gcc/doc/invoke.texi                                | 11 
 gcc/flag-types.h                                   |  7 +++
 gcc/omp-low.c                                      | 28 +-
 .../goacc/kernels-loop-offload-alias-none.c        | 61 ++
 .../goacc/kernels-loop-offload-alias-ptr.c         | 44 
 6 files changed, 165 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 961a1b6..7135b1a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1735,6 +1735,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-alias=
+Common Joined RejectNegative Enum(offload_alias) Var(flag_offload_alias) Init(OFFLOAD_ALIAS_NONE)
+-foffload-alias=[all|pointer|none]	Assume non-aliasing in an offload region
+
+Enum
+Name(offload_alias) Type(enum offload_alias) UnknownError(unknown offload aliasing %qs)
+
+EnumValue
+Enum(offload_alias) String(all) Value(OFFLOAD_ALIAS_ALL)
+
+EnumValue
+Enum(offload_alias) String(pointer) Value(OFFLOAD_ALIAS_POINTER)
+
+EnumValue
+Enum(offload_alias) String(none) Value(OFFLOAD_ALIAS_NONE)
+
 fomit-frame-pointer
 Common Report Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2e5953b..6928efd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1143,6 +1143,7 @@ See S/390 and zSeries Options.
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{} @gol
 -fno-common  -fno-ident @gol
+-foffload-alias=@r{[}none@r{|}pointer@r{|}all@r{]
Re: [PATCH], Add power9 support to GCC, patch #1 (revised)
On Sun, Nov 8, 2015 at 4:33 PM, Michael Meissner wrote:
> This is patch #1 that I revised.  I changed -mfusion-toc to -mtoc-fusion.
> I changed the references to ISA 2.08 to 3.0.  I added two new debug
> switches for code in future patches that is undergoing development and is
> not ready to be on by default.
>
> I have done a bootstrap build on a little endian power8 system and there
> were no regressions in this patch.  Is it ok to install in the trunk?
>
> 2015-11-08  Michael Meissner
>
>	* config/rs6000/rs6000.opt (-mpower9-fusion): Add new switches for
>	ISA 3.0 (power9).
>	(-mpower9-vector): Likewise.
>	(-mpower9-dform): Likewise.
>	(-mpower9-minmax): Likewise.
>	(-mtoc-fusion): Likewise.
>	(-mmodulo): Likewise.
>	(-mfloat128-hardware): Likewise.
>
>	* config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Add option
>	mask for ISA 3.0 (power9).
>	(POWERPC_MASKS): Add new ISA 3.0 switches.
>	(power9 cpu): Add power9 cpu.
>
>	* config/rs6000/rs6000.h (ASM_CPU_POWER9_SPEC): Add support for
>	power9.
>	(ASM_CPU_SPEC): Likewise.
>	(EXTRA_SPECS): Likewise.
>
>	* config/rs6000/rs6000-opts.h (enum processor_type): Add
>	PROCESSOR_POWER9.
>
>	* config/rs6000/rs6000.c (power9_cost): Initial cost setup for
>	power9.
>	(rs6000_debug_reg_global): Add support for power9 fusion.
>	(rs6000_setup_reg_addr_masks): Cache mode size.
>	(rs6000_option_override_internal): Until real power9 tuning is
>	added, use -mtune=power8 for -mcpu=power9.
>	(rs6000_setup_reg_addr_masks): Do not allow pre-increment,
>	pre-decrement, or pre-modify on SFmode/DFmode if we allow the use
>	of Altivec registers.
>	(rs6000_option_override_internal): Add support for ISA 3.0
>	switches.
>	(rs6000_loop_align): Add support for power9 cpu.
>	(rs6000_file_start): Likewise.
>	(rs6000_adjust_cost): Likewise.
>	(rs6000_issue_rate): Likewise.
>	(insn_must_be_first_in_group): Likewise.
>	(insn_must_be_last_in_group): Likewise.
>	(force_new_group): Likewise.
>	(rs6000_register_move_cost): Likewise.
>	(rs6000_opt_masks): Likewise.
>
>	* config/rs6000/rs6000.md (cpu attribute): Add power9.
>	* config/rs6000/rs6000-tables.opt: Regenerate.
>
>	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
>	_ARCH_PWR9 if power9 support is available.
>
>	* config/rs6000/aix61.h (ASM_CPU_SPEC): Add power9.
>	* config/rs6000/aix53.h (ASM_CPU_SPEC): Likewise.
>
>	* configure.ac: Determine if the assembler supports the ISA 3.0
>	instructions.
>	* config.in (HAVE_AS_POWER9): Likewise.
>	* configure: Regenerate.
>
>	* doc/invoke.texi (RS/6000 and PowerPC Options): Document ISA 3.0
>	switches.

Okay.

Thanks, David
Re: [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions)
On Sun, Nov 8, 2015 at 4:36 PM, Michael Meissner wrote:
> This is patch #2.  It adds support for the new modulus instructions that
> are being added in ISA 3.0 (power9):
>
> I have built this patch (along with patches #3 and #4) with a bootstrap
> build on a power8 little endian system.  There were no regressions in the
> test suite.  Is this patch ok to install in the trunk once patch #1 has
> been installed?
>
> [gcc]
> 2015-11-08  Michael Meissner
>
>	* config/rs6000/rs6000.c (rs6000_rtx_costs): Update costs for
>	modulus instructions if we have hardware support.
>
>	* config/rs6000/rs6000.md (mod<mode>3): Add support for ISA 3.0
>	modulus instructions.
>	(umod<mode>3): Likewise.
>	(divmod peephole): Likewise.
>	(udivmod peephole): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner
>
>	* lib/target-supports.exp (check_p9vector_hw_available): Add
>	checks for power9 availability.
>	(check_effective_target_powerpc_p9vector_ok): Likewise.
>	(check_vect_support_and_set_flags): Likewise.
>
>	* gcc.target/powerpc/mod-1.c: New test.
>	* gcc.target/powerpc/mod-2.c: Likewise.

This is okay, but let's wait for revised #3 since you tested 2, 3, 4
together.

Thanks, David
[PATCH 0/6] Automatically use vector optabs
The main goal of this series is to allow functions to be vectorised simply by defining the associated optab. At the moment you can get a scalar square root instruction by defining an md pattern like sqrtdf2. But if you want to have vectorised sqrt, you need to have a target- specific C-level built-in function for the vector form of sqrt, implement TARGET_BUILTIN_VECTORIZED_FUNCTION, and expand the sqrt in the same way that the target expands other directly-called built-in functions. That seems unnecessarily indirect, especially when in practice those target-specific functions tend to use patterns like sqrtv2df2 anyway. It also means GCC has less idea what the vector function is doing. The series uses the combined_fn enum and internal functions from the patches I posted on Saturday to make the vectoriser pick up vector optabs automatically and represent the result as a vector internal function call. For example, we can convert BUILT_IN_SQRT{F,,L} or a scalar IFN_SQRT into a vector IFN_SQRT if the appropriate sqrtM2 optab is defined. Tested on x86_64-linux-gnu, aarch64-linux-gnu, arm-linux-gnu and powerpc64-linux-gnu. Thanks, Richard
Re: [OpenACC] declare directive
On Mon, Nov 09, 2015 at 10:01:32AM -0600, James Norris wrote:
> +  if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))

Here you only look up "omp declare target", not "omp declare target link".
So, what happens if you mix that (once in some copy clause, once in link),
or mention twice in link, etc.?  Needs testsuite coverage and clear rules.

> +	  DECL_ATTRIBUTES (decl) =
> +	    tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (decl));

Incorrect formatting, = goes already on the following line, no whitespace
at end of line, and next line is indented below CL from DECL.

> +	      t = build_omp_clause (OMP_CLAUSE_LOCATION (c) , OMP_CLAUSE_MAP);

Wrong formatting, no space before ,.

> +  if (ret_clauses)
> +    {
> +      tree fndecl = current_function_decl;
> +      tree attrs = lookup_attribute ("oacc declare returns",
> +				     DECL_ATTRIBUTES (fndecl));

Why do you use an attribute for this?  I think adding the automatic vars
to hash_map during gimplification of the OACC_DECLARE is best.

> +      tree id = get_identifier ("oacc declare returns");
> +      DECL_ATTRIBUTES (fndecl) =
> +	tree_cons (id, ret_clauses, DECL_ATTRIBUTES (fndecl));

Formatting error.
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -1065,6 +1065,7 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
>    gimple_seq body, cleanup;
>    gcall *stack_save;
>    location_t start_locus = 0, end_locus = 0;
> +  tree ret_clauses = NULL;
>
>    tree temp = voidify_wrapper_expr (bind_expr, NULL);
>
> @@ -1166,9 +1167,56 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
> 	      clobber_stmt = gimple_build_assign (t, clobber);
> 	      gimple_set_location (clobber_stmt, end_locus);
> 	      gimplify_seq_add_stmt (&cleanup, clobber_stmt);
> +
> +	      if (flag_openacc)
> +		{
> +		  tree attrs = lookup_attribute ("oacc declare returns",
> +				     DECL_ATTRIBUTES (current_function_decl));
> +		  tree clauses, c, c_next = NULL, c_prev = NULL;
> +
> +		  if (!attrs)
> +		    break;
> +
> +		  clauses = TREE_VALUE (attrs);
> +
> +		  for (c = clauses; c; c_prev = c, c = c_next)
> +		    {
> +		      c_next = OMP_CLAUSE_CHAIN (c);
> +
> +		      if (t == OMP_CLAUSE_DECL (c))
> +			{
> +			  if (ret_clauses)
> +			    OMP_CLAUSE_CHAIN (c) = ret_clauses;
> +
> +			  ret_clauses = c;
> +
> +			  if (c_prev == NULL)
> +			    clauses = c_next;
> +			  else
> +			    OMP_CLAUSE_CHAIN (c_prev) = c_next;
> +			}
> +		    }

This doesn't really scale.  Consider 1 clauses on various oacc declare
constructs in a single function, and 100 automatic variables in such a
function.  So, what I'm suggesting is during gimplification of
OACC_DECLARE, if you find a clause on an automatic variable in the current
function that you want to unmap afterwards, have a

static hash_map<tree, tree> *oacc_declare_returns;

and you just add into the hash map the VAR_DECL -> the clause you want,
then in this spot you check

if (oacc_declare_returns)
  {
    clause = lookup in hash_map (t);
    if (clause)
      {
	...
      }
  }

> +
> +	      if (clauses == NULL)
> +		{
> +		  DECL_ATTRIBUTES (current_function_decl) =
> +		    remove_attribute ("oacc declare returns",
> +				      DECL_ATTRIBUTES (current_function_decl));

Wrong formatting.

	Jakub
[PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c
In practice all targets that can vectorise sqrt define the appropriate
sqrt<mode>2 optab.  The only case where this isn't immediately obvious is
the libmass support in rs6000.c, but Mike Meissner said that it shouldn't
be exercised for sqrt.

This patch therefore uses the internal function interface instead of going
via the target hook.

gcc/
	* tree-vect-patterns.c: Include internal-fn.h.
	(vect_recog_pow_pattern): Use IFN_SQRT instead of BUILT_IN_SQRT*.

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index bab9a4f..a803e8c 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
 #include "builtins.h"
+#include "internal-fn.h"
 #include "case-cfn-macros.h"
 
 /* Pattern recognition functions  */
@@ -1052,18 +1053,13 @@ vect_recog_pow_pattern (vec *stmts, tree *type_in,
   if (TREE_CODE (exp) == REAL_CST
       && real_equal (&TREE_REAL_CST (exp), &dconsthalf))
     {
-      tree newfn = mathfn_built_in (TREE_TYPE (base), BUILT_IN_SQRT);
       *type_in = get_vectype_for_scalar_type (TREE_TYPE (base));
-      if (*type_in)
+      if (*type_in && direct_internal_fn_supported_p (IFN_SQRT, *type_in))
	{
-	  gcall *stmt = gimple_build_call (newfn, 1, base);
-	  if (vectorizable_function (stmt, *type_in, *type_in)
-	      != NULL_TREE)
-	    {
-	      var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
-	      gimple_call_set_lhs (stmt, var);
-	      return stmt;
-	    }
+	  gcall *stmt = gimple_build_call_internal (IFN_SQRT, 1, base);
+	  var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
+	  gimple_call_set_lhs (stmt, var);
+	  return stmt;
	}
     }