[ping] Change behavior of -fsched-verbose option

2015-11-05 Thread Nikolai Bozhenov

Please commit if it is OK for trunk.

Thanks,
Nikolai

On 10/22/2015 06:38 PM, Nikolai Bozhenov wrote:

Hi!

Currently the -fsched-verbose option redirects debugging dumps to stderr
if there is no dump_file for the current pass. That would be fine if
there were only one scheduling pass, but on AArch64, for example, there
are three scheduling passes in the default pipeline: sched1, fusion and
sched2. So when passing options like -fsched-verbose=7 -fdump-rtl-sched1
to GCC, I get a neat dump for sched1 in the file and a mess of
fusion/sched2 logs on the console.

In fact, there is currently no way to tell GCC that I want extremely
verbose logs for sched1 and no logs at all for the other passes,
especially not on the console.

I suggest disabling such redirection in the scheduler and omitting
debugging output for passes without a dump_file. I believe a better
way to redirect printing to stderr is to use options like
-fdump-rtl-sched=stderr. The attached patch implements this
suggestion.

If it is really necessary, the old behavior may still be reproduced
with the options -fsched-verbose=7 -fdump-rtl-sched1
-fdump-rtl-{sched_fusion,sched2}=stderr.

The patch has been bootstrapped and regtested on x86_64.

Thanks,
Nikolai


2015-10-22  Nikolai Bozhenov  

	* haifa-sched.c (setup_sched_dump): Don't redirect output to stderr.
	* common.opt (-fsched-verbose): Set default value to 1.
	* invoke.texi (-fsched-verbose): Update the option's description.

diff --git a/gcc/common.opt b/gcc/common.opt
index 961a1b6..757ce85 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1967,17 +1967,17 @@ fsched-spec-load
 Common Report Var(flag_schedule_speculative_load) Optimization
 Allow speculative motion of some loads.
 
 fsched-spec-load-dangerous
 Common Report Var(flag_schedule_speculative_load_dangerous) Optimization
 Allow speculative motion of more loads.
 
 fsched-verbose=
-Common RejectNegative Joined UInteger Var(sched_verbose_param)
+Common RejectNegative Joined UInteger Var(sched_verbose_param) Init(1)
 -fsched-verbose=	Set the verbosity level of the scheduler.
 
 fsched2-use-superblocks
 Common Report Var(flag_sched2_use_superblocks) Optimization
 If scheduling post reload, do superblock scheduling.
 
 fsched2-use-traces
 Common Ignore
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4fc7d88..11b8697 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7415,22 +7415,17 @@ place unique stamps in coverage data files and the object files that
 produce them.  You can use the @option{-frandom-seed} option to produce
 reproducibly identical object files.
 
 The @var{number} should be different for every file you compile.
 
 @item -fsched-verbose=@var{n}
 @opindex fsched-verbose
 On targets that use instruction scheduling, this option controls the
-amount of debugging output the scheduler prints.  This information is
-written to standard error, unless @option{-fdump-rtl-sched1} or
-@option{-fdump-rtl-sched2} is specified, in which case it is output
-to the usual dump listing file, @file{.sched1} or @file{.sched2}
-respectively.  However for @var{n} greater than nine, the output is
-always printed to standard error.
+amount of debugging output the scheduler prints to the dump files.
 
 For @var{n} greater than zero, @option{-fsched-verbose} outputs the
 same information as @option{-fdump-rtl-sched1} and @option{-fdump-rtl-sched2}.
 For @var{n} greater than one, it also output basic block probabilities,
 detailed ready list information and unit/insn info.  For @var{n} greater
 than two, it includes RTL at abort point, control-flow and regions info.
 And for @var{n} over four, @option{-fsched-verbose} also includes
 dependence info.
diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index caadc11..835648b 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -198,27 +198,24 @@ static int modulo_iter0_max_uid;
 static int modulo_backtracks_left;
 
 /* The stage in which the last insn from the original loop was
scheduled.  */
 static int modulo_last_stage;
 
 /* sched-verbose controls the amount of debugging output the
scheduler prints.  It is controlled by -fsched-verbose=N:
-   N>0 and no -DSR : the output is directed to stderr.
-   N>=10 will direct the printouts to stderr (regardless of -dSR).
-   N=1: same as -dSR.
+   N=0: no debugging output.
+   N=1: default value.
N=2: bb's probabilities, detailed ready list info, unit/insn info.
N=3: rtl at abort point, control-flow, regions info.
N=5: dependences info.  */
-
 int sched_verbose = 0;
 
-/* Debugging file.  All printouts are sent to dump, which is always set,
-   either to stderr, or to the dump listing file (-dRS).  */
+/* Debugging file.  All printouts are sent to dump. */
 FILE *sched_dump = 0;
 
 /* This is a placeholder for the scheduler parameters common
to all schedulers.  */
 struct common_sched_info_def *common_sched_info;
 
 #define INSN_TICK(INSN)	(HID (INSN)->tick)
 #define INSN_EXACT_TICK(INSN) (HID (INSN)->exact_tick)
@@

[sched] Dump dependency graph to a dot file

2015-11-05 Thread Nikolai Bozhenov

Hi!

The attached patch adds a procedure to dump the scheduler's dependency
graph into a dot file. The patch has been bootstrapped and regtested
on x86_64. Please commit if it is OK for trunk.
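For readers unfamiliar with Graphviz record nodes, here is a minimal standalone sketch of the kind of output the patch produces. The helper names escape_record_label and emit_node are mine, not from the patch; in the patch itself the label escaping is done by pp_write_text_as_dot_label_to_stream.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Escape characters that are special inside a dot record label:
   braces, bars, angle brackets, quotes and backslashes.  */
static void
escape_record_label (const char *in, char *out, size_t outsz)
{
  size_t j = 0;
  for (const char *p = in; *p && j + 2 < outsz; p++)
    {
      if (strchr ("{}|<>\"\\", *p))
        out[j++] = '\\';
      out[j++] = *p;
    }
  out[j] = '\0';
}

/* Emit one record-shaped node for an insn, with its text in the top
   field and some attributes (uid, priority) in the bottom fields,
   mirroring the layout used by dump_rgn_dependencies_dot.  */
static void
emit_node (FILE *f, int uid, const char *insn_text, int prio)
{
  char buf[256];
  escape_record_label (insn_text, buf, sizeof buf);
  fprintf (f, "\t%d [label=\"{%s|{ uid:%d | prio:%d }}\",shape=record]\n",
           uid, buf, uid, prio);
}
```

Feeding a file of such nodes (wrapped in a `digraph { ... }` with one `subgraph cluster_block_N` per basic block, as the patch does) to `dot -Tsvg` renders each block as a cluster and each insn as a record node.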

Thanks,
Nikolai
2015-11-04  Nikolai Bozhenov  

	* sched-int.h (dump_rgn_dependencies_dot): Declare
	* sched-rgn.c (dump_rgn_dependencies_dot): New function
	* print-rtl.h (print_insn): Add prototype

diff --git a/gcc/print-rtl.h b/gcc/print-rtl.h
index eb079af..f601d33 100644
--- a/gcc/print-rtl.h
+++ b/gcc/print-rtl.h
@@ -25,12 +25,14 @@ extern void print_rtl (FILE *, const_rtx);
 #endif
 
 extern void dump_value_slim (FILE *, const_rtx, int);
 extern void dump_insn_slim (FILE *, const rtx_insn *);
 extern void dump_rtl_slim (FILE *, const rtx_insn *, const rtx_insn *,
 			   int, int);
 extern void print_value (pretty_printer *, const_rtx, int);
 extern void print_pattern (pretty_printer *, const_rtx, int);
+extern void print_insn (pretty_printer *pp, const rtx_insn *x, int verbose);
+
 extern void rtl_dump_bb_for_graph (pretty_printer *, basic_block);
 extern const char *str_pattern_slim (const_rtx);
 
 #endif  // GCC_PRINT_RTL_H
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 86f5821..4600347 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -1492,16 +1492,19 @@ extern void sched_rgn_local_finish (void);
 extern void sched_rgn_local_free (void);
 extern void extend_regions (void);
 extern void rgn_make_new_region_out_of_new_block (basic_block);
 
 extern void compute_priorities (void);
 extern void increase_insn_priority (rtx_insn *, int);
 extern void debug_rgn_dependencies (int);
 extern void debug_dependencies (rtx_insn *, rtx_insn *);
+extern void dump_rgn_dependencies_dot (FILE *);
+extern void dump_rgn_dependencies_dot (const char *);
+
 extern void free_rgn_deps (void);
 extern int contributes_to_priority (rtx_insn *, rtx_insn *);
 extern void extend_rgns (int *, int *, sbitmap, int *);
 extern void deps_join (struct deps_desc *, struct deps_desc *);
 
 extern void rgn_setup_common_sched_info (void);
 extern void rgn_setup_sched_infos (void);
 
diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
index eafb3fd..d744d83 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -58,16 +58,18 @@ along with GCC; see the file COPYING3.  If not see
 #include "insn-attr.h"
 #include "except.h"
 #include "params.h"
 #include "cfganal.h"
 #include "sched-int.h"
 #include "sel-sched.h"
 #include "tree-pass.h"
 #include "dbgcnt.h"
+#include "pretty-print.h"
+#include "print-rtl.h"
 
 #ifdef INSN_SCHEDULING
 
 /* Some accessor macros for h_i_d members only used within this file.  */
 #define FED_BY_SPEC_LOAD(INSN) (HID (INSN)->fed_by_spec_load)
 #define IS_LOAD_INSN(INSN) (HID (insn)->is_load_insn)
 
 /* nr_inter/spec counts interblock/speculative motion for the function.  */
@@ -2855,16 +2857,118 @@ void debug_dependencies (rtx_insn *head, rtx_insn *tail)
 		   DEP_NONREG (dep) ? "n" : "",
 		   DEP_MULTIPLE (dep) ? "m" : "");
   }
   fprintf (sched_dump, "\n");
 }
 
   fprintf (sched_dump, "\n");
 }
+
+/* Dump dependency graph for the current region to a file using dot syntax. */
+
+void
+dump_rgn_dependencies_dot (FILE *file)
+{
+  rtx_insn *head, *tail, *con, *pro;
+  sd_iterator_def sd_it;
+  dep_t dep;
+  int bb;
+  pretty_printer pp;
+
+  pp.buffer->stream = file;
+  pp_printf (&pp, "digraph SchedDG {\n");
+
+  for (bb = 0; bb < current_nr_blocks; ++bb)
+{
+  // begin subgraph (basic block)
+  pp_printf (&pp, "subgraph cluster_block_%d {\n", bb);
+  pp_printf (&pp, "\t" "color=blue;" "\n");
+  pp_printf (&pp, "\t" "style=bold;" "\n");
+  pp_printf (&pp, "\t" "label=\"BB #%d\";\n", BB_TO_BLOCK (bb));
+
+  // setup head and tail (no support for EBBs)
+  gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
+  get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
+  tail = NEXT_INSN (tail);
+
+  // dump all insns
+  for (con = head; con != tail; con = NEXT_INSN (con))
+	{
+	  if (!INSN_P (con))
+	continue;
+
+	  // pretty print the insn
+	  pp_printf (&pp, "\t%d [label=\"{", INSN_UID (con));
+	  pp_write_text_to_stream (&pp);
+	  print_insn (&pp, con, /*verbose=*/false);
+	  pp_write_text_as_dot_label_to_stream (&pp, /*for_record=*/true);
+	  pp_write_text_to_stream (&pp);
+
+	  // dump instruction attributes
+	  pp_printf (&pp, "|{ uid:%d | luid:%d | prio:%d }}\",shape=record]\n",
+		 INSN_UID (con), INSN_LUID (con), INSN_PRIORITY (con));
+
+	  /* dump all deps */
+	  FOR_EACH_DEP (con, SD_LIST_BACK, sd_it, dep)
+	{
+	  int weight = 0;
+	  const char *color;
+	  pro = DEP_PRO (dep);
+
+	  switch (DEP_TYPE (dep))
+		{
+		case REG_DEP_TRUE:
+		  color = "black";
+		  weight = 1;
+		  break;
+		case REG_DEP_OUTPUT:
+		case REG_DEP_ANTI:
+		  color = "orange";
+		  break;
+		case REG_DEP_CONTROL:
+		  color = "blue";
+		  break;
+		default:
+		  gcc_unreachable ();
+		}
+
+	  pp_printf (&pp, "\t%d -> %d [color

Re: [sched] Dump dependency graph to a dot file

2015-11-05 Thread Kyrill Tkachov

Hi Nikolai,

On 05/11/15 08:29, Nikolai Bozhenov wrote:

Hi!

The attached patch adds a procedure to dump the scheduler's dependency
graph into a dot file. The patch has been bootstrapped and regtested
on x86_64. Please commit if it is OK for trunk.

Thanks,
Nikolai


A couple of style nits.

+  // begin subgraph (basic block)
+  pp_printf (&pp, "subgraph cluster_block_%d {\n", bb);
+  pp_printf (&pp, "\t" "color=blue;" "\n");
+  pp_printf (&pp, "\t" "style=bold;" "\n");
+  pp_printf (&pp, "\t" "label=\"BB #%d\";\n", BB_TO_BLOCK (bb));
+
+  // setup head and tail (no support for EBBs)
+  gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
+  get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
+  tail = NEXT_INSN (tail);
+
+  // dump all insns
+  for (con = head; con != tail; con = NEXT_INSN (con))
+   {
+ if (!INSN_P (con))
+   continue;
+
+ // pretty print the insn
+ pp_printf (&pp, "\t%d [label=\"{", INSN_UID (con));

Please use C-style comments, i.e. /* */ instead of //.
Also, throughout the comments leave two spaces after a full stop,
i.e. /* .  */.

You can use the check_GNU_style.sh script in the contrib/ directory on
your patches to highlight similar issues. For example:
$ ./contrib/check_GNU_style.sh ~/patches/dep-graph.patch

Dot, space, space, end of comment.
83:+/* Dump dependency graph for the current region to a file using dot syntax. */
166:+/* Dump dependency graph for the current region to a file using dot syntax. */

Sentences should end with a dot.  Dot, space, space, end of the comment.
127:+  /* dump all deps */

Cheers,
Kyrill



Re: [PATCH v3 1/2] [PR debug/67192] Fix C loops' back-jump location

2015-11-05 Thread Bernd Schmidt

On 11/04/2015 05:17 PM, Andreas Arnez wrote:


gcc/c/ChangeLog:

PR debug/67192
* c-parser.c (c_parser_while_statement): Finish the loop before
parsing ahead for misleading indentation.
(c_parser_for_statement): Likewise.


This is OK. Does C++ have similar issues?


Bernd



Re: [PATCH 1/4][AArch64] Add scheduling and cost models for Exynos M1

2015-11-05 Thread James Greenhalgh
On Wed, Nov 04, 2015 at 05:21:03PM -0600, Evandro Menezes wrote:
> Please, ignore the previous patch.  This is the intended patch.
> 
> Sorry.
> 
> -- 
> Evandro Menezes
> 
> On 11/04/2015 05:18 PM, Evandro Menezes wrote:
> >This patch adds extra tuning information about AArch64 targets:
> >
> > * Maximum number of case values before resorting to a jump table
> >   The default values assumed independently of the specific backends
> >   may be rather low for modern processors, which sport quite efficient
> >   direct branch prediction, whereas indirect branch prediction is
> >   still typically not so efficient.  This value may be specifically
> >   set for a processor or left at zero to use the default values.
> > * L1 cache line size
> >   The auto-prefetcher uses this information when emitting software
> >   prefetch insns.
> >
> >Please, commit if it's alright.
> >
> >Thank you,
> >
> 

Thanks for the patch,

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 81792bc..ecf4685 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -195,6 +195,9 @@ struct tune_params
>int vec_reassoc_width;
>int min_div_recip_mul_sf;
>int min_div_recip_mul_df;
> +  int max_case_values; /* Case values threshold; or 0 for the default.  */

If we're using an int, how about -1 as the sentinel value? (Maybe someone
really likes jump tables!). Otherwise, make this an unsigned int?

> +
> +  int cache_line_size; /* Cache line size; or 0 for the default.  */

unsigned int?

The patch is otherwise OK, though it needs a ChangeLog.

Thanks,
James
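As an aside, the zero-as-default convention being reviewed here can be sketched as follows. The struct and function names are hypothetical, for illustration only; the real code lives in tune_params in aarch64-protos.h.

```c
#include <assert.h>

/* Sketch of per-processor tuning fields where 0 acts as a sentinel
   meaning "fall back to the target-independent default".  */
struct tune_params_sketch
{
  int max_case_values;   /* Case-values threshold; 0 for the default.  */
  int cache_line_size;   /* Cache line size; 0 for the default.  */
};

/* Return the case-values threshold to use: the processor-specific
   override if set, otherwise the generic default.  */
static int
effective_case_values_threshold (const struct tune_params_sketch *tune,
                                 int target_default)
{
  return tune->max_case_values != 0 ? tune->max_case_values
                                    : target_default;
}
```

The review point above is that with a signed int field, 0 cannot express "never use jump tables", which is why -1 (or an unsigned type) is suggested as the sentinel instead.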



Re: [ping] Change behavior of -fsched-verbose option

2015-11-05 Thread Bernd Schmidt

On 11/05/2015 09:11 AM, Nikolai Bozhenov wrote:

Please commit if it is OK for trunk.


Hmm, do you have a copyright assignment on file?


Bernd



[PATCH][ARM][cleanup] Remove uses of CONST_DOUBLE_HIGH/LOW

2015-11-05 Thread Kyrill Tkachov

Hi all,

This cleanup patch removes handling of CONST_DOUBLE rtxes that carry
large integers. These should never be passed down from the midend, and
the arm backend doesn't create them. The code has been there since 2007,
but the arm backend was moved to TARGET_SUPPORTS_WIDE_INT in 2014, so
this path should never be taken.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

commit 8a1c27b21e6662c3203b95c457c0df5c0239805e
Author: Kyrylo Tkachov 
Date:   Thu Oct 29 11:20:43 2015 +

[ARM][cleanup] Remove uses of CONST_DOUBLE_HIGH/LOW

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5c97786..36958af 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12386,32 +12386,15 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse,
 {
   rtx el = vector ? CONST_VECTOR_ELT (op, i) : op;
   unsigned HOST_WIDE_INT elpart;
-  unsigned int part, parts;
 
-  if (CONST_INT_P (el))
-{
-  elpart = INTVAL (el);
-  parts = 1;
-}
-  else if (CONST_DOUBLE_P (el))
-{
-  elpart = CONST_DOUBLE_LOW (el);
-  parts = 2;
-}
-  else
-gcc_unreachable ();
+  gcc_assert (CONST_INT_P (el));
+  elpart = INTVAL (el);
 
-  for (part = 0; part < parts; part++)
-{
-  unsigned int byte;
-  for (byte = 0; byte < innersize; byte++)
-{
-  bytes[idx++] = (elpart & 0xff) ^ invmask;
-  elpart >>= BITS_PER_UNIT;
-}
-  if (CONST_DOUBLE_P (el))
-elpart = CONST_DOUBLE_HIGH (el);
-}
+  for (unsigned int byte = 0; byte < innersize; byte++)
+	{
+	  bytes[idx++] = (elpart & 0xff) ^ invmask;
+	  elpart >>= BITS_PER_UNIT;
+	}
 }
 
   /* Sanity check.  */


Re: [Patch AArch64] Switch constant pools to separate rodata sections.

2015-11-05 Thread James Greenhalgh
On Wed, Nov 04, 2015 at 02:26:31PM +, Ramana Radhakrishnan wrote:
> 
> 
> On 03/11/15 17:09, James Greenhalgh wrote:
> > On Tue, Nov 03, 2015 at 04:36:45PM +, Ramana Radhakrishnan wrote:
> >> Hi,
> >>
> >>Now that PR63304 is fixed and we have an option to address
> >> any part of the memory using adrp / add or adrp / ldr instructions
> >> it makes sense to switch out literal pools into their own
> >> mergeable sections by default.
> >>
> >> This would mean that by default we could now start getting
> >> the benefits of constant sharing across the board, potentially
> >> improving code size. The other advantage of doing so, for the
> >> security conscious is that this prevents intermingling of literal
> >> pools with code.
> >>
> >> Wilco's kindly done some performance measurements and suggests that
> >> there is not really a performance regression in doing this.
> >> I've looked at the code size for SPEC2k6 today at -Ofast and
> >> in general there is a good code size improvement as expected
> >> by sharing said constants.
> >>
> >> Tested on aarch64-none-elf with no regressions and bootstrapped 
> >> and regression tested in my tree for a number of days now.
> >>
> >> Ok to commit ?
> > 
> > OK with the comment nits below fixed up.
> > 
> >>* config/aarch64/aarch64.c (aarch64_select_rtx_section): Switch
> >> to default section handling for non PC relative literal loads.
> > 
> >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> >> index 5c8604f..9d709e5 100644
> >> --- a/gcc/config/aarch64/aarch64.c
> >> +++ b/gcc/config/aarch64/aarch64.c
> >> @@ -5244,13 +5244,22 @@ aarch64_use_blocks_for_constant_p (machine_mode 
> >> mode ATTRIBUTE_UNUSED,
> >>return false;
> >>  }
> >>  
> >> +/* Force all constant pool entries into the current function section.
> > 
> > Is this comment accurate now? I think it only applied to -mcmodel=large
> > but maybe I'm misunderstanding?
> > 
> >> +   In the large model we cannot reliably address all the address space
> >> +   thus for now, inline this with the text.  */
> >>  static section *
> >> +aarch64_select_rtx_section (machine_mode mode,
> >> +  rtx x,
> >> +  unsigned HOST_WIDE_INT align)
> >> +{
> >> +  /* Force all constant pool entries into the current function section.
> >> + In the large model we cannot reliably address all the address space
> >> + thus for now, inline this with the text.  */
> >> +  if (!aarch64_nopcrelative_literal_loads
> >> +  || aarch64_cmodel == AARCH64_CMODEL_LARGE)
> >> +return function_section (current_function_decl);
> > 
> > This is just a copy paste of the text above (and probably the better place
> > for it).
> > 
> > I think we'd want a more general comment at the top of the function,
> > then this can stay.
> > 
> 
> True and I've just been reading more of the backend - We could now start
> using blocks for constant pools as well. So let's do that.
> 
> How does something like this look ?

One tiny nit. Otherwise, this is OK.


> Tested on aarch64-none-elf - no regressions.
> 
> 2015-11-04  Ramana Radhakrishnan  
> 
> * config/aarch64/aarch64.c
> (aarch64_can_use_per_function_literal_pools_p): New.
> (aarch64_use_blocks_for_constant_p): Adjust declaration
> and use aarch64_can_use_function_literal_pools_p.
> (aarch64_select_rtx_section): Update.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5c8604f..4f5e069 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5235,24 +5235,37 @@ aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
>return 0;
>  }
>  
> +/* Constant pools are per function only when PC relative
> +   literal loads are true or we are in the large memory
> +   model.  */

Newline here please.

Thanks,
James

> +static inline bool
> +aarch64_can_use_per_function_literal_pools_p (void)
> +{
> +  return (!aarch64_nopcrelative_literal_loads
> +   || aarch64_cmodel == AARCH64_CMODEL_LARGE);
> +}
> +



[PATCH] PR54224 Warn for unused internal procedures -- update testcase

2015-11-05 Thread Dominique d'Humières
Is there any objection to committing the following (see comments 21 and 22)?

Dominique

Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 229793)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,9 @@
+2015-11-05  Dominique d'Humieres 
+
+   PR fortran/54224
+   * gfortran.dg/warn_unused_function_2.f90: Add two new 
+   "defined but not used" subroutines.
+
 2015-11-05  Richard Biener  
 
* gcc.dg/tree-ssa/loadpre2.c: Avoid undefined behavior due to
Index: gcc/testsuite/gfortran.dg/warn_unused_function_2.f90
===
--- gcc/testsuite/gfortran.dg/warn_unused_function_2.f90(revision 
229793)
+++ gcc/testsuite/gfortran.dg/warn_unused_function_2.f90(working copy)
@@ -2,6 +2,7 @@
 ! { dg-options "-Wall" }
 !
 ! [4.8 Regression] PR 54997: -Wunused-function gives false warnings
+! PR 54224: missing warnings with -Wunused-function
 !
 ! Contributed by Janus Weil 
 
@@ -14,6 +15,9 @@
 
   subroutine s1! { dg-warning "defined but not used" }
 call s2(s3)
+contains
+  subroutine s4! { dg-warning "defined but not used" }
+  end subroutine
   end subroutine
 
   subroutine s2(dummy) ! { dg-warning "Unused dummy argument" }
@@ -30,5 +34,10 @@
 entry en
 end subroutine
 
+program test
+contains
+  subroutine s5! { dg-warning "defined but not used" }
+  end subroutine
+end
 
 ! { dg-final { cleanup-modules "m" } }



[gomp4, committed] expand oacc kernels region at same time as oacc parallel region

2015-11-05 Thread Tom de Vries

Hi,

now that we have committed -foffload-alias in gomp-4_0-branch
( https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00214.html ), we no
longer need the kernels region to be a part of the original function
when doing alias analysis.

So we no longer need to postpone splitting off the kernels region into
a separate function until after alias analysis; we can do it at the
same time as we expand the parallel region.


The following patch series implements that:

 1  Move expansion of kernels region back to first omp-expand
 2  Update gate_oacc_kernels to handle oacc function
 3  Revert "Add skip_stmt parm to pass_dominator::get_sese ()"
 4  Revert "Add pass_dominator::sese_mode_p ()"
 5  Handle oacc function in parloops
 6  Update goacc kernels C testcases
 7  Update goacc kernels Fortran testcases
 8  Release_defs in expand_omp_atomic_fetch_op
 9  Remove BUILT_IN_GOACC_KERNELS_INTERNAL

[ The patch series is broken up into logical bits, but intended as a
single commit. Various things in kernels support will be broken in
intermediate stages. ]


Committed to gomp-4_0-branch.

I'll post the patches in reply to this message.

Thanks,
- Tom


Re: [ping] Change behavior of -fsched-verbose option

2015-11-05 Thread Nikolai Bozhenov

On 11/05/2015 12:26 PM, Bernd Schmidt wrote:

On 11/05/2015 09:11 AM, Nikolai Bozhenov wrote:

Please commit if it is OK for trunk.


Hmm, do you have a copyright assignment on file?



Yes, I do have a copyright assignment for all past and future changes
to GCC (RT:828836).

Thanks,
Nikolai


Re: [PATCH/RFC/RFA] Machine modes for address printing (all targets)

2015-11-05 Thread Bernd Schmidt

On 11/04/2015 01:54 PM, Julian Brown wrote:

That's not all targets by any means, but may be enough to warrant a
change in the interface. I propose that:

* The output_address function should have a machine_mode argument
   added. Bare addresses (e.g. the 'a' case in final.c) should pass
   "VOIDmode" for this argument.

* Other callers of output_address -- actually all in backends -- can
   pass the machine mode for the memory access in question.

* The TARGET_PRINT_OPERAND_ADDRESS hook shall also have a machine_mode
   argument added. The legacy PRINT_OPERAND_ADDRESS hook can be left
   alone. (The documentation for the operand-printing hooks needs fixing
   too, incidentally.)


I think this approach seems fine.


 static void
-mcore_print_operand_address (FILE * stream, rtx x)
+mcore_print_operand_address (FILE * stream, machine_mode mode ATTRIBUTE_UNUSED,
+rtx x)


So apparently we're settling on writing the unused arg as just 
"machine_mode" without a name. Please change everywhere.



@@ -1754,7 +1754,7 @@ mmix_print_operand_punct_valid_p (unsign
 /* TARGET_PRINT_OPERAND_ADDRESS.  */

 static void
-mmix_print_operand_address (FILE *stream, rtx x)
+mmix_print_operand_address (FILE *stream, machine_mode mode, rtx x)
 {
   if (REG_P (x))
 {


The arg appears to be unused - I'd expect to see a warning here.

Other than that, it looks OK. I'm not going to require that you test
every target, but it would be good to have the full set built to cc1
before and after, and please be on the lookout for fallout.



Bernd


Re: [sched] Dump dependency graph to a dot file

2015-11-05 Thread Nikolai Bozhenov

On 11/05/2015 12:18 PM, Kyrill Tkachov wrote:

Hi Nikolai,

On 05/11/15 08:29, Nikolai Bozhenov wrote:

Hi!

The attached patch adds a procedure to dump the scheduler's dependency
graph into a dot file. The patch has been bootstrapped and regtested
on x86_64. Please commit if it is OK for trunk.

Thanks,
Nikolai


A couple of style nits.

+  // begin subgraph (basic block)
+  pp_printf (&pp, "subgraph cluster_block_%d {\n", bb);
+  pp_printf (&pp, "\t" "color=blue;" "\n");
+  pp_printf (&pp, "\t" "style=bold;" "\n");
+  pp_printf (&pp, "\t" "label=\"BB #%d\";\n", BB_TO_BLOCK (bb));
+
+  // setup head and tail (no support for EBBs)
+  gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
+  get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, 
&tail);

+  tail = NEXT_INSN (tail);
+
+  // dump all insns
+  for (con = head; con != tail; con = NEXT_INSN (con))
+{
+  if (!INSN_P (con))
+continue;
+
+  // pretty print the insn
+  pp_printf (&pp, "\t%d [label=\"{", INSN_UID (con));

Please use C-style comments, i.e. /* */ instead of //.
Also, throughout the comments leave two spaces after a full stop,
i.e. /* .  */.

You can use the check_GNU_style.sh script in the contrib/ directory on
your patches to highlight similar issues. For example:
$ ./contrib/check_GNU_style.sh ~/patches/dep-graph.patch

Dot, space, space, end of comment.
83:+/* Dump dependency graph for the current region to a file using dot syntax. */
166:+/* Dump dependency graph for the current region to a file using dot syntax. */


Sentences should end with a dot.  Dot, space, space, end of the comment.
127:+  /* dump all deps */

Cheers,
Kyrill



Thanks for your remarks, Kyrill!

I've uploaded an updated patch.

Nikolai
2015-11-04  Nikolai Bozhenov  

	* sched-int.h (dump_rgn_dependencies_dot): Declare
	* sched-rgn.c (dump_rgn_dependencies_dot): New function
	* print-rtl.h (print_insn): Add prototype

diff --git a/gcc/print-rtl.h b/gcc/print-rtl.h
index eb079af..f601d33 100644
--- a/gcc/print-rtl.h
+++ b/gcc/print-rtl.h
@@ -25,12 +25,14 @@ extern void print_rtl (FILE *, const_rtx);
 #endif
 
 extern void dump_value_slim (FILE *, const_rtx, int);
 extern void dump_insn_slim (FILE *, const rtx_insn *);
 extern void dump_rtl_slim (FILE *, const rtx_insn *, const rtx_insn *,
 			   int, int);
 extern void print_value (pretty_printer *, const_rtx, int);
 extern void print_pattern (pretty_printer *, const_rtx, int);
+extern void print_insn (pretty_printer *pp, const rtx_insn *x, int verbose);
+
 extern void rtl_dump_bb_for_graph (pretty_printer *, basic_block);
 extern const char *str_pattern_slim (const_rtx);
 
 #endif  // GCC_PRINT_RTL_H
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 86f5821..4600347 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -1492,16 +1492,19 @@ extern void sched_rgn_local_finish (void);
 extern void sched_rgn_local_free (void);
 extern void extend_regions (void);
 extern void rgn_make_new_region_out_of_new_block (basic_block);
 
 extern void compute_priorities (void);
 extern void increase_insn_priority (rtx_insn *, int);
 extern void debug_rgn_dependencies (int);
 extern void debug_dependencies (rtx_insn *, rtx_insn *);
+extern void dump_rgn_dependencies_dot (FILE *);
+extern void dump_rgn_dependencies_dot (const char *);
+
 extern void free_rgn_deps (void);
 extern int contributes_to_priority (rtx_insn *, rtx_insn *);
 extern void extend_rgns (int *, int *, sbitmap, int *);
 extern void deps_join (struct deps_desc *, struct deps_desc *);
 
 extern void rgn_setup_common_sched_info (void);
 extern void rgn_setup_sched_infos (void);
 
diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
index eafb3fd..d744d83 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -58,16 +58,18 @@ along with GCC; see the file COPYING3.  If not see
 #include "insn-attr.h"
 #include "except.h"
 #include "params.h"
 #include "cfganal.h"
 #include "sched-int.h"
 #include "sel-sched.h"
 #include "tree-pass.h"
 #include "dbgcnt.h"
+#include "pretty-print.h"
+#include "print-rtl.h"
 
 #ifdef INSN_SCHEDULING
 
 /* Some accessor macros for h_i_d members only used within this file.  */
 #define FED_BY_SPEC_LOAD(INSN) (HID (INSN)->fed_by_spec_load)
 #define IS_LOAD_INSN(INSN) (HID (insn)->is_load_insn)
 
 /* nr_inter/spec counts interblock/speculative motion for the function.  */
@@ -2855,16 +2857,118 @@ void debug_dependencies (rtx_insn *head, rtx_insn *tail)
 		   DEP_NONREG (dep) ? "n" : "",
 		   DEP_MULTIPLE (dep) ? "m" : "");
   }
   fprintf (sched_dump, "\n");
 }
 
   fprintf (sched_dump, "\n");
 }
+
+/* Dump dependency graph for the current region to a file using dot syntax.  */
+
+void
+dump_rgn_dependencies_dot (FILE *file)
+{
+  rtx_insn *head, *tail, *con, *pro;
+  sd_iterator_def sd_it;
+  dep_t dep;
+  int bb;
+  pretty_printer pp;
+
+  pp.buffer->stream = file;
+  pp_printf (&pp, "digraph SchedDG {\n");
+
+  for (bb = 0; bb < current_nr_bl

[gomp4.5] Tweak OpenMP 4.5 mapping rules

2015-11-05 Thread Jakub Jelinek
Hi!

This patch updates the mapping of vars to what has been ratified
in OpenMP 4.5, or where left unspecified, hopefully follows the
discussed intent.

In particular:
1) for C++ references we map what they refer to, and on target
   construct privatize the references themselves (but not what
   they point to, because that is mapped)
2) same var may not be present in both data sharing and mapping
   clauses
3) structure element based array sections (or C++ references)
   don't have the structure elements privatized, but mapped with
   an always pointer store at the start of the region (except
   exit data; and update doesn't touch the structure elements)
4) omp_target_is_present on one past the last element really
   is about what mapping starts at that point, so essentially
   it is checking if the first byte at the specified address
   is mapped
5) zero length array sections pointing to one past the last
   element really are about what mapping starts at that point

From the above, 3) is really not specified in the standard and just
based on the discussions we had (hopefully OpenMP 5.0 will clarify it),
and 4)/5) are fuzzy in the standard and also based on those
discussions.
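As a host-only illustration of behavior 3) above (structure-element based array sections), here is a sketch with hypothetical types and names. Compiled without -fopenmp the pragma is simply ignored and the loop runs on the host; under OpenMP 4.5 with offloading, the section v->data[0:v->n] is mapped and v->data is updated on the device via an "always pointer" store, rather than the structure element being privatized.

```c
#include <assert.h>

/* A struct whose pointer member is used as the base of an array
   section in a map clause (hypothetical example, not from the patch).  */
struct vec
{
  int *data;
  int n;
};

/* Double the elements of v->data, nominally on the target device.
   Only the array section is mapped; the enclosing struct is not.  */
static void
double_on_target (struct vec *v)
{
#pragma omp target map(tofrom: v->data[0:v->n])
  for (int i = 0; i < v->n; i++)
    v->data[i] *= 2;
}
```

The same source also exercises rule 2): v itself may appear in a data-sharing clause or a map clause on the construct, but not both.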

2015-11-05  Jakub Jelinek  

gcc/
* gimplify.c (omp_notice_variable): For references check
whether what it refers to has mappable type, rather than
the reference itself.
(gimplify_scan_omp_clauses): Add support for
GOMP_MAP_FIRSTPRIVATE_REFERENCE and GOMP_MAP_ALWAYS_POINTER,
remove old handling of structure element based array sections.
(gimplify_adjust_omp_clauses_1): For implicit references to
variables with reference type and when not ref to scalar or
ref to pointer, map what they refer to using tofrom and
use GOMP_MAP_FIRSTPRIVATE_REFERENCE for the reference.
(gimplify_adjust_omp_clauses): Remove GOMP_MAP_ALWAYS_POINTER
from target exit data.  Handle GOMP_MAP_FIRSTPRIVATE_REFERENCE.
Drop OMP_CLAUSE_MAP_PRIVATE support.
* omp-low.c (scan_sharing_clauses): Handle
GOMP_MAP_FIRSTPRIVATE_REFERENCE, drop OMP_CLAUSE_MAP_PRIVATE
support.
(lower_omp_target): Handle GOMP_MAP_FIRSTPRIVATE_REFERENCE
and GOMP_MAP_ALWAYS_POINTER.  Drop OMP_CLAUSE_MAP_PRIVATE
support.
* tree-pretty-print.c (dump_omp_clause): Handle
GOMP_MAP_FIRSTPRIVATE_REFERENCE and GOMP_MAP_ALWAYS_POINTER.
Simplify.
* tree-vect-stmts.c (vectorizable_simd_clone_call): Add
SIMD_CLONE_ARG_TYPE_LINEAR_{REF,VAL,UVAL}_VARIABLE_STEP
cases.
gcc/c/
* c-parser.c (c_parser_omp_target_data,
c_parser_omp_target_enter_data,
c_parser_omp_target_exit_data, c_parser_omp_target): Allow
GOMP_MAP_ALWAYS_POINTER.
* c-typeck.c (handle_omp_array_sections): For structure element
based array sections use GOMP_MAP_ALWAYS_POINTER instead of
GOMP_MAP_FIRSTPRIVATE_POINTER.
(c_finish_omp_clauses): Drop generic_field_head, structure
elements are now always mapped even as array section bases,
diagnose same var in data sharing and mapping clauses.
gcc/cp/
* parser.c (cp_parser_omp_target_data,
cp_parser_omp_target_enter_data,
cp_parser_omp_target_exit_data, cp_parser_omp_target): Allow
GOMP_MAP_ALWAYS_POINTER and GOMP_MAP_FIRSTPRIVATE_REFERENCE.
* semantics.c (handle_omp_array_sections): For structure element
based array sections use GOMP_MAP_ALWAYS_POINTER instead of
GOMP_MAP_FIRSTPRIVATE_POINTER.
(finish_omp_clauses): Drop generic_field_head, structure
elements are now always mapped even as array section bases,
diagnose same var in data sharing and mapping clauses.
For references map what they refer to using GOMP_MAP_ALWAYS_POINTER
for structure elements and GOMP_MAP_FIRSTPRIVATE_REFERENCE
otherwise.
gcc/testsuite/
* c-c++-common/gomp/clauses-2.c (foo): Adjust for diagnostics
of variables in both data sharing and mapping clauses and for
structure element based array sections being mapped rather than
privatized.
include/
* gomp-constants.h (enum gomp_map_kind): Add
GOMP_MAP_ALWAYS_POINTER and GOMP_MAP_FIRSTPRIVATE_REFERENCE.
libgomp/
* target.c (gomp_map_0len_lookup, gomp_map_val): New inline
functions.
(gomp_map_vars): Handle GOMP_MAP_ALWAYS_POINTER.  For
GOMP_MAP_ZERO_LEN_ARRAY_SECTION use gomp_map_0len_lookup.
Use gomp_map_val function.
(gomp_exit_data): For GOMP_MAP_*ZERO_LEN* use
gomp_map_0len_lookup instead of gomp_map_lookup.
(omp_target_is_present): Use gomp_map_0len_lookup instead of
gomp_map_lookup.
* testsuite/libgomp.c/target-12.c (main): Adjust for
omp_target_is_present change for one-past-last element.
* testsuite/libgomp.c/target-17.c (foo): Drop tests where
the same var

[gomp4, committed, 1/9] Move expansion of kernels region back to first omp-expand

2015-11-05 Thread Tom de Vries

On 05/11/15 11:16, Tom de Vries wrote:

Hi,

now that we have committed -foffload-alias in gomp-4_0-branch (
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00214.html ), we no longer
need the kernels region to be a part of the original function when doing
alias analysis.

So, we no longer need to postpone splitting off the kernels
region into a separate function until after alias analysis; we can
do it at the same time as we expand the parallel region.

The following patch series implements that:

  1Move expansion of kernels region back to first omp-expand
  2Update gate_oacc_kernels to handle oacc function
  3Revert "Add skip_stmt parm to pass_dominator::get_sese ()"
  4Revert "Add pass_dominator::sese_mode_p ()"
  5Handle oacc function in parloops
  6Update goacc kernels C testcases
  7Update goacc kernels Fortran testcases
  8Release_defs in expand_omp_atomic_fetch_op
  9Remove BUILT_IN_GOACC_KERNELS_INTERNAL

[ The patch series is broken up into logical bits, but is intended as a
single commit. Various things in kernels support will be broken in
intermediate stages. ]

Committed to gomp-4_0-branch.

I'll post the patches in reply to this message.



This patch moves expansion of the kernels region back to the first
omp-expand pass, before SSA.

There's no longer a need for the region-replacing call to expand into
BUILT_IN_GOACC_KERNELS_INTERNAL and subsequently map onto
BUILT_IN_GOACC_PARALLEL during the second omp-expand pass.  The
BUILT_IN_GOACC_KERNELS_INTERNAL call was modeled as transparent to
alias analysis, and that's no longer needed.


Thanks,
- Tom
Move expansion of kernels region back to first omp-expand

2015-11-04  Tom de Vries  

	* omp-low.c (expand_omp_target): Remove do_emit_library_call variable,
	assume true.  Remove do_splitoff variable, assume true.  Remove
	BUILT_IN_GOACC_KERNELS_INTERNAL handling.  Replace kernels region with a
	BUILT_IN_GOACC_PARALLEL call.
---
 gcc/omp-low.c | 57 -
 1 file changed, 4 insertions(+), 53 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 6bac074..ac8c8d0 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12563,8 +12563,6 @@ expand_omp_target (struct omp_region *region)
   gimple *stmt;
   edge e;
   bool offloaded, data_region;
-  bool do_emit_library_call = true;
-  bool do_splitoff = true;
 
   entry_stmt = as_a  (last_stmt (region->entry));
 
@@ -12608,43 +12606,7 @@ expand_omp_target (struct omp_region *region)
   exit_bb = region->exit;
 
   if (gimple_omp_target_kind (entry_stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
-{
-  if (!gimple_in_ssa_p (cfun))
-	{
-	  /* We need to do analysis and optimizations on the kernels region
-	 before splitoff.  Since that's hard to do on low gimple, we
-	 postpone the splitoff until we're in SSA.
-	 However, we do the emit of the corresponding function call already,
-	 in order to keep the arguments of the call alive until the
-	 splitoff.
-	 Since at this point the function that is called is empty, we can
-	 model the function as BUILT_IN_GOACC_KERNELS_INTERNAL, which marks
-	 some of it's function arguments as non-escaping, so it acts less
-	 as an optimization barrier.  */
-	  do_splitoff = false;
-	  cfun->curr_properties &= ~PROP_gimple_eomp;
-
-	  mark_loops_in_oacc_kernels_region (region->entry, region->exit);
-	}
-  else
-	{
-	  /* Don't emit the library call.  We've already done that.  */
-	  do_emit_library_call = false;
-	  /* Transform BUILT_IN_GOACC_KERNELS_INTERNAL into
-	 BUILT_IN_GOACC_PARALLELL.  Now that the function
-	 body will be split off, we can no longer regard the
-	 omp_data_array reference as non-escaping.  */
-	  gsi = gsi_last_bb (entry_bb);
-	  gsi_prev (&gsi);
-	  gcall *call = as_a  (gsi_stmt (gsi));
-	  gcc_assert (gimple_call_builtin_p
-		  (call, BUILT_IN_GOACC_KERNELS_INTERNAL));
-	  tree fndecl = builtin_decl_explicit (BUILT_IN_GOACC_PARALLEL);
-	  gimple_call_set_fndecl (call, fndecl);
-	  gimple_call_set_fntype (call, TREE_TYPE (fndecl));
-	  gimple_call_reset_alias_info (call);
-	}
-}
+mark_loops_in_oacc_kernels_region (region->entry, region->exit);
 
   basic_block entry_succ_bb = single_succ (entry_bb);
   if (offloaded && !gimple_in_ssa_p (cfun))
@@ -12654,8 +12616,7 @@ expand_omp_target (struct omp_region *region)
 	gsi_remove (&gsi, true);
 }
 
-  if (offloaded
-  && do_splitoff)
+  if (offloaded)
 {
   unsigned srcidx, dstidx, num;
 
@@ -12852,13 +12813,6 @@ expand_omp_target (struct omp_region *region)
   pop_cfun ();
 }
 
-  if (!do_emit_library_call)
-{
-  if (gimple_in_ssa_p (cfun))
-	update_ssa (TODO_update_ssa_only_virtuals);
-  return;
-}
-
   /* Emit a library call to launch the offloading region, or do data
  transfers.  */
   tree t1, t2, t3, t4, device, cond, depend, c, clauses;
@@ -1

Re: [PATCH] Use subreg_regno instead of subreg_regno_offset

2015-11-05 Thread Bernd Schmidt

On 11/02/2015 11:29 PM, Anatoly Sokolov wrote:



@@ -5522,12 +5516,7 @@
  op0 = SUBREG_REG (op0);
  code0 = GET_CODE (op0);
  if (code0 == REG && REGNO (op0) < FIRST_PSEUDO_REGISTER)
-  op0 = gen_rtx_REG (word_mode,
- (REGNO (op0) +
-  subreg_regno_offset (REGNO (SUBREG_REG (orig_op0)),
-   GET_MODE (SUBREG_REG (orig_op0)),
-   SUBREG_BYTE (orig_op0),
-   GET_MODE (orig_op0;
+  op0 = gen_rtx_REG (word_mode, subreg_regno (op0));
}


Same problem as in the reg-stack code? The existing code was using
orig_op0 to get the subreg, you've changed it to use op0 which is
already the SUBREG_REG.



No problems here. At this point op0 is equivalent to orig_op0. A new
value can be assigned to op0 later.


Err, what? Before the quoted code we have
rtx op0 = orig_op0;
and then
   op0 = SUBREG_REG (op0);
Are you overlooking this line?


+  else if (REG_P (reg) +   && HARD_REGISTER_P (reg))


I don't see how this would even compile.


-  regno += subreg_regno_offset (regno, GET_MODE (SUBREG_REG (reg)),
-SUBREG_BYTE (reg), GET_MODE (reg));
+  regno = subreg_regno (reg);   endregno = regno + subreg_nregs (reg);


This looks like you edited the patch? The endregno assignment is on its 
own line after this.


NAK, a final one as far as I'm concerned.


Bernd


[PATCH] New version of libmpx with new memmove wrapper

2015-11-05 Thread Aleksandra Tsvetkova
A new version of libmpx was added. There is a new function get_bd() that
allows getting the bounds directory. The wrapper for memmove was
modified: it now moves the data and then moves the corresponding bounds
directly from one bounds table to another. This approach makes moving
unaligned pointers possible. It also makes the memmove function faster
for sizes bigger than 64 bytes.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
old mode 100644
new mode 100755
index e693063..3026a05
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2015-10-27  Tsvetkova Alexandra  
+
+   * gcc.target/i386/mpx/memmove.c: New test for __mpx_wrapper_memmove.
+
 2015-10-27  Alan Lawrence  
 
* gcc.dg/vect/vect-strided-shift-1.c: New.
diff --git a/gcc/testsuite/gcc.target/i386/mpx/memmove.c 
b/gcc/testsuite/gcc.target/i386/mpx/memmove.c
new file mode 100755
index 000..3e2e61c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/memmove.c
@@ -0,0 +1,118 @@
+/* { dg-do run } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
+
+
+#include 
+#include 
+#include 
+#include 
+
+#ifdef __i386__
+/* i386 directory size is 4MB.  */
+#define MPX_NUM_L2_BITS 10
+#define MPX_NUM_IGN_BITS 2
+#else /* __i386__ */
+/* x86_64 directory size is 2GB.  */
+#define MPX_NUM_L2_BITS 17
+#define MPX_NUM_IGN_BITS 3
+#endif /* !__i386__ */
+
+/* bt_num_of_elems is the number of elements in bounds table.  */
+unsigned long bt_num_of_elems = (1UL << MPX_NUM_L2_BITS);
+/* Function to test MPX wrapper of memmove function.
+   src_bigger_dst determines which address is bigger, can be 0 or 1.
+   src_bt_index and dst_bt index are bt_indexes
+   from the beginning of the page.
+   bd_index_end is the bd index of the last element of src if we define
+   bd index of the first element as 0.
+   src_bt index_end is bt index of the last element of src.
+   pointers inside determines if array being copied includes pointers
+   src_align and dst_align are alignments of src and dst.
+   Arrays may contain unaligned pointers.  */
+int
+test (int src_bigger_dst, int src_bt_index, int dst_bt_index,
+  int bd_index_end, int src_bt_index_end, int pointers_inside,
+  int src_align, int dst_align)
+{
+  const int n =
+src_bt_index_end - src_bt_index + bd_index_end * bt_num_of_elems;
+  if (n < 0)
+{
+  return 0;
+}
+  const int num_of_pointers = (bd_index_end + 2) * bt_num_of_elems;
+  void **arr = 0;
+  posix_memalign ((void **) (&arr),
+   1UL << (MPX_NUM_L2_BITS + MPX_NUM_IGN_BITS),
+   num_of_pointers * sizeof (void *));
+  void **src = arr, **dst = arr;
+  if ((src_bigger_dst) && (src_bt_index < dst_bt_index))
+src_bt_index += bt_num_of_elems;
+  if (!(src_bigger_dst) && (src_bt_index > dst_bt_index))
+dst_bt_index += bt_num_of_elems;
+  src += src_bt_index;
+  dst += dst_bt_index;
+  char *realign = (char *) src;
+  realign += src_align;
+  src = (void **) realign;
+  realign = (char *) dst;
+  realign += src_align;
+  dst = (void **) realign;
+  if (pointers_inside)
+{
+  for (int i = 0; i < n; i++)
+src[i] = __bnd_set_ptr_bounds (arr + i, i * sizeof (void *) + 1);
+}
+  memmove (dst, src, n * sizeof (void *));
+  if (pointers_inside)
+{
+  for (int i = 0; i < n; i++)
+{
+  if (dst[i] != arr + i)
+abort ();
+  if (__bnd_get_ptr_lbound (dst[i]) != arr + i)
+abort ();
+  if (__bnd_get_ptr_ubound (dst[i]) != arr + 2 * i)
+abort ();
+}
+}
+  free (arr);
+  return 0;
+}
+
+/* Call testall to test common cases of memmove for MPX.  */
+void
+testall ()
+{
+  int align[3];
+  align[0] = 0;
+  align[1] = 1;
+  align[2] = 7;
+  test (0, 1, 2, 0, 2, 1, 0, 0);
+  for (int pointers_inside = 0; pointers_inside < 2; pointers_inside++)
+for (int src_bigger_dst = 0; src_bigger_dst < 2; src_bigger_dst++)
+  for (int src_align = 0; src_align < 3; src_align ++)
+for (int dst_align = 0; dst_align < 3; dst_align ++)
+  for (int pages = 0; pages < 4; pages++)
+{
+  test (src_bigger_dst, 1, 2, pages, 1, pointers_inside,
+align[src_align], align[dst_align]);
+  test (src_bigger_dst, 1, 2, pages, 2, pointers_inside,
+align[src_align], align[dst_align]);
+  test (src_bigger_dst, 2, 1, pages, 12, pointers_inside,
+align[src_align], align[dst_align]);
+  test (src_bigger_dst, 2, 1, pages, 1, pointers_inside,
+align[src_align], align[dst_align]);
+  test (src_bigger_dst, 2, 3, pages, 12, pointers_inside,
+align[src_align], align[dst_align]);
+  test (src_bigger_dst, 1, bt_num_of_elems - 2, pages, 2,
+pointers_inside, align[src_align], align[dst_align]);
+}
+};
+
+int
+main ()
+{
+  testall ();
+  return 0;
+}
diff --git a/libmpx/ChangeLog b/libmpx/ChangeLog
old mode 1006

[gomp4, committed, 2/9] Update gate_oacc_kernels to handle oacc function

2015-11-05 Thread Tom de Vries

On 05/11/15 11:16, Tom de Vries wrote:

Hi,

now that we have committed -foffload-alias in gomp-4_0-branch (
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00214.html ), we no longer
need the kernels region to be a part of the original function when doing
alias analysis.

So, we no longer need to postpone splitting off the kernels
region into a separate function until after alias analysis; we can
do it at the same time as we expand the parallel region.

The following patch series implements that:

  1Move expansion of kernels region back to first omp-expand
  2Update gate_oacc_kernels to handle oacc function
  3Revert "Add skip_stmt parm to pass_dominator::get_sese ()"
  4Revert "Add pass_dominator::sese_mode_p ()"
  5Handle oacc function in parloops
  6Update goacc kernels C testcases
  7Update goacc kernels Fortran testcases
  8Release_defs in expand_omp_atomic_fetch_op
  9Remove BUILT_IN_GOACC_KERNELS_INTERNAL

[ The patch series is broken up into logical bits, but is intended as a
single commit. Various things in kernels support will be broken in
intermediate stages. ]

Committed to gomp-4_0-branch.

I'll post the patches in reply to this message.


This patch updates the kernels pass group gate function.

Before, it needed to trigger on functions containing kernel regions.
Now, it needs to trigger on oacc functions that used to be kernels 
regions before they were split off.


Furthermore, I've duplicated the parloops gate here 
(flag_tree_parallelize_loops > 1).  There's not much sense in running 
the pass group unless we're trying to parallelize.


Consequently, I needed to add a "-ftree-parallelize-loops=32" setting
to a testcase which was missing it.


Thanks,
- Tom

Update gate_oacc_kernels to handle oacc function

2015-11-04  Tom de Vries  

	* tree-ssa-loop.c: Include omp-low.h.
	(gate_oacc_kernels): Test for flag_tree_parallelize_loops.  Test for
	oacc function attribute.  Test for loop with in_oacc_kernels_region.

	* c-c++-common/goacc/kernels-counter-var-redundant-load.c: Run with
	-ftree-parallelize-loops=32.
---
 .../goacc/kernels-counter-var-redundant-load.c |  1 +
 gcc/tree-ssa-loop.c| 22 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c b/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
index c4ffc1d..bf59838 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-dom_oacc_kernels3" } */
 
 #include 
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 5e0b5a5..344c6c7 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-inline.h"
 #include "tree-scalar-evolution.h"
 #include "tree-vectorizer.h"
+#include "omp-low.h"
 
 
 /* A pass making sure loops are fixed up.  */
@@ -151,7 +152,26 @@ make_pass_tree_loop (gcc::context *ctxt)
 static bool
 gate_oacc_kernels (function *fn)
 {
-  return (fn->curr_properties & PROP_gimple_eomp) == 0;
+  if (flag_tree_parallelize_loops <= 1)
+return false;
+
+  tree oacc_function_attr = get_oacc_fn_attrib (fn->decl);
+  if (oacc_function_attr == NULL_TREE)
+return false;
+
+  tree val = TREE_VALUE (oacc_function_attr);
+  while (val != NULL_TREE && TREE_VALUE (val) == NULL_TREE)
+val = TREE_CHAIN (val);
+
+  if (val != NULL_TREE)
+return false;
+
+  struct loop *loop;
+  FOR_EACH_LOOP (loop, 0)
+if (loop->in_oacc_kernels_region)
+  return true;
+
+  return false;
 }
 
 /* The oacc kernels superpass.  */
-- 
1.9.1



[gomp4, committed, 3/9] Revert "Add skip_stmt parm to pass_dominator::get_sese ()"

2015-11-05 Thread Tom de Vries

On 05/11/15 11:16, Tom de Vries wrote:

Hi,

now that we have committed -foffload-alias in gomp-4_0-branch (
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00214.html ), we no longer
need the kernels region to be a part of the original function when doing
alias analysis.

So, we no longer need to postpone splitting off the kernels
region into a separate function until after alias analysis; we can
do it at the same time as we expand the parallel region.

The following patch series implements that:

  1Move expansion of kernels region back to first omp-expand
  2Update gate_oacc_kernels to handle oacc function
  3Revert "Add skip_stmt parm to pass_dominator::get_sese ()"
  4Revert "Add pass_dominator::sese_mode_p ()"
  5Handle oacc function in parloops
  6Update goacc kernels C testcases
  7Update goacc kernels Fortran testcases
  8Release_defs in expand_omp_atomic_fetch_op
  9Remove BUILT_IN_GOACC_KERNELS_INTERNAL

[ The patch series is broken up into logical bits, but is intended as a
single commit. Various things in kernels support will be broken in
intermediate stages. ]

Committed to gomp-4_0-branch.

I'll post the patches in reply to this message.



We were using the skip_stmt functionality in pass_dominator to skip over 
a statement '.omp_data_i = &.omp_data_arr' in the kernels region.  That 
statement is removed during function split-off, so we no longer 
encounter that statement in the kernels pass group.


This patch reverts the skip_stmt functionality.

Thanks,
- Tom

Revert "Add skip_stmt parm to pass_dominator::get_sese ()"

2015-11-04  Tom de Vries  

	Revert:
	2015-10-12  Tom de Vries  

	* tree-ssa-dom.c (dom_opt_dom_walker::set_skip_stmt): New function.
	(dom_opt_dom_walker::m_skip_stmt): New private var.
	(pass_dominator::get_sese): Add skip_stmt parameters.
	(pass_dominator::execute): Call set_skip_stmt with statement to skip for
	sese.
	(dom_opt_dom_walker::before_dom_children): Handle m_skip_stmt.
---
 gcc/tree-ssa-dom.c | 27 +--
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 6688a79..e1cf38b 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -505,7 +505,6 @@ public:
 
   virtual void before_dom_children (basic_block);
   virtual void after_dom_children (basic_block);
-  void set_skip_stmt (gimple *skip_stmt) { m_skip_stmt = skip_stmt; }
 
 private:
   void thread_across_edge (edge);
@@ -516,7 +515,6 @@ private:
 
   gcond *m_dummy_cond;
   bool m_jump_threading_p;
-  gimple *m_skip_stmt;
 };
 
 /* Jump threading, redundancy elimination and const/copy propagation.
@@ -544,11 +542,9 @@ class dominator_base : public gimple_opt_pass
   virtual bool sese_mode_p (void) { return false; }
 
   /* In sese mode, return true if there's another sese to visit.  Return the
- sese to visit in SESE_ENTRY and SESE_EXIT.  If a stmt in the sese should
- not be optimized, return it in SKIP_STMT.  */
+ sese to visit in SESE_ENTRY and SESE_EXIT.  */
   virtual bool get_sese (basic_block *sese_entry ATTRIBUTE_UNUSED,
-			 basic_block *sese_exit ATTRIBUTE_UNUSED,
-			 gimple **skip_stmt ATTRIBUTE_UNUSED)
+			 basic_block *sese_exit ATTRIBUTE_UNUSED)
 { gcc_unreachable (); }
 }; // class dominator_base
 
@@ -640,11 +636,8 @@ dominator_base::execute (function *fun)
   else
 {
   basic_block sese_entry, sese_exit;
-  gimple *skip_stmt = NULL;
-  while (get_sese (&sese_entry, &sese_exit, &skip_stmt))
+  while (get_sese (&sese_entry, &sese_exit))
 	{
-	  walker.set_skip_stmt (skip_stmt);
-
 	  threadedge_initialize_values ();
 	  avail_exprs_stack->push_marker ();
 	  const_and_copies->push_marker ();
@@ -797,8 +790,7 @@ public:
 protected:
   /* dominator_base methods: */
   virtual bool sese_mode_p (void) { return true; }
-  virtual bool get_sese (basic_block *sese_entry, basic_block *sese_exit,
-			 gimple **skip_stmt)
+  virtual bool get_sese (basic_block *sese_entry, basic_block *sese_exit)
   {
 if (m_regions == NULL)
   {
@@ -821,10 +813,6 @@ protected:
 *sese_entry = BASIC_BLOCK_FOR_FN (cfun, index);
 *sese_exit = get_oacc_kernels_region_exit (*sese_entry);
 
-tree omp_data_i = get_omp_data_i (single_pred (*sese_entry));
-if (omp_data_i != NULL_TREE)
-  *skip_stmt = SSA_NAME_DEF_STMT (omp_data_i);
-
 return true;
   }
 
@@ -1446,12 +1434,7 @@ dom_opt_dom_walker::before_dom_children (basic_block bb)
   m_avail_exprs_stack->pop_to_marker ();
 
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-{
-  if (gsi_stmt (gsi) == m_skip_stmt)
-	continue;
-
-  optimize_stmt (bb, gsi, m_const_and_copies, m_avail_exprs_stack);
-}
+optimize_stmt (bb, gsi, m_const_and_copies, m_avail_exprs_stack);
 
   /* Now prepare to process dominated blocks.  */
   if (m_jump_threading_p)
-- 
1.9.1



Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-05 Thread Alexander Monakov
On Thu, 5 Nov 2015, Joseph Myers wrote:
> On Wed, 4 Nov 2015, Martin Sebor wrote:
> 
> > Improving the compiler output is a good idea. The attached patch
> > prints "[disabled by -O0]" instead of "[enabled]" when an optimization
> > option is enabled by default but when optimization (i.e., -O1 or
> > greater) is not enabled.
> 
> I don't think it's entirely accurate that all options marked as 
> Optimization in *.opt are actually disabled by -O0.  Many are, but it 
> depends on the actual logic controlling each optimization.

Indeed, and Richard already pointed that out when reviewing (my) previous
attempt.  Martin, please review the thread ending at
https://gcc.gnu.org/ml/gcc-patches/2012-05/msg00113.html

I guess eventually we can slightly overhaul the optimization options to
remove such ambiguities, but in the short term I see no way to accurately
determine whether a pass/option is ultimately in effect or not.

Martin, what about taking Richard's suggestion, tacking a short explanatory
sentence onto the end of --help=optimizers output?

Thanks.
Alexander


[gomp4, committed, 4/9] Revert "Add pass_dominator::sese_mode_p ()"

2015-11-05 Thread Tom de Vries

On 05/11/15 11:16, Tom de Vries wrote:

Hi,

now that we have committed -foffload-alias in gomp-4_0-branch (
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00214.html ), we no longer
need the kernels region to be a part of the original function when doing
alias analysis.

So, we no longer need to postpone splitting off the kernels
region into a separate function until after alias analysis; we can
do it at the same time as we expand the parallel region.

The following patch series implements that:

  1Move expansion of kernels region back to first omp-expand
  2Update gate_oacc_kernels to handle oacc function
  3Revert "Add skip_stmt parm to pass_dominator::get_sese ()"
  4Revert "Add pass_dominator::sese_mode_p ()"
  5Handle oacc function in parloops
  6Update goacc kernels C testcases
  7Update goacc kernels Fortran testcases
  8Release_defs in expand_omp_atomic_fetch_op
  9Remove BUILT_IN_GOACC_KERNELS_INTERNAL

[ The patch series is broken up into logical bits, but is intended as a
single commit. Various things in kernels support will be broken in
intermediate stages. ]

Committed to gomp-4_0-branch.

I'll post the patches in reply to this message.


We needed the sese_mode_p functionality in the dominator pass to limit
optimization scope to kernels regions. Now we no longer encounter
kernels regions in the kernels pass group, but rather split-off
functions that used to be kernels regions. So we no longer need this
functionality.


This patch reverts the sese_mode_p functionality.

Thanks,
- Tom
Revert "Add pass_dominator::sese_mode_p ()"

2015-11-04  Tom de Vries  

	revert:
	2015-10-12  Tom de Vries  

	* tree-ssa-dom.c (pass_dominator::jump_threading_p): Handle sese_mode_p.
	(pass_dominator::sese_mode_p, pass_dominator::get_sese): New protected
	virtual function.
	(pass_dominator::execute): Handle sese_mode_p.
---
 gcc/tree-ssa-dom.c | 78 ++
 1 file changed, 8 insertions(+), 70 deletions(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index e1cf38b..44253bf 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -44,7 +44,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dom.h"
 #include "gimplify.h"
 #include "tree-cfgcleanup.h"
-#include "cfgcleanup.h"
 #include "omp-low.h"
 
 /* This file implements optimizations on the dominator tree.  */
@@ -535,17 +534,7 @@ class dominator_base : public gimple_opt_pass
   unsigned int execute (function *);
 
   /* Return true if pass should perform jump threading.  */
-  virtual bool jump_threading_p (void) { return !sese_mode_p (); }
-
-  /* Return true if pass should visit a series of seses rather than the whole
- dominator tree.  */
-  virtual bool sese_mode_p (void) { return false; }
-
-  /* In sese mode, return true if there's another sese to visit.  Return the
- sese to visit in SESE_ENTRY and SESE_EXIT.  */
-  virtual bool get_sese (basic_block *sese_entry ATTRIBUTE_UNUSED,
-			 basic_block *sese_exit ATTRIBUTE_UNUSED)
-{ gcc_unreachable (); }
+  virtual bool jump_threading_p (void) { return true; }
 }; // class dominator_base
 
 const pass_data pass_data_dominator =
@@ -602,14 +591,11 @@ dominator_base::execute (function *fun)
  LOOPS_HAVE_PREHEADERS won't be needed here.  */
   loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES);
 
-  if (!sese_mode_p ())
-/* Initialize the value-handle array.  */
-threadedge_initialize_values ();
+  /* Initialize the value-handle array.  */
+  threadedge_initialize_values ();
 
   if (jump_threading_p ())
 {
-  gcc_assert (!sese_mode_p ());
-
   /* We need accurate information regarding back edges in the CFG
 	 for jump threading; this may include back edges that are not part of
 	 a single loop.  */
@@ -631,29 +617,7 @@ dominator_base::execute (function *fun)
 			 const_and_copies,
 			 avail_exprs_stack,
 			 jump_threading_p ());
-  if (!sese_mode_p ())
-walker.walk (fun->cfg->x_entry_block_ptr);
-  else
-{
-  basic_block sese_entry, sese_exit;
-  while (get_sese (&sese_entry, &sese_exit))
-	{
-	  threadedge_initialize_values ();
-	  avail_exprs_stack->push_marker ();
-	  const_and_copies->push_marker ();
-
-	  walker.walk_until (sese_entry, sese_exit, true);
-
-	  avail_exprs_stack->pop_to_marker ();
-	  const_and_copies->pop_to_marker ();
-	  threadedge_finalize_values ();
-
-	  /* KLUDGE: The dom_walker does not allow unreachable blocks when
-	 starting the walk, and during the dom_opt_dom_walker walk we may
-	 produce unreachable blocks, so we need to clean them up here.  */
-	  delete_unreachable_blocks ();
-	}
-}
+  walker.walk (fun->cfg->x_entry_block_ptr);
 
   {
 gimple_stmt_iterator gsi;
@@ -753,9 +717,8 @@ dominator_base::execute (function *fun)
   delete avail_exprs_stack;
   delete const_and_copies;
 
-  if (!sese_mode_p ())
-/* F

[gomp4, committed, 5/9] Handle oacc function in parloops

2015-11-05 Thread Tom de Vries

On 05/11/15 11:16, Tom de Vries wrote:

Hi,

now that we have committed -foffload-alias in gomp-4_0-branch (
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00214.html ), we no longer
need the kernels region to be a part of the original function when doing
alias analysis.

So, we no longer need to postpone splitting off the kernels
region into a separate function until after alias analysis; we can
do it at the same time as we expand the parallel region.

The following patch series implements that:

  1Move expansion of kernels region back to first omp-expand
  2Update gate_oacc_kernels to handle oacc function
  3Revert "Add skip_stmt parm to pass_dominator::get_sese ()"
  4Revert "Add pass_dominator::sese_mode_p ()"
  5Handle oacc function in parloops
  6Update goacc kernels C testcases
  7Update goacc kernels Fortran testcases
  8Release_defs in expand_omp_atomic_fetch_op
  9Remove BUILT_IN_GOACC_KERNELS_INTERNAL

[ The patch series is broken up into logical bits, but is intended as a
single commit. Various things in kernels support will be broken in
intermediate stages. ]

Committed to gomp-4_0-branch.

I'll post the patches in reply to this message.


This patch removes handling of kernels regions in tree-parloops.c, and 
adds handling of oacc functions that used to be kernels regions before 
they were split off.


That means we no longer add a parallel pragma. OTOH, we now have to 
clear PROP_gimple_eomp in order to trigger the subsequent omp-expand pass.


Thanks,
- Tom

Handle oacc function in parloops

2015-11-04  Tom de Vries  

	* omp-low.c (set_oacc_fn_attrib): Remove static.
	* omp-low.h (set_oacc_fn_attrib): Declare.
	* tree-parloops.c (create_parallel_loop): Remove region_entry parameter.
	Remove handling of oacc kernels pragma and GOACC_kernels_internal call.
	Remove insertion of oacc parallel pragma.  Set oacc function attributes.
	(gen_parallel_loop): Remove region_entry parameter.
	(get_omp_data_i_param): New function.
	(try_create_reduction_list): Use get_omp_data_i_param instead of
	gimple_stmt_omp_data_i_init_p.
	(ref_conflicts_with_region): Handle GIMPLE_RETURN.
	(oacc_entry_exit_ok_1): Same.  Add missing is_gimple_call test before
	gimple_call_internal_p test.
	(oacc_entry_exit_ok): Remove region_entry parameter.  Use
	get_omp_data_i_param instead of get_omp_data_i.  Set region_bbs to all
	bbs in function.  Use function entry as region entry.
	(parallelize_loops): Allow oacc functions and parallelized function if
	oacc_kernels_p.  Remove	region_entry variable.
	(pass_parallelize_loops_oacc_kernels::execute): Clear PROP_gimple_eomp
	if a loop was parallelized.
---
 gcc/omp-low.c   |   2 +-
 gcc/omp-low.h   |   1 +
 gcc/tree-parloops.c | 119 ++--
 3 files changed, 51 insertions(+), 71 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ac8c8d0..58cb959 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12456,7 +12456,7 @@ replace_oacc_fn_attrib (tree fn, tree dims)
function attribute.  Push any that are non-constant onto the ARGS
list, along with an appropriate GOMP_LAUNCH_DIM tag.  */
 
-static void
+void
 set_oacc_fn_attrib (tree fn, tree clauses, vec *args)
 {
   /* Must match GOMP_DIM ordering.  */
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index 7c9efdc..673b470 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -40,6 +40,7 @@ extern vec get_bbs_in_oacc_kernels_region (basic_block,
 extern void replace_oacc_fn_attrib (tree, tree);
 extern tree build_oacc_routine_dims (tree);
 extern tree get_oacc_fn_attrib (tree);
+extern void set_oacc_fn_attrib (tree, tree, vec *);
 
 extern GTY(()) vec *offload_funcs;
 extern GTY(()) vec *offload_vars;
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index f14cf8a..c038dfe 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2017,7 +2017,7 @@ transform_to_exit_first_loop (struct loop *loop,
 static void
 create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 		  tree new_data, unsigned n_threads, location_t loc,
-		  basic_block region_entry, bool oacc_kernels_p)
+		  bool oacc_kernels_p)
 {
   gimple_stmt_iterator gsi;
   basic_block bb, paral_bb, for_bb, ex_bb, continue_bb;
@@ -2039,10 +2039,6 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   paral_bb = single_pred (bb);
   gsi = gsi_last_bb (paral_bb);
 }
-  else
-/* Make sure the oacc parallel is inserted on top of the oacc kernels
-   region.  */
-gsi = gsi_last_bb (region_entry);
 
   if (!oacc_kernels_p)
 {
@@ -2056,50 +2052,10 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 }
   else
 {
-  /* Create oacc parallel pragma based on oacc kernels pragma and
-	 GOACC_kernels_internal call.  */
-  gomp_target *kernels = as_a  (gsi_stmt (gsi));
-
-  tree clauses = gimple_omp_target_clauses (kernels);
-  /*

[gomp4, committed, 6/9] Update goacc kernels C testcases

2015-11-05 Thread Tom de Vries

On 05/11/15 11:16, Tom de Vries wrote:

Hi,

now that we have committed -foffload-alias in gomp-4_0-branch (
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00214.html ), we no longer
need the kernels region to be a part of the original function when doing
alias analysis.

So, we no longer have the need to postpone splitting off the kernels
region into a separate function until after alias analysis, but we can
do this at the same time as when we expand the parallel region.

The following patch series implements that:

  1  Move expansion of kernels region back to first omp-expand
  2  Update gate_oacc_kernels to handle oacc function
  3  Revert "Add skip_stmt parm to pass_dominator::get_sese ()"
  4  Revert "Add pass_dominator::sese_mode_p ()"
  5  Handle oacc function in parloops
  6  Update goacc kernels C testcases
  7  Update goacc kernels Fortran testcases
  8  Release_defs in expand_omp_atomic_fetch_op
  9  Remove BUILT_IN_GOACC_KERNELS_INTERNAL

[ The patch series is broken up into logical bits, but intended as
single commit. Various things in kernels support will be broken in
intermediate stages. ]

Committed to gomp-4_0-branch.

I'll post the patches in reply to this message.



This patch updates the goacc kernels C testcases.

Thanks,
- Tom

Update goacc kernels C testcases

2015-11-04  Tom de Vries  

	* c-c++-common/goacc/kernels-acc-loop-reduction.c: Scan for oacc
	function instead of oacc_parallel.
	* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: Same.
	* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Same.
	* c-c++-common/goacc/kernels-double-reduction.c: Same.
	* c-c++-common/goacc/kernels-independent.c: Same.
	* c-c++-common/goacc/kernels-loop-2-acc-loop.c:  Same.
	* c-c++-common/goacc/kernels-loop-2.c:  Same.
	* c-c++-common/goacc/kernels-loop-3-acc-loop.c: Same.
	* c-c++-common/goacc/kernels-loop-3.c: Same.
	* c-c++-common/goacc/kernels-loop-acc-loop.c: Same.
	* c-c++-common/goacc/kernels-loop-data-2.c: Same.
	* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Same.
	* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Same.
	* c-c++-common/goacc/kernels-loop-data-update.c: Same.
	* c-c++-common/goacc/kernels-loop-data.c: Same.
	* c-c++-common/goacc/kernels-loop-g.c: Same.
	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Same.
	* c-c++-common/goacc/kernels-loop-n-acc-loop.c: Same.
	* c-c++-common/goacc/kernels-loop-n.c: Same.
	* c-c++-common/goacc/kernels-loop-nest-independent.c: Same.
	* c-c++-common/goacc/kernels-loop-nest.c: Same.
	* c-c++-common/goacc/kernels-loop.c: Same.
	* c-c++-common/goacc/kernels-one-counter-var.c: Same.
	* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: Same.
	* c-c++-common/goacc/kernels-reduction.c: Same.
	* c-c++-common/goacc/kernels-counter-var-redundant-load.c: Same.  Update
	scan.
---
 gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c  | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c  | 2 +-
 .../c-c++-common/goacc/kernels-counter-var-redundant-load.c | 7 +++
 .../c-c++-common/goacc/kernels-counter-vars-function-scope.c   | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c| 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-independent.c | 3 ++-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c | 3 ++-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c  | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c  | 3 ++-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c   | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c  | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c| 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c| 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c   | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c  | 3 ++-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c   | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c  | 3 ++-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-nest-independent.c   | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c   | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop.c | 3 ++-
 gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c | 3 ++-
 .../c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c | 2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-reduction.c   | 3 ++-
 26 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c b/gcc/testsuite/c-c++-common/g

[gomp4, committed, 7/9] Update goacc kernels Fortran testcases

2015-11-05 Thread Tom de Vries

On 05/11/15 11:16, Tom de Vries wrote:



This patch updates the goacc kernels Fortran testcases

Thanks,
- Tom

Update goacc kernels Fortran testcases

2015-11-04  Tom de Vries  

	* gfortran.dg/goacc/kernels-loop-2.f95: Scan for oacc function instead
	of oacc_parallel.
	* gfortran.dg/goacc/kernels-loop-data-2.f95: Same.
	* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Same.
	* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Same.
	* gfortran.dg/goacc/kernels-loop-data-update.f95: Same.
	* gfortran.dg/goacc/kernels-loop-data.f95: Same.
	* gfortran.dg/goacc/kernels-loop-nest-independent.f95: Same.
	* gfortran.dg/goacc/kernels-loop.f95: Same.
	* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: Same.
---
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95| 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95| 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95   | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-nest-independent.f95   | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95| 2 +-
 .../gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95 | 2 +-
 9 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
index ccfc3ae..7fd6d4e 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
@@ -42,4 +42,4 @@ end program main
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
 
-! { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 3 "parloops_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 3 "parloops_oacc_kernels" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
index 2f46c6d..f788f67 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
@@ -48,4 +48,4 @@ end program main
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
 
-! { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 3 "parloops_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 3 "parloops_oacc_kernels" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
index 34b4a06..3599052 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
@@ -48,4 +48,4 @@ end program main
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
 
-! { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 3 "parloops_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 3 "parloops_oacc_kernels" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95

[gomp4, committed, 8/9] Release_defs in expand_omp_atomic_fetch_op

2015-11-05 Thread Tom de Vries

On 05/11/15 11:16, Tom de Vries wrote:




The parloops pass constructs an atomic update:
...
  #pragma omp atomic_load
  D.1839_59 = *&.paral_data_load.33_51->reduction.23;
  D.1840_60 = sum.27_56 + D.1839_59;
  #pragma omp atomic_store (D.1840_60);
...

The expand_omp_atomic_fetch_op function removes the update statement but 
doesn't release the ssa-name D.1840_60 defined by the update statement.


This causes an error when running ccp in lto1. We run into trouble here 
for this unreleased ssa-name, because SSA_NAME_VAR (var) == NULL_TREE 
and TREE_CODE (NULL_TREE) causes a sigsegv:

...
get_default_value (tree var)
{
  ccp_prop_value_t val = { UNINITIALIZED, NULL_TREE, 0 };
  gimple *stmt;

  stmt = SSA_NAME_DEF_STMT (var);

  if (gimple_nop_p (stmt))
{
  /* Variables defined by an empty statement are those used
 before being initialized.  If VAR is a local variable, we
 can assume initially that it is UNDEFINED, otherwise we must
 consider it VARYING.  */
  if (!virtual_operand_p (var)
  && TREE_CODE (SSA_NAME_VAR (var)) == VAR_DECL)
val.lattice_val = UNDEFINED;
...

This patch fixes the problem by releasing the ssa-name in 
expand_omp_atomic_fetch_op function.


Thanks,
- Tom
Release_defs in expand_omp_atomic_fetch_op

2015-11-04  Tom de Vries  

	* omp-low.c (expand_omp_atomic_fetch_op):  Release defs of update stmt.
---
 gcc/omp-low.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 58cb959..84accd9 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11914,10 +11914,14 @@ expand_omp_atomic_fetch_op (basic_block load_bb,
   gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_ATOMIC_STORE);
   gsi_remove (&gsi, true);
   gsi = gsi_last_bb (store_bb);
+  stmt = gsi_stmt (gsi);
   gsi_remove (&gsi, true);
 
   if (gimple_in_ssa_p (cfun))
-update_ssa (TODO_update_ssa_no_phi);
+{
+  release_defs (stmt);
+  update_ssa (TODO_update_ssa_no_phi);
+}
 
   return true;
 }
-- 
1.9.1



Re: Fix 61441

2015-11-05 Thread Sujoy Saraswati
Hi,

> It seems this code has moved to fold-const.c, so the patch will need
> updating.

I have updated the patch and sending again.

> Now, is it actually correct for integer_valued_real_single_p (now) to
> treat sNaN as an integer?  Looking at existing uses, they all appear to be
> in match.pd: it's used for removing redundant trunc / floor / ceil / round
> / nearbyint on integer values.  In such cases, it's *incorrect* to count
> sNaN as an integer, because while those functions map qNaN quietly to
> qNaN, they should map sNaN to qNaN and raise "invalid" in the process.  So
> I don't think you should have this change at all.

Ok. I have also updated the comments for the functions stating they
return false for sNaN.

> (I don't see why the handling of rint checks flag_errno_math.  rint should
> quietly return infinite and qNaN arguments just like nearbyint; it never
> needs to set errno and distinguishing rint / nearbyint here is spurious;
> the only difference is regarding the "inexact" exception.)

With these changes, real_isinteger returns false for sNaN. This in
turn means that integer_valued_real_p would return false with sNaN.
This means that for sNaN, rint will not go to the path of checking
flag_errno_math. Anyway, I modified that also as part of this patch to
make rint similar to nearbyint.

> I realise this corresponds to some existing code (in const_binop, at
> least), but I don't think that existing code is a good model.
>
> If -fsignaling-nans, it's still perfectly fine to fold when the argument
> is a quiet NaN.  So what you actually want in such cases where you have a
> known constant argument - as opposed to an unknown argument that might or
> might not be constant - is to check whether that argument is itself a
> signaling NaN.  So you want to add a new REAL_VALUE_ISSIGNALING, or
> something like that.  And then find existing code that has the wrong (NaN
> and sNaNs supported) check and convert it to checking (this value is
> sNaN).
>
> (This code has also moved to another function in fold-const.c, it seems.)
>
> Cf. the match.pd handling of fmin / fmax checking for signaling NaN
> arguments explicitly.
>
> Some other places in this patch have similar issues with checking
> HONOR_SNANS && REAL_VALUE_ISNAN when they should check if the particular
> value is sNaN.

I modified the code to check for sNaN specifically. The modified patch is below.

Bootstrap and regression tests on x86_64-linux-gnu passed with changes
done on trunk.

Is this fine?

Regards,
Sujoy

2015-11-04  Sujoy Saraswati 

   PR tree-optimization/61441
   * fold-const-call.c (fold_const_pow): Avoid the operation
   if flag_signaling_nans is on and the operand is an sNaN.
   (fold_const_call_sss): Same for BUILT_IN_POWI.
   * fold-const.c (const_binop): Convert sNaN to qNaN when
   flag_signaling_nans is off.
   (const_unop): Avoid the operation, other than NEGATE and
   ABS, if flag_signaling_nans is on and the operand is an sNaN.
   (fold_convert_const_real_from_real): Avoid the operation if
   flag_signaling_nans is on and the operand is an sNaN.
   (integer_valued_real_unary_p): Update comment stating it
   returns false for sNaN values.
   (integer_valued_real_binary_p, integer_valued_real_call_p): Same.
   (integer_valued_real_single_p): Same.
   (integer_valued_real_invalid_p, integer_valued_real_p): Same.
   * gimple-fold.c (gimple_assign_integer_valued_real_p): Same.
   (gimple_call_integer_valued_real_p): Same.
   (gimple_phi_integer_valued_real_p): Same.
   (gimple_stmt_integer_valued_real_p): Same.
   * match.pd (f(x) -> x): Removed flag_errno_math check for RINT.
   * real.c (do_add): Make resulting NaN value to be qNaN.
   (do_multiply, do_divide, do_fix_trunc): Same.
   (real_arithmetic, real_ldexp): Same.
   (real_issignaling_nan): New.
   (real_isinteger): Updated comment stating it returns false for sNaN.
   * real.h (real_issignaling_nan, REAL_VALUE_ISSIGNALING_NAN): New.
   * simplify-rtx.c (simplify_const_unary_operation): Avoid the
   operation if flag_signaling_nans is on and the operand is an sNaN.
   (simplify_const_binary_operation): Check for sNaN instead of NaN.
   * tree-ssa-math-opts.c (gimple_expand_builtin_pow): Avoid the
   operation if flag_signaling_nans is on and the operand is an sNaN.

   PR tree-optimization/61441
   * gcc.dg/pr61441.c: New testcase.

Index: gcc/fold-const-call.c
===================================================================
--- gcc/fold-const-call.c   (revision 229805)
+++ gcc/fold-const-call.c   (working copy)
@@ -484,7 +484,11 @@ fold_const_pow (real_value *result, const real_val
  || !real_equal (arg0, &dconst0)))
 {
   bool inexact = real_powi (result, format, arg0, n1);
-  if (flag_unsafe_math_optimizations || !inexact)
+  /* Avoid the folding if flag_signaling_nans is on.  */
+  if (flag_un

[gomp4, committed, 9/9] Remove BUILT_IN_GOACC_KERNELS_INTERNAL

2015-11-05 Thread Tom de Vries

On 05/11/15 11:16, Tom de Vries wrote:



This patch removes the no longer needed BUILT_IN_GOACC_KERNELS_INTERNAL 
builtin.


Thanks,
- Tom

Remove BUILT_IN_GOACC_KERNELS_INTERNAL

2015-11-04  Tom de Vries  

	* omp-builtins.def (BUILT_IN_GOACC_KERNELS_INTERNAL): Remove
	DEF_GOACC_BUILTIN_FNSPEC.
	* omp-low.c (expand_omp_target): Remove handling of
	BUILT_IN_GOACC_KERNELS_INTERNAL.
---
 gcc/omp-builtins.def | 5 -
 gcc/omp-low.c        | 1 -
 2 files changed, 6 deletions(-)

diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 0478eaf..6908f94 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -43,11 +43,6 @@ DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_ENTER_EXIT_DATA,
 			  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
 			  ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST,
 			  ATTR_NOTHROW_LIST, "...rrr")
-DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_KERNELS_INTERNAL,
-			  "GOACC_kernels_internal",
-			  BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
-			  ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST,
-			  ATTR_NOTHROW_LIST, "...rrr")
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel_keyed",
 		   BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
 		   ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 84accd9..643d017 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12994,7 +12994,6 @@ expand_omp_target (struct omp_region *region)
 	depend = build_int_cst (ptr_type_node, 0);
   args.quick_push (depend);
   break;
-case BUILT_IN_GOACC_KERNELS_INTERNAL:
 case BUILT_IN_GOACC_PARALLEL:
   {
 	set_oacc_fn_attrib (child_fn, clauses, &args);
-- 
1.9.1



Re: OpenACC dimension range propagation optimization

2015-11-05 Thread Richard Biener
On Wed, Nov 4, 2015 at 6:29 PM, Nathan Sidwell  wrote:
> On 11/04/15 05:26, Richard Biener wrote:
>>
>> On Tue, Nov 3, 2015 at 7:11 PM, Nathan Sidwell  wrote:
>>>
>>> Richard,
>
>
>> this all seems a little bit fragile and relying on implementation details?
>> Is the attribute always present?  Is the call argument always a constant
>> that fits in a HOST_WIDE_INT (or even int here)?  Are there always enough
>> 'dims' in the tree list?  Is the 'dim' value always an INTEGER_CST that
>> fits a HOST_WIDE_INT (or even an int here)?
>
>
>
>> If so I'd like to see helper functions to hide these implementation
>> details
>> from generic code like this.
>
>
> Like this?
>
> I've added two helper functions to omp-low.c, one to  get the internal fn
> arg number and the other to get a dimension value, given an axis number.
> omp-low seemed the most appropriate point -- that's  where the dimension
> processing is, and the generation of these internal fn calls.
>
> (Bernd, I'll fixup the dimension folding patch to use these calls before
> applying it.)
>
> ok?

Ok

Thanks,
Richard.

> nathan


Re: [PATCH v3 1/2] [PR debug/67192] Fix C loops' back-jump location

2015-11-05 Thread Andreas Arnez
On Thu, Nov 05 2015, Bernd Schmidt wrote:

> On 11/04/2015 05:17 PM, Andreas Arnez wrote:
>>
>> gcc/c/ChangeLog:
>>
>>  PR debug/67192
>>  * c-parser.c (c_parser_while_statement): Finish the loop before
>>  parsing ahead for misleading indentation.
>>  (c_parser_for_statement): Likewise.
>
> This is OK.

Thanks again for reviewing.  Are you going to look at patch #2 as well?

> Does C++ have similar issues?

Not this particular issue, AFAIK.  But I've just looked at how C++ fares
with the enhanced version of pr67192.c from patch #2.  There I see the
following:

  Breakpoint 2, f4 () at pr67192.cc:54
  (gdb) p cnt
  $1 = 16

I.e., when breaking on "while (1)" the first loop iteration has already
executed.  This is because the C++ parser assigns the backward-goto to
the 'while' token.  It's the same issue you pointed at with version 2 of
my patch.

Shall I open a bug for that?

--
Andreas



Re: [PATCH v3 1/2] [PR debug/67192] Fix C loops' back-jump location

2015-11-05 Thread Bernd Schmidt

On 11/05/2015 12:33 PM, Andreas Arnez wrote:


Thanks again for reviewing.  Are you going to look at patch #2 as well?


Yeah, still thinking about that one.


Does C++ have similar issues?


Not this particular issue, AFAIK.  But I've just looked at how C++ fares
with the enhanced version of pr67192.c from patch #2.  There I see the
following:

   Breakpoint 2, f4 () at pr67192.cc:54
   (gdb) p cnt
   $1 = 16

I.e., when breaking on "while (1)" the first loop iteration has already
executed.  This is because the C++ parser assigns the backward-goto to
the 'while' token.  It's the same issue you pointed at with version 2 of
my patch.

Shall I open a bug for that?


I'd obviously prefer if you'd manage to get the two frontends behave 
identically. The alternative would be to open a bug.



Bernd


Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults

2015-11-05 Thread Kyrill Tkachov

Hi Segher,

On 04/11/15 23:50, Segher Boessenkool wrote:

Hi Kyrill,

On Mon, Nov 02, 2015 at 02:15:27PM +, Kyrill Tkachov wrote:

This patch attempts to restrict combine from transforming ZERO_EXTEND and
SIGN_EXTEND operations into and-bitmask
and weird SUBREG expressions when they appear inside MULT expressions. This
is because a MULT rtx containing these
extend operations is usually a well understood widening multiply operation.

Right.  I have often wanted for combine to try plain substitution as well
as substitution and simplification: simplification (which uses nonzero_bits
etc.) often makes a mess of even moderately complex instructions.

But since we do not have that yet, and your 24% number is nicely impressive,
let's try to do no simplification for widening mults, as in your patch.


With this patch on aarch64 I saw increased generation of smaddl and umaddl
instructions where
before the combination of smull and add instructions were rejected because
the extend operands
were being transformed into these weird equivalent expressions.

Overall, I see 24% more [su]maddl instructions being generated for SPEC2006
and with no performance regressions.

The testcase in the patch is the most minimal one I could get that
demonstrates the issue I'm trying to solve.

Does this approach look ok?

In broad lines it does.  Could you try this patch instead?  Not tested
etc. (other than building an aarch64 cross and your test case); I'll do
that tomorrow if you like the result.


Thanks, that looks less intrusive. I did try it out on arm and aarch64.
It does work on the aarch64 testcase. However, there's also a correctness
regression, I'll try to explain inline




Segher


diff --git a/gcc/combine.c b/gcc/combine.c
index c3db2e0..3bf7ffb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5284,6 +5284,15 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
  || GET_CODE (SET_DEST (x)) == PC))
fmt = "ie";
  
+  /* Substituting into the operands of a widening MULT is not likely
+     to create RTL matching a machine insn.  */
+  if (code == MULT
+ && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+ || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+ && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+ || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
+   return x;


I think we should also add:
  && REG_P (XEXP (XEXP (x, 0), 0))
  && REG_P (XEXP (XEXP (x, 1), 0))

to the condition. Otherwise I've seen regressions in the arm testsuite, the
gcc.target/arm/smlatb-1.s test in particular that tries to match the pattern
(define_insn "*maddhisi4tb"
  [(set (match_operand:SI 0 "s_register_operand" "=r")
(plus:SI (mult:SI (ashiftrt:SI
   (match_operand:SI 1 "s_register_operand" "r")
   (const_int 16))
  (sign_extend:SI
   (match_operand:HI 2 "s_register_operand" "r")))
 (match_operand:SI 3 "s_register_operand" "r")))]


There we have a sign_extend of a shift that we want to convert to the form
that the pattern expects. So adding the checks for REG_P fixes that for me.

For the correctness issue I saw on aarch64 the shortest case I could reduce is:
short int a[16], b[16];
void
f5 (void)
{
  a[0] = b[0] / 14;
}

We synthesise the division and lose one of the intermediate immediates.
The good codegen is:
f5:
	adrp	x1, b
	mov	w0, 9363		// <--- This gets lost!
	movk	w0, 0x9249, lsl 16	// <--- This gets lost!
	adrp	x2, a
	ldrsh	w1, [x1, #:lo12:b]
	smull	x0, w1, w0
	lsr	x0, x0, 32
	add	w0, w1, w0
	asr	w0, w0, 3
	sub	w0, w0, w1, asr 31
	strh	w0, [x2, #:lo12:a]
	ret

The bad one with this patch is:
	adrp	x1, b
	adrp	x2, a
	ldrsh	w1, [x1, #:lo12:b]
	smull	x0, w1, w0
	lsr	x0, x0, 32
	add	w0, w1, w0
	asr	w0, w0, 3
	sub	w0, w0, w1, asr 31
	strh	w0, [x2, #:lo12:a]
	ret


The problematic hunk there is the subst hunk.
In the expression 'x':
(mult:DI (sign_extend:DI (reg:SI 80 [ b ]))
(sign_extend:DI (reg:SI 82)))

it tries to substitute 'from': (reg:SI 82)
with 'to': (const_int -1840700269 [0x92492493])

since we now return just 'x' combine thinks that the substitution succeeded
and eliminates the immediate move.
Is there a way that subst can signal some kind of "failed to substitute" result?
If not, I got it to work by using that check to set the in_dest variable to the
subsequent recursive call to subst, in a similar way to my original patch, but 
I was
hoping to avoid overloading the meaning of in_dest.

Thanks,
Kyrill


+
/* Get the mode of operand 0 in case X is now a SIGN_EXTEND of a
 constant.  */
if (fmt[0] == 'e')
@@ -8455,6 +8464,15 @@ force_to_mode (rtx x, machine_mode mode, unsigned HOST_WIDE_INT mask,
/* ... fall through ...  */
  
  case MULT:

+  /* Substituting into the operands of a widening MUL

Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-05 Thread Thomas Schwinge
Hi Cesar!

On Tue, 3 Nov 2015 14:16:59 -0800, Cesar Philippidis  wrote:
> This patch does the following to the c and c++ front ends:

>  * updates c_oacc_split_loop_clauses to filter out the loop clauses
>from combined parallel/kernels loops

>   gcc/c-family/
>   * c-omp.c (c_oacc_split_loop_clauses): Make TILE, GANG, WORKER, VECTOR,
>   AUTO, SEQ and independent as loop clauses.  Associate REDUCTION
>   clauses with parallel and kernels.

> --- a/gcc/c-family/c-omp.c
> +++ b/gcc/c-family/c-omp.c
> @@ -709,12 +709,21 @@ c_oacc_split_loop_clauses (tree clauses, tree 
> *not_loop_clauses)
>  
>switch (OMP_CLAUSE_CODE (clauses))
>  {
> +   /* Loop clauses.  */
>   case OMP_CLAUSE_COLLAPSE:
> - case OMP_CLAUSE_REDUCTION:
> + case OMP_CLAUSE_TILE:
> + case OMP_CLAUSE_GANG:
> + case OMP_CLAUSE_WORKER:
> + case OMP_CLAUSE_VECTOR:
> + case OMP_CLAUSE_AUTO:
> + case OMP_CLAUSE_SEQ:
> + case OMP_CLAUSE_INDEPENDENT:
> OMP_CLAUSE_CHAIN (clauses) = loop_clauses;
> loop_clauses = clauses;
> break;
>  
> +   /* Parallel/kernels clauses.  */
> +
>   default:
> OMP_CLAUSE_CHAIN (clauses) = *not_loop_clauses;
> *not_loop_clauses = clauses;

Contrary to your ChangeLog entry, this is not duplicating but is moving
OMP_CLAUSE_REDUCTION handling.  Is that intentional?  (And, doesn't
anything break in the testsuite?)


Regards
 Thomas




Re: [PATCH] Fix c-c++-common/torture/vector-compare-1.c on powerpc

2015-11-05 Thread Ilya Enkovich
On 29 Oct 15:29, Richard Biener wrote:
> 
> I think this should unconditionally produce the COND_EXPR and
> build cst_true using build_all_ones_cst (stype).
> 
> Ok with that change.
> 
> Thanks,
> Richard.
> 

Here is an updated patch version.  Bootstrapped and regtested on 
powerpc64le-unknown-linux-gnu and x86_64-unknown-linux-gnu.  Applied to trunk.

Thanks,
Ilya
--
gcc/

2015-11-05  Ilya Enkovich  

* tree-vect-generic.c (do_compare): Use -1 for true
result instead of 1.


diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d0a4e0f..b59f699 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -161,10 +161,16 @@ static tree
 do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
tree bitpos, tree bitsize, enum tree_code code, tree type)
 {
+  tree stype = TREE_TYPE (type);
+  tree cst_false = build_zero_cst (stype);
+  tree cst_true = build_all_ones_cst (stype);
+  tree cmp;
+
   a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
   b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
 
-  return gimplify_build2 (gsi, code, TREE_TYPE (type), a, b);
+  cmp = build2 (code, boolean_type_node, a, b);
+  return gimplify_build3 (gsi, COND_EXPR, stype, cmp, cst_true, cst_false);
 }
 
 /* Expand vector addition to scalars.  This does bit twiddling


[PATCH, ARM] PR target/68059 libgcc should not use __write for printing fatal error

2015-11-05 Thread Szabolcs Nagy

libgcc/config/arm/linux-atomic-64bit.c uses __write to print an error
message if the 64bit xchg method is not available in the kernel.

__write is not part of the public libc abi.  Since this code is only
run on Linux, the write syscall can be invoked directly.

And __builtin_trap is a simpler way to crash than abort.

The new behaviour on a linux kernel before v3.1:

# ./a.out
A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)
Illegal instruction

OK for trunk and backporting?

libgcc/ChangeLog:

2015-11-05  Szabolcs Nagy  

PR target/68059
* config/arm/linux-atomic-64bit.c (__write): Remove declaration.
(abort): Likewise.
(linux_write): Define.
diff --git a/libgcc/config/arm/linux-atomic-64bit.c b/libgcc/config/arm/linux-atomic-64bit.c
index cdf713c..aba3334 100644
--- a/libgcc/config/arm/linux-atomic-64bit.c
+++ b/libgcc/config/arm/linux-atomic-64bit.c
@@ -33,9 +33,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
kernels; we check for that in an init section and bail out rather
unceremoneously.  */
 
-extern unsigned int __write (int fd, const void *buf, unsigned int count);
-extern void abort (void);
-
 /* Kernel helper for compare-and-exchange.  */
 typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
 	const long long* newval,
@@ -45,6 +42,19 @@ typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
 /* Kernel helper page version number.  */
 #define __kernel_helper_version (*(unsigned int *)0x0ffc)
 
+static void
+linux_write (int fd, const void *buf, unsigned int count)
+{
+	register long r7 asm ("r7") = 4; /* Linux __NR_write.  */
+	register long r0 asm ("r0") = fd;
+	register long r1 asm ("r1") = (long)buf;
+	register long r2 asm ("r2") = count;
+	asm volatile ("svc 0"
+		:
+		: "r" (r7), "r" (r0), "r" (r1), "r" (r2)
+		: "memory");
+}
+
 /* Check that the kernel has a new enough version at load.  */
 static void __check_for_sync8_kernelhelper (void)
 {
@@ -53,11 +63,9 @@ static void __check_for_sync8_kernelhelper (void)
   const char err[] = "A newer kernel is required to run this binary. "
 "(__kernel_cmpxchg64 helper)\n";
   /* At this point we need a way to crash with some information
-	 for the user - I'm not sure I can rely on much else being
-	 available at this point, so do the same as generic-morestack.c
-	 write () and abort ().  */
-  __write (2 /* stderr.  */, err, sizeof (err));
-  abort ();
+	 for the user.  */
+  linux_write (2 /* stderr.  */, err, sizeof (err));
+  __builtin_trap ();
 }
 };
 


Re: [ping] Fix PR debug/66728

2015-11-05 Thread Richard Biener
On Thu, Nov 5, 2015 at 12:45 AM, Mike Stump  wrote:
>
> On Nov 4, 2015, at 12:50 PM, Richard Sandiford  
> wrote:
>
>> Mike Stump  writes:
>>> Index: dwarf2out.c
>>> ===
>>> --- dwarf2out.c  (revision 229720)
>>> +++ dwarf2out.c  (working copy)
>>> @@ -15593,8 +15593,13 @@
>>>   return true;
>>>
>>> case CONST_WIDE_INT:
>>> -  add_AT_wide (die, DW_AT_const_value,
>>> -   std::make_pair (rtl, GET_MODE (rtl)));
>>> +  {
>>> +wide_int w1 = std::make_pair (rtl, MAX_MODE_INT);
>>> +int prec = MIN (wi::min_precision (w1, UNSIGNED),
>>> +(unsigned int)CONST_WIDE_INT_NUNITS (rtl) * 
>>> HOST_BITS_PER_WIDE_INT);
>>> +wide_int w = wide_int::from (w1, prec, UNSIGNED);
>>> +add_AT_wide (die, DW_AT_const_value, w);
>>> +  }
>>>   return true;
>>>
>>> case CONST_DOUBLE:
>>
>> Setting the precision based on CONST_WIDE_INT_NUNITS means that
>> we might end up with two different precisions for two values of
>> the same variable.  E.g. for a 192-bit type, 1<<64 would be given
>> a precision of 128 (because it needs two HWIs to store) but 1<<128
>> would be given a precision of 192 (because it needs three HWIs to store).
>> We could then abort when comparing them for equality, since equality
>> needs both integers to have the same precision.  E.g. from same_dw_val_p:
>>
>>case dw_val_class_wide_int:
>>  return *v1->v.val_wide == *v2->v.val_wide;
>
> Yeah, seems like we should have a v1.prec == v2.prec && on that.  The bad 
> news, there are four of them that are like this.  The good news, 3 of them 
> are location operands, and I don’t think they can hit for a very long time.  
> I think this is an oversight from the double_int version of the code where we 
> just check the 128 bits for equality.  We can see if Richard wants to weigh 
> in.  I think I’d just pre-approve the change, though, I think a helper to 
> perform mixed equality testing would be the way to go as there are 4 of them, 
> and I'm pretty sure they should all use the mixed version.  Though, maybe the 
> location list versions are never mixed.  If they aren’t, then there is only 1 
> client, so, I’d just do the precision test inline.  Anyone able to comment on 
> the location list aspect of this?

No idea on location lists but maybe this means we should just use the
maximum supported integer mode for CONST_WIDE_INTs?


Re: [PATCH 5/6]tree-sra.c: Fix completely_scalarize for negative array indices

2015-11-05 Thread Alan Lawrence
On 30/10/15 10:54, Eric Botcazou wrote:
> On 30/10/15 10:44, Richard Biener wrote:
>>
>> I think you want to use wide-ints here and
>>
>> wide_int idx = wi::from (minidx, TYPE_PRECISION (TYPE_DOMAIN
>> (...)), TYPE_SIGN (TYPE_DOMAIN (..)));
>> wide_int maxidx = ...
>>
>> you can then simply iterate minidx with ++ and do the final compare
>> against maxidx
>> with while (++idx <= maxidx).  For the array ref index we want to use
>> TYPE_DOMAIN
>> as type as well, not size_int.  Thus wide_int_to_tree (TYPE_DOMAIN 
>> (...)..idx).
[...]
> But using offset_int should be OK, see for example get_ref_base_and_extent.
>

Here's a patch using offset_int. (An offset_int is not as easy to construct as
with wide_int::from; the explicit sign-extension mirrors what appears to be done
elsewhere when constructing offset_ints.)

Tested by bootstrap+check-{gcc,g++,ada,fortran} with the rest of the patchset
(which causes an array[-1..1] to be completely scalarized, among others), on
x86_64 and ARM.

I don't have a test without all that (such would have to be in Ada, and trigger
SRA of such an array but not involving the constant pool); is it OK without?

gcc/ChangeLog:

* tree-sra.c (completely_scalarize): Properly handle negative array
indices using offset_int.
---
 gcc/tree-sra.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index e15df1f..6168a7e 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1010,18 +1010,25 @@ completely_scalarize (tree base, tree decl_type, 
HOST_WIDE_INT offset, tree ref)
if (maxidx)
  {
gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
-   /* MINIDX and MAXIDX are inclusive.  Try to avoid overflow.  */
-   unsigned HOST_WIDE_INT lenp1 = tree_to_shwi (maxidx)
-   - tree_to_shwi (minidx);
-   unsigned HOST_WIDE_INT idx = 0;
-   do
+   tree domain = TYPE_DOMAIN (decl_type);
+   /* MINIDX and MAXIDX are inclusive, and must be interpreted in
+  DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
+   offset_int idx = wi::to_offset (minidx);
+   offset_int max = wi::to_offset (maxidx);
+   if (!TYPE_UNSIGNED (domain))
  {
-   tree nref = build4 (ARRAY_REF, elemtype, ref, size_int (idx),
+   idx = wi::sext (idx, TYPE_PRECISION (domain));
+   max = wi::sext (max, TYPE_PRECISION (domain));
+ }
+   for (int el_off = offset; wi::les_p (idx, max); ++idx)
+ {
+   tree nref = build4 (ARRAY_REF, elemtype,
+   ref,
+   wide_int_to_tree (domain, idx),
NULL_TREE, NULL_TREE);
-   int el_off = offset + idx * el_size;
scalarize_elem (base, el_off, el_size, nref, elemtype);
+   el_off += el_size;
  }
-   while (++idx <= lenp1);
  }
   }
   break;
-- 
1.9.1



Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant

2015-11-05 Thread Alan Lawrence
On 3 November 2015 at 11:35, Richard Biener  wrote:
>
> I think this should simply re-write A << B to (type) (unsigned-type) A
> * (1U << B).
>
> Does that then still vectorize the signed case?

I didn't realize our representation of chrec's could express that.
Yes, it does - thanks! (And the avx512ifma- test is compiled without warnings.)

Patch attached. I've added a platform-independent version of the failing AVX512
test too.

--Alan

gcc/ChangeLog:

PR tree-optimization/65963
* tree-scalar-evolution.c (interpret_rhs_expr): Try to handle
LSHIFT_EXPRs as equivalent unsigned MULT_EXPRs.

gcc/testsuite/ChangeLog:

* gcc.dg/pr68112.c: New.
* gcc.dg/vect/vect-strided-shift-1.c: New.
---
 gcc/testsuite/gcc.dg/pr68112.c   | 11 
 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 
 gcc/tree-scalar-evolution.c  | 17 
 3 files changed, 61 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr68112.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c

diff --git a/gcc/testsuite/gcc.dg/pr68112.c b/gcc/testsuite/gcc.dg/pr68112.c
new file mode 100644
index 000..0a45b03
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68112.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Waggressive-loop-optimizations" } */
+
+int *a;
+
+void
+foo ()
+{
+  for (int i = 0; i < 65536; i++)
+*a = i << 24;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
new file mode 100644
index 000..b1ce2ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
@@ -0,0 +1,33 @@
+/* PR tree-optimization/65963.  */
+#include "tree-vect.h"
+
+#define N 512
+
+int in[2*N], out[N];
+
+__attribute__ ((noinline)) void
+loop (void)
+{
+  for (int i = 0; i < N; i++)
+out[i] = in[i << 1] + 7;
+}
+
+int
+main (int argc, char **argv)
+{
+  check_vect ();
+  for (int i = 0; i < 2*N; i++)
+{
+  in[i] = i;
+  __asm__ volatile ("" : : : "memory");
+}
+  loop ();
+  __asm__ volatile ("" : : : "memory");
+  for (int i = 0; i < N; i++)
+{
+  if (out[i] != i*2 + 7)
+   abort ();
+}
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" 
{ target { vect_strided2 } } } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 0753bf3..60d515d 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1840,6 +1840,23 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
   res = chrec_fold_multiply (type, chrec1, chrec2);
   break;
 
+case LSHIFT_EXPR:
+  {
+   /* Handle A<

Re: [PATCH] PR/67682, break SLP groups up if only some elements match

2015-11-05 Thread Alan Lawrence
On 03/11/15 13:39, Richard Biener wrote:
> On Tue, Oct 27, 2015 at 6:38 PM, Alan Lawrence  wrote:
>>
>> Say I...P are consecutive, the input would have gaps 0 1 1 1 1 1 1 1. If we
>> split the load group, we would want subgroups with gaps 0 1 1 1 and 0 1 1 1?
>
> As said on IRC it should be 4 1 1 1 and 4 1 1 1.

Right. And so, if we have a twelve-element group (0 1 1 1 1 1 1 1 1 1 1 1), by
the time it became three subgroups, these should each be (8 1 1 1), via an
intermediate stage of (4 1 1 1 1 1 1 1) (8 1 1 1). This leads to the code in
the attached patch.

> No, I don't think we can split load groups that way.  So I think if
> splitting store
> groups works well (with having larger load groups) then that's the way to go
> (even for loop vect).

Well, slp-11a.c still fails if I enable the splitting for non-BB SLP; I was
thinking this was because I needed to split the load groups too, but maybe not
- maybe this is a separate bug/issue with hybrid SLP. Whatever the reason, I
still think splitting groups in hybrid SLP is another patch. (Do we really want
to put off handling the basic-block case until it works for hybrid SLP as well?
IMHO I would think not.) It sounds as if the approach of restricting splitting
to store groups, with appropriate asserts that GROUP_GAP == 1, is thus the right thing
to do in the longer term too (hence, renamed vect_split_slp_store_group to
emphasize that) - at least until we remove that restriction on SLP generally.

Bootstrapped + check-{gcc,g++,gfortran} on x86_64, AArch64, ARM.

Re. the extra skipping loop, I think it would theoretically be possible for the
recursive call to vect_slp_analyze to succeed on an element where the original
failed, because it may have more num_permutes remaining (after skipping over
the first vector). So there's a second argument (besides code complexity) for
dropping that part?

gcc/ChangeLog:

* tree-vect-slp.c (vect_split_slp_store_group): New.
(vect_analyze_slp_instance): Recurse on subgroup(s) if
vect_build_slp_tree fails during basic block SLP.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-7.c (main1): Make subgroups non-isomorphic.
* gcc.dg/vect/bb-slp-subgroups-1.c: New.
* gcc.dg/vect/bb-slp-subgroups-2.c: New.
* gcc.dg/vect/bb-slp-subgroups-3.c: New.
* gcc.dg/vect/bb-slp-subgroups-4.c: New.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-7.c   | 10 +--
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c | 44 +
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c | 42 +
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 41 
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c | 41 
 gcc/tree-vect-slp.c| 87 +-
 6 files changed, 259 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
index ab54a48..b8bef8c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
@@ -16,12 +16,12 @@ main1 (unsigned int x, unsigned int y)
   unsigned int *pout = &out[0];
   unsigned int a0, a1, a2, a3;
 
-  /* Non isomorphic.  */
+  /* Non isomorphic, even 64-bit subgroups.  */
   a0 = *pin++ + 23;
-  a1 = *pin++ + 142;
+  a1 = *pin++ * 142;
   a2 = *pin++ + 2;
   a3 = *pin++ * 31;
-  
+
   *pout++ = a0 * x;
   *pout++ = a1 * y;
   *pout++ = a2 * x;
@@ -29,7 +29,7 @@ main1 (unsigned int x, unsigned int y)
 
   /* Check results.  */
   if (out[0] != (in[0] + 23) * x
-  || out[1] != (in[1] + 142) * y
+  || out[1] != (in[1] * 142) * y
   || out[2] != (in[2] + 2) * x
   || out[3] != (in[3] * 31) * y)
 abort();
@@ -47,4 +47,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "basic block vectorized" 0 "slp2" } } */
-  
+
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
new file mode 100644
index 000..39c23c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
@@ -0,0 +1,44 @@
+/* { dg-require-effective-target vect_int } */
+/* PR tree-optimization/67682.  */
+
+#include "tree-vect.h"
+
+int __attribute__((__aligned__(8))) a[8];
+int __attribute__((__aligned__(8))) b[4];
+
+__attribute__ ((noinline)) void
+test ()
+{
+a[0] = b[0];
+a[1] = b[1];
+a[2] = b[2];
+a[3] = b[3];
+a[4] = 0;
+a[5] = 0;
+a[6] = 0;
+a[7] = 0;
+}
+
+int
+main (int argc, char **argv)
+{
+  check_vect ();
+
+  for (int i = 0; i < 8; i++)
+a[i] = 1;
+  for (int i = 0; i < 4; i++)
+b[i] = i + 4;
+  __asm__ volatile ("" : : : "memory");
+  test (a, b);
+  __asm__ volatile ("" : : : "memory");
+  for (int i = 0; i < 4; i++)
+if (a[i] != i

Re: Division Optimization in match and simplify

2015-11-05 Thread Michael Matz
Hi,

On Wed, 4 Nov 2015, Richard Biener wrote:

> Ah, it was _left_ shift of negative values that ubsan complains about.

Note that this is only for the frontend definition of shifts.  I don't see 
why gimple shouldn't define it to the only sensible definition there is, 
which also happens to be the one that all CPUs in the world implement 
(well, those using two's-complement representation at least).


Ciao,
Michael.


Re: [PR64164] drop copyrename, integrate into expand

2015-11-05 Thread Richard Biener
On Thu, Nov 5, 2015 at 6:08 AM, Alexandre Oliva  wrote:
> On Sep 23, 2015, Alexandre Oliva  wrote:
>
>> @@ -2982,38 +2887,39 @@ assign_parm_setup_block (struct assign_parm_data_all 
>> *all,
> [snip]
>> +  if (GET_CODE (reg) != CONCAT)
>> +stack_parm = reg;
>> +  else
>> +/* This will use or allocate a stack slot that we'd rather
>> +   avoid.  FIXME: Could we avoid it in more cases?  */
>> +target_reg = reg;
>
> It turns out that we can, and that helps fix PR67753.  In the end, I
> ended up using the ABI-reserved stack slot if there is one, but just
> allocating an unsplit complex pseudo fixes all remaining cases that used
> to require the allocation of a stack slot.  Yay!
>
> As for pr67753 proper, we emitted the store of the PARALLEL entry_parm
> into the stack parm only in the conversion seq, which was ultimately
> emitted after the copy from stack_parm to target_reg that was supposed
> to copy the value originally in entry_parm.  So we copied an
> uninitialized stack slot, and the subsequent store in the conversion seq
> was optimized out as dead.
>
> This caused a number of regressions on hppa-linux-gnu.  The fix for this
> is to arrange for the copy to target_reg to be emitted in the conversion
> seq if the copy to stack_parm was.  I can't determine whether this fixes
> all reported regressions, but from visual inspection of the generated
> code I'm pretty sure it fixes at least gcc.c-torture/execute/pr38969.c.
>
>
> When we do NOT have an ABI-reserved stack slot, the store of the
> PARALLEL entry_parm into the intermediate pseudo doesn't need to go in
> the conversion seq (emit_group_store from a PARALLEL to a pseudo only
> uses registers, according to another comment in function.c), so I've
> simplified that case.
>
>
> This was regstrapped on x86_64-linux-gnu, i686-linux-gnu,
> ppc64-linux-gnu, ppc64el-linux-gnu, and cross-build-tested for all
> targets for which I've tested the earlier patches in the patchset.
> Ok to install?

Ok.

Thanks,
Richard.

>
>
> [PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg
>
> From: Alexandre Oliva 
>
> In assign_parms_setup_block, the copy of args in PARALLELs from
> entry_parm to stack_parm is deferred to the parm conversion insn seq,
> but the copy from stack_parm to target_reg was inserted in the normal
> copy seq, that is executed before the conversion insn seq.  Oops.
>
> We could do away with the need for an actual stack_parm in general,
> which would have avoided the need for emitting the copy to target_reg
> in the conversion seq, but at least on pa, due to the need for stack
> to copy between SI and SF modes, it seems like using the reserved
> stack slot is beneficial, so I put in logic to use a pre-reserved
> stack slot when there is one, and emit the copy to target_reg in the
> conversion seq if stack_parm was set up there.
>
> for  gcc/ChangeLog
>
> PR rtl-optimization/67753
> PR rtl-optimization/64164
> * function.c (assign_parm_setup_block): Avoid allocating a
> stack slot if we don't have an ABI-reserved one.  Emit the
> copy to target_reg in the conversion seq if the copy from
> entry_parm is in it too.  Don't use the conversion seq to copy
> a PARALLEL to a REG or a CONCAT.
> ---
>  gcc/function.c |   39 ++-
>  1 file changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/function.c b/gcc/function.c
> index aaf49a4..156c72b 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -2879,6 +2879,7 @@ assign_parm_setup_block (struct assign_parm_data_all 
> *all,
>rtx entry_parm = data->entry_parm;
>rtx stack_parm = data->stack_parm;
>rtx target_reg = NULL_RTX;
> +  bool in_conversion_seq = false;
>HOST_WIDE_INT size;
>HOST_WIDE_INT size_stored;
>
> @@ -2895,9 +2896,23 @@ assign_parm_setup_block (struct assign_parm_data_all 
> *all,
>if (GET_CODE (reg) != CONCAT)
> stack_parm = reg;
>else
> -   /* This will use or allocate a stack slot that we'd rather
> -  avoid.  FIXME: Could we avoid it in more cases?  */
> -   target_reg = reg;
> +   {
> + target_reg = reg;
> + /* Avoid allocating a stack slot, if there isn't one
> +preallocated by the ABI.  It might seem like we should
> +always prefer a pseudo, but converting between
> +floating-point and integer modes goes through the stack
> +on various machines, so it's better to use the reserved
> +stack slot than to risk wasting it and allocating more
> +for the conversion.  */
> + if (stack_parm == NULL_RTX)
> +   {
> + int save = generating_concat_p;
> + generating_concat_p = 0;
> + stack_parm = gen_reg_rtx (mode);
> + generating_concat_p = save;
> +   }
> +   }
>data->stack_parm = NULL;
>  }
>
> @@ -2938,7 +2953,9 

Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-11-05 Thread Richard Biener
On Wed, Nov 4, 2015 at 5:50 PM, Jan Hubicka  wrote:
>> > Are these supposed to be fixed by Richard's change to not use
>> > useless_type_conversion for VCE, or is it another issue?
>>
>> Richard's change not to use useless_type_conversion for VCE was causing
>> additional GIMPLE verification failures so I didn't pursue; I can try again,
>> but all the known regressions are now fixed thanks to Richard's latest change
>> to useless_type_conversion_p itself.
>
> I see, you re-instantiated the TYPE_CANONICAL check for aggregates instead.  I
> guess it is most practical way to go right now even though it would be really 
> nice
> to separate this from TBAA machinery.
> At the moment LTO doesn't do globbing where calling conventions should care.
> One such case is the globing of array containing char and char which is 
> required
> by Fortran standard, but that IMO is a defect in standard - if types are 
> passed
> differently by the target ABI one can't expect them to be fully interoperable as 
> Fortran
> would like.

Note that I can't see how non-register type defs/uses will ever
"change" their type
during optimization so I think using TYPE_CANONICAL for the aggregate type case
is fine.

Richard.

> Thank you very much for looking into this!
> Honza
>>
>> --
>> Eric Botcazou


[Fortran, patch, pr68218, v1] ALLOCATE with size given by a module function

2015-11-05 Thread Andre Vehreschild
Hi all,

attached is a rather trivial patch to prevent multiple evaluations of a
function in:

  allocate( array(func()) )

The patch tests whether the upper bound of the array is a function call
and, if so, calls gfc_evaluate_now().

Bootstrapped and regtested for x86_64-linux-gnu/f21.

Ok for trunk?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 6bbf8cc..e28a5ce 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -5150,6 +5150,8 @@ gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset,
 	  gcc_assert (ubound);
 	  gfc_conv_expr_type (&se, ubound, gfc_array_index_type);
 	  gfc_add_block_to_block (pblock, &se.pre);
+	  if (ubound->expr_type == EXPR_FUNCTION)
+	se.expr = gfc_evaluate_now (se.expr, pblock);
 	}
   gfc_conv_descriptor_ubound_set (descriptor_block, descriptor,
   gfc_rank_cst[n], se.expr);
diff --git a/gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90 b/gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90
new file mode 100644
index 000..686b612
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/allocate_with_arrayspec_1.f90
@@ -0,0 +1,29 @@
+! { dg-do run }
+! { dg-options "-fdump-tree-original" }
+
+MODULE mo_test
+
+  integer :: n = 0
+CONTAINS
+
+  FUNCTION nquery()
+INTEGER :: nquery
+WRITE (0,*) "hello!"
+n = n + 1
+nquery = n
+  END FUNCTION nquery
+
+END MODULE mo_test
+
+
+! --
+! MAIN PROGRAM
+! --
+PROGRAM example
+   USE mo_test
+   INTEGER, ALLOCATABLE :: query_buf(:)
+   ALLOCATE(query_buf(nquery()))
+   if (n /= 1 .or. size(query_buf) /= n) call abort()
+END PROGRAM example
+
+! { dg-final { scan-tree-dump-times "nquery" 5 "original" } }


[gomp4, committed] Revert omp_data_i init handling in copy_prop

2015-11-05 Thread Tom de Vries

Hi,

this patch reverts omp_data_i init handling in copy_prop.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Revert omp_data_i init handling in copy_prop

2015-11-05  Tom de Vries  

	revert:
	2015-04-21  Tom de Vries  

	* tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
	conservatively.
---
 gcc/tree-ssa-copy.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/tree-ssa-copy.c b/gcc/tree-ssa-copy.c
index 4a3e4bd..4992bda 100644
--- a/gcc/tree-ssa-copy.c
+++ b/gcc/tree-ssa-copy.c
@@ -40,7 +40,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "tree-ssa-dom.h"
 #include "tree-ssa-loop-niter.h"
-#include "omp-low.h"
 
 
 /* This file implements the copy propagation pass and provides a
@@ -96,9 +95,6 @@ stmt_may_generate_copy (gimple *stmt)
   if (gimple_has_volatile_ops (stmt))
 return false;
 
-  if (gimple_stmt_omp_data_i_init_p (stmt))
-return false;
-
   /* Statements with loads and/or stores will never generate a useful copy.  */
   if (gimple_vuse (stmt))
 return false;
-- 
1.9.1



Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-05 Thread Cesar Philippidis
On 11/05/2015 04:14 AM, Thomas Schwinge wrote:

> On Tue, 3 Nov 2015 14:16:59 -0800, Cesar Philippidis  
> wrote:
>> This patch does the following to the c and c++ front ends:
> 
>>  * updates c_oacc_split_loop_clauses to filter out the loop clauses
>>from combined parallel/kernels loops
> 
>>  gcc/c-family/
>>  * c-omp.c (c_oacc_split_loop_clauses): Make TILE, GANG, WORKER, VECTOR,
>>  AUTO, SEQ and independent as loop clauses.  Associate REDUCTION
>>  clauses with parallel and kernels.
> 
>> --- a/gcc/c-family/c-omp.c
>> +++ b/gcc/c-family/c-omp.c
>> @@ -709,12 +709,21 @@ c_oacc_split_loop_clauses (tree clauses, tree 
>> *not_loop_clauses)
>>  
>>switch (OMP_CLAUSE_CODE (clauses))
>>  {
>> +  /* Loop clauses.  */
>>  case OMP_CLAUSE_COLLAPSE:
>> -case OMP_CLAUSE_REDUCTION:
>> +case OMP_CLAUSE_TILE:
>> +case OMP_CLAUSE_GANG:
>> +case OMP_CLAUSE_WORKER:
>> +case OMP_CLAUSE_VECTOR:
>> +case OMP_CLAUSE_AUTO:
>> +case OMP_CLAUSE_SEQ:
>> +case OMP_CLAUSE_INDEPENDENT:
>>OMP_CLAUSE_CHAIN (clauses) = loop_clauses;
>>loop_clauses = clauses;
>>break;
>>  
>> +  /* Parallel/kernels clauses.  */
>> +
>>  default:
>>OMP_CLAUSE_CHAIN (clauses) = *not_loop_clauses;
>>*not_loop_clauses = clauses;
> 
> Contrary to your ChangeLog entry, this is not duplicating but is moving
> OMP_CLAUSE_REDUCTION handling.  Is that intentional?  (And, doesn't
> anything break in the testsuite?)

Sorry, I must have mis-phrased it. The spec is unclear here. There are
three possible ways to interpret 'acc parallel loop reduction':

  1. acc parallel reduction
 acc loop

  2. acc parallel
 acc loop reduction

  3. acc parallel reduction
 acc loop reduction


You told me to make all of the front ends consistent, and since I
started working on fortran first, I had c and c++ follow what it was doing.

I haven't observed any regressions with this in place. Then again,
maybe we don't have sufficient test coverage. I'll do more testing.

Cesar



[gomp4, committed] Remove gimple_stmt_omp_data_i_init_p

2015-11-05 Thread Tom de Vries

Hi,

this patch:
- removes a superfluous test on gimple_stmt_omp_data_i_init_p in
  tree-parloops.c
- removes unused functions gimple_stmt_omp_data_i_init_p and
  get_omp_data_i.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Remove gimple_stmt_omp_data_i_init_p

2015-11-05  Tom de Vries  

	* omp-low.c (gimple_stmt_omp_data_i_init_p, get_omp_data_i): Remove.
	* omp-low.h: Same.
	* tree-parloops.c (oacc_entry_exit_ok_1): Remove test on
	gimple_stmt_omp_data_i_init_p.
---
 gcc/omp-low.c   | 48 
 gcc/omp-low.h   |  2 --
 gcc/tree-parloops.c |  2 --
 3 files changed, 52 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 643d017..b3731e3 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -18681,54 +18681,6 @@ oacc_kernels_region_entry_p (basic_block bb, gomp_target **directive)
   return res;
 }
 
-/* Return true if STMT is copy assignment .omp_data_i = &.omp_data_arr.  */
-
-bool
-gimple_stmt_omp_data_i_init_p (gimple *stmt)
-{
-  /* Extract obj from stmt 'a = &obj.  */
-  if (!gimple_assign_cast_p (stmt)
-  && !gimple_assign_single_p (stmt))
-return false;
-  tree rhs = gimple_assign_rhs1 (stmt);
-  if (TREE_CODE (rhs) != ADDR_EXPR)
-return false;
-  tree obj = TREE_OPERAND (rhs, 0);
-
-  /* Check that the last statement in the preceding bb is an oacc kernels
- stmt.  */
-  basic_block bb = gimple_bb (stmt);
-  gomp_target *kernels;
-  if (!oacc_kernels_region_entry_p (bb, &kernels))
-return false;
-
-  /* Get omp_data_arr from the oacc kernels stmt.  */
-  tree data_arg = gimple_omp_target_data_arg (kernels);
-  tree omp_data_arr = TREE_VEC_ELT (data_arg, 0);
-
-  /* If obj is omp_data_arr, we've found the .omp_data_i init statement.  */
-  return operand_equal_p (obj, omp_data_arr, 0);
-}
-
-
-/* Return omp_data_i corresponding to the assignment
-   .omp_data_i = &.omp_data_arr in oacc kernels region entry REGION_ENTRY.  */
-
-tree
-get_omp_data_i (basic_block region_entry)
-{
-  if (!single_succ_p (region_entry))
-return NULL_TREE;
-  basic_block bb = single_succ (region_entry);
-  gimple_stmt_iterator gsi = gsi_start_bb (bb);
-  if (gsi_end_p (gsi))
-return NULL_TREE;
-  gimple *stmt = gsi_stmt (gsi);
-  if (!gimple_stmt_omp_data_i_init_p (stmt))
-return NULL_TREE;
-  return gimple_assign_lhs (stmt);
-}
-
 namespace {
 
 const pass_data pass_data_late_lower_omp =
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index 673b470..d1755a8 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -30,8 +30,6 @@ extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
 extern tree omp_member_access_dummy_var (tree);
-extern bool gimple_stmt_omp_data_i_init_p (gimple *);
-extern tree get_omp_data_i (basic_block);
 extern bool oacc_kernels_region_entry_p (basic_block, gomp_target **);
 extern basic_block get_oacc_kernels_region_exit (basic_block);
 extern basic_block loop_get_oacc_kernels_region_entry (struct loop *);
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index c038dfe..7e888d8 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2956,8 +2956,6 @@ oacc_entry_exit_ok_1 (bitmap in_loop_bbs, vec<basic_block> region_bbs,
 	}
 	  else if (gimple_code (stmt) == GIMPLE_OMP_RETURN)
 	continue;
-	  else if (gimple_stmt_omp_data_i_init_p (stmt))
-	continue;
 	  else if (!gimple_has_side_effects (stmt)
 		   && !gimple_could_trap_p (stmt)
 		   && !stmt_could_throw_p (stmt)
-- 
1.9.1



[gomp4, committed] Remove misc oacc kernels functions

2015-11-05 Thread Tom de Vries

Hi,

this patch removes a number of unused functions related to the oacc
kernels region.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Remove misc oacc kernels functions

2015-11-05  Tom de Vries  

	* omp-low.c (get_bbs_in_oacc_kernels_region):
	(loop_get_oacc_kernels_region_entry, get_oacc_kernels_region_exit)
	(oacc_kernels_region_entry_p): Remove.
	* omp-low.h: Same.
---
 gcc/omp-low.c | 134 --
 gcc/omp-low.h |   5 ---
 2 files changed, 139 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b3731e3..debedb1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12295,115 +12295,6 @@ mark_loops_in_oacc_kernels_region (basic_block region_entry,
   loop->in_oacc_kernels_region = true;
 }
 
-/* Return blocks in oacc kernels region delimited by REGION_ENTRY and
-   REGION_EXIT.  */
-
-vec<basic_block>
-get_bbs_in_oacc_kernels_region (basic_block region_entry,
- basic_block region_exit)
-{
-  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
-  unsigned di;
-  basic_block bb;
-
-  bitmap_clear (excludes_bitmap);
-
-  /* Get all the blocks dominated by the region entry.  That will include the
- entire region.  */
-  vec<basic_block> dominated
-= get_all_dominated_blocks (CDI_DOMINATORS, region_entry);
-
-  bitmap_set_bit (excludes_bitmap, region_entry->index);
-
-  /* Exclude all the blocks which are not in the region: the blocks dominated by
- the region exit.  */
-  if (region_exit != NULL)
-{
-  vec<basic_block> excludes
-	= get_all_dominated_blocks (CDI_DOMINATORS, region_exit);
-  FOR_EACH_VEC_ELT (excludes, di, bb)
-	bitmap_set_bit (excludes_bitmap, bb->index);
-  bitmap_clear_bit (excludes_bitmap, region_exit->index);
-}
-
-  vec<basic_block> bbs = vNULL;
-
-  FOR_EACH_VEC_ELT (dominated, di, bb)
-if (!bitmap_bit_p (excludes_bitmap, bb->index))
-  bbs.safe_push (bb);
-
-  return bbs;
-}
-
-/* Return the entry basic block of the oacc kernels region containing LOOP.  */
-
-basic_block
-loop_get_oacc_kernels_region_entry (struct loop *loop)
-{
-  if (!loop->in_oacc_kernels_region)
-return NULL;
-
-  basic_block bb = loop->header;
-  while (true)
-{
-  bb = get_immediate_dominator (CDI_DOMINATORS, bb);
-  gcc_assert (bb != NULL);
-
-  gimple *last = last_stmt (bb);
-  if (last != NULL
-	  && gimple_code (last) == GIMPLE_OMP_TARGET
-	  && gimple_omp_target_kind (last) == GF_OMP_TARGET_KIND_OACC_KERNELS)
-	return bb;
-}
-}
-
-/* Return the oacc kernels region exit corresponding to REGION_ENTRY.  */
-
-basic_block
-get_oacc_kernels_region_exit (basic_block region_entry)
-{
-  gcc_checking_assert (oacc_kernels_region_entry_p (region_entry, NULL));
-
-  bitmap to_visit = BITMAP_ALLOC (NULL);
-  bitmap visited = BITMAP_ALLOC (NULL);
-  bitmap_clear (to_visit);
-  bitmap_clear (visited);
-
-  bitmap_set_bit (to_visit, region_entry->index);
-
-  basic_block bb;
-  while (true)
-{
-  if (bitmap_empty_p (to_visit))
-	{
-	  bb = NULL;
-	  break;
-	}
-
-  unsigned int index = bitmap_first_set_bit (to_visit);
-  bitmap_clear_bit (to_visit, index);
-  bitmap_set_bit (visited, index);
-  bb = BASIC_BLOCK_FOR_FN (cfun, index);
-
-  gimple *last = last_stmt (bb);
-  if (last != NULL
-	  && gimple_code (last) == GIMPLE_OMP_RETURN)
-	break;
-
-  edge_iterator ei;
-  for (ei = ei_start (bb->succs); !ei_end_p (ei); ei_next (&ei))
-	{
-	  edge e = ei_edge (ei);
-	  unsigned int dest_index = e->dest->index;
-	  if (!bitmap_bit_p (visited, dest_index))
-	bitmap_set_bit (to_visit, dest_index);
-	}
-}
-
-  BITMAP_FREE (to_visit);
-  return bb;
-}
-
 /* Encode an oacc launch argument.  This matches the GOMP_LAUNCH_PACK
macro on gomp-constants.h.  We do not check for overflow.  */
 
@@ -18656,31 +18547,6 @@ omp_finish_file (void)
 }
 }
 
-/* Return true if BB is an oacc kernels region entry.  If DIRECTIVE is non-null,
-   return the corresponding kernels directive in *DIRECTIVE.  */
-
-bool
-oacc_kernels_region_entry_p (basic_block bb, gomp_target **directive)
-{
-  /* Check that the last statement in the preceding bb is an oacc kernels
- stmt.  */
-  if (!single_pred_p (bb))
-return false;
-  gimple *last = last_stmt (single_pred (bb));
-  if (last == NULL
-  || gimple_code (last) != GIMPLE_OMP_TARGET)
-return false;
-  gomp_target *kernels = as_a <gomp_target *> (last);
-
-  bool res = (gimple_omp_target_kind (kernels)
-	  == GF_OMP_TARGET_KIND_OACC_KERNELS);
-
-  if (res && directive)
-*directive = kernels;
-
-  return res;
-}
-
 namespace {
 
 const pass_data pass_data_late_lower_omp =
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index d1755a8..e4c81b2 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -30,11 +30,6 @@ extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
 extern tree omp_member_access_dummy_var (tree);
-extern bool oacc_kernels_region_entry_p (basic_block, gomp_t

[gomp4, committed] Revert omp_data_i init handling in ccp and forwprop

2015-11-05 Thread Tom de Vries

Hi,

This patch reverts the omp_data_i init handling in ccp and forwprop.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Revert omp_data_i init handling in ccp and forwprop

2015-11-05  Tom de Vries  

	Revert:
	2015-04-21  Tom de Vries  

	* tree-ssa-ccp.c: Include omp-low.h.
	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
	conservatively.
	* tree-ssa-forwprop.c: Include omp-low.h.
	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
---
 gcc/tree-ssa-ccp.c  | 6 --
 gcc/tree-ssa-forwprop.c | 4 +---
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index cf93277..2831cfe 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -147,7 +147,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "builtins.h"
 #include "tree-chkp.h"
-#include "omp-low.h"
 
 
 /* Possible lattice values.  */
@@ -810,9 +809,6 @@ surely_varying_stmt_p (gimple *stmt)
   && gimple_code (stmt) != GIMPLE_CALL)
 return true;
 
-  if (gimple_stmt_omp_data_i_init_p (stmt))
-return true;
-
   return false;
 }
 
@@ -2368,8 +2364,6 @@ ccp_visit_stmt (gimple *stmt, edge *taken_edge_p, tree *output_p)
   switch (gimple_code (stmt))
 {
   case GIMPLE_ASSIGN:
-	if (gimple_stmt_omp_data_i_init_p (stmt))
-	  break;
 /* If the statement is an assignment that produces a single
output value, evaluate its RHS to see if the lattice value of
its output has changed.  */
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index bb08a4a..491178d 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -60,7 +60,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "tree-into-ssa.h"
 #include "cfganal.h"
-#include "omp-low.h"
 
 /* This pass propagates the RHS of assignment statements into use
sites of the LHS of the assignment.  It's basically a specialized
@@ -2132,8 +2131,7 @@ pass_forwprop::execute (function *fun)
 	  tree lhs, rhs;
 	  enum tree_code code;
 
-	  if (!is_gimple_assign (stmt)
-	  || gimple_stmt_omp_data_i_init_p (stmt))
+	  if (!is_gimple_assign (stmt))
 	{
 	  gsi_next (&gsi);
 	  continue;
-- 
1.9.1



Re: Re: [AArch64][TLSGD][2/2] Implement TLS GD traditional for tiny code model

2015-11-05 Thread Jiong Wang

Marcus Shawcroft writes:


+#ifdef HAVE_AS_TINY_TLSGD_RELOCS
+  return SYMBOL_TINY_TLSGD;
+#else
+  return SYMBOL_SMALL_TLSGD;
+#endif

Rather than introduce blocks of conditional compilation it is better
to gate different behaviours with a test on a constant expression. In
this case add something like this:

#if defined(HAVE_AS_TINY_TLSGD_RELOCS)
#define USE_TINY_TLSGD 1
#else
#define USE_TINY_TLSGD 0
#endif

up near the definition of TARGET_HAVE_TLS then write the above
fragment without using the preprocessor:

return USE_TINY_TLSGD ? SYMBOL_TINY_TLSGD : SYMBOL_SMALL_TLSGD;


Done.
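Marcus's suggestion generalizes to a common pattern; here is a minimal sketch of it (HAVE_FEATURE_X, USE_FEATURE_X, and pick_symbol are hypothetical names, not from the patch):

```cpp
/* Translate a configure-time macro into a 0/1 constant once, then use an
   ordinary conditional expression.  Both branches are always parsed and
   type-checked, and the dead one is eliminated by the compiler, unlike
   with blocks of #ifdef'd code.  */
#if defined(HAVE_FEATURE_X)
#define USE_FEATURE_X 1
#else
#define USE_FEATURE_X 0
#endif

enum symbol_kind { SYMBOL_TINY = 0, SYMBOL_SMALL = 1 };

inline symbol_kind pick_symbol() {
  // Mirrors the "return USE_TINY_TLSGD ? ... : ...;" shape suggested above.
  return USE_FEATURE_X ? SYMBOL_TINY : SYMBOL_SMALL;
}
```

The benefit is that a configuration which disables the feature still type-checks the disabled branch, so bit-rot in rarely-built configurations is caught early.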


- aarch64_emit_call_insn (gen_tlsgd_small (result, imm, resolver));
+ if (type == SYMBOL_SMALL_TLSGD)
+  aarch64_emit_call_insn (gen_tlsgd_small (result, imm, resolver));
+ else
+  aarch64_emit_call_insn (gen_tlsgd_tiny (result, imm, resolver));
  insns = get_insns ();
  end_sequence ();

Add a separate case statement for SYMBOL_TINY_TLSGD rather than reusing
the case statement for SYMBOL_SMALL_TLSGD and then needing another test
against the symbol type within the body of the case statement.


Done.



+(define_insn "tlsgd_tiny"
+  [(set (match_operand 0 "register_operand" "")
+ (call (mem:DI (match_operand:DI 2 "" "")) (const_int 1)))
+   (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "S")]
UNSPEC_GOTTINYTLS)
+   (clobber (reg:DI LR_REGNUM))
+  ]
+  ""
+  "adr\tx0, %A1;bl\t%2;nop";
+  [(set_attr "type" "multiple")
+   (set_attr "length" "12")])

I don't think the explicit clobber LR_REGNUM is required since your
change last September:
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02654.html


The explicit clobber of LR_REGNUM is unnecessary only if operand 0 happens
to be allocated to LR_REGNUM, which became possible after my patch above
made LR_REGNUM allocable.

However, we still need the explicit clobber here: in all other cases, where
LR_REGNUM is not allocated, GCC's data flow analysis cannot deduce that
LR_REGNUM is still clobbered implicitly by the call instruction.

Without this "clobber" tag, one direct impact is that df_regs_ever_live is
calculated incorrectly for x30. Then, for the following simple testcase:

__thread int t0 = 0x10;
__thread int t1 = 0x10;

int
main (int argc, char **argv)
{
  if (t0 != t1)
return 1;
  return  0;
}


if you compile with

 "-O2 -ftls-model=global-dynamic -fpic -mtls-dialect=trad t.c -mcmodel=tiny -fomit-frame-pointer",
wrong code will be generated:

 main:
str x19, [sp, -16]!  <--- x30 is not saved.
adr x0, :tlsgd:t0
bl __tls_get_addr
nop

Patch updated. TLS regression tests OK.

OK for trunk?

2015-11-05  Jiong Wang  

gcc/
  * configure.ac: Add check for binutils global dynamic tiny code model
  relocation support.
  * configure: Regenerate.
  * config.in: Regenerate.
  * config/aarch64/aarch64.md (tlsgd_tiny): New define_insn.
  * config/aarch64/aarch64-protos.h (aarch64_symbol_type): New
  enumeration SYMBOL_TINY_TLSGD.
  (aarch64_symbol_context): New comment on SYMBOL_TINY_TLSGD.
  * config/aarch64/aarch64.c (aarch64_classify_tls_symbol): Support
  SYMBOL_TINY_TLSGD.
  (aarch64_print_operand): Likewise.
  (aarch64_expand_mov_immediate): Likewise.
  (aarch64_load_symref_appropriately): Likewise.

gcc/testsuite/
  * lib/target-supports.exp (check_effective_target_aarch64_tlsgdtiny):
  New effective-target check.
  * gcc.target/aarch64/tlsgd_small_1.c: New testcase.
  * gcc.target/aarch64/tlsgd_small_ilp32_1.c: Likewise.
  * gcc.target/aarch64/tlsgd_tiny_1.c: Likewise.
  * gcc.target/aarch64/tlsgd_tiny_ilp32_1.c: Likewise.
  
--

Regards,
Jiong

diff --git a/gcc/config.in b/gcc/config.in
index 093478c..617278f 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -618,6 +618,13 @@
 #endif
 
 
+/* Define if your assembler supports relocs needed by global dynamic for tiny
+   code model. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_TINY_TLSGD_RELOCS
+#endif
+
+
 /* Define if your assembler and linker support thread-local storage. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_TLS
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index baaf1bd..9128baa 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -62,6 +62,7 @@
SYMBOL_SMALL_TLSGD
SYMBOL_SMALL_TLSDESC
SYMBOL_SMALL_TLSIE
+   SYMBOL_TINY_TLSGD
SYMBOL_TINY_TLSIE
SYMBOL_TLSLE12
SYMBOL_TLSLE24
@@ -103,6 +104,7 @@ enum aarch64_symbol_type
   SYMBOL_SMALL_TLSIE,
   SYMBOL_TINY_ABSOLUTE,
   SYMBOL_TINY_GOT,
+  SYMBOL_TINY_TLSGD,
   SYMBOL_TINY_TLSIE,
   SYMBOL_TLSLE12,
   SYMBOL_TLSLE24,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7e61b16..8c24740 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1042,6 +1042,22 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	return;
   }
 
+case SYMBOL_TINY_TLSGD:
+  {
+	rtx_insn *insns;
+	rtx result = gen_rtx_REG (Pmode, R0_REGNUM);
+	rtx resolver = aarch64_tls_get_addr ();
+
+	start_sequence ();
+	aarch64_emit_call_ins

Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-05 Thread Cesar Philippidis
On 11/04/2015 11:29 PM, Jakub Jelinek wrote:
> On Wed, Nov 04, 2015 at 08:58:32PM -0800, Cesar Philippidis wrote:
>> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
>> index e3f55a7..4424596 100644
>> --- a/gcc/cp/pt.c
>> +++ b/gcc/cp/pt.c
>> @@ -14395,6 +14395,15 @@ tsubst_omp_clauses (tree clauses, bool 
>> declare_simd, bool allow_fields,
>>  case OMP_CLAUSE_PRIORITY:
>>  case OMP_CLAUSE_ORDERED:
>>  case OMP_CLAUSE_HINT:
>> +case OMP_CLAUSE_NUM_GANGS:
>> +case OMP_CLAUSE_NUM_WORKERS:
>> +case OMP_CLAUSE_VECTOR_LENGTH:
>> +case OMP_CLAUSE_GANG:
> 
> GANG has two arguments, so you want to handle it differently, you need
> to tsubst both arguments.

Good catch. Support for the static argument was added after I added
template support in gomp4. I'll fix that.

>> +case OMP_CLAUSE_WORKER:
>> +case OMP_CLAUSE_VECTOR:
>> +case OMP_CLAUSE_ASYNC:
>> +case OMP_CLAUSE_WAIT:
>> +case OMP_CLAUSE_TILE:
> 
> I don't think tile will work well this way, if the only argument is a
> TREE_VEC, then I think you hit:
> case TREE_VEC:
>   /* A vector of template arguments.  */
>   gcc_assert (!type);
>   return tsubst_template_args (t, args, complain, in_decl);
> which does something very much different from making a copy of the TREE_VEC
> and calling tsubst_expr on each argument.
> Thus, either you need to handle it manually here, or think about different
> representation of OMP_CLAUSE_TILE?  It seems you allow at most one tile
> clause, so perhaps you could split the single source tile clause into one
> tile clause per expression in there (the only issue is that the FEs
> emit the clauses in last to first order, so you'd need to nreverse the
> tile clause list before adding it to the list of all clauses).

It shouldn't be difficult to call it manually here.

> Otherwise it looks ok, except:

How about the other patch, with the c and c++ FE changes? Is that one OK
for trunk now? Nathan's going to need it for his firstprivate changes
in the middle end.

>> diff --git a/gcc/testsuite/g++.dg/goacc/template-reduction.C 
>> b/gcc/testsuite/g++.dg/goacc/template-reduction.C
>> new file mode 100644
>> index 000..668eeb3
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/goacc/template-reduction.C
>> +++ b/gcc/testsuite/g++.dg/goacc/template.C
> 
> the testsuite coverage is orders of magnitude smaller than it should be.
> Just look at the amount of OpenMP template tests (both compile time and
> runtime):
> grep template libgomp/testsuite/libgomp.c++/*[^4] | wc -l; grep -l template 
> libgomp/testsuite/libgomp.c++/*[^4] | wc -l; grep template 
> gcc/testsuite/g++.dg/gomp/* | wc -l; grep -l template 
> gcc/testsuite/g++.dg/gomp/* | wc -l
> 629 # templates
> 45 # tests with templates
> 151 # templates
> 58 # tests with templates
> and even that is really not sufficient.  From my experience, special care is
> needed in template tests to test both non-type dependent and type-dependent
> cases (e.g. some diagnostics is emitted already when parsing the templates
> even when they won't be instantiated at all, other diagnostic is done during
> instantiation), or for e.g. references there are interesting cases where
> a reference to template parameter typename is used or where a reference to
> some time is tsubsted into a template parameter typename.
> E.g. you don't have any test coverage for the vector (num: ...)
> or gang (static: *, num: type_dep)
> or gang (static: type_dep1, type_dep2)
> (which would show you the above issue with the gang clause), or sufficient
> coverage for tile, etc.
> Of course that coverage can be added incrementally.

We'll add more tests incrementally.

Cesar


Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-05 Thread Martin Sebor

On 11/04/2015 09:27 PM, Jason Merrill wrote:

On 11/04/2015 07:15 PM, Martin Sebor wrote:

There was a lot of discussion of C++ aliasing rules at the recent
meeting; we really seem to be moving in the direction of being stricter
about which union member is active.  So I think we do want to diagnose
the new-expression above; the user should write new (&u) if that's what
they mean.


Okay. I changed that in the latest patch.


Adjust is negative when the offset to a buffer of known size is
negative. For example:

 char buf [sizeof (int)];
 new (&buf [1] - 1) int;


OK, so because we're looking at the expression from the outside in, we
first see the subtraction and adjust becomes -1, then we see the
array_ref and adjust returns to 0.  We still don't have a negative
adjust by the time we get to the quoted if/else.


I think I see what you mean. I've changed the type of the variables
and the computation to unsigned. That made it possible to eliminate
the final else and do some other cleanup. Attached is an updated
patch.


Hmm, I was suggesting that bytes_avail change to unsigned, but I don't
think adjust should change; I believe that 0u - 1u is undefined due to
overflow even though (-1u) and (unsigned)-1 are well defined.  Sorry for
the muddled messages.  I think let's leave adjust signed and assert that
it ends up non-negative.


No problem.

Unsigned ints wrap around and don't overflow, so the subtraction
is well defined (0u - 1u equals UINT_MAX).
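The wraparound behavior Martin describes can be demonstrated directly; a minimal sketch (wrap_sub is an invented name):

```cpp
// Unsigned arithmetic is defined to wrap modulo 2^N, so expressions like
// 0u - 1u are well defined and equal UINT_MAX; there is no undefined
// behavior, unlike signed overflow.
#include <climits>

inline unsigned wrap_sub(unsigned a, unsigned b) {
  return a - b;  // defined as (a - b) mod 2^N for N-bit unsigned
}
```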

FWIW, I had the assert there for sanity testing when you first
mentioned it to convince myself there really was no way for it
become negative. A bootstrap went fine with it but it still made
me just a teeny bit uneasy. I would hate for the code to change
in the future and for the assert to then fire after it's released.

In any case, I defer to your better judgment. Please let me know
if you would still like to go with signed + assert.

Thanks
Martin
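For reference, the kind of placement-new expression under discussion (an offset into a buffer followed by a compensating subtraction) can be exercised like this; a minimal sketch, not the patch's diagnostic code:

```cpp
#include <new>

// The buffer is exactly sizeof(int) bytes and suitably aligned, so the
// placement new below does not overflow: &buf[1] - 1 points back at buf[0],
// i.e. the net adjustment is zero, matching the example in the thread.
struct IntBuf {
  alignas(int) char buf[sizeof(int)];
};

inline int *place_at_start(IntBuf &b) {
  return new (&b.buf[1] - 1) int(42);  // same address as &b.buf[0]
}
```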



[PATCH] Fix tree_operand_hash traits

2015-11-05 Thread Richard Biener

It was providing equal_keys rather than equal, and thus the hash table
lookup used equal from the inherited pointer_hash (so pointer
equality rather than operand_equal_p as desired).

This makes hash_map work as intended.
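The effect of the bug can be illustrated outside GCC with the standard library; a minimal sketch (the struct names are invented):

```cpp
// If a hash container is given pointer equality instead of deep equality,
// lookups of equal-but-distinct keys silently miss -- the analogue of the
// traits class inheriting pointer_hash's equal instead of overriding it.
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_set>

struct StrHash {  // hash by contents, so both sets bucket keys identically
  std::size_t operator()(const char *s) const {
    return std::hash<std::string>()(s);
  }
};
struct PtrEq {    // analogous to the inherited pointer_hash equality
  bool operator()(const char *a, const char *b) const { return a == b; }
};
struct DeepEq {   // analogous to the intended operand_equal_p-based equality
  bool operator()(const char *a, const char *b) const {
    return std::strcmp(a, b) == 0;
  }
};

using PtrSet = std::unordered_set<const char *, StrHash, PtrEq>;
using DeepSet = std::unordered_set<const char *, StrHash, DeepEq>;
```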

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-11-05  Richard Biener  

* tree-hash-traits.h (tree_operand_hash): Provide equal, not
equal_keys.

Index: gcc/tree-hash-traits.h
===
--- gcc/tree-hash-traits.h  (revision 229804)
+++ gcc/tree-hash-traits.h  (working copy)
@@ -23,18 +23,20 @@ along with GCC; see the file COPYING3.
 /* Hash for trees based on operand_equal_p.  */
 struct tree_operand_hash : ggc_ptr_hash <tree_node>
 {
-  static inline hashval_t hash (const_tree);
-  static inline bool equal_keys (const_tree, const_tree);
+  static inline hashval_t hash (const value_type &);
+  static inline bool equal (const value_type &,
+   const compare_type &);
 };
 
 inline hashval_t
-tree_operand_hash::hash (const_tree t)
+tree_operand_hash::hash (const value_type &t)
 {
   return iterative_hash_expr (t, 0);
 }
 
 inline bool
-tree_operand_hash::equal_keys (const_tree t1, const_tree t2)
+tree_operand_hash::equal (const value_type &t1,
+ const compare_type &t2)
 {
   return operand_equal_p (t1, t2, 0);
 }


Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant

2015-11-05 Thread Richard Biener
On Thu, Nov 5, 2015 at 2:26 PM, Alan Lawrence  wrote:
> On 3 November 2015 at 11:35, Richard Biener  
> wrote:
>>
>> I think this should simply re-write A << B to (type) (unsigned-type) A
>> * (1U << B).
>>
>> Does that then still vectorize the signed case?
>
> I didn't realize our representation of chrec's could express that.
> Yes, it does - thanks! (And the avx512ifma- test is compiled without 
> warnings.)
>
> Patch attached. I've added a platform-independent version of the failing 
> AVX512
> test too.
>
> --Alan
>
> gcc/ChangeLog:
>
> PR tree-optimization/65963
> * tree-scalar-evolution.c (interpret_rhs_expr): Try to handle
> LSHIFT_EXPRs as equivalent unsigned MULT_EXPRs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr68112.c: New.
> * gcc.dg/vect/vect-strided-shift-1.c: New.
> ---
>  gcc/testsuite/gcc.dg/pr68112.c   | 11 
>  gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 
> 
>  gcc/tree-scalar-evolution.c  | 17 
>  3 files changed, 61 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr68112.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr68112.c b/gcc/testsuite/gcc.dg/pr68112.c
> new file mode 100644
> index 000..0a45b03
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr68112.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Waggressive-loop-optimizations" } */
> +
> +int *a;
> +
> +void
> +foo ()
> +{
> +  for (int i = 0; i < 65536; i++)
> +*a = i << 24;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> new file mode 100644
> index 000..b1ce2ec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/65963.  */
> +#include "tree-vect.h"
> +
> +#define N 512
> +
> +int in[2*N], out[N];
> +
> +__attribute__ ((noinline)) void
> +loop (void)
> +{
> +  for (int i = 0; i < N; i++)
> +out[i] = in[i << 1] + 7;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> +  check_vect ();
> +  for (int i = 0; i < 2*N; i++)
> +{
> +  in[i] = i;
> +  __asm__ volatile ("" : : : "memory");
> +}
> +  loop ();
> +  __asm__ volatile ("" : : : "memory");
> +  for (int i = 0; i < N; i++)
> +{
> +  if (out[i] != i*2 + 7)
> +   abort ();
> +}
> +  return 0;
> +}
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 
> "vect" { target { vect_strided2 } } } } */
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 0753bf3..60d515d 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -1840,6 +1840,23 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
>res = chrec_fold_multiply (type, chrec1, chrec2);
>break;
>
> +case LSHIFT_EXPR:
> +  {
> +   /* Handle A<<B as A * (1<<B).  */
> +   tree uns = unsigned_type_for (type);
> +   chrec1 = analyze_scalar_evolution (loop, rhs1);
> +   chrec2 = analyze_scalar_evolution (loop, rhs2);
> +   chrec1 = chrec_convert (uns, chrec1, at_stmt);
> +   chrec1 = instantiate_parameters (loop, chrec1);
> +   chrec2 = instantiate_parameters (loop, chrec2);
> +
> +   tree one = build_int_cst (unsigned_type_for (TREE_TYPE (rhs1)), 1);

use 'uns' for the type.

Ok with that change.

Richard.

> +   chrec2 = fold_build2 (LSHIFT_EXPR, uns, one, chrec2);
> +   res = chrec_fold_multiply (uns, chrec1, chrec2);
> +   res = chrec_convert (type, res, at_stmt);
> +  }
> +  break;
> +
>  CASE_CONVERT:
>/* In case we have a truncation of a widened operation that in
>   the truncated type has undefined overflow behavior analyze
> --
> 1.9.1
>
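The rewrite agreed on in this thread relies on an identity that holds for unsigned arithmetic; a minimal sketch (the function names are invented):

```cpp
// For an unsigned value a and an in-range shift count b, a << b equals
// a * (1u << b).  This is what lets scalar evolution model a left shift
// as a multiplication by a power of two, as in the patch above.
inline unsigned as_shift(unsigned a, unsigned b) { return a << b; }
inline unsigned as_mul(unsigned a, unsigned b) { return a * (1u << b); }
```

Doing the computation in the unsigned type sidesteps the undefined-overflow question for the signed case, which is why the chrec is converted to the unsigned type first and back afterwards.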


Re: [PATCH] Add configure flag for operator new (std::nothrow)

2015-11-05 Thread Daniel Gutson
On Wed, Nov 4, 2015 at 3:20 AM, Jonathan Wakely  wrote:
> On 4 November 2015 at 02:11, Daniel Gutson wrote:
>> Since this is a nothrow new, we thought that probably the system
>> might not be exceptions-friendly (such as certain embedded systems),
>> so we wanted to provide the new_handler the ability to do something else
>> other than trying to allocate memory and keep the function iterating.
>
> That could be done using an alternative overload of operator new
> instead of altering the semantics of the standard one (that could be
> provided as a GNU extension, for example).
>
>> In fact, our idea is that since the nothrow new can indeed return nullptr,
>> let the new_handler do something else and leave the no-memory-consequence
>> to the caller.
>> This new flag enables just that.
>
>
> The default configuration already allows the caller to deal with
> allocation failure from the nothrow version of new, by not installing
> a new-handler and by dealing with a null return value. How
> would I use this alternative configuration? Since the behaviour only
> changes when a new-handler is installed, presumably I'm meant to
> install some kind of new-handler that does something else ... but
> what? The patch should include documentation explaining when and how
> to use this option.

Real use cases: statistics and logging. It's a (one time) callback
reporting that something went wrong,
but not intended to fix things e.g. by attempting to free more memory.
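The reporting-only use case Daniel describes can be sketched with the standard new-handler machinery; note this is a hedged sketch of one way to do it, not the proposed GCC change. Under the standard semantics the handler must make progress or operator new (nothrow) loops forever, so here the handler deregisters itself after reporting, letting the caller receive nullptr as usual:

```cpp
#include <new>

// Hypothetical one-shot reporting hook: record/log the failure once, then
// remove the handler so the retry loop inside operator new (nothrow)
// terminates and the caller sees a null return value.
inline bool &oom_reported() {
  static bool reported = false;
  return reported;
}

inline void report_oom_once() {
  oom_reported() = true;           // e.g. bump a statistics counter, log
  std::set_new_handler(nullptr);   // let operator new (nothrow) return nullptr
}

inline void install_reporting_handler() {
  std::set_new_handler(report_oom_once);
}
```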




-- 

Daniel F. Gutson
Chief Engineering Officer, SPD

San Lorenzo 47, 3rd Floor, Office 5
Córdoba, Argentina

Phone:   +54 351 4217888 / +54 351 4218211
Skype:dgutson
LinkedIn: http://ar.linkedin.com/in/danielgutson


Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-05 Thread Jason Merrill

On 11/05/2015 10:12 AM, Martin Sebor wrote:

On 11/04/2015 09:27 PM, Jason Merrill wrote:

On 11/04/2015 07:15 PM, Martin Sebor wrote:

There was a lot of discussion of C++ aliasing rules at the recent
meeting; we really seem to be moving in the direction of being stricter
about which union member is active.  So I think we do want to diagnose
the new-expression above; the user should write new (&u) if that's what
they mean.


Okay. I changed that in the latest patch.


Adjust is negative when the offset to a buffer of known size is
negative. For example:

 char buf [sizeof (int)];
 new (&buf [1] - 1) int;


OK, so because we're looking at the expression from the outside in, we
first see the subtraction and adjust becomes -1, then we see the
array_ref and adjust returns to 0.  We still don't have a negative
adjust by the time we get to the quoted if/else.


I think I see what you mean. I've changed the type of the variables
and the computation to unsigned. That made it possible to eliminate
the final else and do some other cleanup. Attached is an updated
patch.


Hmm, I was suggesting that bytes_avail change to unsigned, but I don't
think adjust should change; I believe that 0u - 1u is undefined due to
overflow even though (-1u) and (unsigned)-1 are well defined.  Sorry for
the muddled messages.  I think let's leave adjust signed and assert that
it ends up non-negative.


No problem.

Unsigned ints wrap around and don't overflow so the subtraction
is well defined (0u - 1u is equal UINT_MAX).


I thought I had remembered that, but couldn't find anything in the 
standard to back it up.  Now I see that it's in 3.9.1 rather than clause 5.



FWIW, I had the assert there for sanity testing when you first
mentioned it to convince myself there really was no way for it
become negative. A bootstrap went fine with it but it still made
me just a teeny bit uneasy. I would hate for the code to change
in the future and for the assert to then fire after it's released.



In any case, I defer to your better judgment. Please let me know
if you would still like to go with signed + assert.


If we use gcc_checking_assert it won't fire in release builds; let's go 
with that.


Jason



Re: [PATCH v3 1/2] [PR debug/67192] Fix C loops' back-jump location

2015-11-05 Thread Andreas Arnez
On Thu, Nov 05 2015, Bernd Schmidt wrote:

> On 11/05/2015 12:33 PM, Andreas Arnez wrote:
>
>> Thanks again for reviewing.  Are you going to look at patch #2 as well?
>
> Yeah, still thinking about that one.
>
>>> Does C++ have similar issues?
>>
>> Not this particular issue, AFAIK.  But I've just looked at how C++ fares
>> with the enhanced version of pr67192.c from patch #2.  There I see the
>> following:
>>
>>Breakpoint 2, f4 () at pr67192.cc:54
>>(gdb) p cnt
>>$1 = 16
>>
>> I.e., when breaking on "while (1)" the first loop iteration has already
>> executed.  This is because the C++ parser assigns the backward-goto to
>> the 'while' token.  It's the same issue you pointed at with version 2 of
>> my patch.
>>
>> Shall I open a bug for that?
>
> I'd obviously prefer if you'd manage to get the two frontends behave
> identically. The alternative would be to open a bug.

OK, I guess it depends on whether we want to go the route of patch #2.
If so, it seems we can do the same for the C++ parser, like in the patch
below.  Note that I've not tested this very much.

-- >8 --
Subject: [PATCH] C++: Fix location of loop statement

---
 gcc/cp/cp-gimplify.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index e4b50e5..d9bb708 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -266,7 +266,12 @@ genericize_cp_loop (tree *stmt_p, location_t start_locus, tree cond, tree body,
loop = stmt_list;
 }
   else
-loop = build1_loc (start_locus, LOOP_EXPR, void_type_node, stmt_list);
+{
+  location_t loc = EXPR_LOCATION (expr_first (body));
+  if (loc == UNKNOWN_LOCATION)
+   loc = start_locus;
+  loop = build1_loc (loc, LOOP_EXPR, void_type_node, stmt_list);
+}
 
   stmt_list = NULL;
   append_to_statement_list (loop, &stmt_list);
-- 
2.3.0



Re: [PATCH] Merge from gomp-4_5-branch to trunk

2015-11-05 Thread Nathan Sidwell

On 11/05/15 10:29, Jakub Jelinek wrote:

Hi!

I've merged the current state of gomp-4_5-branch into trunk, after
bootstrapping/regtesting it on x86_64-linux and i686-linux.

There are
+FAIL: gfortran.dg/goacc/private-3.f95   -O  (test for excess errors)
+FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-red-v-2.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 (test for excess errors)
+UNRESOLVED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-red-v-2.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 compilation failed to produce executable
+FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-red-w-2.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 (test for excess errors)
+UNRESOLVED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-red-w-2.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 compilation failed to produce executable
+FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/loop-red-v-2.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 (test for excess errors)
+UNRESOLVED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/loop-red-v-2.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 compilation failed to produce executable
+FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/loop-red-w-2.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 (test for excess errors)
+UNRESOLVED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/loop-red-w-2.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 compilation failed to produce executable
regressions, but I really don't know why OpenACC allows reductions against
private variables, so either the testcases are wrong, or if OpenACC
reduction can work against private vars (automatic vars inside of parallel
too?), then perhaps it shouldn't set check_non_private for OpenACC
reduction clauses or something similar.  Certainly, if there is private
on the target region, returning 1 from omp_check_private is IMNSHO desirable
(and required for OpenMP at least).


I'm working on porting patches for that, and I had noticed the
check_non_private anomaly earlier today ...


I believe the C/C++ test cases are valid OpenACC, FWIW (I have not checked
the Fortran one yet).


Anyway, thanks for the heads-up, my ball.

nathan


[Ping][C++ Patch] PR c++/67846

2015-11-05 Thread Paolo Carlini

Hi,

pinging this (admittedly minor) issue...

On 10/26/2015 07:10 PM, Paolo Carlini wrote:

Hi,

in mainline this ICE on invalid doesn't exist anymore but we may want 
to avoid issuing the additional redundant "error: cannot convert 
‘A::foo’ from type ‘void (A::)()’ to type ‘void (A::*)()’" and/or make 
the error message more informative by printing the member used 
invalidly. Tested x86_64-linux.


https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02795.html

Thanks,
Paolo.



Re: [PATCH] Simple optimization for MASK_STORE.

2015-11-05 Thread Yuri Rumyantsev
Hi All!

I prepared another patch which inserts an additional check for a zero
mask for masked stores, but only if the parameter
PARAM_ZERO_TEST_FOR_MASK_STORE has a non-zero value. My attempt to use
the approach proposed by Richard with a simpler alternative for the
comparison - using a scalar type for the 256-bit case - was not
successful, so I returned to a vector comparison with a scalar Boolean
result.

ChangeLog:
2015-11-05  Yuri Rumyantsev  

* config/i386/i386.c: Add conditional initialization of
PARAM_ZERO_TEST_FOR_MASK_STORE.
(ix86_expand_branch): Implement vector comparison with boolean result.
* config/i386/i386.h: New macros TARGET_OPTIMIZE_MASK_STORE.
* config/i386/sse.md (define_expand "cbranch<mode>4"): Add define_expand
for vector comparison with eq/ne only.
* config/i386/x86-tune.def: New macros X86_TUNE_OPTIMIZE_MASK_STORE.
* fold-const.c (fold_relational_const): Add handling of vector
comparison with boolean result.
* params.def (PARAM_ZERO_TEST_FOR_MASK_STORE): New DEFPARAM.
* params.h (ENABLE_ZERO_TEST_FOR_MASK_STORE): New macros.
* tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
comparison of vector operands with boolean result for EQ/NE only.
(verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
(verify_gimple_cond): Likewise.
* tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not
combine vector comparison with boolean result and VEC_COND_EXPR that
has vector result.
* tree-vect-loop.c (is_valid_sink): New function.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
vectorized loops having masked stores.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
corresponding macros.
(optimize_mask_stores): Add prototype.

gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
* gcc.target/i386/avx2-vect-mask-store-move2.c: Likewise.

2015-11-02 18:24 GMT+03:00 Yuri Rumyantsev :
> Hi Richard,
>
> I've come back to this optimization and try to implement your proposal
> for comparison:
>> Btw, you didn't try the simpler alternative of
>>
>> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype)));
>> build2 (EQ_EXPR, boolean_type_node,
>>  build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1));
>>
>> ?  That is, use the GIMPLE level equivalent of
>>  (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI))
>
> using the following code:
>
>   vectype = TREE_TYPE (mask);
>   ext_mode = mode_for_size (GET_MODE_BITSIZE (TYPE_MODE (vectype)),
> MODE_INT, 0);
>   ext_type = lang_hooks.types.type_for_mode (ext_mode , 1);
>
> but I've got a zero type for it. Am I missing something?
>
> Any help will be appreciated.
> Yuri.
>
>
> 2015-08-13 14:40 GMT+03:00 Richard Biener :
>> On Thu, Aug 13, 2015 at 1:32 PM, Yuri Rumyantsev  wrote:
>>> Hi Richard,
>>>
>>> Did you have a chance to look at updated patch?
>>
>> Having a quick look now.  Btw, you didn't try the simpler alternative of
>>
>>  tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype)));
>>  build2 (EQ_EXPR, boolean_type_node,
>>build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1));
>>
>> ?  That is, use the GIMPLE level equivalent of
>>
>>  (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI))
>>
>> ?  That should be supported by the expander already, though again not sure if
>> the target(s) have compares that match this.
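Richard's subreg idea can be illustrated at the C++ level (a hedged sketch using memcpy as the stand-in for VIEW_CONVERT; the function name is illustrative):

```cpp
#include <cstdint>
#include <cstring>

// Equality-compare two 4 x int32 "vectors" by viewing their bytes as a
// pair of 64-bit integers: the C-level analogue of
//   (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI))
static bool vec_eq_as_wide_int (const std::int32_t a[4],
                                const std::int32_t b[4])
{
  std::uint64_t aw[2], bw[2];
  std::memcpy (aw, a, sizeof aw);  // the VIEW_CONVERT: same bits, integer view
  std::memcpy (bw, b, sizeof bw);
  return aw[0] == bw[0] && aw[1] == bw[1];
}
```

Whether this wins depends on the target having wide integer compares that match the reinterpreted operands.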
>>
>> Btw, the tree-cfg.c hook wasn't what was agreed on - the restriction
>> on EQ/NE_EXPR
>> is missing.  Operand type equality is tested anyway.
>>
>> Why do you need to restrict forward_propagate_into_comparison_1?
>>
>> Otherwise this looks better, but can you try with the VIEW_CONVERT as well?
>>
>> Thanks,
>> Richard.
>>
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2015-08-06 14:07 GMT+03:00 Yuri Rumyantsev :
 Hi All,

 Here is an updated patch which implements Richard's proposal to use vector
 comparison with boolean result instead of a target hook. Support for it
 was added to ix86_expand_branch.

 Any comments will be appreciated.

 Bootstrap and regression testing did not show any new failures.

 ChangeLog:
 2015-08-06  Yuri Rumyantsev  

 * config/i386/i386.c (ix86_expand_branch): Implement vector
 comparison with boolean result.
 * config/i386/sse.md (define_expand "cbranch4): Add define
 for vector comparison.
 * fold-const.c (fold_relational_const): Add handling of vector
 comparison with boolean result.
 * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM.
 * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): New macros.
 * tree-cfg.c (verify_gimple_comparison): Add test for vector
 comparison with boolean result.
 * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not
 propagate vector comparison with boolean result.
 * tree-vect-stmts.c (vectorizable_mask_load_stor

Re: [OpenACC] internal fn folding

2015-11-05 Thread Nathan Sidwell

On 11/04/15 05:02, Bernd Schmidt wrote:

On 11/02/2015 02:56 PM, Nathan Sidwell wrote:

On 10/28/15 14:40, Nathan Sidwell wrote:

Richard,
this patch adds folding for the new GOACC_DIM_POS and GOACC_DIM_SIZE
internal
functions.  IIUC gimple_fold_call is the right place to add this.

The size of a compute dimension is very often a compile-time
constant.  On the
host, in particular it's 1, which means we can deduce the POS must be
zero.

ok?


This is what I committed, using the helpers I recently added. (I realized we can 
only get here for functions with the oacc attribute already set)


nathan
2015-11-05  Nathan Sidwell  

	* gimple-fold.c: Include omp-low.h.
	(fold_internal_goacc_dim): New.
	(gimple_fold_call): Call it.

Index: gimple-fold.c
===
--- gimple-fold.c	(revision 229809)
+++ gimple-fold.c	(working copy)
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.
 #include "gimple-match.h"
 #include "gomp-constants.h"
 #include "optabs-query.h"
+#include "omp-low.h"
 
 
 /* Return true when DECL can be referenced from current unit.
@@ -2906,6 +2907,28 @@ gimple_fold_builtin (gimple_stmt_iterato
   return false;
 }
 
+/* Transform IFN_GOACC_DIM_SIZE and IFN_GOACC_DIM_POS internal
+   function calls to constants, where possible.  */
+
+static tree
+fold_internal_goacc_dim (const gimple *call)
+{
+  int axis = get_oacc_ifn_dim_arg (call);
+  int size = get_oacc_fn_dim_size (current_function_decl, axis);
+  bool is_pos = gimple_call_internal_fn (call) == IFN_GOACC_DIM_POS;
+  tree result = NULL_TREE;
+
+  /* If the size is 1, or we only want the size and it is not dynamic,
+ we know the answer.  */
+  if (size == 1 || (!is_pos && size))
+{
+  tree type = TREE_TYPE (gimple_call_lhs (call));
+  result = build_int_cst (type, size - is_pos);
+}
+
+  return result;
+}
+
 /* Return true if ARG0 CODE ARG1 in infinite signed precision operation
doesn't fit into TYPE.  The test for overflow should be regardless of
-fwrapv, and even for unsigned types.  */
@@ -3106,6 +3129,10 @@ gimple_fold_call (gimple_stmt_iterator *
 	  return true;
 	}
 	  break;
+	case IFN_GOACC_DIM_SIZE:
+	case IFN_GOACC_DIM_POS:
+	  result = fold_internal_goacc_dim (stmt);
+	  break;
 	case IFN_UBSAN_CHECK_ADD:
 	  subcode = PLUS_EXPR;
 	  break;


Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-05 Thread Martin Sebor

On 11/05/2015 03:48 AM, Alexander Monakov wrote:

On Thu, 5 Nov 2015, Joseph Myers wrote:

On Wed, 4 Nov 2015, Martin Sebor wrote:


Improving the compiler output is a good idea. The attached patch
prints "[disabled by -O0]" instead of "[enabled]" when an optimization
option is enabled by default but when optimization (i.e., -O1 or
greater) is not enabled.


I don't think it's entirely accurate that all options marked as
Optimization in *.opt are actually disabled by -O0.  Many are, but it
depends on the actual logic controlling each optimization.


Indeed, and Richard already pointed that out when reviewing (my) previous
attempt.  Martin, please review the thread ending at
https://gcc.gnu.org/ml/gcc-patches/2012-05/msg00113.html

I guess eventually we can slightly overhaul optimization options to remove
such ambiguities, but in short term I see no way to accurately determine
whether a pass/option is ultimately in effect or not.

Martin, what about taking Richard's suggestion, tacking a short explanatory
sentence onto the end of --help=optimizers output?


I don't think that reiterating in a condensed form what the manual
doesn't make clear in many more words will help. First, even users
who do find the relevant text in the manual often misunderstand it.
Others are misled by the "[enabled]" output into expecting the
optimizations to take place at -O0. Those who are in doubt and
ask on gcc-help are then told by GCC experts that -O0 disables
all optimizations without exception, and to RTFM (see the thread
I referenced in my first post on this topic for an example).

That the individual optimizations -fxxx options have no effect
without -O1 or better is made clear in the FAQ on the Wiki (in
https://gcc.gnu.org/wiki/FAQ#optimization-options). What isn't
mentioned there is the relationship to the --help=optimizers
output.

I'd be happy to tweak the "disabled by -O0" text to something
less definitive if that would make you and Joseph more
comfortable. But the "enabled" output at -O0, even though
strictly correct, is a problem. Users don't care about the
internal state of the options. What they need to know is
whether or not their code may be subject to the corresponding
optimization. We seem to have consensus that even though there
might be exceptions (do we know what some of them are?) the vast
majority of optimizations that have -fxxx options are in fact not
performed at -O0. I think reflecting that in the output is more
helpful than implying otherwise.

Martin


Re: [PATCH, ARM] PR target/68059 libgcc should not use __write for printing fatal error

2015-11-05 Thread Richard Earnshaw
On 05/11/15 12:30, Szabolcs Nagy wrote:
> libgcc/config/arm/linux-atomic-64bit.c uses __write to print an error
> message if the 64bit xchg method is not available in the kernel.
> 
> __write is not part of the public libc abi.  Since this code is only
> run on linux the write syscall can be invoked directly.
> 
> And __builtin_trap is a simpler way to crash than abort.
> 
> The new behaviour on a linux kernel before v3.1:
> 
> # ./a.out
> A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)
> Illegal instruction
> 
> OK for trunk and backporting?

__morestack_fail in generic-morestack.c now uses writev().  Is there
some reason we can't do the same?

R.

> 
> libgcc/ChangeLog:
> 
> 2015-11-05  Szabolcs Nagy  
> 
> PR target/68059
> * config/arm/linux-atomic-64bit.c (__write): Remove declaration.
> (abort): Likewise.
> (linux_write): Define.
> 
> arm_write_2.diff
> 
> 
> diff --git a/libgcc/config/arm/linux-atomic-64bit.c 
> b/libgcc/config/arm/linux-atomic-64bit.c
> index cdf713c..aba3334 100644
> --- a/libgcc/config/arm/linux-atomic-64bit.c
> +++ b/libgcc/config/arm/linux-atomic-64bit.c
> @@ -33,9 +33,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
> kernels; we check for that in an init section and bail out rather
> unceremoneously.  */
>  
> -extern unsigned int __write (int fd, const void *buf, unsigned int count);
> -extern void abort (void);
> -
>  /* Kernel helper for compare-and-exchange.  */
>  typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
>   const long long* newval,
> @@ -45,6 +42,19 @@ typedef int (__kernel_cmpxchg64_t) (const long long* 
> oldval,
>  /* Kernel helper page version number.  */
>  #define __kernel_helper_version (*(unsigned int *)0x0ffc)
>  
> +static void
> +linux_write (int fd, const void *buf, unsigned int count)
> +{
> + register long r7 asm ("r7") = 4; /* Linux __NR_write.  */
> + register long r0 asm ("r0") = fd;
> + register long r1 asm ("r1") = (long)buf;
> + register long r2 asm ("r2") = count;
> + asm volatile ("svc 0"
> + :
> + : "r" (r7), "r" (r0), "r" (r1), "r" (r2)
> + : "memory");
> +}
> +
>  /* Check that the kernel has a new enough version at load.  */
>  static void __check_for_sync8_kernelhelper (void)
>  {
> @@ -53,11 +63,9 @@ static void __check_for_sync8_kernelhelper (void)
>const char err[] = "A newer kernel is required to run this binary. "
>   "(__kernel_cmpxchg64 helper)\n";
>/* At this point we need a way to crash with some information
> -  for the user - I'm not sure I can rely on much else being
> -  available at this point, so do the same as generic-morestack.c
> -  write () and abort ().  */
> -  __write (2 /* stderr.  */, err, sizeof (err));
> -  abort ();
> +  for the user.  */
> +  linux_write (2 /* stderr.  */, err, sizeof (err));
> +  __builtin_trap ();
>  }
>  };
>  
> 



Re: [vec-cmp, patch 1/6] Add optabs for vector comparison

2015-11-05 Thread Ilya Enkovich
2015-10-27 23:52 GMT+03:00 Jeff Law :
>
> Sigh.  I searched for the enum type, not for CODE_FOR_nothing ;(  My bad.
>
> If it's easy to get rid of, yes.  I believe we've got 3 uses of
> CODE_FOR_nothing.  AFAICT in none of those cases do we care about the code
> other than does it correspond to CODE_FOR_nothing.
>
> Ideally we'd like to have both optabs-query and optabs-tree not know about
> insn codes.  The former is supposed to be IR agnostic, but insn codes are
> part of the RTL IR, so that's a wart.  The latter is supposed to be tree
> specific and thus shouldn't know about the RTL IR either.
>
> I'd settle for getting the wart out of optabs-tree and we can put further
> cleanup of optabs-query in the queue.
>
> To get the wart out of optabs-tree all I think we need is a true boolean
> function that tells us if there's a suitable optab.
>
> It's unfortunate that the routines exported by optabs-query are
> can_{extend,float,fix}_p since those would fairly natural for the boolean
> query we want to make and they're used elsewhere, but not in a boolean form.
>
> I think that we ought to rename the existing uses & definition of can_XXX_p
> that are exported by optabs-query.c, then creating new can_XXX_p for those
> uses that just care about the boolean status should work.  At that point we
> remove insn-codes.h from optab-tree.c.

Do you want this refactoring to be a part of this patch or of the series?

Thanks,
Ilya

>
> Jeff


[PATCH, PR tree-optimization/68145] Fix vectype computation in vectorizable_operation

2015-11-05 Thread Ilya Enkovich
Hi,

This patch fixes the way vectype is computed in vectorizable_operation.
Currently op0 is always used to compute vectype.  If it is a loop invariant,
then its type is used to get vectype, which is impossible for booleans, since
they require context to correctly compute vectype.  This patch uses the output
vectype in such cases, which should always work fine for operations on
booleans.  Bootstrapped on x86_64-unknown-linux-gnu.  Regression testing is
in progress.  Ok if no regressions?
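The selection logic can be sketched abstractly (strings stand in for vectypes; all names are illustrative, not GCC's):

```cpp
#include <string>

// Sketch of the vectype choice for an invariant operand.  A boolean
// invariant carries no information about its vector form, so the output
// vectype is reused; an empty string stands for "not vectorizable".
static std::string pick_vectype (bool op_is_bool, bool dest_is_bool,
                                 const std::string &same_sized_vectype,
                                 const std::string &vectype_out)
{
  if (op_is_bool)
    {
      if (!dest_is_bool)
        return "";            // "not supported operation on bool value"
      return vectype_out;     // boolean operations don't change the type
    }
  return same_sized_vectype;  // derive from the operand's scalar type
}
```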

Thanks,
Ilya
--
gcc/

2015-11-05  Ilya Enkovich  

PR tree-optimization/68145
* tree-vect-stmts.c (vectorizable_operation): Fix vectype
determination for booleans.

gcc/testsuite/

2015-11-05  Ilya Enkovich  

PR tree-optimization/68145
* g++.dg/vect/pr68145.cc: New test.


diff --git a/gcc/testsuite/g++.dg/vect/pr68145.cc 
b/gcc/testsuite/g++.dg/vect/pr68145.cc
new file mode 100644
index 000..51e663a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr68145.cc
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+
+struct A {
+  bool operator()(int p1, int p2) { return p1 && p2; }
+};
+class B {
+public:
+  bool *cbegin();
+  bool *cend();
+};
+template  void operator&&(B p1, T p2) {
+  B a;
+  arrayContTransform(p1, p2, a, A());
+}
+
+template 
+void myrtransform(_InputIterator1 p1, _OutputIterator p2, T p3,
+  _BinaryOperation p4) {
+  _InputIterator1 b;
+  for (; b != p1; ++b, ++p2)
+*p2 = p4(*b, p3);
+}
+
+template 
+void arrayContTransform(L p1, R p2, RES p3, BinaryOperator p4) {
+  myrtransform(p1.cend(), p3.cbegin(), p2, p4);
+}
+
+class C {
+public:
+  B getArrayBool();
+};
+class D {
+  B getArrayBool(const int &);
+  C lnode_p;
+};
+bool c;
+B D::getArrayBool(const int &) { lnode_p.getArrayBool() && c; }
+
+// { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { i?86-*-* 
x86_64-*-* } } } }
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index ae14075..9aa2d4e 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -4697,7 +4697,26 @@ vectorizable_operation (gimple *stmt, 
gimple_stmt_iterator *gsi,
   /* If op0 is an external or constant def use a vector type with
  the same size as the output vector type.  */
   if (!vectype)
-vectype = get_same_sized_vectype (TREE_TYPE (op0), vectype_out);
+{
+  /* For boolean type we cannot determine vectype by
+invariant value (don't know whether it is a vector
+of booleans or vector of integers).  We use output
+vectype because operations on boolean don't change
+type.  */
+  if (TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE)
+   {
+ if (TREE_CODE (TREE_TYPE (scalar_dest)) != BOOLEAN_TYPE)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"not supported operation on bool value.\n");
+ return false;
+   }
+ vectype = vectype_out;
+   }
+  else
+   vectype = get_same_sized_vectype (TREE_TYPE (op0), vectype_out);
+}
   if (vec_stmt)
 gcc_assert (vectype);
   if (!vectype)


Re: [PATCH, ARM] PR target/68059 libgcc should not use __write for printing fatal error

2015-11-05 Thread Szabolcs Nagy

On 05/11/15 15:59, Richard Earnshaw wrote:

On 05/11/15 12:30, Szabolcs Nagy wrote:

libgcc/config/arm/linux-atomic-64bit.c uses __write to print an error
message if the 64bit xchg method is not available in the kernel.

__write is not part of the public libc abi.  Since this code is only
run on linux the write syscall can be invoked directly.

And __builtin_trap is a simpler way to crash than abort.

The new behaviour on a linux kernel before v3.1:

# ./a.out
A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)
Illegal instruction

OK for trunk and backporting?


__morestack_fail in generic-morestack.c now uses writev().  Is there
some reason we can't do the same?



both write and writev should be part of libc abi on linux.

they are not in the iso c namespace so standard conforming
user code can override them, but we are trying to report a
fatal error with best effort, so that might not be an issue.

i can just s/__write/write/ if that's preferred (or use writev)
and then no inline asm syscall magic is necessary.

i just want to avoid the symbol reference that is potentially
not available in the libc (i'm not sure if the 64bit builtin
atomics are supposed to work in freestanding mode or if
dynamic linking to libgcc should work without a crash on
old kernels).
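For reference, the writev-based approach already used by generic-morestack.c looks roughly like this (a sketch assuming a POSIX environment; the helper name is illustrative):

```cpp
#include <cstring>
#include <sys/uio.h>
#include <unistd.h>

// Print a fatal-error message with writev(2), as __morestack_fail in
// generic-morestack.c does; no libc-internal symbol such as __write is
// referenced, only the public POSIX entry point.
static void report_fatal (int fd, const char *msg)
{
  struct iovec iov;
  iov.iov_base = const_cast<char *> (msg);
  iov.iov_len = std::strlen (msg);
  writev (fd, &iov, 1);
}
```

This sidesteps the inline-asm syscall entirely, at the cost of pulling in a libc symbol that conforming user code could in principle override.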


libgcc/ChangeLog:

2015-11-05  Szabolcs Nagy  

 PR target/68059
 * config/arm/linux-atomic-64bit.c (__write): Remove declaration.
 (abort): Likewise.
 (linux_write): Define.

arm_write_2.diff


diff --git a/libgcc/config/arm/linux-atomic-64bit.c 
b/libgcc/config/arm/linux-atomic-64bit.c
index cdf713c..aba3334 100644
--- a/libgcc/config/arm/linux-atomic-64bit.c
+++ b/libgcc/config/arm/linux-atomic-64bit.c
@@ -33,9 +33,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 kernels; we check for that in an init section and bail out rather
 unceremoneously.  */

-extern unsigned int __write (int fd, const void *buf, unsigned int count);
-extern void abort (void);
-
  /* Kernel helper for compare-and-exchange.  */
  typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
const long long* newval,
@@ -45,6 +42,19 @@ typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
  /* Kernel helper page version number.  */
  #define __kernel_helper_version (*(unsigned int *)0x0ffc)

+static void
+linux_write (int fd, const void *buf, unsigned int count)
+{
+   register long r7 asm ("r7") = 4; /* Linux __NR_write.  */
+   register long r0 asm ("r0") = fd;
+   register long r1 asm ("r1") = (long)buf;
+   register long r2 asm ("r2") = count;
+   asm volatile ("svc 0"
+   :
+   : "r" (r7), "r" (r0), "r" (r1), "r" (r2)
+   : "memory");
+}
+
  /* Check that the kernel has a new enough version at load.  */
  static void __check_for_sync8_kernelhelper (void)
  {
@@ -53,11 +63,9 @@ static void __check_for_sync8_kernelhelper (void)
const char err[] = "A newer kernel is required to run this binary. "
"(__kernel_cmpxchg64 helper)\n";
/* At this point we need a way to crash with some information
-for the user - I'm not sure I can rely on much else being
-available at this point, so do the same as generic-morestack.c
-write () and abort ().  */
-  __write (2 /* stderr.  */, err, sizeof (err));
-  abort ();
+for the user.  */
+  linux_write (2 /* stderr.  */, err, sizeof (err));
+  __builtin_trap ();
  }
  };


Re: [PATCH] PR 68192 Export AIX TLS symbols

2015-11-05 Thread David Edelsohn
[Explicitly copying build maintainers.]

Paolo and Alexandre,

Could you review and help with this patch?

TLS symbols in AIX display a new, different symbol type in nm output.
Libtool explicitly creates a list of exported symbols for shared
libraries using nm and does not recognize the new TLS symbols, so
those symbols are not exported.

This is a regression for TLS support on AIX.

This patch updates libtool.m4 in GCC and configure for libstdc++-v3,
libgfortran, and libgomp.  I would like to apply the patch to GCC
while I simultaneously work with the Libtool community to correct the
bug upstream.  I also would like to backport this to GCC 5.2 and GCC
4.9.x.

I have not been able to run the correct versions of autoconf to
regenerate configure directly.  I either can edit the files directly
or I would appreciate someone helping me to regenerate configure in
all library directories.

Bootstrapped on powerpc-ibm-aix7.1.0.0.

* libtool.m4 (export_symbols_cmds) [AIX]: Add global TLS "L" symbols.
* libstdc++-v3/configure: Regenerate.
* libgfortran/configure: Regenerate.
* libgomp/configure: Regenerate.

Thanks, David




Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-05 Thread Jakub Jelinek
Hi!

On Wed, Nov 04, 2015 at 09:55:49AM -0800, Cesar Philippidis wrote:

So, you are going to deal with gang parsing incrementally?
Fine with me.

> +/* OpenACC 2.0:
> +   tile ( size-expr-list ) */
> +
> +static tree
> +c_parser_oacc_clause_tile (c_parser *parser, tree list)
> +{
> +  tree c, expr = error_mark_node;
> +  location_t loc, expr_loc;
> +  tree tile = NULL_TREE;
> +
> +  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
> +
> +  loc = c_parser_peek_token (parser)->location;
> +  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
> +return list;
> +
> +  vec *tvec = make_tree_vector ();

It seems I've misread your patch; I thought you were using TREE_VEC, while
you are actually using TREE_LIST, but populating it in a weird way.
I think more efficient would be just to
  tree tile = NULL_TREE;
here, then:

> +
> +  vec_safe_push (tvec, expr);

  tile = tree_cons (NULL_TREE, expr, tile);

> +  if (c_parser_next_token_is (parser, CPP_COMMA))
> + c_parser_consume_token (parser);
> +}
> +  while (c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN));
> +
> +  /* Consume the trailing ')'.  */
> +  c_parser_consume_token (parser);
> +
> +  c = build_omp_clause (loc, OMP_CLAUSE_TILE);
> +  tile = build_tree_list_vec (tvec);

  tile = nreverse (tile);
  
> +  OMP_CLAUSE_TILE_LIST (c) = tile;
> +  OMP_CLAUSE_CHAIN (c) = list;
> +  release_tree_vector (tvec);

and remove the release_tree_vector calls.

> +static tree
> +cp_parser_oacc_clause_tile (cp_parser *parser, location_t clause_loc, tree 
> list)

This is already too long line.

> + case OMP_CLAUSE_TILE:
> +   {
> + tree list = OMP_CLAUSE_TILE_LIST (c);
> +
> + while (list)

I'd say
for (tree list = OMP_CLAUSE_TILE_LIST (c);
 list; list = TREE_CHAIN (list))
would be more readable.

> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -6995,9 +6995,18 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
>   remove = true;
> break;
>  
> + case OMP_CLAUSE_TILE:
> +   for (tree list = OMP_CLAUSE_TILE_LIST (c); !remove && list;
> +list = TREE_CHAIN (list))
> + {
> +   if (gimplify_expr (&TREE_VALUE (list), pre_p, NULL,
> +  is_gimple_val, fb_rvalue) == GS_ERROR)
> + remove = true;
> + }

After all, you are already using for here ;)

Otherwise LGTM, but please clear with Thomas, I think he had some issues
with the patch.

Jakub


Re: [PATCH] PR 68192 Export AIX TLS symbols

2015-11-05 Thread Paolo Bonzini


On 05/11/2015 17:28, David Edelsohn wrote:
> [Explicitly copying build maintainers.]
> 
> Paolo and Alexandre,
> 
> Could you review and help with this patch?
> 
> TLS symbols in AIX display a new, different symbol type in nm output.
> Libtool explicitly creates a list of exported symbols for shared
> libraries using nm and does not recognize the new TLS symbols, so
> those symbols are not exported.
> 
> This is a regression for TLS support on AIX.
> 
> This patch updates libtool.m4 in GCC and configure for libstdc++-v3,
> libgfortran, and libgomp.  I would like to apply the patch to GCC
> while I simultaneously work with the Libtool community to correct the
> bug upstream.  I also would like to backport this to GCC 5.2 and GCC
> 4.9.x.

I think it's okay to wait for the patch to be upstream.

I can help committing the patch once it is.

Paolo

> I have not been able to run the correct versions of autoconf to
> regenerate configure directly.  I either can edit the files directly
> or I would appreciate someone helping me to regenerate configure in
> all library directories.
> 
> Bootstrapped on powerpc-ibm-aix7.1.0.0.
> 
> * libtool.m4 (export_symbols_cmds) [AIX]: Add global TLS "L" symbols.
> * libstdc++-v3/configure: Regenerate.
> * libgfortran/configure: Regenerate.
> * libgomp/configure: Regenerate.
> 
> Thanks, David
> 

Index: libtool.m4
===
--- libtool.m4  (revision 229706)
+++ libtool.m4  (working copy)
@@ -4230,7 +4230,7 @@
 if $NM -V 2>&1 | $GREP 'GNU' > /dev/null; then
   _LT_TAGVAR(export_symbols_cmds, $1)='$NM -Bpg $libobjs $convenience | 
awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == 
"W")) && ([substr](\$ 3,1,1) != ".")) { print \$ 3 } }'\'' | sort -u > 
$export_symbols'
 else
-  _LT_TAGVAR(export_symbols_cmds, $1)='$NM -BCpg $libobjs $convenience | 
awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B")) && ([substr](\$ 
3,1,1) != ".")) { print \$ 3 } }'\'' | sort -u > $export_symbols'
+  _LT_TAGVAR(export_symbols_cmds, $1)='$NM -BCpg $libobjs $convenience | 
awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == 
"L")) && ([substr](\$ 3,1,1) != ".")) { print \$ 3 } }'\'' | sort -u > 
$export_symbols'
 fi
 ;;
   pw32*)
@@ -4641,7 +4641,7 @@
if $NM -V 2>&1 | $GREP 'GNU' > /dev/null; then
  _LT_TAGVAR(export_symbols_cmds, $1)='$NM -Bpg $libobjs $convenience | 
awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == 
"W")) && ([substr](\$ 3,1,1) != ".")) { print \$ 3 } }'\'' | sort -u > 
$export_symbols'
else
- _LT_TAGVAR(export_symbols_cmds, $1)='$NM -BCpg $libobjs $convenience 
| awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B")) && 
([substr](\$ 3,1,1) != ".")) { print \$ 3 } }'\'' | sort -u > $export_symbols'
+ _LT_TAGVAR(export_symbols_cmds, $1)='$NM -BCpg $libobjs $convenience 
| awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == 
"L")) && ([substr](\$ 3,1,1) != ".")) { print \$ 3 } }'\'' | sort -u > 
$export_symbols'
fi
aix_use_runtimelinking=no
 
Index: libstdc++-v3/configure
===
--- libstdc++-v3/configure  (revision 229706)
+++ libstdc++-v3/configure  (working copy)
@@ -9539,7 +9539,7 @@
if $NM -V 2>&1 | $GREP 'GNU' > /dev/null; then
  export_symbols_cmds='$NM -Bpg $libobjs $convenience | awk '\''{ if 
(((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W")) && 
(substr(\$ 3,1,1) != ".")) { print \$ 3 } }'\'' | sort -u > $export_symbols'
else
- export_symbols_cmds='$NM -BCpg $libobjs $convenience | awk '\''{ if 
(((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B")) && (substr(\$ 3,1,1) != 
".")) { print \$ 3 } }'\'' | sort -u > $export_symbols'
+ export_symbols_cmds='$NM -BCpg $libobjs $convenience | awk '\''{ if 
(((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "L")) && 
(substr(\$ 3,1,1) != ".")) { print \$ 3 } }'\'' | sort -u > $export_symbols'
fi
aix_use_runtimelinking=no
 
@@ -14058,7 +14058,7 @@
 if $NM -V 2>&1 | $GREP 'GNU' > /dev/null; then
   export_symbols_cmds_CXX='$NM -Bpg $libobjs $convenience | awk '\''{ if 
(((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W")) && 
(substr(\$ 3,1,1) != ".")) { print \$ 3 } }'\'' | sort -u > $export_symbols'
 else
-  export_symbols_cmds_CXX='$NM -BCpg $libobjs $convenience | awk '\''{ if 
(((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B")) && (substr(\$ 3,1,1) != 
".")) { print \$ 3 } }'\'' | sort -u > $export_symbols'
+  export_symbols_cmds_CXX='$NM -BCpg $libobjs $convenience | awk '\''{ if 
(((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "L")) && 
(substr(\$ 3,1,1) != ".")) { print \$ 3 } }'\'' | sort -u > $export_symbols'
 fi
 ;;
   pw32*)
Index: libgfortran/configure

Re: [PATCH] PR 68192 Export AIX TLS symbols

2015-11-05 Thread David Edelsohn
On Thu, Nov 5, 2015 at 8:34 AM, Paolo Bonzini  wrote:
>
>
> On 05/11/2015 17:28, David Edelsohn wrote:
>> [Explicitly copying build maintainers.]
>>
>> Paolo and Alexandre,
>>
>> Could you review and help with this patch?
>>
>> TLS symbols in AIX display a new, different symbol type in nm output.
>> Libtool explicitly creates a list of exported symbols for shared
>> libraries using nm and does not recognize the new TLS symbols, so
>> those symbols are not exported.
>>
>> This is a regression for TLS support on AIX.
>>
>> This patch updates libtool.m4 in GCC and configure for libstdc++-v3,
>> libgfortran, and libgomp.  I would like to apply the patch to GCC
>> while I simultaneously work with the Libtool community to correct the
>> bug upstream.  I also would like to backport this to GCC 5.2 and GCC
>> 4.9.x.
>
> I think it's okay to wait for the patch to be upstream.
>
> I can help committing the patch once it is.

Paolo,

The patch MUST go into GCC 5.3 release, which Richi has announced that
he wants to release around the beginning of Stage 3.  The patch should
not go into GCC 5 branch without going into trunk.  This patch cannot
wait unless you want to block GCC 5.3.

Thanks, David


Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-05 Thread Martin Sebor

If we use gcc_checking_assert it won't fire in release builds; let's go
with that.


Okay. Attached is an updated patch with that change.

Martin

gcc ChangeLog
2015-11-05  Martin Sebor  

	PR c++/67942
* invoke.texi (-Wplacement-new): Document new option.
	* gcc/testsuite/g++.dg/warn/Wplacement-new-size.C: New test.

gcc/c-family ChangeLog
2015-11-05  Martin Sebor  

	PR c++/67942
* c.opt (-Wplacement-new): New option.

gcc/cp ChangeLog
2015-11-05  Martin Sebor  

	PR c++/67942
	* cp/init.c (warn_placement_new_too_small): New function.
	(build_new_1): Call it.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 47ba070..5e9d7a3 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -760,6 +760,10 @@ Wprotocol
 ObjC ObjC++ Var(warn_protocol) Init(1) Warning
 Warn if inherited methods are unimplemented

+Wplacement-new
+C++ Var(warn_placement_new) Init(1) Warning
+Warn for placement new expressions with undefined behavior
+
 Wredundant-decls
 C ObjC C++ ObjC++ Var(warn_redundant_decls) Warning
 Warn about multiple declarations of the same object
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 1ed8f6c..3a9c59d 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2269,6 +2269,201 @@ throw_bad_array_new_length (void)
   return build_cxx_call (fn, 0, NULL, tf_warning_or_error);
 }

+/* Attempt to verify that the argument, OPER, of a placement new expression
+   refers to an object sufficiently large for an object of TYPE or an array
+   of NELTS of such objects when NELTS is non-null, and issue a warning when
+   it does not.  SIZE specifies the size needed to construct the object or
+   array and captures the result of NELTS * sizeof (TYPE). (SIZE could be
+   greater when the array under construction requires a cookie to store
+   NELTS.  GCC's placement new expression stores the cookie when invoking
+   a user-defined placement new operator function but not the default one.
+   Placement new expressions with user-defined placement new operator are
+   not diagnosed since we don't know how they use the buffer (this could
+   be a future extension).  */
+static void
+warn_placement_new_too_small (tree type, tree nelts, tree size, tree oper)
+{
+  location_t loc = EXPR_LOC_OR_LOC (oper, input_location);
+
+  /* The number of bytes to add to or subtract from the size of the provided
+ buffer based on an offset into an array or an array element reference.
+ Although intermediate results may be negative (as in a[3] - 2) the final
+ result cannot be.  */
+  HOST_WIDE_INT adjust = 0;
+  /* True when the size of the entire destination object should be used
+ to compute the possibly optimistic estimate of the available space.  */
+  bool use_obj_size = false;
+  /* True when the reference to the destination buffer is an ADDR_EXPR.  */
+  bool addr_expr = false;
+
+  STRIP_NOPS (oper);
+
+  /* Using a function argument or a (non-array) variable as an argument
+ to placement new is not checked since it's unknown what it might
+ point to.  */
+  if (TREE_CODE (oper) == PARM_DECL
+  || TREE_CODE (oper) == VAR_DECL
+  || TREE_CODE (oper) == COMPONENT_REF)
+return;
+
+  /* Evaluate any constant expressions.  */
+  size = fold_non_dependent_expr (size);
+
+  /* Handle the common case of array + offset expression when the offset
+ is a constant.  */
+  if (TREE_CODE (oper) == POINTER_PLUS_EXPR)
+{
+  /* If the offset is a compile-time constant, use it to compute a more
+	 accurate estimate of the size of the buffer.  Otherwise, use
+	 the size of the entire array as an optimistic estimate (this
+	 may lead to false negatives).  */
+  const_tree adj = TREE_OPERAND (oper, 1);
+  if (CONSTANT_CLASS_P (adj))
+	adjust += tree_to_uhwi (adj);
+  else
+	use_obj_size = true;
+
+  oper = TREE_OPERAND (oper, 0);
+
+  STRIP_NOPS (oper);
+}
+
+  if (TREE_CODE (oper) == TARGET_EXPR)
+oper = TREE_OPERAND (oper, 1);
+  else if (TREE_CODE (oper) == ADDR_EXPR)
+{
+  addr_expr = true;
+  oper = TREE_OPERAND (oper, 0);
+}
+
+  STRIP_NOPS (oper);
+
+  if (TREE_CODE (oper) == ARRAY_REF)
+{
+  /* Similar to the offset computed above, see if the array index
+	 is a compile-time constant.  If so, and provided the earlier
+	 offset was also a compile-time constant, use the index to determine the
+	 size of the buffer.  Otherwise, use the entire array as
+	 an optimistic estimate of the size.  */
+  const_tree adj = TREE_OPERAND (oper, 1);
+  if (!use_obj_size && CONSTANT_CLASS_P (adj))
+	adjust += tree_to_shwi (adj);
+  else
+	{
+	  use_obj_size = true;
+	  adjust = 0;
+	}
+
+  oper = TREE_OPERAND (oper, 0);
+}
+
+  /* Descend into a struct or union to find the member whose address
+ is being used as the argument.  */
+  while (TREE_CODE (oper) == COMPONENT_REF)
+oper = TREE_OPERAND (oper, 1);
+
+  if ((addr_expr || !POINTER_TYPE_P (TREE_TYPE (oper)))
+  && (TREE_CODE (oper) == VAR_DECL
+	  || TREE_CODE (ope
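For context, here is a minimal sketch (hypothetical code, not taken from the patch) of the kind of buffer-size mismatch the proposed -Wplacement-new warning is designed to catch:

```cpp
#include <new>

struct S { int a, b; };

// Construct an S in caller-provided storage.  The caller must supply
// a buffer of at least sizeof (S) bytes with suitable alignment; with
// the patch applied, GCC warns when it can see the buffer is too
// small, e.g. `char buf[2]; new (buf) S;`.
S *construct_in (void *buf)
{
  return new (buf) S{1, 2};
}
```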

Re: Fix 61441

2015-11-05 Thread Joseph Myers
On Thu, 5 Nov 2015, Sujoy Saraswati wrote:

> > Some other places in this patch have similar issues with checking
> > HONOR_SNANS && REAL_VALUE_ISNAN when they should check if the particular
> > value is sNaN.
> 
> I modified the code to check for sNaN specifically. The modified patch 
> is below.

This is looking pretty close to ready to go in, but the patch also seems 
to be doing several things at once and it might be easier to review if 
split into separate logical pieces.  For example (this is an illustration, 
not necessarily saying it should be exactly this way): (1) Add 
REAL_VALUE_ISSIGNALING_NAN and convert existing logic that already does 
something equivalent (so no semantic change, just cleanup).  (2) Convert 
existing code that checks for all NaNs in cases where 
REAL_VALUE_ISSIGNALING_NAN would be enough.  (3) Treat rint and nearbyint 
the same.  (4) Make real.c operations properly produce quiet NaNs for 
operations with signaling NaN input.  (5) Disable various transformations 
for signaling NaN operands, where folding them loses exceptions.

This patch is also getting big enough that it would be a good idea to 
complete the copyright assignment paperwork, certainly if you plan to make 
more contributions in future, if you're not covered by a corporate 
assignment.

> @@ -1943,7 +1965,17 @@ fold_convert_const_real_from_real (tree type, cons
>REAL_VALUE_TYPE value;
>tree t;
> 
> +  /* Don't perform the operation if flag_signaling_nans is on
> + and the operand is a NaN.  */
> +  if (HONOR_SNANS (TYPE_MODE (TREE_TYPE (arg1)))
> +  && REAL_VALUE_ISSIGNALING_NAN (TREE_REAL_CST (arg1)))
> +return NULL_TREE;
> +
>real_convert (&value, TYPE_MODE (type), &TREE_REAL_CST (arg1));
> +  /* Make resulting NaN value to be qNaN when flag_signaling_nans
> + is off.  */
> +  if (REAL_VALUE_ISSIGNALING_NAN (value))
> +value.signalling = 0;

Shouldn't real_convert do this rather than the caller needing to do it?
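To illustrate the semantics at stake here (a sketch, not part of the patch): IEEE 754 arithmetic on a signaling NaN raises the invalid exception and yields a quiet NaN, which is exactly what compile-time folding would silently skip.

```c
#include <math.h>

/* Any arithmetic operation quiets a NaN: if X is a signaling NaN,
   the result is a quiet NaN (and the "invalid" exception is raised
   at run time).  Folding such an operation at compile time without
   quieting the NaN, or without regard for the exception, changes
   observable behavior under -fsignaling-nans.  */
double quiet (double x)
{
  return x + 0.0;
}
```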

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-05 Thread Thomas Schwinge
Hi!

Nathan, question here about clause splitting for combined OpenACC
"parallel loop" and "kernels loop" constructs, in particular:

#pragma acc parallel loop reduction([...])

On Thu, 5 Nov 2015 06:47:58 -0800, Cesar Philippidis  
wrote:
> On 11/05/2015 04:14 AM, Thomas Schwinge wrote:
> 
> > On Tue, 3 Nov 2015 14:16:59 -0800, Cesar Philippidis 
> >  wrote:
> >> This patch does the following to the c and c++ front ends:
> > 
> >>  * updates c_oacc_split_loop_clauses to filter out the loop clauses
> >>from combined parallel/kernels loops
> > 
> >>gcc/c-family/
> >>* c-omp.c (c_oacc_split_loop_clauses): Make TILE, GANG, WORKER, VECTOR,
> >>AUTO, SEQ and independent as loop clauses.  Associate REDUCTION
> >>clauses with parallel and kernels.
> > 
> >> --- a/gcc/c-family/c-omp.c
> >> +++ b/gcc/c-family/c-omp.c
> >> @@ -709,12 +709,21 @@ c_oacc_split_loop_clauses (tree clauses, tree 
> >> *not_loop_clauses)
> >>  
> >>switch (OMP_CLAUSE_CODE (clauses))
> >>  {
> >> +/* Loop clauses.  */
> >>case OMP_CLAUSE_COLLAPSE:
> >> -  case OMP_CLAUSE_REDUCTION:
> >> +  case OMP_CLAUSE_TILE:
> >> +  case OMP_CLAUSE_GANG:
> >> +  case OMP_CLAUSE_WORKER:
> >> +  case OMP_CLAUSE_VECTOR:
> >> +  case OMP_CLAUSE_AUTO:
> >> +  case OMP_CLAUSE_SEQ:
> >> +  case OMP_CLAUSE_INDEPENDENT:
> >>  OMP_CLAUSE_CHAIN (clauses) = loop_clauses;
> >>  loop_clauses = clauses;
> >>  break;
> >>  
> >> +/* Parallel/kernels clauses.  */
> >> +
> >>default:
> >>  OMP_CLAUSE_CHAIN (clauses) = *not_loop_clauses;
> >>  *not_loop_clauses = clauses;
> > 
> > Contrary to your ChangeLog entry, this is not duplicating but is moving
> > OMP_CLAUSE_REDUCTION handling.  Is that intentional?  (And, doesn't
> > anything break in the testsuite?)
> 
> Sorry, I must have mis-phrased it. The spec is unclear here. There are
> three possible ways to interpret 'acc parallel loop reduction':
> 
>   1. acc parallel reduction
>  acc loop

This is what you propose in your patch, but I don't think that makes
sense, or does it?  I'm happy to learn otherwise, but in my current
understanding, a reduction clause needs to be attached (at least) to the
innermost construct where reductions are to be processed.  (Let's also
consider multi-level gang/worker/vector loops/reductions.)  So, either:

>   2. acc parallel
>  acc loop reduction

... this, or even this:

>   3. acc parallel reduction
>  acc loop reduction

..., which I'm not sure what the execution model implementation requires.
(Nathan?)

And while we're at it: the very same question also applies to the private
clause, which -- contrary to all other (as far as I remember) clauses --
also is applicable to both the parallel and loop constructs:

#pragma acc parallel loop private([...])

... is to be decomposed into which of the following:

#pragma acc parallel private([...])
#pragma acc loop

#pragma acc parallel
#pragma acc loop private([...])

#pragma acc parallel private([...])
#pragma acc loop private([...])

(There is no private clause allowed to be specified with the kernels
construct for what it's worth, but that doesn't mean we couldn't use it
internally, of course, if so required.)
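As a concrete sketch of the combined construct under discussion (illustrative only; how the clauses are decomposed is exactly what this thread is deciding):

```c
/* A combined construct such as
     #pragma acc parallel loop reduction(+:sum)
   is split by c_oacc_split_loop_clauses into a parallel part and a
   loop part; under interpretation #3 the reduction clause would be
   attached to both.  Without -fopenacc the pragma is ignored and
   this is a plain serial loop.  */
double sum_array (const double *a, int n)
{
  double sum = 0.0;
#pragma acc parallel loop reduction(+:sum)
  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}
```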


> You told me to make all of the front ends consistent, and since I
> started working on fortran first, I had c and c++ follow what it was doing.

Well, I had not asked you to "blindly" ;-) make the front ends
consistent, but rather to figure out why currently C/C++ vs. Fortran are
doing things differently, and then make that consistent, as needed.

> I haven't observed any regressions with this in place. Then again,
> maybe we don't have sufficient test coverage. I'll do more testing.

Thanks!  (I do see some -- but not very many, curiously! -- regressions
when doing (only) this change on gomp-4_0-branch.)

As far as I'm concerned, this analysis/testing can also be done later on.
(Jakub?)  In that case, for now please just keep OMP_CLAUSE_REDUCTION as
it has been handled before.


Grüße
 Thomas




Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-05 Thread Segher Boessenkool
On Thu, Nov 05, 2015 at 08:58:25AM -0700, Martin Sebor wrote:
> I don't think that reiterating in a condensed form what the manual
> doesn't make clear in many more words will help. First, even users
> who do find the relevant text in the manual often misunderstand it.
> Others are misled by the "[enabled]" output into expecting the
> optimizations to take place at -O0.

The -Q --help* output is primarily aimed at GCC developers, just
like plain -Q is.  The main problem here seems to be that the
documentation pushes beginner GCC users to use this.

Perhaps we should have a higher-level, easier-to-use way to query
what flags will do what, possibly integrated with the pass manager.

> Those who are in doubt and
> ask on gcc-help are then told by GCC experts that -O0 disables
> all optimizations without exception,

And it does, for the plain meaning of "optimization"; anything else
is arguably a bug.  But people have many different understandings
of what a "compiler optimization" is, all the way to "anything the
compiler does".


Segher


Re: [PATCH] Add configure flag for operator new (std::nothrow)

2015-11-05 Thread Jonathan Wakely
On 5 November 2015 at 20:52, Daniel Gutson wrote:
> Real use cases: statistics and logging. It's a (one time) callback
> reporting that something went wrong,
> but not intended to fix things e.g. by attempting to free more memory.

Why can't that be done by replacing operator new with a user-defined
function, instead of modifying the default one?

A program that wants to make use of the altered semantics is
non-portable because it relies on a custom libstdc++ configuration
option, and has to install a custom new-handler that relies on the
altered libstdc++ behaviour. Replacing operator new to do the logging
is portable and requires no custom configuration.

(I'm not rejecting the patch, but am still trying to make sure it's a
useful change.)
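A sketch of the replacement approach suggested above (assumptions: logging to stderr stands in for the real statistics/logging callback, and delegating to malloc is one valid way for a replacement allocation function to obtain memory):

```cpp
#include <cstdio>
#include <cstdlib>
#include <new>

// Replace the global nothrow operator new: log the failure, then
// return null as the nothrow contract requires.  This is portable
// and needs no libstdc++ configure option.
void *operator new (std::size_t n, const std::nothrow_t &) noexcept
{
  void *p = std::malloc (n ? n : 1);
  if (p == nullptr)
    std::fprintf (stderr, "allocation of %zu bytes failed\n", n);
  return p;
}

// Matching nothrow operator delete, used if a constructor throws.
void operator delete (void *p, const std::nothrow_t &) noexcept
{
  std::free (p);
}
```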


Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-05 Thread Joseph Myers
On Thu, 5 Nov 2015, Martin Sebor wrote:

> optimization. We seem to have consensus that even though there
> might be exceptions (do we know what some of them are?) the vast
> majority of optimizations that have -fxxx options are in fact not
> performed at -O0. I think reflecting that in the output is more
> helpful than implying otherwise.

There are several options marked Optimization that are actually matters of 
language semantics that I don't think have anything to do with -O options, 
e.g. -fexceptions and -fcx-limited-range.  The Optimization tag really 
just means that it's saved and restored under certain circumstances.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-05 Thread Nathan Sidwell

On 11/05/15 12:01, Thomas Schwinge wrote:


On Thu, 5 Nov 2015 06:47:58 -0800, Cesar Philippidis  
wrote:

On 11/05/2015 04:14 AM, Thomas Schwinge wrote:



Sorry, I must have mis-phrased it. The spec is unclear here. There are
three possible ways to interpret 'acc parallel loop reduction':

   1. acc parallel reduction
  acc loop


This is what you propose in your patch, but I don't think that makes
sense, or does it?  I'm happy to learn otherwise, but in my current
understanding, a reduction clause needs to be attached (at least) to the
innermost construct where reductions are to be processed.  (Let's also


Correct, the  above interpretation must be wrong.


consider multi-level gang/worker/vector loops/reductions.)  So, either:


   2. acc parallel
  acc loop reduction


... this, or even this:


   3. acc parallel reduction
  acc loop reduction


..., which I'm not sure what the execution model implementation requires.
(Nathan?)


interpretation #2 is sufficient, I think.  However, both are lacking a 
'copy(reduction_var)' clause, as otherwise there's nothing changing the default 
data attribute of 'firstprivate' (working on that patch).  Perhaps 'reduction' 
on 'parallel' is meant to imply that (because that's what makes sense), but the 
std doesn't say it.


In summary it's probably safe to implement interpretation #3.  That way we can 
implement the hypothesis that reductions at the outer construct imply copy.



And while we're at it: the very same question also applies to the private
clause, which -- contrary to all other (as far as I remember) clauses --
also is applicable to both the parallel and loop constructs:

 #pragma acc parallel loop private([...])

... is to be decomposed into which of the following:

 #pragma acc parallel private([...])
 #pragma acc loop

 #pragma acc parallel
 #pragma acc loop private([...])

 #pragma acc parallel private([...])
 #pragma acc loop private([...])

(There is no private clause allowed to be specified with the kernels
construct for what it's worth, but that doesn't mean we couldn't use it
internally, of course, if so required.)


I think interpretations #2 and #3 both make sense, and I suspect they result in 
the same emitted code.


nathan

--
Nathan Sidwell


Re: [PATCH] PR 68192 Export AIX TLS symbols

2015-11-05 Thread Richard Biener
On November 5, 2015 5:38:45 PM GMT+01:00, David Edelsohn  
wrote:
>On Thu, Nov 5, 2015 at 8:34 AM, Paolo Bonzini  wrote:
>>
>>
>> On 05/11/2015 17:28, David Edelsohn wrote:
>>> [Explicitly copying build maintainers.]
>>>
>>> Paolo and Alexandre,
>>>
>>> Could you review and help with this patch?
>>>
>>> TLS symbols in AIX display a new, different symbol type in nm
>output.
>>> Libtool explicitly creates a list of exported symbols for shared
>>> libraries using nm and does not recognize the new TLS symbols, so
>>> those symbols are not exported.
>>>
>>> This is a regression for TLS support on AIX.
>>>
>>> This patch updates libtool.m4 in GCC and configure for libstdc++-v3,
>>> libgfortran, and libgomp.  I would like to apply the patch to GCC
>>> while I simultaneously work with the Libtool community to correct
>the
>>> bug upstream.  I also would like to backport this to GCC 5.2 and GCC
>>> 4.9.x.
>>
>> I think it's okay to wait for the patch to be upstream.
>>
>> I can help committing the patch once it is.
>
>Paolo,
>
>The patch MUST go into GCC 5.3 release, which Richi has announced that
>he wants to release around the beginning of Stage 3.  The patch should
>not go into GCC 5 branch without going into trunk.  This patch cannot
>wait unless you want to block GCC 5.3.

I don't remember announcing this timeframe but I won't be able to dedicate the 
required resources until the end of November.

Richard.

>Thanks, David




Re: [PATCH] PR 68192 Export AIX TLS symbols

2015-11-05 Thread David Edelsohn
Hi, Richi

You mentioned GCC 5.3 release around the time of stage 1 ending in the
GCC 5 status report:

https://gcc.gnu.org/ml/gcc/2015-07/msg00197.html

If the timeframe is later, we have more time to address the AIX libtool problem.

Thanks, David

On Thu, Nov 5, 2015 at 9:13 AM, Richard Biener
 wrote:
> On November 5, 2015 5:38:45 PM GMT+01:00, David Edelsohn  
> wrote:
>>On Thu, Nov 5, 2015 at 8:34 AM, Paolo Bonzini  wrote:
>>>
>>>
>>> On 05/11/2015 17:28, David Edelsohn wrote:
 [Explicitly copying build maintainers.]

 Paolo and Alexandre,

 Could you review and help with this patch?

 TLS symbols in AIX display a new, different symbol type in nm
>>output.
 Libtool explicitly creates a list of exported symbols for shared
 libraries using nm and does not recognize the new TLS symbols, so
 those symbols are not exported.

 This is a regression for TLS support on AIX.

 This patch updates libtool.m4 in GCC and configure for libstdc++-v3,
 libgfortran, and libgomp.  I would like to apply the patch to GCC
 while I simultaneously work with the Libtool community to correct
>>the
 bug upstream.  I also would like to backport this to GCC 5.2 and GCC
 4.9.x.
>>>
>>> I think it's okay to wait for the patch to be upstream.
>>>
>>> I can help committing the patch once it is.
>>
>>Paolo,
>>
>>The patch MUST go into GCC 5.3 release, which Richi has announced that
>>he wants to release around the beginning of Stage 3.  The patch should
>>not go into GCC 5 branch without going into trunk.  This patch cannot
>>wait unless you want to block GCC 5.3.
>
> I don't remember announcing this timeframe but I won't be able to dedicate 
> the required resources until the end of November.
>
> Richard.
>
>>Thanks, David
>
>


Re: Division Optimization in match and simplify

2015-11-05 Thread Richard Biener
On November 5, 2015 2:40:30 PM GMT+01:00, Michael Matz  wrote:
>Hi,
>
>On Wed, 4 Nov 2015, Richard Biener wrote:
>
>> Ah, it was _left_ shift of negative values that ubsan complains
>about.
>
>Note that this is only for the frontend definition of shifts.  I don't
>see 
>why gimple shouldn't define it to the only sensible definition there
>is, 
>which also happens to be the one that all CPUs in the world implement 
>(well, those using two-complement representation at least).

Yeah, but front ends and back end are not separated enough to have both.

Richard.

>
>Ciao,
>Michael.




Re: [PATCH] PR 68192 Export AIX TLS symbols

2015-11-05 Thread Paolo Bonzini


On 05/11/2015 17:38, David Edelsohn wrote:
> On Thu, Nov 5, 2015 at 8:34 AM, Paolo Bonzini  wrote:
>>
>>
>> On 05/11/2015 17:28, David Edelsohn wrote:
>>> [Explicitly copying build maintainers.]
>>>
>>> Paolo and Alexandre,
>>>
>>> Could you review and help with this patch?
>>>
>>> TLS symbols in AIX display a new, different symbol type in nm output.
>>> Libtool explicitly creates a list of exported symbols for shared
>>> libraries using nm and does not recognize the new TLS symbols, so
>>> those symbols are not exported.
>>>
>>> This is a regression for TLS support on AIX.
>>>
>>> This patch updates libtool.m4 in GCC and configure for libstdc++-v3,
>>> libgfortran, and libgomp.  I would like to apply the patch to GCC
>>> while I simultaneously work with the Libtool community to correct the
>>> bug upstream.  I also would like to backport this to GCC 5.2 and GCC
>>> 4.9.x.
>>
>> I think it's okay to wait for the patch to be upstream.
>>
>> I can help committing the patch once it is.
> 
> The patch MUST go into GCC 5.3 release, which Richi has announced that
> he wants to release around the beginning of Stage 3.  The patch should
> not go into GCC 5 branch without going into trunk.

What is blocking the patch from being included in libtool in the next
week or two?

> This patch cannot
> wait unless you want to block GCC 5.3.

What blocks GCC 5.3 is decided by the release manager.  As a build
maintainer, I am not going to ack the patch for trunk until it is
included in Libtool, but I'm happy to be overridden by the GCC 5 release
manager.

Paolo


Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-05 Thread Cesar Philippidis
On 11/05/2015 09:13 AM, Nathan Sidwell wrote:
> On 11/05/15 12:01, Thomas Schwinge wrote:
> 
>> On Thu, 5 Nov 2015 06:47:58 -0800, Cesar Philippidis
>>  wrote:
>>> On 11/05/2015 04:14 AM, Thomas Schwinge wrote:
> 
>>> Sorry, I must have mis-phrased it. The spec is unclear here. There are
>>> three possible ways to interpret 'acc parallel loop reduction':
>>>
>>>1. acc parallel reduction
>>>   acc loop
>>
>> This is what you propose in your patch, but I don't think that makes
>> sense, or does it?  I'm happy to learn otherwise, but in my current
>> understanding, a reduction clause needs to be attached (at least) to the
>> innermost construct where reductions are to be processed.  (Let's also
> 
> Correct, the  above interpretation must be wrong.
> 
>> consider multi-level gang/worker/vector loops/reductions.)  So, either:
>>
>>>2. acc parallel
>>>   acc loop reduction
>>
>> ... this, or even this:
>>
>>>3. acc parallel reduction
>>>   acc loop reduction
>>
>> ..., which I'm not sure what the execution model implementation requires.
>> (Nathan?)
> 
> interpretation #2 is sufficient, I think. However, both are lacking a
> 'copy(reduction_var)' clause, as otherwise there's nothing changing the
> default data attribute of 'firstprivate' (working on that patch). 
> Perhaps 'reduction' on 'parallel'  is meant to imply that  (because
> that's what makes sense), but the std doesn't say it.
> 
> In summary it's probably safe to implement interpretation #3.  That way
> we can implement the hypothesis that reductions at the outer construct
> imply copy.

OK, #3 it is.

>> And while we're at it: the very same question also applies to the private
>> clause, which -- contrary to all other (as far as I remember) clauses --
>> also is applicable to both the parallel and loop constructs:
>>
>>  #pragma acc parallel loop private([...])
>>
>> ... is to be decomposed into which of the following:
>>
>>  #pragma acc parallel private([...])
>>  #pragma acc loop
>>
>>  #pragma acc parallel
>>  #pragma acc loop private([...])
>>
>>  #pragma acc parallel private([...])
>>  #pragma acc loop private([...])
>>
>> (There is no private clause allowed to be specified with the kernels
>> construct for what it's worth, but that doesn't mean we couldn't use it
>> internally, of course, if so required.)
> 
> I think interpretations #2 and #3 both make sense, and I suspect they
> result in the same emitted code.

I'll probably go #2 here to make life easier with kernels.

After I make these changes (and the c++ template updates), I'll apply
them to trunk and backport them to gomp4. Thank you Jakub, Thomas and
Nathan for reviewing these patches.

Cesar


[C PATCH] Fix error-recovery with variable-sized compound literals (PR c/68090)

2015-11-05 Thread Marek Polacek
This fixes an error-recovery problem, where we were trying to build
a C_MAYBE_CONST_EXPR even though the type is invalid.  I think let's
just skip this for error_mark_nodes.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-11-05  Marek Polacek  

PR c/68090
* c-parser.c (c_parser_postfix_expression_after_paren_type): Don't
deal with pre-evaluation on invalid types.

* gcc.dg/pr68090.c: New test.

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index ab324d3..e2f55b3 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -7855,7 +7855,7 @@ c_parser_postfix_expression_after_paren_type (c_parser 
*parser,
   expr.value = build_compound_literal (start_loc, type, init.value, non_const);
   expr.original_code = ERROR_MARK;
   expr.original_type = NULL;
-  if (type_expr)
+  if (type != error_mark_node && type_expr)
 {
   if (TREE_CODE (expr.value) == C_MAYBE_CONST_EXPR)
{
diff --git gcc/testsuite/gcc.dg/pr68090.c gcc/testsuite/gcc.dg/pr68090.c
index e69de29..87b3b93 100644
--- gcc/testsuite/gcc.dg/pr68090.c
+++ gcc/testsuite/gcc.dg/pr68090.c
@@ -0,0 +1,13 @@
+/* PR c/68090 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+void
+fn (int i)
+{
+  (int[(0, 1)]) { 0 }; /* { dg-error "compound literal has variable size" } */
+  (int[i]) { 0 }; /* { dg-error "compound literal has variable size" } */
+  (int[(0, i)]) { 0 }; /* { dg-error "compound literal has variable size" } */
+  (int [][i]){ 0 }; /* { dg-error "compound literal has variable size" } */
+  (int [][(1, 2)]){ 0 }; /* { dg-error "compound literal has variable size" } 
*/
+}

Marek


Re: [PATCH 1/4][AArch64] Add scheduling and cost models for Exynos M1

2015-11-05 Thread Evandro Menezes

James,

Since the other members of the "tune_params" structure are signed integers, 
even though negative numbers make no sense for most of them either, I 
followed the same pattern.


Regardless, here's a patch with unsigned integers as you requested:

   [AArch64] Add extra tuning parameters for target processors

   2015-11-05  Evandro Menezes  

   gcc/

   * config/aarch64/aarch64-protos.h (tune_params): Add new members
   "max_case_values" and "cache_line_size".
   * config/aarch64/aarch64.c (aarch64_case_values_threshold): New
   function.
   (aarch64_override_options_internal): Tune heuristics based on new
   members in "tune_params".
   (TARGET_CASE_VALUES_THRESHOLD): Define macro.

Please, commit if it's alright.

Thank you,

--
Evandro Menezes

On 11/05/2015 03:22 AM, James Greenhalgh wrote:

On Wed, Nov 04, 2015 at 05:21:03PM -0600, Evandro Menezes wrote:

Please, ignore the previous patch.  This is the intended patch.

Sorry.

--
Evandro Menezes

On 11/04/2015 05:18 PM, Evandro Menezes wrote:

This patch adds extra tuning information about AArch64 targets:

* Maximum number of case values before resorting to a jump table
   The default values assumed independently of the specific backends
   may be rather low for modern processors, which sport quite efficient
   direct branch prediction, whereas indirect branch prediction is
   still typically not so efficient.  This value may be specifically
   set for a processor or left at zero to use the default values.
* L1 cache line size
   The auto-prefetcher uses this information when emitting software
   prefetch insns.
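A small illustration (hypothetical) of the heuristic being tuned: whether a dense switch like the one below is expanded as a jump table or as a branch tree depends on the case-values threshold.

```c
/* With enough dense case labels GCC lowers the switch to an indirect
   jump through a table; below the target's threshold it emits a
   compare-and-branch sequence instead.  The new max_case_values
   member lets a CPU with cheap direct branches raise that cutoff.  */
int classify (int c)
{
  switch (c)
    {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
    case 4: return 14;
    case 5: return 15;
    default: return -1;
    }
}
```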

Please, commit if it's alright.

Thank you,


Thanks for the patch,


diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 81792bc..ecf4685 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -195,6 +195,9 @@ struct tune_params
int vec_reassoc_width;
int min_div_recip_mul_sf;
int min_div_recip_mul_df;
+  int max_case_values; /* Case values threshold; or 0 for the default.  */

If we're using an int, how about -1 as the sentinel value? (Maybe someone
really likes jump tables!). Otherwise, make this an unsigned int?


+
+  int cache_line_size; /* Cache line size; or 0 for the default.  */

unsigned int?

The patch is otherwise OK, though it needs a ChangeLog.

Thanks,
James




diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 81792bc..ecf4685 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -195,6 +195,9 @@ struct tune_params
   int vec_reassoc_width;
   int min_div_recip_mul_sf;
   int min_div_recip_mul_df;
+  unsigned int max_case_values; /* Case values threshold; or 0 for the default.  */
+
+  unsigned int cache_line_size; /* Cache line size; or 0 for the default.  */
 
 /* An enum specifying how to take into account CPU autoprefetch capabilities
during instruction scheduling:
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5c8604f..e7f1c07 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -355,6 +355,8 @@ static const struct tune_params generic_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -378,6 +380,8 @@ static const struct tune_params cortexa53_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -401,6 +405,8 @@ static const struct tune_params cortexa57_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS)	/* tune_flags.  */
 };
@@ -424,6 +430,8 @@ static const struct tune_params cortexa72_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -446,6 +454,8 @@ static const struct tune_params thunderx_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -468,6 +478,8 @@ static const struct tune_par

Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-05 Thread Martin Sebor

On 11/05/2015 10:10 AM, Joseph Myers wrote:

On Thu, 5 Nov 2015, Martin Sebor wrote:


optimization. We seem to have consensus that even though there
might be exceptions (do we know what some of them are?) the vast
majority of optimizations that have -fxxx options are in fact not
performed at -O0. I think reflecting that in the output is more
helpful than implying otherwise.


There are several options marked Optimization that are actually matters of
language semantics that I don't think have anything to do with -O options,
e.g. -fexceptions and -fcx-limited-range.  The Optimization tag really
just means that it's saved and restored under certain circumstances.


Both -fexceptions and -fcx-limited-range are reported as disabled
at -O0, so they're not the examples I'm looking for. The set of
options that are reported as disabled by -O0 with the patched GCC
and that, IIUC, are of concern (i.e., that may not need -O to take
effect) are a subset of those below.

Quickly scanning the list, I would guess -frtti, -fshort-enums,
-fsigned-zeros, and -ftrapping-math should be in that category,
but there could very well be others.

If it's relatively straightforward to enumerate all such options
then maybe the way to go is to add a bit to struct cl_option to
make it possible to distinguish them from the others in the -O0
-Q --help=optimizers output.

Thanks
Martin

  -faggressive-loop-optimizations   [disabled by -O0]
  -fasynchronous-unwind-tables  [disabled by -O0]
  -fauto-inc-dec[disabled by -O0]
  -fdce [disabled by -O0]
  -fdelete-null-pointer-checks  [disabled by -O0]
  -fdse [disabled by -O0]
  -fearly-inlining  [disabled by -O0]
  -ffunction-cse[disabled by -O0]
  -fgcse-lm [disabled by -O0]
  -finline  [disabled by -O0]
  -finline-atomics  [disabled by -O0]
  -fira-hoist-pressure  [disabled by -O0]
  -fira-share-save-slots[disabled by -O0]
  -fira-share-spill-slots   [disabled by -O0]
  -fivopts  [disabled by -O0]
  -fjump-tables [disabled by -O0]
  -flifetime-dse[disabled by -O0]
  -fmath-errno  [disabled by -O0]
  -fpeephole[disabled by -O0]
  -fplt [disabled by -O0]
  -fprefetch-loop-arrays[disabled by -O0]
  -frename-registers[disabled by -O0]
  -frtti[disabled by -O0]
  -fsched-critical-path-heuristic   [disabled by -O0]
  -fsched-dep-count-heuristic   [disabled by -O0]
  -fsched-group-heuristic   [disabled by -O0]
  -fsched-interblock[disabled by -O0]
  -fsched-last-insn-heuristic   [disabled by -O0]
  -fsched-rank-heuristic[disabled by -O0]
  -fsched-spec  [disabled by -O0]
  -fsched-spec-insn-heuristic   [disabled by -O0]
  -fsched-stalled-insns-dep [disabled by -O0]
  -fschedule-fusion [disabled by -O0]
  -fshort-enums [disabled by -O0]
  -fsigned-zeros[disabled by -O0]
  -fsplit-ivs-in-unroller   [disabled by -O0]
  -fssa-backprop[disabled by -O0]
  -fstdarg-opt  [disabled by -O0]
  -fstrict-volatile-bitfields   [disabled by -O0]
  -fno-threadsafe-statics   [disabled by -O0]
  -ftrapping-math   [disabled by -O0]
  -ftree-cselim [disabled by -O0]
  -ftree-forwprop   [disabled by -O0]
  -ftree-loop-if-convert[disabled by -O0]
  -ftree-loop-im[disabled by -O0]
  -ftree-loop-ivcanon   [disabled by -O0]
  -ftree-loop-optimize  [disabled by -O0]
  -ftree-phiprop[disabled by -O0]
  -ftree-reassoc[disabled by -O0]
  -ftree-scev-cprop [disabled by -O0]
  -fvar-tracking[disabled by -O0]
  -fvar-tracking-assignments[disabled by -O0]
  -fweb [disabled by -O0]



Re: [PATCH] Add configure flag for operator new (std::nothrow)

2015-11-05 Thread Daniel Gutson
On Thu, Nov 5, 2015 at 2:11 PM, Jonathan Wakely  wrote:
> On 5 November 2015 at 20:52, Daniel Gutson wrote:
>> Real use cases: statistics and logging. It's a (one time) callback
>> reporting that something went wrong,
>> but not intended to fix things e.g. by attempting to free more memory.
>
> Why can't that be done by replacing operator new with a user-defined
> function, instead of modifying the default one?
>
> A program that wants to make use of the altered semantics is
> non-portable because it relies on a custom libstdc++ configuration
> option, and has to install a custom new-handler that relies on the
> altered libstdc++ behaviour. Replacing operator new to do the logging
> is portable and requires no custom configuration.
>
> (I'm not rejecting the patch, but am still trying to make sure it's a
> useful change.)

The issue, as I understand it, is that operator new has to do its
actual work, i.e. allocate memory. Replacing it would force us to copy
most of the original implementation of operator new, which may change
in new versions of the standard library, forcing us to keep our copy
updated. This is important when dealing with legacy code that uses
operator new (but doesn't install a new_handler).
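For reference, here is a minimal sketch (not from the thread; the exact logging is illustrative) of the portable alternative Jonathan describes: replacing the global operator new so the logging happens there, without any custom libstdc++ configuration:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

// Replacement global operator new: log the allocation, then satisfy it
// while honoring the standard contract (retry via the new-handler,
// throw std::bad_alloc when there is none).
void *operator new (std::size_t sz)
{
  for (;;)
    {
      if (void *p = std::malloc (sz ? sz : 1))
        {
          std::fprintf (stderr, "operator new: %zu bytes\n", sz);
          return p;
        }
      std::new_handler h = std::get_new_handler ();
      if (!h)
        throw std::bad_alloc ();
      h ();  // give the handler a chance to free memory, then retry
    }
}

// Matching replacement of the global operator delete.
void operator delete (void *p) noexcept
{
  std::free (p);
}
```

Because replacement happens at link time, legacy code that calls plain `new` picks this up without recompilation.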




-- 

Daniel F. Gutson
Chief Engineering Officer, SPD

San Lorenzo 47, 3rd Floor, Office 5
Córdoba, Argentina

Phone:   +54 351 4217888 / +54 351 4218211
Skype:dgutson
LinkedIn: http://ar.linkedin.com/in/danielgutson


Re: [C PATCH] Fix error-recovery with variable-sized compound literals (PR c/68090)

2015-11-05 Thread Joseph Myers
On Thu, 5 Nov 2015, Marek Polacek wrote:

> This fixes an error-recovery problem, where we were trying to build
> a C_MAYBE_CONST_EXPR even though the type is invalid.  I think let's
> just skip this for error_mark_nodes.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[gomp4] Merge trunk r229764 (2015-11-04) into gomp-4_0-branch

2015-11-05 Thread Thomas Schwinge
Hi!

Committed to gomp-4_0-branch in r229821:

commit bed2efe185cf13a52b9cf87ee52d74265f97d97b
Merge: bb6471e bf3a27b
Author: tschwinge 
Date:   Thu Nov 5 18:02:08 2015 +

svn merge -r 229562:229764 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229821 
138bc75d-0d04-0410-961f-82ee72b054a4


Grüße
 Thomas




Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-05 Thread Joseph Myers
On Thu, 5 Nov 2015, Martin Sebor wrote:

> > There are several options marked Optimization that are actually matters of
> > language semantics that I don't think have anything to do with -O options,
> > e.g. -fexceptions and -fcx-limited-range.  The Optimization tag really
> > just means that it's saved and restored under certain circumstances.
> 
> Both -fexceptions and -fcx-limited-range are reported as disabled
> at -O0, so they're not the examples I'm looking for. The set of

But reporting that is nonsense.  They are disabled by default for C.  If 
you pass the option on the command line, it's enabled regardless of -O 
options, and should not be reported as disabled.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-05 Thread Jason Merrill

OK, thanks.

Jason


Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-05 Thread Martin Sebor

On 11/05/2015 11:07 AM, Joseph Myers wrote:

On Thu, 5 Nov 2015, Martin Sebor wrote:


There are several options marked Optimization that are actually matters of
language semantics that I don't think have anything to do with -O options,
e.g. -fexceptions and -fcx-limited-range.  The Optimization tag really
just means that it's saved and restored under certain circumstances.


Both -fexceptions and -fcx-limited-range are reported as disabled
at -O0, so they're not the examples I'm looking for. The set of


But reporting that is nonsense.


What I said (meant) was that they are reported as "[disabled]" when
-O0 is used (and they are not explicitly enabled on the command line).
It took me a while to see that you're talking about something slightly
different.


They are disabled by default for C.  If
you pass the option on the command line, it's enabled regardless of -O
options, and should not be reported as disabled.


Yes, I agree that printing "disabled by -O0" when -fexceptions is
explicitly enabled on the command line is not right if the option
isn't actually affected by -O0. This would be prevented if we knew
which of the -fxxx options are or aren't controlled by -O0. Can you
help identify them?

Martin


Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-05 Thread Joseph Myers
On Thu, 5 Nov 2015, Martin Sebor wrote:

> Yes, I agree that printing "disabled by -O0" when -fexceptions is
> explicitly enabled on the command line is not right if the option
> isn't actually affected by -O0. This would be prevented if we knew
> which of the -fxxx options are or aren't controlled by -O0. Can you
> help identify them?

I don't think there's any simple way to do so; it requires checking all 
the "if (optimize)" conditions to see what is or is not conditioned that 
way.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 6/6] Make SRA replace constant-pool loads

2015-11-05 Thread Alan Lawrence
On 3 November 2015 at 14:01, Richard Biener  wrote:
>
> Hum.  I still wonder why we need all this complication ...

Well, certainly I'd love to make it simpler, and if the complication
is because I've gone about trying to deal with especially Ada in the
wrong way...

>  I would
> expect that if
> we simply decided to completely scalarize a constant pool aggregate load then
> we can always do that.  SRA should then simply emit the _same_ IL as it does
> for other complete scalarization element accesses.  The only difference is
> that the result should be simplifiable to omit the element load from
> the constant pool.
>
> 1) The FRE pass following SRA should do that for you.
...
> That is, I'd like to see the patch greatly simplified to just consider
> constant pool
> complete scalarization as if it were a regular variable.

Hmm, can you clarify, do you mean I should *not* replace constant pool
values with their DECL_INITIAL? The attempt to substitute in the
initial value is what leads to most of the problems. For example, in
gnat/opt31.adb, create_access finds this expression accessing *.LC0:

MEM[(interfaces__unsigned_8[(sizetype) .P_BOUNDS->LB0:.P_BOUNDS->UB0 >= .P_BOUNDS->LB0 ? (sizetype) .P_BOUNDS->UB0 : (sizetype)
.P_BOUNDS->LB0 +
4294967295] *)&*.LC0][1 ...]{lb: 1 sz: 1}

this is an ARRAY_RANGE_REF of a MEM_REF of an ADDR_EXPR of *.LC0. So
far I haven't extended subst_constant_pool_initial to handle
ARRAY_RANGE_REFs, as it can't even handle this MEM_REF:

MEM[(interfaces__unsigned_8[(sizetype) .P_BOUNDS->LB0:.P_BOUNDS->UB0 >= .P_BOUNDS->LB0 ? (sizetype) .P_BOUNDS->UB0 : (sizetype)
.P_BOUNDS->LB0 +
4294967295] *)&*.LC0]

because the type here has size:

MIN_EXPR <_GLOBAL.SZ2.ada_opt31 (.P_BOUNDS->UB0, .P_BOUNDS->LB0), 17179869176>

inside the MEM_REF of the ADDR_EXPR is *.LC0, whose DECL_INITIAL is a
4-element array (fine). Sadly, while the MEM_REF's type satisfies
type_contains_placeholder_p, the type of the outer ARRAY_RANGE_REF
does not.

One possibility is that this whole construct, ARRAY_RANGE_REF that it
is, should mark *.LC0 in cannot_scalarize_away_bitmap.

However, disqualified_constants is also necessary if I want a
meaningful assertion that we do not re-add constant pool entries as
candidates after we've discovered them - or should I try to rely on
there only being one expression that accesses each constant pool
entry? (I don't think that's guaranteed as tree_output_constant_def
does hashing, I admit I haven't really tried to break that assumption)

> 2) You should be able to use fold_ctor_reference directly (in place of
> all your code
> in case offset and size are readily available - don't remember exactly how
> complete scalarization "walks" elements).  Alternatively use
> fold_const_aggregate_ref.

That should work for completely_scalarize, yes, i.e. if I can remove
the other route in create_access.

> 3) You can simplify the stmt SRA generated by simply calling fold_stmt on it,
> that will do a bit more (wasted) work compared to 2) but may be easier.
>
> I wouldn't bother with the case where we for some reason do not simplify
> the constant pool load.

Hmmm. As above, I'm not quite sure what you mean by "the constant pool
load" - if that means, substituting the DECL_INITIAL in place of the
VAR_DECL that is DECL_IN_CONSTANT_POOL, then I didn't find a general
tree substitution routine, hence writing subst_constant_pool_initial.
It might be possible to make that simpler/more generic (e.g. just call
copy_node, recurse on each operand, return the original if nothing
changed) and then fold at the end.

Thanks,
Alan
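As background for readers of this thread, a deliberately simple (hypothetical) example of a constant-pool aggregate load: the initializer of `tbl` is typically emitted as a constant-pool entry such as `*.LC0` and copied into the local array, and the idea above is that complete scalarization plus the following FRE pass can fold the element loads to constants. The Ada cases under discussion are of course far less regular than this:

```cpp
// The local array's initializer usually lands in the constant pool;
// SRA can completely scalarize the copy, and FRE can then fold each
// element load to a constant, reducing sum() to "return 10".
int sum (void)
{
  const int tbl[4] = {1, 2, 3, 4};
  int s = 0;
  for (int i = 0; i < 4; i++)
    s += tbl[i];
  return s;
}
```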


Re: [PING] Re: [PATCH] c++/67913, 67917 - fix new expression with wrong number of elements

2015-11-05 Thread Martin Sebor

On 11/02/2015 09:55 PM, Jason Merrill wrote:

On 10/26/2015 10:06 PM, Martin Sebor wrote:

+  if (TREE_CONSTANT (maybe_constant_value (outer_nelts)))
+{
+  if (tree_int_cst_lt (max_outer_nelts_tree, outer_nelts))


maybe_constant_value may return a constant, but that doesn't mean that
outer_nelts was already constant; if it wasn't, the call to
tree_int_cst_lt will fail.


Thanks for the hint. I wasn't able to trigger the failure. I suspect
outer_nelts must have already been folded at this point and that the
maybe_constant_value call isn't necessary, so I removed it.


Since we're moving toward delayed folding, I'd prefer to use the result
of maybe_constant_value only for this diagnostic, and then continue to
pass the unfolded value along.


Sure. Done in the attached patch.

Martin
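To make the behaviour being reviewed concrete, a hypothetical sketch (not part of the patch) of the two cases: a constant excessive size is diagnosed at compile time, while a non-constant size gets a runtime check that makes the allocation fail instead of wrapping around:

```cpp
#include <cstddef>
#include <new>

// With a non-constant count, "new int[n]" must fail when
// n * sizeof(int) exceeds the implementation maximum (roughly
// SIZE_MAX / 2) rather than silently overflowing the size computation.
int *make (std::size_t n)
{
  try
    {
      return new int[n];  // runtime bounds check inserted by the compiler
    }
  catch (const std::bad_alloc &)  // also catches std::bad_array_new_length
    {
      return nullptr;
    }
}
```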
gcc/cp/ChangeLog

2015-10-19  Martin Sebor  

	PR c++/67913
	PR c++/67927
	* call.c (build_operator_new_call): Do not assume size_check
	is non-null, analogously to the top half of the function.
	* init.c (build_new_1): Detect and diagnose array sizes in
	excess of the maximum of roughly SIZE_MAX / 2.
	Insert a runtime check only for arrays with a non-constant size.
	(build_new): Detect and diagnose negative array sizes.

gcc/testsuite/ChangeLog

2015-10-19  Martin Sebor  

	* init/new45.C: New test to verify that operator new is invoked
	with or without overhead for a cookie.

	PR c++/67927
	* init/new44.C: New test for placement new expressions for arrays
	with excessive number of elements.

	PR c++/67913
	* init/new43.C: New test for placement new expressions for arrays
	with negative number of elements.

	* other/new-size-type.C: Expect array new expression with
	an excessive number of elements to be rejected.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 367d42b..3f76198 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4228,10 +4228,12 @@ build_operator_new_call (tree fnname, vec **args,
 	 {
 	   /* Update the total size.  */
 	   *size = size_binop (PLUS_EXPR, original_size, *cookie_size);
+	   if (size_check)
+	 {
 	   /* Set to (size_t)-1 if the size check fails.  */
-	   gcc_assert (size_check != NULL_TREE);
 	   *size = fold_build3 (COND_EXPR, sizetype, size_check,
 *size, TYPE_MAX_VALUE (sizetype));
+	 }
 	   /* Update the argument list to reflect the adjusted size.  */
 	   (**args)[0] = *size;
 	 }
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 1ed8f6c..3b88098 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2272,7 +2272,11 @@ throw_bad_array_new_length (void)
 /* Generate code for a new-expression, including calling the "operator
new" function, initializing the object, and, if an exception occurs
during construction, cleaning up.  The arguments are as for
-   build_raw_new_expr.  This may change PLACEMENT and INIT.  */
+   build_raw_new_expr.  This may change PLACEMENT and INIT.
+   TYPE is the type of the object being constructed, possibly an array
+   of NELTS elements when NELTS is non-null (in "new T[NELTS]", T may
+   be an array of the form U[inner], with the whole expression being
+   "new U[NELTS][inner]").  */

 static tree
 build_new_1 (vec **placement, tree type, tree nelts,
@@ -2292,13 +2296,16 @@ build_new_1 (vec **placement, tree type, tree nelts,
  type.)  */
   tree pointer_type;
   tree non_const_pointer_type;
+  /* The most significant array bound in int[OUTER_NELTS][inner].  */
   tree outer_nelts = NULL_TREE;
-  /* For arrays, a bounds checks on the NELTS parameter. */
+  /* For arrays with a non-constant number of elements, a bounds checks
+ on the NELTS parameter to avoid integer overflow at runtime. */
   tree outer_nelts_check = NULL_TREE;
   bool outer_nelts_from_type = false;
+  /* Number of the "inner" elements in "new T[OUTER_NELTS][inner]".  */
   offset_int inner_nelts_count = 1;
   tree alloc_call, alloc_expr;
-  /* Size of the inner array elements. */
+  /* Size of the inner array elements (those with constant dimensions). */
   offset_int inner_size;
   /* The address returned by the call to "operator new".  This node is
  a VAR_DECL and is therefore reusable.  */
@@ -2492,21 +2499,41 @@ build_new_1 (vec **placement, tree type, tree nelts,
 	}

   max_outer_nelts = wi::udiv_trunc (max_size, inner_size);
-  /* Only keep the top-most seven bits, to simplify encoding the
-	 constant in the instruction stream.  */
+  max_outer_nelts_tree = wide_int_to_tree (sizetype, max_outer_nelts);
+
+  size = size_binop (MULT_EXPR, size, convert (sizetype, nelts));
+
+  if (TREE_CONSTANT (outer_nelts))
+	{
+	  if (tree_int_cst_lt (max_outer_nelts_tree, outer_nelts))
+	{
+	  /* When the array size is constant, check it at compile time
+		 to make sure it doesn't exceed the implementation-defined
+		 maximum, as required by C++ 14 (in C++ 11 this requirement
+		 isn't explicitly stated but it's enforced anyway -- see
+		 grokdeclarator in cp/decl.c).  */
+	  if (complain & tf_error)
+		error ("size of arra

[PATCH 2/4][AArch64] Increase the loop peeling limit

2015-11-05 Thread Evandro Menezes

   2015-11-05  Evandro Menezes 

   gcc/

   * config/aarch64/aarch64.c (aarch64_override_options_internal):
   Increase loop peeling limit.

This patch increases the limit for the number of peeled insns.  With 
this change, I noticed no major regression in either Geekbench v3 or 
SPEC CPU2000, while some benchmarks, typically FP ones, improved 
significantly.


I tested this tuning on Exynos M1 and on A57.  ThunderX seems to benefit 
from this tuning too.  However, I'd appreciate comments from other 
stakeholders.
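For readers unfamiliar with the parameter, PARAM_MAX_COMPLETELY_PEELED_INSNS caps the estimated size of the fully unrolled loop body when deciding whether to replace a loop with straight-line code. A hypothetical example of the kind of small FP kernel that can benefit once the limit is raised:

```cpp
// A small, constant trip count: a candidate for complete peeling
// (a.k.a. complete unrolling), which eliminates the loop overhead
// and exposes the multiply-adds for scheduling.
float dot8 (const float *a, const float *b)
{
  float s = 0.0f;
  for (int i = 0; i < 8; i++)
    s += a[i] * b[i];
  return s;
}
```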


Thank you,

--
Evandro Menezes

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5c8604f..66122e7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7672,6 +7672,12 @@ aarch64_override_options_internal (struct gcc_options *opts)
 			 opts->x_param_values,
 			 global_options_set.x_param_values);
 
+  /* Increase the maximum peeling limit.  */
+  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+			 400,
+			 opts->x_param_values,
+			 global_options_set.x_param_values);
+
   aarch64_override_options_after_change_1 (opts);
 }
 


Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-05 Thread Martin Sebor

On 11/05/2015 10:09 AM, Segher Boessenkool wrote:

On Thu, Nov 05, 2015 at 08:58:25AM -0700, Martin Sebor wrote:

I don't think that reiterating in a condensed form what the manual
doesn't make clear in many more words will help. First, even users
who do find the relevant text in the manual often misunderstand it.
Others are misled by the "[enabled]" output into expecting the
optimizations to take place at -O0.


The -Q --help* output is primarily aimed at GCC developers, just
like plain -Q is.  The main problem here seems to be that the
documentation pushes beginner GCC users to use this.

Perhaps we should have a higher-level, more easy to use way to query
what flags will do what, possibly integrated with the pass manager.


I think that would be ideal (i.e., hiding each flag_xxx under
a get_flag_xxx() and set_flag_xxx() interface that would make
it possible to compute and dynamically query individual option
settings based on non-trivial dependencies). It strikes me as
quite a project.




Those who are in doubt and
ask on gcc-help are then told by GCC experts that -O0 disables
all optimizations without exception,


And it does, for the plain meaning of "optimization"; anything else
is arguably a bug.  But people have many different understandings
of what a "compiler optimization" is, all the way to "anything the
compiler does".


Yes, that also sometimes causes confusion, though it's not something
I'm trying to solve with this patch.

Martin


[PATCH] enable loop fusion on isl-15

2015-11-05 Thread Sebastian Pop
   * graphite-optimize-isl.c (optimize_isl): Call
   isl_options_set_schedule_maximize_band_depth.

   * gcc.dg/graphite/fuse-1.c: New.
   * gcc.dg/graphite/fuse-2.c: New.
   * gcc.dg/graphite/interchange-13.c: Remove bogus check.
---
 gcc/graphite-optimize-isl.c|  2 +-
 gcc/testsuite/gcc.dg/graphite/fuse-1.c | 43 ++
 gcc/testsuite/gcc.dg/graphite/fuse-2.c | 43 ++
 gcc/testsuite/gcc.dg/graphite/interchange-13.c |  1 -
 4 files changed, 87 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/fuse-1.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/fuse-2.c

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 53355bb..0d85975 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -404,7 +404,7 @@ optimize_isl (scop_p scop)
   isl_options_set_schedule_maximize_band_depth (scop->isl_context, 1);
 #ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
   /* ISL-0.15 or later.  */
-  isl_options_set_schedule_serialize_sccs (scop->isl_context, 1);
+  isl_options_set_schedule_maximize_band_depth (scop->isl_context, 1);
 #else
   isl_options_set_schedule_fuse (scop->isl_context, ISL_SCHEDULE_FUSE_MIN);
 #endif
diff --git a/gcc/testsuite/gcc.dg/graphite/fuse-1.c 
b/gcc/testsuite/gcc.dg/graphite/fuse-1.c
new file mode 100644
index 000..c9bb67d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/fuse-1.c
@@ -0,0 +1,43 @@
+/* Check that the two loops are fused and that we manage to fold the two xor
+   operations.  */
+/* { dg-options "-O2 -floop-nest-optimize -fdump-tree-forwprop-all" } */
+/* { dg-do run } */
+
+/* Make sure we fuse the loops like this:
+ISL AST generated by ISL:
+for (int c0 = 0; c0 <= 99; c0 += 1) {
+  S_3(c0);
+  S_6(c0);
+  S_9(c0);
+}
+*/
+/* { dg-final { scan-tree-dump-times "ISL AST generated by ISL:.*for (int c0 = 
0; c0 <= 99; c0 += 1) \{.*S_.*(c0);.*S_.*(c0);.*S_.*(c0);.*\}" 1 "graphite" } } 
*/
+
+/* Check that after fusing the loops, the scalar computation is also fused.  */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to\[^\\n\]*\\^ 12" 1 
"forwprop4" } } */
+
+
+
+#define MAX 100
+int A[MAX];
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i;
+
+  for (i = 0; i < MAX; i++)
+A[i] = i;
+  for(int i=0; i

Merge of HSA branch

2015-11-05 Thread Martin Jambor
Hi,

we have had a few last-minute issues, so I apologize this is happening
a bit late, but I would like to start the process of merging the HSA
branch to GCC trunk.  Changes to different parts of the compiler and
libgomp are posted as individual emails in this thread.  No two
patches touch any single file and almost all of them need to be
applied together at once, I have only split the one big patch up to
ease review.

Individual changes are described in slightly more detail in their
respective messages.  If you are interested in how the HSAIL
generation works in general, I encourage you to have a look at my
Cauldron slides or presentation, only very few things have changed as
far as the general principles are concerned.  Let me just quickly stress
here that we do acceleration within a single compiler, as opposed to
LTO-ways of all the other accelerator teams.

We have bootstrapped and tested the patched GCC with and without HSA
enabled on x86_64-linux and found only a single issue (see below) and
a few new HSA warnings when HSA was enabled.  We have also run the full
C, C++ and Fortran testsuite on a computer that actually has a Carrizo
HSA APU and the result was the same.

The single issue is that libgomp.oacc-c/../libgomp.oacc-c-c++-common/*
tests get run using our HSA plugin, which however cannot do OpenACC.
I believe it should be something very easy to fix and I did not want
to hold off the submission and review because of it.

I also acknowledge that we should add HSA-specific tests to the GCC
testsuite but we are only now looking at how to do that and will
welcome any guidance in this regard.

I acknowledge that the submission comes quite late and that the class
of OpenMP loops we can handle well is small, but nevertheless I would
like to ask for review and eventual acceptance to trunk and GCC 6.

I'll be grateful for any review, comments or questions,

Martin



[hsa 1/12] Configuration and offloading-related changes

2015-11-05 Thread Martin Jambor
Hi,

this patch contains changes to the configuration mechanism and offload
bits, so that users can build compilers with HSA support and it plays
nicely with other accelerators despite using an altogether different
implementation approach.

With this patch, the user can request HSA support by including the
string "hsa" among the requested accelerators in
--enable-offload-targets.  This will cause the compiler to start
producing HSAIL for target OpenMP regions/functions and the hsa
libgomp plugin to be built.  Because the plugin needs to use HSA
run-time library, I have introduced options --with-hsa-runtime (and
more precise --with-hsa-include and --with-hsa-lib) to help find it.

The catch, however, is that there is no offload compiler for HSA, so
the wrapper should not attempt to look for one (that is what the hunk
in lto-wrapper.c does).  Moreover, when HSA is the only accelerator,
it is wasteful to output LTO sections with byte-code; therefore, if
HSA is the only configured accelerator, the ENABLE_OFFLOADING macro is
not set.

Finally, when the compiler has been configured for HSA but the user
disables it by omitting it in the -foffload compiler option, we need
to observe that decision.  That is what the opts.c hunk does.

Thanks,

Martin


2015-11-04  Martin Jambor  

gcc/
* Makefile.in (OBJS): Add new source files.
(GTFILES): Add hsa.c.
* configure.ac (accel_dir_suffix): Treat hsa specially.
(OFFLOAD_TARGETS): Define ENABLE_OFFLOADING according to
$enable_offloading.
(ENABLE_HSA): Define ENABLE_HSA according to $enable_hsa.
* lto-wrapper.c (compile_images_for_offload_targets): Do not attempt
to invoke offload compiler for hsa accelerator.
* opts.c (common_handle_option): Determine whether HSA offloading
should be performed.

libgomp/plugin/
* Makefrag.am: Add HSA plugin requirements.
* configfrag.ac (HSA_RUNTIME_INCLUDE): New variable.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_CPPFLAGS): Likewise.
(HSA_RUNTIME_INCLUDE): New substitution.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_LDFLAGS): Likewise.
(hsa-runtime): New configure option.
(hsa-runtime-include): Likewise.
(hsa-runtime-lib): Likewise.
(PLUGIN_HSA): New substitution variable.
Fill HSA_RUNTIME_INCLUDE and HSA_RUNTIME_LIB according to the new
configure options.
(PLUGIN_HSA_CPPFLAGS): Likewise.
(PLUGIN_HSA_LDFLAGS): Likewise.
(PLUGIN_HSA_LIBS): Likewise.
Check that we have access to HSA run-time.
(PLUGIN_NVPTX): New conditional.


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7d53a7d..2ca16b1 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1293,6 +1293,11 @@ OBJS = \
graphite-sese-to-poly.o \
gtype-desc.o \
haifa-sched.o \
+   hsa.o \
+   hsa-gen.o \
+   hsa-regalloc.o \
+   hsa-brig.o \
+   hsa-dump.o \
hw-doloop.o \
hwint.o \
ifcvt.o \
@@ -1317,6 +1322,7 @@ OBJS = \
ipa-icf.o \
ipa-icf-gimple.o \
ipa-reference.o \
+   ipa-hsa.o \
ipa-ref.o \
ipa-utils.o \
ipa.o \
@@ -2379,6 +2385,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/sanopt.c \
   $(srcdir)/ipa-devirt.c \
   $(srcdir)/internal-fn.h \
+  $(srcdir)/hsa.c \
   @all_gtfiles@
 
 # Compute the list of GT header files from the corresponding C sources,
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 7e22267..d0d4565 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -943,6 +943,13 @@ AC_SUBST(accel_dir_suffix)
 
 for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
   tgt=`echo $tgt | sed 's/=.*//'`
+
+  if echo "$tgt" | grep "^hsa" > /dev/null ; then
+enable_hsa=1
+  else
+enable_offloading=1
+  fi
+
   if test x"$offload_targets" = x; then
 offload_targets=$tgt
   else
@@ -951,11 +958,16 @@ for tgt in `echo $enable_offload_targets | sed 's/,/ 
/g'`; do
 done
 AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
   [Define to offload targets, separated by commas.])
-if test x"$offload_targets" != x; then
+if test x"$enable_offloading" != x; then
   AC_DEFINE(ENABLE_OFFLOADING, 1,
 [Define this to enable support for offloading.])
 fi
 
+if test x"$enable_hsa" = x1 ; then
+  AC_DEFINE(ENABLE_HSA, 1,
+[Define this to enable support for generating HSAIL.])
+fi
+
 AC_ARG_WITH(multilib-list,
 [AS_HELP_STRING([--with-multilib-list], [select multilibs (AArch64, SH and 
x86-64 only)])],
 :,
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 20e67ed..5f564d9 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -745,6 +745,11 @@ compile_images_for_offload_targets (unsigned in_argc, char 
*in_argv[],
   offload_names = XCNEWVEC (char *, num_targets + 1);
   for (unsigned i = 0; i < num_targets; i++)
 {
+  /* HSA does not use LTO-like streaming and a diffe

[hsa 2/12] Modifications to libgomp proper

2015-11-05 Thread Martin Jambor
Hi,

The patch below contains all changes to libgomp files.  First, it adds
a new constant identifying HSA devices and a structure that is shared
between libgomp and the compiler when kernels are invoked from other
kernels via dynamic parallelism.

Second, it modifies the GOMP_target_41 function so that it can also take
kernel attributes (essentially the grid dimension) as a parameter and
pass them on to the HSA libgomp plugin.  Because we do want HSAIL
generation to gracefully fail and use host fallback in that case, the
same function calls the host implementation if it cannot map the
requested function to an accelerated one or if a new callback
can_run_func indicates there is a problem.

We need a new hook because we use it to check for linking errors which
we cannot do when incrementally loading registered images.  And we
want to handle linking errors, so that when we cannot emit HSAIL for a
function called from a kernel (possibly in a different compilation
unit), we also resort to host fallback.

Last but not least, the patch removes data remapping when the selected
device is capable of sharing memory with the host.

Thanks,

Martin


2015-11-02  Martin Jambor  
Martin Liska  

include/
* gomp-constants.h (GOMP_DEVICE_HSA): New macro.

libgomp/
* libgomp-plugin.h (offload_target_type): New element
OFFLOAD_TARGET_TYPE_HSA.
* libgomp.h (gomp_device_descr): Extra parameter of run_func, new
field can_run_func.
* libgomp_g.h (GOMP_target_41): Add an extra parameter.
* oacc-host.c (host_run): Add an extra unused parameter.
* target.c (gomp_get_target_fn_addr): Allow failure if device shares
memory.
(GOMP_target): Assert failure did not happen.  Add extra parameter to
call of run_func.
(GOMP_target_41): Added an extra parameter, pass it to run_func.
Allow host fallback if device shares memory.  Do not remap data if
device has shared memory.
(GOMP_target_data_41): Run host fallback if device has shared memory.
(gomp_load_plugin_for_device): Also attempt to load can_run_func.


diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index f834dec..46d52b3 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -160,6 +160,7 @@ enum gomp_map_kind
 #define GOMP_DEVICE_NOT_HOST   4
 #define GOMP_DEVICE_NVIDIA_PTX 5
 #define GOMP_DEVICE_INTEL_MIC  6
+#define GOMP_DEVICE_HSA7
 
 #define GOMP_DEVICE_ICV-1
 #define GOMP_DEVICE_HOST_FALLBACK  -2
@@ -212,4 +213,35 @@ enum gomp_map_kind
 #define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0x)
 #define GOMP_LAUNCH_OP_MAX 0x
 
+/* HSA specific data structures.  */
+
+/* HSA kernel dispatch is collection of information needed for
+   a kernel dispatch.  */
+
+struct hsa_kernel_dispatch
+{
+  /* Pointer to a command queue associated with a kernel dispatch agent.  */
+  void *queue;
+  /* Pointer to reserved memory for OMP data struct copying.  */
+  void *omp_data_memory;
+  /* Pointer to a memory space used for kernel arguments passing.  */
+  void *kernarg_address;
+  /* Kernel object.  */
+  uint64_t object;
+  /* Synchronization signal used for dispatch synchronization.  */
+  uint64_t signal;
+  /* Private segment size.  */
+  uint32_t private_segment_size;
+  /* Group segment size.  */
+  uint32_t group_segment_size;
+  /* Number of children kernel dispatches.  */
+  uint64_t kernel_dispatch_count;
+  /* Number of threads.  */
+  uint32_t omp_num_threads;
+  /* Debug purpose argument.  */
+  uint64_t debug;
+  /* Kernel dispatch structures created for children kernel dispatches.  */
+  struct hsa_kernel_dispatch **children_dispatches;
+};
+
 #endif
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 24fbb94..acf6eb7 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -48,7 +48,8 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_HOST = 2,
   /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
   OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
-  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
+  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
+  OFFLOAD_TARGET_TYPE_HSA = 7
 };
 
 /* Auxiliary struct, used for transferring pairs of addresses from plugin
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 9c8b1fb..0ad42d2 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -876,7 +876,8 @@ struct gomp_device_descr
   void *(*dev2host_func) (int, void *, const void *, size_t);
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void *(*dev2dev_func) (int, void *, const void *, size_t);
-  void (*run_func) (int, void *, void *);
+  void (*run_func) (int, void *, void *, const void *);
+  bool (*can_run_func) (void *);
 
   /* Splay tree containing information about mapped memory regions.  */
   struct splay_tree_s mem_map;
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index c28ad21..adb9bcc 100644
--

[hsa 3/12] HSA libgomp plugin

2015-11-05 Thread Martin Jambor
Hi,

the patch below adds the HSA-specific plugin for libgomp.  The plugin
implements the interface mandated by libgomp and takes care of finding
any available HSA devices, finalizing HSAIL code and running it on
HSA-capable GPUs.  The plugin does not really implement any data
movement functions (it implements them with a fatal error call)
because memory is shared in HSA environments and the previous patch
has modified libgomp proper not to call those functions on devices
with this capability.

When going over the code for the last time, I realized I did not
implement any version checks in the plugin yet, but did not want to
hold off the initial review round because of that.  I will have a look
at it as soon as possible.

Thanks,

Martin


2015-11-05  Martin Jambor  
Martin Liska  

* plugin/plugin-hsa.c: New file.



diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
new file mode 100644
index 000..c1b7879
--- /dev/null
+++ b/libgomp/plugin/plugin-hsa.c
@@ -0,0 +1,1309 @@
+#include 
+#include 
+#include 
+#include 
+#include "libgomp-plugin.h"
+#include "gomp-constants.h"
+#include "hsa.h"
+#include "hsa_ext_finalize.h"
+#include "dlfcn.h"
+
+/* Part of the libgomp plugin interface.  Return the name of the accelerator,
+   which is "hsa".  */
+
+const char *
+GOMP_OFFLOAD_get_name (void)
+{
+  return "hsa";
+}
+
+/* Part of the libgomp plugin interface.  Return the specific capabilities the
+   HSA accelerator has.  */
+
+unsigned int
+GOMP_OFFLOAD_get_caps (void)
+{
+  return GOMP_OFFLOAD_CAP_SHARED_MEM | GOMP_OFFLOAD_CAP_OPENMP_400;
+}
+
+/* Part of the libgomp plugin interface.  Identify as HSA accelerator.  */
+
+int
+GOMP_OFFLOAD_get_type (void)
+{
+  return OFFLOAD_TARGET_TYPE_HSA;
+}
+
+/* Return the libgomp version number we're compatible with.  There is
+   no requirement for cross-version compatibility.  */
+
+unsigned
+GOMP_OFFLOAD_version (void)
+{
+  return GOMP_VERSION;
+}
+
+/* Flag to decide whether to print information about what is going on to
+   stderr.  Set in init_enviroment_variables depending on environment
+   variables.  */
+
+static bool debug;
+
+/* Flag to decide if the runtime should suppress a possible fallback to host
+   execution.  */
+
+static bool suppress_host_fallback;
+
+/* Initialize debug and suppress_host_fallback according to the environment.  */
+
+static void
+init_enviroment_variables (void)
+{
+  if (getenv ("HSA_DEBUG"))
+    debug = true;
+  else
+    debug = false;
+
+  if (getenv ("HSA_SUPPRESS_HOST_FALLBACK"))
+    suppress_host_fallback = true;
+  else
+    suppress_host_fallback = false;
+}
+
+/* Print a logging message with PREFIX to stderr if debugging has been
+   enabled via the HSA_DEBUG environment variable.  */
+
+#define HSA_LOG(prefix, ...) \
+  do \
+  { \
+if (debug) \
+  { \
+   fprintf (stderr, prefix); \
+   fprintf (stderr, __VA_ARGS__); \
+  } \
+  } \
+  while (false)
+
+/* Print a debugging message to stderr.  */
+
+#define HSA_DEBUG(...) HSA_LOG ("HSA debug: ", __VA_ARGS__)
+
+/* Print a warning message to stderr.  */
+
+#define HSA_WARNING(...) HSA_LOG ("HSA warning: ", __VA_ARGS__)
+
+/* Print HSA warning STR with an HSA STATUS code.  */
+
+static void
+hsa_warn (const char *str, hsa_status_t status)
+{
+  if (!debug)
+return;
+
+  const char* hsa_error;
+  hsa_status_string (status, &hsa_error);
+
+  unsigned l = strlen (hsa_error);
+
+  char *err = GOMP_PLUGIN_malloc (l + 1);
+  memcpy (err, hsa_error, l);
+  err[l] = '\0';
+
+  fprintf (stderr, "HSA warning: %s (%s)\n", str, err);
+
+  free (err);
+}
+
+/* Report a fatal error STR together with the HSA error corresponding to STATUS
+   and terminate execution of the current process.  */
+
+static void
+hsa_fatal (const char *str, hsa_status_t status)
+{
+  const char* hsa_error;
+  hsa_status_string (status, &hsa_error);
+  GOMP_PLUGIN_fatal ("HSA fatal error: %s (%s)", str, hsa_error);
+}
+
+struct hsa_kernel_description
+{
+  const char *name;
+  unsigned omp_data_size;
+  unsigned kernel_dependencies_count;
+  const char **kernel_dependencies;
+};
+
+/* Data passed by the static initializer of a compilation unit containing BRIG
+   to GOMP_offload_register.  */
+
+struct brig_image_desc
+{
+  hsa_ext_module_t brig_module;
+  const unsigned kernel_count;
+  struct hsa_kernel_description *kernel_infos;
+};
+
+struct agent_info;
+
+/* Information required to identify, finalize and run any given kernel.  */
+
+struct kernel_info
+{
+  /* Name of the kernel, required to locate it within the brig module.  */
+  const char *name;
+  /* Size of memory space for OMP data.  */
+  unsigned omp_data_size;
+  /* The specific agent the kernel has been or will be finalized for and run
+ on.  */
+  struct agent_info *agent;
+  /* The specific module where the kernel takes place.  */
+  struct module_info *module;
+  /* Mutex enforcing that at most one thread ever initializes a kernel for
+ use.  A thread should have locked agent->modules_rwlock for readi

[hsa 4/12] OpenMP lowering/expansion changes (gridification)

2015-11-05 Thread Martin Jambor
Hi,

the patch in this email contains the changes to make our OpenMP
lowering and expansion machinery produce GPU kernels for a certain
limited class of loops.  The plan is to make that class quite a bit
bigger, but only the following is ready for submission now.

Basically, whenever the compiler configured for HSAIL generation
encounters the following pattern:

  #pragma omp target
  #pragma omp teams thread_limit(workgroup_size) // thread_limit is optional
  #pragma omp distribute parallel for firstprivate(n) private(i) 
other_sharing_clauses()
for (i = 0; i < n; i++)
  some_loop_body

it creates a copy of the entire target body and expands it slightly
differently for concurrent execution on a GPU.  Note that both teams
and distribute constructs are mandatory.  Moreover, currently the
distribute has to be in a combined statement with the inner for
construct.  And there are quite a few other restrictions which I hope
to alleviate over the next year, most notably implement reductions.  A
few days ago I hoped to finish writing support for collapse(2) and
collapse(3) clauses in time for stage1 but now I am a bit sceptical.

The first phase of the "gridification" process is run before omp
"scanning" phase.  We look for the pattern above, and if we encounter
one, we copy its entire body into a new gimple statement
GIMPLE_OMP_GPUKERNEL.  Within it, we mark the teams, distribute and
parallel constructs with a new flag "kernel_phony."  This flag will
then make OMP lowering phase process their sharing clauses like usual,
but the statements representing the constructs will be removed at
lowering (and thus will never be expanded).  The resulting wasteful
repackaging of data is nicely cleaned up by our optimizers even at -O1.

At expansion time, we identify gomp_target statements with a kernel
and expand the kernel into a special function, with the loop
represented by the GPU grid and not control flow.  Afterwards, the
normal body of the target is expanded as usual.  Finally, we need to
take the grid dimensions stored within new fields of the target
statement by the first phase, store in a structure and pass them to
libgomp in a new parameter of GOMP_target_41.

Originally, when I started with the above pattern matching, I did not
allow any other gimple statements in between the respective omp
constructs.  That however proved to be too restrictive for two
reasons.  First, statements in pre-bodies of both distribute and for
loops needed to be accounted for when calculating the kernel grid size
(which is done before the target statement itself) and second, Fortran
parameter dereferences happily result in interleaving statements even
when there were none in the user source code.

Therefore, I allow register-type stores to local non-addressable
variables in pre-bodies and also in between the OMP constructs.  All
of them are copied in front of the target statement and either used
for grid size calculation or removed as useless by later
optimizations.

For convenience of anybody reviewing the code, I'm attaching a very
simple testcase with selection of dumps that illustrate the whole
process.

While we have also been experimenting quite a bit with dynamic
parallelism, we have only been able to achieve any good performance
via this process of gridification.  The user can be notified whether a
particular target construct was gridified or not via our process of
dumping notes, which however only appear in the detailed dump.  I am
seriously considering emitting some kind of warning, when HSA-enabled
compiler is about to produce a non-gridified target code.

I hope that I have managed to write the gridification in a way that
interferes very little with the rest of the OMP pipeline and that
re-implements only the bare minimum of functionality that is already
there.  I'll be grateful for any feedback regarding the
approach.

Thanks,

Martin


2015-11-05  Martin Jambor  

* builtin-types.def (BT_FN_VOID_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
* fortran/types.def (BT_FN_VOID_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
* gimple-low.c (lower_stmt): Handle GIMPLE_OMP_GPUKERNEL.
* gimple-pretty-print.c (dump_gimple_omp_for): Likewise.
(dump_gimple_omp_block): Handle GF_OMP_FOR_KIND_KERNEL_BODY.
(pp_gimple_stmt_1): Handle GIMPLE_OMP_GPUKERNEL.
* gimple-walk.c (walk_gimple_stmt): Likewise.
* gimple.c (gimple_build_omp_gpukernel): New function.
(gimple_omp_target_init_dimensions): Likewise.
(gimple_copy): Handle GIMPLE_OMP_GPUKERNEL.
* gimple.def (GIMPLE_OMP_TEAMS): Moved into its own layout.
(GIMPLE_OMP_GPUKERNEL): New.
* gimple.h (gf_mask): New element GF_OMP_FOR_KIND_KERNEL_BODY.
(gomp_for): New 
