Re: Fix more of C/fortran canonical type issues

2015-10-12 Thread Jan Hubicka
> Honza,
> > this is a variant of the patch I committed (adding the suggested predicate)
> 
> This caused https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67923

Hmm, strange, I do not seem to be able to reproduce this locally. Is it x86?

/opt/gcc/_clean/gcc/testsuite/gfortran.dg/pr56015.f90:12:0: error: type 
mismatch in pointer plus expression
   subroutine foo (p)
^
complex double[10] *

complex double[10] * restrict

long int

_85 = p_5(D) + 32;

I suppose the complaint is about "long int".  Did you possibly revert the change
to skip TYPE_CANONICAL testing in useless_type_conversion? That would declare
"long int" to be the same as "unsigned long int".

Honza


Re: Fix more of C/fortran canonical type issues

2015-10-12 Thread Richard Biener
On Mon, 12 Oct 2015, Jan Hubicka wrote:

> > Honza,
> > > this is a variant of the patch I committed (adding the suggested predicate)
> > 
> > This caused https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67923
> 
> Hmm, strange, I do not seem to be able to reproduce this locally. Is it x86?

Can't reproduce it either.

Richard.

> /opt/gcc/_clean/gcc/testsuite/gfortran.dg/pr56015.f90:12:0: error: type 
> mismatch in pointer plus expression
>subroutine foo (p)
> ^
> complex double[10] *
> 
> complex double[10] * restrict
> 
> long int
> 
> _85 = p_5(D) + 32;
> 
> I suppose the complaint is about "long int".  Did you possibly revert the 
> change
> to skip TYPE_CANONICAL testing in useless_type_conversion? That would declare
> "long int" to be the same as "unsigned long int".
> 
> Honza


Re: [C++ PATCH] PR c++/58566

2015-10-12 Thread Jason Merrill

OK.

Jason


Re: [patch] header file re-ordering.

2015-10-12 Thread Jeff Law

On 10/08/2015 07:37 AM, Andrew MacLeod wrote:

On 10/07/2015 06:02 PM, Jeff Law wrote:

On 10/01/2015 08:33 PM, Andrew MacLeod wrote:

these are all in the main gcc directory. 297 files total.

Everything bootstraps on x86_64-pc-linux-gnu and
powerpc64le-unknown-linux-gnu.  All targets in config-list.mk still
build. Regressions tests also came up clean.

OK for trunk?

So as I look at this and make various spot checks, what really stands
out is how often something like alias.h gets included, often in places
that have absolutely no business/need to be looking at that file.
Cut-n-paste at its worst.  It happens to many others, but alias.h
seems to have gotten its grubby self into just about everywhere for
reasons unknown.

I find myself also wondering if a two step approach would make this
easier.  Step #1 being ordering the headers, step #2 being removal of
the duplicates.  As you note, the downside is two checkins that would
affect most files in the tree.  I guess I'll keep slogging through the
patch as is...

jeff

Here's the patch for reordered headers.  Building as we speak.  Hard to
fully verify since Ada doesn't seem to bootstrap on trunk at the moment:

+===GNAT BUG DETECTED==+
| 6.0.0 20151008 (experimental) (x86_64-pc-linux-gnu) GCC error:   |
| in gen_lowpart_common, at emit-rtl.c:1399|
| Error detected around s-regpat.adb:1029:22   |

<...>
raised TYPES.UNRECOVERABLE_ERROR : comperr.adb:423
../gcc-interface/Makefile:311: recipe for target 's-regpat.o' failed


However, the tool has been run, and I've made the minor adjustments
required to the source files to make it work (i.e., a few multi-line
comments, and the fact that mul-tables.c is generated on the tile* targets).

So this is what it should look like.  I used -cp.  Other languages are
bootstrapping, and I have yet to build all the targets... that'll just
take a day.  Be nice if Ada worked, though.

I can run the reduction tool over the weekend (it's a long weekend here
:-) on this if you want...  the other patch is a couple of weeks out of
date anyway now.
I find myself looking at the objc stuff and wondering if it was built. 
For example objc-act.c calls functions prototyped in fold-const.h, but 
that header is no longer included after your patch.


Similarly in objcp we remove tree.h from objcp-decl.c, but it uses TREE 
macros and I don't immediately see where those macros would be coming 
from if tree.h is no longer included.


In general, I'm worried about the objc/objcp stuff.  That in turn makes 
me wonder about the other stuff in a more general sense.  Regardless, I 
think I can take a pretty good stab at the config/ changes.



A pattern that seems to play out a lot in the target files is that they like
to include insn-config.h, insn-codes.h, & timevar.h.  I can see how 
those typically won't be needed.  The first two are amazingly common.  A 
comment in the nds32 port indicates that insn-config.h may have been 
needed by recog.h in the past.  nds32 actually included insn-config 
twice :-)



Interestingly enough, m32r, mcore & pdp11 still need insn-config.

The strangest thing I saw was rs6000 dropping an include of emit-rtl.h. 
 But presumably various powerpc targets were built, so I guess it's 
really not needed.


I'm slightly concerned about the darwin, windows and solaris bits.  The 
former primarily because Darwin has been a general source of pain, and 
in the others because I'm not sure the cross testing will exercise that 
code terribly much.


I'll go ahead and approve all the config/ bits.  Please be on the 
lookout for any fallout.


I'll try and get into more of the other patches tomorrow.

jeff



[PATCH] More vectorizer TLC

2015-10-12 Thread Richard Biener

Bootstrapped & tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-10-12  Richard Biener  

* tree-vect-loop.c (vect_analyze_loop_operations): Move cost
related code ...
(vect_analyze_loop_2): ... here.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 228644)
+++ gcc/tree-vect-loop.c(working copy)
@@ -1430,17 +1430,10 @@ vect_analyze_loop_operations (loop_vec_i
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
   int nbbs = loop->num_nodes;
-  unsigned int vectorization_factor;
   int i;
   stmt_vec_info stmt_info;
   bool need_to_vectorize = false;
-  int min_profitable_iters;
-  int min_scalar_loop_bound;
-  unsigned int th;
   bool ok;
-  HOST_WIDE_INT max_niter;
-  HOST_WIDE_INT estimated_niter;
-  int min_profitable_estimate;
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
@@ -1585,94 +1578,6 @@ vect_analyze_loop_operations (loop_vec_i
   return false;
 }
 
-  vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  gcc_assert (vectorization_factor != 0);
-
-  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
-"vectorization_factor = %d, niters = "
-HOST_WIDE_INT_PRINT_DEC "\n", vectorization_factor,
-LOOP_VINFO_INT_NITERS (loop_vinfo));
-
-  if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-   && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
-  || ((max_niter = max_stmt_executions_int (loop)) != -1
- && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not vectorized: iteration count too small.\n");
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not vectorized: iteration count smaller than "
-"vectorization factor.\n");
-  return false;
-}
-
-  /* Analyze cost.  Decide if worth while to vectorize.  */
-
-  vect_estimate_min_profitable_iters (loop_vinfo, &min_profitable_iters,
- &min_profitable_estimate);
-  LOOP_VINFO_COST_MODEL_MIN_ITERS (loop_vinfo) = min_profitable_iters;
-
-  if (min_profitable_iters < 0)
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not vectorized: vectorization not profitable.\n");
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not vectorized: vector version will never be "
-"profitable.\n");
-  return false;
-}
-
-  min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
-* vectorization_factor) - 1);
-
-
-  /* Use the cost model only if it is more conservative than user specified
- threshold.  */
-
-  th = (unsigned) min_scalar_loop_bound;
-  if (min_profitable_iters
-  && (!min_scalar_loop_bound
-  || min_profitable_iters > min_scalar_loop_bound))
-th = (unsigned) min_profitable_iters;
-
-  LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
-
-  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-  && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not vectorized: vectorization not profitable.\n");
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
-"not vectorized: iteration count smaller than user "
-"specified loop bound parameter or minimum profitable "
-"iterations (whichever is more conservative).\n");
-  return false;
-}
-
-  if ((estimated_niter = estimated_stmt_executions_int (loop)) != -1
-  && ((unsigned HOST_WIDE_INT) estimated_niter
-  <= MAX (th, (unsigned)min_profitable_estimate)))
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not vectorized: estimated iteration count too "
- "small.\n");
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
-"not vectorized: estimated iteration count smaller "
- "than specified loop bound parameter or minimum "
- "profitable iterations (whichever is more "
- "conservative).\n");
-  return false;
-}
-
   return true;
 }
 
@@ -1688,7 +1593,6 @@ vect_analyze_loop_2 (loop_vec_info loop_
   bool ok;
   int max_vf = MAX_VECTORIZATION_FACTOR;
   int min_vf = 2;
-  unsigned int th;

Re: [PATCH] hurd: align -p and -pg behavior on Linux

2015-10-12 Thread Thomas Schwinge
Hi Samuel!

On Sat, 19 Sep 2015 14:00:23 +0200, Samuel Thibault  
wrote:
> On Linux, -p and -pg do not make gcc link against libc_p.a, only
> -profile does (as documented in r11246), and thus people expect -p
> and -pg to work without libc_p.a installed (it is actually even not
> available any more in Debian).  We should thus rather make the Hurd port
> do the same to avoid build failures.

ACK.  Thanks, I'll take care of your patch.

>   * gcc/config/gnu.h (LIB_SPEC) [-p|-pg]: Link with -lc instead of -lc_p.
> * gcc/config/i386/gnu.h (STARTFILE_SPEC) [-p|-pg]: Use gcrt1.o
> instead of gcrt0.o.
> 
> --- gcc/config/gnu.h.orig 2015-09-16 00:43:09.785570853 +0200
> +++ gcc/config/gnu.h  2015-09-16 00:43:12.513550418 +0200
> @@ -25,7 +25,7 @@
>  
>  /* Default C library spec.  */
>  #undef LIB_SPEC
> -#define LIB_SPEC "%{pthread:-lpthread} %{pg|p|profile:-lc_p;:-lc}"
> +#define LIB_SPEC "%{pthread:-lpthread} %{profile:-lc_p;:-lc}"
>  
>  #undef GNU_USER_TARGET_OS_CPP_BUILTINS
>  #define GNU_USER_TARGET_OS_CPP_BUILTINS()\
> --- gcc/config/i386/gnu.h.orig2015-09-17 21:41:13.0 +
> +++ gcc/config/i386/gnu.h 2015-09-17 23:03:57.0 +
> @@ -27,11 +27,11 @@
>  #undef   STARTFILE_SPEC
>  #if defined HAVE_LD_PIE
>  #define STARTFILE_SPEC \
> -  "%{!shared: 
> %{pg|p|profile:gcrt0.o%s;pie:Scrt1.o%s;static:crt0.o%s;:crt1.o%s}} \
> +  "%{!shared: 
> %{pg|p:gcrt1.o%s;profile:gcrt0.o%s;pie:Scrt1.o%s;static:crt0.o%s;:crt1.o%s}} \
> crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
>  #else
>  #define STARTFILE_SPEC \
> -  "%{!shared: %{pg|p|profile:gcrt0.o%s;static:crt0.o%s;:crt1.o%s}} \
> +  "%{!shared: %{pg|p:gcrt1.o%s;profile:gcrt0.o%s;static:crt0.o%s;:crt1.o%s}} 
> \
> crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
>  #endif


Grüße,
 Thomas


signature.asc
Description: PGP signature


[PATCH, rs6000]: Use ROUND_UP and ROUND_DOWN macros

2015-10-12 Thread Uros Bizjak
Fairly trivial patch that introduces no functional changes.

2015-10-12  Uros Bizjak  

* config/rs6000/rs6000.h (RS6000_ALIGN): Implement using
ROUND_UP macro.
* config/rs6000/rs6000.c (rs6000_darwin64_record_arg_advance_flush):
Use ROUND_UP and ROUND_DOWN macros where applicable.
(rs6000_darwin64_record_arg_flush): Ditto.
(rs6000_function_arg): Use ROUND_UP to calculate align_words.
(rs6000_emit_probe_stack_range): Use ROUND_DOWN to calculate
rounded_size.

Tested by building a crosscompiler to powerpc64-linux-gnu.

OK for mainline?

Uros.
Index: config/rs6000/rs6000.c
===
--- config/rs6000/rs6000.c  (revision 228703)
+++ config/rs6000/rs6000.c  (working copy)
@@ -9790,12 +9790,12 @@ rs6000_darwin64_record_arg_advance_flush (CUMULATI
 e.g., in packed structs when there are 3 bytes to load.
 Back intoffset back to the beginning of the word in this
 case.  */
- intoffset = intoffset & -BITS_PER_WORD;
+ intoffset = ROUND_DOWN (intoffset, BITS_PER_WORD);
}
 }
 
-  startbit = intoffset & -BITS_PER_WORD;
-  endbit = (bitpos + BITS_PER_WORD - 1) & -BITS_PER_WORD;
+  startbit = ROUND_DOWN (intoffset, BITS_PER_WORD);
+  endbit = ROUND_UP (bitpos, BITS_PER_WORD);
   intregs = (endbit - startbit) / BITS_PER_WORD;
   cum->words += intregs;
   /* words should be unsigned. */
@@ -10255,15 +10255,15 @@ rs6000_darwin64_record_arg_flush (CUMULATIVE_ARGS
 e.g., in packed structs when there are 3 bytes to load.
 Back intoffset back to the beginning of the word in this
 case.  */
-intoffset = intoffset & -BITS_PER_WORD;
-mode = word_mode;
+ intoffset = ROUND_DOWN (intoffset, BITS_PER_WORD);
+ mode = word_mode;
}
 }
   else
 mode = word_mode;
 
-  startbit = intoffset & -BITS_PER_WORD;
-  endbit = (bitpos + BITS_PER_WORD - 1) & -BITS_PER_WORD;
+  startbit = ROUND_DOWN (intoffset, BITS_PER_WORD);
+  endbit = ROUND_UP (bitpos, BITS_PER_WORD);
   intregs = (endbit - startbit) / BITS_PER_WORD;
   this_regno = cum->words + intoffset / BITS_PER_WORD;
 
@@ -10622,7 +10622,7 @@ rs6000_function_arg (cumulative_args_t cum_v, mach
 save area?  */
   if (TARGET_64BIT && ! cum->prototype)
{
- int align_words = (cum->words + 1) & ~1;
+ int align_words = ROUND_UP (cum->words, 2);
  k = rs6000_psave_function_arg (mode, type, align_words, rvec);
}
 
@@ -23336,7 +23336,7 @@ rs6000_emit_probe_stack_range (HOST_WIDE_INT first
 
   /* Step 1: round SIZE to the previous multiple of the interval.  */
 
-  rounded_size = size & -PROBE_INTERVAL;
+  rounded_size = ROUND_DOWN (size, PROBE_INTERVAL);
 
 
   /* Step 2: compute initial and final value of the loop counter.  */
Index: config/rs6000/rs6000.h
===
--- config/rs6000/rs6000.h  (revision 228703)
+++ config/rs6000/rs6000.h  (working copy)
@@ -1615,7 +1615,7 @@ extern enum reg_class rs6000_constraints[RS6000_CO
   ((DEFAULT_ABI == ABI_ELFv2 ? 12 : 20) << (TARGET_64BIT ? 1 : 0))
 
 /* Align an address */
-#define RS6000_ALIGN(n,a) (((n) + (a) - 1) & ~((a) - 1))
+#define RS6000_ALIGN(n,a) ROUND_UP ((n), (a))
 
 /* Offset within stack frame to start allocating local variables at.
If FRAME_GROWS_DOWNWARD, this is the offset to the END of the


Re: [PATCH] Fix PR67783, quadraticness in IPA inline analysis

2015-10-12 Thread Richard Biener
On Sun, 11 Oct 2015, Dominique d'Humières wrote:

> > It seems there was regression on fatigue/fatigue2 
> > http://gcc.opensuse.org/c++bench/pb11/ Fatigue
> > was one of the reasons to introduce the heuristics, so it may be related to the 
> > patch :( 
> 
> The test in pr 64099 comment 14 now requires -fwhole-program to inline the 
> subroutine perdida:
> 
> [Book15] lin/test% /opt/gcc/gcc6p-228566/bin/gfortran fatigue_v1nn2.f90 -Ofast
> [Book15] lin/test% time a.out > /dev/null
> 16.266u 0.003s 0:16.27 99.9%  0+0k 0+0io 36pf+0w
> [Book15] lin/test% /opt/gcc/gcc6p-228566/bin/gfortran fatigue_v1nn2.f90 
> -Ofast -fwhole-program
> [Book15] lin/test% time a.out > /dev/null
> 6.179u 0.001s 0:06.18 99.8%   0+0k 0+0io 0pf+0w

Ok, so for loops like

:
# S.255_53 = PHI <1(134), S.255_598(136)>
if (S.255_53 > 3)
  goto ;
else
  goto ;

:
_590 = S.255_53 * iftmp.446_62;
_591 = _587 + _590;
_592 = *back_stress_tensor.0_140[_591];
_593 = S.255_53 + _589;
_594 = back_stress_rate_tensor[_593];
_595 = plastic_time_step_521 * _594;
_596 = _592 + _595;
*back_stress_tensor.0_140[_591] = _596;
S.255_598 = S.255_53 + 1;
goto ;

we analyze _591 and see that it is {iftmp.446_62, +, iftmp.446_62}
(missed strength-reduction in early opts).  And that is defined by
a stride load from the array descriptor with the usual
if (stride == 0) stride = 1 added.  With strength reduction
performed we'd see an IV in the loop header, without we have to
do the more expensive work.

I'm testing the following (solves the fatigue regression for me).

Richard.

2015-10-12  Richard Biener  

PR ipa/64099
* ipa-inline-analysis.c (estimate_function_body_sizes): Re-add
code that analyzes IVs on each stmt but in a cheaper way avoiding
quadratic behavior.

Index: gcc/ipa-inline-analysis.c
===
*** gcc/ipa-inline-analysis.c   (revision 228703)
--- gcc/ipa-inline-analysis.c   (working copy)
*** estimate_function_body_sizes (struct cgr
*** 2786,2822 
  &will_be_nonconstant);
}
  exits.release ();
  
! for (gphi_iterator gsi = gsi_start_phis (loop->header);
!  !gsi_end_p (gsi); gsi_next (&gsi))
{
! gphi *phi = gsi.phi ();
! tree use = gimple_phi_result (phi);
! affine_iv iv;
! predicate will_be_nonconstant;
! if (virtual_operand_p (use)
! || !simple_iv (loop, loop, use, &iv, true)
! || is_gimple_min_invariant (iv.step))
!   continue;
! will_be_nonconstant
!   = will_be_nonconstant_expr_predicate (fbi.info, info,
! iv.step,
! nonconstant_names);
! if (!true_predicate_p (&will_be_nonconstant))
!   will_be_nonconstant = and_predicates (info->conds,
! &bb_predicate,
! &will_be_nonconstant);
! if (!true_predicate_p (&will_be_nonconstant)
! && !false_predicate_p (&will_be_nonconstant))
!   /* This is slightly inprecise.  We may want to represent
!  each loop with independent predicate.  */
!   loop_stride = and_predicates (info->conds, &loop_stride,
! &will_be_nonconstant);
}
}
set_hint_predicate (&inline_summaries->get (node)->loop_iterations,
  loop_iterations);
!   set_hint_predicate (&inline_summaries->get (node)->loop_stride, 
loop_stride);
scev_finalize ();
  }
FOR_ALL_BB_FN (bb, my_function)
--- 2786,2845 
  &will_be_nonconstant);
}
  exits.release ();
+   }
  
!   /* To avoid quadratic behavior we analyze stride predicates only
!  with respect to the containing loop.  Thus we simply iterate
!over all defs in the outermost loop body.  */
!   for (loop = loops_for_fn (cfun)->tree_root->inner;
!  loop != NULL; loop = loop->next)
!   {
! basic_block *body = get_loop_body (loop);
! for (unsigned i = 0; i < loop->num_nodes; i++)
{
! gimple_stmt_iterator gsi;
! bb_predicate = *(struct predicate *) body[i]->aux;
! for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);
!  gsi_next (&gsi))
!   {
! gimple *stmt = gsi_stmt (gsi);
! 
! if (!is_gimple_assign (stmt))
!   continue;
! 
! tree def = gimple_assign_lhs (stmt);
! if (TREE_CODE (def) != SSA_NAME)
!   continue;
! 
! affine_iv iv;
! if (!simple_iv (loop_containing_stmt (stmt),
!

Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc

2015-10-12 Thread Ulrich Weigand
Hans-Peter Nilsson wrote:
> > > So, ISTM we should change --with-headers (=yes) to either look
> > > in sys-include or in include.  Setting it to sys-include
> > > wouldn't help you or anyone else as it's already the default...
> > 
> > On the other hand, the current docs appear to imply that the
> > intent was for --with-headers (=yes) to look into a pre-existing
> > sys-include directory for headers.
> 
> Right.  So, if you'd prefer to align the implementation with
> that, I don't mind.  But, these are odd cases as-is, so current
> use and users matter when aligning the documentation and
> implementation and I wouldn't be surprised if the entire
> usage-space is between ours...

So overall, I now think this is probably the best way forward:

1) Fix --with-headers to work just like no argument does now,
   i.e. look in sys-include.

This is a simple bugfix that brings behavior in line with
documentation.  It does imply that everybody has to use
sys-include, but that seems to be accepted practice anyway.
(For me, it just means adding a symlink.)

If at some point we do want to make things work without sys-include,
I see two options:

2a) Change  to not look into sys-include, but include.

This would be a change in existing behavior that would affect some
users.  (They could get the old behavior back by simply adding
--with-headers to their configure line, however.)

--or--

2b) Change target_header_dir from a single directory to a list of
directories, and check all of these for header files.  This list
would typically include both sys-include and include.

This should not change behavior for any existing user, and would
bring the header search at configure time in line with the actual
search order used by the compiler at run time, which will probably
be the least surprise to users anyway ...


For 1), something like the following should probably suffice:

Index: gcc/configure.ac
===
--- gcc/configure.ac(revision 228530)
+++ gcc/configure.ac(working copy)
@@ -1993,7 +1993,7 @@ elif test "x$TARGET_SYSTEM_ROOT" != x; t
 fi
 
 if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
-  if test "x$with_headers" != x; then
+  if test "x$with_headers" != x && test "x$with_headers" != xyes; then
 target_header_dir=$with_headers
   elif test "x$with_sysroot" = x; then
 target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"


I'll probably not spend any more time right now to try to implement
either of the 2) variants; I can live with using sys-include for now.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Move sqrt and cbrt simplifications to match.pd

2015-10-12 Thread Richard Biener
On Fri, Oct 9, 2015 at 6:17 PM, Richard Sandiford
 wrote:
> Richard Sandiford  writes:
>> Christophe Lyon  writes:
>>> On 8 October 2015 at 18:55, Richard Sandiford
>>>  wrote:
 Marc Glisse  writes:
> On Mon, 5 Oct 2015, Richard Sandiford wrote:
>
>> +  /* cbrt(sqrt(x)) -> pow(x,1/6).  */
>> +  (simplify
>> +   (sqrts (cbrts @0))
>> +   (pows @0 { build_real_truncate (type, dconst<1, 6> ()); }))
>> +  /* sqrt(cbrt(x)) -> pow(x,1/6).  */
>> +  (simplify
>> +   (cbrts (sqrts @0))
>> +   (pows @0 { build_real_truncate (type, dconst<1, 6> ()); }))
>
> I think you swapped the comments (not that it matters).

 Thanks, fixed in the committed version.

 Richard

>>> Hi Richard,
>>>
>>> Since you committed this patch, I've noticed that gcc.dg/builtins-10.c fails
>>> on arm-none-linux-gnueabi targets (as opposed to arm-none-linux-gnueabihf).
>>>
>>> gcc.log shows:
>>> /cchfHDHc.o: In function `test':
>>> builtins-10.c:(.text+0x60): undefined reference to `link_error'
>>> collect2: error: ld returned 1 exit status
>>
>> Looks like this is the same fold_strip_sign_ops problem that I was seeing
>> with some WIP follow-on patches.  We don't fold pow(abs(x), 4) to pow(x, 4).
>
> Here's the patch I'm testing.

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * real.h (real_isinteger): Declare.
> * real.c (real_isinteger): New function.
> * match.pd: Simplify pow(|x|,y) and pow(-x,y) to pow(x,y)
> if y is an even integer.
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index b87c436..67f9d54 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -309,12 +309,19 @@ along with GCC; see the file COPYING3.  If not see
> && TYPE_OVERFLOW_UNDEFINED (type))
> @0)))
>
> -/* Simplify cos (-x) -> cos (x).  */
>  (for op (negate abs)
> -(for coss (COS COSH)
> - (simplify
> -  (coss (op @0))
> -   (coss @0
> + /* Simplify cos(-x) and cos(|x|) -> cos(x).  Similarly for cosh.  */
> + (for coss (COS COSH)
> +  (simplify
> +   (coss (op @0))
> +(coss @0)))
> + /* Simplify pow(-x, y) and pow(|x|,y) -> pow(x,y) if y is an even integer.  
> */
> + (for pows (POW)
> +  (simplify
> +   (pows (op @0) REAL_CST@1)
> +   (with { HOST_WIDE_INT n; }
> +(if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
> + (pows @0 @1))
>
>  /* X % Y is smaller than Y.  */
>  (for cmp (lt ge)
> diff --git a/gcc/real.c b/gcc/real.c
> index f633ffd..85ac83d 100644
> --- a/gcc/real.c
> +++ b/gcc/real.c
> @@ -4997,6 +4997,24 @@ real_isinteger (const REAL_VALUE_TYPE *c, machine_mode 
> mode)
>return real_identical (c, &cint);
>  }
>
> +/* Check whether C is an integer that fits in a HOST_WIDE_INT,
> +   storing it in *INT_OUT if so.  */
> +
> +bool
> +real_isinteger (const REAL_VALUE_TYPE *c, HOST_WIDE_INT *int_out)
> +{
> +  REAL_VALUE_TYPE cint;
> +
> +  HOST_WIDE_INT n = real_to_integer (c);
> +  real_from_integer (&cint, VOIDmode, n, SIGNED);
> +  if (real_identical (c, &cint))
> +{
> +  *int_out = n;
> +  return true;
> +}
> +  return false;
> +}
> +
>  /* Write into BUF the maximum representable finite floating-point
> number, (1 - b**-p) * b**emax for a given FP format FMT as a hex
> float string.  LEN is the size of BUF, and the buffer must be large
> diff --git a/gcc/real.h b/gcc/real.h
> index 706859b..e65b526 100644
> --- a/gcc/real.h
> +++ b/gcc/real.h
> @@ -467,7 +467,8 @@ extern void real_round (REAL_VALUE_TYPE *, machine_mode,
>  extern void real_copysign (REAL_VALUE_TYPE *, const REAL_VALUE_TYPE *);
>
>  /* Check whether the real constant value given is an integer.  */
> -extern bool real_isinteger (const REAL_VALUE_TYPE *c, machine_mode mode);
> +extern bool real_isinteger (const REAL_VALUE_TYPE *, machine_mode);
> +extern bool real_isinteger (const REAL_VALUE_TYPE *, HOST_WIDE_INT *);
>
>  /* Write into BUF the maximum representable finite floating-point
> number, (1 - b**-p) * b**emax for a given FP format FMT as a hex
>


Re: [PATCH] Fix parloops gimple_uid usage

2015-10-12 Thread Richard Biener
On Fri, Oct 9, 2015 at 11:09 PM, Tom de Vries  wrote:
> Hi,
>
> In tree-parloops.c:gather_scalar_reductions, we find the comment:
> ...
>   /* As gimple_uid is used by the vectorizer in between
>  vect_analyze_loop_form and destroy_loop_vec_info, we can set
>  gimple_uid of reduc_phi stmts only now.  */
>   reduction_list->traverse  (NULL);
> ...
>
> However, the usage of gimple_uid seems to extend until the
> free_stmt_vec_info_vec call at the end of parallelize_loops (the pass
> top-level function). During free_stmt_vec_info_vec we test for gimple_uid ==
> 0 in vinfo_for_stmt.
>
> By initializing all the phis in the function with -1 before using them in
> the reduct_phi stmts:
> ...
>destroy_loop_vec_info (simple_loop_info, true);
>destroy_loop_vec_info (simple_inner_loop_info, true);
>
>
>
>/* As gimple_uid is used by the vectorizer in between
>   vect_analyze_loop_form and destroy_loop_vec_info, we can set
>   gimple_uid of reduc_phi stmts only now. */
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +  gimple_set_uid (gsi_stmt (gsi), (unsigned int)-1);
>  reduction_list->traverse  (NULL);
> ...
> we trigger a sigsegv in vinfo_for_stmt while trying to access
> stmt_vec_info_vec[4294967295 - 1].
>
> This patch fixes that by moving the calls to init_stmt_vec_info_vec and
> free_stmt_vec_info_vec from parallelize_loops and gather_scalar_reductions.
>
> Furthermore, now that the gimple_uids are properly initialized, we can in
> reduction_phi:
> - handle 0 (new phi) and -1 (initialized) values, both meaning the
>   phi's not in the table, and
> - assert that returned entries in fact match the phi argument.
>
> OK for trunk if bootstrap and reg-test passes?

Ok.

Richard.

> Thanks,
> - Tom
>


Re: [PATCH 8/9] Add TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID

2015-10-12 Thread Richard Biener
On Thu, Oct 8, 2015 at 11:10 PM, Richard Henderson  wrote:
> On 10/08/2015 09:20 PM, Richard Biener wrote:
>>
>> On Thu, Oct 8, 2015 at 6:59 AM, Richard Henderson  wrote:
>>>
>>>  * target.def (TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID): New.
>>>  * targhooks.h (default_addr_space_zero_address_valid): Declare.
>>>  * targhooks.c (default_addr_space_zero_address_valid): New.
>>>  * doc/tm.texi, doc/tm.texi.in: Update.
>>>  * config/i386/i386.c (ix86_addr_space_zero_address_valid): New.
>>>  (TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID): New.
>>>  * fold-const.c (const_unop) [ADDR_SPACE_CONVERT_EXPR]: Disable
>>>  folding of 0 when it is a valid address for the source space.
>>>  * gimple.c (check_loadstore): Disable noticing dereference when
>>>  0 is a valid address for the space.
>>
>>
>> I think this is incomplete and you need to look at all places that check
>> for
>> flag_delete_null_pointer_checks (ick, the ubsan abuse looks
>> interesting...).
>> We'd best abstract that flag check somewhere, only doing the address-space
>> check for ! ADDR_SPACE_GENERIC.
>
>
> I did a fair survey of the uses of f_d_n_p_c.  Most of them are tests for
> explicit symbols that are weak, etc.  I suppose we should probably then
> check to see if the symbol is placed in a non-default address space, but in
> the context of the code I was working on, that never comes up.

One I know about is tree-ssa-structalias.c:get_constraint_for_1 which treats
zero address specially.  A zero_address_valid_p predicate taking a
pointer (type)
would be nice to have to abstract those flag checks appropriately.

>> I also wonder about the const_unop change - if the target address-space
>> has a valid 0 but the source has not then you create a valid object
>> address
>> from an invalid one?
>
>
> I guess we would, but ... what else can you do when there's no invalid
> address?

True ... maybe a less likely valid one?  (0xfff...ff?)

>> The check_loadstore change should instead have adjusted the
>> flag_delete_null_pointer_checks guard in
>> infer_nonnull_range_by_dereference.
>
>
> Nope, that doesn't work.  You have to wait until you see the actual MEM
> being dereferenced before you can look at it's address space.

Well, as we are explicitely looking for the pointer 'op' we know the
address-space
beforehand, no?  TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (op)))?

Richard.

>
> r~


Re: [PATCH 2/6] always define SETUP_FRAME_ADDRESSES

2015-10-12 Thread Bernd Schmidt

On 10/11/2015 02:25 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2015-10-10  Trevor Saunders  

* defaults.h (SETUP_FRAME_ADDRESSES): New default definition.
* builtins.c (expand_builtin_return_addr): Adjust.
* doc/tm.texi: Likewise.
* doc/tm.texi.in: Likewise.
* except.c (expand_builtin_unwind_init): Likewise.


If we go to the trouble of changing this, could we convert macros to 
target hooks instead while we're there? REVERSE_CONDITION, 
SETUP_FRAME_ADDRESSES and FRAME_ADDR_RTX all seem to be used only in a 
handful of ports, and INITIAL_FRAME_ADDRESS_RTX only in one.



Bernd


Re: [PR67891] drop is_gimple_reg test from set_parm_rtl

2015-10-12 Thread Richard Biener
On Sat, Oct 10, 2015 at 3:16 PM, Alexandre Oliva  wrote:
> On Oct  9, 2015, Richard Biener  wrote:
>
>> Ok.  Note that I think emit_block_move shouldn't mess with the addressable 
>> flag.
>
> I have successfully tested a patch that stops it from doing so,
> reverting https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49429#c11 but
> according to bugs 49429 and 49454, it looks like removing it would mess
> with escape analysis introduced in r175063 for bug 44194.  The thread
> that introduces the mark_addressable calls suggests some discomfort with
> this solution, and even a suggestion that the markings should be
> deferred past the end of expand, but in the end there was agreement to
> go with it.  https://gcc.gnu.org/ml/gcc-patches/2011-06/msg01746.html

Aww, indeed.  Of course the issue is that we don't track pointers to the
stack introduced during RTL properly.

> I'm leaving it alone, since I can't reasonably test on the platforms
> where the problems showed up.

Yeah.

Thanks for checking.  Might want to add a comment before that
addressable setting now that you've done the archeology.

Richard.

>
> --
> Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/   FSF Latin America board member
> Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: Move some bit and binary optimizations in simplify and match

2015-10-12 Thread Hurugalawadi, Naveen
Hi Richard,

Thanks for your review and useful comments.
I will move the future optimization patterns with all the conditions
present in fold-const or builtins file as per your suggestions.

Please find attached the patch as per your comments.
Please review the patch and let me know if any further modifications 
are required.

The last pattern has been removed due to the discussions over it
and a regression it caused.

+/* Fold X & (X ^ Y) as X & ~Y.  */
+(simplify
+ (bit_and:c (convert? @0) (convert? (bit_xor:c @0 @1)))
+  (bit_and (convert @0) (convert (bit_not @1


FAIL: gcc.dg/tree-ssa/vrp47.c scan-tree-dump-times vrp2 " & 1;" 0
FAIL: gcc.dg/tree-ssa/vrp59.c scan-tree-dump-not vrp1 " & 3;"
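For what it's worth, the identity behind that pattern is easy to confirm exhaustively for a narrow type; the FAILs above are pipeline regressions, not a correctness problem with the fold itself. A minimal standalone check:

```c
#include <assert.h>
#include <stdint.h>

/* Exhaustively verify X & (X ^ Y) == X & ~Y over all 8-bit values.  */
static int
and_xor_identity_holds (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      {
        uint8_t lhs = (uint8_t) (x & (x ^ y));
        uint8_t rhs = (uint8_t) (x & ~y);
        if (lhs != rhs)
          return 0;
      }
  return 1;
}
```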

Thanks,
Naveen

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 5d8822f..8889c39 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9192,26 +9192,6 @@ fold_binary_loc (location_t loc,
   return NULL_TREE;
 
 case PLUS_EXPR:
-  if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
-	{
-	  /* X + (X / CST) * -CST is X % CST.  */
-	  if (TREE_CODE (arg1) == MULT_EXPR
-	  && TREE_CODE (TREE_OPERAND (arg1, 0)) == TRUNC_DIV_EXPR
-	  && operand_equal_p (arg0,
-  TREE_OPERAND (TREE_OPERAND (arg1, 0), 0), 0))
-	{
-	  tree cst0 = TREE_OPERAND (TREE_OPERAND (arg1, 0), 1);
-	  tree cst1 = TREE_OPERAND (arg1, 1);
-	  tree sum = fold_binary_loc (loc, PLUS_EXPR, TREE_TYPE (cst1),
-  cst1, cst0);
-	  if (sum && integer_zerop (sum))
-		return fold_convert_loc (loc, type,
-	 fold_build2_loc (loc, TRUNC_MOD_EXPR,
-		  TREE_TYPE (arg0), arg0,
-		  cst0));
-	}
-	}
-
   /* Handle (A1 * C1) + (A2 * C2) with A1, A2 or C1, C2 being the same or
 	 one.  Make sure the type is not saturating and has the signedness of
 	 the stripped operands, as fold_plusminus_mult_expr will re-associate.
@@ -9652,28 +9632,6 @@ fold_binary_loc (location_t loc,
 			fold_convert_loc (loc, type,
 	  TREE_OPERAND (arg0, 0)));
 
-  if (! FLOAT_TYPE_P (type))
-	{
-	  /* Fold (A & ~B) - (A & B) into (A ^ B) - B, where B is
-	 any power of 2 minus 1.  */
-	  if (TREE_CODE (arg0) == BIT_AND_EXPR
-	  && TREE_CODE (arg1) == BIT_AND_EXPR
-	  && operand_equal_p (TREE_OPERAND (arg0, 0),
-  TREE_OPERAND (arg1, 0), 0))
-	{
-	  tree mask0 = TREE_OPERAND (arg0, 1);
-	  tree mask1 = TREE_OPERAND (arg1, 1);
-	  tree tem = fold_build1_loc (loc, BIT_NOT_EXPR, type, mask0);
-
-	  if (operand_equal_p (tem, mask1, 0))
-		{
-		  tem = fold_build2_loc (loc, BIT_XOR_EXPR, type,
- TREE_OPERAND (arg0, 0), mask1);
-		  return fold_build2_loc (loc, MINUS_EXPR, type, tem, mask1);
-		}
-	}
-	}
-
   /* Fold __complex__ ( x, 0 ) - __complex__ ( 0, y ) to
 	 __complex__ ( x, -y ).  This is not the same for SNaNs or if
 	 signed zeros are involved.  */
@@ -9763,20 +9721,6 @@ fold_binary_loc (location_t loc,
   goto associate;
 
 case MULT_EXPR:
-  /* (-A) * (-B) -> A * B  */
-  if (TREE_CODE (arg0) == NEGATE_EXPR && negate_expr_p (arg1))
-	return fold_build2_loc (loc, MULT_EXPR, type,
-			fold_convert_loc (loc, type,
-	  TREE_OPERAND (arg0, 0)),
-			fold_convert_loc (loc, type,
-	  negate_expr (arg1)));
-  if (TREE_CODE (arg1) == NEGATE_EXPR && negate_expr_p (arg0))
-	return fold_build2_loc (loc, MULT_EXPR, type,
-			fold_convert_loc (loc, type,
-	  negate_expr (arg0)),
-			fold_convert_loc (loc, type,
-	  TREE_OPERAND (arg1, 0)));
-
   if (! FLOAT_TYPE_P (type))
 	{
 	  /* Transform x * -C into -x * C if x is easily negatable.  */
@@ -9790,16 +9734,6 @@ fold_binary_loc (location_t loc,
 		  negate_expr (arg0)),
 tem);
 
-	  /* (a * (1 << b)) is (a << b)  */
-	  if (TREE_CODE (arg1) == LSHIFT_EXPR
-	  && integer_onep (TREE_OPERAND (arg1, 0)))
-	return fold_build2_loc (loc, LSHIFT_EXPR, type, op0,
-TREE_OPERAND (arg1, 1));
-	  if (TREE_CODE (arg0) == LSHIFT_EXPR
-	  && integer_onep (TREE_OPERAND (arg0, 0)))
-	return fold_build2_loc (loc, LSHIFT_EXPR, type, op1,
-TREE_OPERAND (arg0, 1));
-
 	  /* (A + A) * C -> A * 2 * C  */
 	  if (TREE_CODE (arg0) == PLUS_EXPR
 	  && TREE_CODE (arg1) == INTEGER_CST
@@ -9842,21 +9776,6 @@ fold_binary_loc (location_t loc,
 	}
   else
 	{
-	  /* Convert (C1/X)*C2 into (C1*C2)/X.  This transformation may change
- the result for floating point types due to rounding so it is applied
- only if -fassociative-math was specify.  */
-	  if (flag_associative_math
-	  && TREE_CODE (arg0) == RDIV_EXPR
-	  && TREE_CODE (arg1) == REAL_CST
-	  && TREE_CODE (TREE_OPERAND (arg0, 0)) == REAL_CST)
-	{
-	  tree tem = const_binop (MULT_EXPR, TREE_OPERAND (arg0, 0),
-  arg1);
-	  if (tem)
-		return fold_build2_loc (loc, RDIV_EXPR, type, tem,
-TREE_OPERAN

Re: [[Boolean Vector, patch 5/5] Support boolean vectors in vector lowering

2015-10-12 Thread Alan Lawrence

On 09/10/15 22:01, Jeff Law wrote:


So my question for the series as a whole is whether or not we need to do
something for the other languages, particularly Fortran.  I was a bit
surprised to see this stuff bleed into the C/C++ front-ends and
obviously wonder if it's bled into Fortran, Ada, Java, etc.


Isn't that just because we have GNU extensions to C/C++ for vectors? I admit I
don't know enough Ada/Fortran to know whether we've added GNU extensions to
those languages as well...


A.



Re: [PATCH] gcc/ira.c: Check !HAVE_FP_INSTEAD_INSNS when frame pointer is needed and as global register

2015-10-12 Thread Bernd Schmidt

On 10/11/2015 05:16 PM, Chen Gang wrote:

For some architectures (e.g. bfin), when this case occurs, they will use
other instructions instead of the frame pointer (e.g. LINK for bfin), so
they can still generate correct output assembly code.


What is "this case"? I don't think you have explained the problem you 
are trying to solve.



2015-10-11  Chen Gang  

gcc/
* config.in: Add HAVE_FP_INSTEAD_INSNS.
* configure: Check HAVE_FP_INSTEAD_INSNS to set 0 or 1.


And of course, that should not be a configure check. If at all, use a 
target hook.



Bernd


Re: [PATCH 2/3] [ARM] PR63870 Mark lane indices of vldN/vstN with appropriate qualifier

2015-10-12 Thread Alan Lawrence

On 07/10/15 00:59, charles.bay...@linaro.org wrote:


diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 2667866..251afdc 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4261,8 +4261,9 @@ if (BYTES_BIG_ENDIAN)
  UNSPEC_VLD1_LANE))]
"TARGET_NEON"
  {
-  HOST_WIDE_INT lane = INTVAL (operands[3]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
+  operands[3] = GEN_INT (lane);
if (lane < 0 || lane >= max)
  error ("lane out of range");


I'm just wondering whether these 'lane out of range' error messages can ever be
triggered now that we have all the other checking? Can we now remove them
(perhaps in a follow-up patch)?


Cheers, Alan



Re: [gomp4, committed] Add goacc/kernels-acc-on-device.c

2015-10-12 Thread Thomas Schwinge
Hi Tom!

On Sat, 10 Oct 2015 12:49:01 +0200, Tom de Vries  wrote:
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-on-device.c
> @@ -0,0 +1,39 @@
> +/* { dg-additional-options "-O2" } */
> +
> +#include <openacc.h>

That doesn't work (at least in build-tree testing), as gcc/testsuite/ is
not set up to look for header files in [target]/libgomp/:


[...]/source-gcc/gcc/testsuite/c-c++-common/goacc/kernels-acc-on-device.c:3:21: 
fatal error: openacc.h: No such file or directory
compilation terminated.
compiler exited with status 1

> +
> +#define N 32
> +
> +void
> +foo (float *a, float *b)
> +{
> +  float exp;
> +  int i;
> +  int n;
> +
> +#pragma acc kernels copyin(a[0:N]) copyout(b[0:N])
> +  {
> +int ii;
> +
> +for (ii = 0; ii < N; ii++)
> +  {
> + if (acc_on_device (acc_device_host))

Your two options are: if that's applicable/sufficient for what you intend
to test here, use __builtin_acc_on_device with a hard-coded acc_device_*,
or duplicate part of <openacc.h> as done for example in
gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c.

> +   b[ii] = a[ii] + 1;
> + else
> +   b[ii] = a[ii];
> +  }
> +  }
> +
> +#pragma acc kernels copyin(a[0:N]) copyout(b[0:N])
> +  {
> +int ii;
> +
> +for (ii = 0; ii < N; ii++)
> +  {
> + if (acc_on_device (acc_device_host))
> +   b[ii] = a[ii] + 2;
> + else
> +   b[ii] = a[ii];
> +  }
> +  }
> +}


Grüße,
 Thomas




Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc

2015-10-12 Thread Hans-Peter Nilsson
> From: Ulrich Weigand 
> Date: Mon, 12 Oct 2015 11:58:40 +0200

(cutting *only* because I had a comment; not an indication of
preference.)

> --or--
> 
> 2b) Change target_header_dir from a single directory to a list of
> directories, and check all of these for header files.  This list
> would typically include both sys-include and include.
> 
> This should not change behavior for any existing user, and would
> bring the header search at configure time in line with the actual
> search order used by the compiler at run time, which will probably
> be the least surprise to users anyway ...

Agreed.  Just pointing out that it would take some effort in
gcc/configure.ac.

> For 1), something like the following should probably suffice:
> 
> Index: gcc/configure.ac
> ===
> --- gcc/configure.ac  (revision 228530)
> +++ gcc/configure.ac  (working copy)
> @@ -1993,7 +1993,7 @@ elif test "x$TARGET_SYSTEM_ROOT" != x; t
>  fi
>  
>  if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
> -  if test "x$with_headers" != x; then
> +  if test "x$with_headers" != x && test "x$with_headers" != xyes; then
>  target_header_dir=$with_headers
>elif test "x$with_sysroot" = x; then
>  
> target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"
> 
> 
> I'll probably not spend any more time right now to try to implement
> either of the 2) variants; I can live with using sys-include for now.

To be clear (to those skipping most of the thread), I'm ok with this.

Thanks.

brgds, H-P


Re: [PATCH 1/3] [ARM] PR63870 Add qualifiers for NEON builtins

2015-10-12 Thread Alan Lawrence

On 07/10/15 00:59, charles.bay...@linaro.org wrote:

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c

...

  case NEON_ARG_MEMORY:
  /* Check if expand failed.  */
  if (op[argc] == const0_rtx)
  {
-   va_end (ap);
return 0;
  }


...and drop the braces?


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 02f5dc3..448cde3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -30117,4 +30117,5 @@ arm_sched_fusion_priority (rtx_insn *insn, int max_pri,
*pri = tmp;
return;
  }
+
  #include "gt-arm.h"


This looks unrelated (and is the only change to arm.c) - perhaps commit 
separately? (Note I am not a maintainer! But this looks "obvious"...)



diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 87c9f90..27ac4dc 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -288,6 +288,9 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
  #define TARGET_BPABI false
  #endif

+#define ENDIAN_LANE_N(mode, n)  \
+  (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n)
+


Given we are making changes here to how this all works on bigendian, have you 
tested armeb at all?
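For anyone trying to follow the big-endian story, the remapping ENDIAN_LANE_N performs can be modelled in isolation (plain C, purely illustrative):

```c
#include <assert.h>

/* Standalone model of the proposed ENDIAN_LANE_N: on big-endian
   targets, architectural lane N of a vector with NUNITS lanes is
   stored at lane NUNITS - 1 - N; on little-endian the mapping is the
   identity.  */
static int
endian_lane_n (int bytes_big_endian, int nunits, int n)
{
  return bytes_big_endian ? nunits - 1 - n : n;
}
```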


Generally I would say this all looks sensible :)

Cheers, Alan



[PATCH][5] Backport ISL 0.15 support

2015-10-12 Thread Richard Biener

This backports the patch to allow bootstrapping with ISL 0.15 to the
GCC 5 branch (the GCC 4.9 branch will require backporting of some
dependencies).

Bootstrapped with ISL 0.15 (in-tree), ISL 0.14 and ISL 0.12 (both 
installed).

Committed to the branch.

Richard.

2015-10-12  Richard Biener  

Backport from mainline
2015-07-21  Mike Frysinger  
Bernhard Reutner-Fischer  

* configure.ac: Add check for new options in isl-0.15.
* config.in, configure: Rebuilt.
* graphite-blocking.c: Include 
* graphite-interchange.c,  graphite-poly.c: Likewise.
* graphite-scop-detection.c, graphite-sese-to-poly.c: Likewise.
* graphite.c: Likewise.
* graphite-isl-ast-to-gimple.c: Include  and
.
* graphite-dependences.c: Include .
(max_number_of_out_dimensions): Returns isl_stat.
(extend_schedule_1): Likewise
(extend_schedule): Corresponding changes.
* graphite-optimize-isl.c: Include  and
.
(getSingleMap): Change return type of isl_stat.
(optimize_isl): Conditionally use
isl_options_set_schedule_serialize_sccs.
* graphite-poly.h (isl_stat, isl_stat_ok): Define fallbacks
if not HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS.

Index: gcc/config.in
===
--- gcc/config.in   (revision 228597)
+++ gcc/config.in   (working copy)
@@ -1313,6 +1313,12 @@
 #endif
 
 
+/* Define if isl_options_set_schedule_serialize_sccs exists. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
+#endif
+
+
 /* Define if isl_schedule_constraints_compute_schedule exists. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_ISL_SCHED_CONSTRAINTS_COMPUTE_SCHEDULE
Index: gcc/configure
===
--- gcc/configure   (revision 228597)
+++ gcc/configure   (working copy)
@@ -28305,6 +28305,8 @@ fi
 
 # Check whether isl_schedule_constraints_compute_schedule is available;
 # it's new in ISL-0.13.
+# Check whether isl_options_set_schedule_serialize_sccs is available;
+# it's new in ISL-0.15.
 if test "x${ISLLIBS}" != "x" ; then
   saved_CFLAGS="$CFLAGS"
   CFLAGS="$CFLAGS $ISLINC"
@@ -28334,6 +28336,29 @@ rm -f core conftest.err conftest.$ac_obj
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$ac_has_isl_schedule_constraints_compute_schedule" >&5
 $as_echo "$ac_has_isl_schedule_constraints_compute_schedule" >&6; }
 
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking Checking for 
isl_options_set_schedule_serialize_sccs" >&5
+$as_echo_n "checking Checking for isl_options_set_schedule_serialize_sccs... " 
>&6; }
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include 
+int
+main ()
+{
+isl_options_set_schedule_serialize_sccs (NULL, 0);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_has_isl_options_set_schedule_serialize_sccs=yes
+else
+  ac_has_isl_options_set_schedule_serialize_sccs=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$ac_has_isl_options_set_schedule_serialize_sccs" >&5
+$as_echo "$ac_has_isl_options_set_schedule_serialize_sccs" >&6; }
+
   LIBS="$saved_LIBS"
   CFLAGS="$saved_CFLAGS"
 
@@ -28342,6 +28367,12 @@ $as_echo "$ac_has_isl_schedule_constrain
 $as_echo "#define HAVE_ISL_SCHED_CONSTRAINTS_COMPUTE_SCHEDULE 1" >>confdefs.h
 
   fi
+
+  if test x"$ac_has_isl_options_set_schedule_serialize_sccs" = x"yes"; then
+
+$as_echo "#define HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS 1" >>confdefs.h
+
+  fi
 fi
 
 # Check for plugin support
Index: gcc/configure.ac
===
--- gcc/configure.ac(revision 228597)
+++ gcc/configure.ac(working copy)
@@ -5746,6 +5746,8 @@ fi
 
 # Check whether isl_schedule_constraints_compute_schedule is available;
 # it's new in ISL-0.13.
+# Check whether isl_options_set_schedule_serialize_sccs is available;
+# it's new in ISL-0.15.
 if test "x${ISLLIBS}" != "x" ; then
   saved_CFLAGS="$CFLAGS"
   CFLAGS="$CFLAGS $ISLINC"
@@ -5759,6 +5761,13 @@ if test "x${ISLLIBS}" != "x" ; then
   [ac_has_isl_schedule_constraints_compute_schedule=no])
   AC_MSG_RESULT($ac_has_isl_schedule_constraints_compute_schedule)
 
+  AC_MSG_CHECKING([Checking for isl_options_set_schedule_serialize_sccs])
+  AC_TRY_LINK([#include ],
+  [isl_options_set_schedule_serialize_sccs (NULL, 0);],
+  [ac_has_isl_options_set_schedule_serialize_sccs=yes],
+  [ac_has_isl_options_set_schedule_serialize_sccs=no])
+  AC_MSG_RESULT($ac_has_isl_options_set_schedule_serialize_sccs)
+
   LIBS="$saved_LIBS"
   CFLAGS="$saved_CFLAGS"
 
@@ -5766,6 +5775,11 @@ if test "x${ISLLIBS}" != "x" ; then
  AC_DEFINE(HAVE_ISL_SCHED_CONSTRAINTS_COMPUTE_SCHEDULE, 1,
  

Re: [PATCH ARM]: PR67745: Fix function alignment after __attribute__ 2/2

2015-10-12 Thread Bernd Schmidt

On 10/12/2015 12:56 PM, Christian Bruel wrote:

Yes, I see. I was hoping to avoid a new hook, but as you said it seems
mandatory for the mere declaration case.

Here is one proposal: it defaults to nothing, and the ARM implementation
does not need to handle the vptr bit setting, so that simplifies things
a lot.

The hook is called from rest_of_decl_compilation for mere declarations
and allocate_struct_function for definitions.


This looks good to me. I still think we also want your vptr patch.


Bernd



[PATCH, aarch64]: Remove AARCH64_ROUND_UP and AARCH64_ROUND_DOWN defines

2015-10-12 Thread Uros Bizjak
Remove private definitions and use equivalent global macros instead.

2015-10-12  Uros Bizjak  

* config/aarch/aarch64.h (AARCH64_ROUND_UP): Remove.
(AARCH64_ROUND_DOWN): Ditto.
* config/aarch64/aarch64.c: Use ROUND_UP instead of AARCH64_ROUND_UP.

Tested by building a crosscompiler to aarch64-linux-gnu.

OK for mainline?

Uros.
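For reference, the generic macros follow the usual power-of-two alignment idiom; a sketch of the intended semantics (not GCC's exact definitions):

```c
#include <assert.h>

/* Sketch of ROUND_UP/ROUND_DOWN semantics for a power-of-two
   alignment A; illustrative only, not GCC's exact definitions.  */
#define SKETCH_ROUND_UP(X, A)   (((X) + (A) - 1) & ~((A) - 1))
#define SKETCH_ROUND_DOWN(X, A) ((X) & ~((A) - 1))
```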
Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c(revision 228703)
+++ config/aarch64/aarch64.c(working copy)
@@ -1860,8 +1860,8 @@ aarch64_layout_arg (cumulative_args_t pcum_v, mach
 
   /* Size in bytes, rounded to the nearest multiple of 8 bytes.  */
   size
-= AARCH64_ROUND_UP (type ? int_size_in_bytes (type) : GET_MODE_SIZE (mode),
-   UNITS_PER_WORD);
+= ROUND_UP (type ? int_size_in_bytes (type) : GET_MODE_SIZE (mode),
+   UNITS_PER_WORD);
 
   allocate_ncrn = (type) ? !(FLOAT_TYPE_P (type)) : !FLOAT_MODE_P (mode);
   allocate_nvrn = aarch64_vfp_is_call_candidate (pcum_v,
@@ -1969,8 +1969,8 @@ aarch64_layout_arg (cumulative_args_t pcum_v, mach
 on_stack:
   pcum->aapcs_stack_words = size / UNITS_PER_WORD;
   if (aarch64_function_arg_alignment (mode, type) == 16 * BITS_PER_UNIT)
-pcum->aapcs_stack_size = AARCH64_ROUND_UP (pcum->aapcs_stack_size,
-  16 / UNITS_PER_WORD);
+pcum->aapcs_stack_size = ROUND_UP (pcum->aapcs_stack_size,
+  16 / UNITS_PER_WORD);
   return;
 }
 
@@ -2237,21 +2237,21 @@ aarch64_layout_frame (void)
   }
 
   cfun->machine->frame.padding0 =
-(AARCH64_ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT) - offset);
-  offset = AARCH64_ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+(ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT) - offset);
+  offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
 
   cfun->machine->frame.saved_regs_size = offset;
 
   cfun->machine->frame.hard_fp_offset
-= AARCH64_ROUND_UP (cfun->machine->frame.saved_varargs_size
-   + get_frame_size ()
-   + cfun->machine->frame.saved_regs_size,
-   STACK_BOUNDARY / BITS_PER_UNIT);
+= ROUND_UP (cfun->machine->frame.saved_varargs_size
+   + get_frame_size ()
+   + cfun->machine->frame.saved_regs_size,
+   STACK_BOUNDARY / BITS_PER_UNIT);
 
   cfun->machine->frame.frame_size
-= AARCH64_ROUND_UP (cfun->machine->frame.hard_fp_offset
-   + crtl->outgoing_args_size,
-   STACK_BOUNDARY / BITS_PER_UNIT);
+= ROUND_UP (cfun->machine->frame.hard_fp_offset
+   + crtl->outgoing_args_size,
+   STACK_BOUNDARY / BITS_PER_UNIT);
 
   cfun->machine->frame.laid_out = true;
 }
@@ -9024,8 +9024,8 @@ aarch64_expand_builtin_va_start (tree valist, rtx
  This address is gr_save_area_bytes below GRTOP, rounded
  down to the next 16-byte boundary.  */
   t = make_tree (TREE_TYPE (vrtop), virtual_incoming_args_rtx);
-  vr_offset = AARCH64_ROUND_UP (gr_save_area_size,
-STACK_BOUNDARY / BITS_PER_UNIT);
+  vr_offset = ROUND_UP (gr_save_area_size,
+   STACK_BOUNDARY / BITS_PER_UNIT);
 
   if (vr_offset)
 t = fold_build_pointer_plus_hwi (t, -vr_offset);
@@ -9118,7 +9118,7 @@ aarch64_gimplify_va_arg_expr (tree valist, tree ty
  unshare_expr (valist), f_grtop, NULL_TREE);
   f_off = build3 (COMPONENT_REF, TREE_TYPE (f_groff),
  unshare_expr (valist), f_groff, NULL_TREE);
-  rsize = (size + UNITS_PER_WORD - 1) & -UNITS_PER_WORD;
+  rsize = ROUND_UP (size, UNITS_PER_WORD);
   nregs = rsize / UNITS_PER_WORD;
 
   if (align > 8)
@@ -9357,8 +9357,8 @@ aarch64_setup_incoming_varargs (cumulative_args_t
  /* Set OFF to the offset from virtual_incoming_args_rtx of
 the first vector register.  The VR save area lies below
 the GR one, and is aligned to 16 bytes.  */
- off = -AARCH64_ROUND_UP (gr_saved * UNITS_PER_WORD,
-  STACK_BOUNDARY / BITS_PER_UNIT);
+ off = -ROUND_UP (gr_saved * UNITS_PER_WORD,
+  STACK_BOUNDARY / BITS_PER_UNIT);
  off -= vr_saved * UNITS_PER_VREG;
 
  for (i = local_cum.aapcs_nvrn; i < NUM_FP_ARG_REGS; ++i)
@@ -9377,8 +9377,8 @@ aarch64_setup_incoming_varargs (cumulative_args_t
   /* We don't save the size into *PRETEND_SIZE because we want to avoid
  any complication of having crtl->args.pretend_args_size changed.  */
   cfun->machine->frame.saved_varargs_size
-= (AARCH64_ROUND_UP (gr_saved * UNITS_PER_WORD,
- STACK_BOUNDARY / BITS_PER_UNIT)
+= (ROUND_UP (gr_saved * UNITS_PER_WORD,
+STACK_BOUNDARY / BITS_PER_UNIT)
+ vr_saved * UNITS_PER_VREG);
 }
 
Index: config/aarch64/aarch64.h

Re: [3/7] Optimize ZEXT_EXPR with tree-vrp

2015-10-12 Thread Richard Biener
On Sun, Oct 11, 2015 at 4:56 AM, Kugan
 wrote:
>
>
> On 09/10/15 21:29, Richard Biener wrote:
>> +  unsigned int prec = tree_to_uhwi (vr1.min);
>>
>> this should use unsigned HOST_WIDE_INT
>>
>> +  wide_int sign_bit = wi::shwi (1ULL << (prec - 1),
>> +   TYPE_PRECISION (TREE_TYPE (vr0.min)));
>>
>> use wi::one (TYPE_PRECISION (TREE_TYPE (vr0.min))) << (prec - 1);
>>
>> That is, you really need to handle precisions bigger than HOST_WIDE_INT.
>>
>> But I suppose wide_int really misses a test_bit function (it has a set_bit
>> one already).
>>
>> + if (wi::bit_and (must_be_nonzero, sign_bit) == sign_bit)
>> +   {
>> + /* If to-be-extended sign bit is one.  */
>> + tmin = type_min;
>> + tmax = may_be_nonzero;
>>
>> I think tmax should be zero-extended may_be_nonzero from prec.
>>
>> + else if (wi::bit_and (may_be_nonzero, sign_bit)
>> +  != sign_bit)
>> +   {
>> + /* If to-be-extended sign bit is zero.  */
>> + tmin = must_be_nonzero;
>> + tmax = may_be_nonzero;
>>
>> likewise here tmin/tmax should be zero-extended may/must_be_nonzero from 
>> prec.
>>
>> +case SEXT_EXPR:
>> +   {
>> + unsigned int prec = tree_to_uhwi (op1);
>> + wide_int sign_bit = wi::shwi (1ULL << (prec - 1),
>> +   TYPE_PRECISION (TREE_TYPE 
>> (vr0.min)));
>> + wide_int mask = wi::shwi (((1ULL << (prec - 1)) - 1),
>> +   TYPE_PRECISION (TREE_TYPE (vr0.max)));
>>
>> this has the same host precision issues of 1ULL (HOST_WIDE_INT).
>> There is wi::mask, eventually you can use wi::set_bit_in_zero to
>> produce the sign-bit wide_int (also above).
>
>
> Thanks Richard. Does the attached patch look better?

Yes.  That variant is ok once prerequesites have been approved.

Thanks,
Richard.


> Thanks,
> Kugan


Re: [1/7] Add new tree code SEXT_EXPR

2015-10-12 Thread Richard Biener
On Sun, Oct 11, 2015 at 12:35 PM, Kugan
 wrote:
>
>
> On 15/09/15 23:18, Richard Biener wrote:
>> On Mon, Sep 7, 2015 at 4:55 AM, Kugan  
>> wrote:
>>>
>>> This patch adds support for new tree code SEXT_EXPR.
>>
>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>> index d567a87..bbc3c10 100644
>> --- a/gcc/cfgexpand.c
>> +++ b/gcc/cfgexpand.c
>> @@ -5071,6 +5071,10 @@ expand_debug_expr (tree exp)
>>  case FMA_EXPR:
>>return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2);
>>
>> +case SEXT_EXPR:
>> +  return op0;
>>
>> that looks wrong.  Generate (sext:... ) here?
>>
>> +case SEXT_EXPR:
>> +   {
>> + rtx op0 = expand_normal (treeop0);
>> + rtx temp;
>> + if (!target)
>> +   target = gen_reg_rtx (TYPE_MODE (TREE_TYPE (treeop0)));
>> +
>> + machine_mode inner_mode
>> +   = smallest_mode_for_size (tree_to_shwi (treeop1),
>> + MODE_INT);
>> + temp = convert_modes (inner_mode,
>> +   TYPE_MODE (TREE_TYPE (treeop0)), op0, 0);
>> + convert_move (target, temp, 0);
>> + return target;
>> +   }
>>
>> Humm - is that really how we expand sign extensions right now?  No helper
>> that would generate (sext ...) directly?  I wouldn't try using 'target' btw 
>> but
>> simply return (sext:mode op0 op1) or so.  But I am no way an RTL expert.
>>
>> Note that if we don't disallow arbitrary precision SEXT_EXPRs we have to
>> fall back to using shifts (and smallest_mode_for_size is simply wrong).
>>
>> +case SEXT_EXPR:
>> +  {
>> +   if (!INTEGRAL_TYPE_P (lhs_type)
>> +   || !INTEGRAL_TYPE_P (rhs1_type)
>> +   || TREE_CODE (rhs2) != INTEGER_CST)
>>
>> please constrain this some more, with
>>
>>|| !useless_type_conversion_p (lhs_type, rhs1_type)
>>
>> + {
>> +   error ("invalid operands in sext expr");
>> +   return true;
>> + }
>> +   return false;
>> +  }
>>
>> @@ -3414,6 +3422,9 @@ op_symbol_code (enum tree_code code)
>>  case MIN_EXPR:
>>return "min";
>>
>> +case SEXT_EXPR:
>> +  return "sext from bit";
>> +
>>
>> just "sext" please.
>>
>> +/*  Sign-extend operation.  It will sign extend first operand from
>> + the sign bit specified by the second operand.  */
>> +DEFTREECODE (SEXT_EXPR, "sext_expr", tcc_binary, 2)
>>
>> "from the INTEGER_CST sign bit specified"
>>
>> Also add "The type of the result is that of the first operand."
>>
>
>
>
> Thanks for the review. Attached patch attempts to address the above
> comments. Does this look better?

+case SEXT_EXPR:
+  gcc_assert (CONST_INT_P (op1));
+  inner_mode = mode_for_size (INTVAL (op1), MODE_INT, 0);

We should add

gcc_assert (GET_MODE_BITSIZE (inner_mode) == INTVAL (op1));

+  if (mode != inner_mode)
+   op0 = simplify_gen_unary (SIGN_EXTEND,
+ mode,
+ gen_lowpart_SUBREG (inner_mode, op0),
+ inner_mode);

as we're otherwise silently dropping things like SEXT (short-typed-var, 13)
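(For readers following the thread: the intended SEXT_EXPR semantics, sign-extending the first operand from the bit position given by the second, can be modelled in plain C for precisions below 64; a sketch, not GCC code:)

```c
#include <assert.h>
#include <stdint.h>

/* Model of SEXT_EXPR for 0 < PREC < 64: treat bit PREC-1 of X as the
   sign bit and sign-extend from there.  Illustration only.  */
static int64_t
sext_from_bit (uint64_t x, unsigned prec)
{
  uint64_t sign = (uint64_t) 1 << (prec - 1);
  x &= ((uint64_t) 1 << prec) - 1;   /* keep only the low PREC bits */
  return (int64_t) ((x ^ sign) - sign);
}
```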

+case SEXT_EXPR:
+   {
+ machine_mode inner_mode = mode_for_size (tree_to_shwi (treeop1),
+  MODE_INT, 0);

Likewise.  Also treeop1 should be unsigned, thus tree_to_uhwi?

+ rtx temp, result;
+ rtx op0 = expand_normal (treeop0);
+ op0 = force_reg (mode, op0);
+ if (mode != inner_mode)
+   {

Again, for the RTL bits I'm not sure they are correct.  For example I don't
see why we need a lowpart SUBREG, isn't a "regular" SUBREG enough?

+case SEXT_EXPR:
+  {
+   if (!INTEGRAL_TYPE_P (lhs_type)
+   || !useless_type_conversion_p (lhs_type, rhs1_type)
+   || !INTEGRAL_TYPE_P (rhs1_type)
+   || TREE_CODE (rhs2) != INTEGER_CST)

the INTEGRAL_TYPE_P (rhs1_type) check is redundant with
the useless_type_conversion_p one.  Please check
tree_fits_uhwi (rhs2) instead of != INTEGER_CST.

Otherwise ok for trunk.

Thanks,
Richard.



>
> Thanks,
> Kugan


Re: Move some bit and binary optimizations in simplify and match

2015-10-12 Thread Marc Glisse

On Mon, 12 Oct 2015, Hurugalawadi, Naveen wrote:

+/* Fold X + (X / CST) * -CST to X % CST.  */
+(simplify
+ (plus (convert1? @0) (convert2? (mult (trunc_div @0 INTEGER_CST@1) 
INTEGER_CST@2)))
+  (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+   && wi::add (@1, @2) == 0)
+   (trunc_mod (convert @0) (convert @1

With INTEGER_CST above, the test INTEGRAL_TYPE_P might be redundant, and 
VECTOR_INTEGER_TYPE_P can never match.
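The arithmetic identity behind that pattern is easy to sanity-check for truncating division (which is what trunc_div/trunc_mod denote); an illustrative standalone check:

```c
#include <assert.h>

/* Verify X + (X / C) * -C == X % C under C's truncating division,
   which matches trunc_div/trunc_mod.  */
static int
plus_div_mult_identity_holds (void)
{
  for (int x = -100; x <= 100; x++)
    for (int c = 1; c <= 16; c++)
      if (x + (x / c) * -c != x % c)
        return 0;
  return 1;
}
```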


+/* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
+(simplify
+ (minus (bit_and:s @0 (bit_not @1)) (bit_and:s @0 @2))
+  (if (! FLOAT_TYPE_P (type)
+   && wi::eq_p (@1, @2))
+   (minus (bit_xor @0 @1) @1)))

I don't think FLOAT_TYPE_P can ever be true for the result of bit_and.

+/* Fold (a * (1 << b)) into (a << b)  */
+(simplify
+ (mult:c (convert1? @0) (convert2? (lshift integer_onep@1 @2)))
+  (if (! FLOAT_TYPE_P (type))
+   (lshift (convert @0) (convert @2

If a and 1 are vectors and b is a scalar...

+/* Simplify (X & ~Y) | (~X & Y) is X ^ Y.  */
+(simplify
+ (bit_ior (bit_and:s @0 (bit_not @1)) (bit_and:s (bit_not @2) @3))
+  (if (wi::eq_p (@0, @2)
+   && wi::eq_p (@1, @3))
+   (bit_xor @0 @3)))

I don't think we need the :s when the result of the transformation is so 
simple.


+/* Simplify ~X & X as zero.  */
+(simplify
+ (bit_and:c (convert? @0) (convert? (bit_not @0)))
+  (convert { build_zero_cst (TREE_TYPE (@0)); }))

Can't you build 0 directly in the right type?


--
Marc Glisse


Re: [PATCH] PR target/67850: Wrong call_used_regs used in aggregate_value_p

2015-10-12 Thread H.J. Lu
On Wed, Oct 7, 2015 at 2:01 AM, Uros Bizjak  wrote:
> On Wed, Oct 7, 2015 at 10:53 AM, Richard Biener  wrote:
>
>>> >>> > > Since targetm.expand_to_rtl_hook may be called to switch ABI, it 
>>> >>> > > should
>>> >>> > > be called for each function before expanding to RTL.  Otherwise, we 
>>> >>> > > may
>>> >>> > > use the stale information from compilation of the previous function.
>>> >>> > > aggregate_value_p uses call_used_regs.  aggregate_value_p is used by
>>> >>> > > IPA and return value optimization, which are called before
>>> >>> > > pass_expand::execute after RTL expansion starts.  We need to call
>>> >>> > > targetm.expand_to_rtl_hook early enough in cgraph_node::expand to 
>>> >>> > > make
>>> >>> > > sure that everything is in sync when RTL expansion starts.
>>> >>> > >
>>> >>> > > Tested on Linux/x86-64.  OK for trunk?
>>> >>> >
>>> >>> > Hmm, I think set_cfun hook should handle this.  expand_to_rtl_hook 
>>> >>> > shouldn't
>>> >>> > mess with per-function stuff.
>>> >>> >
>>> >>> > Richard.
>>> >>> >
>>> >>>
>>> >>> I am testig this patch.  OK for trunk if there is no regresion?
>>> >>>
>>> >>>
>>> >>> H.J.
>>> >>> --
>>> >>> ix86_maybe_switch_abi is called too late during RTL expansion and we
>>> >>> use the stale information from compilation of the previous function.
>>> >>> aggregate_value_p uses call_used_regs.  aggregate_value_p is used by
>>> >>> IPA and return value optimization, which are called before
>>> >>> pass_expand::execute after RTL expansion starts.  Instead,
>>> >>> ix86_maybe_switch_abi should be merged with ix86_set_current_function.
>>> >>>
>>> >>>   PR target/67850
>>> >>>   * config/i386/i386.c (ix86_set_current_function): Renamed
>>> >>>   to ...
>>> >>>   (ix86_set_current_function_1): This.
>>> >>>   (ix86_set_current_function): New.  Incorporate old
>>> >>>   ix86_set_current_function and ix86_maybe_switch_abi.
>>> >>>   (ix86_maybe_switch_abi): Removed.
>>> >>>   (TARGET_EXPAND_TO_RTL_HOOK): Likewise.
>>> >>> ---
>>> >>>  gcc/config/i386/i386.c | 33 ++---
>>> >>>  1 file changed, 18 insertions(+), 15 deletions(-)
>>> >>>
>>> >>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> >>> index d59b59b..a0adf3d 100644
>>> >>> --- a/gcc/config/i386/i386.c
>>> >>> +++ b/gcc/config/i386/i386.c
>>> >>> @@ -6222,7 +6222,7 @@ ix86_reset_previous_fndecl (void)
>>> >>> FNDECL.  The argument might be NULL to indicate processing at top
>>> >>> level, outside of any function scope.  */
>>> >>>  static void
>>> >>> -ix86_set_current_function (tree fndecl)
>>> >>> +ix86_set_current_function_1 (tree fndecl)
>>> >>>  {
>>> >>>/* Only change the context if the function changes.  This hook is 
>>> >>> called
>>> >>>   several times in the course of compiling a function, and we don't 
>>> >>> want to
>>> >>> @@ -6262,6 +6262,23 @@ ix86_set_current_function (tree fndecl)
>>> >>>ix86_previous_fndecl = fndecl;
>>> >>>  }
>>> >>>
>>> >>> +static void
>>> >>> +ix86_set_current_function (tree fndecl)
>>> >>> +{
>>> >>> +  ix86_set_current_function_1 (fndecl);
>>> >>> +
>>> >>> +  if (!cfun)
>>> >>> +return;
>>> >>
>>> >> I think you want to test !fndecl here.  Why split this out at all?
>>> >> The ix86_previous_fndecl caching should still work, no?
>>> >>
>>> >>> +  /* 64-bit MS and SYSV ABI have different set of call used registers.
>>> >>> + Avoid expensive re-initialization of init_regs each time we switch
>>> >>> + function context since this is needed only during RTL expansion.  
>>> >>> */
>>> >>
>>> >> The comment is now wrong (and your bug shows it was wrong previously).
>>> >>
>>> >
>>> > Here is the updated patch.  OK for master if there is no
>>> > regression on Linux/x86-64?
>>> >
>>>
>>> There is no regression.  OK for trunk?
>>
>> Ok with me but I defer to Uros for the approval.
>
> OK for mainline and release branches after a few days.
>

I backported it to GCC 5.  Backporting to 4.9 requires significant
change since ix86_set_current_function has changed since 4.9.
I have no plan to backport it to 4.9.


-- 
H.J.


Re: [gomp4, committed] Add goacc/kernels-acc-on-device.c

2015-10-12 Thread Tom de Vries

On 12/10/15 12:49, Thomas Schwinge wrote:

Hi Tom!

On Sat, 10 Oct 2015 12:49:01 +0200, Tom de Vries  wrote:

>--- /dev/null
>+++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-on-device.c
>@@ -0,0 +1,39 @@
>+/* { dg-additional-options "-O2" } */
>+
>+#include <openacc.h>


Hi Thomas,


That doesn't work (at least in build-tree testing), as gcc/testsuite/ is
not set up to look for header files in [target]/libgomp/:

 
[...]/source-gcc/gcc/testsuite/c-c++-common/goacc/kernels-acc-on-device.c:3:21: 
fatal error: openacc.h: No such file or directory
 compilation terminated.
 compiler exited with status 1



Ah, I see. I was doing 'make' followed by 'make install', and then 
build-tree testing. The build-tree testing seems to pick up the header 
file from the install directory. So for me the test passed.



>+
>+#define N 32
>+
>+void
>+foo (float *a, float *b)
>+{
>+  float exp;
>+  int i;
>+  int n;
>+
>+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N])
>+  {
>+int ii;
>+
>+for (ii = 0; ii < N; ii++)
>+  {
>+   if (acc_on_device (acc_device_host))

Your two options are: if that's applicable/sufficient for what you intend
to test here, use __builtin_acc_on_device with a hard-coded acc_device_*,
or duplicate part of <openacc.h> as done for example in
gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c.



Went with second option, committed as attached.

Thanks,
- Tom

Remove openacc.h include from goacc/kernels-acc-on-device.c

2015-10-12  Tom de Vries  

	* c-c++-common/goacc/kernels-acc-on-device.c: Remove openacc.h include.
	(enum acc_device_t, acc_on_device): Declare.
	(foo): Remove unused vars.  Use acc_device_X.
---
 .../c-c++-common/goacc/kernels-acc-on-device.c | 27 --
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-acc-on-device.c b/gcc/testsuite/c-c++-common/goacc/kernels-acc-on-device.c
index e9e93c7..784c66a 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-acc-on-device.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-on-device.c
@@ -1,23 +1,36 @@
 /* { dg-additional-options "-O2" } */
 
-#include <openacc.h>
+#if __cplusplus
+extern "C" {
+#endif
+
+#if __cplusplus >= 201103
+# define __GOACC_NOTHROW noexcept
+#elif __cplusplus
+# define __GOACC_NOTHROW throw ()
+#else /* Not C++ */
+# define __GOACC_NOTHROW __attribute__ ((__nothrow__))
+#endif
+
+typedef enum acc_device_t { acc_device_X = 123 } acc_device_t;
+int acc_on_device (int) __GOACC_NOTHROW;
+
+#if __cplusplus
+}
+#endif
 
 #define N 32
 
 void
 foo (float *a, float *b)
 {
-  float exp;
-  int i;
-  int n;
-
 #pragma acc kernels copyin(a[0:N]) copyout(b[0:N])
   {
 int ii;
 
 for (ii = 0; ii < N; ii++)
   {
-	if (acc_on_device (acc_device_host))
+	if (acc_on_device (acc_device_X))
 	  b[ii] = a[ii] + 1;
 	else
 	  b[ii] = a[ii];
@@ -30,7 +43,7 @@ foo (float *a, float *b)
 
 for (ii = 0; ii < N; ii++)
   {
-	if (acc_on_device (acc_device_host))
+	if (acc_on_device (acc_device_X))
 	  b[ii] = a[ii] + 2;
 	else
 	  b[ii] = a[ii];
-- 
1.9.1



Re: [PATCH, rs6000]: Use ROUND_UP and ROUND_DOWN macros

2015-10-12 Thread David Edelsohn
> Fairly trivial patch that introduces no functional changes.

2015-10-12  Uros Bizjak  

* config/rs6000/rs6000.h (RS6000_ALIGN): Implement using
ROUND_UP macro.
* config/rs6000/rs6000.c (rs6000_darwin64_record_arg_advance_flush):
Use ROUND_UP and ROUND_DOWN macros where applicable.
(rs6000_darwin64_record_arg_flush): Ditto.
(rs6000_function_arg): Use ROUND_UP to calculate align_words.
(rs6000_emit_probe_stack_range): Use ROUND_DOWN to calculate
rounded_size.

> Tested by building a crosscompiler to powerpc64-linux-gnu.

> OK for mainline?

Okay.

Thanks, David


Re: [PATCH, aarch64]: Remove AARCH64_ROUND_UP and AARCH64_ROUND_DOWN defines

2015-10-12 Thread Marcus Shawcroft
On 12 October 2015 at 12:28, Uros Bizjak  wrote:
> Remove private definitions and use equivalent global macros instead.
>
> 2015-10-12  Uros Bizjak  
>
> * config/aarch/aarch64.h (AARCH64_ROUND_UP): Remove.
> (AARCH64_ROUND_DOWN): Ditto.
> * config/aarch64/aarch64.c: Use ROUND_UP instead of AARCH64_ROUND_UP.
>
> Tested by building a crosscompiler to aarch64-linux-gnu.
>
> OK for mainline?

OK Thanks /Marcus


[PATCH] Check no unreachable blocks in inverted_post_order_compute

2015-10-12 Thread Tom de Vries

Hi,

in the header comment of function inverted_post_order_compute in 
cfganal.c we find:

...
   This function assumes that all blocks in the CFG are reachable
   from the ENTRY (but not necessarily from EXIT).
...

This patch checks that there are indeed no unreachable blocks when 
calling inverted_post_order_compute.


OK for trunk if bootstrap/regtest succeeds?

Thanks,
- Tom
Check no unreachable blocks in inverted_post_order_compute

2015-10-12  Tom de Vries  

	* cfganal.c (verify_no_unreachable_blocks): New function.
	(inverted_post_order_compute) [ENABLE_CHECKING]: Call
	verify_no_unreachable_blocks.
	cfganal.h (verify_no_unreachable_blocks): Declare.
---
 gcc/cfganal.c | 17 +
 gcc/cfganal.h |  1 +
 2 files changed, 18 insertions(+)

diff --git a/gcc/cfganal.c b/gcc/cfganal.c
index 279c3b5..1f935eb 100644
--- a/gcc/cfganal.c
+++ b/gcc/cfganal.c
@@ -193,6 +193,19 @@ find_unreachable_blocks (void)
 
   free (worklist);
 }
+
+/* Verify that there are no unreachable blocks in the current function.  */
+
+void
+verify_no_unreachable_blocks (void)
+{
+  find_unreachable_blocks ();
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+gcc_assert ((bb->flags & BB_REACHABLE) != 0);
+}
+
 
 /* Functions to access an edge list with a vector representation.
Enough data is kept such that given an index number, the
@@ -772,6 +785,10 @@ inverted_post_order_compute (int *post_order)
   int post_order_num = 0;
   sbitmap visited;
 
+#if ENABLE_CHECKING
+  verify_no_unreachable_blocks ();
+#endif
+
   /* Allocate stack for back-tracking up CFG.  */
   stack = XNEWVEC (edge_iterator, n_basic_blocks_for_fn (cfun) + 1);
   sp = 0;
diff --git a/gcc/cfganal.h b/gcc/cfganal.h
index 3eb4764..2ad00c0 100644
--- a/gcc/cfganal.h
+++ b/gcc/cfganal.h
@@ -49,6 +49,7 @@ private:
 
 extern bool mark_dfs_back_edges (void);
 extern void find_unreachable_blocks (void);
+extern void verify_no_unreachable_blocks (void);
 struct edge_list * create_edge_list (void);
 void free_edge_list (struct edge_list *);
 void print_edge_list (FILE *, struct edge_list *);
-- 
1.9.1



[gomp4] OpenACC loop expand reorg

2015-10-12 Thread Nathan Sidwell
I've committed this to the gomp4 branch.  It reworks the loop expansion code for 
OpenACC loops in the following ways:


1) Removes OpenACC handling from expand_omp_for_static_{,no}chunk}.  These are 
thus now OpenMP only.  (Jakub, that should reduce conflicts between the two 
implemenations)


2) Implements expand_oacc_for, for OpenACC loops.  Loops are expanded using a 
new internal fn 'IFN_GOACC_LOOP', which abstracts the chunk size, step size, 
initial per-thread iteration value, and per-thread bound.


3) IFN_GOACC_LOOP is lowered in the oacc_device_lower pass.  There are 4 
variants of the call, and an initial INTEGER_CST arg is used to distinguish use. 
 (This seemed better than having 4 new internal fns, and matches the IFN_UNIQUE 
mechanism).


4) The GOACC_LOOP lowering deals with chunking, and chooses whether a compute 
axis should assign adjacent iterations to adjacent compute elements (striding), 
or assign a compute element to a contiguous span of iterations (contiguous). 
You want to stride at the vector level, but be contiguous at outer levels, to 
maximize cache friendliness (in general).  Chunking of size 1 is the same as 
striding, and we reduce the former to the latter.


Right now, GOACC_LOOP expansion is not device specific.  The expansion done 
earlier passes in the necessary information.  I'm working on a patch to separate 
that now, which will complete the transition.  With that:

(a) kernels will be easier to mark and optimize
(b) the implementation will be device_type friendly, as device-specific choices 
will all have been moved to the target compiler.


nathan
2015-10-12  Nathan Sidwell  

	* omp-low.c (expand_omp_for_static_nochunk): Remove OpenACC
	pieces.
	(expand_omp_for_static_chunk): Likewise,
	(struct oacc_collapse): New.
	(expand_oacc_collapse_init, expand_oacc_collapse_vars): New.
	(expand_oacc_for): New.
	(expand_omp_for): Call expand_oacc_for for OpenACC loops.
	(oacc_xform_loop): New.
	(execute_oacc_device_lower): Call it.
	* internal-fn.def (GOACC_LOOP): New internal fn.
	(IFN_GOACC_LOOP_CHUNKS, IFN_GOACC_LOOP_STEP,
	IFN_GOACC_LOOP_OFFSET, IFN_GOACC_LOOP_BOUND): New.
	* internal-fn.c (expand_GOACC_LOOP): New.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228696)
+++ gcc/omp-low.c	(working copy)
@@ -7002,11 +7002,6 @@ expand_omp_for_generic (struct omp_regio
 	V += STEP;
 	if (V cond e) goto L1;
 L2:
-
- For OpenACC the above is wrapped in an OACC_FORK/OACC_JOIN pair.
- Currently we wrap the whole sequence, but it'd be better to place the
- markers just inside the outer conditional, so they can be entirely
- eliminated if the loop is unreachable.
 */
 
 static void
@@ -7025,9 +7020,8 @@ expand_omp_for_static_nochunk (struct om
   tree *counts = NULL;
   tree n1, n2, step;
 
-  gcc_checking_assert ((gimple_omp_for_kind (fd->for_stmt)
-			!= GF_OMP_FOR_KIND_OACC_LOOP)
-		   || !inner_stmt);
+  gcc_checking_assert (gimple_omp_for_kind (fd->for_stmt)
+		   != GF_OMP_FOR_KIND_OACC_LOOP);
 
   itype = type = TREE_TYPE (fd->loop.v);
   if (POINTER_TYPE_P (type))
@@ -7126,14 +7120,6 @@ expand_omp_for_static_nochunk (struct om
   threadid = builtin_decl_explicit (BUILT_IN_OMP_GET_TEAM_NUM);
   threadid = build_call_expr (threadid, 0);
   break;
-case GF_OMP_FOR_KIND_OACC_LOOP:
-  {
-	gimple_seq seq = NULL;
-	nthreads = expand_oacc_get_num_threads (&seq, region->gwv_this);
-	threadid = expand_oacc_get_thread_num (&seq, region->gwv_this);
-	gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
-  }
-  break;
 default:
   gcc_unreachable ();
 }
@@ -7312,8 +7298,7 @@ expand_omp_for_static_nochunk (struct om
 
   /* Replace the GIMPLE_OMP_RETURN with a barrier, or nothing.  */
   gsi = gsi_last_bb (exit_bb);
-  if (gimple_omp_for_kind (fd->for_stmt) != GF_OMP_FOR_KIND_OACC_LOOP
-  && !gimple_omp_return_nowait_p (gsi_stmt (gsi)))
+  if (!gimple_omp_return_nowait_p (gsi_stmt (gsi)))
 {
   t = gimple_omp_return_lhs (gsi_stmt (gsi));
   gsi_insert_after (&gsi, build_omp_barrier (t), GSI_SAME_STMT);
@@ -7437,11 +7422,6 @@ find_phi_with_arg_on_edge (tree arg, edg
 	trip += 1;
 	goto L0;
 L4:
-
- For OpenACC the above is wrapped in an OACC_FORK/OACC_JOIN pair.
- Currently we wrap the whole sequence, but it'd be better to place the
- markers just inside the outer conditional, so they can be entirely
- eliminated if the loop is unreachable.
 */
 
 static void
@@ -7459,9 +7439,8 @@ expand_omp_for_static_chunk (struct omp_
   tree *counts = NULL;
   tree n1, n2, step;
 
-  gcc_checking_assert ((gimple_omp_for_kind (fd->for_stmt)
-			!= GF_OMP_FOR_KIND_OACC_LOOP)
-		   || !inner_stmt);
+  gcc_checking_assert (gimple_omp_for_kind (fd->for_stmt)
+		   != GF_OMP_FOR_KIND_OACC_LOOP);
 
   itype = type = TREE_TYPE (fd->loop.v);
   if (POINTER_TYPE_P (type))
@@ -7565,14 +7544,6 @@ expand_omp_for_static_chunk (struct omp_
   threadid = builtin_de

Re: Move some bit and binary optimizations in simplify and match

2015-10-12 Thread Richard Biener
On Mon, Oct 12, 2015 at 12:22 PM, Hurugalawadi, Naveen
 wrote:
> Hi Richard,
>
> Thanks for your review and useful comments.
> I will  move the future optimization patterns with all the conditions
> present in fold-const or builtins file as per your suggestions.
>
> Please find attached the patch as per your comments.
> Please review the patch and let me know if any further modifications
> are required.

+/* Fold X + (X / CST) * -CST to X % CST.  */
+(simplify
+ (plus (convert1? @0) (convert2? (mult (trunc_div @0 INTEGER_CST@1)
INTEGER_CST@2)))
+  (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+   && wi::add (@1, @2) == 0)

when you use convert? to mimic fold-const.c behavior you have to add

&& tree_nop_conversion_p (type, TREE_TYPE (@0))

note that in this case either both conversions occur or neither does, so
please use convert? in both places.

+   (trunc_mod (convert @0) (convert @1

This applies to other uses of convert[12]?, too.

As said for the above pattern fold-const.c also handled X + (X / A) * (-A) with
A not being a constant.  Unfortunately predicate syntax can't capture
both -A and -CST but one can use

(match (xdivamulminusa @X @A)
 (mult (trunc_div @X @A) (negate @A)))
(match (xdivamulminusa @X @A)
 (mult (trunc_div @X INTEGER_CST@A) INTEGER_CST@0)
 (if (wi::add (@A, @0) == 0)))

and then

(simplify
  (plus (convert? @0) (convert? (xdivamulminusa @0 @1)))
  (if (...)
   (trunc_mod (convert @0) (convert @1

to avoid duplicating the pattern (though that might be more readable
in this case...)

+/* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
+(simplify
+ (minus (bit_and:s @0 (bit_not @1)) (bit_and:s @0 @2))
+  (if (! FLOAT_TYPE_P (type)
+   && wi::eq_p (@1, @2))
+   (minus (bit_xor @0 @1) @1)))

you can't simply use wi::eq_p on random trees.  Same solution like above
can be used (or pattern duplication).

+/* Fold (a * (1 << b)) into (a << b)  */
+(simplify
+ (mult:c (convert1? @0) (convert2? (lshift integer_onep@1 @2)))
+  (if (! FLOAT_TYPE_P (type))
+   (lshift (convert @0) (convert @2

the conversion on @0 isn't interesting to capture, only that on
the lshift is.

+/* Fold (C1/X)*C2 into (C1*C2)/X.  */
+(simplify
+ (mult (rdiv REAL_CST@0 @1) REAL_CST@2)
+  (with
+   { tree tem = const_binop (MULT_EXPR, type, @0, @2); }
+  (if (tem && FLOAT_TYPE_P (type)
+   && flag_associative_math)
+   (rdiv (mult @0 @2) @1

you computed 'tem', so use it:

(rdiv { tem; } @1

The FLOAT_TYPE_P check is redundant I think and the flag_associative_math
check should be done before computing tem.

+/* Simplify (X & ~Y) | (~X & Y) is X ^ Y.  */
+(simplify
+ (bit_ior (bit_and:s @0 (bit_not @1)) (bit_and:s (bit_not @2) @3))
+  (if (wi::eq_p (@0, @2)
+   && wi::eq_p (@1, @3))
+   (bit_xor @0 @3)))

See above for handling constants and using wi::eq_p.  Looks like I
really need to make 'match' handle these kinds of things.

+/* Simplify ~X & X as zero.  */
+(simplify
+ (bit_and:c (convert? @0) (convert? (bit_not @0)))
+  (convert { build_zero_cst (TREE_TYPE (@0)); }))

please simplify to

   { build_zero_cst (type); }

directly.

+/* (-A) * (-B) -> A * B  */
+(simplify
+ (mult:c (convert? (negate @0)) (convert? (negate @1)))
+  (if ((GIMPLE && useless_type_conversion_p (type, TREE_TYPE (@0)))
+   || (GENERIC && type == TREE_TYPE (@0)))
+   (mult (convert @0) (convert @1

note that fold-const.c handled multiple ways of negation thus please
use the existing negate_expr_p 'match' like

(simplify
  (mult:c (convert? (negate @0)) (convert? negate_expr_p@1))
  (if ...
(mult (convert @0) (convert (negate @1)))

also use tree_nop_conversion_p, not the GIMPLE/GENERIC variants.

Thanks,
Richard.


> The last pattern has been removed due to the discussions over it
> and a regression it caused.
> 
> +/* Fold X & (X ^ Y) as X & ~Y.  */
> +(simplify
> + (bit_and:c (convert? @0) (convert? (bit_xor:c @0 @1)))
> +  (bit_and (convert @0) (convert (bit_not @1
> 
>
> FAIL: gcc.dg/tree-ssa/vrp47.c scan-tree-dump-times vrp2 " & 1;" 0
> FAIL: gcc.dg/tree-ssa/vrp59.c scan-tree-dump-not vrp1 " & 3;"
>
> Thanks,
> Naveen


Re: [AArch64_be] Fix vtbl[34] and vtbx4

2015-10-12 Thread James Greenhalgh
On Fri, Oct 09, 2015 at 05:16:05PM +0100, Christophe Lyon wrote:
> On 8 October 2015 at 11:12, James Greenhalgh  wrote:
> > On Wed, Oct 07, 2015 at 09:07:30PM +0100, Christophe Lyon wrote:
> >> On 7 October 2015 at 17:09, James Greenhalgh  
> >> wrote:
> >> > On Tue, Sep 15, 2015 at 05:25:25PM +0100, Christophe Lyon wrote:
> >> >
> >> > Why do we want this for vtbx4 rather than putting out a VTBX instruction
> >> > directly (as in the inline asm versions you replace)?
> >> >
> >> I just followed the pattern used for vtbx3.
> >>
> >> > This sequence does make sense for vtbx3.
> >> In fact, I don't see why vtbx3 and vtbx4 should be different?
> >
> > The difference between TBL and TBX is in their handling of a request to
> > select an out-of-range value. For TBL this returns zero, for TBX this
> > returns the value which was already in the destination register.
> >
> > Because the byte-vectors used by the TBX instruction in aarch64 are 128-bit
> > (so two of them together allow selecting elements in the range 0-31), and
> > vtbx3 needs to emulate the AArch32 behaviour of picking elements from 
> > 3x64-bit
> > vectors (allowing elements in the range 0-23), we need to manually check for
> > values which would have been out-of-range on AArch32, but are not out
> > of range for AArch64 and handle them appropriately. For vtbx4 on the other
> > hand, 2x128-bit registers give the range 0..31 and 4x64-bit registers give
> > the range 0..31, so we don't need the special masked handling.
> >
> > You can find the suggested instruction sequences for the Neon intrinsics
> > in this document:
> >
> >   
> > http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
> >
> 
> Hi James,
> 
> Please find attached an updated version which hopefully addresses your 
> comments.
> Tested on aarch64-none-elf and aarch64_be-none-elf using the Foundation Model.
> 
> OK?

Looks good to me,

Thanks,
James



Re: [PATCH, 1/5] Handle simple latch in expand_omp_for_generic

2015-10-12 Thread Bernd Schmidt

On 10/10/2015 01:24 PM, Tom de Vries wrote:

On 10/10/15 13:06, Tom de Vries wrote:

OK, I'll repost with the patch split up, as follows:

  1Handle simple latch in expand_omp_for_generic
  2Add missing phis in expand_omp_for_generic
  3Handle original loop tree in expand_omp_for_generic
  4Support DEFPARAMENUM in params.def
  5Add param parloops-schedule


this patch handles simple latches in expand_omp_for_generic.

This allows us to handle loops which have the LOOPS_HAVE_SIMPLE_LATCHES
property.

A similar fix was done:
- in r226427 for expand_omp_for_static_nochunk (for PR66846)
- in r227435 for expand_omp_for_static_chunk (for
   --param parloops-chunk-size)


This looks ok.


Bernd



[PR debug/67192] Fix C loops' back-jump location

2015-10-12 Thread Andreas Arnez
Since r223098 ("Implement -Wmisleading-indentation") the backward-jump
generated for a C while- or for-loop can get the wrong line number.
This is because the check for misleading indentation peeks ahead one
token, advancing input_location to after the loop, and then
c_finish_loop() creates the back-jump and calls add_stmt(), which
assigns input_location to the statement by default.

This patch swaps the check for misleading indentation with the finishing
of the loop, such that input_location still has the right value at the
time of any invocations of add_stmt().

gcc/testsuite/ChangeLog:

PR debug/67192
* gcc.dg/guality/pr67192.c: New test.

gcc/c/ChangeLog:

PR debug/67192
* c-parser.c (c_parser_while_statement): Finish the loop before
parsing ahead for misleading indentation.
(c_parser_for_statement): Likewise.
---
 gcc/c/c-parser.c   | 13 +
 gcc/testsuite/gcc.dg/guality/pr67192.c | 50 ++
 2 files changed, 57 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/guality/pr67192.c

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 2d24c21..8740922 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -5438,13 +5438,13 @@ c_parser_while_statement (c_parser *parser, bool ivdep)
 = get_token_indent_info (c_parser_peek_token (parser));
 
   body = c_parser_c99_block_statement (parser);
+  c_finish_loop (loc, cond, NULL, body, c_break_label, c_cont_label, true);
+  add_stmt (c_end_compound_stmt (loc, block, flag_isoc99));
 
   token_indent_info next_tinfo
 = get_token_indent_info (c_parser_peek_token (parser));
   warn_for_misleading_indentation (while_tinfo, body_tinfo, next_tinfo);
 
-  c_finish_loop (loc, cond, NULL, body, c_break_label, c_cont_label, true);
-  add_stmt (c_end_compound_stmt (loc, block, flag_isoc99));
   c_break_label = save_break;
   c_cont_label = save_cont;
 }
@@ -5728,15 +5728,16 @@ c_parser_for_statement (c_parser *parser, bool ivdep)
 
   body = c_parser_c99_block_statement (parser);
 
-  token_indent_info next_tinfo
-= get_token_indent_info (c_parser_peek_token (parser));
-  warn_for_misleading_indentation (for_tinfo, body_tinfo, next_tinfo);
-
   if (is_foreach_statement)
 objc_finish_foreach_loop (loc, object_expression, collection_expression, 
body, c_break_label, c_cont_label);
   else
 c_finish_loop (loc, cond, incr, body, c_break_label, c_cont_label, true);
   add_stmt (c_end_compound_stmt (loc, block, flag_isoc99 || c_dialect_objc 
()));
+
+  token_indent_info next_tinfo
+= get_token_indent_info (c_parser_peek_token (parser));
+  warn_for_misleading_indentation (for_tinfo, body_tinfo, next_tinfo);
+
   c_break_label = save_break;
   c_cont_label = save_cont;
 }
diff --git a/gcc/testsuite/gcc.dg/guality/pr67192.c 
b/gcc/testsuite/gcc.dg/guality/pr67192.c
new file mode 100644
index 000..73d4e44
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/guality/pr67192.c
@@ -0,0 +1,50 @@
+/* PR debug/67192 */
+/* { dg-do run } */
+/* { dg-options "-g" } */
+
+static volatile int cnt = 0;
+
+__attribute__((noinline)) int
+f1 (void)
+{
+  return ++cnt % 5 == 0;
+}
+
+__attribute__((noinline)) void
+f2 (void)
+{
+}
+
+__attribute__((noinline)) void
+f3 (int (*last) (void), void (*do_it) (void))
+{
+  for (;;)
+{
+  if (last ())
+   break;
+  do_it ();
+}
+  do_it (); /* { dg-final { gdb-test 27 "cnt" "5" } } */
+
+  while (1)
+{
+  if (last ())
+   break;
+  do_it ();
+}
+  do_it (); /* { dg-final { gdb-test 35 "cnt" "10" } } */
+}
+
+int (*volatile fnp1) (void) = f1;
+void (*volatile fnp2) (void) = f2;
+void (*volatile fnp3) (int (*) (void), void (*) (void)) = f3;
+
+int
+main (int argc, char *argv[])
+{
+  asm volatile ("" : : "r" (&fnp1) : "memory");
+  asm volatile ("" : : "r" (&fnp2) : "memory");
+  asm volatile ("" : : "r" (&fnp3) : "memory");
+  fnp3 (fnp1, fnp2);
+  return 0;
+}
-- 
2.3.0



Re: [PATCH, 2/5] Add missing phis in expand_omp_for_generic

2015-10-12 Thread Bernd Schmidt

On 10/10/2015 01:49 PM, Tom de Vries wrote:

On 10/10/15 13:06, Tom de Vries wrote:

OK, I'll repost with the patch split up, as follows:

  1Handle simple latch in expand_omp_for_generic
  2Add missing phis in expand_omp_for_generic
  3Handle original loop tree in expand_omp_for_generic
  4Support DEFPARAMENUM in params.def
  5Add param parloops-schedule


Hi,

this patch adds missing phis in expand_omp_for_generic.

In expand_omp_for_generic, we add an outer loop around an inner loop.
That means we need to:
- add the necessary phis on the outer loop, and
- move the loop entry value of the inner phi to the loop entry value of
   the outer phi


Also ok, I think. This seems to be slightly different from the one 
originally submitted?



Bernd



Re: Move sqrt and cbrt simplifications to match.pd

2015-10-12 Thread Christophe Lyon
On 9 October 2015 at 18:17, Richard Sandiford  wrote:
> Richard Sandiford  writes:
>> Christophe Lyon  writes:
>>> On 8 October 2015 at 18:55, Richard Sandiford
>>>  wrote:
 Marc Glisse  writes:
> On Mon, 5 Oct 2015, Richard Sandiford wrote:
>
>> +  /* cbrt(sqrt(x)) -> pow(x,1/6).  */
>> +  (simplify
>> +   (sqrts (cbrts @0))
>> +   (pows @0 { build_real_truncate (type, dconst<1, 6> ()); }))
>> +  /* sqrt(cbrt(x)) -> pow(x,1/6).  */
>> +  (simplify
>> +   (cbrts (sqrts @0))
>> +   (pows @0 { build_real_truncate (type, dconst<1, 6> ()); }))
>
> I think you swapped the comments (not that it matters).

 Thanks, fixed in the committed version.

 Richard

>>> Hi Richard,
>>>
>>> Since you committed this patch, I've noticed that gcc.dg/builtins-10.c fails
>>> on arm-none-linux-gnueabi targets (as opposed to arm-none-linux-gnueabihf).
>>>
>>> gcc.log shows:
>>> /cchfHDHc.o: In function `test':
>>> builtins-10.c:(.text+0x60): undefined reference to `link_error'
>>> collect2: error: ld returned 1 exit status
>>
>> Looks like this is the same fold_strip_sign_ops problem that I was seeing
>> with some WIP follow-on patches.  We don't fold pow(abs(x), 4) to pow(x, 4).
>
> Here's the patch I'm testing.
>
> Thanks,
> Richard
>
>
> gcc/
> * real.h (real_isinteger): Declare.
> * real.c (real_isinteger): New function.
> * match.pd: Simplify pow(|x|,y) and pow(-x,y) to pow(x,y)
> if y is an even integer.
>

This makes sense indeed. I was wondering why I didn't notice
regressions on arm-*hf targets:
are such optimizations caught in later passes for some targets?

> diff --git a/gcc/match.pd b/gcc/match.pd
> index b87c436..67f9d54 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -309,12 +309,19 @@ along with GCC; see the file COPYING3.  If not see
> && TYPE_OVERFLOW_UNDEFINED (type))
> @0)))
>
> -/* Simplify cos (-x) -> cos (x).  */
>  (for op (negate abs)
> -(for coss (COS COSH)
> - (simplify
> -  (coss (op @0))
> -   (coss @0
> + /* Simplify cos(-x) and cos(|x|) -> cos(x).  Similarly for cosh.  */
> + (for coss (COS COSH)
> +  (simplify
> +   (coss (op @0))
> +(coss @0)))
> + /* Simplify pow(-x, y) and pow(|x|,y) -> pow(x,y) if y is an even integer.  
> */
> + (for pows (POW)
> +  (simplify
> +   (pows (op @0) REAL_CST@1)
> +   (with { HOST_WIDE_INT n; }
> +(if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
> + (pows @0 @1))
>
>  /* X % Y is smaller than Y.  */
>  (for cmp (lt ge)
> diff --git a/gcc/real.c b/gcc/real.c
> index f633ffd..85ac83d 100644
> --- a/gcc/real.c
> +++ b/gcc/real.c
> @@ -4997,6 +4997,24 @@ real_isinteger (const REAL_VALUE_TYPE *c, machine_mode 
> mode)
>return real_identical (c, &cint);
>  }
>
> +/* Check whether C is an integer that fits in a HOST_WIDE_INT,
> +   storing it in *INT_OUT if so.  */
> +
> +bool
> +real_isinteger (const REAL_VALUE_TYPE *c, HOST_WIDE_INT *int_out)
> +{
> +  REAL_VALUE_TYPE cint;
> +
> +  HOST_WIDE_INT n = real_to_integer (c);
> +  real_from_integer (&cint, VOIDmode, n, SIGNED);
> +  if (real_identical (c, &cint))
> +{
> +  *int_out = n;
> +  return true;
> +}
> +  return false;
> +}
> +
>  /* Write into BUF the maximum representable finite floating-point
> number, (1 - b**-p) * b**emax for a given FP format FMT as a hex
> float string.  LEN is the size of BUF, and the buffer must be large
> diff --git a/gcc/real.h b/gcc/real.h
> index 706859b..e65b526 100644
> --- a/gcc/real.h
> +++ b/gcc/real.h
> @@ -467,7 +467,8 @@ extern void real_round (REAL_VALUE_TYPE *, machine_mode,
>  extern void real_copysign (REAL_VALUE_TYPE *, const REAL_VALUE_TYPE *);
>
>  /* Check whether the real constant value given is an integer.  */
> -extern bool real_isinteger (const REAL_VALUE_TYPE *c, machine_mode mode);
> +extern bool real_isinteger (const REAL_VALUE_TYPE *, machine_mode);
> +extern bool real_isinteger (const REAL_VALUE_TYPE *, HOST_WIDE_INT *);
>
>  /* Write into BUF the maximum representable finite floating-point
> number, (1 - b**-p) * b**emax for a given FP format FMT as a hex
>


Re: [PATCH, 3/5] Handle original loop tree in expand_omp_for_generic

2015-10-12 Thread Bernd Schmidt

On 10/10/2015 01:58 PM, Tom de Vries wrote:


Handle original loop tree in expand_omp_for_generic

2015-09-10  Tom de Vries

PR tree-optimization/67476
* omp-low.c (expand_omp_for_generic): Handle original loop tree.


This one I find slightly confusing.


-  add_bb_to_loop (l2_bb, cont_bb->loop_father);
+  struct loop *loop = l1_bb->loop_father;
+  add_bb_to_loop (l2_bb, entry_bb->loop_father);
add_loop (outer_loop, l0_bb->loop_father);


Looks like a lot of bb's loop_father is being looked at. Are all or some 
of these supposed to be the same? I think I'd like one (appropriately 
named) struct loop * variable for each loop that's involved here. 
There's a comment suggesting that there can be different situations, it 
would be good to expand that to explain how they can arise.



- struct loop *loop = alloc_loop ();
+ loop = alloc_loop ();


Also, I think it would be preferable to not reuse that loop variable 
but make a new one instead.



Bernd


Re: [PATCH, 2/5] Add missing phis in expand_omp_for_generic

2015-10-12 Thread Tom de Vries

On 12/10/15 16:05, Bernd Schmidt wrote:

This seems to be slightly different from the one originally submitted?


Hi Bernd,

As I mentioned here  ( 
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01043.html ),  I've moved 
the ssa-support bit into the !broken_loop condition. I think that's the 
only difference.


Thanks,
- Tom


Re: [PATCH, 4/5] Support DEFPARAMENUM in params.def

2015-10-12 Thread Bernd Schmidt

On 10/10/2015 02:05 PM, Tom de Vries wrote:

On 10/10/15 13:06, Tom de Vries wrote:

OK, I'll repost with the patch split up, as follows:

  1Handle simple latch in expand_omp_for_generic
  2Add missing phis in expand_omp_for_generic
  3Handle original loop tree in expand_omp_for_generic
  4Support DEFPARAMENUM in params.def
  5Add param parloops-schedule



this patch adds support for DEFPARAMENUM in params.def.

Using this support, we're able to define a param parloops-schedule with
values names "static, dynamic, guided, auto, runtime" and default value
"static" like this in params.def:


This and the next one are ok once all the prerequisites are in.


Bernd



[committed. gomp4] pass_dominator_oacc_kernels patch series

2015-10-12 Thread Tom de Vries

Hi,

I've committed the following patch series to the gomp-4_0-branch.

 1  Add pass_dominator::jump_threading_p ()
 2  Add dom_walker::walk_until
 3  Add pass_dominator::sese_mode_p ()
 4  Add skip_stmt parm to pass_dominator::get_sese ()
 5  Add oacc kernels related infra functions
 6  Add pass_dominator_oacc_kernels

The patch series adds a pass pass_dominator_oacc_kernels, which does the 
pass_dominator optimizations (with the exception of jump threading) on 
each oacc kernels region rather than on the whole function.


Bootstrapped and reg-tested on x86_64.

I'll post the patches individually, in reply to this email.

Thanks,
- Tom


[committed, gomp4, 1/6] Add pass_dominator::jump_threading_p ()

2015-10-12 Thread Tom de Vries

On 12/10/15 16:49, Tom de Vries wrote:

Hi,

I've committed the following patch series to the gomp-4_0-branch.

  1Add pass_dominator::jump_threading_p ()
  2Add dom_walker::walk_until
  3Add pass_dominator::sese_mode_p ()
  4Add skip_stmt parm to pass_dominator::get_sese ()
  5Add oacc kernels related infra functions
  6Add pass_dominator_oacc_kernels

The patch series adds a pass pass_dominator_oacc_kernels, which does the
pass_dominator optimizations (with the exception of jump threading) on
each oacc kernels region rather than on the whole function.

Bootstrapped and reg-tested on x86_64.

I'll post the patches individually, in reply to this email.


This patch adds the possibility to pass_dominators to switch off the 
jump threading optimization.


Note that we do not disable threadedge_initialize_values / 
threadedge_finalize_values, since the values stored there are used for 
other optimizations as well.


Thanks,
- Tom
Add pass_dominator::jump_threading_p ()

2015-10-12  Tom de Vries  

	* tree-ssa-dom.c (dom_opt_dom_walker::dom_opt_dom_walker): Add
	jump_threading_p parameters.
	(dom_opt_dom_walker::m_jump_threading_p): New private var.
	(pass_dominator::jump_threading_p): New protected virtual function.
	(pass_dominator::execute): Handle jump_threading_p.
	(dom_opt_dom_walker::before_dom_children)
	(dom_opt_dom_walker::after_dom_children): Handle m_jump_threading_p.
---
 gcc/tree-ssa-dom.c | 109 +++--
 1 file changed, 64 insertions(+), 45 deletions(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index a8b7038..162d9ed 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -492,11 +492,14 @@ class dom_opt_dom_walker : public dom_walker
 public:
   dom_opt_dom_walker (cdi_direction direction,
 		  class const_and_copies *const_and_copies,
-		  class avail_exprs_stack *avail_exprs_stack)
+		  class avail_exprs_stack *avail_exprs_stack,
+		  bool jump_threading_p)
 : dom_walker (direction),
   m_const_and_copies (const_and_copies),
   m_avail_exprs_stack (avail_exprs_stack),
-  m_dummy_cond (NULL) {}
+  m_dummy_cond (NULL),
+  m_jump_threading_p (jump_threading_p)
+  {}
 
   virtual void before_dom_children (basic_block);
   virtual void after_dom_children (basic_block);
@@ -509,6 +512,7 @@ private:
   class avail_exprs_stack *m_avail_exprs_stack;
 
   gcond *m_dummy_cond;
+  bool m_jump_threading_p;
 };
 
 /* Jump threading, redundancy elimination and const/copy propagation.
@@ -544,6 +548,10 @@ public:
   virtual bool gate (function *) { return flag_tree_dom != 0; }
   virtual unsigned int execute (function *);
 
+ protected:
+  /* Return true if pass should perform jump threading.  */
+  virtual bool jump_threading_p (void) { return true; }
+
 }; // class pass_dominator
 
 unsigned int
@@ -578,25 +586,29 @@ pass_dominator::execute (function *fun)
   /* Initialize the value-handle array.  */
   threadedge_initialize_values ();
 
-  /* We need accurate information regarding back edges in the CFG
- for jump threading; this may include back edges that are not part of
- a single loop.  */
-  mark_dfs_back_edges ();
-
-  /* We want to create the edge info structures before the dominator walk
- so that they'll be in place for the jump threader, particularly when
- threading through a join block.
-
- The conditions will be lazily updated with global equivalences as
- we reach them during the dominator walk.  */
-  basic_block bb;
-  FOR_EACH_BB_FN (bb, fun)
-record_edge_info (bb);
+  if (jump_threading_p ())
+{
+  /* We need accurate information regarding back edges in the CFG
+	 for jump threading; this may include back edges that are not part of
+	 a single loop.  */
+  mark_dfs_back_edges ();
+
+  /* We want to create the edge info structures before the dominator walk
+	 so that they'll be in place for the jump threader, particularly when
+	 threading through a join block.
+
+	 The conditions will be lazily updated with global equivalences as
+	 we reach them during the dominator walk.  */
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+	record_edge_info (bb);
+}
 
   /* Recursively walk the dominator tree optimizing statements.  */
   dom_opt_dom_walker walker (CDI_DOMINATORS,
 			 const_and_copies,
-			 avail_exprs_stack);
+			 avail_exprs_stack,
+			 jump_threading_p ());
   walker.walk (fun->cfg->x_entry_block_ptr);
 
   {
@@ -616,10 +628,13 @@ pass_dominator::execute (function *fun)
  duplication and CFG manipulation.  */
   update_ssa (TODO_update_ssa);
 
-  free_all_edge_infos ();
+  if (jump_threading_p ())
+{
+  free_all_edge_infos ();
 
-  /* Thread jumps, creating duplicate blocks as needed.  */
-  cfg_altered |= thread_through_all_blocks (first_pass_instance);
+  /* Thread jumps, creating duplicate blocks as needed.  */
+  cfg_altered |= thread_through_all_blocks (first_pass_instance);
+}

[committed, gomp4, 2/6] Add dom_walker::walk_until

2015-10-12 Thread Tom de Vries

On 12/10/15 16:49, Tom de Vries wrote:

Hi,

I've committed the following patch series to the gomp-4_0-branch.

  1Add pass_dominator::jump_threading_p ()
  2Add dom_walker::walk_until
  3Add pass_dominator::sese_mode_p ()
  4Add skip_stmt parm to pass_dominator::get_sese ()
  5Add oacc kernels related infra functions
  6Add pass_dominator_oacc_kernels

The patch series adds a pass pass_dominator_oacc_kernels, which does the
pass_dominator optimizations (with the exception of jump threading) on
each oacc kernels region rather than on the whole function.

Bootstrapped and reg-tested on x86_64.

I'll post the patches individually, in reply to this email.


This patch adds the ability to walk a part of a dominator tree, rather 
than the whole tree.


Thanks,
- Tom
Add dom_walker::walk_until

2015-10-12  Tom de Vries  

	* domwalk.c (dom_walker::walk): Rename to ...
	(dom_walker::walk_until): ... this.  Add and handle until and
	until_inclusive parameters.
	(dom_walker::walk): Reimplement using dom_walker::walk_until.
	* domwalk.h (dom_walker::walk_until): Declare.
---
 gcc/domwalk.c | 32 +++-
 gcc/domwalk.h |  2 ++
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/gcc/domwalk.c b/gcc/domwalk.c
index bbf9ff8..5fe666e 100644
--- a/gcc/domwalk.c
+++ b/gcc/domwalk.c
@@ -144,11 +144,18 @@ cmp_bb_postorder (const void *a, const void *b)
 }
 
 /* Recursively walk the dominator tree.
-   BB is the basic block we are currently visiting.  */
+   BB is the basic block we are currently visiting.  UNTIL is a basic_block that
+   is the root of a subtree that we won't visit.  If UNTIL_INCLUSIVE, we visit
+   UNTIL, but not its children.  Otherwise don't visit UNTIL and its
+   children.  */
 
 void
-dom_walker::walk (basic_block bb)
+dom_walker::walk_until (basic_block bb, basic_block until, bool until_inclusive)
 {
+  bool skip_self = (bb == until && !until_inclusive);
+  if (skip_self)
+return;
+
   basic_block dest;
   basic_block *worklist = XNEWVEC (basic_block,
    n_basic_blocks_for_fn (cfun) * 2);
@@ -182,9 +189,15 @@ dom_walker::walk (basic_block bb)
 	  worklist[sp++] = NULL;
 
 	  int saved_sp = sp;
-	  for (dest = first_dom_son (m_dom_direction, bb);
-	   dest; dest = next_dom_son (m_dom_direction, dest))
-	worklist[sp++] = dest;
+	  bool skip_children = bb == until && until_inclusive;
+	  if (!skip_children)
+	for (dest = first_dom_son (m_dom_direction, bb);
+		 dest; dest = next_dom_son (m_dom_direction, dest))
+	  {
+		bool skip_child = (dest == until && !until_inclusive);
+		if (!skip_child)
+		  worklist[sp++] = dest;
+	  }
 	  if (m_dom_direction == CDI_DOMINATORS)
 	switch (sp - saved_sp)
 	  {
@@ -218,3 +231,12 @@ dom_walker::walk (basic_block bb)
 }
   free (worklist);
 }
+
+/* Recursively walk the dominator tree.
+   BB is the basic block we are currently visiting.  */
+
+void
+dom_walker::walk (basic_block bb)
+{
+  walk_until (bb, NULL, true);
+}
diff --git a/gcc/domwalk.h b/gcc/domwalk.h
index 71a7c47..71e6075 100644
--- a/gcc/domwalk.h
+++ b/gcc/domwalk.h
@@ -34,6 +34,8 @@ public:
 
   /* Walk the dominator tree.  */
   void walk (basic_block);
+  /* Walk a part of the dominator tree.  */
+  void walk_until (basic_block, basic_block, bool);
 
   /* Function to call before the recursive walk of the dominator children.  */
   virtual void before_dom_children (basic_block) {}
-- 
1.9.1



[committed, gomp4, 3/6] Add pass_dominator::sese_mode_p ()

2015-10-12 Thread Tom de Vries

On 12/10/15 16:49, Tom de Vries wrote:

Hi,

I've committed the following patch series to the gomp-4_0-branch.

  1Add pass_dominator::jump_threading_p ()
  2Add dom_walker::walk_until
  3Add pass_dominator::sese_mode_p ()
  4Add skip_stmt parm to pass_dominator::get_sese ()
  5Add oacc kernels related infra functions
  6Add pass_dominator_oacc_kernels

The patch series adds a pass pass_dominator_oacc_kernels, which does the
pass_dominator optimizations (with the exception of jump threading) on
each oacc kernels region rather than on the whole function.

Bootstrapped and reg-tested on x86_64.

I'll post the patches individually, in reply to this email.


This patch adds the ability for pass_dominator to work on a series of 
sese regions rather than on the entire function.


Thanks,
- Tom
Add pass_dominator::sese_mode_p ()

2015-10-12  Tom de Vries  

	* tree-ssa-dom.c (pass_dominator::jump_threading_p): Handle sese_mode_p.
	(pass_dominator::sese_mode_p, pass_dominator::get_sese): New protected
	virtual function.
	(pass_dominator::execute): Handle sese_mode_p.
---
 gcc/tree-ssa-dom.c | 49 +++--
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 162d9ed..7a1250e 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dom.h"
 #include "gimplify.h"
 #include "tree-cfgcleanup.h"
+#include "cfgcleanup.h"
 
 /* This file implements optimizations on the dominator tree.  */
 
@@ -550,7 +551,17 @@ public:
 
  protected:
   /* Return true if pass should perform jump threading.  */
-  virtual bool jump_threading_p (void) { return true; }
+  virtual bool jump_threading_p (void) { return !sese_mode_p (); }
+
+  /* Return true if pass should visit a series of seses rather than the whole
+ dominator tree.  */
+  virtual bool sese_mode_p (void) { return false; }
+
+  /* In sese mode, return true if there's another sese to visit.  Return the
+ sese to visit in SESE_ENTRY and SESE_EXIT.  */
+  virtual bool get_sese (basic_block *sese_entry ATTRIBUTE_UNUSED,
+			 basic_block *sese_exit ATTRIBUTE_UNUSED)
+{ gcc_unreachable (); }
 
 }; // class pass_dominator
 
@@ -583,11 +594,14 @@ pass_dominator::execute (function *fun)
  LOOPS_HAVE_PREHEADERS won't be needed here.  */
   loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES);
 
-  /* Initialize the value-handle array.  */
-  threadedge_initialize_values ();
+  if (!sese_mode_p ())
+/* Initialize the value-handle array.  */
+threadedge_initialize_values ();
 
   if (jump_threading_p ())
 {
+  gcc_assert (!sese_mode_p ());
+
   /* We need accurate information regarding back edges in the CFG
 	 for jump threading; this may include back edges that are not part of
 	 a single loop.  */
@@ -609,7 +623,29 @@ pass_dominator::execute (function *fun)
 			 const_and_copies,
 			 avail_exprs_stack,
 			 jump_threading_p ());
-  walker.walk (fun->cfg->x_entry_block_ptr);
+  if (!sese_mode_p ())
+walker.walk (fun->cfg->x_entry_block_ptr);
+  else
+{
+  basic_block sese_entry, sese_exit;
+  while (get_sese (&sese_entry, &sese_exit))
+	{
+	  threadedge_initialize_values ();
+	  avail_exprs_stack->push_marker ();
+	  const_and_copies->push_marker ();
+
+	  walker.walk_until (sese_entry, sese_exit, true);
+
+	  avail_exprs_stack->pop_to_marker ();
+	  const_and_copies->pop_to_marker ();
+	  threadedge_finalize_values ();
+
+	  /* KLUDGE: The dom_walker does not allow unreachable blocks when
+	 starting the walk, and during the dom_opt_dom_walker walk we may
+	 produce unreachable blocks, so we need to clean them up here.  */
+	  delete_unreachable_blocks ();
+	}
+}
 
   {
 gimple_stmt_iterator gsi;
@@ -709,8 +745,9 @@ pass_dominator::execute (function *fun)
   delete avail_exprs_stack;
   delete const_and_copies;
 
-  /* Free the value-handle array.  */
-  threadedge_finalize_values ();
+  if (!sese_mode_p ())
+/* Free the value-handle array.  */
+threadedge_finalize_values ();
 
   return 0;
 }
-- 
1.9.1



Re: Make cgraph frequencies more precise

2015-10-12 Thread H.J. Lu
On Sun, Oct 11, 2015 at 11:07 PM, Jan Hubicka  wrote:
> Hi,
> this patch fixes a case of extreme imprecision I noticed while looking into
> profiles of the PHP interpreter.  There is a function that is called 22 times
> and contains the main loop.  Now since the frequency of the entry block is
> dropped to 0, we do not have any information about relative frequencies in the
> colder areas of the function.
>
> Hope all this ugliness will go away with the conversion to sreals soonish.
>
> Profiledbootstrapped/regtested ppc64le-linux, committed.
> * cgraphbuild.c (compute_call_stmt_bb_frequency): Use
> counts when these are more informative.

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67931

-- 
H.J.


Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack

2015-10-12 Thread Lynn A. Boger

Thanks for doing this, Alan.  I agree this looks better to me.

I assume by "etc" you mean you did biarch builds for your bootstraps on BE?

On 10/11/2015 08:07 AM, Alan Modra wrote:

On Sat, Oct 10, 2015 at 11:25:38PM +0200, Andreas Schwab wrote:

"Lynn A. Boger"  writes:


Index: gcc/config/rs6000/sysv4.h
===
--- gcc/config/rs6000/sysv4.h   (revision 228653)
+++ gcc/config/rs6000/sysv4.h   (working copy)
@@ -940,13 +940,15 @@ ncrtn.o%s"
  #undef TARGET_ASAN_SHADOW_OFFSET
  #define TARGET_ASAN_SHADOW_OFFSET rs6000_asan_shadow_offset
  
-/* On ppc64 and ppc64le, split stack is only support for
-   64 bit. */
+/* On ppc64 and ppc64le, split stack is only supported for
+   64 bit targets with a 64 bit compiler. */
  #undef TARGET_CAN_SPLIT_STACK_64BIT
+#if defined (__64BIT__) || defined (__powerpc64__) || defined (__ppc64__)

This doesn't make sense.  A target header cannot use host defines.

Right.  Here's a better fix.  A powerpc-linux biarch compiler can
default to either -m32 or -m64 so we need to take that into account,
and notice both -m32 and -m64 on the gccgo command line.  It's also
possible to build a -m64 only compiler, so in that case we can define
TARGET_CAN_SPLIT_STACK.

Bootstrapped etc. powerpc64-linux, powerpc-linux and
powerpc64le-linux.  OK?

gcc/
* config/rs6000/sysv4.h (TARGET_CAN_SPLIT_STACK_64BIT): Don't define.
* config/rs6000/linux64.h (TARGET_CAN_SPLIT_STACK): Define.
(TARGET_CAN_SPLIT_STACK_64BIT): Define.
gcc/go/
* gospec.c (saw_opt_m32): Rename to..
(is_m64): ..this, initialised by TARGET_CAN_SPLIT_STACK_64BIT.
Update uses.
(lang_specific_driver): Set is_m64 if OPT_m64, clear if OPT_m32.

diff --git a/gcc/config/rs6000/sysv4.h b/gcc/config/rs6000/sysv4.h
index 7b2f9bd..f48af43 100644
--- a/gcc/config/rs6000/sysv4.h
+++ b/gcc/config/rs6000/sysv4.h
@@ -940,14 +940,6 @@ ncrtn.o%s"
  #undef TARGET_ASAN_SHADOW_OFFSET
  #define TARGET_ASAN_SHADOW_OFFSET rs6000_asan_shadow_offset

-/* On ppc64 and ppc64le, split stack is only support for
-   64 bit. */
-#undef TARGET_CAN_SPLIT_STACK_64BIT
-#if TARGET_GLIBC_MAJOR > 2 \
-  || (TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 18)
-#define TARGET_CAN_SPLIT_STACK_64BIT
-#endif
-
  /* This target uses the sysv4.opt file.  */
  #define TARGET_USES_SYSV4_OPT 1

diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
index 9599735..28c83e41 100644
--- a/gcc/config/rs6000/linux64.h
+++ b/gcc/config/rs6000/linux64.h
@@ -245,6 +245,21 @@ extern int dot_symbols;
  #define MULTILIB_DEFAULTS { "m32" }
  #endif

+/* Split stack is only supported for 64 bit, and requires glibc >= 2.18.  */
+#if TARGET_GLIBC_MAJOR * 1000 + TARGET_GLIBC_MINOR >= 2018
+# ifndef RS6000_BI_ARCH
+#  define TARGET_CAN_SPLIT_STACK
+# else
+#  if DEFAULT_ARCH64_P
+/* Supported, and the default is -m64  */
+#   define TARGET_CAN_SPLIT_STACK_64BIT 1
+#  else
+/* Supported, and the default is -m32  */
+#   define TARGET_CAN_SPLIT_STACK_64BIT 0
+#  endif
+# endif
+#endif
+
  #ifndef RS6000_BI_ARCH

  /* 64-bit PowerPC Linux always has a TOC.  */
diff --git a/gcc/go/gospec.c b/gcc/go/gospec.c
index ca3c2d7..fbb55be 100644
--- a/gcc/go/gospec.c
+++ b/gcc/go/gospec.c
@@ -120,8 +120,10 @@ lang_specific_driver (struct cl_decoded_option **in_decoded_options,
/* Whether the -S option was used.  */
bool saw_opt_S = false;

-  /* Whether the -m32 option was used. */
-  bool saw_opt_m32 ATTRIBUTE_UNUSED = false;
+#ifdef TARGET_CAN_SPLIT_STACK_64BIT
+  /* Whether the -m64 option is in force. */
+  bool is_m64 = TARGET_CAN_SPLIT_STACK_64BIT;
+#endif

/* The first input file with an extension of .go.  */
const char *first_go_file = NULL;
@@ -160,7 +162,11 @@ lang_specific_driver (struct cl_decoded_option **in_decoded_options,

  #ifdef TARGET_CAN_SPLIT_STACK_64BIT
case OPT_m32:
- saw_opt_m32 = true;
+ is_m64 = false;
+ break;
+
+   case OPT_m64:
+ is_m64 = true;
  break;
  #endif

@@ -253,7 +259,7 @@ lang_specific_driver (struct cl_decoded_option **in_decoded_options,
  #endif

  #ifdef TARGET_CAN_SPLIT_STACK_64BIT
-  if (!saw_opt_m32)
+  if (is_m64)
  supports_split_stack = 1;
  #endif







[committed, gomp4, 4/6] Add skip_stmt parm to pass_dominator::get_sese ()

2015-10-12 Thread Tom de Vries

On 12/10/15 16:49, Tom de Vries wrote:

Hi,

I've committed the following patch series to the gomp-4_0-branch.

  1Add pass_dominator::jump_threading_p ()
  2Add dom_walker::walk_until
  3Add pass_dominator::sese_mode_p ()
  4Add skip_stmt parm to pass_dominator::get_sese ()
  5Add oacc kernels related infra functions
  6Add pass_dominator_oacc_kernels

The patch series adds a pass pass_dominator_oacc_kernels, which does the
pass_dominator optimizations (with the exception of jump threading) on
each oacc kernels region rather than on the whole function.

Bootstrapped and reg-tested on x86_64.

I'll post the patches individually, in reply to this email.


This patch adds the ability for pass_dominator to skip a stmt while 
optimizing a sese region.


Thanks,
- Tom
Add skip_stmt parm to pass_dominator::get_sese ()

2015-10-12  Tom de Vries  

	* tree-ssa-dom.c (dom_opt_dom_walker::set_skip_stmt): New function.
	(dom_opt_dom_walker::m_skip_stmt): New private var.
	(pass_dominator::get_sese): Add skip_stmt parameters.
	(pass_dominator::execute): Call set_skip_stmt with statement to skip for
	sese.
	(dom_opt_dom_walker::before_dom_children): Handle m_skip_stmt.
---
 gcc/tree-ssa-dom.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 7a1250e..573e6fc 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -504,6 +504,7 @@ public:
 
   virtual void before_dom_children (basic_block);
   virtual void after_dom_children (basic_block);
+  void set_skip_stmt (gimple *skip_stmt) { m_skip_stmt = skip_stmt; }
 
 private:
   void thread_across_edge (edge);
@@ -514,6 +515,7 @@ private:
 
   gcond *m_dummy_cond;
   bool m_jump_threading_p;
+  gimple *m_skip_stmt;
 };
 
 /* Jump threading, redundancy elimination and const/copy propagation.
@@ -558,9 +560,11 @@ public:
   virtual bool sese_mode_p (void) { return false; }
 
   /* In sese mode, return true if there's another sese to visit.  Return the
- sese to visit in SESE_ENTRY and SESE_EXIT.  */
+ sese to visit in SESE_ENTRY and SESE_EXIT.  If a stmt in the sese should
+ not be optimized, return it in SKIP_STMT.  */
   virtual bool get_sese (basic_block *sese_entry ATTRIBUTE_UNUSED,
-			 basic_block *sese_exit ATTRIBUTE_UNUSED)
+			 basic_block *sese_exit ATTRIBUTE_UNUSED,
+			 gimple **skip_stmt ATTRIBUTE_UNUSED)
 { gcc_unreachable (); }
 
 }; // class pass_dominator
@@ -628,8 +632,11 @@ pass_dominator::execute (function *fun)
   else
 {
   basic_block sese_entry, sese_exit;
-  while (get_sese (&sese_entry, &sese_exit))
+  gimple *skip_stmt = NULL;
+  while (get_sese (&sese_entry, &sese_exit, &skip_stmt))
 	{
+	  walker.set_skip_stmt (skip_stmt);
+
 	  threadedge_initialize_values ();
 	  avail_exprs_stack->push_marker ();
 	  const_and_copies->push_marker ();
@@ -1363,7 +1370,12 @@ dom_opt_dom_walker::before_dom_children (basic_block bb)
   m_avail_exprs_stack->pop_to_marker ();
 
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-optimize_stmt (bb, gsi, m_const_and_copies, m_avail_exprs_stack);
+{
+  if (gsi_stmt (gsi) == m_skip_stmt)
+	continue;
+
+  optimize_stmt (bb, gsi, m_const_and_copies, m_avail_exprs_stack);
+}
 
   /* Now prepare to process dominated blocks.  */
   if (m_jump_threading_p)
-- 
1.9.1



[committed. gomp4, 6/6] Add pass_dominator_oacc_kernels

2015-10-12 Thread Tom de Vries

On 12/10/15 16:49, Tom de Vries wrote:

Hi,

I've committed the following patch series to the gomp-4_0-branch.

  1Add pass_dominator::jump_threading_p ()
  2Add dom_walker::walk_until
  3Add pass_dominator::sese_mode_p ()
  4Add skip_stmt parm to pass_dominator::get_sese ()
  5Add oacc kernels related infra functions
  6Add pass_dominator_oacc_kernels

The patch series adds a pass pass_dominator_oacc_kernels, which does the
pass_dominator optimizations (with the exception of jump threading) on
each oacc kernels region rather than on the whole function.

Bootstrapped and reg-tested on x86_64.

I'll post the patches individually, in reply to this email.


This patch:
- factors a class dominator_base out of class pass_dominator,
- declares a new class pass_dominator_oacc_kernels, which operates on
  oacc kernels regions, and
- adds the new pass before pass_parallelize_loops_oacc_kernels in the
  oacc kernels pass group.

Thanks,
- Tom
Add pass_dominator_oacc_kernels

2015-10-12  Tom de Vries  

	* passes.def: Add pass_dominator_oacc_kernels to pass group pass_oacc_kernels.
	Add pass_tree_loop_done before, and pass_tree_loop_init after.
	* tree-pass.h (make_pass_dominator_oacc_kernels): Declare.
	* tree-ssa-dom.c (class dominator_base): New class.  Factor out of ...
	(class pass_dominator): ... here.
	(pass_dominator_oacc_kernels): New pass.
	(make_pass_dominator_oacc_kernels): New function.

	* c-c++-common/goacc/kernels-counter-var-redundant-load.c: New test.
---
 gcc/passes.def |   3 +
 .../goacc/kernels-counter-var-redundant-load.c |  34 ++
 gcc/tree-pass.h|   1 +
 gcc/tree-ssa-dom.c | 117 +
 4 files changed, 134 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c

diff --git a/gcc/passes.def b/gcc/passes.def
index 0498a8b..bc454c0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -98,6 +98,9 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_copy_prop);
 	  NEXT_PASS (pass_scev_cprop);
+	  NEXT_PASS (pass_tree_loop_done);
+	  NEXT_PASS (pass_dominator_oacc_kernels);
+	  NEXT_PASS (pass_tree_loop_init);
   	  NEXT_PASS (pass_parallelize_loops_oacc_kernels);
 	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_tree_loop_done);
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c b/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
new file mode 100644
index 000..84dee69
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
@@ -0,0 +1,34 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-dom_oacc_kernels" } */
+
+#include 
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+COUNTERTYPE
+foo (unsigned int *c)
+{
+  COUNTERTYPE ii;
+
+#pragma acc kernels copyout (c[0:N])
+  {
+for (ii = 0; ii < N; ii++)
+  c[ii] = 1;
+  }
+
+  return ii;
+}
+
+/* We're expecting:
+
+   .omp_data_i_10 = &.omp_data_arr.3;
+   _11 = .omp_data_i_10->ii;
+   *_11 = 0;
+   _15 = .omp_data_i_10->c;
+   c.1_16 = *_15;
+
+   Check that there's only one load from anonymous ssa-name (which we assume to
+   be the one to read c), and that there's no such load for ii.  */
+
+/* { dg-final { scan-tree-dump-times "(?n)\\*_\[0-9\]\[0-9\]*;$" 1 "dom_oacc_kernels" } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 52ba3e5..15c8bf6 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -392,6 +392,7 @@ extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ealias (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_dominator (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_dominator_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_dce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 573e6fc..c7dc7b0 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "tree-cfgcleanup.h"
 #include "cfgcleanup.h"
+#include "omp-low.h"
 
 /* This file implements optimizations on the dominator tree.  */
 
@@ -526,6 +527,31 @@ private:
 
 namespace {
 
+class dominator_base : public gimple_opt_pass
+{
+ protected:
+  dominator_base (pass_data data, gcc::context *ctxt)
+: gimple_opt_pass (data, ctxt)
+  {}
+
+  unsigned int execute (function *);
+
+  /* Return true if pass should perform jump threading.  */
+  virtual bool jump_threading_p (void) { return !sese_mode_p (); }

[committed. gomp4, 5/6] Add oacc kernels related infra functions

2015-10-12 Thread Tom de Vries

On 12/10/15 16:49, Tom de Vries wrote:

Hi,

I've committed the following patch series to the gomp-4_0-branch.

  1Add pass_dominator::jump_threading_p ()
  2Add dom_walker::walk_until
  3Add pass_dominator::sese_mode_p ()
  4Add skip_stmt parm to pass_dominator::get_sese ()
  5Add oacc kernels related infra functions
  6Add pass_dominator_oacc_kernels

The patch series adds a pass pass_dominator_oacc_kernels, which does the
pass_dominator optimizations (with the exception of jump threading) on
each oacc kernels region rather than on the whole function.

Bootstrapped and reg-tested on x86_64.

I'll post the patches individually, in reply to this email.


This patch adds three new infrastructure functions related to oacc kernels
regions:


extern tree get_omp_data_i (basic_block);
extern bool oacc_kernels_region_entry_p (basic_block, gomp_target **);
extern basic_block get_oacc_kernels_region_exit (basic_block);

Thanks,
- Tom
Add oacc kernels related infra functions

2015-10-12  Tom de Vries  

	* omp-low.c (get_oacc_kernels_region_exit, get_omp_data_i): New
	function.
	(oacc_kernels_region_entry_p): New function. Factor out of ...
	(gimple_stmt_omp_data_i_init_p): ... here.
	* omp-low.h (get_oacc_kernels_region_exit, oacc_kernels_region_entry_p)
	(get_omp_data_i): Declare.
---
 gcc/omp-low.c | 102 --
 gcc/omp-low.h |   3 ++
 2 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 2b2c3a7..2289486 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -9981,6 +9981,53 @@ loop_get_oacc_kernels_region_entry (struct loop *loop)
 }
 }
 
+/* Return the oacc kernels region exit corresponding to REGION_ENTRY.  */
+
+basic_block
+get_oacc_kernels_region_exit (basic_block region_entry)
+{
+  gcc_checking_assert (oacc_kernels_region_entry_p (region_entry, NULL));
+
+  bitmap to_visit = BITMAP_ALLOC (NULL);
+  bitmap visited = BITMAP_ALLOC (NULL);
+  bitmap_clear (to_visit);
+  bitmap_clear (visited);
+
+  bitmap_set_bit (to_visit, region_entry->index);
+
+  basic_block bb;
+  while (true)
+{
+  if (bitmap_empty_p (to_visit))
+	{
+	  bb = NULL;
+	  break;
+	}
+
+  unsigned int index = bitmap_first_set_bit (to_visit);
+  bitmap_clear_bit (to_visit, index);
+  bitmap_set_bit (visited, index);
+  bb = BASIC_BLOCK_FOR_FN (cfun, index);
+
+  gimple *last = last_stmt (bb);
+  if (last != NULL
+	  && gimple_code (last) == GIMPLE_OMP_RETURN)
+	break;
+
+  edge_iterator ei;
+  for (ei = ei_start (bb->succs); !ei_end_p (ei); ei_next (&ei))
+	{
+	  edge e = ei_edge (ei);
+	  unsigned int dest_index = e->dest->index;
+	  if (!bitmap_bit_p (visited, dest_index))
+	bitmap_set_bit (to_visit, dest_index);
+	}
+}
+
+  BITMAP_FREE (to_visit);
+  return bb;
+}
+
 /* Encode an oacc launch argument.  This matches the GOMP_LAUNCH_PACK
macro on gomp-constants.h.  We do not check for overflow.  */
 
@@ -15154,6 +15201,31 @@ omp_finish_file (void)
 }
 }
 
+/* Return true if BB is an oacc kernels region entry.  If DIRECTIVE is non-null,
+   return the corresponding kernels directive in *DIRECTIVE.  */
+
+bool
+oacc_kernels_region_entry_p (basic_block bb, gomp_target **directive)
+{
+  /* Check that the last statement in the preceding bb is an oacc kernels
+ stmt.  */
+  if (!single_pred_p (bb))
+return false;
+  gimple *last = last_stmt (single_pred (bb));
+  if (last == NULL
+  || gimple_code (last) != GIMPLE_OMP_TARGET)
+return false;
+  gomp_target *kernels = as_a <gomp_target *> (last);
+
+  bool res = (gimple_omp_target_kind (kernels)
+	  == GF_OMP_TARGET_KIND_OACC_KERNELS);
+
+  if (res && directive)
+*directive = kernels;
+
+  return res;
+}
+
 /* Return true if STMT is copy assignment .omp_data_i = &.omp_data_arr.  */
 
 bool
@@ -15171,15 +15243,8 @@ gimple_stmt_omp_data_i_init_p (gimple *stmt)
   /* Check that the last statement in the preceding bb is an oacc kernels
  stmt.  */
   basic_block bb = gimple_bb (stmt);
-  if (!single_pred_p (bb))
-return false;
-  gimple *last = last_stmt (single_pred (bb));
-  if (last == NULL
-  || gimple_code (last) != GIMPLE_OMP_TARGET)
-return false;
-  gomp_target *kernels = as_a <gomp_target *> (last);
-  if (gimple_omp_target_kind (kernels)
-  != GF_OMP_TARGET_KIND_OACC_KERNELS)
+  gomp_target *kernels;
+  if (!oacc_kernels_region_entry_p (bb, &kernels))
 return false;
 
   /* Get omp_data_arr from the oacc kernels stmt.  */
@@ -15190,6 +15255,25 @@ gimple_stmt_omp_data_i_init_p (gimple *stmt)
   return operand_equal_p (obj, omp_data_arr, 0);
 }
 
+
+/* Return omp_data_i corresponding to the assignment
+   .omp_data_i = &.omp_data_arr in oacc kernels region entry REGION_ENTRY.  */
+
+tree
+get_omp_data_i (basic_block region_entry)
+{
+  if (!single_succ_p (region_entry))
+return NULL_TREE;
+  basic_block bb = single_succ (region_entry);
+  gimple_stmt_iterator gsi = gsi_start_bb (bb);

Re: Possible patch for PR fortran/67806

2015-10-12 Thread Steve Kargl
On Sun, Oct 11, 2015 at 10:18:48PM -0700, Louis Krupp wrote:
> The problem involves a derived type with a character component declared 
> CHARACTER(NULL()) or CHARACTER(NULL(n)), where mold argument n is an integer 
> pointer.
> 

I was looking at 67805 this weekend, which is somewhat
related to this PR.  AFAICT, gfortran does no checking
for n in CHARACTER(LEN=n). n should be a scalar-int-expr
(that is scalar INTEGER expression).  NULL() is not
an integer, and NULL(n) is a disassociated pointer.  So,
I believe neither can appear in an scalar-int-expr.

Note also that there is a table in 13.7.125 showing where
NULL() can appear.

My patch for 67805 leads to one regression that I've been
unable to resolve.

-- 
Steve


[PATCH] v4 of diagnostic_show_locus and rich_location

2015-10-12 Thread David Malcolm
On Sun, 2015-09-27 at 02:55 +0200, Dodji Seketeli wrote:
> [Note to libcpp, C, and Fortran maintainers: we still need your input :-)]

Updated version of patch attached (v4); a diff relative to v3 can be
seen at:
https://dmalcolm.fedorapeople.org/gcc/2015-10-12/0003-Eliminate-special-casing-for-Fortran.patch

v4 eliminates the lingering parts of the old implementation of
diagnostic_show_locus, porting the Fortran frontend to use the new
implementation.

In the process I discovered an issue with the Fortran frontend: some of
the caret locations appear to have an off-by-one error.
For example, in gcc/testsuite/gfortran.dg/associate_5.f03, the old
implementation would issue this diagnostic:

associate_5.f03:33:6:

   y = 5 ! { dg-error "variable definition context" }
  1
associate_5.f03:32:20:

 ASSOCIATE (y => x) ! { dg-error "variable definition context" }
2
Error: Associate-name ‘y’ can not appear in a variable definition
context (assignment) at (1) because its target at (2) can not, either

Note how the carets 1 and 2 appear one column before the "y" and the "x"
that they refer to.

This seems to be a pre-existing bug in the Fortran FE, which I've now
filed as PR fortran/67936.

On porting the Fortran FE to fully use the new implementation of
diagnostic_show_locus, I found that the "1" caret in the above
disappeared, because v3 of the layout printer suppressed carets and
underlines appearing within the leading whitespace before the text in
its line.  So I updated that to only suppress underlines in such a
location, and not carets, to ensure that we at least faithfully print
both carets, at the given (erroneous) locations.   I added test coverage
for this (test_caret_on_leading_whitespace).

The existing Fortran testcases for diagnostics with multiple locations
don't seem to verify the -fdiagnostics-show-caret case; I visually
inspected the results, but perhaps we could add some automated test
coverage there using the dg-{begin|end}-multiline directives from
earlier in this kit (which is now in trunk).  I don't know if adding
such test coverage is necessary for acceptance of this patch though.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.  OK for
trunk?

Some other comments inline.

> Hello,
> 
> David Malcolm  writes:
> 
> [...]
> 
> > Here's the revised comment I put in the attached patch:
> 
> [...]
> 
> > +   The class caches the lookup of the color codes for the above.
> > +
> > +   The class also has responsibility for tracking which of the above is
> > +   active, filtering out unnecessary changes.  This allows 
> > layout::print_line
> > +   to simply request a colorization code for *every* character it prints
> > +   thorough this class, and have the filtering be done for it here.
> 
> You probably meant "*through* this class" ?

Yes, thanks.  Fixed.

> > */
> 
> > Hopefully that comment explains the possible states the colorizer can
> > have.
> 
> Yes it does, great comment, thank you.
> 
> 
> > FWIW I have a follow-up patch to add support for fix-it hints, so they
> > might be another kind of colorization state.
> > (see https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00732.html for the
> > earlier version of said patch, in v1 of the kit).
> 
> Yeah, I'll comment on that one separatly.
> 
> >> Also, I am thinking that there should maybe be a layout::state type,
> >> which would have two notional properties (for now): range_index and
> >> draw_caret_p. So that this function:
> >> 
> >> +bool
> >> +layout::get_state_at_point (/* Inputs.  */
> >> +  int row, int column,
> >> +  int first_non_ws, int last_non_ws,
> >> +  /* Outputs.  */
> >> +  int *out_range_idx,
> >> +  bool *out_draw_caret_p)
> >> 
> >> Would take just one output parameter, e.g, a reference to
> >> layout::state.
> >
> > Fixed, though I called it "struct point_state", given that it's coming
> > from get_state_at_point.  I passed it by pointer, since AFAIK our coding
> > standards don't yet approve of the use of references in the codebase
> > (outside of places where we need them e.g. container classes).
> 
> Great.  Thanks.
> 
> >
> > I also added a unit test for a rich_location with two caret locations
> > (mimicking one of the Fortran examples), to give us coverage for this
> > case:
> >
> > +void test_multiple_carets (void)
> > +{
> > +#if 0
> > +   x = x + y /* { dg-warning "8: test" } */
> > +/* { dg-begin-multiline-output "" }
> > +x = x + y
> > +A   B
> > +   { dg-end-multiline-output "" } */
> > +#endif
> > +}
> >
> > where the "A" and "B" as caret chars are coming from new code in the
> > show_locus unittest plugin.
> 
> Yeah, saw that.  Excellent, thanks.
> 
> [...]
> 
> >> +  if (0)
> >> +show_ruler (context, line_width, m_x_offset);
> >> 
> >> This should probably be removed from the final code to be committed.
> >
> > FWIW, the ruler is very helpful to me when de

[PATCH] Improve FSM threader to handle compiler temporaries too

2015-10-12 Thread Jeff Law


The FSM jump threader currently will not handle threading for compiler 
generated temporaries.


I discovered this when looking at what tests regress if I remove the 
ability of the old threader to thread across backedges and why the FSM 
threader doesn't handle them.


bitmap.c has a multitude of cases that the FSM bits can now optimize.  I 
took one and let multidelta loose on it resulting in the included testcase.


I wouldn't be surprised if the testcase ultimately turns out to be 
dependent on BRANCH_COST.  I'll keep an eye on gcc-testresults to see if 
the test needs adjustment for other targets.


Bootstrapped & regression tested on x86_64-linux-gnu.  Installed on the 
trunk.



Jeff
commit 97d71bc09d2198072bed76ba36e988584f857bb1
Author: Jeff Law 
Date:   Mon Oct 12 10:24:45 2015 -0600

[PATCH] Improve FSM threader to handle compiler temporaries too

* tree-ssa-threadbackward.c (fsm_find_thread_path): Remove
restriction that traced SSA_NAME is a user variable.

* gcc.dg/tree-ssa/ssa-dom-thread-11.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c34e084..32ec554 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2015-10-12  Jeff Law  
+
+   * tree-ssa-threadbackward.c (fsm_find_thread_path): Remove
+   restriction that traced SSA_NAME is a user variable.
+
 2015-10-12  Tom de Vries  
 
PR tree-optimization/67476
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index f4b7d26..89f3363 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2015-10-12  Jeff Law  
+
+   * gcc.dg/tree-ssa/ssa-dom-thread-11.c: New test.
+
 2015-10-12  Ville Voutilainen  
 
PR c++/58566
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c
new file mode 100644
index 000..03d0334
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp2-details" } */
+/* { dg-final { scan-tree-dump "FSM" "vrp2" } } */
+
+void abort (void);
+typedef struct bitmap_head_def *bitmap;
+typedef const struct bitmap_head_def *const_bitmap;
+typedef struct bitmap_obstack
+{
+  struct bitmap_obstack *next;
+  unsigned int indx;
+}
+bitmap_element;
+typedef struct bitmap_head_def
+{
+  bitmap_element *first;
+}
+bitmap_head;
+static __inline__ unsigned char
+bitmap_elt_ior (bitmap dst, bitmap_element * dst_elt,
+   bitmap_element * dst_prev, const bitmap_element * a_elt,
+   const bitmap_element * b_elt)
+{
+  ((void) (!(a_elt || b_elt) ? abort (), 0 : 0));
+}
+
+unsigned char
+bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
+ const_bitmap kill)
+{
+  bitmap_element *dst_elt = dst->first;
+  const bitmap_element *a_elt = a->first;
+  const bitmap_element *b_elt = b->first;
+  const bitmap_element *kill_elt = kill->first;
+  bitmap_element *dst_prev = ((void *) 0);
+  while (a_elt || b_elt)
+{
+  if (b_elt && kill_elt && kill_elt->indx == b_elt->indx
+ && (!a_elt || a_elt->indx >= b_elt->indx));
+  else
+   {
+ bitmap_elt_ior (dst, dst_elt, dst_prev, a_elt, b_elt);
+ if (a_elt && b_elt && a_elt->indx == b_elt->indx)
+   ;
+ else if (a_elt && (!b_elt || a_elt->indx <= b_elt->indx))
+   a_elt = a_elt->next;
+   }
+}
+}
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 0012aa3..ff6481c 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -70,7 +70,7 @@ fsm_find_thread_path (basic_block start_bb, basic_block 
end_bb,
   return false;
 }
 
-/* We trace the value of the variable EXPR back through any phi nodes looking
+/* We trace the value of the SSA_NAME EXPR back through any phi nodes looking
for places where it gets a constant value and save the path.  Stop after
having recorded MAX_PATHS jump threading paths.  */
 
@@ -80,11 +80,10 @@ fsm_find_control_statement_thread_paths (tree expr,
 vec *&path,
 bool seen_loop_phi)
 {
-  tree var = SSA_NAME_VAR (expr);
   gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
   basic_block var_bb = gimple_bb (def_stmt);
 
-  if (var == NULL || var_bb == NULL)
+  if (var_bb == NULL)
 return;
 
   /* For the moment we assume that an SSA chain only contains phi nodes, and


Re: [PATCH] v4 of diagnostic_show_locus and rich_location

2015-10-12 Thread Manuel López-Ibáñez
On 12 October 2015 at 16:44, David Malcolm  wrote:
> v4 of the patch does the conversion of Fortran, and eliminates the
> adaptation layer.  No partial transitions here!
>
> Manu: I hope this addresses your concerns.

Yes, it looks great. I don't understand how this

-   and for two locations that do not fit in the same locus line:
-
-   [name]:[locus]: Error: (1)
-   [name]:[locus2]: Error: Some error at (1) and (2)
+   [locus of primary range]: Error: Some error at (1) and (2)


passes the Fortran regression testsuite since the testcases normally
try to match the two loci separately, but I guess you figured out a
way to make it work and I must admit I did not have the time to read
the patch in deep detail. But it is a bit strange that you also
deleted this part:

-   With -fdiagnostic-show-caret (the default) and for valid locations,
-   it prints for one location:
+   With -fdiagnostic-show-caret (the default) it prints:

-   [locus]:
+   [locus of primary range]:

   some code
  1
Error: Some error at (1)

-   for two locations that fit in the same locus line:
+  With -fno-diagnostic-show-caret or if the primary range is not
+  valid, it prints:

-   [locus]:
-
- some code and some more code
-1   2
-   Error: Some error at (1) and (2)
-
-   and for two locations that do not fit in the same locus line:
-
-   [locus]:
-
- some code
-1
-   [locus2]:
-
- some other code
-   2
-   Error: Some error at (1) and (2)
-

which should work the same before and after your patch. Independently
of whether the actual logic moved into some new mechanism in the new
rich locations world, this seems like useful info to keep in
fortran/error.c.

Cheers,

Manuel.


Re: [PATCH, 3/5] Handle original loop tree in expand_omp_for_generic

2015-10-12 Thread Tom de Vries

On 12/10/15 16:11, Bernd Schmidt wrote:

On 10/10/2015 01:58 PM, Tom de Vries wrote:


Handle original loop tree in expand_omp_for_generic

2015-09-10  Tom de Vries

PR tree-optimization/67476
* omp-low.c (expand_omp_for_generic): Handle original loop tree.


This one I find slightly confusing.


-  add_bb_to_loop (l2_bb, cont_bb->loop_father);
+  struct loop *loop = l1_bb->loop_father;
+  add_bb_to_loop (l2_bb, entry_bb->loop_father);
add_loop (outer_loop, l0_bb->loop_father);


Looks like a lot of bb's loop_father is being looked at. Are all or some
of these supposed to be the same? I think I'd like one (appropriately
named) struct loop * variable for each loop that's involved here.


Done.


There's a comment suggesting that there can be different situations, it
would be good to expand that to explain how they can arise.


-  struct loop *loop = alloc_loop ();
+  loop = alloc_loop ();


Also, I think it would be preferrable to not reuse that loop variable
but make a new one instead.



Does this version look better?

Thanks,
- Tom
Handle original loop tree in expand_omp_for_generic

2015-09-12  Tom de Vries  

	PR tree-optimization/67476
	* omp-low.c (expand_omp_for_generic): Handle original loop tree.
---
 gcc/omp-low.c | 30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b2a93b9..b957428 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6439,7 +6439,6 @@ expand_omp_for_generic (struct omp_region *region,
   remove_edge (e);
 
   make_edge (cont_bb, l2_bb, EDGE_FALSE_VALUE);
-  add_bb_to_loop (l2_bb, cont_bb->loop_father);
   e = find_edge (cont_bb, l1_bb);
   if (e == NULL)
 	{
@@ -6516,17 +6515,30 @@ expand_omp_for_generic (struct omp_region *region,
   set_immediate_dominator (CDI_DOMINATORS, l1_bb,
 			   recompute_dominator (CDI_DOMINATORS, l1_bb));
 
-  struct loop *outer_loop = alloc_loop ();
-  outer_loop->header = l0_bb;
-  outer_loop->latch = l2_bb;
-  add_loop (outer_loop, l0_bb->loop_father);
+  /* We enter expand_omp_for_generic with a loop.  This original loop may
+	 have its own loop struct, or it may be part of an outer loop struct
+	 (which may be the fake loop).  */
+  struct loop *outer_loop = entry_bb->loop_father;
+  bool orig_loop_has_loop_struct = l1_bb->loop_father != outer_loop;
 
-  if (!gimple_omp_for_combined_p (fd->for_stmt))
+  add_bb_to_loop (l2_bb, outer_loop);
+
+  /* We've added a new loop around the original loop.  Allocate the
+	 corresponding loop struct.  */
+  struct loop *new_loop = alloc_loop ();
+  new_loop->header = l0_bb;
+  new_loop->latch = l2_bb;
+  add_loop (new_loop, outer_loop);
+
+  if (/* If we already have a loop struct for the original loop, don't
+	 allocate a new one.  */
+	  !orig_loop_has_loop_struct
+	  && !gimple_omp_for_combined_p (fd->for_stmt))
 	{
-	  struct loop *loop = alloc_loop ();
-	  loop->header = l1_bb;
+	  struct loop *orig_loop = alloc_loop ();
+	  orig_loop->header = l1_bb;
 	  /* The loop may have multiple latches.  */
-	  add_loop (loop, outer_loop);
+	  add_loop (orig_loop, new_loop);
 	}
 }
 }
-- 
1.9.1



[PATCH, sparc]: Use ROUND_UP and ROUND_DOWN macros

2015-10-12 Thread Uros Bizjak
Two functional changes I'd like to point out:

 /* ALIGN FRAMES on double word boundaries */
-#define SPARC_STACK_ALIGN(LOC) \
-  (TARGET_ARCH64 ? (((LOC)+15) & ~15) : (((LOC)+7) & ~7))
+#define SPARC_STACK_ALIGN(LOC) ROUND_UP ((LOC), UNITS_PER_WORD * 2)

The one above uses UNITS_PER_WORD in the stack alignment calculation.

   /* Always preserve double-word alignment.  */
-  offset = (offset + 8) & -8;
+  offset = ROUND_UP (offset, 8);

The one above looks like an off-by-one bug, but this needs a confirmation.

2015-10-12  Uros Bizjak  

* config/sparc/sparc.h (SPARC_STACK_ALIGN): Implement using
ROUND_UP macro and UNITS_PER_WORD * 2.
* config/sparc/sparc.c (sparc_compute_frame_size):
Use ROUND_UP and ROUND_DOWN macros where applicable.
(function_arg_record_value, function_arg_record_value_1)
(function_arg_record_value_1): Ditto.
(emit_save_or_restore_regs): Use ROUND_UP to preserve offset
alignment to double-word.
(sparc_gimplify_va_arg): Use ROUND_UP to calculate rsize.
(sparc_emit_probe_stack_range): Use ROUND_DOWN to calculate
rounded_size.

Tested by building a crosscompiler to sparc-linux-gnu. Due to the two
above changes, can someone please bootstrap and regression test this
patch properly on sparc targets?

OK for mainline if bootstrap+regtest show no problems?

Uros.
Index: config/sparc/sparc.c
===
--- config/sparc/sparc.c(revision 228726)
+++ config/sparc/sparc.c(working copy)
@@ -4981,11 +4981,11 @@ sparc_compute_frame_size (HOST_WIDE_INT size, int
   else
 {
   /* We subtract STARTING_FRAME_OFFSET, remember it's negative.  */
-  apparent_frame_size = (size - STARTING_FRAME_OFFSET + 7) & -8;
+  apparent_frame_size = ROUND_UP (size - STARTING_FRAME_OFFSET, 8);
   apparent_frame_size += n_global_fp_regs * 4;
 
   /* We need to add the size of the outgoing argument area.  */
-  frame_size = apparent_frame_size + ((args_size + 7) & -8);
+  frame_size = apparent_frame_size + ROUND_UP (args_size, 8);
 
   /* And that of the register window save area.  */
   frame_size += FIRST_PARM_OFFSET (cfun->decl);
@@ -5116,7 +5116,7 @@ sparc_emit_probe_stack_range (HOST_WIDE_INT first,
 
   /* Step 1: round SIZE to the previous multiple of the interval.  */
 
-  rounded_size = size & -PROBE_INTERVAL;
+  rounded_size = ROUND_DOWN (size, PROBE_INTERVAL);
   emit_move_insn (g4, GEN_INT (rounded_size));
 
 
@@ -5317,7 +5317,7 @@ emit_save_or_restore_regs (unsigned int low, unsig
emit_move_insn (gen_rtx_REG (mode, regno), mem);
 
  /* Always preserve double-word alignment.  */
- offset = (offset + 8) & -8;
+ offset = ROUND_UP (offset, 8);
}
 }
 
@@ -6439,8 +6439,8 @@ function_arg_record_value_1 (const_tree type, HOST
  unsigned int startbit, endbit;
  int intslots, this_slotno;
 
- startbit = parms->intoffset & -BITS_PER_WORD;
- endbit   = (bitpos + BITS_PER_WORD - 1) & -BITS_PER_WORD;
+ startbit = ROUND_DOWN (parms->intoffset, BITS_PER_WORD);
+ endbit   = ROUND_UP (bitpos, BITS_PER_WORD);
 
  intslots = (endbit - startbit) / BITS_PER_WORD;
  this_slotno = parms->slotno + parms->intoffset
@@ -6495,8 +6495,8 @@ function_arg_record_value_3 (HOST_WIDE_INT bitpos,
   intoffset = parms->intoffset;
   parms->intoffset = -1;
 
-  startbit = intoffset & -BITS_PER_WORD;
-  endbit = (bitpos + BITS_PER_WORD - 1) & -BITS_PER_WORD;
+  startbit = ROUND_DOWN (intoffset, BITS_PER_WORD);
+  endbit = ROUND_UP (bitpos, BITS_PER_WORD);
   intslots = (endbit - startbit) / BITS_PER_WORD;
   this_slotno = parms->slotno + intoffset / BITS_PER_WORD;
 
@@ -6669,8 +6669,8 @@ function_arg_record_value (const_tree type, machin
   unsigned int startbit, endbit;
   int intslots, this_slotno;
 
-  startbit = parms.intoffset & -BITS_PER_WORD;
-  endbit = (typesize*BITS_PER_UNIT + BITS_PER_WORD - 1) & -BITS_PER_WORD;
+  startbit = ROUND_DOWN (parms.intoffset, BITS_PER_WORD);
+  endbit = ROUND_UP (typesize*BITS_PER_UNIT, BITS_PER_WORD);
   intslots = (endbit - startbit) / BITS_PER_WORD;
   this_slotno = slotno + parms.intoffset / BITS_PER_WORD;
 
@@ -7451,7 +7451,7 @@ sparc_gimplify_va_arg (tree valist, tree type, gim
 {
   indirect = false;
   size = int_size_in_bytes (type);
-  rsize = (size + UNITS_PER_WORD - 1) & -UNITS_PER_WORD;
+  rsize = ROUND_UP (size, UNITS_PER_WORD);
   align = 0;
 
   if (TARGET_ARCH64)
Index: config/sparc/sparc.h
===
--- config/sparc/sparc.h(revision 228726)
+++ config/sparc/sparc.h(working copy)
@@ -510,8 +510,7 @@ extern enum cmodel sparc_cmodel;
 #define SPARC_STACK_BOUNDARY_HACK (TARGET_ARCH64 && TARGET_STACK_BIAS)
 
 /* ALIGN F

Re: Test for __cxa_thread_atexit_impl when cross-compiling libstdc++ for GNU targets

2015-10-12 Thread Bernd Schmidt

On 10/10/2015 01:18 AM, Joseph Myers wrote:

I noticed that when testing glibc with a cross compiler I got

UNSUPPORTED: nptl/tst-thread_local1

because the libstdc++-v3 configuration for cross compiling defaulted
to __cxa_thread_atexit_impl not being available.  This patch fixes
GLIBCXX_CROSSCONFIG to run the same test (for the case covering
targets with glibc) for __cxa_thread_atexit_impl as for native
compilation, just as it runs most of the other tests done for native
compilation (for these targets, it's not possible to build libstdc++
without already having built libc, so link tests are OK).

Tested with no regressions for cross to arm-none-linux-gnueabi.  OK to
commit?

2015-10-09  Joseph Myers  

* crossconfig.m4 (GLIBCXX_CROSSCONFIG) <*-linux* | *-uclinux* |
*-gnu* | *-kfreebsd*-gnu | *-knetbsd*-gnu | *-cygwin*>: Check for
__cxa_thread_atexit_impl.
* configure: Regenerate.


That looks ok.


Index: libstdc++-v3/crossconfig.m4
===
--- libstdc++-v3/crossconfig.m4 (revision 228601)
+++ libstdc++-v3/crossconfig.m4 (working copy)
@@ -156,6 +156,7 @@
  GLIBCXX_CHECK_STDLIB_SUPPORT
  AC_DEFINE(_GLIBCXX_USE_RANDOM_TR1)
  GCC_CHECK_TLS
+AC_CHECK_FUNCS(__cxa_thread_atexit_impl)
  AM_ICONV


A similar sequence of tests also occurs for *-aix*. I don't suppose the 
function is likely to exist there or on other non-glibc targets?



Bernd


[committed, gomp4] Handle sequential code in kernels region patch series

2015-10-12 Thread Tom de Vries

Hi,

I've committed the following patch series.

 1  Add get_bbs_in_oacc_kernels_region
 2  Handle sequential code in kernels region
 3  Handle sequential code in kernels region - Testcases

The patch series adds detection of whether sequential code (that is, 
code in the oacc kernels region before and after the loop that is to be 
parallelized) is safe to execute in parallel.


Bootstrapped and reg-tested on x86_64.

I'll post the patches individually, in reply to this email.

Thanks,
- Tom


Re: [committed, gomp4] Handle sequential code in kernels region patch series

2015-10-12 Thread Tom de Vries

On 12/10/15 19:12, Tom de Vries wrote:

Hi,

I've committed the following patch series.

  1Add get_bbs_in_oacc_kernels_region
  2Handle sequential code in kernels region
  3Handle sequential code in kernels region - Testcases

The patch series adds detection of whether sequential code (that is,
code in the oacc kernels region before and after the loop that is to be
parallelized) is safe to execute in parallel.

Bootstrapped and reg-tested on x86_64.

I'll post the patches individually, in reply to this email.


This patch adds an oacc kernels infrastructure function:

extern vec<basic_block> get_bbs_in_oacc_kernels_region (basic_block,
basic_block);

Thanks,
- Tom
Add get_bbs_in_oacc_kernels_region

2015-10-12  Tom de Vries  

	* omp-low.c (get_bbs_in_oacc_kernels_region): New function.
	* omp-low.h (get_bbs_in_oacc_kernels_region): Declare.
---
 gcc/omp-low.c | 40 
 gcc/omp-low.h |  2 ++
 2 files changed, 42 insertions(+)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 2289486..f6e0247 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -9959,6 +9959,46 @@ mark_loops_in_oacc_kernels_region (basic_block region_entry,
   loop->in_oacc_kernels_region = true;
 }
 
+/* Return blocks in oacc kernels region delimited by REGION_ENTRY and
+   REGION_EXIT.  */
+
+vec<basic_block>
+get_bbs_in_oacc_kernels_region (basic_block region_entry,
+ basic_block region_exit)
+{
+  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
+  unsigned di;
+  basic_block bb;
+
+  bitmap_clear (excludes_bitmap);
+
+  /* Get all the blocks dominated by the region entry.  That will include the
+ entire region.  */
+  vec<basic_block> dominated
+= get_all_dominated_blocks (CDI_DOMINATORS, region_entry);
+
+  bitmap_set_bit (excludes_bitmap, region_entry->index);
+
+  /* Exclude all the blocks which are not in the region: the blocks dominated by
+ the region exit.  */
+  if (region_exit != NULL)
+{
+  vec<basic_block> excludes
+	= get_all_dominated_blocks (CDI_DOMINATORS, region_exit);
+  FOR_EACH_VEC_ELT (excludes, di, bb)
+	bitmap_set_bit (excludes_bitmap, bb->index);
+  bitmap_clear_bit (excludes_bitmap, region_exit->index);
+}
+
+  vec<basic_block> bbs = vNULL;
+
+  FOR_EACH_VEC_ELT (dominated, di, bb)
+if (!bitmap_bit_p (excludes_bitmap, bb->index))
+  bbs.safe_push (bb);
+
+  return bbs;
+}
+
 /* Return the entry basic block of the oacc kernels region containing LOOP.  */
 
 basic_block
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index 62a7d4a..9f09bbc 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -34,6 +34,8 @@ extern tree get_omp_data_i (basic_block);
 extern bool oacc_kernels_region_entry_p (basic_block, gomp_target **);
 extern basic_block get_oacc_kernels_region_exit (basic_block);
 extern basic_block loop_get_oacc_kernels_region_entry (struct loop *);
+extern vec<basic_block> get_bbs_in_oacc_kernels_region (basic_block,
+			basic_block);
 extern void replace_oacc_fn_attrib (tree, tree);
 extern tree build_oacc_routine_dims (tree);
 extern tree get_oacc_fn_attrib (tree);
-- 
1.9.1



[PATCH, mips]: Use ROUND_UP and ROUND_DOWN macros

2015-10-12 Thread Uros Bizjak
Fairly trivial patch that introduces no functional changes.

* config/mips/mips.h (MIPS_STACK_ALIGN): Implement using
ROUND_UP macro.
* config/mips/mips.c (mips_setup_incoming_varargs): Use
ROUND_DOWN to calculate off.
(mips_gimplify_va_arg_expr): Use ROUND_UP to calculate rsize.
(mips_emit_probe_stack_range): Use ROUND_DOWN to calculate
rounded_size.

Tested by building a crosscompiler to mips-linux-gnu.

OK for mainline?

Uros.
Index: config/mips/mips.c
===
--- config/mips/mips.c  (revision 228726)
+++ config/mips/mips.c  (working copy)
@@ -6080,7 +6080,7 @@ mips_setup_incoming_varargs (cumulative_args_t cum
  /* Set OFF to the offset from virtual_incoming_args_rtx of
 the first float register.  The FP save area lies below
 the integer one, and is aligned to UNITS_PER_FPVALUE bytes.  */
- off = (-gp_saved * UNITS_PER_WORD) & -UNITS_PER_FPVALUE;
+ off = ROUND_DOWN (-gp_saved * UNITS_PER_WORD, UNITS_PER_FPVALUE);
  off -= fp_saved * UNITS_PER_FPREG;
 
  mode = TARGET_SINGLE_FLOAT ? SFmode : DFmode;
@@ -6444,7 +6444,7 @@ mips_gimplify_va_arg_expr (tree valist, tree type,
unshare_expr (valist), f_gtop, NULL_TREE);
  off = build3 (COMPONENT_REF, TREE_TYPE (f_goff),
unshare_expr (valist), f_goff, NULL_TREE);
- rsize = (size + UNITS_PER_WORD - 1) & -UNITS_PER_WORD;
+ rsize = ROUND_UP (size, UNITS_PER_WORD);
  if (rsize > UNITS_PER_WORD)
{
  /* [1] Emit code for: off &= -rsize.  */
@@ -11320,7 +11320,7 @@ mips_emit_probe_stack_range (HOST_WIDE_INT first,
 
   /* Step 1: round SIZE to the previous multiple of the interval.  */
 
-  rounded_size = size & -PROBE_INTERVAL;
+  rounded_size = ROUND_DOWN (size, PROBE_INTERVAL);
 
 
   /* Step 2: compute initial and final value of the loop counter.  */
Index: config/mips/mips.h
===
--- config/mips/mips.h  (revision 228726)
+++ config/mips/mips.h  (working copy)
@@ -2486,7 +2486,7 @@ typedef struct mips_args {
 /* Treat LOC as a byte offset from the stack pointer and round it up
to the next fully-aligned offset.  */
 #define MIPS_STACK_ALIGN(LOC) \
-  (TARGET_NEWABI ? ((LOC) + 15) & -16 : ((LOC) + 7) & -8)
+  (TARGET_NEWABI ? ROUND_UP ((LOC), 16) : ROUND_UP ((LOC), 8))
 
 
 /* Output assembler code to FILE to increment profiler label # LABELNO


Re: [PATCH, 3/5] Handle original loop tree in expand_omp_for_generic

2015-10-12 Thread Bernd Schmidt

Does this version look better?


In terms of clarity, yes. Only one thing:


+  if (/* If we already have a loop struct for the original loop, don't
+allocate a new one.  */
+ !orig_loop_has_loop_struct


Don't really like the formatting with this comment. I'd pull it in front 
of the if statement, and change it to

 /* Allocate a loop structure for the original loop unless we already
had one.  */

Ok with that change.


Bernd


[committed, gomp4, 2/3] Handle sequential code in kernels region

2015-10-12 Thread Tom de Vries

On 12/10/15 19:12, Tom de Vries wrote:

Hi,

I've committed the following patch series.

  1Add get_bbs_in_oacc_kernels_region
  2Handle sequential code in kernels region
  3Handle sequential code in kernels region - Testcases

The patch series adds detection of whether sequential code (that is,
code in the oacc kernels region before and after the loop that is to be
parallelized) is safe to execute in parallel.

Bootstrapped and reg-tested on x86_64.

I'll post the patches individually, in reply to this email.


This patch checks in parloops, for each non-loop stmt in the oacc 
kernels region, that it's not a load aliasing with a store anywhere in 
the region, and vice versa.


An exception is made for loads and stores for reductions, which are 
later on transformed into an atomic update.


Thanks,
- Tom
Handle sequential code in kernels region

2015-10-12  Tom de Vries  

	* omp-low.c (lower_omp_for): Don't call lower_oacc_head_tail for oacc
	kernels regions.
	* tree-parloops.c (try_create_reduction_list): Initialize keep_res
	field.
	(dead_load_p, ref_conflicts_with_region, oacc_entry_exit_ok_1)
	(oacc_entry_exit_ok): New function.
	(parallelize_loops): Call oacc_entry_exit_ok.
---
 gcc/omp-low.c   |   3 +-
 gcc/tree-parloops.c | 245 
 2 files changed, 247 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f6e0247..e700dd1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11949,7 +11949,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   /* Once lowered, extract the bounds and clauses.  */
   extract_omp_for_data (stmt, &fd, NULL);
 
-  if (is_gimple_omp_oacc (ctx->stmt))
+  if (is_gimple_omp_oacc (ctx->stmt)
+  && !ctx_in_oacc_kernels_region (ctx))
 lower_oacc_head_tail (gimple_location (stmt),
 			  gimple_omp_for_clauses (stmt),
 			  &oacc_head, &oacc_tail, ctx);
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 4b67793..d4eb32a 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "tree-ssa.h"
 #include "params.h"
+#include "tree-ssa-alias.h"
+#include "tree-eh.h"
 
 /* This pass tries to distribute iterations of loops into several threads.
The implementation is straightforward -- for each loop we test whether its
@@ -2672,6 +2674,7 @@ try_create_reduction_list (loop_p loop,
 			 "  FAILED: it is not a part of reduction.\n");
 	  return false;
 	}
+	  red->keep_res = phi;
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "reduction phi is  ");
@@ -2764,6 +2767,240 @@ try_create_reduction_list (loop_p loop,
   return true;
 }
 
+/* Return true if STMT is a load of which the result is unused, and can be
+   safely deleted.  */
+
+static bool
+dead_load_p (gimple *stmt)
+{
+  if (!gimple_assign_load_p (stmt))
+return false;
+
+  tree lhs = gimple_assign_lhs (stmt);
+  return (TREE_CODE (lhs) == SSA_NAME
+	  && has_zero_uses (lhs)
+	  && !gimple_has_side_effects (stmt)
+	  && !stmt_could_throw_p (stmt));
+}
+
+static bool
+ref_conflicts_with_region (gimple_stmt_iterator gsi, ao_ref *ref,
+			   bool ref_is_store, vec<basic_block> region_bbs,
+			   unsigned int i, gimple *skip_stmt)
+{
+  basic_block bb = region_bbs[i];
+  gsi_next (&gsi);
+
+  while (true)
+{
+  for (; !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  if (stmt == skip_stmt)
+	{
+	  if (dump_file)
+		{
+		  fprintf (dump_file, "skipping reduction store: ");
+		  print_gimple_stmt (dump_file, stmt, 0, 0);
+		}
+	  continue;
+	}
+
+	  if (!gimple_vdef (stmt)
+	  && !gimple_vuse (stmt))
+	continue;
+
+	  if (ref_is_store)
+	{
+	  if (dead_load_p (stmt))
+		{
+		  if (dump_file)
+		{
+		  fprintf (dump_file, "skipping dead load: ");
+		  print_gimple_stmt (dump_file, stmt, 0, 0);
+		}
+		  continue;
+		}
+
+	  if (ref_maybe_used_by_stmt_p (stmt, ref))
+		{
+		  if (dump_file)
+		{
+		  fprintf (dump_file, "Stmt ");
+		  print_gimple_stmt (dump_file, stmt, 0, 0);
+		}
+		  return true;
+		}
+	}
+	  else
+	{
+	  if (stmt_may_clobber_ref_p_1 (stmt, ref))
+		{
+		  if (dump_file)
+		{
+		  fprintf (dump_file, "Stmt ");
+		  print_gimple_stmt (dump_file, stmt, 0, 0);
+		}
+		  return true;
+		}
+	}
+	}
+  i++;
+  if (i == region_bbs.length ())
+	break;
+  bb = region_bbs[i];
+  gsi = gsi_start_bb (bb);
+}
+
+  return false;
+}
+
+static bool
+oacc_entry_exit_ok_1 (bitmap in_loop_bbs, vec<basic_block> region_bbs,
+		  tree omp_data_i,
+		  reduction_info_table_type *reduction_list)
+{
+  unsigned i;
+  basic_block bb;
+  FOR_EACH_VEC_ELT (region_bbs, i, bb)
+{
+  if (bitmap_bit_p (in_loop_bbs, bb->index))
+	continue;
+
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+	

[committed, gomp4, 3/3] Handle sequential code in kernels region - Testcases

2015-10-12 Thread Tom de Vries

On 12/10/15 19:12, Tom de Vries wrote:

Hi,

I've committed the following patch series.

  1Add get_bbs_in_oacc_kernels_region
  2Handle sequential code in kernels region
  3Handle sequential code in kernels region - Testcases

The patch series adds detection of whether sequential code (that is,
code in the oacc kernels region before and after the loop that is to be
parallelized) is safe to execute in parallel.

Bootstrapped and reg-tested on x86_64.

I'll post the patches individually, in reply to this email.


This patch adds relevant test-cases.

Thanks,
- Tom
Handle sequential code in kernels region - Testcases

2015-10-12  Tom de Vries  

	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c: New test.
---
 .../kernels-loop-and-seq-2.c   | 36 +
 .../kernels-loop-and-seq-3.c   | 37 ++
 .../kernels-loop-and-seq-4.c   | 36 +
 .../kernels-loop-and-seq-5.c   | 37 ++
 .../kernels-loop-and-seq-6.c   | 36 +
 .../kernels-loop-and-seq.c | 37 ++
 6 files changed, 219 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
new file mode 100644
index 000..2e4100f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+
+#include 
+
+#define N 32
+
+unsigned int
+foo (int n, unsigned int *a)
+{
+#pragma acc kernels copy (a[0:N])
+  {
+a[0] = a[0] + 1;
+
+for (int i = 0; i < n; i++)
+  a[i] = 1;
+  }
+
+  return a[0];
+}
+
+int
+main (void)
+{
+  unsigned int a[N];
+  unsigned res, i;
+
+  for (i = 0; i < N; ++i)
+a[i] = i % 4;
+
+  res = foo (N, a);
+  if (res != 1)
+abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
new file mode 100644
index 000..b3e736b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+
+#include 
+
+#define N 32
+
+unsigned int
+foo (int n, unsigned int *a)
+{
+
+#pragma acc kernels copy (a[0:N])
+  {
+for (int i = 0; i < n; i++)
+  a[i] = 1;
+
+a[0] = 2;
+  }
+
+  return a[0];
+}
+
+int
+main (void)
+{
+  unsigned int a[N];
+  unsigned res, i;
+
+  for (i = 0; i < N; ++i)
+a[i] = i % 4;
+
+  res = foo (N, a);
+  if (res != 2)
+abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
new file mode 100644
index 000..8b9affa
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+
+#include 
+
+#define N 32
+
+unsigned int
+foo (int n, unsigned int *a)
+{
+#pragma acc kernels copy (a[0:N])
+  {
+a[0] = 2;
+
+for (int i = 0; i < n; i++)
+  a[i] = 1;
+  }
+
+  return a[0];
+}
+
+int
+main (void)
+{
+  unsigned int a[N];
+  unsigned res, i;
+
+  for (i = 0; i < N; ++i)
+a[i] = i % 4;
+
+  res = foo (N, a);
+  if (res != 1)
+abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
new file mode 100644
index 000..83d4e7f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+
+#include 
+
+#define N 32
+
+unsigned int
+foo (int n, unsigned int *a)

[committed, PATCH] [gcc-5-branch] Wrong stack alignment adjustment

2015-10-12 Thread H.J. Lu
Committed as an obvious fix.

H.J.
---
Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 228731)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,10 @@
+2015-10-12  H.J. Lu  
+
+   PR target/67940
+   * config/i386/i386.c (ix86_compute_frame_layout): Correct
+   stack alignment adjustment.
+   (ix86_expand_prologue): Likewise.
+
 2015-10-12  Uros Bizjak  
 
Backport from mainline
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 228731)
+++ gcc/config/i386/i386.c  (working copy)
@@ -10222,7 +10222,7 @@ ix86_compute_frame_layout (struct ix86_f
  sure that no value happens to be the same before and after, force
  the alignment computation below to add a non-zero value.  */
   if (stack_realign_fp)
-offset = (offset + stack_alignment_needed) & -stack_alignment_needed;
+offset = (offset + stack_alignment_needed - 1) & -stack_alignment_needed;
 
   /* Va-arg area */
   frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
@@ -11613,7 +11613,7 @@ ix86_expand_prologue (void)
  pointer is no longer valid.  As for the value of sp_offset,
 see ix86_compute_frame_layout, which we need to match in order
 to pass verification of stack_pointer_offset at the end.  */
-  m->fs.sp_offset = (m->fs.sp_offset + align_bytes) & -align_bytes;
+  m->fs.sp_offset = (m->fs.sp_offset + align_bytes - 1) & -align_bytes;
   m->fs.sp_valid = false;
 }
 


[hsa] Export dump_hsa_insn

2015-10-12 Thread Martin Jambor
Hi,

this small patch makes dump_hsa_insn available for dumping in other
compilation units.  Committed to the branch.

Thanks,

Martin

2015-10-12  Martin Jambor  

* hsa-dump.c (dump_hsa_insn): Rename to dump_hsa_insn_1.
(dump_hsa_insn): New function.
(dump_hsa_bb): Use dump_hsa_insn_1.
* hsa.h (dump_hsa_insn): Declare.
---
 gcc/hsa-dump.c | 23 +--
 gcc/hsa.h  |  1 +
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
index e074241..ca27bf2 100644
--- a/gcc/hsa-dump.c
+++ b/gcc/hsa-dump.c
@@ -763,10 +763,12 @@ static void indent_stream (FILE *f, int indent)
 fputc (' ', f);
 }
 
-/* Dump textual representation of HSA IL instruction INSN to file F.  */
+/* Dump textual representation of HSA IL instruction INSN to file F.  Prepend
+   the instruction with *INDENT spaces and adjust the indentation for call
+   instructions as appropriate.  */
 
 static void
-dump_hsa_insn (FILE *f, hsa_insn_basic *insn, int *indent)
+dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int *indent)
 {
   gcc_checking_assert (insn);
   indent_stream (f, *indent);
@@ -1011,6 +1013,15 @@ dump_hsa_insn (FILE *f, hsa_insn_basic *insn, int 
*indent)
   fprintf (f, "\n");
 }
 
+/* Dump textual representation of HSA IL instruction INSN to file F.  */
+
+void
+dump_hsa_insn (FILE *f, hsa_insn_basic *insn)
+{
+  int indent = 0;
+  dump_hsa_insn_1 (f, insn, &indent);
+}
+
 /* Dump textual representation of HSA IL in HBB to file F.  */
 
 void
@@ -1026,10 +1037,10 @@ dump_hsa_bb (FILE *f, hsa_bb *hbb)
 
   int indent = 2;
   for (insn = hbb->first_phi; insn; insn = insn->next)
-dump_hsa_insn (f, insn, &indent);
+dump_hsa_insn_1 (f, insn, &indent);
 
   for (insn = hbb->first_insn; insn; insn = insn->next)
-dump_hsa_insn (f, insn, &indent);
+dump_hsa_insn_1 (f, insn, &indent);
 
   if (hbb->last_insn && is_a  (hbb->last_insn))
 goto exit;
@@ -1088,8 +1100,7 @@ dump_hsa_cfun (FILE *f)
 DEBUG_FUNCTION void
 debug_hsa_insn (hsa_insn_basic *insn)
 {
-  int indentation = 0;
-  dump_hsa_insn (stderr, insn, &indentation);
+  dump_hsa_insn (stderr, insn);
 }
 
 /* Dump textual representation of HSA IL in HBB to stderr.  */
diff --git a/gcc/hsa.h b/gcc/hsa.h
index 98d70e0..3c49d1b 100644
--- a/gcc/hsa.h
+++ b/gcc/hsa.h
@@ -1094,6 +1094,7 @@ void hsa_brig_emit_omp_symbols (void);
 
 /*  In hsa-dump.c.  */
 const char *hsa_seg_name (BrigSegment8_t);
+void dump_hsa_insn (FILE *f, hsa_insn_basic *insn);
 void dump_hsa_bb (FILE *, hsa_bb *);
 void dump_hsa_cfun (FILE *);
 DEBUG_FUNCTION void debug_hsa_operand (hsa_op_base *opc);
-- 
2.6.0



[hsa] Introduce alignment to hsa_insn_mem

2015-10-12 Thread Martin Jambor
Hi,

the newest version of the finalizer actually honors alignment
information and therefore the compiler has to provide the correct
values, which is what the following patch does.

We may need to be more conservative in the cases when we expand memset
and memcpy inline, but I have committed this patch anyway, because
everywhere else it should be correct and the performance without it
was horrible.
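The get_alignment helper removed by the patch below spells out the rule case by case (8 bits maps to alignment 1, 16 to 2, 32 to 4, 64 to 8, 128 to 16); a "natural alignment" is simply the type's size in bytes. A hedged C sketch of that mapping (the function name is illustrative, not GCC's):

```c
#include <assert.h>

/* Hedged sketch, not GCC code: the natural alignment of an HSA scalar
   type is its size in bytes, which is the mapping the removed
   get_alignment helper enumerates case by case.  Sub-byte types (the
   1-bit case) align to one byte.  */

static unsigned
natural_alignment_bytes (unsigned bit_size)
{
  return bit_size <= 8 ? 1 : bit_size / 8;
}
```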

Thanks,

Martin


2015-10-12  Martin Jambor  

* hsa-brig.c (get_alignment): Removed.
(emit_directive_variable): Use hsa_natural_alignment.
(emit_memory_insn): Use alignment from mem.
* hsa-gen.c (hsa_insn_mem::hsa_insn_mem): Initialize align.
(gen_hsa_insns_for_bitfield_load): New argument, set alignment.
(gen_hsa_insns_for_load): Set alignment.
(gen_hsa_insns_for_store): Likewise.
* hsa.c (hsa_alignment_encoding): New function.
(hsa_natural_alignment): Likewise.
(hsa_insn_mem::set_align): Likewise.
* hsa.h (hsa_insn_mem): New members align and set_align.
(hsa_alignment_encoding): Declare.
(hsa_natural_alignment): Likewise.
---
 gcc/hsa-brig.c | 29 -
 gcc/hsa-gen.c  | 23 +--
 gcc/hsa.c  | 49 +
 gcc/hsa.h  |  9 +
 4 files changed, 79 insertions(+), 31 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index b6eabfd..49d9e1d 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -477,29 +477,6 @@ brig_release_data (void)
   brig_initialized = 0;
 }
 
-/* Find the alignment base on the type.  */
-
-static BrigAlignment8_t
-get_alignment (BrigType16_t type)
-{
-  unsigned bit_size ;
-  bit_size = hsa_type_bit_size (type & ~BRIG_TYPE_ARRAY);
-
-  if (bit_size == 1)
-return BRIG_ALIGNMENT_1;
-  if (bit_size == 8)
-return BRIG_ALIGNMENT_1;
-  if (bit_size == 16)
-return BRIG_ALIGNMENT_2;
-  if (bit_size == 32)
-return BRIG_ALIGNMENT_4;
-  if (bit_size == 64)
-return BRIG_ALIGNMENT_8;
-  if (bit_size == 128)
-return BRIG_ALIGNMENT_16;
-  gcc_unreachable ();
-}
-
 /* Enqueue operation OP.  Return the offset at which it will be stored.  */
 
 static unsigned int
@@ -595,7 +572,9 @@ emit_directive_variable (struct hsa_symbol *symbol)
   dirvar.init = 0;
   dirvar.type = htole16 (symbol->type);
   dirvar.segment = symbol->segment;
-  dirvar.align = get_alignment (dirvar.type);
+  /* TODO: Once we are able to access global variables, we must copy their
+ alignment.  */
+  dirvar.align = MAX (hsa_natural_alignment (dirvar.type), BRIG_ALIGNMENT_4);
   dirvar.linkage = symbol->linkage;
   dirvar.dim.lo = (uint32_t) symbol->dim;
   dirvar.dim.hi = (uint32_t) ((unsigned long long) symbol->dim >> 32);
@@ -1161,7 +1140,7 @@ emit_memory_insn (hsa_insn_mem *mem)
 repr.segment = BRIG_SEGMENT_FLAT;
   repr.modifier.allBits = 0 ;
   repr.equivClass = mem->equiv_class;
-  repr.align = BRIG_ALIGNMENT_1;
+  repr.align = mem->align;
   if (mem->opcode == BRIG_OPCODE_LD)
 repr.width = BRIG_WIDTH_1;
   else
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index cf36882..d20efd8 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -76,6 +76,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfg.h"
 #include "cfgloop.h"
 #include "cfganal.h"
+#include "builtins.h"
 
 /* Print a warning message and set that we have seen an error.  */
 
@@ -1324,6 +1325,7 @@ hsa_insn_mem::hsa_insn_mem (int opc, BrigType16_t t, 
hsa_op_base *arg0,
   gcc_checking_assert (opc == BRIG_OPCODE_LD || opc == BRIG_OPCODE_ST
   || opc == BRIG_OPCODE_EXPAND);
 
+  align = hsa_natural_alignment (t);
   equiv_class = 0;
 }
 
@@ -1337,6 +1339,7 @@ hsa_insn_mem::hsa_insn_mem (unsigned nops, int opc, 
BrigType16_t t,
hsa_op_base *arg2, hsa_op_base *arg3)
   : hsa_insn_basic (nops, opc, t, arg0, arg1, arg2, arg3)
 {
+  align = hsa_natural_alignment (t);
   equiv_class = 0;
 }
 
@@ -2007,18 +2010,20 @@ gen_hsa_insns_for_bitfield (hsa_op_reg *dest, 
hsa_op_reg *value_reg,
 
 
 /* Generate HSAIL instructions loading a bit field into register DEST.  ADDR is
-   prepared memory address which is used to load the bit field.  To identify
-   a bit field BITPOS is offset to the loaded memory and BITSIZE is number
-   of bits of the bit field.  Add instructions to HBB.  */
+   prepared memory address which is used to load the bit field.  To identify a
+   bit field BITPOS is offset to the loaded memory and BITSIZE is number of
+   bits of the bit field.  Add instructions to HBB.  Load must be performaed in
+   alignment ALIGN.  */
 
 static void
 gen_hsa_insns_for_bitfield_load (hsa_op_reg *dest, hsa_op_address *addr,
-   HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
-   hsa_bb *hbb)
+HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+hsa_bb *hbb, BrigAlignment8_t

[hsa] Silence a warning in emit_directive_variable

2015-10-12 Thread Martin Jambor
Hi,

in the previous commit I introduced a warning; this one silences it.

Thanks,

Martin


2015-10-12  Martin Jambor  

* hsa-brig.c (emit_directive_variable): Add typecast.

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 49d9e1d..e9712b5 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -574,7 +574,8 @@ emit_directive_variable (struct hsa_symbol *symbol)
   dirvar.segment = symbol->segment;
   /* TODO: Once we are able to access global variables, we must copy their
  alignment.  */
-  dirvar.align = MAX (hsa_natural_alignment (dirvar.type), BRIG_ALIGNMENT_4);
+  dirvar.align = MAX (hsa_natural_alignment (dirvar.type),
+ (BrigAlignment8_t) BRIG_ALIGNMENT_4);
   dirvar.linkage = symbol->linkage;
   dirvar.dim.lo = (uint32_t) symbol->dim;
   dirvar.dim.hi = (uint32_t) ((unsigned long long) symbol->dim >> 32);


[hsa] Make debug stores conditional on a parameter

2015-10-12 Thread Martin Jambor
Hi,

because HSA run-time currently offers very few options to debug the
HSAIL, especially when it comes to tricky things like executing a
kernel from a kernel, we have resorted to introducing memory stores
solely for the purpose of debugging.  While we will gladly throw them
away when HSA supports at least a debugging trap, until then we
actually quite like them.

However, they can interfere with benchmarks so we need a way of
controlling them.  This patch introduces a parameter for them.  I have
chosen a parameter rather than a switch to emphasize the fact that
this part of the interface is likely to change and go away completely
in the future.

If it is too controversial, we can remove the whole concept before
merging to trunk, meanwhile, I have committed the following patch.

Thanks,

Martin


2015-10-12  Martin Jambor  

* params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter.
* hsa-gen.c: Include params.h.
(init_omp_in_prologue): Emit debug store only if
hsa-gen-debug-stores allow it.

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 85107c9..8f707b5 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -77,6 +77,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "cfganal.h"
 #include "builtins.h"
+#include "params.h"
 
 /* Print a warning message and set that we have seen an error.  */
 
@@ -4643,7 +4644,8 @@ init_omp_in_prologue (void)
   unsigned index = hsa_get_number_decl_kernel_mappings ();
 
   /* Emit store to debug argument.  */
-  set_debug_value (prologue, new hsa_op_immed (1000 + index, BRIG_TYPE_U64));
+  if (PARAM_VALUE (PARAM_HSA_GEN_DEBUG_STORES) > 0)
+set_debug_value (prologue, new hsa_op_immed (1000 + index, BRIG_TYPE_U64));
 }
 
 /* Go over gimple representation and generate our internal HSA one.  SSA_MAP
diff --git a/gcc/params.def b/gcc/params.def
index 3f91992..9a12238 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1152,6 +1152,11 @@ DEFPARAM (PARAM_PARLOOPS_CHUNK_SIZE,
  "parloops-chunk-size",
  "Chunk size of omp schedule for loops parallelized by parloops",
  0, 0, 0)
+
+DEFPARAM (PARAM_HSA_GEN_DEBUG_STORES,
+ "hsa-gen-debug-stores",
+ "Level of hsa debug stores verbosity",
+ 0, 0, 1)
 /*
 
 Local variables:


Re: Make cgraph frequencies more precise

2015-10-12 Thread Jan Hubicka
> On Sun, Oct 11, 2015 at 11:07 PM, Jan Hubicka  wrote:
> > Hi,
> > this patch fixes a case of extreme imprecision I noticed while looking into
> > profiles of PHP interpretter. There is a function that is called 22 times
> > and contains the main loop.  Now since the frequency of entry block is 
> > dropped
> > to 0, we do not have any information of relative frequencies in the colder
> > areas of the function.
> >
> > Hope all this uglyness with go away with conversion to sreals soonish.
> >
> > Profiledbootstrapped/regtested ppc64le-linux, comitted.
> > * cgraphbuild.c (compute_call_stmt_bb_frequency): Use
> > counts when these are more informative.
> 
> This caused:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67931

Hmm, interesting it does not show on ppcle.  The problem is however obvious:
while we scale counts we get into roundoff errors that affect the new
definition of bb frequencies.  I have reverted the patch as the actual
effect on generated code should be minimal - we use counts instead of
frequencies when available in all relevant places.
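A toy model of the roundoff problem (an assumed shape, not the actual cgraph code): scaling two counts by the same ratio with rounding integer division can change their relative magnitudes, which is exactly the kind of drift that breaks a frequency definition derived from scaled counts.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not GCC code: rescale a profile count by NUM/DEN using
   round-to-nearest integer division, the way counts get rescaled when
   bodies are duplicated or profiles merged.  Two counts with ratio 5:7
   both collapse to 2 when scaled by 1/3, i.e. the relative frequency
   information is lost to rounding.  */

static uint64_t
scale_count (uint64_t count, uint64_t num, uint64_t den)
{
  return (count * num + den / 2) / den;
}
```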

Honza
> 
> -- 
> H.J.


Re: vector lightweight debug mode

2015-10-12 Thread François Dumont
On 07/10/2015 22:09, Jonathan Wakely wrote:
> On 07/10/15 21:38 +0200, François Dumont wrote:
>> Hi
>>
>>I completed vector assertion mode. Here is the result of the new
>> test you will find in the attached patch.
>>
>> With debug mode:
>> /home/fdt/dev/gcc/build_git/x86_64-unknown-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:375:
>>
>> Error: attempt to advance a dereferenceable (start-of-sequence)
>> iterator 2
>> steps, which falls outside its valid range.
>>
>> Objects involved in the operation:
>>iterator @ 0x0x7fff1c346760 {
>>  type =
>> __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<int*,
>> std::__cxx1998::vector<int, std::allocator<int> > >,
>> std::__debug::vector<int, std::allocator<int> > > (mutable iterator);
>>  state = dereferenceable (start-of-sequence);
>>  references sequence with type 'std::__debug::vector<int,
>> std::allocator<int> >' @ 0x0x7fff1c3469a0
>>}
>> XFAIL: 23_containers/vector/debug/insert8_neg.cc execution test
>>
>>
>> With assertion mode:
>> /home/fdt/dev/gcc/build_git/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/stl_vector.h:1124:
>>
>> Error: invalid insert position outside container [begin, end) range.
>>
>> Objects involved in the operation:
>>sequence "this" @ 0x0x7fff60b1f870 {
>>  type = std::vector<int, std::allocator<int> >;
>>}
>>iterator "__position" @ 0x0x7fff60b1f860 {
>>  type = __gnu_cxx::__normal_iterator<int*,
>> std::vector<int, std::allocator<int> > >;
>>}
>> XFAIL: 23_containers/vector/debug/insert8_neg.cc execution test
>
> I still don't like the formatted output for the lightweight mode, it
> adds a dependency on I/O support in libc, which is a problem for
> embedded systems.

I thought you just meant I/O dependency in terms of included headers.
The __glibcxx_assert also has some I/O since, in case of failure, it calls:

  inline void
  __replacement_assert(const char* __file, int __line,
   const char* __function, const char* __condition)
  {
__builtin_printf("%s:%d: %s: Assertion '%s' failed.\n", __file, __line,
 __function, __condition);
__builtin_abort();
  }

but it is much more limited than the _GLIBCXX_DEBUG_VERIFY counterpart
which is calling fprintf to send to stderr.

So ok let's limit this mode to glibcxx_assert.

>
> The idea was to just add really cheap checks and abort  :-(
>
> Have you compared codegen with and without assertion mode? How much
> more code is added to member functions like operator[] that must be
> inlined for good performance?  Is it likely to affect inlining
> decisions?
>
> I suspect it will have a much bigger impact than if we just use
> __builtin_abort() as I made it do originally.

I think that the impact on compiled code depends more on the assert
condition than on the code executed when this assertion happens to be
false.  But I haven't checked it and will try.

In the attached patch I eventually:
- Move assertion macros to debug/assertions.h; it sounds like the right
place for those.
- Complete implementation of assertion checks by using __valid_range
function. All checks I can think of are now in place. I still need to
compare with google branch.

Note that for the latter, the condition is still evaluated in O(1).
__valid_range detects iterator issues without looping through them.
__valid_range, by considering the iterator category, also makes those
macros usable in any container.

François

diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index 305d446..04bc339 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -63,6 +63,8 @@
 #include 
 #endif
 
+#include 
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
@@ -403,13 +405,18 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 vector(_InputIterator __first, _InputIterator __last,
 	   const allocator_type& __a = allocator_type())
 	: _Base(__a)
-{ _M_initialize_dispatch(__first, __last, __false_type()); }
+{
+	  __glibcxx_requires_valid_range(__first, __last);
+	  _M_initialize_dispatch(__first, __last, __false_type());
+	}
 #else
   template
 vector(_InputIterator __first, _InputIterator __last,
 	   const allocator_type& __a = allocator_type())
 	: _Base(__a)
 {
+	  __glibcxx_requires_valid_range(__first, __last);
+
 	  // Check whether it's an integral type.  If so, it's not an iterator.
 	  typedef typename std::__is_integer<_InputIterator>::__type _Integral;
 	  _M_initialize_dispatch(__first, __last, _Integral());
@@ -470,7 +477,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   vector&
   operator=(initializer_list __l)
   {
-	this->assign(__l.begin(), __l.end());
+	this->_M_assign_aux(__l.begin(), __l.end(),
+			random_access_iterator_tag());
 	return *this;
   }
 #endif
@@ -506,12 +514,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	   typename = std::_RequireInputIter<_InputIterator>>
 void
 assign(_InputIterator __first, _InputIterator __last)
-{ _M_assign_dispatch(__first, __last, __false_

[gomp4, committed] Neuter gang-single code in gang-redundant mode

2015-10-12 Thread Tom de Vries

Hi,

ATM stores in the non-loop parts of an oacc kernels region are executed 
by all gangs.


This patch makes sure that those stores are only executed by gang 0. I'm 
not aware atm of any related failing test cases, but it at least reduces 
the amount of stores performed.
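The situation in C (an assumed minimal shape, modeled on the kernels-loop-and-seq tests elsewhere in this digest): the store to a[0] before the loop is gang-single code, while the loop body is gang-redundant. Without -fopenacc the pragma is ignored and the region runs sequentially, which is enough to show the intended single-execution semantics.

```c
#include <assert.h>

#define N 32

int a[N];

/* Assumed minimal shape of the affected code, not a test from the patch:
   the scalar store before the loop must execute once (gang 0 only),
   while the loop may be executed redundantly by all gangs.  */
int
foo (void)
{
#pragma acc kernels copy (a[0:N])
  {
    a[0] = 2;                 /* gang-single statement */
    for (int i = 0; i < N; i++)
      a[i] = 1;               /* gang-redundant loop */
  }
  return a[0];
}
```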


Bootstrapped and reg-tested on x86_64.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Neuter gang-single code in gang-redundant mode

2015-10-12  Tom de Vries  

	* omp-low.c (is_oacc_kernels): New function.
	(lower_omp_target): Insert gang-pos at start of kernels region.
	(execute_oacc_device_lower): Handle IFN_GOACC_DIM_POS without result.
	* tree-parloops.c (create_parallel_loop): Don't expect
	single_pred_p (bb) if oacc_kernels_p.
	(oacc_entry_exit_ok_1): Add and handle reduction_stores parameter.
	(oacc_entry_exit_single_gang): New function.
	(oacc_entry_exit_ok): Call oacc_entry_exit_single_gang.
---
 gcc/omp-low.c   |  27 --
 gcc/tree-parloops.c | 139 ++--
 2 files changed, 159 insertions(+), 7 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e700dd1..df08c2c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -294,6 +294,17 @@ is_oacc_parallel (omp_context *ctx)
 	  == GF_OMP_TARGET_KIND_OACC_PARALLEL));
 }
 
+/* Return true if CTX corresponds to an oacc kernels region.  */
+
+static bool
+is_oacc_kernels (omp_context *ctx)
+{
+  enum gimple_code outer_type = gimple_code (ctx->stmt);
+  return ((outer_type == GIMPLE_OMP_TARGET)
+	  && (gimple_omp_target_kind (ctx->stmt)
+	  == GF_OMP_TARGET_KIND_OACC_KERNELS));
+}
+
 /* Return true if VAR is a is private reduction variable.  A reduction
variable is considered private if the variable is local to the
offloaded region, or if it is the first reduction to use a mapped
@@ -12962,6 +12973,13 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   if (offloaded)
 {
+  if (is_oacc_kernels (ctx))
+	{
+	  tree arg = build_int_cst (integer_type_node, GOMP_DIM_GANG);
+	  gcall *gang_single
+	= gimple_build_call_internal (IFN_GOACC_DIM_POS, 1, arg);
+	  gimple_seq_add_stmt (&new_body, gang_single);
+	}
   gimple_seq_add_stmt (&new_body, gimple_build_omp_entry_end ());
   if (ctx->reductions)
 	{
@@ -15696,7 +15714,9 @@ execute_oacc_device_lower ()
 
 	  case IFN_GOACC_DIM_POS:
 	  case IFN_GOACC_DIM_SIZE:
-	if (oacc_xform_dim (call, dims, ifn_code == IFN_GOACC_DIM_POS))
+	if (gimple_call_lhs (call) == NULL_TREE)
+	  rescan = -1;
+	else if (oacc_xform_dim (call, dims, ifn_code == IFN_GOACC_DIM_POS))
 	  rescan = 1;
 	break;
 
@@ -15740,8 +15760,9 @@ execute_oacc_device_lower ()
 	  gsi_next (&gsi);
 	else if (rescan < 0)
 	  {
-	replace_uses_by (gimple_vdef (call),
-			 gimple_vuse (call));
+	if (gimple_vdef (call))
+	  replace_uses_by (gimple_vdef (call),
+			   gimple_vuse (call));
 	gsi_remove (&gsi, true);
 	  }
   }
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index d4eb32a..0ac416d 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2031,9 +2031,11 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 
   /* Prepare the GIMPLE_OMP_PARALLEL statement.  */
   bb = loop_preheader_edge (loop)->src;
-  paral_bb = single_pred (bb);
   if (!oacc_kernels_p)
-gsi = gsi_last_bb (paral_bb);
+{
+  paral_bb = single_pred (bb);
+  gsi = gsi_last_bb (paral_bb);
+}
   else
 /* Make sure the oacc parallel is inserted on top of the oacc kernels
region.  */
@@ -2859,7 +2861,8 @@ ref_conflicts_with_region (gimple_stmt_iterator gsi, ao_ref *ref,
 static bool
 oacc_entry_exit_ok_1 (bitmap in_loop_bbs, vec<basic_block> region_bbs,
 		  tree omp_data_i,
-		  reduction_info_table_type *reduction_list)
+		  reduction_info_table_type *reduction_list,
+		  bitmap reduction_stores)
 {
   unsigned i;
   basic_block bb;
@@ -2919,6 +2922,9 @@ oacc_entry_exit_ok_1 (bitmap in_loop_bbs, vec<basic_block> region_bbs,
 			  single_imm_use (val, &use_p, &use_stmt);
 			  if (gimple_store_p (use_stmt))
 			{
+			  unsigned int id
+= SSA_NAME_VERSION (gimple_vdef (use_stmt));
+			  bitmap_set_bit (reduction_stores, id);
 			  skip_stmt = use_stmt;
 			  if (dump_file)
 {
@@ -2948,6 +2954,9 @@ oacc_entry_exit_ok_1 (bitmap in_loop_bbs, vec<basic_block> region_bbs,
 		   && !gimple_vdef (stmt)
 		   && !gimple_vuse (stmt))
 	continue;
+	  else if (gimple_call_internal_p (stmt)
+		   && gimple_call_internal_fn (stmt) == IFN_GOACC_DIM_POS)
+	continue;
 	  else
 	{
 	  if (dump_file)
@@ -2974,6 +2983,106 @@ oacc_entry_exit_ok_1 (bitmap in_loop_bbs, vec<basic_block> region_bbs,
   return true;
 }
 
+/* Find stores inside REGION_BBS and outside IN_LOOP_BBS, and guard them with
+   GANG_POS == 0, except when the stores are REDUCTION_STORES.  Return true
+   if any changes were made.  */
+
+static bool
+oacc_entry_exit_single_gang (bitmap in_loop_bbs, vec<basic_block> region_bbs,
+			 bitmap redu

[PATCH] Fix libgomp OpenACC test

2015-10-12 Thread James Norris

Hi,

The attached patch fixes a test where the for-loop
iterator was not initialized.

Committed to trunk as obvious.

Jim
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
index cc915a9..8a51ee3 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
@@ -12,7 +12,7 @@ unsigned int n = N;
 int
 main (void)
 {
-  for (unsigned int i; i < n; ++i)
+  for (unsigned int i = 0; i < n; ++i)
 {
   a[i] = i % 3;
   b[i] = i % 5;
@@ -25,7 +25,7 @@ main (void)
   c[i] = a[i] + b[i];
   }
 
-  for (unsigned int i; i < n; ++i)
+  for (unsigned int i = 0; i < n; ++i)
 if (c[i] != (i % 3) + (i % 5))
   abort ();
 


Re: [PATCH] Check no unreachable blocks in inverted_post_order_compute

2015-10-12 Thread Jeff Law

On 10/12/2015 07:10 AM, Tom de Vries wrote:

Hi,

in the header comment of function inverted_post_order_compute in
cfganal.c we find:
...
This function assumes that all blocks in the CFG are reachable
from the ENTRY (but not necessarily from EXIT).
...

This patch checks that there are indeed no unreachable blocks when
calling inverted_post_order_compute.

OK for trunk if bootstrap/regtest succeeds?
Yes.  I won't queue it behind Mikhail's changes.  Consider yourself 
lucky :-)


jeff



[gomp4] Backport from trunk

2015-10-12 Thread James Norris

Hi,

The attached patch was backported from trunk.

commit 140722d9d2d574574c982f4616a80bb0ef766276
Author: jnorris 
Date:   Mon Oct 12 20:22:30 2015 +

* testsuite/libgomp.oacc-c-c++-common/vector-loop.c: Fix loop
initializer.


Jim
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
index cc915a9..8a51ee3 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
@@ -12,7 +12,7 @@ unsigned int n = N;
 int
 main (void)
 {
-  for (unsigned int i; i < n; ++i)
+  for (unsigned int i = 0; i < n; ++i)
 {
   a[i] = i % 3;
   b[i] = i % 5;
@@ -25,7 +25,7 @@ main (void)
   c[i] = a[i] + b[i];
   }
 
-  for (unsigned int i; i < n; ++i)
+  for (unsigned int i = 0; i < n; ++i)
 if (c[i] != (i % 3) + (i % 5))
   abort ();
 


Re: [RFC VTV] Fix VTV for targets that have section anchors.

2015-10-12 Thread Jeff Law

On 10/09/2015 03:17 AM, Ramana Radhakrishnan wrote:

This started as a Friday afternoon project ...

It turned out enabling VTV for AArch64 and ARM was a matter of fixing
PR67868 which essentially comes from building libvtv with section
anchors turned on. The problem was that the flow of control from
output_object_block through to switch_section did not have the same
special casing for the vtable section that exists in
assemble_variable.
That's some ugly code.  You might consider factoring that code into a 
function and just calling it from both places.  Your version doesn't 
seem to handle PECOFF, so I'd probably refactor from assemble_variable.




However both these failures also occur on x86_64 - so I'm content to
declare victory on AArch64 as far as basic enablement goes.

Cool.



1. Are the generic changes to varasm.c ok ? 2. Can we take the
AArch64 support in now, given this amount of testing ? Marcus /
Caroline ? 3. Any suggestions / helpful debug hints for VTV debugging
(other than turning VTV_DEBUG on and inspecting trace) ?
I think that with refactoring they'd be good to go.  No opinions on the 
AArch64 specific question -- call for the AArch64 maintainers.


Good to see someone hacking on vtv.  It's in my queue to look at as well.

jeff


Re: [PATCH 1/9] ENABLE_CHECKING refactoring

2015-10-12 Thread Jeff Law

On 10/05/2015 05:27 PM, Mikhail Maltsev wrote:

3. Another one: gcc.c-torture/compile/pr52073.c which is, I guess, caused by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67816 (the backtrace is the same,
at least).

FYI, this is fixed on the trunk.

jeff



Re: [PATCH 2/9] ENABLE_CHECKING refactoring: libcpp

2015-10-12 Thread Jeff Law

On 10/06/2015 06:40 AM, Bernd Schmidt wrote:

I'm not entirely sure what to make of this series. There seem to be good
bits in there but also some things I find questionable. I'll add some
comments on things that occur to me.
Maybe we should start pulling out the bits that we think are ready & 
good and start installing them independently.


I'm less concerned about getting conditional compilation out of the lib* 
directories right now than I am about the core of the compiler.  So I 
wouldn't lose any sleep if we extracted the obviously good bits for libcpp, 
tested & installed those, then tabled the rest of the libcpp stuff and 
focused on the core compiler.


Thoughts?




On 10/06/2015 01:28 AM, Mikhail Maltsev wrote:


* include/line-map.h: Fix use of ENABLE_CHECKING.


Fix how? What's wrong with it?


  /* Sanity-checks are dependent on command-line options, so it is
 called as a subroutine of cpp_read_main_file ().  */
-#if ENABLE_CHECKING
+#if CHECKING_P
  static void sanity_checks (cpp_reader *);
  static void sanity_checks (cpp_reader *pfile)
  {


Ok, this one seems to be a real problem (should have been #ifdef), but...

Agreed.




-#ifdef ENABLE_CHECKING
+#if CHECKING_P


I fail to see the point of this change.
I'm guessing (and Mikhail, please correct me if I'm wrong), but I think 
he's trying to get away from ENABLE_CHECKING and instead use a macro 
which is always defined to a value.






-#ifdef ENABLE_CHECKING
-  if (kind == MACRO_ARG_TOKEN_STRINGIFIED
-  || !track_macro_exp_p)
-/* We can't set the location of a stringified argument
-   token and we can't set any location if we aren't tracking
-   macro expansion locations.   */
-abort ();
-#endif
+  /* We can't set the location of a stringified argument
+ token and we can't set any location if we aren't tracking
+ macro expansion locations.   */
+  gcc_checking_assert (kind != MACRO_ARG_TOKEN_STRINGIFIED
+   && track_macro_exp_p);


This kind of change seems good. I think the patch series would benefit
if it was separated thematically rather than by sets of files. I.e.,
merge all changes like this one into one patch or maybe a few if it
grows too large.
That would work for me.  I just asked Mikhail to try and break down the 
monster patch into something that could be digested.  Ultimately my goal 
was to make it possible to start reviewing and installing these bits and 
keep making progress on removing the conditionally compiled code.





+/* Redefine abort to report an internal error w/o coredump, and
+   reporting the location of the error in the source file.  */
+extern void fancy_abort (const char *, int, const char *)
ATTRIBUTE_NORETURN;
+#define abort() fancy_abort (__FILE__, __LINE__, __FUNCTION__)
+
+/* Use gcc_assert(EXPR) to test invariants.  */
+#if ENABLE_ASSERT_CHECKING
+#define gcc_assert(EXPR) \
+   ((void)(!(EXPR) ? fancy_abort (__FILE__, __LINE__, __FUNCTION__),
0 : 0))
+#elif (GCC_VERSION >= 4005)
+#define gcc_assert(EXPR) \
+  ((void)(__builtin_expect (!(EXPR), 0) ? __builtin_unreachable (), 0
: 0))
+#else
+/* Include EXPR, so that unused variable warnings do not occur.  */
+#define gcc_assert(EXPR) ((void)(0 && (EXPR)))
+#endif


Probably a good thing, but it looks like libcpp has grown its own
variant linemap_assert; we should check whether that can be replaced.

Also, the previous patch already introduces a use of gcc_assert, or at
least a reference to it, and it's only defined here. The two
modifications of libcpp/system.h should probably be merged into one.

Agreed.

jeff


[PATCH] Allow FSM threader to thread more complex conditions

2015-10-12 Thread Jeff Law


Right now the FSM threader only handles trivial conditions. 
Specifically looking up a naked SSA_NAME (used for GIMPLE_SWITCH) and 
SSA_NAME != 0.


The FSM threader need not be so restrictive.  We can easily use it to 
look up things like an SSA_NAME compared against a constant for integral and pointer names.


Essentially the FSM threader walks backwards to find a value for a name. 
 If we find a constant, we can then substitute the value into the 
expression and simplify.  This patch implements the substitution part, 
then exploits it from tree-ssa-threadedge.c.


I'm also renaming the test I added in my previous commit.  It was poorly 
associated with DOM when in fact it was testing the FSM threader when 
called via VRP.  Given the desire to run the FSM threader independently, 
I'm just going to call these ssa-thread-.c tests.


Bootstrapped and regression tested on x86_64-linux-gnu.  Installed on 
the trunk.


Jeff
commit 4b7f0fb7fbb338ae677c83d3be33570edd464885
Author: Jeff Law 
Date:   Mon Oct 12 15:37:42 2015 -0600

[PATCH] Allow FSM threader to thread more complex conditions

* tree-ssa-threadbackward.c (get_gimple_control_stmt): New function.
(fsm_find_control_stmt_paths): Change name of first argument to
more accurately reflect what it really is.  Handle simplification
of GIMPLE_COND after finding a thread path for NAME.
* tree-ssa-threadedge.c (simplify_control_stmt_condition): Allow
nontrivial conditions to be handled by FSM threader.
(thread_through_normal_block): Extract the name to look up via
FSM threader from COND_EXPR.

* gcc.dg/tree-ssa/ssa-thread-12.c: New test.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update expected output.
* gcc.dg/tree-ssa/ssa-thread-11.c: Renamed from
ssa-dom-thread-11.c.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 61e46ff..a865043 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -7,6 +7,15 @@
 
 2015-10-12  Jeff Law  
 
+   * tree-ssa-threadbackward.c (get_gimple_control_stmt): New function.
+   (fsm_find_control_stmt_paths): Change name of first argument to
+   more accurately reflect what it really is.  Handle simplification
+   of GIMPLE_COND after finding a thread path for NAME. 
+   * tree-ssa-threadedge.c (simplify_control_stmt_condition): Allow
+   nontrivial conditions to be handled by FSM threader.
+   (thread_through_normal_block): Extract the name to look up via
+   FSM threader from COND_EXPR.
+
* tree-ssa-threadbackward.c (fsm_find_thread_path): Remove
restriction that traced SSA_NAME is a user variable.
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 89f3363..4a08f0f 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,10 @@
 2015-10-12  Jeff Law  
 
+   * gcc.dg/tree-ssa/ssa-thread-12.c: New test.
+   * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update expected output.
+   * gcc.dg/tree-ssa/ssa-thread-11.c: Renamed from
+   ssa-dom-thread-11.c.
+
* gcc.dg/tree-ssa/ssa-dom-thread-11.c: New test.
 
 2015-10-12  Ville Voutilainen  
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c
deleted file mode 100644
index 03d0334..000
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c
+++ /dev/null
@@ -1,49 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp2-details" } */
-/* { dg-final { scan-tree-dump "FSM" "vrp2" } } */
-
-void abort (void);
-typedef struct bitmap_head_def *bitmap;
-typedef const struct bitmap_head_def *const_bitmap;
-typedef struct bitmap_obstack
-{
-  struct bitmap_obstack *next;
-  unsigned int indx;
-}
-bitmap_element;
-typedef struct bitmap_head_def
-{
-  bitmap_element *first;
-}
-bitmap_head;
-static __inline__ unsigned char
-bitmap_elt_ior (bitmap dst, bitmap_element * dst_elt,
-   bitmap_element * dst_prev, const bitmap_element * a_elt,
-   const bitmap_element * b_elt)
-{
-  ((void) (!(a_elt || b_elt) ? abort (), 0 : 0));
-}
-
-unsigned char
-bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
- const_bitmap kill)
-{
-  bitmap_element *dst_elt = dst->first;
-  const bitmap_element *a_elt = a->first;
-  const bitmap_element *b_elt = b->first;
-  const bitmap_element *kill_elt = kill->first;
-  bitmap_element *dst_prev = ((void *) 0);
-  while (a_elt || b_elt)
-{
-  if (b_elt && kill_elt && kill_elt->indx == b_elt->indx
- && (!a_elt || a_elt->indx >= b_elt->indx));
-  else
-   {
- bitmap_elt_ior (dst, dst_elt, dst_prev, a_elt, b_elt);
- if (a_elt && b_elt && a_elt->indx == b_elt->indx)
-   ;
- else if (a_elt && (!b_elt || a_elt->indx <= b_elt->indx))
-   a_elt = a_elt->next;
-   }
-}
-}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index d8be023..445

Re: Possible patch for PR fortran/67806

2015-10-12 Thread Louis Krupp
 On Mon, 12 Oct 2015 08:41:43 -0700 Steve 
Kargl wrote  
 > On Sun, Oct 11, 2015 at 10:18:48PM -0700, Louis Krupp wrote: 
 > > The problem involves a derived type with a character component declared 
 > > CHARACTER(NULL()) or CHARACTER(NULL(n)), where mold argument n is an 
 > > integer pointer. 
 > >  
 >  
 > I was looking at 67805 this weekend, which is somewhat 
 > related to this PR.  AFAICT, gfortran does no checking 
 > for n in CHARACTER(LEN=n). n should be a scalar-int-expr 
 > (that is scalar INTEGER expression).  NULL() is not 
 > an integer, and NULL(n) is a disassociated pointer.  So, 
 > I believe neither can appear in an scalar-int-expr. 
 >  
 > Note, also that there is a table in 13.7.125 on where 
 > NULL() can appear. 
 >  
 > My patch for 67805 leads to one regression that I've been 
 > unable to resolve. 

For what it's worth, my patch does absolutely nothing for 67805.

As to my error message, should I fold this misuse of NULL() into the existing 
message saying "Character length needs to be a constant specification 
expression" and not mention NULL()?

There are times I wish I knew the story behind the code in some of these bug 
reports.  Were they written by someone looking for edge cases that might cause 
trouble, or was someone actually trying to do something?

Louis



Re: [patch 1/6] scalar-storage-order merge: Ada front-end

2015-10-12 Thread Jeff Law

On 10/06/2015 05:00 AM, Eric Botcazou wrote:

This is the Ada front-end (in fact mostly gigi) part.

ada/
* freeze.adb (Check_Component_Storage_Order): Skip a record component
if it has Complex_Representation.
(Freeze_Record_Type): If the type has Complex_Representation, skip
the regular treatment of Scalar_Storage_Order attribute and instead
issue a warning if it is present.
* gcc-interface/gigi.h (set_reverse_storage_order_on_pad_type):
Declare.
* gcc-interface/decl.c (gnat_to_gnu_entity) : Set the
storage order on the enclosing record for a packed array type.
: Set the storage order.
: Likewise.
: Likewise.
: Likewise.
(gnat_to_gnu_component_type): Set the reverse storage order on a
padded type built for a non-bit-packed array.
(gnat_to_gnu_field): Likewise.
(components_to_record): Deal with TYPE_REVERSE_STORAGE_ORDER.
* gcc-interface/utils.c (make_packable_type): Likewise.
(pad_type_hasher::equal): Likewise.
(gnat_types_compatible_p): Likewise.
(unchecked_convert): Likewise.
(set_reverse_storage_order_on_pad_type): New public function.
* gcc-interface/trans.c (Attribute_to_gnu): Adjust call to
get_inner_reference.
* gcc-interface/utils2.c (build_unary_op): Likewise.
(gnat_build_constructor): Deal with TYPE_REVERSE_STORAGE_ORDER.
(gnat_rewrite_reference): Propagate REF_REVERSE_STORAGE_ORDER.

FWIW, I consider all the bits referenced above as self-approvable.

jeff



Re: [patch 6/6] scalar-storage-order merge: testsuite

2015-10-12 Thread Jeff Law

On 10/06/2015 05:07 AM, Eric Botcazou wrote:

This is the testsuite part.

testsuite/
* c-c++-common/sso-1.c: New test.
* c-c++-common/sso-2.c: Likewise.
* c-c++-common/sso-3.c: Likewise.
* c-c++-common/sso-4.c: Likewise.
* c-c++-common/sso-5.c: Likewise.
* c-c++-common/sso-6.c: Likewise.
* c-c++-common/sso-7.c: Likewise.
 * c-c++-common/sso: New directory.
 * gcc.dg/sso-1.c: New test.
 * g++.dg/sso-1.C: Likewise.
 * gcc.dg/sso: New directory.
 * g++.dg/sso: Likewise.
* gcc.target/i386/movbe-3.c: New test.
 * gnat.dg/sso1.adb: New test.
 * gnat.dg/sso2.ad[sb]: Likewise.
 * gnat.dg/sso3.adb: Likewise.
 * gnat.dg/sso4.adb: Likewise.
 * gnat.dg/sso5.adb: Likewise.
 * gnat.dg/sso6.adb: Likewise.
 * gnat.dg/sso7.adb: Likewise.
 * gnat.dg/specs/sso1.ads: Likewise.
 * gnat.dg/specs/sso2.ads: Likewise.
 * gnat.dg/sso: New directory.

And this is OK once the prerequisites have gone in.

Jeff



Re: [patch 3/6] scalar-storage-order merge: C++ front-end

2015-10-12 Thread Jeff Law

On 10/06/2015 05:03 AM, Eric Botcazou wrote:

This is the C++ front-end part, probably incomplete but passes the testsuite.

cp/
* class.c: Add c-family/c-pragma.h.
(finish_struct_1): If structure has reverse scalar storage order,
rewrite the type of array fields with scalar component.  Call
maybe_apply_pragma_scalar_storage_order on entry.
* constexpr.c (reduced_constant_expression_p): Unfold recursion and
deal with TYPE_REVERSE_STORAGE_ORDER.
* typeck.c (structural_comptypes): Return false if two aggregate
types have different scalar storage order.
(cp_build_addr_expr_1) : New case.  Issue the
error for bit-fields here and not later.
: Issue error and warning for reverse scalar storage
order.
* typeck2.c (split_nonconstant_init_1) : Adjust call to
initializer_constant_valid_p.

Explicitly leaving for Jason.

jeff



Re: [PATCH] gcc/ira.c: Check !HAVE_FP_INSTEAD_INSNS when frame pointer is needed and as global register

2015-10-12 Thread Chen Gang

On 10/12/15 18:49, Bernd Schmidt wrote:
> On 10/11/2015 05:16 PM, Chen Gang wrote:
>> For some architectures (e.g. bfin), when this case occurs, they will use
>> another instructions instead of frame pointer (e.g. LINK for bfin), so
>> they can still generate correct output assembly code.
> 
> What is "this case"? I don't think you have explained the problem you are 
> trying to solve.
> 

It is about Bug65804. I found it when building the Linux bfin kernel;
the original older version of gcc could build the kernel successfully.

But since the git commit "e52beba PR debug/54694", the build fails: that
commit intends to catch the failure at build time. But bfin has the LINK
insn instead, so it is still OK, and gcc should not report failure.

>> 2015-10-11  Chen Gang  
>>
>> gcc/
>> * config.in: Add HAVE_FP_INSTEAD_INSNS.
>> * configure: Check HAVE_FP_INSTEAD_INSNS to set 0 or 1.
> 
> And of course, that should not be a configure check. If at all, use a target 
> hook.
> 

OK, thanks. If we really need to fix it, which target hook should I use?
(or do we need a new target hook?)


Thanks.
-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed


Re: [PATCH] gcc/fold-const.c: Correct the report warning position.

2015-10-12 Thread Chen Gang
Hello all:

Is this patch OK? If it still needs to do anything, please let me know,
I shall try.

Thanks.

On 9/1/15 21:42, Chen Gang wrote:
> On 8/31/15 19:12, Richard Biener wrote:
>> On Sat, Aug 29, 2015 at 2:57 PM, Chen Gang  
>> wrote:
>>>
>>> It is about bug63510: current input_location isn't precise for reporting
>>> warning. The correct location is gimple location of current statement.
>>
>> Looks ok to me. Ok if bootstrapped and tested.
>>
> 
> It passes "make check". :-)
> 
> Thanks.
> --
> Chen Gang
> 
> Open, share, and attitude like air, water, and life which God blessed
> 
> 

-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed


Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack

2015-10-12 Thread Alan Modra
On Mon, Oct 12, 2015 at 10:15:04AM -0500, Lynn A. Boger wrote:
> Thanks for doing this Alan.  I agree this looks better to me.
> 
> I assume by "etc" you mean you did biarch builds for your bootstraps on BE?

By "etc" I meant "and regression tested".

I built four configurations, powerpc-linux 32-bit only,
powerpc64le-linux 64-bit only, biarch powerpc-linux with 32-bit
default, and biarch powerpc64-linux with 64-bit default.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Test for __cxa_thread_atexit_impl when cross-compiling libstdc++ for GNU targets

2015-10-12 Thread Joseph Myers
On Mon, 12 Oct 2015, Bernd Schmidt wrote:

> A similar sequence of tests also occurs for *-aix*. I don't suppose the
> function is likely to exist there or on other non-glibc targets?

Given that the case there has "# We don't yet support AIX's TLS ABI." and 
GCC_CHECK_TLS commented out, I don't think this function (which is 
concerned with support for destructors of C++11 thread_local variables) is 
of any current relevance to that case.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] gcc/ira.c: Check !HAVE_FP_INSTEAD_INSNS when frame pointer is needed and as global register

2015-10-12 Thread Mike Stump
On Oct 12, 2015, at 3:32 PM, Chen Gang  wrote:
> 
> OK, thanks. If we really need to fix it, which target hook should I use?
> (or do we need a new target hook?)

So, the first discussion would be if it is, or is not a bug.  If it isn’t, then 
there is no fix.  No fix, no target hook.  So far, Bernd said not a bug.

So, I’ll note that one _can_ do this with the stack pointer, as a fixed 
register.
When the frame pointer is fixed, one cannot do this.

The code that does this is:

  /* Diagnose uses of the hard frame pointer when it is used as a global
     register.  Often we can get away with letting the user appropriate
     the frame pointer, but we should let them know when code generation
     makes that impossible.  */
  if (global_regs[HARD_FRAME_POINTER_REGNUM] && frame_pointer_needed)
{
  tree decl = global_regs_decl[HARD_FRAME_POINTER_REGNUM];
  error_at (DECL_SOURCE_LOCATION (current_function_decl),
"frame pointer required, but reserved");
  inform (DECL_SOURCE_LOCATION (decl), "for %qD", decl);
}

to `fix it’, one would simply remove this chunk as misguided and fix up any 
code gen issues exposed.

Re: [PATCH 8/9] Add TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID

2015-10-12 Thread Richard Henderson

On 10/12/2015 09:10 PM, Richard Biener wrote:

The check_loadstore change should instead have adjusted the
flag_delete_null_pointer_checks guard in
infer_nonnull_range_by_dereference.



Nope, that doesn't work.  You have to wait until you see the actual MEM
being dereferenced before you can look at its address space.


Well, as we are explicitely looking for the pointer 'op' we know the
address-space
beforehand, no?  TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (op)))?


No.  We don't even know what type we're looking for; we're merely looking for 
any use of NULL within any memory reference within STMT.


Specifically, when we're not looking for a specific SSA_NAME (which would be 
properly typed), we always pass in a plain (void *)0:


  bool by_dereference
= infer_nonnull_range_by_dereference (stmt, null_pointer_node);



r~


Re: [PATCH] New attribute to create target clones

2015-10-12 Thread Evgeny Stupachenko
Hi All,

Here is a new version of patch (attached).
Bootstrap and make check are in progress (all new tests passed).

New test case g++.dg/ext/mvc4.C fails with an ICE when options lower
than "-mavx" are passed.
However it has the same behavior if "target_clones" attribute is
replaced by 2 corresponding "target" attributes.
I've filed PR67946 on this:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67946

Thanks,
Evgeny

ChangeLog:

2015-10-13  Evgeny Stupachenko  
gcc/
* Makefile.in (OBJS): Add multiple_target.o.
* attrib.c (make_attribute): Moved from config/i386/i386.c
* config/i386/i386.c (make_attribute): Deleted.
* multiple_target.c (make_attribute): New.
(create_dispatcher_calls): Ditto.
(get_attr_len): Ditto.
(get_attr_str): Ditto.
(is_valid_asm_symbol): Ditto.
(create_new_asm_name): Ditto.
(create_target_clone): Ditto.
(expand_target_clones): Ditto.
(ipa_target_clone): Ditto.
(ipa_dispatcher_calls): Ditto.
* passes.def (pass_target_clone): Two new ipa passes.
* tree-pass.h (make_pass_target_clone): Ditto.

gcc/c-family
* c-common.c (handle_target_clones_attribute): New.
* (c_common_attribute_table): Add handle_target_clones_attribute.
* (handle_always_inline_attribute): Add check on target_clones
attribute.
* (handle_target_attribute): Ditto.

gcc/testsuite
* gcc.dg/mvc1.c: New test for multiple targets cloning.
* gcc.dg/mvc2.c: Ditto.
* gcc.dg/mvc3.c: Ditto.
* gcc.dg/mvc4.c: Ditto.
* gcc.dg/mvc5.c: Ditto.
* gcc.dg/mvc6.c: Ditto.
* gcc.dg/mvc7.c: Ditto.
* g++.dg/ext/mvc1.C: Ditto.
* g++.dg/ext/mvc2.C: Ditto.
* g++.dg/ext/mvc3.C: Ditto.
* g++.dg/ext/mvc4.C: Ditto.

gcc/doc
* doc/extend.texi (target_clones): New attribute description.

On Sat, Oct 10, 2015 at 12:44 AM, Evgeny Stupachenko  wrote:
> On Fri, Oct 9, 2015 at 11:04 PM, Jan Hubicka  wrote:
>>> On Fri, Oct 9, 2015 at 9:27 PM, Jan Hubicka  wrote:
>>> >> >Of course it also depends what you inline into function. You can have
>>> >> >
>>> >> >bar() target(-mavx) {fancy avx code}
>>> >> >foobar() { .. if (avx) bar();}
>>> >> >foo() ctarget(-mavx,-mno-avx) {foobar();}
>>>
>>> "no-" targets are not supported
>>
>> Why not? I suppose I can use -march=x86_64 in a file compiled with 
>> -march=core-avx2 or something like that, too.
> Sure, you can. target(arch=x86-64) is ok. I mean exactly target(no-avx) 
> returns:
>
> aaa.cpp: In function '':
> aaa.cpp:7:5: error: No dispatcher found for no-avx
>  int bar()
>  ^
>
>>>
>>> >> >
>>> >> >Now if you compile with -mavx and because ctarget takes effect only 
>>> >> >after inlining,
>>> >> >at inlining time the target attributes will match and we can edn up 
>>> >> >inline bar->foobar->foo.
>>> >> >After that we multiversion foo and drop AVX flag we will likely get ICE 
>>> >> >at expansion
>>> >> >time.
>>> >> But isn't that avoided by fixing up the call graph so that all calls
>>> >> to the affected function are going through the dispatcher?  Or is
>>> >> that happening too late?
>>> >
>>> > There is dispatcher only for foo that is the root of the callgarph tree.
>>> > When inlining we compare target attributes for match (in 
>>> > can_inline_edge_p).
>>> > We do not compare ctarget attributes.  Expanding ctarget to target early 
>>> > would
>>> > avoid need for ctarget handling.
>>> Currently inlining is disabled for functions with target_clone attribute:
>>
>> Do you also disable inlining into functions with target_clone?
>> What I am concerned about is early inliner inlining (say) AVX code into 
>> ctarget
>> function because at early inlining time the target is not applied, yet.
> Right. Now I've got your point and ICE on the test.
> Yes the solution is to disable inline into target_clones function.
> Or to move the pass creating clones before inline (as you suggested)
> and leave dispatcher creator after inline.
>
> I like you suggestion. It fixes the ICE.
> I'll fix the patch and retest.
>
> Thank you for the review,
> Evgeny.
>
>
>>
>> Honza


target_clones.patch
Description: Binary data