Re: [PATCH] Add fields to struct gomp_thread for debugging purposes

2017-11-23 Thread Jakub Jelinek
On Wed, Nov 22, 2017 at 08:25:35AM -0700, Kevin Buettner wrote:
> On Sat, 4 Nov 2017 21:39:14 -0700
> Kevin Buettner  wrote:
> 
> > On Tue, 31 Oct 2017 08:03:22 +0100
> > Jakub Jelinek  wrote:
> > 
> > > On Mon, Oct 30, 2017 at 04:06:15PM -0700, Kevin Buettner wrote:  
> > > > This patch adds a new member named "pthread_id" to the gomp_thread
> > > > struct.  It is initialized in team.c.
> > > 
> > > That part is reasonable, though it is unclear how the debugger will
> > > query it (through OMPD, or through hardcoded name lookup of the struct and
> > > field in libgomp's debug info, something else).  But the field certainly
> > > has to be guarded by #ifdef LIBGOMP_USE_PTHREADS, otherwise it will break
> > > NVPTX offloading or any other pthread-less libgomp ports.
> > > Another question is exact placement of the field, struct gomp_thread
> > > vs. struct gomp_team_state etc.  Maybe it is ok, as the pthread_id is
> > > the same once the thread is created, doesn't change when we create more
> > > levels.  
> > 
> > Assuming we can figure out how to work the rest of it out, I'll submit
> > a new patch with the appropriate ifdef.
> 
> I've decided to try for a more incremental approach.  This patch,
> below, retains the portion that looked reasonable to you.  I've
> stripped out the portions for finding the thread parent and have
> added the ifdef guards that you asked for.
> 
> Is this part okay?

Actually, isn't this redundant information at least on Linux?
How would the debugger find the pthread_id field in the TLS?

If it is TLS and libgomp uses -ftls-model=initial-exec on Linux, then
the difference between the base of the TLS area (which is I believe
pthread_self ()) and struct gomp_thread is fixed (and can be found in the
debug info on most linux targets, except e.g. aarch64 which doesn't have
as/ld support for dtoff-ish relocations).  So, if the debugger can find
the gomp_thread, then it can find pthread_self () and vice versa.  Dunno
if libthread_db or infinity notes in libpthread provide any info for this,
and/or infinity notes in libgomp could add some further info so that the
debugger can do this portably.

Of course, for targets where this isn't possible I'm not against your patch
if it is conditionalized on those targets that can't do better.
In any case, IMHO the debugger shouldn't hardcode that stuff but
ask infinity notes in libgomp and libpthread how to compute it.
Or as temporary alternative to the infinity notes it could be even a
function in the library that given struct gomp_thread returns corresponding
pthread_t.  Though, it would be nice to know exactly what the debugger wants
to do.  Perhaps it is also already part of OMPD and we could just implement
subset of it.
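
To make the "function in the library" alternative concrete, here is a minimal sketch.  It assumes a stand-in struct demo_thread (not libgomp's real struct gomp_thread), a hypothetical DEMO_USE_PTHREADS guard playing the role of LIBGOMP_USE_PTHREADS, and hypothetical helper names; none of this is existing libgomp API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for LIBGOMP_USE_PTHREADS.  */
#define DEMO_USE_PTHREADS 1

/* Stand-in for struct gomp_thread; the real struct has many more fields.  */
struct demo_thread
{
  int team_id;
#ifdef DEMO_USE_PTHREADS
  pthread_t pthread_id;
#endif
};

/* Record the identity of the calling thread when it starts running.  */
static void
demo_thread_init (struct demo_thread *thr, int team_id)
{
  assert (thr != NULL);
  thr->team_id = team_id;
#ifdef DEMO_USE_PTHREADS
  thr->pthread_id = pthread_self ();
#endif
}

/* Store the pthread_t recorded for THR in *OUT and return 1, or return 0
   on a port that has no pthreads.  */
static int
demo_thread_pthread_id (const struct demo_thread *thr, pthread_t *out)
{
  assert (thr != NULL && out != NULL);
#ifdef DEMO_USE_PTHREADS
  *out = thr->pthread_id;
  return 1;
#else
  return 0;
#endif
}
```

A debugger could call such a helper in the inferior instead of hardcoding the struct layout, at the cost of needing a running process.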

Jakub


Re: [PATCH] Fix PR83089

2017-11-23 Thread Richard Biener
On Wed, 22 Nov 2017, Christophe Lyon wrote:

> On 22 November 2017 at 09:44, Richard Biener  wrote:
> >
> > The following fixes if-conversion to free SCEV/niter estimates because
> > it DCEs stmts.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >
> > Richard.
> >
> > 2017-11-22  Richard Biener  
> >
> > PR tree-optimization/83089
> > * tree-if-conv.c (pass_if_conversion::execute): If anything
> > changed reset SCEV and free the number of iteration estimates.
> >
> > * gcc.dg/pr83089.c: New testcase.
> >
> > Index: gcc/tree-if-conv.c
> > ===
> > --- gcc/tree-if-conv.c  (revision 254990)
> > +++ gcc/tree-if-conv.c  (working copy)
> > @@ -2959,6 +2959,12 @@ pass_if_conversion::execute (function *f
> > && !loop->dont_vectorize))
> >todo |= tree_if_conversion (loop);
> >
> > +  if (todo)
> > +{
> > +  free_numbers_of_iterations_estimates (fun);
> > +  scev_reset ();
> > +}
> > +
> >if (flag_checking)
> >  {
> >basic_block bb;
> > Index: gcc/testsuite/gcc.dg/pr83089.c
> > ===
> > --- gcc/testsuite/gcc.dg/pr83089.c  (nonexistent)
> > +++ gcc/testsuite/gcc.dg/pr83089.c  (working copy)
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -ftree-loop-if-convert -ftree-parallelize-loops=2" } */
> > +
> > +int rl, s8;
> > +
> > +void
> > +it (int zy, short int el)
> > +{
> > +  int hb;
> > +
> > +  while (el != 0)
> > +{
> > +  hb = el;
> > +  for (rl = 0; rl < 200; ++rl)
> > +   {
> > + for (s8 = 0; s8 < 2; ++s8)
> > +   {
> > +   }
> > + if (s8 < 3)
> > +   zy = hb;
> > + if (hb == 0)
> > +   ++s8;
> > + zy += (s8 != -1);
> > +   }
> > +  el = zy;
> > +}
> > +}
> 
> I see the new testcase failing to compile on bare-metal targets
> (aarch64-elf, arm-eabi):
> spawn -ignore SIGHUP
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-elf/gcc3/gcc/xgcc
> -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-elf/gcc3/gcc/
> /gcc/testsuite/gcc.dg/pr83089.c -fno-diagnostics-show-caret
> -fdiagnostics-color=never -O2 -ftree-loop-if-convert
> -ftree-parallelize-loops=2 -S -specs=aem-ve.specs -o pr83089.s
> xgcc: error: unrecognized command line option '-pthread'
> compiler exited with status 1
> FAIL: gcc.dg/pr83089.c (test for excess errors)
> 
> Where does that -pthread come from?

From some specs processing, I guess.  Didn't think that it could
happen for a compile only testcase.

Fixed with

2017-11-23  Richard Biener  

PR tree-optimization/83089
* gcc.dg/pr83089.c: Require pthread.

Index: gcc/testsuite/gcc.dg/pr83089.c
===
--- gcc/testsuite/gcc.dg/pr83089.c  (revision 255060)
+++ gcc/testsuite/gcc.dg/pr83089.c  (working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target pthread } */
 /* { dg-options "-O2 -ftree-loop-if-convert -ftree-parallelize-loops=2" } */
 
 int rl, s8;


[PATCH][2/2] Alternate fix for PR81403

2017-11-23 Thread Richard Biener

This is the real fix.  The issue is that get_representative_for returns
the first SSA name with the value it finds which can be a non-leader.
But we're shoving this into the expression simplification machinery
which then ends up picking up bogus SSA range info.
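
The availability requirement can be sketched outside of GCC with a toy dominator tree.  The types and function names below are hypothetical and only mirror the shape of the check in the patch (no block given, default definition, or defining block dominates B); where this sketch returns NULL, the real code goes on to create a fresh "pretmp" name.

```c
#include <assert.h>
#include <stddef.h>

/* Toy basic block: IDOM is the immediate dominator, NULL for the entry.  */
struct toy_bb
{
  const struct toy_bb *idom;
};

/* Toy SSA name: DEF_BB is the defining block, NULL for a default
   definition, which is available everywhere.  */
struct toy_name
{
  const char *ident;
  const struct toy_bb *def_bb;
};

/* Return nonzero if DOM dominates BB; every block dominates itself.  */
static int
toy_dominated_by_p (const struct toy_bb *bb, const struct toy_bb *dom)
{
  for (; bb != NULL; bb = bb->idom)
    if (bb == dom)
      return 1;
  return 0;
}

/* Return the first of the N names in NAMES that is usable in block B,
   or NULL if there is none.  If B is NULL any name is acceptable.  */
static const struct toy_name *
toy_pick_representative (const struct toy_name *names, size_t n,
                         const struct toy_bb *b)
{
  assert (names != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    if (b == NULL
        || names[i].def_bb == NULL
        || toy_dominated_by_p (b, names[i].def_bb))
      return &names[i];
  return NULL;
}
```

With a diamond-shaped CFG, a name defined in the left arm is rejected when a representative is requested for the right arm, which is the situation that used to leak unrelated range info.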

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-11-23  Richard Biener  

PR tree-optimization/81403
* tree-ssa-pre.c (get_representative_for): Add parameter specifying
a block we need a leader relative to.
(phi_translate_1): For nary processing require a leader from
get_representative_for given we run expression simplification
using match-and-simplify.  Remove previous fix.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 255091)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -1257,7 +1257,7 @@ get_expr_type (const pre_expr e)
   gcc_unreachable ();
 }
 
-/* Get a representative SSA_NAME for a given expression.
+/* Get a representative SSA_NAME for a given expression that is available in B.
Since all of our sub-expressions are treated as values, we require
them to be SSA_NAME's for simplicity.
Prior versions of GVNPRE used to use "value handles" here, so that
@@ -1266,9 +1266,9 @@ get_expr_type (const pre_expr e)
them to be usable without finding leaders).  */
 
 static tree
-get_representative_for (const pre_expr e)
+get_representative_for (const pre_expr e, basic_block b = NULL)
 {
-  tree name;
+  tree name, valnum = NULL_TREE;
   unsigned int value_id = get_expr_value_id (e);
 
   switch (e->kind)
@@ -1289,7 +1289,18 @@ get_representative_for (const pre_expr e
  {
pre_expr rep = expression_for_id (i);
if (rep->kind == NAME)
- return VN_INFO (PRE_EXPR_NAME (rep))->valnum;
+ {
+   tree name = PRE_EXPR_NAME (rep);
+   valnum = VN_INFO (name)->valnum;
+   gimple *def = SSA_NAME_DEF_STMT (name);
+   /* We have to return either a new representative or one
+  that can be used for expression simplification and thus
+  is available in B.  */
+   if (! b 
+   || gimple_nop_p (def)
+   || dominated_by_p (CDI_DOMINATORS, b, gimple_bb (def)))
+ return name;
+ }
else if (rep->kind == CONSTANT)
  return PRE_EXPR_CONSTANT (rep);
  }
@@ -1305,7 +1316,7 @@ get_representative_for (const pre_expr e
  to compute it.  */
   name = make_temp_ssa_name (get_expr_type (e), gimple_build_nop (), "pretmp");
   VN_INFO_GET (name)->value_id = value_id;
-  VN_INFO (name)->valnum = name;
+  VN_INFO (name)->valnum = valnum ? valnum : name;
   /* ???  For now mark this SSA name for release by SCCVN.  */
   VN_INFO (name)->needs_insertion = true;
   add_to_value (value_id, get_or_alloc_expr_for_name (name));
@@ -1356,7 +1367,9 @@ phi_translate_1 (pre_expr expr, bitmap_s
leader = find_leader_in_sets (op_val_id, set1, set2);
 result = phi_translate (leader, set1, set2, pred, phiblock);
if (result && result != leader)
- newnary->op[i] = get_representative_for (result);
+ /* Force a leader as well as we are simplifying this
+expression.  */
+ newnary->op[i] = get_representative_for (result, pred);
else if (!result)
  return NULL;
 
@@ -1398,6 +1411,10 @@ phi_translate_1 (pre_expr expr, bitmap_s
  return constant;
  }
 
+   /* vn_nary_* do not valueize operands.  */
+   for (i = 0; i < newnary->length; ++i)
+ if (TREE_CODE (newnary->op[i]) == SSA_NAME)
+   newnary->op[i] = VN_INFO (newnary->op[i])->valnum;
tree result = vn_nary_op_lookup_pieces (newnary->length,
newnary->opcode,
newnary->type,
@@ -1414,45 +1431,6 @@ phi_translate_1 (pre_expr expr, bitmap_s
PRE_EXPR_NARY (expr) = nary;
new_val_id = nary->value_id;
get_or_alloc_expression_id (expr);
-   /* When we end up re-using a value number make sure that
-  doesn't have unrelated (which we can't check here)
-  range or points-to info on it.  */
-   if (result
-   && INTEGRAL_TYPE_P (TREE_TYPE (result))
-   && SSA_NAME_RANGE_INFO (result)
-   && ! SSA_NAME_IS_DEFAULT_DEF (result))
- {
-   if (! VN_INFO (result)->info.range_info)
- {
-   VN_INFO (result)->info.range_info
- = SSA_NAME_RANGE_INFO (result);
-   VN_INFO (result)->rang

Re: Add optabs for common types of permutation

2017-11-23 Thread Richard Biener
On Tue, Nov 21, 2017 at 11:47 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Mon, Nov 20, 2017 at 1:35 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
>>>> On Mon, Nov 20, 2017 at 12:56 AM, Jeff Law  wrote:
>>>>> On 11/09/2017 06:24 AM, Richard Sandiford wrote:
>>>>>> ...so that we can use them for variable-length vectors.  For now
>>>>>> constant-length vectors continue to use VEC_PERM_EXPR and the
>>>>>> vec_perm_const optab even for cases that the new optabs could
>>>>>> handle.
>>>>>>
>>>>>> The vector optabs are inconsistent about whether there should be
>>>>>> an underscore before the mode part of the name, but the other lo/hi
>>>>>> optabs have one.
>>>>>>
>>>>>> Doing this means that we're able to optimise some SLP tests using
>>>>>> non-SLP (for now) on targets with variable-length vectors, so the
>>>>>> patch needs to add a few XFAILs.  Most of these go away with later
>>>>>> patches.
>>>>>>
>>>>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>>>>> and powerpc64le-linux-gnu.  OK to install?
>>>>>>
>>>>>> Richard
>>>>>>
>>>>>>
>>>>>> 2017-11-09  Richard Sandiford  
>>>>>>   Alan Hayward  
>>>>>>   David Sherwood  
>>>>>>
>>>>>> gcc/
>>>>>>   * doc/md.texi (vec_reverse, vec_interleave_lo, vec_interleave_hi)
>>>>>>   (vec_extract_even, vec_extract_odd): Document new optabs.
>>>>>>   * internal-fn.def (VEC_INTERLEAVE_LO, VEC_INTERLEAVE_HI)
>>>>>>   (VEC_EXTRACT_EVEN, VEC_EXTRACT_ODD, VEC_REVERSE): New internal
>>>>>>   functions.
>>>>>>   * optabs.def (vec_interleave_lo_optab, vec_interleave_hi_optab)
>>>>>>   (vec_extract_even_optab, vec_extract_odd_optab, vec_reverse_optab):
>>>>>>   New optabs.
>>>>>>   * tree-vect-data-refs.c: Include internal-fn.h.
>>>>>>   (vect_grouped_store_supported): Try using
>>>>>> IFN_VEC_INTERLEAVE_{LO,HI}.
>>>>>>   (vect_permute_store_chain): Use them here too.
>>>>>>   (vect_grouped_load_supported): Try using
>>>>>> IFN_VEC_EXTRACT_{EVEN,ODD}.
>>>>>>   (vect_permute_load_chain): Use them here too.
>>>>>>   * tree-vect-stmts.c (can_reverse_vector_p): New function.
>>>>>>   (get_negative_load_store_type): Use it.
>>>>>>   (reverse_vector): New function.
>>>>>>   (vectorizable_store, vectorizable_load): Use it.
>>>>>>   * config/aarch64/iterators.md (perm_optab): New iterator.
>>>>>>   * config/aarch64/aarch64-sve.md (_): New
>>>>>> expander.
>>>>>>   (vec_reverse_): Likewise.
>>>>>>
>>>>>> gcc/testsuite/
>>>>>>   * gcc.dg/vect/no-vfa-vect-depend-2.c: Remove XFAIL.
>>>>>>   * gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
>>>>>>   * gcc.dg/vect/pr33953.c: XFAIL for vect_variable_length.
>>>>>>   * gcc.dg/vect/pr68445.c: Likewise.
>>>>>>   * gcc.dg/vect/slp-12a.c: Likewise.
>>>>>>   * gcc.dg/vect/slp-13-big-array.c: Likewise.
>>>>>>   * gcc.dg/vect/slp-13.c: Likewise.
>>>>>>   * gcc.dg/vect/slp-14.c: Likewise.
>>>>>>   * gcc.dg/vect/slp-15.c: Likewise.
>>>>>>   * gcc.dg/vect/slp-42.c: Likewise.
>>>>>>   * gcc.dg/vect/slp-multitypes-2.c: Likewise.
>>>>>>   * gcc.dg/vect/slp-multitypes-4.c: Likewise.
>>>>>>   * gcc.dg/vect/slp-multitypes-5.c: Likewise.
>>>>>>   * gcc.dg/vect/slp-reduc-4.c: Likewise.
>>>>>>   * gcc.dg/vect/slp-reduc-7.c: Likewise.
>>>>>>   * gcc.target/aarch64/sve_vec_perm_2.c: New test.
>>>>>>   * gcc.target/aarch64/sve_vec_perm_2_run.c: Likewise.
>>>>>>   * gcc.target/aarch64/sve_vec_perm_3.c: New test.
>>>>>>   * gcc.target/aarch64/sve_vec_perm_3_run.c: Likewise.
>>>>>>   * gcc.target/aarch64/sve_vec_perm_4.c: New test.
>>>>>>   * gcc.target/aarch64/sve_vec_perm_4_run.c: Likewise.
>>>>> OK.
>>>>
>>>> It's really a step backwards - we had those optabs and a tree code in
>>>> the past and
>>>> canonicalizing things to VEC_PERM_EXPR made things simpler.
>>>>
>>>> Why doesn't VEC_PERM  not work?
>>>
>>> The problems with that are:
>>>
>>> - It doesn't work for vectors with 256-bit elements because the indices
>>>   wrap round.
>>
>> That's a general issue that would need to be addressed for larger
>> vectors (GCN?).
>> I presume the requirement that the permutation vector have the same size
>> needs to be relaxed.
>>
>>> - Supporting a fake VEC_PERM_EXPR  for a few
>>>   special cases would be hard, especially since v256hi isn't a normal
>>>   vector mode.  I imagine everything dealing with VEC_PERM_EXPR would
>>>   then have to worry about that special case.
>>
>> I think it's not really a special case - any code here should just
>> expect the same
>> number of vector elements and not a particular size.  You already dealt with
>> using a char[] vector for permutations I think.
>
> It sounds like you're talking about the case in which the permutation
> vector is a VECTOR_CST.  We still use VEC_PERM_EXPRs for constant-length
> vectors, so that doesn't change.  (And yes, that probably means that it
> does break for *fixed-length* 

Re: [PATCH] Fix expand_assignment with complex modes with the same size but different inner mode (PR middle-end/82253)

2017-11-23 Thread Richard Biener
On Wed, 22 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> store_expr which is called if to_rtx is a CONCAT and from has complex mode
> covering the whole to_rtx can handle the case when from expands to a CONCAT
> or if it has the same complex mode, but if e.g. one mode is CTImode and
> the other mode is TCmode and from expands to something other than a CONCAT
> (e.g. a MEM, not sure if anything else is possible, perhaps a constant),
> then convert_mode can't deal with it.  We already have code to handle the
> case when to_rtx is CONCAT and from covers all its bits, so this patch
> just uses that code if the complex modes are different.
> simplify_gen_subreg doesn't work properly if from would expand as a CONCAT
> though, so I'm subregging the individual parts instead in that case.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

LGTM.

Thanks,
Richard.

> 2017-11-22  Jakub Jelinek  
> 
>   PR middle-end/82253
>   * expr.c (expand_assignment): For CONCAT to_rtx, complex type from and
>   bitpos/bitsize covering the whole destination, use store_expr only if
>   the complex mode is the same.  Otherwise, use expand_normal and if
>   it returns CONCAT, subreg each part separately instead of trying to
>   subreg the whole result.
> 
>   * gfortran.dg/pr82253.f90: New test.
> 
> --- gcc/expr.c.jj 2017-11-21 20:23:02.0 +0100
> +++ gcc/expr.c2017-11-22 18:31:57.513726153 +0100
> @@ -5107,7 +5107,8 @@ expand_assignment (tree to, tree from, b
>else if (GET_CODE (to_rtx) == CONCAT)
>   {
> unsigned short mode_bitsize = GET_MODE_BITSIZE (GET_MODE (to_rtx));
> -   if (COMPLEX_MODE_P (TYPE_MODE (TREE_TYPE (from)))
> +   if (TYPE_MODE (TREE_TYPE (from)) == GET_MODE (to_rtx)
> +   && COMPLEX_MODE_P (GET_MODE (to_rtx))
> && bitpos == 0
> && bitsize == mode_bitsize)
>   result = store_expr (from, to_rtx, false, nontemporal, reversep);
> @@ -5128,14 +5129,30 @@ expand_assignment (tree to, tree from, b
> nontemporal, reversep);
> else if (bitpos == 0 && bitsize == mode_bitsize)
>   {
> -   rtx from_rtx;
> result = expand_normal (from);
> -   from_rtx = simplify_gen_subreg (GET_MODE (to_rtx), result,
> -   TYPE_MODE (TREE_TYPE (from)), 0);
> -   emit_move_insn (XEXP (to_rtx, 0),
> -   read_complex_part (from_rtx, false));
> -   emit_move_insn (XEXP (to_rtx, 1),
> -   read_complex_part (from_rtx, true));
> +   if (GET_CODE (result) == CONCAT)
> + {
> +   machine_mode to_mode = GET_MODE_INNER (GET_MODE (to_rtx));
> +   machine_mode from_mode = GET_MODE_INNER (GET_MODE (result));
> +   rtx from_real
> + = simplify_gen_subreg (to_mode, XEXP (result, 0),
> +from_mode, 0);
> +   rtx from_imag
> + = simplify_gen_subreg (to_mode, XEXP (result, 1),
> +from_mode, 1);
> +   emit_move_insn (XEXP (to_rtx, 0), from_real);
> +   emit_move_insn (XEXP (to_rtx, 1), from_imag);
> + }
> +   else
> + {
> +   rtx from_rtx
> + = simplify_gen_subreg (GET_MODE (to_rtx), result,
> +TYPE_MODE (TREE_TYPE (from)), 0);
> +   emit_move_insn (XEXP (to_rtx, 0),
> +   read_complex_part (from_rtx, false));
> +   emit_move_insn (XEXP (to_rtx, 1),
> +   read_complex_part (from_rtx, true));
> + }
>   }
> else
>   {
> --- gcc/testsuite/gfortran.dg/pr82253.f90.jj  2017-11-22 18:41:33.421850619 +0100
> +++ gcc/testsuite/gfortran.dg/pr82253.f90 2017-11-22 18:41:18.0 +0100
> @@ -0,0 +1,40 @@
> +! PR middle-end/82253
> +! { dg-do compile { target fortran_real_16 } }
> +! { dg-options "-Og" }
> +
> +module pr82253
> +  implicit none
> +  private
> +  public :: static_type
> +  type, public :: T
> +procedure(), nopass, pointer :: testProc => null()
> +  end type
> +  type, public :: S
> +complex(kind=16), pointer :: ptr
> +  end type
> +  type(T), target :: type_complex32
> +  interface static_type
> +module procedure foo
> +  end interface
> +  interface
> +subroutine bar (testProc)
> +  procedure(), optional :: testProc
> +end subroutine
> +  end interface
> +  contains
> +function foo (self) result(res)
> +  complex(kind=16) :: self
> +  type(T), pointer :: res
> +  call bar (testProc = baz)
> +end function
> +subroutine baz (buffer, status)
> +  character(len=*) :: buffer
> +  integer(kind=4) :: status
> +  complex(kind=16), target :: obj
> +  type(S) :: self
> +  

Re: [patch] Add support for #pragma GCC unroll v2

2017-11-23 Thread Richard Biener
On Wed, Nov 22, 2017 at 11:46 AM, Eric Botcazou  wrote:
> Hi,
>
> this is a revised version of:
>   https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01452.html
>
> with the following changes:
>  1. integration of Bernhard's patch for the Fortran front-end,
>  2. Sandra's fix for the documentation,
>  3. minor tweaks to the C and C++ front-end,
>  4. change at the GIMPLE level for the cunrolli pass,
>  5. change at the RTL level with a fix for a thinko,
>  6. More testcases for all languages.
>
> This makes it so that the presence of a pragma GCC unroll doesn't have a
> global effect on the function: at the GIMPLE level, the cunrolli pass is no
> longer forced, so that the unrolling is done by the first activated pass
> (cunroll at -O1, cunrolli at -O2 and above); at the RTL level, this was
> already the case but the code no longer fiddles with the unrolling flag.
>
> Tested on x86_64-suse-linux, OK for the mainline?

The middle-end, testsuite and boilerplate changes in the FEs are ok.

Pragma support in the FEs needs FE maintainer approval.
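
For readers who have not followed the earlier thread, this is roughly what a use of the pragma looks like from the C side.  It is a minimal sketch, assuming a compiler that accepts the pragma as described in the patch (one that does not will ignore it, possibly with a warning); the function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the N elements of A, asking for the loop to be unrolled 4 times.  */
static long
sum_array (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long sum = 0;
  /* The pragma applies to the loop that immediately follows it.  */
#pragma GCC unroll 4
  for (size_t i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}
```

The pragma is only a hint about code generation; the result of the function is the same with or without it.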

Thanks,
Richard.

>
> 2017-11-22  Mike Stump  
> Eric Botcazou  
> Bernhard Reutner-Fischer  
>
> ChangeLog/
> * doc/extend.texi (Loop-Specific Pragmas): Document pragma GCC unroll.
> * doc/generic.texi (ANNOTATE_EXPR): Document 3rd operand.
> * cfgloop.h (struct loop): Add unroll field.
> * function.h (struct function): Add has_unroll bitfield.
> * gimplify.c (gimple_boolify) : Deal with unroll kind.
> (gimplify_expr) : Propagate 3rd operand.
> * loop-init.c (pass_loop2::gate): Return true if cfun->has_unroll.
> (pass_rtl_unroll_loops::gate): Likewise.
> * loop-unroll.c (decide_unrolling): Tweak note message.  Skip loops
> for which loop->unroll==1.
> (decide_unroll_constant_iterations): Use note for consistency and
> take loop->unroll into account.  Return early if loop->unroll is set.
> Fix thinko in existing test.
> (decide_unroll_runtime_iterations): Use note for consistency and
> take loop->unroll into account.
> (decide_unroll_stupid): Likewise.
> * lto-streamer-in.c (input_cfg): Read loop->unroll.
> * lto-streamer-out.c (output_cfg): Write loop->unroll.
> * tree-cfg.c (replace_loop_annotate_in_block) 
> New.
> (replace_loop_annotate) : Likewise.
> (print_loop): Print loop->unroll if set.
> * tree-core.h (enum annot_expr_kind): Add annot_expr_unroll_kind.
> * tree-inline.c (copy_loops): Copy unroll and set cfun->has_unroll.
> * tree-pretty-print.c (dump_generic_node) :
> New.
> * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Bail out if
> loop->unroll is set and smaller than the trip count.  Otherwise bypass
> entirely the heuristics if loop->unroll is set.  Remove dead note.
> Fix off-by-one bug in other node.
> (try_peel_loop): Bail out if loop->unroll is set.  Fix formatting.
> (tree_unroll_loops_completely_1): Force unrolling if loop->unroll
> is greater than 1.
> (tree_unroll_loops_completely): Make static.
> (pass_complete_unroll::execute): Use correct type for variable.
> (pass_complete_unrolli::execute): Fix formatting.
> * tree.def (ANNOTATE_EXPR): Add 3rd operand.
>
> ada/ChangeLog:
> * gcc-interface/trans.c (gnat_gimplify_stmt) : Add 3rd
> operand to ANNOTATE_EXPR and pass unrolling hints.
>
> c-family/ChangeLog:
> * c-pragma.c (init_pragma): Register pragma GCC unroll.
> * c-pragma.h (enum pragma_kind): Add PRAGMA_UNROLL.
>
> c/ChangeLog:
> * c-parser.c (c_parser_while_statement): Add unroll parameter and
> build ANNOTATE_EXPR if present.  Add 3rd operand to ANNOTATE_EXPR.
> (c_parser_do_statement): Likewise.
> (c_parser_for_statement): Likewise.
> (c_parser_statement_after_labels): Adjust calls to above.
> (c_parse_pragma_ivdep): New static function.
> (c_parser_pragma_unroll): Likewise.
> (c_parser_pragma) : Add support for pragma Unroll.
> : New case.
>
> cp/ChangeLog:
> * constexpr.c (cxx_eval_constant_expression) : Remove
> assertion on 2nd operand.
> (potential_constant_expression_1): Likewise.
> * cp-array-notation.c (create_an_loop): Adjust call to finish_for_cond.
> * cp-tree.h (cp_convert_range_for): Adjust prototype.
> (finish_while_stmt_cond): Likewise.
> (finish_do_stmt): Likewise.
> (finish_for_cond): Likewise.
> * init.c (build_vec_init): Adjust call to finish_for_cond.
> * parser.c (cp_parser_statement): Adjust call to
> cp_parser_iteration_statement.
> (cp_parser_for): Add unroll parameter and pass it in calls to
> cp_parser_range_for and cp_parser_c_for.
> (cp_parser_c_for): Add unroll parameter and pass 

[PATCH] Fix PR23094

2017-11-23 Thread Richard Biener

This fixes an old missed optimization in alias disambiguation during
SCCVN.  We can skip a clobbering stmt if that is a store with exactly
the same value as a preceding redundancy.
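
As a small illustration, mirroring test1 from the new testcase: whether or not p and q alias, the second store can only write the value that *p already holds, so the load can be replaced by the constant.  The function name is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Store the same value through P and Q, then load through P.  */
static int
store_twice_then_load (int *p, int *q)
{
  assert (p != NULL && q != NULL);
  *p = 1;
  *q = 1;     /* May clobber *p, but only with the value it already has.  */
  return *p;  /* So this load can be treated as the constant 1.  */
}
```

Calling it with two distinct pointers or with the same pointer twice gives 1 in both cases, which is why skipping the clobber is safe.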

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-11-23  Richard Biener  

PR tree-optimization/23094
* tree-ssa-sccvn.c (vuse_ssa_val): Handle VN_TOP when we
come here from walking over backedges in the first iteration.
(vn_reference_lookup_3): Skip clobbers that store the same value.

* gcc.dg/tree-ssa/ssa-fre-61.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 255054)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -345,7 +345,12 @@ vuse_ssa_val (tree x)
 
   do
 {
-  x = SSA_VAL (x);
+  tree tem = SSA_VAL (x);
+  /* stmt walking can walk over a backedge and reach code we didn't
+value-number yet.  */
+  if (tem == VN_TOP)
+   return x;
+  x = tem;
 }
   while (SSA_NAME_IN_FREE_LIST (x));
 
@@ -1868,6 +1873,39 @@ vn_reference_lookup_3 (ao_ref *ref, tree
  ao_ref_init (&lhs_ref, lhs);
  lhs_ref_ok = true;
}
+
+  /* If we reach a clobbering statement try to skip it and see if
+ we find a VN result with exactly the same value as the
+possible clobber.  In this case we can ignore the clobber
+and return the found value.
+Note that we don't need to worry about partial overlapping
+accesses as we then can use TBAA to disambiguate against the
+clobbering statement when looking up a load (thus the
+VN_WALKREWRITE guard).  */
+  if (vn_walk_kind == VN_WALKREWRITE
+ && is_gimple_reg_type (TREE_TYPE (lhs))
+ && types_compatible_p (TREE_TYPE (lhs), vr->type))
+   {
+ tree *saved_last_vuse_ptr = last_vuse_ptr;
+ /* Do not update last_vuse_ptr in vn_reference_lookup_2.  */
+ last_vuse_ptr = NULL;
+ tree saved_vuse = vr->vuse;
+ hashval_t saved_hashcode = vr->hashcode;
+ void *res = vn_reference_lookup_2 (ref,
+gimple_vuse (def_stmt), 0, vr);
+ /* Need to restore vr->vuse and vr->hashcode.  */
+ vr->vuse = saved_vuse;
+ vr->hashcode = saved_hashcode;
+ last_vuse_ptr = saved_last_vuse_ptr;
+ if (res && res != (void *)-1)
+   {
+ vn_reference_t vnresult = (vn_reference_t) res;
+ if (vnresult->result
+ && operand_equal_p (vnresult->result,
+ gimple_assign_rhs1 (def_stmt), 0))
+   return res;
+   }
+   }
 }
   else if (gimple_call_builtin_p (def_stmt, BUILT_IN_NORMAL)
   && gimple_call_num_args (def_stmt) <= 4)
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-61.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-61.c  (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-61.c  (working copy)
@@ -0,0 +1,43 @@
+/* { dg-do link } */
+/* { dg-options "-O -fdump-tree-fre1-details" } */
+
+void link_error (void);
+
+void test1 (int *p, int *q)
+{
+  *p = 1;
+  *q = 1;
+  if (*p != 1)
+link_error ();
+}
+
+void test2 (int *p, int *q, int t)
+{
+  *p = t;
+  *q = t;
+  if (*p != t)
+link_error ();
+}
+
+void test3 (int *q, int *p)
+{
+  int tem = *p;
+  *q = tem;
+  if (*p != tem)
+link_error ();
+}
+
+char a[4];
+struct A { char a[4]; };
+void test4 (struct A *p)
+{
+  a[0] = p->a[0];
+  a[0] = p->a[0];
+  a[0] = p->a[0];
+}
+
+int main() { return 0; }
+
+/* { dg-final { scan-tree-dump-times "Replaced \\\*p" 3 "fre1" } } */
+/* { dg-final { scan-tree-dump-times "Replaced p_.\\(D\\)->" 2 "fre1" } } */
+/* { dg-final { scan-tree-dump-times "Deleted redundant store a\\\[0\\\]" 2 "fre1" } } */


RE: [PATCH, i386] Fix behavior for –mprefer-vector-width= option

2017-11-23 Thread Shalnov, Sergey
Uros,
Thank you for the review. A new patch addressing your comments is attached.
Could you please merge it?

Thank you
Sergey

2017-11-23  Sergey Shalnov  

gcc/
* config/i386/i386.h (TARGET_PREFER_AVX256): Also
enable when TARGET_PREFER_AVX128 is set.


-----Original Message-----
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Wednesday, November 22, 2017 9:18 PM
To: Shalnov, Sergey 
Cc: gcc-patches@gcc.gnu.org; kirill.yuk...@gmail.com; Koval, Julia 
; Senkevich, Andrew ; Peryt, 
Sebastian ; Ivchenko, Alexander 
; Joseph Myers 
Subject: Re: [PATCH, i386] Fix behavior for –mprefer-vector-width= option

On Wed, Nov 22, 2017 at 3:58 PM, Shalnov, Sergey  wrote:
> Hi,
> This patch makes the -mprefer-vector-width= option inclusive. This means
> that -mprefer-vector-width=128 should turn on both TARGET_PREFER_AVX128
> and TARGET_PREFER_AVX256.
> It is a minor change so that -mprefer-vector-width=128 generates "xmm"
> code on a platform with "zmm".
>
> Sergey
>
> 2017-11-22  Sergey Shalnov   gcc/
> * config/i386/i386.h (TARGET_PREFER_AVX256): Add inclusiveness of
> the TARGET_PREFER_AVX256 for TARGET_PREFER_AVX128

You could just say:

* config/i386/i386.h (TARGET_PREFER_AVX256): Also
enable when TARGET_PREFER_AVX128 is set.

+#define TARGET_PREFER_AVX256 (TARGET_PREFER_AVX128 || \
+ (prefer_vector_width_type == PVW_AVX256))

No need for extra parentheses, and following the GNU coding standard, the
condition should start on the next line:

#define TARGET_PREFER_AVX256 (TARGET_PREFER_AVX128 \
                              || prefer_vector_width_type == PVW_AVX256)

Otherwise OK.
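
To see what the inclusive definition buys, here is a standalone mock-up.  The enum and the two macros follow the names used in this thread; demo_preferred_bits is a hypothetical, much simplified stand-in for the back end's vector-size selection and is not code from i386.c.

```c
#include <assert.h>

/* Standalone mock of the relevant i386.h pieces, for illustration only.  */
enum prefer_vector_width { PVW_NONE, PVW_AVX128, PVW_AVX256, PVW_AVX512 };

static enum prefer_vector_width prefer_vector_width_type = PVW_NONE;

#define TARGET_PREFER_AVX128 (prefer_vector_width_type == PVW_AVX128)
#define TARGET_PREFER_AVX256 (TARGET_PREFER_AVX128 \
                              || prefer_vector_width_type == PVW_AVX256)

/* Hypothetical helper: widest vector size in bits to use, given the
   hardware maximum HW_BITS (128, 256 or 512).  */
static int
demo_preferred_bits (int hw_bits)
{
  assert (hw_bits == 128 || hw_bits == 256 || hw_bits == 512);
  if (hw_bits >= 512 && !TARGET_PREFER_AVX256)
    return 512;
  if (hw_bits >= 256 && !TARGET_PREFER_AVX128)
    return 256;
  return 128;
}
```

Without the TARGET_PREFER_AVX128 term in TARGET_PREFER_AVX256, a request for 128 bits on 512-bit hardware would pass the first test in this sketch and still pick 512.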

Uros.


0006-Fix-behavior-for-mprefer-vector-width-option.patch


[PATCH] rs6000_gimple_fold_builtin formatting fixes

2017-11-23 Thread Jakub Jelinek
Hi!

While looking at this function, I found so many formatting issues
(indentation and spacing) that I've decided to post a patch.  As an
additional cleanup, I've moved the temp and g declarations to the top of
the function, so that {}s don't need to clutter most of the cases.
The patch shouldn't change the generated code at all.
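
For anyone wondering why the {}s were there in the first place: a declaration directly under a case label needs its own block (before C23 a label must be followed by a statement, and in C++ a later case label may not jump past an initialized declaration).  Hoisting the declaration removes that need.  A minimal sketch of the resulting shape, with made-up names rather than the rs6000 code:

```c
#include <assert.h>
#include <stddef.h>

enum demo_op { DEMO_ADD, DEMO_SUB, DEMO_MUL };

/* Fold OP applied to ARG0 and ARG1 into *RESULT.  Return 1 if OP was
   handled and 0 otherwise.  */
static int
demo_fold (enum demo_op op, int arg0, int arg1, int *result)
{
  int temp;  /* Declared once at the top, shared by every case.  */

  assert (result != NULL);
  switch (op)
    {
    case DEMO_ADD:
      temp = arg0 + arg1;
      *result = temp;
      return 1;
    case DEMO_SUB:
      temp = arg0 - arg1;
      *result = temp;
      return 1;
    case DEMO_MUL:
      temp = arg0 * arg1;
      *result = temp;
      return 1;
    default:
      return 0;
    }
}
```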

Ok for trunk if it passes bootstrap/regtest?

The attached patch is diff -upbd output to show fewer changes (mostly
just the removals of {}s and gimple * removal from g and tree removal from
temp).

2017-11-23  Jakub Jelinek  

* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Formatting
fixes.  Declare temp and g variables at the top in order to avoid
{} in most of the cases.

--- gcc/config/rs6000/rs6000.c.jj   2017-11-17 08:40:39.0 +0100
+++ gcc/config/rs6000/rs6000.c  2017-11-23 10:17:22.711402906 +0100
@@ -16121,7 +16121,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_
   gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);
   enum rs6000_builtins fn_code
 = (enum rs6000_builtins) DECL_FUNCTION_CODE (fndecl);
-  tree arg0, arg1, lhs;
+  tree arg0, arg1, lhs, temp;
+  gimple *g;
 
   size_t uns_fncode = (size_t) fn_code;
   enum insn_code icode = rs6000_builtin_info[uns_fncode].icode;
@@ -16153,15 +16154,13 @@ rs6000_gimple_fold_builtin (gimple_stmt_
 case P8V_BUILTIN_VADDUDM:
 case ALTIVEC_BUILTIN_VADDFP:
 case VSX_BUILTIN_XVADDDP:
-  {
-   arg0 = gimple_call_arg (stmt, 0);
-   arg1 = gimple_call_arg (stmt, 1);
-   lhs = gimple_call_lhs (stmt);
-   gimple *g = gimple_build_assign (lhs, PLUS_EXPR, arg0, arg1);
-   gimple_set_location (g, gimple_location (stmt));
-   gsi_replace (gsi, g, true);
-   return true;
-  }
+  arg0 = gimple_call_arg (stmt, 0);
+  arg1 = gimple_call_arg (stmt, 1);
+  lhs = gimple_call_lhs (stmt);
+  g = gimple_build_assign (lhs, PLUS_EXPR, arg0, arg1);
+  gimple_set_location (g, gimple_location (stmt));
+  gsi_replace (gsi, g, true);
+  return true;
 /* Flavors of vec_sub.  We deliberately don't expand
P8V_BUILTIN_VSUBUQM. */
 case ALTIVEC_BUILTIN_VSUBUBM:
@@ -16170,26 +16169,22 @@ rs6000_gimple_fold_builtin (gimple_stmt_
 case P8V_BUILTIN_VSUBUDM:
 case ALTIVEC_BUILTIN_VSUBFP:
 case VSX_BUILTIN_XVSUBDP:
-  {
-   arg0 = gimple_call_arg (stmt, 0);
-   arg1 = gimple_call_arg (stmt, 1);
-   lhs = gimple_call_lhs (stmt);
-   gimple *g = gimple_build_assign (lhs, MINUS_EXPR, arg0, arg1);
-   gimple_set_location (g, gimple_location (stmt));
-   gsi_replace (gsi, g, true);
-   return true;
-  }
+  arg0 = gimple_call_arg (stmt, 0);
+  arg1 = gimple_call_arg (stmt, 1);
+  lhs = gimple_call_lhs (stmt);
+  g = gimple_build_assign (lhs, MINUS_EXPR, arg0, arg1);
+  gimple_set_location (g, gimple_location (stmt));
+  gsi_replace (gsi, g, true);
+  return true;
 case VSX_BUILTIN_XVMULSP:
 case VSX_BUILTIN_XVMULDP:
-  {
-   arg0 = gimple_call_arg (stmt, 0);
-   arg1 = gimple_call_arg (stmt, 1);
-   lhs = gimple_call_lhs (stmt);
-   gimple *g = gimple_build_assign (lhs, MULT_EXPR, arg0, arg1);
-   gimple_set_location (g, gimple_location (stmt));
-   gsi_replace (gsi, g, true);
-   return true;
-  }
+  arg0 = gimple_call_arg (stmt, 0);
+  arg1 = gimple_call_arg (stmt, 1);
+  lhs = gimple_call_lhs (stmt);
+  g = gimple_build_assign (lhs, MULT_EXPR, arg0, arg1);
+  gimple_set_location (g, gimple_location (stmt));
+  gsi_replace (gsi, g, true);
+  return true;
 /* Even element flavors of vec_mul (signed). */
 case ALTIVEC_BUILTIN_VMULESB:
 case ALTIVEC_BUILTIN_VMULESH:
@@ -16198,15 +16193,13 @@ rs6000_gimple_fold_builtin (gimple_stmt_
 case ALTIVEC_BUILTIN_VMULEUB:
 case ALTIVEC_BUILTIN_VMULEUH:
 case ALTIVEC_BUILTIN_VMULEUW:
-  {
-   arg0 = gimple_call_arg (stmt, 0);
-   arg1 = gimple_call_arg (stmt, 1);
-   lhs = gimple_call_lhs (stmt);
-   gimple *g = gimple_build_assign (lhs, VEC_WIDEN_MULT_EVEN_EXPR, arg0, arg1);
-   gimple_set_location (g, gimple_location (stmt));
-   gsi_replace (gsi, g, true);
-   return true;
-  }
+  arg0 = gimple_call_arg (stmt, 0);
+  arg1 = gimple_call_arg (stmt, 1);
+  lhs = gimple_call_lhs (stmt);
+  g = gimple_build_assign (lhs, VEC_WIDEN_MULT_EVEN_EXPR, arg0, arg1);
+  gimple_set_location (g, gimple_location (stmt));
+  gsi_replace (gsi, g, true);
+  return true;
 /* Odd element flavors of vec_mul (signed).  */
 case ALTIVEC_BUILTIN_VMULOSB:
 case ALTIVEC_BUILTIN_VMULOSH:
@@ -16215,65 +16208,55 @@ rs6000_gimple_fold_builtin (gimple_stmt_
 case ALTIVEC_BUILTIN_VMULOUB:
 case ALTIVEC_BUILTIN_VMULOUH:
 case ALTIVEC_BUILTIN_VMULOUW:
-  {
-   arg0 = gimple_call_arg (stmt, 0);
-   arg1 = gimple_call_arg (stmt, 1);
-   

RE: [patch] remove cilk-plus

2017-11-23 Thread Koval, Julia
Sorry, I think in this version of this patch they are fixed.

> -Original Message-
> From: Joseph Myers [mailto:jos...@codesourcery.com]
> Sent: Wednesday, November 22, 2017 6:23 PM
> To: Koval, Julia 
> Cc: Jeff Law ; Jakub Jelinek ; GCC
> Patches 
> Subject: RE: [patch] remove cilk-plus
> 
> This patch version does not appear to address my comment that you're
> leaving behind comments in c-parser.c relating to Cilk array notations
> while removing the subsequent code.  (Or in one case actually reindenting
> the comment that is no longer relevant, rather than removing it.)
> 
> --
> Joseph S. Myers
> jos...@codesourcery.com


cilk-plus.tar.xz
Description: cilk-plus.tar.xz


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-23 Thread Alan Hayward

> On 22 Nov 2017, at 16:57, Kilian Verhetsel  
> wrote:
> 
> 
> Thank you both for your comments.
> 
> I have added the check to ensure the index vector won't cause an
> overflow. I also added tests to the testsuite in order to check that the
> loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX
> iterations. I was not able to write code that triggers
> INTEGER_INDUC_COND_REDUCTION when using char or other smaller types
> (changing the types of last, min_v and a to something else causes
> COND_REDUCTION to be used instead), so these tests are only compiled and
> not executed.

I had similar problems when playing around with -14.c, but didn’t have a chance to
investigate why. Possibly worth raising a bug mentioning that there is a missed
optimisation. It’d be nice to figure out why.

> 
> I also moved an instruction that generates a vector of zeroes (used for
> COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this
> should remove the unused vector that Alan noticed.

Patch is ok for me.


Alan.
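
For readers following the thread, here is a minimal scalar sketch of the kind
of loop being discussed, a conditional reduction that records the index of the
last matching element. The names last, min_v and a come from the message
above; the function name and the -1 "no match" default are assumptions made
for illustration, and this is not the testcase from the patch. It only shows
why a match at index 0 must stay distinguishable from "no match at all".

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference for a conditional reduction: return the index of the
   last element of a[0..n-1] that is smaller than min_v, or -1 if no
   element matches.  The function name and the -1 default are
   hypothetical; only last, min_v and a are taken from the thread.  */
int
last_index_below (const int *a, size_t n, int min_v)
{
  int last = -1;

  assert (a != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (a[i] < min_v)
      last = (int) i;
  return last;
}
```

A vectorized version has to return the same value as this scalar loop,
including the case where the only match is at index 0.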



Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.

2017-11-23 Thread Kyrill Tkachov

Hi Charles,

On 20/11/17 21:09, Charles Baylis wrote:

On 15 September 2017 at 18:01, Kyrill Tkachov
 wrote:
>
> On 15/09/17 16:38, Charles Baylis wrote:
>>
>> On 13 September 2017 at 10:02, Kyrill  Tkachov
>>  wrote:
>>>
>>> Hi Charles,
>>>
>>> On 12/09/17 09:34, charles.bay...@linaro.org wrote:

 From: Charles Baylis 

 This patch moves the calculation of costs for MEM into a
 separate function, and reforms the calculation into two
 parts. Firstly any additional cost of the addressing mode
 is calculated, and then the cost of the memory access itself
 is added.

 In this patch, the calculation of the cost of the addressing
 mode is left as a placeholder, to be added in a subsequent
 patch.

>>> Can you please mention how has this series been tested?
>>> A bootstrap and test run on arm-none-linux-gnueabihf is required at
>>> least.
>>
>> It has been tested with make check on arm-unknown-linux-gnueabihf with
>> no regressions. I've successfully bootstrapped the next spin.
>
>
> Thanks.
>
>>> Also, do you have any benchmarking results for this?
>>> I agree that generating the addressing modes in the new tests is
>>> desirable.
>>> So I'm not objecting to the goal of this patch, but a check to make sure
>>> that this doesn't regress SPEC
>>> would be great.  Further comments on the patch inline.
>>
>> SPEC2006 scores are unaffected by this patch on Cortex-A57.
>
>
> Good, thanks for checking :)
>
>
 +/* Helper function for arm_rtx_costs_internal.  Calculates the cost of a
 +   MEM, considering the costs of the addressing mode and memory access
 +   separately.  */
 +static bool
 +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
 +  int *cost, bool speed_p)
 +{
 +  machine_mode mode = GET_MODE (x);
 +  if (flag_pic
 +  && GET_CODE (XEXP (x, 0)) == PLUS
 +  && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
 +/* This will be split into two instructions.  Add the cost of the
 +   additional instruction here.  The cost of the memory access is
 +   computed below.  See arm.md:calculate_pic_address.  */
 +*cost = COSTS_N_INSNS (1);
 +  else
 +*cost = 0;
>>>
>>>
>>> For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for
>>> each insn) plus the appropriate field in extra_cost. So you should
>>> unconditionally initialise the cost to COSTS_N_INSNS (1), then
>>> conditionally increment it by COSTS_N_INSNS (1) with the condition above.
>>
>> OK. I also have to subtract that COSTS_N_INSNS (1) in the if (speed_p)
>> part because the cost of a single bus transfer is included in that
>> initial cost.
>>
 +
 +  /* Calculate cost of the addressing mode.  */
 +  if (speed_p)
 +{
 +  /* TODO: Add table-driven costs for addressing modes.  (See patch 2) */
 +}
>>>
>>>
>>> You mean "patch 3". I recommend you just remove this conditional from
>>> this patch and add the logic in patch 3 entirely.
>>
>> OK.
>>
 +
 +  /* Calculate cost of memory access.  */
 +  if (speed_p)
 +{
 +  /* data transfer is transfer size divided by bus width.  */
 +  int bus_width_bytes = current_tune->bus_width / 4;
>>>
>>>
>>> This should be bus_width / BITS_PER_UNIT to get the size in bytes.
>>> BITS_PER_UNIT is 8 though, so you'll have to double check to make sure
>>> the cost calculation and generated code is still appropriate.
>>
>> Oops, I changed the units around and messed this up. I'll fix this.
>>
 +  *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
 +  *cost += extra_cost->ldst.load;
 +}
 +  else
 +{
 +  *cost += COSTS_N_INSNS (1);
 +}
>>>
>>> Given my first comment above this else would be deleted.
>>
>> OK
>
>
> I have a concern about using the bus_width parameter which
> I explain in the thread for patch 1 (I don't think we need it, we should
> use the fields in extra_cost->ldst more carefully).

I have modified this patch accordingly. Patch 1 is no longer needed.

Passes "make check" (with patch 3) on arm-linux-gnueabihf with no
regressions. Bootstrap is in progress.

Can I still get this in during stage 3?



Thanks, these are ok for trunk.
They were originally posted way before stage 3 and this is just a rework,
so it's acceptable at this stage as far as I'm concerned.

Thank you for working on these,
Kyrill


gcc/ChangeLog:

  Charles Baylis 

* config/arm/arm.c (arm_mem_costs): New function.
(arm_rtx_costs_internal): Use arm_mem_costs.

gcc/testsuite/ChangeLog:

  Charles Baylis 

* gcc.target/arm/addr-modes-float.c: New test.
* gcc.target/arm/addr-modes-int.c: New test.
* gcc.target/arm/addr-modes.h: New header.
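
As an aside on the units correction in the review quoted above (dividing the
bus width by BITS_PER_UNIT rather than by 4), the arithmetic can be sketched
in isolation. This is a minimal illustration, not code from the patch: the
helper name bus_transfers and its parameters are hypothetical, CEIL and
BITS_PER_UNIT are defined locally for the example, and the thread goes on to
question whether a bus_width parameter is needed at all.

```c
#include <assert.h>

#define BITS_PER_UNIT 8	/* bits per byte, as noted in the review */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* Number of bus transfers needed to move MODE_SIZE_BYTES over a bus that
   is BUS_WIDTH_BITS wide.  Hypothetical helper for illustration only.  */
int
bus_transfers (int mode_size_bytes, int bus_width_bits)
{
  /* The width must be a whole, non-zero number of bytes.  */
  assert (bus_width_bits >= BITS_PER_UNIT
	  && bus_width_bits % BITS_PER_UNIT == 0);

  int bus_width_bytes = bus_width_bits / BITS_PER_UNIT;
  return CEIL (mode_size_bytes, bus_width_bytes);
}
```

With a 64-bit bus, a 16-byte access takes two transfers. Dividing by 4
instead of 8 would treat the bus as 16 bytes wide and report one.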




Re: [PATCH] Fix PR82991

2017-11-23 Thread Richard Biener
On Wed, 22 Nov 2017, Richard Biener wrote:

> On Wed, 22 Nov 2017, Jakub Jelinek wrote:
> 
> > On Wed, Nov 22, 2017 at 11:41:24AM +0100, Richard Biener wrote:
> > > 
> > > I am testing the following (old) patch to value number call lhs
> > > according to the ERF_RETURNS_ARG annotation.  This allows for
> > > more expression simplification and also changes downstream
> > > users of the argument to use the return value which should help
> > > register allocation and register lifetime in general.
> > 
> > If the argument is an SSA_NAME then that is generally a win, but won't
> > it also replace if the argument is say ADDR_EXPR of a VAR_DECL or similar?
> 
> Looks like it also confuses passes, for gcc.dg/tree-ssa/pr62112-1.c
> 
> char*g;
> void h(){
>   char*p=__builtin_malloc(42);
>   g=__builtin_memset(p,3,10);
>   __builtin_free(p);
> }
> 
> we no longer DSE the memset because we end up with
> 
>:
>   p_4 = __builtin_malloc (42);
>   _1 = __builtin_memset (p_4, 3, 10);
>   g = _1;
>   __builtin_free (_1);
> 
> and we no longer see that the free call clobbers the memset destination.
> 
> So I guess I'll try to do the reverse (always replace with the argument).
> That will of course make DCE remove the LHS of such calls.
> 
> In theory it's easy to recover somewhere before RTL expansion ...
> 
> > If we only later (say some later forwprop) determine the argument is
> > gimple_min_invariant, shouldn't we have something that will effectively
> > undo this (i.e. replace the uses of the function result with whatever
> > we've simplified the argument to)?
> 
> ... of course not (easily) if we substituted a constant.
> 
> The question is whether for example a pointer in TLS is cheaper
> to remat from the return value or via the TLS reg like in
> 
> __thread int x[4];
> 
> int *foo ()
> {
>   int *q = __builtin_memset (&x, 3, 4*4);
>   return q;
> }
> 
> of course if the user has written in that way we currently do not
> change it.  And we have no easy way (currently) to replace equal
> constants by a register where that is available.  So
> 
> __thread int x[4];
>   
> int *foo ()
> {
>   int *q = __builtin_memset (&x, 3, 4*4);
>   return &x;
> } 
> 
> wouldn't be optimized to return q.  Anyway, it seems the register
> allocator can do the trick at least for memset.

So doing the reverse (always propagating the result) causes

FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c execution,  -O2 
...
FAIL: gcc.c-torture/execute/builtins/memmove-chk.c execution,  -O2 
...
FAIL: gcc.c-torture/execute/builtins/strncat-chk.c execution,  -O2 
...
FAIL: gcc.dg/builtin-object-size-4.c execution test

where I looked into the object-size case.  For

  r = &a.a[4];
  r = memset (r, 'a', 2);
  if (__builtin_object_size (r, 3) != sizeof (a.a) - 4)
    abort ();
  r = memset (r + 2, 'b', 2) + 2;
  if (__builtin_object_size (r, 3) != sizeof (a.a) - 8)
    abort ();

FRE now propagates &a.a[4] + CST adjustments through the memsets
and constant folds the last __builtin_object_size check to false.

The testcase looks quite artificial so I'm not sure this regression
is important.  The way it happens is

Value numbering .MEM_120 stmt = r_121 = memset (&a.a[4], 97, 2);
Setting value number of r_121 to &a.a[4] (changed)
...
Value numbering _37 stmt = _37 = r_121 + 2;
RHS r_121 + 2 simplified to &MEM[(void *)&a + 6B]
Setting value number of _37 to &MEM[(void *)&a + 6B] (changed)
Value numbering .MEM_122 stmt = _38 = memset (_37, 98, 2);
Setting value number of _38 to &MEM[(void *)&a + 6B] (changed)
Value numbering _171 stmt = _171 = __builtin_object_size (r_123, 3);
call __builtin_object_size (r_123, 3) simplified to 20

oops.  So while we have some cfun->after_inlining guard in place
to inhibit SCCVN itself from aggressively pruning COMPONENT_REFs
match-and-simplify still happily optimizes &a.a[4] + 2.

The gcc.c-torture/execute/builtins/memcpy-chk.c case is somewhat odd,
but we end up with two chk_calls because somehow we do figure out
the object size for

  /* buf3 points to an unknown object, so __memcpy_chk should not be done.  */
  if (memcpy ((char *) buf3 + 4, buf5, n + 6) != (char *) buf1 + 4
      || memcmp (buf1, "aBcdRSTUVWklmnopq\0", 19))
    abort ();

we do this during EVRP somehow as

_19: [&MEM[(void *)&buf1 + 4B], &MEM[(void *)&buf1 + 4B]]

so we leaked that buf3 == buf1 because of dominating checks:

  if (memcpy (buf3, buf5, 17) != (char *) buf1
      || memcmp (buf1, "RSTUVWXYZ01234567\0", 19))
    abort ();

Hmm.  Any idea how to best dumb down / fix things here?  All those
tests were not really supposed to be optimized at compile-time but
are supposed to be verified at runtime?

Thus add -fno-tree-vrp / -fno-tree-fre to the never ending options
we disable for them?

For reference the updated patch is appended below, otherwise
bootstrapped and tested ok on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2017-11-22  Richard Biener  

PR tree-optimization/82991
* tree-ssa-sccvn.c (visit_use): Val

Re: [PATCH] Fix PR82991

2017-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2017 at 11:30:22AM +0100, Richard Biener wrote:
> FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c execution,  -O2 
> ...
> FAIL: gcc.c-torture/execute/builtins/memmove-chk.c execution,  -O2 
> ...
> FAIL: gcc.c-torture/execute/builtins/strncat-chk.c execution,  -O2 
> ...
> FAIL: gcc.dg/builtin-object-size-4.c execution test
> 
> where I looked into the object-size case.  For
> 
>   r = &a.a[4];
>   r = memset (r, 'a', 2);
>   if (__builtin_object_size (r, 3) != sizeof (a.a) - 4)
>     abort ();
>   r = memset (r + 2, 'b', 2) + 2;
>   if (__builtin_object_size (r, 3) != sizeof (a.a) - 8)
>     abort ();
> 
> FRE now propagates &a.a[4] + CST adjustments through the memsets
> and constant folds the last __builtin_object_size check to false.

That is because it tests a [13] mode which as you know is fragile and
doesn't like any kind of folding into MEM_REFs where the original access
path (&a.a[4] + 4) is lost.  So, if it is match.pd that does this, whatever is
doing it there should be also guarded with cfun->after_inlining.
While mode 3 is probably never used in real-world, mode 1 is used a lot and
so we shouldn't regress that.

BTW, seems pass_through_call could now use ERF_RETURNS_ARG stuff
except for __builtin_assume_aligned which intentionally doesn't have
ret1 because we really don't want to "optimize" it by propagating the
first argument to its uses too early.  I'll handle it.
Though, with your changes perhaps there won't be any pass through calls
anymore except for __builtin_assume_aligned.

Jakub
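
To make the point about object-size modes concrete, here is a minimal sketch,
assuming GCC (or a compatible compiler) and a hypothetical struct with two
adjacent arrays; it is not the testcase from the thread. Mode 0 measures to
the end of the whole object, while mode 1 measures to the end of the closest
enclosing subobject. The mode 1 answer is only available while the access
path (&obj.a[4] here) is still visible, which is why folding the address into
a MEM_REF on the whole object loses it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: two adjacent 10-byte arrays in one object.  */
struct pair { char a[10]; char b[10]; };
struct pair obj;

/* Mode 0: bytes from the pointer to the end of the whole object.  */
size_t
size_in_whole_object (void)
{
  return __builtin_object_size (&obj.a[4], 0);
}

/* Mode 1: bytes from the pointer to the end of the closest enclosing
   subobject, here the array obj.a.  */
size_t
size_in_subobject (void)
{
  return __builtin_object_size (&obj.a[4], 1);
}

/* Check both results against sizes computed from the declarations.  */
void
check_sizes (void)
{
  assert (size_in_whole_object () == sizeof (obj) - 4);
  assert (size_in_subobject () == sizeof (obj.a) - 4);
}
```

Modes 2 and 3 are the corresponding minimum-size variants of modes 0 and 1.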


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-23 Thread Alan Hayward

> On 22 Nov 2017, at 17:33, Jeff Law  wrote:
> 
> On 11/22/2017 04:31 AM, Alan Hayward wrote:
>> 
>>> On 21 Nov 2017, at 03:13, Jeff Law  wrote:
 
> 
> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
> totally forgotten about it.  And in fact it seems to come pretty close
> to what you need…
 
 Yes, some of the code is similar to the way
 TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
 CLOBBER expr code served as a starting point for writing the patch. The
 main difference here is that _PART_CLOBBERED is around all calls and is
 not tied to a specific instruction; it’s part of the calling ABI, whereas
 clobber_high is explicitly tied to an expression (tls_desc).
 It meant there wasn’t really any opportunity to reuse any existing code.
>>> Understood.  Though your first patch mentions that you're trying to
>>> describe partial preservation "around TLS calls". Presumably those are
>>> represented as normal insns, not call_insn.
>>> 
>>> That brings me back to Richi's idea of exposing a set of the low subreg
>>> to itself using whatever mode is wide enough to cover the neon part of
>>> the register.
>>> 
>>> That should tell the generic parts of the compiler that you're just
>>> clobbering the upper part and at least in theory you can implement in
>>> the aarch64 backend and the rest of the compiler should "just work"
>>> because that's the existing semantics of a subreg store.
>>> 
>>> The only worry would be if a pass tried to get overly smart and
>>> considered that kind of set a nop -- but I think I'd argue that's simply
>>> wrong given the semantics of a partial store.
>>> 
>> 
>> So, instead of using clobber_high(reg X), use set(reg X, reg X).
>> It’s something we considered, and then dismissed.
>> 
>> The problem then is that you are now using SET semantics on those registers,
>> and it would make the register live around the function, which might not be
>> the case.
>> Whereas clobber semantics will just make the register dead - which is exactly
>> what we want (but only conditionally).
> ?!?  A set of the subreg is the *exact* semantics you want.  It says the
> low part is preserved while the upper part is clobbered across the TLS
> insns.
> 
> jeff

Consider where the TLS call is inside a loop. The compiler would normally want
to hoist that out of the loop. By adding a set(x,x) into the parallel of the
tls_desc we are now making x live across the loop: x is dependent on the value
from the previous iteration, and the tls_desc can no longer be hoisted.

Or consider a stream of code containing two tls_desc calls (ok, the compiler
might optimise one of the tls calls away, but this approach should be reusable
for other exprs). Between the two set(x,x)’s x is considered live, so the
register allocator can’t use that register.
Given that we are applying this to all the neon registers, the register
allocator now throws an ICE because it can’t find any free hard neon registers
to use.


Alan.

Re: Add optabs for common types of permutation

2017-11-23 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Nov 21, 2017 at 11:47 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Mon, Nov 20, 2017 at 1:35 PM, Richard Sandiford
>>>  wrote:
 Richard Biener  writes:
> On Mon, Nov 20, 2017 at 12:56 AM, Jeff Law  wrote:
>> On 11/09/2017 06:24 AM, Richard Sandiford wrote:
>>> ...so that we can use them for variable-length vectors.  For now
>>> constant-length vectors continue to use VEC_PERM_EXPR and the
>>> vec_perm_const optab even for cases that the new optabs could
>>> handle.
>>>
>>> The vector optabs are inconsistent about whether there should be
>>> an underscore before the mode part of the name, but the other lo/hi
>>> optabs have one.
>>>
>>> Doing this means that we're able to optimise some SLP tests using
>>> non-SLP (for now) on targets with variable-length vectors, so the
>>> patch needs to add a few XFAILs.  Most of these go away with later
>>> patches.
>>>
>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>> and powerpc64le-linux-gnu.  OK to install?
>>>
>>> Richard
>>>
>>>
>>> 2017-11-09  Richard Sandiford  
>>>   Alan Hayward  
>>>   David Sherwood  
>>>
>>> gcc/
>>>   * doc/md.texi (vec_reverse, vec_interleave_lo, vec_interleave_hi)
>>>   (vec_extract_even, vec_extract_odd): Document new optabs.
>>>   * internal-fn.def (VEC_INTERLEAVE_LO, VEC_INTERLEAVE_HI)
>>>   (VEC_EXTRACT_EVEN, VEC_EXTRACT_ODD, VEC_REVERSE): New internal
>>>   functions.
>>>   * optabs.def (vec_interleave_lo_optab, vec_interleave_hi_optab)
>>>   (vec_extract_even_optab, vec_extract_odd_optab, vec_reverse_optab):
>>>   New optabs.
>>>   * tree-vect-data-refs.c: Include internal-fn.h.
>>>   (vect_grouped_store_supported): Try using IFN_VEC_INTERLEAVE_{LO,HI}.
>>>   (vect_permute_store_chain): Use them here too.
>>>   (vect_grouped_load_supported): Try using IFN_VEC_EXTRACT_{EVEN,ODD}.
>>>   (vect_permute_load_chain): Use them here too.
>>>   * tree-vect-stmts.c (can_reverse_vector_p): New function.
>>>   (get_negative_load_store_type): Use it.
>>>   (reverse_vector): New function.
>>>   (vectorizable_store, vectorizable_load): Use it.
>>>   * config/aarch64/iterators.md (perm_optab): New iterator.
>>>   * config/aarch64/aarch64-sve.md (_): New expander.
>>>   (vec_reverse_): Likewise.
>>>
>>> gcc/testsuite/
>>>   * gcc.dg/vect/no-vfa-vect-depend-2.c: Remove XFAIL.
>>>   * gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
>>>   * gcc.dg/vect/pr33953.c: XFAIL for vect_variable_length.
>>>   * gcc.dg/vect/pr68445.c: Likewise.
>>>   * gcc.dg/vect/slp-12a.c: Likewise.
>>>   * gcc.dg/vect/slp-13-big-array.c: Likewise.
>>>   * gcc.dg/vect/slp-13.c: Likewise.
>>>   * gcc.dg/vect/slp-14.c: Likewise.
>>>   * gcc.dg/vect/slp-15.c: Likewise.
>>>   * gcc.dg/vect/slp-42.c: Likewise.
>>>   * gcc.dg/vect/slp-multitypes-2.c: Likewise.
>>>   * gcc.dg/vect/slp-multitypes-4.c: Likewise.
>>>   * gcc.dg/vect/slp-multitypes-5.c: Likewise.
>>>   * gcc.dg/vect/slp-reduc-4.c: Likewise.
>>>   * gcc.dg/vect/slp-reduc-7.c: Likewise.
>>>   * gcc.target/aarch64/sve_vec_perm_2.c: New test.
>>>   * gcc.target/aarch64/sve_vec_perm_2_run.c: Likewise.
>>>   * gcc.target/aarch64/sve_vec_perm_3.c: New test.
>>>   * gcc.target/aarch64/sve_vec_perm_3_run.c: Likewise.
>>>   * gcc.target/aarch64/sve_vec_perm_4.c: New test.
>>>   * gcc.target/aarch64/sve_vec_perm_4_run.c: Likewise.
>> OK.
>
> It's really a step backwards - we had those optabs and a tree code in
> the past and
> canonicalizing things to VEC_PERM_EXPR made things simpler.
>
> Why doesn't VEC_PERM  not work?

 The problems with that are:

 - It doesn't work for vectors with 256-bit elements because the indices
   wrap round.
>>>
>>> That's a general issue that would need to be addressed for larger
>>> vectors (GCN?).
>>> I presume the requirement that the permutation vector have the same size
>>> needs to be relaxed.
>>>
 - Supporting a fake VEC_PERM_EXPR  for a few
   special cases would be hard, especially since v256hi isn't a normal
   vector mode.  I imagine everything dealing with VEC_PERM_EXPR would
   then have to worry about that special case.
>>>
>>> I think it's not really a special case - any code here should just
>>> expect the same
>>> number of vector elements and not a particular size.  You already dealt with
>>> using a char[] vector for permutations I think.
>>
>> It sounds like you're talking about the case in which the permutation
>> vector is a VECTOR_CST.  We still u

Re: [Patch AArch64] Fixup floating point division with -march=armv8-a+nosimd

2017-11-23 Thread Richard Earnshaw (lists)
On 22/11/17 14:53, Ramana Radhakrishnan wrote:
> Hi,
> 
> I received a private report from a customer that gcc was putting out
> calls to __divdf3 when compiling with +nosimd. When the reciprocal math
> support was added this was probably an oversight or a typo.
> 
> The canonical example is:
> 
>     double
>     foo (double x, double y)
>     {
>       return x / y;
>     }
> 
>     with -march=armv8-a+nosimd
> 
> generates a function that calls __divdf3. Of course, on AArch64 we don't
> have any software floating point and this causes issues.
> 
> There is also a problem in +nosimd that has existed since the dawn of
> time in the port with respect to long doubles (128 bit floating point),
> here the ABI and the compiler expect the presence of the SIMD unit as
> these parameters are passed in the vector registers. Thus while +nosimd
> tries to prevent the use of SIMD instructions in the compile we don't
> get this right as we end up using ldr qN / str qN instructions and even
> there I think things go wrong in a simple example that I tried.
> 
> Is that sufficient to consider marking +nosimd as deprecated in GCC-8
> and remove this in a future release ?

Yes, I certainly think we should consider that.  Having options that are
known not to work (especially when they are known to be unfixable, as
this case is due to the ABI requirements) is not helpful to users.

> 
> That is not a subject for this patch but something separate but I would
> like to put this into trunk and the release branches. Bootstrapped and
> regression tested on my aarch64 desktop.
> 
> Ok ?

OK.  It's perhaps useful internally to have a distinction between FP and
SIMD operations.

R.

> 
> regards
> Ramana
> 
> 
>     * config/aarch64/aarch64.md (div3): Change check to TARGET_FLOAT.
>     * config/aarch64/aarch64.c (aarch64_emit_approx_div): Add early exit
> for vector mode and !TARGET_SIMD.
> 
> p1.txt
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0c67e2b5c67..811f19c3f11 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -8442,6 +8442,9 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
>|| !use_approx_division_p)
>  return false;
>  
> +  if (!TARGET_SIMD && VECTOR_MODE_P (mode))
> +return false;
> +
>/* Estimate the approximate reciprocal.  */
>rtx xrcp = gen_reg_rtx (mode);
>emit_insn ((*get_recpe_type (mode)) (xrcp, den));
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 423a3352aab..83e49425090 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5093,7 +5093,7 @@
>   [(set (match_operand:GPF_F16 0 "register_operand")
> (div:GPF_F16 (match_operand:GPF_F16 1 "general_operand")
>   (match_operand:GPF_F16 2 "register_operand")))]
> - "TARGET_SIMD"
> + "TARGET_FLOAT"
>  {
>if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
>  DONE;
> 



Re: [PING] Plugin support on Windows/MinGW

2017-11-23 Thread JonY
On 11/22/2017 11:14 AM, Boris Kolpackov wrote:
> JonY <10wa...@gmail.com> writes:
> 
>> Is there a problem with using .so for internal libraries instead of
>> "dll"...
> 
> I think not but I haven't tested it. The problem with using .so instead
> of .dll is that producing this non-standard extension may not be easy
> or possible depending on the build system/tool (e.g., libtool). Also,
> you never know how other pieces of the system (like antivirus) will
> react to a file that looks like a DLL but is called something else.
> 
> 

Libtool shouldn't matter since it is not used to build those, and I
doubt AVs would care what the filename is called. Apache on Windows uses
.so plugins too.

>> ... if it simplifies the code?
> 
> I don't think it simplifies that much and the potential (and unknown)
> downside is significant.
> 
> Thanks for the review,
> Boris
> 

I'll commit in a few days if there are no more inputs.



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] Fix PR82991

2017-11-23 Thread Richard Biener
On Thu, 23 Nov 2017, Jakub Jelinek wrote:

> On Thu, Nov 23, 2017 at 11:30:22AM +0100, Richard Biener wrote:
> > FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c execution,  -O2 
> > ...
> > FAIL: gcc.c-torture/execute/builtins/memmove-chk.c execution,  -O2 
> > ...
> > FAIL: gcc.c-torture/execute/builtins/strncat-chk.c execution,  -O2 
> > ...
> > FAIL: gcc.dg/builtin-object-size-4.c execution test
> > 
> > where I looked into the object-size case.  For
> > 
> >   r = &a.a[4];
> >   r = memset (r, 'a', 2);
> >   if (__builtin_object_size (r, 3) != sizeof (a.a) - 4)
> >     abort ();
> >   r = memset (r + 2, 'b', 2) + 2;
> >   if (__builtin_object_size (r, 3) != sizeof (a.a) - 8)
> >     abort ();
> > 
> > FRE now propagates &a.a[4] + CST adjustments through the memsets
> > and constant folds the last __builtin_object_size check to false.
> 
> That is because it tests a [13] mode which as you know is fragile and
> doesn't like any kind of folding into MEM_REFs where the original access
> path (&a.a[4] + 4) is lost.  So, if it is match.pd that does this, whatever is
> doing it there should be also guarded with cfun->after_inlining.
> While mode 3 is probably never used in real-world, mode 1 is used a lot and
> so we shouldn't regress that.

Hum.  But that pessimizes a _lot_ of early folding.  Like we'd no
longer optimize

  r = &a.a[4];
  r = r + 1;
  if (r != &a.a[0])

and similar stuff exposed a lot by C++ abstraction.  We really only
want to avoid elimination in certain places.

Even the current stop-gap (that isn't really one) is bad for
optimization.

> BTW, seems pass_through_call could now use ERF_RETURNS_ARG stuff
> except for __builtin_assume_aligned which intentionally doesn't have
> ret1 because we really don't want to "optimize" it by propagating the
> first argument to its uses too early.  I'll handle it.

Thanks (double-check all those builtins have ret1).

> Though, with your changes perhaps there won't be any pass through calls
> anymore except for __builtin_assume_aligned.

For the object-size pass before FRE there certainly would be.

So I have the following testsuite adjustment for the builtins.exp
FAILs.  Two of them quite obvious - just hide the conditional equivalence
we exposed.  The strncat-chk.c is a little less obvious.  We have

  const char *const s1 = "hello world";
  char dst[64], *d2;
...

  /* These __strncat_chk calls should be optimized into __strcat_chk,
     as strlen (src) <= len.  */
  strcpy (dst, s1);
  if (strncat (dst, "foo", 3) != dst || strcmp (dst, "hello worldfoo"))
    abort ();
  strcpy (dst, s1);
  if (strncat (dst, "foo", 100) != dst || strcmp (dst, "hello worldfoo"))
    abort ();
  strcpy (dst, s1);
  if (strncat (dst, s1, 100) != dst
      || strcmp (dst, "hello worldhello world"))
    abort ();
  if (chk_calls != 3)
    abort ();

so it's obvious we can optimize away all checking given the strlen
pass ought to be able to track the length of dst.  Somehow we
currently don't but with the patch we do (but not at -O[1sg]).
Looks like a missed optimization to me (currently).  I couldn't
figure any way to remove that optimization possibility while
preserving strlen (src) <= len knowledge.  Like hiding 's1'
because that makes the strcpy do a chk and thus we get 6 checks
(that I could have consistently at all opt levels).

Any preference for strncat-chk.c?

Thanks,
Richard.

2017-11-23  Richard Biener  

* gcc.c-torture/execute/builtins/memcpy-chk.c: Hide established
conditional equivalence of buf3 and buf1.
* gcc.c-torture/execute/builtins/memmove-chk.c: Likewise.
* gcc.c-torture/execute/builtins/strncat-chk.c: Allow for optimizing
of chk calls.

Index: gcc/testsuite/gcc.c-torture/execute/builtins/memcpy-chk.c
===
--- gcc/testsuite/gcc.c-torture/execute/builtins/memcpy-chk.c   (revision 255093)
+++ gcc/testsuite/gcc.c-torture/execute/builtins/memcpy-chk.c   (working copy)
@@ -146,6 +146,7 @@ test2_sub (long *buf3, char *buf4, char
  call.  */
 
   /* buf3 points to an unknown object, so __memcpy_chk should not be done.  */
+  __asm ("" : "=r" (buf3) : "0" (buf3));
   if (memcpy ((char *) buf3 + 4, buf5, n + 6) != (char *) buf1 + 4
   || memcmp (buf1, "aBcdRSTUVWklmnopq\0", 19))
 abort ();
Index: gcc/testsuite/gcc.c-torture/execute/builtins/memmove-chk.c
===
--- gcc/testsuite/gcc.c-torture/execute/builtins/memmove-chk.c  (revision 255093)
+++ gcc/testsuite/gcc.c-torture/execute/builtins/memmove-chk.c  (working copy)
@@ -149,6 +149,7 @@ test2_sub (long *buf3, char *buf4, char
  call.  */
 
   /* buf3 points to an unknown object, so __memmove_chk should not be done.  */
+  __asm ("" : "=r" (buf3) : "0" (buf3));
   if (memmove ((char *) buf3 + 4, buf5, n + 6) != (char *) buf1 + 4
   || memcmp (buf1, "aBcdRSTUVWklmnopq\0", 19))
 abort ();
Index: gcc/testsuite/gcc.c-torture/execut

Re: [PATCH] Fix PR82991

2017-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2017 at 12:37:31PM +0100, Richard Biener wrote:
> Hum.  But that pessimizes a _lot_ of early folding.  Like we'd no
> longer optimize
> 
>   r = &a.a[4];
>   r = r + 1;
>   if (r != &a.a[0])
> 
> and similar stuff exposed a lot by C++ abstraction.  We really only
> want to avoid elimination in certain places.

But we can fold the above before IPA, just not to &MEM[&a, 5], but e.g. to
&a.a[5]; if the array has 5 elements.  So for an int a[10]; field, fold
&a.a[4] + -4 through &a.a[4] + 6 to &a.a[0] through &a.a[10], but don't fold
out-of-bounds accesses (and only fold those to MEM_REF later).

> So I have the following testsuite adjustment for the builtins.exp
> FAILs.  Two of them quite obvious - just hide the conditional equivalence
> we exposed.  The strncat-chk.c is a little less obvious.  We have

What I'm worried about is that a testsuite adjustment isn't sufficient; it
doesn't protect real-world code, which doesn't have asm statements like that.
While the tests are artificial, they were written to show various cases that
do happen in the real world.

Jakub
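
For reference, the `__asm ("" : "=r" (buf3) : "0" (buf3));` statement in the
testsuite adjustment quoted earlier in the thread is an empty extended asm
that takes the pointer as input and hands it back as output in the same
register. The value is unchanged at run time, but the compiler can no longer
assume the output equals anything it previously knew about the input. A
minimal sketch, assuming GCC-style extended asm; the function names are
hypothetical and not part of the testsuite:

```c
#include <assert.h>
#include <stddef.h>

/* Return P unchanged, but route it through an empty asm so that earlier
   compile-time knowledge about P (for example, a proven equality with
   another pointer) does not carry over to the returned value.  */
void *
hide_pointer (void *p)
{
  __asm__ ("" : "=r" (p) : "0" (p));
  return p;
}

/* The value itself is preserved: this returns 1 for any P.  */
int
same_after_hiding (void *p)
{
  void *q = hide_pointer (p);

  assert (q == p);
  return q == p;
}
```

This only helps code that opts in, which is the concern raised above:
ordinary user code has no such barrier.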


Re: [PING] Plugin support on Windows/MinGW

2017-11-23 Thread Boris Kolpackov
JonY <10wa...@gmail.com> writes:

> Libtool shouldn't matter since it is not used to build those, [...]

We don't know which build system the plugin author will use to build
the plugin. We can, however, reasonably expect that it will be able
to produce a shared library with the platform-standard extensions
(.dll, .dylib, etc).


> I'll commit in a few days if there are no more inputs.

Great, thanks!

Boris


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-23 Thread Richard Biener
On Thu, Nov 23, 2017 at 10:51 AM, Alan Hayward  wrote:
>
>> On 22 Nov 2017, at 16:57, Kilian Verhetsel wrote:
>>
>>
>> Thank you both for your comments.
>>
>> I have added the check to ensure the index vector won't cause an
>> overflow. I also added tests to the testsuite in order to check that the
>> loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX
>> iterations. I was not able to write code that triggers
>> INTEGER_INDUC_COND_REDUCTION when using char or other smaller types
>> (changing the types of last, min_v and a to something else causes
>> COND_REDUCTION to be used instead), so these tests are only compiled and
>> not executed.
>
> I had similar problems when playing around with -14.c, but didn’t have a chance
> to investigate why. Possibly worth raising a bug mentioning there is a missed
> optimisation. It’d be nice to figure out why.
>
>>
>> I also moved an instruction that generates a vector of zeroes (used for
>> COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this
>> should remove the unused vector that Alan noticed.
>
> Patch is ok for me.

Fine with me as well.

Thanks for taking care of this bug.

Richard.

>
> Alan.
>


Re: [PING] Plugin support on Windows/MinGW

2017-11-23 Thread Pedro Alves
On 11/23/2017 12:06 PM, Boris Kolpackov wrote:
> JonY <10wa...@gmail.com> writes:
> 
>> Libtool shouldn't matter since it is not used to build those, [...]
> 
> We don't know which build system the plugin author will use to build
> the plugin. We can, however, reasonably expect that it will be able
> to produce a shared library with the platform-standard extensions
> (.dll, .dylib, etc).
> 
> 
>> I'll commit in a few days if there are no more inputs.
> 
> Great, thanks!

I would just like to say that I think this is a fantastic patch
that will help open GCC to many more use cases.  The fact that
plugins don't work on Windows has been a sore spot, IMO.

I for one am very happy that this makes gdb's libcc1 plugin
a viable option for Windows.  Puts us one step closer to the
long term plan of making GDB always use GCC for C/C++
expression parsing/evaluation.  Yay.  :-)

Thanks,
Pedro Alves



Re: Simplify ptr - 0

2017-11-23 Thread Richard Biener
On Wed, Nov 22, 2017 at 6:34 PM, Marc Glisse  wrote:
> Hello,
>
> I hadn't implemented this simplification because I think it is invalid code,
> but apparently people do write it, so we might as well handle it sensibly.
> This also happens to work around PR 83104 (already fixed).
>
> bootstrap+regtest on powerpc64le-unknown-linux-gnu.

Ok.

What about 0 - ptr?  (ok, that's even more weird)

Richard.

> 2017-11-22  Marc Glisse  
>
> * match.pd (ptr-0): New transformation.
>
> --
> Marc Glisse


Re: [RFA][PATCH] Use SCEV conditionally within vr-values and evrp range analysis

2017-11-23 Thread Richard Biener
On Thu, Nov 23, 2017 at 1:16 AM, Jeff Law  wrote:
>
> Clients of the evrp range analysis may not have initialized the SCEV
> infrastructure, and in fact may not want to (DOM for example).
>
> Yet inside both vr-values.c and gimple-ssa-evrp-analyze.c we have calls
> into SCEV (that will fault/abort if SCEV is not properly initialized).
>
> This patch allows clients of vr-values.c and gimple-ssa-evrp-analyze.c
> to indicate if they want SCEV analysis.
>
> Bootstrapped and regression tested by itself as well as with the DOM
> patches to use EVRP analysis (which test the "don't want SCEV" path).
>
> OK for the trunk?

There's also scev_initialized_p () which you could conveniently use.

Richard.

>
> Jeff
>
> * gimple-ssa-evrp-analyze.h (class evrp_range_analyzer): New data
> member USE_SCEV.
> * gimple-ssa-evrp-analyze.c
> (evrp_range_analyzer::evrp_range_analyzer): Add new use_scev argument.
> Pass it to the vr_values ctor.
> (evrp_range_analyzer::record_ranges_from_phis): Conditionally use
> SCEV to refine ranges for PHIs.
> * gimple-ssa-evrp.c (evrp_dom_walker::evrp_dom_walker): Ask for
> SCEV analysis in the vr-values engine.
> * vr-values.h (class vr_values): New data member USE_SCEV.
> * vr-values.c (vr_values::vr_values): Add new use_scev argument.
> (vr_values::extract_range_from_phi_node): Conditionally use
> SCEV to refine ranges for PHIs.
>
>
> diff --git a/gcc/gimple-ssa-evrp-analyze.c b/gcc/gimple-ssa-evrp-analyze.c
> index 68a2cdc..bc41be1 100644
> --- a/gcc/gimple-ssa-evrp-analyze.c
> +++ b/gcc/gimple-ssa-evrp-analyze.c
> @@ -42,7 +42,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "vr-values.h"
>  #include "gimple-ssa-evrp-analyze.h"
>
> -evrp_range_analyzer::evrp_range_analyzer () : stack (10)
> +evrp_range_analyzer::evrp_range_analyzer (bool use_scev_)
> +  : vr_values (use_scev_), stack (10), use_scev (use_scev_)
>  {
>edge e;
>edge_iterator ei;
> @@ -176,7 +177,8 @@ evrp_range_analyzer::record_ranges_from_phis (basic_block bb)
>  to use VARYING for them.  But we can still resort to
>  SCEV for loop header PHIs.  */
>   struct loop *l;
> - if (interesting
> + if (use_scev
> + && interesting
>   && (l = loop_containing_stmt (phi))
>   && l->header == gimple_bb (phi))
>   vr_values.adjust_range_with_scev (&vr_result, l, phi, lhs);
> @@ -341,3 +343,4 @@ evrp_range_analyzer::pop_value_range (tree var)
>stack.pop ();
>return vr;
>  }
> +
> diff --git a/gcc/gimple-ssa-evrp-analyze.h b/gcc/gimple-ssa-evrp-analyze.h
> index 3d64c50..18f5b8c 100644
> --- a/gcc/gimple-ssa-evrp-analyze.h
> +++ b/gcc/gimple-ssa-evrp-analyze.h
> @@ -23,7 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>  class evrp_range_analyzer
>  {
>   public:
> -  evrp_range_analyzer (void);
> +  evrp_range_analyzer (bool);
>~evrp_range_analyzer (void) { stack.release (); }
>
>void enter (basic_block);
> @@ -65,6 +65,9 @@ class evrp_range_analyzer
>
>/* STACK holds the old VR.  */
>auto_vec > stack;
> +
> +  /* Whether or not to use SCEV to refine ranges.  */
> +  bool use_scev;
>  };
>
>  #endif /* GCC_GIMPLE_SSA_EVRP_ANALYZE_H */
> diff --git a/gcc/gimple-ssa-evrp.c b/gcc/gimple-ssa-evrp.c
> index f56d67f..13e4e16 100644
> --- a/gcc/gimple-ssa-evrp.c
> +++ b/gcc/gimple-ssa-evrp.c
> @@ -69,6 +69,7 @@ class evrp_dom_walker : public dom_walker
>  public:
>evrp_dom_walker ()
>  : dom_walker (CDI_DOMINATORS),
> +  evrp_range_analyzer (true),
>evrp_folder (evrp_range_analyzer.get_vr_values ())
>  {
>need_eh_cleanup = BITMAP_ALLOC (NULL);
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index 838822d..46be749 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -4737,6 +4737,7 @@ insert_range_assertions (void)
>  class vrp_prop : public ssa_propagation_engine
>  {
>   public:
> +  vrp_prop (bool use_scev_) : vr_values (use_scev_) { }
>enum ssa_prop_result visit_stmt (gimple *, edge *, tree *) FINAL OVERRIDE;
>enum ssa_prop_result visit_phi (gphi *) FINAL OVERRIDE;
>
> @@ -6862,7 +6863,7 @@ execute_vrp (bool warn_array_bounds_p)
>/* For visiting PHI nodes we need EDGE_DFS_BACK computed.  */
>mark_dfs_back_edges ();
>
> -  class vrp_prop vrp_prop;
> +  class vrp_prop vrp_prop (true);
>vrp_prop.vrp_initialize ();
>vrp_prop.ssa_propagate ();
>vrp_prop.vrp_finalize (warn_array_bounds_p);
> diff --git a/gcc/vr-values.c b/gcc/vr-values.c
> index 2d11861..54b0be5 100644
> --- a/gcc/vr-values.c
> +++ b/gcc/vr-values.c
> @@ -1907,7 +1907,8 @@ vr_values::dump_all_value_ranges (FILE *file)
>
>  /* Initialize VRP lattice.  */
>
> -vr_values::vr_values () : vrp_value_range_pool ("Tree VRP value ranges")
> +vr_values::vr_values (bool use_scev_)
> +  : vrp_value_range_pool ("Tree VRP value ranges"), use_scev (use_scev_)
>  {
>value

Re: [patch, fortran] Implement maxloc and minloc for character

2017-11-23 Thread Janne Blomqvist
On Wed, Nov 22, 2017 at 8:10 PM, Thomas Koenig  wrote:
> Hi Janne,
>
>>> So, attached is a new version of the patch. No update
>>> on the ChangeLog. OK for trunk?
>>
>> Yup, just really fix the copyright and string length stuff first. Thanks!
>
>
> Committed as rev 255070 with the fixes.
>
> There are still some files which mention Fortran 95, that can be fixed
> later.

That's ok, I wasn't expecting you to fix all such occurrences, just the
new files you added.

However, to continue my nitpicking (sorry!), it seems that in many
cases compare_fcn still takes an integer length argument. Could you
make that gfc_charlen_type as well? Or maybe size_t, since the
argument is passed straight to memcmp{_char4} anyway? Please consider
such a patch pre-approved. Thanks!


-- 
Janne Blomqvist


[PATCH] Fix .debug_rnglists generation with -gdwarf-5 -gsplit-dwarf.

2017-11-23 Thread Mark Wielaard
Early debug broke generation of .debug_rnglists when using both -gdwarf-5
and -gsplit-dwarf. It introduces a generation for init_sections_and_labels,
but doesn't account for the generation of up to 4 unique ranges labels,
two created in init_sections_and_labels and two in output_rnglists.
Fix this by passing generation to output_rnglists and creating 4 unique
labels per generation.

Without this fix using -gdwarf-5 -gsplit-dwarf could result in:
  Error: symbol `.Ldebug_ranges2' is already defined
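
The collision can be sketched with a small self-contained C model of the two numbering schemes. This is only an illustration: the function names are hypothetical, and the "old" numbers are read off the removed lines of the patch, assuming the -gdwarf-5 -gsplit-dwarf case in which ranges_base_label is generated.

```c
#define LABELS_PER_GENERATION 4

/* Label numbers before the patch: two depend on the generation,
   two are the fixed constants 2 and 3.  */
static void
old_label_numbers (unsigned generation, unsigned out[LABELS_PER_GENERATION])
{
  out[0] = generation;      /* ranges_section_label */
  out[1] = 2 + generation;  /* ranges_base_label */
  out[2] = 2;               /* l1 in output_rnglists */
  out[3] = 3;               /* l2 in output_rnglists */
}

/* Label numbers after the patch: four consecutive numbers per
   generation.  */
static void
new_label_numbers (unsigned generation, unsigned out[LABELS_PER_GENERATION])
{
  out[0] = generation * 4;      /* ranges_section_label */
  out[1] = 1 + generation * 4;  /* ranges_base_label */
  out[2] = 2 + generation * 4;  /* l1 in output_rnglists */
  out[3] = 3 + generation * 4;  /* l2 in output_rnglists */
}

/* Count pairs of equal label numbers over the first GENERATIONS
   generations (capped at 8) of the given numbering scheme.  */
static unsigned
count_duplicates (void (*scheme) (unsigned, unsigned *), unsigned generations)
{
  unsigned all[8 * LABELS_PER_GENERATION];
  unsigned n = 0, dups = 0;

  if (generations > 8)
    generations = 8;
  for (unsigned g = 0; g < generations; g++, n += LABELS_PER_GENERATION)
    scheme (g, all + n);
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = i + 1; j < n; j++)
      if (all[i] == all[j])
        dups++;
  return dups;
}
```

In this model the old scheme already produces the number 2 twice in generation 0 (ranges_base_label and l1), which matches the duplicate `.Ldebug_ranges2' symbol in the error above.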

gcc/ChangeLog:

   * dwarf2out.c (init_sections_and_labels): Use generation to create
   unique ranges_section_label and ranges_base_label. Return generation.
   (output_rnglists): Add generation argument. Use generation to create
   unique ranges labels.
   (dwarf2out_finish): Get generation from init_sections_and_labels
   and pass generation to output_rnglists.
---

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 4b0216e..ae3d962 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -11182,7 +11182,7 @@ index_rnglists (void)
 /* Emit .debug_rnglists section.  */
 
 static void
-output_rnglists (void)
+output_rnglists (unsigned generation)
 {
   unsigned i;
   dw_ranges *r;
@@ -11192,8 +11192,12 @@ output_rnglists (void)
 
   switch_to_section (debug_ranges_section);
   ASM_OUTPUT_LABEL (asm_out_file, ranges_section_label);
-  ASM_GENERATE_INTERNAL_LABEL (l1, DEBUG_RANGES_SECTION_LABEL, 2);
-  ASM_GENERATE_INTERNAL_LABEL (l2, DEBUG_RANGES_SECTION_LABEL, 3);
+  /* There are up to 4 unique ranges labels per generation.
+ See also init_sections_and_labels.  */
+  ASM_GENERATE_INTERNAL_LABEL (l1, DEBUG_RANGES_SECTION_LABEL,
+  2 + generation * 4);
+  ASM_GENERATE_INTERNAL_LABEL (l2, DEBUG_RANGES_SECTION_LABEL,
+  3 + generation * 4);
   if (DWARF_INITIAL_LENGTH_SIZE - DWARF_OFFSET_SIZE == 4)
 dw2_asm_output_data (4, 0xffffffff,
 "Initial length escape value indicating "
@@ -27261,9 +27265,10 @@ output_macinfo (const char *debug_line_label, bool early_lto_debug)
 }
 
 /* Initialize the various sections and labels for dwarf output and prefix
-   them with PREFIX if non-NULL.  */
+   them with PREFIX if non-NULL.  Returns the generation (zero based
+   number of times function was called).  */
 
-static void
+static unsigned
 init_sections_and_labels (bool early_lto_debug)
 {
   /* As we may get called multiple times have a generation count for
@@ -27442,11 +27447,14 @@ init_sections_and_labels (bool early_lto_debug)
   info_section_emitted = false;
   ASM_GENERATE_INTERNAL_LABEL (debug_line_section_label,
   DEBUG_LINE_SECTION_LABEL, generation);
+  /* There are up to 4 unique ranges labels per generation.
+ See also output_rnglists.  */
   ASM_GENERATE_INTERNAL_LABEL (ranges_section_label,
-  DEBUG_RANGES_SECTION_LABEL, generation);
+  DEBUG_RANGES_SECTION_LABEL, generation * 4);
   if (dwarf_version >= 5 && dwarf_split_debug_info)
 ASM_GENERATE_INTERNAL_LABEL (ranges_base_label,
-DEBUG_RANGES_SECTION_LABEL, 2 + generation);
+DEBUG_RANGES_SECTION_LABEL,
+1 + generation * 4);
   ASM_GENERATE_INTERNAL_LABEL (debug_addr_section_label,
   DEBUG_ADDR_SECTION_LABEL, generation);
   ASM_GENERATE_INTERNAL_LABEL (macinfo_section_label,
@@ -27457,6 +27465,7 @@ init_sections_and_labels (bool early_lto_debug)
   generation);
 
   ++generation;
+  return generation - 1;
 }
 
 /* Set up for Dwarf output at the start of compilation.  */
@@ -29890,7 +29899,7 @@ dwarf2out_finish (const char *)
   move_marked_base_types ();
 
   /* Initialize sections and labels used for actual assembler output.  */
-  init_sections_and_labels (false);
+  unsigned generation = init_sections_and_labels (false);
 
   /* Traverse the DIE's and add sibling attributes to those DIE's that
  have children.  */
@@ -30179,7 +30188,7 @@ dwarf2out_finish (const char *)
   if (!vec_safe_is_empty (ranges_table))
 {
   if (dwarf_version >= 5)
-   output_rnglists ();
+   output_rnglists (generation);
   else
output_ranges ();
 }
-- 
1.8.3.1



Re: Simplify ptr - 0

2017-11-23 Thread Marc Glisse

On Thu, 23 Nov 2017, Richard Biener wrote:

> On Wed, Nov 22, 2017 at 6:34 PM, Marc Glisse  wrote:
>> Hello,
>>
>> I hadn't implemented this simplification because I think it is invalid code,
>> but apparently people do write it, so we might as well handle it sensibly.
>> This also happens to work around PR 83104 (already fixed).
>>
>> bootstrap+regtest on powerpc64le-unknown-linux-gnu.
>
> Ok.
>
> What about 0 - ptr?  (ok, that's even more weird)

I thought about it, but it seemed less likely (I can imagine the idiom
ptr-0 being used to let the compiler guess the right type). I'll add 0-ptr
if you think that may be useful.


--
Marc Glisse


Re: [PATCH] Fix .debug_rnglists generation with -gdwarf-5 -gsplit-dwarf.

2017-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2017 at 02:13:57PM +0100, Mark Wielaard wrote:
> Early debug broke generation of .debug_rnglists when using both -gdwarf-5
> and -gsplit-dwarf. It introduces a generation for init_sections_and_labels,
> but doesn't account for the generation of up to 4 unique ranges labels,
> two created in init_sections_and_labels and two in output_rnglists.
> Fix this by passing generation to output_rnglists and creating 4 unique
> labels per generation.
> 
> Without this fix using -gdwarf-5 -gsplit-dwarf could result in:
>   Error: symbol `.Ldebug_ranges2' is already defined
> 
> gcc/ChangeLog:
> 
>* dwarf2out.c (init_sections_and_labels): Use generation to create
>unique ranges_section_label and ranges_base_label. Return generation.
>(output_rnglists): Add generation argument. Use generation to create
>unique ranges labels.
>(dwarf2out_finish): Get generation from init_sections_and_labels
>and pass generation to output_rnglists.

Ok.  Though, note that -gdwarf-5 -gsplit-dwarf is really unfinished anyway,
I did what was easy, but left out what was harder and did no testing at all
(don't know what tools are needed for the split dwarf processing etc.).

Jakub


Re: Add optabs for common types of permutation

2017-11-23 Thread Michael Matz
Hi,

On Thu, 23 Nov 2017, Richard Sandiford wrote:

> > I don't want variable-size vector special-casing everywhere.  I want 
> > it to be somehow naturally integrating with existing stuff.
> 
> It's going to be a special case whatever happens though.

It wouldn't have to be this way.  It's like saying that loops with a 
constant upper bound should be represented in a different way than loops 
with an invariant upper bound.  That would seem like a bad idea.

> If it's a VEC_PERM_EXPR then it'll be a new form of VEC_PERM_EXPR.

No, it'd be a VEC_PERM_EXPR where the magic mask is generated by a new 
EXPR type, instead of being a mere constant.

> The advantage of the internal functions and optabs is that they map to a 
> concept that already exists.  The code that generates the permutation 
> already knows that it's generating an interleave lo/hi, and like you 
> say, it used to do that directly via special tree codes.  I agree that 
> having a VEC_PERM_EXPR makes more sense for the constant-length case, 
> but the concept is still there.
> 
> And although using VEC_PERM_EXPR in gimple makes sense, I think not
> having the optabs is a step backwards, because it means that every
> target with interleave lo/hi has to duplicate the detection logic.

The middle end can provide helper routines to make detection easy.  The 
RTL expander could also match VEC_PERM_EXPR to specific optabs, if we 
really really want to add optab over optab for each specific kind of 
permutation in the future.

In a way the difference boils down to having
  PERM(x,y, TYPE)
(with TYPE being, say, HI_LO, EXTR_EVEN/ODD, REVERSE, and what not)
vs.
  PERM_HI_LO(x,y)
  PERM_EVEN(x,y)
  PERM_ODD(x,y)
  PERM_REVERSE(x,y)
  ...

The former way seems saner for an intermediate representation.  In this 
specific case TYPE would be detected by magicness of the constant, and if 
extended to SVE by magicness of the definition of the variably-sized 
invariant.

> > I'm not suggesting to expose it as an operation.  I'm suggesting that 
> > if the target can vec_perm_const_ok () with an "interleave/extract" 
> > permutation then we should be able to represent that with 
> > VEC_PERM_EXPR and thus also represent the permutation vector.
> 
> But vec_perm_const_ok () takes a fixed-length mask, so it can't be
> used here.  It would need to be a new hook (and thus a new special
> case for variable-length vectors).

Why do you reject extending vec_perm_const_ok to _do_ take an invariant 
mask?

> > But is there more?  Possibly a generic permute but for it you'd have
> > to explicitly construct a permutation vector using some primitives
> > like that "series" instruction?  So for that case it's reasonable to
> > have GIMPLE like
> >
> >  perm_vector_1 = VEC_SERIES_EXPR <...>
> >  ...
> >  v_2 = VEC_PERM_EXPR <.., .., perm_vector_1>;
> >
> > that is, it's not required to pretend the VEC_PERM_EXPR is a single
> > instruction or the permutation vector is "constant"?
> 
> ...by taking this approach, we're saying that we need to ensure that
> there is always a way of representing every directly-supported variable-
> length permutation mask as a constant, so that it doesn't get split from
> VEC_PERM_EXPR.

I'm having trouble understanding this.  Why would splitting away the 
definition of perm_vector_1 from VEC_PERM_EXPR be a problem?  It's still 
the same VEC_SERIES_EXPR, and hence still recognizable as a special 
permutation (if it is one).  The optimizers won't touch VEC_SERIES_EXPR, or 
if they do (e.g. combine two of them) and the result feeds a VEC_PERM_EXPR, 
they will make sure the combined result is still supported by the target.

In a way, on targets which support only specific forms of permutation for 
the vector type in question, this invariant mask won't be explicitly 
generated in code; it's an abstract tag in the IR to specify the type of 
the transformation.  Hence moving the def for that tag around is no 
problem.

> I don't see why that's better than having internal
> functions.

The real difference isn't internal functions vs. expression nodes, but 
rather multiple node types vs. a single node type.


Ciao,
Michael.


Re: [patch, fortran] Implement maxloc and minloc for character

2017-11-23 Thread Janne Blomqvist
On Thu, Nov 23, 2017 at 2:56 PM, Janne Blomqvist
 wrote:
> On Wed, Nov 22, 2017 at 8:10 PM, Thomas Koenig  wrote:
>> Hi Janne,
>>
>>>> So, attached is a new version of the patch. No update
>>>> on the ChangeLog. OK for trunk?
>>>
>>> Yup, just really fix the copyright and string length stuff first. Thanks!
>>
>>
>> Committed as rev 255070 with the fixes.
>>
>> There are still some files which mention Fortran 95, that can be fixed
>> later.
>
> That's ok, I wasn't expecting you to fix all such occurrences, just the
> new files you added.
>
> However, to continue my nitpicking (sorry!), it seems that in many
> cases compare_fcn still takes an integer length argument. Could you
> make that gfc_charlen_type as well? Or maybe size_t, since the
> argument is passed straight to memcmp{_char4} anyway? Please consider
> such a patch pre-approved. Thanks!

To continue, the prototypes are inconsistent too, e.g. m4/minloc2s.m4:

extern 'rtype_name` 'name`'rtype_qual`_'atype_code` ('atype` * const
restrict, int);
export_proto('name`'rtype_qual`_'atype_code`);

'rtype_name`
'name`'rtype_qual`_'atype_code` ('atype` * const restrict array,
gfc_charlen_type len)

See also lines 77/82, and in maxloc2s.m4 lines 42/46 and 144/149.

Thanks to James and Ramana on IRC, who are working on some target
where gfc_charlen_type != int and reported build failures.

-- 
Janne Blomqvist


Re: [patch, fortran] Implement maxloc and minloc for character

2017-11-23 Thread Ramana Radhakrishnan
On Thu, Nov 23, 2017 at 1:53 PM, Janne Blomqvist
 wrote:
> On Thu, Nov 23, 2017 at 2:56 PM, Janne Blomqvist
>  wrote:
>> On Wed, Nov 22, 2017 at 8:10 PM, Thomas Koenig  wrote:
>>> Hi Janne,
>>>
>>>>> So, attached is a new version of the patch. No update
>>>>> on the ChangeLog. OK for trunk?
>>>>
>>>> Yup, just really fix the copyright and string length stuff first. Thanks!
>>>
>>>
>>> Committed as rev 255070 with the fixes.
>>>
>>> There are still some files which mention Fortran 95, that can be fixed
>>> later.
>>
>> That's ok, I wasn't expecting you to fix all such occurrences, just the
>> new files you added.
>>
>> However, to continue my nitpicking (sorry!), it seems that in many
>> cases compare_fcn still takes an integer length argument. Could you
>> make that gfc_charlen_type as well? Or maybe size_t, since the
>> argument is passed straight to memcmp{_char4} anyway? Please consider
>> such a patch pre-approved. Thanks!
>
> To continue, the prototypes are inconsistent too, e.g. m4/minloc2s.m4:
>
> extern 'rtype_name` 'name`'rtype_qual`_'atype_code` ('atype` * const
> restrict, int);
> export_proto('name`'rtype_qual`_'atype_code`);
>
> 'rtype_name`
> 'name`'rtype_qual`_'atype_code` ('atype` * const restrict array,
> gfc_charlen_type len)
>
> See also lines 77/82, and in maxloc2s.m4 lines 42/46 and 144/149.
>
> Thanks to James and Ramana on IRC, who are working on some target
> where gfc_charlen_type != int and reported build failures.


I'm not sure why gfc_charlen_type != int on arm-none-eabi and
aarch64-none-elf which is where both James and I saw the issue.

The issue hasn't appeared on any of our cross-linux builds.

regards
Ramana

>
> --
> Janne Blomqvist


Re: [patch, fortran] Implement maxloc and minloc for character

2017-11-23 Thread Janne Blomqvist
On Thu, Nov 23, 2017 at 3:58 PM, Ramana Radhakrishnan
 wrote:
> On Thu, Nov 23, 2017 at 1:53 PM, Janne Blomqvist
>  wrote:
>> On Thu, Nov 23, 2017 at 2:56 PM, Janne Blomqvist
>>  wrote:
>>> On Wed, Nov 22, 2017 at 8:10 PM, Thomas Koenig  wrote:
>>>> Hi Janne,
>>>>
>>>>>> So, attached is a new version of the patch. No update
>>>>>> on the ChangeLog. OK for trunk?
>>>>>
>>>>> Yup, just really fix the copyright and string length stuff first. Thanks!
>>>>
>>>>
>>>> Committed as rev 255070 with the fixes.
>>>>
>>>> There are still some files which mention Fortran 95, that can be fixed
>>>> later.
>>>
>>> That's ok, I wasn't expecting you to fix all such occurrences, just the
>>> new files you added.
>>>
>>> However, to continue my nitpicking (sorry!), it seems that in many
>>> cases compare_fcn still takes an integer length argument. Could you
>>> make that gfc_charlen_type as well? Or maybe size_t, since the
>>> argument is passed straight to memcmp{_char4} anyway? Please consider
>>> such a patch pre-approved. Thanks!
>>
>> To continue, the prototypes are inconsistent too, e.g. m4/minloc2s.m4:
>>
>> extern 'rtype_name` 'name`'rtype_qual`_'atype_code` ('atype` * const
>> restrict, int);
>> export_proto('name`'rtype_qual`_'atype_code`);
>>
>> 'rtype_name`
>> 'name`'rtype_qual`_'atype_code` ('atype` * const restrict array,
>> gfc_charlen_type len)
>>
>> See also lines 77/82, and in maxloc2s.m4 lines 42/46 and 144/149.
>>
>> Thanks to James and Ramana on IRC, who are working on some target
>> where gfc_charlen_type != int and reported build failures.
>
>
> I'm not sure why gfc_charlen_type != int on arm-none-eabi and
> aarch64-none-elf which is where both James and I saw the issue.
>
> The issue hasn't appeared on any of our cross-linux builds.

gfc_charlen_type is typedef'd to GFC_INTEGER_4, which comes from
kinds.h, generated at library build time (see
libgfortran/mk-kinds-h.sh). At least on x86_64-pc-linux-gnu this is
then an int32_t, which is a typedef for plain int. HTH.
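
A minimal C sketch of why the mismatch only shows up on some targets. It assumes a hypothetical typedef charlen_t standing in for gfc_charlen_type and a made-up function first_smaller; neither is libgfortran code. If the declaration said int while the definition used the typedef, the two would agree only where the typedef happens to be plain int, so the sketch uses the typedef in both places.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for gfc_charlen_type.  On some targets
   int32_t need not be plain int (assumption, not verified here).  */
typedef int32_t charlen_t;

/* Declaration and definition both use charlen_t, so they stay
   consistent whatever integer type the typedef resolves to.  */
extern int first_smaller (const char *a, const char *b, charlen_t len);

int
first_smaller (const char *a, const char *b, charlen_t len)
{
  /* The length goes straight to memcmp, hence the size_t conversion.  */
  return memcmp (a, b, (size_t) len) < 0;
}
```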



-- 
Janne Blomqvist


Re: Add optabs for common types of permutation

2017-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2017 at 02:43:32PM +0100, Michael Matz wrote:
> > If it's a VEC_PERM_EXPR then it'll be a new form of VEC_PERM_EXPR.
> 
> No, it'd be a VEC_PERM_EXPR where the magic mask is generated by a new 
> EXPR type, instead of being a mere constant.

Or an internal function that would produce the permutation mask vector
given kind and number of vector elements and element type for the mask.
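
To make the idea of "kind plus element count" concrete, here is a minimal sketch in plain C, not GCC internals. The enum and the function name are hypothetical; the selector convention (indices 0..N-1 pick from the first input, N..2N-1 from the second) follows VEC_PERM_EXPR, and an even element count is assumed.

```c
/* Hypothetical permutation kinds, mirroring the ones named in the
   thread.  */
enum perm_kind
{
  PERM_INTERLEAVE_LO,
  PERM_INTERLEAVE_HI,
  PERM_EXTRACT_EVEN,
  PERM_EXTRACT_ODD,
  PERM_REVERSE
};

/* Fill MASK[0..NELTS-1] with selector indices for KIND.  Indices
   0..NELTS-1 select from the first input vector and
   NELTS..2*NELTS-1 from the second.  NELTS is assumed to be even.  */
static void
build_perm_mask (enum perm_kind kind, unsigned nelts, unsigned *mask)
{
  for (unsigned i = 0; i < nelts; i++)
    switch (kind)
      {
      case PERM_INTERLEAVE_LO:
        mask[i] = i / 2 + (i % 2 ? nelts : 0);
        break;
      case PERM_INTERLEAVE_HI:
        mask[i] = nelts / 2 + i / 2 + (i % 2 ? nelts : 0);
        break;
      case PERM_EXTRACT_EVEN:
        mask[i] = 2 * i;
        break;
      case PERM_EXTRACT_ODD:
        mask[i] = 2 * i + 1;
        break;
      case PERM_REVERSE:
        mask[i] = nelts - 1 - i;
        break;
      }
}
```

Each kind is a pattern that extends to any element count, which is the property a variable-length mask would need.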

Jakub


[SH][committed] Fix PR 83111

2017-11-23 Thread Oleg Endo
Hi,

The attached patch fixes PR 83111.
Committed to mainline as r255096 and to GCC 7 branch as r255097.

Cheers,
Oleg

gcc/ChangeLog

PR target/83111
* config/sh/sh.md (udivsi3, divsi3, sibcall_value_pcrel,
sibcall_value_pcrel_fdpic): Use local variable instead of
operands[3].
(calli_tbr_rel): Add missing operand 2.
(call_valuei_tbr_rel): Add missing operand 3.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 255095)
+++ gcc/config/sh/sh.md	(working copy)
@@ -2277,8 +2277,8 @@
   ""
 {
   rtx last;
+  rtx func_ptr = gen_reg_rtx (Pmode);
 
-  operands[3] = gen_reg_rtx (Pmode);
   /* Emit the move of the address to a pseudo outside of the libcall.  */
   if (TARGET_DIVIDE_CALL_TABLE)
 {
@@ -2298,16 +2298,16 @@
 	  emit_move_insn (operands[0], operands[2]);
 	  DONE;
 	}
-  function_symbol (operands[3], "__udivsi3_i4i", SFUNC_GOT);
-  last = gen_udivsi3_i4_int (operands[0], operands[3]);
+  function_symbol (func_ptr, "__udivsi3_i4i", SFUNC_GOT);
+  last = gen_udivsi3_i4_int (operands[0], func_ptr);
 }
   else if (TARGET_DIVIDE_CALL_FP)
 {
-  rtx lab = function_symbol (operands[3], "__udivsi3_i4", SFUNC_STATIC).lab;
+  rtx lab = function_symbol (func_ptr, "__udivsi3_i4", SFUNC_STATIC).lab;
   if (TARGET_FPU_SINGLE)
-	last = gen_udivsi3_i4_single (operands[0], operands[3], lab);
+	last = gen_udivsi3_i4_single (operands[0], func_ptr, lab);
   else
-	last = gen_udivsi3_i4 (operands[0], operands[3], lab);
+	last = gen_udivsi3_i4 (operands[0], func_ptr, lab);
 }
   else if (TARGET_SH2A)
 {
@@ -2318,8 +2318,8 @@
 }
   else
 {
-  rtx lab = function_symbol (operands[3], "__udivsi3", SFUNC_STATIC).lab;
-  last = gen_udivsi3_i1 (operands[0], operands[3], lab);
+  rtx lab = function_symbol (func_ptr, "__udivsi3", SFUNC_STATIC).lab;
+  last = gen_udivsi3_i1 (operands[0], func_ptr, lab);
 }
   emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
   emit_move_insn (gen_rtx_REG (SImode, 5), operands[2]);
@@ -2405,22 +2405,22 @@
   ""
 {
   rtx last;
+  rtx func_ptr = gen_reg_rtx (Pmode);
 
-  operands[3] = gen_reg_rtx (Pmode);
   /* Emit the move of the address to a pseudo outside of the libcall.  */
   if (TARGET_DIVIDE_CALL_TABLE)
 {
-  function_symbol (operands[3], sh_divsi3_libfunc, SFUNC_GOT);
-  last = gen_divsi3_i4_int (operands[0], operands[3]);
+  function_symbol (func_ptr, sh_divsi3_libfunc, SFUNC_GOT);
+  last = gen_divsi3_i4_int (operands[0], func_ptr);
 }
   else if (TARGET_DIVIDE_CALL_FP)
 {
-  rtx lab = function_symbol (operands[3], sh_divsi3_libfunc,
+  rtx lab = function_symbol (func_ptr, sh_divsi3_libfunc,
  SFUNC_STATIC).lab;
   if (TARGET_FPU_SINGLE)
-	last = gen_divsi3_i4_single (operands[0], operands[3], lab);
+	last = gen_divsi3_i4_single (operands[0], func_ptr, lab);
   else
-	last = gen_divsi3_i4 (operands[0], operands[3], lab);
+	last = gen_divsi3_i4 (operands[0], func_ptr, lab);
 }
   else if (TARGET_SH2A)
 {
@@ -2431,8 +2431,8 @@
 }
   else
 {
-  function_symbol (operands[3], sh_divsi3_libfunc, SFUNC_GOT);
-  last = gen_divsi3_i1 (operands[0], operands[3]);
+  function_symbol (func_ptr, sh_divsi3_libfunc, SFUNC_GOT);
+  last = gen_divsi3_i1 (operands[0], func_ptr);
 }
   emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
   emit_move_insn (gen_rtx_REG (SImode, 5), operands[2]);
@@ -6519,6 +6519,7 @@
   [(call (mem (match_operand:SI 0 "symbol_ref_operand" ""))
 	 (match_operand 1 "" ""))
(use (reg:SI FPSCR_MODES_REG))
+   (use (match_scratch 2))
(clobber (reg:SI PR_REG))]
   "TARGET_SH2A && sh2a_is_function_vector_call (operands[0])"
 {
@@ -6629,6 +6630,7 @@
 	(call (mem:SI (match_operand:SI 1 "symbol_ref_operand" ""))
 	  (match_operand 2 "" "")))
(use (reg:SI FPSCR_MODES_REG))
+   (use (match_scratch 3))
(clobber (reg:SI PR_REG))]
   "TARGET_SH2A && sh2a_is_function_vector_call (operands[1])"
 {
@@ -7044,13 +7046,11 @@
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
-  rtx call_insn;
+  rtx tmp =  gen_rtx_REG (SImode, R1_REG);
 
-  operands[3] =  gen_rtx_REG (SImode, R1_REG);
-
-  sh_expand_sym_label2reg (operands[3], operands[1], lab, true);
-  call_insn = emit_call_insn (gen_sibcall_valuei_pcrel (operands[0],
-			operands[3],
+  sh_expand_sym_label2reg (tmp, operands[1], lab, true);
+  rtx call_insn = emit_call_insn (gen_sibcall_valuei_pcrel (operands[0],
+			tmp,
 			operands[2],
 			copy_rtx (lab)));
   SIBLING_CALL_P (call_insn) = 1;
@@ -7078,12 +7078,11 @@
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
+  rtx tmp = gen_rtx_REG (SImode, R1_REG);
 
-  operands[3] =  gen_rtx_REG (SImode, R1_REG);
-
-  sh_expand_sym_label2reg (operands[3], operands[1], lab, true);
+  sh_expand_sym_label2reg (tmp, operands[1], lab, true);
   rtx i = emit

Re: [RFA][PATCH] Use SCEV conditionally within vr-values and evrp range analysis

2017-11-23 Thread Jeff Law
On 11/23/2017 05:49 AM, Richard Biener wrote:
> On Thu, Nov 23, 2017 at 1:16 AM, Jeff Law  wrote:
>>
>> Clients of the evrp range analysis may not have initialized the SCEV
>> infrastructure, and in fact may not want to (DOM for example).
>>
>> Yet inside both vr-values.c and gimple-ssa-evrp-analyze.c we have calls
>> into SCEV (that will fault/abort if SCEV is not properly initialized).
>>
>> This patch allows clients of vr-values.c and gimple-ssa-evrp-analyze.c
>> to indicate if they want SCEV analysis.
>>
>> Bootstrapped and regression tested by itself as well as with the DOM
>> patches to use EVRP analysis (which test the "don't want SCEV" path).
>>
>> OK for the trunk?
> 
> There's also scev_initialized_p () which you could conveniently use.
Wasn't aware of it.  That's almost certainly a better solution.

Jeff


Re: Add optabs for common types of permutation

2017-11-23 Thread Richard Sandiford
Michael Matz  writes:
> Hi,
>
> On Thu, 23 Nov 2017, Richard Sandiford wrote:
>
>> > I don't want variable-size vector special-casing everywhere.  I want 
>> > it to be somehow naturally integrating with existing stuff.
>> 
>> It's going to be a special case whatever happens though.
>
> It wouldn't have to be this way.  It's like saying that loops with a 
> constant upper bound should be represented in a different way than loops 
> with an invariant upper bound.  That would seem like a bad idea.

The difference is that with a loop, each iteration follows a set pattern.
But:

(1) for constant-length VEC_PERM_EXPRs, each element of the permutation
vector is independent of the others: you can't predict what the selector
for element i is given the selectors for the other elements.

(2) for variable-length permutes, the elements *do* have to follow
a set pattern that can be extended indefinitely.

Or do you mean that we should use the new representation of interleave
masks even for constant-length vectors, rather than using a VECTOR_CST?
I suppose that would be more consistent, but we'd then have to check
when generating a VEC_PERM_EXPR of a VECTOR_CST whether it should be
represented in this new way instead.  I think we then lose the benefit
of using a single tree code.

The decision for VEC_DUPLICATE_CST and VEC_SERIES_CST was to restrict
them only to variable-length vectors.

>> If it's a VEC_PERM_EXPR then it'll be a new form of VEC_PERM_EXPR.
>
> No, it'd be a VEC_PERM_EXPR where the magic mask is generated by a new 
> EXPR type, instead of being a mere constant.

(See [1] below)

>> The advantage of the internal functions and optabs is that they map to a 
>> concept that already exists.  The code that generates the permutation 
>> already knows that it's generating an interleave lo/hi, and like you 
>> say, it used to do that directly via special tree codes.  I agree that 
>> having a VEC_PERM_EXPR makes more sense for the constant-length case, 
>> but the concept is still there.
>> 
>> And although using VEC_PERM_EXPR in gimple makes sense, I think not
>> having the optabs is a step backwards, because it means that every
>> target with interleave lo/hi has to duplicate the detection logic.
>
> The middle end can provide helper routines to make detection easy.  The 
> RTL expander could also match VEC_PERM_EXPR to specific optabs, if we 
> really really want to add optab over optab for each specific kind of 
> permutation in the future.

Adding a new optab doesn't seem like a big deal to me, if it's for
something that target-independent code already needs to worry about.
(The reason we need these specific optabs is because the target-
independent code already generates these particular permutes.)

The overhead attached to adding an optab isn't really any higher
than adding a new detector function, especially on targets that
don't implement the optab.

> In a way the difference boils down to have
>   PERM(x,y, TYPE)
> (with TYPE being, say, HI_LO, EXTR_EVEN/ODD, REVERSE, and what not)
> vs.
>   PERM_HI_LO(x,y)
>   PERM_EVEN(x,y)
>   PERM_ODD(x,y)
>   PERM_REVERSE(x,y)
>   ...
>
> The former way seems saner for an intermediate representation.  In this 
> specific case TYPE would be detected by magicness of the constant, and if 
> extended to SVE by magicness of the definition of the variably-sized 
> invariant.

[1] That sounds similar to the way that COND_EXPR and VEC_COND_EXPR can
embed the comparison in the first operand.  I think Richard has complained
about that in the past (and it does cause some ugliness in the way mask
types are calculated during vectorisation).

>> > I'm not suggesting to expose it as an operation.  I'm suggesting that 
>> > if the target can vec_perm_const_ok () with an "interleave/extract" 
>> > permutation then we should be able to represent that with 
>> > VEC_PERM_EXPR and thus also represent the permutation vector.
>> 
>> But vec_perm_const_ok () takes a fixed-length mask, so it can't be
>> used here.  It would need to be a new hook (and thus a new special
>> case for variable-length vectors).
>
> Why do you reject extending vec_perm_const_ok to _do_ take an invariant 
> mask?

What kind of interface were you thinking of though?

Note that the current interface is independent of the tree or rtl levels,
since it's called by both gimple optimisers and vec_perm_const expanders.
I assume we'd want to keep that.

>> > But is there more?  Possibly a generic permute but for it you'd have
>> > to explicitly construct a permutation vector using some primitives
>> > like that "series" instruction?  So for that case it's reasonable to
>> > have GIMPLE like
>> >
>> >  perm_vector_1 = VEC_SERIES_EXPR <...>
>> >  ...
>> >  v_2 = VEC_PERM_EXPR <.., .., perm_vector_1>;
>> >
>> > that is, it's not required to pretend the VEC_PERM_EXPR is a single
>> > instruction or the permutation vector is "constant"?
>> 
>> ...by taking this approach, we're saying that we need to ensure that
>> th

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-23 Thread Martin Jambor
Hi,

On Mon, Nov 13 2017, Richard Biener wrote:
> The main concern here is that GIMPLE is not very well defined for
> aggregate copies and that gimple-fold.c happily optimizes
> memcpy (&a, &b, sizeof (a)) into a = b;
>
> struct A { short s; long i; long j; };
> struct A a, b;
> void foo ()
> {
>   __builtin_memcpy (&a, &b, sizeof (struct A));
> }
>
> gets folded to
>
>   MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&b];
>   return;
>
> you see we're careful about TBAA but (don't see that above but
> can be verified by for example debugging expand_assignment)
> TREE_TYPE (MEM[...]) is actually 'struct A'.
>
> And yes, I've been worried about SRA as well here...  it _does_
> have some early outs when seeing VIEW_CONVERT_EXPR but
> apparently not for the above.  Testcase that aborts with SRA but
> not without:
>
> struct A { short s; long i; long j; };
> struct A a, b;
> void foo ()
> {
>   struct A c;
>   __builtin_memcpy (&c, &b, sizeof (struct A));
>   __builtin_memcpy (&a, &c, sizeof (struct A));
> }
> int main()
> {
>   __builtin_memset (&b, 0, sizeof (struct A));
>   b.s = 1;
>   __builtin_memcpy ((char *)&b+2, &b, 2);
>   foo ();
>   __builtin_memcpy (&a, (char *)&a+2, 2);
>   if (a.s != 1)
>     __builtin_abort ();
>   return 0;
> }

Thanks for the testcase, I agree that is a fairly big problem.  Do you
think that the following (untested) patch is an appropriate way of
fixing it and generally of extending gimple to capture that a statement
is a bit-copy?

If so, I'll add the testcase, bootstrap it and formally propose it.
Subsequently I will of course make sure that any element-wise copying
patch would test the predicate.

Thanks,

Martin


2017-11-23  Martin Jambor  

* gimple.c (gimple_bit_copy_p): New function.
* gimple.h (gimple_bit_copy_p): Declare it.
* tree-sra.c (sra_modify_assign): Use it.
---
 gcc/gimple.c   | 20 
 gcc/gimple.h   |  1 +
 gcc/tree-sra.c |  1 +
 3 files changed, 22 insertions(+)

diff --git a/gcc/gimple.c b/gcc/gimple.c
index c986a732004..e1b428d91bb 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -3087,6 +3087,26 @@ gimple_inexpensive_call_p (gcall *stmt)
   return false;
 }
 
+/* Return true if STMT is an assignment performing bit copy and so is also
+   expected to copy any padding.  */
+
+bool
+gimple_bit_copy_p (gassign *stmt)
+{
+  if (!gimple_assign_single_p (stmt))
+return false;
+
+  tree lhs = gimple_assign_lhs (stmt);
+  if (TREE_CODE (lhs) == MEM_REF
+  && TYPE_REF_CAN_ALIAS_ALL (reference_alias_ptr_type (lhs)))
+return true;
+  tree rhs = gimple_assign_rhs1 (stmt);
+  if (TREE_CODE (rhs) == MEM_REF
+  && TYPE_REF_CAN_ALIAS_ALL (reference_alias_ptr_type (rhs)))
+return true;
+  return false;
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 334def89398..60929473361 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1531,6 +1531,7 @@ extern void gimple_seq_discard (gimple_seq);
 extern void maybe_remove_unused_call_args (struct function *, gimple *);
 extern bool gimple_inexpensive_call_p (gcall *);
 extern bool stmt_can_terminate_bb_p (gimple *);
+extern bool gimple_bit_copy_p (gassign *);
 
 /* Formal (expression) temporary table handling: multiple occurrences of
the same scalar expression are evaluated into the same temporary.  */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index db490b20c3e..fc0a8fe60bf 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3591,6 +3591,7 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator 
*gsi)
   || gimple_has_volatile_ops (stmt)
   || contains_vce_or_bfcref_p (rhs)
   || contains_vce_or_bfcref_p (lhs)
+  || gimple_bit_copy_p (as_a <gassign *> (stmt))
   || stmt_ends_bb_p (stmt))
 {
   /* No need to copy into a constant-pool, it comes pre-initialized.  */
-- 
2.15.0



Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2017 at 04:32:43PM +0100, Martin Jambor wrote:
> > struct A { short s; long i; long j; };
> > struct A a, b;
> > void foo ()
> > {
> >   struct A c;
> >   __builtin_memcpy (&c, &b, sizeof (struct A));
> >   __builtin_memcpy (&a, &c, sizeof (struct A));
> > }
> > int main()
> > {
> >   __builtin_memset (&b, 0, sizeof (struct A));
> >   b.s = 1;
> >   __builtin_memcpy ((char *)&b+2, &b, 2);
> >   foo ();
> >   __builtin_memcpy (&a, (char *)&a+2, 2);
> >   if (a.s != 1)
> >     __builtin_abort ();
> >   return 0;
> > }

Note the testcase would need to be guarded with sizeof (short) == 2
and offsetof (struct A, i) >= 4.
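
A minimal sketch of such a guard, assuming it is done as a run-time check at
the start of main (the helper name is hypothetical; a real testcase might
prefer a dg- directive or an early return instead):

```c
#include <assert.h>
#include <stddef.h>

struct A { short s; long i; long j; };

/* Return nonzero when the layout assumptions of the testcase hold:
   a 2-byte short followed by at least 2 bytes of padding before I,
   so that bytes 2..3 of the struct are padding the test can use.  */
int
padding_testcase_applies (void)
{
  /* The first member of a struct always sits at offset 0.  */
  assert (offsetof (struct A, s) == 0);
  return sizeof (short) == 2 && offsetof (struct A, i) >= 4;
}
```

The testcase's main would then start with something like
"if (!padding_testcase_applies ()) return 0;" before touching the padding.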

> Thanks for the testcase, I agree that is a fairly big problem.  Do you
> think that the following (untested) patch is an appropriate way of
> fixing it and generally of extending gimple to capture that a statement
> is a bit-copy?

Can you bail out just if the type contains any padding?  If there is no
padding, then perhaps SRA still might do its stuff (though, e.g. if it
contains bitfields, we'd need to hope store-merging merges it all back
again).
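
For the struct in the testcase, "does the type contain padding" can be
illustrated by comparing the size of the type with the sum of its member
sizes.  This is only a hand-written check for this one struct, not how SRA
would compute it, and the function name is made up:

```c
#include <assert.h>
#include <stddef.h>

struct A { short s; long i; long j; };

/* Number of padding bytes in struct A: the size of the whole type minus
   the sizes of its three members.  */
size_t
struct_a_padding_bytes (void)
{
  size_t members = sizeof (short) + 2 * sizeof (long);

  assert (sizeof (struct A) >= members);
  return sizeof (struct A) - members;
}
```

A nonzero result means a bit-copy of the struct carries bytes that an
element-wise copy of s, i and j would not.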

Jakub


Fix profile update when producing inline clone

2017-11-23 Thread Jan Hubicka
Hi,
while updating cgraph_node::create_clone I forgot that it also is used to create
inline clones.  In that case we can not preserve original local count even if
prof_count is 0.

Bootstrapped/regtested x86_64-linux, committed.

Honza

* cgraphclones.c (cgraph_node::create_clone): Fix updating of profile
when inlining.
Index: cgraphclones.c
===
--- cgraphclones.c  (revision 255053)
+++ cgraphclones.c  (working copy)
@@ -428,7 +428,10 @@ cgraph_node::create_clone (tree new_decl
   if (new_inlined_to)
 dump_callgraph_transformation (this, new_inlined_to, "inlining to");
 
-  prof_count = count.combine_with_ipa_count (prof_count);
+  /* When inlining we scale precisely to prof_count, when cloning we can
+ preserve local profile.  */
+  if (!new_inlined_to)
+prof_count = count.combine_with_ipa_count (prof_count);
   new_node->count = prof_count;
 
   /* Update IPA profile.  Local profiles need no updating in original.  */


Re: [PATCH GCC]A simple implementation of loop interchange

2017-11-23 Thread Bin.Cheng
Hi Richard,
Thanks for reviewing.  There are quite a lot of comments; I am trying to
resolve them one by one.  I have some questions embedded below.

On Mon, Nov 20, 2017 at 2:46 PM, Richard Biener
 wrote:
> On Thu, Nov 16, 2017 at 4:18 PM, Bin.Cheng  wrote:
>> On Tue, Oct 24, 2017 at 3:30 PM, Michael Matz  wrote:
>>
>>>
>>> I hope this is of some help to you :)
>> Thanks again, it's very helpful.
>>
>> I also fixed several bugs of previous implementation, mostly about debug info
>> statements and simple reductions.  As for test, I enabled this pass by 
>> default,
>> bootstrap and regtest GCC, I also build/run specs.  There must be some other
>> latent bugs in it, but guess we have to exercise it by enabling it at
>> some point.
>>
>> So any comments?
>
>  bool
> -gsi_remove (gimple_stmt_iterator *i, bool remove_permanently)
> +gsi_remove (gimple_stmt_iterator *i, bool remove_permanently, bool 
> insert_dbg)
>  {
>
> that you need this suggests you do stmt removal in wrong order (you need to
> do reverse dom order).
>
> +/* Maximum number of statements in loop nest for loop interchange.  */
> +
> +DEFPARAM (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS,
> + "loop-interchange-max-num-stmts",
> + "The maximum number of stmts in loop nest for loop interchange.",
> + 64, 0, 0)
>
> is that to limit dependence computation?  In this case you should probably
> limit the number of data references instead?
No, this is to limit number of statements in loop.  We don't want to
do interchange for too large loops, right?

>
> +ftree-loop-interchange
> +Common Report Var(flag_tree_loop_interchange) Optimization
> +Enable loop interchange on trees.
> +
>
> please re-use -floop-interchange instead and change the GRAPHITE tests
> to use -floop-nest-optimize.  You can do that as pre-approved thing now.
>
> Please enable the pass by default at O3 via opts.c.
There are quite a few (vectorize) test cases affected by interchange
(which is correct I believe), so I will prepare another patch enabling
it at O3 and adjusting the tests, to keep this patch small.

>
> diff --git a/gcc/tree-ssa-loop-interchange.cc 
> b/gcc/tree-ssa-loop-interchange.cc
>
> gimple-loop-interchange.cc please.
>
> new file mode 100644
> index 000..abffbf6
> --- /dev/null
> +++ b/gcc/tree-ssa-loop-interchange.cc
> @@ -0,0 +1,2274 @@
> +/* Loop invariant motion.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
>
> Loop invariant motion? ... ;)
>
> Please add a "Contributed by ..." to have an easy way to figure people to 
> blame.
>
> +}*induction_p;
> +
>
> space after '*'
>
> +}*reduction_p;
> +
>
> likewise.
>
> +/* Return true if PHI is unsupported in loop interchange, i.e, PHI contains
> +   ssa var appearing in any abnormal phi node.  */
> +
> +static inline bool
> +unsupported_phi_node (gphi *phi)
> +{
> +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (PHI_RESULT (phi)))
> +return true;
> +
> +  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +{
> +  tree arg = PHI_ARG_DEF (phi, i);
> +  if (TREE_CODE (arg) == SSA_NAME
> + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (arg))
> +   return true;
> +}
> +
> +  return false;
>
> I believe the above isn't necessary given you rule out abnormal edges
> into the loop.
> Did you have a testcase that broke?  A minor thing I guess if it is
> just for extra
> safety...
>
> +/* Return true if all stmts in BB can be supported by loop interchange,
> +   otherwise return false.  ILOOP is not NULL if this loop_cand is the
> +   outer loop in loop nest.  */
> +
> +bool
> +loop_cand::unsupported_operation (basic_block bb, loop_cand *iloop)
> +{
>
> docs and return value suggest this be named supported_operation
>
> +  /* Or it's invariant memory reference and only used by inner loop.  */
> +  if (gimple_assign_single_p (stmt)
> + && (lhs = gimple_assign_lhs (stmt)) != NULL_TREE
> + && TREE_CODE (lhs) == SSA_NAME
> + && single_use_in_loop (lhs, iloop->loop))
> +   continue;
>
> comment suggests multiple uses in loop would be ok?
>
> +  if ((lhs = gimple_assign_lhs (producer)) == NULL_TREE
> + || lhs != re->init)
> +   return;
> +
> +  if ((rhs = gimple_assign_rhs1 (producer)) == NULL_TREE
> + || !REFERENCE_CLASS_P (rhs))
> +   return;
>
> lhs and rhs are never NULL.  Please initialize them outside of the if.
> You want to disallow DECL_P rhs here?
>
> Can you add an overall function comment what this function does?  It seems
> to detect a reduction as produced by loop store-motion?  Thus it tries to
> get at enough information to perform the reverse transform?
>
> During review I have a hard time distinguishing between locals and members
> so can you please prefix all member variables with m_ as according to our
> code guidelines?  I guess what adds to the confusion is the loop_cand argument
> that sometimes is present for loop_cand member functions...
> (I personally prefer to prefix all member accesses with this-> bu

Re: [PATCH] Fix .debug_rnglists generation with -gdwarf-5 -gsplit-dwarf.

2017-11-23 Thread Mark Wielaard
On Thu, 2017-11-23 at 14:33 +0100, Jakub Jelinek wrote:
> On Thu, Nov 23, 2017 at 02:13:57PM +0100, Mark Wielaard wrote:
> >    * dwarf2out.c (init_sections_and_labels): Use generation to create
> >    unique ranges_section_label and ranges_base_label. Return generation.
> >    (output_rnglists): Add generation argument. Use generation to create
> >    unique ranges labels.
> >    (dwarf2out_finish): Get generation from init_sections_and_labels
> >    and pass generation to output_rnglists.
> 
> Ok.

Thanks. Pushed.

>   Though, note that -gdwarf-5 -gsplit-dwarf is really unfinished anyway,
> I did what was easy, but left out what was harder and did no testing at all
> (don't know what tools are needed for the split dwarf processing etc.).

Yes, I noticed. In particular the indirect index forms addrx and strx
are never used, only the GNU addr_index and str_index ones. I am adding
DWARF5 support to elfutils libdw. But it doesn't have support for
split-dwarf yet. I'll see if I can help update gcc when I get split-
dwarf working in elfutils. One issue with that is that binutils/gdb do
support split-dwarf or DWARF5, but not DWARF5 and split-dwarf yet.

Cheers,

Mark


Re: Add optabs for common types of permutation

2017-11-23 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Thu, Nov 23, 2017 at 02:43:32PM +0100, Michael Matz wrote:
>> > If it's a VEC_PERM_EXPR then it'll be a new form of VEC_PERM_EXPR.
>> 
>> No, it'd be a VEC_PERM_EXPR where the magic mask is generated by a new 
>> EXPR type, instead of being a mere constant.
>
> Or an internal function that would produce the permutation mask vector
> given kind and number of vector elements and element type for the mask.

I think this comes back to what the return type of the functions
should be.  A vector of QImodes wouldn't be wide enough when vectors
are 256 elements or wider, so we'd need a vector of HImodes with the
same number of elements as a native vector of QImodes.  This means
that the function will need to return a non-native vector type.
That doesn't matter if the function remains glued to the VEC_PERM_EXPR,
but once we expose it as a separate operation, it can get optimised
separately from the VEC_PERM_EXPR.

Also, even with this representation, the operation isn't truly
variable-length.  It just increases the maximum number of elements
from 256 to 65536.  That ought to be enough realistically (in the
same way that 640k ought to be enough realistically), but the patches
as posted have avoided needing to encode a maximum like that.

Having the internal function do the permute rather than produce the
mask means that the operation really is variable-length: we don't
require a vector index to fit within a specific integer type.

Thanks,
Richard


Re: [patch] remove cilk-plus

2017-11-23 Thread Jeff Law
On 11/22/2017 01:38 AM, Koval, Julia wrote:
> Hi,
> 
>> So it's not important, but the patch doesn't have the removal of the
>> cilk+ testsuite or runtime.  But again, it's not a big deal, I can guess
>> what that part of the patch looks like.
> 
> I used Jakub's suggestion in 
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01348.html and didn't add the 
> libcilkrts and cilk-plus directories to the patch (otherwise the patch doesn't 
> fit in gcc-patches), only to the change log. I will include them when I commit 
> the patch, but I guess there is nothing to review here:
> rm -rf gcc/testsuite/c-c++-common/cilk-plus
> rm -rf gcc/testsuite/g++.dg/cilk-plus
> rm -rf gcc/testsuite/gcc.dg/cilk-plus
> rm -rf libcilkrts
> I can send it as an additional patch (or patches) if this is required.
No need.  As I said, I can guess what those look like :-0

Jeff
> 


Re: [PATCH] rs6000_gimple_fold_builtin formatting fixes

2017-11-23 Thread Segher Boessenkool
Hi!

On Thu, Nov 23, 2017 at 10:26:28AM +0100, Jakub Jelinek wrote:
> While looking at this function, I found so many formatting issues
> (indentation and spacing) that I've decided to post a patch.  As an
> additional cleanup, I've moved the temp and g declarations to the top of
> the function, so that {}s don't need to clutter most of the cases.
> The patch shouldn't change the generated code at all.
> 
> Ok for trunk if it passes bootstrap/regtest?

Okay, thanks!


Segher


Re: [patch, fortran] Implement maxloc and minloc for character

2017-11-23 Thread Thomas Koenig

Hi Janne,


> However, to continue my nitpicking (sorry!), it seems that in many
> cases compare_fcn still takes an integer length argument. Could you
> make that gfc_charlen_type as well? Or maybe size_t, since the
> argument is passed straight to memcmp{_char4} anyway? Please consider
> such a patch pre-approved. Thanks!


Committed as r255109.

I had missed out on the non-inlined maxval and maxloc versions...

The fun with max* and min* intrinsics is not yet over. Maxval and
Minval have yet to be implemented for character arguments, and then
there is the BACK argument to MAXLOC.

Maybe (while we are breaking compatibility) we should just add BACK
to the front end, reject it with a "not yet implemented" message,
add the argument to the library and worry about implementation
later.

What do you think?

Regards

Thomas



Re: [PATCH, rs6000] Testcase updates for power9 codegen

2017-11-23 Thread Segher Boessenkool
On Mon, Nov 20, 2017 at 02:16:59PM -0600, Will Schmidt wrote:
> > > --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.c
> > > +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.c
> > > @@ -1,11 +1,11 @@
> > >  /* Verify that overloaded built-ins for vec_abs with int
> > > inputs produce the right results.  */
> > >  
> > >  /* { dg-do compile } */
> > >  /* { dg-require-effective-target powerpc_altivec_ok } */
> > > -/* { dg-options "-maltivec -O2 -fwrapv" } */
> > > +/* { dg-options "-maltivec -O2 -fwrapv -mcpu=power8" } */
> > 
> > Is this testcase really testing something specific to power8 codegen?
> 
> Yes, in contrast to power9 codegen.

Ah, so there is a separate testcase for power9.  Okay.

Maybe it should say so in the filename, or in a comment at least.
It is very helpful if testcases say what they try to test.

> > Making all these testcases use -mcpu=power8 means they won't be tested
> > with any other settings.  (Also, does that work if the user puts another
> > -mcpu= in RUNTESTFLAGS?)
> 
> possibly not.  I did (do?) have a do-not-override option in place
> initially for a couple of the tests, but didn't seem to need it during
> my sniff-testing.  I can try a few runs with manually specifying -mcpu
> flags and revisit. 
> /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" }
> { "-mcpu=power8" } } */

Such a line is needed afaics, yeah.

> > I'm all for making the testresults cleaner, but let's not do that by
> > (effectively) disabling all failing tests ;-)
> 
> No, that's really not my intent here.  I'm not really convinced that's
> what I've done here either.   I've added .p9 versions for most of what
> I've touched, with the intent to continue to have good coverage.

I missed that :-)  Great, thanks.

> Adding the -mcpu=foo option to dg-options shouldn't be disabling the
> test..?

It means this test won't test codegen for any later cpu.  So if there
is no separate test for later cpus, you lose test coverage.

Thanks,


Segher


Re: [PATCH, rs6000] (v2) fold-vec-splat* testcase updates for power9

2017-11-23 Thread Segher Boessenkool
Hi!

On Mon, Nov 20, 2017 at 04:04:18PM -0600, Will Schmidt wrote:
> Update the scan-assembler stanza to include those instructions
> generated for a power9 target.
> 
> 2017-11-20  Will Schmidt  
> 
> [testsuite]
> * fold-vec-splat-8.c : Add vspltisb to expected output.
> * fold-vec-splats-int.c : Add mtvsrws to expected output.
> * fold-vec-splats-longlong.c : Add mtvsrdd to expected output.

No space before the colon.  Okay for trunk with that fixed.  Thanks,


Segher


Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.

2017-11-23 Thread Charles Baylis
On 23 November 2017 at 10:01, Kyrill  Tkachov
 wrote:

>
> Thanks, these are ok for trunk.
> They were originally posted way before stage 3 and this is just a rework,
> so it's acceptable at this stage as far as I'm concerned.

Thanks. Committed to trunk as r255111.


Re: [PATCH 3/3] [ARM] Add table of costs for AArch32 addressing modes.

2017-11-23 Thread Charles Baylis
On 15 September 2017 at 17:57, Kyrill  Tkachov
 wrote:
>
> Thanks, this is ok once the prerequisites are sorted.

Patch 1 was abandoned, and a later version of patch 2 has been
committed, so this was applied to trunk as r255112.


Re: [patch] implement generic debug() for vectors and hash sets

2017-11-23 Thread Gerald Pfeifer
On Tue, 21 Nov 2017, Gerald Pfeifer wrote:
> /scratch/tmp/gerald/GCC-HEAD/gcc/print-rtl.c:982:1: error: explicit 
> instantiation cannot have a storage class
> DEFINE_DEBUG_VEC (rtx_def *)
> ^
> /scratch/tmp/gerald/GCC-HEAD/gcc/vec.h:456:24: note: expanded from macro 
> 'DEFINE_DEBUG_VEC'
>   template static void debug_helper (vec &); \
>^
> 
> The first failing bootstrap seems to have been yesterday at 16:40 UTC, 
> 24 hours before it still worked.

This fixes it for me; bootstrapped&tested on i586-unknown-freebsd10.4
(and compared with the last successful build on the 19th).

Okay?

Gerald


2017-11-23  Gerald Pfeifer  

* hash-set.h (DEFINE_DEBUG_HASH_SET): Remove static qualifier
from explicit instantiation of debug_helper.
* vec.h (DEFINE_DEBUG_VEC): Ditto.

Index: hash-set.h
===
--- hash-set.h  (revision 255092)
+++ hash-set.h  (working copy)
@@ -150,7 +150,7 @@
 }
 
 #define DEFINE_DEBUG_HASH_SET(T) \
-  template static void debug_helper (hash_set &);   \
+  template void debug_helper (hash_set &);  \
   DEBUG_FUNCTION void  \
   debug (hash_set &ref) \
   {\
Index: vec.h
===
--- vec.h   (revision 255092)
+++ vec.h   (working copy)
@@ -453,8 +453,8 @@
functions for a type T.  */
 
 #define DEFINE_DEBUG_VEC(T) \
-  template static void debug_helper (vec &);\
-  template static void debug_helper (vec &); \
+  template void debug_helper (vec &);   \
+  template void debug_helper (vec &);\
   /* Define the vec debug functions.  */\
   DEBUG_FUNCTION void  \
   debug (vec &ref)  \


Re: [PATCH, rs6000] (v2) fold-vec-ld* testcase updates for power9

2017-11-23 Thread Segher Boessenkool
Hi Will,

On Mon, Nov 20, 2017 at 04:05:55PM -0600, Will Schmidt wrote:
> Update the scan-assembler stanza to include those instructions
> generated for a power9 target.
> 
> 2017-11-20  Will Schmidt  
> 
> [testsuite]
> 
> * fold-vec-ld-char.c: Add lxv insn to expected output.
> * fold-vec-ld-double.c: Add lxv insn to expected output.
> * fold-vec-ld-float.c: Add lxv insn to expected output.
> * fold-vec-ld-int.c: Add lxv insn to expected output.
> * fold-vec-ld-longlong.c: Add lxv insn to expected output.
> * fold-vec-ld-short.c: Add lxv insn to expected output.

You have the pathnames wrong?  Should include gcc.target/powerpc/ .
Okay otherwise.  Thanks,


Segher


Re: [PATCH, rs6000] (v2) testcase updates for fold-vec-abs-* for power9 codegen

2017-11-23 Thread Segher Boessenkool
Hi!

On Mon, Nov 20, 2017 at 04:06:54PM -0600, Will Schmidt wrote:
> Add additional scan-assembler entries for those tests with
> simple variations in codegen. (for power9, versus power8, etc).
> 
> 
> 2017-11-20  Will Schmidt  
> 
> [testsuite]
> 
> * fold-vec-abs-char-fwrapv.c: Add xxspltib insn to expected output.
> * fold-vec-abs-char.c: Add xxspltib insn to expected output list.

Paths...  Otherwise, okay for trunk.  Thanks!


Segher


[ft32, committed] Remove semicolon after ASM_OUTPUT_ADDR_VEC_ELT

2017-11-23 Thread Tom de Vries

Hi,

This removes a semicolon after the ft32 version of 
ASM_OUTPUT_ADDR_VEC_ELT. This allows the macro to be used in 
if-then-elses without curly braces.
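
As a small illustration of why the trailing semicolon matters, the sketch
below uses a hypothetical stand-in macro (FORMAT_ADDR_VEC_ELT, formatting
into a buffer with snprintf rather than writing to the asm file) in an
unbraced if/else.  With a semicolon inside the macro, the caller's own
semicolon would become an empty statement and the "else" would no longer
parse.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for ASM_OUTPUT_ADDR_VEC_ELT.  Like the patched
   macro it expands to a single expression with no trailing semicolon;
   the caller supplies the semicolon.  */
#define FORMAT_ADDR_VEC_ELT(BUF, SIZE, VALUE) \
  snprintf (BUF, SIZE, "\t.long\t.L%d\n", VALUE)

/* Format one table entry into BUF, absolute (.long) or short (.word).  */
void
format_table_entry (char *buf, size_t size, int value, int absolute)
{
  assert (buf != NULL && size > 0);
  if (absolute)
    FORMAT_ADDR_VEC_ELT (buf, size, value);
  else
    snprintf (buf, size, "\t.word\t.L%d\n", value);
}
```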


Built for ft32.

Committed as obvious.

Thanks,
- Tom
[ft32] Remove semicolon after ASM_OUTPUT_ADDR_VEC_ELT

2017-11-23  Tom de Vries  

	* config/ft32/ft32.h (ASM_OUTPUT_ADDR_VEC_ELT): Remove semicolon after
	macro.

---
 gcc/config/ft32/ft32.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/ft32/ft32.h b/gcc/config/ft32/ft32.h
index b276f25..c32db40 100644
--- a/gcc/config/ft32/ft32.h
+++ b/gcc/config/ft32/ft32.h
@@ -205,7 +205,7 @@ enum reg_class
 /* This is how to output an element of a case-vector that is absolute.  */
 
 #define ASM_OUTPUT_ADDR_VEC_ELT(FILE, VALUE)  \
-fprintf (FILE, "\t.long\t.L%d\n", VALUE);\
+fprintf (FILE, "\t.long\t.L%d\n", VALUE)\
 
 /* Passing Arguments in Registers */
 


[libobjc, committed] Wrap CLASS_TABLE_HASH in do {} while (0)

2017-11-23 Thread Tom de Vries

Hi,

this patch wraps CLASS_TABLE_HASH in "do {} while (0)". This allows the 
macro to be used in if-then-elses without curly braces.
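
A self-contained sketch of the wrapped form in use follows.  The macro body
mirrors the one in the patch; the mask value of 1023, the unsigned hash type
and the hash_or_zero helper are assumptions made for this example only.
Without the "do {} while (0)" wrapper, only the first statement of the
expansion would be governed by the "if", and the "else" would not compile.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value, for illustration only.  */
#define CLASS_TABLE_MASK 1023u

/* Multi-statement macro wrapped so that it behaves as one statement.  */
#define CLASS_TABLE_HASH(INDEX, HASH, CLASS_NAME)                    \
  do {                                                               \
    HASH = 0;                                                        \
    for (INDEX = 0; CLASS_NAME[INDEX] != '\0'; INDEX++)              \
      {                                                              \
        HASH = (HASH << 4) ^ (HASH >> 28) ^ CLASS_NAME[INDEX];       \
      }                                                              \
                                                                     \
    HASH = (HASH ^ (HASH >> 10) ^ (HASH >> 20)) & CLASS_TABLE_MASK;  \
  } while (0)

/* Hypothetical helper: hash NAME, or return 0 for a null pointer.  The
   macro is used in an unbraced if/else, which only works because of the
   wrapper.  */
unsigned int
hash_or_zero (const char *name)
{
  unsigned int index, hash;

  /* The shift by 28 needs at least 32 value bits.  */
  assert (sizeof (unsigned int) >= 4);
  if (name != NULL)
    CLASS_TABLE_HASH (index, hash, name);
  else
    hash = 0;
  return hash;
}
```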


Built libobjc for x86_64.

Committed as obvious.

Thanks,
- Tom
[libobjc] Wrap CLASS_TABLE_HASH in do {} while (0)

2017-11-23  Tom de Vries  

	* class.c (CLASS_TABLE_HASH): Wrap in "do {} while (0)".

---
 libobjc/class.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/libobjc/class.c b/libobjc/class.c
index 53fb5fe..7a8f832 100644
--- a/libobjc/class.c
+++ b/libobjc/class.c
@@ -153,14 +153,16 @@ static objc_mutex_t __class_table_lock = NULL;
string, and HASH the computed hash of the string; CLASS_NAME is
untouched.  */
 
-#define CLASS_TABLE_HASH(INDEX, HASH, CLASS_NAME)  \
-  HASH = 0;  \
-  for (INDEX = 0; CLASS_NAME[INDEX] != '\0'; INDEX++)\
-{\
-  HASH = (HASH << 4) ^ (HASH >> 28) ^ CLASS_NAME[INDEX]; \
-}\
- \
-  HASH = (HASH ^ (HASH >> 10) ^ (HASH >> 20)) & CLASS_TABLE_MASK;
+#define CLASS_TABLE_HASH(INDEX, HASH, CLASS_NAME)			\
+  do {	\
+HASH = 0;\
+for (INDEX = 0; CLASS_NAME[INDEX] != '\0'; INDEX++)			\
+  {	\
+	HASH = (HASH << 4) ^ (HASH >> 28) ^ CLASS_NAME[INDEX];		\
+  }	\
+	\
+HASH = (HASH ^ (HASH >> 10) ^ (HASH >> 20)) & CLASS_TABLE_MASK;	\
+  } while (0)
 
 /* Setup the table.  */
 static void


[sh, committed] Wrap ASM_OUTPUT_ADDR_VEC_ELT in do {} while (0)

2017-11-23 Thread Tom de Vries

Hi,

this patch wraps the sh version of ASM_OUTPUT_ADDR_VEC_ELT in "do {} 
while (0)".  This allows the macro to be used in if-then-elses without 
curly braces.


Built for sh.

Committed as obvious.

Thanks,
- Tom
[sh] Wrap ASM_OUTPUT_ADDR_VEC_ELT in do {} while (0)

2017-11-23  Tom de Vries  

	* config/sh/sh.h (ASM_OUTPUT_ADDR_VEC_ELT): Wrap in "do {} while (0)".

---
 gcc/config/sh/sh.h | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
index f5d80da..82699cd 100644
--- a/gcc/config/sh/sh.h
+++ b/gcc/config/sh/sh.h
@@ -1754,12 +1754,13 @@ extern bool current_function_interrupt;
 }
 
 /* Output an absolute table element.  */
-#define ASM_OUTPUT_ADDR_VEC_ELT(STREAM,VALUE)  \
-  if (! optimize || TARGET_BIGTABLE)	\
-asm_fprintf ((STREAM), "\t.long\t%LL%d\n", (VALUE)); 		\
-  else	\
-asm_fprintf ((STREAM), "\t.word\t%LL%d\n", (VALUE));
-
+#define ASM_OUTPUT_ADDR_VEC_ELT(STREAM,VALUE) \
+  do {	\
+if (! optimize || TARGET_BIGTABLE)	\
+  asm_fprintf ((STREAM), "\t.long\t%LL%d\n", (VALUE)); 		\
+else\
+  asm_fprintf ((STREAM), "\t.word\t%LL%d\n", (VALUE));		\
+  } while (0)
 
 /* A C statement to be executed just prior to the output of
assembler code for INSN, to modify the extracted operands so


[libgccjit, committed] Wrap RETURN_NULL_IF_FAIL_NONNULL_NUMERIC_TYPE in JIT_{BEGIN,END}_STMT.

2017-11-23 Thread Tom de Vries

Hi,

this patch wraps RETURN_NULL_IF_FAIL_NONNULL_NUMERIC_TYPE in 
JIT_{BEGIN,END}_STMT. This allows the macro to be used in if-then-elses 
without curly braces.


Tested by doing x86_64 build with jitfe enabled.

Committed as obvious.

Thanks,
- Tom
[libgccjit] Wrap RETURN_NULL_IF_FAIL_NONNULL_NUMERIC_TYPE in JIT_{BEGIN,END}_STMT.

2017-11-23  Tom de Vries  

	* libgccjit.c (RETURN_NULL_IF_FAIL_NONNULL_NUMERIC_TYPE): Wrap in
	JIT_{BEGIN,END}_STMT.

---
 gcc/jit/libgccjit.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c
index c00acbf..8bad4f6 100644
--- a/gcc/jit/libgccjit.c
+++ b/gcc/jit/libgccjit.c
@@ -1115,11 +1115,13 @@ gcc_jit_rvalue_get_type (gcc_jit_rvalue *rvalue)
result of gcc_jit_context_get_type (GCC_JIT_TYPE_INT).  */
 
 #define RETURN_NULL_IF_FAIL_NONNULL_NUMERIC_TYPE(CTXT, NUMERIC_TYPE) \
+  JIT_BEGIN_STMT		 \
   RETURN_NULL_IF_FAIL (NUMERIC_TYPE, CTXT, NULL, "NULL type"); \
   RETURN_NULL_IF_FAIL_PRINTF1 (\
 NUMERIC_TYPE->is_numeric (), ctxt, NULL,   \
 "not a numeric type: %s",  \
-NUMERIC_TYPE->get_debug_string ());
+NUMERIC_TYPE->get_debug_string ()); \
+  JIT_END_STMT
 
 /* Public entrypoint.  See description in libgccjit.h.
 


[committed] 4 more -Wno-return-type for powerpc* testsuite

2017-11-23 Thread Jakub Jelinek
Hi!

In powerpc64{,le}-linux bootstraps I've discovered 4 spots missing
-Wno-return-type (the warning is in the included header and the tests are
for code generation, so I'd prefer not to change the code).

Committed to trunk as obvious.

2017-11-23  Jakub Jelinek  

* g++.dg/pr65240-1.C: Add -Wno-return-type to dg-options.
* g++.dg/pr65240-2.C: Likewise.
* g++.dg/pr65240-3.C: Likewise.
* g++.dg/pr65240-4.C: Likewise.

--- gcc/testsuite/g++.dg/pr65240-1.C.jj 2017-06-19 08:27:54.0 +0200
+++ gcc/testsuite/g++.dg/pr65240-1.C2017-11-23 21:09:29.215588703 +0100
@@ -2,7 +2,7 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
-/* { dg-options "-mcpu=power8 -O3 -ffast-math -mcmodel=small -mno-fp-in-toc" } 
*/
+/* { dg-options "-mcpu=power8 -O3 -ffast-math -mcmodel=small -mno-fp-in-toc 
-Wno-return-type" } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error. 
 */
 
--- gcc/testsuite/g++.dg/pr65240-2.C.jj 2017-06-19 08:27:56.0 +0200
+++ gcc/testsuite/g++.dg/pr65240-2.C2017-11-23 21:09:34.124527812 +0100
@@ -2,7 +2,7 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
-/* { dg-options "-mcpu=power8 -O3 -ffast-math -mcmodel=small -mfp-in-toc" } */
+/* { dg-options "-mcpu=power8 -O3 -ffast-math -mcmodel=small -mfp-in-toc 
-Wno-return-type" } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error. 
 */
 
--- gcc/testsuite/g++.dg/pr65240-3.C.jj 2017-06-19 08:27:58.0 +0200
+++ gcc/testsuite/g++.dg/pr65240-3.C2017-11-23 21:09:39.415462183 +0100
@@ -2,7 +2,7 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
-/* { dg-options "-mcpu=power8 -O3 -ffast-math -mcmodel=medium" } */
+/* { dg-options "-mcpu=power8 -O3 -ffast-math -mcmodel=medium 
-Wno-return-type" } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error. 
 */
 
--- gcc/testsuite/g++.dg/pr65240-4.C.jj 2017-06-19 08:28:00.0 +0200
+++ gcc/testsuite/g++.dg/pr65240-4.C2017-11-23 21:09:45.171390786 +0100
@@ -2,7 +2,7 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power7" } } */
-/* { dg-options "-mcpu=power7 -O3 -ffast-math" } */
+/* { dg-options "-mcpu=power7 -O3 -ffast-math -Wno-return-type" } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error. 
 */
 

Jakub


Re: Patch for [Bug fortran/81841] [5/6/7/8 Regression] THREADPRIVATE (OpenMP) wrongly rejected in BLOCK DATA

2017-11-23 Thread Jakub Jelinek
On Fri, Sep 01, 2017 at 03:47:10PM +0200, dbroemmel wrote:
> > If you really need a testcase, it would be enough to do something like:
> >   use omp_lib
> >   !$omp parallel num_threads(2)
> >   int2 = omp_get_thread_num ()
> >   !$omp barrier
> >   if (int2 != omp_get_thread_num ()) call abort
> >   !$omp end parallel
> > or so to ensure it has the threadprivate property by writing something
> > different to it in each thread and after barrier verifying it has the
> > expected value in each thread.
> I'm more than fine with the short compile-only testcase. I'm pretty sure
> my largish runtime test doesn't get near covering all relevant aspects
> of the THREADPRIVATE directive for common blocks. Also, the fix is for
> this reject-valid parsing error, so not really to do with anything else,
> so perhaps shouldn't test anything else?

I was expecting you'd repost the patch with updated testcase and then
forgot about the issue, got to it only now when looking through regressions.

I've bootstrapped/regtested this version and committed it so far to trunk:

2017-11-23  Dirk Broemmel  
Jakub Jelinek  

PR fortran/81841
* parse.c (parse_spec): Allow ST_OMP_THREADPRIVATE inside of
BLOCK DATA.

* libgomp.fortran/pr81841.f90: New test.

--- gcc/fortran/parse.c.jj  2017-11-06 08:46:32.0 +0100
+++ gcc/fortran/parse.c 2017-11-23 18:40:44.727973342 +0100
@@ -3699,6 +3699,7 @@ loop:
case ST_EQUIVALENCE:
case ST_IMPLICIT:
case ST_IMPLICIT_NONE:
+   case ST_OMP_THREADPRIVATE:
case ST_PARAMETER:
case ST_STRUCTURE_DECL:
case ST_TYPE:
--- libgomp/testsuite/libgomp.fortran/pr81841.f90.jj2017-11-23 
18:34:37.319385141 +0100
+++ libgomp/testsuite/libgomp.fortran/pr81841.f90   2017-11-23 
18:44:36.055198860 +0100
@@ -0,0 +1,26 @@
+! PR fortran/81841
+! { dg-do run }
+
+block data
+  integer :: a
+  real :: b(2)
+  common /c/ a, b
+  !$omp threadprivate (/c/)
+  data a / 32 /
+  data b /2*1./
+end
+
+program pr81841
+  use omp_lib
+  integer :: e
+  real :: f(2)
+  common /c/ e, f
+  !$omp threadprivate (/c/)
+  !$omp parallel num_threads(8)
+  if ((e /= 32) .or. any(f /= 1.)) call abort
+  e = omp_get_thread_num ()
+  f = e + 19.
+  !$omp barrier
+  if ((e /= omp_get_thread_num ()) .or. any(f /= e + 19.)) call abort
+  !$omp end parallel
+end


Jakub


[committed] Reject invalid #pragma omp declare simd in the C FE

2017-11-23 Thread Jakub Jelinek
Hi!

Unlike C++, C doesn't allow function declarations inside of if/while/for
body without {}s around, or after a label, while with #pragma omp declare simd
in between and -fopenmp it would happily accept it.  That is wrong, the
presence/absence of OpenMP pragmas shouldn't change parsing that way.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk.

2017-11-23  Jakub Jelinek  

* c-parser.c (c_parser_omp_declare_simd): Reject declare simd in
pragma_stmt context.

* gcc.dg/gomp/declare-simd-1.c (f9): Remove.
* gcc.dg/gomp/declare-simd-5.c: New test.

--- gcc/c/c-parser.c.jj 2017-11-20 19:55:39.0 +0100
+++ gcc/c/c-parser.c2017-11-23 17:43:51.623042138 +0100
@@ -17483,11 +17483,11 @@ c_parser_omp_declare_simd (c_parser *par
   break;
 case pragma_struct:
 case pragma_param:
+case pragma_stmt:
   c_parser_error (parser, "%<#pragma omp declare simd%> must be followed 
by "
  "function declaration or definition");
   break;
 case pragma_compound:
-case pragma_stmt:
   if (c_parser_next_token_is (parser, CPP_KEYWORD)
  && c_parser_peek_token (parser)->keyword == RID_EXTENSION)
{
--- gcc/testsuite/gcc.dg/gomp/declare-simd-1.c.jj   2016-04-06 
14:46:29.0 +0200
+++ gcc/testsuite/gcc.dg/gomp/declare-simd-1.c  2017-11-23 17:46:04.914434577 
+0100
@@ -58,18 +58,6 @@ f7 (int x)
 /* { dg-final { scan-assembler-times "_ZGVeM16v_f7:" 1 { target { i?86-*-* 
x86_64-*-* } } } } */
 /* { dg-final { scan-assembler-times "_ZGVeN16v_f7:" 1 { target { i?86-*-* 
x86_64-*-* } } } } */
 
-int
-f9 (int x)
-{
-  if (x)
-#pragma omp declare simd simdlen (8) aligned (b : 8 * sizeof (int))
-extern int f10 (int a, int *b, int c);
-  while (x < 10)
-#pragma omp declare simd simdlen (8) aligned (b : 8 * sizeof (int))
-extern int f11 (int a, int *b, int c);
-  return x;
-}
-
 #pragma omp declare simd uniform (a) aligned (b : 8 * sizeof (int)) linear (c 
: 4) simdlen (8)
 int f12 (int c; int *b; int a; int a, int *b, int c);
 
--- gcc/testsuite/gcc.dg/gomp/declare-simd-5.c.jj   2017-11-23 
17:47:01.886747780 +0100
+++ gcc/testsuite/gcc.dg/gomp/declare-simd-5.c  2017-11-23 17:51:10.749747753 
+0100
@@ -0,0 +1,35 @@
+/* Test parsing of #pragma omp declare simd */
+/* { dg-do compile } */
+
+int
+f1 (int x)
+{
+  if (x)
+#pragma omp declare simd simdlen (8) aligned (b : 8 * sizeof (int))
+extern int f3 (int a, int *b, int c);  /* { dg-error "must be followed 
by function declaration or definition" } */
+  while (x < 10)
+#pragma omp declare simd simdlen (8) aligned (b : 8 * sizeof (int))
+extern int f4 (int a, int *b, int c);  /* { dg-error "must be followed 
by function declaration or definition" } */
+  {
+lab:
+#pragma omp declare simd simdlen (8) aligned (b : 8 * sizeof (int))
+extern int f5 (int a, int *b, int c);  /* { dg-error "must be followed 
by function declaration or definition" } */
+x++;   /* { dg-error "expected 
expression before" "" { target *-*-* } .-1 } */
+  }
+  return x;
+}
+
+int
+f2 (int x)
+{
+  if (x)
+extern int f6 (int a, int *b, int c);  /* { dg-error "expected 
expression before" } */
+  while (x < 10)
+extern int f7 (int a, int *b, int c);  /* { dg-error "expected 
expression before" } */
+  {
+lab:
+extern int f8 (int a, int *b, int c);  /* { dg-error "a label can only 
be part of a statement and a declaration is not a statement" } */
+x++;
+  }
+  return x;
+}

Jakub


[committed] Fix declare-simd-1.C testcase failure

2017-11-23 Thread Jakub Jelinek
Hi!

The -Wreturn-type warning on declare-simd-1.C shows we were actually
misparsing it with -fopenmp, e.g.
  if (1)
#pragma omp declare simd
extern float foo (float);
  n++;
was actually parsed as
  if (1)
{
  #pragma omp declare simd
  extern float foo (float);
  n++;
}
Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk.
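
As a sketch of the intended parse: in C++ a declaration is itself a statement, so it can be the entire body of an unbraced if, and the increment that follows runs unconditionally. The pragma is left out here so the example does not depend on -fopenmp, and the name bump is hypothetical.

```cpp
#include <cassert>

// The block-scope declaration is the whole body of the 'if'; the increment
// is a separate statement that always executes.
int
bump (int n, bool cond)
{
  if (cond)
    extern float foo (float);  // the entire body of the 'if'
  n++;                         // not part of the 'if'
  return n;
}
```

With the misparse described above, the increment would have been executed only when the condition held.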

2017-11-23  Jakub Jelinek  

* parser.c (cp_parser_omp_declare): Change return type to bool from
void, return true for declare simd.
(cp_parser_pragma): Return cp_parser_omp_declare returned value
rather than always false.

--- gcc/cp/parser.c.jj  2017-11-23 15:31:44.0 +0100
+++ gcc/cp/parser.c 2017-11-23 17:29:54.925145608 +0100
@@ -37904,7 +37904,7 @@ cp_parser_omp_declare_reduction (cp_pars
   initializer-clause[opt] new-line
#pragma omp declare target new-line  */
 
-static void
+static bool
 cp_parser_omp_declare (cp_parser *parser, cp_token *pragma_tok,
   enum pragma_context context)
 {
@@ -37918,7 +37918,7 @@ cp_parser_omp_declare (cp_parser *parser
  cp_lexer_consume_token (parser->lexer);
  cp_parser_omp_declare_simd (parser, pragma_tok,
  context);
- return;
+ return true;
}
   cp_ensure_no_omp_declare_simd (parser);
   if (strcmp (p, "reduction") == 0)
@@ -37926,23 +37926,24 @@ cp_parser_omp_declare (cp_parser *parser
  cp_lexer_consume_token (parser->lexer);
  cp_parser_omp_declare_reduction (parser, pragma_tok,
   context);
- return;
+ return false;
}
   if (!flag_openmp)  /* flag_openmp_simd  */
{
  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
- return;
+ return false;
}
   if (strcmp (p, "target") == 0)
{
  cp_lexer_consume_token (parser->lexer);
  cp_parser_omp_declare_target (parser, pragma_tok);
- return;
+ return false;
}
 }
   cp_parser_error (parser, "expected %<simd%> or %<reduction%> "
   "or %<target%>");
   cp_parser_require_pragma_eol (parser, pragma_tok);
+  return false;
 }
 
 /* OpenMP 4.5:
@@ -38861,8 +38862,7 @@ cp_parser_pragma (cp_parser *parser, enu
   return false;
 
 case PRAGMA_OMP_DECLARE:
-  cp_parser_omp_declare (parser, pragma_tok, context);
-  return false;
+  return cp_parser_omp_declare (parser, pragma_tok, context);
 
 case PRAGMA_OACC_DECLARE:
   cp_parser_oacc_declare (parser, pragma_tok);

Jakub


[PATCH] Fix and improve tree-object-size.c pass_through_call

2017-11-23 Thread Jakub Jelinek
Hi!

As mentioned on IRC, we can actually now use ERF_RETURN*_ARG* to find out
what functions are pass through (except for __builtin_assume_aligned, which
is pass through but we never want to propagate the arg to the result early, as
the result contains additional information, so it isn't RET1).
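
A standalone model of the flag decoding, for readers unfamiliar with the return flags. It is only a sketch: the constant values are my assumption of how tree-core.h defines ERF_RETURN_ARG_MASK and ERF_RETURNS_ARG, and sketch_call and sketch_pass_through are hypothetical names that use plain ints where GCC uses trees.

```cpp
#include <cassert>
#include <vector>

// Assumed values, mirroring GCC's tree-core.h; treat them as illustrative.
constexpr unsigned SKETCH_ERF_RETURN_ARG_MASK = 3;
constexpr unsigned SKETCH_ERF_RETURNS_ARG = 1u << 2;

// Hypothetical stand-in for a call statement: its return flags plus its
// arguments (plain ints here instead of trees).
struct sketch_call
{
  unsigned return_flags;
  std::vector<int> args;
};

// Returns a pointer to the argument that the call passes through as its
// result, or nullptr if the call is not a pass-through call.
const int *
sketch_pass_through (const sketch_call &call)
{
  if (call.return_flags & SKETCH_ERF_RETURNS_ARG)
    {
      unsigned argnum = call.return_flags & SKETCH_ERF_RETURN_ARG_MASK;
      if (argnum < call.args.size ())
        return &call.args[argnum];
    }
  return nullptr;
}
```

The bounds check on the argument number matters because the flags come from an attribute and need not match the actual call.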

While cleaning this up, I ran into a bug in the function: BUILT_IN_STPNCPY_CHK
obviously isn't a pass-through function, so I've also added a testcase that
FAILs without this patch and succeeds with it.
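
To make the stpncpy point concrete, a small sketch using the plain (unchecked) function, which is POSIX rather than ISO C: the returned pointer is generally past the start of the destination, so it is not the first argument passed through. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string.h>

// Offset, from the start of a 64-byte buffer, of the pointer returned by
// stpncpy when copying at most N bytes of SRC.  N must not exceed 64.
std::size_t
stpncpy_result_offset (const char *src, std::size_t n)
{
  char buf[64];
  char *end = stpncpy (buf, src, n);
  return static_cast<std::size_t> (end - buf);
}
```

When the source is shorter than N the result points at the first terminating NUL written, otherwise at buf + N.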

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

For release branches I'd just remove the BUILT_IN_STPNCPY_CHK case and
add testcase.

2017-11-23  Jakub Jelinek  

* tree-object-size.c (pass_through_call): Use gimple_call_return_flags
ERF_RETURN*ARG* for builtins other than BUILT_IN_ASSUME_ALIGNED,
check for the latter with gimple_call_builtin_p.  Do not handle
BUILT_IN_STPNCPY_CHK which is not a pass through call.

* gcc.dg/builtin-object-size-18.c: New test.

--- gcc/tree-object-size.c.jj   2017-10-20 09:16:10.0 +0200
+++ gcc/tree-object-size.c  2017-11-23 16:23:24.544741945 +0100
@@ -464,34 +464,17 @@ alloc_object_size (const gcall *call, in
 static tree
 pass_through_call (const gcall *call)
 {
-  tree callee = gimple_call_fndecl (call);
+  unsigned rf = gimple_call_return_flags (call);
+  if (rf & ERF_RETURNS_ARG)
+{
+  unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+  if (argnum < gimple_call_num_args (call))
+   return gimple_call_arg (call, argnum);
+}
 
-  if (callee
-  && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL)
-switch (DECL_FUNCTION_CODE (callee))
-  {
-  case BUILT_IN_MEMCPY:
-  case BUILT_IN_MEMMOVE:
-  case BUILT_IN_MEMSET:
-  case BUILT_IN_STRCPY:
-  case BUILT_IN_STRNCPY:
-  case BUILT_IN_STRCAT:
-  case BUILT_IN_STRNCAT:
-  case BUILT_IN_MEMCPY_CHK:
-  case BUILT_IN_MEMMOVE_CHK:
-  case BUILT_IN_MEMSET_CHK:
-  case BUILT_IN_STRCPY_CHK:
-  case BUILT_IN_STRNCPY_CHK:
-  case BUILT_IN_STPNCPY_CHK:
-  case BUILT_IN_STRCAT_CHK:
-  case BUILT_IN_STRNCAT_CHK:
-  case BUILT_IN_ASSUME_ALIGNED:
-   if (gimple_call_num_args (call) >= 1)
- return gimple_call_arg (call, 0);
-   break;
-  default:
-   break;
-  }
+  /* __builtin_assume_aligned is intentionally not marked RET1.  */
+  if (gimple_call_builtin_p (call, BUILT_IN_ASSUME_ALIGNED))
+return gimple_call_arg (call, 0);
 
   return NULL_TREE;
 }
--- gcc/testsuite/gcc.dg/builtin-object-size-18.c.jj2017-11-23 
16:33:25.723389685 +0100
+++ gcc/testsuite/gcc.dg/builtin-object-size-18.c   2017-11-23 
16:34:02.930934647 +0100
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* __stpncpy_chk could return buf up to buf + 64, so
+   the minimum object size might be far smaller than 64.  */
+/* { dg-final { scan-tree-dump-not "return 64;" "optimized" } } */
+
+typedef __SIZE_TYPE__ size_t;
+
+size_t
+foo (const char *p, size_t s, size_t t)
+{
+  char buf[64];
+  char *q = __builtin___stpncpy_chk (buf, p, s, t);
+  return __builtin_object_size (q, 2);
+}

Jakub


[C++ PATCH] Fix structured binding initializer checking (PR c++/81888)

2017-11-23 Thread Jakub Jelinek
Hi!

My PR81258 fix actually rejects even valid cases.
The standard says that:
"The initializer shall be of the form “= assignment-expression”, of the form
“{ assignment-expression }”, or of the form “( assignment-expression )”".
Now, if the form is = assignment-expression, we can e.g. in templates end up
with a CONSTRUCTOR initializer which has more or fewer elements than 1.

So, this patch restricts the checks to only BRACE_ENCLOSED_INITIALIZER_P
and only if is_direct_init (i.e. not the = assignment-expression form).
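
For reference, a sketch of the three initializer forms plus the template case that was wrongly rejected. It needs C++17; the function names are hypothetical, and the struct mirrors the one in the new test.

```cpp
#include <cassert>

struct S
{
  bool s = true;
};

// The three initializer forms the standard allows for a structured binding.
bool
all_forms ()
{
  auto [a] = S{};    // "= assignment-expression"
  auto [b] { S{} };  // "{ assignment-expression }"
  auto [c] (S{});    // "( assignment-expression )"
  return a && b && c;
}

// The "= T{}" form inside a template, which is the shape of the PR's
// testcase.
template <typename T>
bool
first_member ()
{
  auto [c] = T{};
  return c;
}
```

Only the brace and parenthesis forms are direct-initialization, which is why the element-count check is limited to them.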

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
(and 7.x after a while)?

2017-11-23  Jakub Jelinek  

PR c++/81888
* parser.c (cp_parser_decomposition_declaration): Reject just
BRACE_ENCLOSED_INITIALIZER_P initializers with nelts != 1 rather
than all such CONSTRUCTORs, and only if is_direct_init is true.

* g++.dg/cpp1z/decomp30.C: Add a test for structured binding with
= {} and = { a, a } initializers.
* g++.dg/cpp1z/decomp31.C: New test.

--- gcc/cp/parser.c.jj  2017-11-21 20:23:01.0 +0100
+++ gcc/cp/parser.c 2017-11-23 15:31:44.473524252 +0100
@@ -13382,7 +13382,8 @@ cp_parser_decomposition_declaration (cp_
   if (initializer == NULL_TREE
  || (TREE_CODE (initializer) == TREE_LIST
  && TREE_CHAIN (initializer))
- || (TREE_CODE (initializer) == CONSTRUCTOR
+ || (is_direct_init
+ && BRACE_ENCLOSED_INITIALIZER_P (initializer)
  && CONSTRUCTOR_NELTS (initializer) != 1))
{
  error_at (loc, "invalid initializer for structured binding "
--- gcc/testsuite/g++.dg/cpp1z/decomp30.C.jj2017-09-15 18:11:04.0 
+0200
+++ gcc/testsuite/g++.dg/cpp1z/decomp30.C   2017-11-23 15:33:04.208552682 
+0100
@@ -10,3 +10,5 @@ auto [j, k] { a, a }; // { dg-error "inv
 auto [l, m] = { a };   // { dg-error "deducing from brace-enclosed initializer 
list requires" }
 auto [n, o] {};// { dg-error "invalid initializer for 
structured binding declaration" }
 auto [p, q] ();// { dg-error "invalid initializer for 
structured binding declaration" }
+auto [r, s] = {};  // { dg-error "deducing from brace-enclosed initializer 
list requires" }
+auto [t, u] = { a, a };// { dg-error "deducing from brace-enclosed 
initializer list requires" }
--- gcc/testsuite/g++.dg/cpp1z/decomp31.C.jj2017-11-23 15:22:31.695255014 
+0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp31.C   2017-11-23 15:22:21.0 
+0100
@@ -0,0 +1,18 @@
+// PR c++/81888
+// { dg-do compile { target c++17 } }
+
+struct S {
+  bool s = true;
+};
+
+auto [a] = S{};
+
+template <typename T>
+bool
+foo () noexcept
+{
+  auto [c] = T{};
+  return c;
+}
+
+const bool b = foo<S> ();

Jakub


[PATCH] Fix ubsan on mingw hosts (PR sanitizer/83014)

2017-11-23 Thread Jakub Jelinek
Hi!

From my reading of the PR and looking around, I believe the reason
ubsan ICEs on mingw hosts is that it is the only host that redefines
HOST_LONG_LONG_FORMAT to "I64" instead of the usual "ll", but "%I64"
is nothing pp_format can handle, and indeed this spot in ubsan.c is
the only place that calls pp_printf with HOST_WIDE_INT_PRINT_DEC
or a similar format specifier.  Instead, other spots use
pp_unsigned_wide_integer, or pp_scalar on top of which it is defined,
which does:
  sprintf (pp_buffer (PP)->digit_buffer, FORMAT, SCALAR); \
  pp_string (PP, pp_buffer (PP)->digit_buffer);   \
and thus uses the host *printf which handles "I64" and appends that
as a string.
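
A standalone sketch of that two-step approach, not the pretty-printer code itself: format the number with the host's own printf family into a scratch buffer, then append the result as an ordinary string. The names sketch_printer and sketch_print_unsigned are hypothetical, and PRIu64 plays the role of the host-specific format macro.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a pretty-printer: output text plus a scratch
// buffer for formatted numbers.
struct sketch_printer
{
  std::string out;
  char digit_buffer[64];
};

// Formats VALUE with the host printf family, using whatever length
// modifier the host needs, then appends the digits as a plain string.
void
sketch_print_unsigned (sketch_printer &pp, std::uint64_t value)
{
  std::snprintf (pp.digit_buffer, sizeof pp.digit_buffer,
                 "%" PRIu64, value);
  pp.out += pp.digit_buffer;
}
```

The printer's own format parser never sees the host-specific length modifier, so it does not matter whether it understands it.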

Bootstrapped/regtested on x86_64-linux and i686-linux, no way to test
on mingw.  Ok for trunk? 

2017-11-23  Jakub Jelinek  

PR sanitizer/83014
* ubsan.c (ubsan_type_descriptor): Use pp_unsigned_wide_integer
instead of pp_printf with HOST_WIDE_INT_PRINT_DEC.  Avoid calling
tree_to_uhwi twice.

* gcc.dg/ubsan/pr83014.c: New test.

--- gcc/ubsan.c.jj  2017-11-13 09:31:33.0 +0100
+++ gcc/ubsan.c 2017-11-23 11:35:20.452162632 +0100
@@ -436,10 +436,10 @@ ubsan_type_descriptor (tree type, enum u
  && TYPE_MAX_VALUE (dom) != NULL_TREE
  && TREE_CODE (TYPE_MAX_VALUE (dom)) == INTEGER_CST)
{
+ unsigned HOST_WIDE_INT m;
  if (tree_fits_uhwi_p (TYPE_MAX_VALUE (dom))
- && tree_to_uhwi (TYPE_MAX_VALUE (dom)) + 1 != 0)
-   pp_printf (&pretty_name, HOST_WIDE_INT_PRINT_DEC,
-   tree_to_uhwi (TYPE_MAX_VALUE (dom)) + 1);
+ && (m = tree_to_uhwi (TYPE_MAX_VALUE (dom))) + 1 != 0)
+   pp_unsigned_wide_integer (&pretty_name, m + 1);
  else
pp_wide_int (&pretty_name,
 wi::add (wi::to_widest (TYPE_MAX_VALUE (dom)), 1),
--- gcc/testsuite/gcc.dg/ubsan/pr83014.c.jj 2017-11-23 11:52:59.613932074 
+0100
+++ gcc/testsuite/gcc.dg/ubsan/pr83014.c2017-11-23 11:53:30.867542456 
+0100
@@ -0,0 +1,12 @@
+/* PR sanitizer/83014 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined" } */
+
+int
+foo (void)
+{
+  int data[5];
+  data[0] = 0;
+  data[5] = 0;
+  return data[0];
+}

Jakub


Re: [PATCH][i386,AVX] Enable VBMI2 support [2/7]

2017-11-23 Thread Kirill Yukhin
Hello, Julia!
On 24 Oct 08:25, Koval, Julia wrote:
> Hi,
> This patch enables VPCOMPRESSB[W] instruction. The doc for isaset and 
> instruction: 
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> 
> Ok for trunk?
Your patch is OK for trunk. I've checked it in.

--
Thanks, K


[PATCH] Fix powerpc* ICE with vec builtins with -mno-altivec (PR target/82848)

2017-11-23 Thread Jakub Jelinek
Hi!

With -mno-altivec -mno-vsx -mno-fold-gimple we error on __builtin_vec*
builtins used when the corresponding ISA is not enabled.
When folding gimple, in some cases we get away with it (e.g. when folding
the builtin to PLUS_EXPR on the generic vectors), because
tree-vect-generic.c lowers those into scalar piecewise operations,
but e.g. in case of FMA_EXPR tree-vect-generic.c isn't doing anything and
expects that FMA_EXPR is actually used only if supported; I think the same
holds for various vector widening operations etc.

One option is not to fold into gimple only those builtins where we know it
will not work on generic vectors; my preference is to not fold any builtin
which would be rejected at expansion time, so users get at least consistent
diagnostics.  Usually people are using the altivec.h etc. headers that error
if the ISA is not enabled, so this will only affect people that use the
builtins directly.
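
The check itself is a plain subset test on feature bits. A minimal sketch, with made-up mask values and hypothetical names (the real rs6000 masks and tables differ):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits; the real rs6000 masks are different.
constexpr std::uint64_t SKETCH_MASK_ALTIVEC = 1u << 0;
constexpr std::uint64_t SKETCH_MASK_VSX = 1u << 1;

// A builtin is usable only if every feature bit it requires is enabled.
bool
sketch_builtin_valid_p (std::uint64_t enabled, std::uint64_t required)
{
  return (enabled & required) == required;
}

// Folding is skipped for unusable builtins so that the later expansion
// step stays the single place that reports the error.
bool
sketch_try_fold (std::uint64_t enabled, std::uint64_t required)
{
  if (!sketch_builtin_valid_p (enabled, required))
    return false;
  return true;  // a real folder would rewrite the call here
}
```

Comparing against the full required mask, rather than testing for any overlap, is what rejects a builtin that needs two features when only one is enabled.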

Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?

2017-11-23  Jakub Jelinek  

PR target/82848
* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Don't fold
builtins not enabled in the currently selected ISA.

--- gcc/config/rs6000/rs6000.c.jj   2017-11-23 10:29:01.0 +0100
+++ gcc/config/rs6000/rs6000.c  2017-11-23 10:30:54.930019590 +0100
@@ -16143,6 +16143,12 @@ rs6000_gimple_fold_builtin (gimple_stmt_
   if (!gimple_call_lhs (stmt) && !rs6000_builtin_valid_without_lhs (fn_code))
 return false;

+  /* Don't fold invalid builtins, let rs6000_expand_builtin diagnose it.  */
+  HOST_WIDE_INT mask = rs6000_builtin_info[uns_fncode].mask;
+  bool func_valid_p = (rs6000_builtin_mask & mask) == mask;
+  if (!func_valid_p)
+return false;
+
   switch (fn_code)
 {
 /* Flavors of vec_add.  We deliberately don't expand
--- gcc/testsuite/gcc.target/powerpc/pr82848.c.jj   2017-11-23 
10:34:14.670471607 +0100
+++ gcc/testsuite/gcc.target/powerpc/pr82848.c  2017-11-23 10:36:22.754841465 
+0100
@@ -0,0 +1,13 @@
+/* PR target/82848 */
+/* { dg-do compile } */
+/* { dg-options "-mno-altivec -mno-vsx -Wno-psabi" } */
+
+#define C 3.68249351546114573519399405666776E-44f
+#define vector __attribute__ ((altivec (vector__)))
+
+vector float
+foo (vector float a)
+{
+  vector float b = __builtin_vec_madd (b, a, (vector float) { C, C, C, C });   
/* { dg-error "requires the '-maltivec' option" } */
+  return b;
+}

Jakub


Re: [PATCH][i386,AVX] Enable VBMI2 support [3/7]

2017-11-23 Thread Kirill Yukhin
Hello, Julia!
On 24 Oct 08:35, Koval, Julia wrote:
> Hi,
> This patch enables VPEXPANDB[W] instruction. The doc for isaset and 
> instruction: 
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> 
> Ok for trunk?
Your patch is OK for trunk. I've checked it in.

--
Thanks, K


Re: std::forward_list optim for always equal allocator

2017-11-23 Thread François Dumont

Gentle reminder for this patch.

I looked at when the constructor became unused, and I think it was back in
June 2015, in git commit:


commit debb6aabb771ed02cb7256a7719555e5fbd7d3f7
Author: redi 
Date:   Wed Jun 17 17:45:45 2015 +

    * include/bits/forward_list.h
    (_Fwd_list_base(const _Node_alloc_type&)): Change parameter to
    rvalue-reference.

If you fear an ABI-breaking change I can restore it in a 
!_GLIBCXX_INLINE_VERSION section.


François


On 28/08/2017 21:09, François Dumont wrote:

Hi

    Any news for this patch ?

    It does remove a constructor:
-    _Fwd_list_impl(const _Node_alloc_type& __a)
-    : _Node_alloc_type(__a), _M_head()

    It was already unused before the patch. Do you think it has ever 
been used, and so do I need to restore it?


    I eventually restored the _M_head() in the _Fwd_list_impl constructors 
because IMO it is the inline init of _M_next in _Fwd_list_node_base 
which should be removed. But I remember, Jonathan, that you didn't want 
to do so because gcc was not good enough at detecting usage of 
uninitialized variables; is that still true?


François


On 17/07/2017 22:10, François Dumont wrote:

Hi

    Here is the patch to implement the always equal alloc 
optimization for forward_list. With this version there is no abi issue.
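
For readers who have not met the idiom, a sketch of the tag dispatch such an optimization relies on. This is not the libstdc++ code; sketch_stateful_alloc, sketch_can_steal and sketch_move_steals_nodes are hypothetical names, and only std::allocator_traits<...>::is_always_equal is the real interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical allocator whose instances may compare unequal, for contrast
// with std::allocator.
template <typename T>
struct sketch_stateful_alloc
{
  using value_type = T;
  using is_always_equal = std::false_type;

  int id = 0;

  T *allocate (std::size_t n)
  { return static_cast<T *> (::operator new (n * sizeof (T))); }

  void deallocate (T *p, std::size_t)
  { ::operator delete (p); }
};

// Always-equal allocators: the nodes can be taken over unconditionally.
inline bool
sketch_can_steal (std::true_type)
{ return true; }

// Otherwise a run-time comparison decides; the bool parameter stands in
// for that comparison and defaults to "not equal".
inline bool
sketch_can_steal (std::false_type, bool allocators_equal = false)
{ return allocators_equal; }

// Picks the overload at compile time from the allocator's trait.
template <typename Alloc>
bool
sketch_move_steals_nodes ()
{
  using always_equal =
    typename std::allocator_traits<Alloc>::is_always_equal;
  return sketch_can_steal (always_equal{});
}
```

The always-equal overload needs no comparison at all, which is where the saving comes from.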


    I also preferred to implement the _Fwd_list_node_base move assignment 
operator for consistency with the move constructor, and used it where applicable.



    * include/bits/forward_list.h
    (_Fwd_list_node_base& operator=(_Fwd_list_node_base&&)): Implement.
    (_Fwd_list_impl(_Fwd_list_impl&&, _Node_alloc_type&&)): New.
    (_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&, std::true_type)):
    New, use latter.
    (forward_list(forward_list&&, _Node_alloc_type&&, std::false_type)):
    New.
    (forward_list(forward_list&&, _Node_alloc_type&&, std::true_type)):
    New.
    (forward_list(forward_list&&, const _Alloc&)): Adapt to use latters.
    * include/bits/forward_list.tcc
    (_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&)): Adapt to use
    _M_impl._M_head move assignment.
    (forward_list<>::merge(forward_list<>&&, _Comp)): Likewise.

Tested under Linux x86_64, ok to commit ?

François





diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h
index 9d86fcc..772e9a0 100644
--- a/libstdc++-v3/include/bits/forward_list.h
+++ b/libstdc++-v3/include/bits/forward_list.h
@@ -54,6 +54,20 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   struct _Fwd_list_node_base
   {
 _Fwd_list_node_base() = default;
+_Fwd_list_node_base(_Fwd_list_node_base&& __x) noexcept
+  : _M_next(__x._M_next)
+{ __x._M_next = nullptr; }
+
+_Fwd_list_node_base(const _Fwd_list_node_base&) = delete;
+_Fwd_list_node_base& operator=(const _Fwd_list_node_base&) = delete;
+
+_Fwd_list_node_base&
+operator=(_Fwd_list_node_base&& __x) noexcept
+{
+  _M_next = __x._M_next;
+  __x._M_next = nullptr;
+  return *this;
+}
 
 _Fwd_list_node_base* _M_next = nullptr;
 
@@ -68,7 +82,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	  __end->_M_next = _M_next;
 	}
   else
-	__begin->_M_next = 0;
+	__begin->_M_next = nullptr;
   _M_next = __keep;
   return __end;
 }
@@ -173,7 +187,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	if (_M_node)
 	  return _Fwd_list_iterator(_M_node->_M_next);
 	else
-  return _Fwd_list_iterator(0);
+	  return _Fwd_list_iterator(nullptr);
   }
 
   _Fwd_list_node_base* _M_node;
@@ -244,7 +258,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	if (this->_M_node)
 	  return _Fwd_list_const_iterator(_M_node->_M_next);
 	else
-  return _Fwd_list_const_iterator(0);
+	  return _Fwd_list_const_iterator(nullptr);
   }
 
   const _Fwd_list_node_base* _M_node;
@@ -285,11 +299,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	_Fwd_list_node_base _M_head;
 
 	_Fwd_list_impl()
+	  noexcept( noexcept(_Node_alloc_type()) )
 	: _Node_alloc_type(), _M_head()
 	{ }
 
-_Fwd_list_impl(const _Node_alloc_type& __a)
-: _Node_alloc_type(__a), _M_head()
+	_Fwd_list_impl(_Fwd_list_impl&&) = default;
+
+	_Fwd_list_impl(_Fwd_list_impl&& __fl, _Node_alloc_type&& __a)
+	: _Node_alloc_type(std::move(__a)), _M_head(std::move(__fl._M_head))
 	{ }
 
 	_Fwd_list_impl(_Node_alloc_type&& __a)
@@ -312,26 +329,26 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _M_get_Node_allocator() const noexcept
   { return this->_M_impl; }
 
-  _Fwd_list_base()
-  : _M_impl() { }
+  _Fwd_list_base() = default;
 
   _Fwd_list_base(_Node_alloc_type&& __a)
   : _M_impl(std::move(__a)) { }
 
+  // When allocators are always equal.
+  _Fwd_list_base(_Fwd_list_base&& __lst, _Node_alloc_type&& __a,
+		 std::true_type)
+  : _M_impl(std::move(__lst._M_impl), std::move(__a))
+  { }
+
+  // When allocators are not always equal.
   _Fwd_list_base(_Fwd_list_base&& __lst, _Node_alloc_type&& __a);
 
-  _Fwd_list_base(_Fwd_li

[PATCH] PR libstdc++/83134 Ensure std::__not_ converts B::value to bool

2017-11-23 Thread Jonathan Wakely

Agustín reported this bug in std::negation, so we need to fix our
underlying __not_<> template to meet the C++17 requirements for
std::negation.
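
A sketch of the difference, using a hypothetical trait name (sketch_not) rather than the library's own: converting the value to bool before negating works even when the value's type has a deleted operator!, which is the case in the bug report.

```cpp
#include <cassert>
#include <type_traits>

// A value type that converts to bool but cannot be negated with '!'.
struct X
{
  constexpr operator bool () const { return false; }
  constexpr bool operator! () const = delete;
};

struct Y
{
  static constexpr X value{};
};

// Sketch of the fixed trait: convert the value to bool first, then negate.
// Writing !P::value instead would try to call the deleted operator!.
template <typename P>
struct sketch_not
: std::integral_constant<bool, !bool (P::value)>
{ };

static_assert (sketch_not<Y>::value, "bool (Y::value) is false");
static_assert (!sketch_not<std::true_type>::value, "negates true_type");
```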

PR libstdc++/83134
* include/std/type_traits (__not_): Explicitly convert to bool.
* testsuite/20_util/declval/requirements/1_neg.cc: Adjust dg-error.
* testsuite/20_util/logical_traits/83134.cc: New test.
* testsuite/20_util/make_signed/requirements/typedefs_neg.cc: Adjust
dg-error.
* testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc:
Likewise.

Tested powerpc64le-linux, committed to trunk.

commit 95e39f54617fc927af81e52845849beb4469b807
Author: Jonathan Wakely 
Date:   Thu Nov 23 20:26:27 2017 +

PR libstdc++/83134 Ensure std::__not_ converts B::value to bool

PR libstdc++/83134
* include/std/type_traits (__not_): Explicitly convert to bool.
* testsuite/20_util/declval/requirements/1_neg.cc: Adjust dg-error.
* testsuite/20_util/logical_traits/83134.cc: New test.
* testsuite/20_util/make_signed/requirements/typedefs_neg.cc: Adjust
dg-error.
* testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc:
Likewise.

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 7eca08c1a50..723c137f5b9 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -151,10 +151,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template<typename _Pp>
 struct __not_
-: public integral_constant<bool, !_Pp::value>
+: public __bool_constant<!bool(_Pp::value)>
 { };
 
-#if __cplusplus > 201402L
+#if __cplusplus >= 201703L
 
 #define __cpp_lib_logical_traits 201510
 
@@ -174,18 +174,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
   template<typename... _Bn>
-inline constexpr bool conjunction_v
-= conjunction<_Bn...>::value;
+inline constexpr bool conjunction_v = conjunction<_Bn...>::value;
 
   template<typename... _Bn>
-inline constexpr bool disjunction_v
-= disjunction<_Bn...>::value;
+inline constexpr bool disjunction_v = disjunction<_Bn...>::value;
 
   template<typename _Pp>
-inline constexpr bool negation_v
-= negation<_Pp>::value;
+inline constexpr bool negation_v = negation<_Pp>::value;
 
-#endif
+#endif // C++17
 
   // For several sfinae-friendly trait implementations we transport both the
   // result information (as the member type) and the failure information (no
diff --git a/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc 
b/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc
index 17b41a007db..bc12719071e 100644
--- a/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc
@@ -18,7 +18,7 @@
 // with this library; see the file COPYING3.  If not see
 // <http://www.gnu.org/licenses/>.
 
-// { dg-error "static assertion failed" "" { target *-*-* } 2093 }
+// { dg-error "declval.. must not be used" "" { target *-*-* } 0 }
 
 #include <utility>
 
diff --git a/libstdc++-v3/testsuite/20_util/logical_traits/83134.cc 
b/libstdc++-v3/testsuite/20_util/logical_traits/83134.cc
new file mode 100644
index 000..3595e64e977
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/logical_traits/83134.cc
@@ -0,0 +1,32 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++17" }
+// { dg-do compile { target c++17 } }
+
+#include <type_traits>
+
+struct X {
+  constexpr operator bool() const { return false; }
+  constexpr bool operator!() const = delete;
+};
+
+struct Y {
+  static constexpr X value{};
+};
+
+static_assert(std::negation<Y>::value);  // PR libstdc++/83134
diff --git 
a/libstdc++-v3/testsuite/20_util/make_signed/requirements/typedefs_neg.cc 
b/libstdc++-v3/testsuite/20_util/make_signed/requirements/typedefs_neg.cc
index 308155383f0..05bef35c03b 100644
--- a/libstdc++-v3/testsuite/20_util/make_signed/requirements/typedefs_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/make_signed/requirements/typedefs_neg.cc
@@ -47,4 +47,4 @@ void test01()
 // { dg-error "required from here" "" { target *-*-* } 39 }
 // { dg-error "required from here" "" { target *-*-* } 41 }
 
-// { dg-error "invalid use of incomplete type" "" { target *-*-* } 1760 }
+// { dg-error "invalid us

Re: [libobjc, committed] Wrap CLASS_TABLE_HASH in do {} while (0)

2017-11-23 Thread Andrew Pinski
On Thu, Nov 23, 2017 at 11:54 AM, Tom de Vries  wrote:
> Hi,
>
> this patch wraps CLASS_TABLE_HASH in "do {} while (0)". This allows the
> macro to be used in if-then-elses without curly braces.
>
> Built libobjc for x86_64.
>
> Committed as obvious.

Yes this is ok.

Thanks,
Andrew


>
> Thanks,
> - Tom


Re: [PATCH] Fix PR83089

2017-11-23 Thread Christophe Lyon
On 23 November 2017 at 09:22, Richard Biener  wrote:
> On Wed, 22 Nov 2017, Christophe Lyon wrote:
>
>> On 22 November 2017 at 09:44, Richard Biener  wrote:
>> >
>> > The following fixes if-conversion to free SCEV/niter estimates because
>> > it DCEs stmts.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>> >
>> > Richard.
>> >
>> > 2017-11-22  Richard Biener  
>> >
>> > PR tree-optimization/83089
>> > * tree-if-conv.c (pass_if_conversion::execute): If anything
>> > changed reset SCEV and free the number of iteration estimates.
>> >
>> > * gcc.dg/pr83089.c: New testcase.
>> >
>> > Index: gcc/tree-if-conv.c
>> > ===
>> > --- gcc/tree-if-conv.c  (revision 254990)
>> > +++ gcc/tree-if-conv.c  (working copy)
>> > @@ -2959,6 +2959,12 @@ pass_if_conversion::execute (function *f
>> > && !loop->dont_vectorize))
>> >todo |= tree_if_conversion (loop);
>> >
>> > +  if (todo)
>> > +{
>> > +  free_numbers_of_iterations_estimates (fun);
>> > +  scev_reset ();
>> > +}
>> > +
>> >if (flag_checking)
>> >  {
>> >basic_block bb;
>> > Index: gcc/testsuite/gcc.dg/pr83089.c
>> > ===
>> > --- gcc/testsuite/gcc.dg/pr83089.c  (nonexistent)
>> > +++ gcc/testsuite/gcc.dg/pr83089.c  (working copy)
>> > @@ -0,0 +1,27 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -ftree-loop-if-convert -ftree-parallelize-loops=2" } */
>> > +
>> > +int rl, s8;
>> > +
>> > +void
>> > +it (int zy, short int el)
>> > +{
>> > +  int hb;
>> > +
>> > +  while (el != 0)
>> > +{
>> > +  hb = el;
>> > +  for (rl = 0; rl < 200; ++rl)
>> > +   {
>> > + for (s8 = 0; s8 < 2; ++s8)
>> > +   {
>> > +   }
>> > + if (s8 < 3)
>> > +   zy = hb;
>> > + if (hb == 0)
>> > +   ++s8;
>> > + zy += (s8 != -1);
>> > +   }
>> > +  el = zy;
>> > +}
>> > +}
>>
>> I see the new testcase failing to compile on bare-metal targets
>> (aarch64-elf, arm-eabi):
>> spawn -ignore SIGHUP
>> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-elf/gcc3/gcc/xgcc
>> -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-elf/gcc3/gcc/
>> /gcc/testsuite/gcc.dg/pr83089.c -fno-diagnostics-show-caret
>> -fdiagnostics-color=never -O2 -ftree-loop-if-convert
>> -ftree-parallelize-loops=2 -S -specs=aem-ve.specs -o pr83089.s
>> xgcc: error: unrecognized command line option '-pthread'
>> compiler exited with status 1
>> FAIL: gcc.dg/pr83089.c (test for excess errors)
>>
>> Where does that -pthread come from?
>
> From some specs processing I guess.  Didn't think that it could
> happen for a compile only testcase.
>

In fact, it's triggered by ftree-parallelize-loops, because in
gcc/gcc.c we have:
#define GOMP_SELF_SPECS \
  "%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1): " \
  "-pthread}"

> Fixed with
>
> 2017-11-23  Richard Biener  
>
> PR tree-optimization/83089
> * gcc.dg/pr83089.c: Require pthread.
>

Thanks,

Christophe
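
The condition in the GOMP_SELF_SPECS fragment quoted above can be read as a small predicate. The following is a toy model only: the function name and parameters are made up for illustration and are not part of the GCC driver, and it assumes that the %:gt spec function is false when -ftree-parallelize-loops is not given at all.

```cpp
#include <cassert>
#include <optional>

// Toy model of the spec condition; hypothetical names, not GCC driver code.
// parallelize_loops is empty when -ftree-parallelize-loops=N was not passed.
bool implies_pthread(bool fopenacc, bool fopenmp,
                     std::optional<int> parallelize_loops)
{
  // Models %:gt(N 1): true only when the option is present and N > 1.
  bool parallel = parallelize_loops.has_value() && *parallelize_loops > 1;
  return fopenacc || fopenmp || parallel;
}
```

Under this reading, -ftree-parallelize-loops=2 in dg-options is enough to add -pthread even for a compile-only test, which matches the failure seen on the bare-metal targets.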

> Index: gcc/testsuite/gcc.dg/pr83089.c
> ===
> --- gcc/testsuite/gcc.dg/pr83089.c  (revision 255060)
> +++ gcc/testsuite/gcc.dg/pr83089.c  (working copy)
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target pthread } */
>  /* { dg-options "-O2 -ftree-loop-if-convert -ftree-parallelize-loops=2" } */
>
>  int rl, s8;


[PATCH] Add [[nodiscard]] attribute to C++17 components

2017-11-23 Thread Jonathan Wakely

C++17 added the [[nodiscard]] attribute, similar to GCC's
warn_unused_result. The C++2a draft requires it to be used in several
places. This patch adds it where required on components which are new
to C++17, which are still experimental and so I'm changing them in
stage 3.

* include/bits/fs_path.h (path::empty): Add nodiscard attribute.
* include/bits/range_access.h (empty): Likewise.
* include/std/string_view (basic_string_view::empty): Likewise.
* testsuite/21_strings/basic_string_view/capacity/empty_neg.cc: New
test.
* testsuite/24_iterators/range_access_cpp17_neg.cc: New test.
* testsuite/27_io/filesystem/path/query/empty_neg.cc: New test.

Tested powerpc64le-linux, committed to trunk.
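
As a quick illustration of what the attribute buys, here is a minimal sketch using a hypothetical Buffer type (not one of the library components touched by the patch).

```cpp
#include <cassert>
#include <string_view>

// Hypothetical container-like type, for illustration only.
struct Buffer
{
  std::string_view data;

  // Discarding the result of empty() is almost always a mistake (the
  // caller probably meant to clear the object), so ask for a diagnostic.
  [[nodiscard]] bool empty() const noexcept { return data.empty(); }
};

bool has_payload(const Buffer &b)
{
  // b.empty();        // would typically draw a warning: result ignored
  return !b.empty();   // fine: the result is used
}
```

The attribute only affects diagnostics; the function behaves the same whether or not the caller uses the result.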

commit b5d8dfbb291b02c071173612c63fd806e844e191
Author: Jonathan Wakely 
Date:   Thu Nov 23 21:39:30 2017 +

Add [[nodiscard]] attribute to C++17 components

* include/bits/fs_path.h (path::empty): Add nodiscard attribute.
* include/bits/range_access.h (empty): Likewise.
* include/std/string_view (basic_string_view::empty): Likewise.
* testsuite/21_strings/basic_string_view/capacity/empty_neg.cc: New
test.
* testsuite/24_iterators/range_access_cpp17_neg.cc: New test.
* testsuite/27_io/filesystem/path/query/empty_neg.cc: New test.

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 7d97cdfbb81..99740c9b383 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -370,7 +370,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
 // query
 
-bool empty() const noexcept { return _M_pathname.empty(); }
+[[nodiscard]] bool empty() const noexcept { return _M_pathname.empty(); }
 bool has_root_name() const;
 bool has_root_directory() const;
 bool has_root_path() const;
diff --git a/libstdc++-v3/include/bits/node_handle.h b/libstdc++-v3/include/bits/node_handle.h
index 4a830630c89..0d8dbeb4110 100644
--- a/libstdc++-v3/include/bits/node_handle.h
+++ b/libstdc++-v3/include/bits/node_handle.h
@@ -62,7 +62,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   explicit operator bool() const noexcept { return _M_ptr != nullptr; }
 
-  bool empty() const noexcept { return _M_ptr == nullptr; }
+  [[nodiscard]] bool empty() const noexcept { return _M_ptr == nullptr; }
 
 protected:
   constexpr _Node_handle_common() noexcept : _M_ptr(), _M_alloc() {}
diff --git a/libstdc++-v3/include/bits/range_access.h b/libstdc++-v3/include/bits/range_access.h
index 2a037ad8082..a5044f11976 100644
--- a/libstdc++-v3/include/bits/range_access.h
+++ b/libstdc++-v3/include/bits/range_access.h
@@ -257,7 +257,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __cont  Container.
*/
   template <typename _Container>
-constexpr auto
+[[nodiscard]] constexpr auto
 empty(const _Container& __cont) noexcept(noexcept(__cont.empty()))
 -> decltype(__cont.empty())
 { return __cont.empty(); }
@@ -267,7 +267,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __array  Container.
*/
   template <typename _Tp, size_t _Nm>
-constexpr bool
+[[nodiscard]] constexpr bool
 empty(const _Tp (&/*__array*/)[_Nm]) noexcept
 { return false; }
 
@@ -276,7 +276,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __il  Initializer list.
*/
   template <typename _Tp>
-constexpr bool
+[[nodiscard]] constexpr bool
 empty(initializer_list<_Tp> __il) noexcept
 { return __il.size() == 0;}
 
diff --git a/libstdc++-v3/include/std/string_view b/libstdc++-v3/include/std/string_view
index 1900b867841..fa834002726 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -160,7 +160,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
/ sizeof(value_type) / 4;
   }
 
-  constexpr bool
+  [[nodiscard]] constexpr bool
   empty() const noexcept
   { return this->_M_len == 0; }
 
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string_view/capacity/empty_neg.cc b/libstdc++-v3/testsuite/21_strings/basic_string_view/capacity/empty_neg.cc
new file mode 100644
index 000..59c87d007d8
--- /dev/null
+++ b/libstdc++-v3/testsuite/21_strings/basic_string_view/capacity/empty_neg.cc
@@ -0,0 +1,28 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this lib

Re: [PATCH][i386,AVX] Enable VBMI2 support [2/7]

2017-11-23 Thread Marc Glisse

On Thu, 23 Nov 2017, Kirill Yukhin wrote:


Hello, Julia!
On 24 Oct 08:25, Koval, Julia wrote:

Hi,
This patch enables VPCOMPRESSB[W] instruction. The doc for isaset and 
instruction: 
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

Ok for trunk?

Your patch is OK for trunk. I've checked it in.


This seems to break the build for me:

In file included from 
/home/glisse/repos/gcc/trunk/gcc/config/i386/i386.c:21:0:
/home/glisse/repos/gcc/trunk/gcc/config/i386/i386.c:30165:16: error: 
‘IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST’ was not declared in this scope

 BDESC_VERIFYS (IX86_BUILTIN__BDESC_SPECIAL_ARGS2_FIRST,
^


--
Marc Glisse


Re: [PATCH] Add [[nodiscard]] attribute to C++17 components

2017-11-23 Thread Jonathan Wakely

On 23/11/17 22:40 +, Jonathan Wakely wrote:

C++17 added the [[nodiscard]] attribute, similar to GCC's
warn_unused_result. The C++2a draft requires it to be used in several
places. This patch adds it where required on components which are new
to C++17, which are still experimental and so I'm changing them in
stage 3.


This adds the other [[nodiscard]] attributes required by C++2a, but
using [[__nodiscard__]] so it can be enabled in C++11 and C++14, where
nodiscard is not a reserved name.

I don't plan to commit this until stage 1 (and don't have tests for it
yet anyway).
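
A short sketch of the reasoning behind the underscored spelling: before C++17, nodiscard is an ordinary identifier that a user program may legally define as a macro, so a library header cannot spell it plainly. MY_NODISCARD below is a hypothetical stand-in for the _GLIBCXX_NODISCARD macro in the patch, and Stack is an invented example type.

```cpp
#include <cassert>
#include <cstddef>

// MY_NODISCARD is a hypothetical stand-in for _GLIBCXX_NODISCARD; user
// code should not define names reserved for the implementation.
#if __cplusplus >= 201103L
# define MY_NODISCARD [[__nodiscard__]]  // immune to "#define nodiscard"
#else
# define MY_NODISCARD                    // no attribute syntax in C++98
#endif

struct Stack
{
  std::size_t count;

  MY_NODISCARD bool empty() const { return count == 0; }
};

bool is_empty(const Stack &s) { return s.empty(); }
```

The macro expands to nothing for C++98, which is presumably why the patch uses it on members that are also available in that mode, while C++11-only members use the attribute directly.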


commit 9159c87e19510662eecfeadf8facbe68b40ac2c7
Author: Jonathan Wakely 
Date:   Thu Nov 23 22:42:45 2017 +

Add more nodiscard attributes

diff --git a/libstdc++-v3/include/bits/alloc_traits.h b/libstdc++-v3/include/bits/alloc_traits.h
index 86a48594ea1..ba75f33726d 100644
--- a/libstdc++-v3/include/bits/alloc_traits.h
+++ b/libstdc++-v3/include/bits/alloc_traits.h
@@ -296,7 +296,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*
*  Calls @c a.allocate(n)
   */
-  static pointer
+  [[__nodiscard__]] static pointer
   allocate(_Alloc& __a, size_type __n)
   { return __a.allocate(__n); }
 
@@ -311,7 +311,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  Returns  a.allocate(n, hint)  if that expression is
*  well-formed, otherwise returns @c a.allocate(n)
   */
-  static pointer
+  [[__nodiscard__]] static pointer
   allocate(_Alloc& __a, size_type __n, const_void_pointer __hint)
   { return _S_allocate(__a, __n, __hint, 0); }
 
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index a4b81137571..bec6e902076 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -1008,7 +1008,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  Returns true if the %string is empty.  Equivalent to 
*  *this == "".
*/
-  bool
+  _GLIBCXX_NODISCARD bool
   empty() const _GLIBCXX_NOEXCEPT
   { return this->size() == 0; }
 
diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 21e3fbb2741..cfa2534e3f5 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -99,6 +99,11 @@
 # define _GLIBCXX_ABI_TAG_CXX11 __attribute ((__abi_tag__ ("cxx11")))
 #endif
 
+#if __cplusplus >= 201103L
+# define _GLIBCXX_NODISCARD [[__nodiscard__]]
+#else
+# define _GLIBCXX_NODISCARD
+#endif
 
 #if __cplusplus
 
diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h
index 96494cc20de..b8eb9f3d6fe 100644
--- a/libstdc++-v3/include/bits/forward_list.h
+++ b/libstdc++-v3/include/bits/forward_list.h
@@ -756,7 +756,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*  Returns true if the %forward_list is empty.  (Thus begin() would
*  equal end().)
*/
-  bool
+  [[__nodiscard__]] bool
   empty() const noexcept
   { return this->_M_impl._M_head._M_next == 0; }
 
diff --git a/libstdc++-v3/include/bits/regex.h b/libstdc++-v3/include/bits/regex.h
index 32e7159eec9..4866cb42b00 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -1631,7 +1631,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
* @retval true The %match_results object is empty.
* @retval false The %match_results object is not empty.
*/
-  bool
+  [[__nodiscard__]] bool
   empty() const
   { return size() == 0; }
 
diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index ac548846b0e..80a54e78ff3 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -873,7 +873,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   { return size_type(const_iterator(this->_M_impl._M_end_addr(), 0)
 			 - begin()); }
 
-  bool
+  _GLIBCXX_NODISCARD bool
   empty() const _GLIBCXX_NOEXCEPT
   { return begin() == end(); }
 
diff --git a/libstdc++-v3/include/bits/stl_deque.h b/libstdc++-v3/include/bits/stl_deque.h
index 93c82ca8909..ae170623247 100644
--- a/libstdc++-v3/include/bits/stl_deque.h
+++ b/libstdc++-v3/include/bits/stl_deque.h
@@ -1363,7 +1363,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*  Returns true if the %deque is empty.  (Thus begin() would
*  equal end().)
*/
-  bool
+  _GLIBCXX_NODISCARD bool
   empty() const _GLIBCXX_NOEXCEPT
   { return this->_M_impl._M_finish == this->_M_impl._M_start; }
 
diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h
index 8ed97f7c8b0..93378139871 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -1058,7 +1058,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  Returns true if the %list is empty.  (Thus begin() would equal
*  end().)
*/
-  bool
+  _GLIBCXX_NODISCARD bool
   empty(

[Ada] Fix PR ada/83091

2017-11-23 Thread Eric Botcazou
This fixes a small oversight in gigi, which can lead to an inconsistency 
between type variants for the mode when packed array types are involved.

Tested on x86-64/Linux, applied on the mainline.


2017-11-23  Eric Botcazou  

PR ada/83091
* gcc-interface/decl.c (gnat_to_gnu_entity): Do not build a variant
type for the implementation type of a packed array.

-- 
Eric Botcazou

Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 255106)
+++ gcc-interface/decl.c	(working copy)
@@ -4568,7 +4568,11 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 			 ? ALIAS_SET_COPY : ALIAS_SET_SUPERSET);
 	}
 
-  if (Treat_As_Volatile (gnat_entity))
+  /* Finally get to the appropriate variant, except for the implementation
+	 type of a packed array because the GNU type might be further adjusted
+	 when the original array type is itself processed.  */
+  if (Treat_As_Volatile (gnat_entity)
+	  && !Is_Packed_Array_Impl_Type (gnat_entity))
 	{
 	  const int quals
 	= TYPE_QUAL_VOLATILE


Re: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]

2017-11-23 Thread Christophe Lyon
On 22 November 2017 at 12:26, Kyrill  Tkachov
 wrote:
> Hi Tamar,
>
> On 06/11/17 16:53, Tamar Christina wrote:
>>
>> Hi All,
>>
>> This patch adds the NEON intrinsics for Dot product.
>>
>> Dot product is available from ARMv8.2-a and onwards.
>>
>> Regtested on arm-none-eabi, armeb-none-eabi,
>> aarch64-none-elf and aarch64_be-none-elf with no issues found.
>>
>> Ok for trunk?
>>
>> gcc/
>> 2017-11-06  Tamar Christina  
>>
>> * config/aarch64/arm_neon.h (vdot_u32, vdotq_u32)
>
>
> This should be config/arm/arm_neon.h
>
>> (vdot_s32, vdotq_s32): New.
>> (vdot_lane_u32, vdotq_lane_u32): New.
>> (vdot_lane_s32, vdotq_lane_s32): New.
>>
>>
>> gcc/testsuite/
>> 2017-11-06  Tamar Christina  
>>
>> * gcc.target/arm/simd/vdot-compile.c: New.
>> * gcc.target/arm/simd/vect-dot-qi.h: New.
>> * gcc.target/arm/simd/vect-dot-s8.c: New.
>> * gcc.target/arm/simd/vect-dot-u8.c: New
>>
>> --
>
>
> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
> index
> 0d436e83d0f01f0c86f8d6a25f84466c841c7e11..419080417901f343737741e334cbff818bb1e70a
> 100644
> --- a/gcc/config/arm/arm_neon.h
> +++ b/gcc/config/arm/arm_neon.h
> @@ -18034,6 +18034,72 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b)
>   #endif
>  +/* Adv.SIMD Dot Product intrinsics.  */
>
> Please no full stop: "AdvSIMD".
>
> +
> +#pragma GCC push_options
> +#if __ARM_ARCH >= 8
> +#pragma GCC target ("arch=armv8.2-a+dotprod")
>
> 
>
Not sure if Kyrill actually meant to comment about the three lines
above, but they have a bug:
#if should be before #pragma GCC push_options.

Indeed, after this patch was committed (r255064), I noticed many
regressions; for instance, p64_p128 is now unsupported. This is because
the arm_crypto_ok effective target now fails with this message:
XXX/arm_neon.h:16911:1: error: inlining failed in call to
always_inline 'vaeseq_u8': target specific option mismatch

Not sure why this wasn't noticed in earlier validation runs.

Fixed as obvious (r255126).
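
For reference, a minimal sketch of the balanced pattern. It assumes the matching pop_options sits inside the same conditional region, so a push outside the #if would be left unpopped whenever the condition is false. USE_TUNED_BLOCK is a hypothetical stand-in for "__ARM_ARCH >= 8", and an optimize pragma stands in for the target pragma so that the sketch builds on any architecture.

```cpp
#include <cassert>

// Hypothetical feature test standing in for "__ARM_ARCH >= 8".
#define USE_TUNED_BLOCK 1

#if USE_TUNED_BLOCK
#pragma GCC push_options      // pushed only when the block is compiled
#pragma GCC optimize ("O2")   // stand-in for the target pragma

inline int dot4(const int *a, const int *b)
{
  int sum = 0;
  for (int i = 0; i < 4; ++i)
    sum += a[i] * b[i];
  return sum;
}

#pragma GCC pop_options       // restores exactly what was pushed above

int dot4_demo()
{
  const int a[4] = { 1, 2, 3, 4 };
  const int b[4] = { 5, 6, 7, 8 };
  return dot4(a, b);          // 5 + 12 + 21 + 32
}
#endif
```

Keeping the push and the pop under the same condition means that any later pop_options in the header restores the state its own push saved, not the state saved by a stray earlier push.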

Christophe

> diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
> b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
> new file mode 100644
> index
> ..90b00aff95cfef96d1963be17673dc191cc71169
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
> @@ -0,0 +1,15 @@
> +TYPE char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
> +TYPE char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
> +
> +__attribute__ ((noinline)) int
> +foo1(int len) {
> +  int i;
> +  TYPE int result = 0;
> +  TYPE short prod;
> +
> +  for (i=0; i<len; i++) {
> +    prod = X[i] * Y[i];
> +    result += prod;
> +  }
> +  return result;
> +}
> \ No newline at end of file
>
> Please add new lines at the end of the new test files.
> This applies to a few more new files in this patch.
>
> Ok with these nits fixed.
>
> Thanks,
> Kyrill
>
2017-11-24  Christophe Lyon  

* config/arm/arm_neon.h: Fix pragma GCC push_options before
vdot_u32.
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3c9a8d9..d2e936c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18036,8 +18036,8 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b)
 
 /* AdvSIMD Dot Product intrinsics.  */
 
-#pragma GCC push_options
 #if __ARM_ARCH >= 8
+#pragma GCC push_options
 #pragma GCC target ("arch=armv8.2-a+dotprod")
 
 __extension__ extern __inline uint32x2_t


Re: [PATCH] Add [[nodiscard]] attribute to C++17 components

2017-11-23 Thread Jonathan Wakely

On 23/11/17 22:40 +, Jonathan Wakely wrote:

C++17 added the [[nodiscard]] attribute, similar to GCC's
warn_unused_result. The C++2a draft requires it to be used in several
places. This patch adds it where required on components which are new
to C++17, which are still experimental and so I'm changing them in
stage 3.


I missed one of the C++17 pieces that should have [[nodiscard]].

Tested powerpc64le-linux, committed to trunk.


commit 65c20af6ec193916e390462ccfbd8e8e6b54ba8c
Author: Jonathan Wakely 
Date:   Thu Nov 23 22:51:31 2017 +

Add [[nodiscard]] attribute to std::launder

* libsupc++/new (launder): Add nodiscard attribute.
* testsuite/18_support/launder/nodiscard.cc: New test.

diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
index 795b33c4e6a..0e408b1019c 100644
--- a/libstdc++-v3/libsupc++/new
+++ b/libstdc++-v3/libsupc++/new
@@ -176,7 +176,7 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
 //@}
 } // extern "C++"
 
-#if __cplusplus > 201402L
+#if __cplusplus >= 201703L
 #if __GNUC__ >= 7
 #  define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
 #elif defined __has_builtin
@@ -192,7 +192,7 @@ namespace std
 #define __cpp_lib_launder 201606
   /// Pointer optimization barrier [ptr.launder]
   template<typename _Tp>
-constexpr _Tp*
+[[nodiscard]] constexpr _Tp*
 launder(_Tp* __p) noexcept
 { return __builtin_launder(__p); }
 
diff --git a/libstdc++-v3/testsuite/18_support/launder/nodiscard.cc b/libstdc++-v3/testsuite/18_support/launder/nodiscard.cc
new file mode 100644
index 000..1741465abed
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/launder/nodiscard.cc
@@ -0,0 +1,29 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++17" }
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+struct A { const int i; };
+
+void
+test01(A* a)
+{
+  std::launder(a); // { dg-warning "ignoring return value" }
+}


Re: RFC/A: Stop cselib.c recording sets of the frame pointer

2017-11-23 Thread Jeff Law
On 11/22/2017 10:46 AM, Richard Sandiford wrote:
> One of the effects of:
> 
>https://gcc.gnu.org/ml/gcc-patches/2017-10/msg02066.html
> 
> is that we now emit:
> 
>(set (hard-frame-pointer) (stack-pointer))
> 
> instead of the previous non-canonical:
> 
>(set (hard-frame-pointer) (plus (stack-pointer) (const_int 0)))
> 
> However, recent changes to aarch64_expand_prologue mean that we
> can emit a frame record even if !frame_pointer_needed.  In that case,
> the frame pointer is initialised normally, but all references generated
> by GCC continue to use the stack pointer.
> 
> The combination of these two things triggered a regression in
> gcc.c-torture/execute/960419-2.c.  The problem is that alias.c
> fundamentally requires GCC to be consistent about which register
> it uses:
> 
>  2. stack_pointer_rtx, frame_pointer_rtx, hard_frame_pointer_rtx
>   (if distinct from frame_pointer_rtx) and arg_pointer_rtx.
>   Each of these rtxes has a separate ADDRESS associated with it,
>   each with a negative id.
> 
>   GCC is (and is required to be) precise in which register it
>   chooses to access a particular region of stack.  We can therefore
>   assume that accesses based on one of these rtxes do not alias
>   accesses based on another of these rtxes.
Yea.  I remember acking this code for jfc circa 1998.



> 
> But in the test case cselib saw the move above and recorded the two
> registers as being equivalent.  We therefore replaced some (but not all)
> references to the stack pointer with references to the frame pointer.
> MEMs that were still based on the stack pointer would then not alias
> MEMs that became based on the frame pointer.
> 
> cselib.c has code to try to prevent this:
> 
>   for (; p; p = p->next)
> {
>   /* Return these right away to avoid returning stack pointer based
>expressions for frame pointer and vice versa, which is something
>that would confuse DSE.  See the comment in cselib_expand_value_rtx_1
>for more details.  */
>   if (REG_P (p->loc)
> && (REGNO (p->loc) == STACK_POINTER_REGNUM
> || REGNO (p->loc) == FRAME_POINTER_REGNUM
> || REGNO (p->loc) == HARD_FRAME_POINTER_REGNUM
> || REGNO (p->loc) == cfa_base_preserved_regno))
>   return p->loc;
> 
> which refers to:
> 
> /* The only thing that we are not willing to do (this
>is requirement of dse and if others potential uses
>need this function we should add a parm to control
>it) is that we will not substitute the
>STACK_POINTER_REGNUM, FRAME_POINTER or the
>HARD_FRAME_POINTER.
> 
>These expansions confuses the code that notices that
>stores into the frame go dead at the end of the
>function and that the frame is not effected by calls
>to subroutines.  If you allow the
>STACK_POINTER_REGNUM substitution, then dse will
>think that parameter pushing also goes dead which is
>wrong.  If you allow the FRAME_POINTER or the
>HARD_FRAME_POINTER then you lose the opportunity to
>make the frame assumptions.  */
> 
> But that doesn't work if the two registers are equal, and thus
> in the same value chain.  It becomes pot luck which one is chosen.
Right.  I can't recall these bits in cselib.
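
To make the hazard concrete, here is a toy model of the base-register rule quoted from alias.c. The types and functions are hypothetical and far simpler than GCC's real alias analysis; the sketch only shows how a partial rewrite from the stack pointer to an equal frame pointer can hide a real conflict.

```cpp
#include <cassert>

// Toy model only; hypothetical types, not GCC's alias machinery.
enum class Base { StackPointer, FramePointer };

struct MemRef
{
  Base base;    // register the address is based on
  int offset;   // byte offset from that register
  int size;     // access size in bytes
};

// Accesses based on different special registers are assumed not to alias;
// accesses with the same base alias when their byte ranges overlap.
bool may_alias(const MemRef &a, const MemRef &b)
{
  if (a.base != b.base)
    return false;
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

// With fp == sp, rewriting only one of two accesses to the same slot
// makes the model report "no alias" for what is really the same memory.
bool same_slot_after_partial_rewrite()
{
  MemRef store = { Base::StackPointer, 16, 8 };
  MemRef load  = { Base::FramePointer, 16, 8 };  // same address, new base
  return may_alias(store, load);
}
```

In this model the conflict is only detected when both accesses keep the same base register, which is the consistency requirement the quoted comment describes.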


> 
> I think it'd be better not to enter an equivalence for the set of
> the frame pointer at all.  That should deal with the same correctness
> concern as the code above, but I think we still want the existing checks
> for optimisation reasons.  It seems better to check for pointer equality
> instead of register number equality though, since we don't want to change
> the behaviour when the frame pointer register is not being used as a frame
> pointer.
You have to be real careful here.  regcprop can create copies of the
special registers -- I ran into it making a copy of stack_pointer_rtx
earlier this year.  It special cases the stack pointer now with a comment
"It's unclear if we need to do the same for other special registers".


> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Does it look like the right approach?  OK to install if so?
> 
> Thanks,
> Richard
> 
> 
> 2017-11-22  Richard Sandiford  
> 
> gcc/
>   * cselib.c (expand_loc, cselib_expand_value_rtx_1): Change
>   justification for checking for the stack and hard frame pointers.
>   Check them by pointer rather than register number.
>   (cselib_record_set): Do nothing for sets of the frame pointer.
I think the big question here is the pointer equality vs register number
check.  Prior to reload/lra we're probably safe with the former, after
we probably need the latter, likely conditionalized with
frame_pointer_needed or somesuch to avoid triggering when we just have a
value in the frame pointer 

[PING] Ability to remap file names in __FILE__, etc (PR other/70268)

2017-11-23 Thread Boris Kolpackov
Hi,

I would like to ping this patch:

https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01451.html

While the patch touches quite a few places, it is mostly reshuffling
and generalizing existing code. It also includes tests for the new
functionality. Seeing that there is a lot of interest[1] in reproducible
builds, I think it will be useful to have this patch considered for
GCC 8, if possible.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70268

Thanks,
Boris