Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-17 Thread Eric Botcazou
> The following patch implements this optimization.  It checks for a range
> that does not have the sign-bit set, and an index value that is already
> sign extended, and then does a sign extend instead of a zero extend.
> 
> This has been tested with riscv{32,64}-{elf,linux} builds and testsuite
> runs. There were no regressions.  It was also tested with an x86_64-linux
> build and testsuite run.
> 
> Ok?
> 
> Jim
> 
>   gcc/
>   * expr.c (do_tablejump): When converting index to Pmode, if we have a
>   sign extended promoted subreg, and the range does not have the sign bit
>   set, then do a sign extend.

Richard dragged me into this so I feel somewhat entitled to step up...

The patch looks OK to me, modulo:

> +  /* We know the value of INDEX is between 0 and RANGE.  If we have a
> +  sign-extended subreg, and RANGE does not have the sign bit set, then
> +  we have a value that is valid for both sign and zero extension.  In
> +  this case, we get better code if we sign extend.  */
> +  if (GET_CODE (index) == SUBREG
> +   && SUBREG_PROMOTED_VAR_P (index)
> +   && SUBREG_PROMOTED_SIGNED_P (index)
> +   && ((width = GET_MODE_PRECISION (as_a <scalar_int_mode> (mode)))
> +   <= HOST_BITS_PER_WIDE_INT)
> +   && ! (INTVAL (range) & (HOST_WIDE_INT_1U << (width - 1))))

I'd use UINTVAL instead of INTVAL here.

-- 
Eric Botcazou


Re: RFA (ipa-prop): PATCHes to avoid use of deprecated copy ctor and op=

2018-05-17 Thread Andreas Schwab
On May 16 2018, Andreas Schwab wrote:

> On May 15 2018, Jason Merrill wrote:
>
>> commit 648ffd02e23ac2695de04ab266b4f8862df6c2ed
>> Author: Jason Merrill 
>> Date:   Tue May 15 20:46:54 2018 -0400
>>
>> * cp-tree.h (cp_expr): Remove copy constructor.
>> 
>> * mangle.c (struct releasing_vec): Declare copy constructor.
>
> I'm getting an ICE on ia64 during the stage1 build of libstdc++ (perhaps
> related to the fact that this uses gcc 4.8 as the bootstrap compiler):

I have now switched to gcc 5 as the bootstrap compiler, which doesn't
have this issue.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] PR gcc/84923 - gcc.dg/attr-weakref-1.c failed on aarch64

2018-05-17 Thread Kyrill Tkachov

Hi,

Given this is a midend change it's a good idea to CC some of the maintainers of 
that area.
I've copied richi and Honza.

Thanks,
Kyrill

On 17/05/18 05:35, vladimir.mezent...@oracle.com wrote:

Ping.

-Vladimir


On 05/10/2018 11:30 PM, vladimir.mezent...@oracle.com wrote:
> From: Vladimir Mezentsev 
>
> When weakref_targets is not empty, a target cannot be removed from the weak
> list.
> A small example is below in which 'wv12' is removed from the weak list on aarch64:
>   static vtype Wv12 __attribute__((weakref ("wv12")));
>   extern vtype wv12 __attribute__((weak));
>
> Bootstrapped on aarch64-unknown-linux-gnu (including c, c++ and go).
> Tested on aarch64-linux-gnu.
> No regression. The attr-weakref-1.c test passed.
>
> ChangeLog:
> 2018-05-10  Vladimir Mezentsev 
>
> PR gcc/84923
> * varasm.c (weak_finish): Clean up weak_decls.
> ---
>  gcc/varasm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 85296b4..8cf6e1e 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -5652,7 +5652,8 @@ weak_finish (void)
>tree alias_decl = TREE_PURPOSE (t);
>tree target = ultimate_transparent_alias_target (&TREE_VALUE (t));
>
> -  if (! TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (alias_decl)))
> +  if (! TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (alias_decl))
> + || TREE_SYMBOL_REFERENCED (target))
>/* Remove alias_decl from the weak list, but leave entries for
>   the target alone.  */
>target = NULL_TREE;





Allow gimple_build with internal functions

2018-05-17 Thread Richard Sandiford
This patch makes the function versions of gimple_build and
gimple_simplify take combined_fns rather than built_in_codes,
so that they work with internal functions too.  The old
gimple_builds were unused, so no existing callers need
to be updated.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-05-17  Richard Sandiford  

gcc/
* gimple-fold.h (gimple_build): Make the function forms take
combined_fn rather than built_in_function.
(gimple_simplify): Likewise.
* gimple-match-head.c (gimple_simplify): Likewise.
* gimple-fold.c (gimple_build): Likewise.
* tree-vect-loop.c (get_initial_def_for_reduction): Use gimple_build
rather than gimple_build_call_internal.
(get_initial_defs_for_reduction): Likewise.
(vect_create_epilog_for_reduction): Likewise.
(vectorizable_live_operation): Likewise.

Index: gcc/gimple-fold.h
===
--- gcc/gimple-fold.h   2018-05-16 20:17:39.114152860 +0100
+++ gcc/gimple-fold.h   2018-05-17 09:17:32.876478942 +0100
@@ -86,28 +86,25 @@ gimple_build (gimple_seq *seq,
 {
   return gimple_build (seq, UNKNOWN_LOCATION, code, type, op0, op1, op2);
 }
-extern tree gimple_build (gimple_seq *, location_t,
- enum built_in_function, tree, tree);
+extern tree gimple_build (gimple_seq *, location_t, combined_fn, tree, tree);
 inline tree
-gimple_build (gimple_seq *seq,
- enum built_in_function fn, tree type, tree arg0)
+gimple_build (gimple_seq *seq, combined_fn fn, tree type, tree arg0)
 {
   return gimple_build (seq, UNKNOWN_LOCATION, fn, type, arg0);
 }
-extern tree gimple_build (gimple_seq *, location_t,
- enum built_in_function, tree, tree, tree);
+extern tree gimple_build (gimple_seq *, location_t, combined_fn,
+ tree, tree, tree);
 inline tree
-gimple_build (gimple_seq *seq,
- enum built_in_function fn, tree type, tree arg0, tree arg1)
+gimple_build (gimple_seq *seq, combined_fn fn,
+ tree type, tree arg0, tree arg1)
 {
   return gimple_build (seq, UNKNOWN_LOCATION, fn, type, arg0, arg1);
 }
-extern tree gimple_build (gimple_seq *, location_t,
- enum built_in_function, tree, tree, tree, tree);
+extern tree gimple_build (gimple_seq *, location_t, combined_fn,
+ tree, tree, tree, tree);
 inline tree
-gimple_build (gimple_seq *seq,
- enum built_in_function fn, tree type,
- tree arg0, tree arg1, tree arg2)
+gimple_build (gimple_seq *seq, combined_fn fn,
+ tree type, tree arg0, tree arg1, tree arg2)
 {
   return gimple_build (seq, UNKNOWN_LOCATION, fn, type, arg0, arg1, arg2);
 }
@@ -153,11 +150,11 @@ extern tree gimple_simplify (enum tree_c
 gimple_seq *, tree (*)(tree));
 extern tree gimple_simplify (enum tree_code, tree, tree, tree, tree,
 gimple_seq *, tree (*)(tree));
-extern tree gimple_simplify (enum built_in_function, tree, tree,
+extern tree gimple_simplify (combined_fn, tree, tree,
 gimple_seq *, tree (*)(tree));
-extern tree gimple_simplify (enum built_in_function, tree, tree, tree,
+extern tree gimple_simplify (combined_fn, tree, tree, tree,
 gimple_seq *, tree (*)(tree));
-extern tree gimple_simplify (enum built_in_function, tree, tree, tree, tree,
+extern tree gimple_simplify (combined_fn, tree, tree, tree, tree,
 gimple_seq *, tree (*)(tree));
 
 #endif  /* GCC_GIMPLE_FOLD_H */
Index: gcc/gimple-match-head.c
===
--- gcc/gimple-match-head.c 2018-03-30 12:28:37.301927949 +0100
+++ gcc/gimple-match-head.c 2018-05-17 09:17:32.876478942 +0100
@@ -478,55 +478,53 @@ gimple_simplify (enum tree_code code, tr
   return maybe_push_res_to_seq (rcode, type, ops, seq);
 }
 
-/* Builtin function with one argument.  */
+/* Builtin or internal function with one argument.  */
 
 tree
-gimple_simplify (enum built_in_function fn, tree type,
+gimple_simplify (combined_fn fn, tree type,
 tree arg0,
 gimple_seq *seq, tree (*valueize)(tree))
 {
   if (constant_for_folding (arg0))
 {
-  tree res = fold_const_call (as_combined_fn (fn), type, arg0);
+  tree res = fold_const_call (fn, type, arg0);
   if (res && CONSTANT_CLASS_P (res))
return res;
 }
 
   code_helper rcode;
   tree ops[3] = {};
-  if (!gimple_simplify (&rcode, ops, seq, valueize,
-   as_combined_fn (fn), type, arg0))
+  if (!gimple_simplify (&rcode, ops, seq, valueize, fn, type, arg0))
 return NULL_TREE;
   return maybe_push_res_to_seq (rcode, type, ops, seq);
 }
 
-/* Builtin function with two arguments.  */
+/* Builtin or internal function with two arguments.  */

Gimple FE support for internal functions

2018-05-17 Thread Richard Sandiford
This patch gets the gimple FE to parse calls to internal functions.
The only non-obvious thing was how the functions should be written
to avoid clashes with real function names.  One option would be to
go the magic number of underscores route, but we already do that for
built-in functions, and it would be good to keep them visually
distinct.  In the end I borrowed the local/internal label convention
from asm and used:

  x = .SQRT (y);

I don't think even C++ has found a meaning for a leading dot yet.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-05-17  Richard Sandiford  

gcc/
* internal-fn.h (lookup_internal_fn): Declare.
* internal-fn.c (lookup_internal_fn): New function.
* gimple.c (gimple_build_call_from_tree): Handle calls to
internal functions.
* gimple-pretty-print.c (dump_gimple_call): Print "." before
internal function names.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree-ssa-scopedtables.c (expr_hash_elt::print): Likewise.

gcc/c/
* gimple-parser.c: Include internal-fn.h.
(c_parser_gimple_statement): Treat a leading CPP_DOT as a call.
(c_parser_gimple_call_internal): New function.
(c_parser_gimple_postfix_expression): Use it to handle CPP_DOT.
Fix typos in comment.

gcc/testsuite/
* gcc.dg/gimplefe-28.c: New test.
* gcc.dg/asan/use-after-scope-9.c: Adjust expected output for
internal function calls.
* gcc.dg/goacc/loop-processing-1.c: Likewise.

Index: gcc/internal-fn.h
===
--- gcc/internal-fn.h   2018-05-16 12:48:59.194282896 +0100
+++ gcc/internal-fn.h   2018-05-17 09:17:58.757608747 +0100
@@ -107,6 +107,8 @@ internal_fn_name (enum internal_fn fn)
   return internal_fn_name_array[(int) fn];
 }
 
+extern internal_fn lookup_internal_fn (const char *);
+
 /* Return the ECF_* flags for function FN.  */
 
 extern const int internal_fn_flags_array[];
Index: gcc/internal-fn.c
===
--- gcc/internal-fn.c   2018-05-16 12:48:59.410941892 +0100
+++ gcc/internal-fn.c   2018-05-17 09:22:49.808912358 +0100
@@ -64,6 +64,26 @@ #define DEF_INTERNAL_FN(CODE, FLAGS, FNS
   0
 };
 
+/* Return the internal function called NAME, or IFN_LAST if there's
+   no such function.  */
+
+internal_fn
+lookup_internal_fn (const char *name)
+{
+  typedef hash_map<nofree_string_hash, internal_fn> name_to_fn_map_type;
+  static name_to_fn_map_type *name_to_fn_map;
+
+  if (!name_to_fn_map)
+{
+  name_to_fn_map = new name_to_fn_map_type (IFN_LAST);
+  for (unsigned int i = 0; i < IFN_LAST; ++i)
+   name_to_fn_map->put (internal_fn_name (internal_fn (i)),
+internal_fn (i));
+}
+  internal_fn *entry = name_to_fn_map->get (name);
+  return entry ? *entry : IFN_LAST;
+}
+
 /* Fnspec of each internal function, indexed by function number.  */
 const_tree internal_fn_fnspec_array[IFN_LAST + 1];
 
Index: gcc/gimple.c
===
--- gcc/gimple.c2018-05-16 12:48:59.410941892 +0100
+++ gcc/gimple.c2018-05-17 09:22:49.808912358 +0100
@@ -350,12 +350,19 @@ gimple_build_call_from_tree (tree t, tre
 {
   unsigned i, nargs;
   gcall *call;
-  tree fndecl = get_callee_fndecl (t);
 
   gcc_assert (TREE_CODE (t) == CALL_EXPR);
 
   nargs = call_expr_nargs (t);
-  call = gimple_build_call_1 (fndecl ? fndecl : CALL_EXPR_FN (t), nargs);
+
+  tree fndecl = NULL_TREE;
+  if (CALL_EXPR_FN (t) == NULL_TREE)
+call = gimple_build_call_internal_1 (CALL_EXPR_IFN (t), nargs);
+  else
+{
+  fndecl = get_callee_fndecl (t);
+  call = gimple_build_call_1 (fndecl ? fndecl : CALL_EXPR_FN (t), nargs);
+}
 
   for (i = 0; i < nargs; i++)
 gimple_call_set_arg (call, i, CALL_EXPR_ARG (t, i));
Index: gcc/gimple-pretty-print.c
===
--- gcc/gimple-pretty-print.c   2018-05-16 12:48:59.410941892 +0100
+++ gcc/gimple-pretty-print.c   2018-05-17 09:22:49.808912358 +0100
@@ -874,7 +874,7 @@ dump_gimple_call (pretty_printer *buffer
   if (flags & TDF_RAW)
 {
   if (gimple_call_internal_p (gs))
-   dump_gimple_fmt (buffer, spc, flags, "%G <%s, %T", gs,
+   dump_gimple_fmt (buffer, spc, flags, "%G <.%s, %T", gs,
 internal_fn_name (gimple_call_internal_fn (gs)), lhs);
   else
dump_gimple_fmt (buffer, spc, flags, "%G <%T, %T", gs, fn, lhs);
@@ -898,7 +898,10 @@ dump_gimple_call (pretty_printer *buffer
  pp_space (buffer);
 }
   if (gimple_call_internal_p (gs))
-   pp_string (buffer, internal_fn_name (gimple_call_internal_fn (gs)));
+   {
+ pp_dot (buffer);
+ pp_string (buffer, internal_fn_name (gimple_call_internal_fn (gs)));
+   }
   else
print_call_name 

Re: [PATCH ARM] Fix armv8-m multilib build failure with stdint.h

2018-05-17 Thread Kyrill Tkachov


On 16/05/18 10:22, Jérôme Lambourg wrote:

Hello Kyrill,


Thanks for the patch! To validate your changes you can also look at the
disassembly of the cmse.c binary in the build tree. If the binary changes
with your patch then that would indicate some trouble.

Good idea. So I just did that and the assembly of both objects is identical
(before and after the patch).


There are places in arm_cmse.h that use intptr_t. You should replace those as 
well.
Look for the cmse_nsfptr_create and cmse_is_nsfptr macros...

Indeed, good catch. I did not see those as this part is not included in the 
armv8-m.

Below the updated patch and modified changelog.


Thanks this looks good with one nit below.


2018-05-16  Jerome Lambourg  
gcc/
* config/arm/arm_cmse.h (cmse_nsfptr_create, cmse_is_nsfptr): Remove
#include <stdint.h>.  Replace intptr_t with __INTPTR_TYPE__.

libgcc/
* config/arm/cmse.c (cmse_check_address_range): Replace
UINTPTR_MAX with __UINTPTR_MAX__ and uintptr_t with __UINTPTR_TYPE__.



@@ -51,7 +51,8 @@
 
   /* Execute the right variant of the TT instructions.  */

   pe = pb + size - 1;
-  const int singleCheck = (((uintptr_t) pb ^ (uintptr_t) pe) < 32);
+  const int singleCheck =
+(((__UINTPTR_TYPE__) pb ^ (__UINTPTR_TYPE__) pe) < 32);

The "=" should go on the next line together with the initialisation.

Ok for trunk with that fixed.
Thanks,
Kyrill


Re: [PATCH] PR gcc/84923 - gcc.dg/attr-weakref-1.c failed on aarch64

2018-05-17 Thread Richard Biener
On Thu, 17 May 2018, Kyrill Tkachov wrote:

> Hi,
> 
> Given this is a midend change it's a good idea to CC some of the maintainers
> of that area.
> I've copied richi and Honza.

The patch is ok for trunk (it's actually mine...) and for the branch
after a while.

Thanks,
Richard.

> Thanks,
> Kyrill
> 
> On 17/05/18 05:35, vladimir.mezent...@oracle.com wrote:
> > Ping.
> > 
> > -Vladimir
> > 
> > 
> > On 05/10/2018 11:30 PM, vladimir.mezent...@oracle.com wrote:
> > > From: Vladimir Mezentsev 
> > >
> > > When weakref_targets is not empty a target cannot be removed from the weak
> > list.
> > > A small example is below when 'wv12' is removed from the weak list on
> > aarch64:
> > >   static vtype Wv12 __attribute__((weakref ("wv12")));
> > >   extern vtype wv12 __attribute__((weak));
> > >
> > > Bootstrapped on aarch64-unknown-linux-gnu including (c,c++ and go).
> > > Tested on aarch64-linux-gnu.
> > > No regression. The attr-weakref-1.c test passed.
> > >
> > > ChangeLog:
> > > 2018-05-10  Vladimir Mezentsev 
> > >
> > > PR gcc/84923
> > > * varasm.c (weak_finish): clean up weak_decls
> > > ---
> > >  gcc/varasm.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/varasm.c b/gcc/varasm.c
> > > index 85296b4..8cf6e1e 100644
> > > --- a/gcc/varasm.c
> > > +++ b/gcc/varasm.c
> > > @@ -5652,7 +5652,8 @@ weak_finish (void)
> > >tree alias_decl = TREE_PURPOSE (t);
> > >tree target = ultimate_transparent_alias_target (&TREE_VALUE (t));
> > >
> > > -  if (! TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (alias_decl)))
> > > +  if (! TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (alias_decl))
> > > + || TREE_SYMBOL_REFERENCED (target))
> > >/* Remove alias_decl from the weak list, but leave entries for
> > >   the target alone.  */
> > >target = NULL_TREE;
> > 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][AArch64] Unify vec_set patterns, support floating-point vector modes properly

2018-05-17 Thread Kyrill Tkachov


On 15/05/18 18:56, Richard Sandiford wrote:

Kyrill  Tkachov  writes:

Hi all,

We've a deficiency in our vec_set family of patterns.  We don't
support directly loading a vector lane using LD1 for V2DImode and all
the vector floating-point modes.  We do do it correctly for the other
integer vector modes (V4SI, V8HI etc) though.

The alternatives on the relative floating-point patterns only allow a
register-to-register INS instruction.  That means if we want to load a
value into a vector lane we must first load it into a scalar register
and then perform an INS, which is wasteful.

There is also an explicit V2DI vec_set expander dangling around for no
reason that I can see. It seems to do the exact same things as the
other vec_set expanders. This patch removes that.  It now unifies all
vec_set expansions into a single "vec_set<mode>" define_expand using
the catch-all VALL_F16 iterator.

I decided to leave two aarch64_simd_vec_set<mode> define_insns.  One
for the integer vector modes (that now include V2DI) and one for the
floating-point vector modes. That is so that we can avoid specifying
"w,r" alternatives for floating-point modes in case the
register-allocator gets confused and starts gratuitously moving
registers between the two banks.  So the floating-point pattern has only
two alternatives, one for SIMD-to-SIMD INS and one for LD1.

Did you see any cases in which this was necessary?  In some ways it
seems to run counter to Wilco's recent patches, which tended to remove
the * markers from the "unnatural" register class and trust the register
allocator to make a sensible decision.

I think our default position should be trust the allocator here.
If the consumers all require "w" registers then the RA will surely
try to use "w" registers if at all possible.  But if the consumers
don't care then it seems reasonable to offer both, since in those
cases it doesn't really make much difference whether the payload
happens to be SF or SI (say).

There are also cases in which the consumer could actively require
an integer register.  E.g. some code uses unions to bitcast floats
to ints and then do bitwise arithmetic on them.



Thanks, that makes sense. Honestly, it's been a few months since I worked on
this patch. I believe my reluctance to specify that alternative was that it
would mean merging the integer and floating-point patterns into one (like the
attached version), which would put the "w, r" alternative first for the
floating-point case. I guess we should be able to trust the allocator to pick
the sensible alternative though.

This version is then made even simpler due to all the vec_set patterns being 
merged into one.
Bootstrapped and tested on aarch64-none-linux-gnu.

Is this ok for trunk?

Thanks,
Kyrill

2018-05-17  Kyrylo Tkachov  

* config/aarch64/aarch64-simd.md (vec_set<mode>): Use VALL_F16 mode
iterator.  Delete separate integer-mode vec_set expander.
(aarch64_simd_vec_setv2di): Delete.
(vec_setv2di): Delete.
(aarch64_simd_vec_set<mode>): Delete all other patterns with that name.
Use VALL_F16 mode iterator.  Add LD1 alternative and use vwcore for
the "w, r" alternative.

2018-05-17  Kyrylo Tkachov  

* gcc.target/aarch64/vect-init-ld1.c: New test.


With this patch we avoid loading values into scalar registers and then
doing an explicit INS on them to move them into the desired vector
lanes. For example for:

typedef float v4sf __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

v2di
foo_v2di (long long *a, long long *b)
{
v2di res = { *a, *b };
return res;
}

v4sf
foo_v4sf (float *a, float *b, float *c, float *d)
{
v4sf res = { *a, *b, *c, *d };
return res;
}

we currently generate:

foo_v2di:
  ldr d0, [x0]
  ldr x0, [x1]
  ins v0.d[1], x0
  ret

foo_v4sf:
  ldr s0, [x0]
  ldr s3, [x1]
  ldr s2, [x2]
  ldr s1, [x3]
  ins v0.s[1], v3.s[0]
  ins v0.s[2], v2.s[0]
  ins v0.s[3], v1.s[0]
  ret

but with this patch we generate the much cleaner:
foo_v2di:
  ldr d0, [x0]
  ld1 {v0.d}[1], [x1]
  ret

foo_v4sf:
  ldr s0, [x0]
  ld1 {v0.s}[1], [x1]
  ld1 {v0.s}[2], [x2]
  ld1 {v0.s}[3], [x3]
  ret

Nice!  The original reason for:

   /* FIXME: At the moment the cost model seems to underestimate the
  cost of using elementwise accesses.  This check preserves the
  traditional behavior until that can be fixed.  */
   if (*memory_access_type == VMAT_ELEMENTWISE
   && !STMT_VINFO_STRIDED_P (stmt_info)
   && !(stmt == GROUP_FIRST_ELEMENT (stmt_info)
   && !GROUP_NEXT_ELEMENT (stmt_info)
   && !pow2p_hwi (GROUP_SIZE (stmt_info))))
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "not falling back to elementwise acces

Re: Replace FMA_EXPR with one internal fn per optab

2018-05-17 Thread Richard Sandiford
Richard Biener  writes:
>> @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result,
>>  }
>
>> if (negate_p)
>> -   mulop1 = force_gimple_operand_gsi (&gsi,
>> -  build1 (NEGATE_EXPR,
>> -  type, mulop1),
>> -  true, NULL_TREE, true,
>> -  GSI_SAME_STMT);
>> +   mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1);
>
>> -  fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
>> - FMA_EXPR, mulop1, op2, addop);
>> +  if (seq)
>> +   gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
>> +  fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2,
> addop);
>> +  gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt));
>> +  gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal
> (use_stmt));
>> +  gsi_replace (&gsi, fma_stmt, true);
>> +  /* Valueize aggressively so that we generate FMS, FNMA and FNMS
>> +regardless of where the negation occurs.  */
>> +  if (fold_stmt (&gsi, aggressive_valueize))
>> +   update_stmt (gsi_stmt (gsi));
>
> I think it would be nice to be able to use gimple_build () with IFNs so you
> can
> gimple_build () the IFN and then use gsi_replace_with_seq () on it.  You
> only need to fold with generated negates, not with negates already in the
> IL?
> The the folding implied with gimple_build will take care of it.

The idea was to pick up existing negates that feed the multiplication
as well as any added by the pass itself.

On IRC yesterday we talked about how this should handle the ECF_NOTHROW
flag, and whether things like IFN_SQRT and IFN_FMA should always be
nothrow (like the built-in functions are).  But in the end I thought
it'd be better to keep things as they are.  We already handle
-fnon-call-exceptions for unfused a * b + c and before the patch also
handled it for FMA_EXPR.  It'd seem like a step backwards if the new
internal functions didn't handle it too.  If anything it seems like the
built-in functions should change to be closer to the tree_code and
internal_fn way of doing things, if we want to support -fnon-call-exceptions
properly.

This also surprised me when doing the if-conversion patch I sent yesterday.
We're happy to vectorise:

  for (int i = 0; i < 100; ++i)
x[i] = ... ? sqrt (x[i]) : 0;

by doing the sqrt unconditionally and selecting on the result, even with
the default maths flags, but refuse to vectorise the simpler:

  for (int i = 0; i < 100; ++i)
x[i] = ... ? x[i] + 1 : 0;

in the same way.

> Otherwise can you please move aggressive_valueize to gimple-fold.[ch]
> alongside no_follow_ssa_edges / follow_single_use_edges and maybe
> rename it as follow_all_ssa_edges?

Ah, yeah, that's definitely a better name.

I also renamed all_scalar_fma to scalar_all_fma, since I realised
after Andrew's reply that the old name made it sound like it was
"all scalars", whereas it meant to mean "all fmas".

Tested as before.

Thanks,
Richard

2018-05-17  Richard Sandiford  

gcc/
* doc/sourcebuild.texi (scalar_all_fma): Document.
* tree.def (FMA_EXPR): Delete.
* internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
* internal-fn.c (ternary_direct): New macro.
(expand_ternary_optab_fn): Likewise.
(direct_ternary_optab_supported_p): Likewise.
* Makefile.in (build/genmatch.o): Depend on case-fn-macros.h.
* builtins.c (fold_builtin_fma): Delete.
(fold_builtin_3): Don't call it.
* cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
* expr.c (expand_expr_real_2): Likewise.
* fold-const.c (operand_equal_p): Likewise.
(fold_ternary_loc): Likewise.
* gimple-pretty-print.c (dump_ternary_rhs): Likewise.
* gimple.c (DEFTREECODE): Likewise.
* gimplify.c (gimplify_expr): Likewise.
* optabs-tree.c (optab_for_tree_code): Likewise.
* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
* tree-eh.c (operation_could_trap_p): Likewise.
(stmt_could_throw_1_p): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
(op_code_prio): Likewise.
* tree-ssa-loop-im.c (stmt_cost): Likewise.
* tree-ssa-operands.c (get_expr_operands): Likewise.
* tree.c (commutative_ternary_tree_code, add_expr): Likewise.
* fold-const-call.h (fold_fma): Delete.
* fold-const-call.c (fold_const_call_): Handle CFN_FMS,
CFN_FNMA and CFN_FNMS.
(fold_fma): Delete.
* genmatch.c (combined_fn): New enum.
(commutative_ternary_tree_code): Remove FMA_EXPR handling.
(commutative_op): New function.
(commutate): Use it.  Handle more than 2 operands.
(dt_operand::gen_g

Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful

2018-05-17 Thread Richard Earnshaw (lists)
On 16/05/18 09:37, Kyrill Tkachov wrote:
> 
> On 15/05/18 10:58, Richard Biener wrote:
>> On Tue, May 15, 2018 at 10:20 AM Kyrill Tkachov
>> 
>> wrote:
>>
>>> Hi all,
>>> This is a respin of James's patch from:
>> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
>>> The original patch was approved and committed but was later reverted
>> because of failures on big-endian.
>>> This tweaked version fixes the big-endian failures in
>> aarch64_expand_vector_init by picking the right
>>> element of VALS to move into the low part of the vector register
>> depending on endianness. The rest of the patch
>>> stays the same. I'm looking for approval on the aarch64 parts, as they
>> are the ones that have changed
>>> since the last approved version of the patch.
>>> ---
>>> In the testcase in this patch we create an SLP vector with only two
>>> elements. Our current vector initialisation code will first duplicate
>>> the first element to both lanes, then overwrite the top lane with a new
>>> value.
>>> This duplication can be clunky and wasteful.
>>> Better would be to simply use the fact that we will always be
>>> overwriting
>>> the remaining bits, and simply move the first element to the correct
>>> place
>>> (implicitly zeroing all other bits).
>>> This reduces the code generation for this case, and can allow more
>>> efficient addressing modes, and other second order benefits for AArch64
>>> code which has been vectorized to V2DI mode.
>>> Note that the change is generic enough to catch the case for any vector
>>> mode, but is expected to be most useful for 2x64-bit vectorization.
>>> Unfortunately, on its own, this would cause failures in
>>> gcc.target/aarch64/load_v2vec_lanes_1.c and
>>> gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more
>>> vec_merge and vec_duplicate for their simplifications to apply. To fix
>>> this,
>>> add a special case to the AArch64 code if we are loading from two memory
>>> addresses, and use the load_pair_lanes patterns directly.
>>> We also need a new pattern in simplify-rtx.c:simplify_ternary_operation
>>> , to
>>> catch:
>>>  (vec_merge:OUTER
>>>     (vec_duplicate:OUTER x:INNER)
>>>     (subreg:OUTER y:INNER 0)
>>>     (const_int N))
>>> And simplify it to:
>>>  (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
>>> This is similar to the existing patterns which are tested in this
>>> function,
>>> without requiring the second operand to also be a vec_duplicate.
>>> Bootstrapped and tested on aarch64-none-linux-gnu and tested on
>>> aarch64-none-elf.
>>> Note that this requires
>>> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
>>> if we don't want to ICE creating broken vector zero extends.
>>> Are the non-AArch64 parts OK?
>> Is (vec_merge (subreg ..) (vec_duplicate)) canonicalized to the form
>> you handle?  I see the (vec_merge (vec_duplicate...) (vec_concat)) case
>> also doesn't handle the swapped operand case.
>>
>> Otherwise the middle-end parts looks ok.
> 
> I don't see any explicit canonicalisation code for it.
> I've updated the simplify-rtx part to handle the swapped operand case.
> Is the attached patch better in this regard? I couldn't think of a clean
> way to avoid
> duplicating some logic (beyond creating a new function away from the
> callsite).
> 
> Thanks,
> Kyrill
> 
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> James
>>> ---
>>> 2018-05-15  James Greenhalgh  
>>>    Kyrylo Tkachov  
>>>    * config/aarch64/aarch64.c (aarch64_expand_vector_init):
>>> Modify
>>>    code generation for cases where splatting a value is not
>>> useful.
>>>    * simplify-rtx.c (simplify_ternary_operation): Simplify
>>>    vec_merge across a vec_duplicate and a paradoxical subreg
>> forming a vector
>>>    mode to a vec_concat.
>>> 2018-05-15  James Greenhalgh  
>>>    * gcc.target/aarch64/vect-slp-dup.c: New.
> 

I'm surprised we don't seem to have a function in the compiler that
performs this check:

+ && rtx_equal_p (XEXP (x1, 0),
+ plus_constant (Pmode,
+XEXP (x0, 0),
+GET_MODE_SIZE (inner_mode)))

Without generating dead RTL (plus_constant will rarely be able to return
a subexpression of the original pattern).  I would have thought this
sort of test was not that uncommon.

However, I don't think that needs to hold up this patch.

OK.

R.
> 
> vec-splat.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> a2003fe52875f1653d644347bafd7773d1f01e91..6bf6c05535b61eef1021d46bcd8448fb3a0b25f4
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13916,9 +13916,54 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>   maxv = matches[i][1];
> }
>  
> -  /* C

Re: [PATCH][AARCH64][PR target/84882] Add mno-strict-align

2018-05-17 Thread Kyrill Tkachov

Hi Sudi,

On 27/03/18 13:58, Sudakshina Das wrote:

Hi

This patch adds the "no" variant of -mstrict-align and the corresponding
function attribute. To enable the function attribute, I have modified
aarch64_can_inline_p () to allow checks even when the callee function
has no attribute. The need for this is shown by the new test
target_attr_18.c.

Testing: Bootstrapped, regtested and added new tests that are copies
of earlier tests checking -mstrict-align with opposite scan directives.

Is this ok for trunk?



This looks ok to me but you'll need approval from a maintainer.
Please put the PR marker in your ChangeLog so that the svn hook picks it up on 
commit.

Thanks,
Kyrill


Sudi


*** gcc/ChangeLog ***

2018-03-27  Sudakshina Das  

* common/config/aarch64/aarch64-common.c (aarch64_handle_option):
Check val before adding MASK_STRICT_ALIGN to opts->x_target_flags.
* config/aarch64/aarch64.opt (mstrict-align): Remove RejectNegative.
* config/aarch64/aarch64.c (aarch64_attributes): Mark allow_neg
as true for strict-align.
(aarch64_can_inline_p): Perform checks even when callee has no
attributes to check for strict alignment.
* doc/extend.texi (AArch64 Function Attributes): Document
no-strict-align.
* doc/invoke.texi: (AArch64 Options): Likewise.

*** gcc/testsuite/ChangeLog ***

2018-03-27  Sudakshina Das  

* gcc.target/aarch64/pr84882.c: New test.
* gcc.target/aarch64/target_attr_18.c: Likewise.




Re: [PATCH][GCC][AArch64] Correct 3 way XOR instructions adding missing patterns.

2018-05-17 Thread Kyrill Tkachov

Hi Tamar,

On 30/04/18 15:12, Tamar Christina wrote:

Hi All,

This patch adds the missing NEON intrinsics for all 128-bit vector integer
modes for the three-way XOR and negate-and-XOR instructions for Armv8.2-A
to Armv8.4-A.

Bootstrapped and regtested on aarch64-none-linux-gnu and no issues.

Ok for master? And for backport to the GCC-8 branch?



This looks ok to me and appropriate for trunk but you'll need approval from a 
maintainer.

Thanks,
Kyrill


gcc/
2018-04-30  Tamar Christina  

* config/aarch64/aarch64-simd.md (aarch64_eor3qv8hi): Change to
eor3q4.
(aarch64_bcaxqv8hi): Change to bcaxq4.
* config/aarch64/aarch64-simd-builtins.def (veor3q_u8, veor3q_u32,
veor3q_u64, veor3q_s8, veor3q_s16, veor3q_s32, veor3q_s64, vbcaxq_u8,
vbcaxq_u32, vbcaxq_u64, vbcaxq_s8, vbcaxq_s16, vbcaxq_s32,
vbcaxq_s64): New.
* config/aarch64/arm_neon.h: Likewise.
* config/aarch64/iterators.md (VQ_I): New.

gcc/testsuite/
2018-04-30  Tamar Christina  

* gcc.target/aarch64/sha3.h (veor3q_u8, veor3q_u32,
veor3q_u64, veor3q_s8, veor3q_s16, veor3q_s32, veor3q_s64, vbcaxq_u8,
vbcaxq_u32, vbcaxq_u64, vbcaxq_s8, vbcaxq_s16, vbcaxq_s32,
vbcaxq_s64): New.
* gcc.target/aarch64/sha3_1.c: Likewise.

Thanks,
Tamar

--




Re: [RFC][PR82479] missing popcount builtin detection

2018-05-17 Thread Bin.Cheng
On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah
 wrote:
> Hi Richard,
>
> On 6 March 2018 at 02:24, Richard Biener  wrote:
>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi Richard,
>>>
>>> On 1 February 2018 at 23:21, Richard Biener  
>>> wrote:
 On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah
  wrote:
> Hi Richard,
>
> On 31 January 2018 at 21:39, Richard Biener  
> wrote:
>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi Richard,
>>>
>>> Thanks for the review.
>>> On 25 January 2018 at 20:04, Richard Biener 
>>>  wrote:
 On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah
  wrote:
> Hi All,
>
> Here is a patch for popcount builtin detection similar to LLVM. I
> would like to queue this for review for next stage 1.
>
> 1. This is done part of loop-distribution and effective for -O3 and 
> above.
> 2. This does not distribute the loop to detect popcount (like
> memcpy/memmove). I don't think that happens in practice. Please correct
> me if I am wrong.

 But then it has no business inside loop distribution but instead is
 doing final value
 replacement, right?  You are pattern-matching the whole loop after 
 all.  I think
 final value replacement would already do the correct thing if you
 taught number of
 iteration analysis that the niter for

[local count: 955630224]:
   # b_11 = PHI 
   _1 = b_11 + -1;
   b_8 = _1 & b_11;
   if (b_8 != 0)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 850510900]:
   goto ; [100.00%]
>>>
>>> I am looking into this approach. What should be the scalar evolution
>>> for b_8 (i.e. b & (b - 1) in a loop)? This is not clear to me
>>> and can this be represented with the scev?
>>
>> No, it's not affine and thus cannot be represented.  You only need the
>> scalar evolution of the counting IV which is already handled and
>> the number of iteration analysis needs to handle the above IV - this
>> is the missing part.
> Thanks for the clarification. I am now matching this loop pattern in
> number_of_iterations_exit when number_of_iterations_exit_assumptions
> fails. If the pattern matches, I am inserting the __builtin_popcount in
> the loop preheader and setting the loop niter with this. This will be
> used by the final value replacement. Is this what you wanted?

 No, you shouldn't insert a popcount stmt but instead the niter
 GENERIC tree should be a CALL_EXPR to popcount with the
 appropriate argument.
>>>
>>> That's what I tried earlier but ran into some ICEs. I wasn't sure if
>>> niter in tree_niter_desc can take such.
>>>
>>> Attached patch now does this. Also had to add support for CALL_EXPR in
>>> few places to handle niter with CALL_EXPR. Does this look OK?
>>
>> Overall this looks ok - the patch includes changes in places that I don't 
>> think
>> need changes such as chrec_convert_1 or extract_ops_from_tree.
>> The expression_expensive_p change should be more specific than making
>> all calls inexpensive as well.
>
> Changed it.
>
>>
>> The verify_ssa change looks bogus, you do
>>
>> +  dest = gimple_phi_result (count_phi);
>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
>> +
>> +  var = build_call_expr (fn, 1, src);
>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
>> +   build_int_cst (TREE_TYPE (dest), 1));
>>
>> why do you allocate a new SSA name here?  It seems unused
>> as you overwrite 'var' with the CALL_EXPR immediately.
> Changed now.
>
>>
>> I didn't review the pattern matching thoroughly nor the exact place you
>> call it.  But
>>
>> +  if (check_popcount_pattern (loop, &count))
>> +   {
>> + niter->assumptions = boolean_false_node;
>> + niter->control.base = NULL_TREE;
>> + niter->control.step = NULL_TREE;
>> + niter->control.no_overflow = false;
>> + niter->niter = count;
>> + niter->assumptions = boolean_true_node;
>> + niter->may_be_zero = boolean_false_node;
>> + niter->max = -1;
>> + niter->bound = NULL_TREE;
>> + niter->cmp = ERROR_MARK;
>> + return true;
>> +   }
>>
>> simply setting may_be_zero to false looks fishy.
> Should I set this to (argument to popcount == zero)?
No, I think that's unnecessary.  The number of iterations is computed
as: may_be_zero ? 0 : niter;
Here niter can be ZERO even when may_be_zero is set to false, and niters
is still computed correctly.

I think the point is that may_be_zero being false doesn't imply that
niters is non-zero.

>
>> Try with -fno-tree-loop-ch.
> I chan
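For readers following this thread, the loop shape being pattern-matched above is the classic `b &= b - 1` idiom, whose trip count equals the population count of `b`. A plain illustration (not the patch's code) of the source loop that produces GIMPLE of that shape:

```cpp
#include <cassert>

// Kernighan's bit-counting loop: each iteration clears the lowest set
// bit of b, so the trip count equals the population count of b.  This
// is the shape the niter pattern matcher recognizes and replaces with
// a __builtin_popcount call.
int popcount_loop (unsigned int b)
{
  int count = 0;
  while (b != 0)
    {
      b &= b - 1;   // clear the lowest set bit
      count++;
    }
  return count;
}
```

Note that the loop runs zero times for `b == 0`, which is exactly why a false `may_be_zero` does not guarantee a non-zero niter.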

Re: [PATCH, aarch64] Patch to update pipeline descriptions in thunderx2t99.md

2018-05-17 Thread Richard Earnshaw (lists)
On 09/05/18 23:37, Steve Ellcey wrote:
> On Fri, 2018-05-04 at 14:05 -0700, Andrew Pinski wrote:
>>  
>>>    (thunderx2t99_loadpair): Fix cpu unit ordering.
>> I think the original ordering was correct.  The address calculation
>> happens before the actual load.
>> thunderx2t99_asimd_load1_ldp would have a similar issue.
>>
>> Thanks,
>> Andrew
> 
> OK, I checked into that and undid the change to thunderx2t99_loadpair
> and fixed thunderx2t99_asimd_load1_ldp to match it.  Everything else is
> the same.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 2018-05-09  Steve Ellcey  
> 
>   * config/aarch64/thunderx2t99.md (thunderx2t99_ls_both): Delete.
>   (thunderx2t99_multiple): Delete pseudo-units from used cpus.
>   Add untyped.
>   (thunderx2t99_alu_shift): Remove alu_shift_reg, alus_shift_reg.
>   Change logics_shift_reg to logics_shift_imm.
>   (thunderx2t99_fp_loadpair_basic): Delete.
>   (thunderx2t99_fp_storepair_basic): Delete.
>   (thunderx2t99_asimd_int): Add neon_sub and neon_sub_q types.
>   (thunderx2t99_asimd_polynomial): Delete.
>   (thunderx2t99_asimd_fp_simple): Add neon_fp_mul_s_scalar_q
>   and neon_fp_mul_d_scalar_q.
>   (thunderx2t99_asimd_fp_conv): Add *int_to_fp* types.
>   (thunderx2t99_asimd_misc): Delete neon_dup and neon_dup_q.
>   (thunderx2t99_asimd_recip_step): Add missing *sqrt* types.
>   (thunderx2t99_asimd_lut): Add missing tbl types.
>   (thunderx2t99_asimd_ext): Delete.
>   (thunderx2t99_asimd_load1_1_mult): Delete.
>   (thunderx2t99_asimd_load1_2_mult): Delete.
>   (thunderx2t99_asimd_load1_ldp): New.
>   (thunderx2t99_asimd_load1): New.
>   (thunderx2t99_asimd_load2): Add missing *load2* types.
>   (thunderx2t99_asimd_load3): New.
>   (thunderx2t99_asimd_load4): New.
>   (thunderx2t99_asimd_store1_1_mult): Delete.
>   (thunderx2t99_asimd_store1_2_mult): Delete.
>   (thunderx2t99_asimd_store2_mult): Delete.
>   (thunderx2t99_asimd_store2_onelane): Delete.
>   (thunderx2t99_asimd_store_stp): New.
>   (thunderx2t99_asimd_store1): New.
>   (thunderx2t99_asimd_store2): New.
>   (thunderx2t99_asimd_store3): New.
>   (thunderx2t99_asimd_store4): New.
> 
> 

OK.

R.

> t99-sched.patch
> 
> 
> diff --git a/gcc/config/aarch64/thunderx2t99.md 
> b/gcc/config/aarch64/thunderx2t99.md
> index 589e564..fb71de5 100644
> --- a/gcc/config/aarch64/thunderx2t99.md
> +++ b/gcc/config/aarch64/thunderx2t99.md
> @@ -54,8 +54,6 @@
>  (define_reservation "thunderx2t99_ls01" "thunderx2t99_ls0|thunderx2t99_ls1")
>  (define_reservation "thunderx2t99_f01" "thunderx2t99_f0|thunderx2t99_f1")
>  
> -(define_reservation "thunderx2t99_ls_both" 
> "thunderx2t99_ls0+thunderx2t99_ls1")
> -
>  ; A load with delay in the ls0/ls1 pipes.
>  (define_reservation "thunderx2t99_l0delay" "thunderx2t99_ls0,\
> thunderx2t99_ls0d1,thunderx2t99_ls0d2,\
> @@ -86,12 +84,10 @@
>  
>  (define_insn_reservation "thunderx2t99_multiple" 1
>(and (eq_attr "tune" "thunderx2t99")
> -   (eq_attr "type" "multiple"))
> +   (eq_attr "type" "multiple,untyped"))
>"thunderx2t99_i0+thunderx2t99_i1+thunderx2t99_i2+thunderx2t99_ls0+\
> thunderx2t99_ls1+thunderx2t99_sd+thunderx2t99_i1m1+thunderx2t99_i1m2+\
> -   
> thunderx2t99_i1m3+thunderx2t99_ls0d1+thunderx2t99_ls0d2+thunderx2t99_ls0d3+\
> -   thunderx2t99_ls1d1+thunderx2t99_ls1d2+thunderx2t99_ls1d3+thunderx2t99_f0+\
> -   thunderx2t99_f1")
> +   thunderx2t99_i1m3+thunderx2t99_f0+thunderx2t99_f1")
>  
>  ;; Integer arithmetic/logic instructions.
>  
> @@ -113,9 +109,9 @@
>  
>  (define_insn_reservation "thunderx2t99_alu_shift" 2
>(and (eq_attr "tune" "thunderx2t99")
> -   (eq_attr "type" "alu_shift_imm,alu_ext,alu_shift_reg,\
> - alus_shift_imm,alus_ext,alus_shift_reg,\
> - logic_shift_imm,logics_shift_reg"))
> +   (eq_attr "type" "alu_shift_imm,alu_ext,\
> + alus_shift_imm,alus_ext,\
> + logic_shift_imm,logics_shift_imm"))
>"thunderx2t99_i012,thunderx2t99_i012")
>  
>  (define_insn_reservation "thunderx2t99_div" 13
> @@ -228,21 +224,11 @@
> (eq_attr "type" "f_loads,f_loadd"))
>"thunderx2t99_ls01")
>  
> -(define_insn_reservation "thunderx2t99_fp_loadpair_basic" 4
> -  (and (eq_attr "tune" "thunderx2t99")
> -   (eq_attr "type" "neon_load1_2reg"))
> -  "thunderx2t99_ls01*2")
> -
>  (define_insn_reservation "thunderx2t99_fp_store_basic" 1
>(and (eq_attr "tune" "thunderx2t99")
> (eq_attr "type" "f_stores,f_stored"))
>"thunderx2t99_ls01,thunderx2t99_sd")
>  
> -(define_insn_reservation "thunderx2t99_fp_storepair_basic" 1
> -  (and (eq_attr "tune" "thunderx2t99")
> -   (eq_attr "type" "neon_store1_2reg"))
> -  "thunderx2t99_ls01,(thunderx2t99_ls01+thunderx2t99_sd),thunderx2t99_sd")
> -
>  ;; ASIMD integer instructions.
>  
>  (define_insn_reservation "thunderx2t99_asimd_int" 7
> @@ -

Re: [PATCH PR85793]Fix ICE by loading vector(1) scalara_type for 1 element-wise case

2018-05-17 Thread Richard Biener
On Wed, May 16, 2018 at 5:13 PM Bin Cheng  wrote:

> Hi,
> This patch fixes an ICE by loading vector(1) scalar_type if it's 1
element-wise for VMAT_ELEMENTWISE.
> Bootstrap and test on x86_64 and AArch64 ongoing.  Is it OK?

OK.

Richard.

> Thanks,
> bin
> 2018-05-16  Bin Cheng  
>  Richard Biener  

>  PR tree-optimization/85793
>  * tree-vect-stmts.c (vectorizable_load): Handle 1 element-wise
load
>  for VMAT_ELEMENTWISE.

> gcc/testsuite
> 2018-05-16  Bin Cheng  

>  PR tree-optimization/85793
>  * gcc.dg/vect/pr85793.c: New test.


[PATCH 09/14] Remove cgraph_node::summary_uid and make cgraph_node::uid really unique.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-05-16  Martin Liska  

* cgraph.c (cgraph_node::remove): Do not recycle uid.
* cgraph.h (symbol_table::release_symbol): Do not pass uid.
(symbol_table::allocate_cgraph_symbol): Do not set uid.
* passes.c (uid_hash_t): Record removed_nodes by their uids.
(remove_cgraph_node_from_order): Use the removed_nodes set.
(do_per_function_toporder): Likewise.
* symbol-summary.h (symtab_insertion): Use cgraph_node::uid
instead of summary_uid.
(symtab_removal): Likewise.
(symtab_duplication): Likewise.

gcc/lto/ChangeLog:

2018-05-16  Martin Liska  

* lto-partition.c (lto_balanced_map): Use cgraph_node::uid
instead of summary_uid.
---
 gcc/cgraph.c|  3 +--
 gcc/cgraph.h| 20 ++--
 gcc/lto/lto-partition.c | 26 +++---
 gcc/passes.c| 37 -
 gcc/symbol-summary.h| 18 +-
 5 files changed, 39 insertions(+), 65 deletions(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 9a7d54d7cee..a24a5ffe521 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -1805,7 +1805,6 @@ void
 cgraph_node::remove (void)
 {
   cgraph_node *n;
-  int uid = this->uid;
 
   if (symtab->ipa_clones_dump_file && symtab->cloned_nodes.contains (this))
 fprintf (symtab->ipa_clones_dump_file,
@@ -1907,7 +1906,7 @@ cgraph_node::remove (void)
   instrumented_version = NULL;
 }
 
-  symtab->release_symbol (this, uid);
+  symtab->release_symbol (this);
 }
 
 /* Likewise indicate that a node is having address taken.  */
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index ee7ebb41c24..0e3b1a1785e 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1396,8 +1396,6 @@ public:
   int count_materialization_scale;
   /* Unique id of the node.  */
   int uid;
-  /* Summary unique id of the node.  */
-  int summary_uid;
   /* ID assigned by the profiling.  */
   unsigned int profile_id;
   /* Time profiler: first run of function.  */
@@ -2020,7 +2018,7 @@ public:
   friend class cgraph_node;
   friend class cgraph_edge;
 
-  symbol_table (): cgraph_max_summary_uid (1)
+  symbol_table (): cgraph_max_uid (1)
   {
   }
 
@@ -2080,9 +2078,8 @@ public:
   /* Allocate new callgraph node and insert it into basic data structures.  */
   cgraph_node *create_empty (void);
 
-  /* Release a callgraph NODE with UID and put in to the list
- of free nodes.  */
-  void release_symbol (cgraph_node *node, int uid);
+  /* Release a callgraph NODE.  */
+  void release_symbol (cgraph_node *node);
 
   /* Output all variables enqueued to be assembled.  */
   bool output_variables (void);
@@ -2230,7 +2227,6 @@ public:
 
   int cgraph_count;
   int cgraph_max_uid;
-  int cgraph_max_summary_uid;
 
   int edges_count;
   int edges_max_uid;
@@ -2598,7 +2594,7 @@ symbol_table::unregister (symtab_node *node)
 /* Release a callgraph NODE with UID and put in to the list of free nodes.  */
 
 inline void
-symbol_table::release_symbol (cgraph_node *node, int uid)
+symbol_table::release_symbol (cgraph_node *node)
 {
   cgraph_count--;
 
@@ -2606,7 +2602,6 @@ symbol_table::release_symbol (cgraph_node *node, int uid)
  list.  */
   memset (node, 0, sizeof (*node));
   node->type = SYMTAB_FUNCTION;
-  node->uid = uid;
   SET_NEXT_FREE_NODE (node, free_nodes);
   free_nodes = node;
 }
@@ -2624,12 +2619,9 @@ symbol_table::allocate_cgraph_symbol (void)
   free_nodes = NEXT_FREE_NODE (node);
 }
   else
-{
-  node = ggc_cleared_alloc<cgraph_node> ();
-  node->uid = cgraph_max_uid++;
-}
+node = ggc_cleared_alloc<cgraph_node> ();
 
-  node->summary_uid = cgraph_max_summary_uid++;
+  node->uid = cgraph_max_uid++;
   return node;
 }
 
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 76086a2ba2e..9049a372256 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -506,12 +506,10 @@ account_reference_p (symtab_node *n1, symtab_node *n2)
 void
 lto_balanced_map (int n_lto_partitions, int max_partition_size)
 {
-  int n_nodes = 0;
   int n_varpool_nodes = 0, varpool_pos = 0, best_varpool_pos = 0;
-  struct cgraph_node **order = XNEWVEC (cgraph_node *, symtab->cgraph_max_uid);
+  auto_vec<cgraph_node *> order (symtab->cgraph_count);
   auto_vec noreorder;
   auto_vec varpool_order;
-  int i;
   struct cgraph_node *node;
   int64_t original_total_size, total_size = 0;
   int64_t partition_size;
@@ -519,7 +517,7 @@ lto_balanced_map (int n_lto_partitions, int max_partition_size)
   int last_visited_node = 0;
   varpool_node *vnode;
   int64_t cost = 0, internal = 0;
-  int best_n_nodes = 0, best_i = 0;
+  unsigned int best_n_nodes = 0, best_i = 0;
   int64_t best_cost = -1, best_internal = 0, best_size = 0;
   int npartitions;
   int current_order = -1;
@@ -527,14 +525,14 @@ lto_balanced_map (int n_lto_partitions, int max_partition_size)
 
   FOR_EACH_VARIABLE (vnode)
 gcc_assert (!vnode->aux);
-
+
   FOR_EACH_DEFINED_FUNCTION (node)
 if (node-

[PATCH 13/14] Make cgraph_edge::uid really unique.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* cgraph.c (symbol_table::create_edge): Always assign a new
unique number.
(symbol_table::free_edge): Do not recycle numbers.
* cgraph.h (cgraph_edge::get): New method.
* symbol-summary.h (symtab_removal): Use it.
(symtab_duplication): Likewise.
(call_summary::hashable_uid): Remove.
---
 gcc/cgraph.c |  9 ++---
 gcc/cgraph.h | 14 +++---
 gcc/symbol-summary.h | 21 +++--
 3 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index a24a5ffe521..572c775c14c 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -846,13 +846,11 @@ symbol_table::create_edge (cgraph_node *caller, cgraph_node *callee,
   free_edges = NEXT_FREE_EDGE (edge);
 }
   else
-{
-  edge = ggc_alloc<cgraph_edge> ();
-  edge->uid = edges_max_uid++;
-}
+edge = ggc_alloc<cgraph_edge> ();
 
   edges_count++;
 
+  edge->m_uid = edges_max_uid++;
   edge->aux = NULL;
   edge->caller = caller;
   edge->callee = callee;
@@ -1006,14 +1004,11 @@ cgraph_edge::remove_caller (void)
 void
 symbol_table::free_edge (cgraph_edge *e)
 {
-  int uid = e->uid;
-
   if (e->indirect_info)
 ggc_free (e->indirect_info);
 
   /* Clear out the edge so we do not dangle pointers.  */
   memset (e, 0, sizeof (*e));
-  e->uid = uid;
   NEXT_FREE_EDGE (e) = free_edges;
   free_edges = e;
   edges_count--;
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0e3b1a1785e..1966893343d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1633,6 +1633,7 @@ struct GTY(()) cgraph_indirect_call_info
 struct GTY((chain_next ("%h.next_caller"), chain_prev ("%h.prev_caller"),
 	for_user)) cgraph_edge {
   friend class cgraph_node;
+  friend class symbol_table;
 
   /* Remove the edge in the cgraph.  */
   void remove (void);
@@ -1696,6 +1697,12 @@ struct GTY((chain_next ("%h.next_caller"), chain_prev ("%h.prev_caller"),
   /* Return true if the call can be hot.  */
   bool maybe_hot_p (void);
 
+  /* Get unique identifier of the edge.  */
+  inline int get_uid ()
+  {
+return m_uid;
+  }
+
   /* Rebuild cgraph edges for current function node.  This needs to be run after
  passes that don't update the cgraph.  */
   static unsigned int rebuild_edges (void);
@@ -1723,8 +1730,6 @@ struct GTY((chain_next ("%h.next_caller"), chain_prev ("%h.prev_caller"),
   /* The stmt_uid of call_stmt.  This is used by LTO to recover the call_stmt
  when the function is serialized in.  */
   unsigned int lto_stmt_uid;
-  /* Unique id of the edge.  */
-  int uid;
   /* Whether this edge was made direct by indirect inlining.  */
   unsigned int indirect_inlining_edge : 1;
   /* Whether this edge describes an indirect call with an undetermined
@@ -1768,6 +1773,9 @@ struct GTY((chain_next ("%h.next_caller"), chain_prev ("%h.prev_caller"),
   /* Expected frequency of executions within the function.  */
   sreal sreal_frequency ();
 private:
+  /* Unique id of the edge.  */
+  int m_uid;
+
   /* Remove the edge from the list of the callers of the callee.  */
   void remove_caller (void);
 
@@ -2018,7 +2026,7 @@ public:
   friend class cgraph_node;
   friend class cgraph_edge;
 
-  symbol_table (): cgraph_max_uid (1)
+  symbol_table (): cgraph_max_uid (1), edges_max_uid (1)
   {
   }
 
diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h
index 12e50201125..8c80f309372 100644
--- a/gcc/symbol-summary.h
+++ b/gcc/symbol-summary.h
@@ -375,19 +375,19 @@ public:
  If a summary for an edge does not exist, it will be created.  */
   T* get_create (cgraph_edge *edge)
   {
-return get (hashable_uid (edge), true);
+return get (edge->get_uid (), true);
   }
 
   /* Getter for summary callgraph edge pointer.  */
   T* get (cgraph_edge *edge)
   {
-return get (hashable_uid (edge), false);
+return get (edge->get_uid (), false);
   }
 
   /* Remove edge from summary.  */
   void remove (cgraph_edge *edge)
   {
-int uid = hashable_uid (edge);
+int uid = edge->get_uid ();
 T **v = m_map.get (uid);
 if (v)
   {
@@ -405,7 +405,7 @@ public:
   /* Return true if a summary for the given EDGE already exists.  */
   bool exists (cgraph_edge *edge)
   {
-return m_map.get (hashable_uid (edge)) != NULL;
+return m_map.get (edge->get_uid ()) != NULL;
   }
 
   /* Symbol removal hook that is registered to symbol table.  */
@@ -428,13 +428,6 @@ private:
   /* Getter for summary callgraph ID.  */
   T *get (int uid, bool lazy_insert);
 
-  /* Get a hashable uid of EDGE.  */
-  int hashable_uid (cgraph_edge *edge)
-  {
-/* Edge uids start at zero which our hash_map does not like.  */
-return edge->uid + 1;
-  }
-
   /* Main summary store, where summary ID is used as key.  */
   hash_map  m_map;
   /* Internal summary removal hook pointer.  */
@@ -511,7 +504,7 @@ call_summary::symtab_removal (cgraph_edge *edge, void *data)
 {
   call_summary *summary = (call_summary  *) (data);
 
-  int h_uid = summ
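To illustrate the scheme this patch adopts — edge storage may still be recycled through a free list, but uids are handed out monotonically starting from 1 and never reused, which is what lets the `hashable_uid ()` offset trick be dropped — here is a minimal self-contained sketch (toy code, not GCC's actual allocator):

```cpp
#include <cassert>
#include <vector>

// Toy analogue of symbol_table::create_edge / free_edge after this
// patch: objects can be recycled via a free list, but every creation
// receives a fresh, never-reused uid.  Uids start at 1 so that 0 can
// stay reserved (e.g. as a hash-map empty key).
struct edge
{
  int uid;
};

struct edge_table
{
  int max_uid = 1;               // next uid to hand out
  std::vector<edge *> free_list;

  edge *create ()
  {
    edge *e;
    if (!free_list.empty ())
      {
        e = free_list.back ();   // recycle the storage...
        free_list.pop_back ();
      }
    else
      e = new edge ();
    e->uid = max_uid++;          // ...but always assign a brand-new uid
    return e;
  }

  void release (edge *e)
  {
    // Deliberately do NOT save e->uid: recycled storage gets a fresh
    // uid on the next create, mirroring the removal of the old
    // "int uid = e->uid; ... e->uid = uid;" dance.
    free_list.push_back (e);
  }
};
```

With this, a summary keyed by uid can never confuse a dead edge with a newly created one that reuses its storage.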

[PATCH 01/14] Code refactoring of symtab_summary.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* symbol-summary.h (function_summary): Move constructor
implementation out of class declaration.
(release): Likewise.
(symtab_insertion): Likewise.
(symtab_removal): Likewise.
(symtab_duplication): Likewise.
(get): Likewise.
---
 gcc/symbol-summary.h | 219 ++-
 1 file changed, 130 insertions(+), 89 deletions(-)

diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h
index d11b70b5946..13f8f04342a 100644
--- a/gcc/symbol-summary.h
+++ b/gcc/symbol-summary.h
@@ -31,26 +31,23 @@ private:
   function_summary();
 };
 
+/* Function summary is a helper class that is used to associate a data structure
+   with a callgraph node.  Typical usage can be seen in IPA passes which
+   create temporary pass-related structures.  The summary class registers
+   hooks that are triggered when a node is inserted, duplicated or deleted.
+   A user of the summary class can override the virtual methods that are
+   invoked when such a hook is triggered.  Apart from a callgraph node, the
+   user is given the data structure tied to the node.
+
+   The function summary class can work both with heap-allocated memory and
+   with garbage-collected memory.  */
+
 template 
 class GTY((user)) function_summary 
 {
 public:
   /* Default construction takes SYMTAB as an argument.  */
-  function_summary (symbol_table *symtab, bool ggc = false): m_ggc (ggc),
-m_insertion_enabled (true), m_released (false), m_map (13, ggc),
-m_symtab (symtab)
-  {
-m_symtab_insertion_hook =
-  symtab->add_cgraph_insertion_hook
-  (function_summary::symtab_insertion, this);
-
-m_symtab_removal_hook =
-  symtab->add_cgraph_removal_hook
-  (function_summary::symtab_removal, this);
-m_symtab_duplication_hook =
-  symtab->add_cgraph_duplication_hook
-  (function_summary::symtab_duplication, this);
-  }
+  function_summary (symbol_table *symtab, bool ggc = false);
 
   /* Destructor.  */
   virtual ~function_summary ()
@@ -59,22 +56,7 @@ public:
   }
 
   /* Destruction method that can be called for GGT purpose.  */
-  void release ()
-  {
-if (m_released)
-  return;
-
-m_symtab->remove_cgraph_insertion_hook (m_symtab_insertion_hook);
-m_symtab->remove_cgraph_removal_hook (m_symtab_removal_hook);
-m_symtab->remove_cgraph_duplication_hook (m_symtab_duplication_hook);
-
-/* Release all summaries.  */
-typedef typename hash_map ::iterator map_iterator;
-for (map_iterator it = m_map.begin (); it != m_map.end (); ++it)
-  release ((*it).second);
-
-m_released = true;
-  }
+  void release ();
 
  /* Traverses all summaries with a function F called with
  ARG as argument.  */
@@ -102,16 +84,7 @@ public:
   }
 
   /* Release an item that is stored within map.  */
-  void release (T *item)
-  {
-if (m_ggc)
-  {
-	item->~T ();
-	ggc_free (item);
-  }
-else
-  delete item;
-  }
+  void release (T *item);
 
   /* Getter for summary callgraph node pointer.  */
   T* get (cgraph_node *node)
@@ -145,50 +118,14 @@ public:
   }
 
   /* Symbol insertion hook that is registered to symbol table.  */
-  static void symtab_insertion (cgraph_node *node, void *data)
-  {
-gcc_checking_assert (node->summary_uid);
-function_summary *summary = (function_summary  *) (data);
-
-if (summary->m_insertion_enabled)
-  summary->insert (node, summary->get (node));
-  }
+  static void symtab_insertion (cgraph_node *node, void *data);
 
   /* Symbol removal hook that is registered to symbol table.  */
-  static void symtab_removal (cgraph_node *node, void *data)
-  {
-gcc_checking_assert (node->summary_uid);
-function_summary *summary = (function_summary  *) (data);
-
-int summary_uid = node->summary_uid;
-T **v = summary->m_map.get (summary_uid);
-
-if (v)
-  {
-	summary->remove (node, *v);
-	summary->release (*v);
-	summary->m_map.remove (summary_uid);
-  }
-  }
+  static void symtab_removal (cgraph_node *node, void *data);
 
   /* Symbol duplication hook that is registered to symbol table.  */
   static void symtab_duplication (cgraph_node *node, cgraph_node *node2,
-  void *data)
-  {
-function_summary *summary = (function_summary  *) (data);
-T **v = summary->m_map.get (node->summary_uid);
-
-gcc_checking_assert (node2->summary_uid > 0);
-
-if (v)
-  {
-	/* This load is necessary, because we insert a new value!  */
-	T *data = *v;
-	T *duplicate = summary->allocate_new ();
-	summary->m_map.put (node2->summary_uid, duplicate);
-	summary->duplicate (node, node2, data, duplicate);
-  }
-  }
+  void *data);
 
 protected:
   /* Indication if we use ggc summary.  */
@@ -198,15 +135,7 @@ private:
   typedef int_hash  map_hash;
 
   /* Getter for summary callgraph ID.  */
-  T* get (int uid)
-  {
-bool existed;
-T **v = &m_map.get_o
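For readers unfamiliar with the class being refactored: the essence of `function_summary` is a per-node data map keyed by the node's id, exposing `get`, `get_create` and `remove`. A stand-alone analogue (heavily simplified, not GCC code) looks like:

```cpp
#include <cassert>
#include <unordered_map>

// Toy stand-in for cgraph_node: only the uid matters here.
struct node { int uid; };

// Toy per-node payload, standing in for an IPA pass's summary struct.
struct data { int size = 0; };

// Simplified function_summary: per-node data keyed by uid, with a
// null-returning "get", a lazily-inserting "get_create", and "remove".
template <typename T>
class summary
{
public:
  T *get (node *n)
  {
    auto it = m_map.find (n->uid);
    return it == m_map.end () ? nullptr : it->second;
  }

  T *get_create (node *n)
  {
    T *&slot = m_map[n->uid];   // operator[] value-initializes to nullptr
    if (slot == nullptr)
      slot = new T ();
    return slot;
  }

  void remove (node *n)
  {
    auto it = m_map.find (n->uid);
    if (it != m_map.end ())
      {
        delete it->second;
        m_map.erase (it);
      }
  }

private:
  std::unordered_map<int, T *> m_map;
};
```

The real class additionally registers insertion/removal/duplication hooks with the symbol table so the map stays consistent as the callgraph mutates; that is the machinery the patch moves out of the class declaration.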

[PATCH 06/14] Use symtab_summary in ipa-reference.c.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* ipa-reference.c (remove_node_data): Remove.
(duplicate_node_data): Likewise.
(class ipa_ref_var_info_summary_t): New class.
(class ipa_ref_opt_summary_t): Likewise.
(get_reference_vars_info): Use ipa_ref_var_info_summaries.
(get_reference_optimization_summary): Use
ipa_ref_opt_sum_summaries.
(set_reference_vars_info): Remove.
(set_reference_optimization_summary): Likewise.
(ipa_init): Create summaries.
(init_function_info): Use function summary.
(ipa_ref_opt_summary_t::duplicate): New function.
(ipa_ref_opt_summary_t::remove): New function.
(get_read_write_all_from_node): Fix GNU coding style.
(propagate): Use function summary.
(write_node_summary_p): Fix GNU coding style.
(stream_out_bitmap): Likewise.
(ipa_reference_read_optimization_summary): Use function summary.
(ipa_reference_c_finalize): Do not release hooks.
---
 gcc/ipa-reference.c | 205 
 1 file changed, 95 insertions(+), 110 deletions(-)

diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
index 6490c03f8d0..9a9e94c3414 100644
--- a/gcc/ipa-reference.c
+++ b/gcc/ipa-reference.c
@@ -49,12 +49,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "splay-tree.h"
 #include "ipa-utils.h"
 #include "ipa-reference.h"
-
-static void remove_node_data (struct cgraph_node *node,
-			  void *data ATTRIBUTE_UNUSED);
-static void duplicate_node_data (struct cgraph_node *src,
- struct cgraph_node *dst,
- void *data ATTRIBUTE_UNUSED);
+#include "symbol-summary.h"
 
 /* The static variables defined within the compilation unit that are
loaded or stored directly by function that owns this structure.  */
@@ -84,9 +79,10 @@ struct ipa_reference_optimization_summary_d
   bitmap statics_not_written;
 };
 
-typedef struct ipa_reference_local_vars_info_d *ipa_reference_local_vars_info_t;
-typedef struct ipa_reference_global_vars_info_d *ipa_reference_global_vars_info_t;
-typedef struct ipa_reference_optimization_summary_d *ipa_reference_optimization_summary_t;
+typedef ipa_reference_local_vars_info_d *ipa_reference_local_vars_info_t;
+typedef ipa_reference_global_vars_info_d *ipa_reference_global_vars_info_t;
+typedef ipa_reference_optimization_summary_d *
+  ipa_reference_optimization_summary_t;
 
 struct ipa_reference_vars_info_d
 {
@@ -114,57 +110,55 @@ static bitmap_obstack local_info_obstack;
 /* Obstack holding global analysis live forever.  */
 static bitmap_obstack optimization_summary_obstack;
 
-/* Holders of ipa cgraph hooks: */
-static struct cgraph_2node_hook_list *node_duplication_hook_holder;
-static struct cgraph_node_hook_list *node_removal_hook_holder;
+class ipa_ref_var_info_summary_t: public function_summary
+			  
+{
+public:
+  ipa_ref_var_info_summary_t (symbol_table *symtab):
+function_summary  (symtab) {}
+};
 
-/* Vector where the reference var infos are actually stored. 
-   Indexed by UID of call graph nodes.  */
-static vec ipa_reference_vars_vector;
+static ipa_ref_var_info_summary_t *ipa_ref_var_info_summaries = NULL;
 
-/* TODO: find a place where we should release the vector.  */
-static vec ipa_reference_opt_sum_vector;
+class ipa_ref_opt_summary_t: public function_summary
+			 
+{
+public:
+  ipa_ref_opt_summary_t (symbol_table *symtab):
+function_summary  (symtab) {}
+
+
+  virtual void remove (cgraph_node *src_node,
+		   ipa_reference_optimization_summary_d *data);
+  virtual void duplicate (cgraph_node *src_node, cgraph_node *dst_node,
+			  ipa_reference_optimization_summary_d *src_data,
+			  ipa_reference_optimization_summary_d *dst_data);
+};
+
+static ipa_ref_opt_summary_t *ipa_ref_opt_sum_summaries = NULL;
 
 /* Return the ipa_reference_vars structure starting from the cgraph NODE.  */
 static inline ipa_reference_vars_info_t
 get_reference_vars_info (struct cgraph_node *node)
 {
-  if (!ipa_reference_vars_vector.exists ()
-  || ipa_reference_vars_vector.length () <= (unsigned int) node->uid)
+  if (ipa_ref_var_info_summaries == NULL)
 return NULL;
-  return ipa_reference_vars_vector[node->uid];
+
+  ipa_reference_vars_info_t v = ipa_ref_var_info_summaries->get (node);
+  return v == NULL ? NULL : v;
 }
 
 /* Return the ipa_reference_vars structure starting from the cgraph NODE.  */
 static inline ipa_reference_optimization_summary_t
 get_reference_optimization_summary (struct cgraph_node *node)
 {
-  if (!ipa_reference_opt_sum_vector.exists ()
-  || (ipa_reference_opt_sum_vector.length () <= (unsigned int) node->uid))
+  if (ipa_ref_opt_sum_summaries == NULL)
 return NULL;
-  return ipa_reference_opt_sum_vector[node->uid];
-}
 
-/* Return the ipa_reference_vars structure starting from the cgraph NODE.  */
-static inline void
-set_reference_vars_info (struct cgraph_node *node,
-			 ipa_reference_vars_info_t info)
-{
-

[PATCH 05/14] Use summaries->get where possible. Small refactoring of multiple calls.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* ipa-fnsummary.c (dump_ipa_call_summary): Use ::get method.
(analyze_function_body): Extract multiple calls of get_create.
* ipa-inline-analysis.c (simple_edge_hints): Likewise.
* ipa-inline.c (recursive_inlining): Use ::get method.
* ipa-inline.h (estimate_edge_growth): Likewise.
---
 gcc/ipa-fnsummary.c   | 14 +++---
 gcc/ipa-inline-analysis.c |  2 +-
 gcc/ipa-inline.c  |  8 
 gcc/ipa-inline.h  |  7 +++
 4 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 8a6c5d0b5d8..e40b537bf61 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -850,7 +850,7 @@ dump_ipa_call_summary (FILE *f, int indent, struct cgraph_node *node,
 	  }
   if (!edge->inline_failed)
 	{
-	  ipa_fn_summary *s = ipa_fn_summaries->get_create (callee);
+	  ipa_fn_summary *s = ipa_fn_summaries->get (callee);
 	  fprintf (f, "%*sStack frame offset %i, callee self size %i,"
 		   " callee size %i\n",
 		   indent + 2, "",
@@ -2363,10 +2363,9 @@ analyze_function_body (struct cgraph_node *node, bool early)
 	}
 	  free (body);
 	}
-  set_hint_predicate (&ipa_fn_summaries->get_create (node)->loop_iterations,
-			  loop_iterations);
-  set_hint_predicate (&ipa_fn_summaries->get_create (node)->loop_stride,
-			  loop_stride);
+  ipa_fn_summary *s = ipa_fn_summaries->get_create (node);
+  set_hint_predicate (&s->loop_iterations, loop_iterations);
+  set_hint_predicate (&s->loop_stride, loop_stride);
   scev_finalize ();
 }
   FOR_ALL_BB_FN (bb, my_function)
@@ -2384,8 +2383,9 @@ analyze_function_body (struct cgraph_node *node, bool early)
 	  e->aux = NULL;
 	}
 }
-  ipa_fn_summaries->get_create (node)->time = time;
-  ipa_fn_summaries->get_create (node)->self_size = size;
+  ipa_fn_summary *s = ipa_fn_summaries->get_create (node);
+  s->time = time;
+  s->self_size = size;
   nonconstant_names.release ();
   ipa_release_body_info (&fbi);
   if (opt_for_fn (node->decl, optimize))
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index c4f904730e6..2e30a6d15ba 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -126,7 +126,7 @@ simple_edge_hints (struct cgraph_edge *edge)
 			? edge->caller->global.inlined_to : edge->caller);
   struct cgraph_node *callee = edge->callee->ultimate_alias_target ();
   if (ipa_fn_summaries->get_create (to)->scc_no
-  && ipa_fn_summaries->get_create (to)->scc_no
+  && ipa_fn_summaries->get (to)->scc_no
 	 == ipa_fn_summaries->get_create (callee)->scc_no
   && !edge->recursive_p ())
 hints |= INLINE_HINT_same_scc;
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index b015db07d15..12f5ebfd582 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1559,10 +1559,10 @@ recursive_inlining (struct cgraph_edge *edge,
 fprintf (dump_file,
 	 "\n   Inlined %i times, "
 	 "body grown from size %i to %i, time %f to %f\n", n,
-	 ipa_fn_summaries->get_create (master_clone)->size,
-	 ipa_fn_summaries->get_create (node)->size,
-	 ipa_fn_summaries->get_create (master_clone)->time.to_double (),
-	 ipa_fn_summaries->get_create (node)->time.to_double ());
+	 ipa_fn_summaries->get (master_clone)->size,
+	 ipa_fn_summaries->get (node)->size,
+	 ipa_fn_summaries->get (master_clone)->time.to_double (),
+	 ipa_fn_summaries->get (node)->time.to_double ());
 
   /* Remove master clone we used for inlining.  We rely that clones inlined
  into master clone gets queued just before master clone so we don't
diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
index e8ae206d7b7..06bd38e551e 100644
--- a/gcc/ipa-inline.h
+++ b/gcc/ipa-inline.h
@@ -81,10 +81,9 @@ estimate_edge_size (struct cgraph_edge *edge)
 static inline int
 estimate_edge_growth (struct cgraph_edge *edge)
 {
-  gcc_checking_assert (ipa_call_summaries->get_create (edge)->call_stmt_size
-		   || !edge->callee->analyzed);
-  return (estimate_edge_size (edge)
-	  - ipa_call_summaries->get_create (edge)->call_stmt_size);
+  ipa_call_summary *s = ipa_call_summaries->get_create (edge);
+  gcc_checking_assert (s->call_stmt_size || !edge->callee->analyzed);
+  return (estimate_edge_size (edge) - s->call_stmt_size);
 }
 
 /* Return estimated callee runtime increase after inlining


[PATCH 10/14] Add call_summary::get method and m_initialize_when_cloning.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* symbol-summary.h (get): New function.
(call_summary::m_initialize_when_cloning): New class member.
---
 gcc/symbol-summary.h | 66 +++-
 1 file changed, 50 insertions(+), 16 deletions(-)

diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h
index dda3ae5718f..4896c97a1cd 100644
--- a/gcc/symbol-summary.h
+++ b/gcc/symbol-summary.h
@@ -326,7 +326,8 @@ class GTY((user)) call_summary 
 public:
   /* Default construction takes SYMTAB as an argument.  */
   call_summary (symbol_table *symtab, bool ggc = false): m_ggc (ggc),
-m_map (13, ggc), m_released (false), m_symtab (symtab)
+m_initialize_when_cloning (false), m_map (13, ggc), m_released (false),
+m_symtab (symtab)
   {
 m_symtab_removal_hook =
   symtab->add_edge_removal_hook
@@ -374,7 +375,13 @@ public:
  If a summary for an edge does not exist, it will be created.  */
   T* get_create (cgraph_edge *edge)
   {
-return get_create (hashable_uid (edge));
+return get (hashable_uid (edge), true);
+  }
+
+  /* Getter for summary callgraph edge pointer.  */
+  T* get (cgraph_edge *edge)
+  {
+return get (hashable_uid (edge), false);
   }
 
   /* Return number of elements handled by data structure.  */
@@ -400,19 +407,14 @@ protected:
   /* Indication if we use ggc summary.  */
   bool m_ggc;
 
+  /* Initialize summary for an edge that is cloned.  */
+  bool m_initialize_when_cloning;
+
 private:
   typedef int_hash  map_hash;
 
   /* Getter for summary callgraph ID.  */
-  T* get_create (int uid)
-  {
-bool existed;
-T **v = &m_map.get_or_insert (uid, &existed);
-if (!existed)
-  *v = allocate_new ();
-
-return *v;
-  }
+  T *get (int uid, bool lazy_insert);
 
   /* Get a hashable uid of EDGE.  */
   int hashable_uid (cgraph_edge *edge)
@@ -438,6 +440,28 @@ private:
   gt_pointer_operator, void *);
 };
 
+template 
+T*
+call_summary::get (int uid, bool lazy_insert)
+{
+  gcc_checking_assert (uid > 0);
+
+  if (lazy_insert)
+{
+  bool existed;
+  T **v = &m_map.get_or_insert (uid, &existed);
+  if (!existed)
+	*v = allocate_new ();
+
+  return *v;
+}
+  else
+{
+  T **v = m_map.get (uid);
+  return v == NULL ? NULL : *v;
+}
+}
+
 template 
 void
 call_summary::release ()
@@ -492,15 +516,25 @@ call_summary::symtab_duplication (cgraph_edge *edge1,
    cgraph_edge *edge2, void *data)
 {
   call_summary *summary = (call_summary  *) (data);
-  T **v = summary->m_map.get (summary->hashable_uid (edge1));
+  T *edge1_summary = NULL;
 
-  if (v)
+  if (summary->m_initialize_when_cloning)
+edge1_summary = summary->get_create (edge1);
+  else
+{
+  T **v = summary->m_map.get (summary->hashable_uid (edge1));
+  if (v)
+	{
+	  /* This load is necessary, because we insert a new value!  */
+	  edge1_summary = *v;
+	}
+}
+
+  if (edge1_summary)
 {
-  /* This load is necessary, because we insert a new value!  */
-  T *data = *v;
   T *duplicate = summary->allocate_new ();
   summary->m_map.put (summary->hashable_uid (edge2), duplicate);
-  summary->duplicate (edge1, edge2, data, duplicate);
+  summary->duplicate (edge1, edge2, edge1_summary, duplicate);
 }
 }
 


[PATCH 11/14] Port IPA CP to edge_clone_summaries.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* ipa-cp.c (class edge_clone_summary): New summary.
(grow_edge_clone_vectors): Remove.
(ipcp_edge_duplication_hook): Remove.
(class edge_clone_summary_t): New call_summary class.
(ipcp_edge_removal_hook): Remove.
(edge_clone_summary_t::duplicate): New function.
(get_next_cgraph_edge_clone): Use edge_clone_summaries.
(create_specialized_node): Likewise.
(ipcp_driver): Initialize edge_clone_summaries and do not
register hooks.
---
 gcc/ipa-cp.c | 102 ---
 1 file changed, 49 insertions(+), 53 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index d8b04d14310..ed756c5ccf2 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3392,54 +3392,56 @@ ipcp_discover_new_direct_edges (struct cgraph_node *node,
 ipa_update_overall_fn_summary (node);
 }
 
-/* Vector of pointers which for linked lists of clones of an original crgaph
-   edge. */
+class edge_clone_summary;
+static call_summary  *edge_clone_summaries = NULL;
 
-static vec next_edge_clone;
-static vec prev_edge_clone;
+/* Edge clone summary.  */
 
-static inline void
-grow_edge_clone_vectors (void)
+struct edge_clone_summary
 {
-  if (next_edge_clone.length ()
-  <=  (unsigned) symtab->edges_max_uid)
-next_edge_clone.safe_grow_cleared (symtab->edges_max_uid + 1);
-  if (prev_edge_clone.length ()
-  <=  (unsigned) symtab->edges_max_uid)
-prev_edge_clone.safe_grow_cleared (symtab->edges_max_uid + 1);
-}
+  /* Default constructor.  */
+  edge_clone_summary (): prev_clone (NULL), next_clone (NULL) {}
 
-/* Edge duplication hook to grow the appropriate linked list in
-   next_edge_clone. */
+  /* Default destructor.  */
+  ~edge_clone_summary ()
+  {
+if (prev_clone)
+  edge_clone_summaries->get (prev_clone)->next_clone = next_clone;
+if (next_clone)
+  edge_clone_summaries->get (next_clone)->prev_clone = prev_clone;
+  }
 
-static void
-ipcp_edge_duplication_hook (struct cgraph_edge *src, struct cgraph_edge *dst,
-			void *)
-{
-  grow_edge_clone_vectors ();
+  cgraph_edge *prev_clone;
+  cgraph_edge *next_clone;
+};
 
-  struct cgraph_edge *old_next = next_edge_clone[src->uid];
-  if (old_next)
-prev_edge_clone[old_next->uid] = dst;
-  prev_edge_clone[dst->uid] = src;
+class edge_clone_summary_t:
+  public call_summary 
+{
+public:
+  edge_clone_summary_t (symbol_table *symtab):
+call_summary  (symtab)
+{
+  m_initialize_when_cloning = true;
+}
 
-  next_edge_clone[dst->uid] = old_next;
-  next_edge_clone[src->uid] = dst;
-}
+  virtual void duplicate (cgraph_edge *src_edge, cgraph_edge *dst_edge,
+			  edge_clone_summary *src_data,
+			  edge_clone_summary *dst_data);
+};
 
-/* Hook that is called by cgraph.c when an edge is removed.  */
+/* Edge duplication hook.  */
 
-static void
-ipcp_edge_removal_hook (struct cgraph_edge *cs, void *)
+void
+edge_clone_summary_t::duplicate (cgraph_edge *src_edge, cgraph_edge *dst_edge,
+ edge_clone_summary *src_data,
+ edge_clone_summary *dst_data)
 {
-  grow_edge_clone_vectors ();
-
-  struct cgraph_edge *prev = prev_edge_clone[cs->uid];
-  struct cgraph_edge *next = next_edge_clone[cs->uid];
-  if (prev)
-next_edge_clone[prev->uid] = next;
-  if (next)
-prev_edge_clone[next->uid] = prev;
+  if (src_data->next_clone)
+edge_clone_summaries->get (src_data->next_clone)->prev_clone = dst_edge;
+  dst_data->prev_clone = src_edge;
+  dst_data->next_clone = src_data->next_clone;
+  src_data->next_clone = dst_edge;
 }
 
 /* See if NODE is a clone with a known aggregate value at a given OFFSET of a
@@ -3567,7 +3569,8 @@ cgraph_edge_brings_value_p (cgraph_edge *cs,
 static inline struct cgraph_edge *
 get_next_cgraph_edge_clone (struct cgraph_edge *cs)
 {
-  return next_edge_clone[cs->uid];
+  edge_clone_summary *s = edge_clone_summaries->get (cs);
+  return s != NULL ? s->next_clone : NULL;
 }
 
 /* Given VAL that is intended for DEST, iterate over all its sources and if any
@@ -3871,7 +3874,7 @@ create_specialized_node (struct cgraph_node *node,
   bool have_self_recursive_calls = !self_recursive_calls.is_empty ();
   for (unsigned j = 0; j < self_recursive_calls.length (); j++)
 {
-  cgraph_edge *cs = next_edge_clone[self_recursive_calls[j]->uid];
+  cgraph_edge *cs = get_next_cgraph_edge_clone (self_recursive_calls[j]);
   /* Cloned edges can disappear during cloning as speculation can be
 	 resolved, check that we have one and that it comes from the last
 	 cloning.  */
@@ -3881,8 +3884,8 @@ create_specialized_node (struct cgraph_node *node,
 	 edge would confuse this mechanism, so let's check that does not
 	 happen.  */
   gcc_checking_assert (!cs
-			   || !next_edge_clone[cs->uid]
-			   || next_edge_clone[cs->uid]->caller != new_node);
+			   || !get_next_cgraph_edge_clone (cs)
+			   || get_next_cgraph_edge_clone (cs)->caller != new_node);
 

[PATCH 03/14] Rename get methods in symbol-summary.h to get_create.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* config/i386/i386.c (ix86_can_inline_p): Use get_create instead
of get.
* hsa-common.c (hsa_summary_t::link_functions): Likewise.
(hsa_register_kernel): Likewise.
* hsa-common.h (hsa_gpu_implementation_p): Likewise.
* hsa-gen.c (hsa_get_host_function): Likewise.
(get_brig_function_name): Likewise.
(generate_hsa): Likewise.
(pass_gen_hsail::execute): Likewise.
* ipa-cp.c (ipcp_cloning_candidate_p): Likewise.
(devirtualization_time_bonus): Likewise.
(ipcp_propagate_stage): Likewise.
* ipa-fnsummary.c (redirect_to_unreachable): Likewise.
(edge_set_predicate): Likewise.
(evaluate_conditions_for_known_args): Likewise.
(evaluate_properties_for_edge): Likewise.
(ipa_fn_summary::reset): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(dump_ipa_call_summary): Likewise.
(ipa_dump_fn_summary): Likewise.
(analyze_function_body): Likewise.
(compute_fn_summary): Likewise.
(estimate_edge_devirt_benefit): Likewise.
(estimate_edge_size_and_time): Likewise.
(estimate_calls_size_and_time): Likewise.
(estimate_node_size_and_time): Likewise.
(inline_update_callee_summaries): Likewise.
(remap_edge_change_prob): Likewise.
(remap_edge_summaries): Likewise.
(ipa_merge_fn_summary_after_inlining): Likewise.
(ipa_update_overall_fn_summary): Likewise.
(read_ipa_call_summary): Likewise.
(inline_read_section): Likewise.
(write_ipa_call_summary): Likewise.
(ipa_fn_summary_write): Likewise.
(ipa_free_fn_summary): Likewise.
* ipa-hsa.c (process_hsa_functions): Likewise.
(ipa_hsa_write_summary): Likewise.
(ipa_hsa_read_section): Likewise.
* ipa-icf.c (sem_function::merge): Likewise.
* ipa-inline-analysis.c (simple_edge_hints): Likewise.
(do_estimate_edge_time): Likewise.
(estimate_size_after_inlining): Likewise.
(estimate_growth): Likewise.
(growth_likely_positive): Likewise.
* ipa-inline-transform.c (clone_inlined_nodes): Likewise.
(inline_call): Likewise.
* ipa-inline.c (caller_growth_limits): Likewise.
(can_inline_edge_p): Likewise.
(can_inline_edge_by_limits_p): Likewise.
(compute_uninlined_call_time): Likewise.
(compute_inlined_call_time): Likewise.
(want_inline_small_function_p): Likewise.
(edge_badness): Likewise.
(update_caller_keys): Likewise.
(update_callee_keys): Likewise.
(recursive_inlining): Likewise.
(inline_small_functions): Likewise.
(inline_to_all_callers_1): Likewise.
(dump_overall_stats): Likewise.
(early_inline_small_functions): Likewise.
(early_inliner): Likewise.
* ipa-inline.h (estimate_edge_growth): Likewise.
* ipa-profile.c (ipa_propagate_frequency_1): Likewise.
* ipa-prop.c (ipa_make_edge_direct_to_target): Likewise.
* ipa-prop.h (IPA_NODE_REF): Likewise.
(IPA_EDGE_REF): Likewise.
* ipa-pure-const.c (malloc_candidate_p): Likewise.
(propagate_malloc): Likewise.
* ipa-split.c (execute_split_functions): Likewise.
* symbol-summary.h: Rename get to get_create.
(get): Likewise.
(get_create): Likewise.
* tree-sra.c (ipa_sra_preliminary_function_checks): Likewise.

gcc/lto/ChangeLog:

2018-04-24  Martin Liska  

* lto-partition.c (add_symbol_to_partition_1): Use get_create instead
of get.
(undo_partition): Likewise.
(lto_balanced_map): Likewise.
---
 gcc/config/i386/i386.c |   2 +-
 gcc/hsa-common.c   |   6 +--
 gcc/hsa-common.h   |   3 +-
 gcc/hsa-gen.c  |  11 +++--
 gcc/ipa-cp.c   |   6 +--
 gcc/ipa-fnsummary.c| 112 -
 gcc/ipa-hsa.c  |  12 ++---
 gcc/ipa-icf.c  |   2 +-
 gcc/ipa-inline-analysis.c  |  21 +
 gcc/ipa-inline-transform.c |  12 ++---
 gcc/ipa-inline.c   |  72 +++--
 gcc/ipa-inline.h   |   4 +-
 gcc/ipa-profile.c  |   2 +-
 gcc/ipa-prop.c |   4 +-
 gcc/ipa-prop.h |   4 +-
 gcc/ipa-pure-const.c   |   6 +--
 gcc/ipa-split.c|   2 +-
 gcc/lto/lto-partition.c|   6 +--
 gcc/symbol-summary.h   |  22 +
 gcc/tree-sra.c |   2 +-
 20 files changed, 162 insertions(+), 149 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0c7a6b7d98f..cba192e6cfb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5803,7 +5803,7 @@ ix86_can_inline_p (tree caller, tree callee)
 	  for multi-versioning call optimization, so beware of
 	  ipa_fn_summaries not available.  */
 	   && (! ipa

[PATCH 12/14] Port edge_growth_cache to call_summary.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* ipa-inline-analysis.c (inline_edge_removal_hook): Remove.
(initialize_growth_caches): Remove.
(free_growth_caches): Likewise.
(do_estimate_edge_time): Use edge_growth_cache.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
* ipa-inline.c (reset_edge_caches): Likewise.
(recursive_inlining): Likewise.
(inline_small_functions): Likewise.
* ipa-inline.h (initialize_growth_caches): Remove.
(estimate_edge_size): Likewise.
(estimate_edge_time): Likewise.
(estimate_edge_hints): Likewise.
(reset_edge_growth_cache): Likewise.
* symbol-summary.h (call_summary::remove): New method.
---
 gcc/ipa-inline-analysis.c | 57 ---
 gcc/ipa-inline.c  | 31 --
 gcc/ipa-inline.h  | 44 ++--
 gcc/symbol-summary.h  | 12 ++
 4 files changed, 62 insertions(+), 82 deletions(-)

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 2e30a6d15ba..9a7267395ea 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -51,9 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 
 /* Cached node/edge growths.  */
-vec edge_growth_cache;
-static struct cgraph_edge_hook_list *edge_removal_hook_holder;
-
+call_summary *edge_growth_cache = NULL;
 
 /* Give initial reasons why inlining would fail on EDGE.  This gets either
nullified or usually overwritten by more precise reasons later.  */
@@ -80,40 +78,13 @@ initialize_inline_failed (struct cgraph_edge *e)
 }
 
 
-/* Keep edge cache consistent across edge removal.  */
-
-static void
-inline_edge_removal_hook (struct cgraph_edge *edge,
-			  void *data ATTRIBUTE_UNUSED)
-{
-  reset_edge_growth_cache (edge);
-}
-
-
-/* Initialize growth caches.  */
-
-void
-initialize_growth_caches (void)
-{
-  if (!edge_removal_hook_holder)
-edge_removal_hook_holder =
-  symtab->add_edge_removal_hook (&inline_edge_removal_hook, NULL);
-  if (symtab->edges_max_uid)
-edge_growth_cache.safe_grow_cleared (symtab->edges_max_uid);
-}
-
-
 /* Free growth caches.  */
 
 void
 free_growth_caches (void)
 {
-  if (edge_removal_hook_holder)
-{
-  symtab->remove_edge_removal_hook (edge_removal_hook_holder);
-  edge_removal_hook_holder = NULL;
-}
-  edge_growth_cache.release ();
+  delete edge_growth_cache;
+  edge_growth_cache = NULL;
 }
 
 /* Return hints derived from EDGE.   */
@@ -188,17 +159,17 @@ do_estimate_edge_time (struct cgraph_edge *edge)
   gcc_checking_assert (time >= 0);
 
   /* When caching, update the cache entry.  */
-  if (edge_growth_cache.exists ())
+  if (edge_growth_cache != NULL)
 {
   ipa_fn_summaries->get_create (edge->callee)->min_size = min_size;
-  if ((int) edge_growth_cache.length () <= edge->uid)
-	edge_growth_cache.safe_grow_cleared (symtab->edges_max_uid);
-  edge_growth_cache[edge->uid].time = time;
-  edge_growth_cache[edge->uid].nonspec_time = nonspec_time;
+  edge_growth_cache_entry *entry
+	= edge_growth_cache->get_create (edge);
+  entry->time = time;
+  entry->nonspec_time = nonspec_time;
 
-  edge_growth_cache[edge->uid].size = size + (size >= 0);
+  entry->size = size + (size >= 0);
   hints |= simple_edge_hints (edge);
-  edge_growth_cache[edge->uid].hints = hints + 1;
+  entry->hints = hints + 1;
 }
   return time;
 }
@@ -219,10 +190,10 @@ do_estimate_edge_size (struct cgraph_edge *edge)
 
   /* When we do caching, use do_estimate_edge_time to populate the entry.  */
 
-  if (edge_growth_cache.exists ())
+  if (edge_growth_cache != NULL)
 {
   do_estimate_edge_time (edge);
-  size = edge_growth_cache[edge->uid].size;
+  size = edge_growth_cache->get (edge)->size;
   gcc_checking_assert (size);
   return size - (size > 0);
 }
@@ -260,10 +231,10 @@ do_estimate_edge_hints (struct cgraph_edge *edge)
 
   /* When we do caching, use do_estimate_edge_time to populate the entry.  */
 
-  if (edge_growth_cache.exists ())
+  if (edge_growth_cache != NULL)
 {
   do_estimate_edge_time (edge);
-  hints = edge_growth_cache[edge->uid].hints;
+  hints = edge_growth_cache->get (edge)->hints;
   gcc_checking_assert (hints);
   return hints - 1;
 }
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 12f5ebfd582..97164716af8 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1287,9 +1287,10 @@ reset_edge_caches (struct cgraph_node *node)
   if (where->global.inlined_to)
 where = where->global.inlined_to;
 
-  for (edge = where->callers; edge; edge = edge->next_caller)
-if (edge->inline_failed)
-  reset_edge_growth_cache (edge);
+  if (edge_growth_cache != NULL)
+for (edge = where->callers; edge; edge = edge->next_caller)
+  if (edge->inline_failed)
+	edge_growt

[PATCH 14/14] Come up with cgraph_node::get_uid and make cgraph_node::uid private.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* cgraph.c (function_version_hasher::hash): Use
cgraph_node::get_uid ().
(function_version_hasher::equal): Likewise.
* cgraph.h (cgraph_node::get_uid): New method.
* ipa-inline.c (update_caller_keys): Use
cgraph_node::get_uid ().
(update_callee_keys): Likewise.
* ipa-utils.c (searchc): Likewise.
(ipa_reduced_postorder): Likewise.
* lto-cgraph.c (input_node): Likewise.
* passes.c (is_pass_explicitly_enabled_or_disabled): Likewise.
* symbol-summary.h (symtab_insertion): Likewise.
(symtab_removal): Likewise.
(symtab_duplication): Likewise.
* tree-pretty-print.c (dump_function_header): Likewise.
* tree-sra.c (convert_callers_for_node): Likewise.
---
 gcc/cgraph.c|  4 ++--
 gcc/cgraph.h| 17 ++---
 gcc/ipa-inline.c|  4 ++--
 gcc/ipa-utils.c |  4 ++--
 gcc/lto-cgraph.c|  2 +-
 gcc/passes.c|  8 
 gcc/symbol-summary.h| 14 +++---
 gcc/tree-pretty-print.c |  2 +-
 gcc/tree-sra.c  |  2 +-
 9 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 572c775c14c..e7c9632a8c8 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -121,7 +121,7 @@ static GTY(()) hash_table *cgraph_fnver_htab = NULL;
 hashval_t
 function_version_hasher::hash (cgraph_function_version_info *ptr)
 {
-  int uid = ptr->this_node->uid;
+  int uid = ptr->this_node->get_uid ();
   return (hashval_t)(uid);
 }
 
@@ -130,7 +130,7 @@ bool
 function_version_hasher::equal (cgraph_function_version_info *n1,
 			   	cgraph_function_version_info *n2)
 {
-  return n1->this_node->uid == n2->this_node->uid;
+  return n1->this_node->get_uid () == n2->this_node->get_uid ();
 }
 
 /* Mark as GC root all allocated nodes.  */
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 1966893343d..a10ea04ef0d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -97,6 +97,8 @@ class GTY((desc ("%h.type"), tag ("SYMTAB_SYMBOL"),
   symtab_node
 {
 public:
+  friend class symbol_table;
+
   /* Return name.  */
   const char *name () const;
 
@@ -890,6 +892,8 @@ struct cgraph_edge_hasher : ggc_ptr_hash
 
 struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node {
 public:
+  friend class symbol_table;
+
   /* Remove the node from cgraph and all inline clones inlined into it.
  Skip however removal of FORBIDDEN_NODE and return true if it needs to be
  removed.  This allows to call the function from outer loop walking clone
@@ -1268,6 +1272,12 @@ public:
 dump_cgraph (stderr);
   }
 
+  /* Get unique identifier of the node.  */
+  inline int get_uid ()
+  {
+return m_uid;
+  }
+
   /* Record that DECL1 and DECL2 are semantically identical function
  versions.  */
   static void record_function_versions (tree decl1, tree decl2);
@@ -1394,8 +1404,6 @@ public:
   /* How to scale counts at materialization time; used to merge
  LTO units with different number of profile runs.  */
   int count_materialization_scale;
-  /* Unique id of the node.  */
-  int uid;
   /* ID assigned by the profiling.  */
   unsigned int profile_id;
   /* Time profiler: first run of function.  */
@@ -1445,6 +1453,9 @@ public:
   unsigned indirect_call_target : 1;
 
 private:
+  /* Unique id of the node.  */
+  int m_uid;
+
   /* Worker for call_for_symbol_and_aliases.  */
   bool call_for_symbol_and_aliases_1 (bool (*callback) (cgraph_node *,
 		void *),
@@ -2629,7 +2640,7 @@ symbol_table::allocate_cgraph_symbol (void)
   else
 node = ggc_cleared_alloc ();
 
-  node->uid = cgraph_max_uid++;
+  node->m_uid = cgraph_max_uid++;
   return node;
 }
 
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 97164716af8..9d22d208f3d 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1337,7 +1337,7 @@ update_caller_keys (edge_heap_t *heap, struct cgraph_node *node,
   if ((!node->alias && !ipa_fn_summaries->get_create (node)->inlinable)
   || node->global.inlined_to)
 return;
-  if (!bitmap_set_bit (updated_nodes, node->uid))
+  if (!bitmap_set_bit (updated_nodes, node->get_uid ()))
 return;
 
   FOR_EACH_ALIAS (node, ref)
@@ -1395,7 +1395,7 @@ update_callee_keys (edge_heap_t *heap, struct cgraph_node *node,
 	&& (callee = e->callee->ultimate_alias_target (&avail, e->caller))
 	&& ipa_fn_summaries->get_create (callee)->inlinable
 	&& avail >= AVAIL_AVAILABLE
-	&& !bitmap_bit_p (updated_nodes, callee->uid))
+	&& !bitmap_bit_p (updated_nodes, callee->get_uid ()))
 	  {
 	if (can_inline_edge_p (e, false)
 		&& want_inline_small_function_p (e, false)
diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
index aa586f5feb2..9985721f7da 100644
--- a/gcc/ipa-utils.c
+++ b/gcc/ipa-utils.c
@@ -86,7 +86,7 @@ searchc (struct searchc_env* env, struct cgraph_node *v,
 
   /* mark node as old */
   v_info->new_node = false;
-  splay_tree_remove (env->nodes_mar

Re: [PATCH PR85793]Fix ICE by loading vector(1) scalara_type for 1 element-wise case

2018-05-17 Thread Bin.Cheng
On Thu, May 17, 2018 at 11:07 AM, Richard Biener
 wrote:
> On Wed, May 16, 2018 at 5:13 PM Bin Cheng  wrote:
>
>> Hi,
>> This patch fixes ICE by loading vector(1) scalar_type if it's 1
> element-wise for VMAT_ELEMENTWISE.
>> Bootstrap and test on x86_64 and AArch64 ongoing.  Is it OK?
>
> OK.
Bootstrap and test finished well.  I also need approval for the GCC 8 backport.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
>> 2018-05-16  Bin Cheng  
>>  Richard Biener  
>
>>  PR tree-optimization/85793
>>  * tree-vect-stmts.c (vectorizable_load): Handle 1 element-wise
> load
>>  for VMAT_ELEMENTWISE.
>
>> gcc/testsuite
>> 2018-05-16  Bin Cheng  
>
>>  PR tree-optimization/85793
>>  * gcc.dg/vect/pr85793.c: New test.


[PATCH 00/14] Finish transition of {symbol,call}_summary.

2018-05-17 Thread marxin
Hi.

The following patch series finishes the transition of IPA summary info
that is attached either to a cgraph_node or to a cgraph_edge.  Instead
of using a vector indexed by uid, we use summaries that are internally
implemented as a hash_map.

With the transition done, we can finally remove summary_uid, and the
uid property becomes a truly unique identifier.

There are still places where ::get_create can be replaced with ::get,
but Honza and I agreed that this can be done incrementally.

The series bootstraps and survives regression tests on
ppc64le-linux-gnu, and can build LibreOffice with LTO on
x86_64-linux-gnu.

Martin

marxin (14):
  Code refactoring of symtab_summary.
  Code refactoring for call_summary.
  Rename get methods in symbol-summary.h to get_create.
  Add {symbol,call}_summary::get method and use it in HSA.
  Use summaries->get where possible. Small refactoring of multiple
calls.
  Use symtab_summary in ipa-reference.c.
  Covert ipa-pure-const.c to symbol_summary.
  Convert IPA CP to symbol_summary.
  Remove cgraph_node::summary_uid and make cgraph_node::uid really
unique.
  Add call_summary::get method and m_initialize_when_cloning.
  Port IPA CP to edge_clone_summaries.
  Port edge_growth_cache to call_summary.
  Make cgraph_edge::uid really unique.
  Come up with cgraph_node::get_uid and make cgraph_node::uid private.

 gcc/cgraph.c   |  16 +-
 gcc/cgraph.h   |  47 +++--
 gcc/config/i386/i386.c |   2 +-
 gcc/hsa-common.c   |   6 +-
 gcc/hsa-common.h   |  12 +-
 gcc/hsa-gen.c  |  12 +-
 gcc/ipa-cp.c   | 116 ++--
 gcc/ipa-fnsummary.c| 116 ++--
 gcc/ipa-hsa.c  |  14 +-
 gcc/ipa-icf.c  |   2 +-
 gcc/ipa-inline-analysis.c  |  76 +++-
 gcc/ipa-inline-transform.c |  12 +-
 gcc/ipa-inline.c   |  99 ++-
 gcc/ipa-inline.h   |  51 +++---
 gcc/ipa-profile.c  |   2 +-
 gcc/ipa-prop.c |  62 +++
 gcc/ipa-prop.h |  40 +++--
 gcc/ipa-pure-const.c   | 199 -
 gcc/ipa-reference.c| 205 ++---
 gcc/ipa-split.c|   2 +-
 gcc/ipa-utils.c|   4 +-
 gcc/lto-cgraph.c   |   2 +-
 gcc/lto/lto-partition.c|  32 ++--
 gcc/passes.c   |  39 ++--
 gcc/symbol-summary.h   | 431 -
 gcc/tree-pretty-print.c|   2 +-
 gcc/tree-sra.c |   4 +-
 27 files changed, 810 insertions(+), 795 deletions(-)

-- 
2.16.3



[PATCH 04/14] Add {symbol,call}_summary::get method and use it in HSA.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* hsa-common.h (enum hsa_function_kind): Rename HSA_NONE to
HSA_INVALID.
(hsa_function_summary::hsa_function_summary): Use the new enum
value.
(hsa_gpu_implementation_p): Use hsa_summaries::get.
* hsa-gen.c (hsa_get_host_function): Likewise.
(get_brig_function_name): Likewise.
* ipa-hsa.c (process_hsa_functions): Likewise.
(ipa_hsa_write_summary): Likewise.
* symbol-summary.h (symtab_duplication): Use ::get function.
(get): New function.
---
 gcc/hsa-common.h | 15 ---
 gcc/hsa-gen.c|  9 +++--
 gcc/ipa-hsa.c| 22 +++---
 gcc/symbol-summary.h | 40 ++--
 4 files changed, 48 insertions(+), 38 deletions(-)

diff --git a/gcc/hsa-common.h b/gcc/hsa-common.h
index 849363c7b49..c72343fbdab 100644
--- a/gcc/hsa-common.h
+++ b/gcc/hsa-common.h
@@ -1208,7 +1208,7 @@ public:
 
 enum hsa_function_kind
 {
-  HSA_NONE,
+  HSA_INVALID,
   HSA_KERNEL,
   HSA_FUNCTION
 };
@@ -1234,7 +1234,7 @@ struct hsa_function_summary
 };
 
 inline
-hsa_function_summary::hsa_function_summary (): m_kind (HSA_NONE),
+hsa_function_summary::hsa_function_summary (): m_kind (HSA_INVALID),
   m_bound_function (NULL), m_gpu_implementation_p (false)
 {
 }
@@ -1244,7 +1244,10 @@ class hsa_summary_t: public function_summary 
 {
 public:
   hsa_summary_t (symbol_table *table):
-function_summary (table) { }
+function_summary (table)
+  {
+disable_insertion_hook ();
+  }
 
   /* Couple GPU and HOST as gpu-specific and host-specific implementation of
  the same function.  KIND determines whether GPU is a host-invokable kernel
@@ -1407,10 +1410,8 @@ hsa_gpu_implementation_p (tree decl)
   if (hsa_summaries == NULL)
 return false;
 
-  hsa_function_summary *s
-= hsa_summaries->get_create (cgraph_node::get_create (decl));
-
-  return s->m_gpu_implementation_p;
+  hsa_function_summary *s = hsa_summaries->get (cgraph_node::get_create (decl));
+  return s != NULL && s->m_gpu_implementation_p;
 }
 
 #endif /* HSA_H */
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index b573bd57d20..5f9feb31067 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -961,9 +961,7 @@ get_symbol_for_decl (tree decl)
 tree
 hsa_get_host_function (tree decl)
 {
-  hsa_function_summary *s
-= hsa_summaries->get_create (cgraph_node::get_create (decl));
-  gcc_assert (s->m_kind != HSA_NONE);
+  hsa_function_summary *s = hsa_summaries->get (cgraph_node::get_create (decl));
   gcc_assert (s->m_gpu_implementation_p);
 
   return s->m_bound_function ? s->m_bound_function->decl : NULL;
@@ -976,9 +974,8 @@ get_brig_function_name (tree decl)
 {
   tree d = decl;
 
-  hsa_function_summary *s
-= hsa_summaries->get_create (cgraph_node::get_create (d));
-  if (s->m_kind != HSA_NONE
+  hsa_function_summary *s = hsa_summaries->get (cgraph_node::get_create (d));
+  if (s != NULL
   && s->m_gpu_implementation_p
   && s->m_bound_function)
 d = s->m_bound_function->decl;
diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
index 90d193fe517..7cfeba08e58 100644
--- a/gcc/ipa-hsa.c
+++ b/gcc/ipa-hsa.c
@@ -76,13 +76,13 @@ process_hsa_functions (void)
 
   FOR_EACH_DEFINED_FUNCTION (node)
 {
-  hsa_function_summary *s = hsa_summaries->get_create (node);
+  hsa_function_summary *s = hsa_summaries->get (node);
 
   /* A linked function is skipped.  */
-  if (s->m_bound_function != NULL)
+  if (s != NULL && s->m_bound_function != NULL)
 	continue;
 
-  if (s->m_kind != HSA_NONE)
+  if (s != NULL)
 	{
 	  if (!check_warn_node_versionable (node))
 	continue;
@@ -130,11 +130,11 @@ process_hsa_functions (void)
 
   while (e)
 	{
-	  hsa_function_summary *src = hsa_summaries->get_create (node);
-	  if (src->m_kind != HSA_NONE && src->m_gpu_implementation_p)
+	  hsa_function_summary *src = hsa_summaries->get (node);
+	  if (src != NULL && src->m_gpu_implementation_p)
 	{
-	  hsa_function_summary *dst = hsa_summaries->get_create (e->callee);
-	  if (dst->m_kind != HSA_NONE && !dst->m_gpu_implementation_p)
+	  hsa_function_summary *dst = hsa_summaries->get (e->callee);
+	  if (dst != NULL && !dst->m_gpu_implementation_p)
 		{
 		  e->redirect_callee (dst->m_bound_function);
 		  if (dump_file)
@@ -174,9 +174,9 @@ ipa_hsa_write_summary (void)
lsei_next_function_in_partition (&lsei))
 {
   node = lsei_cgraph_node (lsei);
-  hsa_function_summary *s = hsa_summaries->get_create (node);
+  hsa_function_summary *s = hsa_summaries->get (node);
 
-  if (s->m_kind != HSA_NONE)
+  if (s != NULL)
 	count++;
 }
 
@@ -187,9 +187,9 @@ ipa_hsa_write_summary (void)
lsei_next_function_in_partition (&lsei))
 {
   node = lsei_cgraph_node (lsei);
-  hsa_function_summary *s = hsa_summaries->get_create (node);
+  hsa_function_summary *s = hsa_summaries->get (node);
 
-  if (s->

[PATCH 02/14] Code refactoring for call_summary.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* symbol-summary.h (release): Move definition out of class
declaration.
(symtab_removal): Likewise.
(symtab_duplication): Likewise.
---
 gcc/symbol-summary.h | 123 +--
 1 file changed, 70 insertions(+), 53 deletions(-)

diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h
index 13f8f04342a..a73472ef0ae 100644
--- a/gcc/symbol-summary.h
+++ b/gcc/symbol-summary.h
@@ -330,21 +330,7 @@ public:
   }
 
   /* Destruction method that can be called for GGT purpose.  */
-  void release ()
-  {
-if (m_released)
-  return;
-
-m_symtab->remove_edge_removal_hook (m_symtab_removal_hook);
-m_symtab->remove_edge_duplication_hook (m_symtab_duplication_hook);
-
-/* Release all summaries.  */
-typedef typename hash_map ::iterator map_iterator;
-for (map_iterator it = m_map.begin (); it != m_map.end (); ++it)
-  release ((*it).second);
-
-m_released = true;
-  }
+  void release ();
 
   /* Traverses all summarys with a function F called with
  ARG as argument.  */
@@ -369,16 +355,7 @@ public:
   }
 
   /* Release an item that is stored within map.  */
-  void release (T *item)
-  {
-if (m_ggc)
-  {
-	item->~T ();
-	ggc_free (item);
-  }
-else
-  delete item;
-  }
+  void release (T *item);
 
   /* Getter for summary callgraph edge pointer.  */
   T* get (cgraph_edge *edge)
@@ -399,37 +376,11 @@ public:
   }
 
   /* Symbol removal hook that is registered to symbol table.  */
-  static void symtab_removal (cgraph_edge *edge, void *data)
-  {
-call_summary *summary = (call_summary  *) (data);
-
-int h_uid = summary->hashable_uid (edge);
-T **v = summary->m_map.get (h_uid);
-
-if (v)
-  {
-	summary->remove (edge, *v);
-	summary->release (*v);
-	summary->m_map.remove (h_uid);
-  }
-  }
+  static void symtab_removal (cgraph_edge *edge, void *data);
 
   /* Symbol duplication hook that is registered to symbol table.  */
   static void symtab_duplication (cgraph_edge *edge1, cgraph_edge *edge2,
-  void *data)
-  {
-call_summary *summary = (call_summary  *) (data);
-T **v = summary->m_map.get (summary->hashable_uid (edge1));
-
-if (v)
-  {
-	/* This load is necessary, because we insert a new value!  */
-	T *data = *v;
-	T *duplicate = summary->allocate_new ();
-	summary->m_map.put (summary->hashable_uid (edge2), duplicate);
-	summary->duplicate (edge1, edge2, data, duplicate);
-  }
-  }
+  void *data);
 
 protected:
   /* Indication if we use ggc summary.  */
@@ -473,6 +424,72 @@ private:
   gt_pointer_operator, void *);
 };
 
+template 
+void
+call_summary::release ()
+{
+  if (m_released)
+return;
+
+  m_symtab->remove_edge_removal_hook (m_symtab_removal_hook);
+  m_symtab->remove_edge_duplication_hook (m_symtab_duplication_hook);
+
+  /* Release all summaries.  */
+  typedef typename hash_map ::iterator map_iterator;
+  for (map_iterator it = m_map.begin (); it != m_map.end (); ++it)
+release ((*it).second);
+
+  m_released = true;
+}
+
+template 
+void
+call_summary::release (T *item)
+{
+  if (m_ggc)
+{
+  item->~T ();
+  ggc_free (item);
+}
+  else
+delete item;
+}
+
+template 
+void
+call_summary::symtab_removal (cgraph_edge *edge, void *data)
+{
+  call_summary *summary = (call_summary  *) (data);
+
+  int h_uid = summary->hashable_uid (edge);
+  T **v = summary->m_map.get (h_uid);
+
+  if (v)
+{
+  summary->remove (edge, *v);
+  summary->release (*v);
+  summary->m_map.remove (h_uid);
+}
+}
+
+template 
+void
+call_summary::symtab_duplication (cgraph_edge *edge1,
+   cgraph_edge *edge2, void *data)
+{
+  call_summary *summary = (call_summary  *) (data);
+  T **v = summary->m_map.get (summary->hashable_uid (edge1));
+
+  if (v)
+{
+  /* This load is necessary, because we insert a new value!  */
+  T *data = *v;
+  T *duplicate = summary->allocate_new ();
+  summary->m_map.put (summary->hashable_uid (edge2), duplicate);
+  summary->duplicate (edge1, edge2, data, duplicate);
+}
+}
+
 template 
 void
 gt_ggc_mx(call_summary* const &summary)


[PATCH 08/14] Convert IPA CP to symbol_summary.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* ipa-cp.c (ipcp_store_bits_results): Use
ipcp_transformation_sum.
(ipcp_store_vr_results): Likewise.
* ipa-prop.c (ipcp_grow_transformations_if_necessary): Renamed
to ...
(ipcp_transformation_initialize): ... this.
(ipa_set_node_agg_value_chain):
(ipa_node_params_t::duplicate): Use ipcp_transformation_sum.
(write_ipcp_transformation_info): Likewise.
(read_ipcp_transformation_info): Likewise.
(ipcp_update_bits): Likewise.
(ipcp_update_vr): Likewise.
(ipcp_transform_function): Likewise.
* ipa-prop.h: Rename ipcp_transformation_summary to
ipcp_transformation.
(class ipcp_transformation_t): New function summary.
(ipcp_get_transformation_summary): Use ipcp_transformation_sum.
(ipa_get_agg_replacements_for_node): Likewise.
---
 gcc/ipa-cp.c   |  8 
 gcc/ipa-prop.c | 58 +-
 gcc/ipa-prop.h | 36 
 3 files changed, 61 insertions(+), 41 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index f682ee30dec..d8b04d14310 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4969,8 +4969,8 @@ ipcp_store_bits_results (void)
   if (!found_useful_result)
 	continue;
 
-  ipcp_grow_transformations_if_necessary ();
-  ipcp_transformation_summary *ts = ipcp_get_transformation_summary (node);
+  ipcp_transformation_initialize ();
+  ipcp_transformation *ts = ipcp_transformation_sum->get_create (node);
   vec_safe_reserve_exact (ts->bits, count);
 
   for (unsigned i = 0; i < count; i++)
@@ -5042,8 +5042,8 @@ ipcp_store_vr_results (void)
   if (!found_useful_result)
 	continue;
 
-  ipcp_grow_transformations_if_necessary ();
-  ipcp_transformation_summary *ts = ipcp_get_transformation_summary (node);
+  ipcp_transformation_initialize ();
+  ipcp_transformation *ts = ipcp_transformation_sum->get_create (node);
   vec_safe_reserve_exact (ts->m_vr, count);
 
   for (unsigned i = 0; i < count; i++)
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index c725b30c33d..daada4d55a5 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -55,8 +55,9 @@ along with GCC; see the file COPYING3.  If not see
 
 /* Function summary where the parameter infos are actually stored. */
 ipa_node_params_t *ipa_node_params_sum = NULL;
-/* Vector of IPA-CP transformation data for each clone.  */
-vec *ipcp_transformations;
+
+function_summary  *ipcp_transformation_sum = NULL;
+
 /* Edge summary for IPA-CP edge information.  */
 ipa_edge_args_sum_t *ipa_edge_args_sum;
 
@@ -3729,19 +3730,18 @@ ipa_free_all_node_params (void)
   ipa_node_params_sum = NULL;
 }
 
-/* Grow ipcp_transformations if necessary.  Also allocate any necessary hash
+/* Initialize IPA CP transformation summary and also allocate any necessary hash
tables if they do not already exist.  */
 
 void
-ipcp_grow_transformations_if_necessary (void)
+ipcp_transformation_initialize (void)
 {
-  if (vec_safe_length (ipcp_transformations)
-  <= (unsigned) symtab->cgraph_max_uid)
-vec_safe_grow_cleared (ipcp_transformations, symtab->cgraph_max_uid + 1);
   if (!ipa_bits_hash_table)
 ipa_bits_hash_table = hash_table::create_ggc (37);
   if (!ipa_vr_hash_table)
 ipa_vr_hash_table = hash_table::create_ggc (37);
+  if (ipcp_transformation_sum == NULL)
+ipcp_transformation_sum = ipcp_transformation_t::create_ggc (symtab);
 }
 
 /* Set the aggregate replacements of NODE to be AGGVALS.  */
@@ -3750,8 +3750,9 @@ void
 ipa_set_node_agg_value_chain (struct cgraph_node *node,
 			  struct ipa_agg_replacement_value *aggvals)
 {
-  ipcp_grow_transformations_if_necessary ();
-  (*ipcp_transformations)[node->uid].agg_values = aggvals;
+  ipcp_transformation_initialize ();
+  ipcp_transformation *s = ipcp_transformation_sum->get_create (node);
+  s->agg_values = aggvals;
 }
 
 /* Hook that is called by cgraph.c when an edge is removed.  Adjust reference
@@ -3915,15 +3916,14 @@ ipa_node_params_t::duplicate(cgraph_node *src, cgraph_node *dst,
   ipa_set_node_agg_value_chain (dst, new_av);
 }
 
-  ipcp_transformation_summary *src_trans
-= ipcp_get_transformation_summary (src);
+  ipcp_transformation *src_trans = ipcp_get_transformation_summary (src);
 
   if (src_trans)
 {
-  ipcp_grow_transformations_if_necessary ();
-  src_trans = ipcp_get_transformation_summary (src);
-  ipcp_transformation_summary *dst_trans
-	= ipcp_get_transformation_summary (dst);
+  ipcp_transformation_initialize ();
+  src_trans = ipcp_transformation_sum->get_create (src);
+  ipcp_transformation *dst_trans
+	= ipcp_transformation_sum->get_create (dst);
 
   dst_trans->bits = vec_safe_copy (src_trans->bits);
 
@@ -4565,7 +4565,7 @@ write_ipcp_transformation_info (output_block *ob, cgraph_node *node)
   streamer_write_bitpack (&bp);

[PATCH][arm][1/2] Remove support for deprecated -march=armv5 and armv5e

2018-05-17 Thread Kyrill Tkachov

Hi all,

The -march=armv5 and armv5e options have been deprecated in GCC 7 [1].
This patch removes support for them.
It's mostly mechanical stuff. The functionality that was previously
gated on arm_arch5 is now gated on arm_arch5t and the functionality
that was gated on arm_arch5e is now gated on arm_arch5te.

A path in TARGET_OS_CPP_BUILTINS for VxWorks is now unreachable and
therefore is deleted.

References to armv5 and armv5e are deleted/updated throughout the
source tree and testsuite.

Bootstrapped and tested on arm-none-linux-gnueabihf.
Also built a cc1 for arm-wrs-vxworks as a sanity check.

Ramana, Richard, I'd appreciate an ok from either of you that you're happy for 
this to go ahead.

Thanks,
Kyrill

[1] https://gcc.gnu.org/gcc-7/changes.html#arm

gcc/
2018-05-17  Kyrylo Tkachov  

* config/arm/arm-cpus.in (armv5, armv5e): Delete features.
(armv5t, armv5te): New features.
(ARMv5, ARMv5e): Delete fgroups.
(ARMv5t, ARMv5te): Adjust for above changes.
(ARMv6m): Likewise.
(armv5, armv5e): Delete arches.
* config/arm/arm.md (*call_reg_armv5): Use arm_arch5t instead of
arm_arch5.
(*call_reg_arm): Likewise.
(*call_value_reg_armv5): Likewise.
(*call_value_reg_arm): Likewise.
(*call_symbol): Likewise.
(*call_value_symbol): Likewise.
(*sibcall_insn): Likewise.
(*sibcall_value_insn): Likewise.
(clzsi2): Likewise.
(prefetch): Likewise.
(define_split and define_peephole2 dependent on arm_arch5):
Likewise.
* config/arm/arm.h (TARGET_LDRD): Use arm_arch5te instead of
arm_arch5e.
(TARGET_ARM_QBIT): Likewise.
(TARGET_DSP_MULTIPLY): Likewise.
(enum base_architecture): Delete BASE_ARCH_5, BASE_ARCH_5E.
(arm_arch5, arm_arch5e): Delete.
(arm_arch5t, arm_arch5te): Declare.
* config/arm/arm.c (arm_arch5, arm_arch5e): Delete.
(arm_arch5t): Declare.
(arm_option_reconfigure_globals): Update for the above.
(arm_options_perform_arch_sanity_checks): Update comment, replace
use of arm_arch5 with arm_arch5t.
(use_return_insn): Likewise.
(arm_emit_call_insn): Likewise.
(output_return_instruction): Likewise.
(arm_final_prescan_insn): Likewise.
(arm_coproc_builtin_available): Likewise.
* config/arm/arm-c.c (arm_cpu_builtins): Replace arm_arch5 and
arm_arch5e with arm_arch5t and arm_arch5te.
* config/arm/arm-protos.h (arm_arch5, arm_arch5e): Delete.
(arm_arch5t, arm_arch5te): Declare.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/t-arm-elf: Remove references to armv5, armv5e.
* config/arm/t-multilib: Likewise.
* config/arm/thumb1.md (*call_reg_thumb1_v5): Check arm_arch5t
instead of arm_arch5.
(*call_reg_thumb1): Likewise.
(*call_value_reg_thumb1_v5): Likewise.
(*call_value_reg_thumb1): Likewise.
* config/arm/vxworks.h (TARGET_OS_CPP_BUILTINS): Remove now
unreachable path.
* doc/invoke.texi (ARM Options): Remove references to armv5, armv5e.

gcc/testsuite/
2018-05-17  Kyrylo Tkachov  

* gcc.target/arm/pr40887.c: Update comment.
* lib/target-supports.exp: Don't generate effective target checks
and related helpers for armv5.  Update comment.
* gcc.target/arm/armv5_thumb_isa.c: Delete.
* gcc.target/arm/di-longlong64-sync-withhelpers.c: Update effective
target check and options.

libgcc/
2018-05-17  Kyrylo Tkachov  

* config/arm/libunwind.S: Update comment relating to armv5.
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 7c741e9fe66a0e086556272a46c4cd709996ce36..4471f7914cf282c516a142174f9913e491558b44 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -212,9 +212,9 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 {
   int coproc_level = 0x1;
 
-  if (arm_arch5)
+  if (arm_arch5t)
 	coproc_level |= 0x2;
-  if (arm_arch5e)
+  if (arm_arch5te)
 	coproc_level |= 0x4;
   if (arm_arch6)
 	coproc_level |= 0x8;
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 96972a057e7459874ef0bdac5e6379fb666e4189..0a318877f10394e2c045d2a03a8f0757557136cf 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -60,14 +60,14 @@ define feature mode32
 # Architecture rel 4
 define feature armv4
 
-# Architecture rel 5
-define feature armv5
-
 # Thumb aware.
 define feature thumb
 
-# Architecture rel 5e.
-define feature armv5e
+# Architecture rel 5t.
+define feature armv5t
+
+# Architecture rel 5te.
+define feature armv5te
 
 # XScale.
 define feature xscale
@@ -220,10 +220,8 @@ define fgroup ARMv3   ARMv2 mode32
 define fgroup ARMv3m  ARMv3 armv3m
 define fgroup ARMv4   ARMv3m armv4
 define fgroup ARMv4t  ARMv4 thumb
-define fgroup ARMv5   ARMv4 armv5
-define fgroup ARMv5t  ARMv5 thumb
-define fgroup ARMv5e  ARMv5 armv5e
-define fgroup ARMv5te ARMv5e thumb
+define fgroup ARMv5t  ARMv4t armv5t
+define fgroup ARMv5te ARMv5t armv5te
 define fgroup ARMv5tejARMv5te
 defin

[PATCH][arm][2/2] Remove support for -march=armv3 and older

2018-05-17 Thread Kyrill Tkachov

Hi all,

We deprecated architecture versions earlier than Armv4T in GCC 6 [1].
This patch removes support for architectures lower than Armv4.
That is the -march values armv2, armv2a, armv3, armv3m are removed
with this patch.  I did not remove armv4 because it's a bit more
involved code-wise and there has been some pushback on the implications
for -mcpu=strongarm support.

Removing armv3m and earlier though is pretty straightforward.
This allows us to get rid of the armv3m and mode32 feature bits
in arm-cpus.in as they can be assumed to be universally available.

Consequently the mcpu values arm2, arm250, arm3, arm6, arm60, arm600, arm610, 
arm620, arm7, arm7d, arm7di, arm70, arm700, arm700i, arm710, arm720, arm710c, 
arm7100, arm7500, arm7500fe, arm7m, arm7dm, arm7dm are now also removed.

Bootstrapped and tested on arm-none-linux-gnueabihf and on arm-none-eabi
with an aprofile multilib configuration (which builds quite a lot of library
configurations).

Ramana, Richard, I'd appreciate an ok from either of you that you're happy for 
this to go ahead.

Thanks,
Kyrill

[1] https://gcc.gnu.org/gcc-6/changes.html#arm

2018-05-17  Kyrylo Tkachov  

* config/arm/arm-cpus.in (armv3m, mode32): Delete features.
(ARMv4): Update.
(ARMv2, ARMv3, ARMv3m): Delete fgroups.
(ARMv6m): Update.
(armv2, armv2a, armv3, armv3m): Delete architectures.
(arm2, arm250, arm3, arm6, arm60, arm600, arm610, arm620,
arm7, arm7d, arm7di, arm70, arm700, arm700i, arm710, arm720,
arm710c, arm7100, arm7500, arm7500fe, arm7m, arm7dm, arm7dmi):
Delete cpus.
* config/arm/arm.md (maddsidi4): Remove check for arm_arch3m.
(*mulsidi3adddi): Likewise.
(mulsidi3): Likewise.
(*mulsidi3_nov6): Likewise.
(umulsidi3): Likewise.
(umulsidi3_nov6): Likewise.
(umaddsidi4): Likewise.
(*umulsidi3adddi): Likewise.
(smulsi3_highpart): Likewise.
(*smulsi3_highpart_nov6): Likewise.
(umulsi3_highpart): Likewise.
(*umulsi3_highpart_nov6): Likewise.
* config/arm/arm.h (arm_arch3m): Delete.
* config/arm/arm.c (arm_arch3m): Delete.
(arm_option_override_internal): Update armv3-related comment.
(arm_configure_build_target): Delete use of isa_bit_mode32.
(arm_option_reconfigure_globals): Delete set of arm_arch3m.
(arm_rtx_costs_internal): Delete check of arm_arch3m.
* config/arm/arm-fixed.md (mulsq3): Delete check for arm_arch3m.
(mulsa3): Likewise.
(mulusa3): Likewise.
* config/arm/arm-protos.h (arm_arch3m): Delete.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/t-arm-elf (all_early_nofp): Delete mentions of
deleted architectures.

2018-05-17  Kyrylo Tkachov  

* gcc.target/arm/pr62554.c: Delete.
* gcc.target/arm/pr69610-1.c: Likewise.
* gcc.target/arm/pr69610-2.c: Likewise.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 0a318877f10394e2c045d2a03a8f0757557136cf..16a381c86b6a7947e424b29fe67812990519ada9 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -48,15 +48,9 @@
 
 # Features - general convention: all lower case.
 
-# Extended multiply
-define feature armv3m
-
 # 26-bit mode support
 define feature mode26
 
-# 32-bit mode support
-define feature mode32
-
 # Architecture rel 4
 define feature armv4
 
@@ -215,10 +209,7 @@ define fgroup ALL_FPU_INTERNAL	vfpv2 vfpv3 vfpv4 fpv5 fp16conv fp_dbl ALL_SIMD_I
 # -mfpu support.
 define fgroup ALL_FP	fp16 ALL_FPU_INTERNAL
 
-define fgroup ARMv2   notm
-define fgroup ARMv3   ARMv2 mode32
-define fgroup ARMv3m  ARMv3 armv3m
-define fgroup ARMv4   ARMv3m armv4
+define fgroup ARMv4   armv4 notm
 define fgroup ARMv4t  ARMv4 thumb
 define fgroup ARMv5t  ARMv4t armv5t
 define fgroup ARMv5te ARMv5t armv5te
@@ -232,7 +223,7 @@ define fgroup ARMv6zk ARMv6k
 define fgroup ARMv6t2 ARMv6 thumb2
 # This is suspect.  ARMv6-m doesn't really pull in any useful features
 # from ARMv5* or ARMv6.
-define fgroup ARMv6m  mode32 armv3m armv4 thumb armv5t armv5te armv6 be8
+define fgroup ARMv6m  armv4 thumb armv5t armv5te armv6 be8
 # This is suspect, the 'common' ARMv7 subset excludes the thumb2 'DSP' and
 # integer SIMD instructions that are in ARMv6T2.  */
 define fgroup ARMv7   ARMv6m thumb2 armv7
@@ -279,34 +270,6 @@ define fgroup ALL_QUIRKS   quirk_no_volatile_ce quirk_armv6kz quirk_cm3_ldrd
 # end arch 
 #
 
-begin arch armv2
- tune for arm2
- tune flags CO_PROC NO_MODE32
- base 2
- isa ARMv2 mode26
-end arch armv2
-
-begin arch armv2a
- tune for arm2
- tune flags CO_PROC NO_MODE32
- base 2
- isa ARMv2 mode26
-end arch armv2a
-
-begin arch armv3
- tune for arm6
- tune flags CO_PROC
- base 3
- isa ARMv3 mode26
-end arch armv3
-
-begin arch armv3m
- tune for arm7m
- tune flags CO_PROC
- base 3M
- isa ARMv3m mode26
-end arch armv3m
-
 begin arch armv4
  tune for arm7tdmi
  tune flags CO_PROC
@@ -675,154 +638,6 @@ end arch iwmmxt2
 # option must simila

[PATCH 07/14] Convert ipa-pure-const.c to symbol_summary.

2018-05-17 Thread marxin

gcc/ChangeLog:

2018-04-24  Martin Liska  

* ipa-pure-const.c (struct funct_state_d): Make it a class instead
of a struct.
(class funct_state_summary_t): New function_summary class.
(has_function_state): Remove.
(get_function_state): Likewise.
(set_function_state): Likewise.
(add_new_function): Likewise.
(funct_state_summary_t::insert): New function.
(duplicate_node_data): Remove.
(remove_node_data): Remove.
(funct_state_summary_t::duplicate): New function.
(register_hooks): Create new funct_state_summaries.
(pure_const_generate_summary): Use it.
(pure_const_write_summary): Likewise.
(pure_const_read_summary): Likewise.
(propagate_pure_const): Likewise.
(propagate_nothrow): Likewise.
(dump_malloc_lattice): Likewise.
(propagate_malloc): Likewise.
(execute): Do not register hooks, just remove summary
instead.
(pass_ipa_pure_const::pass_ipa_pure_const): Simplify
constructor.
---
 gcc/ipa-pure-const.c | 193 ++-
 1 file changed, 66 insertions(+), 127 deletions(-)
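The conversion below replaces a uid-indexed vector plus manually registered cgraph hooks with a summary object that owns the per-node data and exposes insert/duplicate callbacks. As a rough self-contained analogue of that shape (a toy, not GCC's actual function_summary API):

```cpp
#include <cassert>
#include <map>

/* Per-node data, standing in for funct_state_d.  */
struct state { bool pure; };

/* Toy summary: owns a node-id -> data map and calls virtual hooks,
   instead of a global vector plus free-standing hook functions.  */
class summary_base
{
public:
  virtual ~summary_base ()
  {
    for (auto &p : m_map)
      delete p.second;
  }

  state *get (int uid)
  {
    auto it = m_map.find (uid);
    return it == m_map.end () ? nullptr : it->second;
  }

  state *get_create (int uid)
  {
    state *&slot = m_map[uid];
    if (!slot)
      {
	slot = new state ();
	insert (uid, slot);
      }
    return slot;
  }

  /* Called when a node is cloned: copy SRC's data to the clone,
     the role played by the duplicate hook in the patch.  */
  void clone (int src_uid, int dst_uid)
  {
    if (state *src = get (src_uid))
      duplicate (src, get_create (dst_uid));
  }

  virtual void insert (int, state *s) { s->pure = false; }
  virtual void duplicate (state *src, state *dst) { *dst = *src; }

private:
  std::map<int, state *> m_map;
};
```

A caller then writes `summaries->get (node)` and checks for NULL, rather than testing an `m_kind != HSA_NONE` sentinel, which is exactly the change visible throughout the series.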

diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index 1f624965f1a..4ac2021c04e 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -85,8 +85,20 @@ static const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_botto
 
 /* Holder for the const_state.  There is one of these per function
decl.  */
-struct funct_state_d
+class funct_state_d
 {
+public:
+  funct_state_d (): pure_const_state (IPA_NEITHER),
+state_previously_known (IPA_NEITHER), looping_previously_known (true),
+looping (true), can_throw (true), can_free (true),
+malloc_state (STATE_MALLOC_BOTTOM) {}
+
+  funct_state_d (const funct_state_d &s): pure_const_state (s.pure_const_state),
+state_previously_known (s.state_previously_known),
+looping_previously_known (s.looping_previously_known),
+looping (s.looping), can_throw (s.can_throw), can_free (s.can_free),
+malloc_state (s.malloc_state) {}
+
   /* See above.  */
   enum pure_const_state_e pure_const_state;
   /* What user set here; we can be always sure about this.  */
@@ -110,20 +122,25 @@ struct funct_state_d
   enum malloc_state_e malloc_state;
 };
 
-/* State used when we know nothing about function.  */
-static struct funct_state_d varying_state
-   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true, STATE_MALLOC_BOTTOM };
-
-
 typedef struct funct_state_d * funct_state;
 
 /* The storage of the funct_state is abstracted because there is the
possibility that it may be desirable to move this to the cgraph
local info.  */
 
-/* Array, indexed by cgraph node uid, of function states.  */
+class funct_state_summary_t: public function_summary 
+{
+public:
+  funct_state_summary_t (symbol_table *symtab):
+function_summary  (symtab) {}
+
+  virtual void insert (cgraph_node *, funct_state_d *state);
+  virtual void duplicate (cgraph_node *src_node, cgraph_node *dst_node,
+			  funct_state_d *src_data,
+			  funct_state_d *dst_data);
+};
 
-static vec funct_state_vec;
+static funct_state_summary_t *funct_state_summaries = NULL;
 
 static bool gate_pure_const (void);
 
@@ -155,12 +172,6 @@ public:
 
 private:
   bool init_p;
-
-  /* Holders of ipa cgraph hooks: */
-  struct cgraph_node_hook_list *function_insertion_hook_holder;
-  struct cgraph_2node_hook_list *node_duplication_hook_holder;
-  struct cgraph_node_hook_list *node_removal_hook_holder;
-
 }; // class pass_ipa_pure_const
 
 } // anon namespace
@@ -286,48 +297,6 @@ warn_function_cold (tree decl)
 			 true, warned_about, "cold");
 }
 
-/* Return true if we have a function state for NODE.  */
-
-static inline bool
-has_function_state (struct cgraph_node *node)
-{
-  if (!funct_state_vec.exists ()
-  || funct_state_vec.length () <= (unsigned int)node->uid)
-return false;
-  return funct_state_vec[node->uid] != NULL;
-}
-
-/* Return the function state from NODE.  */
-
-static inline funct_state
-get_function_state (struct cgraph_node *node)
-{
-  if (!funct_state_vec.exists ()
-  || funct_state_vec.length () <= (unsigned int)node->uid
-  || !funct_state_vec[node->uid])
-/* We might want to put correct previously_known state into varying.  */
-return &varying_state;
- return funct_state_vec[node->uid];
-}
-
-/* Set the function state S for NODE.  */
-
-static inline void
-set_function_state (struct cgraph_node *node, funct_state s)
-{
-  if (!funct_state_vec.exists ()
-  || funct_state_vec.length () <= (unsigned int)node->uid)
- funct_state_vec.safe_grow_cleared (node->uid + 1);
-
-  /* If funct_state_vec already contains a funct_state, we have to release
- it before it's going to be ovewritten.  */
-  if (funct_state_vec[node->uid] != NULL
-  && funct_state_vec[node->uid] != &varying_state)
-free (funct_state_vec[node->uid]);
-
-  funct_state_ve

Re: [RFC][PR64946] "abs" vectorization fails for char/short types

2018-05-17 Thread Richard Biener
On Thu, May 17, 2018 at 4:56 AM Andrew Pinski  wrote:

> On Wed, May 16, 2018 at 7:14 PM, Kugan Vivekanandarajah
>  wrote:
> > As mentioned in the PR, I am trying to add ABSU_EXPR to fix this
> > issue. In the attached patch, in fold_cond_expr_with_comparison I am
> > generating ABSU_EXPR for these cases. As I understand, absu_expr is
> > well defined in RTL. So, the issue is generating absu_expr  and
> > transferring to RTL in the correct way. I am not sure I am not doing
> > all that is needed. I will clean up and add more test-cases based on
> > the feedback.


> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index 71e172c..2b812e5 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -235,6 +235,7 @@ optab_for_tree_code (enum tree_code code, const_tree
type,
> return trapv ? negv_optab : neg_optab;

>   case ABS_EXPR:
> +case ABSU_EXPR:
> return trapv ? absv_optab : abs_optab;


> This part is not correct, it should be something like this:

>   case ABS_EXPR:
> return trapv ? absv_optab : abs_optab;
> +case ABSU_EXPR:
> +   return abs_optab ;

> Because ABSU is not undefined at the TYPE_MAX.

Also

/* Unsigned abs is simply the operand.  Testing here means we don't
  risk generating incorrect code below.  */
-  if (TYPE_UNSIGNED (type))
+  if (TYPE_UNSIGNED (type)
+ && (code != ABSU_EXPR))
 return op0;

is wrong.  ABSU of an unsigned number is still just that number.

The change to fold_cond_expr_with_comparison looks odd to me
(premature optimization).  It should be done separately - it seems
you are doing

(simplify (abs (convert @0)) (convert (absu @0)))

here.

You touch one other place in fold-const.c but there seem to be many
more that need ABSU_EXPR handling (you touched the one needed
for correctness) - esp. you should at least handle constant folding
in const_unop and the nonnegative predicate.

@@ -3167,6 +3167,9 @@ verify_expr (tree *tp, int *walk_subtrees, void *data
ATTRIBUTE_UNUSED)
CHECK_OP (0, "invalid operand to unary operator");
break;

+case ABSU_EXPR:
+  break;
+
  case REALPART_EXPR:
  case IMAGPART_EXPR:

verify_expr is no more.  Did you test this recently against trunk?

@@ -3937,6 +3940,9 @@ verify_gimple_assign_unary (gassign *stmt)
  case PAREN_EXPR:
  case CONJ_EXPR:
break;
+case ABSU_EXPR:
+  /* FIXME.  */
+  return false;

no - please not!  Please add verification here - ABSU should be only
called on INTEGRAL, vector or complex INTEGRAL types and the
type of the LHS should be always the unsigned variant of the
argument type.

if (is_gimple_val (cond_expr))
  return cond_expr;

-  if (TREE_CODE (cond_expr) == ABS_EXPR)
+  if (TREE_CODE (cond_expr) == ABS_EXPR
+  || TREE_CODE (cond_expr) == ABSU_EXPR)
  {
rhs1 = TREE_OPERAND (cond_expr, 1);
STRIP_USELESS_TYPE_CONVERSION (rhs1);

err, but the next line just builds a ABS_EXPR ...

How did you identify spots that need adjustment?  I would expect that
once folding generates ABSU_EXPR that you need to adjust frontends
(C++ constexpr handling for example).  Also I miss adjustments
to gimple-pretty-print.c and the GIMPLE FE parser.

recursively grepping throughout the whole gcc/ tree doesn't reveal too many
cases of ABS_EXPR so I think it's reasonable to audit all of them.

I also miss some trivial absu simplifications in match.pd.  There are not
a lot of abs cases but similar ones would be good to have initially.

Thanks for tackling this!
Richard.

> Thanks,
> Andrew

> >
> > Thanks,
> > Kugan
> >
> >
> > gcc/ChangeLog:
> >
> > 2018-05-13  Kugan Vivekanandarajah  
> >
> > * expr.c (expand_expr_real_2): Handle ABSU_EXPR.
> > * fold-const.c (fold_cond_expr_with_comparison): Generate ABSU_EXPR
> > (fold_unary_loc): Handle ABSU_EXPR.
> > * optabs-tree.c (optab_for_tree_code): Likewise.
> > * tree-cfg.c (verify_expr): Likewise.
> > (verify_gimple_assign_unary):  Likewise.
> > * tree-if-conv.c (fold_build_cond_expr):  Likewise.
> > * tree-inline.c (estimate_operator_cost):  Likewise.
> > * tree-pretty-print.c (dump_generic_node):  Likewise.
> > * tree.def (ABSU_EXPR): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2018-05-13  Kugan Vivekanandarajah  
> >
> > * gcc.dg/absu.c: New test.


Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.

2018-05-17 Thread Martin Liška
On 01/15/2018 12:22 AM, Bernhard Reutner-Fischer wrote:
> Can you please post CSiBE numbers? Ideally throwing in gcc-3.4.6 numbers too?
> 
> thanks, 

Hi.

I've just retested the patch and it looks fine. Here are the CSiBE numbers.
I'm sorry, I don't have such an old version of GCC:

+-------------------+--------+------------------+------------------+-----------------+--------------------+
| object            |  trunk |   Trunk w/ patch |        Gcc 7.3.1 | patch vs. trunk | Gcc 7.3.1 vs trunk |
+-------------------+--------+------------------+------------------+-----------------+--------------------+
| buf.c.o           |   2531 |             2531 |             2531 |               0 |                  0 |
| ccl.c.o           |   2520 |             2520 |             2519 |               0 |                 -1 |
| dfa.c.o           |   9909 |             9909 |             9909 |               0 |                  0 |
| ecs.c.o           |   1432 |             1432 |             1432 |               0 |                  0 |
| filter.c.o        |   4810 |             4810 |             4810 |               0 |                  0 |
| gen.c.o           |  28696 |            28696 |            28805 |               0 |                109 |
| libmain.c.o       |     88 |               88 |               88 |               0 |                  0 |
| libyywrap.c.o     |     54 |               54 |               67 |               0 |                 13 |
| main.c.o          |  22132 |            22132 |            22129 |               0 |                 -3 |
| misc.c.o          |   9765 |             9765 |             9811 |               0 |                 46 |
| nfa.c.o           |   6449 |             6467 |             6449 |              18 |                  0 |
| options.c.o       |   1240 |             1240 |             1240 |               0 |                  0 |
| parse.c.o         |  15737 |            15737 |            15742 |               0 |                  5 |
| regex.c.o         |   1374 |             1374 |             1374 |               0 |                  0 |
| scan.c.o          |  66844 |            66896 |            66944 |              52 |                100 |
| scanflags.c.o     |    422 |              422 |              435 |               0 |                 13 |
| scanopt.c.o       |   8170 |             8201 |             8198 |              31 |                 28 |
| skel.c.o          |  91010 |            91010 |            91010 |               0 |                  0 |
| sym.c.o           |   1796 |             1796 |             1796 |               0 |                  0 |
| tables.c.o        |   5000 |             5070 |             4998 |              70 |                 -2 |
| tables_shared.c.o |    122 |              122 |              122 |               0 |                  0 |
| tblcmp.c.o        |   5587 |             5587 |             5578 |               0 |                 -9 |
| yylex.c.o         |   2166 |             2332 |             2122 |             166 |                -44 |
| blocksort.c.o     |  13850 |            13850 |            13862 |               0 |                 12 |
| bzip2.c.o         |  23540 |            23702 |            23535 |             162 |                 -5 |
| bzip2recover.c.o  |   4863 |             4863 |             4865 |               0 |                  2 |
| bzlib.c.o         |  21359 |            21433 |            21393 |              74 |                 34 |
| compress.c.o      |  24424 |            24424 |            24409 |               0 |                -15 |
| crctable.c.o      |      0 |                0 |                0 |               0 |                  0 |
| decompress.c.o    |  23467 |            23467 |            23464 |               0 |                 -3 |
| dlltest.c.o       |   1213 |             1213 |             1213 |               0 |                  0 |
| huffman.c.o       |   2180 |             2180 |             2180 |               0 |                  0 |
| mk251.c.o         |    103 |              103 |              103 |               0 |                  0 |
| randtable.c.o     |      0 |                0 |                0 |               0 |                  0 |
| spewG.c.o         |    477 |              477 |              480 |               0 |                  3 |
| unzcrash.c.o      |   1284 |             1284 |             1284 |               0 |                  0 |
| SUM               | 404614 |           405187 |           404897 |             573 |                283 |
| ratio             |        | 1.00141616454201 | 0.99928428108503 |                 |                    |
+-------------------+--------+------------------+------------------+-----------------+--------------------+

So the patch looks fine; only a very slightly larger binary is produced.
I'm going to install the patch so that I can carry on with more complex
patches.

Re: Allow gimple_build with internal functions

2018-05-17 Thread Richard Biener
On Thu, May 17, 2018 at 10:21 AM Richard Sandiford <
richard.sandif...@linaro.org> wrote:

> This patch makes the function versions of gimple_build and
> gimple_simplify take combined_fns rather than built_in_codes,
> so that they work with internal functions too.  The old
> gimple_builds were unused, so no existing callers need
> to be updated.

> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Richard


> 2018-05-17  Richard Sandiford  

> gcc/
>  * gimple-fold.h (gimple_build): Make the function forms take
>  combined_fn rather than built_in_function.
>  (gimple_simplify): Likewise.
>  * gimple-match-head.c (gimple_simplify): Likewise.
>  * gimple-fold.c (gimple_build): Likewise.
>  * tree-vect-loop.c (get_initial_def_for_reduction): Use
gimple_build
>  rather than gimple_build_call_internal.
>  (get_initial_defs_for_reduction): Likewise.
>  (vect_create_epilog_for_reduction): Likewise.
>  (vectorizable_live_operation): Likewise.

> Index: gcc/gimple-fold.h
> ===
> --- gcc/gimple-fold.h   2018-05-16 20:17:39.114152860 +0100
> +++ gcc/gimple-fold.h   2018-05-17 09:17:32.876478942 +0100
> @@ -86,28 +86,25 @@ gimple_build (gimple_seq *seq,
>   {
> return gimple_build (seq, UNKNOWN_LOCATION, code, type, op0, op1, op2);
>   }
> -extern tree gimple_build (gimple_seq *, location_t,
> - enum built_in_function, tree, tree);
> +extern tree gimple_build (gimple_seq *, location_t, combined_fn, tree,
tree);
>   inline tree
> -gimple_build (gimple_seq *seq,
> - enum built_in_function fn, tree type, tree arg0)
> +gimple_build (gimple_seq *seq, combined_fn fn, tree type, tree arg0)
>   {
> return gimple_build (seq, UNKNOWN_LOCATION, fn, type, arg0);
>   }
> -extern tree gimple_build (gimple_seq *, location_t,
> - enum built_in_function, tree, tree, tree);
> +extern tree gimple_build (gimple_seq *, location_t, combined_fn,
> + tree, tree, tree);
>   inline tree
> -gimple_build (gimple_seq *seq,
> - enum built_in_function fn, tree type, tree arg0, tree arg1)
> +gimple_build (gimple_seq *seq, combined_fn fn,
> + tree type, tree arg0, tree arg1)
>   {
> return gimple_build (seq, UNKNOWN_LOCATION, fn, type, arg0, arg1);
>   }
> -extern tree gimple_build (gimple_seq *, location_t,
> - enum built_in_function, tree, tree, tree, tree);
> +extern tree gimple_build (gimple_seq *, location_t, combined_fn,
> + tree, tree, tree, tree);
>   inline tree
> -gimple_build (gimple_seq *seq,
> - enum built_in_function fn, tree type,
> - tree arg0, tree arg1, tree arg2)
> +gimple_build (gimple_seq *seq, combined_fn fn,
> + tree type, tree arg0, tree arg1, tree arg2)
>   {
> return gimple_build (seq, UNKNOWN_LOCATION, fn, type, arg0, arg1,
arg2);
>   }
> @@ -153,11 +150,11 @@ extern tree gimple_simplify (enum tree_c
>   gimple_seq *, tree (*)(tree));
>   extern tree gimple_simplify (enum tree_code, tree, tree, tree, tree,
>   gimple_seq *, tree (*)(tree));
> -extern tree gimple_simplify (enum built_in_function, tree, tree,
> +extern tree gimple_simplify (combined_fn, tree, tree,
>   gimple_seq *, tree (*)(tree));
> -extern tree gimple_simplify (enum built_in_function, tree, tree, tree,
> +extern tree gimple_simplify (combined_fn, tree, tree, tree,
>   gimple_seq *, tree (*)(tree));
> -extern tree gimple_simplify (enum built_in_function, tree, tree, tree,
tree,
> +extern tree gimple_simplify (combined_fn, tree, tree, tree, tree,
>   gimple_seq *, tree (*)(tree));

>   #endif  /* GCC_GIMPLE_FOLD_H */
> Index: gcc/gimple-match-head.c
> ===
> --- gcc/gimple-match-head.c 2018-03-30 12:28:37.301927949 +0100
> +++ gcc/gimple-match-head.c 2018-05-17 09:17:32.876478942 +0100
> @@ -478,55 +478,53 @@ gimple_simplify (enum tree_code code, tr
> return maybe_push_res_to_seq (rcode, type, ops, seq);
>   }

> -/* Builtin function with one argument.  */
> +/* Builtin or internal function with one argument.  */

>   tree
> -gimple_simplify (enum built_in_function fn, tree type,
> +gimple_simplify (combined_fn fn, tree type,
>   tree arg0,
>   gimple_seq *seq, tree (*valueize)(tree))
>   {
> if (constant_for_folding (arg0))
>   {
> -  tree res = fold_const_call (as_combined_fn (fn), type, arg0);
> +  tree res = fold_const_call (fn, type, arg0);
> if (res && CONSTANT_CLASS_P (res))
>  return res;
>   }

> code_helper rcode;
> tree ops[3] = {};
> 

Re: Gimple FE support for internal functions

2018-05-17 Thread Richard Biener
On Thu, May 17, 2018 at 10:27 AM Richard Sandiford <
richard.sandif...@linaro.org> wrote:

> This patch gets the gimple FE to parse calls to internal functions.
> The only non-obvious thing was how the functions should be written
> to avoid clashes with real function names.  One option would be to
> go the magic number of underscores route, but we already do that for
> built-in functions, and it would be good to keep them visually
> distinct.  In the end I borrowed the local/internal label convention
> from asm and used:

>x = .SQRT (y);

> I don't think even C++ has found a meaning for a leading dot yet.

Heh, clever idea!

> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

OK and thanks for doing this!

Richard.

> Richard


> 2018-05-17  Richard Sandiford  

> gcc/
>  * internal-fn.h (lookup_internal_fn): Declare
>  * internal-fn.c (lookup_internal_fn): New function.
>  * gimple.c (gimple_build_call_from_tree): Handle calls to
>  internal functions.
>  * gimple-pretty-print.c (dump_gimple_call): Print "." before
>  internal function names.
>  * tree-pretty-print.c (dump_generic_node): Likewise.
>  * tree-ssa-scopedtables.c (expr_hash_elt::print): Likewise.

> gcc/c/
>  * gimple-parser.c: Include internal-fn.h.
>  (c_parser_gimple_statement): Treat a leading CPP_DOT as a call.
>  (c_parser_gimple_call_internal): New function.
>  (c_parser_gimple_postfix_expression): Use it to handle CPP_DOT.
>  Fix typos in comment.

> gcc/testsuite/
>  * gcc.dg/gimplefe-28.c: New test.
>  * gcc.dg/asan/use-after-scope-9.c: Adjust expected output for
>  internal function calls.
>  * gcc.dg/goacc/loop-processing-1.c: Likewise.

> Index: gcc/internal-fn.h
> ===
> --- gcc/internal-fn.h   2018-05-16 12:48:59.194282896 +0100
> +++ gcc/internal-fn.h   2018-05-17 09:17:58.757608747 +0100
> @@ -107,6 +107,8 @@ internal_fn_name (enum internal_fn fn)
> return internal_fn_name_array[(int) fn];
>   }

> +extern internal_fn lookup_internal_fn (const char *);
> +
>   /* Return the ECF_* flags for function FN.  */

>   extern const int internal_fn_flags_array[];
> Index: gcc/internal-fn.c
> ===
> --- gcc/internal-fn.c   2018-05-16 12:48:59.410941892 +0100
> +++ gcc/internal-fn.c   2018-05-17 09:22:49.808912358 +0100
> @@ -64,6 +64,26 @@ #define DEF_INTERNAL_FN(CODE, FLAGS, FNS
> 0
>   };

> +/* Return the internal function called NAME, or IFN_LAST if there's
> +   no such function.  */
> +
> +internal_fn
> +lookup_internal_fn (const char *name)
> +{
> +  typedef hash_map<nofree_string_hash, internal_fn> name_to_fn_map_type;
> +  static name_to_fn_map_type *name_to_fn_map;
> +
> +  if (!name_to_fn_map)
> +{
> +  name_to_fn_map = new name_to_fn_map_type (IFN_LAST);
> +  for (unsigned int i = 0; i < IFN_LAST; ++i)
> +   name_to_fn_map->put (internal_fn_name (internal_fn (i)),
> +internal_fn (i));
> +}
> +  internal_fn *entry = name_to_fn_map->get (name);
> +  return entry ? *entry : IFN_LAST;
> +}
> +
>   /* Fnspec of each internal function, indexed by function number.  */
>   const_tree internal_fn_fnspec_array[IFN_LAST + 1];

> Index: gcc/gimple.c
> ===
> --- gcc/gimple.c2018-05-16 12:48:59.410941892 +0100
> +++ gcc/gimple.c2018-05-17 09:22:49.808912358 +0100
> @@ -350,12 +350,19 @@ gimple_build_call_from_tree (tree t, tre
>   {
> unsigned i, nargs;
> gcall *call;
> -  tree fndecl = get_callee_fndecl (t);

> gcc_assert (TREE_CODE (t) == CALL_EXPR);

> nargs = call_expr_nargs (t);
> -  call = gimple_build_call_1 (fndecl ? fndecl : CALL_EXPR_FN (t), nargs);
> +
> +  tree fndecl = NULL_TREE;
> +  if (CALL_EXPR_FN (t) == NULL_TREE)
> +call = gimple_build_call_internal_1 (CALL_EXPR_IFN (t), nargs);
> +  else
> +{
> +  fndecl = get_callee_fndecl (t);
> +  call = gimple_build_call_1 (fndecl ? fndecl : CALL_EXPR_FN (t),
nargs);
> +}

> for (i = 0; i < nargs; i++)
>   gimple_call_set_arg (call, i, CALL_EXPR_ARG (t, i));
> Index: gcc/gimple-pretty-print.c
> ===
> --- gcc/gimple-pretty-print.c   2018-05-16 12:48:59.410941892 +0100
> +++ gcc/gimple-pretty-print.c   2018-05-17 09:22:49.808912358 +0100
> @@ -874,7 +874,7 @@ dump_gimple_call (pretty_printer *buffer
> if (flags & TDF_RAW)
>   {
> if (gimple_call_internal_p (gs))
> -   dump_gimple_fmt (buffer, spc, flags, "%G <%s, %T", gs,
> +   dump_gimple_fmt (buffer, spc, flags, "%G <.%s, %T", gs,
>   internal_fn_name (gimple_call_internal_fn (gs)),
lhs);
> else
>  dump_gimple_fmt (buffer, spc, flags, "%G <%T,

Re: Replace FMA_EXPR with one internal fn per optab

2018-05-17 Thread Richard Biener
On Thu, May 17, 2018 at 10:56 AM Richard Sandiford <
richard.sandif...@linaro.org> wrote:

> Richard Biener  writes:
> >> @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result,
> >>  }
> >
> >> if (negate_p)
> >> -   mulop1 = force_gimple_operand_gsi (&gsi,
> >> -  build1 (NEGATE_EXPR,
> >> -  type, mulop1),
> >> -  true, NULL_TREE, true,
> >> -  GSI_SAME_STMT);
> >> +   mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1);
> >
> >> -  fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
> >> - FMA_EXPR, mulop1, op2, addop);
> >> +  if (seq)
> >> +   gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
> >> +  fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2,
> > addop);
> >> +  gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt));
> >> +  gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal
> > (use_stmt));
> >> +  gsi_replace (&gsi, fma_stmt, true);
> >> +  /* Valueize aggressively so that we generate FMS, FNMA and FNMS
> >> +regardless of where the negation occurs.  */
> >> +  if (fold_stmt (&gsi, aggressive_valueize))
> >> +   update_stmt (gsi_stmt (gsi));
> >
> > I think it would be nice to be able to use gimple_build () with IFNs so
you
> > can
> > gimple_build () the IFN and then use gsi_replace_with_seq () on it.  You
> > only need to fold with generated negates, not with negates already in
the
> > IL?
> > The the folding implied with gimple_build will take care of it.

> The idea was to pick up existing negates that feed the multiplication
> as well as any added by the pass itself.

> On IRC yesterday we talked about how this should handle the ECF_NOTHROW
> flag, and whether things like IFN_SQRT and IFN_FMA should always be
> nothrow (like the built-in functions are).  But in the end I thought
> it'd be better to keep things as they are.  We already handle
> -fnon-call-exceptions for unfused a * b + c and before the patch also
> handled it for FMA_EXPR.  It'd seem like a step backwards if the new
> internal functions didn't handle it too.  If anything it seems like the
> built-in functions should change to be closer to the tree_code and
> internal_fn way of doing things, if we want to support
-fnon-call-exceptions
> properly.

Right.  -fnon-call-exceptions isn't very well tested outside of Ada which
must
have its own builtin declarations.

> This also surprised me when doing the if-conversion patch I sent
yesterday.
> We're happy to vectorise:

>for (int i = 0; i < 100; ++i)
>  x[i] = ... ? sqrt (x[i]) : 0;

> by doing the sqrt unconditionally and selecting on the result, even with
> the default maths flags, but refuse to vectorise the simpler:

>for (int i = 0; i < 100; ++i)
>  x[i] = ... ? x[i] + 1 : 0;

> in the same way.

Heh.

> > Otherwise can you please move aggressive_valueize to gimple-fold.[ch]
> > alongside no_follow_ssa_edges / follow_single_use_edges and maybe
> > rename it as follow_all_ssa_edges?

> Ah, yeah, that's definitely a better name.

> I also renamed all_scalar_fma to scalar_all_fma, since I realised
> after Andrew's reply that the old name made it sound like it meant
> "all scalars", whereas it was meant to mean "all fmas".

> Tested as before.

OK.

Thanks,
Richard.

> Thanks,
> Richard

> 2018-05-17  Richard Sandiford  

> gcc/
>  * doc/sourcebuild.texi (scalar_all_fma): Document.
>  * tree.def (FMA_EXPR): Delete.
>  * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
>  * internal-fn.c (ternary_direct): New macro.
>  (expand_ternary_optab_fn): Likewise.
>  (direct_ternary_optab_supported_p): Likewise.
>  * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h.
>  * builtins.c (fold_builtin_fma): Delete.
>  (fold_builtin_3): Don't call it.
>  * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
>  * expr.c (expand_expr_real_2): Likewise.
>  * fold-const.c (operand_equal_p): Likewise.
>  (fold_ternary_loc): Likewise.
>  * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
>  * gimple.c (DEFTREECODE): Likewise.
>  * gimplify.c (gimplify_expr): Likewise.
>  * optabs-tree.c (optab_for_tree_code): Likewise.
>  * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>  * tree-eh.c (operation_could_trap_p): Likewise.
>  (stmt_could_throw_1_p): Likewise.
>  * tree-inline.c (estimate_operator_cost): Likewise.
>  * tree-pretty-print.c (dump_generic_node): Likewise.
>  (op_code_prio): Likewise.
>  * tree-ssa-loop-im.c (stmt_cost): Likewise.
>  * tree-ssa-operands.c (get_expr_operands): Likewise.
>  * tree.c (commutative_tern

Re: [PATCH PR85793]Fix ICE by loading vector(1) scalara_type for 1 element-wise case

2018-05-17 Thread Richard Biener
On Thu, May 17, 2018 at 12:11 PM Bin.Cheng  wrote:

> On Thu, May 17, 2018 at 11:07 AM, Richard Biener
>  wrote:
> > On Wed, May 16, 2018 at 5:13 PM Bin Cheng  wrote:
> >
> >> Hi,
> >> This patch fixes ICE by loading vector(1) scalar_type if it's 1
> > element-wise for VMAT_ELEMENTWISE.
> >> Bootstrap and test on x86_64 and AArch64 ongoing.  Is it OK?
> >
> > OK.
> Bootstrap and test finished well.  I also need approval for GCC 8
backport.

Ok to backport.

> Thanks,
> bin
> >
> > Richard.
> >
> >> Thanks,
> >> bin
> >> 2018-05-16  Bin Cheng  
> >>  Richard Biener  
> >
> >>  PR tree-optimization/85793
> >>  * tree-vect-stmts.c (vectorizable_load): Handle 1 element-wise
> > load
> >>  for VMAT_ELEMENTWISE.
> >
> >> gcc/testsuite
> >> 2018-05-16  Bin Cheng  
> >
> >>  PR tree-optimization/85793
> >>  * gcc.dg/vect/pr85793.c: New test.


Re: [PATCH] Add __attribute__((malloc) to allocator and remove unused code

2018-05-17 Thread Marc Glisse

On Mon, 14 May 2018, Jonathan Wakely wrote:


As discussed at https://gcc.gnu.org/ml/libstdc++/2018-01/msg00073.html
we can simplify the allocator function for valarray memory. I also
noticed that the _Array(size_t) constructor is never used.

* include/bits/valarray_array.h (__valarray_get_memory): Remove.
(__valarray_get_storage): Call operator new directly. Remove ignored
top-level restrict qualifier and add malloc attribute instead.


I am trying to understand the point of adding this attribute. The function 
is just


{ return static_cast<_Tp*>(operator new(__n * sizeof(_Tp))); }

The idea is that it isn't safe (? see PR 23383) to mark operator new with 
the attribute, but it is safe for this particular use?


When optimizing, I certainly hope this trivial function gets inlined, and 
then the attribute is lost (should the inliner add 'restrict' when 
inlining a function with attribute malloc?) and all that matters is 
operator new.


--
Marc Glisse


Re: [PATCH] Add __attribute__((malloc) to allocator and remove unused code

2018-05-17 Thread Jonathan Wakely

On 17/05/18 12:54 +0200, Marc Glisse wrote:

On Mon, 14 May 2018, Jonathan Wakely wrote:


As discussed at https://gcc.gnu.org/ml/libstdc++/2018-01/msg00073.html
we can simplify the allocator function for valarray memory. I also
noticed that the _Array(size_t) constructor is never used.

* include/bits/valarray_array.h (__valarray_get_memory): Remove.
(__valarray_get_storage): Call operator new directly. Remove ignored
top-level restrict qualifier and add malloc attribute instead.


I am trying to understand the point of adding this attribute. The 
function is just


{ return static_cast<_Tp*>(operator new(__n * sizeof(_Tp))); }

The idea is that it isn't safe (? see PR 23383) to mark operator new 
with the attribute, but it is safe for this particular use?


I'd forgotten about that (I was assuming the compiler doesn't need to
be told about the properties of operator new, because they're defined
by the language). We can remove the attribute.


When optimizing, I certainly hope this trivial function gets inlined, 
and then the attribute is lost (should the inliner add 'restrict' when 
inlining a function with attribute malloc?) and all that matters is 
operator new.


Re: [PATCH] Add __attribute__((malloc) to allocator and remove unused code

2018-05-17 Thread Marc Glisse

On Thu, 17 May 2018, Jonathan Wakely wrote:


On 17/05/18 12:54 +0200, Marc Glisse wrote:

On Mon, 14 May 2018, Jonathan Wakely wrote:


As discussed at https://gcc.gnu.org/ml/libstdc++/2018-01/msg00073.html
we can simplify the allocator function for valarray memory. I also
noticed that the _Array(size_t) constructor is never used.

* include/bits/valarray_array.h (__valarray_get_memory): Remove.
(__valarray_get_storage): Call operator new directly. Remove ignored
top-level restrict qualifier and add malloc attribute instead.


I am trying to understand the point of adding this attribute. The function 
is just


{ return static_cast<_Tp*>(operator new(__n * sizeof(_Tp))); }

The idea is that it isn't safe (? see PR 23383) to mark operator new with 
the attribute, but it is safe for this particular use?


I'd forgotten about that (I was assuming the compiler doesn't need to
be told about the properties of operator new, because they're defined
by the language). We can remove the attribute.


I am not necessarily asking to remove it. I don't have a good 
understanding of what would break if we marked operator new with the 
attribute, so I have no idea if those reasons also apply for this use in 
valarray.


When optimizing, I certainly hope this trivial function gets inlined, and 
then the attribute is lost (should the inliner add 'restrict' when inlining 
a function with attribute malloc?) and all that matters is operator new.


If we determine that using the attribute here but not on operator new is 
the right choice, then I believe we need some middle-end tweaks so it 
isn't ignored.


--
Marc Glisse


Re: PR83648

2018-05-17 Thread Prathamesh Kulkarni
On 15 May 2018 at 12:20, Richard Biener  wrote:
> On Tue, 15 May 2018, Prathamesh Kulkarni wrote:
>
>> On 12 January 2018 at 18:26, Richard Biener  wrote:
>> > On Fri, 12 Jan 2018, Prathamesh Kulkarni wrote:
>> >
>> >> On 12 January 2018 at 05:02, Jeff Law  wrote:
>> >> > On 01/10/2018 10:04 PM, Prathamesh Kulkarni wrote:
>> >> >> On 11 January 2018 at 04:50, Jeff Law  wrote:
>> >> >>> On 01/09/2018 05:57 AM, Prathamesh Kulkarni wrote:
>> >> 
>> >>  As Jakub pointed out for the case:
>> >>  void *f()
>> >>  {
>> >>    return __builtin_malloc (0);
>> >>  }
>> >> 
>> >>  The malloc propagation would set f() to malloc.
>> >>  However AFAIU, malloc(0) returns NULL (?) and the function shouldn't
>> >>  be marked as malloc ?
>> >> >>> This seems like a pretty significant concern.   Given:
>> >> >>>
>> >> >>>
>> >> >>>  return  n ? 0 : __builtin_malloc (n);
>> >> >>>
>> >> >>> Is the function malloc-like enough to allow it to be marked?
>> >> >>>
>> >> >>> If not, then ISTM we have to be very conservative in what we mark.
>> >> >>>
>> >> >>> foo (n, m)
>> >> >>> {
>> >> >>>   return n ? 0 : __builtin_malloc (m);
>> >> >>> }
>> >> >>>
>> >> >>> Is that malloc-like enough to mark?
>> >> >> Not sure. Should I make it more conservative by marking it as malloc
>> >> >> only if the argument to __builtin_malloc
>> >> >> is constant or it's value-range is known not to include 0? And
>> >> >> similarly for __builtin_calloc ?
>> >> > It looks like the consensus is we don't need to worry about the cases
>> >> > above.  So unless Jakub chimes in with a solid reason, don't worry about
>> >> > them.
>> >> Thanks everyone for the clarification. The attached patch skips on 0 phi 
>> >> arg,
>> >> and returns false if -fno-delete-null-pointer-checks is passed.
>> >>
>> >> With the patch, malloc_candidate_p returns true for
>> >> return 0;
>> >> or
>> >> ret = phi<0, 0>
>> >> return ret
>> >>
>> >> which I believe is OK as far as correctness is concerned.
>> >> However as Martin points out suggesting malloc attribute for return 0
>> >> case is not ideal.
>> >> I suppose we can track the return 0 (or when value range of return
>> >> value is known not to include 0)
>> >> corner case and avoid suggesting malloc for those ?
>> >>
>> >> Validation in progress.
>> >> Is this patch OK for next stage-1 ?
>> >
>> > Ok.
>> I have committed this as r260250 after bootstrap+test on x86_64 on top of 
>> trunk.
>> With the patch, we now emit a suggestion for malloc attribute for a
>> function returning NULL,
>> which may not be ideal. I was wondering for which cases should we
>> avoid suggesting malloc attribute with -Wsuggest-attribute ?
>>
>> 1] Return value is NULL.
>
> Yes.
>
>> 2] Return value is phi result, and all args of phi are 0.
>
> In which case constant propagation should have eliminated the PHI.
>
>> 3] Any other cases ?
>
> Can't think of any.  Please create testcases for all cases you
> fend off.
Hi,
Does the attached patch look OK ?
It simply checks in warn_function_malloc if function returns NULL and
chooses not to warn in that case.

Thanks,
Prathamesh
>
> Richard.
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index 567b615fb60..23e6b19a3c4 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -246,6 +246,21 @@ warn_function_const (tree decl, bool known_finite)
 static void
 warn_function_malloc (tree decl)
 {
+  function *fun = DECL_STRUCT_FUNCTION (decl);
+
+  basic_block exit_bb = EXIT_BLOCK_PTR_FOR_FN (fun);
+  if (single_pred_p (exit_bb))
+{
+  basic_block ret_bb = single_pred (exit_bb);
+  gimple_stmt_iterator gsi = gsi_last_bb (ret_bb);
  greturn *ret_stmt = dyn_cast <greturn *> (gsi_stmt (gsi));
+  gcc_assert (ret_stmt);
+  tree retval = gimple_return_retval (ret_stmt);
+  gcc_assert (retval && (TREE_CODE (TREE_TYPE (retval)) == POINTER_TYPE));
+  if (integer_zerop (retval))
+	return;
+}
+
   static hash_set<tree> *warned_about;
   warned_about
 = suggest_attribute (OPT_Wsuggest_attribute_malloc, decl,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83648-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83648-3.c
new file mode 100644
index 000..564216ceae9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83648-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wsuggest-attribute=malloc" } */
+
+__attribute__((noinline))
+void *g(unsigned n) /* { dg-bogus "function might be candidate for 'malloc' attribute" } */
+{
+  return 0;
+}
+
+void *h(unsigned n) /* { dg-bogus "function might be candidate for 'malloc' attribute" } */
+{
+  int f(int);
+
+  if (f (n))
+return 0;
+
+  f (n);
+  return 0;
+}


Re: [PATCH] Add __attribute__((malloc) to allocator and remove unused code

2018-05-17 Thread Richard Biener
On Thu, May 17, 2018 at 1:14 PM Marc Glisse  wrote:

> On Thu, 17 May 2018, Jonathan Wakely wrote:

> > On 17/05/18 12:54 +0200, Marc Glisse wrote:
> >> On Mon, 14 May 2018, Jonathan Wakely wrote:
> >>
> >>> As discussed at https://gcc.gnu.org/ml/libstdc++/2018-01/msg00073.html
> >>> we can simplify the allocator function for valarray memory. I also
> >>> noticed that the _Array(size_t) constructor is never used.
> >>>
> >>> * include/bits/valarray_array.h (__valarray_get_memory): Remove.
> >>> (__valarray_get_storage): Call operator new directly. Remove
ignored
> >>> top-level restrict qualifier and add malloc attribute instead.
> >>
> >> I am trying to understand the point of adding this attribute. The
function
> >> is just
> >>
> >> { return static_cast<_Tp*>(operator new(__n * sizeof(_Tp))); }
> >>
> >> The idea is that it isn't safe (? see PR 23383) to mark operator new
with
> >> the attribute, but it is safe for this particular use?
> >
> > I'd forgotten about that (I was assuming the compiler doesn't need to
> > be told about the properties of operator new, because they're defined
> > by the language). We can remove the attribute.

> I am not necessarily asking to remove it. I don't have a good
> understanding of what would break if we marked operator new with the
> attribute, so I have no idea if those reasons also apply for this use in
> valarray.

> >> When optimizing, I certainly hope this trivial function gets inlined,
and
> >> then the attribute is lost (should the inliner add 'restrict' when
inlining
> >> a function with attribute malloc?) and all that matters is operator
new.

> If we determine that using the attribute here but not on operator new is
> the right choice, then I believe we need some middle-end tweaks so it
> isn't ignored.

We don't have a good way to do this.  Your suggestion of adding restrict
would work if it were not that only function-scope restrict uses are later
handled...

Richard.

> --
> Marc Glisse


Re: PR83648

2018-05-17 Thread Richard Biener
On Thu, May 17, 2018 at 1:25 PM Prathamesh Kulkarni <
prathamesh.kulka...@linaro.org> wrote:

> On 15 May 2018 at 12:20, Richard Biener  wrote:
> > On Tue, 15 May 2018, Prathamesh Kulkarni wrote:
> >
> >> On 12 January 2018 at 18:26, Richard Biener  wrote:
> >> > On Fri, 12 Jan 2018, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 12 January 2018 at 05:02, Jeff Law  wrote:
> >> >> > On 01/10/2018 10:04 PM, Prathamesh Kulkarni wrote:
> >> >> >> On 11 January 2018 at 04:50, Jeff Law  wrote:
> >> >> >>> On 01/09/2018 05:57 AM, Prathamesh Kulkarni wrote:
> >> >> 
> >> >>  As Jakub pointed out for the case:
> >> >>  void *f()
> >> >>  {
> >> >>    return __builtin_malloc (0);
> >> >>  }
> >> >> 
> >> >>  The malloc propagation would set f() to malloc.
> >> >>  However AFAIU, malloc(0) returns NULL (?) and the function
shouldn't
> >> >>  be marked as malloc ?
> >> >> >>> This seems like a pretty significant concern.   Given:
> >> >> >>>
> >> >> >>>
> >> >> >>>  return  n ? 0 : __builtin_malloc (n);
> >> >> >>>
> >> >> >>> Is the function malloc-like enough to allow it to be marked?
> >> >> >>>
> >> >> >>> If not, then ISTM we have to be very conservative in what we
mark.
> >> >> >>>
> >> >> >>> foo (n, m)
> >> >> >>> {
> >> >> >>>   return n ? 0 : __builtin_malloc (m);
> >> >> >>> }
> >> >> >>>
> >> >> >>> Is that malloc-like enough to mark?
> >> >> >> Not sure. Should I make it more conservative by marking it as
malloc
> >> >> >> only if the argument to __builtin_malloc
> >> >> >> is constant or it's value-range is known not to include 0? And
> >> >> >> similarly for __builtin_calloc ?
> >> >> > It looks like the consensus is we don't need to worry about the
cases
> >> >> > above.  So unless Jakub chimes in with a solid reason, don't
worry about
> >> >> > them.
> >> >> Thanks everyone for the clarification. The attached patch skips on
0 phi arg,
> >> >> and returns false if -fno-delete-null-pointer-checks is passed.
> >> >>
> >> >> With the patch, malloc_candidate_p returns true for
> >> >> return 0;
> >> >> or
> >> >> ret = phi<0, 0>
> >> >> return ret
> >> >>
> >> >> which I believe is OK as far as correctness is concerned.
> >> >> However as Martin points out suggesting malloc attribute for return
0
> >> >> case is not ideal.
> >> >> I suppose we can track the return 0 (or when value range of return
> >> >> value is known not to include 0)
> >> >> corner case and avoid suggesting malloc for those ?
> >> >>
> >> >> Validation in progress.
> >> >> Is this patch OK for next stage-1 ?
> >> >
> >> > Ok.
> >> I have committed this as r260250 after bootstrap+test on x86_64 on top
of trunk.
> >> With the patch, we now emit a suggestion for malloc attribute for a
> >> function returning NULL,
> >> which may not be ideal. I was wondering for which cases should we
> >> avoid suggesting malloc attribute with -Wsuggest-attribute ?
> >>
> >> 1] Return value is NULL.
> >
> > Yes.
> >
> >> 2] Return value is phi result, and all args of phi are 0.
> >
> > In which case constant propagation should have eliminated the PHI.
> >
> >> 3] Any other cases ?
> >
> > Can't think of any.  Please create testcases for all cases you
> > fend off.
> Hi,
> Does the attached patch look OK ?
> It simply checks in warn_function_malloc if function returns NULL and
> chooses not to warn in that case.

I think a better approach is to not add the pointless attribute.

Richard.

> Thanks,
> Prathamesh
> >
> > Richard.


Re: Implement SLP of internal functions

2018-05-17 Thread Richard Biener
On Wed, May 16, 2018 at 12:18 PM Richard Sandiford <
richard.sandif...@linaro.org> wrote:

> SLP of calls was previously restricted to built-in functions.
> This patch extends it to internal functions.

> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

> Richard


> 2018-05-16  Richard Sandiford  

> gcc/
>  * internal-fn.h (vectorizable_internal_fn_p): New function.
>  * tree-vect-slp.c (compatible_calls_p): Likewise.
>  (vect_build_slp_tree_1): Remove nops argument.  Handle calls
>  to internal functions.
>  (vect_build_slp_tree_2): Update call to vect_build_slp_tree_1.

> gcc/testsuite/
>  * gcc.target/aarch64/sve/cond_arith_4.c: New test.
>  * gcc.target/aarch64/sve/cond_arith_4_run.c: Likewise.
>  * gcc.target/aarch64/sve/cond_arith_5.c: Likewise.
>  * gcc.target/aarch64/sve/cond_arith_5_run.c: Likewise.
>  * gcc.target/aarch64/sve/slp_14.c: Likewise.
>  * gcc.target/aarch64/sve/slp_14_run.c: Likewise.

> Index: gcc/internal-fn.h
> ===
> --- gcc/internal-fn.h   2018-05-16 11:06:14.513574219 +0100
> +++ gcc/internal-fn.h   2018-05-16 11:12:11.872116220 +0100
> @@ -158,6 +158,17 @@ direct_internal_fn_p (internal_fn fn)
> return direct_internal_fn_array[fn].type0 >= -1;
>   }

> +/* Return true if FN is a direct internal function that can be
vectorized by
> +   converting the return type and all argument types to vectors of the
same
> +   number of elements.  E.g. we can vectorize an IFN_SQRT on floats as an
> +   IFN_SQRT on vectors of N floats.  */
> +
> +inline bool
> +vectorizable_internal_fn_p (internal_fn fn)
> +{
> +  return direct_internal_fn_array[fn].vectorizable;
> +}
> +
>   /* Return optab information about internal function FN.  Only meaningful
>  if direct_internal_fn_p (FN).  */

> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c 2018-05-16 11:02:46.262494712 +0100
> +++ gcc/tree-vect-slp.c 2018-05-16 11:12:11.873116180 +0100
> @@ -564,6 +564,41 @@ vect_get_and_check_slp_defs (vec_info *v
> return 0;
>   }

> +/* Return true if call statements CALL1 and CALL2 are similar enough
> +   to be combined into the same SLP group.  */
> +
> +static bool
> +compatible_calls_p (gcall *call1, gcall *call2)
> +{
> +  unsigned int nargs = gimple_call_num_args (call1);
> +  if (nargs != gimple_call_num_args (call2))
> +return false;
> +
> +  if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2))
> +return false;
> +
> +  if (gimple_call_internal_p (call1))
> +{
> +  if (TREE_TYPE (gimple_call_lhs (call1))
> + != TREE_TYPE (gimple_call_lhs (call2)))
> +   return false;
> +  for (unsigned int i = 0; i < nargs; ++i)
> +   if (TREE_TYPE (gimple_call_arg (call1, i))
> +   != TREE_TYPE (gimple_call_arg (call2, i)))

Please use types_compatible_p in these two type comparisons.

Can you please add a generic vect_call_sqrtf to the main
vectorizer testsuite?  In fact I already see
gcc.dg/vect/fast-math-bb-slp-call-1.c.
Does that mean SQRT never appears as an internal function before
vectorization?

OK with that changes.
Richard.

> + return false;
> +}
> +  else
> +{
> +  if (!operand_equal_p (gimple_call_fn (call1),
> +   gimple_call_fn (call2), 0))
> +   return false;
> +
> +  if (gimple_call_fntype (call1) != gimple_call_fntype (call2))
> +   return false;
> +}
> +  return true;
> +}
> +
>   /* A subroutine of vect_build_slp_tree for checking VECTYPE, which is the
>  caller's attempt to find the vector type in STMT with the narrowest
>  element type.  Return true if VECTYPE is nonnull and if it is valid
> @@ -625,8 +660,8 @@ vect_record_max_nunits (vec_info *vinfo,
>   static bool
>   vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> vec stmts, unsigned int group_size,
> -  unsigned nops, poly_uint64 *max_nunits,
> -  bool *matches, bool *two_operators)
> +  poly_uint64 *max_nunits, bool *matches,
> +  bool *two_operators)
>   {
> unsigned int i;
> gimple *first_stmt = stmts[0], *stmt = stmts[0];
> @@ -698,7 +733,9 @@ vect_build_slp_tree_1 (vec_info *vinfo,
> if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
>  {
>rhs_code = CALL_EXPR;
> - if (gimple_call_internal_p (call_stmt)
> + if ((gimple_call_internal_p (call_stmt)
> +  && (!vectorizable_internal_fn_p
> +  (gimple_call_internal_fn (call_stmt
>|| gimple_call_tail_p (call_stmt)
>|| gimple_call_noreturn_p (call_stmt)
>|| !gimple_call_nothrow_p (call_stmt)
> @@ -833,11 +870,8 @@ vect_build_slp_tree_1 (

Re: [PATCH] Add __attribute__((malloc) to allocator and remove unused code

2018-05-17 Thread Marc Glisse

On Thu, 17 May 2018, Richard Biener wrote:


On Thu, May 17, 2018 at 1:14 PM Marc Glisse  wrote:


On Thu, 17 May 2018, Jonathan Wakely wrote:



On 17/05/18 12:54 +0200, Marc Glisse wrote:

On Mon, 14 May 2018, Jonathan Wakely wrote:


As discussed at https://gcc.gnu.org/ml/libstdc++/2018-01/msg00073.html
we can simplify the allocator function for valarray memory. I also
noticed that the _Array(size_t) constructor is never used.

* include/bits/valarray_array.h (__valarray_get_memory): Remove.
(__valarray_get_storage): Call operator new directly. Remove

ignored

top-level restrict qualifier and add malloc attribute instead.


I am trying to understand the point of adding this attribute. The
function is just

{ return static_cast<_Tp*>(operator new(__n * sizeof(_Tp))); }

The idea is that it isn't safe (? see PR 23383) to mark operator new
with the attribute, but it is safe for this particular use?


I'd forgotten about that (I was assuming the compiler doesn't need to
be told about the properties of operator new, because they're defined
by the language). We can remove the attribute.



I am not necessarily asking to remove it. I don't have a good
understanding of what would break if we marked operator new with the
attribute, so I have no idea if those reasons also apply for this use in
valarray.



When optimizing, I certainly hope this trivial function gets inlined, and
then the attribute is lost (should the inliner add 'restrict' when inlining
a function with attribute malloc?) and all that matters is operator new.


If we determine that using the attribute here but not on operator new is
the right choice, then I believe we need some middle-end tweaks so it
isn't ignored.


We don't have a good way to do this.  Your suggestion of adding restrict
would work if it were not that only function-scope restrict uses are later
handled...


This seems extremely similar to the issue of inlining functions with 
restrict arguments.


I have written a PR, but it is probably not worth submitting.

--
Marc Glisse


[PATCH] Improve memset handling in value-numbering

2018-05-17 Thread Richard Biener

Noticed in PR63185.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-05-17  Richard Biener  

* tree-ssa-sccvn.c (vn_reference_lookup_3): Improve memset handling.

* gcc.dg/tree-ssa/ssa-fre-63.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-63.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-63.c
new file mode 100644
index 000..39e8c08cef9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-63.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-fre1-stats" } */
+
+int foo(char *x)
+{
+  __builtin_memset (&x[1], 'c', 42);
+  return x[0] + x[1] + x[42] + x[43];
+}
+
+/* We should eliminate x[1] and x[42] and their conversions to int.  */
+/* { dg-final { scan-tree-dump "Eliminated: 4" "fre1" } } */
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 1463c1d4116..39de866a8ce 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -1958,23 +1958,75 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*vr_,
  1) Memset.  */
   if (is_gimple_reg_type (vr->type)
   && gimple_call_builtin_p (def_stmt, BUILT_IN_MEMSET)
-  && integer_zerop (gimple_call_arg (def_stmt, 1))
+  && (integer_zerop (gimple_call_arg (def_stmt, 1))
+ || (INTEGRAL_TYPE_P (vr->type) && known_eq (ref->size, 8)))
   && poly_int_tree_p (gimple_call_arg (def_stmt, 2))
-  && TREE_CODE (gimple_call_arg (def_stmt, 0)) == ADDR_EXPR)
+  && (TREE_CODE (gimple_call_arg (def_stmt, 0)) == ADDR_EXPR
+ || TREE_CODE (gimple_call_arg (def_stmt, 0)) == SSA_NAME))
 {
-  tree ref2 = TREE_OPERAND (gimple_call_arg (def_stmt, 0), 0);
   tree base2;
   poly_int64 offset2, size2, maxsize2;
   bool reverse;
-  base2 = get_ref_base_and_extent (ref2, &offset2, &size2, &maxsize2,
-  &reverse);
+  tree ref2 = gimple_call_arg (def_stmt, 0);
+  if (TREE_CODE (ref2) == SSA_NAME)
+   {
+ ref2 = SSA_VAL (ref2);
+ if (TREE_CODE (ref2) == SSA_NAME
+ && (TREE_CODE (base) != MEM_REF
+ || TREE_OPERAND (base, 0) != ref2))
+   {
+ gimple *def_stmt = SSA_NAME_DEF_STMT (ref2);
+ if (gimple_assign_single_p (def_stmt)
+ && gimple_assign_rhs_code (def_stmt) == ADDR_EXPR)
+   ref2 = gimple_assign_rhs1 (def_stmt);
+   }
+   }
+  if (TREE_CODE (ref2) == ADDR_EXPR)
+   {
+ ref2 = TREE_OPERAND (ref2, 0);
+ base2 = get_ref_base_and_extent (ref2, &offset2, &size2, &maxsize2,
+  &reverse);
+ if (!known_size_p (maxsize2)
+ || !operand_equal_p (base, base2, OEP_ADDRESS_OF))
+   return (void *)-1;
+   }
+  else if (TREE_CODE (ref2) == SSA_NAME)
+   {
+ poly_int64 soff;
+ if (TREE_CODE (base) != MEM_REF
+ || !(mem_ref_offset (base) << LOG2_BITS_PER_UNIT).to_shwi (&soff))
+   return (void *)-1;
+ offset += soff;
+ offset2 = 0;
+ if (TREE_OPERAND (base, 0) != ref2)
+   {
+ gimple *def = SSA_NAME_DEF_STMT (ref2);
+ if (is_gimple_assign (def)
+ && gimple_assign_rhs_code (def) == POINTER_PLUS_EXPR
+ && gimple_assign_rhs1 (def) == TREE_OPERAND (base, 0)
+ && poly_int_tree_p (gimple_assign_rhs2 (def))
+ && (wi::to_poly_offset (gimple_assign_rhs2 (def))
+ << LOG2_BITS_PER_UNIT).to_shwi (&offset2))
+   {
+ ref2 = gimple_assign_rhs1 (def);
+ if (TREE_CODE (ref2) == SSA_NAME)
+   ref2 = SSA_VAL (ref2);
+   }
+ else
+   return (void *)-1;
+   }
+   }
+  else
+   return (void *)-1;
   tree len = gimple_call_arg (def_stmt, 2);
-  if (known_size_p (maxsize2)
- && operand_equal_p (base, base2, 0)
- && known_subrange_p (offset, maxsize, offset2,
-  wi::to_poly_offset (len) << LOG2_BITS_PER_UNIT))
+  if (known_subrange_p (offset, maxsize, offset2,
+   wi::to_poly_offset (len) << LOG2_BITS_PER_UNIT))
{
- tree val = build_zero_cst (vr->type);
+ tree val;
+ if (integer_zerop (gimple_call_arg (def_stmt, 1)))
+   val = build_zero_cst (vr->type);
+ else
+   val = fold_convert (vr->type, gimple_call_arg (def_stmt, 1));
  return vn_reference_lookup_or_insert_for_pieces
   (vuse, vr->set, vr->type, vr->operands, val);
}


[PATCH] Improve get_ref_base_and_extent with range-info

2018-05-17 Thread Richard Biener

The following makes use of range-info to improve the basic building
block of the alias-oracle so we can tell that in

  a[0] = 1;
  for (int i = 5; i < 17; ++i)
a[i] = i;
  a[0] = 2;

the ao_ref for a[i] does not alias the a[0] accesses.  Given range-info
is not always going to improve things over knowledge gained from
the type size of the access I'm only improving it over information
gathered from the size.

For the above this allows us to DSE the first store with another
DSE improvement I'm testing separately.

Bootstrap & regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2018-05-17  Richard Biener  

* tree-dfa.c (get_ref_base_and_extent): Use range-info to refine
results when processing array refs with variable index.

* gcc.dg/tree-ssa/ssa-dse-35.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-35.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-35.c
new file mode 100644
index 000..1f21670406f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-35.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1-details" } */
+
+int a[256];
+void foo (void)
+{
+  a[0] = 1;
+  for (int i = 5; i < 17; ++i)
+a[i] = i;
+  a[0] = 2;
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */
diff --git a/gcc/tree-dfa.c b/gcc/tree-dfa.c
index a121b880bb0..993ac49554d 100644
--- a/gcc/tree-dfa.c
+++ b/gcc/tree-dfa.c
@@ -529,6 +529,48 @@ get_ref_base_and_extent (tree exp, poly_int64_pod *poffset,
/* Remember that we have seen an array ref with a variable
   index.  */
seen_variable_array_ref = true;
+
+   wide_int min, max;
+   if (TREE_CODE (index) == SSA_NAME
+   && (low_bound = array_ref_low_bound (exp),
+   poly_int_tree_p (low_bound))
+   && (unit_size = array_ref_element_size (exp),
+   TREE_CODE (unit_size) == INTEGER_CST)
+   && get_range_info (index, &min, &max) == VR_RANGE)
+ {
+   poly_offset_int lbound = wi::to_poly_offset (low_bound);
+   /* Try to constrain maxsize with range information.  */
+   offset_int omax
+ = offset_int::from (max, TYPE_SIGN (TREE_TYPE (index)));
+   if (known_lt (lbound, omax))
+ {
+   poly_offset_int rmaxsize;
+   rmaxsize = (omax - lbound + 1)
+   * wi::to_offset (unit_size) << LOG2_BITS_PER_UNIT;
+   if (!known_size_p (maxsize)
+   || known_lt (rmaxsize, maxsize))
+ {
+   maxsize = rmaxsize;
+   /* Given we know an upper bound this is no
+  longer variable.  */
+   seen_variable_array_ref = false;
+ }
+ }
+   /* Try to adjust bit_offset with range information.  */
+   offset_int omin
+ = offset_int::from (min, TYPE_SIGN (TREE_TYPE (index)));
+   if (known_le (lbound, omin))
+ {
+   poly_offset_int woffset
+ = wi::sext (omin - lbound,
+ TYPE_PRECISION (TREE_TYPE (index)));
+   woffset *= wi::to_offset (unit_size);
+   woffset <<= LOG2_BITS_PER_UNIT;
+   bit_offset += woffset;
+   if (known_size_p (maxsize))
+ maxsize -= woffset;
+ }
+ }
  }
  }
  break;


[PATCH] Another DSE improvement and thinko fix

2018-05-17 Thread Richard Biener

The previous DSE improvements left us with skipping elements we could
have possibly removed because I messed up the iterator increment
upon removal.  The following fixes this and also adds another pruning
opportunity in case the only stmt feeded by the def is an already
visited PHI.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-05-17  Richard Biener  

* tree-ssa-dse.c (dse_classify_store): Fix iterator increment
for pruning loop and prune defs feeding only already visited PHIs.

diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 589cfef5df5..28dc95f1740 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -662,7 +669,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
}
 
   /* Process defs and remove those we need not process further.  */
-  for (unsigned i = 0; i < defs.length (); ++i)
+  for (unsigned i = 0; i < defs.length ();)
{
  gimple *def = defs[i];
  gimple *use_stmt;
@@ -680,11 +687,18 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
  /* In addition to kills we can remove defs whose only use
 is another def in defs.  That can only ever be PHIs of which
 we track a single for simplicity reasons (we fail for multiple
-PHIs anyways).  */
+PHIs anyways).  We can also ignore defs that feed only into
+already visited PHIs.  */
  else if (gimple_code (def) != GIMPLE_PHI
   && single_imm_use (gimple_vdef (def), &use_p, &use_stmt)
-  && use_stmt == phi_def)
+  && (use_stmt == phi_def
+  || (gimple_code (use_stmt) == GIMPLE_PHI
+  && bitmap_bit_p (visited,
+   SSA_NAME_VERSION
+ (PHI_RESULT (use_stmt))
defs.unordered_remove (i);
+ else
+   ++i;
}
 
   /* If all defs kill the ref we are done.  */


[Patch, fortran] PR82923 - Automatic allocation of deferred length character using function result

2018-05-17 Thread Paul Richard Thomas
The ChangeLog and the comments in the patch tell all.

Bootstrapped and regtested on FC27/x86_64.

OK for 7-branch through to trunk?

Paul

2018-05-17  Paul Thomas  

PR fortran/82923
* trans-array.c (gfc_alloc_allocatable_for_assignment): Set the
charlen backend_decl of the rhs expr to ss->info->string_length
so that the value in the current scope is used.

2018-05-17  Paul Thomas  

PR fortran/82923
* gfortran.dg/allocate_assumed_charlen_4.f90: New test.
Index: gcc/fortran/trans-array.c
===
*** gcc/fortran/trans-array.c	(revision 260210)
--- gcc/fortran/trans-array.c	(working copy)
*** gfc_alloc_allocatable_for_assignment (gf
*** 9698,9703 
--- 9698,9709 
if (expr2 && rss == gfc_ss_terminator)
  return NULL_TREE;

+   /* Ensure that the string length from the current scope is used.  */
+   if (expr2->ts.type == BT_CHARACTER
+   && expr2->expr_type == EXPR_FUNCTION
+   && !expr2->value.function.isym)
+ expr2->ts.u.cl->backend_decl = rss->info->string_length;
+
gfc_start_block (&fblock);

/* Since the lhs is allocatable, this must be a descriptor type.
Index: gcc/testsuite/gfortran.dg/allocate_assumed_charlen_4.f90
===
*** gcc/testsuite/gfortran.dg/allocate_assumed_charlen_4.f90	(nonexistent)
--- gcc/testsuite/gfortran.dg/allocate_assumed_charlen_4.f90	(working copy)
***
*** 0 
--- 1,39 
+ ! { dg-do run }
+ !
+ ! Test the fix for PR82923, in which an ICE occurred because the
+ ! character length from 'getchars' scope was being used in the
+ ! automatic allocataion of 'mine'.
+ !
+ ! Contributed by Werner Blokbuster  
+ !
+ module m
+ implicit none
+ contains
+ function getchars(my_len,my_size)
+ integer, intent(in) :: my_len, my_size
+ character(my_len) :: getchars(my_size)
+ getchars = 'A-'
+ end function getchars
+
+ function getchars2(my_len)
+ integer, intent(in) :: my_len
+ character(my_len) :: getchars2
+ getchars2 = 'B--'
+ end function getchars2
+ end module m
+
+ program testca
+ use m, only: getchars, getchars2
+ implicit none
+ character(:), allocatable :: mine(:)
+ character(:), allocatable :: mine2
+ integer :: i
+
+ ! ICE occured at this line:
+ mine = getchars(2,4)
+ if (any (mine .ne. [('A-', i = 1, 4)])) stop 1
+
+ ! The scalar version was fine and this will keep it so:
+ mine2 = getchars2(3)
+ if (mine2 .ne. 'B--') stop 2
+ end program testca


Re: [wwwdocs] Add GCC 8 Fortran feature description

2018-05-17 Thread Damian Rouson
Thank you!

Damian

On May 17, 2018 at 12:32:19 AM, Thomas Koenig (tkoe...@netcologne.de) wrote:

> Hi Damian,
>
> Partial support is provided for Fortran 2018 teams, which are
> hierarchical
> subsets of images that execute independently of other image subsets.
>
>
> Committed.
>
> Regards
>
> Thomas
>
>


Re: Use conditional internal functions in if-conversion

2018-05-17 Thread Richard Biener
On Wed, May 16, 2018 at 12:17 PM Richard Sandiford <richard.sandif...@linaro.org> wrote:

> This patch uses IFN_COND_* to vectorise conditionally-executed,
> potentially-trapping arithmetic, such as most floating-point
> ops with -ftrapping-math.  E.g.:

>  if (cond) { ... x = a + b; ... }

> becomes:

>  ...
>  x = IFN_COND_ADD (cond, a, b);
>  ...

> When this transformation is done on its own, the value of x for
> !cond isn't important.

> However, the patch also looks for the equivalent of:

>  y = cond ? x : a;

As generated by predicate_all_scalar_phis, right?  So this tries to capture

  if (cond)
y = a / b;
  else
y = a;

But I think it would be more useful to represent the else value explicitly
as an extra operand, given that a constant like 0 is something that will
happen in practice and is also something that is, I think, supported as a
result by some targets.

So to support the flow of current if-conversion you'd convert to

  IFN_COND_ADD (cond, a, b, a); // or any other reasonable "default" value
(target controlled?)

and you merge that with a single-use COND_EXPR by replacing that operand.

If a target then cannot support the default value it needs to emit a blend
afterwards.

> in which the "then" value is the result of the conditionally-executed
> operation and the "else" value is the first operand of that operation.
> This "else" value is the one guaranteed by IFN_COND_* and so we can
> replace y with x.

> The patch also adds new conditional functions for multiplication
> and division, which previously weren't needed.  This enables an
> extra fully-masked reduction (of dubious value) in gcc.dg/vect/pr53773.c.

> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

It does look like a clever way of expressing conditional execution.  It seems
"scaling" this to more operations is a bit awkward and certainly it would be
nice to see for example the division case also prototyped for AVX512 to see
if it works for anything besides ARM.  Kirill CCed.

Thanks,
Richard.

> Richard


> 2018-05-16  Richard Sandiford  

> gcc/
>  * internal-fn.def (IFN_COND_MUL, IFN_COND_DIV, IFN_COND_MOD): New
>  internal functions.
>  * internal-fn.h (vectorized_internal_fn_supported_p): Declare.
>  * internal-fn.c (FOR_EACH_CODE_MAPPING): Handle IFN_COND_MUL,
>  IFN_COND_DIV and IFN_COND_MOD.
>  (get_conditional_internal_fn): Handle RDIV_EXPR.
>  (can_interpret_as_conditional_op_p): Use RDIV_EXPR for
floating-point
>  divisions.
>  (internal_fn_mask_index): Handle conditional internal functions.
>  (vectorized_internal_fn_supported_p): New function.
>  * optabs.def (cond_smul_optab, cond_sdiv_optab, cond_smod_optab)
>  (cond_udiv_optab, cond_umod_optab): New optabs.
>  * tree-if-conv.c: Include internal-fn.h.
>  (any_pred_load_store): Replace with...
>  (need_to_predicate): ...this new variable.
>  (redundant_ssa_names): New variable.
>  (ifcvt_can_use_mask_load_store): Move initial checks to...
>  (ifcvt_can_predicate): ...this new function.  Handle tree codes
>  for which a conditional internal function exists.
>  (if_convertible_gimple_assign_stmt_p): Use ifcvt_can_predicate
>  instead of ifcvt_can_use_mask_load_store.  Update after variable
>  name change.
>  (predicate_load_or_store): New function, split out from
>  predicate_mem_writes.
>  (check_redundant_cond_expr, predicate_rhs_code): New functions.
>  (predicate_mem_writes): Rename to...
>  (predicate_statements): ...this.  Use predicate_load_or_store
>  and predicate_rhs_code.
>  (combine_blocks, tree_if_conversion): Update after above name
changes.
>  (ifcvt_local_dce): Handle redundant_ssa_names.
>  * tree-vect-patterns.c (vect_recog_mask_conversion_pattern):
Handle
>  general conditional functions.
>  * tree-vect-stmts.c (vectorizable_call): Likewise.
>  * config/aarch64/aarch64-sve.md (cond_): New pattern
>  for SVE_COND_INT2_SD_OP.
>  * config/aarch64/iterators.md (UNSPEC_COND_MUL, UNSPEC_COND_SDIV)
>  (UNSPEC_UDIV): New unspecs.
>  (SVE_COND_INT2_OP): Include UNSPEC_MUL.
>  (SVE_COND_INT2_SD_OP): New int iterator.
>  (SVE_COND_FP2_OP): Include UNSPEC_MUL and UNSPEC_SDIV.
>  (optab, sve_int_op): Handle UNSPEC_COND_MUL, UNSPEC_COND_SDIV
>  and UNSPEC_COND_UDIV.
>  (sve_fp_op): Handle UNSPEC_COND_MUL and UNSPEC_COND_SDIV.

> gcc/testsuite/
>  * gcc.dg/vect/pr53773.c: Do not expect a scalar tail when using
>  fully-masked loops with a fixed vector length.
>  * gcc.target/aarch64/sve/cond_arith_1.c: New test.
>  * gcc.target/aarch64/sve/cond_arith_1_run.c: Likewise.
>  * gcc.target/aarch64/sve/cond_arith_2.c: Likewise.

Re: RFA (ipa-prop): PATCHes to avoid use of deprecated copy ctor and op=

2018-05-17 Thread Jason Merrill
On Thu, May 17, 2018 at 4:14 AM, Andreas Schwab  wrote:
> On Mai 16 2018, Andreas Schwab  wrote:
>> On Mai 15 2018, Jason Merrill  wrote:
>>
>>> commit 648ffd02e23ac2695de04ab266b4f8862df6c2ed
>>> Author: Jason Merrill 
>>> Date:   Tue May 15 20:46:54 2018 -0400
>>>
>>> * cp-tree.h (cp_expr): Remove copy constructor.
>>>
>>> * mangle.c (struct releasing_vec): Declare copy constructor.
>>
>> I'm getting an ICE on ia64 during the stage1 build of libstdc++ (perhaps
>> related that this uses gcc 4.8 as the bootstrap compiler):
>
> I have now switched to gcc 5 as the bootstrap compiler, which doesn't
> have this issue.

Aha.  Is it the cp_expr change that confused 4.8?

Jason


[PING] [PATCH] Avoid excessive function type casts with splay-trees

2018-05-17 Thread Bernd Edlinger
Ping...


On 05/03/18 22:13, Bernd Edlinger wrote:
> Hi,
> 
> this is basically the same patch I posted a few months ago,
> with a few formatting nits by Jakub fixed.
> 
> Bootstrapped and reg-tested again with current trunk.
> 
> Is it OK for trunk?
> 
> 
> Bernd.
> 
> On 12/15/17 11:44, Bernd Edlinger wrote:
>> Hi,
>>
>> when working on the -Wcast-function-type patch I noticed some rather
>> ugly and non-portable function type casts that are necessary to 
>> accomplish
>> some actually very simple tasks.
>>
>> Often functions taking pointer arguments are called with a different 
>> signature
>> taking uintptr_t arguments, which is IMHO not really safe to do...
>>
>> The attached patch adds a context argument to the callback functions but
>> keeps the existing interface as far as possible.
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>


Re: [PATCH] Support lower and upper limit for -fdbg-cnt flag.

2018-05-17 Thread Richard Biener
On Wed, May 16, 2018 at 3:54 PM Martin Liška  wrote:

> On 05/16/2018 03:39 PM, Alexander Monakov wrote:
> > On Wed, 16 May 2018, Martin Liška wrote:
> >>> Hm, is the off-by-one in the new explanatory text really intended? I think
> >>> the previous text was accurate, and the new text should say "9th and 10th"
> >>> and then "first 10 invocations", unless I'm missing something?
> >>
> >> I've reconsidered that once more time and having zero-based values:
> >> * -fdbg-cnt=event:N - trigger event N-times
> >> * -fdbg-cnt=event:N:(N+M) - skip event N-times and then enable it M-1 times
> >>
> >> Does that make sense?
> >
> > Yes, I like this, but I think the implementation does not match. New docs say:
> >
> >> -For example, with @option{-fdbg-cnt=dce:10,tail_call:0},
> >> -@code{dbg_cnt(dce)} returns true only for first 10 invocations.
> >> +For example, with @option{-fdbg-cnt=dce:2:4,tail_call:10},
> >> +@code{dbg_cnt(dce)} returns true only for third and fourth invocation.
> >> +For @code{dbg_cnt(tail_call)} true is returned for first 10 invocations.
> >
> > which is good, but the implementation reads:
> >
> >>  bool
> >>  dbg_cnt_is_enabled (enum debug_counter index)
> >>  {
> >> -  return count[index] <= limit[index];
> >> +  unsigned v = count[index];
> >> +  return v >= limit_low[index] && v < limit_high[index];
> >>  }
> >
> > which I believe is misaligned with the docs' intention. It should be the
> > other way around:
> >
> >   return v > limit_low[index] && v <= limit_high[index];

> Note that I changed count[index]++ to happen after dbg_cnt_is_enabled.
> I'm reverting that and now it works fine with your condition.

OK.

Richard.

> Martin

> >
> > Alexander
> >


Re: RFA (ipa-prop): PATCHes to avoid use of deprecated copy ctor and op=

2018-05-17 Thread Andreas Schwab
On Mai 17 2018, Jason Merrill  wrote:

> On Thu, May 17, 2018 at 4:14 AM, Andreas Schwab  wrote:
>> On Mai 16 2018, Andreas Schwab  wrote:
>>> On Mai 15 2018, Jason Merrill  wrote:
>>>
 commit 648ffd02e23ac2695de04ab266b4f8862df6c2ed
 Author: Jason Merrill 
 Date:   Tue May 15 20:46:54 2018 -0400

 * cp-tree.h (cp_expr): Remove copy constructor.

 * mangle.c (struct releasing_vec): Declare copy constructor.
>>>
>>> I'm getting an ICE on ia64 during the stage1 build of libstdc++ (perhaps
>>> related that this uses gcc 4.8 as the bootstrap compiler):
>>
>> I have now switched to gcc 5 as the bootstrap compiler, which doesn't
>> have this issue.
>
> Aha.  Is it the cp_expr change that confused 4.8?

I haven't looked closer.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] Add __attribute__((malloc) to allocator and remove unused code

2018-05-17 Thread Richard Biener
On Thu, May 17, 2018 at 1:42 PM Marc Glisse  wrote:

> On Thu, 17 May 2018, Richard Biener wrote:

> > On Thu, May 17, 2018 at 1:14 PM Marc Glisse  wrote:
> >
> >> On Thu, 17 May 2018, Jonathan Wakely wrote:
> >
> >>> On 17/05/18 12:54 +0200, Marc Glisse wrote:
>  On Mon, 14 May 2018, Jonathan Wakely wrote:
> 
> > As discussed at https://gcc.gnu.org/ml/libstdc++/2018-01/msg00073.html
> > we can simplify the allocator function for valarray memory. I also
> > noticed that the _Array(size_t) constructor is never used.
> >
> > * include/bits/valarray_array.h (__valarray_get_memory): Remove.
> > (__valarray_get_storage): Call operator new directly. Remove
> > ignored
> > top-level restrict qualifier and add malloc attribute instead.
> 
>  I am trying to understand the point of adding this attribute. The
>  function is just
> 
>  { return static_cast<_Tp*>(operator new(__n * sizeof(_Tp))); }
> 
>  The idea is that it isn't safe (? see PR 23383) to mark operator new
>  with the attribute, but it is safe for this particular use?
> >>>
> >>> I'd forgotten about that (I was assuming the compiler doesn't need to
> >>> be told about the properties of operator new, because they're defined
> >>> by the language). We can remove the attribute.
> >
> >> I am not necessarily asking to remove it. I don't have a good
> >> understanding of what would break if we marked operator new with the
> >> attribute, so I have no idea if those reasons also apply for this use
> >> in valarray.
> >
>  When optimizing, I certainly hope this trivial function gets inlined, and
>  then the attribute is lost (should the inliner add 'restrict' when inlining
>  a function with attribute malloc?) and all that matters is operator new.
> >
> >> If we determine that using the attribute here but not on operator new is
> >> the right choice, then I believe we need some middle-end tweaks so it
> >> isn't ignored.
> >
> > We don't have a good way to do this.  Your suggestion of adding restrict
> > would work if it were not that only function-scope restrict uses are later
> > handled...

> This seems extremely similar to the issue of inlining functions with
> restrict arguments.

That works to the extent that the effect of restrict is reflected in
the memory references in the IL by PTA.  But for our case there
is no restrict qualified arguments but only return values.

Richard.

> I have written a PR, but it is probably not worth submitting.

> --
> Marc Glisse


Re: [PING] [PATCH] Avoid excessive function type casts with splay-trees

2018-05-17 Thread Richard Biener
On Thu, May 17, 2018 at 3:21 PM Bernd Edlinger  wrote:

> Ping...

So this makes all traditional users go through the indirect
splay_tree_compare_wrapper and friends (which is also exported for no good
reason?).  And all users are traditional at the moment.

So I wonder if it's better to have a complete alternate interface?  I do
not see many users besides gcc, there's a use in bfd elf32-xtensa.c and
some uses in gdb.  Of course disregarding any users outside of SRC.

Richard.


> On 05/03/18 22:13, Bernd Edlinger wrote:
> > Hi,
> >
> > this is basically the same patch I posted a few months ago,
> > with a few formatting nits by Jakub fixed.
> >
> > Bootstrapped and reg-tested again with current trunk.
> >
> > Is it OK for trunk?
> >
> >
> > Bernd.
> >
> > On 12/15/17 11:44, Bernd Edlinger wrote:
> >> Hi,
> >>
> >> when working on the -Wcast-function-type patch I noticed some rather
> >> ugly and non-portable function type casts that are necessary to
> >> accomplish
> >> some actually very simple tasks.
> >>
> >> Often functions taking pointer arguments are called with a different
> >> signature
> >> taking uintptr_t arguments, which is IMHO not really safe to do...
> >>
> >> The attached patch adds a context argument to the callback functions
but
> >> keeps the existing interface as far as possible.
> >>
> >>
> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> >> Is it OK for trunk?
> >>
> >>
> >> Thanks
> >> Bernd.
> >>


Re: RFA (ipa-prop): PATCHes to avoid use of deprecated copy ctor and op=

2018-05-17 Thread Richard Biener
On Thu, May 17, 2018 at 3:25 PM Andreas Schwab  wrote:

> On Mai 17 2018, Jason Merrill  wrote:

> > On Thu, May 17, 2018 at 4:14 AM, Andreas Schwab  wrote:
> >> On Mai 16 2018, Andreas Schwab  wrote:
> >>> On Mai 15 2018, Jason Merrill  wrote:
> >>>
>  commit 648ffd02e23ac2695de04ab266b4f8862df6c2ed
>  Author: Jason Merrill 
>  Date:   Tue May 15 20:46:54 2018 -0400
> 
>  * cp-tree.h (cp_expr): Remove copy constructor.
> 
>  * mangle.c (struct releasing_vec): Declare copy
constructor.
> >>>
> >>> I'm getting an ICE on ia64 during the stage1 build of libstdc++
(perhaps
> >>> related that this uses gcc 4.8 as the bootstrap compiler):
> >>
> >> I have now switched to gcc 5 as the bootstrap compiler, which doesn't
> >> have this issue.
> >
> > Aha.  Is it the cp_expr change that confused 4.8?

> I haven't looked closer.

I have no problems with GCC 4.8 on trunk x86_64.

Richard.

> Andreas.

> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."


Re: [PATCH][AArch64] Unify vec_set patterns, support floating-point vector modes properly

2018-05-17 Thread Kyrill Tkachov


On 17/05/18 09:46, Kyrill Tkachov wrote:


On 15/05/18 18:56, Richard Sandiford wrote:

Kyrill  Tkachov  writes:

Hi all,

We've a deficiency in our vec_set family of patterns.  We don't
support directly loading a vector lane using LD1 for V2DImode and all
the vector floating-point modes.  We do do it correctly for the other
integer vector modes (V4SI, V8HI etc) though.

The alternatives on the relative floating-point patterns only allow a
register-to-register INS instruction.  That means if we want to load a
value into a vector lane we must first load it into a scalar register
and then perform an INS, which is wasteful.

There is also an explicit V2DI vec_set expander dangling around for no
reason that I can see. It seems to do the exact same things as the
other vec_set expanders. This patch removes that.  It now unifies all
vec_set expansions into a single "vec_set<mode>" define_expand using
the catch-all VALL_F16 iterator.

I decided to leave two aarch64_simd_vec_set<mode> define_insns. One
for the integer vector modes (that now include V2DI) and one for the
floating-point vector modes. That is so that we can avoid specifying
"w,r" alternatives for floating-point modes in case the
register-allocator gets confused and starts gratuitously moving
registers between the two banks.  So the floating-point pattern has only
two alternatives, one for SIMD-to-SIMD INS and one for LD1.

Did you see any cases in which this was necessary?  In some ways it
seems to run counter to Wilco's recent patches, which tended to remove
the * markers from the "unnatural" register class and trust the register
allocator to make a sensible decision.

I think our default position should be trust the allocator here.
If the consumers all require "w" registers then the RA will surely
try to use "w" registers if at all possible.  But if the consumers
don't care then it seems reasonable to offer both, since in those
cases it doesn't really make much difference whether the payload
happens to be SF or SI (say).

There are also cases in which the consumer could actively require
an integer register.  E.g. some code uses unions to bitcast floats
to ints and then do bitwise arithmetic on them.



Thanks, that makes sense. Honestly, it's been a few months since I worked on
this patch.
I believe my reluctance to specify that alternative was that it would mean
merging the integer and floating-point patterns into one (like the attached
version) which would put the "w, r" alternative first for the floating-point
case. I guess we should be able to trust the allocator to pick the sensible
alternative though.



With some help from Wilco I can see how this approach will give us suboptimal 
code though.
If we modify the example from my original post to be:
v4sf
foo_v4sf (float *a, float *b, float *c, float *d)
{
v4sf res = { *a, b[2], *c, *d };
return res;
}

The b[2] load will load into a GP register then do an expensive INS into the 
SIMD register
instead of loading into an FP S-register and then doing a SIMD-to-SIMD INS.
The only way I can get it to use the FP load then is to mark the "w, r" 
alternative with a '?'

Kyrill



This version is then made even simpler due to all the vec_set patterns being 
merged into one.
Bootstrapped and tested on aarch64-none-linux-gnu.

Is this ok for trunk?

Thanks,
Kyrill

2018-05-17  Kyrylo Tkachov  

* config/aarch64/aarch64-simd.md (vec_set<mode>): Use VALL_F16 mode
iterator.  Delete separate integer-mode vec_set expander.
(aarch64_simd_vec_setv2di): Delete.
(vec_setv2di): Delete.
(aarch64_simd_vec_set<mode>): Delete all other patterns with that name.
Use VALL_F16 mode iterator.  Add LD1 alternative and use vwcore for
the "w, r" alternative.

2018-05-17  Kyrylo Tkachov  

* gcc.target/aarch64/vect-init-ld1.c: New test.


With this patch we avoid loading values into scalar registers and then
doing an explicit INS on them to move them into the desired vector
lanes. For example for:

typedef float v4sf __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

v2di
foo_v2di (long long *a, long long *b)
{
v2di res = { *a, *b };
return res;
}

v4sf
foo_v4sf (float *a, float *b, float *c, float *d)
{
v4sf res = { *a, *b, *c, *d };
return res;
}

we currently generate:

foo_v2di:
  ldr d0, [x0]
  ldr x0, [x1]
  ins v0.d[1], x0
  ret

foo_v4sf:
  ldr s0, [x0]
  ldr s3, [x1]
  ldr s2, [x2]
  ldr s1, [x3]
  ins v0.s[1], v3.s[0]
  ins v0.s[2], v2.s[0]
  ins v0.s[3], v1.s[0]
  ret

but with this patch we generate the much cleaner:
foo_v2di:
  ldr d0, [x0]
  ld1 {v0.d}[1], [x1]
  ret

foo_v4sf:
  ldr s0, [x0]
  ld1 {v0.s}[1], [x1]
  ld1 {v0.s}[2], [x2]
  ld1 {v0.s}[3], [x3]
  ret

Nice!  The original reason for:

   /* FIXME: At the 

Re: [PING] [PATCH] Avoid excessive function type casts with splay-trees

2018-05-17 Thread Jakub Jelinek
On Thu, May 17, 2018 at 03:39:42PM +0200, Richard Biener wrote:
> On Thu, May 17, 2018 at 3:21 PM Bernd Edlinger 
> wrote:
> 
> > Ping...
> 
> So this makes all traditional users go through the indirect
> splay_tree_compare_wrapper
> and friends (which is also exported for no good reason?).  And all users
> are traditional
> at the moment.
> 
> So I wonder if it's better to have a complete alternate interface?  I do
> not see many
> users besides gcc, there's a use in bfd elf32-xtensa.c and some uses in

libgomp has a copy (intentionally so, it is slightly changed).

Jakub


Re: [PATCH GCC][4/6]Support regional coalesce and live range computation

2018-05-17 Thread Richard Biener
On Tue, May 15, 2018 at 5:44 PM Bin.Cheng  wrote:

> On Fri, May 11, 2018 at 1:53 PM, Richard Biener
>  wrote:
> > On Fri, May 4, 2018 at 6:23 PM, Bin Cheng  wrote:
> >> Hi,
> >> Following Jeff's suggestion, I am now using existing tree-ssa-live.c
and
> >> tree-ssa-coalesce.c to compute register pressure, rather than inventing
> >> another live range solver.
> >>
> >> The major change is to record region's basic blocks in var_map and use
that
> >> information in computation, rather than FOR_EACH_BB_FN.  For now only
loop
> >> and function type regions are supported.  The default one is function
type
> >> region which is used in out-of-ssa.  Loop type region will be used in
next
> >> patch to compute information for a loop.
> >>
> >> Bootstrap and test on x86_64 and AArch64 ongoing.  Any comments?
> >
> > I believe your changes to create_outofssa_var_map should be done
differently
> > by simply only calling it from the coalescing context and passing in the
> > var_map rather than initializing it therein and returning it.
> >
> > This also means the coalesce_vars_p flag in the var_map structure looks
> > somewhat out-of-place.  That is, it looks you could do with many less
> > changes if you refactored what calls what slightly?  For example
> > the extra arg to gimple_can_coalesce_p looks unneeded.
> >
> > Just as a note I do have a CFG helper pending that computes RPO order
> > for SEME regions (attached).  loops are SEME regions, so your RTYPE_SESE
> > is somewhat odd - I guess RTYPE_LOOP exists only because of the
> > convenience of passing in a loop * to the "constructor".  I'd rather
> > drop this region_type thing and always assume a SEME region - at least
> > I didn't see anything in the patch that depends on any of the forms
> > apart from the initial BB gathering.

> Hi Richard,

> Thanks for reviewing.  I refactored tree-ssa-live.c and
> tree-ssa-coalesce.c following your comments.
> Basically I did following changes:
> 1) Remove region_type and only support loop region live range computation.
>  Also I added one boolean field in var_map indicating whether we
> are computing
>  loop region live range or out-of-ssa.
> 2) Refactored create_outofssa_var_map into
create_coalesce_list_for_region and
>  populate_coalesce_list_for_outofssa.  Actually the original
> function name doesn't
>  quite make sense because it has nothing to do with var_map.
> 3) Hoist init_var_map up in call stack.  Now it's called by caller
> (out-of-ssa or live range)
>  and the returned var_map is passed to coalesce_* stuff.
> 4) Move global functions to tree-outof-ssa.c and make them static.
> 5) Save/restore flag_tree_coalesce_vars in order to avoid updating
> checks on the flag.

> So how is this one?  Patch attached.

A lot better.  Few things I noticed:

+  map->bmp_bbs = BITMAP_ALLOC (NULL);

use a bitmap_head member and bitmap_initialize ().

+  map->vec_bbs = new vec<basic_block> ();

use a vec<> member and map->vec_bbs = vNULL;

Both changes remove an unnecessary indirection.

+  map->outofssa_p = true;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+   {
+ bitmap_set_bit (map->bmp_bbs, bb->index);
+ map->vec_bbs->safe_push (bb);
+   }

I think you can avoid populating the bitmap and return
true unconditionally for outofssa_p in the contains function?
Ah, you already do - so why populate the bitmap?

+/* Return TRUE if region of the MAP contains basic block BB.  */
+
+inline bool
+region_contains_p (var_map map, basic_block bb)
+{
+  if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)
+  || bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
+return false;
+
+  if (map->outofssa_p)
+return true;
+
+  return bitmap_bit_p (map->bmp_bbs, bb->index);

the entry/exit block check should be conditional in map->outofssa_p
but I think we should never get the function called with those args
so we can as well use a gcc_checking_assert ()?

I think as followup we should try to get a BB order that
is more suited for the dataflow problem.  Btw, I was
thinking about adding another "visited" BB flag that is guaranteed to
be unset and free to be used by infrastructure.  So the bitmap
could be elided for a bb flag check (but we need to clear that flag
at the end of processing).  Not sure if it's worth to add a machinery
to dynamically assign pass-specific flags...  it would at least be
less error prone.  Sth to think about.

So -- I think the patch is ok with the two indirections removed,
the rest can be optimized as followup.

Thanks,
Richard.


> Thanks,
> bin
> 2018-05-15  Bin Cheng  

>  * tree-outof-ssa.c (tree-ssa.h, tree-dfa.h): Include header files.
>  (create_default_def, for_all_parms): Moved from tree-ssa-coalesce.c.
>  (parm_default_def_partition_arg): Ditto.
>  (set_parm_default_def_partition): Ditto.
>  (get_parm_default_def_partitions): Ditto and make it static.
>  (get_undefined_value_partitions): Ditto and make it static.
>  (remove_ssa_form): Refactor call to init_

Re: [PING] [PATCH] Avoid excessive function type casts with splay-trees

2018-05-17 Thread Richard Biener
On Thu, May 17, 2018 at 4:04 PM Jakub Jelinek  wrote:

> On Thu, May 17, 2018 at 03:39:42PM +0200, Richard Biener wrote:
> > On Thu, May 17, 2018 at 3:21 PM Bernd Edlinger <
bernd.edlin...@hotmail.de>
> > wrote:
> >
> > > Ping...
> >
> > So this makes all traditional users go through the indirect
> > splay_tree_compare_wrapper
> > and friends (which is also exported for no good reason?).  And all users
> > are traditional
> > at the moment.
> >
> > So I wonder if it's better to have a complete alternate interface?  I do
> > not see many
> > users besides gcc, there's a use in bfd elf32-xtensa.c and some uses in

> libgomp has a copy (intentionally so, it is slightly changed).

So if that is not exported ABI wise then maybe we can unshare it again
into an even more improved splay_tree2 in libiberty and deprecate the
"old" one?

DJ, how's "compatibility" supposed to work with libiberty?

Richard.


>  Jakub


[C++ Patch] PR 84588 ("[8 Regression] internal compiler error: Segmentation fault (contains_struct_check())") (Take 2)

2018-05-17 Thread Paolo Carlini

Hi,

Thus I had to revert my first try when it caused c++/85713. I added two 
testcases for the latter (the second one covering what I learned from 
yet another defective try, which I attached to the trail of c++/84588 
yesterday) and finally figured out that the problem was that I was 
incorrectly calling abort_fully_implicit_template while tentatively 
parsing (if you look at the new lambda-generic-85713.C, that made it 
impossible to correctly parse 'auto (&array) [5]' as the second lambda 
parameter).


Anyway, a few days ago, while looking for a completely different 
solution and comparing to other compilers too, I noticed that, more 
generally, we were missing a check in cp_parser_condition that we aren't 
declaring a function type when we are sure that we are handling a 
declaration: simply adding such a check covers c++/84588 as a special 
case too, and, since the check is in cp_parser_condition, it 
automatically covers variants for conditions elsewhere, e.g. in for and 
while loops: with the patchlet below we handle all of them very 
similarly to clang and icc.


Tested x86_64-linux.

Thanks, Paolo.

//

/cp
2018-05-17  Paolo Carlini  

PR c++/84588
* parser.c (cp_parser_condition): Reject a declaration of
a function type.

/testsuite
2018-05-17  Paolo Carlini  

PR c++/84588
* g++.dg/cpp1y/pr84588.C: New.
* g++.old-deja/g++.jason/cond.C: Adjust.
Index: cp/parser.c
===
--- cp/parser.c (revision 260308)
+++ cp/parser.c (working copy)
@@ -11571,6 +11571,7 @@ cp_parser_condition (cp_parser* parser)
   tree attributes;
   cp_declarator *declarator;
   tree initializer = NULL_TREE;
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
 
   /* Parse the declarator.  */
   declarator = cp_parser_declarator (parser, CP_PARSER_DECLARATOR_NAMED,
@@ -11597,15 +11598,23 @@ cp_parser_condition (cp_parser* parser)
 for sure.  */
   if (cp_parser_parse_definitely (parser))
{
- tree pushed_scope;
+ tree pushed_scope = NULL_TREE;
  bool non_constant_p;
  int flags = LOOKUP_ONLYCONVERTING;
 
  /* Create the declaration.  */
- decl = start_decl (declarator, &type_specifiers,
-/*initialized_p=*/true,
-attributes, /*prefix_attributes=*/NULL_TREE,
-&pushed_scope);
+ if (declarator->kind == cdk_function)
+   {
+ error_at (loc, "a function type is not allowed here");
+ if (parser->fully_implicit_function_template_p)
+   abort_fully_implicit_template (parser);
+ decl = error_mark_node;
+   }
+ else
+   decl = start_decl (declarator, &type_specifiers,
+  /*initialized_p=*/true,
+  attributes, /*prefix_attributes=*/NULL_TREE,
+  &pushed_scope);
 
  /* Parse the initializer.  */
  if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
Index: testsuite/g++.dg/cpp1y/pr84588.C
===
--- testsuite/g++.dg/cpp1y/pr84588.C(nonexistent)
+++ testsuite/g++.dg/cpp1y/pr84588.C(working copy)
@@ -0,0 +1,25 @@
+// { dg-do compile { target c++14 } }
+
+struct a {
+  void b() {}
+  void c(void (*) () = [] {
+  if (a a(int auto) {})  // { dg-error "two or more data types|function type" }
+  ;
+  }) {}
+};
+
+struct d {
+  void e() {}
+  void f(void (*) () = [] {
+  for (;d d(int auto) {};)  // { dg-error "two or more data types|function type" }
+  ;
+  }) {}
+};
+
+struct g {
+  void h() {}
+  void i(void (*) () = [] {
+  while (g g(int auto) {})  // { dg-error "two or more data types|function type" }
+  ;
+  }) {}
+};
Index: testsuite/g++.old-deja/g++.jason/cond.C
===
--- testsuite/g++.old-deja/g++.jason/cond.C (revision 260308)
+++ testsuite/g++.old-deja/g++.jason/cond.C (working copy)
@@ -47,8 +47,7 @@ int main()
   if (struct B * foo = new B)
 ;
 
-  if (int f () = 1)// { dg-warning "extern" "extern" } 
-  // { dg-error "is initialized like a variable" "var" { target *-*-* } .-1 }
+  if (int f () = 1)// { dg-error "function type" } 
 ;
   
   if (int a[2] = {1, 2})   // { dg-error "extended init" "" { target { ! c++11 } } }


Re: [PATCH][AArch64] Unify vec_set patterns, support floating-point vector modes properly

2018-05-17 Thread Kyrill Tkachov


On 17/05/18 14:56, Kyrill Tkachov wrote:


On 17/05/18 09:46, Kyrill Tkachov wrote:


On 15/05/18 18:56, Richard Sandiford wrote:

Kyrill  Tkachov  writes:

Hi all,

We've a deficiency in our vec_set family of patterns.  We don't
support directly loading a vector lane using LD1 for V2DImode and all
the vector floating-point modes.  We do do it correctly for the other
integer vector modes (V4SI, V8HI etc) though.

The alternatives on the relevant floating-point patterns only allow a
register-to-register INS instruction.  That means if we want to load a
value into a vector lane we must first load it into a scalar register
and then perform an INS, which is wasteful.

There is also an explicit V2DI vec_set expander dangling around for no
reason that I can see. It seems to do the exact same things as the
other vec_set expanders. This patch removes that.  It now unifies all
vec_set expansions into a single "vec_set<mode>" define_expand using
the catch-all VALL_F16 iterator.

I decided to leave two aarch64_simd_vec_set<mode> define_insns. One
for the integer vector modes (that now include V2DI) and one for the
floating-point vector modes. That is so that we can avoid specifying
"w,r" alternatives for floating-point modes in case the
register-allocator gets confused and starts gratuitously moving
registers between the two banks.  So the floating-point pattern has only
two alternatives, one for SIMD-to-SIMD INS and one for LD1.

Did you see any cases in which this was necessary?  In some ways it
seems to run counter to Wilco's recent patches, which tended to remove
the * markers from the "unnatural" register class and trust the register
allocator to make a sensible decision.

I think our default position should be trust the allocator here.
If the consumers all require "w" registers then the RA will surely
try to use "w" registers if at all possible.  But if the consumers
don't care then it seems reasonable to offer both, since in those
cases it doesn't really make much difference whether the payload
happens to be SF or SI (say).

There are also cases in which the consumer could actively require
an integer register.  E.g. some code uses unions to bitcast floats
to ints and then do bitwise arithmetic on them.



Thanks, that makes sense. Honestly, it's been a few months since I worked on 
this patch.
I believe my reluctance to specify that alternative was that it would mean 
merging the integer and
floating-point patterns into one (like the attached version) which would put the "w, 
r" alternative
first for the floating-point case. I guess we should be able to trust the 
allocator to pick
the sensible  alternative though.



With some help from Wilco I can see how this approach will give us suboptimal 
code though.
If we modify the example from my original post to be:
v4sf
foo_v4sf (float *a, float *b, float *c, float *d)
{
v4sf res = { *a, b[2], *c, *d };
return res;
}

The b[2] load will load into a GP register then do an expensive INS into the 
SIMD register
instead of loading into an FP S-register and then doing a SIMD-to-SIMD INS.
The only way I can get it to use the FP load then is to mark the "w, r" 
alternative with a '?'



That patch would look like the attached. Is this preferable?
For the above example it generates the desired:
foo_v4sf:
ldr s0, [x0]
ldr s1, [x1, 8]
ins v0.s[1], v1.s[0]
ld1 {v0.s}[2], [x2]
ld1 {v0.s}[3], [x3]
ret


rather than loading [x1, 8] into a W-reg.

Thanks,
Kyrill



Kyrill



This version is then made even simpler due to all the vec_set patterns being 
merged into one.
Bootstrapped and tested on aarch64-none-linux-gnu.

Is this ok for trunk?

Thanks,
Kyrill

2018-05-17  Kyrylo Tkachov  

* config/aarch64/aarch64-simd.md (vec_set<mode>): Use VALL_F16 mode
iterator.  Delete separate integer-mode vec_set expander.
(aarch64_simd_vec_setv2di): Delete.
(vec_setv2di): Delete.
(aarch64_simd_vec_set<mode>): Delete all other patterns with that name.
Use VALL_F16 mode iterator.  Add LD1 alternative and use vwcore for
the "w, r" alternative.

2018-05-17  Kyrylo Tkachov  

* gcc.target/aarch64/vect-init-ld1.c: New test.


With this patch we avoid loading values into scalar registers and then
doing an explicit INS on them to move them into the desired vector
lanes. For example for:

typedef float v4sf __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

v2di
foo_v2di (long long *a, long long *b)
{
v2di res = { *a, *b };
return res;
}

v4sf
foo_v4sf (float *a, float *b, float *c, float *d)
{
v4sf res = { *a, *b, *c, *d };
return res;
}

we currently generate:

foo_v2di:
  ldr d0, [x0]
  ldr x0, [x1]
  ins v0.d[1], x0
  ret

foo_v4sf:
  ldr s0, [x0]
  ldr s3, [x1]
  ldr s2, [x2]
  ldr s1, [x3]
  ins v0.s[1], v3.s[0]
  ins v0.s[2], v2.s[0]

Re: [C++ Patch] PR 84588 ("[8 Regression] internal compiler error: Segmentation fault (contains_struct_check())") (Take 2)

2018-05-17 Thread Paolo Carlini
PS: maybe better using function_declarator_p??? I think I regression 
tested that variant too, at some point.


Paolo.


Re: [PING] [PATCH] Avoid excessive function type casts with splay-trees

2018-05-17 Thread Bernd Edlinger
On 05/17/18 15:39, Richard Biener wrote:
> On Thu, May 17, 2018 at 3:21 PM Bernd Edlinger 
> wrote:
> 
>> Ping...
> 
> So this makes all traditional users go through the indirect
> splay_tree_compare_wrapper
> and friends (which is also exported for no good reason?).  And all users
> are traditional
> at the moment.
> 

all except gcc/typed-splay-tree.h, which only works if VALUE_TYPE is
compatible with uintptr_t but cannot check this requirement.
This one worried me the most.

But not having to rewrite omp-low.c, for instance, where calls to
splay_tree_lookup and accesses to n->value are made all the time,
made me think it will not work to rip out the old interface completely.


Bernd.

> So I wonder if it's better to have a complete alternate interface?  I do
> not see many
> users besides gcc, there's a use in bfd elf32-xtensa.c and some uses in
> gdb.  Of course
> disregarding any users outside of SRC.
> 
> Richard.
> 
> 
>> On 05/03/18 22:13, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this is basically the same patch I posted a few months ago,
>>> with a few formatting nits by Jakub fixed.
>>>
>>> Bootstrapped and reg-tested again with current trunk.
>>>
>>> Is it OK for trunk?
>>>
>>>
>>> Bernd.
>>>
>>> On 12/15/17 11:44, Bernd Edlinger wrote:
 Hi,

 when working on the -Wcast-function-type patch I noticed some rather
 ugly and non-portable function type casts that are necessary to
 accomplish
 some actually very simple tasks.

 Often functions taking pointer arguments are called with a different
 signature
 taking uintptr_t arguments, which is IMHO not really safe to do...

 The attached patch adds a context argument to the callback functions
> but
 keeps the existing interface as far as possible.


 Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
 Is it OK for trunk?


 Thanks
 Bernd.



Re: [C++ Patch] PR 84588 ("[8 Regression] internal compiler error: Segmentation fault (contains_struct_check())") (Take 2)

2018-05-17 Thread Jason Merrill
On Thu, May 17, 2018 at 10:27 AM, Paolo Carlini
 wrote:
> PS: maybe better using function_declarator_p?

I think so, yes.  The relevant rule seems to be "The declarator shall
not specify a function or an array.", so let's check for arrays, too.

Jason


Re: Rb_tree constructor optimization

2018-05-17 Thread Jonathan Wakely

On 15/05/18 07:30 +0200, François Dumont wrote:
Here it is again, even more simplified. Should I backport the Debug 
mode fix to the 8 branch?


Yes, please do backport the include/debug/* changes.



    * include/bits/stl_tree.h
    (_Rb_tree_impl(_Rb_tree_impl&&, _Node_allocator&&)): New.
    (_Rb_tree(_Rb_tree&&, _Node_allocator&&, true_type)): New, use latter.
    (_Rb_tree(_Rb_tree&&, _Node_allocator&&, false_type)): New.
    (_Rb_tree(_Rb_tree&&, _Node_allocator&&)): Adapt, use latters.
    * include/debug/map.h
    (map(map&&, const allocator_type&)): Add noexcept qualification.
    * include/debug/multimap.h
    (multimap(multimap&&, const allocator_type&)): Add noexcept 
qualification.

    * include/debug/set.h
    (set(set&&, const allocator_type&)): Add noexcept qualification.
    * include/debug/multiset.h
    (multiset(multiset&&, const allocator_type&)): Add noexcept 
qualification.

    * testsuite/23_containers/map/cons/noexcept_default_construct.cc:
    Add checks.
    * testsuite/23_containers/map/cons/noexcept_move_construct.cc:
    Add checks.
    * testsuite/23_containers/multimap/cons/noexcept_default_construct.cc:
    Add checks.
    * testsuite/23_containers/multimap/cons/noexcept_move_construct.cc:
    Add checks.
    * testsuite/23_containers/multiset/cons/noexcept_default_construct.cc:
    Add checks.
    * testsuite/23_containers/multiset/cons/noexcept_move_construct.cc:
    Add checks.
    * testsuite/23_containers/set/cons/noexcept_default_construct.cc:
    Add checks.
    * testsuite/23_containers/set/cons/noexcept_move_construct.cc:
    Add checks.

Ok to commit ?


Yes, OK for trunk - thanks.




Re: [PATCH][AArch64] Unify vec_set patterns, support floating-point vector modes properly

2018-05-17 Thread Wilco Dijkstra
Kyrill Tkachov wrote:

> That patch would look like the attached. Is this preferable?
> For the above example it generates the desired:
> foo_v4sf:
>   ldr s0, [x0]
>   ldr s1, [x1, 8]
>   ins v0.s[1], v1.s[0]
>   ld1 {v0.s}[2], [x2]
>   ld1 {v0.s}[3], [x3]
>    ret

Yes that's what I expect. Also with only non-zero offsets we emit:

foo_v2di:
ldr d0, [x0, 8]
ldr d1, [x1, 16]
ins v0.d[1], v1.d[0]
ret

foo_v4sf:
ldr s0, [x0, 4]
ldr s3, [x1, 20]
ldr s2, [x2, 32]
ldr s1, [x3, 80]
ins v0.s[1], v3.s[0]
ins v0.s[2], v2.s[0]
ins v0.s[3], v1.s[0]
ret

The patch looks good now, lots of patterns removed, yet we generate better code!

Wilco

[PATCH] PR libstdc++/85812 fix memory leak in std::make_exception_ptr

2018-05-17 Thread Jonathan Wakely

As the PR points out, the constructor called in the placement new
expression can throw, in which case the allocation would be leaked.

Separating the two implementations makes it much easier to read what
the code is doing.

PR libstdc++/85812
* libsupc++/cxxabi_init_exception.h (__cxa_free_exception): Declare.
* libsupc++/exception_ptr.h (make_exception_ptr) [__cpp_exceptions]:
Refactor to separate non-throwing and throwing implementations.
[__cpp_rtti && !_GLIBCXX_HAVE_CDTOR_CALLABI]: Deallocate the memory
if constructing the object throws.

Tested powerpc64le-linux (and also using ASan to verify the fix).
Committed to trunk. Backports to gcc-7 and gcc-8 will follow.


commit 3d02d84556f2be22945d397ed2fb4dbff8a0788e
Author: Jonathan Wakely 
Date:   Thu May 17 13:51:04 2018 +0100

PR libstdc++/85812 fix memory leak in std::make_exception_ptr

PR libstdc++/85812
* libsupc++/cxxabi_init_exception.h (__cxa_free_exception): Declare.
* libsupc++/exception_ptr.h (make_exception_ptr) [__cpp_exceptions]:
Refactor to separate non-throwing and throwing implementations.
[__cpp_rtti && !_GLIBCXX_HAVE_CDTOR_CALLABI]: Deallocate the memory
if constructing the object throws.

diff --git a/libstdc++-v3/libsupc++/cxxabi_init_exception.h 
b/libstdc++-v3/libsupc++/cxxabi_init_exception.h
index d973a087f14..e438c1008d9 100644
--- a/libstdc++-v3/libsupc++/cxxabi_init_exception.h
+++ b/libstdc++-v3/libsupc++/cxxabi_init_exception.h
@@ -62,6 +62,9 @@ namespace __cxxabiv1
   void*
   __cxa_allocate_exception(size_t) _GLIBCXX_NOTHROW;
 
+  void
+  __cxa_free_exception(void*) _GLIBCXX_NOTHROW;
+
   // Initialize exception (this is a GNU extension)
   __cxa_refcounted_exception*
   __cxa_init_primary_exception(void *object, std::type_info *tinfo,
diff --git a/libstdc++-v3/libsupc++/exception_ptr.h 
b/libstdc++-v3/libsupc++/exception_ptr.h
index a927327214d..bd355ed880b 100644
--- a/libstdc++-v3/libsupc++/exception_ptr.h
+++ b/libstdc++-v3/libsupc++/exception_ptr.h
@@ -178,25 +178,31 @@ namespace std
 exception_ptr 
 make_exception_ptr(_Ex __ex) _GLIBCXX_USE_NOEXCEPT
 {
-#if __cpp_exceptions
+#if __cpp_exceptions && __cpp_rtti && !_GLIBCXX_HAVE_CDTOR_CALLABI
+  void* __e = __cxxabiv1::__cxa_allocate_exception(sizeof(_Ex));
+  (void) __cxxabiv1::__cxa_init_primary_exception(
+ __e, const_cast<std::type_info*>(&typeid(__ex)),
+ __exception_ptr::__dest_thunk<_Ex>);
   try
{
-#if __cpp_rtti && !_GLIBCXX_HAVE_CDTOR_CALLABI
-  void *__e = __cxxabiv1::__cxa_allocate_exception(sizeof(_Ex));
-  (void)__cxxabiv1::__cxa_init_primary_exception(
- __e, const_cast<std::type_info*>(&typeid(__ex)),
- __exception_ptr::__dest_thunk<_Ex>);
   ::new (__e) _Ex(__ex);
   return exception_ptr(__e);
-#else
+   }
+  catch(...)
+   {
+ __cxxabiv1::__cxa_free_exception(__e);
+ return current_exception();
+   }
+#elif __cpp_exceptions
+  try
+   {
   throw __ex;
-#endif
}
   catch(...)
{
  return current_exception();
}
-#else
+#else // no RTTI and no exceptions
   return exception_ptr();
 #endif
 }


[PATCH] PR libstdc++/85818 ensure path::preferred_separator is defined

2018-05-17 Thread Jonathan Wakely

Because path.cc is compiled with -std=gnu++17 the static constexpr
data member is implicitly 'inline' and so no definition gets emitted
unless it gets used in that translation unit. Other translation units
built as C++11 or C++14 still require a namespace-scope definition of
the variable, so mark the definition as used.

PR libstdc++/85818
* src/filesystem/path.cc (path::preferred_separator): Add used
attribute.
* testsuite/experimental/filesystem/path/preferred_separator.cc: New.

Tested powerpc64le-linux, committed to trunk. Backport to gcc-8 to
follow.



commit e80e2abad41faf6bd62bd1f08baa86f71714811e
Author: Jonathan Wakely 
Date:   Thu May 17 13:59:00 2018 +0100

PR libstdc++/85818 ensure path::preferred_separator is defined

Because path.cc is compiled with -std=gnu++17 the static constexpr
data member is implicitly 'inline' and so no definition gets emitted
unless it gets used in that translation unit. Other translation units
built as C++11 or C++14 still require a namespace-scope definition of
the variable, so mark the definition as used.

PR libstdc++/85818
* src/filesystem/path.cc (path::preferred_separator): Add used
attribute.
* testsuite/experimental/filesystem/path/preferred_separator.cc: 
New.

diff --git a/libstdc++-v3/src/filesystem/path.cc 
b/libstdc++-v3/src/filesystem/path.cc
index 4d84168d742..899d94e0067 100644
--- a/libstdc++-v3/src/filesystem/path.cc
+++ b/libstdc++-v3/src/filesystem/path.cc
@@ -33,7 +33,7 @@ using fs::path;
 
 fs::filesystem_error::~filesystem_error() = default;
 
-constexpr path::value_type path::preferred_separator;
+constexpr path::value_type path::preferred_separator [[gnu::used]];
 
 path&
 path::remove_filename()
diff --git 
a/libstdc++-v3/testsuite/experimental/filesystem/path/preferred_separator.cc 
b/libstdc++-v3/testsuite/experimental/filesystem/path/preferred_separator.cc
new file mode 100644
index 000..b470e312bb1
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/path/preferred_separator.cc
@@ -0,0 +1,34 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-O0 -lstdc++fs -std=gnu++11" }
+// { dg-do link { target c++11 } }
+
+#include <experimental/filesystem>
+
+std::experimental::filesystem::path::value_type
+test01()
+{
+  auto* sep = &std::experimental::filesystem::path::preferred_separator;
+  return *sep;
+}
+
+int
+main()
+{
+  test01();
+}


[PATCH rs6000] Fix PR85698

2018-05-17 Thread Pat Haugen
The following patch fixes a problem that resulted in incorrect code generation 
for the CPU2017 benchmark 525.x264_r. The fix correctly checks the "dest" 
operand, which is the memory operand.

Bootstrap/regtest on powerpc64le and powerpc64 (-m32/-m64) with no new
regressions. Ok for trunk?

-Pat


2018-05-17  Pat Haugen  
Segher Boessenkool  

PR target/85698
* config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest 
operand.

testsuite/ChangeLog:
2018-05-17  Pat Haugen  

PR target/85698
* gcc.target/powerpc/pr85698.c: New test.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 260267)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -20234,7 +20234,7 @@ rs6000_output_move_128bit (rtx operands[
}
 
   else if (TARGET_ALTIVEC && src_vmx_p
-  && altivec_indexed_or_indirect_operand (src, mode))
+  && altivec_indexed_or_indirect_operand (dest, mode))
return "stvx %1,%y0";
 
   else if (TARGET_VSX && src_vsx_p)
Index: gcc/testsuite/gcc.target/powerpc/pr85698.c
===
--- gcc/testsuite/gcc.target/powerpc/pr85698.c  (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr85698.c  (working copy)
@@ -0,0 +1,79 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-O3 -mcpu=power7" } */
+
+/* PR85698: Incorrect code generated on LE due to use of stxvw4x. */
+
+typedef unsigned char uint8_t;
+typedef short int16_t;
+extern void abort (void);
+extern int memcmp(const void *, const void *, __SIZE_TYPE__);
+
+uint8_t expected[128] =
+{14, 0, 4, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
+ 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 28, 35, 33, 35, 36, 37, 38, 39, 40,
+ 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 66, 63, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
+ 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 97, 96,
+ 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113,
+ 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127};
+
+static uint8_t x264_clip_uint8( int x )
+{
+  return x&(~255) ? (-x)>>31 : x;
+}
+void add4x4_idct( uint8_t *p_dst, int16_t dct[16])
+{
+  int16_t d[16];
+  int16_t tmp[16];
+  int i, y, x;
+  for( i = 0; i < 4; i++ )
+{
+  int s02 =  dct[0*4+i] +  dct[2*4+i];
+  int d02 =  dct[0*4+i] -  dct[2*4+i];
+  int s13 =  dct[1*4+i] + (dct[3*4+i]>>1);
+  int d13 = (dct[1*4+i]>>1) -  dct[3*4+i];
+  tmp[i*4+0] = s02 + s13;
+  tmp[i*4+1] = d02 + d13;
+  tmp[i*4+2] = d02 - d13;
+  tmp[i*4+3] = s02 - s13;
+}
+  for( i = 0; i < 4; i++ )
+{
+  int s02 =  tmp[0*4+i] +  tmp[2*4+i];
+  int d02 =  tmp[0*4+i] -  tmp[2*4+i];
+  int s13 =  tmp[1*4+i] + (tmp[3*4+i]>>1);
+  int d13 = (tmp[1*4+i]>>1) -  tmp[3*4+i];
+  d[0*4+i] = ( s02 + s13 + 32 ) >> 6;
+  d[1*4+i] = ( d02 + d13 + 32 ) >> 6;
+  d[2*4+i] = ( d02 - d13 + 32 ) >> 6;
+  d[3*4+i] = ( s02 - s13 + 32 ) >> 6;
+}
+  for( y = 0; y < 4; y++ )
+{
+  for( x = 0; x < 4; x++ )
+p_dst[x] = x264_clip_uint8( p_dst[x] + d[y*4+x] );
+  p_dst += 32;
+}
+}
+
+int main()
+{
+  uint8_t dst[128];
+  int16_t dct[16];
+  int i;
+
+  for (i = 0; i < 16; i++)
+dct[i] = i*10 + i;
+  for (i = 0; i < 128; i++)
+dst[i] = i;
+
+  add4x4_idct(dst, dct);
+
+  if (memcmp (dst, expected, 128))
+abort();
+
+ return 0;
+}
+



Re: [PATCH GCC][4/6]Support regional coalesce and live range computation

2018-05-17 Thread Bin.Cheng
On Thu, May 17, 2018 at 3:04 PM, Richard Biener
 wrote:
> On Tue, May 15, 2018 at 5:44 PM Bin.Cheng  wrote:
>
>> On Fri, May 11, 2018 at 1:53 PM, Richard Biener
>>  wrote:
>> > On Fri, May 4, 2018 at 6:23 PM, Bin Cheng  wrote:
>> >> Hi,
>> >> Following Jeff's suggestion, I am now using existing tree-ssa-live.c
> and
>> >> tree-ssa-coalesce.c to compute register pressure, rather than inventing
>> >> another live range solver.
>> >>
>> >> The major change is to record region's basic blocks in var_map and use
> that
>> >> information in computation, rather than FOR_EACH_BB_FN.  For now only
> loop
>> >> and function type regions are supported.  The default one is function
> type
>> >> region which is used in out-of-ssa.  Loop type region will be used in
> next
>> >> patch to compute information for a loop.
>> >>
>> >> Bootstrap and test on x86_64 and AArch64 ongoing.  Any comments?
>> >
>> > I believe your changes to create_outofssa_var_map should be done
> differently
>> > by simply only calling it from the coalescing context and passing in the
>> > var_map rather than initializing it therein and returning it.
>> >
>> > This also means the coalesce_vars_p flag in the var_map structure looks
>> > somewhat out-of-place.  That is, it looks you could do with many less
>> > changes if you refactored what calls what slightly?  For example
>> > the extra arg to gimple_can_coalesce_p looks unneeded.
>> >
>> > Just as a note I do have a CFG helper pending that computes RPO order
>> > for SEME regions (attached).  loops are SEME regions, so your RTYPE_SESE
>> > is somewhat odd - I guess RTYPE_LOOP exists only because of the
>> > convenience of passing in a loop * to the "constructor".  I'd rather
>> > drop this region_type thing and always assume a SEME region - at least
>> > I didn't see anything in the patch that depends on any of the forms
>> > apart from the initial BB gathering.
>
>> Hi Richard,
>
>> Thanks for reviewing.  I refactored tree-ssa-live.c and
>> tree-ssa-coalesce.c following your comments.
>> Basically I did following changes:
>> 1) Remove region_type and only support loop region live range computation.
>>  Also I added one boolean field in var_map indicating whether we
>> are computing
>>  loop region live range or out-of-ssa.
>> 2) Refactored create_outofssa_var_map into
> create_coalesce_list_for_region and
>>  populate_coalesce_list_for_outofssa.  Actually the original
>> function name doesn't
>>  quite make sense because it has nothing to do with var_map.
>> 3) Hoist init_var_map up in call stack.  Now it's called by caller
>> (out-of-ssa or live range)
>>  and the returned var_map is passed to coalesce_* stuff.
>> 4) Move global functions to tree-outof-ssa.c and make them static.
>> 5) Save/restore flag_tree_coalesce_vars in order to avoid updating
>> checks on the flag.
>
>> So how is this one?  Patch attached.
>
> A lot better.  Few things I noticed:
>
> +  map->bmp_bbs = BITMAP_ALLOC (NULL);
>
> use a bitmap_head member and bitmap_initialize ().
>
> +  map->vec_bbs = new vec ();
>
> use a vec<> member and map->vec_bbs = vNULL;
>
> Both changes remove an unnecessary indirection.
>
> +  map->outofssa_p = true;
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +   {
> + bitmap_set_bit (map->bmp_bbs, bb->index);
> + map->vec_bbs->safe_push (bb);
> +   }
>
> I think you can avoid populating the bitmap and return
> true unconditionally for outofssa_p in the contains function?
> Ah, you already do - so why populate the bitmap?
>
> +/* Return TRUE if region of the MAP contains basic block BB.  */
> +
> +inline bool
> +region_contains_p (var_map map, basic_block bb)
> +{
> +  if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)
> +  || bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
> +return false;
> +
> +  if (map->outofssa_p)
> +return true;
> +
> +  return bitmap_bit_p (map->bmp_bbs, bb->index);
>
> the entry/exit block check should be conditional in map->outofssa_p
> but I think we should never get the function called with those args
> so we can as well use a gcc_checking_assert ()?
>
> I think as followup we should try to get a BB order that
> is more suited for the dataflow problem.  Btw, I was
> thinking about adding anoter "visited" BB flag that is guaranteed to
> be unset and free to be used by infrastructure.  So the bitmap
> could be elided for a bb flag check (but we need to clear that flag
> at the end of processing).  Not sure if it's worth to add a machinery
> to dynamically assign pass-specific flags...  it would at least be
> less error prone.  Sth to think about.
>
> So -- I think the patch is ok with the two indirections removed,
> the rest can be optimized as followup.
Hi,
This is the updated patch.  I moved the checks on ENTRY/EXIT blocks under
outofssa_p, and changed vec_bbs into an object.  Note bmp_bbs is kept as a
pointer so that we can avoid allocating memory in the out-of-ssa case.
Bootstrap and test ongoing.  Is it OK?

Thanks,
bin

Re: [PATCH rs6000] Fix PR85698

2018-05-17 Thread Segher Boessenkool
On Thu, May 17, 2018 at 10:42:46AM -0500, Pat Haugen wrote:
> The following patch fixes a problem that resulted in incorrect code 
> generation for the CPU2017 benchmark 525.x264_r. The fix correctly checks the 
> "dest" operand, which is the memory operand.
> 
> Bootstrap/regtest on powerpc64le and powerpc64 (-m32/-m64) with no new
> regressions. Ok for trunk?

Okay.  Thanks!


Segher


> 2018-05-17  Pat Haugen  
>   Segher Boessenkool  
> 
>   PR target/85698
>   * config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest 
> operand.
> 
> testsuite/ChangeLog:
> 2018-05-17  Pat Haugen  
> 
>   PR target/85698
>   * gcc.target/powerpc/pr85698.c: New test.


Re: PR83648

2018-05-17 Thread H.J. Lu
On Mon, May 14, 2018 at 11:11 PM, Prathamesh Kulkarni
 wrote:
> On 12 January 2018 at 18:26, Richard Biener  wrote:
>> On Fri, 12 Jan 2018, Prathamesh Kulkarni wrote:
>>
>>> On 12 January 2018 at 05:02, Jeff Law  wrote:
>>> > On 01/10/2018 10:04 PM, Prathamesh Kulkarni wrote:
>>> >> On 11 January 2018 at 04:50, Jeff Law  wrote:
>>> >>> On 01/09/2018 05:57 AM, Prathamesh Kulkarni wrote:
>>> 
>>>  As Jakub pointed out for the case:
>>>  void *f()
>>>  {
>>>    return __builtin_malloc (0);
>>>  }
>>> 
>>>  The malloc propagation would set f() to malloc.
>>>  However AFAIU, malloc(0) returns NULL (?) and the function shouldn't
>>>  be marked as malloc ?
>>> >>> This seems like a pretty significant concern.   Given:
>>> >>>
>>> >>>
>>> >>>  return  n ? 0 : __builtin_malloc (n);
>>> >>>
>>> >>> Is the function malloc-like enough to allow it to be marked?
>>> >>>
>>> >>> If not, then ISTM we have to be very conservative in what we mark.
>>> >>>
>>> >>> foo (n, m)
>>> >>> {
>>> >>>   return n ? 0 : __builtin_malloc (m);
>>> >>> }
>>> >>>
>>> >>> Is that malloc-like enough to mark?
>>> >> Not sure. Should I make it more conservative by marking it as malloc
>>> >> only if the argument to __builtin_malloc
>>> >> is constant or it's value-range is known not to include 0? And
>>> >> similarly for __builtin_calloc ?
>>> > It looks like the consensus is we don't need to worry about the cases
>>> > above.  So unless Jakub chimes in with a solid reason, don't worry about
>>> > them.
>>> Thanks everyone for the clarification. The attached patch skips on 0 phi 
>>> arg,
>>> and returns false if -fno-delete-null-pointer-checks is passed.
>>>
>>> With the patch, malloc_candidate_p returns true for
>>> return 0;
>>> or
>>> ret = phi<0, 0>
>>> return ret
>>>
>>> which I believe is OK as far as correctness is concerned.
>>> However as Martin points out suggesting malloc attribute for return 0
>>> case is not ideal.
>>> I suppose we can track the return 0 (or when value range of return
>>> value is known not to include 0)
>>> corner case and avoid suggesting malloc for those ?
>>>
>>> Validation in progress.
>>> Is this patch OK for next stage-1 ?
>>
>> Ok.
> I have committed this as r260250 after bootstrap+test on x86_64 on top of 
> trunk.

r260250 caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85820

-- 
H.J.


Re: PING^1: [PATCH] C/C++: Add -Waddress-of-packed-member

2018-05-17 Thread H.J. Lu
On Mon, May 14, 2018 at 8:00 PM, Martin Sebor  wrote:
> On 05/14/2018 01:10 PM, H.J. Lu wrote:
>>
>> On Mon, May 14, 2018 at 10:40 AM, H.J. Lu  wrote:
>>
>> $ cat c.i
>> struct B { int i; };
>> struct C { struct B b; } __attribute__ ((packed));
>>
>> long* g8 (struct C *p) { return p; }
>> $ gcc -O2 -S c.i -Wno-incompatible-pointer-types
>> c.i: In function ‘g8’:
>> c.i:4:33: warning: taking value of packed 'struct C *' may result in
>> an
>> unaligned pointer value [-Waddress-of-packed-member]


  ^
 That should read "taking address" (not value) but...
>>>
>>>
>>> The value of 'struct C *' is an address. There is no address taken here.
>>>
 ...to help explain the problem I would suggest to mention the expected
 and actual alignment in the warning message.  E.g.,

   storing the address of a packed 'struct C' in 'struct C *' increases
 the
 alignment of the pointer from 1 to 4.
>>>
>>>
>>> I will take a look.
>>>
 (IIUC, the source type and destination type need not be the same so
 including both should be helpful in those cases.)

 Adding a note pointing to the declaration of either the struct or
 the member would help users find it if it's a header far removed
 from the point of use.
>>>
>>>
>>> I will see what I can do.
>>
>>
>> How about this
>>
>> [hjl@gnu-skx-1 pr51628]$ cat n9.i
>> struct B { int i; };
>> struct C { struct B b; } __attribute__ ((packed));
>>
>> long* g8 (struct C *p) { return p; }
>> [hjl@gnu-skx-1 pr51628]$
>> /export/build/gnu/gcc-test/build-x86_64-linux/gcc/xgcc
>> -B/export/build/gnu/gcc-test/build-x86_64-linux/gcc/ -O2 -S n9.i
>> n9.i: In function ‘g8’:
>> n9.i:4:33: warning: returning ‘struct C *’ from a function with
>> incompatible return type ‘long int *’ [-Wincompatible-pointer-types]
>>  long* g8 (struct C *p) { return p; }
>>  ^
>> n9.i:4:33: warning: taking value of packed ‘struct C *’ increases the
>> alignment of the pointer from 1 to 8 [-Waddress-of-packed-member]
>> n9.i:2:8: note: defined here
>>  struct C { struct B b; } __attribute__ ((packed));
>
>
> Mentioning the alignments looks good.
>
> I still find the "taking value" phrasing odd.  I think we would
> describe what's going on as "converting a pointer to a packed C
> to a pointer to C (with an alignment of 8)" so I'd suggest to
> use the term converting instead.

How about this?

[hjl@gnu-skx-1 pr51628]$ cat n12.i
struct B { int i; };
struct C { struct B b; } __attribute__ ((packed));

struct B* g8 (struct C *p) { return p; }
[hjl@gnu-skx-1 pr51628]$ make n12.s
/export/build/gnu/gcc-test/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-test/build-x86_64-linux/gcc/ -O2 -S n12.i
n12.i: In function ‘g8’:
n12.i:4:37: warning: returning ‘struct C *’ from a function with
incompatible return type ‘struct B *’ [-Wincompatible-pointer-types]
 struct B* g8 (struct C *p) { return p; }
 ^
n12.i:4:37: warning: converting a pointer to packed ‘struct C *’
increases the alignment of the pointer to ‘struct B *’ from 1 to 4
[-Waddress-of-packed-member]
n12.i:2:8: note: defined here
 struct C { struct B b; } __attribute__ ((packed));
^
n12.i:1:8: note: defined here
 struct B { int i; };
^
[hjl@gnu-skx-1 pr51628]$

> I also think mentioning both the source and the destination types
> is useful irrespective of -Wincompatible-pointer-types because
> the latter is often suppressed using a cast, as in:
>
>   struct __attribute__ ((packed)) A { int i; };
>   struct B {
> struct A a;
>   } b;
>
>   long *p = (long*)&b.a.i;   // -Waddress-of-packed-member
>   int *q = (int*)&b.a;   // missing warning
>
> If the types above were obfuscated by macros, typedefs, or in
> C++ template parameters, it could be difficult to figure out
> what the type of the member is because neither it nor the name
> of the member appears in the message.

How about this

[hjl@gnu-skx-1 pr51628]$ cat n13.i
struct __attribute__ ((packed)) A { int i; };
struct B {
  struct A a;
} b;

long *p = (long*)&b.a.i;
int *q = (int*)&b.a;
[hjl@gnu-skx-1 pr51628]$ make n13.s
/export/build/gnu/gcc-test/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-test/build-x86_64-linux/gcc/ -O2 -S n13.i
n13.i:6:18: warning: taking address of packed member of ‘struct A’ may
result in an unaligned pointer value [-Waddress-of-packed-member]
 long *p = (long*)&b.a.i;
  ^~
n13.i:7:16: warning: taking address of packed member of ‘struct B’ may
result in an unaligned pointer value [-Waddress-of-packed-member]
 int *q = (int*)&b.a;
^~~~
[hjl@gnu-skx-1 pr51628]$


-- 
H.J.


Re: [PATCH rs6000] Fix PR85698

2018-05-17 Thread Richard Biener
On May 17, 2018 6:04:36 PM GMT+02:00, Segher Boessenkool 
 wrote:
>On Thu, May 17, 2018 at 10:42:46AM -0500, Pat Haugen wrote:
>> The following patch fixes a problem that resulted in incorrect code
>generation for the CPU2017 benchmark 525.x264_r. The fix correctly
>checks the "dest" operand, which is the memory operand.
>> 
>> Bootstrap/regtest on powerpc64le and powerpc64 (-m32/-m64) with no new
>> regressions. Ok for trunk?
>
>Okay.  Thanks!

Don't forget the branch. 

Richard. 

>
>Segher
>
>
>> 2018-05-17  Pat Haugen  
>>  Segher Boessenkool  
>> 
>>  PR target/85698
>>  * config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest
>operand.
>> 
>> testsuite/ChangeLog:
>> 2018-05-17  Pat Haugen  
>> 
>>  PR target/85698
>>  * gcc.target/powerpc/pr85698.c: New test.



Documentation patch for -floop-interchange and -floop-unroll-and-jam.

2018-05-17 Thread Toon Moene
The documentation of both options is still inconsistent, in both the 
trunk and the gcc-8 branch.


The following is my suggestion to clear this up (and move 
-floop-unroll-and-jam close to -floop-interchange).


ChangeLog:

2018-05-17  Toon Moene  

* doc/invoke.texi: Move -floop-unroll-and-jam documentation
directly after that of -floop-interchange. Indicate that both
options are enabled by default when specifying -O3.

OK for trunk and gcc-8 ?

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Index: invoke.texi
===
--- invoke.texi	(revision 260287)
+++ invoke.texi	(working copy)
@@ -8866,7 +8866,14 @@
 for (int j = 0; j < N; j++)
   c[i][j] = c[i][j] + a[i][k]*b[k][j];
 @end smallexample
+This flag is enabled by default at @option{-O3}.
 
+@item -floop-unroll-and-jam
+@opindex floop-unroll-and-jam
+Apply unroll and jam transformations on feasible loops.  In a loop
+nest this unrolls the outer loop by some factor and fuses the resulting
+multiple inner loops.  This flag is enabled by default at @option{-O3}.
+
 @item -ftree-loop-im
 @opindex ftree-loop-im
 Perform loop invariant motion on trees.  This pass moves only invariants that
@@ -10038,12 +10045,6 @@
 Move branches with loop invariant conditions out of the loop, with duplicates
 of the loop on both branches (modified according to result of the condition).
 
-@item -floop-unroll-and-jam
-@opindex floop-unroll-and-jam
-Apply unroll and jam transformations on feasible loops.  In a loop
-nest this unrolls the outer loop by some factor and fuses the resulting
-multiple inner loops.
-
 @item -ffunction-sections
 @itemx -fdata-sections
 @opindex ffunction-sections


Re: [PATCH rs6000] Fix PR85698

2018-05-17 Thread Segher Boessenkool
On Thu, May 17, 2018 at 07:58:20PM +0200, Richard Biener wrote:
> On May 17, 2018 6:04:36 PM GMT+02:00, Segher Boessenkool 
>  wrote:
> >On Thu, May 17, 2018 at 10:42:46AM -0500, Pat Haugen wrote:
> >> The following patch fixes a problem that resulted in incorrect code
> >generation for the CPU2017 benchmark 525.x264_r. The fix correctly
> >checks the "dest" operand, which is the memory operand.
> >> 
> >> Bootstrap/regtest on powerpc64le and powerpc64 (-m32/-m64) with no new
> >> regressions. Ok for trunk?
> >
> >Okay.  Thanks!
> 
> Don't forget the branch. 

It's okay for both 7 and 8, too.


Segher


> >> 2018-05-17  Pat Haugen  
> >>Segher Boessenkool  
> >> 
> >>PR target/85698
> >>* config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest
> >operand.
> >> 
> >> testsuite/ChangeLog:
> >> 2018-05-17  Pat Haugen  
> >> 
> >>PR target/85698
> >>* gcc.target/powerpc/pr85698.c: New test.


Re: [PATCH][AArch64] Unify vec_set patterns, support floating-point vector modes properly

2018-05-17 Thread James Greenhalgh
On Thu, May 17, 2018 at 09:26:37AM -0500, Kyrill Tkachov wrote:
> 
> On 17/05/18 14:56, Kyrill Tkachov wrote:
> >
> > On 17/05/18 09:46, Kyrill Tkachov wrote:
> >>
> >> On 15/05/18 18:56, Richard Sandiford wrote:
> >>> Kyrill  Tkachov  writes:
>  Hi all,
> 
>  We've a deficiency in our vec_set family of patterns.  We don't
>  support directly loading a vector lane using LD1 for V2DImode and all
>  the vector floating-point modes.  We do do it correctly for the other
>  integer vector modes (V4SI, V8HI etc) though.
> 
>  The alternatives on the relative floating-point patterns only allow a
>  register-to-register INS instruction.  That means if we want to load a
>  value into a vector lane we must first load it into a scalar register
>  and then perform an INS, which is wasteful.
> 
>  There is also an explicit V2DI vec_set expander dangling around for no
>  reason that I can see. It seems to do the exact same things as the
>  other vec_set expanders. This patch removes that.  It now unifies all
>  vec_set expansions into a single "vec_set" define_expand using
>  the catch-all VALL_F16 iterator.
> 
>  I decided to leave two aarch64_simd_vec_set define_insns. One
>  for the integer vector modes (that now include V2DI) and one for the
>  floating-point vector modes. That is so that we can avoid specifying
>  "w,r" alternatives for floating-point modes in case the
>  register-allocator gets confused and starts gratuitously moving
>  registers between the two banks.  So the floating-point pattern only
>  two alternatives, one for SIMD-to-SIMD INS and one for LD1.
> >>> Did you see any cases in which this was necessary?  In some ways it
> >>> seems to run counter to Wilco's recent patches, which tended to remove
> >>> the * markers from the "unnatural" register class and trust the register
> >>> allocator to make a sensible decision.
> >>>
> >>> I think our default position should be trust the allocator here.
> >>> If the consumers all require "w" registers then the RA will surely
> >>> try to use "w" registers if at all possible.  But if the consumers
> >>> don't care then it seems reasonable to offer both, since in those
> >>> cases it doesn't really make much difference whether the payload
> >>> happens to be SF or SI (say).
> >>>
> >>> There are also cases in which the consumer could actively require
> >>> an integer register.  E.g. some code uses unions to bitcast floats
> >>> to ints and then do bitwise arithmetic on them.
> >>>
> >>
> >> Thanks, that makes sense. Honestly, it's been a few months since I worked 
> >> on this patch.
> >> I believe my reluctance to specify that alternative was that it would mean 
> >> merging the integer and
> >> floating-point patterns into one (like the attached version) which would 
> >> put the "w, r" alternative
> >> first for the floating-point case. I guess we should be able to trust the 
> >> allocator to pick
> >> the sensible  alternative though.
> >>
> >
> > With some help from Wilco I can see how this approach will give us 
> > suboptimal code though.
> > If we modify the example from my original post to be:
> > v4sf
> > foo_v4sf (float *a, float *b, float *c, float *d)
> > {
> > v4sf res = { *a, b[2], *c, *d };
> > return res;
> > }
> >
> > The b[2] load will load into a GP register then do an expensive INS into 
> > the SIMD register
> > instead of loading into an FP S-register and then doing a SIMD-to-SIMD INS.
> > The only way I can get it to use the FP load then is to mark the "w, r" 
> > alternative with a '?'
> >
> 
> That patch would look like the attached. Is this preferable?
> For the above example it generates the desired:
> foo_v4sf:
>  ldr s0, [x0]
>  ldr s1, [x1, 8]
>  ins v0.s[1], v1.s[0]
>  ld1 {v0.s}[2], [x2]
>  ld1 {v0.s}[3], [x3]
>  ret
> 
> 
> rather than loading [x1, 8] into a W-reg.

OK,

Thanks,
James



Cybersecurity Software Users Contact List

2018-05-17 Thread Meghan Hudson
Hi,

Hope you having a great day!

I just wanted to be aware if you would be interested in acquiring Cybersecurity 
Software Users Contact List for marketing your product or service.

These are the fields that we provide for each contacts: Names, Title, Email, 
Contact Number, Company Name, Company URL, and Company physical location, SIC 
Code, Industry and Company Size (Revenue and Employee).

Kindly review and let me be aware of your interest so that I can get back to 
you with the exact counts and more info regarding the same.

Do let me be aware if you have any questions for me.

Regards,
Meghan Hudson
Database Executive
If you do not wish to receive these emails. Please respond Exit.


Re: [PATCH , rs6000] Add missing builtin test cases, fix arguments to match specifications.

2018-05-17 Thread Segher Boessenkool
Hi!

On Wed, May 16, 2018 at 12:53:13PM -0700, Carl Love wrote:
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-12.c 
> b/gcc/testsuite/gcc.target/powerpc/altivec-12.c
> index b0267b5..1f3175f 100644
> --- a/gcc/testsuite/gcc.target/powerpc/altivec-12.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-12.c
> @@ -18,7 +18,7 @@ vector char scz;
>  vector unsigned char uca = {0,4,8,1,5,9,2,6,10,3,7,11,15,12,14,13};
>  vector unsigned char ucb = {6,4,8,3,1,9,2,6,10,3,7,11,15,12,14,13};
>  vector unsigned char uc_expected = {3,4,8,2,3,9,2,6,10,3,7,11,15,12,14,13};
> -vector char ucz;
> +vector unsigned char ucz;

Why?  Was this a bug in the test case, does it quieten a warning?

> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-7-be.c 
> b/gcc/testsuite/gcc.target/powerpc/altivec-7-be.c
> index 1e690be..f1eb78f 100644
> --- a/gcc/testsuite/gcc.target/powerpc/altivec-7-be.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-7-be.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target powerpc*-*-* } } */
> +/* { dg-do compile { target powerpc64-*-* } } */

This is not correct.  The target triple is the (canonical) name of the
architecture the compiler is built for, but you can do for example
powerpc64-linux-gcc -m32, because we are a biarch target; a typical
way to test is

make -k -jNNN check RUNTESTFLAGS="--target_board=unix'{-m64,-m32}'"

If you want the test to only run on 64-bit (why?), you want e.g.
{ dg-do compile { target powerpc*-*-* && lp64 } } */

> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-le.c 
> b/gcc/testsuite/gcc.target/powerpc/builtins-1-le.c
> index 2dd4953..c74c493 100644
> --- a/gcc/testsuite/gcc.target/powerpc/builtins-1-le.c
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-le.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { powerpc64le-*-* } } } */
> -/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> "-mcpu=power8" } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc64le-*-* } { "-mcpu=*" } { 
> "-mcpu=power8" } } */

This makes no difference, does it?  Please keep it as it was.

> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-7-be.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-7-be.c
> index 2df9fca..85d57c8 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-7-be.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-7-be.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target powerpc*-*-* } } */
> +/* { dg-do compile { target powerpc64-*-* } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
>  /* { dg-options "-mvsx" } */
>  
> @@ -27,21 +27,21 @@
>  /* { dg-final { scan-assembler-times "vmulosb" 1 } } */
>  
>  // For LE platforms P9 and later, we generate the lxv insn instead of lxvd2x.
> -/* { dg-final { scan-assembler-times {\mlxvd2x\M}  0  { target { { powerpc64*le-*-* } && { p9vector_hw } } } } } */
> -/* { dg-final { scan-assembler-times {\mlxv\M} 36  { target { { powerpc64*le-*-* } && { p9vector_hw } } } } } */
> +/* { dg-final { scan-assembler-times {lxvd2x}  0  { target { { powerpc64*le-*-* } && { p9vector_hw } } } } } */
> +/* { dg-final { scan-assembler-times {lxv} 36  { target { { powerpc64*le-*-* } && { p9vector_hw } } } } } */

This {lxv} matches {lxvd2x} as well.  \m\M in Tcl are like \b\b in Perl,
or \<\> in many other regex dialects.


Segher


Re: [PATCH , rs6000] Add missing builtin test cases, fix arguments to match specifications.

2018-05-17 Thread Peter Bergner
On 5/17/18 3:31 PM, Segher Boessenkool wrote:
> On Wed, May 16, 2018 at 12:53:13PM -0700, Carl Love wrote:
>> @@ -27,21 +27,21 @@
>>  /* { dg-final { scan-assembler-times "vmulosb" 1 } } */
>>  
>>  // For LE platforms P9 and later, we generate the lxv insn instead of 
>> lxvd2x.
>> -/* { dg-final { scan-assembler-times {\mlxvd2x\M}  0  { target { { powerpc64*le-*-* } && { p9vector_hw } } } } } */
>> -/* { dg-final { scan-assembler-times {\mlxv\M} 36  { target { { powerpc64*le-*-* } && { p9vector_hw } } } } } */
>> +/* { dg-final { scan-assembler-times {lxvd2x}  0  { target { { powerpc64*le-*-* } && { p9vector_hw } } } } } */
>> +/* { dg-final { scan-assembler-times {lxv} 36  { target { { powerpc64*le-*-* } && { p9vector_hw } } } } } */
> 
> This {lxv} matches {lxvd2x} as well.  \m\M in Tcl are like \b\b in Perl,
> or \<\> in many other regex dialects.

The target triplet of powerpc64*le-*-* isn't modified by the patch,
but the '*' in powerpc64*le seems superfluous, so can we just remove it?

Peter



Re: [PATCH , rs6000] Add missing builtin test cases, fix arguments to match specifications.

2018-05-17 Thread Carl Love
On Thu, 2018-05-17 at 15:31 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, May 16, 2018 at 12:53:13PM -0700, Carl Love wrote:
> > diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-12.c
> > b/gcc/testsuite/gcc.target/powerpc/altivec-12.c
> > index b0267b5..1f3175f 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/altivec-12.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/altivec-12.c
> > @@ -18,7 +18,7 @@ vector char scz;
> >  vector unsigned char uca =
> > {0,4,8,1,5,9,2,6,10,3,7,11,15,12,14,13};
> >  vector unsigned char ucb =
> > {6,4,8,3,1,9,2,6,10,3,7,11,15,12,14,13};
> >  vector unsigned char uc_expected =
> > {3,4,8,2,3,9,2,6,10,3,7,11,15,12,14,13};
> > -vector char ucz;
> > +vector unsigned char ucz;
> 
> Why?  Was this a bug in the test case, does it quieten a warning?

I was actually just making the naming consistent with the rest of the
variable naming.  It doesn't impact the functionality.  The other
variables, uca, ucb for example have their types explicitly stated as 
"unsigned char" where the leading "u" stands for unsigned, "c"
represents char.  However, we have ucz as type char not explicitly
"unsigned char".  So, was just looking for consistency in the
name/declaration.

> 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-7-be.c
> > b/gcc/testsuite/gcc.target/powerpc/altivec-7-be.c
> > index 1e690be..f1eb78f 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/altivec-7-be.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/altivec-7-be.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target powerpc*-*-* } } */
> > +/* { dg-do compile { target powerpc64-*-* } } */
> 
> This is not correct.  The target triple is the (canonical) name of
> the
> architecture the compiler is built for, but you can do for example
> powerpc64-linux-gcc -m32, because we are a biarch target; a typical
> way to test is

OK, wasn't thinking about the fact that the change makes it a 64-bit
only test.  The test is supposed to be for big endian, i.e. the name is
altivec-7-be.c.  We have another test file altivec-7-le.c for little
endian testing.  The change was trying to make it a BE only test but as
you point out, I lose the 32-bit testing.  The 32-bit mode will
obviously be BE.  The thinking was powerpc64-*-* restricts the test to
BE where as powerpc64le-*-* restricts the test to LE.  So I need to
qualify that on 64-bit I only want to run if I am on a 64-bit BE
system.  How can I do that?

> > diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-le.c
> > b/gcc/testsuite/gcc.target/powerpc/builtins-1-le.c
> > index 2dd4953..c74c493 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/builtins-1-le.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-le.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile { target { powerpc64le-*-* } } } */
> > -/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
> > +/* { dg-skip-if "do not override -mcpu" { powerpc64le-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
> 
> This makes no difference, does it?  Please keep it as it was.

Ditto, trying to make this only run on LE as there is also a test file
builtins-1-be.c with  /* { dg-do compile { target { powerpc64-*-* } } }
*/ for testing on BE.  
> 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-7-be.c
> > b/gcc/testsuite/gcc.target/powerpc/vsx-7-be.c
> > index 2df9fca..85d57c8 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/vsx-7-be.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-7-be.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target powerpc*-*-* } } */
> > +/* { dg-do compile { target powerpc64-*-* } } */
> >  /* { dg-require-effective-target powerpc_vsx_ok } */
> >  /* { dg-options "-mvsx" } */
> >  
> > @@ -27,21 +27,21 @@
> >  /* { dg-final { scan-assembler-times "vmulosb" 1 } } */
> >  
> >  // For LE platforms P9 and later, we generate the lxv insn instead
> > of lxvd2x.
> > -/* { dg-final { scan-assembler-times {\mlxvd2x\M}  0  { target { { powerpc64*le-*-* } && { p9vector_hw } } } } } */
> > -/* { dg-final { scan-assembler-times {\mlxv\M} 36  { target { { powerpc64*le-*-* } && { p9vector_hw } } } } } */
> > +/* { dg-final { scan-assembler-times {lxvd2x}  0  { target { { powerpc64*le-*-* } && { p9vector_hw } } } } } */
> > +/* { dg-final { scan-assembler-times {lxv} 36  { target { { powerpc64*le-*-* } && { p9vector_hw } } } } } */
> 
> This {lxv} matches {lxvd2x} as well.  \m\M in Tcl are like \b\b in
> Perl,
> or \<\> in many other regex dialects.

OK, went and looked that up, didn't realize that was what it was doing.
  

I can change powerpc64*le-*-* to powerpc64le-*-* and leave the rest
alone if you like.

   Carl 



Re: [Patch] Do not call the linker if we are creating precompiled header files

2018-05-17 Thread Steve Ellcey
Ping.

Steve Ellcey
sell...@cavium.com


On Wed, 2018-05-02 at 12:47 -0700, Steve Ellcey wrote:
> This is a new version of a patch I sent out last year to stop gcc from
> trying to do a link when creating precompiled headers and a linker
> flag is also given.
> 
> When I build and test GCC I also build glibc and then I run the GCC tests
> with -Wl,-rpath and -Wl,--dynamic-linker so that I don't have to install
> glibc and the compiler in the default locations.  When I do this some
> precompiled header tests fail because the existance of the linker flags
> causes the compiler to try and call the linker when we really just want to
> create pch files.
> 
> I tracked this down to driver::maybe_run_linker where it sees the linker
> flags and increments num_linker_inputs, this causes the routine to call
> the linker.   This patch checks to see if we are creating precompiled
> header files and avoids calling the linker in that case.
> 
> I tested this with the GCC testsuite and got no regressions, OK to
> checkin?
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 2018-05-02  Steve Ellcey  
> 
>   * gcc.c (create_pch_flag): New variable.
>   (driver::prepare_infiles): Set create_pch_flag
>   when we are creating precompiled headers.
>   (driver::maybe_run_linker): Do not link if
>   create_pch_flag is set.
>   (driver::finalize): Reset create_pch_flag.
> 
> 
> diff --git a/gcc/gcc.c b/gcc/gcc.c
> index a716f70..ca986cf 100644
> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -208,6 +208,9 @@ int is_cpp_driver;
>  /* Flag set to nonzero if an @file argument has been supplied to gcc.  */
>  static bool at_file_supplied;
>  
> +/* Flag set to nonzero if we are generating a precompiled header.  */
> +static bool create_pch_flag;
> +
>  /* Definition of string containing the arguments given to configure.  */
>  #include "configargs.h"
>  
> @@ -8095,8 +8098,15 @@ driver::prepare_infiles ()
>      strlen (name),
>      infiles[i].language);
>  
> -  if (compiler && !(compiler->combinable))
> - combine_inputs = false;
> +  if (compiler)
> + {
> +   if (!(compiler->combinable))
> + combine_inputs = false;
> +
> +   if ((strcmp(compiler->suffix, "@c-header") == 0)
> +   || (strcmp(compiler->suffix, "@c++-header") == 0))
> + create_pch_flag = true;
> + }
>  
>    if (lang_n_infiles > 0 && compiler != input_file_compiler
>     && infiles[i].language && infiles[i].language[0] != '*')
> @@ -8282,6 +8292,10 @@ driver::maybe_run_linker (const char *argv0)
> const
>    int linker_was_run = 0;
>    int num_linker_inputs;
>  
> +  /* If we are creating a precompiled header, do not run the linker.  */
> +  if (create_pch_flag)
> +return;
> +
>    /* Determine if there are any linker input files.  */
>    num_linker_inputs = 0;
>    for (i = 0; (int) i < n_infiles; i++)
> @@ -10052,6 +10066,7 @@ driver::finalize ()
>  
>    is_cpp_driver = 0;
>    at_file_supplied = 0;
> +  create_pch_flag = 0;
>    print_help_list = 0;
>    print_version = 0;
>    verbose_only_flag = 0;


Re: [C++ Patch] PR 84588 ("[8 Regression] internal compiler error: Segmentation fault (contains_struct_check())")​ (Take 2)

2018-05-17 Thread Paolo Carlini

Hi,

On 17/05/2018 16:58, Jason Merrill wrote:

On Thu, May 17, 2018 at 10:27 AM, Paolo Carlini
 wrote:

PS: maybe better using function_declarator_p?

I think so, yes.  The relevant rule seems to be "The declarator shall
not specify a function or an array.", so let's check for arrays, too.
Agreed. I had the amended patch ready when I noticed (again) that it 
wasn't addressing another related class of issues which involves 
declarators not followed by initializers. Thus I tried to fix those too, 
and the below which moves the check up appears to work fine, passes 
testing, etc. Are there any risks that an erroneous function / array as 
declarator is in fact a well formed expression?!? I haven't been able so 
far to construct examples...


Thanks!
Paolo.


Index: cp/parser.c
===
--- cp/parser.c (revision 260331)
+++ cp/parser.c (working copy)
@@ -11527,6 +11527,33 @@ cp_parser_selection_statement (cp_parser* parser,
 }
 }
 
+/* Helper function for cp_parser_condition.  Enforces [stmt.stmt]:
+   The declarator shall not specify a function or an array.  Returns
+   TRUE if the declarator is valid, FALSE otherwise.  */
+
+static bool
+cp_parser_check_condition_declarator (cp_parser* parser,
+ cp_declarator *declarator,
+ location_t loc)
+{
+  if (function_declarator_p (declarator)
+  || declarator->kind == cdk_array)
+{
+  if (declarator->kind == cdk_array)
+   error_at (loc, "an array type is not allowed here");
+  else
+   error_at (loc, "a function type is not allowed here");
+  if (parser->fully_implicit_function_template_p)
+   abort_fully_implicit_template (parser);
+  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+/*or_comma=*/false,
+/*consume_paren=*/false);
+  return false;
+}
+  else
+return true;
+}
+
 /* Parse a condition.
 
condition:
@@ -11571,6 +11598,7 @@ cp_parser_condition (cp_parser* parser)
   tree attributes;
   cp_declarator *declarator;
   tree initializer = NULL_TREE;
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
 
   /* Parse the declarator.  */
   declarator = cp_parser_declarator (parser, CP_PARSER_DECLARATOR_NAMED,
@@ -11592,10 +11620,15 @@ cp_parser_condition (cp_parser* parser)
   if (cp_lexer_next_token_is_not (parser->lexer, CPP_EQ)
  && cp_lexer_next_token_is_not (parser->lexer, CPP_OPEN_BRACE))
cp_parser_simulate_error (parser);
-   
+
   /* If we did see an `=' or '{', then we are looking at a declaration
 for sure.  */
-  if (cp_parser_parse_definitely (parser))
+  bool decl_p = cp_parser_parse_definitely (parser);
+
+  if (!cp_parser_check_condition_declarator (parser, declarator, loc))
+   return error_mark_node;
+
+  if (decl_p)
{
  tree pushed_scope;
  bool non_constant_p;
Index: testsuite/g++.dg/cpp0x/cond1.C
===
--- testsuite/g++.dg/cpp0x/cond1.C  (nonexistent)
+++ testsuite/g++.dg/cpp0x/cond1.C  (working copy)
@@ -0,0 +1,23 @@
+// PR c++/84588
+// { dg-do compile { target c++11 } }
+
+void foo()
+{
+  if (int bar() {})  // { dg-error "function type is not allowed" }
+;
+
+  for (;int bar() {};)  // { dg-error "function type is not allowed" }
+;
+
+  while (int bar() {})  // { dg-error "function type is not allowed" }
+;
+
+  if (int a[] {})  // { dg-error "array type is not allowed" }
+;
+
+  for (;int a[] {};)  // { dg-error "array type is not allowed" }
+;
+
+  while (int a[] {})  // { dg-error "array type is not allowed" }
+;
+}
Index: testsuite/g++.dg/cpp1y/pr84588-1.C
===
--- testsuite/g++.dg/cpp1y/pr84588-1.C  (nonexistent)
+++ testsuite/g++.dg/cpp1y/pr84588-1.C  (working copy)
@@ -0,0 +1,25 @@
+// { dg-do compile { target c++14 } }
+
+struct a {
+  void b() {}
+  void c(void (*) () = [] {
+  if (a a(int auto) {})  // { dg-error "two or more data types|function type" }
+  ;
+  }) {}
+};
+
+struct d {
+  void e() {}
+  void f(void (*) () = [] {
+  for (;d d(int auto) {};)  // { dg-error "two or more data types|function type" }
+  ;
+  }) {}
+};
+
+struct g {
+  void h() {}
+  void i(void (*) () = [] {
+  while (g g(int auto) {})  // { dg-error "two or more data types|function type" }
+  ;
+  }) {}
+};
Index: testsuite/g++.dg/cpp1y/pr84588-2.C
===
--- testsuite/g++.dg/cpp1y/pr84588-2.C  (nonexistent)
+++ testsuite/g++.dg/cpp1y/pr84588-2.C  (working copy)
@@ -0,0 +1,25 @@
+// { dg-do compile { target c++14 } }
+
+struct a {
+  void b() {}
+  void c(void (*) () = [] {
+  if

Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-17 Thread Jim Wilson
On Thu, May 17, 2018 at 12:25 AM, Eric Botcazou  wrote:
> The patch looks OK to me, modulo:
>
>> +   && ! (INTVAL (range) & (HOST_WIDE_INT_1U << (width - 1
>
> I'd use UINTVAL instead of INTVAL here.

Thanks.  Committed with that change.

Jim


Re: [PATCH 2/2, RISC-V] Optimize switch with sign-extended index.

2018-05-17 Thread Jim Wilson
On Wed, May 2, 2018 at 3:05 PM, Jim Wilson  wrote:
> * config/riscv/riscv.c (riscv_extend_comparands): In unsigned QImode
> test, check for sign extended subreg and/or constant operands, and
> do a sign extend in that case.
>
> gcc/testsuite/
> * gcc.target/riscv/switch-qi.c: New.
> * gcc.target/riscv/switch-si.c: New.

Committed.

Jim


Re: [C++ Patch] PR 84588 ("[8 Regression] internal compiler error: Segmentation fault (contains_struct_check())")​ (Take 2)

2018-05-17 Thread Jason Merrill
On Thu, May 17, 2018 at 5:54 PM, Paolo Carlini  wrote:
> On 17/05/2018 16:58, Jason Merrill wrote:
>>
>> On Thu, May 17, 2018 at 10:27 AM, Paolo Carlini
>>  wrote:
>>>
>>> PS: maybe better using function_declarator_p?
>>
>> I think so, yes.  The relevant rule seems to be "The declarator shall
>> not specify a function or an array.", so let's check for arrays, too.
>
> Agreed. I had the amended patch ready when I noticed (again) that it wasn't
> addressing another related class of issues which involves declarators not
> followed by initializers. Thus I tried to fix those too, and the below which
> moves the check up appears to work fine, passes testing, etc. Are there any
> risks that an erroneous function / array as declarator is in fact a well
> formed expression?!?

That doesn't matter; if it parses as a declarator, it's a declarator,
even if it's an ill-formed declarator.  But...

+  bool decl_p = cp_parser_parse_definitely (parser);
+  if (!cp_parser_check_condition_declarator (parser, declarator, loc))
+return error_mark_node;

...if cp_parser_parse_definitely returns false, parsing as a
declarator failed, so we shouldn't look at "declarator".

Also, "here" in the diagnostic seems unnecessarily vague; we could be
more specific.  Maybe "condition declares a function/array"?

Jason


libcpp PATCH to avoid deprecated copy assignment

2018-05-17 Thread Jason Merrill
Another case of assignment from a value-initialized temporary, which
in this case ought to be placement new.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit ccd4031ebdabf02fe0d54bb43a68c0fa72ec2708
Author: Jason Merrill 
Date:   Thu May 17 17:16:28 2018 -0400

* line-map.c (linemap_init): Use placement new.

* system.h: #include <new>.

diff --git a/libcpp/line-map.c b/libcpp/line-map.c
index a84084c99f0..b2ebfeb16d4 100644
--- a/libcpp/line-map.c
+++ b/libcpp/line-map.c
@@ -348,7 +348,7 @@ linemap_init (struct line_maps *set,
   /* PR33916, needed to fix PR82939.  */
   memset (set, 0, sizeof (struct line_maps));
 #else
-  *set = line_maps ();
+  new (set) line_maps();
 #endif
   set->highest_location = RESERVED_LOCATION_COUNT - 1;
   set->highest_line = RESERVED_LOCATION_COUNT - 1;
diff --git a/libcpp/system.h b/libcpp/system.h
index 719435df949..76420e16cfb 100644
--- a/libcpp/system.h
+++ b/libcpp/system.h
@@ -438,6 +438,10 @@ extern void fancy_abort (const char *, int, const char *) ATTRIBUTE_NORETURN;
 /* Some compilers do not allow the use of unsigned char in bitfields.  */
 #define BOOL_BITFIELD unsigned int
 
+#ifdef __cplusplus
+#include <new>
+#endif
+
 /* Poison identifiers we do not want to use.  */
 #if (GCC_VERSION >= 3000)
 #undef calloc


Re: [PATCH] use string length to relax -Wstringop-overflow for nonstrings (PR 85623)

2018-05-17 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-05/msg00509.html

On 05/10/2018 01:26 PM, Martin Sebor wrote:

GCC 8.1 warns for unbounded (and some bounded) string comparisons
involving arrays declared attribute nonstring (i.e., char arrays
that need not be nul-terminated).  For instance:

  extern __attribute__((nonstring)) char a[4];

  int f (void)
  {
return strncmp (a, "123", sizeof a);
  }

  warning: ‘strcmp’ argument 1 declared attribute ‘nonstring’

Note that the warning refers to strcmp even though the call in
the source is to strncmp, because prior passes transform one to
the other.

The warning above is unnecessary (for strcmp) and incorrect for
strncmp because the call reads exactly four bytes from the non-
string array a regardless of the bound and so there is no risk
that it will read past the end of the array.

The attached change enhances the warning to use the length of
the string argument to suppress some of these needless warnings
for both bounded and unbounded string comparison functions.
When the length of the string is unknown, the warning uses its
size (when possible) as the upper bound on the number of accessed
bytes.  The change adds no new warnings.

I'm looking for approval to commit it to both trunk and 8-branch.

Martin




Re: [C++ Patch] PR 84588 ("[8 Regression] internal compiler error: Segmentation fault (contains_struct_check())")​ (Take 2)

2018-05-17 Thread Paolo Carlini

Hi,

On 18/05/2018 01:21, Jason Merrill wrote:

On Thu, May 17, 2018 at 5:54 PM, Paolo Carlini  wrote:

On 17/05/2018 16:58, Jason Merrill wrote:

On Thu, May 17, 2018 at 10:27 AM, Paolo Carlini
 wrote:

PS: maybe better using function_declarator_p?

I think so, yes.  The relevant rule seems to be "The declarator shall
not specify a function or an array.", so let's check for arrays, too.

Agreed. I had the amended patch ready when I noticed (again) that it wasn't
addressing another related class of issues which involves declarators not
followed by initializers. Thus I tried to fix those too, and the below which
moves the check up appears to work fine, passes testing, etc. Are there any
risks that an erroneous function / array as declarator is in fact a well
formed expression?!?

That doesn't matter; if it parses as a declarator, it's a declarator,
even if it's an ill-formed declarator.  But...

+  bool decl_p = cp_parser_parse_definitely (parser);
+  if (!cp_parser_check_condition_declarator (parser, declarator, loc))
+return error_mark_node;

...if cp_parser_parse_definitely returns false, parsing as a
declarator failed, so we shouldn't look at "declarator".
Uhm, then you are saying that we should fix cp_parser_declarator itself, 
right? Because we don't want cp_parser_parse_definitely to return false 
after cp_parser_declarator parses, say, 'if (int foo())', and then have 
cp_parser_condition proceed with cp_parser_expression; we want to emit 
our error and bail out. Therefore the problem with the new patch seems 
to be that it tries to paper over that cp_parser_declarator issue in the 
caller?!? Like, OK, cp_parser_declarator failed, but it was anyway 
trying to declare a function / array and that can't possibly be an 
expression, see what I mean? *Somehow*, the question you answered above.


Paolo.


Re: [C++ Patch] PR 84588 ("[8 Regression] internal compiler error: Segmentation fault (contains_struct_check())")​ (Take 2)

2018-05-17 Thread Paolo Carlini

Hi again,

On 18/05/2018 02:31, Paolo Carlini wrote:

Hi,

On 18/05/2018 01:21, Jason Merrill wrote:
On Thu, May 17, 2018 at 5:54 PM, Paolo Carlini 
 wrote:

On 17/05/2018 16:58, Jason Merrill wrote:

On Thu, May 17, 2018 at 10:27 AM, Paolo Carlini
 wrote:

PS: maybe better using function_declarator_p?

I think so, yes.  The relevant rule seems to be "The declarator shall
not specify a function or an array.", so let's check for arrays, too.
Agreed. I had the amended patch ready when I noticed (again) that it 
wasn't
addressing another related class of issues which involves 
declarators not
followed by initializers. Thus I tried to fix those too, and the 
below which
moves the check up appears to work fine, passes testing, etc. Are 
there any
risks that an erroneous function / array as declarator is in fact a 
well

formed expression?!?

That doesn't matter; if it parses as a declarator, it's a declarator,
even if it's an ill-formed declarator.  But...

+  bool decl_p = cp_parser_parse_definitely (parser);
+  if (!cp_parser_check_condition_declarator (parser, declarator, 
loc))

+    return error_mark_node;

...if cp_parser_parse_definitely returns false, parsing as a
declarator failed, so we shouldn't look at "declarator".
Uhm, then you are saying that we should fix cp_parser_declarator 
itself, right? Because we don't want cp_parser_parse_definitely 
returning false after cp_parser_declarator parses, say, 'if (int 
foo())' and therefore cp_parser_condition proceed with 
cp_parser_expression, we want to emit our error and bail out. 
Therefore the problem in the new patch seems that it tries to paper 
over that cp_parser_declarator issue in the caller?!? Like, Ok, 
cp_parser_declarator failed, but it was anyway trying to declare a 
function / array and that can't possibly be an expression, see what I 
mean? *Somehow*, the question you answered above.
Ok, now I finally see the exact issue you pointed out (I'm a bit 
tired). Seems fixable.


If I understand correctly, the reason why the 3 lines you cited above 
are wrong as they are is that my patch *assumes* that 
cp_parser_declarator didn't really fail and cp_parser_condition has 
forced the tentative parse to fail by calling cp_parser_simulate_error 
immediately before when it didn't see an initializer immediately 
following. That's actually true for 'if (int foo())', thus it makes 
sense to check the declarator anyway for such cases *even* if 
cp_parser_parse_definitely returns false. See what I mean?


Therefore, it seems to me that an amended patch would rearrange 
cp_parser_condition to *not* call cp_parser_simulate_error for the cases 
we care about ('if (int foo())') and instead check the declarator.


I'll work on that tomorrow...

Thanks,
Paolo.



[PATCH] refine -Wstringop-truncation and -Wsizeof-pointer-memaccess for strncat of nonstrings (PR 85602)

2018-05-17 Thread Martin Sebor

The -Wstringop-truncation and -Wsizeof-pointer-memaccess warnings
I added and enhanced, respectively, in GCC 8 are arguably overly
strict for source arguments declared with the nonstring attribute.

For example, -Wsizeof-pointer-memaccess triggers for the strncat
call below:

  __attribute__ ((nonstring)) char nonstr[8];
  extern char *d;
  strncat (d, nonstr, sizeof nonstr);

even though it's still a fairly common (if unsafe) idiom from
the early UNIX days (V7 from 1979 to be exact) where strncat
was introduced.  (This use case, modulo the attribute, was
reduced from coreutils.)

Similarly, -Wstringop-truncation warns for some strncat calls that
are actually safe, such as in:

  strcpy (nonstr, "123");
  strncat (d, nonstr, 32);

To help with the adoption of the warnings and the attribute and
avoid unnecessary churn the attached patch relaxes both warnings
to accept code like this without diagnostics.

The patch doesn't add any new warnings so I'd like it considered
for GCC 8 in addition to trunk.

Thanks
Martin
PR middle-end/85602 -  -Wsizeof-pointer-memaccess for strncat with size of source

gcc/c-family/ChangeLog:

	PR middle-end/85602
	* c-warn.c (sizeof_pointer_memaccess_warning): Check for attribute
	nonstring.

gcc/ChangeLog:

	PR middle-end/85602
	* calls.c (maybe_warn_nonstring_arg): Handle strncat.
	* tree-ssa-strlen.c (is_strlen_related_p): Make extern.
	Handle integer subtraction.
	(maybe_diag_stxncpy_trunc): Handle nonstring source arguments.
	* tree-ssa-strlen.h (is_strlen_related_p): Declare.

gcc/testsuite/ChangeLog:

	PR middle-end/85602
	* c-c++-common/attr-nonstring-3.c: Adjust.
	* c-c++-common/attr-nonstring-6.c: New test.

diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c
index d0d9c78..dc87f01 100644
--- a/gcc/c-family/c-warn.c
+++ b/gcc/c-family/c-warn.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gcc-rich-location.h"
 #include "gimplify.h"
 #include "c-family/c-indentation.h"
+#include "calls.h"
 
 /* Print a warning if a constant expression had overflow in folding.
Invoke this function on every expression that the language
@@ -798,7 +799,12 @@ sizeof_pointer_memaccess_warning (location_t *sizeof_arg_loc, tree callee,
 	  tem = tree_strip_nop_conversions (src);
 	  if (TREE_CODE (tem) == ADDR_EXPR)
 	tem = TREE_OPERAND (tem, 0);
-	  if (operand_equal_p (tem, sizeof_arg[idx], OEP_ADDRESS_OF))
+
+	  /* Avoid diagnosing sizeof SRC when SRC is declared with
+	 attribute nonstring.  */
+	  tree dummy;
+	  if (operand_equal_p (tem, sizeof_arg[idx], OEP_ADDRESS_OF)
+	  && !get_attr_nonstring_decl (tem, &dummy))
 	warning_at (sizeof_arg_loc[idx], OPT_Wsizeof_pointer_memaccess,
 			"argument to %<sizeof%> in %qD call is the same "
 			"expression as the source; did you mean to use "
diff --git a/gcc/calls.c b/gcc/calls.c
index 9eb0467..472c330 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-chkp.h"
 #include "tree-vrp.h"
 #include "tree-ssanames.h"
+#include "tree-ssa-strlen.h"
 #include "rtl-chkp.h"
 #include "intl.h"
 #include "stringpool.h"
@@ -1614,9 +1615,12 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
 
   /* It's safe to call "bounded" string functions with a non-string
  argument since the functions provide an explicit bound for this
- purpose.  */
-  switch (DECL_FUNCTION_CODE (fndecl))
+ purpose.  The exception is strncat where the bound may refer to
+ either the destination or the source.  */
+  int fncode = DECL_FUNCTION_CODE (fndecl);
+  switch (fncode)
 {
+case BUILT_IN_STRNCAT:
 case BUILT_IN_STPNCPY:
 case BUILT_IN_STPNCPY_CHK:
 case BUILT_IN_STRNCMP:
@@ -1687,30 +1691,89 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
   if (!decl)
 	continue;
 
-  tree type = TREE_TYPE (decl);
-
   offset_int wibnd = 0;
-  if (bndrng[0])
+
+  if (argno && fncode == BUILT_IN_STRNCAT)
+	{
+	  /* See if the bound in strncat is derived from the length
+	 of the strlen of the destination (as it's expected to be).
+	 If so, reset BOUND and FNCODE to trigger a warning.  */
+	  tree dstarg = CALL_EXPR_ARG (exp, 0);
+	  if (is_strlen_related_p (dstarg, bound))
+	{
+	  /* The bound applies to the destination, not to the source,
+		 so reset these to trigger a warning without mentioning
+		 the bound.  */
+	  bound = NULL;
+	  fncode = 0;
+	}
+	  else if (bndrng[1])
+	/* Use the upper bound of the range for strncat.  */
+	wibnd = wi::to_offset (bndrng[1]);
+	}
+  else if (bndrng[0])
+	/* Use the lower bound of the range for functions other than
+	   strncat.  */
 	wibnd = wi::to_offset (bndrng[0]);
 
+  /* Determine the size of the argument array if it is one.  */
   offset_int asize = wibnd;
+  bool known_size = false;
+  tree type = TREE_TYPE (decl);
 
   if (TREE_CODE (type) == ARRAY_TYPE)
 	if (tree arrbnd = TYPE_DOMAIN (type))
 	  {
 	  

Re: [RFC][PR64946] "abs" vectorization fails for char/short types

2018-05-17 Thread Kugan Vivekanandarajah
Hi Richard,

Thanks for the review. I am revising the patch based on Andrew's comments too.

On 17 May 2018 at 20:36, Richard Biener  wrote:
> On Thu, May 17, 2018 at 4:56 AM Andrew Pinski  wrote:
>
>> On Wed, May 16, 2018 at 7:14 PM, Kugan Vivekanandarajah
>>  wrote:
>> > As mentioned in the PR, I am trying to add ABSU_EXPR to fix this
>> > issue. In the attached patch, in fold_cond_expr_with_comparison I am
>> > generating ABSU_EXPR for these cases. As I understand, absu_expr is
>> > well defined in RTL. So, the issue is generating absu_expr  and
>> > transferring to RTL in the correct way. I am not sure I am not doing
>> > all that is needed. I will clean up and add more test-cases based on
>> > the feedback.
>
>
>> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
>> index 71e172c..2b812e5 100644
>> --- a/gcc/optabs-tree.c
>> +++ b/gcc/optabs-tree.c
>> @@ -235,6 +235,7 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>> return trapv ? negv_optab : neg_optab;
>
>>   case ABS_EXPR:
>> +case ABSU_EXPR:
>> return trapv ? absv_optab : abs_optab;
>
>
>> This part is not correct, it should something like this:
>
>>   case ABS_EXPR:
>> return trapv ? absv_optab : abs_optab;
>> +case ABSU_EXPR:
>> +   return abs_optab ;
>
>> Because ABSU is not undefined at the TYPE_MAX.
>
> Also
>
> /* Unsigned abs is simply the operand.  Testing here means we don't
>   risk generating incorrect code below.  */
> -  if (TYPE_UNSIGNED (type))
> +  if (TYPE_UNSIGNED (type)
> + && (code != ABSU_EXPR))
>  return op0;
>
> is wrong.  ABSU of an unsigned number is still just that number.
>
> The change to fold_cond_expr_with_comparison looks odd to me
> (premature optimization).  It should be done separately - it seems
> you are doing

The FE seems to use this to generate ABS_EXPR, from
c_fully_fold_internal through fold_build3_loc and so on. I changed this
to generate ABSU_EXPR for the case in the testcase. So the question
is: in what cases do we need ABS_EXPR, and in what cases do we need
ABSU_EXPR? It is not very clear to me.


>
> (simplify (abs (convert @0)) (convert (absu @0)))
>
> here.
>
> You touch one other place in fold-const.c but there seem to be many
> more that need ABSU_EXPR handling (you touched the one needed
> for correctness) - esp. you should at least handle constant folding
> in const_unop and the nonnegative predicate.

OK.
>
> @@ -3167,6 +3167,9 @@ verify_expr (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
> CHECK_OP (0, "invalid operand to unary operator");
> break;
>
> +case ABSU_EXPR:
> +  break;
> +
>   case REALPART_EXPR:
>   case IMAGPART_EXPR:
>
> verify_expr is no more.  Did you test this recently against trunk?

This patch is against slightly older trunk. I will rebase it.

>
> @@ -3937,6 +3940,9 @@ verify_gimple_assign_unary (gassign *stmt)
>   case PAREN_EXPR:
>   case CONJ_EXPR:
> break;
> +case ABSU_EXPR:
> +  /* FIXME.  */
> +  return false;
>
> no - please not!  Please add verification here - ABSU should be only
> called on INTEGRAL, vector or complex INTEGRAL types and the
> type of the LHS should be always the unsigned variant of the
> argument type.

OK.
>
> if (is_gimple_val (cond_expr))
>   return cond_expr;
>
> -  if (TREE_CODE (cond_expr) == ABS_EXPR)
> +  if (TREE_CODE (cond_expr) == ABS_EXPR
> +  || TREE_CODE (cond_expr) == ABSU_EXPR)
>   {
> rhs1 = TREE_OPERAND (cond_expr, 1);
> STRIP_USELESS_TYPE_CONVERSION (rhs1);
>
> err, but the next line just builds an ABS_EXPR ...
>
> How did you identify spots that need adjustment?  I would expect that
> once folding generates ABSU_EXPR that you need to adjust frontends
> (C++ constexpr handling for example).  Also I miss adjustments
> to gimple-pretty-print.c and the GIMPLE FE parser.

I will add this.
>
> recursively grepping throughout the whole gcc/ tree doesn't reveal too many
> cases of ABS_EXPR so I think it's reasonable to audit all of them.
>
> I also miss some trivial absu simplifications in match.pd.  There are not
> a lot of abs cases but similar ones would be good to have initially.

I will add them in the next version.

Thanks,
Kugan

>
> Thanks for tackling this!
> Richard.
>
>> Thanks,
>> Andrew
>
>> >
>> > Thanks,
>> > Kugan
>> >
>> >
>> > gcc/ChangeLog:
>> >
>> > 2018-05-13  Kugan Vivekanandarajah  
>> >
>> > * expr.c (expand_expr_real_2): Handle ABSU_EXPR.
>> > * fold-const.c (fold_cond_expr_with_comparison): Generate ABSU_EXPR
>> > (fold_unary_loc): Handle ABSU_EXPR.
>> > * optabs-tree.c (optab_for_tree_code): Likewise.
>> > * tree-cfg.c (verify_expr): Likewise.
>> > (verify_gimple_assign_unary):  Likewise.
>> > * tree-if-conv.c (fold_build_cond_expr):  Likewise.
>> > * tree-inline.c (estimate_operator_cost):  Likewise.
>> > * tree-pretty-print.c (dump_generic_node):  Likewise.
>> >