Re: [PATCH] Come up with -flto=auto option.

2019-07-24 Thread Allan Sandfeld Jensen
On Mittwoch, 24. Juli 2019 08:45:21 CEST Martin Liška wrote:
> On 7/24/19 12:11 AM, Allan Sandfeld Jensen wrote:
> > On Dienstag, 23. Juli 2019 10:30:07 CEST Martin Liška wrote:
> >> Hi.
> >> 
> >> As we at openSUSE have started using -flto, I see it as very handy to have
> >> an option value that will automatically detect the number of cores
> >> that can be used for the parallel LTRANS phase.
> >> 
> >> Thoughts?
> > 
> > That's really nice.
> > 
> > How much extra work would it be to make it support a posix make jobserver?
> 
> We do support it via -flto=jobserver:
> 
Good to know :)

> 
> Problem is that nowadays you have much more common build systems like ninja,
> meson and others that probably do not support that.
> 
There are patches to enable it in ninja, and I know some Linux distros apply
the patches by default. Though that is more about listening, so it probably
requires launching ninja using make, if you want to be able to pass it on to
gcc.

'Allan




Re: [PATCH] Come up with -flto=auto option.

2019-07-24 Thread Martin Liška
On 7/24/19 9:03 AM, Allan Sandfeld Jensen wrote:
> On Mittwoch, 24. Juli 2019 08:45:21 CEST Martin Liška wrote:
>> On 7/24/19 12:11 AM, Allan Sandfeld Jensen wrote:
>>> On Dienstag, 23. Juli 2019 10:30:07 CEST Martin Liška wrote:
 Hi.

 As we at openSUSE have started using -flto, I see it as very handy to have
 an option value that will automatically detect the number of cores
 that can be used for the parallel LTRANS phase.

 Thoughts?
>>>
>>> That's really nice.
>>>
>>> How much extra work would it be to make it support a posix make jobserver?
>>
>> We do support it via -flto=jobserver:
>>
> Good to know :)
> 
>>
>> Problem is that nowadays you have much more common build systems like ninja,
>> meson and others that probably do not support that.
>>
> There are patches to enable it in ninja, and I know some Linux distros apply
> the patches by default. Though that is more about listening, so it probably
> requires launching ninja using make, if you want to be able to pass it on
> to gcc.

Btw. I know that Nathan will need a much closer integration between a compiler
and a build system for his C++ modules work. That can be a good opportunity
for LTO to utilize the infrastructure as well.

Martin

> 
> 'Allan
> 
> 



Re: [libgomp, WIP, GSoC'19] Modification to a single queue, single execution path.

2019-07-24 Thread Jakub Jelinek
Hi!

Thanks for the patch.

On Mon, Jul 22, 2019 at 04:40:21AM +0900, 김규래 wrote:

Can you please try to tweak your mailer settings?  In the text version of
the patch the mailer ate tab characters, so the patch can't apply, and in
the html version which we generally don't want to see on the mailing list,
one has to open it in some web browser, copy from there and replace \n\n
with \n to get something that can apply.

> @@ -447,7 +451,7 @@ struct gomp_task
>/* Parent of this task.  */
>struct gomp_task *parent;
>/* Children of this task.  */
> -  struct priority_queue children_queue;
> +  /* struct priority_queue children_queue; */

Generally, we don't want to have code commented out like this in the final
patch submission.  For this WIP, I think it is acceptable, as I think in the
end you don't want to use the team's queue, but instead either keep
children_queue (renamed) and only use it on the implicit tasks, or during
team creation allocate together with the team structure also memory that
would be used as a trailing array for an array of the implicit queues next
to the array of implicit tasks.
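
A minimal C sketch of that second layout, with made-up names
(team_sketch etc.) standing in for libgomp's real structures:

#include <stdlib.h>

struct task_sketch { int dummy; };
struct queue_sketch { int dummy; };

/* One allocation holds the team state, then NTHREADS implicit tasks,
   then NTHREADS implicit queues as a second trailing array.  */
struct team_sketch
{
  unsigned nthreads;
  struct task_sketch implicit_tasks[]; /* flexible array member */
};

static struct team_sketch *
alloc_team (unsigned nthreads)
{
  struct team_sketch *team
    = malloc (sizeof (struct team_sketch)
	      + nthreads * sizeof (struct task_sketch)
	      + nthreads * sizeof (struct queue_sketch));
  team->nthreads = nthreads;
  return team;
}

/* The implicit queues start right after the last implicit task.  */
static struct queue_sketch *
implicit_queues (struct team_sketch *team)
{
  return (struct queue_sketch *) (team->implicit_tasks + team->nthreads);
}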

>/* The priority node for this task in each of the different queues.
>   We put this here to avoid allocating space for each priority
>   node.  Then we play offsetof() games to convert between pnode[]
>   entries and the gomp_task in which they reside.  */
> -  struct priority_node pnode[3];
> +  struct priority_node pnode;

If there is just one priority_node, the above comment can go,

> @@ -1211,7 +1218,7 @@ extern int gomp_test_nest_lock_25 (omp_nest_lock_25_t 
> *) __GOMP_NOTHROW;
>  static inline size_t
>  priority_queue_offset (enum priority_queue_type type)
>  {
> -  return offsetof (struct gomp_task, pnode[(int) type]);
> +  return offsetof (struct gomp_task, pnode);

this whole routine as well and we just don't need offsetof anymore, nor
priority_queue_type anywhere, PQ_*, etc., just use the single pnode.
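
For context, a self-contained sketch (hypothetical names, not GCC code)
of the offsetof conversion that becomes unnecessary once there is a
single pnode:

#include <stddef.h>

struct node { struct node *next; };

struct task_sketch
{
  int other_state;
  struct node pnode[3]; /* one node per queue the task can be on */
};

/* Recover the task from one of its queue nodes.  With pnode[3] the
   offset depends on which queue (TYPE) the node belongs to, which is
   the offsetof game described above; with a single pnode it
   degenerates to a compile-time constant.  */
static struct task_sketch *
node_to_task (struct node *n, int type)
{
  size_t off = offsetof (struct task_sketch, pnode)
	       + type * sizeof (struct node);
  return (struct task_sketch *) ((char *) n - off);
}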

> @@ -182,8 +128,8 @@ gomp_task_handle_depend (struct gomp_task *task, struct 
> gomp_task *parent,
>  }
>else
>  {
> -  ndepend = (uintptr_t) depend[1]; /* total # */
> -  size_t nout = (uintptr_t) depend[2]; /* # of out: and inout: */
> +  ndepend = (uintptr_t) depend[1];  /* total # */
> +  size_t nout = (uintptr_t) depend[2];/* # of out: and inout: */

The patch contains a lot of changes like the above one, a small portion
improves formatting, but most of them make it worse or are unnecessary.
The above one seems unnecessary (what is wrong is just space followed by
tab, that should never happen).

> @@ -235,8 +181,8 @@ gomp_task_handle_depend (struct gomp_task *task, struct 
> gomp_task *parent,
>task->depend[i].redundant = false;
>task->depend[i].redundant_out = false;
>  
> -  hash_entry_type *slot = htab_find_slot (&parent->depend_hash,
> -   &task->depend[i], INSERT);
> +  hash_entry_type *slot
> + = htab_find_slot (&parent->depend_hash, &task->depend[i], INSERT);
>hash_entry_type out = NULL, last = NULL;
>if (*slot)

This change is correct formatting-wise, but the old one was correct too, so
there is no reason to change it like that.

>   {
> @@ -282,13 +228,12 @@ gomp_task_handle_depend (struct gomp_task *task, struct 
> gomp_task *parent,
>   continue;
> else if (tsk->dependers->n_elem == tsk->dependers->allocated)
>   {
> -   tsk->dependers->allocated
> - = tsk->dependers->allocated * 2 + 2;
> +   tsk->dependers->allocated = tsk->dependers->allocated * 2 + 2;

This one is wrong, as the line is already too long by 1 character after
the change.

> tsk->dependers
>   = gomp_realloc (tsk->dependers,
>   sizeof (struct gomp_dependers_vec)
> - + (tsk->dependers->allocated
> -* sizeof (struct gomp_task *)));
> +   + (tsk->dependers->allocated
> + * sizeof (struct gomp_task *)));

The old formatting was good, the new one is not.
>   }
> tsk->dependers->elem[tsk->dependers->n_elem++] = task;
> task->num_dependees++;
> @@ -371,8 +316,7 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) 
> (void *, void *),
>   {
> if (thr->task->taskgroup->cancelled)
>   return;
> -   if (thr->task->taskgroup->workshare
> -   && thr->task->taskgroup->prev
> +   if (thr->task->taskgroup->workshare && thr->task->taskgroup->prev
> && thr->task->taskgroup->prev->cancelled)

Again.  The coding conventions have a rule that if the whole condition
fits on one line, then it should be on one line, but if it doesn't, each
&& or || operand should go on a separate line (I know, there are several
spots where the code doesn't honor it, but this one wasn't the case).

>   return;
>   }
> @@ -383,8 +327,7 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) 
> (void *, void *),
>else if (priority > gomp_max_task_priority_var)
>  priority = gomp_max_task_priority_var;
>  
> -  if (!if_clause || team == NULL
> -  || (thr->ta

Re: [libgomp, WIP, GSoC'19] Modification to a single queue, single execution path.

2019-07-24 Thread 김규래
Hi Jakub, 
thanks for your detailed comments.
 
> Can you please try to tweak your mailer settings?  In the text version of
> the patch the mailer ate tab characters, so the patch can't apply, and in
> the html version which we generally don't want to see on the mailing list,
> one has to open it in some web browser, copy from there and replace \n\n
> with \n to get something that can apply.
 
I blindly ran clang-format before submission;
I'll try to comply with the formatting standards by hand next time.
And about the mailer, I think copy-pasting through a text editor ate all the 
tabs.
I'll try to do something about my mailer.
Sorry for the inconvenience.
 
> Generally, we don't want to have code commented out like this in the final
> patch submission.  For this WIP, I think it is acceptable, as I think in the
> end you don't want to use the team's queue, but actually either
> children_queue (renamed), but only use it on the implicit tasks, or during
 
Can you elaborate?
What do you mean by "children_queue (renamed)"?

> team creation allocate together with the team structure also memory that
> would be used as a trailing array for an array of the implicit queues next
> to the array of implicit tasks. 
 
Do you mean to make two trailing arrays in gomp_team?
Also, as a personal question: why does gcc prefer trailing arrays over
dynamically allocated pointers?
 
Thank you again for your detailed comments,
I'll do my best for the rest of GSoC.
Ray Kim
 
 


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-07-24 Thread Richard Biener
On Tue, 23 Jul 2019, Richard Biener wrote:

> 
> The following fixes the runtime regression of 456.hmmer caused
> by matching ICC in code generation and using cmov more aggressively
> (through GIMPLE level MAX_EXPR usage).  Apparently (discovered
> by manual assembler editing) using the SSE unit for performing
> SImode loads, adds and then two signed max operations plus stores
> is quite a bit faster than cmovs - even faster than the original
> single cmov plus branchy second max.  Even more so for AMD CPUs
> than Intel CPUs.
> 
> Instead of hacking up some pattern recognition pass to transform
> integer mode memory-to-memory computation chains involving
> conditional moves to "vector" code (similar to what STV does
> for TImode ops on x86_64) the following simply allows SImode
> into SSE registers (support for this is already there in some
> important places like move patterns!).  For the particular
> case of 456.hmmer the required support is loads/stores
> (already implemented), SImode adds and SImode smax.
> 
> So the patch adds a smax pattern for SImode (we don't have any
> for scalar modes but currently expand via a conditional move sequence)
> emitting as SSE vector max or cmp/cmov depending on the alternative.
> 
> And it amends the *add<mode>_1 pattern with SSE alternatives
> (which have to come before the memory alternative as IRA otherwise
> doesn't consider reloading a memory operand to a register).
> 
> With this in place the runtime of 456.hmmer improves by 10%
> on Haswell which is back to before regression speed but not
> to same levels as seen with manually editing just the single
> important loop.
> 
> I'm currently benchmarking all SPEC CPU 2006 on Haswell.  More
> interesting is probably Zen where moves crossing the
> integer - vector domain are excessively expensive (they get
> done via the stack).
> 
> Clearly this approach will run into register allocation issues
> but it looks cleaner than writing yet another STV-like pass
> (STV itself is quite awkwardly structured so I refrain from
> touching it...).
> 
> Anyway - comments?  It seems to me that MMX-in-SSE does
> something very similar.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, previous testing
> revealed some issue.  Forgot that *add<mode>_1 also handles
> DImode..., fixed below, re-testing in progress.

Bootstrapped/tested on x86_64-unknown-linux-gnu.  A 3-run of
SPEC CPU 2006 on a Haswell machine completed and results
are in the noise besides the 456.hmmer improvement:

456.hmmer    9330    184    50.7 S    9330    162    57.4 S
456.hmmer    9330    182    51.2 *    9330    162    57.7 *
456.hmmer    9330    182    51.2 S    9330    162    57.7 S

the peak binaries (patched) are all a slight bit bigger, the
smaxsi3 pattern triggers 6840 times, every time using SSE
registers and never expanding to the cmov variant.  The
*add<mode>_1 pattern ends up using SSE regs 264 times
(out of undoubtedly many more, uncounted, times).

I do see cases where the RA ends up moving sources of
the max from GPR to XMM when the destination is stored
to memory and used in other ops with SSE but still
it could have used XMM regs for the sources as well:

movl-208(%rbp), %r8d
addl(%r9,%rax), %r8d
vmovd   %r8d, %xmm2
movq-120(%rbp), %r8
# MAX WITH SSE
vpmaxsd %xmm4, %xmm2, %xmm2

amending the *add<mode>_1 was of course the trickiest part,
mostly because the GPR case has memory alternatives while
the SSE part does not (since we have to use a whole-vector
add we can't use a memory operand which would be wider
than SImode - AVX512 might come to the rescue with
using {splat} from scalar/immediate or masking
but that might come at a runtime cost as well).  Allowing
memory and splitting after reload, adding a match-scratch
might work as well.  But I'm not sure if that wouldn't
make using SSE regs too obvious if it's not all in the
same alternative.  While the above code isn't too bad
on Core, both Bulldozer and Zen take a big hit.

Another case from 400.perlbench:

vmovd   .LC45(%rip), %xmm7
vmovd   %ebp, %xmm5
# MAX WITH SSE
vpmaxsd %xmm7, %xmm5, %xmm4
vmovd   %xmm4, %ecx

eh?  I can't see why the RA would ever choose the second
alternative.  It looks like it prefers SSE_REGS for the
operand set from a constant.  A testcase like

int foo (int a)
{
  return a > 5 ? a : 5;
}

produces the above with -mavx2, possibly IRA thinks
the missing matching constraint for the 2nd alternative
makes it win?  The dumps aren't too verbose here just
showing the costs, not how we arrive at them.

Generally using SSE for scalar integer ops shouldn't be
bad, esp. in loops it might free GPRs for induction variables.
Cons are larger instruction encoding and inefficient/missing
handling of immediates and no memory operands.

Of course in the end it's just that for some unknown
reason cmp + cmov is so much slower than pmaxsd
(OK, it's a lot less 

[patch] Set TREE_THIS_NOTRAP throughout tree-nested.c

2019-07-24 Thread Eric Botcazou
Hi,

stack memory is considered non-trapping by the compiler once the frame has
been established, so TREE_THIS_NOTRAP can be set on the dereferences built
during the unnesting pass.
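
A toy example (not from the patch) of the kind of code the unnesting
pass rewrites, using GNU C nested functions:

int
outer (int n)
{
  int acc = 0;

  /* 'inner' reads 'acc' from outer's frame, so tree-nested.c builds a
     frame struct and dereferences the static chain to reach it.  Those
     dereferences are what now get TREE_THIS_NOTRAP: once outer's frame
     is established, they cannot trap.  */
  int inner (int i) { return acc + i; }

  int i, r = 0;
  for (i = 0; i < n; i++)
    r = inner (i);
  return r + acc;
}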

Tested on x86_64-suse-linux, OK for the mainline?


2019-07-24  Eric Botcazou  

* tree-nested.c (build_simple_mem_ref_notrap): New function.
(get_static_chain): Call it instead of build_simple_mem_ref.
(get_frame_field): Likewise.
(get_nonlocal_debug_decl): Likewise.
(convert_nonlocal_reference_op): Likewise.

-- 
Eric Botcazou

commit cbe1e80c5af525403608dc00ef5e92c5a9f85ee1
Author: Eric Botcazou 
Date:   Wed Jul 17 18:11:32 2019 +0200

Part of work for S716-019 (compiler crash on nested task type at -O2).

diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
index 60dfc548b5a..74c70681d40 100644
--- a/gcc/tree-nested.c
+++ b/gcc/tree-nested.c
@@ -169,6 +169,16 @@ create_tmp_var_for (struct nesting_info *info, tree type, const char *prefix)
   return tmp_var;
 }
 
+/* Like build_simple_mem_ref, but set TREE_THIS_NOTRAP on the result.  */
+
+static tree
+build_simple_mem_ref_notrap (tree ptr)
+{
+  tree t = build_simple_mem_ref (ptr);
+  TREE_THIS_NOTRAP (t) = 1;
+  return t;
+}
+
 /* Take the address of EXP to be used within function CONTEXT.
Mark it for addressability as necessary.  */
 
@@ -877,7 +887,7 @@ get_static_chain (struct nesting_info *info, tree target_context,
 	{
 	  tree field = get_chain_field (i);
 
-	  x = build_simple_mem_ref (x);
+	  x = build_simple_mem_ref_notrap (x);
 	  x = build3 (COMPONENT_REF, TREE_TYPE (field), x, field, NULL_TREE);
 	  x = init_tmp_var (info, x, gsi);
 	}
@@ -914,12 +924,12 @@ get_frame_field (struct nesting_info *info, tree target_context,
 	{
 	  tree field = get_chain_field (i);
 
-	  x = build_simple_mem_ref (x);
+	  x = build_simple_mem_ref_notrap (x);
 	  x = build3 (COMPONENT_REF, TREE_TYPE (field), x, field, NULL_TREE);
 	  x = init_tmp_var (info, x, gsi);
 	}
 
-  x = build_simple_mem_ref (x);
+  x = build_simple_mem_ref_notrap (x);
 }
 
   x = build3 (COMPONENT_REF, TREE_TYPE (field), x, field, NULL_TREE);
@@ -963,16 +973,16 @@ get_nonlocal_debug_decl (struct nesting_info *info, tree decl)
   for (i = info->outer; i->context != target_context; i = i->outer)
 	{
 	  field = get_chain_field (i);
-	  x = build_simple_mem_ref (x);
+	  x = build_simple_mem_ref_notrap (x);
 	  x = build3 (COMPONENT_REF, TREE_TYPE (field), x, field, NULL_TREE);
 	}
-  x = build_simple_mem_ref (x);
+  x = build_simple_mem_ref_notrap (x);
 }
 
   field = lookup_field_for_decl (i, decl, INSERT);
   x = build3 (COMPONENT_REF, TREE_TYPE (field), x, field, NULL_TREE);
   if (use_pointer_in_frame (decl))
-x = build_simple_mem_ref (x);
+x = build_simple_mem_ref_notrap (x);
 
   /* ??? We should be remapping types as well, surely.  */
   new_decl = build_decl (DECL_SOURCE_LOCATION (decl),
@@ -1060,7 +1070,7 @@ convert_nonlocal_reference_op (tree *tp, int *walk_subtrees, void *data)
 	if (use_pointer_in_frame (t))
 	  {
 		x = init_tmp_var (info, x, &wi->gsi);
-		x = build_simple_mem_ref (x);
+		x = build_simple_mem_ref_notrap (x);
 	  }
 	  }
 


[patch] More precise message with -Winline

2019-07-24 Thread Eric Botcazou
Hi,

for EH cleanups, the branch probability heuristics can consider that edges are
never taken, i.e. their profile count is set to zero.  In this case, no matter
how you tweak the inlining parameters, you cannot get function calls inlined,
but -Winline nevertheless prints the same standard message about unlikely
calls as in the cases where tweaking the inlining parameters can work.

Therefore the attached patch adds a new CIF code with the warning message:

"call is never executed and code size would grow [-Winline]"

for this case, thus signaling the user that he'd better not waste time trying 
to get the call inlined.
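
A hedged sketch (made-up names, not a verified reproducer) of the
situation: the cleanup below also runs on the EH unwind path of 'work',
and that edge can get a zero profile count, so the copy of the call
there can never be inlined no matter how the parameters are tweaked.
Compile with -O2 -Winline -fexceptions.

static inline void
release (int *p)
{
  int i;
  for (i = 0; i < 64; i++) /* big enough that code size would grow */
    *p += i;
}

extern void might_throw (void);

void
work (void)
{
  int guard __attribute__ ((cleanup (release))) = 0;
  might_throw ();
}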

Tested on x86_64-suse-linux, OK for the mainline?


2019-07-24  Eric Botcazou  

* cif-code.def (NEVER_CALL): New code.
* ipa-inline.c (want_inline_small_function_p): Fix formatting issues.
Set the failure to CIF_NEVER_CALL if the IPA count is zero.

-- 
Eric Botcazou

Index: cif-code.def
===
--- cif-code.def	(revision 273480)
+++ cif-code.def	(working copy)
@@ -83,6 +83,10 @@ DEFCIFCODE(RECURSIVE_INLINING, CIF_FINAL_NORMAL,
 DEFCIFCODE(UNLIKELY_CALL, CIF_FINAL_NORMAL,
 	   N_("call is unlikely and code size would grow"))
 
+/* Call is considered never executed.  */
+DEFCIFCODE(NEVER_CALL, CIF_FINAL_NORMAL,
+	   N_("call is never executed and code size would grow"))
+
 /* Function is not declared as inline.  */
 DEFCIFCODE(NOT_DECLARED_INLINED, CIF_FINAL_NORMAL,
 	   N_("function not declared inline and code size would grow"))
Index: ipa-inline.c
===
--- ipa-inline.c	(revision 273480)
+++ ipa-inline.c	(working copy)
@@ -810,7 +810,7 @@ want_inline_small_function_p (struct cgraph_edge *
   | INLINE_HINT_loop_stride))
 		   && !(big_speedup = big_speedup_p (e)
 	{
-  e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
+	  e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
 	  want_inline = false;
 	}
   else if (!DECL_DECLARED_INLINE_P (callee->decl)
@@ -818,12 +818,12 @@ want_inline_small_function_p (struct cgraph_edge *
 	   && growth >= PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SMALL))
 	{
 	  /* growth_likely_positive is expensive, always test it last.  */
-  if (growth >= MAX_INLINE_INSNS_SINGLE
+	  if (growth >= MAX_INLINE_INSNS_SINGLE
 	  || growth_likely_positive (callee, growth))
 	{
-  e->inline_failed = CIF_NOT_DECLARED_INLINED;
+	  e->inline_failed = CIF_NOT_DECLARED_INLINED;
 	  want_inline = false;
- 	}
+	}
 	}
   /* Apply MAX_INLINE_INSNS_AUTO limit for functions not declared inline
 	 Upgrade it to MAX_INLINE_INSNS_SINGLE when hints suggests that
@@ -839,12 +839,12 @@ want_inline_small_function_p (struct cgraph_edge *
 	   && !(big_speedup == -1 ? big_speedup_p (e) : big_speedup))
 	{
 	  /* growth_likely_positive is expensive, always test it last.  */
-  if (growth >= MAX_INLINE_INSNS_SINGLE
+	  if (growth >= MAX_INLINE_INSNS_SINGLE
 	  || growth_likely_positive (callee, growth))
 	{
 	  e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT;
 	  want_inline = false;
- 	}
+	}
 	}
   /* If call is cold, do not inline when function body would grow. */
   else if (!e->maybe_hot_p ()
@@ -851,7 +851,10 @@ want_inline_small_function_p (struct cgraph_edge *
 	   && (growth >= MAX_INLINE_INSNS_SINGLE
 		   || growth_likely_positive (callee, growth)))
 	{
-  e->inline_failed = CIF_UNLIKELY_CALL;
+	  if (e->count.ipa () == profile_count::zero ())
+	e->inline_failed = CIF_NEVER_CALL;
+	  else
+	e->inline_failed = CIF_UNLIKELY_CALL;
 	  want_inline = false;
 	}
 }


[patch] Add simple narrowing optimization to expand_case

2019-07-24 Thread Eric Botcazou
Hi,

this adds a simple narrowing optimization to expand_case in order to avoid
calling a double-word comparison routine when it is unnecessary to do so.
This is mainly for -O0 because the optimization is otherwise done by forward
propagation; it avoids having to drag in libgcc or not depending on the -O
level.
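
A hypothetical C analogue of the Ada test below: at -O0 on a 32-bit
target, switching on the widened 64-bit value would emit a call to a
libgcc comparison routine for the range check, whereas with the
narrowing the switch is expanded on the original 8-bit index.

void
proc (unsigned char val)
{
  /* The index is a NOP_EXPR conversion of VAL and all case values fit
     in the inner 8-bit type, so the index type can be narrowed.  */
  switch ((unsigned long long) val)
    {
    case 0:
      __builtin_abort ();
    case 1:
    case 2:
    case 3:
    case 4:
      break;
    default:
      break;
    }
}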

Tested on x86_64-suse-linux, OK for the mainline?


2019-07-24  Eric Botcazou  

* stmt.c (expand_case): Try to narrow the index type if it's larger
than a word.  Tidy up.


2019-07-24  Eric Botcazou  

* gnat.dg/case_optimization3.ad[sb]: New test.

-- 
Eric Botcazou

Index: stmt.c
===
--- stmt.c	(revision 273480)
+++ stmt.c	(working copy)
@@ -885,6 +885,7 @@ expand_case (gswitch *stmt)
   tree index_type = TREE_TYPE (index_expr);
   tree elt;
   basic_block bb = gimple_bb (stmt);
+  gimple *def_stmt;
 
   auto_vec case_list;
 
@@ -918,6 +919,31 @@ expand_case (gswitch *stmt)
   else
 maxval = fold_convert (index_type, CASE_LOW (elt));
 
+  /* Try to narrow the index type if it's larger than a word.
+ That is mainly for -O0 where an equivalent optimization
+ done by forward propagation is not run and is aimed at
+ avoiding a call to a comparison routine of libgcc.  */
+  if (TYPE_PRECISION (index_type) > BITS_PER_WORD
+  && TREE_CODE (index_expr) == SSA_NAME
+  && (def_stmt = SSA_NAME_DEF_STMT (index_expr))
+  && is_gimple_assign (def_stmt)
+  && gimple_assign_rhs_code (def_stmt) == NOP_EXPR)
+{
+  tree inner_index_expr = gimple_assign_rhs1 (def_stmt);
+  tree inner_index_type = TREE_TYPE (inner_index_expr);
+
+  if (INTEGRAL_TYPE_P (inner_index_type)
+	  && TYPE_PRECISION (inner_index_type) <= BITS_PER_WORD
+	  && int_fits_type_p (minval, inner_index_type)
+	  && int_fits_type_p (maxval, inner_index_type))
+	{
+	  index_expr = inner_index_expr;
+	  index_type = inner_index_type;
+	  minval = fold_convert (index_type, minval);
+	  maxval = fold_convert (index_type, maxval);
+	}
+}
+
   /* Compute span of values.  */
   range = fold_build2 (MINUS_EXPR, index_type, maxval, minval);
 
@@ -969,27 +995,22 @@ expand_case (gswitch *stmt)
 
   rtx_insn *before_case = get_last_insn ();
 
-  /* Decide how to expand this switch.
- The two options at this point are a dispatch table (casesi or
- tablejump) or a decision tree.  */
-
+  /* If the default case is unreachable, then set default_label to NULL
+ so that we omit the range check when generating the dispatch table.
+ We also remove the edge to the unreachable default case.  The block
+ itself will be automatically removed later.  */
+  if (EDGE_COUNT (default_edge->dest->succs) == 0
+  && gimple_seq_unreachable_p (bb_seq (default_edge->dest)))
 {
-  /* If the default case is unreachable, then set default_label to NULL
-	 so that we omit the range check when generating the dispatch table.
-	 We also remove the edge to the unreachable default case.  The block
-	 itself will be automatically removed later.  */
-  if (EDGE_COUNT (default_edge->dest->succs) == 0
-	  && gimple_seq_unreachable_p (bb_seq (default_edge->dest)))
-	{
-	  default_label = NULL;
-	  remove_edge (default_edge);
-	  default_edge = NULL;
-	}
-  emit_case_dispatch_table (index_expr, index_type,
-case_list, default_label, default_edge,
-minval, maxval, range, bb);
+  default_label = NULL;
+  remove_edge (default_edge);
+  default_edge = NULL;
 }
 
+  emit_case_dispatch_table (index_expr, index_type,
+			case_list, default_label, default_edge,
+			minval, maxval, range, bb);
+
   reorder_insns (NEXT_INSN (before_case), get_last_insn (), before_case);
 
   free_temp_slots ();
package Case_Optimization3 is

   type T_UINT32 is range 0 .. (2 ** 32) - 1;
   for T_UINT32'Size use 32;

   subtype T_RANGE is T_UINT32 range 0 .. 7;

   procedure Proc (Val : T_RANGE);

end Case_Optimization3;
-- { dg-do compile }

package body Case_Optimization3 is

   procedure Proc (Val : T_RANGE) is
   begin
  case Val is
 when 0 =>
raise Program_Error;
 when 1 =>
null;
 when 2 =>
null;
 when 3 =>
null;
 when 4 =>
null;
 when others =>
null;
  end case;
   end;

end Case_Optimization3;

-- { dg-final { scan-assembler-not "__ucmpdi2" } }


[PATCH] i386: Roundeven expansion for SSE4.1+

2019-07-24 Thread Tejas Joshi
Hi.
This is a patch that Uros suggested for roundeven expansion, here.
Thanks for the heads up.

I have rerun the testsuite on the patch, it survives the regression
tests and bootstraps on x86_64-linux-gnu. Note, patch to be applied on
top of


Thanks,
Tejas

gcc/ChangeLog:

2019-07-24  Tejas Joshi  

* builtins.c (mathfn_built_in_2): Change CASE_MATHFN to
CASE_MATHFN_FLOATN for roundeven.
* config/i386/i386.c (ix86_i387_mode_needed): Add case I387_ROUNDEVEN.
(ix86_mode_needed): Likewise.
(ix86_mode_after): Likewise.
(ix86_mode_entry): Likewise.
(ix86_mode_exit): Likewise.
(ix86_emit_mode_set): Likewise.
(emit_i387_cw_initialization): Add case I387_CW_ROUNDEVEN.
* config/i386/i386.h (ix86_stack_slot): Add SLOT_CW_ROUNDEVEN.
(ix86_entry): Add I387_ROUNDEVEN.
(avx_u128_state): Add I387_CW_ANY.
* config/i386/i386.md: Define UNSPEC_FRNDINT_ROUNDEVEN.
(define_int_iterator): Likewise.
(define_int_attr): Likewise for rounding_insn, rounding and ROUNDING.
(define_constant): Define ROUND_ROUNDEVEN mode.
(define_attr): Add roundeven mode for i387_cw.
(<rounding_insn><mode>2): Add condition for ROUND_ROUNDEVEN.
* internal-fn.def (ROUNDEVEN): New builtin function.
* optabs.def (roundeven_optab): New optab.

gcc/testsuite/ChangeLog:

2019-07-24  Tejas Joshi  

* gcc.target/i386/avx-vround-roundeven-1.c: New test.
* gcc.target/i386/avx-vround-roundeven-2.c: New test.
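
For reference, roundeven rounds to the nearest integer with ties going
to the even value.  A quick illustration of the semantics the new
expander implements, assuming a libm that provides C2x roundeven:

#include <math.h>
#include <assert.h>

int
main (void)
{
  assert (roundeven (0.5) == 0.0);   /* tie: round to even 0 */
  assert (roundeven (1.5) == 2.0);   /* tie: round to even 2 */
  assert (roundeven (2.5) == 2.0);   /* tie: round to even 2 */
  assert (roundeven (-1.5) == -2.0);
  assert (roundeven (1.4) == 1.0);   /* not a tie: normal rounding */
  return 0;
}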
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 8ceb077b0bf..f61f10422fd 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2056,7 +2056,7 @@ mathfn_built_in_2 (tree type, combined_fn fn)
 CASE_MATHFN (REMQUO)
 CASE_MATHFN_FLOATN (RINT)
 CASE_MATHFN_FLOATN (ROUND)
-CASE_MATHFN (ROUNDEVEN)
+CASE_MATHFN_FLOATN (ROUNDEVEN)
 CASE_MATHFN (SCALB)
 CASE_MATHFN (SCALBLN)
 CASE_MATHFN (SCALBN)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 49f49c5f8d0..aa32a61f416 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13593,6 +13593,11 @@ ix86_i387_mode_needed (int entity, rtx_insn *insn)
 
   switch (entity)
 {
+case I387_ROUNDEVEN:
+  if (mode == I387_CW_ROUNDEVEN)
+	return mode;
+  break;
+
 case I387_TRUNC:
   if (mode == I387_CW_TRUNC)
 	return mode;
@@ -13627,6 +13632,7 @@ ix86_mode_needed (int entity, rtx_insn *insn)
   return ix86_dirflag_mode_needed (insn);
 case AVX_U128:
   return ix86_avx_u128_mode_needed (insn);
+case I387_ROUNDEVEN:
 case I387_TRUNC:
 case I387_FLOOR:
 case I387_CEIL:
@@ -13687,6 +13693,7 @@ ix86_mode_after (int entity, int mode, rtx_insn *insn)
   return mode;
 case AVX_U128:
   return ix86_avx_u128_mode_after (mode, insn);
+case I387_ROUNDEVEN:
 case I387_TRUNC:
 case I387_FLOOR:
 case I387_CEIL:
@@ -13739,6 +13746,7 @@ ix86_mode_entry (int entity)
   return ix86_dirflag_mode_entry ();
 case AVX_U128:
   return ix86_avx_u128_mode_entry ();
+case I387_ROUNDEVEN:
 case I387_TRUNC:
 case I387_FLOOR:
 case I387_CEIL:
@@ -13776,6 +13784,7 @@ ix86_mode_exit (int entity)
   return X86_DIRFLAG_ANY;
 case AVX_U128:
   return ix86_avx_u128_mode_exit ();
+case I387_ROUNDEVEN:
 case I387_TRUNC:
 case I387_FLOOR:
 case I387_CEIL:
@@ -13810,6 +13819,12 @@ emit_i387_cw_initialization (int mode)
 
   switch (mode)
 {
+case I387_CW_ROUNDEVEN:
+  /* round to nearest */
+  emit_insn (gen_andhi3 (reg, reg, GEN_INT (0x0c00)));
+  slot = SLOT_CW_ROUNDEVEN;
+  break;
+
 case I387_CW_TRUNC:
   /* round toward zero (truncate) */
   emit_insn (gen_iorhi3 (reg, reg, GEN_INT (0x0c00)));
@@ -13856,6 +13871,7 @@ ix86_emit_mode_set (int entity, int mode, int prev_mode ATTRIBUTE_UNUSED,
   if (mode == AVX_U128_CLEAN)
 	emit_insn (gen_avx_vzeroupper ());
   break;
+case I387_ROUNDEVEN:
 case I387_TRUNC:
 case I387_FLOOR:
 case I387_CEIL:
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 1f70844fc45..a8e1cd048a8 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2470,6 +2470,7 @@ enum ix86_stack_slot
 {
   SLOT_TEMP = 0,
   SLOT_CW_STORED,
+  SLOT_CW_ROUNDEVEN,
   SLOT_CW_TRUNC,
   SLOT_CW_FLOOR,
   SLOT_CW_CEIL,
@@ -2481,6 +2482,7 @@ enum ix86_entity
 {
   X86_DIRFLAG = 0,
   AVX_U128,
+  I387_ROUNDEVEN,
   I387_TRUNC,
   I387_FLOOR,
   I387_CEIL,
@@ -2516,7 +2518,7 @@ enum avx_u128_state
 
 #define NUM_MODES_FOR_MODE_SWITCHING			\
   { X86_DIRFLAG_ANY, AVX_U128_ANY,			\
-I387_CW_ANY, I387_CW_ANY, I387_CW_ANY }
+I387_CW_ANY, I387_CW_ANY, I387_CW_ANY, I387_CW_ANY  }
 
 
 /* Avoid renaming of stack registers, as doing so in combination with
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index db5fa9ae3ca..21532494f6d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/con

Re: [PATCH, og9] Port OpenACC profiling interface to OG9

2019-07-24 Thread Thomas Schwinge
Hi Kwok!

On 2019-06-24T20:37:12+0100, Kwok Cheung Yeung  wrote:
> On 17/06/2019 6:24 pm, Thomas Schwinge wrote:
>>> Okay to push to openacc-gcc-9-branch?

>> I think what would be best, is the following approach: [...]

> I have now ported the mainline patch over to OG9, plus an 
> additional patch on top of that to bring in the bits from OG8 that did 
> not make it upstream.
> 
> I have dropped the differences in comments, TODOs, documentation etc. in 
> favour of the upstream patch. There are also various places where the 
> OG8 patch sets up profiling, then gotos the end of the function to tear 
> it down again, whereas the mainline version just aborts early without 
> setting up profiling in the first place - these I have also resolved in 
> favour of the mainline version.

ACK, thanks!

> I have rerun the libgomp testsuite with no regressions noted.

(I didn't actually test, but just reviewed the code changes.)

> Okay to push to openacc-gcc-9-branch?

Yes, please push after re-testing with minor modifications in the second
commit, as follows:

> From 341285e282f5b7ed73daaeb9fd2f820dc1fe91f9 Mon Sep 17 00:00:00 2001
> From: Kwok Cheung Yeung 
> Date: Fri, 21 Jun 2019 10:40:38 -0700
> Subject: [PATCH 2/2] Add changes to profiling interface from OG8 branch
>
> This bundles up the parts of the profiling code from the OG8 branch that were
> not included in the upstream patch.

> +2017-02-28  Thomas Schwinge  
> +
> + [...]
> + * oacc-parallel.c (GOACC_parallel_keyed_internal): Set device_api for
> + profiling.

> --- a/libgomp/oacc-parallel.c
> +++ b/libgomp/oacc-parallel.c
> @@ -275,6 +275,8 @@ GOACC_parallel_keyed_internal (int flags_m, int params, 
> void (*fn) (void *),
>goacc_call_host_fn (fn, mapnum, hostaddrs, params);
>goto out_prof;
>  }
> +  else if (profiling_p)
> +api_info.device_api = acc_device_api_cuda;

That change is not quite right, and I'm pretty sure it wasn't me who
introduced that code ;-P -- but that can be resolved later.

> --- a/libgomp/Makefile.in
> +++ b/libgomp/Makefile.in
> @@ -524,7 +525,7 @@ search_path = $(addprefix $(top_srcdir)/config/, 
> $(config_path)) $(top_srcdir) \
>  
>  fincludedir = 
> $(libdir)/gcc/$(target_alias)/$(gcc_version)$(MULTISUBDIR)/finclude
>  libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
> -libgomp_la_LIBADD = $(LIBFFI)
> +@USE_LIBFFI_TRUE@libgomp_la_LIBADD = $(LIBFFI)

That must've been an earlier commit not properly regenerating the file,
but we shall not bother with that now.

> diff --git a/libgomp/config/nvptx/oacc-profiling-acc_register_library.c 
> b/libgomp/config/nvptx/oacc-profiling-acc_register_library.c
> new file mode 100644
> index 000..e69de29
> diff --git a/libgomp/config/nvptx/oacc-profiling.c 
> b/libgomp/config/nvptx/oacc-profiling.c
> new file mode 100644
> index 000..e69de29

Not sure if these empty files are actually needed, but no harm done, so
that can be resolved later.

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
> @@ -41,6 +41,7 @@ static int state = -1;
>  static acc_device_t acc_device_type;
>  static int acc_device_num;
>  static int num_gangs, num_workers, vector_length;
> +static int async;

All these 'async' changes in this file logically belong into the
respective commit of the OpenACC 'kernels' changes.  It's not a problem
to have them included here; we shall just try to remember to include them
in the OpenACC 'kernels' trunk changes.  (I've made a note; no need for
you to re-test/re-post.)

> @@ -165,6 +166,15 @@ int main()
>for (int i = 0; i < N; ++i)
>   x[i] = i * i;
>  }
> +#ifdef __OPTIMIZE__
> +/* TODO.  With -O2 optimizations enabled, the compiler believes that here
> +   "state == 0" still holds.  It's not yet clear what's going on.
> +   Mis-optimization across the GOMP function call boundary?  Per its
> +   gcc/omp-builtins.def definition, BUILT_IN_GOACC_PARALLEL
> +   "GOACC_parallel_keyed" doesn't have a "leaf" attribute, so the 
> compiler
> +   must expect calls back into this compilation unit?  */
> +asm volatile ("" : : : "memory");
> +#endif

That workaround is no longer needed given the more specific workaround
that I've added, marked with "TODO PR90488".

> @@ -184,12 +196,22 @@ int main()
>for (int i = 0; i < N; ++i)
>   x[i] = i * i;
>  }
> +#ifdef __OPTIMIZE__
> +/* TODO.  With -O2 optimizations enabled, the compiler believes that here
> +   "state == 0" still holds.  It's not yet clear what's going on.
> +   Mis-optimization across the GOMP function call boundary?  Per its
> +   gcc/omp-builtins.def definition, BUILT_IN_GOACC_PARALLEL
> +   "GOACC_parallel_keyed" doesn't have a "leaf" attribute, so the 
> compiler
> +   must expect calls back into this compilation unit?  */
> +asm volatile ("" : : : "memory");
> +#endif

Likewise.

Re: [PATCH] Come up with -flto=auto option.

2019-07-24 Thread Nathan Sidwell

On 7/24/19 3:12 AM, Martin Liška wrote:

On 7/24/19 9:03 AM, Allan Sandfeld Jensen wrote:



There are patches to enable it in ninja, and I know some Linux distros apply
the patches by default. Though that is more about listening, so it probably
requires launching ninja using make, if you want to be able to pass it on to
gcc.


Btw. I know that Nathan will need a much closer integration between a compiler
and a build system for his C++ modules work. That can be a good opportunity
for LTO to utilize the infrastructure as well.



Yes indeed, I do mention LTO requirements as a foil against 'everything 
is different' complaints about modules :)  FWIW WG21 (C++ std) Study 
Group 15 is the tooling group, which is where build system people hang 
out.  If anyone's interested in joining, email me.


nathan

--
Nathan Sidwell


Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops

2019-07-24 Thread Maciej W. Rozycki
Fredrik,

> > > How can one reasonably prevent the bug when compiling a whole Linux
> > > distribution with thousands of packages if GAS merely warns and proceeds
> > > in many cases?
> > 
> >  I think the tools must not set a policy.  By using `.set noreorder' the 
> > user told the toolchain he knows better and asked to keep its hands away.
> > 
> >  As I say you can use `-Werror' for code auditing.  And in most cases 
> > manually scheduled delay slots in handcoded assembly are a sign of a 
> > problem with the piece of code affected, regardless of the R5900 erratum.
> 
> Well, it seems practical to use unofficial tools and a patched GAS to fix
> this assembly bug, then. It's a grave problem for the R5900 and it needs to
> be reliably detected.

 Why?  What is wrong with using `-Werror'?

 Or you could use `-Wa,-fatal-warnings' to limit making warnings fatal to 
the assembly stage only if you are concerned about bad high-level language 
code quality interfering with your goal.

> >  See `nops_for_insn'.  However again, as I noted, `.set noreorder' tells 
> > GAS not to analyse the dependencies for code within.  There is no need to 
> > schedule this delay slot manually, and if this is generated code, then the 
> > producer (GCC) should have taken care of the hazards, be it architectural 
> > or errata.  If this is manually written code, then the author asked for 
> > trouble.
> 
> I'm using the principle above to unobtrusively adjust problematic kernel
> code, via a short_loop_war macro. Here is one patched instance:
> 
> --- a/arch/mips/boot/compressed/head.S
> +++ b/arch/mips/boot/compressed/head.S
> @@ -13,6 +13,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  
>   .set noreorder
> @@ -29,6 +30,7 @@ start:
>   PTR_LA  a0, _edata
>   PTR_LA  a2, _end
>  1:   sw  zero, 0(a0)
> + short_loop_war(1b)
>   bne a2, a0, 1b
>addiu  a0, a0, 4
>  
> @@ -48,6 +50,7 @@ start:
>   jr  k0
>nop
>  3:
> + short_loop_war(3b)
>   b   3b
>nop

 Is it needed here given that the delay slot instruction is a NOP anyway?  

 Also this source does not need `.set noreorder', except for the loop 
around `1:' where a data dependency is present with the delay slot 
instruction.  And I am surprised that it even assembles as it uses 
`.cprestore' without the required offset argument (not that this pseudo-op 
makes any sense here).  And it's weird overall, e.g. it loads $ra 
explicitly rather than using JALR (or JAL even).  But these are unrelated 
problems.

> >  Well, BogoMIPS is just an arbitrary number.
> 
> So presumably the noreorder BogoMIPS variant can be replaced with a
> single reorder variant that works with all MIPS implementations?

 Sure, BogoMIPS just records how quickly the delay loop runs, and 
therefore how many iterations are required for a given amount of time to 
spend twiddling thumbs.
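
For readers unfamiliar with it, BogoMIPS comes from calibrating a busy
loop such as the following sketch (simplified from the kernel's idea,
names hypothetical) so that a known number of iterations burns a known
amount of time:

/* Spin for LOOPS iterations; the empty asm keeps the compiler from
   deleting the loop.  Calibration measures how many iterations fit in
   one timer tick, and that rate is what BogoMIPS reports.  */
static void
delay_loop (unsigned long loops)
{
  while (loops--)
    __asm__ __volatile__ ("" : : : "memory");
}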

  Maciej


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-07-24 Thread Richard Biener
On Wed, 24 Jul 2019, Richard Biener wrote:

> On Tue, 23 Jul 2019, Richard Biener wrote:
> 
> > 
> > The following fixes the runtime regression of 456.hmmer caused
> > by matching ICC in code generation and using cmov more aggressively
> > (through GIMPLE level MAX_EXPR usage).  Apparently (discovered
> > by manual assembler editing) using the SSE unit for performing
> > SImode loads, adds and then two signed max operations plus stores
> > is quite a bit faster than cmovs - even faster than the original
> > single cmov plus branchy second max.  Even more so for AMD CPUs
> > than Intel CPUs.
> > 
> > Instead of hacking up some pattern recognition pass to transform
> > integer mode memory-to-memory computation chains involving
> > conditional moves to "vector" code (similar to what STV does
> > for TImode ops on x86_64) the following simply allows SImode
> > into SSE registers (support for this is already there in some
> > important places like move patterns!).  For the particular
> > case of 456.hmmer the required support is loads/stores
> > (already implemented), SImode adds and SImode smax.
> > 
> > So the patch adds a smax pattern for SImode (we don't have any
> > for scalar modes but currently expand via a conditional move sequence)
> > emitting as SSE vector max or cmp/cmov depending on the alternative.
> > 
> > And it amends the *add<mode>_1 pattern with SSE alternatives
> > (which have to come before the memory alternative as IRA otherwise
> > doesn't consider reloading a memory operand to a register).
> > 
> > With this in place the runtime of 456.hmmer improves by 10%
> > on Haswell which is back to before regression speed but not
> > to same levels as seen with manually editing just the single
> > important loop.
> > 
> > I'm currently benchmarking all SPEC CPU 2006 on Haswell.  More
> > interesting is probably Zen where moves crossing the
> > integer - vector domain are excessively expensive (they get
> > done via the stack).
> > 
> > Clearly this approach will run into register allocation issues
> > but it looks cleaner than writing yet another STV-like pass
> > (STV itself is quite awkwardly structured so I refrain from
> > touching it...).
> > 
> > Anyway - comments?  It seems to me that MMX-in-SSE does
> > something very similar.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, previous testing
> > revealed some issue.  Forgot that *add<mode>_1 also handles
> > DImode..., fixed below, re-testing in progress.
> 
> Bootstrapped/tested on x86_64-unknown-linux-gnu.  A 3-run of
> SPEC CPU 2006 on a Haswell machine completed and results
> are in the noise besides the 456.hmmer improvement:
> 
> 456.hmmer    9330    184    50.7 S    9330    162    57.4 S
> 456.hmmer    9330    182    51.2 *    9330    162    57.7 *
> 456.hmmer    9330    182    51.2 S    9330    162    57.7 S
> 
> the peak binaries (patched) are all a slight bit bigger, the
> smaxsi3 pattern triggers 6840 times, every time using SSE
> registers and never expanding to the cmov variant.  The
> *add<mode>_1 pattern ends up using SSE regs 264 times
> (out of undoubtedly many more, uncounted, times).
> 
> I do see cases where the RA ends up moving sources of
> the max from GPR to XMM when the destination is stored
> to memory and used in other ops with SSE but still
> it could have used XMM regs for the sources as well:
> 
> movl-208(%rbp), %r8d
> addl(%r9,%rax), %r8d
> vmovd   %r8d, %xmm2
> movq-120(%rbp), %r8
> # MAX WITH SSE
> vpmaxsd %xmm4, %xmm2, %xmm2
> 
> amending the *add<mode>_1 was of course the trickiest part,
> mostly because the GPR case has memory alternatives while
> the SSE part does not (since we have to use a whole-vector
> add we can't use a memory operand which would be wider
> than SImode - AVX512 might come to the rescue with
> using {splat} from scalar/immediate or masking
> but that might come at a runtime cost as well).  Allowing
> memory and splitting after reload, adding a match-scratch
> might work as well.  But I'm not sure if that wouldn't
> make using SSE regs too obvious if it's not all in the
> same alternative.  While the above code isn't too bad
> on Core, both Bulldozer and Zen take a big hit.
> 
> Another case from 400.perlbench:
> 
> vmovd   .LC45(%rip), %xmm7
> vmovd   %ebp, %xmm5
> # MAX WITH SSE
> vpmaxsd %xmm7, %xmm5, %xmm4
> vmovd   %xmm4, %ecx
> 
> eh?  I can't see why the RA would ever choose the second
> alternative.  It looks like it prefers SSE_REGS for the
> operand set from a constant.  A testcase like
> 
> int foo (int a)
> {
>   return a > 5 ? a : 5;
> }
> 
> produces the above with -mavx2, possibly IRA thinks
> the missing matching constraint for the 2nd alternative
> makes it win?  The dumps aren't too verbose here just
> showing the costs, not how we arrive at them.

Eh, this is due to my use of the "cpu" attribute for smaxsi3 which
makes it only e

[Committed] S/390: Add add/sub/mul overflow check patterns

2019-07-24 Thread Andreas Krebbel
This patch implements the addv, subv, and mulv patterns for signed
integers.
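
These expanders are what GCC uses for the overflow-checking built-ins,
so on s390x code like the following sketch (illustration only, not
taken from the patch) can now branch directly on the new CCO
condition-code mode instead of doing a widening comparison:

#include <stdio.h>

int
checked_add (int a, int b)
{
  int res;
  /* Expanded through the new addv<mode>4 pattern on s390x.  */
  if (__builtin_add_overflow (a, b, &res))
    {
      fprintf (stderr, "signed overflow\n");
      return 0;
    }
  return res;
}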

Bootstrapped and regression tested on s390x (IBM z14).

Committed to mainline.

gcc/ChangeLog:

2019-07-24  Andreas Krebbel  

* config/s390/predicates.md (addv_const_operand): New predicate.
* config/s390/s390-modes.def (CCO): New condition code mode.
* config/s390/s390.c (s390_match_ccmode_set): Handle E_CCOmode.
(s390_branch_condition_mask): Likewise.
* config/s390/s390.md ("addv<mode>4", "subv<mode>4")
("mulv<mode>4"): New expanders.
("*addv<mode>3_ccoverflow", "*addv<mode>3_ccoverflow_const")
("*subv<mode>3_ccoverflow", "*mulv<mode>3_ccoverflow"): New
pattern definitions.

gcc/testsuite/ChangeLog:

2019-07-24  Andreas Krebbel  

* gcc.target/s390/addsub-signed-overflow-1.c: New test.
* gcc.target/s390/addsub-signed-overflow-2.c: New test.
* gcc.target/s390/mul-signed-overflow-1.c: New test.
* gcc.target/s390/mul-signed-overflow-2.c: New test.
---
 gcc/config/s390/predicates.md  |   6 +
 gcc/config/s390/s390-modes.def |  14 ++
 gcc/config/s390/s390.c |  10 ++
 gcc/config/s390/s390.md| 144 +
 .../gcc.target/s390/addsub-signed-overflow-1.c |  81 
 .../gcc.target/s390/addsub-signed-overflow-2.c |  80 
 .../gcc.target/s390/mul-signed-overflow-1.c|  56 
 .../gcc.target/s390/mul-signed-overflow-2.c|  56 
 8 files changed, 447 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/addsub-signed-overflow-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/addsub-signed-overflow-2.c
 create mode 100644 gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 4d2f8b2..fa15c05 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -585,3 +585,9 @@
   return s390_valid_shift_count (op, 0);
 }
 )
+
+; An integer constant which can be used in a signed add with overflow
+; pattern without being reloaded.
+(define_predicate "addv_const_operand"
+  (and (match_code "const_int")
+   (match_test "INTVAL (op) >= -32768 && INTVAL (op) <= 32767")))
diff --git a/gcc/config/s390/s390-modes.def b/gcc/config/s390/s390-modes.def
index 88c8673..7b7c114 100644
--- a/gcc/config/s390/s390-modes.def
+++ b/gcc/config/s390/s390-modes.def
@@ -31,6 +31,8 @@ FLOAT_MODE (TF, 16, ieee_quad_format);
 
 Condition Codes
 
+  CC0 CC1  CC2 CC3
+
 Check for zero
 
 CCZ:  EQ  NE   NE  NE
@@ -57,6 +59,10 @@ CCA:  EQ  LT   GT  Overflow
 CCAP: EQ  LT   GT  LT (AGHI, AHI)
 CCAN: EQ  LT   GT  GT (AGHI, AHI)
 
+Condition codes for overflow checking resulting from signed adds/subs/mults
+
CCO:  EQ  EQ   EQ  NE (AGR, AGHI, SGR, MSC, ...)
+
 Condition codes of unsigned adds and subs
 
 CCL:  EQ  NE   EQ  NE (ALGF/R, ALG/R, AL/R/Y,
@@ -98,6 +104,13 @@ If you know whether the used constant is positive or 
negative you can predict
 the sign of the result even in case of an overflow.
 
 
+CCO
+
+This mode is used to check whether there was an overflow condition in
+a signed add, sub, or mul operation.  See (addv<mode>4, subv<mode>4,
+mulv<mode>4 patterns).
+
+
 CCT, CCT1, CCT2, CCT3
 
 If bits of an integer masked with an AND instruction are checked, the test 
under
@@ -204,6 +217,7 @@ CC_MODE (CCZ1);
 CC_MODE (CCA);
 CC_MODE (CCAP);
 CC_MODE (CCAN);
+CC_MODE (CCO);
 CC_MODE (CCL);
 CC_MODE (CCL1);
 CC_MODE (CCL2);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 75b0b5b..24b8a5c 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1378,6 +1378,7 @@ s390_match_ccmode_set (rtx set, machine_mode req_mode)
 case E_CCSRmode:
 case E_CCUmode:
 case E_CCURmode:
+case E_CCOmode:
 case E_CCLmode:
 case E_CCL1mode:
 case E_CCL2mode:
@@ -2071,6 +2072,15 @@ s390_branch_condition_mask (rtx code)
}
   break;
 
+case E_CCOmode:
+  switch (GET_CODE (code))
+   {
+   case EQ:return CC0 | CC1 | CC2;
+   case NE:return CC3;
+   default:return -1;
+   }
+  break;
+
 case E_CCSmode:
   switch (GET_CODE (code))
{
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 94a7340..e4516f6 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -5961,6 +5961,83 @@
   "agh\t%0,%2"
   [(set_attr "op_type"  "RXY")])
 
+
+; Jump to label OP3 if OP1 + OP2 results in a signed overflow
+
+; addv_const_operand accepts all constants which can be handled
+; without reloads.  These will be handled primarily by
+; "*addv3_ccoverflow_const

Ping: [RFC, PATCH] Display inlining context for uninitialized warnings

2019-07-24 Thread Vladislav Ivanishin
Hi,

I'm pinging .

I think, there are two subtopics to it that can be discussed separately.
I would like to focus on the patch itself here.  I am going to also
start a subthread dedicated to dealing with representative returns.

I still have the same questions I had when I was sending the patch.
I am going to reiterate them, but in a more structured/focused fashion.

> When percent_K_format is called, TEXT might already hold precise
> location information in case its (indirect) caller is
> warning_at/error_at.  So it seems to me, this function lacks
> distinguishing the two cases: being called from plain warning/error
> functions vs. their *_at counterparts (the location shouldn't be
> updated in the latter case).

David, do you agree?  Should a discriminator of some sort be introduced?

> Sometimes inlining context is still lost (with the current patch), as
> can be seen e.g. with a version of the test with 's/unsigned yo/int yo/'
> substitution applied.  (The chain of block = BLOCK_SUPERCONTEXT (block)
> is broken - it ends with 0 rather than a FUNCTION_DECL.)  Is this known
> and expected (e.g. because pass_late_warn_uninitialized runs very late),
> or something to be investigated and fixed?

I don't know whom to address this question to.  I guess I'll just go
ahead and investigate if no expert shows up.

> The patch was bootstrapped and regtested on x86_64-pc-linux-gnu.
> Shall it be applied?

Addressing the two points above would make the solution more complete.
However, I think, the patch is still beneficial as-is, and it is not
hacky.  Re-tested, no regressions.

-- 
Vlad



Representative returns and location info (Re: [RFC, PATCH] Display inlining context for uninitialized warnings)

2019-07-24 Thread Vladislav Ivanishin
Jeff Law  writes:
> On 6/19/19 8:57 AM, Martin Sebor wrote:
>> On 6/19/19 5:11 AM, Vladislav Ivanishin wrote:
>>> Hi,
>>>
>>> This patch (partially) adds displaying inlining context for
>>> -W{maybe,}uninitialized warnings.  This is not as trivial to enable as
>>> simply supplying the "%G" format specifier, so I have some questions
>>> below.
>>>
>>> I need this hunk
>>>
>>>   void
>>>   percent_K_format (text_info *text, location_t loc, tree block)
>>>   {
>>> -  text->set_location (0, loc, SHOW_RANGE_WITH_CARET);
>>> +  if (loc != UNKNOWN_LOCATION)
>>> +    text->set_location (0, loc, SHOW_RANGE_WITH_CARET);
>>>    for two reasons:
>>>
>>> - The gimple return stmt has UNKNOWN_LOCATION due to this code in
>>>    lower_gimple_return ():
>>>
>>>    if (gimple_return_retval (stmt) == gimple_return_retval (tmp_rs.stmt))
>>>  {
>>>    /* Remove the line number from the representative return
>>> statement.
>>>   It now fills in for many such returns.  Failure to remove this
>>>   will result in incorrect results for coverage analysis.  */
>>>    gimple_set_location (tmp_rs.stmt, UNKNOWN_LOCATION);
>> 
>> This code is also causing quite a few non-trivial instances of
>> -Wreturn-local-addr to have no location after two (or more) such
>> statements have been merged (I raised PR 90735 for it), independent
>> of inlining.  I wonder if using the location of the closing curly
>> brace of the function definition (if it's available somewhere)
>> would work without messing up the coverage analysis.
>> 
>>>
>>>    (In case you are wondering, the code and the comment were
>>>    added in 2004.)
[... snip ...]
> You might also want to bring in Andreas K who added the original bits to
> help with debug info generation and Eric B who extended that code in
> 2014 due to impacts on profiling.
>
> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00769.html
> https://gcc.gnu.org/ml/gcc-patches/2011-02/msg00421.html

(Let's focus on location info and representative returns in this
subthread.)

AFAIU, a return statement may be chosen as a representative return for a
function.  The representative return's location is set to
UNKNOWN_LOCATION, because otherwise the results of coverage analysis are
skewed.  What is the difficulty preventing us from having both the
location info for the return and faithful coverage?
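
A toy illustration (not from the thread) of the merging in question:

int
f (int c)
{
  if (c)
    return 0;
  return 0;
}

Both returns yield the same value, so lowering keeps a single
representative return, redirects the other path to it, and clears its
location so coverage does not attribute every merged return to one
line.  That cleared location is exactly where later
-W{maybe,}uninitialized or -Wreturn-local-addr warnings lose their
position.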

Andreas, Eric, would you have any comments?

-- 
Vlad


Re: [PATCH 2/4] New parameter manipulation infrastructure

2019-07-24 Thread Martin Jambor
Hello,

On Tue, Jul 23 2019, Martin Jambor wrote:
> This patch adds the capability to split parameters directly to the
> call graph cloning infrastructure, while still allowing omp-simd to
> clone functions on its own.  Please see the cover letter for the whole
> IPA-SRA patch-set for more details.
>
> Martin
>
> 2019-07-23  Martin Jambor  
>
> * Makefile.in (GTFILES): Added ipa-param-manipulation.h.
> * cgraph.h (ipa_replace_map): Removed fields old_tree, replace_p
> and ref_p, added fields param_adjustments and performed_splits.
> (struct cgraph_clone_info): Remove args_to_skip and
> combined_args_to_skip, new field param_adjustments.
> (cgraph_node::create_clone): Changed parameters to use
> ipa_param_adjustments.
> (cgraph_node::create_virtual_clone): Likewise.
> (cgraph_node::create_virtual_clone_with_body): Likewise.
> (tree_function_versioning): Likewise.
> (cgraph_build_function_type_skip_args): Removed.
> * cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Convert to
> using ipa_param_adjustments.
> (clone_of_p): Likewise.
> * cgraphclones.c (cgraph_build_function_type_skip_args): Removed.
> (build_function_decl_skip_args): Likewise.
> (duplicate_thunk_for_node): Adjust parameters using
> ipa_param_body_adjustments, copy param_adjustments instead of
> args_to_skip.
> (cgraph_node::create_clone): Convert to using ipa_param_adjustments.
> (cgraph_node::create_virtual_clone): Likewise.
> (cgraph_node::create_version_clone_with_body): Likewise.
> (cgraph_materialize_clone): Likewise.
> (symbol_table::materialize_all_clones): Likewise.
> * coretypes.h (cgraph_edge): Declare.
> * ipa-cp.c (get_replacement_map): Do not initialize removed fields.
> (initialize_node_lattices): Make aware that some parameters might have
> already been removed.
> (want_remove_some_param_p): New function.
> (create_specialized_node): Convert to using ipa_param_adjustments and
> deal with possibly pre-existing adjustments.
> * ipa-fnsummary.c (ipa_fn_summary_t::duplicate): Simplify
> ipa_replace_map check.
> * ipa-inline-transform.c (save_inline_function_body): Update to
> reflect new tree_function_versioning signature.
> * ipa-param-manipulation.c: Rewrite.
> * ipa-param-manipulation.h: Likewise.
> * ipa-prop.c (adjust_agg_replacement_values): Use a helper from
> ipa_param_adjustments to get current parameter indices.
> (ipcp_modif_dom_walker::before_dom_children): Likewise.
> (ipcp_update_bits): Likewise.
> (ipcp_update_vr): Likewise.
> * ipa-split.c (split_function): Convert to using 
> ipa_param_adjustments.
> * lto-cgraph.c (output_cgraph_opt_summary_p): Likewise.
> (output_node_opt_summary): Do not stream removed fields.  Stream
> parameter adjustments instead of arguments to skip.
> (input_node_opt_summary): Likewise.
> (input_node_opt_summary): Likewise.
> * multiple_target.c (create_target_clone): Update to reflect new type
> of create_version_clone_with_body.
> * omp-simd-clone.c (simd_clone_vector_of_formal_parm_types): Adjust
> for the new interface.
> (simd_clone_clauses_extract): Likewise, make args an auto_vec.
> (simd_clone_compute_base_data_type): Likewise.
> (simd_clone_init_simd_arrays): Adjust for the new interface.
> (simd_clone_adjust_argument_types): Likewise.
> (struct modify_stmt_info): Likewise.
> (ipa_simd_modify_stmt_ops): Likewise.
> (ipa_simd_modify_function_body): Likewise.
> (simd_clone_adjust): Likewise.
> * trans-mem.c (ipa_tm_create_version): Update to reflect new type of
> tree_function_versioning.
> * tree-inline.h (copy_body_data): New fields killed_new_ssa_names and
>   param_body_adjs.
> (copy_decl_to_var): Declare.
> * tree-inline.c (update_clone_info): Do not remap old_tree.
> (remap_gimple_stmt): Use ipa_param_body_adjustments to modify gimple
> statements, walk all extra generated statements and remap their
> operands.
> (redirect_all_calls): Add killed SSA names to a hash set.
> (remap_ssa_name): Do not remap killed SSA names.
> (copy_arguments_for_versioning): Renamed to copy_arguments_nochange,
> half of functionality moved to ipa_param_body_adjustments.
> (copy_decl_to_var): Make exported.
> (copy_body): Destroy killed_new_ssa_names hash set.
> (expand_call_inline): Remap performed splits.
> (update_clone_info): Likewise.
> (tree_function_versioning): Simplify tree_map processing.  Updated to
> accept ipa_param_adjustments and use ipa_param_body_adjustments.


H

Ping: [PATCH 1/2] gdbhooks.py: use strip_typedefs to simplify matching type names

2019-07-24 Thread Vladislav Ivanishin
Vladislav Ivanishin  writes:

> David Malcolm  writes:
>
>> On Tue, 2019-07-02 at 14:29 +0300, Vladislav Ivanishin wrote:
>>> David Malcolm  writes:
>>> 
>>> > On Mon, 2019-07-01 at 12:50 +0300, Vladislav Ivanishin wrote:
>>> > > Hi!
>>> > > 
>>> > > GDB's Python API provides strip_typedefs method that can be
>>> > > instrumental for writing DRY code.  Using it at least partially
>>> > > obviates the need for the add_printer_for_types method we have in
>>> > > gdbhooks.py (it takes a list of typenames to match and is usually
>>> > > used to deal with typedefs).
>>> > > 
>>> > > I think, the code can be simplified further (there's still
>>> > > ['tree_node *', 'const tree_node *'], which is a little awkward)
>>> > > if we deal with pointer types in a uniform fashion (I'll touch on
>>> > > this in the description of the second patch).  But that can be
>>> > > improved separately.
>>> > > 
>>> > > The gimple_statement_cond, etc. part has been dysfunctional for a
>>> > > while (namely since gimple-classes-v2-option-3 branch was
>>> > > merged).  I updated it to use the new classes' names.  That seems
>>> > > to work (though it doesn't output much info anyway: pretty
>>> > > vs. 
>>> > >(gphi *) 0x778c0200
>>> > > in the raw version).
>>> > > 
>>> > > I changed the name passed to pp.add_printer_for_types() for
>>> > > scalar_mode and friends -- so they all share the same name now --
>>> > > but I don't believe the name is used in any context where it
>>> > > would matter.
>>> > 
>>> > IIRC there's a gdb command to tell you what the registered pretty-
>>> > printers are;
>>> 
>>> Yeah, it's (gdb) info pretty-printer
>>> 
>>> > I think the name is only ever used in that context (or maybe for
>>> > fine-grained control of pretty-printing) -
>>> 
>>> From the gdb info manual:
>>> 'disable pretty-printer [OBJECT-REGEXP [NAME-REGEXP]]'
>>>  Disable pretty-printers matching OBJECT-REGEXP and NAME-REGEXP.
>>> 
>>> In our case, this is how to do it:
>>> (gdb) disable pretty-printer global gcc;scalar.*
>>> 
>>> So, that would be a regression, if different modes were given
>>> different names on purpose (starts with r251467).  Looks
>>> unintentional though, and the subprinter is very simple.
>>> 
>>> I think, these scenarios exhaust the uses for the name attribute of a
>>> pretty printer.
>>> 
>>> > and I don't know of anyone who uses that.  I don't think changing
>>> > the name matters.
>>> 
>>> Good.  Neither do I :) Except after writing this I started using this
>>> feature myself.  It provides a convenient way to isolate a faulty
>>> pretty-printer, or continue debugging without it.  The manual says
>>> that "The consequences of a broken pretty-printer are severe enough
>>> that GDB provides support for enabling and disabling individual
>>> printers".  Oh, were they right ...  (see below).
>>
>> In any case, I have no objection to the change-of-names part of the
>> patch.
>>
>>> > > This is just a clean up of gdbhooks.py.  OK to commit?
>>> > 
>>> > The only area I wasn't sure about were the removal hunks relating
>>> > to machine modes: is that all covered now by the looking-through-
>>> > typedefs?
>>> 
>>> Yes, I checked (using debug output) that all the modes for which I
>>> removed explicit handling are correctly expanded to either
>>> opt_mode<>, or pod_mode<>.
>>> 
>>> > Otherwise, looks good to me.  I assume you've tested the patch by
>>> > debugging with it.
>>> 
>>> Yeah, I thought my debugging with the patch should have given it
>>> reasonable coverage.
>>> 
>>> However, I had not previously actually tested debugging code
>>> involving modes, so I went ahead and did it.  And I am seeing a
>>> segfault with RtxPrinter now (-ex "b reg_nonzero_bits_for_combine"
>>> -ex "r" -ex "bt").  Delving in...
>>> 
>>> The rtx type (for which the unqualified form is also 'rtx') was not
>>> being matched by any of the printers.  The typedef-stripped form is
>>> 'rtx_def *'. It matches now and uncovers a bug in the subprinter(?)
>>> 
>>> I dug into it and the impression I have so far is that the
>>> print_inline_rtx call upsets gdb's frame unwinders... Upon the "bt"
>>> command, GDB itself produces more than 157000 frames before hitting
>>> the segfault.
>>
>> Sounds like an infinite recursion (possibly a mutual recursion
>> involving gdb repeatedly injecting new frames into the inferior cc1, or
>> could just be an infinite recursion in the inferior).
>
> It was an infinite recursion, all inside gdb.
>
>>> Have you seen anything like that before?  It would be nice to
>>> understand the root cause of this bug.  I am going to spend some more
>>> time on it, but I've no prior experience in debugging gdb internals.
>>> 
>>> As a workaround, I'd propose perhaps removing the kludge, but the
>>> output looks nice, while the alternative implementation (after
>>> return) is not as informative.
>>
>> Yeah, the rtx-printing is a kludge.
>>
>> The gdbhooks.py code isn't well tested.  I wonder if what you're

Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-24 Thread Richard Biener
On Wed, Jul 24, 2019 at 12:56 AM Andrew MacLeod  wrote:
>
> On 7/23/19 5:33 AM, Richard Biener wrote:
> >
> >> What irange contains internally is a bit arbitrary.  It's really an API
> >> for managing a set of subranges.  We could have used trees internally
> >> just as easily, then we wouldnt need a type field. Since we were doing
> >> lots of operations, rather than going back and forth from trees all the
> >> time, we just used the underlying wide_int directly.   we could have
> >> fiddle farted around with HOST_WIDE_INT or whatever, but wide_int is
> >> there, has all the operations, and it works fine. so thats what it
> >> currently is on the branch.
> > But a range has no type.  Period.  The fact that with the in-tree 
> > implementation
> > there's access to a type is artificial.  All workers in the in-tree
> > implementation
> > get a type for the operation they carry out.
> >
> > But somehow irange doesn't get that.
> >
> > In fact, an irange shouldn't be bound to any type, and the usual
> > "carry out multiplication of two integer typed vars with the result
> > in integer" on irange should be multiply two iranges (with unbound
> > result) plus a "truncate to integer type" operation.
> >
>
> so following through on the implementation details of doing that, we do
> the exact same thing that VRP does for multiply: we eventually call
> wide_int_range_multiplicative_op.
> The code from tree-vrp.c:
>
>wide_int res_lb, res_ub;
>wide_int vr0_lb = wi::to_wide (vr0_min);
>wide_int vr0_ub = wi::to_wide (vr0_max);
>wide_int vr1_lb = wi::to_wide (vr1->min ());
>wide_int vr1_ub = wi::to_wide (vr1->max ());
>bool overflow_undefined = TYPE_OVERFLOW_UNDEFINED (type);
>unsigned prec = TYPE_PRECISION (type);
>
>if (wide_int_range_multiplicative_op (res_lb, res_ub,
>  code, TYPE_SIGN (type), prec,
>  vr0_lb, vr0_ub, vr1_lb, vr1_ub,
>  overflow_undefined))
>  vr->set (VR_RANGE, wide_int_to_tree (type, res_lb),
>   wide_int_to_tree (type, res_ub));
>
>
> Which , lo and behold, it requires a type in order to get the signop
> right in the 4th argument.  we also use the type to figure out the
> precision in the 5th argument, as well as the overflow condition at the
> end.
>
> So it *is* bound to a type in order to do the operation; it's just a
> matter of whether you pass that type around, or include it with the
> object.  I simply can't imagine why you think it isn't.

Because wide_int_range_multiplicative_op includes the truncating,
and because for some unknown reason the caller of
extract_range_from_multiplicative_op doesn't pass down the type of
the computation, we extract it from the type of operand zero.
The caller being extract_range_from_binary_expr.

That said, the wide_int_range_* stuff isn't the prettiest stuff in the
world and it uses wide_ints which have a precision (I didn't agree
to that - heh).

> sure, in this case the LHS, vr0, and vr1 are all the same type.. (or
> should be type compatible_p), so we can pass in one type, but not all
> operations follow that pattern... casts, shifts, comparisons, etc can
> have different types in the operand positions.  We include it with each
> range and  we always have accurate info associated with the operand.
>
> How is that  a bad thing?

It makes you _require_ a type on operands.  But constants do not
have a type.  A 1 is a 1 whether it is unsigned int or bool.

> >> We are treating a range object as a unique self contained object.
> >> Therefore, the range has a type so we know how to print it, and can
> >> confirm before any operation that the ranges being operated on are
> >> appropriately matched.  We found and opened bugzillas over the past
> >> couple years for places where our code caught bugs because a range was
> >> created and then operated on in a way that was not compatible with
> >> another range.  I think there is a still an open one against ada(?)
> >> where the switch and case  are different precision.
> > I'm fine with sanity-checking where it makes sense but looking at
> > the irange code the fact that there is a type is a fundamental requirement.
> > IMHO that's bad design.
>
> we could remove the type from the range object.. we aren't critically
> tied to it, but then virtually everywhere we pass a range, we'll be
> passing in the type to be operated on, or the sign flag, or overflow
> flag, or some combination of those.  It's only a matter of time until
> someone gets them out of whack.. It seems far more logical to simply
> keep the type associated so we can pick up overflow, signop, precision
> and such as needed.. and do some sanity checking along the way.  That's
> what the type field is for after all, to consolidate all the info you
> might want...  Why pass the extra parameters when you don't need to.

Because you might want to use a range without first needing to change
its type.
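
Both sides agree that carrying out an operation needs a sign and a
precision from somewhere; the disagreement is whether the range object
itself should carry them.  A minimal standalone sketch (plain C, not
GCC code, names hypothetical) of why the bit pattern alone does not
determine the result of a range operation:

  #include <stdint.h>
  #include <stdio.h>

  int
  main (void)
  {
    /* The same 8-bit pattern 0xff is 255 as unsigned and -1 as signed,
       so "multiply the range [0xff, 0xff] by 2" needs a signop: the
       result wraps to 254 unsigned but is -2 signed.  */
    uint8_t u = 0xff;
    int8_t s = (int8_t) 0xff;
    printf ("unsigned: %u\n", (unsigned) (uint8_t) (u * 2)); /* 254 */
    printf ("signed:   %d\n", (int) (int8_t) (s * 2));       /* -2 */
    return 0;
  }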

Re: [PATCH 1/2] [ARC] Fix and refurbish the interrupts.

2019-07-24 Thread Claudiu Zissulescu
Pushed. Thank you for your review,
Claudiu

On Tue, Jul 23, 2019 at 12:51 AM Jeff Law  wrote:
>
> On 7/9/19 10:23 AM, claz...@gmail.com wrote:
> > Hi Jeff,
> >
> > Please find attached the updated patch.
> >
> > What is new:
> > - mailing list feedback is taken into account.
> > - some comments are updated.
> > - a new test is added.
> > - the ARC AUX registers used by ZOL (hardware loop) and FPX (a custom
> > floating point implementation) are saved before fp-register.
> > - the millicode optimization is not used by ISR.
> >
> >
> > Thank you,
> > Claudiu
> >
> >
> > 0001-ARC-Fix-and-refurbish-the-interrupts.patch
> >
> > From d22368681b7aab4bef4b5c32a9a472808f2c16cd Mon Sep 17 00:00:00 2001
> > From: Claudiu Zissulescu 
> > Date: Fri, 17 May 2019 14:48:17 +0300
> > Subject: [PATCH] [ARC] Fix and refurbish the interrupts.
> >
> > When entering an interrupt, not only the call-saved registers need to
> > be placed on the stack but also the call-clobbered ones. Moreover, the
> > ARC700 return from interrupt instruction needs to be rtie, the same
> > as on ARCv2 CPUs, while the ARC6xx family uses the j.f [ilinkX]
> > instruction. Additionally, we need to save the state of the ZOL
> > machinery, namely the lp_count, lp_end and lp_start registers. For
> > architectures which are using extension registers (i.e., HS48) we need
> > to save/restore them as well.
> >
> > gcc/
> > -xx-xx  Claudiu Zissulescu  
> >
> >   * config/arc/arc-protos.h (arc_output_function_epilogue): Delete
> >   declaration.
> >   (arc_compute_frame_size): Millicode is disabled when compiling
> >   ISR.
> >   (arc_return_address_register): Likewise.
> >   (arc_compute_function_type): Likewise.
> >   (arc_compute_frame_size): Likewise.
> >   (secondary_reload_info): Likewise.
> >   (arc_get_unalign): Likewise.
> >   (arc_can_use_return_insn): Declare.
> >   * config/arc/arc.c (AUX_LP_START): Define.
> >   (AUX_LP_END): Likewise.
> >   (arc_frame_info): Update gmask member to 64-bit datum.
> >   (GMASK_LEN): Update.
> >   (arc_compute_function_type): Make it static, move it forward.
> >   (arc_must_save_register): Update, consider the extra regs.
> >   (arc_compute_millicode_save_restore_regs): Update to use the 64
> >   bit gmask.
> >   (arc_compute_frame_size): Likewise.
> >   (arc_enter_leave_p): Likewise.
> >   (arc_save_callee_saves): Likewise.
> >   (arc_restore_callee_saves): Likewise.
> >   (arc_save_callee_enter): Likewise.
> >   (arc_restore_callee_leave): Likewise.
> >   (arc_save_callee_milli): Likewise.
> >   (arc_restore_callee_milli): Likewise.
> >   (arc_expand_prologue): Add new interrupt handling.
> >   (arc_return_address_register): Make it static, move it forward.
> >   (arc_expand_epilogue): Add new interrupt handling.
> >   (arc_get_unalign): Delete.
> >   (arc_epilogue_uses): Make sure we do not remove the extra
> >   saved/restored registers when interrupt.
> >   (arc_can_use_return_insn): New function.
> >   (push_reg): Likewise.
> >   (pop_reg): Likewise.
> >   (arc_save_callee_saves): Add ZOL and FPX aux registers saving
> >   procedures.
> >   (arc_restore_callee_saves): Likewise, but restoring.
> >   * config/arc/arc.md (VUNSPEC_ARC_ARC600_RTIE): Define.
> >   (R33_REG): Likewise.
> >   (R34_REG): Likewise.
> >   (R35_REG): Likewise.
> >   (R36_REG): Likewise.
> >   (R37_REG): Likewise.
> >   (R38_REG): Likewise.
> >   (R39_REG): Likewise.
> >   (R45_REG): Likewise.
> >   (R46_REG): Likewise.
> >   (R47_REG): Likewise.
> >   (R48_REG): Likewise.
> >   (R49_REG): Likewise.
> >   (R50_REG): Likewise.
> >   (R51_REG): Likewise.
> >   (R52_REG): Likewise.
> >   (R53_REG): Likewise.
> >   (R54_REG): Likewise.
> >   (R55_REG): Likewise.
> >   (R56_REG): Likewise.
> >   (R58_REG): Likewise.
> >   (type): Add rtie attribute.
> >   (in_call_delay_slot): Use RETURN_ADDR_REGNUM.
> >   (movsi_insn): Accept moves to lp_count.
> >   (rtie): Update pattern.
> >   (simple_return): Simplify it, don't use this pattern as a return
> >   from an interrupt.
> >   (arc600_rtie): New pattern.
> >   (p_return_i): Clean up.
> >   (return): Likewise.
> >   * config/arc/builtins.def (rtie): Only available for non ARC6xx
> >   family CPUs.
> >   * config/arc/predicates.md (move_src_operand): Consider lp_count
> >   as a register.
> >
> > gcc/testsuite
> > -xx-xx  Claudiu Zissulescu  
> >
> >   * gcc.target/arc/arc.exp (check_effective_target_accregs): New
> >   predicate.
> >   * gcc.target/arc/builtin_special.c: Update test.
> >   * gcc.target/arc/interrupt-1.c: Likewise.
> >   * gcc.target/arc/interrupt-10.c: New test.
> >   * gcc.target/arc/interrupt-11.c: Likewise.
> >   * gcc.target/arc/interrupt-12.c: Likewise.
> OK
>
> Jeff


Re: [patch] Set TREE_THIS_NOTRAP throughout tree-nested.c

2019-07-24 Thread Richard Biener
On Wed, Jul 24, 2019 at 11:14 AM Eric Botcazou  wrote:
>
> Hi,
>
> stack memory is considered non-trapping by the compiler once the frame has
> been established so TREE_THIS_NOTRAP can be set on the dereferences built
> during the unnesting pass.
>
> Tested on x86_64-suse-linux, OK for the mainline?

OK.

Thanks,
Richard.

>
> 2019-07-24  Eric Botcazou  
>
> * tree-nested.c (build_simple_mem_ref_notrap): New function.
> (get_static_chain): Call it instead of build_simple_mem_ref.
> (get_frame_field): Likewise.
> (get_nonlocal_debug_decl): Likewise.
> (convert_nonlocal_reference_op): Likewise.
>
> --
> Eric Botcazou
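
For context, the dereferences in question arise when a nested function
reaches its parent's locals through the static chain; a minimal GNU C
sketch (a hypothetical example, not from the patch's testsuite):

  /* 'x' lives in outer's FRAME object; inner reads it through the
     static-chain dereference that tree-nested.c builds.  Once outer's
     frame is established, that dereference cannot trap, which is what
     TREE_THIS_NOTRAP records.  */
  int
  outer (int n)
  {
    int x = n;

    int inner (void) { return x + 1; }

    return inner ();
  }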


Re: [patch] More precise message with -Winline

2019-07-24 Thread Richard Biener
On Wed, Jul 24, 2019 at 11:29 AM Eric Botcazou  wrote:
>
> Hi,
>
> for EH cleanups, the branch probability heuristics can consider that edges are
> never taken, i.e. their profile count is set to zero.  In this case, no matter
> how you tweak the inlining parameters, you cannot get function calls inlined,
> but -Winline nevertheless prints the standard message about unlikely calls as
> in the cases when tweaking the inlining parameters can work.
>
> Therefore the attached patch adds a new CIF code with the warning message:
>
> "call is never executed and code size would grow [-Winline]"
>
> for this case, thus signaling the user that he'd better not waste time trying
> to get the call inlined.
>
> Tested on x86_64-suse-linux, OK for the mainline?

Looks good besides

+ if (e->count.ipa () == profile_count::zero ())
+   e->inline_failed = CIF_NEVER_CALL;

does it actually matter what kind of profile quality e->count.ipa has
compared to profile_count::zero ()?  Also

+/* Call is considered never executed.  */
+DEFCIFCODE(NEVER_CALL, CIF_FINAL_NORMAL,
+  N_("call is never executed and code size would grow"))

suggests the call is never executed, but we only assume that
(or the profile training run never hit it).

Richard.

>
>
> 2019-07-24  Eric Botcazou  
>
> * cif-code.def (NEVER_CALL): New code.
> * ipa-inline.c (want_inline_small_function_p): Fix formatting issues.
> Set the failure to CIF_NEVER_CALL if the IPA count is zero.
>
> --
> Eric Botcazou


Re: Representative returns and location info (Re: [RFC, PATCH] Display inlining context for uninitialized warnings)

2019-07-24 Thread Eric Botcazou
> (Let's focus on location info and representative returns in this
> subthread.)
> 
> AFAIU, a return statement may be chosen as a representative return for a
> function.  The representative return's location is set to
> UNKNOWN_LOCATION, because otherwise the results of coverage analysis are
> skewed.  What is the difficulty preventing us from having both the
> location info for the return and faithful coverage?

That's simply impossible since you merge two instructions with different 
locations into a single one.  But the return location should be on the goto.
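
A hand-written sketch of the merge being described (illustrative C,
not compiler output):

  /* Source: two returns with distinct locations.  */
  int
  f (int x)
  {
    if (x)
      return 1;   /* location A */
    return 0;     /* location B */
  }

  /* After lowering to a representative return, a single return remains
     and cannot carry both locations; the gotos, however, can each keep
     their own.  */
  int
  f_lowered (int x)
  {
    int retval;
    if (x)
      {
        retval = 1;
        goto ret;   /* carries location A */
      }
    retval = 0;
    goto ret;       /* carries location B */
  ret:
    return retval;  /* UNKNOWN_LOCATION, so coverage is not skewed */
  }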

-- 
Eric Botcazou


[PATCH] Fix PR91236

2019-07-24 Thread Richard Biener


The following fixes a stack corruption and a missed optimization.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2019-07-24  Richard Biener  

PR tree-optimization/91236
* tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): Fix
size of CONSTRUCTOR write.  Fix buffer size we pass to
native_encode_expr.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 273758)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -1818,7 +1870,7 @@ vn_walk_cb_data::push_partial_def (const
   if (TREE_CODE (pd.rhs) == CONSTRUCTOR)
/* Empty CONSTRUCTOR.  */
memset (buffer + MAX (0, pd.offset),
-   0, MIN ((HOST_WIDE_INT)sizeof (buffer),
+   0, MIN ((HOST_WIDE_INT)sizeof (buffer) - MAX (0, pd.offset),
pd.size + MIN (0, pd.offset)));
   else
{
@@ -1833,7 +1885,7 @@ vn_walk_cb_data::push_partial_def (const
  pad = GET_MODE_SIZE (mode) - pd.size;
}
  len = native_encode_expr (pd.rhs, buffer + MAX (0, pd.offset),
-   sizeof (buffer - MAX (0, pd.offset)),
+   sizeof (buffer) - MAX (0, pd.offset),
MAX (0, -pd.offset) + pad);
  if (len <= 0 || len < (pd.size - MAX (0, -pd.offset)))
{

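The second hunk fixes a classic C pitfall that is worth spelling out:
buffer is an array, but in sizeof (buffer - MAX (0, pd.offset)) the
subtraction decays it to a pointer, so sizeof yields the pointer size
instead of the buffer size.  A standalone illustration (hypothetical
names, not the sccvn code):

  #include <stdio.h>

  int
  main (void)
  {
    unsigned char buffer[64];
    int offset = 16;

    /* Wrong: 'buffer - offset' is a pointer expression, so this is
       sizeof (unsigned char *), typically 8.  */
    printf ("%zu\n", sizeof (buffer - offset));

    /* Right: size of the array minus the offset, i.e. 48.  */
    printf ("%zu\n", sizeof (buffer) - offset);
    return 0;
  }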


Re: [patch] More precise message with -Winline

2019-07-24 Thread Eric Botcazou
> Looks good besides
> 
> + if (e->count.ipa () == profile_count::zero ())
> +   e->inline_failed = CIF_NEVER_CALL;
> 
> does it actually matter what kind of profile quality e->count.ipa has
> compared to profile_count::zero ()?

I don't really know, the other examples in the function use e->count.ipa too.

> Also
> 
> +/* Call is considered never executed.  */
> +DEFCIFCODE(NEVER_CALL, CIF_FINAL_NORMAL,
> +  N_("call is never executed and code size would grow"))
> 
> suggests the call is never executed, but we only assume that
> (or the profile training run never hit it).

OK, I added "considered" as in the comment above.

-- 
Eric Botcazou


Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops

2019-07-24 Thread Fredrik Noring
Hi Maciej,

> > > > How can one reasonably prevent the bug when compiling a whole Linux
> > > > distribution with thousands of packages if GAS merely warns and proceeds
> > > > in many cases?
> > > 
> > >  I think the tools must not set a policy.  By using `.set noreorder' the 
> > > user told the toolchain he knows better and asked to keep its hands away.
> > > 
> > >  As I say you can use `-Werror' for code auditing.  And in most cases 
> > > manually scheduled delay slots in handcoded assembly are a sign of a 
> > > problem with the piece of code affected, regardless of the R5900 erratum.
> > 
> > Well, it seems practical to use unofficial tools and a patched GAS to fix
> > this assembly bug, then. It's a grave problem for the R5900 and it needs to
> > be reliably detected.
> 
>  Why?  What is wrong with using `-Werror'?
> 
>  Or you could use `-Wa,-fatal-warnings' to limit making warnings fatal to 
> the assembly stage only if you are concerned about bad high-level language 
> code quality interfering with your goal.

It appears unfeasible to fix thousands of build scripts to work like that.
Although the number of packages with actual MIPS assembly in them is
likely very small, some may source macros and other stuff via various
libraries. Some distributions, such as Gentoo, do have global CFLAGS,
but such mechanisms are unreliable.

I understand that one may want to take a principled stance with "hands away"
and "author asked for trouble", but the fact is that whoever put a given
noreorder directive into a piece of code most likely didn't have R5900
errata in mind, and R5900 users most likely don't want the assembler to
generate code that is known to cause subtle bugs of all imaginable kinds,
including data corruption.

Hence my recommendation. It's not really an argument because I'm not going
to ask that the official GAS changes behaviour on this.

> >  3:
> > +   short_loop_war(3b)
> > b   3b
> >  nop
> 
>  Is it needed here given that the delay slot instruction is a NOP anyway?  

Ah, no. That is a left-over from the note that "a branch delay slot of the
loop is not NOP (EE 2.9 or later)", as it appears a NOP is not enough for
EE before 2.9.

Perhaps the short_loop_war macro can be improved to give errors when
misused. For example, it could potentially give an error if it is placed
outside of noreorder, but it doesn't seem like GAS can inform the macro
whether it is inside or outside.

>  Also this source does not need `.set noreorder', except for the loop 
> around `1:' where a data dependency is present with the delay slot 
> instruction.

That data dependency can surely be reworked too, so as to make the whole
piece reorderable?

> And I am surprised that it even assembles as it uses 
> `.cprestore' without the required offset argument (not that this pseudo-op 
> makes any sense here).  And it's weird overall, e.g. it loads $ra 
> explicitly rather than using JALR (or JAL even).  But these are unrelated 
> problems.

Good to know, thanks. :)

Fredrik


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-07-24 Thread Jeff Law
On 7/23/19 8:00 AM, Richard Biener wrote:
> 
> The following fixes the runtime regression of 456.hmmer caused
> by matching ICC in code generation and using cmov more aggressively
> (through GIMPLE level MAX_EXPR usage).  Appearantly (discovered
> by manual assembler editing) using the SSE unit for performing
> SImode loads, adds and then two singed max operations plus stores
> is quite a bit faster than cmovs - even faster than the original
> single cmov plus branchy second max.  Even more so for AMD CPUs
> than Intel CPUs.
> 
> Instead of hacking up some pattern recognition pass to transform
> integer mode memory-to-memory computation chains involving
> conditional moves to "vector" code (similar to what STV does
> for TImode ops on x86_64) the following simply allows SImode
> into SSE registers (support for this is already there in some
> important places like move patterns!).  For the particular
> case of 456.hmmer the required support is loads/stores
> (already implemented), SImode adds and SImode smax.
> 
> So the patch adds a smax pattern for SImode (we don't have any
> for scalar modes but currently expand via a conditional move sequence)
> emitting as SSE vector max or cmp/cmov depending on the alternative.
> 
> And it amends the *add<mode>_1 pattern with SSE alternatives
> (which have to come before the memory alternative as IRA otherwise
> doesn't consider reloading a memory operand to a register).
> 
> With this in place the runtime of 456.hmmer improves by 10%
> on Haswell, which is back to the pre-regression speed but not
> to the same level as seen with manually editing just the single
> important loop.
> 
> I'm currently benchmarking all SPEC CPU 2006 on Haswell.  More
> interesting is probably Zen where moves crossing the
> integer - vector domain are excessively expensive (they get
> done via the stack).
> 
> Clearly this approach will run into register allocation issues
> but it looks cleaner than writing yet another STV-like pass
> (STV itself is quite awkwardly structured so I refrain from
> touching it...).
> 
> Anyway - comments?  It seems to me that MMX-in-SSE does
> something very similar.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, previous testing
> revealed some issue.  Forgot that *add<mode>_1 also handles
> DImode..., fixed below, re-testing in progress.
Certainly simpler than most of the options and seems effective.

FWIW, I think all the STV code is still disabled and has been for
several releases.  One could make an argument it should get dropped.  If
someone wants to make something like STV work, they can try again and
hopefully learn from the problems with the first implementation.
jeff
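
For readers without the benchmark at hand, the hot loop in 456.hmmer
boils down to chains of the following shape (a hypothetical reduction,
not the actual SPEC source): memory-to-memory SImode adds feeding
signed max operations, which is why keeping the whole chain in the SSE
unit beats a cmov sequence.

  /* Hypothetical reduction of the 456.hmmer hot-loop shape.  */
  void
  viterbi_row (int n, int *mc, const int *dc,
               const int *tpmm, const int *tpdm)
  {
    for (int i = 1; i < n; i++)
      {
        int sc = mc[i - 1] + tpmm[i - 1];  /* SImode add */
        int alt = dc[i - 1] + tpdm[i - 1]; /* SImode add */
        if (sc < alt)                      /* becomes MAX_EXPR / smax */
          sc = alt;
        mc[i] = sc;                        /* store back to memory */
      }
  }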


Re: [PATCH] Come up with -flto=auto option.

2019-07-24 Thread Jeff Law
On 7/24/19 12:47 AM, Martin Liška wrote:
> On 7/24/19 12:32 AM, Jeff Law wrote:
>> On 7/23/19 8:23 AM, Martin Liška wrote:
>>> On 7/23/19 3:57 PM, Jeff Law wrote:
 On 7/23/19 7:50 AM, Michael Matz wrote:
> Hi,
>
> On Tue, 23 Jul 2019, Jeff Law wrote:
>
>>> great you found time to make this. It should become the default for
>>> -flto IMO.
>> I was going to hack it into the rpm configury bits since we have access
>> to the # cores there.  But an auto-selector within GCC is even better.
>>
>> BTW, isn't this all going to wreak havoc with reproducible builds since 
>> partitioning can affect code generation?  That's one of our open 
>> questions...
>
> See Richi for this, but the reason for -flto=auto (or just -flto, or 
> whatever the end result will be) is _also_ reproducible builds, because 
> some packages like to encode the compile flags into their binaries and 
> hence would change depending on build host just because of "-flto=32" vs. 
> "-flto=64" even when the code remains exactly the same.
 Makes sense.

 What did you end up doing with old autoconf scripts that aren't LTO
 safe?  Many of the older style tests to see if a function exists are
 broken by LTO.  I've seen more issues with this than anything in the LTO
 space so far.
>>>
>>> Well, I've seen some of these failures, but only a few.
>> Many appear to be silent, possibly not really affecting anything (like
>> all the packages that test for doprnt, but really don't care about it in
>> the end).But there were enough real failures that I put in auditing
>> to detect any cases where we get different config.h files with LTO vs
>> non-LTO and that is tripping often enough to have my concerns about how
>> much work it's going to be to get everything fixed.
> 
> I see.
> 
>>
>>
>> But still, overall we're moving forward.  Next step is to get everything
>> classified into buckets and start iterating.  Presumably you'd be open
>> to a google doc of some kind where we can coordinate any such efforts?
> 
> Sure, I'm open. In the meantime, I've got a META issue that I use for tracking
> of LTO-related issues in openSUSE:
> https://bugzilla.opensuse.org/show_bug.cgi?id=1133084
Sounds good.  I'll try to get it populated and at least the first level
categorization done today.  Matthew B. may do some additional analysis
while I'm offline.

I've got scripts which can query the jenkins build failure plugin to
help with that first level categorization.  So, in theory, I can query
for all the configury differences or query for any cases where LTO
exposed a fatal warning,  testsuite failures, etc.

I don't want to burn a lot of time on the config.h stuff right now.  At
this stage we're more interested in the other failure categories.  But
I'm going to have to figure out something shortly after returning when I
put in the Fedora 32 change request.

I've got your META BZ in one of my tabs.  I've referred to it several
times over the last few weeks :-)

jeff
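
For anyone unfamiliar with how the old-style configure checks mentioned
above break, the generated conftest looks roughly like this (a schematic
with a hypothetical function name, not literal autoconf output).  The
test only needs to link, and the deliberately bogus prototype that a
plain link tolerates is exactly what LTO can see through, flipping the
config.h result:

  /* Schematic autoconf-style "does the function exist?" check.  */
  char doprnt ();   /* deliberately wrong prototype */

  int
  main (void)
  {
    return doprnt ();
  }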


[PATCH][ARM] Fix low reg issue in Thumb-2 movsi patterns

2019-07-24 Thread Wilco Dijkstra
The Thumb-2 movsi patterns try to prefer low registers for loads and stores.
However this is done incorrectly by using 2 separate variants with 'l' and 'h'
register classes.  The register allocator will only use low registers, and
as a result we end up with significantly more spills and moves to high
registers.  Fix this by merging the alternatives and use 'l*r' to indicate
preference for low registers.  This saves ~400 instructions from the pr77308
testcase.

Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

ChangeLog:
2019-07-24  Wilco Dijkstra  

* config/arm/thumb2.md (thumb2_movsi_insn): Fix load/store low reg.
* config/arm/vfp.md (thumb2_movsi_vfp): Likewise.

--
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 78a6ea0b10dab97ed6651ce62e99cfd7a81722ab..c7000d0772a7e5887b6d05be188e8eb38c97217d 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -247,8 +247,8 @@ (define_insn "*thumb2_pop_single"
 ;; regs.  The high register alternatives are not taken into account when
 ;; choosing register preferences in order to reflect their expense.
 (define_insn "*thumb2_movsi_insn"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r,l ,*hk,m,*m")
-   (match_operand:SI 1 "general_operand"  "rk,I,Py,K,j,mi,*mi,l,*hk"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r,l*rk,m")
+   (match_operand:SI 1 "general_operand"  "rk,I,Py,K,j,mi,l*rk"))]
   "TARGET_THUMB2 && !TARGET_IWMMXT && !TARGET_HARD_FLOAT
&& (   register_operand (operands[0], SImode)
|| register_operand (operands[1], SImode))"
@@ -262,22 +262,20 @@ (define_insn "*thumb2_movsi_insn"
 case 3: return \"mvn%?\\t%0, #%B1\";
 case 4: return \"movw%?\\t%0, %1\";
 case 5:
-case 6:
   /* Cannot load it directly, split to load it via MOV / MOVT.  */
   if (!MEM_P (operands[1]) && arm_disable_literal_pool)
return \"#\";
   return \"ldr%?\\t%0, %1\";
-case 7:
-case 8: return \"str%?\\t%1, %0\";
+case 6: return \"str%?\\t%1, %0\";
 default: gcc_unreachable ();
 }
 }
-  [(set_attr "type" 
"mov_reg,mov_imm,mov_imm,mvn_imm,mov_imm,load_4,load_4,store_4,store_4")
-   (set_attr "length" "2,4,2,4,4,4,4,4,4")
+  [(set_attr "type" "mov_reg,mov_imm,mov_imm,mvn_imm,mov_imm,load_4,store_4")
+   (set_attr "length" "2,4,2,4,4,4,4")
(set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "yes,no,yes,no,no,no,no,no,no")
-   (set_attr "pool_range" "*,*,*,*,*,1018,4094,*,*")
-   (set_attr "neg_pool_range" "*,*,*,*,*,0,0,*,*")]
+   (set_attr "predicable_short_it" "yes,no,yes,no,no,no,no")
+   (set_attr "pool_range" "*,*,*,*,*,4094,*")
+   (set_attr "neg_pool_range" "*,*,*,*,*,0,*")]
 )
 
 (define_insn "tls_load_dot_plus_four"
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index e0aaa7b00bb41c046da4531a293e123c94e8b9a4..b59dd6b71d228e042feda3a3a06d81dd01d200da 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -258,8 +258,8 @@ (define_insn "*arm_movsi_vfp"
 ;; is chosen with length 2 when the instruction is predicated for
 ;; arm_restrict_it.
 (define_insn "*thumb2_movsi_vfp"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r, l,*hk,m, *m,*t, r,*t,*t,  *Uv")
-   (match_operand:SI 1 "general_operand"  "rk,I,Py,K,j,mi,*mi,l,*hk, r,*t,*t,*UvTu,*t"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r,l*rk,m,*t, r,*t,*t,  *Uv")
+   (match_operand:SI 1 "general_operand"  "rk,I,Py,K,j,mi,l*rk, r,*t,*t,*UvTu,*t"))]
   "TARGET_THUMB2 && TARGET_HARD_FLOAT
&& (   s_register_operand (operands[0], SImode)
|| s_register_operand (operands[1], SImode))"
@@ -275,32 +275,30 @@ (define_insn "*thumb2_movsi_vfp"
 case 4:
   return \"movw%?\\t%0, %1\";
 case 5:
-case 6:
   /* Cannot load it directly, split to load it via MOV / MOVT.  */
   if (!MEM_P (operands[1]) && arm_disable_literal_pool)
return \"#\";
   return \"ldr%?\\t%0, %1\";
-case 7:
-case 8:
+case 6:
   return \"str%?\\t%1, %0\";
-case 9:
+case 7:
   return \"vmov%?\\t%0, %1\\t%@ int\";
-case 10:
+case 8:
   return \"vmov%?\\t%0, %1\\t%@ int\";
-case 11:
+case 9:
   return \"vmov%?.f32\\t%0, %1\\t%@ int\";
-case 12: case 13:
+case 10: case 11:
   return output_move_vfp (operands);
 default:
   gcc_unreachable ();
 }
   "
   [(set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" 
"yes,no,yes,no,no,no,no,no,no,no,no,no,no,no")
-   (set_attr "type" 
"mov_reg,mov_reg,mov_reg,mvn_reg,mov_imm,load_4,load_4,store_4,store_4,f_mcr,f_mrc,fmov,f_loads,f_stores")
-   (set_attr "length" "2,4,2,4,4,4,4,4,4,4,4,4,4,4")
-   (set_attr "pool_range" "*,*,*,*,*,1018,4094,*,*,*,*,*,1018,*")
-   (set_attr "neg_pool_range" "*,*,*,*,*,   0,   0,*,*,*,*,*,1008,*")]
+   (set_attr "predicable_short_it" "yes,no,yes,no,no,no,no,no,no,no,no,no")
+   (set

Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-24 Thread Jeff Law
On 7/23/19 10:20 AM, Martin Sebor wrote:
> On 7/22/19 10:26 PM, JiangNing OS wrote:
>> This patch is to fix PR91195. Is it OK for trunk?
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 711a31ea597..4db36644160 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,9 @@
>> +2019-07-22  Jiangning Liu  
>> +
>> +    PR middle-end/91195
>> +    * tree-ssa-phiopt.c (cond_store_replacement): Work around
>> +    -Wmaybe-uninitialized warning.
>> +
>>   2019-07-22  Stafford Horne  
>>     * config/or1k/or1k.c (or1k_expand_compare): Check for int before
>> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
>> index b64bde695f4..7a86007d087 100644
>> --- a/gcc/tree-ssa-phiopt.c
>> +++ b/gcc/tree-ssa-phiopt.c
>> @@ -2240,6 +2240,14 @@ cond_store_replacement (basic_block middle_bb,
>> basic_block join_bb,
>>     tree base = get_base_address (lhs);
>>     if (!auto_var_p (base) || TREE_ADDRESSABLE (base))
>>   return false;
>> +
>> +  /* The transformation below will inherently introduce a memory
>> load,
>> + for which LHS may not be initialized yet if it is not in NOTRAP,
>> + so a -Wmaybe-uninitialized warning message could be triggered.
>> + Since it's a bit hard to have a very accurate uninitialization
>> + analysis to memory reference, we disable the warning here to avoid
>> + confusion.  */
>> +  TREE_NO_WARNING (lhs) = 1;
> 
> The no-warning bit is sometimes (typically?) set by the middle-end
> after a warning has been issued, to avoid triggering other warnings
> down the line for an already diagnosed expression.  Unless it's
> done just for the purposes of a single pass and the bit is cleared
> afterwards, using it to avoid potential false positives doesn't
> seem like a robust solution.  It will mask warnings for constructs
> that have been determined to be invalid.
> 
> FWIW, the middle-end is susceptible to quite a few false positives
> that would nice to avoid.  We have discussed various approaches to
> the problem but setting the no-warning bit seems like too blunt of
> an instrument.
All true.

But in the case JiangNing is working with, the transformation inherently
can introduce an uninitialized read.  It seems reasonable to mark those
loads he generates that don't have a dominating read.

His code takes something like

  if (x)
    *p = <value>;

And turns it into

  t1 = *p;
  t2 = x ? <value> : t1;
  *p = t2;

In the past we required there be a dominating read from *p (among other
restrictions).  That requirement was meant to ensure *p isn't going to
fault.  Jiang's work relaxes that requirement somewhat for objects that
we can prove aren't going to fault via other means.
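
A compilable sketch of the shape that can now trip the warning (a
hypothetical reduction, not the PR testcase):

  /* b[0] is a non-trapping auto object with no dominating read, so the
     transform inserts a load of b[0] that executes even when x is
     false, which is what -Wmaybe-uninitialized then sees.  */
  int
  g (int x, int v)
  {
    int b[2];
    if (x)
      b[0] = v;
    return x ? b[0] : 0;
  }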

Can setting TREE_NO_WARNING on the inserted loads inhibit warnings?
Certainly.  However, I believe we use it in other places where we know
the code we're emitting is safe, but can cause a warning.  I think
Jiang's work falls into that category.

I do think the bit should only be set if we don't have a dominating load
to minimize cases where we might inhibit a valid warning.

jeff


Re: [PATCH][ARM] Fix low reg issue in Thumb-2 movsi patterns

2019-07-24 Thread Kyrill Tkachov

Hi Wilco,

On 7/24/19 4:16 PM, Wilco Dijkstra wrote:
The Thumb-2 movsi patterns try to prefer low registers for loads and stores.
However this is done incorrectly by using 2 separate variants with 'l' and 'h'
register classes.  The register allocator will only use low registers, and
as a result we end up with significantly more spills and moves to high
registers.  Fix this by merging the alternatives and use 'l*r' to indicate
preference for low registers.  This saves ~400 instructions from the pr77308
testcase.

Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57


LGTM.

Ok.

Thanks,

Kyrill



ChangeLog:
2019-07-24  Wilco Dijkstra  

* config/arm/thumb2.md (thumb2_movsi_insn): Fix load/store low reg.
* config/arm/vfp.md (thumb2_movsi_vfp): Likewise.

--
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 
78a6ea0b10dab97ed6651ce62e99cfd7a81722ab..c7000d0772a7e5887b6d05be188e8eb38c97217d 
100644

--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -247,8 +247,8 @@ (define_insn "*thumb2_pop_single"
 ;; regs.  The high register alternatives are not taken into account when
 ;; choosing register preferences in order to reflect their expense.
 (define_insn "*thumb2_movsi_insn"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r,l 
,*hk,m,*m")

-(match_operand:SI 1 "general_operand" "rk,I,Py,K,j,mi,*mi,l,*hk"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r,l*rk,m")
+(match_operand:SI 1 "general_operand" "rk,I,Py,K,j,mi,l*rk"))]
   "TARGET_THUMB2 && !TARGET_IWMMXT && !TARGET_HARD_FLOAT
    && (   register_operand (operands[0], SImode)
    || register_operand (operands[1], SImode))"
@@ -262,22 +262,20 @@ (define_insn "*thumb2_movsi_insn"
 case 3: return \"mvn%?\\t%0, #%B1\";
 case 4: return \"movw%?\\t%0, %1\";
 case 5:
-    case 6:
   /* Cannot load it directly, split to load it via MOV / MOVT.  */
   if (!MEM_P (operands[1]) && arm_disable_literal_pool)
 return \"#\";
   return \"ldr%?\\t%0, %1\";
-    case 7:
-    case 8: return \"str%?\\t%1, %0\";
+    case 6: return \"str%?\\t%1, %0\";
 default: gcc_unreachable ();
 }
 }
-  [(set_attr "type" 
"mov_reg,mov_imm,mov_imm,mvn_imm,mov_imm,load_4,load_4,store_4,store_4")

-   (set_attr "length" "2,4,2,4,4,4,4,4,4")
+  [(set_attr "type" 
"mov_reg,mov_imm,mov_imm,mvn_imm,mov_imm,load_4,store_4")

+   (set_attr "length" "2,4,2,4,4,4,4")
    (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "yes,no,yes,no,no,no,no,no,no")
-   (set_attr "pool_range" "*,*,*,*,*,1018,4094,*,*")
-   (set_attr "neg_pool_range" "*,*,*,*,*,0,0,*,*")]
+   (set_attr "predicable_short_it" "yes,no,yes,no,no,no,no")
+   (set_attr "pool_range" "*,*,*,*,*,4094,*")
+   (set_attr "neg_pool_range" "*,*,*,*,*,0,*")]
 )

 (define_insn "tls_load_dot_plus_four"
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 
e0aaa7b00bb41c046da4531a293e123c94e8b9a4..b59dd6b71d228e042feda3a3a06d81dd01d200da 
100644

--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -258,8 +258,8 @@ (define_insn "*arm_movsi_vfp"
 ;; is chosen with length 2 when the instruction is predicated for
 ;; arm_restrict_it.
 (define_insn "*thumb2_movsi_vfp"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r, 
l,*hk,m, *m,*t, r,*t,*t,  *Uv")
-(match_operand:SI 1 "general_operand" "rk,I,Py,K,j,mi,*mi,l,*hk, 
r,*t,*t,*UvTu,*t"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" 
"=rk,r,l,r,r,l*rk,m,*t, r,*t,*t,  *Uv")
+(match_operand:SI 1 "general_operand" "rk,I,Py,K,j,mi,l*rk, 
r,*t,*t,*UvTu,*t"))]

   "TARGET_THUMB2 && TARGET_HARD_FLOAT
    && (   s_register_operand (operands[0], SImode)
    || s_register_operand (operands[1], SImode))"
@@ -275,32 +275,30 @@ (define_insn "*thumb2_movsi_vfp"
 case 4:
   return \"movw%?\\t%0, %1\";
 case 5:
-    case 6:
   /* Cannot load it directly, split to load it via MOV / MOVT.  */
   if (!MEM_P (operands[1]) && arm_disable_literal_pool)
 return \"#\";
   return \"ldr%?\\t%0, %1\";
-    case 7:
-    case 8:
+    case 6:
   return \"str%?\\t%1, %0\";
-    case 9:
+    case 7:
   return \"vmov%?\\t%0, %1\\t%@ int\";
-    case 10:
+    case 8:
   return \"vmov%?\\t%0, %1\\t%@ int\";
-    case 11:
+    case 9:
   return \"vmov%?.f32\\t%0, %1\\t%@ int\";
-    case 12: case 13:
+    case 10: case 11:
   return output_move_vfp (operands);
 default:
   gcc_unreachable ();
 }
   "
   [(set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" 
"yes,no,yes,no,no,no,no,no,no,no,no,no,no,no")
-   (set_attr "type" 
"mov_reg,mov_reg,mov_reg,mvn_reg,mov_imm,load_4,load_4,store_4,store_4,f_mcr,f_mrc,fmov,f_loads,f_stores")

-   (set_attr "length" "2,4,2,4,4,4,4,4,4,4,4,4,4,4")
-   (set_attr "pool_range" "*,*,*,*,*,1018,4094,*,*,*,*,*,1018,*")
-   (set_attr "neg_pool_range" "*,*,*,*,*,   0, 0,*,*,*,*,*,1008,*")]
+   (set_attr "predicable_short_it" 
"yes,no,yes,no,no,no,no,no,no,no,

Re: [PATCH][MSP430] Reject -m{code,data}-region options unless large memory model is selected

2019-07-24 Thread Jeff Law
On 7/23/19 7:22 AM, Jozef Lawrynowicz wrote:
> The attached patch adds error messages to be emitted when -mcode/data-region 
> are
> used without the accompanying -mlarge option that enables the large memory
> model.
> Use of the upper/either regions without -mlarge can cause relocation overflows
> or stack mismanagement when incorrect call/return instructions are generated.
> In most cases, using the lower region without -mlarge has no effect, but it 
> can
> affect code generation unexpectedly (msp430_do_not_relax_short_jumps), or
> add the ".lower" prefix to some section names.
> 
> Similarly, handling of the upper/lower/either attributes has been
> modified so the attributes will be ignored, and a warning will be emitted, 
> when
> they are used without -mlarge.
> 
> I have added the error messages to the driver so that uses of
> -mcode/data-region=none can be caught. The compiler cannot catch "none" since
> it is the default value that the option variable is initialized to.
> 
> So only users calling cc1* directly will see the errors in
> msp430_option_override, but I also have updated them to be more similar to the
> new errors emitted by the driver.
> 
> Successfully regtested on trunk for msp430-elf.
> 
> Ok for trunk?
> 
> 
> 0001-MSP430-Disallow-mcode-data-region-unless-the-large-m.patch
> 
> From 0c37dc30d67229392ae8f834e462dd6336f7ccfc Mon Sep 17 00:00:00 2001
> From: Jozef Lawrynowicz 
> Date: Mon, 22 Jul 2019 21:37:45 +0100
> Subject: [PATCH] MSP430: Disallow mcode/data-region unless the large memory
>  model is in use
> 
> gcc/ChangeLog:
> 
> 2019-07-23  Jozef Lawrynowicz  
> 
>   * config/msp430/msp430.h (DRIVER_SELF_SPECS): Define and emit errors
>   when -m{code,data}-region are used without -mlarge.
>   * config/msp430/msp430.c (msp430_option_override): Error when a
>   non-default code or data region is used without -mlarge.
>   (msp430_section_attr): Emit a warning and do not add upper/lower/either
>   attributes when they are used without -mlarge.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-07-23  Jozef Lawrynowicz  
> 
>   * gcc.target/msp430/pr78818-data-region.c: Add -mlarge to dg-options.
>   * gcc.target/msp430/region-misuse-code.c: New test.
>   * gcc.target/msp430/region-misuse-data.c: Likewise.
>   * gcc.target/msp430/region-misuse-code-data.c: Likewise.
>   * gcc.target/msp430/region-attribute-misuse.c: Likewise.
OK.  I didn't check all the formatting directives in the error messages.
 You might consider building a native compiler and using that to then
build your msp cross to ensure those formatting directives pass the
checker.  Reasonable adjustments to the error messages to address any
such issues found should be considered pre-approved.

jeff
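
For reference, the placements being gated look like this in user code
(a hypothetical example; with the patch, building it without -mlarge
now warns and ignores the attributes instead of silently misplacing
things):

  /* Hypothetical MSP430 example; only meaningful together with -mlarge
     (and options such as -mdata-region=upper).  */
  int lookup_table[256] __attribute__ ((upper)); /* data in upper memory */

  __attribute__ ((lower))
  void
  fast_path (void)                               /* code in lower memory */
  {
  }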


Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops

2019-07-24 Thread Maciej W. Rozycki
Fredrik,

> >  Why?  What is wrong with using `-Werror'?
> > 
> >  Or you could use `-Wa,-fatal-warnings' to limit making warnings fatal to 
> > the assembly stage only if you are concerned about bad high-level language 
> > code quality interfering with your goal.
> 
> It appears unfeasible to fix thousands of build scripts to work like that.
> Although the numbers with actual MIPS code in them likely are very small,
> some may source macros and other stuff via various libraries. Some
> distributions, such as Gentoo, do have global CFLAGS but such mechanism are
> unreliable.

 I'd expect a sane distribution packager to prepare/patch any disobedient 
software packages to accept global distribution-specific flags, as there 
are often more important flags to pass which all packages need to be built 
with.

 However if they haven't, then put the flags in CC/CXX/etc. as you'd do 
with multilib selection options, or if that fails too, then wrap your 
compiler driver used for a distribution build into a shell script found 
according to $PATH (which may be non-standard) that adds any options 
required implicitly.  I've done that many times when I found myself in 
situation where I had to override some silly hardcoded assumptions, 
including removing or transforming rather than adding options passed.

> I understand that one may want to take a principled stance with "hands away"
> and "author asked for trouble", but fact is that whomever put a given
> noreorder directive into a piece of code most likely didn't have R5900
> errata in mind, and R5900 users most likely don't want the assembler to
> generate code that is known to cause subtle bugs of all imaginable kinds,
> including data corruption.

 Yes, and a warning is I think a reasonable solution.  Alternatively maybe 
you could persuade binutils maintainers to have a `configure' option to 
make `-fatal-warnings' GAS's default for a given build.

> > >  3:
> > > + short_loop_war(3b)
> > >   b   3b
> > >nop
> > 
> >  Is it needed here given that the delay slot instruction is a NOP anyway?  
> 
> Ah, no. That is a left-over from the note that "a branch delay slot of the
> loop is not NOP (EE 2.9 or later)", as it appears a NOP is not enough for
> EE before 2.9.

 Ah, OK then.

> >  Also this source does not need `.set noreorder', except for the loop 
> > around `1:' where a data dependency is present with the delay slot 
> > instruction.
> 
> That data dependency can surely be reworked too, so as to make the whole
> piece reorderable?

 The SW and ADDIU instructions can surely be swapped, e.g.:

PTR_LA  a0, _edata - 4
PTR_LA  a2, _end
1:  addiu   a0, a0, 4
sw  zero, 0(a0)
bne a2, a0, 1b

You can make it all `reorder' then.  Note that this preserves the final 
out-of-range store, which I gather is deliberate for performance reasons.

  Maciej


Re: [PATCH][MSP430] Allow lower-case "r" to be used in register names by defining ADDITIONAL_REGISTER_NAMES (PR target/70320)

2019-07-24 Thread Jeff Law
On 7/23/19 6:50 AM, Jozef Lawrynowicz wrote:
> The attached patch enables a lower-case "r" to be used in register names
> specified in asm statements clobber list or command line options, in addition 
> to
> the upper case "R" that is currently supported.
> 
> Successfully regtested on trunk for msp430-elf.
> 
> Ok for trunk?
> 
> 
> 0001-MSP430-additional-register.patch
> 
> From d639b2ba7d4a93d790bde3ad55df751116eab04b Mon Sep 17 00:00:00 2001
> From: Jozef Lawrynowicz 
> Date: Mon, 22 Jul 2019 10:35:43 +0100
> Subject: [PATCH] MSP430 additional register
> 
> gcc/ChangeLog:
> 
> 2019-07-23  Jozef Lawrynowicz  
> 
>   PR target/70320
>   * config/msp430/msp430.h: Define ADDITIONAL_REGISTER_NAMES.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-07-23  Jozef Lawrynowicz  
> 
>   PR target/70320
>   * gcc.target/msp430/asm-register-names-lower-case.c: New test.
>   * gcc.target/msp430/asm-register-names-upper-case.c: Likewise.
OK
jeff


Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops

2019-07-24 Thread Jeff Law
On 7/24/19 9:31 AM, Maciej W. Rozycki wrote:
> Fredrik,
> 
>>>  Why?  What is wrong with using `-Werror'?
>>>
>>>  Or you could use `-Wa,-fatal-warnings' to limit making warnings fatal to 
>>> the assembly stage only if you are concerned about bad high-level language 
>>> code quality interfering with your goal.
>>
>> It appears unfeasible to fix thousands of build scripts to work like that.
>> Although the numbers with actual MIPS code in them likely are very small,
>> some may source macros and other stuff via various libraries. Some
>> distributions, such as Gentoo, do have global CFLAGS but such mechanism are
>> unreliable.
> 
>  I'd expect a sane distribution packager to prepare/patch any disobedient 
> software packages to accept global distribution-specific flags, as there 
> are often more important flags to pass which all packages need to be build 
> with.
Getting good flags injection and being able to audit it is painful, but
amazingly helpful (insert plug here for Nick's annobin/annocheck).
Fixing packages to accept the standard methods of flag injection as well
as addressing issues that arise is all standard procedure.


Jeff



Re: [PATCH][ARM] Fix low reg issue in Thumb-2 movsi patterns

2019-07-24 Thread Richard Earnshaw (lists)


On 24/07/2019 16:16, Wilco Dijkstra wrote:

The Thumb-2 movsi patterns try to prefer low registers for loads and stores.
However this is done incorrectly by using 2 separate variants with 'l' and 'h'
register classes.  The register allocator will only use low registers, and
as a result we end up with significantly more spills and moves to high
registers.  Fix this by merging the alternatives and use 'l*r' to indicate
preference for low registers.  This saves ~400 instructions from the pr77308
testcase.

Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

ChangeLog:
2019-07-24  Wilco Dijkstra  

* config/arm/thumb2.md (thumb2_movsi_insn): Fix load/store low reg.
* config/arm/vfp.md (thumb2_movsi_vfp): Likewise.

--
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 
78a6ea0b10dab97ed6651ce62e99cfd7a81722ab..c7000d0772a7e5887b6d05be188e8eb38c97217d
 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -247,8 +247,8 @@ (define_insn "*thumb2_pop_single"
  ;; regs.  The high register alternatives are not taken into account when
  ;; choosing register preferences in order to reflect their expense.
  (define_insn "*thumb2_movsi_insn"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r,l ,*hk,m,*m")
-   (match_operand:SI 1 "general_operand""rk,I,Py,K,j,mi,*mi,l,*hk"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r,l*rk,m")
+   (match_operand:SI 1 "general_operand""rk,I,Py,K,j,mi,l*rk"))]


I think this should be "lk*r", not "l*rk".  SP is only going to crop up 
in rare circumstances, but we are always going to need this pattern if 
it does and hiding this from register preferencing is pointless.  It's 
not like the compiler is going to start allocating SP in the more 
general case.


Same in the other case below, which leads to:

I'd also like to see all these movsi matching patterns merged into a 
single pattern that just enables the appropriate variants.  Having 
separate implementations for Arm, thumb2, vfp, iwmmx is just making 
maintenance of this stuff a nightmare.


R.


"TARGET_THUMB2 && !TARGET_IWMMXT && !TARGET_HARD_FLOAT
 && (   register_operand (operands[0], SImode)
 || register_operand (operands[1], SImode))"
@@ -262,22 +262,20 @@ (define_insn "*thumb2_movsi_insn"
  case 3: return \"mvn%?\\t%0, #%B1\";
  case 4: return \"movw%?\\t%0, %1\";
  case 5:
-case 6:
/* Cannot load it directly, split to load it via MOV / MOVT.  */
if (!MEM_P (operands[1]) && arm_disable_literal_pool)
return \"#\";
return \"ldr%?\\t%0, %1\";
-case 7:
-case 8: return \"str%?\\t%1, %0\";
+case 6: return \"str%?\\t%1, %0\";
  default: gcc_unreachable ();
  }
  }
-  [(set_attr "type" 
"mov_reg,mov_imm,mov_imm,mvn_imm,mov_imm,load_4,load_4,store_4,store_4")
-   (set_attr "length" "2,4,2,4,4,4,4,4,4")
+  [(set_attr "type" "mov_reg,mov_imm,mov_imm,mvn_imm,mov_imm,load_4,store_4")
+   (set_attr "length" "2,4,2,4,4,4,4")
 (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "yes,no,yes,no,no,no,no,no,no")
-   (set_attr "pool_range" "*,*,*,*,*,1018,4094,*,*")
-   (set_attr "neg_pool_range" "*,*,*,*,*,0,0,*,*")]
+   (set_attr "predicable_short_it" "yes,no,yes,no,no,no,no")
+   (set_attr "pool_range" "*,*,*,*,*,4094,*")
+   (set_attr "neg_pool_range" "*,*,*,*,*,0,*")]
  )
  
  (define_insn "tls_load_dot_plus_four"

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 
e0aaa7b00bb41c046da4531a293e123c94e8b9a4..b59dd6b71d228e042feda3a3a06d81dd01d200da
 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -258,8 +258,8 @@ (define_insn "*arm_movsi_vfp"
  ;; is chosen with length 2 when the instruction is predicated for
  ;; arm_restrict_it.
  (define_insn "*thumb2_movsi_vfp"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r, l,*hk,m, *m,*t, 
r,*t,*t,  *Uv")
-   (match_operand:SI 1 "general_operand""rk,I,Py,K,j,mi,*mi,l,*hk, 
r,*t,*t,*UvTu,*t"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r,l*rk,m,*t, r,*t,*t, 
 *Uv")
+   (match_operand:SI 1 "general_operand""rk,I,Py,K,j,mi,l*rk, 
r,*t,*t,*UvTu,*t"))]
"TARGET_THUMB2 && TARGET_HARD_FLOAT
 && (   s_register_operand (operands[0], SImode)
 || s_register_operand (operands[1], SImode))"
@@ -275,32 +275,30 @@ (define_insn "*thumb2_movsi_vfp"
  case 4:
return \"movw%?\\t%0, %1\";
  case 5:
-case 6:
/* Cannot load it directly, split to load it via MOV / MOVT.  */
if (!MEM_P (operands[1]) && arm_disable_literal_pool)
return \"#\";
return \"ldr%?\\t%0, %1\";
-case 7:
-case 8:
+case 6:
return \"str%?\\t%1, %0\";
-case 9:
+case 7:
return \"vmov%?\\t%0, %1\\t%@ int\";
-case 10:
+case 8:
return \"vmov%?\\t%0, %1\\t%@ int\";
-case 11:
+case 9:
return \"vmov%?.f32\\t%0, %1\\t%@

Re: [PATCH] Come up with -flto=auto option.

2019-07-24 Thread Jeff Law
On 7/23/19 2:30 AM, Martin Liška wrote:
> Hi.
> 
> As we as openSUSE started using -flto, I see it very handy to have
> an option value that will automatically detect number of cores
> that can be used for parallel LTRANS phase.
> 
> Thoughts?
> 
> gcc/ChangeLog:
> 
> 2019-07-23  Martin Liska  
> 
>   * doc/invoke.texi: Document the new option value.
>   * lto-wrapper.c (cpuset_popcount): New function
>   is a copy of libgomp/config/linux/proc.c.
>   (init_num_threads): Likewise.
>   (run_gcc): Support -flto=auto.
OK.  I won't be at all surprised if this causes headaches on some hosts,
so please watch for Solaris, AIX, etc. issues.

Jeff
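
For the curious, the detection the ChangeLog describes amounts to
counting the bits of the process CPU affinity mask, as libgomp does; a
minimal standalone sketch (glibc-specific, not the lto-wrapper code
itself):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  int
  main (void)
  {
    cpu_set_t set;
    int n = 1;  /* conservative fallback, e.g. for non-Linux hosts */

    if (sched_getaffinity (0, sizeof (set), &set) == 0)
      n = CPU_COUNT (&set);
    printf ("would run LTRANS as -flto=%d\n", n);
    return 0;
  }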


Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-24 Thread Jeff Law
On 7/23/19 4:15 AM, Martin Liška wrote:
> On 7/23/19 11:20 AM, Jan Hubicka wrote:
>> Hi,
>> great you found time to make this. It should become the default for
>> -flto IMO.
> 
> Works for me. Then I'm suggesting to not come up with -flto=auto and
> only document that -flto passed during linking will automatically detect
> number of cores. It's the same what clang does.
This variant is fine too.  Same caveats apply with the non-linux hosts.

jeff


Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-24 Thread Jeff Law
On 7/22/19 10:26 PM, JiangNing OS wrote:
> This patch is to fix PR91195. Is it OK for trunk?
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 711a31ea597..4db36644160 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2019-07-22  Jiangning Liu  
> +
> + PR middle-end/91195
> + * tree-ssa-phiopt.c (cond_store_replacement): Work around
> + -Wmaybe-uninitialized warning.
I'll conditionally OK this for the trunk.  Give Richi 48hrs to chime in
if he doesn't like the TREE_NO_WARNING approach.

For anyone watching the thread, the setting of TREE_NO_WARNING here is
only done in those cases where we haven't already seen a dominating
memory reference, so it minimizes how often we set TREE_NO_WARNING.

jeff


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-24 Thread Jeff Law
On 7/23/19 3:42 AM, Richard Biener wrote:
> On Tue, Jul 23, 2019 at 1:59 AM Jeff Law  wrote:
>>
>> On 7/16/19 12:37 PM, Andrew MacLeod wrote:
>>> On 7/9/19 5:56 AM, Richard Biener wrote:
 On Tue, Jul 9, 2019 at 9:28 AM Aldy Hernandez  wrote:
>
>
> On 7/4/19 6:33 AM, Richard Biener wrote:
>> On Wed, Jul 3, 2019 at 2:17 PM Aldy Hernandez  wrote:
>>> On 7/3/19 7:08 AM, Richard Biener wrote:
 On Wed, Jul 3, 2019 at 11:19 AM Aldy Hernandez 
 wrote:
>>> How about we keep VARYING and UNDEFINED typeless until right before we
>>> call into the ranger.  At which point, we have can populate min/max
>>> because we have the tree_code and the type handy.  So right before we
>>> call into the ranger do:
>>>
>>>   if (varying_p ())
>>> foo->set_varying(TYPE);
>>>
>>> This would avoid the type cache, and keep the ranger happy.
>> you cannot do set_varying on the static const range but instead
>> you'd do
>>
>> value_range tem (*foo);
>> if (varying_p ())
>>  tem->set_full_range (TYPE);
>>
>> which I think we already do in some places.  Thus my question _where_
>> you actually need this.
> Basically, everywhere.  By having a type for varying/undefined, we don't
> have to special case anything.  Sure, we could for example, special case
> the invert operation for undefined / varying.  And we could special case
> everything dealing with ranges to handle varying and undefined, but why?
>We could also pass a type argument everywhere, but that's just ugly.
> However, I do understand your objection to the type cache.
>
> How about the attached approach?  Set the type for varying/undefined
> when we know it, while avoiding touching the CONST varying.  Then right
> before calling the ranger, pass down a new varying node with min/max for
> any varyings that were still typeless until that point.
>
> I have taken care of never adding a set_varying() that was not already
> there.  Would this keep the const happy?
>
> Technically we don't need to set varying/undef types for every instance
> in VRP, but we need it at least for the code that will be shared with
> range-ops (extract_range_from_multiplicative_op, union, intersect, etc).
>I just figured if we have the information, might as well set it for
> consistency.
>
> If you like this approach, I can rebase the other patches that depend on
> this one.
 OK, so I went and checked what you do for class irange which has
 a type but no kind member (but constructors with a kind).  It also
 uses wide_int members for storage.  For a pure integer constant
 range representation this represents somewhat odd choices;  I'd
 have elided the m_type member completely here, it seems fully
 redundant.  Only range operations need to be carried out in a
 specific type (what I was suggesting above).  Even the precision
 encoded in the wide_int members is redundant then (I'd have
 expected widest_int here and trailing-wide-ints for optimizing
 storage).
>>>
>>> What irange contains internally is a bit arbitrary.  It's really an API
>>> for managing a set of subranges.  We could have used trees internally
>>> just as easily, then we wouldn't need a type field. Since we were doing
>>> lots of operations, rather than going back and forth from trees all the
>>> time, we just used the underlying wide_int directly.   we could have
>>> fiddle farted around with HOST_WIDE_INT or whatever, but wide_int is
>>> there, has all the operations, and it works fine. So that's what it
>>> currently is on the branch.
>>>
>>> We are treating a range object as a unique self contained object.
>>> Therefore, the range has a type so we know how to print it, and can
>>> confirm before any operation that the ranges being operated on are
>>> appropriately matched.  We found and opened bugzillas over the past
>>> couple years for places where our code caught bugs because a range was
>>> created and then operated on in a way that was not compatible with
>>> another range.  I think there is still an open one against ada(?)
>>> where the switch and case are of different precision.
>>>
>>> From my point of view, a range object is similar to a tree node. A tree
>>> node has the bits to indicate what the value is, but also associates a
>>> type with those bits within the same object.  This is less error prone
>>> than passing around the bits and the type separately.  As ranges are
>>> starting to be used in many places outside of VRP, we should do the same
>>> thing with ranges.  With value_range it would actually be free since
>>> there is already a tree for the bounds, which contains the type.
>>>
>>>
>>>
>>>
>>>
 to fold_range/op_range?  The API also seems to be oddly
 constrained to binary ops.  Anyway, the way you build
 the operator table r

Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-24 Thread Martin Sebor

On 7/24/19 9:25 AM, Jeff Law wrote:

On 7/23/19 10:20 AM, Martin Sebor wrote:

On 7/22/19 10:26 PM, JiangNing OS wrote:

This patch is to fix PR91195. Is it OK for trunk?

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 711a31ea597..4db36644160 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2019-07-22  Jiangning Liu  
+
+    PR middle-end/91195
+    * tree-ssa-phiopt.c (cond_store_replacement): Work around
+    -Wmaybe-uninitialized warning.
+
   2019-07-22  Stafford Horne  
     * config/or1k/or1k.c (or1k_expand_compare): Check for int before
diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index b64bde695f4..7a86007d087 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -2240,6 +2240,14 @@ cond_store_replacement (basic_block middle_bb, basic_block join_bb,
   tree base = get_base_address (lhs);
   if (!auto_var_p (base) || TREE_ADDRESSABLE (base))
     return false;
+
+  /* The transformation below will inherently introduce a memory load,
+     for which LHS may not be initialized yet if it is not in NOTRAP,
+     so a -Wmaybe-uninitialized warning message could be triggered.
+     Since it's a bit hard to have a very accurate uninitialization
+     analysis to memory reference, we disable the warning here to avoid
+     confusion.  */
+  TREE_NO_WARNING (lhs) = 1;


The no-warning bit is sometimes (typically?) set by the middle-end
after a warning has been issued, to avoid triggering other warnings
down the line for an already diagnosed expression.  Unless it's
done just for the purposes of a single pass and the bit is cleared
afterwards, using it to avoid potential false positives doesn't
seem like a robust solution.  It will mask warnings for constructs
that have been determined to be invalid.

FWIW, the middle-end is susceptible to quite a few false positives
that would be nice to avoid.  We have discussed various approaches to
the problem but setting the no-warning bit seems like too blunt of
an instrument.

All true.

But in the case JiangNing is working with the transformation inherently
can introduce an uninitialized read.  It seems reasonable to mark those
loads he generates that don't have a dominating read.

His code takes something like

   if (x)
     *p = <expr>;

And turns it into

   t1 = *p;
   t2 = x ? <expr> : t1;
   *p = t2;

In the past we required there be a dominating read from *p (among other
restrictions).  That requirement was meant to ensure *p isn't going to
fault.  Jiang's work relaxes that requirement somewhat for objects that
we can prove aren't going to fault via other means.

Can setting TREE_NO_WARNING on the inserted loads inhibit warnings?
Certainly.  However, I believe we use it in other places where we know
the code we're emitting is safe, but can cause a warning.  I think
Jiang's work falls into that category.

I do think the bit should only be set if we don't have a dominating load
to minimize cases where we might inhibit a valid warning.
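
Written out as compilable C, the before/after shapes of the transform look
roughly like this (function and variable names are illustrative, not taken
from the PR testcase):

  void
  before (int x, int *p, int val)
  {
    if (x)
      *p = val;
  }

  /* Conceptually what cond_store_replacement produces: the load of *p
     is new, and it executes even when x is false.  */
  void
  after (int x, int *p, int val)
  {
    int t1 = *p;
    int t2 = x ? val : t1;
    *p = t2;
  }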


I was thinking of a few cases where setting the no-warning bit might
interfere with detecting bugs unrelated to uninitialized reads:

  1) -Warray-bounds in gimple-ssa-warn-restrict and tree-vrp
  2) -Wstringop-overflow in tree-ssa-strlen (other than for calls
 to built-ins)

I couldn't come up with a test case that shows how it might happen
with this patch but I didn't spend too much time on it.

There are a number of existing instances of setting TREE_NO_WARNING
to suppress -Wuninitialized, and some are the cause of known problems.
Bugs 51545, 58950, 74762, 74765 or 89697 are examples.  They all boil
down to the fact that there's just one bit for all warnings.  Jakub
mentioned adding a hash-map for this.  That seems like a simple and
good solution.

Martin


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-24 Thread Richard Biener
On July 24, 2019 6:00:04 PM GMT+02:00, Jeff Law  wrote:
>On 7/23/19 3:42 AM, Richard Biener wrote:
>> On Tue, Jul 23, 2019 at 1:59 AM Jeff Law  wrote:
>>>
>>> On 7/16/19 12:37 PM, Andrew MacLeod wrote:
 On 7/9/19 5:56 AM, Richard Biener wrote:
> On Tue, Jul 9, 2019 at 9:28 AM Aldy Hernandez 
>wrote:
>>
>>
>> On 7/4/19 6:33 AM, Richard Biener wrote:
>>> On Wed, Jul 3, 2019 at 2:17 PM Aldy Hernandez 
>wrote:
 On 7/3/19 7:08 AM, Richard Biener wrote:
> On Wed, Jul 3, 2019 at 11:19 AM Aldy Hernandez
>
> wrote:
 How about we keep VARYING and UNDEFINED typeless until right
>before we
 call into the ranger.  At which point, we can populate
>min/max
 because we have the tree_code and the type handy.  So right
>before we
 call into the ranger do:

   if (varying_p ())
 foo->set_varying(TYPE);

 This would avoid the type cache, and keep the ranger happy.
>>> you cannot do set_varying on the static const range but instead
>>> you'd do
>>>
>>> value_range tem (*foo);
>>> if (varying_p ())
>>>  tem->set_full_range (TYPE);
>>>
>>> which I think we already do in some places.  Thus my question
>_where_
>>> you actually need this.
>> Basically, everywhere.  By having a type for varying/undefined,
>we don't
>> have to special case anything.  Sure, we could for example,
>special case
>> the invert operation for undefined / varying.  And we could
>special case
>> everything dealing with ranges to handle varying and undefined,
>but why?
>>We could also pass a type argument everywhere, but that's just
>ugly.
>> However, I do understand your objection to the type cache.
>>
>> How about the attached approach?  Set the type for
>varying/undefined
>> when we know it, while avoiding touching the CONST varying.  Then
>right
>> before calling the ranger, pass down a new varying node with
>min/max for
>> any varyings that were still typeless until that point.
>>
>> I have taken care of never adding a set_varying() that was not
>already
>> there.  Would this keep the const happy?
>>
>> Technically we don't need to set varying/undef types for every
>instance
>> in VRP, but we need it at least for the code that will be shared
>with
>> range-ops (extract_range_from_multiplicative_op, union,
>intersect, etc).
>>I just figured if we have the information, might as well set
>it for
>> consistency.
>>
>> If you like this approach, I can rebase the other patches that
>depend on
>> this one.
> OK, so I went and checked what you do for class irange which has
> a type but no kind member (but constructors with a kind).  It also
> uses wide_int members for storage.  For a pure integer constant
> range representation this represents somewhat odd choices;  I'd
> have elided the m_type member completely here, it seems fully
> redundant.  Only range operations need to be carried out in a
> specific type (what I was suggesting above).  Even the precision
> encoded in the wide_int members is redundant then (I'd have
> expected widest_int here and trailing-wide-ints for optimizing
> storage).

 What irange contains internally is a bit arbitrary.  It's really an
>API
 for managing a set of subranges.  We could have used trees
>internally
 just as easily, then we wouldn't need a type field. Since we were
>doing
 lots of operations, rather than going back and forth from trees all
>the
 time, we just used the underlying wide_int directly.   we could
>have
 fiddle farted around with HOST_WIDE_INT or whatever, but wide_int
>is
 there, has all the operations, and it works fine. So that's what it
 currently is on the branch.

 We are treating a range object as a unique self contained object.
 Therefore, the range has a type so we know how to print it, and can
 confirm before any operation that the ranges being operated on are
 appropriately matched.  We found and opened bugzillas over the past
 couple years for places where our code caught bugs because a range
>was
 created and then operated on in a way that was not compatible with
 another range.  I think there is still an open one against ada(?)
 where the switch and case are of different precision.

 From my point of view, a range object is similar to a tree node. A
>tree
 node has the bits to indicate what the value is, but also
>associates a
 type with those bits within the same object.  This is less error
>prone
 than passing around the bits and the type separately.  As ranges
>are
 starting to be used in many places outside of VRP, we should do the
>same
 thing with ranges.  With value_range it would actually be free
>since
 there is already a tree for the bounds already which cont

Re: [PATCH] handle multibyte stores larger than char in strlen (PR 91183, 86888)

2019-07-24 Thread Jeff Law
On 7/22/19 5:17 PM, Martin Sebor wrote:

>> Umm "store_b4_nul"?  Are you really trying to save 3 characters in the
>> name by writing it this way?  :-)
> 
> I'm actually saving four characters over "store_before_nul" ;-)
:-)

> 
> It's just a name I picked because I like it.  Would you prefer to
> go back to the original 'cmp' instead?  It doesn't get much shorter
> than that, or less descriptive, especially when there is no comment
> explaining what it's for.  (FWIW, I renamed it because I found myself
> going back to the description of compare_nonzero_chars to remember
> what cmp actually meant.)
Fair enough.  Though "b4" reads like it belongs in a text message to me.
  I don't want to get nitty over this.  Will s/b4/before/ work for you?

If you wanted to add a comment before the variable, that would be good
as well, particularly since we don't have a good name.


> 
>> You've got two entries in the array, but the comment reads as if there's
>> just one element.  What is the difference between the two array elements?
> 
> Since handle_store deals with sequences of one or more bytes some
> of the optimizations it implements(*) need to consider both where
> the first byte of the sequence is stored and where the last one is.
> The first element of this array is for the first byte and the second
> one is for the last character.  The comment a few lines down is meant
> to make the distinction clear but we can expand the comment above
> the variable even more.
I think  my brain locked up with the "b4".  Maybe it went into a mode
where I'm trying to parse texts from my teenager which is clearly
incompatible with reading code. :-)

> 
>   /* The offset of the last stored byte.  */
>   unsigned HOST_WIDE_INT endoff = offset + lenrange[2] - 1;
> 
> (lenrange[2] is the size of the store.)
> 
> [*] E.g., the one we discussed not so long ago (PR90989) or the one
> that removes a store of '\0' over the terminating '\0'.
> 
>>
>> I didn't see anything terribly concerning so far, but I do want to look
>> at handle_store again after the comment is fixed so that I'm sure I'm
>> interpreting things correctly.
> 
> Does this help?
> 
>   /* The corresponding element is set to 1 if the first and last
>  element, respectively, of the sequence of characters being
>  written over the string described by SI ends before
>  the terminating nul (if it has one), to zero if the nul is
>  being overwritten but not beyond, or negative otherwise.  */
Yea.  I also note a /* */ empty comment in handle_store, you probably
wanted to write something there :-)

OK with the nit fixes noted earlier, variable name fix and comment fixes.


jeff


Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-24 Thread Jeff Law
On 7/24/19 10:09 AM, Martin Sebor wrote:
> On 7/24/19 9:25 AM, Jeff Law wrote:
>> On 7/23/19 10:20 AM, Martin Sebor wrote:
>>> On 7/22/19 10:26 PM, JiangNing OS wrote:
 This patch is to fix PR91195. Is it OK for trunk?

 diff --git a/gcc/ChangeLog b/gcc/ChangeLog
 index 711a31ea597..4db36644160 100644
 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,3 +1,9 @@
 +2019-07-22  Jiangning Liu  
 +
 +    PR middle-end/91195
 +    * tree-ssa-phiopt.c (cond_store_replacement): Work around
 +    -Wmaybe-uninitialized warning.
 +
    2019-07-22  Stafford Horne  
      * config/or1k/or1k.c (or1k_expand_compare): Check for int
 before
 diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
 index b64bde695f4..7a86007d087 100644
 --- a/gcc/tree-ssa-phiopt.c
 +++ b/gcc/tree-ssa-phiopt.c
 @@ -2240,6 +2240,14 @@ cond_store_replacement (basic_block middle_bb, basic_block join_bb,
   tree base = get_base_address (lhs);
   if (!auto_var_p (base) || TREE_ADDRESSABLE (base))
     return false;
 +
 +  /* The transformation below will inherently introduce a memory load,
 +     for which LHS may not be initialized yet if it is not in NOTRAP,
 +     so a -Wmaybe-uninitialized warning message could be triggered.
 +     Since it's a bit hard to have a very accurate uninitialization
 +     analysis to memory reference, we disable the warning here to avoid
 +     confusion.  */
 +  TREE_NO_WARNING (lhs) = 1;
>>>
>>> The no-warning bit is sometimes (typically?) set by the middle-end
>>> after a warning has been issued, to avoid triggering other warnings
>>> down the line for an already diagnosed expression.  Unless it's
>>> done just for the purposes of a single pass and the bit is cleared
>>> afterwards, using it to avoid potential false positives doesn't
>>> seem like a robust solution.  It will mask warnings for constructs
>>> that have been determined to be invalid.
>>>
>>> FWIW, the middle-end is susceptible to quite a few false positives
>>> that would be nice to avoid.  We have discussed various approaches to
>>> the problem but setting the no-warning bit seems like too blunt of
>>> an instrument.
>> All true.
>>
>> But in the case JiangNing is working with the transformation inherently
>> can introduce an uninitialized read.  It seems reasonable to mark those
>> loads he generates that don't have a dominating read.
>>
>> His code takes something like
>>
>>    if (x)
>>      *p = <expr>;
>>
>> And turns it into
>>
>>    t1 = *p;
>>    t2 = x ? <expr> : t1;
>>    *p = t2;
>>
>> In the past we required there be a dominating read from *p (among other
>> restrictions).  That requirement was meant to ensure *p isn't going to
>> fault.  Jiang's work relaxes that requirement somewhat for objects that
>> we can prove aren't going to fault via other means.
>>
>> Can setting TREE_NO_WARNING on the inserted loads inhibit warnings?
>> Certainly.  However, I believe we use it in other places where we know
>> the code we're emitting is safe, but can cause a warning.  I think
>> Jiang's work falls into that category.
>>
>> I do think the bit should only be set if we don't have a dominating load
>> to minimize cases where we might inhibit a valid warning.
> 
> I was thinking of a few cases where setting the no-warning bit might
> interfere with detecting bugs unrelated to uninitialized reads:
> 
>   1) -Warray-bounds in gimple-ssa-warn-restrict and tree-vrp
>   2) -Wstringop-overflow in tree-ssa-strlen (other than for calls
>  to built-ins)
> 
> I couldn't come up with a test case that shows how it might happen
> with this patch but I didn't spend too much time on it.
I bet we could find one and it's more likely to show up on aarch64 than
x86 due to costing issues.  Either way we're making a bit of a judgment
call -- an extra false positive here due to a load the compiler inserted
on a path that didn't have one before, or potentially missing a warning
on that load because of the TREE_NO_WARNING.

I believe the false positive here is worse than the potential missed
warning.


> 
> There are a number of existing instances of setting TREE_NO_WARNING
> to suppress -Wuninitialized, and some are the cause of known problems.
> Bugs 51545, 58950, 74762, 74765 or 89697 are examples.  They all boil
> down to the fact that there's just one bit for all warnings.  Jakub
> mentioned adding a hash-map for this.  That seems like a simple and
> good solution.
I'm not sure how that really helps here.  We're marking the MEM on the LHS
and that's not a shared object.  I don't see how it's going to be
significantly different using a hash map vs the bit in this circumstance.

If the bit were on an SSA_NAME, or a _DECL node, then the flag bit is
shared and would be a much larger concern.

jeff


> 
> Martin



Re: [PATCH 1/3] add -Wstruct-not-pod, -Wclass-is-pod, -Wmismatched-tags (PR 61339)

2019-07-24 Thread Jeff Law
On 7/22/19 6:53 PM, Martin Sebor wrote:
> On 7/22/19 4:19 PM, Jeff Law wrote:
>> On 7/8/19 3:58 PM, Martin Sebor wrote:
>>> The attached patch implements three new warnings:
>>>
>>>   *  -Wstruct-not-pod triggers for struct definitions that are not
>>>  POD structs,
>>>   *  -Wclass-is-pod triggers for class definitions that satisfy
>>>  the requirements on POD structs, and
>>>   *  -Wmismatched-tags that triggers for class and struct declarations
>>>  with class-key that doesn't match either their definition or
>>>  the first declaration (if no definition is provided).
>>>
>>> The implementation of -Wclass-is-pod and -Wstruct-not-pod is fairly
>>> straightforward but the -Wmismatched-tags solution is slightly unusual.
>>> It collects struct and class declarations first and diagnoses mismatches
>>> only after the whole translation unit has been processed.  This is so
>>> that the definition of a class can guide which declarations to diagnose
>>> no matter which come first.
>> So there was some discussion on whether or not we even wanted to keep
>> the struct vs class convention for GCC.
>>
>> Did that reach any kind of conclusion?  I don't have a strong opinion
>> here and will adjust to whatever the consensus is.
> 
> I'm not really sure how to gauge consensus here but this patch doesn't
> actually enforce the GCC convention, it just makes it possible (none
> of the new options is included in -Wall or -Wextra; they all have to
> be enabled explicitly).
I'm not entirely sure either.  To a large degree this feels like a case
where the C++ leads as well as the major contributors chime in.

How about this, when I get back from PTO if there haven't been
objections from the major contributors, then we go forward and change
the convention.  That gives folks just shy of 2 weeks to object (I've
read Jason and Jon's comments as supporting dropping the existing
convention).  If there's objections, then we discuss further :-)


> 
> I myself wouldn't adopt a class/struct convention like that if given
> the choice and wouldn't be at all upset if we dropped it going forward,
> despite all the effort I put into the cleanup (which has already been
> committed).  But us dropping it won't affect other projects that also
> use it(*), albeit without enforcement, and so I would rather not tie
> the consideration of the patch to the GCC guideline.  I would think
> helping other projects enforce it if they find it helpful would be
> seen as valuable even if we don't find the convention useful in GCC
> anymore.
Right.  Given this seems to be a convention used elsewhere we could
still go forward with your patch to warn, even if we decide against
using the convention going forward for GCC.  It'd be your call if you
want to go forward with the warning patch.

Jeff


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-24 Thread Andrew MacLeod

On 7/24/19 9:25 AM, Richard Biener wrote:

On Wed, Jul 24, 2019 at 12:56 AM Andrew MacLeod  wrote:

On 7/23/19 5:33 AM, Richard Biener wrote:


   From my point of view, a range object is similar to a tree node. A tree
node has the bits to indicate what the value is, but also associates a
type with those bits within the same object.  This is less error prone
than passing around the bits and the type separately.  As ranges are
starting to be used in many places outside of VRP, we should do the same
thing with ranges.  WIth value_range it would actually be free since
there is already a tree for the bounds already which contains the type.

Not for VARYING or UNDEFINED.

Personally, Id put it in both for consistency.  I view a range as an
object we are associating with either an expression or an ssa-name. That
range should then have a consistent type with that name or expression.
Even if the range is empty, I would still expect it to be compatible
since I'm usually associating it with an ssa_name or expression.

Imagine you have int i; long l; and you want to ask if both variables
have the same value range, how would you do that?  What is "same" here?



you would do it exactly the same as you do with value_range today. What 
does adding a type() method which maps to TREE_TYPE (range.min()), which 
also works on VARYING now, have to do with any of this?

We aren't suddenly adding a bunch of restrictions to value_range...

You are arguing about things irrelevant to this patch.  We aren't 
adding a new set of restrictions that prevent you from doing something 
you could do before.






Because you seem so dead set against it, I can also see some consistency
in not having a type for undefined... Since there are 0 ranges, we can't
ask for a lower_bound () or upper_bound () on an empty range, so I can
see an argument for extending that not being able to ask the type()
either.  I agree that this is also a plausible viewpoint, and we are
changing our code to make that happen, even though we have to pass a few
extra types around and will lose some minor sanity checking when doing
intersection or union with an empty range.   To me an empty range is
akin to a NaN.. it isn't a real float value, but it does still have a
type associated with it.

Both VARYING and UNDEFINED are currently lattice states, not real
(integer) ranges.  irange (rightfully so) isn't convering the lattice state
and the current value_range does so for historical reasons.  In
principle the VRP lattice could be

   struct { enum lattice_state state; irange *range; };

where for state != VR_[ANTI]RANGE 'range' is NULL.

You shouldn't tie your hands by trying to design something that plugs
1:1 into the existing uses.  That constrains you too much IMHO.

Given that we do not represent VARYING as [-INF, +INF] currently
and you appearantly want to preserve that(?) it should be
straightforward to add an irange_from_varying (type) /
irange_from_undefined (type)
if you want to get a range for a lattice state VARYING/UNDEFINED.



You've read the patch...  of course that is the main change we are 
proposing.

varying == [-INF, +INF] ==  [MIN, MAX]

the patch enforces that if you set the state to varying, it will set the 
endpoints to [MIN, MAX] for that object.
It also enforces that if you set the range to [MIN, MAX], it will also 
change the lattice value to VARYING for you, which doesn't always happen 
today.


And it is disingenuous to suggest that VARYING is typeless.  A varying 
char and a varying long are not equivalent.
If we assign a varying char to an unsigned int, the result is no longer 
varying, we get  a range of [0,255].
By claiming it's some magical typeless state, the unsigned int would then 
become varying as well, or [0, 4294967295], which is wrong.


So varying is *not* typeless, and it is always representative of 
something which has a type.


A varying char has a range of [0,255]. We just want to make that 
available in value_range without everyone having to write

  if (r.varying_p ())
    lbound = min_for_type (type_passed_in);
  else
    lbound = r.min ();

every time they want to look at the lower bound.  The lower bound is now 
just always set.
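
In other words, once set_varying populates the endpoints, the lookup above
collapses to a single line (a sketch, using the same names as the snippet
above):

  lbound = r.min ();   /* valid even when r.varying_p () */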



That's it.  That's what the patch is about.





So we can now normalize that code to not special case varying.. simply
ask for lower_bound and upper_bound ().  Its less special casing, its
consistent, costs no memory.. It all seems like a really good cleanup
which you appear to be opposing for reasons I can't quite fathom.





Well, I don't agree.  It's not archaic, it's the way GCC works everywhere
else.  So it's consistent.  Archaically consistent, in your view.

Are you really arguing we should write code the same way for 30+ years
and never change?

Yes..


That is problematic at best.


Indeed, I would argue that when we have the opportunity to modernize
code in our code base, we really ought to be doing it... and we don't do
it nearly enough.

Why don't we just pass a type along with eve

Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-24 Thread Martin Sebor

On 7/24/19 11:12 AM, Jeff Law wrote:

On 7/24/19 10:09 AM, Martin Sebor wrote:

On 7/24/19 9:25 AM, Jeff Law wrote:

On 7/23/19 10:20 AM, Martin Sebor wrote:

On 7/22/19 10:26 PM, JiangNing OS wrote:

This patch is to fix PR91195. Is it OK for trunk?

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 711a31ea597..4db36644160 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2019-07-22  Jiangning Liu  
+
+    PR middle-end/91195
+    * tree-ssa-phiopt.c (cond_store_replacement): Work around
+    -Wmaybe-uninitialized warning.
+
    2019-07-22  Stafford Horne  
      * config/or1k/or1k.c (or1k_expand_compare): Check for int
before
diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index b64bde695f4..7a86007d087 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -2240,6 +2240,14 @@ cond_store_replacement (basic_block middle_bb, basic_block join_bb,
  tree base = get_base_address (lhs);
  if (!auto_var_p (base) || TREE_ADDRESSABLE (base))
    return false;
+
+  /* The transformation below will inherently introduce a memory load,
+     for which LHS may not be initialized yet if it is not in NOTRAP,
+     so a -Wmaybe-uninitialized warning message could be triggered.
+     Since it's a bit hard to have a very accurate uninitialization
+     analysis to memory reference, we disable the warning here to avoid
+     confusion.  */
+  TREE_NO_WARNING (lhs) = 1;


The no-warning bit is sometimes (typically?) set by the middle-end
after a warning has been issued, to avoid triggering other warnings
down the line for an already diagnosed expression.  Unless it's
done just for the purposes of a single pass and the bit is cleared
afterwards, using it to avoid potential false positives doesn't
seem like a robust solution.  It will mask warnings for constructs
that have been determined to be invalid.

FWIW, the middle-end is susceptible to quite a few false positives
that would be nice to avoid.  We have discussed various approaches to
the problem but setting the no-warning bit seems like too blunt of
an instrument.

All true.

But in the case JiangNing is working with the transformation inherently
can introduce an uninitialized read.  It seems reasonable to mark those
loads he generates that don't have a dominating read.

His code takes something like

    if (x)
      *p = <expr>;

And turns it into

    t1 = *p;
    t2 = x ? <expr> : t1;
    *p = t2;

In the past we required there be a dominating read from *p (among other
restrictions).  That requirement was meant to ensure *p isn't going to
fault.  Jiang's work relaxes that requirement somewhat for objects that
we can prove aren't going to fault via other means.

Can setting TREE_NO_WARNING on the inserted loads inhibit warnings?
Certainly.  However, I believe we use it in other places where we know
the code we're emitting is safe, but can cause a warning.  I think
Jiang's work falls into that category.

I do think the bit should only be set if we don't have a dominating load
to minimize cases where we might inhibit a valid warning.


I was thinking of a few cases where setting the no-warning bit might
interfere with detecting bugs unrelated to uninitialized reads:

   1) -Warray-bounds in gimple-ssa-warn-restrict and tree-vrp
   2) -Wstringop-overflow in tree-ssa-strlen (other than for calls
  to built-ins)

I couldn't come up with a test case that shows how it might happen
with this patch but I didn't spend too much time on it.

I bet we could find one and it's more likely to show up on aarch64 than
x86 due to costing issues.  Either way we're making a bit of a judgment
call -- an extra false positive here due to a load the compiler inserted
on a path that didn't have one before, or potentially missing a warning
on that load because of the TREE_NO_WARNING.

I believe the false positive here is worse than the potential missed
warning.




There are a number of existing instances of setting TREE_NO_WARNING
to suppress -Wuninitialized, and some are the cause of known problems.
Bugs 51545, 58950, 74762, 74765 or 89697 are examples.  They all boil
down to the fact that there's just one bit for all warnings.  Jakub
mentioned adding a hash-map for this.  That seems like a simple and
good solution.

I'm not sure how that really helps here.  We're marking the MEM on the LHS
and that's not a shared object.  I don't see how it's going to be
significantly different using a hash map vs the bit in this circumstance.


I don't know what Jakub had in mind but the mapping I envision is
one like hash_map that would make it possible to set
a bit for each distinct warning for every tree node.  It would let
us set a bit for -Wuninitialized while leaving the bit for
-Warray-bounds (and all other warnings) clear.
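
A minimal sketch of what such a map could look like using GCC's own
hash_map (the function names and the bit encoding here are assumptions for
illustration, not an existing interface; it presumes the usual GCC-internal
headers plus hash-map.h):

  /* One bit per warning option, keyed by tree node.  */
  static hash_map<tree, unsigned> *nowarn_map;

  static void
  suppress_warning_at (tree t, unsigned opt_bit)
  {
    if (!nowarn_map)
      nowarn_map = new hash_map<tree, unsigned>;
    nowarn_map->get_or_insert (t) |= opt_bit;
  }

  static bool
  warning_suppressed_p (tree t, unsigned opt_bit)
  {
    if (!nowarn_map)
      return false;
    unsigned *bits = nowarn_map->get (t);
    return bits && (*bits & opt_bit);
  }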



If the bit were on an SSA_NAME, or a _DECL node, then the flag bit is
shared and would be a much larger concern.


For shared objects the mapping would have to be more involved but
I haven't thought about it in any detail to 

Re: [PATH] Patch to fix -finstrument-functions-exclude-function-list handling of namespaces and escaped commas

2019-07-24 Thread Jeff Law
On 7/4/19 2:18 AM, Oliver Browne wrote:
> See below for modified patch, indentation and newline for curly braces
> style applied, and commented out chunk removed. Apologies, indentation
> and newline for scope are not they way I normally write things, habits
> got the better of me, and I forgot to remove the commented out chunk
> before submission.
> 
> Adding the scope braces and writing the for loop in ordinary C instead
> of relying on a macro are changes made for the sake of
> maintainability.
The right way to do this in GCC is to use the facilities we have
designed for these purposes.  In some cases there are constraints that
those facilities enforce and doing an open-coded implementation will not
work the way you want (the immediate use iterators immediately come to
mind) and in other cases the facilities we've built may be much faster
(the bitmap iterators) and in some cases it's just syntatic sugar, but
the consistency with the rest of the codebase is important.

When you don't follow conventions, the maintainer looking at your patch
has to disentangle the real change from your formatting and convention
changes, then apply the fix by hand, possibly getting it wrong in the
process (made even more possible by the lack of a test in the patch).
Then because they made changes, they'll have to run sanity tests to make
sure they didn't goof anything with a typo, etc., which takes even more of
their limited time.

If you make it easy to review the patch by following conventions, not
making extraneous changes, include tests, indicate that you regression
tested your patch, etc, then it becomes very easy for a maintainer to
look at the patch and gate it in.

Anyway, I've fixed up your patch to follow existing conventions,
bootstrapped and regression tested it and installed the patch on the trunk.

jeff


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-24 Thread Jeff Law
On 7/24/19 11:00 AM, Richard Biener wrote:
[ Big snip, ignore missing reply attributions... ]

>> it. But I'd claim that if callers are required not to change these
>> ranges, then the callers are fundamentally broken.  I'm not sure
>> what the "sanitization" is really buying you here.  Can you point
>> to something specific?
>> 
>>> 
>>> But you lose the sanitizing that nobody can change it and the 
>>> changed info leaks to other SSA vars.
>>> 
>>> As said, fix all callers to deal with NULL.
>>> 
>>> But I argue the current code is exactly optimal and safe.
>> ANd I'd argue that it's just plain broken and that the
>> sanitization you're referring to points to something broken
>> elsewhere,  higher up in the callers.
> 
> Another option is to make get_value_range return by value and the
> only way to change the lattice to call an appropriate set function. I
> think we already do the latter in all cases (but we use
> get_value_range in the setter) and returning by reference is just
> eliding the copy.
OK, so what I think you're getting at (and please correct me if I'm
wrong) is that once the lattice values are set, you don't want something
changing the recorded ranges underneath?

ISTM the way to enforce that is to embed the concept in the class and
enforce it by not allowing direct manipulation of range by the clients.
 So a client that wants this behavior somehow tells the class that
ranges are "set in stone" and from that point the setters don't allow
changing the underlying ranges.

I just want to make sure we're on the same page WRT why you think the
constant varying range object is useful.

jeff


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-24 Thread Richard Biener
On July 24, 2019 8:18:57 PM GMT+02:00, Jeff Law  wrote:
>On 7/24/19 11:00 AM, Richard Biener wrote:
>[ Big snip, ignore missing reply attributions... ]
>
>>> it. But I'd claim that if callers are required not to change these
>>> ranges, then the callers are fundamentally broken.  I'm not sure
>>> what the "sanitization" is really buying you here.  Can you point
>>> to something specific?
>>> 
 
 But you lose the sanitizing that nobody can change it and the 
 changed info leaks to other SSA vars.
 
 As said, fix all callers to deal with NULL.
 
 But I argue the current code is exactly optimal and safe.
>>> ANd I'd argue that it's just plain broken and that the
>>> sanitization you're referring to points to something broken
>>> elsewhere,  higher up in the callers.
>> 
>> Another option is to make get_value_range return by value and the
>> only way to change the lattice to call an appropriate set function. I
>> think we already do the latter in all cases (but we use
>> get_value_range in the setter) and returning by reference is just
>> eliding the copy.
>OK, so what I think you're getting at (and please correct me if I'm
>wrong) is that once the lattice values are set, you don't want
>something
>changing the recorded ranges underneath?
>
>ISTM the way to enforce that is to embed the concept in the class and
>enforce it by not allowing direct manipulation of range by the clients.
> So a client that wants this behavior somehow tells the class that
>ranges are "set in stone" and from that point the setters don't allow
>changing the underlying ranges.

Yes. You'll see that nearly all callers do

value_range vr = *get_value_range (name);

... modify vr ...

update_value_range (name, &vr);

And returning by reference was mostly an optimization. We _did_ have callers 
changing the range in place, and the const varying caught those. 

When returning by value we can return individual VARYINGs not in the lattice if 
we decide that's what we want. 

>I just want to make sure we're on the same page WRT why you think the
>constant varying range object is useful.

As said it's an optimization. We do not want to reallocate the lattice. And we 
want lattice updating to happen in a controlled manner, so returning a pointer 
into the lattice is bad design at this point. 

Richard. 

>jeff



Re: [patch] Add simple narrowing optimization to expand_case

2019-07-24 Thread Jeff Law
On 7/24/19 3:57 AM, Eric Botcazou wrote:
> Hi,
> 
> this adds a simple narrowing optimization to expand_case in order to avoid 
> calling a double-word comparison routine when it is unnecessary to do so.
> This is mainly for -O0 because the optimization is otherwise done by forward 
> propagation and avoids having to drag libgcc or not depending on the -O level.
> 
> Tested on x86_64-suse-linux, OK for the mainline?
> 
> 
> 2019-07-24  Eric Botcazou  
> 
>   * stmt.c (expand_case): Try to narrow the index type if it's larger
>   than a word.  Tidy up.
> 
> 
> 2019-07-24  Eric Botcazou  
> 
>   * gnat.dg/case_optimization3.ad[sb]: New test.
> 
I wouldn't much care about dragging in libgcc for a -O0 build.  Instead
I'd pitch this as generating sensible code regardless of the
optimization level.

OK by me.

Jeff
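
An illustration of the situation the patch targets (my own example, not the
new gnat.dg test): the controlling expression is wider than a word, but
every case label fits a narrow type, so the index can be narrowed up front
instead of dispatching through a double-word comparison routine:

  int
  dispatch (long long x)   /* wider than a word on 32-bit targets */
  {
    switch (x)
      {
      case 1:  return 10;
      case 2:  return 20;
      case 3:  return 30;
      case 4:  return 40;
      default: return 0;
      }
  }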


Re: [PATCH v4] Generalize get_most_common_single_value to return k_th value & count

2019-07-24 Thread Jeff Law
On 7/17/19 1:55 AM, Martin Liška wrote:
> On 7/17/19 7:44 AM, luoxhu wrote:
>> Hi Martin,
>> Thanks for your review, v4 Changes as below:
>>  1. Use decrease bubble sort.
>> BTW, I have a question about hist->hvalue.counters[2], when will it become
>>  -1, please? Thanks.  Currently, if it is -1, the function will return false.
> Hi.
> 
> Thanks for that. I made a minor changes to your patch, please see it in 
> attachment.
> -1 is a value that we use for an invalidated histogram. That happens
> when you need to fit in more values during instrumentation than you
> have counters in the histogram.
> It helps to make builds of software reproducible.
> 
> Martin
> 
> 
> most-common-value.patch
> 
> diff --git a/gcc/profile.c b/gcc/profile.c
> index 441cb8eb183..1151b491848 100644
> --- a/gcc/profile.c
> +++ b/gcc/profile.c
> @@ -743,6 +743,44 @@ compute_branch_probabilities (unsigned cfg_checksum, 
> unsigned lineno_checksum)
>free_aux_for_blocks ();
>  }
>  
> +/* Sort the histogram value and count for TOPN and INDIR_CALL types.  */
> +
> +static void
> +sort_hist_values (histogram_value hist)
> +{
> +  /* counters[2] equal to -1 means that all counters are invalidated.  */
> +  if (hist->hvalue.counters[2] == -1)
> +return;
> +
> +  gcc_assert (hist->type == HIST_TYPE_TOPN_VALUES
> +   || hist->type == HIST_TYPE_INDIR_CALL);
> +
> +  gcc_assert (hist->n_counters == GCOV_TOPN_VALUES_COUNTERS);
> +
> +  /* Hist value is organized as:
> + [total_executions, value1, counter1, ..., value4, counter4]
> + Use decrease bubble sort to rearrange it.  The sort starts from 
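
(The quoted patch is cut off above.)  A stand-alone sketch of the sort the
comment describes, with the array layout it gives; this is a paraphrase for
illustration, not the committed code:

  /* Sort the (value, counter) pairs of a TOPN histogram so counters
     come in decreasing order.  counters[0] is the total; pair J sits
     at counters[1 + 2 * J] (value) and counters[2 + 2 * J] (count).  */
  static void
  sort_topn_pairs (long long *counters, int npairs)
  {
    for (int i = 0; i < npairs - 1; i++)
      for (int j = 0; j < npairs - 1 - i; j++)
        if (counters[2 + 2 * j] < counters[2 + 2 * (j + 1)])
          {
            long long value = counters[1 + 2 * j];
            long long count = counters[2 + 2 * j];
            counters[1 + 2 * j] = counters[1 + 2 * (j + 1)];
            counters[2 + 2 * j] = counters[2 + 2 * (j + 1)];
            counters[1 + 2 * (j + 1)] = value;
            counters[2 + 2 * (j + 1)] = count;
          }
  }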

Re: [RFC/PATCH v2][PR89245] Check REG_CALL_DECL note during the tail-merging

2019-07-24 Thread Jeff Law
On 7/17/19 2:29 AM, Dragan Mladjenovic wrote:
> 
> 
> On 09.07.2019. 23:21, Jeff Law wrote:
>> On 7/9/19 2:06 PM, Dragan Mladjenovic wrote:
>>> This patch prevents merging of CALL instructions that that have different
>>> REG_CALL_DECL notes attached to them.
>>>
>>> On most architectures this is not an important distinction. Usually 
>>> instruction patterns
>>> for calls to different functions reference different SYMBOL_REF-s, so they 
>>> won't match.
>>> On MIPS PIC calls get split into an got_load/*call_internal pair where the 
>>> latter represents
>>> indirect register call w/o SYMBOL_REF attached (until machine_reorg pass). 
>>> The bugzilla issue
>>> had such two internal_call-s merged despite the fact that they had 
>>> different register usage
>>> information assigned by ipa-ra.
>>>
>>> As per a comment from Richard Sandiford, this version compares reg usage
>>> for both call instructions instead of shallowly comparing the notes.
>>> Tests updated 
>>> accordingly.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-07-09  Dragan Mladjenovic  
>>>
>>> * cfgcleanup.c (old_insns_match_p): Check if used hard regs set is equal
>>> for both call instructions.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2019-07-09  Dragan Mladjenovic  
>>>
>>> * gcc.target/mips/cfgcleanup-jalr1.c: New test.
>>> * gcc.target/mips/cfgcleanup-jalr2.c: New test.
>>> * gcc.target/mips/cfgcleanup-jalr3.c: New test.
>> THanks.  I've installed this on the trunk.
>>
>> jeff
> Thanks. Can this be back-ported to active branches also? This issue 
> seems to have been there since gcc6 if not gcc5.
I've asked Matthew to handle the backport.  I'm going to be on PTO the
next couple weeks.

jeff
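
For reference, the check described in the quoted summary amounts to
comparing the ipa-ra register usage recorded for the two calls.  A sketch
of the shape of the change in old_insns_match_p (simplified and from
memory; the committed patch is authoritative):

  /* Fragment for cfgcleanup.c, with i1 and i2 the insns being
     compared; relies on GCC-internal types and globals.  */
  if (CALL_P (i1))
    {
      HARD_REG_SET i1_used, i2_used;

      get_call_reg_set_usage (i1, &i1_used, call_used_reg_set);
      get_call_reg_set_usage (i2, &i2_used, call_used_reg_set);

      /* Different ipa-ra answers mean the calls must not be merged.  */
      if (!hard_reg_set_equal_p (i1_used, i2_used))
        return dir_none;
    }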


Re: [PATCH 1/2] Come up with function_decl_type and use it in tree_function_decl.

2019-07-24 Thread Jeff Law
On 7/11/19 12:45 AM, Martin Liška wrote:
> On 7/9/19 11:00 PM, Jason Merrill wrote:

> Ok, I hopefully addressed the suggested improvements to the patch.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> 0001-Extend-DCE-to-remove-unnecessary-new-delete-pairs-PR.patch
> 
> From 771d9128144745fe530577912521fc8228ca7424 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Tue, 2 Jul 2019 09:08:27 +0200
> Subject: [PATCH] Extend DCE to remove unnecessary new/delete-pairs (PR
>  c++/23383).
> 
> gcc/ChangeLog:
> 
> 2019-07-02  Martin Liska  
>   Dominik Infuhr  
> 
>   PR c++/23383
>   * common.opt: Add -fallocation-dce
>   * gimple.c (gimple_call_operator_delete_p): New.
>   * gimple.h (gimple_call_operator_delete_p): Likewise.
>   * tree-core.h (enum function_decl_type): Add OPERATOR_DELETE.
>   * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Handle
>   DECL_IS_OPERATOR_DELETE_P.
>   (mark_all_reaching_defs_necessary_1): Likewise.
>   (propagate_necessity): Likewise.
>   (eliminate_unnecessary_stmts): Handle
>   gimple_call_operator_delete_p.
>   * tree-streamer-in.c (unpack_ts_function_decl_value_fields):
>   Add packing of OPERATOR_DELETE.
>   * tree-streamer-out.c (pack_ts_function_decl_value_fields):
>   Similarly here.
>   * tree.h (DECL_IS_OPERATOR_DELETE_P): New.
>   (DECL_SET_IS_OPERATOR_DELETE): New.
>   (DECL_IS_REPLACEABLE_OPERATOR_NEW_P): Likewise.
> 
> gcc/c/ChangeLog:
> 
> 2019-07-02  Martin Liska  
>   Dominik Infuhr  
> 
>   PR c++/23383
>   * c-decl.c (merge_decls): Merge OPERATOR_DELETE flag.
> 
> gcc/cp/ChangeLog:
> 
> 2019-07-02  Martin Liska  
>   Dominik Infuhr  
> 
>   PR c++/23383
>   * decl.c (cxx_init_decl_processing): Mark delete operators
>   with DECL_SET_IS_OPERATOR_DELETE.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-07-02  Martin Liska Dominik Infuhr  
> 
>   PR c++/23383
>   * g++.dg/cpp1y/new1.C: New test.
> 
> libstdc++-v3/ChangeLog:
> 
> 2019-07-02  Martin Liska  
>   Dominik Infuhr  
> 
>   PR c++/23383
>   * testsuite/ext/bitmap_allocator/check_delete.cc: Add
>   -fno-allocation-dce.
>   * testsuite/ext/bitmap_allocator/check_new.cc: Likewise.
>   * testsuite/ext/new_allocator/check_delete.cc: Likewise.
>   * testsuite/ext/new_allocator/check_new.cc: Likewise.
OK.

I don't see the 2/2 from this series in my queue.  Has it already been
dealt with?

jeff


Re: [PATCH 1/3] add -Wstruct-not-pod, -Wclass-is-pod, -Wmismatched-tags (PR 61339)

2019-07-24 Thread Jeff Law
On 7/8/19 3:58 PM, Martin Sebor wrote:
> The attached patch implements three new warnings:
> 
>  *  -Wstruct-not-pod triggers for struct definitions that are not
> POD structs,
>  *  -Wclass-is-pod triggers for class definitions that satisfy
> the requirements on POD structs, and
>  *  -Wmismatched-tags that triggers for class and struct declarations
> with class-key that doesn't match either their definition or
> the first declaration (if no definition is provided).
> 
> The implementation of -Wclass-is-pod and -Wstruct-not-pod is fairly
> straightforward but the -Wmismatched-tags solution is slightly unusual.
> It collects struct and class declarations first and diagnoses mismatches
> only after the whole translation unit has been processed.  This is so
> that the definition of a class can guide which declarations to diagnose
> no matter which come first.
> 
> Martin
> 
> gcc-61339-impl.diff
> 
> gcc/c-family/ChangeLog:
> 
>   * c.opt (-Wstruct-not-pod, -Wclass-is-pod): New options.
>   (-Wmismatched-tags): Same.
> 
> gcc/cp/ChangeLog:
> 
>   * parser.c (maybe_warn_struct_vs_class): New function.
>   (cp_parser_check_class_key): Add argument.
>   (cp_parser_type_specifier): Call maybe_warn_struct_vs_class.
>   (cp_parser_elaborated_type_specifier): Call maybe_warn_struct_vs_class
>   before setting CLASSTYPE_DECLARED_CLASS.  Avoid setting it for classes
>   that are in the process of being defined.
>   (cp_parser_class_head): Call maybe_warn_struct_vs_class.
>   (class_or_template_pod_p): New static function.
>   (maybe_warn_struct_vs_class) Same.
>   (class rec_decl_loc_t): New.
>   (cp_parser_check_class_key): Record a struct declaration.
	(diag_mismatched_tags): Handle -Wmismatched-tags.
>   (c_parse_file): Call diag_mismatched_tags.
> 
> gcc/ChangeLog:
> 
>   * doc/invoke.texi (-Wstruct-not-pod, -Wclass-is-pod): Document new
>   options.
>   (-Wmismatched-tags): Same.
So I'm going to defer to Jason here given this is a front-end patch.
Your call whether or not to push on it prior to getting any resolution
on changing GCC's conventions.

jeff
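
To make the three options concrete, a hedged example of the kind of code
each one is described as flagging (illustrative only; the exact diagnostics
are the patch's, not verified here):

  #include <string>

  struct S { std::string name; };  // not a POD: -Wstruct-not-pod

  class C { public: int x; };      // satisfies POD rules: -Wclass-is-pod

  class D;                         // declared with 'class' ...
  struct D { int y; };             // ... defined with 'struct': -Wmismatched-tags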



Re: [PATCH 1/4] Remove old IPA-SRA, introduce tree-sra.h

2019-07-24 Thread Jeff Law
On 7/23/19 10:16 AM, Martin Jambor wrote:
> This patch removes the old IPA-SRA.  Please see the covert letter for
> more information about the whole patch-set.
> 
> Martin
> 
> 2019-07-23  Martin Jambor  
> 
>   * dbgcnt.def: Remove eipa_sra.
> * passes.def: Remove old IPA-SRA.
> * tree-pass.h (make_pass_early_ipa_sra): Remove declaration.
> * tree-sra.c: Removed IPA-SRA.  Include tree-sra.h.
> (type_internals_preclude_sra_p): Make public.
> * tree-sra.h: New file.
This seems trivially OK once the new implementation is accepted.

No, I don't think I'll have time to dig into the new implementation
before I go on PTO.  I'm hoping Jan can chime in since this hits the IPA
space hard.

jeff


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-24 Thread Andrew MacLeod

On 7/24/19 2:18 PM, Jeff Law wrote:

On 7/24/19 11:00 AM, Richard Biener wrote:
[ Big snip, ignore missing reply attributions... ]


it. But I'd claim that if callers are required not to change these
ranges, then the callers are fundamentally broken.  I'm not sure
what the "sanitization" is really buying you here.  Can you point
to something specific?


But you lose the sanitizing that nobody can change it and the
changed info leaks to other SSA vars.

As said, fix all callers to deal with NULL.

But I argue the current code is exactly optimal and safe.

ANd I'd argue that it's just plain broken and that the
sanitization you're referring to points to something broken
elsewhere,  higher up in the callers.

Another option is to make get_value_range return by value and the
only way to change the lattice to call an appropriate set function. I
think we already do the latter in all cases (but we use
get_value_range in the setter) and returning by reference is just
eliding the copy.

OK, so what I think you're getting at (and please correct me if I'm
wrong) is that once the lattice values are set, you don't want something
changing the recorded ranges underneath?

ISTM the way to enforce that is to embed the concept in the class and
enforce it by not allowing direct manipulation of range by the clients.
  So a client that wants this behavior somehow tells the class that
ranges are "set in stone" and from that point the setters don't allow
changing the underlying ranges.

I just want to make sure we're on the same page WRT why you think the
constant varying range object is useful.

jeff


That is not the functionality we are seeing.

Whenever get_value_range silently returns a CONST varying node, the 
ONLY way you can tell that the node might possibly be const elsewhere 
would be to first check that it is varying, as in:


   void
   vr_values::set_defs_to_varying (gimple *stmt)
   {
     ssa_op_iter i;
     tree def;
     FOR_EACH_SSA_TREE_OPERAND (def, stmt, i, SSA_OP_DEF)
       {
	 value_range *vr = get_value_range (def);
	 /* Avoid writing to the vr_const_varying that get_value_range
	    may return.  */
	 if (!vr->varying_p ())
	   vr->set_varying ();
       }
   }



Which means there can be *no* context in which we ever try to move one of 
these nodes from varying to anything else, or we'd trap on a write to 
read-only space.


Which means no place is ever trying to change those nodes from varying 
to anything else.  But nothing is preventing changes from other ranges 
to something else.


Which also means the only thing this approach accomplishes is to force 
us to check if a node is already varying, so that we don't overwrite the 
node to varying just in case it's a hidden const.


how can this hidden const node really be useful?

I submit this is just a dangerous way to flag previously unprocessed 
nodes as VARYING for the duration of the pass after values_propagated is 
set...  not some higher level "Don't change this range any more" plan.  
It's already the bottom of the lattice; it isn't going anywhere.


Andrew



[PATCH] Fix gcc.dg/gomp/pr89104.c failure on aarch64

2019-07-24 Thread Steve Ellcey
I noticed that the test gcc.dg/gomp/pr89104.c fails on aarch64 platforms.
As mentioned in the bug report for PR 89104, this message is coming from
aarch64 target specific code which is why it does not occur on other
platforms.  There doesn't seem to be complete consensus in the bug report
on how to deal with this but I chose to just use -w on aarch64 to suppress
the warning.

The warning that we get is:

pr89104.c:8:1: warning: GCC does not currently support mixed size types for 
‘simd’ functions
8 | foo (int *x, int y)

This is because 'x' is a 64 bit pointer and 'y' is a 32 bit integer
in the default LP64 mode.  If I use -mabi=ilp32, then aarch64 does not 
generate a warning because both arguments are 32 bits.  I could force
ILP32 mode for aarch64 and/or only use -w only when not in 32 bit mode
but that seemed like overkill to me.

OK to checkin?

Steve Ellcey
sell...@marvell.com



2019-07-24  Steve Ellcey  

* gcc.dg/gomp/pr89104.c: Use -w on aarch64*-*-* targets.


diff --git a/gcc/testsuite/gcc.dg/gomp/pr89104.c 
b/gcc/testsuite/gcc.dg/gomp/pr89104.c
index 505fdda..7f0f688 100644
--- a/gcc/testsuite/gcc.dg/gomp/pr89104.c
+++ b/gcc/testsuite/gcc.dg/gomp/pr89104.c
@@ -2,6 +2,7 @@
 /* PR ipa/89104 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -fopenmp-simd" } */
+/* { dg-options "-O2 -fopenmp-simd -w" { target aarch64*-*-* } } */
 
 #pragma omp declare simd uniform (x) aligned (x)
 int


Re: [PATCH] Fix gcc.dg/gomp/pr89104.c failure on aarch64

2019-07-24 Thread Rainer Orth
Hi Steve,

> 2019-07-24  Steve Ellcey  
>
>   * gcc.dg/gomp/pr89104.c: Use -w on aarch64*-*-* targets.
>
>
> diff --git a/gcc/testsuite/gcc.dg/gomp/pr89104.c 
> b/gcc/testsuite/gcc.dg/gomp/pr89104.c
> index 505fdda..7f0f688 100644
> --- a/gcc/testsuite/gcc.dg/gomp/pr89104.c
> +++ b/gcc/testsuite/gcc.dg/gomp/pr89104.c
> @@ -2,6 +2,7 @@
>  /* PR ipa/89104 */
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fopenmp-simd" } */
> +/* { dg-options "-O2 -fopenmp-simd -w" { target aarch64*-*-* } } */

please use 

/* { dg-additional-options "-w" { target aarch64*-*-* } } */

instead of repeating the option list.  I'll leave approval of the
substance of the patch to an OpenMP maintainer.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] report as disabled options unsupported by a language (PR 80545)

2019-07-24 Thread Martin Sebor

On 7/22/19 5:26 PM, Jeff Law wrote:

On 7/22/19 3:54 PM, Martin Sebor wrote:

While resolving PR80545 - option -Wstringop-overflow not recognized
by Fortran, I discovered that a command line option that's supported
only by a subset of languages is considered as enabled when tested
by the middle-end even when the current language doesn't support
the option.

When the option controls a warning, there is no good way to suppress
its false positives via the usual mechanisms (i.e., specifying
-Wno-stringop-overflow either on the command line or via a pragma)
because the option is not recognized by the languages that do not
support it.

The attached patch arranges for such options to be treated as disabled
when queried by the middle-end when the current language doesn't support
them.  The fix wasn't as straightforward as testing
lang_hooks.option_lang_mask () in the diagnostics subsystem because
it doesn't have access to lang_hooks. To get around it the patch adds
a new member, diagnostic_context::lang_mask, and initializes it with
the result of the hook.

To make debugging and testing this fix easier I enhanced the output
of the -Q --help= options to clearly indicate when an otherwise
recognized option isn't supported by a front-end.  For example,
where the current trunk prints

   -Walign-commons [enabled]

for the Fortran-only option -Walign-commons even when GCC is invoked
via a driver for a language like C or C++, with the patch applied it
prints

   -Walign-commons [available in Fortran]

I also changed the output to indicate what option an alias
is for, so that for example the -Wattribute-alias option that's
an alias for -Wattribute-alias=1 is shown with the alias target
as the value:

   -Wattribute-alias   -Wattribute-alias=1

Besides this, I also corrected the printing of byte-size arguments
(they were sliced from 64 to 32 bits).

Martin

gcc-80545.diff

PR driver/80545 - option -Wstringop-overflow not recognized by Fortran

gcc/cp/ChangeLog:

PR driver/80545
* decl.c (finish_function): Use lang_mask.

gcc/testsuite/ChangeLog:

PR driver/80545
* gcc.misc-tests/help.exp: Add tests.
* lib/options.exp: Handle C++.

gcc/ChangeLog:

PR driver/80545
* diagnostic.c (diagnostic_classify_diagnostic): Use lang_mask.
(diagnostic_report_diagnostic): Same.
* diagnostic.h (diagnostic_context::option_enabled): Add an argument.
(diagnostic_context::lang_mask): New data member.
* ipa-pure-const.c (suggest_attribute): Use
lang_hooks.option_lang_mask ().
* opts-common.c (option_enabled): Handle new argument.
(get_option_state): Pass an additional argument.
* opts.c (print_filtered_help): Print supported languages for
unsupported options.  Adjust printing of current state.
* opts.h (option_enabled): Add argument.
* toplev.c (print_switch_values): Use lang_mask.
(general_init): Set global_dc->lang_mask.

Ironically this might have helped me today.  I was looking at a case
where an LTO build got a fatal warning, but the non-LTO build got a
different (and, due to flags, non-fatal) warning.  It wasn't until I
started debugging the diagnostic machinery that I realized -Wall was a
language-specific flag and that accounted for the difference between the
LTO and non-LTO builds.

I think this is OK, but give other folks 48hrs to chime in just in case.


Having heard no concerns I just checked it in as r273771.

Martin


Re: [libgomp, WIP, GSoC'19] Modification to a single queue, single execution path.

2019-07-24 Thread Jakub Jelinek
On Wed, Jul 24, 2019 at 05:25:56PM +0900, 김규래 wrote:
> > Generally, we don't want to have code commented out like this in the final
> > patch submission.  For this WIP, I think it is acceptable, as I think in the
> > end you don't want to use the team's queue, but actually either
> > children_queue (renamed), but only use it on the implicit tasks, or during
>  
> Can you elaborate?
> What do you mean by "children_queue (renamed)"?

I meant: use a queue structure member in the same location as the current
children_queue, just renamed to something more sensible (task_queue etc.),
because it will no longer really be a queue of task children.

> > team creation allocate together with the team structure also memory that
> > would be used as a trailing array for an array of the implicit queues next
> > to the array of implicit tasks. 
>  
> Do you mean to make two trailing arrays in gomp_team?

Yes.  Of course, in C you can't have two flexible array members after each
other, and we certainly don't want to use the GNU C extension of variable
length structures, but it would be something like: struct gomp_team would
have a struct priority_queue *task_queues member, where the struct gomp_team
initialization would set team->task_queues to (struct priority_queue *)
&team->implicit_task[team->nthreads].

> Also, as a personal question, why does gcc prefer trailing arrays over
> dynamically allocated pointers?

Because malloc is fairly expensive and lots of benchmarks etc. care about
#pragma omp parallel latency (when the threads are already around; of course
the first one that needs to pthread_create is much more expensive).  So it
makes a quite noticeable difference whether you do one allocation or three
or more.
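
A minimal sketch of that single-allocation layout (illustrative only;
the real libgomp structures have many more fields):

#include <stdlib.h>

struct priority_queue { int dummy; };	/* stand-in for the real queue */
struct gomp_task { int dummy; };	/* stand-in for the real task */

struct gomp_team
{
  unsigned nthreads;
  struct priority_queue *task_queues;	/* points into the same block */
  struct gomp_task implicit_task[];	/* trailing array */
};

static struct gomp_team *
team_alloc (unsigned nthreads)
{
  /* One malloc covers the team, the implicit tasks, and the queues,
     instead of three or more separate allocations.  Assumes the
     queue's alignment is satisfied at the end of the task array;
     real code must ensure that.  */
  struct gomp_team *team
    = malloc (sizeof (struct gomp_team)
	      + nthreads * (sizeof (struct gomp_task)
			    + sizeof (struct priority_queue)));
  team->nthreads = nthreads;
  team->task_queues
    = (struct priority_queue *) &team->implicit_task[nthreads];
  return team;
}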

Jakub


[committed][MSP430] Only add crtn.o to ENDFILE_SPEC if it exists

2019-07-24 Thread Jozef Lawrynowicz
A recent patch to Newlib (commit 884b05b54) removes crtn.o for msp430.
This patch wraps uses of crtn*.o in ENDFILE_SPEC from config/msp430.h in
if-exists, so compatibility with both new and old newlib is maintained.

Committed on trunk as obvious.
From 65a155eb552343a28861b9d5e04e59db28bd2743 Mon Sep 17 00:00:00 2001
From: jozefl 
Date: Wed, 24 Jul 2019 21:08:34 +
Subject: [PATCH] 2019-07-24  Jozef Lawrynowicz  

	* config/msp430/msp430.h (ENDFILE_SPEC): Wrap uses of crtn*.o in
	if-exists.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@273773 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  | 5 +
 gcc/config/msp430/msp430.h | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 5cd80e8c7a1..8ed57edef03 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2019-07-24  Jozef Lawrynowicz  
+
+	* config/msp430/msp430.h (ENDFILE_SPEC): Wrap uses of crtn*.o in
+	if-exists.
+
 2019-07-24  Martin Sebor  
 
 	PR driver/80545
diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index 1288b1a263d..ca7cf20e1d7 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -49,7 +49,7 @@ extern bool msp430x;
 
 /* -lgcc is included because crtend.o needs __mspabi_func_epilog_1.  */
 #undef  ENDFILE_SPEC
-#define ENDFILE_SPEC "%{!minrt:crtend.o%s} %{minrt:crtn-minrt.o%s}%{!minrt:crtn.o%s} -lgcc"
+#define ENDFILE_SPEC "%{!minrt:crtend.o%s} %{minrt:%:if-exists(crtn-minrt.o%s)}%{!minrt:%:if-exists(crtn.o%s)} -lgcc"
 
 #define ASM_SPEC "-mP " /* Enable polymorphic instructions.  */ \
   "%{mcpu=*:-mcpu=%*}%{!mcpu=*:%{mmcu=*:-mmcu=%*}} " /* Pass the CPU type on to the assembler.  */ \
-- 
2.17.1



[C++ Patch] Improve delete_sanity locations

2019-07-24 Thread Paolo Carlini

Hi,

this takes care of the four locations, two warnings and two errors. I'm
also adding the missing complain & tf_warning or tf_error guards; I don't
have a failing SFINAE testcase, but since the function takes a
tsubst_flags_t argument I think it's the right thing to do. Tested
x86_64-linux.


By the way, shall we add to cp-tree.h, at least temporarily, a:

inline location_t
cp_expr_loc_or_loc (const_tree t)
{
  return cp_expr_loc_or_loc (t, input_location);
}

overload? We could use it in a ton of places.
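
With it, the calls in the patch below could shrink to, e.g. (illustrative):

  warning_at (cp_expr_loc_or_loc (exp), 0, "deleting array %q#E", exp);

instead of spelling out input_location at every site.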

Thanks, Paolo.

//

/cp
2019-07-24  Paolo Carlini  

* decl2.c (delete_sanity): Improve diagnostic locations, use
cp_expr_loc_or_loc in four places.

/testsuite
2019-07-24  Paolo Carlini  

* g++.dg/diagnostic/delete1.C: New.
Index: cp/decl2.c
===
--- cp/decl2.c  (revision 273767)
+++ cp/decl2.c  (working copy)
@@ -487,15 +487,19 @@ delete_sanity (tree exp, tree size, bool doing_vec
 }
 
   /* An array can't have been allocated by new, so complain.  */
-  if (TREE_CODE (TREE_TYPE (exp)) == ARRAY_TYPE)
-warning (0, "deleting array %q#E", exp);
+  if (TREE_CODE (TREE_TYPE (exp)) == ARRAY_TYPE
+  && (complain & tf_warning))
+warning_at (cp_expr_loc_or_loc (exp, input_location), 0,
+   "deleting array %q#E", exp);
 
   t = build_expr_type_conversion (WANT_POINTER, exp, true);
 
   if (t == NULL_TREE || t == error_mark_node)
 {
-  error ("type %q#T argument given to %, expected pointer",
-TREE_TYPE (exp));
+  if (complain & tf_error)
+   error_at (cp_expr_loc_or_loc (exp, input_location),
+ "type %q#T argument given to %, expected pointer",
+ TREE_TYPE (exp));
   return error_mark_node;
 }
 
@@ -506,8 +510,10 @@ delete_sanity (tree exp, tree size, bool doing_vec
   /* You can't delete functions.  */
   if (TREE_CODE (TREE_TYPE (type)) == FUNCTION_TYPE)
 {
-  error ("cannot delete a function.  Only pointer-to-objects are "
-"valid arguments to %");
+  if (complain & tf_error)
+   error_at (cp_expr_loc_or_loc (exp, input_location),
+ "cannot delete a function.  Only pointer-to-objects are "
+ "valid arguments to %");
   return error_mark_node;
 }
 
@@ -514,7 +520,10 @@ delete_sanity (tree exp, tree size, bool doing_vec
   /* Deleting ptr to void is undefined behavior [expr.delete/3].  */
   if (VOID_TYPE_P (TREE_TYPE (type)))
 {
-  warning (OPT_Wdelete_incomplete, "deleting %qT is undefined", type);
+  if (complain & tf_warning)
+   warning_at (cp_expr_loc_or_loc (exp, input_location),
+   OPT_Wdelete_incomplete,
+   "deleting %qT is undefined", type);
   doing_vec = 0;
 }
 
Index: testsuite/g++.dg/diagnostic/delete1.C
===
--- testsuite/g++.dg/diagnostic/delete1.C   (nonexistent)
+++ testsuite/g++.dg/diagnostic/delete1.C   (working copy)
@@ -0,0 +1,14 @@
+void f ()
+{
+  int a[1];
+  delete (a);  // { dg-warning "11:deleting array" }
+
+  bool b;
+  delete (b);  // { dg-error "11:type .bool. argument" }
+
+  void g ();
+  delete (g);  // { dg-error "11:cannot delete a function" }
+
+  void* p;
+  delete (p);  // { dg-warning "11:deleting .void*." }
+}


[committed][MSP430] Enable initfini_array by default

2019-07-24 Thread Jozef Lawrynowicz
A recent patch to newlib (commit 884b05b54) removes the .init/.fini sections
from the CRT library for msp430.
initfini_array must now be enabled by default, so that libgcc/crtstuff.c does
not try to put functions relating to .ctors/.dtors in .init/.fini.

.init_array was already the preferred way of running global C++ constructors
for msp430-elf (and it is what the ABI specifies).

initfini_array can be explicitly disabled by passing --disable-initfini-array to
configure.

Committed on trunk as obvious.
From f0a47dba32e640a3cd0d3b18d3cadab3aa5cbb05 Mon Sep 17 00:00:00 2001
From: jozefl 
Date: Wed, 24 Jul 2019 21:13:54 +
Subject: [PATCH] 2019-07-24  Jozef Lawrynowicz  

	* config.gcc (msp430*-*-*): Enable initfini_array by default unless
	explicitly disabled with --disable-initfini-array.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@273774 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  | 5 +
 gcc/config.gcc | 6 ++
 2 files changed, 11 insertions(+)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8ed57edef03..8afc61ded72 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2019-07-24  Jozef Lawrynowicz  
+
+	* config.gcc (msp430*-*-*): Enable initfini_array by default unless
+	explicitly disabled with --disable-initfini-array.
+
 2019-07-24  Jozef Lawrynowicz  
 
 	* config/msp430/msp430.h (ENDFILE_SPEC): Wrap uses of crtn*.o in
diff --git a/gcc/config.gcc b/gcc/config.gcc
index e55c67a4248..76c0cb3674c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2512,6 +2512,12 @@ msp430*-*-*)
 	cxx_target_objs="msp430-c.o"
 	tmake_file="${tmake_file} msp430/t-msp430"
 	extra_gcc_objs="driver-msp430.o"
+	# Enable .init_array unless it has been explicitly disabled.
+	# The MSP430 EABI mandates the use of .init_array, and the Newlib CRT
+	# code since mid-2019 expects it.
+	if test x${disable_initfini_array} != xyes; then
+		gcc_cv_initfini_array=yes
+	fi
 	;;
 nds32*-*-*)
 	target_cpu_default="0"
-- 
2.17.1



Re: [PATCH] handle multibyte stores larger than char in strlen (PR 91183, 86888)

2019-07-24 Thread Martin Sebor

On 7/24/19 11:06 AM, Jeff Law wrote:

On 7/22/19 5:17 PM, Martin Sebor wrote:


Umm "store_b4_nul"?  Are you really trying to save 3 characters in the
name by writing it this way?  :-)


I'm actually saving four characters over "store_before_nul" ;-)

:-)



It's just a name I picked because I like it.  Would you prefer to
go back to the original 'cmp' instead?  It doesn't get much shorter
than that, or less descriptive, especially when there is no comment
explaining what it's for.  (FWIW, I renamed it because I found myself
going back to the description of compare_nonzero_chars to remember
what cmp actually meant.)

Fair enough.  Though "b4" reads like it belongs in a text message to me.
I don't want to get nitty over this.  Will s/b4/before/ work for you?


If it's distracting I'll change it.



If you wanted to add a comment before the variable, that would be good
as well, particularly since we don't have a good name.





You've got two entries in the array, but the comment reads as if there's
just one element.  What is the difference between the two array elements?


Since handle_store deals with sequences of one or more bytes, some
of the optimizations it implements(*) need to consider both where
the first byte of the sequence is stored and where the last one is.
The first element of this array is for the first byte and the second
one is for the last character.  The comment a few lines down is meant
to make the distinction clear but we can expand the comment above
the variable even more.

I think my brain locked up with the "b4".  Maybe it went into a mode
where I'm trying to parse texts from my teenager, which is clearly
incompatible with reading code. :-)


That's a good enough argument to change it :)


   /* The offset of the last stored byte.  */
   unsigned HOST_WIDE_INT endoff = offset + lenrange[2] - 1;

(lenrange[2] is the size of the store.)

[*] E.g., the one we discussed not so long ago (PR90989) or the one
that removes a store of '\0' over the terminating '\0'.
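
A tiny standalone illustration of that second case (a sketch, not one
of the committed tests):

#include <string.h>

unsigned
f (void)
{
  char buf[4] = "ab";	/* buf[2] and buf[3] are already '\0' */
  buf[2] = '\0';	/* store of '\0' over the terminating '\0' */
  return strlen (buf);	/* foldable to 2; the store above is dead */
}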



I didn't see anything terribly concerning so far, but I do want to look
at handle_store again after the comment is fixed so that I'm sure I'm
interpreting things correctly.


Does this help?

   /* The corresponding element is set to 1 if the first and last
  element, respectively, of the sequence of characters being
  written over the string described by SI ends before
  the terminating nul (if it has one), to zero if the nul is
  being overwritten but not beyond, or negative otherwise.  */

Yea.  I also note a /* */ empty comment in handle_store, you probably
wanted to write something there :-)


I did initially want to write something there but then I wasn't
sure the conditional was quite right.  I went to investigate and
forgot to fix the conditional.  There was an unnecessary test
there so I removed both it and the comment placeholder.



OK with the nit fixes noted earlier, variable name fix and comment fixes.


Committed in r273783 after retesting and including a test for
the warning that I had left out of the patch I posted here.

Martin

PS I suspect some of the tests I added might need tweaking on
big-endian systems.  I'll deal with them tomorrow.


Re: [PATCH 1/2] Come up with function_decl_type and use it in tree_function_decl.

2019-07-24 Thread Marc Glisse
I would also mark the other operator delete variants with
DECL_IS_OPERATOR_DELETE_P, given the name of the macro, but it is not
important for this patch.


DCE has special code to avoid removing the LHS of malloc when it is 
unused. Is there something different about operator new that makes it not 
need the same handling?


The patch is happy to simplify malloc+delete or new+free. That makes sense
to me; if someone wants to diagnose weird mixes, that's separate work they
can do later.


(Not this patch: I wonder how much speed we would lose by making 
gimple_call_operator_delete_p take a gimple* and check if the thing is a 
call, so we don't have to repeat is_gimple_call and as_a <gcall *> in half 
of the callers. An overload (overkill) would of course lose nothing. But, 
with just the gimple* version, especially if it was in a .h, I wonder if 
we would manage to eliminate the overhead of checking is_gimple_call when 
we already know it is a call)


--
Marc Glisse


Re: [PATCH] handle multibyte stores larger than char in strlen (PR 91183, 86888)

2019-07-24 Thread Martin Sebor

Committed in r273783 after retesting and including a test for
the warning that I had left out of the patch I posted here.

Martin

PS I suspect some of the tests I added might need tweaking on
big-endian systems.  I'll deal with them tomorrow.


And maybe also strictly aligned targets.  A sparc-solaris2.11 cross
shows these failures.  It looks like it doesn't like something about
some of the 64 bit stores in the tests.

FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "_not_eliminated_" 0

FAIL: gcc.dg/strlenopt-71.c (test for excess errors)
FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0
FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "_not_eliminated_" 0


Re: [PATCH] handle multibyte stores larger than char in strlen (PR 91183, 86888)

2019-07-24 Thread Jeff Law
On 7/24/19 8:17 PM, Martin Sebor wrote:
>> Committed in r273783 after retesting and including a test for
>> the warning that I had left out of the patch I posted here.
>>
>> Martin
>>
>> PS I suspect some of the tests I added might need tweaking on
>> big-endian systems.  I'll deal with them tomorrow.
> 
> And maybe also strictly aligned targets.  A sparc-solaris2.11 cross
> shows these failures.  It looks like it doesn't like something about
> some of the 64 bit stores in the tests.
> 
> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized
> "_not_eliminated_" 0
> FAIL: gcc.dg/strlenopt-71.c (test for excess errors)
> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0
> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized
> "_not_eliminated_" 0


msp430-elf failures:


> New tests that FAIL (5 tests):
> 
> msp430-sim: gcc.dg/strlenopt-70.c (test for excess errors)
> msp430-sim: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "_not_eliminated_" 0
> msp430-sim: gcc.dg/strlenopt-71.c (test for excess errors)
> msp430-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "_not_eliminated_" 0
> msp430-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0


Re: [PATCH] handle multibyte stores larger than char in strlen (PR 91183, 86888)

2019-07-24 Thread Jeff Law
On 7/24/19 8:17 PM, Martin Sebor wrote:
>> Committed in r273783 after retesting and including a test for
>> the warning that I had left out of the patch I posted here.
>>
>> Martin
>>
>> PS I suspect some of the tests I added might need tweaking on
>> big-endian systems.  I'll deal with them tomorrow.
> 
> And maybe also strictly aligned targets.  A sparc-solaris2.11 cross
> shows these failures.  It looks like it doesn't like something about
> some of the 64 bit stores in the tests.
> 
> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized
> "_not_eliminated_" 0
> FAIL: gcc.dg/strlenopt-71.c (test for excess errors)
> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0
> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized
> "_not_eliminated_" 0

visium-elf:

> New tests that FAIL (16 tests):
> 
> visium-sim: gcc.dg/Wstringop-overflow-14.c  (test for warnings, line 24)
> visium-sim: gcc.dg/Wstringop-overflow-14.c (test for excess errors)
> visium-sim: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "_not_eliminated_" 0
> visium-sim: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
> visium-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "_not_eliminated_" 0
> visium-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0


[PATCH, rs6000] Fix PR91050 by adding a DRIVER_SELF_SPECS spec

2019-07-24 Thread Peter Bergner
The -mdejagnu-cpu= option was added as a way for a test case to ensure a
particular -mcpu= option is used to compile the test, regardless of whether
the user attempts to override it (purposely or accidentally) via RUNTESTFLAGS.
This was well and good, but the ASM_CPU_SPEC was not updated to handle
-mdejagnu-cpu=, so the option passed to the assembler is only determined
by -mcpu=, even if that is overridden by -mdejagnu-cpu=.  This can cause
cases of using the wrong assembler option.

We could just add -mdejagnu-cpu= support to ASM_CPU_SPEC, but that spec entry
is already WAY too huge and ugly and to add support for -mdejagnu-cpu=, we'd
have to essentially double the size of that spec.  The patch below takes
a different approach and removes Segher's original patch altogether and
instead implements -mdejagnu-cpu= using a DRIVER_SELF_SPECS spec which
simplifies things by not even needing to touch ASM_CPU_SPEC.  I also added
support for -mdejagnu-tune=, even though we don't have any test cases
at the moment that use it, since it was easy to add.

Segher, I tried your suggestion of writing the spec more generically
(ie, %{mdejagnu-*: ... -m*}), which worked in that it did the correct
replacement.  However, the %= 0)
-rs6000_cpu_index = rs6000_dejagnu_cpu_index;
-
   /* Process the -mcpu= and -mtune= argument.  If the user changed
  the cpu in a target attribute or pragma, but did not specify a tuning
  option, use the cpu for the tuning option rather than the option specified
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 273707)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -77,6 +77,15 @@
 #define PPC405_ERRATUM77 0
 #endif
 
+/* Only for use in the testsuite: -mdejagnu-cpu= simply overrides -mcpu=.
+   With older versions of Dejagnu the command line arguments you set in
+   RUNTESTFLAGS override those set in the testcases; with this option,
+   the testcase will always win.  Ditto for -mdejagnu-tune=.  */
+#define DRIVER_SELF_SPECS \
+  "%{mdejagnu-cpu=*: %

RE: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-24 Thread JiangNing OS
> -Original Message-
> From: Martin Sebor 
> Sent: Thursday, July 25, 2019 2:08 AM
> To: Jeff Law ; JiangNing OS
> ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for
> conditional store optimization
> 
> On 7/24/19 11:12 AM, Jeff Law wrote:
> > On 7/24/19 10:09 AM, Martin Sebor wrote:
> >> On 7/24/19 9:25 AM, Jeff Law wrote:
> >>> On 7/23/19 10:20 AM, Martin Sebor wrote:
>  On 7/22/19 10:26 PM, JiangNing OS wrote:
> > This patch is to fix PR91195. Is it OK for trunk?
> >
> > diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> > index 711a31ea597..4db36644160 100644
> > --- a/gcc/ChangeLog
> > +++ b/gcc/ChangeLog
> > @@ -1,3 +1,9 @@
> > +2019-07-22  Jiangning Liu  
> > +
> > +    PR middle-end/91195
> > +    * tree-ssa-phiopt.c (cond_store_replacement): Work around
> > +    -Wmaybe-uninitialized warning.
> > +
> >     2019-07-22  Stafford Horne  
> >       * config/or1k/or1k.c (or1k_expand_compare): Check for int before
> >
> > diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> > index b64bde695f4..7a86007d087 100644
> > --- a/gcc/tree-ssa-phiopt.c
> > +++ b/gcc/tree-ssa-phiopt.c
> > @@ -2240,6 +2240,14 @@ cond_store_replacement (basic_block middle_bb, basic_block join_bb,
> >   tree base = get_base_address (lhs);
> >   if (!auto_var_p (base) || TREE_ADDRESSABLE (base))
> >     return false;
> > +
> > +  /* The transformation below will inherently introduce a memory load,
> > +     for which LHS may not be initialized yet if it is not in NOTRAP,
> > +     so a -Wmaybe-uninitialized warning message could be triggered.
> > +     Since it's a bit hard to have a very accurate uninitialization
> > +     analysis to memory reference, we disable the warning here to avoid
> > +     confusion.  */
> > +  TREE_NO_WARNING (lhs) = 1;
> 
>  The no-warning bit is sometimes (typically?) set by the middle-end
>  after a warning has been issued, to avoid triggering other warnings
>  down the line for an already diagnosed expression.  Unless it's
>  done just for the purposes of a single pass and the bit is cleared
>  afterwards, using it to avoid potential false positives doesn't
>  seem like a robust solution.  It will mask warnings for constructs
>  that have been determined to be invalid.
> 
>  FWIW, the middle-end is susceptible to quite a few false positives
>  that would nice to avoid.  We have discussed various approaches to
>  the problem but setting the no-warning bit seems like too blunt of
>  an instrument.
> >>> All true.
> >>>
> >>> But in the case JiangNing is working with the transformation
> >>> inherently can introduce an uninitialized read.  It seems reasonable
> >>> to mark those loads he generates that don't have a dominating read.
> >>>
> >>> His code takes something like
> >>>
> >>>     if (x)
> >>>   *p = <something>
> >>>
> >>> And turns it into
> >>>
> >>>     t1 = *p;
> >>>     t2 = x ? <something> : t1;
> >>>     *p = t2;
> >>>
> >>> In the past we required there be a dominating read from *p (among
> >>> other restrictions).  That requirement was meant to ensure *p isn't
> >>> going to fault.  Jiang's work relaxes that requirement somewhat for
> >>> objects that we can prove aren't going to fault via other means.
> >>>
> >>> Can setting TREE_NO_WARNING on the inserted loads inhibit warnings?
> >>> Certainly.  However, I believe we use it in other places where we
> >>> know the code we're emitting is safe, but can cause a warning.  I
> >>> think Jiang's work falls into that category.
> >>>
> >>> I do think the bit should only be set if we don't have a dominating
> >>> load to minimize cases where we might inhibit a valid warning.
> >>
> >> I was thinking of a few cases where setting the no-warning bit might
> >> interfere with detecting bugs unrelated to uninitialized reads:
> >>
> >>    1) -Warray-bounds in gimple-ssa-warn-restrict and tree-vrp
> >>    2) -Wstringop-overflow in tree-ssa-strlen (other than for calls
> >>   to built-ins)
> >>
> >> I couldn't come up with a test case that shows how it might happen
> >> with this patch but I didn't spend too much time on it.
> > I bet we could find one and it's more likely to show up on aarch64
> > than
> > x86 due to costing issues.  Either way we're making a bit of a
> > judgment call -- an extra false positive here due to a load the
> > compiler inserted on a path that didn't have one before, or
> > potentially missing a warning on that load because of the
> TREE_NO_WARNING.
> >
> > I believe the false positive here is worse than the potential missed
> > warning.
> >
> >
> >>
> >> There are a number of existing instances of setting TREE_NO_WARNING
> >> to suppress -Wuninitialized, and some are the cause of known problems.
> >> Bugs 51545, 58950, 74762, 
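
A standalone sketch of the scenario under discussion (illustrative, not
taken from the PR):

int g;

int
f (int x)
{
  int a[2];
  if (x)
    a[0] = g;	/* cond_store_replacement may rewrite this as
		   t1 = a[0]; t2 = x ? g : t1; a[0] = t2; which
		   introduces a read of the uninitialized a[0] on
		   the !x path and can trigger -Wmaybe-uninitialized
		   on code that never read it.  */
  return x ? a[0] : 0;
}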

Re: PR90724 - ICE with __sync_bool_compare_and_swap with -march=armv8.2-a

2019-07-24 Thread Prathamesh Kulkarni
On Wed, 17 Jul 2019 at 18:15, Prathamesh Kulkarni
 wrote:
>
> On Wed, 17 Jul 2019 at 13:45, Kyrill Tkachov
>  wrote:
> >
> > Hi Prathamesh
> >
> > On 7/10/19 12:24 PM, Prathamesh Kulkarni wrote:
> > > Hi,
> > > For following test-case,
> > > static long long AL[24];
> > >
> > > int
> > > check_ok (void)
> > > {
> > >   return (__sync_bool_compare_and_swap (AL+1, 0x200000003ll,
> > > 0x1234567890ll));
> > > }
> > >
> > > Compiling with -O2 -march=armv8.2-a results in:
> > > pr90724.c: In function ‘check_ok’:
> > > pr90724.c:7:1: error: unrecognizable insn:
> > > 7 | }
> > >   | ^
> > > (insn 11 10 12 2 (set (reg:CC 66 cc)
> > > (compare:CC (reg:DI 95)
> > > (const_int 8589934595 [0x200000003]))) "pr90724.c":6:11 -1
> > >  (nil))
> > >
> > > IIUC, the issue is that 0x200000003 falls outside the range of
> > > allowable immediates for cmp?  If it's replaced by a small constant then
> > > it works.
> > >
> > > The ICE results with -march=armv8.2-a because we enter the if
> > > (TARGET_LSE) { ... } path
> > > in aarch64_expand_compare_and_swap, while with -march=armv8-a it goes
> > > into the else branch,
> > > which forces oldval into a register if the predicate fails to match.
> > >
> > > The attached patch checks whether y (oldval) satisfies the
> > > aarch64_plus_operand predicate and, if not, forces it into a register,
> > > which resolves the ICE.  Does it look OK?
> > >
> > > Bootstrap+testing in progress on aarch64-linux-gnu.
> > >
> > > PS: The issue has nothing to do with SVE, which I incorrectly
> > > mentioned in bug report.
> > >
> > This looks ok to me (but you'll need maintainer approval).
> >
> > Does this fail on the branches as well?
> Hi Kyrill,
> Thanks for the review. The test also fails on gcc-9-branch (but not on gcc-8).
Hi James,
Is the patch OK to commit?
https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> >
> > Kyrill
> >
> >
> > > Thanks,
> > > Prathamesh
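
A sketch of the change being described (reconstructed from the
description above, not copied from the actual patch at the URL): in the
TARGET_LSE path of aarch64_expand_compare_and_swap in
config/aarch64/aarch64.c, legitimize oldval before emitting the pattern:

  /* A constant like 0x200000003 is not a valid immediate for cmp, so
     aarch64_plus_operand rejects it; loading it into a register keeps
     the generated compare insn recognizable.  */
  if (!aarch64_plus_operand (oldval, mode))
    oldval = force_reg (mode, oldval);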