Re: [PATCH] lto: fix LTO debug sections copying.

2020-10-06 Thread Martin Liška

On 10/5/20 6:34 PM, Ian Lance Taylor wrote:

On Mon, Oct 5, 2020 at 9:09 AM Martin Liška  wrote:


The previous patch was not correct. This one should be.

Ready for master?


I don't understand why this code uses symtab_indices_shndx at all.
There should only be one SHT_SYMTAB_SHNDX section.  There shouldn't be
any need for the symtab_indices_shndx vector.


Well, the question is whether we can have multiple .symtab sections in one ELF
file.  Theoretically yes, so we should also handle multiple SHT_SYMTAB_SHNDX sections.
Note that the original usage of the SHT_SYMTAB_SHNDX section was motivated
by PR81968 which is about Solaris ld.
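
For illustration, a minimal sketch (mine, in terms of raw ELF structures rather
than the simple-object code) of why a vector is the natural representation:
each SHT_SYMTAB_SHNDX section names its owning symtab via sh_link, so several
symtabs would simply mean several entries.

#include <elf.h>
#include <vector>

/* Sketch: collect every SHT_SYMTAB_SHNDX section index; sh_link ties
   each one back to the symtab it extends.  */
static std::vector<unsigned>
find_symtab_shndx_sections (const Elf64_Shdr *shdrs, unsigned shnum)
{
  std::vector<unsigned> indices;
  for (unsigned i = 0; i < shnum; i++)
    if (shdrs[i].sh_type == SHT_SYMTAB_SHNDX)
      indices.push_back (i);
  return indices;
}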



But in any case this patch looks OK.


Waiting for feedback from Richi.

Thanks,
Martin



Thanks.

Ian





Re: make sincos take type from intrinsic formal, not from result assignment

2020-10-06 Thread Alexandre Oliva
On Oct  6, 2020, Richard Biener  wrote:

> On October 6, 2020 3:15:02 AM GMT+02:00, Alexandre Oliva  
> wrote:
>> 
>> This is a first step towards enabling the sincos optimization in Ada.
>> 
>> The issue this patch solves is that sincos takes the type to be looked
>> up with mathfn_built_in from variables or temporaries in which results
>> of sin and cos are stored.  In Ada, sin and cos are declared in an
>> internal aux package, with uses thereof in a standard generic package,
>> which ensures that the types are not what mathfn_built_in expects.

> But are they not compatible? 

They are, in that they use the same underlying representation, but
they're distinct types, not associated with the same TYPE_MAIN_VARIANT.

In Ada it's not unusual to have multiple floating-point types unrelated
to each other, even if they share identical underlying representation.
Each such type is a distinct type, much as in C++ each
struct type holding a single double field is a distinct type.

Each such distinct FP type gets a different instantiation of
Ada.Numerics.Generic_Elementary_Functions, just as a C++ template taking
a parameter type would get different instantiations for such different
struct types.
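
A C++ rendering of the analogy, as a sketch of my own (not from the Ada
sources):

/* Two distinct types with the same underlying representation.  */
struct Meters { double value; };
struct Feet   { double value; };

/* Stand-in for Ada.Numerics.Generic_Elementary_Functions: a separate
   instantiation per distinct type, even though both wrap a double.  */
template<typename T>
struct ElementaryFunctions
{
  static T sin (T x) { return x; }  /* placeholder body */
};

/* ElementaryFunctions<Meters> and ElementaryFunctions<Feet> are unrelated
   types, just as two instantiations of the Ada generic are.  */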


Overall, it's a very confusing situation.  We use these alternate types
to declare the Sin and Cos functions imported from libm as intrinsics
(separate patch I've written very recently, yet to be contributed), and
they get matched to the libm intrinsics despite the distinct types, we
issue calls to them, passing variables of the alternate types without
explicit conversions, but when the sincos pass looks up the sincos/cexpi
intrinsic, it uses the alternate type taken from the variable and fails,
rather than the types declared as taken by the builtins.
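
To make the subject line concrete, here is a minimal sketch of the direction
(my guess at the shape, using existing GCC accessors; the actual patch may
differ): take the lookup type from the builtin's formal parameter instead of
from the result variable.

/* Look up cexpi from the formal type of the sin/cos builtin that was
   called, rather than from the alternate-typed variable the result is
   stored in.  */
static tree
cexpi_from_formal (gcall *call)
{
  tree fndecl = gimple_call_fndecl (call);
  if (!fndecl)
    return NULL_TREE;
  tree formals = TYPE_ARG_TYPES (TREE_TYPE (fndecl));
  if (!formals)
    return NULL_TREE;
  return mathfn_built_in (TREE_VALUE (formals), BUILT_IN_CEXPI);
}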

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


[committed] openmp: Fix ICE in omp_discover_declare_target_tgt_fn_r [PR97289]

2020-10-06 Thread Jakub Jelinek via Gcc-patches
Hi!

This ICEs because node->alias_target is (not yet) a FUNCTION_DECL, but
IDENTIFIER_NODE.

I guess we should retry the discovery before LTO streaming out; the reason
to do it this early is that it can affect the gimplification and omp lowering.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-10-06  Jakub Jelinek  

PR middle-end/97289
* omp-offload.c (omp_discover_declare_target_tgt_fn_r): Only follow
node->alias_target if it is a FUNCTION_DECL.

* c-c++-common/gomp/pr97289.c: New test.

--- gcc/omp-offload.c.jj 2020-10-01 10:40:10.0 +0200
+++ gcc/omp-offload.c   2020-10-05 11:45:40.450501897 +0200
@@ -203,7 +203,8 @@ omp_discover_declare_target_tgt_fn_r (tr
   symtab_node *node = symtab_node::get (*tp);
   if (node != NULL)
{
- while (node->alias_target)
+ while (node->alias_target
+&& TREE_CODE (node->alias_target) == FUNCTION_DECL)
{
  if (!omp_declare_target_fn_p (node->decl)
  && !lookup_attribute ("omp declare target host",
--- gcc/testsuite/c-c++-common/gomp/pr97289.c.jj 2020-10-05 
11:48:58.818623202 +0200
+++ gcc/testsuite/c-c++-common/gomp/pr97289.c   2020-10-05 11:48:38.631916154 
+0200
@@ -0,0 +1,14 @@
+/* PR middle-end/97289 */
+/* { dg-do compile } */
+/* { dg-require-weak "" } */
+/* { dg-skip-if "" { "hppa*-*-hpux*" "*-*-aix*" "nvptx-*-*" } } */
+
+void foo (void);
+static void bar (void) __attribute__ ((__weakref__ ("foo")));
+
+void
+baz (void)
+{
+#pragma omp target
+  bar ();
+}


Jakub



Re: [PATCH] RISC-V: Derive ABI from -march if -mabi is not present.

2020-10-06 Thread Andreas Schwab
On Okt 06 2020, Kito Cheng wrote:

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index f623467b7637..c6ba738aa0b7 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -25928,7 +25928,14 @@ allows floating-point values up to 32 bits long to 
> be passed in registers; or
>  @samp{-march=rv64ifd -mabi=lp64}, in which no floating-point arguments will 
> be
>  passed in registers.
>  
> -The default for this argument is system dependent, users who want a specific
> +When @option{-mabi=} is not specified, the default value will derived from
> +@option{-march=}, the rules is using @samp{d} ABI variant if D extension is
> +enabled, otherwise using soft-float ABI variant even F extension is enabled,
> +there is an special rule for @samp{rv32e} variant is it always use
> +@samp{ilp32e}.

When @option{-mabi=} is not specified, the default value will be derived
from @option{-march=}.  If the D extension is enabled use the @samp{d}
ABI variant, otherwise use the soft-float ABI variant even if the F
extension is enabled.  For the @samp{rv32e} architecture the default is
@samp{ilp32e}.

> +
> +If @option{-march} and @option{-mabi=} both are not specified, the default 
> for

If both ... are not specified

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[PATCH] divmod: Match and expand DIVMOD even in some cases of constant divisor [PR97282]

2020-10-06 Thread Jakub Jelinek via Gcc-patches
Hi!

As written in the comment, tree-ssa-math-opts.c wouldn't create a DIVMOD
ifn call for division + modulo by a constant, for fear that during
expansion we could generate better code for those cases.
If the divisor is a power of two, that is certainly always the case,
but otherwise expand_divmod can punt in many cases, e.g. if the division
type's precision is above HOST_BITS_PER_WIDE_INT, we don't even call
choose_multiplier, because it works on HOST_WIDE_INTs (true, something
we should fix eventually now that we have wide_ints), or if the pre/post
shift is larger than BITS_PER_WORD.

So, the following patch recognizes DIVMOD with a constant last argument even
when it is unclear whether expand_divmod will be able to optimize it.  Then,
during DIVMOD expansion, if the divisor is constant, it attempts to expand
the call as division + modulo; if those actually don't contain any libcalls
or division/modulo, they are kept as is, otherwise that sequence is thrown
away and the divmod optab or libcall is used.
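
As a concrete example of what this enables (my example; assumes a 64-bit
target, where __int128 exceeds HOST_BITS_PER_WIDE_INT so inline expansion
punts):

/* Previously x / 3 and x % 3 would each expand to their own libcall
   (__udivti3 and __umodti3); recognizing DIVMOD allows a single
   __udivmodti4 call instead.  */
unsigned __int128
quot_rem (unsigned __int128 x, unsigned __int128 *rem)
{
  *rem = x % 3;
  return x / 3;
}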

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-10-06  Jakub Jelinek  

PR rtl-optimization/97282
* tree-ssa-math-opts.c (divmod_candidate_p): Don't return false for
constant op2 if it is not a power of two and the type has precision
larger than HOST_BITS_PER_WIDE_INT or BITS_PER_WORD.
* internal-fn.c (contains_call_div_mod): New function.
(expand_DIVMOD): If last argument is a constant, try to expand it as
TRUNC_DIV_EXPR followed by TRUNC_MOD_EXPR, but if the sequence
contains any calls or {,U}{DIV,MOD} rtxes, throw it away and use
divmod optab or divmod libfunc.

* gcc.target/i386/pr97282.c: New test.

--- gcc/tree-ssa-math-opts.c.jj 2020-10-01 10:40:10.104755999 +0200
+++ gcc/tree-ssa-math-opts.c 2020-10-05 13:51:54.476628287 +0200
@@ -3567,9 +3567,24 @@ divmod_candidate_p (gassign *stmt)
 
   /* Disable the transform if either is a constant, since division-by-constant
  may have specialized expansion.  */
-  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
+  if (CONSTANT_CLASS_P (op1))
 return false;
 
+  if (CONSTANT_CLASS_P (op2))
+{
+  if (integer_pow2p (op2))
+   return false;
+
+  if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
+ && TYPE_PRECISION (type) <= BITS_PER_WORD)
+   return false;
+
+  /* If the divisor is not power of 2 and the precision wider than
+HWI, expand_divmod punts on that, so in that case it is better
+to use divmod optab or libfunc.  Similarly if choose_multiplier
+might need pre/post shifts of BITS_PER_WORD or more.  */
+}
+
   /* Exclude the case where TYPE_OVERFLOW_TRAPS (type) as that should
  expand using the [su]divv optabs.  */
   if (TYPE_OVERFLOW_TRAPS (type))
--- gcc/internal-fn.c.jj 2020-10-02 10:36:43.272290992 +0200
+++ gcc/internal-fn.c   2020-10-05 15:15:12.498349327 +0200
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
 #include "explow.h"
+#include "rtl-iter.h"
 
 /* The names of each internal function, indexed by function number.  */
 const char *const internal_fn_name_array[] = {
@@ -2985,6 +2986,32 @@ expand_gather_load_optab_fn (internal_fn
 emit_move_insn (lhs_rtx, ops[0].value);
 }
 
+/* Helper for expand_DIVMOD.  Return true if the sequence starting with
+   INSN contains any call insns or insns with {,U}{DIV,MOD} rtxes.  */
+
+static bool
+contains_call_div_mod (rtx_insn *insn)
+{
+  subrtx_iterator::array_type array;
+  for (; insn; insn = NEXT_INSN (insn))
+if (CALL_P (insn))
+  return true;
+else if (INSN_P (insn))
+  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST)
+   switch (GET_CODE (*iter))
+ {
+ case CALL:
+ case DIV:
+ case UDIV:
+ case MOD:
+ case UMOD:
+   return true;
+ default:
+   break;
+ }
+  return false;
+ }
+
 /* Expand DIVMOD() using:
  a) optab handler for udivmod/sdivmod if it is available.
  b) If optab_handler doesn't exist, generate call to
@@ -3007,10 +3034,44 @@ expand_DIVMOD (internal_fn, gcall *call_
   rtx op1 = expand_normal (arg1);
   rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
 
-  rtx quotient, remainder, libfunc;
+  rtx quotient = NULL_RTX, remainder = NULL_RTX;
+  rtx_insn *insns = NULL;
+
+  if (TREE_CODE (arg1) == INTEGER_CST)
+{
+  /* For DIVMOD by integral constants, there could be efficient code
+expanded inline e.g. using shifts and plus/minus.  Try to expand
+the division and modulo and if it emits any library calls or any
+{,U}{DIV,MOD} rtxes throw it away and use a divmod optab or
+divmod libcall.  */
+  struct separate_ops ops;
+  ops.code = TRUNC_DIV_EXPR;
+  ops.type = type;
+  ops.op0 = make_tree (ops.type, op0);
+  ops.op1 = arg1;
+  ops.op2 = NULL_TREE;
+  ops.locat

[PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Aldy Hernandez via Gcc-patches

Pushed as obvious.

gcc/ChangeLog:

* value-range.h (irange_allocator::allocate): Increase
newir storage by one.
---
 gcc/value-range.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/value-range.h b/gcc/value-range.h
index 94b48e55e77..7031a823138 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -670,7 +670,7 @@ irange_allocator::allocate (unsigned num_pairs)

   struct newir {
 irange range;
-tree mem[1];
+tree mem[2];
   };
   size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
   struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);
--
2.26.2



Re: [PATCH] optimize permutes in SLP, remove vect_attempt_slp_rearrange_stmts

2020-10-06 Thread Richard Biener
On Fri, 2 Oct 2020, Richard Sandiford wrote:

> Richard Biener  writes:
> > This introduces a permute optimization phase for SLP which is
> > intended to cover the existing permute eliding for SLP reductions
> > plus handling commonizing the easy cases.
> >
> > It currently uses graphds to compute a postorder on the reverse
> > SLP graph and it handles all cases vect_attempt_slp_rearrange_stmts
> > did (hopefully - I've adjusted most testcases that triggered it
> > a few days ago).  It restricts itself to move around bijective
> > permutations to simplify things for now, mainly around constant nodes.
> >
> > As a prerequesite it makes the SLP graph cyclic (ugh).  It looks
> > like it would pay off to compute a PRE/POST order visit array
> > once and elide all the recursive SLP graph walks and their
> > visited hash-set.  At least for the time where we do not change
> > the SLP graph during such walk.
> >
> > I do not like using graphds too much but at least I don't have to
> > re-implement yet another RPO walk, so maybe it isn't too bad.
> >
> > Comments are welcome - I do want to see vect_attempt_slp_rearrange_stmts
> > go way for GCC 11 and the permute optimization helps non-store
> > BB vectorization opportunities where we can end up with a lot of
> > useless load permutes otherwise.
> 
> Looks really nice.  Got a couple of questions that probably just show
> my misunderstanding :-)
> 
> Is this intended to compute an optimal-ish solution?

The intent was to keep it simple but compute a solution that will
not increase the number of permutes.

> It looked from
> a quick read like it tried to push permutes as far away from loads as
> possible without creating permuted and unpermuted versions of the same
> node.  But I guess there will be cases where the optimal placement is
> somewhere between the two extremes of permuting at the loads and
> permuting as far away as possible.

So what it does is that it pushes permutes away from the loads until
there's a use requiring a different permutation.  But handling of
constants/externals as having "all" permutations causes us to push
permutes along binary ops with one constant/external argument (in
addition to pushing it along all unary operations).
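
A source-level illustration of that rule (my example, not from the testsuite):

/* The two lanes load in[1]/in[0], i.e. a {1,0} load permute.  Negation
   is lane-wise, and the constant operand of the addition accepts any
   permutation (the constant vector can be built pre-permuted), so the
   permute can be pushed from the loads down through both operations.  */
void f (double *out, const double *in)
{
  out[0] = -in[1] + 1.0;
  out[1] = -in[0] + 2.0;
}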

I have some patches that try to unify constant/external nodes during
SLP build (we're currently _not_ sharing them, thus not computing their
cost correctly) - once that's in (not sure if it happens this stage1)
it would make sense to try to not have too many different permutes
of constants/externals (esp. externals I guess).

Now, did you have some other sub-optimality in mind?

> Of course, whatever we do will be a heuristic.  I just wasn't sure how
> often this would be best in practice.

Yeah, so I'm not sure where in a "series" of unary ops we'd want to
push a permutation.  The argument could be to leave it at the load
for as little as possible changes from the current handling.  That
could be done with a reverse propagation stage.  I'll see if
splitting out some predicates from the current code makes it not
too much duplication to introduce this.

> It looks like the materialisation phase changes the choices for nodes
> on the fly, is that right?  If so, how does that work for backedges?
> I'd expected the materialisation phase to treat the permutation choice
> as read-only, and simply implement what the graph already said.

The materialization phase is also the decision stage (wanted to avoid
duplicating the loop).  When we materialize a permutation at the
node which has differing uses we have to update the graph from there.
As for backedges I wasn't sure and indeed there may be bugs - I do
have to investigate one libgomp FAIL from the testing.  It would be
odd to require iteration in the decision stage again but in case we're
breaking a cycle we have to re-consider the backedge permutation as well.
Which would mean we'd better make the decision where to materialize during
the propagation stage(?)

I'm going to analyze the FAIL now.

Richard.

> Thanks,
> Richard
> 
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > Richard.
> >
> > 2020-10-02  Richard Biener  
> >
> > * tree-vect-data-refs.c (vect_slp_analyze_instance_dependence):
> > Use SLP_TREE_REPRESENTATIVE.
> > * tree-vectorizer.h (_slp_tree::vertex): New member used
> > for graphds interfacing.
> > * tree-vect-slp.c (vect_build_slp_tree_2): Allocate space
> > for PHI SLP children.
> > (vect_analyze_slp_backedges): New function filling in SLP
> > node children for PHIs that correspond to backedge values.
> > (vect_analyze_slp): Call vect_analyze_slp_backedges for the
> > graph.
> > (vect_slp_analyze_node_operations): Deal with a cyclic graph.
> > (vect_schedule_slp_instance): Likewise.
> > (vect_schedule_slp): Likewise.
> > (slp_copy_subtree): Remove.
> > (vect_slp_rearrange_stmts): Likewise.
> > (vect_attempt_slp_rearrange_stmts): Likewise.
> > (vect_slp_build_v

Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Richard Biener via Gcc-patches
On Fri, Oct 2, 2020 at 3:23 PM Martin Liška  wrote:
>
> On 9/24/20 2:41 PM, Richard Biener wrote:
> > On Wed, Sep 2, 2020 at 1:53 PM Martin Liška  wrote:
> >>
> >> On 9/1/20 4:50 PM, David Malcolm wrote:
> >>> Hope this is constructive
> >>> Dave
> >>
> >> Thank you David. All of them very very useful!
> >>
> >> There's updated version of the patch.
> >
> > I noticed several functions without a function-level comment.
> >
> > -  cluster (tree case_label_expr, basic_block case_bb, profile_probability 
> > prob,
> > -  profile_probability subtree_prob);
> > +  inline cluster (tree case_label_expr, basic_block case_bb,
> > + profile_probability prob, profile_probability 
> > subtree_prob);
> >
> > I thought we generally leave this to the compiler ...
>
> Hey.
>
> This one is needed, otherwise we'll have a compilation error (multiple 
> definitions).
>
> >
> > +@item -fconvert-if-to-switch
> > +@opindex fconvert-if-to-switch
> > +Perform conversion of an if cascade into a switch statement.
> > +Do so if the switch can be later transformed using a jump table
> > +or a bit test.  The transformation can help to produce faster code for
> > +the switch statement.  This flag is enabled by default
> > +at @option{-O2} and higher.
> >
> > this mentions we do this only when we later can convert the
> > switch again but both passes (we still have two :/) have
> > independent guards.
>
> All right, I'm planning to come up with -fbit-tests options and this 
> transformation
> will happen only if BT or JT are enabled.
>
> >
> > +  /* For now, just wipe the dominator information.  */
> > +  free_dominance_info (CDI_DOMINATORS);
> >
> > could at least be conditional on the vop renaming condition...
>
> How do you mean this?

don't free dominators if you didn't change anything.

> >
> > +  if (!all_candidates.is_empty ())
> > +mark_virtual_operands_for_renaming (fun);
> >
> > +  if (bitmap_bit_p (*visited_bbs, bb->index))
> > +   break;
> > +  bitmap_set_bit (*visited_bbs, bb->index);
> >
> > since you are using a bitmap and not a sbitmap (why?)
> > you can combine those into
>
> Yes, sbitmap would be better.
>
> >
> > if (!bitmap_set_bit (*visited_bbs, bb->index))
> >  break;
>
> Unfortunately, bitmap_set_bit for sbitmap is a void return function.
> Should I change it?

No, with sbitmaps you have to keep your current code.

> >
> > +  /* Current we support following patterns (situations):
> > +
> > +1) if condition with equal operation:
> > +
> > ...
> >
> > did you see whether using
> >
> > register_edge_assert_for (lhs, true_edge, code, lhs, rhs, asserts);
> >
> > works equally well?  It fills the 'asserts' vector with relations
> > derived from 'lhs'.  There's also
> > vr_values::extract_range_for_var_from_comparison_expr
> > to compute the case_range
> >
> > +  /* If it's not the first condition, then we need a BB without
> > +any statements.  */
> > +  if (!first)
> > +   {
> > + unsigned stmt_count = 0;
> > + for (gimple_stmt_iterator gsi = gsi_start_nondebug_bb (bb);
> > +  !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
> > +   ++stmt_count;
> > +
> > + if (stmt_count - visited_stmt_count != 0)
> > +   break;
> >
> > hmm, OK, this might be a bit iffy to get correct then, still it's a lot
> > of pattern maching code that is there elsewhere already.
> > ifcombine simply hoists any stmts without side-effects up the
> > dominator tree and thus only requires BBs without side-effects
> > (IIRC there's a predicate fn for that).
> >
> > +  /* Prevent losing information for a PHI node where 2 edges will
> > +be folded into one.  Note that we must do the same also for 
> > false_edge
> > +(for last BB in a if-elseif chain).  */
> > +  if (!chain->record_phi_arguments (true_edge)
> > + || !chain->record_phi_arguments (false_edge))
> >
> > I don't really get this - looking at record_phi_arguments it seems
> > we're requiring that all edges into the same PHI from inside the case
> > (irrespective of from which case label) have the same value for the
> > PHI arg?
> >
> > + if (arg != *v)
> > +   return false;
>
> This one is really needed for situations like:
>
> cat wchar.i
> int i;
>
> int
> pg_utf_mblen() {
>int len;
>if (i == 4)
>  len = 3;
>else if (i == 2)
>  len = 4;
>else if (i == 6)
>  len = 1;
>return len;
> }
>
> where we end up just with one edge from switch BB to a destination BB where
> we have the PHI:
># len_4 = PHI <3(2), 4(3), len_6(D)(4), 1(5)>

Yeah, see my comment on how to deal with this in code generation
(introduce a forwarder block).

> >
> > should use operand_equal_p at least, REAL_CSTs are for example
> > not shared tree nodes.  I'll also notice that if record_phi_arguments
> > fails we still may have altered its hash-map even though the particular
> > edge will not participate in t

Re: [PATCH][ftracer] Factor out can_duplicate_bb_p

2020-10-06 Thread Richard Biener
On Mon, 5 Oct 2020, Tom de Vries wrote:

> [ was: Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region ]
> 
> On 10/5/20 9:05 AM, Tom de Vries wrote:
> > Ack, updated the patch accordingly, and split it up in two bits, one
> > that does refactoring, and one that adds the actual caching:
> > - [ftracer] Factor out can_duplicate_bb_p
> > - [ftracer] Add caching of can_duplicate_bb_p
> > 
> > I'll post these in reply to this email.
> > 
> 
> OK?

OK.

Richard.


Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 09:37:21AM +0200, Aldy Hernandez via Gcc-patches wrote:
> Pushed as obvious.
> 
> gcc/ChangeLog:
> 
>   * value-range.h (irange_allocator::allocate): Increase
>   newir storage by one.
> ---
>  gcc/value-range.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/value-range.h b/gcc/value-range.h
> index 94b48e55e77..7031a823138 100644
> --- a/gcc/value-range.h
> +++ b/gcc/value-range.h
> @@ -670,7 +670,7 @@ irange_allocator::allocate (unsigned num_pairs)
> 
>struct newir {
>  irange range;
> -tree mem[1];
> +tree mem[2];
>};
>size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
>struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);

So, we essentially want a flexible array member, which C++ without extensions
doesn't have, and thus need to rely on the compiler treating the trailing
array as a poor man's flexible array member (GCC does so for any trailing
array size, but I'm not 100% sure other compilers do; they might e.g. only
handle size 1 that way).
Is there any reason why the code is written that way?
I mean, we could just use:
  size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;
  irange *r = (irange *) obstack_alloc (&m_obstack, nbytes);
  return new (r) irange ((tree *) (r + 1), num_pairs);
without any new type.
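
For reference, the arithmetic behind the off-by-one, as a worked check of mine
against the allocator's formula:

/* nbytes = sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1) reserves
   mem_len + 2 * (num_pairs - 1) trees after the irange header, while an
   irange with num_pairs pairs needs 2 * num_pairs bounds.  */
constexpr unsigned slots (unsigned mem_len, unsigned num_pairs)
{ return mem_len + 2 * (num_pairs - 1); }

static_assert (slots (1, 3) == 5, "tree mem[1] is one short of the 6 needed");
static_assert (slots (2, 3) == 6, "tree mem[2] yields exactly the 6 needed");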

Jakub



Re: [PATCH][ftracer] Add caching of can_duplicate_bb_p

2020-10-06 Thread Richard Biener
On Mon, 5 Oct 2020, Tom de Vries wrote:

> [ was: Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region ]
> 
> On 10/5/20 9:05 AM, Tom de Vries wrote:
> > Ack, updated the patch accordingly, and split it up in two bits, one
> > that does refactoring, and one that adds the actual caching:
> > - [ftracer] Factor out can_duplicate_bb_p
> > - [ftracer] Add caching of can_duplicate_bb_p
> > 
> > I'll post these in reply to this email.
> 
> OK?

OK.

Richard.


Re: [PATCH] lto: fix LTO debug sections copying.

2020-10-06 Thread Richard Biener via Gcc-patches
On Tue, Oct 6, 2020 at 9:01 AM Martin Liška  wrote:
>
> On 10/5/20 6:34 PM, Ian Lance Taylor wrote:
> > On Mon, Oct 5, 2020 at 9:09 AM Martin Liška  wrote:
> >>
> >> The previous patch was not correct. This one should be.
> >>
> >> Ready for master?
> >
> > I don't understand why this code uses symtab_indices_shndx at all.
> > There should only be one SHT_SYMTAB_SHNDX section.  There shouldn't be
> > any need for the symtab_indices_shndx vector.
>
> Well, the question is whether we can have multiple .symtab sections in one ELF
> file.  Theoretically yes, so we should also handle multiple SHT_SYMTAB_SHNDX sections.
> Note that the original usage of the SHT_SYMTAB_SHNDX section was motivated
> by PR81968 which is about Solaris ld.

It wasn't my code but I suppose this way the implementation was
"easiest".  There
should be exactly one symtab / shndx section.  Rainer authored this support.

> >
> > But in any case this patch looks OK.

I also think the patch looks OK.  Rainer?

Richard.

> Waiting for feedback from Richi.
>
> Thanks,
> Martin
>
> >
> > Thanks.
> >
> > Ian
> >
>


Re: make sincos take type from intrinsic formal, not from result assignment

2020-10-06 Thread Richard Biener via Gcc-patches
On Tue, Oct 6, 2020 at 9:21 AM Alexandre Oliva  wrote:
>
> On Oct  6, 2020, Richard Biener  wrote:
>
> > On October 6, 2020 3:15:02 AM GMT+02:00, Alexandre Oliva 
> >  wrote:
> >>
> >> This is a first step towards enabling the sincos optimization in Ada.
> >>
> >> The issue this patch solves is that sincos takes the type to be looked
> >> up with mathfn_built_in from variables or temporaries in which results
> >> of sin and cos are stored.  In Ada, sin and cos are declared in an
> >> internal aux package, with uses thereof in a standard generic package,
> >> which ensures that the types are not what mathfn_built_in expects.
>
> > But are they not compatible?
>
> They are, in that they use the same underlying representation, but
> they're distinct types, not associated with the same TYPE_MAIN_VARIANT.
>
> In Ada it's not unusual to have multiple floating-point types unrelated
> to each other, even if they share identical underlying representation.
> Each such type is a distinct type, much as in C++ each
> struct type holding a single double field is a distinct type.
>
> Each such distinct FP type gets a different instantiation of
> Ada.Numerics.Generic_Elementary_Functions, just as a C++ template taking
> a parameter type would get different instantiations for such different
> struct types.
>
>
> Overall, it's a very confusing situation.  We use these alternate types
> to declare the Sin and Cos functions imported from libm as intrinsics
> (separate patch I've written very recently, yet to be contributed), and
> they get matched to the libm intrinsics despite the distinct types, we
> issue calls to them, passing variables of the alternate types without
> explicit conversions, but when the sincos pass looks up the sincos/cexpi
> intrinsic, it uses the alternate type taken from the variable and fails,
> rather than the types declared as taken by the builtins.

OK, I see.  mathfn_built_in expects a type inter-operating with
the C ABI types (float_type_node, double_type_node, etc.) where
"inter-operating" means having the same main variant.

Now, I guess for the sincos pass we want to combine sinl + cosl
to sincosl, independently of whether the result would be
assigned to a 'double' when 'double == long double'?  And
what about sinl + cos when 'double == long double'?  What I'm after
is, rather than going with something like your patch, using the original
builtin code to derive the canonical C ABI type, sort of a "reverse"
mathfn_built_in.  In execute_cse_sincos_1 we have the

  switch (gimple_call_combined_fn (use_stmt))
{
CASE_CFN_COS:
  seen_cos |= maybe_record_sincos (&stmts, &top_bb, use_stmt) ? 1 : 0;
  break;

CASE_CFN_SIN:
  seen_sin |= maybe_record_sincos (&stmts, &top_bb, use_stmt) ? 1 : 0;
  break;

CASE_CFN_CEXPI:
  seen_cexpi |= maybe_record_sincos (&stmts, &top_bb, use_stmt) ? 1 : 0;
  break;

default:;
}

switch so we _do_ have an idea what builtins we call and thus
we could add a

tree
mathfn_type (combined_fn fn)
{
  switch (fn)
{
case all-double-typed-fns:
   return double_type_node;
case ...:
...
}

function that might prove useful in other places where we're trying to find
a "matching" alternate builtin function?  Maybe this function could
also simply look at the implicitly/explicitly defined decls, but at least
for the case of lrint, using the return value will be misleading.
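
For concreteness, one way that sketch could be fleshed out (mine; only the
sin/cos/cexpi cases are shown, and only the BUILT_IN_ variants):

static tree
mathfn_type (combined_fn fn)
{
  switch (fn)
    {
    case CFN_BUILT_IN_SINF:
    case CFN_BUILT_IN_COSF:
    case CFN_BUILT_IN_CEXPIF:
      return float_type_node;
    case CFN_BUILT_IN_SIN:
    case CFN_BUILT_IN_COS:
    case CFN_BUILT_IN_CEXPI:
      return double_type_node;
    case CFN_BUILT_IN_SINL:
    case CFN_BUILT_IN_COSL:
    case CFN_BUILT_IN_CEXPIL:
      return long_double_type_node;
    default:
      return NULL_TREE;
    }
}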

Richard.


> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer


Re: [PATCH] divmod: Match and expand DIVMOD even in some cases of constant divisor [PR97282]

2020-10-06 Thread Richard Biener
On Tue, 6 Oct 2020, Jakub Jelinek wrote:

> Hi!
> 
> As written in the comment, tree-ssa-math-opts.c wouldn't create a DIVMOD
> ifn call for division + modulo by a constant, for fear that during
> expansion we could generate better code for those cases.
> If the divisor is a power of two, that is certainly always the case,
> but otherwise expand_divmod can punt in many cases, e.g. if the division
> type's precision is above HOST_BITS_PER_WIDE_INT, we don't even call
> choose_multiplier, because it works on HOST_WIDE_INTs (true, something
> we should fix eventually now that we have wide_ints), or if the pre/post
> shift is larger than BITS_PER_WORD.
> 
> So, the following patch recognizes DIVMOD with a constant last argument even
> when it is unclear whether expand_divmod will be able to optimize it.  Then,
> during DIVMOD expansion, if the divisor is constant, it attempts to expand
> the call as division + modulo; if those actually don't contain any libcalls
> or division/modulo, they are kept as is, otherwise that sequence is thrown
> away and the divmod optab or libcall is used.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2020-10-06  Jakub Jelinek  
> 
>   PR rtl-optimization/97282
>   * tree-ssa-math-opts.c (divmod_candidate_p): Don't return false for
>   constant op2 if it is not a power of two and the type has precision
>   larger than HOST_BITS_PER_WIDE_INT or BITS_PER_WORD.
>   * internal-fn.c (contains_call_div_mod): New function.
>   (expand_DIVMOD): If last argument is a constant, try to expand it as
>   TRUNC_DIV_EXPR followed by TRUNC_MOD_EXPR, but if the sequence
>   contains any calls or {,U}{DIV,MOD} rtxes, throw it away and use
>   divmod optab or divmod libfunc.
> 
>   * gcc.target/i386/pr97282.c: New test.
> 
> --- gcc/tree-ssa-math-opts.c.jj   2020-10-01 10:40:10.104755999 +0200
> +++ gcc/tree-ssa-math-opts.c  2020-10-05 13:51:54.476628287 +0200
> @@ -3567,9 +3567,24 @@ divmod_candidate_p (gassign *stmt)
>  
>/* Disable the transform if either is a constant, since 
> division-by-constant
>   may have specialized expansion.  */
> -  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
> +  if (CONSTANT_CLASS_P (op1))
>  return false;
>  
> +  if (CONSTANT_CLASS_P (op2))
> +{
> +  if (integer_pow2p (op2))
> + return false;
> +
> +  if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
> +   && TYPE_PRECISION (type) <= BITS_PER_WORD)
> + return false;
> +
> +  /* If the divisor is not power of 2 and the precision wider than
> +  HWI, expand_divmod punts on that, so in that case it is better
> +  to use divmod optab or libfunc.  Similarly if choose_multiplier
> +  might need pre/post shifts of BITS_PER_WORD or more.  */
> +}
> +
>/* Exclude the case where TYPE_OVERFLOW_TRAPS (type) as that should
>   expand using the [su]divv optabs.  */
>if (TYPE_OVERFLOW_TRAPS (type))
> --- gcc/internal-fn.c.jj  2020-10-02 10:36:43.272290992 +0200
> +++ gcc/internal-fn.c 2020-10-05 15:15:12.498349327 +0200
> @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-phinodes.h"
>  #include "ssa-iterators.h"
>  #include "explow.h"
> +#include "rtl-iter.h"
>  
>  /* The names of each internal function, indexed by function number.  */
>  const char *const internal_fn_name_array[] = {
> @@ -2985,6 +2986,32 @@ expand_gather_load_optab_fn (internal_fn
>  emit_move_insn (lhs_rtx, ops[0].value);
>  }
>  
> +/* Helper for expand_DIVMOD.  Return true if the sequence starting with
> +   INSN contains any call insns or insns with {,U}{DIV,MOD} rtxes.  */
> +
> +static bool
> +contains_call_div_mod (rtx_insn *insn)
> +{
> +  subrtx_iterator::array_type array;
> +  for (; insn; insn = NEXT_INSN (insn))
> +if (CALL_P (insn))
> +  return true;
> +else if (INSN_P (insn))
> +  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST)
> + switch (GET_CODE (*iter))
> +   {
> +   case CALL:
> +   case DIV:
> +   case UDIV:
> +   case MOD:
> +   case UMOD:
> + return true;
> +   default:
> + break;
> +   }
> +  return false;
> + }
> +
>  /* Expand DIVMOD() using:
>   a) optab handler for udivmod/sdivmod if it is available.
>   b) If optab_handler doesn't exist, generate call to
> @@ -3007,10 +3034,44 @@ expand_DIVMOD (internal_fn, gcall *call_
>rtx op1 = expand_normal (arg1);
>rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>  
> -  rtx quotient, remainder, libfunc;
> +  rtx quotient = NULL_RTX, remainder = NULL_RTX;
> +  rtx_insn *insns = NULL;
> +
> +  if (TREE_CODE (arg1) == INTEGER_CST)
> +{
> +  /* For DIVMOD by integral constants, there could be efficient code
> +  expanded inline e.g. using shifts and plus/minus.  Try to expand
> +  the division and modulo and if it emits any library calls or any
> +  {,U}{DIV,MOD} rtxes throw it away and use a divmod o

[PATCH] arm: [MVE] Remove illegal intrinsics

2020-10-06 Thread Christophe Lyon via Gcc-patches
A few MVE intrinsics had an unsigned variant implemented even though it is
not supported by the hardware.  This patch removes them:
__arm_vqrdmlashq_n_u8
__arm_vqrdmlahq_n_u8
__arm_vqdmlahq_n_u8
__arm_vqrdmlashq_n_u16
__arm_vqrdmlahq_n_u16
__arm_vqdmlahq_n_u16
__arm_vqrdmlashq_n_u32
__arm_vqrdmlahq_n_u32
__arm_vqdmlahq_n_u32
__arm_vmlaldavaxq_p_u32
__arm_vmlaldavaxq_p_u16

2020-10-06  Christophe Lyon  

gcc/
PR target/96914
* config/arm/arm_mve.h (vqrdmlashq_n_u8, vqrdmlashq_n_u16)
(vqrdmlashq_n_u32, vqrdmlahq_n_u8, vqrdmlahq_n_u16)
(vqrdmlahq_n_u32, vqdmlahq_n_u8, vqdmlahq_n_u16, vqdmlahq_n_u32)
(vmlaldavaxq_p_u16, vmlaldavaxq_p_u32): Remove.
* config/arm/arm_mve_builtins.def (vqrdmlashq_n_u, vqrdmlahq_n_u)
(vqdmlahq_n_u, vmlaldavaxq_p_u): Remove.
* config/arm/mve.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U, VQRDMLASHQ_N_U)
(VMLALDAVAXQ_P_U): Remove unspecs.
(VQDMLAHQ_N_U, VQRDMLAHQ_N_U, VQRDMLASHQ_N_U, VMLALDAVAXQ_P_U):
Remove attributes.
(VQDMLAHQ_N, VQRDMLAHQ_N, VQRDMLASHQ_N, VMLALDAVAXQ_P): Remove
unsigned variants from iterators.
(mve_vqdmlahq_n_, mve_vqrdmlahq_n_)
(mve_vqrdmlashq_n_, mve_vmlaldavaxq_p_):
Update comment.

gcc/testsuite/
PR target/96914
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c: Remove.
---
 gcc/config/arm/arm_mve.h   | 199 +
 gcc/config/arm/arm_mve_builtins.def|   4 -
 gcc/config/arm/mve.md  |  32 ++--
 .../arm/mve/intrinsics/vmlaldavaxq_p_u16.c |  21 ---
 .../arm/mve/intrinsics/vmlaldavaxq_p_u32.c |  21 ---
 .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c |  21 ---
 .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c |  21 ---
 .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c  |  21 ---
 .../arm/mve/intrinsics/vqrdmlahq_n_u16.c   |  21 ---
 .../arm/mve/intrinsics/vqrdmlahq_n_u32.c   |  21 ---
 .../gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c |  21 ---
 .../arm/mve/intrinsics/vqrdmlashq_n_u16.c  |  21 ---
 .../arm/mve/intrinsics/vqrdmlashq_n_u32.c  |  21 ---
 .../arm/mve/intrinsics/vqrdmlashq_n_u8.c   |  21 ---
 14 files changed, 23 insertions(+), 443 deletions(-)
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c
 delete mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c
 delete mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c
 delete mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c
 delete mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 59460ef..bee0579 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1237,9 +1237,6 @@
 #define vpselq_u8(__a, __b, __p) __arm_vpselq_u8(__a, __b, __p)
 #define vpselq_s8(__a, __b, __p) __arm_vpselq_s8(__a, __b, __p)
 #define vrev64q_m_u8(__inactive, __a, __p) __arm_vrev64q_m_u8(__inactive, __a, 
__p)
-#define vqrdmlashq_n_u8(__a, __b, __c) __arm_vqrdmlashq_n_u8(__a, __b, __c)
-#define vqrdmlahq_n_u8(__a, __b, __c) __arm_vqrdmlahq_n_u8(__a, __b, __c)
-#define vqdmlahq_n_u8(__a, __b, __c) __arm_vqdmlahq_n_u8(__a, __b, __c)
 #define vmvnq_m_u8(__inactive, __a, __p) __arm_vmvnq_m_u8(__inactive, __a, __p)
 #define vmlasq_n_u8(__a, __b, __c) __arm_vmlasq_n_u8(__a, __b, __c)
 #define vmlaq_n_u8(__a, __b, __c) __arm_vmlaq_n_u8(__a, __b, __c)
@@ -1323,9 +1320,6 @@
 #define vpselq_u16(__a, __b, __p) __arm_vpselq_u16(__a, __b, __p)
 #define vpselq_s16(__a, __b, __p) __arm_vpselq_s16(__a, __b, __p)
 #define vrev64q_m_u16(__inactive, __a, __p) __arm_vrev64q_m_u16(__inactive, 
__a, __p)
-#define vqrdmlash

Re: [PATCH] arm: Fix multiple inheritance thunks for thumb-1 with -mpure-code

2020-10-06 Thread Christophe Lyon via Gcc-patches
ping?

On Tue, 29 Sep 2020 at 21:50, Christophe Lyon
 wrote:
>
> When mi_delta is > 255 and -mpure-code is used, we cannot load delta
> from code memory (like we do without -mpure-code).
>
> This patch builds the value of mi_delta into r3 with a series of
> movs/adds/lsls.
>
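A standalone host-side sketch of the sequence the quoted loop below emits
(mine, not part of the patch); for an assumed delta of 0x12345 it prints
movs #1, lsls #8, adds #35, lsls #8, adds #69:

#include <cstdio>

static void
emit_build_r3 (unsigned int delta)
{
  bool mov_done_p = false;
  for (int i = 0; i < 3; i++)             /* upper three bytes, high first */
    {
      int byte = (delta >> (8 * (3 - i))) & 0xff;
      if (byte)
        {
          std::printf (mov_done_p ? "\tadds\tr3, #%d\n"
                                  : "\tmovs\tr3, #%d\n", byte);
          mov_done_p = true;
        }
      if (mov_done_p)
        std::printf ("\tlsls\tr3, #8\n"); /* make room for the next byte */
    }
  if (!mov_done_p)                        /* delta fits in a single byte */
    std::printf ("\tmovs\tr3, #%d\n", delta & 0xff);
  else if (delta & 0xff)
    std::printf ("\tadds\tr3, #%d\n", delta & 0xff);
}

int main () { emit_build_r3 (0x12345); }
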
> We also do some cleanup by not emitting the function address and delta
> via .word directives at the end of the thunk since we don't use them
> with -mpure-code.
>
> No need for new testcases, this bug was already identified by
> eg. pr46287-3.C
>
> 2020-09-29  Christophe Lyon  
>
> gcc/
> * config/arm/arm.c (arm_thumb1_mi_thunk): Build mi_delta in r3 and
> do not emit function address and delta when -mpure-code is used.
>
> ---
>  gcc/config/arm/arm.c | 91 
> +---
>  1 file changed, 66 insertions(+), 25 deletions(-)
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index ceeb91f..62abeb5 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -28342,9 +28342,43 @@ arm_thumb1_mi_thunk (FILE *file, tree, HOST_WIDE_INT 
> delta,
>  {
>if (mi_delta > 255)
> {
> - fputs ("\tldr\tr3, ", file);
> - assemble_name (file, label);
> - fputs ("+4\n", file);
> + /* With -mpure-code, we cannot load delta from the constant
> +pool: we build it explicitly.  */
> + if (target_pure_code)
> +   {
> + bool mov_done_p = false;
> + int i;
> +
> + /* Emit upper 3 bytes if needed.  */
> + for (i = 0; i < 3; i++)
> +   {
> + int byte = (mi_delta >> (8 * (3 - i))) & 0xff;
> +
> + if (byte)
> +   {
> + if (mov_done_p)
> +   asm_fprintf (file, "\tadds\tr3, #%d\n", byte);
> + else
> +   asm_fprintf (file, "\tmovs\tr3, #%d\n", byte);
> + mov_done_p = true;
> +   }
> +
> + if (mov_done_p)
> +   asm_fprintf (file, "\tlsls\tr3, #8\n");
> +   }
> +
> + /* Emit lower byte if needed.  */
> + if (!mov_done_p)
> +   asm_fprintf (file, "\tmovs\tr3, #%d\n", mi_delta & 0xff);
> + else if (mi_delta & 0xff)
> +   asm_fprintf (file, "\tadds\tr3, #%d\n", mi_delta & 0xff);
> +   }
> + else
> +   {
> + fputs ("\tldr\tr3, ", file);
> + assemble_name (file, label);
> + fputs ("+4\n", file);
> +   }
>   asm_fprintf (file, "\t%ss\t%r, %r, r3\n",
>mi_op, this_regno, this_regno);
> }
> @@ -28380,30 +28414,37 @@ arm_thumb1_mi_thunk (FILE *file, tree, 
> HOST_WIDE_INT delta,
> fputs ("\tpop\t{r3}\n", file);
>
>fprintf (file, "\tbx\tr12\n");
> -  ASM_OUTPUT_ALIGN (file, 2);
> -  assemble_name (file, label);
> -  fputs (":\n", file);
> -  if (flag_pic)
> +
> +  /* With -mpure-code, we don't need to emit literals for the
> +function address and delta since we emitted code to build
> +them.  */
> +  if (!target_pure_code)
> {
> - /* Output ".word .LTHUNKn-[3,7]-.LTHUNKPCn".  */
> - rtx tem = XEXP (DECL_RTL (function), 0);
> - /* For TARGET_THUMB1_ONLY the thunk is in Thumb mode, so the PC
> -pipeline offset is four rather than eight.  Adjust the offset
> -accordingly.  */
> - tem = plus_constant (GET_MODE (tem), tem,
> -  TARGET_THUMB1_ONLY ? -3 : -7);
> - tem = gen_rtx_MINUS (GET_MODE (tem),
> -  tem,
> -  gen_rtx_SYMBOL_REF (Pmode,
> -  ggc_strdup (labelpc)));
> - assemble_integer (tem, 4, BITS_PER_WORD, 1);
> -   }
> -  else
> -   /* Output ".word .LTHUNKn".  */
> -   assemble_integer (XEXP (DECL_RTL (function), 0), 4, BITS_PER_WORD, 1);
> + ASM_OUTPUT_ALIGN (file, 2);
> + assemble_name (file, label);
> + fputs (":\n", file);
> + if (flag_pic)
> +   {
> + /* Output ".word .LTHUNKn-[3,7]-.LTHUNKPCn".  */
> + rtx tem = XEXP (DECL_RTL (function), 0);
> + /* For TARGET_THUMB1_ONLY the thunk is in Thumb mode, so the PC
> +pipeline offset is four rather than eight.  Adjust the offset
> +accordingly.  */
> + tem = plus_constant (GET_MODE (tem), tem,
> +  TARGET_THUMB1_ONLY ? -3 : -7);
> + tem = gen_rtx_MINUS (GET_MODE (tem),
> +  tem,
> +  gen_rtx_SYMBOL_REF (Pmode,
> +  ggc_st

Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-10-06 Thread Richard Biener via Gcc-patches
On Fri, Oct 2, 2020 at 10:43 AM Alexandre Oliva  wrote:
>
> Here's what I got so far, passing regstrap except for field-merge-1.c,
> that relies on combining non-adjacent compares, which I haven't
> implemented yet.

Thanks for trying.
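
To make the distinction concrete, a source-level example (mine; guessing at
the shape of what field-merge-1.c exercises):

/* a and b occupy adjacent bytes of one word, so both tests can merge into
   a single masked word compare; a and c are separated by b, so merging
   them needs a mask with a hole in it, i.e. the non-adjacent support.  */
struct S { unsigned char a, b, c, d; };

bool adjacent (struct S *p)     { return p->a == 1 && p->b == 2; }
bool non_adjacent (struct S *p) { return p->a == 1 && p->c == 3; }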

> I had to retain some parts of fold_truth_andor_1 to avoid regressions in
> gcc.c-torture/execute/ieee/compare-fp-3.c and gcc.dg/pr81228.c.
> gcc.target/i386/pr37248-2.c and gcc.target/i386/pr37248-3.c required
> turning sign tests into masks.
>
> I moved the field-compare-merging bits of fold_truth_andor_1, and
> auxiliary functions for it, into fold_truth_andor_maybe_separate, though
> the ability to deal with non-adjacent compares is no longer used, at
> least for the time being.
>
> I put fold_truth_andor_maybe_separate in gimple-fold.c, and called it in
> maybe_fold_and_comparisons, but...  the fact that it may return a tree
> that isn't a gimple condexpr, e.g. when there is masking or shifting,
> and that requires re-gimplification, suggests moving it to
> tree-ssa-ifcombine.c where this can be dealt with, possibly even for
> non-adjacent compares.  Right now, there is code in ifcombine to
> re-gimplify results from the regular folder, which I've extended, for
> the time being, to maybe_fold_and_comparisons results as well.

A bit of a band-aid, but let it be this way for now.

> I've reworked decode_field_reference to seek and "substitute" SSA DEFs
> in a way that avoids building new trees, but the test for
> substitutability feels incomplete: there's nothing to ensure that vuses
> aren't brought from much-earlier blocks, and that there couldn't
> possibly be an intervening store.  I suspect I will have to give these
> pieces a little more info for it to be able to tell whether the memory
> accesses, if moved, would still get the same value.  Is there anything
> easily reusable for this sort of test?

Well, the easiest way (and matching the GENERIC folding) is to
make sure the VUSEs are the same.  I didn't quite get the "substitute"
code though.

>
> As for fields crossing alignment boundaries, the two-tests condition
> currently returned by fold_truth_andor_maybe_separate, that ends up
> gimplified into a new block, causes the chain of ifcombine optimizations
> to stop, something that I'm yet to investigate.  My plan is to rearrange
> ifcombine_ifandif to call fold_truth_andor_maybe_separate directly, and
> to handle such resplit conditions by inserting one in each of the
> preexisting blocks, to simplify the gimplification and in preparation
> for combining non-adjacent compares, if we wish to do that.  Do we?

I guess it's simply because we iterate over a fixed set of BBs?  Or of course
if you re-shape conditions rather than just "merging" aka eliding one you
might create new opportunities.

>  I
> was convinced that it was a safe optimization, because of the absence of
> intervening side effects and the undefinedness of intervening trapping
> memory accesses, but combining later tests that access a word with
> earlier tests that access the same word may cause intervening trapping
> accesses to be skipped.  Is this a possibility we should enable or
> disable on e.g. a per-language basis, avoid altogether, or ...  any
> other suggestions?

ISTR ifcombine checks whether there are side-effects (such as trapping)
that might be skipped.  And indeed we shouldn't do such a thing, though
eliding a trap is OK-ish; introducing one is obviously not ;)  We do,
after all, fold 0/x to 0 (just not literal 0/0).

>
> Richi, is this sort of what you had in mind, or were you thinking of
> something for match.pd or so?  Is match.pd even able to perform
> cross-block multi-condition matches?

So what I see in your patch is matching the condition itself - this
part can be done using match.pd (match ...) patterns in a much
nicer way, see for example the use of gimple_ctz_table_index
in tree-ssa-forwprop.c and the corresponding

/* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
   The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
   constant which when multiplied by a power of 2 contains a unique value
   in the top 5 or 6 bits.  This is then indexed into a table which maps it
   to the number of trailing zeroes.  */
(match (ctz_table_index @1 @2 @3)
  (rshift (mult (bit_and:c (negate @1) @1) INTEGER_CST@2) INTEGER_CST@3))

match.pd itself doesn't match "control flow", that is you'd have to simplify
  if (c == 1) if (c != 5) { ... }
as (c == 1) & (c != 5) which match.pd might turn into 0.

> Any further advice as to my plans above?

I do think we eventually want to remove all GENERIC time handling of _load_
simplifications.  Note with match.pd you cannot easily handle loads, but
eventually there's bits that do not involve loads that can be implemented
as match.pd patterns.

Otherwise thanks for working on this.

Richard.

> Thanks,
>
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 0cc80ad..47b5419 100644
> --- a/gcc/fold-const

Re: [committed] libstdc++: Add deduction guide for std::ranges::join_view [LWG 3474]

2020-10-06 Thread Jonathan Wakely via Gcc-patches

On 06/10/20 00:05 -0500, Tim Song via Libstdc++ wrote:

I thought LWG approved the other option in the PR (changing views::join to
not use CTAD)?


Oops, good point. Fixed like so.

Tested powerpc64le-linux, pushed.
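
For the record, a small reproducer of the defect being resolved (my example,
following LWG 3474):

#include <concepts>
#include <ranges>
#include <vector>

/* With plain CTAD the second join_view picks the copy deduction candidate
   and deduces to the *same* type as the first, so the nesting collapses
   and only one level is joined.  views::join now spells out the template
   argument (join_view<views::all_t<_Range>>) to bypass CTAD.  */
void demo ()
{
  std::vector<std::vector<std::vector<int>>> v;
  auto inner = std::ranges::join_view{v};      // joins one level
  auto outer = std::ranges::join_view{inner};  // copy deduction: same type!
  static_assert (std::same_as<decltype (inner), decltype (outer)>);
}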



On Mon, Aug 24, 2020 at 10:22 AM Jonathan Wakely via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:


This implements the proposed resolution for LWG 3474.

libstdc++-v3/ChangeLog:

* include/std/ranges (join_view): Add deduction guide (LWG 3474).
* testsuite/std/ranges/adaptors/join_lwg3474.cc: New test.

Tested powerpc64le-linux. Committed to trunk.




commit 9065c4adab0b1280f5707d53833d195d0d350fd2
Author: Jonathan Wakely 
Date:   Tue Oct 6 09:41:16 2020

libstdc++: Avoid CTAD for std::ranges::join_view [LWG 3474]

In commit ef275d1f2083f8a1fa1b59a3cd07fd3e8431023e I implemented the
wrong resolution of LWG 3474. This removes the deduction guide and
alters the views::join factory to create the right type explicitly.

libstdc++-v3/ChangeLog:

* include/std/ranges (join_view): Remove deduction guide.
(views::join): Add explicit template argument list to prevent
deducing the wrong type.
* testsuite/std/ranges/adaptors/join.cc: Move test for LWG 3474
here, from ...
* testsuite/std/ranges/adaptors/join_lwg3474.cc: Removed.

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 7fd5d5110ed..10f1f7b525b 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -2369,17 +2369,14 @@ namespace views
   template<typename _Range>
     explicit join_view(_Range&&) -> join_view<views::all_t<_Range>>;
 
-// _GLIBCXX_RESOLVE_LIB_DEFECTS
-// 3474. Nesting join_views is broken because of CTAD
-  template<typename _View>
-    explicit join_view(join_view<_View>) -> join_view<join_view<_View>>;
-
   namespace views
   {
 inline constexpr __adaptor::_RangeAdaptorClosure join
      = [] <viewable_range _Range> (_Range&& __r)
   {
-	return join_view{std::forward<_Range>(__r)};
+	// _GLIBCXX_RESOLVE_LIB_DEFECTS
+	// 3474. Nesting join_views is broken because of CTAD
+	return join_view<views::all_t<_Range>>{std::forward<_Range>(__r)};
   };
   } // namespace views
 
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/join.cc b/libstdc++-v3/testsuite/std/ranges/adaptors/join.cc
index 142c9feddcd..e21e7054b35 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/join.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/join.cc
@@ -123,6 +123,21 @@ test06()
   b = ranges::end(v);
 }
 
+void
+test07()
+{
+  // LWG 3474. Nesting join_views is broken because of CTAD
+  std::vector<std::vector<std::vector<int>>> nested_vectors = {
+{{1, 2, 3}, {4, 5}, {6}},
+{{7},   {8, 9}, {10, 11, 12}},
+{{13}}
+  };
+  auto joined = nested_vectors | std::views::join | std::views::join;
+
+  using V = decltype(joined);
+  static_assert( std::same_as<std::ranges::range_value_t<V>, int> );
+}
+
 int
 main()
 {
@@ -132,4 +147,5 @@ main()
   test04();
   test05();
   test06();
+  test07();
 }
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/join_lwg3474.cc b/libstdc++-v3/testsuite/std/ranges/adaptors/join_lwg3474.cc
deleted file mode 100644
index 516aaba7070..000
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/join_lwg3474.cc
+++ /dev/null
@@ -1,37 +0,0 @@
-// Copyright (C) 2020 Free Software Foundation, Inc.
-//
-// This file is part of the GNU ISO C++ Library.  This library is free
-// software; you can redistribute it and/or modify it under the
-// terms of the GNU General Public License as published by the
-// Free Software Foundation; either version 3, or (at your option)
-// any later version.
-
-// This library is distributed in the hope that it will be useful,
-// but WITHOUT ANY WARRANTY; without even the implied warranty of
-// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-// GNU General Public License for more details.
-
-// You should have received a copy of the GNU General Public License along
-// with this library; see the file COPYING3.  If not see
-// <http://www.gnu.org/licenses/>.
-
-// { dg-options "-std=gnu++2a" }
-// { dg-do compile { target c++2a } }
-
-#include <ranges>
-#include <vector>
-
-void
-test01()
-{
-  // LWG 3474. Nesting join_views is broken because of CTAD
-  std::vector<std::vector<std::vector<int>>> nested_vectors = {
-{{1, 2, 3}, {4, 5}, {6}},
-{{7},   {8, 9}, {10, 11, 12}},
-{{13}}
-  };
-  auto joined = nested_vectors | std::views::join | std::views::join;
-
-  using V = decltype(joined);
-  static_assert( std::same_as<std::ranges::range_value_t<V>, int> );
-}


Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andreas Schwab
On Okt 06 2020, Jakub Jelinek via Gcc-patches wrote:

> I mean, we could just use:
>   size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;
>   irange *r = (irange *) obstack_alloc (&m_obstack, nbytes);
>   return new (r) irange ((tree *) (r + 1), num_pairs);
> without any new type.

Modulo proper alignment.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH 2/2] arm: Improve handling of relocations with small offsets with -mpure-code on v6m (PR96770)

2020-10-06 Thread Christophe Lyon via Gcc-patches
ping?

On Mon, 28 Sep 2020 at 11:09, Christophe Lyon
 wrote:
>
> With -mpure-code on v6m (thumb-1), we can use small offsets with
> upper/lower relocations to avoid the extra addition of the
> offset.
>
> This patch accepts expressions symbol+offset as legitimate constants
> when the literal pool is disabled, making sure that the offset is
> within the range supported by thumb-1 [0..255].
>
> It also makes sure that thumb1_movsi_insn emits an error in case we
> try to use it with an unsupported RTL construct.
>
> 2020-09-28  Christophe Lyon  
>
> gcc/
> * config/arm/arm.c (thumb_legitimate_constant_p): Accept
> (symbol_ref + addend) when literal pool is disabled.
> (arm_valid_symbolic_address_p): Add support for thumb-1 without
> MOVT/MOVW.
> * config/arm/thumb1.md (*thumb1_movsi_insn): Accept (symbol_ref +
> addend) in the pure-code alternative.
>
> gcc/testsuite/
> * gcc.target/arm/pure-code/pr96770.c: New test.
> ---
>  gcc/config/arm/arm.c | 15 ---
>  gcc/config/arm/thumb1.md |  5 +++--
>  gcc/testsuite/gcc.target/arm/pure-code/pr96770.c | 21 +
>  3 files changed, 36 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr96770.c
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index abe357e..ceeb91f 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -9489,7 +9489,8 @@ thumb_legitimate_constant_p (machine_mode mode 
> ATTRIBUTE_UNUSED, rtx x)
>  we build the symbol address with upper/lower
>  relocations.  */
>   || (TARGET_THUMB1
> - && GET_CODE (x) == SYMBOL_REF
> + && !label_mentioned_p (x)
> + && arm_valid_symbolic_address_p (x)
>   && arm_disable_literal_pool)
>   || flag_pic);
>  }
> @@ -31495,7 +31496,10 @@ arm_emit_coreregs_64bit_shift (enum rtx_code code, 
> rtx out, rtx in,
> According to the ARM ELF ABI, the initial addend of REL-type relocations
> processing MOVW and MOVT instructions is formed by interpreting the 16-bit
> literal field of the instruction as a 16-bit signed value in the range
> -   -32768 <= A < 32768.  */
> +   -32768 <= A < 32768.
> +
> +   In Thumb-1 mode, we use upper/lower relocations which have an 8-bit
> +   unsigned range of 0 <= A < 256.  */
>
>  bool
>  arm_valid_symbolic_address_p (rtx addr)
> @@ -31519,7 +31523,12 @@ arm_valid_symbolic_address_p (rtx addr)
>xop1 = XEXP (tmp, 1);
>
>if (GET_CODE (xop0) == SYMBOL_REF && CONST_INT_P (xop1))
> - return IN_RANGE (INTVAL (xop1), -0x8000, 0x7fff);
> +   {
> + if (TARGET_THUMB1 && !TARGET_HAVE_MOVT)
> +   return IN_RANGE (INTVAL (xop1), 0, 0xff);
> + else
> +   return IN_RANGE (INTVAL (xop1), -0x8000, 0x7fff);
> +   }
>  }
>
>return false;
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index 3dedcae..2258a52 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -675,7 +675,7 @@ (define_insn "*thumb1_movsi_insn"
>case 7:
>/* pure-code alternative: build the constant byte by byte,
>  instead of loading it from a constant pool.  */
> -   if (GET_CODE (operands[1]) == SYMBOL_REF)
> +   if (arm_valid_symbolic_address_p (operands[1]))
>   {
> output_asm_insn (\"movs\\t%0, #:upper8_15:%1\", operands);
> output_asm_insn (\"lsls\\t%0, #8\", operands);
> @@ -686,7 +686,7 @@ (define_insn "*thumb1_movsi_insn"
> output_asm_insn (\"adds\\t%0, #:lower0_7:%1\", operands);
> return \"\";
>   }
> -   else
> +   else if (GET_CODE (operands[1]) == CONST_INT)
>   {
> int i;
> HOST_WIDE_INT op1 = INTVAL (operands[1]);
> @@ -721,6 +721,7 @@ (define_insn "*thumb1_movsi_insn"
>   output_asm_insn ("adds\t%0, %1", ops);
> return "";
>   }
> + gcc_unreachable ();
>
>case 8: return "ldr\t%0, %1";
>case 9: return "str\t%1, %0";
> diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr96770.c 
> b/gcc/testsuite/gcc.target/arm/pure-code/pr96770.c
> new file mode 100644
> index 000..a43d71f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pure-code/pr96770.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mpure-code" } */
> +
> +int arr[1000];
> +int *f4 (void) { return &arr[1]; }
> +
> +/* For cortex-m0 (thumb-1/v6m), we generate 4 movs with upper/lower:#arr+4.  
> */
> +/* { dg-final { scan-assembler-times "\\+4" 4 { target { { ! 
> arm_thumb1_movt_ok } && { ! arm_thumb2_ok } } } } } */
> +
> +/* For cortex-m with movt/movw (thumb-1/v8m.base or thumb-2), we
> +   generate a movt/movw pair with upper/lower:#arr+4.  */
> +/* { dg-final { scan-assembler-times "\\+4" 2 { target { arm_thumb1_movt_ok 
> || 

Re: [PATCH 1/2] arm: Avoid indirection with -mpure-code on v6m (PR96967)

2020-10-06 Thread Christophe Lyon via Gcc-patches
ping?

On Mon, 28 Sep 2020 at 11:09, Christophe Lyon
 wrote:
>
> With -mpure-code on v6m (thumb-1), to avoid a useless indirection when
> building the address of a symbol, we want to consider SYMBOL_REF as a
> legitimate constant. This way, we build the address using a series of
> upper/lower relocations instead of loading the address from memory.
>
> This patch also fixes a missing "clob" conds attribute for
> thumb1_movsi_insn, needed because that alternative clobbers the flags.
>
> 2020-09-28  Christophe Lyon  
>
> gcc/
> * config/arm/arm.c (thumb_legitimate_constant_p): Add support for
> disabled literal pool in thumb-1.
> * config/arm/thumb1.md (thumb1_movsi_symbol_ref): Remove.
> (*thumb1_movsi_insn): Add support for SYMBOL_REF with -mpure-code.
>
> gcc/testsuite
> * gcc.target/arm/pure-code/pr96767.c: New test.
> ---
>  gcc/config/arm/arm.c |   6 ++
>  gcc/config/arm/thumb1.md | 102 
> +++
>  gcc/testsuite/gcc.target/arm/pure-code/pr96767.c |  10 +++
>  3 files changed, 63 insertions(+), 55 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr96767.c
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 022ef6c..abe357e 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -9485,6 +9485,12 @@ thumb_legitimate_constant_p (machine_mode mode 
> ATTRIBUTE_UNUSED, rtx x)
>   || CONST_DOUBLE_P (x)
>   || CONSTANT_ADDRESS_P (x)
>   || (TARGET_HAVE_MOVT && GET_CODE (x) == SYMBOL_REF)
> + /* On Thumb-1 without MOVT/MOVW and literal pool disabled,
> +we build the symbol address with upper/lower
> +relocations.  */
> + || (TARGET_THUMB1
> + && GET_CODE (x) == SYMBOL_REF
> + && arm_disable_literal_pool)
>   || flag_pic);
>  }
>
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index 4a59d87..3dedcae 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -43,27 +43,6 @@
>
>
>
> -(define_insn "thumb1_movsi_symbol_ref"
> -  [(set (match_operand:SI 0 "register_operand" "=l")
> -   (match_operand:SI 1 "general_operand" ""))
> -   ]
> -  "TARGET_THUMB1
> -   && arm_disable_literal_pool
> -   && GET_CODE (operands[1]) == SYMBOL_REF"
> -  "*
> -  output_asm_insn (\"movs\\t%0, #:upper8_15:%1\", operands);
> -  output_asm_insn (\"lsls\\t%0, #8\", operands);
> -  output_asm_insn (\"adds\\t%0, #:upper0_7:%1\", operands);
> -  output_asm_insn (\"lsls\\t%0, #8\", operands);
> -  output_asm_insn (\"adds\\t%0, #:lower8_15:%1\", operands);
> -  output_asm_insn (\"lsls\\t%0, #8\", operands);
> -  output_asm_insn (\"adds\\t%0, #:lower0_7:%1\", operands);
> -  return \"\";
> -  "
> -  [(set_attr "length" "14")
> -   (set_attr "conds" "clob")]
> -)
> -
>  (define_insn "*thumb1_adddi3"
>[(set (match_operand:DI  0 "register_operand" "=l")
> (plus:DI (match_operand:DI 1 "register_operand" "%0")
> @@ -696,40 +675,53 @@ (define_insn "*thumb1_movsi_insn"
>case 7:
>/* pure-code alternative: build the constant byte by byte,
>  instead of loading it from a constant pool.  */
> -   {
> - int i;
> - HOST_WIDE_INT op1 = INTVAL (operands[1]);
> - bool mov_done_p = false;
> - rtx ops[2];
> - ops[0] = operands[0];
> -
> - /* Emit upper 3 bytes if needed.  */
> - for (i = 0; i < 3; i++)
> -   {
> -  int byte = (op1 >> (8 * (3 - i))) & 0xff;
> -
> - if (byte)
> -   {
> - ops[1] = GEN_INT (byte);
> - if (mov_done_p)
> -   output_asm_insn ("adds\t%0, %1", ops);
> - else
> -   output_asm_insn ("movs\t%0, %1", ops);
> - mov_done_p = true;
> -   }
> -
> - if (mov_done_p)
> -   output_asm_insn ("lsls\t%0, #8", ops);
> -   }
> +   if (GET_CODE (operands[1]) == SYMBOL_REF)
> + {
> +   output_asm_insn (\"movs\\t%0, #:upper8_15:%1\", operands);
> +   output_asm_insn (\"lsls\\t%0, #8\", operands);
> +   output_asm_insn (\"adds\\t%0, #:upper0_7:%1\", operands);
> +   output_asm_insn (\"lsls\\t%0, #8\", operands);
> +   output_asm_insn (\"adds\\t%0, #:lower8_15:%1\", operands);
> +   output_asm_insn (\"lsls\\t%0, #8\", operands);
> +   output_asm_insn (\"adds\\t%0, #:lower0_7:%1\", operands);
> +   return \"\";
> + }
> +   else
> + {
> +   int i;
> +   HOST_WIDE_INT op1 = INTVAL (operands[1]);
> +   bool mov_done_p = false;
> +   rtx ops[2];
> +   ops[0] = operands[0];
> +
> +   /* Emit upper 3 bytes if needed.  */
> +   for (i = 0; i < 3; i++)
> + {
> +   int byte = (op1 >> (8 * (3 - i))

[Patch] configure: Fix in-tree building of GMP on BSD (PR97302)

2020-10-06 Thread Tobias Burnus

As reported in the PR, the in-tree build of GMP fails for BSD.
The reason is that with_gmp is set whenever there is a gmp.h under
/usr/local, so the in-tree GMP is built but later not used, which
causes build failures if /usr/local has no proper GMP.

Fixed by skipping the setting of with_gmp if there is an
in-tree GMP.

Patch was tested by the PR creator.

OK for the trunk? (Other branches?)

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
configure: Fix in-tree building of GMP on BSD (PR97302)

ChangeLog:
	PR target/97302
	* configure.ac: Only set with_gmp to /usr/local
	if not building in tree.
	* configure: Regenerate.

diff --git a/configure b/configure
index 057b88966e4..2b805dbd26a 100755
--- a/configure
+++ b/configure
@@ -3764,6 +3764,7 @@ case "${target}" in
 ;;
   *-*-freebsd*)
 if test "x$with_gmp" = x && test "x$with_gmp_dir" = x \
+	&& ! test -d ${srcdir}/gmp \
 	&& test -f /usr/local/include/gmp.h; then
   with_gmp=/usr/local
 fi
diff --git a/configure.ac b/configure.ac
index 392389fb2fb..e0da73548c9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1038,6 +1038,7 @@ case "${target}" in
 ;;
   *-*-freebsd*)
 if test "x$with_gmp" = x && test "x$with_gmp_dir" = x \
+	&& ! test -d ${srcdir}/gmp \
 	&& test -f /usr/local/include/gmp.h; then
   with_gmp=/usr/local
 fi


Re: Fix handling of stores in modref_summary::useful_p

2020-10-06 Thread Szabolcs Nagy via Gcc-patches
The 10/05/2020 23:45, Jan Hubicka wrote:
> > The 10/05/2020 17:28, Szabolcs Nagy via Gcc-patches wrote:
> > minimal reproducer:
> > 
> > #include <stdio.h>
> > int main()
> > {
> > int r,t;
> > r = sscanf("01", "%2x", &t);
> > printf("scanf: %d  %02x\n", r, t);
> > return 0;
> > }
> > 
> > should print
> > 
> > scanf: 1  01
> > 
> > but when glibc is compiled with gcc trunk on aarch64 it prints
> > 
> > scanf: 0  00
> > 
> > i will continue the debugging from here tomorrow.
> 
> There is a report on glibc issue here 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97264
> it turned out to be a latent glibc bug type punning const char * and
> const unsigned char *.
> 
> I wonder if it is same as problem you are seeing?

thanks, that indeed looks very similar, i'll comment on the glibc bug.


Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 10:47:34AM +0200, Andreas Schwab wrote:
> On Okt 06 2020, Jakub Jelinek via Gcc-patches wrote:
> 
> > I mean, we could just use:
> >   size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;
> >   irange *r = (irange *) obstack_alloc (&m_obstack, nbytes);
> >   return new (r) irange ((tree *) (r + 1), num_pairs);
> > without any new type.
> 
> Modulo proper alignment.

Sure, but irange's last element is tree * which is pointer to pointer,
and we need here an array of tree, i.e. pointers.  So, it would indeed break
on a hypothetical host that has smaller struct X ** alignment than struct X *
alignment.  I'm not aware of any.
One could add a static_assert to verify that alignof (irange) >= alignof (tree)
and that sizeof (irange) % alignof (tree) == 0.
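
For illustration, such checks could look like this (a minimal sketch,
untested):

  static_assert (alignof (irange) >= alignof (tree),
		 "trees appended after an irange must be alignable");
  static_assert (sizeof (irange) % alignof (tree) == 0,
		 "trees placed at (r + 1) must start on a tree boundary");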

Jakub



Re: [Patch] configure: Fix in-tree building of GMP on BSD (PR97302)

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 10:56:35AM +0200, Tobias Burnus wrote:
> As reported in the PR, the in-tree build of GMP fails for BSD.
> The reason is that with_gmp is set whenever there is a gmp.h under
> /usr/local, so the in-tree GMP is built but later not used,
> which causes build failures if /usr/local has no proper GMP.
> 
> Fixed by skipping the setting of with_gmp if there is an
> in-tree GMP.
> 
> Patch was tested by the PR creator.
> 
> OK for the trunk? (Other branches?)

Ok for trunk and branches.

> configure: Fix in-tree building of GMP on BSD (PR97302)

I think the conventions say [PR97302] rather than (PR97302).

> ChangeLog:
>   PR target/97302
>   * configure.ac: Only set with_gmp to /usr/local
>   if not building in tree.
>   * configure: Regenerate.

Jakub



Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Aldy Hernandez via Gcc-patches




On 10/6/20 9:52 AM, Jakub Jelinek wrote:

On Tue, Oct 06, 2020 at 09:37:21AM +0200, Aldy Hernandez via Gcc-patches wrote:

Pushed as obvious.

gcc/ChangeLog:

* value-range.h (irange_allocator::allocate): Increase
newir storage by one.
---
  gcc/value-range.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/value-range.h b/gcc/value-range.h
index 94b48e55e77..7031a823138 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -670,7 +670,7 @@ irange_allocator::allocate (unsigned num_pairs)

struct newir {
  irange range;
-tree mem[1];
+tree mem[2];
};
size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);


So, we essentially want a flexible array member, which C++ without extension
doesn't have, and thus need to rely on the compiler handling the trailing
array as a poor men's flexible array member (again, GCC does for any size,
but not 100% sure about other compilers, if they e.g. don't handle that way
just size of 1).


We know we need _at least_ two trees, so what's wrong with the above?


Is there any reason why the code is written that way?
I mean, we could just use:
   size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;


We had that originally, but IIRC, the alignment didn't come out right.

Aldy


   irange *r = (irange *) obstack_alloc (&m_obstack, nbytes);
   return new (r) irange ((tree *) (r + 1), num_pairs);
without any new type.

Jakub





Re: [patch] convert -Walloca pass to ranger

2020-10-06 Thread Aldy Hernandez via Gcc-patches




On 10/5/20 7:12 PM, Martin Sebor wrote:

In the future, I would even like to remove the specific range the 
ranger was able to compute from the error message itself.  As will 
become obvious, the ranger can get pretty outrageous ranges that are 
entirely non-obvious by looking at the code.  Peppering the error 
messages with these ranges will ultimately just confuse the user.  But 
alas, that's a problem for another patch to solve.


I agree that when it comes to sizes where just one bound of the range
is used to decide whether or not to warn (the lower bound in the case
of most warnings but, as the example above shows, the upper bound for
-Walloca-larger-than=), printing multiple subranges is unnecessary
and could easily be confusing.  Even printing the very large bounds
(in decimal) in the warning above may be too much.  At the same time,
simply printing:

   warning: argument to ‘alloca’ may be too large

and nothing else wouldn't be helpful either.  (Though it would make
the alloca pass real simple ;-)  It's a matter of deciding what
the right amount of detail is in each instance and choosing the best
representation for the values included in it (small values are okay
in decimal, larger values may be better formatted in hex, or involving
well-known symbolic constants like INT_MAX or PTRDIFF_MAX).  But
different people have different ideas about how much detail is enough
and what presentation is best.  Rather than each of us imposing our
individual preference on all users and ending up with arbitrary
inconsistencies between warnings we should design them so that their
output can be customized.  If I like to see the full ranges in warnings
in decimal I should be able to ask GCC to do that. If I just care about
the high level details (say just the more important bound) and prefer
hex for large numbers there should be a way to do that too.  It's just
a matter of adding a %R or some such to the pretty-printer to format
a range and exposing an option that will let users choose the format.
(It still leaves room for us to argue about the defaults ;)
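
For instance, a warning site could then be written along these lines (the
%R directive itself being hypothetical at this point):

  warning_at (loc, OPT_Walloca_larger_than_,
	      "argument to %<alloca%> may be too large (%R)", &range);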


Sure, some general shared infrastructure for reporting errors involving 
ranges would be nice, inasmuch as the testcases do not test for specific 
ranges since those are liable to change from release to release.


Aldy



Re: [PATCH] options: Save and restore opts_set for Optimization and Target options

2020-10-06 Thread Andreas Schwab
options-save.c: In function 'void cl_target_option_save(cl_target_option*, 
gcc_options*, gcc_options*)':
options-save.c:8526:26: error: unused variable 'mask' [-Werror=unused-variable]
 8526 |   unsigned HOST_WIDE_INT mask = 0;
  |  ^~~~
options-save.c: In function 'void cl_target_option_restore(gcc_options*, 
gcc_options*, cl_target_option*)':
options-save.c:8537:26: error: unused variable 'mask' [-Werror=unused-variable]
 8537 |   unsigned HOST_WIDE_INT mask;
  |  ^~~~

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: make sincos take type from intrinsic formal, not from result assignment

2020-10-06 Thread Alexandre Oliva
On Oct  6, 2020, Richard Biener  wrote:

> OK, I see.  mathfn_built_in expects a type inter-operating with
> the C ABI types (float_type_node, double_type_node, etc.) where
> "inter-operating" means having the same main variant.

Yup.

> Now, I guess for the sincos pass we want to combine sinl + cosl
> to sincosl, independent on the case where the result would be
> assigned to a 'double' when 'double == long double'?

Sorry, I goofed in the patch description and misled you.

When looking at

  _d = sin (_s);

the sincos pass didn't take the type of _d, but that of _s.

I changed it so it takes the type not from the actual passed to the
intrinsic, but from the formal in the intrinsic declaration.

If we had conversions of _s to different precisions, the optimization
wouldn't kick in: we'd have different actuals passed to sin and cos.
I'm not sure it makes much sense to try to turn e.g.

  _d1 = sin (_s);
  _t = (float) _s;
  _d2 = cosf (_t);

into:

  sincos (_s, &D1, &T);
  _d1 = D1;
  _td2 = T;
  _d2 = (float) _td2;

If someone goes through the trouble of computing sin and cos for the
same angle at different precisions, you might as well leave it alone.

> Now what about sinl + cos when 'double == long double'?

Now that might make more sense to optimize, but if we're going to do
such transformations, we might as well canonicalize the *l intrinsics to
the equivalent double versions (assuming long double and double have the
same precision), and then sincos will get it right.
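
E.g., assuming long double and double share the same mode:

  _d1 = sinl (_s);	/* canonicalized to _d1 = sin (_s) */
  _d2 = cos (_s);

Both calls would then use the double intrinsics, and the existing matching
could fuse them into a single sincos (_s, &_d1, &_d2).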

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


Re: [PATCH] divmod: Match and expand DIVMOD even in some cases of constant divisor [PR97282]

2020-10-06 Thread Christophe Lyon via Gcc-patches
Hi Jakub,

On Tue, 6 Oct 2020 at 10:13, Richard Biener  wrote:
>
> On Tue, 6 Oct 2020, Jakub Jelinek wrote:
>
> > Hi!
> >
> > As written in the comment, tree-ssa-math-opts.c wouldn't create a DIVMOD
> > ifn call for division + modulo by constant for fear that during
> > expansion we could generate better code for those cases.
> > If the divisor is a power of two, that is certainly the case always,
> > but otherwise expand_divmod can punt in many cases, e.g. if the division
> > type's precision is above HOST_BITS_PER_WIDE_INT, we don't even call
> > choose_multiplier, because it works on HOST_WIDE_INTs (true, something
> > we should fix eventually now that we have wide_ints), or if pre/post shift
> > is larger than BITS_PER_WORD.
> >
> > So, the following patch recognizes DIVMOD with constant last argument even
> > when it is unclear if expand_divmod will be able to optimize it, and then
> > during DIVMOD expansion if the divisor is constant attempts to expand it as
> > division + modulo and if they actually don't contain any libcalls or
> > division/modulo, they are kept as is, otherwise that sequence is thrown away
> > and divmod optab or libcall is used.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> OK.
>
> Richard.
>
> > 2020-10-06  Jakub Jelinek  
> >
> >   PR rtl-optimization/97282
> >   * tree-ssa-math-opts.c (divmod_candidate_p): Don't return false for
> >   constant op2 if it is not a power of two and the type has precision
> >   larger than HOST_BITS_PER_WIDE_INT or BITS_PER_WORD.
> >   * internal-fn.c (contains_call_div_mod): New function.
> >   (expand_DIVMOD): If last argument is a constant, try to expand it as
> >   TRUNC_DIV_EXPR followed by TRUNC_MOD_EXPR, but if the sequence
> >   contains any calls or {,U}{DIV,MOD} rtxes, throw it away and use
> >   divmod optab or divmod libfunc.
> >

This patch causes ICEs on arm while building newlib or glibc

For instance with newlib when compiling vfwprintf.o:
during RTL pass: expand
In file included from
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/stdio/vfprintf.c:153:
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/include/stdio.h:
In function '_vfprintf_r':
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/include/stdio.h:503:9:
internal compiler error: in int_mode_for_mode, at stor-layout.c:404
  503 | int _vfprintf_r (struct _reent *, FILE *__restrict, const
char *__restrict, __VALIST)
  | ^~~
0xaed4e3 int_mode_for_mode(machine_mode)

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/stor-layout.c:404
0x7ff73d emit_move_via_integer
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3425
0x808f2d emit_move_insn_1(rtx_def*, rtx_def*)
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3793
0x8092d7 emit_move_insn(rtx_def*, rtx_def*)
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3935
0x6e703f emit_library_call_value_1(int, rtx_def*, rtx_def*,
libcall_type, machine_mode, int, std::pair*)

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/calls.c:5601
0xdff642 emit_library_call_value
/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/rtl.h:4258
0xdff642 arm_expand_divmod_libfunc

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:33256
0x8c69af expand_DIVMOD

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/internal-fn.c:3084
0x7021b7 expand_call_stmt

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:2612
0x7021b7 expand_gimple_stmt_1

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3686
0x7021b7 expand_gimple_stmt

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3851
0x702cfd expand_gimple_basic_block

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5892
0x70533e execute

/tmp/1435347_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6576

Christophe



> >   * gcc.target/i386/pr97282.c: New test.
> >
> > --- gcc/tree-ssa-math-opts.c.jj   2020-10-01 10:40:10.104755999 +0200
> > +++ gcc/tree-ssa-math-opts.c  2020-10-05 13:51:54.476628287 +0200
> > @@ -3567,9 +3567,24 @@ divmod_candidate_p (gassign *stmt)
> >
> >/* Disable the transform if either is a constant, since 
> > division-by-constant
> >   may have specialized expansion.  */
> > -  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
> > +  if (CONSTANT_CLASS_P (op1))
> >  return false;
> >
> > +  if (CONSTANT_CLASS_P (op2))
> > +{
> > +  if (integer_pow2p (op2))
> > + return false;
> > +
> > +  if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
> > +   && TYPE_PRECISION (type) <= BITS_PER_WORD)
> > + return false;
> > +
> > +  /* If the divisor is not power of 2 and the prec

Re: [patch] convert -Wrestrict pass to ranger

2020-10-06 Thread Aldy Hernandez via Gcc-patches

-  builtin_memref dstref (dst, dstsize);
-  builtin_memref srcref (src, srcsize);
+  builtin_memref dstref (query, call, dst, dstsize);
+  builtin_memref srcref (query, call, src, srcsize);

    /* Create a descriptor of the access.  This may adjust both DSTREF
   and SRCREF based on one another and the kind of the access.  */
-  builtin_access acs (call, dstref, srcref);
+  builtin_access acs (query, call, dstref, srcref);


Since/if the query pointer is a member of builtin_memref which is
passed to the builtin_access ctor there should be no need to pass
a second (and third) copy to it as well.


builtin_memref seems like an independent object altogether, and the 
query is a private member of said object.  Are you proposing making it 
public, or making builtin_access a friend of builtin_memref (eeech)?
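
(A third option might be a small read-only accessor, sketched here with
guessed type and member names:

  class builtin_memref
  {
  public:
    range_query *query () const { return m_query; }
  private:
    range_query *m_query;
  };

so builtin_access could fetch the query from either memref it is handed.)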


Aldy



Re: [PATCH] lto: fix LTO debug sections copying.

2020-10-06 Thread Martin Liška

On 10/6/20 10:00 AM, Richard Biener wrote:

On Tue, Oct 6, 2020 at 9:01 AM Martin Liška  wrote:


On 10/5/20 6:34 PM, Ian Lance Taylor wrote:

On Mon, Oct 5, 2020 at 9:09 AM Martin Liška  wrote:


The previous patch was not correct. This one should be.

Ready for master?


I don't understand why this code uses symtab_indices_shndx at all.
There should only be one SHT_SYMTAB_SHNDX section.  There shouldn't be
any need for the symtab_indices_shndx vector.


Well, the question is if we can have multiple .symtab sections in one ELF
file? Theoretically yes, so we should also handle SHT_SYMTAB_SHNDX sections.
Note that the original usage of the SHT_SYMTAB_SHNDX section was motivated
by PR81968 which is about Solaris ld.


It wasn't my code but I suppose this way the implementation was "easiest".
There should be exactly one symtab / shndx section.  Rainer authored this
support.


If we expect at most one SHT_SYMTAB_SHNDX section, then I'm suggesting
an updated version of the patch. It's what Ian offered.

Thoughts?
Martin





But in any case this patch looks OK.


I also think the patch looks OK.  Rainer?

Richard.


Waiting for a feedback from Richi.

Thanks,
Martin



Thanks.

Ian





From bb259b4dc2a79ef45d449896d05855122ecc2ef9 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 5 Oct 2020 18:03:08 +0200
Subject: [PATCH] lto: fix LTO debug sections copying.

readelf -S prints:

There are 81999 section headers, starting at offset 0x1f488060:

Section Headers:
  [Nr] Name  TypeAddress  OffSize   ES Flg Lk Inf Al
  [ 0]   NULL 00 01404f 00 81998   0  0
  [ 1] .groupGROUP    40 08 04 81995 105027  4
...
  [81995] .symtab   SYMTAB   d5d9298 2db310 18 81997 105026  8
  [81996] .symtab_shndx SYMTAB SECTION INDICES  d8b45a8 079dd8 04 81995   0  4
  [81997] .strtab   STRTAB   d92e380 80460c 00  0   0  1
...

Expect only at maximum one .symtab_shndx section.

libiberty/ChangeLog:

PR lto/97290
* simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
Expect only one .symtab_shndx section.
---
 libiberty/simple-object-elf.c | 28 +++-
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
index 7c9d492f6a4..6dc5c60a842 100644
--- a/libiberty/simple-object-elf.c
+++ b/libiberty/simple-object-elf.c
@@ -1109,7 +1109,7 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
   unsigned new_i;
   unsigned *sh_map;
   unsigned first_shndx = 0;
-  unsigned int *symtab_indices_shndx;
+  unsigned int symtab_shndx = 0;
 
   shdr_size = (ei_class == ELFCLASS32
 	   ? sizeof (Elf32_External_Shdr)
@@ -1151,9 +1151,6 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
   pfnret = XNEWVEC (int, shnum);
   pfnname = XNEWVEC (const char *, shnum);
 
-  /* Map of symtab to index section.  */
-  symtab_indices_shndx = XCNEWVEC (unsigned int, shnum - 1);
-
   /* First perform the callbacks to know which sections to preserve and
  what name to use for those.  */
   for (i = 1; i < shnum; ++i)
@@ -1188,10 +1185,9 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
  shdr, sh_type, Elf_Word);
   if (sh_type == SHT_SYMTAB_SHNDX)
 	{
-	  unsigned int sh_link;
-	  sh_link = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
- shdr, sh_link, Elf_Word);
-	  symtab_indices_shndx[sh_link - 1] = i;
+	  if (symtab_shndx != 0)
+	return "Multiple SYMTAB SECTION INDICES sections";
+	  symtab_shndx = i - 1;
 	  /* Always discard the extended index sections, after
 	 copying it will not be needed.  This way we don't need to
 	 update it and deal with the ordering constraints of
@@ -1323,7 +1319,6 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	  *err = 0;
 	  XDELETEVEC (names);
 	  XDELETEVEC (shdrs);
-	  XDELETEVEC (symtab_indices_shndx);
 	  return "ELF section name out of range";
 	}
 
@@ -1341,7 +1336,6 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	{
 	  XDELETEVEC (names);
 	  XDELETEVEC (shdrs);
-	  XDELETEVEC (symtab_indices_shndx);
 	  return errmsg;
 	}
 
@@ -1362,7 +1356,6 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	  XDELETEVEC (buf);
 	  XDELETEVEC (names);
 	  XDELETEVEC (shdrs);
-	  XDELETEVEC (symtab_indices_shndx);
 	  return errmsg;
 	}
 
@@ -1372,19 +1365,22 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	{
 	  unsigned entsize = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
 	  shdr, sh_entsize, Elf_Addr);
-	  unsigned strtab = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
-	 shdr, sh_link, Elf_Word);
 	  size_t prevailing_name_idx = 0;
 	  unsigned char *e

[PATCH][obvious] dbgcnt: report upper limit when lower == upper

2020-10-06 Thread Martin Liška

Hey.

There's one obvious patch that should also report when the upper limit
of a debug counter is reached.

I'm going to install the patch.
Martin

gcc/ChangeLog:

* dbgcnt.c (dbg_cnt): Report also upper limit.
---
 gcc/dbgcnt.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/dbgcnt.c b/gcc/dbgcnt.c
index ae98a281d63..01893ce7238 100644
--- a/gcc/dbgcnt.c
+++ b/gcc/dbgcnt.c
@@ -79,7 +79,10 @@ dbg_cnt (enum debug_counter index)
 {
   print_limit_reach (map[index].name, v, false);
   if (min == max)
-   limits[index].pop ();
+   {
+ print_limit_reach (map[index].name, v, true);
+ limits[index].pop ();
+   }
   return true;
 }
   else if (v < max)
--
2.28.0



Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 11:20:52AM +0200, Aldy Hernandez wrote:
> > > diff --git a/gcc/value-range.h b/gcc/value-range.h
> > > index 94b48e55e77..7031a823138 100644
> > > --- a/gcc/value-range.h
> > > +++ b/gcc/value-range.h
> > > @@ -670,7 +670,7 @@ irange_allocator::allocate (unsigned num_pairs)
> > > 
> > > struct newir {
> > >   irange range;
> > > -tree mem[1];
> > > +tree mem[2];
> > > };
> > > size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 
> > > 1));
> > > struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);
> > 
> > So, we essentially want a flexible array member, which C++ without extension
> > doesn't have, and thus need to rely on the compiler handling the trailing
> > array as a poor men's flexible array member (again, GCC does for any size,
> > but not 100% sure about other compilers, if they e.g. don't handle that way
> > just size of 1).
> 
> We know we need _at least_ two trees, so what's wrong with the above?

See the discussions we even had in GCC.  Some of us are arguing that only
flexible array member should be treated as such, others also add [0] to
that, others [1] and others any arrays at the trailing positions.
Because standard C++ lacks both [] and [0], at least [1] support is needed
even though perhaps pedantically it is invalid.  GCC after all heavily relies
on that elsewhere, e.g. in rtl or gimple structures.  But it is still all
just [1], not [2] or [32].  And e.g. Coverity complains about that a lot.
There is another way around it, using [MAXIMUM_POSSIBLE_COUNT] instead and
then allocating only a subset of those using offsetof to count the size.
But that is undefined in a different way, would probably make Coverity
happy and e.g. for RTL is doable because we have maximum number of operands,
and for many gimple stmts too, except that e.g. GIMPLE_CALLs don't really
have a maximum (well, have it as UINT_MAX - 3 or so).

GCC to my knowledge will treat all the trailing arrays that way, but it is
unclear if other compilers do the same or not.
You can use mem[1] and just use
  size_t nbytes = sizeof (newir) + sizeof (tree) * (2 * num_pairs - 1);
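
Spelled out, the allocator body would then look roughly like this (a sketch
based on the snippet above, untested):

  struct newir {
    irange range;
    tree mem[1];	/* poor man's flexible array member */
  };
  size_t nbytes = sizeof (newir) + sizeof (tree) * (2 * num_pairs - 1);
  struct newir *r = (struct newir *) obstack_alloc (&m_obstack, nbytes);
  return new (&r->range) irange (r->mem, num_pairs);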

> > Is there any reason why the code is written that way?
> > I mean, we could just use:
> >size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;
> 
> We had that originally, but IIRC, the alignment didn't come out right.

That surprises me, because I don't understand how it could (unless irange
didn't have a pointer member at that point).

Jakub



Re: [PATCH] RISC-V: Derive ABI from -march if -mabi is not present.

2020-10-06 Thread Maciej W. Rozycki
On Tue, 6 Oct 2020, Kito Cheng wrote:

> I think this patch is kind of major change for GCC RISC-V port, so I cc all
> RISC-V gcc maintainer to make sure this change is fine with you guys.
> 
>  - Motivation of this patch:
>1. Sync behavior between clang/llvm.
>2. Preparation for -mcpu option support, -mcpu will set -march
>   according the core default arch, however it would be awkward
>   if we only change arch: user need to know the default arch of
>   the core and then set the right ABI, of cause user still can
>   specify arch and abi via -march and -mabi.
> 
>  - This patch has change the behavior for default value of ABI, the ABI
>will derive from -march if -mabi is not given, which is same behavior
>as clang/llvm.

 Just to warn you: it used to be the case with the MIPS target and the 
`-mips[1234]' ISA level options originating from SGI's MIPSpro toolchain 
and it has turned out confusing and troublesome.  After many discussions 
we ended up with the current `-march='/`-mtune='/`-mabi=' scheme, for the 
instruction set, the DFA scheduling and the ABI respectively.  Defaults 
are set with `--with-arch='/`--with-tune='/`--with-abi=' respectively, and 
in the absence of an override `-mtune=' is derived from `-march=', which 
is derived from `-mabi='.  Defaults for different ABIs can be set with 
respective `--with-arch*=' and `--with-tune*=' options.

 This prevents the ABI from being changed unexpectedly, especially if 
different though link-compatible `-march=' options are used for individual 
objects in a compilation.

 The MIPS port used to have `-mcpu=' as well, which used to be roughly 
equivalent to modern `-mtune='; from your description I gather `-mcpu=' is 
going to be roughly equivalent to a combination of `-mtune=' and `-march=' 
setting DFA scheduling for a specific CPU and the instruction set to the 
underlying architecture (do we plan to allow vendor extensions?).  In 
which case to compile a set of CPU-specific modules to be linked together 
(e.g. individual platform support in a generic piece of software like an 
OS kernel or a bare metal library) you'll always have to specify the ABI 
explicitly (though maybe you want that anyway, hmm).

 FWIW,

  Maciej


[PATCH] dbgcnt: print list after compilation

2020-10-06 Thread Martin Liška

Hello.

The motivation of the patch is to display debug counter values after a
compilation.  It's handy for bisecting a debug counter.  The new output is
printed to stderr (instead of stdout), and it works fine with LTO as well.

Sample output:

  counter name                  counter value   closed intervals
-----------------------------------------------------------------
  asan_use_after_scope          0               unset
  auto_inc_dec                  0               unset
  ccp                           29473           unset
  cfg_cleanup                   292             unset
  cprop                         45              unset
  cse2_move2add                 451             unset
  dce                           740             unset
  dce_fast                      15              unset
  dce_ud                        15              unset
  delete_trivial_dead           5747            unset
  devirt                        0               unset
  df_byte_scan                  0               unset
  dom_unreachable_edges         10              unset
  tail_call                     393             [1, 4], [100, 200]
...
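
For instance, to bisect the tail_call counter shown above, one could invoke
the compiler along these lines (interval syntax as reflected in the sample
output):

  gcc -O2 -fdbg-cnt=tail_call:1-4:100-200 -fdbg-cnt-list foo.c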


Ready for master?
Thanks,
Martin

gcc/ChangeLog:

* common.opt: Remove -fdbg-cnt-list from deferred options.
* dbgcnt.c (dbg_cnt_set_limit_by_index): Make a copy
to original_limits.
(dbg_cnt_list_all_counters): Print also current counter value
and print to stderr.
* opts-global.c (handle_common_deferred_options): Do not handle
-fdbg-cnt-list.
* opts.c (common_handle_option): Likewise.
* toplev.c (finalize): Handle it after compilation here.
---
 gcc/common.opt|  2 +-
 gcc/dbgcnt.c  | 25 +++--
 gcc/opts-global.c |  4 
 gcc/opts.c|  5 -
 gcc/toplev.c  |  4 
 5 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 292c2de694e..7e789d1c47f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1202,7 +1202,7 @@ Common Report Var(flag_data_sections)
 Place data items into their own section.
 
 fdbg-cnt-list

-Common Report Var(common_deferred_options) Defer
+Common Report Var(flag_dbg_cnt_list)
 List all available debugging counters with their limits and counts.
 
 fdbg-cnt=

diff --git a/gcc/dbgcnt.c b/gcc/dbgcnt.c
index 01893ce7238..2a2dd57507d 100644
--- a/gcc/dbgcnt.c
+++ b/gcc/dbgcnt.c
@@ -45,6 +45,7 @@ static struct string2counter_map 
map[debug_counter_number_of_counters] =
 typedef std::pair<unsigned int, unsigned int> limit_tuple;
 
 static vec<limit_tuple> limits[debug_counter_number_of_counters];
 
+static vec<limit_tuple> original_limits[debug_counter_number_of_counters];
 
 static unsigned int count[debug_counter_number_of_counters];
 
@@ -134,6 +135,8 @@ dbg_cnt_set_limit_by_index (enum debug_counter index, const char *name,

}
 }
 
+  original_limits[index] = limits[index].copy ();

+
   return true;
 }
 
@@ -226,25 +229,27 @@ void

 dbg_cnt_list_all_counters (void)
 {
   int i;
-  printf ("  %-30s %s\n", G_("counter name"), G_("closed intervals"));
-  printf 
("-\n");
+  fprintf (stderr, "  %-30s%-15s   %s\n", G_("counter name"),
+  G_("counter value"), G_("closed intervals"));
+  fprintf (stderr, 
"-\n");
   for (i = 0; i < debug_counter_number_of_counters; i++)
 {
-  printf ("  %-30s ", map[i].name);
-  if (limits[i].exists ())
+  fprintf (stderr, "  %-30s%-15d   ", map[i].name, count[i]);
+  if (original_limits[i].exists ())
{
- for (int j = limits[i].length () - 1; j >= 0; j--)
+ for (int j = original_limits[i].length () - 1; j >= 0; j--)
{
- printf ("[%u, %u]", limits[i][j].first, limits[i][j].second);
+ fprintf (stderr, "[%u, %u]", original_limits[i][j].first,
+  original_limits[i][j].second);
  if (j > 0)
-   printf (", ");
+   fprintf (stderr, ", ");
}
- putchar ('\n');
+ fprintf (stderr, "\n");
}
   else
-   printf ("unset\n");
+   fprintf (stderr, "unset\n");
 }
-  printf ("\n");
+  fprintf (stderr, "\n");
 }
 
 #if CHECKING_P

diff --git a/gcc/opts-global.c b/gcc/opts-global.c
index b024ab8e18f..1816acf805b 100644
--- a/gcc/opts-global.c
+++ b/gcc/opts-global.c
@@ -378,10 +378,6 @@ handle_common_deferred_options (void)
  dbg_cnt_process_opt (opt->arg);
  break;
 
-	case OPT_fdbg_cnt_list:

- dbg_cnt_list_all_counters ();
- break;
-
case OPT_fdebug_prefix_map_:
  add_debug_prefix_map (opt->arg);
  break;
diff --git a/gcc/opts.c b/gcc/opts.c
index 3bda59afced..da503c32dd0 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2361,11 +2361,6 @@ common_handle_option (struct gcc_options *opts,
   /* Deferred.  */
   break;
 
-case OPT

Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andreas Schwab
On Okt 06 2020, Jakub Jelinek wrote:

> On Tue, Oct 06, 2020 at 10:47:34AM +0200, Andreas Schwab wrote:
>> On Okt 06 2020, Jakub Jelinek via Gcc-patches wrote:
>> 
>> > I mean, we could just use:
>> >   size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;
>> >   irange *r = (irange *) obstack_alloc (&m_obstack, nbytes);
>> >   return new (r) irange ((tree *) (r + 1), num_pairs);
>> > without any new type.
>> 
>> Modulo proper alignment.
>
> Sure, but irange's last element is tree * which is pointer to pointer,
> and we need here an array of tree, i.e. pointers.  So, it would indeed break
> on a hypothetical host that has smaller struct X ** alignment than struct X *
> alignment.  I'm not aware of any.
> One could add a static_assert to verify that (that alignof (irange) >= 
> alignof (tree)
> and that sizeof (irange) % alignof (tree) == 0).

I think the proper alignment will be guaranteed if irange and tree[] are
obstack_alloc'd separately.  They don't need to be adjacent, do they?

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[RFC] Add support for the "retain" attribute utilizing SHF_GNU_RETAIN

2020-10-06 Thread Jozef Lawrynowicz
Hi,

I'd like to propose a new "retain" attribute, which can
be applied to function and variable declarations.

The attribute is used to protect the function or variable declaration it
is applied to from linker garbage collection, by applying the
SHF_GNU_RETAIN section flag to the section containing it. This flag is a
GNU OSABI ELF extension.
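
For illustration, a use of the proposed attribute would look like this
(a sketch; assumes -ffunction-sections and a final link with --gc-sections):

  __attribute__((retain, used))
  void keep_me (void) { }	/* its section gets SHF_GNU_RETAIN */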

The SHF_GNU_RETAIN flag was discussed on the GNU gABI mailing list here:
https://sourceware.org/pipermail/gnu-gabi/2020q3/000429.html

The Binutils patch for SHF_GNU_RETAIN was discussed in the following
threads:
https://sourceware.org/pipermail/binutils/2020-September/113406.html
https://sourceware.org/pipermail/binutils/2020-September/113499.html

The Binutils patch is still being iterated on, and I'd like to get some
feedback on one particular aspect of the GCC functionality before
finalizing the Binutils side of things.

When the "retain" attribute is applied to a declaration, there are three
ways to apply the SHF_GNU_RETAIN flag to the section containing the
declaration:
(1) Mark the entire section containing the declaration with the
SHF_GNU_RETAIN flag
(2) Place the declaration in a new, uniquely named section with
SHF_GNU_RETAIN set.
(3) Place the declaration in a new section with its default name, and
SHF_GNU_RETAIN set.

I think that (2) is the best option, as it most closely corresponds to
the behavior the user wants to apply by using the "retain" attribute.
That is, only the declaration itself needs to be retained.

Option (3) has the same advantage; however, it requires some non-standard
behavior in the assembler to support it. The assembler would normally emit
an error if two input sections have the same name, but different flags
set. At the moment, SHF_GNU_RETAIN is an exception to this, but
there is no fundamental reason that this exception is required, as the
associated behavior can be fully supported by just giving the section a
unique name.

As far as I'm aware, option (1) would be tricky to support in GCC.
We'd have to examine all the declarations within a section before the
first assembler directive to create a section is created, which isn't
really compatible with the current, linear nature of the assembly output
stream. I guess there's probably something we could do in the middle-end
to set a flag somewhere to catch this without it getting too
complicated.
However, it would also lead to large portions of the program being
unnecessarily retained in the linked file, when only one declaration was
required.

If anyone has any strong opinions that option (2) isn't the best choice
for the "retain" attribute, please let me know. I plan on finalizing the
Binutils patch in the coming days, removing the added support for unique
input sections with the same name, but different states for the
SHF_GNU_RETAIN flag, which is required for option (3).

Should "used" apply SHF_GNU_RETAIN?
===
Another talking point is whether the existing "used" attribute should
apply the SHF_GNU_RETAIN flag to the containing section.

It seems unlikely that a user applies the "used" attribute to a
declaration, and means for it to be saved from only compiler
optimization, but *not* linker optimization. So perhaps it would be
beneficial for "used" to apply SHF_GNU_RETAIN in some way.

If "used" did apply SHF_GNU_RETAIN, we would also have to
consider the above options for how to apply SHF_GNU_RETAIN to the
section. Since the "used" attribute has been around for a while,
it might not be appropriate for its behavior to be changed to place the
associated declaration in its own, unique section, as in option (2).

However, I tested this "used" attribute modification on
x86_64-pc-linux-gnu, and there was only a small number of regressions
(27 PASS->FAIL, from 6 tests) across the GCC and G++ testsuites.

I briefly investigated these, and some failures are just due to a change
in the expected output of tests, but also some real errors from issues
with hot/cold function partitioning. I believe those would just require
some additional functional changes, and there isn't anything
fundamentally broken.

So nothing that can't be worked around, but I am more concerned about
the wider impact of changing the attribute, which is not represented by
this small subset of testing. The changes would also only affect targets
that support the GNU ELF OSABI, which would lead to inconsistent
behavior between non-GNU OS's. Perhaps this isn't an issue since we can
just document it in the description for the "used" attribute:
  As a GNU ELF extension, the declaration the "used" attribute is
  applied to will be placed in a new, uniquely named section with the
  SHF_GNU_RETAIN flag applied.

I think that unless "used" creates a new, uniquely named SHF_GNU_RETAIN
section for a declaration, there is merit to having a separate "retain"
attribute that has this behavior.

To summarize the talking points:
- Any downsides to the new "retain" attribute creating a new, uniquely
  

Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Martin Liška

On 10/2/20 4:19 PM, Andrew MacLeod wrote:

On 10/2/20 9:26 AM, Martin Liška wrote:

Yes, you simply get all sorts of conditions that hold when a condition is
true, not just those based on the SSA name you put in.  But it occured
to me that the use-case is somewhat different - for switch-conversion
you want to know whether the test _exactly_ matches a range test,
the VRP worker will not tell you that.  For example if you had
if (x &&  a > 3 && a < 7) then it will give you 'a in [4, 6]' and it might
not give you 'x in [1, 1]' (for example if x is float).  But that's required
for correctness.


Hello.

Adding Ranger guys. Is it something that can be handled by the upcoming changes 
in VRP?


Presumably. It depends on exactly how the code lays out.  We don't process 
floats, so we won't know anything about the float (at least this release :-).  
We will sort through complex logicals and tell you what we do know, so if x is 
integral


    if (x &&  a > 3 && a < 7)

will give you, on the final true edge:

x_5(D)  int [-INF, -1][1, +INF]
a_6(D)  int [4, 6]


IF x is a float, then we wont give you anything for x obviously, but on the 
eventual true edge we'd still give you
a_6(D)  int [4, 6]


Which is an acceptable limitation for me.

However, I can't convince the ranger to give me proper ranges. I'm using
the following code snippet:

  outgoing_range query;

  edge e;
  edge_iterator ei;
  FOR_EACH_EDGE (e, ei, bb->succs)
    {
      int_range_max range;
      if (query.edge_range_p (range, e))
	{
	  if (dump_file)
	    {
	      fprintf (dump_file, "%d->%d: ", e->src->index, e->dest->index);
	      range.dump (dump_file);
	      fprintf (dump_file, "\n");
	    }
	}
    }


if (9 <= index && index <= 123)
  return 123;

   <bb 2> :
  index.0_1 = (unsigned int) index_5(D);
  _2 = index.0_1 + 4294967287;
  if (_2 <= 114)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

I get:

2->3: _Bool [1, 1]
2->4: _Bool [0, 0]

Can I get to index_5 [9, 123] ?
Thanks,
Martin




Andrew







[PATCH] openmp: Improve composite simd vectorization

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 08:22:13AM +0200, Richard Biener wrote:
> > I was really hoping bbs 4 and 5 would be one loop (the one I set safelen
> > and force_vectorize etc. for) and that basic blocks 6 and 7 would be
> > together with that inner loop another loop, but apparently loop discovery
> > thinks it is just one loop.
> > Any ideas what I'm doing wrong or is there any way how to make it two loops
> > (that would also survive all the cfg cleanups until vectorization)?
> 
> The early CFG looks like we have a common header with two latches
> so it boils down to how we disambiguate those in the end (we seem
> to unify the latches via a forwarder).  IIRC OMP lowering builds
> loops itself, could it not do the appropriate disambiguation itself?

I realized I emit the same stmts on both paths (before the goto doit; and
before falling through to it), at least the MIN_EXPR and PLUS_EXPR.  So by
forcing an extra bb which does those two, with the "doit" label placed
before it, the innermost loop doesn't have multiple latches anymore and so
is vectorized fine.

Will commit this after full bootstrap/regtest.

Thanks.

2020-10-06  Jakub Jelinek  

* omp-expand.c (expand_omp_simd): Don't emit MIN_EXPR and PLUS_EXPR
at the end of entry_bb and innermost init_bb, instead force arguments
for MIN_EXPR into temporaries in both cases and jump to a new bb that
performs MIN_EXPR and PLUS_EXPR.

* gcc.dg/gomp/simd-2.c: New test.
* gcc.dg/gomp/simd-3.c: New test.

--- gcc/omp-expand.c.jj 2020-09-26 10:09:57.524001314 +0200
+++ gcc/omp-expand.c2020-10-06 13:38:14.295073351 +0200
@@ -6347,6 +6347,7 @@ expand_omp_simd (struct omp_region *regi
   tree n2var = NULL_TREE;
   tree n2v = NULL_TREE;
   tree *nonrect_bounds = NULL;
+  tree min_arg1 = NULL_TREE, min_arg2 = NULL_TREE;
   if (fd->collapse > 1)
 {
   if (broken_loop || gimple_omp_for_combined_into_p (fd->for_stmt))
@@ -6406,9 +6407,10 @@ expand_omp_simd (struct omp_region *regi
 fold_convert (itype, fd->loops[i].step));
  t = fold_convert (type, t);
  tree t2 = fold_build2 (MINUS_EXPR, type, n2, n1);
- t = fold_build2 (MIN_EXPR, type, t2, t);
- t = fold_build2 (PLUS_EXPR, type, fd->loop.v, t);
- expand_omp_build_assign (&gsi, n2var, t);
+ min_arg1 = create_tmp_var (type);
+ expand_omp_build_assign (&gsi, min_arg1, t2);
+ min_arg2 = create_tmp_var (type);
+ expand_omp_build_assign (&gsi, min_arg2, t);
}
   else
{
@@ -6815,7 +6817,16 @@ expand_omp_simd (struct omp_region *regi
}
  else
t = counts[i + 1];
- t = fold_build2 (MIN_EXPR, type, t2, t);
+ expand_omp_build_assign (&gsi, min_arg1, t2);
+ expand_omp_build_assign (&gsi, min_arg2, t);
+ e = split_block (init_bb, last_stmt (init_bb));
+ gsi = gsi_after_labels (e->dest);
+ init_bb = e->dest;
+ remove_edge (FALLTHRU_EDGE (entry_bb));
+ make_edge (entry_bb, init_bb, EDGE_FALLTHRU);
+ set_immediate_dominator (CDI_DOMINATORS, init_bb, entry_bb);
+ set_immediate_dominator (CDI_DOMINATORS, l1_bb, init_bb);
+ t = fold_build2 (MIN_EXPR, type, min_arg1, min_arg2);
  t = fold_build2 (PLUS_EXPR, type, fd->loop.v, t);
  expand_omp_build_assign (&gsi, n2var, t);
}
--- gcc/testsuite/gcc.dg/gomp/simd-2.c.jj   2020-10-06 13:33:53.568870663 
+0200
+++ gcc/testsuite/gcc.dg/gomp/simd-2.c  2020-10-06 13:32:59.674655600 +0200
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -fdump-tree-vect-details" } */
+/* { dg-additional-options "-mavx" { target avx } } */
+/* { dg-final { scan-tree-dump-times "vectorized \[1-9]\[0-9]* loops in 
function" 5 "vect" } } */
+
+int a[1][128];
+
+void
+foo (void)
+{
+  #pragma omp for simd schedule (simd: dynamic, 32) collapse(2)
+  for (int i = 0; i < 1; i++)
+for (int j = 0; j < 128; j++)
+  a[i][j] += 3;
+}
+
+void
+bar (void)
+{
+  #pragma omp parallel for simd schedule (simd: dynamic, 32) collapse(2)
+  for (int i = 0; i < 1; i++)
+for (int j = 0; j < 128; j++)
+  a[i][j] += 3;
+}
+
+void
+baz (void)
+{
+  #pragma omp distribute parallel for simd schedule (simd: dynamic, 32) 
collapse(2)
+  for (int i = 0; i < 1; i++)
+for (int j = 0; j < 128; j++)
+  a[i][j] += 3;
+}
+
+void
+qux (void)
+{
+  #pragma omp distribute simd dist_schedule (static, 128) collapse(2)
+  for (int i = 0; i < 1; i++)
+for (int j = 0; j < 128; j++)
+  a[i][j] += 3;
+}
+
+void
+corge (void)
+{
+  #pragma omp taskloop simd collapse(2)
+  for (int i = 0; i < 1; i++)
+for (int j = 0; j < 128; j++)
+  a[i][j] += 3;
+}
--- gcc/testsuite/gcc.dg/gomp/simd-3.c.jj   2020-10-06 13:33:59.543783638 
+0200
+++ gcc/testsuite/gcc.dg/gomp/simd-3.c  2020-10-06 13:36:25.65065568

[PATCH][GCC] arm: Move iterators from mve.md to iterators.md to maintain consistency.

2020-10-06 Thread Srinath Parvathaneni via Gcc-patches
Hello,

To maintain consistency with the rest of the Arm backend, iterators and
iterator attributes are moved from the mve.md file to iterators.md.  Also
move enumerators for MVE unspecs from the mve.md file to the unspecs.md file.

Regression tested on arm-none-eabi and found no regressions.

Ok for master? Ok for GCC-10 branch?

Regards,
Srinath.

gcc/ChangeLog:

2020-10-06  Srinath Parvathaneni  

* config/arm/iterators.md (MVE_types): Move mode iterator from mve.md to
iterators.md.
(MVE_VLD_ST): Likewise.
(MVE_0): Likewise.
(MVE_1): Likewise.
(MVE_3): Likewise.
(MVE_2): Likewise.
(MVE_5): Likewise.
(MVE_6): Likewise.
(MVE_CNVT): Move mode attribute iterator from mve.md to iterators.md.
(MVE_LANES): Likewise.
(MVE_constraint): Likewise.
(MVE_constraint1): Likewise.
(MVE_constraint2): Likewise.
(MVE_constraint3): Likewise.
(MVE_pred): Likewise.
(MVE_pred1): Likewise.
(MVE_pred2): Likewise.
(MVE_pred3): Likewise.
(MVE_B_ELEM): Likewise.
(MVE_H_ELEM): Likewise.
(V_sz_elem1): Likewise.
(V_extr_elem): Likewise.
(earlyclobber_32): Likewise.
(supf): Move int attribute from mve.md to iterators.md.
(mode1): Likewise.
(VCVTQ_TO_F): Move int iterator from mve.md to iterators.md.
(VMVNQ_N): Likewise.
(VREV64Q): Likewise.
(VCVTQ_FROM_F): Likewise.
(VREV16Q): Likewise.
(VCVTAQ): Likewise.
(VMVNQ): Likewise.
(VDUPQ_N): Likewise.
(VCLZQ): Likewise.
(VADDVQ): Likewise.
(VREV32Q): Likewise.
(VMOVLBQ): Likewise.
(VMOVLTQ): Likewise.
(VCVTPQ): Likewise.
(VCVTNQ): Likewise.
(VCVTMQ): Likewise.
(VADDLVQ): Likewise.
(VCTPQ): Likewise.
(VCTPQ_M): Likewise.
(VCVTQ_N_TO_F): Likewise.
(VCREATEQ): Likewise.
(VSHRQ_N): Likewise.
(VCVTQ_N_FROM_F): Likewise.
(VADDLVQ_P): Likewise.
(VCMPNEQ): Likewise.
(VSHLQ): Likewise.
(VABDQ): Likewise.
(VADDQ_N): Likewise.
(VADDVAQ): Likewise.
(VADDVQ_P): Likewise.
(VANDQ): Likewise.
(VBICQ): Likewise.
(VBRSRQ_N): Likewise.
(VCADDQ_ROT270): Likewise.
(VCADDQ_ROT90): Likewise.
(VCMPEQQ): Likewise.
(VCMPEQQ_N): Likewise.
(VCMPNEQ_N): Likewise.
(VEORQ): Likewise.
(VHADDQ): Likewise.
(VHADDQ_N): Likewise.
(VHSUBQ): Likewise.
(VHSUBQ_N): Likewise.
(VMAXQ): Likewise.
(VMAXVQ): Likewise.
(VMINQ): Likewise.
(VMINVQ): Likewise.
(VMLADAVQ): Likewise.
(VMULHQ): Likewise.
(VMULLBQ_INT): Likewise.
(VMULLTQ_INT): Likewise.
(VMULQ): Likewise.
(VMULQ_N): Likewise.
(VORNQ): Likewise.
(VORRQ): Likewise.
(VQADDQ): Likewise.
(VQADDQ_N): Likewise.
(VQRSHLQ): Likewise.
(VQRSHLQ_N): Likewise.
(VQSHLQ): Likewise.
(VQSHLQ_N): Likewise.
(VQSHLQ_R): Likewise.
(VQSUBQ): Likewise.
(VQSUBQ_N): Likewise.
(VRHADDQ): Likewise.
(VRMULHQ): Likewise.
(VRSHLQ): Likewise.
(VRSHLQ_N): Likewise.
(VRSHRQ_N): Likewise.
(VSHLQ_N): Likewise.
(VSHLQ_R): Likewise.
(VSUBQ): Likewise.
(VSUBQ_N): Likewise.
(VADDLVAQ): Likewise.
(VBICQ_N): Likewise.
(VMLALDAVQ): Likewise.
(VMLALDAVXQ): Likewise.
(VMOVNBQ): Likewise.
(VMOVNTQ): Likewise.
(VORRQ_N): Likewise.
(VQMOVNBQ): Likewise.
(VQMOVNTQ): Likewise.
(VSHLLBQ_N): Likewise.
(VSHLLTQ_N): Likewise.
(VRMLALDAVHQ): Likewise.
(VBICQ_M_N): Likewise.
(VCVTAQ_M): Likewise.
(VCVTQ_M_TO_F): Likewise.
(VQRSHRNBQ_N): Likewise.
(VABAVQ): Likewise.
(VSHLCQ): Likewise.
(VRMLALDAVHAQ): Likewise.
(VADDVAQ_P): Likewise.
(VCLZQ_M): Likewise.
(VCMPEQQ_M_N): Likewise.
(VCMPEQQ_M): Likewise.
(VCMPNEQ_M_N): Likewise.
(VCMPNEQ_M): Likewise.
(VDUPQ_M_N): Likewise.
(VMAXVQ_P): Likewise.
(VMINVQ_P): Likewise.
(VMLADAVAQ): Likewise.
(VMLADAVQ_P): Likewise.
(VMLAQ_N): Likewise.
(VMLASQ_N): Likewise.
(VMVNQ_M): Likewise.
(VPSELQ): Likewise.
(VQDMLAHQ_N): Likewise.
(VQRDMLAHQ_N): Likewise.
(VQRDMLASHQ_N): Likewise.
(VQRSHLQ_M_N): Likewise.
(VQSHLQ_M_R): Likewise.
(VREV64Q_M): Likewise.
(VRSHLQ_M_N): Likewise.
(VSHLQ_M_R): Likewise.
(VSLIQ_N): Likewise.
(VSRIQ_N): Likewise.
(VMLALDAVQ_P): Likewise.
(VQMOVNBQ_M): Likewise.
(VMOVLTQ_M): Likewise.
(VMOVNBQ_M): Likewise.
(VRSHRN

Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 6:40 AM, Andreas Schwab wrote:

On Okt 06 2020, Jakub Jelinek wrote:


On Tue, Oct 06, 2020 at 10:47:34AM +0200, Andreas Schwab wrote:

On Okt 06 2020, Jakub Jelinek via Gcc-patches wrote:


I mean, we could just use:
   size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;
   irange *r = (irange *) obstack_alloc (&m_obstack, nbytes);
   return new (r) irange ((tree *) (r + 1), num_pairs);
without any new type.

Modulo proper alignment.

Sure, but irange's last element is tree * which is pointer to pointer,
and we need here an array of tree, i.e. pointers.  So, it would indeed break
on a hypothetical host that has smaller struct X ** alignment than struct X *
alignment.  I'm not aware of any.
One could add a static_assert to verify that (that alignof (irange) >= alignof 
(tree)
and that sizeof (irange) % alignof (tree) == 0).

I think the proper alignment will be guaranteed if irange and tree[] are
obstack_alloc'd separately.  They don't need to be adjacent, do they?


They do not, it just seemed wasteful to do 2 allocs each time, and it'd 
be nice to have them co-located since accessing one inevitably leads to 
accessing the other.






Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 6:22 AM, Jakub Jelinek via Gcc-patches wrote:

On Tue, Oct 06, 2020 at 11:20:52AM +0200, Aldy Hernandez wrote:

diff --git a/gcc/value-range.h b/gcc/value-range.h
index 94b48e55e77..7031a823138 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -670,7 +670,7 @@ irange_allocator::allocate (unsigned num_pairs)

 struct newir {
   irange range;
-tree mem[1];
+tree mem[2];
 };
 size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
 struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);

So, we essentially want a flexible array member, which C++ without extension
doesn't have, and thus need to rely on the compiler handling the trailing
array as a poor men's flexible array member (again, GCC does for any size,
but not 100% sure about other compilers, if they e.g. don't handle that way
just size of 1).

We know we need _at least_ two trees, so what's wrong with the above?

See the discussions we even had in GCC.  Some of us are arguing that only
flexible array member should be treated as such, others also add [0] to
that, others [1] and others any arrays at the trailing positions.
Because standard C++ lacks both [] and [0], at least [1] support is needed
eventhough perhaps pedantically it is invalid.  GCC after all heavily relies
on that elsewhere, e.g. in rtl or gimple structures.  But it is still all
just [1], not [2] or [32].  And e.g. Coverity complains about that a lot.
There is another way around it, using [MAXIMUM_POSSIBLE_COUNT] instead and
then allocating only a subset of those using offsetof to count the size.
But that is undefined in a different way, would probably make Coverity
happy and e.g. for RTL is doable because we have maximum number of operands,
and for many gimple stmts too, except that e.g. GIMPLE_CALLs don't really
have a maximum (well, have it as UINT_MAX - 3 or so).

GCC to my knowledge will treat all the trailing arrays that way, but it is
unclear if other compilers do the same or not.
You can use mem[1] and just use
   size_t nbytes = sizeof (newir) + sizeof (tree) * (2 * num_pairs - 1);


sure that is fine too. I was not aware of any issue with changing [1] to 
[2], it just seemed like the obvious thing :-).


so everything is copacetic if we go back to [1] and add a sizeof (tree) 
instead?


Andrew






Is there any reason why the code is written that way?
I mean, we could just use:
size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;

We had that originally, but IIRC, the alignment didn't come out right.

That surprises me, because I don't understand how it could (unless irange
didn't have a pointer member at that point).

Jakub





Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 08:47:53AM -0400, Andrew MacLeod wrote:
> > I think the proper alignment will be guaranteed if irange and tree[] are
> > obstack_alloc'd separately.  They don't need to be adjacent, do they?
> > 
> > 
> They do not, it just seemed wasteful to do 2 allocs each time, and it'd be
> nice to have them co-located since accessing one inevitable leads to
> accessing the other.

When using a normal allocator like malloc or ggc allocation I'd totally agree
here, but I actually think with obstack it is better to do two successive
allocations.
obstack_alloc is generally pretty cheap: in the common case there is room
for the allocation and so it just bumps the next pointer in the structure,
and that is it - for two allocations the same as with one.
And, if there is room just for the irange and not for the subsequent
allocation, with two allocations they wouldn't be collocated, but wouldn't
waste the memory that would otherwise remain unused.
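
A minimal sketch of that two-allocation variant, assuming the same
m_obstack member as in the patch above; in the common case the two
blocks still end up adjacent because obstack just bumps a pointer:

  irange *r = (irange *) obstack_alloc (&m_obstack, sizeof (irange));
  tree *mem = (tree *) obstack_alloc (&m_obstack,
				      sizeof (tree) * 2 * num_pairs);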

Jakub



Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 8:09 AM, Martin Liška wrote:

On 10/2/20 4:19 PM, Andrew MacLeod wrote:

On 10/2/20 9:26 AM, Martin Liška wrote:
Yes, you simply get all sorts of conditions that hold when a condition is
true, not just those based on the SSA name you put in.  But it occurred
to me that the use-case is somewhat different - for switch-conversion
you want to know whether the test _exactly_ matches a range test,
the VRP worker will not tell you that.  For example if you had
if (x && a > 3 && a < 7) then it will give you 'a in [4, 6]' and it might
not give you 'x in [1, 1]' (for example if x is float).  But that's
required for correctness.


Hello.

Adding Ranger guys. Is it something that can be handled by the 
upcoming changes in VRP?


Presumably.  It depends on exactly how the code lays out.  We don't
process floats, so we won't know anything about the float (at least
this release :-).  We will sort through complex logicals and tell you
what we do know, so if x is integral



    if (x &&  a > 3 && a < 7)

will give you, on the final true edge:

x_5(D)  int [-INF, -1][1, +INF]
a_6(D)  int [4, 6]


If x is a float, then we won't give you anything for x obviously, but
on the eventual true edge we'd still give you

a_6(D)  int [4, 6]


Which is an acceptable limitation for me.

However, I can't convince ranger to give me proper ranges for it.  I'm
using the following code snippet:

  outgoing_range query;

  edge e;
  edge_iterator ei;
  FOR_EACH_EDGE (e, ei, bb->succs)
    {
      int_range_max range;
      if (query.edge_range_p (range, e))
	{
	  if (dump_file)
	    {
	      fprintf (dump_file, "%d->%d: ", e->src->index, e->dest->index);
	      range.dump (dump_file);
	      fprintf (dump_file, "\n");
	    }
	}
    }


if (9 <= index && index <= 123)
    return 123;

  <bb 2> :
  index.0_1 = (unsigned int) index_5(D);
  _2 = index.0_1 + 4294967287;
  if (_2 <= 114)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

I get:

2->3: _Bool [1, 1]
2->4: _Bool [0, 0]

Can I get to index_5 [9, 123] ?
Thanks,
Martin

Ah, by just using the outgoing_range class, all you are getting is
static edges.  So a TRUE edge is always a [1,1] and a false edge is [0,0].

I provided that class so you could get the constant edges on switches.

if you want to get actual ranges for ssa-names, you will need the ranger 
(which I think is going in today).  It uses those values as the starting 
point for winding back to calculate other dependent names.


Then you will want to query the ranger for the range of index_5 on that
edge.


so you will need a gimple ranger instance instead of an outgoing_range 
object.
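
A minimal sketch of that query, assuming the range_on_edge entry point
from the ranger patches under discussion (the exact spelling may differ
once it lands), and assuming you already hold the SSA name index_5:

  gimple_ranger ranger;
  int_range_max r;
  /* Ask for the range of index_5 on the outgoing edge e.  */
  if (ranger.range_on_edge (r, e, index_5))
    {
      /* On the 2->3 edge of the dump above this should come back
	 as int [9, 123].  */
    }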


Andrew



Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 8:55 AM, Jakub Jelinek wrote:

On Tue, Oct 06, 2020 at 08:47:53AM -0400, Andrew MacLeod wrote:

I think the proper alignment will be guaranteed if irange and tree[] are
obstack_alloc'd separately.  They don't need to be adjacent, do they?



They do not, it just seemed wasteful to do 2 allocs each time, and it'd be
nice to have them co-located since accessing one inevitably leads to
accessing the other.

When using a normal allocator like malloc or ggc allocation I'd totally agree
here, but I actually think with obstack it is better to do two successive
allocations.
obstack_alloc is generally pretty cheap: in the common case there is room
for the allocation and so it just bumps the next pointer in the structure,
and that is it - for two allocations the same as with one.
And, if there is room just for the irange and not for the subsequent
allocation, with two allocations they wouldn't be collocated, but wouldn't
waste the memory that would otherwise remain unused.

Jakub

Okeydoke then.  We can do 2 allocations.




Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Martin Liška

On 10/6/20 2:56 PM, Andrew MacLeod wrote:

Ah, by just using the outgoing_range class, all you are getting is static 
edges.  so a TRUE edge is always a [1,1] and a false edge is [0,0]
I provided that class so you could get the constant edges on switches.

if you want to get actual ranges for ssa-names, you will need the ranger (which 
I think is going in today).  It uses those values as the starting point for 
winding back to calculate other dependent names.


Ah, all right!



Then  you will want to query the ranger for the range of index_5 on that edge..


Fine! So the only tricky thing here is to select a proper SSA_NAME to query,
right?
In my case I need to cover situations like:

  index.0_1 = (unsigned int) index_5(D);
  _2 = index.0_1 + 4294967287;
  if (_2 <= 114)

or

_1 = aChar_8(D) == 1;
_2 = aChar_8(D) == 10;
_3 = _1 | _2;
if (_3 != 0)

Anything Ranger can help me with?

Martin



so you will need a gimple ranger instance instead of an outgoing_range object.

Andrew




[PATCH] options: Avoid unused variable mask warning [PR97305]

2020-10-06 Thread Jakub Jelinek via Gcc-patches
Hi!

On Tue, Oct 06, 2020 at 11:28:22AM +0200, Andreas Schwab wrote:
> options-save.c: In function 'void cl_target_option_save(cl_target_option*, 
> gcc_options*, gcc_options*)':
> options-save.c:8526:26: error: unused variable 'mask' 
> [-Werror=unused-variable]
>  8526 |   unsigned HOST_WIDE_INT mask = 0;
>   |  ^~~~
> options-save.c: In function 'void cl_target_option_restore(gcc_options*, 
> gcc_options*, cl_target_option*)':
> options-save.c:8537:26: error: unused variable 'mask' 
> [-Werror=unused-variable]
>  8537 |   unsigned HOST_WIDE_INT mask;
>   |  ^~~~

Oops, missed that, sorry.

The following patch should fix that, tested on x86_64-linux with make
options-save.c (same file as before) and with an ia64-linux cross make
options-save.o (no warning anymore, just the unwanted declarations gone).

Ok for trunk if it passes bootstrap/regtest?

2020-10-06  Jakub Jelinek  

PR bootstrap/97305
* optc-save-gen.awk: Don't declare mask variable if explicit_mask
array is not present.

--- gcc/optc-save-gen.awk.jj2020-10-05 09:34:26.561874335 +0200
+++ gcc/optc-save-gen.awk   2020-10-06 14:44:04.679556591 +0200
@@ -597,11 +597,13 @@ for (i = 0; i < n_target_string; i++) {
 }
 
 print "";
-print "  unsigned HOST_WIDE_INT mask = 0;";
 
 j = 0;
 k = 0;
 for (i = 0; i < n_extra_target_vars; i++) {
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" extra_target_vars[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -617,6 +619,9 @@ for (i = 0; i < n_target_other; i++) {
print "  ptr->explicit_mask_" var_target_other[i] " = 
opts_set->x_" var_target_other[i] ";";
continue;
}
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" var_target_other[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -628,6 +633,9 @@ for (i = 0; i < n_target_other; i++) {
 }
 
 for (i = 0; i < n_target_enum; i++) {
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" var_target_enum[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -643,6 +651,9 @@ for (i = 0; i < n_target_int; i++) {
print "  ptr->explicit_mask_" var_target_int[i] " = 
opts_set->x_" var_target_int[i] ";";
continue;
}
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" var_target_int[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -654,6 +665,9 @@ for (i = 0; i < n_target_int; i++) {
 }
 
 for (i = 0; i < n_target_short; i++) {
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" var_target_short[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -665,6 +679,9 @@ for (i = 0; i < n_target_short; i++) {
 }
 
 for (i = 0; i < n_target_char; i++) {
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" var_target_char[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -676,6 +693,9 @@ for (i = 0; i < n_target_char; i++) {
 }
 
 for (i = 0; i < n_target_string; i++) {
+   if (j == 0 && k == 0) {
+   print "  unsigned HOST_WIDE_INT mask = 0;";
+   }
print "  if (opts_set->x_" var_target_string[i] ") mask |= 
HOST_WIDE_INT_1U << " j ";";
j++;
if (j == 64) {
@@ -732,7 +752,9 @@ for (i = 0; i < n_target_string; i++) {
 }
 
 print "";
-print "  unsigned HOST_WIDE_INT mask;";
+if (has_target_explicit_mask) {
+   print "  unsigned HOST_WIDE_INT mask;";
+}
 
 j = 64;
 k = 0;


Jakub



Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 9:09 AM, Martin Liška wrote:

On 10/6/20 2:56 PM, Andrew MacLeod wrote:
Ah, by just using the outgoing_range class, all you are getting is 
static edges.  so a TRUE edge is always a [1,1] and a false edge is [0,0]

I provided that class so you could get the constant edges on switches.

if you want to get actual ranges for ssa-names, you will need the 
ranger (which I think is going in today).  It uses those values as 
the starting point for winding back to calculate other dependent names.


Ah, all right!



Then  you will want to query the ranger for the range of index_5 on 
that edge..


Fine! So the only tricky thing here is to select a proper SSA_NAME to 
query right?

In my case I need to cover situations like:

  index.0_1 = (unsigned int) index_5(D);
  _2 = index.0_1 + 4294967287;
  if (_2 <= 114)

or

    _1 = aChar_8(D) == 1;
    _2 = aChar_8(D) == 10;
    _3 = _1 | _2;
    if (_3 != 0)

Anything Ranger can help me with?

Martin



Well, it *does* assume you know the name of what you are looking for :-P


However, let's see.  It does know the names of things it can generate
ranges for.  We haven't gotten around to adding an API for querying
that, but that would be possible.


It maintains an export list of names it can calculate ranges for (as a
bitmap).  So for your 2 examples, the export list of the first block
contains

   _2, index.0_1, and index_5
and in the second case, it would contain
_3, _2, _1, and aChar_8

So even if you had access to the export list, you'd still have to figure
out which one you wanted, so I'm not sure that helps.  But I suppose you
could go through the list looking for something interesting.
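
For what it's worth, a sketch of what such a scan might look like,
assuming the export bitmap becomes reachable through the ranger (the
gori ()/exports () spelling here is a hypothetical API, not a committed
one):

  bitmap exports = ranger.gori ().exports (bb);
  bitmap_iterator bi;
  unsigned i;
  EXECUTE_IF_SET_IN_BITMAP (exports, 0, i, bi)
    {
      tree name = ssa_name (i);
      /* Decide whether NAME is the interesting one to query.  */
    }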


I have longer-term plans to expose/determine the "control names" which
trigger the branch - that would be '_2' in the first example and
'aChar_8' in the second... but that facility is not built yet.






so you will need a gimple ranger instance instead of an 
outgoing_range object.


Andrew






[PATCH][openacc] Fix acc declare for VLAs

2020-10-06 Thread Tom de Vries
Hi,

Consider test-case test.c, with VLA A:
...
int main (void) {
  int N = 1000;
  int A[N];
  #pragma acc declare copy(A)
  return 0;
}
...
compiled using:
...
$ gcc test.c -fopenacc -S -fdump-tree-all
...

At original, we have:
...
  #pragma acc declare map(tofrom:A);
...
but at gimple, we have a map (to:A.1), but not a map (from:A.1):
...
  int[0:D.2074] * A.1;

  {
int A[0:D.2074] [value-expr: *A.1];

saved_stack.2 = __builtin_stack_save ();
try
  {
A.1 = __builtin_alloca_with_align (D.2078, 32);
#pragma omp target oacc_declare map(to:(*A.1) [len: D.2076])
  }
finally
  {
__builtin_stack_restore (saved_stack.2);
  }
  }
...

This is caused by the following incompatibility.  When storing the desired
from clause in oacc_declare_returns, we use 'A.1' as the key:
...
10898 oacc_declare_returns->put (decl, c);
(gdb) call debug_generic_expr (decl)
A.1
(gdb) call debug_generic_expr (c)
map(from:(*A.1))
...
but when looking it up, we use 'A' as the key:
...
(gdb)
1471  tree *c = oacc_declare_returns->get (t);
(gdb) call debug_generic_expr (t)
A
...

Fix this by extracting the 'A.1' lookup key from 'A' using the decl-expr.

In addition, unshare the looked-up value, to avoid running into
an "incorrect sharing of tree nodes" error.

Using these two fixes, we get our desired:
...
 finally
   {
+#pragma omp target oacc_declare map(from:(*A.1))
 __builtin_stack_restore (saved_stack.2);
   }
...

Built on x86_64-linux with nvptx accelerator, tested libgomp.

OK for trunk?

Thanks,
- Tom

[openacc] Fix acc declare for VLAs

gcc/ChangeLog:

2020-10-06  Tom de Vries  

PR middle-end/90861
* gimplify.c (gimplify_bind_expr): Handle lookup in
oacc_declare_returns using key with decl-expr.

libgomp/ChangeLog:

2020-10-06  Tom de Vries  

PR middle-end/90861
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Remove xfail.

---
 gcc/gimplify.c                                            | 13 ++++++++++---
 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c |  5 -----
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2dea03cce3d..fa89e797940 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1468,15 +1468,22 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
 
  if (flag_openacc && oacc_declare_returns != NULL)
{
- tree *c = oacc_declare_returns->get (t);
+ tree key = t;
+ if (DECL_HAS_VALUE_EXPR_P (key))
+   {
+ key = DECL_VALUE_EXPR (key);
+ if (TREE_CODE (key) == INDIRECT_REF)
+   key = TREE_OPERAND (key, 0);
+   }
+ tree *c = oacc_declare_returns->get (key);
  if (c != NULL)
{
  if (ret_clauses)
OMP_CLAUSE_CHAIN (*c) = ret_clauses;
 
- ret_clauses = *c;
+ ret_clauses = unshare_expr (*c);
 
- oacc_declare_returns->remove (t);
+ oacc_declare_returns->remove (key);
 
  if (oacc_declare_returns->is_empty ())
{
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
index 0f51badca42..714935772c1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
@@ -59,8 +59,3 @@ main ()
 
   return 0;
 }
-
-
-/* { dg-xfail-run-if "TODO PR90861" { *-*-* } { "-DACC_MEM_SHARED=0" } }
-   This might XPASS if the compiler happens to put the two 'A' VLAs at the same
-   address.  */


Re: [PATCH] options: Avoid unused variable mask warning [PR97305]

2020-10-06 Thread Richard Biener
On Tue, 6 Oct 2020, Jakub Jelinek wrote:

> Hi!
> 
> On Tue, Oct 06, 2020 at 11:28:22AM +0200, Andreas Schwab wrote:
> > options-save.c: In function 'void cl_target_option_save(cl_target_option*, 
> > gcc_options*, gcc_options*)':
> > options-save.c:8526:26: error: unused variable 'mask' 
> > [-Werror=unused-variable]
> >  8526 |   unsigned HOST_WIDE_INT mask = 0;
> >   |  ^~~~
> > options-save.c: In function 'void cl_target_option_restore(gcc_options*, 
> > gcc_options*, cl_target_option*)':
> > options-save.c:8537:26: error: unused variable 'mask' 
> > [-Werror=unused-variable]
> >  8537 |   unsigned HOST_WIDE_INT mask;
> >   |  ^~~~
> 
> Oops, missed that, sorry.
> 
> The following patch should fix that, tested on x86_64-linux make
> options-save.c (same file as before) and -> ia64-linux cross make
> options-save.o (no warning anymore, just the unwanted declarations gone).
> 
> Ok for trunk if it passes bootstrap/regtest?

OK.

Richard.

> 2020-10-06  Jakub Jelinek  
> 
>   PR bootstrap/97305
>   * optc-save-gen.awk: Don't declare mask variable if explicit_mask
>   array is not present.
> 
> --- gcc/optc-save-gen.awk.jj  2020-10-05 09:34:26.561874335 +0200
> +++ gcc/optc-save-gen.awk 2020-10-06 14:44:04.679556591 +0200
> @@ -597,11 +597,13 @@ for (i = 0; i < n_target_string; i++) {
>  }
>  
>  print "";
> -print "  unsigned HOST_WIDE_INT mask = 0;";
>  
>  j = 0;
>  k = 0;
>  for (i = 0; i < n_extra_target_vars; i++) {
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" extra_target_vars[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -617,6 +619,9 @@ for (i = 0; i < n_target_other; i++) {
>   print "  ptr->explicit_mask_" var_target_other[i] " = 
> opts_set->x_" var_target_other[i] ";";
>   continue;
>   }
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" var_target_other[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -628,6 +633,9 @@ for (i = 0; i < n_target_other; i++) {
>  }
>  
>  for (i = 0; i < n_target_enum; i++) {
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" var_target_enum[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -643,6 +651,9 @@ for (i = 0; i < n_target_int; i++) {
>   print "  ptr->explicit_mask_" var_target_int[i] " = 
> opts_set->x_" var_target_int[i] ";";
>   continue;
>   }
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" var_target_int[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -654,6 +665,9 @@ for (i = 0; i < n_target_int; i++) {
>  }
>  
>  for (i = 0; i < n_target_short; i++) {
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" var_target_short[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -665,6 +679,9 @@ for (i = 0; i < n_target_short; i++) {
>  }
>  
>  for (i = 0; i < n_target_char; i++) {
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" var_target_char[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -676,6 +693,9 @@ for (i = 0; i < n_target_char; i++) {
>  }
>  
>  for (i = 0; i < n_target_string; i++) {
> + if (j == 0 && k == 0) {
> + print "  unsigned HOST_WIDE_INT mask = 0;";
> + }
>   print "  if (opts_set->x_" var_target_string[i] ") mask |= 
> HOST_WIDE_INT_1U << " j ";";
>   j++;
>   if (j == 64) {
> @@ -732,7 +752,9 @@ for (i = 0; i < n_target_string; i++) {
>  }
>  
>  print "";
> -print "  unsigned HOST_WIDE_INT mask;";
> +if (has_target_explicit_mask) {
> + print "  unsigned HOST_WIDE_INT mask;";
> +}
>  
>  j = 64;
>  k = 0;
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: [PATCH] optimize permutes in SLP, remove vect_attempt_slp_rearrange_stmts

2020-10-06 Thread Richard Biener
On Tue, 6 Oct 2020, Richard Biener wrote:

> On Fri, 2 Oct 2020, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > This introduces a permute optimization phase for SLP which is
> > > intended to cover the existing permute eliding for SLP reductions
> > > plus handling commonizing the easy cases.
> > >
> > > It currently uses graphds to compute a postorder on the reverse
> > > SLP graph and it handles all cases vect_attempt_slp_rearrange_stmts
> > > did (hopefully - I've adjusted most testcases that triggered it
> > > a few days ago).  It restricts itself to move around bijective
> > > permutations to simplify things for now, mainly around constant nodes.
> > >
> > > As a prerequisite it makes the SLP graph cyclic (ugh).  It looks
> > > like it would pay off to compute a PRE/POST order visit array
> > > once and elide all the recursive SLP graph walks and their
> > > visited hash-set.  At least for the time where we do not change
> > > the SLP graph during such walk.
> > >
> > > I do not like using graphds too much but at least I don't have to
> > > re-implement yet another RPO walk, so maybe it isn't too bad.
> > >
> > > Comments are welcome - I do want to see vect_attempt_slp_rearrange_stmts
> > > go away for GCC 11 and the permute optimization helps non-store
> > > BB vectorization opportunities where we can end up with a lot of
> > > useless load permutes otherwise.
> > 
> > Looks really nice.  Got a couple of questions that probably just show
> > my misunderstanding :-)
> > 
> > Is this intended to compute an optimal-ish solution?
> 
> The intent was to keep it simple but compute a solution that will
> not increase the number of permutes.
> 
> > It looked from
> > a quick read like it tried to push permutes as far away from loads as
> > possible without creating permuted and unpermuted versions of the same
> > node.  But I guess there will be cases where the optimal placement is
> > somewhere between the two extremes of permuting at the loads and
> > permuting as far away as possible.
> 
> So what it does is that it pushes permutes away from the loads until
> there's a use requiring a different permutation.  But handling of
> constants/externals as having "all" permutations causes us to push
> permutes along binary ops with one constant/external argument (in
> addition to pushing it along all unary operations).
> 
> I have some patches that try to unify constant/external nodes during
> SLP build (we're currently _not_ sharing them, thus not computing their
> cost correctly) - once that's in (not sure if it happens this stage1)
> it would make sense to try to not have too many different permutes
> of constants/externals (esp. externals I guess).
> 
> Now, did you have some other sub-optimality in mind?
> 
> > Of course, whatever we do will be a heuristic.  I just wasn't sure how
> > often this would be best in practice.
> 
> Yeah, so I'm not sure where in a "series" of unary ops we'd want to
> push a permutation.  The argument could be to leave it at the load
> for as little as possible changes from the current handling.  That
> could be done with a reverse propagation stage.  I'll see if
> splitting out some predicates from the current code makes it not
> too much duplication to introduce this.
> 
> > It looks like the materialisation phase changes the choices for nodes
> > on the fly, is that right?  If so, how does that work for backedges?
> > I'd expected the materialisation phase to treat the permutation choice
> > as read-only, and simply implement what the graph already said.
> 
> The materialization phase is also the decision stage (wanted to avoid
> duplicating the loop).  When we materialize a permutation at the
> node which has differing uses we have to update the graph from there.
> As for backedges I wasn't sure and indeed there may be bugs - I do
> have to investigate one libgomp FAIL from the testing.  It would be
> odd to require iteration in the decision stage again but in case we're
> breaking a cycle we have to re-consider the backedge permutation as well.
> Which would mean we'd better to the decision where to materialize during
> the propagation stage(?)
> 
> I'm going to analyze the FAIL now.

OK, that one was a stupid mistake (passing hash_set<> by value).

The following adjusted patch computes the materialization points
during iteration so we should handle backedges more obviously
correct (I guess the previous variant worked because the SLP
graphs with backedges are quite special with only "perfect cycles"
allowed).

The question remains on whether we want to use graphds or whether
we want a (lazily filled?) SLP_TREE_PARENTS array and compute the
RPO on the reverse graph on the SLP data structure (we only need
an iteration order that has at least one child visited before
visiting parents, but we still need the reverse edges - still
a pre-order on the reverse graph will likely work as well, just
not converge as quickly eventually).

Thoughts on that?

Otherwise bootstra

Re: make sincos take type from intrinsic formal, not from result assignment

2020-10-06 Thread Richard Biener via Gcc-patches
On Tue, Oct 6, 2020 at 11:34 AM Alexandre Oliva  wrote:
>
> On Oct  6, 2020, Richard Biener  wrote:
>
> > OK, I see.  mathfn_built_in expects a type inter-operating with
> > the C ABI types (float_type_node, double_type_node, etc.) where
> > "inter-operating" means having the same main variant.
>
> Yup.
>
> > Now, I guess for the sincos pass we want to combine sinl + cosl
> > to sincosl, independent on the case where the result would be
> > assigned to a 'double' when 'double == long double'?
>
> Sorry, I goofed in the patch description and misled you.
>
> When looking at
>
>   _d = sin (_s);
>
> the sincos didn't take the type of _d, but that of _s.
>
> I changed it so it takes the type not from the actual passed to the
> intrinsic, but from the formal in the intrinsic declaration.

Yes, I understand.

> If we had conversions of _s to different precisions, the optimization
> wouldn't kick in: we'd have different actuals passed to sin and cos.
> I'm not sure it makes much sense to try to turn e.g.
>
>   _d1 = sin (_s);
>   _t = (float) _s;
>   _d2 = cosf (_t);
>
> into:
>
>   sincos (_s, &D1, &T);
>   _d1 = D1;
>   _td2 = T;
>   _d2 = (float) _td2;
>
> If someone goes through the trouble of computing sin and cos for the
> same angle at different precisions, you might as well leave it alone.
>
> > Now what about sinl + cos when 'double == long double'?
>
> Now that might make more sense to optimize, but if we're going to do
> such transformations, we might as well canonicalize the *l intrinsics to
> the equivalent double versions (assuming long double and double have the
> same precision), and then sincos will get it right.

Ah, we eventually already do that.

So how about that mathfn_type helper instead of hard-wiring this logic
in sincos()?
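
A hypothetical sketch of such a helper - the name comes from the
suggestion above, and the body is an assumption about what it would do
(map a builtin's formal argument type back to the C ABI main variant):

  /* Return the type the math builtin FNDECL actually takes, so callers
     can look up mathfn_built_in with the formal's type instead of the
     type of whatever the result happens to be assigned to.  */
  static tree
  mathfn_type (tree fndecl)
  {
    tree arg_types = TYPE_ARG_TYPES (TREE_TYPE (fndecl));
    return TYPE_MAIN_VARIANT (TREE_VALUE (arg_types));
  }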

Richard.

>
> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer


Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization

2020-10-06 Thread Dennis Zhang via Gcc-patches
On 9/16/20 4:00 PM, Dennis Zhang wrote:
> Hi all,
> 
> This patch enables SIMD modes for MVE auto-vectorization.
> In this patch, the integer and float MVE SIMD modes are returned by
> arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
> MVE or MVE_FLOAT is enabled.
> Then the expanders for auto-vectorization can be used for generating MVE
> SIMD code.
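
A hypothetical sketch of the hook's shape, just to illustrate where
those modes plug in - the real mode choices are in the patch itself:

  static machine_mode
  arm_preferred_simd_mode (scalar_mode mode)
  {
    if (TARGET_HAVE_MVE)
      switch (mode)
	{
	case E_QImode: return V16QImode;
	case E_HImode: return V8HImode;
	case E_SImode: return V4SImode;
	default: break;
	}
    if (TARGET_HAVE_MVE_FLOAT)
      switch (mode)
	{
	case E_HFmode: return V8HFmode;
	case E_SFmode: return V4SFmode;
	default: break;
	}
    return word_mode;
  }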
> 
> This patch also fixes bugs in MVE vreinterpretq_*.c tests which are
> revealed by the enabled MVE SIMD modes.
> The tests are for checking the MVE reinterpret intrinsics.
> There are two functions in each of the tests. The two functions contain
> the pattern of identical code so that they are folded in icf pass.
> Because of icf, the instruction count only checks one function which is 8.
> However when the SIMD modes are enabled, the estimation of the code size
> becomes smaller so that inlining is applied after icf, then the
> instruction count becomes 16 which causes failure of the tests.
> Because the icf is not the expected pattern to be tested but causes
> above issues, -fno-ipa-icf is applied to the tests to avoid unstable
> instruction count.
> 
> This patch is separated from
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> because this part is not strongly connected to the aim of that one, so
> it was causing confusion.
> 
> Regtested and bootstraped.
> 
> Is it OK for trunk please?
> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-09-15  Dennis Zhang  
> 
>   * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-09-15  Dennis Zhang  
> 
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>   option -fno-ipa-icf and change the instruction count from 8 to 16.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> 

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html


Re: [PATCH] dbgcnt: print list after compilation

2020-10-06 Thread Richard Biener via Gcc-patches
On Tue, Oct 6, 2020 at 12:29 PM Martin Liška  wrote:
>
> Hello.
>
> The motivation of the patch is to display debug counter values after a
> compilation.  It's handy for bisecting a debug counter.  The new output
> is printed to stderr (instead of stdout) and it works fine with LTO as
> well.
>
> Sample output:
>
>counter name  counter value closed intervals
> -----------------------------------------------------------------
>asan_use_after_scope  0 unset
>auto_inc_dec  0 unset
>ccp   29473 unset
>cfg_cleanup   292   unset
>cprop 45unset
>cse2_move2add 451   unset
>dce   740   unset
>dce_fast  15unset
>dce_ud15unset
>delete_trivial_dead   5747  unset
>devirt0 unset
>df_byte_scan  0 unset
>dom_unreachable_edges 10unset
>tail_call 393   [1, 4], [100, 200]
> ...
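
For reference, intervals like the tail_call line above would come from an
invocation along these lines (the counter values are made up; this assumes
the documented closed-interval form of -fdbg-cnt=name:lo1-hi1:lo2-hi2):

  gcc -O2 -fdbg-cnt=tail_call:1-4:100-200 -fdbg-cnt-list test.c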
>
>
> Ready for master?

OK.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * common.opt: Remove -fdbg-cnt-list from deferred options.
> * dbgcnt.c (dbg_cnt_set_limit_by_index): Make a copy
> to original_limits.
> (dbg_cnt_list_all_counters): Print also current counter value
> and print to stderr.
> * opts-global.c (handle_common_deferred_options): Do not handle
> -fdbg-cnt-list.
> * opts.c (common_handle_option): Likewise.
> * toplev.c (finalize): Handle it after compilation here.
> ---
>   gcc/common.opt|  2 +-
>   gcc/dbgcnt.c  | 25 +++--
>   gcc/opts-global.c |  4 
>   gcc/opts.c|  5 -
>   gcc/toplev.c  |  4 
>   5 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 292c2de694e..7e789d1c47f 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1202,7 +1202,7 @@ Common Report Var(flag_data_sections)
>   Place data items into their own section.
>
>   fdbg-cnt-list
> -Common Report Var(common_deferred_options) Defer
> +Common Report Var(flag_dbg_cnt_list)
>   List all available debugging counters with their limits and counts.
>
>   fdbg-cnt=
> diff --git a/gcc/dbgcnt.c b/gcc/dbgcnt.c
> index 01893ce7238..2a2dd57507d 100644
> --- a/gcc/dbgcnt.c
> +++ b/gcc/dbgcnt.c
> @@ -45,6 +45,7 @@ static struct string2counter_map 
> map[debug_counter_number_of_counters] =
>   typedef std::pair limit_tuple;
>
>   static vec limits[debug_counter_number_of_counters];
> +static vec original_limits[debug_counter_number_of_counters];
>
>   static unsigned int count[debug_counter_number_of_counters];
>
> @@ -134,6 +135,8 @@ dbg_cnt_set_limit_by_index (enum debug_counter index, 
> const char *name,
> }
>   }
>
> +  original_limits[index] = limits[index].copy ();
> +
> return true;
>   }
>
> @@ -226,25 +229,27 @@ void
>   dbg_cnt_list_all_counters (void)
>   {
> int i;
> -  printf ("  %-30s %s\n", G_("counter name"), G_("closed intervals"));
> -  printf 
> ("-\n");
> +  fprintf (stderr, "  %-30s%-15s   %s\n", G_("counter name"),
> +  G_("counter value"), G_("closed intervals"));
> +  fprintf (stderr, 
> "-\n");
> for (i = 0; i < debug_counter_number_of_counters; i++)
>   {
> -  printf ("  %-30s ", map[i].name);
> -  if (limits[i].exists ())
> +  fprintf (stderr, "  %-30s%-15d   ", map[i].name, count[i]);
> +  if (original_limits[i].exists ())
> {
> - for (int j = limits[i].length () - 1; j >= 0; j--)
> + for (int j = original_limits[i].length () - 1; j >= 0; j--)
> {
> - printf ("[%u, %u]", limits[i][j].first, limits[i][j].second);
> + fprintf (stderr, "[%u, %u]", original_limits[i][j].first,
> +  original_limits[i][j].second);
>   if (j > 0)
> -   printf (", ");
> +   fprintf (stderr, ", ");
> }
> - putchar ('\n');
> + fprintf (stderr, "\n");
> }
> else
> -   printf ("unset\n");
> +   fprintf (stderr, "unset\n");
>   }
> -  printf ("\n");
> +  fprintf (stderr, "\n");
>   }
>
>   #if CHECKING_P
> diff --git a/gcc/opts-global.c b/gcc/opts-global.c
> index b024ab8e18f..1816acf805b 100644
> --- a/gcc/opts-global.c
> +++ b/gcc/opts-global.c
> @@ -378,10 +378,6 @@ handle_common_deferred_options (void)
>   dbg_cnt_process_opt (opt->arg);
>   break;
>
> -   case OPT_fdbg_cnt_list:
> - dbg_cnt_list_all_counters ()

[PATCH][GCC-10 backport] arm: Remove coercion from scalar argument to vmin & vmax intrinsics.

2020-10-06 Thread Srinath Parvathaneni via Gcc-patches
Hello,

Straight backport of Joe's patch with no changes.

This patch fixes an issue with vmin* and vmax* intrinsics which accept
a scalar argument. Previously when the scalar was of different width
to the vector elements this would generate __ARM_undef. This change
allows the scalar argument to be implicitly converted to the correct
width. Also tidied up the relevant unit tests, some of which would
have passed even if only one of two or three intrinsic calls had
compiled correctly.

Bootstrapped and tested on arm-none-eabi, gcc and CMSIS_DSP
testsuites are clean. OK for trunk?

Thanks,
Joe

gcc/ChangeLog:

2020-08-10  Joe Ramsay  

* config/arm/arm_mve.h (__arm_vmaxnmavq): Remove coercion of scalar
argument.
(__arm_vmaxnmvq): Likewise.
(__arm_vminnmavq): Likewise.
(__arm_vminnmvq): Likewise.
(__arm_vmaxnmavq_p): Likewise.
(__arm_vmaxnmvq_p): Likewise (and delete duplicate definition).
(__arm_vminnmavq_p): Likewise.
(__arm_vminnmvq_p): Likewise.
(__arm_vmaxavq): Likewise.
(__arm_vmaxavq_p): Likewise.
(__arm_vmaxvq): Likewise.
(__arm_vmaxvq_p): Likewise.
(__arm_vminavq): Likewise.
(__arm_vminavq_p): Likewise.
(__arm_vminvq): Likewise.
(__arm_vminvq_p): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c: Add test for mismatched
width of scalar argument.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u8.c: Likewise.

(cherry picked from commit 251950d899bc3c18b5775fe9fe20bebbdc8d15cb)


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm_m

Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Richard Biener via Gcc-patches
On Tue, Oct 6, 2020 at 3:09 PM Martin Liška  wrote:
>
> On 10/6/20 2:56 PM, Andrew MacLeod wrote:
> > Ah, by just using the outgoing_range class, all you are getting is static 
> > edges.  so a TRUE edge is always a [1,1] and a false edge is [0,0]
> > I provided that class so you could get the constant edges on switches.
> >
> > if you want to get actual ranges for ssa-names, you will need the ranger 
> > (which I think is going in today).  It uses those values as the starting 
> > point for winding back to calculate other dependent names.
>
> Ah, all right!
>
> >
> > Then  you will want to query the ranger for the range of index_5 on that 
> > edge..
>
> Fine! So the only tricky thing here is to select a proper SSA_NAME to query 
> right?
> In my case I need to cover situations like:

Note what ranger will get you has the very same limitations as what
register_edge_assert_for has - it will _not_ necessarily provide
something that, when "concatenated" with &&, reproduces the
original condition in its full semantics.  That is, it's not a condition
"decomposition" tool either.

Richard.

>index.0_1 = (unsigned int) index_5(D);
>_2 = index.0_1 + 4294967287;
>if (_2 <= 114)
>
> or
>
>  _1 = aChar_8(D) == 1;
>  _2 = aChar_8(D) == 10;
>  _3 = _1 | _2;
>  if (_3 != 0)
>
> Anything Ranger can help me with?
>
> Martin
>
> >
> > so you will need a gimple ranger instance instead of an outgoing_range 
> > object.
> >
> > Andrew
>


RE: [PATCH][GCC-10 backport] arm: Remove coercion from scalar argument to vmin & vmax intrinsics.

2020-10-06 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 06 October 2020 14:37
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH][GCC-10 backport] arm: Remove coercion from scalar
> argument to vmin & vmax intrinsics.
> 
> Hello,
> 
> Straight backport of Joe's patch with no changes.
> 
> This patch fixes an issue with vmin* and vmax* intrinsics which accept
> a scalar argument. Previously when the scalar was of different width
> to the vector elements this would generate __ARM_undef. This change
> allows the scalar argument to be implicitly converted to the correct
> width. Also tidied up the relevant unit tests, some of which would
> have passed even if only one of two or three intrinsic calls had
> compiled correctly.
> 
> Bootstrapped and tested on arm-none-eabi, gcc and CMSIS_DSP
> testsuites are clean. OK for trunk?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Joe
> 
> gcc/ChangeLog:
> 
> 2020-08-10  Joe Ramsay  
> 
>   * config/arm/arm_mve.h (__arm_vmaxnmavq): Remove coercion of
> scalar
>   argument.
>   (__arm_vmaxnmvq): Likewise.
>   (__arm_vminnmavq): Likewise.
>   (__arm_vminnmvq): Likewise.
>   (__arm_vmaxnmavq_p): Likewise.
>   (__arm_vmaxnmvq_p): Likewise (and delete duplicate definition).
>   (__arm_vminnmavq_p): Likewise.
>   (__arm_vminnmvq_p): Likewise.
>   (__arm_vmaxavq): Likewise.
>   (__arm_vmaxavq_p): Likewise.
>   (__arm_vmaxvq): Likewise.
>   (__arm_vmaxvq_p): Likewise.
>   (__arm_vminavq): Likewise.
>   (__arm_vminavq_p): Likewise.
>   (__arm_vminvq): Likewise.
>   (__arm_vminvq_p): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c: Add test for
> mismatched
>   width of scalar argument.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmavq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmavq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmvq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmvq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_u16

RE: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md to maintain consistency.

2020-10-06 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 06 October 2020 13:27
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md to
> maintain consistency.
> 
> Hello,
> 
> To maintain consistency with other Arm Architectures backend, iterators and
> iterator attributes are moved
> from mve.md file to iterators.md. Also move enumerators for MVE unspecs
> from mve.md file to unspecs.md file.
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for master? Ok for GCC-10 branch?

Ok for trunk.
I'm not sure if this is needed for the GCC 10 branch (but am open to being 
convinced otherwise?)

Thanks,
Kyrill

> 
> Regards,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2020-10-06  Srinath Parvathaneni  
> 
>   * config/arm/iterators.md (MVE_types): Move mode iterator from
> mve.md to
>   iterators.md.
>   (MVE_VLD_ST): Likewise.
>   (MVE_0): Likewise.
>   (MVE_1): Likewise.
>   (MVE_3): Likewise.
>   (MVE_2): Likewise.
>   (MVE_5): Likewise.
>   (MVE_6): Likewise.
>   (MVE_CNVT): Move mode attribute iterator from mve.md to
> iterators.md.
>   (MVE_LANES): Likewise.
>   (MVE_constraint): Likewise.
>   (MVE_constraint1): Likewise.
>   (MVE_constraint2): Likewise.
>   (MVE_constraint3): Likewise.
>   (MVE_pred): Likewise.
>   (MVE_pred1): Likewise.
>   (MVE_pred2): Likewise.
>   (MVE_pred3): Likewise.
>   (MVE_B_ELEM): Likewise.
>   (MVE_H_ELEM): Likewise.
>   (V_sz_elem1): Likewise.
>   (V_extr_elem): Likewise.
>   (earlyclobber_32): Likewise.
>   (supf): Move int attribute from mve.md to iterators.md.
>   (mode1): Likewise.
>   (VCVTQ_TO_F): Move int iterator from mve.md to iterators.md.
>   (VMVNQ_N): Likewise.
>   (VREV64Q): Likewise.
>   (VCVTQ_FROM_F): Likewise.
>   (VREV16Q): Likewise.
>   (VCVTAQ): Likewise.
>   (VMVNQ): Likewise.
>   (VDUPQ_N): Likewise.
>   (VCLZQ): Likewise.
>   (VADDVQ): Likewise.
>   (VREV32Q): Likewise.
>   (VMOVLBQ): Likewise.
>   (VMOVLTQ): Likewise.
>   (VCVTPQ): Likewise.
>   (VCVTNQ): Likewise.
>   (VCVTMQ): Likewise.
>   (VADDLVQ): Likewise.
>   (VCTPQ): Likewise.
>   (VCTPQ_M): Likewise.
>   (VCVTQ_N_TO_F): Likewise.
>   (VCREATEQ): Likewise.
>   (VSHRQ_N): Likewise.
>   (VCVTQ_N_FROM_F): Likewise.
>   (VADDLVQ_P): Likewise.
>   (VCMPNEQ): Likewise.
>   (VSHLQ): Likewise.
>   (VABDQ): Likewise.
>   (VADDQ_N): Likewise.
>   (VADDVAQ): Likewise.
>   (VADDVQ_P): Likewise.
>   (VANDQ): Likewise.
>   (VBICQ): Likewise.
>   (VBRSRQ_N): Likewise.
>   (VCADDQ_ROT270): Likewise.
>   (VCADDQ_ROT90): Likewise.
>   (VCMPEQQ): Likewise.
>   (VCMPEQQ_N): Likewise.
>   (VCMPNEQ_N): Likewise.
>   (VEORQ): Likewise.
>   (VHADDQ): Likewise.
>   (VHADDQ_N): Likewise.
>   (VHSUBQ): Likewise.
>   (VHSUBQ_N): Likewise.
>   (VMAXQ): Likewise.
>   (VMAXVQ): Likewise.
>   (VMINQ): Likewise.
>   (VMINVQ): Likewise.
>   (VMLADAVQ): Likewise.
>   (VMULHQ): Likewise.
>   (VMULLBQ_INT): Likewise.
>   (VMULLTQ_INT): Likewise.
>   (VMULQ): Likewise.
>   (VMULQ_N): Likewise.
>   (VORNQ): Likewise.
>   (VORRQ): Likewise.
>   (VQADDQ): Likewise.
>   (VQADDQ_N): Likewise.
>   (VQRSHLQ): Likewise.
>   (VQRSHLQ_N): Likewise.
>   (VQSHLQ): Likewise.
>   (VQSHLQ_N): Likewise.
>   (VQSHLQ_R): Likewise.
>   (VQSUBQ): Likewise.
>   (VQSUBQ_N): Likewise.
>   (VRHADDQ): Likewise.
>   (VRMULHQ): Likewise.
>   (VRSHLQ): Likewise.
>   (VRSHLQ_N): Likewise.
>   (VRSHRQ_N): Likewise.
>   (VSHLQ_N): Likewise.
>   (VSHLQ_R): Likewise.
>   (VSUBQ): Likewise.
>   (VSUBQ_N): Likewise.
>   (VADDLVAQ): Likewise.
>   (VBICQ_N): Likewise.
>   (VMLALDAVQ): Likewise.
>   (VMLALDAVXQ): Likewise.
>   (VMOVNBQ): Likewise.
>   (VMOVNTQ): Likewise.
>   (VORRQ_N): Likewise.
>   (VQMOVNBQ): Likewise.
>   (VQMOVNTQ): Likewise.
>   (VSHLLBQ_N): Likewise.
>   (VSHLLTQ_N): Likewise.
>   (VRMLALDAVHQ): Likewise.
>   (VBICQ_M_N): Likewise.
>   (VCVTAQ_M): Likewise.
>   (VCVTQ_M_TO_F): Likewise.
>   (VQRSHRNBQ_N): Likewise.
>   (VABAVQ): Likewise.
>   (VSHLCQ): Likewise.
>   (VRMLALDAVHAQ): Likewise.
>   (VADDVAQ_P): Likewise.
>   (VCLZQ_M): Likewise.
>   (VCMPEQQ_M_N): Likewise.
>   (VCMPEQQ_M): Likewise.
>   (VCMPNEQ_M_N): Likewise.
>   (VCMPNEQ_M): Likewise.
>   (VDUPQ_M_N): Likewise.
>   (VMAXVQ_P): Likewise.
>   (VMINVQ_P): Likewise.
>   (VMLADAVAQ): Likewise.
>   (VMLADAVQ_P): Likewise.
>   (VMLAQ_N): Likewise.
>   (VMLASQ_N): Likewise.
>   (VMVNQ_M): Likewise.
>   (VPSELQ): Likewise.
>   (VQDMLAHQ_N): Likewise.
>   (VQR

RE: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization

2020-10-06 Thread Kyrylo Tkachov via Gcc-patches
Hi Dennis,

> -Original Message-
> From: Dennis Zhang 
> Sent: 06 October 2020 14:37
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; nd ;
> Richard Earnshaw ; Ramana Radhakrishnan
> 
> Subject: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
> 
> On 9/16/20 4:00 PM, Dennis Zhang wrote:
> > Hi all,
> >
> > This patch enables SIMD modes for MVE auto-vectorization.
> > In this patch, the integer and float MVE SIMD modes are returned by
> > arm_preferred_simd_mode
> (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
> > MVE or MVE_FLOAT is enabled.
> > Then the expanders for auto-vectorization can be used for generating MVE
> > SIMD code.
> >
> > This patch also fixes bugs in MVE vreinterpretq_*.c tests which are
> > revealed by the enabled MVE SIMD modes.
> > The tests are for checking the MVE reinterpret intrinsics.
> > There are two functions in each of the tests. The two functions contain
> > the pattern of identical code so that they are folded in icf pass.
> > Because of icf, the instruction count only checks one function which is 8.
> > However when the SIMD modes are enabled, the estimation of the code
> size
> > becomes smaller so that inlining is applied after icf, then the
> > instruction count becomes 16 which causes failure of the tests.
> > Because the icf is not the expected pattern to be tested but causes
> > above issues, -fno-ipa-icf is applied to the tests to avoid unstable
> > instruction count.
> >
> > This patch is separated from
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> > because this part is not strongly connected to the aim of that one, so
> > it was causing confusion.
> >
> > Regtested and bootstraped.
> >
> > Is it OK for trunk please?

Ok.
Sorry for the delay.
Kyrill

> >
> > Thanks
> > Dennis
> >
> > gcc/ChangeLog:
> >
> > 2020-09-15  Dennis Zhang  
> >
> > * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD
> modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-09-15  Dennis Zhang  
> >
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> > option -fno-ipa-icf and change the instruction count from 8 to 16.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> >
> 
> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html



RE: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md to maintain consistency.

2020-10-06 Thread Srinath Parvathaneni via Gcc-patches
Hi Kyrill,

> -Original Message-
> From: Kyrylo Tkachov 
> Sent: 06 October 2020 14:42
> To: Srinath Parvathaneni ; gcc-
> patc...@gcc.gnu.org
> Subject: RE: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md
> to maintain consistency.
> 
> 
> 
> > -Original Message-
> > From: Srinath Parvathaneni 
> > Sent: 06 October 2020 13:27
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov 
> > Subject: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md
> > to maintain consistency.
> >
> > Hello,
> >
> > To maintain consistency with other Arm Architectures backend,
> > iterators and iterator attributes are moved from mve.md file to
> > iterators.md. Also move enumerators for MVE unspecs from mve.md file
> > to unspecs.md file.
> >
> > Regression tested on arm-none-eabi and found no regressions.
> >
> > Ok for master? Ok for GCC-10 branch?
> 
> Ok for trunk.
> I'm not sure if this is needed for the GCC 10 branch (but am open to being
> convinced otherwise?)

Thanks for approving this patch.
Backporting this patch avoids conflicts when backporting any bug fix modifying
MVE patterns (iterators and unspecs); I hope this convinces you.

Regards,
SRI.
> 
> Thanks,
> Kyrill
> 
> >
> > Regards,
> > Srinath.
> >
> > gcc/ChangeLog:
> >
> > 2020-10-06  Srinath Parvathaneni  
> >
> > * config/arm/iterators.md (MVE_types): Move mode iterator from
> mve.md
> > to
> > iterators.md.
> > (MVE_VLD_ST): Likewise.
> > (MVE_0): Likewise.
> > (MVE_1): Likewise.
> > (MVE_3): Likewise.
> > (MVE_2): Likewise.
> > (MVE_5): Likewise.
> > (MVE_6): Likewise.
> > (MVE_CNVT): Move mode attribute iterator from mve.md to
> iterators.md.
> > (MVE_LANES): Likewise.
> > (MVE_constraint): Likewise.
> > (MVE_constraint1): Likewise.
> > (MVE_constraint2): Likewise.
> > (MVE_constraint3): Likewise.
> > (MVE_pred): Likewise.
> > (MVE_pred1): Likewise.
> > (MVE_pred2): Likewise.
> > (MVE_pred3): Likewise.
> > (MVE_B_ELEM): Likewise.
> > (MVE_H_ELEM): Likewise.
> > (V_sz_elem1): Likewise.
> > (V_extr_elem): Likewise.
> > (earlyclobber_32): Likewise.
> > (supf): Move int attribute from mve.md to iterators.md.
> > (mode1): Likewise.
> > (VCVTQ_TO_F): Move int iterator from mve.md to iterators.md.
> > (VMVNQ_N): Likewise.
> > (VREV64Q): Likewise.
> > (VCVTQ_FROM_F): Likewise.
> > (VREV16Q): Likewise.
> > (VCVTAQ): Likewise.
> > (VMVNQ): Likewise.
> > (VDUPQ_N): Likewise.
> > (VCLZQ): Likewise.
> > (VADDVQ): Likewise.
> > (VREV32Q): Likewise.
> > (VMOVLBQ): Likewise.
> > (VMOVLTQ): Likewise.
> > (VCVTPQ): Likewise.
> > (VCVTNQ): Likewise.
> > (VCVTMQ): Likewise.
> > (VADDLVQ): Likewise.
> > (VCTPQ): Likewise.
> > (VCTPQ_M): Likewise.
> > (VCVTQ_N_TO_F): Likewise.
> > (VCREATEQ): Likewise.
> > (VSHRQ_N): Likewise.
> > (VCVTQ_N_FROM_F): Likewise.
> > (VADDLVQ_P): Likewise.
> > (VCMPNEQ): Likewise.
> > (VSHLQ): Likewise.
> > (VABDQ): Likewise.
> > (VADDQ_N): Likewise.
> > (VADDVAQ): Likewise.
> > (VADDVQ_P): Likewise.
> > (VANDQ): Likewise.
> > (VBICQ): Likewise.
> > (VBRSRQ_N): Likewise.
> > (VCADDQ_ROT270): Likewise.
> > (VCADDQ_ROT90): Likewise.
> > (VCMPEQQ): Likewise.
> > (VCMPEQQ_N): Likewise.
> > (VCMPNEQ_N): Likewise.
> > (VEORQ): Likewise.
> > (VHADDQ): Likewise.
> > (VHADDQ_N): Likewise.
> > (VHSUBQ): Likewise.
> > (VHSUBQ_N): Likewise.
> > (VMAXQ): Likewise.
> > (VMAXVQ): Likewise.
> > (VMINQ): Likewise.
> > (VMINVQ): Likewise.
> > (VMLADAVQ): Likewise.
> > (VMULHQ): Likewise.
> > (VMULLBQ_INT): Likewise.
> > (VMULLTQ_INT): Likewise.
> > (VMULQ): Likewise.
> > (VMULQ_N): Likewise.
> > (VORNQ): Likewise.
> > (VORRQ): Likewise.
> > (VQADDQ): Likewise.
> > (VQADDQ_N): Likewise.
> > (VQRSHLQ): Likewise.
> > (VQRSHLQ_N): Likewise.
> > (VQSHLQ): Likewise.
> > (VQSHLQ_N): Likewise.
> > (VQSHLQ_R): Likewise.
> > (VQSUBQ): Likewise.
> > (VQSUBQ_N): Likewise.
> > (VRHADDQ): Likewise.
> > (VRMULHQ): Likewise.
> > (VRSHLQ): Likewise.
> > (VRSHLQ_N): Likewise.
> > (VRSHRQ_N): Likewise.
> > (VSHLQ_N): Likewise.
> > (VSHLQ_R): Likewise.
> > (VSUBQ): Likewise.
> > (VSUBQ_N): Likewise.
> > (VADDLVAQ): Likewise.
> > (VBICQ_N): Likewise.
> > (VMLALDAVQ): Likewise.
> > (VMLALDAVXQ): Likewise.
> > (VMOVNBQ): Likewise.
> > (VMOVNTQ): Likewise.
> > (VORRQ_N): Likewise.
> > (VQMOVNBQ): Likewise.
> > (VQMOVNTQ): Likewise.
> > (VSHLLBQ_N): Likewise.
> > (VSHLLTQ_N): Likewise.
> > (VRMLALDAVHQ): Likewise.
> > (VBICQ_M_N): Likewise.
> > (VCVTAQ_M): Likewise.
> > (VCVTQ_M_TO_F): Likewise.
> > (VQRSHRNBQ_N): Likewise.
> > (VABAVQ): Likewise.
> > (VSHLCQ): Likewise.
> > (VRMLALDAVHA

RE: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md to maintain consistency.

2020-10-06 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 06 October 2020 14:55
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md
> to maintain consistency.
> 
> Hi Kyrill,
> 
> > -Original Message-
> > From: Kyrylo Tkachov 
> > Sent: 06 October 2020 14:42
> > To: Srinath Parvathaneni ; gcc-
> > patc...@gcc.gnu.org
> > Subject: RE: [PATCH][GCC] arm: Move iterators from mve.md to
> iterators.md
> > to maintain consistency.
> >
> >
> >
> > > -Original Message-
> > > From: Srinath Parvathaneni 
> > > Sent: 06 October 2020 13:27
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Kyrylo Tkachov 
> > > Subject: [PATCH][GCC] arm: Move iterators from mve.md to iterators.md
> > > to maintain consistency.
> > >
> > > Hello,
> > >
> > > To maintain consistency with other Arm Architectures backend,
> > > iterators and iterator attributes are moved from mve.md file to
> > > iterators.md. Also move enumerators for MVE unspecs from mve.md file
> > > to unspecs.md file.
> > >
> > > Regression tested on arm-none-eabi and found no regressions.
> > >
> > > Ok for master? Ok for GCC-10 branch?
> >
> > Ok for trunk.
> > I'm not sure if this is needed for the GCC 10 branch (but am open to being
> > convinced otherwise?)
> 
> Thanks for approving this patch.
> Backporting this patch avoid conflicts when backporting any bug fix
> modifying MVE Patterns (iterators and unspecs), I hope this convinces you.

Ok then, thanks for explaining.
Kyrill

> 
> Regards,
> SRI.
> >
> > Thanks,
> > Kyrill
> >
> > >
> > > Regards,
> > > Srinath.
> > >
> > > gcc/ChangeLog:
> > >
> > > 2020-10-06  Srinath Parvathaneni  
> > >
> > >   * config/arm/iterators.md (MVE_types): Move mode iterator from
> > mve.md
> > > to
> > >   iterators.md.
> > >   (MVE_VLD_ST): Likewise.
> > >   (MVE_0): Likewise.
> > >   (MVE_1): Likewise.
> > >   (MVE_3): Likewise.
> > >   (MVE_2): Likewise.
> > >   (MVE_5): Likewise.
> > >   (MVE_6): Likewise.
> > >   (MVE_CNVT): Move mode attribute iterator from mve.md to
> > iterators.md.
> > >   (MVE_LANES): Likewise.
> > >   (MVE_constraint): Likewise.
> > >   (MVE_constraint1): Likewise.
> > >   (MVE_constraint2): Likewise.
> > >   (MVE_constraint3): Likewise.
> > >   (MVE_pred): Likewise.
> > >   (MVE_pred1): Likewise.
> > >   (MVE_pred2): Likewise.
> > >   (MVE_pred3): Likewise.
> > >   (MVE_B_ELEM): Likewise.
> > >   (MVE_H_ELEM): Likewise.
> > >   (V_sz_elem1): Likewise.
> > >   (V_extr_elem): Likewise.
> > >   (earlyclobber_32): Likewise.
> > >   (supf): Move int attribute from mve.md to iterators.md.
> > >   (mode1): Likewise.
> > >   (VCVTQ_TO_F): Move int iterator from mve.md to iterators.md.
> > >   (VMVNQ_N): Likewise.
> > >   (VREV64Q): Likewise.
> > >   (VCVTQ_FROM_F): Likewise.
> > >   (VREV16Q): Likewise.
> > >   (VCVTAQ): Likewise.
> > >   (VMVNQ): Likewise.
> > >   (VDUPQ_N): Likewise.
> > >   (VCLZQ): Likewise.
> > >   (VADDVQ): Likewise.
> > >   (VREV32Q): Likewise.
> > >   (VMOVLBQ): Likewise.
> > >   (VMOVLTQ): Likewise.
> > >   (VCVTPQ): Likewise.
> > >   (VCVTNQ): Likewise.
> > >   (VCVTMQ): Likewise.
> > >   (VADDLVQ): Likewise.
> > >   (VCTPQ): Likewise.
> > >   (VCTPQ_M): Likewise.
> > >   (VCVTQ_N_TO_F): Likewise.
> > >   (VCREATEQ): Likewise.
> > >   (VSHRQ_N): Likewise.
> > >   (VCVTQ_N_FROM_F): Likewise.
> > >   (VADDLVQ_P): Likewise.
> > >   (VCMPNEQ): Likewise.
> > >   (VSHLQ): Likewise.
> > >   (VABDQ): Likewise.
> > >   (VADDQ_N): Likewise.
> > >   (VADDVAQ): Likewise.
> > >   (VADDVQ_P): Likewise.
> > >   (VANDQ): Likewise.
> > >   (VBICQ): Likewise.
> > >   (VBRSRQ_N): Likewise.
> > >   (VCADDQ_ROT270): Likewise.
> > >   (VCADDQ_ROT90): Likewise.
> > >   (VCMPEQQ): Likewise.
> > >   (VCMPEQQ_N): Likewise.
> > >   (VCMPNEQ_N): Likewise.
> > >   (VEORQ): Likewise.
> > >   (VHADDQ): Likewise.
> > >   (VHADDQ_N): Likewise.
> > >   (VHSUBQ): Likewise.
> > >   (VHSUBQ_N): Likewise.
> > >   (VMAXQ): Likewise.
> > >   (VMAXVQ): Likewise.
> > >   (VMINQ): Likewise.
> > >   (VMINVQ): Likewise.
> > >   (VMLADAVQ): Likewise.
> > >   (VMULHQ): Likewise.
> > >   (VMULLBQ_INT): Likewise.
> > >   (VMULLTQ_INT): Likewise.
> > >   (VMULQ): Likewise.
> > >   (VMULQ_N): Likewise.
> > >   (VORNQ): Likewise.
> > >   (VORRQ): Likewise.
> > >   (VQADDQ): Likewise.
> > >   (VQADDQ_N): Likewise.
> > >   (VQRSHLQ): Likewise.
> > >   (VQRSHLQ_N): Likewise.
> > >   (VQSHLQ): Likewise.
> > >   (VQSHLQ_N): Likewise.
> > >   (VQSHLQ_R): Likewise.
> > >   (VQSUBQ): Likewise.
> > >   (VQSUBQ_N): Likewise.
> > >   (VRHADDQ): Likewise.
> > >   (VRMULHQ): Likewise.
> > >   (VRSHLQ): Likewise.
> > >   (VRSHLQ_N): Likewise.
> > >   (VRSHRQ_N): Likewise.
> > >   (VSHLQ_N): Likewise.
> > >   (VSHLQ_R): Likewise.
> > >   (VSUBQ): Likewise.
> > >   (VSUBQ_N): Likewise.
> > >   (VADDLVAQ): Likewise.
> > >   (VBICQ_N): Likewise.
> > >   (VMLALDAVQ): Likewise.
> > >   (VMLALDAVXQ): Likewise.
> > >   (VMOVNBQ): Likewise.
> > >   (VMOVNTQ): Likewise.

[PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-06 Thread Qing Zhao via Gcc-patches
Hi, Gcc team,

This is the 3rd version of the implementation of patch -fzero-call-used-regs.

We will provide a new feature into GCC:

Add 
-fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
 command-line option
and
zero_call_used_regs("skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all")
 function attributes:

   1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")

   Don't zero call-used registers upon function return. This is the default 
behavior.

   2. -fzero-call-used-regs=used-gpr-arg and zero_call_used_regs("used-gpr-arg")

   Zero used call-used general purpose registers that are used to pass 
parameters upon function return.

   3. -fzero-call-used-regs=used-arg and zero_call_used_regs("used-arg")

   Zero used call-used registers that are used to pass parameters upon function 
return.

   4. -fzero-call-used-regs=all-arg and zero_call_used_regs("all-arg")

   Zero all call-used registers that are used to pass parameters upon function 
return.

   5. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")

   Zero used call-used general purpose registers upon function return.

   6. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")

   Zero all call-used general purpose registers upon function return.

   7. -fzero-call-used-regs=used and zero_call_used_regs("used")

   Zero used call-used registers upon function return.

   8. -fzero-call-used-regs=all and zero_call_used_regs("all")

   Zero all call-used registers upon function return.

Zero call-used registers at function return to increase the program
security by either mitigating Return-Oriented Programming (ROP) or
preventing information leak through registers.

{skip}, which is the default, doesn't zero call-used registers.

{used-gpr-arg} zeros used call-used general purpose registers that
pass parameters. {used-arg} zeros used call-used registers that
pass parameters. {all-arg} zeros all call-used registers that pass
parameters. These 3 choices are used for ROP mitigation.

{used-gpr} zeros call-used general purpose registers
which are used in the function.  {all-gpr} zeros all
call-used general purpose registers.  {used} zeros call-used registers which
are used in the function.  {all} zeros all call-used registers.
These 4 choices are used for preventing information leak through
registers.

You can control this behavior for a specific function by using the function
attribute {zero_call_used_regs}.
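For example, a minimal usage sketch (illustrative code, not part of the
patch; the function name is made up):

  int
  __attribute__ ((zero_call_used_regs ("used-gpr")))
  sample (int x)
  {
    /* On return from this function, the call-used general purpose
       registers it actually used are zeroed.  */
    return x + 42;
  }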

**Tests done:
1. Gcc bootstrap on x86, aarch64 and rs6000.
2. Regression test on x86, aarch64 and rs6000.
(X86 and aarch64 have no issues; rs6000 failed at the new test case in
the middle end, which is expected)

3. Cpu2017 on x86, -O2 
-fzero-call-used-regs=used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all

**runtime performance data of CPU2017 on x86
https://gitlab.com/x86-gcc/gcc/-/wikis/uploads/e9c5bedba6e387586364571f2eae3b8d/zero_call_used_regs_runtime_New.csv
 


**The major changes compared to the previous version are:

1. Add 3 new sub-options and corresponding function attributes:
  used-gpr-arg, used-arg, all-arg
  for ROP mitigation purpose;
2. Updated user manual;
3. Re-design of the implementation:

  3.1 data flow change to reflect the newly added zeroing insns to avoid
  these insns being deleted, moved, or merged by later passes:

  3.1.1.
  abstract EPILOGUE_USES into a new target-independent wrapper function that
  (a) returns true if EPILOGUE_USES itself returns true and (b) returns
  true for registers that need to be zero on return, if the zeroing
  instructions have already been inserted.  The places that currently
  test EPILOGUE_USES should then test this new wrapper function instead.

  Add this new wrapper function to df.h and df-scan.c.
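
  A possible shape for such a wrapper (all names here are invented for
  illustration; they are not from the patch):

  bool
  epilogue_uses_reg_p (unsigned int regno)
  {
    /* True if the target's EPILOGUE_USES says so, or if REGNO is one
       of the registers zeroed on return and the zeroing insns have
       already been inserted.  */
    return (EPILOGUE_USES (regno)
            || (zeroing_insns_inserted_p
                && TEST_HARD_REG_BIT (regs_zeroed_on_return, regno)));
  }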

  3.1.2.
  add a new utility routine "expand_asm_reg_clobber_mem_blockage" to generate
  a volatile asm insn that clobbers all the hard registers that are zeroed.

  emit this volatile asm at the very beginning of the zeroing sequence.
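
  At the source level this barrier is roughly analogous to the sketch
  below (an analogy only; the patch emits the equivalent RTL directly,
  and the clobber list is whatever registers get zeroed):

  asm volatile ("" : : : "eax", "edx", "memory");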

  3.2 new pass:
  add a new pass in the beginning of "late_compilation", before
  "pass_compute_alignment", called "pass_zero_call_used_regs".

  in this new pass,
  * compute the data flow information; (df_analyze ());
  * scan the exit block backward to look for "return":
A. for each return, compute the "need_zeroed_hardregs" based on
the user request, and data flow information, and function ABI info.
B. pass this need_zeroed_hardregs set to target hook "zero_call_used_regs"
to generate the instruction sequence that zeros the regs.
C. Data flow maintenance. 
4. Use "lookup_attribute" to get the attribute information instead of setting
  the attribute information into "tree_decl_with_vis" in tree-core.h.

**The changelog:

gcc/ChangeLog: 
2020-10-05  Qing Zhao  <qing.z...@oracle.com>
            H.J. Lu  <hjl.to...@gmail.com>

Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Martin Liška

On 10/6/20 9:47 AM, Richard Biener wrote:

But is it really extensible with the current implementation?  I doubt so.


I must agree with the statement. So let's make the pass properly.
I would need a help with the algorithm where I'm planning to do the following
steps:

1) for each BB ending with a gcond, parse index variable and its VR;
   I'll support:
   a) index == 123 ([123, 123])
   b) 1 <= index && index <= 9 ([1, 9])
   c) index == 123 || index == 12345 ([123, 123] [12345, 12345])
   d) index != 1 ([1, 1])
   e) index != 1 && index != 5 ([1, 1] [5, 5])

2) switch range edge is identified, e.g. true_edge for 1e, while false_edge for 
1a

3) I want to support forward code hoisting, so for each condition BB we need to 
identify
   if the block contains only stmts without a side-effect

4) we can ignore BBs with condition variables that have a small number of potential
   switch cases

5) for each condition variable we can iterate bottom up in dominator order and 
try
   to find a BB predecessor chain (in first phase no BB in between such 
"condition" BBs)
   that uses the same condition variable

6) the chain will be converted to a switch statement
7) code hoisting must be done in order to move gimple statements and fix 
potential
   gphis that can be collapsed
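
To illustrate (a hand-written example, not taken from the patch), here is
a chain over one index variable combining the forms from 1a-1c that the
pass should turn into a switch:

int
f (int index)
{
  if (index == 123 || index == 12345)   /* 1c: [123, 123] [12345, 12345] */
    return 1;
  else if (1 <= index && index <= 9)    /* 1b: [1, 9] */
    return 2;
  else if (index == 42)                 /* 1a: [42, 42] */
    return 3;
  return 0;
}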

Is it something feasible that can work?
Thanks,

Martin



Re: [PATCH] Add if-chain to switch conversion pass.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 03:48:38PM +0200, Martin Liška wrote:
> On 10/6/20 9:47 AM, Richard Biener wrote:
> > But is it really extensible with the current implementation?  I doubt so.
> 
> I must agree with the statement. So let's make the pass properly.
> I would need a help with the algorithm where I'm planning to do the following
> steps:
> 
> 1) for each BB ending with a gcond, parse index variable and it's VR;
>I'll support:
>a) index == 123 ([123, 123])
>b) 1 <= index && index <= 9 ([1, 9])
>c) index == 123 || index == 12345 ([123, 123] [12345, 12345])
>d) index != 1 ([1, 1])
>e) index != 1 && index != 5 ([1, 1] [5, 5])

The fold_range_test created cases are essential to support, so
f) index - 123U < 456U ([123, 456+123])
g) (unsigned) index - 123U < 456U (ditto)
but the discovery should actually recurse on all of those forms, so it will
handle
(unsigned) index - 123U < 456U || (unsigned) index - 16384U <= 32711U
etc.
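For instance (illustrative C, not taken from reassoc itself), the single
unsigned comparison below is what a two-sided range test folds into, and
the discovery has to map it back to a case range:

int
g (unsigned int index)
{
  /* Equivalent to 123 <= index && index <= 578, i.e. the
     case range [123, 578].  */
  return index - 123U < 456U;
}
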
You can see what reassoc init_range_entry does and do something similar?

Jakub



[GCC-10 backport][COMMITTED] arm: Move iterators from mve.md to iterators.md to maintain consistency.

2020-10-06 Thread Srinath Parvathaneni via Gcc-patches
Backport approved here 
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555618.html .

To maintain consistency with other Arm Architectures backend, iterators and 
iterator attributes are moved
from mve.md file to iterators.md. Also move enumerators for MVE unspecs from 
mve.md file to unspecs.md file.

gcc/ChangeLog:

2020-10-06  Srinath Parvathaneni  

* config/arm/iterators.md (MVE_types): Move mode iterator from mve.md to
iterators.md.
(MVE_VLD_ST): Likewise.
(MVE_0): Likewise.
(MVE_1): Likewise.
(MVE_3): Likewise.
(MVE_2): Likewise.
(MVE_5): Likewise.
(MVE_6): Likewise.
(MVE_CNVT): Move mode attribute iterator from mve.md to iterators.md.
(MVE_LANES): Likewise.
(MVE_constraint): Likewise.
(MVE_constraint1): Likewise.
(MVE_constraint2): Likewise.
(MVE_constraint3): Likewise.
(MVE_pred): Likewise.
(MVE_pred1): Likewise.
(MVE_pred2): Likewise.
(MVE_pred3): Likewise.
(MVE_B_ELEM): Likewise.
(MVE_H_ELEM): Likewise.
(V_sz_elem1): Likewise.
(V_extr_elem): Likewise.
(earlyclobber_32): Likewise.
(supf): Move int attribute from mve.md to iterators.md.
(mode1): Likewise.
(VCVTQ_TO_F): Move int iterator from mve.md to iterators.md.
(VMVNQ_N): Likewise.
(VREV64Q): Likewise.
(VCVTQ_FROM_F): Likewise.
(VREV16Q): Likewise.
(VCVTAQ): Likewise.
(VMVNQ): Likewise.
(VDUPQ_N): Likewise.
(VCLZQ): Likewise.
(VADDVQ): Likewise.
(VREV32Q): Likewise.
(VMOVLBQ): Likewise.
(VMOVLTQ): Likewise.
(VCVTPQ): Likewise.
(VCVTNQ): Likewise.
(VCVTMQ): Likewise.
(VADDLVQ): Likewise.
(VCTPQ): Likewise.
(VCTPQ_M): Likewise.
(VCVTQ_N_TO_F): Likewise.
(VCREATEQ): Likewise.
(VSHRQ_N): Likewise.
(VCVTQ_N_FROM_F): Likewise.
(VADDLVQ_P): Likewise.
(VCMPNEQ): Likewise.
(VSHLQ): Likewise.
(VABDQ): Likewise.
(VADDQ_N): Likewise.
(VADDVAQ): Likewise.
(VADDVQ_P): Likewise.
(VANDQ): Likewise.
(VBICQ): Likewise.
(VBRSRQ_N): Likewise.
(VCADDQ_ROT270): Likewise.
(VCADDQ_ROT90): Likewise.
(VCMPEQQ): Likewise.
(VCMPEQQ_N): Likewise.
(VCMPNEQ_N): Likewise.
(VEORQ): Likewise.
(VHADDQ): Likewise.
(VHADDQ_N): Likewise.
(VHSUBQ): Likewise.
(VHSUBQ_N): Likewise.
(VMAXQ): Likewise.
(VMAXVQ): Likewise.
(VMINQ): Likewise.
(VMINVQ): Likewise.
(VMLADAVQ): Likewise.
(VMULHQ): Likewise.
(VMULLBQ_INT): Likewise.
(VMULLTQ_INT): Likewise.
(VMULQ): Likewise.
(VMULQ_N): Likewise.
(VORNQ): Likewise.
(VORRQ): Likewise.
(VQADDQ): Likewise.
(VQADDQ_N): Likewise.
(VQRSHLQ): Likewise.
(VQRSHLQ_N): Likewise.
(VQSHLQ): Likewise.
(VQSHLQ_N): Likewise.
(VQSHLQ_R): Likewise.
(VQSUBQ): Likewise.
(VQSUBQ_N): Likewise.
(VRHADDQ): Likewise.
(VRMULHQ): Likewise.
(VRSHLQ): Likewise.
(VRSHLQ_N): Likewise.
(VRSHRQ_N): Likewise.
(VSHLQ_N): Likewise.
(VSHLQ_R): Likewise.
(VSUBQ): Likewise.
(VSUBQ_N): Likewise.
(VADDLVAQ): Likewise.
(VBICQ_N): Likewise.
(VMLALDAVQ): Likewise.
(VMLALDAVXQ): Likewise.
(VMOVNBQ): Likewise.
(VMOVNTQ): Likewise.
(VORRQ_N): Likewise.
(VQMOVNBQ): Likewise.
(VQMOVNTQ): Likewise.
(VSHLLBQ_N): Likewise.
(VSHLLTQ_N): Likewise.
(VRMLALDAVHQ): Likewise.
(VBICQ_M_N): Likewise.
(VCVTAQ_M): Likewise.
(VCVTQ_M_TO_F): Likewise.
(VQRSHRNBQ_N): Likewise.
(VABAVQ): Likewise.
(VSHLCQ): Likewise.
(VRMLALDAVHAQ): Likewise.
(VADDVAQ_P): Likewise.
(VCLZQ_M): Likewise.
(VCMPEQQ_M_N): Likewise.
(VCMPEQQ_M): Likewise.
(VCMPNEQ_M_N): Likewise.
(VCMPNEQ_M): Likewise.
(VDUPQ_M_N): Likewise.
(VMAXVQ_P): Likewise.
(VMINVQ_P): Likewise.
(VMLADAVAQ): Likewise.
(VMLADAVQ_P): Likewise.
(VMLAQ_N): Likewise.
(VMLASQ_N): Likewise.
(VMVNQ_M): Likewise.
(VPSELQ): Likewise.
(VQDMLAHQ_N): Likewise.
(VQRDMLAHQ_N): Likewise.
(VQRDMLASHQ_N): Likewise.
(VQRSHLQ_M_N): Likewise.
(VQSHLQ_M_R): Likewise.
(VREV64Q_M): Likewise.
(VRSHLQ_M_N): Likewise.
(VSHLQ_M_R): Likewise.
(VSLIQ_N): Likewise.
(VSRIQ_N): Likewise.
(VMLALDAVQ_P): Likewise.
(VQMOVNBQ_M): Likewise.
(VMOVLTQ_M): Likewise.
(VMOVNBQ_M): Likewise.
(VRSHRNTQ_N): Likewise.
(VORRQ_M

Re: [PATCH] lto: fix LTO debug sections copying.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 12:20:14PM +0200, Martin Liška wrote:
> On 10/6/20 10:00 AM, Richard Biener wrote:
> > On Tue, Oct 6, 2020 at 9:01 AM Martin Liška  wrote:
> > > 
> > > On 10/5/20 6:34 PM, Ian Lance Taylor wrote:
> > > > On Mon, Oct 5, 2020 at 9:09 AM Martin Liška  wrote:
> > > > > 
> > > > > The previous patch was not correct. This one should be.
> > > > > 
> > > > > Ready for master?
> > > > 
> > > > I don't understand why this code uses symtab_indices_shndx at all.
> > > > There should only be one SHT_SYMTAB_SHNDX section.  There shouldn't be
> > > > any need for the symtab_indices_shndx vector.
> > > 
> > > Well, the question is if we can have multiple .symtab sections in one ELF
> > > file? Theoretically yes, so we should also handle SHT_SYMTAB_SHNDX 
> > > sections.
> > > Note that the original usage of the SHT_SYMTAB_SHNDX section was motivated
> > > by PR81968 which is about Solaris ld.
> > 
> > It wasn't my code but I suppose this way the implementation was
> > "easiest".  There
> > should be exactly one symtab / shndx section.  Rainer authored this support.
> 
> If we expect at maximum one SHT_SYMTAB_SHNDX section section, then I'm 
> suggesting
> an updated version of the patch. It's what Ian offered.

gABI says on
SHT_SYMTAB/SHT_DYNSYM:
Currently, an object file may have only one section of each type, but this 
restriction may be relaxed in the future.
SHT_SYMTAB_SHNDX:
This section is associated with a symbol table section and is required if any 
of the section header indexes referenced by that symbol table contain the 
escape value SHN_XINDEX.
So, I guess only at most one SHT_SYMTAB_SHNDX can appear in ET_REL objects
which we are talking about, and at most two SHT_SYMTAB_SHNDX in
ET_EXEC/ET_DYN (though only in the very unlikely case that the binary/dso
contains more than 65536-epsilon sections and both .symtab and .dynsym need
to refer to those.  One would need to play with linker scripts to convince
ld.bfd to create many sections in ET_EXEC/ET_DYN.

Jakub



Re: [patch] convert -Wrestrict pass to ranger

2020-10-06 Thread Martin Sebor via Gcc-patches

On 10/6/20 3:45 AM, Aldy Hernandez wrote:

-  builtin_memref dstref (dst, dstsize);
-  builtin_memref srcref (src, srcsize);
+  builtin_memref dstref (query, call, dst, dstsize);
+  builtin_memref srcref (query, call, src, srcsize);

    /* Create a descriptor of the access.  This may adjust both DSTREF
   and SRCREF based on one another and the kind of the access.  */
-  builtin_access acs (call, dstref, srcref);
+  builtin_access acs (query, call, dstref, srcref);


Since/if the query pointer is a member of builtin_memref which is
passed to the builtin_access ctor there should be no need to pass
a second (and third) copy to it as well.


builtin_memref seems like an independent object altogether, and the 
query is a private member of said object.  Are you proposing making it 
public, or making builtin_access a friend of builtin_memref (eeech)?


Either one of those seems preferable to the duplication for the time
being, until there's an API to access the global ranger instance.

A better alternative, in view of your expectation of exposing
the instance via (cfun)->range_of_expr(), is to add some static
namespace scope function to access the range instance.  That
should make adopting the envisioned solution minimally disruptive.
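
Something like this minimal sketch (the type and function names here are
invented for illustration, not an existing API):

  /* Single point of access for the pass's range query; the body can
     later be replaced by cfun->range_of_expr () or whatever the final
     API turns out to be.  */
  static range_query *wrestrict_query_instance;

  static range_query *
  get_wrestrict_query (void)
  {
    return wrestrict_query_instance;
  }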

Martin


[PATCH][GCC-10 backport] arm: Add +nomve and +nomve.fp options to -mcpu=cortex-m55.

2020-10-06 Thread Srinath Parvathaneni via Gcc-patches
Backport of Joe's patch with no changes.

This patch rearranges feature bits for MVE and FP to implement the
following flags for -mcpu=cortex-m55.

  - +nomve:equivalent to armv8.1-m.main+fp.dp+dsp.
  - +nomve.fp: equivalent to armv8.1-m.main+mve+fp.dp (+dsp is implied by +mve).
  - +nofp: equivalent to armv8.1-m.main+mve (+dsp is implied by +mve).
  - +nodsp:equivalent to armv8.1-m.main+fp.dp.

Combinations of the above:

  - +nomve+nofp: equivalent to armv8.1-m.main+dsp.
  - +nodsp+nofp: equivalent to armv8.1-m.main.

Due to MVE and FP sharing vfp_base, some new syntax was required in the CPU
description to implement the concept of 'implied bits'. These are non-named
features added to the ISA late, depending on whether one or more features which
depend on them are present. This means vfp_base can be present when only one of
MVE and FP is removed, but absent when both are removed.

Ok for GCC-10 branch?

gcc/ChangeLog:

2020-07-31  Joe Ramsay  

* config/arm/arm-cpus.in:
(ALL_FPU_INTERNAL): Remove vfp_base.
(VFPv2): Remove vfp_base.
(MVE): Remove vfp_base.
(vfp_base): Redefine as implied bit dependent on MVE or FP
(cortex-m55): Add flags to disable MVE, MVE FP, FP and DSP extensions.
* config/arm/arm.c (arm_configure_build_target): Add implied bits to 
ISA.
* config/arm/parsecpu.awk:
(gen_isa): Print implied bits and their dependencies to ISA header.
(gen_data): Add parsing for implied feature bits.

gcc/testsuite/ChangeLog:

* gcc.target/arm/cortex-m55-nodsp-flag-hard.c: New test.
* gcc.target/arm/cortex-m55-nodsp-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nodsp-nofp-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nofp-flag-hard.c: New test.
* gcc.target/arm/cortex-m55-nofp-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nomve-flag-hard.c: New test.
* gcc.target/arm/cortex-m55-nomve-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c: New test.
* gcc.target/arm/cortex-m55-nomve.fp-flag-softfp.c: New test.
* gcc.target/arm/multilib.exp: Add tests for -mcpu=cortex-m55.

(cherry picked from commit 3e8fb15a8cfd0e62dd474af9f536863392ed7572)


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 
d609113e969d69505bc2f1b13fab8b1dfd622472..db0b93f6bb74f6ddf42636caa0d9a3db38692982
 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -135,10 +135,6 @@ define feature armv8_1m_main
 # Floating point and Neon extensions.
 # VFPv1 is not supported in GCC.
 
-# This feature bit is enabled for all VFP, MVE and
-# MVE with floating point extensions.
-define feature vfp_base
-
 # Vector floating point v2.
 define feature vfpv2
 
@@ -251,7 +247,7 @@ define fgroup ALL_SIMD  ALL_SIMD_INTERNAL 
ALL_SIMD_EXTERNAL
 
 # List of all FPU bits to strip out if -mfpu is used to override the
 # default.  fp16 is deliberately missing from this list.
-define fgroup ALL_FPU_INTERNAL vfp_base vfpv2 vfpv3 vfpv4 fpv5 fp16conv fp_dbl 
ALL_SIMD_INTERNAL
+define fgroup ALL_FPU_INTERNAL vfpv2 vfpv3 vfpv4 fpv5 fp16conv fp_dbl 
ALL_SIMD_INTERNAL
 # Similarly, but including fp16 and other extensions that aren't part of
 # -mfpu support.
 define fgroup ALL_FPU_EXTERNAL fp16 bf16
@@ -296,11 +292,11 @@ define fgroup ARMv8r  ARMv8a
 define fgroup ARMv8_1m_main ARMv8m_main armv8_1m_main
 
 # Useful combinations.
-define fgroup VFPv2vfp_base vfpv2
+define fgroup VFPv2vfpv2
 define fgroup VFPv3VFPv2 vfpv3
 define fgroup VFPv4VFPv3 vfpv4 fp16conv
 define fgroup FPv5 VFPv4 fpv5
-define fgroup MVE  mve vfp_base armv7em
+define fgroup MVE  mve armv7em
 define fgroup MVE_FP   MVE FPv5 fp16 mve_float
 
 define fgroup FP_DBL   fp_dbl
@@ -310,6 +306,18 @@ define fgroup NEON FP_D32 neon
 define fgroup CRYPTO   NEON crypto
 define fgroup DOTPROD  NEON dotprod
 
+# Implied feature bits.  These are for non-named features shared between 
fgroups.
+# Shared feature f belonging to fgroups A and B will be erroneously removed if:
+# A and B are enabled by default AND A is disabled by a removal flag.
+# To ensure that f is retained, we must add such bits to the ISA after
+# processing the removal flags.  This is implemented by 'implied bits':
+# define implied  []+
+# This indicates that, if any of the listed features are enabled, or if any
+# member of a listed fgroup is enabled, then  will be implicitly enabled.
+
+# Enabled for all VFP, MVE and MVE with floating point extensions.
+define implied vfp_base MVE MVE_FP ALL_FP
+
 # List of all quirk bits to strip out when comparing CPU features with
 # architectures.
 # xscale isn't really a 'quirk', but it isn't an architecture either and we
@@ -1532,6 +1540,10 @@ begin cpu cortex-

Re: [patch] convert -Wrestrict pass to ranger

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 10:30 AM, Martin Sebor wrote:

On 10/6/20 3:45 AM, Aldy Hernandez wrote:

-  builtin_memref dstref (dst, dstsize);
-  builtin_memref srcref (src, srcsize);
+  builtin_memref dstref (query, call, dst, dstsize);
+  builtin_memref srcref (query, call, src, srcsize);

    /* Create a descriptor of the access.  This may adjust both DSTREF
   and SRCREF based on one another and the kind of the access.  */
-  builtin_access acs (call, dstref, srcref);
+  builtin_access acs (query, call, dstref, srcref);


Since/if the query pointer is a member of builtin_memref which is
passed to the builtin_access ctor there should be no need to pass
a second (and third) copy to it as well.


builtin_memref seems like an independent object altogether, and the 
query is a private member of said object.  Are you proposing making 
it public, or making builtin_access a friend of builtin_memref (eeech)?


Either one of those seems preferable to the duplication for the time
being, until there's an API to access the global ranger instance.

A better alternative, in view of your expectation of exposing
the instance via (cfun)->range_of_expr(), is to add some static
namespace scope function to access the range instance.  That
should make adopting the envisioned solution minimally disruptive.

The point was we don't have a fully envisioned solution yet... that is  
just one possibility and may never come to pass.   Each pass should do 
"the right thing" for themselves for now.






Re: [PATCH][openacc] Fix acc declare for VLAs

2020-10-06 Thread Tobias Burnus

LGTM.

Thanks,

Tobias

On 10/6/20 3:28 PM, Tom de Vries wrote:

Hi,

Consider test-case test.c, with VLA A:
...
int main (void) {
   int N = 1000;
   int A[N];
   #pragma acc declare copy(A)
   return 0;
}
...
compiled using:
...
$ gcc test.c -fopenacc -S -fdump-tree-all
...

At original, we have:
...
   #pragma acc declare map(tofrom:A);
...
but at gimple, we have a map (to:A.1), but not a map (from:A.1):
...
   int[0:D.2074] * A.1;

   {
 int A[0:D.2074] [value-expr: *A.1];

 saved_stack.2 = __builtin_stack_save ();
 try
   {
 A.1 = __builtin_alloca_with_align (D.2078, 32);
 #pragma omp target oacc_declare map(to:(*A.1) [len: D.2076])
   }
 finally
   {
 __builtin_stack_restore (saved_stack.2);
   }
   }
...

This is caused by the following incompatibility.  When storing the desired
from clause in oacc_declare_returns, we use 'A.1' as the key:
...
10898 oacc_declare_returns->put (decl, c);
(gdb) call debug_generic_expr (decl)
A.1
(gdb) call debug_generic_expr (c)
map(from:(*A.1))
...
but when looking it up, we use 'A' as the key:
...
(gdb)
1471  tree *c = oacc_declare_returns->get (t);
(gdb) call debug_generic_expr (t)
A
...

Fix this by extracting the 'A.1' lookup key from 'A' using the decl-expr.

In addition, unshare the looked-up value, to avoid running into
an "incorrect sharing of tree nodes" error.

Using these two fixes, we get our desired:
...
  finally
{
+#pragma omp target oacc_declare map(from:(*A.1))
  __builtin_stack_restore (saved_stack.2);
}
...

Build on x86_64-linux with nvptx accelerator, tested libgomp.

OK for trunk?

Thanks,
- Tom

[openacc] Fix acc declare for VLAs

gcc/ChangeLog:

2020-10-06  Tom de Vries  

  PR middle-end/90861
  * gimplify.c (gimplify_bind_expr): Handle lookup in
  oacc_declare_returns using key with decl-expr.

libgomp/ChangeLog:

2020-10-06  Tom de Vries  

  PR middle-end/90861
  * testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Remove xfail.

---
  gcc/gimplify.c| 13 ++---
  libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c |  5 -
  2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2dea03cce3d..fa89e797940 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1468,15 +1468,22 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)

if (flag_openacc && oacc_declare_returns != NULL)
  {
-   tree *c = oacc_declare_returns->get (t);
+   tree key = t;
+   if (DECL_HAS_VALUE_EXPR_P (key))
+ {
+   key = DECL_VALUE_EXPR (key);
+   if (TREE_CODE (key) == INDIRECT_REF)
+ key = TREE_OPERAND (key, 0);
+ }
+   tree *c = oacc_declare_returns->get (key);
if (c != NULL)
  {
if (ret_clauses)
  OMP_CLAUSE_CHAIN (*c) = ret_clauses;

-   ret_clauses = *c;
+   ret_clauses = unshare_expr (*c);

-   oacc_declare_returns->remove (t);
+   oacc_declare_returns->remove (key);

if (oacc_declare_returns->is_empty ())
  {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
index 0f51badca42..714935772c1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
@@ -59,8 +59,3 @@ main ()

return 0;
  }
-
-
-/* { dg-xfail-run-if "TODO PR90861" { *-*-* } { "-DACC_MEM_SHARED=0" } }
-   This might XPASS if the compiler happens to put the two 'A' VLAs at the same
-   address.  */

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [PATCH] xfail and improve some failing libgomp tests

2020-10-06 Thread Tom de Vries
On 10/5/20 3:15 PM, Tom de Vries wrote:
> On 2/7/20 4:29 PM, Jakub Jelinek wrote:
>> On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
>>> * {target-32.c, thread-limit-2.c}:
>>> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690
>>
>> Please don't, I want to deal with that using declare variant, just didn't
>> get yet around to finishing the last patch needed for that.  Will try next 
>> week.
>>
> 
> Hi Jakub,
> 
> Ping, any update on this?

FWIW, I've tried as in patch attached below, but I didn't get it
compiling, I still got:
...
FAIL: libgomp.c/target-32.c (test for excess errors)
Excess errors:
unresolved symbol usleep
...

Jakub, is this already supposed to work?

Thanks,
- Tom
diff --git a/libgomp/testsuite/libgomp.c/target-32.c b/libgomp/testsuite/libgomp.c/target-32.c
index 233877b702b..7ddf8721ed3 100644
--- a/libgomp/testsuite/libgomp.c/target-32.c
+++ b/libgomp/testsuite/libgomp.c/target-32.c
@@ -1,6 +1,26 @@
 #include 
 #include 
 
+extern void base_delay(int);
+extern void nvptx_delay(int);
+
+#pragma omp declare variant( nvptx_delay ) match( construct={target}, implementation={vendor(nvidia)} )
+void base_delay(int d)
+{
+  usleep (d);
+}
+
+void nvptx_delay(int d)
+{
+  /* This function serves as a replacement for usleep in
+ this test case. It does not even attempt to be functionally
+ equivalent  - we just want some sort of delay. */
+  int i;
+  int N = d * 2000;
+  for (i = 0; i < N; i++)
+asm volatile ("" : : : "memory");
+}
+
 int main ()
 {
   int a = 0, b = 0, c = 0, d[7];
@@ -18,28 +38,28 @@ int main ()
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[3])
 {
-  usleep (1000);
+  base_delay (1000);
   #pragma omp atomic update
   b |= 4;
 }
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[4])
 {
-  usleep (5000);
+  base_delay (5000);
   #pragma omp atomic update
   b |= 1;
 }
 
 #pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[5])
 {
-  usleep (5000);
+  base_delay (5000);
   #pragma omp atomic update
   c |= 8;
 }
 
 #pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[6])
 {
-  usleep (1000);
+  base_delay (1000);
   #pragma omp atomic update
   c |= 2;
 }


Re: [patch] convert -Wrestrict pass to ranger

2020-10-06 Thread Martin Sebor via Gcc-patches

On 10/6/20 8:42 AM, Andrew MacLeod wrote:

On 10/6/20 10:30 AM, Martin Sebor wrote:

On 10/6/20 3:45 AM, Aldy Hernandez wrote:

-  builtin_memref dstref (dst, dstsize);
-  builtin_memref srcref (src, srcsize);
+  builtin_memref dstref (query, call, dst, dstsize);
+  builtin_memref srcref (query, call, src, srcsize);

    /* Create a descriptor of the access.  This may adjust both DSTREF
   and SRCREF based on one another and the kind of the access.  */
-  builtin_access acs (call, dstref, srcref);
+  builtin_access acs (query, call, dstref, srcref);


Since/if the query pointer is a member of builtin_memref which is
passed to the builtin_access ctor there should be no need to pass
a second (and third) copy to it as well.


builtin_memref seems like an independent object altogether, and the 
query is a private member of said object.  Are you proposing making 
it public, or making builtin_access a friend of builtin_memref (eeech)?


Either one of those seems preferable to the duplication for the time
being, until there's an API to access the global ranger instance.

A better alternative, in view of your expectation of exposing
the instance via (cfun)->range_of_expr(), is to add some static
namespace scope function to access the range instance.  That
should make adopting the envisioned solution minimally disruptive.

The point was we don't have a fully envisioned solution yet... that is 
just one possibility and may never come to pass.   Each pass should do 
"the right thing" for themselves for now.


Yes, I got that.  Which is why I suggest adding a namespace-scope
function to the restrict pass that can then be easily replaced with
whatever solution we ultimately end up with.

What's certain (in my mind anyway) is that storing a pointer to some
global (or per-pass) range instance as a member in each class that
needs to access it is not the solution we want long term.

Martin


Re: [PATCH, 1/3, OpenMP] Target mapping changes for OpenMP 5.0, front-end parts

2020-10-06 Thread Chung-Lin Tang

On 2020/9/29 6:16 PM, Jakub Jelinek wrote:

On Tue, Sep 01, 2020 at 09:16:23PM +0800, Chung-Lin Tang wrote:

this patch set implements parts of the target mapping changes introduced
in OpenMP 5.0, mainly the attachment requirements for pointer-based
list items, and the clause ordering.

The first patch here are the C/C++ front-end changes.


Do you think you could mention in detail which exact target mapping changes
in the spec is the patchset attempting to implement?
5.0 unfortunately contains many target mapping changes and this patchset
can't implement them all and it would be easier to see the list of rules
(e.g. from openmp-diff-full-4.5-5.0.pdf, if you don't have that one, I can
send it to you), rather than trying to guess them from the patchset.

Thanks.


Hi Jakub,
the main implemented features are the clause ordering rules:

 "For a given construct, the effect of a map clause with the to, from, or 
tofrom map-type is
  ordered before the effect of a map clause with the alloc, release, or delete 
map-type."

 "If item1 is a list item in a map clause, and item2 is another list item in a 
map clause on
  the same construct that has a base pointer that is, or is part of, item1, 
then:
* If the map clause(s) appear on a target, target data, or target enter 
data construct,
  then on entry to the corresponding region the effect of the map clause on 
item1 is ordered
  to occur before the effect of the map clause on item2.
    * If the map clause(s) appear on a target, target data, or target exit 
data construct then
  on exit from the corresponding region the effect of the map clause on 
item2 is ordered to
  occur before the effect of the map clause on item1."

and the base-pointer attachment behavior:

 "If a list item in a map clause has a base pointer, and a pointer variable is 
present in the device data
  environment that corresponds to the base pointer when the effect of the map 
clause occurs, then if
  the corresponding pointer or the corresponding list item is created in the 
device data environment
  on entry to the construct, then:
...
2. The corresponding pointer variable becomes an attached pointer for the 
corresponding list item."

(these passages are all in the "2.19.7.1 map Clause" section of the 5.0 spec, 
all are new as
also verified from the diff PDFs you sent us)

Also, because of these new features, having multiple maps of the same
variables now has meaning
in OpenMP, so changes in the C/C++ frontends to relax the no-duplicate rules 
are also included.
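
As a concrete sketch (illustrative user code, not part of the patch): 'p'
below is the base pointer of the section 'p[0:N]', so on entry 'p' is
mapped before 'p[0:N]' and its device copy becomes an attached pointer for
the section, while on exit the section is unmapped before 'p':

#include <stdlib.h>

#define N 100

int
main (void)
{
  int *p = malloc (N * sizeof (int));
#pragma omp target enter data map(to: p, p[0:N])
#pragma omp target exit data map(from: p[0:N]) map(release: p)
  free (p);
  return 0;
}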


 gcc/c-family/
 * c-common.h (c_omp_adjust_clauses): New declaration.
 * c-omp.c (c_omp_adjust_clauses): New function.


This function name is too broad, it should have target in it as it is
for processing target* construct clauses only.

Jakub


Sure, I'll update this naming in a later version.

Thanks,
Chung-Lin


Re: [PATCH] xfail and improve some failing libgomp tests

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 04:48:40PM +0200, Tom de Vries wrote:
> On 10/5/20 3:15 PM, Tom de Vries wrote:
> > On 2/7/20 4:29 PM, Jakub Jelinek wrote:
> >> On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
> >>> * {target-32.c, thread-limit-2.c}:
> >>> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690
> >>
> >> Please don't, I want to deal with that using declare variant, just didn't
> >> get yet around to finishing the last patch needed for that.  Will try next 
> >> week.
> >>
> > 
> > Hi Jakub,
> > 
> > Ping, any update on this?

Not finished the last step, I run into LTO issues.  Will need to return to
that soon.
Last progress in "[RFH] LTO cgraph support for late declare variant resolution"
mail from May on gcc-patches.

> --- a/libgomp/testsuite/libgomp.c/target-32.c
> +++ b/libgomp/testsuite/libgomp.c/target-32.c
> @@ -1,6 +1,26 @@
>  #include 
>  #include 
>  
> +extern void base_delay(int);

No need to declare this one early.

> +extern void nvptx_delay(int);

Space before (, and the definition could go here instead of
the declaration.

> +#pragma omp declare variant( nvptx_delay ) match( construct={target}, 
> implementation={vendor(nvidia)} )

This isn't the right declare variant for what we want though,
we only provide gnu as accepted vendor, it is implementation's vendor,
not vendor of one of the hw components.
So, it ought to be instead
#pragma omp declare variant (nvptx_delay) 
match(construct={target},device={arch(nvptx)})

> +void base_delay(int d)
> +{
> +  usleep (d);
> +}

Jakub



[committed, wwwdocs] gcc-11/changes: Add notes about column number changes

2020-10-06 Thread David Malcolm via Gcc-patches
I've taken the liberty of pushing this website patch, having checked
that it validates.

It covers the changes by Lewis in 004bb936d6d5f177af26ad4905595e843d5665a5
(PR 49973 and PR 86904).

---
 htdocs/gcc-11/changes.html | 39 ++
 1 file changed, 39 insertions(+)

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index 64655120..e2a32e51 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -72,6 +72,45 @@ a work-in-progress.
   control if function entries and exits should be instrumented.
 
   
+  
+
+  In previous releases of GCC, the "column numbers" emitted in diagnostics
+  were actually a count of bytes from the start of the source line.  This
+  could be problematic, both because of:
+
+
+  multibyte characters (requiring more than one byte to encode), 
and
+  multicolumn characters (requiring more than one column to display in 
a monospace font)
+
+
+  For example, the character π ("GREEK SMALL LETTER PI (U+03C0)")
+  occupies one column, and its UTF-8 encoding requires two bytes; the
+  character 🙂 ("SLIGHTLY SMILING FACE (U+1F642)") occupies two
+  columns, and its UTF-8 encoding requires four bytes.
+
+
+  In GCC 11 the column numbers default to being column numbers, respecting
+  multi-column characters.  The old behavior can be restored using a new
+  option
+  https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-column-unit";>-fdiagnostics-column-unit=byte.
+  There is also a new option
+  https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-column-origin";>-fdiagnostics-column-origin=,
+  allowing the pre-existing default of the left-hand column being column
+  1 to be overridden if desired (e.g. for 0-based columns).  The output
+  of
+  https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format";>-fdiagnostics-format=json
+  has been extended to supply both byte counts and column numbers for all 
source locations.
+
+
+  Additionally, in previous releases of GCC, tab characters in the source
+  would be emitted verbatim when quoting source code, but be prefixed
+  with whitespace or line number information, leading to misalignments
+  in the resulting output when compared with the actual source.  Tab
+  characters are now printed as an appropriate number of spaces, using the
+  https://gcc.gnu.org/onlinedocs/gcc/Preprocessor-Options.html#index-ftabstop";>-ftabstop
+  option (which defaults to 8 spaces per tab stop).
+
+  
 
 
 
-- 
2.26.2



Re: [PATCH] rs6000: Fix extraneous characters in the documentation

2020-10-06 Thread will schmidt via Gcc-patches
On Mon, 2020-10-05 at 17:23 -0300, Tulio Magno Quites Machado Filho via 
Gcc-patches wrote:
> Ping?
+cc Segher  :-)

> 
> Tulio Magno Quites Machado Filho via Gcc-patches  
> writes:
> 
> > Replace them with a whitespace in order to avoid artifacts in the HTML
> > document.
> > 
> > 2020-08-19  Tulio Magno Quites Machado Filho  
> > 
> > gcc/
> > * doc/extend.texi (PowerPC Built-in Functions): Replace
> > extraneous characters with whitespace.
> > ---
> >  gcc/doc/extend.texi | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index bcc251481ca..0c380322280 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -21538,10 +21538,10 @@ void amo_stdat_smin (int64_t *, int64_t);
> >  ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions.
> >  GCC provides support for these instructions through the following built-in
> >  functions which are enabled with the @code{-mmma} option.  The vec_t type
> > -below is defined to be a normal vector unsigned char type.  The uint2, 
> > uint4
> > +below is defined to be a normal vector unsigned char type.  The uint2, 
> > uint4

That looks like a non-breaking space (UTF-8 c2 a0), so 
2e c2 a0 20 becomes 2e 20 20 


> >  and uint8 parameters are 2-bit, 4-bit and 8-bit unsigned integer constants
> > -respectively.  The compiler will verify that they are constants and that
> > -their values are within range. 
> > +respectively.  The compiler will verify that they are constants and that
> > +their values are within range.

2e c2 a0 20 becomes 2e 20 20

And drops a trailing whitespace.

Those seem reasonable. 
lgtm

Thanks
-Will

> >  
> >  The built-in functions supported are:
> >  
> > -- 
> > 2.25.4
> > 
> 
> 



Re: [PATCH] xfail and improve some failing libgomp tests

2020-10-06 Thread Tom de Vries
On 10/6/20 5:02 PM, Jakub Jelinek wrote:
> On Tue, Oct 06, 2020 at 04:48:40PM +0200, Tom de Vries wrote:
>> On 10/5/20 3:15 PM, Tom de Vries wrote:
>>> On 2/7/20 4:29 PM, Jakub Jelinek wrote:
 On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
> * {target-32.c, thread-limit-2.c}:
> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690

 Please don't, I want to deal with that using declare variant, just didn't
 get yet around to finishing the last patch needed for that.  Will try next 
 week.

>>>
>>> Hi Jakub,
>>>
>>> Ping, any update on this?
> 
> Not finished the last step, I run into LTO issues.  Will need to return to
> that soon.
> Last progress in "[RFH] LTO cgraph support for late declare variant 
> resolution"
> mail from May on gcc-patches.
> 

Ack, thanks for the update.

>> --- a/libgomp/testsuite/libgomp.c/target-32.c
>> +++ b/libgomp/testsuite/libgomp.c/target-32.c
>> @@ -1,6 +1,26 @@
>>  #include 
>>  #include 
>>  
>> +extern void base_delay(int);
> 
> No need to declare this one early.
> 
>> +extern void nvptx_delay(int);
> 
> Space before (, and the definition could go here instead of
> the declaration.
> 
>> +#pragma omp declare variant( nvptx_delay ) match( construct={target}, 
>> implementation={vendor(nvidia)} )
> 
> This isn't the right declare variant for what we want though,
> we only provide gnu as accepted vendor, it is implementation's vendor,
> not vendor of one of the hw components.
> So, it ought to be instead
> #pragma omp declare variant (nvptx_delay) 
> match(construct={target},device={arch(nvptx)})
> 
>> +void base_delay(int d)
>> +{
>> +  usleep (d);
>> +}

I've updated the patch accordingly.

FWIW, I now run into an ICE which looks like PR96680:
...
lto1: internal compiler error: in lto_fixup_prevailing_decls, at
lto/lto-common.c:2595^M
0x93afcd lto_fixup_prevailing_decls^M
/home/vries/oacc/trunk/source-gcc/gcc/lto/lto-common.c:2595^M
0x93b1d6 lto_fixup_decls^M
/home/vries/oacc/trunk/source-gcc/gcc/lto/lto-common.c:2645^M
0x93bcc4 read_cgraph_and_symbols(unsigned int, char const**)^M
/home/vries/oacc/trunk/source-gcc/gcc/lto/lto-common.c:2897^M
0x910358 lto_main()^M
/home/vries/oacc/trunk/source-gcc/gcc/lto/lto.c:625^M
...

Thanks,
- Tom
diff --git a/libgomp/testsuite/libgomp.c/target-32.c b/libgomp/testsuite/libgomp.c/target-32.c
index 233877b702b..b8deae72b08 100644
--- a/libgomp/testsuite/libgomp.c/target-32.c
+++ b/libgomp/testsuite/libgomp.c/target-32.c
@@ -1,6 +1,25 @@
 #include 
 #include 
 
+void
+nvptx_delay (int d)
+{
+  /* This function serves as a replacement for usleep in
+ this test case.  It does not even attempt to be functionally
+ equivalent  - we just want some sort of delay. */
+  int i;
+  int N = d * 2000;
+  for (i = 0; i < N; i++)
+asm volatile ("" : : : "memory");
+}
+
+#pragma omp declare variant (nvptx_delay) match(construct={target},device={arch(nvptx)})
+void
+base_delay(int d)
+{
+  usleep (d);
+}
+
 int main ()
 {
   int a = 0, b = 0, c = 0, d[7];
@@ -18,28 +37,28 @@ int main ()
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[3])
 {
-  usleep (1000);
+  base_delay (1000);
   #pragma omp atomic update
   b |= 4;
 }
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[4])
 {
-  usleep (5000);
+  base_delay (5000);
   #pragma omp atomic update
   b |= 1;
 }
 
 #pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[5])
 {
-  usleep (5000);
+  base_delay (5000);
   #pragma omp atomic update
   c |= 8;
 }
 
 #pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[6])
 {
-  usleep (1000);
+  base_delay (1000);
   #pragma omp atomic update
   c |= 2;
 }


[committed][GCC 8] arm: Add missing part number for Neoverse V1

2020-10-06 Thread Alex Coplan via Gcc-patches
This patch adds the part number for Neoverse V1 which was missing from
the initial AArch32 support in GCC 8.

Bootstrapped and regtested on arm-none-linux-gnueabihf, pushing as
obvious.

Thanks,
Alex

---

gcc/ChangeLog:

* config/arm/driver-arm.c (arm_cpu_table): Add neoverse-v1.
diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c
index 45ad92e..8352289 100644
--- a/gcc/config/arm/driver-arm.c
+++ b/gcc/config/arm/driver-arm.c
@@ -56,6 +56,7 @@ static struct vendor_cpu arm_cpu_table[] = {
 {"0xd09", "armv8-a+crc", "cortex-a73"},
 {"0xd05", "armv8.2-a+fp16+dotprod", "cortex-a55"},
 {"0xd0a", "armv8.2-a+fp16+dotprod", "cortex-a75"},
+{"0xd40", "armv8.4-a+fp16", "neoverse-v1"},
 {"0xd49", "armv8.4-a+fp16", "neoverse-n2"},
 {"0xc14", "armv7-r", "cortex-r4"},
 {"0xc15", "armv7-r", "cortex-r5"},


Re: [PATCH] debug: Pass --gdwarf-N to assembler if fixed gas is detected during configure

2020-10-06 Thread Mark Wielaard
Hi,

On Fri, 2020-09-18 at 17:21 +0200, Mark Wielaard wrote:
> On Tue, 2020-09-15 at 20:40 +0200, Jakub Jelinek wrote:
> > Ok, here it is in patch form.
> > I've briefly tested it, with the older binutils I have around (no --gdwarf-N
> > support), with latest gas (--gdwarf-N that can be passed to as even when
> > compiling C/C++ etc. code and emitting .debug_line) and latest gas with 
> > Mark's fix
> > reverted (--gdwarf-N support, but can only pass it to as when assembling
> > user .s/.S files, not when compiling C/C++ etc.).
> > Will bootstrap/regtest (with the older binutils) later tonight.
> > 
> > 2020-09-15  Jakub Jelinek  
> > 
> > * configure.ac (HAVE_AS_GDWARF_5_DEBUG_FLAG,
> > HAVE_AS_WORKING_DWARF_4_FLAG): New tests.
> > * gcc.c (ASM_DEBUG_DWARF_OPTION): Define.
> > (ASM_DEBUG_SPEC): Use ASM_DEBUG_DWARF_OPTION instead of
> > "--gdwarf2".  Use %{cond:opt1;:opt2} style.
> > (ASM_DEBUG_OPTION_DWARF_OPT): Define.
> > (ASM_DEBUG_OPTION_SPEC): Define.
> > (asm_debug_option): New variable.
> > (asm_options): Add "%(asm_debug_option)".
> > (static_specs): Add asm_debug_option entry.
> > (static_spec_functions): Add dwarf-version-gt.
> > (debug_level_greater_than_spec_func): New function.
> > * config/darwin.h (ASM_DEBUG_OPTION_SPEC): Define.
> > * config/darwin9.h (ASM_DEBUG_OPTION_SPEC): Redefine.
> > * config.in: Regenerated.
> > * configure: Regenerated.
> 
> Once this is in we can more generally emit DW_FORM_line_str for
> filepaths in CU DIEs for the name and comp_dir attribute. There
> currently is a bit of a hack to do this in dwarf2out_early_finish, but
> that only works when the assembler doesn't emit a DWARF5 .debug_line,
> but gcc does it itself.
> 
> What do you think of the attached patch?
>
> DWARF5 has a new string table specially for file paths. .debug_line
> file and dir tables reference strings in .debug_line_str.  If a
> .debug_line_str section is emitted then also place CU DIE file
> names and comp dirs there.
> 
> gcc/ChangeLog:
> 
>   * dwarf2out.c (add_filepath_AT_string): New function.
>   (asm_outputs_debug_line_str): Likewise.
>   (add_filename_attribute): Likewise.
>   (add_comp_dir_attribute): Call add_filepath_AT_string.
>   (gen_compile_unit_die): Call add_filename_attribute for name.
>   (init_sections_and_labels): Init debug_line_str_section when
>   asm_outputs_debug_line_str return true.
>   (dwarf2out_early_finish): Remove DW_AT_name and DW_AT_comp_dir
>   hack and call add_filename_attribute for the remap_debug_filename.

On top of that, we also need the following, which makes sure the actual
compilation directory is used in a DWARF5 .debug_line directory table
(and not just a relative path).


From 66b25bc0a5df06e211b48a54e3b5d33999c24fb6 Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Tue, 6 Oct 2020 17:41:19 +0200
Subject: [PATCH] debug: Make sure to output .file 0 when generating DWARF5.

When gas outputs DWARF5 .debug_line[_str] then we have to tell it the
comp_dir and main file name for the zero entry line table. Otherwise
gas has to guess at the CU compilation directory and file.

Before a gcc -gdwarf-5 ../src/hello.c line table looked like:

Directory table:
 0 ../src (24)
 1 ../src (24)
 2 /usr/include (31)

File name table:
 0 hello.c (16),  0
 1 hello.c (16),  1
 2 stdio.h (44),  2

With this patch it looks like:

Directory table:
 0 /tmp/obj (0)
 1 ../src (24)
 2 /usr/include (31)

File name table:
 0 ../src/hello.c (9),  0
 1 hello.c (16),  1
 2 stdio.h (44),  2
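
With this patch applied, gas would thus receive a directive of the form
.file 0 "/tmp/obj" "../src/hello.c" for this example (paths as in the
tables above).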

gcc/ChangeLog:

	* dwarf2out.c (dwarf2out_finish): Emit .file 0 entry when
	generating DWARF5 .debug_line table through gas.
---
 gcc/dwarf2out.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index a43082864a75..399937a9f310 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -31764,6 +31764,27 @@ dwarf2out_finish (const char *filename)
   ASM_OUTPUT_LABEL (asm_out_file, debug_line_section_label);
   if (! output_asm_line_debug_info ())
 output_line_info (false);
+  else if (asm_outputs_debug_line_str ())
+{
+  /* When gas outputs DWARF5 .debug_line[_str] then we have to
+	 tell it the comp_dir and main file name for the zero entry
+	 line table.  */
+  const char *comp_dir, *filename0;
+
+  comp_dir = comp_dir_string ();
+  if (comp_dir == NULL)
+	comp_dir = "";
+
+  filename0 = get_AT_string (comp_unit_die (), DW_AT_name);
+  if (filename0 == NULL)
+	filename0 = "";
+
+  fprintf (asm_out_file, "\t.file 0 ");
+  output_quoted_string (asm_out_file, remap_debug_filename (comp_dir));
+  fputc (' ', asm_out_file);
+  output_quoted_string (asm_out_file, remap_debug_filename (filename0));
+  fputc ('\n', asm_out_file);
+}
 
   if (dwarf_split_debug_info && info_section_emitted)
 {
-- 
2.18.4



[PATCH v2] arm: [MVE] Add vqdmlashq intrinsics

2020-10-06 Thread Christophe Lyon via Gcc-patches
This patch adds:
vqdmlashq_m_n_s16
vqdmlashq_m_n_s32
vqdmlashq_m_n_s8
vqdmlashq_n_s16
vqdmlashq_n_s32
vqdmlashq_n_s8

v2: rebased after Srinath's reorganization patch

2020-10-05  Christophe Lyon  

gcc/
PR target/96914
* config/arm/arm_mve.h (vqdmlashq, vqdmlashq_m): Define.
* config/arm/arm_mve_builtins.def (vqdmlashq_n_s)
(vqdmlashq_m_n_s,): New.
* config/arm/unspecs.md (VQDMLASHQ_N_S, VQDMLASHQ_M_N_S): New
unspecs.
* config/arm/iterators.md (VQDMLASHQ_N_S, VQDMLASHQ_M_N_S): New
attributes.
(VQDMLASHQ_N): New iterator.
* config/arm/mve.md (mve_vqdmlashq_n_, mve_vqdmlashq_m_n_s): New
patterns.

gcc/testsuite/
PR target/96914
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c: New test.
---
 gcc/config/arm/arm_mve.h   | 116 +
 gcc/config/arm/arm_mve_builtins.def|   2 +
 gcc/config/arm/iterators.md|   3 +
 gcc/config/arm/mve.md  |  33 ++
 gcc/config/arm/unspecs.md  |   2 +
 .../arm/mve/intrinsics/vqdmlashq_m_n_s16.c |  23 
 .../arm/mve/intrinsics/vqdmlashq_m_n_s32.c |  23 
 .../arm/mve/intrinsics/vqdmlashq_m_n_s8.c  |  23 
 .../arm/mve/intrinsics/vqdmlashq_n_s16.c   |  21 
 .../arm/mve/intrinsics/vqdmlashq_n_s32.c   |  21 
 .../gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c |  21 
 11 files changed, 288 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index d9bfb203..7626ad1 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -141,6 +141,7 @@
 #define vrev64q_m(__inactive, __a, __p) __arm_vrev64q_m(__inactive, __a, __p)
 #define vqrdmlashq(__a, __b, __c) __arm_vqrdmlashq(__a, __b, __c)
 #define vqrdmlahq(__a, __b, __c) __arm_vqrdmlahq(__a, __b, __c)
+#define vqdmlashq(__a, __b, __c) __arm_vqdmlashq(__a, __b, __c)
 #define vqdmlahq(__a, __b, __c) __arm_vqdmlahq(__a, __b, __c)
 #define vmvnq_m(__inactive, __a, __p) __arm_vmvnq_m(__inactive, __a, __p)
 #define vmlasq(__a, __b, __c) __arm_vmlasq(__a, __b, __c)
@@ -260,6 +261,7 @@
 #define vorrq_m(__inactive, __a, __b, __p) __arm_vorrq_m(__inactive, __a, __b, 
__p)
 #define vqaddq_m(__inactive, __a, __b, __p) __arm_vqaddq_m(__inactive, __a, 
__b, __p)
 #define vqdmladhq_m(__inactive, __a, __b, __p) __arm_vqdmladhq_m(__inactive, 
__a, __b, __p)
+#define vqdmlashq_m(__a, __b, __c, __p) __arm_vqdmlashq_m(__a, __b, __c, __p)
 #define vqdmladhxq_m(__inactive, __a, __b, __p) __arm_vqdmladhxq_m(__inactive, 
__a, __b, __p)
 #define vqdmlahq_m(__a, __b, __c, __p) __arm_vqdmlahq_m(__a, __b, __c, __p)
 #define vqdmlsdhq_m(__inactive, __a, __b, __p) __arm_vqdmlsdhq_m(__inactive, 
__a, __b, __p)
@@ -1307,6 +1309,7 @@
 #define vqdmlsdhxq_s8(__inactive, __a, __b) __arm_vqdmlsdhxq_s8(__inactive, 
__a, __b)
 #define vqdmlsdhq_s8(__inactive, __a, __b) __arm_vqdmlsdhq_s8(__inactive, __a, 
__b)
 #define vqdmlahq_n_s8(__a, __b, __c) __arm_vqdmlahq_n_s8(__a, __b, __c)
+#define vqdmlashq_n_s8(__a, __b, __c) __arm_vqdmlashq_n_s8(__a, __b, __c)
 #define vqdmladhxq_s8(__inactive, __a, __b) __arm_vqdmladhxq_s8(__inactive, 
__a, __b)
 #define vqdmladhq_s8(__inactive, __a, __b) __arm_vqdmladhq_s8(__inactive, __a, 
__b)
 #define vmlsdavaxq_s8(__a, __b, __c) __arm_vmlsdavaxq_s8(__a, __b, __c)
@@ -1391,6 +1394,7 @@
 #define vqrdmladhq_s16(__inactive, __a, __b) __arm_vqrdmladhq_s16(__inactive, 
__a, __b)
 #define vqdmlsdhxq_s16(__inactive, __a, __b) __arm_vqdmlsdhxq_s16(__inactive, 
__a, __b)
 #define vqdmlsdhq_s16(__inactive, __a, __b) __arm_vqdmlsdhq_s16(__inactive, 
__a, __b)
+#define vqdmlashq_n_s16(__a, __b, __c) __arm_vqdmlashq_n_s16(__a, __b, __c)
 #define vqdmlahq_n_s16(__a, __b, __c) __arm_vqdmlahq_n_s16(__a, __b, __c)
 #define vqdmladhxq_s16(__inactive, __a, __b) __arm_vqdmladhxq_s16(__inactive, 
__a, __b)
 #define vqdmladhq_s16(__inactive, __a, __b) __arm_vqdmladhq_s16(__inactive, 
__a, __b)
@@ -1476,6 +1480,7 @@
 #define vqrdmladhq_s32(__inactive, __a, __b) __arm_vqrdmladhq_s32(__in

[PATCH v2] arm: [MVE] Remove illegal intrinsics

2020-10-06 Thread Christophe Lyon via Gcc-patches
A few MVE intrinsics had an unsigned variant implemented even though it is
not supported by the hardware.  This patch removes them:
__arm_vqrdmlashq_n_u8
__arm_vqrdmlahq_n_u8
__arm_vqdmlahq_n_u8
__arm_vqrdmlashq_n_u16
__arm_vqrdmlahq_n_u16
__arm_vqdmlahq_n_u16
__arm_vqrdmlashq_n_u32
__arm_vqrdmlahq_n_u32
__arm_vqdmlahq_n_u32
__arm_vmlaldavaxq_p_u32
__arm_vmlaldavaxq_p_u16

v2: rebased after Srinath's reorganization patch

2020-10-06  Christophe Lyon  

gcc/
PR target/96914
* config/arm/arm_mve.h (vqrdmlashq_n_u8, vqrdmlashq_n_u16)
(vqrdmlashq_n_u32, vqrdmlahq_n_u8, vqrdmlahq_n_u16)
(vqrdmlahq_n_u32, vqdmlahq_n_u8, vqdmlahq_n_u16, vqdmlahq_n_u32)
(vmlaldavaxq_p_u16, vmlaldavaxq_p_u32): Remove.
* config/arm/arm_mve_builtins.def (vqrdmlashq_n_u, vqrdmlahq_n_u)
(vqdmlahq_n_u, vmlaldavaxq_p_u): Remove.
* config/arm/unspecs.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U)
(VQRDMLASHQ_N_U)
(VMLALDAVAXQ_P_U): Remove unspecs.
* config/arm/iterators.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U)
(VQRDMLASHQ_N_U, VMLALDAVAXQ_P_U): Remove attributes.
(VQDMLAHQ_N, VQRDMLAHQ_N, VQRDMLASHQ_N, VMLALDAVAXQ_P): Remove
unsigned variants from iterators.
* config/arm/mve.md (mve_vqdmlahq_n_<supf><mode>)
(mve_vqrdmlahq_n_<supf><mode>)
(mve_vqrdmlashq_n_<supf><mode>, mve_vmlaldavaxq_p_<supf><mode>):
Update comment.

gcc/testsuite/
PR target/96914
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c: Remove.
---
 gcc/config/arm/arm_mve.h   | 199 +
 gcc/config/arm/arm_mve_builtins.def|   4 -
 gcc/config/arm/iterators.md|  16 +-
 gcc/config/arm/mve.md  |   8 +-
 gcc/config/arm/unspecs.md  |   4 -
 .../arm/mve/intrinsics/vmlaldavaxq_p_u16.c |  21 ---
 .../arm/mve/intrinsics/vmlaldavaxq_p_u32.c |  21 ---
 .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c |  21 ---
 .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c |  21 ---
 .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c  |  21 ---
 .../arm/mve/intrinsics/vqrdmlahq_n_u16.c   |  21 ---
 .../arm/mve/intrinsics/vqrdmlahq_n_u32.c   |  21 ---
 .../gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c |  21 ---
 .../arm/mve/intrinsics/vqrdmlashq_n_u16.c  |  21 ---
 .../arm/mve/intrinsics/vqrdmlashq_n_u32.c  |  21 ---
 .../arm/mve/intrinsics/vqrdmlashq_n_u8.c   |  21 ---
 16 files changed, 19 insertions(+), 443 deletions(-)
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c
 delete mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c
 delete mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c
 delete mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c
 delete mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c
 delete mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 7626ad1..ccdac67 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1237,9 +1237,6 @@
 #define vpselq_u8(__a, __b, __p) __arm_vpselq_u8(__a, __b, __p)
 #define vpselq_s8(__a, __b, __p) __arm_vpselq_s8(__a, __b, __p)
 #define vrev64q_m_u8(__inactive, __a, __p) __arm_vrev64q_m_u8(__inactive, __a, 
__p)
-#define vqrdmlashq_n_u8(__a, __b, __c) __arm_vqrdmlashq_n_u8(__a, __b, __c)
-#define vqrdmlahq_n_u8(__a, __b, __c) __arm_vqrdmlahq_n_u8(__a, __b, __c)
-#define vqdmlahq_n_u8(__a, __b, __c) __arm_vqdmlahq_n_u8(__a, __b, __c)
 #define vmvnq_m_u8(__inactive, __a, __p) __arm_vmvnq_m_u8(__inactive, __a, __p)
 #define vmlasq_n_u8(__a, __b, __c) __arm_vmlasq_n_u8(__a, __b, __c)
 #define vmlaq_n_u8(__a, __b, __c) __arm_vmlaq_n_u8(__a, __b, __c)
@@ -1323,9 +1320,6 @@
 #

Re: [PATCH] arm: Fix multiple inheritance thunks for thumb-1 with -mpure-code

2020-10-06 Thread Richard Earnshaw via Gcc-patches
On 29/09/2020 20:50, Christophe Lyon via Gcc-patches wrote:
> When mi_delta is > 255 and -mpure-code is used, we cannot load delta
> from code memory (like we do without -mpure-code).
> 
> This patch builds the value of mi_delta into r3 with a series of
> movs/adds/lsls.
> 
> We also do some cleanup by not emitting the function address and delta
> via .word directives at the end of the thunk since we don't use them
> with -mpure-code.
> 
> No need for new testcases, this bug was already identified by
> eg. pr46287-3.C
> 
> 2020-09-29  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm.c (arm_thumb1_mi_thunk): Build mi_delta in r3 and
>   do not emit function address and delta when -mpure-code is used.

There are some optimizations you can make to this code.

Firstly, for values between 256 and 510 (inclusive), it would be better
to just expand a mov of 255 followed by an add.  This is true for
the literal pool alternative as well, so should be handled before all
this.  I also suspect (but haven't checked) that the base adjustment will
most commonly be a multiple of the machine word size (ie 4).  If that is
the case then you could generate n/4 and then shift it left by 2 for an
even greater range of literals.  More generally, any sequence of up to
three thumb1 instructions will be no larger, and probably as fast as the
existing literal pool fall back.

Secondly, if the value is, for example, 65536 (0x10000), your code will
emit a mov followed by two shift-by-8 instructions; the two shifts could
be merged into a single shift-by-16.
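
For instance, the merged emission could look roughly like this (an
untested sketch, for illustration only: it accumulates the pending shift
amount and emits a single lsls before the next non-zero byte):

  int shift = 0;
  bool mov_done_p = false;
  for (int i = 3; i >= 0; i--)
    {
      int byte = (mi_delta >> (8 * i)) & 0xff;
      if (mov_done_p)
	shift += 8;
      if (byte)
	{
	  if (shift)
	    {
	      asm_fprintf (file, "\tlsls\tr3, #%d\n", shift);
	      shift = 0;
	    }
	  asm_fprintf (file, "\t%s\tr3, #%d\n",
		       mov_done_p ? "adds" : "movs", byte);
	  mov_done_p = true;
	}
    }
  if (shift)
    asm_fprintf (file, "\tlsls\tr3, #%d\n", shift);

For 0x10000 this emits movs r3, #1; lsls r3, #16 rather than a mov and
two shift-by-8 instructions.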

Finally, I'd really like to see some executable tests for this, if at
all possible.

R.

> 
> ---
>  gcc/config/arm/arm.c | 91 
> +---
>  1 file changed, 66 insertions(+), 25 deletions(-)
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index ceeb91f..62abeb5 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -28342,9 +28342,43 @@ arm_thumb1_mi_thunk (FILE *file, tree, HOST_WIDE_INT 
> delta,
>  {
>if (mi_delta > 255)
>   {
> -   fputs ("\tldr\tr3, ", file);
> -   assemble_name (file, label);
> -   fputs ("+4\n", file);
> +   /* With -mpure-code, we cannot load delta from the constant
> +  pool: we build it explicitly.  */
> +   if (target_pure_code)
> + {
> +   bool mov_done_p = false;
> +   int i;
> +
> +   /* Emit upper 3 bytes if needed.  */
> +   for (i = 0; i < 3; i++)
> + {
> +   int byte = (mi_delta >> (8 * (3 - i))) & 0xff;
> +
> +   if (byte)
> + {
> +   if (mov_done_p)
> + asm_fprintf (file, "\tadds\tr3, #%d\n", byte);
> +   else
> + asm_fprintf (file, "\tmovs\tr3, #%d\n", byte);
> +   mov_done_p = true;
> + }
> +
> +   if (mov_done_p)
> + asm_fprintf (file, "\tlsls\tr3, #8\n");
> + }
> +
> +   /* Emit lower byte if needed.  */
> +   if (!mov_done_p)
> + asm_fprintf (file, "\tmovs\tr3, #%d\n", mi_delta & 0xff);
> +   else if (mi_delta & 0xff)
> + asm_fprintf (file, "\tadds\tr3, #%d\n", mi_delta & 0xff);
> + }
> +   else
> + {
> +   fputs ("\tldr\tr3, ", file);
> +   assemble_name (file, label);
> +   fputs ("+4\n", file);
> + }
> asm_fprintf (file, "\t%ss\t%r, %r, r3\n",
>  mi_op, this_regno, this_regno);
>   }
> @@ -28380,30 +28414,37 @@ arm_thumb1_mi_thunk (FILE *file, tree, 
> HOST_WIDE_INT delta,
>   fputs ("\tpop\t{r3}\n", file);
>  
>fprintf (file, "\tbx\tr12\n");
> -  ASM_OUTPUT_ALIGN (file, 2);
> -  assemble_name (file, label);
> -  fputs (":\n", file);
> -  if (flag_pic)
> +
> +  /* With -mpure-code, we don't need to emit literals for the
> +  function address and delta since we emitted code to build
> +  them.  */
> +  if (!target_pure_code)
>   {
> -   /* Output ".word .LTHUNKn-[3,7]-.LTHUNKPCn".  */
> -   rtx tem = XEXP (DECL_RTL (function), 0);
> -   /* For TARGET_THUMB1_ONLY the thunk is in Thumb mode, so the PC
> -  pipeline offset is four rather than eight.  Adjust the offset
> -  accordingly.  */
> -   tem = plus_constant (GET_MODE (tem), tem,
> -TARGET_THUMB1_ONLY ? -3 : -7);
> -   tem = gen_rtx_MINUS (GET_MODE (tem),
> -tem,
> -gen_rtx_SYMBOL_REF (Pmode,
> -ggc_strdup (labelpc)));
> -   assemble_integer (tem, 4, BITS_PER_WORD, 1);
> - }
> -  else
> - /* Output ".word .LTHUNKn".  */
> - assemble_integer (XEXP (DECL_RTL (function), 0), 4, BITS_PER_WORD, 1);
> 

Re: [patch] convert -Wrestrict pass to ranger

2020-10-06 Thread Aldy Hernandez via Gcc-patches




On 10/6/20 4:51 PM, Martin Sebor wrote:

On 10/6/20 8:42 AM, Andrew MacLeod wrote:

On 10/6/20 10:30 AM, Martin Sebor wrote:

On 10/6/20 3:45 AM, Aldy Hernandez wrote:

-  builtin_memref dstref (dst, dstsize);
-  builtin_memref srcref (src, srcsize);
+  builtin_memref dstref (query, call, dst, dstsize);
+  builtin_memref srcref (query, call, src, srcsize);

    /* Create a descriptor of the access.  This may adjust both DSTREF
       and SRCREF based on one another and the kind of the access.  */

-  builtin_access acs (call, dstref, srcref);
+  builtin_access acs (query, call, dstref, srcref);


Since/if the query pointer is a member of builtin_memref which is
passed to the builtin_access ctor there should be no need to pass
a second (and third) copy to it as well.


builtin_memref seems like an independent object altogether, and the 
query is a private member of said object.  Are you proposing making 
it public, or making builtin_access a friend of builtin_memref (eeech)?


Either one of those seems preferable to the duplication for the time
being, until there's an API to access the global ranger instance.

A better alternative, in view of your expectation of exposing
the instance via (cfun)->range_of_expr(), is to add some static
namespace scope function to access the range instance.  That
should make adopting the envisioned solution minimally disruptive.
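
A minimal sketch of what such an accessor might look like (the names
here, including the query type, are hypothetical, not an existing API):

  /* Hypothetical sketch: a single pass-local access point for the range
     query, easy to retarget once a global instance is exposed.  */
  static range_query *wrestrict_query;

  static range_query *
  get_wrestrict_range_query ()
  {
    return wrestrict_query;
  }

The classes would then call the accessor instead of each carrying their
own copy of the pointer.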

The point was we don't have a fully envisioned solution yet... that is 
just one possibility and may never come to pass.   Each pass should do 
"the right thing" for themselves for now.


Yes, I got that.  Which is why I suggest to add a namespace scope
function to the restrict pass that can then be easily replaced with
whatever solution we ultimately end up with.

What's certain (in my mind anyway) is that storing a pointer to some
global (or per-pass) range instance as a member in each class that
needs to access it is not the solution we want long term.


Tell you what.  I'll make your class public, access its internal
members as you describe (ughh), and you can do anything else post-commit.


Aldy



RE: [PATCH v2] arm: [MVE] Remove illegal intrinsics

2020-10-06 Thread Kyrylo Tkachov via Gcc-patches
With gcc-patches on too.
Not sure why the reply-all function fails for your address
Kyrill

> -Original Message-
> From: Kyrylo Tkachov
> Sent: 06 October 2020 17:13
> To: Christophe Lyon 
> Subject: RE: [PATCH v2] arm: [MVE] Remove illegal intrinsics
> 
> 
> 
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Christophe Lyon via Gcc-patches
> > Sent: 06 October 2020 16:59
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH v2] arm: [MVE] Remove illegal intrinsics
> >
> > A few MVE intrinsics had an unsigned variant implemented even though it
> > is not supported by the hardware.  This patch removes them:
> > __arm_vqrdmlashq_n_u8
> > __arm_vqrdmlahq_n_u8
> > __arm_vqdmlahq_n_u8
> > __arm_vqrdmlashq_n_u16
> > __arm_vqrdmlahq_n_u16
> > __arm_vqdmlahq_n_u16
> > __arm_vqrdmlashq_n_u32
> > __arm_vqrdmlahq_n_u32
> > __arm_vqdmlahq_n_u32
> > __arm_vmlaldavaxq_p_u32
> > __arm_vmlaldavaxq_p_u16
> >
> > v2: rebased after Srinath's reorganization patch
> 
> Ok.
> Thanks,
> Kyrill
> 
> >
> > 2020-10-06  Christophe Lyon  
> >
> > gcc/
> > PR target/96914
> > * config/arm/arm_mve.h (vqrdmlashq_n_u8, vqrdmlashq_n_u16)
> > (vqrdmlashq_n_u32, vqrdmlahq_n_u8, vqrdmlahq_n_u16)
> > (vqrdmlahq_n_u32, vqdmlahq_n_u8, vqdmlahq_n_u16,
> > vqdmlahq_n_u32)
> > (vmlaldavaxq_p_u16, vmlaldavaxq_p_u32): Remove.
> > * config/arm/arm_mve_builtins.def (vqrdmlashq_n_u,
> > vqrdmlahq_n_u)
> > (vqdmlahq_n_u, vmlaldavaxq_p_u): Remove.
> > * config/arm/unspecs.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U)
> > (VQRDMLASHQ_N_U)
> > (VMLALDAVAXQ_P_U): Remove unspecs.
> > * config/arm/iterators.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U)
> > (VQRDMLASHQ_N_U, VMLALDAVAXQ_P_U): Remove attributes.
> > (VQDMLAHQ_N, VQRDMLAHQ_N, VQRDMLASHQ_N,
> > VMLALDAVAXQ_P): Remove
> > unsigned variants from iterators.
> > * config/arm/mve.md (mve_vqdmlahq_n_<supf><mode>)
> > (mve_vqrdmlahq_n_<supf><mode>)
> > (mve_vqrdmlashq_n_<supf><mode>,
> > mve_vmlaldavaxq_p_<supf><mode>):
> > Update comment.
> >
> > gcc/testsuite/
> > PR target/96914
> > * gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c: Remove.
> > * gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c: Remove.
> > ---
> >  gcc/config/arm/arm_mve.h   | 199 
> > +
> >  gcc/config/arm/arm_mve_builtins.def|   4 -
> >  gcc/config/arm/iterators.md|  16 +-
> >  gcc/config/arm/mve.md  |   8 +-
> >  gcc/config/arm/unspecs.md  |   4 -
> >  .../arm/mve/intrinsics/vmlaldavaxq_p_u16.c |  21 ---
> >  .../arm/mve/intrinsics/vmlaldavaxq_p_u32.c |  21 ---
> >  .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c |  21 ---
> >  .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c |  21 ---
> >  .../gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c  |  21 ---
> >  .../arm/mve/intrinsics/vqrdmlahq_n_u16.c   |  21 ---
> >  .../arm/mve/intrinsics/vqrdmlahq_n_u32.c   |  21 ---
> >  .../gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c |  21 ---
> >  .../arm/mve/intrinsics/vqrdmlashq_n_u16.c  |  21 ---
> >  .../arm/mve/intrinsics/vqrdmlashq_n_u32.c  |  21 ---
> >  .../arm/mve/intrinsics/vqrdmlashq_n_u8.c   |  21 ---
> >  16 files changed, 19 insertions(+), 443 deletions(-)
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c
> >  delete mode 100644
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c
> >
> > diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> > ind

[PATCH][openacc, libgomp, testsuite] Xfail declare-5.f90

2020-10-06 Thread Tom de Vries
Hi,

We're currently running into:
...
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os  execution test
...

A PR was filed for this: PR92790 - "[OpenACC] declare device_resident -
Fortran common blocks not handled / libgomp.oacc-fortran/declare-5.f90 fails"

Xfail the fails.

Tested on x86_64-linux with nvptx accelerator.

OK for trunk?

Thanks,
- Tom

[openacc, libgomp, testsuite] Xfail declare-5.f90

libgomp/ChangeLog:

2020-10-06  Tom de Vries  

* testsuite/libgomp.oacc-fortran/declare-5.f90: Add xfail for PR92790.

---
 libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
index 2fd25d611a9..ab434f7f127 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-xfail-run-if "PR92790 - acc declare device_resident - Fortran common blocks not handled" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_host=1" } }
 
 module vars
   implicit none


[committed] libstdc++: Inline std::exception_ptr members [PR 90295]

2020-10-06 Thread Jonathan Wakely via Gcc-patches
This inlines most members of std::exception_ptr so that all operations
on a null exception_ptr can be optimized away. This benefits code like
std::future and coroutines where an exception_ptr object is present to
cope with exceptional cases, but is usually not used and remains null.

Since those functions were previously non-inline we have to continue to
export them from the library, for objects that were compiled against the
old headers and expect to find definitions in the library.

In order to inline the copy constructor and destructor we need to export
the _M_addref() and _M_release() members that increment/decrement the
reference count when copying/destroying a non-null exception_ptr. The
copy ctor and dtor check for null and don't call _M_addref and
_M_release unless they need to. The checks for null pointers in
_M_addref and _M_release are still needed because old code might call
them without checking for null first. But we can use __builtin_expect to
predict that they are usually called for the non-null case.
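
In outline, the pattern is roughly this (a simplified sketch of what the
description above implies, not the exact header text):

  class exception_ptr
  {
    void* _M_exception_object = nullptr;

    void _M_addref() noexcept;    // still exported from the library
    void _M_release() noexcept;   // still exported from the library

  public:
    exception_ptr() noexcept { }  // now inline and trivial

    exception_ptr(const exception_ptr& other) noexcept
    : _M_exception_object(other._M_exception_object)
    { if (_M_exception_object) _M_addref(); }   // no call when null

    ~exception_ptr() noexcept
    { if (_M_exception_object) _M_release(); }  // no call when null
  };

For a null exception_ptr the inline paths contain no function calls at
all, so an exception_ptr that stays null can be optimized away entirely.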

libstdc++-v3/ChangeLog:

PR libstdc++/90295
* config/abi/pre/gnu.ver (CXXABI_1.3.13): New symbol version.
(exception_ptr::_M_addref(), exception_ptr::_M_release()):
Export symbols.
* libsupc++/eh_ptr.cc (exception_ptr::exception_ptr()):
Remove out-of-line definition.
(exception_ptr::exception_ptr(const exception_ptr&)):
Likewise.
(exception_ptr::~exception_ptr()): Likewise.
(exception_ptr::operator=(const exception_ptr&)):
Likewise.
(exception_ptr::swap(exception_ptr&)): Likewise.
(exception_ptr::_M_addref()): Add branch prediction.
* libsupc++/exception_ptr.h (exception_ptr::operator bool):
Add noexcept.
[!_GLIBCXX_EH_PTR_COMPAT] (operator==, operator!=): Define
inline as hidden friends. Remove declarations at namespace
scope.
(exception_ptr::exception_ptr()): Define inline.
(exception_ptr::exception_ptr(const exception_ptr&)):
Likewise.
(exception_ptr::~exception_ptr()): Likewise.
(exception_ptr::operator=(const exception_ptr&)):
Likewise.
(exception_ptr::swap(exception_ptr&)): Likewise.
* testsuite/util/testsuite_abi.cc: Add CXXABI_1.3.13.
* testsuite/18_support/exception_ptr/90295.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 1352ea192513e9a45808b8034df62b9434c674a7
Author: Jonathan Wakely 
Date:   Tue Oct 6 16:55:06 2020

libstdc++: Inline std::exception_ptr members [PR 90295]


Re: [PATCH][openacc, libgomp, testsuite] Xfail declare-5.f90

2020-10-06 Thread Tobias Burnus

Hi Tom,

On 10/6/20 6:20 PM, Tom de Vries wrote:

FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test

A PR was filed for this: PR92790 - "[OpenACC] declare device_resident -
Fortran common blocks not handled / libgomp.oacc-fortran/declare-5.f90 fails"

Xfail the fails.

Tested on x86_64-linux with nvptx accelerator.
OK for trunk?


OK. I had hoped that it could be fixed soonish – as this obviously
didn't work out, XFAIL is the right solution.

Tobias



[openacc, libgomp, testsuite] Xfail declare-5.f90

libgomp/ChangeLog:

2020-10-06  Tom de Vries  

  * testsuite/libgomp.oacc-fortran/declare-5.f90: Add xfail for PR92790.

---
  libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 | 1 +
  1 file changed, 1 insertion(+)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
index 2fd25d611a9..ab434f7f127 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
@@ -1,4 +1,5 @@
  ! { dg-do run }
+! { dg-xfail-run-if "PR92790 - acc declare device_resident - Fortran common blocks not handled" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_host=1" } }

  module vars
implicit none



Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Martin Sebor via Gcc-patches

On 10/6/20 1:52 AM, Jakub Jelinek via Gcc-patches wrote:

On Tue, Oct 06, 2020 at 09:37:21AM +0200, Aldy Hernandez via Gcc-patches wrote:

Pushed as obvious.

gcc/ChangeLog:

* value-range.h (irange_allocator::allocate): Increase
newir storage by one.
---
  gcc/value-range.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/value-range.h b/gcc/value-range.h
index 94b48e55e77..7031a823138 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -670,7 +670,7 @@ irange_allocator::allocate (unsigned num_pairs)

struct newir {
  irange range;
-tree mem[1];
+tree mem[2];
};
size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);


So, we essentially want a flexible array member, which C++ without
extensions doesn't have, and thus need to rely on the compiler handling
the trailing array as a poor man's flexible array member (again, GCC
does so for any size, but I'm not 100% sure about other compilers, which
might e.g. only handle a size of 1 that way).


The manual documents the [0] extension and mentions but discourages
using [1].  Nothing is said about other sizes and the warnings such
as -Warray-bounds have been increasingly complaining about accesses
past the declared constant bound (it won't complain about past-
the-end accesses to a mem[1], but will about those to mem[2]).

It would be nice if existing GCC code could eventually be converted
to avoid relying on the [1] hack.  I would hope we would avoid making
use of it in new code (and certainly avoid extending its uses to other
sizes).
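
For reference, the idioms under discussion look like this (illustrative
only):

  struct fam  { unsigned n; int data[];  };  // C99 FAM, not standard C++
  struct gnu0 { unsigned n; int data[0]; };  // documented GCC extension
  struct one  { unsigned n; int data[1]; };  // legacy "poor man's" FAM;
                                             // past-the-end access tolerated
  struct two  { unsigned n; int data[2]; };  // access to data[2] and beyond
                                             // triggers -Warray-bounds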

If it's difficult to write efficient C++ code without relying on
these hacks we are in the perfect position to propose a solution
to C++.  Otherwise, if a portable solution already exists, we
should be able to adopt it.

Martin


Is there any reason why the code is written that way?
I mean, we could just use:
   size_t nbytes = sizeof (irange) + sizeof (tree) * 2 * num_pairs;
   irange *r = (irange *) obstack_alloc (&m_obstack, nbytes);
   return new (r) irange ((tree *) (r + 1), num_pairs);
without any new type.

Jakub





Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-10-06 Thread Dennis Zhang via Gcc-patches
Hi all,

On 8/17/20 6:41 PM, Dennis Zhang wrote:
> 
> Hi all,
> 
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub<mode>3 optab. The sub<mode>3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
> 
> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
> 
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
> 
> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
> 
> Is it OK for trunk please?
> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-08-10  Dennis Zhang  
> 
>   * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
>   * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
>   (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
>   (TARGET_NEON_MVE_HFP): Likewise.
>   * config/arm/iterators.md (VSEL): New mode iterator to select modes
>   for corresponding targets.
>   * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
>   using expression 'minus'.
>   (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
>   * config/arm/neon.md (sub<mode>3): Removed here. Integrated in the
>   sub<mode>3 in vec-common.md
>   * config/arm/vec-common.md (sub<mode>3): Enable MVE target. Use VSEL
>   to select available modes. Exclude TARGET_NEON_FP16INST from
>   TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is
>   originally in neon.md.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-08-10  Dennis Zhang  
> 
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>   option -fno-ipa-icf and change the instruction count from 8 to 16.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
>   * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
>   * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
>   * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
> 

This patch is updated based on Richard Sandiford's patch adding new 
vector mode macros: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
The old version of this patch is at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
And a less related part in the old version is separated into another 
patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html

This patch enables MVE vsub instructions for auto-vectorization.
It adds insns for MVE vsub instructions using 'minus' instead of unspec
expressions to make the instructions recognizable for auto-vectorization.
The sub<mode>3 expander in vec-common.md is modified to use the new mode
macros, which make the expander available when the corresponding modes
are supported, so the various targets can share it for vectorization.
The redundant sub<mode>3 insns in neon.md are then removed.
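
With this in place, a loop like the following sketch (along the lines of
the new mve-vsub_1.c test, whose contents are not shown here) should
vectorize to an MVE vsub:

  #include <stdint.h>

  /* Expected to become vsub.i32 with -O3 and MVE enabled.  */
  void test_vsub_i32 (int32_t * dest, int32_t * a, int32_t * b) {
    int i;
    for (i = 0; i < 4; i++) {
      dest[i] = a[i] - b[i];
    }
  }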

Regression tested on arm-none-eabi and bootstraped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
using expression 'minus'.
(mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
* config/arm/neon.md (*sub<mode>3_neon): Use the new mode macros
ARM_HAVE_<MODE>_ARITH.
(sub<mode>3, sub<mode>3_fp16): Removed.
(neon_vsub<mode>): Use gen_sub<mode>3 instead of gen_sub<mode>3_fp16.
* config/arm/vec-common.md (sub<mode>3): Use the new mode macros
ARM_HAVE_<MODE>_ARITH.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vsub_1.c: New test.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..7853b642262 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2574,6 +2574,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vsubq"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(minus:MVE_2 (match_oper

[PUSHED] Ranger classes.

2020-10-06 Thread Andrew MacLeod via Gcc-patches
I have checked in the ranger classes/files.    They are being built but 
not being invoked until the other passes are checked in.


there are 8 new files:

gimple-range-cache.{h,cc} :   Various caches used by the ranger.
gimple-range-edge.{h,cc} :    Outgoing edge range calculations, 
particularly switch edge ranges.
gimple-range-gori.{h,cc} :     "Generate Outgoing Range Info" module 
which calculates ranges on exit to basic blocks.
gimple-range.{h,cc} :         gimple_ranger which pulls together the 
other components and provides on-demand ranges.


and the Makefile.

the patches are the same as in the previous post last week.  New 
streamlined ChangeLog :-)


I'll check in the hybrid EVRP next and finally a few testcase changes.

Andrew

2020-10-06  Andrew MacLeod  

    * Makefile.in (OBJS): Add gimple-range*.o.
    * gimple-range.h: New file.
    * gimple-range.cc: New file.
    * gimple-range-cache.h: New file.
    * gimple-range-cache.cc: New file.
    * gimple-range-edge.h: New file.
    * gimple-range-edge.cc: New file.
    * gimple-range-gori.h: New file.
    * gimple-range-gori.cc: New file.



[PATCH][Arm] Auto-vectorization for MVE: vmul

2020-10-06 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch enables MVE vmul instructions for auto-vectorization.
It includes MVE in the mul<mode>3 expander to enable vectorization for MVE
and modifies the related vmul insns to support the expander by using 'mult'
instead of unspec.
The mul<mode>3 for vectorization in vec-common.md uses mode iterator
VDQWH instead of VALLW to cover all supported modes.
The macros ARM_HAVE_<MODE>_ARITH are used to select supported modes for
different targets. The redundant mul<mode>3 in neon.md is removed.

Regression tested on arm-none-eabi and bootstraped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vmulq<mode>): New entry for vmul instruction
using expression 'mult'.
(mve_vmulq_f<mode>): Use mult instead of VMULQ_F.
* config/arm/neon.md (mul<mode>3): Removed.
* config/arm/vec-common.md (mul<mode>3): Use the new mode macros
ARM_HAVE_<MODE>_ARITH. Use mode iterator VDQWH instead of VALLW.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vmul_1.c: New test.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..5b2b609174c 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2199,6 +2199,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vmulq"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(mult:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmul.i%#\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
 ;;
 ;; [vornq_u, vornq_s])
 ;;
@@ -3210,9 +3221,8 @@
 (define_insn "mve_vmulq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		   (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMULQ_F))
+	(mult:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vmul.f%#	%q0, %q1, %q2"
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 96bf277f501..f6632f1a25a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1899,17 +1899,6 @@
 (const_string "neon_mul_<V_elem_ch><q>")))]
 )
 
-(define_insn "mul3"
- [(set
-   (match_operand:VH 0 "s_register_operand" "=w")
-   (mult:VH
-(match_operand:VH 1 "s_register_operand" "w")
-(match_operand:VH 2 "s_register_operand" "w")))]
-  "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
-  "vmul.f16\t%0, %1, %2"
- [(set_attr "type" "neon_mul_")]
-)
-
 (define_insn "neon_vmulf"
  [(set
(match_operand:VH 0 "s_register_operand" "=w")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..45db60e7411 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -101,14 +101,11 @@
 })
 
 (define_expand "mul3"
-  [(set (match_operand:VALLW 0 "s_register_operand")
-(mult:VALLW (match_operand:VALLW 1 "s_register_operand")
-		(match_operand:VALLW 2 "s_register_operand")))]
-  "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode)
-		|| flag_unsafe_math_optimizations))
-   || (mode == V4HImode && TARGET_REALLY_IWMMXT)"
-{
-})
+  [(set (match_operand:VDQWH 0 "s_register_operand")
+	(mult:VDQWH (match_operand:VDQWH 1 "s_register_operand")
+		(match_operand:VDQWH 2 "s_register_operand")))]
+  "ARM_HAVE__ARITH"
+)
 
 (define_expand "smin3"
   [(set (match_operand:VALLW 0 "s_register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c
new file mode 100644
index 000..514f292c15e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+void test_vmul_i32 (int32_t * dest, int32_t * a, int32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vmul_i16 (int16_t * dest, int16_t * a, int16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vmul_i8 (int8_t * dest, int8_t * a, int8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+dest[i] = 

error: ‘EVRP_MODE_DEBUG’ was not declared – was: [PUSHED] Ranger classes.

2020-10-06 Thread Tobias Burnus

Build fails here now with: gimple-range.h:168:59: error:
‘EVRP_MODE_DEBUG’ was not declared in this scope

Tobias

On 10/6/20 6:49 PM, Andrew MacLeod via Gcc-patches wrote:

I have checked in the ranger classes/files.They are being built
but not being invoked until the other passes are checked in.

there are 8 new files:

gimple-range-cache.{h,cc} :   Various caches used by the ranger.
gimple-range-edge.{h,cc} :Outgoing edge range calculations,
particularly switch edge ranges.
gimple-range-gori.{h,cc} : "Generate Outgoing Range Info" module
which calculates ranges on exit to basic blocks.
gimple-range.{h,cc} : gimple_ranger which pulls together
the other components and provides on-demand ranges.

and the Makefile.

the patches are the same as in the previous post last week.  New
streamlined ChangeLog :-)

I'll check in the hybrid EVRP next and finally a few testcase changes.

Andrew

2020-10-06  Andrew MacLeod  

* Makefile.in (OBJS): Add gimple-range*.o.
* gimple-range.h: New file.
* gimple-range.cc: New file.
* gimple-range-cache.h: New file.
* gimple-range-cache.cc: New file.
* gimple-range-edge.h: New file.
* gimple-range-edge.cc: New file.
* gimple-range-gori.h: New file.
* gimple-range-gori.cc: New file.




[PATCH][Arm] Auto-vectorization for MVE: vmin/vmax

2020-10-06 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch enables MVE vmin/vmax instructions for auto-vectorization.
MVE target is included in the expanders smin<mode>3, umin<mode>3, smax<mode>3
and umax<mode>3 for vectorization.
Related insns for vmin/vmax in mve.md are modified to use smin, umin, 
smax and umax expressions instead of unspec to support the expanders.
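
As with vsub and vmul, the effect is that loops like the following sketch
(illustrative; the new mve-vminmax_1.c test itself is not shown here)
vectorize to vmax/vmin:

  #include <stdint.h>

  /* Expected to become vmax.s32 with -O3 and MVE enabled.  */
  void test_vmax_i32 (int32_t * dest, int32_t * a, int32_t * b) {
    int i;
    for (i = 0; i < 4; i++) {
      dest[i] = a[i] > b[i] ? a[i] : b[i];
    }
  }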

Regression tested on arm-none-eabi and bootstraped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vmaxq_<supf><mode>): Replace with ...
(mve_vmaxq_s<mode>, mve_vmaxq_u<mode>): ... these new insns to
use smax/umax instead of VMAXQ.
(mve_vminq_<supf><mode>): Replace with ...
(mve_vminq_s<mode>, mve_vminq_u<mode>): ... these new insns to
use smin/umin instead of VMINQ.
(mve_vmaxnmq_f<mode>): Use smax instead of VMAXNMQ_F.
(mve_vminnmq_f<mode>): Use smin instead of VMINNMQ_F.
* config/arm/vec-common.md (smin<mode>3): Use the new mode macros
ARM_HAVE_<MODE>_ARITH.
(umin<mode>3, smax<mode>3, umax<mode>3): Likewise.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vminmax_1.c: New test.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..0d9f932e983 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1977,15 +1977,25 @@
 ;;
 ;; [vmaxq_u, vmaxq_s])
 ;;
-(define_insn "mve_vmaxq_"
+(define_insn "mve_vmaxq_s"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		   (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VMAXQ))
+	(smax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmax.%#\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+(define_insn "mve_vmaxq_u"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(umax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmax.%#\t%q0, %q1, %q2"
+  "vmax.%#\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2037,15 +2047,25 @@
 ;;
 ;; [vminq_s, vminq_u])
 ;;
-(define_insn "mve_vminq_"
+(define_insn "mve_vminq_s"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		   (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VMINQ))
+	(smin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmin.%#\t%q0, %q1, %q2"
+  "vmin.%#\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+(define_insn "mve_vminq_u"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(umin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmin.%#\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -3030,9 +3050,8 @@
 (define_insn "mve_vmaxnmq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		   (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMAXNMQ_F))
+	(smax:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vmaxnm.f%#	%q0, %q1, %q2"
@@ -3090,9 +3109,8 @@
 (define_insn "mve_vminnmq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		   (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMINNMQ_F))
+	(smin:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vminnm.f%#	%q0, %q1, %q2"
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..6a330cc82f6 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -114,39 +114,29 @@
   [(set (match_operand:VALLW 0 "s_register_operand")
 	(smin:VALLW (match_operand:VALLW 1 "s_register_operand")
 		(match_operand:VALLW 2 "s_register_operand")))]
-  "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode)
-		|| flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))"
-{
-})
+   "ARM_HAVE__ARITH"
+)
 
 (define_expand "umin3"
   [(set (match_operand:VINTW 0 "s_register_operand")
 	(umin:VINTW (match_operand:VINTW 1 "s_register_operand")
 		(match_operand:VINTW 2 "s_register_operand")))]
-  "TARGET_NEON
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-})
+   "ARM_HAVE__ARITH"
+)
 
 (define_expand "smax3"
   [(set (match_operand:VALLW 0 "s_register_operand")
 	(smax:VALLW (match_operand:VALLW 1 "s_register_operand")
 		(match_operand:VALLW 2 "s_register_operand")))]
-  "(TARGET

[PUSHED] Hybrid EVRP and testcases

2020-10-06 Thread Andrew MacLeod via Gcc-patches

I have now checked in the hybrid EVRP pass.

We have resolved all the issues we are aware of with a full Fedora build,
but if any more issues arise, please let us know. (And I'm sure you will :-)


I made some minor tweaks.  The options to the new -fevrp-mode flag are now:

legacy        : classic EVRP mode
ranger        : ranger-only mode
*legacy-first : query ranges with EVRP first, and if that fails try the ranger*

ranger-first  : query the ranger first, then EVRP
ranger-trace  : ranger-only mode, plus show range tracing info in the dump
ranger-debug  : ranger-only mode, and also include all cache debugging info
trace         : hybrid mode with range tracing info
debug         : hybrid mode with cache debugging as well as tracing

The default is still *legacy-first*.


If there is a compilation problem, and the problem goes away with    
-fevrp-mode=legacy

the ranger is to blame, and let us know asap.
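
For example (only the flag names listed above are involved):

  gcc -O2 -fevrp-mode=legacy test.c    # fall back to classic EVRP only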


Attached is the patch which was applied.

---

These are the initial test case differences, and the detailed analysis 
is below:


1)  We now XPASS analyzer/pr94851-1.c.
2) -fdisable-tree-evrp added to gcc.dg/pr81192.c
3) -fdisable-tree-evrp added to gcc.dg/tree-ssa/pr77445-2.c
4) -fdisable-tree-evrp added to tree-ssa/ssa-dom-thread-6.c
5) -fdisable-tree-evrp added to tree-ssa/ssa-dom-thread-7.c


Mostly it is one of two things:

1) we propagate a constant into a PHI where that wasn't happening before;
EVRP didn't handle anything other than single-entry blocks well.
2) switches are processed in a lot more detail, which again propagates a
lot of values into PHIs, and that then triggers more threading.


Last minute update... It also turns out the analyzer xpass may be noise.
We did change the IL, but it sounds like there are other analyzer things
at play at the moment :-)



1)  We now XPASS analyzer/pr94851-1.c.


It was xfailing with:
In function ‘pamark’:
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:43:13: 
warning: leak of ‘p’ [CWE-401] [-Wanalyzer-malloc-leak]
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:29:6: note: 
(1) following ‘false’ branch (when ‘p’ is NULL)...
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:32:23: note: 
(2) ...to here
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:32:23: note: 
(3) allocated here
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:32:8: note: 
(4) assuming ‘p’ is non-NULL
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:32:8: note: 
(5) following ‘false’ branch (when ‘p’ is non-NULL)...
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:35:15: note: 
(6) ...to here
/gcc/master/gcc/gcc/testsuite/gcc.dg/analyzer/pr94851-1.c:43:13: note: 
(7) ‘p’ leaks here; was allocated at (3)


now we produce:
XPASS: gcc.dg/analyzer/pr94851-1.c bogus leak (test for bogus messages, 
line 43)



The reason is in the IL:
  :
  # p_9 = PHI 
  # last_11 = PHI 
  if (p_9 != 0B)
    goto ; [INV]
  else
    goto ; [INV]  --> This outgoing edge

   :
  _3 = p_9->m_name;
  _4 = (char) _32;
  if (_3 != _4)
    goto ; [INV]
  else
    goto ; [INV]

   :
  # p_2 = PHI  <<<   This PHI node
  # last_17 = PHI 
  if (p_2 != 0B)
    goto ; [INV]
  else
    goto ; [INV]

   :
  printf ("over writing mark %c\n", _32);
  goto ; [INV]


The ranger propagates the p_9 == 0 from the else branch into the PHI 
argument on edge 4->6

  :
  # p_2 = PHI <0B(4), p_9(5)>

which lets the threaders bypass the printf in bb7 on one path, and
that seems to resolve the current issue.


The IL produced by the time we get to .optimized is identical; we just
clean it up early enough for the analyzer to use now.


---

2) -fdisable-tree-evrp added to gcc.dg/pr81192.c to enable the test to pass

new version of evrp sees
 :
  if (j_8(D) != 2147483647)
    goto ; [50.00%]
  else
    goto ; [50.00%]
 :
  iftmp.2_11 = j_8(D) + 1;
 :
  # iftmp.2_12 = PHI 

EVRP now recognizes a constant can be propagated into the 3->5 edge and
produces
  # iftmp.2_12 = PHI <2147483647(3), iftmp.2_11(4)>
which causes the situation being tested to disappear before we get to
PRE.


---

3) -fdisable-tree-evrp added to gcc.dg/tree-ssa/pr77445-2.c

Aldy investigated this, and basically we are threading 6 more paths on
x86_64, which changes the IL in visible ways.

Disabling evrp allows the threaders to test what they are looking for.

-

4) and 5)

Along the same vein, we are threading new opportunities in PHIs... these

Re: error: ‘EVRP_MODE_DEBUG’ was not declared – was: [PUSHED] Ranger classes.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 12:56 PM, Tobias Burnus wrote:

Build fails here now with: gimple-range.h:168:59: error:
‘EVRP_MODE_DEBUG’ was not declared in this scope

Tobias

On 10/6/20 6:49 PM, Andrew MacLeod via Gcc-patches wrote:

I have checked in the ranger classes/files.    They are being built
but not being invoked until the other passes are checked in.

there are 8 new files:

gimple-range-cache.{h,cc} :   Various caches used by the ranger.
gimple-range-edge.{h,cc} :    Outgoing edge range calculations,
particularly switch edge ranges.
gimple-range-gori.{h,cc} : "Generate Outgoing Range Info" module
which calculates ranges on exit to basic blocks.
gimple-range.{h,cc} : gimple_ranger which pulls together
the other components and provides on-demand ranges.

and the Makefile.

the patches are the same as in the previous post last week.  New
streamlined ChangeLog :-)

I'll check in the hybrid EVRP next and finally a few testcase changes.

Andrew

2020-10-06  Andrew MacLeod  

    * Makefile.in (OBJS): Add gimple-range*.o.
    * gimple-range.h: New file.
    * gimple-range.cc: New file.
    * gimple-range-cache.h: New file.
    * gimple-range-cache.cc: New file.
    * gimple-range-edge.h: New file.
    * gimple-range-edge.cc: New file.
    * gimple-range-gori.h: New file.
    * gimple-range-gori.cc: New file.





Dang.  The latest checkin fixes that.

D'oh.






Re: error: ‘EVRP_MODE_DEBUG’ was not declared – was: [PUSHED] Ranger classes.

2020-10-06 Thread Tobias Burnus

On 10/6/20 6:56 PM, Tobias Burnus wrote:

Build fails here now with: gimple-range.h:168:59: error:
‘EVRP_MODE_DEBUG’ was not declared in this scope


And now builds – as the "Hybrid EVRP and testcases" was pushed as well,
a bit more than a quarter of an hour later. (At least it finished
building the compiler itself, I do not expect surprises in the library
parts.)

Tobias


On 10/6/20 6:49 PM, Andrew MacLeod via Gcc-patches wrote:

I have checked in the ranger classes/files.They are being built
but not being invoked until the other passes are checked in.

there are 8 new files:

gimple-range-cache.{h,cc} :   Various caches used by the ranger.
gimple-range-edge.{h,cc} :Outgoing edge range calculations,
particularly switch edge ranges.
gimple-range-gori.{h,cc} : "Generate Outgoing Range Info" module
which calculates ranges on exit to basic blocks.
gimple-range.{h,cc} : gimple_ranger which pulls together
the other components and provides on-demand ranges.

and the Makefile.

the patches are the same as in the previous post last week.  New
streamlined ChangeLog :-)

I'll check in the hybrid EVRP next and finally a few testcase changes.

Andrew

2020-10-06  Andrew MacLeod  

* Makefile.in (OBJS): Add gimple-range*.o.
* gimple-range.h: New file.
* gimple-range.cc: New file.
* gimple-range-cache.h: New file.
* gimple-range-cache.cc: New file.
* gimple-range-edge.h: New file.
* gimple-range-edge.cc: New file.
* gimple-range-gori.h: New file.
* gimple-range-gori.cc: New file.




Re: error: ‘EVRP_MODE_DEBUG’ was not declared – was: [PUSHED] Ranger classes.

2020-10-06 Thread Andrew MacLeod via Gcc-patches

On 10/6/20 1:10 PM, Tobias Burnus wrote:

On 10/6/20 6:56 PM, Tobias Burnus wrote:

Build fails here now with: gimple-range.h:168:59: error:
‘EVRP_MODE_DEBUG’ was not declared in this scope


And now builds – as the "Hybrid EVRP and testcases" was pushed as well,
a bit more than a quarter of an hour later. (At least it finished
building the compiler itself, I do not expect surprises in the library
parts.)

Tobias
Guess I should have just pushed it all as one commit. I thought the
first part was pretty separate from the second... and it was, except for
one line :-P  Of course I had problems getting the second one out or it
would have followed quicker.


Sorry for the noise.

Andrew



Re: [PUSHED] Fix off-by-one storage problem in irange_allocator.

2020-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 10:42:12AM -0600, Martin Sebor wrote:
> The manual documents the [0] extension and mentions but discourages
> using [1].  Nothing is said about other sizes and the warnings such
> as -Warray-bounds have been increasingly complaining about accesses
> past the declared constant bound (it won't complain about past-
> the-end accesses to a mem[1], but will about those to mem[2]).
> 
> It would be nice if existing GCC code could eventually be converted
> to avoid relying on the [1] hack.  I would hope we would avoid making
> use of it in new code (and certainly avoid extending its uses to other
> sizes).

I don't see how we could, because [0] is an extension and GCC needs to
support host compilers that don't support it, and similarly [] is an
extension in C++ and can't be relied on.
Changing say rtl or gimple structs from the way we define them currently
to say templates dependent on the size of the embedded arrays is I'm
afraid not really possible.

Jakub



Re: [PATCH] lto: fix LTO debug sections copying.

2020-10-06 Thread Ian Lance Taylor via Gcc-patches
On Tue, Oct 6, 2020 at 3:20 AM Martin Liška  wrote:
>
> On 10/6/20 10:00 AM, Richard Biener wrote:
> > On Tue, Oct 6, 2020 at 9:01 AM Martin Liška  wrote:
> >>
> >> On 10/5/20 6:34 PM, Ian Lance Taylor wrote:
> >>> On Mon, Oct 5, 2020 at 9:09 AM Martin Liška  wrote:
> 
>  The previous patch was not correct. This one should be.
> 
>  Ready for master?
> >>>
> >>> I don't understand why this code uses symtab_indices_shndx at all.
> >>> There should only be one SHT_SYMTAB_SHNDX section.  There shouldn't be
> >>> any need for the symtab_indices_shndx vector.
> >>
> >> Well, the question is if we can have multiple .symtab sections in one ELF
> >> file? Theoretically yes, so we should also handle SHT_SYMTAB_SHNDX 
> >> sections.
> >> Note that the original usage of the SHT_SYMTAB_SHNDX section was motivated
> >> by PR81968 which is about Solaris ld.
> >
> > It wasn't my code but I suppose this way the implementation was
> > "easiest".  There
> > should be exactly one symtab / shndx section.  Rainer authored this support.
>
> If we expect at most one SHT_SYMTAB_SHNDX section, then I'm suggesting
> an updated version of the patch. It's what Ian offered.

This is OK with me with one minor change.

> + return "Multiple SYMTAB SECTION INDICES sections";

I think simply "More than one SHT_SYMTAB_SHNDX section".  SYMTAB
SECTION INDICES doesn't mean anything to me, and at least people can
do a web search for SHT_SYMTAB_SHNDX.
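
For illustration, the kind of check implied (a sketch only; the function
and variable names here are assumptions, not the actual patch):

  /* SHT_SYMTAB_SHNDX is 18 per the ELF gABI.  Remember the single
     permitted extended-index section; reject a second one.  */
  static const char *
  record_symtab_shndx (unsigned int sh_type, int i,
		       int *symtab_indices_shndx)
  {
    if (sh_type == SHT_SYMTAB_SHNDX)
      {
	if (*symtab_indices_shndx != 0)
	  return "More than one SHT_SYMTAB_SHNDX section";
	*symtab_indices_shndx = i;
      }
    return NULL;  /* NULL means no error.  */
  }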

Thanks.

Ian

