Re: [PATCH] Don't simplify (A & C) != 0 ? D : 0 for pointer types.

2021-05-17 Thread Richard Biener via Gcc-patches
On Sun, May 16, 2021 at 7:35 PM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> While rewriting part of PHI-OPT to use match-and-simplify,
> I ran into a bug where this pattern in match.pd would hit
> and would produce invalid GIMPLE: a shift of a pointer type.
>
> This just disables this simplification for pointer types similarly
> to what is already done in PHI-OPT for the generic A ? D : 0 case.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> Thanks,
> Andrew Pinski
>
> gcc/ChangeLog:
> * match.pd ((A & C) != 0 ? D : 0): Limit to non pointer types.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/gimplefe-45.c: New testcase.
> ---
>  gcc/match.pd   |  2 +-
>  gcc/testsuite/gcc.dg/gimplefe-45.c | 19 +++
>  2 files changed, 20 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/gimplefe-45.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cdb87636951..10503b97ab5 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4830,7 +4830,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (cond
>(ne (bit_and @0 integer_pow2p@1) integer_zerop)
>INTEGER_CST@2 integer_zerop)
> - (if (integer_pow2p (@2))
> + (if (!POINTER_TYPE_P (type) && integer_pow2p (@2))

Might be more natural to check INTEGRAL_TYPE_P (type) && integer_pow2p (@2)

OK either way.

Richard.

>(with {
>   int shift = (wi::exact_log2 (wi::to_wide (@2))
>   - wi::exact_log2 (wi::to_wide (@1)));
> diff --git a/gcc/testsuite/gcc.dg/gimplefe-45.c b/gcc/testsuite/gcc.dg/gimplefe-45.c
> new file mode 100644
> index 000..b1d3cbb0205
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/gimplefe-45.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fgimple" } */
> +
> +/* This used to ICE when simplifying (A & C) != 0 ? D : 0
> +   for pointer types. */
> +
> +int *__GIMPLE ()
> +p (int n)
> +{
> +  int *_2;
> +  int *_t;
> +  int *_t1;
> +  _t = (int*)8;
> +  _t1 = 0;
> +  n = n & 2;
> +  _2 = n != 0 ? _t : _t1;
> +  return _2;
> +}
> +
> --
> 2.27.0
>
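When C and D are both powers of two, (A & C) != 0 ? D : 0 can be computed without a branch. A & C is either 0 or C, so shifting it by log2 (D) - log2 (C) (the shift computed in the match.pd hunk) gives either 0 or D. The C++ sketch below models that arithmetic on uint32_t values. The helper names are hypothetical, and because the rest of the result expression is not shown in the hunk, this illustrates the idea rather than the pattern's exact output. The shift is also why the pattern must not fire for pointer types: GIMPLE does not allow shifting a pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: log2 of a value known to be a power of two.
static int exact_log2_u32 (uint32_t x)
{
  int n = 0;
  while (x > 1)
    {
      x >>= 1;
      ++n;
    }
  return n;
}

// The original form: (a & c) != 0 ? d : 0, with c and d powers of two.
static uint32_t select_form (uint32_t a, uint32_t c, uint32_t d)
{
  return (a & c) != 0 ? d : 0;
}

// The branch-free form: (a & c) is either 0 or c, so shifting it by
// log2 (d) - log2 (c) yields either 0 or d.
static uint32_t shift_form (uint32_t a, uint32_t c, uint32_t d)
{
  int shift = exact_log2_u32 (d) - exact_log2_u32 (c);
  uint32_t bit = a & c;
  return shift >= 0 ? bit << shift : bit >> -shift;
}
```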


Re: RFA: Add option -fretry-compilation

2021-05-17 Thread Richard Biener via Gcc-patches
On Sun, May 16, 2021 at 8:53 PM Joern Rennecke
 wrote:
>
> For architectures with likely spilled register classes, neither
> register allocator is guaranteed to succeed when using optimization.
> If you have just a few files to compile, you can try by hand which
> compiler options will succeed and still give reasonable code, but for
> large projects, hand-tweaking library / program build rules on a
> file-by-file basis is time intensive and does not scale well across
> different build environments and compiler versions.
>
> The attached patch adds a new option -fretry-compilation that allows
> you to specify a list - or lists - of options to use for a compilation
> retry, which is implemented in the compiler driver.
>
> Bootstrapped on x86_64-pc-linux-gnu.

Eh, no ;)  But funny idea, nevertheless.  Do you run into the issues
with the first scheduling pass disabled?

Richard.
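The patch itself is not included here, so the following is only a minimal C++ sketch of what a driver-side retry could look like, under two assumptions: each retry list is appended to the original options, and retries stop at the first successful compile. compile_with_retries and option_list are hypothetical names. -fno-schedule-insns is a real GCC option that disables the first scheduling pass, which is the experiment Richard asks about.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

typedef std::vector<std::string> option_list;

// Hypothetical driver-side retry loop: try the user's options first, then
// re-run with each fallback list appended, stopping at the first success.
// Returns 0 if the original options worked, I + 1 if fallback I worked,
// or -1 if every attempt failed.
static int
compile_with_retries (const option_list &base,
                      const std::vector<option_list> &fallbacks,
                      const std::function<bool (const option_list &)> &compile)
{
  if (compile (base))
    return 0;
  for (std::size_t i = 0; i < fallbacks.size (); ++i)
    {
      option_list opts = base;
      opts.insert (opts.end (), fallbacks[i].begin (), fallbacks[i].end ());
      if (compile (opts))
        return static_cast<int> (i + 1);
    }
  return -1;
}
```

The compile callback stands in for running the real compiler proper; in a driver it would report whether the subprocess succeeded.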


Re: RFA: Don't squash target character arrays into a narrower host string

2021-05-17 Thread Richard Biener via Gcc-patches
On Sun, May 16, 2021 at 11:12 PM Joern Rennecke
 wrote:
>
> braced_list_to_string creates a host string, so it's not suitable when,
> e.g., the host has 8-bit chars but the target has 16-bit chars.
>
> The attached patch checks if host and target char sizes are different
> and in that case falls back to leaving the array as an array.

The check might be better suited in braced_list_to_string itself (just
in case we get more uses).

OK with that change.

Richard.

> Bootstrapped on x86_64-pc-linux-gnu.
>
> FWIW, we also have patches for cpplib / lexer / parser char and string
> handling to make 8 -> 16 bit char cross-compiling work, but they can't
> be ported forward easily because the parser has changed since gcc9.
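A minimal C++ model of the check, placed inside the conversion function itself as Richard suggests, so that any future caller gets it too. The function names, the std::optional return and the use of CHAR_BIT as the host char width are assumptions for illustration; GCC's braced_list_to_string works on trees, not on vectors of integers.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: only squash a braced list of target chars into a
// host string when a target char has exactly as many bits as a host char.
static bool can_squash_to_host_string (unsigned target_char_bits)
{
  return target_char_bits == CHAR_BIT;
}

// The check lives inside the conversion itself, so every caller gets it.
// Returns nullopt when the list must stay an array.
static std::optional<std::string>
braced_list_to_host_string (const std::vector<std::uint32_t> &elts,
                            unsigned target_char_bits)
{
  if (!can_squash_to_host_string (target_char_bits))
    return std::nullopt;
  std::string s;
  for (std::uint32_t v : elts)
    s.push_back (static_cast<char> (v));
  return s;
}
```

Without the check, a 16-bit target value such as 0x1234 would be silently truncated when stored into an 8-bit host char.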


Re: [PATCH RFA] tree-iterator: C++11 range-for and tree_stmt_iterator

2021-05-17 Thread Richard Biener via Gcc-patches
On Fri, May 14, 2021 at 2:23 AM Martin Sebor via Gcc-patches
 wrote:
>
> On 5/13/21 1:26 PM, Jason Merrill via Gcc-patches wrote:
> > Ping.
> >
> > On 5/1/21 12:29 PM, Jason Merrill wrote:
> >> Like my recent patch to add ovl_range and lkp_range in the C++ front end,
> >> this patch adds the tsi_range adaptor for using C++11 range-based 'for'
> >> with a STATEMENT_LIST, e.g.
> >>
> >>for (tree stmt : tsi_range (stmt_list)) { ... }
> >>
> >> This also involves adding some operators to tree_stmt_iterator that are
> >> needed for range-for iterators, and should also be useful in code that
> >> uses the iterators directly.
> >>
> >> The patch updates the suitable loops in the C++ front end, but does not
> >> touch any loops elsewhere in the compiler.
>
> I like the modernization of the loops.

The only worry I have (and why I stopped looking at range-for) is that
this adds another style of looping over stmts without opening the
possibility to remove another or even unify all of them.  That's because
range-for isn't powerful enough w/o jumping through hoops and/or
we cannot use what apparently ranges<> was intended for (fix
this limitation).

That said, if someone C++-literate could check whether, for example,
what gimple-iterator.h provides can be completely modernized,
then that would be great of course.

There's stuff like reverse iteration, iteration skipping debug stmts,
compares of iterators like gsi_one_before_end_p, etc.

Given my failed tries (but I'm a C++ illiterate) my TODO list now
only contains turning the iterators into STL style ones, thus
gsi_stmt (it) -> *it, gsi_next (&it) -> ++it, etc. - but even
it != end_p looks a bit awkward there.

Richard.
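For reference, the range-for protocol only needs begin (), end (), operator!=, prefix operator++ and operator*, which is why those are the operators the patch adds. The self-contained C++ sketch below mirrors that shape on a mock doubly linked list. stmt_node, stmt_iterator, stmt_range and the int-valued tree typedef are hypothetical stand-ins; the real tree_stmt_iterator also compares its container, and end () is modeled here as a null node pointer. It also includes a postincrement of the kind Martin suggests in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mock stand-ins (hypothetical): a "tree" is just an int here, and a
// statement list is a doubly linked list of nodes.
typedef int tree;

struct stmt_node
{
  stmt_node *prev;
  stmt_node *next;
  tree stmt;
};

// Modeled on the operators the patch adds to tree_stmt_iterator.
struct stmt_iterator
{
  stmt_node *ptr;

  bool operator== (stmt_iterator b) const { return b.ptr == ptr; }
  bool operator!= (stmt_iterator b) const { return !(*this == b); }
  stmt_iterator &operator++ () { ptr = ptr->next; return *this; }
  stmt_iterator operator++ (int) { stmt_iterator tmp = *this; ++*this; return tmp; }
  tree &operator* () { return ptr->stmt; }
};

// Range adaptor in the spirit of tsi_range: begin () is the first node,
// end () is a null position one past the last node.
class stmt_range
{
  stmt_node *head;
public:
  explicit stmt_range (stmt_node *h) : head (h) {}
  stmt_iterator begin () const { return stmt_iterator {head}; }
  stmt_iterator end () const { return stmt_iterator {nullptr}; }
};

// Link a vector of nodes into a list; returns the head (or null).
static stmt_node *link_nodes (std::vector<stmt_node> &nodes)
{
  for (std::size_t i = 0; i < nodes.size (); ++i)
    {
      nodes[i].prev = i ? &nodes[i - 1] : nullptr;
      nodes[i].next = i + 1 < nodes.size () ? &nodes[i + 1] : nullptr;
    }
  return nodes.empty () ? nullptr : &nodes[0];
}

// Usage: the range-for loop replaces explicit tsi_start/tsi_end_p/tsi_next.
static int sum_stmts (stmt_node *head)
{
  int sum = 0;
  for (tree t : stmt_range (head))
    sum += t;
  return sum;
}
```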

> I can't find anything terribly wrong with the iterator but let me
> at least pick on some nits ;)
>
> >>
> >> gcc/ChangeLog:
> >>
> >> * tree-iterator.h (struct tree_stmt_iterator): Add operator++,
> >> operator--, operator*, operator==, and operator!=.
> >> (class tsi_range): New.
> >>
> >> gcc/cp/ChangeLog:
> >>
> >> * constexpr.c (build_data_member_initialization): Use tsi_range.
> >> (build_constexpr_constructor_member_initializers): Likewise.
> >> (constexpr_fn_retval, cxx_eval_statement_list): Likewise.
> >> (potential_constant_expression_1): Likewise.
> >> * coroutines.cc (await_statement_expander): Likewise.
> >> (await_statement_walker): Likewise.
> >> * module.cc (trees_out::core_vals): Likewise.
> >> * pt.c (tsubst_expr): Likewise.
> >> * semantics.c (set_cleanup_locs): Likewise.
> >> ---
> >>   gcc/tree-iterator.h  | 28 +++-
> >>   gcc/cp/constexpr.c   | 42 ++
> >>   gcc/cp/coroutines.cc | 10 --
> >>   gcc/cp/module.cc |  5 ++---
> >>   gcc/cp/pt.c  |  5 ++---
> >>   gcc/cp/semantics.c   |  5 ++---
> >>   6 files changed, 47 insertions(+), 48 deletions(-)
> >>
> >> diff --git a/gcc/tree-iterator.h b/gcc/tree-iterator.h
> >> index 076fff8644c..f57456bb473 100644
> >> --- a/gcc/tree-iterator.h
> >> +++ b/gcc/tree-iterator.h
> >> @@ -1,4 +1,4 @@
> >> -/* Iterator routines for manipulating GENERIC tree statement list.
> >> +/* Iterator routines for manipulating GENERIC tree statement list. -*- C++ -*-
> >>  Copyright (C) 2003-2021 Free Software Foundation, Inc.
> >>  Contributed by Andrew MacLeod  
> >> @@ -32,6 +32,13 @@ along with GCC; see the file COPYING3.  If not see
> >>   struct tree_stmt_iterator {
> >> struct tree_statement_list_node *ptr;
> >> tree container;
>
> I assume the absence of ctors is intentional.  If so, I suggest
> adding a comment explaining why.  Otherwise, I would provide one
> (or as many as needed).
>
> >> +
> >> +  bool operator== (tree_stmt_iterator b) const
> >> +{ return b.ptr == ptr && b.container == container; }
> >> +  bool operator!= (tree_stmt_iterator b) const { return !(*this == b); }
> >> +  tree_stmt_iterator &operator++ () { ptr = ptr->next; return *this; }
> >> +  tree_stmt_iterator &operator-- () { ptr = ptr->prev; return *this; }
>
> I would suggest adding postincrement and postdecrement.
>
> >> +  tree &operator* () { return ptr->stmt; }
>
> Given the pervasive lack of const-safety in GCC and the by-value
> semantics of the iterator, this probably isn't worth it, but maybe
> add a const overload.  operator-> would probably never be used.
>
> >>   };
> >>   static inline tree_stmt_iterator
> >> @@ -71,27 +78,38 @@ tsi_one_before_end_p (tree_stmt_iterator i)
> >>   static inline void
> >>   tsi_next (tree_stmt_iterator *i)
> >>   {
> >> -  i->ptr = i->ptr->next;
> >> +  ++(*i);
> >>   }
> >>   static inline void
> >>   tsi_prev (tree_stmt_iterator *i)
> >>   {
> >> -  i->ptr = i->ptr->prev;
> >> +  --(*i);
> >>   }
> >>   static inline tree *
> >>   tsi_stmt_ptr (tree_stmt_iterator i)
> >>   {
> >> -  return &i.ptr->stmt;
> >> +  return &(*i);
> >>   }
> >>   static inline tree
> >>   tsi_stmt (tree_stmt_iterator i)
> >>   {
> >> -  return i.ptr

Re: [RFC] Run pass_sink_code once more after ivopts/fre

2021-05-17 Thread Richard Biener
On Fri, 14 May 2021, Xionghu Luo wrote:

> Hi Richi,
> 
> On 2021/4/21 19:54, Richard Biener wrote:
> > On Tue, 20 Apr 2021, Xionghu Luo wrote:
> > 
> >>
> >>
> >> On 2021/4/15 19:34, Richard Biener wrote:
> >>> On Thu, 15 Apr 2021, Xionghu Luo wrote:
> >>>
>  Thanks,
> 
>  On 2021/4/14 14:41, Richard Biener wrote:
> >> "#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink,
> >> but it moves #538 first, then #235; there is a strong dependency here. It
> >> seems unlike the LCM framework, which could solve all of them and do the
> >> delete-insert in one iteration.
> > So my question was whether we want to do both within the LCM store
> > sinking framework.  The LCM dataflow is also used by RTL PRE which
> > handles both loads and non-loads so in principle it should be able
> > to handle stores and non-stores for the sinking case (PRE on the
> > reverse CFG).
> >
> > A global dataflow is more powerful than any local ad-hoc method.
> 
>  My biggest concern is whether the LCM DF framework could support sinking
>  *multiple* reverse-dependent non-store instructions together by *one*
>  call of LCM DF.  If this is not supported, we would need to run LCM
>  multiple times until there are no new changes, which would obviously be
>  time-consuming (unless compile time is not important here).
> >>>
> >>> As said it is used for PRE and there it most definitely can do that.
> >>
> >> I did some investigation into PRE and attached a case to show how it
> >> works.  It is quite like store-motion, and there is actually an rtl-hoist
> >> pass in gcse.c which only works for code size.  All of them
> >> leverage the LCM framework to move instructions upward or downward.
> >>
> >> PRE and rtl-hoist move instructions upward: they analyze/hash the SOURCE
> >> exprs and call pre_edge_lcm.  Store-motion and rtl-sink move instructions
> >> downward, so they analyze/hash the DEST exprs and call pre_edge_rev_lcm.
> >> All four problems are converted to the LCM DF problem, with four
> >> n_basic_blocks * m_exprs matrices (antic, transp, avail, kill) as input
> >> and two outputs saying where to insert/delete.
> >>
> >> PRE scans each instruction and hashes the SRC into a table without
> >> *checking the relationship between instructions*.  For the attached case,
> >> BB 37, BB 38 and BB 41 all contain the SOURCE expr "r262:DI+r139:DI", but
> >> BB 37 and BB 41 save it to index 106, while BB 38 saves it to index 110.
> >> After this pass finishes, "r262:DI+r139:DI" in BB 41 is replaced with
> >> "r194:DI=r452:DI", then the expr is inserted into BB 75~BB 80 to create
> >> full redundancies from partial redundancies, and finally the instruction
> >> in BB 37 is updated.
> > 
> > I'm not familiar with the actual PRE code but reading the toplevel comment
> > it seems that indeed it can only handle expressions contained in a single
> > insn unless a REG_EQUAL note provides a short-hand for the larger one.
> > 
> > That of course means it would need to mark things as not transparent
> > for correctness where they'd be if moved together.  Now, nothing
> > prevents you changing the granularity of what you feed LCM.
> > 
> > So originally we arrived at looking into LCM because there's already
> > a (store) sinking pass on RTL (using LCM) so adding another (loop-special)
> > one didn't look like the best obvious solution.
> > 
> > That said, LCM would work for single-instruction expressions.
> > Alternatively a greedy algorithm like you prototyped could be used.
> > Another pass to look at would be RTL invariant motion which seems to
> > compute some kind of dependency graph - not sure if that would be
> > adaptable for the reverse CFG problem.
> > 
> 
> Actually my RTL sinking pass patch is borrowed from RTL loop invariant
> motion.  It is quite limited, since it only moves instructions from the loop
> header to the loop exits, though it could be refined with various
> algorithms.  Compared to the initial method of running the gimple sink pass
> once more, it seems much more complicated and limited without gaining an
> obvious performance benefit.  Shall we go back to considering the original
> gimple sink2 pass, since we are in stage1 now?

OK, so while there might be new sinking opportunities exposed during
RTL expansion and early RTL opts, we can consider adding another sink pass
on GIMPLE.  Since it's basically a scheduling optimization, placement
shouldn't matter much, but I suppose we should run it before store
merging, so anywhere between cd_dce and that.

Richard.
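LCM takes per-block bit-vector inputs such as the antic/transp/avail/kill matrices described in this thread. As a minimal sketch of one ingredient only (not LCM itself, and not GCC's lcm.c code), the global "available expressions" problem can be solved iteratively over such bit vectors. The block layout, the 64-expression bitset width, the comp (locally computed) name and the function name are assumptions for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kMaxExprs = 64;
typedef std::bitset<kMaxExprs> expr_set;

// Hypothetical per-block inputs: one bit per expression.
struct block
{
  std::vector<int> preds;  // predecessor block indices
  expr_set comp;           // expressions computed (and not later killed) here
  expr_set transp;         // expressions whose operands the block leaves alone
};

// Iterative forward "available expressions" problem:
//   AVAIL_IN(b)  = intersection of AVAIL_OUT(p) over predecessors p
//   AVAIL_OUT(b) = COMP(b) | (AVAIL_IN(b) & TRANSP(b))
// Block 0 is the entry; nothing is available on entry.
static std::vector<expr_set> compute_avail_out (const std::vector<block> &cfg)
{
  std::vector<expr_set> out (cfg.size ());
  for (std::size_t b = 1; b < cfg.size (); ++b)
    out[b].set ();  // optimistic start for a "must" problem
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (std::size_t b = 0; b < cfg.size (); ++b)
        {
          expr_set in;
          if (b != 0 && !cfg[b].preds.empty ())
            {
              in.set ();
              for (int p : cfg[b].preds)
                in &= out[p];
            }
          expr_set new_out = cfg[b].comp | (in & cfg[b].transp);
          if (new_out != out[b])
            {
              out[b] = new_out;
              changed = true;
            }
        }
    }
  return out;
}
```

Every expression is solved simultaneously in one bit-vector sweep, which is the property that makes a single global dataflow attractive compared with repeated local passes; the open question in this thread is whether the units fed to it (single insns) are the right granularity.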


Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-05-17 Thread Andre Vieira (lists) via Gcc-patches

Hi,

So this is my second attempt at finding a way to improve how we generate
the vector IVs and teach the vectorizer to share them between the main loop
and epilogues. On IRC we discussed my idea to use the loop's control_iv,
but that was a terrible idea and I quickly threw it in the bin. The main
problem, which for some reason I failed to see, was that the control_iv
increases by 's' and the datarefs by 's' * NELEMENTS, where 's' is
usually 1 and NELEMENTS is the number of elements we handle per iteration.
That means the epilogue loops would have to start from the last loop's
IV * the last loop's NELEMENTS, and that would just cause a mess.


Instead I started to think about creating IVs for the datarefs, and what
I thought worked best was to create these in scalar before peeling. That
way the peeling mechanism takes care of duplicating these for
the vector and scalar epilogues, and it also takes care of adding
phi-nodes for the skip_vector paths.

These new IV's have two functions:
1) 'vect_create_data_ref_ptr' can use them to:
 a) if it's the main loop: replace the 'initial' value of
the main loop's IV and the initial values in the skip_vector phi-nodes;
 b) update the skip_vector phi-nodes' argument for the non-skip path
with the updated vector ptr.


2) They are used for the scalar epilogue, ensuring they share the same
data reference ptr.
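A small C++ simulation of the bookkeeping described above, with hypothetical names: a loop over an array is split into a main loop with VF_MAIN elements per iteration, a vector epilogue with VF_EPI, and a scalar tail, and one data-reference pointer is carried from loop to loop. It models only the arithmetic, not vectorizer code. The point is that the epilogue's start offset (8 for 11 elements with VF_MAIN = 4) comes out as the main loop's iteration count times its NELEMENTS without ever being recomputed from a control IV.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simulation (not vectorizer code): main loop, vector
// epilogue and scalar tail share one data-reference pointer IV (PTR).
// Both VF values must be nonzero.  If EPI_START is non-null it receives
// the element offset at which the vector epilogue started.
static long sum_with_shared_dr_iv (const std::vector<int> &a, std::size_t vf_main,
                                   std::size_t vf_epi, std::size_t *epi_start)
{
  const int *ptr = a.data ();
  const int *const end = a.data () + a.size ();
  long sum = 0;

  // Main loop: its control IV would step by 1, the dr IV by VF_MAIN elements.
  while (static_cast<std::size_t> (end - ptr) >= vf_main)
    {
      for (std::size_t k = 0; k < vf_main; ++k)  // stands in for one vector op
        sum += ptr[k];
      ptr += vf_main;
    }

  if (epi_start)
    *epi_start = static_cast<std::size_t> (ptr - a.data ());

  // Vector epilogue: continues from the main loop's updated dr IV.
  while (static_cast<std::size_t> (end - ptr) >= vf_epi)
    {
      for (std::size_t k = 0; k < vf_epi; ++k)
        sum += ptr[k];
      ptr += vf_epi;
    }

  // Scalar epilogue: shares the same pointer.
  for (; ptr != end; ++ptr)
    sum += *ptr;
  return sum;
}
```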


There are still a variety of 'hacky' elements here and a lot of testing 
to be done, but I hope to be able to clean them away. One of the main 
issues I had was that I had to skip a couple of checks and things for 
the added phi-nodes and update statements as these do not have 
stmt_vec_info representation.  Though I'm not sure adding this 
representation at their creation was much cleaner... It is something I 
could play around with but I thought this was a good moment to ask you 
for input. For instance, maybe we could do this transformation before 
analysis?


Also be aware that because I create an IV for each dataref, this leads to
regressions with SVE codegen, for instance. NEON is able to use the
post-index addressing mode to increase each dr IV at access time, but
SVE can't do this.  For this I don't know if maybe we could try to be
smart and create shared IVs. So rather than basing them on the
actual vector ptr, use a sizetype IV that can be shared among dr
IVs with the same step. Or maybe this is something for IVOPTs?
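A source-level C++ illustration of the two IV shapes being compared, with hypothetical function names: one pointer IV per data reference versus a single shared sizetype offset used by all data references with the same step. It does not model what the vectorizer or IVOPTs actually emit, nor the NEON/SVE addressing-mode difference; it only shows that both shapes compute the same result while the second advances one IV per iteration instead of three.

```cpp
#include <cassert>
#include <cstddef>

// One pointer IV per data reference: three pointers advance every iteration.
static void add_per_dr_ivs (int *c, const int *a, const int *b, std::size_t n)
{
  const int *pa = a;
  const int *pb = b;
  int *pc = c;
  for (std::size_t i = 0; i < n; ++i)
    *pc++ = *pa++ + *pb++;
}

// One shared sizetype offset IV: every data reference with the same step
// is addressed as base + offset, so only OFF advances.
static void add_shared_iv (int *c, const int *a, const int *b, std::size_t n)
{
  for (std::size_t off = 0; off < n; ++off)
    c[off] = a[off] + b[off];
}
```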


Let me know what ya think!

Kind regards,
Andre
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 8001cc54f518d9d9d1a0fcfe5790d22dae109fb2..939c0a7fefd4355dd75d7646ac2ae63ce23a0e14 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -174,6 +174,8 @@ struct data_reference
 
   /* Alias information for the data reference.  */
   struct dr_alias alias;
+
+  hash_map *iv_bases;
 };
 
 #define DR_STMT(DR)(DR)->stmt
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 124a7bea6a94161556a6622fa7b113b3cef98bcf..f638bb3e0aa007e0bf7ad8f75fb767d3484b02ce 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1475,6 +1475,7 @@ void
 free_data_ref (data_reference_p dr)
 {
   DR_ACCESS_FNS (dr).release ();
+  delete dr->iv_bases;
   free (dr);
 }
 
@@ -1506,6 +1507,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt,
   DR_REF (dr) = memref;
   DR_IS_READ (dr) = is_read;
   DR_IS_CONDITIONAL_IN_STMT (dr) = is_conditional_in_stmt;
+  dr->iv_bases = new hash_map ();
 
   dr_analyze_innermost (&DR_INNERMOST (dr), memref,
nest != NULL ? loop : NULL, stmt);
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index 86fc118b6befb06233e5e86a01454fd7075075e1..93e14d09763da5034ba97d09b07c94c20fe25a28 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -24,6 +24,8 @@ typedef void (*transform_callback)(class loop *, void *);
 
 extern void create_iv (tree, tree, tree, class loop *, gimple_stmt_iterator *,
   bool, tree *, tree *);
+extern void create_or_update_iv (tree, tree, tree, class loop *, gimple_stmt_iterator *,
+ bool, tree *, tree *, gphi *, bool);
 extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
class loop *);
 extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 28ae1316fa0eb6939a45d15e893b7386622ba60c..1709e175c382ef5d74c2f628a61c9fffe26f726d 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -57,9 +57,10 @@ static bitmap_obstack loop_renamer_obstack;
VAR_AFTER (unless they are NULL).  */
 
 void
-create_iv (tree base, tree step, tree var, class loop *loop,
-  gimple_stmt_iterator *incr_pos, bool after,
-  tree *var_before, tree *var_after)
+create_or_update_iv (tree base, tree step, tree var, class loop *loop,
+

Re: RFA: Improve message for wrong number of alternatives

2021-05-17 Thread Joern Rennecke
On Sun, 16 May 2021 at 22:01, Martin Sebor  wrote:
> I think it's very helpful to provide this sort of detail.  Just as
> a matter of readability, the new error message
>
>"wrong number of alternatives in operand %d, %d, expected %d"
>
> would be improved by avoiding the two consecutive %d's,

We could also do that by phrasing it:

"wrong number of alternatives in operand %d, seen: %d, expected: %d"

so that the change is just about adding extra information.

> e.g., by
> rephrasing it like so:
>
>"%d alternatives provided to operand %d where %d are expected"

This has an additional change in that we no longer jump to the conclusion
that the operand where we notice the discrepancy is the point that's wrong.
I suppose that conclusion is more often right than wrong (assuming more than
two operands on average for patterns that have alternatives and at least two
operands), but when it's wrong, it's particularly confusing and/or jarring,
so it's an improvement to just stick to the known facts.
But if we go that way, I suppose we should also spell out where the
expectation comes from: we have a loop over the operands, and we look at
operand 0 first.  We could do that by using the diagnostic:

  error_at (d->loc,
            "alternative number mismatch: operand %d has %d, operand %d had %d",
            start, d->operand[start].n_alternatives, 0, n);


I notice in passing here that printf is actually awkward for rephrasings
and hence also for translations, because we can't interchange the order of
the data in the message string.

But for multi-alternative patterns, we also have the awkwardness of
repeating the abstract of the error message and the recap of the number
of alternatives of operand 0.

So I propose the attached patch now.

Bootstrapped on x86_64-pc-linux-gnu.
2021-05-17  Joern Rennecke  

Make "wrong number of alternatives" message more specific, and
remove assumption on where the problem is.

diff --git a/gcc/genoutput.c b/gcc/genoutput.c
index 8e911cce2f5..6313b722cf7 100644
--- a/gcc/genoutput.c
+++ b/gcc/genoutput.c
@@ -757,6 +757,7 @@ validate_insn_alternatives (class data *d)
int which_alternative = 0;
int alternative_count_unsure = 0;
bool seen_write = false;
+   bool alt_mismatch = false;
 
for (p = d->operand[start].constraint; (c = *p); p += len)
  {
@@ -813,8 +814,19 @@ validate_insn_alternatives (class data *d)
if (n == 0)
  n = d->operand[start].n_alternatives;
else if (n != d->operand[start].n_alternatives)
- error_at (d->loc, "wrong number of alternatives in operand %d",
-   start);
+ {
+   if (!alt_mismatch)
+ {
+   alt_mismatch = true;
+   error_at (d->loc,
+ "alternative number mismatch: "
+ "operand %d had %d, operand %d has %d",
+ 0, n, start, d->operand[start].n_alternatives);
+ }
+   else
+ error_at (d->loc, "operand %d has %d alternatives",
+   start, d->operand[start].n_alternatives);
+ }
  }
   }
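A simplified C++ model of the new check, assuming alternatives in a constraint string are separated by commas and that operands without constraints are skipped; the real genoutput.c parsing is more involved. Like the patch, it reports the first mismatch in full and later ones in the short form, but it collects the messages in a vector instead of calling error_at. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Simplified alternative count: empty constraint -> 0 alternatives,
// otherwise one more than the number of commas.
static int count_alternatives (const std::string &constraint)
{
  if (constraint.empty ())
    return 0;
  int n = 1;
  for (char c : constraint)
    if (c == ',')
      ++n;
  return n;
}

// Report the first mismatch in full and later ones briefly, as the patch
// does.  Operands without alternatives are skipped (an assumption here).
static std::vector<std::string>
check_alternatives (const std::vector<std::string> &constraints)
{
  std::vector<std::string> errors;
  int n = 0;
  bool alt_mismatch = false;
  char buf[128];
  for (size_t op = 0; op < constraints.size (); ++op)
    {
      int this_n = count_alternatives (constraints[op]);
      if (this_n == 0)
        continue;
      if (n == 0)
        n = this_n;
      else if (n != this_n)
        {
          if (!alt_mismatch)
            {
              alt_mismatch = true;
              std::snprintf (buf, sizeof buf,
                             "alternative number mismatch: "
                             "operand %d had %d, operand %d has %d",
                             0, n, (int) op, this_n);
            }
          else
            std::snprintf (buf, sizeof buf, "operand %d has %d alternatives",
                           (int) op, this_n);
          errors.push_back (buf);
        }
    }
  return errors;
}
```

For the constraints "=r,m", "r,r", "i", "r,m,r" this yields one full message for operand 2 followed by the short form for operand 3, instead of blaming each operand with the same generic text.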
 


Re: [PATCH] go/100537 - Bootstrap-O3 and bootstrap-debug fail

2021-05-17 Thread Richard Biener via Gcc-patches
On Fri, May 14, 2021 at 11:19 AM guojiufu via Gcc-patches
 wrote:
>
> On 2021-05-14 15:39, guojiufu via Gcc-patches wrote:
> > On 2021-05-14 15:15, Richard Biener wrote:
> >> On May 14, 2021 4:52:56 AM GMT+02:00, Jiufu Guo
> >>  wrote:
> >>> As discussed in the PR, Richard mentioned the method to
> >>> figure out which VAR was not marked TREE_ADDRESSABLE and
> >>> thus caused this failure.  It is address_expression, which
> >>> builds an addr_expr (build_fold_addr_expr_loc) but does not
> >>> set TREE_ADDRESSABLE.
> >>>
> >>> I drafted this patch with reference to Richard's comments
> >>> in this PR, though I'm not quite sure if more needs to be done.
> >>> So, please review, thanks!
> >>>
> >>> Bootstrap and regtest pass on ppc64le. Is this ok for trunk?
> >>
> >> I suggest using mark_addressable unless we're sure expr is always an
> >> entity where TREE_ADDRESSABLE has the desired meaning.
>
> Thanks, Richard!
> You point out the root concern; I'm not sure ;)
>
> Looking at the code of mark_addressable and the code around
> tree-ssa.c:1013, VAR_P, PARM_DECL, and RESULT_DECL are checked
> before accessing TREE_ADDRESSABLE.
> So I'm wondering whether these are the entities that need to be
> marked TREE_ADDRESSABLE?
>
> diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
> index 5d9dbb5d068..85d324a92cc 100644
> --- a/gcc/go/go-gcc.cc
> +++ b/gcc/go/go-gcc.cc
> @@ -1680,6 +1680,11 @@ Gcc_backend::address_expression(Bexpression* bexpr, Location location)
> if (expr == error_mark_node)
>   return this->error_expression();
>
> +  if (VAR_P(expr)
> +      || TREE_CODE(expr) == PARM_DECL
> +      || TREE_CODE(expr) == RESULT_DECL)
> +    TREE_ADDRESSABLE (expr) = 1;
> +

The root concern is that mark_addressable does

  while (handled_component_p (x))
x = TREE_OPERAND (x, 0);

and I do not know the constraints on 'expr' as passed to
Gcc_backend::address_expression.

I think we need input from Ian here.  Most FEs have their own *_mark_addressable
function where they also emit diagnostics (I guess this is handled in
the actual Go frontend).
Since Gcc_backend does the lowering to GENERIC, using the middle-end one is probably OK.
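The root concern can be seen in a toy model: setting TREE_ADDRESSABLE directly on expr marks whatever node was passed in, which may be a COMPONENT_REF rather than a declaration, whereas mark_addressable first strips handled components (the loop quoted above) and then, roughly, only flags an underlying VAR_DECL, PARM_DECL or RESULT_DECL. The mock_tree type and helper names are hypothetical, and the real function does additional work not modeled here.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of a few GENERIC tree codes.
enum tree_code { VAR_DECL, PARM_DECL, RESULT_DECL, COMPONENT_REF, ARRAY_REF, INDIRECT_REF };

struct mock_tree
{
  tree_code code;
  mock_tree *op0;    // base operand for reference nodes
  bool addressable;  // stands in for TREE_ADDRESSABLE
};

// Stands in for handled_component_p (only two of its codes modeled).
static bool handled_component (const mock_tree *t)
{
  return t->code == COMPONENT_REF || t->code == ARRAY_REF;
}

static bool decl_p (const mock_tree *t)
{
  return t->code == VAR_DECL || t->code == PARM_DECL || t->code == RESULT_DECL;
}

// Sketch of the mark_addressable idea: strip component references down to
// the base object, then flag it only if it is a declaration.
static void mark_addressable_sketch (mock_tree *x)
{
  while (handled_component (x))
    x = x->op0;
  if (decl_p (x))
    x->addressable = true;
}
```

Taking the address of var.field[i] should make var addressable, not the ARRAY_REF node; a dereference is not a handled component, so its base is left alone.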

> tree ret = build_fold_addr_expr_loc(location.gcc_location(), expr);
> return this->make_expression(ret);
>   }
>
>
> Or call mark_addressable, and update mark_addressable to avoid NULL
> pointer ICE:
> The below patch also pass bootstrap-debug.
>
> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
> index b8c732b632a..f682841391b 100644
> --- a/gcc/gimple-expr.c
> +++ b/gcc/gimple-expr.c
> @@ -915,6 +915,7 @@ mark_addressable (tree x)
> if (TREE_CODE (x) == VAR_DECL
> && !DECL_EXTERNAL (x)
> && !TREE_STATIC (x)
> +  && cfun != NULL

I'd be OK with this hunk of course.

> && cfun->gimple_df != NULL
> && cfun->gimple_df->decls_to_pointers != NULL)
>   {
> diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
> index 5d9dbb5d068..fe9dfaf8579 100644
> --- a/gcc/go/go-gcc.cc
> +++ b/gcc/go/go-gcc.cc
> @@ -1680,6 +1680,7 @@ Gcc_backend::address_expression(Bexpression* bexpr, Location location)
> if (expr == error_mark_node)
>   return this->error_expression();
>
> +  mark_addressable(expr);
> tree ret = build_fold_addr_expr_loc(location.gcc_location(), expr);
> return this->make_expression(ret);
>   }
>
>
> >
> > I notice you mentioned mark_addressable in the PR.
> > I tried it yesterday, and it caused new ICEs at gimple-expr.c:918,
> > on the line below:
> >
> >   && cfun->gimple_df != NULL
> >
> >
> >
> >>
> >> Richard.
> >>
> >>> Jiufu Guo.
> >>>
> >>> 2021-05-14  Richard Biener  
> >>> Jiufu Guo 
> >>>
> >>> PR go/100537
> >>> * go-gcc.cc
> >>> (Gcc_backend::address_expression): Set TREE_ADDRESSABLE.
> >>>
> >>> ---
> >>> gcc/go/go-gcc.cc | 1 +
> >>> 1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
> >>> index 5d9dbb5d068..8ed20a3b479 100644
> >>> --- a/gcc/go/go-gcc.cc
> >>> +++ b/gcc/go/go-gcc.cc
> >>> @@ -1680,6 +1680,7 @@ Gcc_backend::address_expression(Bexpression* bexpr, Location location)
> >>>   if (expr == error_mark_node)
> >>> return this->error_expression();
> >>>
> >>> +  TREE_ADDRESSABLE (expr) = 1;
> >>>   tree ret = build_fold_addr_expr_loc(location.gcc_location(), expr);
> >>>   return this->make_expression(ret);
> >>> }


Re: [PATCH] [i386] Fix _mm256_zeroupper to notify LRA that vzeroupper will kill sse registers. [PR target/82735]

2021-05-17 Thread Hongtao Liu via Gcc-patches
On Fri, May 14, 2021 at 10:27 AM Hongtao Liu  wrote:
>
> On Thu, May 13, 2021 at 7:52 PM Richard Sandiford
>  wrote:
> >
> > Jakub Jelinek  writes:
> > > On Thu, May 13, 2021 at 12:32:26PM +0100, Richard Sandiford wrote:
> > >> Jakub Jelinek  writes:
> > >> > On Thu, May 13, 2021 at 11:43:19AM +0200, Uros Bizjak wrote:
> > >> >> > >   Bootstrapped and regtested on X86_64-linux-gnu{-m32,}
> > >> >> > >   Ok for trunk?
> > >> >> >
> > >> >> > Some time ago, support for CLOBBER_HIGH RTX was added (and later
> > >> >> > removed for some reason). Perhaps we could resurrect the patch for the
> > >> >> > purpose of ferrying 128bit modes via vzeroupper RTX?
> > >> >>
> > >> >> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-11/msg01325.html
> > >> >
> > >> > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01468.html
> > >> > is where it got removed, CCing Richard.
> > >>
> > >> Yeah.  Initially clobber_high seemed like the best approach for
> > >> handling the tlsdesc thing, but in practice it was too difficult
> > >> to shoe-horn the concept in after the fact, when so much rtl
> > >> infrastructure wasn't prepared to deal with it.  The old support
> > >> didn't handle all cases and passes correctly, and handled others
> > >> suboptimally.
> > >>
> > >> I think it would be worth using the same approach as
> > >> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01466.html for
> > >> vzeroupper: represent the instructions as call_insns in which the
> > >> call has a special vzeroupper ABI.  I think that's likely to lead
> > >> to better code than clobber_high would (or at least, it did for tlsdesc).
>
> From an implementation perspective, I guess you're meaning we should
> implement TARGET_INSN_CALLEE_ABI and TARGET_FNTYPE_ABI in the i386
> backend.
>
When I implemented the vzeroupper pattern as a call_insn and defined
TARGET_INSN_CALLEE_ABI for it, I got several failures. They fall into
two categories:

1. requires_stack_frame_p returns true for vzeroupper, but it should return false.
2. In subst_stack_regs, vzeroupper shouldn't kill arguments.

I've tried a rough patch like the one below, and it fixes those failures.
Unfortunately, I don't have an arm machine to test on, so I want to ask:
would the change below break something in the arm backend?

modified   gcc/reg-stack.c
@@ -174,6 +174,7 @@
 #include "reload.h"
 #include "tree-pass.h"
 #include "rtl-iter.h"
+#include "function-abi.h"

 #ifdef STACK_REGS

@@ -2385,7 +2386,7 @@ subst_stack_regs (rtx_insn *insn, stack_ptr regstack)
   bool control_flow_insn_deleted = false;
   int i;

-  if (CALL_P (insn))
+  if (CALL_P (insn) && insn_callee_abi (insn).id () == 0)
 {
   int top = regstack->top;

modified   gcc/shrink-wrap.c
@@ -58,7 +58,12 @@ requires_stack_frame_p (rtx_insn *insn, HARD_REG_SET prologue_used,
   unsigned regno;

   if (CALL_P (insn))
-return !SIBLING_CALL_P (insn);
+{
+  if (insn_callee_abi (insn).id() != 0)
+ return false;
+  else
+ return !SIBLING_CALL_P (insn);
+}

   /* We need a frame to get the unique CFA expected by the unwinder.  */
   if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn))
> > >
> > > Perhaps a magic call_insn that is split post-reload into a normal insn
> > > with the sets then?
> >
> > I'd be tempted to treat it as a call_insn throughout.  The unspec_volatile
> > means that we can't move the instruction, so converting a call_insn to an
> > insn isn't likely to help from that point of view.  The sets are also
> > likely to be handled suboptimally compared to the more accurate register
> > information attached to the call: all code that handles calls has to be
> > prepared to deal with partial clobbers, whereas most code dealing with
> > sets will assume that the set does useful work, and that the rhs of the
> > set is live.
> >
> > Thanks,
> > Richard
> >
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao
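The logic of the two hunks can be summarized with a mock, assuming (as the patch does) that ABI id 0 is the default ABI: a call whose callee ABI is not the default one, such as vzeroupper represented as a call_insn, neither requires a stack frame by itself nor gets the reg-stack call treatment that kills arguments. mock_insn, the two helper names and the ABI id value 1 are hypothetical; the real code queries insn_callee_abi (insn).id ().

```cpp
#include <cassert>

// Hypothetical mock of the insn properties the two hunks consult.
struct mock_insn
{
  bool is_call;          // CALL_P
  bool is_sibling_call;  // SIBLING_CALL_P
  unsigned abi_id;       // insn_callee_abi (insn).id (); 0 = default ABI
};

// Mirrors the shrink-wrap.c hunk for call insns only: a call with a
// non-default ABI does not by itself need a stack frame.  (Non-call insns
// are handled by other checks in the real function, not modeled here.)
static bool call_requires_stack_frame (const mock_insn &insn)
{
  if (!insn.is_call)
    return false;
  if (insn.abi_id != 0)
    return false;
  return !insn.is_sibling_call;
}

// Mirrors the reg-stack.c hunk: only default-ABI calls get the call
// handling in subst_stack_regs.
static bool treat_as_stack_killing_call (const mock_insn &insn)
{
  return insn.is_call && insn.abi_id == 0;
}
```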


Re: [PATCH 1/1 v2] PR100281 C++: Fix SImode pointer handling

2021-05-17 Thread Richard Biener via Gcc-patches
On Thu, May 13, 2021 at 8:28 AM Andreas Krebbel via Gcc-patches
 wrote:
>
> v1 -> v2: build_reference_type_for_mode and build_pointer_type_for_mode now
> pick pointer mode if MODE argument is VOIDmode.
>
> Bootstrapped and regression tested on x86_64 and s390x.
>
> Ok for mainline and GCC 11?

The middle-end parts are fine with me.

Richard.

> Andreas
>
>
> gcc/cp/ChangeLog:
>
> PR c++/100281
> * cvt.c (cp_convert_to_pointer): Use the size of the target
> pointer type.
> * tree.c (cp_build_reference_type): Call
> cp_build_reference_type_for_mode with VOIDmode.
> (cp_build_reference_type_for_mode): Rename from
> cp_build_reference_type.  Add MODE argument and invoke
> build_reference_type_for_mode.
> (strip_typedefs): Use build_pointer_type_for_mode and
> cp_build_reference_type_for_mode for pointers and references.
>
> gcc/ChangeLog:
>
> PR c++/100281
> * tree.c (build_reference_type_for_mode)
> (build_pointer_type_for_mode): Pick pointer mode if MODE argument
> is VOIDmode.
> (build_reference_type, build_pointer_type): Invoke
> build_*_type_for_mode with VOIDmode.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/100281
> * g++.target/s390/pr100281-1.C: New test.
> * g++.target/s390/pr100281-2.C: New test.
> ---
>  gcc/cp/cvt.c   |  2 +-
>  gcc/cp/tree.c  | 25 ++-
>  gcc/testsuite/g++.target/s390/pr100281-1.C | 10 
>  gcc/testsuite/g++.target/s390/pr100281-2.C |  9 +++
>  gcc/tree.c | 29 ++
>  5 files changed, 57 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/s390/pr100281-1.C
>  create mode 100644 gcc/testsuite/g++.target/s390/pr100281-2.C
>
> diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
> index f1687e804d1..7fa6e8df52b 100644
> --- a/gcc/cp/cvt.c
> +++ b/gcc/cp/cvt.c
> @@ -232,7 +232,7 @@ cp_convert_to_pointer (tree type, tree expr, bool dofold,
>  {
>if (TYPE_PRECISION (intype) == POINTER_SIZE)
> return build1 (CONVERT_EXPR, type, expr);
> -  expr = cp_convert (c_common_type_for_size (POINTER_SIZE, 0), expr,
> +  expr = cp_convert (c_common_type_for_size (TYPE_PRECISION (type), 0), expr,
>  complain);
>/* Modes may be different but sizes should be the same.  There
>  is supposed to be some integral type that is the same width
> diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
> index 7f148b4b158..35faeff065a 100644
> --- a/gcc/cp/tree.c
> +++ b/gcc/cp/tree.c
> @@ -1206,12 +1206,14 @@ vla_type_p (tree t)
>return false;
>  }
>
> -/* Return a reference type node referring to TO_TYPE.  If RVAL is
> +
> +/* Return a reference type node of MODE referring to TO_TYPE.  If MODE
> +   is VOIDmode the standard pointer mode will be picked.  If RVAL is
> true, return an rvalue reference type, otherwise return an lvalue
> reference type.  If a type node exists, reuse it, otherwise create
> a new one.  */
>  tree
> -cp_build_reference_type (tree to_type, bool rval)
> +cp_build_reference_type_for_mode (tree to_type, machine_mode mode, bool rval)
>  {
>tree lvalue_ref, t;
>
> @@ -1224,7 +1226,8 @@ cp_build_reference_type (tree to_type, bool rval)
>to_type = TREE_TYPE (to_type);
>  }
>
> -  lvalue_ref = build_reference_type (to_type);
> +  lvalue_ref = build_reference_type_for_mode (to_type, mode, false);
> +
>if (!rval)
>  return lvalue_ref;
>
> @@ -1250,7 +1253,7 @@ cp_build_reference_type (tree to_type, bool rval)
>  SET_TYPE_STRUCTURAL_EQUALITY (t);
>else if (TYPE_CANONICAL (to_type) != to_type)
>  TYPE_CANONICAL (t)
> -  = cp_build_reference_type (TYPE_CANONICAL (to_type), rval);
> +  = cp_build_reference_type_for_mode (TYPE_CANONICAL (to_type), mode, rval);
>else
>  TYPE_CANONICAL (t) = t;
>
> @@ -1260,6 +1263,16 @@ cp_build_reference_type (tree to_type, bool rval)
>
>  }
>
> +/* Return a reference type node referring to TO_TYPE.  If RVAL is
> +   true, return an rvalue reference type, otherwise return an lvalue
> +   reference type.  If a type node exists, reuse it, otherwise create
> +   a new one.  */
> +tree
> +cp_build_reference_type (tree to_type, bool rval)
> +{
> +  return cp_build_reference_type_for_mode (to_type, VOIDmode, rval);
> +}
> +
>  /* Returns EXPR cast to rvalue reference type, like std::move.  */
>
>  tree
> @@ -1561,11 +1574,11 @@ strip_typedefs (tree t, bool *remove_attributes, unsigned int flags)
>  {
>  case POINTER_TYPE:
>type = strip_typedefs (TREE_TYPE (t), remove_attributes, flags);
> -  result = build_pointer_type (type);
> +  result = build_pointer_type_for_mode (type, TYPE_MODE (t), false);
>break;
>  case REFERENCE_TYPE:
>type = strip_typedefs (TREE_TYPE (t), remove_attributes, flags);
> -  resu

Re: [wwwdocs, patch] gcc-12/changes.html: Document -mptx for nvptx

2021-05-17 Thread Tobias Burnus

Early *PING*  - and I fixed a wording issue in my patch.

OK? Suggestions?

Tobias

On 14.05.21 00:06, Tobias Burnus wrote:

Document this new flag, added in
https://gcc.gnu.org/g:2a1586401a21dcd43e0f904bb6eec26c8b2f366b
+ https://gcc.gnu.org/onlinedocs/gcc/Nvidia-PTX-Options.html#index-mptx

Any wording suggestions?

Tobias

PS: Some background remarks:

(PTX ISA 3.1 is supported since NVidia's CUDA 5, while 6.3 is supported
since CUDA 10.0 and adds very useful new features; the current version is
PTX ISA 7.3 (CUDA 11.3).* On the PTX side, 6.3 adds a lot, while versions
after 6.3 add only a few features, which we still may want to support
sometime in the future.)

(The new flag paves the way for additional -misa= flags
(i.e. newer hardware, relevant for enabling PTX instructions that only
newer GPUs support) and for newer GPU-hardware-independent PTX ISA features,
either permitting better code generation or being used to fix bugs.
While this will change during GCC 12, currently the generated code is
effectively the same with either -mptx= value.)

(Regarding the produced instructions, the installed CUDA will JIT-compile
(and then cache) the GCC-generated nvptx code in the binary at startup,
optimizing for the available hardware; i.e. the chosen -mptx and -misa
do not restrict what the hardware can do, they only mean that PTX
instructions which are only available in newer PTX versions or for newer
hardware may not be generated.)

(* Cf.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes__ptx-release-history
)


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Register court: München, HRB 106955; Managing directors: Thomas Heurung, Frank Thürauf
diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 23f71411..53abf9ed 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -101,8 +101,13 @@ a work-in-progress.
 
 
 
-
-
+NVPTX
+
+  The -mptx flag has been added to specify the PTX ISA version
+  for the generated code; permitted values are 3.1
+  (default, matches previous GCC versions) and 6.3.
+  
+
 
 
 


Re: RFA: Add option -fretry-compilation

2021-05-17 Thread Joern Rennecke
On Mon, 17 May 2021 at 08:36, Richard Biener  wrote:
>
> On Sun, May 16, 2021 at 8:53 PM Joern Rennecke
>  wrote:
> >
> > For architectures with likely spilled register classes, neither
> > register allocator is guaranteed to succeed when using optimization.
> > If you have just a few files to compile, you can try by hand which
> > compiler options will succeed and still give reasonable code, but for
> > large projects, hand-tweaking library / program build rules on a
> > file-by-file basis is time intensive and does not scale well across
> > different build environments and compiler versions.
> >
> > The attached patch adds a new option -fretry-compilation that allows
> > you to specify a list - or lists - of options to use for a compilation
> > retry, which is implemented in the compiler driver.
> >
> > Bootstrapped on x86_64-pc-linux-gnu.
>
> Eh, no ;)  But funny idea, nevertheless.

Why no?

LRA just throws a ton of transformations at the code, with no theoretical
concept that I can discern guaranteeing that it will, modulo bugs, succeed
for all well-formed code.  It works well most of the time, so I'd like to use
it as a default, but how are you supposed to compile libgcc and newlib with
a register allocator that only works most of the time?

reload is more robust in its basic design, but it's so complex that it's
rather time-consuming to debug.  The failures I had left with reload
were not spill failures per se, but code that was considered malformed by
the postreload passes, and it's hard to decide which one was actually wrong.
And if I debug the failures seen with reload, will this do any good in the
long run, or will reload just be changed beyond all recognition (into
something that works for the top five most popular processor architectures
but not quite for anything else) or plainly ripped out a few years down the line?

I first had a proof of concept for the option in the target code, but it used
fork(2) and thus left non-POSIX hosts (even if they have a pretend POSIX
subsystem) high and dry.  The logical place to implement the option portably
is the compiler driver.
I originally called the option -mretry-regalloc / -fretry-regalloc, but when
I got around to writing the invoke.texi patch, I realized that the option can
be used more generally to work around glitches, so it's more apt to name it
-fretry-compilation.
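The driver-level retry described above can be pictured as a loop over alternative option lists. This is only an illustrative sketch under stated assumptions: `retry_compilation`, `compile_fn` and `stub_compile` are hypothetical names, the stub merely simulates a register-allocation failure, and neither the patch's actual option syntax nor its process handling (which must not rely on fork(2)) is shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: run one compilation with the extra options in
   OPTS (NULL means the user's original command line) and return 0 on
   success, nonzero on failure.  */
typedef int (*compile_fn) (const char *opts, void *ctx);

/* Try the original command line first, then each retry option list in
   order, stopping at the first success.  Returns -1 if the original
   command line succeeded, the index of the retry list that succeeded,
   or -2 if every attempt failed.  */
static int
retry_compilation (compile_fn compile, void *ctx,
                   const char *const *retry_lists, size_t n_lists)
{
  if (compile (NULL, ctx) == 0)
    return -1;
  for (size_t i = 0; i < n_lists; i++)
    if (compile (retry_lists[i], ctx) == 0)
      return (int) i;
  return -2;
}

/* Stub standing in for a real compiler run: it counts attempts and
   pretends register allocation only succeeds when "-O1" is passed.  */
static int
stub_compile (const char *opts, void *ctx)
{
  int *attempts = ctx;
  ++*attempts;
  return (opts != NULL && strstr (opts, "-O1") != NULL) ? 0 : 1;
}
```

Trying the user's own options first means the retry lists cost nothing for files that already compile; only failing files pay for extra attempts.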

> Do you run into the issues
> with the first scheduling pass disabled?

The target doesn't have anything that needs scheduling, and hence no scheduling
description.  But it also has more severe register pressure for memory
accesses than the ports in the FSF tree.

The bane of LRA is memory-memory moves.  Instead of using an intermediate
register, it starts by reloading the well-formed addresses, thus jacking up
the base register pressure.

I had a patch for that, but I found it needs a bit more work.


Re: [PATCH 1/2] vect: Add costing_for_scalar parameter to init_cost hook

2021-05-17 Thread Richard Biener via Gcc-patches
On Thu, May 13, 2021 at 9:04 AM Kewen.Lin  wrote:
>
> Hi!
>
> >>> But in the end the vector code shouldn't end up worse than the
> >>> scalar code with respect to IVs - the cases where it would should
> >>> be already costed.  So I wonder if you have specific examples
> >>> where things go worse enough for the heuristic to trigger?
> >>>
> >>
> >> One typical case that I worked on to reuse this density check is the
> >> function mat_times_vec of src file block_solver.fppized.f of SPEC2017
> >> 503.bwaves_r; its density with the existing heuristic is 83 (unluckily,
> >> it doesn't exceed the threshold).  The interesting loop is the innermost
> >> one, with the option set "-O2 -mcpu=power8 -ffast-math -ftree-vectorize".
> >> We have verified that this loop isn't profitable to vectorize at
> >> O2 (without loop-interchange).
> >
> > Yeah, but that's because the loop only runs 5 iterations, not because
> > of some "density" (which suggests AGU overloading or some such)?
> > If you modify it so it iterates more while keeping the
> > "density" measurement constant, do you suddenly become profitable?
> >
>
> Yes, I agree this isn't a perfect example of how the density check
> matters, though it led me to find this check.  I tried to run SPEC2017
> benchmarks with and without this density heuristic to catch some
> "expected" case, but unluckily failed to.  It may be worth trying with
> some more option sets or even testing with the previous SPECs later.
>
> I hacked the innermost loop iteration count from 5 to 20, but the baseline
> run didn't finish (I killed it after more than 7 hours); I suspect it became
> endless because of some garbage (out-of-bounds) data.
>
> But the current cost modeling for this loop on Power is still bad: the
> min profitable iteration count (both static and run time) is evaluated
> as 2, while in reality even 5 iterations aren't profitable.
>
>
> > The loop does have quite many memory streams so optimizing
> > the (few) arithmetic ops by vectorizing them might not be worth
> > the trouble, esp. since most of the loads are "strided" (composed
> > from scalars) when no interchange is performed.  So it's probably
> > more a "density" of # memory streams vs. # arithmetic ops, and
> > esp. with any non-consecutive vector loads this balance being
> > worse in the vector case?
> >
>
> Yeah, these many scalar "strided" loads make things worse.  The vector
> CTORs they feed have to wait for all of their required loads to be ready,
> and these vector CTORs are required by further multiplications.
>
> I posted one patch[1] on this, which tries to model it with
> some counts: nload (total load number), nload_ctor (strided
> load number fed into CTOR) and nctor_strided (CTOR number fed
> by strided load).
>
> The penalization is restricted by considering some factors:
>   1) vect density ratio: if there are many vector instructions,
>      the stalls from loads are likely to impact the subsequent
>      computation.
>   2) total load number: if nload is small, it's unlikely to
>      bother the load/store units much.
>   3) percentage of strided loads fed into CTORs: if a high portion
>      of strided loads feed into CTORs, they are very likely to block
>      the CTORs and their subsequent chain.
>
> btw, as per your previous comments on add_stmt_cost, the load/strided/ctor
> statistics should be gathered there instead, like:
>
>   if (!data->costing_for_scalar && data->loop_info && where == vect_body)
>     {
>       if (kind == scalar_load || kind == vector_load
>           || kind == unaligned_load || kind == vector_gather_load)
>         data->nload += count;
>       if (stmt_info && STMT_VINFO_STRIDED_P (stmt_info))
>         {
>           if (kind == scalar_load || kind == unaligned_load)
>             data->nload_ctor += count;
>           else if (kind == vec_construct)
>             data->nctor_strided += count;
>         }
>     }
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569791.html
>
> > The x86 add_stmt_cost has
> >
> >   /* If we do elementwise loads into a vector then we are bound by
> >      latency and execution resources for the many scalar loads
> >      (AGU and load ports).  Try to account for this by scaling the
> >      construction cost by the number of elements involved.  */
> >   if ((kind == vec_construct || kind == vec_to_scalar)
> >       && stmt_info
> >       && (STMT_VINFO_TYPE (stmt_info) == load_vec_info_type
> >           || STMT_VINFO_TYPE (stmt_info) == store_vec_info_type)
> >       && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_ELEMENTWISE
> >       && TREE_CODE (DR_STEP (STMT_VINFO_DATA_REF (stmt_info))) != INTEGER_CST)
> >     {
> >       stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
> >       stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
> >     }
> >
> > so it penalizes VMAT_ELEMENTWISE for variable step for both loads and stores.
> > The above materialized over PRs 84037, 85491 and 87561, so not specifically
> > for the bwaves ca
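The three restricting factors Kewen lists above can be combined into a single predicate. This is a minimal sketch, assuming hypothetical names (`struct load_stats`, `should_penalize_strided_ctors`) and caller-supplied placeholder thresholds; it is not the rs6000 code from the linked patch, whose actual thresholds and density computation are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters mirroring the ones named in the thread.  */
struct load_stats
{
  unsigned nload;          /* total loads in the vector body */
  unsigned nload_ctor;     /* strided scalar loads feeding CTORs */
  unsigned nctor_strided;  /* CTORs fed by strided loads */
};

/* Return true if the strided CTOR chains should be penalized.  All
   thresholds are placeholders supplied by the caller.  */
static bool
should_penalize_strided_ctors (const struct load_stats *s,
                               unsigned density_pct,
                               unsigned density_threshold,
                               unsigned min_nloads,
                               unsigned min_ctor_load_pct)
{
  if (s->nctor_strided == 0)
    return false;
  /* 1) Only dense vector bodies, where load stalls hit the computation.  */
  if (density_pct < density_threshold)
    return false;
  /* 2) A handful of loads will not saturate the load/store units.  */
  if (s->nload < min_nloads)
    return false;
  /* 3) Require a high share of loads feeding CTORs (integer percent).  */
  return 100u * s->nload_ctor >= min_ctor_load_pct * s->nload;
}
```

Checking each factor as an early exit keeps the penalty narrow: all three conditions must hold before any cost is added.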

Re: [i386] Fix ICE [PR target/100549]

2021-05-17 Thread Richard Biener via Gcc-patches
On Thu, May 13, 2021 at 11:43 AM Hongtao Liu via Gcc-patches
 wrote:
>
> Hi:
>   When arg0 is the same as arg1 in __builtin_ia32_pcmpgtw,
> gimple_build (&stmts, GT_EXPR, cmp_type, arg0, arg1) will simplify the
> comparison to the vector constant 0, so no stmts are generated, which
> causes an ICE in gsi_insert_before (gsi, stmts, GSI_SAME_STMT).  So don't
> insert stmts when it's NULL.
>
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
>   Ok for trunk?

It should use

  gsi_insert_seq_before (gsi, stmts, ...)

otherwise it will only insert the first stmt in the sequence
and gsi_insert_seq_before handles a NULL seq just fine.
(you might want to scan ix86_gimple_fold_builtin for more
similar errors).

OK with that change.

Richard.

> gcc/ChangeLog:
>
> PR target/100549
> * config/i386/i386.c (ix86_gimple_fold_builtin): Insert gimple
> stmts if stmts is not NULL.
>
> gcc/testsuite/ChangeLog:
>
> PR target/100549
> * gcc.target/i386/pr100549.c: New test.
>
> --
> BR,
> Hongtao
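To see why inserting only the first statement is wrong and why an empty sequence must be tolerated, here is a toy model in which a statement sequence is a singly linked list. `struct stmt` and `insert_seq_before` are hypothetical stand-ins, not GCC's gimple_seq API; the function splices the entire chain and treats a NULL sequence as a no-op, which is the behavior Richard describes for gsi_insert_seq_before.

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement node (not GIMPLE): a sequence is a possibly empty
   chain of these.  */
struct stmt
{
  int id;
  struct stmt *next;
};

/* Splice the whole chain SEQ in before the statement that *LINK
   currently points to (LINK is &head or &prev->next).  An empty (NULL)
   sequence is a no-op instead of a crash.  */
static void
insert_seq_before (struct stmt **link, struct stmt *seq)
{
  if (seq == NULL)
    return;
  struct stmt *last = seq;
  while (last->next != NULL)   /* Find the tail so nothing is dropped.  */
    last = last->next;
  last->next = *link;
  *link = seq;
}
```

Inserting only `seq` as a single node would overwrite `seq->next` and silently lose the rest of the chain, which is the bug class Richard points out.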


Re: [PATCH] LTO: merge -flto=foo both from IL and linker cmdline

2021-05-17 Thread Richard Biener via Gcc-patches
On Thu, May 13, 2021 at 1:49 PM Martin Liška  wrote:
>
> Hello.
>
> In g:3835aa0eb90292d652dd6b200f302f3cac7e643f, I changed the logic so that
> the output -flto=foo argument is taken from the IL files' command lines.
> However, it should also be merged with the linker command line: one can use
> -flto for compilation and -flto=16 for linking.
>
> Ready after it finishes tests?

OK.

Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * lto-wrapper.c (merge_flto_options): Factor out a new function.
> (merge_and_complain): Use it.
> (run_gcc): Merge also linker command line -flto=foo argument
> with IL files.
> ---
>   gcc/lto-wrapper.c | 118 +-
>   1 file changed, 65 insertions(+), 53 deletions(-)
>
> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> index a71d6147152..1c2643984f9 100644
> --- a/gcc/lto-wrapper.c
> +++ b/gcc/lto-wrapper.c
> @@ -189,6 +189,37 @@ find_option (vec &options, cl_decoded_option *option)
> return find_option (options, option->opt_index);
>   }
>
> +/* Merge -flto FOPTION into vector of DECODED_OPTIONS.  */
> +
> +static void
> +merge_flto_options (vec &decoded_options,
> +   cl_decoded_option *foption)
> +{
> +  int existing_opt = find_option (decoded_options, foption);
> +  if (existing_opt == -1)
> +decoded_options.safe_push (*foption);
> +  else
> +{
> +  if (strcmp (foption->arg, decoded_options[existing_opt].arg) != 0)
> +   {
> + /* -flto=auto is preferred.  */
> + if (strcmp (decoded_options[existing_opt].arg, "auto") == 0)
> +   ;
> + else if (strcmp (foption->arg, "auto") == 0
> +  || strcmp (foption->arg, "jobserver") == 0)
> +   decoded_options[existing_opt].arg = foption->arg;
> + else if (strcmp (decoded_options[existing_opt].arg,
> +  "jobserver") != 0)
> +   {
> + int n = atoi (foption->arg);
> + int original_n = atoi (decoded_options[existing_opt].arg);
> + if (n > original_n)
> +   decoded_options[existing_opt].arg = foption->arg;
> +   }
> +   }
> +}
> +}
> +
>   /* Try to merge and complain about options FDECODED_OPTIONS when applied
>  ontop of DECODED_OPTIONS.  */
>
> @@ -427,28 +458,7 @@ merge_and_complain (vec decoded_options,
>   break;
>
> case OPT_flto_:
> - if (existing_opt == -1)
> -   decoded_options.safe_push (*foption);
> - else
> -   {
> - if (strcmp (foption->arg, decoded_options[existing_opt].arg) != 0)
> -   {
> - /* -flto=auto is preferred.  */
> - if (strcmp (decoded_options[existing_opt].arg, "auto") == 0)
> -   ;
> - else if (strcmp (foption->arg, "auto") == 0
> -  || strcmp (foption->arg, "jobserver") == 0)
> -   decoded_options[existing_opt].arg = foption->arg;
> - else if (strcmp (decoded_options[existing_opt].arg,
> -  "jobserver") != 0)
> -   {
> - int n = atoi (foption->arg);
> - int original_n = atoi (decoded_options[existing_opt].arg);
> - if (n > original_n)
> -   decoded_options[existing_opt].arg = foption->arg;
> -   }
> -   }
> -   }
> + merge_flto_options (decoded_options, foption);
>   break;
> }
>   }
> @@ -1515,37 +1525,6 @@ run_gcc (unsigned argc, char *argv[])
> append_compiler_options (&argv_obstack, fdecoded_options);
> append_linker_options (&argv_obstack, decoded_options);
>
> -  /* Process LTO-related options on merged options.  */
> -  for (j = 1; j < fdecoded_options.length (); ++j)
> -{
> -  cl_decoded_option *option = &fdecoded_options[j];
> -  switch (option->opt_index)
> -   {
> -   case OPT_flto_:
> - if (strcmp (option->arg, "jobserver") == 0)
> -   {
> - parallel = 1;
> - jobserver = 1;
> -   }
> - else if (strcmp (option->arg, "auto") == 0)
> -   {
> - parallel = 1;
> - auto_parallel = 1;
> -   }
> - else
> -   {
> - parallel = atoi (option->arg);
> - if (parallel <= 1)
> -   parallel = 0;
> -   }
> - /* Fallthru.  */
> -
> -   case OPT_flto:
> - lto_mode = LTO_MODE_WHOPR;
> - break;
> -   }
> -}
> -
> /* Scan linker driver arguments for things that are of relevance to us.  */
> for (j = 1; j < decoded_options.length (); ++j)
>   {
> @@ -1574,6 +1553,8 @@ run_gcc (unsigned argc, char *argv[])
>   break;
>
> case OPT_flto_:
> + /* Merge linker -flto= option with what we have in IL files.  */
> + mer
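The preference order implemented by merge_flto_options in the patch above (an existing `auto` wins, an incoming `auto` or `jobserver` replaces the recorded value, otherwise the larger job count is kept) can be modeled on plain strings. `merge_flto_arg` is a hypothetical, stand-alone helper for illustration only; it works on option arguments rather than cl_decoded_option vectors, and it models "not seen yet" as NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the -flto= argument to keep, given the value already recorded
   (EXISTING, NULL if none) and a newly seen value (INCOMING).  */
static const char *
merge_flto_arg (const char *existing, const char *incoming)
{
  if (existing == NULL)                  /* Not seen yet: take it.  */
    return incoming;
  if (strcmp (existing, incoming) == 0)
    return existing;
  if (strcmp (existing, "auto") == 0)    /* -flto=auto is preferred.  */
    return existing;
  if (strcmp (incoming, "auto") == 0
      || strcmp (incoming, "jobserver") == 0)
    return incoming;
  if (strcmp (existing, "jobserver") != 0
      && atoi (incoming) > atoi (existing))
    return incoming;                     /* Keep the larger job count.  */
  return existing;
}
```

Under this model, compiling with plain -flto (nothing recorded for -flto=) and linking with -flto=16 yields 16, which is the case the patch is meant to handle.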

Re: [wwwdocs, patch] gcc-12/changes.html: Document -mptx for nvptx

2021-05-17 Thread Tom de Vries
On 5/17/21 10:49 AM, Tobias Burnus wrote:
> Early *PING*  - and I fixed a wording issue in my patch.
> 
> OK? Suggestions?
> 

LGTM, thanks.
- Tom

> Tobias
> 
> On 14.05.21 00:06, Tobias Burnus wrote:
>> Document this new flag, added in
>> https://gcc.gnu.org/g:2a1586401a21dcd43e0f904bb6eec26c8b2f366b
>> + https://gcc.gnu.org/onlinedocs/gcc/Nvidia-PTX-Options.html#index-mptx
>>
>> Any wording suggestions?
>>
>> Tobias
>>
>> PS: Some background remarks:
>>
>> (PTX ISA 3.1 is supported since NVidia's CUDA 5 while 6.3 is supported
>> since
>> CUDA 10.0 - and adds very useful new features; current is PTX ISA 7.3
>> (CUDA 11.3),* but on the PTX side, 6.3 adds a lot, >6.3 only few
>> features,
>> we still may want to support sometime in the future.)
>>
>> (The new flag paves the way for additional -misa= flags
>> (i.e. newer hardware, relevant for enabling ptx instructions which only
>> newer GPUs support) and newer GPU-hardware-independent PTX ISA features;
>> hence, either permitting better code generation or for be used to fix
>> bugs.
>> While this will change during GCC 12, currently, the generated code is
>> effectively the same with either -mptx= value.)
>>
>> (Regarding the produced instructions, the installed CUDA will JIT
>> (and then cache) the GCC-generated nvptx in the binary at startup,
>> optimizing for the available hardware - i.e. the chosen -mptx and
>> available -misa do not restrict the hardware ability, just that
>> PTX instructions which is only available in newer PTX / for newer
>> hardware may not be generated.)
>>
>> (* Cf.
>> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes__ptx-release-history
>>
>> )
>>


Re: [PATCH] middle-end/100464 - avoid spurious TREE_ADDRESSABLE in folding debug stmts

2021-05-17 Thread Richard Biener
On Wed, 12 May 2021, Martin Sebor wrote:

> On 5/7/21 4:21 AM, Richard Biener via Gcc-patches wrote:
> > On Fri, May 7, 2021 at 12:17 PM Richard Biener  wrote:
> >>
> >> canonicalize_constructor_val was setting TREE_ADDRESSABLE on bases
> >> of ADDR_EXPRs but that's futile when we're dealing with CTOR values
> >> in debug stmts.  This rips out the code which was added for Java
> >> and should have been an assertion when we didn't have debug stmts.
> >>
> >> Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages
> >> which revealed PR100468 for which I added the cp/class.c hunk below.
> >> Re-testing with that in progress.
> >>
> >> OK for trunk and branch?  It looks like this C++ code is new in GCC 11.
> > 
> > I mislooked, the code is old.
> > 
> > This hunk also breaks (or fixes) g++.dg/tree-ssa/array-temp1.C where
> > the gimplifier previously passed the
> > 
> > && (flag_merge_constants >= 2 || !TREE_ADDRESSABLE (object))
> > 
> > check guarding it against unifying addresses of different instances
> > of variables.  Clearly in the case of the testcase there are addresses to
> > this variable as part of the initializer list construction.  So the hunk
> > fixes wrong-code, but it breaks the testcase.
> > 
> > Any comments?  I can of course change the testcase accordingly.
> 
> Just one belated comment.  I realize you already pushed this change
> but...
> 
> > 
> > Thanks,
> > Richard.
> > 
> >> Thanks,
> >> Richard.
> >>
> >> 2021-05-07  Richard Biener  
> >>
> >>  PR middle-end/100464
> >>  PR c++/100468
> >> gcc/
> >>  * gimple-fold.c (canonicalize_constructor_val): Do not set
> >>  TREE_ADDRESSABLE.
> >>
> >> gcc/cp/
> >>  * call.c (set_up_extended_ref_temp): Mark the temporary
> >>  addressable if the TARGET_EXPR was.
> >>
> >> gcc/testsuite/
> >>  * gcc.dg/pr100464.c: New testcase.
> >> ---
> >>   gcc/cp/call.c   |  2 ++
> >>   gcc/gimple-fold.c   |  4 +++-
> >>   gcc/testsuite/gcc.dg/pr100464.c | 16 
> >>   3 files changed, 21 insertions(+), 1 deletion(-)
> >>   create mode 100644 gcc/testsuite/gcc.dg/pr100464.c
> >>
> >> diff --git a/gcc/cp/call.c b/gcc/cp/call.c
> >> index 57bac05fe70..ea97be22f07 100644
> >> --- a/gcc/cp/call.c
> >> +++ b/gcc/cp/call.c
> >> @@ -12478,6 +12478,8 @@ set_up_extended_ref_temp (tree decl, tree expr, vec **cleanups,
> >>VAR.  */
> >> if (TREE_CODE (expr) != TARGET_EXPR)
> >>   expr = get_target_expr (expr);
> >> +  else if (TREE_ADDRESSABLE (expr))
> >> +TREE_ADDRESSABLE (var) = 1;
> >>
> >> if (TREE_CODE (decl) == FIELD_DECL
> >> && extra_warnings && !TREE_NO_WARNING (decl))
> >> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> >> index aa33779b753..768ef89d876 100644
> >> --- a/gcc/gimple-fold.c
> >> +++ b/gcc/gimple-fold.c
> >> @@ -245,7 +245,9 @@ canonicalize_constructor_val (tree cval, tree from_decl)
> >> if (TREE_TYPE (base) == error_mark_node)
> >> return NULL_TREE;
> >> if (VAR_P (base))
> >> -   TREE_ADDRESSABLE (base) = 1;
> >> +   /* ???  We should be able to assert that TREE_ADDRESSABLE is set,
> >> +  but since the use can be in a debug stmt we can't.  */
> >> +   ;
> 
> ...as I mentioned before, I find these question marks confusing
> (and there are over a thousand instances of them in GCC sources).
> They bring the comment that follows into question.  Please consider
> spelling out in words what you mean instead.  (E.g., use FIXME: We
> should be able to assert...)

To me ??? is a synonym for FIXME.  The vectorizer has TODO.

There are 856 ??? comments in gcc/*.c and only 319 FIXME ones, so
it seems ??? is preferred.

I find ??? less confusing: FIXME is something that needs fixing, while
??? reads to me as "this could be done better but isn't a bug".

Anyway, just my personal preference and reading of course.

Richard.

> Martin
> 
> >> else if (TREE_CODE (base) == FUNCTION_DECL)
> >>  {
> >>/* Make sure we create a cgraph node for functions we'll reference.
> >> diff --git a/gcc/testsuite/gcc.dg/pr100464.c b/gcc/testsuite/gcc.dg/pr100464.c
> >> new file mode 100644
> >> index 000..46cc37dff54
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/pr100464.c
> >> @@ -0,0 +1,16 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O3 -fcompare-debug" } */
> >> +
> >> +int *a;
> >> +static int b, c, d, e, g, h;
> >> +int f;
> >> +void i() {
> >> +  int *j[] = {&e, &b, &b, &d, &b, &b, &g, &e, &g, &b, &b,
> >> +  &b, &b, &g, &e, &e, &b, &b, &d, &b, &b, &e,
> >> +  &e, &g, &b, &b, &b, &b, &g, &e, &g, &c, &e};
> >> +  int **k = &j[5];
> >> +  for (; f;)
> >> +b |= *a;
> >> +  *k = &h;
> >> +}
> >> +int main() {}
> >> --
> >> 2.26.2
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (

Re: [PATCH 1/2] vect: Add costing_for_scalar parameter to init_cost hook

2021-05-17 Thread Kewen.Lin via Gcc-patches
on 2021/5/17 4:55 PM, Richard Biener wrote:
> On Thu, May 13, 2021 at 9:04 AM Kewen.Lin  wrote:
>>
>> Hi!
>>
> But in the end the vector code shouldn't end up worse than the
> scalar code with respect to IVs - the cases where it would should
> be already costed.  So I wonder if you have specific examples
> where things go worse enough for the heuristic to trigger?
>

 One typical case that I worked on to reuse this density check is the
 function mat_times_vec of src file block_solver.fppized.f of SPEC2017
 503.bwaves_r; its density with the existing heuristic is 83 (unluckily,
 it doesn't exceed the threshold).  The interesting loop is the innermost
 one, with the option set "-O2 -mcpu=power8 -ffast-math -ftree-vectorize".
 We have verified that this loop isn't profitable to vectorize at
 O2 (without loop-interchange).
>>>
>>> Yeah, but that's because the loop only runs 5 iterations, not because
>>> of some "density" (which suggests AGU overloading or some such)?
>>> If you modify it so it iterates more while keeping the
>>> "density" measurement constant, do you suddenly become profitable?
>>>
>>
>> Yes, I agree this isn't a perfect example of how the density check
>> matters, though it led me to find this check.  I tried to run SPEC2017
>> benchmarks with and without this density heuristic to catch some
>> "expected" case, but unluckily failed to.  It may be worth trying with
>> some more option sets or even testing with the previous SPECs later.
>>
>> I hacked the innermost loop iteration count from 5 to 20, but the baseline
>> run didn't finish (I killed it after more than 7 hours); I suspect it became
>> endless because of some garbage (out-of-bounds) data.
>>
>> But the current cost modeling for this loop on Power is still bad: the
>> min profitable iteration count (both static and run time) is evaluated
>> as 2, while in reality even 5 iterations aren't profitable.
>>
>>
>>> The loop does have quite many memory streams so optimizing
>>> the (few) arithmetic ops by vectorizing them might not be worth
>>> the trouble, esp. since most of the loads are "strided" (composed
>>> from scalars) when no interchange is performed.  So it's probably
>>> more a "density" of # memory streams vs. # arithmetic ops, and
>>> esp. with any non-consecutive vector loads this balance being
>>> worse in the vector case?
>>>
>>
>> Yeah, these many scalar "strided" loads make things worse.  The vector
>> CTORs they feed have to wait for all of their required loads to be ready,
>> and these vector CTORs are required by further multiplications.
>>
>> I posted one patch[1] on this, which tries to model it with
>> some counts: nload (total load number), nload_ctor (strided
>> load number fed into CTOR) and nctor_strided (CTOR number fed
>> by strided load).
>>
>> The penalization is restricted by considering some factors:
>>   1) vect density ratio: if there are many vector instructions,
>>      the stalls from loads are likely to impact the subsequent
>>      computation.
>>   2) total load number: if nload is small, it's unlikely to
>>      bother the load/store units much.
>>   3) percentage of strided loads fed into CTORs: if a high portion
>>      of strided loads feed into CTORs, they are very likely to block
>>      the CTORs and their subsequent chain.
>>
>> btw, as per your previous comments on add_stmt_cost, the load/strided/ctor
>> statistics should be gathered there instead, like:
>>
>>   if (!data->costing_for_scalar && data->loop_info && where == vect_body)
>>     {
>>       if (kind == scalar_load || kind == vector_load
>>           || kind == unaligned_load || kind == vector_gather_load)
>>         data->nload += count;
>>       if (stmt_info && STMT_VINFO_STRIDED_P (stmt_info))
>>         {
>>           if (kind == scalar_load || kind == unaligned_load)
>>             data->nload_ctor += count;
>>           else if (kind == vec_construct)
>>             data->nctor_strided += count;
>>         }
>>     }
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569791.html
>>
>>> The x86 add_stmt_cost has
>>>
>>>   /* If we do elementwise loads into a vector then we are bound by
>>>      latency and execution resources for the many scalar loads
>>>      (AGU and load ports).  Try to account for this by scaling the
>>>      construction cost by the number of elements involved.  */
>>>   if ((kind == vec_construct || kind == vec_to_scalar)
>>>       && stmt_info
>>>       && (STMT_VINFO_TYPE (stmt_info) == load_vec_info_type
>>>           || STMT_VINFO_TYPE (stmt_info) == store_vec_info_type)
>>>       && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_ELEMENTWISE
>>>       && TREE_CODE (DR_STEP (STMT_VINFO_DATA_REF (stmt_info))) != INTEGER_CST)
>>>     {
>>>       stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
>>>       stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
>>>     }
>>>
>>> so it penalizes VMAT_ELEMENTWISE for variable step for both loads and
>>> st
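As a worked example of the quoted x86 scaling: for an elementwise access with a non-constant step, a vec_construct with base cost c for a four-element vector is charged c * (4 + 1) = 5c. The helper below is a hypothetical model of just that arithmetic; `base_cost` stands in for the value returned by ix86_builtin_vectorization_cost, and the two integer flags replace the STMT_VINFO checks.

```c
#include <assert.h>

/* Model of the x86 scaling: an elementwise (VMAT_ELEMENTWISE) access
   with a non-constant step multiplies the construction cost by
   (number of vector elements + 1); anything else keeps the base cost.  */
static int
scaled_elementwise_cost (int base_cost, int nunits, int elementwise,
                         int constant_step)
{
  if (elementwise && !constant_step)
    return base_cost * (nunits + 1);
  return base_cost;
}
```

The "+ 1" means even a two-element vector is charged three times the base cost, so wide vectors built from many scalar loads become expensive quickly.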

Re: [PATCH] Fix ICE in output_rnglists, at dwarf2out.c:12294 [PR100515]

2021-05-17 Thread Christophe Lyon via Gcc-patches
On Wed, 12 May 2021 at 10:24, Richard Biener  wrote:
>
> On Wed, 12 May 2021, Bernd Edlinger wrote:
>
> > Hi,
> >
> > this fixes another regression from my previous patch.
> >
> >
> > Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> > Is it OK for trunk?
>
> OK.
>
> Richard.
>

Hi,

As the new test uses -fopenmp, it fails on targets that do not support it.

I've committed the attached patch to skip the test in such cases.

Christophe

> >
> > Thanks
> > Bernd.
> >
commit b050cf6a4d9c305daff4a96e5a2489ece69dc287
Author: Christophe Lyon 
Date:   Mon May 17 09:25:43 2021 +

testsuite: Require openmp effective-target for PR100515

The related test uses -fopenmp, which is not supported by newlib-based
targets such as arm-eabi or aarch64-elf.

Requiring the openmp effective-target makes the test unsupported
rather than failed.

2021-05-17  Christophe Lyon  

PR debug/100515
gcc/testsuite
* gcc.dg/debug/dwarf2/pr100515.c: Require openmp effective-target.

diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr100515.c b/gcc/testsuite/gcc.dg/debug/dwarf2/pr100515.c
index 7c72fcd6693..17f6463cc6e 100644
--- a/gcc/testsuite/gcc.dg/debug/dwarf2/pr100515.c
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr100515.c
@@ -1,5 +1,6 @@
 /* PR debug/100515 */
 /* { dg-do compile } */
+/* { dg-require-effective-target fopenmp } */
 /* { dg-options "-g -O2 -fopenmp" } */
 
 void


Re: [i386] Fix ICE [PR target/100549]

2021-05-17 Thread Hongtao Liu via Gcc-patches
On Mon, May 17, 2021 at 5:01 PM Richard Biener
 wrote:
>
> On Thu, May 13, 2021 at 11:43 AM Hongtao Liu via Gcc-patches
>  wrote:
> >
> > Hi:
> >   When arg0 is the same as arg1 in __builtin_ia32_pcmpgtw,
> > gimple_build (&stmts, GT_EXPR, cmp_type, arg0, arg1) will simplify the
> > comparison to the vector constant 0, so no stmts are generated, which
> > causes an ICE in gsi_insert_before (gsi, stmts, GSI_SAME_STMT).  So don't
> > insert stmts when it's NULL.
> >
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
> >   Ok for trunk?
>
> It should use
>
>   gsi_insert_seq_before (gsi, stmts, ...)
>
> otherwise it will only insert the first stmt in the sequence
> and gsi_insert_seq_before handles a NULL seq just fine.

Oh, good to know.

> (you might want to scan ix86_gimple_fold_builtin for more
> similar errors).

Others are fine.

>
> OK with that change.
>
> Richard.
>
> > gcc/ChangeLog:
> >
> > PR target/100549
> > * config/i386/i386.c (ix86_gimple_fold_builtin): Insert gimple
> > stmts if stmts is not NULL.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/100549
> > * gcc.target/i386/pr100549.c: New test.
> >
> > --
> > BR,
> > Hongtao

Thanks for the review.


-- 
BR,
Hongtao

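A note on why `stmts` can be empty here: a signed lane-wise "greater than" of a vector with itself is false in every lane, so `gimple_build` is able to fold `GT_EXPR (arg0, arg0)` to a zero vector constant without emitting any statement. The sketch below illustrates the folded result portably with GCC's generic vector extensions rather than the x86 builtin; `v8hi`, `cmpgt` and `check_self_gt_is_zero` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Eight signed 16-bit lanes: the same shape as the operands of
   __builtin_ia32_pcmpgtw.  */
typedef int16_t v8hi __attribute__ ((vector_size (16)));

/* With GCC vector extensions, each lane of a comparison result is
   -1 (all bits set) when true and 0 when false.  */
static v8hi
cmpgt (v8hi a, v8hi b)
{
  return a > b;
}

/* Comparing X with itself is false in every lane, so the result is the
   all-zero constant that GT_EXPR (arg0, arg0) folds to.  */
static void
check_self_gt_is_zero (v8hi x)
{
  v8hi r = cmpgt (x, x);
  for (int i = 0; i < 8; i++)
    assert (r[i] == 0);
}
```

Because the folded value is a constant, there is nothing to insert, which is why a sequence-aware insertion routine that accepts an empty sequence is the natural fit.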

Re: [PATCH] testsuite/arm: Fix and rename arm_qbit_ok into arm_sat_ok effective-target

2021-05-17 Thread Christophe Lyon via Gcc-patches
ping?

On Fri, 30 Apr 2021 at 16:22, Christophe Lyon
 wrote:
>
> ping?
>
> On Wed, 21 Apr 2021 at 22:48, Christophe Lyon
>  wrote:
> >
> > The acle/saturation.c test uses the __[su]sat() and
> > __saturation_occurred() intrinsics, but __[su]sat() are defined in
> > arm_acle.h only if __ARM_FEATURE_SAT is true, while
> > __saturation_occurred() depends on __ARM_FEATURE_QBIT.
> >
> > QBIT is a v5te feature, while SAT is available since v6, so the test
> > really needs __ARM_FEATURE_SAT, to have both available.
> >
> > This patch renames arm_qbit_ok into arm_sat_ok and checks
> > __ARM_FEATURE_SAT. It updates acle/saturation.c accordingly.
> >
> > This enables the test to pass on arm-eabi with the default cpu/fpu/mode,
> > where arm_qbit_ok previously selected -march=armv5te; arm_sat_ok now
> > selects -march=armv6.
> >
> > 2021-04-22  Christophe Lyon  
> >
> > gcc/
> > * doc/sourcebuild.texi (arm_qbit_ok): Rename into...
> > (arm_sat_ok): ...this.
> >
> > gcc/testsuite/
> > * gcc.target/arm/acle/saturation.c: Use arm_sat_ok effective
> > target.
> > * lib/target-supports.exp
> > (check_effective_target_arm_qbit_ok_nocache): Rename into...
> > (check_effective_target_arm_sat_ok_nocache): ... this. Check
> > __ARM_FEATURE_SAT and use armv6.
> > ---
> >  gcc/doc/sourcebuild.texi   |  6 ++--
> >  gcc/testsuite/gcc.target/arm/acle/saturation.c |  4 +--
> >  gcc/testsuite/lib/target-supports.exp  | 42 +-
> >  3 files changed, 26 insertions(+), 26 deletions(-)
> >
> > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> > index b5bdd4f..4d9ec3c 100644
> > --- a/gcc/doc/sourcebuild.texi
> > +++ b/gcc/doc/sourcebuild.texi
> > @@ -2041,9 +2041,9 @@ ARM Target supports options suitable for accessing the SIMD32 intrinsics from
> >  @code{arm_acle.h}.
> >  Some multilibs may be incompatible with these options.
> >
> > -@item arm_qbit_ok
> > -@anchor{arm_qbit_ok}
> > -ARM Target supports options suitable for accessing the Q-bit manipulation
> > +@item arm_sat_ok
> > +@anchor{arm_sat_ok}
> > +ARM Target supports options suitable for accessing the saturation
> >  intrinsics from @code{arm_acle.h}.
> >  Some multilibs may be incompatible with these options.
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/acle/saturation.c b/gcc/testsuite/gcc.target/arm/acle/saturation.c
> > index 0b3fe51..a9f99e5 100644
> > --- a/gcc/testsuite/gcc.target/arm/acle/saturation.c
> > +++ b/gcc/testsuite/gcc.target/arm/acle/saturation.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> > -/* { dg-require-effective-target arm_qbit_ok } */
> > -/* { dg-add-options arm_qbit } */
> > +/* { dg-require-effective-target arm_sat_ok } */
> > +/* { dg-add-options arm_sat } */
> >
> >  #include 
> >
> > diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> > index a522da3..5fab170 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -4168,24 +4168,24 @@ proc add_options_for_arm_simd32 { flags } {
> >  return "$flags $et_arm_simd32_flags"
> >  }
> >
> > -# Return 1 if this is an ARM target supporting the saturation intrinsics
> > -# from arm_acle.h.  Some multilibs may be incompatible with these options.
> > -# Also set et_arm_qbit_flags to the best options to add.
> > -# arm_acle.h includes stdint.h which can cause trouble with incompatible
> > -# -mfloat-abi= options.
> > -
> > -proc check_effective_target_arm_qbit_ok_nocache { } {
> > -global et_arm_qbit_flags
> > -set et_arm_qbit_flags ""
> > -foreach flags {"" "-march=armv5te" "-march=armv5te -mfloat-abi=softfp" "-march=armv5te -mfloat-abi=hard"} {
> > -  if { [check_no_compiler_messages_nocache et_arm_qbit_flags object {
> > +# Return 1 if this is an ARM target supporting the __ssat and __usat
> > +# saturation intrinsics from arm_acle.h.  Some multilibs may be
> > +# incompatible with these options.  Also set et_arm_sat_flags to the
> > +# best options to add.  arm_acle.h includes stdint.h which can cause
> > +# trouble with incompatible -mfloat-abi= options.
> > +
> > +proc check_effective_target_arm_sat_ok_nocache { } {
> > +global et_arm_sat_flags
> > +set et_arm_sat_flags ""
> > +foreach flags {"" "-march=armv6" "-march=armv6 -mfloat-abi=softfp" "-march=armv6 -mfloat-abi=hard -mfpu=vfp"} {
> > +  if { [check_no_compiler_messages_nocache et_arm_sat_flags object {
> > #include 
> > int dummy;
> > -   #ifndef __ARM_FEATURE_QBIT
> > -   #error not QBIT
> > +   #ifndef __ARM_FEATURE_SAT
> > +   #error not SAT
> > #endif
> >} "$flags"] } {
> > -   set et_arm_qbit_flags $flags
> > +   set et_arm_sat_flags $flags
> > return 1
> >}
> >  }
> > @@ -4193,17 +4193,17 @@ proc check_effective_target_arm_qbit_ok_nocache { } {
> >return 0
> >  }
> >
> > -proc check_effecti

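For reference, `__ssat (x, n)` clamps a 32-bit value to the signed n-bit range and `__usat (x, n)` clamps it to the unsigned n-bit range; in the ACLE description the real intrinsics additionally set the Q flag when clamping occurs, which is what `__saturation_occurred ()` observes. The portable model below is a hypothetical illustration of the arithmetic only (`ssat_ref` and `usat_ref` are made-up names) and does not model the Q flag.

```c
#include <assert.h>
#include <stdint.h>

/* Model of signed saturation to a BITS-bit two's complement range,
   the arithmetic performed by __ssat (x, bits).  BITS is 1..32.  */
static int32_t
ssat_ref (int32_t x, unsigned bits)
{
  assert (bits >= 1 && bits <= 32);
  int64_t hi = ((int64_t) 1 << (bits - 1)) - 1;
  int64_t lo = -((int64_t) 1 << (bits - 1));
  if (x > hi)
    return (int32_t) hi;
  if (x < lo)
    return (int32_t) lo;
  return x;
}

/* Model of unsigned saturation to BITS bits (__usat).  BITS is 0..31;
   negative inputs clamp to 0.  */
static uint32_t
usat_ref (int32_t x, unsigned bits)
{
  assert (bits <= 31);
  int64_t hi = ((int64_t) 1 << bits) - 1;
  if (x < 0)
    return 0;
  if (x > hi)
    return (uint32_t) hi;
  return (uint32_t) x;
}
```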
Re: [PATCH] testsuite/arm: Improve mve-vshr.c

2021-05-17 Thread Christophe Lyon via Gcc-patches
ping?

On Mon, 10 May 2021 at 13:22, Christophe Lyon
 wrote:
>
> Ping?
>
> On Tue, 27 Apr 2021 at 13:32, Christophe Lyon
>  wrote:
> >
> > Vector right shifts by immediate use vshr, while right shifts by
> > vectors instead use vneg and vshl.
> >
> > This patch adds the corresponding scan-assembler-times that were
> > missing.
> >
> > 2021-04-22  Christophe Lyon  
> >
> > gcc/testsuite/
> > * gcc.target/arm/simd/mve-vshr.c: Add more scan-assembler-times.
> > ---
> >  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > index d4e658c..d4258e9 100644
> > --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > @@ -55,5 +55,12 @@ FUNC_IMM(u, uint, 8, 16, >>, vshrimm)
> >
> >  /* MVE has only 128-bit vectors, so we can vectorize only half of the
> > functions above.  */
> > +/* Vector right shifts use vneg and left shifts.  */
> > +/* { dg-final { scan-assembler-times {vshl.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> > +/* { dg-final { scan-assembler-times {vshl.u[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> > +/* { dg-final { scan-assembler-times {vneg.s[0-9]+  q[0-9]+, q[0-9]+} 6 } } */
> > +
> > +
> > +/* Shift by immediate.  */
> >  /* { dg-final { scan-assembler-times {vshr.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> >  /* { dg-final { scan-assembler-times {vshr.u[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> > --
> > 2.7.4
> >

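The vneg + vshl sequence works because a register-controlled VSHL treats a negative per-lane shift count as a right shift, so `a >> b` can be computed as `vshl (a, -b)`, whereas a shift by an immediate can use vshr directly. Below is a scalar model of one signed 32-bit lane under that assumption; `vshl_lane_s32` and `shr_via_vneg_vshl` are hypothetical names, and the signed right shift relies on GCC implementing `>>` on negative values as an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one lane of a register-controlled VSHL (assumed semantics):
   a non-negative COUNT shifts left, a negative COUNT shifts right.  */
static int32_t
vshl_lane_s32 (int32_t x, int32_t count)
{
  assert (count > -32 && count < 32);
  if (count >= 0)
    return (int32_t) ((uint32_t) x << count);
  return x >> -count;   /* Arithmetic shift with GCC.  */
}

/* Right shift by a per-lane amount B, done the way the MVE expansion
   does it: negate the amount (vneg), then shift "left" (vshl).  */
static int32_t
shr_via_vneg_vshl (int32_t x, int32_t b)
{
  return vshl_lane_s32 (x, -b);
}
```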

Re: [PATCH] testsuite/arm: Factorize and increase coverage in mve-sub_1.c

2021-05-17 Thread Christophe Lyon via Gcc-patches
ping?

On Mon, 10 May 2021 at 13:22, Christophe Lyon
 wrote:
>
> Ping?
>
> On Tue, 27 Apr 2021 at 13:32, Christophe Lyon
>  wrote:
> >
> > Use a template macro to factorize the existing test functions.
> >
> > This patch also adds a version to check subtraction with __fp16 type.
> >
> > 2021-04-26  Christophe Lyon  
> >
> > gcc/testsuite/
> > * gcc.target/arm/simd/mve-vsub_1.c: Factorize and add __fp16 test.
> > ---
> >  gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c | 60 +-
> >  1 file changed, 21 insertions(+), 39 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
> > index 842e5c6..5a6c345 100644
> > --- a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
> > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
> > @@ -5,60 +5,42 @@
> >
> >  #include 
> >
> > -void test_vsub_i32 (int32_t * dest, int32_t * a, int32_t * b) {
> > -  int i;
> > -  for (i=0; i<4; i++) {
> > -dest[i] = a[i] - b[i];
> > -  }
> > +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)   \
> > +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, \
> > +TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> > +int i; \
> > +for (i=0; i<NB; i++) { \
> > +  dest[i] = a[i] OP b[i];  \
> > +}  \
> >  }
> >
> > -void test_vsub_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
> > -  int i;
> > -  for (i=0; i<4; i++) {
> > -dest[i] = a[i] - b[i];
> > -  }
> > -}
> > +/* 128-bit vectors.  */
> > +FUNC(s, int, 32, 4, -, vsub)
> > +FUNC(u, uint, 32, 4, -, vsub)
> > +FUNC(s, int, 16, 8, -, vsub)
> > +FUNC(u, uint, 16, 8, -, vsub)
> > +FUNC(s, int, 8, 16, -, vsub)
> > +FUNC(u, uint, 8, 16, -, vsub)
> >
> >  /* { dg-final { scan-assembler-times {vsub\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > -
> > -void test_vsub_i16 (int16_t * dest, int16_t * a, int16_t * b) {
> > -  int i;
> > -  for (i=0; i<8; i++) {
> > -dest[i] = a[i] - b[i];
> > -  }
> > -}
> > -
> > -void test_vsub_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
> > -  int i;
> > -  for (i=0; i<8; i++) {
> > -dest[i] = a[i] - b[i];
> > -  }
> > -}
> > -
> >  /* { dg-final { scan-assembler-times {vsub\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > +/* { dg-final { scan-assembler-times {vsub\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> >
> > -void test_vsub_i8 (int8_t * dest, int8_t * a, int8_t * b) {
> > -  int i;
> > -  for (i=0; i<16; i++) {
> > -dest[i] = a[i] - b[i];
> > -  }
> > -}
> > -
> > -void test_vsub_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
> > +void test_vsub_f32 (float * dest, float * a, float * b) {
> >int i;
> > -  for (i=0; i<16; i++) {
> > +  for (i=0; i<4; i++) {
> >  dest[i] = a[i] - b[i];
> >}
> >  }
> > +/* { dg-final { scan-assembler-times {vsub\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> >
> > -/* { dg-final { scan-assembler-times {vsub\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> >
> > -void test_vsub_f32 (float * dest, float * a, float * b) {
> > +void test_vsub_f16 (__fp16 * dest, __fp16 * a, __fp16 * b) {
> >int i;
> > -  for (i=0; i<4; i++) {
> > +  for (i=0; i<8; i++) {
> >  dest[i] = a[i] - b[i];
> >}
> >  }
> >
> > -/* { dg-final { scan-assembler-times {vsub\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsub\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> >
> > --
> > 2.7.4
> >

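To make the token pasting in the `FUNC` template concrete, here is a portable sketch of the macro with two instantiations; it assumes the loop bound is the `NB` parameter, as in the other MVE tests. `FUNC(s, int, 32, 4, -, vsub)` defines `test_vsub_s32x4`, which on MVE is expected to become a single `vsub.i32` on q registers.

```c
#include <assert.h>
#include <stdint.h>

/* Defines test_<NAME>_<SIGN><BITS>x<NB>, which applies OP lane by lane
   to NB elements of type <TYPE><BITS>_t.  */
#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)                    \
  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB             \
    (TYPE##BITS##_t * __restrict__ dest,                        \
     TYPE##BITS##_t *a, TYPE##BITS##_t *b) {                    \
    int i;                                                      \
    for (i=0; i<NB; i++) {                                      \
      dest[i] = a[i] OP b[i];                                   \
    }                                                           \
  }

/* Expands to: void test_vsub_s32x4 (int32_t *__restrict__ dest,
                                      int32_t *a, int32_t *b).  */
FUNC(s, int, 32, 4, -, vsub)
/* Expands to: void test_vsub_u8x16 (uint8_t *__restrict__ dest,
                                      uint8_t *a, uint8_t *b).  */
FUNC(u, uint, 8, 16, -, vsub)
```

The `__restrict__` on `dest` tells the vectorizer that the output does not alias the inputs, so no runtime alias check should be needed.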

Re: [PATCH] testsuite/arm: Add mve-vadd-1.c test

2021-05-17 Thread Christophe Lyon via Gcc-patches
ping?

On Mon, 10 May 2021 at 13:22, Christophe Lyon
 wrote:
>
> Ping?
>
> On Tue, 27 Apr 2021 at 13:32, Christophe Lyon
>  wrote:
> >
> > Support for vadd has been present for a while, but it was lacking a
> > test.
> >
> > 2021-04-22  Christophe Lyon  
> >
> > gcc/testsuite/
> > * gcc.target/arm/simd/mve-vadd-1.c: New.
> > ---
> >  gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c | 43 ++
> >  1 file changed, 43 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c
> > new file mode 100644
> > index 000..15a9daa
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c
> > @@ -0,0 +1,43 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +#include 
> > +
> > +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)   \
> > +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, \
> > +TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> > +int i; \
> > +for (i=0; i<NB; i++) { \
> > +  dest[i] = a[i] OP b[i];  \
> > +}  \
> > +}
> > +
> > +/* 128-bit vectors.  */
> > +FUNC(s, int, 32, 4, +, vadd)
> > +FUNC(u, uint, 32, 4, +, vadd)
> > +FUNC(s, int, 16, 8, +, vadd)
> > +FUNC(u, uint, 16, 8, +, vadd)
> > +FUNC(s, int, 8, 16, +, vadd)
> > +FUNC(u, uint, 8, 16, +, vadd)
> > +
> > +/* { dg-final { scan-assembler-times {vadd\.i32  q[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > +/* { dg-final { scan-assembler-times {vadd\.i16  q[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > +/* { dg-final { scan-assembler-times {vadd\.i8  q[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > +
> > +void test_vadd_f32 (float * dest, float * a, float * b) {
> > +  int i;
> > +  for (i=0; i<4; i++) {
> > +dest[i] = a[i] + b[i];
> > +  }
> > +}
> > +/* { dg-final { scan-assembler-times {vadd\.f32 q[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > +
> > +void test_vadd_f16 (__fp16 * dest, __fp16 * a, __fp16 * b) {
> > +  int i;
> > +  for (i=0; i<8; i++) {
> > +dest[i] = a[i] + b[i];
> > +  }
> > +}
> > +/* { dg-final { scan-assembler-times {vadd\.f16 q[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > --
> > 2.7.4
> >


Re: [PATCH] testsuite/arm: Add mve-vadd-scalar-1.c test

2021-05-17 Thread Christophe Lyon via Gcc-patches
ping?

On Mon, 10 May 2021 at 13:22, Christophe Lyon
 wrote:
>
> Ping?
>
> On Fri, 30 Apr 2021 at 16:06, Christophe Lyon
>  wrote:
> >
> > This patch adds a test for the scalar mode of vadd, noting in
> > particular that we do not yet use the T2 variants of vadd, which take
> > a scalar as their final argument.
> >
> > 2021-04-22  Christophe Lyon  
> >
> > gcc/testsuite/
> > * gcc.target/arm/simd/mve-vadd-scalar-1.c: New.
> > ---
> >  .../gcc.target/arm/simd/mve-vadd-scalar-1.c| 47 ++
> >  1 file changed, 47 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c
> > new file mode 100644
> > index 000..bbf70e1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c
> > @@ -0,0 +1,47 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +#include 
> > +
> > +#define FUNC_IMM(SIGN, TYPE, BITS, NB, OP, NAME)   \
> > +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, \
> > +TYPE##BITS##_t *a) { \
> > +int i; \
> > +for (i=0; i<NB; i++) { \
> > +  dest[i] = a[i] OP 1; \
> > +}  \
> > +}
> > +
> > +/* 128-bit vectors.  */
> > +FUNC_IMM(s, int, 32, 4, +, vaddimm)
> > +FUNC_IMM(u, uint, 32, 4, +, vaddimm)
> > +FUNC_IMM(s, int, 16, 8, +, vaddimm)
> > +FUNC_IMM(u, uint, 16, 8, +, vaddimm)
> > +FUNC_IMM(s, int, 8, 16, +, vaddimm)
> > +FUNC_IMM(u, uint, 8, 16, +, vaddimm)
> > +
> > +/* For the moment we do not select the T2 vadd variant operating on a scalar
> > +   final argument.  */
> > +/* { dg-final { scan-assembler-times {vadd\.i32  q[0-9]+, q[0-9]+, r[0-9]+} 2 { xfail *-*-* } } } */
> > +/* { dg-final { scan-assembler-times {vadd\.i16  q[0-9]+, q[0-9]+, r[0-9]+} 2 { xfail *-*-* } } } */
> > +/* { dg-final { scan-assembler-times {vadd\.i8  q[0-9]+, q[0-9]+, r[0-9]+} 2 { xfail *-*-* } } } */
> > +
> > +void test_vaddimm_f32 (float * dest, float * a) {
> > +  int i;
> > +  for (i=0; i<4; i++) {
> > +dest[i] = a[i] + 5.0;
> > +  }
> > +}
> > +/* { dg-final { scan-assembler-times {vadd\.f32 q[0-9]+, q[0-9]+, r[0-9]+} 1 { xfail *-*-* } } } */
> > +
> > +/* Note that dest[i] = a[i] + 5.0f16 is not vectorized.  */
> > +void test_vaddimm_f16 (__fp16 * dest, __fp16 * a) {
> > +  int i;
> > +  __fp16 b = 5.0f16;
> > +  for (i=0; i<8; i++) {
> > +dest[i] = a[i] + b;
> > +  }
> > +}
> > +/* { dg-final { scan-assembler-times {vadd\.f16 q[0-9]+, q[0-9]+, r[0-9]+} 1 { xfail *-*-* } } } */
> > --
> > 2.7.4
> >

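Semantically, the T2 (vector, scalar) form of vadd adds the same scalar to every lane, which equals broadcasting the scalar into a vector and then doing a vector-by-vector add; selecting it would therefore be an instruction-selection improvement rather than a change in results. The portable sketch below shows that equivalence with GCC vector extensions, where a scalar operand of a vector `+` is broadcast to all lanes; `v4si`, `add_scalar` and `add_splat` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Vector + scalar: GCC broadcasts S to every lane, which is what a
   (vector, scalar) vadd form computes.  */
static v4si
add_scalar (v4si a, int32_t s)
{
  return a + s;
}

/* Vector + vector with the scalar splatted by hand (vdup-style).  */
static v4si
add_splat (v4si a, int32_t s)
{
  v4si splat = { s, s, s, s };
  return a + splat;
}
```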

Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp

2021-05-17 Thread Christophe Lyon via Gcc-patches
ping?

On Wed, 5 May 2021 at 16:08, Christophe Lyon  wrote:
>
> On Tue, 4 May 2021 at 15:41, Christophe Lyon  
> wrote:
> >
> > On Tue, 4 May 2021 at 13:29, Andre Vieira (lists)
> >  wrote:
> > >
> > > Hi Christophe,
> > >
> > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > Since MVE has a different set of vector comparison operators from
> > > > Neon, we have to update the expansion to take into account the new
> > > > ones, for instance 'NE', for which MVE does not need to use 'EQ'
> > > > with the inverted condition.
> > > >
> > > > Conversely, Neon supports comparisons with #0, but MVE does not.
> > > >
> > > > For:
> > > > typedef long int vs32 __attribute__((vector_size(16)));
> > > > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> > > >
> > > > we now generate:
> > > > cmp_eq_vs32_reg:
> > > >   vldr.64 d4, .L123   @ 8 [c=8 l=4]  *mve_movv4si/8
> > > >   vldr.64 d5, .L123+8
> > > >   vldr.64 d6, .L123+16@ 9 [c=8 l=4]  *mve_movv4si/8
> > > >   vldr.64 d7, .L123+24
> > > >   vcmp.i32  eq, q0, q1@ 7 [c=16 l=4]  mve_vcmpeqq_v4si
> > > >   vpsel q0, q3, q2@ 15[c=8 l=4]  mve_vpselq_sv4si
> > > >   bx  lr  @ 26[c=8 l=4]  *thumb2_return
> > > > .L124:
> > > >   .align  3
> > > > .L123:
> > > >   .word   0
> > > >   .word   0
> > > >   .word   0
> > > >   .word   0
> > > >   .word   1
> > > >   .word   1
> > > >   .word   1
> > > >   .word   1
> > > >
> > > > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode)) produces
> > > > a pair of vldr instead of vmov.i32 qX, #0.
> > > I think ideally we would even want:
> > > vpte  eq, q0, q1
> > > vmovt.i32 q0, #0
> > > vmove.i32 q0, #1
> > >
> > > But we don't have a way to generate VPT blocks with multiple
> > > instructions yet unfortunately so I guess VPSEL will have to do for now.
> >
> > TBH,  I looked at what LLVM generates currently ;-)
> >
>
> Here is an updated version, which adds
> && (! || flag_unsafe_math_optimizations)
> to vcond_mask_
>
> This condition was not present in the neon.md version I moved to vec-common.md,
> but since the VDQW iterator includes V2SF and V4SF, it should take
> floating-point flags into account.
>
> Christophe
>
> > >
> > > >
> > > > 2021-03-01  Christophe Lyon  
> > > >
> > > >   gcc/
> > > >   * config/arm/arm-protos.h (arm_expand_vector_compare): Update
> > > >   prototype.
> > > >   * config/arm/arm.c (arm_expand_vector_compare): Add support for
> > > >   MVE.
> > > >   (arm_expand_vcond): Likewise.
> > > >   * config/arm/iterators.md (supf): Remove VCMPNEQ_S, VCMPEQQ_S,
> > > >   VCMPEQQ_N_S, VCMPNEQ_N_S.
> > > >   (VCMPNEQ, VCMPEQQ, VCMPEQQ_N, VCMPNEQ_N): Remove.
> > > >   * config/arm/mve.md (@mve_vcmpq_): Add '@' prefix.
> > > >   (@mve_vcmpq_f): Likewise.
> > > >   (@mve_vcmpq_n_f): Likewise.
> > > >   (@mve_vpselq_): Likewise.
> > > >   (@mve_vpselq_f"): Likewise.
> > > >   * config/arm/neon.md (vec_cmp > > >   and move to vec-common.md.
> > > >   (vec_cmpu): Likewise.
> > > >   (vcond): Likewise.
> > > >   (vcond): Likewise.
> > > >   (vcondu): Likewise.
> > > >   (vcond_mask_): Likewise.
> > > >   * config/arm/unspecs.md (VCMPNEQ_U, VCMPNEQ_S, VCMPEQQ_S)
> > > >   (VCMPEQQ_N_S, VCMPNEQ_N_S, VCMPEQQ_U, CMPEQQ_N_U, VCMPNEQ_N_U)
> > > >   (VCMPGEQ_N_S, VCMPGEQ_S, VCMPGTQ_N_S, VCMPGTQ_S, VCMPLEQ_N_S)
> > > >   (VCMPLEQ_S, VCMPLTQ_N_S, VCMPLTQ_S, VCMPCSQ_N_U, VCMPCSQ_U)
> > > >   (VCMPHIQ_N_U, VCMPHIQ_U): Remove.
> > > >   * config/arm/vec-common.md (vec_cmp > > >   from neon.md.
> > > >   (vec_cmpu): Likewise.
> > > >   (vcond): Likewise.
> > > >   (vcond): Likewise.
> > > >   (vcondu): Likewise.
> > > >   (vcond_mask_): Likewise.
> > > >
> > > >   gcc/testsuite
> > > >   * gcc.target/arm/simd/mve-compare-1.c: New test with GCC vectors.
> > > >   * gcc.target/arm/simd/mve-compare-2.c: New test with GCC vectors.
> > > >   * gcc.target/arm/simd/mve-compare-scalar-1.c: New test with GCC
> > > >   vectors.
> > > >   * gcc.target/arm/simd/mve-vcmp-f32.c: New test for
> > > >   auto-vectorization.
> > > >   * gcc.target/arm/simd/mve-vcmp.c: New test for auto-vectorization.
> > > >
> > > > add gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c
> > > > ---
> > > >   gcc/config/arm/arm-protos.h|   2 +-
> > > >   gcc/config/arm/arm.c   | 211 -
> > > >   gcc/config/arm/iterators.md|   9 +-
> > > >   gcc/config/arm/mve.md  |  10 +-
> > > >   gcc/config/arm/neon.md |  87 -
> > > >   gcc/config/arm/unspecs.md  |  20 --
> > > >   gcc/config/arm/vec-common.md   | 107 +++
> > > >   gcc/testsuite/gc

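The expansion described above builds a lane mask with vcmp and then uses vpsel to pick lanes from one of two constant vectors. The GCC vector-extension semantics being implemented can be modelled portably: a comparison yields -1 or 0 per lane, and a vcond_mask-style select takes lanes from one operand where the mask is set and from the other elsewhere. In this sketch `int32_t` replaces the example's `long int` (32 bits on Arm) so it behaves the same on LP64 hosts; `cmp_eq` and `select_lanes` are hypothetical helpers, not GCC or ACLE functions.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t vs32 __attribute__ ((vector_size (16)));

/* Lane-wise equality: each lane is -1 (all bits set) if equal, else 0.  */
static vs32
cmp_eq (vs32 a, vs32 b)
{
  return a == b;
}

/* vcond_mask-style select: lanes of T where MASK is all ones, lanes of
   F where MASK is zero.  MASK must contain only -1 or 0.  */
static vs32
select_lanes (vs32 mask, vs32 t, vs32 f)
{
  for (int i = 0; i < 4; i++)
    assert (mask[i] == 0 || mask[i] == -1);
  return (mask & t) | (~mask & f);
}
```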
Re: [PATCH 7/9] arm: Auto-vectorization for MVE: add __fp16 support to VCMP

2021-05-17 Thread Christophe Lyon via Gcc-patches
ping?

On Wed, 5 May 2021 at 16:09, Christophe Lyon  wrote:
>
> On Tue, 4 May 2021 at 19:03, Christophe Lyon  
> wrote:
> >
> > On Tue, 4 May 2021 at 15:43, Christophe Lyon  
> > wrote:
> > >
> > > On Tue, 4 May 2021 at 13:48, Andre Vieira (lists)
> > >  wrote:
> > > >
> > > > It would be good to also add tests for NEON as you also enable auto-vec
> > > > for it. I checked and I do think the necessary 'neon_vc' patterns exist
> > > > for 'VH', so we should be OK there.
> > > >
> > >
> > > Actually since I posted the patch series, I've noticed a regression in
> > > armv8_2-fp16-arith-1.c, because we now vectorize all the float16x[48]_t loops,
> > > but we lose the fact that some FP comparisons can throw exceptions.
> > >
> > > I'll have to revisit this patch.
> >
> > Actually it looks like my patch does the right thing: we now vectorize
> > appropriately, given that the testcase is compiled with -ffast-math.
> > I need to update the testcase, though.
> >
>
> Here is a new version, with armv8_2-fp16-arith-1.c updated to take
> into account the new vectorization.
>
> Christophe
>
>
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> > > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > > This patch adds __fp16 support to the previous patch that added vcmp
> > > > > support with MVE. For this we update existing expanders to use VDQWH
> > > > > iterator, and add a new expander vcond.  In the
> > > > > process we need to create suitable iterators, and update v_cmp_result
> > > > > as needed.
> > > > >
> > > > > 2021-04-26  Christophe Lyon  
> > > > >
> > > > >   gcc/
> > > > >   * config/arm/iterators.md (V16): New iterator.
> > > > >   (VH_cvtto): New iterator.
> > > > >   (v_cmp_result): Added V4HF and V8HF support.
> > > > >   * config/arm/vec-common.md (vec_cmp): Use VDQWH.
> > > > >   (vcond): Likewise.
> > > > >   (vcond_mask_): Likewise.
> > > > >   (vcond): New expander.
> > > > >
> > > > >   gcc/testsuite/
> > > > >   * gcc.target/arm/simd/mve-compare-3.c: New test with GCC vectors.
> > > > >   * gcc.target/arm/simd/mve-vcmp-f16.c: New test for
> > > > >   auto-vectorization.
> > > > > ---
> > > > >   gcc/config/arm/iterators.md   |  6 
> > > > >   gcc/config/arm/vec-common.md  | 40 ---
> > > > >   gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c | 38 +
> > > > >   gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c  | 30 +
> > > > >   4 files changed, 102 insertions(+), 12 deletions(-)
> > > > >   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c
> > > > >   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c
> > > > >
> > > > > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > > > > index a128465..3042baf 100644
> > > > > --- a/gcc/config/arm/iterators.md
> > > > > +++ b/gcc/config/arm/iterators.md
> > > > > @@ -231,6 +231,9 @@ (define_mode_iterator VU [V16QI V8HI V4SI])
> > > > >   ;; Vector modes for 16-bit floating-point support.
> > > > >   (define_mode_iterator VH [V8HF V4HF])
> > > > >
> > > > > +;; Modes with 16-bit elements only.
> > > > > +(define_mode_iterator V16 [V4HI V4HF V8HI V8HF])
> > > > > +
> > > > >   ;; 16-bit floating-point vector modes suitable for moving (includes BFmode).
> > > > >   (define_mode_iterator VHFBF [V8HF V4HF V4BF V8BF])
> > > > >
> > > > > @@ -571,6 +574,8 @@ (define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF "v2si")
> > > > >   ;; (Opposite) mode to convert to/from for vector-half mode conversions.
> > > > >   (define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
> > > > >   (V8HI "V8HF") (V8HF "V8HI")])
> > > > > +(define_mode_attr VH_cvtto [(V4HI "v4hf") (V4HF "v4hi")
> > > > > + (V8HI "v8hf") (V8HF "v8hi")])
> > > > >
> > > > >   ;; Define element mode for each vector mode.
> > > > >   (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
> > > > > @@ -720,6 +725,7 @@ (define_mode_attr V_cmp_result [(V8QI "V8QI") (V16QI "V16QI")
> > > > >   (define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
> > > > >   (V4HI "v4hi") (V8HI  "v8hi")
> > > > >   (V2SI "v2si") (V4SI  "v4si")
> > > > > + (V4HF "v4hi") (V8HF  "v8hi")
> > > > >   (DI   "di")   (V2DI  "v2di")
> > > > >   (V2SF "v2si") (V4SF  "v4si")])
> > > > >
> > > > > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> > > > > index 034b48b..3fd341c 100644
> > > > > --- a/gcc/config/arm/vec-common.md
> > > > > +++ b/gcc/config/arm/vec-common.md
> > > > > @@ -366,8 +366,8 @@ (define_expand "vlshr3"
> > > > >   (define_expand "vec_cmp"
> > > > > [(set (match_operan

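The new `v_cmp_result` entries map V4HF and V8HF to v4hi and v8hi because the mask produced by a floating-point vector comparison is an integer vector whose lanes have the same width as the data lanes. `__fp16` vectors are Arm-specific, so this portable sketch shows the same rule with 32-bit floats (V4SF to V4SI) as a stand-in; `v4sf`, `v4si` and `cmp_lt` are hypothetical names. A NaN lane compares false, so its mask lane is 0.

```c
#include <assert.h>
#include <stdint.h>

typedef float   v4sf __attribute__ ((vector_size (16)));
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Comparing two float vectors yields a signed integer vector whose
   lanes have the same width as the float lanes (V4SF -> V4SI).  */
static v4si
cmp_lt (v4sf a, v4sf b)
{
  return a < b;
}

/* Compile-time check that mask lanes and data lanes have equal width.  */
_Static_assert (sizeof (v4si) == sizeof (v4sf), "mask and data widths differ");
```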
Re: [PATCH 8/9] arm: Auto-vectorization for MVE: vld2/vst2

2021-05-17 Thread Christophe Lyon via Gcc-patches
ping?

On Fri, 30 Apr 2021 at 16:09, Christophe Lyon
 wrote:
>
> This patch enables MVE vld2/vst2 instructions for auto-vectorization.
> We move the existing expanders from neon.md and enable them for MVE,
> calling the respective emitter.
>
> 2021-03-12  Christophe Lyon  
>
> gcc/
> * config/arm/neon.md (vec_load_lanesoi)
> (vec_store_lanesoi): Move ...
> * config/arm/vec-common.md: here.
>
> gcc/testsuite/
> * gcc.target/arm/simd/mve-vld2.c: New test, derived from
> slp-perm-2.c
> ---
>  gcc/config/arm/neon.md   | 14 
>  gcc/config/arm/vec-common.md | 27 
>  gcc/testsuite/gcc.target/arm/simd/mve-vld2.c | 96
>  3 files changed, 123 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vld2.c
>
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 6660846..bc8775c 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -5063,13 +5063,6 @@ (define_insn "neon_vld2"
>  (const_string "neon_load2_2reg")))]
>  )
>
> -(define_expand "vec_load_lanesoi"
> -  [(set (match_operand:OI 0 "s_register_operand")
> -(unspec:OI [(match_operand:OI 1 "neon_struct_operand")
> -(unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> -  UNSPEC_VLD2))]
> -  "TARGET_NEON")
> -
>  (define_insn "neon_vld2"
>[(set (match_operand:OI 0 "s_register_operand" "=w")
>  (unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um")
> @@ -5197,13 +5190,6 @@ (define_insn "neon_vst2"
>  (const_string "neon_store2_one_lane")))]
>  )
>
> -(define_expand "vec_store_lanesoi"
> -  [(set (match_operand:OI 0 "neon_struct_operand")
> -   (unspec:OI [(match_operand:OI 1 "s_register_operand")
> -(unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> -   UNSPEC_VST2))]
> -  "TARGET_NEON")
> -
>  (define_insn "neon_vst2"
>[(set (match_operand:OI 0 "neon_struct_operand" "=Um")
> (unspec:OI [(match_operand:OI 1 "s_register_operand" "w")
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index 3fd341c..7abefea 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -482,6 +482,33 @@ (define_expand "vcond_mask_"
>  }
>else
>  gcc_unreachable ();
> +  DONE;
> +})
>
> +(define_expand "vec_load_lanesoi"
> +  [(set (match_operand:OI 0 "s_register_operand")
> +(unspec:OI [(match_operand:OI 1 "neon_struct_operand")
> +(unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +  UNSPEC_VLD2))]
> +  "TARGET_NEON || TARGET_HAVE_MVE"
> +{
> +  if (TARGET_NEON)
> +emit_insn (gen_neon_vld2 (operands[0], operands[1]));
> +  else
> +emit_insn (gen_mve_vld2q (operands[0], operands[1]));
> +  DONE;
> +})
> +
> +(define_expand "vec_store_lanesoi"
> +  [(set (match_operand:OI 0 "neon_struct_operand")
> +   (unspec:OI [(match_operand:OI 1 "s_register_operand")
> +(unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +   UNSPEC_VST2))]
> +  "TARGET_NEON || TARGET_HAVE_MVE"
> +{
> +  if (TARGET_NEON)
> +emit_insn (gen_neon_vst2 (operands[0], operands[1]));
> +  else
> +emit_insn (gen_mve_vst2q (operands[0], operands[1]));
>DONE;
>  })
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vld2.c b/gcc/testsuite/gcc.target/arm/simd/mve-vld2.c
> new file mode 100644
> index 000..9c7c3f5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vld2.c
> @@ -0,0 +1,96 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include 
> +
> +#define M00 100
> +#define M10 216
> +#define M01 1322
> +#define M11 13
> +
> +#define N 128
> +
> +
> +/* Integer tests.  */
> +#define FUNC(SIGN, TYPE, BITS) \
> +  void foo_##SIGN##BITS##x (TYPE##BITS##_t *__restrict__ pInput,   \
> +   TYPE##BITS##_t *__restrict__ pOutput)   \
> +  {\
> +unsigned int i;\
> +TYPE##BITS##_t  a, b;  \
> +   \
> +for (i = 0; i < N / BITS; i++) \
> +  {  \
> +   a = *pInput++;  \
> +   b = *pInput++;  \
> +   \
> +   *pOutput++ = M00 * a + M01 * b; \
> + 

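vld2 is a de-interleaving load: given pairs (x0, y0, x1, y1, ...) it fills one vector with the x values and another with the y values, which matches the `a = *pInput++; b = *pInput++;` access pattern of the loop in mve-vld2.c; vst2 performs the inverse interleaving store. The scalar model below describes only the overall effect for 32-bit elements in a 128-bit vector (four lanes); `vld2_model` and `vst2_model` are hypothetical names, and the actual MVE instruction sequence used to achieve this is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4  /* 128-bit vector of 32-bit elements.  */

/* 2-way de-interleaving load: element k of OUT0 comes from SRC[2k]
   and element k of OUT1 from SRC[2k + 1].  */
static void
vld2_model (const int32_t *src, int32_t out0[LANES], int32_t out1[LANES])
{
  assert (src != 0);
  for (int k = 0; k < LANES; k++)
    {
      out0[k] = src[2 * k];
      out1[k] = src[2 * k + 1];
    }
}

/* Inverse operation: 2-way interleaving store (vst2-style).  */
static void
vst2_model (int32_t *dst, const int32_t in0[LANES], const int32_t in1[LANES])
{
  assert (dst != 0);
  for (int k = 0; k < LANES; k++)
    {
      dst[2 * k] = in0[k];
      dst[2 * k + 1] = in1[k];
    }
}
```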
Re: [PATCH 9/9] arm: Auto-vectorization for MVE: vld4/vst4

2021-05-17 Thread Christophe Lyon via Gcc-patches
ping?

On Tue, 4 May 2021 at 16:57, Christophe Lyon  wrote:
>
> On Tue, 4 May 2021 at 14:03, Andre Vieira (lists)
>  wrote:
> >
> > Hi Christophe,
> >
> > The series LGTM but you'll need the approval of an arm port maintainer
> > before committing. I only did code-review, did not try to build/run tests.
> >
>
> Hi Andre,
>
> Thanks for the comments!
>
> > Kind regards,
> > Andre
> >
> > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > This patch enables MVE vld4/vst4 instructions for auto-vectorization.
> > > We move the existing expanders from neon.md and enable them for MVE,
> > > calling the respective emitter.
> > >
> > > 2021-03-12  Christophe Lyon  
> > >
> > >   gcc/
> > >   * config/arm/neon.md (vec_load_lanesxi)
> > >   (vec_store_lanesxi): Move ...
> > >   * config/arm/vec-common.md: here.
> > >
> > >   gcc/testsuite/
> > >   * gcc.target/arm/simd/mve-vld4.c: New test, derived from
> > >   slp-perm-3.c
> > > ---
> > >   gcc/config/arm/neon.md   |  20 
> > >   gcc/config/arm/vec-common.md |  26 +
> > >   gcc/testsuite/gcc.target/arm/simd/mve-vld4.c | 140 +++
> > >   3 files changed, 166 insertions(+), 20 deletions(-)
> > >   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vld4.c
> > >
> > > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> > > index bc8775c..fb58baf 100644
> > > --- a/gcc/config/arm/neon.md
> > > +++ b/gcc/config/arm/neon.md
> > > @@ -5617,16 +5617,6 @@ (define_insn "neon_vld4"
> > >   (const_string "neon_load4_4reg")))]
> > >   )
> > >
> > > -(define_expand "vec_load_lanesxi"
> > > -  [(match_operand:XI 0 "s_register_operand")
> > > -   (match_operand:XI 1 "neon_struct_operand")
> > > -   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> > > -  "TARGET_NEON"
> > > -{
> > > -  emit_insn (gen_neon_vld4 (operands[0], operands[1]));
> > > -  DONE;
> > > -})
> > > -
> > >   (define_expand "neon_vld4"
> > > [(match_operand:XI 0 "s_register_operand")
> > >  (match_operand:XI 1 "neon_struct_operand")
> > > @@ -5818,16 +5808,6 @@ (define_insn "neon_vst4"
> > >   (const_string "neon_store4_4reg")))]
> > >   )
> > >
> > > -(define_expand "vec_store_lanesxi"
> > > -  [(match_operand:XI 0 "neon_struct_operand")
> > > -   (match_operand:XI 1 "s_register_operand")
> > > -   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> > > -  "TARGET_NEON"
> > > -{
> > > -  emit_insn (gen_neon_vst4 (operands[0], operands[1]));
> > > -  DONE;
> > > -})
> > > -
> > >   (define_expand "neon_vst4"
> > > [(match_operand:XI 0 "neon_struct_operand")
> > >  (match_operand:XI 1 "s_register_operand")
> > > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> > > index 7abefea..d46b78d 100644
> > > --- a/gcc/config/arm/vec-common.md
> > > +++ b/gcc/config/arm/vec-common.md
> > > @@ -512,3 +512,29 @@ (define_expand "vec_store_lanesoi"
> > >   emit_insn (gen_mve_vst2q (operands[0], operands[1]));
> > > DONE;
> > >   })
> > > +
> > > +(define_expand "vec_load_lanesxi"
> > > +  [(match_operand:XI 0 "s_register_operand")
> > > +   (match_operand:XI 1 "neon_struct_operand")
> > > +   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> > > +  "TARGET_NEON || TARGET_HAVE_MVE"
> > > +{
> > > +  if (TARGET_NEON)
> > > +emit_insn (gen_neon_vld4 (operands[0], operands[1]));
> > > +  else
> > > +emit_insn (gen_mve_vld4q (operands[0], operands[1]));
> > > +  DONE;
> > > +})
> > > +
> > > +(define_expand "vec_store_lanesxi"
> > > +  [(match_operand:XI 0 "neon_struct_operand")
> > > +   (match_operand:XI 1 "s_register_operand")
> > > +   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> > > +  "TARGET_NEON || TARGET_HAVE_MVE"
> > > +{
> > > +  if (TARGET_NEON)
> > > +emit_insn (gen_neon_vst4 (operands[0], operands[1]));
> > > +  else
> > > +emit_insn (gen_mve_vst4q (operands[0], operands[1]));
> > > +  DONE;
> > > +})
> > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vld4.c b/gcc/testsuite/gcc.target/arm/simd/mve-vld4.c
> > > new file mode 100644
> > > index 000..ce3e755
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vld4.c
> > > @@ -0,0 +1,140 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +#include 
> > > +
> > > +#define M00 100
> > > +#define M10 216
> > > +#define M20 23
> > > +#define M30 237
> > > +#define M01 1322
> > > +#define M11 13
> > > +#define M21 27271
> > > +#define M31 2280
> > > +#define M02 74
> > > +#define M12 191
> > > +#define M22 500
> > > +#define M32 111
> > > +#define M03 134
> > > +#define M13 117
> > > +#define M23 11
> > > +#define M33 771
> > > +
> > > +#define N 128
> > > +
> > > +/* Integer tests.  */
> > > +#define FUNC(SIGN, TYPE, BITS)   

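As a rough guide to what the `vec_load_lanesxi` / `vec_store_lanesxi` expanders are asked to provide, here is a scalar C model of a 4-way structure load and store. Each record holds four interleaved elements; the load splits them into four separate lane arrays, and the store re-interleaves them. This is only an illustration of the de-interleaving semantics the vectorizer relies on. The names `load_lanes4` and `store_lanes4` are hypothetical and are not GCC functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a 4-lane structure load: SRC holds N
   records of four interleaved elements {x, y, z, w}; each lane array
   receives one field of every record.  */
static void
load_lanes4 (const int32_t *src, size_t n,
             int32_t *l0, int32_t *l1, int32_t *l2, int32_t *l3)
{
  assert (src != NULL);
  for (size_t i = 0; i < n; i++)
    {
      l0[i] = src[4 * i + 0];
      l1[i] = src[4 * i + 1];
      l2[i] = src[4 * i + 2];
      l3[i] = src[4 * i + 3];
    }
}

/* The inverse operation: re-interleave four lane arrays into DST.  */
static void
store_lanes4 (int32_t *dst, size_t n,
              const int32_t *l0, const int32_t *l1,
              const int32_t *l2, const int32_t *l3)
{
  assert (dst != NULL);
  for (size_t i = 0; i < n; i++)
    {
      dst[4 * i + 0] = l0[i];
      dst[4 * i + 1] = l1[i];
      dst[4 * i + 2] = l2[i];
      dst[4 * i + 3] = l3[i];
    }
}
```

With 32-bit elements and four lanes per 128-bit Q register, one 4-register structure load in this model corresponds to N = 4 records.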
Re: [PATCH] [i386] Fix _mm256_zeroupper to notify LRA that vzeroupper will kill sse registers. [PR target/82735]

2021-05-17 Thread Richard Sandiford via Gcc-patches
Hongtao Liu via Gcc-patches  writes:
> On Fri, May 14, 2021 at 10:27 AM Hongtao Liu  wrote:
>>
>> On Thu, May 13, 2021 at 7:52 PM Richard Sandiford
>>  wrote:
>> >
>> > Jakub Jelinek  writes:
>> > > On Thu, May 13, 2021 at 12:32:26PM +0100, Richard Sandiford wrote:
>> > >> Jakub Jelinek  writes:
>> > >> > On Thu, May 13, 2021 at 11:43:19AM +0200, Uros Bizjak wrote:
>> > >> >> > >   Bootstrapped and regtested on X86_64-linux-gnu{-m32,}
>> > >> >> > >   Ok for trunk?
>> > >> >> >
>> > >> >> > Some time ago, support for the CLOBBER_HIGH RTX was added (and later
>> > >> >> > removed for some reason). Perhaps we could resurrect the patch for the
>> > >> >> > purpose of ferrying 128bit modes via vzeroupper RTX?
>> > >> >>
>> > >> >> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-11/msg01325.html
>> > >> >
>> > >> > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01468.html
>> > >> > is where it got removed, CCing Richard.
>> > >>
>> > >> Yeah.  Initially clobber_high seemed like the best approach for
>> > >> handling the tlsdesc thing, but in practice it was too difficult
>> > >> to shoe-horn the concept in after the fact, when so much rtl
>> > >> infrastructure wasn't prepared to deal with it.  The old support
>> > >> didn't handle all cases and passes correctly, and handled others
>> > >> suboptimally.
>> > >>
>> > >> I think it would be worth using the same approach as
>> > >> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01466.html for
>> > >> vzeroupper: represent the instructions as call_insns in which the
>> > >> call has a special vzeroupper ABI.  I think that's likely to lead
>> > >> to better code than clobber_high would (or at least, it did for tlsdesc).
>>
>> From an implementation perspective, I guess you're meaning we should
>> implement TARGET_INSN_CALLEE_ABI and TARGET_FNTYPE_ABI in the i386
>> backend.
>>
> When I implemented the vzeroupper pattern as call_insn and defined
> TARGET_INSN_CALLEE_ABI for it, I got several failures. They're related
> to two parts:
>
> 1. requires_stack_frame_p returns true for vzeroupper, but it should return false.
> 2. In subst_stack_regs, vzeroupper shouldn't kill arguments.
>
> I've tried a rough patch like the one below, and it fixes those failures.
> Unfortunately, I don't have an Arm machine to test on, so I'd like to ask:
> would the change below break anything in the arm backend?

ABI id 0 just means the default ABI.  Real calls can use other ABIs
besides the default.  That said…

> modified   gcc/reg-stack.c
> @@ -174,6 +174,7 @@
>  #include "reload.h"
>  #include "tree-pass.h"
>  #include "rtl-iter.h"
> +#include "function-abi.h"
>
>  #ifdef STACK_REGS
>
> @@ -2385,7 +2386,7 @@ subst_stack_regs (rtx_insn *insn, stack_ptr regstack)
>bool control_flow_insn_deleted = false;
>int i;
>
> -  if (CALL_P (insn))
> +  if (CALL_P (insn) && insn_callee_abi (insn).id () == 0)
>  {
>int top = regstack->top;

…reg-stack.c is effectively x86-specific code, so checking id 0 here
wouldn't affect anything else.  It doesn't feel very future-proof
though, since x86 could use ABIs other than 0 for real calls in future.

AIUI the property that matters here isn't the ABI, but that the target
of the call doesn't reference stack registers.  That can be true for
real calls too, with -fipa-ra.
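The distinction can be illustrated with a toy model. All names here are hypothetical and are not GCC internals. The idea is that the decision about whether a call needs the special call handling in subst_stack_regs should key off a per-call property ("the callee may touch x87 stack registers") rather than off the ABI identifier. Otherwise a future real call that uses a non-default ABI would be misclassified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a call instruction.  */
struct toy_call
{
  unsigned abi_id;              /* 0 = default ABI.  */
  bool callee_uses_stack_regs;  /* May the callee touch x87 stack regs?  */
};

/* Check keyed on the ABI id, as in the rough reg-stack.c change.  */
static bool
needs_stack_call_handling_by_abi (const struct toy_call *c)
{
  assert (c != NULL);
  return c->abi_id == 0;
}

/* Check keyed on the property that actually matters.  */
static bool
needs_stack_call_handling_by_property (const struct toy_call *c)
{
  assert (c != NULL);
  return c->callee_uses_stack_regs;
}
```

Both predicates agree for a vzeroupper modelled as a fake call. They diverge for a real call that uses a non-default ABI but still touches stack registers, which is the future-proofing concern raised above.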

> modified   gcc/shrink-wrap.c
> @@ -58,7 +58,12 @@ requires_stack_frame_p (rtx_insn *insn, HARD_REG_SET prologue_used,
>unsigned regno;
>
>if (CALL_P (insn))
> -return !SIBLING_CALL_P (insn);
> +{
> +  if (insn_callee_abi (insn).id() != 0)
> + return false;
> +  else
> + return !SIBLING_CALL_P (insn);
> +}

TBH I'm not sure why off-hand this function needs to treat non-sibling
calls specially, rather than rely on normal DF information.  Calls have
a use of the stack pointer, so we should return true for that reason:

/* The stack ptr is used (honorarily) by a CALL insn.  */
df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
   NULL, bb, insn_info, DF_REF_REG_USE,
   DF_REF_CALL_STACK_USAGE | flags);

I guess this is something we should suppress for fake calls though.

It looks like the rtx “used” flag is unused for INSNs, so we could
use that as a CALL_INSN flag that indicates a fake call.  We would just
need to make:

  /* For all other RTXes clear the used flag on the copy.  */
  RTX_FLAG (copy, used) = 0;

conditional on !INSN_P.
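A minimal model of this suggestion, using hypothetical types as stand-ins for the real RTL structures and copy routine. The copy clears the `used` bit only for non-insn codes, so a "fake call" marker stored in `used` on a CALL_INSN survives copying while other codes behave as before.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for RTX codes.  */
enum toy_code { TOY_REG, TOY_MEM, TOY_INSN, TOY_CALL_INSN };

struct toy_rtx
{
  enum toy_code code;
  unsigned used : 1;   /* Scratch flag; reused as "fake call" on CALL_INSNs.  */
};

static bool
toy_insn_p (const struct toy_rtx *x)
{
  return x->code == TOY_INSN || x->code == TOY_CALL_INSN;
}

/* Copy X, clearing the used flag only for non-insns, mirroring the
   proposed "conditional on !INSN_P" change.  */
static struct toy_rtx
toy_copy (const struct toy_rtx *x)
{
  assert (x != NULL);
  struct toy_rtx copy = *x;
  if (!toy_insn_p (&copy))
    copy.used = 0;
  return copy;
}
```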

Thanks,
Richard


Re: RFA: Improve message for wrong number of alternatives

2021-05-17 Thread Richard Sandiford via Gcc-patches
Joern Rennecke  writes:
> On Sun, 16 May 2021 at 22:01, Martin Sebor  wrote:
> > I think it's very helpful to provide this sort of detail.  Just as
>> a matter of readability, the new error message
>>
>>"wrong number of alternatives in operand %d, %d, expected %d"
>>
>> would be improved by avoiding the two consecutive %d's,
>
> We could also do that by phrasing it:
>
> "wrong number of alternatives in operand %d, seen: %d, expected: %d"
>
> so that the change is just about adding extra information.
>
>> e.g., by
>> rephrasing it like so:
>>
>>"%d alternatives provided to operand %d where %d are expected"
>
> This has an additional change in that we no longer jump to the conclusion
> that the operand where we notice the discrepancy is the point that's wrong.
> I suppose that conclusion is more often right than wrong (assuming more than
> two operands on average for patterns that have alternatives and at least two
> operands), but when it's wrong, it's particularly confusing and/or jarring,
> so it's an improvement to just stick to the known facts.
> But if we go that way, I suppose we should also spell out where the
> expectation comes from: we have a loop over the operands, and we look at
> operand 0 first.  We could do that by using the diagnostic:
>
>   error_at (d->loc,
>             "alternative number mismatch: operand %d has %d, operand %d had %d",
>             start, d->operand[start].n_alternatives, 0, n);
>
>
> I notice in passing here that printf is actually awkward for rephrasings
> and hence also for translations, because we can't interchange the order of
> the data in the message string.
>
> But for multi-alternative patterns, we also have the awkwardness of
> repeating the abstract of the error message and the recap of the number
> of alternatives of operand 0.
>
> So I propose the attached patch now.
>
> Bootstrapped on x86_64-pc-linux-gnu.
>
> 2021-05-17  Joern Rennecke  
>
>   Make "wrong number of alternatives" message more specific, and
>   remove assumption on where the problem is.
>
> diff --git a/gcc/genoutput.c b/gcc/genoutput.c
> index 8e911cce2f5..6313b722cf7 100644
> --- a/gcc/genoutput.c
> +++ b/gcc/genoutput.c
> @@ -757,6 +757,7 @@ validate_insn_alternatives (class data *d)
>   int which_alternative = 0;
>   int alternative_count_unsure = 0;
>   bool seen_write = false;
> + bool alt_mismatch = false;
>  
>   for (p = d->operand[start].constraint; (c = *p); p += len)
> {
> @@ -813,8 +814,19 @@ validate_insn_alternatives (class data *d)
>   if (n == 0)
> n = d->operand[start].n_alternatives;
>   else if (n != d->operand[start].n_alternatives)
> -   error_at (d->loc, "wrong number of alternatives in operand %d",
> - start);
> +   {
> + if (!alt_mismatch)
> +   {
> + alt_mismatch = true;
> + error_at (d->loc,
> +   "alternative number mismatch: "
> +   "operand %d had %d, operand %d has %d",
> +   0, n, start, d->operand[start].n_alternatives);

IMO this is better with s/had/has/.  OK with that change, thanks.

Richard


> +   }
> + else
> +   error_at (d->loc, "operand %d has %d alternatives",
> + start, d->operand[start].n_alternatives);
> +   }
> }
>}
>  
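The reworded check can be sketched in isolation. This simplified, hypothetical helper assumes that alternatives in a constraint string are separated by commas and ignores every other constraint character. It counts the alternatives per operand, compares each operand against operand 0, and emits the detailed message once followed by the shorter per-operand message. It uses the "has" wording requested above.

```c
#include <assert.h>
#include <stdio.h>

/* Count comma-separated alternatives in a constraint string
   (simplified: ignores every other constraint character).  */
static int
count_alternatives (const char *constraint)
{
  assert (constraint != NULL);
  int n = 1;
  for (const char *p = constraint; *p; p++)
    if (*p == ',')
      n++;
  return n;
}

/* Report mismatches against operand 0 in the style of the patch:
   one detailed message first, then a short one per further operand.
   Returns the number of messages written.  */
static int
report_alt_mismatches (const char *const *constraints, int n_operands,
                       FILE *out)
{
  int n = count_alternatives (constraints[0]);
  int messages = 0;
  for (int i = 1; i < n_operands; i++)
    {
      int ni = count_alternatives (constraints[i]);
      if (ni == n)
        continue;
      if (messages == 0)
        fprintf (out, "alternative number mismatch: "
                 "operand %d has %d, operand %d has %d\n", 0, n, i, ni);
      else
        fprintf (out, "operand %d has %d alternatives\n", i, ni);
      messages++;
    }
  return messages;
}
```

For constraints `"=r,m"`, `"r,r"`, `"i"`, `"r,r,r"`, this prints the detailed message for operand 2 and the short one for operand 3.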


Re: RFA: Add option -fretry-compilation

2021-05-17 Thread Bernhard Reutner-Fischer via Gcc-patches
On 16 May 2021 20:21:13 CEST, Joern Rennecke wrote:

>The attached patch adds a new option -fretry-compilation that allows
>you to specify a list - or lists - of options to use for a compilation
>retry, which is implemented in the compiler driver.

That's gross ;)

+If the compiler fails, retry with named options appeded.  Separate multiple options with ',', and multiple alternatives with ':' .
s/appeded/appended/

>
>Bootstrapped on x86_64-pc-linux-gnu.
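Going only by the quoted documentation line (',' separates options, ':' separates alternatives), an option value such as `-O1,-fno-tree-vectorize:-O0` would describe two retry attempts. The patch's driver code is not shown here, so the parser below is a hypothetical sketch of that syntax only. It is not the implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: print each retry alternative and its options.
   ':' separates alternatives, ',' separates options within one.
   Returns the number of alternatives.  */
static int
list_retry_alternatives (const char *spec, FILE *out)
{
  assert (spec != NULL);
  int n_alts = 0;
  const char *alt = spec;
  for (;;)
    {
      size_t alt_len = strcspn (alt, ":");
      fprintf (out, "retry %d:", ++n_alts);
      const char *opt = alt;
      while (opt < alt + alt_len)
        {
          size_t opt_len = strcspn (opt, ",:");
          fprintf (out, " %.*s", (int) opt_len, opt);
          opt += opt_len;
          if (*opt == ',')
            opt++;          /* Skip the option separator.  */
        }
      fputc ('\n', out);
      if (alt[alt_len] == '\0')
        break;
      alt += alt_len + 1;   /* Skip the ':' alternative separator.  */
    }
  return n_alts;
}
```

For `-O1,-fno-tree-vectorize:-O0` this prints `retry 1: -O1 -fno-tree-vectorize` followed by `retry 2: -O0`.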



RE: [PATCH] testsuite/arm: Fix and rename arm_qbit_ok into arm_sat_ok effective-target

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 21 April 2021 21:48
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] testsuite/arm: Fix and rename arm_qbit_ok into
> arm_sat_ok effective-target
> 
> The acle/saturation.c test uses the __[su]sat() and
> __saturation_occurred() intrinsics, but __[su]sat() are defined in
> acle.h if __ARM_FEATURE_SAT is true, while __saturation_occurred()
> depends on __ARM_FEATURE_QBIT.
> 
> QBIT is a v5te feature, while SAT is available since v6, so the test
> really needs __ARM_FEATURE_SAT, to have both available.
> 
> This patch renames arm_qbit_ok into arm_sat_ok and checks
> __ARM_FEATURE_SAT. It updates acle/saturation.c accordingly.
> 
> This enables the test to pass on arm-eabi with default cpu/fpu/mode,
> where arm_qbit_ok previously used -march=armv5te, whereas arm_sat_ok
> now uses -march=armv6.

Ok.
Thanks,
Kyrill

> 
> 2021-04-22  Christophe Lyon  
> 
>   gcc/
>   * doc/sourcebuild.texi (arm_qbit_ok): Rename into...
>   (arm_sat_ok): ...this.
> 
>   gcc/testsuite/
>   * gcc.target/arm/acle/saturation.c: Use arm_sat_ok effective
>   target.
>   * lib/target-supports.exp
>   (check_effective_target_arm_qbit_ok_nocache): Rename into...
>   (check_effective_target_arm_sat_ok_nocache): ... this. Check
>   __ARM_FEATURE_SAT and use armv6.
> ---
>  gcc/doc/sourcebuild.texi   |  6 ++--
>  gcc/testsuite/gcc.target/arm/acle/saturation.c |  4 +--
>  gcc/testsuite/lib/target-supports.exp  | 42 
> +-
>  3 files changed, 26 insertions(+), 26 deletions(-)
> 
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index b5bdd4f..4d9ec3c 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2041,9 +2041,9 @@ ARM Target supports options suitable for accessing the SIMD32 intrinsics from
>  @code{arm_acle.h}.
>  Some multilibs may be incompatible with these options.
> 
> -@item arm_qbit_ok
> -@anchor{arm_qbit_ok}
> -ARM Target supports options suitable for accessing the Q-bit manipulation
> +@item arm_sat_ok
> +@anchor{arm_sat_ok}
> +ARM Target supports options suitable for accessing the saturation
>  intrinsics from @code{arm_acle.h}.
>  Some multilibs may be incompatible with these options.
> 
> diff --git a/gcc/testsuite/gcc.target/arm/acle/saturation.c b/gcc/testsuite/gcc.target/arm/acle/saturation.c
> index 0b3fe51..a9f99e5 100644
> --- a/gcc/testsuite/gcc.target/arm/acle/saturation.c
> +++ b/gcc/testsuite/gcc.target/arm/acle/saturation.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
> -/* { dg-require-effective-target arm_qbit_ok } */
> -/* { dg-add-options arm_qbit } */
> +/* { dg-require-effective-target arm_sat_ok } */
> +/* { dg-add-options arm_sat } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index a522da3..5fab170 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -4168,24 +4168,24 @@ proc add_options_for_arm_simd32 { flags } {
>  return "$flags $et_arm_simd32_flags"
>  }
> 
> -# Return 1 if this is an ARM target supporting the saturation intrinsics
> -# from arm_acle.h.  Some multilibs may be incompatible with these options.
> -# Also set et_arm_qbit_flags to the best options to add.
> -# arm_acle.h includes stdint.h which can cause trouble with incompatible
> -# -mfloat-abi= options.
> -
> -proc check_effective_target_arm_qbit_ok_nocache { } {
> -global et_arm_qbit_flags
> -set et_arm_qbit_flags ""
> -foreach flags {"" "-march=armv5te" "-march=armv5te -mfloat-abi=softfp" "-march=armv5te -mfloat-abi=hard"} {
> -  if { [check_no_compiler_messages_nocache et_arm_qbit_flags object {
> +# Return 1 if this is an ARM target supporting the __ssat and __usat
> +# saturation intrinsics from arm_acle.h.  Some multilibs may be
> +# incompatible with these options.  Also set et_arm_sat_flags to the
> +# best options to add.  arm_acle.h includes stdint.h which can cause
> +# trouble with incompatible -mfloat-abi= options.
> +
> +proc check_effective_target_arm_sat_ok_nocache { } {
> +global et_arm_sat_flags
> +set et_arm_sat_flags ""
> +foreach flags {"" "-march=armv6" "-march=armv6 -mfloat-abi=softfp" "-march=armv6 -mfloat-abi=hard -mfpu=vfp"} {
> +  if { [check_no_compiler_messages_nocache et_arm_sat_flags object {
>   #include 
>   int dummy;
> - #ifndef __ARM_FEATURE_QBIT
> - #error not QBIT
> + #ifndef __ARM_FEATURE_SAT
> + #error not SAT
>   #endif
>} "$flags"] } {
> - set et_arm_qbit_flags $flags
> + set et_arm_sat_flags $flags
>   return 1
>}
>  }
> @@ -4193,17 +4193,17 @@ proc check_effective_target_arm_qbit_ok_nocache { } {
>return 0
>  }
> 
> -proc check_effective_target_arm_qbit_ok { } {
> -return [check_cached_effective_target et_arm_qbit_flags \
> - check_effective

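For reference, the `__ssat` and `__usat` intrinsics guarded by the renamed effective target saturate a value to a given bit width. The sketch below uses them when `__ARM_FEATURE_SAT` is defined and otherwise falls back to portable reference versions. The `ssat_ref`/`usat_ref` functions and the `SSAT`/`USAT` macros are hypothetical names. Because ACLE expects the width argument of these intrinsics to be a constant, the dispatch is done with macros rather than a wrapper function.

```c
#include <assert.h>
#include <stdint.h>

#if defined (__ARM_FEATURE_SAT)
#include <arm_acle.h>
#define SSAT(x, n) __ssat ((x), (n))
#define USAT(x, n) __usat ((x), (n))
#else
#define SSAT(x, n) ssat_ref ((x), (n))
#define USAT(x, n) usat_ref ((x), (n))
#endif

/* Clamp X to the signed range of an N-bit integer, 1 <= N <= 32.  */
static inline int32_t
ssat_ref (int32_t x, unsigned n)
{
  assert (n >= 1 && n <= 32);
  int64_t max = ((int64_t) 1 << (n - 1)) - 1;
  int64_t min = -max - 1;
  return (int32_t) (x > max ? max : x < min ? min : x);
}

/* Clamp X to the unsigned range of an N-bit integer, 0 <= N <= 31.  */
static inline uint32_t
usat_ref (int32_t x, unsigned n)
{
  assert (n <= 31);
  int64_t max = ((int64_t) 1 << n) - 1;
  return (uint32_t) (x > max ? max : x < 0 ? 0 : x);
}
```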
RE: [PATCH] testsuite/arm: Improve mve-vshr.c

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 17 May 2021 10:54
> To: gcc Patches 
> Subject: Re: [PATCH] testsuite/arm: Improve mve-vshr.c
> 
> ping?
> 
> On Mon, 10 May 2021 at 13:22, Christophe Lyon
>  wrote:
> >
> > Ping?
> >
> > On Tue, 27 Apr 2021 at 13:32, Christophe Lyon
> >  wrote:
> > >
> > > Vector right shifts by immediate use vshr, while right shifts by
> > > vectors instead use vneg and vshl.
> > >
> > > This patch adds the corresponding scan-assembler-times that were
> > > missing.
> > >

Ok.
Thanks,
Kyrill

> > > 2021-04-22  Christophe Lyon  
> > >
> > > gcc/testsuite/
> > > * gcc.target/arm/simd/mve-vshr.c: Add more scan-assembler-times.
> > > ---
> > >  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c | 7 +++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > > index d4e658c..d4258e9 100644
> > > --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > > @@ -55,5 +55,12 @@ FUNC_IMM(u, uint, 8, 16, >>, vshrimm)
> > >
> > >  /* MVE has only 128-bit vectors, so we can vectorize only half of the
> > > functions above.  */
> > > +/* Vector right shifts use vneg and left shifts.  */
> > > +/* { dg-final { scan-assembler-times {vshl.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> > > +/* { dg-final { scan-assembler-times {vshl.u[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> > > +/* { dg-final { scan-assembler-times {vneg.s[0-9]+  q[0-9]+, q[0-9]+} 6 } } */
> > > +
> > > +
> > > +/* Shift by immediate.  */
> > >  /* { dg-final { scan-assembler-times {vshr.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> > >  /* { dg-final { scan-assembler-times {vshr.u[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> > > --
> > > 2.7.4
> > >
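Right shifts by a vector of amounts appear as `vneg` plus `vshl` because the register-count form of `VSHL` is understood to shift left for positive per-lane counts and right for negative ones. A variable right shift is therefore a left shift by the negated amount. Below is a scalar model of one signed 32-bit lane, under that assumption. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one signed 32-bit lane of a register-count shift: positive
   counts shift left, negative counts shift right (arithmetically).
   Counts are restricted to [-31, 31] to keep the C well defined.  */
static int32_t
vshl_lane_s32 (int32_t x, int32_t count)
{
  assert (count >= -31 && count <= 31);
  if (count >= 0)
    return (int32_t) ((uint32_t) x << count);
  return x >> -count;   /* GCC implements >> on negative values arithmetically.  */
}

/* A right shift by a variable amount, expressed the way the
   vectorized code does it: negate the count, then shift left.  */
static int32_t
shr_via_vneg_vshl (int32_t x, int32_t count)
{
  return vshl_lane_s32 (x, -count);
}
```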


RE: [PATCH] testsuite/arm: Factorize and increase coverage in mve-sub_1.c

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 17 May 2021 10:54
> To: gcc Patches 
> Subject: Re: [PATCH] testsuite/arm: Factorize and increase coverage in mve-sub_1.c
> 
> ping?
> 
> On Mon, 10 May 2021 at 13:22, Christophe Lyon
>  wrote:
> >
> > Ping?
> >
> > On Tue, 27 Apr 2021 at 13:32, Christophe Lyon
> >  wrote:
> > >
> > > Use a template macro to factorize the existing test functions.
> > >
> > > This patch also adds a version to check subtraction with __fp16 type.
> > >

Ok.
Thanks,
Kyrill

> > > 2021-04-26  Christophe Lyon  
> > >
> > > gcc/testsuite/
> > > * gcc.target/arm/simd/mve-vsub_1.c: Factorize and add __fp16 test.
> > > ---
> > >  gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c | 60 +
> -
> > >  1 file changed, 21 insertions(+), 39 deletions(-)
> > >
> > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
> > > index 842e5c6..5a6c345 100644
> > > --- a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
> > > @@ -5,60 +5,42 @@
> > >
> > >  #include 
> > >
> > > -void test_vsub_i32 (int32_t * dest, int32_t * a, int32_t * b) {
> > > -  int i;
> > > -  for (i=0; i<4; i++) {
> > > -dest[i] = a[i] - b[i];
> > > -  }
> > > +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)   \
> > > +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, \
> > > +TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> > > +int i; \
> > > +for (i=0; i > > +  dest[i] = a[i] OP b[i];  \
> > > +}  \
> > >  }
> > >
> > > -void test_vsub_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
> > > -  int i;
> > > -  for (i=0; i<4; i++) {
> > > -dest[i] = a[i] - b[i];
> > > -  }
> > > -}
> > > +/* 128-bit vectors.  */
> > > +FUNC(s, int, 32, 4, -, vsub)
> > > +FUNC(u, uint, 32, 4, -, vsub)
> > > +FUNC(s, int, 16, 8, -, vsub)
> > > +FUNC(u, uint, 16, 8, -, vsub)
> > > +FUNC(s, int, 8, 16, -, vsub)
> > > +FUNC(u, uint, 8, 16, -, vsub)
> > >
> > >  /* { dg-final { scan-assembler-times {vsub\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > > -
> > > -void test_vsub_i16 (int16_t * dest, int16_t * a, int16_t * b) {
> > > -  int i;
> > > -  for (i=0; i<8; i++) {
> > > -dest[i] = a[i] - b[i];
> > > -  }
> > > -}
> > > -
> > > -void test_vsub_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
> > > -  int i;
> > > -  for (i=0; i<8; i++) {
> > > -dest[i] = a[i] - b[i];
> > > -  }
> > > -}
> > > -
> > >  /* { dg-final { scan-assembler-times {vsub\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > > +/* { dg-final { scan-assembler-times {vsub\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > >
> > > -void test_vsub_i8 (int8_t * dest, int8_t * a, int8_t * b) {
> > > -  int i;
> > > -  for (i=0; i<16; i++) {
> > > -dest[i] = a[i] - b[i];
> > > -  }
> > > -}
> > > -
> > > -void test_vsub_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
> > > +void test_vsub_f32 (float * dest, float * a, float * b) {
> > >int i;
> > > -  for (i=0; i<16; i++) {
> > > +  for (i=0; i<4; i++) {
> > >  dest[i] = a[i] - b[i];
> > >}
> > >  }
> > > +/* { dg-final { scan-assembler-times {vsub\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > >
> > > -/* { dg-final { scan-assembler-times {vsub\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > >
> > > -void test_vsub_f32 (float * dest, float * a, float * b) {
> > > +void test_vsub_f16 (__fp16 * dest, __fp16 * a, __fp16 * b) {
> > >int i;
> > > -  for (i=0; i<4; i++) {
> > > +  for (i=0; i<8; i++) {
> > >  dest[i] = a[i] - b[i];
> > >}
> > >  }
> > >
> > > -/* { dg-final { scan-assembler-times {vsub\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > > +/* { dg-final { scan-assembler-times {vsub\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > >
> > > --
> > > 2.7.4
> > >
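In this archived copy, the loop header of the FUNC template was lost in extraction. Assuming that the loop runs `NB` times (as the fixed bounds of 4, 8, and 16 in the replaced functions suggest), the self-contained sketch below shows what one invocation generates. `FUNC(s, int, 32, 4, -, vsub)` pastes its arguments into a function named `test_vsub_s32x4` that takes `int32_t` pointers. The loop bound is an assumption, not a copy of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the test template, assuming a loop bound of NB.
   TYPE##BITS##_t pastes e.g. int and 32 into int32_t.  */
#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)                    \
  void test_ ## NAME ## _ ## SIGN ## BITS ## x ## NB            \
    (TYPE##BITS##_t * __restrict__ dest,                        \
     TYPE##BITS##_t *a, TYPE##BITS##_t *b)                      \
  {                                                             \
    int i;                                                      \
    for (i = 0; i < NB; i++)                                    \
      dest[i] = a[i] OP b[i];                                   \
  }

FUNC (s, int, 32, 4, -, vsub)   /* defines test_vsub_s32x4 */
FUNC (u, uint, 8, 16, -, vsub)  /* defines test_vsub_u8x16 */
```

Each loop covers exactly one 128-bit vector (4 x 32, 8 x 16, or 16 x 8 bits), which is why every function is expected to vectorize into a single `vsub`.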


RE: [PATCH] testsuite/arm: Add mve-vadd-1.c test

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 17 May 2021 10:54
> To: gcc Patches 
> Subject: Re: [PATCH] testsuite/arm: Add mve-vadd-1.c test
> 
> ping?
> 
> On Mon, 10 May 2021 at 13:22, Christophe Lyon
>  wrote:
> >
> > Ping?
> >
> > On Tue, 27 Apr 2021 at 13:32, Christophe Lyon
> >  wrote:
> > >
> > > Support for vadd has been present for a while, but it was lacking a
> > > test.
> > >

Ok.
Thanks,
Kyrill

> > > 2021-04-22  Christophe Lyon  
> > >
> > > gcc/testsuite/
> > > * gcc.target/arm/simd/mve-vadd-1.c: New.
> > > ---
> > >  gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c | 43
> ++
> > >  1 file changed, 43 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c
> > > new file mode 100644
> > > index 000..15a9daa
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c
> > > @@ -0,0 +1,43 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +#include 
> > > +
> > > +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)   \
> > > +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, \
> > > +TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> > > +int i; \
> > > +for (i=0; i > > +  dest[i] = a[i] OP b[i];  \
> > > +}  \
> > > +}
> > > +
> > > +/* 128-bit vectors.  */
> > > +FUNC(s, int, 32, 4, +, vadd)
> > > +FUNC(u, uint, 32, 4, +, vadd)
> > > +FUNC(s, int, 16, 8, +, vadd)
> > > +FUNC(u, uint, 16, 8, +, vadd)
> > > +FUNC(s, int, 8, 16, +, vadd)
> > > +FUNC(u, uint, 8, 16, +, vadd)
> > > +
> > > +/* { dg-final { scan-assembler-times {vadd\.i32  q[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > > +/* { dg-final { scan-assembler-times {vadd\.i16  q[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > > +/* { dg-final { scan-assembler-times {vadd\.i8  q[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > > +
> > > +void test_vadd_f32 (float * dest, float * a, float * b) {
> > > +  int i;
> > > +  for (i=0; i<4; i++) {
> > > +dest[i] = a[i] + b[i];
> > > +  }
> > > +}
> > > +/* { dg-final { scan-assembler-times {vadd\.f32 q[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > > +
> > > +void test_vadd_f16 (__fp16 * dest, __fp16 * a, __fp16 * b) {
> > > +  int i;
> > > +  for (i=0; i<8; i++) {
> > > +dest[i] = a[i] + b[i];
> > > +  }
> > > +}
> > > +/* { dg-final { scan-assembler-times {vadd\.f16 q[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > > --
> > > 2.7.4
> > >
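Each integer `scan-assembler-times` count in this test is presumably 2 because the signed and unsigned variants of each width compile to the same `vadd.iN` instruction. Two's-complement addition produces the same bit pattern regardless of signedness. The small check below verifies that property exhaustively for 8-bit lanes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Add as unsigned 8-bit lanes (wraps modulo 256).  */
static uint8_t
add_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (a + b);
}

/* Add as signed 8-bit lanes, then view the result's bit pattern.
   The conversion back to int8_t wraps on GCC (implementation-defined
   in ISO C).  */
static uint8_t
add_s8_bits (int8_t a, int8_t b)
{
  return (uint8_t) (int8_t) (a + b);
}

/* Return 1 if both interpretations agree for every pair of 8-bit inputs.  */
static int
i8_add_is_sign_agnostic (void)
{
  for (int x = 0; x < 256; x++)
    for (int y = 0; y < 256; y++)
      if (add_u8 ((uint8_t) x, (uint8_t) y)
          != add_s8_bits ((int8_t) x, (int8_t) y))
        return 0;
  return 1;
}
```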


[Patch] OpenMP/Fortran: Reject nonintrinsic assignments in OMP WORKSHARE [PR100633]

2021-05-17 Thread Tobias Burnus

OK for mainline?
It is an ice-on-invalid; does a GCC 11 backport nonetheless make sense?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
Register court: München HRB 106955; Managing directors: Thomas Heurung, Frank Thürauf
OpenMP/Fortran: Reject nonintrinsic assignments in OMP WORKSHARE [PR100633]

	PR fortran/100633

gcc/fortran/ChangeLog:

	* resolve.c (gfc_resolve_code): Reject nonintrinsic assignments in
	OMP WORKSHARE.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/workshare-59.f90: New test.

 gcc/fortran/resolve.c   |  6 ++
 gcc/testsuite/gfortran.dg/gomp/workshare-59.f90 | 26 +
 2 files changed, 32 insertions(+)

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index c02bbed8739..747516fbc1d 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -11940,6 +11940,12 @@ start:
 
 	  if (resolve_ordinary_assign (code, ns))
 	{
+	  if (omp_workshare_flag)
+		{
+		  gfc_error ("Expected intrinsic assignment in OMP WORKSHARE "
+			 "at %L", &code->loc);
+		  break;
+		}
 	  if (code->op == EXEC_COMPCALL)
 		goto compcall;
 	  else
diff --git a/gcc/testsuite/gfortran.dg/gomp/workshare-59.f90 b/gcc/testsuite/gfortran.dg/gomp/workshare-59.f90
new file mode 100644
index 000..65d04c2b55d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/workshare-59.f90
@@ -0,0 +1,26 @@
+! PR fortran/100633
+
+module defined_assign
+  interface assignment(=)
+module procedure work_assign
+  end interface
+
+  contains
+subroutine work_assign(a,b)
+  integer, intent(out) :: a
+  logical, intent(in) :: b(:)
+end subroutine work_assign
+end module defined_assign
+
+program omp_workshare
+  use defined_assign
+
+  integer :: a
+  logical :: l(10)
+  l = .TRUE.
+
+  !$omp workshare
+  a = l   ! { dg-error "Expected intrinsic assignment in OMP WORKSHARE" }
+  !$omp end workshare
+
+end program omp_workshare


RE: [PATCH] testsuite/arm: Add mve-vadd-scalar-1.c test

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 17 May 2021 10:54
> To: gcc Patches 
> Subject: Re: [PATCH] testsuite/arm: Add mve-vadd-scalar-1.c test
> 
> ping?
> 
> On Mon, 10 May 2021 at 13:22, Christophe Lyon
>  wrote:
> >
> > Ping?
> >
> > On Fri, 30 Apr 2021 at 16:06, Christophe Lyon
> >  wrote:
> > >
> > > This patch adds a test for the scalar form of vadd, specifically noting
> > > that we do not yet use the T2 variants of vadd, which take a scalar as
> > > the final argument.
> > >

Ok.
Thanks,
Kyrill

> > > 2021-04-22  Christophe Lyon  
> > >
> > > gcc/testsuite/
> > > * gcc.target/arm/simd/mve-vadd-scalar-1.c: New.
> > > ---
> > >  .../gcc.target/arm/simd/mve-vadd-scalar-1.c| 47
> ++
> > >  1 file changed, 47 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c
> > > new file mode 100644
> > > index 000..bbf70e1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c
> > > @@ -0,0 +1,47 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +#include 
> > > +
> > > +#define FUNC_IMM(SIGN, TYPE, BITS, NB, OP, NAME)   \
> > > +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, \
> > > +TYPE##BITS##_t *a) { \
> > > +int i; \
> > > +for (i=0; i > > +  dest[i] = a[i] OP 1; \
> > > +}  \
> > > +}
> > > +
> > > +/* 128-bit vectors.  */
> > > +FUNC_IMM(s, int, 32, 4, +, vaddimm)
> > > +FUNC_IMM(u, uint, 32, 4, +, vaddimm)
> > > +FUNC_IMM(s, int, 16, 8, +, vaddimm)
> > > +FUNC_IMM(u, uint, 16, 8, +, vaddimm)
> > > +FUNC_IMM(s, int, 8, 16, +, vaddimm)
> > > +FUNC_IMM(u, uint, 8, 16, +, vaddimm)
> > > +
> > > +/* For the moment we do not select the T2 vadd variant operating on a scalar
> > > +   final argument.  */
> > > +/* { dg-final { scan-assembler-times {vadd\.i32  q[0-9]+, q[0-9]+, r[0-9]+} 2 { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-times {vadd\.i16  q[0-9]+, q[0-9]+, r[0-9]+} 2 { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-times {vadd\.i8  q[0-9]+, q[0-9]+, r[0-9]+} 2 { xfail *-*-* } } } */
> > > +
> > > +void test_vaddimm_f32 (float * dest, float * a) {
> > > +  int i;
> > > +  for (i=0; i<4; i++) {
> > > +dest[i] = a[i] + 5.0;
> > > +  }
> > > +}
> > > +/* { dg-final { scan-assembler-times {vadd\.f32 q[0-9]+, q[0-9]+, r[0-9]+} 1 { xfail *-*-* } } } */
> > > +
> > > +/* Note that dest[i] = a[i] + 5.0f16 is not vectorized.  */
> > > +void test_vaddimm_f16 (__fp16 * dest, __fp16 * a) {
> > > +  int i;
> > > +  __fp16 b = 5.0f16;
> > > +  for (i=0; i<8; i++) {
> > > +dest[i] = a[i] + b;
> > > +  }
> > > +}
> > > +/* { dg-final { scan-assembler-times {vadd\.f16 q[0-9]+, q[0-9]+, r[0-9]+} 1 { xfail *-*-* } } } */
> > > --
> > > 2.7.4
> > >
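For context, the vector-plus-scalar form that the xfailed scans look for (for example `vadd.i32 q0, q1, r0`) adds the same general-purpose register value to every lane, so the scalar is effectively broadcast. Below is a scalar model of that operation for four 32-bit lanes. The function name is hypothetical, and the wraparound matches `.i32` integer addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a vector-plus-scalar add on four 32-bit lanes:
   dest[i] = a[i] + r for every lane, wrapping modulo 2^32.  */
static void
vadd_scalar_u32x4 (uint32_t dest[4], const uint32_t a[4], uint32_t r)
{
  assert (dest != NULL && a != NULL);
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] + r;
}
```

The FUNC_IMM tests above compute exactly this with r = 1.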


Re: [PATCH] arm: Fix ICE with CMSE nonsecure call on Armv8.1-M [PR100333]

2021-05-17 Thread Alex Coplan via Gcc-patches
On 30/04/2021 09:30, Alex Coplan via Gcc-patches wrote:
> Hi,
> 
> As the PR shows, we ICE shortly after expanding nonsecure calls for
> Armv8.1-M.  For Armv8.1-M, we have TARGET_HAVE_FPCXT_CMSE. As it stands,
> the expander (arm.md:nonsecure_call_internal) moves the callee's address
> to a register (with copy_to_suggested_reg) only if
> !TARGET_HAVE_FPCXT_CMSE.
> 
> However, looking at the pattern which the insn appears to be intended to
> match (thumb2.md:*nonsecure_call_reg_thumb2_fpcxt), it requires the
> callee's address to be in a register.
> 
> This patch therefore just forces the callee's address into a register in
> the expander.
> 
> Testing:
>  * Regtested an arm-eabi cross configured with
>  --with-arch=armv8.1-m.main+mve.fp+fp.dp --with-float=hard. No regressions.
>  * Bootstrap and regtest on arm-linux-gnueabihf in progress.
> 
> OK for trunk and backports as appropriate if bootstrap looks good?

Ping? Bootstrap/regtest looked good, FWIW.

> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   PR target/100333
>   * config/arm/arm.md (nonsecure_call_internal): Always ensure
>   callee's address is in a register.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/100333
>   * gcc.target/arm/cmse/pr100333.c: New test.

> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 45a471a887a..e2ad1a962e3 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -8580,18 +8580,21 @@ (define_expand "nonsecure_call_internal"
> (use (match_operand 2 "" ""))
> (clobber (reg:SI LR_REGNUM))])]
>"use_cmse"
> -  "
>{
> -if (!TARGET_HAVE_FPCXT_CMSE)
> -  {
> - rtx tmp =
> -   copy_to_suggested_reg (XEXP (operands[0], 0),
> -  gen_rtx_REG (SImode, R4_REGNUM),
> -  SImode);
> +rtx tmp = NULL_RTX;
> +rtx addr = XEXP (operands[0], 0);
>  
> - operands[0] = replace_equiv_address (operands[0], tmp);
> -  }
> -  }")
> +if (TARGET_HAVE_FPCXT_CMSE && !REG_P (addr))
> +  tmp = force_reg (SImode, addr);
> +else if (!TARGET_HAVE_FPCXT_CMSE)
> +  tmp = copy_to_suggested_reg (XEXP (operands[0], 0),
> +gen_rtx_REG (SImode, R4_REGNUM),
> +SImode);
> +
> +if (tmp)
> +  operands[0] = replace_equiv_address (operands[0], tmp);
> +  }
> +)
>  
>  (define_insn "*call_reg_armv5"
>[(call (mem:SI (match_operand:SI 0 "s_register_operand" "r"))
> diff --git a/gcc/testsuite/gcc.target/arm/cmse/pr100333.c b/gcc/testsuite/gcc.target/arm/cmse/pr100333.c
> new file mode 100644
> index 000..d8e3d809f73
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/cmse/pr100333.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-mcmse" } */
> +typedef void __attribute__((cmse_nonsecure_call)) t(void);
> +t g;
> +void f() {
> +  g();
> +}


-- 
Alex


Re: [OG11] Merge GCC 11 into branch, cherry picks from mainline

2021-05-17 Thread Tobias Burnus

On 14.05.21 10:51, Tobias Burnus wrote:


OG11 = devel/omp/gcc-11, a branch with some OpenMP/OpenACC/offload patches
which are not yet on mainline. Additionally, patches in this area are
cherry-picked from mainline.


Changes since last email (cherry pick, merge, post-cherry-pick fix):

0b8439a602c Fortran/OpenMP: Support 'omp parallel master'
e9e03ca4b9f Merge branch 'releases/gcc-11' into devel/omp/gcc-11
17c55806b37 c-c++-common/gomp/map-6.c: Fix dg-error due to mapping changes

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf


RE: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches


> -----Original Message-----
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 05 May 2021 15:08
> To: Andre Simoes Dias Vieira 
> Cc: gcc Patches 
> Subject: Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp
> 
> On Tue, 4 May 2021 at 15:41, Christophe Lyon 
> wrote:
> >
> > On Tue, 4 May 2021 at 13:29, Andre Vieira (lists)
> >  wrote:
> > >
> > > Hi Christophe,
> > >
> > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > Since MVE has a different set of vector comparison operators from
> > > > Neon, we have to update the expansion to take into account the new
> > > > ones, for instance 'NE', for which MVE does not require using 'EQ'
> > > > with the inverted condition.
> > > >
> > > > Conversely, Neon supports comparisons with #0; MVE does not.
> > > >
> > > > For:
> > > > typedef long int vs32 __attribute__((vector_size(16)));
> > > > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> > > >
> > > > we now generate:
> > > > cmp_eq_vs32_reg:
> > > >   vldr.64 d4, .L123   @ 8 [c=8 l=4]  *mve_movv4si/8
> > > >   vldr.64 d5, .L123+8
> > > >   vldr.64 d6, .L123+16@ 9 [c=8 l=4]  *mve_movv4si/8
> > > >   vldr.64 d7, .L123+24
> > > >   vcmp.i32  eq, q0, q1@ 7 [c=16 l=4]  mve_vcmpeqq_v4si
> > > >   vpsel q0, q3, q2@ 15[c=8 l=4]  mve_vpselq_sv4si
> > > >   bx  lr  @ 26[c=8 l=4]  *thumb2_return
> > > > .L124:
> > > >   .align  3
> > > > .L123:
> > > >   .word   0
> > > >   .word   0
> > > >   .word   0
> > > >   .word   0
> > > >   .word   1
> > > >   .word   1
> > > >   .word   1
> > > >   .word   1
> > > >
> > > > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode)) produces
> > > > a pair of vldr instead of vmov.i32, qX, #0
> > > I think ideally we would even want:
> > > vpte  eq, q0, q1
> > > vmovt.i32 q0, #0
> > > vmove.i32 q0, #1
> > >
> > > But we don't have a way to generate VPT blocks with multiple
> > > instructions yet unfortunately so I guess VPSEL will have to do for now.
> >
> > TBH,  I looked at what LLVM generates currently ;-)
> >
> 
> Here is an updated version, which adds
> && (! || flag_unsafe_math_optimizations)
> to vcond_mask_
> 
> This condition was not present in the neon.md version I moved to
> vec-common.md, but since the VDQW iterator includes V2SF and V4SF, it
> should take floating-point flags into account.
> 
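The NE/EQ difference described above can be illustrated with GCC's generic vector extensions. There, a lane-wise comparison yields -1 (all bits set) for true lanes and 0 for false ones. This is a semantic sketch only, not the expander: it shows that the Neon approach (compare EQ, then invert) and MVE's direct NE produce the same mask. The type and function names are illustrative.

```c
#include <assert.h>

/* Four 32-bit int lanes, using the GCC vector extension.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Neon-style NE: compare for equality, then invert the lane mask.  */
static v4si
ne_via_inverted_eq (v4si a, v4si b)
{
  return ~(a == b);
}

/* MVE-style NE: a direct lane-wise not-equal comparison.  */
static v4si
ne_direct (v4si a, v4si b)
{
  return a != b;
}
```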

-  emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
+case NE:
+  if (TARGET_HAVE_MVE) {
+   rtx vpr_p0;

GNU style wants the '{' on a new line. This appears a few other times in the patch.

+   if (vcond_mve)
+ vpr_p0 = target;
+   else
+ vpr_p0 = gen_reg_rtx (HImode);
+
+   switch (cmp_mode)
+ {
+ case E_V16QImode:
+ case E_V8HImode:
+ case E_V4SImode:
+   emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
+   break;
+ case E_V8HFmode:
+ case E_V4SFmode:
+   if (TARGET_HAVE_MVE_FLOAT)
+ emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
+   else
+ gcc_unreachable ();
+   break;
+ default:
+   gcc_unreachable ();
+ }

Hmm, I think we can just check GET_MODE_CLASS (cmp_mode) for MODE_VECTOR_INT or 
MODE_VECTOR_FLOAT here rather than have this switch statement.
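Here is a minimal sketch of the suggested dispatch. It uses hypothetical stand-ins for GCC's machine modes and for `GET_MODE_CLASS`. Once the mode class is known, only two cases remain instead of one case per mode. In GCC itself the class would come from `GET_MODE_CLASS (cmp_mode)` (MODE_VECTOR_INT or MODE_VECTOR_FLOAT) rather than from a hand-written table like `vmode_class_of`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a few GCC machine modes and mode classes.  */
enum vmode { V16QI_M, V8HI_M, V4SI_M, V8HF_M, V4SF_M, SI_M };
enum vmode_class { CLASS_VECTOR_INT, CLASS_VECTOR_FLOAT, CLASS_OTHER };
enum vcmp_kind { VCMP_INT, VCMP_FLOAT, VCMP_NONE };

/* Plays the role of GET_MODE_CLASS in this sketch.  */
static enum vmode_class
vmode_class_of (enum vmode m)
{
  switch (m)
    {
    case V16QI_M: case V8HI_M: case V4SI_M:
      return CLASS_VECTOR_INT;
    case V8HF_M: case V4SF_M:
      return CLASS_VECTOR_FLOAT;
    default:
      return CLASS_OTHER;
    }
}

/* Choose the comparison pattern from the mode class, not the mode.
   VCMP_NONE marks the cases where the patch calls gcc_unreachable.  */
static enum vcmp_kind
select_vcmp (enum vmode m, bool have_mve_float)
{
  switch (vmode_class_of (m))
    {
    case CLASS_VECTOR_INT:
      return VCMP_INT;
    case CLASS_VECTOR_FLOAT:
      return have_mve_float ? VCMP_FLOAT : VCMP_NONE;
    default:
      return VCMP_NONE;
    }
}
```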

+
+   /* If we are not expanding a vcond, build the result here.  */
+   if (!vcond_mve) {
+ rtx zero = gen_reg_rtx (cmp_result_mode);
+ rtx one = gen_reg_rtx (cmp_result_mode);
+ emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
+ emit_move_insn (one, CONST1_RTX (cmp_result_mode));
+ emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
+   }
+  }
+  else

...
   bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
-operands[4], operands[5], true);
+operands[4], operands[5], true, vcond_mve);
   if (inverted)
 std::swap (operands[1], operands[2]);
+  if (TARGET_NEON)
   emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0],
mask, operands[1], operands[2]));
+  else
+{
+  machine_mode cmp_mode = GET_MODE (operands[4]);
+  rtx vpr_p0 = mask;
+  rtx zero = gen_reg_rtx (cmp_mode);
+  rtx one = gen_reg_rtx (cmp_mode);
+  emit_move_insn (zero, CONST0_RTX (cmp_mode));
+  emit_move_insn (one, CONST1_RTX (cmp_mode));
+  switch (cmp_mode)
+   {
+   case E_V16QImode:
+   case E_V8HImode:
+   case E_V4SImode:
+ emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0));
+ break;
+   case E_V8HFmode:
+   case E_V4SFmode:
+ if (TARGET_HAVE_MVE_FLOAT)
+

Re: [Patch] OpenMP/Fortran: Reject nonintrinsic assignments in OMP WORKSHARE [PR100633]

2021-05-17 Thread Jakub Jelinek via Gcc-patches
On Mon, May 17, 2021 at 12:27:22PM +0200, Tobias Burnus wrote:
> OK for mainline?
> It is an ice-on-invalid; does a GCC 11 backport nonetheless make sense?
> 
> Tobias
> 

> OpenMP/Fortran: Reject nonintrinsic assignments in OMP WORKSHARE [PR100633]
> 
>   PR fortran/100633
> 
> gcc/fortran/ChangeLog:
> 
>   * resolve.c (gfc_resolve_code): Reject nonintrinsic assignments in
>   OMP WORKSHARE.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gfortran.dg/gomp/workshare-59.f90: New test.

LGTM for both trunk and 11.
Thanks.

Jakub



RE: [PATCH 7/9] arm: Auto-vectorization for MVE: add __fp16 support to VCMP

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches


> -----Original Message-----
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 05 May 2021 15:09
> To: Andre Simoes Dias Vieira 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 7/9] arm: Auto-vectorization for MVE: add __fp16
> support to VCMP
> 
> On Tue, 4 May 2021 at 19:03, Christophe Lyon 
> wrote:
> >
> > On Tue, 4 May 2021 at 15:43, Christophe Lyon wrote:
> > >
> > > On Tue, 4 May 2021 at 13:48, Andre Vieira (lists)
> > >  wrote:
> > > >
> > > > It would be good to also add tests for NEON as you also enable
> > > > auto-vec for it. I checked and I do think the necessary 'neon_vc'
> > > > patterns exist for 'VH', so we should be OK there.
> > > >
> > >
> > > Actually since I posted the patch series, I've noticed a regression in
> > > armv8_2-fp16-arith-1.c, because we now vectorize all the float16x[48]_t
> > > loops, but we lose the fact that some FP comparisons can throw exceptions.
> > >
> > > I'll have to revisit this patch.
> >
> > Actually it looks like my patch does the right thing: we now vectorize
> > appropriately, given that the testcase is compiled with -ffast-math.
> > I need to update the testcase, though.
> >
> 
> Here is a new version, with armv8_2-fp16-arith-1.c updated to take
> into account the new vectorization.

Ok.
Thanks,
Kyrill

> 
> Christophe
> 
> 
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> > > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > > This patch adds __fp16 support to the previous patch that added
> > > > > vcmp support with MVE. For this we update existing expanders to use
> > > > > VDQWH iterator, and add a new expander vcond.  In the
> > > > > process we need to create suitable iterators, and update
> > > > > v_cmp_result as needed.
> > > > >
> > > > > 2021-04-26  Christophe Lyon  
> > > > >
> > > > >   gcc/
> > > > >   * config/arm/iterators.md (V16): New iterator.
> > > > >   (VH_cvtto): New iterator.
> > > > >   (v_cmp_result): Added V4HF and V8HF support.
> > > > >   * config/arm/vec-common.md (vec_cmp): Use VDQWH.
> > > > >   (vcond): Likewise.
> > > > >   (vcond_mask_): Likewise.
> > > > >   (vcond): New expander.
> > > > >
> > > > >   gcc/testsuite/
> > > > >   * gcc.target/arm/simd/mve-compare-3.c: New test with GCC vectors.
> > > > >   * gcc.target/arm/simd/mve-vcmp-f16.c: New test for
> > > > >   auto-vectorization.
> > > > > ---
> > > > >   gcc/config/arm/iterators.md   |  6 
> > > > >   gcc/config/arm/vec-common.md  | 40 ---
> > > > >   gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c | 38 +
> > > > >   gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c  | 30 +
> > > > >   4 files changed, 102 insertions(+), 12 deletions(-)
> > > > >   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c
> > > > >   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c
> > > > >
> > > > > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > > > > index a128465..3042baf 100644
> > > > > --- a/gcc/config/arm/iterators.md
> > > > > +++ b/gcc/config/arm/iterators.md
> > > > > @@ -231,6 +231,9 @@ (define_mode_iterator VU [V16QI V8HI V4SI])
> > > > >   ;; Vector modes for 16-bit floating-point support.
> > > > >   (define_mode_iterator VH [V8HF V4HF])
> > > > >
> > > > > +;; Modes with 16-bit elements only.
> > > > > +(define_mode_iterator V16 [V4HI V4HF V8HI V8HF])
> > > > > +
> > > > >   ;; 16-bit floating-point vector modes suitable for moving (includes BFmode).
> > > > >   (define_mode_iterator VHFBF [V8HF V4HF V4BF V8BF])
> > > > >
> > > > > @@ -571,6 +574,8 @@ (define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF "v2si")
> > > > >   ;; (Opposite) mode to convert to/from for vector-half mode conversions.
> > > > >   (define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
> > > > >   (V8HI "V8HF") (V8HF "V8HI")])
> > > > > +(define_mode_attr VH_cvtto [(V4HI "v4hf") (V4HF "v4hi")
> > > > > + (V8HI "v8hf") (V8HF "v8hi")])
> > > > >
> > > > >   ;; Define element mode for each vector mode.
> > > > >   (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
> > > > > @@ -720,6 +725,7 @@ (define_mode_attr V_cmp_result [(V8QI "V8QI") (V16QI "V16QI")
> > > > >   (define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
> > > > >   (V4HI "v4hi") (V8HI  "v8hi")
> > > > >   (V2SI "v2si") (V4SI  "v4si")
> > > > > + (V4HF "v4hi") (V8HF  "v8hi")
> > > > >   (DI   "di")   (V2DI  "v2di")
> > > > >   (V2SF "v2si") (V4SF  "v4si")])
> > > > >
> > > > > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> > > > > index 034b48b..3fd341c 100644
> > > > > --- a/gcc/config/arm/vec-common.md
> > > > 

[PATCH][DOCS] Remove install-old.texi

2021-05-17 Thread Martin Liška

Hello.

As mentioned at the beginning of https://gcc.gnu.org/install/old.html:
"Note most of this information is out of date and superseded by the previous 
chapters of this manual."

This installation page has been deprecated for 20 years now.

Does it make sense to remove it?
Thanks,
Martin

gcc/ChangeLog:

* Makefile.in (TEXI_GCCINSTALL_FILES): Remove install-old.texi.
* doc/include/fdl.texi: Update next/previous chapters.
* doc/install.texi: Likewise.
* doc/install-old.texi: Removed.
---
 gcc/Makefile.in  |   2 +-
 gcc/doc/include/fdl.texi |   2 +-
 gcc/doc/install-old.texi | 184 ---
 gcc/doc/install.texi |  20 +
 4 files changed, 3 insertions(+), 205 deletions(-)
 delete mode 100644 gcc/doc/install-old.texi

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 1b5d3f4696c..5fd6ac97117 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3314,7 +3314,7 @@ TEXI_GCCINT_FILES = gccint.texi gcc-common.texi gcc-vers.texi \
 loop.texi generic.texi gimple.texi plugins.texi optinfo.texi   \
 match-and-simplify.texi analyzer.texi ux.texi poly-int.texi
 
-TEXI_GCCINSTALL_FILES = install.texi install-old.texi fdl.texi  \
+TEXI_GCCINSTALL_FILES = install.texi fdl.texi  \
 gcc-common.texi gcc-vers.texi
 
 TEXI_CPPINT_FILES = cppinternals.texi gcc-common.texi gcc-vers.texi

diff --git a/gcc/doc/include/fdl.texi b/gcc/doc/include/fdl.texi
index 4e3457fe9c4..7fa222c5f32 100644
--- a/gcc/doc/include/fdl.texi
+++ b/gcc/doc/include/fdl.texi
@@ -19,7 +19,7 @@ of this license document, but changing it is not allowed.
 @ifset gfdlhtml
 @ifnothtml
 @comment node-name, next,  previous, up
-@node GNU Free Documentation License, Concept Index, Old, Top
+@node GNU Free Documentation License, Concept Index, Specific, Top
 @end ifnothtml
 @html
 Installing GCC: GNU Free Documentation License
diff --git a/gcc/doc/install-old.texi b/gcc/doc/install-old.texi
deleted file mode 100644
index b425971f944..000
--- a/gcc/doc/install-old.texi
+++ /dev/null
@@ -1,184 +0,0 @@
-@c Copyright (C) 1988-2021 Free Software Foundation, Inc.
-@c This is part of the GCC manual.
-@c For copying conditions, see the file install.texi.
-
-@ifnothtml
-@comment node-name, next,  previous, up
-@node Old, GNU Free Documentation License, Specific, Top
-@end ifnothtml
-@html
-Old installation documentation
-@end html
-@ifnothtml
-@chapter Old installation documentation
-@end ifnothtml
-
-Note most of this information is out of date and superseded by the
-previous chapters of this manual.  It is provided for historical
-reference only, because of a lack of volunteers to merge it into the
-main manual.
-
-@ifnothtml
-@menu
-* Configurations::    Configurations Supported by GCC.
-@end menu
-@end ifnothtml
-
-Here is the procedure for installing GCC on a GNU or Unix system.
-
-@enumerate
-@item
-If you have chosen a configuration for GCC which requires other GNU
-tools (such as GAS or the GNU linker) instead of the standard system
-tools, install the required tools in the build directory under the names
-@file{as}, @file{ld} or whatever is appropriate.
-
-Alternatively, you can do subsequent compilation using a value of the
-@code{PATH} environment variable such that the necessary GNU tools come
-before the standard system tools.
-
-@item
-Specify the host, build and target machine configurations.  You do this
-when you run the @file{configure} script.
-
-The @dfn{build} machine is the system which you are using, the
-@dfn{host} machine is the system where you want to run the resulting
-compiler (normally the build machine), and the @dfn{target} machine is
-the system for which you want the compiler to generate code.
-
-If you are building a compiler to produce code for the machine it runs
-on (a native compiler), you normally do not need to specify any operands
-to @file{configure}; it will try to guess the type of machine you are on
-and use that as the build, host and target machines.  So you don't need
-to specify a configuration when building a native compiler unless
-@file{configure} cannot figure out what your configuration is or guesses
-wrong.
-
-In those cases, specify the build machine's @dfn{configuration name}
-with the @option{--host} option; the host and target will default to be
-the same as the host machine.
-
-Here is an example:
-
-@smallexample
-./configure --host=sparc-sun-sunos4.1
-@end smallexample
-
-A configuration name may be canonical or it may be more or less
-abbreviated.
-
-A canonical configuration name has three parts, separated by dashes.
-It looks like this: @samp{@var{cpu}-@var{company}-@var{system}}.
-(The three parts may themselves contain dashes; @file{configure}
-can figure out which dashes serve which purpose.)  For example,
-@samp{m68k-sun-sunos4.1} specifies a Sun 3.
-
-You can also replace parts of the configuration by nicknames or aliases.
-For example, @samp{sun3} stands for @samp{m68k-sun}, so

Re: RFA: Add option -fretry-compilation

2021-05-17 Thread Richard Biener via Gcc-patches
On Mon, May 17, 2021 at 10:55 AM Joern Rennecke
 wrote:
>
> On Mon, 17 May 2021 at 08:36, Richard Biener  
> wrote:
> >
> > On Sun, May 16, 2021 at 8:53 PM Joern Rennecke
> >  wrote:
> > >
> > > For architectures with likely spilled register classes, neither
> > > register allocator is guaranteed to succeed when using optimization.
> > > If you have just a few files to compile, you can try by hand which
> > > compiler options will succeed and still give reasonable code, but for
> > > large projects, hand-tweaking library / program build rules on a
> > > file-by-file basis is time intensive and does not scale well across
> > > different build environments and compiler versions.
> > >
> > > The attached patch adds a new option -fretry-compilation that allows
> > > you to specify a list - or lists - of options to use for a compilation
> > > retry, which is implemented in the compiler driver.
> > >
> > > Bootstrapped on x86_64-pc-linux-gnu.
> >
> > Eh, no ;)  But funny idea, nevertheless.
>
> Why no?
>
> lra just throws a ton of transformations at the code with no theoretical
> concept that I can discern that it should - modulo bugs - succeed for
> all well-formed code.  It works well most of the time so I'd like to use it as
> a default, but how are you supposed to compile libgcc and newlib with
> a register allocator that only works most of the time?
>
> reload is more robust in the basic design, but it's so complex that it's
> rather time-consuming to debug.  The failures I had left with reload
> were not spill-failures per se, but code that was considered mal-formed by
> the postreload passes and it's hard to decide which one was actually wrong.
> And if I debug the failures seen with reload, will this do any good in the
> long run, or will it just be changed beyond all recognition (which works for
> the top five most popular processor architectures but not quite for anything
> else) or plain ripped out a few years down the line?

The plan for reload is to axe it, similar to CC0 support.  Sooner rather
than later, but given that it's still used exclusively by a lot of targets,
it might take some time.

> I had a proof-of-concept for the option in the target code first, but that
> used fork(2) and thus left non-POSIX hosts (even if they have a pretend POSIX
> subsystem) high and dry.  The logical place to implement the option to
> make it portable is in the compiler driver.
> I originally called the option -mretry-regalloc / -fretry-regalloc, but when
> I got around to writing the invoke.texi patch, I realized that the option can be
> used more generally to work around glitches, so it's more apt to name it
> -fretry-compilation .
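The retry policy under discussion can be sketched as a driver-side loop. The compilation runs once; on failure, it is re-run with each alternative option list in turn. The sketch below is hypothetical and is not taken from the posted patch. The runner is a callback so the policy can be shown without spawning processes. The `-mno-lra` demo runner only mirrors the `-m[no-]lra` example from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical runner type: returns 0 if the command succeeded.  */
typedef int (*run_fn) (const char *cmd);

/* Run BASE; if it fails, retry with BASE plus each entry of RETRY_OPTS
   in order.  Returns -1 if BASE itself succeeded, the index of the
   first retry option list that succeeded, or -2 if all attempts failed.  */
static int
compile_with_retries (const char *base, const char *const *retry_opts,
                      size_t n_retry, run_fn run)
{
  char cmd[512];

  if (run (base) == 0)
    return -1;

  for (size_t i = 0; i < n_retry; i++)
    {
      int len = snprintf (cmd, sizeof cmd, "%s %s", base, retry_opts[i]);
      if (len < 0 || (size_t) len >= sizeof cmd)
        continue;               /* Command would be truncated; skip it.  */
      if (run (cmd) == 0)
        return (int) i;
    }
  return -2;
}

/* Demo runner for illustration only: "succeeds" only when -mno-lra
   appears on the command line.  */
static int
fake_run_needs_no_lra (const char *cmd)
{
  return strstr (cmd, "-mno-lra") != NULL ? 0 : 1;
}
```

Keeping the process-spawning step behind a callback also reflects the portability point above: the policy itself needs nothing beyond standard C.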

So for you it's always just -fretry-compilation -m[no-]lra?  Given that
-m[no-]lra exists, shouldn't it be possible to cycle between LRA and reload
directly in the register allocator?  Or are reload/LRA too greedy, in that
they ICE after having already transformed half of the code?

> > Do you run into the issues
> > with the first scheduling pass disabled?
>
> The target doesn't have anything that needs scheduling, and hence no
> scheduling description.  But it also has more severe register pressures
> for memory access than ports in the FSF tree.

I see.  It's of course difficult for the FSF tree to cater for extremes
that are not represented in its tree.  I wonder what prevents you from
contributing the port?

> The bane of lra are memory-memory moves.  Instead of using an intermediate
> register, it starts by reloading the well-formed addresses and thus jacking up
> the base register pressure.
>
> I had a patch for that, but I found it needs a bit more work.

Still, if that solves a lot of the issues, this seems like the way to go.

Richard.


[committed] libstdc++: Allow lualatex to be used for Doxygen PDF

2021-05-17 Thread Jonathan Wakely via Gcc-patches
This allows the Doxygen PDF to be built using lualatex instead of
pdflatex, which solves a problem with pdflatex running out of memory
sometimes. This is done by adding a --latex_cmd option to the
run_doxygen script, which then sets the specified command in the
generated user.cfg file used by Doxygen. The makefile is adjusted to
pass --latex_cmd=$(LATEX_CMD) to the script, so running make with
LATEX_CMD=lualatex will override the default.

Additionally, this does some refactoring of the doc/Makefile.am rules
and the run_doxygen script.

libstdc++-v3/ChangeLog:

* doc/Makefile.am: Simplify doxygen recipes and use --latex_cmd.
* doc/Makefile.in: Regenerate.
* doc/doxygen/user.cfg.in (LATEX_CMD_NAME): Add placeholder
value.
* scripts/run_doxygen (print_usage): Always print to stdout and
do not exit.
(fail): New function for exiting on error.
(parse_options): Handle --latex_cmd. Do not treat --help the
same as errors. Simplify handling of required arguments.

Tested x86_64-linux. Committed to trunk.

commit e3b6d3a887fc0df09ea742c9c5a5acbc27c11ea7
Author: Jonathan Wakely 
Date:   Fri May 14 14:19:50 2021

libstdc++: Allow lualatex to be used for Doxygen PDF

This allows the Doxygen PDF to be built using lualatex instead of
pdflatex, which solves a problem with pdflatex running out of memory
sometimes. This is done by adding a --latex_cmd option to the
run_doxygen script, which then sets the specified command in the
generated user.cfg file used by Doxygen. The makefile is adjusted to
pass --latex_cmd=$(LATEX_CMD) to the script, so running make with
LATEX_CMD=lualatex will override the default.

Additionally, this does some refactoring of the doc/Makefile.am rules
and the run_doxygen script.

libstdc++-v3/ChangeLog:

* doc/Makefile.am: Simplify doxygen recipes and use --latex_cmd.
* doc/Makefile.in: Regenerate.
* doc/doxygen/user.cfg.in (LATEX_CMD_NAME): Add placeholder
value.
* scripts/run_doxygen (print_usage): Always print to stdout and
do not exit.
(fail): New function for exiting on error.
(parse_options): Handle --latex_cmd. Do not treat --help the
same as errors. Simplify handling of required arguments.

diff --git a/libstdc++-v3/doc/Makefile.am b/libstdc++-v3/doc/Makefile.am
index 2f8bb0770f3..487e8621b23 100644
--- a/libstdc++-v3/doc/Makefile.am
+++ b/libstdc++-v3/doc/Makefile.am
@@ -226,10 +226,10 @@ ${doxygen_outdir}/man:
mkdir -p ${doxygen_outdir}/man
 
 stamp-xml-doxygen: ${doxygen_outdir}/xml
-   -(srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \
- builddir=`cd ..; ${PWD_COMMAND}`; \
+   @builddir=`cd ..; ${PWD_COMMAND}`; \
  ${SHELL} ${doxygen_script} \
- --host_alias=${host_alias} --mode=xml $${srcdir} $${builddir} NO)
+ --host_alias=${host_alias} --mode=xml \
+ "${top_srcdir}" "$${builddir}" NO || true
$(STAMP) stamp-xml-doxygen
 
 stamp-xml-single-doxygen: stamp-xml-doxygen
@@ -239,29 +239,29 @@ stamp-xml-single-doxygen: stamp-xml-doxygen
$(STAMP) stamp-xml-single-doxygen
 
 stamp-html-doxygen: ${doxygen_outdir}/html
-   -(srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \
- builddir=`cd ..; ${PWD_COMMAND}`; \
+   @builddir=`cd ..; ${PWD_COMMAND}`; \
  ${SHELL} ${doxygen_script} \
- --host_alias=${host_alias} --mode=html $${srcdir} $${builddir} YES)
+ --host_alias=${host_alias} --mode=html \
+ "${top_srcdir}" "$${builddir}" YES || true
$(STAMP) stamp-html-doxygen
 
 stamp-latex-doxygen: ${doxygen_outdir}/latex
-   -(srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \
- builddir=`cd ..; ${PWD_COMMAND}`; \
+   @builddir=`cd ..; ${PWD_COMMAND}`; \
  ${SHELL} ${doxygen_script} \
- --host_alias=${host_alias} --mode=latex $${srcdir} $${builddir} NO)
+ --host_alias=${host_alias} --mode=latex --latex_cmd=$(LATEX_CMD) \
+ "${top_srcdir}" "$${builddir}" NO || true
$(STAMP) stamp-latex-doxygen
 
 # Chance of loonnggg creation time on this rule.  Iff this fails,
 # look at refman.log and see if TeX's memory is exhausted. Symptoms
 # include asking a wizard to enlarge capacity. If this is the case,
 # find texmf.cnf and add a zero for pool_size, string_vacancies,
-# max_strings, and pool_free values. A much simpler workaround is to install
-# lualatex and set LATEX_CMD_NAME = lualatex in the doxygen user.cfg file.
+# max_strings, and pool_free values. A much simpler workaround is to
+# install lualatex and set LATEX_CMD=lualatex when running make.
 # Errors like "File `foo.sty' not found" mean a TeX package is missing.
 stamp-pdf-doxygen: stamp-latex-doxygen ${doxygen_outdir}/pdf
-   -(cd ${doxygen_outdir}/latex && $(MAKE) -i pdf;)
@echo "Generating doxygen pdf file...";
+   -

[PATCH] c++: Fix diagnostic for binding lvalue reference to volatile rvalue [PR 100635]

2021-05-17 Thread Jonathan Wakely via Gcc-patches
The current diagnostic assumes the reference binding fails because the
reference is non-const, but it can also fail if the rvalue is volatile.

Use the current diagnostic for non-const cases, and a modified
diagnostic otherwise.
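The diagnostic selection in the patch can be restated as a small decision function. This C sketch is hypothetical. The `bool` stands in for `CP_TYPE_CONST_P (TREE_TYPE (ref_type))`, and the returned strings are the format strings used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical restatement of the patched branch in convert_like_internal.
   REF_IS_CONST stands in for CP_TYPE_CONST_P (TREE_TYPE (ref_type)).  */
static const char *
ref_bind_error_format (bool ref_is_const)
{
  if (!ref_is_const)
    return "cannot bind non-const lvalue reference of "
           "type %qH to an rvalue of type %qI";
  /* Per the patch's comment, a const lvalue reference only reaches this
     point when the rvalue's type is volatile.  */
  return "cannot bind lvalue reference of type "
         "%qH to an rvalue of type %qI";
}
```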

gcc/cp/ChangeLog:

PR c++/100635
* call.c (convert_like_internal): Print different diagnostic if
the lvalue reference is const.

gcc/testsuite/ChangeLog:

* g++.dg/conversion/pr100635.C: New test.

Tested powerpc64le-linux.

OK for trunk?


commit 26624b68aebd80d0c922ee48f944124dcc8c02e2
Author: Jonathan Wakely 
Date:   Mon May 17 10:53:56 2021

c++: Fix diagnostic for binding lvalue reference to volatile rvalue [PR 100635]

The current diagnostic assumes the reference binding fails because the
reference is non-const, but it can also fail if the rvalue is volatile.

Use the current diagnostic for non-const cases, and a modified
diagnostic otherwise.

gcc/cp/ChangeLog:

PR c++/100635
* call.c (convert_like_internal): Print different diagnostic if
the lvalue reference is const.

gcc/testsuite/ChangeLog:

* g++.dg/conversion/pr100635.C: New test.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index f07e09a36d1..1e2d1d43184 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -7900,9 +7900,13 @@ convert_like_internal (conversion *convs, tree expr, tree fn, int argnum,
  "type %qH to a value of type %qI",
  totype, next->type);
  }
-   else
+   else if (!CP_TYPE_CONST_P (TREE_TYPE (ref_type)))
  error_at (loc, "cannot bind non-const lvalue reference of "
"type %qH to an rvalue of type %qI", totype, extype);
+   else // extype is volatile
+ error_at (loc, "cannot bind lvalue reference of type "
+   "%qH to an rvalue of type %qI", totype,
+   extype);
  }
else if (!reference_compatible_p (TREE_TYPE (totype), extype))
  {
diff --git a/gcc/testsuite/g++.dg/conversion/pr100635.C b/gcc/testsuite/g++.dg/conversion/pr100635.C
new file mode 100644
index 000..58412152238
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/pr100635.C
@@ -0,0 +1,12 @@
+// PR c++/100635
+// { dg-do compile }
+// { dg-additional-options "-Wno-volatile" { target c++2a } }
+
+struct S { };
+volatile S v();
+const volatile S& svol = v(); // { dg-error "cannot bind lvalue reference of type 'const volatile S&' to an rvalue of type 'volatile S'" }
+
+#if __cplusplus >= 201103L
+volatile int&& declvol();
+const volatile int& voli = declvol(); // { dg-error "cannot bind lvalue reference of type 'const volatile int&' to an rvalue of type 'volatile int'" "" { target c++11} }
+#endif


[PATCH] aix: handle 64bit inodes for include directories

2021-05-17 Thread CHIGOT, CLEMENT via Gcc-patches
On AIX, stat stores inodes in 32 bits even when using LARGE_FILES.
If the inode is larger, it returns -1 in st_ino.
Thus, in incpath.c, when comparing include directories, if several
of them have 64-bit inodes, they will be considered duplicates.

gcc/ChangeLog:
2021-05-06  Clément Chigot  

* configure.ac: Check sizeof ino_t and dev_t.
* config.in: Regenerate.
* configure: Regenerate.
* config/rs6000/aix.h (HOST_STAT_FOR_64BIT_INODES): New define.
* incpath.c (HOST_STAT_FOR_64BIT_INODES): New define.
(remove_duplicates): Use it.

libcpp/ChangeLog:
2021-05-06  Clément Chigot  

* configure.ac: Check sizeof ino_t and dev_t.
* config.in: Regenerate.
* configure: Regenerate.
* include/cpplib.h (INO_T_CPP): Change for AIX.
(DEV_T_CPP): New macro.
(struct cpp_dir): Use it.
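The failure mode described above can be modeled in a few lines of C. In this hypothetical sketch, `narrow_ino` imitates a `stat` that can only report 32-bit inode numbers and returns all ones (that is, -1) for anything larger. The two comparison functions contrast that behavior with comparing full 64-bit values. The struct and function names are illustrative and are not the ones used in incpath.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identity of an include directory, wide enough for
   64-bit inode and device numbers.  */
struct dir_id
{
  uint64_t ino;
  uint64_t dev;
};

/* Model of a stat limited to 32-bit inodes: larger values are
   reported as (uint32_t) -1.  */
static uint32_t
narrow_ino (uint64_t ino)
{
  return ino > UINT32_MAX ? UINT32_MAX : (uint32_t) ino;
}

/* Duplicate check as seen through the narrowed inode.  */
static bool
same_dir_narrow (struct dir_id a, struct dir_id b)
{
  return narrow_ino (a.ino) == narrow_ino (b.ino) && a.dev == b.dev;
}

/* Duplicate check on the full 64-bit values.  */
static bool
same_dir_wide (struct dir_id a, struct dir_id b)
{
  return a.ino == b.ino && a.dev == b.dev;
}
```

Two distinct directories on the same device with inodes above 2^32 compare equal under `same_dir_narrow`, so one of them would be dropped as a duplicate include directory.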





0001-aix-handle-64bit-inodes-for-include-directories.patch
Description: 0001-aix-handle-64bit-inodes-for-include-directories.patch


[PATCH] RISC-V: Properly parse the letter 'p' in '-march'.

2021-05-17 Thread Geng Qi via Gcc-patches
gcc/ChangeLog:
* common/config/riscv/riscv-common.c
(riscv_subset_list::parsing_subset_version): Properly parse the letter
'p' in '-march'.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-12.c: New.
* gcc.target/riscv/attribute-19.c: New.
---
 gcc/common/config/riscv/riscv-common.c| 64 +--
 gcc/testsuite/gcc.target/riscv/arch-12.c  |  4 ++
 gcc/testsuite/gcc.target/riscv/attribute-19.c |  4 ++
 3 files changed, 40 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-19.c

diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
index 34b74e5..76f544e 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -518,40 +518,38 @@ riscv_subset_list::parsing_subset_version (const char *ext,
   unsigned version = 0;
   unsigned major = 0;
   unsigned minor = 0;
-  char np;
   *explicit_version_p = false;
 
-  for (; *p; ++p)
-{
-  if (*p == 'p')
-   {
- np = *(p + 1);
-
- if (!ISDIGIT (np))
-   {
- /* Might be beginning of `p` extension.  */
- if (std_ext_p)
-   {
- get_default_version (ext, major_version, minor_version);
- return p;
-   }
- else
-   {
- error_at (m_loc, "%<-march=%s%>: Expect number "
-   "after %<%dp%>.", m_arch, version);
- return NULL;
-   }
-   }
-
- major = version;
- major_p = false;
- version = 0;
-   }
-  else if (ISDIGIT (*p))
-   version = (version * 10) + (*p - '0');
-  else
-   break;
-}
+  if (*p == 'p')
+gcc_assert (std_ext_p);
+  else {
+for (; *p; ++p)
+  {
+   if (*p == 'p')
+ {
+   if (!ISDIGIT (*(p+1)))
+ {
+   error_at (m_loc, "%<-march=%s%>: Expect number "
+ "after %<%dp%>.", m_arch, version);
+   return NULL;
+ }
+   if (!major_p)
+ {
+   error_at (m_loc, "%<-march=%s%>: For %<%s%dp%dp?%>, version "
+ "number with more than 2 level is not supported.",
+ m_arch, ext, major, version);
+   return NULL;
+ }
+   major = version;
+   major_p = false;
+   version = 0;
+ }
+   else if (ISDIGIT (*p))
+ version = (version * 10) + (*p - '0');
+   else
+ break;
+  }
+  }
 
   if (major_p)
 major = version;
@@ -681,6 +679,8 @@ riscv_subset_list::parse_std_ext (const char *p)
 
   p = parsing_subset_version (subset, p, &major_version, &minor_version,
  /* std_ext_p= */ true, &explicit_version_p);
+  if (p == NULL)
+   return NULL;
 
   add (subset, major_version, minor_version, explicit_version_p, false);
 }
diff --git a/gcc/testsuite/gcc.target/riscv/arch-12.c b/gcc/testsuite/gcc.target/riscv/arch-12.c
new file mode 100644
index 000..29e16c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-12.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64im1p2p3 -mabi=lp64" } */
+int foo() {}
+/* { dg-error "'-march=rv64im1p2p3': For 'm1p2p\\?', version number with more than 2 level is not supported." "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-19.c b/gcc/testsuite/gcc.target/riscv/attribute-19.c
new file mode 100644
index 000..18f68d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/attribute-19.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-mriscv-attribute -march=rv64imp0p9 -mabi=lp64" } */
+int foo() {}
+/* { dg-final { scan-assembler ".attribute arch, \"rv64i2p0_m2p0_p0p9\"" } } */
-- 
2.7.4
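After an extension letter, the new loop accepts a version of the form `<major>[p<minor>]`. A leading `p` means the `p` extension itself. The stand-alone C sketch below mirrors that loop outside GCC; `parse_version` is a hypothetical name. Where GCC would go on to choose a version for an unversioned extension, this sketch simply reports 0.0 with no explicit version.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-alone model of the patched version loop.
   Parses "<major>[p<minor>]" at P and returns a pointer just past it,
   or NULL for a malformed version such as "2p" or "1p2p3".  */
static const char *
parse_version (const char *p, unsigned *major, unsigned *minor,
               bool *explicit_p)
{
  unsigned version = 0, maj = 0, min = 0;
  bool major_p = true;     /* Still reading the major number.  */
  bool seen_digit = false;

  *major = *minor = 0;
  *explicit_p = false;

  /* A leading 'p' starts the 'p' extension; there is no version here.  */
  if (*p == 'p')
    return p;

  for (; *p; ++p)
    {
      if (*p == 'p')
        {
          if (!isdigit ((unsigned char) p[1]))
            return NULL;        /* A 'p' must be followed by a number.  */
          if (!major_p)
            return NULL;        /* More than two version levels.  */
          maj = version;
          major_p = false;
          version = 0;
        }
      else if (isdigit ((unsigned char) *p))
        {
          version = version * 10 + (unsigned) (*p - '0');
          seen_digit = true;
        }
      else
        break;
    }

  if (major_p)
    maj = version;
  else
    min = version;

  *major = maj;
  *minor = min;
  *explicit_p = seen_digit;
  return p;
}
```

For `-march=rv64imp0p9`, the `m` is followed directly by `p0p9`, so `m` gets no explicit version and `p` is then parsed with `0p9`. This is consistent with the `rv64i2p0_m2p0_p0p9` attribute that the new attribute-19.c test expects.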



Re: [PATCH 4/5] Rework indirect struct handling for OpenACC/OpenMP in gimplify.c

2021-05-17 Thread Bernd Edlinger
On 5/14/21 11:26 PM, Julian Brown wrote:
> This patch reworks indirect struct handling in gimplify.c (i.e. for struct
> components mapped with "mystruct->a[0:n]", "mystruct->b", etc.), for
> both OpenACC and OpenMP.  The key observation leading to these changes
> was that component mappings of references-to-structures are already
> implemented and working, and indirect struct component handling via a
> pointer can work quite similarly.  That lets us remove some earlier,
> special-case handling for mapping indirect struct component accesses
> for OpenACC, which required the pointed-to struct to be manually mapped
> before the indirect component mapping.
> 
> With this patch, you can map struct components directly (e.g. an array
> slice "mystruct->a[0:n]") just like you can map a non-indirect struct
> component slice ("mystruct.a[0:n]"). Both references-to-pointers (with
> the former syntax) and references to structs (with the latter syntax)
> work now.
> 
> For Fortran class pointers, we no longer re-use GOMP_MAP_TO_PSET for the
> class metadata (the structure that points to the class data and vptr)
> -- it is instead treated as any other struct.
> 
> For C++, the struct handling also works for class members ("this->foo"),
> without having to explicitly map "this[:1]" first.
> 
> For OpenACC, we permit chained indirect component references
> ("mystruct->a->b[0:n]"), though only the last part of such mappings will
> trigger an attach/detach operation.  To properly use such a construct
> on the target, you must still manually map "mystruct->a[:1]" first --
> but there's no need to map "mystruct[:1]" explicitly before that.
> 
> This patch incorporates parts of Chung-Lin's patch "Recommit "Enable
> gimplify GOMP_MAP_STRUCT handling of (COMPONENT_REF (INDIRECT_REF
> ...)) map clauses"." from the og10 branch.
> 
> OK for trunk?
> 
> Thanks,
> 
> Julian
> 
> 2021-05-14  Julian Brown  
>   Chung-Lin Tang  
> 
> gcc/fortran/
>   * trans-openmp.c (gfc_trans_omp_clauses): Don't create GOMP_MAP_TO_PSET
>   mappings for class metadata, nor GOMP_MAP_POINTER mappings for
>   POINTER_TYPE_P decls.
> 
> gcc/
>   * gimplify.c (tree-hash-traits.h): Include.
>   (extract_base_bit_offset): Add BASE_IND parameter.  Handle
>   pointer-typed indirect references alongside reference-typed ones.
>   (strip_components_and_deref, aggregate_base_p): New functions.
>   (build_struct_group): Update struct_map_to_clause type.  Add pointer
>   type indirect ref handling, including chained references.  Handle
>   pointers and references to structs in OpenACC regions as well as
>   OpenMP ones.
>   (gimplify_scan_omp_clauses): Remove struct_deref_set handling.  Rework
>   pointer-type indirect structure access handling to work more like
>   the reference-typed handling.
>   * omp-low.c (scan_sharing_clauses): Handle pointer-type indirect struct
>   references, and references to pointers to structs also.
> 
> gcc/testsuite/
>   * g++.dg/goacc/member-array-acc.C: New test (XFAILed for now).
>   * g++.dg/gomp/member-array-omp.C: New test (XFAILed for now).
> 
> libgomp/
>   * testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c: New test.
>   * testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c: New test.
>   * testsuite/libgomp.oacc-c++/deep-copy-17.C: New test.
> ---
>  gcc/fortran/trans-openmp.c|  20 +-
>  gcc/gimplify.c| 285 ++
>  gcc/omp-low.c |  16 +-
>  gcc/testsuite/g++.dg/goacc/member-array-acc.C |  14 +
>  gcc/testsuite/g++.dg/gomp/member-array-omp.C  |  14 +
>  .../testsuite/libgomp.oacc-c++/deep-copy-17.C | 101 +++
>  .../libgomp.oacc-c-c++-common/deep-copy-15.c  |  71 +
>  .../libgomp.oacc-c-c++-common/deep-copy-16.c  | 231 ++
>  8 files changed, 612 insertions(+), 140 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/goacc/member-array-acc.C
>  create mode 100644 gcc/testsuite/g++.dg/gomp/member-array-omp.C
>  create mode 100644 libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
>  create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c
>  create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c
> 
> diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
> index 5666cd68c7e..ff614ffe744 100644
> --- a/gcc/fortran/trans-openmp.c
> +++ b/gcc/fortran/trans-openmp.c
> @@ -2721,30 +2721,16 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
> tree present = gfc_omp_check_optional_argument (decl, true);
> if (openacc && n->sym->ts.type == BT_CLASS)
>   {
> -   tree type = TREE_TYPE (decl);
> if (n->sym->attr.optional)
>   sorry ("optional class parameter");
> -   if (POINTER_TYPE_P (type))
> - {
> -
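A minimal C sketch of the mapping form the description enables, assuming a hypothetical `struct mystruct` with a pointer member `a` (it is not one of the patch's testcases). Per the description, `copy(s->a[0:n])` suffices on its own, without first mapping `s[:1]`. Built without -fopenacc the pragma is ignored and the loop runs on the host, which is all the accompanying check exercises.

```c
/* Hypothetical struct, named after the description's "mystruct".  */
struct mystruct
{
  int *a;
  int b;
};

/* Double the first N elements of S->a on the accelerator.  The slice of
   the indirectly referenced member is mapped directly; per the patch
   description a separate copy(s[:1]) is no longer required.  */
static void
scale_a (struct mystruct *s, int n)
{
#pragma acc parallel loop copy(s->a[0:n])
  for (int i = 0; i < n; i++)
    s->a[i] *= 2;
}
```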

Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp

2021-05-17 Thread Christophe Lyon via Gcc-patches
On Mon, 17 May 2021 at 12:35, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Christophe Lyon via Gcc-patches
> > Sent: 05 May 2021 15:08
> > To: Andre Simoes Dias Vieira 
> > Cc: gcc Patches 
> > Subject: Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp
> >
> > On Tue, 4 May 2021 at 15:41, Christophe Lyon 
> > wrote:
> > >
> > > On Tue, 4 May 2021 at 13:29, Andre Vieira (lists)
> > >  wrote:
> > > >
> > > > Hi Christophe,
> > > >
> > > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > > Since MVE has a different set of vector comparison operators from
> > > > > Neon, we have to update the expansion to take into account the new
> > > > > ones, for instance 'NE' for which MVE does not require to use 'EQ'
> > > > > with the inverted condition.
> > > > >
> > > > > Conversely, Neon supports comparisons with #0, MVE does not.
> > > > >
> > > > > For:
> > > > > typedef long int vs32 __attribute__((vector_size(16)));
> > > > > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> > > > >
> > > > > we now generate:
> > > > > cmp_eq_vs32_reg:
> > > > >   vldr.64 d4, .L123   @ 8 [c=8 l=4]  *mve_movv4si/8
> > > > >   vldr.64 d5, .L123+8
> > > > >   vldr.64 d6, .L123+16@ 9 [c=8 l=4]  *mve_movv4si/8
> > > > >   vldr.64 d7, .L123+24
> > > > >   vcmp.i32  eq, q0, q1@ 7 [c=16 l=4]  mve_vcmpeqq_v4si
> > > > >   vpsel q0, q3, q2@ 15[c=8 l=4]  mve_vpselq_sv4si
> > > > >   bx  lr  @ 26[c=8 l=4]  *thumb2_return
> > > > > .L124:
> > > > >   .align  3
> > > > > .L123:
> > > > >   .word   0
> > > > >   .word   0
> > > > >   .word   0
> > > > >   .word   0
> > > > >   .word   1
> > > > >   .word   1
> > > > >   .word   1
> > > > >   .word   1
> > > > >
> > > > > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode)) produces
> > > > > a pair of vldr instead of vmov.i32, qX, #0
> > > > I think ideally we would even want:
> > > > vpte  eq, q0, q1
> > > > vmovt.i32 q0, #0
> > > > vmove.i32 q0, #1
> > > >
> > > > But we don't have a way to generate VPT blocks with multiple
> > > > instructions yet unfortunately so I guess VPSEL will have to do for now.
> > >
> > > TBH,  I looked at what LLVM generates currently ;-)
> > >
> >
> > Here is an updated version, which adds
> > && (! || flag_unsafe_math_optimizations)
> > to vcond_mask_
> >
> > This condition was not present in the neon.md version I moved to
> > vec-common.md, but since the VDQW iterator includes V2SF and V4SF, it
> > should take floating-point flags into account.
> >
>
> -  emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
> +case NE:
> +  if (TARGET_HAVE_MVE) {
> +   rtx vpr_p0;
>
> GNU style wants the '{' on the new line. This appears a few other times in the patch.
>
> +   if (vcond_mve)
> + vpr_p0 = target;
> +   else
> + vpr_p0 = gen_reg_rtx (HImode);
> +
> +   switch (cmp_mode)
> + {
> + case E_V16QImode:
> + case E_V8HImode:
> + case E_V4SImode:
> +   emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
> +   break;
> + case E_V8HFmode:
> + case E_V4SFmode:
> +   if (TARGET_HAVE_MVE_FLOAT)
> + emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
> +   else
> + gcc_unreachable ();
> +   break;
> + default:
> +   gcc_unreachable ();
> + }
>
> Hmm, I think we can just check GET_MODE_CLASS (cmp_mode) for MODE_VECTOR_INT or MODE_VECTOR_FLOAT here rather than have this switch statement.
>
> +
> +   /* If we are not expanding a vcond, build the result here.  */
> +   if (!vcond_mve) {
> + rtx zero = gen_reg_rtx (cmp_result_mode);
> + rtx one = gen_reg_rtx (cmp_result_mode);
> + emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> + emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
> +   }
> +  }
> +  else
>
> ...
>bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
> -operands[4], operands[5], true);
> +operands[4], operands[5], true, vcond_mve);
>if (inverted)
>  std::swap (operands[1], operands[2]);
> +  if (TARGET_NEON)
>emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0],
> mask, operands[1], operands[2]));
> +  else
> +{
> +  machine_mode cmp_mode = GET_MODE (operands[4]);
> +  rtx vpr_p0 = mask;
> +  rtx zero = gen_reg_rtx (cmp_mode);
> +  rtx one = gen_reg_rtx (cmp_mode);
> +  emit_move_insn (zero, CONST0_RTX (cmp_mode));
> +  emit_move_insn (one, CONST1_RTX (cmp_
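For reference, the source-level semantics the expansion has to preserve can be checked with plain GCC vector extensions. This minimal sketch assumes GCC or Clang and uses `int` rather than `long int`, so the vector has four 32-bit lanes on any host (the original `long int` is 32-bit on Arm but 64-bit on LP64 hosts). GCC defines a true lane of a vector comparison as -1 (all bits set) and a false lane as 0, so NE computed directly must equal the bitwise inverse of EQ, which is the Neon strategy described above. The function names are hypothetical.

```c
/* Four 32-bit lanes on every target.  */
typedef int vs32 __attribute__ ((vector_size (16)));

/* NE in one step, as MVE's "vcmp.i32 ne" can do.  */
static vs32
cmp_ne_direct (vs32 a, vs32 b)
{
  return a != b;
}

/* NE as the inverse of EQ, the form used for Neon, which has no vector
   NE compare.  */
static vs32
cmp_ne_via_eq (vs32 a, vs32 b)
{
  return ~(a == b);
}
```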

Re: [PATCH 5/5] Mapping of components of references to pointers to structs for OpenMP/OpenACC

2021-05-17 Thread Chung-Lin Tang

Hi Julian,

On 2021/5/15 5:27 AM, Julian Brown wrote:

GCC currently raises a parse error for indirect accesses to struct
members, where the base of the access is a reference to a pointer.
This patch fixes that case.



gcc/cp/
* semantics.c (finish_omp_clauses): Handle components of references to
pointers to structs.

libgomp/
* testsuite/libgomp.oacc-c++/deep-copy-17.C: Update test.



--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7670,7 +7670,12 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
  if ((ort == C_ORT_ACC || ort == C_ORT_OMP)
  && TREE_CODE (t) == COMPONENT_REF
  && TREE_CODE (TREE_OPERAND (t, 0)) == INDIRECT_REF)
-   t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
+   {
+ t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
+ /* References to pointers have a double indirection here.  */
+ if (TREE_CODE (t) == INDIRECT_REF)
+   t = TREE_OPERAND (t, 0);
+   }
  if (TREE_CODE (t) == COMPONENT_REF
  && ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP
  || ort == C_ORT_ACC)


There are already plenty of such modifications in this patch:
"[PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments."
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html

I am in the process of taking that patch to mainline, so are you sure this is not already handled there?


diff --git a/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
index dacbb520f3d..e038e9e3802 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
+++ b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
@@ -83,7 +83,7 @@ void strrp (void)
a[0] = 8;
c[0] = 10;
e[0] = 12;
-  #pragma acc parallel copy(n->a[0:10], n->c[0:10], n->e[0:10])
+  #pragma acc parallel copy(n->a[0:10], n->b, n->c[0:10], n->d, n->e[0:10])
{
  n->a[0] = n->c[0] + n->e[0];
}


This testcase can be added.

Chung-Lin






Re: [PATCH 7/7] [og10] WIP GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION changes

2021-05-17 Thread Chung-Lin Tang

On 2021/5/11 4:57 PM, Julian Brown wrote:

This work-in-progress patch tries to get
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION to behave more like
GOMP_MAP_ATTACH_DETACH -- in that the mapping is made to form groups
to be processed by build_struct_group/build_struct_comp_map.  I think
that's important to integrate with how groups of mappings for array
sections are handled in other cases.

This patch isn't sufficient by itself to fix a couple of broken test cases
at present (libgomp.c++/target-lambda-1.C, libgomp.c++/target-this-4.C),
though.


No, GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION is supposed to be just a variant
of GOMP_MAP_ATTACH with slightly different behavior: it tolerates an unmapped
pointer target and assigns NULL on the device, instead of calling gomp_fatal().
(see its handling in libgomp/target.c)
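A toy C model of the difference described here, assuming a single flat table of host-to-device address ranges. `toy_mapping`, `toy_lookup` and `toy_attach` are hypothetical names; this is not libgomp's data structures or API, only the attach-time decision.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical mapping: host range [host_start, host_end) lives on the
   device at dev_start.  */
struct toy_mapping
{
  uintptr_t host_start, host_end, dev_start;
};

/* Translate HOST to a device address in *DEV; return false if unmapped.  */
static bool
toy_lookup (const struct toy_mapping *tab, size_t n, uintptr_t host,
            uintptr_t *dev)
{
  for (size_t i = 0; i < n; i++)
    if (host >= tab[i].host_start && host < tab[i].host_end)
      {
        *dev = tab[i].dev_start + (host - tab[i].host_start);
        return true;
      }
  return false;
}

/* Compute the device value written into an attached pointer.  With
   ZERO_LENGTH_OK (modelling GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION) an
   unmapped pointer target becomes NULL on the device; without it
   (modelling plain GOMP_MAP_ATTACH) it is fatal, as with gomp_fatal.  */
static uintptr_t
toy_attach (const struct toy_mapping *tab, size_t n, uintptr_t host_ptr,
            bool zero_length_ok)
{
  uintptr_t dev;
  if (toy_lookup (tab, n, host_ptr, &dev))
    return dev;
  if (zero_length_ok)
    return 0;
  fprintf (stderr, "toy_attach: pointer target not mapped\n");
  exit (EXIT_FAILURE);
}
```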

If OpenACC can have the same zero-length array section behavior, we can
just share one GOMP_MAP_ATTACH map. For now they are treated as separate
cases.

Chung-Lin


2021-05-11  Julian Brown  

gcc/
* gimplify.c (build_struct_comp_nodes): Add
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION handling.
(build_struct_group): Process GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION
as part of pointer group.
(gimplify_scan_omp_clauses): Update prev_list_p such that
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION will form part of pointer
group.
---
  gcc/gimplify.c | 16 
  1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 6d204908c82..c5cb486aa23 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8298,7 +8298,9 @@ build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
if (grp_mid
&& OMP_CLAUSE_CODE (grp_mid) == OMP_CLAUSE_MAP
&& (OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ALWAYS_POINTER
- || OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ATTACH_DETACH))
+ || OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ATTACH_DETACH
+ || (OMP_CLAUSE_MAP_KIND (grp_mid)
+ == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)))
  {
tree c3
= build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), OMP_CLAUSE_MAP);
@@ -8774,12 +8776,14 @@ build_struct_group (struct gimplify_omp_ctx *ctx,
 ? splay_tree_lookup (ctx->variables, (splay_tree_key) decl)
 : NULL);
bool ptr = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALWAYS_POINTER);
-  bool attach_detach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH);
+  bool attach_detach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH
+   || (OMP_CLAUSE_MAP_KIND (c)
+   == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION));
bool attach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
 || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_DETACH);
bool has_attachments = false;
/* For OpenACC, pointers in structs should trigger an attach action.  */
-  if (attach_detach
+  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH
&& ((region_type & (ORT_ACC | ORT_TARGET | ORT_TARGET_DATA))
  || code == OMP_TARGET_ENTER_DATA
  || code == OMP_TARGET_EXIT_DATA))
@@ -9784,6 +9788,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  if (!remove
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ALWAYS_POINTER
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ATTACH_DETACH
+ && (OMP_CLAUSE_MAP_KIND (c)
+ != GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_TO_PSET
  && OMP_CLAUSE_CHAIN (c)
  && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (c)) == OMP_CLAUSE_MAP
@@ -9792,7 +9798,9 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
  == GOMP_MAP_ATTACH_DETACH)
  || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
- == GOMP_MAP_TO_PSET)))
+ == GOMP_MAP_TO_PSET)
+ || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
+ == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)))
prev_list_p = list_p;
  
  	  break;




[PATCH v3 04/12] Remove MAX_BITSIZE_MODE_ANY_INT

2021-05-17 Thread H.J. Lu via Gcc-patches
It is only defined for i386 and everyone uses the default:

 #define MAX_BITSIZE_MODE_ANY_INT (64*BITS_PER_UNIT)

Whatever problems we had before, they have been fixed now.
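The arithmetic behind the removed i386-modes.def comment can be sketched as below, assuming 8-bit bytes and a 64-bit HOST_WIDE_INT (typical for x86-64 hosts); the SKETCH_* macros are hypothetical stand-ins for GCC's BITS_PER_UNIT and HOST_BITS_PER_WIDE_INT. Rounding the old 160-bit cap up to whole 64-bit chunks gives 192 bits, too small for 256-bit OImode, while the default 64*BITS_PER_UNIT = 512 bits also covers 512-bit XImode.

```c
/* Assumptions for illustration only.  */
#define SKETCH_BITS_PER_UNIT 8u
#define SKETCH_HOST_BITS_PER_WIDE_INT 64u

/* The generic default that i386 now falls back to.  */
#define SKETCH_DEFAULT_MAX_BITSIZE (64u * SKETCH_BITS_PER_UNIT)

/* Round BITS up to a whole number of HOST_WIDE_INT chunks, the rounding
   the removed comment attributes to wide-int.h.  */
static unsigned
round_up_to_hwi (unsigned bits)
{
  return (bits + SKETCH_HOST_BITS_PER_WIDE_INT - 1)
         / SKETCH_HOST_BITS_PER_WIDE_INT * SKETCH_HOST_BITS_PER_WIDE_INT;
}
```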

* config/i386/i386-modes.def (MAX_BITSIZE_MODE_ANY_INT): Removed.
---
 gcc/config/i386/i386-modes.def | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index dbddfd8e48f..4e7014be034 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -107,19 +107,10 @@ INT_MODE (XI, 64);
 PARTIAL_INT_MODE (HI, 16, P2QI);
 PARTIAL_INT_MODE (SI, 32, P2HI);
 
-/* Mode used for signed overflow checking of TImode.  As
-   MAX_BITSIZE_MODE_ANY_INT is only 160, wide-int.h reserves only that
-   rounded up to multiple of HOST_BITS_PER_WIDE_INT bits in wide_int etc.,
-   so OImode is too large.  For the overflow checking we actually need
-   just 1 or 2 bits beyond TImode precision.  Use 160 bits to have
-   a multiple of 32.  */
+/* Mode used for signed overflow checking of TImode.  For the overflow
+   checking we actually need just 1 or 2 bits beyond TImode precision.
+   Use 160 bits to have a multiple of 32.  */
 PARTIAL_INT_MODE (OI, 160, POI);
 
-/* Keep the OI and XI modes from confusing the compiler into thinking
-   that these modes could actually be used for computation.  They are
-   only holders for vectors during data movement.  Include POImode precision
-   though.  */
-#define MAX_BITSIZE_MODE_ANY_INT (160)
-
 /* The symbol Pmode stands for one of the above machine modes (usually SImode).
The tm.h file specifies which one.  It is not a distinct mode.  */
-- 
2.31.1



[PATCH v3 00/12] Allow TImode/OImode/XImode in op_by_pieces operations

2021-05-17 Thread H.J. Lu via Gcc-patches
Changes in the v3 patches:

1. Split the TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE changes
into the generic part and the x86 part.


1. Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
target instructions to duplicate QImode value to TImode/OImode/XImode
value for memset.
2. x86: Avoid stack realignment when copying data
3. x86: Remove MAX_BITSIZE_MODE_ANY_INT.  Only the x86 backend defines it.
4. x86: Use TImode/OImode/XImode integers for piecewise move and store.
5. x86: Add tests for TImode/OImode/XImode for piecewise move and store.
6. x86: Adjust existing tests.
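Items 4 and 5 concern piecewise (by-pieces) moves and stores. A simplified model of that strategy, assuming power-of-two store sizes chosen widest first and no overlapping tail store (GCC's real by-pieces code handles more cases), shows why wider integer modes reduce the store count. With a 32-byte maximum piece (OImode, YMM), a 33-byte memset becomes one 32-byte store plus one byte store at offset 32, which is what gcc.target/i386/pr90773-20.c scans for; with a 16-byte maximum it takes three stores. `split_by_pieces` and `struct piece` are hypothetical names.

```c
#include <stddef.h>

/* One store: SIZE bytes at OFFSET.  */
struct piece
{
  size_t offset, size;
};

/* Cover LEN bytes with power-of-two stores, widest first, never wider
   than MAX_PIECE (itself a power of two).  Record at most MAX_OUT pieces
   in OUT and return how many were recorded.  */
static size_t
split_by_pieces (size_t len, size_t max_piece, struct piece *out,
                 size_t max_out)
{
  size_t n = 0, offset = 0;
  for (size_t size = max_piece; size > 0 && len > 0; size /= 2)
    while (len >= size && n < max_out)
      {
        out[n].offset = offset;
        out[n].size = size;
        n++;
        offset += size;
        len -= size;
      }
  return n;
}
```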

On x86-64, SPEC CPU 2017 performance impact is neutral.  Glibc code size
differences with -O2 build are:

         Before   After
libc.so  1906572  1906444

Some code sequence differences in libc.so are:

:
...
jne                             | jne

test   %r15,%r15                  test   %r15,%r15
je                              | je

mov    %r13d,(%r14)               mov    %r13d,(%r14)
lea    0x10(%r14),%rdi            lea    0x10(%r14),%rdi
mov    $0x1,%ecx                  mov    $0x1,%ecx
mov    %r13d,%edx                 mov    %r13d,%edx
mov    %r15,0x40(%r12)            mov    %r15,0x40(%r12)
mov    %r15,%rsi                  mov    %r15,%rsi
call                              call

lea    0xa2f9b(%rip),%rax  #    | lea    0xa2fab(%rip),%rax  #
xor    %esi,%esi                  xor    %esi,%esi
mov    %ebp,%edi                  mov    %ebp,%edi
mov    %rax,0x8(%r12)             mov    %rax,0x8(%r12)
movzwl 0x12(%rsp),%eax            movzwl 0x12(%rsp),%eax
mov    $0x8,%edx                <
lea    0xc(%rsp),%rcx             lea    0xc(%rsp),%rcx
mov    %r14,0x48(%r12)          <
add    $0x40,%r14               <
mov    $0x4,%r8d                  mov    $0x4,%r8d
                                > movq   $0x0,0x1d0(%r14)
                                > mov    $0x8,%edx
rol    $0x8,%ax                   rol    $0x8,%ax
mov    %ebp,(%r12)              | mov    %r14,0x48(%r12)
movq   $0x0,0x190(%r14)         | add    $0x40,%r14
mov    %ax,0x4(%r12)            <
mov    %r14,0x30(%r12)            mov    %r14,0x30(%r12)
                                > mov    %ax,0x4(%r12)
                                > mov    %ebp,(%r12)
movl   $0x1,0xc(%rsp)             movl   $0x1,0xc(%rsp)
call                              call

mov    %r12,%rdi                  mov    %r12,%rdi
movabs $0x101010101010101,%rdx  <
test   %eax,%eax                  test   %eax,%eax
mov    $0xff,%eax                 mov    $0xff,%eax
cmove  %eax,%ebx                  cmove  %eax,%ebx
movzbl %bl,%eax                 | movd   %ebx,%xmm0
mov    %ebx,0xc(%rsp)             mov    %ebx,0xc(%rsp)
mov    %rax,%rsi                | punpcklbw %xmm0,%xmm0
imul   %rdx,%rsi                | punpcklwd %xmm0,%xmm0
mul    %rdx                     | pshufd $0x0,%xmm0,%xmm0
add    %rsi,%rdx                | movups %xmm0,0x50(%r12)
mov    %rax,0x50(%r12)          | movups %xmm0,0x60(%r12)
mov    %rdx,0x58(%r12)          | movups %xmm0,0x70(%r12)
mov    %rax,0x60(%r12)          | movups %xmm0,0x80(%r12)
mov    %rdx,0x68(%r12)          | movups %xmm0,0x90(%r12)
mov    %rax,0x70(%r12)          | movups %xmm0,0xa0(%r12)
mov    %rdx,0x78(%r12)          |

[PATCH v3 01/12] Add TARGET_READ_MEMSET_VALUE/TARGET_GEN_MEMSET_VALUE

2021-05-17 Thread H.J. Lu via Gcc-patches
Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
target instructions to duplicate QImode value to TImode/OImode/XImode
value for memset.
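The generic fallback that these hooks replace builds the memset value by multiplying the byte by a 0x01...01 coefficient (the removed builtin_memset_gen_str code creates that coefficient with memset (p, 1, size) and c_readstr). The same arithmetic for 32- and 64-bit words, as a plain C sketch with hypothetical function names:

```c
#include <stdint.h>

/* Replicate byte C into every byte of a word by multiplying with
   0x01...01.  No carries occur because each partial product c * 0x01
   fits in one byte.  */
static uint32_t
replicate_byte_u32 (uint8_t c)
{
  return (uint32_t) c * UINT32_C (0x01010101);
}

static uint64_t
replicate_byte_u64 (uint8_t c)
{
  return (uint64_t) c * UINT64_C (0x0101010101010101);
}
```

The "Before" column of the libc.so comparison in the cover letter shows this multiply pattern (movabs $0x101010101010101 followed by imul/mul); the x86 hooks instead broadcast the byte into an XMM register with movd/punpcklbw/punpcklwd/pshufd.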

PR middle-end/90773
* builtins.c (builtin_memset_read_str): Call
targetm.read_memset_value.
(builtin_memset_gen_str): Call targetm.gen_memset_value.
* target.def (read_memset_value): New hook.
(gen_memset_value): Likewise.
* targhooks.c: Include "builtins.h".
(default_read_memset_value): New function.
(default_gen_memset_value): Likewise.
* targhooks.h (default_read_memset_value): New prototype.
(default_gen_memset_value): Likewise.
* doc/tm.texi.in: Add TARGET_READ_MEMSET_VALUE and
TARGET_GEN_MEMSET_VALUE hooks.
* doc/tm.texi: Regenerated.
---
 gcc/builtins.c | 47 --
 gcc/doc/tm.texi| 16 +
 gcc/doc/tm.texi.in |  4 
 gcc/target.def | 20 +
 gcc/targhooks.c| 56 ++
 gcc/targhooks.h|  4 
 6 files changed, 104 insertions(+), 43 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index e1b284846b1..f78a36478ef 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6584,24 +6584,11 @@ expand_builtin_strncpy (tree exp, rtx target)
previous iteration.  */
 
 rtx
-builtin_memset_read_str (void *data, void *prevp,
+builtin_memset_read_str (void *data, void *prev,
 HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
 scalar_int_mode mode)
 {
-  by_pieces_prev *prev = (by_pieces_prev *) prevp;
-  if (prev != nullptr && prev->data != nullptr)
-{
-  /* Use the previous data in the same mode.  */
-  if (prev->mode == mode)
-   return prev->data;
-}
-
-  const char *c = (const char *) data;
-  char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
-
-  memset (p, *c, GET_MODE_SIZE (mode));
-
-  return c_readstr (p, mode);
+  return targetm.read_memset_value ((const char *) data, prev, mode);
 }
 
 /* Callback routine for store_by_pieces.  Return the RTL of a register
@@ -6611,37 +6598,11 @@ builtin_memset_read_str (void *data, void *prevp,
nullptr, it has the RTL info from the previous iteration.  */
 
 static rtx
-builtin_memset_gen_str (void *data, void *prevp,
+builtin_memset_gen_str (void *data, void *prev,
HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
scalar_int_mode mode)
 {
-  rtx target, coeff;
-  size_t size;
-  char *p;
-
-  by_pieces_prev *prev = (by_pieces_prev *) prevp;
-  if (prev != nullptr && prev->data != nullptr)
-{
-  /* Use the previous data in the same mode.  */
-  if (prev->mode == mode)
-   return prev->data;
-
-  target = simplify_gen_subreg (mode, prev->data, prev->mode, 0);
-  if (target != nullptr)
-   return target;
-}
-
-  size = GET_MODE_SIZE (mode);
-  if (size == 1)
-return (rtx) data;
-
-  p = XALLOCAVEC (char, size);
-  memset (p, 1, size);
-  coeff = c_readstr (p, mode);
-
-  target = convert_to_mode (mode, (rtx) data, 1);
-  target = expand_mult (mode, target, coeff, NULL_RTX, 1);
-  return force_reg (mode, target);
+  return targetm.gen_memset_value ((rtx) data, prev, mode);
 }
 
 /* Expand expression EXP, which is a call to the memset builtin.  Return
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 85ea9395560..51385044e76 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11868,6 +11868,22 @@ This function prepares to emit a conditional comparison within a sequence
  @var{bit_code} is @code{AND} or @code{IOR}, which is the op on the compares.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_READ_MEMSET_VALUE (const char *@var{c}, void *@var{prev}, scalar_int_mode @var{mode})
+This function returns the RTL of a constant integer corresponding to
+target reading @code{GET_MODE_SIZE (@var{mode})} bytes from the string
+constant @var{c}.  If @var{prev} is not @samp{nullptr}, it contains
+the RTL information from the previous iteration.
+@end deftypefn
+
+@deftypefn {Target Hook} rtx TARGET_GEN_MEMSET_VALUE (rtx @var{data}, void *@var{prev}, scalar_int_mode @var{mode})
+This function returns the RTL of a register containing
+@code{GET_MODE_SIZE (@var{mode})} consecutive copies of the unsigned
+char value given in the RTL register @var{data}.  For example, if
+@var{mode} is 4 bytes wide, return the RTL for 0x01010101*@var{data}.
+If @var{prev} is not @samp{nullptr}, it is the RTL information from
+the previous iteration.
+@end deftypefn
+
 @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned @var{nunroll}, class loop *@var{loop})
 This target hook returns a new value for the number of times @var{loop}
 should be unrolled. The parameter @var{nunroll} is the number of times
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index d8e3de14af1..8d4c3949fbf 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -79

[PATCH v3 03/12] x86: Avoid stack realignment when copying data

2021-05-17 Thread H.J. Lu via Gcc-patches
To avoid stack realignment, use SCRATCH_SSE_REG to copy data from one
memory location to another.

gcc/

* config/i386/i386-expand.c (ix86_expand_vector_move): Use
SCRATCH_SSE_REG to copy data from one memory location to
another.

gcc/testsuite/

* gcc.target/i386/eh_return-1.c: New test.
---
 gcc/config/i386/i386-expand.c   | 16 -
 gcc/testsuite/gcc.target/i386/eh_return-1.c | 26 +
 2 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/eh_return-1.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 485825b3c15..f799678b273 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -431,7 +431,21 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
   && !register_operand (op0, mode)
   && !register_operand (op1, mode))
 {
-  emit_move_insn (op0, force_reg (GET_MODE (op0), op1));
+  rtx tmp;
+  mode = GET_MODE (op0);
+  if (TARGET_SSE
+ && (GET_MODE_ALIGNMENT (mode)
+ > ix86_minimum_incoming_stack_boundary (false, true)))
+   {
+ /* NB: Don't increase stack alignment requirement by using
+a scratch SSE register to copy data from one memory
+location to another since it doesn't require a spill.  */
+ tmp = gen_rtx_REG (mode, SCRATCH_SSE_REG);
+ emit_move_insn (tmp, op1);
+   }
+  else
+   tmp = force_reg (mode, op1);
+  emit_move_insn (op0, tmp);
   return;
 }
 
diff --git a/gcc/testsuite/gcc.target/i386/eh_return-1.c b/gcc/testsuite/gcc.target/i386/eh_return-1.c
new file mode 100644
index 000..671ba635e88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/eh_return-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell -mno-avx512f" } */
+
+struct _Unwind_Context
+{
+  void *ra;
+  char array[48];
+};
+
+extern long uw_install_context_1 (struct _Unwind_Context *);
+
+void
+_Unwind_RaiseException (void)
+{
+  struct _Unwind_Context this_context, cur_context;
+  long offset = uw_install_context_1 (&this_context);
+  __builtin_memcpy (&this_context, &cur_context,
+   sizeof (struct _Unwind_Context));
+  void *handler = __builtin_frob_return_addr ((&cur_context)->ra);
+  uw_install_context_1 (&cur_context);
+  __builtin_eh_return (offset, handler);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
-- 
2.31.1



[PATCH v3 02/12] x86: Add TARGET_READ_MEMSET_VALUE/TARGET_GEN_MEMSET_VALUE

2021-05-17 Thread H.J. Lu via Gcc-patches
1. Make ix86_expand_vector_init_duplicate global to duplicate QImode
value to TImode/OImode/XImode.
2. Make ix86_minimum_incoming_stack_boundary global and add an argument
to ignore stack_alignment_estimated.
3. Define SCRATCH_SSE_REG as a scratch register for ix86_gen_memset_value.
4. Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
target instructions to duplicate QImode value to TImode/OImode/XImode
value for memset.

gcc/

PR middle-end/90773
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
Make it global.
* config/i386/i386-protos.h (ix86_minimum_incoming_stack_boundary):
New.
(ix86_expand_vector_init_duplicate): Likewise.
* config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Add
an argument to ignore stack_alignment_estimated.  It is passed
as false by default.  Make it global.
(ix86_gen_memset_value_from_prev): New function.
(ix86_gen_memset_value): Likewise.
(ix86_read_memset_value): Likewise.
(TARGET_GEN_MEMSET_VALUE): New.
(TARGET_READ_MEMSET_VALUE): Likewise.
* config/i386/i386.h (SCRATCH_SSE_REG): New.

gcc/testsuite/

PR middle-end/90773
* gcc.target/i386/pr90773-15.c: New test.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr90773-18.c: Likewise.
* gcc.target/i386/pr90773-19.c: Likewise.
---
 gcc/config/i386/i386-expand.c  |   2 +-
 gcc/config/i386/i386-protos.h  |   5 +
 gcc/config/i386/i386.c | 268 -
 gcc/config/i386/i386.h |   4 +
 gcc/testsuite/gcc.target/i386/pr90773-15.c |  14 ++
 gcc/testsuite/gcc.target/i386/pr90773-16.c |  14 ++
 gcc/testsuite/gcc.target/i386/pr90773-17.c |  14 ++
 gcc/testsuite/gcc.target/i386/pr90773-18.c |  15 ++
 gcc/testsuite/gcc.target/i386/pr90773-19.c |  14 ++
 9 files changed, 345 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-19.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 0fa8d45a684..485825b3c15 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -13648,7 +13648,7 @@ static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 /* A subroutine of ix86_expand_vector_init.  Store into TARGET a vector
with all elements equal to VAR.  Return true if successful.  */
 
-static bool
+bool
 ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
   rtx target, rtx val)
 {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 7782cf1163f..c4896c2da74 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -50,6 +50,9 @@ extern void ix86_reset_previous_fndecl (void);
 
 extern bool ix86_using_red_zone (void);
 
+extern unsigned int ix86_minimum_incoming_stack_boundary (bool,
+ bool = false);
+
 extern unsigned int ix86_regmode_natural_size (machine_mode);
 #ifdef RTX_CODE
 extern int standard_80387_constant_p (rtx);
@@ -257,6 +260,8 @@ extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_abs (rtx, rtx);
+extern bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx,
+  rtx);
 
 /* In i386-c.c  */
 extern void ix86_target_macros (void);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6a1f5746089..8b9b2346478 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -415,7 +415,6 @@ static unsigned int split_stack_prologue_scratch_regno (void);
 static bool i386_asm_output_addr_const_extra (FILE *, rtx);
 
 static bool ix86_can_inline_p (tree, tree);
-static unsigned int ix86_minimum_incoming_stack_boundary (bool);
 
 
 /* Whether -mtune= or -march= were specified */
@@ -7232,8 +7231,9 @@ find_drap_reg (void)
 
 /* Return minimum incoming stack alignment.  */
 
-static unsigned int
-ix86_minimum_incoming_stack_boundary (bool sibcall)
+unsigned int
+ix86_minimum_incoming_stack_boundary (bool sibcall,
+ bool ignore_estimated)
 {
   unsigned int incoming_stack_boundary;
 
@@ -7248,7 +7248,8 @@ ix86_minimum_incoming_stack_boundary (bool sibcall)
  estimated stack alignment is 128bit.  */
   else if (!sibcall
   && ix86_force_align_arg_pointer
-  && crtl->stack_alignment_estimated == 128)
+  && (ignore_estimated
+ 

[PATCH v3 08/12] x86: Also pass -mno-avx to pr72839.c

2021-05-17 Thread H.J. Lu via Gcc-patches
Also pass -mno-avx to pr72839.c to avoid copying data with YMM or ZMM
registers.

* gcc.target/i386/pr72839.c: Also pass -mno-avx.
---
 gcc/testsuite/gcc.target/i386/pr72839.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr72839.c b/gcc/testsuite/gcc.target/i386/pr72839.c
index ea724f70377..6888d9d0a55 100644
--- a/gcc/testsuite/gcc.target/i386/pr72839.c
+++ b/gcc/testsuite/gcc.target/i386/pr72839.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target ia32 } */
-/* { dg-options "-O2 -mtune=lakemont" } */
+/* { dg-options "-O2 -mtune=lakemont -mno-avx" } */
 
 extern char *strcpy (char *, const char *);
 
-- 
2.31.1



[PATCH v3 10/12] x86: Also pass -mno-avx to sw-1.c for ia32

2021-05-17 Thread H.J. Lu via Gcc-patches
Also pass -mno-avx to sw-1.c for ia32 since copying data with YMM or ZMM
registers disables shrink-wrapping when the second argument is passed on
stack.

* gcc.target/i386/sw-1.c: Also pass -mno-avx for ia32.
---
 gcc/testsuite/gcc.target/i386/sw-1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/i386/sw-1.c b/gcc/testsuite/gcc.target/i386/sw-1.c
index aec095eda62..a9c89fca4ec 100644
--- a/gcc/testsuite/gcc.target/i386/sw-1.c
+++ b/gcc/testsuite/gcc.target/i386/sw-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune=generic -fshrink-wrap -fdump-rtl-pro_and_epilogue" } */
+/* { dg-additional-options "-mno-avx" { target ia32 } } */
 /* { dg-skip-if "No shrink-wrapping preformed" { x86_64-*-mingw* } } */
 
 #include 
-- 
2.31.1



[PATCH v3 06/12] x86: Add AVX2 tests for PR middle-end/90773

2021-05-17 Thread H.J. Lu via Gcc-patches
PR middle-end/90773
* gcc.target/i386/pr90773-20.c: New test.
* gcc.target/i386/pr90773-21.c: Likewise.
* gcc.target/i386/pr90773-22.c: Likewise.
* gcc.target/i386/pr90773-23.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr90773-20.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-21.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-22.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-23.c | 13 +
 4 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-23.c

diff --git a/gcc/testsuite/gcc.target/i386/pr90773-20.c b/gcc/testsuite/gcc.target/i386/pr90773-20.c
new file mode 100644
index 000..e61e405f2b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-21.c b/gcc/testsuite/gcc.target/i386/pr90773-21.c
new file mode 100644
index 000..16ad17f3cbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-21.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]%.*, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-22.c b/gcc/testsuite/gcc.target/i386/pr90773-22.c
new file mode 100644
index 000..45a8ff65a84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-22.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-23.c b/gcc/testsuite/gcc.target/i386/pr90773-23.c
new file mode 100644
index 000..9256ce10ff0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-23.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */
-- 
2.31.1



[PATCH v3 11/12] x86: Update gcc.target/i386/incoming-11.c

2021-05-17 Thread H.J. Lu via Gcc-patches
Expect no stack realignment since we no longer realign the stack when
copying data.

* gcc.target/i386/incoming-11.c: Expect no stack realignment.
---
 gcc/testsuite/gcc.target/i386/incoming-11.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/incoming-11.c b/gcc/testsuite/gcc.target/i386/incoming-11.c
index a830c96f7d1..4b822684b88 100644
--- a/gcc/testsuite/gcc.target/i386/incoming-11.c
+++ b/gcc/testsuite/gcc.target/i386/incoming-11.c
@@ -15,4 +15,4 @@ void f()
for (i = 0; i < 100; i++) q[i] = 1;
 }
 
-/* { dg-final { scan-assembler "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
+/* { dg-final { scan-assembler-not "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
-- 
2.31.1



[PATCH v3 07/12] x86: Add tests for piecewise move and store

2021-05-17 Thread H.J. Lu via Gcc-patches
* gcc.target/i386/pieces-memcpy-10.c: New test.
* gcc.target/i386/pieces-memcpy-11.c: Likewise.
* gcc.target/i386/pieces-memcpy-12.c: Likewise.
* gcc.target/i386/pieces-memcpy-13.c: Likewise.
* gcc.target/i386/pieces-memcpy-14.c: Likewise.
* gcc.target/i386/pieces-memcpy-15.c: Likewise.
* gcc.target/i386/pieces-memcpy-16.c: Likewise.
* gcc.target/i386/pieces-memcpy-17.c: Likewise.
* gcc.target/i386/pieces-memcpy-18.c: Likewise.
* gcc.target/i386/pieces-memcpy-19.c: Likewise.
* gcc.target/i386/pieces-memset-1.c: Likewise.
* gcc.target/i386/pieces-memset-2.c: Likewise.
* gcc.target/i386/pieces-memset-3.c: Likewise.
* gcc.target/i386/pieces-memset-4.c: Likewise.
* gcc.target/i386/pieces-memset-5.c: Likewise.
* gcc.target/i386/pieces-memset-6.c: Likewise.
* gcc.target/i386/pieces-memset-7.c: Likewise.
* gcc.target/i386/pieces-memset-8.c: Likewise.
* gcc.target/i386/pieces-memset-9.c: Likewise.
* gcc.target/i386/pieces-memset-10.c: Likewise.
* gcc.target/i386/pieces-memset-11.c: Likewise.
* gcc.target/i386/pieces-memset-12.c: Likewise.
* gcc.target/i386/pieces-memset-13.c: Likewise.
* gcc.target/i386/pieces-memset-14.c: Likewise.
* gcc.target/i386/pieces-memset-15.c: Likewise.
* gcc.target/i386/pieces-memset-16.c: Likewise.
* gcc.target/i386/pieces-memset-17.c: Likewise.
* gcc.target/i386/pieces-memset-18.c: Likewise.
* gcc.target/i386/pieces-memset-19.c: Likewise.
* gcc.target/i386/pieces-memset-20.c: Likewise.
* gcc.target/i386/pieces-memset-21.c: Likewise.
* gcc.target/i386/pieces-memset-22.c: Likewise.
* gcc.target/i386/pieces-memset-23.c: Likewise.
* gcc.target/i386/pieces-memset-24.c: Likewise.
* gcc.target/i386/pieces-memset-25.c: Likewise.
* gcc.target/i386/pieces-memset-26.c: Likewise.
* gcc.target/i386/pieces-memset-27.c: Likewise.
* gcc.target/i386/pieces-memset-28.c: Likewise.
* gcc.target/i386/pieces-memset-29.c: Likewise.
* gcc.target/i386/pieces-memset-30.c: Likewise.
* gcc.target/i386/pieces-memset-31.c: Likewise.
* gcc.target/i386/pieces-memset-32.c: Likewise.
* gcc.target/i386/pieces-memset-33.c: Likewise.
* gcc.target/i386/pieces-memset-34.c: Likewise.
* gcc.target/i386/pieces-memset-35.c: Likewise.
* gcc.target/i386/pieces-memset-36.c: Likewise.
* gcc.target/i386/pieces-memset-37.c: Likewise.
* gcc.target/i386/pieces-memset-38.c: Likewise.
* gcc.target/i386/pieces-memset-39.c: Likewise.
* gcc.target/i386/pieces-memset-40.c: Likewise.
* gcc.target/i386/pieces-memset-41.c: Likewise.
* gcc.target/i386/pieces-memset-42.c: Likewise.
* gcc.target/i386/pieces-memset-43.c: Likewise.
* gcc.target/i386/pieces-memset-44.c: Likewise.
---
 .../gcc.target/i386/pieces-memcpy-10.c | 16 
 .../gcc.target/i386/pieces-memcpy-11.c | 17 +
 .../gcc.target/i386/pieces-memcpy-12.c | 16 
 .../gcc.target/i386/pieces-memcpy-13.c | 16 
 .../gcc.target/i386/pieces-memcpy-14.c | 17 +
 .../gcc.target/i386/pieces-memcpy-15.c | 16 
 .../gcc.target/i386/pieces-memcpy-16.c | 16 
 .../gcc.target/i386/pieces-memcpy-7.c  | 15 +++
 .../gcc.target/i386/pieces-memcpy-8.c  | 14 ++
 .../gcc.target/i386/pieces-memcpy-9.c  | 14 ++
 .../gcc.target/i386/pieces-memset-1.c  | 16 
 .../gcc.target/i386/pieces-memset-10.c | 16 
 .../gcc.target/i386/pieces-memset-11.c | 16 
 .../gcc.target/i386/pieces-memset-12.c | 16 
 .../gcc.target/i386/pieces-memset-13.c | 16 
 .../gcc.target/i386/pieces-memset-14.c | 16 
 .../gcc.target/i386/pieces-memset-15.c | 16 
 .../gcc.target/i386/pieces-memset-16.c | 16 
 .../gcc.target/i386/pieces-memset-17.c | 16 
 .../gcc.target/i386/pieces-memset-18.c | 16 
 .../gcc.target/i386/pieces-memset-19.c | 17 +
 .../gcc.target/i386/pieces-memset-2.c  | 12 
 .../gcc.target/i386/pieces-memset-20.c | 17 +
 .../gcc.target/i386/pieces-memset-21.c | 17 +
 .../gcc.target/i386/pieces-memset-22.c | 17 +
 .../gcc.target/i386/pieces-memset-23.c | 17 +
 .../gcc.target/i386/pieces-memset-24.c | 17 +
 .../gcc.target/i386/pieces-memset-25.c | 17 +
 .../gcc.target/i386

[PATCH v3 12/12] constructor: Check if it is faster to load constant from memory

2021-05-17 Thread H.J. Lu via Gcc-patches
When expanding a constant constructor, don't call expand_constructor if
it is more efficient to load the data from memory via move by pieces.
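A minimal C sketch of the size test this patch adds to expand_expr_real_1, written as a standalone predicate. The name prefer_load_from_memory and its parameters are hypothetical stand-ins for UNITS_PER_WORD, MOVE_MAX_PIECES and the result of can_move_by_pieces (); it illustrates the condition only and is not GCC code.

```c
#include <stdbool.h>

/* Hypothetical model of the new early exit: skip expand_constructor and
   load the constant aggregate from memory when it spans more than two
   words, the target can move wider-than-word pieces, and a by-pieces
   move of BYTES is possible.  */
static bool
prefer_load_from_memory (unsigned long bytes, unsigned units_per_word,
                         unsigned move_max_pieces, bool can_move_by_pieces_p)
{
  return (bytes / units_per_word) > 2          /* more than two words */
         && move_max_pieces > units_per_word   /* vector-sized pieces */
         && can_move_by_pieces_p;              /* by-pieces move allowed */
}
```

Assuming UNITS_PER_WORD is 8 and MOVE_MAX_PIECES is 16 (as the movups checks in pr90773-24.c suggest for -march=x86-64), the 64-byte struct S in that test passes the check, while a 16-byte aggregate does not because 16 / 8 is not greater than 2.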

gcc/

PR middle-end/90773
* expr.c (expand_expr_real_1): Don't call expand_constructor if
it is more efficient to load the data from the memory.

gcc/testsuite/

PR middle-end/90773
* gcc.target/i386/pr90773-24.c: New test.
* gcc.target/i386/pr90773-25.c: Likewise.
---
 gcc/expr.c | 10 ++
 gcc/testsuite/gcc.target/i386/pr90773-24.c | 22 ++
 gcc/testsuite/gcc.target/i386/pr90773-25.c | 20 
 3 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-24.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-25.c

diff --git a/gcc/expr.c b/gcc/expr.c
index d09ee42e262..80e01ea1cbe 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -10886,6 +10886,16 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
unsigned HOST_WIDE_INT ix;
tree field, value;
 
+   /* Check if it is more efficient to load the data from
+  the memory directly.  FIXME: How many stores do we
+  need here if not moved by pieces?  */
+   unsigned HOST_WIDE_INT bytes
+ = tree_to_uhwi (TYPE_SIZE_UNIT (type));
+   if ((bytes / UNITS_PER_WORD) > 2
+   && MOVE_MAX_PIECES > UNITS_PER_WORD
+   && can_move_by_pieces (bytes, TYPE_ALIGN (type)))
+ goto normal_inner_ref;
+
FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (init), ix,
  field, value)
  if (tree_int_cst_equal (field, index))
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-24.c b/gcc/testsuite/gcc.target/i386/pr90773-24.c
new file mode 100644
index 000..4a4b62533dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-24.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+struct S
+{
+  long long s1 __attribute__ ((aligned (8)));
+  unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+};
+
+const struct S array[] = {
+  { 0, 60, 640, 2112543726, 39682, 48, 16, 33, 10, 96, 2, 0, 0, 4 }
+};
+
+void
+foo (struct S *x)
+{
+  x[0] = array[0];
+}
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 16\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 48\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-25.c b/gcc/testsuite/gcc.target/i386/pr90773-25.c
new file mode 100644
index 000..2520b670989
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-25.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+struct S
+{
+  long long s1 __attribute__ ((aligned (8)));
+  unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+};
+
+const struct S array[] = {
+  { 0, 60, 640, 2112543726, 39682, 48, 16, 33, 10, 96, 2, 0, 0, 4 }
+};
+
+void
+foo (struct S *x)
+{
+  x[0] = array[0];
+}
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */
-- 
2.31.1



[PATCH v3 05/12] x86: Update piecewise move and store

2021-05-17 Thread H.J. Lu via Gcc-patches
We can use TImode/OImode/XImode integers for piecewise move and store.
When a vector register is used for piecewise move and store, we don't
increase stack_alignment_needed, since no vector register spill is
required for piecewise move and store.  Since stack_realign_needed is
set to true by checking stack_alignment_estimated, which is updated by
pseudo vector register usage, we also need to check stack_realign_needed
to eliminate the frame pointer.
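The piece-size selection and the resulting store sequence can be modeled in a few lines of C. This is a simplified sketch under stated assumptions: struct piece_flags, max_piece_bytes and split_into_pieces are hypothetical names, the avx and sse2 flags collapse the extra tuning conditions the real MOVE_MAX_PIECES macro tests, and the greedy largest-first split ignores alignment and any overlapping-access strategy the real by-pieces code may use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical target flags.  AVX means "AVX usable without a 128-bit
   preference or split unaligned accesses"; SSE2 means "SSE2 with
   optimal unaligned loads and stores".  */
struct piece_flags
{
  bool avx512f;
  bool prefer_avx256;
  bool avx;
  bool sse2;
};

/* Largest piece in bytes, following the shape of the new macro:
   64 (ZMM), 32 (YMM), 16 (XMM), otherwise the word size.  */
static unsigned
max_piece_bytes (const struct piece_flags *f, unsigned units_per_word)
{
  if (f->avx512f && !f->prefer_avx256)
    return 64;
  if (f->avx)
    return 32;
  if (f->sse2)
    return 16;
  return units_per_word;
}

/* Greedily split LEN bytes into power-of-two pieces no larger than
   MAX_PIECE, largest first.  Writes at most OUT_LEN sizes to OUT and
   returns how many were written.  */
static size_t
split_into_pieces (unsigned long len, unsigned max_piece,
                   unsigned *out, size_t out_len)
{
  size_t n = 0;
  for (unsigned piece = max_piece; piece > 0 && n < out_len; piece /= 2)
    while (len >= piece && n < out_len)
      {
        out[n++] = piece;
        len -= piece;
      }
  return n;
}
```

Under this model a 33-byte memset with 32-byte pieces becomes one 32-byte store plus one 1-byte store, and 34 bytes becomes 32 + 2, which is the shape the pr90773-20 and pr90773-21 tests scan for (one YMM vmovdqu plus a movb or movw at offset 32).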

gcc/

* config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
check stack_realign_needed for stack realignment.
(ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
than the largest integer supported by vector register.
* config/i386/i386.h (MOVE_MAX): Set to 64.
(MOVE_MAX_PIECES): Set to bytes of the largest integer supported
by vector register.
(STORE_MAX_PIECES): New.

gcc/testsuite/

* gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
* gcc.target/i386/pr90773-4.c: Also run for 32-bit.
* gcc.target/i386/pr90773-14.c: Likewise.
* gcc.target/i386/pr90773-15.c: Likewise.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
---
 gcc/config/i386/i386.c | 21 ---
 gcc/config/i386/i386.h | 31 +-
 gcc/testsuite/gcc.target/i386/pr90773-1.c  | 10 +++
 gcc/testsuite/gcc.target/i386/pr90773-14.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-15.c |  6 ++---
 gcc/testsuite/gcc.target/i386/pr90773-16.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-17.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-4.c  |  2 +-
 8 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8b9b2346478..b5c1436464f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -7943,8 +7943,17 @@ ix86_finalize_stack_frame_flags (void)
  assumed stack realignment might be needed or -fno-omit-frame-pointer
  is used, but in the end nothing that needed the stack alignment had
  been spilled nor stack access, clear frame_pointer_needed and say we
- don't need stack realignment.  */
-  if ((stack_realign || (!flag_omit_frame_pointer && optimize))
+ don't need stack realignment.
+
+ When vector register is used for piecewise move and store, we don't
+ increase stack_alignment_needed as there is no register spill for
+ piecewise move and store.  Since stack_realign_needed is set to true
+ by checking stack_alignment_estimated which is updated by pseudo
+ vector register usage, we also need to check stack_realign_needed to
+ eliminate frame pointer.  */
+  if ((stack_realign
+   || (!flag_omit_frame_pointer && optimize)
+   || crtl->stack_realign_needed)
   && frame_pointer_needed
   && crtl->is_leaf
   && crtl->sp_is_unchanging
@@ -10403,7 +10412,13 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
  /* FALLTHRU */
case E_OImode:
case E_XImode:
- if (!standard_sse_constant_p (x, mode))
+ if (!standard_sse_constant_p (x, mode)
+ && GET_MODE_SIZE (TARGET_AVX512F
+   ? XImode
+   : (TARGET_AVX
+  ? OImode
+  : (TARGET_SSE2
+ ? TImode : DImode))) < GET_MODE_SIZE (mode))
return false;
default:
  break;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 45d86802c51..677afbf7031 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1754,7 +1754,7 @@ typedef struct ix86_args {
 
 /* Max number of bytes we can move from memory to memory
in one reasonably fast instruction.  */
-#define MOVE_MAX 16
+#define MOVE_MAX 64
 
 /* MOVE_MAX_PIECES is the number of bytes at a time which we can
move efficiently, as opposed to  MOVE_MAX which is the maximum
@@ -1765,11 +1765,30 @@ typedef struct ix86_args {
widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in
64-bit mode.  */
 #define MOVE_MAX_PIECES \
-  ((TARGET_64BIT \
-&& TARGET_SSE2 \
-&& TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
-&& TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
-   ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD)
+  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
+   ? 64 \
+   : ((TARGET_AVX \
+   && !TARGET_PREFER_AVX128 \
+   && !TARGET_AVX256_SPLIT_UNALIGNED_LOAD \
+   && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
+  ? 32 \
+  : ((TARGET_SSE2 \
+ && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
+ && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
+? 16 : UNITS_PER_WORD)))
+
+/* STORE_MAX_PIECES is the number of bytes at a time that we can
+   store efficiently.  */
+#define STORE_MAX_PIECES \
+  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
+   ? 64 \
+   : ((TARGET_AVX \
+   && !T

[PATCH v3 09/12] x86: Also pass -mno-avx to cold-attribute-1.c

2021-05-17 Thread H.J. Lu via Gcc-patches
Also pass -mno-avx to cold-attribute-1.c to avoid copying data with YMM or
ZMM registers.

* gcc.target/i386/cold-attribute-1.c: Also pass -mno-avx.
---
 gcc/testsuite/gcc.target/i386/cold-attribute-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
index 57666ac60b6..658eb3e25bb 100644
--- a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
+++ b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mno-avx" } */
 #include 
 static inline
 __attribute__ ((cold)) void
-- 
2.31.1



Re: RFA: fix gcc.dg/tree-ssa/popcount4l.c 16 bit failure, improve 64 bit popcount expansion for 32 bit target

2021-05-17 Thread Joern Wolfgang Rennecke
Attached is the updated version of the patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu.

OK to apply?
Recognize popcount also when a double width operation is needed.

2021-01-27  Joern Rennecke  

gcc/
* match.pd :
When generating popcount directly fails, try doing it in two halves.
gcc/testsuite/
* gcc.dg/tree-ssa/popcount4ll.c: Remove lp64 condition.
Adjust scanning pattern for !lp64.
* gcc.dg/tree-ssa/popcount5ll.c: Likewise.
* gcc.dg/tree-ssa/popcount4l.c: Adjust scanning pattern
for ! int32plus.
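At the source level, the two-halves fallback computes a 64-bit population count as the sum of the counts of the low and high 32-bit halves. A minimal sketch, assuming a 32-bit unsigned int so that each __builtin_popcount call covers exactly one half; popcount64_swar is a common form of the bit-twiddling idiom the match.pd pattern recognizes (similar to the popcount64c functions in these tests) and is included only to cross-check the result.

```c
#include <stdint.h>

/* Two-halves popcount: count the low 32 bits and the high 32 bits
   separately and add the results.  Assumes unsigned int is 32 bits.  */
static int
popcount64_halves (uint64_t x)
{
  uint32_t lo = (uint32_t) x;           /* like (convert @0) */
  uint32_t hi = (uint32_t) (x >> 32);   /* like (convert (rshift @0 32)) */
  return __builtin_popcount (lo) + __builtin_popcount (hi);
}

/* SWAR popcount idiom of the kind the pattern matches.  */
static int
popcount64_swar (uint64_t x)
{
  const uint64_t m1 = 0x5555555555555555ULL;
  const uint64_t m2 = 0x3333333333333333ULL;
  const uint64_t m4 = 0x0f0f0f0f0f0f0f0fULL;
  const uint64_t h01 = 0x0101010101010101ULL;

  x -= (x >> 1) & m1;                /* per-2-bit counts */
  x = (x & m2) + ((x >> 2) & m2);    /* per-4-bit counts */
  x = (x + (x >> 4)) & m4;           /* per-byte counts */
  return (int) ((x * h01) >> 56);    /* top byte holds the total */
}
```

On a 32-bit target without a 64-bit popcount pattern, this is why the !lp64 variants of the tests expect two .POPCOUNT calls instead of one.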

diff --git a/gcc/match.pd b/gcc/match.pd
index cdb87636951..3cfa5e761a4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6550,10 +6550,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& tree_to_uhwi (@3) == c2
&& tree_to_uhwi (@9) == c3
&& tree_to_uhwi (@7) == c3
-   && tree_to_uhwi (@11) == c4
-   && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
-  OPTIMIZE_FOR_BOTH))
-(convert (IFN_POPCOUNT:type @0)
+   && tree_to_uhwi (@11) == c4)
+(if (direct_internal_fn_supported_p (IFN_POPCOUNT, type,
+OPTIMIZE_FOR_BOTH))
+ (convert (IFN_POPCOUNT:type @0))
+ /* Try to do popcount in two halves.  PREC must be at least
+   five bits for this to work without extension before adding.  */
+ (with {
+   tree half_type = NULL_TREE;
+   machine_mode m = mode_for_size ((prec + 1) / 2, MODE_INT, 1).require ();
+   int half_prec = GET_MODE_PRECISION (as_a  (m));
+   if (m != TYPE_MODE (type))
+half_type = build_nonstandard_integer_type (half_prec, 1);
+   gcc_assert (half_prec > 2);
+  }
+  (if (half_type != NULL_TREE
+  && direct_internal_fn_supported_p (IFN_POPCOUNT, half_type,
+ OPTIMIZE_FOR_BOTH))
+   (convert (plus
+(IFN_POPCOUNT:half_type (convert @0))
+(IFN_POPCOUNT:half_type (convert (rshift @0
+   { build_int_cst (integer_type_node, half_prec); } )))
 
 /* __builtin_ffs needs to deal on many targets with the possible zero
argument.  If we know the argument is always non-zero, __builtin_ctz + 1
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount4l.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount4l.c
index 69fb2d1134d..269e56e90f9 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount4l.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount4l.c
@@ -25,6 +25,7 @@ int popcount64c(unsigned long x)
 return (x * h01) >> shift;
 }
 
-/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 1 "optimized" { target int32plus } } } */
+/* { dg-final { scan-tree-dump "\.POPCOUNT" "optimized" { target { ! int32plus } } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount4ll.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount4ll.c
index c1588be68e4..7abadf6df04 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount4ll.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount4ll.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { lp64 } } } */
+/* { dg-do compile } */
 /* { dg-require-effective-target popcountll } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 
@@ -16,4 +16,5 @@ int popcount64c(unsigned long long x)
 return (x * h01) >> shift;
 }
 
-/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 1 "optimized" { target { lp64 } } } } */
+/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 2 "optimized" { target { ! lp64 } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount5ll.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount5ll.c
index edb191bf894..2afe08124fe 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount5ll.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount5ll.c
@@ -1,5 +1,5 @@
 /* PR tree-optimization/94800 */
-/* { dg-do compile { target { lp64 } } } */
+/* { dg-do compile } */
 /* { dg-require-effective-target popcountll } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 
@@ -19,4 +19,5 @@ int popcount64c(unsigned long long x)
 return x >> shift;
 }
 
-/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 1 "optimized" { target { lp64 } } } } */
+/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 2 "optimized" { target { ! lp64 } } } } */


[PATCH] middle-end/100582 - fix array_at_struct_end_p for vector indexing

2021-05-17 Thread Richard Biener
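Vector indexing leaves us with ARRAY_REFs of VIEW_CONVERT_EXPRs,
something array_at_struct_end_p treats as an array at the end of a
struct even when there is an underlying decl visible.  The following
fixes this by getting to the base of the reference with
get_base_address before looking for the underlying decl.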

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-05-17  Richard Biener  

PR middle-end/100582
* tree.c (array_at_struct_end_p): Get to the base of the
reference before looking for the underlying decl.

* gcc.target/i386/pr100582.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr100582.c | 16 
 gcc/tree.c   |  8 +++-
 2 files changed, 19 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100582.c

diff --git a/gcc/testsuite/gcc.target/i386/pr100582.c b/gcc/testsuite/gcc.target/i386/pr100582.c
new file mode 100644
index 000..9520fe7a197
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100582.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2" } */
+
+typedef unsigned char v32qi __attribute__((vector_size(32)));
+
+v32qi
+f2 (v32qi x, v32qi a, v32qi b)
+{
+v32qi e;
+  for (int i = 0; i != 32; i++)
+ e[i] = x[i] ? a[i] : b[i];
+
+  return e;
+}
+
+/* { dg-final { scan-assembler-times "pblendvb" 1 } } */
diff --git a/gcc/tree.c b/gcc/tree.c
index 01eda553a65..8afba598eb5 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -12550,13 +12550,11 @@ array_at_struct_end_p (tree ref)
   || ! TYPE_MAX_VALUE (TYPE_DOMAIN (atype)))
 return true;
 
-  if (TREE_CODE (ref) == MEM_REF
-  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR)
-ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
-
   /* If the reference is based on a declared entity, the size of the array
  is constrained by its given domain.  (Do not trust commons PR/69368).  */
-  if (DECL_P (ref)
+  ref = get_base_address (ref);
+  if (ref
+  && DECL_P (ref)
   && !(flag_unconstrained_commons
   && VAR_P (ref) && DECL_COMMON (ref))
   && DECL_SIZE_UNIT (ref)
-- 
2.26.2


[PATCH][v2] c/100547 - reject overly large vector_size attributes

2021-05-17 Thread Richard Biener
This rejects vector_size attributes whose number of vector components
does not fit in an 'int', which is an internal limitation of RTVEC.
This requires adjusting gcc.dg/attr-vector_size.c, which checks for
much larger supported vectors.  Note that the RTVEC limitation is
host-specific (unless we change this 'int' to int32_t), but it should
be 32 bits in practice everywhere.
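The two component-count checks in type_valid_for_vector_size can be summarized as a small predicate. A hedged sketch: vector_nunits_ok is a hypothetical name, NUNITS is the requested vector size divided by the element size, and the other validity checks the real function performs are not modeled.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of the component-count checks: NUNITS must be a
   power of two and, with this patch, smaller than INT_MAX because the
   RTVEC length is stored in an 'int'.  */
static bool
vector_nunits_ok (unsigned long long nunits)
{
  if (nunits & (nunits - 1))
    return false;                           /* not a power of two */
  if (nunits >= (unsigned long long) INT_MAX)
    return false;                           /* too large for RTVEC */
  return true;
}
```

For example, a request that works out to 2^32 int components is now diagnosed, while 2^30 components still pass these two checks on a host with a 32-bit int.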

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2021-05-12  Richard Biener  

PR c/100547
gcc/c-family/
* c-attribs.c (type_valid_for_vector_size): Reject too large nunits.
Reword existing nunit diagnostic.

* gcc.dg/pr100547.c: New testcase.
* gcc.dg/attr-vector_size.c: Adjust.
---
 gcc/c-family/c-attribs.c| 16 +--
 gcc/testsuite/gcc.dg/attr-vector_size.c | 16 ---
 gcc/testsuite/gcc.dg/pr100547.c | 35 +
 3 files changed, 49 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr100547.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index f54388e9939..ecb32c70172 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -4245,10 +4245,22 @@ type_valid_for_vector_size (tree type, tree atname, tree args,
   if (nunits & (nunits - 1))
 {
   if (error_p)
-   error ("number of components of the vector not a power of two");
+   error ("number of vector components %wu not a power of two", nunits);
   else
warning (OPT_Wattributes,
-"number of components of the vector not a power of two");
+"number of vector components %wu not a power of two", nunits);
+  return NULL_TREE;
+}
+
+  if (nunits >= (unsigned HOST_WIDE_INT)INT_MAX)
+{
+  if (error_p)
+   error ("number of vector components %wu exceeds %d",
+  nunits, INT_MAX - 1);
+  else
+   warning (OPT_Wattributes,
+"number of vector components %wu exceeds %d",
+nunits, INT_MAX - 1);
   return NULL_TREE;
 }
 
diff --git a/gcc/testsuite/gcc.dg/attr-vector_size.c b/gcc/testsuite/gcc.dg/attr-vector_size.c
index 00be26accd5..3f2ce889121 100644
--- a/gcc/testsuite/gcc.dg/attr-vector_size.c
+++ b/gcc/testsuite/gcc.dg/attr-vector_size.c
@@ -22,14 +22,6 @@ DEFVEC (extern, 30);
 
 #if __SIZEOF_SIZE_T__ > 4
 
-DEFVEC (extern, 31);
-DEFVEC (extern, 32);
-DEFVEC (extern, 33);
-DEFVEC (extern, 34);
-DEFVEC (extern, 60);
-DEFVEC (extern, 61);
-DEFVEC (extern, 62);
-
 VEC (POW2 (63)) char v63; /* { dg-error  "'vector_size' attribute argument value '9223372036854775808' exceeds 9223372036854775807" "LP64" { target lp64 } } */
 
 #else
@@ -49,14 +41,6 @@ void test_local_scope (void)
 
 #if __SIZEOF_SIZE_T__ > 4
 
-  DEFVEC (auto, 31);
-  DEFVEC (auto, 32);
-  DEFVEC (auto, 33);
-  DEFVEC (auto, 34);
-  DEFVEC (auto, 60);
-  DEFVEC (auto, 61);
-  DEFVEC (auto, 62);
-
   VEC (POW2 (63)) char v63;   /* { dg-error  "'vector_size' attribute argument value '9223372036854775808' exceeds 9223372036854775807" "LP64" { target lp64 } } */
 
 #else
diff --git a/gcc/testsuite/gcc.dg/pr100547.c b/gcc/testsuite/gcc.dg/pr100547.c
new file mode 100644
index 000..2d3da4eb50e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr100547.c
@@ -0,0 +1,35 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O -g" } */
+
+typedef int __attribute__((vector_size(
+((8 * sizeof(short)) * sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short V; /* { dg-error "number of vector components" } */
+void k() { V w = { 0 }; }
-- 
2.26.2


[PATCH] libstdc++: Fix filesystem::path constraints for volatile [PR 100630]

2021-05-17 Thread Jonathan Wakely via Gcc-patches
The constraint check for filesystem::path construction uses
decltype(__is_path_src(declval())) which means it considers
conversion from an rvalue.  When Source is a volatile-qualified type
it cannot use is_path_src(const Unknown&) because a const lvalue
reference can only bind to a non-volatile rvalue.

Since the relevant path members all have a const Source& parameter,
the constraint should be defined in terms of declval(),
not declval(). This avoids the problem of volatile-qualified
rvalues, because we no longer use an rvalue at all.

libstdc++-v3/ChangeLog:

PR libstdc++/100630
* include/bits/fs_path.h (__is_constructible_from): Test
construction from a const lvalue, not an rvalue.
* include/experimental/bits/fs_path.h (__is_constructible_from):
Likewise.
* testsuite/27_io/filesystem/path/construct/100630.cc: New test.
* testsuite/experimental/filesystem/path/construct/100630.cc:
New test.

Tested x86_64-linux, pushed to gcc-10 (this isn't needed for gcc-11 or
trunk, but I also plan to backport it to gcc-9).


commit 4cd69a5a0dd31bc6fdef1bbabc8d6d1416014ea1
Author: Jonathan Wakely 
Date:   Mon May 17 11:54:06 2021

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 3d341916db5..6e0a85417cc 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -116,7 +116,7 @@ namespace __detail
 
   template
 struct __constructible_from<_Source, void>
-: decltype(__is_path_src(std::declval<_Source>(), 0))
+: decltype(__is_path_src(std::declval(), 0))
 { };
 
   template
diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h b/libstdc++-v3/include/experimental/bits/fs_path.h
index c5fc3beed1f..0a8f4eee0a1 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -124,7 +124,7 @@ namespace __detail
 
   template
 struct __constructible_from<_Source, void>
-: decltype(__is_path_src(std::declval<_Source>(), 0))
+: decltype(__is_path_src(std::declval(), 0))
 { };
 
   template
+
+void f(bool) { }
+void f(const std::filesystem::path&) { }
+
+void
+test_100630()
+{
+  volatile bool b = true;
+  f(b);
+}
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/path/construct/100630.cc b/libstdc++-v3/testsuite/experimental/filesystem/path/construct/100630.cc
new file mode 100644
index 000..b2428ff74cf
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/path/construct/100630.cc
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++11 } }
+// { dg-require-filesystem-ts "" }
+
+#include 
+
+void f(bool) { }
+void f(const std::experimental::filesystem::path&) { }
+
+void
+test_100630()
+{
+  volatile bool b = true;
+  f(b);
+}


Re: [PATCH 1/2] c-family: Copy DECL_USER_ALIGN even if DECL_ALIGN is similar.

2021-05-17 Thread Robin Dapp via Gcc-patches

On s390, a warning test fails:

inline int ATTR ((cold, aligned (8)))
finline_hot_noret_align (int);

inline int ATTR ((warn_unused_result))
finline_hot_noret_align (int);

inline int ATTR ((aligned (4)))
finline_hot_noret_align (int);  /* { dg-warning "ignoring attribute
.aligned \\(4\\). because it conflicts with attribute .aligned \\(8\\)."

This test actually uncovered two problems.  First, on s390 the default
function alignment is 8 bytes.  When the second decl above is merged
with the first one, DECL_USER_ALIGN is only copied if DECL_ALIGN (old) >
DECL_ALIGN (new).  Subsequently, when merging the third decl, no warning
is emitted since DECL_USER_ALIGN is unset.
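The merging problem can be reproduced with a toy model. Everything in this sketch is hypothetical: struct align_state stands in for DECL_ALIGN and DECL_USER_ALIGN (alignment in bytes), merge_copy_if_larger mirrors the behaviour described above, and merge_copy_user_align is one plausible reading of the fix named in the subject line, not the patch itself.

```c
#include <stdbool.h>

/* Toy stand-in for a function decl's alignment state; not GCC's tree
   API.  */
struct align_state
{
  unsigned align;    /* plays the role of DECL_ALIGN, in bytes */
  bool user_align;   /* plays the role of DECL_USER_ALIGN */
};

/* Described behaviour: the user flag is copied only when the old
   alignment is strictly larger than the new one.  */
static void
merge_copy_if_larger (struct align_state *newdecl,
                      const struct align_state *olddecl)
{
  if (olddecl->align > newdecl->align)
    {
      newdecl->align = olddecl->align;
      newdecl->user_align = olddecl->user_align;
    }
}

/* Assumed fix: also carry the user flag over when the alignments are
   equal.  */
static void
merge_copy_user_align (struct align_state *newdecl,
                       const struct align_state *olddecl)
{
  if (olddecl->align >= newdecl->align)
    {
      newdecl->align = olddecl->align;
      newdecl->user_align |= olddecl->user_align;
    }
}
```

With s390's default 8-byte function alignment, the first declaration's aligned (8) and the second declaration's default alignment compare equal, so the strict comparison never copies the user flag and the later aligned (4) conflict goes unnoticed.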


[..]

Ping.


Re: [PATCH 4/5] Rework indirect struct handling for OpenACC/OpenMP in gimplify.c

2021-05-17 Thread Julian Brown
On Mon, 17 May 2021 14:12:00 +0200
Bernd Edlinger  wrote:

> >  */ @@ -8715,19 +8770,26 @@ static tree
> >  build_struct_group (struct gimplify_omp_ctx *ctx,
> > enum omp_region_type region_type, enum
> > tree_code code, tree decl, unsigned int *flags, tree c,
> > -   hash_map *&struct_map_to_clause,
> > +   hash_map
> > *&struct_map_to_clause, tree *&prev_list_p, tree *&list_p, bool
> > *cont) {
> >poly_offset_int coffset;
> >poly_int64 cbitpos;
> > -  tree base_ref;
> > +  tree base_ind, base_ref;
> > +  tree *list_in_p = list_p, *prev_list_in_p = prev_list_p;
> >
> 
> Is this a kind of debug code?
> This fails to compile:
> 
> ../../gcc-trunk/gcc/gimplify.c: In function ‘tree_node*
> build_struct_group(gimplify_omp_ctx*, omp_region_type, tree_code,
> tree, unsigned int*, tree, hash_map*&,
> tree_node**&, tree_node**&, bool*)’:
> ../../gcc-trunk/gcc/gimplify.c:8779:9: error: unused variable
> ‘list_in_p’ [-Werror=unused-variable]
>  8779 |   tree *list_in_p = list_p, *prev_list_in_p = prev_list_p;
>       |         ^~~~~~~~~
> ../../gcc-trunk/gcc/gimplify.c:8779:30: error: unused variable
> ‘prev_list_in_p’ [-Werror=unused-variable]
>  8779 |   tree *list_in_p = list_p, *prev_list_in_p = prev_list_p;
>       |                              ^~~~~~~~~~~~~~

Oops, that's left over from an earlier iteration of the patch, and
indeed isn't needed any more. I'll be sure to bootstrap the next
iteration of these patches I send upstream.

Thanks,

Julian


Re: [PATCH 5/5] Mapping of components of references to pointers to structs for OpenMP/OpenACC

2021-05-17 Thread Julian Brown
On Mon, 17 May 2021 21:07:19 +0800
Chung-Lin Tang  wrote:

> Hi Julian,
> 
> On 2021/5/15 5:27 AM, Julian Brown wrote:
> > GCC currently raises a parse error for indirect accesses to struct
> > members, where the base of the access is a reference to a pointer.
> > This patch fixes that case.  
> 
> > gcc/cp/
> > * semantics.c (finish_omp_clauses): Handle components of
> > references to pointers to structs.
> > 
> > libgomp/
> > * testsuite/libgomp.oacc-c++/deep-copy-17.C: Update test.  
> 
> > --- a/gcc/cp/semantics.c
> > +++ b/gcc/cp/semantics.c
> > @@ -7670,7 +7670,12 @@ finish_omp_clauses (tree clauses, enum
> > c_omp_region_type ort) if ((ort == C_ORT_ACC || ort == C_ORT_OMP)
> >   && TREE_CODE (t) == COMPONENT_REF
> >   && TREE_CODE (TREE_OPERAND (t, 0)) == INDIRECT_REF)
> > -   t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
> > +   {
> > + t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
> > + /* References to pointers have a double indirection
> > here.  */
> > + if (TREE_CODE (t) == INDIRECT_REF)
> > +   t = TREE_OPERAND (t, 0);
> > +   }
> >   if (TREE_CODE (t) == COMPONENT_REF
> >   && ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP
> >   || ort == C_ORT_ACC)  
> 
> There is already a large plethora of such modifications in this patch:
> "[PATCH, OG10, OpenMP 5.0, committed] Remove array section
> base-pointer mapping semantics, and other front-end adjustments."
> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html
> 
> I am in the process of taking that patch to mainline, so are you sure
> this is not already handled there?

Hmm, it might be -- thanks. Consider this patch withdrawn if so. (But
yeah, keep the test case by all means!)

Julian


[PATCH] c/100625 - avoid building invalid labels in the GIMPLE FE

2021-05-17 Thread Richard Biener
When duplicate labels are diagnosed, avoid building a GIMPLE_LABEL.

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

2021-05-17  Richard Biener  

PR c/100625
gcc/c/
* gimple-parser.c (c_parser_gimple_label): Avoid building
a GIMPLE label with NULL label decl.

* gcc.dg/gimplefe-error-9.c: New testcase.
---
 gcc/c/gimple-parser.c   | 3 ++-
 gcc/testsuite/gcc.dg/gimplefe-error-9.c | 9 +
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/gimplefe-error-9.c

diff --git a/gcc/c/gimple-parser.c b/gcc/c/gimple-parser.c
index 3a6e72ef002..398e21631d9 100644
--- a/gcc/c/gimple-parser.c
+++ b/gcc/c/gimple-parser.c
@@ -1887,7 +1887,8 @@ c_parser_gimple_label (gimple_parser &parser, gimple_seq 
*seq)
   gcc_assert (c_parser_next_token_is (parser, CPP_COLON));
   c_parser_consume_token (parser);
   tree label = define_label (loc1, name);
-  gimple_seq_add_stmt_without_update (seq, gimple_build_label (label));
+  if (label)
+gimple_seq_add_stmt_without_update (seq, gimple_build_label (label));
   return;
 }
 
diff --git a/gcc/testsuite/gcc.dg/gimplefe-error-9.c 
b/gcc/testsuite/gcc.dg/gimplefe-error-9.c
new file mode 100644
index 000..87014c1cbbf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gimplefe-error-9.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple" } */
+
+void __GIMPLE
+foo()
+{
+bb1:
+bb1:; /* { dg-error "duplicate" } */
+}
-- 
2.26.2


Re: [PATCH] libstdc++: Fix filesystem::path constraints for volatile [PR 100630]

2021-05-17 Thread Jonathan Wakely via Gcc-patches

On 17/05/21 15:02 +0100, Jonathan Wakely wrote:

The constraint check for filesystem::path construction uses
decltype(__is_path_src(declval<Source>())), which means it considers
conversion from an rvalue.  When Source is a volatile-qualified type
it cannot use is_path_src(const Unknown&) because a const lvalue
reference can only bind to a non-volatile rvalue.

Since the relevant path members all have a const Source& parameter,
the constraint should be defined in terms of declval<const Source&>(),
not declval<Source>(). This avoids the problem of volatile-qualified
rvalues, because we no longer use an rvalue at all.
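The reference-binding rule behind this can be checked in isolation. The sketch below is illustrative only: probe() is a hypothetical stand-in for the internal __is_path_src overloads, and from_rvalue / from_const_lvalue are hypothetical aliases that mimic the old and new constraints.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the internal __is_path_src overloads: the
// template is viable only if a const T& parameter can bind to the argument.
template<typename T>
  std::true_type probe (const T &, int);
std::false_type probe (...);

// Old-style constraint: probes an rvalue (an xvalue from declval<Source>()).
template<typename Source>
  using from_rvalue = decltype (probe (std::declval<Source> (), 0));

// New-style constraint: probes a const lvalue, matching the
// const Source& parameters of the path members.
template<typename Source>
  using from_const_lvalue
    = decltype (probe (std::declval<const Source &> (), 0));

// Non-volatile types pass either way.
static_assert (from_rvalue<bool>::value, "");
static_assert (from_const_lvalue<bool>::value, "");

// For volatile bool, T deduces to volatile bool; a const volatile bool&
// cannot bind to an rvalue, so only probe(...) is viable for the old check.
static_assert (!from_rvalue<volatile bool>::value, "");
static_assert (from_const_lvalue<volatile bool>::value, "");
```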

libstdc++-v3/ChangeLog:

PR libstdc++/100630
* include/bits/fs_path.h (__is_constructible_from): Test
construction from a const lvalue, not an rvalue.
* include/experimental/bits/fs_path.h (__is_constructible_from):
Likewise.
* testsuite/27_io/filesystem/path/construct/100630.cc: New test.
* testsuite/experimental/filesystem/path/construct/100630.cc:
New test.

Tested x86_64-linux, pushed to gcc-10 (this isn't needed for gcc-11 or
trunk, but I also plan to backport it to gcc-9).


Oh, actually this is needed for experimental::filesystem::path on trunk
and gcc-11 (as I found when I added the new tests to trunk), so
I'll fix it there too.



Re: [PATCH 7/7] [og10] WIP GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION changes

2021-05-17 Thread Julian Brown
On Mon, 17 May 2021 21:14:26 +0800
Chung-Lin Tang  wrote:

> On 2021/5/11 4:57 PM, Julian Brown wrote:
> > This work-in-progress patch tries to get
> > GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION to behave more like
> > GOMP_MAP_ATTACH_DETACH -- in that the mapping is made to form groups
> > to be processed by build_struct_group/build_struct_comp_map.  I
> > think that's important to integrate with how groups of mappings for
> > array sections are handled in other cases.
> > 
> > This patch isn't sufficient by itself to fix a couple of broken
> > test cases at present (libgomp.c++/target-lambda-1.C,
> > libgomp.c++/target-this-4.C), though.  
> 
> No, GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION is supposed to be just
> a slightly different behavior version of GOMP_MAP_ATTACH; it
> tolerates an unmapped pointer-target and assigns NULL on the device,
> instead of just gomp_fatal(). (see its handling in libgomp/target.c)
> 
> In case OpenACC can have the same such zero-length array section
> behavior, we can just share one GOMP_MAP_ATTACH map. For now it is
> treated as separate cases.

OK, understood. But, I'm a bit concerned that we're ignoring some
"hidden rules" with regards to OMP pointer clause ordering/grouping that
certain code (at least the bit that creates GOMP_MAP_STRUCT node
groups, and parts of omp-low.c) relies on. I believe those rules are as
follows:

 - an array slice is mapped using two or three pointers -- two for a
   normal (non-reference) base pointer, and three if we have a
   reference to a pointer (i.e. in C++) or an array descriptor (i.e. in
   Fortran). So we can have e.g.

   GOMP_MAP_TO
   GOMP_MAP_ALWAYS_POINTER

   GOMP_MAP_TO
   GOMP_MAP_.*_POINTER
   GOMP_MAP_ALWAYS_POINTER

   GOMP_MAP_TO
   GOMP_MAP_TO_PSET
   GOMP_MAP_ALWAYS_POINTER

 - for OpenACC, we extend this to allow (up to and including
   gimplify.c) the GOMP_MAP_ATTACH_DETACH mapping. So we can have (for
   component refs):

   GOMP_MAP_TO
   GOMP_MAP_ATTACH_DETACH

   GOMP_MAP_TO
   GOMP_MAP_TO_PSET
   GOMP_MAP_ATTACH_DETACH

   GOMP_MAP_TO
   GOMP_MAP_.*_POINTER
   GOMP_MAP_ATTACH_DETACH

For the scanning in insert_struct_comp_map (as it is at present) to
work right, these groups must stay intact.  I think the current
behaviour of omp_target_reorder_clauses on the og10 branch can break
those groups apart though!
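To illustrate what keeping these groups intact could mean for a reordering pass, here is a hedged C++ sketch that works on map-kind names as plain strings. group_clauses, flatten and is_trailing_node are hypothetical helpers, not gimplify.c functions, and the rule that a group absorbs trailing *_POINTER, GOMP_MAP_TO_PSET and GOMP_MAP_ATTACH_DETACH nodes is an assumption read off the lists above.

```cpp
#include <cassert>
#include <string>
#include <vector>

using clause_group = std::vector<std::string>;

// Hypothetical rule: these node kinds belong to the mapping before them.
bool
is_trailing_node (const std::string &kind)
{
  static const std::string suffix = "_POINTER";
  bool pointer_kind
    = kind.size () > suffix.size ()
      && kind.compare (kind.size () - suffix.size (), suffix.size (),
                       suffix) == 0;
  return pointer_kind
         || kind == "GOMP_MAP_TO_PSET"
         || kind == "GOMP_MAP_ATTACH_DETACH";
}

// Split a flat clause list into groups headed by a data mapping.
std::vector<clause_group>
group_clauses (const std::vector<std::string> &kinds)
{
  std::vector<clause_group> groups;
  for (const std::string &kind : kinds)
    if (!groups.empty () && is_trailing_node (kind))
      groups.back ().push_back (kind);
    else
      groups.push_back (clause_group{kind});
  return groups;
}

// Rebuild a flat list; any permutation of whole groups keeps each
// base mapping adjacent to its pointer nodes.
std::vector<std::string>
flatten (const std::vector<clause_group> &groups)
{
  std::vector<std::string> out;
  for (const clause_group &g : groups)
    out.insert (out.end (), g.begin (), g.end ());
  return out;
}
```

A reordering pass could then permute the outer vector (for example with std::stable_partition) and flatten the result, so every base mapping stays immediately followed by its pointer nodes.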

(The "prev_list_p" stuff in the loop in question in gimplify.c just
keeps track of the first node in these groups.)

For OpenACC, the GOMP_MAP_ATTACH_DETACH code does *not* depend on the
previous clause when lowering in omp-low.c. But GOMP_MAP_ALWAYS_POINTER
does! And in one case ("update" directive), GOMP_MAP_ATTACH_DETACH is
rewritten to GOMP_MAP_ALWAYS_POINTER, so for that case at least, the
dependency on the preceding mapping node must stay intact.

OpenACC also allows "bare" GOMP_MAP_ATTACH and GOMP_MAP_DETACH nodes
(corresponding to the "attach" and "detach" clauses). Those are handled
a bit differently to GOMP_MAP_ATTACH_DETACH in gimplify.c -- but
GOMP_MAP_ATTACH_Z_L_A_S doesn't quite behave like that either, I don't
think?

Anyway: I've not entirely understood what omp_target_reorder_clauses is
doing, but I think it may need to try harder to keep the groups
mentioned above together.  What do you think?

Thanks,

Julian


Re: [PATCH] Bail in bounds_of_var_in_loop if scev returns NULL.

2021-05-17 Thread Andrew MacLeod via Gcc-patches

On 5/13/21 4:15 PM, Aldy Hernandez via Gcc-patches wrote:

Both initial_condition_in_loop_num and evolution_part_in_loop_num
can return NULL.  This patch exits if either one is NULL.  Presumably
this didn't happen before, because adjust_range_with_scev was called
far less frequently than in ranger, which can call it for every PHI.

OK pending tests?

gcc/ChangeLog:

PR tree-optimization/100349
* vr-values.c (bounds_of_var_in_loop): Bail if scev returns
  NULL.

gcc/testsuite/ChangeLog:

* gcc.dg/pr100349.c: New test.
-


OK.

Andrew



Re: RFA: Add option -fretry-compilation

2021-05-17 Thread Joern Rennecke
On Mon, 17 May 2021 at 11:59, Richard Biener  wrote:

> The plan for reload is to axe it similar to CC0 support.  Sooner than later,
> but given it's still used exclusively by a lot of targets, it might
> take some time.

> So for you it's always just -fretry-compilation -m[no-]lra?  Given -m[no-]lra
> is a thing, cycling between the two directly in RA lra/reload should be
> possible?

Even if that were possible, it wouldn't solve the problem.  When I try compiling
newlib without -fretry-compilation, it first falls over on
libc/time/strftime.c.
With lra, lra finishes, but it ignores an earlyclobber constraint, so
reload_cse_simplify_operands ICEs.  With reload, you get a spill failure.
I've tried various options, but only -O0 seems to work.  Compiling strftime with
-O0 is not really an issue because the target is too deeply embedded to hope
to link something that uses strftime.  But identifying all the files that can't
be compiled with optimization and treating them differently is a problem if it
has to be done by hand.

> Or are reload/LRA too greedy in that they ICE when having transformed half
> of the code already?

Both of them do a lot of transformations before they ICE.  Or they don't even
ICE themselves, but leave behind invalid rtl that a later pass catches.

Even if we fixed both passes so that they could roll back everything
(which I think would be a lot harder for lra; reload can already roll
back a lot), what's the point if you axe reload soon after?

> I see.  It's of course difficult for the FSF tree to cater for
> extremes that are not represented in its tree.  I wonder what prevents
> you from contributing the port?

I can neither confirm nor deny that I can't contribute the port.

> Still if that solves a lot of the issues this seems like the way to go.

It has merit in its own right, but it can't fix all the ICEs, and thus doesn't
make building libraries manageable.


Re: [PATCH] Add a couple of A?CST1:CST2 match and simplify optimizations

2021-05-17 Thread Bernd Edlinger
On 5/16/21 10:36 PM, apinski--- via Gcc-patches wrote:
> From: Andrew Pinski 
> 
> Instead of some of the more manual optimizations inside phi-opt,
> it would be good idea to do a lot of the heavy lifting inside match
> and simplify instead. In the process, this moves the three simple
> A?CST1:CST2 (where CST1 or CST2 is zero) simplifications.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> 
> Thanks,
> Andrew Pinski
> 
> gcc:
> * match.pd (A?CST1:CST2): Add simplifications for A?0:+-1, A?+-1:0,
> A?POW2:0 and A?0:POW2.
> ---
>  gcc/match.pd | 37 +
>  1 file changed, 37 insertions(+)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 10503b97ab5..844f7dd5f87 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3711,6 +3711,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (if (integer_all_onesp (@1) && integer_zerop (@2))
>  @0
>  
> +/* A few simplifications of "a ? CST1 : CST2". */
> +/* NOTE: Only do this on gimple as the if-chain-to-switch
> +   optimization depends on the gimple to have if statements in it. */
> +#if GIMPLE
> +(simplify
> + (cond @0 INTEGER_CST@1 INTEGER_CST@2)
> + (switch
> +  (if (integer_zerop (@2))
> +   (switch
> +/* a ? 1 : 0 -> a if 0 and 1 are integral types. */
> +(if (integer_onep (@1))
> + (convert (convert:boolean_type_node @0)))
> +/* a ? -1 : 0 -> -a. */
> +(if (integer_all_onesp (@1))
> + (negate (convert (convert:boolean_type_node @0
> +/* a ? powerof2cst : 0 -> a << (log2(powerof2cst)) */
> +(if (!POINTER_TYPE_P (type) && integer_pow2p (@1))
> + (with {
> +   tree shift = build_int_cst (integer_type_node, wi::exact_log2 
> (wi::to_wide (@1)));
> +  }
> +  (lshift (convert (convert:boolean_type_node @0)) { shift; })
> +  (if (integer_zerop (@1))
> +   (switch
> +/* a ? 0 : 1 -> !a. */
> +(if (integer_onep (@2))
> + (convert (bit_not:boolean_type_node (convert:boolean_type_node @0
> +/* a ? -1 : 0 -> -(!a). */
> +(if (integer_all_onesp (@2))
> + (negate (convert (bit_not:boolean_type_node (convert:boolean_type_node 
> @0)
> +/* a ? 0 : powerof2cst -> (!a) << (log2(powerof2cst)) */
> +(if (!POINTER_TYPE_P (type) && integer_pow2p (@2))
> + (with {
> +   tree shift = build_int_cst (integer_type_node, wi::exact_log2 
> (wi::to_wide (@2)));
> +  }
> +  (lshift (convert (bit_not:boolean_type_node (convert:boolean_type_node 
> @0))) { shift; })))
> +#endif
> +
>  /* Simplification moved from fold_cond_expr_with_comparison.  It may also
> be extended.  */
>  /* This pattern implements two kinds simplification:
> 
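As a sanity check on the value identities these patterns rely on, the C++ sketch below compares each conditional against its rewritten form for a few inputs. It models the GIMPLE conversions with ordinary bool/int casts, is not match.pd syntax, and identities_hold is a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical helper: each conditional is compared with the form the
// proposed patterns rewrite it to.  b models (convert:boolean_type_node @0)
// widened back to int; nb models the negated boolean.
bool
identities_hold (int a)
{
  const int b = static_cast<bool> (a);
  const int nb = !static_cast<bool> (a);
  return (a ? 1 : 0) == b              // a ? 1 : 0    -> (bool) a
         && (a ? -1 : 0) == -b         // a ? -1 : 0   -> -(bool) a
         && (a ? 8 : 0) == (b << 3)    // a ? POW2 : 0 -> (bool) a << log2 (POW2)
         && (a ? 0 : 1) == nb          // a ? 0 : 1    -> !a
         && (a ? 0 : -1) == -nb        // a ? 0 : -1   -> -(!a)
         && (a ? 0 : 8) == (nb << 3);  // a ? 0 : POW2 -> (!a) << log2 (POW2)
}
```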

Hi Andrew,

Sorry, but I don't know what exactly is wrong with this patch,
but it seems to cause this, when I try to bootstrap it:


/home/ed/gnu/gcc-build-2/./prev-gcc/xgcc -B/home/ed/gnu/gcc-build-2/./prev-gcc/ 
-B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
-B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
-B/home/ed/gnu/install/x86_64-pc-linux-gnu/lib/ -isystem 
/home/ed/gnu/install/x86_64-pc-linux-gnu/include -isystem 
/home/ed/gnu/install/x86_64-pc-linux-gnu/sys-include   -fchecking=1 -c -g -O2 
-fchecking=1  -gnatpg -gnata -W -Wall -nostdinc -I- -I. -Iada/generated -Iada 
-Iada/gcc-interface -I../../gcc-trunk-1/gcc/ada 
-I../../gcc-trunk-1/gcc/ada/gcc-interface -Iada/libgnat 
-I../../gcc-trunk-1/gcc/ada/libgnat 
../../gcc-trunk-1/gcc/ada/libgnat/a-charac.ads -o ada/libgnat/a-charac.o
/home/ed/gnu/gcc-build-2/./prev-gcc/xgcc -B/home/ed/gnu/gcc-build-2/./prev-gcc/ 
-B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
-B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
-B/home/ed/gnu/install/x86_64-pc-linux-gnu/lib/ -isystem 
/home/ed/gnu/install/x86_64-pc-linux-gnu/include -isystem 
/home/ed/gnu/install/x86_64-pc-linux-gnu/sys-include   -fchecking=1 -c -g -O2 
-fchecking=1  -gnatpg -gnata -W -Wall -nostdinc -I- -I. -Iada/generated -Iada 
-Iada/gcc-interface -I../../gcc-trunk-1/gcc/ada 
-I../../gcc-trunk-1/gcc/ada/gcc-interface -Iada/libgnat 
-I../../gcc-trunk-1/gcc/ada/libgnat 
../../gcc-trunk-1/gcc/ada/libgnat/a-chlat1.ads -o ada/libgnat/a-chlat1.o
+===GNAT BUG DETECTED==+
| 12.0.0 20210517 (experimental) (x86_64-pc-linux-gnu) Storage_Error stack overflow or erroneous memory access|
| Error detected at a-charac.ads:16:12 |
| Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
| Use a subject line meaningful to you and us to track the bug.|
| Include the entire contents of this bug box in the report.   |
| Include the exact command that you entered. 

[PATCH] libstdc++: Fix iterator caching inside range adaptors [PR100479]

2021-05-17 Thread Patrick Palka via Gcc-patches
This fixes two issues with our iterator caching as described in detail
in the PR.  Since r12-336 added the __non_propagating_cache class
template as part of P2328, this patch just rewrites the _CachedPosition
partial specialization in terms of this class template.
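The copy/move semantics of the cache are the key point: a copied or moved-to cache never inherits a cached value. The sketch below is a simplified, hypothetical model built on std::optional, not the libstdc++ class (which derives from the internal _Optional_base instead).

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Simplified, hypothetical model: copying or moving never transfers a
// cached value, and moving from a cache also empties the source.
template<typename T>
  struct non_propagating_cache : std::optional<T>
  {
    non_propagating_cache () = default;

    non_propagating_cache (const non_propagating_cache &) noexcept
      : std::optional<T> ()
    { }

    non_propagating_cache (non_propagating_cache &&other) noexcept
      : std::optional<T> ()
    { other.reset (); }

    non_propagating_cache &
    operator= (const non_propagating_cache &other) noexcept
    {
      if (&other != this)
        this->reset ();
      return *this;
    }

    non_propagating_cache &
    operator= (non_propagating_cache &&other) noexcept
    {
      this->reset ();
      other.reset ();
      return *this;
    }
  };
```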

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  Shall we
also backport this?

libstdc++-v3/ChangeLog:

PR libstdc++/100479
* include/std/ranges (__detail::__non_propagating_cache): Move
definition up to before that of _CachedPosition.  Make base
class _Optional_base protected instead of private.  Add const
overload for operator*.
(__detail::_CachedPosition): Rewrite the partial specialization
for forward ranges as a derived class of __non_propagating_cache.
Remove the size constraint on the partial specialization for
random access ranges.
* testsuite/std/ranges/adaptors/100479.cc: New test.
---
 libstdc++-v3/include/std/ranges   | 133 +-
 .../testsuite/std/ranges/adaptors/100479.cc   |  82 +++
 2 files changed, 148 insertions(+), 67 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/100479.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 1707aeaebcd..fe6379fb858 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1139,6 +1139,67 @@ namespace views::__adaptor
 
   namespace __detail
   {
+template
+  struct __non_propagating_cache
+  {
+   // When _Tp is not an object type (e.g. is a reference type), we make
+   // __non_propagating_cache<_Tp> empty rather than ill-formed so that
+   // users can easily conditionally declare data members with this type
+   // (such as join_view::_M_inner).
+  };
+
+template
+  requires is_object_v<_Tp>
+  struct __non_propagating_cache<_Tp> : protected _Optional_base<_Tp>
+  {
+   __non_propagating_cache() = default;
+
+   constexpr
+   __non_propagating_cache(const __non_propagating_cache&) noexcept
+   { }
+
+   constexpr
+   __non_propagating_cache(__non_propagating_cache&& __other) noexcept
+   { __other._M_reset(); }
+
+   constexpr __non_propagating_cache&
+   operator=(const __non_propagating_cache& __other) noexcept
+   {
+ if (std::__addressof(__other) != this)
+   this->_M_reset();
+ return *this;
+   }
+
+   constexpr __non_propagating_cache&
+   operator=(__non_propagating_cache&& __other) noexcept
+   {
+ this->_M_reset();
+ __other._M_reset();
+ return *this;
+   }
+
+   constexpr _Tp&
+   operator*() noexcept
+   { return this->_M_get(); }
+
+   constexpr const _Tp&
+   operator*() const noexcept
+   { return this->_M_get(); }
+
+   template
+ _Tp&
+ _M_emplace_deref(const _Iter& __i)
+ {
+   this->_M_reset();
+   // Using _Optional_base::_M_construct to initialize from '*__i'
+   // would incur an extra move due to the indirection, so we instead
+   // use placement new directly.
+   ::new ((void *) std::__addressof(this->_M_payload._M_payload)) 
_Tp(*__i);
+   this->_M_payload._M_engaged = true;
+   return this->_M_get();
+ }
+  };
+
 template
   struct _CachedPosition
   {
@@ -1160,27 +1221,25 @@ namespace views::__adaptor
 
 template
   struct _CachedPosition<_Range>
+   : protected __non_propagating_cache>
   {
-  private:
-   iterator_t<_Range> _M_iter{};
-
-  public:
constexpr bool
_M_has_value() const
-   { return _M_iter != iterator_t<_Range>{}; }
+   { return this->_M_is_engaged(); }
 
constexpr iterator_t<_Range>
_M_get(const _Range&) const
{
  __glibcxx_assert(_M_has_value());
- return _M_iter;
+ return **this;
}
 
constexpr void
_M_set(const _Range&, const iterator_t<_Range>& __it)
{
  __glibcxx_assert(!_M_has_value());
- _M_iter = __it;
+ this->_M_payload._M_payload._M_value = __it;
+ this->_M_payload._M_engaged = true;
}
   };
 
@@ -2339,66 +2398,6 @@ namespace views::__adaptor
 inline constexpr _DropWhile drop_while;
   } // namespace views
 
-  namespace __detail
-  {
-template
-  struct __non_propagating_cache
-  {
-   // When _Tp is not an object type (e.g. is a reference type), we make
-   // __non_propagating_cache<_Tp> empty rather than ill-formed so that
-   // users can easily conditionally declare data members with this type
-   // (such as join_view::_M_inner).
-  };
-
-template
-  requires is_object_v<_Tp>
-  struct __non_propagating_cache<_Tp> : private _Optional_base<_Tp>
-  {
-   __non_propagating_cache() = default;
-
-   constexp

[PATCH] libstdc++: Fix up semiregular-box partial specialization [PR100475]

2021-05-17 Thread Patrick Palka via Gcc-patches
This makes the in-place constructor of our partial specialization of
__box for already-semiregular types use direct-non-list-initialization
(in accordance with the specification of the primary template), and
additionally makes its operator-> member functions use std::__addressof.
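The difference between the two initialization forms, and the reason for std::__addressof, can be seen with a type modelled on the new tests. This C++17 sketch uses hypothetical detection traits (list_init_from_two_ints, has_usable_address_of) and the public std::addressof rather than the internal std::__addressof.

```cpp
#include <initializer_list>
#include <memory>
#include <type_traits>
#include <utility>

// Modelled on the PR 100475 tests: A(int, int) is usable, but
// brace-initialization from ints picks the deleted initializer_list
// constructor, and unary & is deleted.
struct A
{
  A (int, int) { }
  A (std::initializer_list<int>) = delete;
  void operator& () const = delete;
};

// Hypothetical detection traits.
template<typename T, typename = void>
  struct list_init_from_two_ints : std::false_type { };
template<typename T>
  struct list_init_from_two_ints<T, std::void_t<decltype (T{0, 0})>>
  : std::true_type { };

template<typename T, typename = void>
  struct has_usable_address_of : std::false_type { };
template<typename T>
  struct has_usable_address_of<T,
      std::void_t<decltype (&std::declval<const T &> ())>>
  : std::true_type { };

// _M_value(args...) is direct-non-list-initialization: fine.
static_assert (std::is_constructible_v<A, int, int>);
// _M_value{args...} would select the deleted constructor.
static_assert (!list_init_from_two_ints<A>::value);
// &_M_value is ill-formed, but std::addressof still works.
static_assert (!has_usable_address_of<A>::value);

const A *
address_of (const A &a)
{
  return std::addressof (a);
}
```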

Tested on x86_64-pc-linux-gnu, does this look OK for 10/11/trunk?

libstdc++-v3/ChangeLog:

PR libstdc++/100475
* include/std/ranges (__box::__box): Use non-list-initialization
in member initializer list of in-place constructor of the
partial specialization for semiregular types.
(__box::operator->): Use std::__addressof.
* testsuite/std/ranges/adaptors/detail/semiregular_box.cc
(test02): New test.
* testsuite/std/ranges/single_view.cc (test04): New test.
---
 libstdc++-v3/include/std/ranges|  6 +++---
 .../ranges/adaptors/detail/semiregular_box.cc  | 18 ++
 .../testsuite/std/ranges/single_view.cc| 16 
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 0f69d4f0839..1707aeaebcd 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -163,7 +163,7 @@ namespace ranges
  constexpr explicit
  __box(in_place_t, _Args&&... __args)
  noexcept(is_nothrow_constructible_v<_Tp, _Args...>)
- : _M_value{std::forward<_Args>(__args)...}
+ : _M_value(std::forward<_Args>(__args)...)
  { }
 
constexpr bool
@@ -180,11 +180,11 @@ namespace ranges
 
constexpr _Tp*
operator->() noexcept
-   { return &_M_value; }
+   { return std::__addressof(_M_value); }
 
constexpr const _Tp*
operator->() const noexcept
-   { return &_M_value; }
+   { return std::__addressof(_M_value); }
   };
   } // namespace __detail
 
diff --git 
a/libstdc++-v3/testsuite/std/ranges/adaptors/detail/semiregular_box.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/detail/semiregular_box.cc
index 65931dea51a..ed694e04fd1 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/detail/semiregular_box.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/detail/semiregular_box.cc
@@ -81,3 +81,21 @@ test01()
   return true;
 }
 static_assert(test01());
+
+template
+  struct A {
+A() requires make_semiregular;
+A(int, int);
+A(std::initializer_list) = delete;
+  };
+
+void
+test02()
+{
+  // PR libstdc++/100475
+  static_assert(std::semiregular>);
+  __box> x2(std::in_place, 0, 0);
+
+  static_assert(!std::semiregular>);
+  __box> x1(std::in_place, 0, 0);
+}
diff --git a/libstdc++-v3/testsuite/std/ranges/single_view.cc 
b/libstdc++-v3/testsuite/std/ranges/single_view.cc
index 97bc39bb636..f530cc07565 100644
--- a/libstdc++-v3/testsuite/std/ranges/single_view.cc
+++ b/libstdc++-v3/testsuite/std/ranges/single_view.cc
@@ -58,9 +58,25 @@ test03()
   VERIFY(*std::ranges::begin(s3) == 'a');
 }
 
+void
+test04()
+{
+  // PR libstdc++/100475
+  struct A {
+A() = default;
+A(int, int) { }
+A(std::initializer_list) = delete;
+void operator&() const = delete;
+  };
+  std::ranges::single_view s(std::in_place, 0, 0);
+  s.data();
+  std::as_const(s).data();
+}
+
 int main()
 {
   test01();
   test02();
   test03();
+  test04();
 }
-- 
2.31.1.621.g97eea85a0a



[PATCH] openmp: Notify team barrier of pending tasks in omp_fulfill_event

2021-05-17 Thread Kwok Cheung Yeung

Hello

This patch fixes the issue where a call to omp_fulfill_event could fail to 
trigger the execution of tasks that were dependent on the task whose completion 
event is being fulfilled.


This mainly (or only?) occurs when the calling thread is external to OpenMP and
all the barrier threads are sleeping when omp_fulfill_event is called.
omp_fulfill_event wakes the appropriate number of threads, but if
BAR_TASK_PENDING is not set on bar->generation, the threads go back to sleep
again rather than process new tasks.
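A deterministic, single-threaded model may help show why waking threads alone is not enough. Everything in it is hypothetical (team_model, run_woken_threads, fulfill_event_model); task_pending merely stands in for the BAR_TASK_PENDING bit, and the wake count mirrors the do_wake computation in the patch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, single-threaded model of the wake-up path; task_pending
// stands in for the BAR_TASK_PENDING bit in bar->generation.
struct team_model
{
  int nthreads;
  int task_running_count;
  int queued_tasks = 0;
  bool task_pending = false;
};

// A woken barrier thread only runs queued tasks if the pending flag is
// set; otherwise it goes back to sleep.
int
run_woken_threads (team_model &team, int woken)
{
  if (!team.task_pending)
    return 0;
  int ran = std::min (woken, team.queued_tasks);
  team.queued_tasks -= ran;
  if (team.queued_tasks == 0)
    team.task_pending = false;
  return ran;
}

// Returns how many new tasks were picked up by woken threads.
// set_pending == true models the one-line fix.
int
fulfill_event_model (team_model &team, int new_tasks, bool set_pending)
{
  team.queued_tasks += new_tasks;
  if (new_tasks <= 0)
    return 0;
  if (set_pending)
    team.task_pending = true;
  int do_wake = std::min (team.nthreads - team.task_running_count, new_tasks);
  return run_woken_threads (team, do_wake);
}
```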


I have added a new testcase that uses a pthread thread to call omp_fulfill_event
on a suspended task after a short delay. I have not included a Fortran version,
as there doesn't appear to be a standard interface for threading in Fortran.


I have tested all the task-detach-* libgomp tests (which are the only tests that
call omp_fulfill_event) with no offloading and with offloading to Nvidia, with no
failures. Okay to commit to master, releases/gcc-11 and devel/omp/gcc-11?


Thanks

Kwok
From 348c7cd00e358a8dc0b7563055f367fce2713fa5 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Fri, 14 May 2021 09:59:11 -0700
Subject: [PATCH] openmp: Notify team barrier of pending tasks in
 omp_fulfill_event

The team barrier should be notified of any new tasks that become runnable
as the result of a completing task, otherwise the barrier threads might
not resume processing available tasks, resulting in a hang.

2021-05-17  Kwok Cheung Yeung  

libgomp/
* task.c (omp_fulfill_event): Call gomp_team_barrier_set_task_pending
if new tasks generated.
* testsuite/libgomp.c-c++-common/task-detach-13.c: New.
---
 libgomp/task.c|  1 +
 .../libgomp.c-c++-common/task-detach-13.c | 60 +++
 2 files changed, 61 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c

diff --git a/libgomp/task.c b/libgomp/task.c
index 1c73c759a8d..feb4796a3ac 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -2460,6 +2460,7 @@ omp_fulfill_event (omp_event_handle_t event)
   if (new_tasks > 0)
 {
   /* Wake up threads to run new tasks.  */
+  gomp_team_barrier_set_task_pending (&team->barrier);
   do_wake = team->nthreads - team->task_running_count;
   if (do_wake > new_tasks)
do_wake = new_tasks;
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c
new file mode 100644
index 000..4306524526d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-options "-fopenmp" } */
+/* { dg-timeout 10 } */
+
+/* Test that omp_fulfill_event works when called from an external
+   non-OpenMP thread.  */
+
+#include 
+#include 
+#include 
+#include 
+
+int finished = 0;
+int event_pending = 0;
+omp_event_handle_t detach_event;
+
+void*
+fulfill_thread (void *)
+{
+  while (!__atomic_load_n (&finished, __ATOMIC_RELAXED))
+{
+  if (__atomic_load_n (&event_pending, __ATOMIC_ACQUIRE))
+   {
+ omp_fulfill_event (detach_event);
+ __atomic_store_n (&event_pending, 0, __ATOMIC_RELEASE);
+   }
+
+  sleep(1);
+}
+
+  return 0;
+}
+
+int
+main (void)
+{
+  pthread_t thr;
+  int dep;
+  pthread_create (&thr, NULL, fulfill_thread, 0);
+
+  #pragma omp parallel
+#pragma omp single
+  {
+   omp_event_handle_t ev;
+
+   #pragma omp task depend (out: dep) detach (ev)
+   {
+ detach_event = ev;
+ __atomic_store_n (&event_pending, 1, __ATOMIC_RELEASE);
+   }
+
+   #pragma omp task depend (in: dep)
+   {
+ __atomic_store_n (&finished, 1, __ATOMIC_RELAXED);
+   }
+  }
+
+
+  pthread_join (thr, 0);
+}
-- 
2.30.0.335.ge636282



[PATCH][nvptx] Handle memmodel for atomic ops

2021-05-17 Thread Tom de Vries
Hi,

[ Tobias, can you test this on volta ? ]

The atomic ops in nvptx.md have memmodel arguments, which are currently
ignored.

Handle these, fixing test-case failures in libgomp.c-c++-common/reduction-{5,6}.c
on volta.
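The barrier placement implemented by nvptx_output_barrier, together with the scope chosen by the 'B' operand code, can be summarised in a small model. This is a hedged C++ sketch: the enums, membar_scope and wrap_atomic are hypothetical, the instruction text is a placeholder rather than real PTX, the non-SYMBOL_REF case (which also gets .sys) is folded into "not shared", and the __ATOMIC_SYNC_* variants (which map like their plain counterparts) are omitted.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the pre/post membar decision.
enum class memmodel { relaxed, consume, acquire, release, acq_rel, seq_cst };

constexpr bool
needs_membar_before (memmodel m)
{
  return m == memmodel::release || m == memmodel::acq_rel
         || m == memmodel::seq_cst;
}

constexpr bool
needs_membar_after (memmodel m)
{
  return m == memmodel::consume || m == memmodel::acquire
         || m == memmodel::acq_rel || m == memmodel::seq_cst;
}

// Data area of the memory operand, reduced to the cases that matter here.
enum class data_area { generic, global, shared };

// ".cta" for shared memory; ".sys" otherwise (the conservative default).
const char *
membar_scope (data_area a)
{
  return a == data_area::shared ? ".cta" : ".sys";
}

// Wrap an atomic instruction with the barriers its memory model needs.
std::string
wrap_atomic (const std::string &insn, memmodel m, data_area a)
{
  const std::string membar = std::string ("membar") + membar_scope (a) + ";\n";
  std::string out;
  if (needs_membar_before (m))
    out += membar;
  out += insn + "\n";
  if (needs_membar_after (m))
    out += membar;
  return out;
}
```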

Tested libgomp on x86_64-linux with nvptx accelerator.

Any comments?

Thanks,
- Tom

[nvptx] Handle memmodel for atomic ops

gcc/ChangeLog:

2021-05-17  Tom de Vries  

PR target/100497
* config/nvptx/nvptx-protos.h (nvptx_output_atomic_insn): Declare.
* config/nvptx/nvptx.c (nvptx_output_barrier)
(nvptx_output_atomic_insn): New function.
(nvptx_print_operand): Add support for 'B'.
* config/nvptx/nvptx.md: Use nvptx_output_atomic_insn for atomic
insns.

---
 gcc/config/nvptx/nvptx-protos.h |  1 +
 gcc/config/nvptx/nvptx.c| 77 +
 gcc/config/nvptx/nvptx.md   | 31 ++---
 3 files changed, 104 insertions(+), 5 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index 15122096487..b7e6ae26522 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -57,5 +57,6 @@ extern const char *nvptx_output_set_softstack (unsigned);
 extern const char *nvptx_output_simt_enter (rtx, rtx, rtx);
 extern const char *nvptx_output_simt_exit (rtx);
 extern const char *nvptx_output_red_partition (rtx, rtx);
+extern const char *nvptx_output_atomic_insn (const char *, rtx *, int, int);
 #endif
 #endif
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index ebbfa921589..722b0faa330 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2444,6 +2444,53 @@ nvptx_output_mov_insn (rtx dst, rtx src)
   return "%.\tcvt%t0%t1\t%0, %1;";
 }
 
+/* Output a pre/post barrier for MEM_OPERAND according to MEMMODEL.  */
+
+static void
+nvptx_output_barrier (rtx *mem_operand, int memmodel, bool pre_p)
+{
+  bool post_p = !pre_p;
+
+  switch (memmodel)
+{
+case MEMMODEL_RELAXED:
+  return;
+case MEMMODEL_CONSUME:
+case MEMMODEL_ACQUIRE:
+case MEMMODEL_SYNC_ACQUIRE:
+  if (post_p)
+   break;
+  return;
+case MEMMODEL_RELEASE:
+case MEMMODEL_SYNC_RELEASE:
+  if (pre_p)
+   break;
+  return;
+case MEMMODEL_ACQ_REL:
+case MEMMODEL_SEQ_CST:
+case MEMMODEL_SYNC_SEQ_CST:
+  if (pre_p || post_p)
+   break;
+  return;
+default:
+  gcc_unreachable ();
+}
+
+  output_asm_insn ("%.\tmembar%B0;", mem_operand);
+}
+
+const char *
+nvptx_output_atomic_insn (const char *asm_template, rtx *operands, int mem_pos,
+ int memmodel_pos)
+{
+  nvptx_output_barrier (&operands[mem_pos], INTVAL (operands[memmodel_pos]),
+   true);
+  output_asm_insn (asm_template, operands);
+  nvptx_output_barrier (&operands[mem_pos], INTVAL (operands[memmodel_pos]),
+   false);
+  return "";
+}
+
 static void nvptx_print_operand (FILE *, rtx, int);
 
 /* Output INSN, which is a call to CALLEE with result RESULT.  For ptx, this
@@ -2660,6 +2707,36 @@ nvptx_print_operand (FILE *file, rtx x, int code)
 
   switch (code)
 {
+case 'B':
+  if (SYMBOL_REF_P (XEXP (x, 0)))
+   switch (SYMBOL_DATA_AREA (XEXP (x, 0)))
+ {
+ case DATA_AREA_GENERIC:
+   /* Assume worst-case: global.  */
+   gcc_fallthrough (); /* FALLTHROUGH.  */
+ case DATA_AREA_GLOBAL:
+   break;
+ case DATA_AREA_SHARED:
+   fputs (".cta", file);
+   return;
+ case DATA_AREA_LOCAL:
+ case DATA_AREA_CONST:
+ case DATA_AREA_PARAM:
+ default:
+   gcc_unreachable ();
+ }
+
+  /* There are 2 cases where membar.sys differs from membar.gl:
+- host accesses global memory (f.i. systemwide atomics)
+- 2 or more devices are set up in peer-to-peer mode, and one
+  peer can access global memory of other peer.
+Neither is currently supported by OpenMP/OpenACC on nvptx, but
+that could change, so we default to membar.sys.  We could support
+this more optimally by adding DATA_AREA_SYS and then emitting
+.gl for DATA_AREA_GLOBAL and .sys for DATA_AREA_SYS.  */
+  fputs (".sys", file);
+  return;
+
 case 'A':
   x = XEXP (x, 0);
   gcc_fallthrough (); /* FALLTHROUGH. */
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 00bb8fea821..108de1c0c59 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1642,7 +1642,11 @@
(set (match_dup 1)
(unspec_volatile:SDIM [(const_int 0)] UNSPECV_CAS))]
   ""
-  "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;"
+  {
+const char *t
+  = "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;";
+return nvptx_output_atomic_insn (t, operands, 1, 4);
+  }
   [(set_attr "atomic" "true")])
 
 (define_insn "atomic_exchange"
@@ -1654,7 +1658,11 @@
(set (match_dup 1)
(match_oper

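The barrier placement in nvptx_output_barrier, and the scope chosen by the new 'B' operand code, can be summarised in a small host-side C++ sketch. It assumes that std::memory_order stands in for GCC's MEMMODEL_* values (the SYNC_* variants behave like their plain counterparts), and that relaxed needs no barrier, since that case is outside this excerpt. needs_membar, DataArea and membar_scope are hypothetical names, not nvptx backend code.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical mirror of the switch in nvptx_output_barrier, using
// std::memory_order instead of GCC's MEMMODEL_* enumerators.
// before_atomic == true asks about the barrier emitted before the atomic
// instruction; false asks about the one emitted after it.
bool needs_membar(std::memory_order mo, bool before_atomic) {
  switch (mo) {
  case std::memory_order_relaxed:
    return false;            // assumption: no barrier for relaxed
  case std::memory_order_consume:
  case std::memory_order_acquire:
    return !before_atomic;   // barrier only after the atomic op
  case std::memory_order_release:
    return before_atomic;    // barrier only before the atomic op
  case std::memory_order_acq_rel:
  case std::memory_order_seq_cst:
    return true;             // barriers on both sides
  }
  return true;               // not reached for valid enumerators
}

// Hypothetical stand-in for the DATA_AREA_* values handled by 'B'.
enum class DataArea { Generic, Global, Shared };

// Shared memory only needs CTA scope; generic and global accesses
// conservatively get system scope, as the patch comment explains.
std::string membar_scope(DataArea area) {
  return area == DataArea::Shared ? ".cta" : ".sys";
}
```

With this mapping, an acquire exchange on a global variable would be followed by membar.sys, which is consistent with the atom.global.exch.b32 / membar.sys pair Tom quotes later in this thread.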
Re: [PATCH] openmp: Notify team barrier of pending tasks in, omp_fulfill_event

2021-05-17 Thread Jakub Jelinek via Gcc-patches
On Mon, May 17, 2021 at 04:48:03PM +0100, Kwok Cheung Yeung wrote:
> 2021-05-17  Kwok Cheung Yeung  
> 
>   libgomp/
>   * task.c (omp_fulfill_event): Call gomp_team_barrier_set_task_pending
>   if new tasks generated.
>   * testsuite/libgomp.c-c++-common/task-detach-13.c: New.
> ---
>  libgomp/task.c|  1 +
>  .../libgomp.c-c++-common/task-detach-13.c | 60 +++
>  2 files changed, 61 insertions(+)
>  create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c
> 
> diff --git a/libgomp/task.c b/libgomp/task.c
> index 1c73c759a8d..feb4796a3ac 100644
> --- a/libgomp/task.c
> +++ b/libgomp/task.c
> @@ -2460,6 +2460,7 @@ omp_fulfill_event (omp_event_handle_t event)
>if (new_tasks > 0)
>  {
>/* Wake up threads to run new tasks.  */
> +  gomp_team_barrier_set_task_pending (&team->barrier);
>do_wake = team->nthreads - team->task_running_count;
>if (do_wake > new_tasks)
>   do_wake = new_tasks;
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c 
> b/libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c
> new file mode 100644
> index 000..4306524526d
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c
> @@ -0,0 +1,60 @@
> +/* { dg-do run } */
> +/* { dg-options "-fopenmp" } */

-fopenmp as dg-options is implicit, please remove it.

> +/* { dg-timeout 10 } */

This will fail on targets that don't have pthreads.
We have already some tests that do use pthread_create,
and those currently use
/* { dg-do run { target *-*-linux* *-*-gnu* *-*-freebsd* } } */
so I'd do the same for this test.
There is also an effective target pthread, but I am not sure if it covers
everything we need to test.

> +
> +
> +  pthread_join (thr, 0);

I'd add return 0;
While we default to C17 which doesn't need it, we don't say anywhere
in the testcase that it is C99+ or C++ only, so I think better make it valid
C89 too.

Otherwise LGTM, thanks.

Jakub



[PATCH] libstdc++: Fix access issues in elements_view::_Sentinel [PR100631]

2021-05-17 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for 10/11/trunk?

libstdc++-v3/ChangeLog:

PR libstdc++/100631
* include/std/ranges (elements_view::_Iterator): Befriend
_Sentinel.
(elements_view::_Sentinel::_M_equal): Templatize.
(elements_view::_Sentinel::_M_distance_from): Split out from ...
(elements_view::_Sentinel::operator-): Here.
* testsuite/std/ranges/adaptors/elements.cc (test06, test07):
New tests.
---
 libstdc++-v3/include/std/ranges   | 15 ++---
 .../testsuite/std/ranges/adaptors/elements.cc | 31 +++
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index fe6379fb858..bf52074ca05 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3749,15 +3749,22 @@ namespace views::__adaptor
  { return __x._M_current - __y._M_current; }
 
  friend _Sentinel<_Const>;
+ friend _Sentinel;
};
 
   template
struct _Sentinel
{
private:
- constexpr bool
- _M_equal(const _Iterator<_Const>& __x) const
- { return __x._M_current == _M_end; }
+ template
+   constexpr bool
+   _M_equal(const _Iterator<_Const2>& __x) const
+   { return __x._M_current == _M_end; }
+
+ template
+   constexpr auto
+   _M_distance_from(const _Iterator<_Const2>& __i) const
+   { return _M_end - __i._M_current; }
 
  using _Base = elements_view::_Base<_Const>;
  sentinel_t<_Base> _M_end = sentinel_t<_Base>();
@@ -3800,7 +3807,7 @@ namespace views::__adaptor
requires sized_sentinel_for, iterator_t<_Base2>>
friend constexpr range_difference_t<_Base>
operator-(const _Sentinel& __x, const _Iterator<_Const2>& __y)
-   { return __x._M_end - __y._M_current; }
+   { return __x._M_distance_from(__y); }
 
  friend _Sentinel;
};
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc
index 134afd6a873..27aba2c0ff0 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc
@@ -115,6 +115,35 @@ test05()
   VERIFY( r2[0] == 1 && r2[1] == 3 );
 }
 
+void
+test06()
+{
+  // PR libstdc++/100631
+  auto r = std::views::iota(0)
+| std::views::filter([](int){ return true; })
+| std::views::take(42)
+| std::views::reverse
+| std::views::transform([](int) { return std::make_pair(42, "hello"); })
+| std::views::take(42)
+| std::views::keys;
+  auto b = r.begin();
+  auto e = r.end();
+  e - b;
+}
+
+void
+test07()
+{
+  // PR libstdc++/100631 comment #2
+  auto r = std::views::iota(0)
+| std::views::transform([](int) { return std::make_pair(42, "hello"); })
+| std::views::keys;
+  auto b = std::ranges::cbegin(r);
+  auto e = std::end(r);
+  b.base() == e.base();
+  b == e;
+}
+
 int
 main()
 {
@@ -123,4 +152,6 @@ main()
   test03();
   test04();
   test05();
+  test06();
+  test07();
 }
-- 
2.31.1.621.g97eea85a0a
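
The access problem fixed here can be reproduced in miniature. In the sketch below, Iterator and Sentinel are hypothetical, heavily simplified stand-ins for elements_view's _Iterator and _Sentinel: a hidden friend of Sentinel is not itself a friend of Iterator, so it must delegate to private member templates of Sentinel, and Iterator befriends the sentinel for both values of Const. This mirrors the shape of the patch (befriending the other sentinel, a templatized _M_equal, a new _M_distance_from) but is not the libstdc++ code.

```cpp
#include <cassert>
#include <cstddef>

template<bool Const> class Sentinel;

// Hypothetical simplified iterator: only the sentinels may read current_.
template<bool Const>
class Iterator {
  int current_;
  // Befriend both sentinel specializations, so that Sentinel<C> can
  // inspect an Iterator<!C> as well as an Iterator<C>.
  friend class Sentinel<true>;
  friend class Sentinel<false>;
public:
  explicit Iterator(int pos) : current_(pos) {}
};

template<bool Const>
class Sentinel {
  int end_;

  // Member templates of a befriended class may access Iterator's
  // private members for any Iterator<Const2>.
  template<bool Const2>
  bool equal(const Iterator<Const2>& it) const { return it.current_ == end_; }

  template<bool Const2>
  std::ptrdiff_t distance_from(const Iterator<Const2>& it) const
  { return end_ - it.current_; }

public:
  explicit Sentinel(int end) : end_(end) {}

  // These hidden friends are friends of Sentinel only, so they go
  // through the private helpers above instead of reading current_.
  template<bool Const2>
  friend bool operator==(const Iterator<Const2>& x, const Sentinel& y)
  { return y.equal(x); }

  template<bool Const2>
  friend std::ptrdiff_t operator-(const Sentinel& x, const Iterator<Const2>& y)
  { return x.distance_from(y); }
};
```

Reading it.current_ directly inside the friend operators would be rejected for exactly the reason described in the PR, because befriending Sentinel does not extend to Sentinel's own friends.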



[PATCH] libstdc++: Fix condition for memoizing reverse_view::begin() [PR100621]

2021-05-17 Thread Patrick Palka via Gcc-patches
A range being a random access range is not a sufficient condition for
ranges::next(iter, sent) to have constant time complexity; the range
must also have a sized sentinel.  This adjusts the memoization condition
for reverse_view accordingly.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  Doesn't
seem to be worth backporting.

libstdc++-v3/ChangeLog:

* include/std/ranges (reverse_view::_S_needs_cached_begin):
Set to false if the underlying non-common random-access range
doesn't have a sized sentinel.
---
 libstdc++-v3/include/std/ranges | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index bf52074ca05..e93469ca3b4 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3340,7 +3340,9 @@ namespace views::__adaptor
 {
 private:
   static constexpr bool _S_needs_cached_begin
-   = !common_range<_Vp> && !random_access_range<_Vp>;
+   = !common_range<_Vp> && !(random_access_range<_Vp>
+ && sized_sentinel_for,
+   iterator_t<_Vp>>);
 
   [[no_unique_address]]
__detail::__maybe_present_t<_S_needs_cached_begin,
-- 
2.31.1.621.g97eea85a0a



Re: RFA: Add option -fretry-compilation

2021-05-17 Thread Richard Biener via Gcc-patches
On Mon, May 17, 2021 at 5:33 PM Joern Rennecke
 wrote:
>
> On Mon, 17 May 2021 at 11:59, Richard Biener  
> wrote:
>
> > The plan for reload is to axe it similar to CC0 support.  Sooner rather
> > than later, but given it's still used exclusively by a lot of targets, it
> > might take some time.
>
> > So for you it's always just -fretry-compilation -m[no-]lra?  Given
> > -m[no-]lra is a thing, cycling between the two directly in RA (lra/reload)
> > should be possible?
>
> Even if that were possible, it wouldn't solve the problem.  When I try
> compiling newlib without -fretry-compilation, it's falling over first for
> libc/time/strftime.c.
> With lra, lra finishes, but it ignores an earlyclobber constraint, so
> reload_cse_simplify_operands ICEs.  With reload, you get a spill failure.
> I've tried various options, but only -O0 seems to work.  Compiling strftime
> with -O0 is not really an issue because the target is too deeply embedded to
> hope to link something that uses strftime.  But identifying all the files
> that can't be compiled with optimization and treating them differently is a
> problem if it has to be done by hand.
>
> > Or are reload/LRA too greedy in that they ICE when having transformed half
> > of the code already?
>
> Both of them do a lot of transformations before they ICE.  Or they don't even
> ICE themselves, but leave behind invalid rtl that a later pass catches.
>
> Even if we fixed both passes so that they could roll back everything
> (which I think would be a lot harder for lra; reload can already roll back
> a lot), what's the point if you axe reload soon after?
>
> > I see.  It's of course difficult for the FSF tree to cater for extremes
> > that are not represented in its tree.  I wonder what prevents you from
> > contributing the port?
>
> I can neither confirm nor deny that I can't contribute the port.
>
> > Still if that solves a lot of the issues this seems like the way to go.
>
> It has merit in its own right, but it can't fix all the ICEs, and thus
> doesn't make building libraries manageable.

But then it's a sub-par quality port (whoever is to blame here), and working
around it this way "officially" doesn't sound like a good thing.  So I suppose
this plumbing has to stay private to your port.

Richard.


Re: [PATCH] arm: Fix ICEs with compare-and-swap and -march=armv8-m.base [PR99977]

2021-05-17 Thread Alex Coplan via Gcc-patches
Hi Kyrill,

On 27/04/2021 13:47, Kyrylo Tkachov wrote:
> Hi Alex,
> 
> > -Original Message-
> > From: Alex Coplan 
> > Sent: 27 April 2021 14:14
> > To: gcc-patches@gcc.gnu.org
> > Cc: ni...@redhat.com; Richard Earnshaw ;
> > Ramana Radhakrishnan ; Kyrylo
> > Tkachov 
> > Subject: Re: [PATCH] arm: Fix ICEs with compare-and-swap and -
> > march=armv8-m.base [PR99977]
> > 
> > Ping
> > 
> > On 15/04/2021 15:39, Alex Coplan via Gcc-patches wrote:
> > > Hi all,
> > >
> > > The PR shows two ICEs with __sync_bool_compare_and_swap and
> > > -mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and
> > > one later on, after the CAS insn is split.
> > >
> > > The LRA ICE occurs because the
> > > @atomic_compare_and_swap_1 pattern attempts to tie
> > > two output operands together (operands 0 and 1 in the third
> > > alternative). LRA can't handle this, since it doesn't make sense for an
> > > insn to assign to the same operand twice.
> > >
> > > The later (post-splitting) ICE occurs because the expansion of the
> > > cbranchsi4_scratch insn doesn't quite go according to plan. As it
> > > stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch,
> > > attempting to pass a register (neg_bval) to use as a scratch register.
> > > However, since the RTL template has a match_scratch here,
> > > gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx.
> > > Since this is all happening after RA, this is doomed to fail (and we get
> > > an ICE about the insn not matching its constraints).
> > >
> > > It seems that the motivation for the choice of constraints in the
> > > atomic_compare_and_swap pattern comes from an attempt to satisfy the
> > > constraints of the cbranchsi4_scratch insn. This insn requires the
> > > scratch register to be the same as the input register in the case that
> > > we use a larger negative immediate (one that satisfies J, but not L).
> > >
> > > Of course, as noted above, LRA refuses to assign two output operands to
> > > the same register, so this was never going to work.
> > >
> > > The solution I'm proposing here is to collapse the alternatives to the
> > > CAS insn (allowing the two output register operands to be matched to
> > > different registers) and to ensure that the constraints for
> > > cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this by
> > > inserting a move to ensure the source and destination registers match if
> > > necessary (i.e. in the case of large negative immediates).
> > >
> > > Another notable change here is that we only do:
> > >
> > >   emit_move_insn (neg_bval, const1_rtx);
> > >
> > > for non-negative immediates. This is because the ADDS instruction used in
> > > the negative case suffices to leave a suitable value in neg_bval: if the
> > > operands compare equal, we don't take the branch (so neg_bval will be
> > > set by the load exclusive). Otherwise, the ADDS will leave a nonzero
> > > value in neg_bval, which will correctly signal that the CAS has failed
> > > when it is later negated.
> > >
> > > Testing:
> > >  * Bootstrapped and regtested on arm-linux-gnueabihf, no regressions.
> > >  * Regtested an arm-eabi cross configured with --with-arch=armv8-m.base,
> > >    no regressions. The patch fixes the gcc.dg/ia64-sync-3.c test in this
> > >    config.
> > >
> > > OK for trunk?
> 
> Ok.

The patch applies cleanly on the 11 branch and passes bootstrap/regtest on
arm-linux-gnueabihf as well as a regtest on arm-eabi configured with
--with-arch=armv8-m.base.

OK for the 11 branch? OK for the other affected branches if the same is
true there?

Thanks,
Alex

> Thanks,
> Kyrill
> 
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR target/99977
> > >   * config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen
> > >   with negative immediates: ensure we expand cbranchsi4_scratch
> > >   correctly and ensure we satisfy its constraints.
> > >   * config/arm/sync.md
> > >   (@atomic_compare_and_swap_1): Don't
> > >   attempt to tie two output operands together with constraints;
> > >   collapse two alternatives.
> > >   (@atomic_compare_and_swap_1): Likewise.
> > >   * config/arm/thumb1.md (cbranchsi4_neg_late): New.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   PR target/99977
> > >   * gcc.target/arm/pr99977.c: New test.
> > 
> > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > > index 475fb0d827f..8d19b8a73fd 100644
> > > --- a/gcc/config/arm/arm.c
> > > +++ b/gcc/config/arm/arm.c
> > > @@ -30737,13 +30737,31 @@ arm_split_compare_and_swap (rtx
> > operands[])
> > >  }
> > >else
> > >  {
> > > -  emit_move_insn (neg_bval, const1_rtx);
> > >cond = gen_rtx_NE (VOIDmode, rval, oldval);
> > >if (thumb1_cmpneg_operand (oldval, SImode))
> > > - emit_unlikely_jump (gen_cbranchsi4_scratch (neg_bval, rval, oldval,
> > > - label2, cond));
> > > + {
> > > +   rtx src = rval;
> > > +   i

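A minimal sketch of the kind of source that reaches the negative-immediate path in arm_split_compare_and_swap, assuming the usual Thumb-1 meanings of the constraints (L: -7..7, J: -255..-1). It is not the gcc.target/arm/pr99977.c test from the patch, whose contents are not shown here, and the function names are hypothetical. Built for -march=armv8-m.base, the two functions would compare against a small and a larger negative constant; on any host they simply exercise the builtin's semantics.

```cpp
#include <cassert>

// Expected value -1 satisfies the L constraint (assumed range -7..7).
bool cas_small_neg(int* p) { return __sync_bool_compare_and_swap(p, -1, 2); }

// Expected value -42 satisfies J (assumed range -255..-1) but not L, the
// case where cbranchsi4_scratch needs its scratch to match its input.
bool cas_large_neg(int* p) { return __sync_bool_compare_and_swap(p, -42, 2); }
```

Either way, the builtin returns true and stores 2 only when *p held the expected value; otherwise it returns false and leaves *p unchanged.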
RE: [PATCH] arm: Fix ICEs with compare-and-swap and -march=armv8-m.base [PR99977]

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Alex Coplan 
> Sent: 17 May 2021 17:29
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org; ni...@redhat.com; Richard Earnshaw
> ; Ramana Radhakrishnan
> 
> Subject: Re: [PATCH] arm: Fix ICEs with compare-and-swap and -
> march=armv8-m.base [PR99977]
> 
> Hi Kyrill,
> 
> The patch applies cleanly on the 11 branch and passes bootstrap/regtest on
> arm-linux-gnueabihf as well as a regtest on arm-eabi configured with
> --with-arch=armv8-m.base.
> 
> OK for the 11 branch? OK for the other affected branches if the same is
> true there?

Yes,
Thanks,
Kyrill


Re: [PATCH] arm: Fix ICE with CMSE nonsecure call on Armv8.1-M [PR100333]

2021-05-17 Thread Richard Earnshaw via Gcc-patches




On 30/04/2021 09:30, Alex Coplan via Gcc-patches wrote:

Hi,

As the PR shows, we ICE shortly after expanding nonsecure calls for
Armv8.1-M.  For Armv8.1-M, we have TARGET_HAVE_FPCXT_CMSE. As it stands,
the expander (arm.md:nonsecure_call_internal) moves the callee's address
to a register (with copy_to_suggested_reg) only if
!TARGET_HAVE_FPCXT_CMSE.

However, looking at the pattern which the insn appears to be intended to
match (thumb2.md:*nonsecure_call_reg_thumb2_fpcxt), it requires the
callee's address to be in a register.

This patch therefore just forces the callee's address into a register in
the expander.

Testing:
  * Regtested an arm-eabi cross configured with
  --with-arch=armv8.1-m.main+mve.fp+fp.dp --with-float=hard. No regressions.
  * Bootstrap and regtest on arm-linux-gnueabihf in progress.

OK for trunk and backports as appropriate if bootstrap looks good?

Thanks,
Alex

gcc/ChangeLog:

PR target/100333
* config/arm/arm.md (nonsecure_call_internal): Always ensure
callee's address is in a register.

gcc/testsuite/ChangeLog:

PR target/100333
* gcc.target/arm/cmse/pr100333.c: New test.




-  "
   {
-if (!TARGET_HAVE_FPCXT_CMSE)
-  {
-   rtx tmp =
- copy_to_suggested_reg (XEXP (operands[0], 0),
-gen_rtx_REG (SImode, R4_REGNUM),
-SImode);
+rtx tmp = NULL_RTX;
+rtx addr = XEXP (operands[0], 0);

-   operands[0] = replace_equiv_address (operands[0], tmp);
-  }
-  }")
+if (TARGET_HAVE_FPCXT_CMSE && !REG_P (addr))
+  tmp = force_reg (SImode, addr);
+else if (!TARGET_HAVE_FPCXT_CMSE)
+  tmp = copy_to_suggested_reg (XEXP (operands[0], 0),
+  gen_rtx_REG (SImode, R4_REGNUM),
+  SImode);


I think it might be better to handle the !TARGET_HAVE_FPCXT_CMSE case 
via a pseudo as well, then we don't end up generating a potentially 
non-trivial insn that directly writes a fixed hard reg - it's better to 
let later passes clean that up if they can.


Also, you've extracted XEXP (operands[0], 0) into 'addr', but then
continue to use the XEXP form in the existing path.  Please be
consistent: either use XEXP directly everywhere, or use 'addr' everywhere.


So you want something like

  rtx addr = XEXP (operands[0], 0);
  if (!REG_P (addr))
    addr = force_reg (SImode, addr);

  if (!TARGET_HAVE_FPCXT_CMSE)
    addr = copy_to_suggested_reg (addr, gen_rtx_REG (SImode, R4_REGNUM),
                                  SImode);

  operands[0] = replace_equiv_address (operands[0], addr);

R.



Re: [PATCH] libstdc++: Fix wrong thread waking on notify [PR100334]

2021-05-17 Thread Jonathan Wakely via Gcc-patches

On 14/05/21 18:09 +0100, Jonathan Wakely wrote:

On 13/05/21 18:54 -0700, Thomas Rodgers wrote:

From: Thomas Rodgers 

Please ignore the previous patch. This one removes the need to carry any
extra state in the case of a 'laundered' atomic wait.

libstdc++/ChangeLog:
* include/bits/atomic_wait.h (__waiter::_M_do_wait_v): loop
until value change observed.
(__waiter_base::_M_laundered): New member function.
(__waiter_base::_M_notify): Check _M_laundered() to determine
whether to wake one or all.
(__detail::__atomic_compare): Return true if call to
__builtin_memcmp() == 0.
(__waiter_base::_S_do_spin_v): Adjust predicate.
* testsuite/29_atomics/atomic/wait_notify/100334.cc: New
test.
---
libstdc++-v3/include/bits/atomic_wait.h   | 28 --
.../29_atomics/atomic/wait_notify/100334.cc   | 94 +++
2 files changed, 114 insertions(+), 8 deletions(-)
create mode 100644 
libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc

diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h
index 984ed70f16c..07bb744d822 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -181,11 +181,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return false;
 }

+// return true if equal
   template
 bool __atomic_compare(const _Tp& __a, const _Tp& __b)
 {
// TODO make this do the correct padding bit ignoring comparison
-   return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) != 0;
+   return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
 }

   struct __waiter_pool_base
@@ -300,14 +301,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  explicit __waiter_base(const _Up* __addr) noexcept
: _M_w(_S_for(__addr))
, _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
- {
- }
+ { }
+
+   bool
+   _M_laundered() const
+   { return _M_addr == &_M_w._M_ver; }

void
_M_notify(bool __all, bool __bare = false)
{
- if (_M_addr == &_M_w._M_ver)
-   __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);
+ if (_M_laundered())
+   {
+ __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);


Please mention this increment in the changelog.


Ugh, sorry, I seem to have forgotten how to read a diff.


OK for trunk and gcc-11 with that change, thanks.


OK to push, no changes needed.
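The __atomic_compare change flips the helper so that it answers "are these equal?", as the new comment says; the old body returned true when the bytes differed. A minimal sketch of the corrected predicate, using the hypothetical name bitwise_equal and std::memcmp in place of __builtin_memcmp; like the original, it ignores the padding-bit caveat noted in the TODO.

```cpp
#include <cassert>
#include <cstring>

// Returns true when a and b have identical object representations.
// Padding bytes are compared too, so types with padding may compare
// unequal even when all of their members are equal.
template<typename T>
bool bitwise_equal(const T& a, const T& b) {
  return std::memcmp(&a, &b, sizeof(T)) == 0;
}
```

A waiter that loops "until value change observed" would keep waiting while bitwise_equal(current, old) holds.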




Re: [PATCH] libstdc++: Fix up semiregular-box partial specialization [PR100475]

2021-05-17 Thread Jonathan Wakely via Gcc-patches

On 17/05/21 11:43 -0400, Patrick Palka via Libstdc++ wrote:

This makes the in-place constructor of our partial specialization of
__box for already-semiregular types use direct-non-list-initialization
(in accordance with the specification of the primary template), and
additionally makes its data() member function use std::__addressof.
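Both changes can be observed from user code. In the sketch below, make_direct, make_list and Evil are hypothetical names, and std::addressof is the public counterpart of libstdc++'s internal std::__addressof. Parenthesized and braced construction select different std::vector constructors, and a type with an overloaded unary operator& shows why a plain &obj is unreliable in generic code.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Direct-non-list-initialization: T(args...).
template<typename T, typename... Args>
T make_direct(Args&&... args) { return T(std::forward<Args>(args)...); }

// Direct-list-initialization: T{args...}; may prefer an
// initializer_list constructor.
template<typename T, typename... Args>
T make_list(Args&&... args) { return T{std::forward<Args>(args)...}; }

// Hypothetical type whose operator& does not return its address.
struct Evil {
  int value = 7;
  Evil* operator&() const { return nullptr; }  // deliberately misleading
};
```

For std::vector<int>, (3, 1) means three elements of value 1, while {3, 1} means the two elements 3 and 1.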

Tested on x86_64-pc-linux-gnu, does this look OK for 10/11/trunk?


Yes for all, thanks.




Re: [PATCH][nvptx] Handle memmodel for atomic ops

2021-05-17 Thread Tobias Burnus

On 17.05.21 17:49, Tom de Vries wrote:

[ Tobias, can you test this on volta ? ]


Unfortunately, it does not seem to help. On a non-Volta system, it still
works (run time 0.3s) but on a Volta system it fails after 1.5s (abort).

Looking (with an editor) at nvptx-none/lib/mgomp/libgomp.a, I still see
  @ %r25 atom.global.exch.b32 %r22,[atomic_lock],1;
with no prior membar in GOMP_atomic_start. Likewise with
nvptx-none/lib/libgomp.a which has
  atom.global.exch.b32 %r22,[atomic_lock],1;
I thought a barrier would show up there?


The atomic ops in nvptx.md have memmodel arguments, which are currently
ignored.
Handle these, fixing test-case fails libgomp.c-c++-common/reduction-{5,6}.c
on volta.


Is there a reason that PR target/96932 isn't listed in the
ChangeLog? Or is it supposed that the barrier does not show up
at GOMP_atomic_start (as it doesn't) and it should show up elsewhere
and still help with those two testcases?

Sorry for not having better news. (Unless I messed up and it is an
issue on my side - but it doesn't look like.)

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf


Re: [PATCH] libstdc++: Fix condition for memoizing reverse_view::begin() [PR100621]

2021-05-17 Thread Jonathan Wakely via Gcc-patches

On 17/05/21 12:17 -0400, Patrick Palka via Libstdc++ wrote:

A range being a random access range is not a sufficient condition for
ranges::next(iter, sent) to have constant time complexity; the range
must also have a sized sentinel.  This adjusts the memoization condition
for reverse_view accordingly.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  Doesn't
seem to be worth backporting.


OK for trunk. I agree the backports probably aren't needed, but if
it causes anybody problems we can do it later.








Re: [PATCH] libstdc++: Fix access issues in elements_view::_Sentinel [PR100631]

2021-05-17 Thread Jonathan Wakely via Gcc-patches

On 17/05/21 12:17 -0400, Patrick Palka via Libstdc++ wrote:

Tested on x86_64-pc-linux-gnu, does this look OK for 10/11/trunk?


OK, thanks.






Re: [PATCH] libstdc++: Fix iterator caching inside range adaptors [PR100479]

2021-05-17 Thread Jonathan Wakely via Gcc-patches

On 17/05/21 11:43 -0400, Patrick Palka via Libstdc++ wrote:

This fixes two issues with our iterator caching as described in detail
in the PR.  Since r12-336 added the __non_propagating_cache class
template as part of P2328, this patch just rewrites the _CachedPosition
partial specialization in terms of this class template.
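P2328's non-propagating cache is a wrapper whose cached value is never carried along by copies or moves, so that, for example, a copy of a view does not inherit an iterator cached for the original. The class below is a hypothetical, simplified sketch of those semantics built on std::optional; it is not libstdc++'s __non_propagating_cache.

```cpp
#include <cassert>
#include <optional>
#include <utility>

template<typename T>
class NonPropagatingCache {
  std::optional<T> value_;
public:
  NonPropagatingCache() = default;

  // Copying yields an empty cache instead of a copy of the cached value.
  NonPropagatingCache(const NonPropagatingCache&) noexcept {}

  // Moving leaves both the new object and the source empty.
  NonPropagatingCache(NonPropagatingCache&& other) noexcept {
    other.value_.reset();
  }

  // Copy assignment discards whatever this object had cached.
  NonPropagatingCache& operator=(const NonPropagatingCache& other) noexcept {
    if (this != &other)
      value_.reset();
    return *this;
  }

  // Move assignment empties both sides.
  NonPropagatingCache& operator=(NonPropagatingCache&& other) noexcept {
    value_.reset();
    other.value_.reset();
    return *this;
  }

  bool has_value() const noexcept { return value_.has_value(); }
  const T& operator*() const { return *value_; }

  // Store a freshly computed value, e.g. the result of begin().
  template<typename... Args>
  T& emplace(Args&&... args) { return value_.emplace(std::forward<Args>(args)...); }
};
```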

Tested on x86_64-pc-linux-gnu, does this look OK for trunk? 


OK, thanks.


Shall we also backport this?


I think so, but give it a couple of weeks (or more) on trunk first.




Re: [PATCH][nvptx] Handle memmodel for atomic ops

2021-05-17 Thread Tom de Vries
On 5/17/21 6:47 PM, Tobias Burnus wrote:
> On 17.05.21 17:49, Tom de Vries wrote:
>> [ Tobias, can you test this on volta ? ]
> 
> Unfortunately, it does not seem to help. On a non-Volta system, it still
> works (run time 0.3s) but on a Volta system it fails after 1.5s (abort).
> 
> Looking (with an editor) at nvptx-none/lib/mgomp/libgomp.a, I still see
>  @ %r25 atom.global.exch.b32 %r22,[atomic_lock],1;
> with no prior membar in GOMP_atomic_start.

I have:
...
@ %r25 atom.global.exch.b32 %r22,[atomic_lock],1;
@ %r25 membar.sys;
...

> Likewise with
> nvptx-none/lib/libgomp.a which has
>  atom.global.exch.b32 %r22,[atomic_lock],1;
> I thought a barrier would show up there?
> 

and:
...
atom.global.exch.b32 %r22,[atomic_lock],1;
membar.sys;
...

So both look as expected.

>> The atomic ops in nvptx.md have memmodel arguments, which are currently
>> ignored.
>> Handle these, fixing test-case fails
>> libgomp.c-c++-common/reduction-{5,6}.c
>> on volta.
> 
> Is there a reason that PR target/96932 isn't listed in the
> ChangeLog?

Just that I'm going to mark it a duplicate when this is fixed.

> Or is it supposed that the barrier does not show up
> at GOMP_atomic_start (as it doesn't) and it should show up elsewhere
> and still help with those two testcases?
> 

Nope, it should show up.

> Sorry for not having better news. (Unless I messed up and it is an
> issue on my side - but it doesn't look like.)

Well yes, it's possible that the patch somehow does not work, but then
you'll need to investigate why that is.

Thanks,
- Tom



[committed] libstdc++: Fix std::jthread assertion and re-enable skipped test

2021-05-17 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* include/std/thread (jthread::_S_create): Fix static assert
message.
* testsuite/30_threads/jthread/95989.cc: Re-enable test.
* testsuite/30_threads/jthread/jthread.cc: Do not require
pthread effective target.
* testsuite/30_threads/jthread/2.cc: Moved to...
* testsuite/30_threads/jthread/version.cc: ...here.

Tested powerpc64le-linux. Committed to trunk.

Let's see if this test is actually fixed, or if it still causes
failures on some targets.


commit 60a156ae53e976dfe44689f7c89e607596e7cf67
Author: Jonathan Wakely 
Date:   Mon May 17 14:55:22 2021

libstdc++: Fix std::jthread assertion and re-enable skipped test

libstdc++-v3/ChangeLog:

* include/std/thread (jthread::_S_create): Fix static assert
message.
* testsuite/30_threads/jthread/95989.cc: Re-enable test.
* testsuite/30_threads/jthread/jthread.cc: Do not require
pthread effective target.
* testsuite/30_threads/jthread/2.cc: Moved to...
* testsuite/30_threads/jthread/version.cc: ...here.

diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index 886994c1320..f51392ab42c 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -219,7 +219,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  {
static_assert(is_invocable_v,
 decay_t<_Args>...>,
- "std::thread arguments must be invocable after"
+ "std::jthread arguments must be invocable after"
  " conversion to rvalues");
return thread{std::forward<_Callable>(__f),
  std::forward<_Args>(__args)...};
diff --git a/libstdc++-v3/testsuite/30_threads/jthread/95989.cc b/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
index 53f90827f2e..fb3f43bc722 100644
--- a/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
+++ b/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
@@ -20,7 +20,6 @@
 // { dg-require-gthreads {} }
 // { dg-additional-options "-pthread" { target pthread } }
 // { dg-additional-options "-static" { target static } }
-// { dg-skip-if "broken" { *-*-* } }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc b/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc
index 6adc4981175..799787088ac 100644
--- a/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc
+++ b/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc
@@ -16,9 +16,9 @@
 // .
 
 // { dg-options "-std=gnu++2a -pthread" }
-// { dg-add-options libatomic }
 // { dg-do run { target c++2a } }
-// { dg-require-effective-target pthread }
+// { dg-add-options libatomic }
+// { dg-additional-options "-pthread" { target pthread } }
 // { dg-require-gthreads "" }
 
 #include 
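For readers following the jthread fix: the corrected message belongs to a check that the decayed callable is invocable with decayed copies of the arguments, which the new thread receives as rvalues. Below is a minimal standalone sketch of that condition. It assumes C++17, uses a hypothetical trait name `invocable_after_decay_v` (not a libstdc++ name), and ignores the `stop_token` form that `jthread::_S_create` tries first.

```cpp
#include <functional>   // std::reference_wrapper
#include <type_traits>  // std::is_invocable_v, std::decay_t

// Hypothetical trait (not part of libstdc++) mirroring the condition of the
// static_assert in jthread::_S_create: the decayed callable must be invocable
// with the decayed arguments, which the new thread passes to it as rvalues.
template<typename F, typename... Args>
inline constexpr bool invocable_after_decay_v =
  std::is_invocable_v<std::decay_t<F>, std::decay_t<Args>...>;

// Sample callables for the checks below.
void takes_lvalue_ref(int&) { }
void takes_const_ref(const int&) { }
void takes_value(int) { }

// An int lvalue argument decays to int and arrives as an rvalue, so it can
// no longer bind to int&; wrapping it in std::reference_wrapper restores that.
static_assert(!invocable_after_decay_v<decltype(takes_lvalue_ref), int&>);
static_assert(invocable_after_decay_v<decltype(takes_const_ref), int&>);
static_assert(invocable_after_decay_v<decltype(takes_value), int&>);
static_assert(invocable_after_decay_v<decltype(takes_lvalue_ref),
                                      std::reference_wrapper<int>>);
```

This is why passing a plain lvalue to a callable taking `int&` is rejected, and why `std::ref` is the usual workaround.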


Re: [PATCH] libstdc++: Fix filesystem::path constraints for volatile [PR 100630]

2021-05-17 Thread Jonathan Wakely via Gcc-patches

On 17/05/21 15:25 +0100, Jonathan Wakely wrote:

On 17/05/21 15:02 +0100, Jonathan Wakely wrote:

The constraint check for filesystem::path construction uses
decltype(__is_path_src(declval())) which means it considers
conversion from an rvalue.  When Source is a volatile-qualified type
it cannot use is_path_src(const Unknown&) because a const lvalue
reference can only bind to a non-volatile rvalue.

Since the relevant path members all have a const Source& parameter,
the constraint should be defined in terms of declval(),
not declval(). This avoids the problem of volatile-qualified
rvalues, because we no longer use an rvalue at all.

libstdc++-v3/ChangeLog:

PR libstdc++/100630
* include/bits/fs_path.h (__is_constructible_from): Test
construction from a const lvalue, not an rvalue.
* include/experimental/bits/fs_path.h (__is_constructible_from):
Likewise.
* testsuite/27_io/filesystem/path/construct/100630.cc: New test.
* testsuite/experimental/filesystem/path/construct/100630.cc:
New test.

Tested x86_64-linux, pushed to gcc-10 (this isn't needed for gcc-11 or
trunk, but I also plan to backport it to gcc-9).


Oh actually this is needed for experimental::filesystem::path on trunk
and gcc-11 (as I found when I added the new tests to trunk) so
I'll fix it there too.


Here's the patch for trunk and gcc-11.

commit 45aa7a447652e8541cc381d7ab128544f81ed857
Author: Jonathan Wakely 
Date:   Mon May 17 11:54:06 2021

libstdc++: Fix filesystem::path constraints for volatile [PR 100630]

The constraint check for filesystem::path construction uses
decltype(__is_path_src(declval())) which means it considers
conversion from an rvalue.  When Source is a volatile-qualified type
it cannot use is_path_src(const Unknown&) because a const lvalue
reference can only bind to a non-volatile rvalue.

Since the relevant path members all have a const Source& parameter,
the constraint should be defined in terms of declval(),
not declval(). This avoids the problem of volatile-qualified
rvalues, because we no longer use an rvalue at all.

libstdc++-v3/ChangeLog:

PR libstdc++/100630
* include/experimental/bits/fs_path.h (__is_constructible_from):
Test construction from a const lvalue, not an rvalue.
* testsuite/27_io/filesystem/path/construct/100630.cc: New test.
* testsuite/experimental/filesystem/path/construct/100630.cc:
New test.

diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h b/libstdc++-v3/include/experimental/bits/fs_path.h
index 2df2bba3dcd..1ecf2f3a7bd 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -124,7 +124,7 @@ namespace __detail
 
   template
 struct __constructible_from<_Source, void>
-: decltype(__is_path_src(std::declval<_Source>(), 0))
+: decltype(__is_path_src(std::declval(), 0))
 { };
 
   template
+
+void f(bool) { }
+void f(const std::filesystem::path&) { }
+
+void
+test_100630()
+{
+  volatile bool b = true;
+  f(b);
+}
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/path/construct/100630.cc b/libstdc++-v3/testsuite/experimental/filesystem/path/construct/100630.cc
new file mode 100644
index 000..b2428ff74cf
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/path/construct/100630.cc
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++11 } }
+// { dg-require-filesystem-ts "" }
+
+#include 
+
+void f(bool) { }
+void f(const std::experimental::filesystem::path&) { }
+
+void
+test_100630()
+{
+  volatile bool b = true;
+  f(b);
+}
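The reference-binding rule behind PR 100630 can be reproduced outside libstdc++. The sketch below assumes C++17 and uses a hypothetical `catch_all` function in place of the `__is_path_src(const Unknown&)` overload, plus a hypothetical `can_call_catch_all` trait. It shows that an rvalue of type `volatile bool` cannot bind to the deduced `const T&` parameter, while a const lvalue of the same type can. That is why testing construction from a const lvalue avoids the problem.

```cpp
#include <type_traits>  // std::false_type, std::true_type, std::void_t
#include <utility>      // std::declval

// Hypothetical stand-in for libstdc++'s catch-all overload taking a
// const lvalue reference. It is only named in unevaluated contexts,
// so it needs no definition.
template<typename T>
std::false_type catch_all(const T&);

// Detects whether catch_all(std::declval<Arg>()) is a valid call.
template<typename Arg, typename = void>
struct can_call_catch_all : std::false_type { };

template<typename Arg>
struct can_call_catch_all<Arg,
    std::void_t<decltype(catch_all(std::declval<Arg>()))>>
  : std::true_type { };

// Old style of constraint: the argument is an rvalue of type Source.
template<typename Source>
inline constexpr bool accepts_rvalue = can_call_catch_all<Source>::value;

// Fixed style of constraint: the argument is a const lvalue of type Source.
template<typename Source>
inline constexpr bool accepts_const_lvalue =
  can_call_catch_all<const Source&>::value;

// const volatile bool& cannot bind to an rvalue, but binds to an lvalue.
static_assert(!accepts_rvalue<volatile bool>);
static_assert(accepts_const_lvalue<volatile bool>);
```

Only the catch-all overload is modelled here; the real overload set has further members that this sketch does not reproduce.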


Re: [PATCH] PR libstdc++/89728 diagnose some misuses of [locale.convenience] functions

2021-05-17 Thread Jonathan Wakely via Gcc-patches

On 12/05/21 17:16 +0100, Jonathan Wakely wrote:

On 12/05/21 18:51 +0300, Antony Polukhin via Libstdc++ wrote:

On Wed, 12 May 2021 at 18:38, Antony Polukhin wrote:


On Wed, 12 May 2021 at 17:44, Jonathan Wakely wrote:


On 12/05/21 12:58 +0300, Antony Polukhin wrote:
>On Wed, 12 May 2021 at 12:18, Jonathan Wakely wrote:
><...>
>> Or just leave it undefined, as libc++ seems to do according to your
>> comment in PR 89728:
>>
>> error: implicit instantiation of undefined template 'std::__1::ctype >'
>>
>> Was your aim to have a static_assert that gives a more descriptive
>> error? We could leave it undefined in C++98 and have the static assert
>> for C++11 and up.
>
>Leaving it undefined would be the best. It would allow SFINAE on ctype
>and a compile time error is informative enough.
>
>However, there may be users who instantiate ctype in a
>shared library without ctype template specializations in
>the main executable. Making the default ctype undefined would break
>their compilation:
>
>#include 
>// no ctype specialization
>c = std::tolower(TheirChar{42}, locale_from_shared_library()); // OK right now in libstdc++, fails on libc++

What I meant was leaving the partial specialization undefined, not the
primary template, i.e.

--- a/libstdc++-v3/include/bits/locale_facets.h
+++ b/libstdc++-v3/include/bits/locale_facets.h
@@ -1476,6 +1476,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  };
  #endif //_GLIBCXX_USE_WCHAR_T

+  template
+class ctype >;
+
/// class ctype_byname [22.2.1.2].
template
  class ctype_byname : public ctype<_CharT>

This makes your test fail with errors like this:

In file included from /home/jwakely/gcc/12/include/c++/12.0.0/locale:40,
  from loc.C:1:
/home/jwakely/gcc/12/include/c++/12.0.0/bits/locale_facets.h: In instantiation of 'bool std::isspace(_CharT, const std::locale&) [with _CharT = std::__cxx11::basic_string]':
loc.C:16:15:   required from here
/home/jwakely/gcc/12/include/c++/12.0.0/bits/locale_facets.h:2600:47: error: invalid use of incomplete type 'const class std::ctype >'
  2600 | { return use_facet >(__loc).is(ctype_base::space, __c); }
   |  ~^~

But it shouldn't affect the uses of ctype.

What do you think?


Good idea. That way the compiler message points directly to the
misused function.

Patch is in attachment


Replaced {} with () in test to be C++98 compatible


Looks great, thanks.

I'll test and commit this tomorrow.


Not quite "tomorrow", but it's pushed to trunk now. Thanks again!
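The approach adopted in this thread keeps the primary template defined but only declares the partial specialization for basic_string. It can be sketched with a hypothetical `my_ctype` template. The exact template parameter list used in libstdc++ is not visible in this archive, so `basic_string<CharT, Traits, Alloc>` is an assumption. Uses with ordinary character types still compile. Anything that needs the string specialization to be complete fails with an incomplete-type error naming it. The hypothetical `is_complete` helper is only there to make that observable with static_assert (C++17 assumed).

```cpp
#include <string>       // std::basic_string, std::string
#include <type_traits>  // std::false_type, std::true_type, std::void_t

// Hypothetical stand-in for std::ctype: the primary template is defined,
// so uses with ordinary character types keep working.
template<typename CharT>
class my_ctype
{
public:
  constexpr bool is_space(CharT c) const { return c == CharT(' '); }
};

// Declared but never defined, mirroring the proposal for ctype<basic_string>:
// any use that needs a complete type fails to compile and names this type.
template<typename CharT, typename Traits, typename Alloc>
class my_ctype<std::basic_string<CharT, Traits, Alloc>>;

// Detects whether T is complete (illustration only: the answer is fixed
// the first time is_complete<T> is instantiated).
template<typename T, typename = void>
struct is_complete : std::false_type { };

template<typename T>
struct is_complete<T, std::void_t<decltype(sizeof(T))>> : std::true_type { };

static_assert(is_complete<my_ctype<char>>::value);
static_assert(!is_complete<my_ctype<std::string>>::value);
```

Because only the string specialization is left incomplete, the primary template stays usable for other character types, which was the concern raised earlier in the thread about making the whole template undefined.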



