RE: [PATCH v2] aarch64: Add cpu cost tables for A64FX
Hi Richard,

> >> > ChangeLog:
> >> > 2021-01-08  Qian jianhua
> >> >
> >> > gcc/
> >> > * config/aarch64/aarch64-cost-tables.h (a64fx_extra_costs): New.
> >> > * config/aarch64/aarch64.c (a64fx_addrcost_table): New.
> >> > (a64fx_regmove_cost, a64fx_vector_cost): New.
> >> > (a64fx_tunings): Use the new added cost tables.
> >>
> >> OK for trunk, thanks.  The v1 patch is OK for branches that support
> >> -mcpu=a64fx.
> >>
> >> Would you like commit access, so that you can commit it yourself?
> >> If so, please fill out the form mentioned at the beginning of
> >> https://gcc.gnu.org/gitwrite.html listing me as sponsor.
> >>
> > It's my pleasure. I've applied it.
>
> Great!
> > Thank you so much.
> >
> > I don't quite know the process of gcc source committing.
> > If I have the commit access, how will the process be different?
>
> The patch submission process is pretty much the same: patches need to be
> sent to the list and most patches need to be approved by a reviewer or
> maintainer.  The main differences are:
>
> - If a patch is “obviously correct”, you can apply it without going
>   through the approval process.  (Please still send the patch to the
>   list though.)
>
> - Once a patch has been approved, you can commit the patch yourself,
>   rather than rely on someone else to do it for you.  The main benefits
>   of this are:
>
>   - You can commit from the tree that you actually tested.
>
>   - You can deal with any merge conflicts caused by other people's
>     patches without having to go through another review cycle.  (Most
>     merge conflict resolutions are “obvious” and so don't need approval.)
>
> - A typical workflow is to test a patch on trunk, post it for review,
>   and ask for approval to apply the patch to both trunk and whichever
>   branches are appropriate.  If the patch is approved, you can later
>   test the patch on the approved branches (at your own pace) and
>   apply it if the tests pass.
>
> In terms of the mechanics of committing, just “git push” should work.
> The server hooks will check for things like a well-formed changelog.

Thanks for the detailed information. I understood.

> https://gcc.gnu.org/gitwrite.html has more info about the process in general.
> Quoting from that page, the next step is:
>
>   Check out a tree using the instructions below and add yourself to the
>   MAINTAINERS file.  Note: Your first and last names must be exactly the
>   same between your account on gcc.gnu.org and the MAINTAINERS file.
>   Place your name in the correct section following the conventions
>   specified in the file (e.g. "Write After Approval" is "last name
>   alphabetical order").

I've already added myself to the MAINTAINERS file.

>   Then produce a diff to that file and circulate it to the gcc-patches
>   list, whilst also checking in your change to test write access
>   (approval from the mailing list is not needed in this one case).  For
>   all other changes, please be sure to follow the write access policies
>   below.
>
> > And which branch, which range (aarch64?) could I commit patches to?
>
> This patch should go to master.  The v1 patch should go to
> releases/gcc-10 and releases/gcc-9.
>
> You might need to remove some lines from the cost tables when backporting to
> gcc-10 and gcc-9 (I haven't checked).  If so, that kind of change counts as
> “obviously correct” and so doesn't need approval.

I will commit this patch to master tomorrow.  Then make some changes, and
push to gcc-10 and gcc-9.

> Hope this helps.  Please let me know if you have any questions.

Thanks again.

Regards,
Qian

> Thanks,
> Richard
Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
On Mon, 11 Jan 2021, Jeff Law wrote:

On 1/9/21 5:43 PM, Maciej W. Rozycki wrote:

On Mon, 21 Dec 2020, Jakub Jelinek wrote:

This patch adds the ~(X - Y) -> ~X + Y simplification requested in the PR
(plus also ~(X + C) -> ~X + (-C) for constants C that can be safely negated).

This regresses VAX code produced by the cmpelim-eq-notsi.c test case (and its
similar counterparts) with the `vax-netbsdelf' target.

The point of the match.pd changes is to canonicalize GIMPLE on some form when
there are several from GIMPLE POV equivalent or better forms of writing the
same thing.  The advantage of having one canonical way is that ICF, SCCVN
etc. optimizations can then understand the different forms are equivalent.

Fair enough, though in cases like this I think it is unclear which of the two
forms is going to be ultimately better, especially as it may depend on the
exact form of the operands used, e.g. values of any immediates, so I think a
way to make the reverse transformation (whether to undo one made here or
genuinely) needs to be available at a later compilation stage.  One size
doesn't fit all.  With this in mind...

So in this case the number of operations is the same before/after and
parallelism is the same before/after, register lifetimes, etc.  I doubt
either form is particularly better suited for CSE or gives better VRP data,
etc.  The fact that we can't always do ~(X + C) -> ~X + -C probably argues
against that form ever so slightly.

If another form is then better for a particular machine, it should be done
either during expansion (being able to produce both RTLs and computing their
costs), or during combine with either combine splitters or
define_insn_and_split in the backend, or, if it can't be done in RTL, during
the isel pass.

Hmm, maybe it has been discussed before, so please forgive me if I write
something silly, but it seems to me like this should be done in a generic way
like match.pd so that all the targets do not have to track the changes made
there and then perhaps repeat the same or similar code each.  So I think it
would make sense to make a change like this include that reverse
transformation as well, so that ultimately both forms are tried with RTL, as
there is no clear advantage to either here.

The idea we've kicked around in the past was to use the same syntax as
match.pd, but have it be target dependent to reform expressions in ways that
are beneficial to the target and have it run at the end of the gimple/ssa
pipeline.  Nobody's implemented this though.

Yes.  And while a gimple-to-gimple transform is indeed quite simple to make,
eventually a match.pd-like gimple-to-RTL would be more useful in the end.
Note RTL can eventually be emulated close enough via the use of internal
functions mapping to optabs.  But still complex combined instructions will
need expander help unless we expose all named instruction RTL patterns as
target specific internal functions to use from that .pd file.

Richard.

jeff
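To make the transformation concrete, here is a minimal C sketch of the two
forms being compared in this thread (the function names are made up for
illustration; this is not code from the patch):

/* For two's-complement values, ~v == -v - 1, so
     ~(x - y) == -(x - y) - 1 == (-x - 1) + y == ~x + y
   and, for a constant C that can be safely negated,
     ~(x + C) == ~x + (-C).  */
unsigned int
not_of_difference (unsigned int x, unsigned int y)
{
  return ~(x - y);	/* what the user writes */
}

unsigned int
canonical_form (unsigned int x, unsigned int y)
{
  return ~x + y;	/* the canonical GIMPLE form after the match.pd change */
}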
[PATCH] reassoc: Optimize in reassoc x < 0 && y < 0 to (x | y) < 0 etc. [PR95731]
Hi! We already had x != 0 && y != 0 to (x | y) != 0 and x != -1 && y != -1 to (x & y) != -1 and x < 32U && y < 32U to (x | y) < 32U, this patch adds signed x < 0 && y < 0 to (x | y) < 0. In that case, the low/high seem to be always the same and just in_p indices whether it is >= 0 or < 0, also, all types in the same bucket (same precision) should be type compatible, but we can have some >= 0 and some < 0 comparison mixed, so the patch handles that by using the right BIT_IOR_EXPR or BIT_AND_EXPR and doing one set of < 0 or >= 0 first, then BIT_NOT_EXPR and then the other one. I had to move optimize_range_tests_var_bound before this optimization because that one deals with signed a >= 0 && a < b, and limited it to the last reassoc pass as reassoc itself can't virtually undo this optimization yet (and not sure if vrp would be able to). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-01-12 Jakub Jelinek PR tree-optimization/95731 * tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Also optimize x < 0 && y < 0 && z < 0 into (x | y | z) < 0 for signed x, y, z. (optimize_range_tests): Call optimize_range_tests_cmp_bitwise only after optimize_range_tests_var_bound. * gcc.dg/tree-ssa/pr95731.c: New test. * gcc.c-torture/execute/pr95731.c: New test. --- gcc/tree-ssa-reassoc.c.jj 2021-01-11 10:35:02.204501369 +0100 +++ gcc/tree-ssa-reassoc.c 2021-01-11 15:20:17.578155827 +0100 @@ -3320,7 +3320,8 @@ optimize_range_tests_to_bit_test (enum t /* Optimize x != 0 && y != 0 && z != 0 into (x | y | z) != 0 and similarly x != -1 && y != -1 && y != -1 into (x & y & z) != -1. Also, handle x < C && y < C && z < C where C is power of two as - (x | y | z) < C. */ + (x | y | z) < C. And also handle signed x < 0 && y < 0 && z < 0 + as (x | y | z) < 0. */ static bool optimize_range_tests_cmp_bitwise (enum tree_code opcode, int first, int length, @@ -3340,13 +3341,13 @@ optimize_range_tests_cmp_bitwise (enum t if (ranges[i].exp == NULL_TREE || TREE_CODE (ranges[i].exp) != SSA_NAME - || !ranges[i].in_p || TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) <= 1 || TREE_CODE (TREE_TYPE (ranges[i].exp)) == BOOLEAN_TYPE) continue; if (ranges[i].low != NULL_TREE && ranges[i].high != NULL_TREE + && ranges[i].in_p && tree_int_cst_equal (ranges[i].low, ranges[i].high)) { idx = !integer_zerop (ranges[i].low); @@ -3354,7 +3355,8 @@ optimize_range_tests_cmp_bitwise (enum t continue; } else if (ranges[i].high != NULL_TREE - && TREE_CODE (ranges[i].high) == INTEGER_CST) + && TREE_CODE (ranges[i].high) == INTEGER_CST + && ranges[i].in_p) { wide_int w = wi::to_wide (ranges[i].high); int prec = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)); @@ -3370,10 +3372,20 @@ optimize_range_tests_cmp_bitwise (enum t && integer_zerop (ranges[i].low continue; } + else if (ranges[i].high == NULL_TREE + && ranges[i].low != NULL_TREE + /* Perform this optimization only in the last + reassoc pass, as it interferes with the reassociation + itself or could also with VRP etc. which might not + be able to virtually undo the optimization. 
*/ + && !reassoc_insert_powi_p + && !TYPE_UNSIGNED (TREE_TYPE (ranges[i].exp)) + && integer_zerop (ranges[i].low)) + idx = 3; else continue; - b = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) * 3 + idx; + b = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) * 4 + idx; if (buckets.length () <= b) buckets.safe_grow_cleared (b + 1, true); if (chains.length () <= (unsigned) i) @@ -3386,7 +3398,7 @@ optimize_range_tests_cmp_bitwise (enum t if (i && chains[i - 1]) { int j, k = i; - if ((b % 3) == 2) + if ((b % 4) == 2) { /* When ranges[X - 1].high + 1 is a power of two, we need to process the same bucket up to @@ -3439,6 +3451,19 @@ optimize_range_tests_cmp_bitwise (enum t { tree type = TREE_TYPE (ranges[j - 1].exp); strict_overflow_p |= ranges[j - 1].strict_overflow_p; + if ((b % 4) == 3) + { + /* For the signed < 0 cases, the types should be + really compatible (all signed with the same precision, + instead put ranges that have different in_p from + k first. */ + if (!useless_type_conversion_p (type1, type)) + continue; + if (ranges[j - 1].in_p != ranges[k - 1].in_p) + candidates.safe_push (&ranges[j - 1]); + type2 = type1; + continue; +
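As an aside for readers of the patch above, the intuition can be shown with a
short C sketch (illustrative only, not the pr95731.c tests added by the
patch): the sign bit of a bitwise OR is the OR of the sign bits and the sign
bit of a bitwise AND is the AND of the sign bits, so a chain of sign tests
collapses into one bitwise chain plus a single comparison, with in_p deciding
whether BIT_IOR_EXPR or BIT_AND_EXPR is the right operation.

/* Illustrative identities for signed x, y, z:
     x >= 0 && y >= 0 && z >= 0   <->  (x | y | z) >= 0
     x < 0  && y < 0  && z < 0    <->  (x & y & z) < 0
     x < 0  || y < 0  || z < 0    <->  (x | y | z) < 0  */
int
all_nonnegative (int x, int y, int z)
{
  /* Foldable into a single bitwise chain and one sign test.  */
  return x >= 0 && y >= 0 && z >= 0;
}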
[PATCH] configure, make: Fix up --enable-link-serialization
Hi! As reported by Matthias, --enable-link-serialization=1 can currently start two concurrent links first (e.g. gnat1 and cc1). The problem is that make var = value values seem to work differently between dependencies and actual rules (where it was tested). As the language make fragments can be in different order, we can have: # Part of Makefile added by configure ada.prev = ... magic that will become $(c.serial) under --enable-link-serialization=1 # ada/gcc-interface/Make-lang.in gnat1$(exe): . $(ada.prev) ... # c/Make-lang.in c.serial = cc1$(exe) and while if I add echo $(ada.prev) in the gnat1 rule's command, it prints cc1, the dependencies are actually evaluated during reading of the goal or when. The configure creates (and puts into Makefile) some serialization order of the languages and in that order c always comes first, and the rest is actually sorted the way the all_lang_makefrags are already sorted, so just by forcing c/Make-lang.in first we achieve that X.serial variable is always defined before some other Y.prev will use it in its goal dependencies. Bootstrapped/regtested on x86_64-linux and i686-linux, without --enable-link-serialization altogether, with =1 and with =3, all results look good. Ok for trunk? 2021-01-12 Jakub Jelinek * configure.ac: Ensure c/Make-lang.in comes first in @all_lang_makefrags@. * configure: Regenerated. --- gcc/configure.ac.jj 2021-01-05 13:57:59.911905006 +0100 +++ gcc/configure.ac2021-01-11 14:49:22.878218318 +0100 @@ -6975,7 +6975,12 @@ changequote([,])dnl $ok || continue all_lang_configurefrags="$all_lang_configurefrags \$(srcdir)/$gcc_subdir/config-lang.in" - all_lang_makefrags="$all_lang_makefrags \$(srcdir)/$gcc_subdir/Make-lang.in" + if test "x$language" = xc && test -n "$all_lang_makefrags"; then + # Put c/Make-lang.in fragment first to match serialization languages order. + all_lang_makefrags="\$(srcdir)/$gcc_subdir/Make-lang.in $all_lang_makefrags" + else + all_lang_makefrags="$all_lang_makefrags \$(srcdir)/$gcc_subdir/Make-lang.in" + fi if test -f $srcdir/$gcc_subdir/lang.opt; then lang_opt_files="$lang_opt_files $srcdir/$gcc_subdir/lang.opt" all_opt_files="$all_opt_files $srcdir/$gcc_subdir/lang.opt" --- gcc/configure.jj2021-01-06 22:17:07.275967932 +0100 +++ gcc/configure 2021-01-11 14:49:29.708140555 +0100 @@ -31174,7 +31174,12 @@ do $ok || continue all_lang_configurefrags="$all_lang_configurefrags \$(srcdir)/$gcc_subdir/config-lang.in" - all_lang_makefrags="$all_lang_makefrags \$(srcdir)/$gcc_subdir/Make-lang.in" + if test "x$language" = xc && test -n "$all_lang_makefrags"; then + # Put c/Make-lang.in fragment first to match serialization languages order. + all_lang_makefrags="\$(srcdir)/$gcc_subdir/Make-lang.in $all_lang_makefrags" + else + all_lang_makefrags="$all_lang_makefrags \$(srcdir)/$gcc_subdir/Make-lang.in" + fi if test -f $srcdir/$gcc_subdir/lang.opt; then lang_opt_files="$lang_opt_files $srcdir/$gcc_subdir/lang.opt" all_opt_files="$all_opt_files $srcdir/$gcc_subdir/lang.opt" Jakub
[PATCH] i386: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]
Hi! The following patch adds patterns (in the end I went with define_insn rather than combiner define_split + define_insn_and_split I initially hoped or define_insn_and_split) to represent (so far 128-bit only) permutations like { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 } where the second operand is CONST0_RTX CONST_VECTOR as pmovzx. define_split didn't work (due to the combiner not trying combine_split_insn when i1 is NULL) but in the end was quite large, and the reason for not trying to split this afterwards is the different vector mode of the output, and lowpart_subreg on the result is undesirable, so we'd need to split it into two instructions and hope some later pass optimizes the move into just rewriting the uses using lowpart_subreg. While I initially wrote also avx2 and avx512{bw,f} patterns, turns out that without further changes they aren't useful, such permutations are expanded into multiple permutation instructions and while combiner in theory could be able to undo that, what stops it is that the const_vector of 0 has multiple uses. So either we'd need to allow pre-reload const0_operand next to registers (and sometimes memory) in some permutations, or perhaps change the vec_perm_const target hook calling code to not force zero vectors into registers and change each vec_perm_const hook to force it into register itself if it can't expand such a permutation smartly. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-01-12 Jakub Jelinek PR target/95905 * config/i386/predicates.md (pmovzx_parallel): New predicate. * config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3, *sse4_1_zero_extendv4hiv4si2_3, *sse4_1_zero_extendv2siv2di2_3): New define_insn patterns. * gcc.target/i386/pr95905-1.c: New test. * gcc.target/i386/pr95905-2.c: New test. --- gcc/config/i386/predicates.md.jj2021-01-04 10:25:45.248161185 +0100 +++ gcc/config/i386/predicates.md 2021-01-11 19:40:04.793433826 +0100 @@ -1600,6 +1600,38 @@ (define_predicate "addsub_vs_parallel" return true; }) +;; Return true if OP is a parallel for an pmovz{bw,wd,dq} vec_select, +;; where one of the two operands of the vec_concat is const0_operand. +(define_predicate "pmovzx_parallel" + (and (match_code "parallel") + (match_code "const_int" "a")) +{ + int nelt = XVECLEN (op, 0); + int elt, i; + + if (nelt < 2) +return false; + + /* Check that the permutation is suitable for pmovz{bw,wd,dq}. + For example { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 }. */ + elt = INTVAL (XVECEXP (op, 0, 0)); + if (elt == 0) +{ + for (i = 1; i < nelt; ++i) + if ((i & 1) != 0) + { + if (INTVAL (XVECEXP (op, 0, i)) < nelt) + return false; + } + else if (INTVAL (XVECEXP (op, 0, i)) != i / 2) + return false; +} + else +return false; + + return true; +}) + ;; Return true if OP is a parallel for a vbroadcast permute. 
(define_predicate "avx_vbroadcast_operand" (and (match_code "parallel") --- gcc/config/i386/sse.md.jj 2021-01-07 15:29:52.604974544 +0100 +++ gcc/config/i386/sse.md 2021-01-11 21:19:06.247346965 +0100 @@ -17683,6 +17683,27 @@ (define_insn_and_split "*sse4_1_v8 (any_extend:V8HI (match_dup 1)))] "operands[1] = adjust_address_nv (operands[1], V8QImode, 0);") +(define_insn "*sse4_1_zero_extendv8qiv8hi2_3" + [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "vector_operand" "Yrm,*xm,vm") + (match_operand:V16QI 2 "const0_operand" "C,C,C")) + (match_parallel 3 "pmovzx_parallel" + [(match_operand 4 "const_int_operand" "n,n,n")])))] + "TARGET_SSE4_1" +{ + if (MEM_P (operands[1])) +return "%vpmovzxbw\t{%1, %0|%0, %1}"; + else +return "%vpmovzxbw\t{%1, %0|%0, %q1}"; +} + [(set_attr "isa" "noavx,noavx,avx") + (set_attr "type" "ssemov") + (set_attr "prefix_extra" "1") + (set_attr "prefix" "orig,orig,maybe_evex") + (set_attr "mode" "TI")]) + (define_expand "v8qiv8hi2" [(set (match_operand:V8HI 0 "register_operand") (any_extend:V8HI @@ -17929,6 +17950,27 @@ (define_expand "v4hiv4si2" } }) +(define_insn "*sse4_1_zero_extendv4hiv4si2_3" + [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "vector_operand" "Yrm,*xm,vm") + (match_operand:V8HI 2 "const0_operand" "C,C,C")) + (match_parallel 3 "pmovzx_parallel" + [(match_operand 4 "const_int_operand" "n,n,n")])))] + "TARGET_SSE4_1" +{ + if (MEM_P (operands[1])) +return "%vpmovzxwd\t{%1, %0|%0, %1}"; + else +return "%vpmovzxwd\t{%1, %0|%0, %q1}"; +} + [(set_attr "isa" "noavx,noavx,avx") + (set_attr "type" "ssemov") + (set_attr "prefix_extra" "1") + (set_attr "prefix" "orig,orig,maybe_evex") + (set_attr "mode" "TI
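For reference, the kind of source code these patterns are aimed at looks
roughly like the following (an assumed illustration, not a copy of the
pr95905-*.c tests): interleaving the low bytes of a vector with zeros is
exactly a zero extension of those bytes, so with -O2 -msse4.1 this should
become a single pmovzxbw instead of pxor + punpcklbw.

#include <emmintrin.h>

/* _mm_unpacklo_epi8 (x, 0) interleaves the low 8 bytes of x with zero
   bytes, i.e. it zero-extends those bytes into 16-bit lanes, which is the
   { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 } permutation with a zero
   second operand that the new patterns recognize.  */
__m128i
widen_lo_u8 (__m128i x)
{
  return _mm_unpacklo_epi8 (x, _mm_setzero_si128 ());
}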
[PATCH] widening_mul: Fix up ICE caused by my signed multiplication overflow pattern recognition changes [PR98629]
Hi! As the testcase shows, my latest changes caused ICE on that testcase. The problem is that arith_overflow_check_p now can change the use_stmt argument (has a reference), so that if it succeeds (returns non-zero), it points it to the GIMPLE_COND or EQ/NE or COND_EXPR assignment from the TRUNC_DIV_EXPR assignment. The problem was that it would change use_stmt also if it returned 0 in some cases, such as multiple imm uses of the division, and in one of the callers if arith_overflow_check_p returns 0 it looks at use_stmt again and performs other checks, which of course assumes that use_stmt is the one passed to arith_overflow_check_p and not e.g. NULL instead or some other unrelated stmt. The following patch fixes that by only changing use_stmt when we are about to return non-zero (for the MULT_EXPR case, which is the only one with the need to use different use_stmt). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-01-12 Jakub Jelinek PR tree-optimization/98629 * tree-ssa-math-opts.c (arith_overflow_check_p): Don't update use_stmt unless returning non-zero. * gcc.c-torture/compile/pr98629.c: New test. --- gcc/tree-ssa-math-opts.c.jj 2021-01-11 10:35:02.196501461 +0100 +++ gcc/tree-ssa-math-opts.c2021-01-11 23:56:24.154380944 +0100 @@ -3667,6 +3667,7 @@ arith_overflow_check_p (gimple *stmt, gi tree rhs1 = gimple_assign_rhs1 (stmt); tree rhs2 = gimple_assign_rhs2 (stmt); tree multop = NULL_TREE, divlhs = NULL_TREE; + gimple *cur_use_stmt = use_stmt; if (code == MULT_EXPR) { @@ -3697,26 +3698,26 @@ arith_overflow_check_p (gimple *stmt, gi if (!divlhs) return 0; use_operand_p use; - if (!single_imm_use (divlhs, &use, &use_stmt)) + if (!single_imm_use (divlhs, &use, &cur_use_stmt)) return 0; } - if (gimple_code (use_stmt) == GIMPLE_COND) + if (gimple_code (cur_use_stmt) == GIMPLE_COND) { - ccode = gimple_cond_code (use_stmt); - crhs1 = gimple_cond_lhs (use_stmt); - crhs2 = gimple_cond_rhs (use_stmt); + ccode = gimple_cond_code (cur_use_stmt); + crhs1 = gimple_cond_lhs (cur_use_stmt); + crhs2 = gimple_cond_rhs (cur_use_stmt); } - else if (is_gimple_assign (use_stmt)) + else if (is_gimple_assign (cur_use_stmt)) { - if (gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS) + if (gimple_assign_rhs_class (cur_use_stmt) == GIMPLE_BINARY_RHS) { - ccode = gimple_assign_rhs_code (use_stmt); - crhs1 = gimple_assign_rhs1 (use_stmt); - crhs2 = gimple_assign_rhs2 (use_stmt); + ccode = gimple_assign_rhs_code (cur_use_stmt); + crhs1 = gimple_assign_rhs1 (cur_use_stmt); + crhs2 = gimple_assign_rhs2 (cur_use_stmt); } - else if (gimple_assign_rhs_code (use_stmt) == COND_EXPR) + else if (gimple_assign_rhs_code (cur_use_stmt) == COND_EXPR) { - tree cond = gimple_assign_rhs1 (use_stmt); + tree cond = gimple_assign_rhs1 (cur_use_stmt); if (COMPARISON_CLASS_P (cond)) { ccode = TREE_CODE (cond); @@ -3792,11 +3793,17 @@ arith_overflow_check_p (gimple *stmt, gi { if ((crhs1 == divlhs && arith_cast_equal_p (crhs2, multop)) || (crhs2 == divlhs && arith_cast_equal_p (crhs1, multop))) - return ccode == NE_EXPR ? 1 : -1; + { + use_stmt = cur_use_stmt; + return ccode == NE_EXPR ? 1 : -1; + } } else if ((crhs1 == divlhs && operand_equal_p (crhs2, multop, 0)) || (crhs2 == divlhs && crhs1 == multop)) - return ccode == NE_EXPR ? 1 : -1; + { + use_stmt = cur_use_stmt; + return ccode == NE_EXPR ? 
1 : -1; + } } break; default: --- gcc/testsuite/gcc.c-torture/compile/pr98629.c.jj2021-01-12 00:00:21.132949613 +0100 +++ gcc/testsuite/gcc.c-torture/compile/pr98629.c 2021-01-11 23:59:58.178206894 +0100 @@ -0,0 +1,11 @@ +/* PR tree-optimization/98629 */ + +unsigned int a; +int b, c; + +void +foo (void) +{ + unsigned int *e = &a; + (a /= a |= b) - (0 <= (*e += *e)) * (c *= *e); +} Jakub
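For context, the signed multiplication overflow recognition mentioned above
matches the classic hand-written check in which the product is divided back
by one operand and the quotient is compared with the other operand; a hedged
sketch of that idiom (not the pr98629.c test above, and assuming the usual
shape of such a check):

/* The kind of idiom the pattern recognition turns into an internal
   multiplication-overflow check.  The ICE fixed here showed up when the
   quotient from the TRUNC_DIV_EXPR had extra uses, i.e. when
   arith_overflow_check_p had to give up after already touching use_stmt.  */
int
mul_overflows (int x, int y)
{
  int r = x * y;
  return x != 0 && r / x != y;
}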
Re: [PATCH] reassoc: Optimize in reassoc x < 0 && y < 0 to (x | y) < 0 etc. [PR95731]
On Tue, 12 Jan 2021, Jakub Jelinek wrote: > Hi! > > We already had x != 0 && y != 0 to (x | y) != 0 and > x != -1 && y != -1 to (x & y) != -1 and > x < 32U && y < 32U to (x | y) < 32U, this patch adds signed > x < 0 && y < 0 to (x | y) < 0. In that case, the low/high seem to be > always the same and just in_p indices whether it is >= 0 or < 0, > also, all types in the same bucket (same precision) should be type > compatible, but we can have some >= 0 and some < 0 comparison mixed, > so the patch handles that by using the right BIT_IOR_EXPR or BIT_AND_EXPR > and doing one set of < 0 or >= 0 first, then BIT_NOT_EXPR and then the other > one. I had to move optimize_range_tests_var_bound before this optimization > because that one deals with signed a >= 0 && a < b, and limited it to the > last reassoc pass as reassoc itself can't virtually undo this optimization > yet (and not sure if vrp would be able to). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Richard. > 2021-01-12 Jakub Jelinek > > PR tree-optimization/95731 > * tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Also optimize > x < 0 && y < 0 && z < 0 into (x | y | z) < 0 for signed x, y, z. > (optimize_range_tests): Call optimize_range_tests_cmp_bitwise > only after optimize_range_tests_var_bound. > > * gcc.dg/tree-ssa/pr95731.c: New test. > * gcc.c-torture/execute/pr95731.c: New test. > > --- gcc/tree-ssa-reassoc.c.jj 2021-01-11 10:35:02.204501369 +0100 > +++ gcc/tree-ssa-reassoc.c2021-01-11 15:20:17.578155827 +0100 > @@ -3320,7 +3320,8 @@ optimize_range_tests_to_bit_test (enum t > /* Optimize x != 0 && y != 0 && z != 0 into (x | y | z) != 0 > and similarly x != -1 && y != -1 && y != -1 into (x & y & z) != -1. > Also, handle x < C && y < C && z < C where C is power of two as > - (x | y | z) < C. */ > + (x | y | z) < C. And also handle signed x < 0 && y < 0 && z < 0 > + as (x | y | z) < 0. */ > > static bool > optimize_range_tests_cmp_bitwise (enum tree_code opcode, int first, int > length, > @@ -3340,13 +3341,13 @@ optimize_range_tests_cmp_bitwise (enum t > >if (ranges[i].exp == NULL_TREE > || TREE_CODE (ranges[i].exp) != SSA_NAME > - || !ranges[i].in_p > || TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) <= 1 > || TREE_CODE (TREE_TYPE (ranges[i].exp)) == BOOLEAN_TYPE) > continue; > >if (ranges[i].low != NULL_TREE > && ranges[i].high != NULL_TREE > + && ranges[i].in_p > && tree_int_cst_equal (ranges[i].low, ranges[i].high)) > { > idx = !integer_zerop (ranges[i].low); > @@ -3354,7 +3355,8 @@ optimize_range_tests_cmp_bitwise (enum t > continue; > } >else if (ranges[i].high != NULL_TREE > -&& TREE_CODE (ranges[i].high) == INTEGER_CST) > +&& TREE_CODE (ranges[i].high) == INTEGER_CST > +&& ranges[i].in_p) > { > wide_int w = wi::to_wide (ranges[i].high); > int prec = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)); > @@ -3370,10 +3372,20 @@ optimize_range_tests_cmp_bitwise (enum t > && integer_zerop (ranges[i].low > continue; > } > + else if (ranges[i].high == NULL_TREE > +&& ranges[i].low != NULL_TREE > +/* Perform this optimization only in the last > + reassoc pass, as it interferes with the reassociation > + itself or could also with VRP etc. which might not > + be able to virtually undo the optimization. 
*/ > +&& !reassoc_insert_powi_p > +&& !TYPE_UNSIGNED (TREE_TYPE (ranges[i].exp)) > +&& integer_zerop (ranges[i].low)) > + idx = 3; >else > continue; > > - b = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) * 3 + idx; > + b = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) * 4 + idx; >if (buckets.length () <= b) > buckets.safe_grow_cleared (b + 1, true); >if (chains.length () <= (unsigned) i) > @@ -3386,7 +3398,7 @@ optimize_range_tests_cmp_bitwise (enum t > if (i && chains[i - 1]) >{ > int j, k = i; > - if ((b % 3) == 2) > + if ((b % 4) == 2) > { > /* When ranges[X - 1].high + 1 is a power of two, > we need to process the same bucket up to > @@ -3439,6 +3451,19 @@ optimize_range_tests_cmp_bitwise (enum t > { > tree type = TREE_TYPE (ranges[j - 1].exp); > strict_overflow_p |= ranges[j - 1].strict_overflow_p; > + if ((b % 4) == 3) > + { > + /* For the signed < 0 cases, the types should be > +really compatible (all signed with the same precision, > +instead put ranges that have different in_p from > +k first. */ > + if (!useless_type_conversion_p (type1, type)) > + continue; > + if (ranges[j
Re: [PATCH] configure, make: Fix up --enable-link-serialization
On Tue, 12 Jan 2021, Jakub Jelinek wrote: > Hi! > > As reported by Matthias, --enable-link-serialization=1 can currently start > two concurrent links first (e.g. gnat1 and cc1). > The problem is that make var = value values seem to work differently between > dependencies and actual rules (where it was tested). > As the language make fragments can be in different order, we can have: > # Part of Makefile added by configure > ada.prev = ... magic that will become $(c.serial) under > --enable-link-serialization=1 > # ada/gcc-interface/Make-lang.in > gnat1$(exe): . $(ada.prev) > ... > # c/Make-lang.in > c.serial = cc1$(exe) > and while if I add echo $(ada.prev) in the gnat1 rule's command, it prints > cc1, the dependencies are actually evaluated during reading of the goal or > when. > The configure creates (and puts into Makefile) some serialization order of > the languages and in that order c always comes first, and the rest is > actually sorted the way the all_lang_makefrags are already sorted, > so just by forcing c/Make-lang.in first we achieve that X.serial variable > is always defined before some other Y.prev will use it in its goal > dependencies. > > Bootstrapped/regtested on x86_64-linux and i686-linux, without > --enable-link-serialization altogether, with =1 and with =3, all results > look good. Ok for trunk? OK. Richard. > 2021-01-12 Jakub Jelinek > > * configure.ac: Ensure c/Make-lang.in comes first in > @all_lang_makefrags@. > * configure: Regenerated. > > --- gcc/configure.ac.jj 2021-01-05 13:57:59.911905006 +0100 > +++ gcc/configure.ac 2021-01-11 14:49:22.878218318 +0100 > @@ -6975,7 +6975,12 @@ changequote([,])dnl > $ok || continue > > all_lang_configurefrags="$all_lang_configurefrags > \$(srcdir)/$gcc_subdir/config-lang.in" > - all_lang_makefrags="$all_lang_makefrags > \$(srcdir)/$gcc_subdir/Make-lang.in" > + if test "x$language" = xc && test -n "$all_lang_makefrags"; then > + # Put c/Make-lang.in fragment first to match serialization > languages order. > + all_lang_makefrags="\$(srcdir)/$gcc_subdir/Make-lang.in > $all_lang_makefrags" > + else > + all_lang_makefrags="$all_lang_makefrags > \$(srcdir)/$gcc_subdir/Make-lang.in" > + fi > if test -f $srcdir/$gcc_subdir/lang.opt; then > lang_opt_files="$lang_opt_files $srcdir/$gcc_subdir/lang.opt" > all_opt_files="$all_opt_files $srcdir/$gcc_subdir/lang.opt" > --- gcc/configure.jj 2021-01-06 22:17:07.275967932 +0100 > +++ gcc/configure 2021-01-11 14:49:29.708140555 +0100 > @@ -31174,7 +31174,12 @@ do > $ok || continue > > all_lang_configurefrags="$all_lang_configurefrags > \$(srcdir)/$gcc_subdir/config-lang.in" > - all_lang_makefrags="$all_lang_makefrags > \$(srcdir)/$gcc_subdir/Make-lang.in" > + if test "x$language" = xc && test -n "$all_lang_makefrags"; then > + # Put c/Make-lang.in fragment first to match serialization > languages order. > + all_lang_makefrags="\$(srcdir)/$gcc_subdir/Make-lang.in > $all_lang_makefrags" > + else > + all_lang_makefrags="$all_lang_makefrags > \$(srcdir)/$gcc_subdir/Make-lang.in" > + fi > if test -f $srcdir/$gcc_subdir/lang.opt; then > lang_opt_files="$lang_opt_files $srcdir/$gcc_subdir/lang.opt" > all_opt_files="$all_opt_files $srcdir/$gcc_subdir/lang.opt" > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Re: [PATCH] widening_mul: Fix up ICE caused by my signed multiplication overflow pattern recognition changes [PR98629]
On Tue, 12 Jan 2021, Jakub Jelinek wrote: > Hi! > > As the testcase shows, my latest changes caused ICE on that testcase. > The problem is that arith_overflow_check_p now can change the use_stmt > argument (has a reference), so that if it succeeds (returns non-zero), > it points it to the GIMPLE_COND or EQ/NE or COND_EXPR assignment from the > TRUNC_DIV_EXPR assignment. > The problem was that it would change use_stmt also if it returned 0 in some > cases, such as multiple imm uses of the division, and in one of the callers > if arith_overflow_check_p returns 0 it looks at use_stmt again and performs > other checks, which of course assumes that use_stmt is the one passed > to arith_overflow_check_p and not e.g. NULL instead or some other unrelated > stmt. > > The following patch fixes that by only changing use_stmt when we are about > to return non-zero (for the MULT_EXPR case, which is the only one with the > need to use different use_stmt). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Richard. > 2021-01-12 Jakub Jelinek > > PR tree-optimization/98629 > * tree-ssa-math-opts.c (arith_overflow_check_p): Don't update use_stmt > unless returning non-zero. > > * gcc.c-torture/compile/pr98629.c: New test. > > --- gcc/tree-ssa-math-opts.c.jj 2021-01-11 10:35:02.196501461 +0100 > +++ gcc/tree-ssa-math-opts.c 2021-01-11 23:56:24.154380944 +0100 > @@ -3667,6 +3667,7 @@ arith_overflow_check_p (gimple *stmt, gi >tree rhs1 = gimple_assign_rhs1 (stmt); >tree rhs2 = gimple_assign_rhs2 (stmt); >tree multop = NULL_TREE, divlhs = NULL_TREE; > + gimple *cur_use_stmt = use_stmt; > >if (code == MULT_EXPR) > { > @@ -3697,26 +3698,26 @@ arith_overflow_check_p (gimple *stmt, gi >if (!divlhs) > return 0; >use_operand_p use; > - if (!single_imm_use (divlhs, &use, &use_stmt)) > + if (!single_imm_use (divlhs, &use, &cur_use_stmt)) > return 0; > } > - if (gimple_code (use_stmt) == GIMPLE_COND) > + if (gimple_code (cur_use_stmt) == GIMPLE_COND) > { > - ccode = gimple_cond_code (use_stmt); > - crhs1 = gimple_cond_lhs (use_stmt); > - crhs2 = gimple_cond_rhs (use_stmt); > + ccode = gimple_cond_code (cur_use_stmt); > + crhs1 = gimple_cond_lhs (cur_use_stmt); > + crhs2 = gimple_cond_rhs (cur_use_stmt); > } > - else if (is_gimple_assign (use_stmt)) > + else if (is_gimple_assign (cur_use_stmt)) > { > - if (gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS) > + if (gimple_assign_rhs_class (cur_use_stmt) == GIMPLE_BINARY_RHS) > { > - ccode = gimple_assign_rhs_code (use_stmt); > - crhs1 = gimple_assign_rhs1 (use_stmt); > - crhs2 = gimple_assign_rhs2 (use_stmt); > + ccode = gimple_assign_rhs_code (cur_use_stmt); > + crhs1 = gimple_assign_rhs1 (cur_use_stmt); > + crhs2 = gimple_assign_rhs2 (cur_use_stmt); > } > - else if (gimple_assign_rhs_code (use_stmt) == COND_EXPR) > + else if (gimple_assign_rhs_code (cur_use_stmt) == COND_EXPR) > { > - tree cond = gimple_assign_rhs1 (use_stmt); > + tree cond = gimple_assign_rhs1 (cur_use_stmt); > if (COMPARISON_CLASS_P (cond)) > { > ccode = TREE_CODE (cond); > @@ -3792,11 +3793,17 @@ arith_overflow_check_p (gimple *stmt, gi > { > if ((crhs1 == divlhs && arith_cast_equal_p (crhs2, multop)) > || (crhs2 == divlhs && arith_cast_equal_p (crhs1, multop))) > - return ccode == NE_EXPR ? 1 : -1; > + { > + use_stmt = cur_use_stmt; > + return ccode == NE_EXPR ? 1 : -1; > + } > } > else if ((crhs1 == divlhs && operand_equal_p (crhs2, multop, 0)) > || (crhs2 == divlhs && crhs1 == multop)) > - return ccode == NE_EXPR ? 
1 : -1; > + { > + use_stmt = cur_use_stmt; > + return ccode == NE_EXPR ? 1 : -1; > + } > } >break; > default: > --- gcc/testsuite/gcc.c-torture/compile/pr98629.c.jj 2021-01-12 > 00:00:21.132949613 +0100 > +++ gcc/testsuite/gcc.c-torture/compile/pr98629.c 2021-01-11 > 23:59:58.178206894 +0100 > @@ -0,0 +1,11 @@ > +/* PR tree-optimization/98629 */ > + > +unsigned int a; > +int b, c; > + > +void > +foo (void) > +{ > + unsigned int *e = &a; > + (a /= a |= b) - (0 <= (*e += *e)) * (c *= *e); > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Re: [PATCH] i386: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]
On Tue, Jan 12, 2021 at 10:33 AM Jakub Jelinek wrote: > > Hi! > > The following patch adds patterns (in the end I went with define_insn rather > than combiner define_split + define_insn_and_split I initially hoped or > define_insn_and_split) to represent (so far 128-bit only) permutations > like { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 } where the second > operand is CONST0_RTX CONST_VECTOR as pmovzx. > define_split didn't work (due to the combiner not trying combine_split_insn > when i1 is NULL) but in the end was quite large, and the reason for not > trying to split this afterwards is the different vector mode of the output, > and lowpart_subreg on the result is undesirable, > so we'd need to split it into two instructions and hope some later pass > optimizes the move into just rewriting the uses using lowpart_subreg. You can use post-reload define_insn_and_split here. This way, gen_lowpart on all arguments, including output, can be used. So, instead of generating an insn template, the patterns you introduced should split to "real" sse4_1 zero-extend insns. This approach is preferred to avoid having several pseudo-insns in .md files that do the same thing with slightly different patterns. There are many examples of post-reload splitters that use gen_lowpart in i386.md. OTOH, perhaps some of the new testcases can be handled in x86 target_fold_builtin? In the long term, maybe target_fold_shuffle can be introduced to map __builtin_shufle to various target builtins, so the builtin can be processed further in target_fold_builtin. As pointed out below, vector insn patterns can be quite complex, and push RTL combiners to their limits, so perhaps they can be more efficiently handled by tree passes. Uros. > While I initially wrote also avx2 and avx512{bw,f} patterns, turns out that > without further changes they aren't useful, such permutations are expanded > into multiple permutation instructions and while combiner in theory could > be able to undo that, what stops it is that the const_vector of 0 has > multiple uses. So either we'd need to allow pre-reload const0_operand > next to registers (and sometimes memory) in some permutations, or perhaps > change the vec_perm_const target hook calling code to not force zero vectors > into registers and change each vec_perm_const hook to force it into register > itself if it can't expand such a permutation smartly. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2021-01-12 Jakub Jelinek > > PR target/95905 > * config/i386/predicates.md (pmovzx_parallel): New predicate. > * config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3, > *sse4_1_zero_extendv4hiv4si2_3, *sse4_1_zero_extendv2siv2di2_3): New > define_insn patterns. > > * gcc.target/i386/pr95905-1.c: New test. > * gcc.target/i386/pr95905-2.c: New test. > > --- gcc/config/i386/predicates.md.jj2021-01-04 10:25:45.248161185 +0100 > +++ gcc/config/i386/predicates.md 2021-01-11 19:40:04.793433826 +0100 > @@ -1600,6 +1600,38 @@ (define_predicate "addsub_vs_parallel" >return true; > }) > > +;; Return true if OP is a parallel for an pmovz{bw,wd,dq} vec_select, > +;; where one of the two operands of the vec_concat is const0_operand. > +(define_predicate "pmovzx_parallel" > + (and (match_code "parallel") > + (match_code "const_int" "a")) > +{ > + int nelt = XVECLEN (op, 0); > + int elt, i; > + > + if (nelt < 2) > +return false; > + > + /* Check that the permutation is suitable for pmovz{bw,wd,dq}. > + For example { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 }. 
*/ > + elt = INTVAL (XVECEXP (op, 0, 0)); > + if (elt == 0) > +{ > + for (i = 1; i < nelt; ++i) > + if ((i & 1) != 0) > + { > + if (INTVAL (XVECEXP (op, 0, i)) < nelt) > + return false; > + } > + else if (INTVAL (XVECEXP (op, 0, i)) != i / 2) > + return false; > +} > + else > +return false; > + > + return true; > +}) > + > ;; Return true if OP is a parallel for a vbroadcast permute. > (define_predicate "avx_vbroadcast_operand" >(and (match_code "parallel") > --- gcc/config/i386/sse.md.jj 2021-01-07 15:29:52.604974544 +0100 > +++ gcc/config/i386/sse.md 2021-01-11 21:19:06.247346965 +0100 > @@ -17683,6 +17683,27 @@ (define_insn_and_split "*sse4_1_v8 > (any_extend:V8HI (match_dup 1)))] >"operands[1] = adjust_address_nv (operands[1], V8QImode, 0);") > > +(define_insn "*sse4_1_zero_extendv8qiv8hi2_3" > + [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,v") > + (vec_select:V16QI > + (vec_concat:V32QI > + (match_operand:V16QI 1 "vector_operand" "Yrm,*xm,vm") > + (match_operand:V16QI 2 "const0_operand" "C,C,C")) > + (match_parallel 3 "pmovzx_parallel" > + [(match_operand 4 "const_int_operand" "n,n,n")])))] > + "TARGET_SSE4_1" > +{ > + if (MEM_P (operands[1])) > +return "%vpmovzxbw\t{%1, %0|
Re: [PATCH] i386: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]
On Tue, Jan 12, 2021 at 11:42:44AM +0100, Uros Bizjak wrote:
> > The following patch adds patterns (in the end I went with define_insn rather
> > than combiner define_split + define_insn_and_split I initially hoped or
> > define_insn_and_split) to represent (so far 128-bit only) permutations
> > like { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 } where the second
> > operand is CONST0_RTX CONST_VECTOR as pmovzx.
> > define_split didn't work (due to the combiner not trying combine_split_insn
> > when i1 is NULL) but in the end was quite large, and the reason for not
> > trying to split this afterwards is the different vector mode of the output,
> > and lowpart_subreg on the result is undesirable,
> > so we'd need to split it into two instructions and hope some later pass
> > optimizes the move into just rewriting the uses using lowpart_subreg.
>
> You can use post-reload define_insn_and_split here. This way,
> gen_lowpart on all arguments, including output, can be used. So,
> instead of generating an insn template, the patterns you introduced
> should split to "real" sse4_1 zero-extend insns. This approach is
> preferred to avoid having several pseudo-insns in .md files that do
> the same thing with slightly different patterns. There are many
> examples of post-reload splitters that use gen_lowpart in i386.md.

Ok, will change it that way.

> OTOH, perhaps some of the new testcases can be handled in x86
> target_fold_builtin? In the long term, maybe target_fold_shuffle can
> be introduced to map __builtin_shufle to various target builtins, so
> the builtin can be processed further in target_fold_builtin. As
> pointed out below, vector insn patterns can be quite complex, and push
> RTL combiners to their limits, so perhaps they can be more efficiently
> handled by tree passes.

My primary motivation was to generate good code from __builtin_shuffle here
and trying to find the best permutation and map it back from insns to
builtins would be a nightmare.
I'll see how many targets I need to modify to try the no middle-end
force_reg for CONST0_RTX case...

	Jakub
[PATCH] alias: Fix offset checks involving section anchors [PR92294]
This is a repost of: https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539763.html which was initially posted during stage 4. (And yeah, I only just missed stage 4 again.) IMO it would be better to fix the bug directly (as the patch tries to do) instead of wait for a more thorough redesign of this area. See the end of: https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540002.html for some stats. Honza: Richard said he'd like your opinion on the patch. memrefs_conflict_p has a slightly odd structure. It first checks whether two addresses based on SYMBOL_REFs refer to the same object, with a tristate result: int cmp = compare_base_symbol_refs (x,y); If the addresses do refer to the same object, we can use offset-based checks: /* If both decls are the same, decide by offsets. */ if (cmp == 1) return offset_overlap_p (c, xsize, ysize); But then, apart from the special case of forced address alignment, we use an offset-based check even if we don't know whether the addresses refer to the same object: /* Assume a potential overlap for symbolic addresses that went through alignment adjustments (i.e., that have negative sizes), because we can't know how far they are from each other. */ if (maybe_lt (xsize, 0) || maybe_lt (ysize, 0)) return -1; /* If decls are different or we know by offsets that there is no overlap, we win. */ if (!cmp || !offset_overlap_p (c, xsize, ysize)) return 0; This somewhat contradicts: /* In general we assume that memory locations pointed to by different labels may overlap in undefined ways. */ at the end of compare_base_symbol_refs. In other words, we're taking -1 to mean that either (a) the symbols are equal (via aliasing) or (b) the references access non-overlapping objects. But even assuming that's true for normal symbols, it doesn't cope correctly with section anchors. If a symbol X at ANCHOR+OFFSET is preemptible, either (a) X = ANCHOR+OFFSET (rather than the X = ANCHOR assumed above) or (b) X and ANCHOR reference non-overlapping objects. And an offset-based comparison makes no sense for an anchor symbol vs. a bare symbol with no decl. If the bare symbol is allowed to alias other symbols then it can surely alias any symbol in the anchor's block, so there are multiple anchor offsets that might induce an alias. This patch therefore replaces the current tristate: - known equal - known independent (two accesses can't alias) - equal or independent with: - known distance apart - known independent (two accesses can't alias) - known distance apart or independent - don't know For safety, the patch puts all bare symbols in the "don't know" category. If that turns out to be too conservative, we at least need that behaviour for combinations involving a bare symbol and a section anchor. However, bare symbols should be relatively rare these days. Retested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu. OK to install? Richard gcc/ PR rtl-optimization/92294 * alias.c (compare_base_symbol_refs): Take an extra parameter and add the distance between two symbols to it. Enshrine in comments that -1 means "either 0 or 1, but we can't tell which at compile time". Return -2 for symbols whose relationship is unknown. (memrefs_conflict_p): Update call accordingly. (rtx_equal_for_memref_p): Likewise. Punt for a return value of -2, without even checking the offset. Take the distance between symbols into account. 
--- gcc/alias.c | 53 ++--- 1 file changed, 38 insertions(+), 15 deletions(-) diff --git a/gcc/alias.c b/gcc/alias.c index 8d3575e4e27..e22863a929a 100644 --- a/gcc/alias.c +++ b/gcc/alias.c @@ -159,7 +159,8 @@ static tree decl_for_component_ref (tree); static int write_dependence_p (const_rtx, const_rtx, machine_mode, rtx, bool, bool, bool); -static int compare_base_symbol_refs (const_rtx, const_rtx); +static int compare_base_symbol_refs (const_rtx, const_rtx, +HOST_WIDE_INT * = NULL); static void memory_modified_1 (rtx, const_rtx, void *); @@ -1837,7 +1838,11 @@ rtx_equal_for_memref_p (const_rtx x, const_rtx y) return label_ref_label (x) == label_ref_label (y); case SYMBOL_REF: - return compare_base_symbol_refs (x, y) == 1; + { + HOST_WIDE_INT distance = 0; + return (compare_base_symbol_refs (x, y, &distance) == 1 + && distance == 0); + } case ENTRY_VALUE: /* This is magic, don't go through canonicalization et al. */ @@ -2172,10 +2177,24 @@ compare_base_decls (tree base1, tree base2) return ret; } -/* Same as compare_base_decls but for SYMBOL_REF. */ +/* Compare SYMBOL_REFs X_BASE and Y_BASE. + + - Return
Re: [PATCH, OpenMP 5.0] Basic support of release/delete clauses on target/target-data directives
On 2021/1/11 6:28 PM, Jakub Jelinek wrote:
> On Wed, Dec 16, 2020 at 11:06:10PM +0800, Chung-Lin Tang wrote:
> > we have some other sollve_vv tests for OpenMP 5.0, which tests the
> > occurrence of 'delete' map kind on the target directive.
>
> That is strange.  In OpenMP 5.0, I certainly see the target data:
>   [163:16] A map-type in a map clause must be to, from, tofrom or alloc.
> restriction and similarly for target:
>   [174:27] A map-type in a map clause must be to, from, tofrom or alloc.
> restriction.

Thanks! I was originally trying to find such a restriction too, but missed
those lines above.

Appears that this patch isn't needed then.

Chung-Lin
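For reference, the map types the quoted restriction does allow on target data
and target look like this (a purely illustrative snippet; 'release' and
'delete' are simply not in the to/from/tofrom/alloc list, which is why the
proposed support is unnecessary):

void
scale (int *a, int n)
{
  /* Only to, from, tofrom and alloc are valid map types here.  */
  #pragma omp target data map(tofrom: a[0:n])
  {
    #pragma omp target map(to: n) map(tofrom: a[0:n])
    for (int i = 0; i < n; i++)
      a[i] *= 2;
  }
}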
[PATCH][pushed] gcov: add more debugging facility
The patch is about an extensive debugging facility that I see beneficial for GCOV issue. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. I'm going to push it. Thanks, Martin gcc/ChangeLog: * gcov.c (source_info::debug): New. (print_usage): Add --debug (-D) option. (process_args): Likewise. (generate_results): Call src->debug after accumulate_line_counts. (read_graph_file): Properly assign id for EXIT_BLOCK. * profile.c (branch_prob): Dump function body before it is instrumented. --- gcc/gcov.c| 43 ++- gcc/profile.c | 5 + 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/gcc/gcov.c b/gcc/gcov.c index 01ca52b215d..93128721ef6 100644 --- a/gcc/gcov.c +++ b/gcc/gcov.c @@ -371,6 +371,9 @@ public: /* Register a new function. */ void add_function (function_info *fn); + /* Debug the source file. */ + void debug (); + /* Index of the source_info in sources vector. */ unsigned index; @@ -428,6 +431,31 @@ source_info::get_functions_at_location (unsigned line_num) const return slot; } +void source_info::debug () +{ + fprintf (stderr, "source_info: %s\n", name); + for (vector::iterator it = functions.begin (); + it != functions.end (); it++) +{ + function_info *fn = *it; + fprintf (stderr, " function_info: %s\n", fn->get_name ()); + for (vector::iterator bit = fn->blocks.begin (); + bit != fn->blocks.end (); bit++) + { + fprintf (stderr, "block_info id=%d, count=%ld\n", + bit->id, bit->count); + } +} + + for (unsigned lineno = 1; lineno < lines.size (); ++lineno) +{ + line_info &line = lines[lineno]; + fprintf (stderr, " line_info=%d, count=%ld\n", lineno, line.count); +} + + fprintf (stderr, "\n"); +} + class name_map { public: @@ -579,6 +607,10 @@ static int flag_human_readable_numbers = 0; static int flag_function_summary = 0; +/* Print debugging dumps. */ + +static int flag_debug = 0; + /* Object directory file prefix. This is the directory/file where the graph and data files are looked for, if nonzero. */ @@ -896,6 +928,7 @@ print_usage (int error_p) fnotice (file, " -c, --branch-counts Output counts of branches taken\n\ rather than percentages\n"); fnotice (file, " -d, --display-progress Display progress information\n"); + fnotice (file, " -D, --debug Display debugging dumps\n"); fnotice (file, " -f, --function-summariesOutput summaries for each function\n"); fnotice (file, " -h, --help Print this help, then exit\n"); fnotice (file, " -j, --json-format Output JSON intermediate format\n\ @@ -963,6 +996,7 @@ static const struct option options[] = { "hash-filenames",no_argument, NULL, 'x' }, { "use-colors",no_argument, NULL, 'k' }, { "use-hotness-colors", no_argument, NULL, 'q' }, + { "debug", no_argument, NULL, 'D' }, { 0, 0, 0, 0 } }; @@ -973,7 +1007,7 @@ process_args (int argc, char **argv) { int opt; - const char *opts = "abcdfhHijklmno:pqrs:tuvwx"; + const char *opts = "abcdDfhHijklmno:pqrs:tuvwx"; while ((opt = getopt_long (argc, argv, opts, options, NULL)) != -1) { switch (opt) @@ -1044,6 +1078,9 @@ process_args (int argc, char **argv) case 't': flag_use_stdout = 1; break; + case 'D': + flag_debug = 1; + break; case 'v': print_version (); /* print_version will exit. */ @@ -1466,6 +1503,8 @@ generate_results (const char *file_name) } accumulate_line_counts (src); + if (flag_debug) + src->debug (); if (!flag_use_stdout) file_summary (&src->coverage); @@ -1804,6 +1843,8 @@ read_graph_file (void) arc = XCNEW (arc_info); arc->dst = &fn->blocks[dest]; + /* Set id in order to find EXIT_BLOCK. 
*/ + arc->dst->id = dest; arc->src = src_blk; arc->count = 0; diff --git a/gcc/profile.c b/gcc/profile.c index d629687484b..1f1d60c8180 100644 --- a/gcc/profile.c +++ b/gcc/profile.c @@ -1294,6 +1294,11 @@ branch_prob (bool thunk) if (dump_file) fprintf (dump_file, "%d instrumentation edges\n", num_instrumented); + /* Dump function body before it's instrumented. + It helps to debug gcov tool. */ + if (dump_file && (dump_flags & TDF_DETAILS)) +dump_function_to_file (cfun->decl, dump_file, dump_flags); + /* Compute two different checksums. Note that we want to compute the checksum in only once place, since it depends on the shape of the control flow which can change during -- 2.29.2
Re: [PATCH] i386: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]
On Tue, Jan 12, 2021 at 11:47:48AM +0100, Jakub Jelinek via Gcc-patches wrote:
> > OTOH, perhaps some of the new testcases can be handled in x86
> > target_fold_builtin? In the long term, maybe target_fold_shuffle can
> > be introduced to map __builtin_shufle to various target builtins, so
> > the builtin can be processed further in target_fold_builtin. As
> > pointed out below, vector insn patterns can be quite complex, and push
> > RTL combiners to their limits, so perhaps they can be more efficiently
> > handled by tree passes.
>
> My primary motivation was to generate good code from __builtin_shuffle here
> and trying to find the best permutation and map it back from insns to
> builtins would be a nightmare.
> I'll see how many targets I need to modify to try the no middle-end
> force_reg for CONST0_RTX case...

For the folding, I think best would be to change _mm_unpacklo_epi8 and all
the similar intrinsics for hardcoded specific permutations from using a
builtin to just using __builtin_shuffle (together with verification that we
emit as good or better code from it for each case of course), and keep
__builtin_shuffle -> VEC_PERM_EXPR as the canonical form (with which the
GIMPLE code can do any optimizations it wants).

	Jakub
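A hedged sketch of that idea with simplified vector types (not the actual
emmintrin.h change): the fixed interleave done by _mm_unpacklo_epi8 is just a
constant-mask __builtin_shuffle, which the middle end keeps as a
VEC_PERM_EXPR and can then combine with a known-zero second operand into the
pmovzx form discussed above.

typedef char v16qi __attribute__ ((vector_size (16)));

static inline v16qi
unpacklo_qi (v16qi a, v16qi b)
{
  /* The { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 } permutation from the
     patch description.  */
  return __builtin_shuffle (a, b,
			    (v16qi) { 0, 16, 1, 17, 2, 18, 3, 19,
				      4, 20, 5, 21, 6, 22, 7, 23 });
}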
[PATCH][pushed] options: properly compare string arguments
Similarly to 7f967bd2a7ba156ede3fbb147e66dea5fb7137a6, we need to compare string with strcmp. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. I'm going to push it as it's the same as what I did in 7f967bd2a7ba156ede3fbb147e66dea5fb7137a6. Martin gcc/ChangeLog: PR c++/97284 * optc-save-gen.awk: Compare also n_target_save vars with strcmp. --- gcc/optc-save-gen.awk | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk index 85debfe0b64..b1f85928275 100644 --- a/gcc/optc-save-gen.awk +++ b/gcc/optc-save-gen.awk @@ -1036,8 +1036,10 @@ for (i = 0; i < n_target_save; i++) { type = var; sub("^.*[ *]", "", name) sub(" *" name "$", "", type) - if (target_save_decl[i] ~ "^const char \\*+[_" alnum "]+$") + if (target_save_decl[i] ~ "^const char \\*+[_" alnum "]+$") { var_target_str[n_target_str++] = name; + string_options_names[name]++ + } else { if (target_save_decl[i] ~ " .*\\[.+\\]+$") { size = name; @@ -1451,7 +1453,7 @@ for (i = 0; i < n_opts; i++) { continue; checked_options[name]++ - if (name in string_options_names) { + if (name in string_options_names || ("x_" name) in string_options_names) { print " if (ptr1->x_" name " != ptr2->x_" name ""; print " && (!ptr1->x_" name" || !ptr2->x_" name print " || strcmp (ptr1->x_" name", ptr2->x_" name ")))"; -- 2.29.2
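To illustrate why strcmp is needed here (an illustrative helper, not code
from the generated file): two equal option strings can live at different
addresses, e.g. one saved in a target_option node and one freshly parsed, so
a pure pointer comparison would report a spurious difference.

#include <string.h>

/* Mirrors the generated ptr1->x_... != ptr2->x_... && (!ptr1->x_... ||
   !ptr2->x_... || strcmp (...)) test above: only claim a difference when
   the contents differ, not merely the addresses.  */
static int
option_strings_differ (const char *a, const char *b)
{
  if (a == b)
    return 0;			/* same pointer, or both NULL */
  if (a == NULL || b == NULL)
    return 1;			/* only one of them is set */
  return strcmp (a, b) != 0;
}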
[PATCH] i386, v2: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]
On Tue, Jan 12, 2021 at 11:42:44AM +0100, Uros Bizjak via Gcc-patches wrote: > You can use post-reload define_insn_and_split here. This way, > gen_lowpart on all arguments, including output, can be used. So, > instead of generating an insn template, the patterns you introduced > should split to "real" sse4_1 zero-extend insns. This approach is > preferred to avoid having several pseudo-insns in .md files that do > the same thing with slightly different patterns. There are many > examples of post-reload splitters that use gen_lowpart in i386.md. So like this? If I tweak the vec_perm_const, the other define_insn_and_split will be easier, as they won't need the vec_select and variants in what they split into, just lowpart_subreg on the operand unconditionally and zero_extend it. 2021-01-12 Jakub Jelinek PR target/95905 * config/i386/predicates.md (pmovzx_parallel): New predicate. * config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3, *sse4_1_zero_extendv4hiv4si2_3, *sse4_1_zero_extendv2siv2di2_3): New define_insn_and_split patterns. * gcc.target/i386/pr95905-1.c: New test. * gcc.target/i386/pr95905-2.c: New test. --- gcc/config/i386/predicates.md.jj2021-01-12 11:01:28.458643868 +0100 +++ gcc/config/i386/predicates.md 2021-01-12 13:58:39.816222121 +0100 @@ -1600,6 +1600,38 @@ (define_predicate "addsub_vs_parallel" return true; }) +;; Return true if OP is a parallel for an pmovz{bw,wd,dq} vec_select, +;; where one of the two operands of the vec_concat is const0_operand. +(define_predicate "pmovzx_parallel" + (and (match_code "parallel") + (match_code "const_int" "a")) +{ + int nelt = XVECLEN (op, 0); + int elt, i; + + if (nelt < 2) +return false; + + /* Check that the permutation is suitable for pmovz{bw,wd,dq}. + For example { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 }. */ + elt = INTVAL (XVECEXP (op, 0, 0)); + if (elt == 0) +{ + for (i = 1; i < nelt; ++i) + if ((i & 1) != 0) + { + if (INTVAL (XVECEXP (op, 0, i)) < nelt) + return false; + } + else if (INTVAL (XVECEXP (op, 0, i)) != i / 2) + return false; +} + else +return false; + + return true; +}) + ;; Return true if OP is a parallel for a vbroadcast permute. 
(define_predicate "avx_vbroadcast_operand" (and (match_code "parallel") --- gcc/config/i386/sse.md.jj 2021-01-12 11:01:28.494643460 +0100 +++ gcc/config/i386/sse.md 2021-01-12 14:30:32.688546846 +0100 @@ -17683,6 +17683,36 @@ (define_insn_and_split "*sse4_1_v8 (any_extend:V8HI (match_dup 1)))] "operands[1] = adjust_address_nv (operands[1], V8QImode, 0);") +(define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_3" + [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "vector_operand" "Yrm,*xm,vm") + (match_operand:V16QI 2 "const0_operand" "C,C,C")) + (match_parallel 3 "pmovzx_parallel" + [(match_operand 4 "const_int_operand" "n,n,n")])))] + "TARGET_SSE4_1" + "#" + "&& reload_completed" + [(set (match_dup 0) + (zero_extend:V8HI + (vec_select:V8QI + (match_dup 1) + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)]] +{ + operands[0] = lowpart_subreg (V8HImode, operands[0], V16QImode); + if (MEM_P (operands[1])) +{ + operands[1] = lowpart_subreg (V8QImode, operands[1], V16QImode); + operands[1] = gen_rtx_ZERO_EXTEND (V8HImode, operands[1]); + emit_insn (gen_rtx_SET (operands[0], operands[1])); + DONE; +} +}) + (define_expand "v8qiv8hi2" [(set (match_operand:V8HI 0 "register_operand") (any_extend:V8HI @@ -17929,6 +17959,34 @@ (define_expand "v4hiv4si2" } }) +(define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_3" + [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "vector_operand" "Yrm,*xm,vm") + (match_operand:V8HI 2 "const0_operand" "C,C,C")) + (match_parallel 3 "pmovzx_parallel" + [(match_operand 4 "const_int_operand" "n,n,n")])))] + "TARGET_SSE4_1" + "#" + "&& reload_completed" + [(set (match_dup 0) + (zero_extend:V4SI + (vec_select:V4HI + (match_dup 1) + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3)]] +{ + operands[0] = lowpart_subreg (V4SImode, operands[0], V8HImode); + if (MEM_P (operands[1])) +{ + operands[1] = lowpart_subreg (V4HImode, operands[1], V8HImode); + operands[1] = gen_rtx_ZERO_EXTEND (V4SImode, operands[1]); + emit_insn (gen_rtx_SET (operands[0], operands[1])); + DONE; +} +}) + (define_insn "avx512f_v
Re: [PATCH] i386, v2: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]
On Tue, Jan 12, 2021 at 2:40 PM Jakub Jelinek wrote: > > On Tue, Jan 12, 2021 at 11:42:44AM +0100, Uros Bizjak via Gcc-patches wrote: > > You can use post-reload define_insn_and_split here. This way, > > gen_lowpart on all arguments, including output, can be used. So, > > instead of generating an insn template, the patterns you introduced > > should split to "real" sse4_1 zero-extend insns. This approach is > > preferred to avoid having several pseudo-insns in .md files that do > > the same thing with slightly different patterns. There are many > > examples of post-reload splitters that use gen_lowpart in i386.md. > > So like this? > > If I tweak the vec_perm_const, the other define_insn_and_split will be > easier, as they won't need the vec_select and variants in what they split > into, just lowpart_subreg on the operand unconditionally and zero_extend it. > > 2021-01-12 Jakub Jelinek > > PR target/95905 > * config/i386/predicates.md (pmovzx_parallel): New predicate. > * config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3, > *sse4_1_zero_extendv4hiv4si2_3, *sse4_1_zero_extendv2siv2di2_3): New > define_insn_and_split patterns. > > * gcc.target/i386/pr95905-1.c: New test. > * gcc.target/i386/pr95905-2.c: New test. LGTM. Thanks, Uros. > > --- gcc/config/i386/predicates.md.jj2021-01-12 11:01:28.458643868 +0100 > +++ gcc/config/i386/predicates.md 2021-01-12 13:58:39.816222121 +0100 > @@ -1600,6 +1600,38 @@ (define_predicate "addsub_vs_parallel" >return true; > }) > > +;; Return true if OP is a parallel for an pmovz{bw,wd,dq} vec_select, > +;; where one of the two operands of the vec_concat is const0_operand. > +(define_predicate "pmovzx_parallel" > + (and (match_code "parallel") > + (match_code "const_int" "a")) > +{ > + int nelt = XVECLEN (op, 0); > + int elt, i; > + > + if (nelt < 2) > +return false; > + > + /* Check that the permutation is suitable for pmovz{bw,wd,dq}. > + For example { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 }. */ > + elt = INTVAL (XVECEXP (op, 0, 0)); > + if (elt == 0) > +{ > + for (i = 1; i < nelt; ++i) > + if ((i & 1) != 0) > + { > + if (INTVAL (XVECEXP (op, 0, i)) < nelt) > + return false; > + } > + else if (INTVAL (XVECEXP (op, 0, i)) != i / 2) > + return false; > +} > + else > +return false; > + > + return true; > +}) > + > ;; Return true if OP is a parallel for a vbroadcast permute. 
> (define_predicate "avx_vbroadcast_operand" >(and (match_code "parallel") > --- gcc/config/i386/sse.md.jj 2021-01-12 11:01:28.494643460 +0100 > +++ gcc/config/i386/sse.md 2021-01-12 14:30:32.688546846 +0100 > @@ -17683,6 +17683,36 @@ (define_insn_and_split "*sse4_1_v8 > (any_extend:V8HI (match_dup 1)))] >"operands[1] = adjust_address_nv (operands[1], V8QImode, 0);") > > +(define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_3" > + [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,v") > + (vec_select:V16QI > + (vec_concat:V32QI > + (match_operand:V16QI 1 "vector_operand" "Yrm,*xm,vm") > + (match_operand:V16QI 2 "const0_operand" "C,C,C")) > + (match_parallel 3 "pmovzx_parallel" > + [(match_operand 4 "const_int_operand" "n,n,n")])))] > + "TARGET_SSE4_1" > + "#" > + "&& reload_completed" > + [(set (match_dup 0) > + (zero_extend:V8HI > + (vec_select:V8QI > + (match_dup 1) > + (parallel [(const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 4) (const_int 5) > + (const_int 6) (const_int 7)]] > +{ > + operands[0] = lowpart_subreg (V8HImode, operands[0], V16QImode); > + if (MEM_P (operands[1])) > +{ > + operands[1] = lowpart_subreg (V8QImode, operands[1], V16QImode); > + operands[1] = gen_rtx_ZERO_EXTEND (V8HImode, operands[1]); > + emit_insn (gen_rtx_SET (operands[0], operands[1])); > + DONE; > +} > +}) > + > (define_expand "v8qiv8hi2" >[(set (match_operand:V8HI 0 "register_operand") > (any_extend:V8HI > @@ -17929,6 +17959,34 @@ (define_expand "v4hiv4si2" > } > }) > > +(define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_3" > + [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > + (vec_select:V8HI > + (vec_concat:V16HI > + (match_operand:V8HI 1 "vector_operand" "Yrm,*xm,vm") > + (match_operand:V8HI 2 "const0_operand" "C,C,C")) > + (match_parallel 3 "pmovzx_parallel" > + [(match_operand 4 "const_int_operand" "n,n,n")])))] > + "TARGET_SSE4_1" > + "#" > + "&& reload_completed" > + [(set (match_dup 0) > + (zero_extend:V4SI > + (vec_select:V4HI > + (match_dup 1) > + (parallel [(const_int 0) (const_int 1) > + (const_int 2) (const_int 3)]] > +{ > + operands[0] = lowpar
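For concreteness, PR 95905 is about recognizing an unpack-with-zero as a zero extension. A minimal sketch of the kind of source affected (an illustrative function, not one of the pr95905 testcases): built with -O2 -msse4.1, it should now compile to a single pmovzxbw rather than a pxor plus punpcklbw pair.

  #include <emmintrin.h>

  __m128i
  widen_lo_u8 (__m128i x)
  {
    /* Interleaving the low eight bytes of x with zeros is the same as
       zero-extending each of those bytes to 16 bits.  */
    return _mm_unpacklo_epi8 (x, _mm_setzero_si128 ());
  }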
[PATCH] tree-optimization/98550 - fix BB vect unrolling check
This fixes the check that disqualifies BB vectorization because of required unrolling to match up with the later exact_div we do. To not disable the ability to split groups that do not match up exactly with a choosen vector type this also introduces a soft-fail mechanism to vect_build_slp_tree_1 which delays failing to after the matches[] array is populated from other checks and only then determines the split point according to the vector type. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-01-12 Richard Biener PR tree-optimization/98550 * tree-vect-slp.c (vect_record_max_nunits): Check whether the group size is a multiple of the vector element count. (vect_build_slp_tree_1): When we need to fail because the vector type choosen causes unrolling do so lazily without affecting matches only at the end to guide group splitting. * g++.dg/opt/pr98550.C: New testcase. --- gcc/testsuite/g++.dg/opt/pr98550.C | 96 ++ gcc/tree-vect-slp.c| 40 ++--- 2 files changed, 128 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/g++.dg/opt/pr98550.C diff --git a/gcc/testsuite/g++.dg/opt/pr98550.C b/gcc/testsuite/g++.dg/opt/pr98550.C new file mode 100644 index 000..49102e6c1a1 --- /dev/null +++ b/gcc/testsuite/g++.dg/opt/pr98550.C @@ -0,0 +1,96 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target c++11 } */ +/* { dg-additional-options "-O3" } */ +/* { dg-additional-options "-march=z13" { target s390x-*-* } } */ + +template struct k { static constexpr int c = a; }; +template struct o; +template struct o { + typedef decltype(0) h; +}; +template struct p : o::c, k::c, g...> {}; +class q; +class r { +public: + void ap(q); +}; +template void ax(aw ay) { ay(); } +template void ba(az bb) { + using bc = p; + using bd = typename bc::h; + ax(bb); +} +template class s; +class t { +public: + s<8> br(); + template void operator()() { ba(br()); } +}; +class q { +public: + template q(az) { H(); } + struct H { +t cc; +H() { cc(); } + }; +}; +template struct I {}; +template void cm(j cn, I) { + cm(cn, I()); + cn(cl); +} +template void cm(j, I<0>) {} +template struct u { + long cp[co]; + void cq(const u &); + void cs(int); + void operator<(u); +}; +template void u::cq(const u &l) { + cm([&](int i) { cp[i] &= l.cp[i]; }, I()); +} +template void u::cs(int m) { + cm([&](int i) { cp[i] >>= m; }, I()); +} +template class K; +template class v { + int cv; + friend K; + +public: + void cx(int, unsigned char *, unsigned long long); +}; +template class K { +public: + static void cx(v &); +}; +template +void v::cx(int, unsigned char *, unsigned long long) { + K::cx(*this); +} +template void K::cx(v &cz) { + u a, b, d; + int e, n = cz.cv; + for (; e;) +if (cz.cv) + a.cs(cz.cv); + a.cq(d); + a < b; +} +template class s { + v *dh; + +public: + void operator()(); +}; +template void s::operator()() { + int f; + unsigned char g; + long h; + dh->cx(f, &g, h); +} +void d() { + r i; + t j; + i.ap(j); +} diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 877d44b2257..65b7a27e1e8 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -873,11 +873,8 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info stmt_info, /* If populating the vector type requires unrolling then fail before adjusting *max_nunits for basic-block vectorization. 
*/ - poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); - unsigned HOST_WIDE_INT const_nunits; if (is_a (vinfo) - && (!nunits.is_constant (&const_nunits) - || const_nunits > group_size)) + && !multiple_p (group_size, TYPE_VECTOR_SUBPARTS (vectype))) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -928,6 +925,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, stmt_vec_info first_load = NULL, prev_first_load = NULL; bool first_stmt_load_p = false, load_p = false; bool first_stmt_phi_p = false, phi_p = false; + bool maybe_soft_fail = false; + tree soft_fail_nunits_vectype = NULL_TREE; /* For every stmt in NODE find its def stmt/s. */ stmt_vec_info stmt_info; @@ -977,10 +976,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, tree nunits_vectype; if (!vect_get_vector_types_for_stmt (vinfo, stmt_info, &vectype, - &nunits_vectype, group_size) - || (nunits_vectype - && !vect_record_max_nunits (vinfo, stmt_info, group_size, - nunits_vectype, max_nunits))) + &nunits_vectype, group_size)) { if (is_a (vinfo) && i != 0) continue; @@ -988,6 +984,17 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, matches[0] = false; return fals
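Restated, the new gate only lets BB vectorization proceed when the group size splits evenly into vector-sized chunks, and the soft-fail path records where to split instead of rejecting the whole group up front. A simplified scalar sketch of the check (the real code uses poly_uint64 and multiple_p, so this is an approximation only):

  /* Approximation of the adjusted vect_record_max_nunits gate.  */
  static bool
  group_size_ok_for_bb_vect (unsigned group_size, unsigned nunits)
  {
    /* A group of 8 with a 4-lane vector type is fine (8 % 4 == 0);
       a group of 6 is not, and must instead be split on a lane
       boundary by the caller.  */
    return group_size % nunits == 0;
  }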
Re: [PATCH] aarch64 : Mark rotate immediates with '#' as per DDI0487iFc.
Iain Sandoe writes: > Hi, > > The armv8_arm manual [C6.2.226, ROR (immediate)] uses a # in front > of the immediate rotation quantity. > > Although, it seems, GAS is able to infer the # (or is leninent about > its absence) assemblers based on the LLVM back end expect it and error out. > > tested on aarch64-linux-gnu (gcc115) and aarch64-darwin20 (experimental) Sorry for the slow reply, didn't see this till now. The patch is OK in principle, and personally I prefer “#”. But how far does this spread? Are only ROR modifiers on logical patterns affected? Or is the use of a paranthesised expression instead of a literal the thing that makes the difference? GCC is generally quite lax at including “#”, so I'd expect more fallout than this. Thanks, Richard > > OK for master? > thanks > Iain > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (_rol3): Add a '#' > mark in front of the immediate quantity. > (_rolsi3_uxtw): Likewise. > --- > gcc/config/aarch64/aarch64.md | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 45d9c6ac45a..e0de82c938a 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -4416,7 +4416,7 @@ > (match_operand:QI 2 "aarch64_shift_imm_" "n")) >(match_operand:GPI 3 "register_operand" "r")))] > "" > - "\\t%0, %3, %1, ror ( - %2)" > + "\\t%0, %3, %1, ror #( - %2)" > [(set_attr "type" "logic_shift_imm")] > ) > > @@ -4441,7 +4441,7 @@ > (match_operand:QI 2 "aarch64_shift_imm_si" "n")) >(match_operand:SI 3 "register_operand" "r"] > "" > - "\\t%w0, %w3, %w1, ror (32 - %2)" > + "\\t%w0, %w3, %w1, ror #(32 - %2)" > [(set_attr "type" "logic_shift_imm")] > )
Re: make FOR_EACH_IMM_USE_STMT safe for early exits
On 1/9/21 3:33 PM, Alexandre Oliva wrote: On Jan 7, 2021, Richard Biener wrote: On Wed, Jan 6, 2021 at 12:34 PM Alexandre Oliva wrote: On Jan 4, 2021, Richard Biener wrote: Thus, please remove uses of BREAK_FROM_IMM_USE_STMT together with this patch. And RETURN_FROM_IMM_USE_STMT, I suppose? Sure. Done. make FOR_EACH_IMM_USE_STMT safe for early exits Use a dtor to automatically remove ITER from IMM_USE list in FOR_EACH_IMM_USE_STMT. Regstrapped on x86_64-linux-gnu. Ok to install? Nice! I like it. One less wart for my epitaph :-). Thanks Andrew
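For readers new to this corner of GIMPLE: the iterator behind FOR_EACH_IMM_USE_STMT links itself into the immediate-use list, and an early break or return used to leave it linked unless the BREAK_FROM_IMM_USE_STMT / RETURN_FROM_IMM_USE_STMT macros were used. Giving the iterator a destructor makes that cleanup automatic. A generic C++ sketch of the idea (GCC's real imm_use_iterator carries more state than this):

  struct use_node
  {
    use_node *prev = nullptr;
    use_node *next = nullptr;
  };

  struct imm_use_iter_sketch
  {
    use_node node;   /* linked into the statement's use list while iterating */

    ~imm_use_iter_sketch ()
    {
      /* Runs on every exit path, including break/return/goto, so the
         list never retains a dangling iterator node.  */
      if (node.prev)
        node.prev->next = node.next;
      if (node.next)
        node.next->prev = node.prev;
    }
  };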
[PATCH] if-to-switch: fix also virtual phis
Hello. As seen in the PR, we need to fix also virtual PHIs, otherwise TODO_cfg will skip edges for a missing PHI argument. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin gcc/ChangeLog: PR tree-optimization/98455 * gimple-if-to-switch.cc (condition_info::record_phi_mapping): Record also virtual PHIs. gcc/testsuite/ChangeLog: PR tree-optimization/98455 * gcc.dg/tree-ssa/pr98455.c: New test. --- gcc/gimple-if-to-switch.cc | 7 ++- gcc/testsuite/gcc.dg/tree-ssa/pr98455.c | 19 +++ 2 files changed, 21 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr98455.c diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc index 560753d0311..96213d86c28 100644 --- a/gcc/gimple-if-to-switch.cc +++ b/gcc/gimple-if-to-switch.cc @@ -91,11 +91,8 @@ condition_info::record_phi_mapping (edge e, mapping_vec *vec) gsi_next (&gsi)) { gphi *phi = gsi.phi (); - if (!virtual_operand_p (gimple_phi_result (phi))) - { - tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e); - vec->safe_push (std::make_pair (phi, arg)); - } + tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e); + vec->safe_push (std::make_pair (phi, arg)); } } diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c b/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c new file mode 100644 index 000..24e249f6fcb --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c @@ -0,0 +1,19 @@ +/* PR tree-optimization/98455 */ +/* { dg-do compile } */ +/* { dg-options "-O1 -fno-tree-dce --param case-values-threshold=1" } */ + +void +n4 (int io, int vb) +{ + double uc[2] = { 1.0, 2.0, }; + + if (io == 0) +uc[0] = 0.0; + + for (;;) +if (io == 0) + if (vb == 0) +uc[0] = uc[1]; + else if (vb == 1) +uc[1] = 0.0; +} -- 2.29.2
Re: [PATCH] libphobos: Allow building libphobos using Solaris/x86 assembler
Hi Iain, > Having a look, actually I can just use the presence of TEST_OUTPUT to be > a gate for whether to prune all output or not. that's quite nice indeed. > Maybe this can be improved later to extract the contents of TEST_OUTPUT, > but for now, it has caught a few hidden bugs in the tests that I have > handled in the patch below (no changelog entry yet). I've tried the patch myself last night on both i386-pc-solaris2.11 (both with as and gas) and sparc-sun-solaris2.11 (gas only until PR d/98584 is fixed) and found a few issues: > +proc gdc-extra-test-options { fdout test } { > +switch $test { [...] > + "runnable/test42.d" { > + # Tests that overflow line limits of older assemblers. > + puts $fdout "// { dg-xfail-if \"Lines exceed 10240 characters\" { > *-*-solaris2.* && { ! gas } } }" This doesn't work: I get +XPASS: gdc.test/runnable/test42.d (test for excess errors) +UNRESOLVED: gdc.test/runnable/test42.d compilation failed to produce executable +XPASS: gdc.test/runnable/test42.d -shared-libphobos (test for excess errors) +UNRESOLVED: gdc.test/runnable/test42.d -shared-libphobos compilation failed to produce executable which is no wonder: due to the presence of TEST_OUTPUT in the test, all output gets pruned, including the assembler message. This leads to success myInt int myBool bool i s C6test42__T4T219TiZ1C C6test427test219FZ8__mixin11C Input string too long, limit 10240 compiler exited with status 1 XPASS: gdc.test/runnable/test42.d (test for excess errors) Unfortunately the exit status of compiler and assembler is ignored by the testsuite framework when deciding whether the compilation has succeeded. Besides, the dg-xfail-if above should be restricted to *86*-*-solaris2.* && ! gas (to allow for both i?86-*-solaris2.* and x86_64-*-solaris2.*): I've previously run a Solaris/SPARC build with as and libphobos enabled and only the 64-bit gdc.test/runnable/test42.d execution tests FAIL. While the Solaris sparc and x86 assemblers have a common origin, the code bases have diverged over time and the sparc assembler doesn't have this particular low limit. However, testing on Solaris/SPARC with gas and libphobos showed another issue: +FAIL: gdc.test/runnable/traits_getPointerBitmap.d (test for excess errors) UNRESOLVED: gdc.test/runnable/traits_getPointerBitmap.d compilation failed to produce executable +FAIL: gdc.test/runnable/traits_getPointerBitmap.d -shared-libphobos (test for excess errors) UNRESOLVED: gdc.test/runnable/traits_getPointerBitmap.d -shared-libphobos compilation failed to produce executable Excess errors: runnable/traits_getPointerBitmap.d:220:9: error: vector type __vector(float[4]) is not supported on this platform Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] aarch64 : Mark rotate immediates with '#' as per DDI0487iFc.
Hi Richard, Richard Sandiford wrote: Iain Sandoe writes: The armv8_arm manual [C6.2.226, ROR (immediate)] uses a # in front of the immediate rotation quantity. Although, it seems, GAS is able to infer the # (or is leninent about its absence) assemblers based on the LLVM back end expect it and error out. tested on aarch64-linux-gnu (gcc115) and aarch64-darwin20 (experimental) Sorry for the slow reply, didn't see this till now. Hmm .. I did CC you directly ( but having some troubles with this ISP which I am trying to resolve - emails going missing or have to be re-sent … :/ ) The patch is OK in principle, and personally I prefer “#”. But how far does this spread? Are only ROR modifiers on logical patterns affected? Or is the use of a paranthesised expression instead of a literal the thing that makes the difference? perhaps, Unfortunately, I have only a cause and effect mechanism to check this (i.e. problems that prevent bootstrap or ones I’ve triaged in the testsuite fails). For X86 and PPC I have a more direct way of invoking the LLVM backend (not had time to extend that to aarch64 yet). .. but via XCode (which is what 99.99% of macOS folks will use as their ‘binutils’) it’s done by “as” ==> "clang -cc1as” (effectively an assembler with preprocessing). This issue certainly fires there - and I’d have no reason to think that the preprocessor would make any material difference, so it’s likely that the LLVM backend is not set up to deduce that the value is a constant and infer the ‘#’. GCC is generally quite lax at including “#”, so I'd expect more fallout than this. This is one of three issues (that are not directly related to mach-o format or relocation syntax) that I’ve encountered in my experimental port that happen to trigger in bootstrap. (the others are a format specifier type naming clash and one related to Darwin having signed chars by default - so this is the only asm syntax one remaining - Alex fixed the other fails earlier in the year). Of course, it’s possible that there are other cases in the testsuite fallout that i’ve not yet examined - there’s quite a lot still to triage :(. Once upon a time there were some tests (at least for PPC and X86) that essentially were a single file containing all the insns - so that could be thrown at an assembler and would be expected to complete without error - but I don’t know if there’s an equivalent for aarch64. cheers Iain * I suppose someone could grep ‘#’ in the armv8_arm and c.f. cases against aarch64.md (I dare not volunteer, at present, since I’m already way overcomitted with non-$dayjob). Thanks, Richard OK for master? thanks Iain gcc/ChangeLog: * config/aarch64/aarch64.md (_rol3): Add a '#' mark in front of the immediate quantity. (_rolsi3_uxtw): Likewise. --- gcc/config/aarch64/aarch64.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 45d9c6ac45a..e0de82c938a 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -4416,7 +4416,7 @@ (match_operand:QI 2 "aarch64_shift_imm_" "n")) (match_operand:GPI 3 "register_operand" "r")))] "" - "\\t%0, %3, %1, ror ( - %2)" + "\\t%0, %3, %1, ror #( - %2)" [(set_attr "type" "logic_shift_imm")] ) @@ -4441,7 +4441,7 @@ (match_operand:QI 2 "aarch64_shift_imm_si" "n")) (match_operand:SI 3 "register_operand" "r"] "" - "\\t%w0, %w3, %w1, ror (32 - %2)" + "\\t%w0, %w3, %w1, ror #(32 - %2)" [(set_attr "type" "logic_shift_imm")] )
Re: [PATCH] if-to-switch: fix also virtual phis
) On Tue, Jan 12, 2021 at 3:50 PM Martin Liška wrote: > > Hello. > > As seen in the PR, we need to fix also virtual PHIs, otherwise > TODO_cfg will skip edges for a missing PHI argument. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? OK - doesn't this mean you can remove the mark_virtual_operands_for_renaming (fun); call and thus TODO_update_ssa? Btw, the pass seems to unconditionally schedule TODO_cleanup_cfg - it would be nice to only do that (return TODO_cleanup_cfg from pass_if_to_switch::execute) if it did any transform. Thanks, Richard. > Thanks, > Martin > > gcc/ChangeLog: > > PR tree-optimization/98455 > * gimple-if-to-switch.cc (condition_info::record_phi_mapping): > Record also virtual PHIs. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/98455 > * gcc.dg/tree-ssa/pr98455.c: New test. > --- > gcc/gimple-if-to-switch.cc | 7 ++- > gcc/testsuite/gcc.dg/tree-ssa/pr98455.c | 19 +++ > 2 files changed, 21 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr98455.c > > diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc > index 560753d0311..96213d86c28 100644 > --- a/gcc/gimple-if-to-switch.cc > +++ b/gcc/gimple-if-to-switch.cc > @@ -91,11 +91,8 @@ condition_info::record_phi_mapping (edge e, mapping_vec > *vec) > gsi_next (&gsi)) > { > gphi *phi = gsi.phi (); > - if (!virtual_operand_p (gimple_phi_result (phi))) > - { > - tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e); > - vec->safe_push (std::make_pair (phi, arg)); > - } > + tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e); > + vec->safe_push (std::make_pair (phi, arg)); > } > } > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c > new file mode 100644 > index 000..24e249f6fcb > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c > @@ -0,0 +1,19 @@ > +/* PR tree-optimization/98455 */ > +/* { dg-do compile } */ > +/* { dg-options "-O1 -fno-tree-dce --param case-values-threshold=1" } */ > + > +void > +n4 (int io, int vb) > +{ > + double uc[2] = { 1.0, 2.0, }; > + > + if (io == 0) > +uc[0] = 0.0; > + > + for (;;) > +if (io == 0) > + if (vb == 0) > +uc[0] = uc[1]; > + else if (vb == 1) > +uc[1] = 0.0; > +} > -- > 2.29.2 >
Re: [PATCH] aarch64 : Mark rotate immediates with '#' as per DDI0487iFc.
Iain Sandoe writes:
> Hi Richard,
>
> Richard Sandiford wrote:
>
>> Iain Sandoe writes:
>
>>> The armv8_arm manual [C6.2.226, ROR (immediate)] uses a # in front
>>> of the immediate rotation quantity.
>>>
>>> Although, it seems, GAS is able to infer the # (or is leninent about
>>> its absence) assemblers based on the LLVM back end expect it and error
>>> out.
>>>
>>> tested on aarch64-linux-gnu (gcc115) and aarch64-darwin20 (experimental)
>>
>> Sorry for the slow reply, didn't see this till now.
>
> Hmm .. I did CC you directly ( but having some troubles with this ISP which I am
> trying to resolve - emails going missing or have to be re-sent … :/ )

Yeah, I got the message, I just didn't see it, sorry.

>> The patch is OK in principle, and personally I prefer “#”.  But how far
>> does this spread?  Are only ROR modifiers on logical patterns affected?
>> Or is the use of a paranthesised expression instead of a literal the thing
>> that makes the difference?
>
> perhaps,

Trying it out locally, that does seem to be the difference:

    and x1, x2, x3, ror 1      // OK
    and x1, x2, x3, ror (1)    // Rejected
    and x1, x2, x3, ror #(1)   // OK

Same for the other modifiers I tried.  That doesn't look intentional,
but whether it's intentional obviously isn't important in this context.

So yeah, the patch is OK.

Thanks,
Richard
Re: [PATCH] libstdc++: implement locale support for AIX
Hi everyone, I've reworked the patch to merged dragonfly and AIX models into the new one named "ieee_1003.1-2008". It seems okay on the AIX part but if someone can test on Dragonfly and Freebsd I would be glad. Configure needs to be regenerated, first. For now, I've used #ifdef inside the code for the few differences. There are less than I thought. So, it seems okay to me. However, I haven't changed all the (locale_t) casts made on Dragonfly when passing a locale object to a syscall. On AIX, this is transparent, __c_locale being locale_t. I haven't updated tests failing on AIX yet. And I don't know for the Changelog/Patch, how renamed files must be handled ? I've retrieved my git commit with "git format-patch --no-renames" and thus the Changelog is made of "Removed" and "New File". Is it okay or is there anything special to add ? I didn't find the answer quickly. Thanks, ClémentFrom d36a04c7d4eb390817d711dc59ad0e0759bccdce Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Cl=C3=A9ment=20Chigot?= Date: Tue, 29 Dec 2020 11:08:33 +0100 Subject: [PATCH] libstdc++: implement locale support for POSIX 2008 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The implementation is based on dragonfly one. It also adds support for AIX with a few tweaks. As of now, a few locale functions are missing on AIX. For strftime_l, localeconv_l, mbstowcs_l and wcsftime_l, uselocale must be set prior to use the version without _l. For strtof_l, strtod_l, strtold_l, a wrapper simply calls the default version. libstdc++-v3/ChangeLog: 2021-01-12 Clément Chigot * acinclude.m4: Add ieee_1003.1-2008 locale model. * configure: Regenerate. * config/os/aix/ctype_configure_char.cc: Enable locale support. * testsuite/lib/libstdc++.exp (check_v3_target_namedlocale): Handle AIX locale names. * testsuite/util/testsuite_hooks.h: Likewise. * config/locale/dragonfly/c_locale.cc: Removed. * config/locale/dragonfly/c_locale.h: Removed. * config/locale/dragonfly/codecvt_members.cc: Removed. * config/locale/dragonfly/collate_members.cc: Removed. * config/locale/dragonfly/ctype_members.cc: Removed. * config/locale/dragonfly/monetary_members.cc: Removed. * config/locale/dragonfly/numeric_members.cc: Removed. * config/locale/dragonfly/time_members.cc: Removed. * config/locale/dragonfly/time_members.h: Removed. * config/locale/ieee_1003.1-2008/c_locale.cc: New file. * config/locale/ieee_1003.1-2008/c_locale.h: New file. * config/locale/ieee_1003.1-2008/codecvt_members.cc: New file. * config/locale/ieee_1003.1-2008/collate_members.cc: New file. * config/locale/ieee_1003.1-2008/ctype_members.cc: New file. * config/locale/ieee_1003.1-2008/monetary_members.cc: New file. * config/locale/ieee_1003.1-2008/numeric_members.cc: New file. * config/locale/ieee_1003.1-2008/time_members.cc: New file. * config/locale/ieee_1003.1-2008/time_members.h: New file. 
--- libstdc++-v3/acinclude.m4 | 41 ++--- .../c_locale.cc | 3 + .../c_locale.h| 31 ++ .../codecvt_members.cc| 0 .../collate_members.cc| 0 .../ctype_members.cc | 58 +++ .../monetary_members.cc | 39 - .../numeric_members.cc| 27 + .../time_members.cc | 32 +- .../time_members.h| 0 .../config/os/aix/ctype_configure_char.cc | 44 +++--- libstdc++-v3/testsuite/lib/libstdc++.exp | 5 ++ libstdc++-v3/testsuite/util/testsuite_hooks.h | 3 + 13 files changed, 252 insertions(+), 31 deletions(-) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/c_locale.cc (99%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/c_locale.h (80%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/codecvt_members.cc (100%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/collate_members.cc (100%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/ctype_members.cc (82%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/monetary_members.cc (97%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/numeric_members.cc (93%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/time_members.cc (94%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/time_members.h (100%) diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index e4175ea3e64..64a91b72359 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -2421,7 +2421,7 @@ dnl AC_DEFUN([GLIBCXX_ENABLE_CLOCALE], [ GLIBCXX_ENA
Re: [PATCH] libstdc++: implement locale support for AIX
On 12/01/21 15:14 +, CHIGOT, CLEMENT wrote: Hi everyone, I've reworked the patch to merged dragonfly and AIX models into the new one named "ieee_1003.1-2008". It seems okay on the AIX part but if someone can test on Dragonfly and Freebsd I would be glad. Configure needs to be regenerated, first. Presumably it could also be tested on GNU/Linux and Solaris, since they implement the POSIX 2008 APIs needed.
Re: [PATCH] libstdc++: implement locale support for AIX
> And I don't know for the Changelog/Patch, how renamed
> files must be handled ? I've retrieved my git commit with
> "git format-patch --no-renames" and thus the Changelog
> is made of "Removed" and "New File". Is it okay or is there
> anything special to add ? I didn't find the answer quickly.

Actually, in the one I've sent, I forgot the "--no-renames", so it uses git's rename mechanism...

Thanks,
Clément
Re: [PATCH] libstdc++: implement locale support for AIX
> >Hi everyone,
> >
> >I've reworked the patch to merged dragonfly and AIX
> >models into the new one named "ieee_1003.1-2008".
> >It seems okay on the AIX part but if someone can test
> >on Dragonfly and Freebsd I would be glad. Configure
> >needs to be regenerated, first.
>
> Presumably it could also be tested on GNU/Linux and Solaris, since
> they implement the POSIX 2008 APIs needed.

Indeed, I'll try it tomorrow for Linux. I don't have any Solaris VMs in my lab. Note that I forgot to add some "defined(__FreeBSD__)" guards in the current patch, so it will not work there yet. I'll send a better one tomorrow after checking on Linux.
Re: [PATCH] libstdc++: implement locale support for AIX
On Tue, Jan 12, 2021 at 10:25 AM Jonathan Wakely wrote: > > On 12/01/21 15:14 +, CHIGOT, CLEMENT wrote: > >Hi everyone, > > > >I've reworked the patch to merged dragonfly and AIX > >models into the new one named "ieee_1003.1-2008". > >It seems okay on the AIX part but if someone can test > >on Dragonfly and Freebsd I would be glad. Configure > >needs to be regenerated, first. > > Presumably it could also be tested on GNU/Linux and Solaris, since > they implement the POSIX 2008 APIs needed. GNU/Linux currently uses the "gnu" locale configuration. Are you suggesting that GNU/Linux use the new POSIX 2008 locale configuration by default or to use GNU/Linux as another sniff test? Thanks, David
Re: [PATCH] libstdc++: implement locale support for AIX
Hi Jonathan,

>> On 12/01/21 15:14 +, CHIGOT, CLEMENT wrote:
>>>Hi everyone,
>>>
>>>I've reworked the patch to merged dragonfly and AIX
>>>models into the new one named "ieee_1003.1-2008".
>>>It seems okay on the AIX part but if someone can test
>>>on Dragonfly and Freebsd I would be glad. Configure
>>>needs to be regenerated, first.
>>
>> Presumably it could also be tested on GNU/Linux and Solaris, since
>> they implement the POSIX 2008 APIs needed.

I'll give the patch a whirl on Solaris. However, we will need to
distinguish between 11.3 (which is XPG6 only) and 11.4 (which supports XPG7).

Rainer

--
-
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] libstdc++: implement locale support for AIX
Hi Clement, > I've reworked the patch to merged dragonfly and AIX > models into the new one named "ieee_1003.1-2008". > It seems okay on the AIX part but if someone can test > on Dragonfly and Freebsd I would be glad. Configure > needs to be regenerated, first. > > For now, I've used #ifdef inside the code for the few > differences. There are less than I thought. So, it seems > okay to me. TBH, I find this liberal sprinkling of target-specific #ifdefs over the code horrible: it's completely out of style with GCC conventions. This is exactly what autoconf is for: test for the existance of headers and functions like strtof_l and act accordingly. If in the future other OSes will make use of the code, most differences will already be handled and it's way easier to understand defined(HAVE_STRTOF_L) than defined(_AIX) || defined(HP_UX). From your code one simply cannot tell why some code is used on AIX while something else on (say) DragonflyBSD. It gets even worse when (as does happen) standard support differs between versions: expressing this with platform ifdefs is nothing short of a nightmare. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
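Put in code terms, the suggestion is to let configure probe for each per-locale function and key the sources off the resulting HAVE_* macro rather than off the platform. A hypothetical sketch (the macro name and helper function are assumptions, not existing libstdc++ configury):

  #include <locale.h>   /* locale_t (POSIX.1-2008) */
  #include <stdlib.h>   /* strtof, and strtof_l where it exists */

  float
  parse_float_c_locale (const char *s, char **end, locale_t cloc)
  {
  #ifdef HAVE_STRTOF_L            /* would come from an AC_CHECK_FUNCS probe */
    return strtof_l (s, end, cloc);
  #else
    /* Fallback in the spirit of the AIX wrapper described in the thread:
       assume uselocale (cloc) is already in effect.  */
    (void) cloc;
    return strtof (s, end);
  #endif
  }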
Re: [PATCH][pushed] gcov: add more debugging facility
On Tue, Jan 12, 2021 at 3:55 AM Martin Liška wrote: > > The patch is about an extensive debugging facility that I see > beneficial for GCOV issue. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > I'm going to push it. > > Thanks, > Martin > > gcc/ChangeLog: > > * gcov.c (source_info::debug): New. > (print_usage): Add --debug (-D) option. > (process_args): Likewise. > (generate_results): Call src->debug after > accumulate_line_counts. > (read_graph_file): Properly assign id for EXIT_BLOCK. > * profile.c (branch_prob): Dump function body before it is > instrumented. > --- > gcc/gcov.c| 43 ++- > gcc/profile.c | 5 + > 2 files changed, 47 insertions(+), 1 deletion(-) > > diff --git a/gcc/gcov.c b/gcc/gcov.c > index 01ca52b215d..93128721ef6 100644 > --- a/gcc/gcov.c > +++ b/gcc/gcov.c > @@ -371,6 +371,9 @@ public: > /* Register a new function. */ > void add_function (function_info *fn); > > + /* Debug the source file. */ > + void debug (); > + > /* Index of the source_info in sources vector. */ > unsigned index; > > @@ -428,6 +431,31 @@ source_info::get_functions_at_location (unsigned > line_num) const > return slot; > } > > +void source_info::debug () > +{ > + fprintf (stderr, "source_info: %s\n", name); > + for (vector::iterator it = functions.begin (); > + it != functions.end (); it++) > +{ > + function_info *fn = *it; > + fprintf (stderr, " function_info: %s\n", fn->get_name ()); > + for (vector::iterator bit = fn->blocks.begin (); > + bit != fn->blocks.end (); bit++) > + { > + fprintf (stderr, "block_info id=%d, count=%ld\n", > + bit->id, bit->count); > + } > +} > + > + for (unsigned lineno = 1; lineno < lines.size (); ++lineno) > +{ > + line_info &line = lines[lineno]; > + fprintf (stderr, " line_info=%d, count=%ld\n", lineno, line.count); > +} > + > + fprintf (stderr, "\n"); > +} > + This breaks build on 32-bit hosts: https://gcc.gnu.org/pipermail/gcc-regression/2021-January/074130.html -- H.J.
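For context, the failure is the usual 64-bit count versus %ld mismatch: gcov counters are 64-bit, while long is 32 bits on those hosts, so the format checks during bootstrap reject the new fprintf calls. The portable spelling goes through the PRId64 macro, as in this small stand-alone example:

  #include <cinttypes>
  #include <cstdio>

  int
  main ()
  {
    std::int64_t count = 1234567890123LL;
    /* "%ld" would be diagnosed (and would truncate) where long is
       32 bits; PRId64 expands to the right conversion everywhere.  */
    std::printf ("count=%" PRId64 "\n", count);
    return 0;
  }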
[committed] LRA: patch to fix PR97969
The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969 The patch was successfully bootstrapped on x86-64. [PR97969] LRA: Transform pattern `plus (plus (hard reg, const), pseudo)` after elimination LRA can loop infinitely on targets without `reg + imm` insns. Register elimination on such targets can increase register pressure resulting in permanent stack size increase and changing elimination offset. To avoid such situation, a simple transformation can be done to avoid register pressure increase after generating reload insns containing eliminated hard regs. gcc/ChangeLog: PR target/97969 * lra-eliminations.c (eliminate_regs_in_insn): Add transformation of pattern 'plus (plus (hard reg, const), pseudo)'. gcc/testsuite/ChangeLog: PR target/97969 * gcc.target/arm/pr97969.c: New. diff --git a/gcc/lra-eliminations.c b/gcc/lra-eliminations.c index b28f3c410d3..ebcadd1006c 100644 --- a/gcc/lra-eliminations.c +++ b/gcc/lra-eliminations.c @@ -885,7 +885,7 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p, poly_int64 update_sp_offset) { int icode = recog_memoized (insn); - rtx old_set = single_set (insn); + rtx set, old_set = single_set (insn); bool validate_p; int i; rtx substed_operand[MAX_RECOG_OPERANDS]; @@ -1038,6 +1038,32 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p, for (i = 0; i < static_id->n_dups; i++) *id->dup_loc[i] = substed_operand[(int) static_id->dup_num[i]]; + /* Transform plus (plus (hard reg, const), pseudo) to plus (plus (pseudo, + const), hard reg) in order to keep insn containing eliminated register + after all reloads calculating its offset. This permits to keep register + pressure under control and helps to avoid LRA cycling in patalogical + cases. */ + if (! replace_p && (set = single_set (insn)) != NULL + && GET_CODE (SET_SRC (set)) == PLUS + && GET_CODE (XEXP (SET_SRC (set), 0)) == PLUS) +{ + rtx reg1, reg2, op1, op2; + + reg1 = op1 = XEXP (XEXP (SET_SRC (set), 0), 0); + reg2 = op2 = XEXP (SET_SRC (set), 1); + if (GET_CODE (reg1) == SUBREG) + reg1 = SUBREG_REG (reg1); + if (GET_CODE (reg2) == SUBREG) + reg2 = SUBREG_REG (reg2); + if (REG_P (reg1) && REG_P (reg2) + && REGNO (reg1) < FIRST_PSEUDO_REGISTER + && REGNO (reg2) >= FIRST_PSEUDO_REGISTER) + { + XEXP (XEXP (SET_SRC (set), 0), 0) = op2; + XEXP (SET_SRC (set), 1) = op1; + } +} + /* If we had a move insn but now we don't, re-recognize it. This will cause spurious re-recognition if the old move had a PARALLEL since the new one still will, but we can't call diff --git a/gcc/testsuite/gcc.target/arm/pr97969.c b/gcc/testsuite/gcc.target/arm/pr97969.c new file mode 100644 index 000..714a1d18870 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr97969.c @@ -0,0 +1,54 @@ +/* { dg-do compile } */ +/* { dg-options "-std=c99 -fno-omit-frame-pointer -mthumb -w -Os" } */ + +typedef a[23]; +enum { b }; +typedef struct { + int c; + char *e; + char f +} d; +typedef enum { g = 1 } h; +typedef struct { + h i; + int j +} k; +typedef struct { + a l; + int a; + int m; + int n; + int o; + short p; + int q; + k r; + char e; + char *s; + d t; + d *u; + short v; + int w +} aa; +c(char x, int y, char z, int ab) { + aa ac; + ac.r.i = 0; + d ad; + ac.t = ad; + ac.u = 0; + ae(&ac.v, 0, 0); + ac.w = 0; + af(&ac, x + y, z, z + ab); + if (ag(0)) +return 0; + if (x) +ac.s = z + ab; + else +ac.s = x + y; + ac.o |= g; + if (!setjmp()) { +ah(ac); +ai(b); +ac.e = z + ab; +aj(ac); + } +}
[PATCH] Fix typo in function-abi.h
Hi all, just pushed the following as obvious. Regards Andrea >From 1aff68d54c33ae10fbbb52adb50527baf7b2f627 Mon Sep 17 00:00:00 2001 From: Andrea Corallo Date: Tue, 12 Jan 2021 17:52:52 +0100 Subject: [PATCH] Fix typo in function-abi.h gcc/Changelog 2021-01-12 Andrea Corallo * function-abi.h: Fix typo. --- gcc/function-abi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/function-abi.h b/gcc/function-abi.h index 0f34c8cfc65..ac58f676e33 100644 --- a/gcc/function-abi.h +++ b/gcc/function-abi.h @@ -1,4 +1,4 @@ -/* Information about fuunction binary interfaces. +/* Information about function binary interfaces. Copyright (C) 2019-2021 Free Software Foundation, Inc. This file is part of GCC -- 2.17.1
Re: [PATCH][pushed] gcov: add more debugging facility
On 1/12/21 5:27 PM, H.J. Lu wrote: This breaks build on 32-bit hosts: Sorry for that. Should be fixed with: commit 248feb2fa2c0dfc8950566010b30351879c9d057 Author: Martin Liska Date: Tue Jan 12 18:16:05 2021 +0100 gcov: fix printf format for 32-bit hosts gcc/ChangeLog: * gcov.c (source_info::debug): Fix printf format for 32-bit hosts. diff --git a/gcc/gcov.c b/gcc/gcov.c index 93128721ef6..5c651a9bdce 100644 --- a/gcc/gcov.c +++ b/gcc/gcov.c @@ -442,7 +442,7 @@ void source_info::debug () for (vector::iterator bit = fn->blocks.begin (); bit != fn->blocks.end (); bit++) { - fprintf (stderr, "block_info id=%d, count=%ld\n", + fprintf (stderr, "block_info id=%d, count=%" PRId64 " \n", bit->id, bit->count); } } @@ -450,7 +450,7 @@ void source_info::debug () for (unsigned lineno = 1; lineno < lines.size (); ++lineno) { line_info &line = lines[lineno]; - fprintf (stderr, " line_info=%d, count=%ld\n", lineno, line.count); + fprintf (stderr, " line_info=%d, count=%" PRId64 "\n", lineno, line.count); } fprintf (stderr, "\n"); I'm going to install the patch. Martin
Re: [PATCH] libstdc++: implement locale support for AIX
On 12/01/21 10:44 -0500, David Edelsohn wrote: On Tue, Jan 12, 2021 at 10:25 AM Jonathan Wakely wrote: On 12/01/21 15:14 +, CHIGOT, CLEMENT wrote: >Hi everyone, > >I've reworked the patch to merged dragonfly and AIX >models into the new one named "ieee_1003.1-2008". >It seems okay on the AIX part but if someone can test >on Dragonfly and Freebsd I would be glad. Configure >needs to be regenerated, first. Presumably it could also be tested on GNU/Linux and Solaris, since they implement the POSIX 2008 APIs needed. GNU/Linux currently uses the "gnu" locale configuration. Are you suggesting that GNU/Linux use the new POSIX 2008 locale configuration by default or to use GNU/Linux as another sniff test? I'm suggesting to use it for testing that the new config is portable to other systems that provide the POSIX 2008 APIs. We wouldn't want to use it by default, because POSIX doesn't provide strtod_l etc that glibc provides. But maybe the "gnu" model should use newlocale and uselocale instead of __newlocale etc. so that we depend on fewer GNU extensions that are only defined with _GNU_SOURCE.
Re: [PATCH] libstdc++: implement locale support for AIX
Hi Jonathan, >> On 12/01/21 15:14 +, CHIGOT, CLEMENT wrote: >>>Hi everyone, >>> >>>I've reworked the patch to merged dragonfly and AIX >>>models into the new one named "ieee_1003.1-2008". >>>It seems okay on the AIX part but if someone can test >>>on Dragonfly and Freebsd I would be glad. Configure >>>needs to be regenerated, first. >> >> Presumably it could also be tested on GNU/Linux and Solaris, since >> they implement the POSIX 2008 APIs needed. > > I'll give the patch a whirl on Solaris. However, we will need to > distinguish between 11.3 (which is XPG6 only) and 11.4 (which support > XPG7). as almost expected, a build on Solaris 11.4 failed miserably due to the use of the various BSD extensions (localeconv_l, mbstowcs_l, strtod_l, strtof_l, strtold_l, wcsftime_l), whose use or fallback implementations are currently guarded by _AIX. Should those be used in a directory supposed to conform to POSIX.1-2008 at all? OTOH, almost duplicating the code into a separate bsd (or whatever, it's certainly not only DragonflyBSD) directory for this sole reason would be a shame. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] libstdc++: implement locale support for AIX
On Tue, Jan 12, 2021 at 12:41 PM Rainer Orth wrote: > > Hi Jonathan, > > >> On 12/01/21 15:14 +, CHIGOT, CLEMENT wrote: > >>>Hi everyone, > >>> > >>>I've reworked the patch to merged dragonfly and AIX > >>>models into the new one named "ieee_1003.1-2008". > >>>It seems okay on the AIX part but if someone can test > >>>on Dragonfly and Freebsd I would be glad. Configure > >>>needs to be regenerated, first. > >> > >> Presumably it could also be tested on GNU/Linux and Solaris, since > >> they implement the POSIX 2008 APIs needed. > > > > I'll give the patch a whirl on Solaris. However, we will need to > > distinguish between 11.3 (which is XPG6 only) and 11.4 (which support > > XPG7). > > as almost expected, a build on Solaris 11.4 failed miserably due to the > use of the various BSD extensions (localeconv_l, mbstowcs_l, strtod_l, > strtof_l, strtold_l, wcsftime_l), whose use or fallback implementations > are currently guarded by _AIX. Should those be used in a directory > supposed to conform to POSIX.1-2008 at all? OTOH, almost duplicating > the code into a separate bsd (or whatever, it's certainly not only > DragonflyBSD) directory for this sole reason would be a shame. Hi, Rainer Thanks for testing. I agree that #ifdef's are not the correct approach, but, if you enable the fallbacks for Solaris, does everything then work? Are those fallbacks portable and we solely need a better mechanism to enable them on platforms that require them? Thanks, David
[PATCH] MAINTAINERS: Fix spacing
We indent with tabs, not spaces. This fixes it. Committed. 2021-01-12 Segher Boessenkool * MAINTAINERS: Fix spacing. --- MAINTAINERS | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index e88808f..6ec346c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -415,7 +415,7 @@ Jiufu Guo Xuepeng Guo Wei Guozhi Mostafa Hagog -Andrew Haley +Andrew Haley Frederik Harwath Stuart Hastings Michael Haubenwallner @@ -451,7 +451,7 @@ Daniel Jacobowitz Andreas Jaeger Harsha Jagasia Fariborz Jahanian -Qian Jianhua +Qian Jianhua Janis Johnson Teresa Johnson Kean Johnston @@ -465,7 +465,7 @@ Andi Kleen Jeff Knaggs Michael Koch Nicolas Koenig -Boris Kolpackov +Boris Kolpackov Dave Korn Julia Koval Hongtao Liu @@ -480,7 +480,7 @@ Razya Ladelsky Thierry Lafage Aaron W. LaFramboise Rask Ingemann Lambertsen -Jerome Lambourg +Jerome Lambourg Asher Langton Chris Lattner Terry Laurenzo -- 1.8.3.1
Re: [COMMITED] MAINTAINERS: Add myself for write after approval
Hi! On Tue, Jan 12, 2021 at 03:44:22PM +0800, Qian Jianhua wrote: > ChangeLog: > > 2021-01-12 Qian Jianhua > > * MAINTAINERS (Write After Approval): Add myself Welcome! > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -451,6 +451,7 @@ Daniel Jacobowitz > Andreas Jaeger > Harsha Jagasia > Fariborz Jahanian > +Qian Jianhua Indents should be with tabs here. I fixed it (and some others). Segher
[PATCH, Fortran] Bug fix in ISO_Fortran_binding - unsigned char arrays
Hi everyone, In a previous email thread (subject: Possible bug re: ISO_Fortran_binding.h) it was determined that there was a bug in ISO_Fortran_binding.c. When attempting to use the ISO Fortran binding from C to create an array of type uint8_t or signed char, a crash would result unless the element size was manually specified as 1 when calling CFI_establish. The F2018 standard indicates that signed char / uint8_t should be an integer type and therefore explicitly specifying the element size should not be required. The attached patch fixes the issue (test case included). OK for master? I don't have write access so I will need someone else to commit this for me if possible. Thanks, Harris Snyder Fixes a bug in ISO_Fortran_binding.c whereby signed char or uint8_t arrays would cause crashes unless an element size is specified. libgfortran/ChangeLog: * runtime/ISO_Fortran_binding.c (CFI_establish): fixed signed char arrays. gcc/testsuite/ChangeLog: * gfortran.dg/iso_fortran_binding_uint8_array.f90: New test. * gfortran.dg/iso_fortran_binding_uint8_array_driver.c: New test. diff --git a/gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array.f90 b/gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array.f90 new file mode 100644 index 000..bf85bf52949 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array.f90 @@ -0,0 +1,11 @@ +! { dg-do run } +! { dg-additional-sources iso_fortran_binding_uint8_array_driver.c } + +module m + use iso_c_binding +contains + subroutine fsub( x ) bind(C, name="fsub") + integer(c_int8_t), intent(inout) :: x(:) + x = x+1 + end subroutine +end module diff --git a/gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array_driver.c b/gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array_driver.c new file mode 100644 index 000..84ed127d525 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array_driver.c @@ -0,0 +1,25 @@ +#include +#include +#include +#include "ISO_Fortran_binding.h" + +extern void fsub(CFI_cdesc_t *); + +int main(void) +{ + int8_t x[] = {1,2,3,4}; + int N = sizeof(x)/sizeof(x[0]); + + CFI_CDESC_T(1) dat; + CFI_index_t ext[1]; + ext[0] = (CFI_index_t)N; + int rc = CFI_establish((CFI_cdesc_t *)&dat, &x, CFI_attribute_other, + CFI_type_int8_t, 0, (CFI_rank_t)1, ext); + printf("CFI_establish call returned: %d\n", rc); + + fsub((CFI_cdesc_t *)&dat ); + + for (int i=0; ibase_addr = base_addr; if (type == CFI_type_char || type == CFI_type_ucs4_char || - type == CFI_type_signed_char || type == CFI_type_struct || - type == CFI_type_other) + type == CFI_type_struct || type == CFI_type_other) dv->elem_len = elem_len; else {
[PATCH] sh: Remove match_scratch operand test
This patch fixes a regression on sh4 introduced by the rtl-ssa stuff. The port had a pattern: (define_insn "movsf_ie" [(set (match_operand:SF 0 "general_movdst_operand" "=f,r,f,f,fy, f,m, r, r,m,f,y,y,rf,r,y,<,y,y") (match_operand:SF 1 "general_movsrc_operand" " f,r,G,H,FQ,mf,f,FQ,mr,r,y,f,>,fr,y,r,y,>,y")) (use (reg:SI FPSCR_MODES_REG)) (clobber (match_scratch:SI 2 "=X,X,X,X,&z, X,X, X, X,X,X,X,X, y,X,X,X,X,X"))] "TARGET_SH2E && (arith_reg_operand (operands[0], SFmode) || fpul_operand (operands[0], SFmode) || arith_reg_operand (operands[1], SFmode) || fpul_operand (operands[1], SFmode) || arith_reg_operand (operands[2], SImode))" But recog can generate this pattern from something that matches: [(set (match_operand:SF 0 "general_movdst_operand") (match_operand:SF 1 "general_movsrc_operand") (use (reg:SI FPSCR_MODES_REG))] with recog adding the (clobber (match_scratch:SI)) automatically. recog tests the C condition before adding the clobber, so there might not be an operands[2] to test. Similarly, gen_movsf_ie takes only two arguments, with operand 2 being filled in automatically. The only way to create this pattern with a REG operands[2] before RA would be to generate it directly from RTL. AFAICT the only things that do this are the secondary reload patterns, which are generated during RA and come with pre-vetted operands. arith_reg_operand rejects 6 specific registers: return (regno != T_REG && regno != PR_REG && regno != FPUL_REG && regno != FPSCR_REG && regno != MACH_REG && regno != MACL_REG); The fpul_operand tests allow FPUL_REG, leaving 5 invalid registers. However, in all alternatives of movsf_ie, either operand 0 or operand 1 is a register that belongs r, f or y, none of which include any of the 5 rejected registers. This means that any post-RA pattern would satisfy the operands[0] or operands[1] condition without the operands[2] test being necessary. Tested on sh4-elf, which it fixes quite a few ICEs. OK to install? Richard gcc/ * config/sh/sh.md (movsf_ie): Remove operands[2] test. --- gcc/config/sh/sh.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md index fe1231612cc..e3af9ae21c1 100644 --- a/gcc/config/sh/sh.md +++ b/gcc/config/sh/sh.md @@ -6067,8 +6067,7 @@ (define_insn "movsf_ie" && (arith_reg_operand (operands[0], SFmode) || fpul_operand (operands[0], SFmode) || arith_reg_operand (operands[1], SFmode) - || fpul_operand (operands[1], SFmode) - || arith_reg_operand (operands[2], SImode))" + || fpul_operand (operands[1], SFmode))" "@ fmov%1,%0 mov %1,%0
Re: [PATCH] libstdc++: implement locale support for AIX
Hi David, >> >> Presumably it could also be tested on GNU/Linux and Solaris, since >> >> they implement the POSIX 2008 APIs needed. >> > >> > I'll give the patch a whirl on Solaris. However, we will need to >> > distinguish between 11.3 (which is XPG6 only) and 11.4 (which support >> > XPG7). >> >> as almost expected, a build on Solaris 11.4 failed miserably due to the >> use of the various BSD extensions (localeconv_l, mbstowcs_l, strtod_l, >> strtof_l, strtold_l, wcsftime_l), whose use or fallback implementations >> are currently guarded by _AIX. Should those be used in a directory >> supposed to conform to POSIX.1-2008 at all? OTOH, almost duplicating >> the code into a separate bsd (or whatever, it's certainly not only >> DragonflyBSD) directory for this sole reason would be a shame. > > Hi, Rainer > > Thanks for testing. > > I agree that #ifdef's are not the correct approach, but, if you enable > the fallbacks for Solaris, does everything then work? Are those > fallbacks portable and we solely need a better mechanism to enable > them on platforms that require them? it mostly compiles, with two caveats: * c_locale.h needs to include for declarations of strtod and friends. * Solaris only declares the int_p_cs_precedes etc. members of struct lconv for C99+, but not for C++11+, as it should. I'll file a bug for that, but for now one can work around the issue by defining _LCONV_C99 before including in monetary_members.cc. With those changes, I can at least build libstdc++ with --enable-clocale=ieee_1003.1-2008. Bootstrap still running, though. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
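A minimal sketch of that second workaround, on the assumption that the include in question is <locale.h> (where struct lconv is declared):

  /* At the top of monetary_members.cc, before the first include of
     <locale.h>, so Solaris exposes the C99 int_p_cs_precedes family
     to C++ as well.  */
  #define _LCONV_C99 1
  #include <locale.h>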
[committed] avoid two more ICEs in print_mem_ref (PR c/98597)
A recent improvement to MEM_REF formatting introduced a few invalid assumptions that didn't get exposed during testing and ended up causing ICEs in downstream packages and some distress to their maintainers. I have committed the attached patch to remove two of these assumptions and unblock the package builds. Since the MEM_REF formatting is clearly not being sufficiently exercised by the test suite I've also instrumented GCC to format every MEM_REF the uninit pass comes across, independent of whether or nor it is initialized, and will build a few packages with it to uncover any outstanding bugs. Martin commit 5a9cfad2de92f2d65585774acb524b3fa17621b5 Author: Martin Sebor Date: Tue Jan 12 12:58:27 2021 -0700 Avoid a couple more ICEs in print_mem_ref (PR c/98597). Resolves: PR c/98597 - ICE in -Wuninitialized printing a MEM_REF PR c/98592 - ICE in gimple_canonical_types_compatible_p while formatting gcc/c-family/ChangeLog: PR c/98597 PR c/98592 * c-pretty-print.c (print_mem_ref): Avoid assuming MEM_REF operand has pointer type. Remove redundant code. Avoid calling gimple_canonical_types_compatible_p. gcc/testsuite/ChangeLog: PR c/98597 PR c/98592 * g++.dg/warn/Wuninitialized-13.C: New test. gcc.dg/uninit-39.c: New test. # diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-print.c index 87301a2091c..4d17c270b16 100644 --- a/gcc/c-family/c-pretty-print.c +++ b/gcc/c-family/c-pretty-print.c @@ -1847,10 +1847,11 @@ print_mem_ref (c_pretty_printer *pp, tree e) tree access_type = TREE_TYPE (e); if (TREE_CODE (access_type) == ARRAY_TYPE) access_type = TREE_TYPE (access_type); - tree arg_type = TREE_TYPE (TREE_TYPE (arg)); + tree arg_type = TREE_TYPE (arg); + if (POINTER_TYPE_P (arg_type)) +arg_type = TREE_TYPE (arg_type); if (TREE_CODE (arg_type) == ARRAY_TYPE) arg_type = TREE_TYPE (arg_type); - if (tree access_size = TYPE_SIZE_UNIT (access_type)) if (TREE_CODE (access_size) == INTEGER_CST) { @@ -1866,16 +1867,13 @@ print_mem_ref (c_pretty_printer *pp, tree e) /* True to include a cast to the accessed type. */ const bool access_cast = VOID_TYPE_P (arg_type) -|| !gimple_canonical_types_compatible_p (access_type, arg_type); +|| TYPE_MAIN_VARIANT (access_type) != TYPE_MAIN_VARIANT (arg_type); if (byte_off != 0) { /* When printing the byte offset for a pointer to a type of a different size than char, include a cast to char* first, before printing the cast to a pointer to the accessed type. */ - tree arg_type = TREE_TYPE (TREE_TYPE (arg)); - if (TREE_CODE (arg_type) == ARRAY_TYPE) - arg_type = TREE_TYPE (arg_type); offset_int arg_size = 0; if (tree size = TYPE_SIZE (arg_type)) arg_size = wi::to_offset (size); diff --git a/gcc/testsuite/g++.dg/warn/Wuninitialized-13.C b/gcc/testsuite/g++.dg/warn/Wuninitialized-13.C new file mode 100644 index 000..49ee878806a --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-13.C @@ -0,0 +1,28 @@ +/* PR c/98597 - ICE in -Wuninitialized printing a MEM_REF + { dg-do compile } + { dg-options "-O2 -Wall" } */ + +struct shared_count { + shared_count () { } + shared_count (shared_count &r) +: pi (r.pi) { } // { dg-warning "\\\[-Wuninitialized" } + int pi; +}; + +// There's another (redundant) -Wuninitialized on the line below. +struct shared_ptr { + int ptr; + shared_count refcount; +}; + +struct Bar { + Bar (int, shared_ptr); +}; + +void g () { + shared_ptr foo; + Bar (0, foo); +} + +// Prune out duplicates. 
+// { dg-prune-output "-Wuninitialized" } diff --git a/gcc/testsuite/gcc.dg/uninit-39.c b/gcc/testsuite/gcc.dg/uninit-39.c new file mode 100644 index 000..0f918542773 --- /dev/null +++ b/gcc/testsuite/gcc.dg/uninit-39.c @@ -0,0 +1,47 @@ +/* PR c/98592 - ICE in gimple_canonical_types_compatible_p while formatting + a MEM_REF + { dg-do compile } + { dg-options "-O2 -Wall" } */ + +void f (int); + +void vlaNx3_to_pia1 (int n) +{ + int a[n][3]; + + /* The VLA isn't formatted correctly due to PR 98587. Just verify + there is no ICE and a warning is issued. */ + f (((*(int(*)[4])&a[1][2]))[3]); // { dg-warning "\\\[-Wuninitialized" } +} + +void vlaNxN_to_pia1 (int n) +{ + int a[n][n]; + + /* Same as above. */ + f (((*(int(*)[4])&a[1][2]))[3]); // { dg-warning "\\\[-Wuninitialized" } +} + +void vlaNxN_to_pvla4xN (int n) +{ + int a[n][n]; + + /* Same as above. */ + f (((*(int(*)[4][n])&a[1][2]))[3][4]); // { dg-warning "\\\[-Wuninitialized" } +} + +void vlaN_to_pia2 (int n) +{ + int a[n]; + + /* Same as above. */ + f (((*(int(*)[3][4])&a[1]))[2][3]); // { dg-warning "\\\[-Wuninitialized" } +} + +void vlaN_to_pvlaNx4 (int n) +{ + int a[n]; + + /* Same as above. */ + f (((*(int(*)[n][4])&a[1]))[1][3]); //
Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
Hi, Just check in to see whether you have any comments and suggestions on this: FYI, I have been continue with Approach D implementation since last week: D. Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. For the remaining work of Approach D: ** complete the implementation of -ftrivial-auto-var-init=pattern; ** complete the implementation of uninitialized warnings maintenance work for D. I have completed the uninitialized warnings maintenance work for D. And finished partial of the -ftrivial-auto-var-init=pattern implementation. The following are remaining work of Approach D: ** -ftrivial-auto-var-init=pattern for VLA; **add a new attribute for variable: __attribute((uninitialized) the marked variable is uninitialized intentionaly for performance purpose. ** adding complete testing cases; Please let me know if you have any objection on my current decision on implementing approach D. Thanks a lot for your help. Qing > On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches > wrote: > > Hi, > > This is an update for our previous discussion. > > 1. I implemented the following two different implementations in the latest > upstream gcc: > > A. Adding real initialization during gimplification, not maintain the > uninitialized warnings. > > D. Adding calls to .DEFFERED_INIT during gimplification, expand the > .DEFFERED_INIT during expand to > real initialization. Adjusting uninitialized pass with the new refs with > “.DEFFERED_INIT”. > > Note, in this initial implementation, > ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of > -ftrivial-auto-var-init=pattern > is not done yet. Therefore, the performance data is only about > -ftrivial-auto-var-init=zero. > > ** I added an temporary option -fauto-var-init-approach=A|B|C|D to > choose implementation A or D for > runtime performance study. > ** I didn’t finish the uninitialized warnings maintenance work for D. > (That might take more time than I expected). > > 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc > for the following 3 cases: > > no: default. (-g -O2 -march=native ) > A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A > D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D > > And then compute the slowdown data for both A and D as following: > > benchmarksA / no D /no > > 500.perlbench_r 1.25% 1.25% > 502.gcc_r 0.68% 1.80% > 505.mcf_r 0.68% 0.14% > 520.omnetpp_r 4.83% 4.68% > 523.xalancbmk_r 0.18% 1.96% > 525.x264_r1.55% 2.07% > 531.deepsjeng_11.57% 11.85% > 541.leela_r 0.64% 0.80% > 557.xz_-0.41% -0.41% > > 507.cactuBSSN_r 0.44% 0.44% > 508.namd_r0.34% 0.34% > 510.parest_r 0.17% 0.25% > 511.povray_r 56.57% 57.27% > 519.lbm_r 0.00% 0.00% > 521.wrf_r -0.28% -0.37% > 526.blender_r 16.96% 17.71% > 527.cam4_r0.70% 0.53% > 538.imagick_r 2.40% 2.40% > 544.nab_r 0.00% -0.65% > > avg 5.17% 5.37% > > From the above data, we can see that in general, the runtime performance > slowdown for > implementation A and D are similar for individual benchmarks. > > There are several benchmarks that have significant slowdown with the new > added initialization for both > A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will > try to study a little bit > more on what kind of new initializations introduced such slowdown. 
>
> From the current study so far, I think that approach D should be good enough
> for our final implementation. So, I will try to finish approach D with the
> following remaining work:
>
> ** complete the implementation of -ftrivial-auto-var-init=pattern;
> ** complete the implementation of uninitialized warnings maintenance
>    work for D.
>
> Let me know if you have any comments and suggestions on my current and
> future work.
>
> Thanks a lot for your help.
>
> Qing
>
>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches
>> wrote:
>>
>> The following are the approaches I will implement and compare:
>>
>> Our final goal is to keep the uninitialized warning and minimize the
>> run-time performance cost.
>>
>> A. Adding real initialization during gimplification, not maintain the
>>    uninitialized warnings.
>> B. Adding real initialization during gimplification, marking them with
>>    “artificial_init”. Adjusting uninitialized pass, maintaining the
>>    annotation, making sure the real init not deleted from the fake init.
>> C. Marking the DECL for an uninitialized auto variable as
>>    “no_explicit
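As a concrete reference for the discussion above, here is a minimal sketch of the user-visible semantics being measured (my own illustration, not code from the patch; the helper name consume is made up, and the attribute spelling follows the proposal above rather than an existing GCC extension):

  /* Sketch only.  With -ftrivial-auto-var-init=zero, trivial automatic
     variables behave as if zero-initialized at their declaration; the
     proposed __attribute__((uninitialized)) would opt a variable out.  */
  extern int consume (int *buf, int n);

  int
  example (void)
  {
    int buf[16];                                  /* zero-filled under =zero */
    int scratch __attribute__ ((uninitialized));  /* proposed opt-out        */

    scratch = consume (buf, 16);   /* the read of buf is now well defined */
    return scratch;
  }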
Re: [patch] fix -Wformat-diag warnings in rs6000-call.c
On Sat, Jan 09, 2021 at 07:44:31PM +0100, Matthias Klose wrote: > These warnings, including the suggested fixes are seen on power*-linux builds. > > warning: misspelled term 'builtin function' in format; use 'bult-in function' instead [-Wformat-diag] This one is wrong. Almost all the functions are called __builtin_xxx so changing the warning message like that is just silly. Instead, just rephrase it so you do not need to repeat it. The warning message itself is buggy: it pretends it knows what it is talking about. No warning ever does, that is the difference between warnings and errors. The hint (after the semi-colon) is extra wrong. > warning: spurious trailing punctuation sequence '].' in format [-Wformat-diag] This is also wrong (it says error ("Second argument of %qs must be in the range [0, 3].", name); which is obviously correct: "]" is not punctuation, and that makes the error message (again, not phrased as a warning) incorrect as well. > warning: misspelled term 'floating point' in format; use 'floating-point' > instead [-Wformat-diag] It is a noun here, which is spelled without dash. The warning is mistaken, and its advice (not phrased as advice) is bad advice. [-- Attachment #2: format-diag.diff --] [-- Type: text/x-patch, Encoding: quoted-printable, Size: 2.2K --] Please use plain text attachments, so that people can a) read it on the mail archives, and b) apply the patches easily, etc. Segher
libgo patch committed: Ensure openat uses variadic wrapper
This libgo patch by Paul Murphy ensures that openat always uses the variadic libc wrapper. On powerpc64le, not using the wrapper caused a failure in TestUnshareUidGidMapping due to stack corruption which resulted in a bogus execve syscall. This fixes GCC PR 98610. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian 3cab7034b52dbc40c78c8ccb1738795de80e927b diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE index 094b8fad483..cd95c3d0755 100644 --- a/gcc/go/gofrontend/MERGE +++ b/gcc/go/gofrontend/MERGE @@ -1,4 +1,4 @@ -a2578eb3983514641f0baf44d27d6474d3a96758 +255657dc8d61ab26121ca68f124412ef37599166 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. diff --git a/libgo/go/syscall/exec_linux.go b/libgo/go/syscall/exec_linux.go index 38975810432..0da6c964966 100644 --- a/libgo/go/syscall/exec_linux.go +++ b/libgo/go/syscall/exec_linux.go @@ -20,7 +20,7 @@ import ( //mount(source *byte, target *byte, fstype *byte, flags _C_long, data *byte) _C_int //sysnb rawOpenat(dirfd int, pathname *byte, flags int, perm uint32) (fd int, err Errno) -//openat(dirfd _C_int, pathname *byte, flags _C_int, perm Mode_t) _C_int +//__go_openat(dirfd _C_int, pathname *byte, flags _C_int, perm Mode_t) _C_int // SysProcIDMap holds Container ID to Host ID mappings used for User Namespaces in Linux. // See user_namespaces(7).
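For context, openat is variadic in libc, and calling a variadic function through a non-variadic declaration can violate the calling convention on some ABIs, which is the kind of mismatch behind the stack corruption described above. The fixed-arity wrapper that the //__go_openat line binds to lives in libgo's C runtime support; roughly like this sketch (details of the real wrapper may differ):

  #include <fcntl.h>

  /* Forward a fixed-arity call to the real variadic libc entry point so
     that Go code never calls the variadic symbol directly.  */
  int
  __go_openat (int dirfd, const char *pathname, int flags, mode_t mode)
  {
    return openat (dirfd, pathname, flags, mode);
  }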
Re: [patch] fix -Wformat-diag warnings in rs6000-call.c
Hi! On Sun, Jan 10, 2021 at 02:18:24PM -0700, Martin Sebor wrote: > Symbols/identifiers should be formatted using the appropriate > directives or quoted in %< %>. We do not have a way to mark up the mathematical symbols [], but we do require those to express ranges (which is a bad idea /an sich/, but that aside). > The purpose of the -Wformat-diag warnings is to improve the consistency > of user-visible messages Which is not a good idea in and of itself. It *is* good to identify bad practices, of course, but to make the GCC feedback more dull and formalised only makes it harder to find where an error came from (and seeing that as a positive thing requires a special kind of mind already ;-) ). > and make them easier to translate. That can be helpful. > There was > a discussion some time back about whether internal errors should fall > into this category. IMNSHO it is counter-productive to translate internal messages at all. Can we automatically mark those as non-translated? > I'm not sure if it reached a conclusion one way > or the other but in similar situations elsewhere in GCC we have > suppressed the warning via #pragma GCC diagnostic. This isn't needed then either. Win-win-win! > If it takes too > much effort to clean them up it might make sense to do the same here > (the downside is that it doesn't help translators). Otherwise, > the messages are not really phrased in a way that's comprehensible > either to users or to tranlators (acronyms like elt or rtx aren't > universally understood). Neither translators or even users are an audience for those messages at all: the only thing they should see is "internal error", and perhaps how to report it. Reporting a translated version of the message is a bad idea. Segher
[Patch, RFC] PR fortran/93340 - [8/9/10/11 Regression] fix missed substring simplifications
Dear all, when playing around with the issues exposed by PR93340, particularly visible in the tree dump, I tried to find ways to simplify substrings in those cases where they are eligible as designator, which is required e.g. in DATA statements. Given my limited understanding, I finally arrived at a potential solution which does that simplification near the end of match_string_constant in primary.c. I couldn't find a better place, but I am open to better suggestions. The simplification below does an even better job at detecting invalid substring starting or ending indices than HEAD, and regtests cleanly on x86_64-pc-linux-gnu. Feedback appreciated. Is this potentially ok for master, or should this be done differently? Thanks, Harald PR fortran/93340 - fix missed substring simplifications Substrings were not reduced early enough for use in initializations, such as DATA statements. Add an early simplification for substrings with constant starting and ending points. gcc/fortran/ChangeLog: * gfortran.h (gfc_resolve_substring): Add prototype. * primary.c (match_string_constant): Simplify substrings with constant starting and ending points. * resolve.c: Rename resolve_substring to gfc_resolve_substring. (gfc_resolve_ref): Use renamed function gfc_resolve_substring. gcc/testsuite/ChangeLog: * substr_10.f90: New test. * substr_9.f90: New test. diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 6585e4f3ecd..4dd72b620c9 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -3467,6 +3467,7 @@ bool find_forall_index (gfc_expr *, gfc_symbol *, int); bool gfc_resolve_index (gfc_expr *, int); bool gfc_resolve_dim_arg (gfc_expr *); bool gfc_is_formal_arg (void); +bool gfc_resolve_substring (gfc_ref *, bool *); void gfc_resolve_substring_charlen (gfc_expr *); match gfc_iso_c_sub_interface(gfc_code *, gfc_symbol *); gfc_expr *gfc_expr_to_initialize (gfc_expr *); diff --git a/gcc/fortran/primary.c b/gcc/fortran/primary.c index db9ecf9a4f6..7cb378e3090 100644 --- a/gcc/fortran/primary.c +++ b/gcc/fortran/primary.c @@ -1190,6 +1190,61 @@ got_delim: if (match_substring (NULL, 0, &e->ref, false) != MATCH_NO) e->expr_type = EXPR_SUBSTRING; + /* Substrings with constant starting and ending points are eligible as + designators (F2018, section 9.1). Simplify substrings to make them usable + e.g. in data statements. */ + if (e->expr_type == EXPR_SUBSTRING + && e->ref && e->ref->type == REF_SUBSTRING + && e->ref->u.ss.start->expr_type == EXPR_CONSTANT + && (e->ref->u.ss.end == NULL + || e->ref->u.ss.end->expr_type == EXPR_CONSTANT)) +{ + gfc_expr *res; + ptrdiff_t istart, iend; + size_t length; + bool equal_length = false; + + /* Basic checks on substring starting and ending indices. 
*/ + if (!gfc_resolve_substring (e->ref, &equal_length)) + return MATCH_ERROR; + + length = e->value.character.length; + istart = gfc_mpz_get_hwi (e->ref->u.ss.start->value.integer); + if (e->ref->u.ss.end == NULL) + iend = length; + else + iend = gfc_mpz_get_hwi (e->ref->u.ss.end->value.integer); + + if (istart <= iend) + { + if (istart < 1) + { + gfc_error ("Substring start index (%ld) at %L below 1", + (long) istart, &e->ref->u.ss.start->where); + return MATCH_ERROR; + } + if (iend > (ssize_t) length) + { + gfc_error ("Substring end index (%ld) at %L exceeds string " + "length", (long) iend, &e->ref->u.ss.end->where); + return MATCH_ERROR; + } + length = iend - istart + 1; + } + else + length = 0; + + res = gfc_get_constant_expr (BT_CHARACTER, e->ts.kind, &e->where); + res->value.character.string = gfc_get_wide_string (length + 1); + res->value.character.length = length; + if (length > 0) + memcpy (res->value.character.string, + &e->value.character.string[istart - 1], + length * sizeof (gfc_char_t)); + res->value.character.string[length] = '\0'; + e = res; +} + *result = e; return MATCH_YES; diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index f243bd185b0..3929ddff849 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -5068,8 +5068,8 @@ resolve_array_ref (gfc_array_ref *ar) } -static bool -resolve_substring (gfc_ref *ref, bool *equal_length) +bool +gfc_resolve_substring (gfc_ref *ref, bool *equal_length) { int k = gfc_validate_kind (BT_INTEGER, gfc_charlen_int_kind, false); @@ -5277,7 +5277,7 @@ gfc_resolve_ref (gfc_expr *expr) case REF_SUBSTRING: equal_length = false; - if (!resolve_substring (*prev, &equal_length)) + if (!gfc_resolve_substring (*prev, &equal_length)) return false; if (expr->expr_type != EXPR_SUBSTRING && equal_length) diff --git a/gcc/testsuite/substr_10.f90 b/gcc/testsuite/substr_10.f90 new file mode 100644 index 000..918ca8af162 --- /dev/null +++ b/gcc/testsuite/substr_1
[PATCH] c++: Always check access during late-parsing of members [PR58993]
This patch removes a vestigial use of dk_no_check from cp_parser_late_parsing_for_member, which ideally should have been removed as part of the PR41437 patch that improved access checking inside templates. This allows us to correctly reject f1 and f2 below in the testcase access34.C below (whereas before we'd only reject f3). Additional testing revealed an access issue when late-parsing a hidden friend within a class template. In the testcase friend68.C below, we're tripping over the checking assert from friend_accessible_p(f, S::j, S, S) during lookup of j in x.j (for which type_dependent_object_expression_p returns false, which is why we're doing the lookup at parse time). The reason for the assert failure is that DECL_FRIENDLIST(S) contains f but DECL_BEFRIENDING_CLASSES(f) is empty, and so friend_accessible_p (which looks at DECL_BEFRIENDING_CLASSES) wants to return false, but is_friend (which looks at DECL_FRIENDLIST) returns true. For sake of symmetry one would probably hope that DECL_BEFRIENDING_CLASSES(f) contains S, but add_friend avoids updating DECL_BEFRIENDING_CLASSES when the class type (S in this case) is dependent, for some reason. This patch works around this issue by making friend_accessible_p consider the DECL_FRIEND_CONTEXT of SCOPE. Thus we sidestep the DECL_BEFRIENDING_CLASSES / DECL_FRIENDLIST asymmetry issue while correctly validating the x.j access at parse time. A earlier version of this patch called friend_accessible_p instead of protected_accessible_p in the DECL_FRIEND_CONTEXT hunk below, but this had the side effect of making us accept the ill-formed testcase friend69.C below (ill-formed because a hidden friend is not actually a class member, so g doesn't have access to B's members even though A is a friend of B). Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look like the right approach? gcc/cp/ChangeLog: PR c++/41437 PR c++/58993 * search.c (friend_accessible_p): If scope is a hidden friend defined inside a dependent class, consider access from the defining class. * parser.c (cp_parser_late_parsing_for_member): Don't push a dk_no_check access state. gcc/testsuite/ChangeLog: PR c++/41437 PR c++/58993 * g++.dg/opt/pr87974.C: Adjust. * g++.dg/template/access34.C: New test. * g++.dg/template/friend68a.C: New test. * g++.dg/template/friend68b.C: New test. * g++.dg/template/friend69.C: New test. --- gcc/cp/parser.c | 7 -- gcc/cp/search.c | 8 +++ gcc/testsuite/g++.dg/opt/pr87974.C | 1 + gcc/testsuite/g++.dg/template/access34.C | 29 gcc/testsuite/g++.dg/template/friend68.C | 13 +++ gcc/testsuite/g++.dg/template/friend69.C | 18 +++ 6 files changed, 69 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/g++.dg/template/access34.C create mode 100644 gcc/testsuite/g++.dg/template/friend68.C create mode 100644 gcc/testsuite/g++.dg/template/friend69.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index b95151bf90d..c26e16beb70 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -30825,10 +30825,6 @@ cp_parser_late_parsing_for_member (cp_parser* parser, tree member_function) start_preparsed_function (member_function, NULL_TREE, SF_PRE_PARSED | SF_INCLASS_INLINE); - /* Don't do access checking if it is a templated function. */ - if (processing_template_decl) - push_deferring_access_checks (dk_no_check); - /* #pragma omp declare reduction needs special parsing. 
*/ if (DECL_OMP_DECLARE_REDUCTION_P (member_function)) { @@ -30842,9 +30838,6 @@ cp_parser_late_parsing_for_member (cp_parser* parser, tree member_function) cp_parser_function_definition_after_declarator (parser, /*inline_p=*/true); - if (processing_template_decl) - pop_deferring_access_checks (); - /* Leave the scope of the containing function. */ if (function_scope) pop_function_context (); diff --git a/gcc/cp/search.c b/gcc/cp/search.c index 1a9dba451c7..dd3773da4f7 100644 --- a/gcc/cp/search.c +++ b/gcc/cp/search.c @@ -698,6 +698,14 @@ friend_accessible_p (tree scope, tree decl, tree type, tree otype) if (DECL_CLASS_SCOPE_P (scope) && friend_accessible_p (DECL_CONTEXT (scope), decl, type, otype)) return 1; + /* Perhaps SCOPE is a friend function defined inside a class from which +DECL is accessible. Checking this is necessary only when the class +is dependent, for otherwise add_friend will already have added the +class to SCOPE's DECL_BEFRIENDING_CLASSES. */ + if (tree fctx = DECL_FRIEND_CONTEXT (scope)) + if (dependent_type_p (fctx) + && protected_accessible_p (decl, fctx, type, otype)) + return
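For reference, a reduced sketch of the friend68.C scenario described above (my reconstruction; the committed testcase may differ in detail): a hidden friend defined inside a class template does member access on a non-dependent object expression, so the access check runs at parse time.

  template<typename T>
  struct S
  {
  private:
    int j;
  public:
    S () : j (0) {}
    /* Hidden friend in a dependent class; x.j is looked up and
       access-checked while parsing the template.  */
    friend int f (S x) { return x.j; }
  };

  int use = f (S<int> ());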
Re: [PATCH] sh: Remove match_scratch operand test
On 1/12/21 12:35 PM, Richard Sandiford wrote: > This patch fixes a regression on sh4 introduced by the rtl-ssa stuff. > The port had a pattern: > > (define_insn "movsf_ie" > [(set (match_operand:SF 0 "general_movdst_operand" > "=f,r,f,f,fy, f,m, r, r,m,f,y,y,rf,r,y,<,y,y") > (match_operand:SF 1 "general_movsrc_operand" > " f,r,G,H,FQ,mf,f,FQ,mr,r,y,f,>,fr,y,r,y,>,y")) >(use (reg:SI FPSCR_MODES_REG)) >(clobber (match_scratch:SI 2 "=X,X,X,X,&z, X,X, X, X,X,X,X,X, > y,X,X,X,X,X"))] > "TARGET_SH2E >&& (arith_reg_operand (operands[0], SFmode) >|| fpul_operand (operands[0], SFmode) >|| arith_reg_operand (operands[1], SFmode) >|| fpul_operand (operands[1], SFmode) >|| arith_reg_operand (operands[2], SImode))" > > But recog can generate this pattern from something that matches: > > [(set (match_operand:SF 0 "general_movdst_operand") > (match_operand:SF 1 "general_movsrc_operand") >(use (reg:SI FPSCR_MODES_REG))] > > with recog adding the (clobber (match_scratch:SI)) automatically. > recog tests the C condition before adding the clobber, so there might > not be an operands[2] to test. > > Similarly, gen_movsf_ie takes only two arguments, with operand 2 > being filled in automatically. The only way to create this pattern > with a REG operands[2] before RA would be to generate it directly > from RTL. AFAICT the only things that do this are the secondary > reload patterns, which are generated during RA and come with > pre-vetted operands. > > arith_reg_operand rejects 6 specific registers: > > return (regno != T_REG && regno != PR_REG > && regno != FPUL_REG && regno != FPSCR_REG > && regno != MACH_REG && regno != MACL_REG); > > The fpul_operand tests allow FPUL_REG, leaving 5 invalid registers. > However, in all alternatives of movsf_ie, either operand 0 or > operand 1 is a register that belongs r, f or y, none of which > include any of the 5 rejected registers. This means that any > post-RA pattern would satisfy the operands[0] or operands[1] > condition without the operands[2] test being necessary. > > Tested on sh4-elf, which it fixes quite a few ICEs. OK to install? OK. Hopefully sh4/sh4eb will bootstrap again after you install that change. Thanks! jeff > > Richard > > > gcc/ > * config/sh/sh.md (movsf_ie): Remove operands[2] test. > --- > gcc/config/sh/sh.md | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md > index fe1231612cc..e3af9ae21c1 100644 > --- a/gcc/config/sh/sh.md > +++ b/gcc/config/sh/sh.md > @@ -6067,8 +6067,7 @@ (define_insn "movsf_ie" > && (arith_reg_operand (operands[0], SFmode) > || fpul_operand (operands[0], SFmode) > || arith_reg_operand (operands[1], SFmode) > - || fpul_operand (operands[1], SFmode) > - || arith_reg_operand (operands[2], SImode))" > + || fpul_operand (operands[1], SFmode))" >"@ > fmov%1,%0 > mov %1,%0 >
Re: [patch] fix -Wformat-diag warnings in rs6000-call.c
On Tue, 12 Jan 2021, Segher Boessenkool wrote: > On Sat, Jan 09, 2021 at 07:44:31PM +0100, Matthias Klose wrote: > > These warnings, including the suggested fixes are seen on power*-linux > > builds. > > > > warning: misspelled term 'builtin function' in format; use 'bult-in > > function' > > instead [-Wformat-diag] > > This one is wrong. Almost all the functions are called __builtin_xxx > so changing the warning message like that is just silly. Instead, just > rephrase it so you do not need to repeat it, As documented in codingconventions.html we use "built-in" as an adjective in English text (as opposed to literal sequences of characters from C source code). Text in diagnostics that is meant to be source code rather than English words should be enclosed in %<%>. A literal %<__builtin%> should be unchanged in translations, whereas "built-in" used as an adjective should be translated. > > warning: spurious trailing punctuation sequence '].' in format > > [-Wformat-diag] > > This is also wrong (it says > error ("Second argument of %qs must be in the range [0, 3].", name); > which is obviously correct: "]" is not punctuation, and that makes the > error message (again, not phrased as a warning) incorrect as well. The leading capital letter is incorrect, as is the trailing '.'; both of those are contrary to the GNU Coding Standards for diagnostics. Once those are fixed, having a trailing ']' is fine. -- Joseph S. Myers jos...@codesourcery.com
[PATCH] amdgcn: Fix subdf3 pattern
This patch fixes a typo in the subdf3 pattern that meant it had a non-standard name and thus the compiler would emit a libcall rather than the proper hardware instruction for DFmode subtraction. Tested with standalone AMD GCN target. I will commit shortly. Julian 2021-01-13 Julian Brown gcc/ * config/gcn/gcn-valu.md (subdf): Rename to... (subdf3): This. --- gcc/config/gcn/gcn-valu.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index 59d9849cb4e..24c17212c2a 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -2185,7 +2185,7 @@ [(set_attr "type" "vop3a") (set_attr "length" "8,8")]) -(define_insn "subdf" +(define_insn "subdf3" [(set (match_operand:DF 0 "register_operand" "= v, v") (minus:DF (match_operand:DF 1 "gcn_alu_operand" "vSvB, v") -- 2.29.2
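The user-visible effect is easy to demonstrate: with the standard subdf3 name the expander picks the hardware subtract for plain double arithmetic instead of a libcall. A trivial example (my own, not part of the patch):

  /* Expands to the GCN hardware DFmode subtract rather than a libcall
     once the pattern is named subdf3.  */
  double
  sub (double a, double b)
  {
    return a - b;
  }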
[PATCH] amdgcn: Improve FP division accuracy
GCN has a reciprocal-approximation instruction but no hardware divide. This patch adjusts the open-coded reciprocal approximation/Newton-Raphson refinement steps to use fused multiply-add instructions as is necessary to obtain a properly-rounded result, and adds further refinement steps to correctly round the full division result. The patterns in question are still guarded by a flag_reciprocal_math condition, and do not yet support denormals. Tested with standalone AMD GCN target, with the new test failing without the path and passing with. I will commit shortly. Julian 2021-01-13 Julian Brown gcc/ * config/gcn/gcn-valu.md (recip2, recip2): Use unspec for reciprocal-approximation instructions. (div3): Use fused multiply-accumulate operations for reciprocal refinement and division result. * config/gcn/gcn.md (UNSPEC_RCP): New unspec constant. gcc/testsuite/ * gcc.target/gcn/fpdiv.c: New test. --- gcc/config/gcn/gcn-valu.md | 60 +++- gcc/config/gcn/gcn.md| 3 +- gcc/testsuite/gcc.target/gcn/fpdiv.c | 38 ++ 3 files changed, 81 insertions(+), 20 deletions(-) create mode 100644 gcc/testsuite/gcc.target/gcn/fpdiv.c diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index 24c17212c2a..beefcf754d7 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -2351,9 +2351,9 @@ (define_insn "recip2" [(set (match_operand:V_FP 0 "register_operand" "= v") - (div:V_FP - (vec_duplicate:V_FP (float: (const_int 1))) - (match_operand:V_FP 1 "gcn_alu_operand" "vSvB")))] + (unspec:V_FP + [(match_operand:V_FP 1 "gcn_alu_operand" "vSvB")] + UNSPEC_RCP))] "" "v_rcp%i0\t%0, %1" [(set_attr "type" "vop1") @@ -2361,9 +2361,9 @@ (define_insn "recip2" [(set (match_operand:FP 0 "register_operand" "= v") - (div:FP - (float:FP (const_int 1)) - (match_operand:FP 1 "gcn_alu_operand" "vSvB")))] + (unspec:FP + [(match_operand:FP 1 "gcn_alu_operand" "vSvB")] + UNSPEC_RCP))] "" "v_rcp%i0\t%0, %1" [(set_attr "type" "vop1") @@ -2382,28 +2382,39 @@ (match_operand:V_FP 2 "gcn_valu_src0_operand")] "flag_reciprocal_math" { -rtx two = gcn_vec_constant (mode, - const_double_from_real_value (dconst2, mode)); +rtx one = gcn_vec_constant (mode, + const_double_from_real_value (dconst1, mode)); rtx initrcp = gen_reg_rtx (mode); rtx fma = gen_reg_rtx (mode); rtx rcp; +rtx num = operands[1], denom = operands[2]; -bool is_rcp = (GET_CODE (operands[1]) == CONST_VECTOR +bool is_rcp = (GET_CODE (num) == CONST_VECTOR && real_identical (CONST_DOUBLE_REAL_VALUE - (CONST_VECTOR_ELT (operands[1], 0)), &dconstm1)); + (CONST_VECTOR_ELT (num, 0)), &dconstm1)); if (is_rcp) rcp = operands[0]; else rcp = gen_reg_rtx (mode); -emit_insn (gen_recip2 (initrcp, operands[2])); -emit_insn (gen_fma4_negop2 (fma, initrcp, operands[2], two)); -emit_insn (gen_mul3 (rcp, initrcp, fma)); +emit_insn (gen_recip2 (initrcp, denom)); +emit_insn (gen_fma4_negop2 (fma, initrcp, denom, one)); +emit_insn (gen_fma4 (rcp, fma, initrcp, initrcp)); if (!is_rcp) - emit_insn (gen_mul3 (operands[0], operands[1], rcp)); + { + rtx div_est = gen_reg_rtx (mode); + rtx fma2 = gen_reg_rtx (mode); + rtx fma3 = gen_reg_rtx (mode); + rtx fma4 = gen_reg_rtx (mode); + emit_insn (gen_mul3 (div_est, num, rcp)); + emit_insn (gen_fma4_negop2 (fma2, div_est, denom, num)); + emit_insn (gen_fma4 (fma3, fma2, rcp, div_est)); + emit_insn (gen_fma4_negop2 (fma4, fma3, denom, num)); + emit_insn (gen_fma4 (operands[0], fma4, rcp, fma3)); + } DONE; }) @@ -2414,10 +2425,11 @@ (match_operand:FP 2 "gcn_valu_src0_operand")] "flag_reciprocal_math" { -rtx two = 
const_double_from_real_value (dconst2, mode); +rtx one = const_double_from_real_value (dconst1, mode); rtx initrcp = gen_reg_rtx (mode); rtx fma = gen_reg_rtx (mode); rtx rcp; +rtx num = operands[1], denom = operands[2]; bool is_rcp = (GET_CODE (operands[1]) == CONST_DOUBLE && real_identical (CONST_DOUBLE_REAL_VALUE (operands[1]), @@ -2428,12 +2440,22 @@ else rcp = gen_reg_rtx (mode); -emit_insn (gen_recip2 (initrcp, operands[2])); -emit_insn (gen_fma4_negop2 (fma, initrcp, operands[2], two)); -emit_insn (gen_mul3 (rcp, initrcp, fma)); +emit_insn (gen_recip2 (initrcp, denom)); +emit_insn (gen_fma4_negop2 (fma, initrcp, denom, one)); +emit_insn (gen_fma4 (rcp, fma, initrcp, initrcp)); if (!is_rcp) - emit_insn (gen_mul3 (operands[0], operands[1], rcp)); + { + rtx div_est =
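As a reader's aid, the emitted sequence corresponds to the following scalar sketch (my illustration in C using fma, not the RTL the expander generates; refined_div and rcp0 are made-up names, with rcp0 standing for the hardware reciprocal approximation): one Newton-Raphson step refines the reciprocal, then two residual/correction steps refine the quotient against the exact numerator.

  #include <math.h>

  static double
  refined_div (double num, double denom, double rcp0)
  {
    double e   = fma (-denom, rcp0, 1.0);  /* e = 1 - denom * rcp0      */
    double rcp = fma (e, rcp0, rcp0);      /* refined reciprocal        */

    double q0  = num * rcp;                /* initial quotient estimate */
    double r0  = fma (-denom, q0, num);    /* residual: num - denom*q0  */
    double q1  = fma (r0, rcp, q0);        /* first correction          */
    double r1  = fma (-denom, q1, num);    /* new residual              */
    return fma (r1, rcp, q1);              /* final correction          */
  }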
[PATCH] amdgcn: Fix exec register live-on-entry to BB in md-reorg
This patch fixes a corner case in the AMD GCN md-reorg pass when the EXEC register is live on entry to a BB, and could be clobbered by code inserted by the pass before a use in (e.g.) a different BB. I don't have a standalone test case demonstrating this failure mode, but I did observe it with an offload compiler (on another branch) with a particular benchmark. Tested with standalone AMD GCN target. I will commit shortly. Julian 2021-01-13 Julian Brown gcc/ * config/gcn/gcn.c (gcn_md_reorg): Fix case where EXEC reg is live on entry to a BB. --- gcc/config/gcn/gcn.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c index 0fb9bd26727..630ce4eebc7 100644 --- a/gcc/config/gcn/gcn.c +++ b/gcc/config/gcn/gcn.c @@ -4501,6 +4501,8 @@ gcn_md_reorg (void) df_insn_rescan_all (); } + df_live_add_problem (); + df_live_set_all_dirty (); df_analyze (); /* This pass ensures that the EXEC register is set correctly, according @@ -4522,6 +4524,17 @@ gcn_md_reorg (void) int64_t curr_exec = 0; /* 0 here means 'the value is that of EXEC after last_exec_def is executed'. */ + bitmap live_in = DF_LR_IN (bb); + bool exec_live_on_entry = false; + if (bitmap_bit_p (live_in, EXEC_LO_REG) + || bitmap_bit_p (live_in, EXEC_HI_REG)) + { + if (dump_file) + fprintf (dump_file, "EXEC reg is live on entry to block %d\n", +(int) bb->index); + exec_live_on_entry = true; + } + FOR_BB_INSNS_SAFE (bb, insn, curr) { if (!NONDEBUG_INSN_P (insn)) @@ -4660,6 +4673,8 @@ gcn_md_reorg (void) exec_lo_def_p == exec_hi_def_p ? "full" : "partial", INSN_UID (insn)); } + + exec_live_on_entry = false; } COPY_REG_SET (&live, DF_LR_OUT (bb)); @@ -4669,7 +4684,7 @@ gcn_md_reorg (void) at the end of the block. */ if ((REGNO_REG_SET_P (&live, EXEC_LO_REG) || REGNO_REG_SET_P (&live, EXEC_HI_REG)) - && (!curr_exec_known || !curr_exec_explicit)) + && (!curr_exec_known || !curr_exec_explicit || exec_live_on_entry)) { rtx_insn *end_insn = BB_END (bb); -- 2.29.2
[PATCH] amdgcn: Remove dead code for fixed v0 register
This patch removes code to fix the v0 register in gcn_conditional_register_usage that was missed out of the previous patch removing the need for that: https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534284.html Tested with standalone AMD GCN target. I will commit shortly. Julian 2021-01-13 Julian Brown gcc/ * config/gcn/gcn.c (gcn_conditional_register_usage): Remove dead code to fix v0 register. --- gcc/config/gcn/gcn.c | 4 1 file changed, 4 deletions(-) diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c index 630ce4eebc7..b08f4b32c9c 100644 --- a/gcc/config/gcn/gcn.c +++ b/gcc/config/gcn/gcn.c @@ -2137,10 +2137,6 @@ gcn_conditional_register_usage (void) fixed_regs[cfun->machine->args.reg[WORK_ITEM_ID_Y_ARG]] = 1; if (cfun->machine->args.reg[WORK_ITEM_ID_Z_ARG] >= 0) fixed_regs[cfun->machine->args.reg[WORK_ITEM_ID_Z_ARG]] = 1; - - if (TARGET_GCN5_PLUS) -/* v0 is always zero, for global nul-offsets. */ -fixed_regs[VGPR_REGNO (0)] = 1; } /* Determine if a load or store is valid, according to the register classes -- 2.29.2
Re: [patch] fix -Wformat-diag warnings in rs6000-call.c
Hi! On Wed, Jan 13, 2021 at 12:49:42AM +, Joseph Myers wrote: > On Tue, 12 Jan 2021, Segher Boessenkool wrote: > > On Sat, Jan 09, 2021 at 07:44:31PM +0100, Matthias Klose wrote: > > > These warnings, including the suggested fixes are seen on power*-linux > > > builds. > > > > > > warning: misspelled term 'builtin function' in format; use 'bult-in > > > function' > > > instead [-Wformat-diag] > > > > This one is wrong. Almost all the functions are called __builtin_xxx > > so changing the warning message like that is just silly. Instead, just > > rephrase it so you do not need to repeat it, > > As documented in codingconventions.html we use "built-in" as an adjective > in English text (as opposed to literal sequences of characters from C > source code). Text in diagnostics that is meant to be source code rather > than English words should be enclosed in %<%>. A literal %<__builtin%> > should be unchanged in translations, whereas "built-in" used as an > adjective should be translated. My point is that "built-in function `__builtin_blablabla'" looks silly. It is much better to not mention "built-in" again (it isn't useful anyway). > > > warning: spurious trailing punctuation sequence '].' in format > > > [-Wformat-diag] > > > > This is also wrong (it says > > error ("Second argument of %qs must be in the range [0, 3].", name); > > which is obviously correct: "]" is not punctuation, and that makes the > > error message (again, not phrased as a warning) incorrect as well. > > The leading capital letter is incorrect, as is the trailing '.'; both of > those are contrary to the GNU Coding Standards for diagnostics. Yes, and yes. But the warning message is *wrong*, that is my point. It says "]." is a punctuation sequence, and it is not (the "[" isn't punctuation at all, it is a mathematical symbol). I am not saying the diagnostics could not be improved, they certainly could. But the warning messages are just wrong on many levels as well, and the suggested "fixes" make things worse in many cases (and the messages do not say "maybe" or "we suggest" or similar). Thanks, Segher
[PATCH] libstdc++: c++2b, implement WG21 P1679R3
Add contains member function to basic_string_view and basic_string. The new method is enabled for -std=gnu++20, gnu++2b and c++2b. This allows users to access the method as a GNU extension to C++20. The conditional test may be reduced to "__cplusplus > 202011L" once GCC has a c++2b switch. libstdc++-v3/ Add contains member function to basic_string_view. Likewise to basic_string_view, both with and without _GLIBCXX_USE_CXX11_ABI. Enabled with -std=gnu++20, gnu++2b and c++2b. * include/bits/basic_string.h (basic_string::contains): New. * libstdc++-v3/include/std/string_view (basic_string_view::contains): New. * testsuite/21_strings/basic_string/operations/contains/char/1.cc: New test. * testsuite/21_strings/basic_string/operations/contains/wchar_t/1.cc: New test. * testsuite/21_strings/basic_string/operations/starts_with/char/1.cc: Remove trailing whitespace * testsuite/21_strings/basic_string/operations/starts_with/wchar_t/1.cc: Remove trailing whitespace * testsuite/21_strings/basic_string_view/operations/contains/char/1.cc: New test. * testsuite/21_strings/basic_string_view/operations/contains/wchar_t/1.cc: New test. --- libstdc++-v3/include/bits/basic_string.h | 30 libstdc++-v3/include/std/string_view | 16 ++ libstdc++-v3/testsuite/21_strings/basic_string/operations/contains/char/1.cc | 65 ++ libstdc++-v3/testsuite/21_strings/basic_string/operations/contains/wchar_t/1.cc | 65 ++ libstdc++-v3/testsuite/21_strings/basic_string/operations/starts_with/char/1.cc |2 libstdc++-v3/testsuite/21_strings/basic_string/operations/starts_with/wchar_t/1.cc |2 libstdc++-v3/testsuite/21_strings/basic_string_view/operations/contains/char/1.cc | 51 +++ libstdc++-v3/testsuite/21_strings/basic_string_view/operations/contains/wchar_t/1.cc | 51 +++ 8 files changed, 280 insertions(+), 2 deletions(-) diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index e272d332934..a569ecd8c08 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -3073,6 +3073,21 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 { return __sv_type(this->data(), this->size()).ends_with(__x); } #endif // C++20 +#if __cplusplus > 202011L || \ + (__cplusplus == 202002L && !defined __STRICT_ANSI__) + bool + contains(basic_string_view<_CharT, _Traits> __x) const noexcept + { return __sv_type(this->data(), this->size()).contains(__x); } + + bool + contains(_CharT __x) const noexcept + { return __sv_type(this->data(), this->size()).contains(__x); } + + bool + contains(const _CharT* __x) const noexcept + { return __sv_type(this->data(), this->size()).contains(__x); } +#endif // C++23 + // Allow basic_stringbuf::__xfer_bufptrs to call _M_length: template friend class basic_stringbuf; }; @@ -5998,6 +6013,21 @@ _GLIBCXX_END_NAMESPACE_CXX11 { return __sv_type(this->data(), this->size()).ends_with(__x); } #endif // C++20 +#if __cplusplus > 202011L || \ + (__cplusplus == 202002L && !defined __STRICT_ANSI__) + bool + contains(basic_string_view<_CharT, _Traits> __x) const noexcept + { return __sv_type(this->data(), this->size()).contains(__x); } + + bool + contains(_CharT __x) const noexcept + { return __sv_type(this->data(), this->size()).contains(__x); } + + bool + contains(const _CharT* __x) const noexcept + { return __sv_type(this->data(), this->size()).contains(__x); } +#endif // C++23 + # ifdef _GLIBCXX_TM_TS_INTERNAL friend void ::_txnal_cow_string_C1_for_exceptions(void* that, const char* s, diff --git a/libstdc++-v3/include/std/string_view 
b/libstdc++-v3/include/std/string_view index e33e1bc4b79..2f47ef6ed12 100644 --- a/libstdc++-v3/include/std/string_view +++ b/libstdc++-v3/include/std/string_view @@ -352,6 +352,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { return this->ends_with(basic_string_view(__x)); } #endif // C++20 +#if __cplusplus > 202011L || \ + (__cplusplus == 202002L && !defined __STRICT_ANSI__) +#define __cpp_lib_string_contains 202011L + constexpr bool + contains(basic_string_view __x) const noexcept + { return this->find(__x) != npos; } + + constexpr bool + contains(_CharT __x) const noexcept + { return this->find(__x) != npos; } + + constexpr bool + contains(const _CharT* __x) const noexcept + { return this->find(__x) != npos; } +#endif // C++23 + // [string.view.find], searching constexpr size_type diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/operations/contains/char/1.cc b/libstdc++-v3/testsuite/21_strings/basic_string/operations/contains/char/1.cc new file mode 100644 index 000..5d81dcee0ad --- /dev/null +++ b/libstdc++-v3/testsuite/21_strings/basic_string/operations/contains
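A short usage sketch of the new member (my own example rather than one of the testcases above), compiled with -std=gnu++20, where it is available as a GNU extension as described, or with -std=c++2b once that switch exists:

  #include <cassert>
  #include <string>
  #include <string_view>

  int
  main ()
  {
    std::string s = "hello world";
    std::string_view sv = s;

    assert (s.contains (std::string_view ("world")));  // string_view overload
    assert (s.contains ('h'));                          // single-character overload
    assert (!sv.contains ("goodbye"));                  // const char* overload
    return 0;
  }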
[PATCH] c++: ICE when late parsing noexcept/NSDMI [PR98333]
Since certain members of a class are a complete-class context [class.mem.general]p7, we delay their parsing untile the whole class has been parsed. For instance, NSDMIs and noexcept-specifiers. The order in which we perform this delayed parsing matters; we were first parsing NSDMIs and only they did we parse noexcept-specifiers. That turns out to be wrong: since NSDMIs may use noexcept-specifiers, we must process noexcept-specifiers first. Otherwise we'll ICE in code that doesn't expect to see DEFERRED_PARSE. This doesn't just shift the problem, noexcept-specifiers can use members with a NSDMI just fine, and I've also tested a similar test with this member function: bool f() { return __has_nothrow_constructor (S); } and that compiled fine too. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10? gcc/cp/ChangeLog: PR c++/98333 * parser.c (cp_parser_class_specifier_1): Perform late-parsing of NSDMIs before late-parsing of noexcept-specifiers. gcc/testsuite/ChangeLog: PR c++/98333 * g++.dg/cpp0x/noexcept62.C: New test. --- gcc/cp/parser.c | 44 - gcc/testsuite/g++.dg/cpp0x/noexcept62.C | 10 ++ 2 files changed, 31 insertions(+), 23 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept62.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index c713852fe93..92ea4a23d17 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -25008,31 +25008,10 @@ cp_parser_class_specifier_1 (cp_parser* parser) maybe_end_member_template_processing (); } vec_safe_truncate (unparsed_funs_with_default_args, 0); - /* Now parse any NSDMIs. */ - save_ccp = current_class_ptr; - save_ccr = current_class_ref; - FOR_EACH_VEC_SAFE_ELT (unparsed_nsdmis, ix, decl) - { - if (class_type != DECL_CONTEXT (decl)) - { - if (pushed_scope) - pop_scope (pushed_scope); - class_type = DECL_CONTEXT (decl); - pushed_scope = push_scope (class_type); - } - inject_this_parameter (class_type, TYPE_UNQUALIFIED); - cp_parser_late_parsing_nsdmi (parser, decl); - } - vec_safe_truncate (unparsed_nsdmis, 0); - current_class_ptr = save_ccp; - current_class_ref = save_ccr; - if (pushed_scope) - pop_scope (pushed_scope); /* If there are noexcept-specifiers that have not yet been processed, -take care of them now. */ - class_type = NULL_TREE; - pushed_scope = NULL_TREE; +take care of them now. Do this before processing NSDMIs as they +may depend on noexcept-specifiers already having been processed. */ FOR_EACH_VEC_SAFE_ELT (unparsed_noexcepts, ix, decl) { tree ctx = DECL_CONTEXT (decl); @@ -25084,6 +25063,25 @@ cp_parser_class_specifier_1 (cp_parser* parser) maybe_end_member_template_processing (); } vec_safe_truncate (unparsed_noexcepts, 0); + + /* Now parse any NSDMIs. 
*/ + save_ccp = current_class_ptr; + save_ccr = current_class_ref; + FOR_EACH_VEC_SAFE_ELT (unparsed_nsdmis, ix, decl) + { + if (class_type != DECL_CONTEXT (decl)) + { + if (pushed_scope) + pop_scope (pushed_scope); + class_type = DECL_CONTEXT (decl); + pushed_scope = push_scope (class_type); + } + inject_this_parameter (class_type, TYPE_UNQUALIFIED); + cp_parser_late_parsing_nsdmi (parser, decl); + } + vec_safe_truncate (unparsed_nsdmis, 0); + current_class_ptr = save_ccp; + current_class_ref = save_ccr; if (pushed_scope) pop_scope (pushed_scope); diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept62.C b/gcc/testsuite/g++.dg/cpp0x/noexcept62.C new file mode 100644 index 000..53606c79142 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/noexcept62.C @@ -0,0 +1,10 @@ +// PR c++/98333 +// { dg-do compile { target c++11 } } + +struct T { + template + struct S { +S () noexcept (N) {} + }; + int a = __has_nothrow_constructor (S); +}; base-commit: cfaaa6a1ca744c1a93fa08a3e7ab2a821383cac1 -- 2.29.2
[PATCH] c++: ICE with delayed noexcept and attribute used [PR97966]
Another ICE with delayed noexcept parsing, but a bit gnarlier. A function definition marked with __attribute__((used)) ought to be emitted even when it is not referenced in a TU. For a member function template marked with __attribute__((used)) this means that it will be instantiated: in instantiate_class_template_1 we have 11971 /* Instantiate members marked with attribute used. */ 11972 if (r != error_mark_node && DECL_PRESERVE_P (r)) 11973 mark_used (r); It is not so surprising that this doesn't work well with delayed noexcept parsing: when we're processing the function template we delay the parsing, so the member "foo" is found, but then when we're instantiating it, "foo" hasn't yet been seen, which creates a discrepancy and a crash ensues. "foo" hasn't yet been seen because instantiate_class_template_1 just loops over the class members and instantiates right away. It stands to reason to disable delayed noexcept parsing when the function is marked with used (clang++ also rejects). Unfortunately we can't just use lookup_attribute when parsing and check if "used" is there and if so, clear CP_PARSER_FLAGS_DELAY_NOEXCEPT, because we accept attributes following the noexcept-specifier, like this: void g() noexcept(...) __attribute__((used)); Oh well. This patch should handle various forms of attributes in one place. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10? gcc/cp/ChangeLog: PR c++/97966 * parser.c (cp_parser_save_default_args): Don't delay parsing of the noexcept-specifier of a function marked with attribute used. gcc/testsuite/ChangeLog: PR c++/97966 * g++.dg/cpp0x/noexcept63.C: New test. --- gcc/cp/parser.c | 20 +- gcc/testsuite/g++.dg/cpp0x/noexcept63.C | 49 + 2 files changed, 68 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept63.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index c713852fe93..07ab966e099 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -30880,7 +30880,25 @@ cp_parser_save_default_args (cp_parser* parser, tree decl) /* Remember if there is a noexcept-specifier to post process. */ tree spec = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (decl)); if (UNPARSED_NOEXCEPT_SPEC_P (spec)) -vec_safe_push (unparsed_noexcepts, decl); +{ + /* Don't actually delay parsing of the noexcept-specifier of +a member function marked with attribute used. 
*/ + if (__builtin_expect (DECL_PRESERVE_P (decl), 0)) + { + auto cleanup = make_temp_override + (parser->local_variables_forbidden_p); + if (DECL_THIS_STATIC (decl)) + parser->local_variables_forbidden_p |= THIS_FORBIDDEN; + inject_this_parameter (current_class_type, TYPE_UNQUALIFIED); + spec = cp_parser_late_noexcept_specifier (parser, + TREE_PURPOSE (spec)); + if (spec == error_mark_node) + spec = NULL_TREE; + fixup_deferred_exception_variants (TREE_TYPE (decl), spec); + } + else + vec_safe_push (unparsed_noexcepts, decl); +} } /* DEFAULT_ARG contains the saved tokens for the initializer of DECL, diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept63.C b/gcc/testsuite/g++.dg/cpp0x/noexcept63.C new file mode 100644 index 000..8efeac93a30 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/noexcept63.C @@ -0,0 +1,49 @@ +// PR c++/97966 +// { dg-do compile { target c++11 } } + +template +struct S1 { + __attribute__((used)) S1() noexcept(noexcept(this->foo())); // { dg-error "has no member" } + void foo(); +}; + +template +struct S2 { + __attribute__((used)) void bar() noexcept(noexcept(this->foo())); // { dg-error "has no member" } + void foo(); +}; + +template +struct S3 { + void __attribute__((used)) bar() noexcept(noexcept(this->foo())); // { dg-error "has no member" } + void foo(); +}; + +template +struct S4 { + [[gnu::used]] void bar() noexcept(noexcept(this->foo())); // { dg-error "has no member" } + void foo(); +}; + +template +struct S5 { + void bar() noexcept(noexcept(this->foo())) __attribute__((used)); // { dg-error "has no member" } + void foo(); +}; + +template +struct S6 { + static void bar() noexcept(noexcept(this->foo())) __attribute__((used)); // { dg-error ".this. may not be used in this context" } + void foo(); +}; + +void +g () +{ + S1<1> s1; + S2<1> s2; + S3<1> s3; + S4<1> s4; + S5<1> s5; + S6<1> s6; +} base-commit: cfaaa6a1ca744c1a93fa08a3e7ab2a821383cac1 -- 2.29.2
Re: Bug fix in ISO_Fortran_binding - unsigned char arrays
Hi everyone, Sorry, my previous email erroneously referred to unsigned chars / uint8_t, when I in fact meant signed chars / int8_t. The actual patch works, but the test case files have ‘unit’ in the file names. I’ll provide a corrected patch tomorrow to fix the file names. Harris
Re: [Ada,FYI] revamp ada.numerics.aux
Hello Alexandre, On 19/10/2020 13:52, Alexandre Oliva wrote: On Oct 19, 2020, Andreas Schwab wrote: -nostdinc a-nallfl.ads -o a-nallfl.o a-nallfl.ads:48:13: warning: intrinsic binding type mismatch on return value a-nallfl.ads:48:13: warning: intrinsic binding type mismatch on argument 1 a-nallfl.ads:48:13: warning: profile of "Sin" doesn't match the builtin it binds Thanks for the report. Ada's Standard.Long_Long_Float is mapped to C double rather than long double on this target. Here's a workaround, for aarch64-* and ppc*-linux-gnu, where I've observed the mismatch so far. I'm not sure whether we're going to use something like this, or a fix for the mismatch; I'm yet to figure out how to implement the latter. aarch64-* and ppc*-linux-gnu long long float/long double mismatch I have a similar issue on riscv-rtems*: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98595 Can this be fixed also with a patch to gcc/ada/Makefile.rtl? -- embedded brains GmbH Herr Sebastian HUBER Dornierstr. 4 82178 Puchheim Germany email: sebastian.hu...@embedded-brains.de phone: +49-89-18 94 741 - 16 fax: +49-89-18 94 741 - 08 Registergericht: Amtsgericht München Registernummer: HRB 157899 Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler Unsere Datenschutzerklärung finden Sie hier: https://embedded-brains.de/datenschutzerklaerung/
RE: [COMMITTED] MAINTAINERS: Add myself for write after approval
> On Wed, Jan 13, 2021 at 02:45:19, Segher Boessenkool wrote: > > Hi! > > On Tue, Jan 12, 2021 at 03:44:22PM +0800, Qian Jianhua wrote: > > ChangeLog: > > > > 2021-01-12 Qian Jianhua > > > > * MAINTAINERS (Write After Approval): Add myself > > Welcome! > > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -451,6 +451,7 @@ Daniel Jacobowitz > > > Andreas Jaeger > > Harsha Jagasia > > > Fariborz Jahanian > > +Qian Jianhua > > > Indents should be with tabs here. I fixed it (and some others). > Thank you for that. Regards, Qian > > Segher >
[PATCH] i386, expand: Optimize also 256-bit and 512-bit permutations as vpmovzx if possible [PR95905]
Hi! The following patch implements what I've talked about, i.e. to no longer force operands of vec_perm_const into registers in the generic code, but let each of the (currently 8) targets force it into registers individually, giving the targets better control on if it does that and when and allowing them to do something special with some particular operands. And then defines the define_insn_and_split for the 256-bit and 512-bit permutations into vpmovzx* (only the bw, wd and dq cases, in theory we could add define_insn_and_split patterns also for the bd, bq and wq). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-01-13 Jakub Jelinek PR target/95905 * optabs.c (expand_vec_perm_const): Don't force v0 and v1 into registers before calling targetm.vectorize.vec_perm_const, only after that. * config/i386/i386-expand.c (ix86_vectorize_vec_perm_const): Handle two argument permutation when one operand is zero vector and only after that force operands into registers. * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_1, *avx512bw_zero_extendv32qiv32hi2_1, *avx512f_zero_extendv16hiv16si2_1, *avx2_zero_extendv8hiv8si2_1, *avx512f_zero_extendv8siv8di2_1, *avx2_zero_extendv4siv4di2_1): New define_insn_and_split patterns. * config/mips/mips.c (mips_vectorize_vec_perm_const): Force operands into registers. * config/arm/arm.c (arm_vectorize_vec_perm_const): Likewise. * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): Likewise. * config/ia64/ia64.c (ia64_vectorize_vec_perm_const): Likewise. * config/aarch64/aarch64.c (aarch64_vectorize_vec_perm_const): Likewise. * config/rs6000/rs6000.c (rs6000_vectorize_vec_perm_const): Likewise. * config/gcn/gcn.c (gcn_vectorize_vec_perm_const): Likewise. Use std::swap. * gcc.target/i386/pr95905-2.c: Use scan-assembler-times instead of scan-assembler. Add tests with zero vector as first __builtin_shuffle operand. * gcc.target/i386/pr95905-3.c: New test. * gcc.target/i386/pr95905-4.c: New test. --- gcc/optabs.c.jj 2021-01-04 10:25:38.632236100 +0100 +++ gcc/optabs.c2021-01-12 14:46:44.719557815 +0100 @@ -6070,11 +6070,8 @@ expand_vec_perm_const (machine_mode mode if (targetm.vectorize.vec_perm_const != NULL) { - v0 = force_reg (mode, v0); if (single_arg_p) v1 = v0; - else - v1 = force_reg (mode, v1); if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices)) return target; @@ -6095,6 +6092,11 @@ expand_vec_perm_const (machine_mode mode return gen_lowpart (mode, target_qi); } + v0 = force_reg (mode, v0); + if (single_arg_p) +v1 = v0; + v1 = force_reg (mode, v1); + /* Otherwise expand as a fully variable permuation. */ /* The optabs are only defined for selectors with the same width --- gcc/config/i386/i386-expand.c.jj2021-01-12 11:01:51.189386077 +0100 +++ gcc/config/i386/i386-expand.c 2021-01-12 15:43:55.673095807 +0100 @@ -19929,6 +19929,33 @@ ix86_vectorize_vec_perm_const (machine_m two_args = canonicalize_perm (&d); + /* If one of the operands is a zero vector, try to match pmovzx. */ + if (two_args && (d.op0 == CONST0_RTX (vmode) || d.op1 == CONST0_RTX (vmode))) +{ + struct expand_vec_perm_d dzero = d; + if (d.op0 == CONST0_RTX (vmode)) + { + d.op1 = dzero.op1 = force_reg (vmode, d.op1); + std::swap (dzero.op0, dzero.op1); + for (i = 0; i < nelt; ++i) + dzero.perm[i] ^= nelt; + } + else + d.op0 = dzero.op0 = force_reg (vmode, d.op0); + + if (expand_vselect_vconcat (dzero.target, dzero.op0, dzero.op1, + dzero.perm, nelt, dzero.testing_p)) + return true; +} + + /* Force operands into registers. 
*/ + rtx nop0 = force_reg (vmode, d.op0); + if (d.op0 == d.op1) +d.op1 = nop0; + d.op0 = nop0; + if (d.op0 != d.op1) +d.op1 = force_reg (vmode, d.op1); + if (ix86_expand_vec_perm_const_1 (&d)) return true; --- gcc/config/i386/sse.md.jj 2021-01-12 14:30:32.688546846 +0100 +++ gcc/config/i386/sse.md 2021-01-12 15:40:29.018402527 +0100 @@ -17611,6 +17611,23 @@ (define_insn "avx2_v16qiv16hi2v16qiv16hi2" [(set (match_operand:V16HI 0 "register_operand") (any_extend:V16HI @@ -17628,6 +17645,23 @@ (define_insn "avx512bw_v32qiv32hi2 (set_attr "prefix" "evex") (set_attr "mode" "XI")]) +(define_insn_and_split "*avx512bw_zero_extendv32qiv32hi2_1" + [(set (match_operand:V64QI 0 "register_operand" "=v") + (vec_select:V64QI + (vec_concat:V128QI + (match_operand:V64QI 1 "nonimmediate_operand" "vm") + (match_operand:V64QI 2 "const0_operand" "C")) + (match_parallel 3 "pmovzx_parallel" + [(match_operand 4 "const_int_operand" "n")])))] +
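At the source level, the permutations the new define_insn_and_split patterns catch are shuffles of a vector with a zero vector; roughly like this (my own variant in the spirit of the pr95905 tests, the exact testsuite contents differ), which interleaves the low 16 bytes of x with zero bytes and can now be recognized as vpmovzxbw:

  typedef unsigned char V32 __attribute__ ((vector_size (32)));

  /* Result byte 2*i is x[i], byte 2*i+1 is zero: a zero-extension of the
     low 16 bytes to 16-bit lanes.  */
  V32
  zext_low_bytes (V32 x)
  {
    return __builtin_shuffle (x, (V32) { 0 },
                              (V32) { 0, 32, 1, 33, 2, 34, 3, 35,
                                      4, 36, 5, 37, 6, 38, 7, 39,
                                      8, 40, 9, 41, 10, 42, 11, 43,
                                      12, 44, 13, 45, 14, 46, 15, 47 });
  }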
[PATCH] i386: Add define_insn_and_split patterns for btrl [PR96938]
Hi! In the following testcase we only optimize f2 and f7 to btrl, although we should optimize that way all of the functions. The problem is the type demotion/narrowing (which is performed solely during the generic folding and not later), without it we see the AND performed in SImode and match it as btrl, but with it while the shifts are still performed in SImode, the AND is already done in QImode or HImode low part of the shift. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-01-13 Jakub Jelinek PR target/96938 * config/i386/i386.md (*btr_1, *btr_2): New define_insn_and_split patterns. (splitter after *btr_2): New splitter. * gcc.target/i386/pr96938.c: New test. --- gcc/config/i386/i386.md.jj 2021-01-07 17:18:39.653487482 +0100 +++ gcc/config/i386/i386.md 2021-01-12 19:01:37.286603961 +0100 @@ -12419,6 +12419,70 @@ (define_insn_and_split "*btr_mask_ (match_dup 3))) (clobber (reg:CC FLAGS_REG))])]) +(define_insn_and_split "*btr_1" + [(set (match_operand:SWI12 0 "register_operand") + (and:SWI12 + (subreg:SWI12 + (rotate:SI (const_int -2) + (match_operand:QI 2 "register_operand")) 0) + (match_operand:SWI12 1 "nonimmediate_operand"))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_USE_BT && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel + [(set (match_dup 0) + (and:SI (rotate:SI (const_int -2) (match_dup 2)) + (match_dup 1))) + (clobber (reg:CC FLAGS_REG))])] +{ + operands[0] = lowpart_subreg (SImode, operands[0], mode); + if (MEM_P (operands[1])) +operands[1] = force_reg (mode, operands[1]); + operands[1] = lowpart_subreg (SImode, operands[1], mode); +}) + +(define_insn_and_split "*btr_2" + [(set (zero_extract:HI + (match_operand:SWI12 0 "nonimmediate_operand") + (const_int 1) + (zero_extend:SI (match_operand:QI 1 "register_operand"))) + (const_int 0)) + (clobber (reg:CC FLAGS_REG))] + "TARGET_USE_BT && ix86_pre_reload_split ()" + "#" + "&& MEM_P (operands[0])" + [(set (match_dup 2) (match_dup 0)) + (parallel + [(set (match_dup 3) (and:SI (rotate:SI (const_int -2) (match_dup 1)) +(match_dup 4))) + (clobber (reg:CC FLAGS_REG))]) + (set (match_dup 0) (match_dup 5))] +{ + operands[2] = gen_reg_rtx (mode); + operands[5] = gen_reg_rtx (mode); + operands[3] = lowpart_subreg (SImode, operands[5], mode); + operands[4] = lowpart_subreg (SImode, operands[2], mode); +}) + +(define_split + [(set (zero_extract:HI + (match_operand:SWI12 0 "register_operand") + (const_int 1) + (zero_extend:SI (match_operand:QI 1 "register_operand"))) + (const_int 0)) + (clobber (reg:CC FLAGS_REG))] + "TARGET_USE_BT && ix86_pre_reload_split ()" + [(parallel + [(set (match_dup 0) + (and:SI (rotate:SI (const_int -2) (match_dup 1)) + (match_dup 2))) + (clobber (reg:CC FLAGS_REG))])] +{ + operands[2] = lowpart_subreg (SImode, operands[0], mode); + operands[0] = lowpart_subreg (SImode, operands[0], mode); +}) + ;; These instructions are never faster than the corresponding ;; and/ior/xor operations when using immediate operand, so with ;; 32-bit there's no point. 
But in 64-bit, we can't hold the --- gcc/testsuite/gcc.target/i386/pr96938.c.jj 2021-01-12 19:12:48.285023954 +0100 +++ gcc/testsuite/gcc.target/i386/pr96938.c 2021-01-12 19:12:33.209194271 +0100 @@ -0,0 +1,66 @@ +/* PR target/96938 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -masm=att" } */ +/* { dg-final { scan-assembler-times "\tbtrl\t" 10 } } */ + +void +f1 (unsigned char *f, int o, unsigned char v) +{ + *f = (*f & ~(1 << o)) | (v << o); +} + +void +f2 (unsigned char *f, int o, unsigned char v) +{ + int t = *f & ~(1 << o); + *f = t | (v << o); +} + +void +f3 (unsigned char *f, int o, unsigned char v) +{ + *f &= ~(1 << o); +} + +void +f4 (unsigned char *f, int o, unsigned char v) +{ + *f = (*f & ~(1 << (o & 31))) | v; +} + +void +f5 (unsigned char *f, int o, unsigned char v) +{ + *f = (*f & ~(1 << (o & 31))) | (v << (o & 31)); +} + +void +f6 (unsigned short *f, int o, unsigned short v) +{ + *f = (*f & ~(1 << o)) | (v << o); +} + +void +f7 (unsigned short *f, int o, unsigned short v) +{ + int t = *f & ~(1 << o); + *f = t | (v << o); +} + +void +f8 (unsigned short *f, int o, unsigned short v) +{ + *f &= ~(1 << o); +} + +void +f9 (unsigned short *f, int o, unsigned short v) +{ + *f = (*f & ~(1 << (o & 31))) | v; +} + +void +f10 (unsigned short *f, int o, unsigned short v) +{ + *f = (*f & ~(1 << (o & 31))) | (v << (o & 31)); +} Jakub
[PATCH] match.pd: Fold (~X | C) ^ D into (X | C) ^ (~D ^ C) if C and D are constants [PR96691]
Hi! These simplifications are only simplifications if the (~D ^ C) or (D ^ C) expressions fold into constants, but in that case they decrease number of operations by 1. For the first transformation I guess I should add reverse simplification when C and D are arbitrary (constants would fold before; to be tried incrementally). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-01-13 Jakub Jelinek PR tree-optimization/96691 * match.pd ((~X | C) ^ D -> (X | C) ^ (~D ^ C), (~X & C) ^ D -> (X & C) ^ (D ^ C)): New simplifications for constant C and D. * gcc.dg/tree-ssa/pr96691.c: New test. --- gcc/match.pd.jj 2021-01-12 13:17:35.0 +0100 +++ gcc/match.pd2021-01-12 21:39:51.310921835 +0100 @@ -947,6 +947,20 @@ (define_operator_list COND_TERNARY (bit_ior:c (bit_xor:c@3 @0 @1) (bit_xor:c (bit_xor:c @1 @2) @0)) (bit_ior @3 @2)) +/* (~X | C) ^ D -> (X | C) ^ (~D ^ C) if C and D are constants. */ +(simplify + (bit_xor (bit_ior:s (bit_not @0) @1) @2) + (if ((TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST) + && TREE_CODE (@1) == TREE_CODE (@2)) + (bit_xor (bit_ior @0 @1) (bit_xor (bit_not @2) @1 + +/* (~X & C) ^ D -> (X & C) ^ (D ^ C) if C and D are constants. */ +(simplify + (bit_xor (bit_and:s (bit_not @0) @1) @2) + (if ((TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST) + && TREE_CODE (@1) == TREE_CODE (@2)) + (bit_xor (bit_and @0 @1) (bit_xor @2 @1 + /* Simplify (~X & Y) to X ^ Y if we know that (X & ~Y) is 0. */ #if GIMPLE (simplify --- gcc/testsuite/gcc.dg/tree-ssa/pr96691.c.jj 2021-01-12 21:48:35.763025413 +0100 +++ gcc/testsuite/gcc.dg/tree-ssa/pr96691.c 2021-01-12 21:48:19.579207370 +0100 @@ -0,0 +1,21 @@ +/* PR tree-optimization/96691 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-final { scan-tree-dump-times " \\\| 123;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " \\\& 123;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " \\\^ -315;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " \\\^ 314;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-not " \\\^ 321;" "optimized" } } */ +/* { dg-final { scan-tree-dump-not " = ~" "optimized" } } */ + +int +foo (int x) +{ + return (~x | 123) ^ 321; +} + +int +bar (int x) +{ + return (~x & 123) ^ 321; +} Jakub
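Both identities are easy to verify by brute force; for the constants in the new testcase, ~321 ^ 123 folds to -315 and 321 ^ 123 to 314, which is exactly what the scan-tree-dump patterns look for. A quick check (illustration only, not part of the patch):

  #include <assert.h>

  int
  main (void)
  {
    for (int x = -1024; x <= 1024; x++)
      {
        /* (~X | C) ^ D == (X | C) ^ (~D ^ C)  */
        assert (((~x | 123) ^ 321) == ((x | 123) ^ (~321 ^ 123)));
        /* (~X & C) ^ D == (X & C) ^ (D ^ C)  */
        assert (((~x & 123) ^ 321) == ((x & 123) ^ (321 ^ 123)));
      }
    return 0;
  }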
Re: [PATCH] i386, expand: Optimize also 256-bit and 512-bit permutations as vpmovzx if possible [PR95905]
On Wed, 13 Jan 2021, Jakub Jelinek wrote: > Hi! > > The following patch implements what I've talked about, i.e. to no longer > force operands of vec_perm_const into registers in the generic code, but let > each of the (currently 8) targets force it into registers individually, > giving the targets better control on if it does that and when and allowing > them to do something special with some particular operands. > And then defines the define_insn_and_split for the 256-bit and 512-bit > permutations into vpmovzx* (only the bw, wd and dq cases, in theory we could > add define_insn_and_split patterns also for the bd, bq and wq). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? I wonder if it's worth handling the v0 == v1 case this way given the weird + if (op1 && op0 != op1) +op1 = force_reg (vmode, op1); code (presumably to handle RTX sharing here)? Otherwise it looks sensible - for x86, only constant op1/op0 are interesting, correct? Wouldn't that simplify things (to only handle constants this way)? Thanks, Richard. > 2021-01-13 Jakub Jelinek > > PR target/95905 > * optabs.c (expand_vec_perm_const): Don't force v0 and v1 into > registers before calling targetm.vectorize.vec_perm_const, only after > that. > * config/i386/i386-expand.c (ix86_vectorize_vec_perm_const): Handle > two argument permutation when one operand is zero vector and only > after that force operands into registers. > * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_1, > *avx512bw_zero_extendv32qiv32hi2_1, *avx512f_zero_extendv16hiv16si2_1, > *avx2_zero_extendv8hiv8si2_1, *avx512f_zero_extendv8siv8di2_1, > *avx2_zero_extendv4siv4di2_1): New define_insn_and_split patterns. > * config/mips/mips.c (mips_vectorize_vec_perm_const): Force operands > into registers. > * config/arm/arm.c (arm_vectorize_vec_perm_const): Likewise. > * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): Likewise. > * config/ia64/ia64.c (ia64_vectorize_vec_perm_const): Likewise. > * config/aarch64/aarch64.c (aarch64_vectorize_vec_perm_const): Likewise. > * config/rs6000/rs6000.c (rs6000_vectorize_vec_perm_const): Likewise. > * config/gcn/gcn.c (gcn_vectorize_vec_perm_const): Likewise. Use > std::swap. > > * gcc.target/i386/pr95905-2.c: Use scan-assembler-times instead of > scan-assembler. Add tests with zero vector as first __builtin_shuffle > operand. > * gcc.target/i386/pr95905-3.c: New test. > * gcc.target/i386/pr95905-4.c: New test. > > --- gcc/optabs.c.jj 2021-01-04 10:25:38.632236100 +0100 > +++ gcc/optabs.c 2021-01-12 14:46:44.719557815 +0100 > @@ -6070,11 +6070,8 @@ expand_vec_perm_const (machine_mode mode > >if (targetm.vectorize.vec_perm_const != NULL) > { > - v0 = force_reg (mode, v0); >if (single_arg_p) > v1 = v0; > - else > - v1 = force_reg (mode, v1); > >if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices)) > return target; > @@ -6095,6 +6092,11 @@ expand_vec_perm_const (machine_mode mode > return gen_lowpart (mode, target_qi); > } > > + v0 = force_reg (mode, v0); > + if (single_arg_p) > +v1 = v0; > + v1 = force_reg (mode, v1); > + >/* Otherwise expand as a fully variable permuation. */ > >/* The optabs are only defined for selectors with the same width > --- gcc/config/i386/i386-expand.c.jj 2021-01-12 11:01:51.189386077 +0100 > +++ gcc/config/i386/i386-expand.c 2021-01-12 15:43:55.673095807 +0100 > @@ -19929,6 +19929,33 @@ ix86_vectorize_vec_perm_const (machine_m > >two_args = canonicalize_perm (&d); > > + /* If one of the operands is a zero vector, try to match pmovzx. 
*/ > + if (two_args && (d.op0 == CONST0_RTX (vmode) || d.op1 == CONST0_RTX > (vmode))) > +{ > + struct expand_vec_perm_d dzero = d; > + if (d.op0 == CONST0_RTX (vmode)) > + { > + d.op1 = dzero.op1 = force_reg (vmode, d.op1); > + std::swap (dzero.op0, dzero.op1); > + for (i = 0; i < nelt; ++i) > + dzero.perm[i] ^= nelt; > + } > + else > + d.op0 = dzero.op0 = force_reg (vmode, d.op0); > + > + if (expand_vselect_vconcat (dzero.target, dzero.op0, dzero.op1, > + dzero.perm, nelt, dzero.testing_p)) > + return true; > +} > + > + /* Force operands into registers. */ > + rtx nop0 = force_reg (vmode, d.op0); > + if (d.op0 == d.op1) > +d.op1 = nop0; > + d.op0 = nop0; > + if (d.op0 != d.op1) > +d.op1 = force_reg (vmode, d.op1); > + >if (ix86_expand_vec_perm_const_1 (&d)) > return true; > > --- gcc/config/i386/sse.md.jj 2021-01-12 14:30:32.688546846 +0100 > +++ gcc/config/i386/sse.md2021-01-12 15:40:29.018402527 +0100 > @@ -17611,6 +17611,23 @@ (define_insn "avx2_v16qiv16hi2 (set_attr "prefix" "maybe_evex") > (set_attr "mode" "OI")]) > > +(define
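For reference, the shuffles these new define_insn_and_split patterns aim at are ones where one __builtin_shuffle operand is a zero vector, i.e. a zero extension written as an interleave with zero. The sketch below is only illustrative (it is not one of the pr95905 testcases from the ChangeLog, and the function name is made up); with -mavx2 one would expect a permutation like this to be a candidate for a single vpmovzxbw:

typedef unsigned char V __attribute__ ((vector_size (32)));
typedef unsigned short W __attribute__ ((vector_size (32)));

W
zext_lo (V x)
{
  /* Interleave the low 16 bytes of x with bytes taken from a zero vector
     (mask indices 32..47 select the second operand).  On little-endian x86
     the result, reinterpreted as 16-bit lanes, is x zero-extended.  */
  V zero = (V) {};
  return (W) __builtin_shuffle (x, zero,
                                (V) { 0, 32, 1, 33, 2, 34, 3, 35,
                                      4, 36, 5, 37, 6, 38, 7, 39,
                                      8, 40, 9, 41, 10, 42, 11, 43,
                                      12, 44, 13, 45, 14, 46, 15, 47 });
}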
Re: [PATCH] i386, expand: Optimize also 256-bit and 512-bit permutations as vpmovzx if possible [PR95905]
On Wed, Jan 13, 2021 at 08:26:49AM +0100, Richard Biener wrote: > On Wed, 13 Jan 2021, Jakub Jelinek wrote: > > > Hi! > > > > The following patch implements what I've talked about, i.e. to no longer > > force operands of vec_perm_const into registers in the generic code, but let > > each of the (currently 8) targets force it into registers individually, > > giving the targets better control on if it does that and when and allowing > > them to do something special with some particular operands. > > And then defines the define_insn_and_split for the 256-bit and 512-bit > > permutations into vpmovzx* (only the bw, wd and dq cases, in theory we could > > add define_insn_and_split patterns also for the bd, bq and wq). > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > I wonder if it's worth handling the v0 == v1 case this way given the > weird > > + if (op1 && op0 != op1) > +op1 = force_reg (vmode, op1); > > code (presumably to handle RTX sharing here)? The v0 == v1 case is not about RTX sharing, but about telling the targets in a quick way that it is a one argument permutation (two argument __builtin_shuffle, or 3 argument where the expander can determine they are the same etc.). Sometimes that can also be determined from the permutation, but not always. If we just forced both arguments into registers separately, this info would be lost. > Otherwise it looks sensible - for x86, only constant op1/op0 are > interesting, correct? Wouldn't that simplify things (to only handle > constants this way)? Yes, and only CONST0_RTX ATM. I originally had the generic code only treat CONST0_RTX that way and otherwise force_reg, but it didn't really simplify anything at all, the backends still need to force_reg if they require a REG, even when it is just CONST0_RTX that could make it through. Jakub
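To make the v0 == v1 point concrete, here is a hypothetical example (not taken from the thread; the function name is made up). Both shuffle operands are the same value, and only the fact that the expander passes v1 == v0 to the target hook tells the target that this is really a one-argument permutation, since the mask itself refers to both "halves":

typedef int V __attribute__ ((vector_size (16)));

V
rotate1 (V x)
{
  /* Index 4 selects element 0 of the second operand, which is x again,
     so despite the two-operand form this is a single-input rotate.  */
  return __builtin_shuffle (x, x, (V) { 1, 2, 3, 4 });
}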
Re: [PATCH] match.pd: Fold (~X | C) ^ D into (X | C) ^ (~D ^ C) if C and D are constants [PR96691]
On Wed, 13 Jan 2021, Jakub Jelinek wrote: > Hi! > > These simplifications are only simplifications if the (~D ^ C) or (D ^ C) > expressions fold into constants, but in that case they decrease number of > operations by 1. > > For the first transformation I guess I should add reverse simplification > when C and D are arbitrary (constants would fold before; > to be tried incrementally). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2021-01-13 Jakub Jelinek > > PR tree-optimization/96691 > * match.pd ((~X | C) ^ D -> (X | C) ^ (~D ^ C), > (~X & C) ^ D -> (X & C) ^ (D ^ C)): New simplifications for constant > C and D. > > * gcc.dg/tree-ssa/pr96691.c: New test. > > --- gcc/match.pd.jj 2021-01-12 13:17:35.0 +0100 > +++ gcc/match.pd 2021-01-12 21:39:51.310921835 +0100 > @@ -947,6 +947,20 @@ (define_operator_list COND_TERNARY > (bit_ior:c (bit_xor:c@3 @0 @1) (bit_xor:c (bit_xor:c @1 @2) @0)) > (bit_ior @3 @2)) > > +/* (~X | C) ^ D -> (X | C) ^ (~D ^ C) if C and D are constants. */ > +(simplify > + (bit_xor (bit_ior:s (bit_not @0) @1) @2) > + (if ((TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST) > + && TREE_CODE (@1) == TREE_CODE (@2)) > + (bit_xor (bit_ior @0 @1) (bit_xor (bit_not @2) @1 I guess you could use (bit_xor (bit_ior @0 @1) (bit_xor! (bit_not! @2) @1)) and relax the constant test. Not sure in which circumstances (bit_xor (bit_not @2) @1) would simplify down to a singleton (within VN it might CSE for example). Maybe global1 = (~D) ^ C; global2 = (~X | C) ^ D; which VN should then simplify to global1 = (~D) ^ C; global2 = (X | C) ^ global1; of course such case may apply to many more patterns so not sure if we should start to use ! more aggressively now (it's relatively new). And yes, the reverse transform would be a simplification in general. Richard. > + > +/* (~X & C) ^ D -> (X & C) ^ (D ^ C) if C and D are constants. */ > +(simplify > + (bit_xor (bit_and:s (bit_not @0) @1) @2) > + (if ((TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST) > + && TREE_CODE (@1) == TREE_CODE (@2)) > + (bit_xor (bit_and @0 @1) (bit_xor @2 @1 > + > /* Simplify (~X & Y) to X ^ Y if we know that (X & ~Y) is 0. */ > #if GIMPLE > (simplify > --- gcc/testsuite/gcc.dg/tree-ssa/pr96691.c.jj2021-01-12 > 21:48:35.763025413 +0100 > +++ gcc/testsuite/gcc.dg/tree-ssa/pr96691.c 2021-01-12 21:48:19.579207370 > +0100 > @@ -0,0 +1,21 @@ > +/* PR tree-optimization/96691 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > +/* { dg-final { scan-tree-dump-times " \\\| 123;" 1 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times " \\\& 123;" 1 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times " \\\^ -315;" 1 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times " \\\^ 314;" 1 "optimized" } } */ > +/* { dg-final { scan-tree-dump-not " \\\^ 321;" "optimized" } } */ > +/* { dg-final { scan-tree-dump-not " = ~" "optimized" } } */ > + > +int > +foo (int x) > +{ > + return (~x | 123) ^ 321; > +} > + > +int > +bar (int x) > +{ > + return (~x & 123) ^ 321; > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
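The two folds rest on the bit-wise identities (~X | C) ^ D == (X | C) ^ (~D ^ C) and (~X & C) ^ D == (X & C) ^ (D ^ C). As a quick standalone sanity check (not part of the patch or its testsuite, just an 8-bit brute-force verification for illustration):

#include <stdio.h>

int
main (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned c = 0; c < 256; c++)
      for (unsigned d = 0; d < 256; d++)
        {
          /* Compare both sides of each identity in the low 8 bits.  */
          unsigned l1 = ((~x | c) ^ d) & 0xff, r1 = ((x | c) ^ (~d ^ c)) & 0xff;
          unsigned l2 = ((~x & c) ^ d) & 0xff, r2 = ((x & c) ^ (d ^ c)) & 0xff;
          if (l1 != r1 || l2 != r2)
            {
              printf ("mismatch: x=%u c=%u d=%u\n", x, c, d);
              return 1;
            }
        }
  printf ("both identities hold for all 8-bit X, C and D\n");
  return 0;
}

Compiling and running this with any C compiler should report that both identities hold, which is why the transforms are pure wins whenever (~D ^ C) or (D ^ C) folds to a constant.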
Re: The performance data for two different implementations of the new security feature -ftrivial-auto-var-init
On Tue, 12 Jan 2021, Qing Zhao wrote:
> Hi,
>
> Just checking in to see whether you have any comments and suggestions on this:
>
> FYI, I have been continuing with the Approach D implementation since last week:
>
> D. Adding calls to .DEFERRED_INIT during gimplification, expanding the
>    .DEFERRED_INIT during expand to real initialization, and adjusting the
>    uninitialized pass to handle the new refs with “.DEFERRED_INIT”.
>
> For the remaining work of Approach D:
>
>    ** complete the implementation of -ftrivial-auto-var-init=pattern;
>    ** complete the implementation of the uninitialized warnings maintenance
>       work for D.
>
> I have completed the uninitialized warnings maintenance work for D,
> and finished part of the -ftrivial-auto-var-init=pattern implementation.
>
> The remaining work for Approach D is:
>
>    ** -ftrivial-auto-var-init=pattern for VLAs;
>    ** add a new attribute for variables: __attribute__((uninitialized));
>       the marked variable is left uninitialized intentionally for
>       performance reasons;
>    ** adding complete test cases.
>
> Please let me know if you have any objection to my current decision on
> implementing Approach D.

Did you do any analysis on how stack usage and code size are changed with
approach D? How does compile-time behave (we could gobble up lots of
.DEFERRED_INIT calls I guess)?

Richard.

> Thanks a lot for your help.
>
> Qing
>
> > On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches wrote:
> >
> > Hi,
> >
> > This is an update for our previous discussion.
> >
> > 1. I implemented the following two different implementations in the latest
> >    upstream gcc:
> >
> >    A. Adding real initialization during gimplification, not maintaining
> >       the uninitialized warnings.
> >
> >    D. Adding calls to .DEFERRED_INIT during gimplification, expanding the
> >       .DEFERRED_INIT during expand to real initialization, and adjusting
> >       the uninitialized pass to handle the new refs with “.DEFERRED_INIT”.
> >
> > Note, in this initial implementation:
> >    ** I ONLY implement -ftrivial-auto-var-init=zero; the implementation of
> >       -ftrivial-auto-var-init=pattern is not done yet. Therefore, the
> >       performance data is only about -ftrivial-auto-var-init=zero.
> >    ** I added a temporary option -fauto-var-init-approach=A|B|C|D to
> >       choose implementation A or D for the runtime performance study.
> >    ** I didn’t finish the uninitialized warnings maintenance work for D.
> >       (That might take more time than I expected.)
> >
> > 2. I collected runtime data for CPU2017 on an x86 machine with this new
> >    gcc for the following 3 cases:
> >
> >    no: default.  (-g -O2 -march=native)
> >    A:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
> >    D:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
> >
> > And then compute the slowdown data for both A and D as follows:
> >
> > benchmarks            A / no     D / no
> >
> > 500.perlbench_r        1.25%      1.25%
> > 502.gcc_r              0.68%      1.80%
> > 505.mcf_r              0.68%      0.14%
> > 520.omnetpp_r          4.83%      4.68%
> > 523.xalancbmk_r        0.18%      1.96%
> > 525.x264_r             1.55%      2.07%
> > 531.deepsjeng_r       11.57%     11.85%
> > 541.leela_r            0.64%      0.80%
> > 557.xz_r              -0.41%     -0.41%
> >
> > 507.cactuBSSN_r        0.44%      0.44%
> > 508.namd_r             0.34%      0.34%
> > 510.parest_r           0.17%      0.25%
> > 511.povray_r          56.57%     57.27%
> > 519.lbm_r              0.00%      0.00%
> > 521.wrf_r             -0.28%     -0.37%
> > 526.blender_r         16.96%     17.71%
> > 527.cam4_r             0.70%      0.53%
> > 538.imagick_r          2.40%      2.40%
> > 544.nab_r              0.00%     -0.65%
> >
> > avg                    5.17%      5.37%
> >
> > From the above data, we can see that in general the runtime performance
> > slowdown for implementations A and D is similar for individual benchmarks.
> >
> > There are several benchmarks that have a significant slowdown with the
> > newly added initialization for both A and D, for example 511.povray_r,
> > 526.blender_r, and 531.deepsjeng_r. I will try to study a little bit more
> > what kind of new initializations introduced such slowdown.
> >
> > From the current study so far, I think that Approach D should be good
> > enough for our final implementation. So, I will try to finish Approach D
> > with the following remaining work:
> >
> >    ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >    ** complete the implementation of the uninitialized warnings
> >       maintenance work for D.
> >
> > Let me know if you have any comments and suggestions on my current and
> > future work.
> >
> > Thanks a lot for your help.
> >
> > Qing
> >
> >> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches wrote:
> >>
> >> The following are the approaches I will implement and compare:
> >>
> >> Our final goal is to keep the uninit