[PATCH v3] vect: Recog mul_highpart pattern

2021-07-15 Thread Kewen.Lin via Gcc-patches
on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> on 2021/7/14 2:38 PM, Richard Biener wrote:
>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin  wrote:
>>>
>>> on 2021/7/13 8:42 PM, Richard Biener wrote:
 On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  wrote:
>>
>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>
>> Yes.
>>
> 
> Thanks for confirming!  The related patch v2 is attached and the testing
> is ongoing.
> 

It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:

XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw

They weren't exposed in the testing run with the previous patch, which
doesn't use the IFN approach.  Investigation shows the difference comes
from the different costing of MULT_HIGHPART_EXPR vs. IFN_MULH.

For MULT_HIGHPART_EXPR, it's costed as 16 via the call below:

case MULT_EXPR:
case WIDEN_MULT_EXPR:
case MULT_HIGHPART_EXPR:
  stmt_cost = ix86_multiplication_cost (ix86_cost, mode);

IFN_MULH, on the other hand, is costed as 4 like a normal stmt, so the
total cost becomes profitable and the expected vectorization happens.
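For reference, here is a minimal sketch (a hypothetical loop, not taken from the testsuite) of the "normal multiply high" shape that the new IFN_MULH recognition targets; the pmulhw/pmulhuw scans above match the vectorized form of loops like this:

```cpp
#include <cstdint>

// Hypothetical example of the "normal multiply high" pattern: multiply in a
// type twice as wide, then shift right by the narrow type's bit size so only
// the high half of the product survives.  With IFN_MULH recognition this
// loop can vectorize to pmulhw on x86.  (Right-shifting a negative value is
// arithmetic on the usual targets.)
void mulh_s16 (int16_t *r, const int16_t *a, const int16_t *b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = (int16_t) (((int32_t) a[i] * b[i]) >> 16);
}
```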

One conservative fix seems to be to route IFN_MULH costing through the
same cost interface used for multiplication, that is:

  case CFN_MULH:
stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
break;

As the test case marks the checks as "xfail", it's probably good to
revisit the costing on mul_highpart to ensure it's not priced higher
than necessary.

The attached patch also addressed Richard S.'s review comments on
two reformatting hunks.  Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
* internal-fn.def (IFN_MULH): New internal function.
* tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
recog normal multiply highpart as IFN_MULH.
* config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
function CFN_MULH.
---
 gcc/config/i386/i386.c   |  3 +++
 gcc/internal-fn.c|  1 +
 gcc/internal-fn.def  |  2 ++
 gcc/tree-vect-patterns.c | 38 --
 4 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a93128fa0a4..1dd9108353c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22559,6 +22559,9 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, 
int count,
   mode == SFmode ? ix86_cost->fmass
   : ix86_cost->fmasd);
break;
+  case CFN_MULH:
+   stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
+   break;
   default:
break;
   }
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index fb8b43d1ce2..b1b4289357c 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3703,6 +3703,7 @@ first_commutative_argument (internal_fn fn)
 case IFN_FNMS:
 case IFN_AVG_FLOOR:
 case IFN_AVG_CEIL:
+case IFN_MULH:
 case IFN_MULHS:
 case IFN_MULHRS:
 case IFN_FMIN:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index c3b8e730960..ed6d7de1680 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -169,6 +169,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | 
ECF_NOTHROW, first,
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
  savg_ceil, uavg_ceil, binary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (MULH, ECF_CONST | ECF_NOTHROW, first,
+ smul_highpart, umul_highpart, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
  smulhs, umulhs, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index b2e7fc2cc7a..ada89d7060b 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1896,8 +1896,15 @@ vect_recog_over_widening_pattern (vec_info *vinfo,
 
1) Multiply high with scaling
  TYPE res = ((TYPE) a * (TYPE) b) >> c;
+ Here, c is bitsize (TYPE) / 2 - 1.
+
2) ... or also with rounding
 TYPE res = ((((TYPE) a * (TYPE) b) >> d) + 1) >> 1;
+ Here, d is bitsize (TYPE) / 2 - 2.
+
+   3) Normal multiply high
+ TYPE res = ((TYPE) a * (TYPE) b) >> e;
+ Here, e is bitsize (TYPE) / 2.
 
where only the bottom half of res is used.  */
 
@@ -1942,7 +1949,6 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
   stmt_vec_info mulh_stmt_info;
   tree scale_term;
   internal_fn ifn;
-  unsigned int expect_offset;
 
   /* Check for the presence of the rounding term.  */
   if (gimple_assign_rhs_code (rshif

Re: [PATCH v3] vect: Recog mul_highpart pattern

2021-07-15 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin  wrote:
>
> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> > on 2021/7/14 2:38 PM, Richard Biener wrote:
> >> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin  wrote:
> >>>
> >>> on 2021/7/13 8:42 PM, Richard Biener wrote:
>  On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  wrote:
> >>
> >>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
> >>
> >> Yes.
> >>
> >
> > Thanks for confirming!  The related patch v2 is attached and the testing
> > is ongoing.
> >
>
> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw

These XFAILs should be removed after your patch.

This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
is actually not needed.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696

Uros.

> They weren't exposed in the testing run with the previous patch which
> doesn't use IFN way.  By investigating it, the difference comes from
> the different costing on MULT_HIGHPART_EXPR and IFN_MULH.
>
> For MULT_HIGHPART_EXPR, it's costed by 16 from below call:
>
> case MULT_EXPR:
> case WIDEN_MULT_EXPR:
> case MULT_HIGHPART_EXPR:
>   stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>
> While for IFN_MULH, it's costed by 4 as normal stmt so the total cost
> becomes profitable and the expected vectorization happens.
>
> One conservative fix seems to make IFN_MULH costing go through the
> unique cost interface for multiplication, that is:
>
>   case CFN_MULH:
> stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
> break;
>
> As the test case marks the checks as "xfail", probably it's good to
> revisit the costing on mul_highpart to ensure it's not priced more.
>
> The attached patch also addressed Richard S.'s review comments on
> two reformatting hunks.  Is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
> * internal-fn.def (IFN_MULH): New internal function.
> * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
> recog normal multiply highpart as IFN_MULH.
> * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
> function CFN_MULH.


Re: [PATCH] handle vector and aggregate stores in -Wstringop-overflow [PR 97027]

2021-07-15 Thread Richard Biener via Gcc-patches
On Wed, Jul 14, 2021 at 8:46 PM Martin Sebor  wrote:
>
> On 7/14/21 1:01 AM, Richard Biener wrote:
> > On Tue, Jul 13, 2021 at 9:27 PM Martin Sebor via Gcc-patches
> >  wrote:
> >>
> >> An existing, previously xfailed test that I recently removed
> >> the xfail from made me realize that -Wstringop-overflow doesn't
> >> properly detect buffer overflow resulting from vectorized stores.
> >> Because of a difference in the IL the test passes on x86_64 but
> >> fails on targets like aarch64.  Other examples can be constructed
> >> that -Wstringop-overflow fails to diagnose even on x86_64.  For
> >> instance, the overflow in the following function isn't diagnosed
> >> when the loop is vectorized:
> >>
> >> void* f (void)
> >> {
> >>   char *p = __builtin_malloc (8);
> >>   for (int i = 0; i != 16; ++i)
> >> p[i] = 1 << i;
> >>   return p;
> >> }
> >>
> >> The attached change enhances the warning to detect those as well.
> >> It found a few bugs in vectorizer tests that the patch corrects.
> >> Tested on x86_64-linux and with an aarch64 cross.
> >
> > -  dest = gimple_call_arg (stmt, 0);
> > +  if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)
> > + && gimple_call_num_args (stmt))
> > +   dest = gimple_call_arg (stmt, 0);
> > +  else
> > +   dest = gimple_call_lhs (stmt);
> > +
> > +  if (!dest)
> > +   return;
> >
> > so this uses arg0 for memcpy (dst, src, 4) and also for bcopy (src, dst, 4)?
>
> No.  The code is only called for assignments like *p = f () and for
> a handful of built-ins (memcpy, strcpy, and memset).

I see - that wasn't obvious when looking at the patch.

> bcopy() returns void and so its result cannot be assigned.  I believe
> bcopy() and the other legacy bxxx() functions are also lowered into
> memcpy/memmove etc. so we should see no calls to it in the middle end.
> In any case, I have adjusted the function as described below to avoid
> even this hypothetical issue.
>
> > It looks quite fragile to me.  I think you want to use the LHS only if it is
> > aggregate (and not a pointer or some random other value).  Likewise
> > you should only use arg0 for a whitelist of builtins, not for any random 
> > one.
>
> I've added an argument to the function to make the distinction
> between a call result and argument explicit but I haven't been able
> to create a test case to exercise it.  For all the built-ins I've
> tried in an assignment like:
>
>extern char a[4];
>*(double*)a = nan ("foo");
>
> the call result ends up assigned to a temporary:
>
>_1 = __builtin_nan (s_2(D));
>MEM[(double *)&a] = _1;
>
> I can only get a call and assignment in one for user-defined functions
> that return an aggregate.

Yes, call LHS will be SSA names if the result is of register type.
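For illustration, a hedged sketch (hypothetical types and names) of the two cases: a register-type result goes through an SSA temporary, while an aggregate-type result can appear directly as the stored call LHS, which is what the enhanced warning inspects:

```cpp
// Hypothetical illustration.  A call returning a register-type value
// (e.g. double) is assigned to an SSA temporary first, so *p = f () becomes
// _1 = f (); *p = _1; in the IL.  A call returning an aggregate keeps the
// call LHS: *p = make ();, and the store width is sizeof (A).
struct A { char buf[16]; };

struct A make ()
{
  struct A a = { };   // zero-initialized aggregate
  return a;
}

void store (struct A *p)
{
  *p = make ();       // aggregate call LHS stored directly
}
```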

> >
> > It's bad enough that compute_objsize decides for itself whether it is
> > passed a pointer or an object rather than the API being explicit about this.
> >
> > if (VAR_P (exp) || TREE_CODE (exp) == CONST_DECL)
> >   {
> > -  exp = ctor_for_folding (exp);
> > -  if (!exp)
> > -   return false;
> > +  /* If EXP can be folded into a constant use the result.  Otherwise
> > +proceed to use EXP to determine a range of the result.  */
> > +  if (tree fold_exp = ctor_for_folding (exp))
>   ^
> > +   if (fold_exp != error_mark_node)
> > + exp = fold_exp;
> >
> > fold_exp can be NULL, meaning a zero-initializer but below you'll run into
>
> fold_exp is assigned to exp if it's neither null (as I underlined
> above) nor error_mark_node so I think it's correct as is.

Oops.  Somehow missed that - the

  if (..)
if (..)

style for an && is a "bad" C++ requirement as "nice" as the
new if (tree x = ...) syntax is.

> >
> >const char *prep = NULL;
> >if (TREE_CODE (exp) == STRING_CST)
> >  {
> >
> > and crash.  Either you handle a NULL fold_expr explicitely or conservatively
> > continue to return false.
> >
> > +  /* The LHS and RHS of the store.  The RHS is null if STMT is a function
> > + call.  RHSTYPE is the type of the store.  */
> > +  tree lhs, rhs, rhstype;
> > +  if (is_gimple_assign (stmt))
> > +{
> > +  lhs = gimple_assign_lhs (stmt);
> > +  rhs = gimple_assign_rhs1 (stmt);
> > +  rhstype = TREE_TYPE (rhs);
> > +}
> > +  else if (is_gimple_call (stmt))
> > +{
> > +  lhs = gimple_call_lhs (stmt);
> > +  rhs = NULL_TREE;
> > +  rhstype = TREE_TYPE (gimple_call_fntype (stmt));
> > +}
> >
> > The type of the store in a call is better determined from the LHS.
> > For internal function calls the above will crash.
> >
> > Otherwise looks like reasonable changes.
>
> Please see the attached revision.

LGTM.

Thanks,
Richard.

> Martin


Re: [PATCH V2] Use preferred mode for doloop iv [PR61837].

2021-07-15 Thread Iain Sandoe via Gcc-patches



> On 15 Jul 2021, at 06:09, guojiufu via Gcc-patches  
> wrote:
> 
> On 2021-07-15 02:04, Segher Boessenkool wrote:
> 

>>> +@deftypefn {Target Hook} machine_mode TARGET_PREFERRED_DOLOOP_MODE
>>> (machine_mode @var{mode})
>>> +This hook takes a @var{mode} which is the original mode of doloop IV.
>>> +And if the target prefers other mode for doloop IV, this hook returns
>>> the
>>> +preferred mode.
>>> +For example, on 64bit target, DImode may be preferred than SImode.
>>> +This hook could return the original mode itself if the target prefer to
>>> +keep the original mode.
>>> +The origianl mode and return mode should be MODE_INT.
>>> +@end deftypefn
>> (Typo, "original").  That has all the right contents, but needs someone
>> who is better at English than me to look at it / improve it.

well.. how about this small tweak?

This hook takes a @var{mode} for a doloop IV, where @code{mode} is the original 
mode for the operation.  If the target prefers an alternate @code{mode} for the 
operation, then this hook should return that mode.  For example, on a 64-bit 
target, @code{DImode} might be preferred over @code{SImode}.  The original 
@code{mode} should be returned if that is suitable.  Both the original and the 
returned modes should be @code{MODE_INT}.

0.02GBP only.
Iain



Re: [pushed] c++: enable -fdelete-dead-exceptions by default

2021-07-15 Thread Richard Biener
On Wed, 14 Jul 2021, Jason Merrill wrote:

> As I was discussing with richi, I don't think it makes sense to protect
> calls to pure/const functions from DCE just because they aren't explicitly
> declared noexcept.  PR100382 indicates that there are different
> considerations for Go, which has non-call exceptions.  But still turn the
> flag off for that specific testcase.

I don't disagree.  Note this means that

void test_div (int x, int y)
{
  x / y;
}

will no longer throw externally with -fnon-call-exceptions
unless you now specify -fno-delete-dead-exceptions
(in a separate thread we question what the -fnon-call-exceptions 
-fno-exceptions state we "support" actually means).

IIRC -fdelete-dead-exceptions was specifically added for 
-fnon-call-exceptions and dead code that could trap (we don't
generally preserve possibly trapping stmts as traps are not
observable but generally result from triggering behavior that
is undefined in terms of language definition).

But yes, -fdelete-dead-exceptions naturally applies to
const/pure function calls.
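A small sketch of the behavior change (hypothetical function names, modeled on the pr100382 testcase quoted further below): a dead call to a pure function is no longer kept alive just because it might throw.

```cpp
int x, y;

// A pure function that may throw.  When its result is unused, the call is
// dead code; with -fdelete-dead-exceptions (now on by default for C++) the
// optimizers may delete it even though that also deletes a potential throw.
int __attribute__ ((pure, noinline)) maybe_throw ()
{
  if (x)
    throw 1;
  return y;
}

void dead_call ()
{
  maybe_throw ();   // unused result: eligible for DCE under the new default
}
```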

Since you change c_common_post_options you also affect the C and
Objective C/C++ compilers, so you might want to adjust your
documentation change.  I guess cross-referencing 
-fdelete-dead-exceptions in the -fnon-call-exceptions documentation
makes sense as well.

Richard.

> Tested x86_64-pc-linux-gnu, applying to trunk.
> 
> gcc/c-family/ChangeLog:
> 
>   * c-opts.c (c_common_post_options): Set -fdelete-dead-exceptions.
> ---
>  gcc/doc/invoke.texi | 6 --
>  gcc/c-family/c-opts.c   | 4 
>  gcc/testsuite/g++.dg/torture/pr100382.C | 1 +
>  3 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index e67d47af676..ea8812425e9 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -16335,8 +16335,10 @@ arbitrary signal handlers such as @code{SIGALRM}.
>  @opindex fdelete-dead-exceptions
>  Consider that instructions that may throw exceptions but don't otherwise
>  contribute to the execution of the program can be optimized away.
> -This option is enabled by default for the Ada compiler, as permitted by
> -the Ada language specification.
> +This does not affect calls to functions except those with the
> +@code{pure} or @code{const} attributes.
> +This option is enabled by default for the Ada and C++ compilers, as 
> permitted by
> +the language specifications.
>  Optimization passes that cause dead exceptions to be removed are enabled 
> independently at different optimization levels.
>  
>  @item -funwind-tables
> diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
> index 60b5802722c..1212edd1b28 100644
> --- a/gcc/c-family/c-opts.c
> +++ b/gcc/c-family/c-opts.c
> @@ -1015,6 +1015,10 @@ c_common_post_options (const char **pfilename)
>SET_OPTION_IF_UNSET (&global_options, &global_options_set, 
> flag_finite_loops,
>  optimize >= 2 && cxx_dialect >= cxx11);
>  
> +  /* It's OK to discard calls to pure/const functions that throw.  */
> +  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> +flag_delete_dead_exceptions, true);
> +
>if (cxx_dialect >= cxx11)
>  {
>/* If we're allowing C++0x constructs, don't warn about C++98
> diff --git a/gcc/testsuite/g++.dg/torture/pr100382.C 
> b/gcc/testsuite/g++.dg/torture/pr100382.C
> index ffc4182cfce..eac5743b956 100644
> --- a/gcc/testsuite/g++.dg/torture/pr100382.C
> +++ b/gcc/testsuite/g++.dg/torture/pr100382.C
> @@ -1,4 +1,5 @@
>  // { dg-do run }
> +// { dg-additional-options -fno-delete-dead-exceptions }
>  
>  int x, y;
>  int __attribute__((pure,noinline)) foo () { if (x) throw 1; return y; }
> 
> base-commit: 6d1cdb27828d2ef1ae1ab0209836646a269b9610
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

2021-07-15 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 15, 2021 at 2:58 PM Wang, Pengfei  wrote:
>
> It seems Clang doesn't support -fexcess-precision=xxx:
> https://github.com/llvm/llvm-project/blob/main/clang/test/Driver/clang_f_opts.c#L403
>
> Thanks
> Pengfei
>
> -Original Message-
> From: Hongtao Liu 
> Sent: Thursday, July 15, 2021 2:35 PM
> To: Wang, Pengfei 
> Cc: Craig Topper ; Jakub Jelinek ; 
> Liu, Hongtao ; gcc-patches@gcc.gnu.org; Joseph Myers 
> 
> Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
>
> On Thu, Jul 15, 2021 at 10:07 AM Wang, Pengfei  wrote:
> >
> > Clang for AArch64 promotes each individual operation and rounds immediately 
> > afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two 
> > fadd operations. It's implemented in the LLVM backend where we can't see 
> > what was originally a single expression.
> >
> >
> >
> > Yes, but this is not consistent with Clang document. I think we should ask 
> > Clang FE to do the promotion and truncation.
> >
> >
> >
> > Thanks
> >
> > Pengfei
> >
> >
> >
> > From: llvm-dev  On Behalf Of Craig
> > Topper via llvm-dev
> > Sent: Wednesday, July 14, 2021 11:32 PM
> > To: Hongtao Liu 
> > Cc: Jakub Jelinek ; llvm-dev
> > ; Liu, Hongtao ;
> > gcc-patches@gcc.gnu.org; Joseph Myers 
> > Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
> >
> >
> >
> > On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev 
> >  wrote:
> >
> > > >
> > > Set excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to
> > > round after each operation could keep semantics right.
> > > And I'll document the behavior difference between soft-fp and
> > > AVX512FP16 instruction for exceptions.
> > I got some feedback from my colleague who's working on supporting
> > _Float16 for llvm.
> > The LLVM side wants to set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
> > soft-fp so that the generated code can be more efficient.
> > i.e.
> > _Float16 a, b, c, d;
> > d = a + b + c;
> >
> > would be transformed to
> > float tmp, tmp1, a1, b1, c1;
> > a1 = (float) a;
> > b1 = (float) b;
> > c1 = (float) c;
> > tmp = a1 + b1;
> > tmp1 = tmp + c1;
> > d = (_Float16) tmp;
> >
> > so there's only 1 truncation in the end.
> >
> > if users want to round back after every operation. codes should be
> > explicitly written as
> > _Float16 a, b, c, d, e;
> > e = a + b;
> > d = e + c;
> >
> > That's what Clang does, quote from [1]
> >  _Float16 arithmetic will be performed using native half-precision
> > support when available on the target (e.g. on ARMv8.2a); otherwise it
> > will be performed at a higher precision (currently always float) and
> > then truncated down to _Float16. Note that C and C++ allow
> > intermediate floating-point operands of an expression to be computed
> > with greater precision than is expressible in their type, so Clang may
> > avoid intermediate truncations in certain cases; this may lead to
> > results that are inconsistent with native arithmetic.
> >
> >
> >
> > Clang for AArch64 promotes each individual operation and rounds immediately 
> > afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two 
> > fadd operations. It's implemented in the LLVM backend where we can't see 
> > what was originally a single expression.
> >
> >
> When I'm reading the option documentation for excess precision from
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>
> -fexcess-precision=style
With this option, we can support either behavior, rounding back after
each operation or not, which should be more convenient.

>
> This option allows further control over excess precision on machines where 
> floating-point operations occur in a format with more precision or range than 
> the IEEE standard and interchange floating-point types.
> By default, -fexcess-precision=fast is in effect; this means that operations 
> may be carried out in a wider precision than the types specified in the 
> source if that would result in faster code, and it is unpredictable when 
> rounding to the types specified in the source code takes place. When 
> compiling C, if -fexcess-precision=standard is specified then excess 
> precision follows the rules specified in ISO C99; in particular, both casts 
> and assignments cause values to be rounded to their semantic types (whereas 
> -ffloat-store only affects assignments). This option is enabled by default 
> for C if a strict conformance option such as -std=c99 is used. -ffast-math 
> enables -fexcess-precision=fast by default regardless of whether a strict 
> conformance option is used.
>
> For -fexcess-precision=fast,
>  we should set flt_eval_method to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
> soft-fp, and FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for AVX512FP16
>
> For  -fexcess-precision=standard
> set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_SSE2? so for soft-fp it 
> will round back after every operation?
> >
> >
> > and so does arm gcc
> > quote from arm.c
> >
> > /* We can calculate either in 16-bit range and precision or
> >   

[PATCH] c++: Optimize away NULLPTR_TYPE comparisons [PR101443]

2021-07-15 Thread Jakub Jelinek via Gcc-patches
Hi!

Comparisons of NULLPTR_TYPE operands cause all kinds of problems in the
middle-end and in fold-const.c, various optimizations assume that if they
see e.g. a non-equality comparison with one of the operands being
INTEGER_CST and it is not INTEGRAL_TYPE_P (which has TYPE_{MIN,MAX}_VALUE),
they can build_int_cst (type, 1) to find a successor.

The following patch fixes this by making sure they don't appear in the IL,
optimizing them away at cp_fold time, as they can all be folded.
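As a concrete illustration (hypothetical function name): nullptr_t has a single value, so every comparison of two nullptr_t operands has a compile-time answer, and the fold can substitute a constant while keeping the operands for their side effects:

```cpp
// Hypothetical sketch of what the fold relies on: ==, <=, >= on two
// nullptr_t values are always true, and !=, <, > are always false, so the
// comparisons below reduce to constants (the operands are kept only for
// their side effects, as omit_two_operands does in the patch).
decltype (nullptr) get_null () { return nullptr; }

bool always_false ()
{
  return get_null () > nullptr || get_null () < nullptr;
}

bool always_true ()
{
  return get_null () >= nullptr;
}
```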

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Though, I've just noticed that clang++ rejects the non-equality comparisons
instead, foo () > 0 with
invalid operands to binary expression ('decltype(nullptr)' (aka 'nullptr_t') 
and 'int')
and foo () > nullptr with
invalid operands to binary expression ('decltype(nullptr)' (aka 'nullptr_t') 
and 'nullptr_t')

Shall we reject those too, in addition or instead of parts of this patch?
If so, wouldn't this patch still be useful for backports?  I bet we don't
want to start rejecting it on the release branches when we used to accept it.

2021-07-15  Jakub Jelinek  

PR c++/101443
* cp-gimplify.c (cp_fold): For comparisons with NULLPTR_TYPE
operands, fold them right away to true or false.

* g++.dg/cpp0x/nullptr46.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2021-06-25 10:36:22.141020337 +0200
+++ gcc/cp/cp-gimplify.c2021-07-14 12:04:24.221860756 +0200
@@ -2424,6 +2424,32 @@ cp_fold (tree x)
   op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops);
   op1 = cp_fold_rvalue (TREE_OPERAND (x, 1));
 
+  /* decltype(nullptr) has only one value, so optimize away all comparisons
+with that type right away, keeping them in the IL causes troubles for
+various optimizations.  */
+  if (COMPARISON_CLASS_P (org_x)
+ && TREE_CODE (TREE_TYPE (op0)) == NULLPTR_TYPE
+ && TREE_CODE (TREE_TYPE (op1)) == NULLPTR_TYPE)
+   {
+ switch (code)
+   {
+   case EQ_EXPR:
+   case LE_EXPR:
+   case GE_EXPR:
+ x = constant_boolean_node (true, TREE_TYPE (x));
+ break;
+   case NE_EXPR:
+   case LT_EXPR:
+   case GT_EXPR:
+ x = constant_boolean_node (false, TREE_TYPE (x));
+ break;
+   default:
+ gcc_unreachable ();
+   }
+ return omit_two_operands_loc (loc, TREE_TYPE (x), x,
+   op0, op1);
+   }
+
   if (op0 != TREE_OPERAND (x, 0) || op1 != TREE_OPERAND (x, 1))
{
  if (op0 == error_mark_node || op1 == error_mark_node)
--- gcc/testsuite/g++.dg/cpp0x/nullptr46.C.jj   2021-07-14 11:48:03.917122727 
+0200
+++ gcc/testsuite/g++.dg/cpp0x/nullptr46.C  2021-07-14 11:46:52.261092097 
+0200
@@ -0,0 +1,11 @@
+// PR c++/101443
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2" }
+
+decltype(nullptr) foo ();
+
+bool
+bar ()
+{
+  return foo () > nullptr || foo () < nullptr;
+}

Jakub



Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-15 Thread Richard Biener via Gcc-patches
On Wed, Jul 14, 2021 at 4:10 PM Qing Zhao  wrote:
>
> Hi, Richard,
>
> > On Jul 14, 2021, at 2:14 AM, Richard Biener  
> > wrote:
> >
> > On Wed, Jul 14, 2021 at 1:17 AM Qing Zhao  wrote:
> >>
> >> Hi, Kees,
> >>
> >> I took a look at the kernel testing case you attached in the previous 
> >> email, and found the testing failed with the following case:
> >>
> >> #define INIT_STRUCT_static_all  = { .one = arg->one,\
> >>.two = arg->two,\
> >>.three = arg->three,\
> >>.four = arg->four,  \
> >>}
> >>
> >> i.e., when a structure-type auto variable has been explicitly initialized
> >> in the source code.  -ftrivial-auto-var-init in the 4th version
> >> does not initialize the paddings for such variables.
> >>
> >> But in the previous version of the patches ( 2 or 3), 
> >> -ftrivial-auto-var-init initializes the paddings for such variables.
> >>
> >> I intended to remove this part of the code from the 4th version of the 
> >> patch since the implementation for initializing such paddings is 
> >> completely different from initializing the whole structure with memset
> >> in this version of the implementation.
> >>
> >> If we really need this functionality, I will add another separate patch 
> >> for this additional functionality, but not with this patch.
> >>
> >> Richard, what’s your comment and suggestions on this?
> >
> > I think this can be addressed in the gimplifier by adjusting
> > gimplify_init_constructor to clear
> > the object before the initialization (if it's not done via aggregate
> > copying).
>
> I did this in the previous versions of the patch like the following:
>
> @@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
> *pre_p, gimple_seq *post_p,
>   /* If a single access to the target must be ensured and all elements
>  are zero, then it's optimal to clear whatever their number.  */
>   cleared = true;
> +   else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
> +&& !TREE_STATIC (object)
> +&& type_has_padding (type))
> + /* If the user requests to initialize automatic variables with
> +paddings inside the type, we should initialize the paddings too.
> +C guarantees that brace-init with fewer initializers than members
> +aggregate will initialize the rest of the aggregate as-if it were
> +static initialization.  In turn static initialization guarantees
> +that pad is initialized to zero bits.
> +So, it's better to clear the whole record under such situation.  
> */
> + cleared = true;
> else
>   cleared = false;
>
> Then the paddings are also initialized to zeroes with this option. (Even for 
> -ftrivial-auto-var-init=pattern).
>
> Is the above change Okay? (With this change, when 
> -ftrivial-auto-var-init=pattern, the paddings for the
> structure variables that have explicit initializer will be ZEROed, not 0xFE)

I guess that would be the simplest way, yes.
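To make the padding question concrete, a hedged sketch (hypothetical struct; the exact layout is ABI-dependent):

```cpp
// On typical ABIs there are 3 bytes of padding between 'c' and 'i'.
struct S
{
  char c;
  int i;
};

// Brace-init with fewer initializers than members zero-initializes the
// remaining members (s.i below is guaranteed to be 0), but says nothing
// about the padding bytes.  The proposed change sets cleared = true for
// such variables under -ftrivial-auto-var-init, so the whole object,
// padding included, is zeroed first.
struct S partially_initialized ()
{
  struct S s = { 1 };
  return s;
}
```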

> > The clearing
> > could be done via .DEFERRED_INIT.
>
> You mean to add additional calls to .DEFERRED_INIT for each individual 
> padding of the structure in “gimplify_init_constructor"?
> Then  later during RTL expand, expand these calls the same as other calls?

No, I actually meant to in your patch above set

defered_padding_init = true;

and where 'cleared' is processed do sth like

  if (defered_padding_init)
.. emit .DEFERRED_INIT for the _whole_ variable ..
  else if (cleared)
 .. original cleared handling ...

that would retain the pattern init but possibly be less efficient in the end.

> >
> > Note that I think .DEFERRED_INIT can be elided for variables that do
> > not have their address
> > taken - otherwise we'll also have to worry about aggregate copy
> > initialization and SRA
> > decomposing the copy, initializing only the used parts.
>
> Please explain this a little bit more.

For sth like

struct S { int i; long j; };

void bar (struct S);
struct S
foo (struct S *p)
{
  struct S q = *p;
  struct S r = q;
  bar (r);
  return r;
}

we don't get a .DEFERRED_INIT for 'r' (do we?) and SRA decomposes the init to

   :
  q = *p_2(D);
  q$i_9 = p_2(D)->i;
  q$j_10 = p_2(D)->j;
  r.i = q$i_9;
  r.j = q$j_10;
  bar (r);
  D.1953 = r;
  r ={v} {CLOBBER};
  return D.1953;

which leaves its padding uninitialized.  Hmm, and that even happens when
you make bar take struct S * and thus pass the address of 'r' to bar.

Richard.


> Thanks.
>
> Qing
> >
> > Richard.
> >
> >> Thanks.
> >>
> >> Qing
> >>
> >>> On Jul 13, 2021, at 4:29 PM, Kees Cook  wrote:
> >>>
> >>> On Mon, Jul 12, 2021 at 08:28:55PM +, Qing Zhao wrote:
> > On Jul 12, 2021, at 12:56 PM, Kees Cook  wrote:
> >>>

Re: [PATCH] Remove legacy external declarations in toplev.h [PR101447]

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 3:54 AM ashimida via Gcc-patches
 wrote:
>
>
> Some external declarations in ./gcc/toplev.h are no longer used in the
> newest version of GCC and should be cleaned up to avoid misunderstandings.

OK

> gcc/ChangeLog:
>
>  * toplev.h (min_align_loops_log, min_align_jumps_log,
>  min_align_labels_log, min_align_functions_log): Remove.
>
> ---
> diff --git a/gcc/toplev.h b/gcc/toplev.h
> index 175944c..f543554 100644
> --- a/gcc/toplev.h
> +++ b/gcc/toplev.h
> @@ -94,11 +94,6 @@ extern bool set_src_pwd (const
> char *);
>   extern HOST_WIDE_INT get_random_seed (bool);
>   extern void set_random_seed (const char *);
>
> -extern unsigned int min_align_loops_log;
> -extern unsigned int min_align_jumps_log;
> -extern unsigned int min_align_labels_log;
> -extern unsigned int min_align_functions_log;
> -
>   extern void parse_alignment_opts (void);
>
>   extern void initialize_rtl (void);
>
> ---
> The history FYI:
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=e6de53356769e13178975c18b4ce019a800ea946;hp=118f2d8bc3e6804996ca2953b86454ec950054bf
>


Re: [PATCH 1/4] force decls to be allocated through build_decl to initialize them

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 4:24 AM Trevor Saunders  wrote:
>
> On Wed, Jul 14, 2021 at 01:27:54PM +0200, Richard Biener wrote:
> > On Wed, Jul 14, 2021 at 10:20 AM Trevor Saunders  
> > wrote:
> > >
> > > prior to this commit all calls to build_decl used input_location, even if
> > > temporarily until build_decl reset the location to something else that
> > > it was
> > > told was the proper location.  To avoid using the global we need the 
> > > caller to
> > > pass in the location it wants, however that's not possible with make_node 
> > > since
> > > it makes other types of nodes.  So we force all callers who wish to make 
> > > a decl
> > > to go through build_decl which already takes a location argument.  To 
> > > avoid
> > > changing behavior this just explicitly passes in input_location to 
> > > build_decl
> > > for callers of make_node that create a decl, however it would seem in 
> > > many of
> > > these cases that the location of the decl being copied might be a better
> > > location.
> > >
> > > bootstrapped and regtested on x86_64-linux-gnu, ok?
> >
> > I think all eventually DECL_ARTIFICIAL decls should better use
> > UNKNOWN_LOCATION instead of input_location.
>
> You'd know if that might break something better than me, but that seems
> sensible in principle.  That said, I would like to incrementally do one
> thing at a time, rather than change make_node to use unknown_location,
> and set the location to something else all at once, but I suppose I
> could first change some callers to be build_decl (unknown_location, ...)
> and then come back to changing make_node when there's fewer callers to
> reason about if that's preferable.

Sure, we can defer changing make_node (I thought the patch caught all
but three callers ...).  But it feels odd to introduce so many explicit
input_location uses for cases where it clearly doesn't matter (the
DECL_ARTIFICIAL),
so I'd prefer to "fix" those immediately.

> > I'm not sure if I like the (transitional) extra arg to make_node, I suppose
> > we could hide make_node by declaring it in tree-raw.h or so or by
> > guarding the decl with NEED_MAKE_NODE.  There's nothing inherently
> > wrong with calling make_node.  So what I mean with transitional is that
> > with this change we should simply set the location to UNKNOWN_LOCATION
> > (aka zero, which it already is), not input_location, in make_node.
>
> I sort of think it makes sense to move all the tree class specific bits
> out of make_node to functions for that specific type of tree, but it is
> mostly unrelated.  One advantage of that is that it saves pointless
> initialization in the module / lto streamer that gets over written with
> the streamed values.  However having used the argument to find all the
> places that create decls, and having updated them, while the argument
> and asserts do  prevent leaving the location uninitialized by mistake,
> I'd be fine with dropping that part and just updating all the make_node
> callers to use build_decl.

Yes, that's my thinking.

Thanks,
Richard.

>
> thanks
>
> Trev
>
> >
> > Richard.
> >
> > > Trev
> > >
> > > gcc/ChangeLog:
> > >
> > > * cfgexpand.c (avoid_deep_ter_for_debug): Call build_decl not
> > > make_node.
> > > (expand_gimple_basic_block): Likewise.
> > > * ipa-param-manipulation.c (ipa_param_adjustments::modify_call):
> > > Likewise.
> > > (ipa_param_body_adjustments::reset_debug_stmts): Likewise.
> > > * omp-simd-clone.c (ipa_simd_modify_stmt_ops): Likewise.
> > > * stor-layout.c (start_bitfield_representative): Likewise.
> > > * tree-inline.c (remap_ssa_name): Likewise.
> > > (tree_function_versioning): Likewise.
> > > * tree-into-ssa.c (rewrite_debug_stmt_uses): Likewise.
> > > * tree-nested.c (lookup_field_for_decl): Likewise.
> > > (get_chain_field): Likewise.
> > > (create_field_for_decl): Likewise.
> > > (get_nl_goto_field): Likewise.
> > > (finalize_nesting_tree_1): Likewise.
> > > * tree-ssa-ccp.c (optimize_atomic_bit_test_and): Likewise.
> > > * tree-ssa-loop-ivopts.c (remove_unused_ivs): Likewise.
> > > * tree-ssa-phiopt.c (spaceship_replacement): Likewise.
> > > * tree-ssa-reassoc.c (make_new_ssa_for_def): Likewise.
> > > * tree-ssa.c (insert_debug_temp_for_var_def): Likewise.
> > > * tree-streamer-in.c (streamer_alloc_tree): Adjust.
> > > * tree.c (make_node): Add argument to specify the caller.
> > > (build_decl): Move initialization from make_node.
> > > * tree.h (enum make_node_caller): New enum.
> > > (make_node): Adjust prototype.
> > > * varasm.c (make_debug_expr_from_rtl): Call build_decl.
> > >
> > > gcc/cp/ChangeLog:
> > >
> > > * constraint.cc (build_type_constraint): Call build_decl not 
> > > make_node.
> > > * cp-gimplify.c (cp_genericize_r): Likewise.
> > > * parser.c (cp_parser_intro

[PATCH] gimplify: Fix endless recursion on volatile empty type reads/writes [PR101437]

2021-07-15 Thread Jakub Jelinek via Gcc-patches
Hi!

Andrew's recent change to optimize away during gimplification not just
assignments of zero sized types, but also assignments of empty types,
caused infinite recursion in the gimplifier.
If such assignment is optimized away, we gimplify separately the to_p
and from_p operands and throw away the result.  When gimplifying the
operand that is volatile, we run into the gimplifier code below, which has
different handling for types with non-BLKmode mode, tries to gimplify
those as vol.N = expr, and for BLKmode just throws those away.
Zero sized types will always have BLKmode and so are fine, but for the
non-BLKmode ones like struct S in the testcase, the vol.N = expr
gimplification will reach again the gimplify_modify_expr code, see it is
assignment of empty type and will gimplify again vol.N separately
(non-volatile, so ok) and expr, on which it will recurse again.

The following patch breaks that infinite recursion by ignoring bare
volatile loads from empty types.
If a volatile load or store of an aggregate is supposed to be a member-wise
load or store, then there are no non-padding members in the empty types that
should be copied, and so it is probably ok.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-15  Jakub Jelinek  

PR middle-end/101437
* gimplify.c (gimplify_expr): Throw away volatile reads from empty
types even if they have non-BLKmode TYPE_MODE.

* gcc.c-torture/compile/pr101437.c: New test.

--- gcc/gimplify.c.jj   2021-06-25 10:36:22.232019090 +0200
+++ gcc/gimplify.c  2021-07-14 13:24:28.860486677 +0200
@@ -15060,7 +15060,8 @@ gimplify_expr (tree *expr_p, gimple_seq
  *expr_p = NULL;
}
   else if (COMPLETE_TYPE_P (TREE_TYPE (*expr_p))
-  && TYPE_MODE (TREE_TYPE (*expr_p)) != BLKmode)
+  && TYPE_MODE (TREE_TYPE (*expr_p)) != BLKmode
+  && !is_empty_type (TREE_TYPE (*expr_p)))
{
  /* Historically, the compiler has treated a bare reference
 to a non-BLKmode volatile lvalue as forcing a load.  */
--- gcc/testsuite/gcc.c-torture/compile/pr101437.c.jj   2021-07-14 
13:35:57.155100700 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr101437.c  2021-07-14 
13:35:41.430314232 +0200
@@ -0,0 +1,29 @@
+/* PR middle-end/101437 */
+
+struct S { int : 1; };
+
+void
+foo (volatile struct S *p)
+{
+  struct S s = {};
+  *p = s;
+}
+
+void
+bar (volatile struct S *p)
+{
+  *p;
+}
+
+void
+baz (volatile struct S *p)
+{
+  struct S s;
+  s = *p;
+}
+
+void
+qux (volatile struct S *p, volatile struct S *q)
+{
+  *p = *q;
+}

Jakub



Re: [PATCH v3] vect: Recog mul_highpart pattern

2021-07-15 Thread Kewen.Lin via Gcc-patches
Hi Uros,

on 2021/7/15 3:17 PM, Uros Bizjak wrote:
> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin  wrote:
>>
>> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
>>> on 2021/7/14 2:38 PM, Richard Biener wrote:
 On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin  wrote:
>
> on 2021/7/13 8:42 PM, Richard Biener wrote:
>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  wrote:

> I guess the proposed IFN would be directly mapped for [us]mul_highpart?

 Yes.

>>>
>>> Thanks for confirming!  The related patch v2 is attached and the testing
>>> is ongoing.
>>>
>>
>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>>
>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> 
> These XFAILs should be removed after your patch.
> 
I'm curious whether it's intentional not to specify -fno-vect-cost-model
for this test case.  As noted above, this case is sensitive to how we
cost mult_highpart.  Without cost modeling, the XFAILs can be removed
only with this mul_highpart pattern support, no matter how we model it
(x86 part of this patch exists or not).

> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> is actually not needed.
> 

Thanks for the information!  The justification for the x86 part is that
IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart optab
support, and the i386 port has already customized the costing for
MULT_HIGHPART_EXPR (which should mean/involve the case with mul_highpart
optab support).  If we don't follow the same way for IFN_MULH, I'm worried
that we may cost IFN_MULH wrongly.  If taking IFN_MULH as a normal stmt is
the right thing (we shouldn't cost it specially), it at least means we have
to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it has
direct mul_highpart optab support; I think the two should be costed
consistently.  Does that sound reasonable?

BR,
Kewen

> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696
> 
> Uros.
> 
>> They weren't exposed in the testing run with the previous patch which
>> doesn't use IFN way.  By investigating it, the difference comes from
>> the different costing on MULT_HIGHPART_EXPR and IFN_MULH.
>>
>> For MULT_HIGHPART_EXPR, it's costed by 16 from below call:
>>
>> case MULT_EXPR:
>> case WIDEN_MULT_EXPR:
>> case MULT_HIGHPART_EXPR:
>>   stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>>
>> While for IFN_MULH, it's costed by 4 as normal stmt so the total cost
>> becomes profitable and the expected vectorization happens.
>>
>> One conservative fix seems to make IFN_MULH costing go through the
>> unique cost interface for multiplication, that is:
>>
>>   case CFN_MULH:
>> stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>> break;
>>
>> As the test case marks the checks as "xfail", probably it's good to
>> revisit the costing on mul_highpart to ensure it's not priced more.
>>
>> The attached patch also addressed Richard S.'s review comments on
>> two reformatting hunks.  Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>> * internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
>> * internal-fn.def (IFN_MULH): New internal function.
>> * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
>> recog normal multiply highpart as IFN_MULH.
>> * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
>> function CFN_MULH.


Re: [pushed] c++: fix tree_contains_struct for C++ types [PR101095]

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 5:19 AM Jason Merrill via Gcc-patches
 wrote:
>
> Many of the types from cp-tree.def were only marked as having tree_common,
> when actually most of them have type_non_common.  This broke
> g++.dg/modules/xtreme-header-2, as the modules code relies on
> tree_contains_struct to know what bits it needs to stream.
>
> We don't seem to use type_non_common for TYPE_ARGUMENT_PACK, so I bumped it
> down to TS_TYPE_COMMON.  I tried doing the same in cp_tree_size, but that
> breaks without more extensive changes to tree_node_structure.
>
> Why do we need the init_ts function anyway?  It seems redundant with
> tree_node_structure.

tree_node_structure is a helper written for initialize_tree_contains_struct
(the language independent "init_ts"), it's also used for the GTY dispatcher
of the tree union.

>
> Tested x86_64-pc-linux-gnu, applying to trunk.
>
> PR c++/101095
>
> gcc/cp/ChangeLog:
>
> * cp-objcp-common.c (cp_common_init_ts): Mark types as types.
> (cp_tree_size): Remove redundant entries.
> ---
>  gcc/cp/cp-objcp-common.c | 24 ++--
>  1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
> index 46b2248574c..ee255732d5a 100644
> --- a/gcc/cp/cp-objcp-common.c
> +++ b/gcc/cp/cp-objcp-common.c
> @@ -72,10 +72,13 @@ cp_tree_size (enum tree_code code)
>  case DEFERRED_NOEXCEPT:return sizeof (tree_deferred_noexcept);
>  case OVERLOAD: return sizeof (tree_overload);
>  case STATIC_ASSERT: return sizeof (tree_static_assert);
> -case TYPE_ARGUMENT_PACK:
> -case TYPE_PACK_EXPANSION:  return sizeof (tree_type_non_common);
> -case NONTYPE_ARGUMENT_PACK:
> -case EXPR_PACK_EXPANSION:  return sizeof (tree_exp);
> +#if 0
> +  /* This would match cp_common_init_ts, but breaks GC because
> +tree_node_structure_for_code returns TS_TYPE_NON_COMMON for all
> +types.  */
> +case UNBOUND_CLASS_TEMPLATE:
> +case TYPE_ARGUMENT_PACK:   return sizeof (tree_type_common);
> +#endif
>  case ARGUMENT_PACK_SELECT: return sizeof (tree_argument_pack_select);
>  case TRAIT_EXPR:   return sizeof (tree_trait_expr);
>  case LAMBDA_EXPR:   return sizeof (tree_lambda_expr);
> @@ -456,13 +459,8 @@ cp_common_init_ts (void)
>
>/* Random new trees.  */
>MARK_TS_COMMON (BASELINK);
> -  MARK_TS_COMMON (DECLTYPE_TYPE);
>MARK_TS_COMMON (OVERLOAD);
>MARK_TS_COMMON (TEMPLATE_PARM_INDEX);
> -  MARK_TS_COMMON (TYPENAME_TYPE);
> -  MARK_TS_COMMON (TYPEOF_TYPE);
> -  MARK_TS_COMMON (UNBOUND_CLASS_TEMPLATE);
> -  MARK_TS_COMMON (UNDERLYING_TYPE);
>
>/* New decls.  */
>MARK_TS_DECL_COMMON (TEMPLATE_DECL);
> @@ -472,10 +470,16 @@ cp_common_init_ts (void)
>MARK_TS_DECL_NON_COMMON (USING_DECL);
>
>/* New Types.  */
> +  MARK_TS_TYPE_COMMON (UNBOUND_CLASS_TEMPLATE);
> +  MARK_TS_TYPE_COMMON (TYPE_ARGUMENT_PACK);
> +
> +  MARK_TS_TYPE_NON_COMMON (DECLTYPE_TYPE);
> +  MARK_TS_TYPE_NON_COMMON (TYPENAME_TYPE);
> +  MARK_TS_TYPE_NON_COMMON (TYPEOF_TYPE);
> +  MARK_TS_TYPE_NON_COMMON (UNDERLYING_TYPE);
>MARK_TS_TYPE_NON_COMMON (BOUND_TEMPLATE_TEMPLATE_PARM);
>MARK_TS_TYPE_NON_COMMON (TEMPLATE_TEMPLATE_PARM);
>MARK_TS_TYPE_NON_COMMON (TEMPLATE_TYPE_PARM);
> -  MARK_TS_TYPE_NON_COMMON (TYPE_ARGUMENT_PACK);
>MARK_TS_TYPE_NON_COMMON (TYPE_PACK_EXPANSION);
>
>/* Statements.  */
>
> base-commit: c4fee1c646d52a9001a53fa0d4072db86b9be791
> --
> 2.27.0
>


Re: [PATCH V2] Use preferred mode for doloop iv [PR61837].

2021-07-15 Thread Jiufu Guo via Gcc-patches

Iain Sandoe  writes:

On 15 Jul 2021, at 06:09, guojiufu via Gcc-patches 
 wrote:


On 2021-07-15 02:04, Segher Boessenkool wrote:



+@deftypefn {Target Hook} machine_mode TARGET_PREFERRED_DOLOOP_MODE (machine_mode @var{mode})
+This hook takes a @var{mode} which is the original mode of doloop IV.
+And if the target prefers other mode for doloop IV, this hook returns the
+preferred mode.
+For example, on 64bit target, DImode may be preferred than SImode.
+This hook could return the original mode itself if the target prefer to
+keep the original mode.
+The origianl mode and return mode should be MODE_INT.
+@end deftypefn

(Typo, "original").  That has all the right contents, but needs someone
who is better at English than me to look at it / improve it.


well.. how about this small tweak?

This hook takes a @var{mode} for a doloop IV, where @code{mode} 
is the original mode for the operation.  If the target prefers 
an alternate @code{mode} for the operation, then this hook 
should return that mode.  For example, on a 64-bit target, 
@code{DImode} might be preferred over @code{SImode}.  The 
original @code{mode} should be returned if that is suitable. 
Both the original and the returned modes should be 
@code{MODE_INT}.


Hi Iain,

Thanks a lot! I would nearly use all your words. :) 

This hook takes a @var{mode} for a doloop IV, where @code{mode} is 
the original mode for the operation.  If the target prefers an 
alternate @code{mode} for the operation, then this hook should 
return that mode; otherwise the original @code{mode} should be 
returned.  For example, on a 64-bit target, @code{DImode} might be 
preferred over @code{SImode}.  Both the original and the returned 
modes should be @code{MODE_INT}.


BR,
Jiufu



0.02GBP only.
Iain


Re: [PATCH] gimplify: Fix endless recursion on volatile empty type reads/writes [PR101437]

2021-07-15 Thread Richard Biener
On Thu, 15 Jul 2021, Jakub Jelinek wrote:

> Hi!
> 
> Andrew's recent change to optimize away during gimplification not just
> assignments of zero sized types, but also assignments of empty types,
> caused infinite recursion in the gimplifier.
> If such assignment is optimized away, we gimplify separately the to_p
> and from_p operands and throw away the result.  When gimplifying the
> operand that is volatile, we run into the gimplifier code below, which has
> different handling for types with non-BLKmode mode, tries to gimplify
> those as vol.N = expr, and for BLKmode just throws those away.
> Zero sized types will always have BLKmode and so are fine, but for the
> non-BLKmode ones like struct S in the testcase, the vol.N = expr
> gimplification will reach again the gimplify_modify_expr code, see it is
> assignment of empty type and will gimplify again vol.N separately
> (non-volatile, so ok) and expr, on which it will recurse again.
> 
> The following patch breaks that infinite recursion by ignoring bare
> volatile loads from empty types.
> If a volatile load or store of an aggregate is supposed to be a member-wise
> load or store, then there are no non-padding members in the empty types that
> should be copied, and so it is probably ok.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2021-07-15  Jakub Jelinek  
> 
>   PR middle-end/101437
>   * gimplify.c (gimplify_expr): Throw away volatile reads from empty
>   types even if they have non-BLKmode TYPE_MODE.
> 
>   * gcc.c-torture/compile/pr101437.c: New test.
> 
> --- gcc/gimplify.c.jj 2021-06-25 10:36:22.232019090 +0200
> +++ gcc/gimplify.c2021-07-14 13:24:28.860486677 +0200
> @@ -15060,7 +15060,8 @@ gimplify_expr (tree *expr_p, gimple_seq
> *expr_p = NULL;
>   }
>else if (COMPLETE_TYPE_P (TREE_TYPE (*expr_p))
> -&& TYPE_MODE (TREE_TYPE (*expr_p)) != BLKmode)
> +&& TYPE_MODE (TREE_TYPE (*expr_p)) != BLKmode
> +&& !is_empty_type (TREE_TYPE (*expr_p)))
>   {
> /* Historically, the compiler has treated a bare reference
>to a non-BLKmode volatile lvalue as forcing a load.  */
> --- gcc/testsuite/gcc.c-torture/compile/pr101437.c.jj 2021-07-14 
> 13:35:57.155100700 +0200
> +++ gcc/testsuite/gcc.c-torture/compile/pr101437.c2021-07-14 
> 13:35:41.430314232 +0200
> @@ -0,0 +1,29 @@
> +/* PR middle-end/101437 */
> +
> +struct S { int : 1; };
> +
> +void
> +foo (volatile struct S *p)
> +{
> +  struct S s = {};
> +  *p = s;
> +}
> +
> +void
> +bar (volatile struct S *p)
> +{
> +  *p;
> +}
> +
> +void
> +baz (volatile struct S *p)
> +{
> +  struct S s;
> +  s = *p;
> +}
> +
> +void
> +qux (volatile struct S *p, volatile struct S *q)
> +{
> +  *p = *q;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH v3] vect: Recog mul_highpart pattern

2021-07-15 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin  wrote:
>
> Hi Uros,
>
> > on 2021/7/15 3:17 PM, Uros Bizjak wrote:
> > On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin  wrote:
> >>
> >> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> >>> on 2021/7/14 2:38 PM, Richard Biener wrote:
>  On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin  wrote:
> >
> > on 2021/7/13 8:42 PM, Richard Biener wrote:
> >> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  wrote:
> 
> > I guess the proposed IFN would be directly mapped for [us]mul_highpart?
> 
>  Yes.
> 
> >>>
> >>> Thanks for confirming!  The related patch v2 is attached and the testing
> >>> is ongoing.
> >>>
> >>
> >> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> >> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
> >>
> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >
> > These XFAILs should be removed after your patch.
> >
> I'm curious whether it's intentional not to specify -fno-vect-cost-model
> for this test case.  As noted above, this case is sensitive on how we
> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> only with this mul_highpart pattern support, no matter how we model it
> (x86 part of this patch exists or not).
>
> > This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> > is actually not needed.
> >
>
> Thanks for the information!  The justification for the x86 part is that:
> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
> optab support, i386 port has already customized costing for
> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
> support), if we don't follow the same way for IFN_MULH, I'm worried that
> we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
> a right thing (we shouldn't cost it specially), it at least means we
> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
> has direct mul_highpart optab support, I think they should be costed
> consistently.  Does it sound reasonable?

Ah, I was under the impression that the i386 part was introduced to avoid
generation of PMULHW instructions in the testcases above (to keep the
XFAILs).  Based on your explanation - yes, the costing function should
be the same.  So, the x86 part is OK.

Thanks,
Uros.


Re: [PATCH V2] Use preferred mode for doloop iv [PR61837].

2021-07-15 Thread guojiufu via Gcc-patches

On 2021-07-15 14:06, Richard Biener wrote:

On Tue, 13 Jul 2021, Jiufu Guo wrote:


Major changes from v1:
* Add target hook to query preferred doloop mode.
* Recompute doloop iv base from niter under preferred mode.

Currently, the doloop.xx variable uses the same type as niter, which may be
shorter than the word size.  For some cases, it would be better to use a
word-size type.  For example, on a 64-bit system, accessing a 32-bit value
may require a subreg.

So using a 64-bit type may be better for niter if the value can be
represented in both 32 and 64 bits.

This patch adds a target hook for querying the preferred mode for the
doloop iv, and updates the doloop iv mode accordingly.

Bootstrap and regtest pass on powerpc64le, is this ok for trunk?

BR.
Jiufu

gcc/ChangeLog:

2021-07-13  Jiufu Guo  

PR target/61837
* config/rs6000/rs6000.c (TARGET_PREFERRED_DOLOOP_MODE): New hook.
(rs6000_preferred_doloop_mode): New hook.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Add hook preferred_doloop_mode.
* target.def (preferred_doloop_mode): New hook.
* targhooks.c (default_preferred_doloop_mode): New hook.
* targhooks.h (default_preferred_doloop_mode): New hook.
* tree-ssa-loop-ivopts.c (compute_doloop_base_on_mode): New function.
(add_iv_candidate_for_doloop): Call targetm.preferred_doloop_mode
and compute_doloop_base_on_mode.

gcc/testsuite/ChangeLog:

2021-07-13  Jiufu Guo  

PR target/61837
* gcc.target/powerpc/pr61837.c: New test.
---
 gcc/config/rs6000/rs6000.c |  9 +++
 gcc/doc/tm.texi|  4 ++
 gcc/doc/tm.texi.in |  2 +
 gcc/target.def |  7 +++
 gcc/targhooks.c|  8 +++
 gcc/targhooks.h|  2 +
 gcc/testsuite/gcc.target/powerpc/pr61837.c | 16 ++
 gcc/tree-ssa-loop-ivopts.c | 66 +-
 8 files changed, 112 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr61837.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9a5db63d0ef..444f3c49288 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1700,6 +1700,9 @@ static const struct attribute_spec rs6000_attribute_table[] =

 #undef TARGET_DOLOOP_COST_FOR_ADDRESS
 #define TARGET_DOLOOP_COST_FOR_ADDRESS 10

+#undef TARGET_PREFERRED_DOLOOP_MODE
+#define TARGET_PREFERRED_DOLOOP_MODE rs6000_preferred_doloop_mode
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv

@@ -27867,6 +27870,12 @@ rs6000_predict_doloop_p (struct loop *loop)
   return true;
 }

+static machine_mode
+rs6000_preferred_doloop_mode (machine_mode)
+{
+  return word_mode;
+}
+
 /* Implement TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P.  */

 static bool
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 2a41ae5fba1..3f5881220f8 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11984,6 +11984,10 @@ By default, the RTL loop optimizer does not use a present doloop pattern for
 loops containing function calls or branch on table instructions.
 @end deftypefn

+@deftypefn {Target Hook} machine_mode TARGET_PREFERRED_DOLOOP_MODE (machine_mode @var{mode})
+This hook returns a more preferred mode or the @var{mode} itself.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_LEGITIMATE_COMBINED_INSN (rtx_insn *@var{insn})
 Take an instruction in @var{insn} and return @code{false} if the instruction
 is not appropriate as a combination of two or more instructions.  The
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f881cdabe9e..38215149a92 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7917,6 +7917,8 @@ to by @var{ce_info}.

 @hook TARGET_INVALID_WITHIN_DOLOOP

+@hook TARGET_PREFERRED_DOLOOP_MODE
+
 @hook TARGET_LEGITIMATE_COMBINED_INSN

 @hook TARGET_CAN_FOLLOW_JUMP
diff --git a/gcc/target.def b/gcc/target.def
index c009671c583..91a96150e50 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4454,6 +4454,13 @@ loops containing function calls or branch on table instructions.",
  const char *, (const rtx_insn *insn),
  default_invalid_within_doloop)

+DEFHOOK
+(preferred_doloop_mode,
+ "This hook returns a more preferred mode or the @var{mode} itself.",
+ machine_mode,
+ (machine_mode mode),
+ default_preferred_doloop_mode)
+
 /* Returns true for a legitimate combined insn.  */
 DEFHOOK
 (legitimate_combined_insn,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 44a1facedcf..eb5190910dc 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -660,6 +660,14 @@ default_predict_doloop_p (class loop *loop ATTRIBUTE_UNUSED)
   return false;
 }

+/* By default, just use the input MODE itself.  */
+
+machine_mode
+default_preferred_doloop_mode (machine_mode mode)
+{
+  return mode;
+}
+
+/* NULL if INSN insn is valid within a low-overhead loop, otherwise returns
+   an error mes

Re: [PATCH libatomic/arm] avoid warning on constant addresses (PR 101379)

2021-07-15 Thread Christophe Lyon via Gcc-patches
Hi,


On Sat, Jul 10, 2021 at 1:11 AM Martin Sebor via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> The attached tweak avoids the new -Warray-bounds instances when
> building libatomic for arm. Christophe confirms it resolves
> the problem (thank you!)
>
> As we have discussed, the main goal of this class of warnings
> is to detect accesses at addresses derived from null pointers
> (e.g., to struct members or array elements at a nonzero offset).
> Diagnosing accesses at hardcoded addresses is incidental because
> at the stage they are detected the two are not distinguishable
> from each another.
>
> I'm planning (hoping) to implement detection of invalid pointer
> arithmetic involving null for GCC 12, so this patch is a stopgap
> solution to unblock the arm libatomic build without compromising
> the warning.  Once the new detection is in place these workarounds
> can be removed or replaced with something more appropriate (e.g.,
> declaring the objects at the hardwired addresses with an attribute
> like AVR's address or io; that would enable bounds checking at
> those addresses as well).
>
>
May I ping this patch?
ARM toolchain build (cross & bootstrap) has been broken for more
than a week, preventing regression detection.

Thanks,

Christophe

Martin
>


Re: [PATCH][AArch32]: Correct sdot RTL on aarch32

2021-07-15 Thread Christophe Lyon via Gcc-patches
Hi Tamar,


On Tue, May 25, 2021 at 5:41 PM Tamar Christina via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi All,
>
> The RTL generated from dot_prod is invalid, as operand3 cannot be
> written to; it's a normal input.  For the expand it's just another operand
> but the caller does not expect it to be written to.
>
> Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
>
> Ok for master? and backport to GCC 11, 10, 9?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * config/arm/neon.md (dot_prod): Drop statements.
>
> --- inline copy of patch --
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index
> 61d81646475ce3bf62ece2cec2faf0c1fe978ec1..9602e9993aeebf4ec620d105fd20f64498a3b851
> 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -3067,13 +3067,7 @@ (define_expand "dot_prod"
>  DOTPROD)
> (match_operand:VCVTI 3 "register_operand")))]
>"TARGET_DOTPROD"
> -{
> -  emit_insn (
> -gen_neon_dot (operands[3], operands[3], operands[1],
> -operands[2]));
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
> -  DONE;
> -})
> +)
>
>  ;; Auto-vectorizer pattern for usdot
>  (define_expand "usdot_prod"
>
>
This patch is causing ICEs on arm-eabi (and probably arm-linux-gnueabi but
trunk build is currently broken):

 FAIL: gcc.target/arm/simd/vect-dot-s8.c (internal compiler error)
FAIL: gcc.target/arm/simd/vect-dot-s8.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h:15:1: error:
unrecognizable insn:
(insn 29 28 30 5 (set (reg:V4SI 132 [ vect_patt_31.15 ])
(plus:V4SI (unspec:V4SI [
(reg:V16QI 182)
(reg:V16QI 183)
] UNSPEC_DOT_S)
(reg:V4SI 184))) -1
 (nil))
during RTL pass: vregs
/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h:15:1: internal compiler
error: in extract_insn, at recog.c:2769
0x5fc656 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/gcc/rtl-error.c:108
0x5fc672 _fatal_insn_not_found(rtx_def const*, char const*, int, char
const*)
/gcc/rtl-error.c:116
0xcbbe07 extract_insn(rtx_insn*)
/gcc/recog.c:2769
0x9e2e95 instantiate_virtual_regs_in_insn
/gcc/function.c:1611
0x9e2e95 instantiate_virtual_regs
/gcc/function.c:1985
0x9e2e95 execute
/gcc/function.c:2034

Can you check?

Thanks,

Christophe


Re: [PATCH v3] vect: Recog mul_highpart pattern

2021-07-15 Thread Kewen.Lin via Gcc-patches
on 2021/7/15 4:04 PM, Kewen.Lin via Gcc-patches wrote:
> Hi Uros,
> 
>> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin  wrote:
>>>
 on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> on 2021/7/14 2:38 PM, Richard Biener wrote:
> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin  wrote:
>>
>> on 2021/7/13 8:42 PM, Richard Biener wrote:
>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  wrote:
>
>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>
> Yes.
>

 Thanks for confirming!  The related patch v2 is attached and the testing
 is ongoing.

>>>
>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>>>
>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>
>> These XFAILs should be removed after your patch.
>>
> I'm curious whether it's intentional not to specify -fno-vect-cost-model
> for this test case.  As noted above, this case is sensitive on how we
> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> only with this mul_highpart pattern support, no matter how we model it
> (x86 part of this patch exists or not).
> 
>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
>> is actually not needed.
>>
> 
> Thanks for the information!  The justification for the x86 part is that
> IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart optab
> support, and the i386 port already has customized costing for
> MULT_HIGHPART_EXPR (which should mean/involve the case with mul_highpart
> optab support).  If we don't follow the same approach for IFN_MULH, I'm
> worried that we may cost IFN_MULH wrongly.  If treating IFN_MULH as a
> normal stmt is the right thing (we shouldn't cost it specially), it at
> least means we have to adjust ix86_multiplication_cost for
> MULT_HIGHPART_EXPR when it has direct mul_highpart optab support; I
> think they should be costed consistently.  Does that sound reasonable?
> 

Hi Richard(s),

This potential inconsistency in handling seems like a counterexample to
using a new IFN rather than the existing tree_code; it looks hard to
maintain (we would have to remember to keep all of its handlings
consistent).  ;)
From this perspective, maybe it's better to move back to using the
tree_code and guard it under can_mult_highpart_p == 1 (just like the IFN,
which avoids the costing issue Richi pointed out before)?

What do you think?

BR,
Kewen


Re: [PATCH v3] vect: Recog mul_highpart pattern

2021-07-15 Thread Kewen.Lin via Gcc-patches
on 2021/7/15 4:23 PM, Uros Bizjak wrote:
> On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin  wrote:
>>
>> Hi Uros,
>>
>> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
>>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin  wrote:

 on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> on 2021/7/14 2:38 PM, Richard Biener wrote:
>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin  wrote:
>>>
>>> on 2021/7/13 8:42 PM, Richard Biener wrote:
 On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  wrote:
>>
>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>
>> Yes.
>>
>
> Thanks for confirming!  The related patch v2 is attached and the testing
> is ongoing.
>

 It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
 aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:

 XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
 XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
 XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
 XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>>
>>> These XFAILs should be removed after your patch.
>>>
>> I'm curious whether it's intentional not to specify -fno-vect-cost-model
>> for this test case.  As noted above, this case is sensitive to how we
>> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
>> with just this mul_highpart pattern support, no matter how we model it
>> (whether the x86 part of this patch exists or not).
>>
>>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
>>> is actually not needed.
>>>
>>
>> Thanks for the information!  The justification for the x86 part is that
>> IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart optab
>> support, and the i386 port already has customized costing for
>> MULT_HIGHPART_EXPR (which should mean/involve the case with mul_highpart
>> optab support).  If we don't follow the same approach for IFN_MULH, I'm
>> worried that we may cost IFN_MULH wrongly.  If treating IFN_MULH as a
>> normal stmt is the right thing (we shouldn't cost it specially), it at
>> least means we have to adjust ix86_multiplication_cost for
>> MULT_HIGHPART_EXPR when it has direct mul_highpart optab support; I
>> think they should be costed consistently.  Does that sound reasonable?
> 
> Ah, I was under the impression that the i386 part was introduced to avoid
> generation of PMULHW instructions in the testcases above (to keep
> XFAILs). Based on your explanation - yes, the costing function should
> be the same. So, the x86 part is OK.
> 

Thanks!  It does have the effect of keeping the XFAILs.  ;)  I guess the
case doesn't care much about the costing, just like most vectorization
cases?  If so, do you want me to remove the xfails by adding the extra
option "-fno-vect-cost-model" along with this patch?

BR,
Kewen


Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification

2021-07-15 Thread Christophe Lyon via Gcc-patches
On Mon, Jul 12, 2021 at 5:31 PM Richard Sandiford via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Jonathan Wright  writes:
> > Hi,
> >
> > Version 2 of this patch adds more code generation tests to show the
> > benefit of this RTL simplification as well as adding a new helper
> function
> > 'rtx_vec_series_p' to reduce code duplication.
> >
> > Patch tested as version 1 - ok for master?
>
> Sorry for the slow reply.
>
> > Regression tested and bootstrapped on aarch64-none-linux-gnu,
> > x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
> > aarch64_be-none-linux-gnu - no issues.
>
> I've also tested this on powerpc64le-unknown-linux-gnu, no issues again.
>
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index
> 6476812a21268e28219d1e302ee1c979d528a6ca..0ff6ca87e4432cfeff1cae1dd219ea81ea0b73e4
> 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -6276,6 +6276,26 @@ combine_simplify_rtx (rtx x, machine_mode
> op0_mode, int in_dest,
> > - 1,
> > 0));
> >break;
> > +case VEC_SELECT:
> > +  {
> > + rtx trueop0 = XEXP (x, 0);
> > + mode = GET_MODE (trueop0);
> > + rtx trueop1 = XEXP (x, 1);
> > + int nunits;
> > + /* If we select a low-part subreg, return that.  */
> > + if (GET_MODE_NUNITS (mode).is_constant (&nunits)
> > + && targetm.can_change_mode_class (mode, GET_MODE (x),
> ALL_REGS))
> > +   {
> > + int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0)
> : 0;
> > +
> > + if (rtx_vec_series_p (trueop1, offset))
> > +   {
> > + rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
> > + if (new_rtx != NULL_RTX)
> > +   return new_rtx;
> > +   }
> > +   }
> > +  }
>
> Since this occurs three times, I think it would be worth having
> a new predicate:
>
> /* Return true if, for all OP of mode OP_MODE:
>
>  (vec_select:RESULT_MODE OP SEL)
>
>is equivalent to the lowpart RESULT_MODE of OP.  */
>
> bool
> vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx
> sel)
>
> containing the GET_MODE_NUNITS (…).is_constant, can_change_mode_class
> and rtx_vec_series_p tests.
>
> I think the function belongs in rtlanal.[hc], even though subreg_lowpart_p
> is in emit-rtl.c.
>
> > diff --git a/gcc/config/aarch64/aarch64.md
> b/gcc/config/aarch64/aarch64.md
> > index
> aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105
> 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -1884,15 +1884,16 @@
> >  )
> >
> >  (define_insn "*zero_extend2_aarch64"
> > -  [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
> > -(zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand"
> "r,m,m")))]
> > +  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
> > +(zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand"
> "r,m,m,w")))]
> >""
> >"@
> > and\t%0, %1, 
> > ldr\t%w0, %1
> > -   ldr\t%0, %1"
> > -  [(set_attr "type" "logic_imm,load_4,f_loads")
> > -   (set_attr "arch" "*,*,fp")]
> > +   ldr\t%0, %1
> > +   umov\t%w0, %1.[0]"
> > +  [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
> > +   (set_attr "arch" "*,*,fp,fp")]
>
> FTR (just to show I thought about it): I don't know whether the umov
> can really be considered an fp operation rather than a simd operation,
> but since we don't support fp without simd, this is already a distinction
> without a difference.  So the pattern is IMO OK as-is.
>
> > diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> > index
> 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35
> 100644
> > --- a/gcc/config/arm/vfp.md
> > +++ b/gcc/config/arm/vfp.md
> > @@ -224,7 +224,7 @@
> >  ;; problems because small constants get converted into adds.
> >  (define_insn "*arm_movsi_vfp"
> >[(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m
> ,*t,r,*t,*t, *Uv")
> > -  (match_operand:SI 1 "general_operand" "rk,
> I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
> > +  (match_operand:SI 1 "general_operand" "rk,
> I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
> >"TARGET_ARM && TARGET_HARD_FLOAT
> > && (   s_register_operand (operands[0], SImode)
> > || s_register_operand (operands[1], SImode))"
>
> I'll assume that an Arm maintainer would have spoken up by now if
> they didn't want this for some reason.
>
> > diff --git a/gcc/rtl.c b/gcc/rtl.c
> > index
> aaee882f5ca3e37b59c9829e41d0864070c170eb..3e8b3628b0b76b41889b77bb0019f582ee6f5aaa
> 100644
> > --- a/gcc/rtl.c
> > +++ b/gcc/rtl.c
> > @@ -736,6 +736,19 @@ rtvec_all_equal_p (const_rtvec vec)
> >  }
> >  }
> >
> > +/* Return true if element-selection indices in VEC are in series.  */
> > +
> > +bool
> > +rtx_vec_series_p (const_rtx vec, int start)
>
> I think rtvec_series_p would be better, for consistency with
> rtvec_all_equal_p.  Al

Re: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n intrinsics

2021-07-15 Thread Christophe Lyon via Gcc-patches
Hi Prathamesh,

On Mon, Jul 5, 2021 at 11:25 AM Kyrylo Tkachov via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
>
> > -Original Message-
> > From: Prathamesh Kulkarni 
> > Sent: 05 July 2021 10:18
> > To: gcc Patches ; Kyrylo Tkachov
> > 
> > Subject: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n
> > intrinsics
> >
> > Hi Kyrill,
> > I assume this patch is OK to commit after bootstrap+testing ?
>
> Yes.
> Thanks,
> Kyrill
>
>

The updated testcase fails on some configs:
gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+, r[0-9]+ found 2
times
FAIL:  gcc.target/arm/armv8_2-fp16-neon-2.c scan-assembler-times
vdup\\.16\\tq[0-9]+, r[0-9]+ 3

For instance on arm-none-eabi with default configuration flags
(mode/cpu/fpu)
and default runtestflags.
The same toolchain config also fails on this test when
overriding runtestflags with:
-mthumb/-mfloat-abi=soft/-march=armv6s-m
-mthumb/-mfloat-abi=soft/-march=armv7-m
-mthumb/-mfloat-abi=soft/-march=armv8.1-m.main

Can you fix this please?

Thanks,

Christophe

>
> > Thanks,
> > Prathamesh
>


Re: [PATCH v3] vect: Recog mul_highpart pattern

2021-07-15 Thread Uros Bizjak via Gcc-patches
On Thu., 15 Jul. 2021 at 10:49, Kewen.Lin
wrote:

> on 2021/7/15 4:23 PM, Uros Bizjak wrote:
> > On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin  wrote:
> >>
> >> Hi Uros,
> >>
> >>> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
> >>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin  wrote:
> 
>  on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> > on 2021/7/14 2:38 PM, Richard Biener wrote:
> >> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin 
> wrote:
> >>>
> >>> on 2021/7/13 8:42 PM, Richard Biener wrote:
>  On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin 
> wrote:
> >>
> >>> I guess the proposed IFN would be directly mapped for
> [us]mul_highpart?
> >>
> >> Yes.
> >>
> >
> > Thanks for confirming!  The related patch v2 is attached and the
> testing
> > is ongoing.
> >
> 
>  It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>  aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as
> below:
> 
>  XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>  XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>  XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>  XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>>
> >>> These XFAILs should be removed after your patch.
> >>>
> >> I'm curious whether it's intentional not to specify -fno-vect-cost-model
> >> for this test case.  As noted above, this case is sensitive to how we
> >> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> >> with just this mul_highpart pattern support, no matter how we model it
> >> (whether the x86 part of this patch exists or not).
> >>
> >>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> >>> is actually not needed.
> >>>
> >>
> >> Thanks for the information!  The justification for the x86 part is that
> >> IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart optab
> >> support, and the i386 port already has customized costing for
> >> MULT_HIGHPART_EXPR (which should mean/involve the case with mul_highpart
> >> optab support).  If we don't follow the same approach for IFN_MULH, I'm
> >> worried that we may cost IFN_MULH wrongly.  If treating IFN_MULH as a
> >> normal stmt is the right thing (we shouldn't cost it specially), it at
> >> least means we have to adjust ix86_multiplication_cost for
> >> MULT_HIGHPART_EXPR when it has direct mul_highpart optab support; I
> >> think they should be costed consistently.  Does that sound reasonable?
> >
> > Ah, I was under the impression that the i386 part was introduced to avoid
> > generation of PMULHW instructions in the testcases above (to keep
> > XFAILs). Based on your explanation - yes, the costing function should
> > be the same. So, the x86 part is OK.
> >
>
> Thanks!  It does have the effect of keeping the XFAILs.  ;)  I guess the
> case doesn't care much about the costing, just like most vectorization
> cases?  If so, do you want me to remove the xfails by adding the extra
> option "-fno-vect-cost-model" along with this patch?
>

Yes, please do so. The testcase cares only about PMULHW generation.
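For illustration, the updated test might then drop the xfails and add the option via directives like these (a hypothetical sketch — the actual contents of gcc.target/i386/pr100637-3w.c may differ):

```c
/* { dg-do compile } */
/* { dg-options "-O2 -msse2 -fno-vect-cost-model" } */
/* { dg-final { scan-assembler "pmulhw" } } */
/* { dg-final { scan-assembler "pmulhuw" } } */
```

Disabling the cost model keeps the scan-assembler checks focused purely on whether the mul_highpart pattern is recognized, independent of target costing.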

Thanks,
Uros.


> BR,
> Kewen
>


[PATCH V3] Use preferred mode for doloop IV [PR61837]

2021-07-15 Thread Jiufu Guo via Gcc-patches
Changes relative to V2, per review comments:
* Use an if check instead of an assert, and refine the remaining assert
* Use a better regex in the test case, e.g. (?n)/(?p)
* Use better wording in target.def

Currently, the doloop.xx variable uses the same type as niter, which may
be narrower than the word size.  For some targets it would be better to
use a word-size type.  For example, on a 64-bit system, accessing a
32-bit value may require a subreg, so using a 64-bit type may be better
for niter if the value can be represented in both 32 and 64 bits.

This patch adds a target hook to query the preferred mode for the doloop
IV, and updates the mode accordingly.

Bootstrapped and regtested on powerpc64le.  Is this OK for trunk?

BR.
Jiufu

gcc/ChangeLog:

2021-07-15  Jiufu Guo  

PR target/61837
* config/rs6000/rs6000.c (TARGET_PREFERRED_DOLOOP_MODE): New hook.
(rs6000_preferred_doloop_mode): New hook.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook preferred_doloop_mode.
* target.def (preferred_doloop_mode): New hook.
* targhooks.c (default_preferred_doloop_mode): New hook.
* targhooks.h (default_preferred_doloop_mode): New hook.
* tree-ssa-loop-ivopts.c (compute_doloop_base_on_mode): New function.
(add_iv_candidate_for_doloop): Call targetm.preferred_doloop_mode
and compute_doloop_base_on_mode.

gcc/testsuite/ChangeLog:

2021-07-15  Jiufu Guo  

PR target/61837
* gcc.target/powerpc/pr61837.c: New test.


---
 gcc/config/rs6000/rs6000.c | 11 
 gcc/doc/tm.texi|  9 +++
 gcc/doc/tm.texi.in |  2 +
 gcc/target.def | 13 
 gcc/targhooks.c|  8 +++
 gcc/targhooks.h|  1 +
 gcc/testsuite/gcc.target/powerpc/pr61837.c | 20 +++
 gcc/tree-ssa-loop-ivopts.c | 69 +-
 8 files changed, 131 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr61837.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9a5db63d0ef..3bdf0cb97a3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1700,6 +1700,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_DOLOOP_COST_FOR_ADDRESS
 #define TARGET_DOLOOP_COST_FOR_ADDRESS 10
 
+#undef TARGET_PREFERRED_DOLOOP_MODE
+#define TARGET_PREFERRED_DOLOOP_MODE rs6000_preferred_doloop_mode
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
 
@@ -27867,6 +27870,14 @@ rs6000_predict_doloop_p (struct loop *loop)
   return true;
 }
 
+/* Implement TARGET_PREFERRED_DOLOOP_MODE. */
+
+static machine_mode
+rs6000_preferred_doloop_mode (machine_mode)
+{
+  return word_mode;
+}
+
 /* Implement TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P.  */
 
 static bool
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 2a41ae5fba1..fcfebc2ae37 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11984,6 +11984,15 @@ By default, the RTL loop optimizer does not use a 
present doloop pattern for
 loops containing function calls or branch on table instructions.
 @end deftypefn
 
+@deftypefn {Target Hook} machine_mode TARGET_PREFERRED_DOLOOP_MODE 
(machine_mode @var{mode})
+This hook takes a @var{mode} for a doloop IV, where @code{mode} is the
+original mode for the operation.  If the target prefers an alternate
+@code{mode} for the operation, then this hook should return that mode;
+otherwise the original @code{mode} should be returned.  For example, on a
+64-bit target, @code{DImode} might be preferred over @code{SImode}.  Both the
+original and the returned modes should be @code{MODE_INT}.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_LEGITIMATE_COMBINED_INSN (rtx_insn 
*@var{insn})
 Take an instruction in @var{insn} and return @code{false} if the instruction
 is not appropriate as a combination of two or more instructions.  The
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f881cdabe9e..38215149a92 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7917,6 +7917,8 @@ to by @var{ce_info}.
 
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
+@hook TARGET_PREFERRED_DOLOOP_MODE
+
 @hook TARGET_LEGITIMATE_COMBINED_INSN
 
 @hook TARGET_CAN_FOLLOW_JUMP
diff --git a/gcc/target.def b/gcc/target.def
index c009671c583..892c97550b2 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4454,6 +4454,19 @@ loops containing function calls or branch on table 
instructions.",
  const char *, (const rtx_insn *insn),
  default_invalid_within_doloop)
 
+/* Returns the machine mode which the target prefers for doloop IV.  */
+DEFHOOK
+(preferred_doloop_mode,
+"This hook takes a @var{mode} for a doloop IV, where @code{mode} is the\n\
+original mode for the operation.  If the target prefers an alternate\n\
+@code{mode} for the operation, then this hook should return that mode;\n\
+otherwise the original @code{mode}

[PATCH 1/2] Streamline vect_gen_while

2021-07-15 Thread Richard Biener
This adjusts the vect_gen_while API to match that of
vect_gen_while_not, allowing further patches to generate more
than one stmt for the while case.

Bootstrapped and tested on x86_64-unknown-linux-gnu, tested a
toy example on SVE that it still produces the same code.

OK?

2021-07-15  Richard Biener  

* tree-vectorizer.h (vect_gen_while): Match up with
vect_gen_while_not.
* tree-vect-stmts.c (vect_gen_while): Adjust API to that
of vect_gen_while_not.
(vect_gen_while_not): Adjust.
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Likewise.
---
 gcc/tree-vect-loop-manip.c | 14 ++
 gcc/tree-vect-stmts.c  | 16 
 gcc/tree-vectorizer.h  |  3 ++-
 3 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index c29ffb3356c..1f3d6614e6c 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -609,11 +609,8 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
}
 
  if (use_masks_p)
-   {
- init_ctrl = make_temp_ssa_name (ctrl_type, NULL, "max_mask");
- gimple *tmp_stmt = vect_gen_while (init_ctrl, start, end);
- gimple_seq_add_stmt (preheader_seq, tmp_stmt);
-   }
+   init_ctrl = vect_gen_while (preheader_seq, ctrl_type,
+   start, end, "max_mask");
  else
{
  init_ctrl = make_temp_ssa_name (compare_type, NULL, "max_len");
@@ -652,9 +649,10 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   /* Get the control value for the next iteration of the loop.  */
   if (use_masks_p)
{
- next_ctrl = make_temp_ssa_name (ctrl_type, NULL, "next_mask");
- gcall *call = vect_gen_while (next_ctrl, test_index, this_test_limit);
- gsi_insert_before (test_gsi, call, GSI_SAME_STMT);
+ gimple_seq stmts = NULL;
+ next_ctrl = vect_gen_while (&stmts, ctrl_type, test_index,
+ this_test_limit, "next_mask");
+ gsi_insert_seq_before (test_gsi, stmts, GSI_SAME_STMT);
}
   else
{
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index d9eeda50278..6a25d661800 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -12002,19 +12002,21 @@ supportable_narrowing_operation (enum tree_code code,
 /* Generate and return a statement that sets vector mask MASK such that
MASK[I] is true iff J + START_INDEX < END_INDEX for all J <= I.  */
 
-gcall *
-vect_gen_while (tree mask, tree start_index, tree end_index)
+tree
+vect_gen_while (gimple_seq *seq, tree mask_type, tree start_index,
+   tree end_index, const char *name)
 {
   tree cmp_type = TREE_TYPE (start_index);
-  tree mask_type = TREE_TYPE (mask);
   gcc_checking_assert (direct_internal_fn_supported_p (IFN_WHILE_ULT,
   cmp_type, mask_type,
   OPTIMIZE_FOR_SPEED));
   gcall *call = gimple_build_call_internal (IFN_WHILE_ULT, 3,
start_index, end_index,
build_zero_cst (mask_type));
-  gimple_call_set_lhs (call, mask);
-  return call;
+  tree tmp = make_temp_ssa_name (mask_type, NULL, name);
+  gimple_call_set_lhs (call, tmp);
+  gimple_seq_add_stmt (seq, call);
+  return tmp;
 }
 
 /* Generate a vector mask of type MASK_TYPE for which index I is false iff
@@ -12024,9 +12026,7 @@ tree
 vect_gen_while_not (gimple_seq *seq, tree mask_type, tree start_index,
tree end_index)
 {
-  tree tmp = make_ssa_name (mask_type);
-  gcall *call = vect_gen_while (tmp, start_index, end_index);
-  gimple_seq_add_stmt (seq, call);
+  tree tmp = vect_gen_while (seq, mask_type, start_index, end_index);
   return gimple_build (seq, BIT_NOT_EXPR, mask_type, tmp);
 }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 4c4bc810c35..49afdd898d0 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1948,7 +1948,8 @@ extern bool vect_supportable_shift (vec_info *, enum 
tree_code, tree);
 extern tree vect_gen_perm_mask_any (tree, const vec_perm_indices &);
 extern tree vect_gen_perm_mask_checked (tree, const vec_perm_indices &);
 extern void optimize_mask_stores (class loop*);
-extern gcall *vect_gen_while (tree, tree, tree);
+extern tree vect_gen_while (gimple_seq *, tree, tree, tree,
+   const char * = nullptr);
 extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
 extern opt_result vect_get_vector_types_for_stmt (vec_info *,
  stmt_vec_info, tree *,
-- 
2.26.2



[PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Biener
The following extends the existing loop masking support using
SVE WHILE_ULT to x86 by providing an alternate way to produce the
mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
you can now enable masked vectorized epilogues (=1) or fully
masked vector loops (=2).

What's missing is using a scalar IV for the loop control
(but in principle AVX512 can use the mask here - just the patch
doesn't seem to work for AVX512 yet for some reason - likely
expand_vec_cond_expr_p doesn't work there).  What's also missing
is providing more support for predicated operations in the case
of reductions either via VEC_COND_EXPRs or via implementing
some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
to masked AVX512 operations.

For AVX2 and

int foo (unsigned *a, unsigned * __restrict b, int n)
{
  unsigned sum = 1;
  for (int i = 0; i < n; ++i)
b[i] += a[i];
  return sum;
}

we get

.L3:
vpmaskmovd  (%rsi,%rax), %ymm0, %ymm3
vpmaskmovd  (%rdi,%rax), %ymm0, %ymm1
addl$8, %edx
vpaddd  %ymm3, %ymm1, %ymm1
vpmaskmovd  %ymm1, %ymm0, (%rsi,%rax)
vmovd   %edx, %xmm1
vpsubd  %ymm15, %ymm2, %ymm0
addq$32, %rax
vpbroadcastd%xmm1, %ymm1
vpaddd  %ymm4, %ymm1, %ymm1
vpsubd  %ymm15, %ymm1, %ymm1
vpcmpgtd%ymm1, %ymm0, %ymm0
vptest  %ymm0, %ymm0
jne .L3

for the fully masked loop body and for the masked epilogue
we see

.L4:
vmovdqu (%rsi,%rax), %ymm3
vpaddd  (%rdi,%rax), %ymm3, %ymm0
vmovdqu %ymm0, (%rsi,%rax)
addq$32, %rax
cmpq%rax, %rcx
jne .L4
movl%edx, %eax
andl$-8, %eax
testb   $7, %dl
je  .L11
.L3:
subl%eax, %edx
vmovdqa .LC0(%rip), %ymm1
salq$2, %rax
vmovd   %edx, %xmm0
movl$-2147483648, %edx
addq%rax, %rsi
vmovd   %edx, %xmm15
vpbroadcastd%xmm0, %ymm0
vpbroadcastd%xmm15, %ymm15
vpsubd  %ymm15, %ymm1, %ymm1
vpsubd  %ymm15, %ymm0, %ymm0
vpcmpgtd%ymm1, %ymm0, %ymm0
vpmaskmovd  (%rsi), %ymm0, %ymm1
vpmaskmovd  (%rdi,%rax), %ymm0, %ymm2
vpaddd  %ymm2, %ymm1, %ymm1
vpmaskmovd  %ymm1, %ymm0, (%rsi)
.L11:
vzeroupper

compared to

.L3:
movl%edx, %r8d
subl%eax, %r8d
leal-1(%r8), %r9d
cmpl$2, %r9d
jbe .L6
leaq(%rcx,%rax,4), %r9
vmovdqu (%rdi,%rax,4), %xmm2
movl%r8d, %eax
andl$-4, %eax
vpaddd  (%r9), %xmm2, %xmm0
addl%eax, %esi
andl$3, %r8d
vmovdqu %xmm0, (%r9)
je  .L2
.L6:
movslq  %esi, %r8
leaq0(,%r8,4), %rax
movl(%rdi,%r8,4), %r8d
addl%r8d, (%rcx,%rax)
leal1(%rsi), %r8d
cmpl%r8d, %edx
jle .L2
addl$2, %esi
movl4(%rdi,%rax), %r8d
addl%r8d, 4(%rcx,%rax)
cmpl%esi, %edx
jle .L2
movl8(%rdi,%rax), %edx
addl%edx, 8(%rcx,%rax)
.L2:

I'm giving this a little testing right now but will dig on why
I don't get masked loops when AVX512 is enabled.

Still comments are appreciated.

Thanks,
Richard.

2021-07-15  Richard Biener  

* tree-vect-stmts.c (can_produce_all_loop_masks_p): We
also can produce masks with VEC_COND_EXPRs.
* tree-vect-loop.c (vect_gen_while): Generate the mask
with a VEC_COND_EXPR in case WHILE_ULT is not supported.
---
 gcc/tree-vect-loop.c  |  8 ++-
 gcc/tree-vect-stmts.c | 50 ++-
 2 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index fc3dab0d143..2214ed11dfb 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -975,11 +975,17 @@ can_produce_all_loop_masks_p (loop_vec_info loop_vinfo, 
tree cmp_type)
 {
   rgroup_controls *rgm;
   unsigned int i;
+  tree cmp_vectype;
   FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), i, rgm)
 if (rgm->type != NULL_TREE
&& !direct_internal_fn_supported_p (IFN_WHILE_ULT,
cmp_type, rgm->type,
-   OPTIMIZE_FOR_SPEED))
+   OPTIMIZE_FOR_SPEED)
+   && ((cmp_vectype
+  = truth_type_for (build_vector_type
+(cmp_type, TYPE_VECTOR_SUBPARTS (rgm->type,
+   true)
+   && !expand_vec_cond_expr_p (rgm->type, cmp_vectype, LT_EXPR))
   return false;
   return true;
 }
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 6a25d661800..216986399b1 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -12007,16 +12007,46 @@ vect_gen_while (gimple_seq *seq, tree mask_type, tree 
s

Re: [PATCH] [DWARF] Fix hierarchy of debug information for offload kernels.

2021-07-15 Thread Thomas Schwinge
Hi!

On 2021-07-02T09:15:27+0200, Richard Biener via Gcc-patches 
 wrote:
> On Thu, Jul 1, 2021 at 5:17 PM Hafiz Abid Qadeer  
> wrote:
>>
>> Currently, if we look at the debug information for offload kernel
>> regions, it looks something like this:
>>
>> void foo (void)
>> {
>> #pragma acc kernels
>>   {
>>
>>   }
>> }
>>
>> DW_TAG_compile_unit
>>   DW_AT_name("")
>>
>>   DW_TAG_subprogram // notional parent function (foo) with no code range
>>
>> DW_TAG_subprogram // offload function foo._omp_fn.0
>>
>> There is an artificial compile unit. It contains a parent subprogram which
>> has the offload function as its child.  The parent function makes sense in
>> host code where it actually exists and does have an address range. But in
>> offload code, it does not exist and neither the generated dwarf has an
>> address range for this function.
>>
>> When debuggers read the DWARF for offload code, they see a function with no
>> address range and discard it along with its children, which include the
>> offload function.  This results in a poor debug experience for offload code.
>>
>> This patch tries to solve this problem by making offload kernels children of
>> the "artificial" compile unit instead of a non-existent parent function.  This
>> not only improves the debug experience but also better reflects reality
>> in the debug info.
>>
>> Patch was tested on x86_64 with amdgcn offload. Debug behavior was
>> tested with rocgdb.
>
> The proper fix is to reflect this in the functions declaration which currently
> will have a DECL_CONTEXT of the containing function.  That could be
> done either on the host as well or alternatively at the time we offload
> the "child" but not the parent.

Does that mean adding a (very simple) new pass in the offloading
compilation pipeline, conditionalizing this 'DECL_CONTEXT' modification
under '#ifdef ACCEL_COMPILER'?  See
'gcc/omp-offload.c:pass_omp_target_link' for a simple example.

Should that be placed at the beginning of the offloading pipeline, thus
before 'pass_oacc_device_lower' (see 'gcc/passes.def'), or doesn't matter
where, I suppose?

Please cross-reference 'gcc/omp-low.c:create_omp_child_function',
'gcc/omp-expand.c:adjust_context_and_scope', and the new pass, assuming
these are the relevant pieces here?


> Note that the "parent" should be abstract but I don't think dwarf has a
> way to express a fully abstract parent of a concrete instance child - or
> at least how GCC expresses this causes consumers to "misinterpret"
> that.  I wonder if adding a DW_AT_declaration to the late DWARF
> emitted "parent" would fix things as well here?

(I suppose not, Abid?)


Grüße
 Thomas


>> gcc/
>>
>> * gcc/dwarf2out.c (notional_parents_list): New file variable.
>> (gen_subprogram_die): Record offload kernel functions in
>> notional_parents_list.
>> (fixup_notional_parents): New function.
>> (dwarf2out_finish): Call fixup_notional_parents.
>> (dwarf2out_c_finalize): Reset notional_parents_list.
>> ---
>>  gcc/dwarf2out.c | 68 +++--
>>  1 file changed, 66 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
>> index 80acf165fee..769bb7fc4a8 100644
>> --- a/gcc/dwarf2out.c
>> +++ b/gcc/dwarf2out.c
>> @@ -3506,6 +3506,11 @@ static GTY(()) limbo_die_node *limbo_die_list;
>> DW_AT_{,MIPS_}linkage_name once their DECL_ASSEMBLER_NAMEs are set.  */
>>  static GTY(()) limbo_die_node *deferred_asm_name;
>>
>> +/* A list of DIEs which represent parents of nested offload kernels.  These
>> +   functions exist on the host side but not in the offloed code.  But they
>> +   still show up as parent of the ofload kernels in DWARF. */
>> +static GTY(()) limbo_die_node *notional_parents_list;
>> +
>>  struct dwarf_file_hasher : ggc_ptr_hash
>>  {
>>typedef const char *compare_type;
>> @@ -23652,8 +23657,23 @@ gen_subprogram_die (tree decl, dw_die_ref 
>> context_die)
>>   if (fde->dw_fde_begin)
>> {
>>   /* We have already generated the labels.  */
>> - add_AT_low_high_pc (subr_die, fde->dw_fde_begin,
>> - fde->dw_fde_end, false);
>> + add_AT_low_high_pc (subr_die, fde->dw_fde_begin,
>> + fde->dw_fde_end, false);
>> +
>> +/* Offload kernel functions are nested within a parent function
>> +   that doesn't actually exist in the offload object.  GDB
>> +   will ignore the function and everything nested within it as
>> +   the function does not have an address range.  We mark the
>> +   parent functions here and will later fix them.  */
>> +if (lookup_attribute ("omp target entrypoint",
>> +  DECL_ATTRIBUTES (decl)))
>> +  {
>> +limbo_die_node *node = ggc_cleared_alloc ();
>> +node->die = subr_die->die_parent;
>> +   

Re: [PATCH] [DWARF] Fix hierarchy of debug information for offload kernels.

2021-07-15 Thread Hafiz Abid Qadeer
On 15/07/2021 11:33, Thomas Schwinge wrote:
> 
>> Note that the "parent" should be abstract but I don't think dwarf has a
>> way to express a fully abstract parent of a concrete instance child - or
>> at least how GCC expresses this causes consumers to "misinterpret"
>> that.  I wonder if adding a DW_AT_declaration to the late DWARF
>> emitted "parent" would fix things as well here?
> 
> (I suppose not, Abid?)
> 

Yes, adding DW_AT_declaration does not fix the problem.

-- 
Hafiz Abid Qadeer
Mentor, a Siemens Business


Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 12:30 PM Richard Biener  wrote:
>
> The following extends the existing loop masking support using
> SVE WHILE_ULT to x86 by providing an alternate way to produce the
> mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> you can now enable masked vectorized epilogues (=1) or fully
> masked vector loops (=2).
>
> What's missing is using a scalar IV for the loop control
> (but in principle AVX512 can use the mask here - just the patch
> doesn't seem to work for AVX512 yet for some reason - likely
> expand_vec_cond_expr_p doesn't work there).  What's also missing
> is providing more support for predicated operations in the case
> of reductions either via VEC_COND_EXPRs or via implementing
> some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
> to masked AVX512 operations.
>
> For AVX2 and
>
> int foo (unsigned *a, unsigned * __restrict b, int n)
> {
>   unsigned sum = 1;
>   for (int i = 0; i < n; ++i)
> b[i] += a[i];
>   return sum;
> }
>
> we get
>
> .L3:
> vpmaskmovd  (%rsi,%rax), %ymm0, %ymm3
> vpmaskmovd  (%rdi,%rax), %ymm0, %ymm1
> addl$8, %edx
> vpaddd  %ymm3, %ymm1, %ymm1
> vpmaskmovd  %ymm1, %ymm0, (%rsi,%rax)
> vmovd   %edx, %xmm1
> vpsubd  %ymm15, %ymm2, %ymm0
> addq$32, %rax
> vpbroadcastd%xmm1, %ymm1
> vpaddd  %ymm4, %ymm1, %ymm1
> vpsubd  %ymm15, %ymm1, %ymm1
> vpcmpgtd%ymm1, %ymm0, %ymm0
> vptest  %ymm0, %ymm0
> jne .L3
>
> for the fully masked loop body and for the masked epilogue
> we see
>
> .L4:
> vmovdqu (%rsi,%rax), %ymm3
> vpaddd  (%rdi,%rax), %ymm3, %ymm0
> vmovdqu %ymm0, (%rsi,%rax)
> addq$32, %rax
> cmpq%rax, %rcx
> jne .L4
> movl%edx, %eax
> andl$-8, %eax
> testb   $7, %dl
> je  .L11
> .L3:
> subl%eax, %edx
> vmovdqa .LC0(%rip), %ymm1
> salq$2, %rax
> vmovd   %edx, %xmm0
> movl$-2147483648, %edx
> addq%rax, %rsi
> vmovd   %edx, %xmm15
> vpbroadcastd%xmm0, %ymm0
> vpbroadcastd%xmm15, %ymm15
> vpsubd  %ymm15, %ymm1, %ymm1
> vpsubd  %ymm15, %ymm0, %ymm0
> vpcmpgtd%ymm1, %ymm0, %ymm0
> vpmaskmovd  (%rsi), %ymm0, %ymm1
> vpmaskmovd  (%rdi,%rax), %ymm0, %ymm2
> vpaddd  %ymm2, %ymm1, %ymm1
> vpmaskmovd  %ymm1, %ymm0, (%rsi)
> .L11:
> vzeroupper
>
> compared to
>
> .L3:
> movl%edx, %r8d
> subl%eax, %r8d
> leal-1(%r8), %r9d
> cmpl$2, %r9d
> jbe .L6
> leaq(%rcx,%rax,4), %r9
> vmovdqu (%rdi,%rax,4), %xmm2
> movl%r8d, %eax
> andl$-4, %eax
> vpaddd  (%r9), %xmm2, %xmm0
> addl%eax, %esi
> andl$3, %r8d
> vmovdqu %xmm0, (%r9)
> je  .L2
> .L6:
> movslq  %esi, %r8
> leaq0(,%r8,4), %rax
> movl(%rdi,%r8,4), %r8d
> addl%r8d, (%rcx,%rax)
> leal1(%rsi), %r8d
> cmpl%r8d, %edx
> jle .L2
> addl$2, %esi
> movl4(%rdi,%rax), %r8d
> addl%r8d, 4(%rcx,%rax)
> cmpl%esi, %edx
> jle .L2
> movl8(%rdi,%rax), %edx
> addl%edx, 8(%rcx,%rax)
> .L2:
>
> I'm giving this a little testing right now but will dig on why
> I don't get masked loops when AVX512 is enabled.

Ah, a simple thinko - rgroup_controls vectypes seem to be
always VECTOR_BOOLEAN_TYPE_P and thus we can
use expand_vec_cmp_expr_p.  The AVX512 fully masked
loop then looks like

.L3:
vmovdqu32   (%rsi,%rax,4), %ymm2{%k1}
vmovdqu32   (%rdi,%rax,4), %ymm1{%k1}
vpaddd  %ymm2, %ymm1, %ymm0
vmovdqu32   %ymm0, (%rsi,%rax,4){%k1}
addq$8, %rax
vpbroadcastd%eax, %ymm0
vpaddd  %ymm4, %ymm0, %ymm0
vpcmpud $6, %ymm0, %ymm3, %k1
kortestb%k1, %k1
jne .L3

I guess for x86 it's not worth preserving the VEC_COND_EXPR
mask generation but other archs may not provide all required vec_cmp
expanders.

Richard.

> Still comments are appreciated.
>
> Thanks,
> Richard.
>
> 2021-07-15  Richard Biener  
>
> * tree-vect-stmts.c (can_produce_all_loop_masks_p): We
> also can produce masks with VEC_COND_EXPRs.
> * tree-vect-loop.c (vect_gen_while): Generate the mask
> with a VEC_COND_EXPR in case WHILE_ULT is not supported.
> ---
>  gcc/tree-vect-loop.c  |  8 ++-
>  gcc/tree-vect-stmts.c | 50 ++-
>  2 files changed, 47 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index fc3dab0d143..2214ed11dfb 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -975

Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-07-15 Thread Aldy Hernandez via Gcc-patches
Well, if we don't adjust gimple_call_return_type() to handle built-ins
with no LHS, then we must adjust the callers.

The attached patch fixes gimple_expr_type() per its documentation:

/* Return the type of the main expression computed by STMT.  Return
   void_type_node if the statement computes nothing.  */

Currently gimple_expr_type is ICEing because it calls gimple_call_return_type.

I still think gimple_call_return_type should return void_type_node
instead of ICEing, but this will also fix my problem.

Anyone have a problem with this?

Aldy

On Thu, Jun 24, 2021 at 3:57 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> On 6/24/21 9:45 AM, Jakub Jelinek wrote:
> > On Thu, Jun 24, 2021 at 09:31:13AM -0400, Andrew MacLeod via Gcc-patches 
> > wrote:
> >> We'll still compute values for statements that don't have a LHS.. there's
> >> nothing inherently wrong with that.  The primary example is
> >>
> >> if (x_2 < y_3)
> >>
> >> we will compute [0,0] [1,1] or [0,1] for that statement, without a LHS.  It
> >> primarily becomes a generic way to ask for the range of each of the 
> >> operands
> >> of the statement, and process it regardless of the presence of a LHS.  I
> >> don't know, maybe there is (or will be)  an internal function that doesn't
> >> have a LHS but which can be folded away/rewritten if the operands are
> >> certain values.
> > There are many internal functions that aren't ECF_CONST or ECF_PURE.  Some
> > of them, like IFN*STORE* I think never have an lhs, others have them, but
> > if the lhs is unused, various optimization passes can just remove those lhs
> > from the internal fn calls (if they'd be ECF_CONST or ECF_PURE, the calls
> > would be DCEd).
> >
> > I think generally, if a call doesn't have lhs, there is no point in
> > computing a value range for that missing lhs.  It won't be useful for the
> > call arguments to lhs direction (nothing would care about that value) and
> > it won't be useful on the direction from the lhs to the call arguments
> > either.  Say if one has
> >p_23 = __builtin_memcpy (p_75, q_23, 16);
> > then one can imply from ~[0, 0] range on p_75 that p_23 has that range too
> > (and vice versa), but if one has
> >__builtin_memcpy (p_125, q_23, 16);
> > none of that makes sense.
> >
> > So instead of punting when gimple_call_return_type returns NULL IMHO the
> > code should punt when gimple_call_lhs is NULL.
> >
> >
>
> Well, we are going to punt anyway, because the call type, whether it is
> NULL or VOIDmode, is not supported by irange.   It was more just a matter
> of figuring out whether our check for an internal call or the
> gimple_call_return_type call should do the check...   Ultimately in
> the end it doesn't matter... just seemed like something someone else could
> trip across if we didn't strengthen gimple_call_return_type to not ICE.
>
> Andrew
>
commit 2717e79f571b23f74bb438c27ad1551de8eb9a4d
Author: Aldy Hernandez 
Date:   Thu Jul 15 12:47:26 2021 +0200

Handle built-ins with no return TYPE in gimple_expr_type.

Since gimple_call_return_type ICEs on built-ins with no return type,
all callers must be adjusted.

gcc/ChangeLog:

* gimple-range-fold.cc (fold_using_range::range_of_call): Use
gimple_expr_type instead of gimple_call_return_type.
* gimple.h (gimple_expr_type): Return void_type_node for internal
calls with no LHS.

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index eff5d1f89f2..7d20c6b4b04 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -780,7 +780,7 @@ fold_using_range::range_of_phi (irange &r, gphi *phi, fur_source &src)
 bool
 fold_using_range::range_of_call (irange &r, gcall *call, fur_source &src)
 {
-  tree type = gimple_call_return_type (call);
+  tree type = gimple_expr_type (call);
   tree lhs = gimple_call_lhs (call);
   bool strict_overflow_p;
 
diff --git a/gcc/gimple.h b/gcc/gimple.h
index acf572b81be..395257eb312 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -6633,6 +6633,13 @@ gimple_expr_type (const gimple *stmt)
 	  default:
 	break;
 	  }
+
+  // ?? The call to gimple_call_return_type below will ICE on
+  // built-ins with no LHS.  An alternative would be to return
+  // void_type_node from it instead.
+  if (!gimple_call_lhs (call_stmt) && gimple_call_internal_p (call_stmt))
+	return void_type_node;
+
   return gimple_call_return_type (call_stmt);
 }
   else if (code == GIMPLE_ASSIGN)


Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 15, 2021 at 6:45 PM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, Jul 15, 2021 at 12:30 PM Richard Biener  wrote:
> > [patch description and AVX2 code examples snipped -- quoted verbatim
> > from the message above]
>
> Ah, a simple thinko - rgroup_controls vectypes seem to be
> always VECTOR_BOOLEAN_TYPE_P and thus we can
> use expand_vec_cmp_expr_p.  The AVX512 fully masked
> loop then looks like
>
> .L3:
> vmovdqu32   (%rsi,%rax,4), %ymm2{%k1}
> vmovdqu32   (%rdi,%rax,4), %ymm1{%k1}
> vpaddd  %ymm2, %ymm1, %ymm0
> vmovdqu32   %ymm0, (%rsi,%rax,4){%k1}
> addq$8, %rax
> vpbroadcastd%eax, %ymm0
> vpaddd  %ymm4, %ymm0, %ymm0
> vpcmpud $6, %ymm0, %ymm3, %k1
> kortestb%k1, %k1
> jne .L3
>
> I guess for x86 it's not worth preserving the VEC_COND_EXPR
> mask generation but other archs may not provide all required vec_cmp
> expanders.

For the main loop, the full-masked loop's codegen seems much worse.
Basically, we need at least 4 instructions to do what while_ult in arm does.

 vpbroadcastd%eax, %ymm0
 vpaddd  %ymm4, %ymm0, %ymm0
 vpcmpud $6, %ymm0, %ymm3, %k1
 kortestb%k1, %k1
vs
   whilelo(or some other while)

more instructions 

Re: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n intrinsics

2021-07-15 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 15 Jul 2021 at 14:47, Christophe Lyon
 wrote:
>
> Hi Prathamesh,
>
> On Mon, Jul 5, 2021 at 11:25 AM Kyrylo Tkachov via Gcc-patches 
>  wrote:
>>
>>
>>
>> > -Original Message-
>> > From: Prathamesh Kulkarni 
>> > Sent: 05 July 2021 10:18
>> > To: gcc Patches ; Kyrylo Tkachov
>> > 
>> > Subject: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n
>> > intrinsics
>> >
>> > Hi Kyrill,
>> > I assume this patch is OK to commit after bootstrap+testing ?
>>
>> Yes.
>> Thanks,
>> Kyrill
>>
>
>
> The updated testcase fails on some configs:
> gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+, r[0-9]+ found 2 times
> FAIL:  gcc.target/arm/armv8_2-fp16-neon-2.c scan-assembler-times vdup\\.16\\tq[0-9]+, r[0-9]+ 3
>
> For instance on arm-none-eabi with default configuration flags (mode/cpu/fpu)
> and default runtestflags.
> The same toolchain config also fails on this test when overriding 
> runtestflags with:
> -mthumb/-mfloat-abi=soft/-march=armv6s-m
> -mthumb/-mfloat-abi=soft/-march=armv7-m
> -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
>
> Can you fix this please?
Hi Christophe,
Sorry for the breakage, I will take a look.

Thanks,
Prathamesh
>
> Thanks,
>
> Christophe
>
>> >
>> > Thanks,
>> > Prathamesh


Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Biener
On Thu, 15 Jul 2021, Hongtao Liu wrote:

> On Thu, Jul 15, 2021 at 6:45 PM Richard Biener via Gcc-patches
>  wrote:
> > [deeply nested quote of the original patch mail snipped -- see the
> > messages above]
> 
> For the main loop, the full-

Re: [PATCH v3] vect: Recog mul_highpart pattern

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 10:41 AM Kewen.Lin  wrote:
>
> on 2021/7/15 下午4:04, Kewen.Lin via Gcc-patches wrote:
> > Hi Uros,
> >
> > on 2021/7/15 下午3:17, Uros Bizjak wrote:
> >> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin  wrote:
> >>>
> >>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote:
>  on 2021/7/14 下午2:38, Richard Biener wrote:
> > On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin  wrote:
> >>
> >> on 2021/7/13 下午8:42, Richard Biener wrote:
> >>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  
> >>> wrote:
> >
> >> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
> >
> > Yes.
> >
> 
>  Thanks for confirming!  The related patch v2 is attached and the testing
>  is ongoing.
> 
> >>>
> >>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> >>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
> >>>
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>
> >> These XFAILs should be removed after your patch.
> >>
> > I'm curious whether it's intentional not to specify -fno-vect-cost-model
> > for this test case.  As noted above, this case is sensitive on how we
> > cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> > only with this mul_highpart pattern support, no matter how we model it
> > (x86 part of this patch exists or not).
> >
> >> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> >> is actually not needed.
> >>
> >
> > Thanks for the information!  The justification for the x86 part is that:
> > the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
> > optab support, i386 port has already customized costing for
> > MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
> > support), if we don't follow the same way for IFN_MULH, I'm worried that
> > we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
> > a right thing (we shouldn't cost it specially), it at least means we
> > have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
> > has direct mul_highpart optab support, I think they should be costed
> > consistently.  Does it sound reasonable?
> >
>
> Hi Richard(s),
>
> This possibly inconsistent handling problem seems like a counter-example
> to it being better to use a new IFN rather than the existing tree_code;
> it seems hard to maintain (one must remember to keep its handling
> consistent).  ;)
> From this perspective, maybe it's better to move backward to use tree_code
> and guard it under can_mult_highpart_p == 1 (just like IFN and avoid
> costing issue Richi pointed out before)?
>
> What do you think?

No, whenever we want to do code generation based on machine
capabilities the canonical way to test for those is to look at optabs
and then it's most natural to keep that 1:1 relation and emit
internal function calls which directly map to supported optabs
instead of going back to some tree codes.

When targets "lie" and provide expanders for something they can
only emulate then they have to compensate in their costing.
But as I understand this isn't the case for x86 here.

Now, in this case we already have the MULT_HIGHPART_EXPR tree,
so yes, it might make sense to use that instead of introducing an
alternate way via the direct internal function.  Somebody decided
that MULT_HIGHPART is generic enough to warrant this - but I
see that expand_mult_highpart can fail unless can_mult_highpart_p
and this is exactly one of the cases we want to avoid - either
we can handle something generally, in which case it can be a
tree code, or we can't, in which case it is best tied 1:1 to optabs
(mult_highpart has scalar support only for the direct optab,
vector support also for widen_mult).

Richard.

>
> BR,
> Kewen


Re: [PATCH] [DWARF] Fix hierarchy of debug information for offload kernels.

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 12:35 PM Hafiz Abid Qadeer
 wrote:
>
> On 15/07/2021 11:33, Thomas Schwinge wrote:
> >
> >> Note that the "parent" should be abstract but I don't think dwarf has a
> >> way to express a fully abstract parent of a concrete instance child - or
> >> at least how GCC expresses this causes consumers to "misinterpret"
> >> that.  I wonder if adding a DW_AT_declaration to the late DWARF
> >> emitted "parent" would fix things as well here?
> >
> > (I suppose not, Abid?)
> >
>
> Yes, adding DW_AT_declaration does not fix the problem.

Does emitting

DW_TAG_compile_unit
  DW_AT_name("")

  DW_TAG_subprogram // notional parent function (foo) with no code range
DW_AT_declaration 1
a:DW_TAG_subprogram // offload function foo._omp_fn.0
  DW_AT_declaration 1

  DW_TAG_subprogram // offload function
  DW_AT_abstract_origin a
...

do the trick?  The following would do this, flattening function definitions
for the concrete copies:

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 82783c4968b..a9c8bc43e88 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -6076,6 +6076,11 @@ maybe_create_die_with_external_ref (tree decl)
   /* Peel types in the context stack.  */
   while (ctx && TYPE_P (ctx))
 ctx = TYPE_CONTEXT (ctx);
+  /* For functions peel the context up to namespace/TU scope.  The abstract
+ copies reveal the true nesting.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL)
+while (ctx && TREE_CODE (ctx) == FUNCTION_DECL)
+  ctx = DECL_CONTEXT (ctx);
   /* Likewise namespaces in case we do not want to emit DIEs for them.  */
   if (debug_info_level <= DINFO_LEVEL_TERSE)
 while (ctx && TREE_CODE (ctx) == NAMESPACE_DECL)
@@ -6099,8 +6104,7 @@ maybe_create_die_with_external_ref (tree decl)
/* Leave function local entities parent determination to when
   we process scope vars.  */
;
-  else
-   parent = lookup_decl_die (ctx);
+  parent = lookup_decl_die (ctx);
 }
   else
 /* In some cases the FEs fail to set DECL_CONTEXT properly.



>
> --
> Hafiz Abid Qadeer
> Mentor, a Siemens Business


[PUSHED] Abstract out non_null adjustments in ranger.

2021-07-15 Thread Aldy Hernandez via Gcc-patches
There are 4 exact copies of the non-null range adjusting code in the
ranger.  This patch abstracts the functionality into a separate method.

As a follow-up I would like to remove the varying_p check, since I have
seen incoming ranges such as [0, 0xffef] which are not varying, but
are not-null.  Removing the varying restriction catches those.

Tested on x86-64 Linux.

Pushed to trunk.

p.s. Andrew, what are your thoughts on removing the varying_p() check as
a follow-up?

gcc/ChangeLog:

* gimple-range-cache.cc (non_null_ref::adjust_range): New.
(ranger_cache::range_of_def): Call adjust_range.
(ranger_cache::entry_range): Same.
* gimple-range-cache.h (non_null_ref::adjust_range): New.
* gimple-range.cc (gimple_ranger::range_of_expr): Call
adjust_range.
(gimple_ranger::range_on_entry): Same.
---
 gcc/gimple-range-cache.cc | 35 ++-
 gcc/gimple-range-cache.h  |  2 ++
 gcc/gimple-range.cc   |  8 ++--
 3 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 98ecdbbd68e..23597ade802 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -81,6 +81,29 @@ non_null_ref::non_null_deref_p (tree name, basic_block bb, bool search_dom)
   return false;
 }
 
+// If NAME has a non-null dereference in block BB, adjust R with the
+// non-zero information from non_null_deref_p, and return TRUE.  If
+// SEARCH_DOM is true, non_null_deref_p should search the dominator tree.
+
+bool
+non_null_ref::adjust_range (irange &r, tree name, basic_block bb,
+   bool search_dom)
+{
+  // Check if pointers have any non-null dereferences.  Non-call
+  // exceptions mean we could throw in the middle of the block, so just
+  // punt for now on those.
+  if (!cfun->can_throw_non_call_exceptions
+  && r.varying_p ()
+  && non_null_deref_p (name, bb, search_dom))
+{
+  int_range<2> nz;
+  nz.set_nonzero (TREE_TYPE (name));
+  r.intersect (nz);
+  return true;
+}
+  return false;
+}
+
 // Allocate an populate the bitmap for NAME.  An ON bit for a block
 // index indicates there is a non-null reference in that block.  In
 // order to populate the bitmap, a quick run of all the immediate uses
@@ -857,9 +880,8 @@ ranger_cache::range_of_def (irange &r, tree name, basic_block bb)
r = gimple_range_global (name);
 }
 
-  if (bb && r.varying_p () && m_non_null.non_null_deref_p (name, bb, false) &&
-  !cfun->can_throw_non_call_exceptions)
-r = range_nonzero (TREE_TYPE (name));
+  if (bb)
+m_non_null.adjust_range (r, name, bb, false);
 }
 
 // Get the range of NAME as it occurs on entry to block BB.
@@ -878,12 +900,7 @@ ranger_cache::entry_range (irange &r, tree name, basic_block bb)
   if (!m_on_entry.get_bb_range (r, name, bb))
 range_of_def (r, name);
 
-  // Check if pointers have any non-null dereferences.  Non-call
-  // exceptions mean we could throw in the middle of the block, so just
-  // punt for now on those.
-  if (r.varying_p () && m_non_null.non_null_deref_p (name, bb, false) &&
-  !cfun->can_throw_non_call_exceptions)
-r = range_nonzero (TREE_TYPE (name));
+  m_non_null.adjust_range (r, name, bb, false);
 }
 
 // Get the range of NAME as it occurs on exit from block BB.
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index ecf63dc01b3..f842e9c092a 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -34,6 +34,8 @@ public:
   non_null_ref ();
   ~non_null_ref ();
   bool non_null_deref_p (tree name, basic_block bb, bool search_dom = true);
+  bool adjust_range (irange &r, tree name, basic_block bb,
+bool search_dom = true);
 private:
   vec  m_nn;
   void process_name (tree name);
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 1851339c528..b210787d0b7 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -69,9 +69,7 @@ gimple_ranger::range_of_expr (irange &r, tree expr, gimple *stmt)
   if (def_stmt && gimple_bb (def_stmt) == bb)
 {
   range_of_stmt (r, def_stmt, expr);
-  if (!cfun->can_throw_non_call_exceptions && r.varying_p () &&
- m_cache.m_non_null.non_null_deref_p (expr, bb))
-   r = range_nonzero (TREE_TYPE (expr));
+  m_cache.m_non_null.adjust_range (r, expr, bb, true);
 }
   else
 // Otherwise OP comes from outside this block, use range on entry.
@@ -95,9 +93,7 @@ gimple_ranger::range_on_entry (irange &r, basic_block bb, tree name)
   if (m_cache.block_range (entry_range, bb, name))
 r.intersect (entry_range);
 
-  if (!cfun->can_throw_non_call_exceptions && r.varying_p () &&
-  m_cache.m_non_null.non_null_deref_p (name, bb))
-r = range_nonzero (TREE_TYPE (name));
+  m_cache.m_non_null.adjust_range (r, name, bb, true);
 }
 
 // Calculate the range for NAME at the end of block BB and return it in R.
-- 
2.31.1



[PATCH v3] c++: Add gnu::diagnose_as attribute

2021-07-15 Thread Matthias Kretz
Hi Jason,

A new revision of the patch is attached. I think I implemented all your 
suggestions.

Please comment on cp/decl2.c (is_alias_template_p). I find it surprising that 
I had to write this function. Maybe I missed something? In any case, 
DECL_ALIAS_TEMPLATE_P requires a template_decl and the TYPE_DECL apparently 
doesn't have a template_info/decl at this point.

From: Matthias Kretz 

This attribute overrides the diagnostics output string for the entity it
appertains to. The motivation is to improve QoI for library TS
implementations, where diagnostics have a very bad signal-to-noise ratio
due to the long namespaces involved.

With the attribute, it is possible to solve PR89370 and make
std::__cxx11::basic_string<_CharT, _Traits, _Alloc> appear as
std::string in diagnostic output without extra hacks to recognize the
type in the C++ frontend.

Signed-off-by: Matthias Kretz 

gcc/ChangeLog:

PR c++/89370
* doc/extend.texi: Document the diagnose_as attribute.
* doc/invoke.texi: Document -fno-diagnostics-use-aliases.

gcc/c-family/ChangeLog:

PR c++/89370
* c.opt (fdiagnostics-use-aliases): New diagnostics flag.

gcc/cp/ChangeLog:

PR c++/89370
* cp-tree.h: Add TFF_AS_PRIMARY. Add is_alias_template_p
declaration.
* decl2.c (is_alias_template_p): New function. Determines
whether a given TYPE_DECL is actually an alias template that is
still missing its template_info.
(is_late_template_attribute): Decls with diagnose_as attribute
are early attributes only if they are alias templates.
* error.c (dump_scope): When printing the name of a namespace,
look for the diagnose_as attribute. If found, print the
associated string instead of calling dump_decl.
(dump_decl_name_or_diagnose_as): New function to replace
dump_decl (pp, DECL_NAME(t), flags) and inspect the tree for the
diagnose_as attribute before printing the DECL_NAME.
(dump_template_scope): New function. Prints the scope of a
template instance correctly applying diagnose_as attributes and
adjusting the list of template parms accordingly.
(dump_aggr_type): If the type has a diagnose_as attribute, print
the associated string instead of printing the original type
name. Print template parms only if the attribute was not applied
to the instantiation / full specialization. Delay call to
dump_scope until the diagnose_as attribute is found. If the
attribute has a second argument, use it to override the context
passed to dump_scope.
(dump_simple_decl): Call dump_decl_name_or_diagnose_as instead
of dump_decl.
(dump_decl): Ditto.
(lang_decl_name): Ditto.
(dump_function_decl): Walk the functions context list to
determine whether a call to dump_template_scope is required.
Ensure function templates are presented as primary templates.
(dump_function_name): Replace the function's identifier with the
diagnose_as attribute value, if set.
(dump_template_parms): Treat as primary template if flags
contains TFF_AS_PRIMARY.
(comparable_template_types_p): Consider the types not a template
if one carries a diagnose_as attribute.
(print_template_differences): Replace the identifier with the
diagnose_as attribute value on the most general template, if it
is set.
* name-lookup.c (handle_namespace_attrs): Handle the diagnose_as
attribute on namespaces. Ensure exactly one string argument.
Ensure previous diagnose_as attributes used the same name.
'diagnose_as' attributes on namespace aliases are forwarded to
the original namespace. Support no-argument 'diagnose_as' on
namespace aliases.
(do_namespace_alias): Add attributes parameter and call
handle_namespace_attrs.
* name-lookup.h (do_namespace_alias): Add attributes tree
parameter.
* parser.c (cp_parser_declaration): If the next token is
RID_NAMESPACE, tentatively parse a namespace alias definition.
If this fails expect a namespace definition.
(cp_parser_namespace_alias_definition): Allow optional
attributes before and after the identifier. Fast exit if the
expected CPP_EQ token is missing. Pass attributes to
do_namespace_alias.
* tree.c (cxx_attribute_table): Add diagnose_as attribute to the
table.
(check_diagnose_as_redeclaration): New function; copied and
adjusted from check_abi_tag_redeclaration.
(handle_diagnose_as_attribute): New function; copied and
adjusted from handle_abi_tag_attribute. If the given *node is a
TYPE_DECL: allow no argument to the attribute, using DECL_NAME
instead; apply the attribute to the type on the RHS in place,
even if the type is complete. Allow 2 argume

Re: [PATCH] Support reduction def re-use for epilogue with different vector size

2021-07-15 Thread Christophe Lyon via Gcc-patches
Hi,



On Tue, Jul 13, 2021 at 2:09 PM Richard Biener  wrote:

> The following adds support for re-using the vector reduction def
> from the main loop in vectorized epilogue loops on architectures
> which use different vector sizes for the epilogue.  That's only
> x86 as far as I am aware.
>
> vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> regtest in progress.
>
> There's costing issues on x86 which usually prevent vectorizing
> an epilogue with a reduction, at least for loops that only
> have a reduction - it could be mitigated by not accounting for
> the epilogue there if we can compute that we can re-use the
> main loops cost.
>
> Richard - did I figure the correct place to adjust?  I guess
> adjusting accumulator->reduc_input in vect_transform_cycle_phi
> for re-use by the skip code in vect_create_epilog_for_reduction
> is a bit awkward but at least we're consciously doing
> vect_create_epilog_for_reduction last (via vectorizing live
> operations).
>
> OK in the unlikely case all testing succeeds (I also want to
> run it through SPEC with/without -fno-vect-cost-model which
> will take some time)?
>
> Thanks,
> Richard.
>
> 2021-07-13  Richard Biener  
>
> * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> vector types where the old vector type has a multiple of
> the new vector type elements.
> (vect_create_partial_epilog): New function, split out from...
> (vect_create_epilog_for_reduction): ... here.
> (vect_transform_cycle_phi): Reduce the re-used accumulator
> to the new vector type.
>
> * gcc.target/i386/vect-reduc-1.c: New testcase.
>

This patch is causing regressions on aarch64:
 FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
'PHI' argument 1
vector(2) unsigned int
vector(2) int
_91 = PHI <_90(17), _83(11)>
during GIMPLE pass: vect
dump file: ./pr92324-4.c.167t.vect
/gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
verify_gimple failed
0xe6438e verify_gimple_in_cfg(function*, bool)
/gcc/tree-cfg.c:5535
0xd13902 execute_function_todo
/gcc/passes.c:2042
0xd142a5 execute_todo
/gcc/passes.c:2096

FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv
FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv

Thanks,

Christophe



> ---
>  gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
>  gcc/tree-vect-loop.c | 223 ---
>  2 files changed, 155 insertions(+), 85 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> new file mode 100644
> index 000..9ee9ba4e736
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> +
> +#define N 32
> +int foo (int *a, int n)
> +{
> +  int sum = 1;
> +  for (int i = 0; i < 8*N + 4; ++i)
> +sum += a[i];
> +  return sum;
> +}
> +
> +/* The reduction epilog should be vectorized and the accumulator
> +   re-used.  */
> +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> +/* { dg-final { scan-assembler-times "padd" 5 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 8c27d75f889..98e2a845629 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info
> loop_vinfo,
>   ones as well.  */
>tree vectype = STMT_VINFO_VECTYPE (reduc_info);
>tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> -  if (!useless_type_conversion_p (old_vectype, vectype))
> +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> +   TYPE_VECTOR_SUBPARTS (vectype)))
>  return false;
>
>/* Non-SLP reductions might apply an adjustment after the reduction
> @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info
> loop_vinfo,
>return true;
>  }
>
> +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> +   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
> +
> +static tree
> +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code
> code,
> +   gimple_seq *seq)
> +{
> +  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE
> (vec_def)).to_constant ();
> +  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> +  tree stype = TREE_TYPE (vectype);
> +  tree new_temp = vec_def;
> +  while (nunits > nunits1)
> +{
> +  nunits /= 2;
> +  tree vectype1 = get_rel
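For reference, the effect of the halving loop in vect_create_partial_epilog quoted above can be sketched as a scalar model in plain C; the function name and the flat lane array are illustrative, not GCC internals:

```c
#include <assert.h>

/* Scalar sketch of the halving loop in vect_create_partial_epilog:
   an NUNITS-lane vector is reduced to NUNITS1 lanes by repeatedly
   splitting it in half and combining the two halves with the
   reduction operation (PLUS here).  Power-of-two lane counts are
   assumed, as in the vectorizer.  */
static void
halve_reduce_plus (int *lanes, unsigned nunits, unsigned nunits1)
{
  while (nunits > nunits1)
    {
      nunits /= 2;
      for (unsigned i = 0; i < nunits; i++)
        lanes[i] += lanes[i + nunits];   /* combine low and high halves */
    }
}
```

Reducing an 8-lane accumulator to the 4 lanes of a narrower epilogue vector, for example, adds lane i and lane i+4 for each i.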

[PATCH] Disable --param vect-partial-vector-usage by default on x86

2021-07-15 Thread Richard Biener
The following defaults --param vect-partial-vector-usage to zero
for x86_64 matching existing behavior where support for this
is not present.

OK for trunk?

Thanks,
Richard.

2021-07-15  Richard Biener  

* config/i386/i386-options.c (ix86_option_override_internal): Set
param_vect_partial_vector_usage to zero if not set.
---
 gcc/config/i386/i386-options.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 7cba655595e..3416a4f1752 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2834,6 +2834,11 @@ ix86_option_override_internal (bool main_args_p,
 
   SET_OPTION_IF_UNSET (opts, opts_set, param_ira_consider_dup_in_all_alts, 0);
 
+  /* Fully masking the main or the epilogue vectorized loop is not
+ profitable generally so leave it disabled until we get more
+ fine grained control & costing.  */
+  SET_OPTION_IF_UNSET (opts, opts_set, param_vect_partial_vector_usage, 0);
+
   return true;
 }
 
-- 
2.26.2
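As context for what the parameter controls: with partial-vector usage, a loop's tail iterations execute under a per-lane mask instead of in a separate scalar epilogue. A rough scalar model of such a fully masked loop (the vector factor of 4 is an arbitrary choice for this sketch):

```c
#include <assert.h>

/* Rough scalar model of a fully masked vector loop: every block of VF
   iterations executes, and a per-lane predicate (the "mask") disables
   lanes at or beyond N, so no scalar epilogue is needed.  VF = 4 is an
   arbitrary choice for the sketch.  */
#define VF 4

static int
masked_sum (const int *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i += VF)        /* one "vector" iteration */
    for (int lane = 0; lane < VF; lane++)
      if (i + lane < n)                  /* lane enabled by the mask */
        sum += a[i + lane];
  return sum;
}
```

The patch's point is that running every iteration through such masking is generally not profitable on x86, so the parameter defaults to zero until finer-grained control and costing exist.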


Re: [PATCH] Support reduction def re-use for epilogue with different vector size

2021-07-15 Thread Richard Biener
On Thu, 15 Jul 2021, Christophe Lyon wrote:

> Hi,
> 
> 
> 
> On Tue, Jul 13, 2021 at 2:09 PM Richard Biener  wrote:
> 
> > The following adds support for re-using the vector reduction def
> > from the main loop in vectorized epilogue loops on architectures
> > which use different vector sizes for the epilogue.  That's only
> > x86 as far as I am aware.
> >
> > vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> > regtest in progress.
> >
> > There's costing issues on x86 which usually prevent vectorizing
> > an epilogue with a reduction, at least for loops that only
> > have a reduction - it could be mitigated by not accounting for
> > the epilogue there if we can compute that we can re-use the
> > main loops cost.
> >
> > Richard - did I figure the correct place to adjust?  I guess
> > adjusting accumulator->reduc_input in vect_transform_cycle_phi
> > for re-use by the skip code in vect_create_epilog_for_reduction
> > is a bit awkward but at least we're consciously doing
> > vect_create_epilog_for_reduction last (via vectorizing live
> > operations).
> >
> > OK in the unlikely case all testing succeeds (I also want to
> > run it through SPEC with/without -fno-vect-cost-model which
> > will take some time)?
> >
> > Thanks,
> > Richard.
> >
> > 2021-07-13  Richard Biener  
> >
> > * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> > vector types where the old vector type has a multiple of
> > the new vector type elements.
> > (vect_create_partial_epilog): New function, split out from...
> > (vect_create_epilog_for_reduction): ... here.
> > (vect_transform_cycle_phi): Reduce the re-used accumulator
> > to the new vector type.
> >
> > * gcc.target/i386/vect-reduc-1.c: New testcase.
> >
> 
> This patch is causing regressions on aarch64:
>  FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
> FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
> FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
> Excess errors:
> /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
> 'PHI' argument 1
> vector(2) unsigned int
> vector(2) int
> _91 = PHI <_90(17), _83(11)>
> during GIMPLE pass: vect
> dump file: ./pr92324-4.c.167t.vect
> /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
> verify_gimple failed
> 0xe6438e verify_gimple_in_cfg(function*, bool)
> /gcc/tree-cfg.c:5535
> 0xd13902 execute_function_todo
> /gcc/passes.c:2042
> 0xd142a5 execute_todo
> /gcc/passes.c:2096
> 
> FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv
> FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv

What exact options do you pass to cc1 to get this?  Can you track this
in a PR please?

Thanks,
Richard.

> Thanks,
> 
> Christophe
> 
> 
> 
> > ---
> >  gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
> >  gcc/tree-vect-loop.c | 223 ---
> >  2 files changed, 155 insertions(+), 85 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > new file mode 100644
> > index 000..9ee9ba4e736
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> > +
> > +#define N 32
> > +int foo (int *a, int n)
> > +{
> > +  int sum = 1;
> > +  for (int i = 0; i < 8*N + 4; ++i)
> > +sum += a[i];
> > +  return sum;
> > +}
> > +
> > +/* The reduction epilog should be vectorized and the accumulator
> > +   re-used.  */
> > +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> > +/* { dg-final { scan-assembler-times "padd" 5 } } */
> > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > index 8c27d75f889..98e2a845629 100644
> > --- a/gcc/tree-vect-loop.c
> > +++ b/gcc/tree-vect-loop.c
> > @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info
> > loop_vinfo,
> >   ones as well.  */
> >tree vectype = STMT_VINFO_VECTYPE (reduc_info);
> >tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> > -  if (!useless_type_conversion_p (old_vectype, vectype))
> > +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> > +   TYPE_VECTOR_SUBPARTS (vectype)))
> >  return false;
> >
> >/* Non-SLP reductions might apply an adjustment after the reduction
> > @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info
> > loop_vinfo,
> >return true;
> >  }
> >
> > +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> > +   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
> > +
> > +static tree
> > +vect_create_partial_epilog (tree ve

Re: GCC 11.1.1 Status Report (2021-07-06)

2021-07-15 Thread H.J. Lu via Gcc-patches
On Tue, Jul 6, 2021 at 12:00 AM Richard Biener  wrote:
>
>
> Status
> ==
>
> The GCC 11 branch is open for regression and documentation fixes.
> It's time for a GCC 11.2 release and we are aiming for a release
> candidate in about two weeks which would result in the GCC 11.2
> release about three months after GCC 11.1.
>
> Two weeks give you ample time to care for important regressions
> and backporting of fixes.  Please also look out for issues on
> non-primary/secondary targets.
>
>
> Quality Data
> 
>
> Priority  #   Change from last report
> ---   ---
> P1
> P2  272   +  20
> P3   94   +  56
> P4  210   +   2
> P5   24   -   1
> ---   ---
> Total P1-P3 366   +  76
> Total   600   +  79
>
>
> Previous Report
> ===
>
> https://gcc.gnu.org/pipermail/gcc/2021-April/235923.html

I'd like to backport:

https://gcc.gnu.org/g:3f04e3782536ad2f9cfbb8cfe6630e9f9dd8af4c

to fix this GCC 11 regression:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101023

-- 
H.J.


Re: GCC 11.1.1 Status Report (2021-07-06)

2021-07-15 Thread Richard Biener
On Thu, 15 Jul 2021, H.J. Lu wrote:

> On Tue, Jul 6, 2021 at 12:00 AM Richard Biener  wrote:
> >
> >
> > Status
> > ==
> >
> > The GCC 11 branch is open for regression and documentation fixes.
> > It's time for a GCC 11.2 release and we are aiming for a release
> > candidate in about two weeks which would result in the GCC 11.2
> > release about three months after GCC 11.1.
> >
> > Two weeks give you ample time to care for important regressions
> > and backporting of fixes.  Please also look out for issues on
> > non-primary/secondary targets.
> >
> >
> > Quality Data
> > 
> >
> > Priority  #   Change from last report
> > ---   ---
> > P1
> > P2  272   +  20
> > P3   94   +  56
> > P4  210   +   2
> > P5   24   -   1
> > ---   ---
> > Total P1-P3 366   +  76
> > Total   600   +  79
> >
> >
> > Previous Report
> > ===
> >
> > https://gcc.gnu.org/pipermail/gcc/2021-April/235923.html
> 
> I'd like to backport:
> 
> https://gcc.gnu.org/g:3f04e3782536ad2f9cfbb8cfe6630e9f9dd8af4c
> 
> to fix this GCC 11 regression:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101023

OK.


Re: [PATCH 1/2] Streamline vect_gen_while

2021-07-15 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> This adjusts the vect_gen_while API to match that of
> vect_gen_while_not allowing further patches to generate more
> than one stmt for the while case.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, tested a
> toy example on SVE that it still produces the same code.
>
> OK?
>
> 2021-07-15  Richard Biener  
>
>   * tree-vectorizer.h (vect_gen_while): Match up with
>   vect_gen_while_not.
>   * tree-vect-stmts.c (vect_gen_while): Adjust API to that
>   of vect_gen_while_not.
>   (vect_gen_while_not): Adjust.
>   * tree-vect-loop-manip.c (vect_set_loop_controls_directly): Likewise.
> ---
>  gcc/tree-vect-loop-manip.c | 14 ++
>  gcc/tree-vect-stmts.c  | 16 
>  gcc/tree-vectorizer.h  |  3 ++-
>  3 files changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index c29ffb3356c..1f3d6614e6c 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -609,11 +609,8 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>   }
>  
> if (use_masks_p)
> - {
> -   init_ctrl = make_temp_ssa_name (ctrl_type, NULL, "max_mask");
> -   gimple *tmp_stmt = vect_gen_while (init_ctrl, start, end);
> -   gimple_seq_add_stmt (preheader_seq, tmp_stmt);
> - }
> + init_ctrl = vect_gen_while (preheader_seq, ctrl_type,
> + start, end, "max_mask");
> else
>   {
> init_ctrl = make_temp_ssa_name (compare_type, NULL, "max_len");
> @@ -652,9 +649,10 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>/* Get the control value for the next iteration of the loop.  */
>if (use_masks_p)
>   {
> -   next_ctrl = make_temp_ssa_name (ctrl_type, NULL, "next_mask");
> -   gcall *call = vect_gen_while (next_ctrl, test_index, this_test_limit);
> -   gsi_insert_before (test_gsi, call, GSI_SAME_STMT);
> +   gimple_seq stmts = NULL;
> +   next_ctrl = vect_gen_while (&stmts, ctrl_type, test_index,
> +   this_test_limit, "next_mask");
> +   gsi_insert_seq_before (test_gsi, stmts, GSI_SAME_STMT);
>   }
>else
>   {
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index d9eeda50278..6a25d661800 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -12002,19 +12002,21 @@ supportable_narrowing_operation (enum tree_code 
> code,
>  /* Generate and return a statement that sets vector mask MASK such that
> MASK[I] is true iff J + START_INDEX < END_INDEX for all J <= I.  */

Comment needs updating.  LGTM otherwise, thanks.

Richard

> -gcall *
> -vect_gen_while (tree mask, tree start_index, tree end_index)
> +tree
> +vect_gen_while (gimple_seq *seq, tree mask_type, tree start_index,
> + tree end_index, const char *name)
>  {
>tree cmp_type = TREE_TYPE (start_index);
> -  tree mask_type = TREE_TYPE (mask);
>gcc_checking_assert (direct_internal_fn_supported_p (IFN_WHILE_ULT,
>  cmp_type, mask_type,
>  OPTIMIZE_FOR_SPEED));
>gcall *call = gimple_build_call_internal (IFN_WHILE_ULT, 3,
>   start_index, end_index,
>   build_zero_cst (mask_type));
> -  gimple_call_set_lhs (call, mask);
> -  return call;
> +  tree tmp = make_temp_ssa_name (mask_type, NULL, name);
> +  gimple_call_set_lhs (call, tmp);
> +  gimple_seq_add_stmt (seq, call);
> +  return tmp;
>  }
>  
>  /* Generate a vector mask of type MASK_TYPE for which index I is false iff
> @@ -12024,9 +12026,7 @@ tree
>  vect_gen_while_not (gimple_seq *seq, tree mask_type, tree start_index,
>   tree end_index)
>  {
> -  tree tmp = make_ssa_name (mask_type);
> -  gcall *call = vect_gen_while (tmp, start_index, end_index);
> -  gimple_seq_add_stmt (seq, call);
> +  tree tmp = vect_gen_while (seq, mask_type, start_index, end_index);
>return gimple_build (seq, BIT_NOT_EXPR, mask_type, tmp);
>  }
>  
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 4c4bc810c35..49afdd898d0 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1948,7 +1948,8 @@ extern bool vect_supportable_shift (vec_info *, enum 
> tree_code, tree);
>  extern tree vect_gen_perm_mask_any (tree, const vec_perm_indices &);
>  extern tree vect_gen_perm_mask_checked (tree, const vec_perm_indices &);
>  extern void optimize_mask_stores (class loop*);
> -extern gcall *vect_gen_while (tree, tree, tree);
> +extern tree vect_gen_while (gimple_seq *, tree, tree, tree,
> + const char * = nullptr);
>  extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
>  extern opt_result vect_get_vector
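For reference, the IFN_WHILE_ULT semantics that vect_gen_while wraps can be modeled in scalar C; the lane count is a parameter of the sketch standing in for the number of elements of the mask type, not part of the GCC API:

```c
#include <assert.h>

/* Scalar model of IFN_WHILE_ULT as emitted by vect_gen_while:
   MASK[I] is true iff I + START_INDEX < END_INDEX, i.e. the mask is a
   run of leading true lanes followed by false lanes.  NLANES stands in
   for the element count of the mask vector type.  */
static void
while_ult (unsigned start_index, unsigned end_index,
           unsigned char *mask, unsigned nlanes)
{
  for (unsigned i = 0; i < nlanes; i++)
    mask[i] = (i + start_index < end_index);
}
```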

testsuite: aarch64: Fix failing SVE tests on big endian

2021-07-15 Thread Jonathan Wright via Gcc-patches
Hi,

A recent change "gcc: Add vec_select -> subreg RTL simplification"
updated the expected test results for SVE extraction tests. The new
result should only have been changed for little endian. This patch
restores the old expected result for big endian.

Ok for master?

Thanks,
Jonathan

---

gcc/testsuite/ChangeLog:

2021-07-15  Jonathan Wright  

* gcc.target/aarch64/sve/extract_1.c: Split expected results
by big/little endian targets, restoring the old expected
result for big endian.
* gcc.target/aarch64/sve/extract_2.c: Likewise.
* gcc.target/aarch64/sve/extract_3.c: Likewise.
* gcc.target/aarch64/sve/extract_4.c: Likewise.


rb14655.patch
Description: rb14655.patch


[PATCH] arm: Fix multilib mapping for CDE extensions [PR100856]

2021-07-15 Thread Christophe LYON via Gcc-patches

This is a followup to Srinath's recent patch: the newly added test is
failing e.g. on arm-linux-gnueabihf without R/M profile multilibs.

It is also failing on arm-eabi with R/M profile multilibs if the
execution engine does not support v8.1-M instructions.

The patch avoids this by adding check_effective_target_FUNC_multilib
in target-supports.exp which effectively checks whether the target
supports linking and execution, like what is already done for other
ARM effective targets.  pr100856.c is updated to use it instead of
arm_v8_1m_main_cde_mve_ok (which makes the testcase a bit of a
duplicate with check_effective_target_FUNC_multilib).

In addition, I noticed that requiring MVE does not seem necessary and
this enables the test to pass even when targeting a CPU without MVE:
since the test does not involve actual CDE instructions, it can pass
on other architecture versions.  For instance, when requiring MVE, we
have to use cortex-m55 under QEMU for the test to pass because the
memset() that comes from v8.1-m.main+mve multilib uses LOB
instructions (DLS) (memset is used during startup).  Keeping
arm_v8_1m_main_cde_mve_ok would mean we would enable the test provided
we have the right multilibs, causing a runtime error if the simulator
does not support LOB instructions (e.g. when targeting cortex-m7).

I do not update sourcebuild.texi since the CDE effective targets are
already collectively documented.

Finally, the patch fixes two typos in comments.

2021-07-15  Christophe Lyon  

    PR target/100856
    gcc/
    * config/arm/arm.opt: Fix typo.
    * config/arm/t-rmprofile: Fix typo.

    gcc/testsuite/
    * gcc.target/arm/acle/pr100856.c: Use arm_v8m_main_cde_multilib
    and arm_v8m_main_cde.
    * lib/target-supports.exp: Add 
check_effective_target_FUNC_multilib for ARM CDE.



From baa9ed42d986dd2569697ac8903b3ca70ad73bb9 Mon Sep 17 00:00:00 2001
From: Christophe Lyon 
Date: Thu, 15 Jul 2021 12:57:18 +
Subject: [PATCH] arm: Fix multilib mapping for CDE extensions [PR100856]

This is a followup to Srinath's recent patch: the newly added test is
failing e.g. on arm-linux-gnueabihf without R/M profile multilibs.

It is also failing on arm-eabi with R/M profile multilibs if the
execution engine does not support v8.1-M instructions.

The patch avoids this by adding check_effective_target_FUNC_multilib
in target-supports.exp which effectively checks whether the target
supports linking and execution, like what is already done for other
ARM effective targets.  pr100856.c is updated to use it instead of
arm_v8_1m_main_cde_mve_ok (which makes the testcase a bit of a
duplicate with check_effective_target_FUNC_multilib).

In addition, I noticed that requiring MVE does not seem necessary and
this enables the test to pass even when targeting a CPU without MVE:
since the test does not involve actual CDE instructions, it can pass
on other architecture versions.  For instance, when requiring MVE, we
have to use cortex-m55 under QEMU for the test to pass because the
memset() that comes from v8.1-m.main+mve multilib uses LOB
instructions (DLS) (memset is used during startup).  Keeping
arm_v8_1m_main_cde_mve_ok would mean we would enable the test provided
we have the right multilibs, causing a runtime error if the simulator
does not support LOB instructions (e.g. when targeting cortex-m7).

I do not update sourcebuild.texi since the CDE effective targets are
already collectively documented.

Finally, the patch fixes two typos in comments.

2021-07-15  Christophe Lyon  

PR target/100856
gcc/
* config/arm/arm.opt: Fix typo.
* config/arm/t-rmprofile: Fix typo.

gcc/testsuite/
* gcc.target/arm/acle/pr100856.c: Use arm_v8m_main_cde_multilib
and arm_v8m_main_cde.
* lib/target-supports.exp: Add
check_effective_target_FUNC_multilib for ARM CDE.
---
 gcc/config/arm/arm.opt   |  2 +-
 gcc/config/arm/t-rmprofile   |  2 +-
 gcc/testsuite/gcc.target/arm/acle/pr100856.c |  4 ++--
 gcc/testsuite/lib/target-supports.exp| 18 ++
 4 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index af478a946b2..7417b55122a 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -82,7 +82,7 @@ EnumValue
 Enum(arm_arch) String(native) Value(-1) DriverOnly
 
 ; Set to the name of target architecture which is required for
-; multilib linking.  This option is undocumented becuase it
+; multilib linking.  This option is undocumented because it
 ; should not be used by the users.
 mlibarch=
 Target RejectNegative JoinedOrMissing NoDWARFRecord DriverOnly Undocumented
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index 3e75fcc9635..a6036bf0a51 100644
--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -54,7 +54,7 @@ MULTILIB_REQUIRED += 
mthumb/march=armv8.1-m.main+mve/mfloat-a

RE: [PATCH][AArch32]: Correct sdot RTL on aarch32

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi Christophe,

Sorry about that, the ICEs should be fixed now, and the execution tests are
being fixed as well.

They were being hidden by a model bug which kept saying everything passed even
when it failed ☹

Regards,
Tamar

From: Christophe Lyon 
Sent: Thursday, July 15, 2021 9:39 AM
To: Tamar Christina 
Cc: GCC Patches ; Richard Earnshaw 
; nd ; Ramana Radhakrishnan 

Subject: Re: [PATCH][AArch32]: Correct sdot RTL on aarch32

Hi Tamar,


On Tue, May 25, 2021 at 5:41 PM Tamar Christina via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
Hi All,

The RTL Generated from dot_prod is invalid as operand3 cannot be
written to, it's a normal input.  For the expand it's just another operand
but the caller does not expect it to be written to.

Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.

Ok for master? and backport to GCC 11, 10, 9?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/neon.md (dot_prod): Drop statements.

--- inline copy of patch --
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 
61d81646475ce3bf62ece2cec2faf0c1fe978ec1..9602e9993aeebf4ec620d105fd20f64498a3b851
 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3067,13 +3067,7 @@ (define_expand "dot_prod"
 DOTPROD)
(match_operand:VCVTI 3 "register_operand")))]
   "TARGET_DOTPROD"
-{
-  emit_insn (
-gen_neon_dot (operands[3], operands[3], operands[1],
-operands[2]));
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  DONE;
-})
+)

 ;; Auto-vectorizer pattern for usdot
 (define_expand "usdot_prod"

This patch is causing ICEs on arm-eabi (and probably arm-linux-gnueabi but 
trunk build is currently broken):

 FAIL: gcc.target/arm/simd/vect-dot-s8.c (internal compiler error)
FAIL: gcc.target/arm/simd/vect-dot-s8.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h:15:1: error: unrecognizable 
insn:
(insn 29 28 30 5 (set (reg:V4SI 132 [ vect_patt_31.15 ])
(plus:V4SI (unspec:V4SI [
(reg:V16QI 182)
(reg:V16QI 183)
] UNSPEC_DOT_S)
(reg:V4SI 184))) -1
 (nil))
during RTL pass: vregs
/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h:15:1: internal compiler error: 
in extract_insn, at recog.c:2769
0x5fc656 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
/gcc/rtl-error.c:108
0x5fc672 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/gcc/rtl-error.c:116
0xcbbe07 extract_insn(rtx_insn*)
/gcc/recog.c:2769
0x9e2e95 instantiate_virtual_regs_in_insn
/gcc/function.c:1611
0x9e2e95 instantiate_virtual_regs
/gcc/function.c:1985
0x9e2e95 execute
/gcc/function.c:2034

Can you check?

Thanks,

Christophe
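For context, the semantics the dot_prod expander describes — operand 0 set to DOTPROD(op1, op2) plus operand 3, with operand 3 a pure input that must not be written to — can be sketched for one result lane; sdot_lane is an illustrative name, not a GCC or Arm API:

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit lane of a signed dot product: accumulate four
   signed-byte products into a 32-bit value.  ACC (operand 3 in the
   RTL) is only read, never modified, matching the corrected expand
   where operand 3 is a plain input.  */
static int32_t
sdot_lane (const int8_t a[4], const int8_t b[4], int32_t acc)
{
  int32_t dot = 0;
  for (int i = 0; i < 4; i++)
    dot += (int32_t) a[i] * (int32_t) b[i];
  return acc + dot;
}
```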



Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 1:06 PM Aldy Hernandez  wrote:
>
> Well, if we don't adjust gimple_call_return_type() to handle built-ins
> with no LHS, then we must adjust the callers.
>
> The attached patch fixes gimple_expr_type() per its documentation:
>
> /* Return the type of the main expression computed by STMT.  Return
>void_type_node if the statement computes nothing.  */
>
> Currently gimple_expr_type is ICEing because it calls gimple_call_return_type.
>
> I still think gimple_call_return_type should return void_type_node
> instead of ICEing, but this will also fix my problem.
>
> Anyone have a problem with this?

It's still somewhat inconsistent, no?  Because for a call without a LHS
it's now either void_type_node or the type of the return value.

It's probably known I dislike gimple_expr_type itself (it was introduced
to make the transition to tuples easier).  I wonder why you can't simply
fix range_of_call to do

   tree lhs = gimple_call_lhs (call);
   if (lhs)
 type = TREE_TYPE (lhs);

Richard.

>
> Aldy
>
> On Thu, Jun 24, 2021 at 3:57 PM Andrew MacLeod via Gcc-patches
>  wrote:
> >
> > On 6/24/21 9:45 AM, Jakub Jelinek wrote:
> > > On Thu, Jun 24, 2021 at 09:31:13AM -0400, Andrew MacLeod via Gcc-patches 
> > > wrote:
> > >> We'll still compute values for statements that don't have a LHS.. there's
> > >> nothing inherently wrong with that.  The primary example is
> > >>
> > >> if (x_2 < y_3)
> > >>
> > >> we will compute [0,0] [1,1] or [0,1] for that statement, without a LHS.  
> > >> It
> > >> primarily becomes a generic way to ask for the range of each of the 
> > >> operands
> > >> of the statement, and process it regardless of the presence of a LHS.  I
> > >> don't know, maybe there is (or will be)  an internal function that 
> > >> doesn't
> > >> have a LHS but which can be folded away/rewritten if the operands are
> > >> certain values.
> > > There are many internal functions that aren't ECF_CONST or ECF_PURE.  Some
> > > of them, like IFN*STORE* I think never have an lhs, others have them, but
> > > if the lhs is unused, various optimization passes can just remove those 
> > > lhs
> > > from the internal fn calls (if they'd be ECF_CONST or ECF_PURE, the calls
> > > would be DCEd).
> > >
> > > I think generally, if a call doesn't have lhs, there is no point in
> > > computing a value range for that missing lhs.  It won't be useful for the
> > > call arguments to lhs direction (nothing would care about that value) and
> > > it won't be useful on the direction from the lhs to the call arguments
> > > either.  Say if one has
> > >p_23 = __builtin_memcpy (p_75, q_23, 16);
> > > then one can imply from ~[0, 0] range on p_75 that p_23 has that range too
> > > (and vice versa), but if one has
> > >__builtin_memcpy (p_125, q_23, 16);
> > > none of that makes sense.
> > >
> > > So instead of punting when gimple_call_return_type returns NULL IMHO the
> > > code should punt when gimple_call_lhs is NULL.
> > >
> > >
> >
> > Well, we are going to punt anyway, because the call type, whether it is
> > NULL or VOIDmode is not supported by irange.   It was more just a matter
> > of figuring out whether us checking for internal call or the
> > gimple_function_return_type call should do the check...   Ultimately in
> > the end it doesnt matter.. just seemed like something someone else could
> > trip across if we didnt strengthen gimple_call_return_type to not ice.
> >
> > Andrew
> >


Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification

2021-07-15 Thread Jonathan Wright via Gcc-patches
Ah, yes - those test results should have only been changed for little endian.

I've submitted a patch to the list restoring the original expected results for
big endian.

Thanks,
Jonathan

From: Christophe Lyon 
Sent: 15 July 2021 10:09
To: Richard Sandiford ; Jonathan Wright 
; gcc-patches@gcc.gnu.org ; 
Kyrylo Tkachov 
Subject: Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification



On Mon, Jul 12, 2021 at 5:31 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
Jonathan Wright <jonathan.wri...@arm.com> writes:
> Hi,
>
> Version 2 of this patch adds more code generation tests to show the
> benefit of this RTL simplification as well as adding a new helper function
> 'rtx_vec_series_p' to reduce code duplication.
>
> Patch tested as version 1 - ok for master?

Sorry for the slow reply.

> Regression tested and bootstrapped on aarch64-none-linux-gnu,
> x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
> aarch64_be-none-linux-gnu - no issues.

I've also tested this on powerpc64le-unknown-linux-gnu, no issues again.

> diff --git a/gcc/combine.c b/gcc/combine.c
> index 
> 6476812a21268e28219d1e302ee1c979d528a6ca..0ff6ca87e4432cfeff1cae1dd219ea81ea0b73e4
>  100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -6276,6 +6276,26 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, 
> int in_dest,
> - 1,
> 0));
>break;
> +case VEC_SELECT:
> +  {
> + rtx trueop0 = XEXP (x, 0);
> + mode = GET_MODE (trueop0);
> + rtx trueop1 = XEXP (x, 1);
> + int nunits;
> + /* If we select a low-part subreg, return that.  */
> + if (GET_MODE_NUNITS (mode).is_constant (&nunits)
> + && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
> +   {
> + int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
> +
> + if (rtx_vec_series_p (trueop1, offset))
> +   {
> + rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
> + if (new_rtx != NULL_RTX)
> +   return new_rtx;
> +   }
> +   }
> +  }

Since this occurs three times, I think it would be worth having
a new predicate:

/* Return true if, for all OP of mode OP_MODE:

 (vec_select:RESULT_MODE OP SEL)

   is equivalent to the lowpart RESULT_MODE of OP.  */

bool
vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx sel)

containing the GET_MODE_NUNITS (…).is_constant, can_change_mode_class
and rtx_vec_series_p tests.

I think the function belongs in rtlanal.[hc], even though subreg_lowpart_p
is in emit-rtl.c.

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1884,15 +1884,16 @@
>  )
>
>  (define_insn "*zero_extend2_aarch64"
> -  [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
> -(zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" 
> "r,m,m")))]
> +  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
> +(zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" 
> "r,m,m,w")))]
>""
>"@
> and\t%0, %1, 
> ldr\t%w0, %1
> -   ldr\t%0, %1"
> -  [(set_attr "type" "logic_imm,load_4,f_loads")
> -   (set_attr "arch" "*,*,fp")]
> +   ldr\t%0, %1
> +   umov\t%w0, %1.[0]"
> +  [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
> +   (set_attr "arch" "*,*,fp,fp")]

FTR (just to show I thought about it): I don't know whether the umov
can really be considered an fp operation rather than a simd operation,
but since we don't support fp without simd, this is already a distinction
without a difference.  So the pattern is IMO OK as-is.

> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 
> 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35
>  100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -224,7 +224,7 @@
>  ;; problems because small constants get converted into adds.
>  (define_insn "*arm_movsi_vfp"
>[(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m 
> ,*t,r,*t,*t, *Uv")
> -  (match_operand:SI 1 "general_operand" "rk, 
> I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
> +  (match_operand:SI 1 "general_operand" "rk, 
> I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
>"TARGET_ARM && TARGET_HARD_FLOAT
> && (   s_register_operand (operands[0], SImode)
> || s_register_operand (operands[1], SImode))"

I'll assume that an Arm maintainer would have spoken up by now if
they didn't want this for some reason.

> diff --git a/gcc/rtl.c b/gcc/rtl.c
> index 
> aaee882f5ca3e37b59c9829e41d0864070c170eb..3e8b3628b0b76b41889b77bb0019f582ee6f5aaa
>  100644
> --- a/gcc/rtl.c
> +++ b/gcc/rtl.c
> @@ -736,6 +736,19 @@ rtvec_all_equal_p (const_r

Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-07-15 Thread Aldy Hernandez via Gcc-patches




On 7/15/21 3:06 PM, Richard Biener wrote:

On Thu, Jul 15, 2021 at 1:06 PM Aldy Hernandez  wrote:


Well, if we don't adjust gimple_call_return_type() to handle built-ins
with no LHS, then we must adjust the callers.

The attached patch fixes gimple_expr_type() per its documentation:

/* Return the type of the main expression computed by STMT.  Return
void_type_node if the statement computes nothing.  */

Currently gimple_expr_type is ICEing because it calls gimple_call_return_type.

I still think gimple_call_return_type should return void_type_node
instead of ICEing, but this will also fix my problem.

Anyone have a problem with this?


It's still somewhat inconsistent, no?  Because for a call without a LHS
it's now either void_type_node or the type of the return value.

It's probably known I dislike gimple_expr_type itself (it was introduced
to make the transition to tuples easier).  I wonder why you can't simply
fix range_of_call to do

tree lhs = gimple_call_lhs (call);
if (lhs)
  type = TREE_TYPE (lhs);


That would still leave gimple_expr_type() broken.  Its comment clearly
says it should return void_type_node.


I still think we should just fix gimple_call_return_type to return 
void_type_node instead of ICEing.




Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 3:16 PM Aldy Hernandez  wrote:
>
>
>
> On 7/15/21 3:06 PM, Richard Biener wrote:
> > On Thu, Jul 15, 2021 at 1:06 PM Aldy Hernandez  wrote:
> >>
> >> Well, if we don't adjust gimple_call_return_type() to handle built-ins
> >> with no LHS, then we must adjust the callers.
> >>
> >> The attached patch fixes gimple_expr_type() per it's documentation:
> >>
> >> /* Return the type of the main expression computed by STMT.  Return
> >> void_type_node if the statement computes nothing.  */
> >>
> >> Currently gimple_expr_type is ICEing because it calls 
> >> gimple_call_return_type.
> >>
> >> I still think gimple_call_return_type should return void_type_node
> >> instead of ICEing, but this will also fix my problem.
> >>
> >> Anyone have a problem with this?
> >
> > It's still somewhat inconsistent, no?  Because for a call without a LHS
> > it's now either void_type_node or the type of the return value.
> >
> > It's probably known I dislike gimple_expr_type itself (it was introduced
> > to make the transition to tuples easier).  I wonder why you can't simply
> > fix range_of_call to do
> >
> > tree lhs = gimple_call_lhs (call);
> > if (lhs)
> >   type = TREE_TYPE (lhs);
>
> That would still leave gimple_expr_type() broken.  It's comment clearly
> says it should return void_type_node.

Does it?  What does it say for

int foo ();

and the stmt

 'foo ();'

?  How's this different from

 'bar ();'

when bar is an internal function?  Note how the comment
speaks about 'type of the main EXPRESSION' and
'if the STATEMENT computes nothing' (emphasis mine).
I don't think it's all that clear.  A gimple_cond stmt
doesn't compute anything, does it?  Does the 'foo ()'
statement compute anything?  The current implementation
(and your patched one) says so.  But why does

 .ADD_OVERFLOW (_1, _2);

not (according to your patched implementation)?  It computes
something and that something has a type that depends on
the types of _1 and _2 and on the actual internal function.
But we don't have it readily available.  If you need it then
you are on your own - but returning void_type_node is wrong.

Richard.

> I still think we should just fix gimple_call_return_type to return
> void_type_node instead of ICEing.
>


Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 3:21 PM Richard Biener
 wrote:
>
> On Thu, Jul 15, 2021 at 3:16 PM Aldy Hernandez  wrote:
> >
> >
> >
> > On 7/15/21 3:06 PM, Richard Biener wrote:
> > > On Thu, Jul 15, 2021 at 1:06 PM Aldy Hernandez  wrote:
> > >>
> > >> Well, if we don't adjust gimple_call_return_type() to handle built-ins
> > >> with no LHS, then we must adjust the callers.
> > >>
> > >> The attached patch fixes gimple_expr_type() per it's documentation:
> > >>
> > >> /* Return the type of the main expression computed by STMT.  Return
> > >> void_type_node if the statement computes nothing.  */
> > >>
> > >> Currently gimple_expr_type is ICEing because it calls 
> > >> gimple_call_return_type.
> > >>
> > >> I still think gimple_call_return_type should return void_type_node
> > >> instead of ICEing, but this will also fix my problem.
> > >>
> > >> Anyone have a problem with this?
> > >
> > > It's still somewhat inconsistent, no?  Because for a call without a LHS
> > > it's now either void_type_node or the type of the return value.
> > >
> > > It's probably known I dislike gimple_expr_type itself (it was introduced
> > > to make the transition to tuples easier).  I wonder why you can't simply
> > > fix range_of_call to do
> > >
> > > tree lhs = gimple_call_lhs (call);
> > > if (lhs)
> > >   type = TREE_TYPE (lhs);
> >
> > That would still leave gimple_expr_type() broken.  It's comment clearly
> > says it should return void_type_node.
>
> Does it?  What does it say for
>
> int foo ();
>
> and the stmt
>
>  'foo ();'
>
> ?  How's this different from
>
>  'bar ();'
>
> when bar is an internal function?  Note how the comment
> speaks about 'type of the main EXPRESSION' and
> 'if the STATEMEMT computes nothing' (emphasis mine).
> I don't think it's all that clear.  A gimple_cond stmt
> doesn't compute anything, does it?  Does the 'foo ()'
> statement compute anything?  The current implementation
> (and your patched one) says so.  But why does
>
>  .ADD_OVERFLOW (_1, _2);
>
> not (according to your patched implementation)?  It computes
> something and that something has a type that depends on
> the types of _1 and _2 and on the actual internal function.
> But we don't have it readily available.  If you need it then
> you are on your own - but returning void_type_node is wrong.

That said, in 99% of all cases people should have used
TREE_TYPE (gimple_get_lhs (stmt)) instead of
gimple_expr_type since that makes clear that we're
talking of a result that materializes somewhere.  It also
makes the required guard obvious - gimple_get_lhs (stmt) != NULL.

Then there are the legacy callers that call it on a GIMPLE_COND
and the (IMHO broken) ones that expect it to do magic for
masked loads and stores.

Richard.

> Richard.
>
> > I still think we should just fix gimple_call_return_type to return
> > void_type_node instead of ICEing.
> >


Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 3:23 PM Richard Biener
 wrote:
>
> On Thu, Jul 15, 2021 at 3:21 PM Richard Biener
>  wrote:
> >
> > On Thu, Jul 15, 2021 at 3:16 PM Aldy Hernandez  wrote:
> > >
> > >
> > >
> > > On 7/15/21 3:06 PM, Richard Biener wrote:
> > > > On Thu, Jul 15, 2021 at 1:06 PM Aldy Hernandez  wrote:
> > > >>
> > > >> Well, if we don't adjust gimple_call_return_type() to handle built-ins
> > > >> with no LHS, then we must adjust the callers.
> > > >>
> > > >> The attached patch fixes gimple_expr_type() per it's documentation:
> > > >>
> > > >> /* Return the type of the main expression computed by STMT.  Return
> > > >> void_type_node if the statement computes nothing.  */
> > > >>
> > > >> Currently gimple_expr_type is ICEing because it calls 
> > > >> gimple_call_return_type.
> > > >>
> > > >> I still think gimple_call_return_type should return void_type_node
> > > >> instead of ICEing, but this will also fix my problem.
> > > >>
> > > >> Anyone have a problem with this?
> > > >
> > > > It's still somewhat inconsistent, no?  Because for a call without a LHS
> > > > it's now either void_type_node or the type of the return value.
> > > >
> > > > It's probably known I dislike gimple_expr_type itself (it was introduced
> > > > to make the transition to tuples easier).  I wonder why you can't simply
> > > > fix range_of_call to do
> > > >
> > > > tree lhs = gimple_call_lhs (call);
> > > > if (lhs)
> > > >   type = TREE_TYPE (lhs);
> > >
> > > That would still leave gimple_expr_type() broken.  It's comment clearly
> > > says it should return void_type_node.
> >
> > Does it?  What does it say for
> >
> > int foo ();
> >
> > and the stmt
> >
> >  'foo ();'
> >
> > ?  How's this different from
> >
> >  'bar ();'
> >
> > when bar is an internal function?  Note how the comment
> > speaks about 'type of the main EXPRESSION' and
> > 'if the STATEMEMT computes nothing' (emphasis mine).
> > I don't think it's all that clear.  A gimple_cond stmt
> > doesn't compute anything, does it?  Does the 'foo ()'
> > statement compute anything?  The current implementation
> > (and your patched one) says so.  But why does
> >
> >  .ADD_OVERFLOW (_1, _2);
> >
> > not (according to your patched implementation)?  It computes
> > something and that something has a type that depends on
> > the types of _1 and _2 and on the actual internal function.
> > But we don't have it readily available.  If you need it then
> > you are on your own - but returning void_type_node is wrong.
>
> That said, in 99% of all cases people should have used
> TREE_TYPE (gimple_get_lhs (stmt)) instead of
> gimple_expr_type since that makes clear that we're
> talking of a result that materializes somewhere.  It also
> makes the required guard obvious - gimple_get_lhs (stmt) != NULL.
>
> Then there are the legacy callers that call it on a GIMPLE_COND
> and the (IMHO broken) ones that expect it to do magic for
> masked loads and stores.

Btw, void_type_node is also wrong for a GIMPLE_ASM with outputs.

I think if you really want to fix the ICEing then return NULL for
"we don't know" and adjust the current default as well.

Richard.

> Richard.
>
> > Richard.
> >
> > > I still think we should just fix gimple_call_return_type to return
> > > void_type_node instead of ICEing.
> > >


[PATCH] Add --enable-first-stage-cross configure option

2021-07-15 Thread Serge Belyshev
Add --enable-first-stage-cross configure option

Build a static-only, C-only compiler that is sufficient to cross-compile
glibc.  This option disables various runtime libraries that require
libc to compile, turns on --with-newlib, --without-headers,
--disable-decimal-float, --disable-shared, --disable-threads, and sets
--enable-languages=c.

Rationale: the current way of building the first-stage compiler of a
cross toolchain requires specifying a list of target libraries that are
not going to be compiled due to their dependency on the target libc.
This list is not documented in gccinstall.texi and sometimes changes.
To simplify the procedure, it is better to maintain that list in GCC
itself.
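For context, a first-stage cross build with the proposed option might be driven as follows.  This is a hypothetical invocation sketch, not taken from the patch: the target triplet, install prefix, and glibc version are placeholders.

```shell
# Hypothetical first-stage cross build using the proposed option.
# Triplet, prefix and glibc version below are placeholders.
mkdir build-gcc-stage1 && cd build-gcc-stage1
../gcc/configure \
    --target=aarch64-linux-gnu \
    --prefix=/opt/cross \
    --enable-first-stage-cross \
    --with-glibc-version=2.33
make -j"$(nproc)" all-gcc all-target-libgcc
make install-gcc install-target-libgcc
```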

Usage example as a patch to glibc's scripts/build-many-libcs.py:

diff --git a/scripts/build-many-glibcs.py b/scripts/build-many-glibcs.py
index 580d25e8ee..3a6a7be76b 100755
--- a/scripts/build-many-glibcs.py
+++ b/scripts/build-many-glibcs.py
@@ -1446,17 +1446,7 @@ class Config(object):
 # required to define inhibit_libc (to stop some parts of
 # libgcc including libc headers); --without-headers is not
 # sufficient.
-cfg_opts += ['--enable-languages=c', '--disable-shared',
- '--disable-threads',
- '--disable-libatomic',
- '--disable-decimal-float',
- '--disable-libffi',
- '--disable-libgomp',
- '--disable-libitm',
- '--disable-libmpx',
- '--disable-libquadmath',
- '--disable-libsanitizer',
- '--without-headers', '--with-newlib',
+cfg_opts += ['--enable-first-stage-cross',
  '--with-glibc-version=%s' % self.ctx.glibc_version
  ]
 cfg_opts += self.first_gcc_cfg


Bootstrapped/regtested on x86_64-pc-linux-gnu, and
tested with build-many-glibcs.py with the above patch.

OK for mainline?


ChangeLog:

* configure.ac: Add --enable-first-stage-cross.
* configure: Regenerate.

gcc/ChangeLog:

* doc/install.texi: Document --enable-first-stage-cross.
---
 configure| 20 
 configure.ac | 15 +++
 gcc/doc/install.texi |  7 +++
 3 files changed, 42 insertions(+)

diff --git a/configure b/configure
index 85ab9915402..df59036e258 100755
--- a/configure
+++ b/configure
@@ -787,6 +787,7 @@ ac_user_opts='
 enable_option_checking
 with_build_libsubdir
 with_system_zlib
+enable_first_stage_cross
 enable_as_accelerator_for
 enable_offload_targets
 enable_offload_defaulted
@@ -1514,6 +1515,9 @@ Optional Features:
   --disable-option-checking  ignore unrecognized --enable/--with options
   --disable-FEATURE   do not include FEATURE (same as --enable-FEATURE=no)
   --enable-FEATURE[=ARG]  include FEATURE [ARG=yes]
+  --enable-first-stage-cross
+  Build a static-only compiler that is sufficient to
+  build glibc.
   --enable-as-accelerator-for=ARG
   build as offload target compiler. Specify offload
   host triple by ARG
@@ -2961,6 +2965,22 @@ case $is_cross_compiler in
   no) skipdirs="${skipdirs} ${cross_only}" ;;
 esac
 
+# Check whether --enable-first-stage-cross was given.
+if test "${enable_first_stage_cross+set}" = set; then :
+  enableval=$enable_first_stage_cross; ENABLE_FIRST_STAGE_CROSS=$enableval
+else
+  ENABLE_FIRST_STAGE_CROSS=no
+fi
+
+case "${ENABLE_FIRST_STAGE_CROSS}" in
+  yes)
+noconfigdirs="$noconfigdirs target-libatomic target-libquadmath 
target-libgomp target-libssp"
+host_configargs="$host_configargs --disable-shared --disable-threads 
--disable-decimal-float --without-headers --with-newlib"
+target_configargs="$target_configargs --disable-shared"
+enable_languages=c
+;;
+esac
+
 # If both --with-headers and --with-libs are specified, default to
 # --without-newlib.
 if test x"${with_headers}" != x && test x"${with_headers}" != xno \
diff --git a/configure.ac b/configure.ac
index 1df038b04f3..53f920c1a2c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -268,6 +268,21 @@ case $is_cross_compiler in
   no) skipdirs="${skipdirs} ${cross_only}" ;;
 esac
 
+AC_ARG_ENABLE(first-stage-cross,
+[AS_HELP_STRING([--enable-first-stage-cross],
+   [Build a static-only compiler that is
+   sufficient to build glibc.])],
+ENABLE_FIRST_STAGE_CROSS=$enableval,
+ENABLE_FIRST_STAGE_CROSS=no)
+case "${ENABLE_FIRST_STAGE_CROSS}" in
+  yes)
+noconfigdirs="$noconfigdirs target-libatomic target-libquadmath 
target-libgomp target-libssp"
+host_configargs="$host_configargs --disable-shared --disable-threads 
--disable-decimal-float --without-headers --with-newlib"
+target_configargs="$target_configargs --disable-shared"
+enable_languages=c
+;;
+esac
+
 # I

Re: [PATCH] c++: Optimize away NULLPTR_TYPE comparisons [PR101443]

2021-07-15 Thread Jason Merrill via Gcc-patches

On 7/15/21 3:53 AM, Jakub Jelinek wrote:

Hi!

Comparisons of NULLPTR_TYPE operands cause all kinds of problems in the
middle-end and in fold-const.c: various optimizations assume that if they
see e.g. a non-equality comparison with one of the operands being an
INTEGER_CST whose type is not INTEGRAL_TYPE_P (which has
TYPE_{MIN,MAX}_VALUE), they can build_int_cst (type, 1) to find a
successor.

The following patch fixes this by making sure such comparisons don't
appear in the IL: they are optimized away at cp_fold time, since all of
them can be folded.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Though, I've just noticed that clang++ rejects the non-equality comparisons
instead, foo () > 0 with
invalid operands to binary expression ('decltype(nullptr)' (aka 'nullptr_t') 
and 'int')
and foo () > nullptr with
invalid operands to binary expression ('decltype(nullptr)' (aka 'nullptr_t') 
and 'nullptr_t')

Shall we reject those too, in addition or instead of parts of this patch?


Yes.


If so, wouldn't this patch be still useful for backports, I bet we don't
want to start reject it on the release branches when we used to accept it.


Sounds good.


2021-07-15  Jakub Jelinek  

PR c++/101443
* cp-gimplify.c (cp_fold): For comparisons with NULLPTR_TYPE
operands, fold them right away to true or false.

* g++.dg/cpp0x/nullptr46.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2021-06-25 10:36:22.141020337 +0200
+++ gcc/cp/cp-gimplify.c2021-07-14 12:04:24.221860756 +0200
@@ -2424,6 +2424,32 @@ cp_fold (tree x)
op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops);
op1 = cp_fold_rvalue (TREE_OPERAND (x, 1));
  
+  /* decltype(nullptr) has only one value, so optimize away all comparisons
+     with that type right away, keeping them in the IL causes troubles for
+     various optimizations.  */
+  if (COMPARISON_CLASS_P (org_x)
+ && TREE_CODE (TREE_TYPE (op0)) == NULLPTR_TYPE
+ && TREE_CODE (TREE_TYPE (op1)) == NULLPTR_TYPE)
+   {
+ switch (code)
+   {
+   case EQ_EXPR:
+   case LE_EXPR:
+   case GE_EXPR:
+ x = constant_boolean_node (true, TREE_TYPE (x));
+ break;
+   case NE_EXPR:
+   case LT_EXPR:
+   case GT_EXPR:
+ x = constant_boolean_node (false, TREE_TYPE (x));
+ break;
+   default:
+ gcc_unreachable ();
+   }
+ return omit_two_operands_loc (loc, TREE_TYPE (x), x,
+   op0, op1);
+   }
+
if (op0 != TREE_OPERAND (x, 0) || op1 != TREE_OPERAND (x, 1))
{
  if (op0 == error_mark_node || op1 == error_mark_node)
--- gcc/testsuite/g++.dg/cpp0x/nullptr46.C.jj   2021-07-14 11:48:03.917122727 
+0200
+++ gcc/testsuite/g++.dg/cpp0x/nullptr46.C  2021-07-14 11:46:52.261092097 
+0200
@@ -0,0 +1,11 @@
+// PR c++/101443
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2" }
+
+decltype(nullptr) foo ();
+
+bool
+bar ()
+{
+  return foo () > nullptr || foo () < nullptr;
+}

Jakub





Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> The following extends the existing loop masking support using
> SVE WHILE_ULT to x86 by providing an alternate way to produce the
> mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> you can now enable masked vectorized epilogues (=1) or fully
> masked vector loops (=2).

As mentioned on IRC, WHILE_ULT is supposed to ensure that every
element after the first zero is also zero.  That happens naturally
for power-of-2 vectors if the start index is a multiple of the VF.
(And at the moment, variable-length vectors are the only way of
supporting non-power-of-2 vectors.)

This probably works fine for =2 and =1 as things stand, since the
vector IVs always start at zero.  But if in future we have a single
IV counting scalar iterations, and use it even for peeled prologue
iterations, we could end up with a situation where the approximation
is no longer safe.

E.g. suppose we had a uint32_t scalar IV with a limit of (uint32_t)-3.
If we peeled 2 iterations for alignment and then had a VF of 8,
the final vector would have a start index of (uint32_t)-6 and the
vector would be { -1, -1, -1, 0, 0, 0, -1, -1 }.

So I think it would be safer to handle this as an alternative to
using while, rather than as a direct emulation, so that we can take
the extra restrictions into account.  Alternatively, we could probably
do { 0, 1, 2, ... } < { end - start, end - start, ... }.

Thanks,
Richard



>
> What's missing is using a scalar IV for the loop control
> (but in principle AVX512 can use the mask here - just the patch
> doesn't seem to work for AVX512 yet for some reason - likely
> expand_vec_cond_expr_p doesn't work there).  What's also missing
> is providing more support for predicated operations in the case
> of reductions either via VEC_COND_EXPRs or via implementing
> some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
> to masked AVX512 operations.
>
> For AVX2 and
>
> int foo (unsigned *a, unsigned * __restrict b, int n)
> {
>   unsigned sum = 1;
>   for (int i = 0; i < n; ++i)
> b[i] += a[i];
>   return sum;
> }
>
> we get
>
> .L3:
>         vpmaskmovd      (%rsi,%rax), %ymm0, %ymm3
>         vpmaskmovd      (%rdi,%rax), %ymm0, %ymm1
>         addl    $8, %edx
>         vpaddd  %ymm3, %ymm1, %ymm1
>         vpmaskmovd      %ymm1, %ymm0, (%rsi,%rax)
>         vmovd   %edx, %xmm1
>         vpsubd  %ymm15, %ymm2, %ymm0
>         addq    $32, %rax
>         vpbroadcastd    %xmm1, %ymm1
>         vpaddd  %ymm4, %ymm1, %ymm1
>         vpsubd  %ymm15, %ymm1, %ymm1
>         vpcmpgtd        %ymm1, %ymm0, %ymm0
>         vptest  %ymm0, %ymm0
>         jne     .L3
>
> for the fully masked loop body and for the masked epilogue
> we see
>
> .L4:
>         vmovdqu (%rsi,%rax), %ymm3
>         vpaddd  (%rdi,%rax), %ymm3, %ymm0
>         vmovdqu %ymm0, (%rsi,%rax)
>         addq    $32, %rax
>         cmpq    %rax, %rcx
>         jne     .L4
>         movl    %edx, %eax
>         andl    $-8, %eax
>         testb   $7, %dl
>         je      .L11
> .L3:
>         subl    %eax, %edx
>         vmovdqa .LC0(%rip), %ymm1
>         salq    $2, %rax
>         vmovd   %edx, %xmm0
>         movl    $-2147483648, %edx
>         addq    %rax, %rsi
>         vmovd   %edx, %xmm15
>         vpbroadcastd    %xmm0, %ymm0
>         vpbroadcastd    %xmm15, %ymm15
>         vpsubd  %ymm15, %ymm1, %ymm1
>         vpsubd  %ymm15, %ymm0, %ymm0
>         vpcmpgtd        %ymm1, %ymm0, %ymm0
>         vpmaskmovd      (%rsi), %ymm0, %ymm1
>         vpmaskmovd      (%rdi,%rax), %ymm0, %ymm2
>         vpaddd  %ymm2, %ymm1, %ymm1
>         vpmaskmovd      %ymm1, %ymm0, (%rsi)
> .L11:
>         vzeroupper
>
> compared to
>
> .L3:
>         movl    %edx, %r8d
>         subl    %eax, %r8d
>         leal    -1(%r8), %r9d
>         cmpl    $2, %r9d
>         jbe     .L6
>         leaq    (%rcx,%rax,4), %r9
>         vmovdqu (%rdi,%rax,4), %xmm2
>         movl    %r8d, %eax
>         andl    $-4, %eax
>         vpaddd  (%r9), %xmm2, %xmm0
>         addl    %eax, %esi
>         andl    $3, %r8d
>         vmovdqu %xmm0, (%r9)
>         je      .L2
> .L6:
>         movslq  %esi, %r8
>         leaq    0(,%r8,4), %rax
>         movl    (%rdi,%r8,4), %r8d
>         addl    %r8d, (%rcx,%rax)
>         leal    1(%rsi), %r8d
>         cmpl    %r8d, %edx
>         jle     .L2
>         addl    $2, %esi
>         movl    4(%rdi,%rax), %r8d
>         addl    %r8d, 4(%rcx,%rax)
>         cmpl    %esi, %edx
>         jle     .L2
>         movl    8(%rdi,%rax), %edx
>         addl    %edx, 8(%rcx,%rax)
> .L2:
>
> I'm giving this a little testing right now but will dig on why
> I don't get masked loops when AVX512 is enabled.
>
> Still comments are appreciated.
>
> Thanks,
> Richard.
>
> 2021-07-15  Richard Biener  
>
>   * tree-vect-stmts.c (can_produce_all_loop_masks_p): We
>   also can produce masks with VEC_COND_EXPRs.
>   * tree-vect-loop.c (vect_gen_while): Generate the mask
>   with a VEC_COND_EXPR in 

Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Biener
On Thu, 15 Jul 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following extends the existing loop masking support using
> > SVE WHILE_ULT to x86 by providing an alternate way to produce the
> > mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> > you can now enable masked vectorized epilogues (=1) or fully
> > masked vector loops (=2).
> 
> As mentioned on IRC, WHILE_ULT is supposed to ensure that every
> element after the first zero is also zero.  That happens naturally
> for power-of-2 vectors if the start index is a multiple of the VF.
> (And at the moment, variable-length vectors are the only way of
> supporting non-power-of-2 vectors.)
> 
> This probably works fine for =2 and =1 as things stand, since the
> vector IVs always start at zero.  But if in future we have a single
> IV counting scalar iterations, and use it even for peeled prologue
> iterations, we could end up with a situation where the approximation
> is no longer safe.
> 
> E.g. suppose we had a uint32_t scalar IV with a limit of (uint32_t)-3.
> If we peeled 2 iterations for alignment and then had a VF of 8,
> the final vector would have a start index of (uint32_t)-6 and the
> vector would be { -1, -1, -1, 0, 0, 0, -1, -1 }.

Ah, I didn't think of overflow, yeah.  Guess the add of
{ 0, 1, 2, 3 ... } would need to be saturating ;)

> So I think it would be safer to handle this as an alternative to
> using while, rather than as a direct emulation, so that we can take
> the extra restrictions into account.  Alternatively, we could probably
> do { 0, 1, 2, ... } < { end - start, end - start, ... }.

Or this, that looks correct and not worse from a complexity point
of view.

I'll see if I can come up with a testcase and fix even.

Thanks,
Richard.

> Thanks,
> Richard
> 
> 
> 
> >
> > What's missing is using a scalar IV for the loop control
> > (but in principle AVX512 can use the mask here - just the patch
> > doesn't seem to work for AVX512 yet for some reason - likely
> > expand_vec_cond_expr_p doesn't work there).  What's also missing
> > is providing more support for predicated operations in the case
> > of reductions either via VEC_COND_EXPRs or via implementing
> > some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
> > to masked AVX512 operations.
> >
> > For AVX2 and
> >
> > int foo (unsigned *a, unsigned * __restrict b, int n)
> > {
> >   unsigned sum = 1;
> >   for (int i = 0; i < n; ++i)
> > b[i] += a[i];
> >   return sum;
> > }
> >
> > we get
> >
> > .L3:
> > vpmaskmovd  (%rsi,%rax), %ymm0, %ymm3
> > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm1
> > addl$8, %edx
> > vpaddd  %ymm3, %ymm1, %ymm1
> > vpmaskmovd  %ymm1, %ymm0, (%rsi,%rax)
> > vmovd   %edx, %xmm1
> > vpsubd  %ymm15, %ymm2, %ymm0
> > addq$32, %rax
> > vpbroadcastd%xmm1, %ymm1
> > vpaddd  %ymm4, %ymm1, %ymm1
> > vpsubd  %ymm15, %ymm1, %ymm1
> > vpcmpgtd%ymm1, %ymm0, %ymm0
> > vptest  %ymm0, %ymm0
> > jne .L3
> >
> > for the fully masked loop body and for the masked epilogue
> > we see
> >
> > .L4:
> > vmovdqu (%rsi,%rax), %ymm3
> > vpaddd  (%rdi,%rax), %ymm3, %ymm0
> > vmovdqu %ymm0, (%rsi,%rax)
> > addq$32, %rax
> > cmpq%rax, %rcx
> > jne .L4
> > movl%edx, %eax
> > andl$-8, %eax
> > testb   $7, %dl
> > je  .L11
> > .L3:
> > subl%eax, %edx
> > vmovdqa .LC0(%rip), %ymm1
> > salq$2, %rax
> > vmovd   %edx, %xmm0
> > movl$-2147483648, %edx
> > addq%rax, %rsi
> > vmovd   %edx, %xmm15
> > vpbroadcastd%xmm0, %ymm0
> > vpbroadcastd%xmm15, %ymm15
> > vpsubd  %ymm15, %ymm1, %ymm1
> > vpsubd  %ymm15, %ymm0, %ymm0
> > vpcmpgtd%ymm1, %ymm0, %ymm0
> > vpmaskmovd  (%rsi), %ymm0, %ymm1
> > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm2
> > vpaddd  %ymm2, %ymm1, %ymm1
> > vpmaskmovd  %ymm1, %ymm0, (%rsi)
> > .L11:
> > vzeroupper
> >
> > compared to
> >
> > .L3:
> > movl%edx, %r8d
> > subl%eax, %r8d
> > leal-1(%r8), %r9d
> > cmpl$2, %r9d
> > jbe .L6
> > leaq(%rcx,%rax,4), %r9
> > vmovdqu (%rdi,%rax,4), %xmm2
> > movl%r8d, %eax
> > andl$-4, %eax
> > vpaddd  (%r9), %xmm2, %xmm0
> > addl%eax, %esi
> > andl$3, %r8d
> > vmovdqu %xmm0, (%r9)
> > je  .L2
> > .L6:
> > movslq  %esi, %r8
> > leaq0(,%r8,4), %rax
> > movl(%rdi,%r8,4), %r8d
> > addl%r8d, (%rcx,%rax)
> > leal1(%rsi), %r8d
> > cmpl%r8d, %edx
> > jle .L2
> > addl$2, %esi
> > movl4(%rdi,%rax), %r8d
> > addl%r8d

Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-15 Thread Peter Bergner via Gcc-patches
On 7/14/21 4:12 PM, Peter Bergner wrote:
> I'll make the change above and rebuild just to be safe and then commit.

Regtesting was clean as expected, so I pushed the commit to trunk.  Thanks.
Is this ok for backporting to GCC 11 after a day or two on trunk?

Given GCC 10 doesn't have the opaque mode changes, I don't want this in GCC 10.


Peter


Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-15 Thread Qing Zhao via Gcc-patches
Hi, Richard,

> On Jul 15, 2021, at 2:56 AM, Richard Biener  
> wrote:
> 
>>> On Wed, Jul 14, 2021 at 1:17 AM Qing Zhao  wrote:
 
 Hi, Kees,
 
 I took a look at the kernel testing case you attached in the previous 
 email, and found the testing failed with the following case:
 
 #define INIT_STRUCT_static_all  = { .one = arg->one,\
   .two = arg->two,\
   .three = arg->three,\
   .four = arg->four,  \
   }
 
 i.e, when the structure type auto variable has been explicitly initialized 
 in the source code.  -ftrivial-auto-var-init in the 4th version
 does not initialize the paddings for such variables.
 
 But in the previous version of the patches ( 2 or 3), 
 -ftrivial-auto-var-init initializes the paddings for such variables.
 
 I intended to remove this part of the code from the 4th version of the 
 patch since the implementation for initializing such paddings is 
 completely different from
 the initializing of the whole structure as a whole with memset in this 
 version of the implementation.
 
 If we really need this functionality, I will add another separate patch 
 for this additional functionality, but not with this patch.
 
 Richard, what’s your comment and suggestions on this?
>>> 
>>> I think this can be addressed in the gimplifier by adjusting
>>> gimplify_init_constructor to clear
>>> the object before the initialization (if it's not done via aggregate
>>> copying).
>> 
>> I did this in the previous versions of the patch like the following:
>> 
>> @@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
>> *pre_p, gimple_seq *post_p,
>>  /* If a single access to the target must be ensured and all elements
>> are zero, then it's optimal to clear whatever their number.  */
>>  cleared = true;
>> +   else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
>> +&& !TREE_STATIC (object)
>> +&& type_has_padding (type))
>> + /* If the user requests to initialize automatic variables with
>> +paddings inside the type, we should initialize the paddings too.
>> +C guarantees that brace-init with fewer initializers than 
>> members
>> +aggregate will initialize the rest of the aggregate as-if it 
>> were
>> +static initialization.  In turn static initialization guarantees
>> +that pad is initialized to zero bits.
>> +So, it's better to clear the whole record under such situation. 
>>  */
>> + cleared = true;
>>else
>>  cleared = false;
>> 
>> Then the paddings are also initialized to zeroes with this option. (Even for 
>> -ftrivial-auto-var-init=pattern).
>> 
>> Is the above change Okay? (With this change, when 
>> -ftrivial-auto-var-init=pattern, the paddings for the
>> structure variables that have explicit initializer will be ZEROed, not 0xFE)
> 
> I guess that would be the simplest way, yes.
> 
>>> The clearing
>>> could be done via .DEFERRED_INIT.
>> 
>> You mean to add additional calls to .DEFERRED_INIT for each individual 
>> padding of the structure in “gimplify_init_constructor"?
>> Then  later during RTL expand, expand these calls the same as other calls?
> 
> No, I actually meant to in your patch above set
> 
>defered_padding_init = true;
> 
> and where 'cleared' is processed do sth like
> 
>  if (defered_padding_init)
>.. emit .DEFERRED_INIT for the _whole_ variable ..
>  else if (cleared)
> .. original cleared handling ...
> 
> that would retain the pattern init but possibly be less efficient in the end.

Okay, I see.

Yes, then this will resolve the inconsistent pattern-init issue for paddings. 
I will try this.

> 
>>> 
>>> Note that I think .DEFERRED_INIT can be elided for variables that do
>>> not have their address
>>> taken - otherwise we'll also have to worry about aggregate copy
>>> initialization and SRA
>>> decomposing the copy, initializing only the used parts.
>> 
>> Please explain this a little bit more.
> 
> For sth like
> 
> struct S { int i; long j; };
> 
> void bar (struct S);
> struct S
> foo (struct S *p)
> {
>  struct S q = *p;
>  struct S r = q;
>  bar (r);
>  return r;
> }
> 
> we don't get a .DEFERRED_INIT for 'r' (do we?)

No, we don’t emit .DEFERRED_INIT for either ‘q’ or ‘r’, since they are both 
explicitly initialized.
With the current 4th patch, the paddings inside this structure variable are not 
initialized.

However, if we “clear” the whole structure in "gimplify_init_constructor", the 
paddings might get initialized.  I will check on this.

and SRA decomposes the init to
> 
> 
>   :
>  q = *p_2(D);
>  q$i_9 = p_2(D)->i;
>  q$j_10 = p_2(D)->j;
>  r.i = q$i_

Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-15 Thread Qing Zhao via Gcc-patches


> On Jul 15, 2021, at 9:16 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
>> 
 
 Note that I think .DEFERRED_INIT can be elided for variables that do
 not have their address
 taken - otherwise we'll also have to worry about aggregate copy
 initialization and SRA
 decomposing the copy, initializing only the used parts.
>>> 
>>> Please explain this a little bit more.
>> 
>> For sth like
>> 
>> struct S { int i; long j; };
>> 
>> void bar (struct S);
>> struct S
>> foo (struct S *p)
>> {
>> struct S q = *p;
>> struct S r = q;
>> bar (r);
>> return r;
>> }
>> 
>> we don't get a .DEFERRED_INIT for 'r' (do we?)
> 
> No, we don’t emit .DEFERRED_INIT for either ‘q’ or ‘r’, since they are both 
> explicitly initialized.

Another thought on this example:

I think that for the auto variables ‘q’ and ‘r’ in function ‘foo’, the 
initialization depends on the incoming parameter ‘p’.

If ‘*p’ is an auto variable in ‘foo’s caller, then it should have been 
initialized completely in the caller, including its padding. 

So, I don’t think that we need to worry about such a situation. 

If every function guarantees that all of its own auto variables are initialized 
completely, including the paddings, then all such copy initialization through 
parameters is completely initialized as well. 

Let me know if I miss anything here.

Qing
> With the current 4th patch, the paddings inside this structure variable are 
> not initialized.
> 
> However, if we “clear” the whole structure in "gimplify_init_constructor", 
> the paddings might get initialized.  I will check on this.
> 
> and SRA decomposes the init to
>> 
>> 
>>  :
>> q = *p_2(D);
>> q$i_9 = p_2(D)->i;
>> q$j_10 = p_2(D)->j;
>> r.i = q$i_9;
>> r.j = q$j_10;
>> bar (r);
>> D.1953 = r;
>> r ={v} {CLOBBER};
>> return D.1953;
>> 
>> which leaves its padding uninitialized.  Hmm, and that even happens when
>> you make bar take struct S * and thus pass the address of 'r' to bar.
> 
> Will try this example and see how to resolve this issue.
> 
> Thanks for your explanation.
> 
> Qing
>> 
>> Richard.
>> 
>> 
>>> Thanks.
>>> 
>>> Qing
 
 Richard.
 
> Thanks.
> 
> Qing
> 
>> On Jul 13, 2021, at 4:29 PM, Kees Cook  wrote:
>> 
>> On Mon, Jul 12, 2021 at 08:28:55PM +, Qing Zhao wrote:
 On Jul 12, 2021, at 12:56 PM, Kees Cook  wrote:
 On Wed, Jul 07, 2021 at 05:38:02PM +, Qing Zhao wrote:
> This is the 4th version of the patch for the new security feature for 
> GCC.
 
 It looks like padding initialization has regressed to where things were
 in version 1[1] (it was, however, working in version 2[2]). I'm seeing
 these failures again in the kernel self-test:
 
 test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
 test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
 test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
 test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
 test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
 test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)
>>> 
>>> Are the above failures for -ftrivial-auto-var-init=zero or 
>>> -ftrivial-auto-var-init=pattern?  Or both?
>> 
>> Yes, I was only testing =zero (the kernel test handles =pattern as well:
>> it doesn't explicitly test for 0x00). I've verified with =pattern now,
>> too.
>> 
>>> For the current implementation, I believe that all paddings should be 
>>> initialized with this option,
>>> for -ftrivial-auto-var-init=zero, the padding will be initialized to 
>>> zero as before, however, for
>>> -ftrivial-auto-var-init=pattern, the padding will be initialized to 
>>> 0xFE byte-repeatable patterns.
>> 
>> I've double-checked that I'm using the right gcc, with the flag.
>> 
 
 In looking at the gcc test cases, I think the wrong thing is
 being checked: we want to verify the padding itself. For example,
 in auto-init-17.c, the actual bytes after "four" need to be checked,
 rather than "four" itself.
>>> 
>>> **For the current auto-init-17.c
>>> 
>>> 1 /* Verify zero initialization for array type with structure element 
>>> with
>>> 2padding.  */
>>> 3 /* { dg-do compile } */
>>> 4 /* { dg-options "-ftrivial-auto-var-init=zero" } */
>>> 5
>>> 6 struct test_trailing_hole {
>>> 7 int one;
>>> 8 int two;
>>> 9 int three;
>>> 10 char four;
>>> 11 /* "sizeof(unsigned long) - 1" byte padding hole here. */
>>> 12 };
>>> 13
>>> 14
>>> 15 int foo ()
>>> 16 {
>>> 17   struct test_trailing_hole var[10];
>>> 18   return var[2].four;
>>> 19 }
>>> 20
>>> 21 /* { dg-final { scan-assembler "movl\t\\\$0," } } */
>>> 22 /* { dg-final

Re: [PATCH 1/2] Implement basic block path solver.

2021-07-15 Thread Aldy Hernandez via Gcc-patches
Jeff has mentioned that it'll take a while longer to review the
threader rewrite, so I've decided to make some minor cleanups while he
gets to it.

There are few minor changes here:

1. I've renamed the solver to gimple-range-path.*, which better
expresses that it's part of the ranger tools.  The prefix tree-ssa-* is
somewhat outdated ;-).

2. I've made the solver a full-blown range_query, which can be passed
around anywhere a range_query is accepted.  It turns out we were 99%
of the way there, so we might as well share the same API.  Now users will
be able to use range_of_expr, range_of_stmt, and friends.  This can come
in handy when passing a range_query to something like
simplify_using_ranges, something which I am considering for my
follow-up changes to the DOM threader.

3. Finally, I've renamed the class to path_range_query to make it
obvious that it's a range_query object.

There are no functional changes.

Tested on x86-64 Linux.

I will wait on Jeff's review of the tree-ssa-threadbackward.* changes
before committing this.

Aldy

On Fri, Jul 2, 2021 at 3:17 PM Andrew MacLeod  wrote:
>
> On 7/2/21 4:13 AM, Aldy Hernandez wrote:
>
> +
> +// Return the range of STMT as it would be seen at the end of the path
> +// being analyzed.  Anything but the final conditional in a BB will
> +// return VARYING.
> +
> +void
> +path_solver::range_in_path (irange &r, gimple *stmt)
> +{
> +  if (gimple_code (stmt) == GIMPLE_COND && fold_range (r, stmt, this))
> +return;
> +
> +  r.set_varying (gimple_expr_type (stmt));
> +}
>
> Not objecting to anything here other than to note that I think we have cases 
> where there's a COND_EXPR on the RHS of statements within a block.  We're (in 
> general) not handling those well in DOM or jump threading.
>
>
> I guess I can put that on my TODO list :).
>
> note that we are no longer in the days of range-ops only processing...   
> fold_range handles COND_EXPR (and every other kind of stmt)  just fine.
>
> Andrew
From bb2d12abf7bab6306a38e143aed0f0a828f1c790 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Tue, 15 Jun 2021 12:20:43 +0200
Subject: [PATCH 2/5] Implement basic block path solver.

This is the main basic block path solver for use in the ranger-based
backwards threader.  Given a path of BBs, the class can solve the final
conditional or any SSA name used in calculating the final conditional.

gcc/ChangeLog:

* Makefile.in (OBJS): Add gimple-range-path.o.
	* gimple-range-path.cc: New file.
	* gimple-range-path.h: New file.
---
 gcc/Makefile.in  |   1 +
 gcc/gimple-range-path.cc | 327 +++
 gcc/gimple-range-path.h  |  85 ++
 3 files changed, 413 insertions(+)
 create mode 100644 gcc/gimple-range-path.cc
 create mode 100644 gcc/gimple-range-path.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 934b2a05327..863f1256811 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1649,6 +1649,7 @@ OBJS = \
 	tree-ssa-loop.o \
 	tree-ssa-math-opts.o \
 	tree-ssa-operands.o \
+	gimple-range-path.o \
 	tree-ssa-phiopt.o \
 	tree-ssa-phiprop.o \
 	tree-ssa-pre.o \
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
new file mode 100644
index 000..dd7c5342d8b
--- /dev/null
+++ b/gcc/gimple-range-path.cc
@@ -0,0 +1,327 @@
+/* Basic block path solver.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   Contributed by Aldy Hernandez .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "cfganal.h"
+#include "value-range.h"
+#include "gimple-range.h"
+#include "tree-pretty-print.h"
+#include "gimple-range-path.h"
+#include "ssa.h"
+
+// Internal construct to help facilitate debugging of solver.
+#define DEBUG_SOLVER (0 && dump_file)
+
+path_range_query::path_range_query (gimple_ranger &ranger)
+  : m_ranger (ranger)
+{
+  m_cache = new ssa_global_cache;
+  m_has_cache_entry = BITMAP_ALLOC (NULL);
+  m_path = NULL;
+}
+
+path_range_query::~path_range_query ()
+{
+  BITMAP_FREE (m_has_cache_entry);
+  delete m_cache;
+}
+
+// Mark cache entry for NAME as unused.
+
+void
+path_range_query::clear_cache (tree name)
+{
+  unsigned v = SSA_NAME_VERSION (name);
+  bitmap_clear_bit (m_has_cache_entry, v);
+}
+
+// If NAME has a cache entry, return it in R, and return TRUE.
+
+inline bool

Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Thu, 15 Jul 2021, Hongtao Liu wrote:
>
>> On Thu, Jul 15, 2021 at 6:45 PM Richard Biener via Gcc-patches
>>  wrote:
>> >
>> > On Thu, Jul 15, 2021 at 12:30 PM Richard Biener  wrote:
>> > >
>> > > The following extends the existing loop masking support using
>> > > SVE WHILE_ULT to x86 by providing an alternate way to produce the
>> > > mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
>> > > you can now enable masked vectorized epilogues (=1) or fully
>> > > masked vector loops (=2).
>> > >
>> > > What's missing is using a scalar IV for the loop control
>> > > (but in principle AVX512 can use the mask here - just the patch
>> > > doesn't seem to work for AVX512 yet for some reason - likely
>> > > expand_vec_cond_expr_p doesn't work there).  What's also missing
>> > > is providing more support for predicated operations in the case
>> > > of reductions either via VEC_COND_EXPRs or via implementing
>> > > some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
>> > > to masked AVX512 operations.
>> > >
>> > > For AVX2 and
>> > >
>> > > int foo (unsigned *a, unsigned * __restrict b, int n)
>> > > {
>> > >   unsigned sum = 1;
>> > >   for (int i = 0; i < n; ++i)
>> > > b[i] += a[i];
>> > >   return sum;
>> > > }
>> > >
>> > > we get
>> > >
>> > > .L3:
>> > > vpmaskmovd  (%rsi,%rax), %ymm0, %ymm3
>> > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm1
>> > > addl$8, %edx
>> > > vpaddd  %ymm3, %ymm1, %ymm1
>> > > vpmaskmovd  %ymm1, %ymm0, (%rsi,%rax)
>> > > vmovd   %edx, %xmm1
>> > > vpsubd  %ymm15, %ymm2, %ymm0
>> > > addq$32, %rax
>> > > vpbroadcastd%xmm1, %ymm1
>> > > vpaddd  %ymm4, %ymm1, %ymm1
>> > > vpsubd  %ymm15, %ymm1, %ymm1
>> > > vpcmpgtd%ymm1, %ymm0, %ymm0
>> > > vptest  %ymm0, %ymm0
>> > > jne .L3
>> > >
>> > > for the fully masked loop body and for the masked epilogue
>> > > we see
>> > >
>> > > .L4:
>> > > vmovdqu (%rsi,%rax), %ymm3
>> > > vpaddd  (%rdi,%rax), %ymm3, %ymm0
>> > > vmovdqu %ymm0, (%rsi,%rax)
>> > > addq$32, %rax
>> > > cmpq%rax, %rcx
>> > > jne .L4
>> > > movl%edx, %eax
>> > > andl$-8, %eax
>> > > testb   $7, %dl
>> > > je  .L11
>> > > .L3:
>> > > subl%eax, %edx
>> > > vmovdqa .LC0(%rip), %ymm1
>> > > salq$2, %rax
>> > > vmovd   %edx, %xmm0
>> > > movl$-2147483648, %edx
>> > > addq%rax, %rsi
>> > > vmovd   %edx, %xmm15
>> > > vpbroadcastd%xmm0, %ymm0
>> > > vpbroadcastd%xmm15, %ymm15
>> > > vpsubd  %ymm15, %ymm1, %ymm1
>> > > vpsubd  %ymm15, %ymm0, %ymm0
>> > > vpcmpgtd%ymm1, %ymm0, %ymm0
>> > > vpmaskmovd  (%rsi), %ymm0, %ymm1
>> > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm2
>> > > vpaddd  %ymm2, %ymm1, %ymm1
>> > > vpmaskmovd  %ymm1, %ymm0, (%rsi)
>> > > .L11:
>> > > vzeroupper
>> > >
>> > > compared to
>> > >
>> > > .L3:
>> > > movl%edx, %r8d
>> > > subl%eax, %r8d
>> > > leal-1(%r8), %r9d
>> > > cmpl$2, %r9d
>> > > jbe .L6
>> > > leaq(%rcx,%rax,4), %r9
>> > > vmovdqu (%rdi,%rax,4), %xmm2
>> > > movl%r8d, %eax
>> > > andl$-4, %eax
>> > > vpaddd  (%r9), %xmm2, %xmm0
>> > > addl%eax, %esi
>> > > andl$3, %r8d
>> > > vmovdqu %xmm0, (%r9)
>> > > je  .L2
>> > > .L6:
>> > > movslq  %esi, %r8
>> > > leaq0(,%r8,4), %rax
>> > > movl(%rdi,%r8,4), %r8d
>> > > addl%r8d, (%rcx,%rax)
>> > > leal1(%rsi), %r8d
>> > > cmpl%r8d, %edx
>> > > jle .L2
>> > > addl$2, %esi
>> > > movl4(%rdi,%rax), %r8d
>> > > addl%r8d, 4(%rcx,%rax)
>> > > cmpl%esi, %edx
>> > > jle .L2
>> > > movl8(%rdi,%rax), %edx
>> > > addl%edx, 8(%rcx,%rax)
>> > > .L2:
>> > >
>> > > I'm giving this a little testing right now but will dig on why
>> > > I don't get masked loops when AVX512 is enabled.
>> >
>> > Ah, a simple thinko - rgroup_controls vectypes seem to be
>> > always VECTOR_BOOLEAN_TYPE_P and thus we can
>> > use expand_vec_cmp_expr_p.  The AVX512 fully masked
>> > loop then looks like
>> >
>> > .L3:
>> > vmovdqu32   (%rsi,%rax,4), %ymm2{%k1}
>> > vmovdqu32   (%rdi,%rax,4), %ymm1{%k1}
>> > vpaddd  %ymm2, %ymm1, %ymm0
>> > vmovdqu32   %ymm0, (%rsi,%rax,4){%k1}
>> > addq$8, %rax
>> > vpbroadcastd%eax, %ymm0
>> > vpaddd  %ymm4, %ymm0, %ymm0
>> > vpcmpud $6, %ymm0, %ymm3, %k1
>> > kortestb%k1, %k1
>> > jne .L3
>> >
>> > I guess for x86 i

Re: [PATCH 2/2] Backwards jump threader rewrite with ranger.

2021-07-15 Thread Aldy Hernandez via Gcc-patches
As mentioned in my previous email, these are some minor changes to the
previous revision.  All I'm changing here is the call into the solver
to use range_of_expr and range_of_stmt.  Everything else remains the
same.

Tested on x86-64 Linux.

On Mon, Jul 5, 2021 at 5:39 PM Aldy Hernandez  wrote:
>
> PING.
>
> Aldy
From 1774338ddd1f4718884e766aae2fc48b97110c5d Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Tue, 15 Jun 2021 12:32:51 +0200
Subject: [PATCH 3/5] Backwards jump threader rewrite with ranger.

This is a rewrite of the backwards threader with a ranger based solver.

The code is divided into two parts: the path solver in
gimple-range-path.*, and the path discovery bits in
tree-ssa-threadbackward.c.

The legacy code is still available with --param=threader-mode=legacy,
but will be removed shortly after.

gcc/ChangeLog:

	* Makefile.in (tree-ssa-loop-im.o-warn): New.
	* flag-types.h (enum threader_mode): New.
	* params.opt: Add entry for --param=threader-mode.
	* tree-ssa-threadbackward.c (THREADER_ITERATIVE_MODE): New.
	(class back_threader): New.
	(back_threader::back_threader): New.
	(back_threader::~back_threader): New.
	(back_threader::maybe_register_path): New.
	(back_threader::find_taken_edge): New.
	(back_threader::find_taken_edge_switch): New.
	(back_threader::find_taken_edge_cond): New.
	(back_threader::resolve_def): New.
	(back_threader::resolve_phi): New.
	(back_threader::find_paths_to_names): New.
	(back_threader::find_paths): New.
	(dump_path): New.
	(debug): New.
	(thread_jumps::find_jump_threads_backwards): Call ranger threader.
	(thread_jumps::find_jump_threads_backwards_with_ranger): New.
	(pass_thread_jumps::execute): Abstract out code...
	(try_thread_blocks): ...here.
	* tree-ssa-threadedge.c (jump_threader::thread_outgoing_edges):
	Abstract out threading candidate code to...
	(single_succ_to_potentially_threadable_block): ...here.
	* tree-ssa-threadedge.h (single_succ_to_potentially_threadable_block):
	New.
	* tree-ssa-threadupdate.c (register_jump_thread): Return boolean.
	* tree-ssa-threadupdate.h (class jump_thread_path_registry):
	Return bool from register_jump_thread.

libgomp/ChangeLog:

	* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for
	threader.
	* testsuite/libgomp.graphite/force-parallel-8.c: Same.

gcc/testsuite/ChangeLog:

	* g++.dg/debug/dwarf2/deallocator.C: Adjust for threader.
	* gcc.c-torture/compile/pr83510.c: Same.
	* gcc.dg/loop-unswitch-2.c: Same.
	* gcc.dg/old-style-asm-1.c: Same.
	* gcc.dg/pr68317.c: Same.
	* gcc.dg/pr97567-2.c: Same.
	* gcc.dg/predict-9.c: Same.
	* gcc.dg/shrink-wrap-loop.c: Same.
	* gcc.dg/sibcall-1.c: Same.
	* gcc.dg/tree-ssa/builtin-sprintf-3.c: Same.
	* gcc.dg/tree-ssa/pr21001.c: Same.
	* gcc.dg/tree-ssa/pr21294.c: Same.
	* gcc.dg/tree-ssa/pr21417.c: Same.
	* gcc.dg/tree-ssa/pr21458-2.c: Same.
	* gcc.dg/tree-ssa/pr21563.c: Same.
	* gcc.dg/tree-ssa/pr49039.c: Same.
	* gcc.dg/tree-ssa/pr61839_1.c: Same.
	* gcc.dg/tree-ssa/pr61839_3.c: Same.
	* gcc.dg/tree-ssa/pr77445-2.c: Same.
	* gcc.dg/tree-ssa/split-path-4.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
	* gcc.dg/tree-ssa/ssa-fre-48.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-11.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-12.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
	* gcc.dg/tree-ssa/vrp02.c: Same.
	* gcc.dg/tree-ssa/vrp03.c: Same.
	* gcc.dg/tree-ssa/vrp05.c: Same.
	* gcc.dg/tree-ssa/vrp06.c: Same.
	* gcc.dg/tree-ssa/vrp07.c: Same.
	* gcc.dg/tree-ssa/vrp09.c: Same.
	* gcc.dg/tree-ssa/vrp19.c: Same.
	* gcc.dg/tree-ssa/vrp20.c: Same.
	* gcc.dg/tree-ssa/vrp33.c: Same.
	* gcc.dg/uninit-pred-9_b.c: Same.
	* gcc.dg/vect/bb-slp-16.c: Same.
	* gcc.target/i386/avx2-vect-aggressive.c: Same.
	* gcc.dg/tree-ssa/ranger-threader-1.c: New test.
	* gcc.dg/tree-ssa/ranger-threader-2.c: New test.
	* gcc.dg/tree-ssa/ranger-threader-3.c: New test.
	* gcc.dg/tree-ssa/ranger-threader-4.c: New test.
	* gcc.dg/tree-ssa/ranger-threader-5.c: New test.
---
 gcc/Makefile.in   |   5 +
 gcc/flag-types.h  |   7 +
 gcc/params.opt|  17 +
 .../g++.dg/debug/dwarf2/deallocator.C |   3 +-
 gcc/testsuite/gcc.c-torture/compile/pr83510.c |  33 ++
 gcc/testsuite/gcc.dg/loop-unswitch-2.c|   2 +-
 gcc/testsuite/gcc.dg/old-style-asm-1.c|   5 +-
 gcc/testsuite/gcc.dg/pr68317.c|   4 +-
 gcc/testsuite/gcc.dg/pr97567-2.c  |   2 +-
 gcc/testsuite/gcc.dg/predict-9.c  |   4 +-
 gcc/testsuite/gcc.dg/shrink-wrap-loop.c   |  53 ++
 gcc/testsuite/gcc.dg/sibcall-1.c  |  10 +
 .../gcc.dg/tree-ssa/builtin-sprintf-3.c   |  25 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21001.c   |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/pr21294.c  

RFA: Libiberty: Fix stack exhaustion demangling corrupt Rust names

2021-07-15 Thread Nick Clifton via Gcc-patches

Hi Guys,

  Attached is a proposed patch to fix PR 99935 and 100968, both
  of which are stack exhaustion problems in libiberty's Rust
  demangler.  The patch adds a recursion limit along the lines
  of the one already in place for the C++ demangler.

  OK to apply ?

Cheers
  Nick
diff --git a/libiberty/rust-demangle.c b/libiberty/rust-demangle.c
index 6fd8f6a4db0..df09b7b8fdd 100644
--- a/libiberty/rust-demangle.c
+++ b/libiberty/rust-demangle.c
@@ -74,6 +74,12 @@ struct rust_demangler
   /* Rust mangling version, with legacy mangling being -1. */
   int version;
 
+  /* Recursion depth.  */
+  uint recursion;
+  /* Maximum number of times demangle_path may be called recursively.  */
+#define RUST_MAX_RECURSION_COUNT  1024
+#define RUST_NO_RECURSION_LIMIT   ((uint) -1)
+
   uint64_t bound_lifetime_depth;
 };
 
@@ -671,6 +677,15 @@ demangle_path (struct rust_demangler *rdm, int in_value)
   if (rdm->errored)
 return;
 
+  if (rdm->recursion != RUST_NO_RECURSION_LIMIT)
+{
+  ++ rdm->recursion;
+  if (rdm->recursion > RUST_MAX_RECURSION_COUNT)
+	/* FIXME: There ought to be a way to report
+	   that the recursion limit has been reached.  */
+	goto fail_return;
+}
+
   switch (tag = next (rdm))
 {
 case 'C':
@@ -688,10 +703,7 @@ demangle_path (struct rust_demangler *rdm, int in_value)
 case 'N':
   ns = next (rdm);
   if (!ISLOWER (ns) && !ISUPPER (ns))
-{
-  rdm->errored = 1;
-  return;
-}
+	goto fail_return;
 
   demangle_path (rdm, in_value);
 
@@ -776,9 +788,15 @@ demangle_path (struct rust_demangler *rdm, int in_value)
 }
   break;
 default:
-  rdm->errored = 1;
-  return;
+  goto fail_return;
 }
+  goto pass_return;
+
+ fail_return:
+  rdm->errored = 1;
+ pass_return:
+  if (rdm->recursion != RUST_NO_RECURSION_LIMIT)
+-- rdm->recursion;
 }
 
 static void
@@ -1317,6 +1338,7 @@ rust_demangle_callback (const char *mangled, int options,
   rdm.skipping_printing = 0;
   rdm.verbose = (options & DMGL_VERBOSE) != 0;
   rdm.version = 0;
+  rdm.recursion = (options & DMGL_NO_RECURSE_LIMIT) ? RUST_NO_RECURSION_LIMIT : 0;
   rdm.bound_lifetime_depth = 0;
 
   /* Rust symbols always start with _R (v0) or _ZN (legacy). */


Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Biener
On Thu, 15 Jul 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Thu, 15 Jul 2021, Hongtao Liu wrote:
> >
> >> On Thu, Jul 15, 2021 at 6:45 PM Richard Biener via Gcc-patches
> >>  wrote:
> >> >
> >> > On Thu, Jul 15, 2021 at 12:30 PM Richard Biener  
> >> > wrote:
> >> > >
> >> > > The following extends the existing loop masking support using
> >> > > SVE WHILE_ULT to x86 by providing an alternate way to produce the
> >> > > mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> >> > > you can now enable masked vectorized epilogues (=1) or fully
> >> > > masked vector loops (=2).
> >> > >
> >> > > What's missing is using a scalar IV for the loop control
> >> > > (but in principle AVX512 can use the mask here - just the patch
> >> > > doesn't seem to work for AVX512 yet for some reason - likely
> >> > > expand_vec_cond_expr_p doesn't work there).  What's also missing
> >> > > is providing more support for predicated operations in the case
> >> > > of reductions either via VEC_COND_EXPRs or via implementing
> >> > > some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
> >> > > to masked AVX512 operations.
> >> > >
> >> > > For AVX2 and
> >> > >
> >> > > int foo (unsigned *a, unsigned * __restrict b, int n)
> >> > > {
> >> > >   unsigned sum = 1;
> >> > >   for (int i = 0; i < n; ++i)
> >> > > b[i] += a[i];
> >> > >   return sum;
> >> > > }
> >> > >
> >> > > we get
> >> > >
> >> > > .L3:
> >> > > vpmaskmovd  (%rsi,%rax), %ymm0, %ymm3
> >> > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm1
> >> > > addl$8, %edx
> >> > > vpaddd  %ymm3, %ymm1, %ymm1
> >> > > vpmaskmovd  %ymm1, %ymm0, (%rsi,%rax)
> >> > > vmovd   %edx, %xmm1
> >> > > vpsubd  %ymm15, %ymm2, %ymm0
> >> > > addq$32, %rax
> >> > > vpbroadcastd%xmm1, %ymm1
> >> > > vpaddd  %ymm4, %ymm1, %ymm1
> >> > > vpsubd  %ymm15, %ymm1, %ymm1
> >> > > vpcmpgtd%ymm1, %ymm0, %ymm0
> >> > > vptest  %ymm0, %ymm0
> >> > > jne .L3
> >> > >
> >> > > for the fully masked loop body and for the masked epilogue
> >> > > we see
> >> > >
> >> > > .L4:
> >> > > vmovdqu (%rsi,%rax), %ymm3
> >> > > vpaddd  (%rdi,%rax), %ymm3, %ymm0
> >> > > vmovdqu %ymm0, (%rsi,%rax)
> >> > > addq$32, %rax
> >> > > cmpq%rax, %rcx
> >> > > jne .L4
> >> > > movl%edx, %eax
> >> > > andl$-8, %eax
> >> > > testb   $7, %dl
> >> > > je  .L11
> >> > > .L3:
> >> > > subl%eax, %edx
> >> > > vmovdqa .LC0(%rip), %ymm1
> >> > > salq$2, %rax
> >> > > vmovd   %edx, %xmm0
> >> > > movl$-2147483648, %edx
> >> > > addq%rax, %rsi
> >> > > vmovd   %edx, %xmm15
> >> > > vpbroadcastd%xmm0, %ymm0
> >> > > vpbroadcastd%xmm15, %ymm15
> >> > > vpsubd  %ymm15, %ymm1, %ymm1
> >> > > vpsubd  %ymm15, %ymm0, %ymm0
> >> > > vpcmpgtd%ymm1, %ymm0, %ymm0
> >> > > vpmaskmovd  (%rsi), %ymm0, %ymm1
> >> > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm2
> >> > > vpaddd  %ymm2, %ymm1, %ymm1
> >> > > vpmaskmovd  %ymm1, %ymm0, (%rsi)
> >> > > .L11:
> >> > > vzeroupper
> >> > >
> >> > > compared to
> >> > >
> >> > > .L3:
> >> > > movl%edx, %r8d
> >> > > subl%eax, %r8d
> >> > > leal-1(%r8), %r9d
> >> > > cmpl$2, %r9d
> >> > > jbe .L6
> >> > > leaq(%rcx,%rax,4), %r9
> >> > > vmovdqu (%rdi,%rax,4), %xmm2
> >> > > movl%r8d, %eax
> >> > > andl$-4, %eax
> >> > > vpaddd  (%r9), %xmm2, %xmm0
> >> > > addl%eax, %esi
> >> > > andl$3, %r8d
> >> > > vmovdqu %xmm0, (%r9)
> >> > > je  .L2
> >> > > .L6:
> >> > > movslq  %esi, %r8
> >> > > leaq0(,%r8,4), %rax
> >> > > movl(%rdi,%r8,4), %r8d
> >> > > addl%r8d, (%rcx,%rax)
> >> > > leal1(%rsi), %r8d
> >> > > cmpl%r8d, %edx
> >> > > jle .L2
> >> > > addl$2, %esi
> >> > > movl4(%rdi,%rax), %r8d
> >> > > addl%r8d, 4(%rcx,%rax)
> >> > > cmpl%esi, %edx
> >> > > jle .L2
> >> > > movl8(%rdi,%rax), %edx
> >> > > addl%edx, 8(%rcx,%rax)
> >> > > .L2:
> >> > >
> >> > > I'm giving this a little testing right now but will dig on why
> >> > > I don't get masked loops when AVX512 is enabled.
> >> >
> >> > Ah, a simple thinko - rgroup_controls vectypes seem to be
> >> > always VECTOR_BOOLEAN_TYPE_P and thus we can
> >> > use expand_vec_cmp_expr_p.  The AVX512 fully masked
> >> > loop then looks like
> >> >
> >> > .L3:
> >> > vmovdqu32   (%rsi,%rax,4), %ymm2{%k1}
> >> > vmovdqu32   (%rdi,%rax,4), %ymm1{%k1}
> >> > vpaddd  %ym

[PATCH] Change the type of return value of profile_count::value to uint64_t

2021-07-15 Thread Martin Jambor
Hi,

The field in which profile_count holds the count has 61 bits, but the
getter method returns it only as a 32-bit number.  The getter is (and
should be) used only for dumping, but even dumps are better when they do
not lie.

The patch has passed bootstrap and testing on x86_64-linux and Honza has
approved it so I will commit it shortly.

Martin


gcc/ChangeLog:

2021-07-13  Martin Jambor  

* profile-count.h (profile_count::value): Change the return type to
uint64_t.
* gimple-pretty-print.c (dump_gimple_bb_header): Adjust print
statement.
* tree-cfg.c (dump_function_to_file): Likewise.
---
 gcc/gimple-pretty-print.c | 2 +-
 gcc/profile-count.h   | 2 +-
 gcc/tree-cfg.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 39c5775e2cb..d6e63d6e57f 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -2831,7 +2831,7 @@ dump_gimple_bb_header (FILE *outf, basic_block bb, int 
indent,
  if (bb->loop_father->header == bb)
fprintf (outf, ",loop_header(%d)", bb->loop_father->num);
  if (bb->count.initialized_p ())
-   fprintf (outf, ",%s(%d)",
+   fprintf (outf, ",%s(%" PRIu64 ")",
 profile_quality_as_string (bb->count.quality ()),
 bb->count.value ());
  fprintf (outf, "):\n");
diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index f2b1e3a6525..c7a45ac5ee3 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -804,7 +804,7 @@ public:
 }
 
   /* Get the value of the count.  */
-  uint32_t value () const { return m_val; }
+  uint64_t value () const { return m_val; }
 
   /* Get the quality of the count.  */
   enum profile_quality quality () const { return m_quality; }
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index c73e1cbdda6..2ed191f9a47 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -8081,7 +8081,7 @@ dump_function_to_file (tree fndecl, FILE *file, 
dump_flags_t flags)
{
  basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
  if (bb->count.initialized_p ())
-   fprintf (file, ",%s(%d)",
+   fprintf (file, ",%s(%" PRIu64 ")",
 profile_quality_as_string (bb->count.quality ()),
 bb->count.value ());
  fprintf (file, ")\n%s (", function_name (fun));
-- 
2.32.0



Re: [PATCH] Support reduction def re-use for epilogue with different vector size

2021-07-15 Thread Christophe Lyon via Gcc-patches
On Thu, Jul 15, 2021 at 2:34 PM Richard Biener  wrote:

> On Thu, 15 Jul 2021, Christophe Lyon wrote:
>
> > Hi,
> >
> >
> >
> > On Tue, Jul 13, 2021 at 2:09 PM Richard Biener 
> wrote:
> >
> > > The following adds support for re-using the vector reduction def
> > > from the main loop in vectorized epilogue loops on architectures
> > > which use different vector sizes for the epilogue.  That's only
> > > x86 as far as I am aware.
> > >
> > > vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> > > regtest in progress.
> > >
> > > There's costing issues on x86 which usually prevent vectorizing
> > > an epilogue with a reduction, at least for loops that only
> > > have a reduction - it could be mitigated by not accounting for
> > > the epilogue there if we can compute that we can re-use the
> > > main loops cost.
> > >
> > > Richard - did I figure the correct place to adjust?  I guess
> > > adjusting accumulator->reduc_input in vect_transform_cycle_phi
> > > for re-use by the skip code in vect_create_epilog_for_reduction
> > > is a bit awkward but at least we're conciously doing
> > > vect_create_epilog_for_reduction last (via vectorizing live
> > > operations).
> > >
> > > OK in the unlikely case all testing succeeds (I also want to
> > > run it through SPEC with/without -fno-vect-cost-model which
> > > will take some time)?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2021-07-13  Richard Biener  
> > >
> > > * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> > > vector types where the old vector type has a multiple of
> > > the new vector type elements.
> > > (vect_create_partial_epilog): New function, split out from...
> > > (vect_create_epilog_for_reduction): ... here.
> > > (vect_transform_cycle_phi): Reduce the re-used accumulator
> > > to the new vector type.
> > >
> > > * gcc.target/i386/vect-reduc-1.c: New testcase.
> > >
> >
> > This patch is causing regressions on aarch64:
> >  FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
> > FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
> > FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
> > Excess errors:
> > /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
> > 'PHI' argument 1
> > vector(2) unsigned int
> > vector(2) int
> > _91 = PHI <_90(17), _83(11)>
> > during GIMPLE pass: vect
> > dump file: ./pr92324-4.c.167t.vect
> > /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
> > verify_gimple failed
> > 0xe6438e verify_gimple_in_cfg(function*, bool)
> > /gcc/tree-cfg.c:5535
> > 0xd13902 execute_function_todo
> > /gcc/passes.c:2042
> > 0xd142a5 execute_todo
> > /gcc/passes.c:2096
> >
> > FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler
> fminnmv
> > FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler
> fmaxnmv
>
> What exact options do you pass to cc1 to get this?  Can you track this
> in a PR please?
>
> Thanks,
> Richard.
>
>
Sure, I filed PR 101462

Christophe


> > Thanks,
> >
> > Christophe
> >
> >
> >
> > > ---
> > >  gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
> > >  gcc/tree-vect-loop.c | 223 ---
> > >  2 files changed, 155 insertions(+), 85 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > > new file mode 100644
> > > index 000..9ee9ba4e736
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > > @@ -0,0 +1,17 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> > > +
> > > +#define N 32
> > > +int foo (int *a, int n)
> > > +{
> > > +  int sum = 1;
> > > +  for (int i = 0; i < 8*N + 4; ++i)
> > > +sum += a[i];
> > > +  return sum;
> > > +}
> > > +
> > > +/* The reduction epilog should be vectorized and the accumulator
> > > +   re-used.  */
> > > +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> > > +/* { dg-final { scan-assembler-times "padd" 5 } } */
> > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > > index 8c27d75f889..98e2a845629 100644
> > > --- a/gcc/tree-vect-loop.c
> > > +++ b/gcc/tree-vect-loop.c
> > > @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
> > >   ones as well.  */
> > >tree vectype = STMT_VINFO_VECTYPE (reduc_info);
> > >tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> > > -  if (!useless_type_conversion_p (old_vectype, vectype))
> > > +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> > > +   TYPE_VECTOR_SUBPARTS (vectype)))
> > >  return false;
> > >
> > >/* Non-SLP reductions might apply an adjustment after

[PATCH, committed] rs6000: Don't let swaps pass break multiply low-part (PR101129)

2021-07-15 Thread Bill Schmidt via Gcc-patches

Hi,

Segher preapproved this patch in https://gcc.gnu.org/PR101129. It 
differs slightly from what was posted there, needing an additional test 
to ensure the insn is a SET.  The patch also includes the test case 
provided by the OP.  Bootstrap and regtest succeeded on P9 little-endian.


This bug has been around a long time, so the fix should be backported to 
all open releases.  Is this okay after some burn-in time?


Thanks!
Bill

rs6000: Don't let swaps pass break multiply low-part (PR101129)

2021-07-15  Bill Schmidt  

gcc/
* config/rs6000/rs6000-p8swap.c (has_part_mult): New.
(rs6000_analyze_swaps): Insns containing a subreg of a mult are
not swappable.

gcc/testsuite/
* gcc.target/powerpc/pr101129.c: New.

diff --git a/gcc/config/rs6000/rs6000-p8swap.c b/gcc/config/rs6000/rs6000-p8swap.c
index 21cbcb2e28a..6b559aa5061 100644
--- a/gcc/config/rs6000/rs6000-p8swap.c
+++ b/gcc/config/rs6000/rs6000-p8swap.c
@@ -1523,6 +1523,22 @@ replace_swap_with_copy (swap_web_entry *insn_entry, unsigned i)
   insn->set_deleted ();
 }
 
+/* INSN is known to contain a SUBREG, which we can normally handle,
+   but if the SUBREG itself contains a MULT then we need to leave it alone
+   to avoid turning a mult_hipart into a mult_lopart, for example.  */
+static bool
+has_part_mult (rtx_insn *insn)
+{
+  rtx body = PATTERN (insn);
+  if (GET_CODE (body) != SET)
+    return false;
+  rtx src = SET_SRC (body);
+  if (GET_CODE (src) != SUBREG)
+    return false;
+  rtx inner = XEXP (src, 0);
+  return (GET_CODE (inner) == MULT);
+}
+
 /* Make NEW_MEM_EXP's attributes and flags resemble those of
ORIGINAL_MEM_EXP.  */
 static void
@@ -2501,6 +2517,9 @@ rs6000_analyze_swaps (function *fun)
insn_entry[uid].is_swappable = 0;
  else if (special != SH_NONE)
insn_entry[uid].special_handling = special;
+ else if (insn_entry[uid].contains_subreg
+  && has_part_mult (insn))
+   insn_entry[uid].is_swappable = 0;
  else if (insn_entry[uid].contains_subreg)
insn_entry[uid].special_handling = SH_SUBREG;
}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr101129.c b/gcc/testsuite/gcc.target/powerpc/pr101129.c
new file mode 100644
index 000..1abc12480e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr101129.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-mdejagnu-cpu=power8 -O " } */
+
+/* PR101129: The swaps pass was turning a mult-lopart into a mult-hipart.
+   Make sure we aren't doing that anymore.  */
+
+typedef unsigned char u8;
+typedef unsigned char __attribute__((__vector_size__ (8))) U;
+typedef unsigned char __attribute__((__vector_size__ (16))) V;
+typedef unsigned int u32;
+typedef unsigned long long u64;
+typedef __int128 u128;
+
+u8 g;
+U u;
+
+void
+foo0 (u32 u32_0, U *ret)
+{
+  u128 u128_2 = u32_0 * (u128)((V){ 5 } > (u32_0 & 4));
+  u64 u64_r = u128_2 >> 64;
+  u8 u8_r = u64_r + g;
+  *ret = u + u8_r;
+}
+
+int
+main (void)
+{
+  U x;
+  foo0 (7, &x);
+  for (unsigned i = 0; i < sizeof (x); i++)
+if (x[i] != 0) __builtin_abort();
+  return 0;
+}
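As an aside, the low-part/high-part distinction the fix preserves can be sketched in plain code. This is an illustration only, not part of the patch; the function names are ours, and `unsigned __int128` is a GCC extension available on 64-bit targets.

```cpp
#include <cstdint>

// Low part vs. high part of a widening 64x64->128 multiply: the swaps
// bug effectively exchanged which half of the product was taken.
std::uint64_t mul_lopart (std::uint64_t a, std::uint64_t b)
{
  return (std::uint64_t) ((unsigned __int128) a * b);          // bits 0..63
}

std::uint64_t mul_hipart (std::uint64_t a, std::uint64_t b)
{
  return (std::uint64_t) (((unsigned __int128) a * b) >> 64);  // bits 64..127
}
```

For `a = 2^64 - 1` and `b = 2`, the low part wraps to `0xfffffffffffffffe` while the high part is `1`, so confusing the two silently corrupts results.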



Re: [PATCH v2] docs: Add 'S' to Machine Constraints for RISC-V

2021-07-15 Thread Palmer Dabbelt

On Sun, 11 Jul 2021 21:29:13 PDT (-0700), kito.ch...@sifive.com wrote:

It was undocumented before, but it may be used in the Linux kernel to resolve
a code model issue, so the LLVM community suggested we document it, making it
a supported/documented/non-internal machine constraint.

gcc/ChangeLog:

PR target/101275
* config/riscv/constraints.md ("S"): Update description and remove
@internal.
* doc/md.texi (Machine Constraints): Document the 'S' constraints
for RISC-V.
---
 gcc/config/riscv/constraints.md | 3 +--
 gcc/doc/md.texi | 3 +++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 8c15c6c0486..c87d5b796a5 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -67,8 +67,7 @@ (define_memory_constraint "A"
(match_test "GET_CODE(XEXP(op,0)) == REG")))

 (define_constraint "S"
-  "@internal
-   A constant call address."
+  "A constraint that matches an absolute symbolic address."
   (match_operand 0 "absolute_symbolic_operand"))

 (define_constraint "U"
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844cc..2d120da96cf 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3536,6 +3536,9 @@ A 5-bit unsigned immediate for CSR access instructions.
 @item A
 An address that is held in a general-purpose register.

+@item S
+A constraint that matches an absolute symbolic address.
+
 @end table

 @item RX---@file{config/rx/constraints.md}


Reviewed-by: Palmer Dabbelt 


[committed] libstdc++: Add noexcept to __replacement_assert [PR101429]

2021-07-15 Thread Jonathan Wakely via Gcc-patches
This results in slightly smaller code when assertions are enabled when
either using Clang (because it adds code to call std::terminate when
potentially-throwing functions are called in a noexcept function) or a
freestanding or non-verbose build (because it doesn't use printf).
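The pattern can be sketched outside of libstdc++ as follows. The macro and function names here are illustrative, not the real `__glibcxx_assert_impl`/`__replacement_assert`; `__builtin_expect` and `__builtin_trap` are GCC built-ins. Marking the noreturn handler `noexcept` is what lets a compiler drop exception-propagation code around calls to it.

```cpp
#include <cstdio>
#include <cstdlib>

// Verbose handler: print the failure location, then abort.  noexcept means
// callers need no unwinding support for this call.
[[noreturn]] inline void
replacement_assert (const char* file, int line,
                    const char* func, const char* cond) noexcept
{
  std::printf ("%s:%d: %s: Assertion '%s' failed.\n", file, line, func, cond);
  std::abort ();
}

#define MY_VERBOSE 1
#if MY_VERBOSE
# define my_assert(cond) \
  do { if (__builtin_expect (!bool (cond), false)) \
         replacement_assert (__FILE__, __LINE__, __func__, #cond); } while (0)
#else  // freestanding / non-verbose: no printf dependency, just trap
# define my_assert(cond) \
  do { if (__builtin_expect (!bool (cond), false)) __builtin_trap (); } while (0)
#endif

int checked_div (int a, int b)
{
  my_assert (b != 0);   // passes for nonzero b, terminates otherwise
  return a / b;
}
```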

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101429
* include/bits/c++config (__replacement_assert): Add noexcept.
[!_GLIBCXX_VERBOSE] (__glibcxx_assert_impl): Use __builtin_trap
instead of __replacement_assert.

Tested powerpc64le-linux. Committed to trunk.

commit 1f7182d68c24985dace2a94422c671ff987c262c
Author: Jonathan Wakely 
Date:   Wed Jul 14 12:25:11 2021

libstdc++: Add noexcept to __replacement_assert [PR101429]

This results in slightly smaller code when assertions are enabled when
either using Clang (because it adds code to call std::terminate when
potentially-throwing functions are called in a noexcept function) or a
freestanding or non-verbose build (because it doesn't use printf).

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101429
* include/bits/c++config (__replacement_assert): Add noexcept.
[!_GLIBCXX_VERBOSE] (__glibcxx_assert_impl): Use __builtin_trap
instead of __replacement_assert.

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 9314117aed8..69ace386dd7 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -500,6 +500,7 @@ namespace std
 // Assert.
 #if defined(_GLIBCXX_ASSERTIONS) \
   || defined(_GLIBCXX_PARALLEL) || defined(_GLIBCXX_PARALLEL_ASSERTIONS)
+# if _GLIBCXX_HOSTED && _GLIBCXX_VERBOSE
 namespace std
 {
   // Avoid the use of assert, because we're trying to keep the 
@@ -508,6 +509,7 @@ namespace std
   inline void
   __replacement_assert(const char* __file, int __line,
   const char* __function, const char* __condition)
+  _GLIBCXX_NOEXCEPT
   {
 __builtin_printf("%s:%d: %s: Assertion '%s' failed.\n", __file, __line,
 __function, __condition);
@@ -517,10 +519,18 @@ namespace std
 #define __glibcxx_assert_impl(_Condition) \
   if (__builtin_expect(!bool(_Condition), false)) \
   {   \
-__glibcxx_constexpr_assert(_Condition);   \
+__glibcxx_constexpr_assert(false);\
 std::__replacement_assert(__FILE__, __LINE__, __PRETTY_FUNCTION__, \
  #_Condition);\
   }
+# else // ! VERBOSE
+# define __glibcxx_assert_impl(_Condition) \
+  if (__builtin_expect(!bool(_Condition), false))  \
+  {\
+__glibcxx_constexpr_assert(false); \
+__builtin_abort(); \
+  }
+#endif
 #endif
 
 #if defined(_GLIBCXX_ASSERTIONS)


[committed] libstdc++: Fix std::get for std::tuple [PR101427]

2021-07-15 Thread Jonathan Wakely via Gcc-patches
The std::get functions relied on deduction failing if more than one
base class existed for the type T.  However the implementation of Core
DR 2303 (in r11-4693) made deduction succeed (and select the
more-derived base class).

This rewrites the implementation of std::get to explicitly check for
more than one occurrence of T in the tuple elements, making it
ill-formed again. Additionally, the large wall of overload resolution
errors described in PR c++/101460 is avoided by making std::get use
__get_helper directly instead of calling std::get, and by adding a
deleted overload of __get_helper for out-of-range N.
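From the caller's side, the requirement the rewrite enforces looks like this. A minimal sketch with our own demo function (the commented-out line shows the case the fix makes ill-formed again):

```cpp
#include <tuple>

// get-by-type is only well-formed when the requested type occurs exactly
// once among the tuple's element types.
int unique_get_demo ()
{
  std::tuple<int, double> t{42, 3.5};
  return std::get<int> (t);   // OK: exactly one int element
  // std::get<int> (std::tuple<int, int>{}) must be rejected at compile
  // time, rather than quietly picking one of the two int elements.
}
```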

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101427
* include/std/tuple (tuple_element): Improve static_assert text.
(__get_helper): Add deleted overload.
(get(tuple&&), get(const tuple&&)): Use
__get_helper directly.
(__get_helper2): Remove.
(__find_uniq_type_in_pack): New constexpr helper function.
(get): Use __find_uniq_type_in_pack and __get_helper instead
of __get_helper2.
* testsuite/20_util/tuple/element_access/get_neg.cc: Adjust
expected errors.
* testsuite/20_util/tuple/element_access/101427.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 17855eed7fc76b2cee7fbbc26f84d3c8b99be13c
Author: Jonathan Wakely 
Date:   Wed Jul 14 20:14:14 2021

libstdc++: Fix std::get for std::tuple [PR101427]

The std::get functions relied on deduction failing if more than one
base class existed for the type T.  However the implementation of Core
DR 2303 (in r11-4693) made deduction succeed (and select the
more-derived base class).

This rewrites the implementation of std::get to explicitly check for
more than one occurrence of T in the tuple elements, making it
ill-formed again. Additionally, the large wall of overload resolution
errors described in PR c++/101460 is avoided by making std::get use
__get_helper directly instead of calling std::get, and by adding a
deleted overload of __get_helper for out-of-range N.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101427
* include/std/tuple (tuple_element): Improve static_assert text.
(__get_helper): Add deleted overload.
(get(tuple&&), get(const tuple&&)): Use
__get_helper directly.
(__get_helper2): Remove.
(__find_uniq_type_in_pack): New constexpr helper function.
(get): Use __find_uniq_type_in_pack and __get_helper instead
of __get_helper2.
* testsuite/20_util/tuple/element_access/get_neg.cc: Adjust
expected errors.
* testsuite/20_util/tuple/element_access/101427.cc: New test.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 2d562f8da77..6953f8715d7 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -1358,7 +1358,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct tuple_element<__i, tuple<>>
 {
   static_assert(__i < tuple_size<tuple<>>::value,
- "tuple index is in range");
+ "tuple index must be in range");
 };
 
   template
@@ -1371,6 +1371,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __get_helper(const _Tuple_impl<__i, _Head, _Tail...>& __t) noexcept
 { return _Tuple_impl<__i, _Head, _Tail...>::_M_head(__t); }
 
+  // Deleted overload to improve diagnostics for invalid indices
+  template<size_t __i, typename... _Types>
+__enable_if_t<(__i >= sizeof...(_Types))>
+__get_helper(const tuple<_Types...>&) = delete;
+
   /// Return a reference to the ith element of a tuple.
   template<size_t __i, typename... _Elements>
 constexpr __tuple_element_t<__i, tuple<_Elements...>>&
@@ -1389,7 +1394,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 get(tuple<_Elements...>&& __t) noexcept
 {
   typedef __tuple_element_t<__i, tuple<_Elements...>> __element_type;
-  return std::forward<__element_type&&>(std::get<__i>(__t));
+  return std::forward<__element_type>(std::__get_helper<__i>(__t));
 }
 
   /// Return a const rvalue reference to the ith element of a const tuple rvalue.
@@ -1398,47 +1403,79 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 get(const tuple<_Elements...>&& __t) noexcept
 {
   typedef __tuple_element_t<__i, tuple<_Elements...>> __element_type;
-  return std::forward<const __element_type&&>(std::get<__i>(__t));
+  return std::forward<const __element_type>(std::__get_helper<__i>(__t));
 }
 
 #if __cplusplus >= 201402L
 
 #define __cpp_lib_tuples_by_type 201304
 
-  template<typename _Head, size_t __i, typename... _Tail>
-constexpr _Head&
-__get_helper2(_Tuple_impl<__i, _Head, _Tail...>& __t) noexcept
-{ return _Tuple_impl<__i, _Head, _Tail...>::_M_head(__t); }
-
-  template<typename _Head, size_t __i, typename... _Tail>
-constexpr const _Head&
-__get_helper2(const _Tuple_impl<__i, _Head, _Tail...>& __t) noexcept
-{ return _Tuple_impl<__i, _Head, _Tail...>::_M_head(__t); }
+  // Return the index of _Tp in _Types, if it occurs exactly once.
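The hunk is truncated right after the comment introducing the new helper, so as an illustration only (our own names, not the committed libstdc++ code), a "find the index of a type that occurs exactly once" helper can be written as a C++14 constexpr function:

```cpp
#include <cstddef>
#include <type_traits>

// Return the index of T in Ts... if it occurs exactly once; otherwise
// return sizeof...(Ts) as an "invalid" sentinel.  The leading false entry
// keeps the array non-empty even for an empty pack.
template<typename T, typename... Ts>
constexpr std::size_t
find_uniq_type_in_pack ()
{
  constexpr bool matches[] = { false, std::is_same<T, Ts>::value... };
  std::size_t idx = sizeof...(Ts);
  for (std::size_t i = 0; i < sizeof...(Ts); ++i)
    if (matches[i + 1])
      {
        if (idx != sizeof...(Ts))
          return sizeof...(Ts);   // second occurrence: not unique
        idx = i;
      }
  return idx;
}

static_assert (find_uniq_type_in_pack<int, char, int, double> () == 1, "");
static_assert (find_uniq_type_in_pack<int, int, int> () == 2, "");
```

A get-by-type overload can then `static_assert` that the computed index is valid, producing one clear diagnostic instead of a wall of overload-resolution errors.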

Re: [committed] libstdc++: Fix std::get for std::tuple [PR101427]

2021-07-15 Thread Jonathan Wakely via Gcc-patches

On 15/07/21 16:26 +0100, Jonathan Wakely wrote:

The std::get functions relied on deduction failing if more than one
base class existed for the type T.  However the implementation of Core
DR 2303 (in r11-4693) made deduction succeed (and select the
more-derived base class).

This rewrites the implementation of std::get to explicitly check for
more than one occurrence of T in the tuple elements, making it
ill-formed again. Additionally, the large wall of overload resolution
errors described in PR c++/101460 is avoided by making std::get use
__get_helper directly instead of calling std::get, and by adding a
deleted overload of __get_helper for out-of-range N.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101427
* include/std/tuple (tuple_element): Improve static_assert text.
(__get_helper): Add deleted overload.
(get(tuple&&), get(const tuple&&)): Use
__get_helper directly.
(__get_helper2): Remove.
(__find_uniq_type_in_pack): New constexpr helper function.
(get): Use __find_uniq_type_in_pack and __get_helper instead
of __get_helper2.
* testsuite/20_util/tuple/element_access/get_neg.cc: Adjust
expected errors.
* testsuite/20_util/tuple/element_access/101427.cc: New test.

Tested powerpc64le-linux. Committed to trunk.


This should be backported to gcc-11 in time for 11.2 as well. If you
see any problems with it please let me know ASAP.



Re: [PATCH, committed] rs6000: Don't let swaps pass break multiply low-part (PR101129)

2021-07-15 Thread David Edelsohn via Gcc-patches
On Thu, Jul 15, 2021 at 11:25 AM Bill Schmidt  wrote:
>
> Hi,
>
> Segher preapproved this patch in https://gcc.gnu.org/PR101129.  It differs 
> slightly from what was posted there, needing an additional test to ensure the 
> insn is a SET.  The patch also includes the test case provided by the OP.  
> Bootstrap and regtest succeeded on P9 little-endian.
>
> This bug has been around a long time, so the fix should be backported to all 
> open releases.  Is this okay after some burn-in time?
>
> Thanks!
> Bill
>
> rs6000: Don't let swaps pass break multiply low-part (PR101129)
>
> 2021-07-15  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-p8swap.c (has_part_mult): New.
> (rs6000_analyze_swaps): Insns containing a subreg of a mult are
> not swappable.
>
> gcc/testsuite/
> * gcc.target/powerpc/pr101129.c: New.

Thanks for fixing this so quickly.

Okay everywhere.

Thanks, David


Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Biener
On Thu, 15 Jul 2021, Richard Biener wrote:

> On Thu, 15 Jul 2021, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > On Thu, 15 Jul 2021, Hongtao Liu wrote:
> > >
> > >> On Thu, Jul 15, 2021 at 6:45 PM Richard Biener via Gcc-patches
> > >>  wrote:
> > >> >
> > >> > On Thu, Jul 15, 2021 at 12:30 PM Richard Biener  
> > >> > wrote:
> > >> > >
> > >> > > The following extends the existing loop masking support using
> > >> > > SVE WHILE_ULT to x86 by providing an alternate way to produce the
> > >> > > mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> > >> > > you can now enable masked vectorized epilogues (=1) or fully
> > >> > > masked vector loops (=2).
> > >> > >
> > >> > > What's missing is using a scalar IV for the loop control
> > >> > > (but in principle AVX512 can use the mask here - just the patch
> > >> > > doesn't seem to work for AVX512 yet for some reason - likely
> > >> > > expand_vec_cond_expr_p doesn't work there).  What's also missing
> > >> > > is providing more support for predicated operations in the case
> > >> > > of reductions either via VEC_COND_EXPRs or via implementing
> > >> > > some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
> > >> > > to masked AVX512 operations.
> > >> > >
> > >> > > For AVX2 and
> > >> > >
> > >> > > int foo (unsigned *a, unsigned * __restrict b, int n)
> > >> > > {
> > >> > >   unsigned sum = 1;
> > >> > >   for (int i = 0; i < n; ++i)
> > >> > > b[i] += a[i];
> > >> > >   return sum;
> > >> > > }
> > >> > >
> > >> > > we get
> > >> > >
> > >> > > .L3:
> > >> > > vpmaskmovd  (%rsi,%rax), %ymm0, %ymm3
> > >> > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm1
> > >> > > addl$8, %edx
> > >> > > vpaddd  %ymm3, %ymm1, %ymm1
> > >> > > vpmaskmovd  %ymm1, %ymm0, (%rsi,%rax)
> > >> > > vmovd   %edx, %xmm1
> > >> > > vpsubd  %ymm15, %ymm2, %ymm0
> > >> > > addq$32, %rax
> > >> > > vpbroadcastd%xmm1, %ymm1
> > >> > > vpaddd  %ymm4, %ymm1, %ymm1
> > >> > > vpsubd  %ymm15, %ymm1, %ymm1
> > >> > > vpcmpgtd%ymm1, %ymm0, %ymm0
> > >> > > vptest  %ymm0, %ymm0
> > >> > > jne .L3
> > >> > >
> > >> > > for the fully masked loop body and for the masked epilogue
> > >> > > we see
> > >> > >
> > >> > > .L4:
> > >> > > vmovdqu (%rsi,%rax), %ymm3
> > >> > > vpaddd  (%rdi,%rax), %ymm3, %ymm0
> > >> > > vmovdqu %ymm0, (%rsi,%rax)
> > >> > > addq$32, %rax
> > >> > > cmpq%rax, %rcx
> > >> > > jne .L4
> > >> > > movl%edx, %eax
> > >> > > andl$-8, %eax
> > >> > > testb   $7, %dl
> > >> > > je  .L11
> > >> > > .L3:
> > >> > > subl%eax, %edx
> > >> > > vmovdqa .LC0(%rip), %ymm1
> > >> > > salq$2, %rax
> > >> > > vmovd   %edx, %xmm0
> > >> > > movl$-2147483648, %edx
> > >> > > addq%rax, %rsi
> > >> > > vmovd   %edx, %xmm15
> > >> > > vpbroadcastd%xmm0, %ymm0
> > >> > > vpbroadcastd%xmm15, %ymm15
> > >> > > vpsubd  %ymm15, %ymm1, %ymm1
> > >> > > vpsubd  %ymm15, %ymm0, %ymm0
> > >> > > vpcmpgtd%ymm1, %ymm0, %ymm0
> > >> > > vpmaskmovd  (%rsi), %ymm0, %ymm1
> > >> > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm2
> > >> > > vpaddd  %ymm2, %ymm1, %ymm1
> > >> > > vpmaskmovd  %ymm1, %ymm0, (%rsi)
> > >> > > .L11:
> > >> > > vzeroupper
> > >> > >
> > >> > > compared to
> > >> > >
> > >> > > .L3:
> > >> > > movl%edx, %r8d
> > >> > > subl%eax, %r8d
> > >> > > leal-1(%r8), %r9d
> > >> > > cmpl$2, %r9d
> > >> > > jbe .L6
> > >> > > leaq(%rcx,%rax,4), %r9
> > >> > > vmovdqu (%rdi,%rax,4), %xmm2
> > >> > > movl%r8d, %eax
> > >> > > andl$-4, %eax
> > >> > > vpaddd  (%r9), %xmm2, %xmm0
> > >> > > addl%eax, %esi
> > >> > > andl$3, %r8d
> > >> > > vmovdqu %xmm0, (%r9)
> > >> > > je  .L2
> > >> > > .L6:
> > >> > > movslq  %esi, %r8
> > >> > > leaq0(,%r8,4), %rax
> > >> > > movl(%rdi,%r8,4), %r8d
> > >> > > addl%r8d, (%rcx,%rax)
> > >> > > leal1(%rsi), %r8d
> > >> > > cmpl%r8d, %edx
> > >> > > jle .L2
> > >> > > addl$2, %esi
> > >> > > movl4(%rdi,%rax), %r8d
> > >> > > addl%r8d, 4(%rcx,%rax)
> > >> > > cmpl%esi, %edx
> > >> > > jle .L2
> > >> > > movl8(%rdi,%rax), %edx
> > >> > > addl%edx, 8(%rcx,%rax)
> > >> > > .L2:
> > >> > >
> > >> > > I'm giving this a little testing right now but will dig on why
> > >> > > I don't get masked loops when AVX512 is enabled.
> > >> >
> > >> > Ah, a simple thinko - rgroup_controls vectypes seem to be
> 
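In scalar form, the predication scheme discussed in this thread amounts to the sketch below. This is our own illustration, not the patch: each lane's predicate is "lane index < trip count", which is what WHILE_ULT (or the VEC_COND_EXPR compare sequence on x86) materialises as a vector mask, and every load/add/store is guarded by it so the tail needs no scalar epilogue.

```cpp
#include <cstddef>

// Scalar simulation of a fully-masked vector loop with an assumed
// vectorization factor of 8 (matching the AVX2 example above).
void masked_add (unsigned *b, const unsigned *a, std::size_t n)
{
  const std::size_t vf = 8;                    // assumed vector factor
  for (std::size_t base = 0; base < n; base += vf)
    for (std::size_t lane = 0; lane < vf; ++lane)
      {
        bool mask = base + lane < n;           // per-lane predicate
        if (mask)
          b[base + lane] += a[base + lane];    // masked load/add/store
      }
}
```

Lanes past `n` are simply inactive, so the final partial iteration is handled by the same loop body instead of a separate epilogue.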

[PATCH 1/2] testsuite: [arm] Add missing effective-target to vusdot-autovec.c

2021-07-15 Thread Christophe Lyon via Gcc-patches
This test fails when forcing an -mcpu option incompatible with
-march=armv8.2-a+i8mm.

This patch adds the missing arm_v8_2a_i8mm_ok effective-target, as
well as the associated dg-add-options arm_v8_2a_i8mm.

2021-07-15  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/simd/vusdot-autovec.c: Use arm_v8_2a_i8mm_ok
effective-target.
---
 gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
index 7cc56f68817..e7af895b423 100644
--- a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
+++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_v8_2a_i8mm } */
 
 #define N 480
 #define SIGNEDNESS_1 unsigned
-- 
2.25.1



[PATCH 2/2] testsuite: [arm] Remove arm_v8_2a_imm8_neon_ok_nocache

2021-07-15 Thread Christophe Lyon via Gcc-patches
This patch removes this recently-introduced effective-target, as it
looks like a typo and duplicate for arm_v8_2a_i8mm_ok (imm8 vs i8mm),
and it is not used.

2021-07-15  Christophe Lyon  

gcc/testsuite/
* lib/target-supports.exp (arm_v8_2a_imm8_neon_ok_nocache):
Delete.
---
 gcc/testsuite/lib/target-supports.exp | 30 ---
 1 file changed, 30 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1c27ccd94af..28950803b13 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5267,36 +5267,6 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache { } {
 return 0;
 }
 
-# Return 1 if the target supports ARMv8.2 Adv.SIMD imm8
-# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
-# Record the command line options needed.
-
-proc check_effective_target_arm_v8_2a_imm8_neon_ok_nocache { } {
-global et_arm_v8_2a_imm8_neon_flags
-set et_arm_v8_2a_imm8_neon_flags ""
-
-if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
-return 0;
-}
-
-# Iterate through sets of options to find the compiler flags that
-# need to be added to the -march option.
-foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" "-mfloat-abi=hard -mfpu=neon-fp-armv8"} {
-if { [check_no_compiler_messages_nocache \
-  arm_v8_2a_imm8_neon_ok object {
-   #include 
-#if !defined (__ARM_FEATURE_MATMUL_INT8)
-#error "__ARM_FEATURE_MATMUL_INT8 not defined"
-#endif
-} "$flags -march=armv8.2-a+imm8"] } {
-set et_arm_v8_2a_imm8_neon_flags "$flags -march=armv8.2-a+imm8"
-return 1
-}
-}
-
-return 0;
-}
-
 # Return 1 if the target supports ARMv8.1-M MVE
 # instructions, 0 otherwise.  The test is valid for ARM.
 # Record the command line options needed.
-- 
2.25.1



Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-15 Thread Segher Boessenkool
On Thu, Jul 15, 2021 at 09:15:55AM -0500, Peter Bergner wrote:
> On 7/14/21 4:12 PM, Peter Bergner wrote:
> > I'll make the change above and rebuild just to be safe and then commit.
> 
> Regtesting was clean as expected, so I pushed the commit to trunk.  Thanks.
> Is this ok for backporting to GCC 11 after a day or two on trunk?

If it is tested well enough, yes.  There are many things that can break
in this code, so I am not very comfortable with backporting it so close
to a release, but if it is important, we can take that risk.

Thanks,


Segher


Re: [RFC] ipa: Adjust references to identify read-only globals

2021-07-15 Thread Jan Hubicka
> Hi,
> 
> gcc/ChangeLog:
> 
> 2021-06-29  Martin Jambor  
> 
>   * cgraph.h (ipa_replace_map): New field force_load_ref.
>   * ipa-prop.h (ipa_param_descriptor): Reduce precision of move_cost,
>   added new flag load_dereferenced, adjusted comments.
>   (ipa_get_param_dereferenced): New function.
>   (ipa_set_param_dereferenced): Likewise.
>   * cgraphclones.c (cgraph_node::create_virtual_clone): Follow it.
>   * ipa-cp.c: Include gimple.h.
>   (ipcp_discover_new_direct_edges): Take into account dereferenced flag.
>   (get_replacement_map): New parameter force_load_ref, set the
>   appropriate flag in ipa_replace_map if set.
>   (struct symbol_and_index_together): New type.
>   (adjust_references_in_act_callers): New function.
>   (adjust_references_in_caller): Likewise.
>   (create_specialized_node): When appropriate, call
>   adjust_references_in_caller and force only load references.
>   * ipa-prop.c (load_from_dereferenced_name): New function.
>   (ipa_analyze_controlled_uses): Also detect loads from a
>   dereference, harden testing of call statements.
>   (ipa_write_node_info): Stream the dereferenced flag.
>   (ipa_read_node_info): Likewise.
>   (ipa_set_jf_constant): Also create refdesc when jump function
>   references a variable.
>   (cgraph_node_for_jfunc): Rename to symtab_node_for_jfunc, work
>   also on references of variables and return a symtab_node.  Adjust
>   all callers.
>   (propagate_controlled_uses): Also remove references to VAR_DECLs.
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-06-29  Martin Jambor  
> 
>   * gcc.dg/ipa/remref-3.c: New test.
>   * gcc.dg/ipa/remref-4.c: Likewise.
>   * gcc.dg/ipa/remref-5.c: Likewise.
>   * gcc.dg/ipa/remref-6.c: Likewise.
> ---
>  gcc/cgraph.h|   3 +
>  gcc/cgraphclones.c  |  10 +-
>  gcc/ipa-cp.c| 146 ++--
>  gcc/ipa-prop.c  | 166 ++--
>  gcc/ipa-prop.h  |  27 -
>  gcc/testsuite/gcc.dg/ipa/remref-3.c |  23 
>  gcc/testsuite/gcc.dg/ipa/remref-4.c |  31 ++
>  gcc/testsuite/gcc.dg/ipa/remref-5.c |  38 +++
>  gcc/testsuite/gcc.dg/ipa/remref-6.c |  24 
>  9 files changed, 419 insertions(+), 49 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-6.c
> 
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 9f4338fdf87..0fc20cd4517 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -700,6 +700,9 @@ struct GTY(()) ipa_replace_map
>tree new_tree;
>/* Parameter number to replace, when old_tree is NULL.  */
>int parm_num;
> +  /* Set if the newly added reference should not be an address one, but a 
> load
> + one from the operand of the ADDR_EXPR in NEW_TREE.  */

So this is for the case where parameter p is used only as *p?
I think the comment should be expanded to explain the situation or in a
year I will not know why we need such a flag :)
> @@ -4320,7 +4322,8 @@ gather_edges_for_value (ipcp_value *val, cgraph_node *dest,
> Return it or NULL if for some reason it cannot be created.  */
>  
>  static struct ipa_replace_map *
> -get_replacement_map (class ipa_node_params *info, tree value, int parm_num)
> +get_replacement_map (class ipa_node_params *info, tree value, int parm_num,
> +  bool force_load_ref)

You want to comment the parameter here too..
> +/* At INDEX of a function being called by CS there is an ADDR_EXPR of a
> +   variable which is only dereferenced and which is represented by SYMBOL.  See
> +   if we can remove the ADDR reference in callers associated with the call.  */
> +
> +static void
> +adjust_references_in_caller (cgraph_edge *cs, symtab_node *symbol, int index)
> +{
> +  ipa_edge_args *args = ipa_edge_args_sum->get (cs);
> +  ipa_jump_func *jfunc = ipa_get_ith_jump_func (args, index);
> +  if (jfunc->type == IPA_JF_CONST)
> +{
> +  ipa_ref *to_del = cs->caller->find_reference (symbol, cs->call_stmt,
> + cs->lto_stmt_uid);
> +  if (!to_del)
> + return;
> +  to_del->remove_reference ();
> +  if (dump_file)
> + fprintf (dump_file, "Removed a reference from %s to %s.\n",
> +  cs->caller->dump_name (), symbol->dump_name ());
> +  return;
> +}
> +
> +  if (jfunc->type != IPA_JF_PASS_THROUGH
> +  || ipa_get_jf_pass_through_operation (jfunc) != NOP_EXPR)
> +return;
> +
> +  int fidx = ipa_get_jf_pass_through_formal_id (jfunc);
> +  cgraph_node *caller = cs->caller;
> +  ipa_node_params *caller_info = ipa_node_params_sum->get (caller);
> +  /* TODO: This consistency check may be too big and not really
> + that useful.  Consider removin

Re: [PATCH 1/2] testsuite: [arm] Add missing effective-target to vusdot-autovec.c

2021-07-15 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches  writes:
> This test fails when forcing an -mcpu option incompatible with
> -march=armv8.2-a+i8mm.
>
> This patch adds the missing arm_v8_2a_i8mm_ok effective-target, as
> well as the associated dg-add-options arm_v8_2a_i8mm.
>
> 2021-07-15  Christophe Lyon  
>
>   gcc/testsuite/
>   * gcc.target/arm/simd/vusdot-autovec.c: Use arm_v8_2a_i8mm_ok
>   effective-target.

OK, thanks.

> ---
>  gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> index 7cc56f68817..e7af895b423 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> @@ -1,5 +1,7 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
> +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
> +/* { dg-options "-O3" } */
> +/* { dg-add-options arm_v8_2a_i8mm } */
>  
>  #define N 480
>  #define SIGNEDNESS_1 unsigned


Re: [PATCH 2/2] testsuite: [arm] Remove arm_v8_2a_imm8_neon_ok_nocache

2021-07-15 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches  writes:
> This patch removes this recently-introduced effective-target, as it
> looks like a typo and duplicate for arm_v8_2a_i8mm_ok (imm8 vs i8mm),
> and it is not used.
>
> 2021-07-15  Christophe Lyon  
>
>   gcc/testsuite/
>   * lib/target-supports.exp (arm_v8_2a_imm8_neon_ok_nocache):
>   Delete.

OK, thanks.

Richard

> ---
>  gcc/testsuite/lib/target-supports.exp | 30 ---
>  1 file changed, 30 deletions(-)
>
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 1c27ccd94af..28950803b13 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -5267,36 +5267,6 @@ proc 
> check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache { } {
>  return 0;
>  }
>  
> -# Return 1 if the target supports ARMv8.2 Adv.SIMD imm8
> -# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
> -# Record the command line options needed.
> -
> -proc check_effective_target_arm_v8_2a_imm8_neon_ok_nocache { } {
> -global et_arm_v8_2a_imm8_neon_flags
> -set et_arm_v8_2a_imm8_neon_flags ""
> -
> -if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
> -return 0;
> -}
> -
> -# Iterate through sets of options to find the compiler flags that
> -# need to be added to the -march option.
> -foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" 
> "-mfloat-abi=hard -mfpu=neon-fp-armv8"} {
> -if { [check_no_compiler_messages_nocache \
> -  arm_v8_2a_imm8_neon_ok object {
> - #include 
> -#if !defined (__ARM_FEATURE_MATMUL_INT8)
> -#error "__ARM_FEATURE_MATMUL_INT8 not defined"
> -#endif
> -} "$flags -march=armv8.2-a+imm8"] } {
> -set et_arm_v8_2a_imm8_neon_flags "$flags -march=armv8.2-a+imm8"
> -return 1
> -}
> -}
> -
> -return 0;
> -}
> -
>  # Return 1 if the target supports ARMv8.1-M MVE
>  # instructions, 0 otherwise.  The test is valid for ARM.
>  # Record the command line options needed.


Re: [PATCH] consider parameter names in -Wvla-parameter (PR 97548)

2021-07-15 Thread Martin Sebor via Gcc-patches

On 7/8/21 5:36 PM, Jeff Law wrote:



On 7/1/2021 7:02 PM, Martin Sebor via Gcc-patches wrote:

-Wvla-parameter relies on operand_equal_p() with OEP_LEXICOGRAPHIC
set to compare VLA bounds for equality.  But operand_equal_p()
doesn't consider decl names, and so nontrivial expressions that
refer to the same function parameter are considered unequal by
the function, leading to false positives.

The attached fix solves the problem by adding a new flag bit,
OEP_DECL_NAME, to the set of flags that control the function.  When
the bit is set, the function considers distinct decls with
the same name equal.  The caller is responsible for ensuring
that the otherwise distinct decls appear in a context where they
can be assumed to refer to the same entity.  The only caller that
sets the flag is the -Wvla-parameter checker.

In addition, the patch strips nops from the VLA bound to avoid
false positives with meaningless casts.

I don't particularly like this, though I don't see a better solution.

Can you add some more info to OEP_DECL_NAME to describe the conditions 
where it's useful and how callers can correctly use it?


With that, this is OK.


I updated the comment and pushed r12-2329.

Martin


[PATCH] c++: alias CTAD inside decltype [PR101233]

2021-07-15 Thread Patrick Palka via Gcc-patches
This is the alias CTAD version of the CTAD bug PR93248, and the fix is
the same: clear cp_unevaluated_operand so that the entire chain of
DECL_ARGUMENTS gets substituted.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?

PR c++/101233

gcc/cp/ChangeLog:

* pt.c (alias_ctad_tweaks): Clear cp_unevaluated_operand for
substituting DECL_ARGUMENTS.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias10.C: New test.
---
 gcc/cp/pt.c  | 12 +---
 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C | 10 ++
 2 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index c7bf7d412ca..bc0a0936579 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -29097,9 +29097,15 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
  /* Substitute the deduced arguments plus the rewritten template
 parameters into f to get g.  This covers the type, copyness,
 guideness, and explicit-specifier.  */
- tree g = tsubst_decl (DECL_TEMPLATE_RESULT (f), targs, complain);
- if (g == error_mark_node)
-   continue;
+ tree g;
+   {
+ /* Parms are to have DECL_CHAIN tsubsted, which would be skipped
+if cp_unevaluated_operand.  */
+ cp_evaluated ev;
+ g = tsubst_decl (DECL_TEMPLATE_RESULT (f), targs, complain);
+ if (g == error_mark_node)
+   continue;
+   }
  DECL_USE_TEMPLATE (g) = 0;
  fprime = build_template_decl (g, gtparms, false);
  DECL_TEMPLATE_RESULT (fprime) = g;
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C
new file mode 100644
index 000..a473fff5dc7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C
@@ -0,0 +1,10 @@
+// PR c++/101233
+// { dg-do compile { target c++20 } }
+
+template
+struct A { A(T, U); };
+
+template
+using B = A;
+
+using type = decltype(B{0, 0});
-- 
2.32.0.264.g75ae10bc75



[PATCH] c++: covariant reference return type [PR99664]

2021-07-15 Thread Patrick Palka via Gcc-patches
This implements the wording changes of DR 960 which clarifies that two
reference types are covariant only if they're both lvalue references
or both rvalue references.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

DR 960
PR c++/99664

gcc/cp/ChangeLog:

* search.c (check_final_overrider): Compare TYPE_REF_IS_RVALUE
when the return types are references.

gcc/testsuite/ChangeLog:

* g++.dg/inherit/covariant23.C: New test.
---
 gcc/cp/search.c|  8 +++-
 gcc/testsuite/g++.dg/inherit/covariant23.C | 14 ++
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/inherit/covariant23.C

diff --git a/gcc/cp/search.c b/gcc/cp/search.c
index af41bfe5835..943671acff8 100644
--- a/gcc/cp/search.c
+++ b/gcc/cp/search.c
@@ -1948,7 +1948,13 @@ check_final_overrider (tree overrider, tree basefn)
   fail = !INDIRECT_TYPE_P (base_return);
   if (!fail)
{
- fail = cp_type_quals (base_return) != cp_type_quals (over_return);
+ if (cp_type_quals (base_return) != cp_type_quals (over_return))
+   fail = 1;
+
+ if (TYPE_REF_P (base_return)
+ && (TYPE_REF_IS_RVALUE (base_return)
+ != TYPE_REF_IS_RVALUE (over_return)))
+   fail = 1;
 
  base_return = TREE_TYPE (base_return);
  over_return = TREE_TYPE (over_return);
diff --git a/gcc/testsuite/g++.dg/inherit/covariant23.C 
b/gcc/testsuite/g++.dg/inherit/covariant23.C
new file mode 100644
index 000..b27be15ef45
--- /dev/null
+++ b/gcc/testsuite/g++.dg/inherit/covariant23.C
@@ -0,0 +1,14 @@
+// PR c++/99664
+// { dg-do compile { target c++11 } }
+
+struct Res { };
+
+struct A {
+  virtual Res &&f();
+  virtual Res &g();
+};
+
+struct B : A {
+  Res &f() override; // { dg-error "return type" }
+  Res &&g() override; // { dg-error "return type" }
+};
-- 
2.32.0.264.g75ae10bc75



[PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi All,

These testcases accidentally contain the wrong signs for the expected values
for the scalar code.  The vector code however is correct.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Committed as a trivial fix.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

PR middle-end/101457
* gcc.dg/vect/vect-reduc-dot-17.c: Fix signs of scalar code.
* gcc.dg/vect/vect-reduc-dot-18.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-22.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-9.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
index 
aa269c4d657f65e07e36df7f3fd0098cf3aaf4d0..38f86fe458adcc7ebbbae22f5cc1e720928f2d48
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
@@ -35,8 +35,9 @@ main (void)
 {
   check_vect ();
 
-  SIGNEDNESS_3 char a[N], b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 char b[N];
+  SIGNEDNESS_1 int expected = 0x12345;
   for (int i = 0; i < N; ++i)
 {
   a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
index 
2b1cc0411c3256ccd876d8b4da18ce4881dc0af9..2e86ebe3c6c6a0da9ac242868592f30028ed2155
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
@@ -35,8 +35,9 @@ main (void)
 {
   check_vect ();
 
-  SIGNEDNESS_3 char a[N], b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 char b[N];
+  SIGNEDNESS_1 int expected = 0x12345;
   for (int i = 0; i < N; ++i)
 {
   a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
index 
febeb19784c6aaca72dc0871af0d32cc91fa6ea2..0bde43a6cb855ce5edd9015ebf34ca226353d77e
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
@@ -37,7 +37,7 @@ main (void)
 
   SIGNEDNESS_3 char a[N];
   SIGNEDNESS_4 short b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_1 long expected = 0x12345;
   for (int i = 0; i < N; ++i)
 {
   a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
index 
cbbeedec3bfd0810a8ce8036e6670585d9334924..d1049c96bf1febfc8933622e292b44cc8dd129cc
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
@@ -35,8 +35,9 @@ main (void)
 {
   check_vect ();
 
-  SIGNEDNESS_3 char a[N], b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 char b[N];
+  SIGNEDNESS_1 int expected = 0x12345;
   for (int i = 0; i < N; ++i)
 {
   a[i] = BASE + i * 5;



[PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi All,

There's a slight mismatch between the vectorizer optabs and the intrinsics
patterns for NEON.  The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.

This means we need different patterns here.  This adds a separate usdot
vectorizer pattern which just shuffles around the RTL params.

There's also an inconsistency between the usdot and (u|s)dot intrinsics RTL
patterns which is not corrected here.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (usdot_prod): Rename to...
(aarch64_usdot): ..This
(usdot_prod): New.
* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Use
aarch64_usdot.
* config/aarch64/aarch64-simd-builtins.def: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 
063f503ebd96657f017dfaa067cb231991376bda..ac5d4fc7ff1e61d404e66193b629986382ee4ffd
 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -374,11 +374,10 @@
   BUILTIN_VSDQ_I_DI (BINOP, srshl, 0, NONE)
   BUILTIN_VSDQ_I_DI (BINOP_UUS, urshl, 0, NONE)
 
-  /* Implemented by _prod.  */
+  /* Implemented by aarch64_{_lane}{q}.  */
   BUILTIN_VB (TERNOP, sdot, 0, NONE)
   BUILTIN_VB (TERNOPU, udot, 0, NONE)
-  BUILTIN_VB (TERNOP_SSUS, usdot_prod, 10, NONE)
-  /* Implemented by aarch64__lane{q}.  */
+  BUILTIN_VB (TERNOP_SSUS, usdot, 0, NONE)
   BUILTIN_VB (QUADOP_LANE, sdot_lane, 0, NONE)
   BUILTIN_VB (QUADOPU_LANE, udot_lane, 0, NONE)
   BUILTIN_VB (QUADOP_LANE, sdot_laneq, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
74890989cb3045798bf8d0241467eaaf72238297..7397f1ec5ca0cb9e3cdd5c46772f604e640666e4
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -601,7 +601,7 @@ (define_insn "aarch64_dot"
 
 ;; These instructions map to the __builtins for the armv8.6a I8MM usdot
 ;; (vector) Dot Product operation.
-(define_insn "usdot_prod"
+(define_insn "aarch64_usdot"
   [(set (match_operand:VS 0 "register_operand" "=w")
(plus:VS
  (unspec:VS [(match_operand: 2 "register_operand" "w")
@@ -648,6 +648,17 @@ (define_expand "dot_prod"
   DONE;
 })
 
> +;; Auto-vectorizer pattern for usdot.  Operands 3 and 0 are the RMW
> +;; accumulator parameters expected by the vectorizer.
+(define_expand "usdot_prod"
+  [(set (match_operand:VS 0 "register_operand")
+   (plus:VS (unspec:VS [(match_operand: 1 "register_operand")
+   (match_operand: 2 "register_operand")]
+UNSPEC_USDOT)
+(match_operand:VS 3 "register_operand")))]
+  "TARGET_I8MM"
+)
+
 ;; These instructions map to the __builtins for the Dot Product
 ;; indexed operations.
 (define_insn "aarch64_dot_lane"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 
00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ace4fc7f43e2040a8
 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
 {
-  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
 {
-  return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
 }
 
 __extension__ extern __inline int32x2_t



[PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi All,

The previous fix for this problem was wrong due to a subtle difference between
where NEON expects the RMW values and where the intrinsics expect them.

The insn pattern is modeled after the intrinsics and so needs an expand for
the vectorizer optab to switch the RTL.

However operand[3] is not expected to be written to so the current pattern is
bogus.

Instead we use the expand to shuffle around the RTL.

The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? and active branches after some stew?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (dot_prod): Correct
RTL.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
7397f1ec5ca0cb9e3cdd5c46772f604e640666e4..51789f954affd9fa88e2bc1bcc3dacf64ccb5bde
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -635,18 +635,12 @@ (define_insn "aarch64_usdot"
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
 (define_expand "dot_prod"
   [(set (match_operand:VS 0 "register_operand")
-   (plus:VS (unspec:VS [(match_operand: 1 "register_operand")
+   (plus:VS (match_operand:VS 3 "register_operand")
+(unspec:VS [(match_operand: 1 "register_operand")
(match_operand: 2 "register_operand")]
-DOTPROD)
-   (match_operand:VS 3 "register_operand")))]
+DOTPROD)))]
   "TARGET_DOTPROD"
-{
-  emit_insn (
-gen_aarch64_dot (operands[3], operands[3], operands[1],
-   operands[2]));
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  DONE;
-})
+)
 
 ;; Auto-vectorizer pattern for usdot.  Operands 3 and 0 are the RMW
 ;; accumulator parameters expected by the vectorizer.





[PATCH 4/4][AArch32]: correct dot-product RTL patterns.

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi All,

The previous fix for this problem was wrong due to a subtle difference between
where NEON expects the RMW values and where the intrinsics expect them.

The insn pattern is modeled after the intrinsics and so needs an expand for
the vectorizer optab to switch the RTL.

However operand[3] is not expected to be written to so the current pattern is
bogus.

Instead we use the expand to shuffle around the RTL.

The vectorizer expects operands[3] and operands[0] to be the same, but the
Arm intrinsics expanders expect operands[0] and operands[1] to be the same.

The arm-none-linux-gnueabihf build is currently broken, so the best I could do
is verify on arm-none-eabi.  The tests there are all marked UNSUPPORTED, but
the ICE is gone for the backend test.

Ok for master? and active branches after some stew?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/neon.md (dot_prod): Correct RTL.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 
8b0a396947cc8e7345f178b926128d7224fb218a..876577fc20daee30ecdf03942c0d81c15bf8fe9a
 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2954,20 +2954,14 @@ (define_insn "neon_dot_lane"
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
 (define_expand "dot_prod"
   [(set (match_operand:VCVTI 0 "register_operand")
-   (plus:VCVTI (unspec:VCVTI [(match_operand: 1
+   (plus:VCVTI (match_operand:VCVTI 3 "register_operand")
+   (unspec:VCVTI [(match_operand: 1
"register_operand")
   (match_operand: 2
"register_operand")]
-DOTPROD)
-   (match_operand:VCVTI 3 "register_operand")))]
+DOTPROD)))]
   "TARGET_DOTPROD"
-{
-  emit_insn (
-gen_neon_dot (operands[3], operands[3], operands[1],
-operands[2]));
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  DONE;
-})
+)
 
 ;; Auto-vectorizer pattern for usdot
 (define_expand "usdot_prod"





Re: [PATCH] c++: argument pack expansion inside constraint [PR100138]

2021-07-15 Thread Patrick Palka via Gcc-patches
On Sat, May 8, 2021 at 8:42 AM Jason Merrill  wrote:
>
> On 5/7/21 12:33 PM, Patrick Palka wrote:
> > This PR is about CTAD but the underlying problems are more general;
> > CTAD is a good trigger for them because of the necessary substitution
> > into constraints that deduction guide generation entails.
> >
> > In the testcase below, when generating the implicit deduction guide for
> > the constrained constructor template for A, we substitute the generic
> > flattening map 'tsubst_args' into the constructor's constraints.  During
> > this substitution, tsubst_pack_expansion returns a rebuilt pack
> > expansion for sizeof...(xs), but it's neglecting to carry over the
> > PACK_EXPANSION_LOCAL_P (and PACK_EXPANSION_SIZEOF_P) flag from the
> > original tree to the rebuilt one.  The flag is otherwise unset on the
> > original tree[1] but set for the rebuilt tree from make_pack_expansion
> > only because we're doing the CTAD at function scope (inside main).  This
> > leads us to crash when substituting into the pack expansion during
> > satisfaction because we don't have local_specializations set up (it'd be
> > set up for us if PACK_EXPANSION_LOCAL_P is unset)
> >
> > Similarly, when substituting into a constraint we need to set
> > cp_unevaluated since constraints are unevaluated operands.  This avoids
> > a crash during CTAD for C below.
> >
> > [1]: Although the original pack expansion is in a function context, I
> > guess it makes sense that PACK_EXPANSION_LOCAL_P is unset for it because
> > we can't rely on local specializations (which are formed when
> > substituting into the function declaration) during satisfaction.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, also tested on
> > cmcstl2 and range-v3, does this look OK for trunk?
>
> OK.

Would it be ok to backport this patch to the 11 branch given its
impact on concepts (or perhaps backport only part of it, say all but
the PACK_EXPANSION_LOCAL_P propagation since that part just avoids
ICEing on the invalid portions of the testcase)?

>
> > gcc/cp/ChangeLog:
> >
> >   PR c++/100138
> >   * constraint.cc (tsubst_constraint): Set up cp_unevaluated.
> >   (satisfy_atom): Set up iloc_sentinel before calling
> >   cxx_constant_value.
> >   * pt.c (tsubst_pack_expansion): When returning a rebuilt pack
> >   expansion, carry over PACK_EXPANSION_LOCAL_P and
> >   PACK_EXPANSION_SIZEOF_P from the original pack expansion.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR c++/100138
> >   * g++.dg/cpp2a/concepts-ctad4.C: New test.
> > ---
> >   gcc/cp/constraint.cc|  6 -
> >   gcc/cp/pt.c |  2 ++
> >   gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C | 25 +
> >   3 files changed, 32 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C
> >
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index 0709695fd08..30fccc46678 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -2747,6 +2747,7 @@ tsubst_constraint (tree t, tree args, tsubst_flags_t 
> > complain, tree in_decl)
> > /* We also don't want to evaluate concept-checks when substituting the
> >constraint-expressions of a declaration.  */
> > processing_constraint_expression_sentinel s;
> > +  cp_unevaluated u;
> > tree expr = tsubst_expr (t, args, complain, in_decl, false);
> > return expr;
> >   }
> > @@ -3005,7 +3006,10 @@ satisfy_atom (tree t, tree args, sat_info info)
> >
> > /* Compute the value of the constraint.  */
> > if (info.noisy ())
> > -result = cxx_constant_value (result);
> > +{
> > +  iloc_sentinel ils (EXPR_LOCATION (result));
> > +  result = cxx_constant_value (result);
> > +}
> > else
> >   {
> > result = maybe_constant_value (result, NULL_TREE,
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index 36a8cb5df5d..0d27dd1af65 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -13203,6 +13203,8 @@ tsubst_pack_expansion (tree t, tree args, 
> > tsubst_flags_t complain,
> > else
> >   result = tsubst (pattern, args, complain, in_decl);
> > result = make_pack_expansion (result, complain);
> > +  PACK_EXPANSION_LOCAL_P (result) = PACK_EXPANSION_LOCAL_P (t);
> > +  PACK_EXPANSION_SIZEOF_P (result) = PACK_EXPANSION_SIZEOF_P (t);
> > if (PACK_EXPANSION_AUTO_P (t))
> >   {
> > /* This is a fake auto... pack expansion created in add_capture with
> > diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C 
> > b/gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C
> > new file mode 100644
> > index 000..95a3a22dd04
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C
> > @@ -0,0 +1,25 @@
> > +// PR c++/100138
> > +// { dg-do compile { target c++20 } }
> > +
> > +template 
> > +struct A {
> > +  A(T, auto... xs) requires (sizeof...(xs) != 0) { }
> > +};
> > +
> > +cons

Re: [committed] libstdc++: Add noexcept to __replacement_assert [PR101429]

2021-07-15 Thread François Dumont via Gcc-patches

On 15/07/21 5:26 pm, Jonathan Wakely via Libstdc++ wrote:

This results in slightly smaller code when assertions are enabled when
either using Clang (because it adds code to call std::terminate when
potentially-throwing functions are called in a noexcept function) or a
freestanding or non-verbose build (because it doesn't use printf).

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101429
* include/bits/c++config (__replacement_assert): Add noexcept.
[!_GLIBCXX_VERBOSE] (__glibcxx_assert_impl): Use __builtin_trap
instead of __replacement_assert.

Tested powerpc64le-linux. Committed to trunk.

The ChangeLog mentions __builtin_trap, but there is no such call in the
attached patch.




[PATCH] x86: Don't set AVX_U128_DIRTY when all bits are zero

2021-07-15 Thread H.J. Lu via Gcc-patches
In a single SET, all bits of the source YMM/ZMM register are zero when

1. The source is constant zero.
2. The source YMM/ZMM operand is defined from constant zero.

and we don't set AVX_U128_DIRTY.

gcc/

PR target/101456
* config/i386/i386.c (ix86_avx_u128_mode_needed): Don't set
AVX_U128_DIRTY when all bits are zero.

gcc/testsuite/

PR target/101456
* gcc.target/i386/pr101456-1.c: New test.
---
 gcc/config/i386/i386.c | 47 ++
 gcc/testsuite/gcc.target/i386/pr101456-1.c | 28 +
 2 files changed, 75 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-1.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cff26909292..c2b06934053 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14129,6 +14129,53 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
   return AVX_U128_CLEAN;
 }
 
+  rtx set = single_set (insn);
+  if (set)
+{
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  if (ix86_check_avx_upper_register (dest))
+   {
+ /* It is not dirty if the source is known zero.  */
+ if (standard_sse_constant_p (src, GET_MODE (dest)) == 1)
+   return AVX_U128_ANY;
+ else
+   return AVX_U128_DIRTY;
+   }
+  else if (ix86_check_avx_upper_register (src))
+   {
+ /* Check for the source operand with all DEFs from constant
+zero.  */
+ df_ref def = DF_REG_DEF_CHAIN (REGNO (src));
+ if (!def)
+   return AVX_U128_DIRTY;
+
+ for (; def; def = DF_REF_NEXT_REG (def))
+   if (DF_REF_REG_DEF_P (def)
+   && !DF_REF_IS_ARTIFICIAL (def))
+ {
+   rtx_insn *def_insn = DF_REF_INSN (def);
+   set = single_set (def_insn);
+   if (!set)
+ return AVX_U128_DIRTY;
+
+   dest = SET_DEST (set);
+   if (ix86_check_avx_upper_register (dest))
+ {
+   src = SET_SRC (set);
+   /* It is dirty if the source operand isn't constant
+  zero.  */
+   if (standard_sse_constant_p (src, GET_MODE (dest))
+   != 1)
+ return AVX_U128_DIRTY;
+ }
+ }
+
+ /* It is not dirty only if all sources are known zero.  */
+ return AVX_U128_ANY;
+   }
+}
+
   /* Require DIRTY mode if a 256bit or 512bit AVX register is referenced.
  Hardware changes state only when a 256bit register is written to,
  but we need to prevent the compiler from moving optimal insertion
diff --git a/gcc/testsuite/gcc.target/i386/pr101456-1.c 
b/gcc/testsuite/gcc.target/i386/pr101456-1.c
new file mode 100644
index 000..6a0f6ccd756
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101456-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+#include 
+
+extern __m256 x1;
+extern __m256d x2;
+extern __m256i x3;
+
+void
+foo1 (void)
+{
+  x1 = _mm256_setzero_ps ();
+}
+
+void
+foo2 (void)
+{
+  x2 = _mm256_setzero_pd ();
+}
+
+void
+foo3 (void)
+{
+  x3 = _mm256_setzero_si256 ();
+}
+
+/* { dg-final { scan-assembler-not "vzeroupper" } } */
-- 
2.31.1



[PATCH] c++: Add C++20 #__VA_OPT__ support

2021-07-15 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch implements C++20 # __VA_OPT__ (...) support.
Testcases cover what I came up with myself plus what LLVM has for #__VA_OPT__
in its testsuite; the string literals produced by the two compilers are
identical on the va-opt-5.c testcase.

Haven't looked at the non-#__VA_OPT__ differences between LLVM and GCC,
though; I think at least the
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1042r1.html
#define H4(X, ...) __VA_OPT__(a X ## X) ## b
H4(, 1)  // replaced by a b
case isn't handled right (we emit ab).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-15  Jakub Jelinek  

libcpp/
* macro.c (vaopt_state): Add m_stringify member.
(vaopt_state::vaopt_state): Initialize it.
(vaopt_state::update): Overwrite it.
(vaopt_state::stringify): New method.
(stringify_arg): Replace arg argument with first, count arguments
and add va_opt argument.  Use first instead of arg->first and
count instead of arg->count, for va_opt add paste_tokens handling.
(paste_tokens): Fix up len calculation.  Don't spell rhs twice,
instead use %.*s to supply lhs and rhs spelling lengths.  Don't call
_cpp_backup_tokens here.
(paste_all_tokens): Call it here instead.
(replace_args): Adjust stringify_arg caller.  For vaopt_state::END
if stringify is true handle __VA_OPT__ stringification.
(create_iso_definition): Handle # __VA_OPT__ similarly to # macro_arg.
gcc/testsuite/
* c-c++-common/cpp/va-opt-5.c: New test.
* c-c++-common/cpp/va-opt-6.c: New test.

--- libcpp/macro.c.jj   2021-05-21 10:34:09.328560825 +0200
+++ libcpp/macro.c  2021-07-15 17:27:30.109631306 +0200
@@ -118,6 +118,7 @@ class vaopt_state {
 m_arg (arg),
 m_variadic (is_variadic),
 m_last_was_paste (false),
+m_stringify (false),
 m_state (0),
 m_paste_location (0),
 m_location (0),
@@ -145,6 +146,7 @@ class vaopt_state {
  }
++m_state;
m_location = token->src_loc;
+   m_stringify = (token->flags & STRINGIFY_ARG) != 0;
return BEGIN;
   }
 else if (m_state == 1)
@@ -234,6 +236,11 @@ class vaopt_state {
 return m_state == 0;
   }
 
+  bool stringify () const
+  {
+return m_stringify;
+  }  
+
  private:
 
   /* The cpp_reader.  */
@@ -247,6 +254,8 @@ class vaopt_state {
   /* If true, the previous token was ##.  This is used to detect when
  a paste occurs at the end of the sequence.  */
   bool m_last_was_paste;
+  /* True for #__VA_OPT__.  */
+  bool m_stringify;
 
   /* The state variable:
  0 means not parsing
@@ -284,7 +293,8 @@ static _cpp_buff *collect_args (cpp_read
 static cpp_context *next_context (cpp_reader *);
 static const cpp_token *padding_token (cpp_reader *, const cpp_token *);
 static const cpp_token *new_string_token (cpp_reader *, uchar *, unsigned int);
-static const cpp_token *stringify_arg (cpp_reader *, macro_arg *);
+static const cpp_token *stringify_arg (cpp_reader *, const cpp_token **,
+  unsigned int, bool);
 static void paste_all_tokens (cpp_reader *, const cpp_token *);
 static bool paste_tokens (cpp_reader *, location_t,
  const cpp_token **, const cpp_token *);
@@ -818,10 +828,11 @@ cpp_quote_string (uchar *dest, const uch
   return dest;
 }
 
-/* Convert a token sequence ARG to a single string token according to
-   the rules of the ISO C #-operator.  */
+/* Convert a token sequence FIRST to FIRST+COUNT-1 to a single string token
+   according to the rules of the ISO C #-operator.  */
 static const cpp_token *
-stringify_arg (cpp_reader *pfile, macro_arg *arg)
+stringify_arg (cpp_reader *pfile, const cpp_token **first, unsigned int count,
+  bool va_opt)
 {
   unsigned char *dest;
   unsigned int i, escape_it, backslash_count = 0;
@@ -834,9 +845,27 @@ stringify_arg (cpp_reader *pfile, macro_
   *dest++ = '"';
 
   /* Loop, reading in the argument's tokens.  */
-  for (i = 0; i < arg->count; i++)
+  for (i = 0; i < count; i++)
 {
-  const cpp_token *token = arg->first[i];
+  const cpp_token *token = first[i];
+
+  if (va_opt && (token->flags & PASTE_LEFT))
+   {
+ location_t virt_loc = pfile->invocation_location;
+ const cpp_token *rhs;
+ do
+   {
+ if (i == count)
+   abort ();
+ rhs = first[++i];
+ if (!paste_tokens (pfile, virt_loc, &token, rhs))
+   {
+ --i;
+ break;
+   }
+   }
+ while (rhs->flags & PASTE_LEFT);
+   }
 
   if (token->type == CPP_PADDING)
{
@@ -923,7 +952,7 @@ paste_tokens (cpp_reader *pfile, locatio
   cpp_token *lhs;
   unsigned int len;
 
-  len = cpp_token_len (*plhs) + cpp_token_len (rhs) + 1;
+  len = cpp_token_len (*plhs) + cpp_token_len (rhs) + 2;
   buf = (unsigned char *) alloca (len);
 

Re: [PATCH] [android] Disable large files when unsupported

2021-07-15 Thread João Gabriel Jardim via Gcc-patches
-- 
João Gabriel Jardim



Re: [PATCH] [android] Disable large files when unsupported

2021-07-15 Thread Abraão de Santana via Gcc-patches
Hey João, I think there's a problem with your email; it's empty!

--
*Abraão C. de Santana*


Re: [committed] libstdc++: Add noexcept to __replacement_assert [PR101429]

2021-07-15 Thread Jonathan Wakely via Gcc-patches
On Thu, 15 Jul 2021, 18:21 François Dumont via Libstdc++, <libstd...@gcc.gnu.org> wrote:

> On 15/07/21 5:26 pm, Jonathan Wakely via Libstdc++ wrote:
> > This results in slightly smaller code when assertions are enabled when
> > either using Clang (because it adds code to call std::terminate when
> > potentially-throwing functions are called in a noexcept function) or a
> > freestanding or non-verbose build (because it doesn't use printf).
> >
> > Signed-off-by: Jonathan Wakely 
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/101429
> >   * include/bits/c++config (__replacement_assert): Add noexcept.
> >   [!_GLIBCXX_VERBOSE] (__glibcxx_assert_impl): Use __builtin_trap
> >   instead of __replacement_assert.
> >
> > Tested powerpc64le-linux. Committed to trunk.
> >
> ChangeLog is talking about __builtin_trap but there is none in the
> attached patch.
>


Yes I already noticed that and mentioned it in the bugzilla PR. It uses
__builtin_abort not __builtin_trap. I'll fix the ChangeLog file tomorrow
after it gets generated.

The Git commit message will stay wrong though.


[PATCH] gcc_update: use gcc-descr git alias for revision string in gcc/REVISION

2021-07-15 Thread Serge Belyshev
This is to make the development version string more readable, and
to simplify navigation through gcc-testresults.

Currently gcc_update uses git log --pretty=tformat:%p:%t:%H to
generate the version string, which has been somewhat excessive since the
conversion to git, because commit hashes are now stable.

Even better, the gcc-git-customization.sh script provides the gcc-descr
alias, which makes a prettier version string, so use it instead (falling
back to the abbreviated commit hash when the alias is not available).

Before: [master revision b25edf6e6fe:e035f180ebf:7094a69bd62a14dfa311eaa2fea468f221c7c9f3]
After: [master r12-2331]
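The old and new schemes can be sketched as follows in a throwaway repository (the gcc-descr alias is only defined after running contrib/gcc-git-customization.sh, hence the fallback; the exact hashes printed will of course differ per checkout):

```shell
# Work in a scratch repo so this is runnable anywhere.
cd "$(mktemp -d)"
git init -q .
git -c user.name=t -c user.email=t@example.org commit -q --allow-empty -m init

# Old scheme: parents:tree:full-hash (empty parent field for a root commit).
git log -n1 --pretty=tformat:%p:%t:%H

# New scheme: gcc-descr alias if present, else the abbreviated hash.
git gcc-descr 2>/dev/null || git log -n1 --pretty=tformat:%h
```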

OK for mainline?

contrib/ChangeLog:

* gcc_update: Use gcc-descr alias for revision string if it exists, or
abbreviated commit hash instead. Drop "revision" from gcc/REVISION.
---
 contrib/gcc_update | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/contrib/gcc_update b/contrib/gcc_update
index 80fac9fc995..8f712e37616 100755
--- a/contrib/gcc_update
+++ b/contrib/gcc_update
@@ -332,7 +332,7 @@ case $vcs_type in
 exit 1
fi
 
-   revision=`$GCC_GIT log -n1 --pretty=tformat:%p:%t:%H`
+   revision=`$GCC_GIT gcc-descr || $GCC_GIT log -n1 --pretty=tformat:%h`
branch=`$GCC_GIT name-rev --name-only HEAD || :`
;;
 
@@ -414,6 +414,6 @@ rm -f LAST_UPDATED gcc/REVISION
 date
 echo "`TZ=UTC date` (revision $revision)"
 } > LAST_UPDATED
-echo "[$branch revision $revision]" > gcc/REVISION
+echo "[$branch $revision]" > gcc/REVISION
 
 touch_files_reexec


[committed] analyzer: handle self-referential phis

2021-07-15 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as a9241df96e1950c630550ada9371c0b4a03496cf.

gcc/analyzer/ChangeLog:
* state-purge.cc (self_referential_phi_p): New.
(state_purge_per_ssa_name::process_point): Don't purge an SSA name
at its def-stmt if the def-stmt is self-referential.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/phi-1.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/state-purge.cc   | 37 ---
 gcc/testsuite/gcc.dg/analyzer/phi-1.c | 24 +
 2 files changed, 58 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/phi-1.c

diff --git a/gcc/analyzer/state-purge.cc b/gcc/analyzer/state-purge.cc
index 70a09ed581f..e82ea87e735 100644
--- a/gcc/analyzer/state-purge.cc
+++ b/gcc/analyzer/state-purge.cc
@@ -288,6 +288,20 @@ state_purge_per_ssa_name::add_to_worklist (const function_point &point,
 }
 }
 
+/* Does this phi depend on itself?
+   e.g. in:
+ added_2 = PHI 
+   the middle defn (from edge 3) requires added_2 itself.  */
+
+static bool
+self_referential_phi_p (const gphi *phi)
+{
+  for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
+if (gimple_phi_arg_def (phi, i) == gimple_phi_result (phi))
+  return true;
+  return false;
+}
+
 /* Process POINT, popped from WORKLIST.
Iterate over predecessors of POINT, adding to WORKLIST.  */
 
@@ -326,11 +340,28 @@ state_purge_per_ssa_name::process_point (const function_point &point,
 !gsi_end_p (gpi); gsi_next (&gpi))
  {
gphi *phi = gpi.phi ();
+   /* Are we at the def-stmt for m_name?  */
if (phi == def_stmt)
  {
-   if (logger)
- logger->log ("def stmt within phis; terminating");
-   return;
+   /* Does this phi depend on itself?
+  e.g. in:
+added_2 = PHI 
+  the middle defn (from edge 3) requires added_2 itself
+  so we can't purge it here.  */
+   if (self_referential_phi_p (phi))
+ {
+   if (logger)
+ logger->log ("self-referential def stmt within phis;"
+  " continuing");
+ }
+   else
+ {
+   /* Otherwise, we can stop here, so that m_name
+  can be purged.  */
+   if (logger)
+ logger->log ("def stmt within phis; terminating");
+   return;
+ }
  }
  }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/phi-1.c b/gcc/testsuite/gcc.dg/analyzer/phi-1.c
new file mode 100644
index 000..09260033fef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/phi-1.c
@@ -0,0 +1,24 @@
+/* { dg-do "compile" } */
+
+typedef __SIZE_TYPE__ size_t;
+#define NULL ((void *) 0)
+
+extern const char *foo (void);
+extern size_t bar (void);
+
+void
+_nl_expand_alias (const char *locale_alias_path)
+{
+  size_t added;
+  do
+{
+  added = 0;
+  while (added == 0 && locale_alias_path[0] != '\0')
+   {
+ const char *start = foo ();
+ if (start < locale_alias_path)
+   added = bar ();
+   }
+}
+  while (added != 0);
+}
-- 
2.26.3



[committed] analyzer: use DECL_DEBUG_EXPR on SSA names for artificial vars

2021-07-15 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as e9711fe482b4abef0e7572809d3593631991276e.

gcc/analyzer/ChangeLog:
* analyzer.cc (fixup_tree_for_diagnostic_1): Use DECL_DEBUG_EXPR
if it's available.
* engine.cc (readability): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.cc |  9 +++--
 gcc/analyzer/engine.cc   | 19 ---
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/gcc/analyzer/analyzer.cc b/gcc/analyzer/analyzer.cc
index 12c03f6cfbd..a8ee1a1a2dc 100644
--- a/gcc/analyzer/analyzer.cc
+++ b/gcc/analyzer/analyzer.cc
@@ -165,8 +165,13 @@ fixup_tree_for_diagnostic_1 (tree expr, hash_set *visited)
   && TREE_CODE (expr) == SSA_NAME
   && (SSA_NAME_VAR (expr) == NULL_TREE
  || DECL_ARTIFICIAL (SSA_NAME_VAR (expr
-if (tree expr2 = maybe_reconstruct_from_def_stmt (expr, visited))
-  return expr2;
+{
+  if (tree var = SSA_NAME_VAR (expr))
+   if (VAR_P (var) && DECL_HAS_DEBUG_EXPR_P (var))
+ return DECL_DEBUG_EXPR (var);
+  if (tree expr2 = maybe_reconstruct_from_def_stmt (expr, visited))
+   return expr2;
+}
   return expr;
 }
 
diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 01b83a4ef28..8f3e7f781b2 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -527,9 +527,22 @@ readability (const_tree expr)
 case SSA_NAME:
   {
if (tree var = SSA_NAME_VAR (expr))
- /* Slightly favor the underlying var over the SSA name to
-avoid having them compare equal.  */
- return readability (var) - 1;
+ {
+   if (DECL_ARTIFICIAL (var))
+ {
+   /* If we have an SSA name for an artificial var,
+  only use it if it has a debug expr associated with
+  it that fixup_tree_for_diagnostic can use.  */
+   if (VAR_P (var) && DECL_HAS_DEBUG_EXPR_P (var))
+ return readability (DECL_DEBUG_EXPR (var)) - 1;
+ }
+   else
+ {
+   /* Slightly favor the underlying var over the SSA name to
+  avoid having them compare equal.  */
+   return readability (var) - 1;
+ }
+ }
/* Avoid printing '' for SSA names for temporaries.  */
return -1;
   }
-- 
2.26.3



[committed] analyzer: add -fdump-analyzer-exploded-paths

2021-07-15 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 98cd4d123aa14598b1f0d54c22663c8200a96d9c.

gcc/analyzer/ChangeLog:
* analyzer.opt (fdump-analyzer-exploded-paths): New.
* diagnostic-manager.cc
(diagnostic_manager::emit_saved_diagnostic): Implement it.
* engine.cc (exploded_path::dump_to_pp): Add ext_state param and
use it to dump states if non-NULL.
(exploded_path::dump): Likewise.
(exploded_path::dump_to_file): New.
* exploded-graph.h (exploded_path::dump_to_pp): Add ext_state
param.
(exploded_path::dump): Likewise.
(exploded_path::dump): Likewise.
(exploded_path::dump_to_file): New.

gcc/ChangeLog:
* doc/invoke.texi (-fdump-analyzer-exploded-paths): New.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.opt  |  4 
 gcc/analyzer/diagnostic-manager.cc | 11 ++
 gcc/analyzer/engine.cc | 34 --
 gcc/analyzer/exploded-graph.h  |  9 +---
 gcc/doc/invoke.texi|  6 ++
 5 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index dd34495abd5..7b77ae8a73d 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -210,6 +210,10 @@ fdump-analyzer-exploded-nodes-3
 Common RejectNegative Var(flag_dump_analyzer_exploded_nodes_3)
 Dump a textual representation of the exploded graph to SRCFILE.eg-ID.txt.
 
+fdump-analyzer-exploded-paths
+Common RejectNegative Var(flag_dump_analyzer_exploded_paths)
+Dump a textual representation of each diagnostic's exploded path to SRCFILE.IDX.KIND.epath.txt.
+
 fdump-analyzer-feasibility
 Common RejectNegative Var(flag_dump_analyzer_feasibility)
 Dump various analyzer internals to SRCFILE.*.fg.dot and SRCFILE.*.tg.dot.
diff --git a/gcc/analyzer/diagnostic-manager.cc b/gcc/analyzer/diagnostic-manager.cc
index b7d263b4217..d005facc20b 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -1164,6 +1164,17 @@ diagnostic_manager::emit_saved_diagnostic (const exploded_graph &eg,
inform_n (loc, num_dupes,
  "%i duplicate", "%i duplicates",
  num_dupes);
+  if (flag_dump_analyzer_exploded_paths)
+   {
+ auto_timevar tv (TV_ANALYZER_DUMP);
+ pretty_printer pp;
+ pp_printf (&pp, "%s.%i.%s.epath.txt",
+dump_base_name, sd.get_index (), sd.m_d->get_kind ());
+ char *filename = xstrdup (pp_formatted_text (&pp));
+ epath->dump_to_file (filename, eg.get_ext_state ());
+ inform (loc, "exploded path written to %qs", filename);
+ free (filename);
+   }
 }
   delete pp;
 }
diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 8f3e7f781b2..dc07a79e185 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -3630,10 +3630,12 @@ exploded_path::feasible_p (logger *logger, feasibility_problem **out,
   return true;
 }
 
-/* Dump this path in multiline form to PP.  */
+/* Dump this path in multiline form to PP.
+   If EXT_STATE is non-NULL, then show the nodes.  */
 
 void
-exploded_path::dump_to_pp (pretty_printer *pp) const
+exploded_path::dump_to_pp (pretty_printer *pp,
+  const extrinsic_state *ext_state) const
 {
   for (unsigned i = 0; i < m_edges.length (); i++)
 {
@@ -3643,28 +3645,48 @@ exploded_path::dump_to_pp (pretty_printer *pp) const
 eedge->m_src->m_index,
 eedge->m_dest->m_index);
   pp_newline (pp);
+
+  if (ext_state)
+   eedge->m_dest->dump_to_pp (pp, *ext_state);
 }
 }
 
 /* Dump this path in multiline form to FP.  */
 
 void
-exploded_path::dump (FILE *fp) const
+exploded_path::dump (FILE *fp, const extrinsic_state *ext_state) const
 {
   pretty_printer pp;
   pp_format_decoder (&pp) = default_tree_printer;
   pp_show_color (&pp) = pp_show_color (global_dc->printer);
   pp.buffer->stream = fp;
-  dump_to_pp (&pp);
+  dump_to_pp (&pp, ext_state);
   pp_flush (&pp);
 }
 
 /* Dump this path in multiline form to stderr.  */
 
 DEBUG_FUNCTION void
-exploded_path::dump () const
+exploded_path::dump (const extrinsic_state *ext_state) const
 {
-  dump (stderr);
+  dump (stderr, ext_state);
+}
+
+/* Dump this path verbosely to FILENAME.  */
+
+void
+exploded_path::dump_to_file (const char *filename,
+const extrinsic_state &ext_state) const
+{
+  FILE *fp = fopen (filename, "w");
+  if (!fp)
+return;
+  pretty_printer pp;
+  pp_format_decoder (&pp) = default_tree_printer;
+  pp.buffer->stream = fp;
+  dump_to_pp (&pp, &ext_state);
+  pp_flush (&pp);
+  fclose (fp);
 }
 
 /* class feasibility_problem.  */
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 2d25e5e5167..1d8b73da7c4 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -895,9 +

[committed] analyzer: reimplement -Wanalyzer-use-of-uninitialized-value [PR95006 et al]

2021-07-15 Thread David Malcolm via Gcc-patches
The initial gcc 10 era commit of the analyzer (in
757bf1dff5e8cee34c0a75d06140ca972bfecfa7) had an implementation of
-Wanalyzer-use-of-uninitialized-value, but was sufficiently buggy
that I removed it in 78b9783774bfd3540f38f5b1e3c7fc9f719653d7 before
the release of gcc 10.1

This patch reintroduces the warning, heavily rewritten, with (I hope)
a less buggy implementation this time, for GCC 12.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-2337-g33255ad3ac14e3953750fe0f2d82b901c2852ff6.

gcc/analyzer/ChangeLog:
PR analyzer/95006
PR analyzer/94713
PR analyzer/94714
* analyzer.cc (maybe_reconstruct_from_def_stmt): Split out
GIMPLE_ASSIGN case into...
(get_diagnostic_tree_for_gassign_1): New.
(get_diagnostic_tree_for_gassign): New.
* analyzer.h (get_diagnostic_tree_for_gassign): New decl.
* analyzer.opt (Wanalyzer-write-to-string-literal): New.
* constraint-manager.cc (class svalue_purger): New.
(constraint_manager::purge_state_involving): New.
* constraint-manager.h
(constraint_manager::purge_state_involving): New.
* diagnostic-manager.cc (saved_diagnostic::supercedes_p): New.
(dedupe_winners::handle_interactions): New.
(diagnostic_manager::emit_saved_diagnostics): Call it.
* diagnostic-manager.h (saved_diagnostic::supercedes_p): New decl.
* engine.cc (impl_region_model_context::warn): Convert return type
to bool.  Return false if the diagnostic isn't saved.
(impl_region_model_context::purge_state_involving): New.
(impl_sm_context::get_state): Use NULL ctxt when querying old
rvalue.
(impl_sm_context::set_next_state): Use new sval when querying old
state.
(class dump_path_diagnostic): Move to region-model.cc
(exploded_node::on_stmt): Move to on_stmt_pre and on_stmt_post.
Remove call to purge_state_involving.
(exploded_node::on_stmt_pre): New, based on the above.  Move most
of it to region_model::on_stmt_pre.
(exploded_node::on_stmt_post): Likewise, moving to
region_model::on_stmt_post.
(class stale_jmp_buf): Fix parent class to use curiously recurring
template pattern.
(feasibility_state::maybe_update_for_edge): Call on_call_pre and
on_call_post on gcalls.
* exploded-graph.h (impl_region_model_context::warn): Return bool.
(impl_region_model_context::purge_state_involving): New decl.
(exploded_node::on_stmt_pre): New decl.
(exploded_node::on_stmt_post): New decl.
* pending-diagnostic.h (pending_diagnostic::use_of_uninit_p): New.
(pending_diagnostic::supercedes_p): New.
* program-state.cc (sm_state_map::get_state): Inherit state for
conjured_svalue as well as initial_svalue.
(sm_state_map::purge_state_involving): Also support SK_CONJURED.
* region-model-impl-calls.cc (call_details::get_uncertainty):
Handle m_ctxt being NULL.
(call_details::get_or_create_conjured_svalue): New.
(region_model::impl_call_fgets): New.
(region_model::impl_call_fread): New.
* region-model-manager.cc
(region_model_manager::get_or_create_initial_value): Return an
uninitialized poisoned value for regions that can't have initial
values.
* region-model-reachability.cc
(reachable_regions::mark_escaped_clusters): Handle ctxt being
NULL.
* region-model.cc (region_to_value_map::purge_state_involving): New.
(poisoned_value_diagnostic::use_of_uninit_p): New.
(poisoned_value_diagnostic::emit): Handle POISON_KIND_UNINIT.
(poisoned_value_diagnostic::describe_final_event): Likewise.
(region_model::check_for_poison): New.
(region_model::on_assignment): Call it.
(class dump_path_diagnostic): Move here from engine.cc.
(region_model::on_stmt_pre): New, based on exploded_node::on_stmt.
(region_model::on_call_pre): Move the setting of the LHS to a
conjured svalue to before the checks for specific functions.
Handle "fgets", "fgets_unlocked", and "fread".
(region_model::purge_state_involving): New.
(region_model::handle_unrecognized_call): Handle ctxt being NULL.
(region_model::get_rvalue): Call check_for_poison.
(selftest::test_stack_frames): Use NULL for context when getting
uninitialized rvalue.
(selftest::test_alloca): Likewise.
* region-model.h (region_to_value_map::purge_state_involving): New
decl.
(call_details::get_or_create_conjured_svalue): New decl.
(region_model::on_stmt_pre): New decl.
(region_model::purge_state_involving): New decl.
(region_model::impl_call_fgets): New decl.
(region_model::impl_call_fread): New decl.
(region_model::check_for_poison): New decl.
   
