[PATCH] [i386] Revert x86_order_regs_for_local_alloc changes in r12-1669.

2021-06-24 Thread liuhongt via Gcc-patches
Still put general regs as first alloca order.

This should fix 2 failures introduced by r12-1669, also add xfail to new
failed testcases to temporarily avoid regression, eventually xfail should
be removed.

compare_test log on non-avx512 target

Tests that now fail, but worked before (6 tests):

gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kmovb[\t ] 4
gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times korb[\t ] 1
gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kxorb[\t ] 1
unix/-m32: gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kmovb[\t ] 4
unix/-m32: gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times korb[\t ] 1
unix/-m32: gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kxorb[\t ] 1

Tests that now work, but didn't before (2 tests):

unix/-m32: gcc.target/i386/avx512bw-pr70329-1.c execution test
unix/-m32: gcc.target/i386/pr96814.c execution test

Bootstrap is ok, so is regression test on x86-64-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/101185
* config/i386/i386.c (x86_order_regs_for_local_alloc):
Revert r12-1669.

gcc/testsuite/ChangeLog

PR target/101185
* gcc.target/i386/bitwise_mask_op-3.c: Add xfail to
temporarily avoid regression, eventually xfail should be
removed.
---
 gcc/config/i386/i386.c| 13 -
 gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c |  6 +++---
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3d5883b8d0e..c71c9e666a4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20476,15 +20476,6 @@ x86_order_regs_for_local_alloc (void)
int pos = 0;
int i;
 
-   /* When allocano cost of GENERAL_REGS is same as MASK_REGS, allocate
-  MASK_REGS first since it has already been disparaged. This is for
-  testcase bitwise_mask_op3.c where the input is allocated as mask
-  registers, then mask bitwise instructions should be used there.
-  Refer to pr101142.  */
-   /* Mask register.  */
-   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
- reg_alloc_order [pos++] = i;
-
/* First allocate the local general purpose registers.  */
for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
  if (GENERAL_REGNO_P (i) && call_used_or_fixed_reg_p (i))
@@ -20511,6 +20502,10 @@ x86_order_regs_for_local_alloc (void)
for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
  reg_alloc_order [pos++] = i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+ reg_alloc_order [pos++] = i;
+
/* x87 registers.  */
if (TARGET_SSE_MATH)
  for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c 
b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
index 4a9078615aa..352c49d6c6b 100644
--- a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
+++ b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
@@ -12,7 +12,7 @@ foo_orb (__m512i a, __m512i b)
   foo = m1 | m2;
 }
 
-/* { dg-final { scan-assembler-times "korb\[\t \]" "1" } }  */
+/* { dg-final { scan-assembler-times "korb\[\t \]" "1" { xfail *-*-* } } }  */
 
 void
 foo_xorb (__m512i a, __m512i b)
@@ -22,7 +22,7 @@ foo_xorb (__m512i a, __m512i b)
   foo = m1 ^ m2;
 }
 
-/* { dg-final { scan-assembler-times "kxorb\[\t \]" "1" } }  */
+/* { dg-final { scan-assembler-times "kxorb\[\t \]" "1" { xfail *-*-* } } }  */
 
 void
 foo_andb (__m512i a, __m512i b)
@@ -40,4 +40,4 @@ foo_andnb (__m512i a, __m512i b)
   foo = m1 & ~m2;
 }
 
-/* { dg-final { scan-assembler-times "kmovb\[\t \]" "4"} }  */
+/* { dg-final { scan-assembler-times "kmovb\[\t \]" "4" { xfail *-*-* } } }  */
-- 
2.27.0



Re: [PATCH] [i386] Revert x86_order_regs_for_local_alloc changes in r12-1669.

2021-06-24 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 24, 2021 at 10:44 AM liuhongt  wrote:
>
> Still put general regs as first alloca order.
>
> This should fix 2 failures introduced by r12-1669, also add xfail to new
> failed testcases to temporarily avoid regression, eventually xfail should
> be removed.
>
> compare_test log on non-avx512 target
>
> Tests that now fail, but worked before (6 tests):
>
> gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kmovb[\t ] 4
> gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times korb[\t ] 1
> gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kxorb[\t ] 1
> unix/-m32: gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kmovb[\t 
> ] 4
> unix/-m32: gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times korb[\t ] 
> 1
> unix/-m32: gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kxorb[\t 
> ] 1
>
> Tests that now work, but didn't before (2 tests):
>
> unix/-m32: gcc.target/i386/avx512bw-pr70329-1.c execution test
> unix/-m32: gcc.target/i386/pr96814.c execution test
>
> Bootstrap is ok, so is regression test on x86-64-linux-gnu{-m32,}.
> Ok for trunk?

Yes, let's start improvements from non-regressed state.

Thanks,
Uros.

> gcc/ChangeLog:
>
> PR target/101185
> * config/i386/i386.c (x86_order_regs_for_local_alloc):
> Revert r12-1669.
>
> gcc/testsuite/ChangeLog
>
> PR target/101185
> * gcc.target/i386/bitwise_mask_op-3.c: Add xfail to
> temporarily avoid regression, eventually xfail should be
> removed.
> ---
>  gcc/config/i386/i386.c| 13 -
>  gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c |  6 +++---
>  2 files changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 3d5883b8d0e..c71c9e666a4 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -20476,15 +20476,6 @@ x86_order_regs_for_local_alloc (void)
> int pos = 0;
> int i;
>
> -   /* When allocano cost of GENERAL_REGS is same as MASK_REGS, allocate
> -  MASK_REGS first since it has already been disparaged. This is for
> -  testcase bitwise_mask_op3.c where the input is allocated as mask
> -  registers, then mask bitwise instructions should be used there.
> -  Refer to pr101142.  */
> -   /* Mask register.  */
> -   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
> - reg_alloc_order [pos++] = i;
> -
> /* First allocate the local general purpose registers.  */
> for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>   if (GENERAL_REGNO_P (i) && call_used_or_fixed_reg_p (i))
> @@ -20511,6 +20502,10 @@ x86_order_regs_for_local_alloc (void)
> for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
>   reg_alloc_order [pos++] = i;
>
> +   /* Mask register.  */
> +   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
> + reg_alloc_order [pos++] = i;
> +
> /* x87 registers.  */
> if (TARGET_SSE_MATH)
>   for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
> diff --git a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c 
> b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
> index 4a9078615aa..352c49d6c6b 100644
> --- a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
> +++ b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
> @@ -12,7 +12,7 @@ foo_orb (__m512i a, __m512i b)
>foo = m1 | m2;
>  }
>
> -/* { dg-final { scan-assembler-times "korb\[\t \]" "1" } }  */
> +/* { dg-final { scan-assembler-times "korb\[\t \]" "1" { xfail *-*-* } } }  
> */
>
>  void
>  foo_xorb (__m512i a, __m512i b)
> @@ -22,7 +22,7 @@ foo_xorb (__m512i a, __m512i b)
>foo = m1 ^ m2;
>  }
>
> -/* { dg-final { scan-assembler-times "kxorb\[\t \]" "1" } }  */
> +/* { dg-final { scan-assembler-times "kxorb\[\t \]" "1" { xfail *-*-* } } }  
> */
>
>  void
>  foo_andb (__m512i a, __m512i b)
> @@ -40,4 +40,4 @@ foo_andnb (__m512i a, __m512i b)
>foo = m1 & ~m2;
>  }
>
> -/* { dg-final { scan-assembler-times "kmovb\[\t \]" "4"} }  */
> +/* { dg-final { scan-assembler-times "kmovb\[\t \]" "4" { xfail *-*-* } } }  
> */
> --
> 2.27.0
>


Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-06-24 Thread Richard Biener via Gcc-patches
On Wed, Jun 23, 2021 at 9:25 PM Andrew MacLeod  wrote:
>
> On 6/23/21 2:37 PM, Richard Biener via Gcc-patches wrote:
> > On June 23, 2021 5:03:05 PM GMT+02:00, Aldy Hernandez via Gcc-patches 
> >  wrote:
> >> The call to gimple_call_fntype() in gimple_call_return_type() may
> >> return
> >> NULL, which causes the TREE_TYPE(lhs) to ICE.  I think it would be best
> >> to
> >> return NULL (or void_type_node) rather than aborting.
> >>
> >> I'm running into this because fold_using_range::range_of_call, calls
> >> gimple_call_return_type which may ICE for builtins with no LHS.
> >> Instead
> >> of special casing things in range_of_call, perhaps it's best to plug
> >> the
> >> source.
> >>
> >> Does this sound reasonable?
> > No, you need to make sure to not call this on an internal function call 
> > instead.
> > Otherwise it is never NULL.
> >
> > Richard.
>
> Out of curiosity, why is it not OK to call this on an internal function
> call?   Shouldn't all calls return something at least? like VOIDmode if
> they don't return anything?  It just seems like it needs to be special
> cased either at any possible call site, or simply in the routine.   we
> stumbled across it and it wasn't obvious why.

Well, gimple_call_fntype simply returns NULL because internal functions
do not have any API/ABI.  So you either deal with a NULL return value
but then explicitely checking for an internal function call is clearly better
from a source documentation point of view.

I think the existing type == NULL check was likely added to avoid touching
too much code and we somehow didn't think of internal function calls
w/o a LHS (but why are you asking for the return type for a call w/o LHS?)

So yes, you could argue that

  if (type == NULL_TREE)
{
gcc_assert (gimple_call_internal_p (gs));
tree lhs = gimple_call_lhs (gs);
return lhs ? TREE_TYPE (lhs) : void_type_node;
}

would make gimple_call_return_type more robust.  But then I wonder
why the function exists in the first place ;)  I suppose like gimple_expr_type
it's one of those I'd eventually see to go away.

Richard.

> Andrew
>
>


Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec

2021-06-24 Thread Richard Biener via Gcc-patches
On Wed, Jun 23, 2021 at 12:22 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Jun 23, 2021 at 7:23 AM Trevor Saunders  
> > wrote:
> >>
> >> On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote:
> >> > On 6/21/21 1:15 AM, Richard Biener wrote:
> > [...]
> >> > >
> >> > > But maybe I'm misunderstanding C++ too much :/
> >> > >
> >> > > Well, I guess b) from above means auto_vec<> passing to
> >> > > vec<> taking functions will need changes?
> >> >
> >> > Converting an auto_vec object to a vec slices off its data members.
> >> > The auto_vec specialization has no data members so that's not
> >> > a bug in and of itself, but auto_vec does have data members
> >> > so that would be a bug.  The risk is not just passing it to
> >> > functions by value but also returning it.  That risk was made
> >> > worse by the addition of the move ctor.
> >>
> >> I would agree that the conversion from auto_vec<> to vec<> is
> >> questionable, and should get some work at some point, perhaps just
> >> passingauto_vec references is good enough, or perhaps there is value in
> >> some const_vec view to avoid having to rely on optimizations, I'm not
> >> sure without looking more at the usage.
> >
> > We do need to be able to provide APIs that work with both auto_vec<>
> > and vec<>, I agree that those currently taking a vec<> by value are
> > fragile (and we've had bugs there before), but I'm not ready to say
> > that changing them all to [const] vec<>& is OK.  The alternative
> > would be passing a const_vec<> by value, passing that along to
> > const vec<>& APIs should be valid then (I can see quite some API
> > boundary cleanups being necessary here ...).
>
> FWIW, as far as const_vec<> goes, we already have array_slice, which is
> mostly a version of std::span.  (The only real reason for not using
> std::span itself is that it requires a later version of C++.)
>
> Taking those as arguments has the advantage that we can pass normal
> arrays as well.

It's not really a "const" thing it seems though it certainly would not expose
any API that would reallocate the vector (which is the problematic part
of passing vec<> by value, not necessarily changing elements in-place).

Since array_slice doesn't inherit most of the vec<> API transforming an
API from vec<> to array_slice<> will be also quite some work.  But I
agree it might be useful for generic API stuff.

Richard.

> Thanks,
> Richard


Re: predcom: Refactor more by encapsulating global states

2021-06-24 Thread Kewen.Lin via Gcc-patches
Hi Martin,

on 2021/6/23 上午12:14, Martin Sebor wrote:
> On 6/21/21 8:35 PM, Kewen.Lin wrote:
>> Hi Richi and Martin,
>>

 Thanks Richi!  One draft (not ready for review) is attached for the further
 discussion.  It follows the idea of RAII-style cleanup.  I noticed that
 Martin suggested stepping forward to make tree_predictive_commoning_loop
 and its callees into one class (Thanks Martin), since there are not many
 this kind of C++-style work functions, I want to double confirm which 
 option
 do you guys prefer?

>>>
>>> Such general cleanup is of course desired - Giuliano started some of it 
>>> within
>>> GSoC two years ago in the attempt to thread the compilation process.  The
>>> cleanup then helps to get rid of global state which of course interferes 
>>> here
>>> (and avoids unnecessary use of TLS vars).
>>>
>>> So yes, encapsulating global state into a class and making accessors
>>> member functions is something that is desired (but a lot of mechanical
>>> work).
>>>
>>> Thanks
>>> Richard.
>>>
>>> I meant that not necessarily as something to include in this patch
>>> but as a suggestion for a future improvement.  If you'd like to
>>> tackle it at any point that would be great of course   In any
>>> event, thanks for double-checking!
>>>
>>> The attached patch looks good to me as well (more for the sake of
>>> style than anything else, declaring the class copy ctor and copy assignment 
>>> = delete would > make it clear it's not meant to be
>>> copied, although in this case it's unlikely to make a practical
>>> difference).
>>>
>>> Martin.
>>
>>
>> Thanks for your explanation!  Sorry for the late response.
>> As the way to encapsulate global state into a class and making accessors
>> member functions looks more complete, I gave up the RAII draft and
>> switched onto this way.
>>
>> This patch is to encapsulate global states into a class and
>> making their accessors as member functions, remove some
>> consequent useless clean up code, and do some clean up with
>> RAII.
> 
> Nice!
> 
> A further improvement worth considering (if you're so inclined :)
> is replacing the pcom_worker vec members with auto_vec (obviating
> having to explicitly release them) and for the same reason also
> replacing the comp_ptrs bare pointer members with auto_vecs.
> There may be other opportunities to do the same in individual
> functions (I'd look to get rid of as many calls to functions
> like XNEW()/XNEWVEC() and free() use auto_vec instead).
> 
> An unrelated but worthwhile change is to replace the FOR_EACH_
> loops with C++ 11 range loops, analogously to:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572315.html
> 
> Finally, the only loosely followed naming convention for member
> variables is to start them with the m_ prefix.
> 
> These just suggestions that could be done in a followup, not
> something I would consider prerequisite for accepting the patch
> as is if I were in a position to make such a decision.
> 

Many thanks for all the great suggestions!  I'll deal with them
in a follow up patch.


BR,
Kewen


Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec

2021-06-24 Thread Richard Biener via Gcc-patches
On Thu, Jun 24, 2021 at 12:56 AM Martin Sebor  wrote:
>
> On 6/23/21 1:43 AM, Richard Biener wrote:
> > On Wed, Jun 23, 2021 at 7:23 AM Trevor Saunders  
> > wrote:
> >>
> >> On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote:
> >>> On 6/21/21 1:15 AM, Richard Biener wrote:
> > [...]
> 
>  But maybe I'm misunderstanding C++ too much :/
> 
>  Well, I guess b) from above means auto_vec<> passing to
>  vec<> taking functions will need changes?
> >>>
> >>> Converting an auto_vec object to a vec slices off its data members.
> >>> The auto_vec specialization has no data members so that's not
> >>> a bug in and of itself, but auto_vec does have data members
> >>> so that would be a bug.  The risk is not just passing it to
> >>> functions by value but also returning it.  That risk was made
> >>> worse by the addition of the move ctor.
> >>
> >> I would agree that the conversion from auto_vec<> to vec<> is
> >> questionable, and should get some work at some point, perhaps just
> >> passingauto_vec references is good enough, or perhaps there is value in
> >> some const_vec view to avoid having to rely on optimizations, I'm not
> >> sure without looking more at the usage.
> >
> > We do need to be able to provide APIs that work with both auto_vec<>
> > and vec<>, I agree that those currently taking a vec<> by value are
> > fragile (and we've had bugs there before), but I'm not ready to say
> > that changing them all to [const] vec<>& is OK.  The alternative
> > would be passing a const_vec<> by value, passing that along to
> > const vec<>& APIs should be valid then (I can see quite some API
> > boundary cleanups being necessary here ...).
> >
> > But with all this I don't know how to adjust auto_vec<> to no
> > longer "decay" to vec<> but still being able to pass it to vec<>&
> > and being able to call vec<> member functions w/o jumping through
> > hoops.  Any hints on that?  private inheritance achieves the first
> > but also hides all the API ...
>
> The simplest way to do that is by preventing the implicit conversion
> between the two types and adding a to_vec() member function to
> auto_vec as Jason suggested.  This exposes the implicit conversions
> to the base vec, forcing us to review each and "fix" it either by
> changing the argument to either vec& or const vec&, or less often
> to avoid the indirection if necessary, by converting the auto_vec
> to vec explicitly by calling to_vec().  The attached diff shows
> a very rough POC of what  that might look like.  A side benefit
> is that it improves const-safety and makes the effects of functions
> on their callers easier to understand.
>
> But that's orthogonal to making auto_vec copy constructible and copy
> assignable.  That change can be made independently and with much less
> effort and no risk.

There's a bunch of stuff I can't review because of lack of C++ knowledge
(the vNULL changes).

I suppose that

+  /* Prevent implicit conversion from auto_vec.  Use auto_vec::to_vec()
+ instead.  */
+  template 
+  vec (auto_vec &) = delete;
+
+  template 
+  void operator= (auto_vec &) = delete;

still allows passing auto_vec<> to [const] vec<> & without the .to_vec so
it's just the by-value passing that becomes explicit?  If so then I agree
this is an improvement.  The patch is likely incomplete though or is
it really only so few signatures that need changing?

Thanks,
Richard.

> Martin


Re: predcom: Refactor more by encapsulating global states

2021-06-24 Thread Kewen.Lin via Gcc-patches
on 2021/6/23 下午3:22, Richard Biener wrote:
> On Tue, Jun 22, 2021 at 4:35 AM Kewen.Lin  wrote:
>>
>> Hi Richi and Martin,
>>

 Thanks Richi!  One draft (not ready for review) is attached for the further
 discussion.  It follows the idea of RAII-style cleanup.  I noticed that
 Martin suggested stepping forward to make tree_predictive_commoning_loop
 and its callees into one class (Thanks Martin), since there are not many
 this kind of C++-style work functions, I want to double confirm which 
 option
 do you guys prefer?

>>>
>>> Such general cleanup is of course desired - Giuliano started some of it 
>>> within
>>> GSoC two years ago in the attempt to thread the compilation process.  The
>>> cleanup then helps to get rid of global state which of course interferes 
>>> here
>>> (and avoids unnecessary use of TLS vars).
>>>
>>> So yes, encapsulating global state into a class and making accessors
>>> member functions is something that is desired (but a lot of mechanical
>>> work).
>>>
>>> Thanks
>>> Richard.
>>>
>>> I meant that not necessarily as something to include in this patch
>>> but as a suggestion for a future improvement.  If you'd like to
>>> tackle it at any point that would be great of course   In any
>>> event, thanks for double-checking!
>>>
>>> The attached patch looks good to me as well (more for the sake of
>>> style than anything else, declaring the class copy ctor and copy assignment 
>>> = delete would > make it clear it's not meant to be
>>> copied, although in this case it's unlikely to make a practical
>>> difference).
>>>
>>> Martin.
>>
>>
>> Thanks for your explanation!  Sorry for the late response.
>> As the way to encapsulate global state into a class and making accessors
>> member functions looks more complete, I gave up the RAII draft and
>> switched onto this way.
>>
>> This patch is to encapsulate global states into a class and
>> making their accessors as member functions, remove some
>> consequent useless clean up code, and do some clean up with
>> RAII.
>>
>> Bootstrapped/regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu, also
>> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>>
>> Is it ok for trunk?
> 
> OK, thanks for the work - Martins suggestions are good but indeed can be
> handled as followup.
> 

Thanks Richi!  Re-tested and committed in r12-1767, I will make up a
follow up patch for Martin's suggestions.

BR,
Kewen


Re: [EXTERNAL] Re: rs6000: Fix typos in float128 ISA3.1 support

2021-06-24 Thread Kewen.Lin via Gcc-patches
on 2021/6/24 上午12:58, Segher Boessenkool wrote:
> On Wed, Jun 23, 2021 at 12:17:07PM +0800, Kewen.Lin wrote:
 +#ifdef FLOAT128_HW_INSNS_ISA3_1
  TFtype __floattikf (TItype_ppc)
__attribute__ ((__ifunc__ ("__floattikf_resolve")));
>>>
>>> I wonder if we now need TItype_ppc at all anymore, btw?
>>
>> Sorry that I don't quite follow this question.
> 
> I thought it may do the same as just TItype now, but the ifunc stuff
> probably makes it different still :-)
> 

Ah, thanks for the clarification!  If I read it right, TItype is defined
with __attribute__ ((mode (TI))) while TItype_ppc is defined with 
__attribute__ ((__mode__ (__TI__))), the later writing looks special.

BR,
Kewen


Re: [PING][PATCH 9/13] v2 Use new per-location warning APIs in LTO

2021-06-24 Thread Richard Biener via Gcc-patches
On Mon, Jun 21, 2021 at 11:55 PM Martin Sebor via Gcc-patches
 wrote:
>
> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571980.html
>
> Looking for a review of the LTO changes to switch TREE_NO_WARNING to
> the suppress_warning() API.

Hmm, since the warning suppressions are on location ad-hoc data the appropriate
thing is to adjust the location streaming and that part seems to be missing?

So what you now stream is simply the "everything" fallback, correct?

In particular:

   else
-bp_pack_value (bp, TREE_NO_WARNING (expr), 1);
+/* FIXME: pack all warning bits.  */
+bp_pack_value (bp, warning_suppressed_p (expr), 1);

this looks like a wrong comment in that light.

-  else
-compare_values (TREE_NO_WARNING);
+  else if (t1->base.nowarning_flag != t2->base.nowarning_flag)
+return false;

uh.  Can you keep sth like TREE_NO_WARNING_RAW or so?

Thanks,
Richard.

> On 6/4/21 3:43 PM, Martin Sebor wrote:
> > The attached patch replaces the uses of TREE_NO_WARNING in the LTO
> > front end with the new suppress_warning() API.  It adds a couple of
> > FIXMEs that I plan to take care of in a follow up.
>


[committed] openmp: in_reduction clause support on target construct

2021-06-24 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds support for in_reduction clause on target construct, though
for now only for synchronous targets (without nowait clause).
The encountering thread in that case runs the target task and blocks until
the target region ends, so it is implemented by remapping it before entering
the target, initializing the private copy if not yet initialized for the
current thread and then using the remapped addresses for the mapping
addresses.
For nowait combined with in_reduction the patch contains a hack where the
nowait clause is ignored.  To implement it correctly, I think we would need
to create a new private variable for the in_reduction and initialize it before
doing the async target and adjust the map addresses to that private variable
and then pass a function pointer to the library routine with code where the 
callback
would remap the address to the current threads private variable and use 
in_reduction
combiner to combine the private variable we've created into the thread's copy.
The library would then need to make sure that the routine is called in some 
thread
participating in the parallel (and not in an unshackeled thread).

Bootstrapped/regtested on x86_64-linux and i686-linux (with no offloading)
and regtested on x86_64-linux with offloading to nvptx-none, committed to
trunk.

2021-06-24  Jakub Jelinek  

gcc/
* tree.h (OMP_CLAUSE_MAP_IN_REDUCTION): Document meaning for OpenMP.
* gimplify.c (gimplify_scan_omp_clauses): For OpenMP map clauses
with OMP_CLAUSE_MAP_IN_REDUCTION flag partially defer gimplification
of non-decl OMP_CLAUSE_DECL.  For OMP_CLAUSE_IN_REDUCTION on
OMP_TARGET user outer_ctx instead of ctx for placeholders and
initializer/combiner gimplification.
* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_MAP_IN_REDUCTION
on target constructs.
(lower_rec_input_clauses): Likewise.
(lower_omp_target): Likewise.
* omp-expand.c (expand_omp_target): Temporarily ignore nowait clause
on target if in_reduction is present.
gcc/c-family/
* c-common.h (enum c_omp_region_type): Add C_ORT_TARGET and
C_ORT_OMP_TARGET.
* c-omp.c (c_omp_split_clauses): For OMP_CLAUSE_IN_REDUCTION on
combined target constructs also add map (always, tofrom:) clause.
gcc/c/
* c-parser.c (omp_split_clauses): Pass C_ORT_OMP_TARGET instead of
C_ORT_OMP for clauses on target construct.
(OMP_TARGET_CLAUSE_MASK): Add in_reduction clause.
(c_parser_omp_target): For non-combined target add
map (always, tofrom:) clauses for OMP_CLAUSE_IN_REDUCTION.  Pass
C_ORT_OMP_TARGET to c_finish_omp_clauses.
* c-typeck.c (handle_omp_array_sections): Adjust ort handling
for addition of C_ORT_OMP_TARGET and simplify, mapping clauses are
never present on C_ORT_*DECLARE_SIMD.
(c_finish_omp_clauses): Likewise.  Handle OMP_CLAUSE_IN_REDUCTION
on C_ORT_OMP_TARGET, set OMP_CLAUSE_MAP_IN_REDUCTION on
corresponding map clauses.
gcc/cp/
* parser.c (cp_omp_split_clauses): Pass C_ORT_OMP_TARGET instead of
C_ORT_OMP for clauses on target construct.
(OMP_TARGET_CLAUSE_MASK): Add in_reduction clause.
(cp_parser_omp_target): For non-combined target add
map (always, tofrom:) clauses for OMP_CLAUSE_IN_REDUCTION.  Pass
C_ORT_OMP_TARGET to finish_omp_clauses.
* semantics.c (handle_omp_array_sections_1): Adjust ort handling
for addition of C_ORT_OMP_TARGET and simplify, mapping clauses are
never present on C_ORT_*DECLARE_SIMD.
(handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise.  Handle OMP_CLAUSE_IN_REDUCTION
on C_ORT_OMP_TARGET, set OMP_CLAUSE_MAP_IN_REDUCTION on
corresponding map clauses.
* pt.c (tsubst_expr): Pass C_ORT_OMP_TARGET instead of C_ORT_OMP for
clauses on target construct.
gcc/testsuite/
* c-c++-common/gomp/target-in-reduction-1.c: New test.
* c-c++-common/gomp/clauses-1.c: Add in_reduction clauses on
target or combined target constructs.
libgomp/
* testsuite/libgomp.c-c++-common/target-in-reduction-1.c: New test.
* testsuite/libgomp.c-c++-common/target-in-reduction-2.c: New test.
* testsuite/libgomp.c++/target-in-reduction-1.C: New test.
* testsuite/libgomp.c++/target-in-reduction-2.C: New test.

--- gcc/tree.h.jj   2021-06-23 09:57:34.624714820 +0200
+++ gcc/tree.h  2021-06-23 13:33:00.372434259 +0200
@@ -1651,7 +1651,8 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION(NODE) \
   TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
 /* Nonzero if this map clause is for an OpenACC compute construct's reduction
-   variable.  */
+   variable or OpenMP map clause mentioned also in in_reduction clause on the
+   same construct.  */
 #define OMP_CLAUSE_MAP_IN_REDUCTION(NOD

[PATCH] Fix SLP permute propagation error

2021-06-24 Thread Richard Biener
This fixes SLP permute propagation to not propagate across operations
that have different semantics on different lanes like for example
the recently added COMPLEX_ADD_ROT90.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-24  Richard Biener  

* tree-vect-slp.c (vect_optimize_slp): Do not propagate
across operations that have different semantics on different
lanes.
---
 gcc/tree-vect-slp.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 29db56ed532..69ee8faed09 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3680,18 +3680,34 @@ vect_optimize_slp (vec_info *vinfo)
{
  int idx = ipo[i-1];
  slp_tree node = vertices[idx].node;
- /* For leafs there's nothing to do - we've seeded permutes
-on those above.  */
- if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
+
+ /* Handle externals and constants optimistically throughout the
+iteration.  */
+ if (SLP_TREE_DEF_TYPE (node) == vect_external_def
+ || SLP_TREE_DEF_TYPE (node) == vect_constant_def)
continue;
 
  vertices[idx].visited = 1;
 
- /* We cannot move a permute across a store.  */
- if (STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (node))
- && DR_IS_WRITE
-  (STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (node
+ /* We do not handle stores with a permutation.  */
+ stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (node);
+ if (STMT_VINFO_DATA_REF (rep)
+ && DR_IS_WRITE (STMT_VINFO_DATA_REF (rep)))
continue;
+ /* We cannot move a permute across an operation that is
+not independent on lanes.  Note this is an explicit
+negative list since that's much shorter than the respective
+positive one but it's critical to keep maintaining it.  */
+ if (is_gimple_call (STMT_VINFO_STMT (rep)))
+   switch (gimple_call_combined_fn (STMT_VINFO_STMT (rep)))
+ {
+ case CFN_COMPLEX_ADD_ROT90:
+ case CFN_COMPLEX_ADD_ROT270:
+ case CFN_COMPLEX_MUL:
+ case CFN_COMPLEX_MUL_CONJ:
+   continue;
+ default:;
+ }
 
  int perm = -1;
  for (graph_edge *succ = slpg->vertices[idx].succ;
-- 
2.26.2


[PATCH] stor-layout: Avoid DECL_BIT_FIELD_REPRESENTATIVE with NULL TREE_TYPE [PR101172]

2021-06-24 Thread Jakub Jelinek via Gcc-patches
Hi!

finish_bitfield_representative has an early out if the field after a
bitfield has error_mark_node type, but that early out leads to TREE_TYPE
of the DECL_BIT_FIELD_REPRESENTATIVE being NULL, which breaks assumptions
on code that uses the DECL_BIT_FIELD_REPRESENTATIVE during error-recovery.

The following patch instead sets TREE_TYPE of the representative to
error_mark_node, something the users can deal with better.  At this point
the representative can be set as DECL_BIT_FIELD_REPRESENTATIVE for multiple
bitfields, so making sure that we clear the DECL_BIT_FIELD_REPRESENTATIVE
instead would be harder (but doable, e.g. with the error_mark_node TREE_TYPE
set by this patch set some flag in the caller and if the flag is there, walk
all the fields once again and clear all DECL_BIT_FIELD_REPRESENTATIVE that
have error_mark_node TREE_TYPE).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-06-24  Jakub Jelinek  

PR middle-end/101172
* stor-layout.c (finish_bitfield_representative): If nextf has
error_mark_node type, set repr type to error_mark_node too.

* gcc.dg/pr101172.c: New test.

--- gcc/stor-layout.c.jj2021-06-21 09:39:21.798485474 +0200
+++ gcc/stor-layout.c   2021-06-23 11:10:22.617680051 +0200
@@ -2086,7 +2086,10 @@ finish_bitfield_representative (tree rep
   /* If there was an error, the field may be not laid out
  correctly.  Don't bother to do anything.  */
   if (TREE_TYPE (nextf) == error_mark_node)
-   return;
+   {
+ TREE_TYPE (repr) = error_mark_node;
+ return;
+   }
   maxsize = size_diffop (DECL_FIELD_OFFSET (nextf),
 DECL_FIELD_OFFSET (repr));
   if (tree_fits_uhwi_p (maxsize))
--- gcc/testsuite/gcc.dg/pr101172.c.jj  2021-06-23 11:15:19.934670004 +0200
+++ gcc/testsuite/gcc.dg/pr101172.c 2021-06-23 11:15:07.255841009 +0200
@@ -0,0 +1,20 @@
+/* PR middle-end/101172 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+union U
+{
+  int a[3];
+  struct
+  {
+int a : 3;
+struct this_struct var;/* { dg-error "field 'var' has incomplete type" 
} */
+  } b;
+};
+
+const union U hello = {.a = {1, 2, 3}};
+
+void foo()
+{
+  int x = hello.b.a;
+}

Jakub



[PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-06-24 Thread Di Zhao via Gcc-patches
This patch enhances FRE by recording equivalences generated by conditional
statements, so that we can find more optimize opportunities in situations like
following two cases:

case 1:
void f (unsigned int a, unsigned int b)
{
  if (a == b)
{
  for (unsigned i = 0; i < a; i++)
  {
if (i == b)
  foo (); /* Unreachable */
  }
}
}
The "i == b" condition is always false, yet original FRE cannot predict this.
Because VN only stores "i < a" and "a == b", so there won't be "i == b"'s
result. (Moreover, VRP can't infer "i == b" is false either, as its boundary
calculation is hindered by the "unsigned" modifier.)

case 2:
Consider GIMPLE code:
   :
  if (a != 0)
  goto ; [INV]
  else
  goto ; [INV]

   :

   :
  # c = PHI 
  if (a != 0)
  goto ; [INV]
  else
  goto ; [INV]

   :
  if (c > x)
  goto ; [INV]
  else
  goto ; [INV]

   :
  __builtin_puts (&"Unreachable!"[0]);

   :
  __builtin_puts (&"a"[0]);

   :
  ...
The result of "if (c > x)" in bb4 is unknown. However bb2 and bb4 have the same
conditional statement, meanwhile bb2 dominates bb4, so it is predictable that
c==x at bb5 and c==y at bb7. Keeping record of this might bring further
optimizations, such as removing the conditional in bb5.

The basic idea is to use a hashmap to record additional equivalence information
generated by conditional statements.

Insert to the map:
1. When recording an EQ_EXPR=true predicate, e.g. "x==y is true", record
the equivalence of x and y at edge destiny in the map.
2. Consider case 2 above, when we fail to predict the result of a
conditional expression (at bb4), if following conditions be satisfied:
  a. There is a previous corresponding predicate. In this case, it 
will
  be "a != 0 is true" at bb3.
  b. bb3's single predecessor bb2 dominates bb4.
  c. bb3's conditional statement's predicate code (or inverted 
code) is
  identical with that of bb4. (This condition can be relaxed.)
Then we can try to find a PHI operation along A's true and false edge, and
record the equivalence of PHI operation's result and arguments at bb4's
true and false destinations. Regarding this case, "c==x at bb5" and
"c==y at bb7" will be recorded.

Use the map:
When we cannot find a prediction result for a conditional statement by
original process, replace conditional expression's operands with qualified
equivalence and try again.

As searching predicates and the ssa names to record are based on
value numbering, we need to "unwind" the equivalence map for iteration.

Bootstrapped and tested on x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu.

Regards,
Di Zhao

Extend FRE with an "equivalence map" for condition prediction.

2021-06-24  Di Zhao  

gcc/ChangeLog:
PR tree-optimization/101186
* tree-ssa-sccvn.c (vn_tracking_edge): Extracted utility function.
(dominated_by_p_w_unex): Moved upward, no change.
(vn_nary_op_get_predicated_value): Moved upward, no change.
(struct val_equiv_hasher): Hasher for the "equivalence map".
(compute_single_op_hash): Compute hashcode from ssa name.
(is_vn_qualified_at_bb): Check if vn_pval is valid at BB.
(val_equiv_insert): Insert into "equivalence map".
(vn_lookup_cond_result): Lookup conditional expression's result by VN.
(find_predicated_value_by_equiv): Search for predicated value,
utilizing equivalences that we have recorded.
(record_equiv_from_previous_edge): Record equivalence relation from a
previouse edge on current edge.
(record_equiv_from_previous_cond): Search for equivalences generated by
previous conditional statements, and record them on current BB's
successors.
(vn_nary_op_insert_pieces_predicated): Extract utility function. Insert
into the "equivalence map" for predicate like "x==y is true".
(free_rpo_vn): Free the "equivalence map".
(process_bb): Insert into & lookup from the "equivalence map".
(struct unwind_state): Add "equivalence map" unwind state.
(do_unwind): Unwind the "equivalence map".
(do_rpo_vn): Update "equivalence map" unwind state.

gcc/testsuite/ChangeLog:
PR tree-optimization/101186
* gcc.dg/tree-ssa/vrp03.c: Disable fre.
* gcc.dg/tree-ssa/ssa-fre-95.c: New test.
* gcc.dg/tree-ssa/ssa-fre-96.c: New test.


tree-optimization-101186.patch
Description: tree-optimization-101186.patch


[PATCH] df: Fix up handling of paradoxical subregs in debug insns [PR101170]

2021-06-24 Thread Jakub Jelinek via Gcc-patches
Hi!

The recent addition of gcc_assert (regno < endregno); triggers during
glibc build on m68k.
The problem is that RA decisions shouldn't depend on expressions in
DEBUG_INSNs and those expressions can contain paradoxical subregs of certain
pseudos.  If RA then decides to allocate the pseudo to a register
with very small hard register REGNO, we can trigger the new assert,
as (int) subreg_regno_offset may be negative on big endian and the small
REGNO + the negative offset can wrap around.

The following patch in that case records the range from the REGNO 0 to
endregno, before the addition of the assert as both regno and endregno are
unsigned it wouldn't record anything at all silently.

Bootstrapped/regtested on x86_64-linux and i686-linux and tested with a
cross compiler to m68k-liux on the testcase, ok for trunk?

2021-06-24  Jakub Jelinek  

PR middle-end/101170
* df-scan.c (df_ref_record): For paradoxical big-endian SUBREGs
where regno + subreg_regno_offset wraps around use 0 as starting
regno.

* gcc.dg/pr101170.c: New test.

--- gcc/df-scan.c.jj2021-06-22 10:04:46.371208994 +0200
+++ gcc/df-scan.c   2021-06-23 12:46:51.654678805 +0200
@@ -2576,9 +2576,21 @@ df_ref_record (enum df_ref_class cl,
 
   if (GET_CODE (reg) == SUBREG)
{
- regno += subreg_regno_offset (regno, GET_MODE (SUBREG_REG (reg)),
-   SUBREG_BYTE (reg), GET_MODE (reg));
- endregno = regno + subreg_nregs (reg);
+ int off = subreg_regno_offset (regno, GET_MODE (SUBREG_REG (reg)),
+SUBREG_BYTE (reg), GET_MODE (reg));
+ unsigned int nregno = regno + off;
+ endregno = nregno + subreg_nregs (reg);
+ if (off < 0 && regno < (unsigned) -off)
+   /* Deal with paradoxical SUBREGs on big endian where
+  in debug insns the hard reg number might be smaller
+  than -off, such as (subreg:DI (reg:SI 0 [+4 ]) 0));
+  RA decisions shouldn't be affected by debug insns
+  and so RA can decide to put pseudo into a hard reg
+  with small REGNO, even when it is referenced in
+  a paradoxical SUBREG in a debug insn.  */
+   regno = 0;
+ else
+   regno = nregno;
}
   else
endregno = END_REGNO (reg);
--- gcc/testsuite/gcc.dg/pr101170.c.jj  2021-06-23 12:27:08.866593960 +0200
+++ gcc/testsuite/gcc.dg/pr101170.c 2021-06-23 12:26:55.823769555 +0200
@@ -0,0 +1,37 @@
+/* PR middle-end/101170 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g" } */
+
+#include 
+
+struct S { int a; int b[4]; } s;
+va_list ap;
+int i;
+long long l;
+
+struct S
+foo (int x)
+{
+  struct S a = {};
+  do
+if (x)
+  return a;
+  while (1);
+}
+
+__attribute__((noipa)) void
+bar (void)
+{
+  for (; i; i++)
+l |= va_arg (ap, long long) << s.b[i];
+  if (l)
+foo (l);
+}
+
+void
+baz (int v, ...)
+{
+  va_start (ap, v);
+  bar ();
+  va_end (ap);
+}

Jakub



Re: [PATCH] stor-layout: Avoid DECL_BIT_FIELD_REPRESENTATIVE with NULL TREE_TYPE [PR101172]

2021-06-24 Thread Richard Biener
On Thu, 24 Jun 2021, Jakub Jelinek wrote:

> Hi!
> 
> finish_bitfield_representative has an early out if the field after a
> bitfield has error_mark_node type, but that early out leads to TREE_TYPE
> of the DECL_BIT_FIELD_REPRESENTATIVE being NULL, which breaks assumptions
> on code that uses the DECL_BIT_FIELD_REPRESENTATIVE during error-recovery.
> 
> The following patch instead sets TREE_TYPE of the representative to
> error_mark_node, something the users can deal with better.  At this point
> the representative can be set as DECL_BIT_FIELD_REPRESENTATIVE for multiple
> bitfields, so making sure that we clear the DECL_BIT_FIELD_REPRESENTATIVE
> instead would be harder (but doable, e.g. with the error_mark_node TREE_TYPE
> set by this patch set some flag in the caller and if the flag is there, walk
> all the fields once again and clear all DECL_BIT_FIELD_REPRESENTATIVE that
> have error_mark_node TREE_TYPE).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2021-06-24  Jakub Jelinek  
> 
>   PR middle-end/101172
>   * stor-layout.c (finish_bitfield_representative): If nextf has
>   error_mark_node type, set repr type to error_mark_node too.
> 
>   * gcc.dg/pr101172.c: New test.
> 
> --- gcc/stor-layout.c.jj  2021-06-21 09:39:21.798485474 +0200
> +++ gcc/stor-layout.c 2021-06-23 11:10:22.617680051 +0200
> @@ -2086,7 +2086,10 @@ finish_bitfield_representative (tree rep
>/* If there was an error, the field may be not laid out
>   correctly.  Don't bother to do anything.  */
>if (TREE_TYPE (nextf) == error_mark_node)
> - return;
> + {
> +   TREE_TYPE (repr) = error_mark_node;
> +   return;
> + }
>maxsize = size_diffop (DECL_FIELD_OFFSET (nextf),
>DECL_FIELD_OFFSET (repr));
>if (tree_fits_uhwi_p (maxsize))
> --- gcc/testsuite/gcc.dg/pr101172.c.jj2021-06-23 11:15:19.934670004 
> +0200
> +++ gcc/testsuite/gcc.dg/pr101172.c   2021-06-23 11:15:07.255841009 +0200
> @@ -0,0 +1,20 @@
> +/* PR middle-end/101172 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +union U
> +{
> +  int a[3];
> +  struct
> +  {
> +int a : 3;
> +struct this_struct var;  /* { dg-error "field 'var' has incomplete type" 
> } */
> +  } b;
> +};
> +
> +const union U hello = {.a = {1, 2, 3}};
> +
> +void foo()
> +{
> +  int x = hello.b.a;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


[PATCH] c: Fix C cast error-recovery [PR101171]

2021-06-24 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase ICEs during error-recovery, as build_c_cast calls
note_integer_operands on error_mark_node and that wraps it into
C_MAYBE_CONST_EXPR which is unexpected and causes ICE later on.
Seems most other callers of note_integer_operands check early if something
is error_mark_node and return before calling note_integer_operands on it.

The following patch fixes it by not calling on error_mark_node, another
possibility would be to handle error_mark_node in note_integer_operands and
just return it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-06-24  Jakub Jelinek  

PR c/101171
* c-typeck.c (build_c_cast): Don't call note_integer_operands on
error_mark_node.

* gcc.dg/pr101171.c: New test.

--- gcc/c/c-typeck.c.jj 2021-06-23 13:33:00.375434219 +0200
+++ gcc/c/c-typeck.c2021-06-23 17:51:17.501401208 +0200
@@ -6131,6 +6131,7 @@ build_c_cast (location_t loc, tree type,
  return value reflects this.  */
   if (int_operands
   && INTEGRAL_TYPE_P (type)
+  && value != error_mark_node
   && !EXPR_INT_CONST_OPERANDS (value))
 value = note_integer_operands (value);
 
--- gcc/testsuite/gcc.dg/pr101171.c.jj  2021-06-23 17:56:53.409896019 +0200
+++ gcc/testsuite/gcc.dg/pr101171.c 2021-06-23 17:56:28.668227851 +0200
@@ -0,0 +1,13 @@
+/* PR c/101171 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+extern void foo (void);
+int x = 0x1234;
+
+void
+bar (void)
+{
+  if (x != (sizeof ((enum T) 0x1234))) /* { dg-error "conversion to incomplete 
type" } */
+foo ();
+}

Jakub



Re: [PATCH] df: Fix up handling of paradoxical subregs in debug insns [PR101170]

2021-06-24 Thread Richard Biener
On Thu, 24 Jun 2021, Jakub Jelinek wrote:

> Hi!
> 
> The recent addition of gcc_assert (regno < endregno); triggers during
> glibc build on m68k.
> The problem is that RA decisions shouldn't depend on expressions in
> DEBUG_INSNs and those expressions can contain paradoxical subregs of certain
> pseudos.  If RA then decides to allocate the pseudo to a register
> with very small hard register REGNO, we can trigger the new assert,
> as (int) subreg_regno_offset may be negative on big endian and the small
> REGNO + the negative offset can wrap around.

Hm, I wonder if we should reset the debug_insn if the RA made a decision
that produces such non-sensical result for a debug_insn use?

> The following patch in that case records the range from the REGNO 0 to
> endregno, before the addition of the assert as both regno and endregno are
> unsigned it wouldn't record anything at all silently.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux and tested with a
> cross compiler to m68k-liux on the testcase, ok for trunk?

OK.

Thanks,
Richard.

> 2021-06-24  Jakub Jelinek  
> 
>   PR middle-end/101170
>   * df-scan.c (df_ref_record): For paradoxical big-endian SUBREGs
>   where regno + subreg_regno_offset wraps around use 0 as starting
>   regno.
> 
>   * gcc.dg/pr101170.c: New test.
> 
> --- gcc/df-scan.c.jj  2021-06-22 10:04:46.371208994 +0200
> +++ gcc/df-scan.c 2021-06-23 12:46:51.654678805 +0200
> @@ -2576,9 +2576,21 @@ df_ref_record (enum df_ref_class cl,
>  
>if (GET_CODE (reg) == SUBREG)
>   {
> -   regno += subreg_regno_offset (regno, GET_MODE (SUBREG_REG (reg)),
> - SUBREG_BYTE (reg), GET_MODE (reg));
> -   endregno = regno + subreg_nregs (reg);
> +   int off = subreg_regno_offset (regno, GET_MODE (SUBREG_REG (reg)),
> +  SUBREG_BYTE (reg), GET_MODE (reg));
> +   unsigned int nregno = regno + off;
> +   endregno = nregno + subreg_nregs (reg);
> +   if (off < 0 && regno < (unsigned) -off)
> + /* Deal with paradoxical SUBREGs on big endian where
> +in debug insns the hard reg number might be smaller
> +than -off, such as (subreg:DI (reg:SI 0 [+4 ]) 0));
> +RA decisions shouldn't be affected by debug insns
> +and so RA can decide to put pseudo into a hard reg
> +with small REGNO, even when it is referenced in
> +a paradoxical SUBREG in a debug insn.  */
> + regno = 0;
> +   else
> + regno = nregno;
>   }
>else
>   endregno = END_REGNO (reg);
> --- gcc/testsuite/gcc.dg/pr101170.c.jj2021-06-23 12:27:08.866593960 
> +0200
> +++ gcc/testsuite/gcc.dg/pr101170.c   2021-06-23 12:26:55.823769555 +0200
> @@ -0,0 +1,37 @@
> +/* PR middle-end/101170 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -g" } */
> +
> +#include 
> +
> +struct S { int a; int b[4]; } s;
> +va_list ap;
> +int i;
> +long long l;
> +
> +struct S
> +foo (int x)
> +{
> +  struct S a = {};
> +  do
> +if (x)
> +  return a;
> +  while (1);
> +}
> +
> +__attribute__((noipa)) void
> +bar (void)
> +{
> +  for (; i; i++)
> +l |= va_arg (ap, long long) << s.b[i];
> +  if (l)
> +foo (l);
> +}
> +
> +void
> +baz (int v, ...)
> +{
> +  va_start (ap, v);
> +  bar ();
> +  va_end (ap);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] df: Fix up handling of paradoxical subregs in debug insns [PR101170]

2021-06-24 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 24, 2021 at 12:13:18PM +0200, Richard Biener wrote:
> > The recent addition of gcc_assert (regno < endregno); triggers during
> > glibc build on m68k.
> > The problem is that RA decisions shouldn't depend on expressions in
> > DEBUG_INSNs and those expressions can contain paradoxical subregs of certain
> > pseudos.  If RA then decides to allocate the pseudo to a register
> > with very small hard register REGNO, we can trigger the new assert,
> > as (int) subreg_regno_offset may be negative on big endian and the small
> > REGNO + the negative offset can wrap around.
> 
> Hm, I wonder if we should reset the debug_insn if the RA made a decision
> that produces such non-sensical result for a debug_insn use?

For debug info purposes it isn't necessarily invalid.
If we extract later on only the well defined bits from the paradoxical
subreg (e.g. by using a lowpart subreg of the debug expr), or AND it with
a constant mask that only contains bits on the low part positions, then it
is fine.  And otherwise, it is useless for debug info anyway, we can't
express that in DWARF (low bits set to ..., high bits undefined) - there
the whole thing would be undefined.

Jakub



[PATCH] c: Fix up c_parser_has_attribute_expression [PR101176]

2021-06-24 Thread Jakub Jelinek via Gcc-patches
Hi!

This function keeps src_range member of the result uninitialized, which at
least under valgrind can show up later when those uninitialized location_t's
can make it into the IL or location_t hash tables.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2021-06-24  Jakub Jelinek  

PR c/101176
* c-parser.c (c_parser_has_attribute_expression): Set source range for
the result.

--- gcc/c/c-parser.c.jj 2021-06-23 13:33:00.374434233 +0200
+++ gcc/c/c-parser.c2021-06-23 23:53:45.146083923 +0200
@@ -8406,6 +8406,7 @@ c_parser_has_attribute_expression (c_par
 {
   gcc_assert (c_parser_next_token_is_keyword (parser,
  RID_BUILTIN_HAS_ATTRIBUTE));
+  location_t start = c_parser_peek_token (parser)->location;
   c_parser_consume_token (parser);
 
   c_inhibit_evaluation_warnings++;
@@ -8484,6 +8485,7 @@ c_parser_has_attribute_expression (c_par
 
   parser->translate_strings_p = save_translate_strings_p;
 
+  location_t finish = c_parser_peek_token (parser)->location;
   if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
 c_parser_consume_token (parser);
   else
@@ -8512,6 +8514,7 @@ c_parser_has_attribute_expression (c_par
   else
 result.value =  boolean_false_node;
 
+  set_c_expr_source_range (&result, start, finish);
   return result;
 }
 

Jakub



[RFC PATCH] i386: Add pack/unpack patterns for 64bit vectors [PR89021]

2021-06-24 Thread Uros Bizjak via Gcc-patches
2021-06-24  Uroš Bizjak  

gcc/
PR target/89021
* config/i386/i386-expand.c (ix86_expand_sse_unpack):
Handle V8QI and V4HI modes.
* config/i386/mmx.md (sse4_1_v4qiv4hi2):
New insn pattern.
(sse4_1_v4qiv4hi2): Ditto.
(mmxpackmode): New mode attribute.
(vec_pack_trunc_): New expander.
(mmxunpackmode): New mode attribute.
(vec_unpacks_lo_): New expander.
(vec_unpacks_hi_): Ditto.
(vec_unpacku_lo_): Ditto.
(vec_unpacku_hi_): Ditto.
* config/i386/i386.md (extsuffix): Move from ...
* config/i386/sse.md: ... here.

gcc/testsuite/

PR target/89021
* gcc.target/i386/pr97249-1.c (foo): Add #pragma
to avoid loop vectorization.
(foo1): Ditto.
(foo2): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

There is still one scan-tree-not failure in generic vectorization testsuite:

FAIL: gcc.dg/vect/vect-nb-iter-ub-3.c scan-tree-dump-not cunroll "loop
turned into non-loop; it never loops"

This probably happens due to the additional epilogue vectorization,
but I don't know how to "fix" this failure. Richi, can you perhaps
help me here?

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 2cb939e51c3..e9763eb5b3e 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -5161,6 +5161,18 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool 
unsigned_p, bool high_p)
  else
unpack = gen_sse4_1_sign_extendv2siv2di2;
  break;
+   case E_V8QImode:
+ if (unsigned_p)
+   unpack = gen_sse4_1_zero_extendv4qiv4hi2;
+ else
+   unpack = gen_sse4_1_sign_extendv4qiv4hi2;
+ break;
+   case E_V4HImode:
+ if (unsigned_p)
+   unpack = gen_sse4_1_zero_extendv2hiv2si2;
+ else
+   unpack = gen_sse4_1_sign_extendv2hiv2si2;
+ break;
default:
  gcc_unreachable ();
}
@@ -5172,10 +5184,24 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool 
unsigned_p, bool high_p)
}
   else if (high_p)
{
- /* Shift higher 8 bytes to lower 8 bytes.  */
- tmp = gen_reg_rtx (V1TImode);
- emit_insn (gen_sse2_lshrv1ti3 (tmp, gen_lowpart (V1TImode, src),
-GEN_INT (64)));
+ switch (GET_MODE_SIZE (imode))
+   {
+   case 16:
+ /* Shift higher 8 bytes to lower 8 bytes.  */
+ tmp = gen_reg_rtx (V1TImode);
+ emit_insn (gen_sse2_lshrv1ti3 (tmp, gen_lowpart (V1TImode, src),
+GEN_INT (64)));
+ break;
+   case 8:
+ /* Shift higher 4 bytes to lower 4 bytes.  */
+ tmp = gen_reg_rtx (V1DImode);
+ emit_insn (gen_mmx_lshrv1di3 (tmp, gen_lowpart (V1DImode, src),
+   GEN_INT (32)));
+ break;
+   default:
+ gcc_unreachable ();
+   }
+
  tmp = gen_lowpart (imode, tmp);
}
   else
@@ -5207,6 +5233,18 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool 
unsigned_p, bool high_p)
  else
unpack = gen_vec_interleave_lowv4si;
  break;
+   case E_V8QImode:
+ if (high_p)
+   unpack = gen_mmx_punpckhbw;
+ else
+   unpack = gen_mmx_punpcklbw;
+ break;
+   case E_V4HImode:
+ if (high_p)
+   unpack = gen_mmx_punpckhwd;
+ else
+   unpack = gen_mmx_punpcklwd;
+ break;
default:
  gcc_unreachable ();
}
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9043be3105d..9b619e2f78f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1000,6 +1000,9 @@ (define_code_iterator any_truncate [ss_truncate truncate 
us_truncate])
 (define_code_attr trunsuffix
   [(ss_truncate "s") (truncate "") (us_truncate "us")])
 
+;; Instruction suffix for SSE sign and zero extensions.
+(define_code_attr extsuffix [(sign_extend "sx") (zero_extend "zx")])
+
 ;; Used in signed and unsigned fix.
 (define_code_iterator any_fix [fix unsigned_fix])
 (define_code_attr fixsuffix [(fix "") (unsigned_fix "u")])
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 7a827dceb01..e887f03474d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2639,6 +2639,78 @@ (define_insn_and_split "mmx_punpckldq"
(set_attr "type" "mmxcvt,sselog,sselog")
(set_attr "mode" "DI,TI,TI")])
 
+(define_insn "sse4_1_v4qiv4hi2"
+  [(set (match_operand:V4HI 0 "register_operand" "=Yr,*x,Yw")
+   (any_extend:V4HI
+ (vec_select:V4QI
+   (match_operand:V8QI 1 "register_operand" "Yr,*x,Yw")
+   (parallel [(const_int 0) (const_int 1)
+  (const_int 2) (const_int 3)]]
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "%vpmovbw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_at

Re: [RFC PATCH] i386: Add pack/unpack patterns for 64bit vectors [PR89021]

2021-06-24 Thread Richard Biener
On Thu, 24 Jun 2021, Uros Bizjak wrote:

> 2021-06-24  Uroš Bizjak  
> 
> gcc/
> PR target/89021
> * config/i386/i386-expand.c (ix86_expand_sse_unpack):
> Handle V8QI and V4HI modes.
> * config/i386/mmx.md (sse4_1_v4qiv4hi2):
> New insn pattern.
> (sse4_1_v4qiv4hi2): Ditto.
> (mmxpackmode): New mode attribute.
> (vec_pack_trunc_): New expander.
> (mmxunpackmode): New mode attribute.
> (vec_unpacks_lo_): New expander.
> (vec_unpacks_hi_): Ditto.
> (vec_unpacku_lo_): Ditto.
> (vec_unpacku_hi_): Ditto.
> * config/i386/i386.md (extsuffix): Move from ...
> * config/i386/sse.md: ... here.
> 
> gcc/testsuite/
> 
> PR target/89021
> * gcc.target/i386/pr97249-1.c (foo): Add #pragma
> to avoid loop vectorization.
> (foo1): Ditto.
> (foo2): Ditto.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> There is still one scan-tree-not failure in generic vectorization testsuite:
> 
> FAIL: gcc.dg/vect/vect-nb-iter-ub-3.c scan-tree-dump-not cunroll "loop
> turned into non-loop; it never loops"
> 
> This probably happens due to the additional epilogue vectorization,
> but I don't know how to "fix" this failure. Richi, can you perhaps
> help me here?

I would suggest to add --param vect-epilogues-nomask=0 to
dg-additional-options to preserve what the testcase tested.

Richard.


Re: [PATCH][RFC] Add x86 subadd SLP pattern

2021-06-24 Thread Richard Biener
On Tue, 22 Jun 2021, Uros Bizjak wrote:

> On Tue, Jun 22, 2021 at 12:34 PM Richard Biener  wrote:
> >
> > On Tue, 22 Jun 2021, Uros Bizjak wrote:
> >
> > > On Tue, Jun 22, 2021 at 11:42 AM Richard Sandiford
> > >  wrote:
> > >
> > > > >> Well, the pattern is called addsub in the x86 world because highpart
> > > > >> does add and lowpart does sub. In left-to-right writing systems
> > > > >> highpart comes before lowpart, so you have addsub.
> > > > >
> > > > > The other targets mentioned do not seem to agree but I can live
> > > > > with that, thus I'll change back to addsub.
> > > >
> > > > FWIW, subadd sounds clearer to me too.  It seems surprising to put
> > > > imaginary before real when interpreting something as complex, for 
> > > > example.
> > > >
> > > > Putting the highpart first feels especially odd on an LE system like 
> > > > x86?
> > >
> > > The XMM vector is documented left to right with MSB at the left (c.f.
> > > most significant *DIGIT* of the number at the left)
> > >
> > > xmm[MSB]  xmm[LSB]
> > >
> > > so, looking at x86 ADDSUBPD insn documentation:
> > >
> > > xmm2[127:64] xmm2[63:0]
> > > ( + -)
> > > xmm1[127:64] xmm1[63:0]
> > > (=)
> > > xmm1[127:64] holds ADD
> > > xmm1[63:0] holds SUB
> > >
> > > xmm1[127:64] xmm1 [63:0]
> > > ADD SUB
> >
> > I think if we really want to resolve the "ambiguity" we have to
> > spell it out in the optab.  vec_addodd_subeven or vec_addsub_oddeven.
> > As I noted there are targets who have the opposite so we could
> > then add vec_addsub_evenodd (not vec_subadd_evenodd).
> >
> > Just tell me what you prefer - I'll adjust the patch accordingly.
> 
> I'd use addsub when add comes at the left of sub, and MSB is also
> considered at the left. subadd for when sub comes at the left of add
> where MSB is also at the left.
> 
> I think that the documentation should clear the ambiguity.
> 
> Otherwise, just my 0.02?, I don't want to bikeshed about this ad
> infinitum, so I'll stop there.

Yeah, so I've not changed anything but after the permute propagation
fix I just pushed I've added a testcase that would be broken without
it and adding the new CFN_VEC_ADDSUB case (which I now did).

Re-bootstrapped and tested on x86_64-unknown-linux-gnu, now pushed
since it helps me working with Tamar on the SLP pattern reorg to have
at least one matching pattern on x86_64.

Richard.

>From a6bfb9106f5605c0ce6ffe1ed91b50510e43343b Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Mon, 31 May 2021 13:19:01 +0200
Subject: [PATCH] Add x86 addsub SLP pattern
To: gcc-patches@gcc.gnu.org

This addds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1
instructions which compute { v0[0] - v1[0], v0[1], + v1[1], ... }
thus subtract, add alternating on lanes, starting with subtract.

It adds a corresponding optab and direct internal function,
vec_addsub$a3 and renames the existing i386 backend patterns to
the new canonical name.

The SLP pattern matches the exact alternating lane sequence rather
than trying to be clever and anticipating incoming permutes - we
could permute the two input vectors to the needed lane alternation,
do the addsub and then permute the result vector back but that's
only profitable in case the two input or the output permute will
vanish - something Tamars refactoring of SLP pattern recog should
make possible.

2021-06-17  Richard Biener  

* config/i386/sse.md (avx_addsubv4df3): Rename to
vec_addsubv4df3.
(avx_addsubv8sf3): Rename to vec_addsubv8sf3.
(sse3_addsubv2df3): Rename to vec_addsubv2df3.
(sse3_addsubv4sf3): Rename to vec_addsubv4sf3.
* internal-fn.def (VEC_ADDSUB): New internal optab fn.
* optabs.def (vec_addsub_optab): New optab.
* tree-vect-slp-patterns.c (class addsub_pattern): New.
(slp_patterns): Add addsub_pattern.
(vect_optimize_slp): Disable propagation across CFN_VEC_ADDSUB.
* tree-vectorizer.h (vect_pattern::vect_pattern): Make
m_ops optional.
* doc/md.texi (vec_addsub3): Document.

* gcc.target/i386/vect-addsubv2df.c: New testcase.
* gcc.target/i386/vect-addsubv4sf.c: Likewise.
* gcc.target/i386/vect-addsubv4df.c: Likewise.
* gcc.target/i386/vect-addsubv8sf.c: Likewise.
* gcc.target/i386/vect-addsub-2.c: Likewise.
* gcc.target/i386/vect-addsub-3.c: Likewise.
---
 gcc/config/i386/i386-builtin.def  |   8 +-
 gcc/config/i386/sse.md|   8 +-
 gcc/doc/md.texi   |   8 ++
 gcc/internal-fn.def   |   1 +
 gcc/optabs.def|   1 +
 gcc/testsuite/gcc.target/i386/vect-addsub-2.c |  21 
 gcc/testsuite/gcc.target/i386/vect-addsub-3.c |  38 +++
 .../gcc.target/i386/vect-addsubv2df.c |  42 
 .../gcc.target/i386/vect-addsubv4df.c |  36 +++
 .../gcc.target/i386/vect-addsubv4sf.c |  46 
 .../gcc.target/i386/vect-addsubv8sf.c 

[ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics

2021-06-24 Thread Prathamesh Kulkarni via Gcc-patches
Hi,
This patch replaces builtins for vdup_n and vmov_n.
The patch results in regression for pr51534.c.
Consider following function:

uint8x8_t f1 (uint8x8_t a) {
  return vcgt_u8(a, vdup_n_u8(0));
}

code-gen before patch:
f1:
vmov.i32  d16, #0  @ v8qi
vcgt.u8 d0, d0, d16
bx lr

code-gen after patch:
f1:
vceq.i8 d0, d0, #0
vmvnd0, d0
bx lr

I am not sure which one is better tho ?

Also, this patch regressed bf16_dup.c on arm-linux-gnueabi,
which is due to a missed opt in lowering. I had filed it as
PR98435, and posted a fix for it here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572648.html

Thanks,
Prathamesh
2021-06-24  Prathamesh Kulkarni  

PR target/66791
* gcc/config/arm/arm_neon.h (vdup_n_s8): Replace call to builtin
with constructor.
(vdup_n_s16): Likewise.
(vdup_n_s32): Likewise.
(vdup_n_s64): Likewise.
(vdup_n_u8): Likewise.
(vdup_n_u16): Likewise.
(vdup_n_u32): Likewise.
(vdup_n_u64): Likewise.
(vdup_n_p8): Likewise.
(vdup_n_p16): Likewise.
(vdup_n_p64): Likewise.
(vdup_n_f16): Likewise.
(vdup_n_f32): Likewise.
(vdupq_n_s8): Likewise.
(vdupq_n_s16): Likewise.
(vdupq_n_s32): Likewise.
(vdupq_n_s64): Likewise.
(vdupq_n_u8): Likewise.
(vdupq_n_u16): Likewise.
(vdupq_n_u32): Likewise.
(vdupq_n_u64): Likewise.
(vdupq_n_p8): Likewise.
(vdupq_n_p16): Likewise.
(vdupq_n_p64): Likewise.
(vdupq_n_f16): Likewise.
(vdupq_n_f32): Likewise.
(vmov_n_s8): Replace call to builtin with call to corresponding
vdup intrinsic.
(vmov_n_s16): Likewise.
(vmov_n_s32): Likewise.
(vmov_n_s64): Likewise.
(vmov_n_u8): Likewise.
(vmov_n_u16): Likewise.
(vmov_n_u32): Likewise.
(vmov_n_u64): Likewise.
(vmov_n_p8): Likewise.
(vmov_n_p16): Likewise.
(vmov_n_f16): Likewise.
(vmov_n_f32): Likewise. 
(vmovq_n_s8): Likewise. 
(vmovq_n_s16): Likewise.
(vmovq_n_s32): Likewise.
(vmovq_n_s64): Likewise.
(vmovq_n_u8): Likewise.
(vmovq_n_u16): Likewise.
(vmovq_n_u32): Likewise.
(vmovq_n_u64): Likewise.
(vmovq_n_p8): Likewise.
(vmovq_n_p16): Likewise.
(vmovq_n_f16): Likewise.
(vmovq_n_f32): Likewise.
* config/arm/arm_neon_builtins.def: Remove entries for vdup_n.

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3efcfa45229..bf26cd49d53 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -6625,63 +6625,63 @@ __extension__ extern __inline int8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vdup_n_s8 (int8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vdup_nv8qi ((__builtin_neon_qi) __a);
+  return (int8x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
 }
 
 __extension__ extern __inline int16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vdup_n_s16 (int16_t __a)
 {
-  return (int16x4_t)__builtin_neon_vdup_nv4hi ((__builtin_neon_hi) __a);
+  return (int16x4_t) {__a, __a, __a, __a};
 }
 
 __extension__ extern __inline int32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vdup_n_s32 (int32_t __a)
 {
-  return (int32x2_t)__builtin_neon_vdup_nv2si ((__builtin_neon_si) __a);
+  return (int32x2_t) {__a, __a};
 }
 
 __extension__ extern __inline float32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vdup_n_f32 (float32_t __a)
 {
-  return (float32x2_t)__builtin_neon_vdup_nv2sf ((__builtin_neon_sf) __a);
+  return (float32x2_t) {__a, __a};
 }
 
 __extension__ extern __inline uint8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vdup_n_u8 (uint8_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vdup_nv8qi ((__builtin_neon_qi) __a);
+  return (uint8x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
 }
 
 __extension__ extern __inline uint16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vdup_n_u16 (uint16_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vdup_nv4hi ((__builtin_neon_hi) __a);
+  return (uint16x4_t) {__a, __a, __a, __a};
 }
 
 __extension__ extern __inline uint32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vdup_n_u32 (uint32_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vdup_nv2si ((__builtin_neon_si) __a);
+  return (uint32x2_t) {__a, __a};
 }
 
 __extension__ extern __inline poly8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vdup_n_p8 (poly8_t __a)
 {
-  return (poly8x8_t)__builtin_neon_vdup_nv8qi ((__builtin_neon_qi) __a);
+  return (poly8x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
 }
 
 __extension__ extern __inline poly16x4_t
 __attribute__  ((__always_inline__, __g

[PATCH] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-06-24 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



This helps with generating the code for kernel hotpatches, which
contain individual functions and are loaded more than 2G away from
vmlinux.  This should not create performance regressions for the
normal use cases, because for local functions ld replaces @PLT calls
with direct calls.

gcc/ChangeLog:

* config/s390/s390.c (print_operand): Handle %K.
* config/s390/s390.md (*movdi_64): Use %K for larl.
(reload_base_64): Likewise.
(*sibcall_brc): Use %K for j.
(*sibcall_brcl): Use %K for jg.
(*sibcall_value_brc): Use %K for j.
(*sibcall_value_brcl): Use %K for jg.
(*bras): Use %K.
(*brasl): Likewise.
(*bras_r): Likewise.
(*brasl_r): Likewise.
(main_base_64): Use %K for larl.
(reload_base_64): Likewise.
(@split_stack_call): Use %K for jg.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/noPLT.C: Skip on s390x.
* gcc.target/s390/nodatarel-1.c: Move foostatic to the new
tests.
* gcc.target/s390/pr80080-4.c: Allow @PLT suffix.
* gcc.target/s390/risbg-ll-3.c: Likewise.
* gcc.target/s390/call.h: Common code for the new tests.
* gcc.target/s390/call31-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call31-z10-pic.c: New test.
* gcc.target/s390/call31-z10.c: New test.
* gcc.target/s390/call31-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call31-z9-pic.c: New test.
* gcc.target/s390/call31-z9.c: New test.
* gcc.target/s390/call64-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call64-z10-pic.c: New test.
* gcc.target/s390/call64-z10.c: New test.
* gcc.target/s390/call64-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call64-z9-pic.c: New test.
* gcc.target/s390/call64-z9.c: New test.
---
 gcc/config/s390/s390.c|  9 +
 gcc/config/s390/s390.md   | 26 ++---
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C   |  2 +-
 gcc/testsuite/gcc.target/s390/call.h  | 38 +++
 .../s390/call31-z10-pic-nodatarel.c   | 16 
 .../gcc.target/s390/call31-z10-pic.c  | 16 
 gcc/testsuite/gcc.target/s390/call31-z10.c| 15 
 .../gcc.target/s390/call31-z9-pic-nodatarel.c | 16 
 gcc/testsuite/gcc.target/s390/call31-z9-pic.c | 16 
 gcc/testsuite/gcc.target/s390/call31-z9.c | 15 
 .../s390/call64-z10-pic-nodatarel.c   | 17 +
 .../gcc.target/s390/call64-z10-pic.c  | 17 +
 gcc/testsuite/gcc.target/s390/call64-z10.c| 15 
 .../gcc.target/s390/call64-z9-pic-nodatarel.c | 17 +
 gcc/testsuite/gcc.target/s390/call64-z9-pic.c | 17 +
 gcc/testsuite/gcc.target/s390/call64-z9.c | 15 
 gcc/testsuite/gcc.target/s390/nodatarel-1.c   | 26 +
 gcc/testsuite/gcc.target/s390/pr80080-4.c |  2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c|  6 +--
 19 files changed, 258 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6bbeb640e1f..e7839044a40 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -7943,6 +7943,7 @@ print_operand_address (FILE *file, rtx addr)
 'E': print opcode suffix for branch on index instruction.
 'G': print the size of the operand in bytes.
 'J': print tls_load/tls_gdcall/tls_ldcall suffix
+'K': print @PLT suffix for call targets and load address values.
 'M': print the second word of a TImode operand.
 'N': print the second word of a DImode operand.
 'O': print only the displacement of a memory reference or address.
@@ -8129,6 +8130,14 @@ print_operand (FILE *file, rtx x, int code)
 case 'Y':
   print_shift_count_operand (file, x);
   return;
+
+case 'K':
+  if (TARGET_64BIT
+ && flag_pic
+ && GET_CODE (x) == SYMBOL_REF
+ && SYMBOL_REF_FUNCTION_P (x))
+   fprintf (file, "@PLT");
+  return

Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-06-24 Thread Cassio Neri via Gcc-patches
Thanks Jonathan.

Here are some benchmarks (assembly in [1]):
https://quick-bench.com/q/jclBXmi4QLDcRMLuuVpxTUsFmQw

Unfortunately, quick-bench times out unless some implementations are
commented out. You can copy the code and run it locally (needs google
benchmark) to get the full picture.

I really like Ulrich Drepper's implementation (version 8). GCC 11 emmits
really good code and looking at previous versions this has been very
consistent. This implementation also highlights very clearly that my hack
to calculate __is_multiple_of_100 (as opposed to % 100 used in version 7)
saves a ror instruction with remarkable performance effects.

In my AMD Ryzen 7 box, his implementation doesn't seem to be the fastest.
Nevertheless, compared to other fast alternatives, performance differences
are small and results seem very much platform dependent. In my opinion, his
implementation is the best, especially, given recent compiler changes. My
reasoning follows.

My original motivation was contrasting 'year % 400 == 0' with 'year % 16'
as in versions 1 and 2:

  __is_multiple_of_100 ? year % 400 == 0 : year % 4 == 0; // version 1
  __is_multiple_of_100 ? year % 16 == 0 : year % 4 == 0; // version 2

It's fair to expect that version 2 won't be slower than version 1. However,
this is the case and by quite a big margin. The emitted instructions are
very reasonable and, I guess, the issue is the choice of registers. I've
reimplemented half of version 2 in inline asm using similar register
choices as version 1 and got the performance boost that I was expecting.
(By no means I'm suggesting a non portable implementation here. It was just
an exercise to make my point. Also, it performs very poorly when compiled
by clang.)

The point here is that code emitted for sources like versions 1 and 2
(involving %) seems sensitive to very small variations and has been
changing across compiler releases in the last 3 years or so. (This can be
checked on godbolt.) Clang also seems very confused [2]. Even if the
emitted code for version 2 and others were good today, I fear a new
compiler release could regress. The stability of the code emitted for
Ulrich Drepper's implementation is much more reliable, its performance is
very good and its clarity is only compromised by my own hack for
__is_multiple_of_100. (Recall, however, that this hack is crucial for his
implementation's performance.)

Best wishes,
Cassio.

[1] https://godbolt.org/z/TbGYEqx33
[2] https://bugs.llvm.org/show_bug.cgi?id=50334

On Wed, Jun 23, 2021 at 6:52 PM Jonathan Wakely  wrote:

> On 23/06/21 18:51 +0100, Jonathan Wakely wrote:
> >Here's what I've committed. Tested x86_64-linux and powerpc64le-linux.
> >Pushed to trunk.
> >
> >
> >
>
> >commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb
> >Author: Cassio Neri 
> >Date:   Wed Jun 23 15:32:16 2021
> >
> >libstdc++: More efficient std::chrono::year::leap
> >
> >Simple change to std::chrono::year::is_leap. If a year is multiple of
> 100,
> >then it's divisible by 400 if and only if it's divisible by 16. The
> latter
> >allows for better code generation.
> >
> >The expression is then either y%16 or y%4 which are both powers of two
> >and so it can be rearranged to use simple bitmask operations.
> >
> >Co-authored-by: Jonathan Wakely 
> >Co-authored-by: Ulrich Drepper 
> >
> >libstdc++-v3/ChangeLog:
> >
> >* include/std/chrono (chrono::year::is_leap()): Optimize.
> >
> >diff --git a/libstdc++-v3/include/std/chrono
> b/libstdc++-v3/include/std/chrono
> >index 4631a727d73..863b6a27bdf 100644
> >--- a/libstdc++-v3/include/std/chrono
> >+++ b/libstdc++-v3/include/std/chrono
> >@@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >   // [1] https://github.com/cassioneri/calendar
> >   // [2]
> https://accu.org/journals/overload/28/155/overload155.pdf#page=16
> >
> >+  // Furthermore, if y%100 != 0, then y%400==0 is equivalent to
> y%16==0,
> >+  // so we can rearrange the expression to (mult_100 ? y % 4 : y %
> 16)==0
>
> But Ulrich pointed out I got my boolean logic all muddled up in the
> comment. Fixed with the attached patch!
>
>


[PATCH] x86: Compile CPUID functions with -mgeneral-regs-only

2021-06-24 Thread H.J. Lu via Gcc-patches
CPUID functions are used to detect CPU features.  If vector ISAs
are enabled, compiler is free to use them in these functions.  Add
__attribute__ ((target("general-regs-only"))) to CPUID functions
to avoid vector instructions.

gcc/

PR target/101185
* config/i386/cpuid.h (__get_cpuid_max): Add
__attribute__ ((target("general-regs-only"))).
(__get_cpuid): Likewise.
(__get_cpuid_count): Likewise.
(__cpuidex): Likewise.

gcc/testsuite/

PR target/101185
* gcc.target/i386/avx512-check.h (check_osxsave): Add
__attribute__ ((target("general-regs-only"))).
(main): Likewise.
---
 gcc/config/i386/cpuid.h  | 4 
 gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index aebc17c6827..74881ee91e5 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -243,6 +243,7 @@
pointer is non-null, then first four bytes of the signature
(as found in ebx register) are returned in location pointed by sig.  */
 
+__attribute__ ((target("general-regs-only")))
 static __inline unsigned int
 __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
 {
@@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
supported and returns 1 for valid cpuid information or 0 for
unsupported cpuid leaf.  All pointers are required to be non-null.  */
 
+__attribute__ ((target("general-regs-only")))
 static __inline int
 __get_cpuid (unsigned int __leaf,
 unsigned int *__eax, unsigned int *__ebx,
@@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
 
 /* Same as above, but sub-leaf can be specified.  */
 
+__attribute__ ((target("general-regs-only")))
 static __inline int
 __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
   unsigned int *__eax, unsigned int *__ebx,
@@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int 
__subleaf,
   return 1;
 }
 
+__attribute__ ((target("general-regs-only")))
 static __inline void
 __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
 {
diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h 
b/gcc/testsuite/gcc.target/i386/avx512-check.h
index 0a377dba1d5..406faf8fe03 100644
--- a/gcc/testsuite/gcc.target/i386/avx512-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
@@ -25,6 +25,7 @@ do_test (void)
 }
 #endif
 
+__attribute__ ((target("general-regs-only")))
 static int
 check_osxsave (void)
 {
@@ -34,6 +35,7 @@ check_osxsave (void)
   return (ecx & bit_OSXSAVE) != 0;
 }
 
+__attribute__ ((target("general-regs-only")))
 int
 main ()
 {
-- 
2.31.1



Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-06-24 Thread Richard Biener via Gcc-patches
On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches
 wrote:
>
> This patch enhances FRE by recording equivalences generated by conditional
> statements, so that we can find more optimize opportunities in situations like
> following two cases:
>
> case 1:
> void f (unsigned int a, unsigned int b)
> {
>   if (a == b)
> {
>   for (unsigned i = 0; i < a; i++)
>   {
> if (i == b)
>   foo (); /* Unreachable */
>   }
> }
> }
> The "i == b" condition is always false, yet original FRE cannot predict this.
> Because VN only stores "i < a" and "a == b", so there won't be "i == b"'s
> result. (Moreover, VRP can't infer "i == b" is false either, as its boundary
> calculation is hindered by the "unsigned" modifier.)
>
> case 2:
> Consider GIMPLE code:
>:
>   if (a != 0)
>   goto ; [INV]
>   else
>   goto ; [INV]
>
>:
>
>:
>   # c = PHI 
>   if (a != 0)
>   goto ; [INV]
>   else
>   goto ; [INV]
>
>:
>   if (c > x)
>   goto ; [INV]
>   else
>   goto ; [INV]
>
>:
>   __builtin_puts (&"Unreachable!"[0]);
>
>:
>   __builtin_puts (&"a"[0]);
>
>:
>   ...
> The result of "if (c > x)" in bb4 is unknown. However bb2 and bb4 have the 
> same
> conditional statement, meanwhile bb2 dominates bb4, so it is predictable that
> c==x at bb5 and c==y at bb7. Keeping record of this might bring further
> optimizations, such as removing the conditional in bb5.
>
> The basic idea is to use a hashmap to record additional equivalence 
> information
> generated by conditional statements.
>
> Insert to the map:
> 1. When recording an EQ_EXPR=true predicate, e.g. "x==y is true", record
> the equivalence of x and y at edge destiny in the map.
> 2. Consider case 2 above, when we fail to predict the result of a
> conditional expression (at bb4), if following conditions be satisfied:
>   a. There is a previous corresponding predicate. In this case, 
> it will
>   be "a != 0 is true" at bb3.
>   b. bb3's single predecessor bb2 dominates bb4.
>   c. bb3's conditional statement's predicate code (or inverted 
> code) is
>   identical with that of bb4. (This condition can be relaxed.)
> Then we can try to find a PHI operation along A's true and false edge, and
> record the equivalence of PHI operation's result and arguments at bb4's
> true and false destinations. Regarding this case, "c==x at bb5" and
> "c==y at bb7" will be recorded.
>
> Use the map:
> When we cannot find a prediction result for a conditional statement by
> original process, replace conditional expression's operands with qualified
> equivalence and try again.
>
> As searching predicates and the ssa names to record are based on
> value numbering, we need to "unwind" the equivalence map for iteration.
>
> Bootstrapped and tested on x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu.

I have some reservations about extending the ad-hoc "predicated value" code.

Some comments on the patch:

+/* hashtable & helpers to record equivalences at given bb.  */
+
+typedef struct val_equiv_s
+{
+  val_equiv_s *next;
+  val_equiv_s *unwind_to;
+  hashval_t hashcode;
+  /* SSA name this val_equiv_s is associated with.  */
+  tree name;
+  /* RESULT in a vn_pval entry is SSA name of a equivalence.  */
+  vn_pval *values;
+} * val_equiv_t;

all of this (and using a hashtable for recording) is IMHO a bit overkill.
Since you only ever record equivalences for values the more natural
place to hook those in is the vn_ssa_aux structure where we also
record the availability chain.

There's little commentary in the new code, in particular function-level
comments are missing everywhere.

There's complexity issues, like I see val_equiv_insert has a "recurse"
feature but also find_predicated_value_by_equiv is quadratic in
the number of equivalences of the lhs/rhs.  Without knowing what
the recursion on the former is for - nothing tells me - I suspect
either of both should be redundant.

You seem to record equivalences at possible use points which
looks odd at best - I'd expected equivalences being recorded
at the same point we record predicated values and for the
current condition, not the one determining some other predication.
What was the motivation to do it the way you do it?

Why is the code conditional on 'iterate'?

You've probably realized that there's no "nice" way to
handle temporary equivalences in a VN scheme using
hashing for expressions (unless you degrade hashes a lot).

You quote opportunities that are catched with this like

+  if (a != 0)
+{
+  c = b;
+}
+

Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only

2021-06-24 Thread Richard Biener via Gcc-patches
On Thu, Jun 24, 2021 at 2:13 PM H.J. Lu via Gcc-patches
 wrote:
>
> CPUID functions are used to detect CPU features.  If vector ISAs
> are enabled, compiler is free to use them in these functions.  Add
> __attribute__ ((target("general-regs-only"))) to CPUID functions
> to avoid vector instructions.

But there are GPR instructions not in x86_64, so shouldn't
we use target("march=x86_64") or so?  Note doing either will
of course prevent inlining of those "inlines".

So I'm not sure how much of a fix this is ... the error will almost
always be visible in the caller as well.

> gcc/
>
> PR target/101185
> * config/i386/cpuid.h (__get_cpuid_max): Add
> __attribute__ ((target("general-regs-only"))).
> (__get_cpuid): Likewise.
> (__get_cpuid_count): Likewise.
> (__cpuidex): Likewise.
>
> gcc/testsuite/
>
> PR target/101185
> * gcc.target/i386/avx512-check.h (check_osxsave): Add
> __attribute__ ((target("general-regs-only"))).
> (main): Likewise.
> ---
>  gcc/config/i386/cpuid.h  | 4 
>  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
>  2 files changed, 6 insertions(+)
>
> diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> index aebc17c6827..74881ee91e5 100644
> --- a/gcc/config/i386/cpuid.h
> +++ b/gcc/config/i386/cpuid.h
> @@ -243,6 +243,7 @@
> pointer is non-null, then first four bytes of the signature
> (as found in ebx register) are returned in location pointed by sig.  */
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline unsigned int
>  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
>  {
> @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> supported and returns 1 for valid cpuid information or 0 for
> unsupported cpuid leaf.  All pointers are required to be non-null.  */
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline int
>  __get_cpuid (unsigned int __leaf,
>  unsigned int *__eax, unsigned int *__ebx,
> @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
>
>  /* Same as above, but sub-leaf can be specified.  */
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline int
>  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
>unsigned int *__eax, unsigned int *__ebx,
> @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int 
> __subleaf,
>return 1;
>  }
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline void
>  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h 
> b/gcc/testsuite/gcc.target/i386/avx512-check.h
> index 0a377dba1d5..406faf8fe03 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> @@ -25,6 +25,7 @@ do_test (void)
>  }
>  #endif
>
> +__attribute__ ((target("general-regs-only")))
>  static int
>  check_osxsave (void)
>  {
> @@ -34,6 +35,7 @@ check_osxsave (void)
>return (ecx & bit_OSXSAVE) != 0;
>  }
>
> +__attribute__ ((target("general-regs-only")))
>  int
>  main ()
>  {
> --
> 2.31.1
>


Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only

2021-06-24 Thread H.J. Lu via Gcc-patches
On Thu, Jun 24, 2021 at 5:35 AM Richard Biener
 wrote:
>
> On Thu, Jun 24, 2021 at 2:13 PM H.J. Lu via Gcc-patches
>  wrote:
> >
> > CPUID functions are used to detect CPU features.  If vector ISAs
> > are enabled, compiler is free to use them in these functions.  Add
> > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > to avoid vector instructions.
>
> But there are GPR instructions not in x86_64, so shouldn't
> we use target("march=x86_64") or so?  Note doing either will
> of course prevent inlining of those "inlines".

Does -march=x86_64, which enables CMOV and other GPR
ISAs,  work for -m32?

> So I'm not sure how much of a fix this is ... the error will almost
> always be visible in the caller as well.

I think _attribute__ ((target("general-regs-only"))) is a step
forward.

> > gcc/
> >
> > PR target/101185
> > * config/i386/cpuid.h (__get_cpuid_max): Add
> > __attribute__ ((target("general-regs-only"))).
> > (__get_cpuid): Likewise.
> > (__get_cpuid_count): Likewise.
> > (__cpuidex): Likewise.
> >
> > gcc/testsuite/
> >
> > PR target/101185
> > * gcc.target/i386/avx512-check.h (check_osxsave): Add
> > __attribute__ ((target("general-regs-only"))).
> > (main): Likewise.
> > ---
> >  gcc/config/i386/cpuid.h  | 4 
> >  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
> >  2 files changed, 6 insertions(+)
> >
> > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> > index aebc17c6827..74881ee91e5 100644
> > --- a/gcc/config/i386/cpuid.h
> > +++ b/gcc/config/i386/cpuid.h
> > @@ -243,6 +243,7 @@
> > pointer is non-null, then first four bytes of the signature
> > (as found in ebx register) are returned in location pointed by sig.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline unsigned int
> >  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> >  {
> > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int 
> > *__sig)
> > supported and returns 1 for valid cpuid information or 0 for
> > unsupported cpuid leaf.  All pointers are required to be non-null.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline int
> >  __get_cpuid (unsigned int __leaf,
> >  unsigned int *__eax, unsigned int *__ebx,
> > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
> >
> >  /* Same as above, but sub-leaf can be specified.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline int
> >  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> >unsigned int *__eax, unsigned int *__ebx,
> > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int 
> > __subleaf,
> >return 1;
> >  }
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline void
> >  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
> >  {
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h 
> > b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > index 0a377dba1d5..406faf8fe03 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > @@ -25,6 +25,7 @@ do_test (void)
> >  }
> >  #endif
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static int
> >  check_osxsave (void)
> >  {
> > @@ -34,6 +35,7 @@ check_osxsave (void)
> >return (ecx & bit_OSXSAVE) != 0;
> >  }
> >
> > +__attribute__ ((target("general-regs-only")))
> >  int
> >  main ()
> >  {
> > --
> > 2.31.1
> >



-- 
H.J.


Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only

2021-06-24 Thread Richard Biener via Gcc-patches
On Thu, Jun 24, 2021 at 2:42 PM H.J. Lu  wrote:
>
> On Thu, Jun 24, 2021 at 5:35 AM Richard Biener
>  wrote:
> >
> > On Thu, Jun 24, 2021 at 2:13 PM H.J. Lu via Gcc-patches
> >  wrote:
> > >
> > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > are enabled, compiler is free to use them in these functions.  Add
> > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > to avoid vector instructions.
> >
> > But there are GPR instructions not in x86_64, so shouldn't
> > we use target("march=x86_64") or so?  Note doing either will
> > of course prevent inlining of those "inlines".
>
> Does -march=x86_64, which enables CMOV and other GPR
> ISAs,  work for -m32?

I don't think so.  I'm also not sure whether -march=xyz in a
target attribute overrides -mavx512f on the command-line ;)

> > So I'm not sure how much of a fix this is ... the error will almost
> > always be visible in the caller as well.
>
> I think _attribute__ ((target("general-regs-only"))) is a step
> forward.

That I agree to, but then the cpuid code is likely written the
way it is to allow inlining.  But code using CPUID should best compile
functions under the check with additional target attribute
(or in a separate TU) rather than compiling everything with
extra -mXYZ and trying to "disable" things in the dispatching
code (and the code leading to it!).

Richard.

> > > gcc/
> > >
> > > PR target/101185
> > > * config/i386/cpuid.h (__get_cpuid_max): Add
> > > __attribute__ ((target("general-regs-only"))).
> > > (__get_cpuid): Likewise.
> > > (__get_cpuid_count): Likewise.
> > > (__cpuidex): Likewise.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/101185
> > > * gcc.target/i386/avx512-check.h (check_osxsave): Add
> > > __attribute__ ((target("general-regs-only"))).
> > > (main): Likewise.
> > > ---
> > >  gcc/config/i386/cpuid.h  | 4 
> > >  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
> > >  2 files changed, 6 insertions(+)
> > >
> > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> > > index aebc17c6827..74881ee91e5 100644
> > > --- a/gcc/config/i386/cpuid.h
> > > +++ b/gcc/config/i386/cpuid.h
> > > @@ -243,6 +243,7 @@
> > > pointer is non-null, then first four bytes of the signature
> > > (as found in ebx register) are returned in location pointed by sig.  
> > > */
> > >
> > > +__attribute__ ((target("general-regs-only")))
> > >  static __inline unsigned int
> > >  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> > >  {
> > > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int 
> > > *__sig)
> > > supported and returns 1 for valid cpuid information or 0 for
> > > unsupported cpuid leaf.  All pointers are required to be non-null.  */
> > >
> > > +__attribute__ ((target("general-regs-only")))
> > >  static __inline int
> > >  __get_cpuid (unsigned int __leaf,
> > >  unsigned int *__eax, unsigned int *__ebx,
> > > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
> > >
> > >  /* Same as above, but sub-leaf can be specified.  */
> > >
> > > +__attribute__ ((target("general-regs-only")))
> > >  static __inline int
> > >  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> > >unsigned int *__eax, unsigned int *__ebx,
> > > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int 
> > > __subleaf,
> > >return 1;
> > >  }
> > >
> > > +__attribute__ ((target("general-regs-only")))
> > >  static __inline void
> > >  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
> > >  {
> > > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h 
> > > b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > index 0a377dba1d5..406faf8fe03 100644
> > > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > @@ -25,6 +25,7 @@ do_test (void)
> > >  }
> > >  #endif
> > >
> > > +__attribute__ ((target("general-regs-only")))
> > >  static int
> > >  check_osxsave (void)
> > >  {
> > > @@ -34,6 +35,7 @@ check_osxsave (void)
> > >return (ecx & bit_OSXSAVE) != 0;
> > >  }
> > >
> > > +__attribute__ ((target("general-regs-only")))
> > >  int
> > >  main ()
> > >  {
> > > --
> > > 2.31.1
> > >
>
>
>
> --
> H.J.


Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only

2021-06-24 Thread H.J. Lu via Gcc-patches
On Thu, Jun 24, 2021 at 5:47 AM Richard Biener
 wrote:
>
> On Thu, Jun 24, 2021 at 2:42 PM H.J. Lu  wrote:
> >
> > On Thu, Jun 24, 2021 at 5:35 AM Richard Biener
> >  wrote:
> > >
> > > On Thu, Jun 24, 2021 at 2:13 PM H.J. Lu via Gcc-patches
> > >  wrote:
> > > >
> > > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > > are enabled, compiler is free to use them in these functions.  Add
> > > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > > to avoid vector instructions.
> > >
> > > But there are GPR instructions not in x86_64, so shouldn't
> > > we use target("march=x86_64") or so?  Note doing either will
> > > of course prevent inlining of those "inlines".
> >
> > Does -march=x86_64, which enables CMOV and other GPR
> > ISAs,  work for -m32?
>
> I don't think so.  I'm also not sure whether -march=xyz in a
> target attribute overrides -mavx512f on the command-line ;)
>
> > > So I'm not sure how much of a fix this is ... the error will almost
> > > always be visible in the caller as well.
> >
> > I think _attribute__ ((target("general-regs-only"))) is a step
> > forward.
>
> That I agree to, but then the cpuid code is likely written the
> way it is to allow inlining.  But code using CPUID should best compile
> functions under the check with additional target attribute
> (or in a separate TU) rather than compiling everything with
> extra -mXYZ and trying to "disable" things in the dispatching
> code (and the code leading to it!).

CPUID checks in GCC tests should be compiled with noinline, ...
plus minimum ISAs allowed.

> Richard.
>
> > > > gcc/
> > > >
> > > > PR target/101185
> > > > * config/i386/cpuid.h (__get_cpuid_max): Add
> > > > __attribute__ ((target("general-regs-only"))).
> > > > (__get_cpuid): Likewise.
> > > > (__get_cpuid_count): Likewise.
> > > > (__cpuidex): Likewise.
> > > >
> > > > gcc/testsuite/
> > > >
> > > > PR target/101185
> > > > * gcc.target/i386/avx512-check.h (check_osxsave): Add
> > > > __attribute__ ((target("general-regs-only"))).
> > > > (main): Likewise.
> > > > ---
> > > >  gcc/config/i386/cpuid.h  | 4 
> > > >  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
> > > >  2 files changed, 6 insertions(+)
> > > >
> > > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> > > > index aebc17c6827..74881ee91e5 100644
> > > > --- a/gcc/config/i386/cpuid.h
> > > > +++ b/gcc/config/i386/cpuid.h
> > > > @@ -243,6 +243,7 @@
> > > > pointer is non-null, then first four bytes of the signature
> > > > (as found in ebx register) are returned in location pointed by sig. 
> > > >  */
> > > >
> > > > +__attribute__ ((target("general-regs-only")))
> > > >  static __inline unsigned int
> > > >  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> > > >  {
> > > > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int 
> > > > *__sig)
> > > > supported and returns 1 for valid cpuid information or 0 for
> > > > unsupported cpuid leaf.  All pointers are required to be non-null.  
> > > > */
> > > >
> > > > +__attribute__ ((target("general-regs-only")))
> > > >  static __inline int
> > > >  __get_cpuid (unsigned int __leaf,
> > > >  unsigned int *__eax, unsigned int *__ebx,
> > > > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
> > > >
> > > >  /* Same as above, but sub-leaf can be specified.  */
> > > >
> > > > +__attribute__ ((target("general-regs-only")))
> > > >  static __inline int
> > > >  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> > > >unsigned int *__eax, unsigned int *__ebx,
> > > > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned 
> > > > int __subleaf,
> > > >return 1;
> > > >  }
> > > >
> > > > +__attribute__ ((target("general-regs-only")))
> > > >  static __inline void
> > > >  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
> > > >  {
> > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h 
> > > > b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > > index 0a377dba1d5..406faf8fe03 100644
> > > > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > > @@ -25,6 +25,7 @@ do_test (void)
> > > >  }
> > > >  #endif
> > > >
> > > > +__attribute__ ((target("general-regs-only")))
> > > >  static int
> > > >  check_osxsave (void)
> > > >  {
> > > > @@ -34,6 +35,7 @@ check_osxsave (void)
> > > >return (ecx & bit_OSXSAVE) != 0;
> > > >  }
> > > >
> > > > +__attribute__ ((target("general-regs-only")))
> > > >  int
> > > >  main ()
> > > >  {
> > > > --
> > > > 2.31.1
> > > >
> >
> >
> >
> > --
> > H.J.



-- 
H.J.


Re: [PATCH 5/7] Try inverted comparison for match_simplify in phiopt

2021-06-24 Thread Bernhard Reutner-Fischer via Gcc-patches
Hi Andrew,

just a nit..

On Wed, 23 Jun 2021 15:19:13 -0700
apinski--- via Gcc-patches  wrote:

> From: Andrew Pinski 
> 
> Since match and simplify does not have all of the inverted
> comparison patterns, it make sense to just have
> phi-opt try to do the inversion and try match and simplify again.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu.
> 
> Thanks,
> Andrew Pinski
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-phiopt.c (gimple_simplify_phiopt):
>   If "A ? B : C" fails to simplify, try "(!A) ? C : B".
> ---
>  gcc/tree-ssa-phiopt.c | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index 8b0e68c1e90..5f099eca343 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -828,7 +828,8 @@ phiopt_early_allow (enum tree_code code)
> with parts pushed if EARLY_P was true. Also rejects non allowed tree code
> if EARLY_P is set.
> Takes the comparison from COMP_STMT and two args, ARG0 and ARG1 and tries
> -   to simplify CMP ? ARG0 : ARG1.  */
> +   to simplify CMP ? ARG0 : ARG1.
> +   Also try to simplify (!CMP) ? ARG0 : ARG1 if the non-inverse failed.  */

commentary typo above, args need swap:
+   Also try to simplify (!CMP) ? ARG1 : ARG0 if the non-inverse failed.  */

thanks,


Re: [PATCH] c: Fix up c_parser_has_attribute_expression [PR101176]

2021-06-24 Thread Marek Polacek via Gcc-patches
On Thu, Jun 24, 2021 at 12:20:56PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> This function keeps src_range member of the result uninitialized, which at
> least under valgrind can show up later when those uninitialized location_t's
> can make it into the IL or location_t hash tables.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

Ok, thanks.
 
> 2021-06-24  Jakub Jelinek  
> 
>   PR c/101176
>   * c-parser.c (c_parser_has_attribute_expression): Set source range for
>   the result.
> 
> --- gcc/c/c-parser.c.jj   2021-06-23 13:33:00.374434233 +0200
> +++ gcc/c/c-parser.c  2021-06-23 23:53:45.146083923 +0200
> @@ -8406,6 +8406,7 @@ c_parser_has_attribute_expression (c_par
>  {
>gcc_assert (c_parser_next_token_is_keyword (parser,
> RID_BUILTIN_HAS_ATTRIBUTE));
> +  location_t start = c_parser_peek_token (parser)->location;
>c_parser_consume_token (parser);
>  
>c_inhibit_evaluation_warnings++;
> @@ -8484,6 +8485,7 @@ c_parser_has_attribute_expression (c_par
>  
>parser->translate_strings_p = save_translate_strings_p;
>  
> +  location_t finish = c_parser_peek_token (parser)->location;
>if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
>  c_parser_consume_token (parser);
>else
> @@ -8512,6 +8514,7 @@ c_parser_has_attribute_expression (c_par
>else
>  result.value =  boolean_false_node;
>  
> +  set_c_expr_source_range (&result, start, finish);
>return result;
>  }

Marek



[committed] libstdc++: Implement LWG 2762 for std::unique_ptr::operator*

2021-06-24 Thread Jonathan Wakely via Gcc-patches
The LWG issue proposes to add a conditional noexcept-specifier to
std::unique_ptr's dereference operator. The issue is currently in
Tentatively Ready status, but even if it isn't voted into the draft, we
can do it as a conforming extensions. This commit also adds a similar
noexcept-specifier to operator[] for the unique_ptr partial
specialization.

Also ensure that all dereference operators for shared_ptr are noexcept,
and adds tests for the std::optional accessors modified by the issue,
which were already noexcept in our implementation.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_base.h (__shared_ptr_access::operator[]):
Add noexcept.
* include/bits/unique_ptr.h (unique_ptr::operator*): Add
conditional noexcept as per LWG 2762.
* testsuite/20_util/shared_ptr/observers/array.cc: Check that
dereferencing cannot throw.
* testsuite/20_util/shared_ptr/observers/get.cc: Likewise.
* testsuite/20_util/optional/observers/lwg2762.cc: New test.
* testsuite/20_util/unique_ptr/lwg2762.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 17bc3848e065c0980523e1a1592f2f03b24b4f1c
Author: Jonathan Wakely 
Date:   Thu Jun 24 12:56:20 2021

libstdc++: Implement LWG 2762 for std::unique_ptr::operator*

The LWG issue proposes to add a conditional noexcept-specifier to
std::unique_ptr's dereference operator. The issue is currently in
Tentatively Ready status, but even if it isn't voted into the draft, we
can do it as a conforming extensions. This commit also adds a similar
noexcept-specifier to operator[] for the unique_ptr partial
specialization.

Also ensure that all dereference operators for shared_ptr are noexcept,
and adds tests for the std::optional accessors modified by the issue,
which were already noexcept in our implementation.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_base.h (__shared_ptr_access::operator[]):
Add noexcept.
* include/bits/unique_ptr.h (unique_ptr::operator*): Add
conditional noexcept as per LWG 2762.
* testsuite/20_util/shared_ptr/observers/array.cc: Check that
dereferencing cannot throw.
* testsuite/20_util/shared_ptr/observers/get.cc: Likewise.
* testsuite/20_util/optional/observers/lwg2762.cc: New test.
* testsuite/20_util/unique_ptr/lwg2762.cc: New test.

diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h 
b/libstdc++-v3/include/bits/shared_ptr_base.h
index eb9ad23ba1e..5be935d174d 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -1035,7 +1035,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   element_type&
-  operator[](ptrdiff_t __i) const
+  operator[](ptrdiff_t __i) const noexcept
   {
__glibcxx_assert(_M_get() != nullptr);
__glibcxx_assert(!extent<_Tp>::value || __i < extent<_Tp>::value);
diff --git a/libstdc++-v3/include/bits/unique_ptr.h 
b/libstdc++-v3/include/bits/unique_ptr.h
index 6e5537536e8..1781fe15649 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -402,7 +402,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// Dereference the stored pointer.
   typename add_lvalue_reference::type
-  operator*() const
+  operator*() const noexcept(noexcept(*std::declval()))
   {
__glibcxx_assert(get() != pointer());
return *get();
@@ -655,6 +655,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Access an element of owned array.
   typename std::add_lvalue_reference::type
   operator[](size_t __i) const
+  noexcept(noexcept(std::declval()[std::declval()]))
   {
__glibcxx_assert(get() != pointer());
return get()[__i];
diff --git a/libstdc++-v3/testsuite/20_util/optional/observers/lwg2762.cc 
b/libstdc++-v3/testsuite/20_util/optional/observers/lwg2762.cc
new file mode 100644
index 000..a0cf0bc19a0
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/optional/observers/lwg2762.cc
@@ -0,0 +1,21 @@
+// { dg-do compile { target c++17 }  }
+
+// LWG 2762 adds noexcept to operator-> and operator*
+#include 
+
+struct S
+{
+  void can_throw();
+  void cannot_throw() noexcept;
+};
+
+static_assert( ! noexcept(std::declval&>()->can_throw()) );
+static_assert( noexcept(std::declval&>()->cannot_throw()) );
+
+static_assert( noexcept(std::declval&>().operator->()) );
+static_assert( noexcept(std::declval&>().operator->()) );
+
+static_assert( noexcept(*std::declval&>()) );
+static_assert( noexcept(*std::declval&>()) );
+static_assert( noexcept(*std::declval&&>()) );
+static_assert( noexcept(*std::declval&&>()) );
diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/observers/array.cc 
b/libstdc++-v3/testsuite/20_util/shared_ptr/observers/array.cc
index f6

Re: [PATCH 0/3] Improve and document stdx::simd testsuite

2021-06-24 Thread Jonathan Wakely via Gcc-patches
On Wed, 23 Jun 2021 at 16:46, Jonathan Wakely wrote:
>
> On Tue, 8 Jun 2021 at 09:57, Matthias Kretz  wrote:
> >
> > As discussed a long time ago on IRC, this improves (i.e. decreases by 
> > default)
> > the verbosity of make check-simd, gives more verbosity options, and finally
> > documents how the simd testsuite is used and how it works. In addition, 
> > after
> > PR98834 was resolved, remove the -fno-tree-vrp workaround.
> >
> > Tested on x86_64-linux (and more).
>
> Great, thanks, I'm about to push all 3 patches to trunk.

I've pushed this fix for some typos in the new README.md
commit 07ba52849ffca26a3d461f94921b23a9cdbaea7f
Author: Jonathan Wakely 
Date:   Thu Jun 24 13:49:19 2021

libstdc++: Fix typos and markdown errors in new simd/README.md

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/experimental/simd/README.md: Fix typos.

diff --git a/libstdc++-v3/testsuite/experimental/simd/README.md 
b/libstdc++-v3/testsuite/experimental/simd/README.md
index db0d71f8d43..b82453df403 100644
--- a/libstdc++-v3/testsuite/experimental/simd/README.md
+++ b/libstdc++-v3/testsuite/experimental/simd/README.md
@@ -11,7 +11,7 @@
 ### `target_list`
 
 Similar to dejagnu target lists: E.g. 
-`target_list="unix{-march=sandybridge,-march=native/-ffast-math,-march=native/-ffinite-math-only}"
 
+`target_list="unix{-march=sandybridge,-march=native/-ffast-math,-march=native/-ffinite-math-only}"`
 
 would create three subdirs in `testsuite/simd/` to run the complete simd 
 testsuite first with `-march=sandybridge`, then with `-march=native 
 -ffast-math`, and finally with `-march=native -ffinite-math-only`.
@@ -21,7 +21,7 @@
 
 This variable can be set to a path to a file which is equivalent to a dejagnu 
 board. The file needs to be a valid `sh` script since it is sourced from the 
-`scripts/check_simd` script. It's purpose is to set the `target_list` variable 
+`scripts/check_simd` script. Its purpose is to set the `target_list` variable 
 depending on `$target_triplet` (or whatever else makes sense for you). Example:
 
 ```sh
@@ -52,7 +52,7 @@
 `power7` which always uses the flags `-mcpu=power7` and `-static` when 
 compiling tests and prepends `$HOME/bin/run_on_gccfarm gcc112` to test 
 executables. In `target_list` you can now use the name `power7`. E.g. 
-`target_list="power7 power7/-ffast-math"` or it's shorthand 
+`target_list="power7 power7/-ffast-math"` or its shorthand 
 `target_list="power7{,-ffast-math}"`.
 
 
@@ -109,7 +109,7 @@
 influencing the set of tests to generate and whether the test driver should 
 expect a failure.
 
-Then the test must at least `#include "bits/verify.h", which provides `main` 
+Then the test must at least `#include "bits/verify.h"`, which provides `main` 
 and declares a `template  void test()` function, which the test 
has 
 to define. The template parameter is set to `simd` type where `T` and 
 `Abi` are determined by the type and ABI subset dimensions.


Re: [PATCH, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120 v4]

2021-06-24 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 18, 2021 at 10:25:16PM +0800, Chung-Lin Tang wrote:

Note, you'll need to rebase your patch, it clashes with
r12-1768-g7619d33471c10fe3d149dcbb701d99ed3dd23528.
Sorry for that.  And sorry for patch review delay.

> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -13104,6 +13104,12 @@ handle_omp_array_sections_1 (tree c, tree t, 
> vec &types,
> return error_mark_node;
>   }
> t = TREE_OPERAND (t, 0);
> +   if ((ort == C_ORT_ACC || ort == C_ORT_OMP)

Map clauses never appear on declare simd, so
(ort == C_ORT_ACC || ort == C_ORT_OMP)
previously meant always and since the in_reduction change is incorrect
(as C_ORT_OMP_TARGET is used for target construct but not for
e.g. target data* or target update).

> +   && TREE_CODE (t) == MEM_REF)

So please just use if (TREE_CODE (t) == MEM_REF)
or explain when it shouldn't trigger.

> @@ -14736,6 +14743,11 @@ c_finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
>   {
> while (TREE_CODE (t) == COMPONENT_REF)
>   t = TREE_OPERAND (t, 0);
> +   if (TREE_CODE (t) == MEM_REF)
> + {
> +   t = TREE_OPERAND (t, 0);
> +   STRIP_NOPS (t);
> + }

This doesn't look correct.  At least the parsing (and the spec AFAIK)
doesn't ensure that if there is ->, it must come before all the dots.
So, if one uses map (s->x.y) the above would work, but if map (s->x.y->z) or
map (s.a->b->c->d->e) is used, it wouldn't.  I'd expect a single
while loop that looks through COMPONENT_REFs and MEM_REFs as they appear.
Maybe the handle_omp_array_sections_1 MEM_REF case too?

Or do you want to have it done incrementally, start with supporting only
a single -> first before all the dots and later on add support for the rest?

I think the 5.0 and especially 5.1 wording basically says that map clause
operand is arbitrary lvalue expression that includes array section support
too, so eventually we should just have somewhere in parsing scope a bool
whether OpenMP array sections are allowed or not, add OMP_ARRAY_REF or
similar tree code for those and after parsing the expression, ensure
array sections appear only where they can appear and for a subset of the
lvalue expressions where we have decl plus series of -> field or . field
or [ index ] or [ array section stuff ] handle those specially.
That arbitrary lvalue can certainly be done incrementally.
map (foo(123)->a.b[3]->c.d[:7]) and the like.

> if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> && OMP_CLAUSE_MAP_IMPLICIT (c)
> && (bitmap_bit_p (&map_head, DECL_UID (t))
> @@ -14802,6 +14814,15 @@ c_finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
>  bias) to zero here, so it is not set erroneously to the pointer
>  size later on in gimplify.c.  */
>   OMP_CLAUSE_SIZE (c) = size_zero_node;
> +   indir_component_ref_p = false;
> +   if ((ort == C_ORT_ACC || ort == C_ORT_OMP)

Same comment about ort tests.

> +   && TREE_CODE (t) == COMPONENT_REF
> +   && TREE_CODE (TREE_OPERAND (t, 0)) == MEM_REF)
> + {
> +   t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
> +   indir_component_ref_p = true;
> +   STRIP_NOPS (t);
> + }

Again, this can handle only a single ->

> @@ -42330,16 +42328,10 @@ cp_parser_omp_target (cp_parser *parser, cp_token 
> *pragma_tok,
>   cclauses[C_OMP_CLAUSE_SPLIT_TARGET] = tc;
> }
>   }
> -   tree stmt = make_node (OMP_TARGET);
> -   TREE_TYPE (stmt) = void_type_node;
> -   OMP_TARGET_CLAUSES (stmt) = cclauses[C_OMP_CLAUSE_SPLIT_TARGET];
> -   c_omp_adjust_map_clauses (OMP_TARGET_CLAUSES (stmt), true);
> -   OMP_TARGET_BODY (stmt) = body;
> -   OMP_TARGET_COMBINED (stmt) = 1;
> -   SET_EXPR_LOCATION (stmt, pragma_tok->location);
> -   add_stmt (stmt);
> -   pc = &OMP_TARGET_CLAUSES (stmt);
> -   goto check_clauses;
> +   c_omp_adjust_map_clauses (cclauses[C_OMP_CLAUSE_SPLIT_TARGET], true);
> +   finish_omp_target (pragma_tok->location,
> +  cclauses[C_OMP_CLAUSE_SPLIT_TARGET], body, true);

What is the advantage of finish_omp_target.  Perhaps the check_clauses label
can be renamed and more things common to both paths moved after the label if
needed, but as long as it isn't something also called during instantiation,
I find it cleaner to do it in cp_parser_omp_target at one place.
The reason for e.g. finish_omp_parallel is that it is called from both
parsing and instantiation.

> --- a/gcc/cp/semantics.c
> +++ b/gcc/cp/semantics.c
> @@ -4990,6 +4990,9 @@ handle_omp_array_sections_1 (tree c, tree t, vec 
> &types,
>  {
>if (error_operand_p (t))
>   return error_mark_node;
> +  if ((ort == C_ORT_ACC || ort == 

Re: [PATCH] c: Fix C cast error-recovery [PR101171]

2021-06-24 Thread Marek Polacek via Gcc-patches
On Thu, Jun 24, 2021 at 12:11:23PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase ICEs during error-recovery, as build_c_cast calls
> note_integer_operands on error_mark_node and that wraps it into
> C_MAYBE_CONST_EXPR which is unexpected and causes ICE later on.
> Seems most other callers of note_integer_operands check early if something
> is error_mark_node and return before calling note_integer_operands on it.

Yeah.
 
> The following patch fixes it by not calling on error_mark_node, another
> possibility would be to handle error_mark_node in note_integer_operands and
> just return it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

This looks OK to me, thanks.
 
> 2021-06-24  Jakub Jelinek  
> 
>   PR c/101171
>   * c-typeck.c (build_c_cast): Don't call note_integer_operands on
>   error_mark_node.
> 
>   * gcc.dg/pr101171.c: New test.
> 
> --- gcc/c/c-typeck.c.jj   2021-06-23 13:33:00.375434219 +0200
> +++ gcc/c/c-typeck.c  2021-06-23 17:51:17.501401208 +0200
> @@ -6131,6 +6131,7 @@ build_c_cast (location_t loc, tree type,
>   return value reflects this.  */
>if (int_operands
>&& INTEGRAL_TYPE_P (type)
> +  && value != error_mark_node
>&& !EXPR_INT_CONST_OPERANDS (value))
>  value = note_integer_operands (value);
>  
> --- gcc/testsuite/gcc.dg/pr101171.c.jj2021-06-23 17:56:53.409896019 
> +0200
> +++ gcc/testsuite/gcc.dg/pr101171.c   2021-06-23 17:56:28.668227851 +0200
> @@ -0,0 +1,13 @@
> +/* PR c/101171 */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +extern void foo (void);
> +int x = 0x1234;
> +
> +void
> +bar (void)
> +{
> +  if (x != (sizeof ((enum T) 0x1234)))   /* { dg-error "conversion to 
> incomplete type" } */
> +foo ();
> +}
> 
>   Jakub
> 

Marek



Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-06-24 Thread Andrew MacLeod via Gcc-patches

On 6/24/21 8:29 AM, Richard Biener wrote:

On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches
 wrote:

You quote opportunities that are catched with this like

+  if (a != 0)
+{
+  c = b;
+}
+  for (unsigned i = 0; i < c; i++)
+{
+  if (a != 0)
+   {
+ if (i >= b)
+   /* Should be eliminated.
+*/
+   foo ();

but say other "obvious" cases like

+  if (a != 0)
+{
+  c = b;
+}
+  for (unsigned i = 0; i < c; i++)
+{
+  if (a != 0)
+   {
+   /* Should be zero.  */
  return b - c;

are not handled.  That's in line with the current "value predication"
which mainly aims at catching simple jump threading opportunities;
you only simplify conditions with the recorded equivalences.  But
then the complexity of handling equivalences does probably not
outweight the opportunities catched - can you share some numbers
on how many branches are newly known taken during VN with
this patch during say bootstrap or build of SPEC CPU?

I've hoped to ditch the current "value predication" code by eventually
using the relation oracle from ranger but did not yet have the chance
to look into that.  Now, the predicated equivalences are likely not
something that infrastructure can handle?

In the end I think we should research into maintaining an alternate
expression table for conditions (those we like to simplify with
equivalences) and use a data structure that more easily allows to
introduce (temporary) equivalences.  Like by maintaining
back-references of values we've recorded a condition for and a way
to quickly re-canonicalize conditions.  Well - it requires some research,
as said.

The idea would be to handle all this eventually if it doesnt now.  It'll 
certainly be capable of it. we havent looked into any missed cases yet. 
early days :-)


The oracle  handles predicated things just fine.  We're missing 
transitive relations, and any time an edge has multiple predecessors we 
sort of bail on predicated things for now.  I also haven't gotten to 
producing a nice relation/equivlance map of what we actually know... 
That might be worthwhile sooner than later.


THe original function in EVRP currently looks like:

 === BB 2 
     :
    if (a_5(D) == b_6(D))
  goto ; [INV]
    else
  goto ; [INV]

=== BB 8 
Equivalence set : [a_5(D), b_6(D)] edge 2->8 provides 
a_5 and b_6 as equivalences

     :
    goto ; [100.00%]

=== BB 6 
     :
    # i_1 = PHI <0(8), i_10(5)>
    if (i_1 < a_5(D))
  goto ; [INV]
    else
  goto ; [INV]

=== BB 3 
Relational : (i_1 < a_5(D)) edge 6->3 provides 
this relation

     :
    if (i_1 == b_6(D))
  goto ; [INV]
    else
  goto ; [INV]


So It knows that a_5 and b_6 are equivalence, and it knows that i_1 < 
a_5 in BB3 as well..


so we should be able to indicate that  i_1 == b_6 as [0,0]..  we 
currently aren't.   I think I had turned on equivalence mapping during 
relational processing, so should be able to tag that without transitive 
relations...  I'll have a look at why.


And once we get a bit further along, you will be able to access this 
without ranger.. if one wants to simply register the relations directly.


Anyway, I'll get back to you why its currently being missed.

Andrew





Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-06-24 Thread Andrew MacLeod via Gcc-patches

On 6/24/21 5:07 AM, Richard Biener wrote:

On Wed, Jun 23, 2021 at 9:25 PM Andrew MacLeod  wrote:

On 6/23/21 2:37 PM, Richard Biener via Gcc-patches wrote:

On June 23, 2021 5:03:05 PM GMT+02:00, Aldy Hernandez via Gcc-patches 
 wrote:

The call to gimple_call_fntype() in gimple_call_return_type() may
return
NULL, which causes the TREE_TYPE(lhs) to ICE.  I think it would be best
to
return NULL (or void_type_node) rather than aborting.

I'm running into this because fold_using_range::range_of_call, calls
gimple_call_return_type which may ICE for builtins with no LHS.
Instead
of special casing things in range_of_call, perhaps it's best to plug
the
source.

Does this sound reasonable?

No, you need to make sure to not call this on an internal function call instead.
Otherwise it is never NULL.

Richard.

Out of curiosity, why is it not OK to call this on an internal function
call?   Shouldn't all calls return something at least? like VOIDmode if
they don't return anything?  It just seems like it needs to be special
cased either at any possible call site, or simply in the routine.   we
stumbled across it and it wasn't obvious why.

Well, gimple_call_fntype simply returns NULL because internal functions
do not have any API/ABI.  So you either deal with a NULL return value
but then explicitely checking for an internal function call is clearly better
from a source documentation point of view.

I think the existing type == NULL check was likely added to avoid touching
too much code and we somehow didn't think of internal function calls
w/o a LHS (but why are you asking for the return type for a call w/o LHS?)


We'll still compute values for statements that don't have a LHS.. 
there's nothing inherently wrong with that.  The primary example is


if (x_2 < y_3)

we will compute [0,0] [1,1] or [0,1] for that statement, without a LHS.  
It primarily becomes a generic way to ask for the range of each of the 
operands of the statement, and process it regardless of the presence of 
a LHS.  I don't know, maybe there is (or will be)  an internal function 
that doesn't have a LHS but which can be folded away/rewritten if the 
operands are certain values.


Andrew



Re: [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups

2021-06-24 Thread Jonathan Wakely via Gcc-patches
On Tue, 8 Jun 2021 at 13:10, Matthias Kretz wrote:
>
> The following patches mostly contain code cleanups and minor corrections. The
> major feature in this patchset is the last patch, which should make the use of
> stdx::simd much safer wrt. ODR violations involuntarily introduced by linking
> TUs that were compiled with different -m and floating-point flags.
>
> Matthias Kretz (11):
>   libstdc++: Improve copysign codegen
>   libstdc++: Remove dead code
>   libstdc++: Improve fixed_size codegen
>   libstdc++: Make use of __builtin_bit_cast
>   libstdc++: Remove incorrect fabs overload
>   libstdc++: Minor simd_math cleanups
>   libstdc++: Fix condition when AVX512F ldexp implementation is used
>   libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil
>   libstdc++: Ensure unrolled loops inline the lambda
>   libstdc++: Fix internal names: add missing underscores
>   libstdc++: Fix ODR issues with different -m flags

Thanks! I've pushed all except the bit_cast one (as discussed on IRC)
and the ODR one (which I'm still reviewing).



Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-06-24 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 24, 2021 at 09:31:13AM -0400, Andrew MacLeod via Gcc-patches wrote:
> We'll still compute values for statements that don't have a LHS.. there's
> nothing inherently wrong with that.  The primary example is
> 
> if (x_2 < y_3)
> 
> we will compute [0,0] [1,1] or [0,1] for that statement, without a LHS.  It
> primarily becomes a generic way to ask for the range of each of the operands
> of the statement, and process it regardless of the presence of a LHS.  I
> don't know, maybe there is (or will be)  an internal function that doesn't
> have a LHS but which can be folded away/rewritten if the operands are
> certain values.

There are many internal functions that aren't ECF_CONST or ECF_PURE.  Some
of them, like IFN*STORE* I think never have an lhs, others have them, but
if the lhs is unused, various optimization passes can just remove those lhs
from the internal fn calls (if they'd be ECF_CONST or ECF_PURE, the calls
would be DCEd).

I think generally, if a call doesn't have lhs, there is no point in
computing a value range for that missing lhs.  It won't be useful for the
call arguments to lhs direction (nothing would care about that value) and
it won't be useful on the direction from the lhs to the call arguments
either.  Say if one has
  p_23 = __builtin_memcpy (p_75, q_23, 16);
then one can imply from ~[0, 0] range on p_75 that p_23 has that range too
(and vice versa), but if one has
  __builtin_memcpy (p_125, q_23, 16);
none of that makes sense.

So instead of punting when gimple_call_return_type returns NULL IMHO the
code should punt when gimple_call_lhs is NULL.

Jakub



[PATCH v2 1/2] Add -f[no-]direct-extern-access

2021-06-24 Thread H.J. Lu via Gcc-patches
-fdirect-extern-access is the default.  With -fno-direct-extern-access:

1. Always use GOT to access undefined data and function symbols,
   including in PIE and non-PIE.  These will avoid copy relocations
   in executables.  This is compatible with existing executables and
   shared libraries.
2. In executable and shared library, bind symbols with the STV_PROTECTED
   visibility locally:
   a. The address of data symbol is the address of data body.
   b. For systems without function descriptor, the function pointer is
  the address of function body.
   c. The resulting shared libraries may not be incompatible with
  executables which have copy relocations on protected symbols.
3. Update asm_preferred_eh_data_format to properly select EH encoding
format with -fno-direct-extern-access.
4. Add ix86_reloc_rw_mask for TARGET_ASM_RELOC_RW_MASK to avoid copy
relocation with -fno-direct-extern-access.

gcc/

PR target/35513
PR target/100593
* common.opt: Add -fdirect-extern-access.
* config/i386/i386-protos.h (ix86_force_load_from_GOT_p): Add a
bool argument.
* config/i386/i386.c (ix86_force_load_from_GOT_p): Add a bool
argument to indicate call operand.  Force non-call load
from GOT for -fno-direct-extern-access.
(legitimate_pic_address_disp_p): Avoid copy relocation in PIE
for -fno-direct-extern-access.
(ix86_print_operand): Pass true to ix86_force_load_from_GOT_p
for call operand.
(asm_preferred_eh_data_format): Use PC-relative format for
-fno-direct-extern-access to avoid copy relocation.  Check
ptr_mode instead of TARGET_64BIT when selecting DW_EH_PE_sdata4.
(ix86_binds_local_p): Don't treat protected data as extern and
avoid copy relocation on common symbol with
-fno-direct-extern-access.
(ix86_reloc_rw_mask): New to avoid copy relocation for
-fno-direct-extern-access.
(TARGET_ASM_RELOC_RW_MASK): New.
* doc/invoke.texi: Document -f[no-]direct-extern-access.

gcc/testsuite/

PR target/35513
PR target/100593
* g++.dg/pr35513-1.C: New file.
* g++.dg/pr35513-2.C: Likewise.
* gcc.target/i386/pr35513-1.c: Likewise.
* gcc.target/i386/pr35513-2.c: Likewise.
* gcc.target/i386/pr35513-3.c: Likewise.
* gcc.target/i386/pr35513-4.c: Likewise.
* gcc.target/i386/pr35513-5.c: Likewise.
* gcc.target/i386/pr35513-6.c: Likewise.
* gcc.target/i386/pr35513-7.c: Likewise.
* gcc.target/i386/pr35513-8.c: Likewise.
---
 gcc/common.opt|  4 ++
 gcc/config/i386/i386-protos.h |  2 +-
 gcc/config/i386/i386.c| 50 +++--
 gcc/doc/invoke.texi   | 13 ++
 gcc/testsuite/g++.dg/pr35513-1.C  | 25 +++
 gcc/testsuite/g++.dg/pr35513-2.C  | 53 +++
 gcc/testsuite/gcc.target/i386/pr35513-1.c | 16 +++
 gcc/testsuite/gcc.target/i386/pr35513-2.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-3.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-4.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-5.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-6.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr35513-7.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-8.c | 41 ++
 14 files changed, 278 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr35513-1.C
 create mode 100644 gcc/testsuite/g++.dg/pr35513-2.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-8.c

diff --git a/gcc/common.opt b/gcc/common.opt
index a1353e06bdc..b4827f59cfa 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1427,6 +1427,10 @@ fdiagnostics-minimum-margin-width=
 Common Joined UInteger Var(diagnostics_minimum_margin_width) Init(6)
 Set minimum width of left margin of source code when showing source.
 
+fdirect-extern-access
+Common Var(flag_direct_extern_access) Init(1) Optimization
+Do not use GOT to access external symbols.
+
 fdisable-
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -fdisable-[tree|rtl|ipa]-=range1+range2  Disable an optimization pass.
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 65fc307dc7b..3f1bdd14e6d 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -77,7 +77,7 @@ extern bool ix86_expand_cmpstrn_or_cmpmem (rtx, rtx, rtx, 
rtx, rtx, bool);
 extern bool constant_address

[PATCH v2 0/2] Implement indirect external access

2021-06-24 Thread H.J. Lu via Gcc-patches
Changes in the v2 patch.

1. Rename the option to -fdirect-extern-access.

---
On systems with copy relocation:
* A copy in executable is created for the definition in a shared library
at run-time by ld.so.
* The copy is referenced by executable and shared libraries.
* Executable can access the copy directly.

Issues are:
* Overhead of a copy, time and space, may be visible at run-time.
* Read-only data in the shared library becomes read-write copy in
executable at run-time.
* Local access to data with the STV_PROTECTED visibility in the shared
library must use GOT.

On systems without function descriptor, function pointers vary depending
on where and how the functions are defined.
* If the function is defined in executable, it can be the address of
function body.
* If the function, including the function with STV_PROTECTED visibility,
is defined in the shared library, it can be the address of the PLT entry
in executable or shared library.

Issues are:
* The address of function body may not be used as its function pointer.
* ld.so needs to search loaded shared libraries for the function pointer
of the function with STV_PROTECTED visibility.

Here is a proposal to remove copy relocation and use canonical function
pointer:

1. Accesses, including in PIE and non-PIE, to undefined symbols must
use GOT.
  a. Linker may optimize out GOT access if the data is defined in PIE or
  non-PIE.
2. Read-only data in the shared library remain read-only at run-time
3. Address of global data with the STV_PROTECTED visibility in the shared
library is the address of data body.
  a. Can use IP-relative access.
  b. May need GOT without IP-relative access.
4. For systems without function descriptor,
  a. All global function pointers of undefined functions in PIE and
  non-PIE must use GOT.  Linker may optimize out GOT access if the
  function is defined in PIE or non-PIE.
  b. Function pointer of functions with the STV_PROTECTED visibility in
  executable and shared library is the address of function body.
   i. Can use IP-relative access.
   ii. May need GOT without IP-relative access.
   iii. Branches to undefined functions may use PLT.
5. Single global definition marker:

Add GNU_PROPERTY_1_NEEDED:

#define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO

to indicate the needed properties by the object file.

Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS:

#define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0)

to indicate that the object file requires canonical function pointers and
cannot be used with copy relocation.  This bit should be cleared in
executable when there are non-GOT or non-PLT relocations in relocatable
input files without this bit set.

  a. Protected symbol access within the shared library can be treated as
  local.
  b. Copy relocation should be disallowed at link-time and run-time.
  c. GOT function pointer reference is required at link-time and run-time.

The indirect external access marker can be used in the following ways:

1. Linker can decide the best way to resolve a relocation against a
protected symbol before seeing all relocations against the symbol.
2. Dynamic linker can decide if it is an error to have a copy relocation
in executable against the protected symbol in a shared library by checking
if the shared library is built with -fno-direct-extern-access.

Add a compiler option, -fdirect-extern-access. -fdirect-extern-access is
the default.  With -fno-direct-extern-access:

1. Always to use GOT to access undefined symbols, including in PIE and
non-PIE.  This is safe to do and does not break the ABI.
2. In executable and shared library, for symbols with the STV_PROTECTED
visibility:
  a. The address of data symbol is the address of data body.
  b. For systems without function descriptor, the function pointer is
  the address of function body.
These break the ABI and resulting shared libraries may not be compatible
with executables which are not compiled with -fno-direct-extern-access.
3. Generate an indirect external access marker in relocatable objects if
supported by linker.

H.J. Lu (2):
  Add -f[no-]direct-extern-access
  Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE

 gcc/common.opt|  4 ++
 gcc/config.in |  6 +++
 gcc/config/i386/gnu-property.c| 31 -
 gcc/config/i386/i386-protos.h |  2 +-
 gcc/config/i386/i386.c| 52 --
 gcc/configure | 24 ++
 gcc/configure.ac  | 20 +
 gcc/doc/invoke.texi   | 13 ++
 gcc/doc/tm.texi   |  5 +++
 gcc/doc/tm.texi.in|  2 +
 gcc/output.h  |  2 +
 gcc/target.def|  8 
 gcc/testsuite/g++.dg/pr35513-1.C  | 25 +++
 gcc/testsuite/g++.dg/pr35513-2.C  | 53 +++
 gcc/testsuite/gcc.target/i386/pr35513

[PATCH v2 2/2] Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE

2021-06-24 Thread H.J. Lu via Gcc-patches
Generate the marker for -fno-direct-extern-access to indicate that the
object file uses GOT to access all external symbols.  Access to protected
symbols in the resulting shared library is treated as local, which requires
canonical function pointers and cannot be used with copy relocation.

* configure.ac (HAVE_LD_INDIRECT_EXTERN_ACCESS_SUPPORT): New.
Define to 1 if linker supports -z indirect-extern-access.
* output.h (emit_gnu_property): New.
(emit_gnu_property_note): Likewise.
* target.def (emit_gnu_property_note): Add a argetm.asm_out hook.
* toplev.c (compile_file): Call emit_gnu_property_note before
file_end.
* varasm.c (emit_gnu_property): New.
(emit_gnu_property_note): Likewise.
* config.in: Regenerated.
* configure: Likewise.
* doc/tm.texi: Likewise.
* config/i386/gnu-property.c (emit_gnu_property): Removed.
(TARGET_ASM_EMIT_GNU_PROPERTY_NOTE): New.
* doc/tm.texi.in: Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE.
---
 gcc/config.in  |  6 +
 gcc/config/i386/gnu-property.c | 31 --
 gcc/config/i386/i386.c |  2 ++
 gcc/configure  | 24 +
 gcc/configure.ac   | 20 +++
 gcc/doc/tm.texi|  5 
 gcc/doc/tm.texi.in |  2 ++
 gcc/output.h   |  2 ++
 gcc/target.def |  8 ++
 gcc/toplev.c   |  3 +++
 gcc/varasm.c   | 47 ++
 11 files changed, 119 insertions(+), 31 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index 18e627141cc..7a3e003f8ac 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1640,6 +1640,12 @@
 #endif
 
 
+/* Define to 1 if your linker supports -z indirect-extern-access */
+#ifndef USED_FOR_TARGET
+#undef HAVE_LD_INDIRECT_EXTERN_ACCESS_SUPPORT
+#endif
+
+
 /* Define if your PowerPC64 linker supports a large TOC. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_LD_LARGE_TOC
diff --git a/gcc/config/i386/gnu-property.c b/gcc/config/i386/gnu-property.c
index 4ba04403002..9fe8d00132e 100644
--- a/gcc/config/i386/gnu-property.c
+++ b/gcc/config/i386/gnu-property.c
@@ -24,37 +24,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "linux-common.h"
 
-static void
-emit_gnu_property (unsigned int type, unsigned int data)
-{
-  int p2align = ptr_mode == SImode ? 2 : 3;
-
-  switch_to_section (get_section (".note.gnu.property",
- SECTION_NOTYPE, NULL));
-
-  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
-  /* name length.  */
-  fprintf (asm_out_file, ASM_LONG "1f - 0f\n");
-  /* data length.  */
-  fprintf (asm_out_file, ASM_LONG "4f - 1f\n");
-  /* note type: NT_GNU_PROPERTY_TYPE_0.  */
-  fprintf (asm_out_file, ASM_LONG "5\n");
-  fprintf (asm_out_file, "0:\n");
-  /* vendor name: "GNU".  */
-  fprintf (asm_out_file, STRING_ASM_OP "\"GNU\"\n");
-  fprintf (asm_out_file, "1:\n");
-  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
-  /* pr_type.  */
-  fprintf (asm_out_file, ASM_LONG "0x%x\n", type);
-  /* pr_datasz.  */
-  fprintf (asm_out_file, ASM_LONG "3f - 2f\n");
-  fprintf (asm_out_file, "2:\n");
-  fprintf (asm_out_file, ASM_LONG "0x%x\n", data);
-  fprintf (asm_out_file, "3:\n");
-  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
-  fprintf (asm_out_file, "4:\n");
-}
-
 void
 file_end_indicate_exec_stack_and_gnu_property (void)
 {
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4ab85cc4fe0..8e86781299e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -24042,6 +24042,8 @@ ix86_run_selftests (void)
 #if !TARGET_MACHO && !TARGET_DLLIMPORT_DECL_ATTRIBUTES
 # undef TARGET_ASM_RELOC_RW_MASK
 # define TARGET_ASM_RELOC_RW_MASK ix86_reloc_rw_mask
+# undef TARGET_ASM_EMIT_GNU_PROPERTY_NOTE
+# define TARGET_ASM_EMIT_GNU_PROPERTY_NOTE emit_gnu_property_note
 #endif
 
 static bool ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
diff --git a/gcc/configure b/gcc/configure
index f0b2ebde3cf..1b92dd803ca 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -32177,6 +32177,30 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ld_bndplt_support" >&5
 $as_echo "$ld_bndplt_support" >&6; }
 
+# Check linker supports '-z indirect-extern-access'
+ld_indirect_extern_access=0
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking linker -z 
indirect-extern-access option" >&5
+$as_echo_n "checking linker -z indirect-extern-access option... " >&6; }
+if test x"$ld_is_gold" = xno; then
+  if test $in_tree_ld = yes ; then
+if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" 
-ge 37 -o "$gcc_cv_gld_major_version" -gt 2; then
+  ld_indirect_extern_access=1
+fi
+  elif test x$gcc_cv_ld != x; then
+# Check if linker supports -z indirect-extern-access option
+if $gcc_cv_ld --help 2>&1 | grep -- '-z indirect-extern-access' > 
/dev/null; then
+  ld_indirect

Re: [[PATCH V9] 1/7] dwarf: add a dwarf2int.h internal interface

2021-06-24 Thread Jose E. Marchesi via Gcc-patches


Hi Jason.

> On 5/31/21 12:57 PM, Jose E. Marchesi via Gcc-patches wrote:
>> This patch introduces a dwarf2int.h header, to be used by code that
>> needs access to the internal DIE structures and their attributes.
>
> Why not put these bits in dwarf2out.h?

We think that it makes sense to have a separated interface file for the
implementation of DWARF-based debug formats.  It is called `internal'
because it provides access to internal data structures as well as the
basic accessor functions to the internals of the DWARF DIEs.

That said, if you think this is a blocker, we can put these definitions
in dwarf2out.h.  But we think dwarf2out.c is in much need of some
refactoring, and this could be a first step in that direction :)


Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-06-24 Thread Andrew MacLeod via Gcc-patches

On 6/24/21 9:45 AM, Jakub Jelinek wrote:

On Thu, Jun 24, 2021 at 09:31:13AM -0400, Andrew MacLeod via Gcc-patches wrote:

We'll still compute values for statements that don't have a LHS.. there's
nothing inherently wrong with that.  The primary example is

if (x_2 < y_3)

we will compute [0,0] [1,1] or [0,1] for that statement, without a LHS.  It
primarily becomes a generic way to ask for the range of each of the operands
of the statement, and process it regardless of the presence of a LHS.  I
don't know, maybe there is (or will be)  an internal function that doesn't
have a LHS but which can be folded away/rewritten if the operands are
certain values.

There are many internal functions that aren't ECF_CONST or ECF_PURE.  Some
of them, like IFN*STORE* I think never have an lhs, others have them, but
if the lhs is unused, various optimization passes can just remove those lhs
from the internal fn calls (if they'd be ECF_CONST or ECF_PURE, the calls
would be DCEd).

I think generally, if a call doesn't have lhs, there is no point in
computing a value range for that missing lhs.  It won't be useful for the
call arguments to lhs direction (nothing would care about that value) and
it won't be useful on the direction from the lhs to the call arguments
either.  Say if one has
   p_23 = __builtin_memcpy (p_75, q_23, 16);
then one can imply from ~[0, 0] range on p_75 that p_23 has that range too
(and vice versa), but if one has
   __builtin_memcpy (p_125, q_23, 16);
none of that makes sense.

So instead of punting when gimple_call_return_type returns NULL IMHO the
code should punt when gimple_call_lhs is NULL.




Well, we are going to punt anyway, because the call type, whether it is 
NULL or VOIDmode is not supported by irange.   It was more just a matter 
of figuring out whether us checking for internal call or the 
gimple_function_return_type call should do the check...   Ultimately in 
the end it doesnt matter.. just seemed like something someone else could 
trip across if we didnt strengthen gimple_call_return_type to not ice.


Andrew



Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast

2021-06-24 Thread Matthias Kretz
For -ffast-math there was a missing using namespace __proposed left. The 
attached patch resolves the issue.

From: Matthias Kretz 

The __bit_cast function was a hack to achieve what __builtin_bit_cast
can do, therefore use __builtin_bit_cast if possible. However,
__builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since
it isn't trivially copyable (in the language sense — in principle it
is). Therefore add __proposed::simd_bit_cast to enable the use case
required in the test framework.

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

* include/experimental/bits/simd.h (__bit_cast): Implement via
__builtin_bit_cast #if available.
(__proposed::simd_bit_cast): Add overloads for simd and
simd_mask, which use __builtin_bit_cast (or __bit_cast #if not
available), which return an object of the requested type with
the same bits as the argument.
* include/experimental/bits/simd_math.h: Use simd_bit_cast
instead of __bit_cast to allow casts to fixed_size_simd.
(copysign): Remove branch that was only required if __bit_cast
cannot be constexpr.
* testsuite/experimental/simd/tests/bits/test_values.h: Switch
from __bit_cast to __proposed::simd_bit_cast since the former
will not cast fixed_size objects anymore.
---
 libstdc++-v3/include/experimental/bits/simd.h | 57 ++-
 .../include/experimental/bits/simd_math.h | 37 ++--
 .../simd/tests/bits/test_values.h |  8 +--
 3 files changed, 76 insertions(+), 26 deletions(-)


-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──
diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 163f1b574e2..852d0b62012 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -1598,7 +1598,9 @@ template 
   _GLIBCXX_SIMD_INTRINSIC constexpr _To
   __bit_cast(const _From __x)
   {
-// TODO: implement with / replace by __builtin_bit_cast ASAP
+#if __has_builtin(__builtin_bit_cast)
+return __builtin_bit_cast(_To, __x);
+#else
 static_assert(sizeof(_To) == sizeof(_From));
 constexpr bool __to_is_vectorizable
   = is_arithmetic_v<_To> || is_enum_v<_To>;
@@ -1629,6 +1631,7 @@ template 
 			 reinterpret_cast(&__x), sizeof(_To));
 	return __r;
   }
+#endif
   }
 
 // }}}
@@ -2900,6 +2903,58 @@ template (__x)};
   }
+
+template 
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  _To
+  simd_bit_cast(const simd<_Up, _Abi>& __x)
+  {
+using _Tp = typename _To::value_type;
+using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember;
+using _From = simd<_Up, _Abi>;
+using _FromMember = typename _SimdTraits<_Up, _Abi>::_SimdMember;
+// with concepts, the following should be constraints
+static_assert(sizeof(_To) == sizeof(_From));
+static_assert(is_trivially_copyable_v<_Tp> && is_trivially_copyable_v<_Up>);
+static_assert(is_trivially_copyable_v<_ToMember> && is_trivially_copyable_v<_FromMember>);
+#if __has_builtin(__builtin_bit_cast)
+return {__private_init, __builtin_bit_cast(_ToMember, __data(__x))};
+#else
+return {__private_init, __bit_cast<_ToMember>(__data(__x))};
+#endif
+  }
+
+template 
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  _To
+  simd_bit_cast(const simd_mask<_Up, _Abi>& __x)
+  {
+using _From = simd_mask<_Up, _Abi>;
+static_assert(sizeof(_To) == sizeof(_From));
+static_assert(is_trivially_copyable_v<_From>);
+// _To can be simd, specifically simd> in which case _To is not trivially
+// copyable.
+if constexpr (is_simd_v<_To>)
+  {
+	using _Tp = typename _To::value_type;
+	using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember;
+	static_assert(is_trivially_copyable_v<_ToMember>);
+#if __has_builtin(__builtin_bit_cast)
+	return {__private_init, __builtin_bit_cast(_ToMember, __x)};
+#else
+	return {__private_init, __bit_cast<_ToMember>(__x)};
+#endif
+  }
+else
+  {
+	static_assert(is_trivially_copyable_v<_To>);
+#if __has_builtin(__builtin_bit_cast)
+	return __builtin_bit_cast(_To, __x);
+#else
+	return __bit_cast<_To>(__x);
+#endif
+  }
+  }
 } // namespace __proposed
 
 // simd_cast {{{2
diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h
index d954e761eee..ef2bdc641b8 100644
--- a/libstdc++-v3/include/experimental/bits/simd_math.h
+++ b/libstdc++-v3/include/experimental/bits/simd_math.h
@@ -405,10 +405,11 @@ template 
 using _Vp = simd<_Tp, _Abi>;
 usin

Re: GCC documentation: porting to Sphinx

2021-06-24 Thread Martin Liška

On 6/23/21 6:00 PM, Joseph Myers wrote:

On Wed, 23 Jun 2021, Martin Liška wrote:


@Joseph: Can you share your thoughts about the used Makefile integration? What
do you suggest for 2)
(note that explicit listing of all .rst file would be crazy)?


You can write dependencies on e.g. doc/gcc/*.rst (which might be more
files than actually are relevant in some cases, if the directory includes
some common files shared by some but not all manuals, but should be
conservatively safe if you list appropriate directories there), rather
than needing to name all the individual files.  Doing things with makefile
dependencies seems better than relying on what sphinx-build does when
rerun unnecessarily (if sphinx-build avoids rebuilding in some cases where
the makefiles think a rebuild is needed, that's fine as an optimization).


All right. I've just done that and it was easier than I expected. Now the 
dependencies
are properly followed.



It looks like this makefile integration loses some of the srcinfo / srcman
support.  That support should stay (be updated for the use of Sphinx) so
that release tarballs (as generated by maintainer-scripts/gcc_release,
which uses --enable-generated-files-in-srcdir) continue to include man
pages / info files (and make sure that, if those files are present in the
source directory, then building and installing GCC does install them even
when sphinx-build is absent at build/install time).



Oh, and I've just recovered this one as well. Pushed changes to the me/sphinx-v2
branch and I'm waiting for more feedback.

In the meantime, I'm going to prepare further integration of other manuals and
targets (PDF, HTML).

Martin


Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast

2021-06-24 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> --- a/libstdc++-v3/include/experimental/bits/simd.h
> +++ b/libstdc++-v3/include/experimental/bits/simd.h
> @@ -1598,7 +1598,9 @@ template 
>_GLIBCXX_SIMD_INTRINSIC constexpr _To
>__bit_cast(const _From __x)
>{
> -// TODO: implement with / replace by __builtin_bit_cast ASAP
> +#if __has_builtin(__builtin_bit_cast)

Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
c++config to define a new macro and use that macro here?
Though it is true that c++config already uses
#if __has_builtin(__builtin_is_constant_evaluated)
and so would fail miserably for compilers that don't support __has_builtin

Jakub



Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast

2021-06-24 Thread Jonathan Wakely via Gcc-patches
On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote:
>
> On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> > --- a/libstdc++-v3/include/experimental/bits/simd.h
> > +++ b/libstdc++-v3/include/experimental/bits/simd.h
> > @@ -1598,7 +1598,9 @@ template 
> >_GLIBCXX_SIMD_INTRINSIC constexpr _To
> >__bit_cast(const _From __x)
> >{
> > -// TODO: implement with / replace by __builtin_bit_cast ASAP
> > +#if __has_builtin(__builtin_bit_cast)
>
> Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
> c++config to define a new macro and use that macro here?
> Though it is true that c++config already uses
> #if __has_builtin(__builtin_is_constant_evaluated)
> and so would fail miserably for compilers that don't support __has_builtin

GCC was the last of our supported compilers to implement
__has_builtin, so for GCC trunk we can assume that it's always
supported.

The code in c++config.h still has some value for built-ins that aren't
called __builtin_xxx because older versions of Clang need different
handling for those. But for __builtin_bit_cast and
__builtin_is_constant_evaluted we can just use __is_builtin directly.



Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast

2021-06-24 Thread Jonathan Wakely via Gcc-patches
On Thu, 24 Jun 2021 at 15:11, Jonathan Wakely wrote:
>
> On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote:
> >
> > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> > > --- a/libstdc++-v3/include/experimental/bits/simd.h
> > > +++ b/libstdc++-v3/include/experimental/bits/simd.h
> > > @@ -1598,7 +1598,9 @@ template 
> > >_GLIBCXX_SIMD_INTRINSIC constexpr _To
> > >__bit_cast(const _From __x)
> > >{
> > > -// TODO: implement with / replace by __builtin_bit_cast ASAP
> > > +#if __has_builtin(__builtin_bit_cast)
> >
> > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
> > c++config to define a new macro and use that macro here?
> > Though it is true that c++config already uses
> > #if __has_builtin(__builtin_is_constant_evaluated)
> > and so would fail miserably for compilers that don't support __has_builtin
>
> GCC was the last of our supported compilers to implement
> __has_builtin, so for GCC trunk we can assume that it's always
> supported.
>
> The code in c++config.h still has some value for built-ins that aren't
> called __builtin_xxx because older versions of Clang need different
> handling for those. But for __builtin_bit_cast and
> __builtin_is_constant_evaluted we can just use __is_builtin directly.

s/__is_builtin/__has_builtin/



Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast

2021-06-24 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 24, 2021 at 03:11:01PM +0100, Jonathan Wakely wrote:
> On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote:
> >
> > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> > > --- a/libstdc++-v3/include/experimental/bits/simd.h
> > > +++ b/libstdc++-v3/include/experimental/bits/simd.h
> > > @@ -1598,7 +1598,9 @@ template 
> > >_GLIBCXX_SIMD_INTRINSIC constexpr _To
> > >__bit_cast(const _From __x)
> > >{
> > > -// TODO: implement with / replace by __builtin_bit_cast ASAP
> > > +#if __has_builtin(__builtin_bit_cast)
> >
> > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
> > c++config to define a new macro and use that macro here?
> > Though it is true that c++config already uses
> > #if __has_builtin(__builtin_is_constant_evaluated)
> > and so would fail miserably for compilers that don't support __has_builtin
> 
> GCC was the last of our supported compilers to implement
> __has_builtin, so for GCC trunk we can assume that it's always
> supported.

We don't support mixing GCC and libstdc++ versions, so I'm not worried
about GCC.  At least according to godbolt, already clang 3.0 supports it
which is 10 years old, so probably fine too, but ICC 19.0/19.1 still doesn't
support it, only ICC 2021 does.  And ICC 19.1 seems to be released in
October 2020.

So, wouldn't it be better not to #undef _GLIBCXX_HAS_BUILTIN, move its
definition a little bit earlier and use it also for
__builtin_is_constant_evaluated?

Jakub



Re: [PATCH][RFC] Add x86 subadd SLP pattern

2021-06-24 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 24, 2021 at 1:07 PM Richard Biener  wrote:

> This addds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1
> instructions which compute { v0[0] - v1[0], v0[1], + v1[1], ... }
> thus subtract, add alternating on lanes, starting with subtract.
>
> It adds a corresponding optab and direct internal function,
> vec_addsub$a3 and renames the existing i386 backend patterns to
> the new canonical name.
>
> The SLP pattern matches the exact alternating lane sequence rather
> than trying to be clever and anticipating incoming permutes - we
> could permute the two input vectors to the needed lane alternation,
> do the addsub and then permute the result vector back but that's
> only profitable in case the two input or the output permute will
> vanish - something Tamars refactoring of SLP pattern recog should
> make possible.

Using the attached patch, I was also able to generate addsub for the
following testcase:

float x[2], y[2], z[2];

void foo ()
{
  x[0] = y[0] - z[0];
  x[1] = y[1] + z[1];
}

   vmovq   y(%rip), %xmm0
   vmovq   z(%rip), %xmm1
   vaddsubps   %xmm1, %xmm0, %xmm0
   vmovlps %xmm0, x(%rip)
   ret

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index e887f03474d..5f10572718d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -788,6 +788,24 @@ (define_insn "*mmx_haddsubv2sf3"
(set_attr "prefix_extra" "1")
(set_attr "mode" "V2SF")])
 
+(define_insn "vec_addsubv2sf3"
+  [(set (match_operand:V2SF 0 "register_operand" "=x,x")
+   (vec_merge:V2SF
+ (minus:V2SF
+   (match_operand:V2SF 1 "register_operand" "0,x")
+   (match_operand:V2SF 2 "register_operand" "x,x"))
+ (plus:V2SF (match_dup 1) (match_dup 2))
+ (const_int 1)))]
+  "TARGET_SSE3 && TARGET_MMX_WITH_SSE"
+  "@
+   addsubps\t{%2, %0|%0, %2}
+   vaddsubps\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_rep" "1,*")
+   (set_attr "mode" "V4SF")])
+
 ;
 ;;
 ;; Parallel single-precision floating point comparisons


Re: [PATCH] add -ltrans-objects lto-plugin debug option

2021-06-24 Thread Jeff Law via Gcc-patches




On 6/22/2021 12:49 AM, Richard Biener wrote:

This adds a -ltrans-objects option to lto-plugin that by-passes
lto-wrapper invocation and instead feeds LD the final LTRANS objects
directly from the response file given as argument to the option.
This allows LD issues involving the linker-plugin path to be
debugged in an easier way with just the IR objects (their symtab)
and the LTRANS objects as testcase.

I've tested the path re-building stage2 build/genmatch from an
LTO bootstrap and got a bit-identical executable by adding
-plugin-opt=-ltrans-objects=y to the original collect2 invocation,
seeding y with the final objects as printed by building genmatch
with -save-temps -v.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

2021-06-22  Richard Biener  

lto-plugin/
* lto-plugin.c (ltrans_objects): New global.
(all_symbols_read_handler): If -ltrans-objects was specified,
add the output files from the specified file directly.
(process_option): Handle -ltrans-objects.


OK
jeff



Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast

2021-06-24 Thread Jonathan Wakely via Gcc-patches
On Thu, 24 Jun 2021 at 15:21, Jakub Jelinek  wrote:
>
> On Thu, Jun 24, 2021 at 03:11:01PM +0100, Jonathan Wakely wrote:
> > On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote:
> > >
> > > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> > > > --- a/libstdc++-v3/include/experimental/bits/simd.h
> > > > +++ b/libstdc++-v3/include/experimental/bits/simd.h
> > > > @@ -1598,7 +1598,9 @@ template 
> > > >_GLIBCXX_SIMD_INTRINSIC constexpr _To
> > > >__bit_cast(const _From __x)
> > > >{
> > > > -// TODO: implement with / replace by __builtin_bit_cast ASAP
> > > > +#if __has_builtin(__builtin_bit_cast)
> > >
> > > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
> > > c++config to define a new macro and use that macro here?
> > > Though it is true that c++config already uses
> > > #if __has_builtin(__builtin_is_constant_evaluated)
> > > and so would fail miserably for compilers that don't support __has_builtin
> >
> > GCC was the last of our supported compilers to implement
> > __has_builtin, so for GCC trunk we can assume that it's always
> > supported.
>
> We don't support mixing GCC and libstdc++ versions, so I'm not worried
> about GCC.  At least according to godbolt, already clang 3.0 supports it
> which is 10 years old, so probably fine too, but ICC 19.0/19.1 still doesn't
> support it, only ICC 2021 does.  And ICC 19.1 seems to be released in
> October 2020.
>
> So, wouldn't it be better not to #undef _GLIBCXX_HAS_BUILTIN, move its
> definition a little bit earlier and use it also for
> __builtin_is_constant_evaluated?

I discussed this with Judy Ward on the Intel compiler team. If you're
using their compiler, you should be using the latest version. They
also claim 100% compatibility with GCC, for versions they've been able
to test. So if you are using libstdc++ headers from a GCC release that
supports __has_builtin, then you need to use a release of the Intel
compiler that supports __has_builtin. Otherwise, it's unsupported. So
in GCC 12 C++ headers we support GCC 12, versions of Intel compatible
with GCC 12, and the last few releases of Clang. All of those have
__has_builtin.

Rather than use the _GLIBCXX_HAS_BUILTIN macro more widely, I'd prefer
to not use it where it isn't needed, as in the attached (untested)
patch.
diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 9911d4deb72..3c07590 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -55,7 +55,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef __cpp_lib_is_constant_evaluated
 // Support P1032R1 in C++20 (but not P0980R1 yet).
 # define __cpp_lib_constexpr_string 201811L
-#elif __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#elif __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 // Support P0426R1 changes to char_traits in C++17.
 # define __cpp_lib_constexpr_string 201611L
 #elif __cplusplus > 201703L
diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index 9314117aed8..3ec668b65cf 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -720,13 +720,11 @@ namespace std
 # define _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 1
 #endif
 
-#ifdef __has_builtin
-# ifdef __is_identifier
+#ifdef __is_identifier
 // Intel and older Clang require !__is_identifier for some built-ins:
-#  define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B) || ! __is_identifier(B)
-# else
-#  define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B)
-# endif
+# define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B) || ! __is_identifier(B)
+#else
+# define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B)
 #endif
 
 #if _GLIBCXX_HAS_BUILTIN(__has_unique_object_representations)
@@ -737,18 +735,10 @@ namespace std
 # define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1
 #endif
 
-#if _GLIBCXX_HAS_BUILTIN(__builtin_is_constant_evaluated)
-#  define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1
-#endif
-
 #if _GLIBCXX_HAS_BUILTIN(__is_same)
 #  define _GLIBCXX_HAVE_BUILTIN_IS_SAME 1
 #endif
 
-#if _GLIBCXX_HAS_BUILTIN(__builtin_launder)
-# define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
-#endif
-
 #undef _GLIBCXX_HAS_BUILTIN
 
 
diff --git a/libstdc++-v3/include/bits/char_traits.h 
b/libstdc++-v3/include/bits/char_traits.h
index 3da6e28a513..77ad7be5dfb 100644
--- a/libstdc++-v3/include/bits/char_traits.h
+++ b/libstdc++-v3/include/bits/char_traits.h
@@ -238,7 +238,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef __cpp_lib_is_constant_evaluated
 // Unofficial macro indicating P1032R1 support in C++20
 # define __cpp_lib_constexpr_char_traits 201811L
-#elif __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#elif __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 // Unofficial macro indicating P0426R1 support in C++17
 # define __cpp_lib_constexpr_char_traits 201611L
 #endif
@@ -295,7 +295,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   

Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast

2021-06-24 Thread Jonathan Wakely via Gcc-patches
On Thu, 24 Jun 2021 at 15:34, Jonathan Wakely wrote:
> Rather than use the _GLIBCXX_HAS_BUILTIN macro more widely, I'd prefer
> to not use it where it isn't needed, as in the attached (untested)
> patch.

My rationale for this is that I'd prefer to use standardized features
like __has_include and __has_cpp_attribute where possible, instead of
adding more and more configure macros. You don't need to look in
c++config.h to see how the macro is defined if you just use a standard
feature directly.

__has_builtin obviously isn't standardized, but as long as it's
available on all the compilers we care about (which it is) then the
same rationale applies.



Re: [[PATCH V9] 1/7] dwarf: add a dwarf2int.h internal interface

2021-06-24 Thread Jason Merrill via Gcc-patches

On 6/24/21 9:52 AM, Jose E. Marchesi wrote:


On 5/31/21 12:57 PM, Jose E. Marchesi via Gcc-patches wrote:

This patch introduces a dwarf2int.h header, to be used by code that
needs access to the internal DIE structures and their attributes.


Why not put these bits in dwarf2out.h?


We think that it makes sense to have a separated interface file for the
implementation of DWARF-based debug formats.  It is called `internal'
because it provides access to internal data structures as well as the
basic accessor functions to the internals of the DWARF DIEs.


Yes, but "internal data structures" also describes most of the current 
dwarf2out.h.  I'm not opposed to refactoring the header, but splitting 
off a dwarf2cfi.h (for print-rtl.c and final.c) seems like a better 
dividing line.


Jason



Re: [PATCH] c++: requires-expression folding [PR101182]

2021-06-24 Thread Jason Merrill via Gcc-patches

On 6/23/21 7:37 PM, Patrick Palka wrote:

Here we're crashing because cp_fold_function walks into the (templated)
requirements of a requires-expression outside a template, but the
folding routines aren't prepared to handle templated trees.  This patch
fixes this by making cp_fold use evaluate_requires_expr to fold a
requires-expression as a whole, which also means we no longer need to
explicitly do so during gimplification.  (We don't immediately fold
non-dependent requires-expressions at parse time for sake of better
diagnostics in case one appears as the condition of a failed
static_assert.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, also tested on
cmcstl2 and range-v3, does this look OK for trunk/11?


OK.


PR c++/101182

gcc/cp/ChangeLog:

* constraint.cc (evaluate_requires_expr): Adjust function comment.
* cp-gimplify.c (cp_genericize_r) : Move to ...
(cp_fold) : ... here.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires27.C: New test.
---
  gcc/cp/constraint.cc |  2 +-
  gcc/cp/cp-gimplify.c | 10 --
  gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C | 10 ++
  3 files changed, 15 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 74b16d27101..286332ba2a7 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3340,7 +3340,7 @@ evaluate_concept_check (tree check)
  }
  
  /* Evaluate the requires-expression T, returning either boolean_true_node

-   or boolean_false_node.  This is used during gimplification and constexpr
+   or boolean_false_node.  This is used during folding and constexpr
 evaluation.  */
  
  tree

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 96d91b6ef2f..33e10556e32 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -1465,12 +1465,6 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void 
*data)
TARGET_EXPR_NO_ELIDE (stmt) = 1;
break;
  
-case REQUIRES_EXPR:

-  /* Emit the value of the requires-expression.  */
-  *stmt_p = evaluate_requires_expr (stmt);
-  *walk_subtrees = 0;
-  break;
-
  case TEMPLATE_ID_EXPR:
gcc_assert (concept_check_p (stmt));
/* Emit the value of the concept check.  */
@@ -2708,6 +2702,10 @@ cp_fold (tree x)
x = r;
break;
  
+case REQUIRES_EXPR:

+  x = evaluate_requires_expr (x);
+  break;
+
  default:
return org_x;
  }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C
new file mode 100644
index 000..87a759a689a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C
@@ -0,0 +1,10 @@
+// PR c++/101182
+// { dg-do compile { target concepts } }
+
+int a;
+int g(bool);
+
+int f() {
+  g(requires { a++; });
+  return requires { a++; };
+}





Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast

2021-06-24 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 24, 2021 at 03:40:09PM +0100, Jonathan Wakely wrote:
> On Thu, 24 Jun 2021 at 15:34, Jonathan Wakely wrote:
> > Rather than use the _GLIBCXX_HAS_BUILTIN macro more widely, I'd prefer
> > to not use it where it isn't needed, as in the attached (untested)
> > patch.
> 
> My rationale for this is that I'd prefer to use standardized features
> like __has_include and __has_cpp_attribute where possible, instead of
> adding more and more configure macros. You don't need to look in
> c++config.h to see how the macro is defined if you just use a standard
> feature directly.
> 
> __has_builtin obviously isn't standardized, but as long as it's
> available on all the compilers we care about (which it is) then the
> same rationale applies.

Okay.

Jakub



MSPs, MSSPs Records

2021-06-24 Thread Jennifer Daly
Hi,

Hope you find this email well!

Are you looking for a customer base of your competitors installed base which 
will help you reach niche target and also helps you to grab new clients for 
your latest service and products?

We also have an exclusive database of:


1.   Cloud Service Providers- CSPs

2.   Independent Software Vendors- ISVs

3.   System Integrators- SIs and more

4.   Managed Service Providers- MSPs

5.   Managed Security Service Providers- MSSPs and more

Data can be customized based on your requirement (e.g. Job title, Verticals, 
Geography etc.)


Please feel free to get back to me with your target criteria, if your criteria 
are different from the above-mentioned applications let me know we have close 
to 5000+ technology installations.



Best Regards,

Jennifer Daly

Sr. Lead Analyst


Note: This email is not expected to be a spam. It would be ideal if you 
acknowledge our conciliatory sentiments and answer in the subject taking with 
leave off to be expelled from our Mailing list. Why not try it/us out?


Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-06-24 Thread Andrew MacLeod via Gcc-patches

On 6/24/21 9:25 AM, Andrew MacLeod wrote:

On 6/24/21 8:29 AM, Richard Biener wrote:


THe original function in EVRP currently looks like:

 === BB 2 
     :
    if (a_5(D) == b_6(D))
  goto ; [INV]
    else
  goto ; [INV]

=== BB 8 
Equivalence set : [a_5(D), b_6(D)] edge 2->8 provides 
a_5 and b_6 as equivalences

     :
    goto ; [100.00%]

=== BB 6 
     :
    # i_1 = PHI <0(8), i_10(5)>
    if (i_1 < a_5(D))
  goto ; [INV]
    else
  goto ; [INV]

=== BB 3 
Relational : (i_1 < a_5(D)) edge 6->3 provides 
this relation

     :
    if (i_1 == b_6(D))
  goto ; [INV]
    else
  goto ; [INV]


So It knows that a_5 and b_6 are equivalence, and it knows that i_1 < 
a_5 in BB3 as well..


so we should be able to indicate that  i_1 == b_6 as [0,0]..  we 
currently aren't.   I think I had turned on equivalence mapping during 
relational processing, so should be able to tag that without 
transitive relations...  I'll have a look at why.


And once we get a bit further along, you will be able to access this 
without ranger.. if one wants to simply register the relations directly.


Anyway, I'll get back to you why its currently being missed.

Andrew



As promised.  There was a typo in the equivalency comparisons... so it 
was getting missed.  With the fix, the oracle identifies the relation 
and evrp will now fold that expression away and the IL becomes:


   :
  if (a_5(D) == b_6(D))
    goto ; [INV]
  else
    goto ; [INV]

   :
  i_10 = i_1 + 1;

   :
  # i_1 = PHI <0(2), i_10(3)>
  if (i_1 < a_5(D))
    goto ; [INV]
  else
    goto ; [INV]

   :
  return;

for the other cases you quote, there are no predictions such that if a 
!= 0 then this equivalency exists...


+  if (a != 0)
+    {
+  c = b;
+    }

but the oracle would register that in the TRUE block,  c and b are 
equivalent... so some other pass that was interested in tracking 
conditions that make a block relevant would be able to compare relations...


Andrew




Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec

2021-06-24 Thread Martin Sebor via Gcc-patches

On 6/24/21 3:27 AM, Richard Biener wrote:

On Thu, Jun 24, 2021 at 12:56 AM Martin Sebor  wrote:


On 6/23/21 1:43 AM, Richard Biener wrote:

On Wed, Jun 23, 2021 at 7:23 AM Trevor Saunders  wrote:


On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote:

On 6/21/21 1:15 AM, Richard Biener wrote:

[...]


But maybe I'm misunderstanding C++ too much :/

Well, I guess b) from above means auto_vec<> passing to
vec<> taking functions will need changes?


Converting an auto_vec object to a vec slices off its data members.
The auto_vec specialization has no data members so that's not
a bug in and of itself, but auto_vec does have data members
so that would be a bug.  The risk is not just passing it to
functions by value but also returning it.  That risk was made
worse by the addition of the move ctor.


I would agree that the conversion from auto_vec<> to vec<> is
questionable, and should get some work at some point, perhaps just
passingauto_vec references is good enough, or perhaps there is value in
some const_vec view to avoid having to rely on optimizations, I'm not
sure without looking more at the usage.


We do need to be able to provide APIs that work with both auto_vec<>
and vec<>, I agree that those currently taking a vec<> by value are
fragile (and we've had bugs there before), but I'm not ready to say
that changing them all to [const] vec<>& is OK.  The alternative
would be passing a const_vec<> by value, passing that along to
const vec<>& APIs should be valid then (I can see quite some API
boundary cleanups being necessary here ...).

But with all this I don't know how to adjust auto_vec<> to no
longer "decay" to vec<> but still being able to pass it to vec<>&
and being able to call vec<> member functions w/o jumping through
hoops.  Any hints on that?  private inheritance achieves the first
but also hides all the API ...


The simplest way to do that is by preventing the implicit conversion
between the two types and adding a to_vec() member function to
auto_vec as Jason suggested.  This exposes the implicit conversions
to the base vec, forcing us to review each and "fix" it either by
changing the argument to either vec& or const vec&, or less often
to avoid the indirection if necessary, by converting the auto_vec
to vec explicitly by calling to_vec().  The attached diff shows
a very rough POC of what  that might look like.  A side benefit
is that it improves const-safety and makes the effects of functions
on their callers easier to understand.

But that's orthogonal to making auto_vec copy constructible and copy
assignable.  That change can be made independently and with much less
effort and no risk.


There's a bunch of stuff I can't review because of lack of C++ knowledge
(the vNULL changes).

I suppose that

+  /* Prevent implicit conversion from auto_vec.  Use auto_vec::to_vec()
+ instead.  */
+  template 
+  vec (auto_vec &) = delete;
+
+  template 
+  void operator= (auto_vec &) = delete;

still allows passing auto_vec<> to [const] vec<> & without the .to_vec so
it's just the by-value passing that becomes explicit?  If so then I agree
this is an improvement.  The patch is likely incomplete though or is
it really only so few signatures that need changing?


Yes, to both questions.

I just wanted to show how we might go about making this transition.
I converted a few files but many more remain.  I stopped when
changing a vec argument to const vec& started to cause errors due
to missing const down the line (e.g., assigning the address of a vec
element to an uqualified pointer).  I'll need to follow where that
pointer flows and see if it's used to modify the object or if it too
could be made const.  To me that seems worthwhile doing now but it
makes progress slow.

A separate question is whether the vec value arguments to functions
that modify them should be changed to references or pointers.  Both
kinds of APIs exist but the latter seems prevalent.  Changing them
to the latter means more churn.

Martin


Re: [PATCH 7/7] Port most of the A CMP 0 ? A : -A to match

2021-06-24 Thread Jeff Law via Gcc-patches




On 6/19/2021 3:49 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

To improve phiopt and be able to remove abs_replacement, this ports
most of "A CMP 0 ? A : -A" from fold_cond_expr_with_comparison to
match.pd.  There is a few extra changes that are needed to remove
the "A CMP 0 ? A : -A" part from fold_cond_expr_with_comparison:
* Need to handle (A - B) case
* Need to handle UN* comparisons.

I will handle those in a different patch.

Note phi-opt-15.c test needed to be updated as we get ABSU now
instead of not getting ABS.  When ABSU was added phiopt was not
updated even to use ABSU instead of not creating ABS.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (A CMP 0 ? A : -A): New patterns.
* tree-ssa-phiopt.c (abs_replacement): Delete function.
(tree_ssa_phiopt_worker): Don't call abs_replacement.
Update comment about abs_replacement.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-15.c: Update test to expect
ABSU and still not expect ABS_EXPR.



+(for cnd (cond vec_cond)
+ /* A == 0? A : -Asame as -A */

   ^^
Missing whitespace.  This occurs in the comment each pattern.

OK with the nits fixed.

jeff



Re: [PATCH 1/7] Expand the comparison argument of fold_cond_expr_with_comparison

2021-06-24 Thread Jeff Law via Gcc-patches




On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

To make things slightly easiler to convert fold_cond_expr_with_comparison
over to match.pd, expanding the arg0 argument into 3 different arguments
is done. Also this was simple because we don't use arg0 after grabbing
the code and the two operands.
Also since we do this, we don't need to fold the comparison to
get the inverse but just use invert_tree_comparison directly.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* fold-const.c (fold_cond_expr_with_comparison):
Exand arg0 into comp_code, arg00, and arg01.
(fold_ternary_loc): Use invert_tree_comparison
instead of fold_invert_truthvalue for the case
where we have A CMP B ? C : A.

OK
jeff



Re: [PATCH 2/7] Reset the range info on the moved instruction in PHIOPT

2021-06-24 Thread Jeff Law via Gcc-patches




On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

I had missed this when wrote the patch which allowed the
gimple to be moved from inside the conditional as it.  It
was also missed in the review.  Anyways the range information
needs to be reset for the moved gimple as it was under a
conditional and the flow has changed to be unconditional.
I have not seen any testcase in the wild that produces wrong code
yet which is why there is no testcase but this is similar to what
the other code in phiopt does so after moving those to match, there
might be some.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.c (match_simplify_replacement): Reset
flow senatitive info on the moved ssa set.

OK
jeff



Re: [[PATCH V9] 1/7] dwarf: add a dwarf2int.h internal interface

2021-06-24 Thread Jose E. Marchesi via Gcc-patches


 This patch introduces a dwarf2int.h header, to be used by code that
 needs access to the internal DIE structures and their attributes.
>>>
>>> Why not put these bits in dwarf2out.h?
>> We think that it makes sense to have a separated interface file for
>> the
>> implementation of DWARF-based debug formats.  It is called `internal'
>> because it provides access to internal data structures as well as the
>> basic accessor functions to the internals of the DWARF DIEs.
>
> Yes, but "internal data structures" also describes most of the current
> dwarf2out.h.

Yes right, dwarf2out.h contains a mixture of function prototypes and
several data structures, many of them "internal".

> I'm not opposed to refactoring the header, but splitting 
> off a dwarf2cfi.h (for print-rtl.c and final.c) seems like a better
> dividing line.

Ok, so what about this: at this point we remove dwarf2int.h from our
patch, put the definitions in dwarf2out.h instead, and then once the
stuff is upstream we can discuss on how better refactor the dwarf2out*
stuff.

Is that ok?


[PATCH] c++: alias CTAD and aggregate deduction cand [PR98832]

2021-06-24 Thread Patrick Palka via Gcc-patches
During alias CTAD, we're accidentally ignoring the aggregate deduction
candidate of the underlying template because it's added to the candidate
set separately via maybe_aggr_guide (which doesn't yet handle alias
templates) rather than via deduction_guides_for (which does).  This
patch makes maybe_aggr_guide handle alias templates in a manner similar
to deduction_guides_for.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/98832

gcc/cp/ChangeLog:

* pt.c (maybe_aggr_guide): Handle an alias template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias9.C: New test.
---
 gcc/cp/pt.c | 11 +++
 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C |  6 ++
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index aeb1e8a6f97..db15e4714d5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28880,6 +28880,8 @@ is_spec_or_derived (tree etype, tree tmpl)
   return !err;
 }
 
+static tree alias_ctad_tweaks (tree, tree);
+
 /* Return a C++20 aggregate deduction candidate for TYPE initialized from
INIT.  */
 
@@ -28892,6 +28894,15 @@ maybe_aggr_guide (tree tmpl, tree init, 
vec *args)
   if (init == NULL_TREE)
 return NULL_TREE;
 
+  if (DECL_ALIAS_TEMPLATE_P (tmpl))
+{
+  tree under = DECL_ORIGINAL_TYPE (DECL_TEMPLATE_RESULT (tmpl));
+  tree tinfo = get_template_info (under);
+  if (tree guide = maybe_aggr_guide (TI_TEMPLATE (tinfo), init, args))
+   return alias_ctad_tweaks (tmpl, guide);
+  return NULL_TREE;
+}
+
   /* We might be creating a guide for a class member template, e.g.,
 
template struct A {
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C
new file mode 100644
index 000..0aaf203639a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C
@@ -0,0 +1,6 @@
+// PR c++/98832
+// { dg-do compile { target c++20 } }
+
+template struct X{ U u; };
+template using Y = X;
+Y y{0};
-- 
2.32.0.93.g670b81a890



Re: [PATCH 3/7] Duplicate the range information of the phi onto the new ssa_name

2021-06-24 Thread Jeff Law via Gcc-patches




On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

Since match_simplify_replacement uses gimple_simplify, there is a new
ssa name created sometimes and then we go and replace the phi edge with
this new ssa name, the range information on the phi is lost.
Placing this in replace_phi_edge_with_variable is the best option instead
of doing it in each time replace_phi_edge_with_variable is called which is
what is done today.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.c (replace_phi_edge_with_variable): Duplicate range
info if we're the only things setting the target PHI.
(value_replacement): Don't duplicate range here.
(minmax_replacement): Likewise.
---
  gcc/tree-ssa-phiopt.c | 35 ++-
  1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 24cbce9955a..147397ad82c 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -391,6 +391,24 @@ replace_phi_edge_with_variable (basic_block cond_block,
basic_block bb = gimple_bb (phi);
basic_block block_to_remove;
gimple_stmt_iterator gsi;
+  tree phi_result = PHI_RESULT (phi);
+
+  /* Duplicate range info as needed.  */
+  if (TREE_CODE (new_tree) == SSA_NAME
+  && EDGE_COUNT (gimple_bb (phi)->preds) == 2
+  && INTEGRAL_TYPE_P (TREE_TYPE (phi_result)))
+{
+  if (!SSA_NAME_RANGE_INFO (new_tree)
+ && SSA_NAME_RANGE_INFO (phi_result))
+   duplicate_ssa_name_range_info (new_tree,
+  SSA_NAME_RANGE_TYPE (phi_result),
+  SSA_NAME_RANGE_INFO (phi_result));
+  if (SSA_NAME_RANGE_INFO (new_tree)
+ && !SSA_NAME_RANGE_INFO (phi_result))
+   duplicate_ssa_name_range_info (phi_result,
+  SSA_NAME_RANGE_TYPE (new_tree),
+  SSA_NAME_RANGE_INFO (new_tree));
+}

I'm not sure you need the EDGE_COUNT test here.

I worry that copying the range info from the argument to the PHI result 
isn't right.  It was OK for the ABS replacement, but I'm not sure that's 
true in general.


Jeff


  
/* Change the PHI argument to new.  */

SET_USE (PHI_ARG_DEF_PTR (phi, e->dest_idx), new_tree);
@@ -1385,16 +1403,6 @@ value_replacement (basic_block cond_bb, basic_block 
middle_bb,
   :
   # u_3 = PHI   */
reset_flow_sensitive_info (lhs);
-  if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
-   {
- /* If available, we can use VR of phi result at least.  */
- tree phires = gimple_phi_result (phi);
- struct range_info_def *phires_range_info
-   = SSA_NAME_RANGE_INFO (phires);
- if (phires_range_info)
-   duplicate_ssa_name_range_info (lhs, SSA_NAME_RANGE_TYPE (phires),
-  phires_range_info);
-   }
gimple_stmt_iterator gsi_from;
for (int i = prep_cnt - 1; i >= 0; --i)
{
@@ -1794,13 +1802,6 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb,
gimple_seq stmts = NULL;
tree phi_result = PHI_RESULT (phi);
result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
-  /* Duplicate range info if we're the only things setting the target PHI.  */
-  if (!gimple_seq_empty_p (stmts)
-  && EDGE_COUNT (gimple_bb (phi)->preds) == 2
-  && !POINTER_TYPE_P (TREE_TYPE (phi_result))
-  && SSA_NAME_RANGE_INFO (phi_result))
-duplicate_ssa_name_range_info (result, SSA_NAME_RANGE_TYPE (phi_result),
-  SSA_NAME_RANGE_INFO (phi_result));
  
gsi = gsi_last_bb (cond_bb);

gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);




Re: [PATCH 5/7] Try inverted comparison for match_simplify in phiopt

2021-06-24 Thread Jeff Law via Gcc-patches




On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

Since match and simplify does not have all of the inverted
comparison patterns, it make sense to just have
phi-opt try to do the inversion and try match and simplify again.

OK? Bootstrapped and tested on x86_64-linux-gnu.

Thanks,
Andrew Pinski

gcc/ChangeLog:

* tree-ssa-phiopt.c (gimple_simplify_phiopt):
If "A ? B : C" fails to simplify, try "(!A) ? C : B".

OK
jeff



Re: [PATCH 7/7] Port most of the A CMP 0 ? A : -A to match

2021-06-24 Thread Jeff Law via Gcc-patches




On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

To improve phiopt and be able to remove abs_replacement, this ports
most of "A CMP 0 ? A : -A" from fold_cond_expr_with_comparison to
match.pd.  There is a few extra changes that are needed to remove
the "A CMP 0 ? A : -A" part from fold_cond_expr_with_comparison:
* Need to handle (A - B) case
* Need to handle UN* comparisons.

I will handle those in a different patch.

Note phi-opt-15.c test needed to be updated as we get ABSU now
instead of not getting ABS.  When ABSU was added phiopt was not
updated even to use ABSU instead of not creating ABS.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (A CMP 0 ? A : -A): New patterns.
* tree-ssa-phiopt.c (abs_replacement): Delete function.
(tree_ssa_phiopt_worker): Don't call abs_replacement.
Update comment about abs_replacement.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-15.c: Update test to expect
ABSU and still not expect ABS_EXPR.
And I've already ack'd this part :-)  I think it's unchanged since he 
original.


Jeff



Re: [PING][PATCH 9/13] v2 Use new per-location warning APIs in LTO

2021-06-24 Thread Martin Sebor via Gcc-patches

On 6/24/21 3:32 AM, Richard Biener wrote:

On Mon, Jun 21, 2021 at 11:55 PM Martin Sebor via Gcc-patches
 wrote:


Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571980.html

Looking for a review of the LTO changes to switch TREE_NO_WARNING to
the suppress_warning() API.


Hmm, since the warning suppressions are on location ad-hoc data the appropriate
thing is to adjust the location streaming and that part seems to be missing?

So what you now stream is simply the "everything" fallback, correct?

In particular:

else
-bp_pack_value (bp, TREE_NO_WARNING (expr), 1);
+/* FIXME: pack all warning bits.  */
+bp_pack_value (bp, warning_suppressed_p (expr), 1);

this looks like a wrong comment in that light.


Yes, this is just a fallback.  I haven't thought about how to handle
the FIXME yet but from your comment it sounds like this code might
stay the same (or maybe even go back to streaming the flag directly)
and the nowarn_spec_t bitmap should be streamed elsewhere?



-  else
-compare_values (TREE_NO_WARNING);
+  else if (t1->base.nowarning_flag != t2->base.nowarning_flag)
+return false;

uh.  Can you keep sth like TREE_NO_WARNING_RAW or so?


The flag is used directly in fold-const.c and cp/module.cc so
this would be in keeping with that, but I also don't mind adding
a macro for it.  My only concern is with macro getting used to
inadvertently bypass the API.

Martin



Thanks,
Richard.


On 6/4/21 3:43 PM, Martin Sebor wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the LTO
front end with the new suppress_warning() API.  It adds a couple of
FIXMEs that I plan to take care of in a follow up.






Re: [[PATCH V9] 1/7] dwarf: add a dwarf2int.h internal interface

2021-06-24 Thread Jason Merrill via Gcc-patches

On 6/24/21 11:13 AM, Jose E. Marchesi wrote:



This patch introduces a dwarf2int.h header, to be used by code that
needs access to the internal DIE structures and their attributes.


Why not put these bits in dwarf2out.h?

We think that it makes sense to have a separated interface file for
the
implementation of DWARF-based debug formats.  It is called `internal'
because it provides access to internal data structures as well as the
basic accessor functions to the internals of the DWARF DIEs.


Yes, but "internal data structures" also describes most of the current
dwarf2out.h.


Yes right, dwarf2out.h contains a mixture of function prototypes and
several data structures, many of them "internal".


I'm not opposed to refactoring the header, but splitting
off a dwarf2cfi.h (for print-rtl.c and final.c) seems like a better
dividing line.


Ok, so what about this: at this point we remove dwarf2int.h from our
patch, put the definitions in dwarf2out.h instead, and then once the
stuff is upstream we can discuss on how better refactor the dwarf2out*
stuff.

Is that ok?


Sounds good.

Jason



Re: [PATCH] c++: alias CTAD and aggregate deduction cand [PR98832]

2021-06-24 Thread Jason Merrill via Gcc-patches

On 6/24/21 11:15 AM, Patrick Palka wrote:

During alias CTAD, we're accidentally ignoring the aggregate deduction
candidate of the underlying template because it's added to the candidate
set separately via maybe_aggr_guide (which doesn't yet handle alias
templates) rather than via deduction_guides_for (which does).  This
patch makes maybe_aggr_guide handle alias templates in a manner similar
to deduction_guides_for.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/98832

gcc/cp/ChangeLog:

* pt.c (maybe_aggr_guide): Handle an alias template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias9.C: New test.
---
  gcc/cp/pt.c | 11 +++
  gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C |  6 ++
  2 files changed, 17 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index aeb1e8a6f97..db15e4714d5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28880,6 +28880,8 @@ is_spec_or_derived (tree etype, tree tmpl)
return !err;
  }
  
+static tree alias_ctad_tweaks (tree, tree);

+
  /* Return a C++20 aggregate deduction candidate for TYPE initialized from
 INIT.  */
  
@@ -28892,6 +28894,15 @@ maybe_aggr_guide (tree tmpl, tree init, vec *args)

if (init == NULL_TREE)
  return NULL_TREE;
  
+  if (DECL_ALIAS_TEMPLATE_P (tmpl))

+{
+  tree under = DECL_ORIGINAL_TYPE (DECL_TEMPLATE_RESULT (tmpl));
+  tree tinfo = get_template_info (under);
+  if (tree guide = maybe_aggr_guide (TI_TEMPLATE (tinfo), init, args))
+   return alias_ctad_tweaks (tmpl, guide);
+  return NULL_TREE;
+}
+
/* We might be creating a guide for a class member template, e.g.,
  
 template struct A {

diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C
new file mode 100644
index 000..0aaf203639a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C
@@ -0,0 +1,6 @@
+// PR c++/98832
+// { dg-do compile { target c++20 } }
+
+template struct X{ U u; };
+template using Y = X;
+Y y{0};





Re: [PATCH, OpenMP 5.0] Implement relaxation of implicit map vs. existing device mappings (for mainline trunk)

2021-06-24 Thread Jakub Jelinek via Gcc-patches
On Fri, May 14, 2021 at 09:20:25PM +0800, Chung-Lin Tang wrote:
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index e790f08b23f..69c4a8e0a0a 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -10374,6 +10374,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, 
> void *data)
> gcc_unreachable ();
>   }
>OMP_CLAUSE_SET_MAP_KIND (clause, kind);
> +  OMP_CLAUSE_MAP_IMPLICIT_P (clause) = 1;
>if (DECL_SIZE (decl)
> && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
>   {

As Thomas mentioned, there is now also OMP_CLAUSE_MAP_IMPLICIT that means
something different:
/* Nonzero on map clauses added implicitly for reduction clauses on combined
   or composite constructs.  They shall be removed if there is an explicit
   map clause.  */
Having OMP_CLAUSE_MAP_IMPLICIT and OMP_CLAUSE_MAP_IMPLICIT_P would be too
confusing.  So either we need to use just one flag for both purposes or
have two different flags and find a better name for one of them.
The former would be possible if no OMP_CLAUSE_MAP clauses added by the FEs
are implicit - then you could clear OMP_CLAUSE_MAP_IMPLICIT in
gimplify_scan_omp_clauses.  I wonder if it is the case though, e.g. doesn't
your "Improve OpenMP target support for C++ [PR92120 v4]" patch add a lot of
such implicit map clauses (e.g. the this[:1] and various others)?
Also, gimplify_adjust_omp_clauses_1 sometimes doesn't add just one map
clause, but several, shouldn't those be marked implicit too?  And similarly
it calls lang_hooks.decls.omp_finish_clause which can add even further map
clauses implicitly, shouldn't those be implicit too (in that case copy
the flag from the clause it is called on to the extra clauses it adds)?

Also as Thomas mentioned, it should be restricted to non-OpenACC,
it can check gimplify_omp_ctxp->region_type if it is OpenMP or OpenACC.

> @@ -10971,9 +10972,15 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, 
> gimple_seq body, tree *list_p,
>   list_p = &OMP_CLAUSE_CHAIN (c);
>  }
>  
> -  /* Add in any implicit data sharing.  */
> +  /* Add in any implicit data sharing. Implicit clauses are added at the 
> start

Two spaces after dot in comments.

> + of the clause list, but after any non-map clauses.  */
>struct gimplify_adjust_omp_clauses_data data;
> -  data.list_p = list_p;
> +  tree *implicit_add_list_p = orig_list_p;
> +  while (*implicit_add_list_p
> +  && OMP_CLAUSE_CODE (*implicit_add_list_p) != OMP_CLAUSE_MAP)
> +implicit_add_list_p = &OMP_CLAUSE_CHAIN (*implicit_add_list_p);

Why are the implicit map clauses added first and not last?
There is also the OpenMP 5.1 [352:17-22] case which basically says that the
implicit mappings should be ignored if there are explicit ones on the same
construct (though, do we really create implicit clauses in that case?).

> --- a/include/gomp-constants.h
> +++ b/include/gomp-constants.h
> @@ -40,11 +40,22 @@
>  #define GOMP_MAP_FLAG_SPECIAL_0  (1 << 2)
>  #define GOMP_MAP_FLAG_SPECIAL_1  (1 << 3)
>  #define GOMP_MAP_FLAG_SPECIAL_2  (1 << 4)
> +#define GOMP_MAP_FLAG_SPECIAL_3  (1 << 5)
>  #define GOMP_MAP_FLAG_SPECIAL_4  (1 << 6)
>  #define GOMP_MAP_FLAG_SPECIAL(GOMP_MAP_FLAG_SPECIAL_1 \
>| GOMP_MAP_FLAG_SPECIAL_0)
>  #define GOMP_MAP_DEEP_COPY   (GOMP_MAP_FLAG_SPECIAL_4 \
>| GOMP_MAP_FLAG_SPECIAL_2)
> +/* This value indicates the map was created implicitly according to
> +   OpenMP rules.  */
> +#define GOMP_MAP_IMPLICIT(GOMP_MAP_FLAG_SPECIAL_3 \
> +  | GOMP_MAP_FLAG_SPECIAL_4)
> +/* Mask for entire set of special map kind bits.  */
> +#define GOMP_MAP_FLAG_SPECIAL_BITS   (GOMP_MAP_FLAG_SPECIAL_0 \
> +  | GOMP_MAP_FLAG_SPECIAL_1 \
> +  | GOMP_MAP_FLAG_SPECIAL_2 \
> +  | GOMP_MAP_FLAG_SPECIAL_3 \
> +  | GOMP_MAP_FLAG_SPECIAL_4)
>  /* Flag to force a specific behavior (or else, trigger a run-time error).  */
>  #define GOMP_MAP_FLAG_FORCE  (1 << 7)
>  
> @@ -186,6 +197,9 @@ enum gomp_map_kind
>  #define GOMP_MAP_ALWAYS_P(X) \
>(GOMP_MAP_ALWAYS_TO_P (X) || ((X) == GOMP_MAP_ALWAYS_FROM))
>  
> +#define GOMP_MAP_IMPLICIT_P(X) \
> +  (((X) & GOMP_MAP_FLAG_SPECIAL_BITS) == GOMP_MAP_IMPLICIT)

I think here we need to decide with which GOMP_MAP* kinds the implicit
bit will need to be combined with, with looking forward into what features
we still need to implement for OpenMP 5.0/5.1 (not aware of anything in 5.2
that would need special care but perhaps I've missed it).

E.g. for declare mapper those single OMP_CLAUSE_MAPs with the implicit
bit might need to be split into various smaller ones, map this FIELD_DECL,
map that other FIELD_DECL, what it points to, etc.  Even without declare
mapp

Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only

2021-06-24 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu  wrote:
>
> CPUID functions are used to detect CPU features.  If vector ISAs
> are enabled, compiler is free to use them in these functions.  Add
> __attribute__ ((target("general-regs-only"))) to CPUID functions
> to avoid vector instructions.

These functions are intended to be inlined, so how does target
attribute affect inlining?

Uros.

>
> gcc/
>
> PR target/101185
> * config/i386/cpuid.h (__get_cpuid_max): Add
> __attribute__ ((target("general-regs-only"))).
> (__get_cpuid): Likewise.
> (__get_cpuid_count): Likewise.
> (__cpuidex): Likewise.
>
> gcc/testsuite/
>
> PR target/101185
> * gcc.target/i386/avx512-check.h (check_osxsave): Add
> __attribute__ ((target("general-regs-only"))).
> (main): Likewise.
> ---
>  gcc/config/i386/cpuid.h  | 4 
>  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
>  2 files changed, 6 insertions(+)
>
> diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> index aebc17c6827..74881ee91e5 100644
> --- a/gcc/config/i386/cpuid.h
> +++ b/gcc/config/i386/cpuid.h
> @@ -243,6 +243,7 @@
> pointer is non-null, then first four bytes of the signature
> (as found in ebx register) are returned in location pointed by sig.  */
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline unsigned int
>  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
>  {
> @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> supported and returns 1 for valid cpuid information or 0 for
> unsupported cpuid leaf.  All pointers are required to be non-null.  */
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline int
>  __get_cpuid (unsigned int __leaf,
>  unsigned int *__eax, unsigned int *__ebx,
> @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
>
>  /* Same as above, but sub-leaf can be specified.  */
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline int
>  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
>unsigned int *__eax, unsigned int *__ebx,
> @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int 
> __subleaf,
>return 1;
>  }
>
> +__attribute__ ((target("general-regs-only")))
>  static __inline void
>  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h 
> b/gcc/testsuite/gcc.target/i386/avx512-check.h
> index 0a377dba1d5..406faf8fe03 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> @@ -25,6 +25,7 @@ do_test (void)
>  }
>  #endif
>
> +__attribute__ ((target("general-regs-only")))
>  static int
>  check_osxsave (void)
>  {
> @@ -34,6 +35,7 @@ check_osxsave (void)
>return (ecx & bit_OSXSAVE) != 0;
>  }
>
> +__attribute__ ((target("general-regs-only")))
>  int
>  main ()
>  {
> --
> 2.31.1
>


Re: [PATCH 4/7] Allow match-and-simplified phiopt to run in early phiopt

2021-06-24 Thread Jeff Law via Gcc-patches




On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

To move a few things more to match-and-simplify from phiopt,
we need to allow match_simplify_replacement to run in early
phiopt. To do this we add a replacement for gimple_simplify
that is explictly for phiopt.

OK? Bootstrapped and tested on x86_64-linux-gnu with no
regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.c (match_simplify_replacement):
Add early_p argument. Call gimple_simplify_phiopt
instead of gimple_simplify.
(tree_ssa_phiopt_worker): Update call to
match_simplify_replacement and allow unconditionally.
(phiopt_early_allow): New function.
(gimple_simplify_phiopt): New function.
So the two questions on my end are why did we restrict when this could 
run before and why restrict the codes we're willing to optimize in the 
early phase?  Not an ACK or NAK at this point, just trying to understand 
a bit of the history.


jeff



Re: [PATCH 6/7] Lower for loops before lowering cond in genmatch

2021-06-24 Thread Jeff Law via Gcc-patches




On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

While converting some fold_cond_expr_with_comparison
to match, I found that I wanted to use "for cnd (cond vec_cond)"
but that was not causing the lowering of cond to happen.
What was happening was the lowering of the for loop
was happening after the lowering of the cond. So
swapping was the correct thing to do but it also
means we need to copy for_subst_vec in lower_cond.

OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* genmatch.c (lower_cond): Copy for_subst_vec
for the simplify also.
(lower): Swap the order for lower_for and lower_cond.

I need to defer this to Richi.  I don't know the genmatch code at all.

jeff



RE: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics

2021-06-24 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 24 June 2021 12:11
> To: gcc Patches ; Kyrylo Tkachov
> 
> Subject: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics
> 
> Hi,
> This patch replaces builtins for vdup_n and vmov_n.
> The patch results in regression for pr51534.c.
> Consider following function:
> 
> uint8x8_t f1 (uint8x8_t a) {
>   return vcgt_u8(a, vdup_n_u8(0));
> }
> 
> code-gen before patch:
> f1:
> vmov.i32  d16, #0  @ v8qi
> vcgt.u8 d0, d0, d16
> bx lr
> 
> code-gen after patch:
> f1:
> vceq.i8 d0, d0, #0
> vmvnd0, d0
> bx lr
> 
> I am not sure which one is better tho ?

I think they're equivalent in practice, in any case the patch itself is good 
(move away from RTL builtins).
Ok.
Thanks,
Kyrill

> 
> Also, this patch regressed bf16_dup.c on arm-linux-gnueabi,
> which is due to a missed opt in lowering. I had filed it as
> PR98435, and posted a fix for it here:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572648.html
> 
> Thanks,
> Prathamesh


Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec

2021-06-24 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Wed, Jun 23, 2021 at 12:22 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Wed, Jun 23, 2021 at 7:23 AM Trevor Saunders  
>> > wrote:
>> >>
>> >> On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote:
>> >> > On 6/21/21 1:15 AM, Richard Biener wrote:
>> > [...]
>> >> > >
>> >> > > But maybe I'm misunderstanding C++ too much :/
>> >> > >
>> >> > > Well, I guess b) from above means auto_vec<> passing to
>> >> > > vec<> taking functions will need changes?
>> >> >
>> >> > Converting an auto_vec object to a vec slices off its data members.
>> >> > The auto_vec specialization has no data members so that's not
>> >> > a bug in and of itself, but auto_vec does have data members
>> >> > so that would be a bug.  The risk is not just passing it to
>> >> > functions by value but also returning it.  That risk was made
>> >> > worse by the addition of the move ctor.
>> >>
>> >> I would agree that the conversion from auto_vec<> to vec<> is
>> >> questionable, and should get some work at some point, perhaps just
>> >> passingauto_vec references is good enough, or perhaps there is value in
>> >> some const_vec view to avoid having to rely on optimizations, I'm not
>> >> sure without looking more at the usage.
>> >
>> > We do need to be able to provide APIs that work with both auto_vec<>
>> > and vec<>, I agree that those currently taking a vec<> by value are
>> > fragile (and we've had bugs there before), but I'm not ready to say
>> > that changing them all to [const] vec<>& is OK.  The alternative
>> > would be passing a const_vec<> by value, passing that along to
>> > const vec<>& APIs should be valid then (I can see quite some API
>> > boundary cleanups being necessary here ...).
>>
>> FWIW, as far as const_vec<> goes, we already have array_slice, which is
>> mostly a version of std::span.  (The only real reason for not using
>> std::span itself is that it requires a later version of C++.)
>>
>> Taking those as arguments has the advantage that we can pass normal
>> arrays as well.
>
> It's not really a "const" thing it seems though it certainly would not expose
> any API that would reallocate the vector (which is the problematic part
> of passing vec<> by value, not necessarily changing elements in-place).
>
> Since array_slice doesn't inherit most of the vec<> API transforming an
> API from vec<> to array_slice<> will be also quite some work.  But I
> agree it might be useful for generic API stuff.

Which API functions would be the most useful ones to have?  We could
add them if necessary.

There again, for things like searching and sorting, I think it would
be better to use  where possible.  It should usually be
more efficient than the void* callback approach that the C code tended
to use.

Thanks,
Richard


RE: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-06-24 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 14 June 2021 09:02
> To: Christophe Lyon 
> Cc: gcc Patches ; Kyrylo Tkachov
> 
> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> constructor
> 
> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
>  wrote:
> >
> > On Fri, 4 Jun 2021 at 13:15, Christophe Lyon 
> wrote:
> > >
> > > On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi,
> > > > As mentioned in PR, for the following test-case:
> > > >
> > > > #include 
> > > >
> > > > bfloat16x4_t f1 (bfloat16_t a)
> > > > {
> > > >   return vdup_n_bf16 (a);
> > > > }
> > > >
> > > > bfloat16x4_t f2 (bfloat16_t a)
> > > > {
> > > >   return (bfloat16x4_t) {a, a, a, a};
> > > > }
> > > >
> > > > Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> > > > -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> > > >
> > > > f1:
> > > > vdup.16 d16, r0
> > > > vmovr0, r1, d16  @ v4bf
> > > > bx  lr
> > > >
> > > > f2:
> > > > mov r3, r0  @ __bf16
> > > > adr r1, .L4
> > > > ldrdr0, [r1]
> > > > mov r2, r3  @ __bf16
> > > > mov ip, r3  @ __bf16
> > > > bfi r1, r2, #0, #16
> > > > bfi r0, ip, #0, #16
> > > > bfi r1, r3, #16, #16
> > > > bfi r0, r2, #16, #16
> > > > bx  lr
> > > >
> > > > This seems to happen because vec_init pattern in neon.md has VDQ
> mode
> > > > iterator, which doesn't include V4BF. In attached patch, I changed
> > > > mode
> > > > to VDQX which seems to work for the test-case, and the compiler now
> generates:
> > > >
> > > > f2:
> > > > vdup.16 d16, r0
> > > > vmovr0, r1, d16  @ v4bf
> > > > bx  lr
> > > >
> > > > However, the pattern is also gated on TARGET_HAVE_MVE and I am
> not
> > > > sure if either VDQ or VDQX are correct modes for MVE since MVE has
> > > > only 128-bit vectors ?
> > > >
> > >
> > > I think patterns common to both Neon and MVE should be moved to
> > > vec-common.md, I don't know why such patterns were left in neon.md.
> > Since we end up calling neon_expand_vector_init for both NEON and MVE,
> > I am not sure if we should separate the pattern ?
> > Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> > in attached patch so
> > it will call neon_expand_vector_init only for 128-bit vectors ?
> > Altho hard-coding 16 in the pattern doesn't seem a good idea to me either.
> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> (attaching patch as text).
> 

--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -459,10 +459,12 @@
 )
 
 (define_expand "vec_init"
-  [(match_operand:VDQ 0 "s_register_operand")
+  [(match_operand:VDQX 0 "s_register_operand")
(match_operand 1 "" "")]
   "TARGET_NEON || TARGET_HAVE_MVE"
 {
+  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
+FAIL;
   neon_expand_vector_init (operands[0], operands[1]);
   DONE;
 })

I think we should move this to vec-common.md like Christophe said.
Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in 
the expander condition?
"TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (< VDQ>mode) != 16)"

Thanks,
Kyrill

> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > That being said, I suggest you look at other similar patterns in
> > > vec-common.md, most of which are gated on
> > > ARM_HAVE__ARITH
> > > and possibly beware of issues with iwmmxt :-)
> > >
> > > Christophe
> > >
> > > > Thanks,
> > > > Prathamesh


Re: [PATCH] testsuite: mips: use noinline attribute instead of -fno-inline

2021-06-24 Thread Jeff Law via Gcc-patches




On 6/22/2021 1:05 AM, Xi Ruoyao via Gcc-patches wrote:

mips.exp does not support -fno-inline, causing the tests return "ERROR:
Unrecognised option: -fno-inline for dg-options ... ".

Use noinline attribute like other mips target tests, to workaround it.

gcc/testsuite/

* gcc.target/mips/cfgcleanup-jalr2.c: Remove -fno-inline and add
  __attribute__((noinline)).
* gcc.target/mips/cfgcleanup-jalr3.c: Likewise.
I'd like to know a bit more here.  mips.exp shouldn't care about the 
options passed to the compiler and to the best of my knowledge 
-fno-inline is still a valid GCC option.  So while I don't think the 
patch itself is wrong, I question if it's necessary and whether or not 
your just papering over some other issue.


Jeff



Re: [PATCH] testsuite: mips: use noinline attribute instead of -fno-inline

2021-06-24 Thread Xi Ruoyao via Gcc-patches
On Thu, 2021-06-24 at 10:48 -0600, Jeff Law wrote:
> 
> 
> On 6/22/2021 1:05 AM, Xi Ruoyao via Gcc-patches wrote:
> > mips.exp does not support -fno-inline, causing the tests return
> > "ERROR:
> > Unrecognised option: -fno-inline for dg-options ... ".
> > 
> > Use noinline attribute like other mips target tests, to workaround
> > it.
> > 
> > gcc/testsuite/
> > 
> > * gcc.target/mips/cfgcleanup-jalr2.c: Remove -fno-inline and
> > add
> >   __attribute__((noinline)).
> > * gcc.target/mips/cfgcleanup-jalr3.c: Likewise.
> I'd like to know a bit more here.  mips.exp shouldn't care about the 
> options passed to the compiler and to the best of my knowledge 
> -fno-inline is still a valid GCC option.  So while I don't think the 
> patch itself is wrong, I question if it's necessary and whether or not
> your just papering over some other issue.

There is some logic processing options in mips.exp.  Some options are
overrided for multilib.  It seems the mips.exp was originally designed
as:

* MIPS options should go in dg-options
* Other options should go in dg-additional-options

In d2148424165 marxin merged some dg-additional-options into dg-options,
exploited the problem.

And, the "origin" convention seems already broken: there is something
like -funroll-loops which is not a MIPS option, but accepted by mips.exp
in dg-options.

Possiblities are:

(1) this patch
(2) make mips.exp accept -fno-inline as "if it is a MIPS option"
(3) refactor mips.exp to pass everything itself doesn't know directly to
gcc
-- 
Xi Ruoyao 



PING^1 [PATCH v4 0/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

2021-06-24 Thread H.J. Lu via Gcc-patches
On Wed, Jun 9, 2021 at 4:39 PM H.J. Lu  wrote:
>
> 1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTO
> operands to vector broadcast from an integer with AVX2.
> 2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
> won't increase stack alignment requirement and blocks transformation by
> the combine pass.
>
> A small benchmark:
>
> https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast
>
> shows that broadcast is a little bit faster on Intel Core i7-8559U:
>
> $ make
> gcc -g -I. -O2   -c -o test.o test.c
> gcc -g   -c -o memory.o memory.S
> gcc -g   -c -o broadcast.o broadcast.S
> gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
> gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
> ./test
> memory  : 147215
> broadcast   : 121213
> vec_dup_sse2: 171366
> $
>
> broadcast is also smaller:
>
> $ size memory.o broadcast.o
>textdata bss dec hex filename
> 132   0   0 132  84 memory.o
> 122   0   0 122  7a broadcast.o
> $
>
> 3. Update PR 87767 tests to expect integer broadcast instead of broadcast
> from memory.
> 4. Update avx512f_cond_move.c to expect integer broadcast.
>
> A small benchmark:
>
> https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast
>
> shows that integer broadcast is faster than embedded memory broadcast:
>
> $ make
> gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
> gcc -g   -c -o memory.o memory.S
> gcc -g   -c -o broadcast.o broadcast.S
> gcc -o test test.o memory.o broadcast.o
> ./test
> memory  : 425538
> broadcast   : 375260
> $
>
> 5. Update vec_duplicate to allow to fail so that backend can only allow
> broadcasting an integer constant to a vector when broadcast instruction
> is available.  This can be used by memset expander to avoid vec_duplicate
> when loading from constant pool is more efficient.
> 6. Add vec_duplicate expander and enable vec_duplicate from a
> non-standard SSE constant integer only if vector broadcast is available.
>
> H.J. Lu (2):
>   x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast
>   x86: Add vec_duplicate expander
>
>  gcc/config/i386/i386-expand.c | 208 +-
>  gcc/config/i386/i386-protos.h |   3 +
>  gcc/config/i386/i386.c|  13 ++
>  gcc/config/i386/sse.md|  21 ++
>  gcc/doc/md.texi   |   2 -
>  .../i386/avx512f-broadcast-pr87767-1.c|   7 +-
>  .../i386/avx512f-broadcast-pr87767-5.c|   5 +-
>  .../gcc.target/i386/avx512f_cond_move.c   |   4 +-
>  .../i386/avx512vl-broadcast-pr87767-1.c   |  12 +-
>  .../i386/avx512vl-broadcast-pr87767-5.c   |   9 +-
>  gcc/testsuite/gcc.target/i386/pr100865-1.c|  13 ++
>  gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
>  gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
>  gcc/testsuite/gcc.target/i386/pr100865-11a.c  |  23 ++
>  gcc/testsuite/gcc.target/i386/pr100865-11b.c  |   8 +
>  gcc/testsuite/gcc.target/i386/pr100865-12a.c  |  20 ++
>  gcc/testsuite/gcc.target/i386/pr100865-12b.c  |   8 +
>  gcc/testsuite/gcc.target/i386/pr100865-2.c|  14 ++
>  gcc/testsuite/gcc.target/i386/pr100865-3.c|  15 ++
>  gcc/testsuite/gcc.target/i386/pr100865-4a.c   |  16 ++
>  gcc/testsuite/gcc.target/i386/pr100865-4b.c   |   9 +
>  gcc/testsuite/gcc.target/i386/pr100865-5a.c   |  16 ++
>  gcc/testsuite/gcc.target/i386/pr100865-5b.c   |   9 +
>  gcc/testsuite/gcc.target/i386/pr100865-6a.c   |  16 ++
>  gcc/testsuite/gcc.target/i386/pr100865-6b.c   |   9 +
>  gcc/testsuite/gcc.target/i386/pr100865-7a.c   |  17 ++
>  gcc/testsuite/gcc.target/i386/pr100865-7b.c   |   9 +
>  gcc/testsuite/gcc.target/i386/pr100865-8a.c   |  24 ++
>  gcc/testsuite/gcc.target/i386/pr100865-8b.c   |   7 +
>  gcc/testsuite/gcc.target/i386/pr100865-9a.c   |  25 +++
>  gcc/testsuite/gcc.target/i386/pr100865-9b.c   |   7 +
>  31 files changed, 563 insertions(+), 26 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-11a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-11b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-12a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-12b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6a.c
>  create mode 100644 gcc/testsuite/gcc.target/i

[COMMITTED] Correctly unify recomputation with existing range.

2021-06-24 Thread Andrew MacLeod via Gcc-patches
When propagating the on-entry cache, new block ranges are calculated by 
combining all the incoming edges and comparing to the old value.


When a recomputation was performed on an edge, it didn't take into 
account that the value in the block may already be better than a 
potential recompuation... Thus a worse values was sometimes propagated.


Fixed by simply calling the now correct range_on_edge the cache provides.

bootstraps on x86_64-pc-linux-gnu with no regressions.  pushed.

Andrew


>From 5bdcfb74ff97b42f08993af8614c35685fdd8689 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 23 Jun 2021 15:25:45 -0400
Subject: [PATCH 2/3] Correctly unify recomputation with existing range.

When propagating the on-entry cache, new block ranges are calculated
by combining all the incoming edges and comparing to the old value.
When a recomputation was performed on an edge, it didn't take into account
that the value in the block may already be better than a potential
recompuation... Thus a worse values was sometimes propagated.
Fixed by simply calling the now correct range_on_edge the cache provides.

	* gimple-range-cache.cc (ranger_cache::propagate_cache): Call
	range_on_edge instead of manually calculating.
---
 gcc/gimple-range-cache.cc | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index a377261c5c7..98ecdbbd68e 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1037,27 +1037,12 @@ ranger_cache::propagate_cache (tree name)
   new_range.set_undefined ();
   FOR_EACH_EDGE (e, ei, bb->preds)
 	{
+	  range_on_edge (e_range, e, name);
 	  if (DEBUG_RANGE_CACHE)
-	fprintf (dump_file, "   edge %d->%d :", e->src->index, bb->index);
-	  // Get whatever range we can for this edge.
-	  if (!m_gori.outgoing_edge_range_p (e_range, e, name, *this))
 	{
-	  exit_range (e_range, name, e->src);
-	  if (DEBUG_RANGE_CACHE)
-		{
-		  fprintf (dump_file, "No outgoing edge range, picked up ");
-		  e_range.dump (dump_file);
-		  fprintf (dump_file, "\n");
-		}
-	}
-	  else
-	{
-	  if (DEBUG_RANGE_CACHE)
-		{
-		  fprintf (dump_file, "outgoing range :");
-		  e_range.dump (dump_file);
-		  fprintf (dump_file, "\n");
-		}
+	  fprintf (dump_file, "   edge %d->%d :", e->src->index, bb->index);
+	  e_range.dump (dump_file);
+	  fprintf (dump_file, "\n");
 	}
 	  new_range.union_ (e_range);
 	  if (new_range.varying_p ())
@@ -1074,7 +1059,11 @@ ranger_cache::propagate_cache (tree name)
 	  if (DEBUG_RANGE_CACHE) 
 	{
 	  if (!ok_p)
-		fprintf (dump_file, " Cache failure to store value.");
+		{
+		  fprintf (dump_file, "   Cache failure to store value:");
+		  print_generic_expr (dump_file, name, TDF_SLIM);
+		  fprintf (dump_file, "  ");
+		}
 	  else
 		{
 		  fprintf (dump_file, "  Updating range to ");
-- 
2.17.2



[COMMITTED] Fix relation query of equivalences.

2021-06-24 Thread Andrew MacLeod via Gcc-patches


When looking for relations between equivalencies, a typo was causing the 
same bitmap to be checked for both operands, instead of the correct one 
for each.   This caused us to never notice relations between equivalences.


I also noticed that under some circumstances the relation dump would 
call blocks which were NULL and trap.. Also fixed.


bootstraps on x86_64-pc-linux-gnu with no regressions.  pushed.

Andrew

>From ce0b409f562cd09c67cc2dce74143a0f0647cde5 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 24 Jun 2021 11:13:47 -0400
Subject: [PATCH 3/3] Fix relation query of equivalences.

When looking for relations between equivalencies, a typo was causing
the wrong bitmap to be checked. Effect was is missed them.
Plus don't dump blocks which don't exist.

	* value-relation.cc (equiv_oracle::dump): Do not dump NULL blocks.
	(relation_oracle::find_relation_block): Check correct bitmap.
	(relation_oracle::dump): Do not dump NULL blocks.
---
 gcc/value-relation.cc | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 3c8698f2a54..43fcab7995a 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -444,7 +444,7 @@ equiv_oracle::dump (FILE *f) const
 {
   fprintf (f, "Equivalency dump\n");
   for (unsigned i = 0; i < m_equiv.length (); i++)
-if (m_equiv[i])
+if (m_equiv[i] && BASIC_BLOCK_FOR_FN (cfun, i))
   {
 	fprintf (f, "BB%d\n", i);
 	dump (f, BASIC_BLOCK_FOR_FN (cfun, i));
@@ -757,9 +757,9 @@ relation_oracle::find_relation_block (unsigned bb, const_bitmap b1,
 {
   unsigned op1 = SSA_NAME_VERSION (ptr->op1 ());
   unsigned op2 = SSA_NAME_VERSION (ptr->op2 ());
-  if (bitmap_bit_p (b1, op1) && bitmap_bit_p (b1, op2))
+  if (bitmap_bit_p (b1, op1) && bitmap_bit_p (b2, op2))
 	return ptr->kind ();
-  if (bitmap_bit_p (b1, op2) && bitmap_bit_p (b1, op1))
+  if (bitmap_bit_p (b1, op2) && bitmap_bit_p (b2, op1))
 	return relation_swap (ptr->kind ());
 }
 
@@ -925,8 +925,9 @@ relation_oracle::dump (FILE *f) const
 {
   fprintf (f, "Relation dump\n");
   for (unsigned i = 0; i < m_relations.length (); i++)
-{
-  fprintf (f, "BB%d\n", i);
-  dump (f, BASIC_BLOCK_FOR_FN (cfun, i));
-}
+if (BASIC_BLOCK_FOR_FN (cfun, i))
+  {
+	fprintf (f, "BB%d\n", i);
+	dump (f, BASIC_BLOCK_FOR_FN (cfun, i));
+  }
 }
-- 
2.17.2



[PATCH] fixinc: don't "fix" machine names in quoted string [PR91085]

2021-06-24 Thread Xi Ruoyao via Gcc-patches
Bootstrapped and regtested on x86_64-linux-gnu. Ok for trunk?

fixincludes/

* fixfixes.c (print_quote): Define it unconditionally, taking
  and returning const char *.
  (machine_name_fix): Output quoted strings verbatim.
---
 fixincludes/fixfixes.c | 30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/fixincludes/fixfixes.c b/fixincludes/fixfixes.c
index 5b23a8b640d..2f4ee28a897 100644
--- a/fixincludes/fixfixes.c
+++ b/fixincludes/fixfixes.c
@@ -78,15 +78,14 @@ static void fix (const char* filname ATTRIBUTE_UNUSED , \
  const char* text ATTRIBUTE_UNUSED , \
  tFixDesc* p_fixd ATTRIBUTE_UNUSED )
 
-#ifdef NEED_PRINT_QUOTE
 /*
  *  Skip over a quoted string.  Single quote strings may
  *  contain multiple characters if the first character is
  *  a backslash.  Especially a backslash followed by octal digits.
  *  We are not doing a correctness syntax check here.
  */
-static char*
-print_quote(char q, char* text )
+static const char*
+print_quote(char q, const char* text )
 {
   fputc( q, stdout );
 
@@ -118,7 +117,6 @@ print_quote(char q, char* text )
 
   return text;
 }
-#endif /* NEED_PRINT_QUOTE */
 
 
 /*
@@ -488,7 +486,7 @@ FIX_PROC_HEAD( char_macro_def_fix )
 FIX_PROC_HEAD( machine_name_fix )
 {
   regmatch_t match[2];
-  const char *line, *base, *limit, *p, *q;
+  const char *line, *base, *limit, *p, *q, *quote;
   regex_t *label_re, *name_re;
   char scratch[SCRATCHSZ];
   size_t len;
@@ -533,7 +531,7 @@ FIX_PROC_HEAD( machine_name_fix )
   for (;;)
 {
 again:
-  if (base == limit)
+  if (base >= limit)
 break;
 
   if (xregexec (name_re, base, 1, match, REG_NOTBOL))
@@ -543,6 +541,26 @@ FIX_PROC_HEAD( machine_name_fix )
   if (match[0].rm_eo > limit - base)
 break;
 
+  /* Looking for the next non-escaped quote in the line.  */
+  quote = base - 1;
+  do
+{
+  quote++;
+  quote = strpbrk (quote, "'\"");
+  if (quote && quote > limit)
+quote = NULL;
+}
+  while (quote && quote[-1] == '\\');
+
+  if (quote && match[0].rm_eo > quote - base)
+{
+  /* Match is after the quote: print the quoted string and
+ everything before it verbatim.  */
+  fwrite (text, 1, quote - text, stdout);
+  base = text = print_quote (*quote, quote + 1);
+  goto again;
+}
+
   p = base + match[0].rm_so;
   base += match[0].rm_eo;
 
-- 
2.32.0




Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only

2021-06-24 Thread H.J. Lu via Gcc-patches
On Thu, Jun 24, 2021 at 9:12 AM Uros Bizjak  wrote:
>
> On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu  wrote:
> >
> > CPUID functions are used to detect CPU features.  If vector ISAs
> > are enabled, compiler is free to use them in these functions.  Add
> > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > to avoid vector instructions.
>
> These functions are intended to be inlined, so how does target
> attribute affect inlining?

CPUID function won't be inlined.   Then we need to rewrite x86 tests
to put CPUID check in a separate function with target attribute to
limit ISAs used.

> Uros.
>
> >
> > gcc/
> >
> > PR target/101185
> > * config/i386/cpuid.h (__get_cpuid_max): Add
> > __attribute__ ((target("general-regs-only"))).
> > (__get_cpuid): Likewise.
> > (__get_cpuid_count): Likewise.
> > (__cpuidex): Likewise.
> >
> > gcc/testsuite/
> >
> > PR target/101185
> > * gcc.target/i386/avx512-check.h (check_osxsave): Add
> > __attribute__ ((target("general-regs-only"))).
> > (main): Likewise.
> > ---
> >  gcc/config/i386/cpuid.h  | 4 
> >  gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++
> >  2 files changed, 6 insertions(+)
> >
> > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> > index aebc17c6827..74881ee91e5 100644
> > --- a/gcc/config/i386/cpuid.h
> > +++ b/gcc/config/i386/cpuid.h
> > @@ -243,6 +243,7 @@
> > pointer is non-null, then first four bytes of the signature
> > (as found in ebx register) are returned in location pointed by sig.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline unsigned int
> >  __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
> >  {
> > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int 
> > *__sig)
> > supported and returns 1 for valid cpuid information or 0 for
> > unsupported cpuid leaf.  All pointers are required to be non-null.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline int
> >  __get_cpuid (unsigned int __leaf,
> >  unsigned int *__eax, unsigned int *__ebx,
> > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf,
> >
> >  /* Same as above, but sub-leaf can be specified.  */
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline int
> >  __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf,
> >unsigned int *__eax, unsigned int *__ebx,
> > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int 
> > __subleaf,
> >return 1;
> >  }
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static __inline void
> >  __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
> >  {
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h 
> > b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > index 0a377dba1d5..406faf8fe03 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > @@ -25,6 +25,7 @@ do_test (void)
> >  }
> >  #endif
> >
> > +__attribute__ ((target("general-regs-only")))
> >  static int
> >  check_osxsave (void)
> >  {
> > @@ -34,6 +35,7 @@ check_osxsave (void)
> >return (ecx & bit_OSXSAVE) != 0;
> >  }
> >
> > +__attribute__ ((target("general-regs-only")))
> >  int
> >  main ()
> >  {
> > --
> > 2.31.1
> >



-- 
H.J.


Re: [committed] libstdc++: Implement LWG 2762 for std::unique_ptr::operator*

2021-06-24 Thread Patrick Palka via Gcc-patches
On Thu, 24 Jun 2021, Jonathan Wakely via Libstdc++ wrote:

> The LWG issue proposes to add a conditional noexcept-specifier to
> std::unique_ptr's dereference operator. The issue is currently in
> Tentatively Ready status, but even if it isn't voted into the draft, we
> can do it as a conforming extensions. This commit also adds a similar
> noexcept-specifier to operator[] for the unique_ptr partial
> specialization.

The conditional noexcept added to unique_ptr::operator[] seems to break
the case where T is complete only after the fact:

  struct T;
  extern std::unique_ptr p;
  auto& t = p[1];
  struct T { };

/include/c++/12.0.0/bits/unique_ptr.h: In instantiation of ‘typename 
std::add_lvalue_reference<_Tp>::type std::unique_ptr<_Tp [], 
_Dp>::operator[](std::size_t) co
nst [with _Tp = A; _Dp = std::default_delete; typename 
std::add_lvalue_reference<_Tp>::type = A&; std::size_t = long unsigned int]’:
testcase.cc:5:14:   required from here
/include/c++/12.0.0/bits/unique_ptr.h:658:48: error: invalid use of incomplete 
type ‘struct A’
  658 |   
noexcept(noexcept(std::declval()[std::declval()]))
  | ~~~^
testcase.cc:3:8: note: forward declaration of ‘struct A’
3 | struct A;
  |^

> 
> Also ensure that all dereference operators for shared_ptr are noexcept,
> and adds tests for the std::optional accessors modified by the issue,
> which were already noexcept in our implementation.
> 
> Signed-off-by: Jonathan Wakely 
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/bits/shared_ptr_base.h (__shared_ptr_access::operator[]):
>   Add noexcept.
>   * include/bits/unique_ptr.h (unique_ptr::operator*): Add
>   conditional noexcept as per LWG 2762.
>   * testsuite/20_util/shared_ptr/observers/array.cc: Check that
>   dereferencing cannot throw.
>   * testsuite/20_util/shared_ptr/observers/get.cc: Likewise.
>   * testsuite/20_util/optional/observers/lwg2762.cc: New test.
>   * testsuite/20_util/unique_ptr/lwg2762.cc: New test.
> 
> Tested powerpc64le-linux. Committed to trunk.
> 
> 


Re: [PATCH] aix: handle 64bit inodes for include directories

2021-06-24 Thread Jeff Law via Gcc-patches




On 6/23/2021 12:53 AM, CHIGOT, CLEMENT via Gcc-patches wrote:

Hi David,

Did you have a chance to take look at this patch ?

Thanks,
Clément



+DavidMalcolm

Can you review this patch when you have a moment?

Thanks, David

On Mon, May 17, 2021 at 3:05 PM David Edelsohn  wrote:

The aix.h change is okay with me, but you need to get approval for the
incpath.c and cpplib.h parts of the patch from the appropriate
maintainers.

Thanks, David

On Mon, May 17, 2021 at 7:44 AM CHIGOT, CLEMENT  wrote:

On AIX, stat will store inodes in 32bit even when using LARGE_FILES.
If the inode is larger, it will return -1 in st_ino.
Thus, in incpath.c when comparing include directories, if several
of them have 64bit inodes, they will be considered as duplicated.

gcc/ChangeLog:
2021-05-06  Clément Chigot  

 * configure.ac: Check sizeof ino_t and dev_t.
 * config.in: Regenerate.
 * configure: Regenerate.
 * config/rs6000/aix.h (HOST_STAT_FOR_64BIT_INODES): New define.
 * incpath.c (HOST_STAT_FOR_64BIT_INODES): New define.
 (remove_duplicates): Use it.

libcpp/ChangeLog:
2021-05-06  Clément Chigot  

 * configure.ac: Check sizeof ino_t and dev_t.
 * config.in: Regenerate.
 * configure: Regenerate.
 * include/cpplib.h (INO_T_CPP): Change for AIX.
 (DEV_T_CPP): New macro.
 (struct cpp_dir): Use it.
So my worry here is this is really a host property -- ie, this is 
behavior of where GCC runs, not the target for which GCC is generating code.


That implies that the change in aix.h is wrong.  aix.h is for the 
target, not the host -- you don't want to define something like 
HOST_STAT_FOR_64BIT_INODES there.


You'd want to be triggering this behavior via a host fragment, x-aix, or 
better yet via an autoconf test.


jeff


[r12-1764 Regression] FAIL: g++.dg/opt/pr91838.C -std=c++2a scan-assembler pxor\\s+%xmm0,\\s+%xmm0 on Linux/x86_64

2021-06-24 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

3bd86940c428de9dde53e41265fb1435ed236f5e is the first bad commit
commit 3bd86940c428de9dde53e41265fb1435ed236f5e
Author: liuhongt 
Date:   Tue Jan 26 16:29:32 2021 +0800

i386: Add vashlm3/vashrm3/vlshrm3 to enable vectorization of vector shift 
vector. [PR98434]

caused

FAIL: g++.dg/opt/pr91838.C  -std=c++14  scan-assembler pxor\\s+%xmm0,\\s+%xmm0
FAIL: g++.dg/opt/pr91838.C  -std=c++17  scan-assembler pxor\\s+%xmm0,\\s+%xmm0
FAIL: g++.dg/opt/pr91838.C  -std=c++2a  scan-assembler pxor\\s+%xmm0,\\s+%xmm0

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-1764/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/opt/pr91838.C 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r12-1773 Regression] FAIL: gcc.dg/vect/pr96854.c (test for excess errors) on Linux/x86_64

2021-06-24 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

7a6c31f0f84a7295433ebac09b94fae2d5cc2892 is the first bad commit
commit 7a6c31f0f84a7295433ebac09b94fae2d5cc2892
Author: Richard Biener 
Date:   Mon May 31 13:19:01 2021 +0200

Add x86 addsub SLP pattern

caused

FAIL: gcc.dg/vect/pr96854.c -flto -ffat-lto-objects (internal compiler error)
FAIL: gcc.dg/vect/pr96854.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr96854.c (internal compiler error)
FAIL: gcc.dg/vect/pr96854.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-1773/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/pr96854.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/pr96854.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [committed] libstdc++: Implement LWG 2762 for std::unique_ptr::operator*

2021-06-24 Thread Jonathan Wakely via Gcc-patches
On Thu, 24 Jun 2021 at 19:54, Patrick Palka  wrote:
>
> On Thu, 24 Jun 2021, Jonathan Wakely via Libstdc++ wrote:
>
> > The LWG issue proposes to add a conditional noexcept-specifier to
> > std::unique_ptr's dereference operator. The issue is currently in
> > Tentatively Ready status, but even if it isn't voted into the draft, we
> > can do it as a conforming extensions. This commit also adds a similar
> > noexcept-specifier to operator[] for the unique_ptr partial
> > specialization.
>
> The conditional noexcept added to unique_ptr::operator[] seems to break
> the case where T is complete only after the fact:
>
>   struct T;
>   extern std::unique_ptr p;
>   auto& t = p[1];
>   struct T { };
>
> /include/c++/12.0.0/bits/unique_ptr.h: In instantiation of ‘typename 
> std::add_lvalue_reference<_Tp>::type std::unique_ptr<_Tp [], 
> _Dp>::operator[](std::size_t) co
> nst [with _Tp = A; _Dp = std::default_delete; typename 
> std::add_lvalue_reference<_Tp>::type = A&; std::size_t = long unsigned int]’:
> testcase.cc:5:14:   required from here
> /include/c++/12.0.0/bits/unique_ptr.h:658:48: error: invalid use of 
> incomplete type ‘struct A’
>   658 |   
> noexcept(noexcept(std::declval()[std::declval()]))
>   | ~~~^
> testcase.cc:3:8: note: forward declaration of ‘struct A’
> 3 | struct A;
>   |^

OK, I'll remove that again.



Re: [EXTERNAL] Re: rs6000: Fix typos in float128 ISA3.1 support

2021-06-24 Thread Segher Boessenkool
On Thu, Jun 24, 2021 at 05:32:20PM +0800, Kewen.Lin wrote:
> on 2021/6/24 上午12:58, Segher Boessenkool wrote:
> > On Wed, Jun 23, 2021 at 12:17:07PM +0800, Kewen.Lin wrote:
>  +#ifdef FLOAT128_HW_INSNS_ISA3_1
>   TFtype __floattikf (TItype_ppc)
> __attribute__ ((__ifunc__ ("__floattikf_resolve")));
> >>>
> >>> I wonder if we now need TItype_ppc at all anymore, btw?
> >>
> >> Sorry that I don't quite follow this question.
> > 
> > I thought it may do the same as just TItype now, but the ifunc stuff
> > probably makes it different still :-)
> 
> Ah, thanks for the clarification!  If I read it right, TItype is defined
> with __attribute__ ((mode (TI))) while TItype_ppc is defined with 
> __attribute__ ((__mode__ (__TI__))), the later writing looks special.

I managed to read things wrong, I thought there was some ifunc stuff in
the definition of TItype_ppc.  Of course there is not, it is just
setting the mode.

mode(__TI__) is just the more portable way of writing mode(TI), the
latter will not work if something #define's TI (you cannot do that with
__TI__, you are not allowed to by the C standard, in application code).

So it looks like we could just use {U,}TItype here, no _ppc.  Carl, is
there some reason I'm not seeing you used it?


Segher


[COMMITTED] Add a test case to confirm the equivalence's are being checked by EVRP.

2021-06-24 Thread Andrew MacLeod via Gcc-patches
Adding a testcase which shows the equivalence/relation engine working as 
it should.


pushed.

Andrew


>From ce3316e9c02c81c509173572c71a101f4eb62a24 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 24 Jun 2021 13:49:51 -0400
Subject: [PATCH 2/2] Add a testcase to confirm the equivalence's are being
 checked by EVRP.

	* gcc.dg/tree-ssa/evrp30.c: New.
---
 gcc/testsuite/gcc.dg/tree-ssa/evrp30.c | 16 
 1 file changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/evrp30.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/evrp30.c b/gcc/testsuite/gcc.dg/tree-ssa/evrp30.c
new file mode 100644
index 000..2c5ff41ecd9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/evrp30.c
@@ -0,0 +1,16 @@
+/* Confirm the ranger is picking up a relationship with equivalences.  */
+/* { dg-do compile } */
+/* { dg-options "-O2  -fdump-tree-evrp" } */
+
+extern void foo ();
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+for (unsigned i = 0; i < a; i++)
+  if (i == b)   // Confirm  i < a also means i < b.
+	foo (); /* Unreachable */
+}
+
+/* { dg-final { scan-tree-dump-times "foo\\(" 0 "evrp"} } */
+
-- 
2.17.2



[COMMITTED] tree-optimization/101189 - Only register relations on live edges

2021-06-24 Thread Andrew MacLeod via Gcc-patches
As mentioned in the PR analysis, we shouldn't register a relation on an 
outgoing conditional edge if range analysis proves that edge can never 
be taken. Its just asking for trouble :-)


Bootstrapped on x86_64-pc-linux-gnu.  Pushed.

Andrew

>From a0accaa99844b0c40661202635859f8c0be76cdd Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 24 Jun 2021 13:35:21 -0400
Subject: [PATCH 1/2] Only register relations on live edges

Register a relation on a conditional edge only if the LHS supports
this edge being taken.

	gcc/
	PR tree-optimization/101189
	* gimple-range-fold.cc (fold_using_range::range_of_range_op): Pass
	LHS range of condition to postfold routine.
	(fold_using_range::postfold_gcond_edges): Only process the TRUE or
	FALSE edge if the LHS range supports it being taken.
	* gimple-range-fold.h (postfold_gcond_edges): Add range parameter.

	gcc/testsuite/
	* gcc.dg/tree-ssa/pr101189.c: New.
---
 gcc/gimple-range-fold.cc | 29 +++-
 gcc/gimple-range-fold.h  |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr101189.c | 17 ++
 3 files changed, 41 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr101189.c

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 583348e6e36..1fa4ace32b9 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -617,7 +617,7 @@ fold_using_range::range_of_range_op (irange &r, gimple *s, fur_source &src)
 		}
 	}
 	  else if (is_a (s))
-	postfold_gcond_edges (as_a (s), src);
+	postfold_gcond_edges (as_a (s), r, src);
 	}
   else
 	r.set_varying (type);
@@ -1247,9 +1247,11 @@ fold_using_range::relation_fold_and_or (irange& lhs_range, gimple *s,
 // Register any outgoing edge relations from a conditional branch.
 
 void
-fold_using_range::postfold_gcond_edges (gcond *s, fur_source &src)
+fold_using_range::postfold_gcond_edges (gcond *s, irange& lhs_range,
+	fur_source &src)
 {
   int_range_max r;
+  int_range<2> e0_range, e1_range;
   tree name;
   range_operator *handler;
   basic_block bb = gimple_bb (s);
@@ -1257,10 +1259,27 @@ fold_using_range::postfold_gcond_edges (gcond *s, fur_source &src)
   edge e0 = EDGE_SUCC (bb, 0);
   if (!single_pred_p (e0->dest))
 e0 = NULL;
+  else
+{
+  // If this edge is never taken, ignore it.
+  gcond_edge_range (e0_range, e0);
+  e0_range.intersect (lhs_range);
+  if (e0_range.undefined_p ())
+	e0 = NULL;
+}
+
 
   edge e1 = EDGE_SUCC (bb, 1);
   if (!single_pred_p (e1->dest))
 e1 = NULL;
+  else
+{
+  // If this edge is never taken, ignore it.
+  gcond_edge_range (e1_range, e1);
+  e1_range.intersect (lhs_range);
+  if (e1_range.undefined_p ())
+	e1 = NULL;
+}
 
   // At least one edge needs to be single pred.
   if (!e0 && !e1)
@@ -1276,15 +1295,13 @@ fold_using_range::postfold_gcond_edges (gcond *s, fur_source &src)
   gcc_checking_assert (handler);
   if (e0)
 	{
-	  gcond_edge_range (r, e0);
-	  relation_kind relation = handler->op1_op2_relation (r);
+	  relation_kind relation = handler->op1_op2_relation (e0_range);
 	  if (relation != VREL_NONE)
 	src.register_relation (e0, relation, ssa1, ssa2);
 	}
   if (e1)
 	{
-	  gcond_edge_range (r, e1);
-	  relation_kind relation = handler->op1_op2_relation (r);
+	  relation_kind relation = handler->op1_op2_relation (e1_range);
 	  if (relation != VREL_NONE)
 	src.register_relation (e1, relation, ssa1, ssa2);
 	}
diff --git a/gcc/gimple-range-fold.h b/gcc/gimple-range-fold.h
index aeb923145ca..dc1b28f9acc 100644
--- a/gcc/gimple-range-fold.h
+++ b/gcc/gimple-range-fold.h
@@ -158,6 +158,6 @@ protected:
   void range_of_ssa_name_with_loop_info (irange &, tree, class loop *, gphi *,
 	 fur_source &src);
   void relation_fold_and_or (irange& lhs_range, gimple *s, fur_source &src);
-  void postfold_gcond_edges (gcond *s, fur_source &src);
+  void postfold_gcond_edges (gcond *s, irange &lhs_range, fur_source &src);
 };
 #endif // GCC_GIMPLE_RANGE_FOLD_H
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr101189.c b/gcc/testsuite/gcc.dg/tree-ssa/pr101189.c
new file mode 100644
index 000..5730708a0b8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr101189.c
@@ -0,0 +1,17 @@
+/* PR tree-optimization/101189  */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+static int a, b;
+int main() {
+  int d = 0, e, f = 5;
+  if (a)
+f = 0;
+  for (; f < 4; f++)
+;
+  e = f ^ -f;
+  e && d;
+  if (!e)
+e || b;
+  return 0;
+}
-- 
2.17.2



[PATCH] c++: CTAD within alias template [PR91911]

2021-06-24 Thread Patrick Palka via Gcc-patches
In the first testcase below, during parsing of the alias template
ConstSpanType, transparency of alias template specializations means we
replace SpanType with SpanType's substituted definition.  But this
substitution lowers the level of the CTAD placeholder for span(T()) from
2 to 1, and so the later instantiantion of ConstSpanType
erroneously substitutes this CTAD placeholder with the template argument
at level 1 index 0, i.e. with int, before we get a chance to perform the
CTAD.

In light of this, it seems we should avoid level lowering when
substituting through through the type-id of a dependent alias template
specialization.  To that end this patch makes lookup_template_class_1
pass tf_partial to tsubst in this situation.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/91911

gcc/cp/ChangeLog:

* pt.c (lookup_template_class_1): When looking up a dependent
alias template specialization, pass tf_partial to tsubst.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction92.C: New test.
---
 gcc/cp/pt.c   |  7 +-
 .../g++.dg/cpp1z/class-deduction92.C  | 17 +
 .../g++.dg/cpp1z/class-deduction93.C  | 25 +++
 3 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction92.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction93.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f73c7471a33..23c5f515716 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9954,7 +9954,12 @@ lookup_template_class_1 (tree d1, tree arglist, tree 
in_decl, tree context,
template-arguments for the template-parameters in the
type-id of the alias template.  */
 
- t = tsubst (TREE_TYPE (gen_tmpl), arglist, complain, in_decl);
+ /* When substituting a dependent alias template specialization,
+we pass tf_partial to avoid lowering the level of any 'auto'
+in its type-id which might correspond to CTAD placeholders.  */
+ t = tsubst (TREE_TYPE (gen_tmpl), arglist,
+ complain | (is_dependent_type * tf_partial),
+ in_decl);
  /* Note that the call above (by indirectly calling
 register_specialization in tsubst_decl) registers the
 TYPE_DECL representing the specialization of the alias
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction92.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction92.C
new file mode 100644
index 000..ae3c55508b2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction92.C
@@ -0,0 +1,17 @@
+// PR c++/91911
+// { dg-do compile { target c++17 } }
+
+template
+struct span {
+  using value_type = T;
+  span(T);
+};
+
+template
+using SpanType = decltype(span(T()));
+
+template
+using ConstSpanType = span::value_type>;
+
+using type = ConstSpanType;
+using type = span;
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction93.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction93.C
new file mode 100644
index 000..eebc986832e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction93.C
@@ -0,0 +1,25 @@
+// PR c++/98077
+// { dg-do compile { target c++17 } }
+
+template
+struct function {
+  template function(T) { }
+  using type = R;
+};
+
+template function(T) -> function;
+
+template
+struct CallableTrait;
+
+template
+struct CallableTrait> { using ReturnType = R; };
+
+template
+using CallableTraitT = decltype(function{F()});
+
+template
+using ReturnType = typename CallableTraitT::type;
+
+using type = ReturnType;
+using type = int;
-- 
2.32.0.93.g670b81a890



Re: [committed] libstdc++: Implement LWG 2762 for std::unique_ptr::operator*

2021-06-24 Thread Tim Song via Gcc-patches
That example violates http://eel.is/c++draft/unique.ptr.runtime.general#3




On Thu, Jun 24, 2021 at 1:55 PM Patrick Palka via Gcc-patches
 wrote:
>
> On Thu, 24 Jun 2021, Jonathan Wakely via Libstdc++ wrote:
>
> > The LWG issue proposes to add a conditional noexcept-specifier to
> > std::unique_ptr's dereference operator. The issue is currently in
> > Tentatively Ready status, but even if it isn't voted into the draft, we
> > can do it as a conforming extensions. This commit also adds a similar
> > noexcept-specifier to operator[] for the unique_ptr partial
> > specialization.
>
> The conditional noexcept added to unique_ptr::operator[] seems to break
> the case where T is complete only after the fact:
>
>   struct T;
>   extern std::unique_ptr p;
>   auto& t = p[1];
>   struct T { };
>
> /include/c++/12.0.0/bits/unique_ptr.h: In instantiation of ‘typename 
> std::add_lvalue_reference<_Tp>::type std::unique_ptr<_Tp [], 
> _Dp>::operator[](std::size_t) co
> nst [with _Tp = A; _Dp = std::default_delete; typename 
> std::add_lvalue_reference<_Tp>::type = A&; std::size_t = long unsigned int]’:
> testcase.cc:5:14:   required from here
> /include/c++/12.0.0/bits/unique_ptr.h:658:48: error: invalid use of 
> incomplete type ‘struct A’
>   658 |   
> noexcept(noexcept(std::declval()[std::declval()]))
>   | ~~~^
> testcase.cc:3:8: note: forward declaration of ‘struct A’
> 3 | struct A;
>   |^
>
> >
> > Also ensure that all dereference operators for shared_ptr are noexcept,
> > and adds tests for the std::optional accessors modified by the issue,
> > which were already noexcept in our implementation.
> >
> > Signed-off-by: Jonathan Wakely 
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * include/bits/shared_ptr_base.h (__shared_ptr_access::operator[]):
> >   Add noexcept.
> >   * include/bits/unique_ptr.h (unique_ptr::operator*): Add
> >   conditional noexcept as per LWG 2762.
> >   * testsuite/20_util/shared_ptr/observers/array.cc: Check that
> >   dereferencing cannot throw.
> >   * testsuite/20_util/shared_ptr/observers/get.cc: Likewise.
> >   * testsuite/20_util/optional/observers/lwg2762.cc: New test.
> >   * testsuite/20_util/unique_ptr/lwg2762.cc: New test.
> >
> > Tested powerpc64le-linux. Committed to trunk.
> >
> >


Re: [PATCH 1/13] v2 [PATCH 1/13] Add support for per-location warning groups (PR 74765)

2021-06-24 Thread Martin Sebor via Gcc-patches

On 6/23/21 11:26 PM, Jeff Law wrote:



On 6/23/2021 1:47 PM, Martin Sebor via Gcc-patches wrote:

On 6/22/21 5:28 PM, David Malcolm wrote:

On Tue, 2021-06-22 at 19:18 -0400, David Malcolm wrote:

On Fri, 2021-06-04 at 15:41 -0600, Martin Sebor wrote:

The attached patch introduces the suppress_warning(),
warning_suppressed(), and copy_no_warning() APIs without making
use of them in the rest of GCC.  They are in three files:

    diagnostic-spec.{h,c}: Location-centric overloads.
    warning-control.cc: Tree- and gimple*-centric overloads.

The location-centric overloads are suitable to use from the
diagnostic
subsystem.  The rest can be used from the front ends and the middle
end.


[...snip...]


+/* Return true if warning OPT is suppressed for decl/expression
EXPR.
+   By default tests the disposition for any warning.  */
+
+bool
+warning_suppressed_p (const_tree expr, opt_code opt /* =
all_warnings */)
+{
+  const nowarn_spec_t *spec = get_nowarn_spec (expr);
+
+  if (!spec)
+    return get_no_warning_bit (expr);
+
+  const nowarn_spec_t optspec (opt);
+  bool dis = *spec & optspec;
+  gcc_assert (get_no_warning_bit (expr) || !dis);
+  return dis;


Looking through the patch, I don't like the use of "dis" for the "is
suppressed" bool, since...


It's an argument to a a function, unlikely to confuse anyone.  But
I also don't think it's important enough to argue about so I changed
it along with the comment below.



[...snip...]


+
+/* Enable, or by default disable, a warning for the statement STMT.
+   The wildcard OPT of -1 controls all warnings.  */


...I find the above comment to be confusingly worded due to the double-
negative.

If I'm reading your intent correctly, how about this wording:

/* Change the supression of warnings for statement STMT.
    OPT controls which warnings are affected.
    The wildcard OPT of -1 controls all warnings.
    If SUPP is true (the default), enable the suppression of the
warnings.
    If SUPP is false, disable the suppression of the warnings.  */

...and rename "dis" to "supp".

Or have I misread the intent of the patch?


+
+void
+suppress_warning (gimple *stmt, opt_code opt /* = all_warnings */,
+ bool dis /* = true */)



+{
+  if (opt == no_warning)
+    return;
+
+  const key_type_t key = convert_to_key (stmt);
+
+  dis = suppress_warning_at (key, opt, dis) || dis;
+  set_no_warning_bit (stmt, dis);
+}


[...snip...]


Also, I think I prefer having a separate entrypoints for the "all
warnings" case; on reading through the various patches I think that in
e.g.:

-  TREE_NO_WARNING (*expr_p) = 1;
+  suppress_warning (*expr_p);

I prefer:

   suppress_warnings (*expr_p);

(note the plural) since that way we can grep for them, and it seems
like better grammar to me.

Both entrypoints could be implemented by a static suppress_warning_1
internally if that makes it easier.

In that vein, "unsuppress_warning" seems far clearer to me that
"suppress_warning (FOO, false)"; IIRC there are very few uses of this
non-default arg (I couldn't find any in a quick look through the v2
kit).

Does this make sense?


In my experience, names that differ only slightly (such as in just
one letter) tend to easily lead to frustrating mistakes.  The new
warning_suppressed_p() added in this patch also handles multiple
warnings (via a wildcard) and uses a singular form, and as do
the gimple_{get,set}no_warning() functions that are being replaced.
So I don't think the suggested change is needed or would be
an improvement.

Similarly, there is no gimple_unset_no_warning() today and I don't
think adding unsuppress_warning() is necessary or would add value,
especially since there are just a handful of callers that re-enable
warnings.  For the callers that might need to enable or disable
a warning based on some condition, it's more convenient to do so
in a single call then by having to call different functions based
the value of the condition.

In the attached patch I've changed the comment as you asked and
also renamed the dis argument to supp but kept the function names
the same.  I've also moved a few changes to tree.h from a later
patch to this one to let it stand alone and regtested it on its
own on x86_64-linux.  I'll plan to go ahead with this version
unless requestsfor substantive changes come up in the review of
the rest  of the changes.

Thanks
Martin

gcc-no-warning-1.diff

Add support for per-location warning groups.

gcc/ChangeLog:

* Makefile.in (OBJS-libcommon): Add diagnostic-spec.o.
* gengtype.c (open_base_files): Add diagnostic-spec.h.
* diagnostic-spec.c: New file.
* diagnostic-spec.h: New file.
* tree.h (no_warning, all_warnings, suppress_warning_at): New
declarations.
* warning-control.cc: New file.

diff --git a/gcc/diagnostic-spec.h b/gcc/diagnostic-spec.h
new file mode 100644
index 000..62d9270ca6d
--- /dev/null
+++ b/gcc/diagnostic-spec.h
@@ -0,0 +1,140 @@
+/* Language-ind

Re: [PATCH 3/13] v2 Use new per-location warning APIs in C front end

2021-06-24 Thread Martin Sebor via Gcc-patches

On 6/23/21 11:09 PM, Jeff Law wrote:



On 6/4/2021 3:41 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the C front
end with the new suppress_warning(), warning_suppressed_p(), and
copy_warning() APIs.

gcc-no-warning-c.diff

Add support for per-location warning groups.

gcc/c/ChangeLog:

* c-decl.c (pop_scope): Replace direct uses of TREE_NO_WARNING with
warning_suppressed_p, suppress_warning, and copy_no_warning.
(diagnose_mismatched_decls): Same.
(duplicate_decls): Same.
(grokdeclarator): Same.
(finish_function): Same.
(c_write_global_declarations_1): Same.
* c-fold.c (c_fully_fold_internal): Same.
* c-parser.c (c_parser_expr_no_commas): Same.
(c_parser_postfix_expression): Same.
* c-typeck.c (array_to_pointer_conversion): Same.
(function_to_pointer_conversion): Same.
(default_function_array_conversion): Same.
(convert_lvalue_to_rvalue): Same.
(default_conversion): Same.
(build_indirect_ref): Same.
(build_function_call_vec): Same.
(build_atomic_assign): Same.
(build_unary_op): Same.
(c_finish_return): Same.
(emit_side_effect_warnings): Same.
(c_finish_stmt_expr): Same.
(c_omp_clause_copy_ctor): Same.

OK once prereqs are approved.


I've retested this on top of the first patch and pushed r12-1802.

Martin


Re: [PATCH 4/13] v2 Use new per-location warning APIs in C family code

2021-06-24 Thread Martin Sebor via Gcc-patches

On 6/23/21 11:06 PM, Jeff Law wrote:



On 6/4/2021 3:42 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the shared
C family front end with the new suppress_warning(),
warning_suppressed_p(), and copy_warning() APIs.

gcc-no-warning-c-family.diff

Add support for per-location warning groups.

gcc/c-family/ChangeLog:

* c-common.c (c_wrap_maybe_const): Remove TREE_NO_WARNING.
(c_common_truthvalue_conversion): Replace direct uses of
TREE_NO_WARNING with warning_suppressed_p, suppress_warning, and
copy_no_warning.
(check_function_arguments_recurse): Same.
* c-gimplify.c (c_gimplify_expr): Same.
* c-warn.c (overflow_warning): Same.
(warn_logical_operator): Same.
(warn_if_unused_value): Same.
(do_warn_unused_parameter): Same.

OK once prereqs are approved.
jeff



Retested on top patch 1 and the C FE stuff and pushed r12-1803.

Martin


  1   2   >