Re: [committed] i386: Fix grammar typo in diagnostic

2023-08-23 Thread Jonathan Wakely via Gcc-patches
On Wed, 23 Aug 2023, 06:15 Hongtao Liu via Libstdc++, 
wrote:

> On Wed, Aug 23, 2023 at 7:28 AM Hongtao Liu  wrote:
> >
> > On Tue, Aug 8, 2023 at 5:22 AM Marek Polacek via Libstdc++
> >  wrote:
> > >
> > > On Mon, Aug 07, 2023 at 10:12:35PM +0100, Jonathan Wakely via
> Gcc-patches wrote:
> > > > Committed as obvious.
> > > >
> > > > Less obvious (to me) is whether it's correct to say "GCC V13" here. I
> > > > don't think we refer to a version that way anywhere else, do we?
> > > >
> > > > Would "since GCC 13.1.0" be better?
> > >
> > > x86_field_alignment uses
> > >
> > >   inform (input_location, "the alignment of %<_Atomic %T%> "
> > >   "fields changed in %{GCC 11.1%}",
> > >
> > > so maybe the below should use %{GCC 13.1%}.  "GCC V13" looks unusual
> > > to me.
> >  %{GCC 13.1%} sounds reasonable.
> looks like %{ can't be used in a const char*, so use % instead.
>
> How about:
>
> Author: liuhongt 
> Date:   Wed Aug 23 07:31:13 2023 +0800
>
> Adjust GCC V13 to GCC 13.1 in diagnostic.
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_invalid_conversion): Adjust GCC
> V13 to GCC 13.1.
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index e7822ef6500..88d9d7d537f 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22899,7 +22899,7 @@ ix86_invalid_conversion (const_tree fromtype,
> const_tree totype)
>   || (TYPE_MODE (totype) == BFmode
>   && TYPE_MODE (fromtype) == HImode))
> warning (0, "%<__bfloat16%> is redefined from typedef % "
> -   "to real %<__bf16%> since GCC V13, be careful of "
> +   "to real %<__bf16%> since %, be careful of "
>  "implicit conversion between %<__bf16%> and %; "
>  "an explicit bitcast may be needed here");
>  }
>


Why does it need to be quoted? What's wrong with just saying GCC 13.1
without the %< decoration?
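
(For context, %<...%> merely wraps a fragment in locale-appropriate quotes;
e.g. a hypothetical

  warning (0, "%<foo%> is deprecated since GCC 13.1");

quotes the foo but prints the version plainly, so the version string gains
nothing from the decoration.)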




> > >
> > > > -- >8 --
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >   * config/i386/i386.cc (ix86_invalid_conversion): Fix grammar.
> > > > ---
> > > >  gcc/config/i386/i386.cc | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > index 50860050049..5d57726e22c 100644
> > > > --- a/gcc/config/i386/i386.cc
> > > > +++ b/gcc/config/i386/i386.cc
> > > > @@ -22890,7 +22890,7 @@ ix86_invalid_conversion (const_tree
> fromtype, const_tree totype)
> > > >   warning (0, "%<__bfloat16%> is redefined from typedef
> % "
> > > >   "to real %<__bf16%> since GCC V13, be careful of "
> > > >"implicit conversion between %<__bf16%> and
> %; "
> > > > -  "a explicit bitcast may be needed here");
> > > > +  "an explicit bitcast may be needed here");
> > > >  }
> > > >
> > > >/* Conversion allowed.  */
> > > > --
> > > > 2.41.0
> > > >
> > >
> > > Marek
> > >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao
>


Re: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 4:36 PM Hongtao Liu  wrote:
>
> On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek  wrote:
> >
> > On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > > evex instruction patterns.
> >
> > Why?
> > Internally for md etc. purposes, we should have the current
> > TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> > (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> > etc., or some other name) which says if 512-bit vector modes can be used,
> > if g modifier can be used, if the 64-bit mask operations can be used etc.
> > Plus, if AVX10.1 contains any instructions not covered in the preexisting
> > TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> > keep -mavx10.1 just as an command line option which enables/disables
> Let's assume there's no delta now, AVX10.1-512 is equal to
> AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG, VPOPCNTDQ}
> > other stuff.
> > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > and unsetting it doesn't disable all the TARGET_AVX512*.
> > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> -mavx10.1-256 doesn't clear EVEX512 but just enables all AVX512* sets?

As I said earlier -mavx10.1-256 (and -mavx10.1-512) should not exist.
So instead
we'd have -mavx512bw -mavx10.1 where -mavx512bw enables evex512 and
-mavx10.1 will enable the 10.1 ISAs _not affecting_ whether evex512 is
set or not.

We then have the -mevex512 flag (or whatever name we agree to) to enable
(or disable) 512bit support.

If you insist on having -mavx10.1-256 that should alias to -mavx10.1 +
-mno-evex512,
but Jakub disagrees here, so I'd rather not have it at all.  We could have
-mavx10.1-512 aliasing to -mavx10.1 + -mevex512 (Jakub would agree here).
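
For illustration, the intended semantics in terms of option combinations
(my reading, not a committed design):

  -mavx512bw                 # AVX512BW and deps, implies evex512
  -mavx512bw -mavx10.1       # the above plus the 10.1 ISAs, evex512 still set
  -mavx10.1                  # the 10.1 ISAs only, evex512 left untouched
  -mavx10.1 -mevex512        # what -mavx10.1-512 would be an alias for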

Richard.


RE: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, August 23, 2023 3:32 PM
> To: Hongtao Liu 
> Cc: Jakub Jelinek ; Jiang, Haochen
> ; ZiNgA BuRgA ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 4:36 PM Hongtao Liu  wrote:
> >
> > On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek  wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > > > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > > > evex instruction patterns.
> > >
> > > Why?
> > > Internally for md etc. purposes, we should have the current
> > > TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> > > (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> > > etc., or some other name) which says if 512-bit vector modes can be used,
> > > if g modifier can be used, if the 64-bit mask operations can be used etc.
> > > Plus, if AVX10.1 contains any instructions not covered in the preexisting
> > > TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> > > keep -mavx10.1 just as an command line option which enables/disables
> > Let's assume there's no delta now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,
> VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET*
> would be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enables all AVX512* sets?
> 
> As I said earlier -mavx10.1-256 (and -mavx10.1-512) should not exist.
> So instead
> we'd have -mavx512bw -mavx10.1 where -mavx512bw enables evex512 and
> -mavx10.1 will enable the 10.1 ISAs _not affecting_ whether evex512 is
> set or not.
> 
> We then have the -mevex512 flag (or whatever name we agree to) to enable
> (or disable) 512bit support.
> 
> If you insist on having -mavx10.1-256 that should alias to -mavx10.1 +
> -mno-evex512,
> but Jakub disagrees here, so I'd rather not have it at all.  We could have
> -mavx10.1-512 aliasing to -mavx10.1 + -mevex512 (Jakub would agree here).

We could first work on -mevex512 and then further discuss -mavx10.1-256/512,
since -mavx10.1-256/512 are quite controversial.

Just to clarify, -mno-evex512 -mavx512f should not enable 512-bit vectors, right?

Thx,
Haochen

> 
> Richard.


Re: [committed] i386: Fix grammar typo in diagnostic

2023-08-23 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 23, 2023 at 3:02 PM Jonathan Wakely  wrote:
>
>
>
> On Wed, 23 Aug 2023, 06:15 Hongtao Liu via Libstdc++,  
> wrote:
>>
>> On Wed, Aug 23, 2023 at 7:28 AM Hongtao Liu  wrote:
>> >
>> > On Tue, Aug 8, 2023 at 5:22 AM Marek Polacek via Libstdc++
>> >  wrote:
>> > >
>> > > On Mon, Aug 07, 2023 at 10:12:35PM +0100, Jonathan Wakely via 
>> > > Gcc-patches wrote:
>> > > > Committed as obvious.
>> > > >
>> > > > Less obvious (to me) is whether it's correct to say "GCC V13" here. I
>> > > > don't think we refer to a version that way anywhere else, do we?
>> > > >
>> > > > Would "since GCC 13.1.0" be better?
>> > >
>> > > x86_field_alignment uses
>> > >
>> > >   inform (input_location, "the alignment of %<_Atomic %T%> "
>> > >   "fields changed in %{GCC 11.1%}",
>> > >
>> > > so maybe the below should use %{GCC 13.1%}.  "GCC V13" looks unusual
>> > > to me.
>> >  %{GCC 13.1%} sounds reasonable.
>> looks like %{ can't be used in a const char*, so use % instead.
>>
>> How about:
>>
>> Author: liuhongt 
>> Date:   Wed Aug 23 07:31:13 2023 +0800
>>
>> Adjust GCC V13 to GCC 13.1 in diagnostic.
>>
>> gcc/ChangeLog:
>>
>> * config/i386/i386.cc (ix86_invalid_conversion): Adjust GCC
>> V13 to GCC 13.1.
>>
>> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
>> index e7822ef6500..88d9d7d537f 100644
>> --- a/gcc/config/i386/i386.cc
>> +++ b/gcc/config/i386/i386.cc
>> @@ -22899,7 +22899,7 @@ ix86_invalid_conversion (const_tree fromtype,
>> const_tree totype)
>>   || (TYPE_MODE (totype) == BFmode
>>   && TYPE_MODE (fromtype) == HImode))
>> warning (0, "%<__bfloat16%> is redefined from typedef % "
>> -   "to real %<__bf16%> since GCC V13, be careful of "
>> +   "to real %<__bf16%> since %, be careful of "
>>  "implicit conversion between %<__bf16%> and %; "
>>  "an explicit bitcast may be needed here");
>>  }
>
>
>
> Why does it need to be quoted? What's wrong with just saying GCC 13.1 without 
> the %< decoration?
I'll just remove that.
>
>
>
>>
>> > >
>> > > > -- >8 --
>> > > >
>> > > > gcc/ChangeLog:
>> > > >
>> > > >   * config/i386/i386.cc (ix86_invalid_conversion): Fix grammar.
>> > > > ---
>> > > >  gcc/config/i386/i386.cc | 2 +-
>> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
>> > > >
>> > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
>> > > > index 50860050049..5d57726e22c 100644
>> > > > --- a/gcc/config/i386/i386.cc
>> > > > +++ b/gcc/config/i386/i386.cc
>> > > > @@ -22890,7 +22890,7 @@ ix86_invalid_conversion (const_tree fromtype, 
>> > > > const_tree totype)
>> > > >   warning (0, "%<__bfloat16%> is redefined from typedef % "
>> > > >   "to real %<__bf16%> since GCC V13, be careful of "
>> > > >"implicit conversion between %<__bf16%> and %; "
>> > > > -  "a explicit bitcast may be needed here");
>> > > > +  "an explicit bitcast may be needed here");
>> > > >  }
>> > > >
>> > > >/* Conversion allowed.  */
>> > > > --
>> > > > 2.41.0
>> > > >
>> > >
>> > > Marek
>> > >
>> >
>> >
>> > --
>> > BR,
>> > Hongtao
>>
>>
>>
>> --
>> BR,
>> Hongtao



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 23, 2023 at 01:57:59AM +, Jiang, Haochen wrote:
> > > Let's assume there's no delta now, AVX10.1-512 is equal to
> > > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > > other stuff.
> > > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* 
> > > > would be
> > > > like now, except that the current AVX512* sets imply also 
> > > > EVEX512/whatever
> > > > it will be called, that option itself enables nothing (or 
> > > > TARGET_AVX512F),
> > > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > -mavx10.1-256 doesn't clear EVEX512 but just enables all AVX512* sets?
> > > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > > EVEX512)
> > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
> 
> I think we still need that since currently, w/o AVX512VL, we will not only
> enable 512-bit vector instructions but also scalar instructions, which
> means when it comes to -mavx512bw -mno-evex512, we should still enable
> the scalar instructions.
> 
> And since scalar instructions will also be enabled in AVX10.1-256, we need
> something to distinguish them from the ISA set w/o AVX512VL.

Ah, forgot about scalar instructions, even better, then we don't have to do
that special case.  So, I think TARGET_AVX512F && !TARGET_EVEX512 && 
!TARGET_AVX512VL
in general should disable 512-bit modes in ix86_hard_regno_mode_ok.  That
should prevent the need to replace TARGET_AVX512F to TARGET_EVEX512 on all
the patterns which refer to 512-bit modes.  Also wonder if it
wouldn't be easiest to make "v" constraint in that case be equivalent to
just "x" so that all those hacks to make xmm16+ registers working in various
instructions through g modifiers wouldn't trigger.  Sure, that would
penalize also scalar instructions, but the above case wouldn't be something
any CPU actually supports, it would be only the common subset of say XeonPhi
and AVX10.1-256.
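
Concretely, something like this in ix86_hard_regno_mode_ok (just a sketch of
the idea, not a tested patch):

  /* 512-bit support was only implied by the scalar AVX512 bits; don't
     allow 512-bit vector modes in hard registers.  */
  if (GET_MODE_SIZE (mode) == 64 && VECTOR_MODE_P (mode)
      && TARGET_AVX512F && !TARGET_EVEX512 && !TARGET_AVX512VL)
    return false;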

Jakub



Re: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 23, 2023 at 3:33 PM Richard Biener
 wrote:
>
> On Tue, Aug 22, 2023 at 4:36 PM Hongtao Liu  wrote:
> >
> > On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek  wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > > > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > > > evex instruction patterns.
> > >
> > > Why?
> > > Internally for md etc. purposes, we should have the current
> > > TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> > > (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> > > etc., or some other name) which says if 512-bit vector modes can be used,
> > > if g modifier can be used, if the 64-bit mask operations can be used etc.
> > > Plus, if AVX10.1 contains any instructions not covered in the preexisting
> > > TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> > > keep -mavx10.1 just as an command line option which enables/disables
> > Let's assume there's no delta now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG, VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would 
> > > be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enables all AVX512* sets?
>
> As I said earlier -mavx10.1-256 (and -mavx10.1-512) should not exist.
> So instead
> we'd have -mavx512bw -mavx10.1 where -mavx512bw enables evex512 and
> -mavx10.1 will enable the 10.1 ISAs _not affecting_ whether evex512 is
> set or not.
>
> We then have the -mevex512 flag (or whatever name we agree to) to enable
> (or disable) 512bit support.
>
> If you insist on having -mavx10.1-256 that should alias to -mavx10.1 +
> -mno-evex512,
> but Jakub disagrees here, so I'd rather not have it at all.  We could have
I think we can just support -mevex512 for now; avx10.1-256/512
can wait for a while, considering it doesn't have new instructions
and is controversial.
Basically, -mno-evex512 is good enough for most needs.
The only part where I disagree with Jakub: I think for -mavx512f
-mno-evex512 -mavx512bw, we need to disable 512-bit; an explicit
-mno-evex512 should take precedence over an implicit yes.
> -mavx10.1-512 aliasing to -mavx10.1 + -mevex512 (Jakub would agree here).
>
> Richard.



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 23, 2023 at 4:16 PM Jakub Jelinek  wrote:
>
> On Wed, Aug 23, 2023 at 01:57:59AM +, Jiang, Haochen wrote:
> > > > Let's assume there's no delta now, AVX10.1-512 is equal to
> > > > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > > > other stuff.
> > > > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* 
> > > > > would be
> > > > > like now, except that the current AVX512* sets imply also 
> > > > > EVEX512/whatever
> > > > > it will be called, that option itself enables nothing (or 
> > > > > TARGET_AVX512F),
> > > > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > > -mavx10.1-256 doesn't clear EVEX512 but just enables all AVX512* sets?
> > > > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > > > EVEX512)
> > > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
> >
> > I think we still need that since currently, w/o AVX512VL, we will not only
> > enable 512-bit vector instructions but also scalar instructions, which
> > means when it comes to -mavx512bw -mno-evex512, we should still enable
> > the scalar instructions.
> >
> > And since scalar instructions will also be enabled in AVX10.1-256, we need
> > something to distinguish them from the ISA set w/o AVX512VL.
>
> Ah, forgot about scalar instructions, even better, then we don't have to do
> that special case.  So, I think TARGET_AVX512F && !TARGET_EVEX512 && 
> !TARGET_AVX512VL
> in general should disable 512-bit modes in ix86_hard_regno_mode_ok.  That
> should prevent the need to replace TARGET_AVX512F to TARGET_EVEX512 on all
> the patterns which refer to 512-bit modes.  Also wonder if it
> wouldn't be easiest to make "v" constraint in that case be equivalent to
> just "x" so that all those hacks to make xmm16+ registers working in various
We can clear the EVEX SSE registers in ix86_conditional_register_usage when
TARGET_AVX512F && !TARGET_EVEX512 && !TARGET_AVX512VL if we don't care
much about the scalar ones; see the sketch below.
> instructions through g modifiers wouldn't trigger.  Sure, that would
> penalize also scalar instructions, but the above case wouldn't be something
> any CPU actually supports, it would be only the common subset of say XeonPhi
> and AVX10.1-256.
>
> Jakub
>
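
The kind of change I mean above (an untested sketch, modeled on the existing
!TARGET_AVX512F handling in ix86_conditional_register_usage):

  /* Without EVEX512 and AVX512VL the EVEX-only registers are only usable
     by scalar EVEX instructions, so mark xmm16-xmm31 fixed.  */
  if (TARGET_AVX512F && !TARGET_EVEX512 && !TARGET_AVX512VL)
    for (int i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
      fixed_regs[i] = call_used_regs[i] = 1;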


-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 23, 2023 at 08:03:58AM +, Jiang, Haochen wrote:
> We could first work on -mevex512 and then further discuss -mavx10.1-256/512,
> since -mavx10.1-256/512 are quite controversial.
> 
> Just to clarify, -mno-evex512 -mavx512f should not enable 512-bit vectors,
> right?

I think it should enable them because -mavx512f is after it.  But it seems the
option handling is more complex than I thought, e.g. -mavx512bw -mno-avx512bw
just cancel each other, rather than
enabling AVX512BW, AVX512F, AVX2 and all its dependencies (like -mavx512bw
alone does) and then just disabling AVX512BW (like -mno-avx512bw does).
But, if one uses separate target pragmas, it behaves like that:
#pragma GCC target ("avx512bw")
#ifdef __AVX512F__
int a;
#endif
#ifdef __AVX512BW__
int b;
#endif
#pragma GCC target ("no-avx512bw")
#ifdef __AVX512F__
int c;
#endif
#ifdef __AVX512BW__
int d;
#endif
The above defines a, b and c vars even without any special -march= or other
command line option.

So, the first important decision would be whether to make EVEX512
OPTION_MASK_ISA_EVEX512 or OPTION_MASK_ISA2_EVEX512, the former would need
to move some other ISA flag from the first to second set.
That OPTION_MASK_ISA*_EVEX512 then should be added to
OPTION_MASK_ISA_AVX512F_SET or OPTION_MASK_ISA2_AVX512F_SET (but, if it is
the latter, we also need to do that for tons of other AVX512*_SET),
and then just arrange for -mavx10.1-256 to enable
OPTION_MASK_ISA*_AVX512*_SET of everything it needs except the EVEX512 set
(but, only disable it from the newly added set, not actually act as
-mavx512{f,bw,...} -mno-evex512).
OPTION_MASK_ISA*_EVEX512_SET dunno, should it enable OPTION_MASK_ISA_AVX512F
or just EVEX512?
And then the UNSET cases...

Jakub



Re: Loop-ch improvements, part 3

2023-08-23 Thread Jan Hubicka via Gcc-patches
> We seem to peel one iteration for no good reason.  The loop is
> a do-while loop already.  The key is we see the first iteration
> exit condition is known not taken and then:
Hi,
this patch fixes a wrong return value in should_duplicate_loop_header_p.
Doing so uncovered suboptimal decisions on some jump threading testcases
where we chose to stop duplicating just before a basic block that has zero
cost, where duplicating would always be a win.

This is because the heuristic trying to choose the right point to duplicate
all winning blocks and to get the loop to be do-while did not account for
zero-cost blocks in all cases.  The patch simplifies the logic by simply
remembering zero-cost blocks and handling them last, after the right
stopping point is chosen.
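
In effect, once the stopping point is chosen, the added post-processing is
just (a sketch of the hunk below, with the dump output elided):

  while (last_win_nheaders < (int) decision.length ()
         && decision[last_win_nheaders] == ch_possible_zero_cost)
    last_win_nheaders++;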

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

* tree-ssa-loop-ch.cc (enum ch_decision): Fix comment.
(should_duplicate_loop_header_p): Fix return value for static exits.
(ch_base::copy_headers): Improve handling of ch_possible_zero_cost.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/copy-headers-9.c: Update template.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
index b49d1fc9576..11ee29458a2 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
@@ -13,7 +13,6 @@ void test (int m, int n)
}
while (i<10);
 }
-/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win" 2 "ch2" } } */
-/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win. it has zero" 1 "ch2" } } */
+/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win" 1 "ch2" } } */
 /* { dg-final { scan-tree-dump-times "Will duplicate bb" 2 "ch2" } } */
 /* { dg-final { scan-tree-dump "is now do-while loop" "ch2" } } */
diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
index 6cdb87a762f..461416e4086 100644
--- a/gcc/tree-ssa-loop-ch.cc
+++ b/gcc/tree-ssa-loop-ch.cc
@@ -176,7 +176,7 @@ enum ch_decision
   ch_impossible,
   /* We can copy it if it enables wins.  */
   ch_possible,
-  /* We can "cop" it if it enables wins and doing
+  /* We can "copy" it if it enables wins and doing
  so will introduce no new code.  */
   ch_possible_zero_cost,
   /* We want to copy.  */
@@ -464,7 +464,7 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
  TODO: Even if duplication costs some size we may opt to do so in case
  exit probability is significant enough (do partial peeling).  */
   if (static_exit)
-return code_size_cost ? ch_possible_zero_cost : ch_win;
+return !code_size_cost ? ch_possible_zero_cost : ch_possible;
 
   /* We was not able to prove that conditional will be eliminated.  */
   int insns = estimate_num_insns (last, &eni_size_weights);
@@ -824,6 +824,7 @@ ch_base::copy_headers (function *fun)
   int last_win_nheaders = 0;
   bool last_win_invariant_exit = false;
   ch_decision ret;
+  auto_vec <ch_decision> decision;
   hash_set <edge> *invariant_exits = new hash_set <edge>;
   hash_set <edge> *static_exits = new hash_set <edge>;
   while ((ret = should_duplicate_loop_header_p (header, loop, ranger,
@@ -833,6 +834,7 @@ ch_base::copy_headers (function *fun)
 != ch_impossible)
{
  nheaders++;
+ decision.safe_push (ret);
  if (ret >= ch_win)
{
  last_win_nheaders = nheaders;
@@ -841,20 +843,6 @@ ch_base::copy_headers (function *fun)
fprintf (dump_file, "Duplicating bb %i is a win\n",
 header->index);
}
- /* Duplicate BB if has zero cost but be sure it will not
-imply duplication of other BBs.  */
- else if (ret == ch_possible_zero_cost
-  && (last_win_nheaders == nheaders - 1
-  || (last_win_nheaders == nheaders - 2
-  && last_win_invariant_exit)))
-   {
- last_win_nheaders = nheaders;
- last_win_invariant_exit = false;
- if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file,
-"Duplicating bb %i is a win; it has zero cost\n",
-header->index);
-   }
  else
if (dump_file && (dump_flags & TDF_DETAILS))
  fprintf (dump_file, "May duplicate bb %i\n", header->index);
@@ -884,6 +872,16 @@ ch_base::copy_headers (function *fun)
fprintf (dump_file,
 "Duplicating header BB to obtain do-while loop\n");
}
+  /* "Duplicate" all BBs with zero cost following last basic blocks we
+decided to copy.  */
+  while (last_win_nheaders < (int)decision.length ()
+&& decision[last_win_nheaders] == ch_possible_zero_cost)
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Dupl

Re: [PATCH V2] RISC-V: Add conditional unary neg/abs/not autovec patterns

2023-08-23 Thread Robin Dapp via Gcc-patches
OK, thanks.

Regards
 Robin


Re: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 23, 2023 at 4:31 PM Jakub Jelinek  wrote:
>
> On Wed, Aug 23, 2023 at 08:03:58AM +, Jiang, Haochen wrote:
> > We could first work on -mevex512 then further discuss -mavx10.1-256/512 
> > since
> > these -mavx10.1-256/512 is quite controversial.
> >
> > Just to clarify, -mno-evex512 -mavx512f should not enable 512 bit vector 
> > right?
>
> I think it should enable them because -mavx512f is after it.  But it seems the
> option handling is more complex than I thought, e.g. -mavx512bw -mno-avx512bw
> just cancels each other, rather than
> enabling AVX512BW, AVX512F, AVX2 and all its dependencies (like -mavx512bw
> alone does) and then just disabling AVX512BW (like -mno-avx512bw does).
> But, if one uses separate target pragmas, it behaves like that:
> #pragma GCC target ("avx512bw")
> #ifdef __AVX512F__
> int a;
> #endif
> #ifdef __AVX512BW__
> int b;
> #endif
> #pragma GCC target ("no-avx512bw")
> #ifdef __AVX512F__
> int c;
> #endif
> #ifdef __AVX512BW__
> int d;
> #endif
> The above defines a, b and c vars even without any special -march= or other
> command line option.
>
> So, first important decision would be whether to make EVEX512
> OPTION_MASK_ISA_EVEX512 or OPTION_MASK_ISA2_EVEX512, the former would need
> to move some other ISA flag from the first to second set.
> That OPTION_MASK_ISA*_EVEX512 then should be added to
> OPTION_MASK_ISA_AVX512F_SET or OPTION_MASK_ISA2_AVX512F_SET (but, if it is
> the latter, we also need to do that for tons of other AVX512*_SET),
> and then just arrange for -mavx10.1-256 to enable
> OPTION_MASK_ISA*_AVX512*_SET of everything it needs except the EVEX512 set
> (but, only disable it from the newly added set, not actually act as
> -mavx512{f,bw,...} -mno-evex512).
> OPTION_MASK_ISA*_EVEX512_SET dunno, should it enable OPTION_MASK_ISA_AVX512F
> or just EVEX512?
> And then the UNSET cases...
We can make OPTION_MASK_ISA2_EVEX512, but not set/unset it in
ix86_handle_option; instead, in ix86_option_override_internal, after all
set/unset for the existing AVX512***, if there's still
OPTION_MASK_ISA_AVX512F and no explicit set/unset for
OPTION_MASK_ISA2_EVEX512, then we set OPTION_MASK_ISA2_EVEX512.
That would make -mavx512*** implicitly set -mevex512, but when
there's an explicit -mno-evex512, -mavx512f won't set -mevex512 no matter
where -mno-evex512 is put (-mno-evex512 -mavx512f still disables
512-bit).
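
So roughly (a sketch only; the exact opts/opts_set field names may differ):

  /* In ix86_option_override_internal, after all AVX512* set/unset:  */
  if ((opts->x_ix86_isa_flags & OPTION_MASK_ISA_AVX512F)
      && !(opts_set->x_ix86_isa_flags2 & OPTION_MASK_ISA2_EVEX512))
    opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_EVEX512;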
>
> Jakub
>


-- 
BR,
Hongtao


Re: Loop-ch improvements, part 3

2023-08-23 Thread Richard Biener via Gcc-patches
On Wed, 23 Aug 2023, Jan Hubicka wrote:

> > We seem to peel one iteration for no good reason.  The loop is
> > a do-while loop already.  The key is we see the first iteration
> > exit condition is known not taken and then:
> Hi,
> this patch fixes a wrong return value in should_duplicate_loop_header_p.
> Doing so uncovered suboptimal decisions on some jump threading testcases
> where we chose to stop duplicating just before a basic block that has zero
> cost, where duplicating would always be a win.
> 
> This is because the heuristic trying to choose the right point to duplicate
> all winning blocks and to get the loop to be do-while did not account for
> zero-cost blocks in all cases.  The patch simplifies the logic by simply
> remembering zero-cost blocks and handling them last, after the right
> stopping point is chosen.
> 
> Bootstrapped/regtested x86_64-linux, OK?

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
>   * tree-ssa-loop-ch.cc (enum ch_decision): Fix comment.
>   (should_duplicate_loop_header_p): Fix return value for static exits.
>   (ch_base::copy_headers): Improve handling of ch_possible_zero_cost.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/copy-headers-9.c: Update template.
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
> index b49d1fc9576..11ee29458a2 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
> @@ -13,7 +13,6 @@ void test (int m, int n)
>   }
>   while (i<10);
>  }
> -/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win" 2 "ch2" } } */
> -/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win. it has zero" 1 "ch2" } } */
> +/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win" 1 "ch2" } } */
>  /* { dg-final { scan-tree-dump-times "Will duplicate bb" 2 "ch2" } } */
>  /* { dg-final { scan-tree-dump "is now do-while loop" "ch2" } } */
> diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
> index 6cdb87a762f..461416e4086 100644
> --- a/gcc/tree-ssa-loop-ch.cc
> +++ b/gcc/tree-ssa-loop-ch.cc
> @@ -176,7 +176,7 @@ enum ch_decision
>ch_impossible,
>/* We can copy it if it enables wins.  */
>ch_possible,
> -  /* We can "cop" it if it enables wins and doing
> +  /* We can "copy" it if it enables wins and doing
>   so will introduce no new code.  */
>ch_possible_zero_cost,
>/* We want to copy.  */
> @@ -464,7 +464,7 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
>   TODO: Even if duplication costs some size we may opt to do so in case
>   exit probability is significant enough (do partial peeling).  */
>if (static_exit)
> -return code_size_cost ? ch_possible_zero_cost : ch_win;
> +return !code_size_cost ? ch_possible_zero_cost : ch_possible;
>  
>/* We was not able to prove that conditional will be eliminated.  */
>int insns = estimate_num_insns (last, &eni_size_weights);
> @@ -824,6 +824,7 @@ ch_base::copy_headers (function *fun)
>int last_win_nheaders = 0;
>bool last_win_invariant_exit = false;
>ch_decision ret;
> +   auto_vec <ch_decision> decision;
>    hash_set <edge> *invariant_exits = new hash_set <edge>;
>    hash_set <edge> *static_exits = new hash_set <edge>;
>while ((ret = should_duplicate_loop_header_p (header, loop, ranger,
> @@ -833,6 +834,7 @@ ch_base::copy_headers (function *fun)
>!= ch_impossible)
>   {
> nheaders++;
> +   decision.safe_push (ret);
> if (ret >= ch_win)
>   {
> last_win_nheaders = nheaders;
> @@ -841,20 +843,6 @@ ch_base::copy_headers (function *fun)
>   fprintf (dump_file, "Duplicating bb %i is a win\n",
>header->index);
>   }
> -   /* Duplicate BB if has zero cost but be sure it will not
> -  imply duplication of other BBs.  */
> -   else if (ret == ch_possible_zero_cost
> -&& (last_win_nheaders == nheaders - 1
> -|| (last_win_nheaders == nheaders - 2
> -&& last_win_invariant_exit)))
> - {
> -   last_win_nheaders = nheaders;
> -   last_win_invariant_exit = false;
> -   if (dump_file && (dump_flags & TDF_DETAILS))
> - fprintf (dump_file,
> -  "Duplicating bb %i is a win; it has zero cost\n",
> -  header->index);
> - }
> else
>   if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "May duplicate bb %i\n", header->index);
> @@ -884,6 +872,16 @@ ch_base::copy_headers (function *fun)
>   fprintf (dump_file,
>"Duplicating header BB to obtain do-while loop\n");
>   }
> +  /* "Duplicate" all BBs with zero cost following last basic blocks we
> +  decided to copy.  */
> +  while (last_win_nheaders < (int)deci

Fix profile update in tree-ssa-reassoc

2023-08-23 Thread Jan Hubicka via Gcc-patches
Hi,
this patch adds a missing profile update to maybe_optimize_range_tests.
Jakub, I hope I got the code right: I think it basically analyzes the
chain of conditionals, finds some basic blocks involved in the range
testing and then puts all the tests into the first BB.

The patch fixes the gcc.dg/tree-ssa/update-threading.c profile misupdate on
powerpc.  Curiously enough, the code is produced differently for x86_64.
I tried to find a testcase for x86_64 and found that

testsuite/gcc.dg/tree-ssa/reassoc-33.c
testsuite/gcc.dg/tree-ssa/reassoc-37.c
testsuite/gcc.dg/tree-ssa/reassoc-43.c

are testing this function.  However, sadly, neither of these testcases seems
to work as expected.  For example in testsuite/gcc.dg/tree-ssa/reassoc-33.c
we turn

;; basic block 3, loop depth 0, count 708669600 (estimated locally, freq 
0.6600), maybe hot
;;  prev block 2, next block 4, flags: (NEW, VISITED)
;;  pred:   2 [66.0% (guessed)]  count:708669600 (estimated locally, freq 
0.6600) (FALSE_VALUE,EXECUTABLE)
_4 = a_14(D) == 44;
_5 = a_14(D) == 78;
_30 = 0;
_6 = _4 | _5;
if (_30 != 0)
  goto ; [34.00%]
else
  goto ; [66.00%]
;;  succ:   7 [34.0% (guessed)]  count:240947667 (estimated locally, freq 
0.2244) (TRUE_VALUE,EXECUTABLE)
;;  4 [66.0% (guessed)]  count:467721933 (estimated locally, freq 
0.4356) (FALSE_VALUE,EXECUTABLE)

to

;; basic block 2, loop depth 0, count 1073741824 (estimated locally, freq 
1.), maybe hot
;;  prev block 0, next block 3, flags: (NEW, VISITED)
;;  pred:   ENTRY [always]  count:1073741824 (estimated locally, freq 
1.) (FALLTHRU,EXECUTABLE)
_18 = (unsigned int) a_14(D);
_19 = _18 + 4294967253;
_24 = (unsigned int) a_14(D);
_25 = _24 + 4294967253;
_26 = _25 & 4294967260;
_27 = _26 == 0;
_20 = _19 <= 3;
_1 = a_14(D) == 43;
_21 = (unsigned int) a_14(D);
_22 = _21 + 4294967221;
_23 = _22 <= 3;
_2 = a_14(D) == 75;
_31 = _27;
_3 = _1 | _2;
if (_31 != 0)
  goto ; [34.00%]
else
  goto ; [66.00%]

which replaces later tests

;; basic block 4, loop depth 0, count 467721934 (estimated locally, freq 
0.4356), maybe hot
;;  prev block 3, next block 5, flags: (NEW, VISITED)
;;  pred:   3 [66.0% (guessed)]  count:467721933 (estimated locally, freq 
0.4356) (FALSE_VALUE,EXECUTABLE)
_7 = a_14(D) == 77;
_8 = a_14(D) == 46;
_29 = 0;
_9 = _7 | _8;
if (_29 != 0)
  goto ; [34.00%]
else
  goto ; [66.00%]

;; basic block 5, loop depth 0, count 308696475 (estimated locally, freq 
0.2875), maybe hot
;;  prev block 4, next block 6, flags: (NEW, VISITED)
;;  pred:   4 [66.0% (guessed)]  count:308696475 (estimated locally, freq 
0.2875) (FALSE_VALUE,EXECUTABLE)
_10 = a_14(D) == 76;
_11 = a_14(D) == 45;
_28 = 0;
_12 = _10 | _11;
if (_28 != 0)
  goto ; [50.00%]
else
  goto ; [50.00%]
;;  succ:   7 [50.0% (guessed)]  count:154348238 (estimated locally, freq 
0.1437) (TRUE_VALUE,EXECUTABLE)
;;  6 [50.0% (guessed)]  count:154348238 (estimated locally, freq 
0.1437) (FALSE_VALUE,EXECUTABLE)

However, BB4 and BB5 are not updated to be unconditional by the tree-ssa-reassoc
pass and we thus miss the profile update.

This happens later in forwprop but at that time it is too late to update the 
probabilities.
So we get:

;;   basic block 2, loop depth 0, count 1073741824 (estimated locally, freq 
1.), maybe hot
;;prev block 0, next block 3, flags: (NEW, VISITED)
;;pred:   ENTRY [always]  count:1073741824 (estimated locally, freq 
1.) (FALLTHRU,EXECUTABLE)
  _24 = (unsigned int) a_14(D);
  _25 = _24 + 4294967253;
  _26 = _25 & 4294967260;
  _27 = _26 == 0;
  if (_26 == 0)
goto ; [34.00%]
  else
goto ; [66.00%]
;;succ:   4 [34.0% (guessed)]  count:365072224 (estimated locally, freq 
0.3400) (TRUE_VALUE,EXECUTABLE)
;;3 [66.0% (guessed)]  count:708669600 (estimated locally, freq 
0.6600) (FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, count 154348237 (estimated locally, freq 
0.1437), maybe hot
;;   Invalid sum of incoming counts 708669600 (estimated locally, freq 0.6600), 
should be 154348237 (estimated locally, freq 0.1437)
;;prev block 2, next block 4, flags: (NEW, VISITED)
;;pred:   2 [66.0% (guessed)]  count:708669600 (estimated locally, freq 
0.6600) (FALSE_VALUE,EXECUTABLE)
;;succ:   4 [always]  count:154348237 (estimated locally, freq 0.1437) 
(FALLTHRU,EXECUTABLE) c.c:12:12

;;   basic block 4, loop depth 0, count 1073741824 (estimated locally, freq 
1.), maybe hot
;;   Invalid sum of incoming counts 519420461 (estimated locally, freq 0.4837), 
should be 1073741824 (estimated locally, freq 1.)
;;prev block 3, next block 1, flags: (NEW, VISITED)
;;pred:   2 [34.0% (guessed)]  count:365072224 (estimated locally, freq 
0.3400) (TRUE_VALUE,EXECUTABLE)
;;3 [always]  count:154348237 (estimated locally, freq 0.1437) 
(FALLTHRU,EXECUTABLE) c.c:12:12
  # _13 = PHI 
  return _13;
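
For reference, the testcase boils down to a chain like this (my paraphrase,
not the verbatim reassoc-33.c; the constants match the dumps above, i.e. the
tests cover a in {43..46} and {75..78}):

int test (int a)
{
  if (a == 44 || a == 78 || a == 77 || a == 76
      || a == 75 || a == 46 || a == 45 || a == 43)
    return 1;
  return 0;
}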

Jakub, it seems that the code is originally yours.  Any idea why those are not 
turned to
constant true or

Re: [PATCH V2] RISC-V: Add conditional unary neg/abs/not autovec patterns

2023-08-23 Thread Lehua Ding

Committed, thanks.

On 2023/8/23 16:45, Robin Dapp wrote:

OK, thanks.

Regards
  Robin




--
Best,
Lehua



Re: [PATCH 03/11] aarch64: Use br instead of ret for eh_return

2023-08-23 Thread Richard Sandiford via Gcc-patches
Szabolcs Nagy  writes:
> The expected way to handle eh_return is to pass the stack adjustment
> offset and landing pad address via
>
>   EH_RETURN_STACKADJ_RTX
>   EH_RETURN_HANDLER_RTX
>
> to the epilogue that is shared between normal return paths and the
> eh_return paths.  EH_RETURN_HANDLER_RTX is the stack slot of the
> return address that is overwritten with the landing pad in the
> eh_return case and EH_RETURN_STACKADJ_RTX is a register added to sp
> right before return and it is set to 0 in the normal return case.
>
> The issue with this design is that eh_return and normal return may
> require different return sequence but there is no way to distinguish
> the two cases in the epilogue (the stack adjustment may be 0 in the
> eh_return case too).
>
> The reason eh_return and normal return requires different return
> sequence is that control flow integrity hardening may need to treat
> eh_return as a forward-edge transfer (it is not returning to the
> previous stack frame) and normal return as a backward-edge one.
> In case of AArch64 forward-edge is protected by BTI and requires br
> instruction and backward-edge is protected by PAUTH or GCS and
> requires ret (or authenticated ret) instruction.
>
> This patch resolves the issue by using the EH_RETURN_STACKADJ_RTX
> register only as a flag that is set to 1 in the eh_return paths
> (it is 0 in normal return paths) and introduces
>
>   AARCH64_EH_RETURN_STACKADJ_RTX
>   AARCH64_EH_RETURN_HANDLER_RTX
>
> to pass the actual stack adjustment and landing pad address to the
> epilogue in the eh_return case. Then the epilogue can use the right
> return sequence based on the EH_RETURN_STACKADJ_RTX flag.
>
> The handler could be passed the old way via clobbering the return
> address, but since now the eh_return case can be distinguished, the
> handler can be in a different register than x30 and no stack frame
> is needed for eh_return.

I don't think there's any specific target-independent requirement for
EH_RETURN_HANDLER_RTX to be a stack slot.  df-scan.cc has code to handle
registers.

So couldn't we just use EH_RETURN_HANDLER_RTX for this, rather than
making it AARCH64_EH_RETURN_HANDLER_RTX?

> The new code generation for functions with eh_return is not amazing,
> since x5 and x6 are assumed to be used by the epilogue even in the
> normal return path, not just for eh_return.  But only the unwinder
> is expected to use eh_return so this is fine.

I guess the problem here is that x5 and x6 are upwards-exposed on
the non-eh_return paths, and so are treated as live for most of the
function.  Is that right?

The patch seems to be using the existing interfaces to implement
a slightly different model.  E.g. it feels like a hack (but a neat hack)
that EH_RETURN_STACKADJ_RTX is now a flag rather than an adjustment,
with AARCH64_EH_RETURN_STACKADJ_RTX then being the "real" stack
adjustment.  And the reason for the upwards exposure of the new
registers on normal return paths is that the existing model has
no hook into the normal return path.

Rather than hiding this in target code, perhaps we should add a
target-independent concept of an "eh_return taken" flag, say
EH_RETURN_TAKEN_RTX.

We could define it so that, on targets that define EH_RETURN_TAKEN_RTX,
a register EH_RETURN_STACKADJ_RTX and a register EH_RETURN_HANDLER_RTX
are only meaningful when the flag is true.  E.g. we could have:

#ifdef EH_RETURN_HANDLER_RTX
  for (rtx tmp : { EH_RETURN_STACKADJ_RTX, EH_RETURN_HANDLER_RTX })
if (tmp && REG_P (tmp))
  emit_clobber (tmp);
#endif

in the "normal return" part of expand_eh_return.  (If some other target
wants a flag with different semantics, it'd be up to them to add it.)

That should avoid most of the bad code-quality effects, since the
specialness of x4-x6 will be confined to the code immediately before
the pre-epilogue exit edges.

Thanks,
Richard

> This patch fixes a return-to-anywhere gadget in the unwinder with
> existing standard branch protection as well as makes EH return
> compatible with the Guarded Control Stack (GCS) extension.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-protos.h (aarch64_eh_return_handler_rtx):
>   Remove.
>   (aarch64_eh_return): New.
>   * config/aarch64/aarch64.cc (aarch64_return_address_signing_enabled):
>   Sign return address even in functions with eh_return.
>   (aarch64_epilogue_uses): Mark two registers as used.
>   (aarch64_expand_epilogue): Conditionally return with br or ret.
>   (aarch64_eh_return_handler_rtx): Remove.
>   (aarch64_eh_return): New.
>   * config/aarch64/aarch64.h (EH_RETURN_HANDLER_RTX): Remove.
>   (AARCH64_EH_RETURN_STACKADJ_REGNUM): Define.
>   (AARCH64_EH_RETURN_STACKADJ_RTX): Define.
>   (AARCH64_EH_RETURN_HANDLER_REGNUM): Define.
>   (AARCH64_EH_RETURN_HANDLER_RTX): Define.
>   * config/aarch64/aarch64.md (eh_return): New.
> ---
>  gcc/config/aarch64/aarch64-protos.h |   2 +-
>  gcc/config/aarch64/aarch64.cc   | 106 

[PATCH] rtl: use rtx_code for gen_ccmp_first and gen_ccmp_next

2023-08-23 Thread Richard Earnshaw via Gcc-patches
Note, this patch is dependent on the patch I posted yesterday to
forward declare rtx_code in coretypes.h.

--
Now that we have a forward declaration of rtx_code in coretypes.h, we
can adjust these hooks to take rtx_code arguments rather than an int.

gcc/ChangeLog:

* target.def (gen_ccmp_first, gen_ccmp_next): Use rtx_code for
CODE, CMP_CODE and BIT_CODE arguments.
* config/aarch64/aarch64.cc (aarch64_gen_ccmp_first): Likewise.
(aarch64_gen_ccmp_next): Likewise.
* doc/tm.texi: Regenerated.
---
 gcc/config/aarch64/aarch64.cc | 5 +++--
 gcc/doc/tm.texi   | 4 ++--
 gcc/target.def| 4 ++--
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 560e5431636..bc09185b8ec 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25585,7 +25585,7 @@ aarch64_asan_shadow_offset (void)
 
 static rtx
 aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
-			int code, tree treeop0, tree treeop1)
+			rtx_code code, tree treeop0, tree treeop1)
 {
   machine_mode op_mode, cmp_mode, cc_mode = CCmode;
   rtx op0, op1;
@@ -25659,7 +25659,8 @@ aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
 
 static rtx
 aarch64_gen_ccmp_next (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev,
-		   int cmp_code, tree treeop0, tree treeop1, int bit_code)
+		   rtx_code cmp_code, tree treeop0, tree treeop1,
+		   rtx_code bit_code)
 {
   rtx op0, op1, target;
   machine_mode op_mode, cmp_mode, cc_mode = CCmode;
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 95ba56e05ae..75cb8e3417c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12005,7 +12005,7 @@ This target hook is required only when the target has several different
 modes and they have different conditional execution capability, such as ARM.
 @end deftypefn
 
-@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_FIRST (rtx_insn **@var{prep_seq}, rtx_insn **@var{gen_seq}, int @var{code}, tree @var{op0}, tree @var{op1})
+@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_FIRST (rtx_insn **@var{prep_seq}, rtx_insn **@var{gen_seq}, rtx_code @var{code}, tree @var{op0}, tree @var{op1})
 This function prepares to emit a comparison insn for the first compare in a
  sequence of conditional comparisions.  It returns an appropriate comparison
  with @code{CC} for passing to @code{gen_ccmp_next} or @code{cbranch_optab}.
@@ -12015,7 +12015,7 @@ This function prepares to emit a comparison insn for the first compare in a
  @var{code} is the @code{rtx_code} of the compare for @var{op0} and @var{op1}.
 @end deftypefn
 
-@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_NEXT (rtx_insn **@var{prep_seq}, rtx_insn **@var{gen_seq}, rtx @var{prev}, int @var{cmp_code}, tree @var{op0}, tree @var{op1}, int @var{bit_code})
+@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_NEXT (rtx_insn **@var{prep_seq}, rtx_insn **@var{gen_seq}, rtx @var{prev}, rtx_code @var{cmp_code}, tree @var{op0}, tree @var{op1}, rtx_code @var{bit_code})
 This function prepares to emit a conditional comparison within a sequence
  of conditional comparisons.  It returns an appropriate comparison with
  @code{CC} for passing to @code{gen_ccmp_next} or @code{cbranch_optab}.
diff --git a/gcc/target.def b/gcc/target.def
index 7d684296c17..3ad0bde3ece 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2735,7 +2735,7 @@ DEFHOOK
  insns are saved in @var{gen_seq}.  They will be emitted when all the\n\
  compares in the conditional comparision are generated without error.\n\
  @var{code} is the @code{rtx_code} of the compare for @var{op0} and @var{op1}.",
- rtx, (rtx_insn **prep_seq, rtx_insn **gen_seq, int code, tree op0, tree op1),
+ rtx, (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx_code code, tree op0, tree op1),
  NULL)
 
 DEFHOOK
@@ -2752,7 +2752,7 @@ DEFHOOK
  be appropriate for passing to @code{gen_ccmp_next} or @code{cbranch_optab}.\n\
  @var{code} is the @code{rtx_code} of the compare for @var{op0} and @var{op1}.\n\
  @var{bit_code} is @code{AND} or @code{IOR}, which is the op on the compares.",
- rtx, (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev, int cmp_code, tree op0, tree op1, int bit_code),
+ rtx, (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev, rtx_code cmp_code, tree op0, tree op1, rtx_code bit_code),
  NULL)
 
 /* Return a new value for loop unroll size.  */


[PATCH] RISC-V: Add conditional sign/zero extension and truncation autovec patterns

2023-08-23 Thread Lehua Ding
Hi,

This patch adds conditional sign/zero extension and truncation autovec
patterns by combining EXTENSION/TRUNCATION and VCOND_MASK patterns.

For quad truncation, two vncvt instructions are generated. This patch
combines the second vncvt and the vmerge to form a masked vncvt, while the
first vncvt remains unchanged. Of course, it is possible to convert the
first vncvt to the masked type as well, but I don't think it is necessary.
It is a similar story with 8x truncation.
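
For example, a conditional sign extension like the following (an illustrative
loop, not one of the new testcases) can now be vectorized into a masked
vsext instead of a plain extend followed by vmerge:

void f (short *restrict a, int *restrict r, int *restrict pred, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = pred[i] ? (int) a[i] : r[i];
}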

--
Best,
Lehua

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_):
Add combine pattern.
(*cond_): Ditto.
(*cond_): Ditto.
(*cond_trunc): Ditto.
* config/riscv/autovec.md (2):
Change define_expand to define_insn_and_split.
(2): Ditto.
* config/riscv/riscv-protos.h (emit_vlmax_masked_insn): Exported.
* config/riscv/riscv-v.cc (emit_vlmax_cmp_mu_insn): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-2.c: New 
test.
---
 gcc/config/riscv/autovec-opt.md   | 69 +++
 gcc/config/riscv/autovec.md   | 39 ---
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   |  2 +-
 .../riscv/rvv/autovec/binop/narrow-3.c|  2 +-
 .../rvv/autovec/cond/cond_convert_int2int-1.h | 48 +
 .../rvv/autovec/cond/cond_convert_int2int-2.h | 46 +
 .../cond/cond_convert_int2int-rv32-1.c| 13 
 .../cond/cond_convert_int2int-rv32-2.c| 13 
 .../cond/cond_convert_int2int-rv64-1.c| 13 
 .../cond/cond_convert_int2int-rv64-2.c| 13 
 .../autovec/cond/cond_convert_int2int_run-1.c | 31 +
 .../autovec/cond/cond_convert_int2int_run-2.c | 30 
 13 files changed, 294 insertions(+), 26 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-2.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 8247eb87ddb..f3ef3a839df 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -723,3 +723,72 @@
  riscv_vector::RVV_BINOP, operands);
   DONE;
 })
+
+;; Combine sign_extend/zero_extend(vf2) and vcond_mask
+(define_insn_and_split "*cond_"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+(if_then_else:VWEXTI
+  (match_operand: 1 "register_operand")
+  (any_extend:VWEXTI (match_operand: 3 
"register_operand"))
+  (match_operand:VWEXTI 2 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_vf2 (, mode);
+  riscv_vector::emit_vlmax_masked_insn (icode, riscv_vector::RVV_UNOP_M, 
operands);
+  DONE;
+})
+
+;; Combine sign_extend/zero_extend(vf4) and vcond_mask
+(define_insn_and_split "*cond_"
+  [(set (match_operand:VQEXTI 0 "register_operand")
+(if_then_else:VQEXTI
+  (match_operand: 1 "register_operand")
+  (any_extend:VQEXTI (match_operand: 3 
"register_operand"))
+  (match_operand:VQEXTI 2 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_vf4 (, mode);
+  riscv_vector::emit_vlmax_masked_insn (icode, riscv_vector::RVV_UNOP_M, 
operands);
+  DONE;
+})
+
+;; Combine sign_extend/zero_extend(vf8) and vcond_mask
+(define_insn_and_split "*cond_"
+  [(set (match_operand:VOEXTI 0 "register_operand")
+(if_then_else:VOEXTI
+  (match_operand: 1 "register_operand")
+  (any_extend:VOEXTI (match_operand: 3 
"register_operand"))
+  

[PATCH] RISC-V: Add conditional convert autovec patterns between FPs

2023-08-23 Thread Lehua Ding
Hi,

This patch adds conditional FP extension and truncation autovec
patterns. It depends on another patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628235.html .
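
For example, a conditional FP extension like the following (an illustrative
loop, not one of the new testcases) can now become a masked vfwcvt instead of
vfwcvt followed by vmerge:

void f (float *restrict a, double *restrict r, int *restrict pred, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = pred[i] ? (double) a[i] : r[i];
}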

Best,
Lehua

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_extend):
Add combine pattern.
(*cond_trunc): Ditto.
* config/riscv/autovec.md: Adjust for combine.
* config/riscv/riscv-protos.h (emit_vlmax_masked_fp_insn): Exported.
* config/riscv/riscv-v.cc (emit_vlmax_masked_fp_insn): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-1.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-2.c: 
New test.

---
 gcc/config/riscv/autovec-opt.md   | 35 +++
 gcc/config/riscv/autovec.md   | 15 +++-
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 19 ++
 .../autovec/cond/cond_convert_float2float-1.h | 30 
 .../autovec/cond/cond_convert_float2float-2.h | 28 +++
 .../cond/cond_convert_float2float-rv32-1.c|  9 +
 .../cond/cond_convert_float2float-rv32-2.c|  9 +
 .../cond/cond_convert_float2float-rv64-1.c|  9 +
 .../cond/cond_convert_float2float-rv64-2.c|  9 +
 .../cond/cond_convert_float2float_run-1.c | 31 
 .../cond/cond_convert_float2float_run-2.c | 30 
 12 files changed, 214 insertions(+), 11 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-2.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index f3ef3a839df..8f9a6317592 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -792,3 +792,38 @@
   riscv_vector::emit_vlmax_masked_insn (icode, riscv_vector::RVV_UNOP_M, 
operands);
   DONE;
 })
+
+;; Combine FP sign_extend/zero_extend(vf2) and vcond_mask
+(define_insn_and_split "*cond_extend"
+  [(set (match_operand:VWEXTF_ZVFHMIN 0 "register_operand")
+(if_then_else:VWEXTF_ZVFHMIN
+  (match_operand: 1 "register_operand")
+  (float_extend:VWEXTF_ZVFHMIN (match_operand: 3 
"register_operand"))
+  (match_operand:VWEXTF_ZVFHMIN 2 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_extend (mode);
+  riscv_vector::emit_vlmax_masked_insn (icode, riscv_vector::RVV_UNOP_M, 
operands);
+  DONE;
+})
+
+;; Combine FP trunc(vf2) + vcond_mask
+(define_insn_and_split "*cond_trunc"
+  [(set (match_operand: 0 "register_operand")
+(if_then_else:
+  (match_operand: 1 "register_operand")
+  (float_truncate:
+(match_operand:VWEXTF_ZVFHMIN 3 "register_operand"))
+  (match_operand: 2 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_trunc (mode);
+  riscv_vector::emit_vlmax_masked_fp_insn (icode, riscv_vector::RVV_UNOP_M, 
operands);
+  DONE;
+})
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 4936333f303..f2bf5e045ee 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -752,14 +752,9 @@
  (match_operand: 1 "register_operand")))]
   "TARGET_VECTOR && (TARGET_ZVFHMIN || TARGET_ZVFH)"
 {
-  rtx dblw = gen_reg_rtx (mode);
-  insn_code icode = code_for_pred_extend (mode);
-  rtx ops1[] = {dblw, operands[1]};
-  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ops1);
-
-  icode = code_for_pred_extend (mode);
-  rtx ops2[] = {operands[0], dblw};
-  riscv_vector::emi

[PATCH V2] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-23 Thread Juzhe-Zhong
This patch refactors Phase 3 (Demand fusion) and renames it to Earliest
fusion.
I did the refactor for the following reasons:

  1. The current implementation of phase 3 does too many things, which makes
     the code quite messy and not easy to maintain.
  2. The demand fusion done previously was fully explicit: how to fuse
     VSETVLs, where to make the VSETVL fusion happen, checking whether the
     VSETVL fusion point (location) is correct and optimal, etc.

 We are dong these things too much so I added these following functions:

enum fusion_type get_backward_fusion_type (const bb_info *,
 const vector_insn_info &);
bool hard_empty_block_p (const bb_info *, const vector_insn_info &) 
const;
bool backward_demand_fusion (void);
bool forward_demand_fusion (void);
bool cleanup_illegal_dirty_blocks (void);

     to make sure the VSETVL fusion is optimal and correct. I found in my
     downstream testing that it is not a reliable and optimal approach.

     Instead, this patch uses 'compute_earliest', which is an LCM function,
     to fuse multiple 'compatible' VSETVL demand infos when they share the
     same earliest edge.  We let LCM decide almost everything about demand
     fusion for us. The only thing we do ourselves (not the LCM) is check
     whether the VSETVL demand infos are compatible. That's all we need to do.
     I believe this approach is much more reliable and optimal than before
     (we already have many testcases checking this refactor patch).
  3. Using the LCM approach for demand fusion is more reliable and handles
     the CFG better than before.
  ...

Here are the basics of this patch's approach:

Consider this following case:

for
  for 
for
  ...
 for
   if (...)
 VSETVL 1 demand: RATIO = 32 and TU policy.
   else if (...)
 VSETVL 2 demand: SEW = 16.
   else
 VSETVL 3 demand: MU policy.

   - 'compute_earliest' outputs the earliest edge of VSETVL 1, VSETVL 2
     and VSETVL 3.  They have the same earliest edge, which is outside the
     first inner-most loop.

   - Then we check that these 3 VSETVL demand infos are compatible and fuse
     them into a single VSETVL info:
     demand SEW = 16, LMUL = MF2, TU, MU.
     (An illustrative sketch of this compatibility check follows below.)

   - Then the later phase (phase 4), LCM PRE (partial redundancy
     elimination), will hoist this VSETVL to the outer-most loop so that we
     get optimal codegen.
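
To make the fusion step concrete, here is a tiny self-contained sketch of
the "check compatibility, then merge" idea (purely illustrative: the struct,
the field encoding, and the function names below are invented for this
example and are not the ones used in riscv-vsetvl.cc):

#include <cstdio>
#include <map>
#include <vector>

// One VSETVL demand; 0 means "don't care" for the integer fields.
struct demand_info
{
  int sew;    // demanded element width
  int ratio;  // demanded SEW/LMUL ratio
  bool tu;    // tail-undisturbed policy demanded
  bool mu;    // mask-undisturbed policy demanded
};

// Two demands conflict only if both insist on different values.
static bool compatible (const demand_info &a, const demand_info &b)
{
  if (a.sew && b.sew && a.sew != b.sew)
    return false;
  if (a.ratio && b.ratio && a.ratio != b.ratio)
    return false;
  return true;
}

// Fusing keeps the union of all demands.
static demand_info merge (demand_info a, const demand_info &b)
{
  if (!a.sew) a.sew = b.sew;
  if (!a.ratio) a.ratio = b.ratio;
  a.tu |= b.tu;
  a.mu |= b.mu;
  return a;
}

int main ()
{
  // The three demands of the example above, all sharing the same earliest
  // edge (edge number made up): RATIO = 32 + TU, SEW = 16, and MU.
  std::map<int, std::vector<demand_info>> by_earliest_edge
    = { { 7, { { 0, 32, true, false },
               { 16, 0, false, false },
               { 0, 0, false, true } } } };

  for (auto &e : by_earliest_edge)
    {
      demand_info fused = e.second[0];
      for (size_t i = 1; i < e.second.size (); i++)
        if (compatible (fused, e.second[i]))
          fused = merge (fused, e.second[i]);
      // SEW = 16 with ratio 32 implies LMUL = MF2, plus TU and MU -- the
      // single VSETVL that phase 4 (LCM PRE) can then hoist.
      std::printf ("edge %d: sew=%d ratio=%d tu=%d mu=%d\n",
                   e.first, fused.sew, fused.ratio, (int) fused.tu,
                   (int) fused.mu);
    }
}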

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vsetvl_vtype_change_only_p): New 
function.
(after_or_same_p): Ditto.
(find_reg_killed_by): Delete.
(has_vsetvl_killed_avl_p): Ditto.
(anticipatable_occurrence_p): Refactor.
(any_set_in_bb_p): Delete.
(count_regno_occurrences): Ditto.
(backward_propagate_worthwhile_p): Ditto.
(demands_can_be_fused_p): Ditto.
(earliest_pred_can_be_fused_p): New function.
(vsetvl_dominated_by_p): Ditto.
(vector_insn_info::parse_insn): Refactor.
(vector_insn_info::merge): Refactor.
(vector_insn_info::dump): Refactor.
(vector_infos_manager::vector_infos_manager): Refactor.
(vector_infos_manager::all_empty_predecessor_p): Delete.
(vector_infos_manager::all_same_avl_p): Ditto.
(vector_infos_manager::create_bitmap_vectors): Refactor.
(vector_infos_manager::free_bitmap_vectors): Refactor.
(vector_infos_manager::dump): Refactor.
(pass_vsetvl::update_block_info): New function.
(enum fusion_type): Ditto.
(pass_vsetvl::get_backward_fusion_type): Delete.
(pass_vsetvl::hard_empty_block_p): Ditto.
(pass_vsetvl::backward_demand_fusion): Ditto.
(pass_vsetvl::forward_demand_fusion): Ditto.
(pass_vsetvl::demand_fusion): Ditto.
(pass_vsetvl::cleanup_illegal_dirty_blocks): Ditto.
(pass_vsetvl::compute_local_properties): Ditto.
(pass_vsetvl::earliest_fusion): New function.
(pass_vsetvl::vsetvl_fusion): Ditto.
(pass_vsetvl::commit_vsetvls): Refactor.
(get_first_vsetvl_before_rvv_insns): Ditto.
(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::cleanup_earliest_vsetvls): New function.
(pass_vsetvl::df_post_optimization): Refactor.
(pass_vsetvl::lazy_vsetvl): Ditto.
* config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vxrm-8.c: Adapt test.
* gcc.target/riscv/rvv/base/vxrm-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-102.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-14.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-27.c: Ditto.
 

[PATCH] RISC-V: add a more appropriate type attribute

2023-08-23 Thread Zhangjin Liao
Due to the more accurate type attribute added to the clz, ctz, and pcnt
operations
in https://github.com/gcc-mirror/gcc/commit/07e2576d6f3, the same type
attribute should be used here.

gcc/ChangeLog:

* config/riscv/bitmanip.md:add a more appropriate type attribute
---
 gcc/config/riscv/bitmanip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 0c99152ffc8..7b55528ee49 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -262,7 +262,7 @@
   (match_operand:DI 2 "const_int_operand")))]
   "TARGET_64BIT && TARGET_ZBB && ((INTVAL (operands[2]) & 0x3f) == 0x3f)"
   "w\t%0,%1"
-  [(set_attr "type" "bitmanip")
+  [(set_attr "type" "")
(set_attr "mode" "SI")])
 
 (define_insn "*di2"
-- 
2.17.1



Re: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-23 Thread juzhe.zh...@rivai.ai
I have reordered the functions so that we won't mix up deleted functions and
new functions.
V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628237.html 

>> Why is this exception needed?

Because we have this piece of code here for fusion in an "EMPTY" block:
new_info = expr.merge (expr, GLOBAL_MERGE, eg->src->index);
The expr may not have a real AVL source, which is considered incompatible.
However, in this case we should skip the compatibility check and just use
merge to compute the demand info.

>>Make sure I understand this correctly: it's worth it if those edges have
>>different probabilities?
>>If all probabilities are the same, then it's not worth it?
The probability is supposed to help pick the optimal VSETVL info among
incompatible demand infos.

Consider the following case:

void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, 
size_t cond2)
{
  for (size_t i = 0; i < n; i++)
{
  if (i== cond) {
vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
*(vint8mf8_t*)(out + i + 100) = v;
  } else {
vbool1_t v = *(vbool1_t*)(in + i + 400);
*(vbool1_t*)(out + i + 400) = v;
  }
}
}

The two VSETVLs are incompatible since one wants e8mf8 and the other wants
e8m8.
Since if (i == cond) has very low probability (it can only be taken zero
times or once),
we want to hoist the e8m8 VSETVL to get optimal codegen like this:
f:
beq a2,zero,.L10
addi a0,a0,1600
addi a1,a1,1600
li a5,0
vsetvli a4,zero,e8,m8,ta,ma
.L5:
beq a3,a5,.L12
vlm.v v1,0(a0)
vsm.v v1,0(a1)
.L4:
addi a5,a5,1
addi a0,a0,4
addi a1,a1,4
bne a2,a5,.L5
.L10:
ret
.L12:
vsetvli a7,zero,e8,mf8,ta,ma
addi a6,a1,-1200
addi t1,a0,-1200
vle8.v v1,0(t1)
vse8.v v1,0(a6)
vsetvli a4,zero,e8,m8,ta,ma
j .L4


Whereas the other case is like this:
void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, 
size_t cond2)
{
  for (size_t i = 0; i < n; i++)
{
  if (i > cond) {
vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
*(vint8mf8_t*)(out + i + 100) = v;
  } else {
vbool1_t v = *(vbool1_t*)(in + i + 400);
*(vbool1_t*)(out + i + 400) = v;
  }
}
}

Both condition probabilities are equal, so we don't want to give either of
them higher priority,
and the codegen should be:
f:
beq a2,zero,.L10
addi a0,a0,1600
addi a1,a1,1600
li a5,0
j .L5
.L12:
vsetvli a7,zero,e8,mf8,ta,ma
addi a5,a5,1
vle8.v v1,0(a6)
vse8.v v1,0(a4)
addi a0,a0,4
addi a1,a1,4
beq a2,a5,.L10
.L5:
addi a4,a1,-1200
addi a6,a0,-1200
bltu a3,a5,.L12
vsetvli t1,zero,e8,m8,ta,ma
addi a5,a5,1
vlm.v v1,0(a0)
vsm.v v1,0(a1)
addi a0,a0,4
addi a1,a1,4
bne a2,a5,.L5
.L10:
ret



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-08-22 23:35
To: Kito Cheng
CC: Robin Dapp; Juzhe-Zhong; GCC Patches; Jeff Law
Subject: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
It's a really great improvement: it drops some states like HARD_EMPTY
and DIRTY_WITH_KILLED_AVL, which makes this algorithm easier to
understand!
It also fundamentally improves phase 3, although one concern is that the
time complexity might become higher order
(and it's already high enough, in fact),
but mostly vectorized code only appears within the inner-most loop,
so that is generally acceptable.
 
So I will try my best to review this closely to make it as close to
perfect as possible :)
 
I saw you have updated several testcases; why update them instead of adding
new testcases?  Could you say more about why some testcases added
__riscv_vadd_vv_i8mf8 or added more dependencies on the vl variable?
 
 
 
> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info &info1,
> const vector_insn_info &info2)
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +{
> +  if (info2.demand_p (DEMAND_GE_SEW))
> +   return info1.get_sew () < info2.get_sew ();
> +  else if (!info2.demand_p (DEMAND_SEW))
> +   return false;
> +}
 
This seems to relax the compatibility check to allow optimizing more cases;
if so, this should be a separate patch.
 
>return true;
>  }
 
 
> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>  return;
>if (optimize == 0 && !has_vtype_op (rinsn))
>  return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))
 
I didn't get this change; could you explain a bit more about it? It was
an early exit for non-vsetvl insns, but now it's allowed?
 
>  return;
>m_state = VALID;
>extract_insn_cached (rinsn);
 
> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const 
> vector_insn_info &info1,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info &merge_info,
> -enum merge_type type) const
> +enum merge_type type, unsigned bb_index) const
>  {
> -  if

[PATCH v5] c++: extend cold, hot attributes to classes

2023-08-23 Thread Javier Martinez via Gcc-patches
On Tue, Aug 22, 2023 at 7:50 PM Jason Merrill  wrote:
> You still need an update to doc/extend.texi for this additional use of
> the attribute.  Sorry I didn't think of that before.

I should have caught that too, many thanks.

Also addressed the formatting comments. Patch attached.
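
For readers who want to see what the feature enables, here is a minimal
hypothetical usage sketch (not taken from the patch's testsuite): with this
patch, marking a class hot or cold is meant to act as if each of its member
functions carried the attribute, including lazily declared special members.

// Hypothetical example of the extended attribute on C++ class types.
struct __attribute__((cold)) error_reporter
{
  void fail (const char *msg);  // treated as if marked cold
  // the implicitly declared ctor/dtor are lazily declared and also cold
};

struct __attribute__((hot)) packet_parser
{
  int parse (const unsigned char *buf, int len);  // treated as if marked hot
};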

Signed-off-by: Javier Martinez 
Signed-off-by: Javier Martinez 

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_hot_attribute): Remove warning on
RECORD_TYPE and UNION_TYPE when in c_dialect_cxx.
(handle_cold_attribute): Likewise.

gcc/cp/ChangeLog:

* class.cc (propagate_class_warmth_attribute): New function.
(check_bases_and_members): Propagate hot and cold attributes
to all FUNCTION_DECLs when the record is marked hot or cold.
* cp-tree.h (maybe_propagate_warmth_attributes): New function.
* decl2.cc (maybe_propagate_warmth_attributes): New function.
* method.cc (lazily_declare_fn): Propagate hot and cold
attributes to lazily declared functions when the record is
marked hot or cold.

gcc/ChangeLog:

* doc/extend.texi: Document attributes hot, cold on C++ types.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-hotness.C: New test.

---
 gcc/c-family/c-attribs.cc   | 50 -
 gcc/cp/class.cc | 29 ++
 gcc/cp/cp-tree.h|  1 +
 gcc/cp/decl2.cc | 44 ++
 gcc/cp/method.cc|  6 +++
 gcc/doc/extend.texi | 37 +-
 gcc/testsuite/g++.dg/ext/attr-hotness.C | 16 
 7 files changed, 179 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/attr-hotness.C

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index e2792ca6898..5d83f54561d 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -452,10 +452,10 @@ const struct attribute_spec c_common_attribute_table[] =
   { "alloc_size",	  1, 2, false, true, true, false,
 			  handle_alloc_size_attribute,
 	  attr_alloc_exclusions },
-  { "cold",   0, 0, true,  false, false, false,
+  { "cold",		  0, 0, false,  false, false, false,
 			  handle_cold_attribute,
 	  attr_cold_hot_exclusions },
-  { "hot",0, 0, true,  false, false, false,
+  { "hot",		  0, 0, false,  false, false, false,
 			  handle_hot_attribute,
 	  attr_cold_hot_exclusions },
   { "no_address_safety_analysis",
@@ -1110,6 +1110,29 @@ handle_hot_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 {
   /* Attribute hot processing is done later with lookup_attribute.  */
 }
+  else if ((TREE_CODE (*node) == RECORD_TYPE
+	|| TREE_CODE (*node) == UNION_TYPE)
+	   && c_dialect_cxx ()
+	   && (flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
+{
+  /* Check conflict here as decl_attributes will otherwise only catch
+	 it late at the function when the attribute is used on a class.  */
+  tree cold_attr = lookup_attribute ("cold", TYPE_ATTRIBUTES (*node));
+  if (cold_attr)
+	{
+	  warning (OPT_Wattributes, "ignoring attribute %qE because it "
+		   "conflicts with attribute %qs", name, "cold");
+	  *no_add_attrs = true;
+	}
+}
+  else if (flags & ((int) ATTR_FLAG_FUNCTION_NEXT
+		| (int) ATTR_FLAG_DECL_NEXT))
+{
+	/* Avoid applying the attribute to a function return type when
+	   used as:  void __attribute ((hot)) foo (void).  It will be
+	   passed to the function.  */
+	*no_add_attrs = true;
+}
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -1131,6 +1154,29 @@ handle_cold_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 {
   /* Attribute cold processing is done later with lookup_attribute.  */
 }
+  else if ((TREE_CODE (*node) == RECORD_TYPE
+	|| TREE_CODE (*node) == UNION_TYPE)
+	   && c_dialect_cxx ()
+	   && (flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
+{
+  /* Check conflict here as decl_attributes will otherwise only catch
+	 it late at the function when the attribute is used on a class.  */
+  tree hot_attr = lookup_attribute ("hot", TYPE_ATTRIBUTES (*node));
+  if (hot_attr)
+	{
+	  warning (OPT_Wattributes, "ignoring attribute %qE because it "
+		   "conflicts with attribute %qs", name, "hot");
+	  *no_add_attrs = true;
+	}
+}
+  else if (flags & ((int) ATTR_FLAG_FUNCTION_NEXT
+		| (int) ATTR_FLAG_DECL_NEXT))
+{
+	/* Avoid applying the attribute to a function return type when
+	   used as:  void __attribute ((cold)) foo (void).  It will be
+	   passed to the function.  */
+	*no_add_attrs = true;
+}
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 778759237dc..0bb679f15be 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -205,6 +205,7 @@ sta

[PATCH] tree-optimization/111115 - SLP of masked stores

2023-08-23 Thread Richard Biener via Gcc-patches
The following adds the capability to do SLP on .MASK_STORE; I do not
plan to add interleaving support.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

	PR tree-optimization/111115
gcc/
* tree-vectorizer.h (vect_slp_child_index_for_operand): New.
* tree-vect-data-refs.cc (can_group_stmts_p): Also group
.MASK_STORE.
* tree-vect-slp.cc (arg3_arg2_map): New.
(vect_get_operand_map): Handle IFN_MASK_STORE.
(vect_slp_child_index_for_operand): New function.
(vect_build_slp_tree_1): Handle statements with no LHS,
masked store ifns.
(vect_remove_slp_scalar_calls): Likewise.
* tree-vect-stmts.c (vect_check_store_rhs): Lookup the
SLP child corresponding to the ifn value index.
(vectorizable_store): Likewise for the mask index.  Support
masked stores.
(vectorizable_load): Lookup the SLP child corresponding to the
ifn mask index.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_masked_store):
Supported with check_avx_available.
* gcc.dg/vect/slp-mask-store-1.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c | 39 +
 gcc/testsuite/lib/target-supports.exp|  3 +-
 gcc/tree-vect-data-refs.cc   |  3 +-
 gcc/tree-vect-slp.cc | 46 +---
 gcc/tree-vect-stmts.cc   | 23 +-
 gcc/tree-vectorizer.h|  1 +
 6 files changed, 94 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c 
b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
new file mode 100644
index 000..50b7066778e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-additional-options "-mavx2" { target avx2 } } */
+
+#include "tree-vect.h"
+
+void __attribute__((noipa))
+foo (unsigned * __restrict x, int * __restrict flag)
+{
+  for (int i = 0; i < 32; ++i)
+{
+  if (flag[2*i+0])
+x[2*i+0] = x[2*i+0] + 3;
+  if (flag[2*i+1])
+x[2*i+1] = x[2*i+1] + 177;
+}
+}
+
+unsigned x[16];
+int flag[32] = { 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+unsigned res[16] = { 3, 177, 0, 0, 0, 177, 3, 0, 3, 177, 0, 0, 0, 177, 3, 0 };
+
+int
+main ()
+{
+  check_vect ();
+
+  foo (x, flag);
+
+  if (__builtin_memcmp (x, res, sizeof (x)) != 0)
+abort ();
+  for (int i = 0; i < 32; ++i)
+if (flag[i] != 0 && flag[i] != 1)
+  abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { target { 
vect_masked_store && vect_masked_load } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index d4623ee6b45..d353cc0aaf0 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8400,7 +8400,8 @@ proc check_effective_target_vect_masked_load { } {
 # Return 1 if the target supports vector masked stores.
 
 proc check_effective_target_vect_masked_store { } {
-return [expr { [check_effective_target_aarch64_sve]
+return [expr { [check_avx_available]
+  || [check_effective_target_aarch64_sve]
   || [istarget amdgcn*-*-*] }]
 }
 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 3e9a284666c..a2caf6cb1c7 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3048,8 +3048,7 @@ can_group_stmts_p (stmt_vec_info stmt1_info, 
stmt_vec_info stmt2_info,
 like those created by build_mask_conversion.  */
   tree mask1 = gimple_call_arg (call1, 2);
   tree mask2 = gimple_call_arg (call2, 2);
-  if (!operand_equal_p (mask1, mask2, 0)
-  && (ifn == IFN_MASK_STORE || !allow_slp_p))
+  if (!operand_equal_p (mask1, mask2, 0) && !allow_slp_p)
{
  mask1 = strip_conversion (mask1);
  if (!mask1)
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b5f9333fc22..cc799b6ebcd 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -503,6 +503,7 @@ static const int cond_expr_maps[3][5] = {
 static const int arg1_map[] = { 1, 1 };
 static const int arg2_map[] = { 1, 2 };
 static const int arg1_arg4_map[] = { 2, 1, 4 };
+static const int arg3_arg2_map[] = { 2, 3, 2 };
 static const int op1_op0_map[] = { 2, 1, 0 };
 
 /* For most SLP statements, there is a one-to-one mapping between
@@ -543,6 +544,9 @@ vect_get_operand_map (const gimple *stmt, unsigned char 
swap = 0)
  case IFN_MASK_GATHER_LOAD:
return arg1_arg4_map;
 
+ case IFN_MASK_STORE:
+   return arg3_arg2_map;
+
  default:
break;
  }
@@ -550,6 +554,20 @@ vect_get_operand_map (const gimple *stmt, unsigned char 
swap =

[PATCH] RISC-V: Add initial pipeline description for an out-of-order core.

2023-08-23 Thread Robin Dapp via Gcc-patches
Hi,

this adds a pipeline description for a generic out-of-order core.
Latency and units are not based on any real processor but are more or less
educated guesses as to what such a processor could look like.
For lack of a better name, I called the -mtune parameter "generic-ooo".

In order to account for latency scaling by LMUL != 1, sched_adjust_cost
is implemented.  It will scale an instruction's latency by its LMUL
so an LMUL == 8 instruction will take 8 times the number of cycles
the same instruction with LMUL == 1 would take.
As this potentially causes very high latencies which, in turn, might
lead to scheduling anomalies and a higher number of vsetvls emitted,
this feature is only enabled when specifying -madjust-lmul-cost.
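
The riscv.cc hunk implementing this is not reproduced in full below, so as
a rough, self-contained illustration of the scaling rule just described (a
sketch under the stated assumptions, not the actual TARGET_SCHED_ADJUST_COST
hook body):

#include <cstdio>
#include <initializer_list>

// Sketch: a vector instruction's latency is multiplied by its LMUL, so an
// LMUL == 8 instruction takes 8x the cycles of its LMUL == 1 counterpart.
// How LMUL is extracted from an insn is elided here.
static int scaled_latency (int base_latency, int insn_lmul,
                           bool adjust_lmul_cost)
{
  if (!adjust_lmul_cost)        // mirrors the -madjust-lmul-cost gate
    return base_latency;
  return base_latency * insn_lmul;
}

int main ()
{
  for (int lmul : { 1, 2, 4, 8 })
    std::printf ("LMUL=%d -> latency %d\n", lmul,
                 scaled_latency (/*base=*/4, lmul, /*enabled=*/true));
}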

Additionally, in order to easily recognize pre-RA vsetvls this patch
introduces an insn type vsetvl_pre which is used in sched_adjust_cost.

As mentioned, the latency numbers are guesswork at best.  I assumed
6-wide issue as most public announcements point into that direction
and obviously everything else is similarly coarse.  Feel free to
correct in case I unnecessarily pessimized or underestimated something.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add generic_ooo.
* config/riscv/riscv.cc (riscv_sched_adjust_cost): Implement
scheduler hook.
(TARGET_SCHED_ADJUST_COST): Define.
* config/riscv/riscv.md (no,yes"): Include generic-ooo.md
* config/riscv/riscv.opt: Add -madjust-lmul-cost.
* config/riscv/generic-ooo.md: New file.
* config/riscv/vector.md: Add vsetvl_pre.
---
 gcc/config/riscv/generic-ooo.md  | 284 +++
 gcc/config/riscv/riscv-cores.def |   1 +
 gcc/config/riscv/riscv-opts.h|   3 +-
 gcc/config/riscv/riscv.cc|  87 ++
 gcc/config/riscv/riscv.md|   5 +-
 gcc/config/riscv/riscv.opt   |   3 +
 gcc/config/riscv/vector.md   |   4 +-
 7 files changed, 383 insertions(+), 4 deletions(-)
 create mode 100644 gcc/config/riscv/generic-ooo.md

diff --git a/gcc/config/riscv/generic-ooo.md b/gcc/config/riscv/generic-ooo.md
new file mode 100644
index 000..78b9e48f935
--- /dev/null
+++ b/gcc/config/riscv/generic-ooo.md
@@ -0,0 +1,284 @@
+;; RISC-V generic out-of-order core scheduling model.
+;; Copyright (C) 2017-2023 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_automaton "generic_ooo")
+
+;; Regarding functional units we assume a three-way split:
+;; - Integer ALU (IXU) - 4 symmetric units.
+;; - Floating-point (FXU) - 2 symmetric units.
+;; - Vector Unit (VXU) - 1 unit.
+
+;; We assume 6-wide issue:
+;; - 5-wide generic/integer issue.
+;; - 1-wide vector issue.
+
+;; For now, the only subunits are for non-pipelined integer division and
+;; vector div/mult/sqrt.
+;; No extra units for e.g. vector permutes, masking, everything is assumed to
+;; be on the same pipelined execution unit.
+
+;; Latency:
+;; - Regular integer operations take 1 cycle.
+;; - Multiplication/Division take multiple cycles.
+;; - Float operations take 4-6 cycles.
+;; - Regular vector operations take 2-6 cycles.
+;;   (This assumes LMUL = 1, latency for LMUL = 2, 4, 8 is scaled accordingly
+;;by riscv_sched_adjust_cost when -madjust-lmul-cost is given)
+;; - Load/Store:
+;;   - To/From IXU: 4 cycles.
+;;   - To/From FXU: 6 cycles.
+;;   - To/From VXU: 6 cycles.
+
+;; Integer/float issue queues.
+(define_cpu_unit "issue0,issue1,issue2,issue3,issue4" "generic_ooo")
+
+;; Separate issue queue for vector instructions.
+(define_cpu_unit "generic_ooo_vxu_issue" "generic_ooo")
+
+;; Integer/float execution units.
+(define_cpu_unit "ixu0,ixu1,ixu2,ixu3" "generic_ooo")
+(define_cpu_unit "fxu0,fxu1" "generic_ooo")
+
+;; Integer subunit for division.
+(define_cpu_unit "generic_ooo_div" "generic_ooo")
+
+;; Vector execution unit.
+(define_cpu_unit "generic_ooo_vxu_alu" "generic_ooo")
+
+;; Vector subunit that does mult/div/sqrt.
+(define_cpu_unit "generic_ooo_vxu_multicycle" "generic_ooo")
+
+;; Shortcuts
+(define_reservation "generic_ooo_issue" "issue0|issue1|issue2|issue3|issue4")
+(define_reservation "generic_ooo_ixu_alu" "ixu0|ixu1|ixu2|ixu3")
+(define_reservation "generic_ooo_fxu" "fxu0|fxu1")
+
+
+

Re: [PATCH] RISC-V: add a more appropriate type attribute

2023-08-23 Thread Jeff Law via Gcc-patches




On 8/23/23 06:28, Zhangjin Liao wrote:

Due to the more accurate type attribute added to the clz, ctz, and pcnt
operations
in https://github.com/gcc-mirror/gcc/commit/07e2576d6f3, the same type
attribute should be used here.

gcc/ChangeLog:

 * config/riscv/bitmanip.md:add a more appropriate type attribute

Thanks.  I improved the ChangeLog slightly and pushed this to the trunk.

Jeff


[PATCH v2 0/6] libgomp: OpenMP pinned memory omp_alloc

2023-08-23 Thread Andrew Stubbs
This patch series is a rework of part of the series I posted about a year
ago:

https://patchwork.sourceware.org/project/gcc/list/?series=10748&state=%2A&archive=both

The series depends on the low-latency patch series I posted a few weeks
ago:

https://patchwork.sourceware.org/project/gcc/list/?series=23045&state=%2A&archive=both

I will post the Unified Shared Memory and allocator directive patches at
a later time.

This version of the patches implements the same basic features, rebased
on the current sourcebase, plus a Cuda-specific allocator for improved
performance with NVPTX offloading, and a custom allocator for better
handling of small allocations. The whole series has been bug-fixed and
generally improved (mostly by Thomas :) ).

An older, less compact, version of these patches is already applied to
the devel/omp/gcc-13 (OG13) branch.

OK for mainline?

Andrew

Andrew Stubbs (5):
  libgomp: basic pinned memory on Linux
  libgomp, openmp: Add ompx_pinned_mem_alloc
  openmp: Add -foffload-memory
  openmp: -foffload-memory=pinned
  libgomp: fine-grained pinned memory allocator

Thomas Schwinge (1):
  libgomp, nvptx: Cuda pinned memory

 gcc/common.opt|  16 +
 gcc/coretypes.h   |   7 +
 gcc/doc/invoke.texi   |  16 +-
 gcc/omp-builtins.def  |   3 +
 gcc/omp-low.cc|  66 
 libgomp/Makefile.am   |   2 +-
 libgomp/Makefile.in   |   5 +-
 libgomp/allocator.c   |  96 --
 libgomp/config/gcn/allocator.c|  17 +-
 libgomp/config/linux/allocator.c  | 234 +
 libgomp/config/nvptx/allocator.c  |  17 +-
 libgomp/libgomp-plugin.h  |   2 +
 libgomp/libgomp.h |  14 +
 libgomp/libgomp.map   |   1 +
 libgomp/libgomp_g.h   |   1 +
 libgomp/omp.h.in  |   1 +
 libgomp/omp_lib.f90.in|   2 +
 libgomp/plugin/plugin-nvptx.c |  34 ++
 libgomp/target.c  | 136 
 .../libgomp.c-c++-common/alloc-pinned-1.c |  28 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c  | 134 
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c  | 139 
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c  | 174 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c  | 176 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 128 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 127 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  |  63 
 libgomp/testsuite/libgomp.c/alloc-pinned-8.c  | 127 +++
 .../libgomp.fortran/alloc-pinned-1.f90|  16 +
 libgomp/usmpin-allocator.c| 319 ++
 30 files changed, 2051 insertions(+), 50 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-8.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90
 create mode 100644 libgomp/usmpin-allocator.c

-- 
2.41.0



[PATCH v2 2/6] libgomp, openmp: Add ompx_pinned_mem_alloc

2023-08-23 Thread Andrew Stubbs

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.
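
To illustrate that equivalence, here is a small self-contained example
(hypothetical, not one of the patch's testcases) that builds the same
behaviour out of the standard OpenMP 5.x allocator-trait API; with this
patch applied, omp_alloc (size, ompx_pinned_mem_alloc) would be the
shortcut spelling:

#include <omp.h>
#include <stdio.h>

int main (void)
{
  /* A custom allocator with the pinned trait and the null fallback,
     which is what ompx_pinned_mem_alloc is described as equivalent to.  */
  omp_alloctrait_t traits[] = {
    { omp_atk_pinned,   omp_atv_true },
    { omp_atk_fallback, omp_atv_null_fb },  /* return NULL on failure */
  };
  omp_allocator_handle_t pinned
    = omp_init_allocator (omp_default_mem_space, 2, traits);

  void *p = omp_alloc (1024, pinned);
  printf ("pinned allocation: %p\n", p);
  omp_free (p, pinned);
  omp_destroy_allocator (pinned);
  return 0;
}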

libgomp/ChangeLog:

* allocator.c (omp_max_predefined_alloc): Update.
(predefined_alloc_mapping): Add ompx_pinned_mem_alloc entry.
(omp_aligned_alloc): Support ompx_pinned_mem_alloc.
(omp_free): Likewise.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* omp.h.in (omp_allocator_handle_t): Add ompx_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge 
---
 libgomp/allocator.c   |  58 ++
 libgomp/omp.h.in  |   1 +
 libgomp/omp_lib.f90.in|   2 +
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 103 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 101 +
 .../libgomp.fortran/alloc-pinned-1.f90|  16 +++
 6 files changed, 262 insertions(+), 19 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 6007e64f580..39ba1d07bc7 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -35,7 +35,7 @@
 #include 
 #endif
 
-#define omp_max_predefined_alloc omp_thread_mem_alloc
+#define omp_max_predefined_alloc ompx_pinned_mem_alloc
 
 /* These macros may be overridden in config//allocator.c.
The following definitions (ab)use comma operators to avoid unused
@@ -76,6 +76,7 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = {
   omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
   omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
   omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+  omp_default_mem_space,   /* ompx_pinned_mem_alloc. */
 };
 
 enum gomp_numa_memkind_kind
@@ -612,8 +613,10 @@ retry:
 	  memspace = (allocator_data
 		  ? allocator_data->memspace
 		  : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_ALLOC (memspace, new_size,
-allocator_data && allocator_data->pinned);
+	  int pinned = (allocator_data
+			? allocator_data->pinned
+			: allocator == ompx_pinned_mem_alloc);
+	  ptr = MEMSPACE_ALLOC (memspace, new_size, pinned);
 	}
   if (ptr == NULL)
 	goto fail;
@@ -634,7 +637,8 @@ retry:
 fail:;
   int fallback = (allocator_data
 		  ? allocator_data->fallback
-		  : allocator == omp_default_mem_alloc
+		  : (allocator == omp_default_mem_alloc
+		 || allocator == ompx_pinned_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -762,6 +766,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 #endif
 
   memspace = predefined_alloc_mapping[data->allocator];
+  pinned = (data->allocator == ompx_pinned_mem_alloc);
 }
 
   MEMSPACE_FREE (memspace, data->ptr, data->size, pinned);
@@ -935,8 +940,10 @@ retry:
 	  memspace = (allocator_data
 		  ? allocator_data->memspace
 		  : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_CALLOC (memspace, new_size,
- allocator_data && allocator_data->pinned);
+	  int pinned = (allocator_data
+			? allocator_data->pinned
+			: allocator == ompx_pinned_mem_alloc);
+	  ptr = MEMSPACE_CALLOC (memspace, new_size, pinned);
 	}
   if (ptr == NULL)
 	goto fail;
@@ -957,7 +964,8 @@ retry:
 fail:;
   int fallback = (allocator_data
 		  ? allocator_data->fallback
-		  : allocator == omp_default_mem_alloc
+		  : (allocator == omp_default_mem_alloc
+		 || allocator == ompx_pinned_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -1180,11 +1188,14 @@ retry:
   else
 #endif
   if (prev_size)
-	new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
-data->size, new_size,
-(free_allocator_data
- && free_allocator_data->pinned),
-allocator_data->pinned);
+	{
+	  int was_pinned = (free_allocator_data
+			? free_allocator_data->pinned
+			: free_allocator == ompx_pinned_mem_alloc);
+	  new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
+  data->size, new_size, was_pinned,
+  allocator_data->pinned);
+	}
   else
 	new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
   allocator_data->pinned);
@@ -1240,10 +1251,14 @@ retry:
 	  memspace = (allocator_data
 		  ? allocator_data->memspace
 		  : predefined_alloc_mapping[allocator]);
+	  int was

[PATCH v2 1/6] libgomp: basic pinned memory on Linux

2023-08-23 Thread Andrew Stubbs

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.

This implementation will work OK for page-scale allocations, and finer-grained
allocations will be implemented in a future patch.
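
For readers unfamiliar with the mechanism, here is a standalone sketch of
the mmap-plus-mlock pattern (simplified; the real implementation is in
config/linux/allocator.c in the diff below):

#include <sys/mman.h>
#include <stddef.h>
#include <stdio.h>

/* Allocate with mmap rather than malloc so the pages belong to this
   allocation alone; munlock on free then cannot unpin memory of another
   allocation sharing the same page.  */
static void *
pinned_alloc (size_t size)
{
  void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (addr == MAP_FAILED)
    return NULL;
  if (mlock (addr, size))       /* fails if RLIMIT_MEMLOCK is too low */
    {
      munmap (addr, size);
      return NULL;
    }
  return addr;
}

static void
pinned_free (void *addr, size_t size)
{
  munlock (addr, size);
  munmap (addr, size);
}

int main (void)
{
  void *p = pinned_alloc (1 << 16);
  printf ("pinned block: %p\n", p);
  if (p)
    pinned_free (p, 1 << 16);
  return 0;
}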

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(omp_init_allocator): Don't disallow the pinned trait.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
* config/gcn/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.

Co-Authored-By: Thomas Schwinge 
---
 libgomp/allocator.c  |  66 +
 libgomp/config/gcn/allocator.c   |  17 +--
 libgomp/config/linux/allocator.c |  99 +
 libgomp/config/nvptx/allocator.c |  17 +--
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c | 109 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c | 114 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c | 141 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c | 143 +++
 8 files changed, 665 insertions(+), 41 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-4.c

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index a132abc45b5..6007e64f580 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -41,20 +41,21 @@
The following definitions (ab)use comma operators to avoid unused
variable errors.  */
 #ifndef MEMSPACE_ALLOC
-#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
-  malloc (((void)(MEMSPACE), (SIZE)))
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : malloc (((void)(MEMSPACE), (SIZE
 #endif
 #ifndef MEMSPACE_CALLOC
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
-  calloc (1, (((void)(MEMSPACE), (SIZE
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : calloc (1, (((void)(MEMSPACE), (SIZE)
 #endif
 #ifndef MEMSPACE_REALLOC
-#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
-  realloc (ADDR, (((void)(MEMSPACE), (void)(OLDSIZE), (SIZE
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
+   ((PIN) || (OLDPIN) ? NULL \
+   : realloc (ADDR, (((void)(MEMSPACE), (void)(OLDSIZE), (SIZE)
 #endif
 #ifndef MEMSPACE_FREE
-#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
-  free (((void)(MEMSPACE), (void)(SIZE), (ADDR)))
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
+  (PIN ? NULL : free (((void)(MEMSPACE), (void)(SIZE), (ADDR
 #endif
 #ifndef MEMSPACE_VALIDATE
 #define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \
@@ -434,10 +435,6 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
 }
 #endif
 
-  /* No support for this so far.  */
-  if (data.pinned)
-return omp_null_allocator;
-
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
   *ret = data;
 #ifndef HAVE_SYNC_BUILTINS
@@ -577,7 +574,8 @@ retry:
 	}
   else
 #endif
-	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
+	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
+			  allocator_data->pinned);
   if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -614,7 +612,8 @@ retry:
 	  memspace = (allocator_data
 		  ? allocator_data->memspace
 		  : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_ALLOC (memspace, new_size);
+	  ptr = MEMSPACE_ALLOC (memspace, new_size,
+allocator_data && allocator_data->pinned);
 	}
   if (ptr == NULL)
 	goto fail;
@@ -646,7 +645,8 @@ fail:;
 	  || memkind
 #endif
 	  || allocator_data == NULL
-	  || allocator_data->pool_size < ~(uintptr_t) 0)
+	  || allocator_data->pool_size < ~(uintptr_t) 0
+	  || allocator_data->pinned)
 	{
 	  allocator = omp_default_mem_alloc;
 	  goto retry;
@@ -697,6 +697,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 {
   struct omp_mem_header *data;
   omp_memspace_handle_t memspace = omp_default_mem_space;
+  int pinned = false;
 
   if (ptr == NULL)
 return;
@@ -738,6 +739,7 @@ omp_free (void *ptr, omp

[PATCH v2 5/6] libgomp, nvptx: Cuda pinned memory

2023-08-23 Thread Andrew Stubbs

Use Cuda to pin memory, instead of Linux mlock, when available.

There are two advantages: firstly, this gives a significant speed boost for
NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit
setting.

The design adds a device-independent plugin API for allocating pinned memory,
and then implements it for NVPTX.  At present, the other supported devices do
not have equivalent capabilities (or requirements).
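
As a rough sketch of what driver-API pinning looks like (illustrative only;
it assumes the plugin uses the cuMemHostAlloc family, and error handling is
reduced to a minimum):

#include <cuda.h>
#include <stdio.h>

int main (void)
{
  CUdevice dev;
  CUcontext ctx;
  if (cuInit (0) != CUDA_SUCCESS
      || cuDeviceGet (&dev, 0) != CUDA_SUCCESS
      || cuCtxCreate (&ctx, 0, dev) != CUDA_SUCCESS)
    return 1;

  /* cuMemHostAlloc returns host memory that the driver has page-locked;
     this speeds up host<->device copies and side-steps the OS
     RLIMIT_MEMLOCK limit that mlock is subject to.  */
  void *host_ptr = NULL;
  if (cuMemHostAlloc (&host_ptr, 1 << 20, 0) == CUDA_SUCCESS)
    {
      printf ("pinned host memory at %p\n", host_ptr);
      cuMemFreeHost (host_ptr);
    }
  cuCtxDestroy (ctx);
  return 0;
}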

libgomp/ChangeLog:

* config/linux/allocator.c: Include assert.h.
(using_device_for_page_locked): New variable.
(linux_memspace_alloc): Add init0 parameter. Support device pinning.
(linux_memspace_calloc): Set init0 to true.
(linux_memspace_free): Support device pinning.
(linux_memspace_realloc): Support device pinning.
(MEMSPACE_ALLOC): Set init0 to false.
* libgomp-plugin.h
(GOMP_OFFLOAD_page_locked_host_alloc): New prototype.
(GOMP_OFFLOAD_page_locked_host_free): Likewise.
* libgomp.h (gomp_page_locked_host_alloc): Likewise.
(gomp_page_locked_host_free): Likewise.
(struct gomp_device_descr): Add page_locked_host_alloc_func and
page_locked_host_free_func.
* libgomp_g.h (GOMP_enable_pinned_mode): New prototype.
* plugin/plugin-nvptx.c
(GOMP_OFFLOAD_page_locked_host_alloc): New function.
(GOMP_OFFLOAD_page_locked_host_free): Likewise.
* target.c (device_for_page_locked): New variable.
(get_device_for_page_locked): New function.
(gomp_page_locked_host_alloc): Likewise.
(gomp_page_locked_host_free): Likewise.
(gomp_load_plugin_for_device): Add page_locked_host_alloc and
page_locked_host_free.
* testsuite/libgomp.c/alloc-pinned-1.c: Change expectations for NVPTX
devices.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-6.c: Likewise.

Co-Authored-By: Thomas Schwinge 
---
 libgomp/config/linux/allocator.c | 134 ++
 libgomp/libgomp-plugin.h |   2 +
 libgomp/libgomp.h|   4 +
 libgomp/libgomp_g.h  |   1 +
 libgomp/plugin/plugin-nvptx.c|  34 +
 libgomp/target.c | 136 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c |  25 
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c |  25 
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c |  43 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c |  43 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c |  25 
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c |  34 -
 12 files changed, 464 insertions(+), 42 deletions(-)

diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 8205d67c7a2..f29ed1091a3 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -36,6 +36,11 @@
 
 /* Implement malloc routines that can handle pinned memory on Linux.

+   Given that pinned memory is typically used to help host <-> device memory
+   transfers, we attempt to allocate such memory using a device (really:
+   libgomp plugin), but fall back to mmap plus mlock if no suitable device is
+   available.
+
It's possible to use mlock on any heap memory, but using munlock is
problematic if there are multiple pinned allocations on the same page.
Tracking all that manually would be possible, but adds overhead. This may
@@ -49,6 +54,7 @@
 #define _GNU_SOURCE
 #include 
 #include 
+#include 
 #include "libgomp.h"
 
 static bool always_pinned_mode = false;
@@ -65,42 +71,87 @@ GOMP_enable_pinned_mode ()
 always_pinned_mode = true;
 }
 
+static int using_device_for_page_locked
+  = /* uninitialized */ -1;
+
 static void *
-linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
+linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin,
+		  bool init0)
 {
-  (void)memspace;
+  gomp_debug (0, "%s: memspace=%llu, size=%llu, pin=%d, init0=%d\n",
+	  __FUNCTION__, (unsigned long long) memspace,
+	  (unsigned long long) size, pin, init0);
+
+  void *addr;
 
   /* Explicit pinning may not be required.  */
   pin = pin && !always_pinned_mode;
 
   if (pin)
 {
-  void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
-			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-  if (addr == MAP_FAILED)
-	return NULL;
-
-  if (mlock (addr, size))
+  int using_device
+	= __atomic_load_n (&using_device_for_page_locked,
+			   MEMMODEL_RELAXED);
+  gomp_debug (0, "  using_device=%d\n",
+		  using_device);
+  if (using_device != 0)
 	{
-	  gomp_debug (0, "libgomp: failed to pin memory (ulimit too low?)\n");
-	  munmap (addr, size);
-	  return NULL;
+	  using_d

[PATCH v2 3/6] openmp: Add -foffload-memory

2023-08-23 Thread Andrew Stubbs

Add a new option.  It's inactive until I add some follow-up patches.

gcc/ChangeLog:

* common.opt: Add -foffload-memory and its enum values.
* coretypes.h (enum offload_memory): New.
* doc/invoke.texi: Document -foffload-memory.
---
 gcc/common.opt  | 16 
 gcc/coretypes.h |  7 +++
 gcc/doc/invoke.texi | 16 +++-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 0888c15b88f..541213d285f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2243,6 +2243,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-memory=
+Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) Init(OFFLOAD_MEMORY_NONE)
+-foffload-memory=[none|unified|pinned]	Use an offload memory optimization.
+
+Enum
+Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload memory option %qs)
+
+EnumValue
+Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE)
+
+EnumValue
+Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED)
+
+EnumValue
+Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED)
+
 fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 3e9a2f19e27..a7feed51b0b 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -207,6 +207,13 @@ enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of memory optimization for an offload device.  */
+enum offload_memory {
+  OFFLOAD_MEMORY_NONE,
+  OFFLOAD_MEMORY_UNIFIED,
+  OFFLOAD_MEMORY_PINNED
+};
+
 /* Types of profile update methods.  */
 enum profile_update {
   PROFILE_UPDATE_SINGLE,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ef3f4098986..be780dc41d8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,7 +202,7 @@ in the following sections.
 -fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch
 -ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted
 -flax-vector-conversions  -fms-extensions
--foffload=@var{arg}  -foffload-options=@var{arg}
+-foffload=@var{arg}  -foffload-options=@var{arg} -foffload-memory=@var{arg} 
 -fopenacc  -fopenacc-dim=@var{geom}
 -fopenmp  -fopenmp-simd  -fopenmp-target-simd-clone@r{[}=@var{device-type}@r{]}
 -fpermitted-flt-eval-methods=@var{standard}
@@ -2740,6 +2740,20 @@ Typical command lines are
 -foffload-options=amdgcn-amdhsa=-march=gfx906
 @end smallexample
 
+@opindex foffload-memory
+@cindex OpenMP offloading memory modes
+@item -foffload-memory=none
+@itemx -foffload-memory=unified
+@itemx -foffload-memory=pinned
+Enable a memory optimization mode to use with OpenMP.  The default behavior,
+@option{-foffload-memory=none}, is to do nothing special (unless enabled via
+a requires directive in the code).  @option{-foffload-memory=unified} is
+equivalent to @code{#pragma omp requires unified_shared_memory}.
+@option{-foffload-memory=pinned} forces all host memory to be pinned (this
+mode may require the user to increase the ulimit setting for locked memory).
+All translation units must select the same setting to avoid undefined
+behavior.
+
 @opindex fopenacc
 @cindex OpenACC accelerator programming
 @item -fopenacc


[PATCH v2 4/6] openmp: -foffload-memory=pinned

2023-08-23 Thread Andrew Stubbs

Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up.  The option is
intended to provide a performance boost to certain offload programs without
modifying the code.

This feature only works on Linux, at present, and simply calls mlockall to
enable always-on memory pinning.  It requires that the ulimit feature is
set high enough to accommodate all the program's memory usage.

In this mode the ompx_pinned_mem_alloc feature is disabled, as it is not
needed and may conflict.
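
On the libgomp side the mechanism boils down to a one-time mlockall call;
below is a minimal standalone sketch of the idea (the actual
GOMP_enable_pinned_mode implementation in the diff may differ in details):

#include <sys/mman.h>
#include <stdio.h>

static int always_pinned_mode = 0;

/* Lock all current and future pages of the process; after this, individual
   allocations no longer need an explicit mlock.  */
static void
enable_pinned_mode (void)
{
  if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
    perror ("mlockall");        /* e.g. RLIMIT_MEMLOCK too low */
  else
    always_pinned_mode = 1;
}

int main (void)
{
  enable_pinned_mode ();
  printf ("always_pinned_mode=%d\n", always_pinned_mode);
  return 0;
}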

gcc/ChangeLog:

* omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New.
* omp-low.cc (omp_enable_pinned_mode): New function.
(execute_lower_omp): Call omp_enable_pinned_mode.

libgomp/ChangeLog:

* config/linux/allocator.c (always_pinned_mode): New variable.
(GOMP_enable_pinned_mode): New function.
(linux_memspace_alloc): Disable pinning when always_pinned_mode set.
(linux_memspace_calloc): Likewise.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise.
* libgomp.map: Add GOMP_enable_pinned_mode.
* testsuite/libgomp.c/alloc-pinned-7.c: New test.
* testsuite/libgomp.c-c++-common/alloc-pinned-1.c: New test.
---
 gcc/omp-builtins.def  |  3 +
 gcc/omp-low.cc| 66 +++
 libgomp/config/linux/allocator.c  | 26 
 libgomp/libgomp.map   |  1 +
 .../libgomp.c-c++-common/alloc-pinned-1.c | 28 
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  | 63 ++
 6 files changed, 187 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c

diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index e0f03263db0..1f7280f6b36 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -470,3 +470,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning",
 		  BT_FN_VOID_CONST_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ERROR, "GOMP_error",
 		  BT_FN_VOID_CONST_PTR_SIZE, ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ENABLE_PINNED_MODE,
+		  "GOMP_enable_pinned_mode",
+		  BT_FN_VOID, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index b882df048ef..64bc0662332 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14683,6 +14683,68 @@ lower_omp (gimple_seq *body, omp_context *ctx)
   input_location = saved_location;
 }
 
+/* Emit a constructor function to enable -foffload-memory=pinned
+   at runtime.  Libgomp handles the OS mode setting, but we need to trigger
+   it by calling GOMP_enable_pinned mode before the program proper runs.  */
+
+static void
+omp_enable_pinned_mode ()
+{
+  static bool visited = false;
+  if (visited)
+return;
+  visited = true;
+
+  /* Create a new function like this:
+ 
+   static void __attribute__((constructor))
+   __set_pinned_mode ()
+   {
+ GOMP_enable_pinned_mode ();
+   }
+  */
+
+  tree name = get_identifier ("__set_pinned_mode");
+  tree voidfntype = build_function_type_list (void_type_node, NULL_TREE);
+  tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, name, voidfntype);
+
+  TREE_STATIC (decl) = 1;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  TREE_PUBLIC (decl) = 0;
+  DECL_UNINLINABLE (decl) = 1;
+  DECL_EXTERNAL (decl) = 0;
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
+  DECL_STATIC_CONSTRUCTOR (decl) = 1;
+  DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("constructor"),
+  NULL_TREE, NULL_TREE);
+
+  tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
+		   void_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_CONTEXT (t) = decl;
+  DECL_RESULT (decl) = t;
+
+  push_struct_function (decl);
+  init_tree_ssa (cfun);
+
+  tree calldecl = builtin_decl_explicit (BUILT_IN_GOMP_ENABLE_PINNED_MODE);
+  gcall *call = gimple_build_call (calldecl, 0);
+
+  gimple_seq seq = NULL;
+  gimple_seq_add_stmt (&seq, call);
+  gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL));
+
+  cfun->function_end_locus = UNKNOWN_LOCATION;
+  cfun->curr_properties |= PROP_gimple_any;
+  pop_cfun ();
+  cgraph_node::add_new_function (decl, true);
+}
+
 /* Main entry point.  */
 
 static unsigned int
@@ -14739,6 +14801,10 @@ execute_lower_omp (void)
   for (auto task_stmt : task_cpyfns)
 finalize_task_copyfn (task_stmt);
   task_cpyfns.release ();
+
+  if (flag_offload_memory == OFFLOAD_MEMORY_PINNED)
+omp_enable_pinned_mode ();
+
   return 0;
 }
 
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index edcde9f1e81..8205d67c7a2 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/li

[PATCH v2 6/6] libgomp: fine-grained pinned memory allocator

2023-08-23 Thread Andrew Stubbs

This patch introduces a new custom memory allocator for use with pinned
memory (in the case where the Cuda allocator isn't available).  In future,
this allocator will also be used for Unified Shared Memory.  Both memories
are incompatible with the system malloc because allocated memory cannot
share a page with memory allocated for other purposes.

This means that small allocations will no longer consume an entire page of
pinned memory.  Unfortunately, it also means that pinned memory pages will
never be unmapped (although they may be reused).

The implementation is not perfect; there are various corner cases (especially
related to extending onto new pages) where allocations and reallocations may
be sub-optimal, but it should still be a step forward in support for small
allocations.
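
To picture why this helps, here is a toy sketch of a first-fit free-list
allocator carved out of a fixed pinned region (purely illustrative; the
real usmpin-allocator.c handles page extension, metadata placement and
more, and none of the names below are taken from it):

#include <stddef.h>
#include <stdio.h>

struct chunk { size_t size; struct chunk *next; };  /* per-chunk header */

static struct chunk *free_list;

static void
region_init (void *base, size_t size)
{
  free_list = (struct chunk *) base;
  free_list->size = size;
  free_list->next = NULL;
}

/* First fit; splits large chunks, so small allocations no longer cost a
   whole pinned page each.  Alignment handling is crude on purpose.  */
static void *
toy_alloc (size_t size)
{
  size = (size + 15) & ~(size_t) 15;
  size += sizeof (struct chunk);
  for (struct chunk **p = &free_list; *p; p = &(*p)->next)
    if ((*p)->size >= size)
      {
        struct chunk *c = *p;
        if (c->size >= size + sizeof (struct chunk) + 16)
          {
            struct chunk *rest = (struct chunk *) ((char *) c + size);
            rest->size = c->size - size;    /* carve off the tail */
            rest->next = c->next;
            c->size = size;
            *p = rest;
          }
        else
          *p = c->next;                     /* hand out the whole chunk */
        return c + 1;
      }
  return NULL;  /* the real allocator would map (and pin) another page */
}

static void
toy_free (void *ptr)
{
  struct chunk *c = (struct chunk *) ptr - 1;
  c->next = free_list;   /* pages are reused, never unmapped; this toy
                            does no coalescing */
  free_list = c;
}

int main (void)
{
  static char region[4096] __attribute__((aligned (16)));
  region_init (region, sizeof region);
  void *a = toy_alloc (32), *b = toy_alloc (64);
  printf ("%p %p\n", a, b);
  toy_free (b);
  toy_free (a);
  return 0;
}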

I have considered using libmemkind's "fixed" memory but rejected it for three
reasons: 1) libmemkind may not always be present at runtime, 2) there's no
currently documented means to extend a "fixed" kind one page at a time
(although the code appears to have an undocumented function that may do the
job, and/or extending libmemkind to support the MAP_LOCKED mmap flag with its
regular kinds would be straight-forward), 3) USM benefits from having the
metadata located in different memory and using an external implementation makes
it hard to guarantee this.

libgomp/ChangeLog:

* Makefile.am (libgomp_la_SOURCES): Add usmpin-allocator.c.
* Makefile.in: Regenerate.
* config/linux/allocator.c: Include unistd.h.
(pin_ctx): New variable.
(ctxlock): New variable.
(linux_init_pin_ctx): New function.
(linux_memspace_alloc): Use usmpin-allocator for pinned memory.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise.
* libgomp.h (usmpin_init_context): New prototype.
(usmpin_register_memory): New prototype.
(usmpin_alloc): New prototype.
(usmpin_free): New prototype.
(usmpin_realloc): New prototype.
* testsuite/libgomp.c/alloc-pinned-1.c: Adjust for new behaviour.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-8.c: New test.
* usmpin-allocator.c: New file.
---
 libgomp/Makefile.am  |   2 +-
 libgomp/Makefile.in  |   5 +-
 libgomp/config/linux/allocator.c |  91 --
 libgomp/libgomp.h|  10 +
 libgomp/testsuite/libgomp.c/alloc-pinned-8.c | 127 
 libgomp/usmpin-allocator.c   | 319 +++
 6 files changed, 521 insertions(+), 33 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-8.c
 create mode 100644 libgomp/usmpin-allocator.c

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 428f7a9dab5..2402739e07d 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -67,7 +67,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
 	target.c splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
 	oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
 	priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-	oacc-target.c
+	oacc-target.c usmpin-allocator.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 3ef05e6a3cb..8b3aa3c8499 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -219,7 +219,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo critical.lo \
 	oacc-parallel.lo oacc-host.lo oacc-init.lo oacc-mem.lo \
 	oacc-async.lo oacc-plugin.lo oacc-cuda.lo priority_queue.lo \
 	affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
-	oacc-target.lo $(am__objects_1)
+	oacc-target.lo usmpin-allocator.lo $(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -549,7 +549,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
 	oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
 	oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
 	affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-	oacc-target.c $(am__append_3)
+	oacc-target.c usmpin-allocator.c $(am__append_3)
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
@@ -782,6 +782,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/team.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/teams.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/time.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/usmpin-allocator.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/work.Plo@am__quote@
 
 .c.o:
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index f29ed1091a3..df18ad03d68 100644
--- a/l

[PING][PATCH 1/2] Ada: Synchronized private extensions are always limited

2023-08-23 Thread Richard Wai
 

From: Richard Wai  
Sent: Thursday, August 10, 2023 12:55 AM
To: 'gcc-patches@gcc.gnu.org' 
Cc: 'Eric Botcazou' ; 'Arnaud Charlet'
; 'Stephen Baird' 
Subject: [PATCH 1/2] Ada: Synchronized private extensions are always limited

 

GNAT currently considers a synchronized private extension that derives from
an interface to be limited only when said interface is a concurrent
interface. However it is of course legal for a synchronized private
extension to derive from a limited interface. In this case GNAT fails to
correctly determine that the private extension is limited.

 

This causes two separate problems that make discriminated types in such a
case impossible:

1.  GNAT inappropriately rejects compilation, claiming default
discriminants on such a private extension are illegal.
2.  GNAT fails to generate the expected discriminals for the
unconstrained discriminanted case, leading to the corresponding
discriminants of the "corresponding record" of the underlying concurrent
type to have no identifiers, and thus compilation fails.

 

Fairly simple fix. If "synchronized" appears in the private extension
declaration, it is limited. This is explicit in the RM as well (7.3(6/2)).

 

Fixing this bug uncovered a related bug wrt. TSS address finalizer
generation for constrained subtypes of synchronized private extensions with
no default discriminants. That patch is to follow separately.

 

Patch file is attached.

 

--  Begin change log entry --

 

ada: Private extensions with the keyword "synchronized" are always limited.

 

GNAT was relying on synchronized private type extensions deriving from a
concurrent interface to determine their limitedness. This does not cover the
case where such an extension derives from a limited interface. RM 7.3(6/2)
makes it clear that "synchronized" in a private extension implies the derived
type is limited. GNAT should explicitly check for the presence of
"synchronized" in a private extension declaration, and it should have the
same effect as the presence of "limited".

 

gcc/ada/

* sem_ch3.adb (Build_Derived_Record_Type): Treat presence of
keyword "synchronized" the same as "limited" when determining if a private
extension is limited.

 

-- End change log entry --

 

This patch was bootstrapped on x86_64-*-freebsd13.2. Two new test cases were
added. Note that 4 gnat test cases fail currently on master and are
unrelated to this patch.

 

Check-ada output of this patch:

=== acats tests ===

Running chapter a ...

Running chapter c2 ...

Running chapter c3 ...

Running chapter c4 ...

Running chapter c5 ...

Running chapter c6 ...

Running chapter c7 ...

Running chapter c8 ...

Running chapter c9 ...

Running chapter ca ...

Running chapter cb ...

Running chapter cc ...

Running chapter cd ...

Running chapter ce ...

Running chapter cxa ...

Running chapter cxb ...

Running chapter cxf ...

Running chapter cxg ...

Running chapter cxh ...

Running chapter cz ...

Running chapter d ...

Running chapter e ...

Running chapter l ...

=== acats Summary ===

# of expected passes   2328

# of unexpected failures 0

 

Native configuration is x86_64-unknown-freebsd13.2

 

=== gnat tests ===

 

Schedule of variations:

unix

 

Running target unix

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 14)

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 20)

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 38)

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 42)

 

=== gnat Summary ===

 

# of expected passes   3402

# of unexpected failures4

# of expected failures  23

# of unsupported tests   10

gnatmake version 14.0.0 20230809 (experimental)

 

 

Richard Wai

ANNEXI-STRAYLINE



[PATCH 2/2 v2] Ada: Finalization of constrained subtypes of unconstrained synchronized private extensions

2023-08-23 Thread Richard Wai
Somehow an error worked its way into the original diff (the diff itself),
making the previous patch fail to apply.

 

Fixed version attached.

 

Richard Wai

ANNEXI-STRAYLINE

 

From: Richard Wai  
Sent: Thursday, August 10, 2023 1:27 AM
To: 'gcc-patches@gcc.gnu.org' 
Cc: 'Eric Botcazou' ; 'Arnaud Charlet'
; 'Stephen Baird' 
Subject: [PATCH 2/2] Ada: Finalization of constrained subtypes of
unconstrained synchronized private extensions

 

When generating TSS address finalization bodies for a tagged class-wide
subtype, GNAT climbs the parent chain looking for the first
"non-constrained" type. That type's underlying type's class-wide type is
used as a "designated" type for a dispatching TSS deep finalize call to the
designated class-wide type. In the case of a constrained subtype of an
unconstrained synchronized private extension, this ends up designating the
underlying type of that private extension. This means it targets the
class-wide type of the actual underlying concurrent type rather than the
corresponding record. Ultimately it ends up generating a call to the
corresponding record's deep finalizer, but with incompatible types
(concurrent_type'Class -> concurrent_typeV'Class). This causes compilation
to fail.

 

This patch adds extra logic to exp_ch7(Make_Finalize_Address_Stmts) to
identify such cases and ensure that the designated type is the corresponding
record type's class-wide type in that situation.

 

Patch file is attached.

 

--  Begin change log entry -

 

ada: TSS finalize address subprogram generation for constrained subtypes of
unconstrained synchronized private extensions should take care to designate
the corresponding record of the underlying concurrent type.

 

When generating TSS finalize address subprograms for class-wide types of
constrained root types, GNAT follows the parent chain looking for the first
"non-constrained" type. It is possible that such a type is a private
extension with the "synchronized" keyword, in which case the underlying type
is a concurrent type. When that happens, the designated type of the finalize
address subprogram should be the corresponding record's class-wide type.

 

gcc/ada/

* exp_ch3.adb (Expand_Freeze_Class_Wide_Type): Expand comments
explaining why TSS Finalize_Address is not generated for concurrent
class-wide types.

* exp_ch7.adb (Make_Finalize_Address_Stmts): Handle cases where the
underlying non-constrained parent type is a concurrent type, and adjust
the designated type to be the corresponding record's class-wide type.

 

--  End change log entry -

 

This patch was bootstrapped on x86_64-*-freebsd13.2. One new test case was
added. Note that 4 gnat test cases fail currently on master and are
unrelated to this patch.

 

Check-ada output of this patch:

 

=== acats tests ===

Running chapter a ...

Running chapter c2 ...

Running chapter c3 ...

Running chapter c4 ...

Running chapter c5 ...

Running chapter c6 ...

Running chapter c7 ...

Running chapter c8 ...

Running chapter c9 ...

Running chapter ca ...

Running chapter cb ...

Running chapter cc ...

Running chapter cd ...

Running chapter ce ...

Running chapter cxa ...

Running chapter cxb ...

Running chapter cxf ...

Running chapter cxg ...

Running chapter cxh ...

Running chapter cz ...

Running chapter d ...

Running chapter e ...

Running chapter l ...

=== acats Summary ===

# of expected passes   2328

# of unexpected failures 0

 

Native configuration is x86_64-unknown-freebsd13.2

 

=== gnat tests ===

 

Schedule of variations:

unix

 

Running target unix

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 14)

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 20)

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 38)

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 42)

 

=== gnat Summary ===

 

# of expected passes   3401

# of unexpected failures 4

# of expected failures  23

# of unsupported tests   10

gnatmake version 14.0.0 20230809 (experimental)

 

 

Richard Wai

ANNEXI-STRAYLINE



ada-tss-constrained-subtype-of-private-synchronized-extention-v2.patch
Description: Binary data


Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

2023-08-23 Thread Jeff Law via Gcc-patches




On 8/23/23 00:03, Li, Pan2 wrote:

Thanks Jeff for comments, and sorry for late response.

The background comes from the CALL insn. For the RISC-V dynamic rounding mode
we need to

1. restore the frm BEFORE the call, to avoid the static rounding mode
polluting the call.
2. back up the frm AFTER the call, to ensure the frm value after the call is
live.

Currently, we don’t take care of it elegantly but we would like to refine this 
part by the optional EMIT_AFTER.
Understood.  So the natural question is why does x86/sh not need this 
for its mode switching?   Don't all the same issues exist on those 
targets as well?





I'm not aware of a case where we can have an insn with control flow that
isn't the end of the block.  So perhaps then that second conditional
into an assertion inside the true arm?


Not very sure my understanding is correct, but there may be a call insn in the 
middle of the bb,
And can be considered as control flow?
In the case where the call is control flow, then it'll end the block. 
Examples of this would be if the call could throw or perform a nonlocal 
goto.  For "normal" calls, they are not considered control flow and can 
show up in the middle of a block.
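
To make the distinction concrete, a small C sketch (illustrative only; the
attribute spelling is just one way to mark a call as never throwing):

  extern void may_throw (void);   /* with EH enabled this call has an
                                     abnormal/EH edge and ends its block */
  extern void plain (void) __attribute__ ((nothrow));

  void
  f (void)
  {
    plain ();      /* "normal" call: not control flow, fine mid-block */
    may_throw ();  /* potential EH edge: terminates the basic block */
    plain ();      /* starts a new block */
  }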





Is this really correct for EDGE_ABNORMAL?  If the abnormal edge is
created by, say a nonlocal goto, exception handling, etc, then the insn
you insert at the end of the block will never be executed.


Got it, let me have a try for this, as well as there is somewhere take care of 
this already.
You might also peek at the RTL gcse/pre code which is also LCM based and 
has the same class of problems.


jeff


Re: [PATCH] RISC-V: Add initial pipeline description for an out-of-order core.

2023-08-23 Thread 钟居哲
Does this patch fix the 2 following PRs:
108271 – Missed RVV cost model (gnu.org)
108412 – RISC-V: Negative optimization of GCSE && LOOP INVARIANTS (gnu.org)

If yes, plz append these 2 cases to the testsuite and indicate that those 2
PRs are fixed.
So that we can close them.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-23 21:48
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Add initial pipeline description for an out-of-order 
core.
Hi,
 
this adds a pipeline description for a generic out-of-order core.
Latency and units are not based on any real processor but are more or less
educated guesses at what such a processor could look like.
For lack of a better name, I called the -mtune parameter "generic-ooo".
 
In order to account for latency scaling by LMUL != 1, sched_adjust_cost
is implemented.  It will scale an instruction's latency by its LMUL
so an LMUL == 8 instruction will take 8 times the number of cycles
the same instruction with LMUL == 1 would take.
As this potentially causes very high latencies which, in turn, might
lead to scheduling anomalies and a higher number of vsetvls emitted,
this feature is only enabled when specifying -madjust-lmul-cost.
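
For reference, a minimal sketch of what the LMUL scaling could look like in
a TARGET_SCHED_ADJUST_COST hook (illustrative only, not the patch's actual
code; insn_lmul is a hypothetical helper):

  /* Illustrative sketch; insn_lmul () is hypothetical.  */
  static int
  generic_ooo_sched_adjust_cost (rtx_insn *, int, rtx_insn *dep_insn,
                                 int cost, unsigned int)
  {
    /* Scale the producer's latency by its LMUL: an LMUL == 8 vector op
       takes roughly eight times the cycles of its LMUL == 1 form.  */
    int lmul = insn_lmul (dep_insn);
    return lmul > 1 ? cost * lmul : cost;
  }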
 
Additionally, in order to easily recognize pre-RA vsetvls this patch
introduces an insn type vsetvl_pre which is used in sched_adjust_cost.
 
As mentioned, the latency numbers are guesswork at best.  I assumed
6-wide issue as most public announcements point in that direction
and obviously everything else is similarly coarse.  Feel free to
correct me in case I unnecessarily pessimized or underestimated something.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add generic_ooo.
* config/riscv/riscv.cc (riscv_sched_adjust_cost): Implement
scheduler hook.
(TARGET_SCHED_ADJUST_COST): Define.
* config/riscv/riscv.md: Include generic-ooo.md.
* config/riscv/riscv.opt: Add -madjust-lmul-cost.
* config/riscv/generic-ooo.md: New file.
* config/riscv/vector.md: Add vsetvl_pre.
---
gcc/config/riscv/generic-ooo.md  | 284 +++
gcc/config/riscv/riscv-cores.def |   1 +
gcc/config/riscv/riscv-opts.h|   3 +-
gcc/config/riscv/riscv.cc|  87 ++
gcc/config/riscv/riscv.md|   5 +-
gcc/config/riscv/riscv.opt   |   3 +
gcc/config/riscv/vector.md   |   4 +-
7 files changed, 383 insertions(+), 4 deletions(-)
create mode 100644 gcc/config/riscv/generic-ooo.md
 
diff --git a/gcc/config/riscv/generic-ooo.md b/gcc/config/riscv/generic-ooo.md
new file mode 100644
index 000..78b9e48f935
--- /dev/null
+++ b/gcc/config/riscv/generic-ooo.md
@@ -0,0 +1,284 @@
+;; RISC-V generic out-of-order core scheduling model.
+;; Copyright (C) 2017-2023 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_automaton "generic_ooo")
+
+;; Regarding functional units we assume a three-way split:
+;; - Integer ALU (IXU) - 4 symmetric units.
+;; - Floating-point (FXU) - 2 symmetric units.
+;; - Vector Unit (VXU) - 1 unit.
+
+;; We assume 6-wide issue:
+;; - 5-wide generic/integer issue.
+;; - 1-wide vector issue.
+
+;; For now, the only subunits are for non-pipelined integer division and
+;; vector div/mult/sqrt.
+;; No extra units for e.g. vector permutes, masking, everything is assumed to
+;; be on the same pipelined execution unit.
+
+;; Latency:
+;; - Regular integer operations take 1 cycle.
+;; - Multiplication/Division take multiple cycles.
+;; - Float operations take 4-6 cycles.
+;; - Regular vector operations take 2-6 cycles.
+;;   (This assumes LMUL = 1, latency for LMUL = 2, 4, 8 is scaled accordingly
+;;by riscv_sched_adjust_cost when -madjust-lmul-cost is given)
+;; - Load/Store:
+;;   - To/From IXU: 4 cycles.
+;;   - To/From FXU: 6 cycles.
+;;   - To/From VXU: 6 cycles.
+
+;; Integer/float issue queues.
+(define_cpu_unit "issue0,issue1,issue2,issue3,issue4" "generic_ooo")
+
+;; Separate issue queue for vector instructions.
+(define_cpu_unit "generic_ooo_vxu_issue" "generic_ooo")
+
+;; Integer/float execution units.
+(define_cpu_unit "ixu0,ixu1,ixu2,ixu3" "generic_ooo")
+(define_cpu_unit "fxu0,fxu1" "generic_ooo")
+
+;; Integer subunit for division.
+(define_cpu_unit "generic_ooo_div" 

[PATCH] AArch64: Fix MOPS memmove operand corruption [PR111121]

2023-08-23 Thread Wilco Dijkstra via Gcc-patches

A MOPS memmove may corrupt registers since there is no copy of the input
operands to temporary registers.  Fix this by calling aarch64_expand_cpymem,
which does this.  Also fix an issue with STRICT_ALIGNMENT being ignored if
TARGET_MOPS is true, and avoid crashing or generating a huge expansion if
aarch64_mops_memcpy_size_threshold is large.
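
For illustration, a reduced C example of the kind of copy this affects
(hypothetical, in the spirit of the mops_4.c additions; compile for a MOPS
target, e.g. -march=armv8.8-a):

  char *
  move_and_use (char *dst, char *src, unsigned long n)
  {
    __builtin_memmove (dst, src, n);
    /* dst and n are still live here; the MOPS cpyp/cpym/cpye sequence
       writes all three of its registers, so the inputs must be copied
       to fresh pseudos first or the values used below are corrupted.  */
    return dst + n;
  }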

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog/
	PR target/111121
* config/aarch64/aarch64.md (cpymemdi): Remove STRICT_ALIGNMENT, add 
param for memmove.
(aarch64_movmemdi): Add new expander similar to aarch64_cpymemdi.
(movmemdi): Like cpymemdi call aarch64_expand_cpymem for correct 
expansion.
* config/aarch64/aarch64.cc (aarch64_expand_cpymem_mops): Add support 
for memmove.
(aarch64_expand_cpymem): Add support for memmove. Handle 
STRICT_ALIGNMENT correctly.
Handle TARGET_MOPS size selection correctly.
* config/aarch64/aarch64-protos.h (aarch64_expand_cpymem): Update 
prototype. 

gcc/testsuite/ChangeLog/
	PR target/111121
* gcc.target/aarch64/mops_4.c: Add memmove testcases.

---
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
70303d6fd953e0c397b9138ede8858c2db2e53db..97375e81cbda078847af83bf5dd4e0d7673d6af4
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -765,7 +765,7 @@ bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 tree aarch64_vector_load_decl (tree);
 void aarch64_expand_call (rtx, rtx, rtx, bool);
-bool aarch64_expand_cpymem (rtx *);
+bool aarch64_expand_cpymem (rtx *, bool);
 bool aarch64_expand_setmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_float_const_rtx_p (rtx);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
eba5d4a7e04b7af82437453a691d5607d98133c9..5e8d0a0c91bc7719de2a8c5627b354cf905a4db0
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25135,10 +25135,11 @@ aarch64_copy_one_block_and_progress_pointers (rtx 
*src, rtx *dst,
   *dst = aarch64_progress_pointer (*dst);
 }
 
-/* Expand a cpymem using the MOPS extension.  OPERANDS are taken
-   from the cpymem pattern.  Return true iff we succeeded.  */
+/* Expand a cpymem/movmem using the MOPS extension.  OPERANDS are taken
+   from the cpymem/movmem pattern.  IS_MEMMOVE is true if this is a memmove
+   rather than memcpy.  Return true iff we succeeded.  */
 static bool
-aarch64_expand_cpymem_mops (rtx *operands)
+aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove)
 {
   if (!TARGET_MOPS)
 return false;
@@ -25150,17 +25151,19 @@ aarch64_expand_cpymem_mops (rtx *operands)
   rtx dst_mem = replace_equiv_address (operands[0], dst_addr);
   rtx src_mem = replace_equiv_address (operands[1], src_addr);
   rtx sz_reg = copy_to_mode_reg (DImode, operands[2]);
-  emit_insn (gen_aarch64_cpymemdi (dst_mem, src_mem, sz_reg));
-
+  if (is_memmove)
+emit_insn (gen_aarch64_movmemdi (dst_mem, src_mem, sz_reg));
+  else
+emit_insn (gen_aarch64_cpymemdi (dst_mem, src_mem, sz_reg));
   return true;
 }
 
-/* Expand cpymem, as if from a __builtin_memcpy.  Return true if
-   we succeed, otherwise return false, indicating that a libcall to
-   memcpy should be emitted.  */
-
+/* Expand cpymem/movmem, as if from a __builtin_memcpy/memmove.
+   OPERANDS are taken from the cpymem/movmem pattern.  IS_MEMMOVE is true
+   if this is a memmove rather than memcpy.  Return true if we succeed,
+   otherwise return false, indicating that a libcall should be emitted.  */
 bool
-aarch64_expand_cpymem (rtx *operands)
+aarch64_expand_cpymem (rtx *operands, bool is_memmove)
 {
   int mode_bits;
   rtx dst = operands[0];
@@ -25168,25 +25171,23 @@ aarch64_expand_cpymem (rtx *operands)
   rtx base;
   machine_mode cur_mode = BLKmode;
 
-  /* Variable-sized memcpy can go through the MOPS expansion if available.  */
-  if (!CONST_INT_P (operands[2]))
-return aarch64_expand_cpymem_mops (operands);
+  /* Variable-sized or strict align copies may use the MOPS expansion.  */
+  if (!CONST_INT_P (operands[2]) || STRICT_ALIGNMENT)
+return aarch64_expand_cpymem_mops (operands, is_memmove);
 
   unsigned HOST_WIDE_INT size = INTVAL (operands[2]);
 
-  /* Try to inline up to 256 bytes or use the MOPS threshold if available.  */
-  unsigned HOST_WIDE_INT max_copy_size
-= TARGET_MOPS ? aarch64_mops_memcpy_size_threshold : 256;
+  /* Set inline limits for memmove/memcpy.  MOPS has a separate threshold.  */
+  unsigned HOST_WIDE_INT max_copy_size = is_memmove ? 0 : 256;
+  unsigned HOST_WIDE_INT max_mops_size = max_copy_size;
 
-  bool size_p = optimize_function_for_size_p (cfun);
+  if (TARGET_MOPS)
+max_mops_size = is_memmove ? aarch64_mops_memmove_size_threshold
+  : aarch64_mops_memcpy_size_threshold;
 
-  /* Large constant-sized cpymem should go through MOPS when possi

[committed] i386: Fix register spill failure with concat RTX [PR111010]

2023-08-23 Thread Uros Bizjak via Gcc-patches
Disable the (=&r,m,m) alternative for 32-bit targets. The combination of two
memory operands (possibly with complex addressing modes), an early-clobbered
output, the frame pointer and the PIC register uses too many registers on
a register-constrained 32-bit target.

Also merge two similar patterns using DWIH mode iterator.

PR target/111010

gcc/ChangeLog:

* config/i386/i386.md (*concat<mode><dwi>3_3):
Merge pattern from *concatditi3_3 and *concatsidi3_3 using
DWIH mode iterator.  Disable (=&r,m,m) alternative for
32-bit targets.
(*concat<mode><dwi>3_4): Disable (=&r,m,m)
alternative for 32-bit targets.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Also regtested by Rainer on i386-pc-solaris2.11 where the patch fixes
the failure.

(I didn't find a nice testcase; the test is very sensitive to
perturbations in the code.)

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 108f4af8552..50794ed7bed 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12435,17 +12435,16 @@ (define_insn_and_split "*concat<mode><dwi>3_2"
   DONE;
 })
 
-(define_insn_and_split "*concatditi3_3"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=ro,r,r,&r,x")
-   (any_or_plus:TI
- (ashift:TI
-   (zero_extend:TI
- (match_operand:DI 1 "nonimmediate_operand" "r,m,r,m,x"))
+(define_insn_and_split "*concat<mode><dwi>3_3"
+  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r,r,&r,x")
+   (any_or_plus:<DWI>
+ (ashift:<DWI>
+   (zero_extend:<DWI>
+ (match_operand:DWIH 1 "nonimmediate_operand" "r,m,r,m,x"))
    (match_operand:QI 2 "const_int_operand"))
- (zero_extend:TI
-   (match_operand:DI 3 "nonimmediate_operand" "r,r,m,m,0"))))]
-  "TARGET_64BIT
-   && INTVAL (operands[2]) == 64"
+ (zero_extend:<DWI>
+   (match_operand:DWIH 3 "nonimmediate_operand" "r,r,m,m,0"))))]
+  "INTVAL (operands[2]) == <MODE_SIZE> * BITS_PER_UNIT"
  "#"
  "&& reload_completed"
  [(const_int 0)]
@@ -12456,28 +12455,10 @@ (define_insn_and_split "*concatditi3_3"
  emit_insn (gen_vec_concatv2di (tmp, operands[3], operands[1]));
 }
  else
-    split_double_concat (TImode, operands[0], operands[3], operands[1]);
-  DONE;
-})
-
-(define_insn_and_split "*concatsidi3_3"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=ro,r,r,&r")
-   (any_or_plus:DI
- (ashift:DI
-   (zero_extend:DI
- (match_operand:SI 1 "nonimmediate_operand" "r,m,r,m"))
-   (match_operand:QI 2 "const_int_operand"))
- (zero_extend:DI
-   (match_operand:SI 3 "nonimmediate_operand" "r,r,m,m"))))]
-  "!TARGET_64BIT
-   && INTVAL (operands[2]) == 32"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  split_double_concat (DImode, operands[0], operands[3], operands[1]);
+    split_double_concat (<DWI>mode, operands[0], operands[3], operands[1]);
   DONE;
-})
+}
+  [(set_attr "isa" "*,*,*,x64,x64")])
 
 (define_insn_and_split "*concat<mode><dwi>3_4"
   [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r,r,&r")
@@ -12495,7 +12476,8 @@ (define_insn_and_split "*concat<mode><dwi>3_4"
 {
   split_double_concat (<DWI>mode, operands[0], operands[1], operands[2]);
   DONE;
-})
+}
+  [(set_attr "isa" "*,*,*,x64")])
 
 (define_insn_and_split "*concat<mode>3_5"
   [(set (match_operand:DWI 0 "nonimmediate_operand" "=r,o,o")


RE: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

2023-08-23 Thread Li, Pan2 via Gcc-patches
Thanks Jeff for comments.

> Understood.  So the natural question is why does x86/sh not need this 
> for its mode switching?   Don't all the same issues exist on those 
> targets as well?

AFAIK, it comes from the different design principles of the risc-v and
x86/arm intrinsic APIs.
The risc-v rvv FP rounding mode intrinsic API has one abstraction level above
the insn itself, while the x86/arm one only indicates the semantics of the
insn.

For example, if one vector instruction VFADD doesn't have a static rounding
mode (aka an rm encoding in the insn), there is no intrinsic API containing
a rounding mode argument on x86/arm, while the risc-v fp vector intrinsics
will always have a static rounding mode API if the frm is honored.

In short, the risc-v intrinsic API is closer to the end-user, while the
x86/arm intrinsic API is closer to the insn itself.
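
As a portable-C analogy of that discipline (illustrative only; the RVV
rounding-mode intrinsics manage this implicitly rather than via fesetround):

  #include <fenv.h>

  extern void callee (void);
  extern double use_result (double);

  double
  f (double a, double b)
  {
    int dyn = fegetround ();
    fesetround (FE_UPWARD);   /* static rounding mode for one operation */
    double x = a + b;
    fesetround (dyn);         /* 1. restore frm BEFORE the call, so the
                                 callee observes the dynamic mode */
    callee ();
    dyn = fegetround ();      /* 2. back up frm AFTER the call, since the
                                 callee may have changed it */
    return use_result (x);
  }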

For the rest, I will have a try based on your suggestion soon, as I am in
the middle of something.

Pan

-Original Message-
From: Jeff Law  
Sent: Wednesday, August 23, 2023 10:25 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
kito.ch...@gmail.com
Subject: Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook



On 8/23/23 00:03, Li, Pan2 wrote:
> Thanks Jeff for comments, and sorry for late response.
> 
> The background comes from the CALL insn. For the RISC-V dynamic rounding mode 
> we need to
> 
> 1. restore the frm BEFORE call, to avoid the static rounding mode pollute the 
> call.
> 2. Backup the frm AFTER call, to ensure the frm value after call is live.
> 
> Currently, we don’t take care of it elegantly but we would like to refine 
> this part by the optional EMIT_AFTER.
Understood.  So the natural question is why does x86/sh not need this 
for its mode switching?   Don't all the same issues exist on those 
targets as well?

> 
>> I'm not aware of a case where we can have an insn with control flow that
>> isn't the end of the block.  So perhaps then that second conditional
>> into an assertion inside the true arm?
> 
> Not very sure my understanding is correct, but there may be a call insn in 
> the middle of the bb,
> And can be considered as control flow?
In the case where the call is control flow, then it'll end the block. 
Examples of this would be if the call could throw or perform a nonlocal 
goto.  For "normal" calls, they are not considered control flow and can 
show up in the middle of a block.

> 
>> Is this really correct for EDGE_ABNORMAL?  If the abnormal edge is
>> created by, say a nonlocal goto, exception handling, etc, then the insn
>> you insert at the end of the block will never be executed.
> 
> Got it, let me have a try for this, as well as there is somewhere take care 
> of this already.
You might also peek at the RTL gcse/pre code which is also LCM based and 
has the same class of problems.

jeff


Re: [PATCH] RISC-V: Add initial pipeline description for an out-of-order core.

2023-08-23 Thread Robin Dapp via Gcc-patches
> Does this patch fix these 2 following PR:
> 108271 – Missed RVV cost model (gnu.org) 
> 
> 108412 – RISC-V: Negative optimization of GCSE && LOOP INVARIANTS (gnu.org) 
> 
> 
> If yes, plz append these 2 cases into testsuite and indicate those 2 PR are 
> fixed.
> So that we can close them.

The second one is fixed on my local branch, the first not yet because there
is more to it still.  The second one is fixed mostly by pressure-aware
scheduling, and I'm going to add the test case as well as the PR reference
to the commit once this is verified.

Regards
 Robin


Patch ping Re: [PATCH 6/12] i386: Enable _BitInt on x86-64 [PR102989]

2023-08-23 Thread Jakub Jelinek via Gcc-patches
Hi!

Now that Richi has acked all the middle-end _BitInt patches (but I am
deferring committing those until the C FE and libgcc patches are also
approved), I'd like to ping this patch.

Thanks!

On Wed, Aug 09, 2023 at 08:19:41PM +0200, Jakub Jelinek via Gcc-patches wrote:
> The following patch enables _BitInt support on x86-64, the only
> target which has _BitInt specified in psABI.
> 
> 2023-08-09  Jakub Jelinek  
> 
>   PR c/102989
>   * config/i386/i386.cc (classify_argument): Handle BITINT_TYPE.
>   (ix86_bitint_type_info): New function.
>   (TARGET_C_BITINT_TYPE_INFO): Redefine.

Jakub



Re: [PATCH] AArch64: Fix MOPS memmove operand corruption [PR111121]

2023-08-23 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> A MOPS memmove may corrupt registers since there is no copy of the input 
> operands to temporary
> registers.  Fix this by calling aarch64_expand_cpymem which does this.  Also 
> fix an issue with
> STRICT_ALIGNMENT being ignored if TARGET_MOPS is true, and avoid crashing or 
> generating a huge
> expansion if aarch64_mops_memcpy_size_threshold is large.
>
> Passes regress/bootstrap, OK for commit?
>
> gcc/ChangeLog/
> PR target/111121
> * config/aarch64/aarch64.md (cpymemdi): Remove STRICT_ALIGNMENT, add 
> param for memmove.
> (aarch64_movmemdi): Add new expander similar to aarch64_cpymemdi.
> (movmemdi): Like cpymemdi call aarch64_expand_cpymem for correct 
> expansion.
> * config/aarch64/aarch64.cc (aarch64_expand_cpymem_mops): Add support 
> for memmove.
> (aarch64_expand_cpymem): Add support for memmove. Handle 
> STRICT_ALIGNMENT correctly.
> Handle TARGET_MOPS size selection correctly.
> * config/aarch64/aarch64-protos.h (aarch64_expand_cpymem): Update 
> prototype.
>
> gcc/testsuite/ChangeLog/
> PR target/111121
> * gcc.target/aarch64/mops_4.c: Add memmove testcases.
>
> ---
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> 70303d6fd953e0c397b9138ede8858c2db2e53db..97375e81cbda078847af83bf5dd4e0d7673d6af4
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -765,7 +765,7 @@ bool aarch64_emit_approx_div (rtx, rtx, rtx);
>  bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
>  tree aarch64_vector_load_decl (tree);
>  void aarch64_expand_call (rtx, rtx, rtx, bool);
> -bool aarch64_expand_cpymem (rtx *);
> +bool aarch64_expand_cpymem (rtx *, bool);
>  bool aarch64_expand_setmem (rtx *);
>  bool aarch64_float_const_zero_rtx_p (rtx);
>  bool aarch64_float_const_rtx_p (rtx);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> eba5d4a7e04b7af82437453a691d5607d98133c9..5e8d0a0c91bc7719de2a8c5627b354cf905a4db0
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -25135,10 +25135,11 @@ aarch64_copy_one_block_and_progress_pointers (rtx 
> *src, rtx *dst,
>*dst = aarch64_progress_pointer (*dst);
>  }
>
> -/* Expand a cpymem using the MOPS extension.  OPERANDS are taken
> -   from the cpymem pattern.  Return true iff we succeeded.  */
> +/* Expand a cpymem/movmem using the MOPS extension.  OPERANDS are taken
> +   from the cpymem/movmem pattern.  IS_MEMMOVE is true if this is a memmove
> +   rather than memcpy.  Return true iff we succeeded.  */
>  static bool
> -aarch64_expand_cpymem_mops (rtx *operands)
> +aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove)
>  {
>if (!TARGET_MOPS)
>  return false;
> @@ -25150,17 +25151,19 @@ aarch64_expand_cpymem_mops (rtx *operands)
>rtx dst_mem = replace_equiv_address (operands[0], dst_addr);
>rtx src_mem = replace_equiv_address (operands[1], src_addr);
>rtx sz_reg = copy_to_mode_reg (DImode, operands[2]);
> -  emit_insn (gen_aarch64_cpymemdi (dst_mem, src_mem, sz_reg));
> -
> +  if (is_memmove)
> +emit_insn (gen_aarch64_movmemdi (dst_mem, src_mem, sz_reg));
> +  else
> +emit_insn (gen_aarch64_cpymemdi (dst_mem, src_mem, sz_reg));
>return true;
>  }
>
> -/* Expand cpymem, as if from a __builtin_memcpy.  Return true if
> -   we succeed, otherwise return false, indicating that a libcall to
> -   memcpy should be emitted.  */
> -
> +/* Expand cpymem/movmem, as if from a __builtin_memcpy/memmove.
> +   OPERANDS are taken from the cpymem/movmem pattern.  IS_MEMMOVE is true
> +   if this is a memmove rather than memcpy.  Return true if we succeed,
> +   otherwise return false, indicating that a libcall should be emitted.  */
>  bool
> -aarch64_expand_cpymem (rtx *operands)
> +aarch64_expand_cpymem (rtx *operands, bool is_memmove)
>  {
>int mode_bits;
>rtx dst = operands[0];
> @@ -25168,25 +25171,23 @@ aarch64_expand_cpymem (rtx *operands)
>rtx base;
>machine_mode cur_mode = BLKmode;
>
> -  /* Variable-sized memcpy can go through the MOPS expansion if available.  
> */
> -  if (!CONST_INT_P (operands[2]))
> -return aarch64_expand_cpymem_mops (operands);
> +  /* Variable-sized or strict align copies may use the MOPS expansion.  */
> +  if (!CONST_INT_P (operands[2]) || STRICT_ALIGNMENT)
> +return aarch64_expand_cpymem_mops (operands, is_memmove);
>
>unsigned HOST_WIDE_INT size = INTVAL (operands[2]);
>
> -  /* Try to inline up to 256 bytes or use the MOPS threshold if available.  
> */
> -  unsigned HOST_WIDE_INT max_copy_size
> -= TARGET_MOPS ? aarch64_mops_memcpy_size_threshold : 256;
> +  /* Set inline limits for memmove/memcpy.  MOPS has a separate threshold.  
> */
> +  unsigned HOST_WIDE_INT max_copy_size = is_memmove ? 0 : 256;
> +  unsigned HOST_WIDE_INT max_mops_size = max_copy_size;
>
> -  bool size_p = optimize_function_

Re: [PATCH 6/12] i386: Enable _BitInt on x86-64 [PR102989]

2023-08-23 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 8:19 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch enables _BitInt support on x86-64, the only
> target which has _BitInt specified in psABI.
>
> 2023-08-09  Jakub Jelinek  
>
> PR c/102989
> * config/i386/i386.cc (classify_argument): Handle BITINT_TYPE.
> (ix86_bitint_type_info): New function.
> (TARGET_C_BITINT_TYPE_INFO): Redefine.

LGTM, with a nit.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.cc.jj  2023-08-08 15:55:05.627176766 +0200
> +++ gcc/config/i386/i386.cc 2023-08-08 16:12:02.308940091 +0200
> @@ -2121,7 +2121,8 @@ classify_argument (machine_mode mode, co
> return 0;
>  }
>
> -  if (type && AGGREGATE_TYPE_P (type))
> +  if (type && (AGGREGATE_TYPE_P (type)
> +  || (TREE_CODE (type) == BITINT_TYPE && words > 1)))
>  {
>int i;
>tree field;
> @@ -2270,6 +2271,14 @@ classify_argument (machine_mode mode, co
> }
>   break;
>
> +   case BITINT_TYPE:
> + /* _BitInt(N) for N > 64 is passed as structure containing
> +(N + 63) / 64 64-bit elements.  */
> + if (words > 2)
> +   return 0;
> + classes[0] = classes[1] = X86_64_INTEGER_CLASS;
> + return 2;
> +
> default:
>   gcc_unreachable ();
> }
> @@ -24842,6 +24851,26 @@ ix86_get_excess_precision (enum excess_p
>return FLT_EVAL_METHOD_UNPREDICTABLE;
>  }
>
> +/* Return true if _BitInt(N) is supported and fill details about it into
> +   *INFO.  */

The above comment should fit into one line.

> +bool
> +ix86_bitint_type_info (int n, struct bitint_info *info)
> +{
> +  if (!TARGET_64BIT)
> +return false;
> +  if (n <= 8)
> +info->limb_mode = QImode;
> +  else if (n <= 16)
> +info->limb_mode = HImode;
> +  else if (n <= 32)
> +info->limb_mode = SImode;
> +  else
> +info->limb_mode = DImode;
> +  info->big_endian = false;
> +  info->extended = false;
> +  return true;
> +}
> +
>  /* Implement PUSH_ROUNDING.  On 386, we have pushw instruction that
> decrements by exactly 2 no matter what the position was, there is no 
> pushb.
>
> @@ -25446,6 +25475,8 @@ ix86_run_selftests (void)
>
>  #undef TARGET_C_EXCESS_PRECISION
>  #define TARGET_C_EXCESS_PRECISION ix86_get_excess_precision
> +#undef TARGET_C_BITINT_TYPE_INFO
> +#define TARGET_C_BITINT_TYPE_INFO ix86_bitint_type_info
>  #undef TARGET_PROMOTE_PROTOTYPES
>  #define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_true
>  #undef TARGET_PUSH_ARGUMENT
>
> Jakub
>
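
For reference, a minimal sketch of the psABI behavior being enabled here
(illustrative; requires a C23 compiler targeting x86-64):

  /* _BitInt(N) with N > 64 is passed as (N + 63) / 64 64-bit chunks,
     so _BitInt(128) travels in a register pair much like __int128.  */
  _BitInt(128)
  add128 (_BitInt(128) a, _BitInt(128) b)
  {
    return a + b;
  }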


[PATCH] [frange] Relax floating point relational folding.

2023-08-23 Thread Aldy Hernandez via Gcc-patches
[Jakub/Andrew: I've been staring at this for far too long and could
use another pair of eyes.]

This patch implements a new frelop_early_resolve() that handles the
NAN special cases instead of calling into the integer version which
can break for some combinations.  Relaxing FP conditional folding in
this matter allows ranger to do a better job resulting in more
threading opportunities, among other things.

In auditing ranger versus DOM scoped tables I've noticed we are too
cautious when folding floating point conditionals involving
relationals.  We refuse to fold anything if there is the possibility
of a NAN, but this is overly restrictive.

For example:

  if (x_5 != y_8)
if (x_5 != y_8)
  link_error ();

In range-ops, we fail to fold the second conditional because
frelop_early_resolve bails on anything that may have a NAN, but in the
above case the possibility of a NAN is inconsequential.

However, there are some cases where we must be careful, because a NAN
can complicate matters:

  if (x_5 == x_5)
   ...

Here the operands to EQ_EXPR are the same so we get VREL_EQ as the
relation.  However, we can't fold the conditional unless we know x_5
cannot be a NAN.

On the other hand, we can fold the second conditional here:

  if (x_5 == x_5)
if (x_5 > x_5)

Because on the TRUE side of the first conditional we are guaranteed to
be free of NANs.
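
Putting those cases together, a sketch of the kind of testcase this enables
(in the spirit of the new vrp-float-12.c, not the verbatim test):

  extern void link_error (void);

  void
  foo (float x, float y)
  {
    if (x != y)
      if (x == y)     /* Folds to false even when NANs are possible.  */
        link_error ();

    if (x == x)       /* TRUE side proves x is not a NAN...  */
      if (x > x)      /* ...so this folds to false.  */
        link_error ();
  }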

This patch is basically an inline of the integer version of
relop_early_resolve() with special casing for floats.

The main thing to keep in mind is that the relation coming into a
range-op entry may have a NAN, and for that one must look at the
operands.  This makes the relations akin to unordered comparisons,
making VREL_LT behave like VREL_UNLT would.

The tricky corner cases are VREL_EQ and VREL_NE, as discussed above.
Apart from these that are special cased, the relation table for
intersect should work fine for returning a FALSE, even with NANs.  The
union table, not so much and is documented in the code.

This allows us to add some optimizations for the unordered operators.
For example, a relation of VREL_LT on entry to an operator allows us
to fold an UNLT_EXPR as true, even with NANs because in this case
VREL_LT is really VREL_UNLT which maps perfectly.

BTW, we batted some ideas on how to get this work, and it seems this
is the cleaner route with the special cases nestled in the operators
themselves.  Another idea is to add unordered relations, but that
would require bloating the various tables adding spots for VREL_UNEQ,
VREL_UNLT, etc, plus adding relations for VREL_UNORDERED so the
intersects work correctly.  I'm not wed to either one, and we can
certainly revisit this if it becomes burdensome to maintain (or to get
right).

I'll hold off until the end of the week to commit, to wait for
feedback.

Tested on x86-64 Linux.

gcc/ChangeLog:

* range-op-float.cc (frelop_early_resolve): Rewrite for better NAN
handling.
(operator_not_equal::fold_range): Adjust for relations.
(operator_lt::fold_range): Same.
(operator_gt::fold_range): Same.
(foperator_unordered_equal::fold_range): Same.
(foperator_unordered_lt::fold_range): Same.
(foperator_unordered_le::fold_range): Same.
(foperator_unordered_gt::fold_range): Same.
(foperator_unordered_ge::fold_range): Same.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vrp-float-12.c: New test.
---
 gcc/range-op-float.cc| 148 +++
 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-12.c |  23 +++
 2 files changed, 143 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-12.c

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index e30b489c410..14199647744 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -268,22 +268,75 @@ maybe_isnan (const frange &op1, const frange &op2)
   return op1.maybe_isnan () || op2.maybe_isnan ();
 }
 
-// Floating version of relop_early_resolve that takes into account NAN
-// and -ffinite-math-only.
+// Floating point version of relop_early_resolve that takes NANs into
+// account.
+//
+// For relation opcodes, first try to see if the supplied relation
+// forces a true or false result, and return that.
+// Then check for undefined operands.  If none of this applies,
+// return false.
+//
+// TRIO are the relations between operands as they appear in the IL.
+// MY_REL is the relation that corresponds to the operator being
+// folded.  For example, when attempting to fold x_3 == y_5, MY_REL is
+// VREL_EQ, and if the statement is dominated by x_3 > y_5, then
+// TRIO.op1_op2() is VREL_GT.
+//
+// Relations in a floating point world are a bit tricky, as TRIO
+// behaves as the corresponding unordered variant if either operand
+// could be a NAN.  For example, when resolving "if (x_5 == x_5)", the
+// relation is VREL_EQ, but it behaves like VREL_UNEQ if NANs are a
+// possibility.  Similarly, the false edge of "if (x 

Re: [PATCH] rtl: Forward declare rtx_code

2023-08-23 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw via Gcc-patches  writes:
> Now that we require C++ 11, we can safely forward declare rtx_code
> so that we can use it in target hooks.
>
> gcc/ChangeLog
>   * coretypes.h (rtx_code): Add forward declaration.
>   * rtl.h (rtx_code): Make compatible with forward declaration.
> ---
>  gcc/coretypes.h | 4 
>  gcc/rtl.h   | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> index ca8837cef67..51e9ce0 100644
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -100,6 +100,10 @@ struct gimple;
>  typedef gimple *gimple_seq;
>  struct gimple_stmt_iterator;
>  
> +/* Forward declare rtx_code, so that we can use it in target hooks without
> +   needing to pull in rtl.h.  */
> +enum rtx_code : unsigned;
> +
>  /* Forward decls for leaf gimple subclasses (for individual gimple codes).
> Keep this in the same order as the corresponding codes in gimple.def.  */
>  
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index e1c51156f90..0e9491b89b4 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -45,7 +45,7 @@ class predefined_function_abi;
>  /* Register Transfer Language EXPRESSIONS CODES */
>  
>  #define RTX_CODE enum rtx_code
> -enum rtx_code  {
> +enum rtx_code : unsigned {
>  
>  #define DEF_RTL_EXPR(ENUM, NAME, FORMAT, CLASS)   ENUM ,
>  #include "rtl.def"   /* rtl expressions are documented here */

Given:

  #define RTX_CODE_BITSIZE 8

there might be some value in making it uint8_t rather than unsigned.
Preapproved if you agree.

But the patch as posted is a strict improvement over the status quo,
so it's also OK as-is.

Thanks,
Richard
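
For reference, a minimal sketch of the C++11 opaque-enum mechanism being
relied on (illustrative, not GCC's actual headers; names are made up):

  // coretypes.h-style opaque declaration: usable in interfaces.
  enum rtx_code : unsigned;
  struct rtx_def;
  typedef rtx_def *rtx;
  rtx gen_ccmp_first_hook (rtx_code code);  // no need to pull in rtl.h

  // rtl.h-style full definition: must repeat the fixed underlying
  // type, or the two declarations will not match.
  enum rtx_code : unsigned { UNKNOWN_CODE, PLUS_CODE /* , ... */ };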


Re: [PATCH] rtl: use rtx_code for gen_ccmp_first and gen_ccmp_next

2023-08-23 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw via Gcc-patches  writes:
> Note, this patch is dependent on the patch I posted yesterday to
> forward declare rtx_code in coretypes.h.
>
> --
> Now that we have a forward declaration of rtx_code in coretypes.h, we
> can adjust these hooks to take rtx_code arguments rather than an int.
>
> gcc/ChangeLog:
>
>   * target.def (gen_ccmp_first, gen_ccmp_next): Use rtx_code for
>   CODE, CMP_CODE and BIT_CODE arguments.
>   * config/aarch64/aarch64.cc (aarch64_gen_ccmp_first): Likewise.
>   (aarch64_gen_ccmp_next): Likewise.
>   * doc/tm.texi: Regenerated.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64.cc | 5 +++--
>  gcc/doc/tm.texi   | 4 ++--
>  gcc/target.def| 4 ++--
>  3 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 560e5431636..bc09185b8ec 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -25585,7 +25585,7 @@ aarch64_asan_shadow_offset (void)
>  
>  static rtx
>  aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
> - int code, tree treeop0, tree treeop1)
> + rtx_code code, tree treeop0, tree treeop1)
>  {
>machine_mode op_mode, cmp_mode, cc_mode = CCmode;
>rtx op0, op1;
> @@ -25659,7 +25659,8 @@ aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn 
> **gen_seq,
>  
>  static rtx
>  aarch64_gen_ccmp_next (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev,
> -int cmp_code, tree treeop0, tree treeop1, int bit_code)
> +rtx_code cmp_code, tree treeop0, tree treeop1,
> +rtx_code bit_code)
>  {
>rtx op0, op1, target;
>machine_mode op_mode, cmp_mode, cc_mode = CCmode;
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 95ba56e05ae..75cb8e3417c 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12005,7 +12005,7 @@ This target hook is required only when the target has 
> several different
>  modes and they have different conditional execution capability, such as ARM.
>  @end deftypefn
>  
> -@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_FIRST (rtx_insn 
> **@var{prep_seq}, rtx_insn **@var{gen_seq}, int @var{code}, tree @var{op0}, 
> tree @var{op1})
> +@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_FIRST (rtx_insn 
> **@var{prep_seq}, rtx_insn **@var{gen_seq}, rtx_code @var{code}, tree 
> @var{op0}, tree @var{op1})
>  This function prepares to emit a comparison insn for the first compare in a
>   sequence of conditional comparisions.  It returns an appropriate comparison
>   with @code{CC} for passing to @code{gen_ccmp_next} or @code{cbranch_optab}.
> @@ -12015,7 +12015,7 @@ This function prepares to emit a comparison insn for 
> the first compare in a
>   @var{code} is the @code{rtx_code} of the compare for @var{op0} and 
> @var{op1}.
>  @end deftypefn
>  
> -@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_NEXT (rtx_insn 
> **@var{prep_seq}, rtx_insn **@var{gen_seq}, rtx @var{prev}, int 
> @var{cmp_code}, tree @var{op0}, tree @var{op1}, int @var{bit_code})
> +@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_NEXT (rtx_insn 
> **@var{prep_seq}, rtx_insn **@var{gen_seq}, rtx @var{prev}, rtx_code 
> @var{cmp_code}, tree @var{op0}, tree @var{op1}, rtx_code @var{bit_code})
>  This function prepares to emit a conditional comparison within a sequence
>   of conditional comparisons.  It returns an appropriate comparison with
>   @code{CC} for passing to @code{gen_ccmp_next} or @code{cbranch_optab}.
> diff --git a/gcc/target.def b/gcc/target.def
> index 7d684296c17..3ad0bde3ece 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -2735,7 +2735,7 @@ DEFHOOK
>   insns are saved in @var{gen_seq}.  They will be emitted when all the\n\
>   compares in the conditional comparision are generated without error.\n\
>   @var{code} is the @code{rtx_code} of the compare for @var{op0} and 
> @var{op1}.",
> - rtx, (rtx_insn **prep_seq, rtx_insn **gen_seq, int code, tree op0, tree 
> op1),
> + rtx, (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx_code code, tree op0, 
> tree op1),
>   NULL)
>  
>  DEFHOOK
> @@ -2752,7 +2752,7 @@ DEFHOOK
>   be appropriate for passing to @code{gen_ccmp_next} or 
> @code{cbranch_optab}.\n\
>   @var{code} is the @code{rtx_code} of the compare for @var{op0} and 
> @var{op1}.\n\
>   @var{bit_code} is @code{AND} or @code{IOR}, which is the op on the 
> compares.",
> - rtx, (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev, int cmp_code, tree 
> op0, tree op1, int bit_code),
> + rtx, (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev, rtx_code cmp_code, 
> tree op0, tree op1, rtx_code bit_code),
>   NULL)
>  
>  /* Return a new value for loop unroll size.  */


[RFC] libstdc++: Make --enable-libstdcxx-backtrace=auto default to yes

2023-08-23 Thread Jonathan Wakely via Gcc-patches
Any objections to this? It's a C++23 feature, so it should be enabled by
default.

-- >8 --

This causes libstdc++_libbacktrace.a to be built by default. This might
fail on some targets, in which case we can make the 'auto' choice expand
to either 'yes' or 'no' depending on the target.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ENABLE_BACKTRACE): Default to yes.
* configure: Regenerate.
---
 libstdc++-v3/acinclude.m4 | 2 +-
 libstdc++-v3/configure| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index b25378eaace..50c808c6b2d 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -5481,7 +5481,7 @@ BACKTRACE_CPPFLAGS="$BACKTRACE_CPPFLAGS 
-DBACKTRACE_ELF_SIZE=$elfsize"
 
   AC_MSG_CHECKING([whether to build libbacktrace support])
   if test "$enable_libstdcxx_backtrace" = "auto"; then
-enable_libstdcxx_backtrace=no
+enable_libstdcxx_backtrace=yes
   fi
   AC_MSG_RESULT($enable_libstdcxx_backtrace)
   if test "$enable_libstdcxx_backtrace" = "yes"; then



Re: [PATCH] rtl: Forward declare rtx_code

2023-08-23 Thread Richard Earnshaw (lists) via Gcc-patches
On 23/08/2023 16:49, Richard Sandiford via Gcc-patches wrote:
> Richard Earnshaw via Gcc-patches  writes:
>> Now that we require C++ 11, we can safely forward declare rtx_code
>> so that we can use it in target hooks.
>>
>> gcc/ChangeLog
>>  * coretypes.h (rtx_code): Add forward declaration.
>>  * rtl.h (rtx_code): Make compatible with forward declaration.
>> ---
>>  gcc/coretypes.h | 4 
>>  gcc/rtl.h   | 2 +-
>>  2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
>> index ca8837cef67..51e9ce0 100644
>> --- a/gcc/coretypes.h
>> +++ b/gcc/coretypes.h
>> @@ -100,6 +100,10 @@ struct gimple;
>>  typedef gimple *gimple_seq;
>>  struct gimple_stmt_iterator;
>>  
>> +/* Forward declare rtx_code, so that we can use it in target hooks without
>> +   needing to pull in rtl.h.  */
>> +enum rtx_code : unsigned;
>> +
>>  /* Forward decls for leaf gimple subclasses (for individual gimple codes).
>> Keep this in the same order as the corresponding codes in gimple.def.  */
>>  
>> diff --git a/gcc/rtl.h b/gcc/rtl.h
>> index e1c51156f90..0e9491b89b4 100644
>> --- a/gcc/rtl.h
>> +++ b/gcc/rtl.h
>> @@ -45,7 +45,7 @@ class predefined_function_abi;
>>  /* Register Transfer Language EXPRESSIONS CODES */
>>  
>>  #define RTX_CODEenum rtx_code
>> -enum rtx_code  {
>> +enum rtx_code : unsigned {
>>  
>>  #define DEF_RTL_EXPR(ENUM, NAME, FORMAT, CLASS)   ENUM ,
>>  #include "rtl.def"  /* rtl expressions are documented here */
> 
> Given:
> 
>   #define RTX_CODE_BITSIZE 8
> 
> there might be some value in making it uint8_t rather than unsigned.
> Preapproved if you agree.
> 
> But the patch as posted is a strict improvement over the status quo,
> so it's also OK as-is.
> 
> Thanks,
> Richard

I did think about that, but there were two reasons for not doing so:
- it presumes we would never want more than 8 bits for rtx_code (well, not 
quite, 
but it would make it more work to change this).
- it would probably lead to more zero-extension operations happening in the 
compiler

I'll put my patch in as is.

R.


Re: [PATCH] rtl: Forward declare rtx_code

2023-08-23 Thread Richard Sandiford via Gcc-patches
"Richard Earnshaw (lists)"  writes:
> On 23/08/2023 16:49, Richard Sandiford via Gcc-patches wrote:
>> Richard Earnshaw via Gcc-patches  writes:
>>> Now that we require C++ 11, we can safely forward declare rtx_code
>>> so that we can use it in target hooks.
>>>
>>> gcc/ChangeLog
>>> * coretypes.h (rtx_code): Add forward declaration.
>>> * rtl.h (rtx_code): Make compatible with forward declaration.
>>> ---
>>>  gcc/coretypes.h | 4 
>>>  gcc/rtl.h   | 2 +-
>>>  2 files changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
>>> index ca8837cef67..51e9ce0 100644
>>> --- a/gcc/coretypes.h
>>> +++ b/gcc/coretypes.h
>>> @@ -100,6 +100,10 @@ struct gimple;
>>>  typedef gimple *gimple_seq;
>>>  struct gimple_stmt_iterator;
>>>  
>>> +/* Forward declare rtx_code, so that we can use it in target hooks without
>>> +   needing to pull in rtl.h.  */
>>> +enum rtx_code : unsigned;
>>> +
>>>  /* Forward decls for leaf gimple subclasses (for individual gimple codes).
>>> Keep this in the same order as the corresponding codes in gimple.def.  
>>> */
>>>  
>>> diff --git a/gcc/rtl.h b/gcc/rtl.h
>>> index e1c51156f90..0e9491b89b4 100644
>>> --- a/gcc/rtl.h
>>> +++ b/gcc/rtl.h
>>> @@ -45,7 +45,7 @@ class predefined_function_abi;
>>>  /* Register Transfer Language EXPRESSIONS CODES */
>>>  
>>>  #define RTX_CODE   enum rtx_code
>>> -enum rtx_code  {
>>> +enum rtx_code : unsigned {
>>>  
>>>  #define DEF_RTL_EXPR(ENUM, NAME, FORMAT, CLASS)   ENUM ,
>>>  #include "rtl.def" /* rtl expressions are documented here */
>> 
>> Given:
>> 
>>   #define RTX_CODE_BITSIZE 8
>> 
>> there might be some value in making it uint8_t rather than unsigned.
>> Preapproved if you agree.
>> 
>> But the patch as posted is a strict improvement over the status quo,
>> so it's also OK as-is.
>> 
>> Thanks,
>> Richard
>
> I did think about that, but there were two reasons for not doing so:
> - it presumes we would never want more than 8 bits for rtx_code (well, not 
> quite, 
> but it would make it more work to change this).

The rtx_def structure itself provides a significant barrier to that though.

If we ever think that we need to represent more than 256 separate
operations, I think the natural way would be to treat the less well-used
ones in a similar way to unspecs.

> - it would probably lead to more zero-extension operations happening in the 
> compiler

Yeah, that's true.  The upside though is that we could then declare
arrays of codes directly, without having to resort to "unsigned char"
tricks.  That's unlikely to help codes much, but the same principle
would apply to modes, which are more frequently put into arrays.

E.g. one of the issues with bumping the machine_mode bitfield from 8 to
16 bits was finding all the places where "unsigned char" was used to
hold modes, and changing them to "unsigned short".  If machine_mode was
instead the "right" size, we could just call a spade a spade.

But like I say, that's mostly reasoning by analogy rather than because
the size of rtx_code itself is important.

Richard


Re: Another bug for __builtin_object_size? (Or expected behavior)

2023-08-23 Thread Qing Zhao via Gcc-patches


> On Aug 18, 2023, at 12:00 PM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> 
> 
>> On Aug 17, 2023, at 5:32 PM, Siddhesh Poyarekar  wrote:
>> 
>> On 2023-08-17 17:25, Qing Zhao wrote:
 It's not exactly the same issue, the earlier discussion was about choosing 
 sizes in the same pass while the current one is about choosing between 
 passes, but I agree it "rhymes".  This is what I was alluding to 
 originally (for OST_MINIMUM use MIN_EXPR if both passes returned a pass) 
 but I haven't thought about it hard enough to be 100% confident that it's 
 the better solution, especially for OST_MAXIMUM.
>>> We have two different sources to get SIZE information for the subobject:
>>> 1. From the TYPESIZE information embedded in the IR;
>>> 2. From the initialization information propagated from data flow, this 
>>> includes both malloc call and the DECL_INIT.
>>> We need to choose between these two when both available, (these two 
>>> information could be
>>> in the same pass as we discussed before, or in different passes which is 
>>> shown in this discussion).
>>> I think that the MIN_EXPR might be the right choice (especially for 
>>> OST_MAXIMUM) -:)
>> 
>> It's worth a shot I guess.  We could emit something like the following in 
>> early_object_sizes_execute_one:
>> 
>> sz = (__bos(o->sub, ost) == unknown
>>   ? early_size
>>   : MIN_EXPR (__bos(o->sub, ost), early_size));
>> 
>> and see if it sticks.
> 
> I came up with the following change for tree-object-size.cc:
> 
> diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
> index a62af0500563..e1b2008c6dcc 100644
> --- a/gcc/tree-object-size.cc
> +++ b/gcc/tree-object-size.cc
> @@ -2016,10 +2016,22 @@ do_valueize (tree t)
>   return t;
> }
> 
> -/* Process a __builtin_object_size or __builtin_dynamic_object_size call in
> -   CALL early for subobjects before any object information is lost due to
> -   optimization.  Insert a MIN or MAX expression of the result and
> -   __builtin_object_size at I so that it may be processed in the second pass.
> +/* Process a __builtin_object_size or __builtin_dynamic_object_size call
> +   early for subobjects before any object information is lost due to
> +   optimization.
> +
> +   We have two different sources to get the size information for subobjects:
> +   A. The TYPE information of the subobject in the IR;
> +   B. The initialization information propagated through data flow.
> +   In the early pass, only A is available.
> +   B might be available in the second pass.
> +
> +   If both A and B are available, we should choose the minimum one between
> +   these two.
> +
> +   Insert a MIN expression of the result from the early pass and the original
> +   __builtin_object_size call at I so that it may be processed in the second 
> pass.
> +
>__builtin_dynamic_object_size is treated like __builtin_object_size here
>since we're only looking for constant bounds.  */
> 
> @@ -2036,7 +2048,7 @@ early_object_sizes_execute_one (gimple_stmt_iterator 
> *i, gimple *call)
>   unsigned HOST_WIDE_INT object_size_type = tree_to_uhwi (ost);
>   tree ptr = gimple_call_arg (call, 0);
> 
> -  if (object_size_type != 1 && object_size_type != 3)
> +  if ((object_size_type & OST_SUBOBJECT) == 0)
> return;
> 
>   if (TREE_CODE (ptr) != ADDR_EXPR && TREE_CODE (ptr) != SSA_NAME)
> @@ -2050,9 +2062,8 @@ early_object_sizes_execute_one (gimple_stmt_iterator 
> *i, gimple *call)
> 
>   tree tem = make_ssa_name (type);
>   gimple_call_set_lhs (call, tem);
> -  enum tree_code code = object_size_type & OST_MINIMUM ? MAX_EXPR : MIN_EXPR;
>   tree cst = fold_convert (type, bytes);
> -  gimple *g = gimple_build_assign (lhs, code, tem, cst);
> +  gimple *g = gimple_build_assign (lhs, MIN_EXPR, tem, cst);
>   gsi_insert_after (i, g, GSI_NEW_STMT);
>   update_stmt (call);
> }
> 
> Let me know if you see any issue with the change.
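
To make the two size sources concrete, a minimal sketch of what the MIN_EXPR
reconciles (illustrative only, not the failing test):

  #include <stdlib.h>

  struct S { char a[12]; int i; };

  size_t
  f (void)
  {
    struct S *p = malloc (8);
    /* Source A (type info): p->a is declared as 12 bytes.
       Source B (data flow): the allocation provides only 8 bytes.
       The minimum of the two, 8, is the safe subobject bound.  */
    return __builtin_object_size (p->a, 1);
  }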

I tested the above change, everything is fine except one testing case in 
gcc.dg/builtin-object-size-4.c

I reduced the failed case to the following small one:

#include 
#include 
#include 

/* Tests for strdup/strndup.  */
size_t
__attribute__ ((noinline))
test9 (void)
{
  const char *ptr = "abcdefghijklmnopqrstuvwxyz";
  char *res = strndup (ptr, 21);
  int n = 0;
  if ((n = __builtin_object_size (res, 3)) != 22)
printf("FAIL is %d\n", n);

  free (res);
}

int
main (void)
{
  test9 ();
}
[opc@qinzhao-ol8u3-x86 gcc]$ sh t
FAIL is 1

I debugged into tree-object-size.cc, the routine “strdup_object_size”, and
have two questions about two places:

1. For the following:

 844   /* In all other cases, return the size of SRC since the object size 
cannot
 845  exceed that.  We cannot do this for OST_MINIMUM unless SRC points 
into a
 846  string constant since otherwise the object size could go all the way 
down
 847  to zero.  */
…
 864   /* For maximum estimate, our next best guess is the object size of 
the
 865  source.  */
 866   if (size_unknown_p 

Re: [PATCH] Fix tests sensitive to internal library allocations

2023-08-23 Thread François Dumont via Gcc-patches



On 21/08/2023 23:26, Jonathan Wakely wrote:

On Mon, 21 Aug 2023 at 21:20, François Dumont  wrote:

Here is the updated and tested patch.

OK for trunk, thanks.

We could consider it for the branches too (I'm going to remove the
global strings on the gcc-13 branch tomorrow).


It's not fixing anything so I don't think it's worth it, but let me know if
you want me to do so.





Re: [PATCH] AArch64: Fix MOPS memmove operand corruption [PR111121]

2023-08-23 Thread Wilco Dijkstra via Gcc-patches
Hi Richard,

(that's quick!)

> +  if (size > max_copy_size || size > max_mops_size)
> +return aarch64_expand_cpymem_mops (operands, is_memmove);
>
> Could you explain this a bit more?  If I've followed the logic correctly,
> max_copy_size will always be 0 for movmem, so this "if" condition will
> always be true for movmem (given that the caller can be relied on to
> optimise away zero-length copies).  So doesn't this function reduce to:

In this patch it is zero yes, but there is no real reason for that. The goal is 
to
share as much code as possible. I have a patch that inlines memmove like
memcpy.

> when is_memmove is true?  If so, I think it would be clearer to do that
> directly, rather than go through aarch64_expand_cpymem.  max_copy_size
> is really an optimisation threshold, whereas the above seems to be
> leaning on it for correctness.

In principle we could for the time being add an assert (!is_memmove) if that
makes it clearer that memmove isn't yet handled.

> ...I think we might as well keep this pattern conditional on TARGET_MOPS.

But then we have inconsistencies in the conditions of the expanders, which
is what led to all these bugs in the first place (I lost count, there are 4 or 5
different bugs I fixed). Ensuring everything is 100% identical between
memcpy and memmove makes the code much easier to follow.

> I think we can then also split:
>
>   /* All three registers are changed by the instruction, so each one
>  must be a fresh pseudo.  */
>   rtx dst_addr = copy_to_mode_reg (Pmode, XEXP (operands[0], 0));
>   rtx src_addr = copy_to_mode_reg (Pmode, XEXP (operands[1], 0));
>   rtx dst_mem = replace_equiv_address (operands[0], dst_addr);
>   rtx src_mem = replace_equiv_address (operands[1], src_addr);
>   rtx sz_reg = copy_to_mode_reg (DImode, operands[2]);
>
> out of aarch64_expand_cpymem_mops into a new function (say
> aarch64_prepare_mops_operands) and call it from the movmemdi
> expander.  There should then be no need for the extra staging
> expander (aarch64_movmemdi).

So you're saying we could remove aarch64_cpymemdi/movmemdi if
aarch64_expand_cpymem_mops did massage the operands in the
right way so that we can immediately match the underlying instruction?

Hmm, does that actually work, as in we don't lose the extra alias info that
gets lost in the current memmove expander? (another bug/inconsistency)

And the MOPS code would be separated from aarch64_expand_cpymem
so we'd do all the MOPS size tests inside aarch64_expand_cpymem_mops
and the expander tries using MOPS first and if it fails try inline expansion?

So something like:

(define_expand "movmemdi"

  if (aarch64_try_mops_expansion (operands, is_memmove))
    DONE;
  if (aarch64_try_inline_copy_expansion (operands, is_memmove))
    DONE;
  FAIL;
)

> IMO the STRICT_ALIGNMENT stuff should be a separate patch,
> with its own testcases.

We will need backports to fix all these bugs, so the question is whether it
is worth doing a lot of cleanups now?

Cheers,
Wilco


[PATCH][_GLIBCXX_INLINE_VERSION] Fix friend declarations

2023-08-23 Thread François Dumont via Gcc-patches

Hi

The few tests that are failing in versioned namespace mode are due to 
those friend declarations.


This is a fix proposal even if I considered 2 other options:

1. Make __format::_Arg_store a struct and so do not bother with friend 
declarations.


2. Consider it as a compiler bug and do nothing. In this case I think we
might still need this patch to avoid a non-working format library in
versioned namespace mode in GCC 14 if the compiler bug is not fixed.


I can also define _GLIBCXX_STD_V at <format> level to limit impact.
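
For reference, here is a minimal sketch of the problem (hypothetical names;
it only assumes the friend-matching behaviour described in the commit
message below):

// Built with the versioned namespace active, so std::__8 is inline.
namespace std
{
  inline namespace __8
  {
    template<typename _Context> class basic_format_args;
  }
}

struct _Ctx { };

class _Arg_store_like
{
  // Intended to befriend std::__8::basic_format_args<_Ctx>, but the
  // friend declaration is reportedly not matched against the inline
  // namespace member, so access from it fails.
  friend std::basic_format_args<_Ctx>;

  // Spelling the inline namespace explicitly works around it:
  // friend std::__8::basic_format_args<_Ctx>;
};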

    libstdc++: [_GLIBCXX_INLINE_VERSION] Fix <format> friend declarations

    GCC does not consider the inline namespace in friend declarations.  We
    need to make this namespace explicit.

    libstdc++-v3/ChangeLog:

    * include/bits/c++config (_GLIBCXX_STD_V): New macro giving the
    current std namespace, optionally with the version namespace.
    * include/std/format (std::__format::_Arg_store): Use the latter
    in friend declarations.

Tested under versioned mode.

Ok to commit ?

François
diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 0a41cdd29a9..a917fb58225 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -449,6 +449,7 @@ _GLIBCXX_END_NAMESPACE_VERSION
 // of some nested namespace within it corresponding to the active mode.
 // _GLIBCXX_STD_A
 // _GLIBCXX_STD_C
+// _GLIBCXX_STD_V
 //
 // Macros for opening/closing conditional namespaces.
 // _GLIBCXX_BEGIN_NAMESPACE_ALGO
@@ -477,6 +478,12 @@ _GLIBCXX_END_NAMESPACE_VERSION
 # define _GLIBCXX_END_NAMESPACE_ALGO
 #endif
 
+#if _GLIBCXX_INLINE_VERSION
+# define _GLIBCXX_STD_V std::__8
+#else
+# define _GLIBCXX_STD_V std
+#endif
+
 // GLIBCXX_ABI Deprecated
 // Define if compatibility should be provided for -mlong-double-64.
 #undef _GLIBCXX_LONG_DOUBLE_COMPAT
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index f3d9ae152f9..94417c321e4 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3429,11 +3429,11 @@ namespace __format
   template
 class __format::_Arg_store
 {
-  friend std::basic_format_args<_Context>;
+  friend _GLIBCXX_STD_V::basic_format_args<_Context>;
 
   template
 	friend auto
-	std::make_format_args(_Argz&&...) noexcept;
+	_GLIBCXX_STD_V::make_format_args(_Argz&&...) noexcept;
 
   // For a sufficiently small number of arguments we only store values.
   // basic_format_args can get the types from the _Args pack.


Re: [PATCH V4] Add warning options -W[no-]compare-distinct-pointer-types

2023-08-23 Thread Marek Polacek via Gcc-patches
On Thu, Aug 17, 2023 at 05:37:03PM +0200, Jose E. Marchesi via Gcc-patches 
wrote:
> 
> > On Thu, 17 Aug 2023, Jose E. Marchesi via Gcc-patches wrote:
> >
> >> +@opindex Wcompare-distinct-pointer-types
> >> +@item -Wcompare-distinct-pointer-types
> >
> > This @item should say @r{(C and Objective-C only)}, since the option isn't 
> > implemented for C++.  OK with that change.
> 
> Pushed with that change.
> Thanks for the prompt review!

I see the following failures:

FAIL: gcc.c-torture/compile/pr106537-1.c   -Os   (test for warnings, line 28)
FAIL: gcc.c-torture/compile/pr106537-1.c   -Os   (test for warnings, line 30)
FAIL: gcc.c-torture/compile/pr106537-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none   (test for warnings, line 28)
FAIL: gcc.c-torture/compile/pr106537-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none   (test for warnings, line 30)
FAIL: gcc.c-torture/compile/pr106537-1.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   (test for warnings, line 28)
FAIL: gcc.c-torture/compile/pr106537-1.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   (test for warnings, line 30)
FAIL: gcc.c-torture/compile/pr106537-2.c   -O0   (test for warnings, line 26)
FAIL: gcc.c-torture/compile/pr106537-2.c   -O0   (test for warnings, line 28)
FAIL: gcc.c-torture/compile/pr106537-2.c   -O1   (test for warnings, line 26)
FAIL: gcc.c-torture/compile/pr106537-2.c   -O1   (test for warnings, line 28)
FAIL: gcc.c-torture/compile/pr106537-2.c   -O2   (test for warnings, line 26)
FAIL: gcc.c-torture/compile/pr106537-2.c   -O2   (test for warnings, line 28)
FAIL: gcc.c-torture/compile/pr106537-2.c   -O3 -g   (test for warnings, line 26)
FAIL: gcc.c-torture/compile/pr106537-2.c   -O3 -g   (test for warnings, line 28)
FAIL: gcc.c-torture/compile/pr106537-2.c   -Os   (test for warnings, line 26)
FAIL: gcc.c-torture/compile/pr106537-2.c   -Os   (test for warnings, line 28)
FAIL: gcc.c-torture/compile/pr106537-2.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none   (test for warnings, line 26)
FAIL: gcc.c-torture/compile/pr106537-2.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none   (test for warnings, line 28)
FAIL: gcc.c-torture/compile/pr106537-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   (test for warnings, line 26)
FAIL: gcc.c-torture/compile/pr106537-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   (test for warnings, line 28)

The problem is that for ==/!=, when one of the types is void*,
build_binary_op goes to the branch attempting to warn about
comparing void* with a function pointer, and never gets to the 
-Wcompare-distinct-pointer-types warning.
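
A reduced illustration of the kind of comparison involved (hypothetical
code, not the exact testsuite content):

/* Compile with -Wcompare-distinct-pointer-types.  One operand is void*,
   so build_binary_op returns from the void*-handling branch before the
   new check is reached, and the expected warning is never emitted.  */
extern void *vp;
extern int *ip;

int
f (void)
{
  return vp == ip;
}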

Marek



Re: [PATCH] AArch64: Fix MOPS memmove operand corruption [PR111121]

2023-08-23 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Hi Richard,
>
> (that's quick!)
>
>> +  if (size > max_copy_size || size > max_mops_size)
>> +return aarch64_expand_cpymem_mops (operands, is_memmove);
>>
>> Could you explain this a bit more?  If I've followed the logic correctly,
>> max_copy_size will always be 0 for movmem, so this "if" condition will
>> always be true for movmem (given that the caller can be relied on to
>> optimise away zero-length copies).  So doesn't this function reduce to:
>
> In this patch it is zero yes, but there is no real reason for that. The goal 
> is to
> share as much code as possible. I have a patch that inlines memmove like
> memcpy.

But I think this part of the patch belongs in that future series.
The current patch should just concentrate on fixing the bug.

It's difficult to evaluate the change at the moment, without the follow-on
change that it's preparing for.  I don't think it stands as an independent
improvement in its own right.

>> when is_memmove is true?  If so, I think it would be clearer to do that
>> directly, rather than go through aarch64_expand_cpymem.  max_copy_size
>> is really an optimisation threshold, whereas the above seems to be
>> leaning on it for correctness.
>
> In principle we could for the time being add a assert (!is_memmove) if that
> makes it clearer memmove isn't yet handled.

I think for this patch movmemdi should just call aarch64_expand_cpymem_mops
directly.  Let's leave the aarch64_expand_cpymem changes to other patches.
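
Something of that shape, for concreteness (a sketch only, reusing the names
from this discussion):

(define_expand "movmemdi"
  [...]
  "TARGET_MOPS"
{
  if (aarch64_expand_cpymem_mops (operands, /*is_memmove=*/true))
    DONE;
  FAIL;
})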

>> ...I think we might as well keep this pattern conditional on TARGET_MOPS.
>
> But then we have inconsistencies in the conditions of the expanders, which
> is what led to all these bugs in the first place (I lost count, there are 4 
> or 5
> different bugs I fixed). Ensuring everything is 100% identical between
> memcpy and memmove makes the code much easier to follow.

I think that too should be part of your follow-on changes to do inline
movmem expansions without TARGET_MOPS.  While all supported movmemdis
require TARGET_MOPS, I think the expander should too.

>> I think we can then also split:
>>
>>   /* All three registers are changed by the instruction, so each one
>>  must be a fresh pseudo.  */
>>   rtx dst_addr = copy_to_mode_reg (Pmode, XEXP (operands[0], 0));
>>   rtx src_addr = copy_to_mode_reg (Pmode, XEXP (operands[1], 0));
>>   rtx dst_mem = replace_equiv_address (operands[0], dst_addr);
>>   rtx src_mem = replace_equiv_address (operands[1], src_addr);
>>   rtx sz_reg = copy_to_mode_reg (DImode, operands[2]);
>>
>> out of aarch64_expand_cpymem_mops into a new function (say
>> aarch64_prepare_mops_operands) and call it from the movmemdi
>> expander.  There should then be no need for the extra staging
>> expander (aarch64_movmemdi).
>
> So you're saying we could remove aarch64_cpymemdi/movmemdi if
> aarch64_expand_cpymem_mops did massage the operands in the
> right way so that we can immediately match the underlying instruction?

Yeah.  But I'd forgotten about the pesky fourth (alignment) operand
to movmemdi and cpymemdi, which we don't need for the mops patterns.
So I take that part back.  I agree it's clearer to have a separate
aarch64_movmemdi expander.

> Hmm, does that actually work, as in we don't lose the extra alias info that
> gets lost in the current memmove expander? (another bug/inconsistency)
>
> And the MOPS code would be separated from aarch64_expand_cpymem
> so we'd do all the MOPS size tests inside aarch64_expand_cpymem_mops
> and the expander tries using MOPS first and if it fails try inline expansion?
>
> So something like:
>
> (define_expand "movmemdi"
> 
>   if (aarch64_try_mops_expansion (operands, is_memmove))
>     DONE;
>   if (aarch64_try_inline_copy_expansion (operands, is_memmove))
>     DONE;
>   FAIL;
> )
>
>> IMO the STRICT_ALIGNMENT stuff should be a separate patch,
>> with its own testcases.
>
> We will need backports to fix all these bugs, so the question is whether it
> is worth doing a lot of cleanups now?

But I think what I'm asking for is significantly simpler than the
original patch.  That should make it more backportable rather than less.

Thanks,
Richard


[COMMITTED 1/2] Phi analyzer - Do not create phi groups with a single phi.

2023-08-23 Thread Andrew MacLeod via Gcc-patches
Ranger's PHI analyzer was creating a group consisting of a single PHI,
which was problematic.  It didn't really help anything, and it prevented
larger groups from including those PHIs, which stopped some useful things
from happening.


Bootstrapped on x86_64-pc-linux-gnu  with no regressions. Pushed.

Andrew
From 9855b3f0a2869d456f0ee34a94a1231eb6d44c4a Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 16 Aug 2023 13:23:06 -0400
Subject: [PATCH 1/4] Don't process phi groups with one phi.

The phi analyzer should not create a phi group containing a single phi.

	* gimple-range-phi.cc (phi_analyzer::operator[]): Return NULL if
	no group was created.
	(phi_analyzer::process_phi): Do not create groups of one phi node.
---
 gcc/gimple-range-phi.cc | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/gcc/gimple-range-phi.cc b/gcc/gimple-range-phi.cc
index ffb4691d06b..a94b90a4660 100644
--- a/gcc/gimple-range-phi.cc
+++ b/gcc/gimple-range-phi.cc
@@ -344,9 +344,10 @@ phi_analyzer::operator[] (tree name)
   process_phi (as_a (SSA_NAME_DEF_STMT (name)));
   if (bitmap_bit_p (m_simple, v))
 	return  NULL;
-  // If m_simple bit isn't set, then process_phi allocated the table
-  // and should have a group.
-  gcc_checking_assert (v < m_tab.length ());
+ // If the m_simple bit isn't set and process_phi didn't allocate the table,
+ // no group was created, so return NULL.
+ if (v >= m_tab.length ())
+  return NULL;
 }
   return m_tab[v];
 }
@@ -363,6 +364,7 @@ phi_analyzer::process_phi (gphi *phi)
   unsigned x;
   m_work.truncate (0);
   m_work.safe_push (gimple_phi_result (phi));
+  unsigned phi_count = 1;
   bitmap_clear (m_current);
 
   // We can only have 2 externals: an initial value and a modifier.
@@ -407,6 +409,7 @@ phi_analyzer::process_phi (gphi *phi)
 	  gimple *arg_stmt = SSA_NAME_DEF_STMT (arg);
 	  if (arg_stmt && is_a (arg_stmt))
 		{
+		  phi_count++;
 		  m_work.safe_push (arg);
 		  continue;
 		}
@@ -430,9 +433,12 @@ phi_analyzer::process_phi (gphi *phi)
 	}
 }
 
-  // If there are no names in the group, we're done.
-  if (bitmap_empty_p (m_current))
+  // If there are fewer than 2 names, just return.  This PHI may be included
+  // by another PHI, making it simple, and a group of one would prevent a
+  // larger group from being formed.
+  if (phi_count < 2)
 return;
+  gcc_checking_assert (!bitmap_empty_p (m_current));
 
   phi_group *g = NULL;
   if (cycle_p)
-- 
2.41.0



[COMMITTED 2/2] tree-optimization/110918 - Phi analyzer - Initialize with a range instead of a tree.

2023-08-23 Thread Andrew MacLeod via Gcc-patches
Ranger's PHI analyzer currently only allows a single initializing value
for a group.  This patch changes that to use an initialization range, which
accumulates all integer constants, plus a single symbolic value.  There were
many times when multiple constants fed into PHIs, and there is no reason to
disqualify those from determining whether there is a better starting range
for a PHI.
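
As an illustration (a hypothetical reduction, not the PR testcase):

/* The loop-header PHI for x ultimately merges the constant initializers
   0 and 5 with the loop-carried x + 2; an initializer range lets the
   analyzer fold the constants into [0, 5] instead of giving up because
   there is more than one initializing value.  */
int f (int n, int flag)
{
  int x = flag ? 0 : 5;
  while (x < n)
    x += 2;
  return x;
}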


This patch also changes the way PHI groups are printed so they show up 
in the listing as they are encountered, rather than as a list at the 
end.  It was quite difficult to see what was going on when it simply 
dumped the groups at the end of processing.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From bd50bbfa95e51edf51392f147e9a860adb5f495e Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 17 Aug 2023 12:34:59 -0400
Subject: [PATCH 2/4] Phi analyzer - Initialize with range instead of a tree.

Ranger's PHI analyzer currently only allows a single initializer for a group.
This patch changes that to use an initialization range, which is
cumulative of all integer constants, plus a single symbolic value.  There is
no other change to group functionality.

This patch also changes the way PHI groups are printed so they show up in the
listing as they are encountered, rather than as a list at the end.  It
was more difficult to see what was going on previously.

	PR tree-optimization/110918 - Initialize with range instead of a tree.
	gcc/
	* gimple-range-fold.cc (fold_using_range::range_of_phi): Tweak output.
	* gimple-range-phi.cc (phi_group::phi_group): Remove unused members.
	Initialize using a range instead of value and edge.
	(phi_group::calculate_using_modifier): Use initializer value and
	process for relations after trying for iteration convergence.
	(phi_group::refine_using_relation): Use initializer range.
	(phi_group::dump): Rework the dump output.
	(phi_analyzer::process_phi): Allow multiple constant initializers.
	Dump groups immediately as created.
	(phi_analyzer::dump): Tweak output.
	* gimple-range-phi.h (phi_group::phi_group): Adjust prototype.
	(phi_group::initial_value): Delete.
	(phi_group::refine_using_relation): Adjust prototype.
	(phi_group::m_initial_value): Delete.
	(phi_group::m_initial_edge): Delete.
	(phi_group::m_vr): Use int_range_max.
	* tree-vrp.cc (execute_ranger_vrp): Don't dump phi groups.

	gcc/testsuite/
	* gcc.dg/pr102983.c: Adjust output expectations.
	* gcc.dg/pr110918.c: New.
---
 gcc/gimple-range-fold.cc|   6 +-
 gcc/gimple-range-phi.cc | 186 
 gcc/gimple-range-phi.h  |   9 +-
 gcc/testsuite/gcc.dg/pr102983.c |   2 +-
 gcc/testsuite/gcc.dg/pr110918.c |  26 +
 gcc/tree-vrp.cc |   5 +-
 6 files changed, 129 insertions(+), 105 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr110918.c

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 7fa5a27cb12..8ebff7f5980 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -953,7 +953,7 @@ fold_using_range::range_of_phi (vrange &r, gphi *phi, fur_source &src)
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 		{
-		  fprintf (dump_file, "   Loops range found for ");
+		  fprintf (dump_file, "Loops range found for ");
 		  print_generic_expr (dump_file, phi_def, TDF_SLIM);
 		  fprintf (dump_file, ": ");
 		  loop_range.dump (dump_file);
@@ -975,9 +975,9 @@ fold_using_range::range_of_phi (vrange &r, gphi *phi, fur_source &src)
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	{
-	  fprintf (dump_file, "   PHI group range found for ");
+	  fprintf (dump_file, "PHI GROUP query for ");
 	  print_generic_expr (dump_file, phi_def, TDF_SLIM);
-	  fprintf (dump_file, ": ");
+	  fprintf (dump_file, " found : ");
 	  g->range ().dump (dump_file);
 	  fprintf (dump_file, " and adjusted original range from :");
 	  r.dump (dump_file);
diff --git a/gcc/gimple-range-phi.cc b/gcc/gimple-range-phi.cc
index a94b90a4660..9884a0ebbb0 100644
--- a/gcc/gimple-range-phi.cc
+++ b/gcc/gimple-range-phi.cc
@@ -79,39 +79,33 @@ phi_analyzer &phi_analysis ()
 phi_group::phi_group (const phi_group &g)
 {
   m_group = g.m_group;
-  m_initial_value = g.m_initial_value;
-  m_initial_edge = g.m_initial_edge;
   m_modifier = g.m_modifier;
   m_modifier_op = g.m_modifier_op;
   m_vr = g.m_vr;
 }
 
-// Create a new phi_group with members BM, initialvalue INIT_VAL, modifier
-// statement MOD, and resolve values using query Q.
-// Calculate the range for the gropup if possible, otherwise set it to
-// VARYING.
+// Create a new phi_group with members BM, initial range INIT_RANGE, modifier
+// statement MOD on edge MOD_EDGE, and resolve values using query Q.  Calculate
+// the range for the group if possible, otherwise set it to VARYING.
 
-phi_group::phi_group (bitmap bm, tree init_val, edge e, gimple *mod,
+phi_group::phi_group (bitmap bm, irange &init_range, gimple *mod,
 		  

[PATCH] Fix for bug libstdc++/111102 pointer arithmetic on nullptr

2023-08-23 Thread Paul Dreik via Gcc-patches
This fixes pointer arithmetic made on a null pointer, which I found 
through fuzzing.

Tested on debian/amd64.

Thanks, Paul


commit 78ac41590432f4f01036797fd9d661f6ed80cf37 (HEAD -> master)
Author: Paul Dreik 
Date:   Tue Aug 22 19:16:57 2023 +0200

libstdc++: fix illegal pointer arithmetic in format

When parsing a format string, the width is parsed into an unsigned short,
but the result is not checked in the case the format string is not a
char string (such as a wide string).  If the parse fails, a null
pointer is returned, which is then used for pointer arithmetic, which
is undefined behaviour.

Signed-off-by: Paul Dreik 

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format

index f3d9ae152f..fe2caa5868 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -285,7 +285,8 @@ namespace __format
  for (int __i = 0; __i < __n && (__first + __i) != __last; ++__i)
__buf[__i] = __first[__i];
	 auto [__v, __ptr] = __format::__parse_integer(__buf, __buf + __n);

- return {__v, __first + (__ptr - __buf)};
+ if (__ptr) [[likely]]
+   return {__v, __first + (__ptr - __buf)};
}
   return {0, nullptr};
 }




[PATCH] Fortran: improve diagnostic message for COMMON with automatic object [PR32986]

2023-08-23 Thread Harald Anlauf via Gcc-patches
Dear all,

here's a simple patch for a very old PR that suggests a more helpful
error message for an automatic object in a COMMON.  The patch also
suppresses the less helpful old error message after the new one has
been emitted.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 829c0c06fe7ba2cf3e83508b95999b884b21236d Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 23 Aug 2023 21:08:01 +0200
Subject: [PATCH] Fortran: improve diagnostic message for COMMON with automatic
 object [PR32986]

gcc/fortran/ChangeLog:

	PR fortran/32986
	* resolve.cc (is_non_constant_shape_array): Add forward declaration.
	(resolve_common_vars): Diagnose automatic array object in COMMON.
	(resolve_symbol): Prevent confusing follow-on error.

gcc/testsuite/ChangeLog:

	PR fortran/32986
	* gfortran.dg/common_28.f90: New test.
---
 gcc/fortran/resolve.cc  | 15 ++-
 gcc/testsuite/gfortran.dg/common_28.f90 |  7 +++
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/common_28.f90

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index ce8261d646a..1042b8c18e8 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -959,6 +959,10 @@ cleanup:
 }


+/* Forward declaration.  */
+static bool is_non_constant_shape_array (gfc_symbol *sym);
+
+
 /* Resolve common variables.  */
 static void
 resolve_common_vars (gfc_common_head *common_block, bool named_common)
@@ -1007,6 +1011,15 @@ resolve_common_vars (gfc_common_head *common_block, bool named_common)
 	gfc_error_now ("%qs at %L cannot appear in COMMON "
 		   "[F2008:C5100]", csym->name, &csym->declared_at);

+  if (csym->attr.dimension && is_non_constant_shape_array (csym))
+	{
+	  gfc_error_now ("Automatic object %qs at %L cannot appear in "
+			 "COMMON at %L", csym->name, &csym->declared_at,
+			 &common_block->where);
+	  /* Avoid confusing follow-on error.  */
+	  csym->error = 1;
+	}
+
   if (csym->ts.type != BT_DERIVED)
 	continue;

@@ -16612,7 +16625,7 @@ resolve_symbol (gfc_symbol *sym)
   /* Resolve array specifier. Check as well some constraints
  on COMMON blocks.  */

-  check_constant = sym->attr.in_common && !sym->attr.pointer;
+  check_constant = sym->attr.in_common && !sym->attr.pointer && !sym->error;

   /* Set the formal_arg_flag so that check_conflict will not throw
  an error for host associated variables in the specification
diff --git a/gcc/testsuite/gfortran.dg/common_28.f90 b/gcc/testsuite/gfortran.dg/common_28.f90
new file mode 100644
index 000..9b583b9948d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/common_28.f90
@@ -0,0 +1,7 @@
+! { dg-do compile }
+! PR fortran/32986 - Improve diagnostic message for COMMON with automatic object
+
+function a(n)
+  real :: x(n) ! { dg-error "Automatic object" }
+  common /c/ x ! { dg-error "cannot appear in COMMON" }
+end function
--
2.35.3



Re: [PATCH v7 1/4] driver: add a spec function to join arguments

2023-08-23 Thread Jason Merrill via Gcc-patches

On 7/2/23 12:32, Ben Boeckel wrote:

When passing `-o` flags to other options, the typical `-o foo` spelling
leaves a leading whitespace when replacing elsewhere. This ends up
creating flags spelled as `-some-option-with-arg= foo.ext` which doesn't
parse properly. When attempting to make a spec function to just remove
the leading whitespace, the argument splitting ends up masking the
whitespace. However, the intended extension *also* ends up being its own
argument.


Odd.  I looked into this to figure out what was going on, and now I 
understand: when process_brace_body handles e.g. %{o*:-fjoined=%.x%*}, 
first it replaces $* with the rest of the flag, i.e. "", resulting in 
-fjoined=, and then adds the argument as a separate argument to the 
result of substitution.  This seems strange, but works fine for the 
existing uses that build one Separate switch from another.


The other oddity you mention comes from

  /* End of string.  If we are processing a spec function, we need to   
 end any pending argument.  */

  if (processing_spec_function)
end_going_arg ();


so that when give_switch calls do_spec_1 twice for the basename and 
suffix, they end up as separate arguments to the spec function.  I don't 
know the purpose of this code; it doesn't seem to have been necessary 
for the if-exists spec function that was added in the same patch 
(r59241).  Removing this doesn't seem to break anything for me.


The join function works around both of these issues.  But I notice that 
the use of reconcat is a memory leak, and since we have the obstack 
available I've tweaked the function to use it.  I also added some 
documentation.


Joseph, any thoughts on these issues or the workaround?

Jason

From ca7c76c5b44e76e8596bf8db68d6210fd9ddf113 Mon Sep 17 00:00:00 2001
From: Ben Boeckel 
Date: Sun, 2 Jul 2023 12:32:08 -0400
Subject: [PATCH] driver: add a spec function to join arguments
To: gcc-patches@gcc.gnu.org

When passing `-o` flags to other options, the typical `-o foo` spelling
leaves a leading whitespace when replacing elsewhere. This ends up
creating flags spelled as `-some-option-with-arg= foo.ext` which doesn't
parse properly. When attempting to make a spec function to just remove
the leading whitespace, the argument splitting ends up masking the
whitespace. However, the intended extension *also* ends up being its own
argument. To perform the desired behavior, the arguments need to be
concatenated together.

gcc/:

	* gcc.cc (join_spec_func): Add a spec function to join all
	arguments.
	* doc/invoke.texi: Document it.

Signed-off-by: Ben Boeckel 
Co-authored-by: Jason Merrill 
---
 gcc/doc/invoke.texi | 10 ++
 gcc/gcc.cc  | 23 +++
 2 files changed, 33 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ef3f4098986..7c475ee5c82 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -35236,6 +35236,16 @@ usage:
 -l%:if-exists-then-else(%:getenv(VSB_DIR rtnet.h) rtnet net)
 @end smallexample
 
+@item @code{join}
+The @code{join} spec function returns the concatenation of its
+arguments.  This is currently used to build e.g. @samp{-fjoined=foo.b}
+from @samp{-fseparate foo.a}, as the behavior of @samp{%*} without this
+function produces the broken @samp{-fjoined= foo.b} instead.
+
+@smallexample
+%@{-fseparate*:-fjoined=%:join(%.b%*)@}
+@end smallexample
+
 @item @code{sanitize}
 The @code{sanitize} spec function takes no arguments.  It returns non-NULL if
 any address, thread or undefined behavior sanitizers are active.
diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index fdfac0b4fe4..4c4e81dee50 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -447,6 +447,7 @@ static const char *greater_than_spec_func (int, const char **);
 static const char *debug_level_greater_than_spec_func (int, const char **);
 static const char *dwarf_version_greater_than_spec_func (int, const char **);
 static const char *find_fortran_preinclude_file (int, const char **);
+static const char *join_spec_func (int, const char **);
 static char *convert_white_space (char *);
 static char *quote_spec (char *);
 static char *quote_spec_arg (char *);
@@ -1772,6 +1773,7 @@ static const struct spec_function static_spec_functions[] =
   { "debug-level-gt",		debug_level_greater_than_spec_func },
   { "dwarf-version-gt",		dwarf_version_greater_than_spec_func },
   { "fortran-preinclude-file",	find_fortran_preinclude_file},
+  { "join",			join_spec_func},
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -10975,6 +10977,27 @@ find_fortran_preinclude_file (int argc, const char **argv)
   return result;
 }
 
+/* The function takes any number of arguments and joins them together.
+
+   This seems to be necessary to build "-fjoined=foo.b" from "-fseparate foo.a"
+   with a %{fseparate*:-fjoined=%.b$*} rule without adding undesired spaces:
+   when doing $* replacement we first replace $* with the rest of the switch
+   (i

Re: [PATCH] Fortran: improve diagnostic message for COMMON with automatic object [PR32986]

2023-08-23 Thread Steve Kargl via Gcc-patches
On Wed, Aug 23, 2023 at 09:16:08PM +0200, Harald Anlauf via Fortran wrote:
> 
> here's a simple patch for a very old PR that suggests a more helpful
> error message for an automatic object in a COMMON.  The patch also
> suppresses the less helpful old error message after the new one has
> been emitted.
> 
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
> 

OK.  I leave the decision on backporting to you.

-- 
Steve


Re: [PATCH v7 2/4] p1689r5: initial support

2023-08-23 Thread Jason Merrill via Gcc-patches

On 7/2/23 12:32, Ben Boeckel wrote:

This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
   `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdeps-target=` specifies the `.o` that will be written for the TU
   that is scanned. This is required so that the build system can
   correlate the dependency output with the actual compilation that will
   occur.
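
A hypothetical scanning invocation combining the three flags (illustrative
only; the exact set of companion options is not specified here):

  g++ -std=c++20 -E -x c++ foo.cpp -o foo.ii \
      -fdeps-format=p1689r5 -fdeps-file=foo.ddi -fdeps-target=foo.o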

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously representable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.


Thank you for your patience, just a few tweaks left and I think we can 
put this all in this week or next.



diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index aef703f8111..141cfd60eda 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -302,6 +302,9 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t;
  /* Style of header dependencies to generate.  */
  enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM };
  
+/* Structured format of module dependencies to generate.  */
+enum cpp_fdeps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 };


These should be FDEPS_FMT_* or just FDEPS_*.


@@ -395,10 +423,16 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
   if (colmax && colmax < 34)
 colmax = 34;
 
+  /* Write out C++ modules information if no other `-fdeps-format=`
+     option is given.  */
+  cpp_fdeps_format fdeps_format = CPP_OPTION (pfile, deps.fdeps_format);
+  bool write_make_modules_deps = fdeps_format == DEPS_FMT_NONE &&
+    CPP_OPTION (pfile, deps.modules);


We typically format an expression like this as:


+  bool write_make_modules_deps = (fdeps_format == FDEPS_FMT_NONE
+                                  && CPP_OPTION (pfile, deps.modules));



@@ -473,6 +507,117 @@ deps_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
   make_write (pfile, fp, colmax);
 }
 
+static void
+p1689r5_write_filepath (const char *name, FILE *fp)


This and the other p1689r5 functions need more comments, at least one at 
the top explaining their purpose.


Jason



Re: [PATCH] debug/111080 - avoid outputting debug info for unused restrict qualified type

2023-08-23 Thread Jason Merrill via Gcc-patches

On 8/21/23 05:11, Richard Biener wrote:

The following applies some maintenance with respect to type qualifiers
and kinds added by later DWARF standards to prune_unused_types_walk.
The particular case in the bug is not handling (thus marking required)
all restrict qualified type DIEs.  I've found more DW_TAG_*_type that
are unhandled, looked up the DWARF docs and added them as well based
on common sense.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?


OK.


Thanks,
Richard.

PR debug/111080
* dwarf2out.cc (prune_unused_types_walk): Handle
DW_TAG_restrict_type, DW_TAG_shared_type, DW_TAG_atomic_type,
DW_TAG_immutable_type, DW_TAG_coarray_type, DW_TAG_unspecified_type
and DW_TAG_dynamic_type as to only output them when referenced.

* gcc.dg/debug/dwarf2/pr111080.c: New testcase.
---
  gcc/dwarf2out.cc |  7 +++
  gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c | 18 ++
  2 files changed, 25 insertions(+)
  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index fa0fe4c41bb..69018bde238 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -30141,8 +30141,13 @@ prune_unused_types_walk (dw_die_ref die)
  case DW_TAG_reference_type:
  case DW_TAG_rvalue_reference_type:
  case DW_TAG_volatile_type:
+case DW_TAG_restrict_type:
+case DW_TAG_shared_type:
+case DW_TAG_atomic_type:
+case DW_TAG_immutable_type:
  case DW_TAG_typedef:
  case DW_TAG_array_type:
+case DW_TAG_coarray_type:
  case DW_TAG_friend:
  case DW_TAG_enumeration_type:
  case DW_TAG_subroutine_type:
@@ -30151,6 +30156,8 @@ prune_unused_types_walk (dw_die_ref die)
  case DW_TAG_subrange_type:
  case DW_TAG_ptr_to_member_type:
  case DW_TAG_file_type:
+case DW_TAG_unspecified_type:
+case DW_TAG_dynamic_type:
/* Type nodes are useful only when other DIEs reference them --- don't
 mark them.  */
/* FALLTHROUGH */
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c b/gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c
new file mode 100644
index 000..3949d7e7c64
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -gdwarf-3 -dA" } */
+
+struct foo {
+int field_number_1;
+int field_number_2;
+int field_number_3;
+int field_number_4;
+int field_number_5;
+};
+
+typedef int fun_t(struct foo *restrict);
+
+int main() {
+return 0;
+}
+
+/* { dg-final { scan-assembler-not "DW_TAG_structure_type" } } */




Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers

2023-08-23 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 15, 2023 at 04:08:47PM -0400, Lewis Hyatt wrote:
> On Tue, Aug 15, 2023 at 3:46 PM David Malcolm  wrote:
> >
> > On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote:
> > > On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> > > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > > > This patch enhances location_get_source_line(), which is the
> > > > > primary
> > > > > interface provided by the diagnostics infrastructure to obtain
> > > > > the line of
> > > > > source code corresponding to a given location, so that it
> > > > > understands
> > > > > generated data locations in addition to normal file-based
> > > > > locations. This
> > > > > involves changing the argument to location_get_source_line() from
> > > > > a plain
> > > > > file name, to a source_id object that can represent either type
> > > > > of location.
> > > > >
> >
> > [...]
> >
> > > > >
> > > > >
> > > > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > > > index 9377020b460..790279d4273 100644
> > > > > --- a/gcc/input.cc
> > > > > +++ b/gcc/input.cc
> > > > > @@ -207,6 +207,28 @@ private:
> > > > >void maybe_grow ();
> > > > >  };
> > > > >
> > > > > +/* This is the implementation of cache_data_source for generated
> > > > > +   data that is already in memory.  */
> > > > > +class data_cache_slot final : public cache_data_source
> > > >
> > > > It occurred to me: why are we caching accessing a buffer that's
> > > > already
> > > > in memory - but we're also caching the line-splitting information,
> > > > and
> > > > providing the line-splitting algorithm with a consistent interface
> > > > to
> > > > the data, right?
> > > >
> > >
> > > Yeah, for the current _Pragma use case, multi-line buffers are not
> > > going to
> > > be common, but they can occur. I was mainly motivated by the
> > > consistent
> > > interface, and by the assumption that the overhead is not critical
> > > given a
> > > diagnostic is being issued.
> >
> > (nods)
> >
> > >
> > > > [...snip...]
> > > >
> > > > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file
> > > > > (const char *file_path)
> > > > >global_dc->m_file_cache->forcibly_evict_file (file_path);
> > > > >  }
> > > > >
> > > > > +void
> > > > > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > > > > +   unsigned int
> > > > > data_len)
> > > > > +{
> > > > > +  if (!global_dc->m_file_cache)
> > > > > +return;
> > > > > +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> > > >
> > > > Maybe we should rename diagnostic_context's m_file_cache to
> > > > m_source_cache?  (and class file_cache for that matter?)  But if
> > > > so,
> > > > that can/should be a followup/separate patch.
> > > >
> > >
> > > Yes, we should. Believe it or not, I was trying to minimize the size
> > > of the
> > > patch :)
> >
> > :)
> >
> > Thanks for splitting it up, BTW.
> >
> > [...]
> >
> >
> > > >
> > > > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t
> > > > > line_num,
> > > > > If the function fails, a NULL char_span is returned.  */
> > > > >
> > > > >  char_span
> > > > > -location_get_source_line (const char *file_path, int line)
> > > > > +location_get_source_line (source_id src, int line)
> > > > >  {
> > > > > -  const char *buffer = NULL;
> > > > > -  ssize_t len;
> > > > > -
> > > > > -  if (line == 0)
> > > > > -return char_span (NULL, 0);
> > > > > -
> > > > > -  if (file_path == NULL)
> > > > > -return char_span (NULL, 0);
> > > > > +  const char_span fail (nullptr, 0);
> > > > > +  if (!src || line <= 0)
> > > > > +return fail;
> > > >
> > > > Looking at source_id's operator bool, are there effectively three
> > > > kinds
> > > > of source_id?
> > > >
> > > > (a) file names
> > > > (b) generated buffer
> > > > (c) NULL == m_filename_or_buffer
> > > >
> > > > What does (c) mean?  Is it a "something's gone wrong/error" state?
> > > > Or
> > > > is this more a special-case of (a)? (in that the m_len for such a
> > > > case
> > > > would be zero)
> > > >
> > > > Should source_id's 2-param ctor have an assert that the ptr is non-
> > > > NULL?
> > > >
> > > > [...snip...]
> > > >
> > > > The patch is OK for trunk as-is, but note the question about the
> > > > source_id ctor above.
> > > >
> > >
> > > Thanks. (c) has the same meaning as a NULL file name currently does,
> > > so a
> > > default-constructed source_id is not an in-memory buffer, but is
> > > rather a
> > > NULL filename. linemap_add() for instance, will interpret a NULL
> > > filename
> > > for an LC_LEAVE map, as a request to copy it from the natural values
> > > being
> > > returned to. I think the source_id constructor needs to accept a NULL
> > > filename to remain backwards compatible. With the current design of
> > > source_id, it is safe always to change a 'const char*' file name
> > > argument to
> > > a source_id argument instead; it will work just how it did before
> > > beca

Re: [PATCH] c++: refine CWG 2369 satisfaction vs non-dep convs [PR99599]

2023-08-23 Thread Jason Merrill via Gcc-patches

On 8/21/23 21:51, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look like
a reasonable approach?  I didn't observe any compile time/memory impact
of this change.

-- >8 --

As described in detail in the PR, CWG 2369 has the surprising
consequence of introducing constraint recursion in seemingly valid and
innocent code.

This patch attempts to fix this surprising behavior for the majority
of problematic use cases.  Rather than checking satisfaction before
_all_ non-dependent conversions, as specified by the CWG issue,
this patch makes us first check "safe" non-dependent conversions,
then satisfaction, then followed by "unsafe" non-dependent conversions.
In this case, a conversion is "safe" if computing it is guaranteed
to not induce template instantiation.  This patch heuristically
determines "safety" by checking for a constructor template or conversion
function template in the (class) parm or arg types respectively.
If neither type has such a member, then computing the conversion
should not induce instantiation (modulo satisfaction checking of
non-template constructor and conversion functions I suppose).
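
For illustration, a sketch of the recursion pattern at issue (hypothetical
names, in the spirit of the PR99599 testcase):

struct foo_tag {};
struct bar_tag {};

template <typename T>
concept fooable = requires(T it) { invoke_tag(foo_tag{}, it); };

template <typename T> auto invoke_tag(foo_tag, T in) -> T { return in; }
template <fooable T> auto invoke_tag(bar_tag, T it) -> T { return it; }

int f() {
  // Checking fooable<int> resolves the call to invoke_tag; under
  // CWG 2369 the bar_tag candidate's constraints (fooable<int> again)
  // are checked before the non-dependent foo_tag -> bar_tag conversion
  // can reject it, so satisfaction recurses.  Checking "safe"
  // conversions first prunes that candidate, since bar_tag has no
  // constructor templates or conversion function templates.
  return invoke_tag(foo_tag{}, 2);
}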

+ /* We're checking only non-instantiating conversions.
+A conversion may instantiate only if it's to/from a
+class type that has a constructor template/conversion
+function template.  */
+ tree parm_nonref = non_reference (parm);
+ tree type_nonref = non_reference (type);
+
+ if (CLASS_TYPE_P (parm_nonref))
+   {
+ if (!COMPLETE_TYPE_P (parm_nonref)
+ && CLASSTYPE_TEMPLATE_INSTANTIATION (parm_nonref))
+   return unify_success (explain_p);
+
+ tree ctors = get_class_binding (parm_nonref,
+ complete_ctor_identifier);
+ for (tree ctor : lkp_range (ctors))
+   if (TREE_CODE (ctor) == TEMPLATE_DECL)
+ return unify_success (explain_p);


Today we discussed maybe checking CLASSTYPE_NON_AGGREGATE?

Also, instantiation can also happen when checking for conversion to a 
pointer or reference to base class.


Jason



[PATCH] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-08-23 Thread Marek Polacek via Gcc-patches
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --

This patch implements P2564, described at <https://wg21.link/p2564>, whereby
certain functions are promoted to consteval.  For example:

  consteval int id(int i) { return i; }

  template <typename T>
  constexpr int f(T t)
  {
return t + id(t); // id causes f to be promoted to consteval
  }

  void g(int i)
  {
f (3);
  }

now compiles.  Previously the code was ill-formed: we would complain
that 't' in 'f' is not a constant expression.  Since 'f' is now
consteval, it means that the call to id(t) is in an immediate context,
so doesn't have to produce a constant -- this is how we allow consteval
functions composition.  But making 'f' consteval also means that
the call to 'f' in 'g' must yield a constant; failure to do so results
in an error.  I made the effort to have cc1plus explain to us what's
going on.  For example, calling f(i) produces this neat diagnostic:

q.C: In function 'void g(int)':
q.C:11:5: error: call to consteval function 'f(i)' is not a constant 
expression
   11 |   f (i);
  |   ~~^~~
q.C:6:16: note: 'constexpr int f(T) [with T = int]' was promoted to an 
immediate function because its body contains an immediate-escalating expression 
'id(t)'
6 |   return t + id(t);
  |  ~~^~~

which hopefully makes it clear what's going on.

Implementing this proposal has been tricky.  One problem was delayed
instantiation: instantiating a function can set off a domino effect
where one call promotes a function to consteval but that then means
that another function should also be promoted, etc.  I previously
thought that I could get past that by implementing the propagation in
cp_gimplify_expr at which point we have already instantiated everything
via instantiate_pending_templates.  But I realized that we don't
gimplify e.g.

  static auto p = &id;

and so we'd never detect taking the address of a consteval function.
Therefore this patch instantiates immediate-escalating functions
beforehand.  And as usual, there were a lot of other things to
handle.  It's not just calls to consteval functions that we must
detect, we also have to look for id-expressions that denote an
immediate function.

I discovered two crashes:
, ICE-on-valid with NSDMI
, missing ; causes an ICE
which this patch doesn't address, but adds a dg-ice test for the former.

I left one FIXME in the patch because I'm unclear on how to properly fix
the modules problem.

PR c++/107687

gcc/c-family/ChangeLog:

* c-cppbuiltin.cc (c_cpp_builtins): Update __cpp_consteval.

gcc/cp/ChangeLog:

* call.cc (immediate_invocation_p): No longer static.
(immediate_escalating_function_p): New.
(maybe_promote_function_to_consteval): New.
(build_over_call): Set ADDR_EXPR_DENOTES_CALL_P.  Maybe promote
current_function_decl to consteval.
* constexpr.cc (instantiate_cx_fn_r): No longer static.
* cp-gimplify.cc (struct find_escalating_expr_t): New.
(find_escalating_expr_r): New.
(maybe_explain_promoted_consteval): New.
(maybe_escalate_decl_and_cfun): New.
(cp_fold_r) : Handle promoting functions to consteval.
: New case, handle promoting functions to consteval.
: Handle promoting functions to consteval.
* cp-tree.h (ADDR_EXPR_DENOTES_CALL_P): Define.
(immediate_invocation_p): Declare.
(immediate_escalating_function_p): Declare.
(maybe_promote_function_to_consteval): Declare.
(instantiate_constexpr_fns): Declare.
* typeck.cc (cp_build_addr_expr_1): SET_EXPR_LOCATION on constexpr
functions as well.

libstdc++-v3/ChangeLog:

* testsuite/20_util/integer_comparisons/greater_equal_neg.cc: Adjust
expected diagnostic.
* testsuite/20_util/integer_comparisons/greater_neg.cc: Likewise.
* testsuite/20_util/integer_comparisons/less_equal_neg.cc: Likewise.
* testsuite/20_util/optional/monadic/or_else_neg.cc: Likewise.
* testsuite/23_containers/array/creation/3_neg.cc: Likewise.
* testsuite/23_containers/span/first_neg.cc: Likewise.
* testsuite/23_containers/span/last_neg.cc: Likewise.
* testsuite/23_containers/span/subspan_neg.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-inst1.C: Add dg-error.
* g++.dg/cpp23/consteval-if10.C: Remove dg-error.
* g++.dg/cpp23/consteval-if2.C: Likewise.
* g++.dg/cpp23/feat-cxx2b.C: Adjust expected value of __cpp_consteval.
* g++.dg/cpp26/feat-cxx26.C: Likewise.
* g++.dg/cpp2a/feat-cxx2a.C: Likewise.
* g++.dg/cpp2a/consteval-prop1.C: New test.
* g++.dg/cpp2a/consteval-prop10.C: New test.
* g++.dg/cpp2a/consteval-prop11.C: New test.
* g++.dg/cpp2a/consteval-prop12.C: New test.
* g++.dg/cpp2a/consteval-prop13.C: New test.
* g++.dg/cpp2a/consteval-prop14.C: New test.

Re: [PATCH] RISC-V: Enable Hoist to GCSE simple constants

2023-08-23 Thread Jeff Law via Gcc-patches




On 8/9/23 18:30, Vineet Gupta wrote:

Hoist want_to_gcse_p () calls rtx_cost () to compute max distance for
hoist candidates . For a const with cost 1 backend currently returns 0,
causing Hoist to bail and elide GCSE.

Note that constants requiring more than 1 insns to setup were working
already since backend is returning 1 as well. Arguably that needs updating
as well to reflect cost better, but that should be a different change
anyways.

To keep testsuite parity, some V tests which started failing in the new
costing regime need updating.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_rtx_costs): Adjust const_int cost.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/gcse-const.c: New test.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Disable for
  -O2.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.
Thanks for your patience on this.  I needed a bit of time to gather my 
thoughts and review some code.




index 8b7256108157..1802eef908fc 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2464,14 +2459,9 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
  case CONST:
if ((cost = riscv_const_insns (x)) > 0)
{
- /* If the constant is likely to be stored in a GPR, SETs of
-single-insn constants are as cheap as register sets; we
-never want to CSE them.  */
+ /* Hoist will GCSE constants only if cost is non-zero.  */
  if (cost == 1 && outer_code == SET)
-   *total = 0;
- /* When we load a constant more than once, it usually is better
-to duplicate the last operation in the sequence than to CSE
-the constant itself.  */
+   *total = COSTS_N_INSNS (1);
  else if (outer_code == SET || GET_MODE (x) == VOIDmode)
*total = COSTS_N_INSNS (1);
}
So the concern here was we have two classes of constants which can be 
synthesized in a single instruction.


One class would be those constants that can be used as-is in most 
instructions.  (const_int 0) being the most obvious, but of course 
there's many others.


The other class can be synthesized in a single instruction, but aren't 
typically usable in something like addi, andi, etc.  A good example 
might be (const_int 0x4000).
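
A concrete sketch of the two classes on RV64 (illustrative only):

/* 16 fits addi's 12-bit signed immediate: usable as-is, no synthesis.  */
long f (long x) { return x + 16; }      /* addi a0,a0,16 */

/* 0x4000 synthesizes in one insn but is not usable as an immediate.  */
long g (long x) { return x + 0x4000; }  /* lui a5,0x4; add a0,a0,a5 */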



I wanted to make sure we were doing something sensible across those two 
cases.  And I think we probably are as we have an earlier check in the 
CONST_INT case (no I don't like the case fallthrus at all :(


So I think your change makes sense.   But I think it can be refined to 
simplify the larger chunk of code we're looking at:



  /* If the constant is likely to be stored in a GPR, SETs of
 single-insn constants are as cheap as register sets; we
 never want to CSE them.  */
  if (cost == 1 && outer_code == SET)
*total = 0;
  /* When we load a constant more than once, it usually is better
 to duplicate the last operation in the sequence than to CSE
 the constant itself.  */
  else if (outer_code == SET || GET_MODE (x) == VOIDmode)
*total = COSTS_N_INSNS (1);


Turns into
  if (outer_code == SET || GET_MODE (x) == VOIDmode)
*total = COSTS_N_INSNS (1);

With a suitable comment about GCSE and the general desire to duplicate 
the last op rather than CSE the constant for multi instruction constant 
synthesis cases.


If you agree, then consider the patch pre-approved with that change.  If 
not, then state why and your original patch is OK as well.


Again, thanks for your patience on this.

jeff


[committed] Improve quality of code from LRA register elimination

2023-08-23 Thread Jeff Law
This is primarily Jivan's work, I'm mostly responsible for the write-up 
and coordinating with Vlad on a few questions.


On targets with limitations on immediates usable in arithmetic 
instructions, LRA's register elimination phase can construct fairly poor 
code.


This example (from the GCC testsuite) illustrates the problem well.


int  consume (void *);
int foo (void) {
  int x[100];
  return consume (x + 1000);
}

If you compile on riscv64-linux-gnu with "-O2 -march=rv64gc 
-mabi=lp64d", then you'll get this code (up to the call to consume()).




.cfi_startproc
li  t0,-4001792
li  a0,-3997696
li  a5,4001792
addisp,sp,-16
.cfi_def_cfa_offset 16
addit0,t0,1792
addia0,a0,1696
addia5,a5,-1792
sd  ra,8(sp)
add a5,a5,a0
add sp,sp,t0
.cfi_def_cfa_offset 416
.cfi_offset 1, -8
add a0,a5,sp
callconsume

Of particular interest is the value in a0 when we call consume. We 
compute that horribly inefficiently.   If we back-substitute from the 
final assignment to a0 we get...


a0 = a5 + sp
a0 = a5 + (sp + t0)
a0 = (a5 + a0) + (sp + t0)
a0 = ((a5 - 1792) + a0) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + (t0 + 1792))
a0 = (a5 + (a0 + 1696)) + (sp + t0)  // removed offsetting terms
a0 = (a5 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + -4001792)
a0 = (-3997696 + 1696) + (sp -16) // removed offsetting terms
a0 = sp - 3996016

That's a pretty convoluted way to compute sp - 3996016.

Something like this would be notably better (not great, but we need both 
the stack adjustment and the address of the object to pass to consume):




   addi sp,sp,-16
   sd ra,8(sp)
   li t0,-4001792
   addi t0,t0,1792
   add sp,sp,t0
   li a0,4096
   addi a0,a0,-96
   add a0,sp,a0
   call consume


The problem is LRA's elimination code is not handling the case where we 
have (plus (reg1) (reg2)) where reg1 is an eliminable register and reg2
has a known equivalency, particularly a constant.


If we can determine that reg2 is equivalent to a constant and treat 
(plus (reg1) (reg2)) in the same way we'd treat (plus (reg1) 
(const_int)) then we can get the desired code.
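
In RTL terms, a sketch of the case being handled (hypothetical pseudo
numbers):

;; Pseudo 200 carries REG_EQUIV (const_int 4000); sfp is eliminable.
(set (reg:DI 10) (plus:DI (reg:DI sfp) (reg:DI 200)))
;; Treating this like (plus (reg sfp) (const_int 4000)) lets elimination
;; fold the frame offset and the constant into a single addend from sp,
;; instead of materializing both and adding them at runtime.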


This eliminates about 19b instructions, or roughly 1% for deepsjeng on 
rv64.  There are improvements elsewhere, but they're relatively small. 
This may ultimately lessen the value of Manolis's fold-mem-offsets 
patch.  So we'll have to evaluate that again once he posts a new version.


Bootstrapped and regression tested on x86_64 as well as bootstrapped on 
rv64.  Earlier versions have been tested against spec2017.  Pre-approved 
by Vlad in a private email conversation (thanks Vlad!).


Committed to the trunk,

Jeff

commit 47f95bc4be4eb14730ab3eaaaf8f6e71fda47690
Author: Raphael Moreira Zinsly 
Date:   Tue Aug 22 11:37:04 2023 -0600

RISC-V: Add multiarch support on riscv-linux-gnu

This adds multiarch support to the RISC-V port so that bootstraps work with
Debian out-of-the-box.  Without this patch the stage1 compiler is unable to
find headers/libraries when building the stage1 runtime.

This is functionally (and possibly textually) equivalent to Debian's fix for
the same problem.

gcc/
* config/riscv/t-linux: Add MULTIARCH_DIRNAME.

diff --git a/gcc/config/riscv/t-linux b/gcc/config/riscv/t-linux
index 216d2776a18..a6f64f88d25 100644
--- a/gcc/config/riscv/t-linux
+++ b/gcc/config/riscv/t-linux
@@ -1,3 +1,5 @@
 # Only XLEN and ABI affect Linux multilib dir names, e.g. /lib32/ilp32d/
 MULTILIB_DIRNAMES := $(patsubst rv32%,lib32,$(patsubst rv64%,lib64,$(MULTILIB_DIRNAMES)))
 MULTILIB_OSDIRNAMES := $(patsubst lib%,../lib%,$(MULTILIB_DIRNAMES))
+
+MULTIARCH_DIRNAME := $(call if_multiarch,$(firstword $(subst -, ,$(target)))-linux-gnu)
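
For example (hypothetical triplet), with target=riscv64-unknown-linux-gnu
the new fragment expands as:

# $(firstword $(subst -, ,riscv64-unknown-linux-gnu)) -> riscv64
# so, when multiarch is enabled:
#   MULTIARCH_DIRNAME = riscv64-linux-gnu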


Re: [PATCH] c++: Fix up mangling of function/block scope static structured bindings [PR111069]

2023-08-23 Thread Jason Merrill via Gcc-patches

On 8/22/23 04:12, Jakub Jelinek wrote:

As can be seen on the testcase, we weren't correctly mangling
static/thread_local structured bindings (C++20 feature) at function/block
scope.  The following patch fixes that by using what write_local_name
does for those cases (note, structured binding mandling doesn't use the
standard path because it needs to pass a list of all the identifiers in
the structured binding to the mangling).  In addition to that it fixes
mangling of various helpers which use write_guarded_name (_ZGV*, _ZTH*,
_ZTW*) and kills find_decomp_unqualified_name which for the local names
would be too hard to implement and uses write_guarded_name for structured
binding related _ZGR* names as well.
All the mangled names on the testcase match now clang++ and my expectations.
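
For instance (a hypothetical reduction, not the actual testcase):

struct S { int a, b; };
void f () { static auto [x, y] = S{1, 2}; }
void g () { static auto [x, y] = S{3, 4}; }

Previously both bindings mangled as if they were at global scope, so the
two definitions collided at assembly time; with the patch they get the
usual local-name discriminators.
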
Because the old mangled names were plain wrong (they mangled the same as
a structured binding at global scope and resulted in assembly errors if there
was more than one static structured binding with the same identifiers in
the same (or another) function), I think we don't need to play with another
mangling ABI level which turns on/off the old broken way, unsure whether
we should backport the patch to 13 branch though.


Probably not.


BTW, I think we should also handle ABI tags in mangle_decomp which we
currently don't do, but guess that should be subject to another mangling ABI
version.


I'd be surprised if this would affect any real code, but I suppose so. 
In any case I'd like to fix this at the same time as the local statics, 
to avoid changing their mangled name twice.



@@ -9049,6 +9050,25 @@ cp_maybe_mangle_decomp (tree decl, tree
tree d = first;
for (unsigned int i = 0; i < count; i++, d = DECL_CHAIN (d))
v[count - i - 1] = d;
+  if (DECL_FUNCTION_SCOPE_P (decl))
+   {
+ size_t sz = 3;
+ for (unsigned int i = 0; i < count; ++i)
+   sz += IDENTIFIER_LENGTH (DECL_NAME (v[i])) + 1;
+ char *name = XALLOCAVEC (char, sz);
+ name[0] = 'D';
+ name[1] = 'C';
+ char *p = name + 2;
+ for (unsigned int i = 0; i < count; ++i)
+   {
+ size_t len = IDENTIFIER_LENGTH (DECL_NAME (v[i]));
+ *p++ = ' ';
+ memcpy (p, IDENTIFIER_POINTER (DECL_NAME (v[i])), len);
+ p += len;
+   }
+ *p = '\0';
+ determine_local_discriminator (decl, get_identifier (name));
+   }


Maybe do this in mangle_decomp, based on the actual mangling in process 
instead of this pseudo-mangling?



@@ -4564,6 +4519,13 @@ write_guarded_var_name (const tree variable)
  /* The name of a guard variable for a reference temporary should refer
 to the reference, not the temporary.  */
  write_string (IDENTIFIER_POINTER (DECL_NAME (variable)) + 4);
+  else if (DECL_DECOMPOSITION_P (variable)
+  && DECL_NAME (variable) == NULL_TREE
+  && startswith (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (variable)),
+ "_Z"))


Maybe add a startswith overload that takes an identifier?
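
Something like this minimal sketch, perhaps (illustrative only; it just
forwards to the existing const char * overload):

  /* Return true if the spelling of identifier ID starts with PREFIX.  */
  static inline bool
  startswith (const_tree id, const char *prefix)
  {
    return startswith (IDENTIFIER_POINTER (id), prefix);
  }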


@@ -4630,7 +4592,10 @@ mangle_ref_init_variable (const tree variable)
start_mangling (variable);
write_string ("_ZGR");
check_abi_tags (variable);
-  write_name (variable, /*ignore_local_scope=*/0);
+  if (DECL_DECOMPOSITION_P (variable))
+write_guarded_var_name (variable);
+  else
+write_name (variable, /*ignore_local_scope=*/0);


Why not use write_guarded_name unconditionally?

Jason



Re: [committed] Improve quality of code from LRA register elimination

2023-08-23 Thread Jeff Law via Gcc-patches



On 8/23/23 14:13, Jeff Law wrote:
This is primarily Jivan's work, I'm mostly responsible for the write-up 
and coordinating with Vlad on a few questions.


On targets with limitations on immediates usable in arithmetic 
instructions, LRA's register elimination phase can construct fairly poor 
code.


This example (from the GCC testsuite) illustrates the problem well.


int  consume (void *);
int foo (void) {
   int x[100];
   return consume (x + 1000);
}

If you compile on riscv64-linux-gnu with "-O2 -march=rv64gc 
-mabi=lp64d", then you'll get this code (up to the call to consume()).




     .cfi_startproc
     li  t0,-4001792
     li  a0,-3997696
     li  a5,4001792
     addi    sp,sp,-16
     .cfi_def_cfa_offset 16
     addi    t0,t0,1792
     addi    a0,a0,1696
     addi    a5,a5,-1792
     sd  ra,8(sp)
     add a5,a5,a0
     add sp,sp,t0
     .cfi_def_cfa_offset 416
     .cfi_offset 1, -8
     add a0,a5,sp
     call    consume

Of particular interest is the value in a0 when we call consume. We 
compute that horribly inefficiently.   If we back-substitute from the 
final assignment to a0 we get...


a0 = a5 + sp
a0 = a5 + (sp + t0)
a0 = (a5 + a0) + (sp + t0)
a0 = ((a5 - 1792) + a0) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + (t0 + 1792))
a0 = (a5 + (a0 + 1696)) + (sp + t0)  // removed offsetting terms
a0 = (a5 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + -4001792)
a0 = (-3997696 + 1696) + (sp -16) // removed offsetting terms
a0 = sp - 3990616

That's a pretty convoluted way to compute sp - 3990616.

Something like this would be notably better (not great, but we need both 
the stack adjustment and the address of the object to pass to consume):




    addi sp,sp,-16
    sd ra,8(sp)
    li t0,-4001792
    addi t0,t0,1792
    add sp,sp,t0
    li a0,4096
    addi a0,a0,-96
    add a0,sp,a0
    call consume


The problem is LRA's elimination code is not handling the case where we 
have (plus (reg1) (reg2) where reg1 is an eliminable register and reg2 
has a known equivalency, particularly a constant.


If we can determine that reg2 is equivalent to a constant and treat 
(plus (reg1) (reg2)) in the same way we'd treat (plus (reg1) 
(const_int)) then we can get the desired code.


This eliminates about 19b instructions, or roughly 1% for deepsjeng on 
rv64.  There are improvements elsewhere, but they're relatively small. 
This may ultimately lessen the value of Manolis's fold-mem-offsets 
patch.  So we'll have to evaluate that again once he posts a new version.


Bootstrapped and regression tested on x86_64 as well as bootstrapped on 
rv64.  Earlier versions have been tested against spec2017.  Pre-approved 
by Vlad in a private email conversation (thanks Vlad!).


Committed to the trunk,

Whoops.  Attached the wrong patch :-)  This is the right one.

jeff

commit 6619b3d4c15cd754798b1048c67f3806bbcc2e6d
Author: Jivan Hakobyan 
Date:   Wed Aug 23 14:10:30 2023 -0600

Improve quality of code from LRA register elimination

This is primarily Jivan's work, I'm mostly responsible for the write-up and
coordinating with Vlad on a few questions.

On targets with limitations on immediates usable in arithmetic instructions,
LRA's register elimination phase can construct fairly poor code.

This example (from the GCC testsuite) illustrates the problem well.

int  consume (void *);
int foo (void) {
  int x[100];
  return consume (x + 1000);
}

If you compile on riscv64-linux-gnu with "-O2 -march=rv64gc -mabi=lp64d", 
then
you'll get this code (up to the call to consume()).

.cfi_startproc
li  t0,-4001792
li  a0,-3997696
li  a5,4001792
addi    sp,sp,-16
.cfi_def_cfa_offset 16
addi    t0,t0,1792
addi    a0,a0,1696
addi    a5,a5,-1792
sd  ra,8(sp)
add a5,a5,a0
add sp,sp,t0
.cfi_def_cfa_offset 416
.cfi_offset 1, -8
add a0,a5,sp
call    consume

Of particular interest is the value in a0 when we call consume. We compute 
that
horribly inefficiently.   If we back-substitute from the final assignment 
to a0
we get...

a0 = a5 + sp
a0 = a5 + (sp + t0)
a0 = (a5 + a0) + (sp + t0)
a0 = ((a5 - 1792) + a0) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + (t0 + 1792))
a0 = (a5 + (a0 + 1696)) + (sp + t0)  // removed offsetting terms
a0 = (a5 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + -4001792)
a0 = (-3997696 + 1696) + (sp -16) // removed offsetting terms
a0 = sp - 3990616

That's a pretty convoluted way to compute sp - 3990616.

Re: [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot

2023-08-23 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 15, 2023 at 03:39:40PM -0400, David Malcolm wrote:
> On Tue, 2023-08-15 at 13:58 -0400, Lewis Hyatt wrote:
> > On Tue, Aug 15, 2023 at 11:43:05AM -0400, David Malcolm wrote:
> > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > > Class file_cache_slot in input.cc is used to query specific lines
> > > > of source
> > > > code from a file when needed by diagnostics infrastructure. This
> > > > will be
> > > > extended in a subsequent patch to support obtaining the source
> > > > code from
> > > > in-memory generated buffers rather than from a file. The present
> > > > patch
> > > > refactors class file_cache_slot, putting most of the logic into a
> > > > new base
> > > > class cache_data_source, in preparation for reusing that code in
> > > > the next
> > > > patch. There is no change in functionality yet.
> > > > 
> 
> [...snip...]
> 
> > > 
> > > I confess I had to reread both this and patch 4/8 to make sense of
> > > this; this is probably one of those cases where it's harder to read
> > > in
> > > patch form than as source, but I think I now understand the new
> > > implementation.
> > 
> > Yes, sorry about that. I hope at least splitting into two patches
> > here made it
> > a little easier.
> > 
> > > 
> > > Did you try testing this with valgrind (e.g. "make selftest-
> > > valgrind")?
> > > 
> > 
> > Oh interesting, was not aware of this. I think it shows that new
> > leaks were
> > not introduced with the patch series.
> > 
> 
> [...snip...]
> 
> > 
> > 
> > > I don't think we have any selftest coverage for "\r" in the line-
> > > break
> > > handling; that would be good to add.
> > > 
> > > This patch is OK for trunk once the rest of the kit is approved.
> > 
> > Thank you. To be clear, were you suggesting to add selftest coverage
> > for \r
> > endings now, or in a follow up?
> 
> The former, please, so that we can sure that the patch doesn't
> introduce any buffer overreads etc.
> 
> Thanks
> Dave
>

The following (incremental to patch 5/8 or after) adds selftest coverage for
alternate line endings. I hope things aren't too unclear this way; I can
resend updated versions of some or all of the patches from scratch, if useful.

AFAIK this is the current status of things:

Patch 1/8: Reviewed, updated version incorporating feedback has not been acked
yet, at: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627250.html

Patch 2/8: OKed, pending tweak to reject fixit hints in generated data, which
was sent incrementally here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627405.html

Patch 3/8: OKed, pending new selftest attached to this email.

Patch 4/8: OKed, pending tweak to assert on non-NULL buffers which was sent
incrementally here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628283.html

Patch 5/8: OKed

Patch 6/8: OKed

Patch 7/8: Not reviewed yet

Patch 8/8: Waiting for additional feedback from you; perhaps SARIF need not worry
about this for now and should just ignore generated data locations.

Thanks again for taking the time to go through this, I hope it will prove
worth it.

-Lewis

-- >8 --

gcc/ChangeLog:

* input.cc (test_reading_source_line): Test additional cases,
including generated data and alternate line endings.
(input_cc_tests): Adapt to test_reading_source_line() changes.

diff --git a/gcc/input.cc b/gcc/input.cc
index 4c99df7a205..72274732c6c 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -2392,30 +2392,51 @@ test_make_location_nonpure_range_endpoints (const line_table_case &case_)
 /* Verify reading of input files (e.g. for caret-based diagnostics).  */
 
 static void
-test_reading_source_line ()
+test_reading_source_line (bool generated, const char *e1, const char *e2)
 {
   /* Create a tempfile and write some text to it.  */
+  const char *line1 = "01234567890123456789";
+  const char *line2 = "This is the test text";
+  const char *line3 = "This is the 3rd line";
+  char content[72];
+  const int content_len = snprintf (content, sizeof (content),
+   "%s%s%s%s%s",
+   line1, e1, line2, e2, line3);
+  ASSERT_LT (content_len, (int)sizeof (content));
   temp_source_file tmp (SELFTEST_LOCATION, ".txt",
-   "01234567890123456789\n"
-   "This is the test text\n"
-   "This is the 3rd line");
+   content, content_len, generated);
 
-  /* Read back a specific line from the tempfile.  */
-  char_span source_line = location_get_source_line (tmp.get_filename (), 3);
+  /* Read back some specific lines from the tempfile, not all in order.  */
+  const source_id src = generated
+? source_id (tmp.content_buf, tmp.content_len)
+: source_id (tmp.get_filename ());
+
+  char_span source_line = location_get_source_line (src, 1);
+  ASSERT_TRUE (source_line);
+  ASSERT_TRUE (source_line.get_buffer () != NULL);
+  /* N.B. If the line terminator is \r\n, the returned char_span 

Re: [PATCH v7 1/4] driver: add a spec function to join arguments

2023-08-23 Thread Joseph Myers
On Wed, 23 Aug 2023, Jason Merrill via Gcc-patches wrote:

> Joseph, any thoughts on these issues or the workaround?

I don't have any comments here.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] MATCH: [PR111109] Fix bit_ior(cond, cond) when comparisons are fp

2023-08-23 Thread Andrew Pinski via Gcc-patches
The patterns that were added in r13-4620-g4d9db4bdd458 missed that
(a > b) and (a <= b) are not inverses of each other for floating point
comparisons (if NaNs are supported). Even though there was a check for
integral types, it was only for the result of the cond rather than for the
type of what is being compared. The fix is to check whether cmp and
icmp are inverses of each other by using the invert_tree_comparison function.
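
For a concrete view of why the inversion check is needed (illustrative
snippet, not part of the patch):

  /* With NaNs, GT and LE are not inverses: both can be false at once.  */
  double a = __builtin_nan (""), b = 0.0;
  int gt = (a > b);   /* 0 */
  int le = (a <= b);  /* also 0, so (c * x) | (c1 * y) folds to 0 */

invert_tree_comparison (GT_EXPR, /*honor_nans=*/true) therefore does not
return LE_EXPR (it yields the unordered variant, or ERROR_MARK with
trapping math), which makes the new check reject the transformation.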

OK for trunk and GCC 13 branch? Bootstrapped and tested on x86_64-linux-gnu 
with no regressions.

I added the testcase to execute/ieee as it requires support for NAN.

PR tree-optimization/111109

gcc/ChangeLog:

* match.pd (ior(cond,cond), ior(vec_cond,vec_cond)):
Add check to make sure cmp and icmp are inverse.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/ieee/fp-cmp-cond-1.c: New test.
---
 gcc/match.pd  | 11 ++-
 .../execute/ieee/fp-cmp-cond-1.c  | 78 +++
 2 files changed, 86 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-cond-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 85b7d323a19..b666d73b189 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2087,6 +2087,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(bit_and:c (convert? (cmp@0  @01 @02)) @3)
(bit_and:c (convert? (icmp@4 @01 @02)) @5))
 (if (INTEGRAL_TYPE_P (type)
+&& invert_tree_comparison (cmp, HONOR_NANS (@01)) == icmp
 /* The scalar version has to be canonicalized after vectorization
because it makes unconditional loads conditional ones, which
means we lose vectorization because the loads may trap.  */
@@ -2101,6 +2102,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(cond (cmp@0  @01 @02) @3 zerop)
(cond (icmp@4 @01 @02) @5 zerop))
 (if (INTEGRAL_TYPE_P (type)
+&& invert_tree_comparison (cmp, HONOR_NANS (@01)) == icmp
 /* The scalar version has to be canonicalized after vectorization
because it makes unconditional loads conditional ones, which
means we lose vectorization because the loads may trap.  */
@@ -2113,13 +2115,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bit_ior
(bit_and:c (vec_cond:s (cmp@0 @6 @7) @4 @5) @2)
(bit_and:c (vec_cond:s (icmp@1 @6 @7) @4 @5) @3))
-(if (integer_zerop (@5))
+(if (integer_zerop (@5)
+&& invert_tree_comparison (cmp, HONOR_NANS (@6)) == icmp)
  (switch
   (if (integer_onep (@4))
(bit_and (vec_cond @0 @2 @3) @4))
(if (integer_minus_onep (@4))
 (vec_cond @0 @2 @3)))
-(if (integer_zerop (@4))
+(if (integer_zerop (@4)
+&& invert_tree_comparison (cmp, HONOR_NANS (@6)) == icmp)
  (switch
   (if (integer_onep (@5))
(bit_and (vec_cond @0 @3 @2) @5))
@@ -2132,7 +2136,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bit_ior
(vec_cond:s (cmp@0 @4 @5) @2 integer_zerop)
(vec_cond:s (icmp@1 @4 @5) @3 integer_zerop))
-(vec_cond @0 @2 @3)))
+  (if (invert_tree_comparison (cmp, HONOR_NANS (@4)) == icmp)
+   (vec_cond @0 @2 @3
 
 /* Transform X & -Y into X * Y when Y is { 0 or 1 }.  */
 (simplify
diff --git a/gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-cond-1.c 
b/gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-cond-1.c
new file mode 100644
index 0000000..4a3c4b0eee2
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-cond-1.c
@@ -0,0 +1,78 @@
+/* PR tree-optimization/111109 */
+
+/*
+   f should return 0 if either fa or fb is a NaN,
+   rather than the value of a or b.
+*/
+__attribute__((noipa))
+int f(int a, int b, float fa, float fb) {
+  const _Bool c = fa < fb;
+  const _Bool c1 = fa >= fb;
+  return (c * a) | (c1 * b);
+}
+
+/*
+   f1 should return 0 if either fa or fb is a NaN,
+   rather than the value of a&1 or b&1.
+*/
+__attribute__((noipa))
+int f1(int a, int b, float fa, float fb) {
+  const _Bool c = fa < fb;
+  const _Bool c1 = fa >= fb;
+  return (c & a) | (c1 & b);
+}
+
+#if __SIZEOF_INT__ == __SIZEOF_FLOAT__
+typedef int v4si __attribute__ ((vector_size (1*sizeof(int))));
+typedef float v4sf __attribute__ ((vector_size (1*sizeof(float))));
+/*
+   vf0 should return {0} if either fa or fb is a NaN,
+   rather than the value of a or b.
+*/
+__attribute__((noipa))
+v4si vf0(v4si a, v4si b, v4sf fa, v4sf fb) {
+  const v4si c = fa < fb;
+  const v4si c1 = fa >= fb;
+  return (c & a) | (c1 & b);
+}
+
+
+#endif
+
+int main(void)
+{
+  float a = __builtin_nan("");
+
+  if (f(-1,-1, a, a) != 0)
+__builtin_abort();
+  if (f(-1,-1, a, 0) != 0)
+__builtin_abort();
+  if (f(-1,-1, 0, a) != 0)
+__builtin_abort();
+  if (f(-1,-1, 0, 0) != -1)
+__builtin_abort();
+
+
+  if (f1(1,1, a, a) != 0)
+__builtin_abort();
+  if (f1(1,1, a, 0) != 0)
+__builtin_abort();
+  if (f1(1,1, 0, a) != 0)
+__builtin_abort();
+  if (f1(1,1, 0, 0) != 1)
+__builtin_abort();
+
+#if __SIZEOF_INT__ == __SIZEOF_FLOAT__
+  

Re: [PATCH] RISC-V: Add initial pipeline description for an out-of-order core.

2023-08-23 Thread Jeff Law via Gcc-patches




On 8/23/23 08:56, Robin Dapp wrote:

Does this patch fix the 2 following PRs?
PR 108271 – Missed RVV cost model
PR 108412 – RISC-V: Negative optimization of GCSE && LOOP INVARIANTS

If yes, plz append these 2 cases into testsuite and indicate those 2 PR are 
fixed.
So that we can close them.


The second one is fixed on my local branch; the first not yet, because there
is more to it still.  The second one is more due to pressure-aware scheduling,
and I'm going to add the testcase as well as the PR reference to the commit once
this is verified.
Basically there's a little conditional in IRA which says don't
rematerialize constants at their use points if register-pressure-sensitive
scheduling has been enabled.  So turning on -fsched-pressure helps; it's a
bit of a cop-out, but probably good enough for now.


Jeff


Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

2023-08-23 Thread Jeff Law via Gcc-patches




On 8/23/23 08:54, Li, Pan2 wrote:

Thanks Jeff for comments.


Understood.  So the natural question is why does x86/sh not need this
for its mode switching?   Don't all the same issues exist on those
targets as well?


AFAIK, it comes from the different design principles of the RISC-V and
x86/arm intrinsic APIs.  The RISC-V RVV FP rounding mode intrinsic API is one
abstraction level above the insn itself, while the x86/arm intrinsics only
indicate the semantics of the insn.

For example, if one vector instruction VFADD doesn't have a static rounding mode
(aka encoding rm in the insn), there is no intrinsic API that takes a rounding
mode argument on x86/arm, while the RISC-V FP vector intrinsics will always
have a static rounding mode API if the frm is honored.

In short, the RISC-V intrinsic API is closer to the end user, while the x86/arm
intrinsic API is closer to the insn itself.
OK, but I'm still struggling to see how the distinction is important 
here.  Ultimately there's a state at a call site.  We need to make sure 
that state from the current function doesn't impact the callee and we 
need to make sure that the callee doesn't impact the state in the caller.


That implies a save/restore pair around the call (possibly optimized so 
that we minimize the number of save/restores).  I would have expected 
x86 to already be doing this.  But maybe there's some ABI thing around 
mmx vs x86 state that allows it to be avoided.
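
To make that concrete, a purely illustrative caller-side sketch for the
RISC-V rounding-mode CSR (callee() is a placeholder; frrm/fsrm are the
standard aliases for reading/writing frm):

  unsigned long saved_frm;
  __asm__ volatile ("frrm %0" : "=r" (saved_frm));  /* save caller's frm */
  callee ();                                /* callee may change frm */
  __asm__ volatile ("fsrm %0" : : "r" (saved_frm)); /* restore it */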




For the rest, I will have a try based on your suggestion soon, as I am in the
middle of something.
No problem.  Get to it when you can.  I think it affects you more than 
me :-)


jeff


Re: [PATCH] RISC-V: Enable Hoist to GCSE simple constants

2023-08-23 Thread Vineet Gupta



On 8/23/23 13:04, Jeff Law wrote:
Thanks for your patience on this.  I needed a bit of time to gather my 
thoughts and review some code.


No worries at all.


index 8b7256108157..1802eef908fc 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2464,14 +2464,9 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UNUSED

  case CONST:
    if ((cost = riscv_const_insns (x)) > 0)
  {
-  /* If the constant is likely to be stored in a GPR, SETs of
- single-insn constants are as cheap as register sets; we
- never want to CSE them.  */
+  /* Hoist will GCSE constants only if cost is non-zero. */
    if (cost == 1 && outer_code == SET)
-    *total = 0;
-  /* When we load a constant more than once, it usually is better
- to duplicate the last operation in the sequence than to CSE
- the constant itself.  */
+    *total = COSTS_N_INSNS (1);
    else if (outer_code == SET || GET_MODE (x) == VOIDmode)
  *total = COSTS_N_INSNS (1);
  }
So the concern here was we have two classes of constants which can be 
synthesized in a single instruction.


One class would be those constants that can be used as-is in most 
instructions.  (const_int 0) being the most obvious, but of course 
there's many others.


The other class can be synthesized in a single instruction, but aren't 
typically usable in something like addi, andi, etc.  A good example 
might be (const_int 0x4000).



I wanted to make sure we were doing something sensible across those 
two cases.  And I think we probably are as we have an earlier check in 
the CONST_INT case


Exactly, I have a note written down to trace the call flow to remind 
myself how this works. I'll add a comment here to make this clear.



(no I don't like the case fallthrus at all :(


Seriously, I detest it too, but the irony is I've now made my 2nd change 
in there and keep adding to ugliness :-(




So I think your change makes sense.   But I think it can be refined to 
simplify the larger chunk of code we're looking at:



  /* If the constant is likely to be stored in a GPR, SETs of
 single-insn constants are as cheap as register sets; we
 never want to CSE them.  */
  if (cost == 1 && outer_code == SET)
    *total = 0;
  /* When we load a constant more than once, it usually is 
better

 to duplicate the last operation in the sequence than to CSE
 the constant itself.  */
  else if (outer_code == SET || GET_MODE (x) == VOIDmode)
    *total = COSTS_N_INSNS (1);


Turns into
  if (outer_code == SET || GET_MODE (x) == VOIDmode)
    *total = COSTS_N_INSNS (1);


Yep, that's what I started with too, but then left it as a visual
indication to fix things up when the cost model ultimately returns the
actual number of insns for a non-trivial large const.  But I agree, I'll
fold it and add a TODO comment for improving the cost model.


For the current proposal I do want to understand/reason about what is left 
there - what cases are we trying to filter out with #2467?


|    case CONST:
|  if ((cost = riscv_const_insns (x)) > 0) # 2465
| {
|        if (outer_code == SET || GET_MODE (x) == VOIDmode)  # 2467
|        *total = COSTS_N_INSNS (1);    # 2468


(1) AFAIU, VOIDmode is for const_int - and supposedly true for symbolic 
addresses etc whose actual values are not known at compile time ? Or is 
it needed just as an artifact of the weird fall through.


(2) outer_code SET will kick in for set_src_cost( ) call from Hoist, 
which passes a const_int rtx directly.

  But in case of say expand_mult () -> set_src_cost ( ) called for say
    (mult:DI (reg:DI 134)
    (const_int [0x202020202020202]))

  In the eventual call for const_int operand, outer_code is MULT, 
so we elide #2468


  But wait, we don't even hit #2467, since #2465 has a weird cap 
inside which caps 6 to 0.


|    case CONST_INT:
|  {
|          int cost = riscv_integer_cost (INTVAL (x));
|          /* Force complicated constants to memory.  */
|          return cost < 4 ? cost : 0;  #1321
|  }
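
Worked through for the example above (illustrative): riscv_integer_cost()
for 0x0202020202020202 returns 6, so #1321 caps it to 0, riscv_const_insns()
then returns 0, the `> 0` test at #2465 fails, and #2467/#2468 are never
reached for that constant.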

This definitely needs to be tracked separately in a PR



With a suitable comment about GCSE and the general desire to duplicate 
the last op rather than CSE the constant for multi instruction 
constant synthesis cases.


Hmm, do we not prefer to GCSE/CSE a 3-4 insn const sequence?  It seems the 
RA is capable enough of undoing that ;-)

For now I can keep the comment with the current philosophy.



If you agree, then consider the patch pre-approved with that change.  
If not, then state why and your original patch is OK as well.


I'm afraid I have more questions than either of us were hoping for :-)

But it seems we can chunk up the work for just Hoist enabling and then 
improve th

[PATCH V5 1/4] rs6000: build constant via li;rotldi

2023-08-23 Thread Jiufu Guo via Gcc-patches
Hi,

If a constant can be rotated to/from a positive or negative value which
"li" can generate, then "li;rotldi" can be used to build the constant.
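
A worked example (illustrative, not taken from the patch):

  /* 3 is 'li'-loadable; rotating it left by 63 builds a constant with
     both the top and bottom bit set.  */
  unsigned long long v = 3ULL;                  /* li     rT,3      */
  unsigned long long c = (v << 63) | (v >> 1);  /* rotldi rD,rT,63  */
  /* c == 0x8000000000000001ULL, which would otherwise need a
     multi-insn lis/ori/sldi sequence.  */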

Compare with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623528.html
This patch just did minor changes to the comments according to previous
review.

Bootstrap and regtest pass on ppc64{,le}.

Is this ok for trunk?


BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.
---
 gcc/config/rs6000/rs6000.cc   | 47 +--
 .../gcc.target/powerpc/const-build.c  | 57 +++
 2 files changed, 98 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 42f49e4a56b..acc332acc05 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10258,6 +10258,31 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   'rotldi'.
+
+   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
+   is set to the mask operand of rotldi(rldicl), and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  /* If C or ~C contains at least 49 successive zeros, then C can be rotated
+ to/from a positive or negative value that 'li' is able to load.  */
+  int n;
+  if (can_be_rotated_to_lowbits (c, 15, &n)
+  || can_be_rotated_to_lowbits (~c, 15, &n))
+{
+  *mask = HOST_WIDE_INT_M1;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10266,15 +10291,14 @@ static void
 rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
+  int shift;
+  HOST_WIDE_INT mask;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
 
   ud1 = c & 0xffff;
-  c = c >> 16;
-  ud2 = c & 0xffff;
-  c = c >> 16;
-  ud3 = c & 0xffff;
-  c = c >> 16;
-  ud4 = c & 0xffff;
+  ud2 = (c >> 16) & 0xffff;
+  ud3 = (c >> 32) & 0xffff;
+  ud4 = (c >> 48) & 0xffff;
 
   if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
       || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
@@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0xffff) << 16)));
 }
+  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+{
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  unsigned HOST_WIDE_INT imm = (c | ~mask);
+  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
+
+  emit_move_insn (temp, GEN_INT (imm));
+  if (shift != 0)
+   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  emit_move_insn (dest, temp);
+}
   else if (ud3 == 0 && ud4 == 0)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
b/gcc/testsuite/gcc.target/powerpc/const-build.c
new file mode 100644
index 0000000..69b37e2bb53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -0,0 +1,57 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* Verify that two instructions are successfully used to build constants.
+   One insn is li, another is rotate: rldicl.  */
+
+#define NOIPA __attribute__ ((noipa))
+
+struct fun
+{
+  long long (*f) (void);
+  long long val;
+};
+
+long long NOIPA
+li_rotldi_1 (void)
+{
+  return 0x75310LL;
+}
+
+long long NOIPA
+li_rotldi_2 (void)
+{
+  return 0x2164LL;
+}
+
+long long NOIPA
+li_rotldi_3 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+li_rotldi_4 (void)
+{
+  return 0x2194LL;
+}
+
+struct fun arr[] = {
+  {li_rotldi_1, 0x75310LL},
+  {li_rotldi_2, 0x2164LL},
+  {li_rotldi_3, 0x8531LL},
+  {li_rotldi_4, 0x2194LL},
+};
+
+/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+
+int
+main ()
+{
+  for (int i = 0; i < sizeof (arr) / sizeof (arr[0]); i++)
+if ((*arr[i].f) () != arr[i].val)
+  __builtin_abort ();
+
+  return 0;
+}
-- 
2.39.3



[committed] testsuite: Xfail gcc.dg/tree-ssa/update-threading.c for CRIS, PR110628

2023-08-23 Thread Hans-Peter Nilsson via Gcc-patches
Oops, looks like the PR title annotation didn't work and I
forgot the classic changelog annotation.

Anyway, after fixing a testsuite inconsistency, this test
fails for *some* architectures and shows up as a regression;
see the PR.

-- >8 --

* gcc.dg/tree-ssa/update-threading.c: Xfail for cris-*-*.
---
 gcc/testsuite/gcc.dg/tree-ssa/update-threading.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/update-threading.c 
b/gcc/testsuite/gcc.dg/tree-ssa/update-threading.c
index 1435e9ba2e02..9500099cddff 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/update-threading.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/update-threading.c
@@ -20,4 +20,4 @@ main (int argc, char **argv)
 foo (((A) { ((!(i >> 4) ? 8 : 64 + (i >> 4)) << 8) + (i << 4) } ).a);
   exit (0);
 }
-/* { dg-final { scan-tree-dump-times "Invalid sum" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 0 "optimized" { xfail cris-*-* } } } Xfail: PR110628 */
-- 
2.30.2



Re: [PATCH] rs6000: Disable PCREL for unsupported targets [PR111045]

2023-08-23 Thread Peter Bergner via Gcc-patches
On 8/21/23 8:51 PM, Kewen.Lin wrote:
>> The following patch has been bootstrapped and regtested on powerpc64-linux.
> 
> I think we should test this on powerpc64le-linux P8 or P9 (no P10) as well.

That's a good idea!



> I think this should be moved to be with the hunk on PCREL:
> 
>   /* If the ABI has support for PC-relative relocations, enable it by default.
>  This test depends on the sub-target tests above setting the code model to
>  medium for ELF v2 systems.  */
>   if (PCREL_SUPPORTED_BY_OS
>   && (rs6000_isa_flags_explicit & OPTION_MASK_PCREL) == 0)
> rs6000_isa_flags |= OPTION_MASK_PCREL;
> 
>   /* -mpcrel requires -mcmodel=medium, but we can't check TARGET_CMODEL until
>   after the subtarget override options are done.  */
>   else if (TARGET_PCREL && TARGET_CMODEL != CMODEL_MEDIUM)
> {
>   if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
>   error ("%qs requires %qs", "-mpcrel", "-mcmodel=medium");
> 
>   rs6000_isa_flags &= ~OPTION_MASK_PCREL;
> }

Agreed on the location, but...

Looking at this closer, I don't think I'm happy with the current code.
We seem to have duplicated tests for whether the target supports pcrel
or not in both PCREL_SUPPORTED_BY_OS and rs6000_pcrel_p().  That means
if another target were to add support for pcrel, they'd have to update
multiple locations of the code, and that seems error prone.

I think we should standardize our tests for whether the target/OS
supports pcrel (irrespective of the -mpcrel or -mcmodel=medium options)
and have that in PCREL_SUPPORTED_BY_OS.  Ie, a one-stop-shop for testing
whether the current target/OS can support pcrel.  Then we should modify
rs6000_pcrel_p() to use PCREL_SUPPORTED_BY_OS rather than its own
semi-duplicated target/OS tests, plus any other tests for options that
might disqualify the current target/OS from supporting pcrel when it
normally can (i.e., -mcmodel != medium for ELFv2).

I think then, that should allow simplifying the code in
rs6000_option_override_internal.
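
Roughly this shape, say (an untested sketch; names follow the existing
rs6000 code, and the ELFv2 code-model check is just one example of an
option that can disqualify an otherwise-capable target):

  static bool
  rs6000_pcrel_p (void)
  {
    /* Single target/OS capability test, then the option checks.  */
    return (PCREL_SUPPORTED_BY_OS
            && TARGET_PCREL
            && TARGET_CMODEL == CMODEL_MEDIUM);
  }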

Thoughts?


Peter




[PATCH] VECT: Apply LEN_FOLD_EXTRACT_LAST into loop vectorizer

2023-08-23 Thread Juzhe-Zhong
Hi.

This patch applies LEN_FOLD_EXTRACT_LAST in the loop vectorizer.

Consider the following case:
#include 

#define N 32

/* Simple condition reduction.  */

int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < N; i++)
if (a[i] < min_v)
  last = i;

  return last;
}

With this patch, we can generate this following IR:

  _44 = .SELECT_VL (ivtmp_42, POLY_INT_CST [4, 4]);
  _34 = vect_vec_iv_.5_33 + { POLY_INT_CST [4, 4], ... };
  ivtmp_36 = _44 * 4;
  vect__4.8_39 = .MASK_LEN_LOAD (vectp_a.6_37, 32B, { -1, ... }, _44, 0);

  mask__11.9_41 = vect__4.8_39 < vect_cst__40;
  last_5 = .LEN_FOLD_EXTRACT_LAST (last_14, mask__11.9_41, vect_vec_iv_.5_33, 
_44, 0);
  ...

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_reduction): Apply 
LEN_FOLD_EXTRACT_LAST.
* tree-vect-stmts.cc (vectorizable_condition): Ditto.

---
 gcc/tree-vect-loop.cc  |  7 --
 gcc/tree-vect-stmts.cc | 52 --
 2 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 1cd6c291377..ebee8037e02 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7494,8 +7494,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
}
 
   if (reduc_chain_length == 1
- && direct_internal_fn_supported_p (IFN_FOLD_EXTRACT_LAST,
-vectype_in, OPTIMIZE_FOR_SPEED))
+ && (direct_internal_fn_supported_p (IFN_FOLD_EXTRACT_LAST, vectype_in,
+ OPTIMIZE_FOR_SPEED)
+ || direct_internal_fn_supported_p (IFN_LEN_FOLD_EXTRACT_LAST,
+vectype_in,
+OPTIMIZE_FOR_SPEED)))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 413a88750d6..be9f3a280bd 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11740,8 +11740,17 @@ vectorizable_condition (vec_info *vinfo,
  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
{
  if (reduction_type == EXTRACT_LAST_REDUCTION)
-   vect_record_loop_mask (loop_vinfo, &LOOP_VINFO_MASKS (loop_vinfo),
-  ncopies * vec_num, vectype, NULL);
+   {
+ if (direct_internal_fn_supported_p (IFN_LEN_FOLD_EXTRACT_LAST,
+ vectype, OPTIMIZE_FOR_SPEED))
+   vect_record_loop_len (loop_vinfo,
+ &LOOP_VINFO_LENS (loop_vinfo),
+ ncopies * vec_num, vectype, 1);
+ else
+   vect_record_loop_mask (loop_vinfo,
+  &LOOP_VINFO_MASKS (loop_vinfo),
+  ncopies * vec_num, vectype, NULL);
+   }
  /* Extra inactive lanes should be safe for vect_nested_cycle.  */
  else if (STMT_VINFO_DEF_TYPE (reduc_info) != vect_nested_cycle)
{
@@ -11772,7 +11781,13 @@ vectorizable_condition (vec_info *vinfo,
  mask to the condition, or to its inverse.  */
 
   vec_loop_masks *masks = NULL;
-  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+  vec_loop_lens *lens = NULL;
+  if (loop_vinfo && LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+{
+  if (reduction_type == EXTRACT_LAST_REDUCTION)
+   lens = &LOOP_VINFO_LENS (loop_vinfo);
+}
+  else if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
 {
   if (reduction_type == EXTRACT_LAST_REDUCTION)
masks = &LOOP_VINFO_MASKS (loop_vinfo);
@@ -11910,7 +11925,8 @@ vectorizable_condition (vec_info *vinfo,
   /* Force vec_compare to be an SSA_NAME rather than a comparison,
 in cases where that's necessary.  */
 
-  if (masks || reduction_type == EXTRACT_LAST_REDUCTION)
+  tree len = NULL_TREE, bias = NULL_TREE;
+  if (masks || lens || reduction_type == EXTRACT_LAST_REDUCTION)
{
  if (!is_gimple_val (vec_compare))
{
@@ -11931,6 +11947,23 @@ vectorizable_condition (vec_info *vinfo,
  vec_compare = vec_compare_name;
}
 
+ if (direct_internal_fn_supported_p (IFN_LEN_FOLD_EXTRACT_LAST,
+ vectype, OPTIMIZE_FOR_SPEED))
+   {
+ if (lens)
+   {
+ len = vect_get_loop_len (loop_vinfo, gsi, lens,
+  vec_num * ncopies, vectype, i, 1);
+ signed char biasval
+   = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+ bias = build_int_cst (intQI_type_node, biasval);
+   }
+ else
+   {
+ 

[PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

2023-08-23 Thread Juzhe-Zhong
Consider the following case:
int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < 4; i++)
if (a[i] < min_v)
  last = i;

  return last;
}

--param=riscv-autovec-preference=fixed-vlmax --param=riscv-autovec-lmul=m8

condition_reduction:
vsetvli a4,zero,e32,m8,ta,ma
li  a5,32
vmv.v.x v8,a1
vl8re32.v   v0,0(a0)
vid.v   v16
vmslt.vv        v0,v0,v8
vsetvli zero,a5,e8,m2,ta,ma
vcpop.m a5,v0
beq a5,zero,.L2
addi    a5,a5,-1
vsetvli a4,zero,e32,m8,ta,ma
vcompress.vm    v8,v16,v0
vslidedown.vx   v8,v8,a5
vmv.x.s a0,v8
ret
.L2:
li  a0,66
ret

--param=riscv-autovec-preference=scalable

condition_reduction:
csrr    a6,vlenb
mv  a2,a0
li  a3,32
li  a0,66
srli    a6,a6,2
vsetvli a4,zero,e32,m1,ta,ma
vmv.v.x v4,a1
vid.v   v1
.L4:
vsetvli a5,a3,e8,mf4,tu,mu
vsetvli zero,a5,e32,m1,ta,ma    ---> redundant vsetvl
vle32.v v0,0(a2)
vsetvli a4,zero,e32,m1,ta,ma
slli    a1,a5,2
vmv.v.x v2,a6
vmslt.vv        v0,v0,v4
sub a3,a3,a5
vmv1r.v v3,v1
vadd.vv v1,v1,v2
vsetvli zero,a5,e8,mf4,ta,ma
vcpop.m a5,v0
beq a5,zero,.L3
addi    a5,a5,-1
vsetvli a4,zero,e32,m1,ta,ma
vcompress.vm    v2,v3,v0
vslidedown.vx   v2,v2,a5
vmv.x.s a0,v2
.L3:
sext.w  a0,a0
add a2,a2,a1
bne a3,zero,.L4
ret

There is a redundant vsetvli instruction in the VLA vectorized code, which is a
VSETVL PASS issue.

The vsetvl issue is not included in this patch but will be fixed soon.

gcc/ChangeLog:

* config/riscv/autovec.md (len_fold_extract_last_<mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_fold_extract_last): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_slide_insn): Ditto.
(emit_cpop_insn): Ditto.
(emit_nonvlmax_compress_insn): Ditto.
(expand_fold_extract_last): Ditto.
* config/riscv/vector.md: Fix vcpop.m ratio demand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c: New test.

---
 gcc/config/riscv/autovec.md   |  24 
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-v.cc   | 115 +-
 gcc/config/riscv/vector.md|   2 +-
 .../riscv/rvv/autovec/reduc/extract_last-1.c  |  20 +++
 .../riscv/rvv/autovec/reduc/extract_last-10.c |   6 +
 .../riscv/rvv/autovec/reduc/extract_last-11.c |  24 
 .../riscv/rvv/autovec/reduc/extract_last-12.c |   6 +
 .../riscv/rvv/autovec/reduc/extract_last-13.c |   7 ++
 .../riscv/rvv

[PATCH] MATCH: remove negate for 1bit types

2023-08-23 Thread Andrew Pinski via Gcc-patches
For 1-bit types, negate either is undefined or doesn't change the value.
In either case we want to remove it.
This patch adds a match pattern to do that.
Also, when converting to a 1-bit type we can remove the negate, just like we
already do for `&1`, so this patch adds that too.
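
For intuition (a sketch mirroring the existing `(-x)&1` reasoning):

  /* For a 1-bit unsigned value b in {0,1}: -0 == 0 and -1 == ...111,
     which truncates to 1, so the negate is dead.  */
  unsigned b = 1;
  unsigned r = (unsigned) (-b) & 1;  /* r == b */
  /* For a signed 1-bit bitfield the values are 0 and -1, and -(-1)
     overflows, so dropping the negate is safe there as well.  */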

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Notes on the testcases:
This patch is the last part to fix PR 95929; cond-bool-2.c testcase.
bit1neg-1.c is a 1bit-field testcase where we could remove the assignment
all the way in one case (which happened on the RTL level for some targets but 
not all).
cond-bool-2.c is the reduced testcase of PR 95929.

PR tree-optimization/95929

gcc/ChangeLog:

* match.pd (convert?(-a)): New pattern
for 1bit integer types.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bit1neg-1.c: New test.
* gcc.dg/tree-ssa/cond-bool-1.c: New test.
* gcc.dg/tree-ssa/cond-bool-2.c: New test.
---
 gcc/match.pd| 12 ++
 gcc/testsuite/gcc.dg/tree-ssa/bit1neg-1.c   | 23 ++
 gcc/testsuite/gcc.dg/tree-ssa/cond-bool-1.c | 21 +
 gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c | 26 +
 4 files changed, 82 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bit1neg-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cond-bool-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index a2e56d5a4e8..3bbeceb37b4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -9090,6 +9090,18 @@ and,
  (if (!TYPE_OVERFLOW_SANITIZED (type))
   (bit_and @0 @1)))
 
+/* `-a` is just `a` if the type is 1bit wide or when converting
+   to a 1bit type; similar to the above transformation of `(-x)&1`.
+   This is used mostly with the transformation of
+   `a ? ~b : b` into `(-a)^b`.
+   It also can show up with bitfields.  */
+(simplify
+ (convert? (negate @0))
+ (if (INTEGRAL_TYPE_P (type)
+  && TYPE_PRECISION (type) == 1
+  && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)))
+  (convert @0)))
+
 /* Optimize
c1 = VEC_PERM_EXPR (a, a, mask)
c2 = VEC_PERM_EXPR (b, b, mask)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bit1neg-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/bit1neg-1.c
new file mode 100644
index 0000000..2f123fbb9b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bit1neg-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+struct f
+{
+  int a:1;
+};
+
+void g(struct f *a)
+{
+ int t = a->a;
+ t = -t;
+ a->a = t;
+}
+void g1(struct f *a, int b)
+{
+ int t = b;
+ t = -t;
+ a->a = t;
+}
+/* the 2 negates should have been removed as this is basically the same
+   as (-a) & 1. */
+/* { dg-final { scan-tree-dump-not " = -" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-1.c
new file mode 100644
index 0000000..752a3030ad1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+_Bool f1(int a, int b)
+{
+  _Bool _1 = b != 0;
+  _Bool _2 = a != 0;
+  _Bool _8 = a == 0;
+  _Bool _13;
+  if (_1) _13 = _8; else _13 = _2;
+  return _13;
+}
+
+/* We should be able to optimize this to (a != 0) ^ (b != 0) */
+/* There should be no negate_expr nor gimple_cond here. */
+
+/* { dg-final { scan-tree-dump-not "negate_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "ne_expr, " 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "gimple_cond " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "gimple_phi " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "bit_xor_expr, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "gimple_assign " 3 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c
new file mode 100644
index 0000000..b3e7e25dec6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+
+/* PR tree-optimization/95929 */
+
+
+static inline _Bool nand(_Bool a, _Bool b)
+{
+return !(a && b);
+}
+
+_Bool f(int a, int b)
+{
+return nand(nand(b, nand(a, a)), nand(a, nand(b, b)));
+}
+
+/* We should be able to optimize this to (a != 0) ^ (b != 0) */
+/* There should be no negate_expr nor gimple_cond here. */
+
+/* { dg-final { scan-tree-dump-not "negate_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "ne_expr, " 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "gimple_cond " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "cond_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "gimple_phi " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "bit_xor_expr, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "gimple_assign " 3 "optimize

[PATCH] testsuite: aarch64: Adjust SVE ACLE tests to new generated code

2023-08-23 Thread Thiago Jung Bauermann via Gcc-patches
Since commit e7a36e4715c7 "[PATCH] RISC-V: Support simplify (-1-x) for
vector." these tests fail on aarch64-linux:

=== g++ tests ===

Running g++:g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp ...
FAIL: gcc.target/aarch64/sve/acle/asm/subr_s8.c -std=gnu++98 -O2 
-fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_FULL  check-function-bodies 
subr_m1_s8_m
FAIL: gcc.target/aarch64/sve/acle/asm/subr_s8.c -std=gnu++98 -O2 
-fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_OVERLOADS  
check-function-bodies subr_m1_s8_m
FAIL: gcc.target/aarch64/sve/acle/asm/subr_u8.c -std=gnu++98 -O2 
-fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_FULL  check-function-bodies 
subr_m1_u8_m
FAIL: gcc.target/aarch64/sve/acle/asm/subr_u8.c -std=gnu++98 -O2 
-fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_OVERLOADS  
check-function-bodies subr_m1_u8_m

=== gcc tests ===

Running gcc:gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp ...
FAIL: gcc.target/aarch64/sve/acle/asm/subr_s8.c -std=gnu90 -O2 
-fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_FULL  check-function-bodies 
subr_m1_s8_m
FAIL: gcc.target/aarch64/sve/acle/asm/subr_s8.c -std=gnu90 -O2 
-fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_OVERLOADS  
check-function-bodies subr_m1_s8_m
FAIL: gcc.target/aarch64/sve/acle/asm/subr_u8.c -std=gnu90 -O2 
-fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_FULL  check-function-bodies 
subr_m1_u8_m
FAIL: gcc.target/aarch64/sve/acle/asm/subr_u8.c -std=gnu90 -O2 
-fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_OVERLOADS  
check-function-bodies subr_m1_u8_m

Andrew Pinski's analysis in PR testsuite/111071 is that the new code is
better and the testcase should be updated. I also asked Prathamesh Kulkarni
in private and he agreed.

Here is the update. With this change, all tests in
gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp pass on aarch64-linux.

gcc/testsuite/
PR testsuite/111071
* gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c: Adjust to 
new code.
* gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c: Likewise.

Suggested-by: Andrew Pinski 
---
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c | 3 +--
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c
index b9615de6655f..3e521bc9ae32 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c
@@ -76,8 +76,7 @@ TEST_UNIFORM_Z (subr_1_s8_m_untied, svint8_t,
 
 /*
 ** subr_m1_s8_m:
-** mov (z[0-9]+\.b), #-1
-** subr    z0\.b, p0/m, z0\.b, \1
+** not z0\.b, p0/m, z0\.b
 ** ret
 */
 TEST_UNIFORM_Z (subr_m1_s8_m, svint8_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c
index 65606b6dda03..4922bdbacc47 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c
@@ -76,8 +76,7 @@ TEST_UNIFORM_Z (subr_1_u8_m_untied, svuint8_t,
 
 /*
 ** subr_m1_u8_m:
-** mov (z[0-9]+\.b), #-1
-** subr    z0\.b, p0/m, z0\.b, \1
+** not z0\.b, p0/m, z0\.b
 ** ret
 */
 TEST_UNIFORM_Z (subr_m1_u8_m, svuint8_t,


[PATCH v5 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-23 Thread Chenghui Pan
This is an update of:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627413.html

Changes since last version of patch set:
- Fix regression test failure of pr54346.c with
  RUNTESTFLAGS="--target_board=unix/-mlsx".
  This is caused by the code simplification of
  loongarch_expand_vec_perm_const_2 () in the last version.
- Combine vilvh/xvilvh insn's RTL template impl.
- Add dg-skip-if for loongarch*-*-* in the vshuf tests in g++.dg/torture, because
  the vshuf/xvshuf insns' results are undefined when bit 6 or 7 of a vector
  element is set, and insns with this condition are generated in these testcases.

Brief version history of patch set:

v1 -> v2:
- Reduce usage of "unspec" in RTL template.
- Append Support of ADDR_REG_REG in LSX and LASX.
- Constraint docs are appended in gcc/doc/md.texi and ccomment block.
- Codes related to vecarg are removed.
- Testsuite of LSX and LASX is added in v2. (Because of the size limitation of
  mail list, these patches are not shown)
- Adjust the loongarch_expand_vector_init() function to reduce the number
  of instructions output.
- Some minor implementation changes of RTL templates.

v2 -> v3:
- Revert vabsd/xvabsd RTL templates to unspec impl.
- Resolve warning in gcc/config/loongarch/loongarch.cc when bootstrapping 
  with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx".
- Remove redundant definitions in lasxintrin.h.
- Refine commit info.

v3 -> v4:
- Code simplification.
- Testsuite patches are splited from this patch set again and will be submitted 
independently in the future.

Lulu Cheng (6):
  LoongArch: Add Loongson SX vector directive compilation framework.
  LoongArch: Add Loongson SX base instruction support.
  LoongArch: Add Loongson SX directive builtin function support.
  LoongArch: Add Loongson ASX vector directive compilation framework.
  LoongArch: Add Loongson ASX base instruction support.
  LoongArch: Add Loongson ASX directive builtin function support.

 gcc/config.gcc|2 +-
 gcc/config/loongarch/constraints.md   |  131 +-
 .../loongarch/genopts/loongarch-strings   |4 +
 gcc/config/loongarch/genopts/loongarch.opt.in |   12 +-
 gcc/config/loongarch/lasx.md  | 5104 
 gcc/config/loongarch/lasxintrin.h | 5338 +
 gcc/config/loongarch/loongarch-builtins.cc| 2686 -
 gcc/config/loongarch/loongarch-c.cc   |   18 +
 gcc/config/loongarch/loongarch-def.c  |6 +
 gcc/config/loongarch/loongarch-def.h  |9 +-
 gcc/config/loongarch/loongarch-driver.cc  |   10 +
 gcc/config/loongarch/loongarch-driver.h   |2 +
 gcc/config/loongarch/loongarch-ftypes.def |  666 +-
 gcc/config/loongarch/loongarch-modes.def  |   39 +
 gcc/config/loongarch/loongarch-opts.cc|   89 +-
 gcc/config/loongarch/loongarch-opts.h |3 +
 gcc/config/loongarch/loongarch-protos.h   |   35 +
 gcc/config/loongarch/loongarch-str.h  |3 +
 gcc/config/loongarch/loongarch.cc | 4660 +-
 gcc/config/loongarch/loongarch.h  |  117 +-
 gcc/config/loongarch/loongarch.md |   56 +-
 gcc/config/loongarch/loongarch.opt|   12 +-
 gcc/config/loongarch/lsx.md   | 4467 ++
 gcc/config/loongarch/lsxintrin.h  | 5181 
 gcc/config/loongarch/predicates.md|  333 +-
 gcc/doc/md.texi   |   11 +
 gcc/testsuite/g++.dg/torture/vshuf-v16hi.C|1 +
 gcc/testsuite/g++.dg/torture/vshuf-v16qi.C|1 +
 gcc/testsuite/g++.dg/torture/vshuf-v2df.C |2 +
 gcc/testsuite/g++.dg/torture/vshuf-v2di.C |1 +
 gcc/testsuite/g++.dg/torture/vshuf-v4sf.C |2 +-
 gcc/testsuite/g++.dg/torture/vshuf-v8hi.C |1 +
 32 files changed, 28717 insertions(+), 285 deletions(-)
 create mode 100644 gcc/config/loongarch/lasx.md
 create mode 100644 gcc/config/loongarch/lasxintrin.h
 create mode 100644 gcc/config/loongarch/lsx.md
 create mode 100644 gcc/config/loongarch/lsxintrin.h

-- 
2.36.0



[PATCH v5 4/6] LoongArch: Add Loongson ASX vector directive compilation framework.

2023-08-23 Thread Chenghui Pan
From: Lulu Cheng 

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Add compilation framework.
* config/loongarch/genopts/loongarch.opt.in: Ditto.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins): Ditto.
* config/loongarch/loongarch-def.c: Ditto.
* config/loongarch/loongarch-def.h (N_ISA_EXT_TYPES): Ditto.
(ISA_EXT_SIMD_LASX): Ditto.
(N_SWITCH_TYPES): Ditto.
(SW_LASX): Ditto.
* config/loongarch/loongarch-driver.cc (driver_get_normalized_m_opts): 
Ditto.
* config/loongarch/loongarch-driver.h (driver_get_normalized_m_opts): 
Ditto.
* config/loongarch/loongarch-opts.cc (isa_str): Ditto.
* config/loongarch/loongarch-opts.h (ISA_HAS_LSX): Ditto.
(ISA_HAS_LASX): Ditto.
* config/loongarch/loongarch-str.h (OPTSTR_LASX): Ditto.
* config/loongarch/loongarch.opt: Ditto.
---
 gcc/config/loongarch/genopts/loongarch-strings |  1 +
 gcc/config/loongarch/genopts/loongarch.opt.in  |  4 
 gcc/config/loongarch/loongarch-c.cc| 11 +++
 gcc/config/loongarch/loongarch-def.c   |  4 +++-
 gcc/config/loongarch/loongarch-def.h   |  6 --
 gcc/config/loongarch/loongarch-driver.cc   |  2 +-
 gcc/config/loongarch/loongarch-driver.h|  1 +
 gcc/config/loongarch/loongarch-opts.cc |  9 -
 gcc/config/loongarch/loongarch-opts.h  |  4 +++-
 gcc/config/loongarch/loongarch-str.h   |  1 +
 gcc/config/loongarch/loongarch.opt |  4 
 11 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index 24a5025061f..35d08f5967d 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -42,6 +42,7 @@ OPTSTR_DOUBLE_FLOAT   double-float
 
 # SIMD extensions
 OPTSTR_LSX lsx
OPTSTR_LASX    lasx
 
 # -mabi=
 OPTSTR_ABI_BASE  abi
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 338d77a7e40..afde23c9661 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -80,6 +80,10 @@ m@@OPTSTR_LSX@@
 Target RejectNegative Var(la_opt_switches) Mask(LSX) Negative(m@@OPTSTR_LSX@@)
 Enable LoongArch SIMD Extension (LSX).
 
+m@@OPTSTR_LASX@@
+Target RejectNegative Var(la_opt_switches) Mask(LASX) Negative(m@@OPTSTR_LASX@@)
+Enable LoongArch Advanced SIMD Extension (LASX).
+
 ;; Base target models (implies ISA & tune parameters)
 Enum
 Name(cpu_type) Type(int)
diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index b065921adc3..2747fb9e472 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -104,8 +104,19 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define ("__loongarch_simd");
   builtin_define ("__loongarch_sx");
   builtin_define ("__loongarch_sx_width=128");
+
+  if (!ISA_HAS_LASX)
+   builtin_define ("__loongarch_simd_width=128");
 }
 
+  if (ISA_HAS_LASX)
+{
+  builtin_define ("__loongarch_asx");
+  builtin_define ("__loongarch_asx_width=256");
+  builtin_define ("__loongarch_simd_width=256");
+}
+
+
   /* Native Data Sizes.  */
   builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE);
   builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE);
diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index 28e24c62249..bff92c86532 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -54,7 +54,7 @@ loongarch_cpu_default_isa[N_ARCH_TYPES] = {
   [CPU_LA464] = {
   .base = ISA_BASE_LA64V100,
   .fpu = ISA_EXT_FPU64,
-  .simd = ISA_EXT_SIMD_LSX,
+  .simd = ISA_EXT_SIMD_LASX,
   },
 };
 
@@ -150,6 +150,7 @@ loongarch_isa_ext_strings[N_ISA_EXT_TYPES] = {
   [ISA_EXT_FPU32] = STR_ISA_EXT_FPU32,
   [ISA_EXT_NOFPU] = STR_ISA_EXT_NOFPU,
   [ISA_EXT_SIMD_LSX] = OPTSTR_LSX,
+  [ISA_EXT_SIMD_LASX] = OPTSTR_LASX,
 };
 
 const char*
@@ -180,6 +181,7 @@ loongarch_switch_strings[] = {
   [SW_SINGLE_FLOAT]  = OPTSTR_SINGLE_FLOAT,
   [SW_DOUBLE_FLOAT]  = OPTSTR_DOUBLE_FLOAT,
   [SW_LSX]   = OPTSTR_LSX,
+  [SW_LASX]  = OPTSTR_LASX,
 };
 
 
diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index f34cffcfb9b..0bbcdb03d22 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -64,7 +64,8 @@ extern const char* loongarch_isa_ext_strings[];
 #define ISA_EXT_FPU64    2
 #define N_ISA_EXT_FPU_TYPES   3
 #define ISA_EXT_SIMD_LSX  3
-#define N_ISA_EXT_TYPES  4
+#define ISA_EXT_SIMD_LASX 4
+#define N_ISA_EXT_TYPES  5
 
 /* enum abi_base */
 extern const char* loong

[PATCH v5 1/6] LoongArch: Add Loongson SX vector directive compilation framework.

2023-08-23 Thread Chenghui Pan
From: Lulu Cheng 

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Add compilation framework.
* config/loongarch/genopts/loongarch.opt.in: Ditto.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins): Ditto.
* config/loongarch/loongarch-def.c: Ditto.
* config/loongarch/loongarch-def.h (N_ISA_EXT_TYPES): Ditto.
(ISA_EXT_SIMD_LSX): Ditto.
(N_SWITCH_TYPES): Ditto.
(SW_LSX): Ditto.
(struct loongarch_isa): Ditto.
* config/loongarch/loongarch-driver.cc (APPEND_SWITCH): Ditto.
(driver_get_normalized_m_opts): Ditto.
* config/loongarch/loongarch-driver.h (driver_get_normalized_m_opts): 
Ditto.
* config/loongarch/loongarch-opts.cc (loongarch_config_target): Ditto.
(isa_str): Ditto.
* config/loongarch/loongarch-opts.h (ISA_HAS_LSX): Ditto.
* config/loongarch/loongarch-str.h (OPTSTR_LSX): Ditto.
* config/loongarch/loongarch.opt: Ditto.
---
 .../loongarch/genopts/loongarch-strings   |  3 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  8 +-
 gcc/config/loongarch/loongarch-c.cc   |  7 ++
 gcc/config/loongarch/loongarch-def.c  |  4 +
 gcc/config/loongarch/loongarch-def.h  |  7 +-
 gcc/config/loongarch/loongarch-driver.cc  | 10 +++
 gcc/config/loongarch/loongarch-driver.h   |  1 +
 gcc/config/loongarch/loongarch-opts.cc| 82 ++-
 gcc/config/loongarch/loongarch-opts.h |  1 +
 gcc/config/loongarch/loongarch-str.h  |  2 +
 gcc/config/loongarch/loongarch.opt|  8 +-
 11 files changed, 128 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index a40998ead97..24a5025061f 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -40,6 +40,9 @@ OPTSTR_SOFT_FLOAT soft-float
 OPTSTR_SINGLE_FLOAT   single-float
 OPTSTR_DOUBLE_FLOAT   double-float
 
+# SIMD extensions
+OPTSTR_LSX lsx
+
 # -mabi=
 OPTSTR_ABI_BASE  abi
 STR_ABI_BASE_LP64Dlp64d
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 4b9b4ac273e..338d77a7e40 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -76,6 +76,9 @@ m@@OPTSTR_DOUBLE_FLOAT@@
 Target Driver RejectNegative Var(la_opt_switches) Mask(FORCE_F64) 
Negative(m@@OPTSTR_SOFT_FLOAT@@)
 Allow hardware floating-point instructions to cover both 32-bit and 64-bit 
operations.
 
+m@@OPTSTR_LSX@@
+Target RejectNegative Var(la_opt_switches) Mask(LSX) Negative(m@@OPTSTR_LSX@@)
+Enable LoongArch SIMD Extension (LSX).
 
 ;; Base target models (implies ISA & tune parameters)
 Enum
@@ -125,11 +128,14 @@ Target RejectNegative Joined ToLower Enum(abi_base) 
Var(la_opt_abi_base) Init(M_
 Variable
 int la_opt_abi_ext = M_OPTION_NOT_SEEN
 
-
 mbranch-cost=
 Target RejectNegative Joined UInteger Var(loongarch_branch_cost)
 -mbranch-cost=COST Set the cost of branches to roughly COST instructions.
 
+mmemvec-cost=
+Target RejectNegative Joined UInteger Var(loongarch_vector_access_cost) 
IntegerRange(1, 5)
+mmemvec-cost=COST  Set the cost of vector memory access instructions.
+
 mcheck-zero-division
 Target Mask(CHECK_ZERO_DIV)
 Trap on integer divide by zero.
diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index 67911b78f28..b065921adc3 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -99,6 +99,13 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   else
 builtin_define ("__loongarch_frlen=0");
 
+  if (ISA_HAS_LSX)
+{
+  builtin_define ("__loongarch_simd");
+  builtin_define ("__loongarch_sx");
+  builtin_define ("__loongarch_sx_width=128");
+}
+
   /* Native Data Sizes.  */
   builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE);
   builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE);
diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index 6729c857f7c..28e24c62249 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -49,10 +49,12 @@ loongarch_cpu_default_isa[N_ARCH_TYPES] = {
   [CPU_LOONGARCH64] = {
   .base = ISA_BASE_LA64V100,
   .fpu = ISA_EXT_FPU64,
+  .simd = 0,
   },
   [CPU_LA464] = {
   .base = ISA_BASE_LA64V100,
   .fpu = ISA_EXT_FPU64,
+  .simd = ISA_EXT_SIMD_LSX,
   },
 };
 
@@ -147,6 +149,7 @@ loongarch_isa_ext_strings[N_ISA_EXT_TYPES] = {
   [ISA_EXT_FPU64] = STR_ISA_EXT_FPU64,
   [ISA_EXT_FPU32] = STR_ISA_EXT_FPU32,
   [ISA_EXT_NOFPU] = STR_ISA_EXT_NOFPU,
+  [ISA_EXT_SIMD_LSX] = OPTSTR_LSX,
 };
 
 const char*
@@ -176,6 +179,7 @@ loongarch_switch_strings[] = {
   [SW_SOFT_FLOAT]= OPTSTR_SOFT_FLOAT,
   [SW_SINGLE_FLOAT]  = 
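
As a usage sketch for the switches added above (the option spellings come
from the .opt hunks; the exact invocation is only an illustration):

    gcc -O2 -mlsx -mmemvec-cost=3 -c foo.c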

PING^2: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-08-23 Thread Xi Ruoyao via Gcc-patches
Ping again.

On Fri, 2023-08-18 at 13:04 +0200, Stefan Schulze Frielinghaus via Gcc-patches 
wrote:
> Ping.  Since this fixes bootstrap problem PR110939 for LoongArch I'm
> pinging this one earlier.
> 
> On Thu, Aug 10, 2023 at 03:04:03PM +0200, Stefan Schulze Frielinghaus wrote:
> > In the former fix in commit 41ef5a34161356817807be3a2e51fbdbe575ae85 I
> > completely missed the fact that the normal form of a generated constant for 
> > a
> > mode with fewer bits than in HOST_WIDE_INT is a sign extended version of the
> > actual constant.  This even holds true for unsigned constants.
> > 
> > Fixed by masking out the upper bits for the incoming constant and sign
> > extending the resulting unsigned constant.
> > 
> > Bootstrapped and regtested on x64 and s390x.  Ok for mainline?
> > 
> > While reading existing optimizations in combine I stumbled across two
> > optimizations where either my intuition about the representation of
> > unsigned integers via a const_int rtx is wrong, which then in turn would
> > probably also mean that this patch is wrong, or that the optimizations
> > are missed sometimes.  In other words in the following I would assume
> > that the upper bits are masked out:
> > 
> > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > index 468b7fde911..80c4ff0fbaf 100644
> > --- a/gcc/combine.cc
> > +++ b/gcc/combine.cc
> > @@ -11923,7 +11923,7 @@ simplify_compare_const (enum rtx_code code, 
> > machine_mode mode,
> >    /* (unsigned) < 0x8000 is equivalent to >= 0.  */
> >    else if (is_a <scalar_int_mode> (mode, &int_mode)
> >    && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
> > -  && ((unsigned HOST_WIDE_INT) const_op
> > +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > (int_mode))
> >    == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 
> > 1)))
> >     {
> >   const_op = 0;
> > @@ -11962,7 +11962,7 @@ simplify_compare_const (enum rtx_code code, 
> > machine_mode mode,
> >    /* (unsigned) >= 0x8000 is equivalent to < 0.  */
> >    else if (is_a <scalar_int_mode> (mode, &int_mode)
> >    && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
> > -  && ((unsigned HOST_WIDE_INT) const_op
> > +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > (int_mode))
> >    == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 
> > 1)))
> >     {
> >   const_op = 0;
> > 
> > For example, while bootstrapping on x64 the optimization is missed since
> > an LTU comparison in QImode is done and the constant equals
> > 0xffffffffffffff80.
> > 
> > Sorry for inlining another patch, but I would really like to make sure
> > that my understanding is correct, now, before I come up with another
> > patch.  Thus it would be great if someone could shed some light on this.
> > 
> > gcc/ChangeLog:
> > 
> > * combine.cc (simplify_compare_const): Properly handle unsigned
> > constants while narrowing comparison of memory and constants.
> > ---
> >  gcc/combine.cc | 19 ++-
> >  1 file changed, 10 insertions(+), 9 deletions(-)
> > 
> > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > index e46d202d0a7..468b7fde911 100644
> > --- a/gcc/combine.cc
> > +++ b/gcc/combine.cc
> > @@ -12003,14 +12003,15 @@ simplify_compare_const (enum rtx_code code, 
> > machine_mode mode,
> >    && !MEM_VOLATILE_P (op0)
> >    /* The optimization makes only sense for constants which are big 
> > enough
> >  so that we have a chance to chop off something at all.  */
> > -  && (unsigned HOST_WIDE_INT) const_op > 0xff
> > -  /* Bail out, if the constant does not fit into INT_MODE.  */
> > -  && (unsigned HOST_WIDE_INT) const_op
> > -    < ((HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1) << 1) 
> > - 1)
> > +  && ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode)) > 
> > 0xff
> >    /* Ensure that we do not overflow during normalization.  */
> > -  && (code != GTU || (unsigned HOST_WIDE_INT) const_op < 
> > HOST_WIDE_INT_M1U))
> > +  && (code != GTU
> > + || ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode))
> > +    < HOST_WIDE_INT_M1U)
> > +  && trunc_int_for_mode (const_op, int_mode) == const_op)
> >  {
> > -  unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT) const_op;
> > +  unsigned HOST_WIDE_INT n
> > +   = (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode);
> >    enum rtx_code adjusted_code;
> >  
> >    /* Normalize code to either LEU or GEU.  */
> > @@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code, 
> > machine_mode mode,
> > HOST_WIDE_INT_PRINT_HEX ") to (MEM %s "
> > HOST_WIDE_INT_PRINT_HEX ").\n", GET_MODE_NAME (int_mode),
> > GET_MODE_NAME (narrow_mode_iter), GET_RTX_NAME (code),
> > -   (unsigned HOST_WIDE_INT)const_op, GET
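
To make the normal form concrete, here is a minimal standalone model of the
sign extension being discussed (trunc_int_for_mode_sketch only mimics the
real trunc_int_for_mode; a 64-bit HOST_WIDE_INT is assumed):

#include <stdio.h>

typedef long long HOST_WIDE_INT;   /* assumption: 64-bit host */

/* Mask C to the mode's precision, then sign-extend from the top bit of
   the mode, which is the canonical form of a const_int for a mode
   narrower than HOST_WIDE_INT.  */
static HOST_WIDE_INT
trunc_int_for_mode_sketch (HOST_WIDE_INT c, int precision)
{
  unsigned long long mask
    = precision < 64 ? (1ULL << precision) - 1 : ~0ULL;
  unsigned long long u = (unsigned long long) c & mask;
  if (precision < 64 && (u >> (precision - 1)) & 1)
    u |= ~mask;
  return (HOST_WIDE_INT) u;
}

int main (void)
{
  /* The unsigned QImode constant 0x80 is stored sign-extended as
     0xffffffffffffff80, which is exactly what the QImode LTU comparison
     mentioned above sees.  */
  printf ("%#llx\n",
          (unsigned long long) trunc_int_for_mode_sketch (0x80, 8));
  return 0;
}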

Re: [committed] i386: Fix grammar typo in diagnostic

2023-08-23 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 23, 2023 at 4:08 PM Hongtao Liu  wrote:
>
> On Wed, Aug 23, 2023 at 3:02 PM Jonathan Wakely  wrote:
> >
> >
> >
> > On Wed, 23 Aug 2023, 06:15 Hongtao Liu via Libstdc++, 
> >  wrote:
> >>
> >> On Wed, Aug 23, 2023 at 7:28 AM Hongtao Liu  wrote:
> >> >
> >> > On Tue, Aug 8, 2023 at 5:22 AM Marek Polacek via Libstdc++
> >> >  wrote:
> >> > >
> >> > > On Mon, Aug 07, 2023 at 10:12:35PM +0100, Jonathan Wakely via 
> >> > > Gcc-patches wrote:
> >> > > > Committed as obvious.
> >> > > >
> >> > > > Less obvious (to me) is whether it's correct to say "GCC V13" here. I
> >> > > > don't think we refer to a version that way anywhere else, do we?
> >> > > >
> >> > > > Would "since GCC 13.1.0" be better?
> >> > >
> >> > > x86_field_alignment uses
> >> > >
> >> > >   inform (input_location, "the alignment of %<_Atomic %T%> 
> >> > > "
> >> > >   "fields changed in %{GCC 11.1%}",
> >> > >
> >> > > so maybe the below should use %{GCC 13.1%}.  "GCC V13" looks unusual
> >> > > to me.
> >> >  %{GCC 13.1%} sounds reasonable.
> >> looks like %{ can't be used in const char*, so use %< instead.
> >>
> >> How about:
> >>
> >> Author: liuhongt 
> >> Date:   Wed Aug 23 07:31:13 2023 +0800
> >>
> >> Adjust GCC V13 to GCC 13.1 in diagnostic.
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/i386/i386.cc (ix86_invalid_conversion): Adjust GCC
> >> V13 to GCC 13.1.
> >>
> >> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> >> index e7822ef6500..88d9d7d537f 100644
> >> --- a/gcc/config/i386/i386.cc
> >> +++ b/gcc/config/i386/i386.cc
> >> @@ -22899,7 +22899,7 @@ ix86_invalid_conversion (const_tree fromtype,
> >> const_tree totype)
> >>   || (TYPE_MODE (totype) == BFmode
> >>   && TYPE_MODE (fromtype) == HImode))
> >> warning (0, "%<__bfloat16%> is redefined from typedef %<short%> "
> >> -   "to real %<__bf16%> since GCC V13, be careful of "
> >> +   "to real %<__bf16%> since %<GCC 13.1%>, be careful of "
> >>  "implicit conversion between %<__bf16%> and %<short%>; "
> >>  "an explicit bitcast may be needed here");
> >>  }
> >
> >
> >
> > Why does it need to be quoted? What's wrong with just saying GCC 13.1 
> > without the %< decoration?
> I'll just remove that.
pushed to trunk and backport to GCC13 release branch.
> >
> >
> >
> >>
> >> > >
> >> > > > -- >8 --
> >> > > >
> >> > > > gcc/ChangeLog:
> >> > > >
> >> > > >   * config/i386/i386.cc (ix86_invalid_conversion): Fix grammar.
> >> > > > ---
> >> > > >  gcc/config/i386/i386.cc | 2 +-
> >> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> > > >
> >> > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> >> > > > index 50860050049..5d57726e22c 100644
> >> > > > --- a/gcc/config/i386/i386.cc
> >> > > > +++ b/gcc/config/i386/i386.cc
> >> > > > @@ -22890,7 +22890,7 @@ ix86_invalid_conversion (const_tree 
> >> > > > fromtype, const_tree totype)
> >> > > >   warning (0, "%<__bfloat16%> is redefined from typedef 
> >> > > > %<short%> "
> >> > > >   "to real %<__bf16%> since GCC V13, be careful of "
> >> > > >"implicit conversion between %<__bf16%> and 
> >> > > > %<short%>; "
> >> > > > -  "a explicit bitcast may be needed here");
> >> > > > +  "an explicit bitcast may be needed here");
> >> > > >  }
> >> > > >
> >> > > >/* Conversion allowed.  */
> >> > > > --
> >> > > > 2.41.0
> >> > > >
> >> > >
> >> > > Marek
> >> > >
> >> >
> >> >
> >> > --
> >> > BR,
> >> > Hongtao
> >>
> >>
> >>
> >> --
> >> BR,
> >> Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: [PATCH] Fix target_clone ("arch=graniterapids-d") and target_clone ("arch=arrowlake-s")

2023-08-23 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 23, 2023 at 12:31 PM liuhongt  wrote:
>
> Both "graniterapid-d" and "graniterapids" are attached with
> PROCESSOR_GRANITERAPID in processor_alias_table but mapped to
> different __cpu_subtype in get_intel_cpu.
>
> And get_builtin_code_for_version will try to match the first
> PROCESSOR_GRANITERAPIDS in processor_alias_table which maps to
> "granitepraids" here.
>
> 1861  else if (new_target->arch_specified && new_target->arch > 0)
> 1862for (i = 0; i < pta_size; i++)
> 1863  if (processor_alias_table[i].processor == new_target->arch)
> 1864{
> 1865  const pta *arch_info = &processor_alias_table[i];
> 1866  switch (arch_info->priority)
> 1867{
> 1868default:
> 1869  arg_str = arch_info->name;
>
> This mismatch makes dispatch_function_versions check the predicate
> of __builtin_cpu_is ("graniterapids") for "graniterapids-d" and causes
> the issue.
> The patch explicitly adds PROCESSOR_ARROWLAKE_S and
> PROCESSOR_GRANITERAPIDS_D to make a distinction.
>
> For "alderlake","raptorlake", "meteorlake" they share same isa, cost,
> tuning, and mapped to the same __cpu_type/__cpu_subtype in
> get_intel_cpu, so no need to add PROCESSOR_RAPTORLAKE and others.
>
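
A hedged reproducer sketch for the dispatch mismatch described above (the
clone set is made up for illustration; the attribute spelling follows the
GCC manual):

__attribute__ ((target_clones ("default", "arch=graniterapids",
                               "arch=graniterapids-d")))
int foo (void)
{
  return 1;
}

int main (void)
{
  /* Before the fix, both non-default clones resolved their dispatch
     predicate via __builtin_cpu_is ("graniterapids"), so the
     graniterapids-d clone could never be selected on Granite Rapids-D.  */
  return foo ();
}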
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk (and backport the graniterapids-d part to GCC 13)?
Push to trunk and backport to GCC13 release branch.
>
> gcc/ChangeLog:
>
> * common/config/i386/i386-common.cc (processor_names): Add new
> member graniterapids-s and arrowlake-s.
> * config/i386/i386-options.cc (processor_alias_table): Update
> table with PROCESSOR_ARROWLAKE_S and
> PROCESSOR_GRANITERAPIDS_D.
> (m_GRANITERAPIDS_D): New macro.
> (m_ARROWLAKE_S): Ditto.
> (m_CORE_AVX512): Add m_GRANITERAPIDS_D.
> (processor_cost_table): Add icelake_cost for
> PROCESSOR_GRANITERAPIDS_D and alderlake_cost for
> PROCESSOR_ARROWLAKE_S.
> * config/i386/x86-tune.def: Handle m_ARROWLAKE_S the same as
> m_ARROWLAKE.
> * config/i386/i386.h (enum processor_type): Add new member
> PROCESSOR_GRANITERAPIDS_D and PROCESSOR_ARROWLAKE_S.
> * config/i386/i386-c.cc (ix86_target_macros_internal): Handle
> PROCESSOR_GRANITERAPIDS_D and PROCESSOR_ARROWLAKE_S.
> ---
>  gcc/common/config/i386/i386-common.cc | 11 +++--
>  gcc/config/i386/i386-c.cc | 15 +++
>  gcc/config/i386/i386-options.cc   |  6 ++-
>  gcc/config/i386/i386.h|  4 +-
>  gcc/config/i386/x86-tune.def  | 63 ++-
>  5 files changed, 62 insertions(+), 37 deletions(-)
>
> diff --git a/gcc/common/config/i386/i386-common.cc 
> b/gcc/common/config/i386/i386-common.cc
> index 12a01704a73..1e11163004b 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -2155,7 +2155,9 @@ const char *const processor_names[] =
>"alderlake",
>"rocketlake",
>"graniterapids",
> +  "graniterapids-d",
>"arrowlake",
> +  "arrowlake-s",
>"intel",
>"lujiazui",
>"geode",
> @@ -2279,13 +2281,14 @@ const pta processor_alias_table[] =
>  M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
>{"graniterapids", PROCESSOR_GRANITERAPIDS, CPU_HASWELL, PTA_GRANITERAPIDS,
>  M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX512F},
> -  {"graniterapids-d", PROCESSOR_GRANITERAPIDS, CPU_HASWELL, 
> PTA_GRANITERAPIDS_D,
> -M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D), P_PROC_AVX512F},
> +  {"graniterapids-d", PROCESSOR_GRANITERAPIDS_D, CPU_HASWELL,
> +PTA_GRANITERAPIDS_D, M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D),
> +P_PROC_AVX512F},
>{"arrowlake", PROCESSOR_ARROWLAKE, CPU_HASWELL, PTA_ARROWLAKE,
>  M_CPU_SUBTYPE (INTEL_COREI7_ARROWLAKE), P_PROC_AVX2},
> -  {"arrowlake-s", PROCESSOR_ARROWLAKE, CPU_HASWELL, PTA_ARROWLAKE_S,
> +  {"arrowlake-s", PROCESSOR_ARROWLAKE_S, CPU_HASWELL, PTA_ARROWLAKE_S,
>  M_CPU_SUBTYPE (INTEL_COREI7_ARROWLAKE_S), P_PROC_AVX2},
> -  {"lunarlake", PROCESSOR_ARROWLAKE, CPU_HASWELL, PTA_ARROWLAKE_S,
> +  {"lunarlake", PROCESSOR_ARROWLAKE_S, CPU_HASWELL, PTA_ARROWLAKE_S,
>  M_CPU_SUBTYPE (INTEL_COREI7_ARROWLAKE_S), P_PROC_AVX2},
>{"bonnell", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
>  M_CPU_TYPE (INTEL_BONNELL), P_PROC_SSSE3},
> diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
> index caef5531593..0e11709ebc5 100644
> --- a/gcc/config/i386/i386-c.cc
> +++ b/gcc/config/i386/i386-c.cc
> @@ -258,6 +258,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>def_or_undef (parse_in, "__graniterapids");
>def_or_undef (parse_in, "__graniterapids__");
>break;
> +case PROCESSOR_GRANITERAPIDS_D:
> +  def_or_undef (parse_in, "__graniterapids_d");
> +  def_or_undef (parse_in, "__graniterapids_d__");
> +  break;
>  case PROCESSOR_ALDERLAKE:
>

Re: [PATCH v5 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-23 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-24 at 11:13 +0800, Chenghui Pan wrote:
> - Add dg-skip-if for loongarch*-*-* in vshuf test in g++.dg/torture, because
>   vshuf/xvshuf insns' result is undefined when bit 6 or 7 of a vector element
> is set,
>   and insns with this condition are generated in these testcases.

I'm almost sure this is wrong.  You need to fix the code generation so
__builtin_shuffle will always generate something defined on LoongArch,
instead of covering up the issue.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v5 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-23 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-24 at 11:40 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Thu, 2023-08-24 at 11:13 +0800, Chenghui Pan wrote:
> > - Add dg-skip-if for loongarch*-*-* in vshuf test in g++.dg/torture, because
> >   vshuf/xvshuf insns' result is undefined when bit 6 or 7 of a vector
> > element is set,
> >   and insns with this condition are generated in these testcases.
> 
> I'm almost sure this is wrong.  You need to fix the code generation so
> __builtin_shuffle will always generate something defined on LoongArch,
> instead of covering up the issue.

https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html says clearly:

   The elements of the input vectors are numbered in memory ordering of
   vec0 beginning at 0 and vec1 beginning at N. The elements of mask are
   considered modulo N in the single-operand case and modulo 2*N in the
   two-operand case.
   
So there is no undefined thing allowed here.  You must implement it as it's
documented.
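
A minimal C sketch of the documented modulo semantics (the values in the
comments follow from the quoted rule; the program itself is only an
illustration):

typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));

int main (void)
{
  v4si a = { 10, 11, 12, 13 };
  /* Mask elements are taken modulo N (4 here): 5 selects element 1,
     and -1 (all bits set) selects element 3.  */
  v4si mask = { 5, 0, -1, 2 };
  v4si r = __builtin_shuffle (a, mask);   /* { 11, 10, 13, 12 } */
  return r[0] == 11 ? 0 : 1;
}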

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH v1] RISC-V: Support rounding mode for VFMADD/VFMACC autovec

2023-08-23 Thread Pan Li via Gcc-patches
From: Pan Li 

There will be a case like the one below for the intrinsic and autovec combination:

vfadd RTZ   <- intrinsic static rounding
vfmadd  <- autovec/autovec-opt

The autovec-generated vfmadd should take DYN mode, and the
frm must be restored before the vfmadd insn. This patch
would like to fix this issue by:

* Add the frm operand to the vfmadd/vfmacc autovec/autovec-opt pattern.
* Set the frm_mode attr to DYN.

Thus, the frm flow when combining autovec and intrinsic code should be:

+
| frrm  a5
| ...
| fsrmi 4
| vfadd   <- intrinsic static rounding.
| ...
| fsrm  a5
| vfmadd  <- autovec/autovec-opt
| ...
+
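
A hedged C sketch of such a mixed case (the _rm intrinsic spelling and the
__RISCV_FRM_RTZ constant follow the RVV intrinsics draft and are assumptions
here; the loop is expected to autovectorize into vfmadd/vfmacc):

#include <riscv_vector.h>

vfloat32m1_t
mixed (vfloat32m1_t a, vfloat32m1_t b, size_t vl,
       float *x, float *y, float *restrict z, int n)
{
  /* Intrinsic with a static rounding mode (RTZ): emits fsrmi/fsrm.  */
  vfloat32m1_t r = __riscv_vfadd_vv_f32m1_rm (a, b, __RISCV_FRM_RTZ, vl);

  /* Autovectorized fma: per this patch it is emitted with DYN rounding,
     so frm must be restored before the generated vfmadd insn.  */
  for (int i = 0; i < n; i++)
    z[i] = x[i] * y[i] + z[i];

  return r;
}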

However, we leverage an unspec instead of a use to consume the FRM register
because of some restrictions in the combine pass. Some code
paths of try_combine may require XVECLEN (pat, 0) == 2 for
recog_for_combine, and adding a new use would make XVECLEN (pat, 0) == 3
and make the vfwmacc optimization fail, for example in the
tests widen-complicate-5.c and widen-8.c.

Finally, there are other fma cases; they will be covered in
follow-up patches.

Signed-off-by: Pan Li 
Co-Authored-By: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfmadd/vfmacc.
* config/riscv/autovec.md: Ditto.
* config/riscv/vector-iterators.md: Add UNSPEC_VFFMA.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-1.c: New test.
---
 gcc/config/riscv/autovec-opt.md   | 32 ---
 gcc/config/riscv/autovec.md   | 26 +++---
 gcc/config/riscv/vector-iterators.md  |  2 +
 .../rvv/base/float-point-frm-autovec-1.c  | 88 +++
 4 files changed, 125 insertions(+), 23 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-autovec-1.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 99b609a99d9..4b07e80ad95 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -459,12 +459,14 @@ (define_insn_and_split "*pred_single_widen_mul"
 ;; vect__13.182_33 = .FMA (vect__11.180_35, vect__8.176_40, vect__4.172_45);
 (define_insn_and_split "*double_widen_fma"
   [(set (match_operand:VWEXTF 0 "register_operand")
-   (fma:VWEXTF
- (float_extend:VWEXTF
-   (match_operand: 2 "register_operand"))
- (float_extend:VWEXTF
-   (match_operand: 3 "register_operand"))
- (match_operand:VWEXTF 1 "register_operand")))]
+   (unspec:VWEXTF
+ [(fma:VWEXTF
+   (float_extend:VWEXTF
+ (match_operand: 2 "register_operand"))
+   (float_extend:VWEXTF
+ (match_operand: 3 "register_operand"))
+   (match_operand:VWEXTF 1 "register_operand"))
+  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
   "TARGET_VECTOR && can_create_pseudo_p ()"
   "#"
   "&& 1"
@@ -475,16 +477,19 @@ (define_insn_and_split "*double_widen_fma<mode>"
 DONE;
   }
   [(set_attr "type" "vfwmuladd")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
 
 ;; This helps to match ext + fma.
 (define_insn_and_split "*single_widen_fma"
   [(set (match_operand:VWEXTF 0 "register_operand")
-   (fma:VWEXTF
- (float_extend:VWEXTF
-   (match_operand: 2 "register_operand"))
- (match_operand:VWEXTF 3 "register_operand")
- (match_operand:VWEXTF 1 "register_operand")))]
+   (unspec:VWEXTF
+ [(fma:VWEXTF
+   (float_extend:VWEXTF
+ (match_operand: 2 "register_operand"))
+   (match_operand:VWEXTF 3 "register_operand")
+   (match_operand:VWEXTF 1 "register_operand"))
+  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
   "TARGET_VECTOR && can_create_pseudo_p ()"
   "#"
   "&& 1"
@@ -501,7 +506,8 @@ (define_insn_and_split "*single_widen_fma<mode>"
 DONE;
   }
   [(set_attr "type" "vfwmuladd")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
 
 ;; -
 ;;  [FP] VFWNMSAC
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index acca4c22b90..4894986d2a5 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1126,22 +1126,27 @@ (define_insn_and_split "*fnma<mode>"
 (define_expand "fma<mode>4"
   [(parallel
 [(set (match_operand:VF 0 "register_operand")
- (fma:VF
-   (match_operand:VF 1 "register_operand")
-   (match_operand:VF 2 "register_operand")
-   (match_operand:VF 3 "register_operand")))
+ (unspec:VF
+   [(fma:VF
+ (match_operand:VF 1 "register_operand")
+ (match_operand:VF 2 "register_operand")
+ (match_operand:VF 3 "register_operand"))
+(reg:SI FRM_REGNUM)] UNSPEC_VFFMA))
  (clobber (match_dup 4))])]
   "TARGET_VECTOR"
  

RE: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

2023-08-23 Thread Li, Pan2 via Gcc-patches
Thanks Jeff.

> That implies a save/restore pair around the call (possibly optimized so 
> that we minimize the number of save/restores).  I would have expected 
> x86 to already be doing this.  But maybe there's some ABI thing around 
> mmx vs x86 state that allows it to be avoided

Very similar to a save/restore pair, but optional.
If there is no static rounding mode intrinsic here, it is unnecessary to add a
save/restore
pair around the call. I bet mode-switching takes care of this already.

Pan

-Original Message-
From: Jeff Law  
Sent: Thursday, August 24, 2023 7:27 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
kito.ch...@gmail.com
Subject: Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook



On 8/23/23 08:54, Li, Pan2 wrote:
> Thanks Jeff for comments.
> 
>> Understood.  So the natural question is why does x86/sh not need this
>> for its mode switching?   Don't all the same issues exist on those
>> targets as well?
> 
> AFAIK, it comes from the different design principles of the risc-v and
> x86/arm intrinsic APIs.
> 
> The risc-v rvv FP rounding mode intrinsic API has one abstraction level above
> the insn itself, while
> the x86/arm one only indicates the semantics of the insn.
> For example, if one vector instruction VFADD doesn't have static rounding 
> mode (aka encoding rm in insn),
> there is no such a intrinsic API contains rounding mode argument in x86/arm. 
> While the risc-v fp
> vector intrinsic will always have static rounding mode API if the frm is 
> honored.
> 
> In short, the risc-v intrinsic API is closer to the end user, while the
> x86/arm intrinsic API is closer to the insn itself.
OK, but I'm still struggling to see how the distinction is important 
here.  Ultimately there's a state at a call site.  We need to make sure 
that state from the current function doesn't impact the callee and we 
need to make sure that the callee doesn't impact the state in the caller.

That implies a save/restore pair around the call (possibly optimized so 
that we minimize the number of save/restores).  I would have expected 
x86 to already be doing this.  But maybe there's some ABI thing around 
mmx vs x86 state that allows it to be avoided

> 
> For the rest, I will have a try based on your suggestion soon, as I am in
> the middle of something.
No problem.  Get to it when you can.  I think it affects you more than 
me :-)

jeff


RE: [PATCH v1] RISC-V: Refactor RVV class by frm_op_type template arg

2023-08-23 Thread Li, Pan2 via Gcc-patches
> So in the expand method, you added a case for OP_TYPE_vx. 

Actually this patch doesn't add a case for OP_TYPE_vx; there were two classes,
binop_frm and binop, before this patch.
Binop_frm doesn't have OP_TYPE_vx while binop has OP_TYPE_vx. When the
whole binop_frm is deleted, the git diff makes it
look like a case for OP_TYPE_vx was added, but actually it was not.

As Jeff pre-approved it, I will commit the v2 (adding the gcc_assert suggested
by Kito) around the end of this week if there are no more comments.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Li, Pan2 via Gcc-patches
Sent: Tuesday, August 22, 2023 8:10 AM
To: Kito Cheng ; Jeff Law 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: RE: [PATCH v1] RISC-V: Refactor RVV class by frm_op_type template arg

Thanks Kito and Jeff for comments, will double check and address the comment in 
v2.

Pan

-Original Message-
From: Kito Cheng  
Sent: Monday, August 21, 2023 11:07 PM
To: Jeff Law 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; Wang, Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Refactor RVV class by frm_op_type template arg

Just one nit from me: plz add assertion to OP_TYPE_vx to make sure NO
FRM_OP == HAS_FRM there

On Mon, Aug 21, 2023 at 11:04 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/17/23 20:53, Pan Li via Gcc-patches wrote:
> > From: Pan Li 
> >
> > As suggested by kito, we will add new frm_opt_type template arg
> > to the op class, to avoid the duplicated function expand.
> >
> > Signed-off-by: Pan Li 
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-vector-builtins-bases.cc
> >   (class binop_frm): Removed.
> >   (class reverse_binop_frm): Ditto.
> >   (class widen_binop_frm): Ditto.
> >   (class vfmacc_frm): Ditto.
> >   (class vfnmacc_frm): Ditto.
> >   (class vfmsac_frm): Ditto.
> >   (class vfnmsac_frm): Ditto.
> >   (class vfmadd_frm): Ditto.
> >   (class vfnmadd_frm): Ditto.
> >   (class vfmsub_frm): Ditto.
> >   (class vfnmsub_frm): Ditto.
> >   (class vfwmacc_frm): Ditto.
> >   (class vfwnmacc_frm): Ditto.
> >   (class vfwmsac_frm): Ditto.
> >   (class vfwnmsac_frm): Ditto.
> >   (class unop_frm): Ditto.
> >   (class vfrec7_frm): Ditto.
> >   (class binop): Add frm_op_type template arg.
> >   (class unop): Ditto.
> >   (class widen_binop): Ditto.
> >   (class widen_binop_fp): Ditto.
> >   (class reverse_binop): Ditto.
> >   (class vfmacc): Ditto.
> >   (class vfnmsac): Ditto.
> >   (class vfmadd): Ditto.
> >   (class vfnmsub): Ditto.
> >   (class vfnmacc): Ditto.
> >   (class vfmsac): Ditto.
> >   (class vfnmadd): Ditto.
> >   (class vfmsub): Ditto.
> >   (class vfwmacc): Ditto.
> >   (class vfwnmacc): Ditto.
> >   (class vfwmsac): Ditto.
> >   (class vfwnmsac): Ditto.
> >   (class float_misc): Ditto.
> So in the expand method, you added a case for OP_TYPE_vx.  I assume that
> was intentional -- but it's not mentioned anywhere in the ChangeLog.  So
> please update the ChangeLog if it was intentional or remove the change
> if it wasn't intentional.  Pre-approved with whichever change is
> appropriate.
>
> Thanks,
> Jeff


Re: [PATCH] rs6000: Disable PCREL for unsupported targets [PR111045]

2023-08-23 Thread Kewen.Lin via Gcc-patches
Hi Peter,

on 2023/8/24 10:07, Peter Bergner wrote:
> On 8/21/23 8:51 PM, Kewen.Lin wrote:
>>> The following patch has been bootstrapped and regtested on powerpc64-linux.
>>
>> I think we should test this on powerpc64le-linux P8 or P9 (no P10) as well.
> 
> That's a good idea!
> 
> 
> 
>> I think this should be moved to be with the hunk on PCREL:
>>
>>   /* If the ABI has support for PC-relative relocations, enable it by 
>> default.
>>  This test depends on the sub-target tests above setting the code model 
>> to
>>  medium for ELF v2 systems.  */
>>   if (PCREL_SUPPORTED_BY_OS
>>   && (rs6000_isa_flags_explicit & OPTION_MASK_PCREL) == 0)
>> rs6000_isa_flags |= OPTION_MASK_PCREL;
>>
>>   /* -mpcrel requires -mcmodel=medium, but we can't check TARGET_CMODEL until
>>   after the subtarget override options are done.  */
>>   else if (TARGET_PCREL && TARGET_CMODEL != CMODEL_MEDIUM)
>> {
>>   if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
>>  error ("%qs requires %qs", "-mpcrel", "-mcmodel=medium");
>>
>>   rs6000_isa_flags &= ~OPTION_MASK_PCREL;
>> }
> 
> Agreed on the location, but...
> 
> Looking at this closer, I don't think I'm happy with the current code.
> We seem to have duplicated tests for whether the target supports pcrel
> or not in both PCREL_SUPPORTED_BY_OS and rs6000_pcrel_p().  That means
> if another target were to add support for pcrel, they'd have to update
> multiple locations of the code, and that seems error prone.
> 

Good point!  I also noticed this, but I wasn't sure what the future
supported target/OS combinations would be like, or what would be common
fundamentally, so I thought maybe we could just leave it for now.
I've changed my mind now and I agree we can do more.

> I think we should standardize our tests for whether the target/OS
> supports pcrel (irrespective of the -mpcrel or -mcmodel=medium options)
> and have that in PCREL_SUPPORTED_BY_OS.  Ie, a one-stop-shop for testing
> whether the current target/OS can support pcrel.  Then we should modify
> rs6000_pcrel_p() use PCREL_SUPPORTED_BY_OS rather than its own
> semi-duplicated target/OS tests, plus any other tests for options that
> might disqualify the current target/OS from supporting pcrel, when it
> normally can (ie, -mmodel != medium for ELFv2).

By looking into the uses of the function rs6000_pcrel_p, I think we can
just replace it with TARGET_PCREL.  Previously we didn't require PCREL
to be unset for any unsupported target/OS, so we needed rs6000_pcrel_p() to
ensure it's really supported at those use sites; now, if we can guarantee
TARGET_PCREL only holds where it's supported, it looks like
we can just check TARGET_PCREL?

Then the code structure can look like:

if (PCREL_SUPPORTED_BY_OS
&& (rs6000_isa_flags_explicit & OPTION_MASK_PCREL) == 0)
   // enable
else if (TARGET_PCREL && DEFAULT_ABI != ABI_ELFv2)
   // disable
else if (TARGET_PCREL && TARGET_CMODEL != CMODEL_MEDIUM)
   // disable
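
Fleshed out a bit (a sketch only; the exact error wording is an assumption):

  if (PCREL_SUPPORTED_BY_OS
      && (rs6000_isa_flags_explicit & OPTION_MASK_PCREL) == 0)
    rs6000_isa_flags |= OPTION_MASK_PCREL;
  else if (TARGET_PCREL && DEFAULT_ABI != ABI_ELFv2)
    {
      if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
	error ("%qs requires %qs", "-mpcrel", "-mabi=elfv2");
      rs6000_isa_flags &= ~OPTION_MASK_PCREL;
    }
  else if (TARGET_PCREL && TARGET_CMODEL != CMODEL_MEDIUM)
    {
      if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
	error ("%qs requires %qs", "-mpcrel", "-mcmodel=medium");
      rs6000_isa_flags &= ~OPTION_MASK_PCREL;
    }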

Here, the ABI_ELFv2 and CMODEL_MEDIUM checking is specific to the currently
supported target/OS; in the future, when we have a new supported
target/OS, this part can be factored out into one subtarget-specific
checking function/macro.

Does it make sense?

BR,
Kewen

> 
> I think then, that should allow simplifying the code in
> rs6000_option_override_internal.
> 
> Thoughts?
> 
> 
> Peter
> 
> 



Re: [committed] i386: Fix grammar typo in diagnostic

2023-08-23 Thread Jonathan Wakely via Gcc-patches
On Thu, 24 Aug 2023, 04:38 Hongtao Liu,  wrote:

> On Wed, Aug 23, 2023 at 4:08 PM Hongtao Liu  wrote:
> >
> > On Wed, Aug 23, 2023 at 3:02 PM Jonathan Wakely 
> wrote:
> > >
> > >
> > >
> > > On Wed, 23 Aug 2023, 06:15 Hongtao Liu via Libstdc++, <
> libstd...@gcc.gnu.org> wrote:
> > >>
> > >> On Wed, Aug 23, 2023 at 7:28 AM Hongtao Liu 
> wrote:
> > >> >
> > >> > On Tue, Aug 8, 2023 at 5:22 AM Marek Polacek via Libstdc++
> > >> >  wrote:
> > >> > >
> > >> > > On Mon, Aug 07, 2023 at 10:12:35PM +0100, Jonathan Wakely via
> Gcc-patches wrote:
> > >> > > > Committed as obvious.
> > >> > > >
> > >> > > > Less obvious (to me) is whether it's correct to say "GCC V13"
> here. I
> > >> > > > don't think we refer to a version that way anywhere else, do we?
> > >> > > >
> > >> > > > Would "since GCC 13.1.0" be better?
> > >> > >
> > >> > > x86_field_alignment uses
> > >> > >
> > >> > >   inform (input_location, "the alignment of %<_Atomic
> %T%> "
> > >> > >   "fields changed in %{GCC
> 11.1%}",
> > >> > >
> > >> > > so maybe the below should use %{GCC 13.1%}.  "GCC V13" looks
> unusual
> > >> > > to me.
> > >> >  %{GCC 13.1%} sounds reasonable.
> > >> looks like %{ can't be used in const char*, so use %<
> instead.
> > >>
> > >> How about:
> > >>
> > >> Author: liuhongt 
> > >> Date:   Wed Aug 23 07:31:13 2023 +0800
> > >>
> > >> Adjust GCC V13 to GCC 13.1 in diagnostic.
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >> * config/i386/i386.cc (ix86_invalid_conversion): Adjust
> GCC
> > >> V13 to GCC 13.1.
> > >>
> > >> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > >> index e7822ef6500..88d9d7d537f 100644
> > >> --- a/gcc/config/i386/i386.cc
> > >> +++ b/gcc/config/i386/i386.cc
> > >> @@ -22899,7 +22899,7 @@ ix86_invalid_conversion (const_tree fromtype,
> > >> const_tree totype)
> > >>   || (TYPE_MODE (totype) == BFmode
> > >>   && TYPE_MODE (fromtype) == HImode))
> > >> warning (0, "%<__bfloat16%> is redefined from typedef
> %<short%> "
> > >> -   "to real %<__bf16%> since GCC V13, be careful of "
> > >> +   "to real %<__bf16%> since %<GCC 13.1%>, be careful of
> "
> > >>  "implicit conversion between %<__bf16%> and
> %<short%>; "
> > >>  "an explicit bitcast may be needed here");
> > >>  }
> > >
> > >
> > >
> > > Why does it need to be quoted? What's wrong with just saying GCC 13.1
> without the %< decoration?
> > I'll just remove that.
> pushed to trunk and backport to GCC13 release branch.
>

Thanks!


> >
> > >
> > >
> > >>
> > >> > >
> > >> > > > -- >8 --
> > >> > > >
> > >> > > > gcc/ChangeLog:
> > >> > > >
> > >> > > >   * config/i386/i386.cc (ix86_invalid_conversion): Fix
> grammar.
> > >> > > > ---
> > >> > > >  gcc/config/i386/i386.cc | 2 +-
> > >> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >> > > >
> > >> > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > >> > > > index 50860050049..5d57726e22c 100644
> > >> > > > --- a/gcc/config/i386/i386.cc
> > >> > > > +++ b/gcc/config/i386/i386.cc
> > >> > > > @@ -22890,7 +22890,7 @@ ix86_invalid_conversion (const_tree
> fromtype, const_tree totype)
> > >> > > >   warning (0, "%<__bfloat16%> is redefined from typedef
> %<short%> "
> > >> > > >   "to real %<__bf16%> since GCC V13, be careful of "
> > >> > > >"implicit conversion between %<__bf16%> and
> %<short%>; "
> > >> > > > -  "a explicit bitcast may be needed here");
> > >> > > > +  "an explicit bitcast may be needed here");
> > >> > > >  }
> > >> > > >
> > >> > > >/* Conversion allowed.  */
> > >> > > > --
> > >> > > > 2.41.0
> > >> > > >
> > >> > >
> > >> > > Marek
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > BR,
> > >> > Hongtao
> > >>
> > >>
> > >>
> > >> --
> > >> BR,
> > >> Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao
>


Re: [PATCH] MATCH: [PR111109] Fix bit_ior(cond, cond) when comparisons are fp

2023-08-23 Thread Richard Biener via Gcc-patches
On Wed, Aug 23, 2023 at 11:51 PM Andrew Pinski via Gcc-patches
 wrote:
>
> The patterns that were added in r13-4620-g4d9db4bdd458 missed that
> (a > b) and (a <= b) are not inverses of each other for floating point
> comparisons (if NaNs are supported). Even though there was a check for
> integral types, it was only for the result of the cond rather than for the
> type of what is being compared. The fix is to check whether cmp and
> icmp are inverses of each other by using the invert_tree_comparison function.
>
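
A minimal standalone illustration of the non-inverse point above (hedged;
this is not part of the patch):

#include <math.h>

/* With a NaN operand both comparisons are false, so the two masks are
   (0, 0) rather than complementary.  */
int not_inverse (void)
{
  volatile double a = NAN, b = 0.0;
  return (a > b) == 0 && (a <= b) == 0;   /* yields 1 */
}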
> OK for trunk and GCC 13 branch? Bootstrapped and tested on x86_64-linux-gnu 
> with no regressions.

OK.

Thanks,
Richard.

> I added the testcase to execute/ieee as it requires support for NAN.
>
> PR tree-optimization/111109
>
> gcc/ChangeLog:
>
> * match.pd (ior(cond,cond), ior(vec_cond,vec_cond)):
> Add check to make sure cmp and icmp are inverse.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/ieee/fp-cmp-cond-1.c: New test.
> ---
>  gcc/match.pd  | 11 ++-
>  .../execute/ieee/fp-cmp-cond-1.c  | 78 +++
>  2 files changed, 86 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-cond-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 85b7d323a19..b666d73b189 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2087,6 +2087,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (bit_and:c (convert? (cmp@0  @01 @02)) @3)
> (bit_and:c (convert? (icmp@4 @01 @02)) @5))
>  (if (INTEGRAL_TYPE_P (type)
> +&& invert_tree_comparison (cmp, HONOR_NANS (@01)) == icmp
>  /* The scalar version has to be canonicalized after vectorization
> because it makes unconditional loads conditional ones, which
> means we lose vectorization because the loads may trap.  */
> @@ -2101,6 +2102,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (cond (cmp@0  @01 @02) @3 zerop)
> (cond (icmp@4 @01 @02) @5 zerop))
>  (if (INTEGRAL_TYPE_P (type)
> +&& invert_tree_comparison (cmp, HONOR_NANS (@01)) == icmp
>  /* The scalar version has to be canonicalized after vectorization
> because it makes unconditional loads conditional ones, which
> means we lose vectorization because the loads may trap.  */
> @@ -2113,13 +2115,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(bit_ior
> (bit_and:c (vec_cond:s (cmp@0 @6 @7) @4 @5) @2)
> (bit_and:c (vec_cond:s (icmp@1 @6 @7) @4 @5) @3))
> -(if (integer_zerop (@5))
> +(if (integer_zerop (@5)
> +&& invert_tree_comparison (cmp, HONOR_NANS (@6)) == icmp)
>   (switch
>(if (integer_onep (@4))
> (bit_and (vec_cond @0 @2 @3) @4))
> (if (integer_minus_onep (@4))
>  (vec_cond @0 @2 @3)))
> -(if (integer_zerop (@4))
> +(if (integer_zerop (@4)
> +&& invert_tree_comparison (cmp, HONOR_NANS (@6)) == icmp)
>   (switch
>(if (integer_onep (@5))
> (bit_and (vec_cond @0 @3 @2) @5))
> @@ -2132,7 +2136,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(bit_ior
> (vec_cond:s (cmp@0 @4 @5) @2 integer_zerop)
> (vec_cond:s (icmp@1 @4 @5) @3 integer_zerop))
> -(vec_cond @0 @2 @3)))
> +  (if (invert_tree_comparison (cmp, HONOR_NANS (@4)) == icmp)
> +   (vec_cond @0 @2 @3
>
>  /* Transform X & -Y into X * Y when Y is { 0 or 1 }.  */
>  (simplify
> diff --git a/gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-cond-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-cond-1.c
> new file mode 100644
> index 000..4a3c4b0eee2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-cond-1.c
> @@ -0,0 +1,78 @@
> +/* PR tree-optimization/111109 */
> +
> +/*
> +   f should return 0 if either fa and fb are a nan.
> +   Rather than the value of a or b.
> +*/
> +__attribute__((noipa))
> +int f(int a, int b, float fa, float fb) {
> +  const _Bool c = fa < fb;
> +  const _Bool c1 = fa >= fb;
> +  return (c * a) | (c1 * b);
> +}
> +
> +/*
> +   f1 should return 0 if either fa and fb are a nan.
> +   Rather than the value of a&1 or b&1.
> +*/
> +__attribute__((noipa))
> +int f1(int a, int b, float fa, float fb) {
> +  const _Bool c = fa < fb;
> +  const _Bool c1 = fa >= fb;
> +  return (c & a) | (c1 & b);
> +}
> +
> +#if __SIZEOF_INT__ == __SIZEOF_FLOAT__
> +typedef int v4si __attribute__ ((vector_size (1*sizeof(int))));
> +typedef float v4sf __attribute__ ((vector_size (1*sizeof(float))));
> +/*
> +   fvf0 should return {0} if either fa and fb are a nan.
> +   Rather than the value of a or b.
> +*/
> +__attribute__((noipa))
> +v4si vf0(v4si a, v4si b, v4sf fa, v4sf fb) {
> +  const v4si c = fa < fb;
> +  const v4si c1 = fa >= fb;
> +  return (c & a) | (c1 & b);
> +}
> +
> +
> +#endif
> +
> +int main(void)
> +{
> +  float a = __builtin_nan("");
> +
> +  if (f(-1,-1, a, a) != 0)
> +__builtin_abort();
> +  if (f(-1,-1, a, 0) != 0)
> +__builtin_abort();
> +  if (f(-1

Re: [PATCH] VECT: Apply LEN_FOLD_EXTRACT_LAST into loop vectorizer

2023-08-23 Thread Richard Biener via Gcc-patches
On Thu, 24 Aug 2023, Juzhe-Zhong wrote:

> Hi.
> 
> This patch applies LEN_FOLD_EXTRACT_LAST in the loop vectorizer.
> 
> Consider this following case:
> #include 
> 
> #define N 32
> 
> /* Simple condition reduction.  */
> 
> int __attribute__ ((noinline, noclone))
> condition_reduction (int *a, int min_v)
> {
>   int last = 66; /* High start value.  */
> 
>   for (int i = 0; i < N; i++)
> if (a[i] < min_v)
>   last = i;
> 
>   return last;
> }
> 
> With this patch, we can generate this following IR:
> 
>   _44 = .SELECT_VL (ivtmp_42, POLY_INT_CST [4, 4]);
>   _34 = vect_vec_iv_.5_33 + { POLY_INT_CST [4, 4], ... };
>   ivtmp_36 = _44 * 4;
>   vect__4.8_39 = .MASK_LEN_LOAD (vectp_a.6_37, 32B, { -1, ... }, _44, 0);
> 
>   mask__11.9_41 = vect__4.8_39 < vect_cst__40;
>   last_5 = .LEN_FOLD_EXTRACT_LAST (last_14, mask__11.9_41, vect_vec_iv_.5_33, 
> _44, 0);
>   ...

LGTM.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> * tree-vect-loop.cc (vectorizable_reduction): Apply 
> LEN_FOLD_EXTRACT_LAST.
> * tree-vect-stmts.cc (vectorizable_condition): Ditto.
> 
> ---
>  gcc/tree-vect-loop.cc  |  7 --
>  gcc/tree-vect-stmts.cc | 52 --
>  2 files changed, 50 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 1cd6c291377..ebee8037e02 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7494,8 +7494,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>   }
>  
>if (reduc_chain_length == 1
> -   && direct_internal_fn_supported_p (IFN_FOLD_EXTRACT_LAST,
> -  vectype_in, OPTIMIZE_FOR_SPEED))
> +   && (direct_internal_fn_supported_p (IFN_FOLD_EXTRACT_LAST, vectype_in,
> +   OPTIMIZE_FOR_SPEED)
> +   || direct_internal_fn_supported_p (IFN_LEN_FOLD_EXTRACT_LAST,
> +  vectype_in,
> +  OPTIMIZE_FOR_SPEED)))
>   {
> if (dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 413a88750d6..be9f3a280bd 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -11740,8 +11740,17 @@ vectorizable_condition (vec_info *vinfo,
> && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
>   {
> if (reduction_type == EXTRACT_LAST_REDUCTION)
> - vect_record_loop_mask (loop_vinfo, &LOOP_VINFO_MASKS (loop_vinfo),
> -ncopies * vec_num, vectype, NULL);
> + {
> +   if (direct_internal_fn_supported_p (IFN_LEN_FOLD_EXTRACT_LAST,
> +   vectype, OPTIMIZE_FOR_SPEED))
> + vect_record_loop_len (loop_vinfo,
> +   &LOOP_VINFO_LENS (loop_vinfo),
> +   ncopies * vec_num, vectype, 1);
> +   else
> + vect_record_loop_mask (loop_vinfo,
> +&LOOP_VINFO_MASKS (loop_vinfo),
> +ncopies * vec_num, vectype, NULL);
> + }
> /* Extra inactive lanes should be safe for vect_nested_cycle.  */
> else if (STMT_VINFO_DEF_TYPE (reduc_info) != vect_nested_cycle)
>   {
> @@ -11772,7 +11781,13 @@ vectorizable_condition (vec_info *vinfo,
>   mask to the condition, or to its inverse.  */
>  
>vec_loop_masks *masks = NULL;
> -  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> +  vec_loop_lens *lens = NULL;
> +  if (loop_vinfo && LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> +{
> +  if (reduction_type == EXTRACT_LAST_REDUCTION)
> + lens = &LOOP_VINFO_LENS (loop_vinfo);
> +}
> +  else if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>  {
>if (reduction_type == EXTRACT_LAST_REDUCTION)
>   masks = &LOOP_VINFO_MASKS (loop_vinfo);
> @@ -11910,7 +11925,8 @@ vectorizable_condition (vec_info *vinfo,
>/* Force vec_compare to be an SSA_NAME rather than a comparison,
>in cases where that's necessary.  */
>  
> -  if (masks || reduction_type == EXTRACT_LAST_REDUCTION)
> +  tree len = NULL_TREE, bias = NULL_TREE;
> +  if (masks || lens || reduction_type == EXTRACT_LAST_REDUCTION)
>   {
> if (!is_gimple_val (vec_compare))
>   {
> @@ -11931,6 +11947,23 @@ vectorizable_condition (vec_info *vinfo,
> vec_compare = vec_compare_name;
>   }
>  
> +   if (direct_internal_fn_supported_p (IFN_LEN_FOLD_EXTRACT_LAST,
> +   vectype, OPTIMIZE_FOR_SPEED))
> + {
> +   if (lens)
> + {
> +   len = vect_get_loop_len (loop_vinfo, gsi, lens,
> +vec_num * ncopies, vectype, i, 1);
> +   signed ch

Re: [PATCH] MATCH: remove negate for 1bit types

2023-08-23 Thread Richard Biener via Gcc-patches
On Thu, Aug 24, 2023 at 4:39 AM Andrew Pinski via Gcc-patches
 wrote:
>
> For 1bit types, negate is either undefined or don't change the value.
> In either cases we want to remove them.
> This patch adds a match pattern to do that.
> Also converting to a 1bit type we can remove the negate just like we already 
> do
> for `&1` so this patch adds that too.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Thanks,
Richard.

> Notes on the testcases:
> This patch is the last part needed to fix PR 95929; cond-bool-2.c is its testcase.
> bit1neg-1.c is a 1-bit-field testcase where we could remove the assignment
> all the way in one case (which happened on the RTL level for some targets but
> not all).
> cond-bool-2.c is the reduced testcase of PR 95929.
>
> PR tree-optimization/95929
>
> gcc/ChangeLog:
>
> * match.pd (convert?(-a)): New pattern
> for 1bit integer types.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/bit1neg-1.c: New test.
> * gcc.dg/tree-ssa/cond-bool-1.c: New test.
> * gcc.dg/tree-ssa/cond-bool-2.c: New test.
> ---
>  gcc/match.pd| 12 ++
>  gcc/testsuite/gcc.dg/tree-ssa/bit1neg-1.c   | 23 ++
>  gcc/testsuite/gcc.dg/tree-ssa/cond-bool-1.c | 21 +
>  gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c | 26 +
>  4 files changed, 82 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bit1neg-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cond-bool-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a2e56d5a4e8..3bbeceb37b4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -9090,6 +9090,18 @@ and,
>   (if (!TYPE_OVERFLOW_SANITIZED (type))
>(bit_and @0 @1)))
>
> +/* `-a` is just `a` if the type is 1bit wide or when converting
> +   to a 1bit type; similar to the above transformation of `(-x)&1`.
> +   This is used mostly with the transformation of
> +   `a ? ~b : b` into `(-a)^b`.
> +   It also can show up with bitfields.  */
> +(simplify
> + (convert? (negate @0))
> + (if (INTEGRAL_TYPE_P (type)
> +  && TYPE_PRECISION (type) == 1
> +  && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)))
> +  (convert @0)))
> +
>  /* Optimize
> c1 = VEC_PERM_EXPR (a, a, mask)
> c2 = VEC_PERM_EXPR (b, b, mask)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bit1neg-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/bit1neg-1.c
> new file mode 100644
> index 000..2f123fbb9b5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bit1neg-1.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +struct f
> +{
> +  int a:1;
> +};
> +
> +void g(struct f *a)
> +{
> + int t = a->a;
> + t = -t;
> + a->a = t;
> +}
> +void g1(struct f *a, int b)
> +{
> + int t = b;
> + t = -t;
> + a->a = t;
> +}
> +/* the 2 negates should have been removed as this is basically the same
> +   as (-a) & 1. */
> +/* { dg-final { scan-tree-dump-not " = -" "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-1.c
> new file mode 100644
> index 000..752a3030ad1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-1.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
> +_Bool f1(int a, int b)
> +{
> +  _Bool _1 = b != 0;
> +  _Bool _2 = a != 0;
> +  _Bool _8 = a == 0;
> +  _Bool _13;
> +  if (_1) _13 = _8; else _13 = _2;
> +  return _13;
> +}
> +
> +/* We should be able to optimize this to (a != 0) ^ (b != 0) */
> +/* There should be no negate_expr nor gimple_cond here. */
> +
> +/* { dg-final { scan-tree-dump-not "negate_expr, " "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "ne_expr, " 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "gimple_cond " "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "gimple_phi " "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "bit_xor_expr, " 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "gimple_assign " 3 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c
> new file mode 100644
> index 000..b3e7e25dec6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
> +
> +/* PR tree-optimization/95929 */
> +
> +
> +static inline _Bool nand(_Bool a, _Bool b)
> +{
> +return !(a && b);
> +}
> +
> +_Bool f(int a, int b)
> +{
> +return nand(nand(b, nand(a, a)), nand(a, nand(b, b)));
> +}
> +
> +/* We should be able to optimize this to (a != 0) ^ (b != 0) */
> +/* There should be no negate_expr nor gimple_cond here. */
> +
> +/* { dg-final { scan-tree-dump-not "negate_expr, " "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "ne_