date:20230720

loop-ch improvements, part 3

2023-07-20 Thread Jan Hubicka via Gcc-patches

Hi,
this patch makes tree-ssa-loop-ch understand if-combined conditionals (which
are quite common) and removes the IV-derived heuristic.  That heuristic is
quite dubious because every variable of integral or pointer type with a PHI in
the loop header is treated as an IV, so in the first basic block we match all
loop invariants as invariants and everything that changes in the loop as
IV-like.

I think the heuristic was mostly there to make header duplication happen when
the exit conditional is constant false in the first iteration, and with ranger
we can work this out with good enough precision.

The patch adds the notion of a "combined exit": an exit whose conditional is
an and/or/xor of a loop-invariant exit and an exit known to be false in the
first iteration.  Copying these is a win since the loop conditional will
simplify in both copies.
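
As a hand-written illustration (not GCC output; the function names are hypothetical), the exit of the loop below is `!flag || i >= n`: `!flag` is loop-invariant, and `i >= n` is false on entry whenever n > 0 is known, for example from ranger. Copying the header once lets each part fold away in one of the two copies:

```c
#include <assert.h>
#include <stdbool.h>

/* Original loop: the exit condition combines an invariant test (flag)
   with an IV-based test (i < n).  */
int sum_loop (const int *a, int n, bool flag)
{
  int s = 0;
  for (int i = 0; flag && i < n; i++)
    s += a[i];
  return s;
}

/* The same loop with its header copied by hand.  In the copy i == 0, so
   "i < n" becomes "0 < n" (and would vanish if n > 0 were known).  Inside
   the do-while, flag is invariant and already known true, so only the
   IV-based test remains.  */
int sum_loop_copied (const int *a, int n, bool flag)
{
  int s = 0;
  if (flag && 0 < n)
    {
      int i = 0;
      do
        {
          s += a[i];
          i++;
        }
      while (i < n);
    }
  return s;
}
```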

It seems that these are the usual bitwise or/and/xor, and the code size
accounting is only correct when the values have at most one bit set or when
the static constant and invariant versions are simple (such as all zeros).  I
am not checking this, so the code may be optimistic here.  I think it is not
common enough to matter, and I cannot think of a correct condition that is not
quite complex.
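
The distinction matters because a bitwise combination only behaves like the logical one when its operands are truth values. A small sketch (hypothetical helper names) comparing the branch outcome of each form: for 0/1 operands they agree and `|` always agrees, but `&` and `^` can differ once an operand has several bits set.

```c
#include <assert.h>
#include <stdbool.h>

/* Branch outcome of a bitwise combination of two conditions.  */
bool bit_and (unsigned a, unsigned b) { return (a & b) != 0; }
bool bit_or (unsigned a, unsigned b) { return (a | b) != 0; }
bool bit_xor (unsigned a, unsigned b) { return (a ^ b) != 0; }

/* Branch outcome of the corresponding logical combination.  */
bool log_and (unsigned a, unsigned b) { return a != 0 && b != 0; }
bool log_or (unsigned a, unsigned b) { return a != 0 || b != 0; }
bool log_xor (unsigned a, unsigned b) { return (a != 0) != (b != 0); }
```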

I also improved the code size estimate so that it does not account for
non-conditionals that are known to be constant in the peeled copy, and improved
the debug output.

This requires testsuite compensation.  uninit-pred-loop-1_c.C does:

/* { dg-do compile } */
/* { dg-options "-Wuninitialized -O2 -std=c++98" } */

extern int bar();
int foo(int n, int m)
{
  for (;;) {
    int err = ({int _err;
      for (int i = 0; i < 16; ++i) {
        if (m+i > n)
          break;
        _err = 17;
        _err = bar();
      }
      _err;
    });

    if (err == 0) return 17;
  }

  return 18;
}

Before the patch we duplicate
   if (m+i > n)
which suppresses the maybe-uninitialized warning.  I do not quite see why
copying this out would be a win, since it won't simplify.  Also I think the
warning is correct: if m > n the loop will bail out before initializing _err
and it will be used uninitialized.  I think it is a bug elsewhere that header
duplication suppresses this.
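
The claim that the warning is correct can be checked with an instrumented copy of the inner loop (a sketch; `err_assigned` is a hypothetical name, and the boolean stands in for the two assignments to _err). When m > n, the very first test breaks out, so _err is read without ever being written:

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the inner loop of uninit-pred-loop-1_c.C and reports whether
   _err would have been assigned before the statement expression reads it.  */
bool err_assigned (int n, int m)
{
  bool assigned = false;
  for (int i = 0; i < 16; ++i)
    {
      if (m + i > n)
        break;
      assigned = true;  /* stands in for "_err = 17; _err = bar ();" */
    }
  return assigned;
}
```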

copy-headers-7.c does:
int is_sorted(int *a, int n, int m, int k)
{
  for (int i = 0; i < n - 1 && m && k > i; i++)
    if (a[i] > a[i + 1])
      return 0;
  return 1;
}

It tests that all three for-statement conditionals are duplicated.  With the
patch we no longer duplicate k > i, since it is not going to simplify, so I
added a test ensuring that k is positive.  The test also requires disabling
if-combining and VRP to avoid the conditionals becoming combined ones, so I
added a new version of the test checking that we now also behave correctly
with if-combining.
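
Roughly what the k > 0 guard buys, written out by hand (illustration only, not GCC output; `is_sorted_copied` is a hypothetical name): in the copied header i == 0, so `k > i` is `k > 0`, which the guard already established, and inside the loop the invariant `m` is known to be nonzero, so neither needs retesting there.

```c
#include <assert.h>

/* copy-headers-7.c after the patch.  */
int is_sorted (int *a, int n, int m, int k)
{
  if (k > 0)
    for (int i = 0; i < n - 1 && m && k > i; i++)
      if (a[i] > a[i + 1])
        return 0;
  return 1;
}

/* Hand-copied header: the copy tests only "0 < n - 1 && m" (k > 0 is
   known), and the loop latch tests only the two IV-based conditions.  */
int is_sorted_copied (int *a, int n, int m, int k)
{
  if (k > 0 && 0 < n - 1 && m)
    {
      int i = 0;
      do
        {
          if (a[i] > a[i + 1])
            return 0;
          i++;
        }
      while (i < n - 1 && k > i);
    }
  return 1;
}
```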

ivopt_mult_1.c and ivopt_mult_2.c seem to require loop header duplication for
ivopts to behave in a particular way, so I also ensured via value ranges that
the header is duplicated.

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

* tree-ssa-loop-ch.cc (edge_range_query): Rename to ...
(get_range_query): ... this one; do 
(static_loop_exit): Add query parameter, turn ranger to reference.
(loop_static_stmt_p): New function.
(loop_static_op_p): New function.
(loop_iv_derived_p): Remove.
(loop_combined_static_and_iv_p): New function.
(should_duplicate_loop_header_p): Discover combined conditionals;
do not track iv derived; improve dumps.
(pass_ch::execute): Fix whitespace.

gcc/testsuite/ChangeLog:

* g++.dg/uninit-pred-loop-1_c.C: Allow warning.
* gcc.dg/tree-ssa/copy-headers-7.c: Add tests so exit condition is
static; update template.
* gcc.dg/tree-ssa/ivopt_mult_1.c: Add test so exit condition is static.
* gcc.dg/tree-ssa/ivopt_mult_2.c: Add test so exit condition is static.
* gcc.dg/tree-ssa/copy-headers-8.c: New test.

diff --git a/gcc/testsuite/g++.dg/uninit-pred-loop-1_c.C b/gcc/testsuite/g++.dg/uninit-pred-loop-1_c.C
index 711812aae1b..1ee1615526f 100644
--- a/gcc/testsuite/g++.dg/uninit-pred-loop-1_c.C
+++ b/gcc/testsuite/g++.dg/uninit-pred-loop-1_c.C
@@ -15,7 +15,7 @@ int foo(int n, int m)
  _err; 
}); 
 
-   if (err == 0) return 17;
+   if (err == 0) return 17;/* { dg-warning "uninitialized" "warning" } */
  }
 
  return 18;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
index 3c9b3807041..e2a6c75f2e9 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
@@ -3,9 +3,10 @@
 
 int is_sorted(int *a, int n, int m, int k)
 {
-  for (int i = 0; i < n - 1 && m && k > i; i++)
-    if (a[i] > a[i + 1])
-      return 0;
+  if (k > 0)
+    for (int i = 0; i < n - 1 && m && k > i; i++)
+      if (a[i] > a[i + 1])
+        return 0;
   return 1;
 }
 
@@ -13,4 +14,8 @@ int is_sorted(int *a, int n, int m, int k)
the invariant test, not the alternate exit test.  */
 
 /* { dg-final { scan-tree-dump "is now do-while loop" "ch2" } } */
+/* { dg-final {

Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-20 Thread Richard Sandiford via Gcc-patches

Jeff Law via Gcc-patches  writes:
> On 7/19/23 04:25, Richard Biener wrote:
>> On Wed, 19 Jul 2023, YunQiang Su wrote:
>> 
>>> Eric Botcazou wrote on Wed, 19 Jul 2023 at 17:45:

> I don't see that.  That's definitely not what GCC expects here,
> the left-most word of the doubleword should be unchanged.
>
> Your testcase should be a dg-do-run and probably more like
>
> NOMIPS16 int __attribute__((noipa)) test (const unsigned char *buf)
> {
>int val;
>((unsigned char*)&val)[0] = *buf++;
>((unsigned char*)&val)[1] = *buf++;
>((unsigned char*)&val)[2] = *buf++;
>((unsigned char*)&val)[3] = *buf++;
>return val;
> }
> int main()
> {
>int val = 0x01020304;
>val = test (&val);
>if (val != 0x01020304)
>  abort ();
> }
>
> not sure if I got endianness correct.  Now, the question is what
> WORD_REGISTER_OPERATIONS implies for a bitfield insert and what
> the MIPS ABI says for returning SImode.
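
For reference, a self-contained variant of the suggested test that compiles as plain C (a sketch: it drops the NOMIPS16 testsuite macro, adds the cast that `test (&val)` needs because `&val` is an `int *`, and uses assert instead of abort). It assumes a 4-byte int. Because the bytes are copied in memory order on both sides, the round trip returns the original value on either endianness:

```c
#include <assert.h>

/* Rebuild an int one byte at a time.  noipa is a GCC attribute that stops
   callers from being optimized using knowledge of this function's body.  */
int __attribute__((noipa))
test (const unsigned char *buf)
{
  int val;
  ((unsigned char *) &val)[0] = *buf++;
  ((unsigned char *) &val)[1] = *buf++;
  ((unsigned char *) &val)[2] = *buf++;
  ((unsigned char *) &val)[3] = *buf++;
  return val;
}
```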

>>>
>>> MIPS N64 ABI uses 2 GPR for integer return values.
>>> If the return value is SImode, the first v0 register is used, and it
>>> must be sign-extended,
>>> aka bits [63-31] are all the same.
>>>
>>> Yes, it is same for signed and unsigned int32.
>>>
>>> https://irix7.com/techpubs/007-2816-004.pdf
>>> Page 6:
>>> 32-bit integer (int) parameters are always sign-extended when passed
>>> in registers,
>>> whether of signed or unsigned type. [This issue does not arise in the
>>> o32-bit ABI.]
>> 
>> Note I think Andrews comment#7 in the PR is spot-on then, the issue
>> isn't the bitfield inserts but the compare where combine elides
>> the sign_extend in favor of a subreg.  That's likely some wrongdoing
>> in simplify-rtx in the context of WORD_REGISTER_OPERATIONS.
> And I think it raises a real question about the use of GPR (which maps 
> to SImode and DImode for 64bit MIPS targets) on the conditional 
> branching patterns in mips.md.
>
> So while this code works:
>
>> (insn 20 19 23 2 (set (reg/v:DI 200 [ val+-4 ])
>> (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val+-4 ]) 4))) 
>> "/app/example.cpp":7:29 -1
>>  (nil))

Haven't had a chance to compile and look at it properly, but this subreg
seems suspicious for MIPS, given the definition of TRULY_NOOP_TRUNCATION.
We should instead use a truncdisi2 to narrow reg:DI 200 to an SI register,
and then sign_extend it.

This is easily missed in target-independent code because so few targets
define TRULY_NOOP_TRUNCATION.

Where is the subreg being generated?

Richard

>> (jump_insn 23 20 24 2 (set (pc)
>> (if_then_else (le (subreg/s/u:SI (reg/v:DI 200 [ val+-4 ]) 4)
>> (const_int 0 [0]))
>> (label_ref 32)
>> (pc))) "/app/example.cpp":8:5 -1
>>  (int_list:REG_BR_PROB 440234148 (nil))
>>  -> 32)
>
>
> Normally the narrowing SUBREG in insn 23 would indicate we don't care 
> about the bits outside SImode.  But on W_R_O targets we very much care
> because the hardware is going to ultimately do the comparison in 64 bits.
>
> As Andrew/Richi have indicated this very much points to combine as
> incorrectly eliminating the explicit sign extension.  Most likely because
> something saw the SUBREG and concluded those upper bits set by insn 20
> were "don't care" bits.
>
> But it may ultimately be better for the MIPS port to not expose a
> SImode comparison.  Thus reducing the reliance on W_R_O and its 
> under-specified semantics and ultimately having the RTL map more closely 
> to what the hardware actually does/supports.
>
> That's the model we're working towards on the RISC-V port as well.  I 
> wouldn't be surprised if we eventually get to the point where we 
> eliminate WORD_REGISTER_OPERATIONS entirely.
>
> And yes, bitfield operations are one of the nasty sticking points.  The 
> thinking for them is that we want to support bit manipulations where the 
> bit position is variable.  To do that we will emit an explicit sign 
> extension after such operations.  Then rely on improved REE to identify 
> and remove those redundant extensions.
>
> Jeff
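
To make the failure mode concrete, here is a plain-C model (not MIPS code; all helper names are hypothetical) of the N64 rule quoted above, that a 32-bit value lives in a 64-bit GPR sign-extended from bit 31, and of a branch that, like the hardware, compares all 64 bits. If the sign extension in insn 20 is dropped, a register holding SImode -1 as 0x00000000ffffffff no longer satisfies `<= 0`:

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit GPR image of a 32-bit value under the N64 rule: sign-extended
   from bit 31 whether the C type is signed or unsigned.  The unsigned ->
   signed conversions are implementation-defined in ISO C; GCC defines
   them as reduction modulo 2^N.  */
uint64_t n64_gpr_image (uint32_t x)
{
  return (uint64_t) (int64_t) (int32_t) x;
}

/* Nonzero if bits 63..31 of R all agree, i.e. R is a valid image of a
   32-bit value.  */
int is_sign_extended (uint64_t r)
{
  return r == n64_gpr_image ((uint32_t) r);
}

/* A "branch if <= 0" that, like a 64-bit MIPS branch, looks at the whole
   register rather than just its low 32 bits.  */
int ble_zero_64 (uint64_t r)
{
  return (int64_t) r <= 0;
}
```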

Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c

2023-07-20 Thread Richard Biener via Gcc-patches

On Wed, Jul 19, 2023 at 4:34 PM Maciej W. Rozycki  wrote:
>
> On Wed, 12 Jul 2023, Richard Biener wrote:
>
> > > > That said, we should handle this better so can you file an
> > > > enhancement bugreport for this?
> > >
> > >  Filed as PR -optimization/110630.
> >
> > Thanks!
>
>  Thanks for making this improvement.  I've checked MIPS results and code
> produced now is as follows:
>
> daddiu  $sp,$sp,-64
> sd      $5,24($sp)
> sd      $7,40($sp)
> ldc1    $f0,24($sp)
> ldc1    $f1,40($sp)
> sd      $4,16($sp)
> sd      $6,32($sp)
> ldc1    $f2,32($sp)
> add.ps  $f1,$f0,$f1
> ldc1    $f0,16($sp)
> add.ps  $f0,$f0,$f2
> sdc1    $f1,56($sp)
> ld      $3,56($sp)
> sdc1    $f0,48($sp)
> ld      $2,48($sp)
> jr      $31
> daddiu  $sp,$sp,64
>
> which does do vector stuff now, although it's still considerably worse
> than my handwritten example:
>
> > > dmtc1   $4,$f0
> > > dmtc1   $5,$f1
> > > dmtc1   $6,$f2
> > > dmtc1   $7,$f3
> > > add.ps  $f0,$f0,$f1
> > > add.ps  $f2,$f2,$f3
> > > dmfc1   $2,$f0
> > > jr  $31
> > > dmfc1   $3,$f2
>
> Or I'd say it's pretty terrible, but given the current situation with the
> MIPS backend I'm going to leave it to the new maintainer to sort out.

Yeah, I also wondered what is wrong ... I suspect it's the usual issue
of parameter passing causing spilling ...

> > >  Do you agree it still makes sense to include bb-slp-pr95839-v8.c with the
> > > testsuite?
> >
> > Sure, more coverage is always  nice.
>
>  Thanks, committed (with the `vect64' requirement removed, as we can take
> it for granted with `vect_float').
>
>   Maciej

Re: [PATCH] Add __builtin_iseqsig()

2023-07-20 Thread Richard Biener via Gcc-patches

On Wed, 19 Jul 2023, FX Coudert wrote:

> 6 weeks later, I'd like to ask a global maintainer to review this.
> The idea was okay'ed previously by Joseph Myers, but he asked for testing of
> both the quiet and signalling NaN cases, which is now done.

OK.

Thanks,
Richard.

> FX
> 
> 
> > On 6 June 2023 at 20:15, FX Coudert wrote:
> > 
> > Hi,
> > 
> > (It took me a while to get back to this.)
> > 
> > This is a new and improved version of the patch at 
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602932.html
> > It addresses the comment from Joseph that FE_INVALID should really be 
> > tested in the case of both quiet and signaling NaNs, which is now done 
> > systematically.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu
> > OK to commit?
> > 
> > FX
> > 
> 
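
For readers unfamiliar with the semantics: iseqsig(x, y) (standardized in C23) gives the same result as x == y but, unlike ==, raises FE_INVALID when either operand is a NaN, quiet or signaling, which is why both NaN kinds need testing. A hypothetical portable approximation (not how GCC implements the builtin) uses two `<=` comparisons, which C's Annex F binds to IEC 60559 signaling comparisons; whether FE_INVALID is actually raised still depends on the target and optimization flags:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-in for iseqsig: true iff X == Y for non-NaN operands
   (so +0.0 and -0.0 compare equal), false if either is a NaN.  Using <=
   instead of == is what makes the comparisons signaling on implementations
   that follow Annex F, so quiet NaNs raise FE_INVALID as well.  */
bool iseqsig_sketch (double x, double y)
{
  return x <= y && y <= x;
}
```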

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: [PATCH]AArch64 fix regexp for live_1.c sve test

2023-07-20 Thread Richard Biener via Gcc-patches

On Thu, 20 Jul 2023, Richard Sandiford wrote:

> Tamar Christina  writes:
> > Hi All,
> >
> > The resulting predicate register of a whilelo is not
> > restricted to the lower half of the predicate register file.
> >
> > As such these tests started failing after recent changes
> > because the whilelo outside the loop is getting assigned p15.
> 
> It's the whilelo in the loop for me.  We go from:
> 
> .L3:
> ld1b    z31.b, p7/z, [x4, x3]
> movprfx z30, z31
> mul     z30.b, p5/m, z30.b, z29.b
> st1b    z30.b, p7, [x4, x3]
> mov     p6.b, p7.b
> add     x3, x3, x0
> whilelo p7.b, w3, w1
> b.any   .L3
> 
> to:
> 
> .L3:
> ld1b    z31.b, p7/z, [x3, x2]
> movprfx z29, z31
> mul     z29.b, p6/m, z29.b, z30.b
> st1b    z29.b, p7, [x3, x2]
> add     x2, x2, x0
> whilelo p15.b, w2, w1
> b.any   .L4
> [...]
> .p2align 2,,3
> .L4:
> mov     p7.b, p15.b
> b       .L3
> 
> This adds an extra (admittedly unconditional) branch to every non-final
> vector iteration, which seems unfortunate.  I don't think we'd see
> p8-p15 otherwise, since the result of the whilelo is used as a
> governing predicate by the next iteration of the loop.
> 
> This happens because the scalar loop is given an 89% chance of iterating.
> Previously we gave the vector loop an 83.33% chance of iterating, whereas
> after 061f74c06735e1fa35b910ae we give it a 12% chance.  0.89^16 == 15.50%,
> so the new probabilities definitely preserve the original probabilities
> more closely.  But for purely heuristic probabilities like these, I'm
> not sure we should lean so heavily into the idea that the vector
> latch is unlikely.
> 
> Honza, Richi, any thoughts?  Just wanted to double-check that this
> was operating as expected before making the tests accept the (arguably)
> less efficient code.  It looks like the commit was more aimed at fixing
> the profile counts for the epilogues, rather than the main loop.

The above looks like a failed coalescing, can you track down where
that happens and why?

And yes, the profile counts were supposed to be fixed, but not only
for the epilog but for header copying also for the main loop.  Not
sure if anything goes wrong here though - for estimates of course
it's only estimates and IIRC we estimate a loop to iterate 4 times
when we don't know better.
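
The figures quoted above can be reproduced under a simple geometric model (an assumption for illustration, not GCC's profile-update code): if each scalar iteration continues with probability p, a vector iteration covering VF scalar iterations continues with probability p^VF, and a latch probability q means 1/(1 - q) iterations on average. With p = 0.89 and VF = 16 (as in the 0.89^16 figure), q is about 15.5%, i.e. about 1.18 vector iterations, versus 6 for the old 83.33% and about 1.14 for the new 12%:

```c
#include <assert.h>

/* Continuation probability of a vector iteration covering VF scalar
   iterations, if each scalar iteration continues independently with
   probability P.  Computed as P^VF by repeated multiplication so no
   libm is needed.  */
double vector_latch_prob (double p, int vf)
{
  double q = 1.0;
  for (int i = 0; i < vf; i++)
    q *= p;
  return q;
}

/* Mean number of iterations of a loop whose latch is taken with
   probability Q under the same model: 1 / (1 - Q).  */
double expected_iterations (double q)
{
  return 1.0 / (1.0 - q);
}
```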

Richard.

> Thanks,
> Richard
> 
> > This widens the regexp.
> >
> > Tested on aarch64-none-linux-gnu and passes again.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/sve/live_1.c: Update assembly.
> >
> > --- inline copy of patch -- 
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
> > index 80ee176d1807bf628ad47551d69ff5d84deda79e..2db6c3c209a9514646e92628f3d2dd58d466539c 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
> > @@ -27,10 +27,10 @@
> >  
> >  TEST_ALL (EXTRACT_LAST)
> >  
> > -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].b, } 2 } } */
> > -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].h, } 4 } } */
> > -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].s, } 4 } } */
> > -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].d, } 4 } } */
> > +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.b, } 2 } } */
> > +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, } 4 } } */
> > +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.s, } 4 } } */
> > +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.d, } 4 } } */
> >  
> >  /* { dg-final { scan-assembler-times {\tlastb\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
> >  /* { dg-final { scan-assembler-times {\tlastb\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
> 
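
The effect of the widened pattern can be checked outside DejaGnu with POSIX regcomp (a sketch: the testsuite itself uses Tcl regular expressions, but the bracket expressions and `+` used here behave the same in POSIX extended syntax; `matches` is a hypothetical helper). Note that the unescaped `.` before `b` matches any character in both the old and the new patterns:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX extended regexp PATTERN matches anywhere in TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.  */
int matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```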


Re: [PATCH] VECT: Support floating-point in-order reduction for length loop control

2023-07-20 Thread Richard Biener via Gcc-patches

On Thu, 20 Jul 2023, Robin Dapp wrote:

> Hi Juzhe,
> 
> I just noticed that we recently started calling things MASK_LEN
> (instead of LEN_MASK before) with the reductions.  Wouldn't we want
> to be consistent here?  Especially as the length takes precedence.
> I realize the preparational work like optabs is already upstream
> but still wanted to bring it up.

Didn't notice that but yes, consistency would be nice to have.

Richard.

Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-20 Thread Richard Biener via Gcc-patches

On Thu, 20 Jul 2023, Richard Sandiford wrote:

> Jeff Law via Gcc-patches  writes:
> > On 7/19/23 04:25, Richard Biener wrote:
> >> On Wed, 19 Jul 2023, YunQiang Su wrote:
> >> 
> >>> Eric Botcazou wrote on Wed, 19 Jul 2023 at 17:45:
> 
> > I don't see that.  That's definitely not what GCC expects here,
> > the left-most word of the doubleword should be unchanged.
> >
> > Your testcase should be a dg-do-run and probably more like
> >
> > NOMIPS16 int __attribute__((noipa)) test (const unsigned char *buf)
> > {
> >int val;
> >((unsigned char*)&val)[0] = *buf++;
> >((unsigned char*)&val)[1] = *buf++;
> >((unsigned char*)&val)[2] = *buf++;
> >((unsigned char*)&val)[3] = *buf++;
> >return val;
> > }
> > int main()
> > {
> >int val = 0x01020304;
> >val = test (&val);
> >if (val != 0x01020304)
> >  abort ();
> > }
> >
> > not sure if I got endianness correct.  Now, the question is what
> > WORD_REGISTER_OPERATIONS implies for a bitfield insert and what
> > the MIPS ABI says for returning SImode.
> 
> >>>
> >>> MIPS N64 ABI uses 2 GPR for integer return values.
> >>> If the return value is SImode, the first v0 register is used, and it
> >>> must be sign-extended,
> >>> aka bits [63-31] are all the same.
> >>>
> >>> Yes, it is same for signed and unsigned int32.
> >>>
> >>> https://irix7.com/techpubs/007-2816-004.pdf
> >>> Page 6:
> >>> 32-bit integer (int) parameters are always sign-extended when passed
> >>> in registers,
> >>> whether of signed or unsigned type. [This issue does not arise in the
> >>> o32-bit ABI.]
> >> 
> >> Note I think Andrews comment#7 in the PR is spot-on then, the issue
> >> isn't the bitfield inserts but the compare where combine elides
> >> the sign_extend in favor of a subreg.  That's likely some wrongdoing
> >> in simplify-rtx in the context of WORD_REGISTER_OPERATIONS.
> > And I think it raises a real question about the use of GPR (which maps 
> > to SImode and DImode for 64bit MIPS targets) on the conditional 
> > branching patterns in mips.md.
> >
> > So while this code works:
> >
> >> (insn 20 19 23 2 (set (reg/v:DI 200 [ val+-4 ])
> >> (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val+-4 ]) 4))) 
> >> "/app/example.cpp":7:29 -1
> >>  (nil))
> 
> Haven't had a chance to compile and look at it properly, but this subreg
> seems suspicious for MIPS, given the definition of TRULY_NOOP_TRUNCATION.
> We should instead use a truncdisi2 to narrow reg:DI 200 to an SI register,
> and then sign_extend it.
> 
> This is easily missed in target-independent code because so few targets
> define TRULY_NOOP_TRUNCATION.

Can we easily get rid of it?

Richard.

> Where is the subreg being generated?
> 
> Richard
> 
> >> (jump_insn 23 20 24 2 (set (pc)
> >> (if_then_else (le (subreg/s/u:SI (reg/v:DI 200 [ val+-4 ]) 4)
> >> (const_int 0 [0]))
> >> (label_ref 32)
> >> (pc))) "/app/example.cpp":8:5 -1
> >>  (int_list:REG_BR_PROB 440234148 (nil))
> >>  -> 32)
> >
> >
> > Normally the narrowing SUBREG in insn 23 would indicate we don't care 
> > about the bits outside SImode.  But on W_R_O targets we very much care
> > because the hardware is going to ultimately do the comparison in 64 bits.
> >
> > As Andrew/Richi have indicated this very much points to combine as
> > incorrectly eliminating the explicit sign extension.  Most likely because
> > something saw the SUBREG and concluded those upper bits set by insn 20
> > were "don't care" bits.
> >
> > But it may ultimately be better for the MIPS port to not expose a
> > SImode comparison.  Thus reducing the reliance on W_R_O and its 
> > under-specified semantics and ultimately having the RTL map more closely 
> > to what the hardware actually does/supports.
> >
> > That's the model we're working towards on the RISC-V port as well.  I 
> > wouldn't be surprised if we eventually get to the point where we 
> > eliminate WORD_REGISTER_OPERATIONS entirely.
> >
> > And yes, bitfield operations are one of the nasty sticking points.  The 
> > thinking for them is that we want to support bit manipulations where the 
> > bit position is variable.  To do that we will emit an explicit sign 
> > extension after such operations.  Then rely on improved REE to identify 
> > and remove those redundant extensions.
> >
> > Jeff
> 


Re: Re: [PATCH] VECT: Support floating-point in-order reduction for length loop control

2023-07-20 Thread juzhe.zh...@rivai.ai

Hi, Richard.

I plan to change all LEN_MASK into MASK_LEN.

Starting with LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE, we noticed that
keeping the mask operand in the same order as in the original mask_* patterns
makes the code cleaner and easier to maintain.

Thanks

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-07-20 15:21
To: Robin Dapp
CC: juzhe.zhong; gcc-patches; richard.sandiford
Subject: Re: [PATCH] VECT: Support floating-point in-order reduction for length 
loop control
On Thu, 20 Jul 2023, Robin Dapp wrote:

> Hi Juzhe,
> 
> I just noticed that we recently started calling things MASK_LEN
> (instead of LEN_MASK before) with the reductions.  Wouldn't we want
> to be consistent here?  Especially as the length takes precedence.
> I realize the preparational work like optabs is already upstream
> but still wanted to bring it up.

Didn't notice that but yes, consistency would be nice to have.

Richard.

Re: [GCC 13 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.

2023-07-20 Thread Richard Biener via Gcc-patches

On Wed, Jul 19, 2023 at 2:33 PM Uros Bizjak  wrote:
>
> On Wed, Jul 19, 2023 at 2:21 PM Richard Biener
>  wrote:
> >
> > On Sun, Jun 11, 2023 at 12:55 AM Roger Sayle  
> > wrote:
> > >
> > >
> > > This is a backport of the fixes for PR target/109973 and PR target/110083.
> > >
> > > This backport to the releases/gcc-13 branch has been tested on
> > > x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and
> > > without --target_board=unix{-m32} with no new failures.  Ok for gcc-13,
> > > or should we just close PR 109973 in Bugzilla?
> >
> > As alternative solution for the GCC 13 branch I have tested reverting
> > r13-2006-ga56c1641e9d25e successfully.  Can we choose between the
> > options please?  Sorry I'm only bringing this up now but 13.2 RC is due
> > tomorrow.
> >
> > Thank you,
> > Richard.
> >
> > >
> > >
> > > 2023-06-10  Roger Sayle  
> > > Uros Bizjak  
> > >
> > > gcc/ChangeLog
> > > PR target/109973
> > > PR target/110083
> > > * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
> > > CODE_for_sse4_1_ptestzv2di.
> > > (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
> > > (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
> > > (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
> > > * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
> > > when expanding UNSPEC_PTEST to compare against zero.
> > > * config/i386/i386-features.cc (scalar_chain::convert_compare):
> > > Likewise generate CCZmode UNSPEC_PTESTs when converting 
> > > comparisons.
> > > Update or delete REG_EQUAL notes, converting CONST_INT and
> > > CONST_WIDE_INT immediate operands to a suitable CONST_VECTOR.
> > > (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
> > > result.
> > > (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE 
> > > result.
> > > * config/i386/i386-protos.h (ix86_match_ptest_ccmode): Prototype.
> > > * config/i386/i386.cc (ix86_match_ptest_ccmode): New predicate to
> > > check for suitable matching modes for the UNSPEC_PTEST pattern.
> > > * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
> > > to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
> > > (*_ptest): Add asterisk to hide define_insn.  Remove
> > > ":CC" mode of FLAGS_REG, instead use ix86_match_ptest_ccmode.
> > > (_ptestz): New define_expand to specify CCZ.
> > > (_ptestc): New define_expand to specify CCC.
> > > (_ptest): A define_expand using CC to preserve the
> > > current behavior.
> > > (*ptest_and): Specify CCZ to only perform this optimization
> > > when only the Z flag is required.
> > >
> > > gcc/testsuite/ChangeLog
> > > PR target/109973
> > > PR target/110083
> > > * gcc.target/i386/pr109973-1.c: New test case.
> > > * gcc.target/i386/pr109973-2.c: Likewise.
> > > * gcc.target/i386/pr110083.c: Likewise.
>
> Yes, I would rather have the offending patch reverted on gcc-13.

Done.

Richard.

> Uros.

Re: [PATCH] PR c/110699: Defend against error_mark_node in gimplify.cc.

2023-07-20 Thread Richard Biener via Gcc-patches

On Thu, Jul 20, 2023 at 12:11 AM Roger Sayle  wrote:
>
>
> This patch resolves PR c/110699, an ICE-after-error regression, by adding
> a check that the array type isn't error_mark_node in gimplify_compound_lval.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

Can you change it to

  if (error_operand_p (TREE_OPERAND (t, 0)))
    return GS_ERROR;

and do that unconditionally for each 't' on the expr_stack?  It seems we only
ever push handled_component_p to it.

OK with that change.

Richard.

>
>
> 2023-07-19  Roger Sayle  
>
> gcc/ChangeLog
> PR c/110699
> * gimplify.cc (gimplify_compound_lval):  For ARRAY_REF and
> ARRAY_RANGE_REF return GS_ERROR if the array's type is
> error_mark_node.
>
> gcc/testsuite/ChangeLog
> PR c/110699
> * gcc.dg/pr110699.c: New test case.
>
>
> Cheers,
> Roger
> --
>

[PATCH] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread Juzhe-Zhong

This patch is depending on:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624995.html

Consider the following case:
float foo (float *__restrict a, int n)
{
  float result = 1.0;
  for (int i = 0; i < n; i++)
   result += a[i];
  return result;
}

Compile with **NO** -ffast-math:

Before this patch:
:4:21: missed: couldn't vectorize loop
:1:7: missed: not vectorized: relevant phi not supported: result_14 = 
PHI 

After this patch:
foo:
        lui     a5,%hi(.LC0)
        flw     fa0,%lo(.LC0)(a5)
        ble     a1,zero,.L4
.L3:
        vsetvli a5,a1,e32,m1,ta,ma
        vle32.v v1,0(a0)
        slli    a4,a5,2
        vsetivli        zero,1,e32,m1,ta,ma
        sub     a1,a1,a5
        vfmv.s.f        v2,fa0
        add     a0,a0,a4
        vsetvli zero,a5,e32,m1,ta,ma
        vfredosum.vs    v1,v1,v2        --> FOLD_LEFT_PLUS
        vfmv.f.s        fa0,v1
        bne     a1,zero,.L3
        ret
.L4:
        ret
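
Why the order matters without -ffast-math: floating-point addition is not associative, so a reduction that splits the sum across lanes can round differently from the strict left-to-right sum that vfredosum (FOLD_LEFT_PLUS) preserves. A sketch with hypothetical helper names, assuming float arithmetic is evaluated in float (FLT_EVAL_METHOD == 0, as on x86-64 or RISC-V), and using 2^24 = 16777216, above which float cannot represent every integer:

```c
#include <assert.h>

/* Strict left-to-right sum: the order FOLD_LEFT_PLUS / vfredosum keeps.  */
float sum_in_order (const float *a, int n, float init)
{
  float r = init;
  for (int i = 0; i < n; i++)
    r += a[i];
  return r;
}

/* A two-lane reduction of the kind a reassociating vectorizer might use:
   even and odd elements are accumulated separately, then combined.  */
float sum_two_lanes (const float *a, int n, float init)
{
  float lane0 = init, lane1 = 0.0f;
  int i = 0;
  for (; i + 1 < n; i += 2)
    {
      lane0 += a[i];
      lane1 += a[i + 1];
    }
  if (i < n)
    lane0 += a[i];
  return lane0 + lane1;
}
```

With a = {16777216, 1, 1, -16777216} and init 0, the in-order sum is 0 (each + 1 is rounded away at 2^24), while the two-lane sum is 1.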

gcc/ChangeLog:

* config/riscv/autovec.md (fold_left_plus_): New pattern.
(mask_len_fold_left_plus_): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(enum reduction_type): Ditto.
(expand_reduction): Add in-order reduction.
* config/riscv/riscv-v.cc (emit_nonvlmax_fp_reduction_insn): New 
function.
(expand_reduction): Add in-order reduction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c: New test.

---
 gcc/config/riscv/autovec.md   | 39 
 gcc/config/riscv/riscv-protos.h   | 11 -
 gcc/config/riscv/riscv-v.cc   | 45 ---
 .../riscv/rvv/autovec/reduc/reduc_strict-1.c  | 28 
 .../riscv/rvv/autovec/reduc/reduc_strict-2.c  | 26 +++
 .../riscv/rvv/autovec/reduc/reduc_strict-3.c  | 18 
 .../riscv/rvv/autovec/reduc/reduc_strict-4.c  | 24 ++
 .../riscv/rvv/autovec/reduc/reduc_strict-5.c  | 28 
 .../riscv/rvv/autovec/reduc/reduc_strict-6.c  | 18 
 .../riscv/rvv/autovec/reduc/reduc_strict-7.c  | 21 +
 .../rvv/autovec/reduc/reduc_strict_run-1.c| 29 
 .../rvv/autovec/reduc/reduc_strict_run-2.c| 31 +
 12 files changed, 311 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 00947207f3f..af55ef7b68f 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1687,3 +1687,42 @@
   riscv_vector::expand_reduction (SMIN, operands, f);
   DONE;
 })
+
+;; -
+;;  [FP] Left-to-right reductions
+;; -
+;; Includes:
+;; - vfredosum.vs
+;; -
+
+;; Unpredicated in-order FP reductions.
+(define_expand "fold_left_plus_"
+  [(match_operand: 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand:VF 2 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (PLUS, operands,
+ operands[1],
+ riscv_vector::FOLD_LEFT_REDUDUCTION);
+  DONE;
+})
+
+;; Predicated in-order FP reductions.
+(define_expand "mask_len_fold_left_plus_"
+  [(match_operand: 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand:VF 2 "register_operand")
+   (match_operand: 3 "vector_mask_operand")
+   (match_operand 4 "autovec_length_operand")
+   (match_operand 5 "const_0_operan

Re: [PATCH] Move combine over to statistics_counter_event.

2023-07-20 Thread Richard Biener via Gcc-patches

On Thu, Jul 20, 2023 at 4:42 AM Andrew Pinski via Gcc-patches
 wrote:
>
> Since we have statistics_counter_event now, combine should use that
> instead of its own custom printing of statistics.
> The only thing that is not done any more after this patch is printing
> out the total stats for the whole TU.

you can use -fdump-statistics-stats to get the total counts (but not
in the combine dumpfile).

>
> Note you need to use -fdump-rtl-combine-stats to get the stats in the combine
> dump, unlike before where the stats were dumped directly into the file.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * combine.cc (dump_combine_stats): Remove.
> (dump_combine_total_stats): Remove.
> (total_attempts, total_merges, total_extras,
> total_successes): Remove.
> (combine_instructions): Don't increment total stats
> instead use statistics_counter_event.
> * dumpfile.cc (print_combine_total_stats): Remove.
> * dumpfile.h (print_combine_total_stats): Remove.
> (dump_combine_total_stats): Remove.
> * passes.cc (finish_optimization_passes):
> Don't call print_combine_total_stats.
> * rtl.h (dump_combine_total_stats): Remove.
> (dump_combine_stats): Remove.
> ---
>  gcc/combine.cc  | 30 --
>  gcc/dumpfile.cc |  9 -
>  gcc/dumpfile.h  |  3 ---
>  gcc/passes.cc   |  7 ---
>  gcc/rtl.h   |  2 --
>  5 files changed, 4 insertions(+), 47 deletions(-)
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index d9161b257e8..4bf867d74b0 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -108,10 +108,6 @@ static int combine_extras;
>
>  static int combine_successes;
>
> -/* Totals over entire compilation.  */
> -
> -static int total_attempts, total_merges, total_extras, total_successes;
> -
>  /* combine_instructions may try to replace the right hand side of the
> second instruction with the value of an associated REG_EQUAL note
> before throwing it at try_combine.  That is problematic when there
> @@ -1456,10 +1452,10 @@ retry:
>  undobuf.frees = 0;
>}
>
> -  total_attempts += combine_attempts;
> -  total_merges += combine_merges;
> -  total_extras += combine_extras;
> -  total_successes += combine_successes;
> +  statistics_counter_event (cfun, "attempts", combine_attempts);
> +  statistics_counter_event (cfun, "merges", combine_merges);
> +  statistics_counter_event (cfun, "extras", combine_extras);
> +  statistics_counter_event (cfun, "successes", combine_successes);
>
>nonzero_sign_valid = 0;
>rtl_hooks = general_rtl_hooks;
> @@ -14936,24 +14932,6 @@ unmentioned_reg_p (rtx equiv, rtx expr)
>return false;
>  }
>
> -DEBUG_FUNCTION void
> -dump_combine_stats (FILE *file)
> -{
> -  fprintf
> -(file,
> - ";; Combiner statistics: %d attempts, %d substitutions (%d requiring new space),\n;; %d successes.\n\n",
> - combine_attempts, combine_merges, combine_extras, combine_successes);
> -}
> -
> -void
> -dump_combine_total_stats (FILE *file)
> -{
> -  fprintf
> -(file,
> - "\n;; Combiner totals: %d attempts, %d substitutions (%d requiring new space),\n;; %d successes.\n",
> - total_attempts, total_merges, total_extras, total_successes);
> -}
> -
>  /* Make pseudo-to-pseudo copies after every hard-reg-to-pseudo-copy, because
> the reg-to-reg copy can usefully combine with later instructions, but we
> do not want to combine the hard reg into later instructions, for that
> diff --git a/gcc/dumpfile.cc b/gcc/dumpfile.cc
> index 51f68c8c6b4..a2050d13009 100644
> --- a/gcc/dumpfile.cc
> +++ b/gcc/dumpfile.cc
> @@ -2074,15 +2074,6 @@ dump_function (int phase, tree fn)
>  }
>  }
>
> -/* Print information from the combine pass on dump_file.  */
> -
> -void
> -print_combine_total_stats (void)
> -{
> -  if (dump_file)
> -dump_combine_total_stats (dump_file);
> -}
> -
>  /* Enable RTL dump for all the RTL passes.  */
>
>  bool
> diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
> index 7d5eca899dc..c41940624ca 100644
> --- a/gcc/dumpfile.h
> +++ b/gcc/dumpfile.h
> @@ -647,14 +647,11 @@ class auto_dump_scope
>auto_dump_scope scope (NAME, USER_LOC)
>
>  extern void dump_function (int phase, tree fn);
> -extern void print_combine_total_stats (void);
>  extern bool enable_rtl_dump_file (void);
>
>  /* In tree-dump.cc  */
>  extern void dump_node (const_tree, dump_flags_t, FILE *);
>
> -/* In combine.cc  */
> -extern void dump_combine_total_stats (FILE *);
>  /* In cfghooks.cc  */
>  extern void dump_bb (FILE *, basic_block, int, dump_flags_t);
>
> diff --git a/gcc/passes.cc b/gcc/passes.cc
> index d7b0ad271a1..6f894a41d22 100644
> --- a/gcc/passes.cc
> +++ b/gcc/passes.cc
> @@ -359,13 +359,6 @@ finish_optimization_passes (void)
>dumps->dump_finish (pass_profile_1->static_pass_number);
>  }
>
> -  if (optimize > 0)
> -{
> -  dumps->dump_start

[PATCH] Optimize vlddqu to vmovdqu for TARGET_AVX

2023-07-20 Thread liuhongt via Gcc-patches

On Intel processors with AVX (TARGET_AVX), vmovdqu is as fast as
vlddqu, so UNSPEC_LDDQU can be removed to enable more optimizations.
Can someone confirm this with AMD folks?
If AMD doesn't want this optimization, I'll make it conditional on a
micro-architecture tuning.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
If AMD is also fine with this optimization, OK for trunk?
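The decision made by the proposed expander can be read as a small truth table. A minimal C++ sketch follows. The struct fields are hypothetical stand-ins for TARGET_AVX and TARGET_AVX256_SPLIT_UNALIGNED_LOAD, and `mode_size` stands for the vector mode size in bytes. It models only the condition in the expander in the diff below, not the RTL it emits.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two target flags tested by the expander.
struct isa_tuning
{
  bool avx;                          // TARGET_AVX
  bool avx256_split_unaligned_load;  // TARGET_AVX256_SPLIT_UNALIGNED_LOAD
};

enum class lddqu_expansion { plain_move, keep_lddqu };

// mode_size is the vector size in bytes: 16 (xmm) or 32 (ymm).
lddqu_expansion
choose_lddqu_expansion (const isa_tuning &t, unsigned mode_size)
{
  assert (mode_size == 16 || mode_size == 32);
  if (t.avx && (mode_size == 16 || !t.avx256_split_unaligned_load))
    return lddqu_expansion::plain_move;  // emit_move_insn, i.e. vmovdqu
  return lddqu_expansion::keep_lddqu;    // fall through to the UNSPEC_LDDQU insn
}
```

A 256-bit load on a tuning that splits unaligned 256-bit loads keeps the UNSPEC form. Every other AVX case becomes an ordinary move that later passes can combine, as in the vbroadcasti128 test.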

gcc/ChangeLog:

* config/i386/sse.md (_lddqu): Change to
define_expand, expand as simple move when TARGET_AVX
&& ( == 16 || !TARGET_AVX256_SPLIT_UNALIGNED_LOAD).
The original define_insn is renamed to
..
(_lddqu): .. this.

gcc/testsuite/ChangeLog:

* gcc.target/i386/vlddqu_vinserti128.c: New test.
---
 gcc/config/i386/sse.md| 15 ++-
 .../gcc.target/i386/vlddqu_vinserti128.c  | 11 +++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2d81347c7b6..d571a78f4c4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1835,7 +1835,20 @@ (define_peephole2
   [(set (match_dup 4) (match_dup 1))]
   "operands[4] = adjust_address (operands[0], V2DFmode, 0);")
 
-(define_insn "_lddqu"
+(define_expand "_lddqu"
+  [(set (match_operand:VI1 0 "register_operand")
+   (unspec:VI1 [(match_operand:VI1 1 "memory_operand")]
+   UNSPEC_LDDQU))]
+  "TARGET_SSE3"
+{
+  if (TARGET_AVX && ( == 16 || !TARGET_AVX256_SPLIT_UNALIGNED_LOAD))
+{
+  emit_move_insn (operands[0], operands[1]);
+  DONE;
+}
+})
+
+(define_insn "*_lddqu"
   [(set (match_operand:VI1 0 "register_operand" "=x")
(unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m")]
UNSPEC_LDDQU))]
diff --git a/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c 
b/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
new file mode 100644
index 000..29699a5fa7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcasti128" 1 } } */
+/* { dg-final { scan-assembler-not {(?n)vlddqu.*xmm} } } */
+
+#include 
+__m256i foo(void *data) {
+__m128i X1 = _mm_lddqu_si128((__m128i*)data);
+__m256i V1 = _mm256_broadcastsi128_si256 (X1);
+return V1;
+}
-- 
2.39.1.388.g2fc9e9ca3c

Re: [GCC 13 PATCH] aarch64: Remove architecture dependencies from intrinsics

2023-07-20 Thread Richard Biener via Gcc-patches

On Thu, Jul 20, 2023 at 8:49 AM Richard Sandiford via Gcc-patches
 wrote:
>
> Andrew Carlotti  writes:
> > Updated patch to fix the fp16 intrinsic pragmas, and pushed to master.
> > OK to backport to GCC 13?
>
> OK, thanks.

If you want it in 13.2, please push it very soon; we want to do 13.2 RC1
today.

Richard.

> Richard
>
> > Many intrinsics currently depend on both an architecture version and a
> > feature, despite the corresponding instructions being available within
> > GCC at lower architecture versions.
> >
> > LLVM has already removed these explicit architecture version
> > dependencies; this patch does the same for GCC. Note that +fp16 does not
> > imply +simd, so we need to add an explicit +simd for the Neon fp16
> > intrinsics.
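To see why "+nothing+fp16" is not enough for the Neon fp16 intrinsics, treat `+feature` as "enable this feature plus everything it transitively depends on". The C++ sketch below is a toy model of that closure. The feature bits are hypothetical, and the dependency edges (simd needs fp, fp16 needs fp, dotprod needs simd) are assumptions for illustration only, not a copy of GCC's extension table. The only edge the text above confirms is that fp16 does not pull in simd.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits for illustration.
enum feature : std::uint32_t
{
  F_FP      = 1u << 0,
  F_SIMD    = 1u << 1,
  F_FP16    = 1u << 2,
  F_DOTPROD = 1u << 3,
};

// Assumed direct dependencies (simplified; not GCC's real table).
std::uint32_t direct_deps (feature f)
{
  switch (f)
    {
    case F_SIMD:    return F_FP;
    case F_FP16:    return F_FP;    // note: no F_SIMD here
    case F_DOTPROD: return F_SIMD;
    default:        return 0;
    }
}

// Close the requested set under the dependency relation, as a
// "+nothing+X+Y" target string would.
std::uint32_t enable (std::uint32_t requested)
{
  std::uint32_t set = requested;
  for (bool changed = true; changed; )
    {
      changed = false;
      for (std::uint32_t bit = 1; bit <= F_DOTPROD; bit <<= 1)
        if ((set & bit) != 0)
          {
            std::uint32_t deps = direct_deps (static_cast<feature> (bit));
            if ((set | deps) != set)
              {
                set |= deps;
                changed = true;
              }
          }
    }
  return set;
}
```

Under these assumptions, `enable (F_FP16)` yields fp and fp16 but not simd. That is why the Neon fp16 pragma needs an explicit "+simd".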
> >
> > Binutils did not previously support all of these architecture+feature
> > combinations, but this problem is already reachable from GCC.  For
> > example, compiling the test gcc.target/aarch64/usadv16qi-dotprod.c
> > with -O3 -march=armv8-a+dotprod has resulted in an assembler error since
> > GCC 10.  This is fixed in Binutils 2.41.
> >
> > This patch retains explicit architecture version dependencies for
> > features that do not currently have a separate feature flag.
> >
> > gcc/ChangeLog:
> >
> >  * config/aarch64/aarch64.h (TARGET_MEMTAG): Remove armv8.5
> >  dependency.
> >  * config/aarch64/arm_acle.h: Remove unnecessary armv8.x
> >  dependencies from target pragmas.
> >  * config/aarch64/arm_fp16.h (target): Likewise.
> >  * config/aarch64/arm_neon.h (target): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >  * gcc.target/aarch64/feature-bf16-backport.c: New test.
> >  * gcc.target/aarch64/feature-dotprod-backport.c: New test.
> >  * gcc.target/aarch64/feature-fp16-backport.c: New test.
> >  * gcc.target/aarch64/feature-fp16-scalar-backport.c: New test.
> >  * gcc.target/aarch64/feature-fp16fml-backport.c: New test.
> >  * gcc.target/aarch64/feature-i8mm-backport.c: New test.
> >  * gcc.target/aarch64/feature-memtag-backport.c: New test.
> >  * gcc.target/aarch64/feature-sha3-backport.c: New test.
> >  * gcc.target/aarch64/feature-sm4-backport.c: New test.
> >
> > ---
> >
> > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > index a01f1ee99d85917941ffba55bc3b4dcac87b41f6..2b0fc97bb71e9d560ae26035c7d7142682e46c38 100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -292,7 +292,7 @@ enum class aarch64_feature : unsigned char {
> >  #define TARGET_RNG (AARCH64_ISA_RNG)
> >
> >  /* Memory Tagging instructions optional to Armv8.5 enabled through +memtag.  */
> > -#define TARGET_MEMTAG (AARCH64_ISA_V8_5A && AARCH64_ISA_MEMTAG)
> > +#define TARGET_MEMTAG (AARCH64_ISA_MEMTAG)
> >
> >  /* I8MM instructions are enabled through +i8mm.  */
> >  #define TARGET_I8MM (AARCH64_ISA_I8MM)
> > diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
> > index 3b6b63e6805432b5f1686745f987c52d2967c7c1..7599a32301dadf80760d3cb40a8685d2e6a476fb 100644
> > --- a/gcc/config/aarch64/arm_acle.h
> > +++ b/gcc/config/aarch64/arm_acle.h
> > @@ -292,7 +292,7 @@ __rndrrs (uint64_t *__res)
> >  #pragma GCC pop_options
> >
> >  #pragma GCC push_options
> > -#pragma GCC target ("arch=armv8.5-a+memtag")
> > +#pragma GCC target ("+nothing+memtag")
> >
> >  #define __arm_mte_create_random_tag(__ptr, __u64_mask) \
> >__builtin_aarch64_memtag_irg(__ptr, __u64_mask)
> > diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
> > index 350f8cc33d99e16137e9d70fa7958b10924dc67f..c10f9dcf7e097ded1740955addcd73348649dc56 100644
> > --- a/gcc/config/aarch64/arm_fp16.h
> > +++ b/gcc/config/aarch64/arm_fp16.h
> > @@ -30,7 +30,7 @@
> >  #include 
> >
> >  #pragma GCC push_options
> > -#pragma GCC target ("arch=armv8.2-a+fp16")
> > +#pragma GCC target ("+nothing+fp16")
> >
> >  typedef __fp16 float16_t;
> >
> > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> > index 0ace1eeddb97443433c091d2363403fcf2907654..349f3167699447eb397af482eaeadf8a07617025 100644
> > --- a/gcc/config/aarch64/arm_neon.h
> > +++ b/gcc/config/aarch64/arm_neon.h
> > @@ -25590,7 +25590,7 @@ __INTERLEAVE_LIST (zip)
> >  #include "arm_fp16.h"
> >
> >  #pragma GCC push_options
> > -#pragma GCC target ("arch=armv8.2-a+fp16")
> > +#pragma GCC target ("+nothing+simd+fp16")
> >
> >  /* ARMv8.2-A FP16 one operand vector intrinsics.  */
> >
> > @@ -26753,7 +26753,7 @@ vminnmvq_f16 (float16x8_t __a)
> >  /* AdvSIMD Dot Product intrinsics.  */
> >
> >  #pragma GCC push_options
> > -#pragma GCC target ("arch=armv8.2-a+dotprod")
> > +#pragma GCC target ("+nothing+dotprod")
> >
> >  __extension__ extern __inline uint32x2_t
> >  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> > @@ -26844,7 +26844,7 @@ vdotq_laneq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b, const int __index)
> >  #pragma GCC pop_options
> >
> >  #pragma GCC push_optio

Re: [PATCH] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread Robin Dapp via Gcc-patches

> +enum reduction_type
> +{
> +  UNORDERED_REDUDUCTION,
> +  FOLD_LEFT_REDUDUCTION,
> +  MASK_LEN_FOLD_LEFT_REDUDUCTION,
> +};

There are redundant 'DU's here ;)
Wouldn't it be sufficient to have an enum

enum reduction_type
{
  UNORDERED,
  FOLD_LEFT,
  MASK_LEN_FOLD_LEFT,
};
?

Regards
 Robin

Re: Re: [PATCH] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread juzhe.zh...@rivai.ai

Naming the enumerator UNORDERED causes an ICE, since rtx_code already has an UNORDERED.

Could you suggest another name for it?



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-07-20 15:41
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support in-order floating-point reduction
> +enum reduction_type
> +{
> +  UNORDERED_REDUDUCTION,
> +  FOLD_LEFT_REDUDUCTION,
> +  MASK_LEN_FOLD_LEFT_REDUDUCTION,
> +};
 
There are redundant 'DU's here ;)
Wouldn't it be sufficient to have an enum
 
enum reduction_type
{
  UNORDERED,
  FOLD_LEFT,
  MASK_LEN_FOLD_LEFT,
};
?
 
Regards
Robin

Re: [PATCH] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread Kito Cheng via Gcc-patches

Isn't there a potential vsetvli optimization opportunity in the example?

> After this patch:
> foo:
> lui a5,%hi(.LC0)
> flw fa0,%lo(.LC0)(a5)
> ble a1,zero,.L4
> .L3:
> vsetvli a5,a1,e32,m1,ta,ma
> vle32.v v1,0(a0)
> slli a4,a5,2
> vsetivli zero,1,e32,m1,ta,ma

This could just use "vsetvli a5,a1,e32,m1,ta,ma"

> sub a1,a1,a5
> vfmv.s.f v2,fa0
> add a0,a0,a4
> vsetvli zero,a5,e32,m1,ta,ma

And then this can be removed too.

> vfredosum.vs v1,v1,v2
> vfmv.f.s fa0,v1
> bne a1,zero,.L3
> ret
> .L4:
> ret

RE: [x86_64 PATCH] More TImode parameter passing improvements.

2023-07-20 Thread Roger Sayle



Hi Uros,

> From: Uros Bizjak 
> Sent: 20 July 2023 07:50
> 
> On Wed, Jul 19, 2023 at 10:07 PM Roger Sayle 
> wrote:
> >
> > This patch is the next piece of a solution to the x86_64 ABI issues in
> > PR 88873.  This splits the *concat3_3 define_insn_and_split
> > into two patterns, a TARGET_64BIT *concatditi3_3 and a !TARGET_64BIT
> > *concatsidi3_3.  This allows us to add an additional alternative to
> > the 64-bit version, enabling the register allocator to perform
> > this operation using SSE registers, which is implemented/split after
> > reload using vec_concatv2di.
> >
> > To demonstrate the improvement, the test case from PR88873:
> >
> > typedef struct { double x, y; } s_t;
> >
> > s_t foo (s_t a, s_t b, s_t c)
> > {
> >   return (s_t){ __builtin_fma (a.x, b.x, c.x),
> >                 __builtin_fma (a.y, b.y, c.y) };
> > }
> >
> > when compiled with -O2 -march=cascadelake, currently generates:
> >
> > foo:    vmovq   %xmm2, -56(%rsp)
> >         movq    -56(%rsp), %rax
> >         vmovq   %xmm3, -48(%rsp)
> >         vmovq   %xmm4, -40(%rsp)
> >         movq    -48(%rsp), %rcx
> >         vmovq   %xmm5, -32(%rsp)
> >         vmovq   %rax, %xmm6
> >         movq    -40(%rsp), %rax
> >         movq    -32(%rsp), %rsi
> >         vpinsrq $1, %rcx, %xmm6, %xmm6
> >         vmovq   %xmm0, -24(%rsp)
> >         vmovq   %rax, %xmm7
> >         vmovq   %xmm1, -16(%rsp)
> >         vmovapd %xmm6, %xmm2
> >         vpinsrq $1, %rsi, %xmm7, %xmm7
> >         vfmadd132pd     -24(%rsp), %xmm7, %xmm2
> >         vmovapd %xmm2, -56(%rsp)
> >         vmovsd  -48(%rsp), %xmm1
> >         vmovsd  -56(%rsp), %xmm0
> >         ret
> >
> > with this change, we avoid many of the reloads via memory,
> >
> > foo:    vpunpcklqdq     %xmm3, %xmm2, %xmm7
> >         vpunpcklqdq     %xmm1, %xmm0, %xmm6
> >         vpunpcklqdq     %xmm5, %xmm4, %xmm2
> >         vmovdqa %xmm7, -24(%rsp)
> >         vmovdqa %xmm6, %xmm1
> >         movq    -16(%rsp), %rax
> >         vpinsrq $1, %rax, %xmm7, %xmm4
> >         vmovapd %xmm4, %xmm6
> >         vfmadd132pd     %xmm1, %xmm2, %xmm6
> >         vmovapd %xmm6, -24(%rsp)
> >         vmovsd  -16(%rsp), %xmm1
> >         vmovsd  -24(%rsp), %xmm0
> >         ret
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-07-19  Roger Sayle  
> >
> > gcc/ChangeLog
> > * config/i386/i386-expand.cc (ix86_expand_move): Don't call
> > force_reg, to use SUBREG rather than create a new pseudo when
> > inserting DFmode fields into TImode with insvti_{high,low}part.
> > (*concat3_3): Split into two define_insn_and_split...
> > (*concatditi3_3): 64-bit implementation.  Provide alternative
> > that allows register allocation to use SSE registers that is
> > split into vec_concatv2di after reload.
> > (*concatsidi3_3): 32-bit implementation.
> >
> > gcc/testsuite/ChangeLog
> > * gcc.target/i386/pr88873.c: New test case.
> 
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index f9b0dc6..9c3febe 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -558,7 +558,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
>op0 = SUBREG_REG (op0);
>tmp = gen_rtx_AND (TImode, copy_rtx (op0), tmp);
>if (mode == DFmode)
> -op1 = force_reg (DImode, gen_lowpart (DImode, op1));
> +op1 = gen_lowpart (DImode, op1);
> 
> Please note that gen_lowpart will ICE when op1 is a SUBREG. This is why
> we first need to force a SUBREG into a register and then perform
> gen_lowpart; it is necessary to avoid the ICE.

The good news is that we know op1 is a register, as this is tested by
"&& REG_P (op1)" on line 551.  You'll also notice that I'm not removing
the force_reg before the call to gen_lowpart, but the call to force_reg
after it.  When I originally wrote this, the hope was that placing this
SUBREG in its own pseudo would help with register allocation/CSE.
Unfortunately, increasing the number of pseudos (in this case) increases
compile time (due to quadratic behaviour in LRA), as shown by
PR rtl-optimization/110587.  Keeping the DF->DI conversion in a SUBREG
inside the insvti_{high,low}part instead lets the register allocator see
the DF->DI->TI sequence in a single pattern, and hence choose to keep the
TImode value in SSE registers rather than use a pair of reloads (writing
the DF value to memory, then reading it back as a DImode scalar, and
perhaps the same again to go the other way).
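The DF->DI->TI sequence Roger describes can be modelled at the bit level. A minimal C++ sketch follows. It uses the GCC/Clang extension `unsigned __int128` as a stand-in for TImode and `memcpy` for the DF->DI reinterpretation that the SUBREG performs. The low-half helper mirrors the AND / ZERO_EXTEND / IOR shape visible in the diff. The high-half helper, and all function names, are assumptions for illustration, not the actual insvti_highpart pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// unsigned __int128 is a GCC/Clang extension, standing in for TImode here.
typedef unsigned __int128 ti_value;

// DF -> DI: reinterpret the double's bits (the role of the SUBREG).
std::uint64_t df_bits (double d)
{
  std::uint64_t bits;
  std::memcpy (&bits, &d, sizeof bits);
  return bits;
}

// Low-half insertion: (ior (and x high_mask) (zero_extend bits)).
ti_value insert_low (ti_value x, std::uint64_t bits)
{
  const ti_value keep_high = ~static_cast<ti_value> (UINT64_MAX);
  return (x & keep_high) | static_cast<ti_value> (bits);
}

// Assumed high-half counterpart: keep the low 64 bits and shift the new
// bits into the upper half.
ti_value insert_high (ti_value x, std::uint64_t bits)
{
  const ti_value keep_low = static_cast<ti_value> (UINT64_MAX);
  return (x & keep_low) | (static_cast<ti_value> (bits) << 64);
}
```

Each insertion preserves the other half. Seen as a single operation, that is exactly what allows the register allocator to do the whole thing with a vector concatenation instead of a round trip through memory.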

>op1 = gen_rtx_ZERO_EXTEND (TImode, op1);
>op1 = gen_rtx_IOR (TImode, tmp, op1);
>   }
> @@ -570,7 +570,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
>op0 = SUBREG_REG (op0);
>tmp = gen_rtx_AND (TImode, copy_rtx (op0), tmp);
>if (mode == DFmode)
> -op1 = force

Re: [x86_64 PATCH] More TImode parameter passing improvements.

2023-07-20 Thread Uros Bizjak via Gcc-patches

On Thu, Jul 20, 2023 at 9:44 AM Roger Sayle  wrote:
>
>
> Hi Uros,
>
> > From: Uros Bizjak 
> > Sent: 20 July 2023 07:50
> >
> > On Wed, Jul 19, 2023 at 10:07 PM Roger Sayle 
> > wrote:
> > >
> > > This patch is the next piece of a solution to the x86_64 ABI issues in
> > > PR 88873.  This splits the *concat3_3 define_insn_and_split
> > > into two patterns, a TARGET_64BIT *concatditi3_3 and a !TARGET_64BIT
> > > *concatsidi3_3.  This allows us to add an additional alternative to
> > > the 64-bit version, enabling the register allocator to perform
> > > this operation using SSE registers, which is implemented/split after
> > > reload using vec_concatv2di.
> > >
> > > To demonstrate the improvement, the test case from PR88873:
> > >
> > > typedef struct { double x, y; } s_t;
> > >
> > > s_t foo (s_t a, s_t b, s_t c)
> > > {
> > >   return (s_t){ __builtin_fma (a.x, b.x, c.x),
> > >                 __builtin_fma (a.y, b.y, c.y) };
> > > }
> > >
> > > when compiled with -O2 -march=cascadelake, currently generates:
> > >
> > > foo:    vmovq   %xmm2, -56(%rsp)
> > >         movq    -56(%rsp), %rax
> > >         vmovq   %xmm3, -48(%rsp)
> > >         vmovq   %xmm4, -40(%rsp)
> > >         movq    -48(%rsp), %rcx
> > >         vmovq   %xmm5, -32(%rsp)
> > >         vmovq   %rax, %xmm6
> > >         movq    -40(%rsp), %rax
> > >         movq    -32(%rsp), %rsi
> > >         vpinsrq $1, %rcx, %xmm6, %xmm6
> > >         vmovq   %xmm0, -24(%rsp)
> > >         vmovq   %rax, %xmm7
> > >         vmovq   %xmm1, -16(%rsp)
> > >         vmovapd %xmm6, %xmm2
> > >         vpinsrq $1, %rsi, %xmm7, %xmm7
> > >         vfmadd132pd     -24(%rsp), %xmm7, %xmm2
> > >         vmovapd %xmm2, -56(%rsp)
> > >         vmovsd  -48(%rsp), %xmm1
> > >         vmovsd  -56(%rsp), %xmm0
> > >         ret
> > >
> > > with this change, we avoid many of the reloads via memory,
> > >
> > > foo:    vpunpcklqdq     %xmm3, %xmm2, %xmm7
> > >         vpunpcklqdq     %xmm1, %xmm0, %xmm6
> > >         vpunpcklqdq     %xmm5, %xmm4, %xmm2
> > >         vmovdqa %xmm7, -24(%rsp)
> > >         vmovdqa %xmm6, %xmm1
> > >         movq    -16(%rsp), %rax
> > >         vpinsrq $1, %rax, %xmm7, %xmm4
> > >         vmovapd %xmm4, %xmm6
> > >         vfmadd132pd     %xmm1, %xmm2, %xmm6
> > >         vmovapd %xmm6, -24(%rsp)
> > >         vmovsd  -16(%rsp), %xmm1
> > >         vmovsd  -24(%rsp), %xmm0
> > >         ret
> > >
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> > >
> > >
> > > 2023-07-19  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > * config/i386/i386-expand.cc (ix86_expand_move): Don't call
> > > force_reg, to use SUBREG rather than create a new pseudo when
> > > inserting DFmode fields into TImode with insvti_{high,low}part.
> > > (*concat3_3): Split into two define_insn_and_split...
> > > (*concatditi3_3): 64-bit implementation.  Provide alternative
> > > that allows register allocation to use SSE registers that is
> > > split into vec_concatv2di after reload.
> > > (*concatsidi3_3): 32-bit implementation.
> > >
> > > gcc/testsuite/ChangeLog
> > > * gcc.target/i386/pr88873.c: New test case.
> >
> > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > index f9b0dc6..9c3febe 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -558,7 +558,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
> >op0 = SUBREG_REG (op0);
> >tmp = gen_rtx_AND (TImode, copy_rtx (op0), tmp);
> >if (mode == DFmode)
> > -op1 = force_reg (DImode, gen_lowpart (DImode, op1));
> > +op1 = gen_lowpart (DImode, op1);
> >
> > Please note that gen_lowpart will ICE when op1 is a SUBREG. This is why
> > we first need to force a SUBREG into a register and then perform
> > gen_lowpart; it is necessary to avoid the ICE.
>
> The good news is that we know op1 is a register, as this is tested by
> "&& REG_P (op1)" on line 551.  You'll also notice that I'm not removing
> the force_reg before the call to gen_lowpart, but the call to force_reg
> after it.  When I originally wrote this, the hope was that placing this
> SUBREG in its own pseudo would help with register allocation/CSE.
> Unfortunately, increasing the number of pseudos (in this case) increases
> compile time (due to quadratic behaviour in LRA), as shown by
> PR rtl-optimization/110587.  Keeping the DF->DI conversion in a SUBREG
> inside the insvti_{high,low}part instead lets the register allocator see
> the DF->DI->TI sequence in a single pattern, and hence choose to keep the
> TImode value in SSE registers rather than use a pair of reloads (writing
> the DF value to memory, then reading it back as a DImode scalar, and
> perhaps the same again to go the other way).

This was my on

Re: Re: [PATCH] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread juzhe.zh...@rivai.ai

Oh, yes.
It can easily be addressed here:
emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_ops);

This patch emits the scalar move insn with AVL = 1 for all reductions. This can
be addressed easily: when we recognize a mask_len_fold_left_plus reduction,
we assign the AVL to the scalar move insn.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-20 15:42
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support in-order floating-point reduction
Seems like there is a potential vsetvli optimization chance in the example?
 
> After this patch:
> foo:
> lui a5,%hi(.LC0)
> flw fa0,%lo(.LC0)(a5)
> ble a1,zero,.L4
> .L3:
> vsetvli a5,a1,e32,m1,ta,ma
> vle32.v v1,0(a0)
> slli a4,a5,2
> vsetivli zero,1,e32,m1,ta,ma
 
This could just use "vsetvli a5,a1,e32,m1,ta,ma"
 
> sub a1,a1,a5
> vfmv.s.f v2,fa0
> add a0,a0,a4
> vsetvli zero,a5,e32,m1,ta,ma
 
And then this can be removed too.
 
> vfredosum.vs v1,v1,v2
> vfmv.f.s fa0,v1
> bne a1,zero,.L3
> ret
> .L4:
> ret

Re: [PATCH] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread Robin Dapp via Gcc-patches

> The UNORDERED enum will cause ICE since we have UNORDERED in rtx_code.
> 
> Could you give me another enum name?

I would have expected it to work when it's namespaced.

Regards
 Robin

Re: Re: [PATCH] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread juzhe.zh...@rivai.ai

I have no idea; the ICE just shows up when running the regression tests:

during RTL pass: expand
auto.c: In function 'test_int32_t_float_unordered_var':
auto.c:24:3: internal compiler error: in expand_vec_cmp_float, at config/riscv/riscv-v.cc:2564
   24 |   test_##TYPE1##_##TYPE2##_##CMP##_var (TYPE1 *restrict dest,   \
  |   ^
auto.c:41:3: note: in expansion of macro 'TEST_LOOP'
   41 |   TEST_LOOP (int32_t, float, CMP) \
  |   ^
auto.c:55:1: note: in expansion of macro 'TEST_CMP'
   55 | TEST_CMP (unordered)
  | ^~~~
0x1c8af0d riscv_vector::expand_vec_cmp_float(rtx_def*, rtx_code, rtx_def*, rtx_def*, bool)
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:2564
0x233d200 gen_vec_cmprvvm1sfrvvmf32bi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
../../../riscv-gcc/gcc/config/riscv/autovec.md:559
0x14c4582 rtx_insn* insn_gen_fn::operator()(rtx_def*, rtx_def*, rtx_def*, rtx_def*) const
../../../riscv-gcc/gcc/recog.h:407
0x14c3c02 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
../../../riscv-gcc/gcc/optabs.cc:8197
0x14c4097 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
../../../riscv-gcc/gcc/optabs.cc:8237
0x14c412b expand_insn(insn_code, unsigned int, expand_operand*)
../../../riscv-gcc/gcc/optabs.cc:8268
0x14bfc3e expand_vec_cmp_expr(tree_node*, tree_node*, rtx_def*)
../../../riscv-gcc/gcc/optabs.cc:6692
0x1124e4a do_store_flag
../../../riscv-gcc/gcc/expr.cc:13060
0x1116b10 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
../../../riscv-gcc/gcc/expr.cc:10265
0x1119405 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
../../../riscv-gcc/gcc/expr.cc:10810
0x1110fb0 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
../../../riscv-gcc/gcc/expr.cc:9015
0xf2e973 expand_normal(tree_node*)
../../../riscv-gcc/gcc/expr.h:316
0x12bb060 expand_vec_cond_mask_optab_fn
../../../riscv-gcc/gcc/internal-fn.cc:3059
0x12c27ca expand_VCOND_MASK
../../../riscv-gcc/gcc/internal-fn.def:184
0x12c52a5 expand_internal_call(internal_fn, gcall*)
../../../riscv-gcc/gcc/internal-fn.cc:4792
0x12c52d0 expand_internal_call(gcall*)
../../../riscv-gcc/gcc/internal-fn.cc:4800
0xf5e4c1 expand_call_stmt
../../../riscv-gcc/gcc/cfgexpand.cc:2737
0xf62871 expand_gimple_stmt_1
../../../riscv-gcc/gcc/cfgexpand.cc:3880
0xf62f0f expand_gimple_stmt
../../../riscv-gcc/gcc/cfgexpand.cc:4044
0xf6b8a9 expand_gimple_basic_block
../../../riscv-gcc/gcc/cfgexpand.cc:6096

This ICE happens when compiling vcond.cc tests


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-07-20 15:57
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support in-order floating-point reduction
> The UNORDERED enum will cause ICE since we have UNORDERED in rtx_code.
> 
> Could you give me another enum name?
 
I would have expected it to work when it's namespaced.
 
Regards
Robin

Re: Re: [PATCH] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread Kito Cheng via Gcc-patches

It seems that because you have `using namespace riscv_vector;`, the
UNORDERED in expand_vec_cmp_float resolved to reduction_type::UNORDERED.

Hmmm, maybe an enum class?

enum class reduction_type
{
  UNORDERED,
  FOLD_LEFT,
  MASK_LEN_FOLD_LEFT,
};

and then it needs to be used like this: reduction_type::UNORDERED
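The lookup problem, and why an `enum class` fixes it, fits in a few lines of standalone C++. In this sketch, `rtx_code` is a hypothetical miniature stand-in for GCC's global enumeration (its values are illustrative). The namespaces model declaring the reduction enum inside riscv_vector. Unqualified lookup stops at the innermost scope that declares the name, so inside the namespace a bare `UNORDERED` silently means the reduction enumerator. That also explains why putting the enum in a namespace alone does not help code inside that same namespace.

```cpp
#include <cassert>

// Hypothetical miniature of the global rtx_code enumeration.
enum rtx_code { UNKNOWN, EQ, UNORDERED };

namespace plain_enum
{
  // Unscoped: UNORDERED, FOLD_LEFT, ... are declared in this namespace.
  enum reduction_type { UNORDERED, FOLD_LEFT, MASK_LEN_FOLD_LEFT };

  // Lookup finds plain_enum::UNORDERED (0) first, hiding ::UNORDERED (2).
  constexpr int unordered_seen_here () { return UNORDERED; }
}

namespace scoped_enum
{
  // Scoped: the enumerators are reachable only as reduction_type::NAME.
  enum class reduction_type { UNORDERED, FOLD_LEFT, MASK_LEN_FOLD_LEFT };

  // No bare UNORDERED is declared here, so lookup reaches ::UNORDERED.
  constexpr int unordered_seen_here () { return UNORDERED; }

  constexpr bool is_unordered (reduction_type t)
  { return t == reduction_type::UNORDERED; }
}

static_assert (plain_enum::unordered_seen_here () == 0,
               "the unscoped enumerator hides the rtx_code one");
static_assert (scoped_enum::unordered_seen_here () == UNORDERED,
               "with enum class, the rtx_code enumerator is found");
```

The cost of `enum class` is the qualification at every use, which is arguably clearer in a file that also deals with rtx codes.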

On Thu, Jul 20, 2023 at 3:59 PM juzhe.zh...@rivai.ai
 wrote:
>
> I have no idea; the ICE just shows up when running the regression tests:
>
> during RTL pass: expand
> auto.c: In function 'test_int32_t_float_unordered_var':
> auto.c:24:3: internal compiler error: in expand_vec_cmp_float, at config/riscv/riscv-v.cc:2564
>24 |   test_##TYPE1##_##TYPE2##_##CMP##_var (TYPE1 *restrict dest,   \
>   |   ^
> auto.c:41:3: note: in expansion of macro 'TEST_LOOP'
>41 |   TEST_LOOP (int32_t, float, CMP) \
>   |   ^
> auto.c:55:1: note: in expansion of macro 'TEST_CMP'
>55 | TEST_CMP (unordered)
>   | ^~~~
> 0x1c8af0d riscv_vector::expand_vec_cmp_float(rtx_def*, rtx_code, rtx_def*, rtx_def*, bool)
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:2564
> 0x233d200 gen_vec_cmprvvm1sfrvvmf32bi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
> ../../../riscv-gcc/gcc/config/riscv/autovec.md:559
> 0x14c4582 rtx_insn* insn_gen_fn::operator() rtx_def*>(rtx_def*, rtx_def*, rtx_def*, rtx_def*) const
> ../../../riscv-gcc/gcc/recog.h:407
> 0x14c3c02 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8197
> 0x14c4097 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8237
> 0x14c412b expand_insn(insn_code, unsigned int, expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8268
> 0x14bfc3e expand_vec_cmp_expr(tree_node*, tree_node*, rtx_def*)
> ../../../riscv-gcc/gcc/optabs.cc:6692
> 0x1124e4a do_store_flag
> ../../../riscv-gcc/gcc/expr.cc:13060
> 0x1116b10 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
> ../../../riscv-gcc/gcc/expr.cc:10265
> 0x1119405 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
> ../../../riscv-gcc/gcc/expr.cc:10810
> 0x1110fb0 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
> ../../../riscv-gcc/gcc/expr.cc:9015
> 0xf2e973 expand_normal(tree_node*)
> ../../../riscv-gcc/gcc/expr.h:316
> 0x12bb060 expand_vec_cond_mask_optab_fn
> ../../../riscv-gcc/gcc/internal-fn.cc:3059
> 0x12c27ca expand_VCOND_MASK
> ../../../riscv-gcc/gcc/internal-fn.def:184
> 0x12c52a5 expand_internal_call(internal_fn, gcall*)
> ../../../riscv-gcc/gcc/internal-fn.cc:4792
> 0x12c52d0 expand_internal_call(gcall*)
> ../../../riscv-gcc/gcc/internal-fn.cc:4800
> 0xf5e4c1 expand_call_stmt
> ../../../riscv-gcc/gcc/cfgexpand.cc:2737
> 0xf62871 expand_gimple_stmt_1
> ../../../riscv-gcc/gcc/cfgexpand.cc:3880
> 0xf62f0f expand_gimple_stmt
> ../../../riscv-gcc/gcc/cfgexpand.cc:4044
> 0xf6b8a9 expand_gimple_basic_block
> ../../../riscv-gcc/gcc/cfgexpand.cc:6096
>
> This ICE happens when compiling vcond.cc tests
> 
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-07-20 15:57
> To: juzhe.zh...@rivai.ai; gcc-patches
> CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
> Subject: Re: [PATCH] RISC-V: Support in-order floating-point reduction
> > The UNORDERED enum will cause ICE since we have UNORDERED in rtx_code.
> >
> > Could you give me another enum name?
>
> I would have expected it to work when it's namespaced.
>
> Regards
> Robin
>
>

[PATCH] CODE STRUCTURE: Refine codes in Vectorizer

2023-07-20 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Hi, Richard and Richi.

I plan to refine the code I recently added to support RVV auto-vectorization.
This patch is inspired by Richard's last review comments:
https://patchwork.sourceware.org/project/gcc/patch/20230712042124.111818-1-juzhe.zh...@rivai.ai/

Richard said he prefers the following code structure:

Please instead switch the if condition so that the structure is:

   if (...)
 vect_record_loop_mask (...)
   else if (...)
 vect_record_loop_len (...)
   else
 can't use partial vectors

These were his last comments.

So I have come back to refine this piece of code.

Does it look reasonable?

The next refinement patch, which renames all "LEN_MASK" names to "MASK_LEN",
should come after this patch.
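The control flow Richard asked for reduces to a three-way choice. The C++ sketch below models it with hypothetical types. `gs_support` stands for the two `internal_gather_scatter_fn_supported_p` queries in the diff, and `partial_vector_state` stands for the loop_vec_info bookkeeping. As in the patch, the mask form is preferred when both are available.

```cpp
#include <cassert>

// Hypothetical summary of what the target supports for one gather/scatter.
struct gs_support
{
  bool mask_ifn;      // IFN_MASK_GATHER_LOAD / IFN_MASK_SCATTER_STORE
  bool len_mask_ifn;  // IFN_LEN_MASK_GATHER_LOAD / IFN_LEN_MASK_SCATTER_STORE
};

// Hypothetical stand-in for the loop_vec_info bookkeeping.
struct partial_vector_state
{
  int masks_recorded = 0;
  int lens_recorded = 0;
  bool can_use_partial_vectors = true;
};

// Same shape as the refined code: mask, else len, else give up.
void record_gather_scatter (const gs_support &s, partial_vector_state &st)
{
  if (s.mask_ifn)
    ++st.masks_recorded;                // vect_record_loop_mask (...)
  else if (s.len_mask_ifn)
    ++st.lens_recorded;                 // vect_record_loop_len (..., 1)
  else
    st.can_use_partial_vectors = false; // "can't use partial vectors"
  assert (st.masks_recorded + st.lens_recorded <= 1
          || st.can_use_partial_vectors);
}
```

With a single if / else if / else chain, each outcome is handled in exactly one place and no early returns are needed.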

gcc/ChangeLog:

* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Refine code structure.

---
 gcc/tree-vect-stmts.cc | 38 +-
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cb86d544313..b86e159ae4c 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1605,6 +1605,7 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
 nvectors = vect_get_num_copies (loop_vinfo, vectype);
 
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   machine_mode vecmode = TYPE_MODE (vectype);
   bool is_load = (vls_type == VLS_LOAD);
   if (memory_access_type == VMAT_LOAD_STORE_LANES)
@@ -1631,33 +1632,29 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
   internal_fn ifn = (is_load
 ? IFN_MASK_GATHER_LOAD
 : IFN_MASK_SCATTER_STORE);
-  if (!internal_gather_scatter_fn_supported_p (ifn, vectype,
-  gs_info->memory_type,
-  gs_info->offset_vectype,
-  gs_info->scale))
-   {
- ifn = (is_load
-? IFN_LEN_MASK_GATHER_LOAD
-: IFN_LEN_MASK_SCATTER_STORE);
- if (internal_gather_scatter_fn_supported_p (ifn, vectype,
- gs_info->memory_type,
- gs_info->offset_vectype,
- gs_info->scale))
-   {
- vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
- vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
- return;
-   }
+  internal_fn len_ifn = (is_load
+? IFN_LEN_MASK_GATHER_LOAD
+: IFN_LEN_MASK_SCATTER_STORE);
+  if (internal_gather_scatter_fn_supported_p (ifn, vectype,
+ gs_info->memory_type,
+ gs_info->offset_vectype,
+ gs_info->scale))
+   vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype,
+  scalar_mask);
+  else if (internal_gather_scatter_fn_supported_p (len_ifn, vectype,
+  gs_info->memory_type,
+  gs_info->offset_vectype,
+  gs_info->scale))
+   vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
+  else
+   {
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "can't operate on partial vectors because"
 " the target doesn't have an appropriate"
 " gather load or scatter store instruction.\n");
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
- return;
}
-  vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype,
-scalar_mask);
   return;
 }
 
@@ -1703,7 +1700,6 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
   if (get_len_load_store_mode (vecmode, is_load).exists (&vmode))
 {
   nvectors = group_memory_nvectors (group_size * vf, nunits);
-  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   unsigned factor = (vecmode == vmode) ? 1 : GET_MODE_UNIT_SIZE (vecmode);
   vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, factor);
   using_partial_vectors_p = true;
-- 
2.36.3

Re: [PATCH] Optimize vlddqu to vmovdqu for TARGET_AVX

2023-07-20 Thread Uros Bizjak via Gcc-patches

On Thu, Jul 20, 2023 at 9:35 AM liuhongt  wrote:
>
> For Intel processors with AVX, vmovdqu is as fast as vlddqu, so
> UNSPEC_LDDQU can be removed to enable more optimizations.
> Can someone confirm this with the AMD folks?
> If AMD doesn't like this optimization, I'll put it under
> micro-architecture tuning.

The instruction is reachable only as __builtin_ia32_lddqu* (aka
_mm_lddqu_si*), so it was chosen by the programmer for a reason. I
think that in this case, the compiler should not be too smart and
change the instruction behind the programmer's back. The caveats are
also explained at length in the ISA manual.

Uros.

> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> If AMD also likes this optimization, OK for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/sse.md (_lddqu): Change to
> define_expand, expand as simple move when TARGET_AVX
> && ( == 16 || !TARGET_AVX256_SPLIT_UNALIGNED_LOAD).
> The original define_insn is renamed to
> ..
> (_lddqu): .. this.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/vlddqu_vinserti128.c: New test.
> ---
>  gcc/config/i386/sse.md| 15 ++-
>  .../gcc.target/i386/vlddqu_vinserti128.c  | 11 +++
>  2 files changed, 25 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 2d81347c7b6..d571a78f4c4 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -1835,7 +1835,20 @@ (define_peephole2
>[(set (match_dup 4) (match_dup 1))]
>"operands[4] = adjust_address (operands[0], V2DFmode, 0);")
>
> -(define_insn "_lddqu"
> +(define_expand "_lddqu"
> +  [(set (match_operand:VI1 0 "register_operand")
> +   (unspec:VI1 [(match_operand:VI1 1 "memory_operand")]
> +   UNSPEC_LDDQU))]
> +  "TARGET_SSE3"
> +{
> +  if (TARGET_AVX && ( == 16 || !TARGET_AVX256_SPLIT_UNALIGNED_LOAD))
> +{
> +  emit_move_insn (operands[0], operands[1]);
> +  DONE;
> +}
> +})
> +
> +(define_insn "*_lddqu"
>[(set (match_operand:VI1 0 "register_operand" "=x")
> (unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m")]
> UNSPEC_LDDQU))]
> diff --git a/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c b/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
> new file mode 100644
> index 000..29699a5fa7f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O2" } */
> +/* { dg-final { scan-assembler-times "vbroadcasti128" 1 } } */
> +/* { dg-final { scan-assembler-not {(?n)vlddqu.*xmm} } } */
> +
> +#include 
> +__m256i foo(void *data) {
> +__m128i X1 = _mm_lddqu_si128((__m128i*)data);
> +__m256i V1 = _mm256_broadcastsi128_si256 (X1);
> +return V1;
> +}
> --
> 2.39.1.388.g2fc9e9ca3c
>

Re: Re: [PATCH] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread juzhe.zh...@rivai.ai

I have tried this:
enum class reduction_type
{
  UNORDERED,
  FOLD_LEFT,
  MASK_LEN_FOLD_LEFT,
};

But it fails to build:

/gcc/build -I../../../riscv-gcc/gcc/../include  
-I../../../riscv-gcc/gcc/../libcpp/include -g -O0 \
-o build/gencondmd.o build/gencondmd.cc
In file included from ./tm_p.h:4:0,
 from build/gencondmd.cc:29:
../../../riscv-gcc/gcc/config/riscv/riscv-protos.h:294:36: error: could not convert ‘UNORDERED’ from ‘rtx_code’ to ‘riscv_vector::reduction_type’
  reduction_type = UNORDERED);



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-20 16:03
To: juzhe.zh...@rivai.ai
CC: Robin Dapp; gcc-patches; kito.cheng; jeffreyalaw
Subject: Re: Re: [PATCH] RISC-V: Support in-order floating-point reduction
It seems that because you have `using namespace riscv_vector;`, the
UNORDERED in expand_vec_cmp_float resolves to reduction_type::UNORDERED.
 
Hmmm, maybe enum class?
 
enum class reduction_type
{
  UNORDERED,
  FOLD_LEFT,
  MASK_LEN_FOLD_LEFT,
};
 
and then it has to be used as reduction_type::UNORDERED.
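A small standalone C++ sketch of the name-lookup issue discussed here, assuming simplified stand-ins: rtx_code_demo plays the role of GCC's rtx_code and riscv_vector_demo the role of the riscv_vector namespace. All names ending in _demo are hypothetical. It shows that a scoped enum keeps its enumerators out of the namespace, so an unqualified UNORDERED still names the comparison code, while default arguments must spell out reduction_type::UNORDERED:

```cpp
#include <cassert>

// Stand-in for GCC's rtx_code, which has an UNORDERED enumerator.
enum rtx_code_demo { EQ_DEMO, NE_DEMO, UNORDERED };

namespace riscv_vector_demo
{
  // Scoped enum: UNORDERED, FOLD_LEFT, ... are NOT injected into the
  // namespace, so they cannot shadow the global rtx_code_demo UNORDERED.
  enum class reduction_type { UNORDERED, FOLD_LEFT, MASK_LEN_FOLD_LEFT };

  // A default argument now has to name the enumerator explicitly; a bare
  // "= UNORDERED" would find the rtx_code_demo enumerator and fail to
  // convert, like the build error in riscv-protos.h.
  inline int
  reduction_kind (reduction_type type = reduction_type::UNORDERED)
  {
    return static_cast<int> (type);
  }

  // Unqualified UNORDERED inside the namespace still means the comparison
  // code.  With a plain (unscoped) enum reduction_type, this lookup would
  // instead find reduction_type's UNORDERED first.
  inline bool
  is_unordered_cmp (rtx_code_demo code)
  {
    return code == UNORDERED;
  }
}

// Mirrors the "using namespace riscv_vector;" in the backend sources.
using namespace riscv_vector_demo;
```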
 
On Thu, Jul 20, 2023 at 3:59 PM juzhe.zh...@rivai.ai
 wrote:
>
> I have no idea; the ICE just shows up when running the regression tests:
>
> during RTL pass: expand
> auto.c: In function 'test_int32_t_float_unordered_var':
> auto.c:24:3: internal compiler error: in expand_vec_cmp_float, at config/riscv/riscv-v.cc:2564
>24 |   test_##TYPE1##_##TYPE2##_##CMP##_var (TYPE1 *restrict dest,   \
>   |   ^
> auto.c:41:3: note: in expansion of macro 'TEST_LOOP'
>41 |   TEST_LOOP (int32_t, float, CMP) \
>   |   ^
> auto.c:55:1: note: in expansion of macro 'TEST_CMP'
>55 | TEST_CMP (unordered)
>   | ^~~~
> 0x1c8af0d riscv_vector::expand_vec_cmp_float(rtx_def*, rtx_code, rtx_def*, rtx_def*, bool)
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:2564
> 0x233d200 gen_vec_cmprvvm1sfrvvmf32bi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
> ../../../riscv-gcc/gcc/config/riscv/autovec.md:559
> 0x14c4582 rtx_insn* insn_gen_fn::operator() rtx_def*>(rtx_def*, rtx_def*, rtx_def*, rtx_def*) const
> ../../../riscv-gcc/gcc/recog.h:407
> 0x14c3c02 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8197
> 0x14c4097 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8237
> 0x14c412b expand_insn(insn_code, unsigned int, expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8268
> 0x14bfc3e expand_vec_cmp_expr(tree_node*, tree_node*, rtx_def*)
> ../../../riscv-gcc/gcc/optabs.cc:6692
> 0x1124e4a do_store_flag
> ../../../riscv-gcc/gcc/expr.cc:13060
> 0x1116b10 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
> ../../../riscv-gcc/gcc/expr.cc:10265
> 0x1119405 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
> ../../../riscv-gcc/gcc/expr.cc:10810
> 0x1110fb0 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
> ../../../riscv-gcc/gcc/expr.cc:9015
> 0xf2e973 expand_normal(tree_node*)
> ../../../riscv-gcc/gcc/expr.h:316
> 0x12bb060 expand_vec_cond_mask_optab_fn
> ../../../riscv-gcc/gcc/internal-fn.cc:3059
> 0x12c27ca expand_VCOND_MASK
> ../../../riscv-gcc/gcc/internal-fn.def:184
> 0x12c52a5 expand_internal_call(internal_fn, gcall*)
> ../../../riscv-gcc/gcc/internal-fn.cc:4792
> 0x12c52d0 expand_internal_call(gcall*)
> ../../../riscv-gcc/gcc/internal-fn.cc:4800
> 0xf5e4c1 expand_call_stmt
> ../../../riscv-gcc/gcc/cfgexpand.cc:2737
> 0xf62871 expand_gimple_stmt_1
> ../../../riscv-gcc/gcc/cfgexpand.cc:3880
> 0xf62f0f expand_gimple_stmt
> ../../../riscv-gcc/gcc/cfgexpand.cc:4044
> 0xf6b8a9 expand_gimple_basic_block
> ../../../riscv-gcc/gcc/cfgexpand.cc:6096
>
> This ICE happens when compiling vcond.cc tests
> 
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-07-20 15:57
> To: juzhe.zh...@rivai.ai; gcc-patches
> CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
> Subject: Re: [PATCH] RISC-V: Support in-order floating-point reduction
> > The UNORDERED enum will cause an ICE since we have UNORDERED in rtx_code.
> >
> > Could you give me another enum name?
>
> I would have expected it to work when it's namespaced.
>
> Regards
> Robin
>
>

Re: [GCC 13 PATCH] aarch64: Remove architecture dependencies from intrinsics

2023-07-20 Thread Andrew Carlotti via Gcc-patches

On Thu, Jul 20, 2023 at 09:37:14AM +0200, Richard Biener wrote:
> On Thu, Jul 20, 2023 at 8:49 AM Richard Sandiford via Gcc-patches
>  wrote:
> >
> > Andrew Carlotti  writes:
> > > Updated patch to fix the fp16 intrinsic pragmas, and pushed to master.
> > > OK to backport to GCC 13?
> >
> > OK, thanks.
> 
> In case you want it in 13.2 please push it really soon, we want to do 13.2 RC1
> today.
> 
> Richard.

Pushed, thanks.

> 
> > Richard
> >
> > > Many intrinsics currently depend on both an architecture version and a
> > > feature, despite the corresponding instructions being available within
> > > GCC at lower architecture versions.
> > >
> > > LLVM has already removed these explicit architecture version
> > > dependencies; this patch does the same for GCC. Note that +fp16 does not
> > > imply +simd, so we need to add an explicit +simd for the Neon fp16
> > > intrinsics.
> > >
> > > Binutils did not previously support all of these architecture+feature
> > > combinations, but this problem is already reachable from GCC.  For
> > > example, compiling the test gcc.target/aarch64/usadv16qi-dotprod.c
> > > with -O3 -march=armv8-a+dotprod has resulted in an assembler error since
> > > GCC 10.  This is fixed in Binutils 2.41.
> > >
> > > This patch retains explicit architecture version dependencies for
> > > features that do not currently have a separate feature flag.
> > >
> > > gcc/ChangeLog:
> > >
> > >  * config/aarch64/aarch64.h (TARGET_MEMTAG): Remove armv8.5
> > >  dependency.
> > >  * config/aarch64/arm_acle.h: Remove unnecessary armv8.x
> > >  dependencies from target pragmas.
> > >  * config/aarch64/arm_fp16.h (target): Likewise.
> > >  * config/aarch64/arm_neon.h (target): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >  * gcc.target/aarch64/feature-bf16-backport.c: New test.
> > >  * gcc.target/aarch64/feature-dotprod-backport.c: New test.
> > >  * gcc.target/aarch64/feature-fp16-backport.c: New test.
> > >  * gcc.target/aarch64/feature-fp16-scalar-backport.c: New test.
> > >  * gcc.target/aarch64/feature-fp16fml-backport.c: New test.
> > >  * gcc.target/aarch64/feature-i8mm-backport.c: New test.
> > >  * gcc.target/aarch64/feature-memtag-backport.c: New test.
> > >  * gcc.target/aarch64/feature-sha3-backport.c: New test.
> > >  * gcc.target/aarch64/feature-sm4-backport.c: New test.
> > >
> > > ---
> > >
> > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > > index 
> > > a01f1ee99d85917941ffba55bc3b4dcac87b41f6..2b0fc97bb71e9d560ae26035c7d7142682e46c38
> > >  100644
> > > --- a/gcc/config/aarch64/aarch64.h
> > > +++ b/gcc/config/aarch64/aarch64.h
> > > @@ -292,7 +292,7 @@ enum class aarch64_feature : unsigned char {
> > >  #define TARGET_RNG (AARCH64_ISA_RNG)
> > >
> > >  /* Memory Tagging instructions optional to Armv8.5 enabled through +memtag.  */
> > > -#define TARGET_MEMTAG (AARCH64_ISA_V8_5A && AARCH64_ISA_MEMTAG)
> > > +#define TARGET_MEMTAG (AARCH64_ISA_MEMTAG)
> > >
> > >  /* I8MM instructions are enabled through +i8mm.  */
> > >  #define TARGET_I8MM (AARCH64_ISA_I8MM)
> > > diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
> > > index 
> > > 3b6b63e6805432b5f1686745f987c52d2967c7c1..7599a32301dadf80760d3cb40a8685d2e6a476fb
> > >  100644
> > > --- a/gcc/config/aarch64/arm_acle.h
> > > +++ b/gcc/config/aarch64/arm_acle.h
> > > @@ -292,7 +292,7 @@ __rndrrs (uint64_t *__res)
> > >  #pragma GCC pop_options
> > >
> > >  #pragma GCC push_options
> > > -#pragma GCC target ("arch=armv8.5-a+memtag")
> > > +#pragma GCC target ("+nothing+memtag")
> > >
> > >  #define __arm_mte_create_random_tag(__ptr, __u64_mask) \
> > >__builtin_aarch64_memtag_irg(__ptr, __u64_mask)
> > > diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
> > > index 
> > > 350f8cc33d99e16137e9d70fa7958b10924dc67f..c10f9dcf7e097ded1740955addcd73348649dc56
> > >  100644
> > > --- a/gcc/config/aarch64/arm_fp16.h
> > > +++ b/gcc/config/aarch64/arm_fp16.h
> > > @@ -30,7 +30,7 @@
> > >  #include 
> > >
> > >  #pragma GCC push_options
> > > -#pragma GCC target ("arch=armv8.2-a+fp16")
> > > +#pragma GCC target ("+nothing+fp16")
> > >
> > >  typedef __fp16 float16_t;
> > >
> > > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> > > index 
> > > 0ace1eeddb97443433c091d2363403fcf2907654..349f3167699447eb397af482eaeadf8a07617025
> > >  100644
> > > --- a/gcc/config/aarch64/arm_neon.h
> > > +++ b/gcc/config/aarch64/arm_neon.h
> > > @@ -25590,7 +25590,7 @@ __INTERLEAVE_LIST (zip)
> > >  #include "arm_fp16.h"
> > >
> > >  #pragma GCC push_options
> > > -#pragma GCC target ("arch=armv8.2-a+fp16")
> > > +#pragma GCC target ("+nothing+simd+fp16")
> > >
> > >  /* ARMv8.2-A FP16 one operand vector intrinsics.  */
> > >
> > > @@ -26753,7 +26753,7 @@ vminnmvq_f16 (float16x8_t __a)
> > >  /* AdvSIMD Dot Product intrinsics.  */
> > >
> > >  #pragma GCC push_options
> > > -#pragma GCC target ("arch=armv8.2-a+dotprod")
> > > +#pragma

Re: Re: [PATCH] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread Kito Cheng via Gcc-patches

reduction_type = reduction_type::UNORDERED

On Thu, Jul 20, 2023 at 4:16 PM juzhe.zh...@rivai.ai
 wrote:
>
> I have tried this:
> enum class reduction_type
> {
>   UNORDERED,
>   FOLD_LEFT,
>   MASK_LEN_FOLD_LEFT,
> };
>
> But it fails to build:
>
> /gcc/build -I../../../riscv-gcc/gcc/../include  
> -I../../../riscv-gcc/gcc/../libcpp/include -g -O0 \
> -o build/gencondmd.o build/gencondmd.cc
> In file included from ./tm_p.h:4:0,
>  from build/gencondmd.cc:29:
> ../../../riscv-gcc/gcc/config/riscv/riscv-protos.h:294:36: error: could not convert ‘UNORDERED’ from ‘rtx_code’ to ‘riscv_vector::reduction_type’
>   reduction_type = UNORDERED);
>

Re: [PATCH]AArch64 fix regexp for live_1.c sve test

2023-07-20 Thread Jan Hubicka via Gcc-patches

> Tamar Christina  writes:
> > Hi All,
> >
> > The resulting predicate register of a whilelo is not
> > restricted to the lower half of the predicate register file.
> >
> > As such these tests started failing after recent changes
> > because the whilelo outside the loop is getting assigned p15.
> 
> It's the whilelo in the loop for me.  We go from:
> 
> .L3:
> ld1b z31.b, p7/z, [x4, x3]
> movprfx z30, z31
> mul z30.b, p5/m, z30.b, z29.b
> st1b z30.b, p7, [x4, x3]
> mov p6.b, p7.b
> add x3, x3, x0
> whilelo p7.b, w3, w1
> b.any   .L3
> 
> to:
> 
> .L3:
> ld1b z31.b, p7/z, [x3, x2]
> movprfx z29, z31
> mul z29.b, p6/m, z29.b, z30.b
> st1b z29.b, p7, [x3, x2]
> add x2, x2, x0
> whilelo p15.b, w2, w1
> b.any   .L4
> [...]
> .p2align 2,,3
> .L4:
> mov p7.b, p15.b
> b   .L3
> 
> This adds an extra (admittedly unconditional) branch to every non-final
> vector iteration, which seems unfortunate.  I don't think we'd see
> p8-p15 otherwise, since the result of the whilelo is used as a
> governing predicate by the next iteration of the loop.
> 
> This happens because the scalar loop is given an 89% chance of iterating.
> Previously we gave the vector loop an 83.33% chance of iterating, whereas
> after 061f74c06735e1fa35b910ae we give it a 12% chance.  0.89^16 == 15.50%,
> so the new probabilities definitely preserve the original probabilities
> more closely.  But for purely heuristic probabilities like these, I'm
> not sure we should lean so heavily into the idea that the vector
> latch is unlikely.
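The probabilities above can be reproduced under a simple geometric model, assuming each scalar latch decision is taken independently with the same probability; the exponent 16 is taken from the 0.89^16 calculation above, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Probability that the back edge is taken 'vf' times in a row when each
// scalar latch decision is taken independently with probability p.  Under
// this simplifying model it is the chance that one vector iteration
// covering 'vf' scalar iterations is followed by another.
double
vector_latch_probability (double p_scalar, int vf)
{
  assert (p_scalar >= 0.0 && p_scalar < 1.0 && vf > 0);
  return std::pow (p_scalar, vf);
}

// Expected number of header executions per loop entry when the latch is
// taken with probability p each time (geometric distribution).
double
expected_header_executions (double p_latch)
{
  assert (p_latch >= 0.0 && p_latch < 1.0);
  return 1.0 / (1.0 - p_latch);
}
```

Under this model an 83.33% latch probability corresponds to about six header executions per entry, while 12% corresponds to barely more than one.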
> 
> Honza, Richi, any thoughts?  Just wanted to double-check that this
> was operating as expected before making the tests accept the (arguably)
> less efficient code.  It looks like the commit was more aimed at fixing
> the profile counts for the epilogues, rather than the main loop.

You are right that we should not scale down static profiles when they
are artificially flat.  It is nice to have an actual testcase.
Old code used to test:

  /* Without profile feedback, loops for which we do not know a better estimate
     are assumed to roll 10 times.  When we unroll such loop, it appears to
     roll too little, and it may even seem to be cold.  To avoid this, we
     ensure that the created loop appears to roll at least 5 times (but at
     most as many times as before unrolling).  Don't do adjustment if profile
     feedback is present.  */
  if (new_est_niter < 5 && !profile_p)
    {
      if (est_niter < 5)
        new_est_niter = est_niter;
      else
        new_est_niter = 5;
    }

This is not right when profile feedback is available, nor when we
managed to determine the precise number of iterations at branch
prediction time and did not cap.

So I replaced it with a test that the adjusted header count is not smaller
than the preheader edge count.  However, this will happily drive the loop
iteration count close to 0.

It is a bit hard to figure out whether the profile is realistic:

Sometimes we test
   profile_status_for_fn (cfun) != PROFILE_READ
but I am trying to get rid of this test.  With LTO, or when a comdat profile
is lost, we inline functions with and without profiles together.

We can test whether the quality of the loop header count is precise or
adjusted.  However, by the time the vectorizer modifies the loop profile,
we have already adjusted it for the initial profitability-threshold
conditional and dropped it to GUESSED.  Even with profile feedback we do
not know the outcome probability of that one (Ondrej Kubanek's histograms
will help here).

So I think we want to check whether we have a loop iteration estimate
recorded (which should be true both with profile feedback and for loops
with a known trip count) and, if so, compare it with what the profile
says; if the two roughly match, consider the profile realistic.  This
needs to be done before the vectorizer starts tampering with the loop.
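A rough sketch of the proposed consistency check, assuming plain integer counts instead of GCC's profile_count and a caller-chosen tolerance; the function and its parameters are hypothetical and not part of GCC:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a GCC API): accept a loop profile as
// "realistic" only if the iteration count it implies roughly matches an
// independently recorded estimate (e.g. from profile feedback or a known
// trip count).
bool
profile_looks_realistic (std::uint64_t header_count,
                         std::uint64_t preheader_count,
                         bool have_recorded_estimate,
                         std::uint64_t recorded_latch_iterations,
                         double tolerance_factor)
{
  assert (tolerance_factor >= 1.0);
  if (!have_recorded_estimate || preheader_count == 0)
    return false;

  // Header executions per entry, minus one, approximates how many times
  // the latch is taken per entry.
  double profile_iterations
    = static_cast<double> (header_count)
      / static_cast<double> (preheader_count) - 1.0;
  double recorded = static_cast<double> (recorded_latch_iterations);

  // "More or less in match": the ratio (with +1 so rarely-iterating loops
  // stay well defined) must lie within [1/tolerance, tolerance].
  double ratio = (profile_iterations + 1.0) / (recorded + 1.0);
  return ratio <= tolerance_factor && ratio >= 1.0 / tolerance_factor;
}
```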

I will try to make a patch for that.
Honza
> 
> Thanks,
> Richard
> 
> > This widens the regexp.
> >
> > Tested on aarch64-none-linux-gnu and passes again.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/sve/live_1.c: Update assembly.
> >
> > --- inline copy of patch -- 
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
> > index 
> > 80ee176d1807bf628ad47551d69ff5d84deda79e..2db6c3c209a9514646e92628f3d2dd58d466539c
> >  100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
> > @@ -27,10 +27,10 @@
> >  
> >  TEST_ALL (EXTRACT_LAST)
> >  
> > -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].b, } 2 } } */
> > -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].h, } 4 } } */
> > -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].s, } 4 } } */
> > -/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].d, } 4

[PATCH v1] RISC-V: Fix one incorrect match operand for RVV reduction

2023-07-20 Thread Pan Li via Gcc-patches

From: Pan Li 

Two of the RVV reduction patterns take vector_merge_operand instead of
vector_mask_operand for their mask operand by mistake.  This patch fixes
that.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/vector.md: Fix incorrect match_operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr110299-1.c: Adjust tests.
* gcc.target/riscv/rvv/base/pr110299-2.c: Ditto.
---
 gcc/config/riscv/vector.md   | 4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c | 4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index fcff3ee3a17..f745888127c 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7915,7 +7915,7 @@ (define_insn "@pred_widen_reduc_plus"
(unspec:VSF_LMUL1
  [(unspec:VSF_LMUL1
[(unspec:
- [(match_operand: 1 "vector_merge_operand"  "vmWc1,vmWc1")
+ [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
   (match_operand  5 "vector_length_operand" "   rK,   rK")
   (match_operand  6 "const_int_operand" "i,i")
   (match_operand  7 "const_int_operand" "i,i")
@@ -7937,7 +7937,7 @@ (define_insn "@pred_widen_reduc_plus"
(unspec:VDF_LMUL1
  [(unspec:VDF_LMUL1
[(unspec:
- [(match_operand:  1 "vector_merge_operand"  "vmWc1,vmWc1")
+ [(match_operand:  1 "vector_mask_operand"   "vmWc1,vmWc1")
   (match_operand   5 "vector_length_operand" "   rK,   rK")
   (match_operand   6 "const_int_operand" "i,i")
   (match_operand   7 "const_int_operand" "i,i")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c
index d83eea925a7..a903dde34d1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c
@@ -3,5 +3,5 @@
 
 #include "pr110299-1.h"
 
-/* { dg-final { scan-assembler-times {vfwredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0.t} 1 } } */
-/* { dg-final { scan-assembler-times {vfwredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0.t} 1 } } */
+/* { dg-final { scan-assembler-times {vfwredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfwredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c
index cdcde1b89a4..1254ace58eb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c
@@ -4,5 +4,5 @@
 #include "pr110299-1.h"
 #include "pr110299-2.h"
 
-/* { dg-final { scan-assembler-times {vfwredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0.t} 3 } } */
-/* { dg-final { scan-assembler-times {vfwredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0.t} 3 } } */
+/* { dg-final { scan-assembler-times {vfwredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfwredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
-- 
2.34.1

Re: [PATCH v1] RISC-V: Fix one incorrect match operand for RVV reduction

2023-07-20 Thread juzhe.zh...@rivai.ai

LGTM. You can commit it.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-07-20 16:35
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Fix one incorrect match operand for RVV reduction

RE: [PATCH v1] RISC-V: Fix one incorrect match operand for RVV reduction

2023-07-20 Thread Li, Pan2 via Gcc-patches

Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, July 20, 2023 4:37 PM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Fix one incorrect match operand for RVV reduction

LGTM. You can commit it.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-07-20 16:35
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Fix one incorrect match operand for RVV reduction
From: Pan Li mailto:pan2...@intel.com>>

There are 2 of the RVV reduction pattern mask operand takes
vector_merge_operand instead of vector_mask_operand by mistake. This
patch would like to fix this.

Signed-off-by: Pan Li mailto:pan2...@intel.com>>

gcc/ChangeLog:

* config/riscv/vector.md: Fix incorrect match_operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr110299-1.c: Adjust tests.
* gcc.target/riscv/rvv/base/pr110299-2.c: Ditto.
---
gcc/config/riscv/vector.md   | 4 ++--
gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c | 4 ++--
gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c | 4 ++--
3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index fcff3ee3a17..f745888127c 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7915,7 +7915,7 @@ (define_insn 
"@pred_widen_reduc_plus"
(unspec:VSF_LMUL1
  [(unspec:VSF_LMUL1
[(unspec:
-   [(match_operand: 1 "vector_merge_operand"  "vmWc1,vmWc1")
+   [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
   (match_operand  5 "vector_length_operand" "   rK,   rK")
   (match_operand  6 "const_int_operand" "i,i")
   (match_operand  7 "const_int_operand" "i,i")
@@ -7937,7 +7937,7 @@ (define_insn 
"@pred_widen_reduc_plus"
(unspec:VDF_LMUL1
  [(unspec:VDF_LMUL1
[(unspec:
-   [(match_operand:  1 "vector_merge_operand"  "vmWc1,vmWc1")
+   [(match_operand:  1 "vector_mask_operand"   "vmWc1,vmWc1")
   (match_operand   5 "vector_length_operand" "   rK,   rK")
   (match_operand   6 "const_int_operand" "i,i")
   (match_operand   7 "const_int_operand" "i,i")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c
index d83eea925a7..a903dde34d1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c
@@ -3,5 +3,5 @@
#include "pr110299-1.h"
-/* { dg-final { scan-assembler-times {vfwredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0.t} 1 } } */
-/* { dg-final { scan-assembler-times {vfwredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0.t} 1 } } */
+/* { dg-final { scan-assembler-times {vfwredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vfwredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c
index cdcde1b89a4..1254ace58eb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c
@@ -4,5 +4,5 @@
#include "pr110299-1.h"
#include "pr110299-2.h"
-/* { dg-final { scan-assembler-times {vfwredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0.t} 3 } } */
-/* { dg-final { scan-assembler-times {vfwredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0.t} 3 } } */
+/* { dg-final { scan-assembler-times {vfwredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfwredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
--
2.34.1

[PATCH V2] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread Juzhe-Zhong

This patch depends on:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624995.html

Consider the following case:
float foo (float *__restrict a, int n)
{
  float result = 1.0;
  for (int i = 0; i < n; i++)
   result += a[i];
  return result;
}

Compile with **NO** -ffast-math:

Before this patch:
:4:21: missed: couldn't vectorize loop
:1:7: missed: not vectorized: relevant phi not supported: result_14 = 
PHI 

After this patch:
foo:
lui a5,%hi(.LC0)
flw fa0,%lo(.LC0)(a5)
ble a1,zero,.L4
.L3:
vsetvli a5,a1,e32,m1,ta,ma
vle32.v v1,0(a0)
slli a4,a5,2
sub a1,a1,a5
vfmv.s.f v2,fa0
add a0,a0,a4
vfredosum.vs v1,v1,v2 --> FOLD_LEFT_PLUS
vfmv.f.s fa0,v1
bne a1,zero,.L3
ret
.L4:
ret
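Why the in-order form matters: without -ffast-math, floating-point additions may not be reassociated, so the loop has to be reduced in source order, which is what the ordered-sum instruction vfredosum.vs provides. The following is a scalar C++ model only (plain loops, not RVV intrinsics, and fold_left_sum / lane_wise_sum are hypothetical names). It contrasts a strict left-to-right fold with a two-lane reassociated sum of the kind an unordered reduction is free to compute.

```cpp
#include <cassert>
#include <cstddef>

// Strict left-to-right fold in source order: ((init + a[0]) + a[1]) + ...
// This models the semantics an in-order (fold-left) reduction must keep.
float fold_left_sum (float init, const float *a, std::size_t n)
{
  float acc = init;
  for (std::size_t i = 0; i < n; ++i)
    acc += a[i];  // one rounding per element, in order
  return acc;
}

// A reassociated sum: NLANES partial sums combined at the end, roughly what
// an unordered reduction may do when reassociation is permitted.
template <std::size_t NLANES>
float lane_wise_sum (float init, const float *a, std::size_t n)
{
  static_assert (NLANES > 0, "need at least one lane");
  float lanes[NLANES] = {};
  for (std::size_t i = 0; i < n; ++i)
    lanes[i % NLANES] += a[i];
  float acc = init;
  for (std::size_t l = 0; l < NLANES; ++l)
    acc += lanes[l];
  return acc;
}
```

For example, with inputs {1e8f, 1.0f, -1e8f, 1.0f} and an initial value of 0, the fold-left version yields 1.0f (the first 1.0f is absorbed when added to 1e8f), while the two-lane version yields 2.0f, assuming IEEE single precision without excess precision.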

gcc/ChangeLog:

* config/riscv/autovec.md (fold_left_plus_): New pattern.
(mask_len_fold_left_plus_): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(enum reduction_type): Ditto.
(expand_reduction): Add in-order reduction.
* config/riscv/riscv-v.cc (emit_nonvlmax_fp_reduction_insn): New function.
(expand_reduction): Add in-order reduction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c: New test.

---
 gcc/config/riscv/autovec.md   | 39 ++
 gcc/config/riscv/riscv-protos.h   | 13 -
 gcc/config/riscv/riscv-v.cc   | 53 +++
 .../riscv/rvv/autovec/reduc/reduc_strict-1.c  | 28 ++
 .../riscv/rvv/autovec/reduc/reduc_strict-2.c  | 26 +
 .../riscv/rvv/autovec/reduc/reduc_strict-3.c  | 18 +++
 .../riscv/rvv/autovec/reduc/reduc_strict-4.c  | 24 +
 .../riscv/rvv/autovec/reduc/reduc_strict-5.c  | 28 ++
 .../riscv/rvv/autovec/reduc/reduc_strict-6.c  | 18 +++
 .../riscv/rvv/autovec/reduc/reduc_strict-7.c  | 21 
 .../rvv/autovec/reduc/reduc_strict_run-1.c| 29 ++
 .../rvv/autovec/reduc/reduc_strict_run-2.c| 31 +++
 12 files changed, 317 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 00947207f3f..667a877d009 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1687,3 +1687,42 @@
   riscv_vector::expand_reduction (SMIN, operands, f);
   DONE;
 })
+
+;; -
+;;  [FP] Left-to-right reductions
+;; -
+;; Includes:
+;; - vfredosum.vs
+;; -
+
+;; Unpredicated in-order FP reductions.
+(define_expand "fold_left_plus_"
+  [(match_operand: 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand:VF 2 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (PLUS, operands,
+ operands[1],
+ riscv_vector::reduction_type::FOLD_LEFT);
+  DONE;
+})
+
+;; Predicated in-order FP reductions.
+(define_expand "mask_len_fold_left_plus_"
+  [(match_operand: 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand:VF 2 "register_operand")
+   (match_operand: 3 "vector_mask_operand")
+   (match_operand 4 "autovec_length_operand")
+   (match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  if (rtx_equal_p (operands[4], const0_rtx))
+emit_move_insn

Re: Re: [PATCH] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread juzhe.zh...@rivai.ai

Addressed all comments on the V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625038.html 

Redundant vsetvli instructions are elided by this code:
  rtx len = type == reduction_type::MASK_LEN_FOLD_LEFT ? ops[4] : NULL_RTX;
  emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_ops,
 len);

The len operand is now passed through for MASK_LEN_FOLD_LEFT.

The codegen is now:

foo:
lui a5,%hi(.LC0)
flw fa0,%lo(.LC0)(a5)
ble a1,zero,.L4
.L3:
vsetvli a5,a1,e32,m1,ta,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v1,0(a0)
vfmv.s.f v2,fa0
add a0,a0,a4
vfredosum.vs v1,v1,v2
vfmv.f.s fa0,v1
bne a1,zero,.L3
ret


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-20 16:24
To: juzhe.zh...@rivai.ai
CC: Robin Dapp; gcc-patches; kito.cheng; jeffreyalaw
Subject: Re: Re: [PATCH] RISC-V: Support in-order floating-point reduction
reduction_type = reduction_type::UNORDERED
 
On Thu, Jul 20, 2023 at 4:16 PM juzhe.zh...@rivai.ai
 wrote:
>
> I have tried this:
> enum class reduction_type
> {
>   UNORDERED,
>   FOLD_LEFT,
>   MASK_LEN_FOLD_LEFT,
> };
>
> But it fails to build.
>
> /gcc/build -I../../../riscv-gcc/gcc/../include  
> -I../../../riscv-gcc/gcc/../libcpp/include -g -O0 \
> -o build/gencondmd.o build/gencondmd.cc
> In file included from ./tm_p.h:4:0,
>  from build/gencondmd.cc:29:
> ../../../riscv-gcc/gcc/config/riscv/riscv-protos.h:294:36: error: could not 
> convert ‘UNORDERED’ from ‘rtx_code’ to ‘riscv_vector::reduction_type’
>   reduction_type = UNORDERED);
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-07-20 16:03
> To: juzhe.zh...@rivai.ai
> CC: Robin Dapp; gcc-patches; kito.cheng; jeffreyalaw
> Subject: Re: Re: [PATCH] RISC-V: Support in-order floating-point reduction
> It seems that because you have `using namespace riscv_vector;`, the
> UNORDERED in expand_vec_cmp_float resolved to reduction_type::UNORDERED.
>
> Hmm, maybe an enum class?
>
> enum class reduction_type
> {
>   UNORDERED,
>   FOLD_LEFT,
>   MASK_LEN_FOLD_LEFT,
> };
>
> and you need to use it like this: reduction_type::UNORDERED
>
> On Thu, Jul 20, 2023 at 3:59 PM juzhe.zh...@rivai.ai
>  wrote:
> >
> > I have no idea; the ICE just shows up when running the regression tests:
> >
> > during RTL pass: expand
> > auto.c: In function 'test_int32_t_float_unordered_var':
> > auto.c:24:3: internal compiler error: in expand_vec_cmp_float, at 
> > config/riscv/riscv-v.cc:2564
> >24 |   test_##TYPE1##_##TYPE2##_##CMP##_var (TYPE1 *restrict dest,   \
> >   |   ^
> > auto.c:41:3: note: in expansion of macro 'TEST_LOOP'
> >41 |   TEST_LOOP (int32_t, float, CMP) \
> >   |   ^
> > auto.c:55:1: note: in expansion of macro 'TEST_CMP'
> >55 | TEST_CMP (unordered)
> >   | ^~~~
> > 0x1c8af0d riscv_vector::expand_vec_cmp_float(rtx_def*, rtx_code, rtx_def*, 
> > rtx_def*, bool)
> > ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:2564
> > 0x233d200 gen_vec_cmprvvm1sfrvvmf32bi(rtx_def*, rtx_def*, rtx_def*, 
> > rtx_def*)
> > ../../../riscv-gcc/gcc/config/riscv/autovec.md:559
> > 0x14c4582 rtx_insn* insn_gen_fn::operator() > rtx_def*>(rtx_def*, rtx_def*, rtx_def*, rtx_def*) const
> > ../../../riscv-gcc/gcc/recog.h:407
> > 0x14c3c02 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
> > ../../../riscv-gcc/gcc/optabs.cc:8197
> > 0x14c4097 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
> > ../../../riscv-gcc/gcc/optabs.cc:8237
> > 0x14c412b expand_insn(insn_code, unsigned int, expand_operand*)
> > ../../../riscv-gcc/gcc/optabs.cc:8268
> > 0x14bfc3e expand_vec_cmp_expr(tree_node*, tree_node*, rtx_def*)
> > ../../../riscv-gcc/gcc/optabs.cc:6692
> > 0x1124e4a do_store_flag
> > ../../../riscv-gcc/gcc/expr.cc:13060
> > 0x1116b10 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
> > expand_modifier)
> > ../../../riscv-gcc/gcc/expr.cc:10265
> > 0x1119405 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
> > expand_modifier, rtx_def**, bool)
> > ../../../riscv-gcc/gcc/expr.cc:10810
> > 0x1110fb0 expand_expr_real(tree_node*, rtx_def*, machine_mode, 
> > expand_modifier, rtx_def**, bool)
> > ../../../riscv-gcc/gcc/expr.cc:9015
> > 0xf2e973 expand_normal(tree_node*)
> > ../../../riscv-gcc/gcc/expr.h:316
> > 0x12bb060 expand_vec_cond_mask_optab_fn
> > ../../../riscv-gcc/gcc/internal-fn.cc:3059
> > 0x12c27ca expand_VCOND_MASK
> > ../../../riscv-gcc/gcc/internal-fn.def:184
> > 0x12c52a5 expand_internal_call(internal_fn, gcall*)
> > ../../../riscv-gcc/gcc/internal-fn.cc:4792
> > 0x12c52d0 expand_internal_call(gcall*)
> > ../../../riscv-gcc/gcc/internal-fn.cc:4800
> > 0xf5e4c1 expand_call_stmt
> > ../../../riscv-gcc/gcc/cfgexpand.cc:2737
> > 0xf62871 expand_gimple_stmt_1
> > ../../../riscv-gcc/gcc/cfgexpand.cc:3880
> > 0xf62f0f expand_gimple_stmt
> > ../../../riscv-gcc/gcc/cfgexpand.cc:4044
> > 0xf6b8a9 expand_gimple_basi
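The name-lookup problem discussed in this thread can be reproduced in isolation. The sketch below is a toy model, not GCC code: rtx_code is a three-value stand-in for GCC's real enum, and riscv_vector_model, pick_reduction and is_unordered are hypothetical names. It shows why a scoped enum needs the qualified default argument Kito suggested, and why an unqualified UNORDERED inside the namespace then refers to the rtx_code enumerator again.

```cpp
#include <cassert>

// Toy stand-in for GCC's global rtx_code enum; only the names matter here.
enum rtx_code { EQ, NE, UNORDERED };

namespace riscv_vector_model {

// Scoped enumeration: its enumerators stay inside reduction_type, so they
// cannot hide ::UNORDERED from code in this namespace.
enum class reduction_type
{
  UNORDERED,
  FOLD_LEFT,
  MASK_LEN_FOLD_LEFT,
};

// The default argument must be qualified.  Writing
// "reduction_type type = UNORDERED" would find ::UNORDERED (an rtx_code),
// which does not convert to reduction_type: the build error quoted above.
inline reduction_type
pick_reduction (reduction_type type = reduction_type::UNORDERED)
{
  return type;
}

// Unqualified UNORDERED finds ::UNORDERED here.  With a plain (unscoped)
// enum in this namespace, the namespace-scope enumerator would be found
// first, and a comparison like this one would silently test against the
// wrong value (possibly with only an enum-compare warning).
inline bool
is_unordered (rtx_code code)
{
  return code == UNORDERED;
}

} // namespace riscv_vector_model
```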

Re: [PATCH V2] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread Kito Cheng via Gcc-patches

LGTM, but I would like to make sure Robin is OK too

Re: [PATCH V2] RISC-V: Support in-order floating-point reduction

2023-07-20 Thread Robin Dapp via Gcc-patches

> LGTM, but I would like to make sure Robin is OK too

Yes, LGTM as well.

Regards
 Robin

[PATCH 1/3] RISC-V: Part-1: Select suitable vector registers for vector type args and returns

2023-07-20 Thread Lehua Ding

I have posted below the vector register calling convention rules from the
proposal[1]:

v0 is used to pass the first vector mask argument to a function, and to return
a vector mask result from a function. v8-v23 are used to pass vector data
arguments, vector tuple arguments and the remaining vector mask arguments to a
function, and to return vector data and vector tuple results from a function.

Each vector data type and vector tuple type has an LMUL attribute that
indicates a vector register group. The value of LMUL indicates the number of
vector registers in the vector register group, and the number of the first
vector register in the group must be a multiple of it. For example, the LMUL
of `vint64m8_t` is 8, so the v8-v15 vector register group can be allocated to
this type, but v9-v16 cannot, because 9 is not a multiple of 8. If LMUL is
less than 1, it is treated as 1. If it is a vector mask type, its LMUL is 1.

Each vector tuple type also has an NFIELDS attribute that indicates how many
vector register groups the type contains. Thus a vector tuple type needs to
take up LMUL×NFIELDS registers.

The rules for passing vector arguments are as follows:

1. For the first vector mask argument, use v0 to pass it. The argument has now
been allocated.

2. For vector data arguments or the remaining vector mask arguments, starting
from the v8 register, if an unallocated vector register group within v8-v23
whose first register number is a multiple of LMUL can be found, allocate this
vector register group to the argument and mark these registers as allocated.
Otherwise, pass it by reference. The argument has now been allocated.

3. For vector tuple arguments, starting from the v8 register, if NFIELDS
consecutive unallocated vector register groups within v8-v23 can be found and
the first register number is a multiple of LMUL, allocate these vector register
groups to the argument and mark these registers as allocated. Otherwise, pass
it by reference. The argument has now been allocated.

NOTE: It should be stressed that the search for suitable vector register
groups starts at v8 each time; it does not continue from the register after
those allocated to the previous vector argument. Therefore, a vector argument
may be allocated lower-numbered registers than previous vector arguments.
For example, for the function
`void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules
of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b`
and v9 will be allocated to `c`. This approach allows more vector registers to
be allocated to arguments in some cases.
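A minimal sketch of rules 1-3 and the NOTE, assuming a simplified argument description. VecArg, assign_vector_args, kVArgFirst and kVArgLast are hypothetical names; this is not the patch's riscv_get_vector_arg. It ignores scalar arguments and return values and only reports the first register of each allocated group.

```cpp
#include <cassert>
#include <vector>

// Hypothetical argument description used only by this sketch.
struct VecArg
{
  bool is_mask;  // vector mask type
  int lmul;      // LMUL of the type; fractional LMUL may be given as 1
  int nfields;   // NFIELDS for tuple types, 1 otherwise
};

// Argument registers v8-v23 (hypothetical constant names).
constexpr int kVArgFirst = 8;
constexpr int kVArgLast = 23;

// Returns, for each argument, the number of the first vector register it is
// assigned (0 for v0), or -1 when it has to be passed by reference.
std::vector<int> assign_vector_args (const std::vector<VecArg> &args)
{
  bool allocated[32] = {};
  bool v0_used = false;
  std::vector<int> result;

  for (const VecArg &arg : args)
    {
      // Rule 1: the first vector mask argument is passed in v0.
      if (arg.is_mask && !v0_used)
        {
          v0_used = true;
          result.push_back (0);
          continue;
        }

      // Mask types have LMUL 1; LMUL below 1 is treated as 1.
      int lmul = (arg.is_mask || arg.lmul < 1) ? 1 : arg.lmul;
      assert (arg.nfields >= 1);
      int nregs = lmul * arg.nfields;  // LMUL x NFIELDS registers

      // Rules 2 and 3: the search always restarts at v8 and looks for
      // NREGS consecutive free registers starting at a multiple of LMUL.
      int found = -1;
      for (int first = kVArgFirst; first + nregs - 1 <= kVArgLast; ++first)
        {
          if (first % lmul != 0)
            continue;
          bool all_free = true;
          for (int r = first; r < first + nregs; ++r)
            if (allocated[r])
              {
                all_free = false;
                break;
              }
          if (all_free)
            {
              found = first;
              break;
            }
        }

      // Mark the group(s) as allocated; -1 means pass by reference.
      if (found >= 0)
        for (int r = found; r < found + nregs; ++r)
          allocated[r] = true;
      result.push_back (found);
    }
  return result;
}
```

For `foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)` the arguments are {false, 1, 1}, {false, 2, 1} and {false, 1, 1}, and the sketch returns {8, 10, 9}, matching the allocation described in the NOTE.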

Vector values are returned in the same manner as the first named argument of
the same type would be passed.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/389

gcc/ChangeLog:

* config/riscv/riscv-protos.h (builtin_type_p): New function for 
checking vector type.
* config/riscv/riscv-vector-builtins.cc (builtin_type_p): Ditto.
* config/riscv/riscv.cc (struct riscv_arg_info): New fields.
(riscv_init_cumulative_args): Setup variant_cc field.
(riscv_vector_type_p): New function for checking vector type.
(riscv_hard_regno_nregs): Hoist declaration.
(riscv_get_vector_arg): Subroutine of riscv_get_arg_info.
(riscv_get_arg_info): Support vector cc.
(riscv_function_arg_advance): Update cum.
(riscv_pass_by_reference): Handle vector args.
(riscv_v_abi): New function returning the vector ABI.
(riscv_return_value_is_vector_type_p): New function for checking vector
returns.
(riscv_arguments_is_vector_type_p): New function for checking vector
arguments.
(riscv_fntype_abi): Implement TARGET_FNTYPE_ABI.
(TARGET_FNTYPE_ABI): Implement TARGET_FNTYPE_ABI.
* config/riscv/riscv.h (GCC_RISCV_H): Define macros for vector abi.
(MAX_ARGS_IN_VECTOR_REGISTERS): Ditto.
(MAX_ARGS_IN_MASK_REGISTERS): Ditto.
(V_ARG_FIRST): Ditto.
(V_ARG_LAST): Ditto.
(enum riscv_cc): Define all RISCV_CC variants.
* config/riscv/riscv.opt: Add --param=riscv-vector-abi.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-call-args-1-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4.c: New test.
* gcc.target/riscv/rvv/base/abi-call-error-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return-run.c: New test.

[PATCH 0/3] RISC-V: Add an experimental vector calling convention

2023-07-20 Thread Lehua Ding

Hi RISC-V folks,

This patch implements the proposed RISC-V vector calling convention[1]; the
feature can be enabled with the `--param=riscv-vector-abi` option. Currently,
all vector type arguments and return values are passed by reference. With this
patch, these arguments and return values can be passed in vector registers.
Currently only the vector types defined in the RISC-V Vector Extension
Intrinsic Document[2] are supported. GNU extension vector types are unsupported
for now since the corresponding proposal has not been presented.

The proposal introduces a new calling convention variant; functions that follow
this variant must follow the vector register convention below.

| Name    | ABI Mnemonic | Meaning                | Preserved across calls?
=========================================================================
| v0      |              | Argument register      | No
| v1-v7   |              | Callee-saved registers | Yes
| v8-v23  |              | Argument registers     | No
| v24-v31 |              | Callee-saved registers | Yes
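As a rough model of this table (not the patch's riscv_save_reg_p logic), the callee-saved set can be encoded as a 32-bit mask with bit N standing for vN, in the spirit of the vmask field that Part-2 adds to riscv_frame_info. vreg_range_mask, kVariantCalleeSaved and vregs_to_save are hypothetical names, and the bit layout is an illustrative assumption.

```cpp
#include <cassert>
#include <cstdint>

// Bit N of a mask stands for vector register vN (illustrative encoding).
constexpr std::uint32_t vreg_range_mask (int lo, int hi)
{
  std::uint32_t mask = 0;
  for (int r = lo; r <= hi; ++r)
    mask |= std::uint32_t{1} << r;
  return mask;
}

// v1-v7 and v24-v31 are preserved across calls under the variant convention.
constexpr std::uint32_t kVariantCalleeSaved
  = vreg_range_mask (1, 7) | vreg_range_mask (24, 31);

// Vector registers a function must save and restore: the callee-saved ones
// it clobbers.  The standard convention has no callee-saved vector
// registers, so nothing needs saving there.
constexpr std::uint32_t vregs_to_save (std::uint32_t clobbered, bool variant_cc)
{
  return variant_cc ? (clobbered & kVariantCalleeSaved) : 0u;
}

static_assert (kVariantCalleeSaved == 0xFF0000FEu, "v1-v7 and v24-v31");
```

A variant-cc function that clobbers only argument registers (v8-v23) would need no vector saves, while one that clobbers all 32 registers would save exactly the 0xFF0000FE set.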

If a function follows this vector calling convention, its symbol must be
annotated with the .variant_cc directive[3] (used to indicate that it follows a
calling convention variant).

This implementation is split into three parts, each corresponding to a
sub-patch.

- Part-1: Select suitable vector registers for vector type arguments and return
  values according to the proposal.
- Part-2: Allocate frame area for callee-saved vector registers and save/restore
  them in prologue and epilogue.
- Part-3: Generate .variant_cc directive for vector function in assembly code.

Best,
Lehua

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/389
[2] https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system
[3] https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#pseudo-ops

Lehua Ding (3):
  RISC-V: Part-1: Select suitable vector registers for vector type args
    and returns
  RISC-V: Part-2: Save/Restore vector registers which need to be
    preserved
  RISC-V: Part-3: Output .variant_cc directive for vector function

 gcc/config/riscv/riscv-protos.h   |   4 +
 gcc/config/riscv/riscv-sr.cc  |  12 +-
 gcc/config/riscv/riscv-vector-builtins.cc |  10 +
 gcc/config/riscv/riscv.cc | 510 --
 gcc/config/riscv/riscv.h  |  40 ++
 gcc/config/riscv/riscv.md |  43 +-
 gcc/config/riscv/riscv.opt|   5 +
 .../riscv/rvv/base/abi-call-args-1-run.c  | 127 +
 .../riscv/rvv/base/abi-call-args-1.c  | 197 +++
 .../riscv/rvv/base/abi-call-args-2-run.c  |  34 ++
 .../riscv/rvv/base/abi-call-args-2.c  |  27 +
 .../riscv/rvv/base/abi-call-args-3-run.c  | 260 +
 .../riscv/rvv/base/abi-call-args-3.c  | 116 
 .../riscv/rvv/base/abi-call-args-4-run.c  | 145 +
 .../riscv/rvv/base/abi-call-args-4.c  | 111 
 .../riscv/rvv/base/abi-call-error-1.c |  11 +
 .../riscv/rvv/base/abi-call-return-run.c  | 127 +
 .../riscv/rvv/base/abi-call-return.c  | 197 +++
 .../riscv/rvv/base/abi-call-variant_cc.c  |  39 ++
 .../rvv/base/abi-callee-saved-1-fixed-1.c |  85 +++
 .../rvv/base/abi-callee-saved-1-fixed-2.c |  85 +++
 .../riscv/rvv/base/abi-callee-saved-1.c   |  87 +++
 .../riscv/rvv/base/abi-callee-saved-2.c   | 117 
 23 files changed, 2327 insertions(+), 62 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-1-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-2-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-3-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-4-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-error-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-return-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-return.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-variant_cc.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-2.c

-- 
2.36.3

[PATCH 3/3] RISC-V: Part-3: Output .variant_cc directive for vector function

2023-07-20 Thread Lehua Ding

Functions that follow the vector calling convention variant need to be
annotated with the .variant_cc directive, according to the RISC-V Assembly
Programmer's Manual[1] and the RISC-V ELF Specification[2].

[1] https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#pseudo-ops
[2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#dynamic-linking

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_declare_function_name): Add protos.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.cc (riscv_asm_output_variant_cc):  Output 
.variant_cc directive for vector function.
(riscv_declare_function_name): Ditto.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.h (ASM_DECLARE_FUNCTION_NAME): Implement 
ASM_DECLARE_FUNCTION_NAME.
(ASM_OUTPUT_DEF_FROM_DECLS): Implement ASM_OUTPUT_DEF_FROM_DECLS.
(ASM_OUTPUT_EXTERNAL): Implement ASM_OUTPUT_EXTERNAL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-call-variant_cc.c: New test.

---
 gcc/config/riscv/riscv-protos.h   |  3 ++
 gcc/config/riscv/riscv.cc | 48 +++
 gcc/config/riscv/riscv.h  | 15 ++
 .../riscv/rvv/base/abi-call-variant_cc.c  | 39 +++
 4 files changed, 105 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-variant_cc.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 52e15e1b5d6..eb62eb46f55 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -101,6 +101,9 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
 extern void riscv_split_doubleword_move (rtx, rtx);
 extern const char *riscv_output_move (rtx, rtx);
 extern const char *riscv_output_return ();
+extern void riscv_declare_function_name (FILE *, const char *, tree);
+extern void riscv_asm_output_alias (FILE *, const tree, const tree);
+extern void riscv_asm_output_external (FILE *, const tree, const char *);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1ca3ed42d40..c8879659f1f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6740,6 +6740,54 @@ riscv_issue_rate (void)
   return tune_param->issue_rate;
 }
 
+/* Output .variant_cc for function symbol which follows vector calling
+   convention.  */
+
+static void
+riscv_asm_output_variant_cc (FILE *stream, const tree decl, const char *name)
+{
+  if (TREE_CODE (decl) == FUNCTION_DECL)
+{
+  riscv_cc cc = (riscv_cc) fndecl_abi (decl).id ();
+  if (cc == RISCV_CC_V)
+   {
+ fprintf (stream, "\t.variant_cc\t");
+ assemble_name (stream, name);
+ fprintf (stream, "\n");
+   }
+}
+}
+
+/* Implement ASM_DECLARE_FUNCTION_NAME.  */
+
+void
+riscv_declare_function_name (FILE *stream, const char *name, tree fndecl)
+{
+  riscv_asm_output_variant_cc (stream, fndecl, name);
+  ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function");
+  ASM_OUTPUT_LABEL (stream, name);
+}
+
+/* Implement ASM_OUTPUT_DEF_FROM_DECLS.  */
+
+void
+riscv_asm_output_alias (FILE *stream, const tree decl, const tree target)
+{
+  const char *name = XSTR (XEXP (DECL_RTL (decl), 0), 0);
+  const char *value = IDENTIFIER_POINTER (target);
+  riscv_asm_output_variant_cc (stream, decl, name);
+  ASM_OUTPUT_DEF (stream, name, value);
+}
+
+/* Implement ASM_OUTPUT_EXTERNAL.  */
+
+void
+riscv_asm_output_external (FILE *stream, tree decl, const char *name)
+{
+  default_elf_asm_output_external (stream, decl, name);
+  riscv_asm_output_variant_cc (stream, decl, name);
+}
+
 /* Auxiliary function to emit RISC-V ELF attribute. */
 static void
 riscv_emit_attribute ()
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index b24b240dd75..1820593bab5 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1021,6 +1021,21 @@ while (0)
 
 #define ASM_COMMENT_START "#"
 
+/* Add output .variant_cc directive for specific function definition.  */
+#undef ASM_DECLARE_FUNCTION_NAME
+#define ASM_DECLARE_FUNCTION_NAME(STR, NAME, DECL)                            \
+  riscv_declare_function_name (STR, NAME, DECL)
+
+/* Add output .variant_cc directive for specific alias definition.  */
+#undef ASM_OUTPUT_DEF_FROM_DECLS
+#define ASM_OUTPUT_DEF_FROM_DECLS(STR, DECL, TARGET)                          \
+  riscv_asm_output_alias (STR, DECL, TARGET)
+
+/* Add output .variant_cc directive for specific extern function.  */
+#undef ASM_OUTPUT_EXTERNAL
+#define ASM_OUTPUT_EXTERNAL(STR, DECL, NAME)                                  \
+  riscv_asm_output_external (STR, DECL, NAME)
+
 #undef SIZE_TYPE
 #define SIZE_TYPE (POINTER_SIZE == 64 ? "long unsigned int" : "unsigned int")
 
diff --git a/gcc/testsuite/gcc.target/riscv/r

[PATCH 2/3] RISC-V: Part-2: Save/Restore vector registers which need to be preserved

2023-07-20 Thread Lehua Ding

Functions that follow the vector calling convention variant have callee-saved
vector registers, but functions that follow the standard calling convention do
not. We therefore need to know which convention the callee follows so that we
can tell GCC exactly which vector registers the callee will clobber. To do so,
I encode the callee's calling convention in the call RTX pattern, as AArch64
does. Operands 2 and 3 of the call pattern, which were copied from the MIPS
port, are unused according to my analysis and have been removed.

gcc/ChangeLog:

* config/riscv/riscv-sr.cc (riscv_remove_unneeded_save_restore_calls): 
Pass riscv_cc.
* config/riscv/riscv.cc (struct riscv_frame_info): Add new fields.
(riscv_frame_info::reset): Reset new fields.
(riscv_call_tls_get_addr): Pass riscv_cc.
(riscv_function_arg): Return riscv_cc for call pattern.
(riscv_insn_callee_abi): Implement TARGET_INSN_CALLEE_ABI.
(riscv_save_reg_p): Add vector callee-saved check.
(riscv_save_libcall_count): Add vector save area.
(riscv_compute_frame_info): Ditto.
(riscv_restore_reg): Update for type change.
(riscv_for_each_saved_v_reg): New function to save vector registers.
(riscv_first_stack_step): Handle function with vector callee-saved
registers.
(riscv_expand_prologue): Ditto.
(riscv_expand_epilogue): Ditto.
(riscv_output_mi_thunk): Pass riscv_cc.
(TARGET_INSN_CALLEE_ABI): Implement TARGET_INSN_CALLEE_ABI.
* config/riscv/riscv.md: Add CALLEE_CC operand for call pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2.c: New test.

---
 gcc/config/riscv/riscv-sr.cc  |  12 +-
 gcc/config/riscv/riscv.cc | 222 +++---
 gcc/config/riscv/riscv.md |  43 +++-
 .../rvv/base/abi-callee-saved-1-fixed-1.c |  85 +++
 .../rvv/base/abi-callee-saved-1-fixed-2.c |  85 +++
 .../riscv/rvv/base/abi-callee-saved-1.c   |  87 +++
 .../riscv/rvv/base/abi-callee-saved-2.c   | 117 +
 7 files changed, 606 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-2.c

diff --git a/gcc/config/riscv/riscv-sr.cc b/gcc/config/riscv/riscv-sr.cc
index 7248f04d68f..e6e17685df5 100644
--- a/gcc/config/riscv/riscv-sr.cc
+++ b/gcc/config/riscv/riscv-sr.cc
@@ -447,12 +447,18 @@ riscv_remove_unneeded_save_restore_calls (void)
   && !SIBCALL_REG_P (REGNO (target)))
 return;
 
+  /* Extract RISCV CC from the UNSPEC rtx.  */
+  rtx unspec = XVECEXP (callpat, 0, 1);
+  gcc_assert (GET_CODE (unspec) == UNSPEC
+ && XINT (unspec, 1) == UNSPEC_CALLEE_CC);
+  riscv_cc cc = (riscv_cc) INTVAL (XVECEXP (unspec, 0, 0));
   rtx sibcall = NULL;
   if (set_target != NULL)
-sibcall
-  = gen_sibcall_value_internal (set_target, target, const0_rtx);
+sibcall = gen_sibcall_value_internal (set_target, target, const0_rtx,
+ gen_int_mode (cc, SImode));
   else
-sibcall = gen_sibcall_internal (target, const0_rtx);
+sibcall
+  = gen_sibcall_internal (target, const0_rtx, gen_int_mode (cc, SImode));
 
   rtx_insn *before_call = PREV_INSN (call);
   remove_insn (call);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 73e9f2001e6..1ca3ed42d40 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -101,6 +101,9 @@ struct GTY(())  riscv_frame_info {
   /* Likewise FPR X.  */
   unsigned int fmask;
 
+  /* Likewise for vector registers.  */
+  unsigned int vmask;
+
   /* How much the GPR save/restore routines adjust sp (or 0 if unused).  */
   unsigned save_libcall_adjustment;
 
@@ -108,6 +111,10 @@ struct GTY(())  riscv_frame_info {
   poly_int64 gp_sp_offset;
   poly_int64 fp_sp_offset;
 
+  /* Top and bottom offsets of vector save areas from frame bottom.  */
+  poly_int64 v_sp_offset_top;
+  poly_int64 v_sp_offset_bottom;
+
   /* Offset of virtual frame pointer from stack pointer/frame bottom */
   poly_int64 frame_pointer_offset;
 
@@ -243,7 +250,7 @@ unsigned riscv_stack_boundary;
 /* If non-zero, this is an offset to be added to SP to redefine the CFA
when restoring the FP register from the stack.  Only valid when generating
the epilogue.  */
-static int epilogue_cfa_sp_offset;
+static poly_int64 epilogue_cfa_sp_offset;
 
 /* Which tuning parameters to use.  */
 static const struct riscv_tune_param *tune_param;

Re: [PATCH]AArch64 fix regexp for live_1.c sve test

2023-07-20 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> On Thu, 20 Jul 2023, Richard Sandiford wrote:
>
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > The resulting predicate register of a whilelo is not
>> > restricted to the lower half of the predicate register file.
>> >
>> > As such these tests started failing after recent changes
>> > because the whilelo outside the loop is getting assigned p15.
>> 
>> It's the whilelo in the loop for me.  We go from:
>> 
>> .L3:
>> ld1b    z31.b, p7/z, [x4, x3]
>> movprfx z30, z31
>> mul z30.b, p5/m, z30.b, z29.b
>> st1b    z30.b, p7, [x4, x3]
>> mov p6.b, p7.b
>> add x3, x3, x0
>> whilelo p7.b, w3, w1
>> b.any   .L3
>> 
>> to:
>> 
>> .L3:
>> ld1b    z31.b, p7/z, [x3, x2]
>> movprfx z29, z31
>> mul z29.b, p6/m, z29.b, z30.b
>> st1b    z29.b, p7, [x3, x2]
>> add x2, x2, x0
>> whilelo p15.b, w2, w1
>> b.any   .L4
>> [...]
>> .p2align 2,,3
>> .L4:
>> mov p7.b, p15.b
>> b   .L3
>> 
>> This adds an extra (admittedly unconditional) branch to every non-final
>> vector iteration, which seems unfortunate.  I don't think we'd see
>> p8-p15 otherwise, since the result of the whilelo is used as a
>> governing predicate by the next iteration of the loop.
>> 
>> This happens because the scalar loop is given an 89% chance of iterating.
>> Previously we gave the vector loop an 83.33% chance of iterating, whereas
>> after 061f74c06735e1fa35b910ae we give it a 12% chance.  0.89^16 == 15.50%,
>> so the new probabilities definitely preserve the original probabilities
>> more closely.  But for purely heuristic probabilities like these, I'm
>> not sure we should lean so heavily into the idea that the vector
>> latch is unlikely.
>> 
>> Honza, Richi, any thoughts?  Just wanted to double-check that this
>> was operating as expected before making the tests accept the (arguably)
>> less efficient code.  It looks like the commit was more aimed at fixing
>> the profile counts for the epilogues, rather than the main loop.
>
> The above looks like a failed coalescing, can you track down where
> that happens and why?

Ah, sorry, I shouldn't have trimmed the context.  The previous predicate
(p6 in the original code) is live on exit from the loop, while the
whilelo result is live on the latch edge.  So I think a move is needed
somewhere.

Thanks,
Richard
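The probability arithmetic quoted above can be checked with a small C sketch. It assumes, as the heuristic does, that each scalar iteration independently continues with the same probability, so a vector iteration covering VF scalar iterations continues with p^VF, and that a loop whose latch is taken with probability p runs 1 / (1 - p) times per entry on average (geometric distribution). The helper names are illustrative only.

```c
/* Probability that the vector latch is taken, assuming each of the VF
   scalar iterations it covers continues independently with P_SCALAR.  */
double
vector_latch_probability (double p_scalar, int vf)
{
  double p = 1.0;
  for (int i = 0; i < vf; i++)
    p *= p_scalar;
  return p;
}

/* Expected number of header executions per loop entry when the latch
   is taken with probability P_LATCH (geometric distribution).  */
double
expected_iterations (double p_latch)
{
  return 1.0 / (1.0 - p_latch);
}
```

With p = 0.89 the scalar loop is expected to run about 9.1 times, and with VF = 16 the vector latch probability comes out at roughly 0.155, i.e. about 1.2 vector iterations per entry, which is why the vector latch looks unlikely.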

Re: [PATCH]AArch64 fix regexp for live_1.c sve test

2023-07-20 Thread Richard Sandiford via Gcc-patches

Jan Hubicka  writes:
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > The resulting predicate register of a whilelo is not
>> > restricted to the lower half of the predicate register file.
>> >
>> > As such these tests started failing after recent changes
>> > because the whilelo outside the loop is getting assigned p15.
>> 
>> It's the whilelo in the loop for me.  We go from:
>> 
>> .L3:
>> ld1b    z31.b, p7/z, [x4, x3]
>> movprfx z30, z31
>> mul z30.b, p5/m, z30.b, z29.b
>> st1b    z30.b, p7, [x4, x3]
>> mov p6.b, p7.b
>> add x3, x3, x0
>> whilelo p7.b, w3, w1
>> b.any   .L3
>> 
>> to:
>> 
>> .L3:
>> ld1b    z31.b, p7/z, [x3, x2]
>> movprfx z29, z31
>> mul z29.b, p6/m, z29.b, z30.b
>> st1b    z29.b, p7, [x3, x2]
>> add x2, x2, x0
>> whilelo p15.b, w2, w1
>> b.any   .L4
>> [...]
>> .p2align 2,,3
>> .L4:
>> mov p7.b, p15.b
>> b   .L3
>> 
>> This adds an extra (admittedly unconditional) branch to every non-final
>> vector iteration, which seems unfortunate.  I don't think we'd see
>> p8-p15 otherwise, since the result of the whilelo is used as a
>> governing predicate by the next iteration of the loop.
>> 
>> This happens because the scalar loop is given an 89% chance of iterating.
>> Previously we gave the vector loop an 83.33% chance of iterating, whereas
>> after 061f74c06735e1fa35b910ae we give it a 12% chance.  0.89^16 == 15.50%,
>> so the new probabilities definitely preserve the original probabilities
>> more closely.  But for purely heuristic probabilities like these, I'm
>> not sure we should lean so heavily into the idea that the vector
>> latch is unlikely.
>> 
>> Honza, Richi, any thoughts?  Just wanted to double-check that this
>> was operating as expected before making the tests accept the (arguably)
>> less efficient code.  It looks like the commit was more aimed at fixing
>> the profile counts for the epilogues, rather than the main loop.
>
> You are right that we should not scale down static profiles in case they
> are artificially flat. It is nice to have an actual testcase.
> Old code used to test:
>
>   /* Without profile feedback, loops for which we do not know a better estimate
>      are assumed to roll 10 times.  When we unroll such loop, it appears to
>      roll too little, and it may even seem to be cold.  To avoid this, we
>      ensure that the created loop appears to roll at least 5 times (but at
>      most as many times as before unrolling).  Don't do adjustment if profile
>      feedback is present.  */
>   if (new_est_niter < 5 && !profile_p)
>     {
>       if (est_niter < 5)
>         new_est_niter = est_niter;
>       else
>         new_est_niter = 5;
>     }
>
> This is not right when profile feedback is around, and also when we
> managed to determine the precise # of iterations at branch prediction
> time and did not cap.
>
> So I replaced it with a test that the adjusted header count is not smaller
> than the preheader edge count.  However, this will happily get the loop
> iteration count close to 0.
>
> It is a bit hard to figure out if the profile is realistic:
>
> Sometimes we test
>    profile_status_for_fn (cfun) != PROFILE_READ
> but I am trying to get rid of this test.  With LTO, or when a comdat
> profile is lost, we inline together functions with and without profile.
>
> We can test whether the quality of the loop header count is precise or
> adjusted.  However, by the time the vectorizer is modifying the loop
> profile, we have already adjusted it for the initial conditional for the
> profitability threshold and dropped it to GUESSED.  Even with profile
> feedback we do not know the outcome probability of that one (Ondrej
> Kubanek's histograms will help here).

Ah, yeah, hadn't thought about that.

> So I think we want to check if we have a loop iteration estimate recorded
> (that should be true for both profile feedback and loops with known trip
> count) and, if so, compare it with what the profile says; if they more or
> less match, consider the profile realistic.  This needs to be done before
> the vectorizer starts tampering with the loop.
>
> I will try to make a patch for that.

Thanks!

Richard
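One way to picture the check Jan proposes is the hypothetical C sketch below; it is not a GCC API. It assumes the profile-implied iteration count is the loop header count divided by the preheader edge count (average header executions per entry), that the recorded estimate is expressed in the same unit, and that "more or less in match" means agreement within a relative tolerance chosen by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the iteration count implied by the
   profile (HEADER_COUNT / PREHEADER_COUNT) lies within TOLERANCE
   (relative) of the separately recorded estimate RECORDED_NITER.  */
bool
profile_matches_estimate (uint64_t header_count, uint64_t preheader_count,
                          uint64_t recorded_niter, double tolerance)
{
  if (preheader_count == 0)
    return false;  /* No entries observed: nothing to compare.  */
  double profile_niter = (double) header_count / (double) preheader_count;
  double lo = (double) recorded_niter * (1.0 - tolerance);
  double hi = (double) recorded_niter * (1.0 + tolerance);
  return profile_niter >= lo && profile_niter <= hi;
}
```

A relative tolerance keeps the test meaningful for both short and long loops; the right threshold, and whether to compare in header or latch executions, would need to be decided in the real patch.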

Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c

2023-07-20 Thread Maciej W. Rozycki

On Thu, 20 Jul 2023, Richard Biener wrote:

> >  Thanks for making this improvement.  I've checked MIPS results and code
> > produced now is as follows:
> >
> > daddiu  $sp,$sp,-64
> > sd  $5,24($sp)
> > sd  $7,40($sp)
> > ldc1    $f0,24($sp)
> > ldc1    $f1,40($sp)
> > sd  $4,16($sp)
> > sd  $6,32($sp)
> > ldc1    $f2,32($sp)
> > add.ps  $f1,$f0,$f1
> > ldc1    $f0,16($sp)
> > add.ps  $f0,$f0,$f2
> > sdc1    $f1,56($sp)
> > ld  $3,56($sp)
> > sdc1    $f0,48($sp)
> > ld  $2,48($sp)
> > jr  $31
> > daddiu  $sp,$sp,64
> >
> > which does do vector stuff now, although it's still considerably worse
> > than my handwritten example:
> >
> > > > dmtc1   $4,$f0
> > > > dmtc1   $5,$f1
> > > > dmtc1   $6,$f2
> > > > dmtc1   $7,$f3
> > > > add.ps  $f0,$f0,$f1
> > > > add.ps  $f2,$f2,$f3
> > > > dmfc1   $2,$f0
> > > > jr  $31
> > > > dmfc1   $3,$f2
> >
> > Or I'd say it's pretty terrible, but given the current situation with the
> > MIPS backend I'm going to leave it to the new maintainer to sort out.
> 
> Yeah, I also wondered what is wrong ... I suspect it's the usual issue
> of parameter passing causing spilling ...

 There's no such requirement in the psABI and I fail to see a plausible 
justification.  And direct GPR<->FPR move patterns are available in the 
backend for the V2SF mode.  Also there's no delay slot requirement even 
for these move instructions for MIPS64r1+ ISA levels, which have this 
paired-single FP format defined.  It seems to me a plain bug (or missed 
optimisation if you prefer).
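The testcase itself is not quoted in this thread. Going by the patch title (128-bit vectors), the following C sketch assumes a function of the same general shape: a four-float vector addition written element-wise, which basic-block SLP can turn into vector adds (two add.ps on MIPS paired-single). It uses the GCC vector extension; the name add_elementwise is illustrative, not taken from bb-slp-pr95839.c.

```c
/* 16-byte vector of four floats (GCC vector extension).  */
typedef float v4f32 __attribute__ ((vector_size (16)));

/* Element-wise addition spelled out lane by lane, so the vectorizer has
   to recognise the constructor as a single vector add.  */
v4f32
add_elementwise (v4f32 a, v4f32 b)
{
  return (v4f32) { a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3] };
}
```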

  Maciej

Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-20 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> On Thu, 20 Jul 2023, Richard Sandiford wrote:
>
>> Jeff Law via Gcc-patches  writes:
>> > On 7/19/23 04:25, Richard Biener wrote:
>> >> On Wed, 19 Jul 2023, YunQiang Su wrote:
>> >> 
>> >>> Eric Botcazou wrote on Wed, 19 Jul 2023 at 17:45:
>> 
>> > I don't see that.  That's definitely not what GCC expects here,
>> > the left-most word of the doubleword should be unchanged.
>> >
>> > Your testcase should be a dg-do-run and probably more like
>> >
>> > NOMIPS16 int __attribute__((noipa)) test (const unsigned char *buf)
>> > {
>> >   int val;
>> >   ((unsigned char*)&val)[0] = *buf++;
>> >   ((unsigned char*)&val)[1] = *buf++;
>> >   ((unsigned char*)&val)[2] = *buf++;
>> >   ((unsigned char*)&val)[3] = *buf++;
>> >   return val;
>> > }
>> > int main()
>> > {
>> >   int val = 0x01020304;
>> >   val = test (&val);
>> >   if (val != 0x01020304)
>> >     abort ();
>> > }
>> >
>> > not sure if I got endianness correct.  Now, the question is what
>> > WORD_REGISTER_OPERATIONS implies for a bitfield insert and what
>> > the MIPS ABI says for returning SImode.
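For reference, a self-contained variant of the testcase suggested above. The quoted version passes an int * where a const unsigned char * is expected and calls abort () without a declaration; this sketch adds the cast, drops the dg directives, defines NOMIPS16 as empty so it builds on any target, and assumes sizeof (int) == 4. These are assumptions, not the committed test.

```c
/* NOMIPS16 comes from the MIPS testsuite; defined empty here (assumption).  */
#define NOMIPS16

/* Assemble an int from four bytes in memory order, as in the quoted
   testcase.  Assumes sizeof (int) == 4.  */
NOMIPS16 int __attribute__ ((noipa))
test (const unsigned char *buf)
{
  int val;
  ((unsigned char *) &val)[0] = *buf++;
  ((unsigned char *) &val)[1] = *buf++;
  ((unsigned char *) &val)[2] = *buf++;
  ((unsigned char *) &val)[3] = *buf++;
  return val;
}

/* Pass V through test (); correctly compiled code returns V unchanged.  */
int
roundtrip (int v)
{
  return test ((const unsigned char *) &v);
}
```

Because the bytes are copied in memory order, the result should equal the input on either endianness, so the check does not depend on getting the byte order right. Negative inputs also exercise the sign-extended SImode return value discussed below.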
>> 
>> >>>
>> >>> The MIPS N64 ABI uses 2 GPRs for integer return values.
>> >>> If the return value is SImode, the first register, v0, is used, and it
>> >>> must be sign-extended,
>> >>> that is, bits 63 to 31 are all the same.
>> >>>
>> >>> Yes, it is the same for signed and unsigned int32.
>> >>>
>> >>> https://irix7.com/techpubs/007-2816-004.pdf
>> >>> Page 6:
>> >>> 32-bit integer (int) parameters are always sign-extended when passed
>> >>> in registers,
>> >>> whether of signed or unsigned type. [This issue does not arise in the
>> >>> o32-bit ABI.]
>> >> 
>> >> Note I think Andrew's comment #7 in the PR is spot-on then, the issue
>> >> isn't the bitfield inserts but the compare where combine elides
>> >> the sign_extend in favor of a subreg.  That's likely some wrongdoing
>> >> in simplify-rtx in the context of WORD_REGISTER_OPERATIONS.
>> > And I think it raises a real question about the use of GPR (which maps 
>> > to SImode and DImode for 64bit MIPS targets) on the conditional 
>> > branching patterns in mips.md.
>> >
>> > So while this code works:
>> >
>> >> (insn 20 19 23 2 (set (reg/v:DI 200 [ val+-4 ])
>> >> (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val+-4 ]) 4))) 
>> >> "/app/example.cpp":7:29 -1
>> >>  (nil))
>> 
>> Haven't had a chance to compile and look at it properly, but this subreg
>> seems suspicious for MIPS, given the definition of TRULY_NOOP_TRUNCATION.
>> We should instead use a truncdisi2 to narrow reg:DI 200 to an SI register,
>> and then sign_extend it.
>> 
>> This is easily missed in target-independent code because so few targets
>> define TRULY_NOOP_TRUNCATION.
>
> Can we easily get rid of it?

Not easily.  The problem is that the original 64-bit ISA says that the
behaviour of a 32-bit arithmetic instruction is undefined if the inputs
aren't in sign-extended form.  (A bit like GCC CONST_INTs :))  So:

  // $2 = 0x0_8000_
  addiu $3,$2,$2

has undefined behaviour, it must instead be:

  // $2 = 0x__8000_

The purpose of TRULY_NOOP_TRUNCATION is to ensure that we never form an
SImode register by taking a lowpart subreg of a DImode (or wider) register.

In other words, things are inverted so that truncating DI to SI is a
sign-extend operation (represented in RTL as a trunc) while sign-extending
from SI to DI is free (lowered to nothing after RA).
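The invariant described above can be modelled in a few lines of C. This is a hypothetical sketch of the semantics, not MIPS backend code: a 64-bit register holds a valid SImode value only if it is the sign extension of its low 32 bits, so truncation has to re-establish that form, while extension from SI to DI becomes a no-op. The conversion of an out-of-range value to int32_t is implementation-defined in C before C23; GCC defines it as modulo 2^32, which is what the sketch relies on.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if REG holds the sign extension of its low 32 bits, i.e. bits
   63..31 are all equal (the form 32-bit MIPS64 operations require).  */
bool
is_sign_extended_si (uint64_t reg)
{
  return (uint64_t) (int64_t) (int32_t) (uint32_t) reg == reg;
}

/* Model of DImode -> SImode truncation on such a target: it must
   produce the sign-extended form, so it is really a sign extension of
   the low 32 bits (relies on GCC's modulo int32_t conversion).  */
uint64_t
truncdisi2_model (uint64_t reg)
{
  return (uint64_t) (int64_t) (int32_t) (uint32_t) reg;
}

/* Model of SImode -> DImode sign extension: the register already holds
   the sign-extended form, so nothing needs to be done.  */
uint64_t
extendsidi2_model (uint64_t reg)
{
  return reg;
}
```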

Richard

[PATCH v4 0/3] c++: Track lifetimes in constant evaluation [PR70331, ...]

2023-07-20 Thread Nathaniel Shead via Gcc-patches

This is an update of the patch series at
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623375.html

Changes since v3:

- Use void_node in values map to indicate out-of-lifetime instead of a separate
  hash set
- Remove tracking of temporaries for loops and calls
- Fix missed checks for uses of empty classes outside lifetime and associated
  test
- Add reference to PR c++/110619 for the second patch, and corresponding new
  test case
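The v3 to v4 change of marking out-of-lifetime objects with a sentinel value (void_node) instead of keeping a separate hash set can be illustrated with a hypothetical C++ model. Standard containers stand in for GCC's hash_map and trees, and std::nullopt plays the role of void_node; unlike the patch, the model cannot represent "in lifetime but without a value yet" (put_value with NULL_TREE), and it applies the sentinel to every key, whereas the patch only does so for DECL_P nodes.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical model, not GCC code.  A variable that goes out of scope
// keeps its key but gets a sentinel value, so a later lookup can tell
// "never seen" apart from "outside its lifetime".
class value_map
{
  std::unordered_map<std::string, std::optional<int>> values;

public:
  void put_value (const std::string &var, int v) { values[var] = v; }

  // Mark VAR as outside its lifetime instead of forgetting it.
  void remove_value (const std::string &var) { values[var] = std::nullopt; }

  bool is_outside_lifetime (const std::string &var) const
  {
    auto it = values.find (var);
    return it != values.end () && !it->second.has_value ();
  }

  // Like get_value_ptr: nullptr for unknown and for expired variables.
  const int *get_value_ptr (const std::string &var) const
  {
    auto it = values.find (var);
    if (it == values.end () || !it->second)
      return nullptr;
    return &*it->second;
  }
};
```

Keeping the key costs a map entry per dead variable, but it lets the evaluator emit a specific "outside its lifetime" error rather than a generic one.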

Bootstrapped and regtested on x86_64-pc-linux-gnu.

Nathaniel Shead (3):
  c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]
  c++: Improve constexpr error for dangling local variables [PR110619]
  c++: Improve location information in constant evaluation

 gcc/cp/constexpr.cc   | 178 +-
 gcc/cp/semantics.cc   |   5 +-
 gcc/cp/typeck.cc  |   5 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-48089.C  |  10 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323.C  |   8 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C |   8 +-
 .../g++.dg/cpp0x/constexpr-delete2.C  |   5 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C  |   2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |   1 +
 .../g++.dg/cpp0x/constexpr-recursion.C|   6 +-
 gcc/testsuite/g++.dg/cpp0x/overflow1.C|   2 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C |  10 +
 gcc/testsuite/g++.dg/cpp1y/constexpr-89285.C  |   5 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-89481.C  |   3 +-
 .../g++.dg/cpp1y/constexpr-lifetime1.C|  14 ++
 .../g++.dg/cpp1y/constexpr-lifetime2.C|  20 ++
 .../g++.dg/cpp1y/constexpr-lifetime3.C|  13 ++
 .../g++.dg/cpp1y/constexpr-lifetime4.C|  11 ++
 .../g++.dg/cpp1y/constexpr-lifetime5.C|  11 ++
 .../g++.dg/cpp1y/constexpr-lifetime6.C|  15 ++
 .../g++.dg/cpp1y/constexpr-tracking-const14.C |   3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const16.C |   3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const18.C |   4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const19.C |   4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const21.C |   4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const22.C |   4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const3.C  |   3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const4.C  |   3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const7.C  |   3 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-union5.C |   4 +-
 gcc/testsuite/g++.dg/cpp1y/pr68180.C  |   4 +-
 .../g++.dg/cpp1z/constexpr-lambda6.C  |   4 +-
 .../g++.dg/cpp1z/constexpr-lambda8.C  |   5 +-
 gcc/testsuite/g++.dg/cpp2a/bit-cast11.C   |  10 +-
 gcc/testsuite/g++.dg/cpp2a/bit-cast12.C   |  10 +-
 gcc/testsuite/g++.dg/cpp2a/bit-cast14.C   |  14 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-98122.C  |   4 +-
 .../g++.dg/cpp2a/constexpr-dynamic17.C|   5 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-init1.C  |   5 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-new12.C  |   6 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-new3.C   |  10 +-
 gcc/testsuite/g++.dg/cpp2a/constinit10.C  |   5 +-
 .../g++.dg/cpp2a/is-corresponding-member4.C   |   4 +-
 gcc/testsuite/g++.dg/ext/constexpr-vla2.C |   4 +-
 gcc/testsuite/g++.dg/ext/constexpr-vla3.C |   4 +-
 gcc/testsuite/g++.dg/ubsan/pr63956.C  |  23 +--
 .../g++.dg/warn/Wreturn-local-addr-6.C|   3 -
 .../25_algorithms/equal/constexpr_neg.cc  |   7 +-
 .../testsuite/26_numerics/gcd/105844.cc   |  10 +-
 .../testsuite/26_numerics/lcm/105844.cc   |  14 +-
 50 files changed, 350 insertions(+), 168 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime4.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C

-- 
2.41.0

[PATCH v4 1/3] c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]

2023-07-20 Thread Nathaniel Shead via Gcc-patches

This adds rudimentary lifetime tracking in C++ constexpr contexts,
allowing the compiler to report errors when values are used after their
backing storage has gone out of scope. We don't yet handle other ways of
accessing values outside their lifetime (e.g. following explicit
destructor calls).

PR c++/96630
PR c++/98675
PR c++/70331

gcc/cp/ChangeLog:

* constexpr.cc (constexpr_global_ctx::is_outside_lifetime): New
function.
(constexpr_global_ctx::get_value): Don't return expired values.
(constexpr_global_ctx::get_value_ptr): Likewise.
(constexpr_global_ctx::remove_value): Mark value outside
lifetime.
(outside_lifetime_error): New function.
(cxx_eval_call_expression): No longer track save_exprs.
(cxx_eval_loop_expr): Likewise.
(cxx_eval_constant_expression): Add checks for outside lifetime
values. Remove local variables at end of bind exprs, and
temporaries after cleanup points.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-lifetime1.C: New test.
* g++.dg/cpp1y/constexpr-lifetime2.C: New test.
* g++.dg/cpp1y/constexpr-lifetime3.C: New test.
* g++.dg/cpp1y/constexpr-lifetime4.C: New test.
* g++.dg/cpp1y/constexpr-lifetime5.C: New test.
* g++.dg/cpp1y/constexpr-lifetime6.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/constexpr.cc   | 132 --
 .../g++.dg/cpp1y/constexpr-lifetime1.C|  13 ++
 .../g++.dg/cpp1y/constexpr-lifetime2.C|  20 +++
 .../g++.dg/cpp1y/constexpr-lifetime3.C|  13 ++
 .../g++.dg/cpp1y/constexpr-lifetime4.C|  11 ++
 .../g++.dg/cpp1y/constexpr-lifetime5.C|  11 ++
 .../g++.dg/cpp1y/constexpr-lifetime6.C|  15 ++
 7 files changed, 170 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime4.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 6e8f1c2b61e..cd4424bcb44 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1148,7 +1148,8 @@ enum constexpr_switch_state {
 
 class constexpr_global_ctx {
   /* Values for any temporaries or local variables within the
- constant-expression. */
+ constant-expression. Objects outside their lifetime have
+ value 'void_node'.  */
   hash_map values;
 public:
   /* Number of cxx_eval_constant_expression calls (except skipped ones,
@@ -1170,17 +1171,28 @@ public:
 : constexpr_ops_count (0), cleanups (NULL), modifiable (nullptr),
   heap_dealloc_count (0) {}
 
+  bool is_outside_lifetime (tree t)
+  {
+if (tree *p = values.get(t))
+  if (*p == void_node)
+   return true;
+return false;
+  }
  tree get_value (tree t)
   {
 if (tree *p = values.get (t))
-  return *p;
+  if (*p != void_node)
+   return *p;
 return NULL_TREE;
   }
   tree *get_value_ptr (tree t)
   {
 if (modifiable && !modifiable->contains (t))
   return nullptr;
-return values.get (t);
+if (tree *p = values.get (t))
+  if (*p != void_node)
+   return p;
+return nullptr;
   }
   void put_value (tree t, tree v)
   {
@@ -1188,7 +1200,13 @@ public:
 if (!already_in_map && modifiable)
   modifiable->add (t);
   }
-  void remove_value (tree t) { values.remove (t); }
+  void remove_value (tree t)
+  {
+if (DECL_P (t))
+  values.put (t, void_node);
+else
+  values.remove (t);
+  }
 };
 
 /* Helper class for constexpr_global_ctx.  In some cases we want to avoid
@@ -3085,12 +3103,9 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
  gcc_assert (!DECL_BY_REFERENCE (res));
  ctx->global->put_value (res, NULL_TREE);
 
- /* Track the callee's evaluated SAVE_EXPRs and TARGET_EXPRs so that
-we can forget their values after the call.  */
- constexpr_ctx ctx_with_save_exprs = *ctx;
- auto_vec save_exprs;
- ctx_with_save_exprs.save_exprs = &save_exprs;
- ctx_with_save_exprs.call = &new_call;
+ /* Remember the current call we're evaluating.  */
+ constexpr_ctx call_ctx = *ctx;
+ call_ctx.call = &new_call;
  unsigned save_heap_alloc_count = ctx->global->heap_vars.length ();
  unsigned save_heap_dealloc_count = ctx->global->heap_dealloc_count;
 
@@ -3101,7 +3116,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
  non_constant_p, overflow_p);
 
  tree jump_target = NULL_TREE;
- cxx_eval_constant_expression (&ctx_with_save_exprs, body,
+ cxx_eval_constant_expressio

[PATCH v4 2/3] c++: Improve constexpr error for dangling local variables [PR110619]

2023-07-20 Thread Nathaniel Shead via Gcc-patches

Currently, when typeck discovers that a return statement will refer to a
local variable, it rewrites the statement to return a null pointer. This
causes the error messages for using the return value in a constant
expression to be unhelpful, especially for reference return values.

This patch removes this "optimisation". Code relying on it already gets a
warning by default and has undefined behaviour anyway, so there should be
no issue in doing so. We also suppress additional warnings from later
passes that detect this as a dangling pointer, since we have already
diagnosed it.

PR c++/110619

gcc/cp/ChangeLog:

* semantics.cc (finish_return_stmt): Suppress dangling pointer
reporting on return statement if already reported.
* typeck.cc (check_return_expr): Don't set return expression to
zero for dangling addresses.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-lifetime5.C: Test reported message is
correct.
* g++.dg/cpp1y/constexpr-lifetime6.C: Likewise.
* g++.dg/cpp1y/constexpr-110619.C: New test.
* g++.dg/warn/Wreturn-local-addr-6.C: Remove check for return
value optimisation.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/semantics.cc  |  5 -
 gcc/cp/typeck.cc |  5 +++--
 gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C| 10 ++
 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C |  4 ++--
 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C |  8 
 gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C |  3 ---
 6 files changed, 23 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..107407de513 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -1260,7 +1260,10 @@ finish_return_stmt (tree expr)
 
   r = build_stmt (input_location, RETURN_EXPR, expr);
   if (no_warning)
-suppress_warning (r, OPT_Wreturn_type);
+{
+  suppress_warning (r, OPT_Wreturn_type);
+  suppress_warning (r, OPT_Wdangling_pointer_);
+}
   r = maybe_cleanup_point_expr_void (r);
   r = add_stmt (r);
 
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 859b133a18d..47233b3b717 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -11273,8 +11273,9 @@ check_return_expr (tree retval, bool *no_warning)
   else if (!processing_template_decl
   && maybe_warn_about_returning_address_of_local (retval, loc)
   && INDIRECT_TYPE_P (valtype))
-   retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
-build_zero_cst (TREE_TYPE (retval)));
+   /* Suppress the Wdangling-pointer warning in the return statement
+  that would otherwise occur.  */
+   *no_warning = true;
 }
 
   /* A naive attempt to reduce the number of -Wdangling-reference false
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C
new file mode 100644
index 000..cca13302238
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C
@@ -0,0 +1,10 @@
+// { dg-do compile { target c++14 } }
+// { dg-options "-Wno-return-local-addr" }
+// PR c++/110619
+
+constexpr auto f() {
+int i = 0;
+return &i;
+};
+
+static_assert( f() != nullptr );
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
index a4bc71d890a..ad3ef579f63 100644
--- a/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
@@ -1,11 +1,11 @@
 // { dg-do compile { target c++14 } }
 // { dg-options "-Wno-return-local-addr" }
 
-constexpr const int& id(int x) { return x; }
+constexpr const int& id(int x) { return x; }  // { dg-message "note: declared here" }
 
 constexpr bool test() {
   const int& y = id(3);
   return y == 3;
 }
 
-constexpr bool x = test();  // { dg-error "" }
+constexpr bool x = test();  // { dg-error "accessing object outside its lifetime" }
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C
index f358aff4490..b81e89af79c 100644
--- a/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C
@@ -4,12 +4,12 @@
 struct Empty {};
 
 constexpr const Empty& empty() {
-  return Empty{};
+  return Empty{};  // { dg-message "note: declared here" }
 }
 
-constexpr const Empty& empty_parm(Empty e) {
+constexpr const Empty& empty_parm(Empty e) {  // { dg-message "note: declared here" }
   return e;
 }
 
-constexpr Empty a = empty();  // { dg-error "" }
-constexpr Empty b = empty_parm({});  // { dg-error "" }
+constexpr Empty a = empty();  // { dg-error "accessing object outside its lifetime" }
+constexpr Empty b = empty_parm({});  // { dg-error "accessing object outside its lifetime" }
diff --git a/gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C b/gcc/testsuite/g+

[PATCH v4 3/3] c++: Improve location information in constant evaluation

2023-07-20 Thread Nathaniel Shead via Gcc-patches

This patch updates 'input_location' during constant evaluation to ensure
that errors in subexpressions that lack location information still
provide accurate diagnostics.

By itself this change causes some small regressions in diagnostic
quality for circumstances where errors used 'input_location' but the
location of the parent subexpression doesn't make sense, so this patch
also includes a couple of other small diagnostic changes to improve
the most egregious cases.
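The idea of updating input_location while evaluating, so that a diagnostic raised in a subexpression without its own location falls back to the nearest enclosing expression that has one, can be sketched with a hypothetical RAII guard in C++. The names (location_sentinel, expr, eval) and the convention that 0 means "no location" are assumptions for illustration; this is not the code in constexpr.cc.

```cpp
#include <vector>

// Hypothetical model: 0 stands for "no location".
using location_t = unsigned;
location_t input_location = 0;

// Adopt LOC as the current location while in scope, but only if it is
// known; the previous location is restored on scope exit.
class location_sentinel
{
  location_t saved;

public:
  explicit location_sentinel (location_t loc) : saved (input_location)
  {
    if (loc != 0)
      input_location = loc;
  }
  ~location_sentinel () { input_location = saved; }
  location_sentinel (const location_sentinel &) = delete;
  location_sentinel &operator= (const location_sentinel &) = delete;
};

// Toy expression node: a location (possibly 0), an "error" flag and
// sub-expressions.
struct expr
{
  location_t loc;
  bool is_error;
  std::vector<expr> kids;
};

// Walk E and return the location an input_location-based diagnostic
// would use for the first erroneous node, or 0 if there is none.
location_t
eval (const expr &e)
{
  location_sentinel s (e.loc);
  if (e.is_error)
    return input_location;
  for (const expr &k : e.kids)
    if (location_t l = eval (k))
      return l;
  return 0;
}
```

Restoring the saved location on exit keeps errors reported after returning from a subexpression from pointing into that subexpression.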

gcc/cp/ChangeLog:

* constexpr.cc (modifying_const_object_error): Find the source
location of the const object's declaration.
(cxx_eval_store_expression): Fall back to the location of the
target object when evaluating initialiser.
(cxx_eval_constant_expression): Update input_location to the location
of the currently evaluated expression, if possible.

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/equal/constexpr_neg.cc: Update diagnostic
locations.
* testsuite/26_numerics/gcd/105844.cc: Likewise.
* testsuite/26_numerics/lcm/105844.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-48089.C: Update diagnostic locations.
* g++.dg/cpp0x/constexpr-70323.C: Likewise.
* g++.dg/cpp0x/constexpr-70323a.C: Likewise.
* g++.dg/cpp0x/constexpr-delete2.C: Likewise.
* g++.dg/cpp0x/constexpr-diag3.C: Likewise.
* g++.dg/cpp0x/constexpr-ice20.C: Likewise.
* g++.dg/cpp0x/constexpr-recursion.C: Likewise.
* g++.dg/cpp0x/overflow1.C: Likewise.
* g++.dg/cpp1y/constexpr-89285.C: Likewise.
* g++.dg/cpp1y/constexpr-89481.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime1.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime5.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const14.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const16.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const18.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const19.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const21.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const22.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const3.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const4.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const7.C: Likewise.
* g++.dg/cpp1y/constexpr-union5.C: Likewise.
* g++.dg/cpp1y/pr68180.C: Likewise.
* g++.dg/cpp1z/constexpr-lambda6.C: Likewise.
* g++.dg/cpp1z/constexpr-lambda8.C: Likewise.
* g++.dg/cpp2a/bit-cast11.C: Likewise.
* g++.dg/cpp2a/bit-cast12.C: Likewise.
* g++.dg/cpp2a/bit-cast14.C: Likewise.
* g++.dg/cpp2a/constexpr-98122.C: Likewise.
* g++.dg/cpp2a/constexpr-dynamic17.C: Likewise.
* g++.dg/cpp2a/constexpr-init1.C: Likewise.
* g++.dg/cpp2a/constexpr-new12.C: Likewise.
* g++.dg/cpp2a/constexpr-new3.C: Likewise.
* g++.dg/cpp2a/constinit10.C: Likewise.
* g++.dg/cpp2a/is-corresponding-member4.C: Likewise.
* g++.dg/ext/constexpr-vla2.C: Likewise.
* g++.dg/ext/constexpr-vla3.C: Likewise.
* g++.dg/ubsan/pr63956.C: Likewise.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/constexpr.cc   | 46 ++-
 gcc/testsuite/g++.dg/cpp0x/constexpr-48089.C  | 10 ++--
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323.C  |  8 ++--
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C |  8 ++--
 .../g++.dg/cpp0x/constexpr-delete2.C  |  5 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C  |  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |  1 +
 .../g++.dg/cpp0x/constexpr-recursion.C|  6 +--
 gcc/testsuite/g++.dg/cpp0x/overflow1.C|  2 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-89285.C  |  5 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-89481.C  |  3 +-
 .../g++.dg/cpp1y/constexpr-lifetime1.C|  1 +
 .../g++.dg/cpp1y/constexpr-lifetime2.C|  4 +-
 .../g++.dg/cpp1y/constexpr-lifetime3.C|  4 +-
 .../g++.dg/cpp1y/constexpr-lifetime4.C|  2 +-
 .../g++.dg/cpp1y/constexpr-lifetime5.C|  4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const14.C |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const16.C |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const18.C |  4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const19.C |  4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const21.C |  4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const22.C |  4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const3.C  |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const4.C  |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const7.C  |  3 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-union5.C |  4 +-
 gcc/testsuite/g++.dg/cpp1y/pr68180.C  |  4 +-
 .../g++.dg/cpp1z/constexpr-lambda6.C  |  4 +-
 .../g++.dg/cpp1

Re: [PATCH, OpenACC 2.7, v2] Implement host_data must have use_device clause requirement

2023-07-20 Thread Thomas Schwinge

Hi Chung-Lin!

On 2023-07-13T18:54:00+0800, Chung-Lin Tang  wrote:
> On 2023/6/16 5:13 PM, Thomas Schwinge wrote:
>> OK with one small change, please -- unless there's a reason for doing it
>> this way: [...]

> I've adjusted the Fortran implementation as you described. Yes, I agree this
> way fits current Fortran FE conventions better.
>
> I've re-tested the attached v2 patch, will commit later this week if no major
> objections.

ACK, thanks.


Grüße
 Thomas


> gcc/c/ChangeLog:
>
>   * c-parser.cc (c_parser_oacc_host_data): Add checking requiring OpenACC
>   host_data construct to have a use_device clause.
>
> gcc/cp/ChangeLog:
>
>   * parser.cc (cp_parser_oacc_host_data): Add checking requiring OpenACC
>   host_data construct to have a use_device clause.
>
> gcc/fortran/ChangeLog:
>
>   * openmp.cc (resolve_omp_clauses): Add checking requiring
>   OpenACC host_data construct to have a use_device clause.
>
> gcc/testsuite/ChangeLog:
>
>   * c-c++-common/goacc/host_data-2.c: Adjust testcase.
>   * gfortran.dg/goacc/host_data-error.f90: New testcase.
>   * gfortran.dg/goacc/pr71704.f90: Adjust testcase.
> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> index 24a6eb6e459..80920b31f83 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -18461,8 +18461,13 @@ c_parser_oacc_host_data (location_t loc, c_parser *parser, bool *if_p)
>    tree stmt, clauses, block;
>
>    clauses = c_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
> -                                       "#pragma acc host_data");
> -
> +                                       "#pragma acc host_data", false);
> +  if (!omp_find_clause (clauses, OMP_CLAUSE_USE_DEVICE_PTR))
> +    {
> +      error_at (loc, "%<host_data%> construct requires %<use_device%> clause");
> +      return error_mark_node;
> +    }
> +  clauses = c_finish_omp_clauses (clauses, C_ORT_ACC);
>    block = c_begin_omp_parallel ();
>    add_stmt (c_parser_omp_structured_block (parser, if_p));
>    stmt = c_finish_oacc_host_data (loc, clauses, block);
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 5e2b5cba57e..beb5b632e5e 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -45895,8 +45895,15 @@ cp_parser_oacc_host_data (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
>    unsigned int save;
>
>    clauses = cp_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
> -                                        "#pragma acc host_data", pragma_tok);
> -
> +                                        "#pragma acc host_data", pragma_tok,
> +                                        false);
> +  if (!omp_find_clause (clauses, OMP_CLAUSE_USE_DEVICE_PTR))
> +    {
> +      error_at (pragma_tok->location,
> +                "%<host_data%> construct requires %<use_device%> clause");
> +      return error_mark_node;
> +    }
> +  clauses = finish_omp_clauses (clauses, C_ORT_ACC);
>    block = begin_omp_parallel ();
>    save = cp_parser_begin_omp_structured_block (parser);
>    cp_parser_statement (parser, NULL_TREE, false, if_p);
> diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
> index 8efc4b3ecfa..f7af02845de 100644
> --- a/gcc/fortran/openmp.cc
> +++ b/gcc/fortran/openmp.cc
> @@ -8764,6 +8764,12 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
>  "% clause", &omp_clauses->detach->where);
>  }
>
> +  if (openacc
> +      && code->op == EXEC_OACC_HOST_DATA
> +      && omp_clauses->lists[OMP_LIST_USE_DEVICE] == NULL)
> +    gfc_error ("%<host_data%> construct at %L requires %<use_device%> clause",
> +               &code->loc);
> +
>    if (omp_clauses->assume)
>      gfc_resolve_omp_assumptions (omp_clauses->assume);
>  }
> diff --git a/gcc/testsuite/c-c++-common/goacc/host_data-2.c b/gcc/testsuite/c-c++-common/goacc/host_data-2.c
> index b3093e575ff..862a764eb3a 100644
> --- a/gcc/testsuite/c-c++-common/goacc/host_data-2.c
> +++ b/gcc/testsuite/c-c++-common/goacc/host_data-2.c
> @@ -8,7 +8,9 @@ void
>  f (void)
>  {
>    int v2 = 3;
> -#pragma acc host_data copy(v2) /* { dg-error ".copy. is not valid for ..pragma acc host_data." } */
> +#pragma acc host_data copy(v2)
> +  /* { dg-error ".copy. is not valid for ..pragma acc host_data." "" { target *-*-* } .-1 } */
> +  /* { dg-error ".host_data. construct requires .use_device. clause" "" { target *-*-* } .-2 } */
>    ;
>
>  #pragma acc host_data use_device(v2)
> @@ -20,6 +22,9 @@ f (void)
>    /* { dg-error ".use_device_ptr. variable is neither a pointer nor an array" "" { target c } .-1 } */
>    /* { dg-error ".use_device_ptr. variable is neither a pointer, nor an array nor reference to pointer or array" "" { target c++ } .-2 } */
>    ;
> +
> +#pragma acc host_data /* { dg-error ".host_data. construct requires .use_device. clause" } */
> +  ;
>  }
>
>
> diff --git a/gcc/testsuite/gfortran.dg/goacc/host_data-error.f90 b/gcc/testsuite/gfortran.dg/goacc/host_data-error.f90
> new file mode 100644
> index 000..bd262989410
> --- /dev/null
> +++ b/gc

Re: [PATCH v2] vect: Handle demoting FLOAT and promoting FIX_TRUNC.

2023-07-20 Thread Robin Dapp via Gcc-patches

>>   cvt_type
>> -   = build_nonstandard_integer_type (GET_MODE_BITSIZE (imode),
>> +   = build_nonstandard_integer_type (GET_MODE_BITSIZE
>> + (intermediate_mode),
>>   0);
> 
> the 0); now fits on the previous line?
> 
> Otherwise looks OK to me.

Thanks, I adjusted the things you remarked on, including the use of
int_mode_for_size.  Another thing I changed is continuing
instead of breaking when the current intermediate mode cannot hold
the range, so that it still has a chance to fit in the next larger one.
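The "continue instead of break" change can be illustrated with a simplified, hypothetical model of the search over intermediate types. The sketch assumes the value range of the integer operand is already known (in the vectorizer it comes from range information) and that the candidate intermediate types are signed integers whose width doubles at each step; `range_fits_signed` and `pick_intermediate_bits` are illustrative names, not functions in tree-vect-stmts.cc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a signed integer of BITS bits hold every
   value in the closed range [LO, HI]?  */
static bool
range_fits_signed (int64_t lo, int64_t hi, unsigned bits)
{
  assert (bits >= 2 && bits <= 64);
  if (bits == 64)
    return true;
  int64_t max = (INT64_C (1) << (bits - 1)) - 1;
  int64_t min = -max - 1;
  return lo >= min && hi <= max;
}

/* Try intermediate widths START_BITS, 2*START_BITS, ... up to
   LIMIT_BITS.  A width that is too narrow for the range is skipped
   with "continue" rather than ending the search with "break", so a
   wider candidate still gets a chance.  Returns 0 if none fits.  */
static unsigned
pick_intermediate_bits (int64_t lo, int64_t hi,
                        unsigned start_bits, unsigned limit_bits)
{
  assert (lo <= hi && start_bits >= 2 && limit_bits <= 64);
  for (unsigned bits = start_bits; bits <= limit_bits; bits *= 2)
    {
      if (!range_fits_signed (lo, hi, bits))
        continue;  /* Too narrow: try the next larger width.  */
      return bits;
    }
  return 0;
}
```

For the `a[i] & 0x7fff` operand of the int64-to-_Float16 test, `pick_intermediate_bits (0, 0x7fff, 16, 32)` returns 16. For a range of [0, 0xffff] the 16-bit candidate is too narrow and is skipped, so the call returns 32; with `break` the search would have stopped at the first failure.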

Bootstrap and testsuite are unchanged on x86, aarch64 and power and
I'm going to commit the attached barring further remarks.

Regards
 Robin

From cabfa07256eafec4485304fe7639d8fd7512cf11 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Thu, 13 Jul 2023 09:10:06 +0200
Subject: [PATCH v3] vect: Handle demoting FLOAT and promoting FIX_TRUNC.

The recent changes that allowed multi-step conversions for
"non-packing/unpacking" targets, i.e. modifier == NONE, covered the
promoting to-float and demoting to-int variants.  This patch
adds the missing demoting to-float and promoting to-int handling.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_conversion): Handle
more demotion/promotion for modifier == NONE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c: 
New test.
* gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c: 
New test.
---
 .../conversions/vec-narrow-int64-float16.c| 12 
 .../conversions/vec-widen-float16-int64.c | 12 
 gcc/tree-vect-stmts.cc| 69 ++-
 3 files changed, 76 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c
new file mode 100644
index 000..ebee1cfa888
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model -march=rv64gcv_zvfh -mabi=lp64d --param=riscv-autovec-preference=scalable" } */
+
+#include 
+
+void convert (_Float16 *restrict dst, int64_t *restrict a, int n)
+{
+  for (int i = 0; i < n; i++)
+    dst[i] = (_Float16) (a[i] & 0x7fff);
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c
new file mode 100644
index 000..eb0a17e99bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model -march=rv64gcv_zvfh -mabi=lp64d --param=riscv-autovec-preference=scalable -fno-trapping-math" } */
+
+#include 
+
+void convert (int64_t *restrict dst, _Float16 *restrict a, int n)
+{
+  for (int i = 0; i < n; i++)
+    dst[i] = (int64_t) a[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cb86d544313..51173ecf145 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5192,31 +5192,66 @@ vectorizable_conversion (vec_info *vinfo,
break;
   }
 
-  /* For conversions between float and smaller integer types try whether we
-can use intermediate signed integer types to support the
+  /* For conversions between float and integer types try whether
+we can use intermediate signed integer types to support the
 conversion.  */
-  if ((code == FLOAT_EXPR
-  && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
- || (code == FIX_TRUNC_EXPR
- && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)
- && !flag_trapping_math))
+  if (GET_MODE_SIZE (lhs_mode) != GET_MODE_SIZE (rhs_mode)
+ && (code == FLOAT_EXPR ||
+ (code == FIX_TRUNC_EXPR && !flag_trapping_math)))
{
+ bool demotion = GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode);
  bool float_expr_p = code == FLOAT_EXPR;
- scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode;
- fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode);
+ unsigned short target_size;
+ scalar_mode intermediate_mode;
+ if (demotion)
+   {
+ intermediate_mode = lhs_mode;
+ target_size = GET_MODE_SIZE (rhs_mode)

[PATCH] testsuite: Add a test case for PR110729

2023-07-20 Thread Kewen.Lin via Gcc-patches

Hi,

As PR110729 reported, there was an issue with .section
__patchable_function_entries under -ffunction-sections:
we wrongly used the same symbol as the link_to section
symbol for all functions.  Commit r13-4294 for PR99889
fixed this by using the corresponding label LPFE*, which
sits in the function_section.

As Fangrui suggested[1], this patch adds a bit more test
coverage.  I didn't find a good way to check that all
link_to symbols are different, so I check for LPFE[012]
here.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624866.html

Tested well on x86_64-redhat-linux, powerpc64-linux-gnu
P7/P8/P9 and powerpc64le-linux-gnu P9/P10.

Is it ok for trunk?

BR,
Kewen
-
PR testsuite/110729

gcc/testsuite/ChangeLog:

* gcc.dg/pr110729.c: New test.
---
 gcc/testsuite/gcc.dg/pr110729.c | 29 +
 1 file changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr110729.c

diff --git a/gcc/testsuite/gcc.dg/pr110729.c b/gcc/testsuite/gcc.dg/pr110729.c
new file mode 100644
index 000..92dfd8ae000
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr110729.c
@@ -0,0 +1,29 @@
+/* { dg-do compile { target { ! { nvptx*-*-* visium-*-* } } } } */
+/* { dg-require-effective-target o_flag_in_section } */
+/* { dg-options "-ffunction-sections -fpatchable-function-entry=2" } */
+/* { dg-additional-options "-fno-pie" { target sparc*-*-* } } */
+
+/* Verify there are three different link_to symbols for three
+   .section __patchable_function_entries respectively.  */
+
+int
+f ()
+{
+  return 1;
+}
+
+int
+g ()
+{
+  return 2;
+}
+
+int
+h ()
+{
+  return 3;
+}
+
+/* { dg-final { scan-assembler-times {.section[\t ]*__patchable_function_entries,.*,\.LPFE0} 1 } }  */
+/* { dg-final { scan-assembler-times {.section[\t ]*__patchable_function_entries,.*,\.LPFE1} 1 } }  */
+/* { dg-final { scan-assembler-times {.section[\t ]*__patchable_function_entries,.*,\.LPFE2} 1 } }  */
--
2.39.3

[PATCH] sccvn: Correct the index of bias for IFN_LEN_STORE [PR110744]

2023-07-20 Thread Kewen.Lin via Gcc-patches

Hi,

Commit r14-2267-gb8806f6ffbe72e adjusted the argument order
of LEN_STORE from {len,vector,bias} to {len,bias,vector}
to make it consistent with LEN_MASK_STORE and MASK_STORE.
But it missed updating the related handling in
tree-ssa-sccvn.cc, which caused the failure shown in PR
110744.  This patch fixes that handling to use the
correct index.
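The reordering and the fix can be modelled with a table of argument positions. This is a toy sketch, not GCC code: the length position (2) and the old bias position (4) are taken from the removed lines of the patch, labelling arguments 0 and 1 as the address and alignment operands is an assumption, and `len_store_len_index` is a hypothetical stand-in for `internal_fn_len_index (IFN_LEN_STORE)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { LEN_STORE_NARGS = 5 };

/* Toy argument layouts of LEN_STORE before and after r14-2267.
   Arguments 0 and 1 are assumed to be the address and alignment.  */
static const char *const len_store_old[LEN_STORE_NARGS]
  = { "addr", "align", "len", "vector", "bias" };
static const char *const len_store_new[LEN_STORE_NARGS]
  = { "addr", "align", "len", "bias", "vector" };

/* Hypothetical stand-in for internal_fn_len_index (IFN_LEN_STORE).  */
static int
len_store_len_index (void)
{
  return 2;
}

/* True if argument IDX of LAYOUT is the operand called NAME.  */
static bool
arg_is (const char *const *layout, int idx, const char *name)
{
  assert (idx >= 0 && idx < LEN_STORE_NARGS);
  return strcmp (layout[idx], name) == 0;
}

/* Index of the bias as computed by the fix: right after the length.  */
static int
len_store_bias_index (void)
{
  return len_store_len_index () + 1;
}
```

With the new layout, the stale hard-coded index 4 selects the stored vector instead of the bias, while deriving the position as `len_index + 1` keeps pointing at the bias.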

Bootstrapped and regress-tested on x86_64-redhat-linux,
powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9/P10.

Is it ok for trunk?

BR,
Kewen
-
PR tree-optimization/110744

gcc/ChangeLog:

* tree-ssa-sccvn.cc (vn_reference_lookup_3): Correct the index of bias
operand for ifn IFN_LEN_STORE.
---
 gcc/tree-ssa-sccvn.cc | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 11061a374a2..c0b3ec420c5 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -3299,11 +3299,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
                 return (void *)-1;
               break;
             case IFN_LEN_STORE:
-              len = gimple_call_arg (call, 2);
-              bias = gimple_call_arg (call, 4);
-              if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
-                return (void *)-1;
-              break;
+              {
+                int len_index = internal_fn_len_index (fn);
+                len = gimple_call_arg (call, len_index);
+                bias = gimple_call_arg (call, len_index + 1);
+                if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
+                  return (void *) -1;
+                break;
+              }
             default:
               return (void *)-1;
           }
--
2.39.3

[r14-2629 Regression] FAIL: g++.dg/cpp0x/udlit-extended-id-3.C -std=c++20 (test for excess errors) on Linux/x86_64

2023-07-20 Thread haochen.jiang via Gcc-patches

On Linux/x86_64,

1d3e4f4e2d19c3394dc018118a78c1f4b59cb5c2 is the first bad commit
commit 1d3e4f4e2d19c3394dc018118a78c1f4b59cb5c2
Author: Lewis Hyatt 
Date:   Tue Jul 18 17:16:08 2023 -0400

libcpp: Handle extended characters in user-defined literal suffix [PR103902]

caused

FAIL: g++.dg/cpp0x/udlit-extended-id-1.C  -std=c++14 (test for excess errors)
FAIL: g++.dg/cpp0x/udlit-extended-id-1.C  -std=c++17 (test for excess errors)
FAIL: g++.dg/cpp0x/udlit-extended-id-1.C  -std=c++20 (test for excess errors)
FAIL: g++.dg/cpp0x/udlit-extended-id-3.C  -std=c++14 (test for excess errors)
FAIL: g++.dg/cpp0x/udlit-extended-id-3.C  -std=c++17 (test for excess errors)
FAIL: g++.dg/cpp0x/udlit-extended-id-3.C  -std=c++20 (test for excess errors)

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2629/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/cpp0x/udlit-extended-id-1.C --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/cpp0x/udlit-extended-id-1.C --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/cpp0x/udlit-extended-id-3.C --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/cpp0x/udlit-extended-id-3.C --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)

[r14-2639 Regression] FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c scan-tree-dump slp2 "optimized: basic block" on Linux/x86_64

2023-07-20 Thread haochen.jiang via Gcc-patches

On Linux/x86_64,

c1e420549f2305efb70ed37e693d380724eb7540 is the first bad commit
commit c1e420549f2305efb70ed37e693d380724eb7540
Author: Maciej W. Rozycki 
Date:   Wed Jul 19 11:59:29 2023 +0100

testsuite: Add 64-bit vector variant for bb-slp-pr95839.c

caused

FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c -flto -ffat-lto-objects  scan-tree-dump slp2 "optimized: basic block"
FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c scan-tree-dump slp2 "optimized: basic block"

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2639/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr95839-v8.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr95839-v8.c --target_board='unix{-m32\ -march=cascadelake}'"

Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c

2023-07-20 Thread Richard Biener via Gcc-patches

On Thu, Jul 20, 2023 at 11:16 AM Maciej W. Rozycki  wrote:
>
> On Thu, 20 Jul 2023, Richard Biener wrote:
>
> > >  Thanks for making this improvement.  I've checked MIPS results and code
> > > produced now is as follows:
> > >
> > > daddiu  $sp,$sp,-64
> > > sd      $5,24($sp)
> > > sd      $7,40($sp)
> > > ldc1    $f0,24($sp)
> > > ldc1    $f1,40($sp)
> > > sd      $4,16($sp)
> > > sd      $6,32($sp)
> > > ldc1    $f2,32($sp)
> > > add.ps  $f1,$f0,$f1
> > > ldc1    $f0,16($sp)
> > > add.ps  $f0,$f0,$f2
> > > sdc1    $f1,56($sp)
> > > ld      $3,56($sp)
> > > sdc1    $f0,48($sp)
> > > ld      $2,48($sp)
> > > jr      $31
> > > daddiu  $sp,$sp,64
> > >
> > > which does do vector stuff now, although it's still considerably worse
> > > than my handwritten example:
> > >
> > > > > dmtc1   $4,$f0
> > > > > dmtc1   $5,$f1
> > > > > dmtc1   $6,$f2
> > > > > dmtc1   $7,$f3
> > > > > add.ps  $f0,$f0,$f1
> > > > > add.ps  $f2,$f2,$f3
> > > > > dmfc1   $2,$f0
> > > > > jr      $31
> > > > > dmfc1   $3,$f2
> > >
> > > Or I'd say it's pretty terrible, but given the current situation with the
> > > MIPS backend I'm going to leave it to the new maintainer to sort out.
> >
> > Yeah, I also wondered what is wrong ... I suspect it's the usual issue
> > of parameter passing causing spilling ...
>
>  There's no such requirement in the psABI and I fail to see a plausible
> justification.  And direct GPR<->FPR move patterns are available in the
> backend for the V2SF mode.  Also there's no delay slot requirement even
> for these move instructions for MIPS64r1+ ISA levels, which have this
> paired-single FP format defined.  It seems to me a plain bug (or missed
> optimisation if you prefer).

Definitely.  OTOH, while parameter/return passing for V4SFmode is
apparently done in registers, the backend(?) assigns BLKmode
to the V4SFmode arguments, so they get spilled immediately in the
code moving the incoming hard registers to pseudos (or to the stack, as in
this case).  It comes down to the issue that Jiufu Guo is eventually
addressing by adding SRA-style heuristics to the code choosing
the layout of that storage.  Interestingly, for the return value we get
TImode.

Note we don't seem to be able to optimize

(insn 6 21 8 2 (set (mem/c:DI (plus:DI (reg/f:DI 78 $frame)
                (const_int 24 [0x18])) [1 a+8 S8 A64])
        (reg:DI 5 $5)) "t.c":4:1 322 {*movdi_64bit}
     (expr_list:REG_DEAD (reg:DI 5 $5)
        (nil)))
...
(insn 40 7 41 2 (set (reg:V2SF 205 [ a+8 ])
        (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame)
                (const_int 24 [0x18])) [1 a+8 S8 A64])) "t.c":6:23 387 {*movv2sf}
     (expr_list:REG_EQUIV (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame)
                (const_int 24 [0x18])) [1 a+8 S8 A64])
        (nil)))

for some reason.  Maybe we are afraid of the hardreg use in the store,
maybe it is because the store is in the prologue (before
NOTE_INSN_FUNCTION_BEG).  Also postreload isn't able to fix this:

(insn 6 21 8 2 (set (mem/c:DI (plus:DI (reg/f:DI 29 $sp)
                (const_int 24 [0x18])) [1 a+8 S8 A64])
        (reg:DI 5 $5)) "t.c":4:1 322 {*movdi_64bit}
     (nil))
...
(insn 40 7 41 2 (set (reg:V2SF 32 $f0 [orig:205 a+8 ] [205])
        (mem/c:V2SF (plus:DI (reg/f:DI 29 $sp)
                (const_int 24 [0x18])) [1 a+8 S8 A64])) "t.c":6:23 387 {*movv2sf}
     (expr_list:REG_EQUIV (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame)
                (const_int 24 [0x18])) [1 a+8 S8 A64])
        (nil)))

so something is amiss in the backend as well if you say there should be
direct moves available.

Richard.

>
>   Maciej

Re: [PATCH] testsuite: Add a test case for PR110729

2023-07-20 Thread Richard Sandiford via Gcc-patches

"Kewen.Lin"  writes:
> Hi,
>
> As PR110729 reported, there was one issue for .section
> __patchable_function_entries with -ffunction-sections, that
> is we put the same symbol as link_to section symbol for all
> functions wrongly.  The commit r13-4294 for PR99889 has
> fixed this with the corresponding label LPFE* which sits in
> the function_section.
>
> As Fangrui suggested[1], this patch is to add a bit more test
> coverage.  I didn't find a good way to check all linked_to
> symbols are different, so I checked for LPFE[012] here.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624866.html
>
> Tested well on x86_64-redhat-linux, powerpc64-linux-gnu
> P7/P8/P9 and powerpc64le-linux-gnu P9/P10.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
>   PR testsuite/110729
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/pr110729.c: New test.

OK, thanks.

Richard

> ---
>  gcc/testsuite/gcc.dg/pr110729.c | 29 +
>  1 file changed, 29 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr110729.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr110729.c b/gcc/testsuite/gcc.dg/pr110729.c
> new file mode 100644
> index 000..92dfd8ae000
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr110729.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile { target { ! { nvptx*-*-* visium-*-* } } } } */
> +/* { dg-require-effective-target o_flag_in_section } */
> +/* { dg-options "-ffunction-sections -fpatchable-function-entry=2" } */
> +/* { dg-additional-options "-fno-pie" { target sparc*-*-* } } */
> +
> +/* Verify there are three different link_to symbols for three
> +   .section __patchable_function_entries respectively.  */
> +
> +int
> +f ()
> +{
> +  return 1;
> +}
> +
> +int
> +g ()
> +{
> +  return 2;
> +}
> +
> +int
> +h ()
> +{
> +  return 3;
> +}
> +
> +/* { dg-final { scan-assembler-times {.section[\t ]*__patchable_function_entries,.*,\.LPFE0} 1 } }  */
> +/* { dg-final { scan-assembler-times {.section[\t ]*__patchable_function_entries,.*,\.LPFE1} 1 } }  */
> +/* { dg-final { scan-assembler-times {.section[\t ]*__patchable_function_entries,.*,\.LPFE2} 1 } }  */
> --
> 2.39.3

Re: [PATCH] sccvn: Correct the index of bias for IFN_LEN_STORE [PR110744]

2023-07-20 Thread Richard Sandiford via Gcc-patches

"Kewen.Lin"  writes:
> Hi,
>
> Commit r14-2267-gb8806f6ffbe72e adjusts the arguments order
> of LEN_STORE from {len,vector,bias} to {len,bias,vector},
> in order to make them consistent with LEN_MASK_STORE and
> MASK_STORE.  But it missed to update the related handlings
> in tree-ssa-sccvn.cc, it caused the failure shown in PR
> 110744.  This patch is to fix the related handlings with
> the correct index.
>
> Bootstrapped and regress-tested on x86_64-redhat-linux,
> powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9/P10.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
>   PR tree-optimization/110744
>
> gcc/ChangeLog:
>
>   * tree-ssa-sccvn.cc (vn_reference_lookup_3): Correct the index of bias
>   operand for ifn IFN_LEN_STORE.

OK, thanks.

Richard

> ---
>  gcc/tree-ssa-sccvn.cc | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 11061a374a2..c0b3ec420c5 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -3299,11 +3299,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
>                  return (void *)-1;
>                break;
>              case IFN_LEN_STORE:
> -              len = gimple_call_arg (call, 2);
> -              bias = gimple_call_arg (call, 4);
> -              if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> -                return (void *)-1;
> -              break;
> +              {
> +                int len_index = internal_fn_len_index (fn);
> +                len = gimple_call_arg (call, len_index);
> +                bias = gimple_call_arg (call, len_index + 1);
> +                if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> +                  return (void *) -1;
> +                break;
> +              }
>              default:
>                return (void *)-1;
>            }
> --
> 2.39.3

[PATCH] tree-optimization/110742 - fix latent issue with permuting existing vectors

2023-07-20 Thread Richard Biener via Gcc-patches

When we materialize a layout we push edge permutes to constant/external
defs without checking whether we can actually do so.  For externals defined
by vector stmts rather than scalar components we can't.
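The guard added to get_result_with_layout can be pictured with a toy model: a constant node, or an external node built from scalar components, absorbs a layout change by reordering its scalar operands, whereas an external node already defined by vector statements has nothing to reorder. All names below are hypothetical and only mirror the shape of the new condition; internal nodes, which get a separate permute node, are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_def_type { TOY_CONSTANT_DEF, TOY_EXTERNAL_DEF, TOY_INTERNAL_DEF };

#define TOY_MAX_LANES 8

struct toy_slp_node
{
  enum toy_def_type def_type;
  int scalar_ops[TOY_MAX_LANES];  /* One scalar operand per lane.  */
  size_t n_lanes;
  size_t n_vec_defs;              /* Nonzero if defined by vector stmts.  */
};

/* Mirrors: constant || (external && (identity layout || no vector defs)).  */
static bool
toy_can_permute_in_place (const struct toy_slp_node *node,
                          bool identity_layout)
{
  if (node->def_type == TOY_CONSTANT_DEF)
    return true;
  return (node->def_type == TOY_EXTERNAL_DEF
          && (identity_layout || node->n_vec_defs == 0));
}

/* Reorder the scalar operands of NODE so that lane I takes old lane
   PERM[I].  Returns false if the permutation cannot be pushed into
   the node.  */
static bool
toy_materialize_layout (struct toy_slp_node *node, const size_t *perm,
                        bool identity_layout)
{
  if (!toy_can_permute_in_place (node, identity_layout))
    return false;
  if (identity_layout)
    return true;  /* Nothing to do for an unchanged layout.  */
  assert (node->n_lanes <= TOY_MAX_LANES);
  int tmp[TOY_MAX_LANES];
  for (size_t i = 0; i < node->n_lanes; ++i)
    {
      assert (perm[i] < node->n_lanes);
      tmp[i] = node->scalar_ops[perm[i]];
    }
  for (size_t i = 0; i < node->n_lanes; ++i)
    node->scalar_ops[i] = tmp[i];
  return true;
}
```

The identity layout (`to_layout_i == 0` in the patch) is always accepted, so a vector-defined external only blocks a real permutation.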

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

PR tree-optimization/110742
* tree-vect-slp.cc (vect_optimize_slp_pass::get_result_with_layout):
Do not materialize an edge permutation in an external node with
vector defs.
(vect_slp_analyze_node_operations_1): Guard purely internal
nodes better.

* g++.dg/torture/pr110742.C: New testcase.
---
 gcc/testsuite/g++.dg/torture/pr110742.C | 47 +
 gcc/tree-vect-slp.cc|  8 +++--
 2 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr110742.C

diff --git a/gcc/testsuite/g++.dg/torture/pr110742.C b/gcc/testsuite/g++.dg/torture/pr110742.C
new file mode 100644
index 000..d41ac0479d2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr110742.C
@@ -0,0 +1,47 @@
+// { dg-do compile }
+
+struct HARD_REG_SET {
+  HARD_REG_SET operator~() const {
+    HARD_REG_SET res;
+    for (unsigned int i = 0; i < (sizeof(elts) / sizeof((elts)[0])); ++i)
+      res.elts[i] = ~elts[i];
+    return res;
+  }
+  HARD_REG_SET operator&(const HARD_REG_SET &other) const {
+    HARD_REG_SET res;
+    for (unsigned int i = 0; i < (sizeof(elts) / sizeof((elts)[0])); ++i)
+      res.elts[i] = elts[i] & other.elts[i];
+    return res;
+  }
+  unsigned long elts[4];
+};
+typedef const HARD_REG_SET &const_hard_reg_set;
+inline bool hard_reg_set_subset_p(const_hard_reg_set x, const_hard_reg_set y) {
+  unsigned long bad = 0;
+  for (unsigned int i = 0; i < (sizeof(x.elts) / sizeof((x.elts)[0])); ++i)
+    bad |= (x.elts[i] & ~y.elts[i]);
+  return bad == 0;
+}
+inline bool hard_reg_set_empty_p(const_hard_reg_set x) {
+  unsigned long bad = 0;
+  for (unsigned int i = 0; i < (sizeof(x.elts) / sizeof((x.elts)[0])); ++i)
+    bad |= x.elts[i];
+  return bad == 0;
+}
+extern HARD_REG_SET rr[2];
+extern int t[2];
+extern HARD_REG_SET nn;
+static HARD_REG_SET mm;
+void setup_reg_class_relations(void) {
+  HARD_REG_SET intersection_set, union_set, temp_set2;
+  for (int cl2 = 0; cl2 < 2; cl2++) {
+    temp_set2 = rr[cl2] & ~nn;
+    if (hard_reg_set_empty_p(mm) && hard_reg_set_empty_p(temp_set2)) {
+      mm = rr[0] & nn;
+      if (hard_reg_set_subset_p(mm, intersection_set))
+        if (!hard_reg_set_subset_p(mm, temp_set2) ||
+            hard_reg_set_subset_p(rr[0], rr[t[cl2]]))
+          t[cl2] = 0;
+    }
+  }
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 693621ca990..1d79c77e8ce 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5198,7 +5198,10 @@ vect_optimize_slp_pass::get_result_with_layout (slp_tree node,
     return result;
 
   if (SLP_TREE_DEF_TYPE (node) == vect_constant_def
-      || SLP_TREE_DEF_TYPE (node) == vect_external_def)
+      || (SLP_TREE_DEF_TYPE (node) == vect_external_def
+          && (to_layout_i == 0
+              /* We can't permute vector defs.  */
+              || SLP_TREE_VEC_DEFS (node).is_empty ())))
     {
       /* If the vector is uniform or unchanged, there's nothing to do.  */
       if (to_layout_i == 0 || vect_slp_tree_uniform_p (node))
@@ -5944,7 +5947,8 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, slp_tree node,
      calculated by the recursive call).  Otherwise it is the number of
      scalar elements in one scalar iteration (DR_GROUP_SIZE) multiplied by
      VF divided by the number of elements in a vector.  */
-  if (!STMT_VINFO_DATA_REF (stmt_info)
+  if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
+      && !STMT_VINFO_DATA_REF (stmt_info)
       && REDUC_GROUP_FIRST_ELEMENT (stmt_info))
     {
       for (unsigned i = 0; i < SLP_TREE_CHILDREN (node).length (); ++i)
-- 
2.35.3

Re: [PATCH] CODE STRUCTURE: Refine codes in Vectorizer

2023-07-20 Thread juzhe.zh...@rivai.ai

Just finished bootstrap and regression testing on x86.

OK for trunk?


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-07-20 16:06
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH] CODE STRUCTURE: Refine codes in Vectorizer
From: Ju-Zhe Zhong 
 
Hi, Richard and Richi.
 
I plan to refine the code I recently added for RVV auto-vectorization.
This patch is inspired by Richard's last review comments:
https://patchwork.sourceware.org/project/gcc/patch/20230712042124.111818-1-juzhe.zh...@rivai.ai/
 
Richard said he prefers the following code structure:
 
Please instead switch the if condition so that the structure is:
 
   if (...)
 vect_record_loop_mask (...)
   else if (...)
 vect_record_loop_len (...)
   else
 can't use partial vectors
 
That was his last comment.

So I have come back to refine this piece of code.

Does it look reasonable?
 
A follow-up patch will rename all "LEN_MASK" names to "MASK_LEN"; it
should come after this patch.
 
gcc/ChangeLog:
 
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Refine code structure.
 
---
gcc/tree-vect-stmts.cc | 38 +-
1 file changed, 17 insertions(+), 21 deletions(-)
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cb86d544313..b86e159ae4c 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1605,6 +1605,7 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
 nvectors = vect_get_num_copies (loop_vinfo, vectype);
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   machine_mode vecmode = TYPE_MODE (vectype);
   bool is_load = (vls_type == VLS_LOAD);
   if (memory_access_type == VMAT_LOAD_STORE_LANES)
@@ -1631,33 +1632,29 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
   internal_fn ifn = (is_load
? IFN_MASK_GATHER_LOAD
: IFN_MASK_SCATTER_STORE);
-  if (!internal_gather_scatter_fn_supported_p (ifn, vectype,
-gs_info->memory_type,
-gs_info->offset_vectype,
-gs_info->scale))
- {
-   ifn = (is_load
- ? IFN_LEN_MASK_GATHER_LOAD
- : IFN_LEN_MASK_SCATTER_STORE);
-   if (internal_gather_scatter_fn_supported_p (ifn, vectype,
-   gs_info->memory_type,
-   gs_info->offset_vectype,
-   gs_info->scale))
- {
-   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
-   vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
-   return;
- }
+  internal_fn len_ifn = (is_load
+  ? IFN_LEN_MASK_GATHER_LOAD
+  : IFN_LEN_MASK_SCATTER_STORE);
+  if (internal_gather_scatter_fn_supported_p (ifn, vectype,
+   gs_info->memory_type,
+   gs_info->offset_vectype,
+   gs_info->scale))
+ vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype,
+scalar_mask);
+  else if (internal_gather_scatter_fn_supported_p (len_ifn, vectype,
+gs_info->memory_type,
+gs_info->offset_vectype,
+gs_info->scale))
+ vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
+  else
+ {
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "can't operate on partial vectors because"
 " the target doesn't have an appropriate"
 " gather load or scatter store instruction.\n");
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
-   return;
}
-  vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype,
-  scalar_mask);
   return;
 }
@@ -1703,7 +1700,6 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
   if (get_len_load_store_mode (vecmode, is_load).exists (&vmode))
 {
   nvectors = group_memory_nvectors (group_size * vf, nunits);
-  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   unsigned factor = (vecmode == vmode) ? 1 : GET_MODE_UNIT_SIZE (vecmode);
   vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, factor);
   using_partial_vectors_p = true;
-- 
2.36.3

[PATCH] tree-optimization/110204 - second level redundancy and simplification

2023-07-20 Thread Richard Biener via Gcc-patches

When PRE discovers a full redundancy during insertion it cannot unite
the two value sets.  Instead it inserts a copy old-val = new-val where
new-val can also be a constant.  The following looks through such
copies during elimination, providing one extra level of constant and
copy propagation.  For the PR this helps avoid a bogus diagnostic
that's emitted on unreachable code during loop optimization.
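The look-through can be sketched with a toy table of SSA definitions. This is a simplified model, not GCC's data structures: names are small integers, `avail` maps a value number to its available leader (or -1 if there is none), and each definition is either opaque, a copy of another name, or a copy of a constant. Only one copy is looked through, matching the "one extra level" in the description; `toy_eliminate_avail` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: how a name is defined.  */
enum toy_def_kind { TOY_DEF_OTHER, TOY_DEF_COPY_NAME, TOY_DEF_COPY_CONST };

struct toy_def
{
  enum toy_def_kind kind;
  int operand;  /* Source name (COPY_NAME) or constant value (COPY_CONST).  */
};

/* A leader is either an SSA name or a constant; name -1 means none.  */
struct toy_leader
{
  bool is_const;
  int value;
};

/* Return the leader for VALNUM, looking through at most one copy
   old-val = new-val whose source is a name or a constant.  */
static struct toy_leader
toy_eliminate_avail (const struct toy_def *defs, const int *avail,
                     size_t n_names, int valnum)
{
  assert (valnum >= 0 && (size_t) valnum < n_names);
  struct toy_leader res = { false, avail[valnum] };
  if (res.value < 0)
    return res;  /* No leader available.  */
  assert ((size_t) res.value < n_names);
  const struct toy_def *def = &defs[res.value];
  if (def->kind == TOY_DEF_COPY_CONST)
    {
      res.is_const = true;
      res.value = def->operand;
    }
  else if (def->kind == TOY_DEF_COPY_NAME)
    res.value = def->operand;
  return res;
}
```

For example, if the leader of some value is defined as a copy of the constant 42, the lookup returns the constant 42 directly; a copy of a copy is only followed one step.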

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

PR tree-optimization/110204
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_avail):
Look through copies generated by PRE.
---
 gcc/tree-ssa-sccvn.cc | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 11061a374a2..a0b98c18ac8 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -6594,7 +6594,22 @@ eliminate_dom_walker::eliminate_avail (basic_block, tree op)
   if (SSA_NAME_IS_DEFAULT_DEF (valnum))
return valnum;
   if (avail.length () > SSA_NAME_VERSION (valnum))
-   return avail[SSA_NAME_VERSION (valnum)];
+   {
+ tree av = avail[SSA_NAME_VERSION (valnum)];
+ /* When PRE discovers a new redundancy there's no way to unite
+the value classes so it instead inserts a copy old-val = new-val.
+Look through such copies here, providing one more level of
+simplification at elimination time.  */
+ gassign *ass;
+ if (av && (ass = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (av))))
+   if (gimple_assign_rhs_class (ass) == GIMPLE_SINGLE_RHS)
+ {
+   tree rhs1 = gimple_assign_rhs1 (ass);
+   if (CONSTANT_CLASS_P (rhs1) || TREE_CODE (rhs1) == SSA_NAME)
+ av = rhs1;
+ }
+ return av;
+   }
 }
   else if (is_gimple_min_invariant (valnum))
 return valnum;
-- 
2.35.3

Re: [r14-2639 Regression] FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c scan-tree-dump slp2 "optimized: basic block" on Linux/x86_64

2023-07-20 Thread Richard Biener via Gcc-patches

On Thu, Jul 20, 2023 at 1:46 PM haochen.jiang via Gcc-patches
 wrote:
>
> On Linux/x86_64,
>
> c1e420549f2305efb70ed37e693d380724eb7540 is the first bad commit
> commit c1e420549f2305efb70ed37e693d380724eb7540
> Author: Maciej W. Rozycki 
> Date:   Wed Jul 19 11:59:29 2023 +0100
>
> testsuite: Add 64-bit vector variant for bb-slp-pr95839.c

I think the issue is we disable V2SF on ia32 because of the conflict
with MMX, which we don't want to use.

> caused
>
> FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c -flto -ffat-lto-objects  scan-tree-dump slp2 "optimized: basic block"
> FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c scan-tree-dump slp2 "optimized: basic block"
>
> with GCC configured with
>
> ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2639/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr95839-v8.c --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr95839-v8.c --target_board='unix{-m32\ -march=cascadelake}'"
>
> (Please do not reply to this email, for question about this report, contact me at haochen dot jiang at intel.com.)
> (If you met problems with cascadelake related, disabling AVX512F in command line might save that.)
> (However, please make sure that there is no potential problems with AVX512.)

Re: [PATCH] CODE STRUCTURE: Refine codes in Vectorizer

2023-07-20 Thread Richard Biener via Gcc-patches

On Thu, 20 Jul 2023, juzhe.zh...@rivai.ai wrote:

> Just finished bootstrap and regression testing on x86.
> 
> Ok for trunk ?

OK.  Not an issue currently but I think LEN_MASK should be
checked before MASK.
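A minimal C sketch of the if / else if / else shape discussed in this thread, with the LEN_MASK check placed before the MASK check as suggested here. Everything in it (`target_caps`, `partial_kind`, `choose_partial_vectors`) is hypothetical; the actual code calls `internal_gather_scatter_fn_supported_p` and records loop masks or lengths rather than returning a value.

```c
#include <stdbool.h>

/* Hypothetical summary of the target's gather/scatter support.  */
struct target_caps
{
  bool mask_gather_scatter;      /* e.g. IFN_MASK_GATHER_LOAD available */
  bool len_mask_gather_scatter;  /* e.g. IFN_LEN_MASK_GATHER_LOAD available */
};

enum partial_kind { PARTIAL_LEN, PARTIAL_MASK, PARTIAL_NONE };

/* One decision per branch: use lengths, use masks, or give up on
   partial vectors.  */
static enum partial_kind
choose_partial_vectors (struct target_caps caps)
{
  if (caps.len_mask_gather_scatter)
    return PARTIAL_LEN;     /* stands in for vect_record_loop_len */
  else if (caps.mask_gather_scatter)
    return PARTIAL_MASK;    /* stands in for vect_record_loop_mask */
  else
    return PARTIAL_NONE;    /* can't use partial vectors */
}
```

The ordering only matters for a target that supports both forms; with a single supported form, either ordering picks the same branch.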

Richard.

> 
> juzhe.zh...@rivai.ai
>  
> From: juzhe.zhong
> Date: 2023-07-20 16:06
> To: gcc-patches
> CC: richard.sandiford; rguenther; Ju-Zhe Zhong
> Subject: [PATCH] CODE STRUCTURE: Refine codes in Vectorizer
> From: Ju-Zhe Zhong 
>  
> Hi, Richard and Richi.
>  
> I plan to refine the codes that I recently support for RVV auto-vectorization.
> This patch is inspired by the last review comments from Richard:
> https://patchwork.sourceware.org/project/gcc/patch/20230712042124.111818-1-juzhe.zh...@rivai.ai/
>  
> Richard said he prefers the code structure as follows:
>  
> Please instead switch the if condition so that the structure is:
>  
>if (...)
>  vect_record_loop_mask (...)
>else if (...)
>  vect_record_loop_len (...)
>else
>  can't use partial vectors
>  
> This was his last comment.
> 
> So, I have come back to refine this piece of code.
> 
> Does it look reasonable?
> 
> The next refinement patch changes all "LEN_MASK" names to "MASK_LEN", but it
> should come after this patch.
>  
> gcc/ChangeLog:
>  
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Refine code structure.
>  
> ---
> gcc/tree-vect-stmts.cc | 38 +++++++++++++++++---------------------
> 1 file changed, 17 insertions(+), 21 deletions(-)
>  
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index cb86d544313..b86e159ae4c 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1605,6 +1605,7 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
>  nvectors = vect_get_num_copies (loop_vinfo, vectype);
>vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>machine_mode vecmode = TYPE_MODE (vectype);
>bool is_load = (vls_type == VLS_LOAD);
>if (memory_access_type == VMAT_LOAD_STORE_LANES)
> @@ -1631,33 +1632,29 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
>internal_fn ifn = (is_load
> ? IFN_MASK_GATHER_LOAD
> : IFN_MASK_SCATTER_STORE);
> -  if (!internal_gather_scatter_fn_supported_p (ifn, vectype,
> -gs_info->memory_type,
> -gs_info->offset_vectype,
> -gs_info->scale))
> - {
> -   ifn = (is_load
> - ? IFN_LEN_MASK_GATHER_LOAD
> - : IFN_LEN_MASK_SCATTER_STORE);
> -   if (internal_gather_scatter_fn_supported_p (ifn, vectype,
> -   gs_info->memory_type,
> -   gs_info->offset_vectype,
> -   gs_info->scale))
> - {
> -   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> -   vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
> -   return;
> - }
> +  internal_fn len_ifn = (is_load
> +  ? IFN_LEN_MASK_GATHER_LOAD
> +  : IFN_LEN_MASK_SCATTER_STORE);
> +  if (internal_gather_scatter_fn_supported_p (ifn, vectype,
> +   gs_info->memory_type,
> +   gs_info->offset_vectype,
> +   gs_info->scale))
> + vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype,
> +scalar_mask);
> +  else if (internal_gather_scatter_fn_supported_p (len_ifn, vectype,
> +gs_info->memory_type,
> +gs_info->offset_vectype,
> +gs_info->scale))
> + vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
> +  else
> + {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "can't operate on partial vectors because"
>  " the target doesn't have an appropriate"
>  " gather load or scatter store instruction.\n");
>   LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -   return;
> }
> -  vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype,
> -  scalar_mask);
>return;
>  }
> @@ -1703,7 +1700,6 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
>if (get_len_load_store_mode (vecmode, is_load).exists (&vmode))
>  {
>nvectors = group_memory_nvectors (group_size * vf, nunits);
> -  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>unsigned factor = (vecmode == vmode) ? 1 : GET_MODE_UNIT_SIZE (vecmode);
>vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, factor);
>using_partial_vectors_p = true;
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c

2023-07-20 Thread Maciej W. Rozycki

On Thu, 20 Jul 2023, Richard Biener wrote:

> >  There's no such requirement in the psABI and I fail to see a plausible
> > justification.  And direct GPR<->FPR move patterns are available in the
> > backend for the V2SF mode.  Also there's no delay slot requirement even
> > for these move instructions for MIPS64r1+ ISA levels, which have this
> > paired-single FP format defined.  It seems to me a plain bug (or missed
> > optimisation if you prefer).
> 
> Definitely.  OTOH parameter/return passing for V4SFmode while
> apparently being done in registers the backend(?) assigns BLKmode
> to the V4SFmode arguments so they get immediately spilled in the

 MIPS NewABI targets use registers to return data of small aggregate types
(effectively of up to the TImode size), so this seems reasonable to me.  
FP scalars and aggregates made of up to two fields are returned in FPRs 
and any other data is returned in GPRs:

"* Function results are returned in $2 (and $3 if needed), or $f0 (and $f2 
   if needed), as appropriate for the type.  Composite results (struct, 
   union, or array) are returned in $2/$f0 and $3/$f2 according to the 
   following rules:

"  - A struct with only one or two floating point fields is returned in 
 $f0 (and $f2 if necessary).  This is a generalization of the Fortran 
 COMPLEX case.

"  - Any other struct or union results of at most 128 bits are returned in 
 $2 (first 64 bits) and $3 (remainder, if necessary)."

Given that V4SFmode data has more than two FP fields (it's effectively an 
array of four) it is correctly returned in GPRs (even though the advantage 
of this arrangement is questionable, but the NewABI predates the invention 
of the paired-single FP format by a few years, which was only introduced 
with the MIPS V ISA, and actually implemented with the MIPS64r1 ISA even 
later).  A similar NewABI rule works here for the arguments.
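The quoted rules can be read as a small decision procedure. The sketch below is a simplified model for illustration only; `classify_composite_result` and its parameters are hypothetical, and it ignores unions, mixed integer/FP structs and results larger than 128 bits, which the quoted text does not cover. It shows why a V4SFmode value, treated as four FP fields, lands in GPRs.

```c
#include <stdbool.h>

enum result_regs { IN_FPRS, IN_GPRS, NOT_COVERED };

/* Simplified reading of the quoted NewABI rules for composite results.
   NUM_FIELDS counts the members, ALL_FP says whether all of them are
   floating point, and SIZE_BITS is the total size.  */
static enum result_regs
classify_composite_result (int num_fields, bool all_fp, int size_bits)
{
  /* "A struct with only one or two floating point fields" -> $f0/$f2.  */
  if (all_fp && num_fields >= 1 && num_fields <= 2)
    return IN_FPRS;
  /* "Any other struct or union results of at most 128 bits" -> $2/$3.  */
  if (size_bits <= 128)
    return IN_GPRS;
  /* Larger results are outside the quoted rules.  */
  return NOT_COVERED;
}
```

Under this model a pair of doubles goes to $f0/$f2, while four floats (128 bits, more than two FP fields) go to $2/$3.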

 I suspect the relevant part of the backend handles it correctly for other 
modes and was missed in the update for V4SFmode, which was a change on its 
own.  The only sufficiently old version of GCC I have ready to use is 
4.1.2 and it produces the same code, so at least it does not seem to be a 
regression.

> code moving the incoming hardregisters to pseudos (or stack as in
> this case).  It comes down to the issue that Jiufu Guo is eventually
> addressing with adding SRA-style heuristics to the code choosing
> the layout of that storage.  Interestingly for the return value we get
> TImode.

 That may come from the use of the GPRs I suppose.

> Note we don't seem to be able to optimize
> 
> (insn 6 21 8 2 (set (mem/c:DI (plus:DI (reg/f:DI 78 $frame)
> (const_int 24 [0x18])) [1 a+8 S8 A64])
> (reg:DI 5 $5)) "t.c":4:1 322 {*movdi_64bit}
>  (expr_list:REG_DEAD (reg:DI 5 $5)
> (nil)))
> ...
> (insn 40 7 41 2 (set (reg:V2SF 205 [ a+8 ])
> (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame)
> (const_int 24 [0x18])) [1 a+8 S8 A64])) "t.c":6:23 387
> {*movv2sf}
>  (expr_list:REG_EQUIV (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame)
> (const_int 24 [0x18])) [1 a+8 S8 A64])
> (nil)))
> 
> for some reason.  Maybe we are afraid of the hardreg use in the store,

 I believe the reason is the relevant constraints use the `*' modifier so 
as not to spill FP values to GPRs or vice versa (ISTR a discussion as to 
why we should prevent it from happening and I don't remember the outcome, 
but overall it seems reasonable to me), so once we've spilled to memory it 
won't be undone.  That doesn't mean we should refrain from moving directly 
when data is there already in the "wrong" kind of register.

> maybe it is because the store is in the prologue (before
> NOTE_INSN_FUNCTION_BEG).  Also postreload isn't able to fix this:
> 
> (insn 6 21 8 2 (set (mem/c:DI (plus:DI (reg/f:DI 29 $sp)
> (const_int 24 [0x18])) [1 a+8 S8 A64])
> (reg:DI 5 $5)) "t.c":4:1 322 {*movdi_64bit}
>  (nil))
> ...
> (insn 40 7 41 2 (set (reg:V2SF 32 $f0 [orig:205 a+8 ] [205])
> (mem/c:V2SF (plus:DI (reg/f:DI 29 $sp)
> (const_int 24 [0x18])) [1 a+8 S8 A64])) "t.c":6:23 387
> {*movv2sf}
>  (expr_list:REG_EQUIV (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame)
> (const_int 24 [0x18])) [1 a+8 S8 A64])
> (nil)))
> 
> so something is amiss in the backend as well if you say there should be
> direct moves available.

 There are, they're alternatives #5/#6 (`mtc'/`mfc') in `*movv2sf' and 
they're handled correctly by `mips_output_move' AFAICT.  Hardware has 
always had it, so there's no ISA constraint here.

 But as I say, I'm leaving it to the backend maintainer to sort out.

  Maciej

Re: loop-ch improvements, part 3

2023-07-20 Thread Richard Biener via Gcc-patches

On Thu, Jul 20, 2023 at 9:10 AM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> this patch makes tree-ssa-loop-ch understand if-combined conditionals (which
> are quite common) and removes the IV-derived heuristics.  Those heuristics are
> quite dubious because every variable with a PHI in the header of integral or
> pointer type is seen as an IV, so in the first basic block we match all loop
> invariants as invariants and everything that changes in the loop as IV-like.
>
> I think the heuristics were mostly there to make header duplication happen when
> the exit conditional is constant false in the first iteration and with ranger
> we can work this out in good enough precision.
>
> The patch adds the notion of a "combined exit", whose conditional is an
> and/or/xor of a loop-invariant exit and an exit known to be false in the first
> iteration.  Copying these is a win since the loop conditional will simplify
> in both copies.
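As an illustration of why such exits are worth copying, here is a hand-written before/after pair in C. It is a sketch of the idea, not output of the pass, and `sum_orig` and `sum_copied` are hypothetical. The loop test combines the IV check `i < 16`, which holds on entry, with the invariant `m`. After copying the header, the entry copy reduces to `m` and the in-loop copy reduces to `i < 16`.

```c
/* The loop test combines an IV check that is true on entry (0 < 16)
   with a loop-invariant check (m).  */
static int
sum_orig (const int *a, int m)
{
  int s = 0;
  for (int i = 0; i < 16 && m; i++)
    s += a[i];
  return s;
}

/* The same loop with its header copied by hand.  */
static int
sum_copied (const int *a, int m)
{
  int s = 0;
  if (m)            /* entry copy: 0 < 16 folded away, only m remains */
    {
      int i = 0;
      do
        {
          s += a[i];
          i++;
        }
      while (i < 16);  /* in-loop copy: m is invariant and known nonzero */
    }
  return s;
}
```

Both versions compute the same result; the point is that each copy of the condition loses one operand.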
>
> It seems that those are ordinary bitwise or/and/xor, and the code size accounting
> is only correct when the values have at most one bit set or when the static
> constant and invariant versions are simple (such as all zeros).  I am not testing
> this, so the code may be optimistic here.  I think it is not common enough to
> matter, and I cannot think of a correct condition that is not quite complex.
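The caveat about bit widths is easy to check numerically. Reading a bitwise and/xor as the corresponding logical operation is only exact when the operands are 0 or 1; with multi-bit values the bitwise result can disagree even when each operand's truth value is known. (Bitwise or always agrees, since `a | b` is nonzero exactly when one operand is.) A small C check; the helper names are hypothetical:

```c
#include <stdbool.h>

/* Does a bitwise AND have the same truth value as a logical AND?  */
static bool
and_agrees (unsigned a, unsigned b)
{
  return ((a & b) != 0) == (a != 0 && b != 0);
}

/* Does a bitwise XOR have the same truth value as a logical XOR?  */
static bool
xor_agrees (unsigned a, unsigned b)
{
  return ((a ^ b) != 0) == ((a != 0) != (b != 0));
}
```

For example, `2 & 1` is 0 although both operands are nonzero, and `3 ^ 1` is 2 although both operands are nonzero.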
>
> I also improved the code size estimate by not accounting non-conditionals that
> are known to be constant in the peeled copy, and improved the debug output.
>
> This requires testsuite compensation.  uninit-pred-loop-1_c.C does:
>
> /* { dg-do compile } */
> /* { dg-options "-Wuninitialized -O2 -std=c++98" } */
>
> extern int bar();
> int foo(int n, int m)
> {
>  for (;;) {
>int err = ({int _err;
>  for (int i = 0; i < 16; ++i) {
>if (m+i > n)
>   break;
>_err = 17;
>_err = bar();
>  }
>  _err;
>});
>
>if (err == 0) return 17;
> }
>
> Before the patch we duplicate
>if (m+i > n)
> which prevents the maybe-uninitialized warning from being output.  I do not quite
> see why copying this out would be a win, since it won't simplify.  Also I think
> the warning is correct: if m>n the loop will bail out before initializing _err
> and it will be used uninitialized.  I think it is a bug elsewhere that header
> duplication suppresses this.
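The claim that the warning is legitimate can be checked with an instrumented copy of the inner loop. In this sketch `bar_stub` replaces the external `bar`, and a flag records whether `_err` would have been assigned, so nothing uninitialized is ever read; both names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the external bar().  */
static int
bar_stub (void)
{
  return 0;
}

/* Mirrors the inner loop of the test case, but reports whether _err
   would have been assigned before the loop exits.  */
static bool
err_assigned (int n, int m)
{
  bool assigned = false;
  for (int i = 0; i < 16; ++i)
    {
      if (m + i > n)
        break;              /* taken at i == 0 whenever m > n */
      (void) bar_stub ();   /* models _err = 17; _err = bar (); */
      assigned = true;
    }
  return assigned;
}
```

Any call with m > n leaves `_err` unassigned, which is the path the warning describes.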
>
> copy headers does:
> int is_sorted(int *a, int n, int m, int k)
> {
>   for (int i = 0; i < n - 1 && m && k > i; i++)
> if (a[i] > a[i + 1])
>   return 0;
>   return 1;
> }
>
> it tests that all three for-statement conditionals are duplicated.  With the patch
> we no longer duplicate k>i since it is not going to simplify, so I added a test
> ensuring that k is positive.  Also the test requires disabling if-combining and
> VRP to avoid conditionals becoming combined ones, so I added a new version of the
> test checking that we now also behave correctly with if-combine.
>
> ivopt_mult_2.c and ivopt_mult_1.c seem to require loop header
> duplication for ivopts to behave particular way, so I also ensured by value
> range that the header is duplicated.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> gcc/ChangeLog:
>
> * tree-ssa-loop-ch.cc (edge_range_query): Rename to ...
> (get_range_query): ... this one; do
> (static_loop_exit): Add query parameter, turn ranger to reference.
> (loop_static_stmt_p): New function.
> (loop_static_op_p): New function.
> (loop_iv_derived_p): Remove.
> (loop_combined_static_and_iv_p): New function.
> (should_duplicate_loop_header_p): Discover combined conditionals;
> do not track iv derived; improve dumps.
> (pass_ch::execute): Fix whitespace.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/uninit-pred-loop-1_c.C: Allow warning.
> * gcc.dg/tree-ssa/copy-headers-7.c: Add tests so exit condition is
> static; update template.
> * gcc.dg/tree-ssa/ivopt_mult_1.c: Add test so exit condition is static.
> * gcc.dg/tree-ssa/ivopt_mult_2.c: Add test so exit condition is static.
> * gcc.dg/tree-ssa/copy-headers-8.c: New test.
>
> diff --git a/gcc/testsuite/g++.dg/uninit-pred-loop-1_c.C b/gcc/testsuite/g++.dg/uninit-pred-loop-1_c.C
> index 711812aae1b..1ee1615526f 100644
> --- a/gcc/testsuite/g++.dg/uninit-pred-loop-1_c.C
> +++ b/gcc/testsuite/g++.dg/uninit-pred-loop-1_c.C
> @@ -15,7 +15,7 @@ int foo(int n, int m)
>   _err;
> });
>
> -   if (err == 0) return 17;
> +   if (err == 0) return 17;/* { dg-warning "uninitialized" "warning" } */
>   }
>
>   return 18;
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
> index 3c9b3807041..e2a6c75f2e9 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
> @@ -3,9 +3,10 @@
>
>  int is_sorted(int *a, int n, int m, int k)
>  {
> -  for (int i = 0; i < n - 1 && m && k > i; i++)
> -if (a[i] > a[i + 1])
> -  return 0;
> +  if (k > 0)
> +fo

Re: [PATCH v4] Introduce attribute sym

2023-07-20 Thread Richard Biener via Gcc-patches

On Thu, Jul 20, 2023 at 1:11 AM Alexandre Oliva  wrote:
>
> On Jul 18, 2023, Richard Biener  wrote:
>
> > I think the __symver__ attribute does something similar already so
> > maybe use __attribute__((__sym__("foo")))?
>
> Cool, thanks, that will do.  Regstrapped on x86_64-linux-gnu.  Ok to
> install?
>
>
> This patch introduces an attribute to add extra asm names (aliases)
> for a decl when its definition is output.  The main goal is to ease
> interfacing C++ with Ada, as C++ mangled names have to be named, and
> in some cases (e.g. when using stdint.h typedefs in function
> arguments) the symbol names may vary across platforms.
>
> The attribute is usable in C and C++, presumably in all C-family
> languages.  It can be attached to global variables and functions.  In
> C++, it can also be attached to class types, namespace-scoped
> variables and functions, static data members, member functions,
> explicit instantiations and specializations of template functions,
> members and classes.
>
> When applied to constructors or destructors, additional sym aliases
> with _Base and _Del suffixes are defined for variants other than
> complete-object ones.  This changes the assumption that clones always
> carry the same attributes as their abstract declarations, so there is
> now a function to adjust them.
>
> C++ also had a bug in which attributes from local extern declarations
> failed to be propagated to a preexisting corresponding
> namespace-scoped decl.  I've fixed that, and adjusted acc tests that
> distinguished between C and C++ in this regard.
>
> Applying the attribute to class types is only valid in C++, and the
> effect is to attach the alias to the RTTI object associated with the
> class type.

I wonder if we could have shared some of the cgraph/varasm bits
with the symver attribute handling?  It's just a new 'sym' but
without the version part?

I hope Honza can chime in here.

Thanks,
Richard.

> for  gcc/ChangeLog
>
> * attribs.cc: Include cgraph.h.
> (decl_attributes): Allow late introduction of sym alias in
> types.
> (create_sym_alias_decl, create_sym_alias_decls): New.
> * attribs.h: Declare them.
> (FOR_EACH_SYM_ALIAS): New macro.
> * cgraph.cc (cgraph_node::create): Create sym alias decls.
> * varpool.cc (varpool_node::get_create): Create sym alias
> decls.
> * cgraph.h (symtab_node::remap_sym_alias_target): New.
> * symtab.cc (symtab_node::remap_sym_alias_target): Define.
> * cgraphunit.cc (cgraph_node::analyze): Create alias_target
> node if needed.
> (analyze_functions): Fixup visibility of implicit alias only
> after its node is analyzed.
> * doc/extend.texi (sym): Document for variables, functions and
> types.
>
> for  gcc/ada/ChangeLog
>
> * doc/gnat_rm/interfacing_to_other_languages.rst: Mention
> attribute sym to give RTTI symbols mnemonic names.
> * doc/gnat_ugn/the_gnat_compilation_model.rst: Mention
> aliases.  Fix incorrect ref to C1 ctor variant.
>
> for  gcc/c-family/ChangeLog
>
> * c-ada-spec.cc (pp_asm_name): Use first sym alias if
> available.
> * c-attribs.cc (handle_sym_attribute): New.
> (c_common_attribute_table): Add sym.
> (handle_copy_attribute): Do not copy sym attribute.
>
> for  gcc/c/ChangeLog
>
> * c-decl.cc (duplicate_decls): Remap sym alias target.
>
> for  gcc/cp/ChangeLog
>
> * class.cc (adjust_clone_attributes): New.
> (copy_fndecl_with_name, build_clone): Call it.
> * cp-tree.h (adjust_clone_attributes): Declare.
> (update_sym_alias_interface): Declare.
> (update_tinfo_sym_alias): Declare.
> * decl.cc (duplicate_decls): Remap sym_alias target.
> Adjust clone attributes.
> (grokfndecl): Tentatively create sym alias decls after
> adding attributes in e.g. a template member function explicit
> instantiation.
> * decl2.cc (cplus_decl_attributes): Update tinfo sym alias.
> (copy_interface, update_sym_alias_interface): New.
> (determine_visibility): Update sym alias interface.
> (tentative_decl_linkage, import_export_decl): Likewise.
> * name-lookup.cc: Include target.h and cgraph.h.
> (push_local_extern_decl_alias): Merge attributes with
> namespace-scoped decl, and drop duplicate sym alias.
> * optimize.cc (maybe_clone_body): Re-adjust attributes after
> cloning them.  Update sym alias interface.
> * rtti.cc: Include attribs.h and cgraph.h.
> (get_tinfo_decl): Copy sym attributes from type to tinfo decl.
> Create sym alias decls.
> (update_tinfo_sym_alias): New.
>
> for  gcc/testsuite/ChangeLog
>
> * c-c++-common/goacc/declare-1.c: Adjust.
> * c-c++-common/goacc/declare-2.c: Adjust.
> * c-c++-common/torture/attr-sym-1.c: New.
> * c-c++-commo

Re: [r14-2639 Regression] FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c scan-tree-dump slp2 "optimized: basic block" on Linux/x86_64

2023-07-20 Thread Maciej W. Rozycki

On Thu, 20 Jul 2023, Richard Biener wrote:

> > c1e420549f2305efb70ed37e693d380724eb7540 is the first bad commit
> > commit c1e420549f2305efb70ed37e693d380724eb7540
> > Author: Maciej W. Rozycki 
> > Date:   Wed Jul 19 11:59:29 2023 +0100
> >
> > testsuite: Add 64-bit vector variant for bb-slp-pr95839.c
> 
> I think the issue is we disable V2SF on ia32 because of the conflict
> with MMX, which we don't want to use.

 I'm not sure if I have a way to test with such a target.  Would you 
expect:

/* { dg-require-effective-target vect64 } */

to cover it?  If so, then I'll put it back as in the original version and 
post for Haochen to verify.

  Maciej

Re: [r14-2639 Regression] FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c scan-tree-dump slp2 "optimized: basic block" on Linux/x86_64

2023-07-20 Thread Richard Biener via Gcc-patches

On Thu, Jul 20, 2023 at 3:13 PM Maciej W. Rozycki  wrote:
>
> On Thu, 20 Jul 2023, Richard Biener wrote:
>
> > > c1e420549f2305efb70ed37e693d380724eb7540 is the first bad commit
> > > commit c1e420549f2305efb70ed37e693d380724eb7540
> > > Author: Maciej W. Rozycki 
> > > Date:   Wed Jul 19 11:59:29 2023 +0100
> > >
> > > testsuite: Add 64-bit vector variant for bb-slp-pr95839.c
> >
> > I think the issue is we disable V2SF on ia32 because of the conflict
> > with MMX, which we don't want to use.
>
>  I'm not sure if I have a way to test with such a target.  Would you
> expect:
>
> /* { dg-require-effective-target vect64 } */
>
> to cover it?  If so, then I'll put it back as in the original version and
> post for Haochen to verify.

Yeah, that should work here.

Richard.

>   Maciej

[PATCH] LoongArch: Allow using --with-arch=native if host CPU is LoongArch

2023-07-20 Thread Xi Ruoyao via Gcc-patches

If the host triple and the target triple are different but the host is
LoongArch, in some cases --with-arch=native can be useful.  For example,
if we are bootstrapping a loongarch64-linux-musl toolchain on a
Glibc-based system and we don't intend to use the toolchain on other
machines, we can use

../gcc/configure --{build,host}=loongarch64-linux-gnu \
 --target=loongarch64-linux-musl --with-arch=native

Relax the check in config.gcc to allow such configurations.
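The old and new checks differ only in what they compare. The following C sketch models both with POSIX `fnmatch`, which uses the same glob syntax as a shell `case` pattern; `old_check` and `new_check` are hypothetical names, and the real logic is the shell fragment in config.gcc.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Previous rule: --with-arch=native only when host equals target.  */
static bool
old_check (const char *host, const char *target)
{
  return strcmp (host, target) == 0;
}

/* Relaxed rule: any host matching loongarch* may use
   --with-arch=native, whatever the (LoongArch) target triplet is.  */
static bool
new_check (const char *host)
{
  return fnmatch ("loongarch*", host, 0) == 0;
}
```

The musl-on-glibc bootstrap from the example is rejected by the first check but accepted by the second, while an x86_64 host is still rejected.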

gcc/ChangeLog:

* config.gcc [target=loongarch*-*-*, with_arch=native]: Allow
building cross compiler if the host CPU is LoongArch.
---

Tested on x86_64-linux-gnu (building a cross compiler targeting
LoongArch --with-arch=native still rejected) and loongarch64-linux-gnu
(building a cross compiler targeting loongarch64-linux-musl allowed).
Ok for trunk?

 gcc/config.gcc | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1446eb2b3ca..146bca22a38 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4939,10 +4939,13 @@ case "${target}" in
case ${with_arch} in
"" | loongarch64 | la464) ;; # OK, append here.
native)
-   if test x${host} != x${target}; then
+   case ${host} in
+   loongarch*) ;; # OK
+   *)
echo "--with-arch=native is illegal for cross-compiler." 1>&2
exit 1
-   fi
+   ;;
+   esac
;;
"")
echo "Please set a default value for \${with_arch}" \
-- 
2.41.0

Re: [PATCH, OpenACC 2.7] readonly modifier support in front-ends

2023-07-20 Thread Thomas Schwinge

Hi Chung-Lin, Tobias!

On 2023-07-11T02:33:58+0800, Chung-Lin Tang  wrote:
> this patch contains support for the 'readonly' modifier in copyin clauses
> and the cache directive.

Thanks!

> As we discussed earlier, the work for actually linking this to middle-end
> points-to analysis is a somewhat non-trivial issue. This first patch allows
> the language feature to be used in OpenACC directives first (with no effect for now).
> The middle-end changes are probably going to be a later patch.

ACK.

> (Also CCing Tobias because of the Fortran bits)

A few specific GCC/Fortran questions for Tobias below, and some more
review comments for Chung-Lin:

> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -14059,7 +14059,8 @@ c_parser_omp_variable_list (c_parser *parser,
>
>  static tree
>  c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
> -   tree list, bool allow_deref = false)
> +   tree list, bool allow_deref = false,
> +   bool *readonly = NULL)
>  {
>/* The clauses location.  */
>location_t loc = c_parser_peek_token (parser)->location;
> @@ -14067,6 +14068,20 @@ c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
>matching_parens parens;
>if (parens.require_open (parser))
>  {
> +  if (readonly != NULL)
> + {
> +   c_token *token = c_parser_peek_token (parser);
> +   if (token->type == CPP_NAME
> +   && !strcmp (IDENTIFIER_POINTER (token->value), "readonly")
> +   && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
> + {
> +   c_parser_consume_token (parser);
> +   c_parser_consume_token (parser);
> +   *readonly = true;
> + }
> +   else
> + *readonly = false;
> + }
>list = c_parser_omp_variable_list (parser, loc, kind, list, allow_deref);
>parens.skip_until_found_close (parser);
>  }

Instead of doing this in 'c_parser_omp_var_list_parens', I think it's
clearer to have this special 'readonly :' parsing logic in the two places
where it's used.  For example (random), like 'ancestor :' is parsed in
'c_parser_omp_clause_device', or 'conditional :' is parsed in
'c_parser_omp_clause_lastprivate'.  (Yes, this does duplicate a bit of
code, but that's easy enough to follow along.)

The existing 'enum omp_clause_code kind', 'bool allow_deref' actually
affect the parsing process; the new 'bool readonly' only propagates a
flag.
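A minimal sketch of the inline parsing being suggested, over a hypothetical token array rather than the real `c_parser` interface (`struct token`, `tok_kind` and `parse_readonly_modifier` are made up for illustration). As in the patch, `readonly` is only treated as a modifier when a colon follows it, so a variable that happens to be named `readonly` still parses as a list item.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token stream; it must end with a TOK_EOF token.  */
enum tok_kind { TOK_NAME, TOK_COLON, TOK_COMMA, TOK_EOF };

struct token
{
  enum tok_kind kind;
  const char *text;  /* identifier spelling for TOK_NAME, else NULL */
};

/* If the tokens at *POS are "readonly :", consume both and return true;
   otherwise leave *POS unchanged and return false.  */
static bool
parse_readonly_modifier (const struct token *toks, size_t *pos)
{
  const struct token *t = &toks[*pos];
  if (t[0].kind == TOK_NAME
      && strcmp (t[0].text, "readonly") == 0
      && t[1].kind == TOK_COLON)
    {
      *pos += 2;
      return true;
    }
  return false;
}
```

Keeping this at the call site, as suggested, leaves the shared list-parsing helper responsible only for what actually affects parsing.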

> @@ -14084,7 +14099,11 @@ c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
> OpenACC 2.6:
> no_create ( variable-list )
> attach ( variable-list )
> -   detach ( variable-list ) */
> +   detach ( variable-list )
> +
> +   OpenACC 2.7:
> +   copyin (readonly : variable-list )
> + */
>
>  static tree
>  c_parser_oacc_data_clause (c_parser *parser, pragma_omp_clause c_kind,
> @@ -14135,11 +14154,22 @@ c_parser_oacc_data_clause (c_parser *parser, pragma_omp_clause c_kind,
>  default:
>gcc_unreachable ();
>  }
> +
> +  /* Turn on readonly modifier parsing for copyin clause.  */
> +  bool readonly = false, *readonly_ptr = NULL;
> +  if (c_kind == PRAGMA_OACC_CLAUSE_COPYIN)
> +readonly_ptr = &readonly;
> +
>tree nl, c;
> -  nl = c_parser_omp_var_list_parens (parser, OMP_CLAUSE_MAP, list, true);
> +  nl = c_parser_omp_var_list_parens (parser, OMP_CLAUSE_MAP, list, true,
> +  readonly_ptr);

That is, similar to 'c_parser_omp_clause_device', or
'c_parser_omp_clause_lastprivate', inline 'c_parser_omp_var_list_parens'
here, and only for 'PRAGMA_OACC_CLAUSE_COPYIN' parse 'readonly :', then
(for all) use 'c_parser_omp_variable_list' etc. instead of
'c_parser_omp_var_list_parens', then set 'readonly':

>for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
> -OMP_CLAUSE_SET_MAP_KIND (c, kind);
> +{
> +  OMP_CLAUSE_SET_MAP_KIND (c, kind);
> +  if (readonly)
> + OMP_CLAUSE_MAP_READONLY (c) = 1;
> +}
>
>return nl;

> @@ -18212,6 +18242,9 @@ c_parser_omp_structured_block (c_parser *parser, bool *if_p)
>  /* OpenACC 2.0:
> # pragma acc cache (variable-list) new-line
>
> +   OpenACC 2.7:
> +   # pragma acc cache (readonly: variable-list) new-line
> +
> LOC is the location of the #pragma token.
>  */
>
> @@ -18219,8 +18252,14 @@ static tree
>  c_parser_oacc_cache (location_t loc, c_parser *parser)
>  {
>tree stmt, clauses;
> +  bool readonly;
> +
> +  clauses = c_parser_omp_var_list_parens (parser, OMP_CLAUSE__CACHE_, NULL,
> +   false, &readonly);
> +  if (readonly)
> +for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +  OMP_CLAUSE__CACHE__READONLY (c) = 1;
>
> -  clauses = c_parser_omp_var_list_parens (parser, OMP_CLAUSE__CACHE_, NULL);
>clauses = c_finish_omp_clauses (clauses, C_ORT_ACC);
>
>c_parser_skip_to_pragma_eol (parser);

Similarly.

> --- a/gcc/c

RE: [PATCH] CODE STRUCTURE: Refine codes in Vectorizer

2023-07-20 Thread Li, Pan2 via Gcc-patches

Committed, thanks Richard.

Pan

-Original Message-
From: Gcc-patches  On Behalf Of Richard Biener via Gcc-patches
Sent: Thursday, July 20, 2023 8:54 PM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches ; richard.sandiford 

Subject: Re: [PATCH] CODE STRUCTURE: Refine codes in Vectorizer

On Thu, 20 Jul 2023, juzhe.zh...@rivai.ai wrote:

> Just finished bootstrap and regression testing on x86.
> 
> Ok for trunk ?

OK.  Not an issue currently but I think LEN_MASK should be
checked before MASK.

Richard.

> 
> juzhe.zh...@rivai.ai
>  
> From: juzhe.zhong
> Date: 2023-07-20 16:06
> To: gcc-patches
> CC: richard.sandiford; rguenther; Ju-Zhe Zhong
> Subject: [PATCH] CODE STRUCTURE: Refine codes in Vectorizer
> From: Ju-Zhe Zhong 
>  
> Hi, Richard and Richi.
>  
> I plan to refine the codes that I recently support for RVV auto-vectorization.
> This patch is inspired last review comments from Richard:
> https://patchwork.sourceware.org/project/gcc/patch/20230712042124.111818-1-juzhe.zh...@rivai.ai/
>  
> Richard said he prefer the the code structure as follows:
>  
> Please instead switch the if condition so that the structure is:
>  
>if (...)
>  vect_record_loop_mask (...)
>else if (...)
>  vect_record_loop_len (...)
>else
>  can't use partial vectors
>  
> This is his last comments.
>  
> So, I come back to refine this piece of codes.
>  
> Does it look reasonable ?
>  
> This next refine patch is change all names of "LEN_MASK" into "MASK_LEN" but 
> should come after this
> patch.
>  
> gcc/ChangeLog:
>  
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Refine code 
> structure.
>  
> ---
> gcc/tree-vect-stmts.cc | 38 +-
> 1 file changed, 17 insertions(+), 21 deletions(-)
>  
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index cb86d544313..b86e159ae4c 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1605,6 +1605,7 @@ check_load_store_for_partial_vectors (loop_vec_info 
> loop_vinfo, tree vectype,
>  nvectors = vect_get_num_copies (loop_vinfo, vectype);
>vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>machine_mode vecmode = TYPE_MODE (vectype);
>bool is_load = (vls_type == VLS_LOAD);
>if (memory_access_type == VMAT_LOAD_STORE_LANES)
> @@ -1631,33 +1632,29 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
>internal_fn ifn = (is_load
> ? IFN_MASK_GATHER_LOAD
> : IFN_MASK_SCATTER_STORE);
> -  if (!internal_gather_scatter_fn_supported_p (ifn, vectype,
> -gs_info->memory_type,
> -gs_info->offset_vectype,
> -gs_info->scale))
> - {
> -   ifn = (is_load
> - ? IFN_LEN_MASK_GATHER_LOAD
> - : IFN_LEN_MASK_SCATTER_STORE);
> -   if (internal_gather_scatter_fn_supported_p (ifn, vectype,
> -   gs_info->memory_type,
> -   gs_info->offset_vectype,
> -   gs_info->scale))
> - {
> -   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> -   vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
> -   return;
> - }
> +  internal_fn len_ifn = (is_load
> +  ? IFN_LEN_MASK_GATHER_LOAD
> +  : IFN_LEN_MASK_SCATTER_STORE);
> +  if (internal_gather_scatter_fn_supported_p (ifn, vectype,
> +   gs_info->memory_type,
> +   gs_info->offset_vectype,
> +   gs_info->scale))
> + vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype,
> +scalar_mask);
> +  else if (internal_gather_scatter_fn_supported_p (len_ifn, vectype,
> +gs_info->memory_type,
> +gs_info->offset_vectype,
> +gs_info->scale))
> + vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
> +  else
> + {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "can't operate on partial vectors because"
>  " the target doesn't have an appropriate"
>  " gather load or scatter store instruction.\n");
>   LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -   return;
> }
> -  vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype,
> -  scalar_mask);
>return;
>  }
> @@ -1703,7 +1700,6 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
>if (get_len_load_store_mode (vecmode, is_load).exists (&vmode))
>  {
>nvectors = group_memory_nvectors (group_size * vf, nunits);
> -  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>unsigned factor = (vecmode == vmode) ? 1 : GET_MODE_UNIT_SIZE (vecmode);
>vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, factor);
>using_partial_vectors_p = true;
> 
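For reference, the if / else if / else shape requested in review can be modeled in isolation. This is a minimal, hypothetical sketch, not GCC code: the two bool parameters stand in for the two internal_gather_scatter_fn_supported_p queries (IFN_MASK_* and IFN_LEN_MASK_*), and the enum stands in for the choice between vect_record_loop_mask, vect_record_loop_len and disabling partial vectors.

```cpp
// Hypothetical model of the three outcomes of the restructured check.
enum class partial_vec_kind { mask, len, none };

// The first supported strategy wins; the order of the tests is the
// whole point of the restructuring.
constexpr partial_vec_kind
choose_partial_vec_kind (bool mask_ifn_supported, bool len_mask_ifn_supported)
{
  if (mask_ifn_supported)
    return partial_vec_kind::mask;   // would call vect_record_loop_mask
  else if (len_mask_ifn_supported)
    return partial_vec_kind::len;    // would call vect_record_loop_len
  else
    return partial_vec_kind::none;   // can't use partial vectors
}

// Usage: when the target supports both forms, the mask form is chosen.
static_assert (choose_partial_vec_kind (true, true) == partial_vec_kind::mask,
               "mask is tested first");
```

Because the chain stops at the first match, Richard's note above about checking LEN_MASK before MASK amounts to swapping the first two tests.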

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Cleanup code determining number of iterations from cfg profile

2023-07-20 Thread Jan Hubicka via Gcc-patches

Hi,
this patch cleans up the API for determining expected loop iterations from the
profile.  We started with expected_loop_iterations, whose only source was the
integer-represented BB counts.  It did some work on guessing the number of
iterations if the profile was absent or bogus.  Later we introduced loop_info
and added get_estimated_loop_iterations, which made expected_loop_iterations
useful mostly when doing profile updates and not for loop optimization
heuristics.  The naming is a bit ambiguous, so this difference is not clear.
Even later we introduced precision tracking to the profile and extended the API
to return the reliability of the result, but did not update all uses to do
reasonable things with it.  There is also some confusion about +-1s concerning
latch execution counts versus header execution counts.

This patch aims to obsolete expected_loop_iterations and
expected_loop_iterations_unbounded (and "succeeds" modulo one use of each of
the two).  It adds expected_loop_iterations_by_profile, which computes an sreal
and does correct precision/presence tracking.

Unlike the old code, it is based on the CFG profile only, does not attempt to
provide a fake answer when info is missing, and does not check sanity against
loop_info.

We now consistently define iterations as latch executions in loop_info, so I
use that here too.
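A minimal sketch of the quantity described above, assuming a loop whose header is reached only from entry edges and from its latch.  The struct and function names are hypothetical; this is not the body of expected_loop_iterations_by_profile (which this excerpt does not show), and it uses double where the patch uses sreal.

```cpp
#include <optional>

// Hypothetical summary of one loop's CFG profile.
struct loop_profile
{
  double header_count;  // executions of the loop header
  double entry_count;   // sum of counts on edges entering the header from outside
};

// Expected iterations, counted as latch executions per loop entry.
// Each header execution arrives either via an entry edge or via the
// latch, so latch executions = header_count - entry_count; dividing by
// entry_count gives header_count / entry_count - 1.  Returns no value
// when the loop was never entered, since the ratio is then undefined.
constexpr std::optional<double>
expected_latch_executions (const loop_profile &p)
{
  if (p.entry_count <= 0)
    return std::nullopt;
  return p.header_count / p.entry_count - 1;
}

// Round a non-negative estimate to the nearest integer by adding 0.5 and
// truncating, the same idea as "(snit + 0.5).to_int ()" in the patch.
constexpr long
round_iterations (double iterations)
{
  return static_cast<long> (iterations + 0.5);
}

// Usage: 10 entries and 110 header executions give 10 latch runs per entry.
static_assert (*expected_latch_executions ({110.0, 10.0}) == 10.0, "");
```

Counting latch executions rather than header executions is what removes the +-1 ambiguity: the header count per entry is always one more than the value returned here.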

I converted almost all calls to the new API: dumps, code producing loop_info
from the CFG profile, and profile updating.  Remaining uses are in loop
unrolling and prefetching, which need more TLC that I will do incrementally.

There are some improvements possible which I can play with incrementally.
 - For simple loops with one exit dominating the latch we can use the exit
   probability, which is easier to preserve, as the loop iteration info.
   This is probably not too critical since all estimates should be recorded
   in loop_info, and it would help mostly if a new loop is constructed or an
   old loop is lost and rediscovered.
 - We may want to avoid trusting the profile if it is obviously inconsistent
   on the header.

Bootstrapped/regtested x86_64-linux, plan to commit it later today if
there are no complaints.

Honza

gcc/ChangeLog:

* cfgloop.cc: Include sreal.h.
(flow_loop_dump): Dump sreal iteration estimate.
(get_estimated_loop_iterations): Update.
* cfgloop.h (expected_loop_iterations_by_profile): Declare.
* cfgloopanal.cc (expected_loop_iterations_by_profile): New function.
(expected_loop_iterations_unbounded): Use new API.
* cfgloopmanip.cc (scale_loop_profile): Use
expected_loop_iterations_by_profile.
* predict.cc (pass_profile::execute): Likewise.
* profile.cc (branch_prob): Likewise.
* tree-ssa-loop-niter.cc: Include sreal.h.
(estimate_numbers_of_iterations): Likewise.

diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
index ccda7415d70..11336ea45c0 100644
--- a/gcc/cfgloop.cc
+++ b/gcc/cfgloop.cc
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "tree-ssa.h"
 #include "tree-pretty-print.h"
+#include "sreal.h"
 
 static void flow_loops_cfg_dump (FILE *);
 
@@ -138,14 +139,11 @@ flow_loop_dump (const class loop *loop, FILE *file,
   loop_depth (loop), (long) (loop_outer (loop)
  ? loop_outer (loop)->num : -1));
 
-  if (loop->latch)
-{
-  bool read_profile_p;
-  gcov_type nit = expected_loop_iterations_unbounded (loop, &read_profile_p);
-  if (read_profile_p && !loop->any_estimate)
-   fprintf (file, ";;  profile-based iteration count: %" PRIu64 "\n",
-(uint64_t) nit);
-}
+  bool reliable;
+  sreal iterations;
+  if (expected_loop_iterations_by_profile (loop, &iterations, &reliable))
+fprintf (file, ";;  profile-based iteration count: %f %s\n",
+iterations.to_double (), reliable ? "(reliable)" : "(unreliable)");
 
   fprintf (file, ";;  nodes:");
   bbs = get_loop_body (loop);
@@ -2014,10 +2012,12 @@ get_estimated_loop_iterations (class loop *loop, widest_int *nit)
  profile.  */
   if (!loop->any_estimate)
 {
-  if (loop->header->count.reliable_p ())
+  sreal snit;
+  bool reliable;
+  if (expected_loop_iterations_by_profile (loop, &snit, &reliable)
+ && reliable)
{
-  *nit = gcov_type_to_wide_int
-  (expected_loop_iterations_unbounded (loop) + 1);
+ *nit = (snit + 0.5).to_int ();
  return true;
}
   return false;
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index e7ac2b5f3db..4d2fd4b6af5 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -403,7 +403,10 @@ extern void verify_loop_structure (void);
 /* Loop analysis.  */
 extern bool just_once_each_iteration_p (const class loop *, const_basic_block);
 gcov_type expected_loop_iterations_unbounded (const class loop *,
- bool *read_profile_p = NULL, bool by_profile_only = false);
+ bool *read_profile_

[PATCH v2] c++: fix ICE with designated initializer [PR110114]

2023-07-20 Thread Marek Polacek via Gcc-patches

On Wed, Jul 19, 2023 at 03:24:10PM -0400, Jason Merrill wrote:
> On 7/19/23 14:38, Marek Polacek wrote:
> > On Wed, Jul 19, 2023 at 02:32:15PM -0400, Patrick Palka wrote:
> > > On Wed, 19 Jul 2023, Marek Polacek wrote:
> > > 
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > 
> > > LGTM.  It might be preferable to check COMPLETE_TYPE_P in the caller
> > > instead, so that we avoid inspecting CLASSTYPE_NON_AGGREGATE on an
> > > incomplete class type, and so that the caller doesn't "commit" to
> > > building an aggregate conversion.
> > 
> > Perhaps.  I wanted to avoid the call to build_user_type_conversion_1.
> > I could add an early return to implicit_conversion_1 but I'd have to
> > move some code around not to check COMPLETE_TYPE_P before complete_type.
> 
> Maybe return NULL for the incomplete case here, rather than just skipping
> reshape_init?
> 
>   /* Call reshape_init early to remove redundant braces.  */
>   if (expr && BRACE_ENCLOSED_INITIALIZER_P (expr)
>   && CLASS_TYPE_P (to)
>   && COMPLETE_TYPE_P (complete_type (to))
>   && !CLASSTYPE_NON_AGGREGATE (to))
> {
>   expr = reshape_init (to, expr, complain);
>   if (expr == error_mark_node)
> return NULL;
>   from = TREE_TYPE (expr);
> }
> 
> If that doesn't work, the patch is fine as-is.

It does work, with one test tweak (which I don't think is a regression):
 
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
r13-1227 added an assert checking that the index in a CONSTRUCTOR
is a FIELD_DECL.  That's a reasonable assumption but in this case
we never called reshape_init due to the type being incomplete, and
so the index remained an identifier node: get_class_binding never
got around to looking up the FIELD_DECL.

We can avoid the crash by returning early in implicit_conversion_1; we'd
return NULL anyway due to:

  if (i < CONSTRUCTOR_NELTS (ctor))
return NULL;

in build_aggr_conv.

PR c++/110114

gcc/cp/ChangeLog:

* call.cc (implicit_conversion_1): Return early if the type isn't
complete.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist100.C: Adjust expected diagnostic.
* g++.dg/cpp2a/desig28.C: New test.
* g++.dg/cpp2a/desig29.C: New test.
---
 gcc/cp/call.cc   | 19 +++
 gcc/testsuite/g++.dg/cpp0x/initlist100.C |  4 ++--
 gcc/testsuite/g++.dg/cpp2a/desig28.C | 17 +
 gcc/testsuite/g++.dg/cpp2a/desig29.C | 10 ++
 4 files changed, 40 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig28.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig29.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index b55230d98aa..673ec91d60e 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -2059,15 +2059,18 @@ implicit_conversion_1 (tree to, tree from, tree expr, bool c_cast_p,
   complain &= ~tf_error;
 
   /* Call reshape_init early to remove redundant braces.  */
-  if (expr && BRACE_ENCLOSED_INITIALIZER_P (expr)
-  && CLASS_TYPE_P (to)
-  && COMPLETE_TYPE_P (complete_type (to))
-  && !CLASSTYPE_NON_AGGREGATE (to))
+  if (expr && BRACE_ENCLOSED_INITIALIZER_P (expr) && CLASS_TYPE_P (to))
 {
-  expr = reshape_init (to, expr, complain);
-  if (expr == error_mark_node)
-   return NULL;
-  from = TREE_TYPE (expr);
+  to = complete_type (to);
+  if (!COMPLETE_TYPE_P (to))
+   return nullptr;
+  if (!CLASSTYPE_NON_AGGREGATE (to))
+   {
+ expr = reshape_init (to, expr, complain);
+ if (expr == error_mark_node)
+   return nullptr;
+ from = TREE_TYPE (expr);
+   }
 }
 
   if (TYPE_REF_P (to))
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist100.C b/gcc/testsuite/g++.dg/cpp0x/initlist100.C
index 9d80a004c17..6865d34a6f9 100644
--- a/gcc/testsuite/g++.dg/cpp0x/initlist100.C
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist100.C
@@ -2,9 +2,9 @@
 // { dg-do compile { target c++11 } }
 
 namespace std {
-template  class initializer_list;  // { dg-message "declaration" }
+template  class initializer_list;
 }
 
 template  struct B { B (std::initializer_list); };
 struct C { virtual int foo (); };
-struct D : C {} d { B { D {} } };  // { dg-error "incomplete|no matching" }
+struct D : C {} d { B { D {} } };  // { dg-error "no matching" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/desig28.C b/gcc/testsuite/g++.dg/cpp2a/desig28.C
new file mode 100644
index 000..b63265fea51
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/desig28.C
@@ -0,0 +1,17 @@
+// PR c++/110114
+// { dg-do compile { target c++20 } }
+
+struct A {
+int a,b;
+};
+
+struct B;
+
+void foo(const A &) {}
+void foo(const B &) {}
+
+int
+main ()
+{
+  foo({.a=0});
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/desig29.C b/gcc/testsuite/g++.dg/cpp2a/desig29.C
new file mode 100644
index 000..bd1a82b041d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/desig29.C
@@ -0,0 +1,10 @@
+// PR c++

Re: [PATCH] tree-optimization/110742 - fix latent issue with permuting existing vectors

2023-07-20 Thread Richard Sandiford via Gcc-patches

Richard Biener via Gcc-patches  writes:
> When we materialize a layout we push edge permutes to constant/external
> defs without checking we can actually do so.  For externals defined
> by vector stmts rather than scalar components we can't.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?
>
> Thanks,
> Richard.
>
>   PR tree-optimization/110742
>   * tree-vect-slp.cc (vect_optimize_slp_pass::get_result_with_layout):
>   Do not materialize an edge permutation in an external node with
>   vector defs.
>   (vect_slp_analyze_node_operations_1): Guard purely internal
>   nodes better.
>
>   * g++.dg/torture/pr110742.C: New testcase.
> ---
>  gcc/testsuite/g++.dg/torture/pr110742.C | 47 +
>  gcc/tree-vect-slp.cc|  8 +++--
>  2 files changed, 53 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr110742.C
>
> diff --git a/gcc/testsuite/g++.dg/torture/pr110742.C b/gcc/testsuite/g++.dg/torture/pr110742.C
> new file mode 100644
> index 000..d41ac0479d2
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr110742.C
> @@ -0,0 +1,47 @@
> +// { dg-do compile }
> +
> +struct HARD_REG_SET {
> +  HARD_REG_SET operator~() const {
> +HARD_REG_SET res;
> +for (unsigned int i = 0; i < (sizeof(elts) / sizeof((elts)[0])); ++i)
> +  res.elts[i] = ~elts[i];
> +return res;
> +  }
> +  HARD_REG_SET operator&(const HARD_REG_SET &other) const {
> +HARD_REG_SET res;
> +for (unsigned int i = 0; i < (sizeof(elts) / sizeof((elts)[0])); ++i)
> +  res.elts[i] = elts[i] & other.elts[i];
> +return res;
> +  }
> +  unsigned long elts[4];
> +};
> +typedef const HARD_REG_SET &const_hard_reg_set;
> +inline bool hard_reg_set_subset_p(const_hard_reg_set x, const_hard_reg_set y) {
> +  unsigned long bad = 0;
> +  for (unsigned int i = 0; i < (sizeof(x.elts) / sizeof((x.elts)[0])); ++i)
> +bad |= (x.elts[i] & ~y.elts[i]);
> +  return bad == 0;
> +}
> +inline bool hard_reg_set_empty_p(const_hard_reg_set x) {
> +  unsigned long bad = 0;
> +  for (unsigned int i = 0; i < (sizeof(x.elts) / sizeof((x.elts)[0])); ++i)
> +bad |= x.elts[i];
> +  return bad == 0;
> +}
> +extern HARD_REG_SET rr[2];
> +extern int t[2];
> +extern HARD_REG_SET nn;
> +static HARD_REG_SET mm;
> +void setup_reg_class_relations(void) {
> +  HARD_REG_SET intersection_set, union_set, temp_set2;
> +  for (int cl2 = 0; cl2 < 2; cl2++) {
> +temp_set2 = rr[cl2] & ~nn;
> +if (hard_reg_set_empty_p(mm) && hard_reg_set_empty_p(temp_set2)) {
> +  mm = rr[0] & nn;
> +  if (hard_reg_set_subset_p(mm, intersection_set))
> +if (!hard_reg_set_subset_p(mm, temp_set2) ||
> +hard_reg_set_subset_p(rr[0], rr[t[cl2]]))
> +  t[cl2] = 0;
> +}
> +  }
> +}
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 693621ca990..1d79c77e8ce 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -5198,7 +5198,10 @@ vect_optimize_slp_pass::get_result_with_layout (slp_tree node,
>  return result;
>  
>if (SLP_TREE_DEF_TYPE (node) == vect_constant_def
> -  || SLP_TREE_DEF_TYPE (node) == vect_external_def)
> +  || (SLP_TREE_DEF_TYPE (node) == vect_external_def
> +   && (to_layout_i == 0
> +   /* We can't permute vector defs.  */
> +   || SLP_TREE_VEC_DEFS (node).is_empty (

Guess it's personal preference, but IMO it's easier to follow without the
to_layout_i condition, so that it ties directly to the create_partitions
test.  (Would be nice to have a name for whatever a node matching the new
condition is, but I don't have any good ideas.)

LGTM otherwise FWIW.

Thanks,
Richard

>  {
>/* If the vector is uniform or unchanged, there's nothing to do.  */
>if (to_layout_i == 0 || vect_slp_tree_uniform_p (node))
> @@ -5944,7 +5947,8 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, slp_tree node,
>   calculated by the recursive call).  Otherwise it is the number of
>   scalar elements in one scalar iteration (DR_GROUP_SIZE) multiplied by
>   VF divided by the number of elements in a vector.  */
> -  if (!STMT_VINFO_DATA_REF (stmt_info)
> +  if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> +  && !STMT_VINFO_DATA_REF (stmt_info)
>&& REDUC_GROUP_FIRST_ELEMENT (stmt_info))
>  {
>for (unsigned i = 0; i < SLP_TREE_CHILDREN (node).length (); ++i)

Re: [PATCH v2] c++: fix ICE with designated initializer [PR110114]

2023-07-20 Thread Jason Merrill via Gcc-patches


On 7/20/23 10:08, Marek Polacek wrote:

> It does work, with one test tweak (which I don't think is a regression):
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.



[PATCH 0/3] Espressif xtensa chips multilib

2023-07-20 Thread Alexey Lapshin via Gcc-patches

This patch series introduces multilib support for Espressif Xtensa chips in GCC.
The addition of the "-mdynconfig=" option was necessary because the existing
environment variable XTENSA_GNU_CONFIG cannot be used to implement multilib:
multilib operates on GCC options rather than on environment variables.

It is not possible to use a full path in "-mdynconfig=" because the path could
differ between users' machines and because the multilib syntax already reserves
the directory delimiter character.  So the option is designed to contain only
the dynconfig filename, which does not change within a toolchain.

Also, XTENSA_GNU_CONFIG now accepts either the full path of a dynconfig file or
a directory containing dynconfig files.  This change was made to overcome the
limitations of modifying LD_LIBRARY_PATH on macOS host machines.
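A minimal C sketch of the resolution rules as they can be read from the get_xtensa_dynconfig_file hunk in patch 1/3: a XTENSA_GNU_CONFIG value whose basename is empty (it ends in '/') is a directory, anything else is a file path.  The helper name is hypothetical, it checks only for '/' rather than using lbasename, and the case where both a file path and -mdynconfig= are given is cut off in the posted patch, so the choice made for it here is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Resolve the dynconfig path from ENV (the XTENSA_GNU_CONFIG value, may be
   NULL) and OPT (the -mdynconfig= filename, "" when absent).  Returns a
   malloc'd string, or NULL if no dynconfig can be determined.  */
static char *
resolve_dynconfig_path (const char *env, const char *opt)
{
  /* An empty value or a trailing '/' means ENV names a directory.  */
  int env_is_dir = env && (env[0] == '\0' || env[strlen (env) - 1] == '/');
  const char *pick = NULL;

  if (!*opt)
    /* No -mdynconfig=: only a full file path in ENV is usable.  */
    pick = (env && !env_is_dir) ? env : NULL;
  else if (!env)
    /* Only the option is set: use its filename as-is.  */
    pick = opt;
  else if (env_is_dir)
    {
      /* Directory from ENV plus filename from the option.  */
      size_t len = strlen (env) + strlen (opt) + 1;
      char *path = malloc (len);
      if (!path)
        return NULL;
      strcpy (path, env);
      strcat (path, opt);
      return path;
    }
  else
    /* ENV names a file and the option is also set.  The patch's handling
       of this case is truncated here; preferring ENV is an assumption.  */
    pick = env;

  if (!pick)
    return NULL;
  char *copy = malloc (strlen (pick) + 1);
  if (copy)
    strcpy (copy, pick);
  return copy;
}
```

For example, XTENSA_GNU_CONFIG=/opt/cfg/ together with -mdynconfig=xtensa_esp32.so resolves to /opt/cfg/xtensa_esp32.so, which is how a multilib option carrying only a filename can still locate the library.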

[PATCH 1/3] gcc: xtensa: add mdynconfig option

2023-07-20 Thread Alexey Lapshin via Gcc-patches

gcc/
* config/xtensa/elf.h (ASM_SPEC, LINK_SPEC): Pass dynconfig to
assembler/linker.
* config/xtensa/linux.h (ASM_SPEC, LINK_SPEC): Likewise.
* config/xtensa/uclinux.h (ASM_SPEC, LINK_SPEC): Likewise.
* config/xtensa/xtensa-dynconfig.cc: May build dynconfig path with
  dir in XTENSA_GNU_CONFIG and filename in mdynconfig option.
* doc/invoke.texi: Add XTENSA_GNU_CONFIG and mdynconfig doc.
---
 gcc/config/xtensa/elf.h   |  6 ++-
 gcc/config/xtensa/linux.h |  6 ++-
 gcc/config/xtensa/uclinux.h   |  6 ++-
 gcc/config/xtensa/xtensa-dynconfig.cc | 55 ++-
 gcc/config/xtensa/xtensa.opt  |  4 ++
 gcc/doc/invoke.texi   | 14 +++
 6 files changed, 83 insertions(+), 8 deletions(-)

diff --git a/gcc/config/xtensa/elf.h b/gcc/config/xtensa/elf.h
index 715b3a0b1d2..6683edea1de 100644
--- a/gcc/config/xtensa/elf.h
+++ b/gcc/config/xtensa/elf.h
@@ -49,7 +49,8 @@ along with GCC; see the file COPYING3.  If not see
   %{mauto-litpools:--auto-litpools} \
   %{mno-auto-litpools:--no-auto-litpools} \
   %{mabi=windowed:--abi-windowed} \
-  %{mabi=call0:--abi-call0}"
+  %{mabi=call0:--abi-call0} \
+  %{mdynconfig=*:--dynconfig=%*}"
 
 #undef LIB_SPEC
 #define LIB_SPEC "-lc -lsim -lc -lhandlers-sim -lhal"
@@ -69,7 +70,8 @@ along with GCC; see the file COPYING3.  If not see
   %{rdynamic:-export-dynamic} \
 %{static:-static}}} \
   %{mabi=windowed:--abi-windowed} \
-  %{mabi=call0:--abi-call0}"
+  %{mabi=call0:--abi-call0} \
+  %{mdynconfig=*:--dynconfig=%*}"
 
 #undef LOCAL_LABEL_PREFIX
 #define LOCAL_LABEL_PREFIX "."
diff --git a/gcc/config/xtensa/linux.h b/gcc/config/xtensa/linux.h
index e684e7deebf..928e8c36923 100644
--- a/gcc/config/xtensa/linux.h
+++ b/gcc/config/xtensa/linux.h
@@ -46,7 +46,8 @@ along with GCC; see the file COPYING3.  If not see
   %{mauto-litpools:--auto-litpools} \
   %{mno-auto-litpools:--no-auto-litpools} \
   %{mabi=windowed:--abi-windowed} \
-  %{mabi=call0:--abi-call0}"
+  %{mabi=call0:--abi-call0} \
+  %{mdynconfig=*:--dynconfig=%*}"
 
 #define GLIBC_DYNAMIC_LINKER "/lib/ld.so.1"
 
@@ -60,7 +61,8 @@ along with GCC; see the file COPYING3.  If not see
 %{static-pie:-static -pie --no-dynamic-linker -z text} \
 %{static:-static}} \
   %{mabi=windowed:--abi-windowed} \
-  %{mabi=call0:--abi-call0}"
+  %{mabi=call0:--abi-call0} \
+  %{mdynconfig=*:--dynconfig=%*}"
 
 #undef LOCAL_LABEL_PREFIX
 #define LOCAL_LABEL_PREFIX "."
diff --git a/gcc/config/xtensa/uclinux.h b/gcc/config/xtensa/uclinux.h
index da9e619fb05..68c209bbebb 100644
--- a/gcc/config/xtensa/uclinux.h
+++ b/gcc/config/xtensa/uclinux.h
@@ -53,13 +53,15 @@ along with GCC; see the file COPYING3.  If not see
   %{mauto-litpools:--auto-litpools} \
   %{mno-auto-litpools:--no-auto-litpools} \
   %{mabi=windowed:--abi-windowed} \
-  %{mabi=call0:--abi-call0}"
+  %{mabi=call0:--abi-call0} \
+  %{mdynconfig=*:--dynconfig=%*}"
 
 #undef LINK_SPEC
 #define LINK_SPEC \
  "%{!no-elf2flt:%{!elf2flt*:-elf2flt}} \
   %{mabi=windowed:--abi-windowed} \
-  %{mabi=call0:--abi-call0}"
+  %{mabi=call0:--abi-call0} \
+  %{mdynconfig=*:--dynconfig=%*}"
 
 #undef LOCAL_LABEL_PREFIX
 #define LOCAL_LABEL_PREFIX "."
diff --git a/gcc/config/xtensa/xtensa-dynconfig.cc b/gcc/config/xtensa/xtensa-dynconfig.cc
index 9aea9f253c2..3d6938a134b 100644
--- a/gcc/config/xtensa/xtensa-dynconfig.cc
+++ b/gcc/config/xtensa/xtensa-dynconfig.cc
@@ -22,6 +22,7 @@
 #include "coretypes.h"
 #include "diagnostic.h"
 #include "intl.h"
+#include "options.h"
 #define XTENSA_CONFIG_DEFINITION
 #include "xtensa-config.h"
 #include "xtensa-dynconfig.h"
@@ -67,6 +68,55 @@ dlerror (void)
 
 #define CONFIG_ENV_NAME "XTENSA_GNU_CONFIG"
 
+#ifdef ENABLE_PLUGIN
+
+static char *get_xtensa_dynconfig_file (void)
+{
+  const char *xtensa_dynconfig_env = getenv (CONFIG_ENV_NAME);
+  if (!strlen (xtensa_dynconfig_file))
+{
+  if (xtensa_dynconfig_env && !strlen (lbasename (xtensa_dynconfig_env)))
+   {
+ /* XTENSA_GNU_CONFIG has directory path, but dynconfig file is not set */
+ return NULL;
+   }
+  else if (xtensa_dynconfig_env)
+   {
+ /* XTENSA_GNU_CONFIG has filepath */
+ return xstrdup (xtensa_dynconfig_env);
+   }
+  /* dynconfig is not set */
+  return NULL;
+}
+  if (!xtensa_dynconfig_env)
+{
+  /* XTENSA_GNU_CONFIG is not set; use the -mdynconfig filename */
+  return xstrdup (xtensa_dynconfig_file);
+}
+  if (!strlen (lbasename (xtensa_dynconfig_env)))
+{
+  /* XTENSA_GNU_CONFIG has directory path and dynconfig file is set */
+  const size_t len = strlen (xtensa_dynconfig_env) +
+ strlen (xtensa_dynconfig_file) + 1;
+  char *path = ( char *) xmalloc (len);
+  strcpy (path, xtensa_dynconfig_env);
+  strcat (path, xtensa_dynconfig_file);
+  return path;
+}
+  if (strcmp (lbasename (xtensa_dynconfig_env),
+

[PATCH 2/3] gcc: xtensa: use dynconfig settings as builtin-macros

2023-07-20 Thread Alexey Lapshin via Gcc-patches

gcc/
* config/xtensa/xtensa.h (XCHAL_HAVE_BE, XCHAL_HAVE_DENSITY,
  XCHAL_HAVE_CONST16, XCHAL_HAVE_ABS, XCHAL_HAVE_ADDX,
  XCHAL_HAVE_L32R, XSHAL_USE_ABSOLUTE_LITERALS,
  XSHAL_HAVE_TEXT_SECTION_LITERALS, XCHAL_HAVE_MAC16,
  XCHAL_HAVE_MUL16, XCHAL_HAVE_MUL32, XCHAL_HAVE_MUL32_HIGH,
  XCHAL_HAVE_DIV32, XCHAL_HAVE_NSA, XCHAL_HAVE_MINMAX,
  XCHAL_HAVE_SEXT, XCHAL_HAVE_LOOPS, XCHAL_HAVE_THREADPTR,
  XCHAL_HAVE_RELEASE_SYNC, XCHAL_HAVE_S32C1I,
  XCHAL_HAVE_BOOLEANS, XCHAL_HAVE_FP, XCHAL_HAVE_FP_DIV,
  XCHAL_HAVE_FP_RECIP, XCHAL_HAVE_FP_SQRT,
  XCHAL_HAVE_FP_RSQRT, XCHAL_HAVE_FP_POSTINC, XCHAL_HAVE_DFP,
  XCHAL_HAVE_DFP_DIV, XCHAL_HAVE_DFP_RECIP,
  XCHAL_HAVE_DFP_SQRT, XCHAL_HAVE_DFP_RSQRT,
  XCHAL_HAVE_WINDOWED, XCHAL_NUM_AREGS,
  XCHAL_HAVE_WIDE_BRANCHES, XCHAL_HAVE_PREDICTED_BRANCHES,
  XCHAL_ICACHE_SIZE, XCHAL_DCACHE_SIZE,
  XCHAL_ICACHE_LINESIZE, XCHAL_DCACHE_LINESIZE,
  XCHAL_ICACHE_LINEWIDTH, XCHAL_DCACHE_LINEWIDTH,
  XCHAL_DCACHE_IS_WRITEBACK, XCHAL_HAVE_MMU,
  XCHAL_MMU_MIN_PTE_PAGE_SIZE, XCHAL_HAVE_DEBUG,
  XCHAL_NUM_IBREAK, XCHAL_NUM_DBREAK, XCHAL_DEBUGLEVEL,
  XCHAL_MAX_INSTRUCTION_SIZE, XCHAL_INST_FETCH_WIDTH,
  XSHAL_ABI, XTHAL_ABI_WINDOWED, XTHAL_ABI_CALL0,
  XCHAL_M_STAGE, XTENSA_MARCH_LATEST, XTENSA_MARCH_EARLIEST,
  XCHAL_HAVE_CLAMPS, XCHAL_HAVE_DEPBITS,
  XCHAL_HAVE_EXCLUSIVE, XCHAL_HAVE_XEA3): Add builtin-macros
  with values from dynconfig.
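The XTENSA_CPU_CPP_BUILTIN helper added below relies on the preprocessor's stringizing operator: a macro argument used as the operand of # is not macro-expanded, so #OPT yields the macro's name, while the bare OPT is expanded to its value.  A standalone C sketch of that pattern, with a hypothetical recorder in place of builtin_define_with_int_value and a hypothetical fixed value in place of the dynconfig lookup:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one configuration value.  */
#define XCHAL_HAVE_DENSITY 1

/* Hypothetical recorder playing the role of builtin_define_with_int_value:
   it keeps the last name/value pair it was given.  */
static char last_name[64];
static long last_value;

static void
record_int_macro (const char *name, long value)
{
  snprintf (last_name, sizeof last_name, "%s", name);
  last_value = value;
}

/* Same shape as XTENSA_CPU_CPP_BUILTIN: #OPT becomes the string
   "XCHAL_HAVE_DENSITY" (unexpanded), OPT becomes the value 1.  */
#define CPU_CPP_BUILTIN(OPT) record_int_macro (#OPT, OPT)
```

So CPU_CPP_BUILTIN (XCHAL_HAVE_DENSITY) expands to record_int_macro ("XCHAL_HAVE_DENSITY", 1), which is why a single macro argument is enough to define each builtin under its own name.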
---
 gcc/config/xtensa/xtensa.h | 62 ++
 1 file changed, 62 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h
index 8ebf37cab33..a65b674915b 100644
--- a/gcc/config/xtensa/xtensa.h
+++ b/gcc/config/xtensa/xtensa.h
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #endif
 
 
+#define XTENSA_CPU_CPP_BUILTIN(OPT) builtin_define_with_int_value (#OPT, OPT)
 /* Target CPU builtins.  */
 #define TARGET_CPU_CPP_BUILTINS()  \
   do { \
@@ -82,6 +83,67 @@ along with GCC; see the file COPYING3.  If not see
  builtin_define ("__XTENSA_SOFT_FLOAT__"); \
 for (builtin = xtensa_get_config_strings (); *builtin; ++builtin)  \
   builtin_define (*builtin);   \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_BE); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_DENSITY); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_CONST16); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_ABS); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_ADDX); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_L32R); \
+XTENSA_CPU_CPP_BUILTIN(XSHAL_USE_ABSOLUTE_LITERALS); \
+XTENSA_CPU_CPP_BUILTIN(XSHAL_HAVE_TEXT_SECTION_LITERALS); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_MAC16); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_MUL16); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_MUL32); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_MUL32_HIGH); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_DIV32); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_NSA); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_MINMAX); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_SEXT); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_LOOPS); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_THREADPTR); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_RELEASE_SYNC); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_S32C1I); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_BOOLEANS); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_FP); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_FP_DIV); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_FP_RECIP); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_FP_SQRT); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_FP_RSQRT); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_FP_POSTINC); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_DFP); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_DFP_DIV); \
+XTENSA_CPU_CPP_BUILTIN(XCHAL_HAV

[PATCH 3/3] gcc: xtensa: add xtensa-esp-elf multilib

2023-07-20 Thread Alexey Lapshin via Gcc-patches

gcc/
* config.gcc: Add xtensa*-esp*-elf target.
* config/xtensa/t-esp-multilib: New file.
---
 gcc/config.gcc   |  6 ++
 gcc/config/xtensa/t-esp-multilib | 20 
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/config/xtensa/t-esp-multilib

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6fd1594480a..f972c71a0b2 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3512,6 +3512,12 @@ xstormy16-*-elf)
 xtensa*-*-elf*)
tm_file="${tm_file} elfos.h newlib-stdint.h xtensa/elf.h"
extra_options="${extra_options} xtensa/elf.opt"
+   tmake_file="${tmake_file} xtensa/t-xtensa"
+   case ${target} in
+   xtensa*-esp-elf*)
+   tmake_file="${tmake_file} xtensa/t-esp-multilib"
+   ;;
+   esac
;;
 xtensa*-*-linux*)
 tm_file="${tm_file} elfos.h gnu-user.h linux.h glibc-stdint.h xtensa/linux.h"
diff --git a/gcc/config/xtensa/t-esp-multilib b/gcc/config/xtensa/t-esp-multilib
new file mode 100644
index 000..dfc0ac0e04c
--- /dev/null
+++ b/gcc/config/xtensa/t-esp-multilib
@@ -0,0 +1,20 @@
+# Copyright (C) 2023 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+MULTILIB_OPTIONS = mdynconfig=xtensa_esp32.so/mdynconfig=xtensa_esp32s2.so/mdynconfig=xtensa_esp32s3.so fno-rtti
+MULTILIB_DIRNAMES = esp32 esp32s2 esp32s3 no-rtti
-- 
2.34.1
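For readers unfamiliar with the multilib variables: in MULTILIB_OPTIONS, slash-separated options are mutually exclusive alternatives and space-separated groups can be combined, and MULTILIB_DIRNAMES gives one directory name per option. The following is a minimal sketch of how the two t-esp-multilib lines could expand into library directories. It assumes no MULTILIB_EXCEPTIONS or MULTILIB_MATCHES are in play, and the helper `multilib_dirs` is hypothetical, not GCC's genmultilib script.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not GCC's genmultilib): expands MULTILIB_OPTIONS-style
// groups into directory names.  Each inner vector is one space-separated
// group; its entries are the slash-separated, mutually exclusive
// alternatives.  A group may also be left out entirely, which is what
// produces the default multilib.  MULTILIB_EXCEPTIONS and friends are
// ignored here.
std::vector<std::string>
multilib_dirs (const std::vector<std::vector<std::string>> &groups)
{
  std::vector<std::string> dirs {""};  // "" stands for the default multilib
  for (const auto &group : groups)
    {
      std::vector<std::string> next;
      for (const auto &prefix : dirs)
        {
          next.push_back (prefix);  // no option from this group given
          for (const auto &dir : group)
            next.push_back (prefix.empty () ? dir : prefix + "/" + dir);
        }
      dirs = next;
    }
  return dirs;
}
```

Under these assumptions, `multilib_dirs ({{"esp32", "esp32s2", "esp32s3"}, {"no-rtti"}})` yields eight directories: the default, `no-rtti`, and each of `esp32`, `esp32s2` and `esp32s3` both with and without a `/no-rtti` suffix.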

Re: [PATCH v4 1/3] c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]

2023-07-20 Thread Jason Merrill via Gcc-patches


On 7/20/23 05:35, Nathaniel Shead wrote:

This adds rudimentary lifetime tracking in C++ constexpr contexts,
allowing the compiler to report errors with using values after their
backing has gone out of scope. We don't yet handle other ways of
accessing values outside their lifetime (e.g. following explicit
destructor calls).


Incidentally, much of that should be straightforward to handle by no 
longer ignoring clobbers here:



case MODIFY_EXPR:
  if (cxx_dialect < cxx14)
goto fail;
  if (!RECUR (TREE_OPERAND (t, 0), any))
return false;
  /* Just ignore clobbers.  */
  if (TREE_CLOBBER_P (TREE_OPERAND (t, 1)))
return true;


Assignment from a clobber represents end of lifetime to the middle-end. 
This can be a follow-up patch.



@@ -7051,10 +7065,17 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
return ctx->ctor;
if (VAR_P (t))
if (tree v = ctx->global->get_value (t))
-   {
- r = v;
- break;
-   }
+ {
+   r = v;
+   break;
+ }
+  if (ctx->global->is_outside_lifetime (t))
+   {
+ if (!ctx->quiet)
+   outside_lifetime_error (loc, t);
+ *non_constant_p = true;
+ break;
+   }


Shouldn't this new check also be under the if (VAR_P (t))?  A CONST_DECL 
can't go out of scope.


Jason

[committed] Document new analyzer parameters

2023-07-20 Thread Martin Jambor

Hi,

This patch documents the analyzer parameters introduced in
r14-2029-g0e466e978c7286 also in gcc/doc/invoke.texi.

Committed as obvious after testing with make pdf and make info and
eyeballing the result.

Thanks,

Martin


2023-07-20  Martin Jambor  

* doc/invoke.texi (analyzer-text-art-string-ellipsis-threshold): New.
(analyzer-text-art-ideal-canvas-width): Likewise.
(analyzer-text-art-string-ellipsis-head-len): Likewise.
(analyzer-text-art-string-ellipsis-tail-len): Likewise.

---
 gcc/doc/invoke.texi | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d3c821e208a..5628c08214d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16324,6 +16324,18 @@ The parameter is used only in GIMPLE FE.
 The maximum number of 'after supernode' exploded nodes within the analyzer
 per supernode, before terminating analysis.
 
+@item analyzer-text-art-string-ellipsis-threshold
+The number of bytes at which to ellipsize string literals in analyzer text art diagrams.
+
+@item analyzer-text-art-ideal-canvas-width
+The ideal width in characters of text art diagrams generated by the analyzer.
+
+@item analyzer-text-art-string-ellipsis-head-len
+The number of literal bytes to show at the head of a string literal in text art when ellipsizing it.
+
+@item analyzer-text-art-string-ellipsis-tail-len
+The number of literal bytes to show at the tail of a string literal in text art when ellipsizing it.
+
 @item ranger-logical-depth
 Maximum depth of logical expression evaluation ranger will look through
 when evaluating outgoing edge ranges.
-- 
2.41.0
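The three string-ellipsis parameters documented above are meant to work together. The following is a minimal sketch of one plausible reading. It assumes that a string whose length reaches the threshold is cut down to head-len bytes, a marker, and tail-len bytes. The helper name `ellipsize` and the `...` marker are hypothetical, and this is not the analyzer's actual code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper illustrating how the three parameters could interact;
// it is not GCC's analyzer implementation.
//   threshold - strings of at least this many bytes get ellipsized
//   head_len  - literal bytes kept from the start of the string
//   tail_len  - literal bytes kept from the end of the string
std::string ellipsize (const std::string &s, std::size_t threshold,
                       std::size_t head_len, std::size_t tail_len)
{
  // Short strings, or strings that head + tail would already cover
  // completely, are shown in full.
  if (s.size () < threshold || head_len + tail_len >= s.size ())
    return s;
  return s.substr (0, head_len) + "..." + s.substr (s.size () - tail_len);
}
```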

Re: [PATCH] tree-optimization/110742 - fix latent issue with permuting existing vectors

2023-07-20 Thread Richard Biener via Gcc-patches




> On 20.07.2023 at 16:09, Richard Sandiford wrote:
> 
> Richard Biener via Gcc-patches  writes:
>> When we materialize a layout we push edge permutes to constant/external
>> defs without checking we can actually do so.  For externals defined
>> by vector stmts rather than scalar components we can't.
>> 
>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> 
>> OK?
>> 
>> Thanks,
>> Richard.
>> 
>>PR tree-optimization/110742
>>* tree-vect-slp.cc (vect_optimize_slp_pass::get_result_with_layout):
>>Do not materialize an edge permutation in an external node with
>>vector defs.
>>(vect_slp_analyze_node_operations_1): Guard purely internal
>>nodes better.
>> 
>>* g++.dg/torture/pr110742.C: New testcase.
>> ---
>> gcc/testsuite/g++.dg/torture/pr110742.C | 47 +
>> gcc/tree-vect-slp.cc|  8 +++--
>> 2 files changed, 53 insertions(+), 2 deletions(-)
>> create mode 100644 gcc/testsuite/g++.dg/torture/pr110742.C
>> 
>> diff --git a/gcc/testsuite/g++.dg/torture/pr110742.C b/gcc/testsuite/g++.dg/torture/pr110742.C
>> new file mode 100644
>> index 000..d41ac0479d2
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/torture/pr110742.C
>> @@ -0,0 +1,47 @@
>> +// { dg-do compile }
>> +
>> +struct HARD_REG_SET {
>> +  HARD_REG_SET operator~() const {
>> +HARD_REG_SET res;
>> +for (unsigned int i = 0; i < (sizeof(elts) / sizeof((elts)[0])); ++i)
>> +  res.elts[i] = ~elts[i];
>> +return res;
>> +  }
>> +  HARD_REG_SET operator&(const HARD_REG_SET &other) const {
>> +HARD_REG_SET res;
>> +for (unsigned int i = 0; i < (sizeof(elts) / sizeof((elts)[0])); ++i)
>> +  res.elts[i] = elts[i] & other.elts[i];
>> +return res;
>> +  }
>> +  unsigned long elts[4];
>> +};
>> +typedef const HARD_REG_SET &const_hard_reg_set;
>> +inline bool hard_reg_set_subset_p(const_hard_reg_set x, const_hard_reg_set y) {
>> +  unsigned long bad = 0;
>> +  for (unsigned int i = 0; i < (sizeof(x.elts) / sizeof((x.elts)[0])); ++i)
>> +bad |= (x.elts[i] & ~y.elts[i]);
>> +  return bad == 0;
>> +}
>> +inline bool hard_reg_set_empty_p(const_hard_reg_set x) {
>> +  unsigned long bad = 0;
>> +  for (unsigned int i = 0; i < (sizeof(x.elts) / sizeof((x.elts)[0])); ++i)
>> +bad |= x.elts[i];
>> +  return bad == 0;
>> +}
>> +extern HARD_REG_SET rr[2];
>> +extern int t[2];
>> +extern HARD_REG_SET nn;
>> +static HARD_REG_SET mm;
>> +void setup_reg_class_relations(void) {
>> +  HARD_REG_SET intersection_set, union_set, temp_set2;
>> +  for (int cl2 = 0; cl2 < 2; cl2++) {
>> +temp_set2 = rr[cl2] & ~nn;
>> +if (hard_reg_set_empty_p(mm) && hard_reg_set_empty_p(temp_set2)) {
>> +  mm = rr[0] & nn;
>> +  if (hard_reg_set_subset_p(mm, intersection_set))
>> +if (!hard_reg_set_subset_p(mm, temp_set2) ||
>> +hard_reg_set_subset_p(rr[0], rr[t[cl2]]))
>> +  t[cl2] = 0;
>> +}
>> +  }
>> +}
>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> index 693621ca990..1d79c77e8ce 100644
>> --- a/gcc/tree-vect-slp.cc
>> +++ b/gcc/tree-vect-slp.cc
>> @@ -5198,7 +5198,10 @@ vect_optimize_slp_pass::get_result_with_layout (slp_tree node,
>> return result;
>> 
>>   if (SLP_TREE_DEF_TYPE (node) == vect_constant_def
>> -  || SLP_TREE_DEF_TYPE (node) == vect_external_def)
>> +  || (SLP_TREE_DEF_TYPE (node) == vect_external_def
>> +  && (to_layout_i == 0
>> +  /* We can't permute vector defs.  */
>> +  || SLP_TREE_VEC_DEFS (node).is_empty (
> 
> Guess it's personal preference, but IMO it's easier to follow without the
> to_layout_i condition, so that it ties directly to the create_partitions
> test.

I don’t understand: in the code guarding this we seem to expect
to_layout_i == 0, and that’s the case we can handle as a no-op.  I didn’t
understand why the function doesn’t always just do nothing in this case,
though, so I must have missed something.

Richard 


>  (Would be nice to have a name for whatever a node matching the new
> condition is, but I don't have any good ideas.)
> 
> LGTM otherwise FWIW.
> 
> Thanks,
> Richard
> 
>> {
>>   /* If the vector is uniform or unchanged, there's nothing to do.  */
>>   if (to_layout_i == 0 || vect_slp_tree_uniform_p (node))
>> @@ -5944,7 +5947,8 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, slp_tree node,
>>  calculated by the recursive call).  Otherwise it is the number of
>>  scalar elements in one scalar iteration (DR_GROUP_SIZE) multiplied by
>>  VF divided by the number of elements in a vector.  */
>> -  if (!STMT_VINFO_DATA_REF (stmt_info)
>> +  if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
>> +  && !STMT_VINFO_DATA_REF (stmt_info)
>>   && REDUC_GROUP_FIRST_ELEMENT (stmt_info))
>> {
>>   for (unsigned i = 0; i < SLP_TREE_CHILDREN (node).length (); ++i)

Re: [committed] Document new analyzer parameters

2023-07-20 Thread David Malcolm via Gcc-patches

On Thu, 2023-07-20 at 16:47 +0200, Martin Jambor wrote:
> Hi,
> 
> This patch documents the analyzer parameters introduced in
> r14-2029-g0e466e978c7286 also in gcc/doc/invoke.texi.
> 
> Committed as obvious after testing with make pdf and make info and
> eyeballing the result.
> 
> Thanks,

Thanks
Dave

Re: [PATCH 2/3] gcc: xtensa: use dynconfig settings as builtin-macros

2023-07-20 Thread Max Filippov via Gcc-patches

On Thu, Jul 20, 2023 at 7:37 AM Alexey Lapshin
 wrote:
>
> gcc/
> * config/xtensa/xtensa.h (XCHAL_HAVE_BE, XCHAL_HAVE_DENSITY,
>   XCHAL_HAVE_CONST16, XCHAL_HAVE_ABS, XCHAL_HAVE_ADDX,
>   XCHAL_HAVE_L32R, XSHAL_USE_ABSOLUTE_LITERALS,
>   XSHAL_HAVE_TEXT_SECTION_LITERALS, XCHAL_HAVE_MAC16,
>   XCHAL_HAVE_MUL16, XCHAL_HAVE_MUL32, XCHAL_HAVE_MUL32_HIGH,
>   XCHAL_HAVE_DIV32, XCHAL_HAVE_NSA, XCHAL_HAVE_MINMAX,
>   XCHAL_HAVE_SEXT, XCHAL_HAVE_LOOPS, XCHAL_HAVE_THREADPTR,
>   XCHAL_HAVE_RELEASE_SYNC, XCHAL_HAVE_S32C1I,
>   XCHAL_HAVE_BOOLEANS, XCHAL_HAVE_FP, XCHAL_HAVE_FP_DIV,
>   XCHAL_HAVE_FP_RECIP, XCHAL_HAVE_FP_SQRT,
>   XCHAL_HAVE_FP_RSQRT, XCHAL_HAVE_FP_POSTINC, XCHAL_HAVE_DFP,
>   XCHAL_HAVE_DFP_DIV, XCHAL_HAVE_DFP_RECIP,
>   XCHAL_HAVE_DFP_SQRT, XCHAL_HAVE_DFP_RSQRT,
>   XCHAL_HAVE_WINDOWED, XCHAL_NUM_AREGS,
>   XCHAL_HAVE_WIDE_BRANCHES, XCHAL_HAVE_PREDICTED_BRANCHES,
>   XCHAL_ICACHE_SIZE, XCHAL_DCACHE_SIZE,
>   XCHAL_ICACHE_LINESIZE, XCHAL_DCACHE_LINESIZE,
>   XCHAL_ICACHE_LINEWIDTH, XCHAL_DCACHE_LINEWIDTH,
>   XCHAL_DCACHE_IS_WRITEBACK, XCHAL_HAVE_MMU,
>   XCHAL_MMU_MIN_PTE_PAGE_SIZE, XCHAL_HAVE_DEBUG,
>   XCHAL_NUM_IBREAK, XCHAL_NUM_DBREAK, XCHAL_DEBUGLEVEL,
>   XCHAL_MAX_INSTRUCTION_SIZE, XCHAL_INST_FETCH_WIDTH,
>   XSHAL_ABI, XTHAL_ABI_WINDOWED, XTHAL_ABI_CALL0,
>   XCHAL_M_STAGE, XTENSA_MARCH_LATEST, XTENSA_MARCH_EARLIEST,
>   XCHAL_HAVE_CLAMPS, XCHAL_HAVE_DEPBITS,
>   XCHAL_HAVE_EXCLUSIVE, XCHAL_HAVE_XEA3): Add builtin-macros
>   with values from dynconfig.
> ---
>  gcc/config/xtensa/xtensa.h | 62 ++
>  1 file changed, 62 insertions(+)
>
> diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h
> index 8ebf37cab33..a65b674915b 100644
> --- a/gcc/config/xtensa/xtensa.h
> +++ b/gcc/config/xtensa/xtensa.h
> @@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
>  #endif
>
>
> +#define XTENSA_CPU_CPP_BUILTIN(OPT) builtin_define_with_int_value (#OPT, OPT)
>  /* Target CPU builtins.  */
>  #define TARGET_CPU_CPP_BUILTINS()  \
>do { \
> @@ -82,6 +83,67 @@ along with GCC; see the file COPYING3.  If not see
>builtin_define ("__XTENSA_SOFT_FLOAT__");  
>   \
>  for (builtin = xtensa_get_config_strings (); *builtin; ++builtin)  \
>builtin_define (*builtin);   \

The loop above already does the same thing, doesn't it?

> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_BE);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_DENSITY);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_CONST16);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_ABS);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_ADDX);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_L32R);   \
> +XTENSA_CPU_CPP_BUILTIN(XSHAL_USE_ABSOLUTE_LITERALS);   \
> +XTENSA_CPU_CPP_BUILTIN(XSHAL_HAVE_TEXT_SECTION_LITERALS);  \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_MAC16);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_MUL16);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_MUL32);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_MUL32_HIGH);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_DIV32);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_NSA);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_MINMAX);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_SEXT);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_LOOPS);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_THREADPTR);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_RELEASE_SYNC);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_S32C1I);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_BOOLEANS);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_FP);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_FP_DIV);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_FP_RECIP);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_FP_SQRT);   \
> +XTENSA_CPU_CPP_BUILTIN(XCHAL_HAVE_FP_RSQRT);
>

Re: [PATCH, OpenACC 2.7] readonly modifier support in front-ends

2023-07-20 Thread Tobias Burnus


Hi Thomas & Chung-Lin,


On 20.07.23 15:33, Thomas Schwinge wrote:

On 2023-07-11T02:33:58+0800, Chung-Lin Tang
 wrote:

+++ b/gcc/c/c-parser.cc
@@ -14059,7 +14059,8 @@ c_parser_omp_variable_list (c_parser *parser,

  static tree
  c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
-   tree list, bool allow_deref = false)
+   tree list, bool allow_deref = false,
+   bool *readonly = NULL)
...

Instead of doing this in 'c_parser_omp_var_list_parens', I think it's
clearer to have this special 'readonly :' parsing logic in the two places
where it's used.


I concur. The same issue also occurred for OpenMP's
c_parser_omp_clause_to, and c_parser_omp_clause_from and the 'present'
modifier. For it, I created a combined function but the main reason for
that is that OpenMP also permits more modifiers (like 'iterators'),
which would cause more duplication of code ('iterator' is not yet
supported).

For something as simple to parse as this modifier, I would just do it at
the two places – as Thomas suggested.


+++ b/gcc/fortran/gfortran.h
@@ -1360,7 +1360,11 @@ typedef struct gfc_omp_namelist
  {
gfc_omp_reduction_op reduction_op;
gfc_omp_depend_doacross_op depend_doacross_op;
-  gfc_omp_map_op map_op;
+  struct
+{
+   ENUM_BITFIELD (gfc_omp_map_op) map_op:8;
+   bool readonly;
+};
gfc_expr *align;
struct
   {

[...] Thus, the above looks good to me.


I concur, but I wonder whether it would be cleaner to name the struct;
this also makes it more obvious what belongs together in the union.

That is, name the struct 'map' and change the 45 users from 'u.map_op'
to 'u.map.op', and the new 'u.readonly' to 'u.map.readonly'; this seems
cleaner.
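The following is a minimal sketch of the union layout with a named struct, as suggested. It uses hypothetical, heavily simplified stand-ins for gfc_omp_namelist and gfc_omp_map_op. The real union has more members, and GCC spells the bit-field with its ENUM_BITFIELD macro rather than a plain enum bit-field.

```cpp
// Hypothetical, simplified stand-ins; only the shape of the union matters.
enum map_op_sketch { MAP_ALLOC, MAP_TO, MAP_FROM, MAP_TOFROM };

struct namelist_sketch
{
  union
  {
    int reduction_op;        // stands in for the other union members
    struct
    {
      map_op_sketch op : 8;  // was 'u.map_op' with the anonymous struct
      bool readonly;         // was 'u.readonly' with the anonymous struct
    } map;                   // named, so users write 'u.map.op'
  } u;
};
```

Naming the struct makes it visible at each use site that `op` and `readonly` belong together, at the cost of touching every existing `u.map_op` user.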


  static bool
  gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
-   bool allow_common, bool allow_derived)
+   bool allow_common, bool allow_derived, bool readonly = false)
  {

Similar to 'c_parser_omp_var_list_parens' above,


I concur that not doing it here is cleaner.


again, for
example (random), like 'ancestor :', or 'conditional :' are parsed --
which you're mostly already doing


I think OpenMP's "present" (as a modifier to "omp target update"'s
"to"/"from") is a better example than "ancestor", as for "present" we also
have a list. See gfc_match_motion_var_list for how to handle the headp.

(There, an extra function was used because other modifiers like 'iterator'
will also be supported in the future.)

However, as Thomas noted, the patch contains also an example (see
further down in Thomas' email, not quoted here).


Or, we could add a new 'gcc/fortran/gfortran.h:gfc_omp_map_op' item
'OMP_MAP_TO_READONLY', which eventually translates into 'OMP_MAP_TO' with
'readonly' set?


I think having the additional flag is easier to understand, and at least
memory-wise there is no saving, as it is in a union. The advantage
of not having a union is that accessing the int-enum is faster than accessing
a char-wide bitfield enum.

In terms of code changes (and without having a closer look), the two
approaches seem to be similar.

Hence, using OMP_MAP_TO_READONLY for OpenACC would be fine, too. And
I do not have a strong preference for either.
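The OMP_MAP_TO_READONLY alternative can be sketched as follows. This assumes a dedicated enumerator that is folded back into the plain "to" operation plus a readonly flag when the clause is translated. All names here are hypothetical stand-ins, not gfortran's real gfc_omp_map_op enumerators.

```cpp
// Hypothetical stand-ins; the real gfc_omp_map_op has many more values.
enum map_kind_sketch { KIND_TO, KIND_TO_READONLY, KIND_FROM };

struct lowered_map
{
  map_kind_sketch kind;
  bool readonly;
};

// Fold the dedicated "readonly" enumerator back into the plain "to" kind
// plus a flag; every other kind passes through unchanged.
lowered_map lower_map_kind (map_kind_sketch k)
{
  if (k == KIND_TO_READONLY)
    return {KIND_TO, true};
  return {k, false};
}
```

With this variant, clause resolution would still need to treat KIND_TO_READONLY like KIND_TO everywhere the two should behave the same.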

* * *

I did wonder about the following, but I now believe it won't affect
the choice. Namely, we want to handle at some point the following:

!$omp target firstprivate(var) allocator(omp_const_mem_alloc: var)

This could be turned into  GOMP_MAP_FIRSTPRIVATE... + OMP_.*READONLY flag.

But if we don't do it in the FE, the internal Fortran representation
does not matter.
Advantage for doing it in the ME: Only one code location, especially as
we might use the opportunity to also check that the omp_const_mem_alloc
is only used with privatization (in OpenMP).

Difference: OpenMP uses 'firstprivate' (i.e. a private copy, no reference-count
bump, only permitted for 'target') while OpenACC uses 'copy', which implies
reference counting and is permitted in 'acc (enter/exit) data' and not only
for compute constructs.

OpenMP in principle also permits user-defined allocator with a constant
memory space - I am not completely sure whether/when it can be used with
  omp target firstprivate(...) allocator(my_alloc : ...)



Then we'd just here call the (unaltered)
'gfc_match_omp_map_clause', with
'readonly ? OMP_MAP_TO_READONLY : OMP_MAP_TO'?  Per
'git grep --cached '[^G]OMP_MAP_TO[^F]' -- gcc/fortran/' not a lot of
places need adjusting for that (most of the 'gcc/fortran/openmp.cc' ones
are not applicable).


I think either would work. – I have no strong feeling what's better.
But you still need to handle it for clause resolution.


+ if (gfc_match ("readonly :") == MATCH_YES)
I note this one does not have a space after ':' in 'gfc_match', but the
one above in 'gfc_match_omp_clauses' does.  I don't know off-hand if that
makes a di

Re: [PATCH 2/3] gcc: xtensa: use dynconfig settings as builtin-macros

2023-07-20 Thread Alexey Lapshin via Gcc-patches

Oops, I missed this loop while implementing...

I had a problem building the ESP chips multilib until I added my changes.

This loop seems to just define macros without values.
But the value must be set to make it work correctly.
It uses builtin_define() instead of builtin_define_with_int_value().

I will check how it could be solved with the loop approach.

Re: [PATCH 2/3] gcc: xtensa: use dynconfig settings as builtin-macros

2023-07-20 Thread Max Filippov via Gcc-patches

On Thu, Jul 20, 2023 at 8:12 AM Alexey Lapshin
 wrote:
>
> Oops, I missed this loop while implementing...
>
> I had a problem building the ESP chips multilib until I added my changes.
>
> This loop seems to just define macros without values.

But it defines them with their respective values.
Just notice that it adds two leading underscores in front of the names.

> But the value must be set to make it work correctly.
> It uses builtin_define() instead of builtin_define_with_int_value()
>
> I will check how it could be solved with the loop approach.

-- 
Thanks.
-- Max
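Max points out that the existing loop already defines these settings with their values, only with two leading underscores on the names. The patch's `XTENSA_CPU_CPP_BUILTIN` instead builds the name by stringizing its argument. The following is a minimal sketch of how that macro expands. It assumes a stub `builtin_define_with_int_value` that merely records `NAME=VALUE` (in GCC the real function defines a preprocessor macro) and hypothetical values for two `XCHAL_*` settings.

```cpp
#include <string>

// Hypothetical values standing in for two dynconfig-provided settings.
#define XCHAL_HAVE_BE 0
#define XCHAL_HAVE_DENSITY 1

// Hypothetical stub: records "NAME=VALUE" instead of defining a macro.
static std::string last_define;
static void builtin_define_with_int_value (const char *name, long value)
{
  last_define = std::string (name) + "=" + std::to_string (value);
}

// The macro from the patch: #OPT stringizes the argument as written (no
// macro expansion), while the plain OPT operand expands to its value.
#define XTENSA_CPU_CPP_BUILTIN(OPT) builtin_define_with_int_value (#OPT, OPT)
```

For example, `XTENSA_CPU_CPP_BUILTIN (XCHAL_HAVE_DENSITY);` records `XCHAL_HAVE_DENSITY=1`, with no underscore prefix on the name.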

Re: [PATCH v4 2/3] c++: Improve constexpr error for dangling local variables [PR110619]

2023-07-20 Thread Jason Merrill via Gcc-patches


On 7/20/23 05:36, Nathaniel Shead wrote:

Currently, when typeck discovers that a return statement will refer to a
local variable it rewrites to return a null pointer. This causes the
error messages for using the return value in a constant expression to be
unhelpful, especially for reference return values.

This patch removes this "optimisation".


This isn't an optimization, it's for safety, removing a way for an 
attacker to get a handle on other data on the stack (CWE-562).


But I agree that we need to preserve some element of UB for constexpr 
evaluation to see.


Perhaps we want to move this transformation to 
cp_maybe_instrument_return, so it happens after maybe_save_constexpr_fundef?



Relying on this raises a warning
by default and causes UB anyway, so there should be no issue in doing
so. We also suppress additional warnings from later passes that detect
this as a dangling pointer, since we've already indicated this anyway.

PR c++/110619

gcc/cp/ChangeLog:

* semantics.cc (finish_return_stmt): Suppress dangling pointer
reporting on return statement if already reported.
* typeck.cc (check_return_expr): Don't set return expression to
zero for dangling addresses.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-lifetime5.C: Test reported message is
correct.
* g++.dg/cpp1y/constexpr-lifetime6.C: Likewise.
* g++.dg/cpp1y/constexpr-110619.C: New test.
* g++.dg/warn/Wreturn-local-addr-6.C: Remove check for return
value optimisation.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/semantics.cc  |  5 -
  gcc/cp/typeck.cc |  5 +++--
  gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C| 10 ++
  gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C |  4 ++--
  gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C |  8 
  gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C |  3 ---
  6 files changed, 23 insertions(+), 12 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..107407de513 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -1260,7 +1260,10 @@ finish_return_stmt (tree expr)
  
r = build_stmt (input_location, RETURN_EXPR, expr);

if (no_warning)
-suppress_warning (r, OPT_Wreturn_type);
+{
+  suppress_warning (r, OPT_Wreturn_type);
+  suppress_warning (r, OPT_Wdangling_pointer_);
+}
r = maybe_cleanup_point_expr_void (r);
r = add_stmt (r);
  
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 859b133a18d..47233b3b717 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -11273,8 +11273,9 @@ check_return_expr (tree retval, bool *no_warning)
else if (!processing_template_decl
   && maybe_warn_about_returning_address_of_local (retval, loc)
   && INDIRECT_TYPE_P (valtype))
-   retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
-build_zero_cst (TREE_TYPE (retval)));
+   /* Suppress the Wdangling-pointer warning in the return statement
+  that would otherwise occur.  */
+   *no_warning = true;
  }
  
/* A naive attempt to reduce the number of -Wdangling-reference false

diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C
new file mode 100644
index 000..cca13302238
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C
@@ -0,0 +1,10 @@
+// { dg-do compile { target c++14 } }
+// { dg-options "-Wno-return-local-addr" }
+// PR c++/110619
+
+constexpr auto f() {
+int i = 0;
+return &i;
+};
+
+static_assert( f() != nullptr );
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
index a4bc71d890a..ad3ef579f63 100644
--- a/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
@@ -1,11 +1,11 @@
  // { dg-do compile { target c++14 } }
  // { dg-options "-Wno-return-local-addr" }
  
-constexpr const int& id(int x) { return x; }
+constexpr const int& id(int x) { return x; }  // { dg-message "note: declared here" }
  
  constexpr bool test() {

const int& y = id(3);
return y == 3;
  }
  
-constexpr bool x = test();  // { dg-error "" }
+constexpr bool x = test();  // { dg-error "accessing object outside its lifetime" }
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C
index f358aff4490..b81e89af79c 100644
--- a/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C
@@ -4,12 +4,12 @@
  struct Empty {};
  
  constexpr const Empty& empty() {
-  return Empty{};
+  return Empty{};  // { dg-message "note: declared here" }
  }
  
-constexpr const Empty& empty_parm(Empty e) {
+constexpr const Empty& empty_parm(Empt

Re: [PATCH] c++: passing partially inst tmpl as ttp [PR110566]

2023-07-20 Thread Patrick Palka via Gcc-patches

On Wed, 19 Jul 2023, Jason Merrill wrote:

> On 7/19/23 14:05, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk/13?
> > 
> > -- >8 --
> > 
> > Since the arguments 'pargs' passed to the coerce_template_parms from
> > coerce_template_template_parms are always a full set, we need to make sure
> > we always pass the parameters of the most general template because if the
> > template is partially instantiated then the levels won't match up.
> 
> Hmm, so then my comment "In that case we might end up adding more levels than
> needed, but that shouldn't be a problem; any args we need to refer to are at
> the right level." is wrong for auto template parms?

I suppose, but only for the ttp case I think?  It seems all is well when
passing an ordinary template as a ttp as long as we use the parameters of the
most general template for the coercion.  I can't come up with a counterexample
at least.

> 
> So I guess we likely need to do more to assure that pargs has the right number
> of levels if there are autos in the innermost arg parms.
> 
> Also, most_general_template doesn't work for TTPs, so that probably won't help
> handle their partial instantiations.

Ah, yeah :( Here's an analogous testcase that we still ICE on due to this:

  template class>
  struct A;

  template
  struct B {
template
struct C {
  template class TT>
  using type = A;
};
  };

  template struct B::C;

I think I have a fix using get_innermost_template_args.

> 
> And is it right for an alias template in a partial specialization?

IIUC yes, e.g. tsubst directly relies on most_general_template to work
for alias templates (when substituting an alias template specialization).

> 
> > In the
> > testcase below during said call to coerce_template_parms the parameters
> > are {X, Y} both level 1, but the arguments are {{int}, {N, M}}, which
> > leads to a crash during auto deduction of X and Y.
> > 
> > PR c++/110566
> 
> Since this is a regression from the patch for PR c++/108179, please list that
> PR here as well, to help avoid backporting that patch without this one.
> 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (coerce_template_template_parms): Simplify by using
> > DECL_INNERMOST_TEMPLATE_PARMS and removing redundant asserts.
> > Always pass the parameters of the most general template to
> > coerce_template_parms.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/template/ttp38.C: New test.
> > ---
> >   gcc/cp/pt.cc  | 12 +---
> >   gcc/testsuite/g++.dg/template/ttp38.C | 12 
> >   2 files changed, 17 insertions(+), 7 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/template/ttp38.C
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index d882e9dd117..8723868823e 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -8073,12 +8073,10 @@ coerce_template_template_parms (tree parm_tmpl,
> > tree parm, arg;
> > int variadic_p = 0;
> >   -  tree parm_parms = INNERMOST_TEMPLATE_PARMS (DECL_TEMPLATE_PARMS (parm_tmpl));
> > -  tree arg_parms_full = DECL_TEMPLATE_PARMS (arg_tmpl);
> > -  tree arg_parms = INNERMOST_TEMPLATE_PARMS (arg_parms_full);
> > -
> > -  gcc_assert (TREE_CODE (parm_parms) == TREE_VEC);
> > -  gcc_assert (TREE_CODE (arg_parms) == TREE_VEC);
> > +  tree parm_parms = DECL_INNERMOST_TEMPLATE_PARMS (parm_tmpl);
> > +  tree arg_parms = DECL_INNERMOST_TEMPLATE_PARMS (arg_tmpl);
> > +  tree gen_arg_tmpl = most_general_template (arg_tmpl);
> > +  tree gen_arg_parms = DECL_INNERMOST_TEMPLATE_PARMS (gen_arg_tmpl);
> >   nparms = TREE_VEC_LENGTH (parm_parms);
> > nargs = TREE_VEC_LENGTH (arg_parms);
> > @@ -8134,7 +8132,7 @@ coerce_template_template_parms (tree parm_tmpl,
> > scope_args = TI_ARGS (tinfo);
> > pargs = add_to_template_args (scope_args, pargs);
> >   -  pargs = coerce_template_parms (arg_parms, pargs, NULL_TREE, tf_none);
> > +  pargs = coerce_template_parms (gen_arg_parms, pargs, NULL_TREE, tf_none);
> > if (pargs != error_mark_node)
> > {
> >   tree targs = make_tree_vec (nargs);
> > diff --git a/gcc/testsuite/g++.dg/template/ttp38.C b/gcc/testsuite/g++.dg/template/ttp38.C
> > new file mode 100644
> > index 000..7d25d291e81
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/template/ttp38.C
> > @@ -0,0 +1,12 @@
> > +// PR c++/110566
> > +// { dg-do compile { target c++20 } }
> > +
> > +template class>
> > +struct A;
> > +
> > +template
> > +struct B {
> > +  template struct C;
> > +};
> > +
> > +using type = A::C>;
> 
>

Re: [PATCH v2] libstdc++: Define _GLIBCXX_HAS_BUILTIN_TRAIT

2023-07-20 Thread Patrick Palka via Gcc-patches

On Wed, Jul 19, 2023 at 3:33 PM Ken Matsui via Gcc-patches
 wrote:
>
> This patch defines _GLIBCXX_HAS_BUILTIN_TRAIT macro, which will be used
> as a flag to toggle the use of built-in traits in the type_traits header
> through _GLIBCXX_NO_BUILTIN_TRAITS macro, without needing to modify the
> source code.

LGTM!

>
> libstdc++-v3/ChangeLog:
>
> * include/bits/c++config (_GLIBCXX_HAS_BUILTIN_TRAIT): Define.
> (_GLIBCXX_HAS_BUILTIN): Keep defined.
>
> Signed-off-by: Ken Matsui 
> ---
>  libstdc++-v3/include/bits/c++config | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
> index dd47f274d5f..984985d6fff 100644
> --- a/libstdc++-v3/include/bits/c++config
> +++ b/libstdc++-v3/include/bits/c++config
> @@ -854,7 +854,15 @@ namespace __gnu_cxx
>  # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
>  #endif
>
> -#undef _GLIBCXX_HAS_BUILTIN
> +// Returns 1 if _GLIBCXX_NO_BUILTIN_TRAITS is not defined and the compiler
> +// has a corresponding built-in type trait, 0 otherwise.
> +// _GLIBCXX_NO_BUILTIN_TRAITS can be defined to disable the use of built-in
> +// traits.
> +#ifndef _GLIBCXX_NO_BUILTIN_TRAITS
> +# define _GLIBCXX_HAS_BUILTIN_TRAIT(BT) _GLIBCXX_HAS_BUILTIN(BT)
> +#else
> +# define _GLIBCXX_HAS_BUILTIN_TRAIT(BT) 0
> +#endif
>
>  // Mark code that should be ignored by the compiler, but seen by Doxygen.
>  #define _GLIBCXX_DOXYGEN_ONLY(X)
> --
> 2.41.0
>

Re: [PATCH 2/3] gcc: xtensa: use dynconfig settings as builtin-macros

2023-07-20 Thread Alexey Lapshin via Gcc-patches

I see now, thanks for the explanation. I will try to rebuild the toolchain
without this particular patch.

BTW, what do you think about moving the config from the newlib overlay into dynconfig?

[committed] libgomp.texi: Split OpenMP routines chapter into sections

2023-07-20 Thread Tobias Burnus


When recently looking at the libgomp documentation, here the current GCC 13 
version,
  
https://gcc.gnu.org/onlinedocs/gcc-13.1.0/libgomp/Runtime-Library-Routines.html
I found both the order and the wording confusing:
"The routines are structured in following three parts:"

as I did not see any separation. In the "info" version, one could indeed see:

  Control threads, processors and the parallel environment.  They have C
  linkage, and do not throw exceptions.
 (long list of routines)
  Initialize, set, test, unset and destroy simple and nested locks.
(short list)
  Portable, thread-based, wall clock timer.
 (two routines)
  Support for event objects.
 (one routine)

which are the three parts - or do I count four?

OpenMP itself split the long list into several parts; the attached commit does
likewise. Result (before or after the change, depending when you have a look):
  https://gcc.gnu.org/onlinedocs/libgomp/

(The commit also fixed a typo in the OMP_ALLOCATOR example.)

Committed as r14-2681-g506f068e7d01ad

* * *

If you look at the '@c' items in the '@menu' you will also see which routines
are missing - either
* because they have not yet been documented - or
* because they have not yet been implemented.

37 routines fall into the first category. Possibly, some of the documented
routines need to be updated as well, if something has changed since their
initial implementation (or to improve the wording).

See also https://gcc.gnu.org/PR110364 for improvements which should eventually 
be done.

Tobias

PS: If someone feels bored, proofreading or adding a description for the
missing routines would be useful :-)  But I assume/fear that no one feels bored :-)
commit 506f068e7d01ad2fb107185b8fb204a0ec23785c
Author: Tobias Burnus 
Date:   Thu Jul 20 18:12:57 2023 +0200

libgomp.texi: Split OpenMP routines chapter into sections

The previous list of OpenMP routines was rather lengthy and the order seemed
to be rather random - especially for outputs which did not have @menu as then
the sectioning was not visible.

In version 5.1, the OpenMP specification split the lengthy list by adding
sections to the chapter and grouping the routines under them.

This patch follows suit and uses the same sections and order. The commit also
prepares for adding not-yet-documented routines by listing those in the
@menu (@c commented - both for just undocumented and for also unimplemented
routines). See also PR 110364.

libgomp/ChangeLog:

* libgomp.texi (OpenMP Runtime Library Routines):
Split long list by adding sections and moving routines there.
(OMP_ALLOCATORS): Fix typo.
---
 libgomp/libgomp.texi | 1267 +-
 1 file changed, 727 insertions(+), 540 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index a8582b50177..9d3b2ae54cb 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -500,214 +500,229 @@ Technical Report (TR) 11 is the first preview for OpenMP 6.0.
 @node Runtime Library Routines
 @chapter OpenMP Runtime Library Routines
 
-The runtime routines described here are defined by Section 3 of the OpenMP
-specification in version 4.5.  The routines are structured in following
-three parts:
+The runtime routines described here are defined by Section 18 of the OpenMP
+specification in version 5.2.
 
 @menu
-Control threads, processors and the parallel environment.  They have C
-linkage, and do not throw exceptions.
+* Thread Team Routines::
+* Thread Affinity Routines::
+* Teams Region Routines::
+* Tasking Routines::
+@c * Resource Relinquishing Routines::
+* Device Information Routines::
+@c * Device Memory Routines::
+* Lock Routines::
+* Timing Routines::
+* Event Routine::
+@c * Interoperability Routines::
+@c * Memory Management Routines::
+@c * Tool Control Routine::
+@c * Environment Display Routine::
+@end menu
 
-* omp_get_active_level::Number of active parallel regions
-* omp_get_ancestor_thread_num:: Ancestor thread ID
-* omp_get_cancellation::Whether cancellation support is enabled
-* omp_get_default_device::  Get the default device for target regions
-* omp_get_device_num::  Get device that current thread is running on
-* omp_get_dynamic:: Dynamic teams setting
-* omp_get_initial_device::  Device number of host device
-* omp_get_level::   Number of parallel regions
-* omp_get_max_active_levels::   Current maximum number of active regions
-* omp_get_max_task_priority::   Maximum task priority value that can be set
-* omp_get_max_teams::   Maximum number of teams for teams region
-* omp_get_max_thr

Re: [PATCH] tree-optimization/110742 - fix latent issue with permuting existing vectors

2023-07-20 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
>> On 20.07.2023 at 16:09, Richard Sandiford wrote:
>> 
>> Richard Biener via Gcc-patches  writes:
>>> When we materialize a layout we push edge permutes to constant/external
>>> defs without checking we can actually do so.  For externals defined
>>> by vector stmts rather than scalar components we can't.
>>> 
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>>> 
>>> OK?
>>> 
>>> Thanks,
>>> Richard.
>>> 
>>>PR tree-optimization/110742
>>>* tree-vect-slp.cc (vect_optimize_slp_pass::get_result_with_layout):
>>>Do not materialize an edge permutation in an external node with
>>>vector defs.
>>>(vect_slp_analyze_node_operations_1): Guard purely internal
>>>nodes better.
>>> 
>>>* g++.dg/torture/pr110742.C: New testcase.
>>> ---
>>> gcc/testsuite/g++.dg/torture/pr110742.C | 47 +
>>> gcc/tree-vect-slp.cc|  8 +++--
>>> 2 files changed, 53 insertions(+), 2 deletions(-)
>>> create mode 100644 gcc/testsuite/g++.dg/torture/pr110742.C
>>> 
>>> diff --git a/gcc/testsuite/g++.dg/torture/pr110742.C 
>>> b/gcc/testsuite/g++.dg/torture/pr110742.C
>>> new file mode 100644
>>> index 000..d41ac0479d2
>>> --- /dev/null
>>> +++ b/gcc/testsuite/g++.dg/torture/pr110742.C
>>> @@ -0,0 +1,47 @@
>>> +// { dg-do compile }
>>> +
>>> +struct HARD_REG_SET {
>>> +  HARD_REG_SET operator~() const {
>>> +HARD_REG_SET res;
>>> +for (unsigned int i = 0; i < (sizeof(elts) / sizeof((elts)[0])); ++i)
>>> +  res.elts[i] = ~elts[i];
>>> +return res;
>>> +  }
>>> +  HARD_REG_SET operator&(const HARD_REG_SET &other) const {
>>> +HARD_REG_SET res;
>>> +for (unsigned int i = 0; i < (sizeof(elts) / sizeof((elts)[0])); ++i)
>>> +  res.elts[i] = elts[i] & other.elts[i];
>>> +return res;
>>> +  }
>>> +  unsigned long elts[4];
>>> +};
>>> +typedef const HARD_REG_SET &const_hard_reg_set;
>>> +inline bool hard_reg_set_subset_p(const_hard_reg_set x, const_hard_reg_set 
>>> y) {
>>> +  unsigned long bad = 0;
>>> +  for (unsigned int i = 0; i < (sizeof(x.elts) / sizeof((x.elts)[0])); ++i)
>>> +bad |= (x.elts[i] & ~y.elts[i]);
>>> +  return bad == 0;
>>> +}
>>> +inline bool hard_reg_set_empty_p(const_hard_reg_set x) {
>>> +  unsigned long bad = 0;
>>> +  for (unsigned int i = 0; i < (sizeof(x.elts) / sizeof((x.elts)[0])); ++i)
>>> +bad |= x.elts[i];
>>> +  return bad == 0;
>>> +}
>>> +extern HARD_REG_SET rr[2];
>>> +extern int t[2];
>>> +extern HARD_REG_SET nn;
>>> +static HARD_REG_SET mm;
>>> +void setup_reg_class_relations(void) {
>>> +  HARD_REG_SET intersection_set, union_set, temp_set2;
>>> +  for (int cl2 = 0; cl2 < 2; cl2++) {
>>> +temp_set2 = rr[cl2] & ~nn;
>>> +if (hard_reg_set_empty_p(mm) && hard_reg_set_empty_p(temp_set2)) {
>>> +  mm = rr[0] & nn;
>>> +  if (hard_reg_set_subset_p(mm, intersection_set))
>>> +if (!hard_reg_set_subset_p(mm, temp_set2) ||
>>> +hard_reg_set_subset_p(rr[0], rr[t[cl2]]))
>>> +  t[cl2] = 0;
>>> +}
>>> +  }
>>> +}
>>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>>> index 693621ca990..1d79c77e8ce 100644
>>> --- a/gcc/tree-vect-slp.cc
>>> +++ b/gcc/tree-vect-slp.cc
>>> @@ -5198,7 +5198,10 @@ vect_optimize_slp_pass::get_result_with_layout 
>>> (slp_tree node,
>>> return result;
>>> 
>>>   if (SLP_TREE_DEF_TYPE (node) == vect_constant_def
>>> -  || SLP_TREE_DEF_TYPE (node) == vect_external_def)
>>> +  || (SLP_TREE_DEF_TYPE (node) == vect_external_def
>>> +  && (to_layout_i == 0
>>> +  /* We can't permute vector defs.  */
>>> +  || SLP_TREE_VEC_DEFS (node).is_empty (
>> 
>> Guess it's personal preference, but IMO it's easier to follow without the
>> to_layout_i condition, so that it ties directly to the create_partitions
>> test.
>
> I don’t understand: in the code guarding this we seem to expect to_layout_i 
> == 0 and that’s the case we can handle as noop.  I didn’t understand why the 
> function doesn’t always just do nothing in this case though, so I must have 
> missed something.

OK, so I guess that disproves that my way is easier to understand :)

I think logically, the code is doing the equivalent of:

  int partition_i = m_vertices[node->vertex].partition;
  if (partition_i < 0)
    {
      /* If the vector is uniform or unchanged, there's nothing to do.  */
      ...
    }
  else
    {
      ... Return node if to_layout_i matches this partition's chosen layout...
    }

And I guess I should have written it that way.

So when there is no partition, we have a constant or external def
built from individual scalars.  We can use the node as-is if the
caller wants an unpermuted node or if all elements are equal
(so that the permutation doesn't matter).  Otherwise we need
to permute the scalars.

When there is a partition, we can use the node as-is if the caller
wants the layout that was chosen for that partition.  Otherwise we
need a new VEC_PERM_EXPR node.
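The case analysis above can be sketched as a small decision function. This is a hypothetical, simplified model written for illustration only: `NodeModel`, `Action` and `result_with_layout` do not exist in GCC, and layout 0 is taken to mean "unpermuted", as the thread implies. The `CannotPermute` outcome reflects the patch description ("For externals defined by vector stmts rather than scalar components we can't"), not GCC's actual control flow.

```cpp
#include <cassert>

// Hypothetical stand-in for an SLP node; the real code works on slp_tree,
// m_vertices and partitions, none of which are modelled here.
struct NodeModel {
  int partition;            // < 0 when the node was not given a partition
  bool built_from_scalars;  // constant/external def made of scalar operands
  bool all_elements_equal;  // uniform vector, so any permutation is a no-op
  int chosen_layout;        // layout chosen for the partition, if any
};

enum class Action { UseAsIs, PermuteScalars, NeedVecPerm, CannotPermute };

// Decide what a request for layout TO_LAYOUT_I means for node N
// (assumption: layout 0 is the unpermuted layout).
Action result_with_layout(const NodeModel &n, int to_layout_i) {
  if (n.partition < 0) {
    // Constant or external def: usable as-is if unpermuted or uniform.
    if (to_layout_i == 0 || n.all_elements_equal)
      return Action::UseAsIs;
    // Scalar operands can be reordered; an external def that is already a
    // vector cannot (the situation PR110742 guards against).
    return n.built_from_scalars ? Action::PermuteScalars
                                : Action::CannotPermute;
  }
  // Partitioned node: reuse it only if the caller wants the chosen layout,
  // otherwise a new VEC_PERM_EXPR node is needed.
  return to_layout_i == n.chosen_layout ? Action::UseAsIs
                                        : Action::NeedVecPerm;
}
```

For example, a scalar-built external node asked for layout 2 yields `PermuteScalars`, while a partitioned node whose chosen layout is 2 yields `UseAsIs` for layout 2 and `NeedVecPerm` for anything else.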

In the particular

Re: [PATCH v4 3/3] c++: Improve location information in constant evaluation

2023-07-20 Thread Jason Merrill via Gcc-patches


On 7/20/23 05:37, Nathaniel Shead wrote:

This patch updates 'input_location' during constant evaluation to ensure
that errors in subexpressions that lack location information still
provide accurate diagnostics.

By itself this change causes some small regressions in diagnostic
quality for circumstances where errors used 'input_location' but the
location of the parent subexpression doesn't make sense, so this patch
also includes a couple of other small diagnostic improvements to improve
the most egregious cases.

gcc/cp/ChangeLog:

* constexpr.cc (modifying_const_object_error): Find the source
location of the const object's declaration.
(cxx_eval_store_expression): Fall back to the location of the
target object when evaluating initialiser.


I'm skeptical about this workaround being an improvement in general. 
Reverting it, there only seems to be a difference for constexpr-89285.C, 
which seems fine; we see the location as the first line of the class, as 
usual for implicitly declared constructors.


Showing the DMI location might be an improvement, but I think it would 
be better to make that change in perform_member_init so it applies to 
runtime as well.



(cxx_eval_constant_expression): Update input_location to the location
of the currently evaluated expression, if possible.

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/equal/constexpr_neg.cc: Update diagnostic
locations.
* testsuite/26_numerics/gcd/105844.cc: Likewise.
* testsuite/26_numerics/lcm/105844.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-48089.C: Update diagnostic locations.
* g++.dg/cpp0x/constexpr-70323.C: Likewise.
* g++.dg/cpp0x/constexpr-70323a.C: Likewise.
* g++.dg/cpp0x/constexpr-delete2.C: Likewise.
* g++.dg/cpp0x/constexpr-diag3.C: Likewise.
* g++.dg/cpp0x/constexpr-ice20.C: Likewise.
* g++.dg/cpp0x/constexpr-recursion.C: Likewise.
* g++.dg/cpp0x/overflow1.C: Likewise.
* g++.dg/cpp1y/constexpr-89285.C: Likewise.
* g++.dg/cpp1y/constexpr-89481.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime1.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime5.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const14.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const16.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const18.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const19.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const21.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const22.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const3.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const4.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const7.C: Likewise.
* g++.dg/cpp1y/constexpr-union5.C: Likewise.
* g++.dg/cpp1y/pr68180.C: Likewise.
* g++.dg/cpp1z/constexpr-lambda6.C: Likewise.
* g++.dg/cpp1z/constexpr-lambda8.C: Likewise.
* g++.dg/cpp2a/bit-cast11.C: Likewise.
* g++.dg/cpp2a/bit-cast12.C: Likewise.
* g++.dg/cpp2a/bit-cast14.C: Likewise.
* g++.dg/cpp2a/constexpr-98122.C: Likewise.
* g++.dg/cpp2a/constexpr-dynamic17.C: Likewise.
* g++.dg/cpp2a/constexpr-init1.C: Likewise.
* g++.dg/cpp2a/constexpr-new12.C: Likewise.
* g++.dg/cpp2a/constexpr-new3.C: Likewise.
* g++.dg/cpp2a/constinit10.C: Likewise.
* g++.dg/cpp2a/is-corresponding-member4.C: Likewise.
* g++.dg/ext/constexpr-vla2.C: Likewise.
* g++.dg/ext/constexpr-vla3.C: Likewise.
* g++.dg/ubsan/pr63956.C: Likewise.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/constexpr.cc   | 46 ++-
  gcc/testsuite/g++.dg/cpp0x/constexpr-48089.C  | 10 ++--
  gcc/testsuite/g++.dg/cpp0x/constexpr-70323.C  |  8 ++--
  gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C |  8 ++--
  .../g++.dg/cpp0x/constexpr-delete2.C  |  5 +-
  gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C  |  2 +-
  gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |  1 +
  .../g++.dg/cpp0x/constexpr-recursion.C|  6 +--
  gcc/testsuite/g++.dg/cpp0x/overflow1.C|  2 +-
  gcc/testsuite/g++.dg/cpp1y/constexpr-89285.C  |  5 +-
  gcc/testsuite/g++.dg/cpp1y/constexpr-89481.C  |  3 +-
  .../g++.dg/cpp1y/constexpr-lifetime1.C|  1 +
  .../g++.dg/cpp1y/constexpr-lifetime2.C|  4 +-
  .../g++.dg/cpp1y/constexpr-lifetime3.C|  4 +-
  .../g++.dg/cpp1y/constexpr-lifetime4.C|  2 +-
  .../g++.dg/cpp1y/constexpr-lifetime5.C|  4 +-
  .../g++.dg/cpp1y/constexpr-tracking-const14.C |  3 +-
  .../g++.dg/cpp1y/constexpr-tracking-const16.C |  3 +-
  .../g++.dg/cpp1y/constexpr-tracking-const18.C |  4 +-
  .../g++.dg/cpp1y/c

Re: [PATCH] testsuite: fix allocator-opt1.C FAIL with old ABI

2023-07-20 Thread Marek Polacek via Gcc-patches

On Wed, Jul 19, 2023 at 03:22:10PM -0400, Marek Polacek wrote:
> Ping.
> 
> On Mon, Jul 10, 2023 at 04:33:26PM -0400, Marek Polacek via Gcc-patches wrote:
> > Running
> > $ make check-g++ 
> > RUNTESTFLAGS='--target_board=unix\{-D_GLIBCXX_USE_CXX11_ABI=0,\} 
> > dg.exp=allocator-opt1.C'
> > yields:
> > 
> > FAIL: g++.dg/tree-ssa/allocator-opt1.C  -std=c++98  scan-tree-dump-times 
> > gimple "struct allocator D" 1
> > FAIL: g++.dg/tree-ssa/allocator-opt1.C  -std=c++14  scan-tree-dump-times 
> > gimple "struct allocator D" 1
> > FAIL: g++.dg/tree-ssa/allocator-opt1.C  -std=c++17  scan-tree-dump-times 
> > gimple "struct allocator D" 1
> > FAIL: g++.dg/tree-ssa/allocator-opt1.C  -std=c++20  scan-tree-dump-times 
> > gimple "struct allocator D" 1

I just pushed the patch after fixing it and adding a new comment:

-- >8 --

Running
$ make check-g++ RUNTESTFLAGS='--target_board=unix\{-D_GLIBCXX_USE_CXX11_ABI=0,\} dg.exp=allocator-opt1.C'
yields:

FAIL: g++.dg/tree-ssa/allocator-opt1.C  -std=c++98  scan-tree-dump-times gimple "struct allocator D" 1
FAIL: g++.dg/tree-ssa/allocator-opt1.C  -std=c++14  scan-tree-dump-times gimple "struct allocator D" 1
FAIL: g++.dg/tree-ssa/allocator-opt1.C  -std=c++17  scan-tree-dump-times gimple "struct allocator D" 1
FAIL: g++.dg/tree-ssa/allocator-opt1.C  -std=c++20  scan-tree-dump-times gimple "struct allocator D" 1

=== g++ Summary for unix/-D_GLIBCXX_USE_CXX11_ABI=0 ===

=== g++ Summary for unix ===

because in the old ABI we get two "struct allocator D".  This patch
follows r14-658 although I'm not quite sure I follow the logic there.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/allocator-opt1.C: Force _GLIBCXX_USE_CXX11_ABI to 1.
---
 gcc/testsuite/g++.dg/tree-ssa/allocator-opt1.C | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/testsuite/g++.dg/tree-ssa/allocator-opt1.C 
b/gcc/testsuite/g++.dg/tree-ssa/allocator-opt1.C
index e8394c7ad70..51c470dee37 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/allocator-opt1.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/allocator-opt1.C
@@ -5,8 +5,20 @@
 // Currently the dump doesn't print the allocator template arg in this context.
 // { dg-final { scan-tree-dump-times "struct allocator D" 1 "gimple" } }
 
+// In the pre-C++11 ABI we get two allocator variables.
+#undef _GLIBCXX_USE_CXX11_ABI
+#define _GLIBCXX_USE_CXX11_ABI 1
+
 #include 
+
+// When the library is not dual-ABI and defaults to old just compile
+// an empty TU.  NB: We test _GLIBCXX_USE_CXX11_ABI again because the
+// #include above might have undef'd _GLIBCXX_USE_CXX11_ABI.
+#if _GLIBCXX_USE_CXX11_ABI
+
 void f (const char *p)
 {
   std::string lst[] = { p, p, p, p };
 }
+
+#endif

base-commit: 506f068e7d01ad2fb107185b8fb204a0ec23785c
-- 
2.41.0
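The test's comment notes that `_GLIBCXX_USE_CXX11_ABI` must be checked again after the `#include`, because the library configuration may override the user's definition. A minimal sketch of that idea, assuming libstdc++ (elsewhere the macro is undefined and reads as 0 in `#if`): it reports which `std::string` ABI the translation unit actually ended up with. The function names are hypothetical, and the capacity check relies on the new ABI's small-string buffer (15 `char`s in libstdc++).

```cpp
#include <cassert>
#include <string>

// Test the macro only *after* <string>: the library's configuration header
// may have replaced whatever was defined before the include.
constexpr bool uses_cxx11_string_abi() {
#if defined(__GLIBCXX__) && _GLIBCXX_USE_CXX11_ABI
  return true;
#else
  return false;
#endif
}

// With the C++11 ABI an empty std::string still has its local buffer, so its
// capacity is non-zero; the old copy-on-write string typically reports 0.
inline bool string_abi_matches_capacity() {
  const std::string empty;
  return uses_cxx11_string_abi() ? empty.capacity() >= 15 : true;
}
```

Guarding the test body with the re-checked macro, as the patch does, keeps the dump scan meaningful only where the new ABI is really in effect.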

Re: [PATCH] c++: passing partially inst tmpl as ttp [PR110566]

2023-07-20 Thread Jason Merrill via Gcc-patches


On 7/20/23 12:00, Patrick Palka wrote:

On Wed, 19 Jul 2023, Jason Merrill wrote:


On 7/19/23 14:05, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13?

-- >8 --

Since the arguments 'pargs' passed to the coerce_template_parms from
coerce_template_template_parms are always a full set, we need to make sure
we always pass the parameters of the most general template because if the
template is partially instantiated then the levels won't match up.


Hmm, so then my comment "In that case we might end up adding more levels than
needed, but that shouldn't be a problem; any args we need to refer to are at
the right level." is wrong for auto template parms?


I suppose, but only for the ttp case I think?  It seems all is well when
passing an ordinary template as a ttp as long as we use the parameters of the
most general template for the coercion.  I can't come up with a counterexample
at least.


Agreed; the quoted comment is already only about passing a TTP, as 
ordinary templates always have DECL_CONTEXT set properly.


Jason
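For readers unfamiliar with the terminology, here is a minimal, hypothetical illustration (not the PR110566 testcase, and without the `auto` parameters involved in the bug, so that it compiles everywhere) of passing a partially instantiated template as a template template parameter. Roughly, as the patch description puts it, the argument set being coerced covers every template level, so the parameters compared against it need to come from the most general template, here `Outer<T>::Inner<U>`, rather than from the one-level `Outer<char>::Inner`.

```cpp
#include <cassert>
#include <type_traits>

template <class T>
struct Outer {
  // Outer<char>::Inner is a partially instantiated template: the outer
  // level (T = char) is fixed, the inner level (U) is still open.
  template <class U>
  struct Inner {
    using outer_type = T;
    using inner_type = U;
  };
};

// A class template with a one-level template template parameter (TTP).
template <template <class> class TT>
struct Apply {
  using type = TT<int>;
};

// Pass the partially instantiated member template as the TTP argument.
using Result = Apply<Outer<char>::Inner>::type;

static_assert(std::is_same<Result::outer_type, char>::value, "outer level kept");
static_assert(std::is_same<Result::inner_type, int>::value, "inner level bound");
```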

Re: [PATCH] tree-optimization/110742 - fix latent issue with permuting existing vectors

2023-07-20 Thread Richard Biener via Gcc-patches




> On 20.07.2023 at 18:59, Richard Sandiford wrote:
> 
> Richard Biener  writes:
 On 20.07.2023 at 16:09, Richard Sandiford wrote:
>>> 
>>> Richard Biener via Gcc-patches  writes:
 When we materialize a layout we push edge permutes to constant/external
 defs without checking we can actually do so.  For externals defined
 by vector stmts rather than scalar components we can't.
 
 Bootstrapped and tested on x86_64-unknown-linux-gnu.
 
 OK?
 
 Thanks,
 Richard.
 
   PR tree-optimization/110742
   * tree-vect-slp.cc (vect_optimize_slp_pass::get_result_with_layout):
   Do not materialize an edge permutation in an external node with
   vector defs.
   (vect_slp_analyze_node_operations_1): Guard purely internal
   nodes better.
 
   * g++.dg/torture/pr110742.C: New testcase.
 ---
 gcc/testsuite/g++.dg/torture/pr110742.C | 47 +
 gcc/tree-vect-slp.cc|  8 +++--
 2 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr110742.C
 
 diff --git a/gcc/testsuite/g++.dg/torture/pr110742.C 
 b/gcc/testsuite/g++.dg/torture/pr110742.C
 new file mode 100644
 index 000..d41ac0479d2
 --- /dev/null
 +++ b/gcc/testsuite/g++.dg/torture/pr110742.C
 @@ -0,0 +1,47 @@
 +// { dg-do compile }
 +
 +struct HARD_REG_SET {
 +  HARD_REG_SET operator~() const {
 +HARD_REG_SET res;
 +for (unsigned int i = 0; i < (sizeof(elts) / sizeof((elts)[0])); ++i)
 +  res.elts[i] = ~elts[i];
 +return res;
 +  }
 +  HARD_REG_SET operator&(const HARD_REG_SET &other) const {
 +HARD_REG_SET res;
 +for (unsigned int i = 0; i < (sizeof(elts) / sizeof((elts)[0])); ++i)
 +  res.elts[i] = elts[i] & other.elts[i];
 +return res;
 +  }
 +  unsigned long elts[4];
 +};
 +typedef const HARD_REG_SET &const_hard_reg_set;
 +inline bool hard_reg_set_subset_p(const_hard_reg_set x, 
 const_hard_reg_set y) {
 +  unsigned long bad = 0;
 +  for (unsigned int i = 0; i < (sizeof(x.elts) / sizeof((x.elts)[0])); 
 ++i)
 +bad |= (x.elts[i] & ~y.elts[i]);
 +  return bad == 0;
 +}
 +inline bool hard_reg_set_empty_p(const_hard_reg_set x) {
 +  unsigned long bad = 0;
 +  for (unsigned int i = 0; i < (sizeof(x.elts) / sizeof((x.elts)[0])); 
 ++i)
 +bad |= x.elts[i];
 +  return bad == 0;
 +}
 +extern HARD_REG_SET rr[2];
 +extern int t[2];
 +extern HARD_REG_SET nn;
 +static HARD_REG_SET mm;
 +void setup_reg_class_relations(void) {
 +  HARD_REG_SET intersection_set, union_set, temp_set2;
 +  for (int cl2 = 0; cl2 < 2; cl2++) {
 +temp_set2 = rr[cl2] & ~nn;
 +if (hard_reg_set_empty_p(mm) && hard_reg_set_empty_p(temp_set2)) {
 +  mm = rr[0] & nn;
 +  if (hard_reg_set_subset_p(mm, intersection_set))
 +if (!hard_reg_set_subset_p(mm, temp_set2) ||
 +hard_reg_set_subset_p(rr[0], rr[t[cl2]]))
 +  t[cl2] = 0;
 +}
 +  }
 +}
 diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
 index 693621ca990..1d79c77e8ce 100644
 --- a/gcc/tree-vect-slp.cc
 +++ b/gcc/tree-vect-slp.cc
 @@ -5198,7 +5198,10 @@ vect_optimize_slp_pass::get_result_with_layout 
 (slp_tree node,
return result;
 
  if (SLP_TREE_DEF_TYPE (node) == vect_constant_def
 -  || SLP_TREE_DEF_TYPE (node) == vect_external_def)
 +  || (SLP_TREE_DEF_TYPE (node) == vect_external_def
 +  && (to_layout_i == 0
 +  /* We can't permute vector defs.  */
 +  || SLP_TREE_VEC_DEFS (node).is_empty (
>>> 
>>> Guess it's personal preference, but IMO it's easier to follow without the
>>> to_layout_i condition, so that it ties directly to the create_partitions
>>> test.
>> 
>> I don’t understand- in the code guarding this we seem to expect to_layout_i 
>> == 0 and that’s the case we can handle as noop.  I didn’t understand why the 
>> function doesn’t always just do nothing in this case though, so I must have 
>> missed something.
> 
> OK, so I guess that disproves that my way is easier to understand :)
> 
> I think logically, the code is doing the equivalent of:
> 
>  int partition_i = m_vertices[node->vertex].partition;
>  if (partition_i < 0)
>{
>  /* If the vector is uniform or unchanged, there's nothing to do.  */
>  ...  
>}
>  else
>{
>  ... Return node if to_layout_i matches this partition's chosen layout...
>}
> 
> And I guess I should have written it that way.
> 
> So when there is no partition, we have a constant or external def
> built from individual scalars.  We can use the node as-is if the
> caller wants an unpermuted node or if all elements are equal
> (so that the permutation doesn't matter).  Otherwise we

Re: [PATCH] tree-optimization/110742 - fix latent issue with permuting existing vectors

2023-07-20 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
>> On 20.07.2023 at 18:59, Richard Sandiford wrote:
>> 
>> Richard Biener  writes:
> On 20.07.2023 at 16:09, Richard Sandiford wrote:
 
 Richard Biener via Gcc-patches  writes:
> When we materialize a layout we push edge permutes to constant/external
> defs without checking we can actually do so.  For externals defined
> by vector stmts rather than scalar components we can't.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK?
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/110742
>   * tree-vect-slp.cc (vect_optimize_slp_pass::get_result_with_layout):
>   Do not materialize an edge permutation in an external node with
>   vector defs.
>   (vect_slp_analyze_node_operations_1): Guard purely internal
>   nodes better.
> 
>   * g++.dg/torture/pr110742.C: New testcase.
> ---
> gcc/testsuite/g++.dg/torture/pr110742.C | 47 +
> gcc/tree-vect-slp.cc|  8 +++--
> 2 files changed, 53 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/torture/pr110742.C
> 
> diff --git a/gcc/testsuite/g++.dg/torture/pr110742.C 
> b/gcc/testsuite/g++.dg/torture/pr110742.C
> new file mode 100644
> index 000..d41ac0479d2
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr110742.C
> @@ -0,0 +1,47 @@
> +// { dg-do compile }
> +
> +struct HARD_REG_SET {
> +  HARD_REG_SET operator~() const {
> +HARD_REG_SET res;
> +for (unsigned int i = 0; i < (sizeof(elts) / sizeof((elts)[0])); ++i)
> +  res.elts[i] = ~elts[i];
> +return res;
> +  }
> +  HARD_REG_SET operator&(const HARD_REG_SET &other) const {
> +HARD_REG_SET res;
> +for (unsigned int i = 0; i < (sizeof(elts) / sizeof((elts)[0])); ++i)
> +  res.elts[i] = elts[i] & other.elts[i];
> +return res;
> +  }
> +  unsigned long elts[4];
> +};
> +typedef const HARD_REG_SET &const_hard_reg_set;
> +inline bool hard_reg_set_subset_p(const_hard_reg_set x, 
> const_hard_reg_set y) {
> +  unsigned long bad = 0;
> +  for (unsigned int i = 0; i < (sizeof(x.elts) / sizeof((x.elts)[0])); 
> ++i)
> +bad |= (x.elts[i] & ~y.elts[i]);
> +  return bad == 0;
> +}
> +inline bool hard_reg_set_empty_p(const_hard_reg_set x) {
> +  unsigned long bad = 0;
> +  for (unsigned int i = 0; i < (sizeof(x.elts) / sizeof((x.elts)[0])); 
> ++i)
> +bad |= x.elts[i];
> +  return bad == 0;
> +}
> +extern HARD_REG_SET rr[2];
> +extern int t[2];
> +extern HARD_REG_SET nn;
> +static HARD_REG_SET mm;
> +void setup_reg_class_relations(void) {
> +  HARD_REG_SET intersection_set, union_set, temp_set2;
> +  for (int cl2 = 0; cl2 < 2; cl2++) {
> +temp_set2 = rr[cl2] & ~nn;
> +if (hard_reg_set_empty_p(mm) && hard_reg_set_empty_p(temp_set2)) {
> +  mm = rr[0] & nn;
> +  if (hard_reg_set_subset_p(mm, intersection_set))
> +if (!hard_reg_set_subset_p(mm, temp_set2) ||
> +hard_reg_set_subset_p(rr[0], rr[t[cl2]]))
> +  t[cl2] = 0;
> +}
> +  }
> +}
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 693621ca990..1d79c77e8ce 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -5198,7 +5198,10 @@ vect_optimize_slp_pass::get_result_with_layout 
> (slp_tree node,
>return result;
> 
>  if (SLP_TREE_DEF_TYPE (node) == vect_constant_def
> -  || SLP_TREE_DEF_TYPE (node) == vect_external_def)
> +  || (SLP_TREE_DEF_TYPE (node) == vect_external_def
> +  && (to_layout_i == 0
> +  /* We can't permute vector defs.  */
> +  || SLP_TREE_VEC_DEFS (node).is_empty (
 
 Guess it's personal preference, but IMO it's easier to follow without the
 to_layout_i condition, so that it ties directly to the create_partitions
 test.
>>> 
>>> I don’t understand- in the code guarding this we seem to expect to_layout_i 
>>> == 0 and that’s the case we can handle as noop.  I didn’t understand why 
>>> the function doesn’t always just do nothing in this case though, so I must 
>>> have missed something.
>> 
>> OK, so I guess that disproves that my way is easier to understand :)
>> 
>> I think logically, the code is doing the equivalent of:
>> 
>>  int partition_i = m_vertices[node->vertex].partition;
>>  if (partition_i < 0)
>>{
>>  /* If the vector is uniform or unchanged, there's nothing to do.  */
>>  ...  
>>}
>>  else
>>{
>>  ... Return node if to_layout_i matches this partition's chosen layout...
>>}
>> 
>> And I guess I should have written it that way.
>> 
>> So when there is no partition, we have a constant or external def
>> built from individual scalars.  W

Re: [PATCH 2/3] gcc: xtensa: use dynconfig settings as builtin-macros

2023-07-20 Thread Max Filippov via Gcc-patches

On Thu, Jul 20, 2023 at 9:10 AM Alexey Lapshin
 wrote:
> I see now, thanks for the explanation, I will try to rebuild toolchain 
> without this particular patch.
> BTW, what do you think about moving the config from the newlib overlay into dynconfig?

That's the right thing to do. Bonus points for keeping backwards
compatibility with the overlay-based configuration method (:
I did the same for the uClibc, but the change is still in my queue:

https://github.com/jcmvbkbc/uclibc-ng-xtensa/commit/842aede0537812a0d2158433c5e048ee87324075

-- 
Thanks.
-- Max

Re: [PATCH 2/3] gcc: xtensa: use dynconfig settings as builtin-macros

2023-07-20 Thread Alexey Lapshin via Gcc-patches

On Thu, 2023-07-20 at 08:25 -0700, Max Filippov wrote:
> But it defines them with their respective values.
> Just notice that it adds two leading underscores in front of the names.

Why were the builtin macros defined with a prefix?
With this approach I also need to define them somewhere:

#define XTHAL_ABI_WINDOWED  __XTHAL_ABI_WINDOWED
#define XTHAL_ABI_CALL0 __XTHAL_ABI_CALL0
.

Or add the prefix to the macros in the existing code, which also doesn't look good.

I'd like to understand why the toolchain can't have builtin macros with the same names.
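One hedged way to write the aliases Alexey describes is a small compatibility shim that maps the prefixed builtins to the plain names only when the compiler predefines them and nothing (for example an overlay's own header) has already defined the plain names. Identifiers beginning with a double underscore are reserved for the implementation, which is presumably why predefined macros use that form. The shim and the helper `xthal_abi_macros_consistent` are hypothetical, not part of the posted patches.

```cpp
#include <cassert>

// Alias the builtins only if they exist and the plain names are still free.
#if defined(__XTHAL_ABI_WINDOWED) && !defined(XTHAL_ABI_WINDOWED)
#define XTHAL_ABI_WINDOWED __XTHAL_ABI_WINDOWED
#endif
#if defined(__XTHAL_ABI_CALL0) && !defined(XTHAL_ABI_CALL0)
#define XTHAL_ABI_CALL0 __XTHAL_ABI_CALL0
#endif

// True when the plain names agree with the builtins, or when the compiler
// does not predefine them at all (e.g. a non-Xtensa host build).
constexpr bool xthal_abi_macros_consistent() {
#if defined(__XTHAL_ABI_WINDOWED) && defined(__XTHAL_ABI_CALL0)
  return XTHAL_ABI_WINDOWED == __XTHAL_ABI_WINDOWED &&
         XTHAL_ABI_CALL0 == __XTHAL_ABI_CALL0;
#else
  return true;
#endif
}
```

Checking `!defined(...)` first keeps existing overlay-based headers working, which fits the backwards-compatibility point raised in the thread.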

Re: [PATCH 2/3] gcc: xtensa: use dynconfig settings as builtin-macros

2023-07-20 Thread Alexey Lapshin via Gcc-patches

On Thu, 2023-07-20 at 10:43 -0700, Max Filippov wrote:
> Bonus points for keeping backwards
> compatibility with the overlay-based configuration method (:

Got you, thanks!
