[PATCH] i386: Adjust rtx cost for imulq and imulw [PR115749]

2024-07-24 Thread Kong, Lingling
Tested SPEC2017 performance on Sierra Forest, Ice Lake, and Cascade Lake; at
least there is no obvious regression.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.

OK for trunk?

gcc/ChangeLog:

* config/i386/x86-tune-costs.h (struct processor_costs):
Adjust rtx_cost of imulq and imulw from COSTS_N_INSNS (4)
to COSTS_N_INSNS (3).

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115749.c: New test.
---
 gcc/config/i386/x86-tune-costs.h | 16 
 gcc/testsuite/gcc.target/i386/pr115749.c | 16 
 2 files changed, 24 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr115749.c

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 769f334e531..2bfaee554d5 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -2182,7 +2182,7 @@ struct processor_costs skylake_cost = {
   COSTS_N_INSNS (1),   /* variable shift costs */
   COSTS_N_INSNS (1),   /* constant shift costs */
   {COSTS_N_INSNS (3),  /* cost of starting multiply for QI */
-   COSTS_N_INSNS (4),  /*   HI */
+   COSTS_N_INSNS (3),  /*   HI */
COSTS_N_INSNS (3),  /*   SI */
COSTS_N_INSNS (3),  /*   DI */
COSTS_N_INSNS (3)}, /*other */
@@ -2310,7 +2310,7 @@ struct processor_costs icelake_cost = {
   COSTS_N_INSNS (1),   /* variable shift costs */
   COSTS_N_INSNS (1),   /* constant shift costs */
   {COSTS_N_INSNS (3),  /* cost of starting multiply for QI */
-   COSTS_N_INSNS (4),  /*   HI */
+   COSTS_N_INSNS (3),  /*   HI */
COSTS_N_INSNS (3),  /*   SI */
COSTS_N_INSNS (3),  /*   DI */
COSTS_N_INSNS (3)}, /*other */
@@ -2434,9 +2434,9 @@ struct processor_costs alderlake_cost = {
   COSTS_N_INSNS (1),   /* variable shift costs */
   COSTS_N_INSNS (1),   /* constant shift costs */
   {COSTS_N_INSNS (3),  /* cost of starting multiply for QI */
-   COSTS_N_INSNS (4),  /*   HI */
+   COSTS_N_INSNS (3),  /*   HI */
COSTS_N_INSNS (3),  /*   SI */
-   COSTS_N_INSNS (4),  /*   DI */
+   COSTS_N_INSNS (3),  /*   DI */
COSTS_N_INSNS (4)}, /*other */
   0,   /* cost of multiply per each bit set */
   {COSTS_N_INSNS (16), /* cost of a divide/mod for QI */
@@ -3234,9 +3234,9 @@ struct processor_costs tremont_cost = {
   COSTS_N_INSNS (1),   /* variable shift costs */
   COSTS_N_INSNS (1),   /* constant shift costs */
   {COSTS_N_INSNS (3),  /* cost of starting multiply for QI */
-   COSTS_N_INSNS (4),  /*   HI */
+   COSTS_N_INSNS (3),  /*   HI */
COSTS_N_INSNS (3),  /*   SI */
-   COSTS_N_INSNS (4),  /*   DI */
+   COSTS_N_INSNS (3),  /*   DI */
COSTS_N_INSNS (4)}, /*other */
   0,   /* cost of multiply per each bit set */
   {COSTS_N_INSNS (16), /* cost of a divide/mod for QI */
@@ -3816,9 +3816,9 @@ struct processor_costs generic_cost = {
   COSTS_N_INSNS (1),   /* variable shift costs */
   COSTS_N_INSNS (1),   /* constant shift costs */
   {COSTS_N_INSNS (3),  /* cost of starting multiply for QI */
-   COSTS_N_INSNS (4),  /*   HI */
+   COSTS_N_INSNS (3),  /*   HI */
COSTS_N_INSNS (3),  /*   SI */
-   COSTS_N_INSNS (4),  /*   DI */
+   COSTS_N_INSNS (3),  /*   DI */
COSTS_N_INSNS (4)}, /*other */
   0,   /* cost of multiply per each bit set */
   {COSTS_N_INSNS (16), /* cost of a divide/mod for QI */
diff --git a/gcc/testsuite/gcc.target/i386/pr115749.c 
b/gcc/testsuite/

Re: [PATCH] regrename: Skip renaming register pairs [PR115860]

2024-07-24 Thread Stefan Schulze Frielinghaus
On Tue, Jul 23, 2024 at 11:40:00AM -0600, Jeff Law wrote:
> 
> 
> On 7/23/24 9:45 AM, Stefan Schulze Frielinghaus wrote:
> 
> > 
> > > They come from:
> > > ```
> > > (define_insn "*tf_to_fprx2_0"
> > >[(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
> > >  (subreg:DF (match_operand:TF 1 "general_operand" "v") 0))]
> > > ...
> > > (define_insn "*tf_to_fprx2_1"
> > >[(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
> > >  (subreg:DF (match_operand:TF 1 "general_operand" "v") 8))]
> > > 
> > > ```
> > > 
> > > I am not sure if that is a valid thing to do.  The s390 backend is the
> > > only one that has insn patterns like this; all others that use "+" use
> > > either strict_lowpart or zero_extract for the LHS, or just a plain set.
> > > Maybe there is a better way of representing this, e.g. an unspec?
> > 
> > I gave unspec a try and came up with
> > 
> > (define_insn "*tf_to_fprx2_0"
> >[(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
> >  (unspec:DF [(match_operand:TF 1 "general_operand" "v")] UNSPEC_TF_TO_FPRX2_0))]
> >"TARGET_VXE"
> >; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
> >"vpdi\t%v0,%v1,%v0,1"
> >[(set_attr "op_type" "VRR")])
> > 
> > (define_insn "*tf_to_fprx2_1"
> >[(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
> >  (unspec:DF [(match_operand:TF 1 "general_operand" "v")] UNSPEC_TF_TO_FPRX2_1))]
> >"TARGET_VXE"
> >; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
> >"vpdi\t%V0,%v1,%V0,5"
> >[(set_attr "op_type" "VRR")])
> > 
> > which seems to work.  However, I'm still getting subregs at final:
> > 
> > (insn 3 18 7 (set (reg/v:TF 18 %f4 [orig:62 x ] [62])
> >  (mem/c:TF (reg:DI 2 %r2 [65]) [1 x+0 S16 A64])) "t.c":3:1 421 
> > {movtf_vr}
> >   (expr_list:REG_DEAD (reg:DI 2 %r2 [65])
> >  (nil)))
> > (insn 7 3 8 (set (subreg:DF (reg:FPRX2 16 %f0 [64]) 0)
> >  (unspec:DF [
> >  (reg/v:TF 18 %f4 [orig:62 x ] [62])
> >  ] UNSPEC_TF_TO_FPRX2_0)) "t.c":4:10 569 {*tf_to_fprx2_0}
> >   (nil))
> > (insn 8 7 14 (set (subreg:DF (reg:FPRX2 16 %f0 [64]) 8)
> >  (unspec:DF [
> >  (reg/v:TF 18 %f4 [orig:62 x ] [62])
> >  ] UNSPEC_TF_TO_FPRX2_1)) "t.c":4:10 570 {*tf_to_fprx2_1}
> >   (expr_list:REG_DEAD (reg/v:TF 18 %f4 [orig:62 x ] [62])
> >  (nil)))
> > 
> > Thus, I'm not sure whether this really solves the problem or rather
> > shifts it around.  I'm still a bit puzzled why the initial RTL is
> > invalid.  If I understood you correctly Jeff, then we are missing a
> > pattern which would match once the subregs are eliminated.  Since none
> > exists the subregs survive and regrename gets confused.  This basically
> > means that subregs of register pairs must not survive RA and the unspec
> > solution from above is no real solution.
> I'd tend to agree.  The routine in question is cleanup_subreg_operands and
> from a quick looksie it's not going to work for the insn in question because
> cleanup_subreg_operands actually looks down into the recog data structures
> for each operand.  In the case above the subreg is explicit in the RTL
> rather than matched by the operand predicate.

Right, I did some further tests overnight where I also added patterns
in order to match variants where the subregs are eliminated and that
seems to work.  I still haven't made up my mind which route would be
best.  Anyhow, it is clear that this patch should be dropped and I will
come up with a solution for the target.

Thank you Andrew and Jeff for pointing this out.  Some myths about
subregs have been dispelled for me :)

Cheers,
Stefan


Re: [PATCH 2/3] aarch64: Add support for moving fpm system register

2024-07-24 Thread Claudio Bantaloukas


On 22/07/2024 11:07, Alex Coplan wrote:
> Hi Claudio,
> 
> I've left a couple of small comments below.
> 
> On 22/07/2024 09:30, Claudio Bantaloukas wrote:
--8<-
>>
>> @@ -1505,6 +1513,8 @@ (define_insn_and_split "*movdi_aarch64"
>>[w, w  ; fmov , fp  , 4] fmov\t%d0, %d1
>>[w, Dd ; neon_move, simd, 4] << 
>> aarch64_output_scalar_simd_mov_immediate (operands[1], DImode);
>>[w, Dx ; neon_move, simd, 8] #
>> + [Umv, r; mrs  , *   , 8] msr\t%0, %x1
>> + [r, Umv; mrs  , *   , 8] mrs\t%x0, %1
> 
> I think in the case of this pattern (but not the others) the %x modifier
> isn't needed since a DImode operand satisfying "r" should get printed as
> an x register by default.

Some sleep and coffee helped! After re-reading gccint 17.5 Output 
Templates and Operand Substitution, I managed to find that

asm_fprintf (f, "%s", reg_names [REGNO (x)]);

is used for registers in aarch64_print_operand. Thank you for the heads up!

Claudio


> 
> Thanks,
> Alex
> 
>> }
>> "CONST_INT_P (operands[1])
>>  && REG_P (operands[0])
>> diff --git a/gcc/config/aarch64/constraints.md 
>> b/gcc/config/aarch64/constraints.md
>> index a2569cea510..0c81fb28f7e 100644
>> --- a/gcc/config/aarch64/constraints.md
>> +++ b/gcc/config/aarch64/constraints.md
>> @@ -77,6 +77,9 @@ (define_register_constraint "Upl" "PR_LO_REGS"
>>   (define_register_constraint "Uph" "PR_HI_REGS"
>> "SVE predicate registers p8 - p15.")
>>   
>> +(define_register_constraint "Umv" "MOVEABLE_SYSREGS"
>> +  "@internal System Registers suitable for moving rather than requiring an 
>> unspec msr")
>> +
>>   (define_constraint "c"
>>"@internal The condition code register."
>> (match_operand 0 "cc_register"))
>> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c 
>> b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> index b774f28c9f0..10fd128d8f9 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> @@ -1,14 +1,14 @@
>>   /* Test the fp8 ACLE intrinsics family.  */
>>   /* { dg-do compile } */
>>   /* { dg-options "-O1 -march=armv8-a" } */
>> -/* { dg-final { check-function-bodies "**" "" } } */
>> +/* { dg-final { check-function-bodies "**" "" "" } } */
>>   
>>   #ifdef __ARM_FEATURE_FP8
>>   #error "__ARM_FEATURE_FP8 feature macro defined."
>>   #endif
>>   
>>   #pragma GCC push_options
>> -#pragma GCC target ("arch=armv9.4-a+fp8")
>> +#pragma GCC target("arch=armv9.4-a+fp8")
>>   
>>   #include 
>>   
>> @@ -16,4 +16,105 @@
>>   #error "__ARM_FEATURE_FP8 feature macro not defined."
>>   #endif
>>   
>> -#pragma GCC pop_options
>> +/*
>> +**test_write_fpmr_sysreg_asm_64:
>> +**  msr fpmr, x0
>> +**  ret
>> +*/
>> +void
>> +test_write_fpmr_sysreg_asm_64 (uint64_t val)
>> +{
>> +  register uint64_t fpmr asm ("fpmr") = val;
>> +  asm volatile ("" ::"Umv"(fpmr));
>> +}
>> +
>> +/*
>> +**test_write_fpmr_sysreg_asm_32:
>> +**  uxtwx0, w0
>> +**  msr fpmr, x0
>> +**  ret
>> +*/
>> +void
>> +test_write_fpmr_sysreg_asm_32 (uint32_t val)
>> +{
>> +  register uint64_t fpmr asm ("fpmr") = val;
>> +  asm volatile ("" ::"Umv"(fpmr));
>> +}
>> +
>> +/*
>> +**test_write_fpmr_sysreg_asm_16:
>> +**  and x0, x0, 65535
>> +**  msr fpmr, x0
>> +**  ret
>> +*/
>> +void
>> +test_write_fpmr_sysreg_asm_16 (uint16_t val)
>> +{
>> +  register uint64_t fpmr asm ("fpmr") = val;
>> +  asm volatile ("" ::"Umv"(fpmr));
>> +}
>> +
>> +/*
>> +**test_write_fpmr_sysreg_asm_8:
>> +**  and x0, x0, 255
>> +**  msr fpmr, x0
>> +**  ret
>> +*/
>> +void
>> +test_write_fpmr_sysreg_asm_8 (uint8_t val)
>> +{
>> +  register uint64_t fpmr asm ("fpmr") = val;
>> +  asm volatile ("" ::"Umv"(fpmr));
>> +}
>> +
>> +/*
>> +**test_read_fpmr_sysreg_asm_64:
>> +**  mrs x0, fpmr
>> +**  ret
>> +*/
>> +uint64_t
>> +test_read_fpmr_sysreg_asm_64 ()
>> +{
>> +  register uint64_t fpmr asm ("fpmr");
>> +  asm volatile ("" : "=Umv"(fpmr) :);
>> +  return fpmr;
>> +}
>> +
>> +/*
>> +**test_read_fpmr_sysreg_asm_32:
>> +**  mrs x0, fpmr
>> +**  ret
>> +*/
>> +uint32_t
>> +test_read_fpmr_sysreg_asm_32 ()
>> +{
>> +  register uint32_t fpmr asm ("fpmr");
>> +  asm volatile ("" : "=Umv"(fpmr) :);
>> +  return fpmr;
>> +}
>> +
>> +/*
>> +**test_read_fpmr_sysreg_asm_16:
>> +**  mrs x0, fpmr
>> +**  ret
>> +*/
>> +uint16_t
>> +test_read_fpmr_sysreg_asm_16 ()
>> +{
>> +  register uint16_t fpmr asm ("fpmr");
>> +  asm volatile ("" : "=Umv"(fpmr) :);
>> +  return fpmr;
>> +}
>> +
>> +/*
>> +**test_read_fpmr_sysreg_asm_8:
>> +**  mrs x0, fpmr
>> +**  ret
>> +*/
>> +uint8_t
>> +test_read_fpmr_sysreg_asm_8 ()
>> +{
>> +  register uint8_t fpmr asm ("fpmr");
>> +  asm volatile ("" : "=Umv"(fpmr) :);
>> +  return fpmr;
>> +}
> 

[PATCH] RISC-V: Disable Zba optimization pattern if XTheadMemIdx is enabled

2024-07-24 Thread Christoph Müllner
It is possible that the Zba optimization pattern zero_extendsidi2_bitmanip
matches for an XTheadMemIdx INSN with the effect of emitting an invalid
instruction as reported in PR116035.

The pattern above is used to emit a zext.w instruction to zero-extend
SI mode registers to DI mode.  A similar functionality can be achieved
by XTheadBb's th.extu instruction.  And indeed, we have the equivalent
pattern in thead.md (zero_extendsidi2_th_extu).  However, that pattern
depends on !TARGET_XTHEADMEMIDX.  To compensate for that, there are
specific patterns that ensure that a zero-extension instruction can still
be emitted (th_memidx_bb_zero_extendsidi2 and friends).

While we could implement something similar (th_memidx_zba_zero_extendsidi2),
it would only make sense if there existed real HW that implements Zba
and XTheadMemIdx, but not XTheadBb.  Unless such a machine exists, let's
simply disable zero_extendsidi2_bitmanip if XTheadMemIdx is available.

PR target/116035

gcc/ChangeLog:

* config/riscv/bitmanip.md: Disable zero_extendsidi2_bitmanip
for XTheadMemIdx.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr116035-1.c: New test.
* gcc.target/riscv/pr116035-2.c: New test.

Reported-by: Patrick O'Neill 
Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/bitmanip.md|  2 +-
 gcc/testsuite/gcc.target/riscv/pr116035-1.c | 29 +
 gcc/testsuite/gcc.target/riscv/pr116035-2.c | 26 ++
 3 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr116035-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr116035-2.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index f403ba8dbba..6b720992ca3 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -22,7 +22,7 @@
 (define_insn "*zero_extendsidi2_bitmanip"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,m")))]
-  "TARGET_64BIT && TARGET_ZBA"
+  "TARGET_64BIT && TARGET_ZBA && !TARGET_XTHEADMEMIDX"
   "@
zext.w\t%0,%1
lwu\t%0,%1"
diff --git a/gcc/testsuite/gcc.target/riscv/pr116035-1.c 
b/gcc/testsuite/gcc.target/riscv/pr116035-1.c
new file mode 100644
index 000..bc45941ff8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr116035-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64g_zba_xtheadmemidx" { target { rv64 } } } */
+/* { dg-options "-march=rv32g_zba_xtheadmemidx" { target { rv32 } } } */
+
+void a(long);
+unsigned b[11];
+void c()
+{
+  for (int d = 0; d < 11; ++d)
+a(b[d]);
+}
+
+#if __riscv_xlen == 64
+unsigned long zext64_32(unsigned int u32)
+{
+  /* Missed optimization for Zba+XTheadMemIdx.  */
+  return u32; //zext.w a0, a0
+}
+#endif
+
+/* { dg-final { scan-assembler "th.lwuia\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),4,0" { 
target rv64 } } } */
+/* { dg-final { scan-assembler "th.lwia\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),4,0" { 
target rv32 } } } */
+
+/* { dg-final { scan-assembler-not "lwu\t\[a-x0-9\]+,\(\[a-x0-9\]+\),4,0" } } 
*/
+
+/* Missed optimizations for Zba+XTheadMemIdx.  */
+/* { dg-final { scan-assembler "zext.w\t" { target rv64 xfail rv64 } } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/pr116035-2.c 
b/gcc/testsuite/gcc.target/riscv/pr116035-2.c
new file mode 100644
index 000..2c1a9694860
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr116035-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64g_xtheadbb_xtheadmemidx" { target { rv64 } } } */
+/* { dg-options "-march=rv32g_xtheadbb_xtheadmemidx" { target { rv32 } } } */
+
+void a(long);
+unsigned b[11];
+void c()
+{
+  for (int d = 0; d < 11; ++d)
+a(b[d]);
+}
+
+#if __riscv_xlen == 64
+unsigned long zext64_32(unsigned int u32)
+{
+return u32; //th.extu a0, a0, 31, 0
+}
+#endif
+
+/* { dg-final { scan-assembler "th.lwuia\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),4,0" { 
target { rv64 } } } } */
+/* { dg-final { scan-assembler "th.lwia\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),4,0" { 
target { rv32 } } } } */
+
+/* { dg-final { scan-assembler-not "lwu\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),4,0" } 
} */
+
+/* { dg-final { scan-assembler "th.extu\t" { target rv64 } } } */
-- 
2.45.2



Re: [PATCH] RISC-V: Disable Zba optimization pattern if XTheadMemIdx is enabled

2024-07-24 Thread Kito Cheng
LGTM :)

On Wed, Jul 24, 2024 at 3:16 PM Christoph Müllner
 wrote:


Re: [PATCH] optabs/rs6000: Rename iorc and andc to iorn and andn

2024-07-24 Thread Kewen.Lin
Hi Andrew,

on 2024/7/24 10:49, Andrew Pinski wrote:
> When I was trying to add a scalar version of iorc and andc, the optab that
> got matched was and/ior with mode csi or cdi instead of the iorc and
> andc optabs for the si and di modes. Since csi/cdi are the complex integer
> modes, we need to rename the optabs so that they do not end in c. This
> changes c to n, which is neutral and known not to be the first letter of
> a mode.
> 
> Bootstrapped and tested on x86_64 and powerpc64le.
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-builtins.def: s/iorc/iorn/. s/andc/andn/
>   for the code.
>   * config/rs6000/rs6000-string.cc (expand_cmp_vec_sequence): Update
>   to andn.

Nit: s/andn/iorn/

>   * config/rs6000/rs6000.md (andc3): Rename to ...
>   (andn3): This.
>   (iorc3): Rename to ...
>   (iorn3): This.

Thanks for doing this, rs6000 part change is OK (in case you need that).

BR,
Kewen

>   * doc/md.texi: Update documentation for the rename.
>   * internal-fn.def (BIT_ANDC): Rename to ...
>   (BIT_ANDN): This.
>   (BIT_IORC): Rename to ...
>   (BIT_IORN): This.
>   * optabs.def (andc_optab): Rename to ...
>   (andn_optab): This.
>   (iorc_optab): Rename to ...
>   (iorn_optab): This.
>   * gimple-isel.cc (gimple_expand_vec_cond_expr): Update for the
>   renamed internal functions, ANDC/IORC to ANDN/IORN.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 44 +--
>  gcc/config/rs6000/rs6000-string.cc|  2 +-
>  gcc/config/rs6000/rs6000.md   |  4 +--
>  gcc/doc/md.texi   |  8 ++---
>  gcc/gimple-isel.cc| 12 
>  gcc/internal-fn.def   |  4 +--
>  gcc/optabs.def| 10 --
>  7 files changed, 44 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 77eb0f7e406..ffbeff64d6d 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -518,25 +518,25 @@
>  VAND_V8HI_UNS andv8hi3 {}
>  
>const vsc __builtin_altivec_vandc_v16qi (vsc, vsc);
> -VANDC_V16QI andcv16qi3 {}
> +VANDC_V16QI andnv16qi3 {}
>  
>const vuc __builtin_altivec_vandc_v16qi_uns (vuc, vuc);
> -VANDC_V16QI_UNS andcv16qi3 {}
> +VANDC_V16QI_UNS andnv16qi3 {}
>  
>const vf __builtin_altivec_vandc_v4sf (vf, vf);
> -VANDC_V4SF andcv4sf3 {}
> +VANDC_V4SF andnv4sf3 {}
>  
>const vsi __builtin_altivec_vandc_v4si (vsi, vsi);
> -VANDC_V4SI andcv4si3 {}
> +VANDC_V4SI andnv4si3 {}
>  
>const vui __builtin_altivec_vandc_v4si_uns (vui, vui);
> -VANDC_V4SI_UNS andcv4si3 {}
> +VANDC_V4SI_UNS andnv4si3 {}
>  
>const vss __builtin_altivec_vandc_v8hi (vss, vss);
> -VANDC_V8HI andcv8hi3 {}
> +VANDC_V8HI andnv8hi3 {}
>  
>const vus __builtin_altivec_vandc_v8hi_uns (vus, vus);
> -VANDC_V8HI_UNS andcv8hi3 {}
> +VANDC_V8HI_UNS andnv8hi3 {}
>  
>const vsc __builtin_altivec_vavgsb (vsc, vsc);
>  VAVGSB avgv16qi3_ceil {}
> @@ -1189,13 +1189,13 @@
>  VAND_V2DI_UNS andv2di3 {}
>  
>const vd __builtin_altivec_vandc_v2df (vd, vd);
> -VANDC_V2DF andcv2df3 {}
> +VANDC_V2DF andnv2df3 {}
>  
>const vsll __builtin_altivec_vandc_v2di (vsll, vsll);
> -VANDC_V2DI andcv2di3 {}
> +VANDC_V2DI andnv2di3 {}
>  
>const vull __builtin_altivec_vandc_v2di_uns (vull, vull);
> -VANDC_V2DI_UNS andcv2di3 {}
> +VANDC_V2DI_UNS andnv2di3 {}
>  
>const vd __builtin_altivec_vnor_v2df (vd, vd);
>  VNOR_V2DF norv2df3 {}
> @@ -1975,40 +1975,40 @@
>  NEG_V2DI negv2di2 {}
>  
>const vsc __builtin_altivec_orc_v16qi (vsc, vsc);
> -ORC_V16QI iorcv16qi3 {}
> +ORC_V16QI iornv16qi3 {}
>  
>const vuc __builtin_altivec_orc_v16qi_uns (vuc, vuc);
> -ORC_V16QI_UNS iorcv16qi3 {}
> +ORC_V16QI_UNS iornv16qi3 {}
>  
>const vsq __builtin_altivec_orc_v1ti (vsq, vsq);
> -ORC_V1TI iorcv1ti3 {}
> +ORC_V1TI iornv1ti3 {}
>  
>const vuq __builtin_altivec_orc_v1ti_uns (vuq, vuq);
> -ORC_V1TI_UNS iorcv1ti3 {}
> +ORC_V1TI_UNS iornv1ti3 {}
>  
>const vd __builtin_altivec_orc_v2df (vd, vd);
> -ORC_V2DF iorcv2df3 {}
> +ORC_V2DF iornv2df3 {}
>  
>const vsll __builtin_altivec_orc_v2di (vsll, vsll);
> -ORC_V2DI iorcv2di3 {}
> +ORC_V2DI iornv2di3 {}
>  
>const vull __builtin_altivec_orc_v2di_uns (vull, vull);
> -ORC_V2DI_UNS iorcv2di3 {}
> +ORC_V2DI_UNS iornv2di3 {}
>  
>const vf __builtin_altivec_orc_v4sf (vf, vf);
> -ORC_V4SF iorcv4sf3 {}
> +ORC_V4SF iornv4sf3 {}
>  
>const vsi __builtin_altivec_orc_v4si (vsi, vsi);
> -ORC_V4SI iorcv4si3 {}
> +ORC_V4SI iornv4si3 {}
>  
>const vui __builtin_altivec_orc_v4si_uns (vui, vui);
> -ORC_V4SI_UNS iorcv4si3 {}
> +ORC_V4SI_UNS iornv4si3 {}
>  
>const vss __builtin_altivec_orc_v8hi (
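The pattern-name clash described in this thread can be illustrated with a small standalone sketch (the optab and mode tables below are abbreviated stand-ins, not GCC's real tables): when a pattern name is matched as an optab prefix followed by a mode name, "andcsi3" is claimed by "and" + "csi" before "andc" + "si" is ever considered, while the renamed "andnsi3" is unambiguous.

```python
def parse_pattern(name, optabs, modes):
    """Return (optab, mode) for a pattern name like 'andcsi3'.

    A sketch of prefix matching: try each optab in order, then check
    whether the remainder is a known mode plus the trailing operand
    count.  Shows why 'andcsi3' resolves to 'and' + 'csi' rather than
    'andc' + 'si'.
    """
    for op in optabs:
        if name.startswith(op):
            rest = name[len(op):]
            for m in modes:
                if rest == m + "3":
                    return (op, m)
    return None

# Abbreviated tables: 'csi'/'cdi' are the complex integer modes.
optabs = ["and", "andc", "ior", "iorc", "andn", "iorn"]
modes = ["si", "di", "csi", "cdi"]

print(parse_pattern("andcsi3", optabs, modes))  # ('and', 'csi'), not ('andc', 'si')
print(parse_pattern("andnsi3", optabs, modes))  # ('andn', 'si') -- unambiguous
```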

Re: [RFC/PATCH] isel: Fold more in gimple_expand_vec_cond_expr with andc/iorc

2024-07-24 Thread Kewen.Lin
on 2024/7/24 06:53, Andrew Pinski wrote:
> On Mon, Jul 22, 2024 at 7:41 PM Kewen.Lin  wrote:
>>
>> Hi Andrew,
>>
>> on 2024/7/23 08:09, Andrew Pinski wrote:
>>> On Sun, Jun 30, 2024 at 11:17 PM Kewen.Lin  wrote:

 Hi,

 As PR115659 shows, assuming c = x CMP y, there are some
 folding chances for patterns r = c ? 0/z : z/-1:
   - For r = c ? 0 : z, it can be folded into r = ~c & z.
   - For r = c ? z : -1, it can be folded into r = ~c | z.

 But BIT_AND/BIT_IOR applied to one BIT_NOT operand is a
 compound operation, and I'm not sure whether every target with
 vector capability has a single vector instruction for it;
 if not, it's arguable whether it always beats vector
 selection (e.g. the vector constant gets hoisted or combined,
 and selection has the same latency as a normal logical operation).
 So IMHO we probably need to query the target with new optabs.
 So this patch is to introduce new optabs andc, iorc and its
 corresponding internal functions BIT_{ANDC,IORC} (looking
 for suggestion for naming optabs and ifns), and if targets
 defines such optabs for vector modes, it means targets
 support these hardware insns and should be not worse than
 vector selection.  btw, the rs6000 changes are meant to
 give an example for a target supporting andc/iorc.

 Does this sound reasonable?
>>>
>>> Just a quick FYI (I will be making the change and testing the change).
>>> The optab names `andc` and `iorc` unfortunately do not work with
>>> scalar modes since there are complex modes which start with c and are
>>> combined with the scalar modes. So for an example a pattern named
>>> `andcsi3` is not for the optab `andc` with the mode of si but rather
>>> for `and` optab and for the mode `csi`. The same issue happens for
>>> `iorc` too.
>>
>> ah, thanks for pointing out this!  I guess a "_" can help, that is:
>>
>> OPTAB_D (andc_optab, "andc_$a3")
>> OPTAB_D (iorc_optab, "iorc_$a3")
>>
>> but the downside is that the code naming becomes different from "and$a3"
>> and "ior$a3", so it seems better to use different names like what
>> you proposed.
>>
>>> Thinking out loud on what names we should use instead; `andn` and
>>> `iorn` might be ok? Does anyone else have any suggestions?
>>
>> FWIW, they look good to me.
> 
> Just FYI. I also noticed the powerpc backend could define these optabs
> for scalars and would benefit from better code with the following
> example (after I finish up my patches):
> ```
> long f1(long a, long b)
> {
> a = ~0x4;
> return a | ~b;
> }
> long f2(long a, long b)
> {
> a = ~0x4;
> return a & ~b;
> }
> ```
> 

Yeah, andc/orc would be better, thanks for the heads up.

BR,
Kewen
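As a per-lane sanity check of the two folds from the original RFC, the identities hold in Python's arbitrary-precision two's-complement semantics, with 0 and -1 standing for the false/true lanes of a vector comparison result:

```python
def select(c, a, b):
    """Vector-lane select r = c ? a : b, where c is 0 (false) or -1 (all ones)."""
    return a if c == -1 else b

# Check the folds r = c ? 0 : z -> ~c & z  and  r = c ? z : -1 -> ~c | z
# for both comparison outcomes and a range of lane values.
for c in (0, -1):
    for z in range(-8, 9):
        assert select(c, 0, z) == ~c & z
        assert select(c, z, -1) == ~c | z
print("folds verified")
```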


Re: [PATCH][contrib]: support json output from check_GNU_style_lib.py

2024-07-24 Thread Filip Kastl
> How about this formatting?  I tend to find it even a bit easier to read.
> I also updated the location numbering to be numerical, so I removed the quotes.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> +elif format == 'json':
> +fn = lambda x: x.error_message
> +i = 1
> +result = []
> +for (k, errors) in groupby(sorted(errors, key = fn), fn):
> +errors = list(errors)
> +entry = {}
> +entry['type'] = i
> +entry['msg'] = k
> +entry['count'] = len(errors)
> +i += 1
> +errlines = []
> +for e in errors:
> +locs = e.error_location ().split(':')
> +errlines.append({ "file": locs[0]
> +, "row": int(locs[1])
> +, "column": int(locs[2])
> +, "err": e.console_error })
> +entry['errors'] = errlines
> +result.append(entry)
> +
> +if len(errors) == 0:
> +exit(0)
> +else:
> +json_string = json.dumps(result)
> +print(json_string)
> +exit(1)
>  else:
>  assert False

Sure, this looks nice.  I'm not sure if I have the right to approve the patch
though.

Cheers,
Filip Kastl
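As a standalone sketch of the grouping-and-JSON shape reviewed above (with toy dictionaries standing in for the checker's error objects; the field names and sample messages are illustrative):

```python
import json
from itertools import groupby

# Toy stand-ins for the checker's error objects.
errors = [
    {"msg": "trailing whitespace", "file": "a.c", "row": 3, "col": 10},
    {"msg": "trailing whitespace", "file": "b.c", "row": 7, "col": 2},
    {"msg": "line too long", "file": "a.c", "row": 5, "col": 81},
]

# Group by message text (groupby requires sorted input), number the
# groups, and emit one entry per distinct message with its locations.
fn = lambda e: e["msg"]
result = []
for i, (msg, group) in enumerate(groupby(sorted(errors, key=fn), fn), start=1):
    group = list(group)
    result.append({
        "type": i,
        "msg": msg,
        "count": len(group),
        "errors": [{"file": e["file"], "row": e["row"], "column": e["col"]}
                   for e in group],
    })

print(json.dumps(result, indent=2))
```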


[PATCH] [x86]Refine constraint "Bk" to define_special_memory_constraint.

2024-07-24 Thread liuhongt
For the pattern below, RA may still allocate r162 as a v/k register and
try to reload the address with leaq __libc_tsd_CTYPE_B@gottpoff(%rip), %rsi,
which results in a linker error.

(set (reg:DI 162)
 (mem/u/c:DI
   (const:DI (unspec:DI
 [(symbol_ref:DI ("a") [flags 0x60]  )]
 UNSPEC_GOTNTPOFF))

Quote from H.J. on why the linker issues an error.
>What do these do:
>
>leaq__libc_tsd_CTYPE_B@gottpoff(%rip), %rax
>vmovq   (%rax), %xmm0
>
>From x86-64 TLS psABI:
>
>The assembler generates for the x@gottpoff(%rip) expressions an
>R_X86_64_GOTTPOFF relocation for the symbol x which requests the linker to
>generate a GOT entry with an R_X86_64_TPOFF64 relocation. The offset of
>the GOT entry relative to the end of the instruction is then used in
>the instruction. The R_X86_64_TPOFF64 relocation is processed at
>program startup time by the dynamic linker by looking up the symbol x
>in the modules loaded at that point. The offset is written in the GOT
>entry and later loaded by the addq instruction.
>
>The above code sequence looks wrong to me.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk and backport?

gcc/ChangeLog:

PR target/116043
* config/i386/constraints.md (Bk): Refine to
define_special_memory_constraint.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr116043.c: New test.
---
 gcc/config/i386/constraints.md   |  2 +-
 gcc/testsuite/gcc.target/i386/pr116043.c | 33 
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr116043.c

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 7508d7a58bd..b760e7c221a 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -187,7 +187,7 @@ (define_special_memory_constraint "Bm"
   "@internal Vector memory operand."
   (match_operand 0 "vector_memory_operand"))
 
-(define_memory_constraint "Bk"
+(define_special_memory_constraint "Bk"
   "@internal TLS address that allows insn using non-integer registers."
   (and (match_operand 0 "memory_operand")
(not (match_test "ix86_gpr_tls_address_pattern_p (op)"
diff --git a/gcc/testsuite/gcc.target/i386/pr116043.c 
b/gcc/testsuite/gcc.target/i386/pr116043.c
new file mode 100644
index 000..76553496c10
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr116043.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bf16 -O3" } */
+/* { dg-final { scan-assembler-not {(?n)lea.*@gottpoff} } } */
+
+extern __thread int a, c, i, j, k, l;
+int *b;
+struct d {
+  int e;
+} f, g;
+char *h;
+
+void m(struct d *n) {
+  b = &k;
+  for (; n->e; b++, n--) {
+i = b && a;
+if (i)
+  j = c;
+  }
+}
+
+char *o(struct d *n) {
+  for (; n->e;)
+return h;
+}
+
+int q() {
+  if (l)
+return 1;
+  int p = *o(&g);
+  m(&f);
+  m(&g);
+  l = p;
+}
-- 
2.31.1



Re: [PATCH] RISC-V: Disable Zba optimization pattern if XTheadMemIdx is enabled

2024-07-24 Thread Christoph Müllner
Is it OK to backport to GCC 14 (patch applies cleanly, test is running)?

On Wed, Jul 24, 2024 at 9:25 AM Kito Cheng  wrote:
>
> LGTM :)
>
> On Wed, Jul 24, 2024 at 3:16 PM Christoph Müllner
>  wrote:
> >
> > It is possible that the Zba optimization pattern zero_extendsidi2_bitmanip
> > matches for a XTheadMemIdx INSN with the effect of emitting an invalid
> > instruction as reported in PR116035.
> >
> > The pattern above is used to emit a zext.w instruction to zero-extend
> > SI mode registers to DI mode.  A similar functionality can be achieved
> > by XTheadBb's th.extu instruction.  And indeed, we have the equivalent
> > pattern in thead.md (zero_extendsidi2_th_extu).  However, that pattern
> > depends on !TARGET_XTHEADMEMIDX.  To compensate for that, there are
> > specific patterns that ensure that zero-extension instruction can still
> > be emitted (th_memidx_bb_zero_extendsidi2 and friends).
> >
> > While we could implement something similar (th_memidx_zba_zero_extendsidi2)
> > it would only make sense, if there existed real HW that does implement Zba
> > and XTheadMemIdx, but not XTheadBb.  Unless such a machine exists, let's
> > simply disable zero_extendsidi2_bitmanip if XTheadMemIdx is available.
> >
> > PR target/116035
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/bitmanip.md: Disable zero_extendsidi2_bitmanip
> > for XTheadMemIdx.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/pr116035-1.c: New test.
> > * gcc.target/riscv/pr116035-2.c: New test.
> >
> > Reported-by: Patrick O'Neill 
> > Signed-off-by: Christoph Müllner 
> > ---
> >  gcc/config/riscv/bitmanip.md|  2 +-
> >  gcc/testsuite/gcc.target/riscv/pr116035-1.c | 29 +
> >  gcc/testsuite/gcc.target/riscv/pr116035-2.c | 26 ++
> >  3 files changed, 56 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr116035-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr116035-2.c
> >
> > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > index f403ba8dbba..6b720992ca3 100644
> > --- a/gcc/config/riscv/bitmanip.md
> > +++ b/gcc/config/riscv/bitmanip.md
> > @@ -22,7 +22,7 @@
> >  (define_insn "*zero_extendsidi2_bitmanip"
> >[(set (match_operand:DI 0 "register_operand" "=r,r")
> > (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,m")))]
> > -  "TARGET_64BIT && TARGET_ZBA"
> > +  "TARGET_64BIT && TARGET_ZBA && !TARGET_XTHEADMEMIDX"
> >"@
> > zext.w\t%0,%1
> > lwu\t%0,%1"
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr116035-1.c 
> > b/gcc/testsuite/gcc.target/riscv/pr116035-1.c
> > new file mode 100644
> > index 000..bc45941ff8f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr116035-1.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-do compile } */
> > +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" "-Os" "-Oz" } } */
> > +/* { dg-options "-march=rv64g_zba_xtheadmemidx" { target { rv64 } } } */
> > +/* { dg-options "-march=rv32g_zba_xtheadmemidx" { target { rv32 } } } */
> > +
> > +void a(long);
> > +unsigned b[11];
> > +void c()
> > +{
> > +  for (int d = 0; d < 11; ++d)
> > +a(b[d]);
> > +}
> > +
> > +#if __riscv_xlen == 64
> > +unsigned long zext64_32(unsigned int u32)
> > +{
> > +  /* Missed optimization for Zba+XTheadMemIdx.  */
> > +  return u32; //zext.w a0, a0
> > +}
> > +#endif
> > +
> > +/* { dg-final { scan-assembler 
> > "th.lwuia\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),4,0" { target rv64 } } } */
> > +/* { dg-final { scan-assembler 
> > "th.lwia\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),4,0" { target rv32 } } } */
> > +
> > +/* { dg-final { scan-assembler-not "lwu\t\[a-x0-9\]+,\(\[a-x0-9\]+\),4,0" 
> > } } */
> > +
> > +/* Missed optimizations for Zba+XTheadMemIdx.  */
> > +/* { dg-final { scan-assembler "zext.w\t" { target rv64 xfail rv64 } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr116035-2.c 
> > b/gcc/testsuite/gcc.target/riscv/pr116035-2.c
> > new file mode 100644
> > index 000..2c1a9694860
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr116035-2.c
> > @@ -0,0 +1,26 @@
> > +/* { dg-do compile } */
> > +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" "-Os" "-Oz" } } */
> > +/* { dg-options "-march=rv64g_xtheadbb_xtheadmemidx" { target { rv64 } } } 
> > */
> > +/* { dg-options "-march=rv32g_xtheadbb_xtheadmemidx" { target { rv32 } } } 
> > */
> > +
> > +void a(long);
> > +unsigned b[11];
> > +void c()
> > +{
> > +  for (int d = 0; d < 11; ++d)
> > +a(b[d]);
> > +}
> > +
> > +#if __riscv_xlen == 64
> > +unsigned long zext64_32(unsigned int u32)
> > +{
> > +return u32; //th.extu a0, a0, 31, 0
> > +}
> > +#endif
> > +
> > +/* { dg-final { scan-assembler 
> > "th.lwuia\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),4,0" { target { rv64 } } } } */
> > +/* { dg-final { scan-assembler 
> > "th.lwia\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),4,0" { target { rv32 } } } } */
> > +
> > +/* { dg-final { scan-assembler-not 
>

Re: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-24 Thread Jennifer Schmitz
The following typo was corrected; the updated patch file is below:

/* Fuse CMP and CSET. */
- if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
+ if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)


0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch
Description: Binary data


> On 23 Jul 2024, at 12:16, Jennifer Schmitz  wrote:
> 
> According to the Neoverse V2 Software Optimization Guide (section 4.14), the
> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
> implemented so far. This patch implements and tests the two fusion pairs.
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> There was also no non-noise impact on SPEC CPU2017 benchmark.
> OK for mainline?
> 
> Signed-off-by: Jennifer Schmitz 
> 
> gcc/
> 
> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
> fusion logic.
> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
> (cmp+cset): Likewise.
> * config/aarch64/tuning_models/neoversev2.h: Enable logic in
> field fusible_ops.
> 
> gcc/testsuite/
> 
> * gcc.target/aarch64/fuse_cmp_csel.c: New test.
> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>




Patch ping

2024-07-24 Thread Jakub Jelinek
Hi!

I'd like to ping the C23 #embed patchset:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655013.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657049.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657053.html

Thanks

Jakub



RE: [PATCH v2] AArch64: Add LUTI ACLE for SVE2

2024-07-24 Thread Vladimir Miloserdov
Hi Kyrill,

Thanks for your prompt review again!

>It looks like __ARM_FEATURE_LUT should guard the Advanced SIMD intrinsics for 
>LUTI at least.
>Therefore we should add this macro definition only once those are implemented 
>as well. So I’d remove this hunk.
I'm working on adding those Advanced SIMD intrinsics now.  The flag should be 
added by either SVE or AdvSIMD LUTI intrinsics patch from my understanding. I 
think this one is first to go in, so should we leave it here?

>Since this patch depends on Andrew’s feature flags rework shouldn’t this hunk 
>look like:
>AARCH64_HAVE_ISA (LUT)
>Instead?
Yes, looks like I was using an older revision of Andrew's work. I'll wait for 
it to be merged and update the patch then.

>+# Return 1 if this is an AArch64 target supporting LUT (Lookup table)
>I don’t see this effective target check used anywhere, do we need to add it?
>I guess I don’t mind it for future use, but just checking that it was 
>deliberate.
It is deliberate - I thought it may be useful to have for future use.

BR,
- Vladimir Miloserdov


[Patch, rs6000] Improve loop_unroll_adjust

2024-07-24 Thread Ajit Agarwal
Hello All:

This patch improves loop_unroll_adjust by taking the loop's memory
reference count into account when calculating the unroll factor.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

rs6000: Improve loop_unroll_adjust

Improve loop_unroll_adjust by taking the loop's memory reference count
into account when calculating the unroll factor.

2024-07-24  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_loop_unroll_adjust): Take the loop's
memory reference count into account when calculating the unroll factor.
---
 gcc/config/rs6000/rs6000.cc | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 0bcc6a2d0ab..3dd3857a74e 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -81,6 +81,7 @@
 #include "ppc-auxv.h"
 #include "rs6000-internal.h"
 #include "opts.h"
+#include "rtl-iter.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -5570,7 +5571,11 @@ rs6000_cost_data::finish_cost (const vector_costs 
*scalar_costs)
 static unsigned
 rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
 {
-   if (unroll_only_small_loops)
+  basic_block *bbs;
+  rtx_insn *insn;
+  unsigned i;
+  unsigned mem_count = 0;
+  if (unroll_only_small_loops)
 {
   /* TODO: These are hardcoded values right now.  We probably should use
 a PARAM here.  */
@@ -5582,6 +5587,28 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct loop 
*loop)
   return 0;
 }
 
+   /* Count the number of memory references within the loop body. */
+  subrtx_iterator::array_type array;
+  bbs = get_loop_body (loop);
+  for (i = 0; i < loop->num_nodes; i++)
+FOR_BB_INSNS (bbs[i], insn)
+  if (NONDEBUG_INSN_P (insn))
+   FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST)
+ if (const_rtx x = *iter)
+   if (MEM_P (x))
+ {
+   machine_mode mode = GET_MODE (x);
+   unsigned int n_words = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+   if (n_words > 4)
+ mem_count += 2;
+   else
+ mem_count += 1;
+ }
+  free (bbs);
+
+  if (mem_count && mem_count <=32)
+return MIN (nunroll, 32 / mem_count);
+
   return nunroll;
 }
 
-- 
2.43.5
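The heuristic in the hunk above can be modelled outside the compiler; the sketch below mirrors the patch's weighting (accesses wider than four words count double, and the unroll factor is capped at 32 / mem_count). The helper name and the plain list of access sizes are illustrative stand-ins for the RTL walk done with FOR_EACH_SUBRTX:

```python
def unroll_factor(nunroll, mem_sizes_bytes, word_size=8):
    """Model of the proposed rs6000_loop_unroll_adjust memory heuristic.

    mem_sizes_bytes lists the size of each memory reference in the loop
    body.  Accesses spanning more than four words count as two, as in
    the patch; a loop with no memory references keeps nunroll."""
    mem_count = 0
    for size in mem_sizes_bytes:
        n_words = size // word_size
        mem_count += 2 if n_words > 4 else 1
    if mem_count and mem_count <= 32:
        return min(nunroll, 32 // mem_count)
    return nunroll

# Four 8-byte loads: 32 // 4 = 8, but capped by the requested factor 4.
print(unroll_factor(4, [8, 8, 8, 8]))  # 4
# Eight 8-byte loads: 32 // 8 = 4 limits the requested factor 8.
print(unroll_factor(8, [8] * 8))       # 4
```

Eyeballing the model this way makes it easy to see that memory-heavy loops get their unroll factor scaled down while memory-free loops are untouched.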



Re: [PATCH v2] AArch64: Add LUTI ACLE for SVE2

2024-07-24 Thread Kyrylo Tkachov
Hi Vladimir,

> On 24 Jul 2024, at 11:41, Vladimir Miloserdov  
> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi Kyrill,
> 
> Thanks for your prompt review again!
> 
>> It looks like __ARM_FEATURE_LUT should guard the Advanced SIMD intrinsics 
>> for LUTI at least.
>> Therefore we should add this macro definition only once those are 
>> implemented as well. So I’d remove this hunk.
> I'm working on adding those Advanced SIMD intrinsics now.  The flag should be 
> added by either SVE or AdvSIMD LUTI intrinsics patch from my understanding. I 
> think this one is first to go in, so should we leave it here?

As the macro gives the user information that both SVE and AdvSIMD intrinsics 
are available, its definition should only go in once both AdvSIMD and SVE 
intrinsics are implemented. So either this patch goes in after the AdvSIMD one, 
or the macro definition goes into the AdvSIMD patch after this patch goes in.
Thanks,
Kyrill

> 
>> Since this patch depends on Andrew’s feature flags rework shouldn’t this 
>> hunk look like:
>> AARCH64_HAVE_ISA (LUT)
>> Instead?
> Yes, looks like I was using an older revision of Andrew's work. I'll wait for 
> it to be merged and update the patch then.
> 
>> +# Return 1 if this is an AArch64 target supporting LUT (Lookup table)
>> I don’t see this effective target check used anywhere, do we need to add it?
>> I guess I don’t mind it for future use, but just checking that it was 
>> deliberate.
> It is deliberate - I thought it may be useful to have for future use.
> 
> BR,
> - Vladimir Miloserdov



Re: [PATCH v3 2/2] Prevent divide-by-zero

2024-07-24 Thread Robin Dapp
> Thanks for the explanation! I have a few clarification questions about this.
> If I understand correctly, B would represent the number of elements the
> vector can have (for 128b vector operating on 32b elements, B == 4, but if
> operating on 64b elements B == 2); however, I'm not too sure what A
> represents.

The runtime size of a vector is a polynomial A + B*x with a "base size" of A
and "increments beyond A" of B.  The indeterminate x is unknown at compile
time but invariant at runtime.  For (non-zve32) RVV it specifies the number of 64-bit
chunks beyond the minimum size of 64 bits.  The polynomial is [2 2] in that
case and the "vector bit size" would be
  64 * [2 2] = [128 128] = 128 + x * 128.
For a runtime vector size of 256 bits, x would be 1 and so on and we determine
it at runtime via csrr.

> On the poly_int docs, it says
> > An indeterminate value of 0 should usually represent the minimum possible
> > runtime value, with c0 specifying the value in that case.
> "minimum possible runtime value" doesn't make sense to me. Does it mean the
> potential minimum bound of elements it will operate on?

This refers to the minimum runtime size of a vector, the constant 2 * 64 bit
above.  So it doesn't talk about the number of elements.  The number of
elements can be deducted from the "vector size" polynomial by dividing it by
the element size.  The minimum number of elements for an element size S could
e.g. be [128 128] / S = 128 / S + x * (128 / S).

-- 
Regards
 Robin



[PATCH] gm2: fix bad programming practice warning

2024-07-24 Thread Wilken Gottwalt
Fix identifier names that are too similar to Modula-2 keywords and cause
warnings in Modula-2's own libraries.

m2/m2iso/StdChans.mod:54:20: note: In implementation module ‘StdChans’:
either the identifier has the same name as a keyword or alternatively a
keyword has the wrong case (‘IN’ and ‘in’)
   54 |stdnull: ChanId ;

m2/m2iso/StdChans.mod:54:20: note: the symbol name ‘in’ is legal as an
identifier, however as such it might cause confusion and is considered
bad programming practice

gcc/gm2:
* gm2-libs-iso/StdChans.mod: Fix bad identifier warning.

Signed-off-by: Wilken Gottwalt 
---
 gcc/m2/gm2-libs-iso/StdChans.mod | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/m2/gm2-libs-iso/StdChans.mod b/gcc/m2/gm2-libs-iso/StdChans.mod
index fbefbde4b10..e15d4ef9580 100644
--- a/gcc/m2/gm2-libs-iso/StdChans.mod
+++ b/gcc/m2/gm2-libs-iso/StdChans.mod
@@ -45,9 +45,9 @@ FROM RTgen IMPORT ChanDev, DeviceType,
 
 
 VAR
-   in,
-   out,
-   err,
+   inch,
+   outch,
+   errch,
stdin,
stdout,
stderr,
@@ -169,21 +169,21 @@ END NullChan ;
 PROCEDURE InChan () : ChanId ;
   (* Returns the identity of the current default input channel. *)
 BEGIN
-   RETURN( in )
+   RETURN( inch )
 END InChan ;
 
 
 PROCEDURE OutChan () : ChanId ;
   (* Returns the identity of the current default output channel. *)
 BEGIN
-   RETURN( out )
+   RETURN( outch )
 END OutChan ;
 
 
 PROCEDURE ErrChan () : ChanId ;
   (* Returns the identity of the current default error message channel. *)
 BEGIN
-   RETURN( err )
+   RETURN( errch )
 END ErrChan ;
 
   (* The following procedures allow for redirection of the default channels *)
@@ -191,21 +191,21 @@ END ErrChan ;
 PROCEDURE SetInChan (cid: ChanId) ;
   (* Sets the current default input channel to that identified by cid. *)
 BEGIN
-   in := cid
+   inch := cid
 END SetInChan ;
 
 
 PROCEDURE SetOutChan (cid: ChanId) ;
   (* Sets the current default output channel to that identified by cid. *)
 BEGIN
-   out := cid
+   outch := cid
 END SetOutChan ;
 
 
 PROCEDURE SetErrChan (cid: ChanId) ;
   (* Sets the current default error channel to that identified by cid. *)
 BEGIN
-   err := cid
+   errch := cid
 END SetErrChan ;
 
 
@@ -303,9 +303,9 @@ END Init ;
 BEGIN
Init
 FINALLY
-   SafeClose(in) ;
-   SafeClose(out) ;
-   SafeClose(err) ;
+   SafeClose(inch) ;
+   SafeClose(outch) ;
+   SafeClose(errch) ;
SafeClose(stdin) ;
SafeClose(stdout) ;
SafeClose(stderr)
-- 
2.45.2



RE: [PATCH][contrib]: support json output from check_GNU_style_lib.py

2024-07-24 Thread Vladimir Miloserdov
Hi Tamar,

A few suggestions below.

>diff --git a/contrib/check_GNU_style_lib.py b/contrib/check_GNU_style_lib.py 
>index 6dbe4b53559c63d2e0276d0ff88619cd2f7f8e06..ab21ed4607593668ab95f24715295a41ac7d8a21 100755
>--- a/contrib/check_GNU_style_lib.py
>+++ b/contrib/check_GNU_style_lib.py
>@@ -29,6 +29,7 @@
> import sys
> import re
> import unittest
>+import json
 
> def import_pip3(*args):
> missing=[]
>@@ -317,6 +318,33 @@ def check_GNU_style_file(file, format):
> else:
> print('%d error(s) written to %s file.' % (len(errors), f))
> exit(1)
>+elif format == 'json':
>+fn = lambda x: x.error_message
>+i = 1
>+result = []
>+for (k, errors) in groupby(sorted(errors, key = fn), fn):
>+errors = list(errors)
>+entry = {}
>+entry['type'] = i
>+entry['msg'] = k
>+entry['count'] = len(errors)
>+i += 1
>+errlines = []
>+for e in errors:
>+locs = e.error_location ().split(':')
>+errlines.append({ "file": locs[0]
>+, "row": int(locs[1])
>+, "column": int(locs[2])
>+, "err": e.console_error })
>+entry['errors'] = errlines
>+result.append(entry)
>+
>+if len(errors) == 0:
>+exit(0)
>+else:
>+json_string = json.dumps(result)
>+print(json_string)
>+exit(1)
> else:
> assert False

Might be a good idea to rename "fn" -> "get_err", "i" -> "error_type_counter", 
"k" -> "error_message", "errors" -> "grouped_errors" to make it easier to 
understand. 

You can also simplify "entry" construction like this:
entry = {
'type': error_type_counter,
'msg': error_message,
'count': len(grouped_errors)
}
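Putting the suggested renames together, the grouping loop might read as below. This is a sketch of the review suggestion, not the final patch: the tuple shape used for errors here is a hypothetical stand-in for the script's error objects, which expose error_location() and console_error instead.

```python
import json
from itertools import groupby

def errors_to_json(errors):
    """Group style errors by message and emit the proposed JSON shape.

    Each error here is a (message, file, row, column, text) tuple."""
    get_err = lambda e: e[0]
    result = []
    for error_type_counter, (error_message, grouped) in \
            enumerate(groupby(sorted(errors, key=get_err), get_err),
                      start=1):
        grouped_errors = list(grouped)
        entry = {
            'type': error_type_counter,
            'msg': error_message,
            'count': len(grouped_errors),
            'errors': [{'file': f, 'row': r, 'column': c, 'err': t}
                       for _, f, r, c, t in grouped_errors],
        }
        result.append(entry)
    return json.dumps(result)

print(errors_to_json([('trailing whitespace', 'a.c', 3, 10, 'int x; '),
                      ('trailing whitespace', 'b.c', 7, 2, 'y++; ')]))
```

With the descriptive names, the relationship between the grouping key, the running type counter, and the per-group error list is immediate.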

BR,
- Vladimir Miloserdov


Re: [PATCH] optabs/rs6000: Rename iorc and andc to iorn and andn

2024-07-24 Thread Richard Biener
On Wed, Jul 24, 2024 at 9:38 AM Kewen.Lin  wrote:
>
> Hi Andrew,
>
> on 2024/7/24 10:49, Andrew Pinski wrote:
> > When I was trying to add an scalar version of iorc and andc, the optab that
> > got matched was for and/ior with the mode of csi and cdi instead of iorc and
> > andc optabs for si and di modes. Since csi/cdi are the complex integer 
> > modes,
> > we need to rename the optabs to be without c there. This changes c to n 
> > which
> > is a neutral and known not to be first letter of a mode.
> >
> > Bootstrapped and tested on x86_64 and powerpc64le.
> >
> > gcc/ChangeLog:
> >
> >   * config/rs6000/rs6000-builtins.def: s/iorc/iorn/. s/andc/andn/
> >   for the code.
> >   * config/rs6000/rs6000-string.cc (expand_cmp_vec_sequence): Update
> >   to andn.
>
> Nit: s/andn/iorn/
>
> >   * config/rs6000/rs6000.md (andc3): Rename to ...
> >   (andn3): This.
> >   (iorc3): Rename to ...
> >   (iorn3): This.
>
> Thanks for doing this, rs6000 part change is OK (in case you need that).

OK for the middle-end parts.

Richard.

> BR,
> Kewen
>
> >   * doc/md.texi: Update documentation for the rename.
> >   * internal-fn.def (BIT_ANDC): Rename to ...
> >   (BIT_ANDN): This.
> >   (BIT_IORC): Rename to ...
> >   (BIT_IORN): This.
> >   * optabs.def (andc_optab): Rename to ...
> >   (andn_optab): This.
> >   (iorc_optab): Rename to ...
> >   (iorn_optab): This.
> >   * gimple-isel.cc (gimple_expand_vec_cond_expr): Update for the
> >   renamed internal functions, ANDC/IORC to ANDN/IORN.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/config/rs6000/rs6000-builtins.def | 44 +--
> >  gcc/config/rs6000/rs6000-string.cc|  2 +-
> >  gcc/config/rs6000/rs6000.md   |  4 +--
> >  gcc/doc/md.texi   |  8 ++---
> >  gcc/gimple-isel.cc| 12 
> >  gcc/internal-fn.def   |  4 +--
> >  gcc/optabs.def| 10 --
> >  7 files changed, 44 insertions(+), 40 deletions(-)
> >
> > diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> > b/gcc/config/rs6000/rs6000-builtins.def
> > index 77eb0f7e406..ffbeff64d6d 100644
> > --- a/gcc/config/rs6000/rs6000-builtins.def
> > +++ b/gcc/config/rs6000/rs6000-builtins.def
> > @@ -518,25 +518,25 @@
> >  VAND_V8HI_UNS andv8hi3 {}
> >
> >const vsc __builtin_altivec_vandc_v16qi (vsc, vsc);
> > -VANDC_V16QI andcv16qi3 {}
> > +VANDC_V16QI andnv16qi3 {}
> >
> >const vuc __builtin_altivec_vandc_v16qi_uns (vuc, vuc);
> > -VANDC_V16QI_UNS andcv16qi3 {}
> > +VANDC_V16QI_UNS andnv16qi3 {}
> >
> >const vf __builtin_altivec_vandc_v4sf (vf, vf);
> > -VANDC_V4SF andcv4sf3 {}
> > +VANDC_V4SF andnv4sf3 {}
> >
> >const vsi __builtin_altivec_vandc_v4si (vsi, vsi);
> > -VANDC_V4SI andcv4si3 {}
> > +VANDC_V4SI andnv4si3 {}
> >
> >const vui __builtin_altivec_vandc_v4si_uns (vui, vui);
> > -VANDC_V4SI_UNS andcv4si3 {}
> > +VANDC_V4SI_UNS andnv4si3 {}
> >
> >const vss __builtin_altivec_vandc_v8hi (vss, vss);
> > -VANDC_V8HI andcv8hi3 {}
> > +VANDC_V8HI andnv8hi3 {}
> >
> >const vus __builtin_altivec_vandc_v8hi_uns (vus, vus);
> > -VANDC_V8HI_UNS andcv8hi3 {}
> > +VANDC_V8HI_UNS andnv8hi3 {}
> >
> >const vsc __builtin_altivec_vavgsb (vsc, vsc);
> >  VAVGSB avgv16qi3_ceil {}
> > @@ -1189,13 +1189,13 @@
> >  VAND_V2DI_UNS andv2di3 {}
> >
> >const vd __builtin_altivec_vandc_v2df (vd, vd);
> > -VANDC_V2DF andcv2df3 {}
> > +VANDC_V2DF andnv2df3 {}
> >
> >const vsll __builtin_altivec_vandc_v2di (vsll, vsll);
> > -VANDC_V2DI andcv2di3 {}
> > +VANDC_V2DI andnv2di3 {}
> >
> >const vull __builtin_altivec_vandc_v2di_uns (vull, vull);
> > -VANDC_V2DI_UNS andcv2di3 {}
> > +VANDC_V2DI_UNS andnv2di3 {}
> >
> >const vd __builtin_altivec_vnor_v2df (vd, vd);
> >  VNOR_V2DF norv2df3 {}
> > @@ -1975,40 +1975,40 @@
> >  NEG_V2DI negv2di2 {}
> >
> >const vsc __builtin_altivec_orc_v16qi (vsc, vsc);
> > -ORC_V16QI iorcv16qi3 {}
> > +ORC_V16QI iornv16qi3 {}
> >
> >const vuc __builtin_altivec_orc_v16qi_uns (vuc, vuc);
> > -ORC_V16QI_UNS iorcv16qi3 {}
> > +ORC_V16QI_UNS iornv16qi3 {}
> >
> >const vsq __builtin_altivec_orc_v1ti (vsq, vsq);
> > -ORC_V1TI iorcv1ti3 {}
> > +ORC_V1TI iornv1ti3 {}
> >
> >const vuq __builtin_altivec_orc_v1ti_uns (vuq, vuq);
> > -ORC_V1TI_UNS iorcv1ti3 {}
> > +ORC_V1TI_UNS iornv1ti3 {}
> >
> >const vd __builtin_altivec_orc_v2df (vd, vd);
> > -ORC_V2DF iorcv2df3 {}
> > +ORC_V2DF iornv2df3 {}
> >
> >const vsll __builtin_altivec_orc_v2di (vsll, vsll);
> > -ORC_V2DI iorcv2di3 {}
> > +ORC_V2DI iornv2di3 {}
> >
> >const vull __builtin_altivec_orc_v2di_uns (vull, vull);
> > -ORC_V2DI_UNS iorcv2di3 {}
> > +ORC_V2DI_UNS iornv2di3 {}
> >
> >const vf __builtin_altivec_orc_v4sf (vf, vf);
> > -ORC_V4SF i

Re: [PATCH v3 2/2] Prevent divide-by-zero

2024-07-24 Thread Richard Biener
On Wed, Jul 24, 2024 at 1:31 AM Edwin Lu  wrote:
>
>
> On 7/23/2024 11:20 AM, Richard Sandiford wrote:
> > Edwin Lu  writes:
> >> On 7/23/2024 4:56 AM, Richard Biener wrote:
> >>> On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu  wrote:
>  Hi Richard,
> 
>  On 5/31/2024 1:48 AM, Richard Biener wrote:
> > On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill  
> > wrote:
> >> From: Greg McGary 
> > Still a NACK.  If remain ends up zero then
> >
> >/* Try to use a single smaller load when we are 
> > about
> >   to load excess elements compared to the 
> > unrolled
> >   scalar loop.  */
> >if (known_gt ((vec_num * j + i + 1) * nunits,
> >   (group_size * vf - gap)))
> >  {
> >poly_uint64 remain = ((group_size * vf - gap)
> >  - (vec_num * j + i) * 
> > nunits);
> >if (known_ge ((vec_num * j + i + 1) * nunits
> >  - (group_size * vf - gap), 
> > nunits))
> >  /* DR will be unused.  */
> >  ltype = NULL_TREE;
> >
> > needs to be re-formulated so that the combined conditions make sure
> > this doesn't happen.  The outer known_gt should already ensure that
> > remain > 0.  For correctness that should possibly be maybe_gt though.
> > Yeah.  FWIW, I mentioned the maybe_gt thing in
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653013.html:
> >
> >Pre-existing, but shouldn't this be maybe_gt rather than known_gt?
> >We can only skip doing it if we know for sure that the load won't cross
> >the gap.  (Not sure whether the difference can trigger in practice.)
> >
> > But AFAICT, the known_gt doesn't inherently prove that remain is known
> > to be nonzero.  It just proves that the gap between the end of the scalar
> > accesses and the end of this vector is known to be nonzero.
> >
> >>> Putting the list back in the loop and CCing Richard S.
> >>>
>  I'm currently looking into this patch and am trying to figure out what
>  is going on. Stepping through gdb, I see that remain == {coeffs = {0,
>  2}} and nunits == {coeffs = {2, 2}} (the outer known_gt returned true
>  with known_gt({coeffs = {8, 8}}, {coeffs = {6, 8}})).
> 
> From what I understand, this falls under the umbrella of 0 <= remain <
>  nunits. The divide by zero error is because of the 0 <= remain which is
>  coming from the constant_multiple_p function in poly-int.h where it
>  performs the modulus NCa(a.coeffs[0]) % NCb(b.coeffs[0]).
>  (https://github.com/gcc-mirror/gcc/blob/master/gcc/poly-int.h#L1970-L1971)
> 
> 
> >  if (known_ge ((vec_num * j + i + 1) * 
>  nunits
> >- (group_size * vf - gap),
>  nunits))
> >/* DR will be unused.  */
> >ltype = NULL_TREE;
> 
>  This if condition is a bit suspicious to me though. I'm seeing that it's
>  evaluating known_ge({coeffs = {2, 0}}, {coeffs = {2, 2}}) which is
>  returning false. Should it be maybe_ge instead?
> >>> No, we can only not emit a load if we know it won't be used, not if
> >>> it eventually cannot be used.
> > Agreed.
> >
> > [switching round for easier reply]
>  After running some
>  tests, to me it looks like it doesn't vectorize quite as often; however,
>  I'm not fully sure what else to do when the coeffs can potentially be
>  equal to 0.
> 
>  Should it even be possible for there to be a {coeffs = {0, n}}
>  situation? My understanding of how poly_ints are used for representing
>  vectorization is that the first coefficient is the number of elements
>  needed to make the minimum supported vector size. That is, if vector
>  lengths are 128 bits, element size is 32 bits, coeff[0] should be
>  minimum of 4. Is this understanding correct?
> >>> I was told n can be negative, but nunits.coeff[0] should be non-zero.
> >> What would it mean for the coeffs[0] to be 0? Would that mean the vector 
> >> length supports 0 bits?
> > coeffs = {A,B} just means A+B*X, where X is the number of vector
> > "chunks" beyond the minimum length.  It's certainly valid for a poly_int
> > to have a zero coeffs[0] (i.e. zero A).  For example, (the length of a
> > vector) - (the minimum length) would have this property.
>
> Thanks for the explanation! I have a few clarification questions about this.
> If I understand correctly, B would represent the number of elements the 
> vector can have (for 128b vector operating on 32b elements, B == 4, but if 
> operating on 64b elements B == 2);

Re: [PATCH v2] MATCH: Add simplification for MAX and MIN to match.pd [PR109878]

2024-07-24 Thread Richard Biener
On Fri, Jul 19, 2024 at 7:19 PM Eikansh Gupta  wrote:
>
> Min and max could be optimized if both operands are defined by
> (same) variable restricted by an and(&). For signed types,
> optimization can be done when both constant have same sign bit.
> The patch also adds optimization for specific case of min/max(a, a&CST).
>
> This patch adds match pattern for:
>
> max (a & CST0, a & CST1) -> a & CST0 IFF CST0 & CST1 == CST1
> min (a & CST0, a & CST1) -> a & CST0 IFF CST0 & CST1 == CST0
> min (a, a & CST) --> a & CST
> max (a, a & CST) --> a
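The four rewrites listed above can be spot-checked exhaustively over small unsigned masks; the signed case with matching sign bits is not covered by this quick illustration (an editor's sketch, not part of the patch):

```python
def check_minmax_identities(bits=6):
    """Exhaustively verify the four min/max rewrites on unsigned values.

    If CST1's bits are a subset of CST0's, then a & CST1 <= a & CST0
    numerically, which is what makes each rewrite valid."""
    mask = (1 << bits) - 1
    for cst0 in range(mask + 1):
        for cst1 in range(mask + 1):
            for a in range(mask + 1):
                if cst0 & cst1 == cst1:
                    assert max(a & cst0, a & cst1) == a & cst0
                if cst0 & cst1 == cst0:
                    assert min(a & cst0, a & cst1) == a & cst0
                # min/max (a, a & CST) for unsigned a.
                assert min(a, a & cst1) == a & cst1
                assert max(a, a & cst1) == a
    return True

print(check_minmax_identities())  # True
```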

OK.

Richard.

> PR tree-optimization/109878
>
> gcc/ChangeLog:
>
> * match.pd min/max (a & CST0, a & CST1): New pattern.
> min/max (a, a & CST): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr109878-1.c: New test.
> * gcc.dg/tree-ssa/pr109878-2.c: New test.
> * gcc.dg/tree-ssa/pr109878-3.c: New test.
> * gcc.dg/tree-ssa/pr109878.c: New test.
>
> Signed-off-by: Eikansh Gupta 
> ---
>  gcc/match.pd   | 26 +
>  gcc/testsuite/gcc.dg/tree-ssa/pr109878-1.c | 64 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr109878-2.c | 31 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr109878-3.c | 42 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr109878.c   | 64 ++
>  5 files changed, 227 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109878-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109878-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109878-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109878.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cf359b0ec0f..dbaff6ab3da 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4321,6 +4321,32 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  @0
>  @2)))
>
> +/* min (a & CST0, a & CST1) -> a & CST0 IFF CST0 & CST1 == CST0 */
> +/* max (a & CST0, a & CST1) -> a & CST0 IFF CST0 & CST1 == CST1 */
> +/* If signed a, then both the constants should have same sign. */
> +(for minmax (min max)
> + (simplify
> +  (minmax (bit_and@3 @0 INTEGER_CST@1) (bit_and@4 @0 INTEGER_CST@2))
> +   (if (TYPE_UNSIGNED (type)
> +|| (tree_int_cst_sgn (@1) == tree_int_cst_sgn (@2)))
> +(with { auto andvalue = wi::to_wide (@1) & wi::to_wide (@2); }
> + (if (andvalue == ((minmax == MIN_EXPR)
> +   ? wi::to_wide (@1) : wi::to_wide (@2)))
> +  @3
> +  (if (andvalue == ((minmax != MIN_EXPR)
> +? wi::to_wide (@1) : wi::to_wide (@2)))
> +   @4))
> +
> +/* min (a, a & CST) --> a & CST */
> +/* max (a, a & CST) --> a */
> +(for minmax (min max)
> + (simplify
> +  (minmax @0 (bit_and@1 @0 INTEGER_CST@2))
> +   (if (TYPE_UNSIGNED(type))
> +(if (minmax == MIN_EXPR)
> + @1
> + @0
> +
>  /* Simplify min (&var[off0], &var[off1]) etc. depending on whether
> the addresses are known to be less, equal or greater.  */
>  (for minmax (min max)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109878-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr109878-1.c
> new file mode 100644
> index 000..509e59adea1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109878-1.c
> @@ -0,0 +1,64 @@
> +/* PR tree-optimization/109878 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-optimized" } */
> +
> +/* All the constant pair  used here satisfy the condition:
> +   (cst0 & cst1 == cst0) || (cst0 & cst1 == cst1).
> +   If the above condition is true, then MIN_EXPR is not needed. */
> +int min_and(int a, int b) {
> +  b = a & 3;
> +  a = a & 1;
> +  if (b < a)
> +return b;
> +  else
> +return a;
> +}
> +
> +int min_and1(int a, int b) {
> +  b = a & 3;
> +  a = a & 15;
> +  if (b < a)
> +return b;
> +  else
> +return a;
> +}
> +
> +int min_and2(int a, int b) {
> +  b = a & -7;
> +  a = a & -3;
> +  if (b < a)
> +return b;
> +  else
> +return a;
> +}
> +
> +int min_and3(int a, int b) {
> +  b = a & -5;
> +  a = a & -13;
> +  if (b < a)
> +return b;
> +  else
> +return a;
> +}
> +
> +/* When constants are of opposite signs, the simplification will only
> +   work for unsigned types. */
> +unsigned int min_and4(unsigned int a, unsigned int b) {
> +  b = a & 3;
> +  a = a & -5;
> +  if (b < a)
> +return b;
> +  else
> +return a;
> +}
> +
> +unsigned int min_and5(unsigned int a, unsigned int b) {
> +  b = a & -3;
> +  a = a & 5;
> +  if (b < a)
> +return b;
> +  else
> +return a;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " MIN_EXPR " "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109878-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr109878-2.c
> new file mode 100644
> index 000..1503dcde1cb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109878-2.c
> @@ -0,0 +1,31 @@
> +/* PR tree-optimization/109878 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-optimized" } */
> +
> +/* The testcases here shou

Re: [PATCH 1/8] libstdc++: Clean up @diff@ markup in some I/O tests

2024-07-24 Thread Jonathan Wakely
I've pushed this series to trunk now.

On Mon, 22 Jul 2024 at 17:34, Jonathan Wakely  wrote:
>
> Tested x86_64-linux.
>
> -- >8 --
>
> We have a number of 27_io/* tests with comments like this:
>
> // @require@ %-*.tst
> // @diff@ %-*.tst %-*.txt
>
> It seems that these declare required data files used by the test and a
> post-test action to compare the test output with the expected result.
> We do have tests that depend on some *.tst and/or *.txt files that are
> copied from testsuite/data into each test's working directory before it
> runs, so the comments are related to those dependencies.  However,
> nothing in the current test framework actually makes use of these
> comments. Currently, every test gets a fresh copy of every *.tst and
> *.txt file in the testsuite/data directory, whether the test actually
> requires them or not.
>
> This change is the first in a series to clean up this unused markup in
> the tests. This first step is to just remove all @require@ and @diff@
> comments where they seem to serve no purpose at all. These tests do not
> open any of the *.tst or *.txt files that are copied into the test's
> working directory from the testsuite/data directory, so they don't
> "require" any of those files, and there's no need to "diff" them after
> the test runs.
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/27_io/basic_filebuf/close/char/4879.cc: Remove
> @require@ and @diff@ comments.
> * testsuite/27_io/basic_filebuf/close/char/9964.cc: Likewise.
> * testsuite/27_io/basic_filebuf/open/char/3.cc: Likewise.
> * testsuite/27_io/basic_filebuf/open/char/9507.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sbumpc/char/1-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sbumpc/char/2-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sgetc/char/1-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sgetc/char/2-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sgetn/char/1-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sgetn/char/2-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/snextc/char/2-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sputbackc/char/1-io.cc:
> Likewise.
> * testsuite/27_io/basic_filebuf/sputbackc/char/1-out.cc:
> Likewise.
> * testsuite/27_io/basic_filebuf/sputbackc/char/2-io.cc:
> Likewise.
> * testsuite/27_io/basic_filebuf/sputbackc/char/2-out.cc:
> Likewise.
> * testsuite/27_io/basic_filebuf/sputc/char/1-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sputc/char/2-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sputn/char/1-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sputn/char/2-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sungetc/char/1-io.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sungetc/char/1-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sungetc/char/2-io.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sungetc/char/2-out.cc: Likewise.
> * testsuite/27_io/basic_filebuf/sputc/char/1-io.cc: Likewise.
> Remove unused variable.
> * testsuite/27_io/basic_filebuf/sputn/char/2-io.cc: Likewise.
> * testsuite/27_io/basic_ofstream/cons/char/1.cc: Remove
> @require@ and @diff@ comments. Remove unused variables.
> * testsuite/27_io/basic_ofstream/rdbuf/char/2832.cc: Remove
> @require@ and @diff@ comments.
> * testsuite/27_io/ios_base/sync_with_stdio/2.cc: Likewise.
> ---
>  .../testsuite/27_io/basic_filebuf/close/char/4879.cc   |  4 +---
>  .../testsuite/27_io/basic_filebuf/close/char/9964.cc   |  4 +---
>  .../testsuite/27_io/basic_filebuf/open/char/3.cc   |  4 +---
>  .../testsuite/27_io/basic_filebuf/open/char/9507.cc|  4 +---
>  .../testsuite/27_io/basic_filebuf/sbumpc/char/1-out.cc |  5 +
>  .../testsuite/27_io/basic_filebuf/sbumpc/char/2-out.cc |  5 +
>  .../testsuite/27_io/basic_filebuf/sgetc/char/1-out.cc  |  5 +
>  .../testsuite/27_io/basic_filebuf/sgetc/char/2-out.cc  |  5 +
>  .../testsuite/27_io/basic_filebuf/sgetn/char/1-out.cc  |  5 +
>  .../testsuite/27_io/basic_filebuf/sgetn/char/2-out.cc  |  5 +
>  .../testsuite/27_io/basic_filebuf/snextc/char/2-out.cc |  5 +
>  .../27_io/basic_filebuf/sputbackc/char/1-io.cc |  5 +
>  .../27_io/basic_filebuf/sputbackc/char/1-out.cc|  5 +
>  .../27_io/basic_filebuf/sputbackc/char/2-io.cc |  5 +
>  .../27_io/basic_filebuf/sputbackc/char/2-out.cc|  5 +
>  .../testsuite/27_io/basic_filebuf/sputc/char/1-io.cc   |  6 +-
>  .../testsuite/27_io/basic_filebuf/sputc/char/1-out.cc  |  5 +
>  .../testsuite/27_io/basic_filebuf/sputc/char/2-out.cc  |  5 +
>  .../testsuite/27_io/basic_filebuf/sputn/char/1-out.cc  |  5 +
>  .../testsuite/27_io/basic_filebuf/sputn/char/2-io.cc   |  6 +-
>  .../testsuite/27_io/basic_filebuf/sputn/char/2-out.cc  |  5 +

Re: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-24 Thread Kyrylo Tkachov
Hi Jennifer,

> On 24 Jul 2024, at 10:52, Jennifer Schmitz  wrote:
> 
> The following typo was corrected, updated patch file below:
> 
> /* Fuse CMP and CSET. */
> - if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
> 
>> On 23 Jul 2024, at 12:16, Jennifer Schmitz  wrote:
>> 
>> According to the Neoverse V2 Software Optimization Guide (section 4.14), the
>> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
>> implemented so far. This patch implements and tests the two fusion pairs.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> There was also no non-noise impact on SPEC CPU2017 benchmark.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 

I’ve bootstrapped and tested the patch myself as well and it looks good.
I think it can be extended a bit though...



>> gcc/
>> 
>> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>> fusion logic.
>> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
>> (cmp+cset): Likewise.
>> * config/aarch64/tuning_models/neoversev2.h: Enable logic in
>> field fusible_ops.
>> 
>> gcc/testsuite/
>> 
>> * gcc.target/aarch64/fuse_cmp_csel.c: New test.
>> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>

…

+ if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
+ && prev_set && curr_set
+ && GET_CODE (SET_SRC (prev_set)) == COMPARE
+ && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
+ && REG_P (XEXP (SET_SRC (curr_set), 1))
+ && REG_P (XEXP (SET_SRC (curr_set), 2))
+ && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
+ return true;
+
+ /* Fuse CMP and CSET. */
+ if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
+ && prev_set && curr_set
+ && GET_CODE (SET_SRC (prev_set)) == COMPARE
+ && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE
+ && REG_P (SET_DEST (curr_set))
+ && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
+ return true;

We have zero-extending forms of these patterns that I think we want to match 
here as well:
*cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw in aarch64.md.
They have some zero_extends around their operands that we’d want to match here 
as well.

Feel free to add them as a follow-up to this patch though as this patch is 
correct as is.
I’ll push it for you to trunk…
Thanks,
Kyrill

Re: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-24 Thread Richard Sandiford
Kyrylo Tkachov  writes:
> Hi Jennifer,
>
>> On 24 Jul 2024, at 10:52, Jennifer Schmitz  wrote:
>> 
>> The following typo was corrected, updated patch file below:
>> 
>> /* Fuse CMP and CSET. */
>> - if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
>> 
>>> On 23 Jul 2024, at 12:16, Jennifer Schmitz  wrote:
>>> 
>>> According to the Neoverse V2 Software Optimization Guide (section 4.14), the
>>> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
>>> implemented so far. This patch implements and tests the two fusion pairs.
>>> 
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> There was also no non-noise impact on SPEC CPU2017 benchmark.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Jennifer Schmitz 
>>> 
>
> I’ve bootstrapped and tested the patch myself as well and it looks good.
> I think it can be extended a bit though...
>
>
>
>>> gcc/
>>> 
>>> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>>> fusion logic.
>>> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
>>> (cmp+cset): Likewise.
>>> * config/aarch64/tuning_models/neoversev2.h: Enable logic in
>>> field fusible_ops.
>>> 
>>> gcc/testsuite/
>>> 
>>> * gcc.target/aarch64/fuse_cmp_csel.c: New test.
>>> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
>>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
>
> …
>
> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
> + && prev_set && curr_set
> + && GET_CODE (SET_SRC (prev_set)) == COMPARE
> + && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
> + && REG_P (XEXP (SET_SRC (curr_set), 1))
> + && REG_P (XEXP (SET_SRC (curr_set), 2))
> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
> + return true;
> +
> + /* Fuse CMP and CSET. */
> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
> + && prev_set && curr_set
> + && GET_CODE (SET_SRC (prev_set)) == COMPARE
> + && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE
> + && REG_P (SET_DEST (curr_set))
> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
> + return true;
>
> We have zero-extending forms of these patterns that I think we want to match 
> here as well:
> *cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw in aarch64.md.
> They have some zero_extends around their operands that we’d want to match 
> here as well.
>
> Feel free to add them as a follow-up to this patch though as this patch is 
> correct as is.
> I’ll push it for you to trunk…

Sorry for the slow review, was trying to think of a specific suggestion
before replying, but didn't have time to come up with one.

Don't these conditions catch quite a bit more than just the CMP and
CSEL, especially on the CMP side?  Maybe it's ok to have a liberal,
fuzzy match, but I think it at least deserves a comment.

Richard


Re: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-24 Thread Kyrylo Tkachov


> On 24 Jul 2024, at 13:34, Kyrylo Tkachov  wrote:
> 
> Hi Jennifer,
> 
>> On 24 Jul 2024, at 10:52, Jennifer Schmitz  wrote:
>> 
>> The following typo was corrected, updated patch file below:
>> 
>> /* Fuse CMP and CSET. */
>> - if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
>> 
>>> On 23 Jul 2024, at 12:16, Jennifer Schmitz  wrote:
>>> 
>>> According to the Neoverse V2 Software Optimization Guide (section 4.14), the
>>> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
>>> implemented so far. This patch implements and tests the two fusion pairs.
>>> 
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> There was also no non-noise impact on SPEC CPU2017 benchmark.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Jennifer Schmitz 
>>> 
> 
> I’ve bootstrapped and tested the patch myself as well and it looks good.
> I think it can be extended a bit though...
> 
> 
> 
>>> gcc/
>>> 
>>> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>>> fusion logic.
>>> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
>>> (cmp+cset): Likewise.
>>> * config/aarch64/tuning_models/neoversev2.h: Enable logic in
>>> field fusible_ops.
>>> 
>>> gcc/testsuite/
>>> 
>>> * gcc.target/aarch64/fuse_cmp_csel.c: New test.
>>> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
>>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
> 
> …
> 
> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
> + && prev_set && curr_set
> + && GET_CODE (SET_SRC (prev_set)) == COMPARE
> + && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
> + && REG_P (XEXP (SET_SRC (curr_set), 1))
> + && REG_P (XEXP (SET_SRC (curr_set), 2))
> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
> + return true;
> +
> + /* Fuse CMP and CSET. */
> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
> + && prev_set && curr_set
> + && GET_CODE (SET_SRC (prev_set)) == COMPARE
> + && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE
> + && REG_P (SET_DEST (curr_set))
> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
> + return true;
> 
> We have zero-extending forms of these patterns that I think we want to match 
> here as well:
> *cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw in aarch64.md.
> They have some zero_extends around their operands that we’d want to match 
> here as well.
> 
> Feel free to add them as a follow-up to this patch though as this patch is 
> correct as is.
> I’ll push it for you to trunk…

Patch pushed as 4c5eb66e701 after fixing up the names of the testsuite files in 
the ChangeLog to match the files in the patch.

Thanks,
Kyrill


> Thanks,
> Kyrill




Re: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-24 Thread Kyrylo Tkachov


> On 24 Jul 2024, at 13:38, Richard Sandiford  wrote:
> 
> Kyrylo Tkachov  writes:
>> Hi Jennifer,
>> 
>>> On 24 Jul 2024, at 10:52, Jennifer Schmitz  wrote:
>>> 
>>> The following typo was corrected, updated patch file below:
>>> 
>>> /* Fuse CMP and CSET. */
>>> - if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
>>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
>>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
>>> 
 On 23 Jul 2024, at 12:16, Jennifer Schmitz  wrote:
 
 According to the Neoverse V2 Software Optimization Guide (section 4.14), 
 the
 instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
 implemented so far. This patch implements and tests the two fusion pairs.
 
 The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
 regression.
 There was also no non-noise impact on SPEC CPU2017 benchmark.
 OK for mainline?
 
 Signed-off-by: Jennifer Schmitz 
 
>> 
>> I’ve bootstrapped and tested the patch myself as well and it looks good.
>> I think it can be extended a bit though...
>> 
>> 
>> 
 gcc/
 
 * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
 fusion logic.
 * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
 (cmp+cset): Likewise.
 * config/aarch64/tuning_models/neoversev2.h: Enable logic in
 field fusible_ops.
 
 gcc/testsuite/
 
 * gcc.target/aarch64/fuse_cmp_csel.c: New test.
 * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
 <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
>> 
>> …
>> 
>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
>> + && prev_set && curr_set
>> + && GET_CODE (SET_SRC (prev_set)) == COMPARE
>> + && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
>> + && REG_P (XEXP (SET_SRC (curr_set), 1))
>> + && REG_P (XEXP (SET_SRC (curr_set), 2))
>> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
>> + return true;
>> +
>> + /* Fuse CMP and CSET. */
>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
>> + && prev_set && curr_set
>> + && GET_CODE (SET_SRC (prev_set)) == COMPARE
>> + && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE
>> + && REG_P (SET_DEST (curr_set))
>> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
>> + return true;
>> 
>> We have zero-extending forms of these patterns that I think we want to match 
>> here as well:
>> *cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw in aarch64.md.
>> They have some zero_extends around their operands that we’d want to match 
>> here as well.
>> 
>> Feel free to add them as a follow-up to this patch though as this patch is 
>> correct as is.
>> I’ll push it for you to trunk…
> 
> Sorry for the slow review, was trying to think of a specific suggestion
> before replying, but didn't have time to come up with one.
> 
> Don't these conditions catch quite a bit more than just the CMP and
> CSEL, especially on the CMP side?  Maybe it's ok to have a liberal,
> fuzzy match, but I think it at least deserves a comment.

Maybe we can add a restriction to integer modes on the compare?
I don’t think FCSEL should be getting fused…
Kyrill

> 
> Richard



[PATCH] rtl-ssa: Fix split_clobber_group tree insertion [PR116044]

2024-07-24 Thread Richard Sandiford
PR116044 is a regression in the testsuite on AMD GCN caused (again)
by the split_clobber_group code.  The first patch in this area
(g:71b31690a7c52413496e91bcc5ee4c68af2f366f) fixed a bug caused
by carrying the old group over as one the split ones.  That patch
instead:

- created two new groups
- inserted them in the splay tree as neighbours of the old group
- removed the old group, and
- invalidated the old group (to force lazy recomputation when
  a clobber's parent group is queried)

However, this left add_def trying to insert the new definition
relative to a stale splay tree root.  The second patch
(g:34f33ea801563e2eabb348e8d3e9344a91abfd48) attempted to fix
that by inserting it relative to the new root.  But that's not
always correct either.  We specifically want to insert it after
the first of the two new groups, whether that group is the root
or not.

This patch does that, and tries to refactor the code to make
it a bit less brittle.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu.  OK to install?

Sorry for all the trouble that this code has caused :-(

Richard

gcc/
PR rtl-optimization/116044
* rtl-ssa/accesses.h (function_info::split_clobber_group): Return
an array of two clobber_groups.
* rtl-ssa/accesses.cc (function_info::split_clobber_group): Return
the new clobber groups.  Don't modify the splay tree here.
(function_info::add_def): Update call accordingly.  Generalize
the splay tree insertion code so that the new definition can be
inserted as a child of any existing node, not just the root.
Fix the insertion used after calling split_clobber_group.
---
 gcc/rtl-ssa/accesses.cc | 66 +++--
 gcc/rtl-ssa/functions.h |  3 +-
 2 files changed, 39 insertions(+), 30 deletions(-)

diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index 0bba8391b00..5450ea118d1 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -792,12 +792,12 @@ function_info::merge_clobber_groups (clobber_info *clobber1,
 }
 
 // GROUP spans INSN, and INSN now sets the resource that GROUP clobbers.
-// Split GROUP around INSN, to form two new groups, and return the clobber
-// that comes immediately before INSN.
+// Split GROUP around INSN, to form two new groups.  The first of the
+// returned groups comes before INSN and the second comes after INSN.
 //
-// The resource that GROUP clobbers is known to have an associated
-// splay tree.  The caller must remove GROUP from the tree on return.
-clobber_info *
+// The caller is responsible for updating the def_splay_tree and chaining
+// the defs together.
+std::array<clobber_group *, 2>
 function_info::split_clobber_group (clobber_group *group, insn_info *insn)
 {
   // Search for either the previous or next clobber in the group.
@@ -835,14 +835,10 @@ function_info::split_clobber_group (clobber_group *group, insn_info *insn)
   auto *group1 = allocate<clobber_group> (first_clobber, prev, tree1.root ());
   auto *group2 = allocate<clobber_group> (next, last_clobber, tree2.root ());
 
-  // Insert GROUP2 into the splay tree as an immediate successor of GROUP1.
-  def_splay_tree::insert_child (group, 1, group2);
-  def_splay_tree::insert_child (group, 1, group1);
-
   // Invalidate the old group.
   group->set_last_clobber (nullptr);
 
-  return prev;
+  return { group1, group2 };
 }
 
 // Add DEF to the end of the function's list of definitions of
@@ -899,7 +895,7 @@ function_info::add_def (def_info *def)
   insn_info *insn = def->insn ();
 
   int comparison;
-  def_node *root = nullptr;
+  def_node *neighbor = nullptr;
   def_info *prev = nullptr;
   def_info *next = nullptr;
   if (*insn > *last->insn ())
@@ -909,8 +905,8 @@ function_info::add_def (def_info *def)
   if (def_splay_tree tree = last->splay_root ())
{
  tree.splay_max_node ();
- root = tree.root ();
- last->set_splay_root (root);
+ last->set_splay_root (tree.root ());
+ neighbor = tree.root ();
}
   prev = last;
 }
@@ -921,8 +917,8 @@ function_info::add_def (def_info *def)
   if (def_splay_tree tree = last->splay_root ())
{
  tree.splay_min_node ();
- root = tree.root ();
- last->set_splay_root (root);
+ last->set_splay_root (tree.root ());
+ neighbor = tree.root ();
}
   next = first;
 }
@@ -931,8 +927,8 @@ function_info::add_def (def_info *def)
   // Search the splay tree for an insertion point.
   def_splay_tree tree = need_def_splay_tree (last);
   comparison = lookup_def (tree, insn);
-  root = tree.root ();
-  last->set_splay_root (root);
+  last->set_splay_root (tree.root ());
+  neighbor = tree.root ();
 
   // Deal with cases in which we found an overlapping live range.
   if (comparison == 0)
@@ -943,22 +939,34 @@ function_info::add_def (def_info *def)
  add_clobber (clobber, group);
  return;
   

Re: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-24 Thread Kyrylo Tkachov


On 24 Jul 2024, at 13:40, Kyrylo Tkachov  wrote:



On 24 Jul 2024, at 13:34, Kyrylo Tkachov  wrote:

Hi Jennifer,

On 24 Jul 2024, at 10:52, Jennifer Schmitz  wrote:

The following typo was corrected, updated patch file below:

/* Fuse CMP and CSET. */
- if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
+ if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
<0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>

On 23 Jul 2024, at 12:16, Jennifer Schmitz  wrote:

According to the Neoverse V2 Software Optimization Guide (section 4.14), the
instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
implemented so far. This patch implements and tests the two fusion pairs.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
There was also no non-noise impact on SPEC CPU2017 benchmark.
OK for mainline?

Signed-off-by: Jennifer Schmitz 


I’ve bootstrapped and tested the patch myself as well and it looks good.
I think it can be extended a bit though...



gcc/

* config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
fusion logic.
* config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
(cmp+cset): Likewise.
* config/aarch64/tuning_models/neoversev2.h: Enable logic in
field fusible_ops.

gcc/testsuite/

* gcc.target/aarch64/fuse_cmp_csel.c: New test.
* gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
<0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>

…

+ if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
+ && prev_set && curr_set
+ && GET_CODE (SET_SRC (prev_set)) == COMPARE
+ && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
+ && REG_P (XEXP (SET_SRC (curr_set), 1))
+ && REG_P (XEXP (SET_SRC (curr_set), 2))
+ && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
+ return true;
+
+ /* Fuse CMP and CSET. */
+ if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
+ && prev_set && curr_set
+ && GET_CODE (SET_SRC (prev_set)) == COMPARE
+ && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE
+ && REG_P (SET_DEST (curr_set))
+ && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
+ return true;

We have zero-extending forms of these patterns that I think we want to match 
here as well:
*cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw in aarch64.md.
They have some zero_extends around their operands that we’d want to match here 
as well.

Feel free to add them as a follow-up to this patch though as this patch is 
correct as is.
I’ll push it for you to trunk…

Patch pushed as 4c5eb66e701 after fixing up the names of the testsuite files in 
the ChangeLog to match the files in the patch.

As discussed offline, I’ve reverted the patch for now to give Jennifer a chance 
to address the feedback.
Apologies for jumping the gun a bit here.
Thanks,
Kyrill


Thanks,
Kyrill


Thanks,
Kyrill



[PATCH] tree-optimization/116057 - wrong code with CCP and vector CTORs

2024-07-24 Thread Richard Biener
The following fixes an issue with CCP's likely_value when faced with
a vector CTOR containing undef SSA names and constants.  This should
be classified as CONSTANT and not UNDEFINED.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116057
* tree-ssa-ccp.cc (likely_value): Also walk CTORs in stmt
operands to look for constants.

* gcc.dg/torture/pr116057.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr116057.c | 20 
 gcc/tree-ssa-ccp.cc | 11 +++
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116057.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr116057.c b/gcc/testsuite/gcc.dg/torture/pr116057.c
new file mode 100644
index 000..a7021c8e746
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116057.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-additional-options "-Wno-psabi" } */
+
+#define vect8 __attribute__((vector_size(8)))
+
+vect8 int __attribute__((noipa))
+f(int a)
+{
+  int b;
+  vect8 int t={1,1};
+  if(a) return t;
+  return (vect8 int){0, b};
+}
+
+int main ()
+{
+  if (f(0)[0] != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index de83d26d311..44711018e0e 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -762,6 +762,17 @@ likely_value (gimple *stmt)
continue;
   if (is_gimple_min_invariant (op))
has_constant_operand = true;
+  else if (TREE_CODE (op) == CONSTRUCTOR)
+   {
+ unsigned j;
+ tree val;
+ FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (op), j, val)
+   if (CONSTANT_CLASS_P (val))
+ {
+   has_constant_operand = true;
+   break;
+ }
+   }
 }
 
   if (has_constant_operand)
-- 
2.35.3


[PATCH] RISC-V: Error early with V and no M extension. [PR116036]

2024-07-24 Thread Robin Dapp
Hi,

for calculating the value of a poly_int at runtime we use a multiplication
instruction that requires the M extension.  Instead of just asserting and
ICEing, this patch emits an early error at option-parsing time.

We have several tests that use only "i" (without "m") and I adjusted all of
them to "im".  For now I did not verify whether the original error still
occurs with just "i"; I only added "m".

Tested on rv64gcv_zvfh_zvbb.

Regards
 Robin

gcc/ChangeLog:

PR target/116036

* config/riscv/riscv.cc (riscv_override_options_internal): Error
with TARGET_VECTOR && !TARGET_MUL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-31.c: Add m to arch string and expect it.
* gcc.target/riscv/arch-32.c: Ditto.
* gcc.target/riscv/arch-37.c: Ditto.
* gcc.target/riscv/arch-38.c: Ditto.
* gcc.target/riscv/predef-14.c: Ditto.
* gcc.target/riscv/predef-15.c: Ditto.
* gcc.target/riscv/predef-16.c: Ditto.
* gcc.target/riscv/predef-26.c: Ditto.
* gcc.target/riscv/predef-27.c: Ditto.
* gcc.target/riscv/predef-32.c: Ditto.
* gcc.target/riscv/predef-33.c: Ditto.
* gcc.target/riscv/predef-36.c: Ditto.
* gcc.target/riscv/predef-37.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111486.c: Add m to arch string.
* gcc.target/riscv/compare-debug-1.c: Ditto.
* gcc.target/riscv/compare-debug-2.c: Ditto.
* gcc.target/riscv/rvv/base/pr116036.c: New test.
---
 gcc/config/riscv/riscv.cc |  5 +
 gcc/internal-fn.cc|  3 ++-
 gcc/testsuite/gcc.target/riscv/arch-31.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-32.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-37.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-38.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/compare-debug-1.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/compare-debug-2.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/predef-14.c|  6 +++---
 gcc/testsuite/gcc.target/riscv/predef-15.c|  4 ++--
 gcc/testsuite/gcc.target/riscv/predef-16.c|  4 ++--
 gcc/testsuite/gcc.target/riscv/predef-26.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-27.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-32.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-33.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-36.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-37.c|  6 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111486.c |  2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/pr116036.c| 11 +++
 19 files changed, 62 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116036.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ca9ea1b70f3..487698a8e4f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9690,6 +9690,11 @@ riscv_override_options_internal (struct gcc_options *opts)
   else if (!TARGET_MUL_OPTS_P (opts) && TARGET_DIV_OPTS_P (opts))
     error ("%<-mdiv%> requires %<-march%> to subsume the %<M%> extension");
 
+  /* We might use a multiplication to calculate the scalable vector length at
+ runtime.  Therefore, require the M extension.  */
+  if (TARGET_VECTOR && !TARGET_MUL)
+sorry ("the %<V%> extension requires the %<M%> extension");
+
   /* Likewise floating-point division and square root.  */
   if ((TARGET_HARD_FLOAT_OPTS_P (opts) || TARGET_ZFINX_OPTS_P (opts))
   && ((target_flags_explicit & MASK_FDIV) == 0))
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 826d552a6fd..eb6c033535c 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -5049,7 +5049,8 @@ internal_len_load_store_bias (internal_fn ifn, machine_mode mode)
 }
 
 /* Return true if the given ELS_VALUE is supported for a
-   MASK_LOAD or MASK_LEN_LOAD with mode MODE.  */
+   MASK_LOAD or MASK_LEN_LOAD with mode MODE.  The target's
+   preferred else value is return in ELSVAL.  */
 
 bool
 internal_mask_load_else_supported_p (internal_fn ifn,
diff --git a/gcc/testsuite/gcc.target/riscv/arch-31.c b/gcc/testsuite/gcc.target/riscv/arch-31.c
index 5180753b905..9b867c5ecd2 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-31.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-31.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32i_zvfbfmin -mabi=ilp32f" } */
+/* { dg-options "-march=rv32im_zvfbfmin -mabi=ilp32f" } */
 int foo()
 {
 }
diff --git a/gcc/testsuite/gcc.target/riscv/arch-32.c b/gcc/testsuite/gcc.target/riscv/arch-32.c
index 49616832512..49a3db79489 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-32.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-32.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64iv_z

[PATCH] libstdc++: Fix future.wait_until when given a negative time_point

2024-07-24 Thread William Tsai
`future.wait_until` expands to `_M_load_and_test_until_impl`, which calls
`_M_load_and_test_until*` with the given time_point split into seconds and
nanoseconds. The callee expects the caller to pass valid values, but the
caller performs no check on them. In particular, if `future.wait_until` is
given a negative time_point, the underlying system call fails, since it
accepts neither a negative seconds field nor an out-of-range nanoseconds
field.

Following is a simple testcase:
```
#include <future>
#include <chrono>
#include <iostream>

using namespace std;

int main() {
promise<void> p;
future<void> f = p.get_future();
chrono::steady_clock::time_point tp(chrono::milliseconds{-10});
future_status status = f.wait_until(tp);
if (status == future_status::timeout) {
cout << "Timed out" << endl;
}
return 0;
}
```

libstdc++-v3/ChangeLog:

* include/bits/atomic_futex.h: Check that __s and __ns are valid
before calling _M_load_and_test_until*.

Signed-off-by: William Tsai 
---
 libstdc++-v3/include/bits/atomic_futex.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
index dd654174873..4c31946a97f 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -173,6 +173,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
   auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
   // XXX correct?
+  if ((__s.time_since_epoch().count() < 0) || (__ns.count() < 0))
+   return false;
   return _M_load_and_test_until(__assumed, __operand, __equal, __mo,
  true, __s.time_since_epoch(), __ns);
 }
@@ -186,6 +188,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
   auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
   // XXX correct?
+  if ((__s.time_since_epoch().count() < 0) || (__ns.count() < 0))
+   return false;
   return _M_load_and_test_until_steady(__assumed, __operand, __equal, __mo,
  true, __s.time_since_epoch(), __ns);
 }
-- 
2.37.1



[PATCH] RISC-V: xtheadmemidx: Disable pre/post-modify addressing if RVV is enabled

2024-07-24 Thread Christoph Müllner
When enabling XTheadMemIdx, we enable the pre- and post-modify
addressing modes in the RISC-V backend.
Unfortunately, the auto_inc_dec pass will then attempt to utilize
this feature regardless of the mode class (i.e. scalar or vector).
The assumption seems to be that an enabled addressing mode for
scalar instructions will also be available for vector instructions.

In case of XTheadMemIdx and RVV, this is obviously not the case.
Still, auto_inc_dec (-O3) performs optimizations like the following:

(insn 23 20 27 3 (set (mem:V4QI (reg:DI 136 [ ivtmp.13 ]) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
(reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 3183 
{*movv4qi}
 (nil))
(insn 40 39 41 3 (set (reg:DI 136 [ ivtmp.13 ])
(plus:DI (reg:DI 136 [ ivtmp.13 ])
(const_int 20 [0x14]))) 5 {adddi3}
 (nil))
>
(insn 23 20 27 3 (set (mem:V4QI (post_modify:DI (reg:DI 136 [ ivtmp.13 ])
(plus:DI (reg:DI 136 [ ivtmp.13 ])
(const_int 20 [0x14]))) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
(reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 3183 
{*movv4qi}
 (expr_list:REG_INC (reg:DI 136 [ ivtmp.13 ])
(nil)))

The resulting memory-store with post-modify addressing cannot be
lowered to an existing instruction (see PR116033).
At a lower optimization level (-O2) this is currently fine,
but we can't rely on that.

One solution would be to introduce a target hook to check if a certain
type can be used for pre-/post-modify optimizations.
However, it will be hard to justify such a hook, if only a single
RISC-V vendor extension requires that.
Therefore, this patch takes a more drastic approach and disables
pre-/post-modify addressing if TARGET_VECTOR is set.
This results in not emitting pre-/post-modify instructions from
XTheadMemIdx if RVV is enabled.

Note that this is not an issue with XTheadVector, because
we currently don't have auto-vectorization for that extension.
To ensure this won't change without being noticed, an additional
test is added.

PR target/116033

gcc/ChangeLog:

* config/riscv/riscv.h (HAVE_POST_MODIFY_DISP): Disable for RVV.
(HAVE_PRE_MODIFY_DISP): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr116033-1.c: New test.
* gcc.target/riscv/pr116033-2.c: New test.
* gcc.target/riscv/pr116033-3.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.h|  6 ++--
 gcc/testsuite/gcc.target/riscv/pr116033-1.c | 40 +
 gcc/testsuite/gcc.target/riscv/pr116033-2.c | 40 +
 gcc/testsuite/gcc.target/riscv/pr116033-3.c | 38 
 4 files changed, 122 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr116033-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr116033-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr116033-3.c

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 6f040011864..e5760294506 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1254,8 +1254,10 @@ extern void riscv_remove_unneeded_save_restore_calls 
(void);
e.g. RVVMF64BI vs RVVMF1BI on zvl512b, which is [1, 1] vs [64, 64].  */
 #define MAX_POLY_VARIANT 64
 
-#define HAVE_POST_MODIFY_DISP TARGET_XTHEADMEMIDX
-#define HAVE_PRE_MODIFY_DISP  TARGET_XTHEADMEMIDX
+#define HAVE_POST_MODIFY_DISP \
+  (TARGET_XTHEADMEMIDX && (!TARGET_VECTOR || TARGET_XTHEADVECTOR))
+#define HAVE_PRE_MODIFY_DISP \
+  (TARGET_XTHEADMEMIDX && (!TARGET_VECTOR || TARGET_XTHEADVECTOR))
 
 /* Check TLS Descriptors mechanism is selected.  */
 #define TARGET_TLSDESC (riscv_tls_dialect == TLS_DESCRIPTORS)
diff --git a/gcc/testsuite/gcc.target/riscv/pr116033-1.c 
b/gcc/testsuite/gcc.target/riscv/pr116033-1.c
new file mode 100644
index 000..8dcbe6cc2b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr116033-1.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-O2" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gv_xtheadmemidx" { target { rv64 } } } */
+/* { dg-options "-march=rv32gv_xtheadmemidx" { target { rv32 } } } */
+
+char arr_3[20][20];
+void init()
+{
+  for (int i_0 = 0; i_0 < 20; ++i_0)
+for (int i_1 = 0; i_0 < 20; ++i_0)
+  for (int i_1 = 0; i_1 < 20; ++i_0)
+for (int i_1 = 0; i_1 < 20; ++i_1)
+  arr_3[i_0][i_1] = i_1;
+}
+
+long
+lr_reg_imm_upd_char_1 (long *rs1, long rs2)
+{
+  /* Register+register addressing still works.  */
+  *rs1 = *rs1 + (rs2 << 1);
+  return *(char*)(*rs1);
+}
+
+void
+char_pre_dec_load_1 (char *p)
+{
+  /* Missed optimization for V+XTheadMemIdx.  */
+  extern void fchar1(char*, long);
+  p = p - 1;
+  char x = *p;
+  fchar1 (p, x);
+}
+
+/* { dg-final { scan-assembler "vse8.v\tv\[0-9\]+,\[0-9\]+\\(\[a-x0-9\]+\\)" } 
} */
+/* { dg-final { scan-assembler-not 
"vse8.v\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),\[0-9\]+,\[0-9\]+" } } */
+
+/* { d

[PATCH] Maintain complex constraint vector order during PTA solving

2024-07-24 Thread Richard Biener
There's a FIXME comment in the PTA constraint solver that the vector
of complex constraints can get unsorted which can lead to duplicate
entries piling up during node unification.  The following fixes this
with the assumption that delayed updates to constraints are uncommon
(otherwise re-sorting the whole vector would be more efficient).

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push
sometime tomorrow.

Richard.

* tree-ssa-structalias.cc (constraint_equal): Take const
reference to constraints.
(constraint_vec_find): Similar.
(solve_graph): Keep constraint vector sorted and verify
sorting with checking.
---
 gcc/tree-ssa-structalias.cc | 73 +++--
 1 file changed, 61 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 65f9132a94f..a32ef1d5cc0 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -902,7 +902,7 @@ constraint_less (const constraint_t &a, const constraint_t 
&b)
 /* Return true if two constraints A and B are equal.  */
 
 static bool
-constraint_equal (struct constraint a, struct constraint b)
+constraint_equal (const constraint &a, const constraint &b)
 {
   return constraint_expr_equal (a.lhs, b.lhs)
 && constraint_expr_equal (a.rhs, b.rhs);
@@ -913,7 +913,7 @@ constraint_equal (struct constraint a, struct constraint b)
 
 static constraint_t
 constraint_vec_find (vec<constraint_t> vec,
-struct constraint lookfor)
+constraint &lookfor)
 {
   unsigned int place;
   constraint_t found;
@@ -2806,10 +2806,8 @@ solve_graph (constraint_graph_t graph)
 better visitation order in the next iteration.  */
  while (bitmap_clear_bit (changed, i))
{
- unsigned int j;
- constraint_t c;
  bitmap solution;
- vec<constraint_t> complex = graph->complex[i];
+ vec<constraint_t> &complex = graph->complex[i];
  varinfo_t vi = get_varinfo (i);
  bool solution_empty;
 
@@ -2845,23 +2843,73 @@ solve_graph (constraint_graph_t graph)
  solution_empty = bitmap_empty_p (solution);
 
  /* Process the complex constraints */
+ hash_set<constraint_t> *cvisited = nullptr;
+ if (flag_checking)
+   cvisited = new hash_set<constraint_t>;
  bitmap expanded_pts = NULL;
- FOR_EACH_VEC_ELT (complex, j, c)
+ for (unsigned j = 0; j < complex.length (); ++j)
{
- /* XXX: This is going to unsort the constraints in
-some cases, which will occasionally add duplicate
-constraints during unification.  This does not
-affect correctness.  */
- c->lhs.var = find (c->lhs.var);
- c->rhs.var = find (c->rhs.var);
+ constraint_t c = complex[j];
+ /* At unification time only the directly involved nodes
+will get their complex constraints updated.  Update
+our complex constraints now but keep the constraint
+vector sorted and clear of duplicates.  Also make
+sure to evaluate each prevailing constraint only once.  */
+ unsigned int new_lhs = find (c->lhs.var);
+ unsigned int new_rhs = find (c->rhs.var);
+ if (c->lhs.var != new_lhs || c->rhs.var != new_rhs)
+   {
+ constraint tem = *c;
+ tem.lhs.var = new_lhs;
+ tem.rhs.var = new_rhs;
+ unsigned int place
+   = complex.lower_bound (&tem, constraint_less);
+ c->lhs.var = new_lhs;
+ c->rhs.var = new_rhs;
+ if (place != j)
+   {
+ complex.ordered_remove (j);
+ if (j < place)
+   --place;
+ if (place < complex.length ())
+   {
+ if (constraint_equal (*complex[place], *c))
+   {
+ j--;
+ continue;
+   }
+ else
+   complex.safe_insert (place, c);
+   }
+ else
+   complex.quick_push (c);
+ if (place > j)
+   {
+ j--;
+ continue;
+   }
+   }
+   }
 
  /* The only complex constraint that can change our
 solution to non-empty, given an empty solution,
 is a constraint where the lhs side is

Re: [PATCH] RISC-V: Error early with V and no M extension. [PR116036]

2024-07-24 Thread Kito Cheng
LGTM. Although I was a little late to join the meeting yesterday, I
vaguely know you guys were discussing this; that combination really
does not make much sense, and the LLVM side already does the
same thing :)

On Wed, Jul 24, 2024 at 8:50 PM Robin Dapp  wrote:
>
> Hi,
>
> for calculating the value of a poly_int at runtime we use a multiplication
> instruction that requires the M extension.  Instead of just asserting and
> ICEing this patch emits an early error at option-parsing time.
>
> We have several tests that use only "i" (without "m") and I adjusted all of
> them to "im".  For now, I didn't verify if the original error just with "i"
> still occurs but just added "m".
>
> Tested on rv64gcv_zvfh_zvbb.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> PR target/116036
>
> * config/riscv/riscv.cc (riscv_override_options_internal): Error
> with TARGET_VECTOR && !TARGET_MUL.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr116036.c: New test.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/arch-31.c: Add m to arch string and expect it.
> * gcc.target/riscv/arch-32.c: Ditto.
> * gcc.target/riscv/arch-37.c: Ditto.
> * gcc.target/riscv/arch-38.c: Ditto.
> * gcc.target/riscv/predef-14.c: Ditto.
> * gcc.target/riscv/predef-15.c: Ditto.
> * gcc.target/riscv/predef-16.c: Ditto.
> * gcc.target/riscv/predef-26.c: Ditto.
> * gcc.target/riscv/predef-27.c: Ditto.
> * gcc.target/riscv/predef-32.c: Ditto.
> * gcc.target/riscv/predef-33.c: Ditto.
> * gcc.target/riscv/predef-36.c: Ditto.
> * gcc.target/riscv/predef-37.c: Ditto.
> * gcc.target/riscv/rvv/autovec/pr111486.c: Add m to arch string.
> * gcc.target/riscv/compare-debug-1.c: Ditto.
> * gcc.target/riscv/compare-debug-2.c: Ditto.
> * gcc.target/riscv/rvv/base/pr116036.c: New test.
> ---
>  gcc/config/riscv/riscv.cc |  5 +
>  gcc/internal-fn.cc|  3 ++-
>  gcc/testsuite/gcc.target/riscv/arch-31.c  |  2 +-
>  gcc/testsuite/gcc.target/riscv/arch-32.c  |  2 +-
>  gcc/testsuite/gcc.target/riscv/arch-37.c  |  2 +-
>  gcc/testsuite/gcc.target/riscv/arch-38.c  |  2 +-
>  gcc/testsuite/gcc.target/riscv/compare-debug-1.c  |  2 +-
>  gcc/testsuite/gcc.target/riscv/compare-debug-2.c  |  2 +-
>  gcc/testsuite/gcc.target/riscv/predef-14.c|  6 +++---
>  gcc/testsuite/gcc.target/riscv/predef-15.c|  4 ++--
>  gcc/testsuite/gcc.target/riscv/predef-16.c|  4 ++--
>  gcc/testsuite/gcc.target/riscv/predef-26.c|  6 +-
>  gcc/testsuite/gcc.target/riscv/predef-27.c|  6 +-
>  gcc/testsuite/gcc.target/riscv/predef-32.c|  6 +-
>  gcc/testsuite/gcc.target/riscv/predef-33.c|  6 +-
>  gcc/testsuite/gcc.target/riscv/predef-36.c|  6 +-
>  gcc/testsuite/gcc.target/riscv/predef-37.c|  6 +-
>  gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111486.c |  2 +-
>  gcc/testsuite/gcc.target/riscv/rvv/base/pr116036.c| 11 +++
>  19 files changed, 62 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116036.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index ca9ea1b70f3..487698a8e4f 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -9690,6 +9690,11 @@ riscv_override_options_internal (struct gcc_options 
> *opts)
>else if (!TARGET_MUL_OPTS_P (opts) && TARGET_DIV_OPTS_P (opts))
>  error ("%<-mdiv%> requires %<-march%> to subsume the %<M%> extension");
>
> +  /* We might use a multiplication to calculate the scalable vector length at
> + runtime.  Therefore, require the M extension.  */
> +  if (TARGET_VECTOR && !TARGET_MUL)
> +sorry ("the %<V%> extension requires the %<M%> extension");
> +
>/* Likewise floating-point division and square root.  */
>if ((TARGET_HARD_FLOAT_OPTS_P (opts) || TARGET_ZFINX_OPTS_P (opts))
>&& ((target_flags_explicit & MASK_FDIV) == 0))
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 826d552a6fd..eb6c033535c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -5049,7 +5049,8 @@ internal_len_load_store_bias (internal_fn ifn, 
> machine_mode mode)
>  }
>
>  /* Return true if the given ELS_VALUE is supported for a
> -   MASK_LOAD or MASK_LEN_LOAD with mode MODE.  */
> +   MASK_LOAD or MASK_LEN_LOAD with mode MODE.  The target's
> +   preferred else value is return in ELSVAL.  */
>
>  bool
>  internal_mask_load_else_supported_p (internal_fn ifn,
> diff --git a/gcc/testsuite/gcc.target/riscv/arch-31.c 
> b/gcc/testsuite/gcc.target/riscv/arch-31.c
> index 5180753b905..9b867c5ecd2 100644
> --- a/gcc/testsuite/gcc.target/riscv/arch-31.c
> +++ b/gcc/testsuite/gcc.target/riscv/arch-31.c
> @@ -1,5 +1,

Re: [PATCH] RISC-V: Disable Zba optimization pattern if XTheadMemIdx is enabled

2024-07-24 Thread Kito Cheng
Yeah, OK once your local test passes :)

On Wed, Jul 24, 2024 at 4:38 PM Christoph Müllner
 wrote:
>
> Is it OK to backport to GCC 14 (patch applies cleanly, test is running)?
>
> On Wed, Jul 24, 2024 at 9:25 AM Kito Cheng  wrote:
> >
> > LGTM :)
> >
> > On Wed, Jul 24, 2024 at 3:16 PM Christoph Müllner
> >  wrote:
> > >
> > > It is possible that the Zba optimization pattern zero_extendsidi2_bitmanip
> > > matches for a XTheadMemIdx INSN with the effect of emitting an invalid
> > > instruction as reported in PR116035.
> > >
> > > The pattern above is used to emit a zext.w instruction to zero-extend
> > > SI mode registers to DI mode.  A similar functionality can be achieved
> > > by XTheadBb's th.extu instruction.  And indeed, we have the equivalent
> > > pattern in thead.md (zero_extendsidi2_th_extu).  However, that pattern
> > > depends on !TARGET_XTHEADMEMIDX.  To compensate for that, there are
> > > specific patterns that ensure that zero-extension instruction can still
> > > be emitted (th_memidx_bb_zero_extendsidi2 and friends).
> > >
> > > While we could implement something similar 
> > > (th_memidx_zba_zero_extendsidi2)
> > > it would only make sense, if there existed real HW that does implement Zba
> > > and XTheadMemIdx, but not XTheadBb.  Unless such a machine exists, let's
> > > simply disable zero_extendsidi2_bitmanip if XTheadMemIdx is available.
> > >
> > > PR target/116035
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/riscv/bitmanip.md: Disable zero_extendsidi2_bitmanip
> > > for XTheadMemIdx.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/riscv/pr116035-1.c: New test.
> > > * gcc.target/riscv/pr116035-2.c: New test.
> > >
> > > Reported-by: Patrick O'Neill 
> > > Signed-off-by: Christoph Müllner 
> > > ---
> > >  gcc/config/riscv/bitmanip.md|  2 +-
> > >  gcc/testsuite/gcc.target/riscv/pr116035-1.c | 29 +
> > >  gcc/testsuite/gcc.target/riscv/pr116035-2.c | 26 ++
> > >  3 files changed, 56 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr116035-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr116035-2.c
> > >
> > > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > > index f403ba8dbba..6b720992ca3 100644
> > > --- a/gcc/config/riscv/bitmanip.md
> > > +++ b/gcc/config/riscv/bitmanip.md
> > > @@ -22,7 +22,7 @@
> > >  (define_insn "*zero_extendsidi2_bitmanip"
> > >[(set (match_operand:DI 0 "register_operand" "=r,r")
> > > (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" 
> > > "r,m")))]
> > > -  "TARGET_64BIT && TARGET_ZBA"
> > > +  "TARGET_64BIT && TARGET_ZBA && !TARGET_XTHEADMEMIDX"
> > >"@
> > > zext.w\t%0,%1
> > > lwu\t%0,%1"
> > > diff --git a/gcc/testsuite/gcc.target/riscv/pr116035-1.c 
> > > b/gcc/testsuite/gcc.target/riscv/pr116035-1.c
> > > new file mode 100644
> > > index 000..bc45941ff8f
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/pr116035-1.c
> > > @@ -0,0 +1,29 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" "-Os" "-Oz" } } */
> > > +/* { dg-options "-march=rv64g_zba_xtheadmemidx" { target { rv64 } } } */
> > > +/* { dg-options "-march=rv32g_zba_xtheadmemidx" { target { rv32 } } } */
> > > +
> > > +void a(long);
> > > +unsigned b[11];
> > > +void c()
> > > +{
> > > +  for (int d = 0; d < 11; ++d)
> > > +a(b[d]);
> > > +}
> > > +
> > > +#if __riscv_xlen == 64
> > > +unsigned long zext64_32(unsigned int u32)
> > > +{
> > > +  /* Missed optimization for Zba+XTheadMemIdx.  */
> > > +  return u32; //zext.w a0, a0
> > > +}
> > > +#endif
> > > +
> > > +/* { dg-final { scan-assembler 
> > > "th.lwuia\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),4,0" { target rv64 } } } */
> > > +/* { dg-final { scan-assembler 
> > > "th.lwia\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),4,0" { target rv32 } } } */
> > > +
> > > +/* { dg-final { scan-assembler-not 
> > > "lwu\t\[a-x0-9\]+,\(\[a-x0-9\]+\),4,0" } } */
> > > +
> > > +/* Missed optimizations for Zba+XTheadMemIdx.  */
> > > +/* { dg-final { scan-assembler "zext.w\t" { target rv64 xfail rv64 } } } 
> > > */
> > > +
> > > diff --git a/gcc/testsuite/gcc.target/riscv/pr116035-2.c 
> > > b/gcc/testsuite/gcc.target/riscv/pr116035-2.c
> > > new file mode 100644
> > > index 000..2c1a9694860
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/pr116035-2.c
> > > @@ -0,0 +1,26 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" "-Os" "-Oz" } } */
> > > +/* { dg-options "-march=rv64g_xtheadbb_xtheadmemidx" { target { rv64 } } 
> > > } */
> > > +/* { dg-options "-march=rv32g_xtheadbb_xtheadmemidx" { target { rv32 } } 
> > > } */
> > > +
> > > +void a(long);
> > > +unsigned b[11];
> > > +void c()
> > > +{
> > > +  for (int d = 0; d < 11; ++d)
> > > +a(b[d]);
> > > +}
> > > +
> > > +#if __riscv_xlen == 64
> > > +unsigned long zext64_32(u

Re: [PATCH] RISC-V: xtheadmemidx: Disable pre/post-modify addressing if RVV is enabled

2024-07-24 Thread Kito Cheng
LGTM :)

On Wed, Jul 24, 2024 at 9:31 PM Christoph Müllner
 wrote:
>
> When enabling XTheadMemIdx, we enable the pre- and post-modify
> addressing modes in the RISC-V backend.
> Unfortunately, the auto_inc_dec pass will then attempt to utilize
> this feature regardless of the mode class (i.e. scalar or vector).
> The assumption seems to be that an enabled addressing mode for
> scalar instructions will also be available for vector instructions.
>
> In case of XTheadMemIdx and RVV, this is obviously not the case.
> Still, auto_inc_dec (-O3) performs optimizations like the following:
>
> (insn 23 20 27 3 (set (mem:V4QI (reg:DI 136 [ ivtmp.13 ]) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
> (reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 
> 3183 {*movv4qi}
>  (nil))
> (insn 40 39 41 3 (set (reg:DI 136 [ ivtmp.13 ])
> (plus:DI (reg:DI 136 [ ivtmp.13 ])
> (const_int 20 [0x14]))) 5 {adddi3}
>  (nil))
> >
> (insn 23 20 27 3 (set (mem:V4QI (post_modify:DI (reg:DI 136 [ ivtmp.13 ])
> (plus:DI (reg:DI 136 [ ivtmp.13 ])
> (const_int 20 [0x14]))) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
> (reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 
> 3183 {*movv4qi}
>  (expr_list:REG_INC (reg:DI 136 [ ivtmp.13 ])
> (nil)))
>
> The resulting memory-store with post-modify addressing cannot be
> lowered to an existing instruction (see PR116033).
> At a lower optimization level (-O2) this is currently fine,
> but we can't rely on that.
>
> One solution would be to introduce a target hook to check if a certain
> type can be used for pre-/post-modify optimizations.
> However, it will be hard to justify such a hook, if only a single
> RISC-V vendor extension requires that.
> Therefore, this patch takes a more drastic approach and disables
> pre-/post-modify addressing if TARGET_VECTOR is set.
> This results in not emitting pre-/post-modify instructions from
> XTheadMemIdx if RVV is enabled.
>
> Note that this is not an issue with XTheadVector, because
> we currently don't have auto-vectorization for that extension.
> To ensure this won't change without being noticed, an additional
> test is added.
>
> PR target/116033
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.h (HAVE_POST_MODIFY_DISP): Disable for RVV.
> (HAVE_PRE_MODIFY_DISP): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr116033-1.c: New test.
> * gcc.target/riscv/pr116033-2.c: New test.
> * gcc.target/riscv/pr116033-3.c: New test.
>
> Signed-off-by: Christoph Müllner 
> ---
>  gcc/config/riscv/riscv.h|  6 ++--
>  gcc/testsuite/gcc.target/riscv/pr116033-1.c | 40 +
>  gcc/testsuite/gcc.target/riscv/pr116033-2.c | 40 +
>  gcc/testsuite/gcc.target/riscv/pr116033-3.c | 38 
>  4 files changed, 122 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr116033-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr116033-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr116033-3.c
>
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 6f040011864..e5760294506 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1254,8 +1254,10 @@ extern void riscv_remove_unneeded_save_restore_calls 
> (void);
> e.g. RVVMF64BI vs RVVMF1BI on zvl512b, which is [1, 1] vs [64, 64].  */
>  #define MAX_POLY_VARIANT 64
>
> -#define HAVE_POST_MODIFY_DISP TARGET_XTHEADMEMIDX
> -#define HAVE_PRE_MODIFY_DISP  TARGET_XTHEADMEMIDX
> +#define HAVE_POST_MODIFY_DISP \
> +  (TARGET_XTHEADMEMIDX && (!TARGET_VECTOR || TARGET_XTHEADVECTOR))
> +#define HAVE_PRE_MODIFY_DISP \
> +  (TARGET_XTHEADMEMIDX && (!TARGET_VECTOR || TARGET_XTHEADVECTOR))
>
>  /* Check TLS Descriptors mechanism is selected.  */
>  #define TARGET_TLSDESC (riscv_tls_dialect == TLS_DESCRIPTORS)
> diff --git a/gcc/testsuite/gcc.target/riscv/pr116033-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr116033-1.c
> new file mode 100644
> index 000..8dcbe6cc2b8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr116033-1.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-O2" "-Og" "-Os" "-Oz" } } */
> +/* { dg-options "-march=rv64gv_xtheadmemidx" { target { rv64 } } } */
> +/* { dg-options "-march=rv32gv_xtheadmemidx" { target { rv32 } } } */
> +
> +char arr_3[20][20];
> +void init()
> +{
> +  for (int i_0 = 0; i_0 < 20; ++i_0)
> +for (int i_1 = 0; i_0 < 20; ++i_0)
> +  for (int i_1 = 0; i_1 < 20; ++i_0)
> +for (int i_1 = 0; i_1 < 20; ++i_1)
> +  arr_3[i_0][i_1] = i_1;
> +}
> +
> +long
> +lr_reg_imm_upd_char_1 (long *rs1, long rs2)
> +{
> +  /* Register+register addressing still works.  */
> +  *rs1 = *rs1 + (rs2 << 1);
> +  return *(char*)(*rs1);
> +}
> +
> +void
> +char_pre_dec_load_1 (char *p)
> +{
> +  /* Missed optimization for V

Re: [PATCH] RISC-V: xtheadmemidx: Disable pre/post-modify addressing if RVV is enabled

2024-07-24 Thread Jeff Law




On 7/24/24 7:31 AM, Christoph Müllner wrote:

When enabling XTheadMemIdx, we enable the pre- and post-modify
addressing modes in the RISC-V backend.
Unfortunately, the auto_inc_dec pass will then attempt to utilize
this feature regardless of the mode class (i.e. scalar or vector).
The assumption seems to be that an enabled addressing mode for
scalar instructions will also be available for vector instructions.

In case of XTheadMemIdx and RVV, this is obviously not the case.
Still, auto_inc_dec (-O3) performs optimizations like the following:

(insn 23 20 27 3 (set (mem:V4QI (reg:DI 136 [ ivtmp.13 ]) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
 (reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 3183 
{*movv4qi}
  (nil))
(insn 40 39 41 3 (set (reg:DI 136 [ ivtmp.13 ])
 (plus:DI (reg:DI 136 [ ivtmp.13 ])
 (const_int 20 [0x14]))) 5 {adddi3}
  (nil))
>
(insn 23 20 27 3 (set (mem:V4QI (post_modify:DI (reg:DI 136 [ ivtmp.13 ])
 (plus:DI (reg:DI 136 [ ivtmp.13 ])
 (const_int 20 [0x14]))) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
 (reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 3183 
{*movv4qi}
  (expr_list:REG_INC (reg:DI 136 [ ivtmp.13 ])
 (nil)))

The resulting memory-store with post-modify addressing cannot be
lowered to an existing instruction (see PR116033).
At a lower optimization level (-O2) this is currently fine,
but we can't rely on that.

One solution would be to introduce a target hook to check if a certain
type can be used for pre-/post-modify optimizations.
However, it will be hard to justify such a hook, if only a single
RISC-V vendor extension requires that.
Therefore, this patch takes a more drastic approach and disables
pre-/post-modify addressing if TARGET_VECTOR is set.
This results in not emitting pre-/post-modify instructions from
XTheadMemIdx if RVV is enabled.

Note that this is not an issue with XTheadVector, because
we currently don't have auto-vectorization for that extension.
To ensure this won't change without being noticed, an additional
test is added.

PR target/116033

gcc/ChangeLog:

* config/riscv/riscv.h (HAVE_POST_MODIFY_DISP): Disable for RVV.
(HAVE_PRE_MODIFY_DISP): Likewise.
This sounds like it's just papering over the real bug, probably in the 
address checking code of the backend.


I haven't tried to debug this, but I'd look closely at 
th_classify_address which seems to ignore the mode, which seems wrong.


jeff



Re: [PATCH] RISC-V: Error early with V and no M extension. [PR116036]

2024-07-24 Thread Jeff Law




On 7/24/24 7:54 AM, Kito Cheng wrote:

LGTM. Although I was a little late to join the meeting yesterday, I
vaguely know you guys were discussing this; that combination really
does not make much sense, and the LLVM side already does the
same thing :)
Yea, the idea was this combination doesn't make sense and we should 
issue a graceful diagnostic.  If some oddball implementation comes 
along, the implementors of that oddball implementation can deal with 
this problem.


jeff



Re: [PATCH] RISC-V: Error early with V and no M extension. [PR116036]

2024-07-24 Thread Robin Dapp
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 826d552a6fd..eb6c033535c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -5049,7 +5049,8 @@ internal_len_load_store_bias (internal_fn ifn, 
> machine_mode mode)
>  }
>  
>  /* Return true if the given ELS_VALUE is supported for a
> -   MASK_LOAD or MASK_LEN_LOAD with mode MODE.  */
> +   MASK_LOAD or MASK_LEN_LOAD with mode MODE.  The target's
> +   preferred else value is return in ELSVAL.  */

Grml,  this is obviously not part of this patch but a rebase error.

Going to send a V2.

-- 
Regards
 Robin



Re: [PATCH] RISC-V: xtheadmemidx: Disable pre/post-modify addressing if RVV is enabled

2024-07-24 Thread Christoph Müllner
On Wed, Jul 24, 2024 at 3:57 PM Jeff Law  wrote:
>
>
>
> On 7/24/24 7:31 AM, Christoph Müllner wrote:
> > When enabling XTheadMemIdx, we enable the pre- and post-modify
> > addressing modes in the RISC-V backend.
> > Unfortunately, the auto_inc_dec pass will then attempt to utilize
> > this feature regardless of the mode class (i.e. scalar or vector).
> > The assumption seems to be that an enabled addressing mode for
> > scalar instructions will also be available for vector instructions.
> >
> > In case of XTheadMemIdx and RVV, this is obviously not the case.
> > Still, auto_inc_dec (-O3) performs optimizations like the following:
> >
> > (insn 23 20 27 3 (set (mem:V4QI (reg:DI 136 [ ivtmp.13 ]) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
> >  (reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 
> > 3183 {*movv4qi}
> >   (nil))
> > (insn 40 39 41 3 (set (reg:DI 136 [ ivtmp.13 ])
> >  (plus:DI (reg:DI 136 [ ivtmp.13 ])
> >  (const_int 20 [0x14]))) 5 {adddi3}
> >   (nil))
> > >
> > (insn 23 20 27 3 (set (mem:V4QI (post_modify:DI (reg:DI 136 [ ivtmp.13 ])
> >  (plus:DI (reg:DI 136 [ ivtmp.13 ])
> >  (const_int 20 [0x14]))) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
> >  (reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 
> > 3183 {*movv4qi}
> >   (expr_list:REG_INC (reg:DI 136 [ ivtmp.13 ])
> >  (nil)))
> >
> > The resulting memory-store with post-modify addressing cannot be
> > lowered to an existing instruction (see PR116033).
> > At a lower optimization level (-O2) this is currently fine,
> > but we can't rely on that.
> >
> > One solution would be to introduce a target hook to check if a certain
> > type can be used for pre-/post-modify optimizations.
> > However, it will be hard to justify such a hook, if only a single
> > RISC-V vendor extension requires that.
> > Therefore, this patch takes a more drastic approach and disables
> > pre-/post-modify addressing if TARGET_VECTOR is set.
> > This results in not emitting pre-/post-modify instructions from
> > XTheadMemIdx if RVV is enabled.
> >
> > Note that this is not an issue with XTheadVector, because
> > we currently don't have auto-vectorization for that extension.
> > To ensure this won't change without being noticed, an additional
> > test is added.
> >
> >   PR target/116033
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.h (HAVE_POST_MODIFY_DISP): Disable for RVV.
> >   (HAVE_PRE_MODIFY_DISP): Likewise.
> This sounds like it's just papering over the real bug, probably in the
> address checking code of the backend.
>
> I haven't tried to debug this, but I'd look closely at
> th_classify_address which seems to ignore the mode, which seems wrong.

I checked that, and there is a mode check there.
But, after your comment, I challenged the test and indeed:
  if (!(INTEGRAL_MODE_P (mode) && GET_MODE_SIZE (mode).to_constant () <= 8))
    return false;
INTEGRAL_MODE_P() includes vector modes.
I'll change the check to ensure that "GET_MODE_CLASS (MODE) ==
MODE_INT" is fulfilled and prepare a v2.

Thank you!


Re: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-24 Thread Jennifer Schmitz
Thank you for the feedback. I added checks for SCALAR_INT_MODE_P for the reg 
operands of the compare and if-then-else expressions. As it is not legal to 
have different modes in the operand registers, I only added one check for each 
of the expressions.
The updated patch was bootstrapped and tested again.
Best,
Jennifer



0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch
Description: Binary data



> On 24 Jul 2024, at 13:57, Kyrylo Tkachov  wrote:
> 
> 
> 
>> On 24 Jul 2024, at 13:40, Kyrylo Tkachov  wrote:
>> 
>> 
>> 
>>> On 24 Jul 2024, at 13:34, Kyrylo Tkachov  wrote:
>>> 
>>> Hi Jennifer,
>>> 
 On 24 Jul 2024, at 10:52, Jennifer Schmitz  wrote:
 
 The following typo was corrected, updated patch file below:
 
 /* Fuse CMP and CSET. */
 - if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
 + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
 <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
 
> On 23 Jul 2024, at 12:16, Jennifer Schmitz  wrote:
> 
> According to the Neoverse V2 Software Optimization Guide (section 4.14), 
> the
> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
> implemented so far. This patch implements and tests the two fusion pairs.
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
> regression.
> There was also no non-noise impact on SPEC CPU2017 benchmark.
> OK for mainline?
> 
> Signed-off-by: Jennifer Schmitz 
> 
>>> 
>>> I’ve bootstrapped and tested the patch myself as well and it looks good.
>>> I think it can be extended a bit though...
>>> 
>>> 
>>> 
> gcc/
> 
> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
> fusion logic.
> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
> (cmp+cset): Likewise.
> * config/aarch64/tuning_models/neoversev2.h: Enable logic in
> field fusible_ops.
> 
> gcc/testsuite/
> 
> * gcc.target/aarch64/fuse_cmp_csel.c: New test.
> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
>>> 
>>> …
>>> 
>>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
>>> + && prev_set && curr_set
>>> + && GET_CODE (SET_SRC (prev_set)) == COMPARE
>>> + && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
>>> + && REG_P (XEXP (SET_SRC (curr_set), 1))
>>> + && REG_P (XEXP (SET_SRC (curr_set), 2))
>>> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
>>> + return true;
>>> +
>>> + /* Fuse CMP and CSET. */
>>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
>>> + && prev_set && curr_set
>>> + && GET_CODE (SET_SRC (prev_set)) == COMPARE
>>> + && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE
>>> + && REG_P (SET_DEST (curr_set))
>>> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
>>> + return true;
>>> 
>>> We have zero-extending forms of these patterns that I think we want to 
>>> match here as well:
>>> *cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw in aarch64.md.
>>> They have some zero_extends around their operands that we’d want to match 
>>> here as well.
>>> 
>>> Feel free to add them as a follow-up to this patch though as this patch is 
>>> correct as is.
>>> I’ll push it for you to trunk…
>> 
>> Patch pushed as 4c5eb66e701 after fixing up the names of the testsuite files 
>> in the ChangeLog to match the files in the patch.
The test-file names match in the updated version.
> 
> As discussed offline, I’ve reverted the patch for now to give Jennifer a 
> chance to address the feedback.
> Apologies for jumping the gun a bit here.
> Thanks,
> Kyrill
> 
>> 
>> Thanks,
>> Kyrill
>> 
>> 
>>> Thanks,
>>> Kyrill






[PATCH] SVE Intrinsics: Change return type of redirect_call to gcall.

2024-07-24 Thread Jennifer Schmitz
As suggested in the review of
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657474.html,
this patch changes the return type of gimple_folder::redirect_call from
gimple * to gcall *.  The motivation is that so far most callers of the
function had been casting the result to gcall *; these call sites were
updated.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/

* config/aarch64/aarch64-sve-builtins.cc
(gimple_folder::redirect_call): Update return type.
* config/aarch64/aarch64-sve-builtins.h: Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svqshl_impl::fold):
Remove cast to gcall.
(svrshl_impl::fold): Likewise.


0001-SVE-Intrinsics-Change-return-type-of-redirect_call-t.patch
Description: Binary data




[PATCH v2] RISC-V: xtheadmemidx: Fix mode test for pre/post-modify addressing

2024-07-24 Thread Christoph Müllner
auto_inc_dec (-O3) performs optimizations like the following
if RVV and XTheadMemIdx is enabled.

(insn 23 20 27 3 (set (mem:V4QI (reg:DI 136 [ ivtmp.13 ]) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
(reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 3183 {*movv4qi}
 (nil))
(insn 40 39 41 3 (set (reg:DI 136 [ ivtmp.13 ])
(plus:DI (reg:DI 136 [ ivtmp.13 ])
(const_int 20 [0x14]))) 5 {adddi3}
 (nil))
=>
(insn 23 20 27 3 (set (mem:V4QI (post_modify:DI (reg:DI 136 [ ivtmp.13 ])
(plus:DI (reg:DI 136 [ ivtmp.13 ])
(const_int 20 [0x14]))) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
(reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 3183 {*movv4qi}
 (expr_list:REG_INC (reg:DI 136 [ ivtmp.13 ])
(nil)))

The reason the pass believes this is legal is that the mode test in
th_memidx_classify_address_modify() requires INTEGRAL_MODE_P (mode),
which includes vector modes.

Let's restrict the mode test so that only MODE_INT modes are allowed.

PR target/116033

gcc/ChangeLog:

* config/riscv/thead.cc (th_memidx_classify_address_modify):
Fix mode test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr116033.c: New test.

Reported-by: Patrick O'Neill 
Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/thead.cc |  6 ++
 gcc/testsuite/gcc.target/riscv/pr116033.c | 16 
 2 files changed, 18 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr116033.c

diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 951b60888596..6f5edeb7e0ac 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -453,10 +453,8 @@ th_memidx_classify_address_modify (struct 
riscv_address_info *info, rtx x,
   if (!TARGET_XTHEADMEMIDX)
 return false;
 
-  if (!TARGET_64BIT && mode == DImode)
-return false;
-
-  if (!(INTEGRAL_MODE_P (mode) && GET_MODE_SIZE (mode).to_constant () <= 8))
+  if (GET_MODE_CLASS (mode) != MODE_INT
+  || GET_MODE_SIZE (mode).to_constant () > UNITS_PER_WORD)
 return false;
 
   if (GET_CODE (x) != POST_MODIFY
diff --git a/gcc/testsuite/gcc.target/riscv/pr116033.c 
b/gcc/testsuite/gcc.target/riscv/pr116033.c
new file mode 100644
index ..881922da0260
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr116033.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gv_xtheadmemidx" { target { rv64 } } } */
+/* { dg-options "-march=rv32gv_xtheadmemidx" { target { rv32 } } } */
+
+char arr_3[20][20];
+void init()
+{
+  for (int i_0 = 0; i_0 < 20; ++i_0)
+for (int i_1 = 0; i_0 < 20; ++i_0)
+  for (int i_1 = 0; i_1 < 20; ++i_0)
+for (int i_1 = 0; i_1 < 20; ++i_1)
+  arr_3[i_0][i_1] = i_1;
+}
+
+/* { dg-final { scan-assembler-not 
"vse8.v\t\[a-x0-9\]+,\\(\[a-x0-9\]+\\),\[0-9\]+,\[0-9\]+" } } */
-- 
2.45.2



[PATCH v2] RISC-V: Error early with V and no M extension.

2024-07-24 Thread Robin Dapp
Hi,

now with proper diff...

For calculating the value of a poly_int at runtime we use a
multiplication instruction that requires the M extension.
Instead of just asserting and ICEing this patch emits an early
error at option-parsing time.

We have several tests that use only "i" (without "m"), and I adjusted all of
them to "im".  For now, I didn't verify whether the original error still
occurs with just "i"; I simply added "m".

Tested on rv64gcv_zvfh_zvbb.

Regards
 Robin

gcc/ChangeLog:

PR target/116036

* config/riscv/riscv.cc (riscv_override_options_internal): Error
with TARGET_VECTOR && !TARGET_MUL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-31.c: Add m to arch string and expect it.
* gcc.target/riscv/arch-32.c: Ditto.
* gcc.target/riscv/arch-37.c: Ditto.
* gcc.target/riscv/arch-38.c: Ditto.
* gcc.target/riscv/predef-14.c: Ditto.
* gcc.target/riscv/predef-15.c: Ditto.
* gcc.target/riscv/predef-16.c: Ditto.
* gcc.target/riscv/predef-26.c: Ditto.
* gcc.target/riscv/predef-27.c: Ditto.
* gcc.target/riscv/predef-32.c: Ditto.
* gcc.target/riscv/predef-33.c: Ditto.
* gcc.target/riscv/predef-36.c: Ditto.
* gcc.target/riscv/predef-37.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111486.c: Add m to arch string.
* gcc.target/riscv/compare-debug-1.c: Ditto.
* gcc.target/riscv/compare-debug-2.c: Ditto.
* gcc.target/riscv/rvv/base/pr116036.c: New test.
---
 gcc/config/riscv/riscv.cc |  5 +
 gcc/testsuite/gcc.target/riscv/arch-31.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-32.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-37.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-38.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/compare-debug-1.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/compare-debug-2.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/predef-14.c|  6 +++---
 gcc/testsuite/gcc.target/riscv/predef-15.c|  4 ++--
 gcc/testsuite/gcc.target/riscv/predef-16.c|  4 ++--
 gcc/testsuite/gcc.target/riscv/predef-26.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-27.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-32.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-33.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-36.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-37.c|  6 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111486.c |  2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/pr116036.c| 11 +++
 18 files changed, 60 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116036.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7016a33cce3..fcdb7ab08dd 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9691,6 +9691,11 @@ riscv_override_options_internal (struct gcc_options 
*opts)
   else if (!TARGET_MUL_OPTS_P (opts) && TARGET_DIV_OPTS_P (opts))
error ("%<-mdiv%> requires %<-march%> to subsume the %<M%> extension");
 
+  /* We might use a multiplication to calculate the scalable vector length at
+ runtime.  Therefore, require the M extension.  */
+  if (TARGET_VECTOR && !TARGET_MUL)
+    sorry ("the %<V%> extension requires the %<M%> extension");
+
   /* Likewise floating-point division and square root.  */
   if ((TARGET_HARD_FLOAT_OPTS_P (opts) || TARGET_ZFINX_OPTS_P (opts))
   && ((target_flags_explicit & MASK_FDIV) == 0))
diff --git a/gcc/testsuite/gcc.target/riscv/arch-31.c 
b/gcc/testsuite/gcc.target/riscv/arch-31.c
index 5180753b905..9b867c5ecd2 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-31.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-31.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32i_zvfbfmin -mabi=ilp32f" } */
+/* { dg-options "-march=rv32im_zvfbfmin -mabi=ilp32f" } */
 int foo()
 {
 }
diff --git a/gcc/testsuite/gcc.target/riscv/arch-32.c 
b/gcc/testsuite/gcc.target/riscv/arch-32.c
index 49616832512..49a3db79489 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-32.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-32.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64iv_zvfbfmin -mabi=lp64d" } */
+/* { dg-options "-march=rv64imv_zvfbfmin -mabi=lp64d" } */
 int foo()
 {
 }
diff --git a/gcc/testsuite/gcc.target/riscv/arch-37.c 
b/gcc/testsuite/gcc.target/riscv/arch-37.c
index 5b19a73c556..b56ba77b973 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-37.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-37.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32i_zvfbfwma -mabi=ilp32f" } */
+/* { dg-options "-march=rv32im_zvfbfwma -mabi=ilp32f" } */
 int
 foo ()
 {}
diff --git a/gcc/testsuite/gcc.target/riscv/arch-38.c 
b/gcc/testsuite/gcc.target/riscv/arch-38.c
index cee3efebe75..164a91e38a3 1006

Re: [PATCH v2] RISC-V: Error early with V and no M extension.

2024-07-24 Thread Palmer Dabbelt

On Wed, 24 Jul 2024 08:25:30 PDT (-0700), Robin Dapp wrote:

Hi,

now with proper diff...

For calculating the value of a poly_int at runtime we use a
multiplication instruction that requires the M extension.
Instead of just asserting and ICEing this patch emits an early
error at option-parsing time.

We have several tests that use only "i" (without "m"), and I adjusted all of
them to "im".  For now, I didn't verify whether the original error still
occurs with just "i"; I simply added "m".

Tested on rv64gcv_zvfh_zvbb.

Regards
 Robin

gcc/ChangeLog:

PR target/116036

* config/riscv/riscv.cc (riscv_override_options_internal): Error
with TARGET_VECTOR && !TARGET_MUL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-31.c: Add m to arch string and expect it.
* gcc.target/riscv/arch-32.c: Ditto.
* gcc.target/riscv/arch-37.c: Ditto.
* gcc.target/riscv/arch-38.c: Ditto.
* gcc.target/riscv/predef-14.c: Ditto.
* gcc.target/riscv/predef-15.c: Ditto.
* gcc.target/riscv/predef-16.c: Ditto.
* gcc.target/riscv/predef-26.c: Ditto.
* gcc.target/riscv/predef-27.c: Ditto.
* gcc.target/riscv/predef-32.c: Ditto.
* gcc.target/riscv/predef-33.c: Ditto.
* gcc.target/riscv/predef-36.c: Ditto.
* gcc.target/riscv/predef-37.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111486.c: Add m to arch string.
* gcc.target/riscv/compare-debug-1.c: Ditto.
* gcc.target/riscv/compare-debug-2.c: Ditto.
* gcc.target/riscv/rvv/base/pr116036.c: New test.
---
 gcc/config/riscv/riscv.cc |  5 +
 gcc/testsuite/gcc.target/riscv/arch-31.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-32.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-37.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-38.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/compare-debug-1.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/compare-debug-2.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/predef-14.c|  6 +++---
 gcc/testsuite/gcc.target/riscv/predef-15.c|  4 ++--
 gcc/testsuite/gcc.target/riscv/predef-16.c|  4 ++--
 gcc/testsuite/gcc.target/riscv/predef-26.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-27.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-32.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-33.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-36.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-37.c|  6 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111486.c |  2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/pr116036.c| 11 +++
 18 files changed, 60 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116036.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7016a33cce3..fcdb7ab08dd 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9691,6 +9691,11 @@ riscv_override_options_internal (struct gcc_options 
*opts)
   else if (!TARGET_MUL_OPTS_P (opts) && TARGET_DIV_OPTS_P (opts))
 error ("%<-mdiv%> requires %<-march%> to subsume the %<M%> extension");
 
+  /* We might use a multiplication to calculate the scalable vector length at

+ runtime.  Therefore, require the M extension.  */
+  if (TARGET_VECTOR && !TARGET_MUL)
+    sorry ("the %<V%> extension requires the %<M%> extension");


It's really GCC's implementation of the V extension that requires M, not
the actual ISA V extension.  So I think the wording could be a little
confusing for users here, but no big deal either way on my end, so


Reviewed-by: Palmer Dabbelt 

Thanks!


+
   /* Likewise floating-point division and square root.  */
   if ((TARGET_HARD_FLOAT_OPTS_P (opts) || TARGET_ZFINX_OPTS_P (opts))
   && ((target_flags_explicit & MASK_FDIV) == 0))
diff --git a/gcc/testsuite/gcc.target/riscv/arch-31.c 
b/gcc/testsuite/gcc.target/riscv/arch-31.c
index 5180753b905..9b867c5ecd2 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-31.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-31.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32i_zvfbfmin -mabi=ilp32f" } */
+/* { dg-options "-march=rv32im_zvfbfmin -mabi=ilp32f" } */
 int foo()
 {
 }
diff --git a/gcc/testsuite/gcc.target/riscv/arch-32.c 
b/gcc/testsuite/gcc.target/riscv/arch-32.c
index 49616832512..49a3db79489 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-32.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-32.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64iv_zvfbfmin -mabi=lp64d" } */
+/* { dg-options "-march=rv64imv_zvfbfmin -mabi=lp64d" } */
 int foo()
 {
 }
diff --git a/gcc/testsuite/gcc.target/riscv/arch-37.c 
b/gcc/testsuite/gcc.target/riscv/arch-37.c
index 5b19a73c556..b56ba77b973 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-37.c
+++ b/gcc/testsuite/gcc.target/riscv/arc

Re: [PATCH v2] RISC-V: Error early with V and no M extension.

2024-07-24 Thread Robin Dapp
> It's really GCC's implementation of the V extension that requires M, not 
> the actual ISA V extension.  So I think the wording could be a little 
> confusing for users here, but no big deal either way on my end, so
>
> Reviewed-by: Palmer Dabbelt 

Hmm, fair.  How about just "the 'V' implementation requires the 'M' extension"?
Or "the current 'V' implementation"?

-- 
Regards
 Robin



Re: [PATCH v2] RISC-V: Error early with V and no M extension.

2024-07-24 Thread Patrick O'Neill



On 7/24/24 08:37, Robin Dapp wrote:

It's really GCC's implementation of the V extension that requires M, not
the actual ISA V extension.  So I think the wording could be a little
confusing for users here, but no big deal either way on my end, so

Reviewed-by: Palmer Dabbelt 

Hmm, fair.  How about just "the 'V' implementation requires the 'M' extension"?
Or "the current 'V' implementation"?

That phrasing makes sense to me. It's consistent with the -mbig-endian 
sorry message:


https://godbolt.org/z/oWMeorEeM

Patrick



Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128 variants

2024-07-24 Thread Carl Love

Peter:

On 7/23/24 2:26 PM, Peter Bergner wrote:

On 7/19/24 3:04 PM, Carl Love wrote:

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..2a18ee44526 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])

 (define_insn "vs<SLDB_lr>db_<mode>"
- [(set (match_operand:VI2 0 "register_operand" "=v")
-  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
-   (match_operand:VI2 2 "register_operand" "v")
+ [(set (match_operand:VEC_IC 0 "register_operand" "=v")
+  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
+   (match_operand:VEC_IC 2 "register_operand" "v")
 (match_operand:QI 3 "const_0_to_12_operand" "n")]
VSHIFT_DBL_LR))]
"TARGET_POWER10"

I know the old code used the register_operand predicate for the vector
operands, but those really should be changed to altivec_register_operand.

Peter


OK, will add that change and retest the patch.  Thanks.

 Carl


Re: [PATCH v2] RISC-V: Error early with V and no M extension.

2024-07-24 Thread Robin Dapp
> That phrasing makes sense to me. It's consistent with the -mbig-endian 
> sorry message:
>
> https://godbolt.org/z/oWMeorEeM

I seem to remember that explicitly mentioning GCC in an error message like
that was discouraged but I might be confusing things.

So probably
"GCC's current 'V' implementation".

-- 
Regards
 Robin



[PATCH] libcpp, c++, v2: Optimize initializers using #embed in C++

2024-07-24 Thread Jakub Jelinek
On Tue, Jul 23, 2024 at 09:38:15PM -0400, Jason Merrill wrote:
Thanks.

> > but please see
> > https://github.com/llvm/llvm-project/pull/97274#issuecomment-2230929277
> > comment and whether we really want the preprocessor to preprocess it for
> > C++ as (or as-if)
> > static_cast<unsigned char>(127),static_cast<unsigned char>(69),static_cast<unsigned char>(76),static_cast<unsigned char>(70),static_cast<unsigned char>(2),...
> > i.e. 9 tokens per byte rather than 2, or
> > (unsigned char)127,(unsigned char)69,...
> > or
> > ((unsigned char)127),((unsigned char)69),...
> > etc.
> 
> The discussion at that link suggests that the author is planning to propose
> removing the cast.

Yeah, I just wanted to mention it for completeness: the earlier
libcpp patch implements what is planned and would need to change if
something different is accepted into C++26.  I'd also note that I
strongly prefer preprocessing compatibility between C and C++ here.
Sure, I know 'a' is char in C++ and int in C, but there simply isn't
any compact form to denote unsigned char literals right now.

> Let's call this variable old_raw_data_ptr for clarity, here and in
> reshape_init_class.

Done.

> > -  elt_init = reshape_init_r (elt_type, d,
> > -/*first_initializer_p=*/NULL_TREE,
> > -complain);
> > +  if (TREE_CODE (d->cur->value) == RAW_DATA_CST
> > + && (TREE_CODE (elt_type) == INTEGER_TYPE
> > + || (TREE_CODE (elt_type) == ENUMERAL_TYPE
> > + && TYPE_CONTEXT (TYPE_MAIN_VARIANT (elt_type)) == std_node
> > + && strcmp (TYPE_NAME_STRING (TYPE_MAIN_VARIANT (elt_type)),
> > +"byte") == 0))
> 
> Maybe is_byte_access_type?  Or finally factor out a function to test
> specifically for std::byte, it's odd that we don't have one yet.

Used is_byte_access_type (though next to the INTEGER_TYPE check, because
even signed char can and should be handled, and the CHAR_BIT test should
remain too; it is testing the host char precision against the target one).

> > @@ -7158,6 +7244,7 @@ reshape_init_class (tree type, reshape_i
> >  is initialized by the designated-initializer-list { D }, where D
> >  is the designated- initializer-clause naming a member of the
> >  anonymous union member."  */
> > + gcc_checking_assert (TREE_CODE (d->cur->value) != RAW_DATA_CST);
> 
> Is there a test of trying to use #embed as a designated initializer?  I
> don't see one.

I was thinking about those, then tested with -std=c++26 and saw it all
rejected, so I didn't include anything.
But it seems we accept mixing [0] = 1, 1 in C++17 and older; it is only
rejected in C++20 and later, so the test is useful.  Of course, even
when using
[0] =
#embed __FILE__
only the initial designator is provided, and the remaining elements (if
there are more than one) don't have designators.
Anyway, found an ICE on
int a[] = { [0] =
#embed __FILE__
};
and fixed that too.

> > +  if (tree raw_init = cp_maybe_split_raw_data (d))
> > +   return raw_init;
> > d->cur++;
> > return init;
> 
> This split-or-++ pattern seems to repeat a lot in reshape_init_r, could we
> factor it out to avoid problems with people forgetting one or the other?
> Maybe consume_init (d) or d->consume_init ()?

Done that (though it also takes an init argument so that it can be used more
cleanly); the functions always start with init = d->cur->value; but
then can change what init is in some cases (but never in the RAW_DATA_CST
case).  Only 3 spots could use that though; another one is a recovery path
which needs to skip over perhaps multiple entries, and another one would
crash if d->cur were incremented before the has_designator_problem call.

Lightly tested on x86_64-linux, of course would test it fully, but I guess
there will be other changes requested in the 4 other patches...

2024-07-24  Jakub Jelinek  

libcpp/
* files.cc (finish_embed): Use CPP_EMBED even for C++.
gcc/cp/ChangeLog:
* cp-tree.h (class raw_data_iterator): New type.
(class raw_data_range): New type.
* parser.cc (cp_parser_postfix_open_square_expression): Handle
parsing of CPP_EMBED.
(cp_parser_parenthesized_expression_list): Likewise.  Use
cp_lexer_next_token_is.
(cp_parser_expression): Handle parsing of CPP_EMBED.
(cp_parser_template_argument_list): Likewise.
(cp_parser_initializer_list): Likewise.
(cp_parser_oacc_clause_tile): Likewise.
(cp_parser_omp_tile_sizes): Likewise.
* pt.cc (tsubst_expr): Handle RAW_DATA_CST.
* constexpr.cc (reduced_constant_expression_p): Likewise.
(raw_data_cst_elt): New function.
(find_array_ctor_elt): Handle RAW_DATA_CST.
(cxx_eval_array_reference): Likewise.
* typeck2.cc (digest_init_r): Emit -Wnarrowing and/or -Wconversion
diagnostics.
(process_init_constructor_array): Handle RAW_DATA_CST.
* decl.cc (maybe_deduce_size_from_array_init): Likewise.
(is_direct_enum_i

[committed 01/11] aarch64: Remove unused global aarch64_tune_flags

2024-07-24 Thread Andrew Carlotti
gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_tune_flags): Remove unused global variable.
(aarch64_override_options_internal): Remove dead assignment.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
9e51236ce9fa059ccc6e4fe24335b5fb36692ef8..d8fbd7102e7b8e45c68f7725b5bbaca0beec7c96
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -349,9 +349,6 @@ static bool aarch64_print_address_internal (FILE*, 
machine_mode, rtx,
 /* The processor for which instructions should be scheduled.  */
 enum aarch64_processor aarch64_tune = cortexa53;
 
-/* Mask to specify which instruction scheduling options should be used.  */
-uint64_t aarch64_tune_flags = 0;
-
 /* Global flag for PC relative loads.  */
 bool aarch64_pcrelative_literal_loads;
 
@@ -18273,7 +18270,6 @@ void
 aarch64_override_options_internal (struct gcc_options *opts)
 {
   const struct processor *tune = aarch64_get_tune_cpu (opts->x_selected_tune);
-  aarch64_tune_flags = tune->flags;
   aarch64_tune = tune->sched_core;
   /* Make a copy of the tuning parameters attached to the core, which
  we may later overwrite.  */


[committed 00/11] aarch64: Extend aarch64_feature_flags to 128 bits

2024-07-24 Thread Andrew Carlotti
The end goal of the series is to change the definition of aarch64_feature_flags
from a uint64_t typedef to a class with 128 bits of storage.  This class is a
new template bitmap type that uses operator overloading to mimic the existing
integer interface as much as possible.

The changes are mostly in the backend, but patch 10/11 introduces this new
bitmap type in the middle end.

Committed as approved by Kyrill and Richard S, with minor changes to
patches 04, 08 and 10 as requested by Richard.


[committed 02/11] aarch64: Move AARCH64_NUM_ISA_MODES definition

2024-07-24 Thread Andrew Carlotti
AARCH64_NUM_ISA_MODES will be used within aarch64-opts.h in a later
commit.

gcc/ChangeLog:

* config/aarch64/aarch64.h (DEF_AARCH64_ISA_MODE): Move to...
* config/aarch64/aarch64-opts.h (DEF_AARCH64_ISA_MODE): ...here.


diff --git a/gcc/config/aarch64/aarch64-opts.h 
b/gcc/config/aarch64/aarch64-opts.h
index 
a05c0d3ded1c69802f15eebb8c150c7dcc62b4ef..06a4fed3833482543891b4f7c778933f7cebd631
 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -24,6 +24,11 @@
 
 #ifndef USED_FOR_TARGET
 typedef uint64_t aarch64_feature_flags;
+
+constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
+#define DEF_AARCH64_ISA_MODE(IDENT) + 1
+#include "aarch64-isa-modes.def"
+);
 #endif
 
 /* The various cores that implement AArch64.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
fac1882bcb38eae3690c2dc366ebc6c3f64ee940..2be6dc4089b81d2a4e1ba6861b25094774198406
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -183,11 +183,6 @@ enum class aarch64_feature : unsigned char {
 
 constexpr auto AARCH64_FL_SM_STATE = AARCH64_FL_SM_ON | AARCH64_FL_SM_OFF;
 
-constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
-#define DEF_AARCH64_ISA_MODE(IDENT) + 1
-#include "aarch64-isa-modes.def"
-);
-
 /* The mask of all ISA modes.  */
 constexpr auto AARCH64_FL_ISA_MODES
   = (aarch64_feature_flags (1) << AARCH64_NUM_ISA_MODES) - 1;



[committed 04/11] aarch64: Introduce aarch64_isa_mode type

2024-07-24 Thread Andrew Carlotti
Currently there are many places where an aarch64_feature_flags variable
is used, but only the bottom three isa mode bits are set and read.
Using a separate data type for these values makes it more clear that
they're not expected or required to have any of their upper feature bits
set.  It will also make things simpler and more efficient when we extend
aarch64_feature_flags to 128 bits.

This patch uses explicit casts whenever converting from an
aarch64_feature_flags value to an aarch64_isa_mode value.  This isn't
strictly necessary, but serves to highlight the locations where an
explicit conversion will become necessary later.

gcc/ChangeLog:

* config/aarch64/aarch64-opts.h: Add aarch64_isa_mode typedef.
* config/aarch64/aarch64-protos.h
(aarch64_gen_callee_cookie): Use aarch64_isa_mode parameter.
(aarch64_sme_vq_immediate): Ditto.
* config/aarch64/aarch64.cc
(aarch64_fntype_pstate_sm): Use aarch64_isa_mode values.
(aarch64_fntype_pstate_za): Ditto.
(aarch64_fndecl_pstate_sm): Ditto.
(aarch64_fndecl_pstate_za): Ditto.
(aarch64_fndecl_isa_mode): Ditto.
(aarch64_cfun_incoming_pstate_sm): Ditto.
(aarch64_cfun_enables_pstate_sm): Ditto.
(aarch64_call_switches_pstate_sm): Ditto.
(aarch64_gen_callee_cookie): Ditto.
(aarch64_callee_isa_mode): Ditto.
(aarch64_insn_callee_abi): Ditto.
(aarch64_sme_vq_immediate): Ditto.
(aarch64_add_offset_temporaries): Ditto.
(aarch64_add_offset): Ditto.
(aarch64_add_sp): Ditto.
(aarch64_sub_sp): Ditto.
(aarch64_guard_switch_pstate_sm): Ditto.
(aarch64_switch_pstate_sm): Ditto.
(aarch64_init_cumulative_args): Ditto.
(aarch64_allocate_and_probe_stack_space): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_start_call_args): Ditto.
(aarch64_expand_call): Ditto.
(aarch64_end_call_args): Ditto.
(aarch64_set_current_function): Ditto, with added conversions.
(aarch64_handle_attr_arch): Avoid macro with changed type.
(aarch64_handle_attr_cpu): Ditto.
(aarch64_handle_attr_isa_flags): Ditto.
(aarch64_switch_pstate_sm_for_landing_pad):
Use aarch64_isa_mode values.
(aarch64_switch_pstate_sm_for_jump): Ditto.
(pass_switch_pstate_sm::gate): Ditto.
* config/aarch64/aarch64.h
(AARCH64_ISA_MODE_{SM_ON|SM_OFF|ZA_ON}): New macros.
(AARCH64_FL_SM_STATE): Mark as possibly unused.
(AARCH64_ISA_MODE_SM_STATE): New aarch64_isa_mode mask.
(AARCH64_DEFAULT_ISA_MODE): New aarch64_isa_mode value.
(AARCH64_FL_DEFAULT_ISA_MODE): Define using above value.
(AARCH64_ISA_MODE): Change type to aarch64_isa_mode.
(arm_pcs): Use aarch64_isa_mode value.


diff --git a/gcc/config/aarch64/aarch64-opts.h 
b/gcc/config/aarch64/aarch64-opts.h
index 
06a4fed3833482543891b4f7c778933f7cebd631..2c36bfaad19b999238601d44709c280ef987046b
 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -23,6 +23,8 @@
 #define GCC_AARCH64_OPTS_H
 
 #ifndef USED_FOR_TARGET
+typedef uint64_t aarch64_isa_mode;
+
 typedef uint64_t aarch64_feature_flags;
 
 constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
42639e9efcf1e0f9362f759ae63a31b8eeb0d581..f64afe2889018e1c4735a1677e6bf5febc4a7665
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -767,7 +767,7 @@ bool aarch64_constant_address_p (rtx);
 bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 tree aarch64_vector_load_decl (tree);
-rtx aarch64_gen_callee_cookie (aarch64_feature_flags, arm_pcs);
+rtx aarch64_gen_callee_cookie (aarch64_isa_mode, arm_pcs);
 void aarch64_expand_call (rtx, rtx, rtx, bool);
 bool aarch64_expand_cpymem_mops (rtx *, bool);
 bool aarch64_expand_cpymem (rtx *, bool);
@@ -808,7 +808,7 @@ int aarch64_add_offset_temporaries (rtx);
 void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx);
 bool aarch64_rdsvl_immediate_p (const_rtx);
 rtx aarch64_sme_vq_immediate (machine_mode mode, HOST_WIDE_INT,
- aarch64_feature_flags);
+ aarch64_isa_mode);
 char *aarch64_output_rdsvl (const_rtx);
 bool aarch64_addsvl_addspl_immediate_p (const_rtx);
 char *aarch64_output_addsvl_addspl (rtx);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
2be6dc4089b81d2a4e1ba6861b25094774198406..dfb244307635a7aa1c552acd55a635cd0bdeeb39
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -181,7 +181,17 @@ enum class aarch64_feature : unsigned char {
 #include "aarch64-arches.def"
 #undef HANDLE
 
-constexpr auto AARCH64_FL_SM_STATE = AARCH64_FL_SM_ON | AARCH64_FL_SM_O

[committed 03/11] aarch64: Eliminate a temporary variable.

2024-07-24 Thread Andrew Carlotti
The name would become misleading in a later commit anyway, and I think
this is marginally more readable.

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_override_options): Remove temporary variable.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
d8fbd7102e7b8e45c68f7725b5bbaca0beec7c96..4e3a4047ea80600d4c38f0359ba6962a15f77987
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18856,7 +18856,6 @@ aarch64_override_options (void)
   SUBTARGET_OVERRIDE_OPTIONS;
 #endif
 
-  auto isa_mode = AARCH64_FL_DEFAULT_ISA_MODE;
   if (cpu && arch)
 {
   /* If both -mcpu and -march are specified, warn if they are not
@@ -18879,25 +18878,25 @@ aarch64_override_options (void)
}
 
   selected_arch = arch->arch;
-  aarch64_set_asm_isa_flags (arch_isa | isa_mode);
+  aarch64_set_asm_isa_flags (arch_isa | AARCH64_FL_DEFAULT_ISA_MODE);
 }
   else if (cpu)
 {
   selected_arch = cpu->arch;
-  aarch64_set_asm_isa_flags (cpu_isa | isa_mode);
+  aarch64_set_asm_isa_flags (cpu_isa | AARCH64_FL_DEFAULT_ISA_MODE);
 }
   else if (arch)
 {
   cpu = &all_cores[arch->ident];
   selected_arch = arch->arch;
-  aarch64_set_asm_isa_flags (arch_isa | isa_mode);
+  aarch64_set_asm_isa_flags (arch_isa | AARCH64_FL_DEFAULT_ISA_MODE);
 }
   else
 {
   /* No -mcpu or -march specified, so use the default CPU.  */
   cpu = &all_cores[TARGET_CPU_DEFAULT];
   selected_arch = cpu->arch;
-  aarch64_set_asm_isa_flags (cpu->flags | isa_mode);
+  aarch64_set_asm_isa_flags (cpu->flags | AARCH64_FL_DEFAULT_ISA_MODE);
 }
 
   selected_tune = tune ? tune->ident : cpu->ident;



[committed 05/11] aarch64: Define aarch64_get_{asm_|}isa_flags

2024-07-24 Thread Andrew Carlotti
Building an aarch64_feature_flags value from data within a gcc_options
or cl_target_option struct will get more complicated in a later commit.
Use a macro to avoid doing this manually in more than one location.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_handle_option): Use new macro.
* config/aarch64/aarch64.cc
(aarch64_override_options_internal): Ditto.
(aarch64_option_print): Ditto.
(aarch64_set_current_function): Ditto.
(aarch64_can_inline_p): Ditto.
(aarch64_declare_function_name): Ditto.
(aarch64_start_file): Ditto.
* config/aarch64/aarch64.h (aarch64_get_asm_isa_flags): New
(aarch64_get_isa_flags): New.
(aarch64_asm_isa_flags): Use new macro.
(aarch64_isa_flags): Ditto.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
951d041d3109b935e90a7cb5d714940414e81761..63c50189a09d5c7c713f57e23a8172f44bf6bec5
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -111,7 +111,7 @@ aarch64_handle_option (struct gcc_options *opts,
 
 case OPT_mgeneral_regs_only:
   opts->x_target_flags |= MASK_GENERAL_REGS_ONLY;
-  aarch64_set_asm_isa_flags (opts, opts->x_aarch64_asm_isa_flags);
+  aarch64_set_asm_isa_flags (opts, aarch64_get_asm_isa_flags (opts));
   return true;
 
 case OPT_mfix_cortex_a53_835769:
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
dfb244307635a7aa1c552acd55a635cd0bdeeb39..193f2486176b6bac372a143e2f52041c5a28ebaf
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -22,15 +22,18 @@
 #ifndef GCC_AARCH64_H
 #define GCC_AARCH64_H
 
+#define aarch64_get_asm_isa_flags(opts) \
+  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags))
+#define aarch64_get_isa_flags(opts) \
+  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags))
+
 /* Make these flags read-only so that all uses go via
aarch64_set_asm_isa_flags.  */
 #ifndef GENERATOR_FILE
 #undef aarch64_asm_isa_flags
-#define aarch64_asm_isa_flags \
-  ((aarch64_feature_flags) global_options.x_aarch64_asm_isa_flags)
+#define aarch64_asm_isa_flags (aarch64_get_asm_isa_flags (&global_options))
 #undef aarch64_isa_flags
-#define aarch64_isa_flags \
-  ((aarch64_feature_flags) global_options.x_aarch64_isa_flags)
+#define aarch64_isa_flags (aarch64_get_isa_flags (&global_options))
 #endif
 
 /* Target CPU builtins.  */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
1b343c9ae1c7aef2a3fb3f28a3b2236ca270cfbd..66ce04d77e17a65d320578de389462756d33d110
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18329,10 +18329,11 @@ aarch64_override_options_internal (struct gcc_options 
*opts)
   && !fixed_regs[R18_REGNUM])
 error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
 
-  if ((opts->x_aarch64_isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON))
-  && !(opts->x_aarch64_isa_flags & AARCH64_FL_SME))
+  aarch64_feature_flags isa_flags = aarch64_get_isa_flags (opts);
+  if ((isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON))
+  && !(isa_flags & AARCH64_FL_SME))
 {
-  if (opts->x_aarch64_isa_flags & AARCH64_FL_SM_ON)
+  if (isa_flags & AARCH64_FL_SM_ON)
error ("streaming functions require the ISA extension %qs", "sme");
   else
error ("functions with SME state require the ISA extension %qs",
@@ -18341,8 +18342,7 @@ aarch64_override_options_internal (struct gcc_options 
*opts)
  " option %<-march%>, or by using the %"
  " attribute or pragma", "sme");
   opts->x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
-  auto new_flags = (opts->x_aarch64_asm_isa_flags
-   | feature_deps::SME ().enable);
+  auto new_flags = isa_flags | feature_deps::SME ().enable;
   aarch64_set_asm_isa_flags (opts, new_flags);
 }
 
@@ -19036,9 +19036,9 @@ aarch64_option_print (FILE *file, int indent, struct 
cl_target_option *ptr)
   const struct processor *cpu
 = aarch64_get_tune_cpu (ptr->x_selected_tune);
   const struct processor *arch = aarch64_get_arch (ptr->x_selected_arch);
+  aarch64_feature_flags isa_flags = aarch64_get_asm_isa_flags(ptr);
   std::string extension
-= aarch64_get_extension_string_for_isa_flags (ptr->x_aarch64_asm_isa_flags,
- arch->flags);
+= aarch64_get_extension_string_for_isa_flags (isa_flags, arch->flags);
 
   fprintf (file, "%*sselected tune = %s\n", indent, "", cpu->name);
   fprintf (file, "%*sselected arch = %s%s\n", indent, "",
@@ -19098,7 +19098,7 @@ aarch64_set_current_function (tree fndecl)
   auto new_isa_mode = (fndecl
   ? aarch64_fndecl_isa_mode (fndecl)
   : AARCH64_DEFAULT_ISA_MODE);
-  auto isa_flags = TREE_TARGET_OPTION (new_tree)->x_a

[committed 08/11] aarch64: Add bool conversion to TARGET_* macros

2024-07-24 Thread Andrew Carlotti
Use a new AARCH64_HAVE_ISA macro in TARGET_* definitions, and eliminate
all the AARCH64_ISA_* feature macros.

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc
(aarch64_define_unconditional_macros): Use TARGET_V8R macro.
(aarch64_update_cpp_builtins): Use TARGET_* macros.
* config/aarch64/aarch64.h (AARCH64_HAVE_ISA): New macro.
(AARCH64_ISA_SM_OFF, AARCH64_ISA_SM_ON, AARCH64_ISA_ZA_ON)
(AARCH64_ISA_V8A, AARCH64_ISA_V8_1A, AARCH64_ISA_CRC)
(AARCH64_ISA_FP, AARCH64_ISA_SIMD, AARCH64_ISA_LSE)
(AARCH64_ISA_RDMA, AARCH64_ISA_V8_2A, AARCH64_ISA_F16)
(AARCH64_ISA_SVE, AARCH64_ISA_SVE2, AARCH64_ISA_SVE2_AES)
(AARCH64_ISA_SVE2_BITPERM, AARCH64_ISA_SVE2_SHA3)
(AARCH64_ISA_SVE2_SM4, AARCH64_ISA_SME, AARCH64_ISA_SME_I16I64)
(AARCH64_ISA_SME_F64F64, AARCH64_ISA_SME2, AARCH64_ISA_V8_3A)
(AARCH64_ISA_DOTPROD, AARCH64_ISA_AES, AARCH64_ISA_SHA2)
(AARCH64_ISA_V8_4A, AARCH64_ISA_SM4, AARCH64_ISA_SHA3)
(AARCH64_ISA_F16FML, AARCH64_ISA_RCPC, AARCH64_ISA_RCPC8_4)
(AARCH64_ISA_RNG, AARCH64_ISA_V8_5A, AARCH64_ISA_TME)
(AARCH64_ISA_MEMTAG, AARCH64_ISA_V8_6A, AARCH64_ISA_I8MM)
(AARCH64_ISA_F32MM, AARCH64_ISA_F64MM, AARCH64_ISA_BF16)
(AARCH64_ISA_SB, AARCH64_ISA_RCPC3, AARCH64_ISA_V8R)
(AARCH64_ISA_PAUTH, AARCH64_ISA_V8_7A, AARCH64_ISA_V8_8A)
(AARCH64_ISA_V8_9A, AARCH64_ISA_V9A, AARCH64_ISA_V9_1A)
(AARCH64_ISA_V9_2A, AARCH64_ISA_V9_3A, AARCH64_ISA_V9_4A)
(AARCH64_ISA_MOPS, AARCH64_ISA_LS64, AARCH64_ISA_CSSC)
(AARCH64_ISA_D128, AARCH64_ISA_THE, AARCH64_ISA_GCS): Remove.
(TARGET_BASE_SIMD, TARGET_SIMD, TARGET_FLOAT)
(TARGET_NON_STREAMING, TARGET_STREAMING, TARGET_ZA, TARGET_SHA2)
(TARGET_SHA3, TARGET_AES, TARGET_SM4, TARGET_F16FML)
(TARGET_CRC32, TARGET_LSE, TARGET_FP_F16INST)
(TARGET_SIMD_F16INST, TARGET_DOTPROD, TARGET_SVE, TARGET_SVE2)
(TARGET_SVE2_AES, TARGET_SVE2_BITPERM, TARGET_SVE2_SHA3)
(TARGET_SVE2_SM4, TARGET_SME, TARGET_SME_I16I64)
(TARGET_SME_F64F64, TARGET_SME2, TARGET_ARMV8_3, TARGET_JSCVT)
(TARGET_FRINT, TARGET_TME, TARGET_RNG, TARGET_MEMTAG)
(TARGET_I8MM, TARGET_SVE_I8MM, TARGET_SVE_F32MM)
(TARGET_SVE_F64MM, TARGET_BF16_FP, TARGET_BF16_SIMD)
(TARGET_SVE_BF16, TARGET_PAUTH, TARGET_BTI, TARGET_MOPS)
(TARGET_LS64, TARGET_CSSC, TARGET_SB, TARGET_RCPC, TARGET_RCPC2)
(TARGET_RCPC3, TARGET_SIMD_RDMA, TARGET_ARMV9_4, TARGET_D128)
(TARGET_THE, TARGET_GCS): Redefine using AARCH64_HAVE_ISA.
(TARGET_V8R, TARGET_V9A): New.
* config/aarch64/aarch64.md (arch_enabled): Use TARGET_RCPC2.
* config/aarch64/iterators.md (GPI_I16): Use TARGET_FP_F16INST.
(GPF_F16): Ditto.
* config/aarch64/predicates.md
(aarch64_rcpc_memory_operand): Use TARGET_RCPC2.


diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 
2aff097dd33c1892d255f7227c72dc90892bc78a..f9b9e379375507c5c49cac280f3a8c3e34c9aec9
 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -64,7 +64,7 @@ aarch64_define_unconditional_macros (cpp_reader *pfile)
   builtin_define ("__ARM_ARCH_8A");
 
   builtin_define_with_int_value ("__ARM_ARCH_PROFILE",
-  AARCH64_ISA_V8R ? 'R' : 'A');
+  TARGET_V8R ? 'R' : 'A');
   builtin_define ("__ARM_FEATURE_CLZ");
   builtin_define ("__ARM_FEATURE_IDIV");
   builtin_define ("__ARM_FEATURE_UNALIGNED");
@@ -132,7 +132,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (flag_unsafe_math_optimizations, "__ARM_FP_FAST", 
pfile);
 
   cpp_undef (pfile, "__ARM_ARCH");
-  builtin_define_with_int_value ("__ARM_ARCH", AARCH64_ISA_V9A ? 9 : 8);
+  builtin_define_with_int_value ("__ARM_ARCH", TARGET_V9A ? 9 : 8);
 
   builtin_define_with_int_value ("__ARM_SIZEOF_MINIMAL_ENUM",
 flag_short_enums ? 1 : 4);
@@ -259,7 +259,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
 
   aarch64_def_or_undef (TARGET_LS64,
"__ARM_FEATURE_LS64", pfile);
-  aarch64_def_or_undef (AARCH64_ISA_RCPC, "__ARM_FEATURE_RCPC", pfile);
+  aarch64_def_or_undef (TARGET_RCPC, "__ARM_FEATURE_RCPC", pfile);
   aarch64_def_or_undef (TARGET_D128, "__ARM_FEATURE_SYSREG128", pfile);
 
   aarch64_def_or_undef (TARGET_SME, "__ARM_FEATURE_SME", pfile);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
903e708565dc7830e9544813dd315f99d489cad2..6310ebd72ff2af6d39a776702ef40e9399357e95
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -68,18 +68,6 @@
 #define BYTES_BIG_ENDIAN (TARGET_BIG_END != 0)
 #define WORDS_BIG_ENDIAN (BYTES_BIG_ENDIAN)
 
-/* AdvSIMD is supported in the default configuration, unless disabled by
-   -mgeneral-regs-only or by the +nosimd extension.  The set of available
-   instructions is then subdivided into:
-
-   - 

[committed 06/11] aarch64: Decouple feature flag option storage type

2024-07-24 Thread Andrew Carlotti
The awk scripts that process the .opt files are relatively fragile and
only handle a limited set of data types correctly.  The unrecognised
aarch64_feature_flags type is handled as a uint64_t, which happens to be
correct for now.  However, that assumption will change when we extend
the mask to 128 bits.

This patch changes the option members to use uint64_t types, and adds a
"_0" suffix to the names (both for future extensibility, and to allow
the original name to be used for the full aarch64_feature_flags mask
within generator files).

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_set_asm_isa_flags): Reorder, and add suffix to names.
* config/aarch64/aarch64.h
(aarch64_get_asm_isa_flags): Add "_0" suffix.
(aarch64_get_isa_flags): Ditto.
(aarch64_asm_isa_flags): Redefine using renamed uint64_t value.
(aarch64_isa_flags): Ditto.
* config/aarch64/aarch64.opt:
(aarch64_asm_isa_flags): Rename to...
(aarch64_asm_isa_flags_0): ...this, and change to uint64_t.
(aarch64_isa_flags): Rename to...
(aarch64_isa_flags_0): ...this, and change to uint64_t.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
63c50189a09d5c7c713f57e23a8172f44bf6bec5..bd0770dd0d84005701afed35d4af356380a405e9
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -66,15 +66,16 @@ static const struct default_options 
aarch_option_optimization_table[] =
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
-/* Set OPTS->x_aarch64_asm_isa_flags to FLAGS and update
-   OPTS->x_aarch64_isa_flags accordingly.  */
+
+/* Set OPTS->x_aarch64_asm_isa_flags_0 to FLAGS and update
+   OPTS->x_aarch64_isa_flags_0 accordingly.  */
 void
 aarch64_set_asm_isa_flags (gcc_options *opts, aarch64_feature_flags flags)
 {
-  opts->x_aarch64_asm_isa_flags = flags;
-  opts->x_aarch64_isa_flags = flags;
+  opts->x_aarch64_asm_isa_flags_0 = flags;
   if (opts->x_target_flags & MASK_GENERAL_REGS_ONLY)
-opts->x_aarch64_isa_flags &= ~feature_deps::get_flags_off (AARCH64_FL_FP);
+flags &= ~feature_deps::get_flags_off (AARCH64_FL_FP);
+  opts->x_aarch64_isa_flags_0 = flags;
 }
 
 /* Implement TARGET_HANDLE_OPTION.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
193f2486176b6bac372a143e2f52041c5a28ebaf..903e708565dc7830e9544813dd315f99d489cad2
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -23,13 +23,18 @@
 #define GCC_AARCH64_H
 
 #define aarch64_get_asm_isa_flags(opts) \
-  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags))
+  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags_0))
 #define aarch64_get_isa_flags(opts) \
-  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags))
+  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags_0))
 
 /* Make these flags read-only so that all uses go via
aarch64_set_asm_isa_flags.  */
-#ifndef GENERATOR_FILE
+#ifdef GENERATOR_FILE
+#undef aarch64_asm_isa_flags
+#define aarch64_asm_isa_flags (aarch64_feature_flags (aarch64_asm_isa_flags_0))
+#undef aarch64_isa_flags
+#define aarch64_isa_flags (aarch64_feature_flags (aarch64_isa_flags_0))
+#else
 #undef aarch64_asm_isa_flags
 #define aarch64_asm_isa_flags (aarch64_get_asm_isa_flags (&global_options))
 #undef aarch64_isa_flags
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 
6356c419399bd324929cd599e5a4b926b0383469..45aab49de27bdfa0fb3f67ec06c7dcf0ac242fb3
 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -31,10 +31,10 @@ TargetVariable
 enum aarch64_arch selected_arch = aarch64_no_arch
 
 TargetVariable
-aarch64_feature_flags aarch64_asm_isa_flags = 0
+uint64_t aarch64_asm_isa_flags_0 = 0
 
 TargetVariable
-aarch64_feature_flags aarch64_isa_flags = 0
+uint64_t aarch64_isa_flags_0 = 0
 
 TargetVariable
 unsigned aarch_enable_bti = 2



[committed 09/11] aarch64: Use constructor explicitly in get_flags_off

2024-07-24 Thread Andrew Carlotti
gcc/ChangeLog:

* config/aarch64/aarch64-feature-deps.h
(get_flags_off): Construct aarch64_feature_flags (0) explicitly.


diff --git a/gcc/config/aarch64/aarch64-feature-deps.h 
b/gcc/config/aarch64/aarch64-feature-deps.h
index 
79126db88254b89f74a8583d50a77bc27865e265..a14ae22b72980bef5eec80588f06d9ced895dfd7
 100644
--- a/gcc/config/aarch64/aarch64-feature-deps.h
+++ b/gcc/config/aarch64/aarch64-feature-deps.h
@@ -97,9 +97,10 @@ template<aarch64_feature> struct info;
 constexpr aarch64_feature_flags
 get_flags_off (aarch64_feature_flags mask)
 {
-  return (0
+  return (aarch64_feature_flags (0)
 #define AARCH64_OPT_EXTENSION(A, IDENT, C, D, E, F) \
- | (feature_deps::IDENT ().enable & mask ? AARCH64_FL_##IDENT : 0)
+ | (feature_deps::IDENT ().enable & mask ? AARCH64_FL_##IDENT \
+ : aarch64_feature_flags (0))
 #include "config/aarch64/aarch64-option-extensions.def"
  );
 }



[committed 10/11] Add new bbitmap class

2024-07-24 Thread Andrew Carlotti
This class provides a constant-size bitmap that can be used as almost a
drop-in replacement for bitmaps stored in integer types.  The
implementation is entirely within the header file and uses recursive
templated operations to support effective optimisation and usage in
constexpr expressions.

This initial implementation hardcodes the choice of uint64_t elements
for storage and initialisation, but this could instead be specified via
a second template parameter.

gcc/ChangeLog:

* bbitmap.h: New file.


diff --git a/gcc/bbitmap.h b/gcc/bbitmap.h
new file mode 100644
index 
..716c013b1035e91d00226803755a44cb40ea5643
--- /dev/null
+++ b/gcc/bbitmap.h
@@ -0,0 +1,236 @@
+/* Functions to support fixed-length bitmaps.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_BBITMAP_H
+#define GCC_BBITMAP_H
+
+/* Implementation of bounded (fixed length) bitmaps.
+
+   This provides a drop-in replacement for bitmaps that have outgrown the
+   storage capacity of a single integer.
+
+   Sets are stored as a fixed length array of uint64_t elements.  The length of
+   this array is given as a template parameter.  */
+
+/* Use recursive templated functions to define constexpr operations.  */
+template<int M>
+struct bbitmap_operators
+{
+  /* Return a result that maps binary operator OP to elements [0, M) of
+     X and Y, and takes the remaining elements from REST.  */
+  template<typename Result, typename Operator, typename Arg, typename ...Rest>
+  static constexpr Result binary (Operator op, const Arg &x, const Arg &y,
+				  Rest ...rest)
+  {
+    return bbitmap_operators<M - 1>::template binary<Result>
+      (op, x, y, op (x.val[M - 1], y.val[M - 1]), rest...);
+  }
+
+  /* Return a result that contains the bitwise inverse of elements [0, M) of X,
+     and takes the remaining elements from REST.  */
+  template<typename Result, typename Arg, typename ...Rest>
+  static constexpr Result bit_not (const Arg &x, Rest ...rest)
+  {
+    return bbitmap_operators<M - 1>::template bit_not<Result>
+      (x, ~(x.val[M - 1]), rest...);
+  }
+
+  /* Return true if any element [0, M) of X is nonzero.  */
+  template<typename Arg>
+  static constexpr bool non_zero (const Arg &x)
+  {
+    return (bool) x.val[M - 1]
+      || bbitmap_operators<M - 1>::template non_zero<Arg> (x);
+  }
+
+  /* Return true if elements [0, M) of X are all equal to the corresponding
+     elements of Y.  */
+  template<typename Arg>
+  static constexpr bool equal (const Arg &x, const Arg &y)
+  {
+    return x.val[M - 1] == y.val[M - 1]
+      && bbitmap_operators<M - 1>::template equal<Arg> (x, y);
+  }
+
+  /* If bit index INDEX selects a bit in the first M elements, return a
+     Result with that bit set and the other bits of the leading M elements
+     clear.  Clear the leading M elements otherwise.  Take the remaining
+     elements of the Result from REST.  */
+  template<typename Result, typename ...Rest>
+  static constexpr Result from_index (int index, Rest ...rest)
+  {
+    return bbitmap_operators<M - 1>::template from_index<Result>
+      (index,
+       uint64_t ((index - (M - 1) * 64) == (index & 63)) << (index & 63),
+       rest...);
+  }
+};
+
+/* These functions form the base for the recursive functions above.  They
+   return either a bitmap containing the elements passed in REST, or a default
+   bool result.  */
+template<>
+struct bbitmap_operators<0>
+{
+  template<typename Result, typename Operator, typename Arg, typename ...Rest>
+  static constexpr Result binary (Operator, const Arg, const Arg,
+				  Rest ...rest)
+  {
+    return Result { rest... };
+  }
+
+  template<typename Result, typename Arg, typename ...Rest>
+  static constexpr Result bit_not (const Arg, Rest ...rest)
+  {
+    return Result { rest... };
+  }
+
+  template<typename Arg>
+  static constexpr bool non_zero (const Arg)
+  {
+    return false;
+  }
+
+  template<typename Arg>
+  static constexpr bool equal (const Arg, const Arg)
+  {
+    return true;
+  }
+
+  template<typename Result, typename ...Rest>
+  static constexpr Result from_index (int, Rest ...rest)
+  {
+    return Result { rest... };
+  }
+};
+
+template<typename T>
+constexpr T bbitmap_element_or (T x, T y) { return x | y; }
+
+template<typename T>
+constexpr T bbitmap_element_and (T x, T y) { return x & y; }
+
+template<typename T>
+constexpr T bbitmap_element_xor (T x, T y) { return x ^ y; }
+
+
+
+template<int N>
+class GTY((user)) bbitmap
+{
+public:
+  uint64_t val[N];
+
+  template<typename ...Rest>
+  constexpr bbitmap (Rest ...rest) : val{(uint64_t) rest...} {}
+
+  constexpr bbitmap operator| (const bbitmap other) const
+  {
+    return bbitmap_operators<N>::template binary<bbitmap<N>>
+      (bbitmap_element_or<uint64_t>, *this, other);
+  }
+
+  bbitmap operator|

[committed 07/11] aarch64: Add explicit bool cast to return value

2024-07-24 Thread Andrew Carlotti
gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_valid_sysreg_name_p): Add bool cast.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
66ce04d77e17a65d320578de389462756d33d110..7c2af1316b6740ccb7a383b3ac73f7c8ec36889c
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -30296,7 +30296,7 @@ aarch64_valid_sysreg_name_p (const char *regname)
   if (sysreg == NULL)
 return aarch64_is_implem_def_reg (regname);
   if (sysreg->arch_reqs)
-return (aarch64_isa_flags & sysreg->arch_reqs);
+return bool (aarch64_isa_flags & sysreg->arch_reqs);
   return true;
 }
 



[committed 11/11] aarch64: Extend aarch64_feature_flags to 128 bits

2024-07-24 Thread Andrew Carlotti
Replace the existing uint64_t typedef with a bbitmap<2> typedef.  Most
of the preparatory work was carried out in previous commits, so this
patch itself is fairly small.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_set_asm_isa_flags): Store a second uint64_t value.
* config/aarch64/aarch64-opts.h
(aarch64_feature_flags): Switch typedef to bbitmap<2>.
* config/aarch64/aarch64.cc
(aarch64_set_current_function): Extract isa mode from val[0].
* config/aarch64/aarch64.h
(aarch64_get_asm_isa_flags): Load a second uint64_t value.
(aarch64_get_isa_flags): Ditto.
(aarch64_asm_isa_flags): Ditto.
(aarch64_isa_flags): Ditto.
(HANDLE): Use bbitmap<2>::from_index to initialise flags.
(AARCH64_FL_ISA_MODES): Do arithmetic on integer type.
(AARCH64_ISA_MODE): Extract value from bbitmap<2> array.
* config/aarch64/aarch64.opt
(aarch64_asm_isa_flags_1): New variable.
(aarch64_isa_flags_1): Ditto.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
bd0770dd0d84005701afed35d4af356380a405e9..64b65b7ff9e4bf7c72bf0c5db6fa976a51fe9f32
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -67,15 +67,19 @@ static const struct default_options 
aarch_option_optimization_table[] =
   };
 
 
-/* Set OPTS->x_aarch64_asm_isa_flags_0 to FLAGS and update
-   OPTS->x_aarch64_isa_flags_0 accordingly.  */
+/* Set OPTS->x_aarch64_asm_isa_flags_<0..n> to FLAGS and update
+   OPTS->x_aarch64_isa_flags_<0..n> accordingly.  */
 void
 aarch64_set_asm_isa_flags (gcc_options *opts, aarch64_feature_flags flags)
 {
-  opts->x_aarch64_asm_isa_flags_0 = flags;
+  opts->x_aarch64_asm_isa_flags_0 = flags.val[0];
+  opts->x_aarch64_asm_isa_flags_1 = flags.val[1];
+
   if (opts->x_target_flags & MASK_GENERAL_REGS_ONLY)
 flags &= ~feature_deps::get_flags_off (AARCH64_FL_FP);
-  opts->x_aarch64_isa_flags_0 = flags;
+
+  opts->x_aarch64_isa_flags_0 = flags.val[0];
+  opts->x_aarch64_isa_flags_1 = flags.val[1];
 }
 
 /* Implement TARGET_HANDLE_OPTION.
diff --git a/gcc/config/aarch64/aarch64-opts.h 
b/gcc/config/aarch64/aarch64-opts.h
index 
2c36bfaad19b999238601d44709c280ef987046b..80ec1a05253da62b20eebb5e491f04c6da6851e7
 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -23,14 +23,16 @@
 #define GCC_AARCH64_OPTS_H
 
 #ifndef USED_FOR_TARGET
-typedef uint64_t aarch64_isa_mode;
+#include "bbitmap.h"
 
-typedef uint64_t aarch64_feature_flags;
+typedef uint64_t aarch64_isa_mode;
 
 constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
 #define DEF_AARCH64_ISA_MODE(IDENT) + 1
 #include "aarch64-isa-modes.def"
 );
+
+typedef bbitmap<2> aarch64_feature_flags;
 #endif
 
 /* The various cores that implement AArch64.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
6310ebd72ff2af6d39a776702ef40e9399357e95..b7e330438d9be52d9c79c12d1fb811d0b6e08688
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -23,17 +23,21 @@
 #define GCC_AARCH64_H
 
 #define aarch64_get_asm_isa_flags(opts) \
-  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags_0))
+  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags_0, \
+ (opts)->x_aarch64_asm_isa_flags_1))
 #define aarch64_get_isa_flags(opts) \
-  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags_0))
+  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags_0, \
+ (opts)->x_aarch64_isa_flags_1))
 
 /* Make these flags read-only so that all uses go via
aarch64_set_asm_isa_flags.  */
 #ifdef GENERATOR_FILE
 #undef aarch64_asm_isa_flags
-#define aarch64_asm_isa_flags (aarch64_feature_flags (aarch64_asm_isa_flags_0))
+#define aarch64_asm_isa_flags (aarch64_feature_flags (aarch64_asm_isa_flags_0,\
+ aarch64_asm_isa_flags_1))
 #undef aarch64_isa_flags
-#define aarch64_isa_flags (aarch64_feature_flags (aarch64_isa_flags_0))
+#define aarch64_isa_flags (aarch64_feature_flags (aarch64_isa_flags_0, \
+ aarch64_isa_flags_1))
 #else
 #undef aarch64_asm_isa_flags
 #define aarch64_asm_isa_flags (aarch64_get_asm_isa_flags (&global_options))
@@ -167,8 +171,8 @@ enum class aarch64_feature : unsigned char {
 
 /* Define unique flags for each of the above.  */
 #define HANDLE(IDENT) \
-  constexpr auto AARCH64_FL_##IDENT \
-= aarch64_feature_flags (1) << int (aarch64_feature::IDENT);
+  constexpr auto AARCH64_FL_##IDENT ATTRIBUTE_UNUSED \
+= aarch64_feature_flags::from_index (int (aarch64_feature::IDENT));
 #define DEF_AARCH64_ISA_MODE(IDENT) HANDLE (IDENT)
 #define AARCH64_OPT_EXTENSION(A, IDENT, C, D, E, F) HANDLE (IDENT)
 #define AARCH64_ARCH(A, B, IDENT, D, E) HANDLE (IDENT)
@@ -191,7 +195,7 @@ constexpr auto AARCH64_ISA

[committed 2/2] libstdc++: Fix and for -std=gnu++14 -fconcepts [PR116070]

2024-07-24 Thread Jonathan Wakely
Tested x86_64-linux.

Pushed to trunk. Backports to follow after the 14.2 release.

-- >8 --

This questionable combination of flags causes a number of errors. The
ones in the rvalue stream overloads need to be fixed in the gcc-14
branch so I'm committing it separately to simplify backporting.

libstdc++-v3/ChangeLog:

PR libstdc++/116070
* include/std/istream: Check feature test macro before using
is_class_v and is_same_v.
* include/std/ostream: Likewise.
---
 libstdc++-v3/include/std/istream | 2 +-
 libstdc++-v3/include/std/ostream | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/istream b/libstdc++-v3/include/std/istream
index 11d51d3e666..a2b207dae78 100644
--- a/libstdc++-v3/include/std/istream
+++ b/libstdc++-v3/include/std/istream
@@ -1069,7 +1069,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // 2328. Rvalue stream extraction should use perfect forwarding
   // 1203. More useful rvalue stream insertion
 
-#if __cpp_concepts >= 201907L
+#if __cpp_concepts >= 201907L && __glibcxx_type_trait_variable_templates
   template<typename _Is, typename _Tp>
 requires __derived_from_ios_base<_Is>
   && requires (_Is& __is, _Tp&& __t) { __is >> std::forward<_Tp>(__t); }
diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream
index 8a21758d0a3..12be6c4fd17 100644
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -768,7 +768,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 1203. More useful rvalue stream insertion
 
-#if __cpp_concepts >= 201907L
+#if __cpp_concepts >= 201907L && __glibcxx_type_trait_variable_templates
   // Use concepts if possible because they're cheaper to evaluate.
  template<typename _Tp>
 concept __derived_from_ios_base = is_class_v<_Tp>
-- 
2.45.2



[committed 1/2] libstdc++: Fix std::vector for -std=gnu++14 -fconcepts [PR116070]

2024-07-24 Thread Jonathan Wakely
Tested x86_64-linux.

Pushed to trunk. Backports to follow after the 14.2 release.

-- >8 --

This questionable combination of flags causes a number of errors. This
one in std::vector needs to be fixed in the gcc-13 branch so I'm
committing it separately to simplify backporting.

libstdc++-v3/ChangeLog:

PR libstdc++/116070
* include/bits/stl_bvector.h: Check feature test macro before
using is_default_constructible_v.
---
 libstdc++-v3/include/bits/stl_bvector.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/stl_bvector.h 
b/libstdc++-v3/include/bits/stl_bvector.h
index 245e1c3b3a7..c45b7ff3320 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -593,7 +593,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
_GLIBCXX20_CONSTEXPR
_Bvector_impl() _GLIBCXX_NOEXCEPT_IF(
  is_nothrow_default_constructible<_Bit_alloc_type>::value)
-#if __cpp_concepts
+#if __cpp_concepts && __glibcxx_type_trait_variable_templates
requires is_default_constructible_v<_Bit_alloc_type>
 #endif
: _Bit_alloc_type()
-- 
2.45.2



[committed] testsuite: Fix up pr116034.c test for big/pdp endian [PR116061]

2024-07-24 Thread Jakub Jelinek
Hi!

Didn't notice the memmove is into an int variable, so the test
was still failing on big endian.

Committed to trunk as obvious.

2024-07-24  Jakub Jelinek  

PR tree-optimization/116034
PR testsuite/116061
* gcc.dg/pr116034.c (g): Change type from int to unsigned short.
(foo): Guard memmove call on __SIZEOF_SHORT__ == 2.

--- gcc/testsuite/gcc.dg/pr116034.c.jj  2024-07-23 10:50:10.878953531 +0200
+++ gcc/testsuite/gcc.dg/pr116034.c 2024-07-24 17:57:37.377829853 +0200
@@ -2,12 +2,13 @@
 /* { dg-do run } */
 /* { dg-options "-O1 -fno-strict-aliasing" } */
 
-int g;
+unsigned short int g;
 
 static inline int
 foo (_Complex unsigned short c)
 {
-  __builtin_memmove (&g, 1 + (char *) &c, 2);
+  if (__SIZEOF_SHORT__ == 2)
+__builtin_memmove (&g, 1 + (char *) &c, 2);
   return g;
 }
 

Jakub



Re: [PATCH v2] RISC-V: Add basic support for the Zacas extension

2024-07-24 Thread Patrick O'Neill

On 7/23/24 19:48, Kito Cheng wrote:

I incline do not add skip_zacas stuffs (although skip_zabha is already
there but that's fine), because that's different situation compare to
the zaamo/zalrsc, zaamo/zalrsc should automatically append if a
extension is available, which is new behavior and new extensions.

But zacas is only added when users explicitly add that in -march
string unlike zaamo/zalrsc, so I am not sure if we need to check the
binutils support and drop that if unsupported,

My biggest concern is : should we do so for every new extension?

I think we didn't do that so far, so we should
I included it since binutils only recently added support, but you make a
good point.  Failing explicitly for new extensions on outdated binutils
makes sense to me.

If there aren't any objections I'll remove it from v3 and send a separate
patch to remove zabha, so it's clear we only handle breaking changes to
existing extensions.




Re: [PATCH] SVE Intrinsics: Change return type of redirect_call to gcall.

2024-07-24 Thread Richard Sandiford
Jennifer Schmitz  writes:
> As suggested in the review of
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657474.html,
> this patch changes the return type of gimple_folder::redirect_call from
> gimple * to gcall *. The motivation for this is that so far, most callers of
> the function had been casting the result of the function to gcall. These
> call sites were updated.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>
>   * config/aarch64/aarch64-sve-builtins.cc
>   (gimple_folder::redirect_call): Update return type.
>   * config/aarch64/aarch64-sve-builtins.h: Likewise.
>   * config/aarch64/aarch64-sve-builtins-sve2.cc (svqshl_impl::fold):
>   Remove cast to gcall.
>   (svrshl_impl::fold): Likewise.

OK, thanks.

And sorry for my hypocrisy.  It looks like I was responsible for the
existing instances of as_a <gcall *>, so I should have done this myself
when adding SVE2.

Richard

> From 54bbe7cbcbc1c26171301726ff489c9f0a730e80 Mon Sep 17 00:00:00 2001
> From: Jennifer Schmitz 
> Date: Tue, 23 Jul 2024 03:54:50 -0700
> Subject: [PATCH] SVE Intrinsics: Change return type of redirect_call to gcall.
>
> As suggested in the review of
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657474.html,
> this patch changes the return type of gimple_folder::redirect_call from
> gimple * to gcall *. The motivation for this is that so far, most callers of
> the function had been casting the result of the function to gcall. These
> call sites were updated.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>
>   * config/aarch64/aarch64-sve-builtins.cc
>   (gimple_folder::redirect_call): Update return type.
>   * config/aarch64/aarch64-sve-builtins.h: Likewise.
>   * config/aarch64/aarch64-sve-builtins-sve2.cc (svqshl_impl::fold):
>   Remove cast to gcall.
>   (svrshl_impl::fold): Likewise.
> ---
>  gcc/config/aarch64/aarch64-sve-builtins-sve2.cc | 6 +++---
>  gcc/config/aarch64/aarch64-sve-builtins.cc  | 2 +-
>  gcc/config/aarch64/aarch64-sve-builtins.h   | 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
> index 4f25cc68028..dc591551682 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
> @@ -349,7 +349,7 @@ public:
>   instance.base_name = "svlsr";
>   instance.base = functions::svlsr;
> }
> - gcall *call = as_a  (f.redirect_call (instance));
> + gcall *call = f.redirect_call (instance);
>   gimple_call_set_arg (call, 2, amount);
>   return call;
> }
> @@ -379,7 +379,7 @@ public:
>   function_instance instance ("svlsl", functions::svlsl,
>   shapes::binary_uint_opt_n, MODE_n,
>   f.type_suffix_ids, GROUP_none, f.pred);
> - gcall *call = as_a  (f.redirect_call (instance));
> + gcall *call = f.redirect_call (instance);
>   gimple_call_set_arg (call, 2, amount);
>   return call;
> }
> @@ -392,7 +392,7 @@ public:
>   function_instance instance ("svrshr", functions::svrshr,
>   shapes::shift_right_imm, MODE_n,
>   f.type_suffix_ids, GROUP_none, f.pred);
> - gcall *call = as_a  (f.redirect_call (instance));
> + gcall *call = f.redirect_call (instance);
>   gimple_call_set_arg (call, 2, amount);
>   return call;
> }
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index f3983a123e3..0a560eaedca 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3592,7 +3592,7 @@ gimple_folder::load_store_cookie (tree type)
>  }
>  
>  /* Fold the call to a call to INSTANCE, with the same arguments.  */
> -gimple *
> +gcall *
>  gimple_folder::redirect_call (const function_instance &instance)
>  {
>registered_function *rfn
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h 
> b/gcc/config/aarch64/aarch64-sve-builtins.h
> index 9cc07d5fa3d..9ab6f202c30 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -629,7 +629,7 @@ public:
>tree fold_contiguous_base (gimple_seq &, tree);
>tree load_store_cookie (tree);
>  
> -  gimple *redirect_call (const function_instance &);
> +  gcall *redirect_call (const function_instance &);
>gimple *redirect_pred_x ();
>  
>gimple *fold_to_cstu (poly_uint64);


Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128 variants

2024-07-24 Thread Segher Boessenkool
Hi!

So much manual stuff needed, sigh.

On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote:
> gcc/ChangeLog:
>     * config/rs6000/altivec.md (vsdb_): Change
>     define_insn iterator to VEC_IC.

From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name).

Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing
at least?  Just something like "ISA 3.1 added 128-bit things" or
whatever, but don't leave the reader second-guessing, a reader will
often guess wrong :-)

> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test
>     file.

Please don't line-wrap where not wanted.  Changelog lines are 80
character positions wide.  (Or 79 if you want, but heh).

> +The above instances are extension of the exiting overloaded built-ins

(existing)

> a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c 
> b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
> new file mode 100644
> index 000..bb90f489149
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
> @@ -0,0 +1,349 @@
> +/* { dg-do run  { target { int128 } && { power10_hw } } } */

Everything power10 is int128 always.

> +/* { dg-do link { target { ! power10_hw } } } */
> +/* { dg-require-effective-target power10_ok } */

So this is enough always.

Often we have two testcases, one for run, one for compiling only.  It's
a bit simpler and cleaner.

> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */

Why the -save-temps?  Always document it if you want that for something,
but never put it in the testcase if not.  A leftover from development?

Okay for trunk, thank you!  Well Peter had some comments too, modulo
those I guess, I'll read them now ;-)


Segher


ping Re: [patch, libgfortran] PR105361 Incorrect end-of-file condition for derived-type I/O

2024-07-24 Thread Jerry D

On 7/22/24 8:13 AM, Jerry D wrote:

Hi all,

The attached patch fixes this by not looking for the EOF condition in the
parent READ after returning from the child I/O procedure.


I could not think of a simple test case yet since the problem occurred 
only when redirecting the input to the test program via a pipe.  If I 
have some more time today I will try to come up with something.


OK for mainline?

Jerry

commit e6fa9d84cf126630c9ea744aabec6d7715087310 (HEAD -> master)
Author: Jerry DeLisle 
Date:   Sun Jul 21 19:19:00 2024 -0700

     Fortran: Suppress wrong End Of File error with user defined IO.

     libgfortran/ChangeLog:
     PR libfortran/105361
     * io/list_read.c (finish_list_read): Add a condition check for
     a user defined derived type IO operation to avoid calling the
     EOF error.


I failed to mention that this patch regression tests OK on x86_64.

I also developed the attached test case. This does reproduce the error.
I will update the log entry to reflect this test case.

OK for mainline?

! { dg-do run }

module x
  implicit none
  type foo
 real :: r
  end type foo
  interface read(formatted)
 module procedure read_formatted
  end interface read(formatted)
contains
  subroutine read_formatted (dtv, unit, iotype, vlist, iostat, iomsg)
class (foo), intent(inout) :: dtv
integer, intent(in) :: unit
character (len=*), intent(in) :: iotype
integer, intent(in) :: vlist(:)
integer, intent(out) :: iostat
character (len=*), intent(inout) :: iomsg
read (unit,*,iostat=iostat,iomsg=iomsg) dtv%r
!print *,dtv%r
  end subroutine read_formatted
end module x

program main
  use x
  implicit none
  type(foo) :: a, b
  real :: c, d
  open(10, access="stream") 
  write(10) "1 2" ! // NEW_LINE('A')
  close(10)
  open(10)
  read(10,*) c, d
  if ((c /= 1.0) .or. (d /= 2.0)) stop 1
  rewind(10)
  !print *, c,d
  read (10,*) a, b
  close(10, status="delete")
  if ((a%r /= 1.0) .or. (b%r /= 2.0)) stop 2
  !print *, a,b
end program main


Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128 variants

2024-07-24 Thread Segher Boessenkool
On Tue, Jul 23, 2024 at 04:26:43PM -0500, Peter Bergner wrote:
> On 7/19/24 3:04 PM, Carl Love wrote:
> > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> > index 5af9bf920a2..2a18ee44526 100644
> > --- a/gcc/config/rs6000/altivec.md
> > +++ b/gcc/config/rs6000/altivec.md
> > @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
> >  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
> > 
> >  (define_insn "vsdb_"
> > - [(set (match_operand:VI2 0 "register_operand" "=v")
> > -  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
> > -   (match_operand:VI2 2 "register_operand" "v")
> > + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
> > +  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
> > +   (match_operand:VEC_IC 2 "register_operand" "v")
> > (match_operand:QI 3 "const_0_to_12_operand" "n")]
> >VSHIFT_DBL_LR))]
> >"TARGET_POWER10"
> 
> I know the old code used the register_operand predicate for the vector
> operands, but those really should be changed to altivec_register_operand.

register_operand is just fine usually.  The "v" constraint already makes
sure things end up in a VMX (a lower VSX) register, the predicate
doesn't help here.  register_operand is shorter (and thus, preferred),
and also more likely correct if the code changes later :-)


Segher


Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128 variants

2024-07-24 Thread Peter Bergner
On 7/24/24 12:06 PM, Segher Boessenkool wrote:
> On Tue, Jul 23, 2024 at 04:26:43PM -0500, Peter Bergner wrote:
>> On 7/19/24 3:04 PM, Carl Love wrote:
>>>  (define_insn "vsdb_"
>>> - [(set (match_operand:VI2 0 "register_operand" "=v")
>>> -  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
>>> -   (match_operand:VI2 2 "register_operand" "v")
>>> + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
>>> +  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
>>> +   (match_operand:VEC_IC 2 "register_operand" "v")
>>> (match_operand:QI 3 "const_0_to_12_operand" "n")]
>>>VSHIFT_DBL_LR))]
>>>"TARGET_POWER10"
>>
>> I know the old code used the register_operand predicate for the vector
>> operands, but those really should be changed to altivec_register_operand.
> 
> register_operand is just fine usually.  The "v" constraint already makes
> sure things end up in a VMX (a lower VSX) register, the predicate
> doesn't help here.  register_operand is shorter (and thus, preferred),
> and also more likely correct if the code changes later :-)

I thought we always wanted the predicate to match the constraint being used?

Peter



Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128 variants

2024-07-24 Thread Peter Bergner


On 7/24/24 12:03 PM, Segher Boessenkool wrote:
>> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> 
> Why the -save-temps?  Always document it if you want that for something,
> but never put it in the testcase if not.  A leftover from development?

I can answer this one. :-).  Since these are dg-do run/link tests,
we need the -save-temps to keep the assembler files around for Carl's
scan-assembler tests at the end of the test case.

Peter




[committed][rtl-optimization/116037] Explicitly track if a destination was skipped in ext-dce

2024-07-24 Thread Jeff Law
So this has been in the hopper since the first bugs were reported 
against ext-dce.  I'd been holding off committing as I was finding 
other issues in terms of correctness of live computations.  There are 
still problems in that space, but I think it's time to push this chunk 
forward.  I'm marking it as 116037, but it may impact other bugs.


This patch starts explicitly tracking if set processing skipped a 
destination, which can happen for wide modes (TI+), vectors, certain 
subregs, etc.  This is computed during ext_dce_set_processing.


During use processing we use that flag to determine reliably if we need 
to make the inputs fully live and to avoid even trying to eliminate an 
extension if we skipped output processing.


While testing this I found that a recent change to fix cases where we 
had two subreg input operands mucked up the code to make things like a 
shift/rotate count fully live.  So that goof has been fixed.


Bootstrapped and regression tested on x86.  Most, but not all, of these 
changes have also been tested on the crosses.  Pushing to the trunk.


I'm not including it in this patch but I'm poking at converting this 
code to use note_uses/note_stores to make it more maintainable.  The 
SUBREG and STRICT_LOW_PART handling of note_stores is problematical, but 
I think it's solvable.  I haven't tried a conversion to note_uses yet.


Jeff

commit 679086172b84be18c55fdbb9cda7e97806e7c083
Author: Jeff Law 
Date:   Wed Jul 24 11:16:26 2024 -0600

[rtl-optimization/116037] Explicitly track if a destination was skipped
in ext-dce

So this has been in the hopper since the first bugs were reported
against ext-dce.  I'd been holding off committing as I was finding
other issues in terms of correctness of live computations.  There are
still problems in that space, but I think it's time to push this chunk
forward.  I'm marking it as 116037, but it may impact other bugs.

This patch starts explicitly tracking if set processing skipped a
destination, which can happen for wide modes (TI+), vectors, certain
subregs, etc.  This is computed during ext_dce_set_processing.

During use processing we use that flag to determine reliably if we
need to make the inputs fully live and to avoid even trying to
eliminate an extension if we skipped output processing.

While testing this I found that a recent change to fix cases where we
had two subreg input operands mucked up the code to make things like a
shift/rotate count fully live.  So that goof has been fixed.

Bootstrapped and regression tested on x86.  Most, but not all, of
these changes have also been tested on the crosses.  Pushing to the
trunk.

I'm not including it in this patch but I'm poking at converting this
code to use note_uses/note_stores to make it more maintainable.  The
SUBREG and STRICT_LOW_PART handling of note_stores is problematical,
but I think it's solvable.  I haven't tried a conversion to note_uses
yet.

PR rtl-optimization/116037
gcc/
* ext-dce.cc (ext_dce_process_sets): Note if we ever skip a dest
and return that info explicitly.
(ext_dce_process_uses): If a set was skipped, then consider all bits
in every input as live.  Do not try to optimize away an extension if
we skipped processing a destination in the same insn.  Restore code
to make shift/rotate count fully live.
(ext_dce_process_bb): Handle API changes for ext_dce_process_sets.

gcc/testsuite/
* gcc.dg/torture/pr116037.c: New test.

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index c56dfb505b8..c94d1fc3414 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -181,9 +181,11 @@ safe_for_live_propagation (rtx_code code)
within an object) are set by INSN, the more aggressive the
optimization phase during use handling will be.  */
 
-static void
+static bool
 ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap live_tmp)
 {
+  bool skipped_dest = false;
+
   subrtx_iterator::array_type array;
   FOR_EACH_SUBRTX (iter, array, obj, NONCONST)
 {
@@ -210,6 +212,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  /* Skip the subrtxs of this destination.  There is
 little value in iterating into the subobjects, so
 just skip them for a bit of efficiency.  */
+ skipped_dest = true;
  iter.skip_subrtxes ();
  continue;
}
@@ -241,6 +244,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  /* Skip the subrtxs of the STRICT_LOW_PART.  We can't
 process them because it'll set objects as no longer
 live when they are in fact still live.  */
+ skipped_dest = true;
  iter.skip_subrtxes ();
  continue;
}
@@ -2

[PATCH] c++: Mostly concepts related formatting fixes

2024-07-24 Thread Jakub Jelinek
Hi!

When playing with P2963R3, while reading and/or modifying code I've fixed
various comment or code formatting issues (and in 3 spots also comment
wording), but including that in the WIP P2963R3 patch made that patch
totally unreadable because these changes were 4 times the size of the
actual code changes.

So, here it is separated to a pure formatting + comment wording patch.
Is that something we want, just parts of it, or throw away?

Shouldn't change anything on the generated code...

2024-07-24  Jakub Jelinek  

* constraint.cc (subst_info::quiet, subst_info::noisy): Formatting
fixes.
(known_non_bool_p): Comment formatting fixes.
(unpack_concept_check): Likewise.
(resolve_function_concept_overload): Likewise.
(resolve_function_concept_check): Likewise.
(resolve_concept_check): Likewise.
(deduce_constrained_parameter): Likewise.
(finish_type_constraints): Likewise.
(get_returned_expression): Likewise.
(get_variable_initializer): Likewise.
(norm_info::update_context, norm_info::ctx_params): Formatting
fixes.
(norm_info::context): Comment formatting fixes.
(normalize_logical_operation): Likewise.  Formatting fix.
(normalize_concept_check): Comment formatting fixes.
(normalize_atom): Likewise.
(normalize_expression): Likewise.
(get_normalized_constraints_from_info): Likewise.
(get_normalized_constraints_from_decl): Likewise.  Formatting
fixes.
(atomic_constraints_identical_p): Comment formatting fixes.
(constraints_equivalent_p): Formatting fixes.
(inchash::add_constraint): Likewise.
(associate_classtype_constraints): Comment formatting fixes.
(get_constraints): Likewise.
(set_constraints): Likewise.
(build_concept_check_arguments): Likewise.
(build_function_check): Likewise.
(build_concept_check): Likewise.
(finish_shorthand_constraint): Likewise.
(get_shorthand_constraints): Likewise.
(check_constraint_variables): Likewise.
(tsubst_constraint_variables): Likewise.
(tsubst_requires_expr): Likewise.
(get_mapped_args): Likewise.  Formatting fixes.
(satisfy_atom): Comment formatting fixes.
(satisfy_constraint_r): Comment wording and formatting fixes.
(satisfy_normalized_constraints): Comment formatting fixes.
(satisfy_declaration_constraints): Likewise.
(evaluate_concept_check): Likewise.
(finish_requires_expr): Likewise.
(finish_compound_requirement): Likewise.
(check_function_concept): Likewise.
(equivalently_constrained): Likewise.
(more_constrained): Likewise.
(diagnose_atomic_constraint): Likewise.
* cp-tree.h (TREE_LANG_FLAG_0): Fix a comment error,
FOLD_EXPR_MODIFY_P instead of FOLD_EXPR_MODOP_P.
(DECL_MAIN_FREESTANDING_P, DECL_MAIN_P): Comment formatting fixes.
(enum cpp0x_warn_str): Likewise.
(enum composite_pointer_operation): Likewise.
(enum expr_list_kind): Likewise.
(enum impl_conv_rhs): Likewise.
(enum impl_conv_void): Likewise.
(struct deferred_access_check): Likewise.
(ATOMIC_CONSTR_EXPR): Likewise.
(FUNCTION_REF_QUALIFIED): Likewise.
(DECL_DEPENDENT_P): Likewise.
(FOLD_EXPR_MODIFY_P): Likewise.
(FOLD_EXPR_OP_RAW): Likewise.
(FOLD_EXPR_PACK): Likewise.
(FOLD_EXPR_INIT): Likewise.
(TYPE_WAS_UNNAMED): Likewise.
(class cp_unevaluated): Likewise.
(struct ovl_op_info_t assertion): Likewise.
(cp_declarator::function::requires_clause): Likewise.
(variable_template_p): Likewise.
(concept_definition_p): Likewise.
* logic.cc (clause::clause): Likewise.
(clause::replace): Likewise.
(clause::insert): Likewise.  Formatting fixes.
(struct formula): Comment formatting fixes.
(formula::branch): Likewise.
(debug): Formatting fixes.
(dnf_size_r): Comment formatting fixes.
(cnf_size_r): Likewise.
(dnf_size): Likewise.
(cnf_size): Likewise.
(branch_clause): Likewise.
(decompose_atom): Likewise.
(decompose_term): Likewise.  Formatting fixes.
(struct subsumption_entry): Comment formatting fixes.
(subsumption_cache): Likewise.
(save_subsumption): Likewise.  Formatting fixes.
(subsumes_constraints_nonnull): Formatting fixes.

--- gcc/cp/constraint.cc.jj 2024-07-24 15:47:15.207477692 +0200
+++ gcc/cp/constraint.cc2024-07-24 18:58:32.646330544 +0200
@@ -83,13 +83,13 @@ struct subst_info
   { }
 
   /* True if we should not diagnose errors.  */
-  bool quiet() const
+  bool quiet () const
   {
 return !(complain & tf_warning_or_error);
   }
 
   /* True if we should diagnose errors.  */
-  bool noisy() const
+  bool noisy () 

Re: [PATCH] c++: Mostly concepts related formatting fixes

2024-07-24 Thread Jason Merrill

On 7/24/24 1:33 PM, Jakub Jelinek wrote:

Hi!

When playing with P2963R3, while reading and/or modifying code I've fixed
various comment or code formatting issues (and in 3 spots also comment
wording), but including that in the WIP P2963R3 patch made that patch
totally unreadable because these changes were 4 times the size of the
actual code changes.

So, here it is separated to a pure formatting + comment wording patch.
Is that something we want, just parts of it, or throw away?

@@ -627,7 +626,7 @@ decompose_disjunction (formula& f, claus
  branch_clause (f, c, t);
  }
  
-/* An atomic constraint is already decomposed.  */
+/* An atomic or fold expanded constraint is already decomposed.  */


This hunk should be part of the P2963 patch.  Everything else is OK.

Jason



[PATCH 0/5] RISC-V: Enable stack-clash protection

2024-07-24 Thread Raphael Moreira Zinsly
Hi All,

This patch series implements stack-clash protection for RISC-V using 4K
probes as default. The non-vector implementation is based on AArch64’s
as the generated stack frame is similar.
The tests are also adapted from AArch64.


Thanks,
Raphael

Raphael Moreira Zinsly (5):
  RISC-V: Small stack tie changes
  RISC-V: Move riscv_v_adjust_scalable_frame
  RISC-V: Stack-clash protection implementation
  RISC-V: Add support to vector stack-clash protection
  RISC-V: Enable stack clash in alloca

 gcc/config/riscv/riscv.cc | 396 ++
 gcc/config/riscv/riscv.h  |  27 ++
 gcc/config/riscv/riscv.md |   2 +-
 gcc/testsuite/gcc.dg/params/blocksort-part.c  |   2 +-
 gcc/testsuite/gcc.dg/pr82788.c|   2 +-
 gcc/testsuite/gcc.dg/stack-check-6.c  |   2 +-
 gcc/testsuite/gcc.dg/stack-check-6a.c |   2 +-
 .../gcc.target/riscv/stack-check-12.c |  23 +
 .../gcc.target/riscv/stack-check-13.c |  26 ++
 .../gcc.target/riscv/stack-check-14.c |  24 ++
 .../gcc.target/riscv/stack-check-15.c |  21 +
 .../gcc.target/riscv/stack-check-alloca-1.c   |  15 +
 .../gcc.target/riscv/stack-check-alloca-10.c  |  13 +
 .../gcc.target/riscv/stack-check-alloca-2.c   |  11 +
 .../gcc.target/riscv/stack-check-alloca-3.c   |  11 +
 .../gcc.target/riscv/stack-check-alloca-4.c   |  12 +
 .../gcc.target/riscv/stack-check-alloca-5.c   |  12 +
 .../gcc.target/riscv/stack-check-alloca-6.c   |  12 +
 .../gcc.target/riscv/stack-check-alloca-7.c   |  12 +
 .../gcc.target/riscv/stack-check-alloca-8.c   |  14 +
 .../gcc.target/riscv/stack-check-alloca-9.c   |  13 +
 .../gcc.target/riscv/stack-check-alloca.h |  15 +
 .../gcc.target/riscv/stack-check-cfa-1.c  |  12 +
 .../gcc.target/riscv/stack-check-cfa-2.c  |  13 +
 .../gcc.target/riscv/stack-check-cfa-3.c  |  13 +
 .../gcc.target/riscv/stack-check-prologue-1.c |   9 +
 .../riscv/stack-check-prologue-10.c   |  11 +
 .../riscv/stack-check-prologue-11.c   |  11 +
 .../riscv/stack-check-prologue-12.c   |  15 +
 .../riscv/stack-check-prologue-13.c   |  20 +
 .../riscv/stack-check-prologue-14.c   |  24 ++
 .../riscv/stack-check-prologue-15.c   |  23 +
 .../riscv/stack-check-prologue-16.c   |  30 ++
 .../gcc.target/riscv/stack-check-prologue-2.c |  10 +
 .../gcc.target/riscv/stack-check-prologue-3.c |  11 +
 .../gcc.target/riscv/stack-check-prologue-4.c |  11 +
 .../gcc.target/riscv/stack-check-prologue-5.c |  11 +
 .../gcc.target/riscv/stack-check-prologue-6.c |  11 +
 .../gcc.target/riscv/stack-check-prologue-7.c |  11 +
 .../gcc.target/riscv/stack-check-prologue-8.c |  10 +
 .../gcc.target/riscv/stack-check-prologue-9.c |  11 +
 .../gcc.target/riscv/stack-check-prologue.h   |   5 +
 .../gcc.target/riscv/struct_vect_24.c |  47 +++
 gcc/testsuite/lib/target-supports.exp |   6 +-
 44 files changed, 912 insertions(+), 80 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-cfa-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-cfa-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-cfa-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-2.c
 crea

[PATCH 1/5] RISC-V: Small stack tie changes

2024-07-24 Thread Raphael Moreira Zinsly
Allow the register used by riscv_emit_stack_tie () to be passed as
an argument so we can tie the stack with registers other than
hard_frame_pointer_rtx.
Also don't allow operand 1 of stack_tie to be optimized to sp,
in preparation for the stack-clash protection support.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_emit_stack_tie): Pass the
  register to be tied to the stack pointer as argument.
* config/riscv/riscv.md (stack_tie): Don't match equal
  operands.
---
 gcc/config/riscv/riscv.cc | 18 +-
 gcc/config/riscv/riscv.md |  2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 19b9b2daa95..f85d018c514 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7891,12 +7891,12 @@ riscv_adjust_multi_push_cfi_prologue (int saved_size)
 }
 
 static void
-riscv_emit_stack_tie (void)
+riscv_emit_stack_tie (rtx reg)
 {
   if (Pmode == SImode)
-emit_insn (gen_stack_tiesi (stack_pointer_rtx, hard_frame_pointer_rtx));
+emit_insn (gen_stack_tiesi (stack_pointer_rtx, reg));
   else
-emit_insn (gen_stack_tiedi (stack_pointer_rtx, hard_frame_pointer_rtx));
+emit_insn (gen_stack_tiedi (stack_pointer_rtx, reg));
 }
 
 /*zcmp multi push and pop code_for_push_pop function ptr array  */
@@ -8077,7 +8077,7 @@ riscv_expand_prologue (void)
GEN_INT ((frame->hard_frame_pointer_offset - 
remaining_size).to_constant ()));
   RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
 
-  riscv_emit_stack_tie ();
+  riscv_emit_stack_tie (hard_frame_pointer_rtx);
 }
 
   /* Save the V registers.  */
@@ -8108,7 +8108,7 @@ riscv_expand_prologue (void)
 allocation is ordered WRT fp setup and subsequent writes
 into the frame.  */
  if (frame_pointer_needed)
-   riscv_emit_stack_tie ();
+   riscv_emit_stack_tie (hard_frame_pointer_rtx);
  return;
}
 
@@ -8147,7 +8147,7 @@ riscv_expand_prologue (void)
 allocation is ordered WRT fp setup and subsequent writes
 into the frame.  */
   if (frame_pointer_needed)
-   riscv_emit_stack_tie ();
+   riscv_emit_stack_tie (hard_frame_pointer_rtx);
 }
 }
 
@@ -8282,7 +8282,7 @@ riscv_expand_epilogue (int style)
   if (cfun->calls_alloca)
 {
   /* Emit a barrier to prevent loads from a deallocated stack.  */
-  riscv_emit_stack_tie ();
+  riscv_emit_stack_tie (hard_frame_pointer_rtx);
   need_barrier_p = false;
 
   poly_int64 adjust_offset = -frame->hard_frame_pointer_offset;
@@ -8376,7 +8376,7 @@ riscv_expand_epilogue (int style)
   if (known_gt (step1, 0))
 {
   /* Emit a barrier to prevent loads from a deallocated stack.  */
-  riscv_emit_stack_tie ();
+  riscv_emit_stack_tie (hard_frame_pointer_rtx);
   need_barrier_p = false;
 
   /* Restore the scalable frame which is assigned in prologue.  */
@@ -8476,7 +8476,7 @@ riscv_expand_epilogue (int style)
 frame->mask = mask; /* Undo the above fib.  */
 
   if (need_barrier_p)
-riscv_emit_stack_tie ();
+riscv_emit_stack_tie (hard_frame_pointer_rtx);
 
   /* Deallocate the final bit of the frame.  */
   if (step2.to_constant () > 0)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 46c46039c33..5780c5abacf 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3969,7 +3969,7 @@
(unspec:BLK [(match_operand:X 0 "register_operand" "r")
 (match_operand:X 1 "register_operand" "r")]
UNSPEC_TIE))]
-  ""
+  "!rtx_equal_p (operands[0], operands[1])"
   ""
   [(set_attr "type" "ghost")
(set_attr "length" "0")]
-- 
2.42.0



[PATCH 2/5] RISC-V: Move riscv_v_adjust_scalable_frame

2024-07-24 Thread Raphael Moreira Zinsly
Move riscv_v_adjust_scalable_frame () in preparation for the stack clash
protection support.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_v_adjust_scalable_frame): Move
  closer to riscv_expand_prologue.
---
 gcc/config/riscv/riscv.cc | 62 +++
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f85d018c514..89fc8966654 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3122,37 +3122,6 @@ riscv_legitimize_poly_move (machine_mode mode, rtx dest, 
rtx tmp, rtx src)
 }
 }
 
-/* Adjust scalable frame of vector for prologue && epilogue. */
-
-static void
-riscv_v_adjust_scalable_frame (rtx target, poly_int64 offset, bool epilogue)
-{
-  rtx tmp = RISCV_PROLOGUE_TEMP (Pmode);
-  rtx adjust_size = RISCV_PROLOGUE_TEMP2 (Pmode);
-  rtx insn, dwarf, adjust_frame_rtx;
-
-  riscv_legitimize_poly_move (Pmode, adjust_size, tmp,
- gen_int_mode (offset, Pmode));
-
-  if (epilogue)
-insn = gen_add3_insn (target, target, adjust_size);
-  else
-insn = gen_sub3_insn (target, target, adjust_size);
-
-  insn = emit_insn (insn);
-
-  RTX_FRAME_RELATED_P (insn) = 1;
-
-  adjust_frame_rtx
-= gen_rtx_SET (target,
-  plus_constant (Pmode, target, epilogue ? offset : -offset));
-
-  dwarf = alloc_reg_note (REG_FRAME_RELATED_EXPR, copy_rtx (adjust_frame_rtx),
- NULL_RTX);
-
-  REG_NOTES (insn) = dwarf;
-}
-
 /* Take care below subreg const_poly_int move:
 
1. (set (subreg:DI (reg:TI 237) 8)
@@ -7928,6 +7897,37 @@ static const code_for_push_pop_t 
code_for_push_pop[ZCMP_MAX_GRP_SLOTS][ZCMP_OP_N
   code_for_gpr_multi_popret_up_to_s11,
   code_for_gpr_multi_popretz_up_to_s11}};
 
+/* Adjust scalable frame of vector for prologue && epilogue. */
+
+static void
+riscv_v_adjust_scalable_frame (rtx target, poly_int64 offset, bool epilogue)
+{
+  rtx tmp = RISCV_PROLOGUE_TEMP (Pmode);
+  rtx adjust_size = RISCV_PROLOGUE_TEMP2 (Pmode);
+  rtx insn, dwarf, adjust_frame_rtx;
+
+  riscv_legitimize_poly_move (Pmode, adjust_size, tmp,
+ gen_int_mode (offset, Pmode));
+
+  if (epilogue)
+insn = gen_add3_insn (target, target, adjust_size);
+  else
+insn = gen_sub3_insn (target, target, adjust_size);
+
+  insn = emit_insn (insn);
+
+  RTX_FRAME_RELATED_P (insn) = 1;
+
+  adjust_frame_rtx
+= gen_rtx_SET (target,
+  plus_constant (Pmode, target, epilogue ? offset : -offset));
+
+  dwarf = alloc_reg_note (REG_FRAME_RELATED_EXPR, copy_rtx (adjust_frame_rtx),
+ NULL_RTX);
+
+  REG_NOTES (insn) = dwarf;
+}
+
 static rtx
 riscv_gen_multi_push_pop_insn (riscv_zcmp_op_t op, HOST_WIDE_INT adj_size,
   unsigned int regs_num)
-- 
2.42.0



[PATCH 4/5] RISC-V: Add support to vector stack-clash protection

2024-07-24 Thread Raphael Moreira Zinsly
Add basic support for vector stack-clash protection, using a loop to
do the probing and stack adjustments.

gcc/ChangeLog:
* config/riscv/riscv.cc
(riscv_allocate_and_probe_stack_loop): New function.
(riscv_v_adjust_scalable_frame): Add stack-clash protection
support.
(riscv_allocate_and_probe_stack_space): Move the probe loop
implementation to riscv_allocate_and_probe_stack_loop.
* config/riscv/riscv.h: Define RISCV_STACK_CLASH_VECTOR_CFA_REGNUM.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/stack-check-cfa-3.c: New test.
* gcc.target/riscv/stack-check-prologue-16.c: New test.
* gcc.target/riscv/struct_vect_24.c: New test.
---
 gcc/config/riscv/riscv.cc | 99 +++
 gcc/config/riscv/riscv.h  |  2 +
 .../gcc.target/riscv/stack-check-cfa-3.c  | 13 +++
 .../riscv/stack-check-prologue-16.c   | 30 ++
 .../gcc.target/riscv/struct_vect_24.c | 47 +
 5 files changed, 170 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-cfa-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/struct_vect_24.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 292d190f319..69c0e07f4c5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7897,6 +7897,35 @@ static const code_for_push_pop_t code_for_push_pop[ZCMP_MAX_GRP_SLOTS][ZCMP_OP_N
   code_for_gpr_multi_popret_up_to_s11,
   code_for_gpr_multi_popretz_up_to_s11}};
 
+/*  Set a probe loop for stack clash protection.  */
+static void
+riscv_allocate_and_probe_stack_loop (rtx tmp, enum rtx_code code,
+rtx op0, rtx op1, bool vector,
+HOST_WIDE_INT offset)
+{
+  tmp = riscv_force_temporary (tmp, gen_int_mode (offset, Pmode));
+
+  /* Loop.  */
+  rtx label = gen_label_rtx ();
+  emit_label (label);
+
+  /* Allocate and probe stack.  */
+  emit_insn (gen_sub3_insn (stack_pointer_rtx, stack_pointer_rtx, tmp));
+  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
+   STACK_CLASH_CALLER_GUARD));
+  emit_insn (gen_blockage ());
+
+  /* Adjust the remaining vector length.  */
+  if (vector)
+emit_insn (gen_sub3_insn (op0, op0, tmp));
+
+  /* Branch if there's still more bytes to probe.  */
+  riscv_expand_conditional_branch (label, code, op0, op1);
+  JUMP_LABEL (get_last_insn ()) = label;
+
+  emit_insn (gen_blockage ());
+}
+
 /* Adjust scalable frame of vector for prologue && epilogue. */
 
 static void
@@ -7909,6 +7938,49 @@ riscv_v_adjust_scalable_frame (rtx target, poly_int64 offset, bool epilogue)
   riscv_legitimize_poly_move (Pmode, adjust_size, tmp,
  gen_int_mode (offset, Pmode));
 
+  /* If doing stack clash protection then we use a loop to allocate and probe
+ the stack.  */
+  if (flag_stack_clash_protection && !epilogue)
+{
+  HOST_WIDE_INT min_probe_threshold
+   = (1 << param_stack_clash_protection_guard_size) - STACK_CLASH_CALLER_GUARD;
+
+  if (!frame_pointer_needed)
+   {
+ /* This is done to provide unwinding information for the stack
+adjustments we're about to do, however to prevent the optimizers
+from removing the S0 move and leaving the CFA note (which would be
+very wrong) we tie the old and new stack pointer together.
+The tie will expand to nothing but the optimizers will not touch
+the instruction.  */
+ insn = get_last_insn ();
+ rtx stack_ptr_copy = gen_rtx_REG (Pmode, RISCV_STACK_CLASH_VECTOR_CFA_REGNUM);
+ emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
+ riscv_emit_stack_tie (stack_ptr_copy);
+
+ /* We want the CFA independent of the stack pointer for the
+duration of the loop.  */
+ add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
+ RTX_FRAME_RELATED_P (insn) = 1;
+   }
+
+  riscv_allocate_and_probe_stack_loop (tmp, GE, adjust_size, tmp, true,
+  min_probe_threshold);
+
+  /* Allocate the residual.  */
+  insn = emit_insn (gen_sub3_insn (target, target, adjust_size));
+
+  /* Now reset the CFA register if needed.  */
+  if (!frame_pointer_needed)
+   {
+ add_reg_note (insn, REG_CFA_DEF_CFA,
+   plus_constant (Pmode, stack_pointer_rtx, -offset));
+ RTX_FRAME_RELATED_P (insn) = 1;
+   }
+
+  return;
+}
+
   if (epilogue)
 insn = gen_add3_insn (target, target, adjust_size);
   else
@@ -8056,8 +8128,9 @@ riscv_allocate_and_probe_stack_space (rtx temp1, HOST_WIDE_INT size)
   else
 {
   /* Compute the ending address.  */
-  temp1 = riscv_force_temporary (temp1, gen_int_mode (rounded_size, Pmode));
- 

[PATCH 5/5] RISC-V: Enable stack clash in alloca

2024-07-24 Thread Raphael Moreira Zinsly
Add the TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE to riscv in
order to enable stack clash protection when using alloca.
The code and tests are the same used by aarch64.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_compute_frame_info): Update
  outgoing args size.
  (riscv_stack_clash_protection_alloca_probe_range): New.
  (TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE): New.
* config/riscv/riscv.h
  (STACK_CLASH_MIN_BYTES_OUTGOING_ARGS): New.
  (STACK_DYNAMIC_OFFSET): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/stack-check-14.c: New test.
* gcc.target/riscv/stack-check-15.c: New test.
* gcc.target/riscv/stack-check-alloca-1.c: New test.
* gcc.target/riscv/stack-check-alloca-2.c: New test.
* gcc.target/riscv/stack-check-alloca-3.c: New test.
* gcc.target/riscv/stack-check-alloca-4.c: New test.
* gcc.target/riscv/stack-check-alloca-5.c: New test.
* gcc.target/riscv/stack-check-alloca-6.c: New test.
* gcc.target/riscv/stack-check-alloca-7.c: New test.
* gcc.target/riscv/stack-check-alloca-8.c: New test.
* gcc.target/riscv/stack-check-alloca-9.c: New test.
* gcc.target/riscv/stack-check-alloca-10.c: New test.
* gcc.target/riscv/stack-check-alloca.h: New.
---
 gcc/config/riscv/riscv.cc | 17 +
 gcc/config/riscv/riscv.h  | 17 +
 .../gcc.target/riscv/stack-check-14.c | 24 +++
 .../gcc.target/riscv/stack-check-15.c | 21 
 .../gcc.target/riscv/stack-check-alloca-1.c   | 15 
 .../gcc.target/riscv/stack-check-alloca-10.c  | 13 ++
 .../gcc.target/riscv/stack-check-alloca-2.c   | 11 +
 .../gcc.target/riscv/stack-check-alloca-3.c   | 11 +
 .../gcc.target/riscv/stack-check-alloca-4.c   | 12 ++
 .../gcc.target/riscv/stack-check-alloca-5.c   | 12 ++
 .../gcc.target/riscv/stack-check-alloca-6.c   | 12 ++
 .../gcc.target/riscv/stack-check-alloca-7.c   | 12 ++
 .../gcc.target/riscv/stack-check-alloca-8.c   | 14 +++
 .../gcc.target/riscv/stack-check-alloca-9.c   | 13 ++
 .../gcc.target/riscv/stack-check-alloca.h | 15 
 15 files changed, 219 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-alloca.h

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 69c0e07f4c5..a110e011766 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7245,6 +7245,10 @@ riscv_compute_frame_info (void)
 
   frame = &cfun->machine->frame;
 
+  /* Adjust the outgoing arguments size if required.  Keep it in sync with what
+ the mid-end is doing.  */
+  crtl->outgoing_args_size = STACK_DYNAMIC_OFFSET (cfun);
+
   /* In an interrupt function, there are two cases in which t0 needs to be used:
     1, If we have a large frame, then we need to save/restore t0.  We check for
     this before clearing the frame struct.
@@ -11879,6 +11883,15 @@ riscv_c_mode_for_floating_type (enum tree_index ti)
   return default_mode_for_floating_type (ti);
 }
 
+/* On riscv we have an ABI defined safe buffer.  This constant is used to
+   determine the probe offset for alloca.  */
+
+static HOST_WIDE_INT
+riscv_stack_clash_protection_alloca_probe_range (void)
+{
+  return STACK_CLASH_CALLER_GUARD;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -12187,6 +12200,10 @@ riscv_c_mode_for_floating_type (enum tree_index ti)
 #define TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT \
   riscv_vectorize_preferred_vector_alignment
 
+#undef TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE
+#define TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE \
+  riscv_stack_clash_protection_alloca_probe_range
+
 /* Mode switching hooks.  */
 
 #undef TARGET_MODE_EMIT
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 0432beb81e0..7f20190e960 100644
--- a/gcc/config/riscv/riscv.h
+++

[PATCH 3/5] RISC-V: Stack-clash protection implementation

2024-07-24 Thread Raphael Moreira Zinsly
This implements stack-clash protection for RISC-V, with
riscv_allocate_and_probe_stack_space being based on
aarch64_allocate_and_probe_stack_space from aarch64's implementation.
We enforce the probing interval and the guard size to always be equal;
their default value is 4KB, which is the RISC-V page size.

We also probe up by 1024 bytes in the general case when a probe is required.

gcc/ChangeLog:
* config/riscv/riscv.cc
(riscv_option_override): Enforce that interval is the same size as
guard size.
(riscv_allocate_and_probe_stack_space): New function.
(riscv_expand_prologue): Call riscv_allocate_and_probe_stack_space
to the final allocation of the stack and add stack-clash dump
information.
* config/riscv/riscv.h: Define STACK_CLASH_CALLER_GUARD and
STACK_CLASH_MAX_UNROLL_PAGES.

gcc/testsuite/ChangeLog:
* gcc.dg/params/blocksort-part.c: Skip riscv for
stack-clash protection intervals.
* gcc.dg/pr82788.c: Skip riscv.
* gcc.dg/stack-check-6.c: Skip residual check for riscv.
* gcc.dg/stack-check-6a.c: Skip riscv.
* gcc.target/riscv/stack-check-12.c: New test.
* gcc.target/riscv/stack-check-13.c: New test.
* gcc.target/riscv/stack-check-cfa-1.c: New test.
* gcc.target/riscv/stack-check-cfa-2.c: New test.
* gcc.target/riscv/stack-check-prologue-1.c: New test.
* gcc.target/riscv/stack-check-prologue-10.c: New test.
* gcc.target/riscv/stack-check-prologue-11.c: New test.
* gcc.target/riscv/stack-check-prologue-12.c: New test.
* gcc.target/riscv/stack-check-prologue-13.c: New test.
* gcc.target/riscv/stack-check-prologue-14.c: New test.
* gcc.target/riscv/stack-check-prologue-15.c: New test.
* gcc.target/riscv/stack-check-prologue-2.c: New test.
* gcc.target/riscv/stack-check-prologue-3.c: New test.
* gcc.target/riscv/stack-check-prologue-4.c: New test.
* gcc.target/riscv/stack-check-prologue-5.c: New test.
* gcc.target/riscv/stack-check-prologue-6.c: New test.
* gcc.target/riscv/stack-check-prologue-7.c: New test.
* gcc.target/riscv/stack-check-prologue-8.c: New test.
* gcc.target/riscv/stack-check-prologue-9.c: New test.
* gcc.target/riscv/stack-check-prologue.h: New file.
* lib/target-supports.exp
(check_effective_target_supports_stack_clash_protection):
Add riscv.
(check_effective_target_caller_implicit_probes): Likewise.
---
 gcc/config/riscv/riscv.cc | 244 +++---
 gcc/config/riscv/riscv.h  |   8 +
 gcc/testsuite/gcc.dg/params/blocksort-part.c  |   2 +-
 gcc/testsuite/gcc.dg/pr82788.c|   2 +-
 gcc/testsuite/gcc.dg/stack-check-6.c  |   2 +-
 gcc/testsuite/gcc.dg/stack-check-6a.c |   2 +-
 .../gcc.target/riscv/stack-check-12.c |  23 ++
 .../gcc.target/riscv/stack-check-13.c |  26 ++
 .../gcc.target/riscv/stack-check-cfa-1.c  |  12 +
 .../gcc.target/riscv/stack-check-cfa-2.c  |  13 +
 .../gcc.target/riscv/stack-check-prologue-1.c |   9 +
 .../riscv/stack-check-prologue-10.c   |  11 +
 .../riscv/stack-check-prologue-11.c   |  11 +
 .../riscv/stack-check-prologue-12.c   |  15 ++
 .../riscv/stack-check-prologue-13.c   |  20 ++
 .../riscv/stack-check-prologue-14.c   |  24 ++
 .../riscv/stack-check-prologue-15.c   |  23 ++
 .../gcc.target/riscv/stack-check-prologue-2.c |  10 +
 .../gcc.target/riscv/stack-check-prologue-3.c |  11 +
 .../gcc.target/riscv/stack-check-prologue-4.c |  11 +
 .../gcc.target/riscv/stack-check-prologue-5.c |  11 +
 .../gcc.target/riscv/stack-check-prologue-6.c |  11 +
 .../gcc.target/riscv/stack-check-prologue-7.c |  11 +
 .../gcc.target/riscv/stack-check-prologue-8.c |  10 +
 .../gcc.target/riscv/stack-check-prologue-9.c |  11 +
 .../gcc.target/riscv/stack-check-prologue.h   |   5 +
 gcc/testsuite/lib/target-supports.exp |   6 +-
 27 files changed, 504 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-cfa-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-cfa-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack-check-prologue-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/stac

Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-24 Thread Segher Boessenkool
On Wed, Jul 24, 2024 at 12:12:05PM -0500, Peter Bergner wrote:
> On 7/24/24 12:06 PM, Segher Boessenkool wrote:
> I thought we always wanted the predicate to match the constraint being used?

Predicates and constraints have different purposes, and are used at
different times (typically).  Everything before RA is predicates
only, and RA and everything after it use constraints (as well).

register_operand says it has to be a register.  It allows any
pseudo-register, so before RA, there is no real difference between
register_operand and altivec_register_operand (which allows all pseudos
as well).

The constraint should not demand things that weren't clear earlier,
because that will then cause reloading eventually, often with less
efficient code.  It still will *work* though.

But that is not the case here :-)


Segher


Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-24 Thread Segher Boessenkool
On Wed, Jul 24, 2024 at 12:16:33PM -0500, Peter Bergner wrote:
> 
> On 7/24/24 12:03 PM, Segher Boessenkool wrote:
> >> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> > 
> > Why the -save-temps?  Always document it if you want that for something,
> > but never put it in the testcase if not.  A leftover from development?
> 
> I can answer this one. :-).  Since these are dg-do run/link tests,
> we need the -save-temps to keep the assembler files around for Carl's
> scan-assembler tests at the end of the test case.

Ah!  Gotcha.  Please add a comment then?  Just something trivial, a
short line is enough :-)


Segher


Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-24 Thread Carl Love

Segher:

Thanks for the review, a few questions...

On 7/24/24 10:03 AM, Segher Boessenkool wrote:

Hi!

So much manual stuff needed, sigh.

On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote:

gcc/ChangeLog:
     * config/rs6000/altivec.md (vsdb_): Change
     define_insn iterator to VEC_IC.

 From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name).

Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing
at least?  Just something like "ISA 3.1 added 128-bit things" or
whatever, but don't leave the reader second-guessing, a reader will
often guess wrong :-)


I don't disagree that the reader will guess wrong, probably after being 
frustrated that it isn't obvious.  :-)
The VEC_IC was an existing definition; this patch does not add it.  Your 
comment seems to imply you want a comment on the definition of VEC_IC 
in vector.md?  I could add one to the existing definition if you like, 
but it seems outside the scope of this patch.


The change log entry could be improved to say "Change define_insn 
iterator to VEC_IC which included the V1TI type added in ISA 3.1." Would 
that address your concerns?



gcc/testsuite/ChangeLog:
     * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test
     file.

Please don't line-wrap where not wanted.  Changelog lines are 80
character positions wide.  (Or 79 if you want, but heh).


Yea, it does look like file will just fit on the same line.  Fixed.



+The above instances are extension of the exiting overloaded built-ins

(existing)

Fixed spelling error.




a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
new file mode 100644
index 000..bb90f489149
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
@@ -0,0 +1,349 @@
+/* { dg-do run  { target { int128 } && { power10_hw } } } */

Everything power10 is int128 always.


OK, so don't need the power10_hw.  Changed to just int128 for the target:

    /* { dg-do run  target int128  } */

+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-require-effective-target power10_ok } */

So this is enough always.

Often we have two testcases, one for run, one for compiling only.  It's
a bit simpler and cleaner.


Sounds like you would prefer to have a run and a compile test file? I 
will create a new file vec-shift-double-int128.c  consisting of a series 
of functions to test each built-in definition.



+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */

Why the -save-temps?  Always document it if you want that for something,
but never put it in the testcase if not.  A leftover from development?

Okay for trunk, thank you!  Well Peter had some comments too, modulo
those I guess, I'll read them now ;-)
So as Peter said, the save-temps is because the runnable case file also 
checks for assembler times at the end of the file.


I will move the scan-assembler-times checks to the new compile only test.

    Carl


Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-24 Thread Segher Boessenkool
Hi!

On Wed, Jul 24, 2024 at 11:38:11AM -0700, Carl Love wrote:
> On 7/24/24 10:03 AM, Segher Boessenkool wrote:
> >So much manual stuff needed, sigh.
> >
> >On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote:
> >>gcc/ChangeLog:
> >>     * config/rs6000/altivec.md (vsdb_): Change
> >>     define_insn iterator to VEC_IC.
> > From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name).
> >
> >Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing
> >at least?  Just something like "ISA 3.1 added 128-bit things" or
> >whatever, but don't leave the reader second-guessing, a reader will
> >often guess wrong :-)
> 
> I don't disagree that the reader will guess wrong, probably after being 
> frustated that it isn't obvious.  :-)
> The VEC_IC was an existing definition, this patch does not add it.  Your 
> comments seems to imply you want a comment on the definition for VEC_IC 
> in vector.md?  I could add one to the existing definition if you like 
> but it seems outside the scope of this patch.

Yes, I'm just lamenting the state of things :-)

It would have to be a separate patch, yes.  A trivial patch to add such
a comment is pre-approved :-)

> The change log entry could be improved to say "Change define_insn 
> iterator to VEC_IC which included the V1TI type added in ISA 3.1." Would 
> that address your concerns?

The current changelog is fine.  Changelogs never can replace comments in
the code.

> >>+/* { dg-do run  { target { int128 } && { power10_hw } } } */
> >Everything power10 is int128 always.
> 
> OK, so don't need the power10_hw.  Changed to just int128 for the target:

No, the other way around: you cannot run the code on machines without
these (ISA 3.1) instructions!

But p10 always satisfies the int128 predicate.  Although, hrm, how
about -m32 :-)

>     /* { dg-do run  target int128  } */
> >>+/* { dg-do link { target { ! power10_hw } } } */
> >>+/* { dg-require-effective-target power10_ok } */
> >So this is enough always.
> >
> >Often we have two testcases, one for run, one for compiling only.  It's
> >a bit simpler and cleaner.
> 
> Sounds like you would prefer to have a run and a compile test file? I 
> will create a new file vec-shift-double-int128.c  consisting of a series 
> of functions to test each built-in definition.

No, I don't prefer that, but it is easier to handle (also for you).
That that results in a bit more files, who cares, I don't anyway :-)

> >>+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> >Why the -save-temps?  Always document it if you want that for something,
> >but never put it in the testcase if not.  A leftover from development?
> >
> >Okay for trunk, thank you!  Well Peter had some comments too, modulo
> >those I guess, I'll read them now ;-)
> So as Peter said, the save-temps is because the runnable case file also 
> checks for assembler times at the end of the file.

Yup.  A comment would help :-)


Segher


Re: [PATCH v3 2/2] Prevent divide-by-zero

2024-07-24 Thread Edwin Lu



On 7/24/2024 3:52 AM, Richard Biener wrote:

On Wed, Jul 24, 2024 at 1:31 AM Edwin Lu  wrote:


On 7/23/2024 11:20 AM, Richard Sandiford wrote:

Edwin Lu  writes:

On 7/23/2024 4:56 AM, Richard Biener wrote:

On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu  wrote:

Hi Richard,

On 5/31/2024 1:48 AM, Richard Biener wrote:

On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill  wrote:

From: Greg McGary 

Still a NACK.  If remain ends up zero then

/* Try to use a single smaller load when we are about
   to load excess elements compared to the unrolled
   scalar loop.  */
if (known_gt ((vec_num * j + i + 1) * nunits,
   (group_size * vf - gap)))
  {
poly_uint64 remain = ((group_size * vf - gap)
  - (vec_num * j + i) * nunits);
if (known_ge ((vec_num * j + i + 1) * nunits
  - (group_size * vf - gap), nunits))
  /* DR will be unused.  */
  ltype = NULL_TREE;

needs to be re-formulated so that the combined conditions make sure
this doesn't happen.  The outer known_gt should already ensure that
remain > 0.  For correctness that should possibly be maybe_gt though.

Yeah.  FWIW, I mentioned the maybe_gt thing in
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653013.html:

Pre-existing, but shouldn't this be maybe_gt rather than known_gt?
We can only skip doing it if we know for sure that the load won't cross
the gap.  (Not sure whether the difference can trigger in practice.)

But AFAICT, the known_gt doesn't inherently prove that remain is known
to be nonzero.  It just proves that the gap between the end of the scalar
accesses and the end of this vector is known to be nonzero.


Putting the list back in the loop and CCing Richard S.


I'm currently looking into this patch and am trying to figure out what
is going on. Stepping through gdb, I see that remain == {coeffs = {0,
2}} and nunits == {coeffs = {2, 2}} (the outer known_gt returned true
with known_gt({coeffs = {8, 8}}, {coeffs = {6, 8}})).

From what I understand, this falls under the umbrella of 0 <= remain <
nunits. The divide by zero error is because of the 0 <= remain which is
coming from the constant_multiple_p function in poly-int.h where it
performs the modulus NCa(a.coeffs[0]) % NCb(b.coeffs[0]).
(https://github.com/gcc-mirror/gcc/blob/master/gcc/poly-int.h#L1970-L1971)


>  if (known_ge ((vec_num * j + i + 1) * nunits
>- (group_size * vf - gap), nunits))
>/* DR will be unused.  */
>ltype = NULL_TREE;

This if condition is a bit suspicious to me though. I'm seeing that it's
evaluating known_ge({coeffs = {2, 0}}, {coeffs = {2, 2}}) which is
returning false. Should it be maybe_ge instead?

No, we can only not emit a load if we know it won't be used, not if
it eventually cannot be used.

Agreed.

[switching round for easier reply]

After running some
tests, to me it looks like it doesn't vectorize quite as often; however,
I'm not fully sure what else to do when the coeffs can potentially be
equal to 0.

Should it even be possible for there to be a {coeffs = {0, n}}
situation? My understanding of how poly_ints are used for representing
vectorization is that the first coefficient is the number of elements
needed to make the minimum supported vector size. That is, if vector
lengths are 128 bits, element size is 32 bits, coeff[0] should be
minimum of 4. Is this understanding correct?

I was told n can be negative, but nunits.coeff[0] should be non-zero.

What would it mean for the coeffs[0] to be 0? Would that mean the vector length 
supports 0 bits?

coeffs = {A,B} just means A+B*X, where X is the number of vector
"chunks" beyond the minimum length.  It's certainly valid for a poly_int
to have a zero coeffs[0] (i.e. zero A).  For example, (the length of a
vector) - (the minimum length) would have this property.

Thanks for the explanation! I have a few clarification questions about this.
If I understand correctly, B would represent the number of elements the vector 
can have (for 128b vector operating on 32b elements, B == 4, but if operating 
on 64b elements B == 2); however, I'm not too sure what A represents.

On the poly_int docs, it says

An indeterminate value of 0 should usually represent the minimum possible 
runtime value, with c0 specifying the value in that case.

"minimum possible runtime value" doesn't make sense to me. Does it mean the 
potential minimum bound of elements it will operate on?


What is j and i when the divisor is zero?

The values I see in gdb are: vec_num = 4, j = 0, i = 3, vf = {coeffs = {2, 2}}, nunits = {coeffs = {2, 2}}, group_siz

Re: [PATCH v3 2/2] Prevent divide-by-zero

2024-07-24 Thread Edwin Lu



On 7/24/2024 3:03 AM, Robin Dapp wrote:

Thanks for the explanation! I have a few clarification questions about this.
If I understand correctly, B would represent the number of elements the
vector can have (for 128b vector operating on 32b elements, B == 4, but if
operating on 64b elements B == 2); however, I'm not too sure what A
represents.

The runtime size of a vector is a polynomial with a "base size" of A and
"increments beyond A" of B.  B is compile-time variable/indeterminate and
runtime invariant.  For (non-zve32) RVV it specifies the number of 64-bit
chunks beyond the minimum size of 64 bits.  The polynomial is [2 2] in that
case and the "vector bit size" would be
   64 * [2 2] = [128 128] = 128 + x * 128.
For a runtime vector size of 256 bits, x would be 1 and so on and we determine
it at runtime via csrr.


On the poly_int docs, it says

An indeterminate value of 0 should usually represent the minimum possible
runtime value, with c0 specifying the value in that case.

"minimum possible runtime value" doesn't make sense to me. Does it mean the
potential minimum bound of elements it will operate on?

This refers to the minimum runtime size of a vector, the constant 2 * 64 bit
above.  So it doesn't talk about the number of elements.  The number of
elements can be deducted from the "vector size" polynomial by dividing it by
the element size.  The minimum number of elements for an element size S could
e.g. be [128 128] / S = 128 / S + x * (128 / S).


I think all of this makes sense to me! Thanks for all the explanations!

Edwin





Re: [PATCH 1/2] cp/coroutines: do not rewrite parameters in unevaluated contexts

2024-07-24 Thread Jason Merrill

On 7/23/24 7:41 PM, Arsen Arsenović wrote:

It is possible to use parameters of a parent function of a lambda in
unevaluated contexts without capturing them.  By not capturing them, we
work around the usual mechanism we use to prevent rewriting captured
parameters.  Prevent this by simply skipping rewrites in unevaluated
contexts.  Those won't mind the value not being present anyway.

gcc/cp/ChangeLog:

PR c++/111728
* coroutines.cc (rewrite_param_uses): Skip unevaluated
subexpressions.

gcc/testsuite/ChangeLog:

PR c++/111728
* g++.dg/coroutines/pr111728.C: New test.
---
Evening!

This 'series' contains two patches for the coroutine implementation to
address two unrelated PRs.


When you're explicitly dividing your description between stuff that goes 
in the commit message and stuff that doesn't, the stuff that doesn't go 
in the commit message should come first, followed by a "scissors" line, 
per git-mailinfo(1):



  --scissors
   Remove everything in body before a scissors line (e.g. "-- >8 --").  The line
   represents scissors and perforation marks, and is used to request the reader
   to cut the message at that line. If that line appears in the body of the
   message before the patch, everything before it (including the scissors line
   itself) is ignored when this option is used.

   This is useful if you want to begin your message in a discussion thread with
   comments and suggestions on the message you are responding to, and to
   conclude it with a patch submission, separating the discussion and the
   beginning of the proposed commit log message with a scissors line.
The first prevents an ICE during coroutine parameter substitution by not
performing it in unevaluated contexts.  Those contexts can contain names
that were not captured by lambdas but *are* parameters to coroutines.
In the testcase from the PR, the rewriting machinery finds a param in
the body of the coroutine, which it did not previously encounter while
processing the coroutine declaration, and that does not have a
DECL_VALUE_EXPR, and fails.

Since it is not really useful to rewrite parameter uses in unevaluated
contexts, we can just ignore those, preventing confusion (and the ICE).


All of the explanation of the change rationale can merge into the commit 
message.



Regression tested on x86_64-pc-linux-gnu.

OK for trunk?

TIA, have a lovely night.

  gcc/cp/coroutines.cc   |  6 +
  gcc/testsuite/g++.dg/coroutines/pr111728.C | 29 ++
  2 files changed, 35 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr111728.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index e8f028df3ad..fb8f24e6c61 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -3755,6 +3755,12 @@ rewrite_param_uses (tree *stmt, int *do_subtree 
ATTRIBUTE_UNUSED, void *d)
return cp_walk_tree (&t, rewrite_param_uses, d, NULL);
  }
  
+  if (unevaluated_p (TREE_CODE (*stmt)))
+{
+  *do_subtree = 0; // Nothing to do.

We tend to avoid C++-style comments; I'm not sure any comment is needed
here, but "/* No odr-uses in unevaluated operands.  */" would be
clearer IMO.


OK either way.

Jason



Re: [PATCH 2/2] cp+coroutines: teach convert_to_void to diagnose discarded co_awaits

2024-07-24 Thread Jason Merrill

On 7/23/24 7:41 PM, Arsen Arsenović wrote:

co_await expressions are nearly calls to Awaitable::await_resume, and,
as such, should inherit its nodiscard.  A discarded co_await expression
should, hence, act as if its call to await_resume was discarded.

CO_AWAIT_EXPR trees do conveniently contain the expression for calling
await_resume in them, so we can discard it.

gcc/cp/ChangeLog:

PR c++/110171
* coroutines.cc (co_await_get_resume_call): New function.
Returns the await_resume expression of a given co_await.
* cp-tree.h (co_await_get_resume_call): New function.
* cvt.cc (convert_to_void): Handle CO_AWAIT_EXPRs and call
maybe_warn_nodiscard on their resume exprs.

gcc/testsuite/ChangeLog:

PR c++/110171
* g++.dg/coroutines/pr110171-1.C: New test.
* g++.dg/coroutines/pr110171.C: New test.
---
This patch teaches convert_to_void how to discard 'through' a
CO_AWAIT_EXPR.  CO_AWAIT_EXPR nodes (most of the time) already contain
their relevant await_resume() call embedded within them, so, when we
discard a CO_AWAIT_EXPR, we can also just discard the await_resume()
call embedded within it.  This results in a [[nodiscard]] diagnostic
that the PR noted was missing.


Again you have two different versions of the patch rationale?


As with the previous patch, regression-tested on x86_64-pc-linux-gnu.

OK for trunk?

TIA.

  gcc/cp/coroutines.cc | 13 
  gcc/cp/cp-tree.h |  3 ++
  gcc/cp/cvt.cc|  8 +
  gcc/testsuite/g++.dg/coroutines/pr110171-1.C | 34 
  gcc/testsuite/g++.dg/coroutines/pr110171.C   | 32 ++
  5 files changed, 90 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr110171-1.C
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr110171.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index fb8f24e6c61..05486c2fb19 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -596,6 +596,19 @@ coro_get_destroy_function (tree decl)
return NULL_TREE;
  }
  
+/* Given a CO_AWAIT_EXPR AWAIT_EXPR, return its resume call.  */
+
+tree*

Why return tree* rather than tree?


+co_await_get_resume_call (tree await_expr)
+{
+  gcc_checking_assert (TREE_CODE (await_expr) == CO_AWAIT_EXPR);
+  tree vec = TREE_OPERAND (await_expr, 3);
+  if (!vec)
+return nullptr;
+  return &TREE_VEC_ELT (vec, 2);
+}
+
+
  /* These functions assumes that the caller has verified that the state for
 the decl has been initialized, we try to minimize work here.  */
  
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h

index 856699de82f..c9ae8950bd1 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8763,6 +8763,9 @@ extern tree coro_get_actor_function   (tree);
  extern tree coro_get_destroy_function (tree);
  extern tree coro_get_ramp_function(tree);
  
+extern tree* co_await_get_resume_call		(tree await_expr);

+
+
  /* contracts.cc */
  extern tree make_postcondition_variable   (cp_expr);
  extern tree make_postcondition_variable   (cp_expr, tree);
diff --git a/gcc/cp/cvt.cc b/gcc/cp/cvt.cc
index d95e01c118c..7b4bd8a9dc4 100644
--- a/gcc/cp/cvt.cc
+++ b/gcc/cp/cvt.cc
@@ -1502,6 +1502,14 @@ convert_to_void (tree expr, impl_conv_void implicit, tsubst_flags_t complain)
maybe_warn_nodiscard (expr, implicit);
break;
  
+case CO_AWAIT_EXPR:

+  {
+   auto awr = co_await_get_resume_call (expr);
+   if (awr && *awr)
+ *awr = convert_to_void (*awr, implicit, complain);
+   break;
+  }
+
  default:;
  }
expr = resolve_nondeduced_context (expr, complain);
diff --git a/gcc/testsuite/g++.dg/coroutines/pr110171-1.C b/gcc/testsuite/g++.dg/coroutines/pr110171-1.C
new file mode 100644
index 000..d8aff582487
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr110171-1.C
@@ -0,0 +1,34 @@
+// { dg-do compile }
+#include <coroutine>
+
+struct must_check_result
+{
+bool await_ready() { return false; }
+void await_suspend(std::coroutine_handle<>) {}
+[[nodiscard]] bool await_resume() { return {}; }
+};
+
+struct task {};
+
+namespace std
+{
+template
+struct coroutine_traits
+{
+struct promise_type
+{
+task get_return_object() { return {}; }
+suspend_always initial_suspend() noexcept { return {}; }
+suspend_always final_suspend() noexcept { return {}; }
+void return_void() {}
+void unhandled_exception() {}
+};
+};
+}
+
+task example(auto)
+{
+co_await must_check_result(); // { dg-warning "-Wunused-result" }
+}
+
+void f() { example(1); }
diff --git a/gcc/testsuite/g++.dg/coroutines/pr110171.C b/gcc/testsuite/g++.dg/coroutines/pr110171.C
new file mode 100644
index 000..4b82e23656c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr110171.C
@@ -0,0 +1,32 @@
+// { dg-do co

[PATCH] Use add_name_and_src_coords_attributes in modified_type_die

2024-07-24 Thread Tom Tromey
While working on a patch to the Ada compiler, I found a spot in
dwarf2out.cc that calls add_name_attribute where a call to
add_name_and_src_coords_attributes would be better, because the latter
respects DECL_NAMELESS.

gcc

* dwarf2out.cc (modified_type_die): Call
add_name_and_src_coords_attributes for type decls.
---
 gcc/dwarf2out.cc | 29 -
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 357efaa5990..9f6e7110411 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -14047,23 +14047,18 @@ modified_type_die (tree type, int cv_quals, bool reverse,
  Don't attach a DW_AT_name to DW_TAG_const_type or DW_TAG_volatile_type
  if the base type already has the same name.  */
   if (name
-  && ((TREE_CODE (name) != TYPE_DECL
-  && (qualified_type == TYPE_MAIN_VARIANT (type)
-  || (cv_quals == TYPE_UNQUALIFIED)))
- || (TREE_CODE (name) == TYPE_DECL
- && DECL_NAME (name)
- && (TREE_TYPE (name) == qualified_type
- || (lang_hooks.types.get_debug_type
- && (lang_hooks.types.get_debug_type (TREE_TYPE (name))
- == qualified_type))
-{
-  if (TREE_CODE (name) == TYPE_DECL)
-   /* Could just call add_name_and_src_coords_attributes here,
-  but since this is a builtin type it doesn't have any
-  useful source coordinates anyway.  */
-   name = DECL_NAME (name);
-  add_name_attribute (mod_type_die, IDENTIFIER_POINTER (name));
-}
+  && TREE_CODE (name) != TYPE_DECL
+  && (qualified_type == TYPE_MAIN_VARIANT (type)
+ || (cv_quals == TYPE_UNQUALIFIED)))
+add_name_attribute (mod_type_die, IDENTIFIER_POINTER (name));
+  else if (name
+  && TREE_CODE (name) == TYPE_DECL
+  && DECL_NAME (name)
+  && (TREE_TYPE (name) == qualified_type
+  || (lang_hooks.types.get_debug_type
+  && (lang_hooks.types.get_debug_type (TREE_TYPE (name))
+  == qualified_type
+add_name_and_src_coords_attributes (mod_type_die, name, true);
   else if (mod_type_die && mod_type_die->die_tag == DW_TAG_base_type)
 {
   if (TREE_CODE (type) == BITINT_TYPE)
-- 
2.45.0



[PATCH 1/2] libstdc++: Move std::optional assertions out of _M_get()

2024-07-24 Thread Jonathan Wakely
Tested x86_64-linux.

Any reason not to do this? I don't think the assertions are useful to
catch implementation bugs where we access the contained value without
checking it - we should use tests for that.

-- >8 --

Currently we implement the precondition for accessing the contained
value of a std::optional in the _M_get() accessor in the base class.
This means that we always check the assertions even in internal
functions that have an explicit check for a contained value being
present, such as value() and value_or(U&&). Although those redundant
assertions should get optimized out in most cases, they might hurt
inliner heuristics and generally give the compiler more work to do.
And they won't be optimized out at all for non-optimized builds.

The current assertions also result in repeated invalid bug reports, such
as PR 91281, PR 101659, PR 102712, and PR 107894.

We can move the assertions from the internal accessors to the public
member functions where the preconditions are specified.

libstdc++-v3/ChangeLog:

* include/std/optional (_Optional_base_impl::_M_get()): Move
assertions to ...
(optional::operator->, optional::operator*): ... here.
---
 libstdc++-v3/include/std/optional | 40 ---
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional
index 48e0f3d36f2..af72004645e 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -472,17 +472,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // The _M_get operations have _M_engaged as a precondition.
   constexpr _Tp&
   _M_get() noexcept
-  {
-   __glibcxx_assert(this->_M_is_engaged());
-   return static_cast<_Dp*>(this)->_M_payload._M_get();
-  }
+  { return static_cast<_Dp*>(this)->_M_payload._M_get(); }
 
   constexpr const _Tp&
   _M_get() const noexcept
-  {
-   __glibcxx_assert(this->_M_is_engaged());
-   return static_cast<const _Dp*>(this)->_M_payload._M_get();
-  }
+  { return static_cast<const _Dp*>(this)->_M_payload._M_get(); }
 };
 
   /**
@@ -958,27 +952,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Observers.
   constexpr const _Tp*
   operator->() const noexcept
-  { return std::__addressof(this->_M_get()); }
+  {
+   __glibcxx_assert(this->_M_is_engaged());
+   return std::__addressof(this->_M_get());
+  }
 
   constexpr _Tp*
   operator->() noexcept
-  { return std::__addressof(this->_M_get()); }
+  {
+   __glibcxx_assert(this->_M_is_engaged());
+   return std::__addressof(this->_M_get());
+  }
 
   constexpr const _Tp&
   operator*() const& noexcept
-  { return this->_M_get(); }
+  {
+   __glibcxx_assert(this->_M_is_engaged());
+   return this->_M_get();
+  }
 
   constexpr _Tp&
   operator*()& noexcept
-  { return this->_M_get(); }
+  {
+   __glibcxx_assert(this->_M_is_engaged());
+   return this->_M_get();
+  }
 
   constexpr _Tp&&
   operator*()&& noexcept
-  { return std::move(this->_M_get()); }
+  {
+   __glibcxx_assert(this->_M_is_engaged());
+   return std::move(this->_M_get());
+  }
 
   constexpr const _Tp&&
   operator*() const&& noexcept
-  { return std::move(this->_M_get()); }
+  {
+   __glibcxx_assert(this->_M_is_engaged());
+   return std::move(this->_M_get());
+  }
 
   constexpr explicit operator bool() const noexcept
   { return this->_M_is_engaged(); }
-- 
2.45.2



[PATCH 2/2] libstdc++: Use _M_get() in std::optional internals

2024-07-24 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

Now that _base::_M_get() doesn't check the precondition, we can use
_M_get() instead of operator*() for the internal uses where we've
already checked the precondition holds.

Add a using-declaration so that we don't need to lookup _M_get in the
dependent base class, and make optional a friend so that the
converting constructors and assignment operators can use the parameter's
_M_get member.

libstdc++-v3/ChangeLog:

* include/std/optional (optional): Add using-declaration for
_Base::_M_get and declare optional as friend.
(optional(const optional&)): Use
_M_get instead of operator*.
(optional(optional&&)): Likewise.
(operator=(const optional&)): Likewise.
(operator=(optional&&)): Likewise.
(and_then, transform): Likewise.
---
 libstdc++-v3/include/std/optional | 38 ---
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional
index af72004645e..9ed7ab50140 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -760,7 +760,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
noexcept(is_nothrow_constructible_v<_Tp, const _Up&>)
{
  if (__t)
-   emplace(*__t);
+   emplace(__t._M_get());
}
 
   template)
{
  if (__t)
-   emplace(*__t);
+   emplace(__t._M_get());
}
 
   template)
{
  if (__t)
-   emplace(std::move(*__t));
+   emplace(std::move(__t._M_get()));
}
 
   template)
{
  if (__t)
-   emplace(std::move(*__t));
+   emplace(std::move(__t._M_get()));
}
 
   template_M_is_engaged())
-   this->_M_get() = *__u;
+   this->_M_get() = __u._M_get();
  else
-   this->_M_construct(*__u);
+   this->_M_construct(__u._M_get());
}
  else
{
@@ -889,9 +889,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  if (__u)
{
  if (this->_M_is_engaged())
-   this->_M_get() = std::move(*__u);
+   this->_M_get() = std::move(__u._M_get());
  else
-   this->_M_construct(std::move(*__u));
+   this->_M_construct(std::move(__u._M_get()));
}
  else
{
@@ -1056,7 +1056,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return static_cast<_Tp>(std::forward<_Up>(__u));
}
 
-#if __cpp_lib_optional >= 202110L
+#if __cpp_lib_optional >= 202110L // C++23
   // [optional.monadic]
 
   template
@@ -1068,7 +1068,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
"the function passed to std::optional::and_then "
"must return a std::optional");
  if (has_value())
-   return std::__invoke(std::forward<_Fn>(__f), **this);
+   return std::__invoke(std::forward<_Fn>(__f), _M_get());
  else
return _Up();
}
@@ -1082,7 +1082,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
"the function passed to std::optional::and_then "
"must return a std::optional");
  if (has_value())
-   return std::__invoke(std::forward<_Fn>(__f), **this);
+   return std::__invoke(std::forward<_Fn>(__f), _M_get());
  else
return _Up();
}
@@ -1096,7 +1096,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
"the function passed to std::optional::and_then "
"must return a std::optional");
  if (has_value())
-   return std::__invoke(std::forward<_Fn>(__f), std::move(**this));
+   return std::__invoke(std::forward<_Fn>(__f), std::move(_M_get()));
  else
return _Up();
}
@@ -1110,7 +1110,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
"the function passed to std::optional::and_then "
"must return a std::optional");
  if (has_value())
-   return std::__invoke(std::forward<_Fn>(__f), std::move(**this));
+   return std::__invoke(std::forward<_Fn>(__f), std::move(_M_get()));
  else
return _Up();
}
@@ -1121,7 +1121,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  using _Up = remove_cv_t>;
  if (has_value())
-   return optional<_Up>(_Optional_func<_Fn>{__f}, **this);
+   return optional<_Up>(_Optional_func<_Fn>{__f}, _M_get());
  else
return optional<_Up>();
}
@@ -1132,7 +1132,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  using _Up = remove_cv_t>;
  if (has_value())
-   return optional<_Up>(_Optional_func<_Fn>{__f}, **this);
+   return optional<_Up>(_Optional_func<_Fn>{__f}, _M_get());
  else
return optional<_Up>();
   

[PATCH] libstdc++: Use concepts to simplify std::optional base classes

2024-07-24 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

In C++20 mode we can simplify some of the std::optional base class
hierarchy using concepts. We can overload the destructor and copy
constructor and move constructor with a trivial defaulted version and a
constrained non-trivial version. This allows us to remove some class
template partial specializations that were used to conditionally define
those special members as trivial or non-trivial. This should not change
any semantics, but should be less work for the compiler, due to not
needing to match partial specializations, and completely removing one
level of the inheritance hierarchy.

libstdc++-v3/ChangeLog:

* include/std/optional (_Optional_payload_base::_Storage)
[C++20]: Define constrained non-trivial destructor.
(_Optional_payload_base::_Storage) [C++20]: Do not
define partial specialization when primary template has
constrained destructor.
(_Optional_base) [C++20]: Define constrained trivial copy and
move cons and move constructors. Define payload accessors here
instead of inheriting them from _Optional_base_impl.
(_Optional_base_impl, _Optional_base)
(_Optional_base, _Optional_base)
[C++20]: Do not define.
---
 libstdc++-v3/include/std/optional | 216 --
 1 file changed, 146 insertions(+), 70 deletions(-)

diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional
index 9ed7ab50140..344be5e44d3 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -226,10 +226,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{ }
 #endif
 
+#if __cpp_concepts >= 202002L // Conditionally trivial special member functions
+ ~_Storage() = default;
+
+ // User-provided destructor is needed when _Up has non-trivial dtor.
+ _GLIBCXX20_CONSTEXPR
+ ~_Storage() requires (!is_trivially_destructible_v<_Up>)
+ { }
+
+ _Storage(const _Storage&) = default;
+ _Storage(_Storage&&) = default;
+ _Storage& operator=(const _Storage&) = default;
+ _Storage& operator=(_Storage&&) = default;
+#endif
+
  _Empty_byte _M_empty;
  _Up _M_value;
};
 
+#if __cpp_concepts < 202002L
   template
union _Storage<_Up, false>
{
@@ -259,9 +274,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  // User-provided destructor is needed when _Up has non-trivial dtor.
  _GLIBCXX20_CONSTEXPR ~_Storage() { }
 
+ _Storage(const _Storage&) = default;
+ _Storage(_Storage&&) = default;
+ _Storage& operator=(const _Storage&) = default;
+ _Storage& operator=(_Storage&&) = default;
+
  _Empty_byte _M_empty;
  _Up _M_value;
};
+#endif
 
   _Storage<_Stored_type> _M_payload;
 
@@ -438,8 +459,128 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _GLIBCXX20_CONSTEXPR ~_Optional_payload() { this->_M_reset(); }
 };
 
+  /**
+* @brief Class template that provides copy/move constructors of optional.
+*
+* Such a separate base class template is necessary in order to
+* conditionally make copy/move constructors trivial.
+*
+* When the contained value is trivially copy/move constructible,
+* the copy/move constructors of _Optional_base will invoke the
+* trivial copy/move constructor of _Optional_payload. Otherwise,
+* they will invoke _Optional_payload(bool, const _Optional_payload&)
+* or _Optional_payload(bool, _Optional_payload&&) to initialize
+* the contained value, if copying/moving an engaged optional.
+*
+* Whether the other special members are trivial is determined by the
+* _Optional_payload<_Tp> specialization used for the _M_payload member.
+*
+* @see optional, _Enable_special_members
+*/
+  template,
+  bool = is_trivially_move_constructible_v<_Tp>>
+struct _Optional_base
+{
+  // Constructors for disengaged optionals.
+  constexpr _Optional_base() = default;
+
+  // Constructors for engaged optionals.
+  template, bool> = false>
+   constexpr explicit
+   _Optional_base(in_place_t, _Args&&... __args)
+   : _M_payload(in_place, std::forward<_Args>(__args)...)
+   { }
+
+  template&,
+ _Args...>, bool> = false>
+   constexpr explicit
+   _Optional_base(in_place_t,
+  initializer_list<_Up> __il,
+  _Args&&... __args)
+   : _M_payload(in_place, __il, std::forward<_Args>(__args)...)
+   { }
+
+  // Copy and move constructors.
+  constexpr
+  _Optional_base(const _Optional_base& __other)
+  noexcept(is_nothrow_copy_constructible_v<_Tp>)
+  : _M_payload(__other._M_payload._M_engaged, __other._M_payload)
+  { }
+
+  constexpr
+  _Optional_base(_Optional_base&& __other)
+  noexcept(is_nothrow_move_constructible_v<_Tp>)
+  : _M_payload(__

[PATCH 2/2] libstdc++: Implement LWG 3836 for std::optional bool conversions

2024-07-24 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/optional (optional): Constrain constructors to
prevent problematic bool conversions, as per LWG 3836.
* testsuite/20_util/optional/cons/lwg3836.cc: New test.
---
 libstdc++-v3/include/std/optional | 58 ++-
 .../20_util/optional/cons/lwg3836.cc  | 58 +++
 2 files changed, 100 insertions(+), 16 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/optional/cons/lwg3836.cc

diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional
index 344be5e44d3..700e7047aba 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -749,16 +749,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 inline constexpr bool __is_optional_v> = true;
 
+  template
+using __converts_from_any_cvref = __or_<
+   is_constructible<_Tp, _Wp&>,   is_convertible<_Wp&, _Tp>,
+   is_constructible<_Tp, _Wp>,is_convertible<_Wp, _Tp>,
+   is_constructible<_Tp, const _Wp&>, is_convertible,
+   is_constructible<_Tp, const _Wp>,  is_convertible
+  >;
+
   template
-using __converts_from_optional =
-  __or_&>,
-   is_constructible<_Tp, optional<_Up>&>,
-   is_constructible<_Tp, const optional<_Up>&&>,
-   is_constructible<_Tp, optional<_Up>&&>,
-   is_convertible&, _Tp>,
-   is_convertible&, _Tp>,
-   is_convertible&&, _Tp>,
-   is_convertible&&, _Tp>>;
+using __converts_from_optional
+  = __converts_from_any_cvref<_Tp, optional<_Up>>;
 
   template
 using __assigns_from_optional =
@@ -800,6 +801,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
using _Requires = enable_if_t<__and_v<_Cond...>, bool>;
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3836. std::expected<bool, E1> conversion constructor
+  // expected(const expected<U, G>&) should take precedence over
+  // expected(U&&) with operator bool
+  template>
+   struct __not_constructing_bool_from_optional
+   : true_type
+   { };
+
+  template
+   struct __not_constructing_bool_from_optional<_From, bool>
+   : bool_constant>>
+   { };
+
+  template>
+   struct __construct_from_contained_value
+   : __not_<__converts_from_optional<_Tp, _From>>
+   { };
+
+  template
+   struct __construct_from_contained_value<_From, bool>
+   : true_type
+   { };
+
 public:
   using value_type = _Tp;
 
@@ -811,7 +836,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template, __not_tag<_Up>,
 is_constructible<_Tp, _Up>,
-is_convertible<_Up, _Tp>> = true>
+is_convertible<_Up, _Tp>,
+__not_constructing_bool_from_optional<_Up>> = true>
constexpr
optional(_Up&& __t)
noexcept(is_nothrow_constructible_v<_Tp, _Up>)
@@ -820,7 +846,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template, __not_tag<_Up>,
 is_constructible<_Tp, _Up>,
-__not_>> = false>
+__not_>,
+__not_constructing_bool_from_optional<_Up>> = false>
explicit constexpr
optional(_Up&& __t)
noexcept(is_nothrow_constructible_v<_Tp, _Up>)
@@ -830,7 +857,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Requires<__not_>,
 is_constructible<_Tp, const _Up&>,
 is_convertible,
-__not_<__converts_from_optional<_Tp, _Up>>> = true>
+__construct_from_contained_value<_Up>> = true>
constexpr
optional(const optional<_Up>& __t)
noexcept(is_nothrow_constructible_v<_Tp, const _Up&>)
@@ -843,7 +870,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Requires<__not_>,
 is_constructible<_Tp, const _Up&>,
 __not_>,
-__not_<__converts_from_optional<_Tp, _Up>>> = false>
+__construct_from_contained_value<_Up>> = false>
explicit constexpr
optional(const optional<_Up>& __t)
noexcept(is_nothrow_constructible_v<_Tp, const _Up&>)
@@ -856,7 +883,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Requires<__not_>,
 is_constructible<_Tp, _Up>,
 is_convertible<_Up, _Tp>,
-__not_<__converts_from_optional<_Tp, _Up>>> = true>
+__construct_from_contained_value<_Up>> = true>
constexpr
optional(optional<_Up>&& __t)
noexcept(is_nothrow_constructible_v<_Tp, _Up>)
@@ -869,7 +896,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Requires<__not_>,
 is_constructible<_Tp, _Up>,
 __not_>,
-__not_<__converts_from_optional<_Tp, _Up>>> =

[PATCH 1/2] libstdc++: Implement LWG 3836 for std::expected bool conversions

2024-07-24 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/expected (expected): Constrain constructors to
prevent problematic bool conversions, as per LWG 3836.
* testsuite/20_util/expected/lwg3836.cc: New test.
---
 libstdc++-v3/include/std/expected | 59 ++-
 .../testsuite/20_util/expected/lwg3836.cc | 34 +++
 2 files changed, 77 insertions(+), 16 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/expected/lwg3836.cc

diff --git a/libstdc++-v3/include/std/expected b/libstdc++-v3/include/std/expected
index 86026c3947a..2594cfe131c 100644
--- a/libstdc++-v3/include/std/expected
+++ b/libstdc++-v3/include/std/expected
@@ -314,6 +314,17 @@ namespace __expected
  __guard.release();
}
 }
+
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3836. std::expected<bool, E1> conversion constructor
+  // expected(const expected<U, G>&) should take precedence over
+  // expected(U&&) with operator bool
+
+  // If T is cv bool, remove_cvref_t is not a specialization of expected.
+  template
+concept __not_constructing_bool_from_expected
+  = ! is_same_v, bool>
+ || ! __is_expected>;
 }
 /// @endcond
 
@@ -327,26 +338,41 @@ namespace __expected
   static_assert( ! __expected::__is_unexpected> );
   static_assert( __expected::__can_be_unexpected<_Er> );
 
-  template>
+  // If T is not cv bool, converts-from-any-cvref> and
+  // is_constructible, cv expected ref-qual> are false.
+  template,
+  typename = remove_cv_t<_Tp>>
static constexpr bool __cons_from_expected
- = __or_v&>,
-  is_constructible<_Tp, expected<_Up, _Err>>,
-  is_constructible<_Tp, const expected<_Up, _Err>&>,
-  is_constructible<_Tp, const expected<_Up, _Err>>,
-  is_convertible&, _Tp>,
-  is_convertible, _Tp>,
-  is_convertible&, _Tp>,
-  is_convertible, _Tp>,
-  is_constructible<_Unex, expected<_Up, _Err>&>,
-  is_constructible<_Unex, expected<_Up, _Err>>,
-  is_constructible<_Unex, const expected<_Up, _Err>&>,
-  is_constructible<_Unex, const expected<_Up, _Err>>
+ = __or_v&>,
+  is_constructible<_Tp, expected<_Up, _Gr>>,
+  is_constructible<_Tp, const expected<_Up, _Gr>&>,
+  is_constructible<_Tp, const expected<_Up, _Gr>>,
+  is_convertible&, _Tp>,
+  is_convertible, _Tp>,
+  is_convertible&, _Tp>,
+  is_convertible, _Tp>,
+  is_constructible<_Unex, expected<_Up, _Gr>&>,
+  is_constructible<_Unex, expected<_Up, _Gr>>,
+  is_constructible<_Unex, const expected<_Up, _Gr>&>,
+  is_constructible<_Unex, const expected<_Up, _Gr>>
  >;
 
-  template
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // If t is cv bool, we know it can be constructed from expected,
+  // but we don't want to cause the expected(U&&) constructor to be used,
+  // so we only check the is_constructible, ...> cases.
+  template
+   static constexpr bool __cons_from_expected<_Up, _Gr, _Unex, bool>
+ = __or_v&>,
+  is_constructible<_Unex, expected<_Up, _Gr>>,
+  is_constructible<_Unex, const expected<_Up, _Gr>&>,
+  is_constructible<_Unex, const expected<_Up, _Gr>>
+ >;
+
+  template
constexpr static bool __explicit_conv
  = __or_v<__not_>,
-  __not_>
+  __not_>
  >;
 
   template
@@ -445,8 +471,9 @@ namespace __expected
   template
requires (!is_same_v, expected>)
  && (!is_same_v, in_place_t>)
- && (!__expected::__is_unexpected>)
  && is_constructible_v<_Tp, _Up>
+ && (!__expected::__is_unexpected>)
+ && __expected::__not_constructing_bool_from_expected<_Tp, _Up>
constexpr explicit(!is_convertible_v<_Up, _Tp>)
expected(_Up&& __v)
noexcept(is_nothrow_constructible_v<_Tp, _Up>)
diff --git a/libstdc++-v3/testsuite/20_util/expected/lwg3836.cc b/libstdc++-v3/testsuite/20_util/expected/lwg3836.cc
new file mode 100644
index 000..cd029c44963
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/expected/lwg3836.cc
@@ -0,0 +1,34 @@
+// { dg-do run { target c++23 } }
+
+#include <expected>
+#include <testsuite_hooks.h>
+
+constexpr void
+test_convert_contained_value_to_bool()
+{
+  struct BaseError { };
+  struct DerivedError : BaseError { };
+
+  std::expected e = false;
+
+  // Should use expected(const expected&) ctor, not expected(U&&):
+  std::expected e2 = e;
+
+  // Contained value should be e.value() not static_cast(e):
+  VERIFY( e2.value() == false );
+
+  std::expected e3(std::unexpect);
+  std::expected e4 = e3;
+  // Should have error, not static_cas

[PATCH] libstdc++: Add noexcept to bad_expected_access members (LWG 4031)

2024-07-24 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/expected (bad_expected_access): Add noexcept
to special member functions, as per LWG 4031.
* testsuite/20_util/expected/bad.cc: Check for nothrow copy and
move members.
---
 libstdc++-v3/include/std/expected  |  8 
 libstdc++-v3/testsuite/20_util/expected/bad.cc | 13 +
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/std/expected b/libstdc++-v3/include/std/expected
index 2594cfe131c..3c52f7db01e 100644
--- a/libstdc++-v3/include/std/expected
+++ b/libstdc++-v3/include/std/expected
@@ -79,10 +79,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
 protected:
   bad_expected_access() noexcept { }
-  bad_expected_access(const bad_expected_access&) = default;
-  bad_expected_access(bad_expected_access&&) = default;
-  bad_expected_access& operator=(const bad_expected_access&) = default;
-  bad_expected_access& operator=(bad_expected_access&&) = default;
+  bad_expected_access(const bad_expected_access&) noexcept = default;
+  bad_expected_access(bad_expected_access&&) noexcept = default;
+  bad_expected_access& operator=(const bad_expected_access&) noexcept = 
default;
+  bad_expected_access& operator=(bad_expected_access&&) noexcept = default;
   ~bad_expected_access() = default;
 
 public:
diff --git a/libstdc++-v3/testsuite/20_util/expected/bad.cc b/libstdc++-v3/testsuite/20_util/expected/bad.cc
index c629e149da5..7e227f904a0 100644
--- a/libstdc++-v3/testsuite/20_util/expected/bad.cc
+++ b/libstdc++-v3/testsuite/20_util/expected/bad.cc
@@ -12,3 +12,16 @@ test_pr105146()
 {
   std::bad_expected_access(E{});
 }
+
+void
+test_lwg4031()
+{
+  struct test_type : std::bad_expected_access { };
+
+  static_assert( std::is_nothrow_default_constructible_v );
+  // LWG 4031. bad_expected_access member functions should be noexcept
+  static_assert( std::is_nothrow_copy_constructible_v );
+  static_assert( std::is_nothrow_move_constructible_v );
+  static_assert( std::is_nothrow_copy_assignable_v );
+  static_assert( std::is_nothrow_move_assignable_v );
+}
-- 
2.45.2



Re: [PATCH 1/2] libstdc++: Move std::optional assertions out of _M_get()

2024-07-24 Thread Ville Voutilainen
On Wed, 24 Jul 2024 at 22:51, Jonathan Wakely  wrote:
>
> Tested x86_64-linux.
>
> Any reason not to do this? I don't think the assertions are useful to
> catch implementation bugs where we access the contained value without
> checking it - we should use tests for that.

Looks good to me.

> The current assertions also result in repeated invalid bug reports, such
> as PR 91281, PR 101659, PR 102712, and PR 107894.

I'm not sure moving the assertions helps with that, maybe some of
those bug reports
are caused by people not knowing how to enable the assertions.


Re: [PATCH 1/2] libstdc++: Move std::optional assertions out of _M_get()

2024-07-24 Thread Jonathan Wakely
On Wed, 24 Jul 2024 at 20:55, Ville Voutilainen
 wrote:
>
> On Wed, 24 Jul 2024 at 22:51, Jonathan Wakely  wrote:
> >
> > Tested x86_64-linux.
> >
> > Any reason not to do this? I don't think the assertions are useful to
> > catch implementation bugs where we access the contained value without
> > checking it - we should use tests for that.
>
> Looks good to me.

Thanks.

> > The current assertions also result in repeated invalid bug reports, such
> > as PR 91281, PR 101659, PR 102712, and PR 107894.
>
> I'm not sure moving the assertions helps with that, maybe some of
> those bug reports
> are caused by people not knowing how to enable the assertions.

Oddly, I think *all* of them were people inspecting the code and
deciding there were no assertions (because they looked in the wrong
place). In some of those bug reports, _GLIBCXX_DEBUG and
_GLIBCXX_ASSERTIONS are explicitly mentioned, but they either only
looked at the code and didn't test it, or thought they were testing
with assertions enabled but failed to enable them somehow.



Re: [PATCH 1/2] libstdc++: Move std::optional assertions out of _M_get()

2024-07-24 Thread Jonathan Wakely
On Wed, 24 Jul 2024 at 20:58, Jonathan Wakely  wrote:
>
> On Wed, 24 Jul 2024 at 20:55, Ville Voutilainen
>  wrote:
> >
> > On Wed, 24 Jul 2024 at 22:51, Jonathan Wakely  wrote:
> > >
> > > Tested x86_64-linux.
> > >
> > > Any reason not to do this? I don't think the assertions are useful to
> > > catch implementation bugs where we access the contained value without
> > > checking it - we should use tests for that.
> >
> > Looks good to me.
>
> Thanks.
>
> > > The current assertions also result in repeated invalid bug reports, such
> > > as PR 91281, PR 101659, PR 102712, and PR 107894.
> >
> > I'm not sure moving the assertions helps with that, maybe some of
> > those bug reports
> > are caused by people not knowing how to enable the assertions.
>
> Oddly, I think *all* of them were people inspecting the code and
> deciding there were no assertions (because they looked in the wrong
> place). In some of those bug reports, _GLIBCXX_DEBUG and
> _GLIBCXX_ASSERTIONS are explicitly mentioned, but they either only
> looked at the code and didn't test it, or thought they were testing
> with assertions enabled but failed to enable them somehow.

In one case, the same person who had added the assertions claimed
there weren't any ;-)


