Re: [RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-25 Thread Robin Dapp
Hi Vineet,

I was thinking of two things while skimming the code:

 - Couldn't we do this in the expanders directly?  Or is the
   subreg-promoted info gone until we reach that?

 - Should some common-code part be more suited to handle that?
   We already elide redundant sign-zero extensions for other
   reasons.  Maybe we could add subreg promoted handling there?

Regards
 Robin



Re: [PATCH 1/3]rs6000: update num_insns_constant for 2 insns

2023-10-25 Thread Jiufu Guo


Hi,

"Kewen.Lin"  writes:

> Hi,
>
> on 2023/10/25 10:00, Jiufu Guo wrote:
>> Hi,
>> 
>> Trunk gcc supports more constants to be built via two instructions: e.g.
>> "li/lis; xori/xoris/rldicl/rldicr/rldic".
>> And then num_insns_constant should also be updated.
>> 
>
> Thanks for updating this.
>
>> Bootstrap & regtest pass ppc64{,le}.
>> Is this ok for trunk?
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>> gcc/ChangeLog:
>> 
>>  * config/rs6000/rs6000.cc (can_be_built_by_lilis_and_rldicX): New
>>  function.
>>  (num_insns_constant_gpr): Update to return 2 for more cases.
>>  (rs6000_emit_set_long_const): Update to use
>>  can_be_built_by_lilis_and_rldicX.
>> 
>> ---
>>  gcc/config/rs6000/rs6000.cc | 64 -
>>  1 file changed, 41 insertions(+), 23 deletions(-)
>> 
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index cc24dd5301e..b23ff3d7917 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -6032,6 +6032,9 @@ direct_return (void)
>>return 0;
>>  }
>>  
>> +static bool
>> +can_be_built_by_lilis_and_rldicX (HOST_WIDE_INT, int *, HOST_WIDE_INT *);
>> +
>>  /* Helper for num_insns_constant.  Calculate number of instructions to
>> load VALUE to a single gpr using combinations of addi, addis, ori,
>> oris, sldi and rldimi instructions.  */
>> @@ -6044,35 +6047,41 @@ num_insns_constant_gpr (HOST_WIDE_INT value)
>>  return 1;
>>  
>>/* constant loadable with addis */
>> -  else if ((value & 0xffff) == 0
>> -   && (value >> 31 == -1 || value >> 31 == 0))
>> +  if ((value & 0xffff) == 0 && (value >> 31 == -1 || value >> 31 == 0))
>>  return 1;
>>  
>>/* PADDI can support up to 34 bit signed integers.  */
>> -  else if (TARGET_PREFIXED && SIGNED_INTEGER_34BIT_P (value))
>> +  if (TARGET_PREFIXED && SIGNED_INTEGER_34BIT_P (value))
>>  return 1;
>>  
>> -  else if (TARGET_POWERPC64)
>> -{
>> -  HOST_WIDE_INT low = sext_hwi (value, 32);
>> -  HOST_WIDE_INT high = value >> 31;
>> +  if (!TARGET_POWERPC64)
>> +return 2;
>>  
>> -  if (high == 0 || high == -1)
>> -return 2;
>> +  HOST_WIDE_INT low = sext_hwi (value, 32);
>> +  HOST_WIDE_INT high = value >> 31;
>>  
>> -  high >>= 1;
>> +  if (high == 0 || high == -1)
>> +return 2;
>>  
>> -  if (low == 0 || low == high)
>> -return num_insns_constant_gpr (high) + 1;
>> -  else if (high == 0)
>> -return num_insns_constant_gpr (low) + 1;
>> -  else
>> -return (num_insns_constant_gpr (high)
>> -+ num_insns_constant_gpr (low) + 1);
>> -}
>> +  high >>= 1;
>>  
>> -  else
>> +  HOST_WIDE_INT ud2 = (low >> 16) & 0xffff;
>> +  HOST_WIDE_INT ud1 = low & 0xffff;
>> +  if (high == -1 && ((!(ud2 & 0x8000) && ud1 == 0) || (ud1 & 0x8000)))
>> +return 2;
>> +  if (high == 0 && (ud1 == 0 || (!(ud1 & 0x8000
>>  return 2;
>
> I was thinking that, instead of enumerating all the cases in function
> rs6000_emit_set_long_const, we could add one optional argument like
> "int* num_insns=nullptr" to function rs6000_emit_set_long_const; when
> it's not nullptr, the function would not emit anything but only update
> the count.  That helps people remember to update num_insns when updating
> rs6000_emit_set_long_const in the future, and it also makes it clearer
> where the number comes from.
>
> Does it sound good to you?

Thanks for your advice!  Yes, "rs6000_emit_set_long_const" and
"num_insns_constant_gpr" should be kept aligned.  Using the same logic
(in the same place in the code) would make sense.
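
As an illustration only, the shared-logic idea could look like the toy
model below.  It is not the rs6000 code: emit_or_count, set_const32 and
count_insns32 are hypothetical names, and only a simplified 32-bit
li/lis/ori case is modeled.  When the optional counter is supplied,
nothing is emitted and only the count is updated.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical emitter: records INSN in OUT, or, when NUM_INSNS is
// non-null, emits nothing and only bumps the counter.
static void emit_or_count(const std::string &insn,
                          std::vector<std::string> &out, int *num_insns)
{
  if (num_insns)
    ++*num_insns;
  else
    out.push_back(insn);
}

// Toy model of setting a 32-bit constant: one instruction when the
// value fits li or lis, otherwise lis followed by ori.
void set_const32(int32_t c, std::vector<std::string> &out,
                 int *num_insns = nullptr)
{
  if (c >= -32768 && c <= 32767)
    emit_or_count("li", out, num_insns);
  else if ((c & 0xffff) == 0)
    emit_or_count("lis", out, num_insns);
  else
    {
      emit_or_count("lis", out, num_insns);
      emit_or_count("ori", out, num_insns);
    }
}

// The count comes from the very same code path as the emission.
int count_insns32(int32_t c)
{
  std::vector<std::string> scratch;
  int n = 0;
  set_const32(c, scratch, &n);
  assert(scratch.empty());  // counting mode must not emit anything
  return n;
}
```

With this shape, adding a new case to set_const32 updates the count
automatically, since count_insns32 has no case list of its own.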


BR,
Jeff (Jiufu Guo)
>
> BR,
> Kewen
>
>> +
>> +  int shift;
>> +  HOST_WIDE_INT mask;
>> +  if (can_be_built_by_lilis_and_rldicX (value, &shift, &mask))
>> +return 2;
>> +
>> +  if (low == 0 || low == high)
>> +return num_insns_constant_gpr (high) + 1;
>> +  if (high == 0)
>> +return num_insns_constant_gpr (low) + 1;
>> +  return (num_insns_constant_gpr (high) + num_insns_constant_gpr (low) + 1);
>>  }
>>  
>>  /* Helper for num_insns_constant.  Allow constants formed by the
>> @@ -10492,6 +10501,18 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int 
>> *shift, HOST_WIDE_INT *mask)
>>return false;
>>  }
>>  
>> +/* Combine the above checking functions for  li/lis;rldicX. */
>> +
>> +static bool
>> +can_be_built_by_lilis_and_rldicX (HOST_WIDE_INT c, int *shift,
>> +  HOST_WIDE_INT *mask)
>> +{
>> +  return (can_be_built_by_li_lis_and_rotldi (c, shift, mask)
>> +  || can_be_built_by_li_lis_and_rldicl (c, shift, mask)
>> +  || can_be_built_by_li_lis_and_rldicr (c, shift, mask)
>> +  || can_be_built_by_li_and_rldic (c, shift, mask));
>> +}
>> +
>>  /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
>> Output insns to set DEST equal to the constant C as a series of
>> lis, ori and shl instructions.  */
>> @@ -10538,10 +10559,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
>> c)
>>emit_move_insn (

Re: [PATCH 2/3]rs6000: using 'pli' to load 34bit-constant

2023-10-25 Thread Jiufu Guo


Hi,

"Kewen.Lin"  writes:

> on 2023/10/25 10:00, Jiufu Guo wrote:
>> Hi,
>> 
>> For constants with 16bit values, 'li or lis' can be used to generate
>> the value.  For 34bit constant, 'pli' is ok to generate the value.
>> 
>> Bootstrap & regtest pass on ppc64{,le}.
>> Is this ok for trunk?
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>> gcc/ChangeLog:
>> 
>>  * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add code to use
>>  pli for 34bit constant.
>> 
>> ---
>>  gcc/config/rs6000/rs6000.cc | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index b23ff3d7917..4690384cdbe 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -10530,7 +10530,11 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
>> c)
>>ud3 = (c >> 32) & 0xffff;
>>ud4 = (c >> 48) & 0xffff;
>> 
>> -  if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
>> +  if (TARGET_PREFIXED && SIGNED_INTEGER_34BIT_P (c))
>> +{
>> +  emit_move_insn (dest, GEN_INT (c));
>> +}
>
> Nit: unexpected formatting, no {} needed.
>
> Is there any test case justifying this change?
Great catch! pr93012.c could cover this implicitly, but it is only
changed in [PATCH 3/3].  I will add a new test case for this in this
patch.

> I think only one "li" or "lis"
> beats "pli" since the latter is a prefixed insn, it puts more burdens on insn
> decoding.

Yes! Good news is "emit_move_insn (dest, GEN_INT (c));" is able to
support "li, lis and pli".  The "mov" would match/generate the best
one.
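
A minimal sketch of that selection, under simplifying assumptions:
pick_single_insn and fits_signed are hypothetical helpers, not GCC
functions, and the real choice is made by the move patterns.  It only
models the immediate ranges: a signed 16-bit value for li, a signed
32-bit value with zero low 16 bits for lis, and a signed 34-bit value
for pli when prefixed instructions are available.

```cpp
#include <cassert>
#include <cstdint>

enum class LoadInsn { Li, Lis, Pli, Other };

// True when V is representable as a BITS-wide two's-complement integer.
static bool fits_signed(int64_t v, unsigned bits)
{
  assert(bits >= 1 && bits <= 63);
  const int64_t lim = int64_t{1} << (bits - 1);
  return v >= -lim && v < lim;
}

// Prefer the non-prefixed li/lis forms; fall back to pli only when
// neither of them can represent the constant.
LoadInsn pick_single_insn(int64_t c, bool have_prefixed)
{
  if (fits_signed(c, 16))
    return LoadInsn::Li;
  if ((c & 0xffff) == 0 && fits_signed(c, 32))
    return LoadInsn::Lis;
  if (have_prefixed && fits_signed(c, 34))
    return LoadInsn::Pli;
  return LoadInsn::Other;  // needs more than one instruction
}
```

The ordering of the checks reflects the review comment above: a single
li or lis is tried before the prefixed form.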

Thanks for your quick review and very helpful comments!

BR,
Jeff (Jiufu Guo)
>
> BR,
> Kewen
>
>> +  else if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 
>> 0x8000))
>>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
>>  emit_move_insn (dest, GEN_INT (sext_hwi (ud1, 16)));
>> 


[PATCH v3 1/1] gcc: config: microblaze: fix cpu version check

2023-10-25 Thread Neal Frager
The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp
instead of strverscmp to check the mcpu version against feature
options.  By simply changing the define to use strverscmp,
the new version 10.0 is treated correctly as a higher version
than previous versions.
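
To see why the comparison function matters, here is a small sketch.  It
assumes a glibc-style environment where strverscmp (a GNU extension) is
available; at_least_lexical and at_least_version are hypothetical
helpers, not part of microblaze.cc.  strcasecmp compares character by
character, so "v10.0" sorts below "v6.00.a" because '1' < '6', while
strverscmp compares the integer parts 10 and 6 numerically.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   // strverscmp is a GNU extension
#endif
#include <string.h>     // strverscmp
#include <strings.h>    // strcasecmp

// Old behaviour: plain case-insensitive string ordering.
bool at_least_lexical(const char *cpu, const char *required)
{
  return strcasecmp(cpu, required) >= 0;
}

// New behaviour: version-aware ordering of embedded numbers.
bool at_least_version(const char *cpu, const char *required)
{
  return strverscmp(cpu, required) >= 0;
}
```

With these helpers, at_least_lexical ("v10.0", "v6.00.a") is false,
which corresponds to the bogus warning, while at_least_version
("v10.0", "v6.00.a") is true.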

Signed-off-by: Neal Frager 
---
V1->V2:
 - No need to create a new microblaze specific version check
   routine as strverscmp is the correct solution.
V2->V3:
 - Changed mcpu define for microblaze isa testsuite examples.
---
 gcc/config/microblaze/microblaze.cc| 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/bshift.c   | 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/div.c  | 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/fcmp1.c| 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/fcmp2.c| 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/fcmp3.c| 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/fcmp4.c| 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/fcvt.c | 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/float.c| 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/fsqrt.c| 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/mul-bshift-pcmp.c  | 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/mul-bshift.c   | 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/mul.c  | 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/mulh-bshift-pcmp.c | 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/mulh.c | 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/nofcmp.c   | 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/nofloat.c  | 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/pcmp.c | 2 +-
 gcc/testsuite/gcc.target/microblaze/isa/vanilla.c  | 2 +-
 gcc/testsuite/gcc.target/microblaze/microblaze.exp | 2 +-
 20 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/config/microblaze/microblaze.cc 
b/gcc/config/microblaze/microblaze.cc
index c9f6c4198cf..60ad55120d2 100644
--- a/gcc/config/microblaze/microblaze.cc
+++ b/gcc/config/microblaze/microblaze.cc
@@ -56,7 +56,7 @@
 /* This file should be included last.  */
 #include "target-def.h"
 
-#define MICROBLAZE_VERSION_COMPARE(VA,VB) strcasecmp (VA, VB)
+#define MICROBLAZE_VERSION_COMPARE(VA,VB) strverscmp (VA, VB)
 
 /* Classifies an address.
 
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/bshift.c 
b/gcc/testsuite/gcc.target/microblaze/isa/bshift.c
index 64cf1e2e59e..664586bff9f 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/bshift.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/bshift.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mxl-barrel-shift" } */
+/* { dg-options "-O3 -mcpu=v10.0 -mxl-barrel-shift" } */
 
 volatile int m1, m2, m3;
 volatile unsigned int u1, u2, u3;
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/div.c 
b/gcc/testsuite/gcc.target/microblaze/isa/div.c
index 25ee42ce5c8..783e7c0f684 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/div.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/div.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mno-xl-soft-div" } */
+/* { dg-options "-O3 -mcpu=v10.0 -mno-xl-soft-div" } */
 
 volatile int m1, m2, m3;
 volatile long l1, l2;
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/fcmp1.c 
b/gcc/testsuite/gcc.target/microblaze/isa/fcmp1.c
index 4041a241391..b6202e168d6 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/fcmp1.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/fcmp1.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mhard-float" } */
+/* { dg-options "-O3 -mcpu=v10.0 -mhard-float" } */
 
 volatile float f1, f2, f3;
 
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/fcmp2.c 
b/gcc/testsuite/gcc.target/microblaze/isa/fcmp2.c
index 3902b839db9..4386c6e6cc3 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/fcmp2.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/fcmp2.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mhard-float" } */
+/* { dg-options "-O3 -mcpu=v10.0 -mhard-float" } */
 
 volatile float f1, f2, f3;
 
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/fcmp3.c 
b/gcc/testsuite/gcc.target/microblaze/isa/fcmp3.c
index 8555974dda5..b414e48fe1b 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/fcmp3.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/fcmp3.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mhard-float" } */
+/* { dg-options "-O3 -mcpu=v10.0 -mhard-float" } */
 
 volatile float f1, f2, f3;
 
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/fcmp4.c 
b/gcc/testsuite/gcc.target/microblaze/isa/fcmp4.c
index 79cc5f9dd8e..ff137012df4 100644
--- a/gcc/testsuite/gcc.target/microblaze/isa/fcmp4.c
+++ b/gcc/testsuite/gcc.target/microblaze/isa/fcmp4.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -mcpu=v6.00.a -mhard-float" } */
+/* { dg-options "-O3 -mcpu=v10.0 -mhard-float" } */
 
 void float_func(float f1, float f2, float f3)
 {
diff --git a/gcc/testsuite/gcc.target/microblaze/isa/fcvt.c 
b/gcc/testsuite/gcc.ta

RE: [PATCH v1 1/1] gcc: config: microblaze: fix cpu version check

2023-10-25 Thread Frager, Neal
 There is a microblaze cpu version 10.0 included in versal. If the 
 minor version is only a single digit, then the version comparison 
 will fail as version 10.0 will appear as 100 compared to version
 6.00 or 8.30 which will calculate to values 600 and 830.
 The issue can be seen when using the '-mcpu=10.0' option.
 With this fix, versions with a single digit minor number such as
 10.0 will be calculated as greater than versions with a smaller 
 major version number, but with two minor version digits.
 By applying this fix, several incorrect warning messages will no 
 longer be printed when building the versal plm application, such as 
 the warning message below:
 warning: '-mxl-multiply-high' can be used only with '-mcpu=v6.00.a'
 or greater
 Signed-off-by: Neal Frager 
 ---
gcc/config/microblaze/microblaze.cc | 164 +---
1 file changed, 76 insertions(+), 88 deletions(-)
>>>
>>> Please add a test case.
>>>
>>> --
>>> Michael Eager
>>
>> Hi Michael,
>>
>> Would you mind helping me understand how to make a gcc test case for this 
>> patch?
>>
>> This patch does not change the resulting binaries of a microblaze gcc build. 
>>  The output will be the same with or without the patch, so I do not have 
>> anything in the binary itself to verify.
>>
>> All that happens is false warning messages will not be printed when building 
>> with ‘-mcpu=10.0’.  Is there a way to test for warning messages?
>>
>> In any case, please do not commit v1 of this patch.  I am going to work on 
>> making a v2 based on Mark’s feedback.
> 
>> You can create a test case which passes the -mcpu=10.0 and other options to 
>> GCC and verify that the message is not generated after the patch is applied.
> 
>> You can make all GCC warnings into errors with the "-Werror" option.
>> This means that the compile will fail if the warning is issued.
> 
>> Take a look at gcc/testsuite/gcc.target/aarch64/bti-1.c for an example of 
>> using { dg-options "" } to specify command line options.
> 
>> There is a test suite option (dg-warning) which checks that a particular 
>> source line generates a warning message, but it isn't clear whether it is 
>> possible to check that a warning is not issued.
> 
> Hi Michael,
> 
> Thanks to Mark Hatle's feedback, we have a much simpler solution to the 
> problem.
> 
> The following change is actually all that is necessary.  Since we are 
> just moving from strcasecmp to strverscmp, does v2 of the patch need to have 
> a test case to go with it?
> 
> -#define MICROBLAZE_VERSION_COMPARE(VA,VB) strcasecmp (VA, VB)
> +#define MICROBLAZE_VERSION_COMPARE(VA,VB) strverscmp (VA, VB)
> 
> I assume there are already test cases that verify that strverscmp works 
> correctly?

> Still need a test case to verify this fix.

Hi Michael,

It appears to me that simply increasing the microblaze testsuite option from 
-mcpu=6.00.a to -mcpu=10.0 works for verifying this fix.  Without the fix, the 
microblaze testsuite isa examples print the false warning messages when 
-mcpu=10.0 is used.

Please see v3 of my patch which includes the testsuite update.

Best regards,
Neal Frager
AMD


[PATCH] RISC-V: Fix multiple EXCESS test FAILs in RVV testsuite

2023-10-25 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/slp-mask-run-1.c: Fix test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-run.c: Ditto.

---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c | 2 +-
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c| 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c
index b3469c41c87..192040c04f6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_v } } } */
-/* { dg-additional-options "-std=gnu99 -O3 -march=rv64gcv -mabi=lp64d 
--param=riscv-autovec-preference=scalable" } */
+/* { dg-options "-std=gnu99 -O3 --param=riscv-autovec-preference=scalable" } */
 
 #include 
 #include 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
index 5df7e08c42f..a2f85477f9e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_v } } } */
-/* { dg-additional-options "-std=c99 -Wno-pedantic -Wno-psabi" } */
+/* { dg-options "-std=c99 -Wno-pedantic -Wno-psabi -O3" } */
 
 #include 
 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c
index 7c77ae87f08..958f764f989 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_v } } } */
-/* { dg-additional-options "-std=c99 -Wno-pedantic -Wno-psabi" } */
+/* { dg-options "-std=c99 -Wno-pedantic -Wno-psabi -O3" } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
index 5dc095cce51..59341f4ca9b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_v } } } */
-/* { dg-additional-options "-std=c99 -Wno-pedantic -Wno-psabi" } */
+/* { dg-options "-std=c99 -Wno-pedantic -Wno-psabi -O3" } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c
index 7a50b701c36..7535aeac497 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_v } } } */
-/* { dg-additional-options "-std=c99 -fno-vect-cost-model 
--param=riscv-autovec-preference=scalable -fno-builtin" } */
+/* { dg-options "-std=c99 -fno-vect-cost-model 
--param=riscv-autovec-preference=scalable -fno-builtin -O3" } */
 
 #include "vmv-imm-template.h"
 
-- 
2.36.3



Re: [PATCH 3/3]rs6000: split complicate constant to constant pool

2023-10-25 Thread Jiufu Guo


Hi,

"Kewen.Lin"  writes:

> Hi,
>
> on 2023/10/25 10:00, Jiufu Guo wrote:
>> Hi,
>> 
>> Sometimes, a complicated constant takes 3 (or more)
>> instructions to build.  Generally speaking, that is not
>> as fast as loading it from the constant pool (see the
>> discussions in PR63281).
>
> I may miss some previous discussions, but I'm curious why we
> chose ">=3" here, as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281#c9
> which indicates that more than 3 (>3) should be considered
> with this change.

Thanks a lot for your great patience in reading the history!
Yes, there were some discussions about "> 3" vs. "> 2".
- In theory, "ld" is one instruction.  If we consider the "address/toc"
  adjustment, we may count it as 2 instructions.  "pld" may need fewer
  cycles.
- In testing, "> 2" seems to give better/more stable runtime results
  on SPEC2017.

>
>> 
>> For the concern that I raised in:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599676.html
>> The micro-cases would not be the major concern. Because as
>> Segher explained in:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281#c18
>> It would just be about the benchmark method.
>> 
>> As tested on SPEC2017, the visible performance change is a
>> runtime improvement of about 1.8% (-O2) on 500.perlbench_r
>> when loading complicated constants from the constant pool
>> is supported.  There is no visible performance regression
>> on other benchmarks.
>
> The improvement on 500.perlbench_r looks to match what PR63281
> mentioned, nice!  I'm curious which options and which kinds
> of CPUs you have tested with.  Since this is a general change,
> I'd expect we can test with P8/P9/P10 at O2/O3 (or Ofast) at
> least.

Great advice! Thanks for pointing this out!
A few months ago, P8/P9/P10 were tested.  This time, I reran
SPEC2017 on P10 for -O2 and -O3.  More testing on the latest code
would be better.


BR,
Jeff (Jiufu Guo)

>
> BR,
> Kewen
>
>> 
>> Bootstrap & regtest pass on ppc64{,le}.
>> Is this ok for trunk?
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>>  PR target/63281
>> 
>> gcc/ChangeLog:
>> 
>>  * config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split
>>  complicate constant to memory.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.target/powerpc/const_anchors.c: Update to test final-rtl. 
>>  * gcc.target/powerpc/parall_5insn_const.c: Update to keep original test
>>  point.
>>  * gcc.target/powerpc/pr106550.c: Likewise.
>>  * gcc.target/powerpc/pr106550_1.c: Likewise.
>>  * gcc.target/powerpc/pr87870.c: Update according to latest behavior.
>>  * gcc.target/powerpc/pr93012.c: Likewise.
>> 
>> ---
>>  gcc/config/rs6000/rs6000.cc | 16 
>>  .../gcc.target/powerpc/const_anchors.c  |  5 ++---
>>  .../gcc.target/powerpc/parall_5insn_const.c | 14 --
>>  gcc/testsuite/gcc.target/powerpc/pr106550.c | 17 +++--
>>  gcc/testsuite/gcc.target/powerpc/pr106550_1.c   | 15 +--
>>  gcc/testsuite/gcc.target/powerpc/pr87870.c  |  5 -
>>  gcc/testsuite/gcc.target/powerpc/pr93012.c  |  4 +++-
>>  7 files changed, 65 insertions(+), 11 deletions(-)
>> 
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 4690384cdbe..b9562f1ea0f 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -10292,6 +10292,22 @@ rs6000_emit_set_const (rtx dest, rtx source)
>>c = sext_hwi (c, 32);
>>emit_move_insn (lo, GEN_INT (c));
>>  }
>> +
>> +  /* If it can be stored to the constant pool and profitable.  */
>> +  else if (base_reg_operand (dest, mode)
>> +   && num_insns_constant (source, mode) > 2)
>> +{
>> +  rtx sym = force_const_mem (mode, source);
>> +  if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
>> +  && use_toc_relative_ref (XEXP (sym, 0), mode))
>> +{
>> +  rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest));
>> +  sym = gen_const_mem (mode, toc);
>> +  set_mem_alias_set (sym, get_TOC_alias_set ());
>> +}
>> +
>> +  emit_insn (gen_rtx_SET (dest, sym));
>> +}
>>else
>>  rs6000_emit_set_long_const (dest, c);
>>break;
>> diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c 
>> b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
>> index 542e2674b12..188744165f2 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
>> @@ -1,5 +1,5 @@
>>  /* { dg-do compile { target has_arch_ppc64 } } */
>> -/* { dg-options "-O2" } */
>> +/* { dg-options "-O2 -fdump-rtl-final" } */
>>  
>>  #define C1 0x2351847027482577ULL
>>  #define C2 0x2351847027482578ULL
>> @@ -16,5 +16,4 @@ void __attribute__ ((noinline)) foo1 (long long *a, long 
>> long b)
>>if (b)
>>  *a++ = C2;
>>  }
>> -
>> -/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
>> +/* { dg-final { scan-rtl-du

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Martin Uecker
Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> 
> > Am 24.10.2023 um 22:38 schrieb Martin Uecker :
> > 
> > Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
> > > Hi, Sid,
> > > 
> > > I really appreciate your example and detailed explanation. Very helpful.
> > > I think that this is an excellent example to show (almost) all 
> > > the issues we need to consider.
> > > 
> > > I slightly modified this example to make it compilable and 
> > > runnable, as follows: 
> > > (but I still cannot make the incorrect reordering or DSE happen; 
> > > anyway, the potential for reordering is there…)
> > > 
> > > #include <stddef.h>
> > > struct A
> > > {
> > >   size_t size;
> > >   char buf[] __attribute__((counted_by(size)));
> > > };
> > > 
> > > static size_t
> > > get_size_from (void *ptr)
> > > {
> > >   return __builtin_dynamic_object_size (ptr, 1);
> > > }
> > > 
> > > void
> > > foo (size_t sz)
> > > {
> > >   struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > >   obj->size = sz;
> > >   obj->buf[0] = 2;
> > >   __builtin_printf ("%d\n", get_size_from (obj->buf));
> > >   return;
> > > }
> > > 
> > > int main ()
> > > {
> > >   foo (20);
> > >   return 0;
> > > }
> > > 
> > > With my GCC, it was compiled and worked:
> > > [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
> > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > 20
> > > Situation 1: With O1 and above, the routine “get_size_from” was inlined 
> > > into “foo”, therefore, the call to __bdos is in the same routine as the 
> > > instantiation of the object, and the TYPE information and the attached 
> > > counted_by attribute information in the TYPE of the object can be USED by 
> > > the __bdos call to compute the final object size. 
> > > 
> > > [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
> > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > -1
> > > Situation 2: With O0, the routine “get_size_from” was NOT inlined into 
> > > “foo”, therefore, the call to __bdos is Not in the same routine as the 
> > > instantiation of the object, As a result, the TYPE info and the attached 
> > > counted_by info of the object can NOT be USED by the __bdos call. 
> > > 
> > > Keep the above 2 situations in mind; we will refer to them below:
> > > 
> > > 1. First,  the problem we are trying to resolve is:
> > > 
> > > (Your description):
> > > 
> > > > the reordering of __bdos w.r.t. initialization of the size parameter 
> > > > but to also account for DSE of the assignment, we can abstract this 
> > > > problem to that of DFA being unable to see implicit use of the size 
> > > > parameter in the __bdos call.
> > > 
> > > basically is correct.  However, with the following exception:
> > > 
> > > The implicit use of the size parameter in the __bdos call is not always 
> > > there; it ONLY exists WHEN the __bdos can be evaluated to an 
> > > expression of the size parameter in the “objsz” phase, i.e., the 
> > > “Situation 1” of the above example. 
> > > In the “Situation 2”, when the __bdos does not see the TYPE of the real 
> > > object,  it does not see the counted_by information from the TYPE, 
> > > therefore,  it is not able to evaluate the size of the object through the 
> > > counted_by information.  As a result, the implicit use of the size 
> > > parameter in the __bdos call does NOT exist at all.  The optimizer can 
> > > freely reorder the initialization of the size parameter with the __bdos 
> > > call since there is no data flow dependency between these two. 
> > > 
> > > With this exception in mind, we can see that your proposed “option 2” 
> > > (making the type of size “volatile”) is too conservative, it will  
> > > disable many optimizations  unnecessarily, even though it’s safe and 
> > > simple to implement. 
> > > 
> > > As a compiler optimization person for many many years, I really don’t 
> > > want to take this approach at this moment.  -:)
> > > 
> > > 2. Some facts I’d like to mention:
> > > 
> > > A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
> > > optimization stage. During RTL stage,  the __bdos call has already been 
> > > replaced by an expression of the size parameter or a constant, the data 
> > > dependency is explicitly in the IR already.  I believe that the data 
> > > analysis in RTL stage should pick up the data dependency correctly, No 
> > > special handling is needed in RTL.
> > > 
> > > B. If the __bdos call cannot see the real object , it has no way to get 
> > > the “counted_by” field from the TYPE of the real object. So, if we try to 
> > > add the implicit use of the “counted_by” field to the __bdos call, the 
> > > object instantiation should be in the same routine as the __bdos call.  
> > > Both the FE and the gimplification phase are too early to do this work. 
> > > 
> > > 2. Then, what’s the 

Re: [PATCH 3/3]rs6000: split complicate constant to constant pool

2023-10-25 Thread Kewen.Lin
on 2023/10/25 16:14, Jiufu Guo wrote:
> 
> Hi,
> 
> "Kewen.Lin"  writes:
> 
>> Hi,
>>
>> on 2023/10/25 10:00, Jiufu Guo wrote:
>>> Hi,
>>>
>>> Sometimes, a complicated constant takes 3 (or more)
>>> instructions to build.  Generally speaking, that is not
>>> as fast as loading it from the constant pool (see the
>>> discussions in PR63281).
>>
>> I may miss some previous discussions, but I'm curious why we
>> chose ">=3" here, as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281#c9
>> which indicates that more than 3 (>3) should be considered
>> with this change.
> 
> Thanks a lot for your great patience in reading the history!
> Yes, there were some discussions about "> 3" vs. "> 2".
> - In theory, "ld" is one instruction.  If we consider the "address/toc"
>   adjustment, we may count it as 2 instructions.  "pld" may need fewer
>   cycles.

OK, even without prefixed insn support, the high part of the address
computation could be further optimized to a nop by the linker.  It would
be good to say something about this in the commit log; otherwise people
may be confused by the PR comment mentioned above.

> - In testing, "> 2" seems to give better/more stable runtime results
>   on SPEC2017.

Ok, if you posted the conclusion previously, it would be good to
mention it here with a link on the result comparisons.

> 
>>
>>>
>>> For the concern that I raised in:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599676.html
>>> The micro-cases would not be the major concern. Because as
>>> Segher explained in:
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281#c18
>>> It would just be about the benchmark method.
>>>
>>> As tested on SPEC2017, the visible performance change is a
>>> runtime improvement of about 1.8% (-O2) on 500.perlbench_r
>>> when loading complicated constants from the constant pool
>>> is supported.  There is no visible performance regression
>>> on other benchmarks.
>>
>> The improvement on 500.perlbench_r looks to match what PR63281
>> mentioned, nice!  I'm curious which options and which kinds
>> of CPUs you have tested with.  Since this is a general change,
>> I'd expect we can test with P8/P9/P10 at O2/O3 (or Ofast) at
>> least.
> 
> Great advice! Thanks for pointing this out!
> A few months ago, P8/P9/P10 were tested.  This time, I reran
> SPEC2017 on P10 for -O2 and -O3.  More testing on the latest code
> would be better.

Was it tested previously with your recent commits on constant
building together?  or just with the trunk at that time?  Anyway,
I was curious how it's tested, thanks for replying, good to see
those are covered.  :)  I'd leave the further review to Segher and
David.

BR,
Kewen

> 
> 
> BR,
> Jeff (Jiufu Guo)
> 
>>
>> BR,
>> Kewen
>>
>>>
>>> Bootstrap & regtest pass on ppc64{,le}.
>>> Is this ok for trunk?
>>>
>>> BR,
>>> Jeff (Jiufu Guo)
>>>
>>> PR target/63281
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split
>>> complicated constants to memory.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/powerpc/const_anchors.c: Update to test final-rtl. 
>>> * gcc.target/powerpc/parall_5insn_const.c: Update to keep original test
>>> point.
>>> * gcc.target/powerpc/pr106550.c: Likewise.
>>> * gcc.target/powerpc/pr106550_1.c: Likewise.
>>> * gcc.target/powerpc/pr87870.c: Update according to latest behavior.
>>> * gcc.target/powerpc/pr93012.c: Likewise.
>>>
>>> ---
>>>  gcc/config/rs6000/rs6000.cc | 16 
>>>  .../gcc.target/powerpc/const_anchors.c  |  5 ++---
>>>  .../gcc.target/powerpc/parall_5insn_const.c | 14 --
>>>  gcc/testsuite/gcc.target/powerpc/pr106550.c | 17 +++--
>>>  gcc/testsuite/gcc.target/powerpc/pr106550_1.c   | 15 +--
>>>  gcc/testsuite/gcc.target/powerpc/pr87870.c  |  5 -
>>>  gcc/testsuite/gcc.target/powerpc/pr93012.c  |  4 +++-
>>>  7 files changed, 65 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 4690384cdbe..b9562f1ea0f 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -10292,6 +10292,22 @@ rs6000_emit_set_const (rtx dest, rtx source)
>>>   c = sext_hwi (c, 32);
>>>   emit_move_insn (lo, GEN_INT (c));
>>> }
>>> +
>>> +  /* If it can be stored to the constant pool and profitable.  */
>>> +  else if (base_reg_operand (dest, mode)
>>> +  && num_insns_constant (source, mode) > 2)
>>> +   {
>>> + rtx sym = force_const_mem (mode, source);
>>> + if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
>>> + && use_toc_relative_ref (XEXP (sym, 0), mode))
>>> +   {
>>> + rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest));
>>> + sym = gen_const_mem (mode, toc);
>>> + set_mem_alias_set (sym, get_TOC_alias_set ());
>>> +   }
>>> +
>>> + emit_insn (gen_rtx_SET (dest, sym));
>>> +   }
>>>else
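
The condition added in the hunk above can be sketched in isolation as follows. This is only an illustration under stated assumptions: `ConstStrategy`, `choose_const_strategy` and the `is_base_reg` flag are hypothetical names rather than GCC APIs, and the instruction count is passed in as a plain integer instead of being computed by `num_insns_constant`. The real code also sits in an `else if` chain after other cases, which the sketch ignores.

```cpp
#include <cassert>

// Hypothetical stand-in for the two ways a constant can be materialized.
enum class ConstStrategy { BuildInline, LoadFromPool };

// Sketch of the patch's condition: only when the destination is a base
// register and building the constant would take more than two
// instructions is the constant placed in the constant pool.
inline ConstStrategy
choose_const_strategy (bool is_base_reg, int num_insns)
{
  assert (num_insns >= 1);  // every constant needs at least one insn
  if (is_base_reg && num_insns > 2)
    return ConstStrategy::LoadFromPool;
  return ConstStrategy::BuildInline;
}
```

With the "> 2" threshold discussed in the thread, a two-instruction constant such as a "li; rldicl" pair would still be built inline in this model, while a three-instruction one would be loaded from memory.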

[PING] libffi: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951] (was: [PATCH v5 GCC] libffi/test: Fix compilation for build s

2023-10-25 Thread Thomas Schwinge
Hi!

Ping.


Grüße
 Thomas


On 2023-09-12T12:58:27+0200, I wrote:
> Hi!
>
> On 2020-04-20T14:18:40+0100, "Maciej W. Rozycki via Gcc-patches" 
>  wrote:
>> Fix a problem with the libffi testsuite using a method to determine the
>> compiler to use resulting in the tool being different from one the
>> library has been built with, and causing a catastrophic failure from the
>> inability to actually choose any compiler at all in a cross-compilation
>> configuration.
>
> This has since, as far as I can tell, been resolved properly by H.J. Lu's
> GCC commit 5be7b66998127286fada45e4f23bd8a2056d553e,
> "libffi: Integrate build with GCC", and
> GCC commit 4824ed41ba7cd63e60fd9f8769a58b79935a90d1
> "libffi: Integrate testsuite with GCC testsuite".
>
>> Address this problem by providing a DejaGNU configuration file defining
>> the compiler to use, via the CC_FOR_TARGET TCL variable, set from $CC by
>> autoconf, which will have all the required options set for the target
>> compiler to build executables in the environment configured
>
> As we've found, this is conceptually problematic, as discussed in
> 
> "Consider '--with-build-sysroot=[...]' for target libraries' build-tree 
> testing (instead of build-time 'CC' etc.) [PR109951]".
> I therefore suggest applying to GCC libffi conceptually the same changes
> as I've just pushed for libgomp:
> 
> "libgomp: Consider '--with-build-sysroot=[...]' for target libraries' 
> build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]".
> OK to push the attached
> "libffi: Consider '--with-build-sysroot=[...]' for target libraries' 
> build-tree testing (instead of build-time 'CC' etc.) [PR109951]"?
>
>
> Grüße
>  Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 8b8654d04dcbb7f0a5947bc21efc5b9c60b3b6c6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 11 Sep 2023 10:50:00 +0200
Subject: [PATCH] libffi: Consider '--with-build-sysroot=[...]' for target
 libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]

Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit a0b48358cb1e70e161a87ec5deb7a4b25defba6b
"libffi/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC', 'CXX'.

	PR testsuite/109951
	libffi/
	* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
	: Don't set 'CC_FOR_TARGET', 'CXX_FOR_TARGET', instead
	set 'SYSROOT_CFLAGS_FOR_TARGET'.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* include/Makefile.in: Likewise.
	* man/Makefile.in: Likewise.
	* testsuite/Makefile.in: Likewise.
	* testsuite/lib/libffi.exp (libffi_target_compile): If
	'--with-build-sysroot=[...]' was specified, use it for build-tree
	testing.
---
 libffi/Makefile.in  |  1 +
 libffi/configure| 10 ++
 libffi/configure.ac |  5 +++--
 libffi/include/Makefile.in  |  1 +
 libffi/man/Makefile.in  |  1 +
 libffi/testsuite/Makefile.in|  1 +
 libffi/testsuite/lib/libffi.exp |  7 +++
 7 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/libffi/Makefile.in b/libffi/Makefile.in
index 1d936b5c8a5..3a55212cc00 100644
--- a/libffi/Makefile.in
+++ b/libffi/Makefile.in
@@ -383,6 +383,7 @@ SED = @SED@
 SET_MAKE = @SET_MAKE@
 SHELL = @SHELL@
 STRIP = @STRIP@
+SYSROOT_CFLAGS_FOR_TARGET = @SYSROOT_CFLAGS_FOR_TARGET@
 TARGET = @TARGET@
 TARGETDIR = @TARGETDIR@
 TARGET_OBJ = @TARGET_OBJ@
diff --git a/libffi/configure b/libffi/configure
index 9eac9c907bf..f1efd6987a3 100755
--- a/libffi/configure
+++ b/libffi/configure
@@ -666,6 +666,7 @@ TESTSUBDIR_TRUE
 MAINT
 MAINTAINER_MODE_FALSE
 MAINTAINER_MODE_TRUE
+SYSROOT_CFLAGS_FOR_TARGET
 READELF
 CXXCPP
 CPP
@@ -11634,7 +11635,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11637 "configure"
+#line 11638 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11740,7 +11741,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11743 "configure"
+#line 11744 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15137,9 +15138,10 @@ _ACEOF
 
 
 
+
+
 cat > local.exp < local.exp <

Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-25 Thread HAO CHEN GUI
Hi Haochen,
  The regression cases are caused by "targetm.scalar_mode_supported_p" I added
for scalar mode checking. XImode, OImode and TImode (with -m32) are not
enabled in ix86_scalar_mode_supported_p. So they're excluded from by pieces
operations on i386.

  The original code doesn't do a check for scalar modes. I think it might be
incorrect as not all scalar modes support move and compare optabs. (e.g.
TImode with -m32 on rs6000).

  I drafted a new patch to manually check optabs for scalar mode. Now both
vector and scalar modes are checked for optabs.

  I did a simple test. All former regression cases are back. Could you help do
a full regression test? I am worried about the coverage of my CI system.

Thanks
Gui Haochen

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 7aac575eff8..2af9fcbed18 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
 /* Return true if optabs exists for the mode and certain by pieces
operations.  */
 static bool
-qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
+mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
+  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+return false;
+
   if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
-  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
-return true;
+  && VECTOR_MODE_P (mode)
+  && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
+return false;

   if (op == COMPARE_BY_PIECES
-  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
-  && can_compare_p (EQ, mode, ccp_jump))
-return true;
+  && !can_compare_p (EQ, mode, ccp_jump))
+return false;

-  return false;
+  return true;
 }

 /* Return the widest mode that can be used to perform part of an
@@ -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size, 
by_pieces_operation op)
  {
if (GET_MODE_SIZE (candidate) >= size)
  break;
-   if (qi_vector_mode_supported_p (candidate, op))
+   if (mode_supported_p (candidate, op))
  result = candidate;
  }

@@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int size, 
by_pieces_operation op)
 {
   mode = tmode.require ();
   if (GET_MODE_SIZE (mode) < size
- && targetm.scalar_mode_supported_p (mode))
+ && mode_supported_p (mode, op))
   result = mode;
 }

@@ -1454,7 +1457,7 @@ op_by_pieces_d::smallest_fixed_size_mode_for_size 
(unsigned int size)
  break;

if (GET_MODE_SIZE (candidate) >= size
-   && qi_vector_mode_supported_p (candidate, m_op))
+   && mode_supported_p (candidate, m_op))
  return candidate;
  }
 }
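
The reworked predicate in the patch above can be modelled outside of GCC as a chain of early rejections. The following is a minimal sketch, assuming a hypothetical `ModeCaps` record that stands in for the optab queries; `PiecesOp`, `ModeCaps` and `by_pieces_mode_supported` are illustrative names, not GCC identifiers.

```cpp
// Hypothetical operation kinds, mirroring by_pieces_operation in name only.
enum class PiecesOp { Move, Set, Clear, Compare };

// Hypothetical description of what a target provides for one mode.
struct ModeCaps
{
  bool is_vector;       // stands in for VECTOR_MODE_P (mode)
  bool has_mov;         // stands in for a mov_optab handler
  bool has_vec_dup;     // stands in for a vec_duplicate_optab handler
  bool can_compare_eq;  // stands in for can_compare_p (EQ, mode, ccp_jump)
};

// Every mode, scalar or vector, needs a move pattern.  Vector modes also
// need vec_duplicate for set/clear, and compares need an EQ compare-and-jump.
inline bool
by_pieces_mode_supported (const ModeCaps &caps, PiecesOp op)
{
  if (!caps.has_mov)
    return false;

  if ((op == PiecesOp::Set || op == PiecesOp::Clear)
      && caps.is_vector
      && !caps.has_vec_dup)
    return false;

  if (op == PiecesOp::Compare && !caps.can_compare_eq)
    return false;

  return true;
}
```

Compared with the old "return true on the first match" form, this shape lets one function serve both scalar and vector modes: a scalar mode is never asked about vec_duplicate, but it is still rejected when it lacks a move or compare pattern.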


[PING] libatomic: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951] (was: [PATCH v4 1/5] libatomic/test: Fix compilation for b

2023-10-25 Thread Thomas Schwinge
Hi!

Ping.


Grüße
 Thomas


On 2023-09-12T13:03:28+0200, I wrote:
> Hi!
>
> On 2020-04-04T00:00:44+0100, "Maciej W. Rozycki via Gcc-patches" 
>  wrote:
>> Fix a problem with the libatomic testsuite using a method to determine
>> the compiler to use resulting in the tool being different from one the
>> library has been built with, and causing a catastrophic failure from the
>> lack of a suitable `--sysroot=' option where the `--with-build-sysroot='
>> configuration option has been used to build the compiler resulting in
>> the inability to link executables.
>>
>> Address this problem by providing a DejaGNU configuration file defining
>> the compiler to use, via the GCC_UNDER_TEST TCL variable, set from $CC
>> by autoconf, which will have all the required options set for the target
>> compiler to build executables in the environment configured
>
> As we've found, this is conceptually problematic, as discussed in
> 
> "Consider '--with-build-sysroot=[...]' for target libraries' build-tree 
> testing (instead of build-time 'CC' etc.)
> [PR109951]".
> I therefore suggest applying to libatomic conceptually the same changes
> as I've just pushed for libgomp:
> 
> "libgomp: Consider '--with-build-sysroot=[...]' for target libraries' 
> build-tree testing (instead of build-time 'CC'
> etc.) [PR91884, PR109951]".
> OK to push the attached
> "libatomic: Consider '--with-build-sysroot=[...]' for target libraries' 
> build-tree testing (instead of build-time 'CC' etc.) [PR109951]"?
>
>
> Grüße
>  Thomas


From 584bfb74e802b94c490b963bd05ed520b5c6e453 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 11 Sep 2023 11:36:31 +0200
Subject: [PATCH] libatomic: Consider '--with-build-sysroot=[...]' for target
 libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]

Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit 5ff06d762a88077aff0fb637c931c64e6f47f93d
"libatomic/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC'.

	PR testsuite/109951
	libatomic/
	* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
	* testsuite/lib/libatomic.exp (libatomic_init): If
	'--with-build-sysroot=[...]' was specified, use it for build-tree
	testing.
	* testsuite/libatomic-site-extra.exp.in (GCC_UNDER_TEST): Don't
	set.
	(SYSROOT_CFLAGS_FOR_TARGET): Set.
---
 libatomic/Makefile.in   | 1 +
 libatomic/configure | 7 +--
 libatomic/configure.ac  | 2 ++
 libatomic/testsuite/Makefile.in | 1 +
 libatomic/testsuite/lib/libatomic.exp   | 5 +
 libatomic/testsuite/libatomic-site-extra.exp.in | 2 +-
 6 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index 83efe7d2694..2d2d64ee947 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -328,6 +328,7 @@ SET_MAKE = @SET_MAKE@
 SHELL = @SHELL@
 SIZES = @SIZES@
 STRIP = @STRIP@
+SYSROOT_CFLAGS_FOR_TARGET = @SYSROOT_CFLAGS_FOR_TARGET@
 VERSION = @VERSION@
 XCFLAGS = @XCFLAGS@
 XLDFLAGS = @XLDFLAGS@
diff --git a/libatomic/configure b/libatomic/configure
index 57f320753e1..629ad22e833 100755
--- a/libatomic/configure
+++ b/libatomic/configure
@@ -656,6 +656,7 @@ LIBAT_BUILD_VERSIONED_SHLIB_FALSE
 LIBAT_BUILD_VERSIONED_SHLIB_TRUE
 OPT_LDFLAGS
 SECTION_LDFLAGS
+SYSROOT_CFLAGS_FOR_TARGET
 enable_aarch64_lse
 libtool_VERSION
 MAINT
@@ -11402,7 +11403,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11405 "configure"
+#line 11406 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11508,7 +11509,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11511 "configure"
+#line 11512 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11866,6 +11867,8 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 ;;
 esac
 
+
+
 # Get target configury.
 . ${srcdir}/configure.tgt
 if test -n "$UNSUPPORTED"; then
diff --git a/libatomic/configure.ac b/libatomic/configure.ac
index 318b605a1d7..4beff2d681f 100644
--- a/libatomic/configure.ac
+++ b/libatomic/configure.ac
@@ -170,6 +170,8 @@ case "$target" in
 ;;
 esac

Re: [PATCH] RISC-V: Fix multiple EXCESS test FAILs in RVV testsuite

2023-10-25 Thread Robin Dapp
Hi Juzhe,

I guess that's OK but what's the problem here?  Are the default options
wrong so we need to overwrite them instead of adding some?

Regards
 Robin


[PATCH] RISC-V: Change MD attribute avl_type into avl_type_idx[NFC]

2023-10-25 Thread Juzhe-Zhong
Address Kito's comments on the AVL propagation patch.

Change avl_type into avl_type_idx.

No functionality change.
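
The indirection this change introduces (the attribute yields an operand index, and the AVL type is then read from that operand) can be sketched as below. This is a simplified model, not the back-end code: `InsnSketch`, `vlmax_avl_type_sketch_p` and the constants are hypothetical, the sentinel value 255 is an assumption, and operands are modelled as plain integers instead of RTL.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical constants; the real values live in GCC's RISC-V back end.
constexpr int INVALID_ATTRIBUTE_SKETCH = 255;
constexpr long NONVLMAX_SKETCH = 0;
constexpr long VLMAX_SKETCH = 1;

// Hypothetical stand-in for a recognized instruction.
struct InsnSketch
{
  int avl_type_idx;            // models get_attr_avl_type_idx (rinsn)
  std::vector<long> operands;  // models recog_data.operand[] as integers
};

// Mirrors the shape of vlmax_avl_type_p: no AVL type operand means
// "not VLMAX"; otherwise the operand at the given index decides.
inline bool
vlmax_avl_type_sketch_p (const InsnSketch &insn)
{
  if (insn.avl_type_idx == INVALID_ATTRIBUTE_SKETCH)
    return false;
  return insn.operands.at (static_cast<std::size_t> (insn.avl_type_idx))
	 == VLMAX_SKETCH;
}
```

Storing only the index in the attribute keeps the attribute a constant per pattern, and the operand itself is read only when a caller asks for it.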

gcc/ChangeLog:

* config/riscv/riscv-protos.h (vlmax_avl_type_p): New function.
* config/riscv/riscv-v.cc (vlmax_avl_type_p): Ditto.
* config/riscv/riscv-vsetvl.cc (get_avl): Adapt function.
* config/riscv/vector.md (): Change avl_type into avl_type_idx.

---
 gcc/config/riscv/riscv-protos.h  |   1 +
 gcc/config/riscv/riscv-v.cc  |  12 +++
 gcc/config/riscv/riscv-vsetvl.cc |   2 +-
 gcc/config/riscv/vector.md   | 165 +++
 4 files changed, 92 insertions(+), 88 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6cb9d459ee9..fffd9cd0b8a 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -559,6 +559,7 @@ bool cmp_lmul_le_one (machine_mode);
 bool cmp_lmul_gt_one (machine_mode);
 bool gather_scatter_valid_offset_mode_p (machine_mode);
 bool vls_mode_valid_p (machine_mode);
+bool vlmax_avl_type_p (rtx_insn *);
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e39a9507803..d439ec06af0 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4435,4 +4435,16 @@ expand_popcount (rtx *ops)
   emit_move_insn (dst, x4);
 }
 
+/* Return true if it is VLMAX AVL TYPE.  */
+bool
+vlmax_avl_type_p (rtx_insn *rinsn)
+{
+  extract_insn_cached (rinsn);
+  int index = get_attr_avl_type_idx (rinsn);
+  if (index == INVALID_ATTRIBUTE)
+return false;
+  rtx avl_type = recog_data.operand[index];
+  return INTVAL (avl_type) == VLMAX;
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index e9dd669de98..73a6d4b7406 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -365,7 +365,7 @@ get_avl (rtx_insn *rinsn)
 
   if (!has_vl_op (rinsn))
 return NULL_RTX;
-  if (get_attr_avl_type (rinsn) == VLMAX)
+  if (vlmax_avl_type_p (rinsn))
 return RVV_VLMAX;
   extract_insn_cached (rinsn);
   return recog_data.operand[get_attr_vl_op_idx (rinsn)];
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ef91950178f..cea3dbf37a6 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -800,31 +800,22 @@
(const_int INVALID_ATTRIBUTE)))
 
 ;; The avl type value.
-(define_attr "avl_type" ""
-  (cond [(eq_attr "mode" 
"V1QI,V2QI,V4QI,V8QI,V16QI,V32QI,V64QI,V128QI,V256QI,V512QI,V1024QI,V2048QI,V4096QI,
- 
V1BI,V2BI,V4BI,V8BI,V16BI,V32BI,V64BI,V128BI,V256BI,V512BI,V1024BI,V2048BI,V4096BI,
- 
V1HI,V2HI,V4HI,V8HI,V16HI,V32HI,V64HI,V128HI,V256HI,V512HI,V1024HI,V2048HI,
- 
V1SI,V2SI,V4SI,V8SI,V16SI,V32SI,V64SI,V128SI,V256SI,V512SI,V1024SI,
- 
V1DI,V2DI,V4DI,V8DI,V16DI,V32DI,V64DI,V128DI,V256DI,V512DI,
- 
V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
- 
V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
- 
V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
-  (symbol_ref "riscv_vector::NONVLMAX")
-   (eq_attr "type" "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
+(define_attr "avl_type_idx" ""
+  (cond [(eq_attr "type" 
"vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
  vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
  
vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
  vfclass,vired,viwred,vfredu,vfredo,vfwredu,vfwredo,\
  vimovxv,vfmovfv,vlsegde,vlsegdff")
-  (symbol_ref "INTVAL (operands[7])")
+  (const_int 7)
 (eq_attr "type" "vldm,vstm,vimov,vmalu,vmalu")
-  (symbol_ref "INTVAL (operands[5])")
+  (const_int 5)
 
 ;; If operands[3] of "vlds" is not vector mode, it is pred_broadcast.
 ;; wheras it is pred_strided_load if operands[3] is vector mode.
 (eq_attr "type" "vlds")
   (if_then_else (match_test "VECTOR_MODE_P (GET_MODE (operands[3]))")
 (const_int INVALID_ATTRIBUTE)
-(symbol_ref "INTVAL (operands[7])"))
+(const_int 7))
 
 (eq_attr "type" "vldux,vldox,vialu,vshift,viminmax,vimul,vidiv,vsalu,\
  viwalu,viwmul,vnshift,vaalu,vsmul,vsshift,\
@@ -832,18 +823,18 @@
  vfsgnj,vfcmp,vslideup,vslidedown,vislide1up,\
  
vislide1down,vfslide1up,vfslide1down,vgather,viwmuladd,vfwmuladd,\
  vlsegds,vlsegdux,vlsegdox")
-  (symbol_ref "INTVAL (operands[8])")
+  (const_int 8)
 (eq_attr "type" "vstux,vstox,vssegts,vssegtux,vssegtox")
-  (symbol_ref "INTVAL 

Re: [PATCH] RISC-V: Change MD attribute avl_type into avl_type_idx[NFC]

2023-10-25 Thread Robin Dapp
LGTM.

Regards
 Robin


[PING^2] More '#ifdef ASM_OUTPUT_DEF' -> 'if (TARGET_SUPPORTS_ALIASES)' etc. (was: [PATCH][v2] Introduce TARGET_SUPPORTS_ALIASES)

2023-10-25 Thread Thomas Schwinge
Hi!

Ping.


Grüße
 Thomas


On 2023-09-19T10:47:56+0200, I wrote:
> Hi!
>
> Ping.
>
>
> Grüße
>  Thomas
>
>
> On 2023-09-08T14:02:50+0200, I wrote:
>> Hi!
>>
>> On 2017-08-10T15:42:13+0200, Jan Hubicka  wrote:
 On 07/31/2017 11:57 AM, Yuri Gribov wrote:
 > On Mon, Jul 31, 2017 at 9:04 AM, Martin Liška  wrote:
 >> Doing the transformation suggested by Honza.
>>
>> ... which was:
>>
>> | On 2017-07-24T16:06:22+0200, Jan Hubicka  wrote:
>> | > we probably should turn ASM_OUTPUT_DEF ifdefs into a conditional compilation
>> | > incrementally.
>>
 >From 78ee08b25d22125cb1fa248bac98ef1e84504761 Mon Sep 17 00:00:00 2001
 From: marxin 
 Date: Tue, 25 Jul 2017 13:11:28 +0200
 Subject: [PATCH] Introduce TARGET_SUPPORTS_ALIASES
>>
>> ..., and got pushed as commit a8b522b483ebb8c972ecfde8779a7a6ec16aecd6
>> (Subversion r251048) "Introduce TARGET_SUPPORTS_ALIASES".
>>
>> I don't know if that was actually intentional here, or just an
>> "accident", but such changes actually allow that a back end may or may
>> not provide symbol aliasing support ('TARGET_SUPPORTS_ALIASES')
>> independent of '#ifdef ASM_OUTPUT_DEF', and in particular, depending not
>> just on static but instead on dynamic (run-time) configuration.  This is
>> relevant for the nvptx back end's '-malias' flag.
>>
>> There did remain a few instances where we currently still assume that
>> from '#ifdef ASM_OUTPUT_DEF' follows 'TARGET_SUPPORTS_ALIASES', which I'm
>> adjusting in the attached (with '--ignore-space-change', for easy review)
>> "More '#ifdef ASM_OUTPUT_DEF' -> 'if (TARGET_SUPPORTS_ALIASES)' etc.".
>> OK to push?
>>
>> These changes are necessary to cure nvptx regressions raised in
>> 
>> "[nvptx] Use .alias directive for mptx >= 6.3", addressing the comment:
>> "[...] remains to be analyzed".
>>
>>
>> Grüße
>>  Thomas


From 4c725226c3657adb775af274876de5077b8fbf45 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 7 Sep 2023 22:15:08 +0200
Subject: [PATCH] More '#ifdef ASM_OUTPUT_DEF' -> 'if
 (TARGET_SUPPORTS_ALIASES)' etc.

Per commit a8b522b483ebb8c972ecfde8779a7a6ec16aecd6 (Subversion r251048)
"Introduce TARGET_SUPPORTS_ALIASES", there is the idea that a back end may or
may not provide symbol aliasing support ('TARGET_SUPPORTS_ALIASES') independent
of '#ifdef ASM_OUTPUT_DEF', and in particular, depending not just on static but
instead on dynamic (run-time) configuration.  There did remain a few instances
where we currently still assume that from '#ifdef ASM_OUTPUT_DEF' follows
'TARGET_SUPPORTS_ALIASES'.  Change these to 'if (TARGET_SUPPORTS_ALIASES)',
similarly, or 'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
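
The difference between the static and the dynamic check can be illustrated with a small stand-alone model. It is a sketch under stated assumptions: `ASM_OUTPUT_DEF_SKETCH`, `TargetOptionsSketch`, `target_supports_aliases_sketch` and `emit_alias_sketch` are hypothetical names, the run-time flag only loosely models an option such as nvptx's '-malias', and the "alias = target" output format is invented for the example.

```cpp
#include <string>

// Compile-time capability: alias syntax exists for this target at all.
#define ASM_OUTPUT_DEF_SKETCH 1

// Hypothetical run-time configuration of the back end.
struct TargetOptionsSketch
{
  bool alias_enabled;
};

// Models a TARGET_SUPPORTS_ALIASES that depends on run-time state and
// therefore cannot be derived from the preprocessor symbol alone.
inline bool
target_supports_aliases_sketch (const TargetOptionsSketch &opts)
{
#ifdef ASM_OUTPUT_DEF_SKETCH
  return opts.alias_enabled;
#else
  return false;
#endif
}

// A '#ifdef'-only guard would emit the alias whenever the macro is defined.
// Guarding with the run-time predicate also honours the option.
inline std::string
emit_alias_sketch (const TargetOptionsSketch &opts,
		   const std::string &alias, const std::string &target)
{
  if (target_supports_aliases_sketch (opts))
    return alias + " = " + target;
  return "";  // nothing emitted when aliases are unavailable
}
```

In this model a purely preprocessor-based guard would take the alias path even with `alias_enabled` set to false, which is the kind of mismatch the commit message describes.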

	gcc/
	* ipa-icf.cc (sem_item::target_supports_symbol_aliases_p):
	'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);' before
	'return true;'.
	* ipa-visibility.cc (function_and_variable_visibility): Change
	'#ifdef ASM_OUTPUT_DEF' to 'if (TARGET_SUPPORTS_ALIASES)'.
	* varasm.cc (output_constant_pool_contents)
	[#ifdef ASM_OUTPUT_DEF]:
	'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
	(do_assemble_alias) [#ifdef ASM_OUTPUT_DEF]:
	'if (!TARGET_SUPPORTS_ALIASES)',
	'gcc_checking_assert (seen_error ());'.
	(assemble_alias): Change '#if !defined (ASM_OUTPUT_DEF)' to
	'if (!TARGET_SUPPORTS_ALIASES)'.
	(default_asm_output_anchor):
	'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
---
 gcc/ipa-icf.cc|  1 +
 gcc/ipa-visibility.cc |  8 +---
 gcc/varasm.cc | 13 ++---
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/gcc/ipa-icf.cc b/gcc/ipa-icf.cc
index 836d0914ded..bbdfd445397 100644
--- a/gcc/ipa-icf.cc
+++ b/gcc/ipa-icf.cc
@@ -218,6 +218,7 @@ sem_item::target_supports_symbol_aliases_p (void)
 #if !defined (ASM_OUTPUT_DEF) || (!defined(ASM_OUTPUT_WEAK_ALIAS) && !defined (ASM_WEAKEN_DECL))
   return false;
 #else
+  gcc_checking_assert (TARGET_SUPPORTS_ALIASES);
   return true;
 #endif
 }
diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc
index 8ec82bb333e..8ce56114ee3 100644
--- a/gcc/ipa-visibility.cc
+++ b/gcc/ipa-visibility.cc
@@ -622,7 +622,8 @@ function_and_variable_visibility (bool whole_program)
   /* All aliases should be processed at this point.  */
   gcc_checking_assert (!alias_pairs || !alias_pairs->length ());
 
-#ifdef ASM_OUTPUT_DEF
+  if (TARGET_SUPPORTS_ALIASES)
+{
   FOR_EACH_DEFINED_FUNCTION (node)
 	{
 	  if (node->get_availability () != AVAIL_INTERPOSABLE
@@ -643,7 +644,8 @@ function_and_variable_visibility (bool whole_program)
 
 	  if (!alias)
 		{
-	  alias = dyn_cast (node->noninterposable_alias ());
+		  alias
+		= dyn_cast (node->noninterposable_alias ());
 		  gcc_assert (al

Re: Re: [PATCH] RISC-V: Fix multiple EXCESS test FAILs in RVV testsuite

2023-10-25 Thread juzhe.zh...@rivai.ai
In rvv.exp, we specify -march=rv64gcv_zfh.

However, when I built the toolchain with -march=rv64gcv_zfh_zvfh, linking
failed.

All other tests, such as compress_run-1.c, work fine with:
/* { dg-options "-O3 --param riscv-autovec-preference=fixed-vlmax -Wno-psabi" } */


So I adapted these tests to match the others.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-25 16:35
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix multiple EXCESS test FAILs in RVV testsuite
Hi Juzhe,
 
I guess that's OK but what's the problem here?  Are the default options
wrong so we need to overwrite them instead of adding some?
 
Regards
Robin
 


Re: Re: [PATCH] RISC-V: Fix multiple EXCESS test FAILs in RVV testsuite

2023-10-25 Thread juzhe.zh...@rivai.ai
FAIL: gcc.target/riscv/rvv/autovec/slp-mask-run-1.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax  check-function-bodies foo1
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax  check-function-bodies foo2
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax  check-function-bodies foo3
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax  check-function-bodies foo4
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax  check-function-bodies foo5
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-run.c -O3 -ftree-vectorize (test for excess errors)

The patch fixes all of these FAILs.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-25 16:35
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix multiple EXCESS test FAILs in RVV testsuite
Hi Juzhe,
 
I guess that's OK but what's the problem here?  Are the default options
wrong so we need to overwrite them instead of adding some?
 
Regards
Robin
 


Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-25 Thread Richard Sandiford
HAO CHEN GUI  writes:
> Hi Haochen,
>   The regression cases are caused by "targetm.scalar_mode_supported_p" I added
> for scalar mode checking. XImode, OImode and TImode (with -m32) are not
> enabled in ix86_scalar_mode_supported_p. So they're excluded from by pieces
> operations on i386.
>
>   The original code doesn't do a check for scalar modes. I think it might be
> incorrect as not all scalar modes support move and compare optabs. (e.g.
> TImode with -m32 on rs6000).
>
>   I drafted a new patch to manually check optabs for scalar mode. Now both
> vector and scalar modes are checked for optabs.
>
>   I did a simple test. All former regression cases are back. Could you help do
> a full regression test? I am worried about the coverage of my CI system.

Thanks for the quick fix.  The patch LGTM FWIW.  Just a small suggestion
for the function name:

>
> Thanks
> Gui Haochen
>
> patch.diff
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 7aac575eff8..2af9fcbed18 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
>  /* Return true if optabs exists for the mode and certain by pieces
> operations.  */
>  static bool
> -qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
> +mode_supported_p (fixed_size_mode mode, by_pieces_operation op)

Might be worth calling this something more specific, such as
by_pieces_mode_supported_p.

Otherwise the patch is OK for trunk if it passes the x86 testing.

Thanks,
Richard

>  {
> +  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
> +return false;
> +
>if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
> -  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
> -return true;
> +  && VECTOR_MODE_P (mode)
> +  && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
> +return false;
>
>if (op == COMPARE_BY_PIECES
> -  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
> -  && can_compare_p (EQ, mode, ccp_jump))
> -return true;
> +  && !can_compare_p (EQ, mode, ccp_jump))
> +return false;
>
> -  return false;
> +  return true;
>  }
>
>  /* Return the widest mode that can be used to perform part of an
> @@ -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size, 
> by_pieces_operation op)
> {
>   if (GET_MODE_SIZE (candidate) >= size)
> break;
> - if (qi_vector_mode_supported_p (candidate, op))
> + if (mode_supported_p (candidate, op))
> result = candidate;
> }
>
> @@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int size, 
> by_pieces_operation op)
>  {
>mode = tmode.require ();
>if (GET_MODE_SIZE (mode) < size
> -   && targetm.scalar_mode_supported_p (mode))
> +   && mode_supported_p (mode, op))
>result = mode;
>  }
>
> @@ -1454,7 +1457,7 @@ op_by_pieces_d::smallest_fixed_size_mode_for_size 
> (unsigned int size)
> break;
>
>   if (GET_MODE_SIZE (candidate) >= size
> - && qi_vector_mode_supported_p (candidate, m_op))
> + && mode_supported_p (candidate, m_op))
> return candidate;
> }
>  }


Re: PR111754

2023-10-25 Thread Richard Sandiford
Sigh, I knew I should have waited until the morning to proof-read
and send this.

Richard Sandiford  writes:
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 40767736389..00fce4945a7 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10743,27 +10743,37 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, 
> const vec_perm_indices &sel,
>unsigned res_npatterns, res_nelts_per_pattern;
>unsigned HOST_WIDE_INT res_nelts;
>  
> -  /* (1) If SEL is a suitable mask as determined by
> - valid_mask_for_fold_vec_perm_cst_p, then:
> - res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> - res_nelts_per_pattern = max of nelts_per_pattern between
> -  ARG0, ARG1 and SEL.
> - (2) If SEL is not a suitable mask, and TYPE is VLS then:
> - res_npatterns = nelts in result vector.
> - res_nelts_per_pattern = 1.
> - This exception is made so that VLS ARG0, ARG1 and SEL work as before.  
> */
> -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> -{
> -  res_npatterns
> - = std::max (VECTOR_CST_NPATTERNS (arg0),
> - std::max (VECTOR_CST_NPATTERNS (arg1),
> -   sel.encoding ().npatterns ()));
> +  /* First try to implement the fold in a VLA-friendly way.
> +
> + (1) If the selector is simply a duplication of N elements, the
> +  result is likewise a duplication of N elements.
> +
> + (2) If the selector is N elements followed by a duplication
> +  of N elements, the result is too.
>  
> -  res_nelts_per_pattern
> - = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
> - std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
> -   sel.encoding ().nelts_per_pattern ()));
> + (3) If the selector is N elements followed by an interleaving
> +  of N linear series, the situation is more complex.
>  
> +  valid_mask_for_fold_vec_perm_cst_p detects whether we
> +  can handle this case.  If we can, then each of the N linear
> +  series either (a) selects the same element each time or
> +  (b) selects a linear series from one of the input patterns.
> +
> +  If (b) holds for one of the linear series, the result
> +  will contain a linear series, and so the result will have
> +  the same shape as the selector.  If (a) holds for all of
> +  the lienar series, the result will be the same as (2) above.

linear

> +
> +  (b) can only hold if one of the inputs pattern has a

input patterns

Sorry for the typos.

Richard

> +  stepped encoding.  */
> +  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> +{
> +  res_npatterns = sel.encoding ().npatterns ();
> +  res_nelts_per_pattern = sel.encoding ().nelts_per_pattern ();
> +  if (res_nelts_per_pattern == 3
> +   && VECTOR_CST_NELTS_PER_PATTERN (arg0) < 3
> +   && VECTOR_CST_NELTS_PER_PATTERN (arg1) < 3)
> + res_nelts_per_pattern = 2;
>res_nelts = res_npatterns * res_nelts_per_pattern;
>  }
>else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
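
The shape selection in the hunk above can be sketched in isolation. This is only a minimal model, assuming the selector has already passed valid_mask_for_fold_vec_perm_cst_p; EncodedShape, result_shape and encoded_nelts are hypothetical names used for illustration and are not GCC API. Only the arithmetic is taken from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for an encoded vector constant's shape (not a GCC type).
// nelts_per_pattern: 1 = duplicated elements, 2 = N elements followed by a
// duplication, 3 = N elements followed by interleaved linear series.
struct EncodedShape
{
  unsigned npatterns;
  unsigned nelts_per_pattern;
};

// Start from the selector's shape.  If the selector is stepped but neither
// input has a stepped encoding, no linear series can appear in the result,
// so the result falls back to the "N elements + duplication" form.
EncodedShape
result_shape (EncodedShape sel, unsigned arg0_nelts_per_pattern,
              unsigned arg1_nelts_per_pattern)
{
  assert (sel.nelts_per_pattern >= 1 && sel.nelts_per_pattern <= 3);
  EncodedShape res = sel;
  if (res.nelts_per_pattern == 3
      && arg0_nelts_per_pattern < 3
      && arg1_nelts_per_pattern < 3)
    res.nelts_per_pattern = 2;
  return res;
}

// Number of elements that have to be computed explicitly.
unsigned
encoded_nelts (EncodedShape s)
{
  return s.npatterns * s.nelts_per_pattern;
}
```

For a stepped selector with 4 patterns and two non-stepped inputs, this model encodes 8 elements rather than 12.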


Re: [PATCH] RISC-V: Fix multiple EXCESS test FAILs in RVV testsuite

2023-10-25 Thread Robin Dapp


> However, when I built the toolchain with -march=rv64gcv_zfh_zvfh,
> linking failed.

Hmm, is it about zvfh or why does linking fail?

Regards
 Robin


Re: Re: [PATCH] RISC-V: Fix multiple EXCESS test FAILs in RVV testsuite

2023-10-25 Thread juzhe.zh...@rivai.ai
Hmmm. I am not familiar with Binutils...

I just adapted the tests like the others in the testsuite to make them consistent.
And it turns out that this fixes the issues...



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-25 16:44
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix multiple EXCESS test FAILs in RVV testsuite
 
> However, when I built the toolchain with -march=rv64gcv_zfh_zvfh,
> linking failed.
 
Hmm, is it about zvfh or why does linking fail?
 
Regards
Robin
 


RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-25 Thread Jiang, Haochen
> -Original Message-
> From: Richard Sandiford 
> Sent: Wednesday, October 25, 2023 4:40 PM
> To: HAO CHEN GUI 
> Cc: Jiang, Haochen ; gcc-patches  patc...@gcc.gnu.org>
> Subject: Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces
> [PR111449]
> 
> HAO CHEN GUI  writes:
> > Hi Haochen,
> >   The regression cases are caused by "targetm.scalar_mode_supported_p"
> > I added for scalar mode checking. XImode, OImode and TImode (with
> > -m32) are not enabled in ix86_scalar_mode_supported_p. So they're
> > excluded from by pieces operations on i386.
> >
> >   The original code doesn't do a check for scalar modes. I think it
> > > might be incorrect as not all scalar modes support move and compare
> > > optabs. (e.g. TImode with -m32 on rs6000).
> >
> >   I drafted a new patch to manually check optabs for scalar mode. Now
> > both vector and scalar modes are checked for optabs.
> >
> >   I did a simple test. All former regression cases are back. Could you
> > > help do a full regression test? I am worried about the coverage of my CI
> > system.

Thanks for that. I am running the regression test now.

Thx,
Haochen

> 
> Thanks for the quick fix.  The patch LGTM FWIW.  Just a small suggestion for
> the function name:
> 
> >
> > Thanks
> > Gui Haochen
> >
> > patch.diff
> > > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > > index 7aac575eff8..2af9fcbed18 100644
> > --- a/gcc/expr.cc
> > +++ b/gcc/expr.cc
> > @@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
> >  /* Return true if optabs exists for the mode and certain by pieces
> > operations.  */
> >  static bool
> > > -qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
> > > +mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
> 
> Might be worth calling this something more specific, such as
> by_pieces_mode_supported_p.
> 
> Otherwise the patch is OK for trunk if it passes the x86 testing.
> 
> Thanks,
> Richard
> 
> >  {
> > +  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
> > +return false;
> > +
> >if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
> > -  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
> > -return true;
> > +  && VECTOR_MODE_P (mode)
> > +  && optab_handler (vec_duplicate_optab, mode) ==
> CODE_FOR_nothing)
> > +return false;
> >
> >if (op == COMPARE_BY_PIECES
> > -  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
> > -  && can_compare_p (EQ, mode, ccp_jump))
> > -return true;
> > +  && !can_compare_p (EQ, mode, ccp_jump))
> > +return false;
> >
> > -  return false;
> > +  return true;
> >  }
> >
> >  /* Return the widest mode that can be used to perform part of an @@
> > -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size,
> by_pieces_operation op)
> >   {
> > if (GET_MODE_SIZE (candidate) >= size)
> >   break;
> > -   if (qi_vector_mode_supported_p (candidate, op))
> > +   if (mode_supported_p (candidate, op))
> >   result = candidate;
> >   }
> >
> > @@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int
> size, by_pieces_operation op)
> >  {
> >mode = tmode.require ();
> >if (GET_MODE_SIZE (mode) < size
> > - && targetm.scalar_mode_supported_p (mode))
> > + && mode_supported_p (mode, op))
> >result = mode;
> >  }
> >
> > @@ -1454,7 +1457,7 @@
> op_by_pieces_d::smallest_fixed_size_mode_for_size (unsigned int size)
> >   break;
> >
> > if (GET_MODE_SIZE (candidate) >= size
> > -   && qi_vector_mode_supported_p (candidate, m_op))
> > +   && mode_supported_p (candidate, m_op))
> >   return candidate;
> >   }
> >  }


Re: [PATCH, OpenACC 2.7] Implement self clause for compute constructs

2023-10-25 Thread Thomas Schwinge
Hi!

On 2023-06-13T23:52:25+0800, Chung-Lin Tang via Gcc-patches 
 wrote:
> This patch implements the compiler side for the 'self' clause for compute 
> constructs:
> parallel, kernels, and serial.
>
> As you know, the actual "local device" device type for libgomp is not yet 
> implemented,
> so the libgomp side is basically just a simple duplicate of what 
> host-fallback is doing,

Thanks, and ACK.

> though everything else should be completed by this patch.

What also is missing is allowing nested OpenACC compute constructs, which
GCC currently rejects.  (Just removing the nesting restriction isn't
sufficient, I think: will also have to think about explicit/implicit data
(and other?) clauses in nested compute constructs, for example, so this
doesn't seem entirely trivial to implement.)  I'm fine to defer that item
until actual multicore CPU "device" support emerges (for avoidance of
doubt: we're not currently working on that).

> Tested on powerpc64le-linux/nvptx, x86_64-linux/amdgcn tests pending.
> Is this okay for trunk?

With minor textual conflicts resolved, I've pushed this to master branch
in commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs", see
attached.


I'll then apply/submit a number of follow-on commits.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Tue, 13 Jun 2023 08:44:31 -0700
Subject: [PATCH] OpenACC 2.7: Implement self clause for compute constructs

This patch implements the 'self' clause for compute constructs: parallel,
kernels, and serial. This clause conditionally uses the local device
(the host multi-core CPU) as the executing device of the compute region.

The actual implementation of the "local device" device type inside libgomp
(presumably using pthreads) is not yet complete, so the libgomp side is
implemented exactly the same as host-fallback mode (so as of now,
it essentially behaves like the 'if' clause with the condition inverted).
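
The relationship between the two clauses described above can be sketched as follows. This is only an illustration under stated assumptions: Where, with_if and with_self are hypothetical names, the interaction of both clauses on one construct is not modelled, and "Host" covers both the host-fallback path and the not yet implemented local device.

```cpp
#include <cassert>

// Hypothetical execution targets for a compute region.
enum class Where { Host, Accelerator };

// 'if (cond)': offload when the condition holds, otherwise run on the host.
Where
with_if (bool cond)
{
  return cond ? Where::Accelerator : Where::Host;
}

// 'self (cond)': run on the local device (currently the same as host
// fallback) when the condition holds, otherwise offload.
Where
with_self (bool cond)
{
  return cond ? Where::Host : Where::Accelerator;
}
```

In this model with_self (c) always equals with_if (!c), which is the "condition inverted" behaviour the commit message mentions.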

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_oacc_compute_clause_self): New function.
	(c_parser_oacc_all_clauses): Add new 'bool compute_p = false'
	parameter, add parsing of self clause when compute_p is true.
	(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSE_MASK): Likewise.
	(OACC_SERIAL_CLAUSE_MASK): Likewise.
	(c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
	set compute_p argument to true.
	* c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_oacc_compute_clause_self): New function.
	(cp_parser_oacc_all_clauses): Add new 'bool compute_p = false'
	parameter, add parsing of self clause when compute_p is true.
	(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSE_MASK): Likewise.
	(OACC_SERIAL_CLAUSE_MASK): Likewise.
	(cp_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
	set compute_p argument to true.
	* pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_SELF case.
	* semantics.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case, merged
	with OMP_CLAUSE_IF case.

gcc/fortran/ChangeLog:

	* gfortran.h (typedef struct gfc_omp_clauses): Add self_expr field.
	* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_SELF.
	(gfc_match_omp_clauses): Add handling for OMP_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSES): Add OMP_CLAUSE_SELF.
	(OACC_KERNELS_CLAUSES): Likewise.
	(OACC_SERIAL_CLAUSES): Likewise.
	(resolve_omp_clauses): Add handling for omp_clauses->self_expr.
	* trans-openmp.cc (gfc_trans_omp_clauses): Add handling of
	clauses->self_expr and building of OMP_CLAUSE_SELF tree clause.
	(gfc_split_omp_clauses): Add handling of self_expr field copy.

gcc/ChangeLog:

	* gimplify.cc (gimplify_scan_omp_clauses): Add OMP_CLAUSE_SELF case.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-expand.cc (expand_omp_target): Add OMP_CLAUSE_SELF expansion code.
	* omp-low.cc (scan_sharing_clauses): Add OMP_CLAUSE_SELF case.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_SELF enum.
	* tree-nested.cc (convert_nonlocal_omp_clauses): Add OMP_CLAUSE_SELF
	case.
	(convert_local_omp_clauses): Likewise.
	* tree-pretty-print.cc (dump_omp_clause): Add OMP_CLAUSE_SELF case.
	* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_SELF entry.
	(omp_clause_code_name): Likewise.
	* tree.h (OMP_CLAUSE_SELF_EXPR): New macro.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/self-clause-1.c: New test.
	* c-c++-common/goacc/self-clause-2.c: New test.
	* gfortran.dg/goacc/self.f95: New test.

include/ChangeLog:

	* gomp-constants.h (GOACC_FLAG_LOCAL_DEVICE): New flag bit value.

libgomp/ChangeLog:

	* oacc-parallel.c (GOACC_parallel_keyed): Add code to handle
	GO

Enable 'c-c++-common/goacc/{if,self}-clause-1.c' for C++ (was: [PATCH, OpenACC 2.7] Implement self clause for compute constructs)

2023-10-25 Thread Thomas Schwinge
Hi!

On 2023-10-25T10:57:06+0200, I wrote:
> With minor textual conflicts resolved, I've pushed this to master branch
> in commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
> "OpenACC 2.7: Implement self clause for compute constructs", see
> attached.
>
>
> I'll then apply/submit a number of follow-on commits.

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/self-clause-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-skip-if "not yet" { c++ } } */

Pushed to master branch commit 76cc5463227308c4dd2b70ccfe04d2b5361db0fe
"Enable 'c-c++-common/goacc/{if,self}-clause-1.c' for C++", see attached.

> +
> +void
> +f (int b)
> +{
> +  struct { int i; } *p;
> +
> +#pragma acc parallel self self(b) /* { dg-error "too many 'self' clauses" } */
> +  ;
> +#pragma acc parallel self(*p) /* { dg-error "used struct type value where scalar is required" } */
> +  ;
> +
> +#pragma acc kernels self self(b) /* { dg-error "too many 'self' clauses" } */
> +  ;
> +#pragma acc kernels self(*p) /* { dg-error "used struct type value where scalar is required" } */
> +  ;
> +
> +#pragma acc serial self self(b) /* { dg-error "too many 'self' clauses" } */
> +  ;
> +#pragma acc serial self(*p) /* { dg-error "used struct type value where scalar is required" } */
> +  ;
> +}


Grüße
 Thomas


>From 76cc5463227308c4dd2b70ccfe04d2b5361db0fe Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 20 Oct 2023 14:07:37 +0200
Subject: [PATCH] Enable 'c-c++-common/goacc/{if,self}-clause-1.c' for C++

As discovered via recent
commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs",
'c-c++-common/goacc/if-clause-1.c', which the new
'c-c++-common/goacc/self-clause-1.c' was copied from, was not enabled for C++.

	gcc/testsuite/
	* c-c++-common/goacc/if-clause-1.c: Enable for C++
	* c-c++-common/goacc/self-clause-1.c: Likewise.
---
 gcc/testsuite/c-c++-common/goacc/if-clause-1.c   |  6 --
 gcc/testsuite/c-c++-common/goacc/self-clause-1.c | 14 ++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/if-clause-1.c b/gcc/testsuite/c-c++-common/goacc/if-clause-1.c
index 85abf1659e9..d78520bdd1b 100644
--- a/gcc/testsuite/c-c++-common/goacc/if-clause-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/if-clause-1.c
@@ -1,4 +1,4 @@
-/* { dg-skip-if "not yet" { c++ } } */
+/* See also 'self-clause-1.c'.  */
 
 void
 f (void)
@@ -6,5 +6,7 @@ f (void)
   struct { int i; } *p;
 #pragma acc data copyout(p) if(1) if(1) /* { dg-error "too many 'if' clauses" } */
   ;
-#pragma acc update device(p) if(*p) /* { dg-error "used struct type value where scalar is required" } */
+#pragma acc update device(p) if(*p)
+  /* { dg-error {used struct type value where scalar is required} {} { target c } .-1 }
+ { dg-error {could not convert '\* p' from 'f\(\)::' to 'bool'} {} { target c++ } .-2 } */
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/self-clause-1.c b/gcc/testsuite/c-c++-common/goacc/self-clause-1.c
index ed5d072e81f..fe892bea210 100644
--- a/gcc/testsuite/c-c++-common/goacc/self-clause-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/self-clause-1.c
@@ -1,4 +1,4 @@
-/* { dg-skip-if "not yet" { c++ } } */
+/* See also 'if-clause-1.c'.  */
 
 void
 f (int b)
@@ -7,16 +7,22 @@ f (int b)
 
 #pragma acc parallel self self(b) /* { dg-error "too many 'self' clauses" } */
   ;
-#pragma acc parallel self(*p) /* { dg-error "used struct type value where scalar is required" } */
+#pragma acc parallel self(*p)
+  /* { dg-error {used struct type value where scalar is required} {} { target c } .-1 }
+ { dg-error {could not convert '\* p' from 'f\(int\)::' to 'bool'} {} { target c++ } .-2 } */
   ;
 
 #pragma acc kernels self self(b) /* { dg-error "too many 'self' clauses" } */
   ;
-#pragma acc kernels self(*p) /* { dg-error "used struct type value where scalar is required" } */
+#pragma acc kernels self(*p)
+  /* { dg-error {used struct type value where scalar is required} {} { target c } .-1 }
+ { dg-error {could not convert '\* p' from 'f\(int\)::' to 'bool'} {} { target c++ } .-2 } */
   ;
 
 #pragma acc serial self self(b) /* { dg-error "too many 'self' clauses" } */
   ;
-#pragma acc serial self(*p) /* { dg-error "used struct type value where scalar is required" } */
+#pragma acc serial self(*p)
+  /* { dg-error {used struct type value where scalar is required} {} { target c } .-1 }
+ { dg-error {could not convert '\* p' from 'f\(int\)::' to 'bool'} {} { target c++ } .-2 } */
   ;
 }
-- 
2.34.1



[PATCH] RISC-V: Export some functions from riscv-vsetvl to riscv-v

2023-10-25 Thread Juzhe-Zhong
Address Kito's comments on the AVL propagation patch.

Export the functions that are used not only by the VSETVL pass but also by the
AVL propagation pass.

No functional change.
gcc/ChangeLog:

* config/riscv/riscv-protos.h (has_vl_op): Export from riscv-vsetvl to 
riscv-v
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(nonvlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(count_regno_occurrences): Ditto.
* config/riscv/riscv-v.cc (has_vl_op): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(nonvlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(get_vlmul): Ditto.
(count_regno_occurrences): Ditto.
* config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
(has_vl_op): Ditto.
(get_sew): Ditto.
(get_vlmul): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(count_regno_occurrences): Ditto.
(validate_change_or_fail): Ditto.

---
 gcc/config/riscv/riscv-protos.h  |  8 +++
 gcc/config/riscv/riscv-v.cc  | 83 
 gcc/config/riscv/riscv-vsetvl.cc | 70 ---
 3 files changed, 91 insertions(+), 70 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index fffd9cd0b8a..668d75043ca 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -560,6 +560,14 @@ bool cmp_lmul_gt_one (machine_mode);
 bool gather_scatter_valid_offset_mode_p (machine_mode);
 bool vls_mode_valid_p (machine_mode);
 bool vlmax_avl_type_p (rtx_insn *);
+bool has_vl_op (rtx_insn *);
+bool tail_agnostic_p (rtx_insn *);
+void validate_change_or_fail (rtx, rtx *, rtx, bool);
+bool nonvlmax_avl_type_p (rtx_insn *);
+bool vlmax_avl_p (rtx);
+uint8_t get_sew (rtx_insn *);
+enum vlmul_type get_vlmul (rtx_insn *);
+int count_regno_occurrences (rtx_insn *, unsigned int);
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index d439ec06af0..3fe8125801b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4447,4 +4447,87 @@ vlmax_avl_type_p (rtx_insn *rinsn)
   return INTVAL (avl_type) == VLMAX;
 }
 
+/* Return true if it is an RVV instruction that depends on the VL global
+   status register.  */
+bool
+has_vl_op (rtx_insn *rinsn)
+{
+  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
+}
+
+/* Get default tail policy.  */
+static bool
+get_default_ta ()
+{
+  /* For the instruction that doesn't require TA, we still need a default value
+ to emit vsetvl. We pick up the default value according to prefer policy. */
+  return (bool) (get_prefer_tail_policy () & 0x1
+|| (get_prefer_tail_policy () >> 1 & 0x1));
+}
+
+/* Helper function to get TA operand.  */
+bool
+tail_agnostic_p (rtx_insn *rinsn)
+{
+  /* If it doesn't have TA, we return agnostic by default.  */
+  extract_insn_cached (rinsn);
+  int ta = get_attr_ta (rinsn);
+  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
+}
+
+/* Change insn and Assert the change always happens.  */
+void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}
+
+/* Return true if it is NONVLMAX AVL TYPE.  */
+bool
+nonvlmax_avl_type_p (rtx_insn *rinsn)
+{
+  extract_insn_cached (rinsn);
+  int index = get_attr_avl_type_idx (rinsn);
+  if (index == INVALID_ATTRIBUTE)
+return false;
+  rtx avl_type = recog_data.operand[index];
+  return INTVAL (avl_type) == NONVLMAX;
+}
+
+/* Return true if RTX is RVV VLMAX AVL.  */
+bool
+vlmax_avl_p (rtx x)
+{
+  return x && rtx_equal_p (x, RVV_VLMAX);
+}
+
+/* Helper function to get SEW operand. We always have SEW value for
+   all RVV instructions that have VTYPE OP.  */
+uint8_t
+get_sew (rtx_insn *rinsn)
+{
+  return get_attr_sew (rinsn);
+}
+
+/* Helper function to get VLMUL operand. We always have VLMUL value for
+   all RVV instructions that have VTYPE OP. */
+enum vlmul_type
+get_vlmul (rtx_insn *rinsn)
+{
+  return (enum vlmul_type) get_attr_vlmul (rinsn);
+}
+
+/* Count the number of REGNO in RINSN.  */
+int
+count_regno_occurrences (rtx_insn *rinsn, unsigned int regno)
+{
+  int count = 0;
+  extract_insn (rinsn);
+  for (int i = 0; i < recog_data.n_operands; i++)
+if (refers_to_regno_p (regno, recog_data.operand[i]))
+  count++;
+  return count;
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 73a6d4b7406..77dbf159d41 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -255,12 +255,6 @@ policy_to_str (bool agnostic_p)
   return agnostic_p

Re: [PATCH] RISC-V: Export some functions from riscv-vsetvl to riscv-v

2023-10-25 Thread Kito Cheng
LGTM, but plz mention it's NFC in the title, no v2 needed :)

On Wed, Oct 25, 2023 at 5:03 PM Juzhe-Zhong  wrote:
>
> Address kito's comments of AVL propagation patch.
>
> Export the functions that are not only used by VSETVL PASS but also AVL 
> propagation PASS.
>
> No functionality change.
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (has_vl_op): Export from riscv-vsetvl 
> to riscv-v
> (tail_agnostic_p): Ditto.
> (validate_change_or_fail): Ditto.
> (nonvlmax_avl_type_p): Ditto.
> (vlmax_avl_p): Ditto.
> (get_sew): Ditto.
> (enum vlmul_type): Ditto.
> (count_regno_occurrences): Ditto.
> * config/riscv/riscv-v.cc (has_vl_op): Ditto.
> (get_default_ta): Ditto.
> (tail_agnostic_p): Ditto.
> (validate_change_or_fail): Ditto.
> (nonvlmax_avl_type_p): Ditto.
> (vlmax_avl_p): Ditto.
> (get_sew): Ditto.
> (enum vlmul_type): Ditto.
> (get_vlmul): Ditto.
> (count_regno_occurrences): Ditto.
> * config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
> (has_vl_op): Ditto.
> (get_sew): Ditto.
> (get_vlmul): Ditto.
> (get_default_ta): Ditto.
> (tail_agnostic_p): Ditto.
> (count_regno_occurrences): Ditto.
> (validate_change_or_fail): Ditto.
>
> ---
>  gcc/config/riscv/riscv-protos.h  |  8 +++
>  gcc/config/riscv/riscv-v.cc  | 83 
>  gcc/config/riscv/riscv-vsetvl.cc | 70 ---
>  3 files changed, 91 insertions(+), 70 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index fffd9cd0b8a..668d75043ca 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -560,6 +560,14 @@ bool cmp_lmul_gt_one (machine_mode);
>  bool gather_scatter_valid_offset_mode_p (machine_mode);
>  bool vls_mode_valid_p (machine_mode);
>  bool vlmax_avl_type_p (rtx_insn *);
> +bool has_vl_op (rtx_insn *);
> +bool tail_agnostic_p (rtx_insn *);
> +void validate_change_or_fail (rtx, rtx *, rtx, bool);
> +bool nonvlmax_avl_type_p (rtx_insn *);
> +bool vlmax_avl_p (rtx);
> +uint8_t get_sew (rtx_insn *);
> +enum vlmul_type get_vlmul (rtx_insn *);
> +int count_regno_occurrences (rtx_insn *, unsigned int);
>  }
>
>  /* We classify builtin types into two classes:
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index d439ec06af0..3fe8125801b 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -4447,4 +4447,87 @@ vlmax_avl_type_p (rtx_insn *rinsn)
>return INTVAL (avl_type) == VLMAX;
>  }
>
> +/* Return true if it is an RVV instruction depends on VL global
> +   status register.  */
> +bool
> +has_vl_op (rtx_insn *rinsn)
> +{
> +  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
> +}
> +
> +/* Get default tail policy.  */
> +static bool
> +get_default_ta ()
> +{
> +  /* For the instruction that doesn't require TA, we still need a default 
> value
> + to emit vsetvl. We pick up the default value according to prefer 
> policy. */
> +  return (bool) (get_prefer_tail_policy () & 0x1
> +|| (get_prefer_tail_policy () >> 1 & 0x1));
> +}
> +
> +/* Helper function to get TA operand.  */
> +bool
> +tail_agnostic_p (rtx_insn *rinsn)
> +{
> +  /* If it doesn't have TA, we return agnostic by default.  */
> +  extract_insn_cached (rinsn);
> +  int ta = get_attr_ta (rinsn);
> +  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
> +}
> +
> +/* Change insn and Assert the change always happens.  */
> +void
> +validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
> +{
> +  bool change_p = validate_change (object, loc, new_rtx, in_group);
> +  gcc_assert (change_p);
> +}
> +
> +/* Return true if it is NONVLMAX AVL TYPE.  */
> +bool
> +nonvlmax_avl_type_p (rtx_insn *rinsn)
> +{
> +  extract_insn_cached (rinsn);
> +  int index = get_attr_avl_type_idx (rinsn);
> +  if (index == INVALID_ATTRIBUTE)
> +return false;
> +  rtx avl_type = recog_data.operand[index];
> +  return INTVAL (avl_type) == NONVLMAX;
> +}
> +
> +/* Return true if RTX is RVV VLMAX AVL.  */
> +bool
> +vlmax_avl_p (rtx x)
> +{
> +  return x && rtx_equal_p (x, RVV_VLMAX);
> +}
> +
> +/* Helper function to get SEW operand. We always have SEW value for
> +   all RVV instructions that have VTYPE OP.  */
> +uint8_t
> +get_sew (rtx_insn *rinsn)
> +{
> +  return get_attr_sew (rinsn);
> +}
> +
> +/* Helper function to get VLMUL operand. We always have VLMUL value for
> +   all RVV instructions that have VTYPE OP. */
> +enum vlmul_type
> +get_vlmul (rtx_insn *rinsn)
> +{
> +  return (enum vlmul_type) get_attr_vlmul (rinsn);
> +}
> +
> +/* Count the number of REGNO in RINSN.  */
> +int
> +count_regno_occurrences (rtx_insn *rinsn, unsigned int regno)
> +{
> +  int count = 0;
> +  extract_insn (rinsn);
> +  for (int i = 0; i < recog_data.n_opera

Re: [PATCH] RISC-V: Change MD attribute avl_type into avl_type_idx[NFC]

2023-10-25 Thread juzhe.zh...@rivai.ai
Committed.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-25 16:35
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Change MD attribute avl_type into avl_type_idx[NFC]
Address kito's comments of AVL propagation patch.
 
Change avl_type into avl_type_idx.
 
No functionality change.
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (vlmax_avl_type_p): New function.
* config/riscv/riscv-v.cc (vlmax_avl_type_p): Ditto.
* config/riscv/riscv-vsetvl.cc (get_avl): Adapt function.
* config/riscv/vector.md (): Change avl_type into avl_type_idx.
 
---
gcc/config/riscv/riscv-protos.h  |   1 +
gcc/config/riscv/riscv-v.cc  |  12 +++
gcc/config/riscv/riscv-vsetvl.cc |   2 +-
gcc/config/riscv/vector.md   | 165 +++
4 files changed, 92 insertions(+), 88 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6cb9d459ee9..fffd9cd0b8a 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -559,6 +559,7 @@ bool cmp_lmul_le_one (machine_mode);
bool cmp_lmul_gt_one (machine_mode);
bool gather_scatter_valid_offset_mode_p (machine_mode);
bool vls_mode_valid_p (machine_mode);
+bool vlmax_avl_type_p (rtx_insn *);
}
/* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e39a9507803..d439ec06af0 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4435,4 +4435,16 @@ expand_popcount (rtx *ops)
   emit_move_insn (dst, x4);
}
+/* Return true if it is VLMAX AVL TYPE.  */
+bool
+vlmax_avl_type_p (rtx_insn *rinsn)
+{
+  extract_insn_cached (rinsn);
+  int index = get_attr_avl_type_idx (rinsn);
+  if (index == INVALID_ATTRIBUTE)
+return false;
+  rtx avl_type = recog_data.operand[index];
+  return INTVAL (avl_type) == VLMAX;
+}
+
} // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index e9dd669de98..73a6d4b7406 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -365,7 +365,7 @@ get_avl (rtx_insn *rinsn)
   if (!has_vl_op (rinsn))
 return NULL_RTX;
-  if (get_attr_avl_type (rinsn) == VLMAX)
+  if (vlmax_avl_type_p (rinsn))
 return RVV_VLMAX;
   extract_insn_cached (rinsn);
   return recog_data.operand[get_attr_vl_op_idx (rinsn)];
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ef91950178f..cea3dbf37a6 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -800,31 +800,22 @@
(const_int INVALID_ATTRIBUTE)))
;; The avl type value.
-(define_attr "avl_type" ""
-  (cond [(eq_attr "mode" "V1QI,V2QI,V4QI,V8QI,V16QI,V32QI,V64QI,V128QI,V256QI,V512QI,V1024QI,V2048QI,V4096QI,
-   V1BI,V2BI,V4BI,V8BI,V16BI,V32BI,V64BI,V128BI,V256BI,V512BI,V1024BI,V2048BI,V4096BI,
-   V1HI,V2HI,V4HI,V8HI,V16HI,V32HI,V64HI,V128HI,V256HI,V512HI,V1024HI,V2048HI,
-   V1SI,V2SI,V4SI,V8SI,V16SI,V32SI,V64SI,V128SI,V256SI,V512SI,V1024SI,
-   V1DI,V2DI,V4DI,V8DI,V16DI,V32DI,V64DI,V128DI,V256DI,V512DI,
-   V1HF,V2HF,V4HF,V8HF,V16HF,V32HF,V64HF,V128HF,V256HF,V512HF,V1024HF,V2048HF,
-   V1SF,V2SF,V4SF,V8SF,V16SF,V32SF,V64SF,V128SF,V256SF,V512SF,V1024SF,
-   V1DF,V2DF,V4DF,V8DF,V16DF,V32DF,V64DF,V128DF,V256DF,V512DF")
-(symbol_ref "riscv_vector::NONVLMAX")
- (eq_attr "type" "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
+(define_attr "avl_type_idx" ""
+  (cond [(eq_attr "type" "vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
  vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
  vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
  vfclass,vired,viwred,vfredu,vfredo,vfwredu,vfwredo,\
  vimovxv,vfmovfv,vlsegde,vlsegdff")
-(symbol_ref "INTVAL (operands[7])")
+(const_int 7)
(eq_attr "type" "vldm,vstm,vimov,vmalu,vmalu")
-(symbol_ref "INTVAL (operands[5])")
+(const_int 5)
;; If operands[3] of "vlds" is not vector mode, it is pred_broadcast.
;; wheras it is pred_strided_load if operands[3] is vector mode.
(eq_attr "type" "vlds")
   (if_then_else (match_test "VECTOR_MODE_P (GET_MODE (operands[3]))")
 (const_int INVALID_ATTRIBUTE)
-  (symbol_ref "INTVAL (operands[7])"))
+  (const_int 7))
(eq_attr "type" "vldux,vldox,vialu,vshift,viminmax,vimul,vidiv,vsalu,\
  viwalu,viwmul,vnshift,vaalu,vsmul,vsshift,\
@@ -832,18 +823,18 @@
  vfsgnj,vfcmp,vslideup,vslidedown,vislide1up,\
  vislide1down,vfslide1up,vfslide1down,vgather,viwmuladd,vfwmuladd,\
  vlsegds,vlsegdux,vlsegdox")
-(symbol_ref "INTVAL (operands[8])")
+(const_int 8)
(eq_attr "type" "vstux,vstox,vssegts,vssegtux,vssegtox")
-(symbol_ref "INTVAL (operands[5])")
+(const_int 5)
(eq_attr "type" "vimuladd,vfmuladd")
-(symbol_ref "INTVAL (operands[9])")
+(const_int 9)
(eq_attr "type" "vmsfs,vmidx,vcompress")
-(symbol_ref "INTVAL (operands[6])")
+(const_int 6)
(eq_attr "type" "vmpop,vmffs,vssegte")
-(symbol_ref "INTVAL (operands[4])")]

Re: Re: [PATCH] RISC-V: Export some functions from riscv-vsetvl to riscv-v

2023-10-25 Thread juzhe.zh...@rivai.ai
Thanks. Committed with NFC mentioned.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-10-25 17:06
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Export some functions from riscv-vsetvl to riscv-v
LGTM, but plz mention it's NFC in the title, no v2 needed :)
 
On Wed, Oct 25, 2023 at 5:03 PM Juzhe-Zhong  wrote:
>
> Address kito's comments of AVL propagation patch.
>
> Export the functions that are not only used by VSETVL PASS but also AVL 
> propagation PASS.
>
> No functionality change.
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (has_vl_op): Export from riscv-vsetvl 
> to riscv-v
> (tail_agnostic_p): Ditto.
> (validate_change_or_fail): Ditto.
> (nonvlmax_avl_type_p): Ditto.
> (vlmax_avl_p): Ditto.
> (get_sew): Ditto.
> (enum vlmul_type): Ditto.
> (count_regno_occurrences): Ditto.
> * config/riscv/riscv-v.cc (has_vl_op): Ditto.
> (get_default_ta): Ditto.
> (tail_agnostic_p): Ditto.
> (validate_change_or_fail): Ditto.
> (nonvlmax_avl_type_p): Ditto.
> (vlmax_avl_p): Ditto.
> (get_sew): Ditto.
> (enum vlmul_type): Ditto.
> (get_vlmul): Ditto.
> (count_regno_occurrences): Ditto.
> * config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
> (has_vl_op): Ditto.
> (get_sew): Ditto.
> (get_vlmul): Ditto.
> (get_default_ta): Ditto.
> (tail_agnostic_p): Ditto.
> (count_regno_occurrences): Ditto.
> (validate_change_or_fail): Ditto.
>
> ---
>  gcc/config/riscv/riscv-protos.h  |  8 +++
>  gcc/config/riscv/riscv-v.cc  | 83 
>  gcc/config/riscv/riscv-vsetvl.cc | 70 ---
>  3 files changed, 91 insertions(+), 70 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index fffd9cd0b8a..668d75043ca 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -560,6 +560,14 @@ bool cmp_lmul_gt_one (machine_mode);
>  bool gather_scatter_valid_offset_mode_p (machine_mode);
>  bool vls_mode_valid_p (machine_mode);
>  bool vlmax_avl_type_p (rtx_insn *);
> +bool has_vl_op (rtx_insn *);
> +bool tail_agnostic_p (rtx_insn *);
> +void validate_change_or_fail (rtx, rtx *, rtx, bool);
> +bool nonvlmax_avl_type_p (rtx_insn *);
> +bool vlmax_avl_p (rtx);
> +uint8_t get_sew (rtx_insn *);
> +enum vlmul_type get_vlmul (rtx_insn *);
> +int count_regno_occurrences (rtx_insn *, unsigned int);
>  }
>
>  /* We classify builtin types into two classes:
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index d439ec06af0..3fe8125801b 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -4447,4 +4447,87 @@ vlmax_avl_type_p (rtx_insn *rinsn)
>return INTVAL (avl_type) == VLMAX;
>  }
>
> +/* Return true if it is an RVV instruction depends on VL global
> +   status register.  */
> +bool
> +has_vl_op (rtx_insn *rinsn)
> +{
> +  return recog_memoized (rinsn) >= 0 && get_attr_has_vl_op (rinsn);
> +}
> +
> +/* Get default tail policy.  */
> +static bool
> +get_default_ta ()
> +{
> +  /* For the instruction that doesn't require TA, we still need a default value
> + to emit vsetvl. We pick up the default value according to prefer policy. */
> +  return (bool) (get_prefer_tail_policy () & 0x1
> +|| (get_prefer_tail_policy () >> 1 & 0x1));
> +}
> +
> +/* Helper function to get TA operand.  */
> +bool
> +tail_agnostic_p (rtx_insn *rinsn)
> +{
> +  /* If it doesn't have TA, we return agnostic by default.  */
> +  extract_insn_cached (rinsn);
> +  int ta = get_attr_ta (rinsn);
> +  return ta == INVALID_ATTRIBUTE ? get_default_ta () : IS_AGNOSTIC (ta);
> +}
> +
> +/* Change insn and Assert the change always happens.  */
> +void
> +validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
> +{
> +  bool change_p = validate_change (object, loc, new_rtx, in_group);
> +  gcc_assert (change_p);
> +}
> +
> +/* Return true if it is NONVLMAX AVL TYPE.  */
> +bool
> +nonvlmax_avl_type_p (rtx_insn *rinsn)
> +{
> +  extract_insn_cached (rinsn);
> +  int index = get_attr_avl_type_idx (rinsn);
> +  if (index == INVALID_ATTRIBUTE)
> +return false;
> +  rtx avl_type = recog_data.operand[index];
> +  return INTVAL (avl_type) == NONVLMAX;
> +}
> +
> +/* Return true if RTX is RVV VLMAX AVL.  */
> +bool
> +vlmax_avl_p (rtx x)
> +{
> +  return x && rtx_equal_p (x, RVV_VLMAX);
> +}
> +
> +/* Helper function to get SEW operand. We always have SEW value for
> +   all RVV instructions that have VTYPE OP.  */
> +uint8_t
> +get_sew (rtx_insn *rinsn)
> +{
> +  return get_attr_sew (rinsn);
> +}
> +
> +/* Helper function to get VLMUL operand. We always have VLMUL value for
> +   all RVV instructions that have VTYPE OP. */
> +enum vlmul_type
> +get_vlmul (rtx_insn *rinsn)
> +{
> +  return (enum vlmul

Re: [PATCH] config, aarch64: Use a more compatible sed invocation.

2023-10-25 Thread Richard Earnshaw




On 24/10/2023 16:53, Iain Sandoe wrote:

Although this came up initially when working on the Darwin Arm64
port, it also breaks cross-compilers on platforms with non-GNU sed.

Tested on x86_64-darwin X aarch64-linux-gnu, aarch64-darwin,
aarch64-linux-gnu and x86_64-linux-gnu.  OK for master?
thanks,
Iain

--- 8< ---

Currently, the sed commands used to parse --with-{cpu,tune,arch} are
using a GNU-specific extension to -e (recognising extended regex).

This is failing on Darwin, which defaults to Posix behaviour for -e.
However '-E' is accepted to indicate an extended RE.  Strictly, this
is also not really sufficient, since we should only require a Posix
sed (but it seems supported for BSD-derivatives).



The man pages I have for linux, freebsd and macos all show something 
pretty similar:


  -e script
  add the script to the commands to be executed

Wording varies slightly, but I think the meaning is clearly the same. 
So this really has nothing to do with extended regexps.


That means, I think, that we really want '-E -e 

Disentangle handling of OpenACC 'host', 'self' pragma tokens (was: [PATCH, OpenACC 2.7] Implement self clause for compute constructs)

2023-10-25 Thread Thomas Schwinge
Hi!

On 2023-10-25T10:57:06+0200, I wrote:
> With minor textual conflicts resolved, I've pushed this to master branch
> in commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
> "OpenACC 2.7: Implement self clause for compute constructs", see
> attached.
>
>
> I'll then apply/submit a number of follow-on commits.

I found this:

> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc

>  static tree
>  c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
> -const char *where, bool finish_p = true)
> +const char *where, bool finish_p = true,
> +bool compute_p = false)
>  {
>tree clauses = NULL;
>bool first = true;
> @@ -18064,7 +18100,18 @@ c_parser_oacc_all_clauses (c_parser *parser, 
> omp_clause_mask mask,
>   c_parser_consume_token (parser);
>
>here = c_parser_peek_token (parser)->location;
> -  c_kind = c_parser_omp_clause_name (parser);
> +
> +  /* For OpenACC compute directives */
> +  if (compute_p
> +   && c_parser_next_token_is (parser, CPP_NAME)
> +   && !strcmp (IDENTIFIER_POINTER (c_parser_peek_token (parser)->value),
> +   "self"))
> + {
> +   c_kind = PRAGMA_OACC_CLAUSE_SELF;
> +   c_parser_consume_token (parser);
> + }
> +  else
> + c_kind = c_parser_omp_clause_name (parser);

..., and similarly in the C++ and (to a lesser extent) Fortran front ends,
a bit twisted, so I've pushed to master branch
commit c92509d9fd98e02d17ab1610f696c88f606dcdf4
"Disentangle handling of OpenACC 'host', 'self' pragma tokens", see
attached.


Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From c92509d9fd98e02d17ab1610f696c88f606dcdf4 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 20 Oct 2023 14:47:58 +0200
Subject: [PATCH] Disentangle handling of OpenACC 'host', 'self' pragma tokens

'gcc/c-family/c-pragma.h:pragma_omp_clause' already defines
'PRAGMA_OACC_CLAUSE_SELF', but it has no longer been used for the 'update'
directive's 'self' clause as of 2018
commit 829c6349e96c5bfa8603aaef8858b38e237a2f33 (Subversion r261813)
"Update OpenACC data clause semantics to the 2.5 behavior".  That one instead
mapped the 'self' pragma token to the 'host' one (same semantics).  That means
that we're later not able to tell whether originally we had seen 'self' or
'host', which was OK as long as only the 'update' directive had a 'self'
clause.  However, as of recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs", also OpenACC
compute constructs may have a 'self' clause -- with different semantics.  That
means, we need to know which OpenACC directive we're parsing clauses for, which
can be done in a simpler way than in that commit, similar to how the OpenMP
'to' clause is handled.

While at that, clarify that (already in OpenACC 2.0a)
"The 'host' clause is a synonym for the 'self' clause." -- not the other way
round.
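
To illustrate the point about keeping the two tokens apart and deciding
by directive, here is a minimal sketch.  It is not the front-end code:
every name in it (clause_kind, directive_kind, clause_meaning,
clause_name, interpret_clause) is hypothetical, and the two "meanings"
are only a rough model of the 'update' and compute-construct semantics.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the pragma clause tokens; not GCC's enums.  */
enum clause_kind { CLAUSE_NONE, CLAUSE_HOST, CLAUSE_SELF };

/* Which kind of OpenACC directive the clauses are being parsed for.  */
enum directive_kind { DIRECTIVE_UPDATE, DIRECTIVE_COMPUTE };

/* What a clause ends up meaning once the directive is known.  */
enum clause_meaning
{
  MEANING_INVALID,
  MEANING_UPDATE_LOCAL_COPY,   /* 'update': refresh the local copy.  */
  MEANING_RUN_ON_LOCAL_DEVICE  /* compute: run the region locally.  */
};

/* Map a clause name to its own token; "self" is not folded into "host",
   so the information about what was written is preserved.  */
enum clause_kind
clause_name (const char *p)
{
  if (!strcmp ("host", p))
    return CLAUSE_HOST;
  if (!strcmp ("self", p))
    return CLAUSE_SELF;
  return CLAUSE_NONE;
}

/* Decide the meaning only once the directive is known.  */
enum clause_meaning
interpret_clause (enum directive_kind directive, enum clause_kind kind)
{
  switch (directive)
    {
    case DIRECTIVE_UPDATE:
      /* On 'update', 'host' is a synonym for 'self'.  */
      if (kind == CLAUSE_HOST || kind == CLAUSE_SELF)
        return MEANING_UPDATE_LOCAL_COPY;
      break;
    case DIRECTIVE_COMPUTE:
      /* In this model, only 'self' is accepted on compute constructs.  */
      if (kind == CLAUSE_SELF)
        return MEANING_RUN_ON_LOCAL_DEVICE;
      break;
    }
  return MEANING_INVALID;
}

/* Small self-check that can be called from any test driver.  */
void
check_clause_sketch (void)
{
  assert (clause_name ("self") != clause_name ("host"));
  assert (interpret_clause (DIRECTIVE_UPDATE, CLAUSE_SELF)
          == interpret_clause (DIRECTIVE_UPDATE, CLAUSE_HOST));
}
```

With separate tokens, the same "self" spelling can be given different
meanings on 'update' and on compute constructs without an extra flag
being passed down from the caller.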

	gcc/c/
	* c-parser.cc (c_parser_omp_clause_name): Return
	'PRAGMA_OACC_CLAUSE_SELF' for "self".
	(c_parser_oacc_data_clause, OACC_UPDATE_CLAUSE_MASK): Adjust.
	(c_parser_oacc_all_clauses): Remove 'bool compute_p' formal
	parameter, and instead locally determine whether we're called for
	an OpenACC compute construct or OpenACC 'update' directive.
	(c_parser_oacc_compute): Adjust.
	gcc/cp/
	* parser.cc (cp_parser_omp_clause_name): Return
	'PRAGMA_OACC_CLAUSE_SELF' for "self".
	(cp_parser_oacc_data_clause, OACC_UPDATE_CLAUSE_MASK): Adjust.
	(cp_parser_oacc_all_clauses): Remove 'bool compute_p' formal
	parameter, and instead locally determine whether we're called for
	an OpenACC compute construct or OpenACC 'update' directive.
	(cp_parser_oacc_compute): Adjust.
	gcc/fortran/
	* openmp.cc (omp_mask2): Split 'OMP_CLAUSE_HOST_SELF' into
	'OMP_CLAUSE_SELF', 'OMP_CLAUSE_HOST'.
	(gfc_match_omp_clauses, OACC_UPDATE_CLAUSES): Adjust.
---
 gcc/c/c-parser.cc | 38 +-
 gcc/cp/parser.cc  | 39 +--
 gcc/fortran/openmp.cc | 27 ++-
 3 files changed, 48 insertions(+), 56 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index a82f5afeff7..5213a57a1ec 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -14061,8 +14061,8 @@ c_parser_omp_clause_name (c_parser *parser)
 	result = PRAGMA_OMP_CLAUSE_SCHEDULE;
 	  else if (!strcmp ("sections", p))
 	result = PRAGMA_OMP_CLAUSE_SECTIONS;
-	  else if (!strcmp ("self", p)) /* "self" is a synonym for "host".  */
-	result = PRAGMA_OACC_CLAUSE_HOST;
+	  else if (!strcmp ("self", p))
+	result = PRAGMA_OACC_CLAUSE_SELF;
 	  else if (!strcmp ("seq", p))
 	result = PRAGMA_OACC_CLAUSE_SEQ;
 	  else if (!strcmp ("s

Re: [PATCH] RISC-V: Fix multiple EXCESS test FAILs in RVV testsuite

2023-10-25 Thread Robin Dapp
> Hmmm. I am not familiar with Binutils...
> 
> I just adapted the tests like others in the testsuite to make them consistent.
> And it turns out that fixes the issues.

I see where you're coming from, but can you assemble/link any
executable with -march=..._zvfh?  Probably not?  Doesn't half of
GCC's testsuite fail then?

So rather than overwrite the default options we should either
add an effective-target check in target-supports.exp or in those
particular tests.  I believe the others like compress should do
the same thing.  I can do that at some point if you don't want it
but right now I'm on other things.

Regards
 Robin


Re: Re: [PATCH] RISC-V: Fix multiple EXCESS test FAILs in RVV testsuite

2023-10-25 Thread juzhe.zh...@rivai.ai
>> Doesn't half of
>> GCC's testsuite fail then?
No. Only a few tests failed (the ones mentioned in this patch).
All other tests passed no matter how I configured the toolchain build.


>>  I can do that at some point if you don't want it
>> but right now I'm on other things.
No worries, I won't commit this patch. I will use it locally.
You can fix it when you have time.
I don't know how to fix it since I am really new to testing.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-25 17:15
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix multiple EXCESS test FAILs in RVV testsuite
 


Consistently order 'OMP_CLAUSE_SELF' right after 'OMP_CLAUSE_IF' (was: [PATCH, OpenACC 2.7] Implement self clause for compute constructs)

2023-10-25 Thread Thomas Schwinge
Hi!

On 2023-10-25T10:57:06+0200, I wrote:
> With minor textual conflicts resolved, I've pushed this to master branch
> in commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
> "OpenACC 2.7: Implement self clause for compute constructs", see
> attached.
>
>
> I'll then apply/submit a number of follow-on commits.

> From 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a Mon Sep 17 00:00:00 2001
> From: Chung-Lin Tang 
> Date: Tue, 13 Jun 2023 08:44:31 -0700
> Subject: [PATCH] OpenACC 2.7: Implement self clause for compute constructs

> it essentially behaves like the 'if' clause with the condition inverted)

Pushed to master branch commit a5e919027fdb1900a6f2d64f763c99dbaf98aee6
"Consistently order 'OMP_CLAUSE_SELF' right after 'OMP_CLAUSE_IF'", see
attached.


Regards
 Thomas


From a5e919027fdb1900a6f2d64f763c99dbaf98aee6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 23 Oct 2023 14:24:44 +0200
Subject: [PATCH] Consistently order 'OMP_CLAUSE_SELF' right after
 'OMP_CLAUSE_IF'

As noted in recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs", the OpenACC 'self'
clause very much relates to the 'if' clause, and therefore copies a lot of the
latter's handling.  Therefore it makes sense to also place this handling in
proximity to that of the 'if' clause, which was done in a lot but not all
instances.
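
As a rough illustration of why the two clauses share so much handling,
the following sketch models 'self(cond)' as 'if(cond)' with the condition
inverted.  The function names are hypothetical and the model is
deliberately simplified (one boolean per clause, nothing about a bare
'self' without an argument).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: true means "offload the region", false means
   "run it on the local (host) device".  */

/* 'if (cond)': offload only when the condition holds.  */
bool
offload_with_if (bool cond)
{
  return cond;
}

/* 'self (cond)': run locally when the condition holds, so offload
   only when it does not.  */
bool
offload_with_self (bool cond)
{
  return !cond;
}

/* True when 'self (c)' and 'if (!c)' agree for both values of c.  */
bool
self_is_inverted_if (void)
{
  return offload_with_self (true) == offload_with_if (false)
         && offload_with_self (false) == offload_with_if (true);
}
```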

	gcc/
	* tree-core.h (omp_clause_code): Move 'OMP_CLAUSE_SELF' after
	'OMP_CLAUSE_IF'.
	* tree-pretty-print.cc (dump_omp_clause): Adjust.
	* tree.cc (omp_clause_num_ops, omp_clause_code_name): Likewise.
	* tree.h: Likewise.
---
 gcc/tree-core.h  |  6 +++---
 gcc/tree-pretty-print.cc | 13 +++--
 gcc/tree.cc  |  4 ++--
 gcc/tree.h   |  4 ++--
 4 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index cfe37c1d627..4dc36827d32 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -389,6 +389,9 @@ enum omp_clause_code {
   /* OpenACC/OpenMP clause: if (scalar-expression).  */
   OMP_CLAUSE_IF,
 
+  /* OpenACC clause: self.  */
+  OMP_CLAUSE_SELF,
+
   /* OpenMP clause: num_threads (integer-expression).  */
   OMP_CLAUSE_NUM_THREADS,
 
@@ -527,9 +530,6 @@ enum omp_clause_code {
 
   /* OpenACC clause: nohost.  */
   OMP_CLAUSE_NOHOST,
-
-  /* OpenACC clause: self.  */
-  OMP_CLAUSE_SELF,
 };
 
 #undef DEFTREESTRUCT
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 39ec1df9394..1fadd752d05 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -587,6 +587,13 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
   pp_right_paren (pp);
   break;
 
+case OMP_CLAUSE_SELF:
+  pp_string (pp, "self(");
+  dump_generic_node (pp, OMP_CLAUSE_SELF_EXPR (clause),
+			 spc, flags, false);
+  pp_right_paren (pp);
+  break;
+
 case OMP_CLAUSE_NUM_THREADS:
   pp_string (pp, "num_threads(");
   dump_generic_node (pp, OMP_CLAUSE_NUM_THREADS_EXPR (clause),
@@ -1453,12 +1460,6 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
 			 false);
   pp_right_paren (pp);
   break;
-case OMP_CLAUSE_SELF:
-  pp_string (pp, "self(");
-  dump_generic_node (pp, OMP_CLAUSE_SELF_EXPR (clause),
-			 spc, flags, false);
-  pp_right_paren (pp);
-  break;
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/tree.cc b/gcc/tree.cc
index c38b09c431b..cfead156ddf 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -280,6 +280,7 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE__CONDTEMP_  */
   1, /* OMP_CLAUSE__SCANTEMP_  */
   1, /* OMP_CLAUSE_IF  */
+  1, /* OMP_CLAUSE_SELF */
   1, /* OMP_CLAUSE_NUM_THREADS  */
   1, /* OMP_CLAUSE_SCHEDULE  */
   0, /* OMP_CLAUSE_NOWAIT  */
@@ -326,7 +327,6 @@ unsigned const char omp_clause_num_ops[] =
   0, /* OMP_CLAUSE_IF_PRESENT */
   0, /* OMP_CLAUSE_FINALIZE */
   0, /* OMP_CLAUSE_NOHOST */
-  1, /* OMP_CLAUSE_SELF */
 };
 
 const char * const omp_clause_code_name[] =
@@ -372,6 +372,7 @@ const char * const omp_clause_code_name[] =
   "_condtemp_",
   "_scantemp_",
   "if",
+  "self",
   "num_threads",
   "schedule",
   "nowait",
@@ -418,7 +419,6 @@ const char * const omp_clause_code_name[] =
   "if_present",
   "finalize",
   "nohost",
-  "self",
 };
 
 /* Unless specific to OpenACC, we tend to internally maintain OpenMP-centric
diff --git a/gcc/tree.h b/gcc/tree.h
index aaf744c060e..ac94bd7b460 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1724,6 +1724,8 @@ class auto_suppress_location_wrappers
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_FINAL), 0)
 #define OMP_CLAUSE_IF_EXPR(NODE) \
   OMP_CLAUSE_OPERAND (OMP_

[PATCH v2] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

2023-10-25 Thread Roger Sayle

Hi Jeff,
Many thanks for the review/approval of my fix for PR rtl-optimization/91865.
Based on your and Richard Biener's feedback, I’d like to propose a revision
calling simplify_unary_operation instead of simplify_const_unary_operation
(i.e. Richi's recommendation).  I was originally concerned that this might
potentially result in unbounded recursion, and testing for ZERO_EXTEND was
safer but "uglier", but testing hasn't shown any issues.  If we do see issues
in the future, it's easy to fall back to the previous version of this patch.
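
For anyone skimming the thread, the identity behind the canonicalization
can be checked in plain C.  This is only a sketch with made-up function
names; the 8/16/32-bit widths stand in for the QI/HI/PSI modes of the
msp430 example and are an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Widen in two steps: 8 -> 16 -> 32 bits, both steps zero-extending.  */
uint32_t
zext_nested (uint8_t x)
{
  uint16_t mid = (uint16_t) x;
  return (uint32_t) mid;
}

/* Widen in one step: 8 -> 32 bits.  */
uint32_t
zext_direct (uint8_t x)
{
  return (uint32_t) x;
}

/* Return 1 if both forms agree for every 8-bit input, else 0.  */
int
nested_matches_direct (void)
{
  for (unsigned int i = 0; i <= 0xff; i++)
    if (zext_nested ((uint8_t) i) != zext_direct ((uint8_t) i))
      return 0;
  return 1;
}
```

Since the two forms always agree, the nested form carries no extra
information, and the single extension is the one a backend pattern is
more likely to match.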

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-10-25  Roger Sayle  
Richard Biener  

gcc/ChangeLog
PR rtl-optimization/91865
* combine.cc (make_compound_operation): Avoid creating a
ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
PR rtl-optimization/91865
* gcc.target/msp430/pr91865.c: New test case.


Thanks again,
Roger
--

> -Original Message-
> From: Jeff Law 
> Sent: 19 October 2023 16:20
> 
> On 10/14/23 16:14, Roger Sayle wrote:
> >
> > This patch is my proposed solution to PR rtl-optimization/91865.
> > Normally RTX simplification canonicalizes a ZERO_EXTEND of a
> > ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is
> > possible for combine's make_compound_operation to unintentionally
> > generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is
> > unlikely to be matched by the backend.
> >
> > For the new test case:
> >
> > const int table[2] = {1, 2};
> > int foo (char i) { return table[i]; }
> >
> > compiling with -O2 -mlarge on msp430 we currently see:
> >
> > Trying 2 -> 7:
> >  2: r25:HI=zero_extend(R12:QI)
> >REG_DEAD R12:QI
> >  7: r28:PSI=sign_extend(r25:HI)#0
> >REG_DEAD r25:HI
> > Failed to match this instruction:
> > (set (reg:PSI 28 [ iD.1772 ])
> >  (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> >
> > which results in the following code:
> >
> > foo:AND #0xff, R12
> >  RLAM.A #4, R12 { RRAM.A #4, R12
> >  RLAM.A  #1, R12
> >  MOVX.W  table(R12), R12
> >  RETA
> >
> > With this patch, we now see:
> >
> > Trying 2 -> 7:
> >  2: r25:HI=zero_extend(R12:QI)
> >REG_DEAD R12:QI
> >  7: r28:PSI=sign_extend(r25:HI)#0
> >REG_DEAD r25:HI
> > Successfully matched this instruction:
> > (set (reg:PSI 28 [ iD.1772 ])
> >  (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
> > allowing combination of insns 2 and 7
> > original costs 4 + 8 = 12
> > replacement cost 8
> >
> > foo:MOV.B   R12, R12
> >  RLAM.A  #1, R12
> >  MOVX.W  table(R12), R12
> >  RETA
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> > 2023-10-14  Roger Sayle  
> >
> > gcc/ChangeLog
> >  PR rtl-optimization/91865
> >  * combine.cc (make_compound_operation): Avoid creating a
> >  ZERO_EXTEND of a ZERO_EXTEND.
> Final question.  Is there a reasonable expectation that we could get a
> similar situation with sign extensions?   If so we probably ought to try
> and handle both.
> 
> OK with the obvious change to handle nested sign extensions if you think it's
> useful to do so.  And OK as-is if you don't think handling nested sign
> extensions is useful.
> 
> jeff
diff --git a/gcc/combine.cc b/gcc/combine.cc
index 360aa2f25e6..b1b16ac7bb2 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -8449,8 +8449,8 @@ make_compound_operation (rtx x, enum rtx_code in_code)
   if (code == ZERO_EXTEND)
 {
   new_rtx = make_compound_operation (XEXP (x, 0), next_code);
-  tem = simplify_const_unary_operation (ZERO_EXTEND, GET_MODE (x),
-   new_rtx, GET_MODE (XEXP (x, 0)));
+  tem = simplify_unary_operation (ZERO_EXTEND, GET_MODE (x),
+ new_rtx, GET_MODE (XEXP (x, 0)));
   if (tem)
return tem;
   SUBST (XEXP (x, 0), new_rtx);
diff --git a/gcc/testsuite/gcc.target/msp430/pr91865.c b/gcc/testsuite/gcc.target/msp430/pr91865.c
new file mode 100644
index 000..8cc21c8b9e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/pr91865.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlarge" } */
+
+const int table[2] = {1, 2};
+int foo (char i) { return table[i]; }
+
+/* { dg-final { scan-assembler-not "AND" } } */
+/* { dg-final { scan-assembler-not "RRAM" } } */


[PATCH] s390: Fix constraint for insn *cmphi_ccu

2023-10-25 Thread Stefan Schulze Frielinghaus
Currently for an unsigned 16-bit comparison between memory and an
immediate where the high bit is set, a clc is emitted.  This is because
the constant is created for mode HI and therefore sign extended.  This
means constraint D does not hold anymore.  Since the mode already
restricts the immediate to 16 bits, it is enough to make use of
constraint n and chop off the high bits in the output template.
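
The effect can be reproduced outside the compiler.  The sketch below is
an illustration only: the helper names are made up, a 64-bit host integer
stands in for the way the constant is held, and treating the failing
check as a plain 0..65535 range test is an assumption rather than the
definition of constraint D.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model a 16-bit constant held in a wider host integer: the value is
   sign extended from bit 15.  */
int64_t
sign_extend_16 (uint16_t bits)
{
  return bits >= 0x8000 ? (int64_t) bits - 0x10000 : (int64_t) bits;
}

/* Hypothetical stand-in for a check that only accepts 0..65535.  */
bool
fits_unsigned_16 (int64_t value)
{
  return value >= 0 && value <= 0xffff;
}

/* Keep only the low 16 bits, dropping the sign-extension bits.  */
uint16_t
low_16_bits (int64_t value)
{
  return (uint16_t) ((uint64_t) value & 0xffffu);
}
```

For example, 0x8001 is held as -32767 after sign extension, so a
0..65535 range check rejects it, while masking to the low 16 bits
recovers 0x8001.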

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390.md (*cmphi_ccu): For immediate operand 1 make
use of constraint n instead of D and chop off the high bits in the
output template.
---
 gcc/config/s390/s390.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 3f29ba21442..777a20f8e77 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -1355,13 +1355,13 @@
 (define_insn "*cmphi_ccu"
   [(set (reg CC_REGNUM)
 (compare (match_operand:HI 0 "nonimmediate_operand" "d,d,Q,Q,BQ")
- (match_operand:HI 1 "general_operand"  "Q,S,D,BQ,Q")))]
+ (match_operand:HI 1 "general_operand"  "Q,S,n,BQ,Q")))]
   "s390_match_ccmode (insn, CCUmode)
&& !register_operand (operands[1], HImode)"
   "@
clm\t%0,3,%S1
clmy\t%0,3,%S1
-   clhhsi\t%0,%1
+   clhhsi\t%0,%x1
#
#"
   [(set_attr "op_type" "RS,RSY,SIL,SS,SS")
-- 
2.41.0



Extend test suite coverage for OpenACC 'self' clause for compute constructs (was: [PATCH, OpenACC 2.7] Implement self clause for compute constructs)

2023-10-25 Thread Thomas Schwinge
Hi!

On 2023-10-25T10:57:06+0200, I wrote:
> With minor textual conflicts resolved, I've pushed this to master branch
> in commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
> "OpenACC 2.7: Implement self clause for compute constructs", see
> attached.
>
>
> I'll then apply/submit a number of follow-on commits.

> From 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a Mon Sep 17 00:00:00 2001
> From: Chung-Lin Tang 
> Date: Tue, 13 Jun 2023 08:44:31 -0700
> Subject: [PATCH] OpenACC 2.7: Implement self clause for compute constructs

>  .../c-c++-common/goacc/self-clause-1.c|  22 +
>  .../c-c++-common/goacc/self-clause-2.c|  17 +
>  gcc/testsuite/gfortran.dg/goacc/self.f95  |  53 +

>  .../libgomp.oacc-c-c++-common/self-1.c| 962 ++

I found that insufficient, and added some more.  Pushed to
master branch commit 047841a68ebf5f991e842961f9e54f3c10b94f2c
"Extend test suite coverage for OpenACC 'self' clause for compute constructs",
see attached.  This is mostly just adapting and cross-linking some
existing 'if' clause test cases.  (..., which turned up a problem when
the 'self' clause is used with OpenACC 'kernels'.)


Regards
 Thomas


From 047841a68ebf5f991e842961f9e54f3c10b94f2c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 23 Oct 2023 14:53:29 +0200
Subject: [PATCH] Extend test suite coverage for OpenACC 'self' clause for
 compute constructs

... on top of what was provided in recent
commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs".

	gcc/testsuite/
	* c-c++-common/goacc/if-clause-2.c: Enhance.
	* c-c++-common/goacc/self-clause-1.c: Likewise.
	* c-c++-common/goacc/self-clause-2.c: Likewise.
	* gfortran.dg/goacc/if.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
	* gfortran.dg/goacc/self.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Enhance.
	* testsuite/libgomp.oacc-c-c++-common/self-1.c: Likewise.
	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/if-self-1.c: New.
	* testsuite/libgomp.oacc-fortran/self-1.f90: Likewise.
---
 .../c-c++-common/goacc/if-clause-2.c  |   2 +
 .../c-c++-common/goacc/self-clause-1.c|   6 +
 .../c-c++-common/goacc/self-clause-2.c|  20 +
 gcc/testsuite/gfortran.dg/goacc/if.f95|  10 +-
 .../gfortran.dg/goacc/kernels-tree.f95|   5 +-
 .../gfortran.dg/goacc/parallel-tree.f95   |   3 +-
 gcc/testsuite/gfortran.dg/goacc/self.f95  |   8 +
 .../libgomp.oacc-c-c++-common/if-1.c  |   4 +
 .../libgomp.oacc-c-c++-common/if-self-1.c |  36 +
 .../libgomp.oacc-c-c++-common/self-1.c|   5 +
 .../testsuite/libgomp.oacc-fortran/if-1.f90   |   4 +
 .../testsuite/libgomp.oacc-fortran/self-1.f90 | 996 ++
 12 files changed, 1094 insertions(+), 5 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/if-self-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/self-1.f90

diff --git a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
index a48072509e1..71475521758 100644
--- a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
@@ -1,3 +1,5 @@
+/* See also 'self-clause-2.c'.  */
+
 /* { dg-additional-options "-fdump-tree-gimple" } */
 /* { dg-additional-options "--param=openacc-kernels=decompose" }
{ dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/self-clause-1.c b/gcc/testsuite/c-c++-common/goacc/self-clause-1.c
index fe892bea210..28de3dc0584 100644
--- a/gcc/testsuite/c-c++-common/goacc/self-clause-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/self-clause-1.c
@@ -5,6 +5,8 @@ f (int b)
 {
   struct { int i; } *p;
 
+#pragma acc parallel self(0) self(b) /* { dg-error "too many 'self' clauses" } */
+  ;
 #pragma acc parallel self self(b) /* { dg-error "too many 'self' clauses" } */
   ;
 #pragma acc parallel self(*p)
@@ -12,6 +14,8 @@ f (int b)
  { dg-error {could not convert '\* p' from 'f\(int\)::' to 'bool'} {} { target c++ } .-2 } */
   ;
 
+#pragma acc kernels self(0) self(b) /* { dg-error "too many 'self' clauses" } */
+  ;
 #pragma acc kernels self self(b) /* { dg-error "too many 'self' clauses" } */
   ;
 #pragma acc kernels self(*p)
@@ -19,6 +23,8 @@ f (int b)
  { dg-error {could not convert '\* p' from 'f\(int\)::' to 'bool'} {} { target c++ } .-2 } */
   ;
 
+#pragma acc serial self(0) self(b) /* { dg-error "too many 'self' clauses" } */
+  ;
 #pragma acc serial self self(b) /* { dg-error "too many 'self' clauses" } */
   ;

Handle OpenACC 'self' clause for compute constructs in OpenACC 'kernels' decomposition (was: Extend test suite coverage for OpenACC 'self' clause for compute constructs (was: [PATCH, OpenACC 2.7] Impl

2023-10-25 Thread Thomas Schwinge
Hi!

On 2023-10-25T11:29:52+0200, I wrote:
> On 2023-10-25T10:57:06+0200, I wrote:
>> With minor textual conflicts resolved, I've pushed this to master branch
>> in commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
>> "OpenACC 2.7: Implement self clause for compute constructs", see
>> attached.
>>
>>
>> I'll then apply/submit a number of follow-on commits.
>
>> From 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a Mon Sep 17 00:00:00 2001
>> From: Chung-Lin Tang 
>> Date: Tue, 13 Jun 2023 08:44:31 -0700
>> Subject: [PATCH] OpenACC 2.7: Implement self clause for compute constructs
>
>>  .../c-c++-common/goacc/self-clause-1.c|  22 +
>>  .../c-c++-common/goacc/self-clause-2.c|  17 +
>>  gcc/testsuite/gfortran.dg/goacc/self.f95  |  53 +
>
>>  .../libgomp.oacc-c-c++-common/self-1.c| 962 ++
>
> I found that insufficient, and added some more.  Pushed to
> master branch commit 047841a68ebf5f991e842961f9e54f3c10b94f2c
> "Extend test suite coverage for OpenACC 'self' clause for compute constructs",
> see attached.  This is mostly just adapting and cross-linking some
> existing 'if' clause test cases.  (..., which turned up a problem when
> the 'self' clause is used with OpenACC 'kernels'.)

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/self-1.f90
> @@ -0,0 +1,996 @@
> +! OpenACC 'self' clause.
> +
> +! This is 'if-1.f90' with 'self(!cond)' instead of 'if(cond)' on compute
> +! constructs.
> +! ..., which the exception of certain 'kernels' constructs.

..., which I've then fixed up per master branch
commit 7b2ae64b68132c1c643cb34d58cd5eab6f9de652
"Handle OpenACC 'self' clause for compute constructs in OpenACC 'kernels' 
decomposition",
see attached.


Regards
 Thomas


From 7b2ae64b68132c1c643cb34d58cd5eab6f9de652 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 23 Oct 2023 15:28:30 +0200
Subject: [PATCH] Handle OpenACC 'self' clause for compute constructs in
 OpenACC 'kernels' decomposition

... to fix up recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs" for that case.

	gcc/
	* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
	Handle 'OMP_CLAUSE_SELF' like 'OMP_CLAUSE_IF'.
	* omp-expand.cc (expand_omp_target): Handle 'OMP_CLAUSE_SELF' for
	'GF_OMP_TARGET_KIND_OACC_DATA_KERNELS'.
	gcc/testsuite/
	* c-c++-common/goacc/self-clause-2.c: Verify
	'--param=openacc-kernels=decompose'.
	* gfortran.dg/goacc/kernels-tree.f95: Adjust.
	libgomp/
	* oacc-parallel.c (GOACC_data_start): Handle
	'GOACC_FLAG_LOCAL_DEVICE'.
	(GOACC_parallel_keyed): Simplify accordingly.
	* testsuite/libgomp.oacc-fortran/self-1.f90: Adjust.
---
 gcc/omp-expand.cc   | 14 --
 gcc/omp-oacc-kernels-decompose.cc   | 15 ---
 .../c-c++-common/goacc/self-clause-2.c  |  6 ++
 .../gfortran.dg/goacc/kernels-tree.f95  |  2 +-
 libgomp/oacc-parallel.c | 17 +
 .../testsuite/libgomp.oacc-fortran/self-1.f90   | 15 +++
 6 files changed, 39 insertions(+), 30 deletions(-)

diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 8576b938102..5c6a7f2e381 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -10334,9 +10334,19 @@ expand_omp_target (struct omp_region *region)
 
   if ((c = omp_find_clause (clauses, OMP_CLAUSE_SELF)) != NULL_TREE)
 {
-  gcc_assert (is_gimple_omp_oacc (entry_stmt) && offloaded);
+  gcc_assert ((is_gimple_omp_oacc (entry_stmt) && offloaded)
+		  || (gimple_omp_target_kind (entry_stmt)
+		  == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS));
 
-  edge e = split_block_after_labels (new_bb);
+  edge e;
+  if (offloaded)
+	e = split_block_after_labels (new_bb);
+  else
+	{
+	  gsi = gsi_last_nondebug_bb (new_bb);
+	  gsi_prev (&gsi);
+	  e = split_block (new_bb, gsi_stmt (gsi));
+	}
   basic_block cond_bb = e->src;
   new_bb = e->dest;
   remove_edge (e);
diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index ffc0a8f813e..dfbb34935d0 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -1519,17 +1519,18 @@ omp_oacc_kernels_decompose_1 (gimple *kernels_stmt)
 	  break;
 	}
 	}
-  else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF)
+  else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF
+	   || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SELF)
 	{
-	  /* If there is an 'if' clause, it must be duplicated to the
-	 enclosing data region.  Temporarily remove the if clause's
-	 chain to avoid copying it.  */
+	  /* If there is an 'if' or 'self' clause, it must be duplicated to the
+	 enclosing data region.  

Re: [PATCH 3/3]rs6000: split complicate constant to constant pool

2023-10-25 Thread Jiufu Guo


Hi,

"Kewen.Lin"  writes:

> on 2023/10/25 16:14, Jiufu Guo wrote:
>> 
>> Hi,
>> 
>> "Kewen.Lin"  writes:
>> 
>>> Hi,
>>>
>>> on 2023/10/25 10:00, Jiufu Guo wrote:
 Hi,

 Sometimes, a complicated constant takes 3 (or more)
 instructions to build. Generally speaking, that is not
 as fast as loading it from the constant pool (see the
 discussions in PR63281).
>>>
>>> I may miss some previous discussions, but I'm curious why we
>>> chose ">=3" here, as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281#c9
>>> which indicates that more than 3 (>3) should be considered
>>> with this change.
>> 
>> Thanks a lot for your great patience for reading the history!
>> Yes, there are some discussions about "> 3" vs. "> 2".
>> - In theory, "ld" is one instruction.  If we consider the "address/toc"
>>   adjustment, we may count it as 2 instructions. "pld" may need fewer
>>   cycles.
>
> OK, even without prefixed insn support, the high part of address
> computation could be optimized as nop by linker further.  It would
> be good to say something on this in commit log, otherwise people
> may be confused as the PR comment mentioned above.
>
>> - In testing, "> 2" seems to give better/more stable runtime results
>>   on SPEC2017.
>
> Ok, if you posted the conclusion previously, it would be good to
> mention it here with a link on the result comparisons.

Thanks a lot for your suggestions.
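
For readers following the "> 2" discussion, the decision can be sketched in a few lines of C. This is only a toy model under stated assumptions: toy_num_insns and prefer_constant_pool are hypothetical names, not the rs6000 functions, and the count only knows about li, lis and lis+ori. The real num_insns_constant recognises many more one- and two-instruction forms, so the numbers here are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy instruction count (hypothetical, NOT the rs6000 implementation):
   1 for a value that fits "li" or "lis", 2 for "lis" + "ori",
   and 3 meaning "three or more" for everything else.  */
static int toy_num_insns (int64_t c)
{
  if (c >= INT16_MIN && c <= INT16_MAX)
    return 1;                             /* li */
  if (c >= INT32_MIN && c <= INT32_MAX)
    return (c & 0xffff) == 0 ? 1 : 2;     /* lis, or lis + ori */
  return 3;                               /* treated as "3 or more" */
}

/* The threshold under discussion: prefer the constant pool once
   building the value inline would take more than 2 instructions.  */
static bool prefer_constant_pool (int64_t c)
{
  return toy_num_insns (c) > 2;
}
```

With this model, 0x12345678 is still built inline (2 instructions), while 0x123456789abcdef0 would be loaded from the constant pool.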

>
>> 
>>>

 For the concern that I raised in:
 https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599676.html
 The micro-cases would not be the major concern. Because as
 Segher explained in:
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281#c18
 It would just be about the benchmark method.

 As tested on SPEC2017, the visible performance change is a
 runtime improvement of about 1.8% (-O2) on 500.perlbench_r
 when loading complicated constants from the constant pool
 is supported. There is no visible performance regression on
 other benchmarks.
>>>
>>> The improvement on 500.perlbench_r looks to match what PR63281
>>> mentioned, nice!  I'm curious that which options and which kinds
>>> of CPUs have you tested with?  Since this is a general change,
>>> I'd expect we can test with P8/P9/P10 at O2/O3 (or Ofast) at
>>> least.
>> 
>> Great advice! Thanks for pointing this!
>> A few months ago, P8/P9/P10 were tested. This time, I reran
>> SPEC2017 on P10 for -O2 and -O3.  More testing on the latest code would
>> be better.
>
> Was it tested previously with your recent commits on constant
> building together?  or just with the trunk at that time?  Anyway,

Just with the trunk at that time.

> I was curious how it's tested, thanks for replying, good to see
> those are covered.  :)  I'd leave the further review to Segher and
> David.

Thanks again.

BR,
Jeff (Jiufu Guo)

>
> BR,
> Kewen
>
>> 
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>>>
>>> BR,
>>> Kewen
>>>

 Bootstrap & regtest pass on ppc64{,le}.
 Is this ok for trunk?

 BR,
 Jeff (Jiufu Guo)

PR target/63281

 gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split
complicate constant to memory.

 gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const_anchors.c: Update to test final-rtl. 
* gcc.target/powerpc/parall_5insn_const.c: Update to keep original test
point.
* gcc.target/powerpc/pr106550.c: Likewise.
* gcc.target/powerpc/pr106550_1.c: Likewise.
* gcc.target/powerpc/pr87870.c: Update according to latest behavior.
* gcc.target/powerpc/pr93012.c: Likewise.

 ---
  gcc/config/rs6000/rs6000.cc | 16 
  .../gcc.target/powerpc/const_anchors.c  |  5 ++---
  .../gcc.target/powerpc/parall_5insn_const.c | 14 --
  gcc/testsuite/gcc.target/powerpc/pr106550.c | 17 +++--
  gcc/testsuite/gcc.target/powerpc/pr106550_1.c   | 15 +--
  gcc/testsuite/gcc.target/powerpc/pr87870.c  |  5 -
  gcc/testsuite/gcc.target/powerpc/pr93012.c  |  4 +++-
  7 files changed, 65 insertions(+), 11 deletions(-)

 diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
 index 4690384cdbe..b9562f1ea0f 100644
 --- a/gcc/config/rs6000/rs6000.cc
 +++ b/gcc/config/rs6000/rs6000.cc
 @@ -10292,6 +10292,22 @@ rs6000_emit_set_const (rtx dest, rtx source)
  c = sext_hwi (c, 32);
  emit_move_insn (lo, GEN_INT (c));
}
 +
 +  /* If it can be stored to the constant pool and profitable.  */
 +  else if (base_reg_operand (dest, mode)
 + && num_insns_constant (source, mode) > 2)
 +  {
 +rtx sym = force_const_mem (mode, source);
 +if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
 +&& use_toc_relative_ref (XEXP (sym, 0), mode))
 +  {
 +

Minor fixes for OpenACC/Fortran 'self' clause for compute constructs (was: [PATCH, OpenACC 2.7] Implement self clause for compute constructs)

2023-10-25 Thread Thomas Schwinge
Hi!

On 2023-10-25T10:57:06+0200, I wrote:
> With minor textual conflicts resolved, I've pushed this to master branch
> in commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
> "OpenACC 2.7: Implement self clause for compute constructs", see
> attached.
>
>
> I'll then apply/submit a number of follow-on commits.

Regarding the Fortran front end changes:

> From 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a Mon Sep 17 00:00:00 2001
> From: Chung-Lin Tang 
> Date: Tue, 13 Jun 2023 08:44:31 -0700
> Subject: [PATCH] OpenACC 2.7: Implement self clause for compute constructs

> --- a/gcc/fortran/gfortran.h
> +++ b/gcc/fortran/gfortran.h
> @@ -1546,6 +1546,7 @@ typedef struct gfc_omp_clauses
>gfc_omp_namelist *lists[OMP_LIST_NUM];
>struct gfc_expr *if_expr;
>struct gfc_expr *if_exprs[OMP_IF_LAST];
> +  struct gfc_expr *self_expr;
>struct gfc_expr *final_expr;
>struct gfc_expr *num_threads;
>struct gfc_expr *chunk_size;

..., this needs to be handled in a few more places, I think...

> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc

> @@ -6615,6 +6631,8 @@ gfc_split_omp_clauses (gfc_code *code,
> /* And this is copied to all.  */
> clausesa[GFC_OMP_SPLIT_TARGET].if_expr
>   = code->ext.omp_clauses->if_expr;
> +   clausesa[GFC_OMP_SPLIT_TARGET].self_expr
> + = code->ext.omp_clauses->self_expr;
> clausesa[GFC_OMP_SPLIT_TARGET].nowait
>   = code->ext.omp_clauses->nowait;
>   }

..., but this change isn't necessary: that function is for OpenMP only,
and generally doesn't (have to) care about OpenACC-only clauses.

OK to push the attached
"Minor fixes for OpenACC/Fortran 'self' clause for compute constructs",
or is anything more needed?


Also, I've filed 
"Missing OpenACC/Fortran handling in 'gcc/fortran/frontend-passes.c'",
which applies generally, not just to the OpenACC 'self' clause on compute
constructs.


Grüße
 Thomas


From 943de6d5f1498aabfc343bf5e9dd6c2b63fc55ed Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 20 Oct 2023 15:49:35 +0200
Subject: [PATCH] Minor fixes for OpenACC/Fortran 'self' clause for compute
 constructs

... to fix up recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs".

	gcc/fortran/
	* dump-parse-tree.cc (show_omp_clauses): Handle 'self_expr'.
	* openmp.cc (gfc_free_omp_clauses): Likewise.
	* trans-openmp.cc (gfc_split_omp_clauses): Don't handle 'self_expr'.
---
 gcc/fortran/dump-parse-tree.cc | 6 ++
 gcc/fortran/openmp.cc  | 1 +
 gcc/fortran/trans-openmp.cc| 2 --
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index cc4846e5d74..26391df46e6 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -1614,6 +1614,12 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
   show_expr (omp_clauses->if_exprs[i]);
   fputc (')', dumpfile);
 }
+  if (omp_clauses->self_expr)
+{
+  fputs (" SELF(", dumpfile);
+  show_expr (omp_clauses->self_expr);
+  fputc (')', dumpfile);
+}
   if (omp_clauses->final_expr)
 {
   fputs (" FINAL(", dumpfile);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 2e2e23d567b..5e3cd0570bb 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -163,6 +163,7 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   gfc_free_expr (c->if_expr);
   for (i = 0; i < OMP_IF_LAST; i++)
 gfc_free_expr (c->if_exprs[i]);
+  gfc_free_expr (c->self_expr);
   gfc_free_expr (c->final_expr);
   gfc_free_expr (c->num_threads);
   gfc_free_expr (c->chunk_size);
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 82bbc41b388..00782ee1716 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -6631,8 +6631,6 @@ gfc_split_omp_clauses (gfc_code *code,
 	  /* And this is copied to all.  */
 	  clausesa[GFC_OMP_SPLIT_TARGET].if_expr
 	= code->ext.omp_clauses->if_expr;
-	  clausesa[GFC_OMP_SPLIT_TARGET].self_expr
-	= code->ext.omp_clauses->self_expr;
 	  clausesa[GFC_OMP_SPLIT_TARGET].nowait
 	= code->ext.omp_clauses->nowait;
 	}
-- 
2.34.1



Re: [PATCH v2] AArch64: Improve immediate generation

2023-10-25 Thread Richard Earnshaw




On 24/10/2023 18:27, Wilco Dijkstra wrote:

v2: Use check-function-bodies in tests

Further improve immediate generation by adding support for 2-instruction
MOV/EOR bitmask immediates.  This reduces the number of 3/4-instruction
immediates in SPECCPU2017 by ~2%.

Passes regress, OK for commit?
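
The MOV/EOR search described above can be tried outside the compiler with a small standalone C sketch. Assumptions: is_bitmask_imm is a hand-written stand-in for aarch64_bitmask_imm (an element of 2 to 64 bits, replicated to 64 bits, holding a rotated run of ones), and find_mov_eor is a hypothetical name that only mirrors the loop in the patch; neither is the GCC code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count set bits without relying on compiler builtins.  */
static int popcount64 (uint64_t x)
{
  int n = 0;
  while (x)
    {
      x &= x - 1;
      n++;
    }
  return n;
}

/* Stand-in for aarch64_bitmask_imm (an assumption, not the GCC code):
   true if VAL is an element of 2/4/8/16/32/64 bits replicated to 64 bits,
   where the element is a rotated run of contiguous ones.  */
static bool is_bitmask_imm (uint64_t val)
{
  for (unsigned size = 2; size <= 64; size *= 2)
    {
      uint64_t mask = size == 64 ? ~UINT64_C (0) : (UINT64_C (1) << size) - 1;
      uint64_t elt = val & mask;
      bool replicated = true;

      for (unsigned s = size; s < 64; s += size)
        if (((val >> s) & mask) != elt)
          replicated = false;
      if (!replicated || elt == 0 || elt == mask)
        continue;

      /* Rotate right by one within SIZE bits; a single circular run of
         ones has exactly two 0/1 boundaries.  */
      uint64_t rot = ((elt >> 1) | (elt << (size - 1))) & mask;
      if (popcount64 (elt ^ rot) == 2)
        return true;
    }
  return false;
}

/* Mirror of the loop in the patch: replicate each 16-bit chunk of VAL
   and check whether both it and the XOR remainder are bitmask immediates.  */
static bool find_mov_eor (uint64_t val, uint64_t *mov_imm, uint64_t *eor_imm)
{
  assert (mov_imm && eor_imm);
  for (unsigned i = 0; i < 64; i += 16)
    {
      uint64_t val2 = (val >> i) & 0xffff;
      val2 |= val2 << 16;
      val2 |= val2 << 32;
      if (is_bitmask_imm (val2) && is_bitmask_imm (val ^ val2))
        {
          *mov_imm = val2;
          *eor_imm = val ^ val2;
          return true;
        }
    }
  return false;
}
```

For 0x10f0f0f0f0f0f0f1 the sketch yields 0xf0f0f0f0f0f0f0f0 and 0xe000000000000001, which are the same two values (printed as signed decimals) that the f2 test in the patch expects for mov and eor.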

gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_internal_mov_immediate):
Add support for immediates using MOV/EOR bitmask.

gcc/testsuite:
* gcc.target/aarch64/imm_choice_comparison.c: Change tests.
* gcc.target/aarch64/moveor_imm.c: Add new test.
* gcc.target/aarch64/pr106583.c: Change tests.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 578a253d6e0e133e19592553fc873b3e73f9f218..ed5be2b64c9a767d74e9d78415da964c669001aa 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -5748,6 +5748,26 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
}
  return 2;
}
+
+  /* Try 2 bitmask immediates which are xor'd together. */
+  for (i = 0; i < 64; i += 16)
+   {
+ val2 = (val >> i) & mask;
+ val2 |= val2 << 16;
+ val2 |= val2 << 32;
+ if (aarch64_bitmask_imm (val2) && aarch64_bitmask_imm (val ^ val2))
+   break;
+   }
+
+  if (i != 64)
+   {
+ if (generate)
+   {
+ emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
+ emit_insn (gen_xordi3 (dest, dest, GEN_INT (val ^ val2)));
+   }
+ return 2;
+   }
  }
  
/* Try a bitmask plus 2 movk to generate the immediate in 3 instructions.  */

diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
index ebc44d6dbc7287d907603d77d7b54496de177c4b..a1fc90ad73411ae8ed848fa321586afcb8d710aa 100644
--- a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
+++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
@@ -1,32 +1,64 @@
  /* { dg-do compile } */
  /* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
  
  /* Go from four moves to two.  */
  
+/*

+** foo:
+** mov w[0-9]+, 2576980377
+** movkx[0-9]+, 0x, lsl 32
+** ...
+*/
+
  int
  foo (long long x)
  {
-  return x <= 0x1998;
+  return x <= 0x9998;
  }
  
+/*

+** GT:
+** mov w[0-9]+, -16777217
+** ...
+*/
+
  int
  GT (unsigned int x)
  {
return x > 0xfefe;
  }
  
+/*

+** LE:
+** mov w[0-9]+, -16777217
+** ...
+*/
+
  int
  LE (unsigned int x)
  {
return x <= 0xfefe;
  }
  
+/*

+** GE:
+** mov w[0-9]+, 4278190079
+** ...
+*/
+
  int
  GE (long long x)
  {
return x >= 0xff00;
  }
  
+/*

+** LT:
+** mov w[0-9]+, -16777217
+** ...
+*/
+
  int
  LT (int x)
  {
@@ -35,6 +67,13 @@ LT (int x)
  
  /* Optimize the immediate in conditionals.  */
  
+/*

+** check:
+** ...
+** mov w[0-9]+, -16777217
+** ...
+*/
+
  int
  check (int x, int y)
  {
@@ -44,11 +83,15 @@ check (int x, int y)
return x;
  }
  
+/*

+** tern:
+** ...
+** mov w[0-9]+, -16777217
+** ...
+*/
+
  int
  tern (int x)
  {
return x >= 0xff00 ? 5 : -3;
  }
-
-/* baz produces one movk instruction.  */
-/* { dg-final { scan-assembler-times "movk" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/moveor_imm.c b/gcc/testsuite/gcc.target/aarch64/moveor_imm.c
new file mode 100644
index ..1c0c3f3bf8c588f9661112a8b3f9a72c5ddff95c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/moveor_imm.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+**  movx0, -6148914691236517206
+** eor x0, x0, -9223372036854775807
+** ret
+*/


Some odd white space above.

Also, I think it would be better to write the tests as

** f1:
**   ...
**   
**   ...

Then different prologue and epilogue options (such as BTI or pac-ret) 
won't affect the tests.



+
+long f1 (void)
+{
+  return 0x2aab;
+}
+
+/*
+** f2:
+** mov x0, -1085102592571150096
+** eor x0, x0, -2305843009213693951
+** ret
+*/
+
+long f2 (void)
+{
+  return 0x10f0f0f0f0f0f0f1;
+}
+
+/*
+** f3:
+** mov x0, -3689348814741910324
+** eor x0, x0, -4611686018427387903
+** ret
+*/
+
+long f3 (void)
+{
+  return 0xccd;
+}
+
+/*
+** f4:
+** mov x0, -7378697629483820647
+** eor x0, x0, -9223372036854775807
+** ret
+*/
+
+long f4 (void)
+{
+  return 0x1998;
+}
+
+/*
+** f5:
+** mov x0, 3689348814741910323
+** eor x0, x0, 864691128656461824
+** ret
+*/
+
+long f5 (void)
+{
+  return 0x3f333f33;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr106583.c 
b/gcc/testsuite/gcc.target/aarch64/pr106583.c
index 
0f93

Re: [RFC 1/2] RISC-V: Add support for _Bfloat16.

2023-10-25 Thread Jin Ma
> >>> +;; The conversion of DF to BF needs to be done with SF if there is a
> >>> +;; chance to generate at least one instruction, otherwise just using
> >>> +;; libfunc __truncdfbf2.
> >>> +(define_expand "truncdfbf2"
> >>> +  [(set (match_operand:BF 0 "register_operand" "=f")
> >>> +   (float_truncate:BF
> >>> +   (match_operand:DF 1 "register_operand" " f")))]
> >>> +  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
> >>> +  {
> >>> +convert_move (operands[0],
> >>> +   convert_modes (SFmode, DFmode, operands[1], 0), 0);
> >>> +DONE;
> >>> +  })
> >> So for conversions to/from BFmode, doesn't generic code take care of
> >> this for us?  Search for convert_mode_scalar in expr.cc. That code will
> >> utilize SFmode as an intermediate step just like your expander.   Is
> >> there some reason that generic code is insufficient?
> >>
> >> Similarly for the the other conversions.
> > 
> > As far as I can see, the function 'convert_mode_scalar' doesn't seem to be
> > perfect for dealing with the conversions to/from BFmode. It can only handle
> > BF to HF, SF, DF and SF to BF well, but the rest of the conversions are not
> > processed at all and directly use the libcall.
> > 
> > Maybe I should choose to enhance its functionality? This seems to be a
> > good choice, I'm not sure.
> 
> My recollection was that BF could be converted to/from SF trivially and
> if we wanted BF->DF we'd first convert to SF, then to DF.
> 
> Direct BF<->DF conversions aren't actually important from a performance 
> standpoint.  So it's OK if they have an extra step IMHO.

Thank you very much for your review and detailed reply. Maybe there are some
problems with how I expressed myself, and I am a little confused about your
guidance. My understanding is that you also think it is reasonable to
convert through SF, right? In fact, this is what I did.

In this patch, my thoughts are as follows:

The general principle is to use real instructions instead of libcalls as
much as possible for conversions, while minimizing the definition of new
libcalls (only reusing those already defined by other architectures such
as aarch64). If SF can be used as an intermediate step, converting through
SF is preferred; otherwise a libcall is used directly.

1. For the conversions between floating-point modes

For BF->DF, as you said, the function 'convert_mode_scalar' in the generic code
already handles this well; it is expressed as BF->SF->DF. The generated
instruction sequence may be one of the following:
  'call __extendbfsf2' + 'call __extendsfdf2'  (with soft floating point only);
  'call __extendbfsf2' + 'fcvt.d.s'            (when (TARGET_DOUBLE_FLOAT || TARGET_ZDINX) is true);
  'fcvt.s.bf16' + 'fcvt.d.s'                   (when ((TARGET_DOUBLE_FLOAT || TARGET_ZDINX) && TARGET_ZFBFMIN) is true)

For DF->BF, if either fcvt.s.d or fcvt.bf16.s cannot be generated, 'call
__truncdfbf2' is generated directly by the function 'convert_mode_scalar'.
Otherwise the new pattern (define_expand "truncdfbf2") is used. This makes it
possible to implement DF->BF as 'fcvt.s.d' + 'fcvt.bf16.s', which the function
'convert_mode_scalar' cannot generate.

2. For the conversions between integer and BF, it seems that gcc only uses
libcalls to implement them, but this is obviously wrong. For example, the
conversion BF->SI directly calls the unimplemented libcall __fixunsbfsi.
So I added some new patterns to handle these conversions through SF.
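
To make the "through SF" route concrete, here is a minimal C sketch of DF->SF->BF and BF->SF->DF done in software. The helper names are hypothetical (they are not GCC or libgcc functions), bfloat16 is represented as a raw uint16_t holding the top 16 bits of an IEEE single, and round-to-nearest-even for SF->BF is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* SF -> BF: keep the upper 16 bits of the single, rounding to nearest
   even (an assumption of this sketch).  NaNs are kept as quiet NaNs.  */
static uint16_t sf_to_bf16 (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  if ((bits & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((bits >> 16) | 0x0040u);
  uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
  return (uint16_t) ((bits + rounding) >> 16);
}

/* BF -> SF is exact: the 16 bits become the upper half of the single.  */
static float bf16_to_sf (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* DF -> BF through SF, i.e. the two-step shape of fcvt.s.d + fcvt.bf16.s.  */
static uint16_t df_to_bf16_via_sf (double d)
{
  return sf_to_bf16 ((float) d);
}

/* BF -> DF through SF, i.e. the two-step shape of fcvt.s.bf16 + fcvt.d.s.  */
static double bf16_to_df_via_sf (uint16_t h)
{
  return (double) bf16_to_sf (h);
}
```

For example, df_to_bf16_via_sf (1.0) gives 0x3f80 and bf16_to_df_via_sf (0x3f80) gives 1.0 again.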

Thanks,

Jin

> 
> jeff

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Siddhesh Poyarekar

On 2023-10-25 04:16, Martin Uecker wrote:

Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:



Am 24.10.2023 um 22:38 schrieb Martin Uecker :

Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:

Hi, Sid,

Really appreciate for your example and detailed explanation. Very helpful.
I think that this example is an excellent example to show (almost) all the 
issues we need to consider.

I slightly modified this example to make it to be compilable and run-able, as 
following:
(but I still cannot make the incorrect reordering or DSE happening, anyway, the 
potential reordering possibility is there…)

  1 #include 
  2 struct A
  3 {
  4  size_t size;
  5  char buf[] __attribute__((counted_by(size)));
  6 };
  7
  8 static size_t
  9 get_size_from (void *ptr)
10 {
11  return __builtin_dynamic_object_size (ptr, 1);
12 }
13
14 void
15 foo (size_t sz)
16 {
17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
18  obj->size = sz;
19  obj->buf[0] = 2;
20  __builtin_printf ("%d\n", get_size_from (obj->buf));
21  return;
22 }
23
24 int main ()
25 {
26  foo (20);
27  return 0;
28 }






When it’s set I suppose.  Turn

X.l = n;

Into

X.l = __builtin_with_size (x.buf, n);


It would turn

some_variable = (&) x.buf

into

some_variable = __builtin_with_size ( (&) x.buf. x.len)


So the later access to x.buf and not the initialization
of a member of the struct (which is too early).



Hmm, so with Qing's example above, are you suggesting the transformation 
be to foo like so:


14 void
15 foo (size_t sz)
16 {
16.5  void * _1;
17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
18  obj->size = sz;
19  obj->buf[0] = 2;
19.5  _1 = __builtin_with_size (obj->buf, obj->size);
20  __builtin_printf ("%d\n", get_size_from (_1));
21  return;
22 }

If yes then this could indeed work.  I think I got thrown off by the 
reference to __bdos.


Thanks,
Sid


Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Richard Biener



> Am 25.10.2023 um 10:16 schrieb Martin Uecker :
> 
> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>> 
 Am 24.10.2023 um 22:38 schrieb Martin Uecker :
>>> 
>>> Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
 Hi, Sid,
 
 Really appreciate for your example and detailed explanation. Very helpful.
 I think that this example is an excellent example to show (almost) all the 
 issues we need to consider.
 
 I slightly modified this example to make it to be compilable and run-able, 
 as following: 
 (but I still cannot make the incorrect reordering or DSE happening, 
 anyway, the potential reordering possibility is there…)
 
 1 #include 
 2 struct A
 3 {
 4  size_t size;
 5  char buf[] __attribute__((counted_by(size)));
 6 };
 7 
 8 static size_t
 9 get_size_from (void *ptr)
 10 {
 11  return __builtin_dynamic_object_size (ptr, 1);
 12 }
 13 
 14 void
 15 foo (size_t sz)
 16 {
 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
 sizeof(char));
 18  obj->size = sz;
 19  obj->buf[0] = 2;
 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
 21  return;
 22 }
 23 
 24 int main ()
 25 {
 26  foo (20);
 27  return 0;
 28 }
 
 With my GCC, it was compiled and worked:
 [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
 [opc@qinzhao-ol8u3-x86 ]$ ./a.out
 20
 Situation 1: With O1 and above, the routine “get_size_from” was inlined 
 into “foo”, therefore, the call to __bdos is in the same routine as the 
 instantiation of the object, and the TYPE information and the attached 
 counted_by attribute information in the TYPE of the object can be USED by 
 the __bdos call to compute the final object size. 
 
 [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
 [opc@qinzhao-ol8u3-x86 ]$ ./a.out
 -1
 Situation 2: With O0, the routine “get_size_from” was NOT inlined into 
 “foo”, therefore, the call to __bdos is Not in the same routine as the 
 instantiation of the object, As a result, the TYPE info and the attached 
 counted_by info of the object can NOT be USED by the __bdos call. 
 
 Keep in mind of the above 2 situations, we will refer them in below:
 
 1. First,  the problem we are trying to resolve is:
 
 (Your description):
 
> the reordering of __bdos w.r.t. initialization of the size parameter but 
> to also account for DSE of the assignment, we can abstract this problem 
> to that of DFA being unable to see implicit use of the size parameter in 
> the __bdos call.
 
 basically is correct.  However, with the following exception:
 
 The implicit use of the size parameter in the __bdos call is not always 
 there, it ONLY exists WHEN the __bdos is able to evaluated to an 
 expression of the size parameter in the “objsz” phase, i.e., the 
 “Situation 1” of the above example. 
 In the “Situation 2”, when the __bdos does not see the TYPE of the real 
 object,  it does not see the counted_by information from the TYPE, 
 therefore,  it is not able to evaluate the size of the object through the 
 counted_by information.  As a result, the implicit use of the size 
 parameter in the __bdos call does NOT exist at all.  The optimizer can 
 freely reorder the initialization of the size parameter with the __bdos 
 call since there is no data flow dependency between these two. 
 
 With this exception in mind, we can see that your proposed “option 2” 
 (making the type of size “volatile”) is too conservative, it will  disable 
 many optimizations  unnecessarily, even though it’s safe and simple to 
 implement. 
 
 As a compiler optimization person for many many years, I really don’t want 
 to take this approach at this moment.  -:)
 
 2. Some facts I’d like to mention:
 
 A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
 optimization stage. During RTL stage,  the __bdos call has already been 
 replaced by an expression of the size parameter or a constant, the data 
 dependency is explicitly in the IR already.  I believe that the data 
 analysis in RTL stage should pick up the data dependency correctly, No 
 special handling is needed in RTL.
 
 B. If the __bdos call cannot see the real object , it has no way to get 
 the “counted_by” field from the TYPE of the real object. So, if we try to 
 add the implicit use of the “counted_by” field to the __bdos call, the 
 object instantiation should be in the same routine as the __bdos call.  
 Both the FE and the gimplification phase are too early to do this work. 
 
 2. Then, what’s the best approach to resolve this problem:
 
 There were 

Re: [PATCH v4] c-family: Implement __has_feature and __has_extension [PR60512]

2023-10-25 Thread Alex Coplan
On 11/10/2023 14:31, Alex Coplan wrote:
> On 27/09/2023 15:27, Alex Coplan wrote:
> > Hi,
> > 
> > This is a v4 patch to address Jason's feedback here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630911.html
> > 
> > w.r.t. v3 it just removes a comment now that some uncertainty around
> > cxx_binary_literals has been resolved, and updates the documentation as
> > suggested to point to the Clang docs.
> > 
> > --
> > 
> > This patch implements clang's __has_feature and __has_extension in GCC.
> > Currently the patch aims to implement all documented features (and some
> > undocumented ones) following the documentation at
> > https://clang.llvm.org/docs/LanguageExtensions.html with the exception
> > of the legacy features for C++ type traits.  These are omitted, since as
> > the clang documentation notes, __has_builtin is the correct "modern" way
> > to query for these (which GCC already implements).
> 
> Gentle ping on this:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631525.html

Ping^2

> 
> Thanks,
> Alex


[committed] libstdc++: Build libstdc++_libbacktrace.a as PIC [PR111936]

2023-10-25 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk, will backport to 12 and 13.

-- >8 --

In order for std::stacktrace to be used in a shared library, the
libbacktrace symbols need to be built with -fPIC. Add the libtool
-prefer-pic flag to the commands in src/libbacktrace/Makefile so that
the archive contains PIC objects.

libstdc++-v3/ChangeLog:

PR libstdc++/111936
* src/libbacktrace/Makefile.am: Add -prefer-pic to libtool
compile commands.
* src/libbacktrace/Makefile.in: Regenerate.
---
 libstdc++-v3/src/libbacktrace/Makefile.am | 8 ++--
 libstdc++-v3/src/libbacktrace/Makefile.in | 7 ---
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/src/libbacktrace/Makefile.am b/libstdc++-v3/src/libbacktrace/Makefile.am
index 492d3b6e952..e3cede97463 100644
--- a/libstdc++-v3/src/libbacktrace/Makefile.am
+++ b/libstdc++-v3/src/libbacktrace/Makefile.am
@@ -49,9 +49,13 @@ WARN_FLAGS = -W -Wall -Wwrite-strings -Wmissing-format-attribute \
 -Wcast-qual
 C_WARN_FLAGS = $(WARN_FLAGS) -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wno-unused-but-set-variable
 CXX_WARN_FLAGS = $(WARN_FLAGS) -Wno-unused-parameter
-AM_CFLAGS = $(C_WARN_FLAGS)
+AM_CFLAGS = \
+   $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
+   $(C_WARN_FLAGS)
 AM_CFLAGS += $(EXTRA_CFLAGS)
-AM_CXXFLAGS = $(CXX_WARN_FLAGS) -fno-rtti -fno-exceptions
+AM_CXXFLAGS = \
+   $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
+   $(CXX_WARN_FLAGS) -fno-rtti -fno-exceptions
 AM_CXXFLAGS += $(EXTRA_CXXFLAGS)
 
 obj_prefix = std_stacktrace
diff --git a/libstdc++-v3/src/libbacktrace/Makefile.in b/libstdc++-v3/src/libbacktrace/Makefile.in
index a85c6d118ea..ce80d246766 100644
--- a/libstdc++-v3/src/libbacktrace/Makefile.in
+++ b/libstdc++-v3/src/libbacktrace/Makefile.in
@@ -472,9 +472,10 @@ libstdc___libbacktrace_la_CPPFLAGS = \
 
 C_WARN_FLAGS = $(WARN_FLAGS) -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wno-unused-but-set-variable
 CXX_WARN_FLAGS = $(WARN_FLAGS) -Wno-unused-parameter
-AM_CFLAGS = $(C_WARN_FLAGS) $(EXTRA_CFLAGS)
-AM_CXXFLAGS = $(CXX_WARN_FLAGS) -fno-rtti -fno-exceptions \
-   $(EXTRA_CXXFLAGS)
+AM_CFLAGS = $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
+   $(C_WARN_FLAGS) $(EXTRA_CFLAGS)
+AM_CXXFLAGS = $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
+   $(CXX_WARN_FLAGS) -fno-rtti -fno-exceptions $(EXTRA_CXXFLAGS)
 obj_prefix = std_stacktrace
 
 # Each FILE.c in SOURCES will be compiled to SHORTNAME-FILE.o
-- 
2.41.0



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Martin Uecker
Am Mittwoch, dem 25.10.2023 um 12:25 +0200 schrieb Richard Biener:
> 
> > Am 25.10.2023 um 10:16 schrieb Martin Uecker :
> > 
> > Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> > > 
> > > > > Am 24.10.2023 um 22:38 schrieb Martin Uecker :
> > > > 
> > > > Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
> > > > > Hi, Sid,
> > > > > 
> > > > > Really appreciate for your example and detailed explanation. Very 
> > > > > helpful.
> > > > > I think that this example is an excellent example to show (almost) 
> > > > > all the issues we need to consider.
> > > > > 
> > > > > I slightly modified this example to make it to be compilable and 
> > > > > run-able, as following: 
> > > > > (but I still cannot make the incorrect reordering or DSE happening, 
> > > > > anyway, the potential reordering possibility is there…)
> > > > > 
> > > > > 1 #include 
> > > > > 2 struct A
> > > > > 3 {
> > > > > 4  size_t size;
> > > > > 5  char buf[] __attribute__((counted_by(size)));
> > > > > 6 };
> > > > > 7 
> > > > > 8 static size_t
> > > > > 9 get_size_from (void *ptr)
> > > > > 10 {
> > > > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > > > 12 }
> > > > > 13 
> > > > > 14 void
> > > > > 15 foo (size_t sz)
> > > > > 16 {
> > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
> > > > > sizeof(char));
> > > > > 18  obj->size = sz;
> > > > > 19  obj->buf[0] = 2;
> > > > > 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
> > > > > 21  return;
> > > > > 22 }
> > > > > 23 
> > > > > 24 int main ()
> > > > > 25 {
> > > > > 26  foo (20);
> > > > > 27  return 0;
> > > > > 28 }
> > > > > 
> > > > > With my GCC, it was compiled and worked:
> > > > > [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
> > > > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > > > 20
> > > > > Situation 1: With O1 and above, the routine “get_size_from” was 
> > > > > inlined into “foo”, therefore, the call to __bdos is in the same 
> > > > > routine as the instantiation of the object, and the TYPE information 
> > > > > and the attached counted_by attribute information in the TYPE of the 
> > > > > object can be USED by the __bdos call to compute the final object 
> > > > > size. 
> > > > > 
> > > > > [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
> > > > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > > > -1
> > > > > Situation 2: With O0, the routine “get_size_from” was NOT inlined 
> > > > > into “foo”, therefore, the call to __bdos is Not in the same routine 
> > > > > as the instantiation of the object, As a result, the TYPE info and 
> > > > > the attached counted_by info of the object can NOT be USED by the 
> > > > > __bdos call. 
> > > > > 
> > > > > Keep in mind of the above 2 situations, we will refer them in below:
> > > > > 
> > > > > 1. First,  the problem we are trying to resolve is:
> > > > > 
> > > > > (Your description):
> > > > > 
> > > > > > the reordering of __bdos w.r.t. initialization of the size 
> > > > > > parameter but to also account for DSE of the assignment, we can 
> > > > > > abstract this problem to that of DFA being unable to see implicit 
> > > > > > use of the size parameter in the __bdos call.
> > > > > 
> > > > > basically is correct.  However, with the following exception:
> > > > > 
> > > > > The implicit use of the size parameter in the __bdos call is not 
> > > > > always there, it ONLY exists WHEN the __bdos is able to evaluated to 
> > > > > an expression of the size parameter in the “objsz” phase, i.e., the 
> > > > > “Situation 1” of the above example. 
> > > > > In the “Situation 2”, when the __bdos does not see the TYPE of the 
> > > > > real object,  it does not see the counted_by information from the 
> > > > > TYPE, therefore,  it is not able to evaluate the size of the object 
> > > > > through the counted_by information.  As a result, the implicit use of 
> > > > > the size parameter in the __bdos call does NOT exist at all.  The 
> > > > > optimizer can freely reorder the initialization of the size parameter 
> > > > > with the __bdos call since there is no data flow dependency between 
> > > > > these two. 
> > > > > 
> > > > > With this exception in mind, we can see that your proposed “option 2” 
> > > > > (making the type of size “volatile”) is too conservative, it will  
> > > > > disable many optimizations  unnecessarily, even though it’s safe and 
> > > > > simple to implement. 
> > > > > 
> > > > > As a compiler optimization person for many many years, I really don’t 
> > > > > want to take this approach at this moment.  -:)
> > > > > 
> > > > > 2. Some facts I’d like to mention:
> > > > > 
> > > > > A.  The incorrect reordering (or CSE) potential ONLY exists in the 
> > > > > TREE optimization stage. During RTL stage,  the __bdos call has 
> > > > > already been replaced by an expression of the size parameter or a 
> > > > > constant, the data dependency is explicitly in the IR already.  I 
>

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Martin Uecker
Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
> On 2023-10-25 04:16, Martin Uecker wrote:
> > Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> > > 
> > > > Am 24.10.2023 um 22:38 schrieb Martin Uecker :
> > > > 
> > > > Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
> > > > > Hi, Sid,
> > > > > 
> > > > > Really appreciate for your example and detailed explanation. Very 
> > > > > helpful.
> > > > > I think that this example is an excellent example to show (almost) 
> > > > > all the issues we need to consider.
> > > > > 
> > > > > I slightly modified this example to make it to be compilable and 
> > > > > run-able, as following:
> > > > > (but I still cannot make the incorrect reordering or DSE happening, 
> > > > > anyway, the potential reordering possibility is there…)
> > > > > 
> > > > >   1 #include 
> > > > >   2 struct A
> > > > >   3 {
> > > > >   4  size_t size;
> > > > >   5  char buf[] __attribute__((counted_by(size)));
> > > > >   6 };
> > > > >   7
> > > > >   8 static size_t
> > > > >   9 get_size_from (void *ptr)
> > > > > 10 {
> > > > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > > > 12 }
> > > > > 13
> > > > > 14 void
> > > > > 15 foo (size_t sz)
> > > > > 16 {
> > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
> > > > > sizeof(char));
> > > > > 18  obj->size = sz;
> > > > > 19  obj->buf[0] = 2;
> > > > > > 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
> > > > > 21  return;
> > > > > 22 }
> > > > > 23
> > > > > 24 int main ()
> > > > > 25 {
> > > > > 26  foo (20);
> > > > > 27  return 0;
> > > > > 28 }
> > > > > 
> 
> 
> 
> > > When it’s set I suppose.  Turn
> > > 
> > > X.l = n;
> > > 
> > > Into
> > > 
> > > X.l = __builtin_with_size (x.buf, n);
> > 
> > It would turn
> > 
> > some_variable = (&) x.buf
> > 
> > into
> > 
> > some_variable = __builtin_with_size ( (&) x.buf, x.len)
> > 
> > 
> > So the later access to x.buf and not the initialization
> > of a member of the struct (which is too early).
> > 
> 
> Hmm, so with Qing's example above, are you suggesting the transformation 
> be to foo like so:
> 
> 14 void
> 15 foo (size_t sz)
> 16 {
> 16.5  void * _1;
> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 18  obj->size = sz;
> 19  obj->buf[0] = 2;
> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
> 20  __builtin_printf ("%d\n", get_size_from (_1));
> 21  return;
> 22 }
> 
> If yes then this could indeed work.  I think I got thrown off by the 
> reference to __bdos.

Yes. I think it is important to evaluate the size at the
access to buf and not at the allocation, because the point is to
recover it from the size member even when the compiler can't
see the original allocation.

Evaluating at this point requires that the size is correctly set
before the access to the FAM and the user has to make sure 
this is the case. But to me this requirement would make sense.

Semantically, it could also make sense to evaluate the size at a
later time.  But then the reordering becomes problematic again.

Also I think this would make this feature generally more useful.
For example, it could also work for other pointers in the struct
and not just for FAMs.  In this case, the struct may already be
freed when BDOS is called, so it might also not be possible to
access the size member at a later time.
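
A minimal sketch of the idea of reading the size at the access to buf,
modelled in plain C.  __builtin_with_size is only a proposal in this
thread, so it is emulated here; struct sized_ptr, with_size and the
changed signature of get_size_from are hypothetical names used for
illustration, and the counted_by attribute is left out so that the code
compiles without it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same layout as struct A in the example above, without the attribute.  */
struct A
{
  size_t size;
  char buf[];
};

/* Hypothetical model of the result of the proposed __builtin_with_size:
   a pointer together with the bound that was read at the access.  */
struct sized_ptr
{
  char *ptr;
  size_t size;
};

/* Stand-in for __builtin_with_size (obj->buf, obj->size).  */
struct sized_ptr
with_size (char *ptr, size_t size)
{
  struct sized_ptr p = { ptr, size };
  return p;
}

/* Stand-in for the later size query: it only uses the recorded bound
   and never goes back to the struct.  */
size_t
get_size_from (struct sized_ptr p)
{
  assert (p.ptr != NULL);
  return p.size;
}

/* Mirrors foo from the example; returns the size seen by the query.  */
size_t
foo (size_t sz)
{
  struct A *obj = malloc (sizeof (struct A) + sz * sizeof (char));
  if (obj == NULL)
    return (size_t) -1;
  obj->size = sz;
  obj->buf[0] = 2;
  /* The size member is read here, at the access to buf.  */
  struct sized_ptr p = with_size (obj->buf, obj->size);
  /* A later store to the size member does not change the recorded bound.  */
  obj->size = 0;
  size_t result = get_size_from (p);
  free (obj);
  return result;
}
```

In this model the store to obj->size has to come before the access to
obj->buf, which is the requirement on the user described above.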

Martin


> 



Re: PR111754

2023-10-25 Thread Prathamesh Kulkarni
On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
 wrote:
>
> Hi,
>
> Sorry for the slow review.  I clearly didn't think this through properly
> when doing the review of the original patch, so I wanted to spend
> some time working on the code to get a better understanding of
> the problem.
>
> Prathamesh Kulkarni  writes:
> > Hi,
> > For the following test-case:
> >
> > typedef float __attribute__((__vector_size__ (16))) F;
> > F foo (F a, F b)
> > {
> >   F v = (F) { 9 };
> >   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > }
> >
> > Compiling with -O2 results in following ICE:
> > foo.c: In function ‘foo’:
> > foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> > 6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> >   |  ^~
> > 0x7f3185 wi::int_traits
> >>::decompose(long*, unsigned int, std::pair
> > const&)
> > ../../gcc/gcc/rtl.h:2314
> > 0x7f3185 wide_int_ref_storage > false>::wide_int_ref_storage
> >>(std::pair const&)
> > ../../gcc/gcc/wide-int.h:1089
> > 0x7f3185 generic_wide_int
> >>::generic_wide_int
> >>(std::pair const&)
> > ../../gcc/gcc/wide-int.h:847
> > 0x7f3185 poly_int<1u, generic_wide_int > false> > >::poly_int
> >>(poly_int_full, std::pair const&)
> > ../../gcc/gcc/poly-int.h:467
> > 0x7f3185 poly_int<1u, generic_wide_int > false> > >::poly_int
> >>(std::pair const&)
> > ../../gcc/gcc/poly-int.h:453
> > 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> > ../../gcc/gcc/rtl.h:2383
> > 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> > ../../gcc/gcc/rtx-vector-builder.h:122
> > 0xfd4e1b vector_builder > rtx_vector_builder>::elt(unsigned int) const
> > ../../gcc/gcc/vector-builder.h:253
> > 0xfd4d11 rtx_vector_builder::build()
> > ../../gcc/gcc/rtx-vector-builder.cc:73
> > 0xc21d9c const_vector_from_tree
> > ../../gcc/gcc/expr.cc:13487
> > 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > expand_modifier, rtx_def**, bool)
> > ../../gcc/gcc/expr.cc:11059
> > 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
> > ../../gcc/gcc/expr.h:310
> > 0xaee682 expand_return
> > ../../gcc/gcc/cfgexpand.cc:3809
> > 0xaee682 expand_gimple_stmt_1
> > ../../gcc/gcc/cfgexpand.cc:3918
> > 0xaee682 expand_gimple_stmt
> > ../../gcc/gcc/cfgexpand.cc:4044
> > 0xaf28f0 expand_gimple_basic_block
> > ../../gcc/gcc/cfgexpand.cc:6100
> > 0xaf4996 execute
> > ../../gcc/gcc/cfgexpand.cc:6835
> >
> > IIUC, the issue is that fold_vec_perm returns a vector having float element
> > type with res_nelts_per_pattern == 3, and later ICE's when it tries
> > to derive element v[3], not present in the encoding, while trying to
> > build rtx vector
> > in rtx_vector_builder::build():
> >  for (unsigned int i = 0; i < nelts; ++i)
> > RTVEC_ELT (v, i) = elt (i);
> >
> > The attached patch tries to fix this by returning false from
> > valid_mask_for_fold_vec_perm_cst if sel has a stepped sequence and
> > input vector has non-integral element type, so for VLA vectors, it
> > will only build result with dup sequence (nelts_per_pattern < 3) for
> > non-integral element type.
> >
> > For VLS vectors, this will still work for stepped sequence since it
> > will then use the "VLS exception" in fold_vec_perm_cst, and set:
> > res_npattern = res_nelts and
> > res_nelts_per_pattern = 1
> >
> > and fold the above case to:
> > F foo (F a, F b)
> > {
> >[local count: 1073741824]:
> >   return { 0.0, 9.0e+0, 0.0, 0.0 };
> > }
> >
> > But I am not sure if this is entirely correct, since:
> > tree res = out_elts.build ();
> > will canonicalize the encoding and may result in a stepped sequence
> > (vector_builder::finalize() may reduce npatterns at the cost of increasing
> > nelts_per_pattern)  ?
> >
> > PS: This issue is now latent after PR111648 fix, since
> > valid_mask_for_fold_vec_perm_cst with  sel = {1, 0, 1, ...} returns
> > false because the corresponding pattern in arg0 is not a natural
> > stepped sequence, and folds correctly using VLS exception. However, I
> > guess the underlying issue of dealing with non-integral element types
> > in fold_vec_perm_cst still remains ?
> >
> > The patch passes bootstrap+test with and without SVE on aarch64-linux-gnu,
> > and on x86_64-linux-gnu.
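
A simplified model of the pattern encoding discussed above, assuming
integer elements only.  The struct and function names are illustrative
and are not GCC's; the decoding follows the scheme as described in this
thread (npatterns interleaved patterns with 1, 2 or 3 encoded elements
each).  It shows why an element beyond the encoding of a stepped
sequence needs a subtraction of two encoded elements, which is the
operation that has no meaning for the float vector in the test case.

```c
#include <assert.h>
#include <stddef.h>

/* NPATTERNS interleaved patterns, each with NELTS_PER_PATTERN leading
   elements stored in ENCODED (npatterns * nelts_per_pattern entries).  */
struct encoded_vec
{
  unsigned int npatterns;
  unsigned int nelts_per_pattern;
  const long *encoded;
};

/* Return element I of the full vector described by V.  */
long
encoded_vec_elt (const struct encoded_vec *v, unsigned int i)
{
  assert (v->nelts_per_pattern >= 1 && v->nelts_per_pattern <= 3);

  unsigned int encoded_nelts = v->npatterns * v->nelts_per_pattern;
  /* Elements that are present in the encoding are returned directly.  */
  if (i < encoded_nelts)
    return v->encoded[i];

  /* Identify the pattern and the index of the element within it.  */
  unsigned int pattern = i % v->npatterns;
  unsigned int count = i / v->npatterns;
  unsigned int final_i = encoded_nelts - v->npatterns + pattern;
  long final = v->encoded[final_i];

  /* One or two elements per pattern: the last encoded one is duplicated.  */
  if (v->nelts_per_pattern <= 2)
    return final;

  /* Three elements per pattern: a stepped sequence.  The step is the
     difference between the last two encoded elements.  */
  long prev = v->encoded[final_i - v->npatterns];
  long step = final - prev;
  return final + (long) (count - 2) * step;
}
```

With one pattern encoded as { 1, 0, 1 }, element 3 decodes to 2, which
matches the selector 1, 0, 1, 2 of the test case.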
>
> I think the problem is instead in the way that we're calculating
> res_npatterns and res_nelts_per_pattern.
>
> If the selector is a duplication of { a1, ..., an }, then the
> result will be a duplication of n elements, regardless of the shape
> of the other arguments.
>
> Similarly, if the selector is { a1, ..., an } followed by a
> duplication of { b1, ..., bn }, the result will be n elements followed
> by a duplication of n elements, regardless of the shape of the other
> arguments.
>
> So for these two cases, res_npatterns and res_nelts_per_pattern
> can come directly from the s

[PATCH v2 1/4] libgrust: Add entry for maintainers and stub changelog file.

2023-10-25 Thread Arthur Cohen
ChangeLog:

* MAINTAINERS: Add maintainers for libgrust.

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Add libgrust.
* gcc_update: Add libgrust file dependencies.

Co-authored-by: Pierre-Emmanuel Patry 
---
 MAINTAINERS | 1 +
 contrib/gcc-changelog/git_commit.py | 1 +
 contrib/gcc_update  | 4 
 libgrust/ChangeLog  | 6 ++
 4 files changed, 12 insertions(+)
 create mode 100644 libgrust/ChangeLog

diff --git a/MAINTAINERS b/MAINTAINERS
index 4401086fb6c..d2bdde71d27 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -182,6 +182,7 @@ libgo   Ian Lance Taylor

 libgompJakub Jelinek   
 libgompTobias Burnus   

 libgomp (OpenACC)  Thomas Schwinge 
+libgrust   All Rust front end maintainers
 libiberty  Ian Lance Taylor
 libitm Torvald Riegel  
 libobjcNicola Pero 

diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index 9110317a759..4e601fa1f63 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -69,6 +69,7 @@ default_changelog_locations = {
 'libgfortran',
 'libgm2',
 'libgomp',
+'libgrust',
 'libhsail-rt',
 'libiberty',
 'libitm',
diff --git a/contrib/gcc_update b/contrib/gcc_update
index cda2bdb0df9..774c926e723 100755
--- a/contrib/gcc_update
+++ b/contrib/gcc_update
@@ -153,6 +153,10 @@ libgomp/testsuite/Makefile.in: 
libgomp/testsuite/Makefile.am libgomp/aclocal.m4
 libgomp/configure.ac: libgomp/plugin/configfrag.ac
 libgomp/configure: libgomp/configure.ac libgomp/aclocal.m4
 libgomp/config.h.in: libgomp/configure.ac libgomp/aclocal.m4
+libgrust/Makefile.in: libgrust/Makefile.am libgrust/aclocal.m4
+libgrust/aclocal.m4: libgrust/configure.ac
+libgrust/configure: libgrust/configure.ac libgrust/aclocal.m4
+libgrust/libproc_macro_internal/Makefile.in: 
libgrust/libproc_macro_internal/Makefile.am libgrust/aclocal.m4
 libitm/aclocal.m4: libitm/configure.ac libitm/acinclude.m4
 libitm/Makefile.in: libitm/Makefile.am libitm/aclocal.m4
 libitm/testsuite/Makefile.in: libitm/testsuite/Makefile.am libitm/aclocal.m4
diff --git a/libgrust/ChangeLog b/libgrust/ChangeLog
new file mode 100644
index 000..97887c90552
--- /dev/null
+++ b/libgrust/ChangeLog
@@ -0,0 +1,6 @@
+
+Copyright (C) 2023 Free Software Foundation, Inc.
+
+Copying and distribution of this file, with or without modification,
+are permitted in any medium without royalty provided the copyright
+notice and this notice are preserved.
-- 
2.42.0



[PATCH v2 2/4] libgrust: Add libproc_macro and build system

2023-10-25 Thread Arthur Cohen
From: Pierre-Emmanuel Patry 

Add some dummy files in libproc_macro along with its build system.

libgrust/ChangeLog:

* Makefile.am: New file.
* configure.ac: New file.
* libproc_macro/Makefile.am: New file.
* libproc_macro/proc_macro.cc: New file.
* libproc_macro/proc_macro.h: New file.

Signed-off-by: Pierre-Emmanuel Patry 
---
 libgrust/Makefile.am |  68 
 libgrust/configure.ac| 113 +++
 libgrust/libproc_macro/Makefile.am   |  58 ++
 libgrust/libproc_macro/proc_macro.cc |   7 ++
 libgrust/libproc_macro/proc_macro.h  |   7 ++
 5 files changed, 253 insertions(+)
 create mode 100644 libgrust/Makefile.am
 create mode 100644 libgrust/configure.ac
 create mode 100644 libgrust/libproc_macro/Makefile.am
 create mode 100644 libgrust/libproc_macro/proc_macro.cc
 create mode 100644 libgrust/libproc_macro/proc_macro.h

diff --git a/libgrust/Makefile.am b/libgrust/Makefile.am
new file mode 100644
index 000..8e5274922c5
--- /dev/null
+++ b/libgrust/Makefile.am
@@ -0,0 +1,68 @@
+AUTOMAKE_OPTIONS = 1.8 foreign
+
+SUFFIXES = .c .rs .def .o .lo .a
+
+ACLOCAL_AMFLAGS = -I . -I .. -I ../config
+
+AM_CFLAGS = -I $(srcdir)/../libgcc -I $(MULTIBUILDTOP)../../gcc/include
+
+TOP_GCCDIR := $(shell cd $(top_srcdir) && cd .. && pwd)
+
+GCC_DIR = $(TOP_GCCDIR)/gcc
+RUST_SRC = $(GCC_DIR)/rust
+
+toolexeclibdir=@toolexeclibdir@
+toolexecdir=@toolexecdir@
+
+SUBDIRS = libproc_macro
+
+RUST_BUILDDIR := $(shell pwd)
+
+# Work around what appears to be a GNU make bug handling MAKEFLAGS
+# values defined in terms of make variables, as is the case for CC and
+# friends when we are called from the top level Makefile.
+AM_MAKEFLAGS = \
+"GCC_DIR=$(GCC_DIR)" \
+"RUST_SRC=$(RUST_SRC)" \
+   "AR_FLAGS=$(AR_FLAGS)" \
+   "CC_FOR_BUILD=$(CC_FOR_BUILD)" \
+   "CC_FOR_TARGET=$(CC_FOR_TARGET)" \
+   "RUST_FOR_TARGET=$(RUST_FOR_TARGET)" \
+   "CFLAGS=$(CFLAGS)" \
+   "CXXFLAGS=$(CXXFLAGS)" \
+   "CFLAGS_FOR_BUILD=$(CFLAGS_FOR_BUILD)" \
+   "CFLAGS_FOR_TARGET=$(CFLAGS_FOR_TARGET)" \
+   "INSTALL=$(INSTALL)" \
+   "INSTALL_DATA=$(INSTALL_DATA)" \
+   "INSTALL_PROGRAM=$(INSTALL_PROGRAM)" \
+   "INSTALL_SCRIPT=$(INSTALL_SCRIPT)" \
+   "LDFLAGS=$(LDFLAGS)" \
+   "LIBCFLAGS=$(LIBCFLAGS)" \
+   "LIBCFLAGS_FOR_TARGET=$(LIBCFLAGS_FOR_TARGET)" \
+   "MAKE=$(MAKE)" \
+   "MAKEINFO=$(MAKEINFO) $(MAKEINFOFLAGS)" \
+   "PICFLAG=$(PICFLAG)" \
+   "PICFLAG_FOR_TARGET=$(PICFLAG_FOR_TARGET)" \
+   "SHELL=$(SHELL)" \
+   "RUNTESTFLAGS=$(RUNTESTFLAGS)" \
+   "exec_prefix=$(exec_prefix)" \
+   "infodir=$(infodir)" \
+   "libdir=$(libdir)" \
+   "includedir=$(includedir)" \
+   "prefix=$(prefix)" \
+   "tooldir=$(tooldir)" \
+   "gxx_include_dir=$(gxx_include_dir)" \
+   "AR=$(AR)" \
+   "AS=$(AS)" \
+   "LD=$(LD)" \
+   "RANLIB=$(RANLIB)" \
+   "NM=$(NM)" \
+   "NM_FOR_BUILD=$(NM_FOR_BUILD)" \
+   "NM_FOR_TARGET=$(NM_FOR_TARGET)" \
+   "DESTDIR=$(DESTDIR)" \
+   "WERROR=$(WERROR)" \
+"TARGET_LIB_PATH=$(TARGET_LIB_PATH)" \
+"TARGET_LIB_PATH_librust=$(TARGET_LIB_PATH_librust)" \
+   "LIBTOOL=$(RUST_BUILDDIR)/libtool"
+
+include $(top_srcdir)/../multilib.am
diff --git a/libgrust/configure.ac b/libgrust/configure.ac
new file mode 100644
index 000..7aed489a643
--- /dev/null
+++ b/libgrust/configure.ac
@@ -0,0 +1,113 @@
+AC_INIT([libgrust], version-unused,,librust)
+AC_CONFIG_SRCDIR(Makefile.am)
+AC_CONFIG_FILES([Makefile])
+
+# AM_ENABLE_MULTILIB(, ..)
+
+# Do not delete or change the following two lines.  For why, see
+# http://gcc.gnu.org/ml/libstdc++/2003-07/msg00451.html
+AC_CANONICAL_SYSTEM
+target_alias=${target_alias-$host_alias}
+AC_SUBST(target_alias)
+
+# Automake should never attempt to rebuild configure
+AM_MAINTAINER_MODE
+
+AM_INIT_AUTOMAKE([1.15.1 foreign no-dist -Wall])
+
+# Make sure we don't test executables when making cross-tools.
+GCC_NO_EXECUTABLES
+
+
+# Add the ability to change LIBTOOL directory
+GCC_WITH_TOOLEXECLIBDIR
+
+# Use system specific extensions
+AC_USE_SYSTEM_EXTENSIONS
+
+
+# Checks for header files.
+AC_HEADER_STDC
+AC_HEADER_SYS_WAIT
+AC_CHECK_HEADERS(limits.h stddef.h string.h strings.h stdlib.h \
+ time.h sys/stat.h wchar.h)
+
+# Check for tools
+AM_PROG_AR
+AC_PROG_CC
+AC_PROG_CXX
+AM_PROG_AS
+AC_PROG_MAKE_SET
+AC_PROG_INSTALL
+
+# Enable libtool
+LT_INIT
+
+# target_noncanonical variables...
+AC_CANONICAL_HOST
+ACX_NONCANONICAL_HOST
+ACX_NONCANONICAL_TARGET
+GCC_TOPLEV_SUBDIRS
+
+AC_MSG_CHECKING([for --enable-version-specific-runtime-libs])
+AC_ARG_ENABLE(version-specific-runtime-libs,
+[  --enable-version-specific-runtime-libsSpecify that runtime libraries 
should be installed in a compiler-specific directory ],
+[case "$enableval" in
+ yes) version_specific_libs=yes ;;
+ no)  version_specific_

[PATCH v2 3/4] build: Add libgrust as compilation modules

2023-10-25 Thread Arthur Cohen
From: Pierre-Emmanuel Patry 

Define the libgrust directory as a host compilation module as well as
for targets.

ChangeLog:

* Makefile.def: Add libgrust as host & target module.
* configure.ac: Add libgrust to host tools list.

gcc/rust/ChangeLog:

* config-lang.in: Add libgrust as a target module for the rust
language.

Signed-off-by: Pierre-Emmanuel Patry 
---
 Makefile.def| 2 ++
 configure.ac| 3 ++-
 gcc/rust/config-lang.in | 2 ++
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/Makefile.def b/Makefile.def
index 15c068e4ac4..929a6f0a08e 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -149,6 +149,7 @@ host_modules= { module= libcc1; 
extra_configure_flags=--enable-shared; };
 host_modules= { module= gotools; };
 host_modules= { module= libctf; bootstrap=true; };
 host_modules= { module= libsframe; bootstrap=true; };
+host_modules= { module= libgrust; };
 
 target_modules = { module= libstdc++-v3;
   bootstrap=true;
@@ -192,6 +193,7 @@ target_modules = { module= libgm2; lib_path=.libs; };
 target_modules = { module= libgomp; bootstrap= true; lib_path=.libs; };
 target_modules = { module= libitm; lib_path=.libs; };
 target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; };
+target_modules = { module= libgrust; };
 
 // These are (some of) the make targets to be done in each subdirectory.
 // Not all; these are the ones which don't have special options.
diff --git a/configure.ac b/configure.ac
index 692dc716343..b2a5511bab1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -133,7 +133,7 @@ build_tools="build-texinfo build-flex build-bison build-m4 
build-fixincludes"
 
 # these libraries are used by various programs built for the host environment
 #f
-host_libs="intl libiberty opcodes bfd readline tcl tk itcl libgui zlib 
libbacktrace libcpp libcody libdecnumber gmp mpfr mpc isl libiconv libctf 
libsframe"
+host_libs="intl libiberty opcodes bfd readline tcl tk itcl libgui zlib 
libbacktrace libcpp libcody libdecnumber gmp mpfr mpc isl libiconv libctf 
libsframe libgrust "
 
 # these tools are built for the host environment
 # Note, the powerpc-eabi build depends on sim occurring before gdb in order to
@@ -164,6 +164,7 @@ target_libraries="target-libgcc \
target-libada \
target-libgm2 \
target-libgo \
+   target-libgrust \
target-libphobos \
target-zlib"
 
diff --git a/gcc/rust/config-lang.in b/gcc/rust/config-lang.in
index aac66c9b962..8f071dcb0bf 100644
--- a/gcc/rust/config-lang.in
+++ b/gcc/rust/config-lang.in
@@ -29,4 +29,6 @@ compilers="rust1\$(exeext)"
 
 build_by_default="no"
 
+target_libs="target-libffi target-libbacktrace target-libgrust"
+
 gtfiles="\$(srcdir)/rust/rust-lang.cc"
-- 
2.42.0



[PATCH v2 4/4] build: Regenerate build files

2023-10-25 Thread Arthur Cohen
From: Pierre-Emmanuel Patry 

Regenerate all build files.

ChangeLog:

* Makefile.in: Regenerate.
* configure: Regenerate.

libgrust/ChangeLog:

* Makefile.in: New file.
* aclocal.m4: New file.
* configure: New file.
* libproc_macro/Makefile.in: New file.

libgm2/ChangeLog:

* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* libm2cor/Makefile.in: Regenerate.
* libm2iso/Makefile.in: Regenerate.
* libm2log/Makefile.in: Regenerate.
* libm2min/Makefile.in: Regenerate.
* libm2pim/Makefile.in: Regenerate.

Signed-off-by: Pierre-Emmanuel Patry 
---
 Makefile.in|  1015 +-
 configure  | 3 +-
 libgrust/Makefile.in   |   671 +
 libgrust/aclocal.m4|  1260 ++
 libgrust/configure | 18420 +++
 libgrust/libproc_macro/Makefile.in |   704 +
 6 files changed, 22063 insertions(+), 10 deletions(-)
 create mode 100644 libgrust/Makefile.in
 create mode 100644 libgrust/aclocal.m4
 create mode 100755 libgrust/configure
 create mode 100644 libgrust/libproc_macro/Makefile.in


Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-25 Thread Ajit Agarwal



On 25/10/23 2:19 am, Vineet Gupta wrote:
> On 10/24/23 13:36, rep.dot@gmail.com wrote:
>> As said, I don't see why the below was not cleaned up before the V1 
>> submission.
>> Iff it breaks when manually CSEing, I'm curious why?
 The function below looks identical in v12 of the patch.
 Why didn't you use common subexpressions?
 ba
>>> Using CSE here breaks aarch64 regressions hence I have reverted it back
>>> not to use CSE,
>> Just for my own education, can you please paste your patch perusing common 
>> subexpressions and an assembly diff of the failing versus working aarch64 
>> testcase, along how you configured that failing (cross-?)compiler and the 
>> command-line of a typical testcase that broke when manually CSEing the 
>> function below?
> 
> I was meaning to ask this before, but what exactly is the CSE issue, manually 
> or whatever.
> 
Here is the ABI interface that I CSE'd, after which I got a mail from the
automated regression run saying that the aarch64 tests fail.

static inline bool
abi_extension_candidate_return_reg_p (int regno)
{
  if (targetm.calls.function_value_regno_p (regno))
return true;

  return false;
}

+static inline bool
+abi_extension_candidate_return_reg_p (int regno)
+{
+  return targetm.calls.function_value_regno_p (regno);
+}


I have not done any assembly diff, as I have not cross-compiled for
aarch64 myself.
With the above CSE reverted, the tests pass in the automated regression
runs and builds at Linaro.
Linaro runs the tests on every patch we submit to gcc-patches and
reports an error if there is any failure.

With the CSE reverted, the Linaro tests pass.
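
A minimal sketch comparing the two forms of the function shown above.
It assumes the target hook returns bool; function_value_regno_p below is
a hypothetical stand-in for targetm.calls.function_value_regno_p that
treats only register 3 as a return register, and the other names are
illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for targetm.calls.function_value_regno_p.  */
bool
function_value_regno_p (int regno)
{
  return regno == 3;
}

/* The form with the explicit if/return true/return false.  */
bool
return_reg_p_verbose (int regno)
{
  if (function_value_regno_p (regno))
    return true;

  return false;
}

/* The form that returns the result of the hook directly.  */
bool
return_reg_p_direct (int regno)
{
  return function_value_regno_p (regno);
}

/* Check that both forms agree for every register number below NREGS.  */
void
check_same_result (int nregs)
{
  for (int regno = 0; regno < nregs; regno++)
    assert (return_reg_p_verbose (regno) == return_reg_p_direct (regno));
}
```

Under that assumption both forms give the same result for every input,
so a difference in test results would have to come from somewhere else.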

Thanks & Regards
Ajit

> -Vineet


Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-25 Thread Ajit Agarwal



On 25/10/23 2:06 am, rep.dot@gmail.com wrote:
> On 24 October 2023 09:36:22 CEST, Ajit Agarwal  wrote:
>> Hello Bernhard:
>>
>> On 23/10/23 7:40 pm, Bernhard Reutner-Fischer wrote:
>>> On Mon, 23 Oct 2023 12:16:18 +0530
>>> Ajit Agarwal  wrote:
>>>
 Hello All:

 Addressed below review comments in the version 11 of the patch.
 Please review and please let me know if its ok for trunk.
>>>
>>> s/satisified/satisfied/
>>>
>>
>> I will fix this.
> 
> thanks!
> 
>>
> As said, I don't see why the below was not cleaned up before the V1 
> submission.
> Iff it breaks when manually CSEing, I'm curious why?
>>>
>>> The function below looks identical in v12 of the patch.
>>> Why didn't you use common subexpressions?
>>> ba
>>
>> Using CSE here breaks aarch64 regressions hence I have reverted it back 
>> not to use CSE,
> 
> Just for my own education, can you please paste your patch perusing common 
> subexpressions and an assembly diff of the failing versus working aarch64 
> testcase, along how you configured that failing (cross-?)compiler and the 
> command-line of a typical testcase that broke when manually CSEing the 
> function below?
> 
> I might have not completely understood the subtile intricacies of RTL 
> re-entrancy, it seems?
> 

Here is the ABI interface that I CSE'd, after which I got a mail from the
automated regression run saying that the aarch64 tests fail.

static inline bool
abi_extension_candidate_return_reg_p (int regno)
{
  if (targetm.calls.function_value_regno_p (regno))
return true;

  return false;
}

+static inline bool
+abi_extension_candidate_return_reg_p (int regno)
+{
+  return targetm.calls.function_value_regno_p (regno);
+}


I have not done any assembly diff, as I have not cross-compiled for
aarch64 myself.
With the above CSE reverted, the tests pass in the automated regression
runs and builds at Linaro.
Linaro runs the tests on every patch we submit to gcc-patches and
reports an error if there is any failure.

With the CSE reverted, the Linaro tests pass.

Thanks & Regards
Ajit
> thanks
> 
>   
>>> +/* Return TRUE if reg source operand of zero_extend is argument 
>>> registers
>>> +   and not return registers and source and destination operand are same
>>> +   and mode of source and destination operand are not same.  */
>>> +
>>> +static bool
>>> +abi_extension_candidate_p (rtx_insn *insn)
>>> +{
>>> +  rtx set = single_set (insn);
>>> +  machine_mode dst_mode = GET_MODE (SET_DEST (set));
>>> +  rtx orig_src = XEXP (SET_SRC (set), 0);
>>> +
>>> +  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
>>> +  || abi_extension_candidate_return_reg_p (/*insn,*/ REGNO 
>>> (orig_src)))  
>>> +return false;
>>> +
>>> +  /* Mode of destination and source should be different.  */
>>> +  if (dst_mode == GET_MODE (orig_src))
>>> +return false;
>>> +
>>> +  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
>>> +  bool promote_p = abi_target_promote_function_mode (mode);
>>> +
>>> +  /* REGNO of source and destination should be same if not
>>> +  promoted.  */
>>> +  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
>>> +return false;
>>> +
>>> +  return true;
>>> +}
>>> +  
>>>
>>>
>
> As said, please also rephrase the above (and everything else if it 
> obviously looks akin the above).
>>>
>>> thanks
> 


Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Richard Biener



> Am 25.10.2023 um 12:47 schrieb Martin Uecker :
> 
> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
 
> Am 24.10.2023 um 22:38 schrieb Martin Uecker :
> 
> Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
>> Hi, Sid,
>> 
>> Really appreciate for your example and detailed explanation. Very 
>> helpful.
>> I think that this example is an excellent example to show (almost) all 
>> the issues we need to consider.
>> 
>> I slightly modified this example to make it to be compilable and 
>> run-able, as following:
>> (but I still cannot make the incorrect reordering or DSE happening, 
>> anyway, the potential reordering possibility is there…)
>> 
>>  1 #include 
>>  2 struct A
>>  3 {
>>  4  size_t size;
>>  5  char buf[] __attribute__((counted_by(size)));
>>  6 };
>>  7
>>  8 static size_t
>>  9 get_size_from (void *ptr)
>> 10 {
>> 11  return __builtin_dynamic_object_size (ptr, 1);
>> 12 }
>> 13
>> 14 void
>> 15 foo (size_t sz)
>> 16 {
>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
>> sizeof(char));
>> 18  obj->size = sz;
>> 19  obj->buf[0] = 2;
>> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>> 21  return;
>> 22 }
>> 23
>> 24 int main ()
>> 25 {
>> 26  foo (20);
>> 27  return 0;
>> 28 }
>> 
>> 
>> 
>> 
 When it’s set I suppose.  Turn
 
 X.l = n;
 
 Into
 
 X.l = __builtin_with_size (x.buf, n);
>>> 
>>> It would turn
>>> 
>>> some_variable = (&) x.buf
>>> 
>>> into
>>> 
>>> some_variable = __builtin_with_size ( (&) x.buf, x.len)
>>> 
>>> 
>>> So the later access to x.buf and not the initialization
>>> of a member of the struct (which is too early).
>>> 
>> 
>> Hmm, so with Qing's example above, are you suggesting the transformation 
>> be to foo like so:
>> 
>> 14 void
>> 15 foo (size_t sz)
>> 16 {
>> 16.5  void * _1;
>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> 18  obj->size = sz;
>> 19  obj->buf[0] = 2;
>> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
>> 20  __builtin_printf ("%d\n", get_size_from (_1));
>> 21  return;
>> 22 }
>> 
>> If yes then this could indeed work.  I think I got thrown off by the 
>> reference to __bdos.
> 
> Yes. I think it is important to evaluate the size at the
> access to buf and not at the allocation, because the point is to
> recover it from the size member even when the compiler can't
> see the original allocation.

But if the access is through a pointer without the attribute visible, even the
frontend cannot recover it?  We’d need to force type correctness and give up on
indirecting through an int * when it can refer to two different container
types.  The best we can do, I think, is mark allocation sites and hope for some
basic code hygiene (not clobbering size or array pointer through pointers
without the appropriately attributed type).
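
A minimal sketch of the problem with plain pointers, assuming two
hypothetical container types struct A and struct B (struct B and all
function names are made up for illustration).  Recovering the count
from a pointer to the array only works when the container type is
known, since the offset of the array and the name of the count member
differ between the types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct A
{
  size_t size;
  char buf[];
};

/* A second, hypothetical container with a different layout.  */
struct B
{
  size_t cap;
  size_t len;
  char data[];
};

/* Allocate a struct A with room for N chars and record N in it.  */
struct A *
make_A (size_t n)
{
  struct A *a = malloc (sizeof (struct A) + n);
  if (a != NULL)
    a->size = n;
  return a;
}

/* Valid only if P really points at the buf member of a struct A.  */
size_t
count_if_from_A (const char *p)
{
  assert (p != NULL);
  const struct A *a = (const struct A *) (p - offsetof (struct A, buf));
  return a->size;
}

/* Valid only if P really points at the data member of a struct B.
   Nothing in the type of P says which of the two functions applies.  */
size_t
count_if_from_B (const char *p)
{
  assert (p != NULL);
  const struct B *b = (const struct B *) (p - offsetof (struct B, data));
  return b->len;
}
```

A function that only receives a char * has no way to choose between the
two, which is why the information has to be attached where the type is
still visible.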

> Evaluating at this point requires that the size is correctly set
> before the access to the FAM and the user has to make sure 
> this is the case. But to me this requirement would make sense.
> 
> Semantically, it could also make sense to evaluate the size at a
> later time.  But then the reordering becomes problematic again.
> 
> Also I think this would make this feature generally more useful.
> For example, it could also work for other pointers in the struct
> and not just for FAMs.  In this case, the struct may already be
> freed when BDOS is called, so it might also not be possible to
> access the size member at a later time.
> 
> Martin
> 
> 
>> 
> 


Re: [PATCH V14 4/4] ree: Improve ree pass using defined abi interfaces

2023-10-25 Thread Ajit Agarwal



On 24/10/23 11:47 pm, Vineet Gupta wrote:
> 
> 
> On 10/24/23 10:03, Ajit Agarwal wrote:
>> Hello Vineet, Jeff and Bernhard:
>>
>> This version 14 of the patch uses abi interfaces to remove zero and sign 
>> extension elimination.
>> This fixes aarch64 regressions failures with aggressive CSE.
> 
> Once again, this information belong between the two "---" lines that you 
> added for v6 and stopped updating.
> 
> And it seems the only code difference between v13 and v14 is
> 
> -  return tgt_mode == mode;
> +  if (tgt_mode == mode)
> +    return true;
> +  else
> +    return false;
> 
> How does that make any difference ?

In V14 of the patch I reverted the CSE done in V13 of the patch.
This is because I got a mail from Linaro reporting regression failures.
Then I got an apology mail saying there were some errors at their end and
asking me to ignore the report.

Please review V13 of the patch, with the CSE applied, and let me know if it
is okay for trunk.

Thanks & Regards
Ajit


> 
> -Vineet
> 
>>
>> Bootstrapped and regtested on powerpc-linux-gnu.
>>
>> In this version (version 14) of the patch following review comments are 
>> incorporated.
>>
>> a) Removal of hard code zero_extend and sign_extend  in abi interfaces.
>> b) Source and destination with different registers are considered.
>> c) Further enhancements.
>> d) Added sign extension elimination using abi interfaces.
>> e) Addressed remaining review comments from Vineet.
>> f) Addressed review comments from Bernhard.
>> g) Fix aarch64 regressions failure.
>>
>> Please let me know if there is anything missing in this patch.
>>
>> Ok for trunk?
>>
>> Thanks & Regards
>> Ajit
>>
>> ree: Improve ree pass using defined abi interfaces
>>
>> For rs6000 target we see zero and sign extend with missing
>> definitions. Improved to eliminate such zero and sign extension
>> using defined ABI interfaces.
>>
>> 2023-10-24  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>>  * ree.cc (combine_reaching_defs): Eliminate zero_extend and 
>> sign_extend
>>  using defined abi interfaces.
>>  (add_removable_extension): Use of defined abi interfaces for no
>>  reaching defs.
>>  (abi_extension_candidate_return_reg_p): New function.
>>  (abi_extension_candidate_p): New function.
>>  (abi_extension_candidate_argno_p): New function.
>>  (abi_handle_regs): New function.
>>  (abi_target_promote_function_mode): New function.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.target/powerpc/zext-elim-3.C
>> ---
>> changes since v6:
>>    - Added missing abi interfaces.
>>    - Rearranging and restructuring the code.
>>    - Removal of hard coded zero extend and sign extend in abi interfaces.
>>    - Relaxed different registers with source and destination in abi 
>> interfaces.
>>    - Using CSE in abi interfaces.
>>    - Fix aarch64 regressions.
>>    - Add Sign extension removal in abi interfaces.
>>    - Modified comments as per coding convention.
>>    - Modified code as per coding convention.
>>    - Fix bug bootstrapping RISCV failures.
>> ---
>>   gcc/ree.cc    | 147 +-
>>   .../g++.target/powerpc/zext-elim-3.C  |  13 ++
>>   2 files changed, 154 insertions(+), 6 deletions(-)
>>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C
>>
>> diff --git a/gcc/ree.cc b/gcc/ree.cc
>> index fc04249fa84..f557b49b366 100644
>> --- a/gcc/ree.cc
>> +++ b/gcc/ree.cc
>> @@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
>>   if (REGNO (DF_REF_REG (def)) == REGNO (reg))
>>     break;
>>   -  gcc_assert (def != NULL);
>> +  if (def == NULL)
>> +    return NULL;
>>       ref_chain = DF_REF_CHAIN (def);
>>   @@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)
>>     return src;
>>   }
>>   +/* Return TRUE if target mode is equal to source mode, false otherwise.  
>> */
>> +
>> +static bool
>> +abi_target_promote_function_mode (machine_mode mode)
>> +{
>> +  int unsignedp;
>> +  machine_mode tgt_mode
>> +    = targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
>> +   NULL_TREE, 1);
>> +
>> +  if (tgt_mode == mode)
>> +    return true;
>> +  else
>> +    return false;
>> +}
>> +
>> +/* Return TRUE if regno is a return register.  */
>> +
>> +static inline bool
>> +abi_extension_candidate_return_reg_p (int regno)
>> +{
>> +  if (targetm.calls.function_value_regno_p (regno))
>> +    return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Return TRUE if the following conditions are satisfied.
>> +
>> +  a) reg source operand is argument register and not return register.
>> +  b) mode of source and destination operand are different.
>> +  c) if not promoted REGNO of source and destination operand are same.  */
>> +
>> +static bool
>> +abi_extension_candidate_p (rtx_insn *insn)
>> +{
>> +  rtx set = single_set (insn);
>> +  machine_mode dst_mode = GET_MODE (SET_DEST (set));
>> +  rtx orig_src = XEXP (SET_SRC (set), 0);
>> +
>> 

[PATCH V2] RISC-V: Add AVL propagation PASS for RVV auto-vectorization

2023-10-25 Thread Juzhe-Zhong
This patch addresses the redundant AVL/VL toggling in RVV partial
auto-vectorization, which has been a known issue for a long time; I have
finally found the time to address it.

Consider a simple vector addition operation:

https://godbolt.org/z/7hfGfEjW3

void
foo (int *__restrict a,
     int *__restrict b,
     int n)
{
  for (int i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}

Optimized IR:

Loop body:
  _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);                           -> vsetvli a5,a2,e8,mf4,ta,ma
  ...
  vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);     -> vle32.v v2,0(a0)
  vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);    -> vle32.v v1,0(a1)
  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;                               -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
  .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);   -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)

We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling occurs because we are missing LEN information in the simple
PLUS_EXPR GIMPLE assignment:

vect__7.12_19 = vect__6.11_20 + vect__4.8_27;

GCC applies partially predicated loads/stores and un-predicated full-vector
operations in partial vectorization.
This flow is used by all other targets, such as ARM SVE (RVV also uses it):

ARM SVE:
   
.L3:
        ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
        ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
        add     z31.s, z31.s, z30.s            -> un-predicated add
        st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store

Such a vectorization flow causes AVL/VL toggling on RVV, so we need an AVL
propagation PASS for it.

Also, it's very unlikely that we can apply predicated operations to all
vectorization, for the following reasons:

1. It's a very heavy workload to support them in all vectorization, and we
don't see any benefit if we can handle this in the target backend.
2. Changing the loop vectorizer for it will make the code base ugly and hard to
maintain.
3. We would need a great many patterns for all operations: not only
COND_LEN_ADD and COND_LEN_SUB, but also COND_LEN_EXTEND, COND_LEN_CEIL, ...
That is over 100 patterns, an unreasonable number.

To conclude, we prefer un-predicated operations here, and design a clean AVL
propagation PASS to elide the redundant vsetvls caused by AVL/VL toggling.
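
The toggling in the loop body above can be illustrated with a deliberately simplified model. This is only a sketch, not the pass itself: the `Avl`, `Insn`, `count_extra_vsetvls` and `propagate_avl` names are hypothetical, each instruction is assumed to demand either the loop's LEN or VLMAX, and a new vsetvl is assumed to be needed whenever that demand changes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Which vector length an instruction asks for (a simplification).
enum class Avl { Len, Vlmax };

struct Insn {
  std::string name;
  Avl avl;
};

// Count the vsetvls needed inside the loop body, assuming the .SELECT_VL at
// the top has already established the Len configuration and that a new
// vsetvl is needed whenever the demanded AVL changes.
int count_extra_vsetvls(const std::vector<Insn> &body)
{
  Avl current = Avl::Len;
  int count = 0;
  for (const Insn &insn : body)
    if (insn.avl != current)
      {
        ++count;
        current = insn.avl;
      }
  return count;
}

// Model of AVL propagation: give every full-vector (Vlmax) operation the
// loop's Len.  This assumes each such result is only consumed by
// Len-controlled operations, as in the vector-add example.
std::vector<Insn> propagate_avl(std::vector<Insn> body)
{
  for (Insn &insn : body)
    if (insn.avl == Avl::Vlmax)
      insn.avl = Avl::Len;
  return body;
}

// The loop body from the example: two loads, one add, one store.
std::vector<Insn> example_loop_body()
{
  return {
    { "vle32.v", Avl::Len },    // .MASK_LEN_LOAD
    { "vle32.v", Avl::Len },    // .MASK_LEN_LOAD
    { "vadd.vv", Avl::Vlmax },  // un-predicated PLUS_EXPR
    { "vse32.v", Avl::Len },    // .MASK_LEN_STORE
  };
}
```

In this model the original body needs two extra vsetvls (one before the add, one before the store), and none once the add has inherited the loop's LEN.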

The second question is why we add a separate PASS for AVL propagation. Why
not optimize it in the VSETVL PASS? (We definitely can optimize AVL in the
VSETVL PASS.)

Frankly, I was planning to address this issue in the VSETVL PASS, which is why
we recently refactored it. However, I changed my mind after several
experiments and attempts.

The reasons are as follows:

1. Code base management and maintenance. The current VSETVL PASS is
   complicated enough and already has enough aggressive and fancy
   optimizations, which turn out to generate optimal codegen in most cases.
   It's not a good idea to keep adding features to the VSETVL PASS and make it
   heavier and heavier again; then we would need to refactor it again in the
   future.
   Actually, the VSETVL PASS is very stable and optimal after the recent
   refactoring. Hopefully, we should not change the VSETVL PASS any more
   except for minor fixes.

2. vsetvl insertion (which the VSETVL PASS does) and AVL propagation are 2
   different things; I don't think we should fuse them into the same PASS.

3. The VSETVL PASS is a post-RA PASS, whereas AVL propagation should be done
   before RA, which can reduce register pressure.

4. This patch's AVL propagation PASS only does AVL propagation for RVV partial
   auto-vectorization situations.
   The code in this patch is only a few hundred lines, which is very
   manageable and can be extended very easily.
   We can easily extend and enhance AVL propagation in a clean and separate
   PASS in the future. (If we did it in the VSETVL PASS, we would complicate
   the VSETVL PASS again, which is already so complicated.)

Here is an example to demonstrate more:

https://godbolt.org/z/bE86sv3q5

void foo2 (int *__restrict a,
  int *__restrict b,
  int *__restrict c,
  int *__restrict a2,
  int *__restrict b2,
  int *__restrict c2,
  int *__restrict a3,
  int *__restrict b3,
  int *__restrict c3,
  int *__restrict a4,
  int *__restrict b4,
  int *__restrict c4,
  int *__restrict a5,
  int *__restrict b5,
  int *__restrict c5,
  int n)
{
for (int i = 0; i < n; i++){
  a[i] = b[i] + c[i];
  b5[i] = b[i] + c[i];
  a2[i] = b2[i] + c2[i];
  a3[i] = b3[i] + c3[i];
  a4[i] = b4[i] + c4[i];
  a5[i] = a[i] + a4[i];
  a[i] = a5[i] + b5[i]+ a[i];

  a[i] = a[i] + c[i];
  b5[i] = a[i] + 

Re: [PATCH] gcc/jit/jit-recording.cc: recording::global::write_to_dump: Avoid crashes when writing psuedo-C for globals with string initializers.

2023-10-25 Thread David Malcolm
On Fri, 2022-11-25 at 02:13 +0530, Vibhav Pant via Jit wrote:
> If a char * global was initialized with a rvalue from
> `gcc_jit_context_new_string_literal` containing a format string,
> dumping the context causes libgccjit to SIGSEGV due to an improperly
> constructed call to vasprintf. The following code snippet can
> reproduce
> the crash:
> 
> int main(int argc, char **argv)
> {
>  gcc_jit_context *ctxt = gcc_jit_context_acquire ();
>  gcc_jit_lvalue *var = gcc_jit_context_new_global(
>  ctxt, NULL, GCC_JIT_GLOBAL_EXPORTED,
>  gcc_jit_context_get_type(ctxt, GCC_JIT_TYPE_CONST_CHAR_PTR),
>  "var");
>  gcc_jit_global_set_initializer_rvalue(
>  var, gcc_jit_context_new_string_literal(ctxt, "%s"));
>  gcc_jit_context_dump_to_file (ctxt, "output", 0);
>  return 0;
> }
> 
> The offending line is jit-recording.cc:4922, where a call to d.write
> passes the initializer rvalue's debug string to `write` without a
> format specifier. The attached patch fixes this issue.
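
The class of bug described above can be sketched in isolation. The helper below is a hypothetical printf-style function standing in for the dump writer (it is not libgccjit's actual code); the point is only that text which may contain '%' must be passed as data behind a constant "%s" format rather than as the format itself.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical printf-style helper standing in for the dump writer.
std::string format_to_string(const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  va_list ap_copy;
  va_copy(ap_copy, ap);
  // First pass: measure the formatted length.
  int len = std::vsnprintf(nullptr, 0, fmt, ap);
  va_end(ap);

  std::string out;
  if (len > 0)
    {
      out.resize(static_cast<std::size_t>(len));
      // Second pass: format into the buffer (len characters plus the NUL).
      std::vsnprintf(&out[0], out.size() + 1, fmt, ap_copy);
    }
  va_end(ap_copy);
  return out;
}

// Unsafe: format_to_string(text.c_str()) would interpret any '%' in TEXT
// as a conversion and read arguments that were never passed.
// Safe: pass the text as data behind a constant "%s" format.
std::string dump_text(const std::string &text)
{
  return format_to_string("%s", text.c_str());
}
```

With this shape, a string literal such as "%s" is reproduced verbatim in the dump instead of being interpreted.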

Thanks for spotting this, and sorry about missing your patch.

I've gone ahead and pushed this to trunk (as r14-4923-gac66744d94226a),
and will backport it.

Dave



[pushed] c++: add fixed testcase [PR99804]

2023-10-25 Thread Patrick Palka
Tested on x86_64-pc-linux-gnu, pushed to trunk.

-- >8 --

We accept the non-dependent call f(e) here ever since the
NON_DEPENDENT_EXPR removal patch r14-4793-gdad311874ac3b3.
I haven't looked closely into why but I suspect wrapping 'e'
in a NON_DEPENDENT_EXPR was causing the argument conversion
to misbehave.

PR c++/99804

gcc/testsuite/ChangeLog:

* g++.dg/template/enum9.C: New test.
---
 gcc/testsuite/g++.dg/template/enum9.C | 12 
 1 file changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/enum9.C

diff --git a/gcc/testsuite/g++.dg/template/enum9.C 
b/gcc/testsuite/g++.dg/template/enum9.C
new file mode 100644
index 000..c992cd505c2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/enum9.C
@@ -0,0 +1,12 @@
+// PR c++/99804
+
+struct S {
+  enum E { A, B } e : 1;
+  void f(E);
+  template void g() { f(e); }
+};
+
+int main() {
+  S s;
+  s.g();
+}
-- 
2.42.0.482.g2e8e77cbac



Re: [pushed][PATCH v1] LoongArch: Implement __builtin_thread_pointer for TLS.

2023-10-25 Thread chenglulu

Pushed to r14-4925.

On 2023/10/24 at 2:40 PM, chenxiaolong wrote:

gcc/ChangeLog:

* config/loongarch/loongarch.md (get_thread_pointer):Adds the
 instruction template corresponding to the __builtin_thread_pointer
 function.
* doc/extend.texi:Add the __builtin_thread_pointer function support
 description to the documentation.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/builtin_thread_pointer.c: New test.
---
  gcc/config/loongarch/loongarch.md  |  7 +++
  gcc/doc/extend.texi|  5 +
  .../gcc.target/loongarch/builtin_thread_pointer.c  | 10 ++
  3 files changed, 22 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/builtin_thread_pointer.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 13473472171..4dd716e1941 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -113,6 +113,7 @@ (define_c_enum "unspecv" [
  
  (define_constants
    [(RETURN_ADDR_REGNUM        1)
+    (TP_REGNUM                 2)
     (T0_REGNUM                 12)
     (T1_REGNUM                 13)
     (S0_REGNUM                 23)
@@ -3647,6 +3648,12 @@ (define_insn "@stack_tie"
[(set_attr "length" "0")
 (set_attr "type" "ghost")])
  
+;; Named pattern for expanding thread pointer reference.

+(define_expand "get_thread_pointer"
+  [(set (match_operand:P 0 "register_operand" "=r")
+   (reg:P TP_REGNUM))]
+  "HAVE_AS_TLS"
+  {})
  
  (define_split
[(match_operand 0 "small_data_pattern")]
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index bf941e6b93a..9923a18bde9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -16749,6 +16749,11 @@ __float128 __builtin_nanq (void);
  __float128 __builtin_nansq (void);
  @end smallexample
  
+Returns the value that is currently set in the @samp{tp} register.

+@smallexample
+void * __builtin_thread_pointer (void)
+@end smallexample
+
  @node MIPS DSP Built-in Functions
  @subsection MIPS DSP Built-in Functions
  
diff --git a/gcc/testsuite/gcc.target/loongarch/builtin_thread_pointer.c b/gcc/testsuite/gcc.target/loongarch/builtin_thread_pointer.c

new file mode 100644
index 000..541e3b143bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/builtin_thread_pointer.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "or\t\\\$r4,\\\$r2,\\\$r0" } } */
+
+void *
+get_tp ()
+{
+  return __builtin_thread_pointer ();
+}




Re: [pushed][PATCH v1] LoongArch: Fix vfrint-releated comments in lsxintrin.h and lasxintrin.h

2023-10-25 Thread chenglulu

Pushed to r14-4926.

On 2023/10/23 at 10:13 AM, Chenghui Pan wrote:

The comments on the vfrint-related intrinsic functions do not match the return
value types in their definitions. This patch fixes these comments.

gcc/ChangeLog:

* config/loongarch/lasxintrin.h (__lasx_xvftintrnel_l_s): Fix comments.
(__lasx_xvfrintrne_s): Ditto.
(__lasx_xvfrintrne_d): Ditto.
(__lasx_xvfrintrz_s): Ditto.
(__lasx_xvfrintrz_d): Ditto.
(__lasx_xvfrintrp_s): Ditto.
(__lasx_xvfrintrp_d): Ditto.
(__lasx_xvfrintrm_s): Ditto.
(__lasx_xvfrintrm_d): Ditto.
* config/loongarch/lsxintrin.h (__lsx_vftintrneh_l_s): Ditto.
(__lsx_vfrintrne_s): Ditto.
(__lsx_vfrintrne_d): Ditto.
(__lsx_vfrintrz_s): Ditto.
(__lsx_vfrintrz_d): Ditto.
(__lsx_vfrintrp_s): Ditto.
(__lsx_vfrintrp_d): Ditto.
(__lsx_vfrintrm_s): Ditto.
(__lsx_vfrintrm_d): Ditto.
---
  gcc/config/loongarch/lasxintrin.h | 16 
  gcc/config/loongarch/lsxintrin.h  | 16 
  2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/config/loongarch/lasxintrin.h 
b/gcc/config/loongarch/lasxintrin.h
index d3937992746..7bce2c757f1 100644
--- a/gcc/config/loongarch/lasxintrin.h
+++ b/gcc/config/loongarch/lasxintrin.h
@@ -3368,7 +3368,7 @@ __m256i __lasx_xvftintrnel_l_s (__m256 _1)
  }
  
  /* Assembly instruction format:	xd, xj.  */

-/* Data types in instruction templates:  V8SI, V8SF.  */
+/* Data types in instruction templates:  V8SF, V8SF.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  __m256 __lasx_xvfrintrne_s (__m256 _1)
  {
@@ -3376,7 +3376,7 @@ __m256 __lasx_xvfrintrne_s (__m256 _1)
  }
  
  /* Assembly instruction format:	xd, xj.  */

-/* Data types in instruction templates:  V4DI, V4DF.  */
+/* Data types in instruction templates:  V4DF, V4DF.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  __m256d __lasx_xvfrintrne_d (__m256d _1)
  {
@@ -3384,7 +3384,7 @@ __m256d __lasx_xvfrintrne_d (__m256d _1)
  }
  
  /* Assembly instruction format:	xd, xj.  */

-/* Data types in instruction templates:  V8SI, V8SF.  */
+/* Data types in instruction templates:  V8SF, V8SF.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  __m256 __lasx_xvfrintrz_s (__m256 _1)
  {
@@ -3392,7 +3392,7 @@ __m256 __lasx_xvfrintrz_s (__m256 _1)
  }
  
  /* Assembly instruction format:	xd, xj.  */

-/* Data types in instruction templates:  V4DI, V4DF.  */
+/* Data types in instruction templates:  V4DF, V4DF.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  __m256d __lasx_xvfrintrz_d (__m256d _1)
  {
@@ -3400,7 +3400,7 @@ __m256d __lasx_xvfrintrz_d (__m256d _1)
  }
  
  /* Assembly instruction format:	xd, xj.  */

-/* Data types in instruction templates:  V8SI, V8SF.  */
+/* Data types in instruction templates:  V8SF, V8SF.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  __m256 __lasx_xvfrintrp_s (__m256 _1)
  {
@@ -3408,7 +3408,7 @@ __m256 __lasx_xvfrintrp_s (__m256 _1)
  }
  
  /* Assembly instruction format:	xd, xj.  */

-/* Data types in instruction templates:  V4DI, V4DF.  */
+/* Data types in instruction templates:  V4DF, V4DF.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  __m256d __lasx_xvfrintrp_d (__m256d _1)
  {
@@ -3416,7 +3416,7 @@ __m256d __lasx_xvfrintrp_d (__m256d _1)
  }
  
  /* Assembly instruction format:	xd, xj.  */

-/* Data types in instruction templates:  V8SI, V8SF.  */
+/* Data types in instruction templates:  V8SF, V8SF.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  __m256 __lasx_xvfrintrm_s (__m256 _1)
  {
@@ -3424,7 +3424,7 @@ __m256 __lasx_xvfrintrm_s (__m256 _1)
  }
  
  /* Assembly instruction format:	xd, xj.  */

-/* Data types in instruction templates:  V4DI, V4DF.  */
+/* Data types in instruction templates:  V4DF, V4DF.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  __m256d __lasx_xvfrintrm_d (__m256d _1)
  {
diff --git a/gcc/config/loongarch/lsxintrin.h b/gcc/config/loongarch/lsxintrin.h
index ec42069904d..29553c093fa 100644
--- a/gcc/config/loongarch/lsxintrin.h
+++ b/gcc/config/loongarch/lsxintrin.h
@@ -3412,7 +3412,7 @@ __m128i __lsx_vftintrneh_l_s (__m128 _1)
  }
  
  /* Assembly instruction format:	vd, vj.  */

-/* Data types in instruction templates:  V4SI, V4SF.  */
+/* Data types in instruction templates:  V4SF, V4SF.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  __m128 __lsx_vfrintrne_s (__m128 _1)
  {
@@ -3420,7 +3420,7 @@ __m128 __lsx_vfrintrne_s (__m128 _1)
  }
  
  /* Assembly instruction format:	vd, vj.  */

-/* Data types in instruction templates:  V2DI, V2DF.  */
+/* Data types in instruction templates:  V2DF

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Qing Zhao


> On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-10-24 18:51, Qing Zhao wrote:
>> Thanks for the proposal!
>> So what you suggested is:
>> For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the FE, 
>> then the call to the _bdos (x.buf, 1) will
>> Become:
>>_bdos(__builtin_with_size(x.buf, x.L), 1)?
>> Then the implicit use of x.L in _bdos(x.buf, 1) will become explicit?
> 
> Oops, I think Martin and I fell off-list in a subthread.  I clarified that my 
> comment was that any such annotation at object reference is probably too late 
> and hence not the right place for it; basically it has the same problems as 
> the option A in your comment.  A better place to reinforce such a 
> relationship would be the allocation+initialization site instead.

I think Martin’s proposal might work; it’s different from option A:

A.  Add an additional argument, the size parameter, to __bdos,
 A.1, during FE;
 A.2, during gimplification phase;

Option A targets the __bdos call and tries to encode the implicit use in the
call; this will not work when the real object has not been instantiated at the
call site.

However, Martin’s proposal targets the FAM itself; it will enhance the
FAM access naturally with the size information. Such a FAM access with size
info will be propagated to the __bdos site later through inlining, etc., and
then tree-object-size can use the size information at that point. At the same
time, the implicit use of the size is recorded correctly.

So, I think that this proposal is natural and reasonable.

Qing
> 
> Thanks,
> Sid



Re: [RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-25 Thread Jeff Law




On 10/25/23 01:12, Robin Dapp wrote:

> Hi Vineet,
>
> I was thinking of two things while skimming the code:
>
>   - Couldn't we do this in the expanders directly?  Or is the
>     subreg-promoted info gone until we reach that?

Well, it doesn't seem like there's a lot of difference between doing it 
in the generic expander bits vs target expander bits -- the former just 
calls into the latter for the most part.  Thus if the subreg-promoted 
state is available in the target expander, I'd expect it to be available 
in the generic expander.


It may be the case that we have more places to fix because we have 
different expander paths (think conditional branches, conditional moves, 
sCC and probably others).  By catching it in riscv_expand_comparands he 
caught a nice little choke point.  I think what Vineet has done will 
also work for RTL if-conversion.





>   - Should some common-code part be more suited to handle that?
>     We already elide redundant sign-zero extensions for other
>     reasons.  Maybe we could add subreg promoted handling there?

Unsure.  I wouldn't be surprised if we were able to find similar code in 
simplify-rtx or something like that.  It's probably worth a quick looksie.


I also wonder if Vineet's work would subsume this local change.  I've 
been meaning to find testcases for this and determine if we should drop 
it or clean it up and submit it upstream:



+(define_insn "*branch_equals_zero"
+  [(set (pc)
+   (if_then_else
+(match_operator 1 "equality_operator"
+[(match_operand:ANYI 2 "register_operand" "r")
+ (const_int 0)])
+(label_ref (match_operand 0 "" ""))
+(pc)))]
+  "!partial_subreg_p (operands[2])"
+  "b%C1\t%2,zero,%0"
+  [(set_attr "type" "branch")
+   (set_attr "mode" "none")])



My sense is it's just papering over a missed simplification elsewhere.

Jeff


Re: [RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-25 Thread Robin Dapp


> Well, it doesn't seem like there's a lot of difference between doing
> it in the generic expander bits vs target expander bits -- the former
> just calls into the latter for the most part.  Thus if the
> subreg-promoted state is available in the target expander, I'd expect
> it to be available in the generic expander.

Ah, sorry I meant in the [sign|zero]extend expanders rather than the
compare expanders in order to catch promoted subregs from other origins
as well.  Maybe that doesn't work, though?

Regards
 Robin


Re: [RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-25 Thread Jeff Law




On 10/25/23 07:47, Robin Dapp wrote:



>> Well, it doesn't seem like there's a lot of difference between doing
>> it in the generic expander bits vs target expander bits -- the former
>> just calls into the latter for the most part.  Thus if the
>> subreg-promoted state is available in the target expander, I'd expect
>> it to be available in the generic expander.
>
> Ah, sorry I meant in the [sign|zero]extend expanders rather than the
> compare expanders in order to catch promoted subregs from other origins
> as well.  Maybe that doesn't work, though?

That's a *really* interesting idea.  Interested in playing around a bit 
with that Vineet?


jeff


[NVPTX] Patch pings...

2023-10-25 Thread Roger Sayle


Random fact: there have been no changes to nvptx.md in 2023 apart
from Jakub's tree-wide update to the copyright years in early January.

Please can I ping two of my pending Nvidia nvptx patches:

"Correct pattern for popcountdi2 insn in nvptx.md" from January
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609571.html

and

"Update nvptx's bitrev2 pattern to use BITREVERSE rtx" from June
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620994.html

Both of these still apply cleanly (because nvptx.md hasn't changed).

Thanks in advance,
Roger
--




[COMMITTED] Faster irange union for appending ranges.

2023-10-25 Thread Andrew MacLeod
It's a common idiom to build a range by unioning other ranges into 
another one.  If this is done sequentially, those new ranges can be 
simply appended to the end of the existing range, avoiding some 
expensive processing for the general case.


This patch identifies and optimizes this situation.  The result is a 
2.1% speedup in VRP and a 0.8% speedup in threading, with an overall 
compile time improvement of 0.14% across the GCC build.
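
The append idea can be sketched outside of GCC with plain integers. This is a simplified model under stated assumptions, not the irange implementation: `SubRange`, `RangeList` and this `union_append` are hypothetical names, bounds are `int64_t` rather than `wide_int`, and the cap on the number of subranges is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// A range list is a sorted vector of disjoint, non-adjacent [lo, hi] pairs.
using SubRange = std::pair<std::int64_t, std::int64_t>;
using RangeList = std::vector<SubRange>;

// Append R to DST, assuming both are non-empty and every value in R is
// greater than every value in DST.
void union_append(RangeList &dst, const RangeList &r)
{
  assert(!dst.empty() && !r.empty());
  assert(r.front().first > dst.back().second);

  std::size_t start = 0;
  // If R's first subrange starts right after DST's last one ends, extend
  // DST's last subrange instead of adding a new pair.  The precondition
  // above guarantees that the "+ 1" cannot overflow.
  if (dst.back().second + 1 == r.front().first)
    {
      dst.back().second = r.front().second;
      start = 1;
    }
  // Everything else is copied straight onto the end; no merge pass needed.
  dst.insert(dst.end(), r.begin() + start, r.end());
}
```

For example, appending [21,30][40,50] to [1,5][10,20] extends the last pair to [10,30] and adds [40,50], without walking the existing subranges.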


Bootstrapped on  x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew
commit f7dbf6230453c76a19921607601eff968bb70169
Author: Andrew MacLeod 
Date:   Mon Oct 23 14:52:45 2023 -0400

Faster irange union for appending ranges.

A common pattern is to append a range to an existing range via union.
This optimizes that process.

* value-range.cc (irange::union_append): New.
(irange::union_): Call union_append when appropriate.
* value-range.h (irange::union_append): New prototype.

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index f507ec57536..fcf53efa1dd 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1291,6 +1291,45 @@ irange::irange_single_pair_union (const irange &r)
   return true;
 }
 
+// Append R to this range, knowing that R occurs after all of these subranges.
+// Return TRUE as something must have changed.
+
+bool
+irange::union_append (const irange &r)
+{
+  // Check if the first range in R is an immmediate successor to the last
+  // range, ths requiring a merge.
+  signop sign = TYPE_SIGN (m_type);
+  wide_int lb = r.lower_bound ();
+  wide_int ub = upper_bound ();
+  unsigned start = 0;
+  if (widest_int::from (ub, sign) + 1
+  == widest_int::from (lb, sign))
+{
+  m_base[m_num_ranges * 2 - 1] = r.m_base[1];
+  start = 1;
+}
+  maybe_resize (m_num_ranges + r.m_num_ranges - start);
+  for ( ; start < r.m_num_ranges; start++)
+{
+  // Merge the last ranges if it exceeds the maximum size.
+  if (m_num_ranges + 1 > m_max_ranges)
+   {
+ m_base[m_max_ranges * 2 - 1] = r.m_base[r.m_num_ranges * 2 - 1];
+ break;
+   }
+  m_base[m_num_ranges * 2] = r.m_base[start * 2];
+  m_base[m_num_ranges * 2 + 1] = r.m_base[start * 2 + 1];
+  m_num_ranges++;
+}
+
+  if (!union_bitmask (r))
+normalize_kind ();
+  if (flag_checking)
+verify_range ();
+  return true;
+}
+
 // Return TRUE if anything changes.
 
 bool
@@ -1322,6 +1361,11 @@ irange::union_ (const vrange &v)
   if (m_num_ranges == 1 && r.m_num_ranges == 1)
 return irange_single_pair_union (r);
 
+  signop sign = TYPE_SIGN (m_type);
+  // Check for an append to the end.
+  if (m_kind == VR_RANGE && wi::gt_p (r.lower_bound (), upper_bound (), sign))
+return union_append (r);
+
   // If this ranges fully contains R, then we need do nothing.
   if (irange_contains_p (r))
 return union_bitmask (r);
@@ -1340,7 +1384,6 @@ irange::union_ (const vrange &v)
   // [Xi,Yi]..[Xn,Yn]  U  [Xj,Yj]..[Xm,Ym]   -->  [Xk,Yk]..[Xp,Yp]
   auto_vec res (m_num_ranges * 2 + r.m_num_ranges * 2);
   unsigned i = 0, j = 0, k = 0;
-  signop sign = TYPE_SIGN (m_type);
 
   while (i < m_num_ranges * 2 && j < r.m_num_ranges * 2)
 {
diff --git a/gcc/value-range.h b/gcc/value-range.h
index c00b15194c4..e9d81d22cd0 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -339,6 +339,7 @@ private:
   bool set_range_from_bitmask ();
 
   bool intersect (const wide_int& lb, const wide_int& ub);
+  bool union_append (const irange &r);
   unsigned char m_num_ranges;
   bool m_resizable;
   unsigned char m_max_ranges;


Re: [PATCH] c++: build_new_1 and non-dep array size [PR111929]

2023-10-25 Thread Patrick Palka
On Tue, 24 Oct 2023, Jason Merrill wrote:

> On 10/24/23 13:03, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > like the right approach?
> > 
> > -- >8 --
> > 
> > This PR is another instance of NON_DEPENDENT_EXPR having acted as an
> > "analysis barrier" for middle-end routines, and now that it's gone we
> > may end up passing weird templated trees (that have a generic tree code)
> > to the middle-end which leads to an ICE.  In the testcase below the
> > non-dependent array size 'var + 42' is expressed as an ordinary
> > PLUS_EXPR, but whose operand types have different precisions -- long and
> > int respectively -- naturally because templated trees encode only the
> > syntactic form of an expression devoid of e.g. implicit conversions
> > (typically).  This type incoherency triggers a wide_int assert during
> > the call to size_binop in build_new_1 which requires the operand types
> > have the same precision.
> > 
> > This patch fixes this by replacing our incremental folding of 'size'
> > within build_new_1 with a single call to cp_fully_fold (which is a no-op
> > in template context) once 'size' is fully built.
> 
> This is OK, but we could probably also entirely skip a lot of the calculation
> in a template, since we don't care about any values.  Can we skip the entire
> if (array_p) block?

That seems to be safe correctness-wise, but QOI-wise it'd mean we'd no
longer diagnose a too large array size ahead of time:

  template
  void f() {
new int[__SIZE_MAX__ / sizeof(int)];
  }

  : In function ‘void f()’:
  :3:37: error: size ‘(((sizetype)(18446744073709551615 / sizeof (int))) 
* 4)’ of array exceeds maximum object size ‘9223372036854775807’

(That we diagnose this ahead of time is thanks to the NON_DEPENDENT_EXPR
removal; previously 'nelts' was wrapped in NON_DEPENDENT_EXPR which
ironically prevented fold_non_dependent_expr from folding it to a
constant...)
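
The kind of check behind that diagnostic can be sketched as follows. This is a simplified illustration, not the code in build_new_1: `array_size_fits` and `max_object_size` are hypothetical names, the limit is assumed to be the 9223372036854775807 shown in the message, and cookie space is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest object size assumed here: half of a 64-bit address space,
// matching the 9223372036854775807 in the diagnostic above.
constexpr std::uint64_t max_object_size
  = static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());

// Return true if an array of NELTS elements of ELT_SIZE bytes each fits.
// Dividing the limit first avoids overflowing NELTS * ELT_SIZE.
bool array_size_fits(std::uint64_t nelts, std::uint64_t elt_size)
{
  assert(elt_size != 0);
  std::uint64_t max_outer_nelts = max_object_size / elt_size;
  return nelts <= max_outer_nelts;
}
```

With a 4-byte element, a count of `UINT64_MAX / 4` is rejected, which mirrors the `new int[__SIZE_MAX__ / sizeof(int)]` case on a 64-bit target.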

> 
> > PR c++/111929
> > 
> > gcc/cp/ChangeLog:
> > 
> > * init.cc (build_new_1): Use convert, build2, build3 instead of
> > fold_convert, size_binop and fold_build3 when building 'size'.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/template/non-dependent28.C: New test.
> > ---
> >   gcc/cp/init.cc  | 9 +
> >   gcc/testsuite/g++.dg/template/non-dependent28.C | 6 ++
> >   2 files changed, 11 insertions(+), 4 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/template/non-dependent28.C
> > 
> > diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
> > index d48bb16c7c5..56c1b5e9f5e 100644
> > --- a/gcc/cp/init.cc
> > +++ b/gcc/cp/init.cc
> > @@ -3261,7 +3261,7 @@ build_new_1 (vec **placement, tree type,
> > tree nelts,
> > max_outer_nelts = wi::udiv_trunc (max_size, inner_size);
> > max_outer_nelts_tree = wide_int_to_tree (sizetype, max_outer_nelts);
> >   -  size = size_binop (MULT_EXPR, size, fold_convert (sizetype,
> > nelts));
> > +  size = build2 (MULT_EXPR, sizetype, size, convert (sizetype, nelts));
> >   if (TREE_CODE (cst_outer_nelts) == INTEGER_CST)
> > {
> > @@ -3344,7 +3344,7 @@ build_new_1 (vec **placement, tree type,
> > tree nelts,
> > /* Use a class-specific operator new.  */
> > /* If a cookie is required, add some extra space.  */
> > if (array_p && TYPE_VEC_NEW_USES_COOKIE (elt_type))
> > -   size = size_binop (PLUS_EXPR, size, cookie_size);
> > +   size = build2 (PLUS_EXPR, sizetype, size, cookie_size);
> > else
> > {
> >   cookie_size = NULL_TREE;
> > @@ -3358,8 +3358,8 @@ build_new_1 (vec **placement, tree type,
> > tree nelts,
> > if (cxx_dialect >= cxx11 && flag_exceptions)
> > errval = throw_bad_array_new_length ();
> > if (outer_nelts_check != NULL_TREE)
> > -   size = fold_build3 (COND_EXPR, sizetype, outer_nelts_check,
> > -   size, errval);
> > +   size = build3 (COND_EXPR, sizetype, outer_nelts_check, size, errval);
> > +  size = cp_fully_fold (size);
> > /* Create the argument list.  */
> > vec_safe_insert (*placement, 0, size);
> > /* Do name-lookup to find the appropriate operator.  */
> > @@ -3418,6 +3418,7 @@ build_new_1 (vec **placement, tree type,
> > tree nelts,
> > /* If size is zero e.g. due to type having zero size, try to
> >  preserve outer_nelts for constant expression evaluation
> >  purposes.  */
> > +  size = cp_fully_fold (size);
> > if (integer_zerop (size) && outer_nelts)
> > size = build2 (MULT_EXPR, TREE_TYPE (size), size, outer_nelts);
> >   diff --git a/gcc/testsuite/g++.dg/template/non-dependent28.C
> > b/gcc/testsuite/g++.dg/template/non-dependent28.C
> > new file mode 100644
> > index 000..3e45154f61d
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/template/non-dependent28.C
> > @@ -0,0 +1,6 @@
> > +// PR c++/111929
> > +
> > +template
> > +void f(long var) {
> > +  new int[var + 42];
> > +}
> 

[committed] i386: Narrow test instructions with immediate operands [PR111698]

2023-10-25 Thread Uros Bizjak
i386: Narrow test instructions with immediate operands [PR111698]

Narrow test instructions with immediate operand that test memory location
for zero.  E.g. testl $0x00aa, mem can be converted to testb $0xaa, mem+2.
Reject targets where reading (possibly unaligned) part of memory location
after a large write to the same address causes store-to-load forwarding stall.
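
The narrowing arithmetic can be sketched as a standalone function. This is an illustration under assumptions rather than the peephole itself: `NarrowedTest` and `narrow_test` are hypothetical names, a little-endian layout is assumed, and the final check that the narrowed access stays inside the original operand and is actually narrower is an assumption added here.

```cpp
#include <cassert>
#include <cstdint>

struct NarrowedTest {
  bool valid;            // false if no narrowing is possible
  unsigned byte_offset;  // offset added to the address (little-endian)
  unsigned width_bits;   // width of the narrowed test: 8, 16 or 32
  std::uint64_t mask;    // immediate for the narrowed test
};

NarrowedTest narrow_test(std::uint64_t mask, unsigned mode_bits)
{
  assert(mode_bits == 16 || mode_bits == 32 || mode_bits == 64);
  // Clear bits outside the original operand width.
  if (mode_bits < 64)
    mask &= (std::uint64_t{1} << mode_bits) - 1;
  if (mask == 0)
    return { false, 0, 0, 0 };

  // Skip whole low-order zero bytes.
  unsigned byte_offset = 0;
  while ((mask & 0xff) == 0)
    {
      mask >>= 8;
      ++byte_offset;
    }

  // Number of significant bits left in the shifted mask.
  unsigned bitsize = 0;
  for (std::uint64_t m = mask; m != 0; m >>= 1)
    ++bitsize;

  unsigned width_bits = bitsize <= 8 ? 8 : bitsize <= 16 ? 16
                        : bitsize <= 32 ? 32 : 64;

  // Assumption of this sketch: give up if the narrowed access would extend
  // past the original operand or would not be narrower than it.
  if (byte_offset * 8 + width_bits > mode_bits || width_bits >= mode_bits)
    return { false, 0, 0, 0 };
  return { true, byte_offset, width_bits, mask };
}
```

As an illustrative input chosen here, a 32-bit mask of 0x00aa0000 narrows to a byte test of 0xaa at offset 2.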

PR target/111698

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_PARTIAL_MEMORY_READ_STALL):
New tune.
* config/i386/i386.h (TARGET_PARTIAL_MEMORY_READ_STALL): New macro.
* config/i386/i386.md: New peephole pattern to narrow test
instructions with immediate operands that test memory locations
for zero.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111698.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e4c1fc6eef0..4426b27f4fe 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -311,6 +311,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 #define TARGET_USE_SAHFix86_tune_features[X86_TUNE_USE_SAHF]
 #define TARGET_MOVXix86_tune_features[X86_TUNE_MOVX]
 #define TARGET_PARTIAL_REG_STALL ix86_tune_features[X86_TUNE_PARTIAL_REG_STALL]
+#define TARGET_PARTIAL_MEMORY_READ_STALL \
+   ix86_tune_features[X86_TUNE_PARTIAL_MEMORY_READ_STALL]
 #define TARGET_PARTIAL_FLAG_REG_STALL \
ix86_tune_features[X86_TUNE_PARTIAL_FLAG_REG_STALL]
 #define TARGET_LCP_STALL \
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index f90cf1ca734..5d8d5b2eae6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11100,6 +11100,57 @@ (define_split
   operands[3] = gen_int_mode (INTVAL (operands[3]), QImode);
 })
 
+;; Narrow test instructions with immediate operands that test
+;; memory locations for zero.  E.g. testl $0x00aa0000, mem can be
+;; converted to testb $0xaa, mem+2.  Reject volatile locations and
+;; targets where reading (possibly unaligned) part of memory
+;; location after a large write to the same address causes
+;; store-to-load forwarding stall.
+(define_peephole2
+  [(set (reg:CCZ FLAGS_REG)
+   (compare:CCZ
+ (and:SWI248 (match_operand:SWI248 0 "memory_operand")
+ (match_operand 1 "const_int_operand"))
+ (const_int 0)))]
+  "!TARGET_PARTIAL_MEMORY_READ_STALL && !MEM_VOLATILE_P (operands[0])"
+  [(set (reg:CCZ FLAGS_REG)
+   (compare:CCZ (match_dup 2) (const_int 0)))]
+{
+  unsigned HOST_WIDE_INT ival = UINTVAL (operands[1]);
+  int first_nonzero_byte, bitsize;
+  rtx new_addr, new_const;
+  machine_mode new_mode;
+
+  if (ival == 0)
+FAIL;
+
+  /* Clear bits outside mode width.  */
+  ival &= GET_MODE_MASK (mode);
+
+  first_nonzero_byte = ctz_hwi (ival) / BITS_PER_UNIT;
+
+  ival >>= first_nonzero_byte * BITS_PER_UNIT;
+
+  bitsize = sizeof (ival) * BITS_PER_UNIT - clz_hwi (ival);
+
+  if (bitsize <= GET_MODE_BITSIZE (QImode))
+new_mode = QImode;
+  else if (bitsize <= GET_MODE_BITSIZE (HImode))
+new_mode = HImode;
+  else if (bitsize <= GET_MODE_BITSIZE (SImode))
+new_mode = SImode;
+  else
+new_mode = DImode;
+
+  if (GET_MODE_SIZE (new_mode) >= GET_MODE_SIZE (mode))
+FAIL;
+
+  new_addr = adjust_address (operands[0], new_mode, first_nonzero_byte);
+  new_const = gen_int_mode (ival, new_mode);
+
+  operands[2] = gen_rtx_AND (new_mode, new_addr, new_const);
+})
+
 ;; %%% This used to optimize known byte-wide and operations to memory,
 ;; and sometimes to QImode registers.  If this is considered useful,
 ;; it should be done with splitters.
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 3636a4a95d8..9d0699ff9b9 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -658,6 +658,14 @@ DEF_TUNE (X86_TUNE_NOT_UNPAIRABLE, "not_unpairable", 
m_PENT | m_LAKEMONT)
and can happen in caller/callee saving sequences.  */
 DEF_TUNE (X86_TUNE_PARTIAL_REG_STALL, "partial_reg_stall", m_PPRO)
 
+/* X86_TUNE_PARTIAL_MEMORY_READ_STALL: Reading (possibly unaligned) part of
+   memory location after a large write to the same address causes
+   store-to-load forwarding stall.  */
+DEF_TUNE (X86_TUNE_PARTIAL_MEMORY_READ_STALL, "partial_memory_read_stall",
+ m_386 | m_486 | m_PENT | m_LAKEMONT | m_PPRO | m_P4_NOCONA | m_CORE2
+  | m_SILVERMONT | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT
+  | m_K6_GEODE | m_ATHLON_K8 | m_AMDFAM10)
+
 /* X86_TUNE_PROMOTE_QIMODE: When it is cheap, turn 8bit arithmetic to
corresponding 32bit arithmetic.  */
 DEF_TUNE (X86_TUNE_PROMOTE_QIMODE, "promote_qimode",
diff --git a/gcc/testsuite/gcc.target/i386/pr111698.c 
b/gcc/testsuite/gcc.target/i386/pr111698.c
new file mode 100644
index 000..2da6be531a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr111698.c
@@ -0,0 +1,19 @@
+/* PR target/111698 */
+/* { dg-options "-O2 -masm=att" } */
+/* { dg-fi

RE: [x86 PATCH] PR target/110511: Fix reg allocation for widening multiplications.

2023-10-25 Thread Roger Sayle
Hi Uros,

I've tried your suggestions to see what would happen.
Alas, allowing both operands to (i386's) widening multiplications
to be nonimmediate_operand results in 90 additional testsuite
unexpected failures and 41 unresolved testcases, around things
like:

gcc.c-torture/compile/di.c:6:1: error: unrecognizable insn:
(insn 14 13 15 2 (parallel [
(set (reg:DI 98 [ _3 ])
(mult:DI (zero_extend:DI (mem/c:SI (plus:SI (reg/f:SI 93 
virtual-stack-vars)
(const_int -8 [0xfff8])) [1 a+0 S4 
A64]))
(zero_extend:DI (mem/c:SI (plus:SI (reg/f:SI 93 
virtual-stack-vars)
(const_int -16 [0xfff0])) [1 b+0 S4 
A64]
(clobber (reg:CC 17 flags))
]) "gcc.c-torture/compile/di.c":5:12 -1
 (nil))
during RTL pass: vregs
gcc.c-torture/compile/di.c:6:1: internal compiler error: in extract_insn, at 
recog.cc:2791

In my experiments, I've used nonimmediate_operand instead of general_operand,
as a zero_extend of an immediate_operand, like const_int, would be
non-canonical.

In short, it's ok (common) for '%' to apply to operands with different
predicates; reload will only swap things if the operand's
predicates/constraints remain consistent.  For example, see i386.c's
*add_1 pattern.  And as shown above it can't be left to (until) reload to
decide which "mem" gets loaded into a register (which would be nice), as
some passes before reload check both predicates and constraints.

My original patch fixes PR 110511, using the same peephole2 idiom as already
used elsewhere in i386.md.  Ok for mainline?
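
For readers without the PR at hand, the assembly quoted further down is consistent with a function along the following lines. This is a reconstruction from that assembly, not necessarily the actual pr110511.c test; the constant is simply the movabsq immediate written as an unsigned value, and __uint128_t is a GCC extension.

```c
#include <stdint.h>

/* Widening 64x64->128 multiply by a constant, then fold the high and
   low halves together with xor.  On x86_64 this maps to a single mulq
   followed by xorq %rdx, %rax.  */
uint64_t
mulx64 (uint64_t x)
{
  __uint128_t r = (__uint128_t) x * UINT64_C (0x9E3779B97F4A7C15);
  return (uint64_t) r ^ (uint64_t) (r >> 64);
}
```

The superfluous move in the "Before" sequence arises because the constant is first loaded into a register other than %rax, which mulq requires as its implicit operand.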

> -Original Message-
> From: Uros Bizjak 
> Sent: 19 October 2023 18:02
> To: Roger Sayle 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] PR target/110511: Fix reg allocation for widening
> multiplications.
> 
> On Tue, Oct 17, 2023 at 9:05 PM Roger Sayle 
> wrote:
> >
> >
> > This patch contains clean-ups of the widening multiplication patterns
> > in i386.md, and provides variants of the existing highpart
> > multiplication
> > peephole2 transformations (that tidy up register allocation after
> > reload), and thereby fixes PR target/110511, which is a superfluous
> > move instruction.
> >
> > For the new test case, compiled on x86_64 with -O2.
> >
> > Before:
> > mulx64: movabsq $-7046029254386353131, %rcx
> > movq%rcx, %rax
> > mulq%rdi
> > xorq%rdx, %rax
> > ret
> >
> > After:
> > mulx64: movabsq $-7046029254386353131, %rax
> > mulq%rdi
> > xorq%rdx, %rax
> > ret
> >
> > The clean-ups are (i) that operand 1 is consistently made
> > register_operand and operand 2 becomes nonimmediate_operand, so that
> > predicates match the constraints, (ii) the representation of the BMI2
> > mulx instruction is updated to use the new umul_highpart RTX, and
> > (iii) because operands
> > 0 and 1 have different modes in widening multiplications, "a" is a
> > more appropriate constraint than "0" (which avoids spills/reloads
> > containing SUBREGs).  The new peephole2 transformations are based upon
> > those at around line 9951 of i386.md, that begins with the comment ;;
> > Highpart multiplication peephole2s to tweak register allocation.
> > ;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx  ->  mov imm,%rax; imulq
> > %rdi
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-10-17  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/110511
> > * config/i386/i386.md (mul3): Make operands 1 and
> > 2 take "register_operand" and "nonimmediate_operand" respectively.
> > (mulqihi3): Likewise.
> > (*bmi2_umul3_1): Operand 2 needs to be register_operand
> > matching the %d constraint.  Use umul_highpart RTX to represent
> > the highpart multiplication.
> > (*umul3_1):  Operand 2 should use register_operand
> > predicate, and "a" rather than "0" as operands 0 and 2 have
> > different modes.
> > (define_split): For mul to mulx conversion, use the new
> > umul_highpart RTX representation.
> > (*mul3_1):  Operand 1 should be register_operand
> > and the constraint %a as operands 0 and 1 have different modes.
> > (*mulqihi3_1): Operand 1 should be register_operand matching
> > the constraint %0.
> > (define_peephole2): Providing widening multiplication variants
> > of the peephole2s that tweak highpart multiplication register
> > allocation.
> >
> > gcc/testsuite/ChangeLog
> > PR target/110511
> > * gcc.target/i386/pr110511.c: New test case.
> >
> 
>  (define_insn "*bmi2_umul3_1"
>[(set (match_operand:DWIH 0 "register_operand" "=r")
>  (mult:DWIH
> -  (match_operand:DWIH 2 "nonimmedia

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Siddhesh Poyarekar

On 2023-10-25 09:27, Qing Zhao wrote:




> On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar  wrote:
>>
>> On 2023-10-24 18:51, Qing Zhao wrote:
>>> Thanks for the proposal!
>>> So what you suggested is:
>>> For every x.buf, change it to __builtin_with_size(x.buf, x.L) in the FE,
>>> then the call to _bdos(x.buf, 1) will become:
>>> _bdos(__builtin_with_size(x.buf, x.L), 1)?
>>> Then the implicit use of x.L in _bdos(x.buf, 1) will become explicit?
>>
>> Oops, I think Martin and I fell off-list in a subthread.  I clarified that my
>> comment was that any such annotation at object reference is probably too late
>> and hence not the right place for it; basically it has the same problems as the
>> option A in your comment.  A better place to reinforce such a relationship
>> would be the allocation+initialization site instead.
>
> I think Martin’s proposal might work, it’s different than the option A:
>
> A.  Add an additional argument, the size parameter,  to __bdos,
>   A.1, during FE;
>   A.2, during gimplification phase;
>
> Option A targets the __bdos call and tries to encode the implicit use in the
> call; this will not work when the real object has not been instantiated at the
> call site.
>
> However, Martin’s proposal targets the FAM itself; it will enhance the
> FAM access naturally with the size information. And such FAM access with size
> info will be propagated to the __bdos site later through inlining, etc. and then
> tree-object-size can use the size information at that point. At the same time,
> the implicit use of the size is recorded correctly.
>
> So, I think that this proposal is natural and reasonable.


Ack, we discussed this later in the thread and I agree[1].  Richard 
still has concerns[2] that I think may be addressed by putting 
__builtin_with_size at the point where the reference to x.buf escapes, 
but I'm not very sure about that.


Oh, and Martin suggested using __builtin_with_size more generally[3] in 
bugzilla to address attribute inlining issues and we have high level 
consensus for a __builtin_with_access instead, which associates access 
type in addition to size with the target object.  For the purposes of 
counted_by, access type could simply be -1.


Thanks,
Sid


[1] 
https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#m4f3cafa489493180e258fd62aca0196a5f244039


[2] 
https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#mcf226f891621db8b640deaedd8942bb8519010f3


[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503#c6


[PATCH] s390: fix htm-builtins test cases

2023-10-25 Thread Juergen Christ
Transactional and non-transactional stores to the same cache line cause
transactions to abort on newer generations.  Add sufficient padding to make
sure another cache line is used.

Tested on s390.
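
A minimal sketch of the padding layout, assuming the 256-byte line size that the test cases already use for alignment.  The struct tag, the CACHE_LINE macro and the line_index helper are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 256  /* Value the test cases align and pad to.  */

struct padded_counters
{
  volatile uint64_t c1;
  char pad1[CACHE_LINE - sizeof (uint64_t)];
  volatile uint64_t c2;
  char pad2[CACHE_LINE - sizeof (uint64_t)];
  volatile uint64_t c3;
};

/* "{ 0 }" zero-initializes every member, including the padding arrays,
   so the initializer no longer has to list one value per member.  */
static struct padded_counters counters
  __attribute__ ((aligned (CACHE_LINE))) = { 0 };

/* Each counter must start on its own cache line.  */
_Static_assert (offsetof (struct padded_counters, c2) == CACHE_LINE,
                "c2 shares a line with c1");
_Static_assert (offsetof (struct padded_counters, c3) == 2 * CACHE_LINE,
                "c3 shares a line with c2");

/* Return which cache line (relative to the start of COUNTERS) the
   object at P falls into.  */
static size_t
line_index (const volatile void *p)
{
  return ((uintptr_t) p - (uintptr_t) &counters) / CACHE_LINE;
}
```

With this layout a transactional store to one counter and a non-transactional store to another never touch the same line.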

gcc/testsuite/ChangeLog:

* gcc.target/s390/htm-builtins-1.c: Fix.
* gcc.target/s390/htm-builtins-2.c: Fix.

Signed-off-by: Juergen Christ 
---
 gcc/testsuite/gcc.target/s390/htm-builtins-1.c | 4 +++-
 gcc/testsuite/gcc.target/s390/htm-builtins-2.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/htm-builtins-1.c 
b/gcc/testsuite/gcc.target/s390/htm-builtins-1.c
index ff43be9fe736..4f95bf3accaa 100644
--- a/gcc/testsuite/gcc.target/s390/htm-builtins-1.c
+++ b/gcc/testsuite/gcc.target/s390/htm-builtins-1.c
@@ -53,9 +53,11 @@ __attribute__ ((aligned(256))) struct
 __attribute__ ((aligned(256))) struct
 {
   volatile uint64_t c1;
+  char pad1[256 - sizeof(uint64_t)];
   volatile uint64_t c2;
+  char pad2[256 - sizeof(uint64_t)];
   volatile uint64_t c3;
-} counters = { 0, 0, 0 };
+} counters = { 0 };
 
 /*  local helper functions - */
 
diff --git a/gcc/testsuite/gcc.target/s390/htm-builtins-2.c 
b/gcc/testsuite/gcc.target/s390/htm-builtins-2.c
index bb9d346ea560..2e838caacc8c 100644
--- a/gcc/testsuite/gcc.target/s390/htm-builtins-2.c
+++ b/gcc/testsuite/gcc.target/s390/htm-builtins-2.c
@@ -94,9 +94,11 @@ float global_float_3 = 0.0;
 __attribute__ ((aligned(256))) struct
 {
   volatile uint64_t c1;
+  char pad1[256 - sizeof(uint64_t)];
   volatile uint64_t c2;
+  char pad2[256 - sizeof(uint64_t)];
   volatile uint64_t c3;
-} counters = { 0, 0, 0 };
+} counters = { 0 };
 
 /*  local helper functions - */
 
-- 
2.39.3



Re: [PATCH] s390: fix htm-builtins test cases

2023-10-25 Thread Andreas Krebbel
On 10/25/23 16:50, Juergen Christ wrote:
> Transactional and non-transactional stores to the same cache line cause
> transactions to abort on newer generations.  Add sufficient padding to make
> sure another cache line is used.
> 
> Tested on s390.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/htm-builtins-1.c: Fix.
>   * gcc.target/s390/htm-builtins-2.c: Fix.

Ok. Thanks!

Andreas

> 
> Signed-off-by: Juergen Christ 
> ---
>  gcc/testsuite/gcc.target/s390/htm-builtins-1.c | 4 +++-
>  gcc/testsuite/gcc.target/s390/htm-builtins-2.c | 4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/s390/htm-builtins-1.c 
> b/gcc/testsuite/gcc.target/s390/htm-builtins-1.c
> index ff43be9fe736..4f95bf3accaa 100644
> --- a/gcc/testsuite/gcc.target/s390/htm-builtins-1.c
> +++ b/gcc/testsuite/gcc.target/s390/htm-builtins-1.c
> @@ -53,9 +53,11 @@ __attribute__ ((aligned(256))) struct
>  __attribute__ ((aligned(256))) struct
>  {
>volatile uint64_t c1;
> +  char pad1[256 - sizeof(uint64_t)];
>volatile uint64_t c2;
> +  char pad2[256 - sizeof(uint64_t)];
>volatile uint64_t c3;
> -} counters = { 0, 0, 0 };
> +} counters = { 0 };
>  
>  /*  local helper functions - 
> */
>  
> diff --git a/gcc/testsuite/gcc.target/s390/htm-builtins-2.c 
> b/gcc/testsuite/gcc.target/s390/htm-builtins-2.c
> index bb9d346ea560..2e838caacc8c 100644
> --- a/gcc/testsuite/gcc.target/s390/htm-builtins-2.c
> +++ b/gcc/testsuite/gcc.target/s390/htm-builtins-2.c
> @@ -94,9 +94,11 @@ float global_float_3 = 0.0;
>  __attribute__ ((aligned(256))) struct
>  {
>volatile uint64_t c1;
> +  char pad1[256 - sizeof(uint64_t)];
>volatile uint64_t c2;
> +  char pad2[256 - sizeof(uint64_t)];
>volatile uint64_t c3;
> -} counters = { 0, 0, 0 };
> +} counters = { 0 };
>  
>  /*  local helper functions - 
> */
>  



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Richard Biener



> On 25.10.2023 at 16:50, Siddhesh Poyarekar wrote:
> 
> On 2023-10-25 09:27, Qing Zhao wrote:
>>>> On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar  
>>>> wrote:
>>> 
>>> On 2023-10-24 18:51, Qing Zhao wrote:
>>>> Thanks for the proposal!
>>>> So what you suggested is:
>>>> For every x.buf, change it to __builtin_with_size(x.buf, x.L) in the 
>>>> FE, then the call to _bdos(x.buf, 1) will
>>>> become:
>>>> _bdos(__builtin_with_size(x.buf, x.L), 1)?
>>>> Then the implicit use of x.L in _bdos(x.buf, 1) will become explicit?
>>> 
>>> Oops, I think Martin and I fell off-list in a subthread.  I clarified that 
>>> my comment was that any such annotation at object reference is probably too 
>>> late and hence not the right place for it; basically it has the same 
>>> problems as the option A in your comment.  A better place to reinforce such 
>>> a relationship would be the allocation+initialization site instead.
>> I think Martin’s proposal might work, it’s different than the option A:
>> A.  Add an additional argument, the size parameter,  to __bdos,
>>  A.1, during FE;
>>  A.2, during gimplification phase;
>> Option A targets the __bdos call and tries to encode the implicit use in the 
>> call; this will not work when the real object has not been instantiated at 
>> the call site.
>> However, Martin’s proposal targets the FAM itself; it will enhance 
>> the FAM access naturally with the size information. And such FAM access with 
>> size info will be propagated to the __bdos site later through inlining, etc. 
>> and then tree-object-size can use the size information at that point. At the 
>> same time, the implicit use of the size is recorded correctly.
>> So, I think that this proposal is natural and reasonable.
> 
> Ack, we discussed this later in the thread and I agree[1].  Richard still has 
> concerns[2] that I think may be addressed by putting __builtin_with_size at 
> the point where the reference to x.buf escapes, but I'm not very sure about 
> that.
> 
> Oh, and Martin suggested using __builtin_with_size more generally[3] in 
> bugzilla to address attribute inlining issues and we have high level 
> consensus for a __builtin_with_access instead, which associates access type 
> in addition to size with the target object.  For the purposes of counted_by, 
> access type could simply be -1.

Btw, I’d like to see some hard numbers on the amount of extra false positives 
this will cause as well as the effect on generated code before putting this in 
mainline and effectively needing to support it forever.

Richard 

> Thanks,
> Sid
> 
> 
> [1] 
> https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#m4f3cafa489493180e258fd62aca0196a5f244039
> 
> [2] 
> https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#mcf226f891621db8b640deaedd8942bb8519010f3
> 
> [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503#c6


[PATCH] c++: another build_new_1 folding fix [PR111929]

2023-10-25 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

We also need to avoid folding 'outer_nelts_check' when in a template
context to prevent an ICE on the below testcase.  This patch achieves
this by replacing the fold_build2 call with build2 (cp_fully_fold will
later fold the overall expression if appropriate).

In passing, this patch removes an unnecessary call to convert on 'nelts'
since it should always already be a size_t (and 'convert' isn't the best
conversion entry point to use anyway since it doesn't take a complain
parameter.)
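
For context, the check being built here can be modeled in plain C roughly as follows.  This is only a sketch of the shape of the expression (a comparison feeding a conditional, as in the COND_EXPR that build_new_1 constructs); the function name, the max_size parameter and the choice of SIZE_MAX as the error value are assumptions for the illustration, and the rounding of max_outer_nelts done in init.cc is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the array-new size computation:
     size = outer_nelts_check ? elt_size * nelts : errval
   where outer_nelts_check is "nelts <= max_outer_nelts".  */
static size_t
checked_array_size (size_t elt_size, size_t nelts, size_t max_size)
{
  assert (elt_size != 0);

  size_t max_outer_nelts = max_size / elt_size;
  size_t errval = SIZE_MAX;  /* Assumed "too large" marker.  */

  /* The comparison is kept as a separate, unfolded expression; only
     the final conditional combines it with the multiplication.  */
  int outer_nelts_check = nelts <= max_outer_nelts;
  return outer_nelts_check ? elt_size * nelts : errval;
}
```

In a template the operands may still be non-dependent but unevaluated expressions, which is why folding the comparison eagerly is the step being avoided.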

PR c++/111929

gcc/cp/ChangeLog:

* init.cc (build_new_1): Remove unnecessary call to convert
on 'nelts'.  Use build2 instead of fold_build2 for
'outer_nelts_check'.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent28a.C: New test.
---
 gcc/cp/init.cc   | 8 
 gcc/testsuite/g++.dg/template/non-dependent28a.C | 8 
 2 files changed, 12 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/non-dependent28a.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 65d37c3c0c7..6444f0a8518 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -3261,7 +3261,7 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
   max_outer_nelts = wi::udiv_trunc (max_size, inner_size);
   max_outer_nelts_tree = wide_int_to_tree (sizetype, max_outer_nelts);
 
-  size = build2 (MULT_EXPR, sizetype, size, convert (sizetype, nelts));
+  size = build2 (MULT_EXPR, sizetype, size, nelts);
 
   if (TREE_CODE (cst_outer_nelts) == INTEGER_CST)
{
@@ -3293,9 +3293,9 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
- wi::clz (max_outer_nelts);
  max_outer_nelts = (max_outer_nelts >> shift) << shift;
 
-  outer_nelts_check = fold_build2 (LE_EXPR, boolean_type_node,
-  outer_nelts,
-  max_outer_nelts_tree);
+ outer_nelts_check = build2 (LE_EXPR, boolean_type_node,
+ outer_nelts,
+ max_outer_nelts_tree);
}
 }
 
diff --git a/gcc/testsuite/g++.dg/template/non-dependent28a.C 
b/gcc/testsuite/g++.dg/template/non-dependent28a.C
new file mode 100644
index 000..d32520c38ee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent28a.C
@@ -0,0 +1,8 @@
+// PR c++/111929
+
+struct A { operator int(); };
+
+template
+void f() {
+  new int[A()];
+}
-- 
2.42.0.482.g2e8e77cbac



Re: [PING^2] More '#ifdef ASM_OUTPUT_DEF' -> 'if (TARGET_SUPPORTS_ALIASES)' etc.

2023-10-25 Thread Jeff Law




On 10/25/23 02:38, Thomas Schwinge wrote:

Hi!

Ping.


Regards
  Thomas


On 2023-09-19T10:47:56+0200, I wrote:

Hi!

Ping.


Regards
  Thomas


On 2023-09-08T14:02:50+0200, I wrote:

Hi!

On 2017-08-10T15:42:13+0200, Jan Hubicka  wrote:

On 07/31/2017 11:57 AM, Yuri Gribov wrote:

On Mon, Jul 31, 2017 at 9:04 AM, Martin Liška  wrote:

Doing the transformation suggested by Honza.


... which was:

| On 2017-07-24T16:06:22+0200, Jan Hubicka  wrote:
| > we probably should turn ASM_OUTPUT_DEF ifdefs into a conditional compilation
| > incrementally.


>From 78ee08b25d22125cb1fa248bac98ef1e84504761 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 25 Jul 2017 13:11:28 +0200
Subject: [PATCH] Introduce TARGET_SUPPORTS_ALIASES


..., and got pushed as commit a8b522b483ebb8c972ecfde8779a7a6ec16aecd6
(Subversion r251048) "Introduce TARGET_SUPPORTS_ALIASES".

I don't know if that was actually intentional here, or just an
"accident", but such changes actually allow that a back end may or may
not provide symbol aliasing support ('TARGET_SUPPORTS_ALIASES')
independent of '#ifdef ASM_OUTPUT_DEF', and in particular, depending not
just on static but instead on dynamic (run-time) configuration.  This is
relevant for the nvptx back end's '-malias' flag.
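
The difference between the two styles can be sketched as below.  The macro definitions and the flag_malias variable are hypothetical stand-ins chosen for this illustration; they are not the actual GCC or nvptx definitions.

```c
#include <stdbool.h>

/* Hypothetical: the target knows an assembler syntax for aliases.  */
#define ASM_OUTPUT_DEF 1

/* Hypothetical stand-in for a run-time option such as -malias.  */
static bool flag_malias;
#define TARGET_SUPPORTS_ALIASES (flag_malias)

/* Old style: the answer is fixed when the compiler itself is built.  */
static bool
can_alias_static (void)
{
#ifdef ASM_OUTPUT_DEF
  return true;
#else
  return false;
#endif
}

/* New style: the answer may depend on run-time configuration.  */
static bool
can_alias_dynamic (void)
{
  return TARGET_SUPPORTS_ALIASES;
}
```

Any place that still tests only the preprocessor macro will claim alias support even when the run-time flag says otherwise, which is the kind of mismatch the remaining conversions address.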

There did remain a few instances where we currently still assume that
from '#ifdef ASM_OUTPUT_DEF' follows 'TARGET_SUPPORTS_ALIASES', which I'm
adjusting in the attached (with '--ignore-space-change', for easy review)
"More '#ifdef ASM_OUTPUT_DEF' -> 'if (TARGET_SUPPORTS_ALIASES)' etc.".
OK to push?

These changes are necessary to cure nvptx regressions raised in

"[nvptx] Use .alias directive for mptx >= 6.3", addressing the comment:
"[...] remains to be analyzed".

OK
jeff


Re: [RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-25 Thread Vineet Gupta

Hey Robin,

On 10/25/23 00:12, Robin Dapp wrote:

> Hi Vineet,
>
> I was thinking of two things while skimming the code:
>
>  - Couldn't we do this in the expanders directly?  Or is the
>    subreg-promoted info gone until we reach that?


Following is the call stack involved:

  expand_gimple_cond
    do_compare_and_jump
      emit_cmp_and_jump_insns
        gen_cbranchqi4
          riscv_expand_conditional_branch
            riscv_emit_int_compare
              riscv_extend_comparands


Last function is what introduces the extraneous sign extends, w/o taking 
subreg-promoted into consideration and what my patch attempts to address.
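
To illustrate why the extension is redundant in the promoted case, here is a small stand-alone model.  It is not GCC code: struct operand, its promoted_signed field and extend_for_compare are hypothetical names, with the field loosely standing in for the subreg-promoted information.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 64-bit register holding a 32-bit value.  PROMOTED_SIGNED records
   that the upper 32 bits are already copies of bit 31.  */
struct operand
{
  uint64_t reg;
  bool promoted_signed;
};

/* Sign-extend the low 32 bits of REG to 64 bits.  */
static int64_t
sext32 (uint64_t reg)
{
  return (int64_t) (int32_t) (uint32_t) reg;
}

/* Prepare OP for a 64-bit comparison.  *N_EXTENDS counts how many
   extension operations were actually emitted.  */
static int64_t
extend_for_compare (struct operand op, unsigned *n_extends)
{
  if (op.promoted_signed)
    return (int64_t) op.reg;  /* Already extended: nothing to emit.  */
  ++*n_extends;
  return sext32 (op.reg);
}
```

Both paths produce the same comparison value; the promoted path just avoids emitting the extra instruction.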



>  - Should some common-code part be more suited to handle that?
>    We already elide redundant sign-zero extensions for other
>    reasons.  Maybe we could add subreg promoted handling there?


Not in the context of this specific issue.

-Vineet


Re: [PATCH v2] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

2023-10-25 Thread Jeff Law




On 10/25/23 03:21, Roger Sayle wrote:


> Hi Jeff,
> Many thanks for the review/approval of my fix for PR rtl-optimization/91865.
> Based on your and Richard Biener's feedback, I’d like to propose a revision
> calling simplify_unary_operation instead of simplify_const_unary_operation
> (i.e. Richi's recommendation).  I was originally concerned that this might
> potentially result in unbounded recursion, and testing for ZERO_EXTEND was
> safer but "uglier", but testing hasn't shown any issues.  If we do see issues
> in the future, it's easy to fall back to the previous version of this patch.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-10-25  Roger Sayle  
>             Richard Biener  
>
> gcc/ChangeLog
>         PR rtl-optimization/91865
>         * combine.cc (make_compound_operation): Avoid creating a
>         ZERO_EXTEND of a ZERO_EXTEND.
>
> gcc/testsuite/ChangeLog
>         PR rtl-optimization/91865
>         * gcc.target/msp430/pr91865.c: New test case.

I'm not terribly worried about recursion.  For the case you want to 
handle, it's going to be picked up by the call to 
simplify_const_unary_operation at the start of simplify_unary_operation. 
 It's only if that fails that we call into simplify_unary_operation_1.


The only thing that even comes close to worrisome to me in this space is 
the asserts in do_SUBST.  But I don't think your patch is likely to make 
the problems with those asserts any worse than they already are.


OK for the trunk.

Jeff



Re: [RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-25 Thread Jeff Law




On 10/25/23 10:25, Vineet Gupta wrote:

> Hey Robin,
>
> On 10/25/23 00:12, Robin Dapp wrote:
>> Hi Vineet,
>>
>> I was thinking of two things while skimming the code:
>>
>>  - Couldn't we do this in the expanders directly?  Or is the
>>    subreg-promoted info gone until we reach that?
>
> Following is the call stack involved:
>
>   expand_gimple_cond
>     do_compare_and_jump
>       emit_cmp_and_jump_insns
>         gen_cbranchqi4
>           riscv_expand_conditional_branch
>             riscv_emit_int_compare
>               riscv_extend_comparands
>
> Last function is what introduces the extraneous sign extends, w/o taking
> subreg-promoted into consideration and what my patch attempts to address.
>
>>  - Should some common-code part be more suited to handle that?
>>    We already elide redundant sign-zero extensions for other
>>    reasons.  Maybe we could add subreg promoted handling there?
>
> Not in the context of this specific issue.

Robin's point (IIUC) is that if we put this logic into a zero/sign 
extend expander, then it'll get used for *any* attempt to zero/sign 
extend that goes through the target expander.


It doesn't work for your case because we use gen_rtx_{ZERO,SIGN}_EXTEND 
directly.   But if those were adjusted to use the expander, then Robin's 
idea would be applicable to this case too.


Jeff


Re: [RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-25 Thread Vineet Gupta




On 10/25/23 09:30, Jeff Law wrote:

>>>  - Should some common-code part be more suited to handle that?
>>>    We already elide redundant sign-zero extensions for other
>>>    reasons.  Maybe we could add subreg promoted handling there?
>>
>> Not in the context of this specific issue.
> Robin's point (IIUC) is that if we put this logic into a zero/sign
> extend expander, then it'll get used for *any* attempt to zero/sign
> extend that goes through the target expander.
>
> It doesn't work for your case because we use
> gen_rtx_{ZERO,SIGN}_EXTEND directly.   But if those were adjusted to
> use the expander, then Robin's idea would be applicable to this case too


Understood. Definitely solid idea.

-Vineet


Re: [RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-25 Thread Vineet Gupta




On 10/25/23 06:52, Jeff Law wrote:


> On 10/25/23 07:47, Robin Dapp wrote:
>>> Well, it doesn't seem like there's a lot of difference between doing
>>> it in the generic expander bits vs target expander bits -- the former
>>> just calls into the latter for the most part.  Thus if the
>>> subreg-promoted state is available in the target expander, I'd expect
>>> it to be available in the generic expander.
>>
>> Ah, sorry I meant in the [sign|zero]extend expanders rather than the
>> compare expanders in order to catch promoted subregs from other origins
>> as well.  Maybe that doesn't work, though?
> That's a *really* interesting idea.  Interested in playing around a
> bit with that Vineet?


Sure I'll tinker with the {sign,zero}expanders.

And there's a third playing field :-) There seem to be still cases where 
subreg-promoted note is not set when it probably should.
So we end up in riscv_extend_comparands but with note not being there 
(for something corresponding to function arg) thus can't skip the extension.


Thx,
-Vineet


Re: [PATCH] testsuite: Fix _BitInt in gcc.misc-tests/godump-1.c

2023-10-25 Thread Ian Lance Taylor
On Tue, Oct 24, 2023, 11:03 AM Jeff Law  wrote:

>
>
> On 10/24/23 09:26, Stefan Schulze Frielinghaus wrote:
> > Currently _BitInt is only supported on x86_64 which means that for other
> > targets all tests fail with e.g.
> >
> > gcc.misc-tests/godump-1.c:237:1: sorry, unimplemented: '_BitInt(32)' is
> not supported on this target
> >237 | _BitInt(32) b32_v;
> >| ^~~
> >
> > Instead of requiring _BitInt support for godump-1.c, move _BitInt tests
> > into godump-2.c such that all other tests in godump-1.c are still
> > executed in case of missing _BitInt support.
> >
> > Tested on s390x and x86_64.  Ok for mainline?
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.misc-tests/godump-1.c: Move _BitInt tests into godump-2.c.
> >   * gcc.misc-tests/godump-2.c: New test.
> OK
>

Thanks.

Ian

>


Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Qing Zhao


> On Oct 25, 2023, at 6:39 AM, Martin Uecker  wrote:
> 
> On Wednesday, 25.10.2023 at 12:25 +0200, Richard Biener wrote:
>> 
>>> On 25.10.2023 at 10:16, Martin Uecker wrote:
>>> 
>>> On Wednesday, 25.10.2023 at 08:43 +0200, Richard Biener wrote:
 
>> On 24.10.2023 at 22:38, Martin Uecker wrote:
> 
> On Tuesday, 24.10.2023 at 20:30 +, Qing Zhao wrote:
>> Hi, Sid,
>> 
>> Really appreciate for your example and detailed explanation. Very 
>> helpful.
>> I think that this example is an excellent example to show (almost) all 
>> the issues we need to consider.
>> 
>> I slightly modified this example to make it to be compilable and 
>> run-able, as following: 
>> (but I still cannot make the incorrect reordering or DSE happening, 
>> anyway, the potential reordering possibility is there…)
>> 
>> #include <stddef.h>
>> 
>> struct A
>> {
>>   size_t size;
>>   char buf[] __attribute__((counted_by(size)));
>> };
>> 
>> static size_t
>> get_size_from (void *ptr)
>> {
>>   return __builtin_dynamic_object_size (ptr, 1);
>> }
>> 
>> void
>> foo (size_t sz)
>> {
>>   struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>   obj->size = sz;
>>   obj->buf[0] = 2;
>>   /* Cast so that the size_t result matches the %d conversion.  */
>>   __builtin_printf ("%d\n", (int) get_size_from (obj->buf));
>>   return;
>> }
>> 
>> int main ()
>> {
>>   foo (20);
>>   return 0;
>> }
>> 
>> With my GCC, it was compiled and worked:
>> [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>> 20
>> Situation 1: With O1 and above, the routine “get_size_from” was inlined 
>> into “foo”, therefore, the call to __bdos is in the same routine as the 
>> instantiation of the object, and the TYPE information and the attached 
>> counted_by attribute information in the TYPE of the object can be USED 
>> by the __bdos call to compute the final object size. 
>> 
>> [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>> -1
>> Situation 2: With O0, the routine “get_size_from” was NOT inlined into 
>> “foo”, therefore, the call to __bdos is Not in the same routine as the 
>> instantiation of the object, As a result, the TYPE info and the attached 
>> counted_by info of the object can NOT be USED by the __bdos call. 
>> 
>> Keep in mind of the above 2 situations, we will refer them in below:
>> 
>> 1. First,  the problem we are trying to resolve is:
>> 
>> (Your description):
>> 
>>> the reordering of __bdos w.r.t. initialization of the size parameter 
>>> but to also account for DSE of the assignment, we can abstract this 
>>> problem to that of DFA being unable to see implicit use of the size 
>>> parameter in the __bdos call.
>> 
>> basically is correct.  However, with the following exception:
>> 
>> The implicit use of the size parameter in the __bdos call is not always 
>> there, it ONLY exists WHEN the __bdos can be evaluated to an 
>> expression of the size parameter in the “objsz” phase, i.e., the 
>> “Situation 1” of the above example. 
>> In the “Situation 2”, when the __bdos does not see the TYPE of the real 
>> object,  it does not see the counted_by information from the TYPE, 
>> therefore,  it is not able to evaluate the size of the object through 
>> the counted_by information.  As a result, the implicit use of the size 
>> parameter in the __bdos call does NOT exist at all.  The optimizer can 
>> freely reorder the initialization of the size parameter with the __bdos 
>> call since there is no data flow dependency between these two. 
>> 
>> With this exception in mind, we can see that your proposed “option 2” 
>> (making the type of size “volatile”) is too conservative, it will  
>> disable many optimizations  unnecessarily, even though it’s safe and 
>> simple to implement. 
>> 
>> As a compiler optimization person for many many years, I really don’t 
>> want to take this approach at this moment.  -:)
>> 
>> 2. Some facts I’d like to mention:
>> 
>> A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
>> optimization stage. During RTL stage,  the __bdos call has already been 
>> replaced by an expression of the size parameter or a constant, the data 
>> dependency is explicitly in the IR already.  I believe that the data 
>> analysis in RTL stage should pick up the data dependency correctly, No 
>> special handling is needed in RTL.
>> 
>> B. If the __bdos call cannot see the real object , it has no way to get 
>> the “counted_by” field from the TYPE of the 

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Martin Uecker
Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
> 
> > Am 25.10.2023 um 12:47 schrieb Martin Uecker :
> > 
> > Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
> > > > On 2023-10-25 04:16, Martin Uecker wrote:
> > > > Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> > > > > 
> > > > > > Am 24.10.2023 um 22:38 schrieb Martin Uecker :
> > > > > > 
> > > > > > Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
> > > > > > > Hi, Sid,
> > > > > > > 
> > > > > > > Really appreciate for your example and detailed explanation. Very 
> > > > > > > helpful.
> > > > > > > I think that this example is an excellent example to show 
> > > > > > > (almost) all the issues we need to consider.
> > > > > > > 
> > > > > > > I slightly modified this example to make it to be compilable and 
> > > > > > > run-able, as following:
> > > > > > > (but I still cannot make the incorrect reordering or DSE 
> > > > > > > happening, anyway, the potential reordering possibility is there…)
> > > > > > > 
> > > > > > >  1 #include 
> > > > > > >  2 struct A
> > > > > > >  3 {
> > > > > > >  4  size_t size;
> > > > > > >  5  char buf[] __attribute__((counted_by(size)));
> > > > > > >  6 };
> > > > > > >  7
> > > > > > >  8 static size_t
> > > > > > >  9 get_size_from (void *ptr)
> > > > > > > 10 {
> > > > > > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > > > > > 12 }
> > > > > > > 13
> > > > > > > 14 void
> > > > > > > 15 foo (size_t sz)
> > > > > > > 16 {
> > > > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
> > > > > > > sizeof(char));
> > > > > > > 18  obj->size = sz;
> > > > > > > 19  obj->buf[0] = 2;
> > > > > > > > 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
> > > > > > > 21  return;
> > > > > > > 22 }
> > > > > > > 23
> > > > > > > 24 int main ()
> > > > > > > 25 {
> > > > > > > 26  foo (20);
> > > > > > > 27  return 0;
> > > > > > > 28 }
> > > > > > > 
> > > 
> > > 
> > > 
> > > > > When it’s set I suppose.  Turn
> > > > > 
> > > > > X.l = n;
> > > > > 
> > > > > Into
> > > > > 
> > > > > X.l = __builtin_with_size (x.buf, n);
> > > > 
> > > > It would turn
> > > > 
> > > > some_variable = (&) x.buf
> > > > 
> > > > into
> > > > 
> > > > some_variable = __builtin_with_size ( (&) x.buf. x.len)
> > > > 
> > > > 
> > > > So the later access to x.buf and not the initialization
> > > > of a member of the struct (which is too early).
> > > > 
> > > 
> > > Hmm, so with Qing's example above, are you suggesting the transformation 
> > > be to foo like so:
> > > 
> > > 14 void
> > > 15 foo (size_t sz)
> > > 16 {
> > > 16.5  void * _1;
> > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
> > > sizeof(char));
> > > 18  obj->size = sz;
> > > 19  obj->buf[0] = 2;
> > > 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
> > > > 20  __builtin_printf ("%d\n", get_size_from (_1));
> > > 21  return;
> > > 22 }
> > > 
> > > If yes then this could indeed work.  I think I got thrown off by the 
> > > reference to __bdos.
> > 
> > Yes. I think it is important to evaluate the size at the
> > access to buf and not the allocation, because the point is to 
> > recover it from the size member even when the compiler can't 
> > see the original allocation.
> 
> But if the access is through a pointer without the attribute visible
> even the Frontend cannot recover?  

Yes, if the access is using a struct-with-FAM without the attribute
the FE would not insert the builtin.  BDOS could potentially
still see the original allocation but if it doesn't, then there is
no information.

> We’d need to force type correctness and give up on indirecting
> through an int * when it can refer to two different container types. 
> The best we can do I think is mark allocation sites and hope for
> some basic code hygiene (not clobbering size or array pointer
> through pointers without the appropriately attributed type)

I do not fully understand what you are referring to. But yes,
for full bounds safety we would need the language feature.
In C people should start to use variably-modified types
more.  I think we can build perfect bounds safety on top of
them in a very good way with only FE changes.
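
To make the point about variably-modified types concrete, here is a minimal sketch in plain C99. The helper names vm_capacity and vm_store are made up for illustration and are not part of any proposal in this thread; the only assumption is a compiler that supports variably-modified parameter types, as GCC does.

```c
#include <assert.h>
#include <stddef.h>

/* The parameter type "char (*buf)[n]" is variably modified: the bound n
   is part of the pointer's type, so sizeof *buf is computed at run time
   from n rather than from a separate size argument.  */
static size_t
vm_capacity (size_t n, char (*buf)[n])
{
  return sizeof *buf;
}

/* Hypothetical checked store: write C at index IDX only if IDX is inside
   the bound carried by the type.  Returns 1 on success, 0 otherwise.  */
static int
vm_store (size_t n, char (*buf)[n], size_t idx, char c)
{
  assert (buf != NULL);
  if (idx >= sizeof *buf)
    return 0;
  (*buf)[idx] = c;
  return 1;
}
```

A caller would pass the address of the array, e.g. vm_store (sizeof storage, &storage, i, 'x'), so the bound travels with the pointer type instead of living in a separate struct member.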

All these attributes are just a best effort.  But for a while,
this will be necessary.

Martin

> 
> > Evaluating at this point requires that the size is correctly set
> > before the access to the FAM and the user has to make sure 
> > this is the case. But to me this requirement would make sense.
> > 
> > Semantically, it could also make sense to evaluate the size at a
> > later time.  But then the reordering becomes problematic again.
> > 
> > Also I think this would make this feature generally more useful.
> > For example, it could also work for other pointers in the struct
> > and not just for FAMs.  In this case, the struct may already be
> > freed when BDOS is called, so it might also not be possible to
> > access the size member at a later ti

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Qing Zhao


> On Oct 25, 2023, at 7:13 AM, Richard Biener  
> wrote:
> 
> 
> 
>> Am 25.10.2023 um 12:47 schrieb Martin Uecker :
>> 
>> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
 On 2023-10-25 04:16, Martin Uecker wrote:
 Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> 
>> Am 24.10.2023 um 22:38 schrieb Martin Uecker :
>> 
>> Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
>>> Hi, Sid,
>>> 
>>> Really appreciate for your example and detailed explanation. Very 
>>> helpful.
>>> I think that this example is an excellent example to show (almost) all 
>>> the issues we need to consider.
>>> 
>>> I slightly modified this example to make it to be compilable and 
>>> run-able, as following:
>>> (but I still cannot make the incorrect reordering or DSE happening, 
>>> anyway, the potential reordering possibility is there…)
>>> 
>>> 1 #include 
>>> 2 struct A
>>> 3 {
>>> 4  size_t size;
>>> 5  char buf[] __attribute__((counted_by(size)));
>>> 6 };
>>> 7
>>> 8 static size_t
>>> 9 get_size_from (void *ptr)
>>> 10 {
>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>> 12 }
>>> 13
>>> 14 void
>>> 15 foo (size_t sz)
>>> 16 {
>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
>>> sizeof(char));
>>> 18  obj->size = sz;
>>> 19  obj->buf[0] = 2;
>>> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>>> 21  return;
>>> 22 }
>>> 23
>>> 24 int main ()
>>> 25 {
>>> 26  foo (20);
>>> 27  return 0;
>>> 28 }
>>> 
>>> 
>>> 
>>> 
> When it’s set I suppose.  Turn
> 
> X.l = n;
> 
> Into
> 
> X.l = __builtin_with_size (x.buf, n);
 
 It would turn
 
 some_variable = (&) x.buf
 
 into
 
 some_variable = __builtin_with_size ( (&) x.buf. x.len)
 
 
 So the later access to x.buf and not the initialization
 of a member of the struct (which is too early).
 
>>> 
>>> Hmm, so with Qing's example above, are you suggesting the transformation 
>>> be to foo like so:
>>> 
>>> 14 void
>>> 15 foo (size_t sz)
>>> 16 {
>>> 16.5  void * _1;
>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>> 18  obj->size = sz;
>>> 19  obj->buf[0] = 2;
>>> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
>>> 20  __builtin_printf ("%d\n", get_size_from (_1));
>>> 21  return;
>>> 22 }
>>> 
>>> If yes then this could indeed work.  I think I got thrown off by the 
>>> reference to __bdos.
>> 
>> Yes. I think it is important to evaluate the size at the
>> access to buf and not the allocation, because the point is to 
>> recover it from the size member even when the compiler can't 
>> see the original allocation.
> 
> But if the access is through a pointer without the attribute visible even the 
> Frontend cannot recover?  We’d need to force type correctness and give up on 
> indirecting through an int * when it can refer to two different container 
> types.

Might need to issue warnings when this happens?

>  The best we can do I think is mark allocation sites and hope for some basic 
> code hygiene (not clobbering size or array pointer through pointers without 
> the appropriately attributed type)
I guess that we need to clarify the requirement in the documentation, and also 
issue warnings when the source code has such issues.

Qing
> 
>> Evaluating at this point requires that the size is correctly set
>> before the access to the FAM and the user has to make sure 
>> this is the case. But to me this requirement would make sense.
>> 
>> Semantically, it could also make sense to evaluate the size at a
>> later time.  But then the reordering becomes problematic again.
>> 
>> Also I think this would make this feature generally more useful.
>> For example, it could also work for other pointers in the struct
>> and not just for FAMs.  In this case, the struct may already be
>> freed when BDOS is called, so it might also not be possible to
>> access the size member at a later time.
>> 
>> Martin
>> 
>> 
>>> 
>> 



[PATCH] c++/modules: fix up recent testcases

2023-10-25 Thread Patrick Palka
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

Declaring get() inline seems necessary to avoid link failure:

  /usr/bin/ld: /tmp/ccwdv6Co.o: in function `g3@pr105322.Decltype()':
  decltype-1_b.C:(.text._ZW8pr105322W8Decltype2g3v[_ZW8pr105322W8Decltype2g3v]+0x18): undefined reference to `f@pr105322.Decltype()::A::get()'

Not sure if that's expected?

-- >8 --

This fixes some minor issues with the testcases from
r14-4806-g084addf8a700fa.

gcc/testsuite/ChangeLog:

* g++.dg/modules/decltype-1_a.C: Add missing } to dg-module-do
directive.  Declare f()::A::get() inline.
* g++.dg/modules/lambda-5_a.C: Add missing } to dg-module-do
directive.
---
 gcc/testsuite/g++.dg/modules/decltype-1_a.C | 4 ++--
 gcc/testsuite/g++.dg/modules/lambda-5_a.C   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/g++.dg/modules/decltype-1_a.C b/gcc/testsuite/g++.dg/modules/decltype-1_a.C
index ca66e8b598a..6512f151aae 100644
--- a/gcc/testsuite/g++.dg/modules/decltype-1_a.C
+++ b/gcc/testsuite/g++.dg/modules/decltype-1_a.C
@@ -1,5 +1,5 @@
 // PR c++/105322
-// { dg-module-do link
+// { dg-module-do link }
 // { dg-additional-options -fmodules-ts }
 // { dg-module-cmi pr105322.Decltype }
 
@@ -7,7 +7,7 @@ export module pr105322.Decltype;
 
 auto f() {
   struct A { int m;
-int get () { return m; }
+inline int get () { return m; }
   };
   return A{};
 }
diff --git a/gcc/testsuite/g++.dg/modules/lambda-5_a.C b/gcc/testsuite/g++.dg/modules/lambda-5_a.C
index 6b589d4965c..37d0e77b1e1 100644
--- a/gcc/testsuite/g++.dg/modules/lambda-5_a.C
+++ b/gcc/testsuite/g++.dg/modules/lambda-5_a.C
@@ -1,5 +1,5 @@
 // PR c++/105322
-// { dg-module-do link
+// { dg-module-do link }
 // { dg-additional-options -fmodules-ts }
 // { dg-module-cmi pr105322.Lambda }
 
-- 
2.42.0.482.g2e8e77cbac



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Qing Zhao


> On Oct 25, 2023, at 10:50 AM, Siddhesh Poyarekar  wrote:
> 
> On 2023-10-25 09:27, Qing Zhao wrote:
>>> On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar  wrote:
>>> 
>>> On 2023-10-24 18:51, Qing Zhao wrote:
 Thanks for the proposal!
 So what you suggested is:
 For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the 
 FE, then the call to the _bdos (x.buf, 1) will
 Become:
_bdos(__builtin_with_size(x.buf, x.L), 1)?
 Then the implicit use of x.L in _bdos(x.buf.1) will become explicit?
>>> 
>>> Oops, I think Martin and I fell off-list in a subthread.  I clarified that 
>>> my comment was that any such annotation at object reference is probably too 
>>> late and hence not the right place for it; basically it has the same 
>>> problems as the option A in your comment.  A better place to reinforce such 
>>> a relationship would be the allocation+initialization site instead.
>> I think Martin’s proposal might work, it’s different than the option A:
>> A.  Add an additional argument, the size parameter,  to __bdos,
>>  A.1, during FE;
>>  A.2, during gimplification phase;
>> Option A targets the __bdos call and tries to encode the implicit use in the 
>> call; this will not work when the real object has not been instantiated at 
>> the call site.
>> However, Martin’s proposal targets the FAM itself; it will enhance 
>> the FAM access naturally with the size information. And such FAM access with 
>> size info will be propagated to the __bdos site later through inlining, etc. 
>> and then tree-object-size can use the size information at that point. At the 
>> same time, the implicit use of the size is recorded correctly.
>> So, I think that this proposal is natural and reasonable.
> 
> Ack, we discussed this later in the thread and I agree[1].  Richard still has 
> concerns[2] that I think may be addressed by putting __builtin_with_size at 
> the point where the reference to x.buf escapes, but I'm not very sure about 
> that.
> 
> Oh, and Martin suggested using __builtin_with_size more generally[3] in 
> bugzilla to address attribute inlining issues and we have high level 
> consensus for a __builtin_with_access instead, which associates access type 
> in addition to size with the target object.  For the purposes of counted_by, 
> access type could simply be -1.

Yes, I read all the discussions in the comments of PR96503 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503), and I do agree that this 
is a good idea. 

I prefer the name for the new builtin as:  
__builtin_with_access_and_size
Instead of 
__builtin_with_access

All the attributes, “alloc_size”, “access”, and the new “counted_by” for FAM, 
could be converted to this builtin consistently, and even the later new 
extension, for example, “counted_by” attribute for general pointers, could use 
the same builtin. 

SOMETYPE *ptr = __builtin_with_access_and_size (SOMETYPE *ptr, size_t size, int access)

In the above, 

1. SOMETYPE will be the type of the pointee of “ptr”, it could be a real type 
or void.

2. “size”

If SOMETYPE is a real type, the “size” will be the number of elements of the 
type;
If SOMETYPE is void, the “size” will be the number of bytes.   

3. “access”

-1: Unknown access semantics
0: none
1: read_only
2: write_only
3: read_write

For the “counted_by” and “alloc_size” attributes, the “access” will be -1. 
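
As a rough illustration of the encoding described above, the following C sketch models what such a builtin would associate with a pointer. Everything here is hypothetical: the enum, the struct sized_ref, and the function with_access_and_size are stand-ins made up for this example, not an existing GCC interface, and the extra elem_size argument (0 standing for a void pointee) only exists because plain C cannot inspect SOMETYPE the way the compiler could.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the "access" values listed above.  */
enum access_mode
{
  ACCESS_UNKNOWN = -1,
  ACCESS_NONE = 0,
  ACCESS_READ_ONLY = 1,
  ACCESS_WRITE_ONLY = 2,
  ACCESS_READ_WRITE = 3
};

/* Hypothetical record of what the builtin would attach to a pointer.  */
struct sized_ref
{
  void *ptr;
  size_t nbytes;
  enum access_mode access;
};

/* Model of the size rule: for a real pointee type SIZE counts elements,
   for a void pointee (ELEM_SIZE == 0 here) SIZE counts bytes.  */
static struct sized_ref
with_access_and_size (void *ptr, size_t size, size_t elem_size,
                      enum access_mode access)
{
  struct sized_ref r;
  assert (ptr != NULL);
  r.ptr = ptr;
  r.nbytes = elem_size == 0 ? size : size * elem_size;
  r.access = access;
  return r;
}
```

For counted_by on a FAM this model would be called with the element count from the size member and ACCESS_UNKNOWN, matching the -1 mentioned above.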

Qing
> 
> Thanks,
> Sid
> 
> 
> [1] 
> https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#m4f3cafa489493180e258fd62aca0196a5f244039
> 
> [2] 
> https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#mcf226f891621db8b640deaedd8942bb8519010f3
> 
> [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503#c6



[PATCH] bpf: Improvements in CO-RE builtins implementation.

2023-10-25 Thread Cupertino Miranda

Hi everyone,

This patch contains some more recent improvements to BPF CO-RE builtins.
Please find further details of the changes on the patch header.

Looking forward to your review and comments.

Best regards,
Cupertino Miranda

commit 6054209c0a8af9c3e6363550bf2ba4f4f2172eba
Author: Cupertino Miranda 
Date:   Tue Aug 8 09:22:41 2023 +0100

bpf: Improvements in CO-RE builtins implementation.

This patch moves the processing of attribute preserve_access_index to
its own independent gimple lowering pass.
This approach is more consistent with the implementation of the CO-RE
builtins when used explicitly in the code.  The attributed type accesses
are now converted early to the __builtin_core_reloc builtin instead of being
kept as an expression in code throughout all of the middle-end.
This prevents the compiler from optimizing out or manipulating the expression
using the locally defined type; instead it assumes nothing is known about
this expression, as should be the case in all of the CO-RE
relocations.

In the process, __builtin_preserve_access_index has also been
improved to generate code for more complex expressions that would
require more than one CO-RE relocation.
This turned out to be a requirement, since bpf-next selftests would rely on
loop unrolling in order to convert an undefined index array access into a
defined one. It seemed extreme to expect the unrolling to happen, and for
that reason GCC still generates correct code in such scenarios, even when index
access is never predictable or unrolling does not occur.

gcc/ChangeLog:
* config/bpf/bpf-passes.def (pass_lower_bpf_core): Added pass.
* config/bpf/bpf-protos.h: Added prototype for new pass.
* config/bpf/bpf.cc (bpf_const_not_ok_for_debug_p): New function.
* config/bpf/bpf.md (mov_reloc_core): Changed
* config/bpf/core-builtins.cc (cr_builtins, is_attr_preserve_access,
core_field_info, bpf_core_get_index, compute_field_expr,
process_field_expr, pack_type, make_core_relo,
bpf_handle_plugin_finish_type, core_buintin_helpers,
construct_builtin_core_reloc, bpf_resolve_overloaded_core_builtin,
bpf_add_core_reloc): Changed.
(root_for_core_field_info, pack_field_expr,
core_expr_with_field_expr_plus_base, make_core_safe_access_index,
replace_core_access_index_comp_expr, maybe_get_base_for_field_expr,
core_access_clean, core_is_access_index, core_mark_as_access_index,
make_gimple_core_safe_access_index, execute_lower_bpf_core,
make_pass_lower_bpf_core): Added functions.
(pass_data_lower_bpf_core): New pass struct.
(pass_lower_bpf_core): New gimple_opt_pass class.
(pack_field_expr_for_preserve_field, bpf_replace_core_move_operands): Removed function.
(bpf_enum_value_kind): Added GTY(()).
* config/bpf/core-builtins.h (bpf_field_info_kind, bpf_type_id_kind,
bpf_type_info_kind, bpf_enum_value_kind): New enum.
* config/bpf/t-bpf: Added pass bpf-passes.def to PASSES_EXTRA.

gcc/testsuite/ChangeLog:
* gcc.target/bpf/core-attr-5.c: New test.
* gcc.target/bpf/core-attr-6.c: New test.
* gcc.target/bpf/core-builtin-1.c: Corrected
* gcc.target/bpf/core-builtin-enumvalue-opt.c: Corrected regular
expression.
* gcc.target/bpf/core-builtin-enumvalue.c: Corrected regular
expression.
* gcc.target/bpf/core-builtin-exprlist-1.c: New test.
* gcc.target/bpf/core-builtin-exprlist-2.c: New test.
* gcc.target/bpf/core-builtin-exprlist-3.c: New test.
* gcc.target/bpf/core-builtin-exprlist-4.c: New test.
* gcc.target/bpf/core-builtin-fieldinfo-offset-1.c: Extra tests

diff --git a/gcc/config/bpf/bpf-passes.def b/gcc/config/bpf/bpf-passes.def
new file mode 100644
index ..249c58e22067
--- /dev/null
+++ b/gcc/config/bpf/bpf-passes.def
@@ -0,0 +1,20 @@
+/* Declaration of target-specific passes for eBPF.
+   Copyright (C) 2021-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+INSERT_PASS_BEFORE (p

[PATCH] c++: more ahead-of-time -Wparentheses warnings

2023-10-25 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

Now that we don't have to worry about looking through NON_DEPENDENT_EXPR,
we can easily extend the -Wparentheses warning in convert_for_assignment
to consider (non-dependent) templated assignment operator expressions as
well, like r14-4111-g6e92a6a2a72d3b did in maybe_convert_cond.

gcc/cp/ChangeLog:

* cp-tree.h (is_assignment_op_expr_p): Declare.
* semantics.cc (is_assignment_op_expr_p): Generalize to return
true for any assignment operator expression, not just one that
has been resolved to an operator overload.
(maybe_convert_cond): Remove now-redundant checks around
is_assignment_op_expr_p.
* typeck.cc (convert_for_assignment): Look through implicit
INDIRECT_REF in -Wparentheses warning logic, and generalize
to use is_assignment_op_expr_p.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wparentheses-13.C: Strengthen by not requiring
that the templates are instantiated for any of the -Wparentheses
warnings to be issued.
* g++.dg/warn/Wparentheses-23.C: Likewise.
* g++.dg/warn/Wparentheses-32.C: Remove xfails.
---
 gcc/cp/cp-tree.h|  1 +
 gcc/cp/semantics.cc | 22 +++--
 gcc/cp/typeck.cc|  7 ---
 gcc/testsuite/g++.dg/warn/Wparentheses-13.C |  2 --
 gcc/testsuite/g++.dg/warn/Wparentheses-23.C |  3 ---
 gcc/testsuite/g++.dg/warn/Wparentheses-32.C |  8 
 6 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 30fe716b109..c90ef883e52 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7875,6 +7875,7 @@ extern tree lambda_regenerating_args  (tree);
 extern tree most_general_lambda(tree);
 extern tree finish_omp_target  (location_t, tree, tree, bool);
 extern void finish_omp_target_clauses  (location_t, tree, tree *);
+extern bool is_assignment_op_expr_p(tree);
 
 /* in tree.cc */
 extern int cp_tree_operand_length  (const_tree);
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 72ec72de690..4b0038a4fc7 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -840,15 +840,20 @@ finish_goto_stmt (tree destination)
   return add_stmt (build_stmt (input_location, GOTO_EXPR, destination));
 }
 
-/* Returns true if CALL is a (possibly wrapped) CALL_EXPR or AGGR_INIT_EXPR
-   to operator= () that is written as an operator expression. */
-static bool
-is_assignment_op_expr_p (tree call)
+/* Returns true if T corresponds to an assignment operator expression.  */
+
+bool
+is_assignment_op_expr_p (tree t)
 {
-  if (call == NULL_TREE)
+  if (t == NULL_TREE)
 return false;
 
-  call = extract_call_expr (call);
+  if (TREE_CODE (t) == MODIFY_EXPR
+  || (TREE_CODE (t) == MODOP_EXPR
+ && TREE_CODE (TREE_OPERAND (t, 1)) == NOP_EXPR))
+return true;
+
+  tree call = extract_call_expr (t);
   if (call == NULL_TREE
   || call == error_mark_node
   || !CALL_EXPR_OPERATOR_SYNTAX (call))
@@ -882,10 +887,7 @@ maybe_convert_cond (tree cond)
   cond = convert_from_reference (cond);
 
   tree inner = REFERENCE_REF_P (cond) ? TREE_OPERAND (cond, 0) : cond;
-  if ((TREE_CODE (inner) == MODIFY_EXPR
-   || (TREE_CODE (inner) == MODOP_EXPR
-  && TREE_CODE (TREE_OPERAND (inner, 1)) == NOP_EXPR)
-   || is_assignment_op_expr_p (inner))
+  if (is_assignment_op_expr_p (inner)
   && warn_parentheses
   && !warning_suppressed_p (inner, OPT_Wparentheses)
   && warning_at (cp_expr_loc_or_input_loc (inner),
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 3b719326d76..0585b4a6bf0 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -10338,16 +10338,17 @@ convert_for_assignment (tree type, tree rhs,
 
   /* If -Wparentheses, warn about a = b = c when a has type bool and b
  does not.  */
+  tree inner_rhs = REFERENCE_REF_P (rhs) ? TREE_OPERAND (rhs, 0) : rhs;
   if (warn_parentheses
   && TREE_CODE (type) == BOOLEAN_TYPE
-  && TREE_CODE (rhs) == MODIFY_EXPR
-  && !warning_suppressed_p (rhs, OPT_Wparentheses)
+  && is_assignment_op_expr_p (inner_rhs)
+  && !warning_suppressed_p (inner_rhs, OPT_Wparentheses)
   && TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE
   && (complain & tf_warning)
   && warning_at (rhs_loc, OPT_Wparentheses,
 "suggest parentheses around assignment used as "
 "truth value"))
-suppress_warning (rhs, OPT_Wparentheses);
+suppress_warning (inner_rhs, OPT_Wparentheses);
 
   if (complain & tf_warning)
 warn_for_address_or_pointer_of_packed_member (type, rhs);
diff --git a/gcc/testsuite/g++.dg/warn/Wparentheses-13.C b/gcc/testsuite/g++.dg/warn/Wparentheses-13.C
index 22a139f23a4..d6438942c28 100644
--- a/gcc/testsuite/g++.dg/warn/Wparenthe

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Qing Zhao


> On Oct 25, 2023, at 11:38 AM, Richard Biener  
> wrote:
> 
> 
> 
>> Am 25.10.2023 um 16:50 schrieb Siddhesh Poyarekar :
>> 
>> On 2023-10-25 09:27, Qing Zhao wrote:
> On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar  
> wrote:
 
 On 2023-10-24 18:51, Qing Zhao wrote:
> Thanks for the proposal!
> So what you suggested is:
> For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the 
> FE, then the call to the _bdos (x.buf, 1) will
> Become:
>   _bdos(__builtin_with_size(x.buf, x.L), 1)?
> Then the implicit use of x.L in _bdos(x.buf.1) will become explicit?
 
 Oops, I think Martin and I fell off-list in a subthread.  I clarified that 
 my comment was that any such annotation at object reference is probably 
 too late and hence not the right place for it; basically it has the same 
 problems as the option A in your comment.  A better place to reinforce 
 such a relationship would be the allocation+initialization site instead.
>>> I think Martin’s proposal might work, it’s different than the option A:
>>> A.  Add an additional argument, the size parameter,  to __bdos,
>>> A.1, during FE;
>>> A.2, during gimplification phase;
>>> Option A targets the __bdos call and tries to encode the implicit use in the 
>>> call; this will not work when the real object has not been instantiated at 
>>> the call site.
>>> However, Martin’s proposal targets the FAM itself; it will enhance 
>>> the FAM access naturally with the size information. And such FAM access 
>>> with size info will be propagated to the __bdos site later through inlining, 
>>> etc. and then tree-object-size can use the size information at that point. 
>>> At the same time, the implicit use of the size is recorded correctly.
>>> So, I think that this proposal is natural and reasonable.
>> 
>> Ack, we discussed this later in the thread and I agree[1].  Richard still 
>> has concerns[2] that I think may be addressed by putting __builtin_with_size 
>> at the point where the reference to x.buf escapes, but I'm not very sure 
>> about that.
>> 
>> Oh, and Martin suggested using __builtin_with_size more generally[3] in 
>> bugzilla to address attribute inlining issues and we have high level 
>> consensus for a __builtin_with_access instead, which associates access type 
>> in addition to size with the target object.  For the purposes of counted_by, 
>> access type could simply be -1.
> 
> Btw, I’d like to see some hard numbers on the amount of extra false positives 
> this will cause as well as the effect on generated code before putting this in 
> mainline and effectively needing to support it forever. 

What do you mean by the “extra false positives”? 

For the code generation impact:

turning the original  x.buf 
to a builtin function call
__builtin_with_access_and_size(x.buf, x.L, -1)

might inhibit some optimizations from happening before the builtin is evaluated 
into object size info (phase  .objsz1).  I guess there might be some 
performance impact. 

However, if we mark this builtin as PURE, NOTHROW, etc., then the negative 
performance impact will be reduced to a minimum? 
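
To show what such marking looks like at the source level, here is a small sketch using an ordinary function as a stand-in for the builtin. The names with_size_stub and same_result_twice are hypothetical; pure and nothrow are the existing GNU function attributes. Whether the optimizer actually merges the two calls depends on the optimization level, so the example only checks the observable result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed builtin: it returns PTR unchanged
   and has no side effects.  "pure" tells the compiler the result depends
   only on the arguments and readable memory; "nothrow" that it cannot
   throw.  */
__attribute__ ((pure, nothrow))
static char *
with_size_stub (char *ptr, size_t size)
{
  (void) size;
  return ptr;
}

/* Two identical calls with no intervening store: a compiler is allowed to
   treat the second call as redundant because the callee is pure.  */
static int
same_result_twice (char *buf, size_t n)
{
  assert (buf != NULL);
  char *p = with_size_stub (buf, n);
  char *q = with_size_stub (buf, n);
  return p == q;
}
```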

Qing

> 
> Richard 
> 
>> Thanks,
>> Sid
>> 
>> 
>> [1] 
>> https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#m4f3cafa489493180e258fd62aca0196a5f244039
>> 
>> [2] 
>> https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#mcf226f891621db8b640deaedd8942bb8519010f3
>> 
>> [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503#c6



[pushed] tree: update address_space comment

2023-10-25 Thread Jason Merrill
Pushing as obvious.

-- 8< --

Mention front-end uses of the address_space bit-field, and remove the
inaccurate "only".

gcc/ChangeLog:

* tree-core.h (struct tree_base): Update address_space comment.
---
 gcc/tree-core.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 4dc36827d32..13435344401 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1082,10 +1082,11 @@ struct GTY(()) tree_base {
 
   unsigned spare1 : 8;
 
-  /* This field is only used with TREE_TYPE nodes; the only reason it is
+  /* For _TYPE nodes, this is TYPE_ADDR_SPACE; the reason it is
 present in tree_base instead of tree_type is to save space.  The size
 of the field must be large enough to hold addr_space_t values.
-For CONSTRUCTOR nodes this holds the clobber_kind enum.  */
+For CONSTRUCTOR nodes this holds the clobber_kind enum.
+The C++ front-end uses this in IDENTIFIER_NODE and NAMESPACE_DECL.  */
   unsigned address_space : 8;
 } bits;
 

base-commit: 406709b1c7b134a7a05445837f406e98c04f76f0
-- 
2.39.3



Re: [PATCH] c++: another build_new_1 folding fix [PR111929]

2023-10-25 Thread Jason Merrill

On 10/25/23 12:03, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

We also need to avoid folding 'outer_nelts_check' when in a template
context to prevent an ICE on the below testcase.  This patch achieves
this by replacing the fold_build2 call with build2 (cp_fully_fold will
later fold the overall expression if appropriate).

In passing, this patch removes an unnecessary call to convert on 'nelts'
since it should always already be a size_t (and 'convert' isn't the best
conversion entry point to use anyway since it doesn't take a complain
parameter.)

PR c++/111929

gcc/cp/ChangeLog:

* init.cc (build_new_1): Remove unnecessary call to convert
on 'nelts'.


OK.


 Use build2 instead of fold_build2 for 'outer_nelts_checks'.


Also OK, but can we skip that whole block when processing_template_decl? 
 Seems no point to build runtime checks.


Jason



Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-25 Thread Robin Dapp
> At first, this seemed like an odd place to fold away the length.
> AFAIK the length in res_op is inherited directly from the original
> operation, and so it isn't any more redundant after the fold than
> it was before.  But I suppose the reason for doing it here is that
> we deliberately create IFN_COND_LEN_FOO calls that have "redundant"
> lengths.  Doing that avoids the need to define an IFN_COND_FOO
> equivalent of every IFN_COND_LEN_FOO optab.  Is that right?  If so,
> I think it deserves a comment.

I think, generally, what I want to cover is a more fundamental thing
- in length-controlled targets the loop length doesn't change
throughout a loop and what we normally do is load the right length,
operate on the maximum length (ignoring tail elements) and store
the right length.

So, whenever the length is constant it was already determined that
we operate on exactly this length and length masking is not needed.
Only when the length is variable and not a compile-time constant do we need
to use length masking (and therefore the vec_cond simplification becomes
invalid).  I think we never e.g. operate on the first "half" of a
vector, leaving the second half unchanged.  As far as I know such access
patterns are always done with non-length, "conditional" masking.

Actually the only problematic cases I found were reduction-like loops
where the reduction operated on full length rather than the "right" one.
If a tail element is wrong then, obviously, the reduction result is also
wrong.  From a "loop len" point of view a reduction could have a length
like len_store.  Then the simplification problem would go away.
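To illustrate the point about tail elements, here is a minimal scalar model in C++.  It is only a sketch, not GCC code: kLanes, Vec, load_len, add_full, store_len, reduce_full and reduce_len are hypothetical names, and the "stale" argument merely stands in for whatever the tail lanes happened to contain.  Element-wise results are unaffected by the tail because store_len never writes it out, whereas a reduction over all lanes picks the tail up.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of a length-controlled vector with kLanes elements.
constexpr std::size_t kLanes = 8;
using Vec = std::array<int, kLanes>;

// Load only the first len elements; the tail keeps the caller-supplied
// 'stale' contents, standing in for undefined tail lanes.
Vec load_len(const int *src, std::size_t len, Vec stale) {
  assert(len <= kLanes);
  for (std::size_t i = 0; i < len; ++i) stale[i] = src[i];
  return stale;
}

// Operate on the full vector, ignoring where the tail starts.
Vec add_full(const Vec &a, const Vec &b) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = a[i] + b[i];
  return r;
}

// Store only the first len elements; tail lanes never reach memory.
void store_len(int *dst, const Vec &v, std::size_t len) {
  assert(len <= kLanes);
  for (std::size_t i = 0; i < len; ++i) dst[i] = v[i];
}

// Reduction over all lanes: includes whatever the tail holds.
int reduce_full(const Vec &v) {
  int sum = 0;
  for (int x : v) sum += x;
  return sum;
}

// Reduction that honours the length, in the way a len_store does.
int reduce_len(const Vec &v, std::size_t len) {
  assert(len <= kLanes);
  int sum = 0;
  for (std::size_t i = 0; i < len; ++i) sum += v[i];
  return sum;
}
```

With three valid elements and a non-zero stale tail, the stored elements and reduce_len are correct, while reduce_full also sums the five tail lanes.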

In the attached version I removed the hunk you mentioned but added a
match.pd pattern where all constant-length vcond_mask_len are simplified
to vec_cond.

/* A VCOND_MASK_LEN with a constant length is just a vec_cond for
   our purposes.  */
(simplify
 (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
  (vec_cond @0 @1 @2))

This works for all of the testsuite (and is basically the same
thing we have been testing all along with the bogus simplification
still in place).  Is there any way to formalize the
requirement?  Or am I totally wrong and this must never be done?
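One way to state the requirement is against a scalar reference model.  The C++ sketch below is not GCC code.  It assumes that VCOND_MASK_LEN picks the "then" element only for lanes below len + bias whose mask bit is set, and the "else" element everywhere else; that reading is an assumption, as are the names kLanes, vec_cond_ref and vcond_mask_len_ref.  Under this model the two operations agree whenever len + bias covers all lanes and can differ otherwise.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical number of vector lanes
using Vec = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Reference model of vec_cond: plain per-lane select.
Vec vec_cond_ref(const Mask &m, const Vec &a, const Vec &b) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = m[i] ? a[i] : b[i];
  return r;
}

// Assumed model of VCOND_MASK_LEN: lanes at or beyond len + bias always
// take the "else" value, regardless of the mask.
Vec vcond_mask_len_ref(const Mask &m, const Vec &a, const Vec &b,
                       long len, long bias) {
  assert(len + bias >= 0);
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    bool active = static_cast<long>(i) < len + bias;
    r[i] = (active && m[i]) ? a[i] : b[i];
  }
  return r;
}
```

In this model the simplification to vec_cond is an identity exactly when len + bias is at least kLanes, so the open question is whether a constant length always implies that condition.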

> Any reason not to make IFN_COND_LEN_MASK a directly-mapped optab?
> (I realise IFN_COND_MASK isn't, but that's used differently.)

Right, a conversion optab is not necessary - in the expander function
all we really do is move the condition from position 1 to 3.  Changing
the order would mean inconsistency with vec_cond.  If that's acceptable
I can change it and we can use expand_direct_optab_fn.  For now I kept
the expander function but used a direct optab.

Regards
 Robin

From 4f793b71184b3301087780ed500f798d69328fc9 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Fri, 13 Oct 2023 10:20:35 +0200
Subject: [PATCH v2] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(all true or all zero) into just an OP in the presence of length
masking this patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
length that is not the full hardware vector length a simplification now
does not result in a VEC_COND but rather a VCOND_MASK_LEN.

For cases where the mask is known to be all true or all zero the patch
introduces new match patterns that allow combination of unconditional
unary, binary and ternary operations with the respective conditional
operations if the target supports it.

Similarly, if the length is known to be equal to the target hardware
length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.
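
As a rough illustration of why the degenerate-mask simplification needs a length-aware result, the following C++ sketch models a length-masked conditional add with an all-true mask.  It is not GCC code: the names are hypothetical, the bias is taken as zero, and the per-lane semantics (compute only lanes below len with the mask bit set, otherwise take the "else" value) are an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical lane count
using Vec = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Unconditional add on every lane.
Vec add_ref(const Vec &a, const Vec &b) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = a[i] + b[i];
  return r;
}

// Assumed model of a length-masked conditional add: a lane is computed only
// if it is below len and its mask bit is set; otherwise it takes els[i].
Vec cond_len_add_ref(const Mask &m, const Vec &a, const Vec &b,
                     const Vec &els, std::size_t len) {
  assert(len <= kLanes);
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = (i < len && m[i]) ? a[i] + b[i] : els[i];
  return r;
}

// Assumed model of VCOND_MASK_LEN (bias taken as zero).
Vec vcond_mask_len_ref(const Mask &m, const Vec &t, const Vec &els,
                       std::size_t len) {
  assert(len <= kLanes);
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = (i < len && m[i]) ? t[i] : els[i];
  return r;
}
```

Under these assumptions, an all-true mask with a short length makes cond_len_add_ref differ from add_ref in the tail lanes, while it matches vcond_mask_len_ref applied to the unconditional sum; with len equal to kLanes all three agree.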

gcc/ChangeLog:

* config/riscv/autovec.md (vcond_mask_len_): Add
expander.
* config/riscv/riscv-protos.h (enum insn_type):
* doc/md.texi: Add vcond_mask_len.
* gimple-match-exports.cc (maybe_resimplify_conditional_op):
Create VCOND_MASK_LEN when
length masking.
* gimple-match.h (gimple_match_op::gimple_match_op): Allow
matching of 6 and 7 parameters.
(gimple_match_op::set_op): Ditto.
(gimple_match_op::gimple_match_op): Always initialize len and
bias.
* internal-fn.cc (vec_cond_mask_len_direct): Add.
(expand_vec_cond_mask_len_optab_fn): Add.
(direct_vec_cond_mask_len_optab_supported_p): Add.
(internal_fn_len_index): Add VCOND_MASK_LEN.
(internal_fn_mask_index): Ditto.
* internal-fn.def (VCOND_MASK_LEN): New internal function.
* match.pd: Combine unconditional unary, binary and ternary
operations into the respective COND_LEN operations.
* optabs.def (OPTAB_CD): Add vcond_mask_len optab.
---
 gcc/config/riscv/autovec.md | 37 
 gcc/config/riscv/riscv-protos.h |  5 +++
 gcc/doc/md.texi |  9 
 gcc/gimple-match-exports.cc | 13 --
 gcc/gimple-match.h  | 78 -
 

Ping: [PATCH] Power10: Add options to disable load and store vector pair.

2023-10-25 Thread Michael Meissner
Ping patch:

| Date: Fri, 13 Oct 2023 19:41:13 -0400
| From: Michael Meissner 
| Subject: [PATCH] Power10: Add options to disable load and store vector pair.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632987.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH 1/6] PowerPC: Add -mcpu=future option

2023-10-25 Thread Michael Meissner
Ping patch.

| Date: Wed, 18 Oct 2023 19:58:56 -0400
| From: Michael Meissner 
| Subject: Re: [PATCH 1/6] PowerPC: Add -mcpu=future option
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633511.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [NVPTX] Patch pings...

2023-10-25 Thread Thomas Schwinge
Hi Roger!

Thanks for your patience!  I know very well how frustrating it is...
I promise I'll get to your patches: in fact I already started looking
into these before the Cauldron, but ran into GCC/nvptx things that I
didn't understand but felt I needed to understand/address first, then after
the Cauldron was out for late summer vacations, only recently returned,
and now catching up with things accumulated during that vacation time,
and then planning to resume GCC/nvptx work.


Grüße
 Thomas


On 2023-10-25T14:54:50+0100, "Roger Sayle"  wrote:
> Random fact: there have been no changes to nvptx.md in 2023 apart
> from Jakub's tree-wide update to the copyright years in early January.
>
> Please can I ping two of my pending Nvidia nvptx patches:
>
> "Correct pattern for popcountdi2 insn in nvptx.md" from January
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609571.html
>
> and
>
> "Update nvptx's bitrev2 pattern to use BITREVERSE rtx" from June
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620994.html
>
> Both of these still apply cleanly (because nvptx.md hasn't changed).
>
> Thanks in advance,
> Roger
> --
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Ping: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.

2023-10-25 Thread Michael Meissner
Ping patch.

| Date: Wed, 18 Oct 2023 20:00:18 -0400
| From: Michael Meissner 
| Subject: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633512.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.

2023-10-25 Thread Michael Meissner
Ping patch:

| Date: Wed, 18 Oct 2023 20:01:54 -0400
| From: Michael Meissner 
| Subject: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633513.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH 4/6] PowerPC: Make MMA insns support DMR registers.

2023-10-25 Thread Michael Meissner
Ping patch.

| Date: Wed, 18 Oct 2023 20:03:02 -0400
| From: Michael Meissner 
| Subject: [PATCH 4/6] PowerPC: Make MMA insns support DMR registers.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633514.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

