Re: [PATCH] LoongArch: Use the movcf2gr instruction to implement cstore4

2023-12-14 Thread Xi Ruoyao
On Thu, 2023-12-14 at 15:44 +0800, Jiahao Xu wrote:

> The implementation of this patch has some issues. When I compile 521.wrf with 
> -Ofast -mlasx -flto -muse-movcf2gr, it results in an ICE:

Indeed, creating CCCmode pseudos without a complete movfcc
implementation is buggy.

This patch needs a complete rework.

>  during RTL pass: reload
>  module_mp_fast_sbm.fppized.f90: In function 'fast_sbm.constprop':
>  module_mp_fast_sbm.fppized.f90:1369:25: internal compiler error: maximum 
> number of generated reload insns per insn achieved (90)
>   1369 |   END SUBROUTINE FAST_SBM
>    | ^
>  0x1207221bf lra_constraints(bool)
>  ../../gcc/gcc/lra-constraints.cc:5429
>  0x120705a3f lra(_IO_FILE*, int)
>  ../../gcc/gcc/lra.cc:2442
>  0x1206ac93f do_reload
>  ../../gcc/gcc/ira.cc:5973
>  0x1206ac93f execute
>  ../../gcc/gcc/ira.cc:6161

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] expmed: Get vec_extract element mode from insn_data, [PR112999]

2023-12-14 Thread Robin Dapp
Hi,

this is a bit of a follow-up to the latest expmed change.

In extract_bit_field_1 we try to get a better vector mode before
extracting from it.  Better refers to the case when the requested target
mode does not equal the inner mode of the vector to extract from and we
have an equivalent tieable vector mode with a fitting inner mode.

On riscv this triggered an ICE (PR112999) because we would take the
detour of extracting from a mask-mode vector via a vector integer mode.
One element of that mode could be subreg-punned with TImode which, in
turn, would need to be operated on in DImode chunks.

As the backend might return the extracted value in a different mode than
the inner mode of the input vector, we might already have a mode
equivalent to the target mode.  Therefore, this patch first obtains the
mode the backend uses for the particular vec_extract and then compares
it against the target mode.  Only if those disagree do we try to find a
better mode.  Otherwise we continue with the initial one.
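As a sketch of that ordering (purely illustrative standalone C; the toy mode enum and the insn_result_mode helper are invented stand-ins for GCC's machine modes and the insn_data[icode].operand[0].mode lookup):

```c
#include <assert.h>

/* Toy stand-ins for GCC machine modes; names are illustrative only.  */
enum toy_mode { VNx4BI_TOY, QI_TOY, DI_TOY, TI_TOY, NO_MODE };

/* Models insn_data[icode].operand[0].mode: what the backend's
   vec_extract pattern actually produces for a given vector mode.
   Assume here that a mask-mode vector hands back a QImode value.  */
static enum toy_mode
insn_result_mode (enum toy_mode vector_mode)
{
  return vector_mode == VNx4BI_TOY ? QI_TOY : NO_MODE;
}

/* Before the patch the test was "inner mode != target mode"; after the
   patch we first ask the backend what it returns, and only look for a
   better vector mode when that still disagrees with the target.  */
static int
need_better_mode (enum toy_mode vector_mode, enum toy_mode inner,
		  enum toy_mode target)
{
  enum toy_mode element = insn_result_mode (vector_mode);
  if (element == NO_MODE)
    element = inner;
  return element != target;
}
```

With this ordering the mask-vector case from the PR no longer takes the detour, because the backend already extracts into the requested mode.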

Bootstrapped and regtested on x86, aarch64 and power10.  Regtested
on riscv.

Regards
 Robin

gcc/ChangeLog:

PR target/112999

* expmed.cc (extract_bit_field_1):  Get vec_extract's result
element mode from insn_data and compare it to the target mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112999.c: New test.
---
 gcc/expmed.cc   | 17 +++--
 .../gcc.target/riscv/rvv/autovec/pr112999.c | 17 +
 2 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index ed17850ff74..6fbe4d9cfaf 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -1722,10 +1722,23 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
}
 }
 
+  /* The target may prefer to return the extracted value in a different mode
+ than INNERMODE.  */
+  machine_mode outermode = GET_MODE (op0);
+  machine_mode element_mode = GET_MODE_INNER (outermode);
+  if (VECTOR_MODE_P (outermode) && !MEM_P (op0))
+{
+  enum insn_code icode
+   = convert_optab_handler (vec_extract_optab, outermode, element_mode);
+
+  if (icode != CODE_FOR_nothing)
+   element_mode = insn_data[icode].operand[0].mode;
+}
+
   /* See if we can get a better vector mode before extracting.  */
   if (VECTOR_MODE_P (GET_MODE (op0))
   && !MEM_P (op0)
-  && GET_MODE_INNER (GET_MODE (op0)) != tmode)
+  && element_mode != tmode)
 {
   machine_mode new_mode;
 
@@ -1755,7 +1768,7 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
   /* Use vec_extract patterns for extracting parts of vectors whenever
  available.  If that fails, see whether the current modes and bitregion
  give a natural subreg.  */
-  machine_mode outermode = GET_MODE (op0);
+  outermode = GET_MODE (op0);
   if (VECTOR_MODE_P (outermode) && !MEM_P (op0))
 {
   scalar_mode innermode = GET_MODE_INNER (outermode);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c
new file mode 100644
index 000..c049c5a0386
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl512b -mabi=lp64d --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax -O3 -fno-vect-cost-model -fno-tree-loop-distribute-patterns" } */
+
+int a[1024];
+int b[1024];
+
+_Bool
+fn1 ()
+{
+  _Bool tem;
+  for (int i = 0; i < 1024; ++i)
+{
+  tem = !a[i];
+  b[i] = tem;
+}
+  return tem;
+}
-- 
2.43.0


Re: [PATCH] i386: Remove RAO-INT from Grand Ridge

2023-12-14 Thread Hongtao Liu
On Thu, Dec 14, 2023 at 10:55 AM Haochen Jiang  wrote:
>
> Hi all,
>
> According to ISE050 published at the end of September, RAO-INT will not
> be in Grand Ridge anymore. This patch aims to remove it.
>
> The documentation comes following:
>
> https://cdrdv2.intel.com/v1/dl/getContent/671368
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk and backport to GCC13?
Ok.
>
> Thx,
> Haochen
>
> gcc/ChangeLog:
>
> * config/i386/driver-i386.cc (host_detect_local_cpu): Do not
> set Grand Ridge depending on RAO-INT.
> * config/i386/i386.h: Remove PTA_RAOINT from PTA_GRANDRIDGE.
> * doc/invoke.texi: Adjust documentation.
> ---
>  gcc/config/i386/driver-i386.cc | 3 ---
>  gcc/config/i386/i386.h | 2 +-
>  gcc/doc/invoke.texi| 4 ++--
>  3 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
> index 0cfb2884d65..3342e550f2a 100644
> --- a/gcc/config/i386/driver-i386.cc
> +++ b/gcc/config/i386/driver-i386.cc
> @@ -665,9 +665,6 @@ const char *host_detect_local_cpu (int argc, const char 
> **argv)
>   /* Assume Arrow Lake S.  */
>   else if (has_feature (FEATURE_SM3))
> cpu = "arrowlake-s";
> - /* Assume Grand Ridge.  */
> - else if (has_feature (FEATURE_RAOINT))
> -   cpu = "grandridge";
>   /* Assume Sierra Forest.  */
>   else if (has_feature (FEATURE_AVXVNNIINT8))
> cpu = "sierraforest";
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 47340c6a4ad..303baf8c921 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2416,7 +2416,7 @@ constexpr wide_int_bitmask PTA_GRANITERAPIDS = 
> PTA_SAPPHIRERAPIDS | PTA_AMX_FP16
>| PTA_PREFETCHI;
>  constexpr wide_int_bitmask PTA_GRANITERAPIDS_D = PTA_GRANITERAPIDS
>| PTA_AMX_COMPLEX;
> -constexpr wide_int_bitmask PTA_GRANDRIDGE = PTA_SIERRAFOREST | PTA_RAOINT;
> +constexpr wide_int_bitmask PTA_GRANDRIDGE = PTA_SIERRAFOREST;
>  constexpr wide_int_bitmask PTA_ARROWLAKE = PTA_ALDERLAKE | PTA_AVXIFMA
>| PTA_AVXVNNIINT8 | PTA_AVXNECONVERT | PTA_CMPCCXADD | PTA_UINTR;
>  constexpr wide_int_bitmask PTA_ARROWLAKE_S = PTA_ARROWLAKE | PTA_AVXVNNIINT16
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1f26f80d26c..82dd9cdf907 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -33451,8 +33451,8 @@ SSSE3, SSE4.1, SSE4.2, POPCNT, AES, PREFETCHW, 
> PCLMUL, RDRND, XSAVE, XSAVEC,
>  XSAVES, XSAVEOPT, FSGSBASE, PTWRITE, RDPID, SGX, GFNI-SSE, CLWB, MOVDIRI,
>  MOVDIR64B, CLDEMOTE, WAITPKG, ADCX, AVX, AVX2, BMI, BMI2, F16C, FMA, LZCNT,
>  PCONFIG, PKU, VAES, VPCLMULQDQ, SERIALIZE, HRESET, KL, WIDEKL, AVX-VNNI,
> -AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, ENQCMD, UINTR and RAOINT
> -instruction set support.
> +AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, ENQCMD and UINTR instruction 
> set
> +support.
>
>  @item clearwaterforest
>  Intel Clearwater Forest CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2,
> --
> 2.31.1
>


-- 
BR,
Hongtao


Re: Re: [PATCH] Middle-end: Adjust decrement IV style partial vectorization COST model

2023-12-14 Thread juzhe.zh...@rivai.ai
Hi, Richard.

I have a question about the decrement IV costing since I find the reduction 
case is generating inferior codegen.

reduc_plus_int:
mv a3,a0
ble a1,zero,.L7
addiw a5,a1,-1
li a4,2
bleu a5,a4,.L8
vsetivli zero,4,e32,m1,ta,ma
srliw a4,a1,2
vmv.v.i v1,0
slli a4,a4,4
add a4,a4,a0
mv a5,a0
.L4:
vle32.v v2,0(a5)
addi a5,a5,16
vadd.vv v1,v1,v2
bne a5,a4,.L4
li a5,0
vmv.s.x v2,a5
andi a5,a1,-4
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
beq a1,a5,.L12
.L3:
subw a1,a1,a5
slli a5,a5,32
srli a5,a5,32
slli a1,a1,32
vsetvli a4,zero,e32,m1,ta,ma
slli a5,a5,2
srli a1,a1,32
vmv.v.i v1,0
add a3,a3,a5
vsetvli a1,a1,e8,mf4,ta,ma
vle32.v v3,0(a3)
li a5,0
vsetivli zero,1,e32,m1,ta,ma
vmv.s.x v2,a5
vsetvli zero,a1,e32,m1,tu,ma
vmv.v.v v1,v3
vsetvli a4,zero,e32,m1,ta,ma
vredsum.vs v1,v1,v2
vmv.x.s a5,v1
addw a0,a0,a5
ret
.L12:
ret
.L7:
li a0,0
ret
.L8:
li a5,0
li a0,0
j .L3

This patch adjusts length_update_cost from 3 (the original cost) to 2, which
fixes the conversion case (the test appended in this patch),
but it can't fix the reduction case.

Then I adjusted it to COST = 1:

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 19e38b8637b..50c99b1fe79 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4877,7 +4877,7 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
 processed in current iteration, and a SHIFT operation to
 compute the next memory address instead of adding vectorization
 factor.  */
- length_update_cost = 2;
+ length_update_cost = 1;
else
  /* For increment IV stype, Each may need two MINs and one MINUS to
 update lengths in body for next iteration.  */

Now the codegen is reasonable:

reduc_plus_int:
ble a1,zero,.L4
vsetvli a5,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L3:
vsetvli a5,a1,e32,m1,tu,ma
vle32.v v2,0(a0)
slli a4,a5,2
sub a1,a1,a5
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
li a5,0
vsetivli zero,1,e32,m1,ta,ma
vmv.s.x v2,a5
vsetvli a5,zero,e32,m1,ta,ma
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
.L4:
li a0,0
ret

The reason I set COST = 2 instead of 1 in this patch is that:

one unit of COST is for SELECT_VL.

The other is for the memory address calculation, since we don't update the
memory address by adding VF directly;
instead:

we shift the result of SELECT_VL and then add it into the memory IV as follows:

SSA_1 = SELECT_VL --> SSA_1 is element-wise
SSA_2 = SSA_1 << 1 (if the element is INT16, make it byte-wise)
next iteration memory address = current iteration memory address + SSA_2.

If the element is INT8, then the shift operation is not needed
(SSA_2 = SSA_1), since SSA_1 is already byte-wise.
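The two statements being costed can be modeled with a small standalone C sketch (VLMAX and the helper names are assumptions for illustration; this is not RVV intrinsic code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLMAX 4 /* elements per vector register; assumed for illustration */

/* SSA_1 = SELECT_VL: elements processed in this iteration.  */
static size_t
select_vl (size_t remaining)
{
  return remaining < VLMAX ? remaining : VLMAX;
}

/* next address = current address + (SSA_1 << log2 (element size));
   for 1-byte elements the shift amount is 0, so the slli disappears.  */
static const uint8_t *
advance (const uint8_t *addr, size_t vl, unsigned log2_elem_size)
{
  return addr + (vl << log2_elem_size);
}

/* Walk an int16_t array the way the vectorized loop does.  */
static int
sum16 (const int16_t *a, size_t n)
{
  int s = 0;
  const uint8_t *p = (const uint8_t *) a;
  while (n)
    {
      size_t vl = select_vl (n);      /* vsetvli a5,a1,e16,...   */
      for (size_t i = 0; i < vl; i++) /* vle16.v + vadd.vv body  */
	s += ((const int16_t *) p)[i];
      p = advance (p, vl, 1);         /* slli a4,a5,1; add a0,a0,a4 */
      n -= vl;
    }
  return s;
}
```

So each iteration spends one statement on SELECT_VL itself and (for elements wider than one byte) one on the shift feeding the address IV, which is where the COST = 2 figure came from.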

So, my question is whether the COST should be 1 or 2.
It seems that COST = 1 is better for using SELECT_VL.
 
Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-12-13 18:17
To: Juzhe-Zhong
CC: gcc-patches; richard.sandiford; jeffreyalaw
Subject: Re: [PATCH] Middle-end: Adjust decrement IV style partial 
vectorization COST model
On Wed, 13 Dec 2023, Juzhe-Zhong wrote:
 
> Hi, before this patch, a simple conversion case for RVV codegen:
> 
> foo:
> ble a2,zero,.L8
> addiw   a5,a2,-1
> li  a4,6
> bleua5,a4,.L6
> srliw   a3,a2,3
> sllia3,a3,3
> add a3,a3,a0
> mv  a5,a0
> mv  a4,a1
> vsetivlizero,8,e16,m1,ta,ma
> .L4:
> vle8.v  v2,0(a5)
> addia5,a5,8
> vzext.vf2   v1,v2
> vse16.v v1,0(a4)
> addia4,a4,16
> bne a3,a5,.L4
> andia5,a2,-8
> beq a2,a5,.L10
> .L3:
> sllia4,a5,32
> srlia4,a4,32
> subwa2,a2,a5
> sllia2,a2,32
> sllia5,a4,1
> srlia2,a2,32
> add a0,a0,a4
> add a1,a1,a5
> vsetvli zero,a2,e16,m1,ta,ma
> vle8.v  v2,0(a0)
> vzext.vf2   v1,v2
> vse16.v v1,0(a1)
> .L8:
> ret
> .L10:
> ret
> .L6:
> li  a5,0
> j   .L3
> 
> This vectorization go through first loop:
> 
> vsetivlizero,8,e16,m1,ta,ma
> .L4:
> vle8.v  v2,0(a5)
> addia5,a5,8
> vzext.vf2   v1,v2
> vse16.v v1,0(a4)
> addia4,a4,16
> bne a3,a5,.L4
> 
> Each iteration processes 8 elements.
> 
> For a scalable vectorization with VLEN > 128 bits CPU, it's ok when VLEN = 
> 128.
> But, as long as VLEN > 128 bits, it will waste the CPU resources. That is, 
> e.g. VLEN = 256bits.
> only half of the vector units are working and another half is idle.
> 
> After investigation, I realize that I forgot to adjust COST for SELECT_VL.
> So, adjust COST for SELECT_VL styple length vectorization. We adjust COST 
> from 3 to 2. since
> after this patch:
> 
> foo:
> ble a2,zero,.L5
> .L3:
> vsetvli a5,a2,e16,m1,ta,ma -> SELECT_VL cost.
> vle8.v v2,0(a0)
> slli a4,a5,1-> addit

Re: Re: [PATCH 4/5] [ifcvt] optimize x=c ? (y op const_int) : y by RISC-V Zicond like insns

2023-12-14 Thread Fei Gao
On 2023-12-11 13:38  Jeff Law  wrote:
>
>
>
>On 12/5/23 01:12, Fei Gao wrote:
>> op=[PLUS, MINUS, IOR, XOR, ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT, AND]
>>
>> Co-authored-by: Xiao Zeng
>>
>> gcc/ChangeLog:
>>
>>  * ifcvt.cc (noce_cond_zero_shift_op_supported): check if OP is 
>>shift like operation
>>  (noce_cond_zero_binary_op_supported): restructure & call 
>>noce_cond_zero_shift_op_supported
>>  (noce_bbs_ok_for_cond_zero_arith): add support for const_int
>>  (noce_try_cond_zero_arith): add support for x=c ? (y op const_int)
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for x=c ? (y op 
>>const_int) : y


>> @@ -3089,7 +3111,18 @@ noce_try_cond_zero_arith (struct noce_if_info 
>> *if_info)
>>     return false;
>>   }
>>  
>> -  *to_replace = target;
>> +  if (CONST_INT_P (*to_replace))
>> +{
>> +  if (noce_cond_zero_shift_op_supported (bin_code))
>> +    *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
>> +  else if (SUBREG_P (bin_op0))
>> +    *to_replace = gen_rtx_SUBREG (GET_MODE (bin_op0), target, 0);
>> +  else
>> +    *to_replace = target;
>Not all targets use QImode for their shift counts, so you can't just
>force that argument to QImode. 
Thanks for the info. Until now I hadn't understood the subreg complexity
you mentioned.

>
>The way this works in our internal tree is that we re-expand the binary
>operation rather than replacing bits of existing RTL.  That allows the
>expanders to do the right thing automatically for the target WRT
>handling of things like the mode of the shift count.  In fact, I don't
>see how you can ever do replacement of a constant with a register with
>the current scheme since the original constant will be modeless, so you
>never know what mode to use. 
Letting the expander handle the const_int case seems like a target-independent solution.

BR, 
Fei

>
>
>
>Jeff

Re: Re: [PATCH 5/5] [ifcvt] optimize extension for x=c ? (y op z) : y by RISC-V Zicond like insns

2023-12-14 Thread Fei Gao
On 2023-12-11 13:46  Jeff Law  wrote:
>
>
>
>On 12/5/23 01:12, Fei Gao wrote:
>> SIGN_EXTEND, ZERO_EXTEND and SUBREG has been considered
>> to support SImode in 64-bit machine.
>>
>> Co-authored-by: Xiao Zeng
>>
>> gcc/ChangeLog:
>>
>> * ifcvt.cc (noce_cond_zero_binary_op_supported): add support for extension
>>  (noce_bbs_ok_for_cond_zero_arith): likewise
>>  (noce_try_cond_zero_arith): support extension of LSHIFTRT case
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for extension
>So I think this needs to defer to gcc-15.  But even so I think getting
>some review on the effort is useful.
>
>
>> ---
>>   gcc/ifcvt.cc  |  16 ++-
>>   .../gcc.target/riscv/zicond_ifcvt_opt.c   | 126 +-
>>   2 files changed, 139 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>> index b84be53ec5c..306497a8e37 100644
>> --- a/gcc/ifcvt.cc
>> +++ b/gcc/ifcvt.cc
>> @@ -2934,6 +2934,10 @@ noce_cond_zero_binary_op_supported (rtx op)
>>   {
>> enum rtx_code opcode = GET_CODE (op);
>>  
>> +  /* Strip SIGN_EXTEND or ZERO_EXTEND if any.  */
>> +  if (opcode == SIGN_EXTEND || opcode == ZERO_EXTEND)
>> +    opcode = GET_CODE (XEXP (op, 0));
>So it seems to me like that you need to record what the extension was so
>that you can re-apply it to the result.
>
>> @@ -3114,7 +3122,11 @@ noce_try_cond_zero_arith (struct noce_if_info 
>> *if_info)
>> if (CONST_INT_P (*to_replace))
>>   {
>>     if (noce_cond_zero_shift_op_supported (bin_code))
>> -    *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
>> +    {
>> +  *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
>> +  if (GET_CODE (a) == ZERO_EXTEND && bin_code == LSHIFTRT)
>> +PUT_CODE (a, SIGN_EXTEND);
>> +    }
>This doesn't look correct (ignoring the SUBREG issues with patch #4 in
>this series). 
Agree there's  issue here for const_int case as you mentioned in 
[PATCH 4/5] [ifcvt] optimize x=c ? (y op const_int) : y.

>
>When we looked at this internally the conclusion was we needed to first
>strip the extension, recording what kind of extension it was, then
>reapply the same extension to the result of the now conditional
>operation.  And it's independent of SUBREG handling. 
Ignoring the const_int case, we can reuse the RTL pattern and replace
the z (SUBREG or REG) in INSN_A (x = y op z) without recording what kind
of extension it was.

A new patch will be sent for GCC 15.

BR, 
Fei

>
>
>Jeff

Re: [PR target/110201] Fix operand types for various scalar crypto insns

2023-12-14 Thread Christoph Müllner
On Tue, Jun 20, 2023 at 12:34 AM Jeff Law via Gcc-patches
 wrote:
>
>
> A handful of the scalar crypto instructions are supposed to take a
> constant integer argument 0..3 inclusive.  A suitable constraint was
> created and used for this purpose (D03), but the operand's predicate is
> "register_operand".  That's just wrong.
>
> This patch adds a new predicate "const_0_3_operand" and fixes the
> relevant insns to use it.  One could argue the constraint is redundant
> now (and you'd be correct).  I wouldn't lose sleep if someone wanted
> that removed, in which case I'll spin up a V2.
>
> The testsuite was broken in a way that made it consistent with the
> compiler, so the tests passed, when they really should have been issuing
> errors all along.
>
> This patch adjusts the existing tests so that they all expect a
> diagnostic on the invalid operand usage (including out of range
> constants).  It adds new tests with proper constants, testing the
> extremes of valid values.
>
> OK for the trunk, or should we remove the D03 constraint?

Reviewed-by: Christoph Muellner 

The patch does not apply cleanly anymore, because there were some
small changes in crypto.md.


Re: [PATCH] RISC-V: fix scalar crypto pattern

2023-12-14 Thread Christoph Müllner
On Thu, Dec 14, 2023 at 1:40 AM Jeff Law  wrote:
> On 12/13/23 02:03, Christoph Müllner wrote:
> > On Wed, Dec 13, 2023 at 9:22 AM Liao Shihua  wrote:
> >>
> >> In Scalar Crypto Built-In functions, some require immediate parameters,
> >> But register_operand are incorrectly used in the pattern.
> >>
> >> E.g.:
> >> __builtin_riscv_aes64ks1i(rs1,1)
> >> Before:
> >>li a5,1
> >>aes64ks1i a0,a0,a5
> >>
> >>Assembler messages:
> >>Error: instruction aes64ks1i requires absolute expression
> >>
> >> After:
> >>aes64ks1i a0,a0,1
> >
> > Looks good to me (also tested with rv32 and rv64).
> > (I was actually surprised that the D03 constraint was not sufficient)
> >
> > Reviewed-by: Christoph Muellner 
> > Tested-by: Christoph Muellner 
> >
> > Nit: I would prefer to separate arguments with a comma followed by a space.
> > Even if the existing code was not written like that.
> > E.g. __builtin_riscv_sm4ed(rs1,rs2,1); -> __builtin_riscv_sm4ed(rs1, rs2, 
> > 1);
> >
> > I propose to remove the builtin tests for scalar crypto and scalar bitmanip
> > as part of the patchset that adds the intrinsic tests (no value in
> > duplicated tests).
> >
> >> gcc/ChangeLog:
> >>
> >>  * config/riscv/crypto.md: Use immediate_operand instead of 
> >> register_operand.
> You should mention the actual patterns changed.
>
> I would strongly recommend adding some tests that out of range cases are
> rejected (out of range constants as well as a variable for that last
> argument).  I did that in my patch from June to fix this problem (which
> was never acked/reviewed).

Sorry, I was not aware of this patch.
Since Jeff's patch was here first and also includes more tests, I
propose to move forward with his patch (but I'm not a maintainer!).
Therefore, I've reviewed Jeff's patch and replied to his email.

FWIW: Jeff's patch can be found here:
  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622233.html


Re: [committed] testsuite: Fix up target-enter-data-1.c on 32-bit targets

2023-12-14 Thread Julian Brown
On Thu, 14 Dec 2023 08:14:56 +0100
Jakub Jelinek  wrote:

> On Wed, Nov 29, 2023 at 11:43:05AM +, Julian Brown wrote:
> > * c-c++-common/gomp/target-enter-data-1.c: Adjust scan
> > output.  
> 
> struct bar { int num_vectors; double *vectors; };
> 
> is 16 bytes only on 64-bit targets, on 32-bit ones it is just 8 bytes,
> so the explicit matching of the * 16 multiplication only works on the
> former.
> 
> Tested on x86_64-linux -m32/-m64, committed to trunk.
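The size difference Jakub describes can be checked with a couple of lines of C (assuming the usual LP64/ILP32 ABIs without packing pragmas):

```c
#include <assert.h>

/* On LP64 targets the double* member forces 8-byte alignment, so the
   struct is 4 (int) + 4 (padding) + 8 (pointer) = 16 bytes; on a
   typical ILP32 target it is 4 + 4 = 8 bytes.  */
struct bar { int num_vectors; double *vectors; };
```

Either way the size is twice the pointer size, which is why a test matching a hard-coded "* 16" multiplication only passes on 64-bit targets.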

That was quick, thank you! Apologies for the breakage.

Julian


Re: [PATCH v4] A new copy propagation and PHI elimination pass

2023-12-14 Thread Filip Kastl
Successfully bootstrapped and regtested on x86_64-linux. Will push to master.

Filip


Re: [PATCH] match.pd: Simplify (t * u) / v -> t * (u / v) [PR112994]

2023-12-14 Thread Richard Biener



> Am 14.12.2023 um 08:35 schrieb Jakub Jelinek :
> 
> Hi!
> 
> The following testcase is optimized just on GENERIC (using
>  strict_overflow_p = false;
>  if (TREE_CODE (arg1) == INTEGER_CST
>  && (tem = extract_muldiv (op0, arg1, code, NULL_TREE,
>&strict_overflow_p)) != 0)
>{
>  if (strict_overflow_p)
>fold_overflow_warning (("assuming signed overflow does not occur "
>"when simplifying division"),
>   WARN_STRICT_OVERFLOW_MISC);
>  return fold_convert_loc (loc, type, tem);
>}
> ) but not on GIMPLE.
> 
> The following included patch is what I've bootstrapped/regtested
> for it on x86_64-linux and i686-linux, unfortunately it regressed
> +FAIL: gcc.dg/Wstrict-overflow-3.c correct warning (test for warnings, line 
> 12)
> test, we are indeed assuming that signed overflow does not occur
> when simplifying division in there.
> 
> The attached version of the patch (which provides the simplification only
> for GIMPLE) fixes that, but I haven't bootstrapped/regtested it yet.
> And/or we could add the
>fold_overflow_warning (("assuming signed overflow does not occur "
>"when simplifying division"),
>   WARN_STRICT_OVERFLOW_MISC);
> call into the simplification, but in that case IMHO it should go into
> the (t * u) / u -> t simplification as well, there we assume the exact
> same thing (of course, in both cases only in the spots where we don't
> verify it through ranger that it never overflows).
> 
> Guarding the whole simplification to GIMPLE only IMHO makes sense because
> the above mentioned folding does it for GENERIC (and extract_muldiv even
> handles far more cases, dunno how many from that we should be doing on
> GIMPLE in match.pd and what could be done elsewhere; e.g. extract_muldiv
> can handle (x * 16 + y * 32) / 8 -> x * 2 + y * 4 etc.).
> 
> Dunno about the fold_overflow_warning, I always have doubts about why
> such a warning is useful to users.
> 
> Ok for trunk (and which version)?

I couldn’t spot the difference… OK for either
version (and no, calling fold_overflow_warning looks wrong).

Richard 

> 2023-12-14  Jakub Jelinek  
> 
>PR tree-optimization/112994
>* match.pd ((t * 2) / 2 -> t): Adjust comment to use u instead of 2.
>Punt without range checks if TYPE_OVERFLOW_SANITIZED.
>((t * u) / v -> t * (u / v)): New simplification.
> 
>* gcc.dg/tree-ssa/pr112994-1.c: New test.
> 
> --- gcc/match.pd.jj2023-12-13 11:21:15.852158970 +0100
> +++ gcc/match.pd2023-12-13 19:10:26.448927327 +0100
> @@ -930,12 +930,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) && TYPE_UNSIGNED (TREE_TYPE (@0)))
>   (bit_and @0 (negate @1
> 
> -/* Simplify (t * 2) / 2) -> t.  */
> (for div (trunc_div ceil_div floor_div round_div exact_div)
> + /* Simplify (t * u) / u -> t.  */
>  (simplify
>   (div (mult:c @0 @1) @1)
>   (if (ANY_INTEGRAL_TYPE_P (type))
> -   (if (TYPE_OVERFLOW_UNDEFINED (type))
> +   (if (TYPE_OVERFLOW_UNDEFINED (type) && !TYPE_OVERFLOW_SANITIZED (type))
> @0
> #if GIMPLE
> (with {value_range vr0, vr1;}
> @@ -945,6 +945,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  && range_op_handler (MULT_EXPR).overflow_free_p (vr0, vr1))
>   @0))
> #endif
> +   )))
> + /* Simplify (t * u) / v -> t * (u / v) if u is multiple of v.  */
> + (simplify
> +  (div (mult @0 INTEGER_CST@1) INTEGER_CST@2)
> +  (if (INTEGRAL_TYPE_P (type)
> +   && wi::multiple_of_p (wi::to_widest (@1), wi::to_widest (@2), SIGNED))
> +   (if (TYPE_OVERFLOW_UNDEFINED (type) && !TYPE_OVERFLOW_SANITIZED (type))
> +(mult @0 (div! @1 @2))
> +#if GIMPLE
> +(with {value_range vr0, vr1;}
> + (if (get_range_query (cfun)->range_of_expr (vr0, @0)
> +  && get_range_query (cfun)->range_of_expr (vr1, @1)
> +  && range_op_handler (MULT_EXPR).overflow_free_p (vr0, vr1))
> +  (mult @0 (div! @1 @2
> +#endif
>
> 
> #if GIMPLE
> --- gcc/testsuite/gcc.dg/tree-ssa/pr112994-1.c.jj2023-12-13 
> 16:58:25.757663610 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr112994-1.c2023-12-13 
> 16:43:16.413152969 +0100
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/112994 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-not " / \\\[2389-\\\]" "optimized" } } */
> +
> +int f1 (int x) { return (x * 4) / 2; }
> +int f2 (int x) { return (x * 56) / 8; }
> +int f3 (int x) { return (x * 56) / -8; }
> +int f4 (int x) { int y = x * 4; return y / 2; }
> +int f5 (int x) { int y = x * 56; return y / 8; }
> +int f6 (int x) { int y = x * 56; return y / -8; }
> +unsigned f7 (unsigned x) { if (x > ~0U / 6) __builtin_unreachable (); 
> unsigned y = x * 6; return y / 3; }
> +unsigned f8 (unsigned x) { if (x > ~0U / 63) __builtin_unreachable (); 
> unsigned y = x * 6

Re: [PATCH] match.pd: Simplify (t * u) / (t * v) [PR112994]

2023-12-14 Thread Richard Biener



> Am 14.12.2023 um 08:37 schrieb Jakub Jelinek :
> 
> Hi!
> 
> On top of the previously posted patch, this simplifies say (x * 16) / (x * 4)
> into 4.  Unlike the previous pattern, this is something we didn't fold
> previously on GENERIC, so I think it shouldn't be all wrapped with #if
> GIMPLE.  The question whether there should be fold_overflow_warning for the
> TYPE_OVERFLOW_UNDEFINED case remains.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> (with or without fold_overflow_warning)?

Ok without the warning.

Richard 

> 2023-12-13  Jakub Jelinek  
> 
>PR tree-optimization/112994
>* match.pd ((t * u) / (t * v) -> (u / v)): New simplification.
> 
>* gcc.dg/tree-ssa/pr112994-2.c: New test.
> 
> --- gcc/match.pd.jj2023-12-13 18:43:51.277839661 +0100
> +++ gcc/match.pd2023-12-13 18:43:45.891913683 +0100
> @@ -962,6 +960,23 @@ (define_operator_list SYNC_FETCH_AND_AND
>  && range_op_handler (MULT_EXPR).overflow_free_p (vr0, vr1))
>   (mult @0 (div! @1 @2
> #endif
> +   )))
> + /* Simplify (t * u) / (t * v) -> (u / v) if u is multiple of v.  */
> + (simplify
> +  (div (mult @0 INTEGER_CST@1) (mult @0 INTEGER_CST@2))
> +  (if (INTEGRAL_TYPE_P (type)
> +   && wi::multiple_of_p (wi::to_widest (@1), wi::to_widest (@2), SIGNED))
> +   (if (TYPE_OVERFLOW_UNDEFINED (type) && !TYPE_OVERFLOW_SANITIZED (type))
> +(div @1 @2)
> +#if GIMPLE
> +(with {value_range vr0, vr1, vr2;}
> + (if (get_range_query (cfun)->range_of_expr (vr0, @0)
> +  && get_range_query (cfun)->range_of_expr (vr1, @1)
> +  && get_range_query (cfun)->range_of_expr (vr2, @2)
> +  && range_op_handler (MULT_EXPR).overflow_free_p (vr0, vr1)
> +  && range_op_handler (MULT_EXPR).overflow_free_p (vr0, vr2))
> +  (div @1 @2)))
> +#endif
>
> 
> #if GIMPLE
> --- gcc/testsuite/gcc.dg/tree-ssa/pr112994-2.c.jj2023-12-13 
> 19:07:20.882475735 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr112994-2.c2023-12-13 
> 19:07:02.597726855 +0100
> @@ -0,0 +1,15 @@
> +/* PR tree-optimization/112994 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "return 2;" 3 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "return 7;" 3 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "return -7;" 2 "optimized" } } */
> +
> +int f1 (int x) { return (x * 4) / (x * 2); }
> +int f2 (int x) { return (x * 56) / (x * 8); }
> +int f3 (int x) { return (x * 56) / (x * -8); }
> +int f4 (int x) { int y = x * 4; return y / (x * 2); }
> +int f5 (int x) { int y = x * 56; return y / (x * 8); }
> +int f6 (int x) { int y = x * 56; return y / (x * -8); }
> +unsigned f7 (unsigned x) { if (x > ~0U / 4) __builtin_unreachable (); 
> unsigned y = x * 4; return y / (x * 2); }
> +unsigned f8 (unsigned x) { if (x > ~0U / 56) __builtin_unreachable (); 
> unsigned y = x * 56; return y / (x * 8); }
> 
>Jakub
> 


Re: [PATCH] Middle-end: Adjust decrement IV style partial vectorization COST model

2023-12-14 Thread Richard Biener
Am 14.12.2023 um 09:28 schrieb juzhe.zh...@rivai.ai:
> Hi, Richard.
>
> I have a question about the decrement IV costing since I find the
> reduction case is generating inferior codegen.
> [same reduc_plus_int codegen and length_update_cost = 1 experiment as in
> the message quoted earlier in this thread]
> So, my question is the COST should be 1 or 2.
> It seems that COST = 1 is better for using SELECT_VL.

We are not modeling address cost, so trying to account for this here is
only going to be a heuristic that is as many times wrong as it is
correct.  If the difference between SELECT_VL and not is so small then
you'll have a hard time modeling this here.

> Thanks.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-12-13 18:17
To: Juzhe-Zhong
CC: gcc-patches; richard.sandiford; jeffreyalaw
Subject: Re: [PATCH] Middle-end: Adjust decrement IV style partial vectorization COST model
On Wed, 13 Dec 2023, Juzhe-Zhong wrote:
 
> Hi, before this patch, a simple conversion case for RVV codegen:
> 
> foo:
> ble a2,zero,.L8
> addiw   a5,a2,-1
> li  a4,6
> bleu    a5,a4,.L6
> srliw   a3,a2,3
> slli    a3,a3,3
> add a3,a3,a0
> mv  a5,a0
> mv  a4,a1
> vsetivli    zero,8,e16,m1,ta,ma
> .L4:
> vle8.v  v2,0(a5)
> addi    a5,a5,8
> vzext.vf2   v1,v2
> vse16.v v1,0(a4)
> addi    a4,a4,16
> bne a3,a5,.L4
> andi    a5,a2,-8
> beq a2,a5,.L10
> .L3:
> slli    a4,a5,32
> srli    a4,a4,32
> subw    a2,a2,a5
> slli    a2,a2,32
> slli    a5,a4,1
> srli    a2,a2,32
> add a0,a0,a4
> add a1,a1,a5
> vsetvli zero,a2,e16,m1,ta,ma
> vle8.v  v2,0(a0)
> vzext.vf2   v1,v2
> vse16.v v1,0(a1)
> .L8:
> ret
> .L10:
> ret
> .L6:
> li  a5,0
> j   .L3
> 
> This vectorization go through first loop:
> 
> vsetivli    zero,8,e16,m1,ta,ma
> .L4:
> vle8.v  v2,0(a5)
> addi    a5,a5,8
> vzext.vf2   v1,v2
> vse16.v v1,0(a4)
> addi    a4,a4,16
> bne a3,a5,.L4
> 
> Each iteration processes 8 elements.
> 
> For a scalable vectorization with VLEN > 128 bits CPU, it's ok when VLEN = 128.
> But, as long as VLEN > 128 bits, it will waste the CPU resources. That is, e.g. VLEN = 256bits.
> only half of the vector units are working and another half is idle.
> 
> After investigation, I realize that I forgot to adjust COST for SELECT_VL.
> So, adjust COST for SELECT_VL sty

Re: Re: [PATCH] Middle-end: Adjust decrement IV style partial vectorization COST model

2023-12-14 Thread juzhe.zh...@rivai.ai
Thanks Richard.

Let me clarify again to make sure I understand your comments correctly:

Do you suggest not to model address cost here like other partial vectorization 
style (while_ult, avx512...etc).
Then set COST = 1 since we only have SELECT_VL since beginning.
At various cases we saw, COST=1 is better than COST=2.

Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-12-14 18:46
To: juzhe.zhong
CC: gcc-patches; richard.sandiford; jeffreyalaw
Subject: Re: [PATCH] Middle-end: Adjust decrement IV style partial 
vectorization COST model


On 14.12.2023 at 09:28, juzhe.zh...@rivai.ai wrote:

 
Hi, Richard.

I have a question about the decrement IV costing since I find the reduction 
case is generating inferior codegen.

reduc_plus_int:
mv a3,a0
ble a1,zero,.L7
addiw a5,a1,-1
li a4,2
bleu a5,a4,.L8
vsetivli zero,4,e32,m1,ta,ma
srliw a4,a1,2
vmv.v.i v1,0
slli a4,a4,4
add a4,a4,a0
mv a5,a0
.L4:
vle32.v v2,0(a5)
addi a5,a5,16
vadd.vv v1,v1,v2
bne a5,a4,.L4
li a5,0
vmv.s.x v2,a5
andi a5,a1,-4
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
beq a1,a5,.L12
.L3:
subw a1,a1,a5
slli a5,a5,32
srli a5,a5,32
slli a1,a1,32
vsetvli a4,zero,e32,m1,ta,ma
slli a5,a5,2
srli a1,a1,32
vmv.v.i v1,0
add a3,a3,a5
vsetvli a1,a1,e8,mf4,ta,ma
vle32.v v3,0(a3)
li a5,0
vsetivli zero,1,e32,m1,ta,ma
vmv.s.x v2,a5
vsetvli zero,a1,e32,m1,tu,ma
vmv.v.v v1,v3
vsetvli a4,zero,e32,m1,ta,ma
vredsum.vs v1,v1,v2
vmv.x.s a5,v1
addw a0,a0,a5
ret
.L12:
ret
.L7:
li a0,0
ret
.L8:
li a5,0
li a0,0
j .L3

This patch adjust length_update_cost from 3 (original cost) into 2 can fix 
conversion case (the test append in this patch).
But can't fix reduction case.

Then I adjust it into COST = 1:

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 19e38b8637b..50c99b1fe79 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4877,7 +4877,7 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
 processed in current iteration, and a SHIFT operation to
 compute the next memory address instead of adding vectorization
 factor.  */
- length_update_cost = 2;
+ length_update_cost = 1;
else
  /* For increment IV stype, Each may need two MINs and one MINUS to
 update lengths in body for next iteration.  */

Then the codegen is reasonable now:

reduc_plus_int:
ble a1,zero,.L4
vsetvli a5,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L3:
vsetvli a5,a1,e32,m1,tu,ma
vle32.v v2,0(a0)
slli a4,a5,2
sub a1,a1,a5
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
li a5,0
vsetivli zero,1,e32,m1,ta,ma
vmv.s.x v2,a5
vsetvli a5,zero,e32,m1,ta,ma
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
.L4:
li a0,0
ret

The reason I set COST = 2 instead of 1 in this patch since

one COST is for SELECT_VL.

The other is for memory address calculation since we don't update memory 
address with adding VF directly,
instead:

We shift the result of SELECT_VL, and then add it into the memory IV as follows:

SSA_1 = SELECT_VL --> SSA_1 is element-wise
SSA_2 = SSA_1 << 1 (If element is INT16, make it to be bytes-wise)
next iteration memory address = current iteration memory address + SSA_2.

If element is INT8, then the shift operation is not needed:
SSA_2 = SSA_1 since it is already byte-wise.

So, my question is the COST should be 1 or 2.
It seems that COST = 1 is better for
using SELECT_VL.
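
The SELECT_VL / shift / IV-update sequence above can be written out as a
scalar C model (a minimal sketch; the names and loop shape are illustrative,
not the vectorizer's actual IR):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the decrement-IV loop shape: SELECT_VL
   picks the element count for this iteration, the SHIFT converts
   elements to bytes for the address IV, and the MINUS decrements the
   remaining trip count.  */
static size_t
select_vl (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

static int32_t
sum_i16 (const int16_t *a, size_t n, size_t vf)
{
  int32_t sum = 0;
  const char *addr = (const char *) a;  /* memory IV, in bytes */
  while (n > 0)
    {
      size_t vl = select_vl (n, vf);    /* SSA_1 = SELECT_VL */
      for (size_t i = 0; i < vl; i++)   /* body processes VL elements */
        sum += ((const int16_t *) addr)[i];
      addr += vl << 1;                  /* SSA_2 = SSA_1 << 1 (INT16: bytes) */
      n -= vl;                          /* remaining trip count */
    }
  return sum;
}
```

The `vl << 1` step is the extra statement the cost model is charging for; with
INT8 elements the shift disappears because VL is already a byte count.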

We are not modeling address cost, so trying to account for this here is only 
going to be a heuristic that’s as many times wrong than it is correct.  If the 
difference between SELECT_VL and not is so small  then you’ll have a hard time 
modeling this here.

 
Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-12-13 18:17
To: Juzhe-Zhong
CC: gcc-patches; richard.sandiford; jeffreyalaw
Subject: Re: [PATCH] Middle-end: Adjust decrement IV style partial 
vectorization COST model
On Wed, 13 Dec 2023, Juzhe-Zhong wrote:
 
> Hi, before this patch, a simple conversion case for RVV codegen:
> 
> foo:
> ble a2,zero,.L8
> addiw   a5,a2,-1
> li  a4,6
> bleu    a5,a4,.L6
> srliw   a3,a2,3
> slli    a3,a3,3
> add     a3,a3,a0
> mv      a5,a0
> mv      a4,a1
> vsetivli        zero,8,e16,m1,ta,ma
> .L4:
> vle8.v  v2,0(a5)
> addi    a5,a5,8
> vzext.vf2       v1,v2
> vse16.v v1,0(a4)
> addi    a4,a4,16
> bne     a3,a5,.L4
> andi    a5,a2,-8
> beq     a2,a5,.L10
> .L3:
> slli    a4,a5,32
> srli    a4,a4,32
> subw    a2,a2,a5
> slli    a2,a2,32
> slli    a5,a4,1
> srli    a2,a2,32
> add a0,a0,a4
> add a1,a1,a5
> vsetvli zero,a2,e16,m1,ta,ma
> vle8.v  v2,0(a0)
> vzext.vf2   v1,v2
> vse16.v v1,0(a1)
> .L8:
> ret
> .L10:
> ret
> .L6:
> li  a5,0
> j   .L3
> 
> This vectorization go through first lo

Re: [PATCH] RISC-V: fix scalar crypto pattern

2023-12-14 Thread Liao Shihua

Sorry, I was not aware of this patch.
Since Jeff's patch was here first and also includes more tests, I
propose to move forward with his patch (but I'm not a maintainer!).
Therefore, I've reviewed Jeff's patch and replied to his email.

FWIW: Jeff's patch can be found here:
   https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622233.html


No problem.

And I would tend to remove the D03 constraint if we used const_0_3_operand.

BR

Liao  Shihua


[PATCH] LoongArch: Fix incorrect code generation for sad pattern

2023-12-14 Thread Jiahao Xu
When I attempt to enable the vect_usad_char effective target for LoongArch, some
tests fail. These tests fail because the sad pattern generates bad code. This
patch fixes them: for sad patterns, use zero extension instead of sign
extension for the reduction.

Currently, we are fixing failed vectorized tests, and in the future, we will
enable more tests of "vect" for LoongArch.
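
The sign/zero distinction matters because the absolute difference of two
unsigned bytes is an unsigned value in [0, 255]; widening it with sign
extension misreads differences >= 128. A scalar sketch (illustrative C, not
the LoongArch intrinsics; `sad_widen` and its flag are made-up names):

```c
#include <assert.h>
#include <stdint.h>

/* Absolute difference of unsigned bytes, as vabsd.u.bu produces it.  */
static uint8_t
absd_u8 (uint8_t a, uint8_t b)
{
  return a > b ? a - b : b - a;
}

/* Accumulate the differences, widening each one either with zero
   extension (correct) or sign extension (what the old vhaddw.h.b-based
   expansion effectively did).  (d ^ 0x80) - 0x80 is a portable way to
   sign-extend an 8-bit value.  */
static int32_t
sad_widen (const uint8_t *a, const uint8_t *b, int n, int sign_extend)
{
  int32_t sum = 0;
  for (int i = 0; i < n; i++)
    {
      uint8_t d = absd_u8 (a[i], b[i]);
      sum += sign_extend ? (int32_t) ((d ^ 0x80) - 0x80) : (int32_t) d;
    }
  return sum;
}
```

For a difference of 255 the sign-extending path contributes -1 instead of 255,
which is exactly the kind of miscompare the vect-reduc-sad tests catch.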

gcc/ChangeLog:

* config/loongarch/lasx.md: Use zero expansion instruction.
* config/loongarch/lsx.md: Ditto.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index eeac8cd984b..db6871507e2 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -5097,8 +5097,8 @@ (define_expand "usadv32qi"
   rtx t2 = gen_reg_rtx (V16HImode);
   rtx t3 = gen_reg_rtx (V8SImode);
   emit_insn (gen_lasx_xvabsd_u_bu (t1, operands[1], operands[2]));
-  emit_insn (gen_lasx_xvhaddw_h_b (t2, t1, t1));
-  emit_insn (gen_lasx_xvhaddw_w_h (t3, t2, t2));
+  emit_insn (gen_lasx_xvhaddw_hu_bu (t2, t1, t1));
+  emit_insn (gen_lasx_xvhaddw_wu_hu (t3, t2, t2));
   emit_insn (gen_addv8si3 (operands[0], t3, operands[3]));
   DONE;
 })
@@ -5114,8 +5114,8 @@ (define_expand "ssadv32qi"
   rtx t2 = gen_reg_rtx (V16HImode);
   rtx t3 = gen_reg_rtx (V8SImode);
   emit_insn (gen_lasx_xvabsd_s_b (t1, operands[1], operands[2]));
-  emit_insn (gen_lasx_xvhaddw_h_b (t2, t1, t1));
-  emit_insn (gen_lasx_xvhaddw_w_h (t3, t2, t2));
+  emit_insn (gen_lasx_xvhaddw_hu_bu (t2, t1, t1));
+  emit_insn (gen_lasx_xvhaddw_wu_hu (t3, t2, t2));
   emit_insn (gen_addv8si3 (operands[0], t3, operands[3]));
   DONE;
 })
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index dbdb423011b..5e5e2503636 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -3468,8 +3468,8 @@ (define_expand "usadv16qi"
   rtx t2 = gen_reg_rtx (V8HImode);
   rtx t3 = gen_reg_rtx (V4SImode);
   emit_insn (gen_lsx_vabsd_u_bu (t1, operands[1], operands[2]));
-  emit_insn (gen_lsx_vhaddw_h_b (t2, t1, t1));
-  emit_insn (gen_lsx_vhaddw_w_h (t3, t2, t2));
+  emit_insn (gen_lsx_vhaddw_hu_bu (t2, t1, t1));
+  emit_insn (gen_lsx_vhaddw_wu_hu (t3, t2, t2));
   emit_insn (gen_addv4si3 (operands[0], t3, operands[3]));
   DONE;
 })
@@ -3485,8 +3485,8 @@ (define_expand "ssadv16qi"
   rtx t2 = gen_reg_rtx (V8HImode);
   rtx t3 = gen_reg_rtx (V4SImode);
   emit_insn (gen_lsx_vabsd_s_b (t1, operands[1], operands[2]));
-  emit_insn (gen_lsx_vhaddw_h_b (t2, t1, t1));
-  emit_insn (gen_lsx_vhaddw_w_h (t3, t2, t2));
+  emit_insn (gen_lsx_vhaddw_hu_bu (t2, t1, t1));
+  emit_insn (gen_lsx_vhaddw_wu_hu (t3, t2, t2));
   emit_insn (gen_addv4si3 (operands[0], t3, operands[3]));
   DONE;
 })
-- 
2.20.1



Re: [PATCH] expmed: Get vec_extract element mode from insn_data, [PR112999]

2023-12-14 Thread Richard Sandiford
Robin Dapp  writes:
> Hi,
>
> this is a bit of a follow up of the latest expmed change.
>
> In extract_bit_field_1 we try to get a better vector mode before
> extracting from it.  Better refers to the case when the requested target
> mode does not equal the inner mode of the vector to extract from and we
> have an equivalent tieable vector mode with a fitting inner mode.
>
> On riscv this triggered an ICE (PR112999) because we would take the
> detour of extracting from a mask-mode vector via a vector integer mode.
> One element of that mode could be subreg-punned with TImode which, in
> turn, would need to be operated on in DImode chunks.
>
> As the backend might return the extracted value in a different mode than
> the inner mode of the input vector, we might already have a mode
> equivalent to the target mode.  Therefore, this patch first obtains the
> mode the backend uses for the particular vec_extract and then compares
> it against the target mode.  Only if those disagree we try to find a
> better mode.  Otherwise we continue with the initial one.
>
> Bootstrapped and regtested on x86, aarch64 and power10.  Regtested
> on riscv.

This doesn't seem like the right condition.  The mode of the
operand is semantically arbitrary (as long as it has enough bits).
E.g. if the pattern happens to have a HImode operand, it sounds like
the problem you're describing would still fire for SImode.

It looks like:

  FOR_EACH_MODE_FROM (new_mode, new_mode)
if (known_eq (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (GET_MODE (op0)))
&& known_eq (GET_MODE_UNIT_SIZE (new_mode), GET_MODE_SIZE (tmode))
&& targetm.vector_mode_supported_p (new_mode)
&& targetm.modes_tieable_p (GET_MODE (op0), new_mode))
  break;

should at least test whether the bitpos is a multiple of
GET_MODE_UNIT_SIZE (new_mode), otherwise the new mode isn't really
better.  Arguably it should also test whether bitnum is equal
to GET_MODE_UNIT_SIZE (new_mode).
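
A toy model of that extra condition, using plain bit counts instead of GCC's
machine_mode API (`candidate_mode_ok` and its parameters are hypothetical
names, just to make the alignment requirement concrete):

```c
#include <assert.h>
#include <stdbool.h>

/* A candidate vector mode with element size UNIT_BITS only gives a
   "natural" extract when the bit position of the requested field is
   element-aligned, and (arguably) only when the field is exactly one
   element wide.  */
static bool
candidate_mode_ok (unsigned bitnum, unsigned bitsize, unsigned unit_bits)
{
  return bitnum % unit_bits == 0 && bitsize == unit_bits;
}
```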

Not sure whether there'll be any fallout from that, but it seems
worth trying.

Thanks,
Richard

>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
>   PR target/112999
>
>   * expmed.cc (extract_bit_field_1):  Get vec_extract's result
>   element mode from insn_data and compare it to the target mode.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/rvv/autovec/pr112999.c: New test.
> ---
>  gcc/expmed.cc   | 17 +++--
>  .../gcc.target/riscv/rvv/autovec/pr112999.c | 17 +
>  2 files changed, 32 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c
>
> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> index ed17850ff74..6fbe4d9cfaf 100644
> --- a/gcc/expmed.cc
> +++ b/gcc/expmed.cc
> @@ -1722,10 +1722,23 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 
> bitsize, poly_uint64 bitnum,
>   }
>  }
>  
> +  /* The target may prefer to return the extracted value in a different mode
> + than INNERMODE.  */
> +  machine_mode outermode = GET_MODE (op0);
> +  machine_mode element_mode = GET_MODE_INNER (outermode);
> +  if (VECTOR_MODE_P (outermode) && !MEM_P (op0))
> +{
> +  enum insn_code icode
> + = convert_optab_handler (vec_extract_optab, outermode, element_mode);
> +
> +  if (icode != CODE_FOR_nothing)
> + element_mode = insn_data[icode].operand[0].mode;
> +}
> +
>/* See if we can get a better vector mode before extracting.  */
>if (VECTOR_MODE_P (GET_MODE (op0))
>&& !MEM_P (op0)
> -  && GET_MODE_INNER (GET_MODE (op0)) != tmode)
> +  && element_mode != tmode)
>  {
>machine_mode new_mode;
>  
> @@ -1755,7 +1768,7 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
> poly_uint64 bitnum,
>/* Use vec_extract patterns for extracting parts of vectors whenever
>   available.  If that fails, see whether the current modes and bitregion
>   give a natural subreg.  */
> -  machine_mode outermode = GET_MODE (op0);
> +  outermode = GET_MODE (op0);
>if (VECTOR_MODE_P (outermode) && !MEM_P (op0))
>  {
>scalar_mode innermode = GET_MODE_INNER (outermode);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c
> new file mode 100644
> index 000..c049c5a0386
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvl512b -mabi=lp64d 
> --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax 
> -O3 -fno-vect-cost-model -fno-tree-loop-distribute-patterns" } */
> +
> +int a[1024];
> +int b[1024];
> +
> +_Bool
> +fn1 ()
> +{
> +  _Bool tem;
> +  for (int i = 0; i < 1024; ++i)
> +{
> +  tem = !a[i];
> +  b[i] = tem;
> +}
> +  return tem;
> +}


[PATCH] Revert "RISC-V: Add avail interface into function_group_info"

2023-12-14 Thread Feng Wang
This reverts commit ce7e66787b5b4ad385b21756da5a89171d233ddc.
Will refactor this part in the same way as aarch64 sve.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-functions.def (DEF_RVV_FUNCTION):
Revert changes.
(read_vl): Ditto.
(vlenb): Ditto.
(vsetvl): Ditto.
(vsetvlmax): Ditto.
(vle): Ditto.
(vse): Ditto.
(vlm): Ditto.
(vsm): Ditto.
(vlse): Ditto.
(vsse): Ditto.
(vluxei8): Ditto.
(vluxei16): Ditto.
(vluxei32): Ditto.
(vluxei64): Ditto.
(vloxei8): Ditto.
(vloxei16): Ditto.
(vloxei32): Ditto.
(vloxei64): Ditto.
(vsuxei8): Ditto.
(vsuxei16): Ditto.
(vsuxei32): Ditto.
(vsuxei64): Ditto.
(vsoxei8): Ditto.
(vsoxei16): Ditto.
(vsoxei32): Ditto.
(vsoxei64): Ditto.
(vleff): Ditto.
(vadd): Ditto.
(vsub): Ditto.
(vrsub): Ditto.
(vneg): Ditto.
(vwaddu): Ditto.
(vwsubu): Ditto.
(vwadd): Ditto.
(vwsub): Ditto.
(vwcvt_x): Ditto.
(vwcvtu_x): Ditto.
(vzext): Ditto.
(vsext): Ditto.
(vadc): Ditto.
(vmadc): Ditto.
(vsbc): Ditto.
(vmsbc): Ditto.
(vand): Ditto.
(vor): Ditto.
(vxor): Ditto.
(vnot): Ditto.
(vsll): Ditto.
(vsra): Ditto.
(vsrl): Ditto.
(vnsrl): Ditto.
(vnsra): Ditto.
(vncvt_x): Ditto.
(vmseq): Ditto.
(vmsne): Ditto.
(vmsltu): Ditto.
(vmslt): Ditto.
(vmsleu): Ditto.
(vmsle): Ditto.
(vmsgtu): Ditto.
(vmsgt): Ditto.
(vmsgeu): Ditto.
(vmsge): Ditto.
(vminu): Ditto.
(vmin): Ditto.
(vmaxu): Ditto.
(vmax): Ditto.
(vmul): Ditto.
(vmulh): Ditto.
(vmulhu): Ditto.
(vmulhsu): Ditto.
(vdivu): Ditto.
(vdiv): Ditto.
(vremu): Ditto.
(vrem): Ditto.
(vwmul): Ditto.
(vwmulu): Ditto.
(vwmulsu): Ditto.
(vmacc): Ditto.
(vnmsac): Ditto.
(vmadd): Ditto.
(vnmsub): Ditto.
(vwmaccu): Ditto.
(vwmacc): Ditto.
(vwmaccsu): Ditto.
(vwmaccus): Ditto.
(vmerge): Ditto.
(vmv_v): Ditto.
(vsaddu): Ditto.
(vsadd): Ditto.
(vssubu): Ditto.
(vssub): Ditto.
(vaaddu): Ditto.
(vaadd): Ditto.
(vasubu): Ditto.
(vasub): Ditto.
(vsmul): Ditto.
(vssrl): Ditto.
(vssra): Ditto.
(vnclipu): Ditto.
(vnclip): Ditto.
(vfadd): Ditto.
(vfsub): Ditto.
(vfrsub): Ditto.
(vfadd_frm): Ditto.
(vfsub_frm): Ditto.
(vfrsub_frm): Ditto.
(vfwadd): Ditto.
(vfwsub): Ditto.
(vfwadd_frm): Ditto.
(vfwsub_frm): Ditto.
(vfmul): Ditto.
(vfdiv): Ditto.
(vfrdiv): Ditto.
(vfmul_frm): Ditto.
(vfdiv_frm): Ditto.
(vfrdiv_frm): Ditto.
(vfwmul): Ditto.
(vfwmul_frm): Ditto.
(vfmacc): Ditto.
(vfnmsac): Ditto.
(vfmadd): Ditto.
(vfnmsub): Ditto.
(vfnmacc): Ditto.
(vfmsac): Ditto.
(vfnmadd): Ditto.
(vfmsub): Ditto.
(vfmacc_frm): Ditto.
(vfnmacc_frm): Ditto.
(vfmsac_frm): Ditto.
(vfnmsac_frm): Ditto.
(vfmadd_frm): Ditto.
(vfnmadd_frm): Ditto.
(vfmsub_frm): Ditto.
(vfnmsub_frm): Ditto.
(vfwmacc): Ditto.
(vfwnmacc): Ditto.
(vfwmsac): Ditto.
(vfwnmsac): Ditto.
(vfwmacc_frm): Ditto.
(vfwnmacc_frm): Ditto.
(vfwmsac_frm): Ditto.
(vfwnmsac_frm): Ditto.
(vfsqrt): Ditto.
(vfsqrt_frm): Ditto.
(vfrsqrt7): Ditto.
(vfrec7): Ditto.
(vfrec7_frm): Ditto.
(vfmin): Ditto.
(vfmax): Ditto.
(vfsgnj): Ditto.
(vfsgnjn): Ditto.
(vfsgnjx): Ditto.
(vfneg): Ditto.
(vfabs): Ditto.
(vmfeq): Ditto.
(vmfne): Ditto.
(vmflt): Ditto.
(vmfle): Ditto.
(vmfgt): Ditto.
(vmfge): Ditto.
(vfclass): Ditto.
(vfmerge): Ditto.
(vfmv_v): Ditto.
(vfcvt_x): Ditto.
(vfcvt_xu): Ditto.
(vfcvt_rtz_x): Ditto.
(vfcvt_rtz_xu): Ditto.
(vfcvt_f): Ditto.
(vfcvt_x_frm): Ditto.
(vfcvt_xu_frm): Ditto.
(vfcvt_f_frm): Ditto.
(vfwcvt_x): Ditto.
(vfwcvt_xu): Ditto.
(vfwcvt_rtz_x): Ditto.
(vfwcvt_rtz_xu): Ditto.
(vfwcvt_f): Ditto.
(vfwcvt_x_frm): Ditto.
(vfwcvt_xu_frm): Ditto.
(vfncvt_x): Ditto.
(vfncvt_xu): Ditto.
(vfncvt_rtz_x): Ditto.
(vfncvt_rtz_

RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-14 Thread Thomas Schwinge
Hi Lipeng!

On 2023-12-14T02:28:22+, "Zhu, Lipeng"  wrote:
> On 2023/12/14 4:52, Thomas Schwinge wrote:
>> On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:
>> > On 2023/12/12 1:45, H.J. Lu wrote:
>> >> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng 
>> wrote:
>> >> > On 2023/12/9 23:23, Jakub Jelinek wrote:
>> >> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
>> >> > > > This patch try to introduce the rwlock and split the read/write
>> >> > > > to unit_root tree and unit_cache with rwlock instead of the
>> >> > > > mutex to increase CPU efficiency. In the get_gfc_unit function,
>> >> > > > the percentage to step into the insert_unit function is around
>> >> > > > 30%, in most instances, we can get the unit in the phase of
>> >> > > > reading the unit_cache or unit_root tree. So split the
>> >> > > > read/write phase by rwlock would be an approach to make it more
>> parallel.
>> >> > > >
>> >> > > > BTW, the IPC metrics can gain around 9x in our test server with
>> >> > > > 220 cores. The benchmark we used is
>> >> > > > https://github.com/rwesson/NEAT

>> I've just filed 
>> "'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution
>> test timeouts".
>> Would you be able to look into that?

> Sure, I will look into that.
>
> BTW, I didn’t have the PowerPC in hands, do you mind granting the access of 
> your
> test environment to me to help reproduce the issue?

That's unfortunately not possible: it's behind company VPN, restricted
access.  :-/ I'll later try to have at least a quick look where it's
hanging, or what it's doing.


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
Munich; limited liability company; Managing Directors: Thomas Heurung, Frank
Thürauf; Registered office: Munich; Commercial Register: Munich, HRB 106955


RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-14 Thread Tobias Burnus


 
 
  
Hi,

Thomas Schwinge wrote:
> On 2023-12-14T02:28:22+, "Zhu, Lipeng"  wrote:
>> On 2023/12/14 4:52, Thomas Schwinge wrote:
>>> I've just filed ... Would you be able to look into that?
>> Sure, I will look into that.
>>
>> BTW, I didn’t have the PowerPC in hands, do you mind granting the access of
>> your test environment to me to help reproduce the issue?
> That's unfortunately not possible:

How about https://gcc.gnu.org/wiki/CompileFarm ? I think there is a PowerPC
system available.

Tobias


Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-14 Thread Jakub Jelinek
On Thu, Dec 14, 2023 at 01:29:01PM +0100, Thomas Schwinge wrote:
> > Sure, I will look into that.
> >
> > BTW, I didn’t have the PowerPC in hands, do you mind granting the access of 
> > your
> > test environment to me to help reproduce the issue?
> 
> That's unfortunately not possible: it's behind company VPN, restricted
> access.  :-/ I'll later try to have at least a quick look where it's
> hanging, or what it's doing.

There is always https://gcc.gnu.org/wiki/CompileFarm
There are e.g. 192, 160, 128, 80, 64 thread power machines.
Whether any of them can actually reproduce it, I haven't tried.
But shouldn't be that time consuming to reproduce, for this kind of thing
one can just build
.../configure --enable-languages=c,c++,fortran --disable-bootstrap 
--disable-libsanitizer
make -jN
compiler and just
cd power*/libgomp
make check RUNTESTFLAGS=fortran.exp=rwlock*.f90
repeatedly to see if it gets stuck.
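
One possible way to script the "run repeatedly" step (a hypothetical helper,
not part of the GCC tree; `timeout` is GNU coreutils), so a hang shows up as a
non-zero exit instead of blocking the terminal:

```shell
#!/bin/sh
# repeat_checked CMD RUNS LIMIT
# Run CMD up to RUNS times, each under a LIMIT-second timeout; stop and
# report the first run that hangs or fails.
repeat_checked () {
  cmd=$1
  runs=$2
  limit=$3
  i=1
  while [ "$i" -le "$runs" ]; do
    if ! timeout "$limit" sh -c "$cmd"; then
      echo "run $i hung or failed (limit ${limit}s)"
      return 1
    fi
    i=$((i + 1))
  done
  echo "all $runs runs finished"
}

# Intended use on the libgomp build tree (paths are examples):
#   cd powerpc64le-unknown-linux-gnu/libgomp
#   repeat_checked 'make check RUNTESTFLAGS=fortran.exp=rwlock*.f90' 10 3600
```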

Jakub



Re: [PATCH] Revert "RISC-V: Add avail interface into function_group_info"

2023-12-14 Thread Kito Cheng
ok

On Thu, Dec 14, 2023 at 8:08 PM Feng Wang  wrote:
>
> This reverts commit ce7e66787b5b4ad385b21756da5a89171d233ddc.
> Will refactor this part in the same way as aarch64 sve.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-functions.def (DEF_RVV_FUNCTION):
> Revert 
> changes.
> (read_vl): Ditto.
> (vlenb): Ditto.
> (vsetvl): Ditto.
> (vsetvlmax): Ditto.
> (vle): Ditto.
> (vse): Ditto.
> (vlm): Ditto.
> (vsm): Ditto.
> (vlse): Ditto.
> (vsse): Ditto.
> (vluxei8): Ditto.
> (vluxei16): Ditto.
> (vluxei32): Ditto.
> (vluxei64): Ditto.
> (vloxei8): Ditto.
> (vloxei16): Ditto.
> (vloxei32): Ditto.
> (vloxei64): Ditto.
> (vsuxei8): Ditto.
> (vsuxei16): Ditto.
> (vsuxei32): Ditto.
> (vsuxei64): Ditto.
> (vsoxei8): Ditto.
> (vsoxei16): Ditto.
> (vsoxei32): Ditto.
> (vsoxei64): Ditto.
> (vleff): Ditto.
> (vadd): Ditto.
> (vsub): Ditto.
> (vrsub): Ditto.
> (vneg): Ditto.
> (vwaddu): Ditto.
> (vwsubu): Ditto.
> (vwadd): Ditto.
> (vwsub): Ditto.
> (vwcvt_x): Ditto.
> (vwcvtu_x): Ditto.
> (vzext): Ditto.
> (vsext): Ditto.
> (vadc): Ditto.
> (vmadc): Ditto.
> (vsbc): Ditto.
> (vmsbc): Ditto.
> (vand): Ditto.
> (vor): Ditto.
> (vxor): Ditto.
> (vnot): Ditto.
> (vsll): Ditto.
> (vsra): Ditto.
> (vsrl): Ditto.
> (vnsrl): Ditto.
> (vnsra): Ditto.
> (vncvt_x): Ditto.
> (vmseq): Ditto.
> (vmsne): Ditto.
> (vmsltu): Ditto.
> (vmslt): Ditto.
> (vmsleu): Ditto.
> (vmsle): Ditto.
> (vmsgtu): Ditto.
> (vmsgt): Ditto.
> (vmsgeu): Ditto.
> (vmsge): Ditto.
> (vminu): Ditto.
> (vmin): Ditto.
> (vmaxu): Ditto.
> (vmax): Ditto.
> (vmul): Ditto.
> (vmulh): Ditto.
> (vmulhu): Ditto.
> (vmulhsu): Ditto.
> (vdivu): Ditto.
> (vdiv): Ditto.
> (vremu): Ditto.
> (vrem): Ditto.
> (vwmul): Ditto.
> (vwmulu): Ditto.
> (vwmulsu): Ditto.
> (vmacc): Ditto.
> (vnmsac): Ditto.
> (vmadd): Ditto.
> (vnmsub): Ditto.
> (vwmaccu): Ditto.
> (vwmacc): Ditto.
> (vwmaccsu): Ditto.
> (vwmaccus): Ditto.
> (vmerge): Ditto.
> (vmv_v): Ditto.
> (vsaddu): Ditto.
> (vsadd): Ditto.
> (vssubu): Ditto.
> (vssub): Ditto.
> (vaaddu): Ditto.
> (vaadd): Ditto.
> (vasubu): Ditto.
> (vasub): Ditto.
> (vsmul): Ditto.
> (vssrl): Ditto.
> (vssra): Ditto.
> (vnclipu): Ditto.
> (vnclip): Ditto.
> (vfadd): Ditto.
> (vfsub): Ditto.
> (vfrsub): Ditto.
> (vfadd_frm): Ditto.
> (vfsub_frm): Ditto.
> (vfrsub_frm): Ditto.
> (vfwadd): Ditto.
> (vfwsub): Ditto.
> (vfwadd_frm): Ditto.
> (vfwsub_frm): Ditto.
> (vfmul): Ditto.
> (vfdiv): Ditto.
> (vfrdiv): Ditto.
> (vfmul_frm): Ditto.
> (vfdiv_frm): Ditto.
> (vfrdiv_frm): Ditto.
> (vfwmul): Ditto.
> (vfwmul_frm): Ditto.
> (vfmacc): Ditto.
> (vfnmsac): Ditto.
> (vfmadd): Ditto.
> (vfnmsub): Ditto.
> (vfnmacc): Ditto.
> (vfmsac): Ditto.
> (vfnmadd): Ditto.
> (vfmsub): Ditto.
> (vfmacc_frm): Ditto.
> (vfnmacc_frm): Ditto.
> (vfmsac_frm): Ditto.
> (vfnmsac_frm): Ditto.
> (vfmadd_frm): Ditto.
> (vfnmadd_frm): Ditto.
> (vfmsub_frm): Ditto.
> (vfnmsub_frm): Ditto.
> (vfwmacc): Ditto.
> (vfwnmacc): Ditto.
> (vfwmsac): Ditto.
> (vfwnmsac): Ditto.
> (vfwmacc_frm): Ditto.
> (vfwnmacc_frm): Ditto.
> (vfwmsac_frm): Ditto.
> (vfwnmsac_frm): Ditto.
> (vfsqrt): Ditto.
> (vfsqrt_frm): Ditto.
> (vfrsqrt7): Ditto.
> (vfrec7): Ditto.
> (vfrec7_frm): Ditto.
> (vfmin): Ditto.
> (vfmax): Ditto.
> (vfsgnj): Ditto.
> (vfsgnjn): Ditto.
> (vfsgnjx): Ditto.
> (vfneg): Ditto.
> (vfabs): Ditto.
> (vmfeq): Ditto.
> (vmfne): Ditto.
> (vmflt): Ditto.
> (vmfle): Ditto.
> (vmfgt): Ditto.
> (vmfge): Ditto.
> (vfclass): Ditto.
> (vfmerge): Ditto.
> (vfmv_v): Ditto.
> (vfcvt_x): Ditto.
> (vfcvt_xu): Ditto.
> (vfcvt_rtz_x): Ditto.
> (vfcvt_rtz_xu): Ditto.
> (vfcvt_f): Dit

[PATCH v2] LoongArch: Fix incorrect code generation for sad pattern

2023-12-14 Thread Jiahao Xu
When I attempt to enable the vect_usad_char effective target for LoongArch,
slp-reduc-sad.c and vect-reduc-sad*.c tests fail. These tests fail because the
sad pattern generates bad code. This patch fixes them: for sad patterns, use
zero extension instead of sign extension for the reduction.

Currently, we are fixing failed vectorized tests, and in the future, we will
enable more tests of "vect" for LoongArch.

gcc/ChangeLog:

* config/loongarch/lasx.md: Use zero expansion instruction.
* config/loongarch/lsx.md: Ditto.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index eeac8cd984b..db6871507e2 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -5097,8 +5097,8 @@ (define_expand "usadv32qi"
   rtx t2 = gen_reg_rtx (V16HImode);
   rtx t3 = gen_reg_rtx (V8SImode);
   emit_insn (gen_lasx_xvabsd_u_bu (t1, operands[1], operands[2]));
-  emit_insn (gen_lasx_xvhaddw_h_b (t2, t1, t1));
-  emit_insn (gen_lasx_xvhaddw_w_h (t3, t2, t2));
+  emit_insn (gen_lasx_xvhaddw_hu_bu (t2, t1, t1));
+  emit_insn (gen_lasx_xvhaddw_wu_hu (t3, t2, t2));
   emit_insn (gen_addv8si3 (operands[0], t3, operands[3]));
   DONE;
 })
@@ -5114,8 +5114,8 @@ (define_expand "ssadv32qi"
   rtx t2 = gen_reg_rtx (V16HImode);
   rtx t3 = gen_reg_rtx (V8SImode);
   emit_insn (gen_lasx_xvabsd_s_b (t1, operands[1], operands[2]));
-  emit_insn (gen_lasx_xvhaddw_h_b (t2, t1, t1));
-  emit_insn (gen_lasx_xvhaddw_w_h (t3, t2, t2));
+  emit_insn (gen_lasx_xvhaddw_hu_bu (t2, t1, t1));
+  emit_insn (gen_lasx_xvhaddw_wu_hu (t3, t2, t2));
   emit_insn (gen_addv8si3 (operands[0], t3, operands[3]));
   DONE;
 })
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index dbdb423011b..5e5e2503636 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -3468,8 +3468,8 @@ (define_expand "usadv16qi"
   rtx t2 = gen_reg_rtx (V8HImode);
   rtx t3 = gen_reg_rtx (V4SImode);
   emit_insn (gen_lsx_vabsd_u_bu (t1, operands[1], operands[2]));
-  emit_insn (gen_lsx_vhaddw_h_b (t2, t1, t1));
-  emit_insn (gen_lsx_vhaddw_w_h (t3, t2, t2));
+  emit_insn (gen_lsx_vhaddw_hu_bu (t2, t1, t1));
+  emit_insn (gen_lsx_vhaddw_wu_hu (t3, t2, t2));
   emit_insn (gen_addv4si3 (operands[0], t3, operands[3]));
   DONE;
 })
@@ -3485,8 +3485,8 @@ (define_expand "ssadv16qi"
   rtx t2 = gen_reg_rtx (V8HImode);
   rtx t3 = gen_reg_rtx (V4SImode);
   emit_insn (gen_lsx_vabsd_s_b (t1, operands[1], operands[2]));
-  emit_insn (gen_lsx_vhaddw_h_b (t2, t1, t1));
-  emit_insn (gen_lsx_vhaddw_w_h (t3, t2, t2));
+  emit_insn (gen_lsx_vhaddw_hu_bu (t2, t1, t1));
+  emit_insn (gen_lsx_vhaddw_wu_hu (t3, t2, t2));
   emit_insn (gen_addv4si3 (operands[0], t3, operands[3]));
   DONE;
 })
-- 
2.20.1



Re: Re: [PATCH] Middle-end: Adjust decrement IV style partial vectorization COST model

2023-12-14 Thread Richard Biener
On Thu, 14 Dec 2023, juzhe.zh...@rivai.ai wrote:

> Thanks Richard.
> 
> Let me clarify again to make sure I understand your comments correctly:
> 
> Do you suggest not to model address cost here like other partial 
> vectorization style (while_ult, avx512...etc). Then set COST = 1 since 
> we only have SELECT_VL since beginning. At various cases we saw, COST=1 
> is better than COST=2.

I suggest to not model with address cost in mind since nothing else in
the vectorizer does that and thus you're "comparing" apples with
oranges then.

If address cost is important to decide between SELECT_VL and not
SELECT_VL then we'd need to start modeling address cost _at all_.

Richard.

> Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-12-14 18:46
> To: juzhe.zhong
> CC: gcc-patches; richard.sandiford; jeffreyalaw
> Subject: Re: [PATCH] Middle-end: Adjust decrement IV style partial 
> vectorization COST model
> 
> 
> On 14.12.2023 at 09:28, juzhe.zh...@rivai.ai wrote:
> 
>  
> Hi, Richard.
> 
> I have a question about the decrement IV costing since I find the reduction 
> case is generating inferior codegen.
> 
> reduc_plus_int:
> mv a3,a0
> ble a1,zero,.L7
> addiw a5,a1,-1
> li a4,2
> bleu a5,a4,.L8
> vsetivli zero,4,e32,m1,ta,ma
> srliw a4,a1,2
> vmv.v.i v1,0
> slli a4,a4,4
> add a4,a4,a0
> mv a5,a0
> .L4:
> vle32.v v2,0(a5)
> addi a5,a5,16
> vadd.vv v1,v1,v2
> bne a5,a4,.L4
> li a5,0
> vmv.s.x v2,a5
> andi a5,a1,-4
> vredsum.vs v1,v1,v2
> vmv.x.s a0,v1
> beq a1,a5,.L12
> .L3:
> subw a1,a1,a5
> slli a5,a5,32
> srli a5,a5,32
> slli a1,a1,32
> vsetvli a4,zero,e32,m1,ta,ma
> slli a5,a5,2
> srli a1,a1,32
> vmv.v.i v1,0
> add a3,a3,a5
> vsetvli a1,a1,e8,mf4,ta,ma
> vle32.v v3,0(a3)
> li a5,0
> vsetivli zero,1,e32,m1,ta,ma
> vmv.s.x v2,a5
> vsetvli zero,a1,e32,m1,tu,ma
> vmv.v.v v1,v3
> vsetvli a4,zero,e32,m1,ta,ma
> vredsum.vs v1,v1,v2
> vmv.x.s a5,v1
> addw a0,a0,a5
> ret
> .L12:
> ret
> .L7:
> li a0,0
> ret
> .L8:
> li a5,0
> li a0,0
> j .L3
> 
> This patch adjust length_update_cost from 3 (original cost) into 2 can fix 
> conversion case (the test append in this patch).
> But can't fix reduction case.
> 
> Then I adjust it into COST = 1:
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 19e38b8637b..50c99b1fe79 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4877,7 +4877,7 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>  processed in current iteration, and a SHIFT operation to
>  compute the next memory address instead of adding 
> vectorization
>  factor.  */
> - length_update_cost = 2;
> + length_update_cost = 1;
> else
>   /* For increment IV stype, Each may need two MINs and one MINUS 
> to
>  update lengths in body for next iteration.  */
> 
> Then the codegen is reasonable now:
> 
> reduc_plus_int:
> ble a1,zero,.L4
> vsetvli a5,zero,e32,m1,ta,ma
> vmv.v.i v1,0
> .L3:
> vsetvli a5,a1,e32,m1,tu,ma
> vle32.v v2,0(a0)
> slli a4,a5,2
> sub a1,a1,a5
> add a0,a0,a4
> vadd.vv v1,v2,v1
> bne a1,zero,.L3
> li a5,0
> vsetivli zero,1,e32,m1,ta,ma
> vmv.s.x v2,a5
> vsetvli a5,zero,e32,m1,ta,ma
> vredsum.vs v1,v1,v2
> vmv.x.s a0,v1
> ret
> .L4:
> li a0,0
> ret
> 
> The reason I set COST = 2 instead of 1 in this patch since
> 
> one COST is for SELECT_VL.
> 
> The other is for memory address calculation since we don't update memory 
> address with adding VF directly,
> instead:
> 
> We shift the result of SELECT_VL, and then add it into the memory IV as 
> follows:
> 
> SSA_1 = SELECT_VL --> SSA_1 is element-wise
> SSA_2 = SSA_1 << 1 (If element is INT16, make it to be bytes-wise)
> next iteration memory address = current iteration memory address + SSA_2.
> 
> If element is INT8, then the shift operation is not needed:
> SSA_2 = SSA_1 since it is already byte-wise.
> 
> So, my question is the COST should be 1 or 2.
> It seems that COST = 1 is better for
> using SELECT_VL.
> 
> We are not modeling address cost, so trying to account for this here is only 
> going to be a heuristic that's as many times wrong than it is correct.  If 
> the difference between SELECT_VL and not is so small then you'll have a hard 
> time modeling this here.
> 
>  
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-12-13 18:17
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford; jeffreyalaw
> Subject: Re: [PATCH] Middle-end: Adjust decrement IV style partial 
> vectorization COST model
> On Wed, 13 Dec 2023, Juzhe-Zhong wrote:
>  
> > Hi, before this patch, a simple conversion case for RVV codegen:
> > 
> > foo:
> > ble a2,zero,.L8
> > addiw   a5,a2,-1
> > li  a4,6
> > bleua5,a4,.L6
> > srliw   a3,a2,3
> > sllia3,a3,3
> > add a3,a3,a0
> > mv  a5,a0
> > mv  a4,a1
> > vsetivlizero,8,e16,m1

[committed] Revert "RISC-V: Add avail interface into function_group_info"

2023-12-14 Thread Feng Wang
This reverts commit ce7e66787b5b4ad385b21756da5a89171d233ddc.
Will refactor this part in the same way as aarch64 sve.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-functions.def (DEF_RVV_FUNCTION):
Revert changes.
(read_vl): Ditto.
(vlenb): Ditto.
(vsetvl): Ditto.
(vsetvlmax): Ditto.
(vle): Ditto.
(vse): Ditto.
(vlm): Ditto.
(vsm): Ditto.
(vlse): Ditto.
(vsse): Ditto.
(vluxei8): Ditto.
(vluxei16): Ditto.
(vluxei32): Ditto.
(vluxei64): Ditto.
(vloxei8): Ditto.
(vloxei16): Ditto.
(vloxei32): Ditto.
(vloxei64): Ditto.
(vsuxei8): Ditto.
(vsuxei16): Ditto.
(vsuxei32): Ditto.
(vsuxei64): Ditto.
(vsoxei8): Ditto.
(vsoxei16): Ditto.
(vsoxei32): Ditto.
(vsoxei64): Ditto.
(vleff): Ditto.
(vadd): Ditto.
(vsub): Ditto.
(vrsub): Ditto.
(vneg): Ditto.
(vwaddu): Ditto.
(vwsubu): Ditto.
(vwadd): Ditto.
(vwsub): Ditto.
(vwcvt_x): Ditto.
(vwcvtu_x): Ditto.
(vzext): Ditto.
(vsext): Ditto.
(vadc): Ditto.
(vmadc): Ditto.
(vsbc): Ditto.
(vmsbc): Ditto.
(vand): Ditto.
(vor): Ditto.
(vxor): Ditto.
(vnot): Ditto.
(vsll): Ditto.
(vsra): Ditto.
(vsrl): Ditto.
(vnsrl): Ditto.
(vnsra): Ditto.
(vncvt_x): Ditto.
(vmseq): Ditto.
(vmsne): Ditto.
(vmsltu): Ditto.
(vmslt): Ditto.
(vmsleu): Ditto.
(vmsle): Ditto.
(vmsgtu): Ditto.
(vmsgt): Ditto.
(vmsgeu): Ditto.
(vmsge): Ditto.
(vminu): Ditto.
(vmin): Ditto.
(vmaxu): Ditto.
(vmax): Ditto.
(vmul): Ditto.
(vmulh): Ditto.
(vmulhu): Ditto.
(vmulhsu): Ditto.
(vdivu): Ditto.
(vdiv): Ditto.
(vremu): Ditto.
(vrem): Ditto.
(vwmul): Ditto.
(vwmulu): Ditto.
(vwmulsu): Ditto.
(vmacc): Ditto.
(vnmsac): Ditto.
(vmadd): Ditto.
(vnmsub): Ditto.
(vwmaccu): Ditto.
(vwmacc): Ditto.
(vwmaccsu): Ditto.
(vwmaccus): Ditto.
(vmerge): Ditto.
(vmv_v): Ditto.
(vsaddu): Ditto.
(vsadd): Ditto.
(vssubu): Ditto.
(vssub): Ditto.
(vaaddu): Ditto.
(vaadd): Ditto.
(vasubu): Ditto.
(vasub): Ditto.
(vsmul): Ditto.
(vssrl): Ditto.
(vssra): Ditto.
(vnclipu): Ditto.
(vnclip): Ditto.
(vfadd): Ditto.
(vfsub): Ditto.
(vfrsub): Ditto.
(vfadd_frm): Ditto.
(vfsub_frm): Ditto.
(vfrsub_frm): Ditto.
(vfwadd): Ditto.
(vfwsub): Ditto.
(vfwadd_frm): Ditto.
(vfwsub_frm): Ditto.
(vfmul): Ditto.
(vfdiv): Ditto.
(vfrdiv): Ditto.
(vfmul_frm): Ditto.
(vfdiv_frm): Ditto.
(vfrdiv_frm): Ditto.
(vfwmul): Ditto.
(vfwmul_frm): Ditto.
(vfmacc): Ditto.
(vfnmsac): Ditto.
(vfmadd): Ditto.
(vfnmsub): Ditto.
(vfnmacc): Ditto.
(vfmsac): Ditto.
(vfnmadd): Ditto.
(vfmsub): Ditto.
(vfmacc_frm): Ditto.
(vfnmacc_frm): Ditto.
(vfmsac_frm): Ditto.
(vfnmsac_frm): Ditto.
(vfmadd_frm): Ditto.
(vfnmadd_frm): Ditto.
(vfmsub_frm): Ditto.
(vfnmsub_frm): Ditto.
(vfwmacc): Ditto.
(vfwnmacc): Ditto.
(vfwmsac): Ditto.
(vfwnmsac): Ditto.
(vfwmacc_frm): Ditto.
(vfwnmacc_frm): Ditto.
(vfwmsac_frm): Ditto.
(vfwnmsac_frm): Ditto.
(vfsqrt): Ditto.
(vfsqrt_frm): Ditto.
(vfrsqrt7): Ditto.
(vfrec7): Ditto.
(vfrec7_frm): Ditto.
(vfmin): Ditto.
(vfmax): Ditto.
(vfsgnj): Ditto.
(vfsgnjn): Ditto.
(vfsgnjx): Ditto.
(vfneg): Ditto.
(vfabs): Ditto.
(vmfeq): Ditto.
(vmfne): Ditto.
(vmflt): Ditto.
(vmfle): Ditto.
(vmfgt): Ditto.
(vmfge): Ditto.
(vfclass): Ditto.
(vfmerge): Ditto.
(vfmv_v): Ditto.
(vfcvt_x): Ditto.
(vfcvt_xu): Ditto.
(vfcvt_rtz_x): Ditto.
(vfcvt_rtz_xu): Ditto.
(vfcvt_f): Ditto.
(vfcvt_x_frm): Ditto.
(vfcvt_xu_frm): Ditto.
(vfcvt_f_frm): Ditto.
(vfwcvt_x): Ditto.
(vfwcvt_xu): Ditto.
(vfwcvt_rtz_x): Ditto.
(vfwcvt_rtz_xu): Ditto.
(vfwcvt_f): Ditto.
(vfwcvt_x_frm): Ditto.
(vfwcvt_xu_frm): Ditto.
(vfncvt_x): Ditto.
(vfncvt_xu): Ditto.
(vfncvt_rtz_x): Ditto.
(vfncvt_rtz_

RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-14 Thread Richard Biener
On Wed, 13 Dec 2023, Tamar Christina wrote:

> > > >   else if (vect_use_mask_type_p (stmt_info))
> > > > {
> > > >   unsigned int precision = stmt_info->mask_precision;
> > > >   scalar_type = build_nonstandard_integer_type (precision, 1);
> > > >   vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
> > > > group_size);
> > > >   if (!vectype)
> > > > return opt_result::failure_at (stmt, "not vectorized: 
> > > > unsupported"
> > > >" data-type %T\n", scalar_type);
> > > >
> > > > Richard, do you have any advice here?  I suppose 
> > > > vect_determine_precisions
> > > > needs to handle the gcond case with bool != 0 somehow and for the
> > > > extra mask producer we add here we have to emulate what it would have
> > > > done, right?
> > >
> > > How about handling gconds directly in vect_determine_mask_precision?
> > > In a sense it's not needed, since gconds are always roots, and so we
> > > could calculate their precision on the fly instead.  But handling it in
> > > vect_determine_mask_precision feels like it should reduce the number
> > > of special cases.
> > 
> > Yeah, that sounds worth trying.
> > 
> > Richard.
> 
> So here's a respin with this suggestion and the other issues fixed.
> Note that the testcases still need to be updated with the right stanzas.
> 
> The patch is much smaller, I still have a small change to
> vect_get_vector_types_for_stmt  in case we get there on a gcond where
> vect_recog_gcond_pattern couldn't apply due to the target missing an
> appropriate vectype.  The change only gracefully rejects the gcond.
> 
> Since patterns cannot apply to the same root twice I've had to also do
> the split of the condition out of the gcond in bitfield lowering.

Bah.  Guess we want to fix that (next stage1).  Can you please add
a comment to the split out done in vect_recog_bitfield_ref_pattern?
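In source terms, the split that vect_recog_gcond_pattern performs has roughly the following shape.  The two C functions are an assumed before/after illustration of the semantics, not actual GIMPLE:

```c
#include <assert.h>

/* Before the pattern: the comparison lives in the gcond itself. */
int cond_in_gcond(int a, int x) {
  if (a > x)            /* gcond: if (a > x) goto ... */
    return 1;
  return 0;
}

/* After the pattern: the boolean is split into its own statement, so
   the gcond becomes a canonical test of a mask against zero, which
   the mask-precision analysis can treat as an ordinary mask producer
   plus a "mask != 0" root. */
int cond_split_out(int a, int x) {
  _Bool mask = a > x;   /* separate mask-producing statement */
  if (mask != 0)        /* gcond now: if (mask != 0) goto ... */
    return 1;
  return 0;
}
```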

> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
> issues.
> 
> Ok for master?

OK with the above change.

Thanks,
Richard.

> Thanks,
> Tamar
> gcc/ChangeLog:
> 
>   * tree-vect-patterns.cc (vect_init_pattern_stmt): Support gcond
>   (vect_recog_bitfield_ref_pattern): Update to split out bool.
>   (vect_recog_gcond_pattern): New.
>   (possible_vector_mask_operation_p): Support gcond.
>   (vect_determine_mask_precision): Likewise.
>   * tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
>   lhs.
>   (vectorizable_early_exit): New.
>   (vect_analyze_stmt, vect_transform_stmt): Use it.
>   (vect_get_vector_types_for_stmt): Rejects gcond if not lowered by
>   vect_recog_gcond_pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-early-break_84.c: New test.
>   * gcc.dg/vect/vect-early-break_85.c: New test.
>   * gcc.dg/vect/vect-early-break_86.c: New test.
>   * gcc.dg/vect/vect-early-break_87.c: New test.
>   * gcc.dg/vect/vect-early-break_88.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c 
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
> new file mode 100644
> index 
> ..0622339491d333b07c2ce895785b5216713097a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
> @@ -0,0 +1,39 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include 
> +
> +#ifndef N
> +#define N 17
> +#endif
> +bool vect_a[N] = { false, false, true, false, false, false,
> +   false, false, false, false, false, false,
> +   false, false, false, false, false };
> +unsigned vect_b[N] = { 0 };
> +
> +__attribute__ ((noinline, noipa))
> +unsigned test4(bool x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (vect_a[i] == x)
> + return 1;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +  if (test4 (true) != 1)
> +abort ();
> +
> +  if (vect_b[2] != 0 && vect_b[1] == 0)
> +abort ();
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c 
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
> new file mode 100644
> index 
> ..39b3d9bad8681a2d15d7fc7de86bdd3ce0f0bd4e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
> @@ -0,0 +1,35 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 5
> +#endif
> +int vect_a[N] = { 5, 4, 8, 4, 6 };
> +unsigned vect_b[N] = { 0 };
> +
> +__attribute__ ((noinline, noipa))
> +unsigned test4(int x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (vect_a[i] > x)
> + 

In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM' instead of '#include ' (was: [PATCH v4] A new copy propagation and PHI elimination pass)

2023-12-14 Thread Thomas Schwinge
Hi!

On 2023-12-13T17:12:11+0100, Filip Kastl  wrote:
> --- /dev/null
> +++ b/gcc/gimple-ssa-sccopy.cc

> +#include 

Pushed to master branch commit 65e41f4fbfc539c5cc429c684176f8ea39f4b8f2
"In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM' instead of 
'#include '",
see attached.


Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
Munich; limited liability company; Managing Directors: Thomas
Heurung, Frank Thürauf; registered office of the company: Munich; commercial
register Munich, HRB 106955
>From 65e41f4fbfc539c5cc429c684176f8ea39f4b8f2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 14 Dec 2023 14:12:45 +0100
Subject: [PATCH] In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM'
 instead of '#include '

... to avoid issues such as:

In file included from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/xmmintrin.h:34:0,
 from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/x86intrin.h:31,
 from [...]/i686-pc-linux-gnu/include/c++/5.2.0/i686-pc-linux-gnu/64/bits/opt_random.h:33,
 from [...]/i686-pc-linux-gnu/include/c++/5.2.0/random:50,
 from [...]/i686-pc-linux-gnu/include/c++/5.2.0/bits/stl_algo.h:66,
 from [...]/i686-pc-linux-gnu/include/c++/5.2.0/algorithm:62,
 from [...]/source-gcc/gcc/gimple-ssa-sccopy.cc:32:
[...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
 return malloc (size);
^
make[2]: *** [Makefile:1197: gimple-ssa-sccopy.o] Error 1

Minor fix-up for commit cd794c3961017703a4d2ca0e854ea23b3d4b6373
"A new copy propagation and PHI elimination pass".

	gcc/
	* gimple-ssa-sccopy.cc: '#define INCLUDE_ALGORITHM' instead of
	'#include '.
---
 gcc/gimple-ssa-sccopy.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
index ac5ec32eb32..7ebb6c05caf 100644
--- a/gcc/gimple-ssa-sccopy.cc
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -18,6 +18,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
+#define INCLUDE_ALGORITHM
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -29,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "vec.h"
 #include "hash-set.h"
-#include 
 #include "ssa-iterators.h"
 #include "gimple-fold.h"
 #include "gimplify.h"
-- 
2.34.1



Re: [PATCH] middle-end: Fix up constant handling in emit_conditional_move [PR111260]

2023-12-14 Thread Richard Biener
On Wed, Dec 13, 2023 at 5:51 PM Andrew Pinski  wrote:
>
> After r14-2667-gceae1400cf24f329393e96dd9720, we force a constant to a 
> register
> if it is shared with one of the other operands. The problem is that we used
> the comparison mode for the register, but that could be different from the
> operand mode. This
> causes some issues on some targets.
> To fix it, we either need to have the modes match or if it is an integer mode,
> then we can use the lower part for the smaller mode.
>
> Bootstrapped and tested on both aarch64-linux-gnu and x86_64-linux.

I think that to fulfil the original purpose, requiring matching modes is enough;
the x86 backend checks for equality here, so a subreg wouldn't be enough.
In fact the whole point, preserving equality, doesn't work then.

So please just check the modes are equal (I also see I used
'mode' for the cost check - I've really split out the check done
by prepare_cmp_insn here btw).  This seemed to be the simplest
solution at the time, rather than for example trying to postpone
legitimizing the constants (so rtx_equal_p could continue to be
lazy with slight mode mismatches).

Richard.
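For context, a minimal sketch of the kind of source that exercises this path, a constant shared between the comparison and one arm of a conditional move.  The function and constant below are assumptions for illustration; the PR's actual reproducer is the condmove-1.c testcase in the patch:

```c
#include <assert.h>

/* The same "expensive" constant appears in the comparison and as one
   arm of the conditional, so emit_conditional_move sees rtx_equal_p
   between the comparison operand and op2/op3 and tries to keep that
   equality when forcing the constant into a register. */
long long pick(long long a) {
  return a == 0x1234567890LL ? 0x1234567890LL : a + 1;
}
```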

> PR middle-end/111260
>
> gcc/ChangeLog:
>
> * optabs.cc (emit_conditional_move): Fix up mode handling for
> forcing the constant to a register.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/compile/condmove-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/optabs.cc | 40 +--
>  .../gcc.c-torture/compile/condmove-1.c|  9 +
>  2 files changed, 45 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/condmove-1.c
>
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index f0a048a6bdb..573cf22760e 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -5131,26 +5131,58 @@ emit_conditional_move (rtx target, struct 
> rtx_comparison comp,
>   /* If we are optimizing, force expensive constants into a register
>  but preserve an eventual equality with op2/op3.  */
>   if (CONSTANT_P (orig_op0) && optimize
> + && (cmpmode == mode
> + || (GET_MODE_CLASS (cmpmode) == MODE_INT
> + && GET_MODE_CLASS (mode) == MODE_INT))
>   && (rtx_cost (orig_op0, mode, COMPARE, 0,
> optimize_insn_for_speed_p ())
>   > COSTS_N_INSNS (1))
>   && can_create_pseudo_p ())
> {
> + machine_mode new_mode;
> + if (known_le (GET_MODE_PRECISION (cmpmode), GET_MODE_PRECISION 
> (mode)))
> +   new_mode = mode;
> + else
> +   new_mode = cmpmode;
>   if (rtx_equal_p (orig_op0, op2))
> -   op2p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
> +   {
> + rtx r = force_reg (new_mode, orig_op0);
> + op2p = gen_lowpart (mode, r);
> + XEXP (comparison, 0) = gen_lowpart (cmpmode, r);
> +   }
>   else if (rtx_equal_p (orig_op0, op3))
> -   op3p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
> +   {
> + rtx r = force_reg (new_mode, orig_op0);
> + op3p = gen_lowpart (mode, r);
> + XEXP (comparison, 0) = gen_lowpart (cmpmode, r);
> +   }
> }
>   if (CONSTANT_P (orig_op1) && optimize
> + && (cmpmode == mode
> + || (GET_MODE_CLASS (cmpmode) == MODE_INT
> + && GET_MODE_CLASS (mode) == MODE_INT))
>   && (rtx_cost (orig_op1, mode, COMPARE, 0,
> optimize_insn_for_speed_p ())
>   > COSTS_N_INSNS (1))
>   && can_create_pseudo_p ())
> {
> + machine_mode new_mode;
> + if (known_le (GET_MODE_PRECISION (cmpmode), GET_MODE_PRECISION 
> (mode)))
> +   new_mode = mode;
> + else
> +   new_mode = cmpmode;
>   if (rtx_equal_p (orig_op1, op2))
> -   op2p = XEXP (comparison, 1) = force_reg (cmpmode, orig_op1);
> +   {
> + rtx r = force_reg (new_mode, orig_op1);
> + op2p = gen_lowpart (mode, r);
> + XEXP (comparison, 1) = gen_lowpart (cmpmode, r);
> +   }
>   else if (rtx_equal_p (orig_op1, op3))
> -   op3p = XEXP (comparison, 1) = force_reg (cmpmode, orig_op1);
> +   {
> + rtx r = force_reg (new_mode, orig_op1);
> + op3p = gen_lowpart (mode, r);
> + XEXP (comparison, 1) = gen_lowpart (cmpmode, r);
> +   }
> }
>   prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> GET_CODE (comparison), NULL_RTX, unsignedp,
> diff --git a/gcc/testsuite/gcc.c-torture/compile/condmove-1.

[committed] Fix m68k testcase for c99

2023-12-14 Thread Jeff Law


More fallout from the c99 conversion.   The m68k specific test pr63347.c 
calls exit and abort without a prototype in scope.  This patch turns 
them into __builtin calls avoiding the error.


Bootstrapped and regression tested on m68k-linux-gnu, pushed to the trunk.

Jeff

commit 679adb2396a911b5999591f7a4f27a88064e91ff
Author: Jeff Law 
Date:   Thu Dec 14 06:31:49 2023 -0700

[committed] Fix m68k testcase for c99

More fallout from the c99 conversion.   The m68k specific test pr63347.c 
calls
exit and abort without a prototype in scope.  This patch turns them into
__builtin calls avoiding the error.

Bootstrapped and regression tested on m68k-linux-gnu, pushed to the trunk.

gcc/testsuite
* gcc.target/m68k/pr63347.c: Call __builtin_abort and __builtin_exit
instead of abort and exit.

diff --git a/gcc/testsuite/gcc.target/m68k/pr63347.c 
b/gcc/testsuite/gcc.target/m68k/pr63347.c
index 63964769766..b817f4694f3 100644
--- a/gcc/testsuite/gcc.target/m68k/pr63347.c
+++ b/gcc/testsuite/gcc.target/m68k/pr63347.c
@@ -32,13 +32,13 @@ int main(int argc, char *argv[])
 myaddr = 0x0;
 ret = print_info(&myaddr);
 if (!ret)
-abort ();
+__builtin_abort ();
 
 myaddr = 0x01020304;
 ret = print_info(&myaddr);
 if (ret)
-abort ();
-exit (0);
+__builtin_abort ();
+__builtin_exit (0);
 }
 
 


[PATCH] Middle-end: Do not model address cost for SELECT_VL style vectorization

2023-12-14 Thread Juzhe-Zhong
Following Richard's suggestions, we should not model address cost in the loop
vectorizer for select_vl or decrement IV, since other vectorization styles don't
do that.

This makes the cost model comparison apples to apples.
This patch sets COST from 2 to 1, which turns out to have better codegen
in various cases for RVV.

Ok for trunk ?

PR target/53

gcc/ChangeLog:

* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Remove 
address cost for select_vl/decrement IV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr53.c: Moved to...
* gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c: ...here.
* gcc.dg/vect/costmodel/riscv/rvv/pr53-1.c: New test.

---
 .../vect/costmodel/riscv/rvv/pr53-1.c  | 18 ++
 .../riscv/rvv/{pr53.c => pr11153-2.c}  |  4 ++--
 gcc/tree-vect-loop.cc  | 10 --
 3 files changed, 24 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53-1.c
 rename gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/{pr53.c => 
pr11153-2.c} (93%)

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53-1.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53-1.c
new file mode 100644
index 000..51c91f7410c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-mtune=generic-ooo -ffast-math" } */
+
+#define DEF_REDUC_PLUS(TYPE)   
\
+  TYPE __attribute__ ((noinline, noclone)) 
\
+  reduc_plus_##TYPE (TYPE *__restrict a, int n)
\
+  {
\
+TYPE r = 0;
\
+for (int i = 0; i < n; ++i)
\
+  r += a[i];   
\
+return r;  
\
+  }
+
+#define TEST_PLUS(T) T (int) T (float)
+
+TEST_PLUS (DEF_REDUC_PLUS)
+
+/* { dg-final { scan-assembler-not {vsetivli\s+zero,\s*4} } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c
similarity index 93%
rename from gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53.c
rename to gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c
index 06e08ec5f2e..d361f1fc7fa 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-mtune=generic-ooo" } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -ffast-math" 
} */
 
 #define DEF_REDUC_PLUS(TYPE)   
\
   TYPE __attribute__ ((noinline, noclone)) 
\
@@ -11,7 +11,7 @@
 return r;  
\
   }
 
-#define TEST_PLUS(T) T (int)
+#define TEST_PLUS(T) T (int) T (float)
 
 TEST_PLUS (DEF_REDUC_PLUS)
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 19e38b8637b..7a3db5f098b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4872,12 +4872,10 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
 
unsigned int length_update_cost = 0;
if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
- /* For decrement IV style, we use a single SELECT_VL since
-beginning to calculate the number of elements need to be
-processed in current iteration, and a SHIFT operation to
-compute the next memory address instead of adding vectorization
-factor.  */
- length_update_cost = 2;
+ /* For decrement IV style, each only needs a single SELECT_VL
+or MIN from the beginning to calculate the number of elements
+that need to be processed in the current iteration.  */
+ length_update_cost = 1;
else
  /* For increment IV stype, Each may need two MINs and one MINUS to
 update lengths in body for next iteration.  */
-- 
2.36.1



[PATCH] aarch64: Improve handling of accumulators in early-ra

2023-12-14 Thread Richard Sandiford
Being very simplistic, early-ra just models an allocno's live range
as a single interval.  This doesn't work well for single-register
accumulators that are updated multiple times in a loop, since in
SSA form, each intermediate result will be a separate SSA name and
will remain separate from the accumulator even after out-of-ssa.
This means that in something like:

  for (;;)
{
  x = x + ...;
  x = x + ...;
}

the first definition of x and the second use will be a separate pseudo
from the "main" loop-carried pseudo.
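A minimal C illustration of that pattern (an assumed example, not the accumulators_1.c testcase added below):

```c
#include <assert.h>

/* An accumulator updated twice per iteration.  In SSA form each
   update produces a distinct name (x_1, x_2), so after out-of-ssa the
   intermediate result can live in a pseudo separate from the
   loop-carried one, splitting what is logically one live range. */
double acc_twice(const double *a, const double *b, int n) {
  double x = 0.0;
  for (int i = 0; i < n; ++i) {
    x = x + a[i];   /* x_1 = x_phi + a[i]                         */
    x = x + b[i];   /* x_2 = x_1 + b[i], feeds the next iteration */
  }
  return x;
}
```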

A real RA would fix this by keeping general, segmented live ranges.
But that feels like a slippery slope in this context.

This patch instead looks for sharability at a more local level,
as described in the comments.  It's a bit hackish, but hopefully
not too much.

The patch also contains some small tweaks that are needed to make
the new and existing tests pass:

- fix a case where a pseudo that was only moved was wrongly treated
  as not an FPR candidate

- fix some bookkeeping related to is_strong_copy_src

- use the number of FPR preferences as a tiebreaker when sorting colors

I fully expect that we'll need to be more aggressive at skipping the
early-ra allocation.  For example, it probably makes sense to refuse any
allocation that involves an FPR move.  But I'd like to keep collecting
examples of where things go wrong first, so that hopefully we can improve
the cases with strided registers or structures.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64-early-ra.cc (allocno_info::is_equiv): New
member variable.
(allocno_info::equiv_allocno): Replace with...
(allocno_info::related_allocno): ...this member variable.
(allocno_info::chain_prev): Put into an enum with...
(allocno_info::last_use_point): ...this new member variable.
(color_info::num_fpr_preferences): New member variable.
(early_ra::m_shared_allocnos): Likewise.
(allocno_info::is_shared): New member function.
(allocno_info::is_equiv_to): Likewise.
(early_ra::dump_allocnos): Dump sharing information.  Tweak column
widths.
(early_ra::fpr_preference): Check ALLOWS_NONFPR before returning -2.
(early_ra::start_new_region): Handle m_shared_allocnos.
(early_ra::create_allocno_group): Set related_allocno rather than
equiv_allocno.
(early_ra::record_allocno_use): Likewise.  Detect multiple calls
for the same program point.  Update last_use_point and is_equiv.
Clear is_strong_copy_src rather than is_strong_copy_dest.
(early_ra::record_allocno_def): Use related_allocno rather than
equiv_allocno.  Update last_use_point.
(early_ra::valid_equivalence_p): Replace with...
(early_ra::find_related_start): ...this new function.
(early_ra::record_copy): Look for cases where a destination copy chain
can be shared with the source allocno.
(early_ra::find_strided_accesses): Update for equiv_allocno->
related_allocno change.  Only call consider_strong_copy_src_chain
at the head of a copy chain.
(early_ra::is_chain_candidate): Skip shared allocnos.  Update for
new representation of equivalent allocnos.
(early_ra::chain_allocnos): Update for new representation of
equivalent allocnos.
(early_ra::try_to_chain_allocnos): Likewise.
(early_ra::merge_fpr_info): New function, split out from...
(early_ra::set_single_color_rep): ...here.
(early_ra::form_chains): Handle shared allocnos.
(early_ra::process_copies): Count the number of FPR preferences.
(early_ra::cmp_decreasing_size): Rename to...
(early_ra::cmp_allocation_order): ...this.  Sort equal-sized groups
by the number of FPR preferences.
(early_ra::finalize_allocation): Handle shared allocnos.
(early_ra::process_region): Reset chain_prev as well as chain_next.

gcc/testsuite/
* gcc.target/aarch64/sve/accumulators_1.c: New test.
* gcc.target/aarch64/sve/acle/asm/create2_1.c: Allow the moves to
be in any order.
* gcc.target/aarch64/sve/acle/asm/create3_1.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/create4_1.c: Likewise.
---
 gcc/config/aarch64/aarch64-early-ra.cc| 283 ++
 .../gcc.target/aarch64/sve/accumulators_1.c   | 200 +
 .../aarch64/sve/acle/asm/create2_1.c  |  65 
 .../aarch64/sve/acle/asm/create3_1.c  |  72 ++---
 .../aarch64/sve/acle/asm/create4_1.c  |  96 +++---
 5 files changed, 573 insertions(+), 143 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/accumulators_1.c

diff --git a/gcc/config/aarch64/aarch64-early-ra.cc 
b/gcc/config/aarch64/aarch64-early-ra.cc
index f05869b5cf2..484db94513d 100644
--- a/gcc/config/aarch64/aarch64-early-ra.cc
+++ b/gcc/config/aarch64/aarch64-early-ra.cc
@@ -256,6 +256

Re: Re: [PATCH] Middle-end: Adjust decrement IV style partial vectorization COST model

2023-12-14 Thread 钟居哲
Thanks a lot for the clarification.

I sent a patch to remove the address cost:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640595.html 
for decrement IV/SELECT_VL.

And I have tested various cases, which all get better codegen in RVV.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-12-14 20:50
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; jeffreyalaw
Subject: Re: Re: [PATCH] Middle-end: Adjust decrement IV style partial 
vectorization COST model
On Thu, 14 Dec 2023, juzhe.zh...@rivai.ai wrote:
 
> Thanks Richard.
> 
> Let me clarify again to make sure I understand your comments correctly:
> 
> Do you suggest not modeling address cost here, like the other partial
> vectorization styles (while_ult, avx512, etc.), and then setting COST = 1,
> since we only have SELECT_VL from the beginning?  In the various cases we
> saw, COST=1 is better than COST=2.
 
I suggest to not model with address cost in mind since nothing else in
the vectorizer does that and thus you're "comparing" apples with
oranges then.
 
If address cost is important to decide between SELECT_VL and not
SELECT_VL then we'd need to start modeling address cost _at all_.
 
Richard.
 
> Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-12-14 18:46
> To: juzhe.zhong
> CC: gcc-patches; richard.sandiford; jeffreyalaw
> Subject: Re: [PATCH] Middle-end: Adjust decrement IV style partial 
> vectorization COST model
> 
> 
On 14.12.2023 at 09:28, juzhe.zh...@rivai.ai wrote:
> 
>  
> Hi, Richard.
> 
> I have a question about the decrement IV costing since I find the reduction 
> case is generating inferior codegen.
> 
> reduc_plus_int:
> mv a3,a0
> ble a1,zero,.L7
> addiw a5,a1,-1
> li a4,2
> bleu a5,a4,.L8
> vsetivli zero,4,e32,m1,ta,ma
> srliw a4,a1,2
> vmv.v.i v1,0
> slli a4,a4,4
> add a4,a4,a0
> mv a5,a0
> .L4:
> vle32.v v2,0(a5)
> addi a5,a5,16
> vadd.vv v1,v1,v2
> bne a5,a4,.L4
> li a5,0
> vmv.s.x v2,a5
> andi a5,a1,-4
> vredsum.vs v1,v1,v2
> vmv.x.s a0,v1
> beq a1,a5,.L12
> .L3:
> subw a1,a1,a5
> slli a5,a5,32
> srli a5,a5,32
> slli a1,a1,32
> vsetvli a4,zero,e32,m1,ta,ma
> slli a5,a5,2
> srli a1,a1,32
> vmv.v.i v1,0
> add a3,a3,a5
> vsetvli a1,a1,e8,mf4,ta,ma
> vle32.v v3,0(a3)
> li a5,0
> vsetivli zero,1,e32,m1,ta,ma
> vmv.s.x v2,a5
> vsetvli zero,a1,e32,m1,tu,ma
> vmv.v.v v1,v3
> vsetvli a4,zero,e32,m1,ta,ma
> vredsum.vs v1,v1,v2
> vmv.x.s a5,v1
> addw a0,a0,a5
> ret
> .L12:
> ret
> .L7:
> li a0,0
> ret
> .L8:
> li a5,0
> li a0,0
> j .L3
> 
> This patch adjusts length_update_cost from 3 (the original cost) to 2, which
> fixes the conversion case (the test appended in this patch),
> but it can't fix the reduction case.
> 
> Then I adjust it into COST = 1:
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 19e38b8637b..50c99b1fe79 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4877,7 +4877,7 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>  processed in current iteration, and a SHIFT operation to
>  compute the next memory address instead of adding 
> vectorization
>  factor.  */
> - length_update_cost = 2;
> + length_update_cost = 1;
> else
>   /* For increment IV stype, Each may need two MINs and one MINUS 
> to
>  update lengths in body for next iteration.  */
> 
> Then the codegen is reasonable now:
> 
> reduc_plus_int:
> ble a1,zero,.L4
> vsetvli a5,zero,e32,m1,ta,ma
> vmv.v.i v1,0
> .L3:
> vsetvli a5,a1,e32,m1,tu,ma
> vle32.v v2,0(a0)
> slli a4,a5,2
> sub a1,a1,a5
> add a0,a0,a4
> vadd.vv v1,v2,v1
> bne a1,zero,.L3
> li a5,0
> vsetivli zero,1,e32,m1,ta,ma
> vmv.s.x v2,a5
> vsetvli a5,zero,e32,m1,ta,ma
> vredsum.vs v1,v1,v2
> vmv.x.s a0,v1
> ret
> .L4:
> li a0,0
> ret
> 
> The reason I set COST = 2 instead of 1 in this patch is that
> 
> one COST is for SELECT_VL.
> 
> The other is for the memory address calculation, since we don't update the
> memory address by adding VF directly.  Instead:
> 
> We shift the result of SELECT_VL, and then add it into the memory IV as
> follows:
> 
> SSA_1 = SELECT_VL --> SSA_1 is element-wise
> SSA_2 = SSA_1 << 1 (If element is INT16, make it to be bytes-wise)
> next iteration memory address = current iteration memory address + SSA_2.
> 
> If the element is INT8, then the shift operation is not needed,
> since SSA_1 is already byte-wise.
> 
> So, my question is whether the COST should be 1 or 2.
> It seems that COST = 1 is better for
> using SELECT_VL.
> 
> We are not modeling address cost, so trying to account for this here is only
> going to be a heuristic that's as often wrong as it is correct.  If
> the difference between SELECT_VL and not is so small, then you'll have a hard
> time modeling this here.
> 
>  
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-12-13 18:17
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford; jeffreyalaw
> Subject: Re: [PATCH] 

Re: [PATCH v3 5/5] aarch64: Add function multiversioning support

2023-12-14 Thread Richard Sandiford
Andrew Carlotti  writes:
> This adds initial support for function multiversioning on aarch64 using
> the target_version and target_clones attributes.  This loosely follows
> the Beta specification in the ACLE [1], although with some differences
> that still need to be resolved (possibly as follow-up patches).
>
> Existing function multiversioning implementations are broken in various
> ways when used across translation units.  This includes placing
> resolvers in the wrong translation units, and using symbol mangling that
> callers to unintentionally bypass the resolver in some circumstances.
> Fixing these issues for aarch64 will require modifications to our ACLE
> specification.  It will also require further adjustments to existing
> middle end code, to facilitate different mangling and resolver
> placement while preserving existing target behaviours.
>
> The list of function multiversioning features specified in the ACLE is
> also inconsistent with the list of features supported in target option
> extensions.  I intend to resolve some or all of these inconsistencies at
> a later stage.
>
> The target_version attribute is currently only supported in C++, since
> this is the only frontend with existing support for multiversioning
> using the target attribute.  On the other hand, this patch happens to
> enable multiversioning with the target_clones attribute in Ada and D, as
> well as the entire C family, using their existing frontend support.
>
> This patch also does not support the following aspects of the Beta
> specification:
>
> - The target_clones attribute should allow an implicit unlisted
>   "default" version.
> - There should be an option to disable function multiversioning at
>   compile time.
> - Unrecognised target names in a target_clones attribute should be
>   ignored (with an optional warning).  This current patch raises an
>   error instead.
>
> [1] 
> https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
>
> ---
>
> I believe the support present in this patch correctly handles function
> multiversioning within a single translation unit for all features in the ACLE
> specification with option extension support.
>
> Is it ok to push this patch in its current state? I'd then continue working on
> incremental improvements to the supported feature extensions and the ABI 
> issues
> in followup patches, along with corresponding changes and improvements to the
> ACLE specification.
>
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-feature-deps.h (fmv_deps_):
>   Define aarch64_feature_flags mask for each FMV feature.
>   * config/aarch64/aarch64-option-extensions.def: Use new macros
>   to define FMV feature extensions.
>   * config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
>   Check for target_version attribute after processing target
>   attribute.
>   (aarch64_fmv_feature_data): New.
>   (aarch64_parse_fmv_features): New.
>   (aarch64_process_target_version_attr): New.
>   (aarch64_option_valid_version_attribute_p): New.
>   (get_feature_mask_for_version): New.
>   (compare_feature_masks): New.
>   (aarch64_compare_version_priority): New.
>   (build_ifunc_arg_type): New.
>   (make_resolver_func): New.
>   (add_condition_to_bb): New.
>   (dispatch_function_versions): New.
>   (aarch64_generate_version_dispatcher_body): New.
>   (aarch64_get_function_versions_dispatcher): New.
>   (aarch64_common_function_versions): New.
>   (aarch64_mangle_decl_assembler_name): New.
>   (TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
>   (TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
>   (TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
>   (TARGET_COMPARE_VERSION_PRIORITY): New implementation.
>   (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
>   (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
>   (TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
>   * config/aarch64/aarch64.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE):
>   Set target macro.
>   * config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
> new value to report duplicate FMV feature.
>   * common/config/aarch64/cpuinfo.h: New file.
>
> libgcc/ChangeLog:
>
>   * config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
> copy in gcc/common
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/options_set_17.c: Reorder expected flags.
>   * gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
>   * gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
>   * gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
>   * gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
>   * gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
>   * gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
>   * gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
>   *
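The resolver machinery this patch adds (make_resolver_func, dispatch_function_versions) builds on GNU ifunc support. The underlying mechanism can be sketched in plain C; the names here are hypothetical, and an ELF target with ifunc support is assumed:

```c
#include <assert.h>

/* Two hypothetical implementations of the same function
   (imagine the second being a SIMD-accelerated variant).  */
static int add_v1 (int a, int b) { return a + b; }
static int add_v2 (int a, int b) { return a + b; }

/* The resolver runs once, at load time, and returns the implementation
   to use.  A real FMV resolver would probe CPU features here.  */
static int (*resolve_add (void)) (int, int)
{
  int have_fancy_cpu = 0;   /* stand-in for a real feature check */
  return have_fancy_cpu ? add_v2 : add_v1;
}

/* 'add' is an indirect function: every call goes through the
   implementation the resolver selected.  */
int add (int, int) __attribute__ ((ifunc ("resolve_add")));
```

Callers simply call `add`; the versioning is invisible at the call site, which is exactly the property the dispatcher generation above aims for.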

[pushed] analyzer: cleanups [PR112655]

2023-12-14 Thread David Malcolm
Avoid copying eedges in infinite_loop::infinite_loop.

Use initializer lists in the various places reported in
PR analyzer/112655 (apart from coord_test's ctor, which
would require nontrivial refactoring).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-6549-g8cf5afba5dc482

gcc/analyzer/ChangeLog:
PR analyzer/112655
* infinite-loop.cc (infinite_loop::infinite_loop): Pass eedges
via rvalue reference rather than by value.
(starts_infinite_loop_p): Move eedges when constructing an
infinite_loop instance.
* sm-file.cc (fileptr_state_machine::fileptr_state_machine): Use
initializer list for states.
* sm-sensitive.cc
(sensitive_state_machine::sensitive_state_machine): Likewise.
* sm-signal.cc (signal_state_machine::signal_state_machine):
Likewise.
* sm-taint.cc (taint_state_machine::taint_state_machine):
Likewise.
* varargs.cc (va_list_state_machine::va_list_state_machine): Likewise.
---
 gcc/analyzer/infinite-loop.cc |  8 
 gcc/analyzer/sm-file.cc   | 12 ++--
 gcc/analyzer/sm-sensitive.cc  |  6 +++---
 gcc/analyzer/sm-signal.cc |  6 +++---
 gcc/analyzer/sm-taint.cc  | 12 ++--
 gcc/analyzer/varargs.cc   |  6 +++---
 6 files changed, 25 insertions(+), 25 deletions(-)
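The two idioms being adopted, member-initializer lists and taking the vector by rvalue reference so the caller can std::move into it, can be sketched with toy stand-ins (hypothetical classes, not the analyzer's own types):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class toy_state_machine
{
  // Declared (and thus constructed) before the state members below; in
  // the analyzer the same guarantee comes from the state_machine base
  // subobject being constructed before the members are initialized.
  std::vector<std::string> m_names;

public:
  // Initializer list instead of assignments in the ctor body,
  // mirroring the sm-file.cc / sm-taint.cc changes.
  toy_state_machine ()
  : m_unchecked (add_state ("unchecked")),
    m_null (add_state ("null"))
  {}

  int m_unchecked;
  int m_null;

  std::size_t num_states () const { return m_names.size (); }

private:
  int add_state (const char *name)
  {
    m_names.push_back (name);
    return (int) m_names.size () - 1;
  }
};

// Mirrors the infinite-loop.cc change: accept the vector by rvalue
// reference and move from it, so no copy is made at the call site.
struct toy_loop
{
  toy_loop (std::vector<int> &&eedges) : m_eedges (std::move (eedges)) {}
  std::vector<int> m_eedges;
};
```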

diff --git a/gcc/analyzer/infinite-loop.cc b/gcc/analyzer/infinite-loop.cc
index c47ce1c89085..fc194d919cf3 100644
--- a/gcc/analyzer/infinite-loop.cc
+++ b/gcc/analyzer/infinite-loop.cc
@@ -71,7 +71,7 @@ struct infinite_loop
 {
   infinite_loop (const exploded_node &enode,
location_t loc,
-   std::vector eedges,
+   std::vector &&eedges,
logger *logger)
   : m_enode (enode),
 m_loc (loc),
@@ -423,9 +423,9 @@ starts_infinite_loop_p (const exploded_node &enode,
  free (filename);
}
  return ::make_unique (enode,
- first_loc,
- eedges,
- logger);
+  first_loc,
+  std::move (eedges),
+  logger);
}
  else
{
diff --git a/gcc/analyzer/sm-file.cc b/gcc/analyzer/sm-file.cc
index f8e31f873a5a..323df23b1b71 100644
--- a/gcc/analyzer/sm-file.cc
+++ b/gcc/analyzer/sm-file.cc
@@ -270,13 +270,13 @@ private:
 /* fileptr_state_machine's ctor.  */
 
 fileptr_state_machine::fileptr_state_machine (logger *logger)
-: state_machine ("file", logger)
+: state_machine ("file", logger),
+  m_unchecked (add_state ("unchecked")),
+  m_null (add_state ("null")),
+  m_nonnull (add_state ("nonnull")),
+  m_closed (add_state ("closed")),
+  m_stop (add_state ("stop"))
 {
-  m_unchecked = add_state ("unchecked");
-  m_null = add_state ("null");
-  m_nonnull = add_state ("nonnull");
-  m_closed = add_state ("closed");
-  m_stop = add_state ("stop");
 }
 
 /* Get a set of functions that are known to take a FILE * that must be open,
diff --git a/gcc/analyzer/sm-sensitive.cc b/gcc/analyzer/sm-sensitive.cc
index 4776d6465bb5..aea337cdccda 100644
--- a/gcc/analyzer/sm-sensitive.cc
+++ b/gcc/analyzer/sm-sensitive.cc
@@ -161,10 +161,10 @@ private:
 /* sensitive_state_machine's ctor.  */
 
 sensitive_state_machine::sensitive_state_machine (logger *logger)
-: state_machine ("sensitive", logger)
+: state_machine ("sensitive", logger),
+  m_sensitive (add_state ("sensitive")),
+  m_stop (add_state ("stop"))
 {
-  m_sensitive = add_state ("sensitive");
-  m_stop = add_state ("stop");
 }
 
 /* Warn about an exposure at NODE and STMT if ARG is in the "sensitive"
diff --git a/gcc/analyzer/sm-signal.cc b/gcc/analyzer/sm-signal.cc
index 6bca395ac5c7..799bae5364b8 100644
--- a/gcc/analyzer/sm-signal.cc
+++ b/gcc/analyzer/sm-signal.cc
@@ -182,10 +182,10 @@ private:
 /* signal_state_machine's ctor.  */
 
 signal_state_machine::signal_state_machine (logger *logger)
-: state_machine ("signal", logger)
+: state_machine ("signal", logger),
+  m_in_signal_handler (add_state ("in_signal_handler")),
+  m_stop (add_state ("stop"))
 {
-  m_in_signal_handler = add_state ("in_signal_handler");
-  m_stop = add_state ("stop");
 }
 
 /* Update MODEL for edges that simulate HANDLER_FUN being called as
diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc
index 597e8e55609a..ce18957b56b8 100644
--- a/gcc/analyzer/sm-taint.cc
+++ b/gcc/analyzer/sm-taint.cc
@@ -830,13 +830,13 @@ private:
 /* taint_state_machine's ctor.  */
 
 taint_state_machine::taint_state_machine (logger *logger)
-: state_machine ("taint", logger)
+: state_machine ("taint", logger),
+  m_tainted (add_state ("tainted")),
+  m_has_lb (add_state ("has_lb")),
+  m_has_ub (add_state ("has_ub")),
+  m_stop (add_state ("stop")),
+  m_tainted

Re: [pushed 1/4] c++: copy location to AGGR_INIT_EXPR

2023-12-14 Thread Marek Polacek
On Wed, Dec 13, 2023 at 08:38:12PM -0500, Jason Merrill wrote:
> On 12/13/23 19:00, Marek Polacek wrote:
> > On Wed, Dec 13, 2023 at 11:47:37AM -0500, Jason Merrill wrote:
> > > Tested x86_64-pc-linux-gnu, applying to trunk.
> > > 
> > > -- 8< --
> > > 
> > > When building an AGGR_INIT_EXPR from a CALL_EXPR, we shouldn't lose 
> > > location
> > > information.
> > 
> > I think the following should be an obvious fix, so I'll check it in.
> 
> Thanks, I wonder why I wasn't seeing that?

It must be due to -fimplicit-constexpr.  So if you have
GXX_TESTSUITE_STDS=98,11,14,17,20,impcx
then I think the FAIL won't show up.
 
> > -- >8 --
> > Since r14-6505 I see:
> > 
> > FAIL: g++.dg/cpp0x/constexpr-ex1.C  -std=c++23  at line 91 (test for 
> > errors, line 89)
> > FAIL: g++.dg/cpp0x/constexpr-ex1.C  -std=c++23 (test for excess errors)
> > FAIL: g++.dg/cpp0x/constexpr-ex1.C  -std=c++26  at line 91 (test for 
> > errors, line 89)
> > FAIL: g++.dg/cpp0x/constexpr-ex1.C  -std=c++26 (test for excess errors)
> > 
> > and it wasn't fixed by r14-6511.  So I'm fixing it with the below.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp0x/constexpr-ex1.C: Adjust expected diagnostic line.
> > ---
> >   gcc/testsuite/g++.dg/cpp0x/constexpr-ex1.C | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-ex1.C 
> > b/gcc/testsuite/g++.dg/cpp0x/constexpr-ex1.C
> > index 383d38a42d4..b26eb5d0c90 100644
> > --- a/gcc/testsuite/g++.dg/cpp0x/constexpr-ex1.C
> > +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-ex1.C
> > @@ -88,7 +88,7 @@ struct resource {
> >   };
> >   constexpr resource f(resource d)
> >   { return d; }  // { dg-error "non-.constexpr." "" { 
> > target { { { ! implicit_constexpr } && c++20_down } || c++11_only } } }
> > -// { dg-error "non-.constexpr." "" { target { c++23 && { ! 
> > implicit_constexpr } } } .-2 }
> > +// { dg-error "non-.constexpr." "" { target { c++23 && { ! 
> > implicit_constexpr } } } .-1 }
> >   constexpr resource d = f(9);   // { dg-message ".constexpr." "" { target 
> > { { ! implicit_constexpr } || c++11_only } } }
> >   // 4.4 floating-point constant expressions
> > 
> > base-commit: c535360788e142a92e1d8b1db25bf4452e26f5fb
> 

Marek



Re: [PATCH v7 4/5] OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic

2023-12-14 Thread Tobias Burnus

On 19.08.23 00:47, Julian Brown wrote:

This patch adds support for non-constant component offsets in "map"
clauses for OpenMP (and the equivalents for OpenACC), which are not able
to be sorted into order at compile time.  Normally struct accesses in
such clauses are gathered together and sorted into increasing address
order after a "GOMP_MAP_STRUCT" node: if we have variable indices,
that is no longer possible.

This version of the patch scales back the previously-posted version to
merely add a diagnostic for incorrect usage of component accesses with
variably-indexed arrays of structs: the only permitted variant is where
we have multiple indices that are the same, but we could not prove so
at compile time.  Rather than silently producing the wrong result for
cases where the indices are in fact different, we error out (e.g.,
"map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for different i/j).

For now, multiple *constant* array indices are still supported (see
map-arrayofstruct-1.c).  That could perhaps be addressed with a follow-up
patch, if necessary.
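As a rough C analogue of the Fortran example above (hypothetical names; without offloading enabled the pragma is a no-op, so this only shows the shape of the clauses):

```c
#include <assert.h>

struct s { int *arrptr; };

/* Maps component accesses of two (possibly distinct) elements of an
   array of structs, the C shape of
   map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8)).  Only i == j is
   well-defined; with i != j the new runtime diagnostic errors out
   instead of silently mapping the wrong element.  */
static int
map_two_accesses (struct s *dtarr, int i, int j)
{
#pragma omp target enter data map(to: dtarr[i].arrptr, dtarr[j].arrptr[0:4])
  return i == j;
}
```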

This version of the patch renumbers the GOMP_MAP_STRUCT_UNORD kind to
avoid clashing with the OpenACC "non-contiguous" dynamic array support
(though that is not yet applied to mainline).


LGTM with:

- inclusion of your follow-up fix for shared-memory systems (see email
of August 21)

- adding a comment to map-arrayofstruct-1.c indicating that this usage
is an extension, violating a restriction (be a bit more explicit than
just that)

See https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603126.html
for a quote of the specification or (same wording, newer spec) in TR12
under "Restrictions to the map clause are as follows:" in "6.8.3 map
Clause" [218+219:36-37+1-3]

Thanks,

Tobias


2023-08-18  Julian Brown  

gcc/
  * gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET parameter.
  (omp_get_attachment, omp_group_last, omp_group_base,
  omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support.
  (omp_accumulate_sibling_list): Update calls to extract_base_bit_offset.
  Support GOMP_MAP_STRUCT_UNORD.
  (omp_build_struct_sibling_lists, gimplify_scan_omp_clauses,
  gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add
  GOMP_MAP_STRUCT_UNORD support.
  * omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support.
  * tree-pretty-print.cc (dump_omp_clause): Likewise.

include/
  * gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD.

libgomp/
  * oacc-mem.c (find_group_last, goacc_enter_data_internal,
  goacc_exit_data_internal, GOACC_enter_exit_data): Add
  GOMP_MAP_STRUCT_UNORD support.
  * target.c (gomp_map_vars_internal): Add GOMP_MAP_STRUCT_UNORD support.
  Detect incorrect use of variable indexing of arrays of structs.
  (GOMP_target_enter_exit_data, gomp_target_task_fn): Add
  GOMP_MAP_STRUCT_UNORD support.
  * testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test.
  * testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test.
  * testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test.
  * testsuite/libgomp.fortran/map-subarray-5.f90: New test.
---
  gcc/gimplify.cc   | 110 ++
  gcc/omp-low.cc|   1 +
  gcc/tree-pretty-print.cc  |   3 +
  include/gomp-constants.h  |   6 +
  libgomp/oacc-mem.c|   6 +-
  libgomp/target.c  |  60 +-
  .../map-arrayofstruct-1.c |  38 ++
  .../map-arrayofstruct-2.c |  58 +
  .../map-arrayofstruct-3.c |  68 +++
  .../libgomp.fortran/map-subarray-5.f90|  54 +
  10 files changed, 377 insertions(+), 27 deletions(-)
  create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c
  create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c
  create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c
  create mode 100644 libgomp/testsuite/libgomp.fortran/map-subarray-5.f90

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index fad4308a0eb4..e682583054b0 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -8965,7 +8965,8 @@ build_omp_struct_comp_nodes (enum tree_code code, tree 
grp_start, tree grp_end,

  static tree
  extract_base_bit_offset (tree base, poly_int64 *bitposp,
-  poly_offset_int *poffsetp)
+  poly_offset_int *poffsetp,
+  bool *variable_offset)
  {
tree offset;
poly_int64 bitsize, bitpos;
@@ -8983,10 +8984,13 @@ extract_base_bit_offset (tree base, poly_int64 *bitposp,
if (offset && poly_int_tree_p (offset))
  {
poffset = wi::to_poly_offset (offset);
-  offset = NULL_TREE;
+  *variable_offset = false;
  }
else
-poffset = 0;
+{
+  

Re: [PATCH] RISC-V: fix scalar crypto pattern

2023-12-14 Thread Jeff Law




On 12/14/23 02:48, Christoph Müllner wrote:

On Thu, Dec 14, 2023 at 1:40 AM Jeff Law  wrote:

On 12/13/23 02:03, Christoph Müllner wrote:

On Wed, Dec 13, 2023 at 9:22 AM Liao Shihua  wrote:


Some Scalar Crypto built-in functions require immediate parameters,
but register_operand is incorrectly used in their patterns.

E.g.:
 __builtin_riscv_aes64ks1i(rs1,1)
 Before:
li a5,1
aes64ks1i a0,a0,a5

Assembler messages:
Error: instruction aes64ks1i requires absolute expression

 After:
aes64ks1i a0,a0,1


Looks good to me (also tested with rv32 and rv64).
(I was actually surprised that the D03 constraint was not sufficient)

Reviewed-by: Christoph Muellner 
Tested-by: Christoph Muellner 

Nit: I would prefer to separate arguments with a comma followed by a space.
Even if the existing code was not written like that.
E.g. __builtin_riscv_sm4ed(rs1,rs2,1); -> __builtin_riscv_sm4ed(rs1, rs2, 1);

I propose to remove the builtin tests for scalar crypto and scalar bitmanip
as part of the patchset that adds the intrinsic tests (no value in
duplicated tests).


gcc/ChangeLog:

  * config/riscv/crypto.md: Use immediate_operand instead of 
register_operand.

You should mention the actual patterns changed.

I would strongly recommend adding some tests that out of range cases are
rejected (out of range constants as well as a variable for that last
argument).  I did that in my patch from June to fix this problem (which
was never acked/reviewed).


Sorry, I was not aware of this patch.
Since Jeff's patch was here first and also includes more tests, I
propose to move forward with his patch (but I'm not a maintainer!).
Therefore, I've reviewed Jeff's patch and replied to his email.

FWIW: Jeff's patch can be found here:
   https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622233.html
So my patch will need a trivial update as a couple patterns changed to 
use an iterator and thus the patch context doesn't match anymore.  It's 
a trivial fix.


jeff


Re: [PATCH v3 4/5] Add support for target_version attribute

2023-12-14 Thread Richard Sandiford
Andrew Carlotti  writes:
> This patch adds support for the "target_version" attribute to the middle
> end and the C++ frontend, which will be used to implement function
> multiversioning in the aarch64 backend.
>
> On targets that don't use the "target" attribute for multiversioning,
> there is no conflict between the "target" and "target_clones"
> attributes.  This patch therefore makes the mutual exclusion in
> C-family, D and Ada conditonal upon the value of the
> expanded_clones_attribute target hook.
>
> The "target_version" attribute is only added to C++ in this patch,
> because this is currently the only frontend which supports
> multiversioning using the "target" attribute.  Support for the
> "target_version" attribute will be extended to C at a later date.
>
> Targets that currently use the "target" attribute for function
> multiversioning (i.e. i386 and rs6000) are not affected by this patch.
>
> Ok for master?
>
> gcc/ChangeLog:
>
>   * attribs.cc (decl_attributes): Pass attribute name to target.
>   (is_function_default_version): Update comment to specify
>   incompatibility with target_version attributes.
>   * cgraphclones.cc (cgraph_node::create_version_clone_with_body):
>   Call valid_version_attribute_p for target_version attributes.
>   * defaults.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): New macro.
>   * target.def (valid_version_attribute_p): New hook.
>   * doc/tm.texi.in: Add new hook.
>   * doc/tm.texi: Regenerate.
>   * multiple_target.cc (create_dispatcher_calls): Remove redundant
>   is_function_default_version check.
>   (expand_target_clones): Use target macro to pick attribute name.
>   * targhooks.cc (default_target_option_valid_version_attribute_p):
>   New.
>   * targhooks.h (default_target_option_valid_version_attribute_p):
>   New.
>   * tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
>   target_version attributes.
>
> gcc/c-family/ChangeLog:
>
>   * c-attribs.cc (attr_target_exclusions): Make
>   target/target_clones exclusion target-dependent.
>   (attr_target_clones_exclusions): Ditto, and add target_version.
>   (attr_target_version_exclusions): New.
>   (c_common_attribute_table): Add target_version.
>   (handle_target_version_attribute): New.
>
> gcc/ada/ChangeLog:
>
>   * gcc-interface/utils.cc (attr_target_exclusions): Make
>   target/target_clones exclusion target-dependent.
>   (attr_target_clones_exclusions): Ditto.
>
> gcc/d/ChangeLog:
>
>   * d-attribs.cc (attr_target_exclusions): Make
>   target/target_clones exclusion target-dependent.
>   (attr_target_clones_exclusions): Ditto.
>
> gcc/cp/ChangeLog:
>
>   * decl2.cc (check_classfn): Update comment to include
>   target_version attributes.

The front-end changes look mechanical, so: OK with the nit below fixed
unless anyone objects in 24 hours, or asks for more time.

> diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
> index 
> f2c504ddf8d3df11abe81aec695c9eea0b39da6c..5d946c33b212c5ea50e7a73524e8c1d062280956
>  100644
> --- a/gcc/ada/gcc-interface/utils.cc
> +++ b/gcc/ada/gcc-interface/utils.cc
> @@ -145,14 +145,16 @@ static const struct attribute_spec::exclusions 
> attr_noinline_exclusions[] =
>  
>  static const struct attribute_spec::exclusions attr_target_exclusions[] =
>  {
> -  { "target_clones", true, true, true },
> +  { "target_clones", TARGET_HAS_FMV_TARGET_ATTRIBUTE,
> +TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE },
>{ NULL, false, false, false },
>  };
>  
>  static const struct attribute_spec::exclusions 
> attr_target_clones_exclusions[] =
>  {
>{ "always_inline", true, true, true },
> -  { "target", true, true, true },
> +  { "target", TARGET_HAS_FMV_TARGET_ATTRIBUTE, 
> TARGET_HAS_FMV_TARGET_ATTRIBUTE,
> +TARGET_HAS_FMV_TARGET_ATTRIBUTE },
>{ NULL, false, false, false },
>  };
>  
> diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> index 
> c7209c26acc9faf699774b0ef669ec6748b9073d..19cccf2d7ca4fdd6a46a01884393c6779333dbc5
>  100644
> --- a/gcc/attribs.cc
> +++ b/gcc/attribs.cc
> @@ -657,7 +657,8 @@ decl_attributes (tree *node, tree attributes, int flags,
>   options to the attribute((target(...))) list.  */
>if (TREE_CODE (*node) == FUNCTION_DECL
>&& current_target_pragma
> -  && targetm.target_option.valid_attribute_p (*node, NULL_TREE,
> +  && targetm.target_option.valid_attribute_p (*node,
> +   get_identifier ("target"),
> current_target_pragma, 0))
>  {
>tree cur_attr = lookup_attribute ("target", attributes);
> @@ -1241,8 +1242,9 @@ make_dispatcher_decl (const tree decl)
>return func_decl;  
>  }
>  
> -/* Returns true if decl is multi-versioned and DECL is the default function,
> -   that is it is not tagged with target specific optimization.  */
> +/* Ret

[COMMITTED] Initial libgrust build patches

2023-12-14 Thread Arthur Cohen
Hi,

This patchset contains the initial changes to add the libgrust folder to
gcc, which will later contain libraries used by our Rust frontend.

This work was done by Pierre-Emmanuel Patry as part of his work on
supporting procedural macros in our frontend. It was then tested by
Thomas Schwinge, and finally pushed by Marc Poulhiès.

Kindly,

Arthur

[PATCH 1/4] libgrust: Add ChangeLog file
[PATCH 2/4] libgrust: Add entry for maintainers
[PATCH 3/4] libgrust: Add libproc_macro and build system
[PATCH 4/4] build: Add libgrust as compilation modules



[PATCH 0/4] v3 of: Option handling: add documentation URLs

2023-12-14 Thread David Malcolm
> Hi David,
> 
> On Fri, Dec 08, 2023 at 06:35:56PM -0500, David Malcolm wrote:
> > On Tue, 2023-11-21 at 23:43 +, Joseph Myers wrote:
> > > On Tue, 21 Nov 2023, Tobias Burnus wrote:
> > > 
> > > > On 21.11.23 14:57, David Malcolm wrote:
> > > > > On Tue, 2023-11-21 at 02:09 +0100, Hans-Peter Nilsson wrote:
> > > > > > Sorry for barging in though I did try finding the relevant
> > > > > > discussion, but is committing this generated stuff necessary?
> > > > > > Is it because we don't want to depend on Python being
> > > > > > present at build time?
> > > > > Partly, yes, [...]
> > > > 
> > > > I wonder how to ensure that this remains up to date. Should there
> > > > be an item at
> > > > 
> > > > https://gcc.gnu.org/branching.html and/or
> > > > https://gcc.gnu.org/releasing.html similar to the .pot generation?
> > >
> > > My suggestion earlier in the discussion was that it should be
> > > added to the post-commit CI discussed starting at
> > >  (I
> > > think that CI is now in operation).  These are generated files
> > > that ought to be kept up to date with each commit that affects
> > > .opt files, unlike the .pot files where the expectation is that
> > > they should be up to date for releases and updated from time to
> > > time at other times for submission to the TP.
> > > 
> > I had a go at scripting the testing of this, but I am terrible at shell
> > scripts (maybe I should use Python?).  Here's what I have so far:
> > 
> > $ cat contrib/regenerate-index-urls.sh
> > 
> > set -x
> > 
> > SRC_DIR=$1
> > BUILD_DIR=$2
> > NUM_JOBS=$3
> > 
> > # FIXME: error-checking!
> > 
> > mkdir -p $BUILD_DIR || exit 1
> > cd $BUILD_DIR
> > $SRC_DIR/configure --disable-bootstrap --enable-languages=c,d,fortran || 
> > exit 2
> > make html-gcc -j$NUM_JOBS || exit 3
> > cd gcc || exit 4
> > make regenerate-opt-urls || exit 5
> > cd $SRC_DIR
> > (git diff --quiet $1) || { echo "regenerate-opt-urls needs to be run and 
> > the results committed"; exit 6; }
> > 
> > # e.g.
> > #  time bash contrib/regenerate-index-urls.sh $(pwd) $(pwd)/../build-ci 64
> > 
> > This takes about 100 seconds of wallclock on my 64-core box (mostly
> > configuring gcc, which as well as the usual sequence of unparallelized
> > tests seems to require building libiberty and lto-plugin).  Is that
> > something we want to do on every commit?  Is implementing the CI a
> > blocker for getting the patches in? (if so, I'll likely need some help)
> 
> The CI builders don't have 64 cores, but a couple of hundred seconds
> shouldn't be an issue to do on each commit (OSUOSL just got us a
> second x86_64 container builder for larger jobs). The above can easily
> be added to the existing gcc-autoregen builder:
> https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen
> https://sourceware.org/cgit/builder/tree/builder/master.cfg#n3453
> 
> Once your patch is in please feel free to sent an email to
> build...@sourceware.org
> https://sourceware.org/mailman/listinfo/buildbot
> And we'll add the above build steps and update the autotools
> Containerfile to include the fortran (gfortran?) and d (gdc?) build
> dependencies.
> 
> Cheers,
> 
> Mark

Thanks Mark.

Joseph: it seems that we have a way to add CI for this.

I refreshed the patches and successfully bootstrapped & regrtested them
on x86_64-pc-linux-gnu; here's the v3 version of them.

Are these OK for trunk, assuming I followup with adding CI for this?
(that said, I disappear for the rest of 2023 at the end of this week, so
I'd work on the CI in early January)

Thanks
Dave
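Since the dirty-tree check is the fragile step of the shell draft quoted earlier, here is how that one step might look in Python (a hypothetical sketch, not part of the patches):

```python
import subprocess
import tempfile

def check_tree_clean(src_dir):
    """Return True if the git tree at src_dir has no uncommitted changes.

    'git diff --quiet' exits 0 for a clean tree and 1 when there are
    modifications, which is the signal the CI job would turn into a
    "regenerate-opt-urls needs to be run" failure.
    """
    r = subprocess.run(['git', '-C', src_dir, 'diff', '--quiet'])
    return r.returncode == 0

def make_scratch_repo():
    """Create an empty throwaway git repository (demonstration only)."""
    d = tempfile.mkdtemp()
    subprocess.run(['git', 'init', '-q', d], check=True)
    return d
```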


David Malcolm (4):
  options: add gcc/regenerate-opt-urls.py
  Add generated .opt.urls files
  opts: add logic to generate options-urls.cc
  options: wire up options-urls.cc into gcc_urlifier

 gcc/Makefile.in  |   34 +-
 gcc/ada/gcc-interface/lang.opt.urls  |   30 +
 gcc/analyzer/analyzer.opt.urls   |  206 ++
 gcc/c-family/c.opt.urls  | 1409 ++
 gcc/common.opt.urls  | 1832 ++
 gcc/config/aarch64/aarch64.opt.urls  |   84 +
 gcc/config/alpha/alpha.opt.urls  |   76 +
 gcc/config/alpha/elf.opt.urls|2 +
 gcc/config/arc/arc-tables.opt.urls   |2 +
 gcc/config/arc/arc.opt.urls  |  260 +++
 gcc/config/arm/arm-tables.opt.urls   |2 +
 gcc/config/arm/arm.opt.urls  |  149 ++
 gcc/config/arm/vxworks.opt.urls  |2 +
 gcc/config/avr/avr.opt.urls  |   71 +
 gcc/config/bfin/bfin.opt.urls|   61 +
 gcc/config/bpf/bpf.opt.urls  |   35 +
 gcc/config/c6x/c6x-tables.opt.urls   |2 +
 gcc/config/c6x/c6x.opt.urls  |   18 +
 gcc/config/cris/cris.opt.urls|   65 +
 gcc/config/cris/elf.opt.urls |8 +
 gcc/config/csky/csky.opt.urls|  104 +
 gc

[PATCH 3/4; v2] opts: add logic to generate options-urls.cc

2023-12-14 Thread David Malcolm
Changed in v2:
- split out from the code that uses this
- now handles lang-specific URLs, as well as generic URLs
- the generated options-urls.cc now contains a function with a
  switch statement, rather than an array, to support
  lang-specific URLs:

const char *
get_opt_url_suffix (int option_index, unsigned lang_mask)
{
  switch (option_index)
{
 [...snip...]
 case OPT_B:
   if (lang_mask & CL_D)
 return "gdc/Directory-Options.html#index-B";
   return "gcc/Directory-Options.html#index-B";
[...snip...]
  return nullptr;
}
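The dispatch shape of the generated function can be reproduced in a self-contained sketch (hypothetical option indices and masks, not GCC's real OPT_/CL_ values):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for the generated OPT_ enum and CL_ language masks.
enum { OPT_B = 1, OPT_Wall = 2 };
const unsigned CL_C = 1u << 0;
const unsigned CL_D = 1u << 1;

// Same shape as the generated function: a language-specific URL is
// preferred when lang_mask matches, then the generic URL, and nullptr
// when no documentation URL is known for the option.
const char *
get_opt_url_suffix (int option_index, unsigned lang_mask)
{
  switch (option_index)
    {
    case OPT_B:
      if (lang_mask & CL_D)
        return "gdc/Directory-Options.html#index-B";
      return "gcc/Directory-Options.html#index-B";
    case OPT_Wall:
      return "gcc/Warning-Options.html#index-Wall";
    }
  return nullptr;
}
```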

gcc/ChangeLog:
* Makefile.in (ALL_OPT_URL_FILES): New.
(GCC_OBJS): Add options-urls.o.
(OBJS): Likewise.
(OBJS-libcommon): Likewise.
(s-options): Depend on $(ALL_OPT_URL_FILES), and add this to
inputs to opt-gather.awk.
(options-urls.cc): New Makefile target.
* opt-functions.awk (url_suffix): New function.
(lang_url_suffix): New function.
* options-urls-cc-gen.awk: New file.
* opts.h (get_opt_url_suffix): New decl.

Signed-off-by: David Malcolm 
---
 gcc/Makefile.in |  18 +--
 gcc/opt-functions.awk   |  15 ++
 gcc/options-urls-cc-gen.awk | 105 
 gcc/opts.h  |   4 ++
 4 files changed, 138 insertions(+), 4 deletions(-)
 create mode 100644 gcc/options-urls-cc-gen.awk

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d85953495ce8..33b29ab50280 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1273,6 +1273,8 @@ FLAGS_TO_PASS = \
 # All option source files
 ALL_OPT_FILES=$(lang_opt_files) $(extra_opt_files)
 
+ALL_OPT_URL_FILES=$(patsubst %, %.urls, $(ALL_OPT_FILES))
+
 # Target specific, C specific object file
 C_TARGET_OBJS=@c_target_objs@
 
@@ -1289,7 +1291,7 @@ FORTRAN_TARGET_OBJS=@fortran_target_objs@
 RUST_TARGET_OBJS=@rust_target_objs@
 
 # Object files for gcc many-languages driver.
-GCC_OBJS = gcc.o gcc-main.o ggc-none.o gcc-urlifier.o
+GCC_OBJS = gcc.o gcc-main.o ggc-none.o gcc-urlifier.o options-urls.o
 
 c-family-warn = $(STRICT_WARN)
 
@@ -1611,6 +1613,7 @@ OBJS = \
optinfo.o \
optinfo-emit-json.o \
options-save.o \
+   options-urls.o \
opts-global.o \
ordered-hash-map-tests.o \
passes.o \
@@ -1837,7 +1840,8 @@ OBJS-libcommon = diagnostic-spec.o diagnostic.o 
diagnostic-color.o \
 # compiler and containing target-dependent code.
 OBJS-libcommon-target = $(common_out_object_file) prefix.o \
opts.o opts-common.o options.o vec.o hooks.o common/common-targhooks.o \
-   hash-table.o file-find.o spellcheck.o selftest.o opt-suggestions.o
+   hash-table.o file-find.o spellcheck.o selftest.o opt-suggestions.o \
+   options-urls.o
 
 # This lists all host objects for the front ends.
 ALL_HOST_FRONTEND_OBJS = $(foreach v,$(CONFIG_LANGUAGES),$($(v)_OBJS))
@@ -2440,9 +2444,9 @@ s-specs : Makefile
$(STAMP) s-specs
 
 optionlist: s-options ; @true
-s-options: $(ALL_OPT_FILES) Makefile $(srcdir)/opt-gather.awk
+s-options: $(ALL_OPT_FILES) $(ALL_OPT_URL_FILES) Makefile 
$(srcdir)/opt-gather.awk
LC_ALL=C ; export LC_ALL ; \
-   $(AWK) -f $(srcdir)/opt-gather.awk $(ALL_OPT_FILES) > tmp-optionlist
+   $(AWK) -f $(srcdir)/opt-gather.awk $(ALL_OPT_FILES) 
$(ALL_OPT_URL_FILES) > tmp-optionlist
$(SHELL) $(srcdir)/../move-if-change tmp-optionlist optionlist
$(STAMP) s-options
 
@@ -2458,6 +2462,12 @@ options-save.cc: optionlist $(srcdir)/opt-functions.awk 
$(srcdir)/opt-read.awk \
   -f $(srcdir)/optc-save-gen.awk \
   -v header_name="config.h system.h coretypes.h tm.h" < $< > $@
 
+options-urls.cc: optionlist $(srcdir)/opt-functions.awk $(srcdir)/opt-read.awk 
\
+$(srcdir)/options-urls-cc-gen.awk
+   $(AWK) -f $(srcdir)/opt-functions.awk -f $(srcdir)/opt-read.awk \
+  -f $(srcdir)/options-urls-cc-gen.awk \
+  -v header_name="config.h system.h coretypes.h tm.h" < $< > $@
+
 options.h: s-options-h ; @true
 s-options-h: optionlist $(srcdir)/opt-functions.awk $(srcdir)/opt-read.awk \
 $(srcdir)/opth-gen.awk
diff --git a/gcc/opt-functions.awk b/gcc/opt-functions.awk
index a58e93815e30..c31e66f2105a 100644
--- a/gcc/opt-functions.awk
+++ b/gcc/opt-functions.awk
@@ -193,6 +193,21 @@ function var_name(flags)
return nth_arg(0, opt_args("Var", flags))
 }
 
+# If FLAGS includes a UrlSuffix flag, return the value it specifies.
+# Return the empty string otherwise.
+function url_suffix(flags)
+{
+   return nth_arg(0, opt_args("UrlSuffix", flags))
+}
+
+# If FLAGS includes a LangUrlSuffix_LANG flag, return the
+# value it specifies.
+# Return the empty string otherwise.
+function lang_url_suffix(flags, lang)
+{
+   return nth_arg(0, opt_args("LangUrlSuffix_" lang, flags))
+}
+
 # Return the name of the variable if FLAGS has a HOST_WIDE_INT variable. 
 # Return the empty string otherwise.
 function host_wide_int_var_name(flags)
di
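For illustration only, the extraction the url_suffix / lang_url_suffix awk helpers perform can be mimicked in Python; the record syntax below is a hypothetical sample, not a real .opt.urls entry:

```python
import re

def opt_args(name, flags):
    """Return the argument of a name(...) property in a flags string,
    or '' when absent (a rough analogue of the awk opt_args helper)."""
    m = re.search(r'\b%s\(([^)]*)\)' % re.escape(name), flags)
    return m.group(1) if m else ''

def url_suffix(flags):
    return opt_args('UrlSuffix', flags)

def lang_url_suffix(flags, lang):
    return opt_args('LangUrlSuffix_' + lang, flags)
```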

[PATCH 4/4; v2] options: wire up options-urls.cc into gcc_urlifier

2023-12-14 Thread David Malcolm
Changed in v2:
- split out from the code that generates options-urls.cc
- call the generated function, rather than use a generated array
- pass around lang_mask

gcc/ChangeLog:
* diagnostic.h (diagnostic_make_option_url_cb): Add lang_mask
param.
(diagnostic_context::make_option_url): Update for lang_mask param.
* gcc-urlifier.cc: Include "opts.h" and "options.h".
(gcc_urlifier::gcc_urlifier): Add lang_mask param.
(gcc_urlifier::m_lang_mask): New field.
(doc_urls): Make static.
(gcc_urlifier::get_url_for_quoted_text): Use label_text.
(gcc_urlifier::get_url_suffix_for_quoted_text): Use label_text.
Look for an option by name before trying a binary search in
doc_urls.
(gcc_urlifier::get_url_suffix_for_quoted_text): Use label_text.
(gcc_urlifier::get_url_suffix_for_option): New.
(make_gcc_urlifier): Add lang_mask param.
(selftest::gcc_urlifier_cc_tests): Update for above changes.
Verify that a URL is found for "-fpack-struct".
* gcc-urlifier.def: Drop options "--version" and "-fpack-struct".
* gcc-urlifier.h (make_gcc_urlifier): Add lang_mask param.
* gcc.cc (driver::global_initializations): Pass 0 for lang_mask
to make_gcc_urlifier.
* opts-diagnostic.h (get_option_url): Add lang_mask param.
* opts.cc (get_option_html_page): Remove special-casing for
analyzer and LTO.
(get_option_url_suffix): New.
(get_option_url): Reimplement.
(selftest::test_get_option_html_page): Rename to...
(selftest::test_get_option_url_suffix): ...this and update for
above changes.
(selftest::opts_cc_tests): Update for renaming.
* opts.h: Include "rich-location.h".
(get_option_url_suffix): New decl.

gcc/testsuite/ChangeLog:
* lib/gcc-dg.exp: Set TERM to xterm.

gcc/ChangeLog:
* toplev.cc (general_init): Pass lang_mask to urlifier.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic.h |   6 +-
 gcc/gcc-urlifier.cc  | 106 ---
 gcc/gcc-urlifier.def |   2 -
 gcc/gcc-urlifier.h   |   2 +-
 gcc/gcc.cc   |   2 +-
 gcc/opts-diagnostic.h|   3 +-
 gcc/opts.cc  |  95 ---
 gcc/opts.h   |   4 ++
 gcc/testsuite/lib/gcc-dg.exp |   6 ++
 gcc/toplev.cc|   5 +-
 10 files changed, 169 insertions(+), 62 deletions(-)

diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index 80e53ec92b06..0ee0e6485937 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -186,7 +186,8 @@ typedef char *(*diagnostic_make_option_name_cb) (const 
diagnostic_context *,
 diagnostic_t,
 diagnostic_t);
 typedef char *(*diagnostic_make_option_url_cb) (const diagnostic_context *,
-   int);
+   int,
+   unsigned);
 
 class edit_context;
 namespace json { class value; }
@@ -526,7 +527,8 @@ public:
   {
 if (!m_option_callbacks.m_make_option_url_cb)
   return nullptr;
-return m_option_callbacks.m_make_option_url_cb (this, option_index);
+return m_option_callbacks.m_make_option_url_cb (this, option_index,
+   get_lang_mask ());
   }
 
   void
diff --git a/gcc/gcc-urlifier.cc b/gcc/gcc-urlifier.cc
index 0dbff9853132..ae762a45f4a1 100644
--- a/gcc/gcc-urlifier.cc
+++ b/gcc/gcc-urlifier.cc
@@ -24,6 +24,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "pretty-print.h"
 #include "pretty-print-urlifier.h"
 #include "gcc-urlifier.h"
+#include "opts.h"
+#include "options.h"
 #include "selftest.h"
 
 namespace {
@@ -34,23 +36,34 @@ namespace {
 class gcc_urlifier : public urlifier
 {
 public:
+  gcc_urlifier (unsigned int lang_mask)
+  : m_lang_mask (lang_mask)
+  {}
+
   char *get_url_for_quoted_text (const char *p, size_t sz) const final 
override;
 
-  const char *get_url_suffix_for_quoted_text (const char *p, size_t sz) const;
+  label_text get_url_suffix_for_quoted_text (const char *p, size_t sz) const;
   /* We use ATTRIBUTE_UNUSED as this helper is called only from ASSERTs.  */
-  const char *get_url_suffix_for_quoted_text (const char *p) const 
ATTRIBUTE_UNUSED;
+  label_text get_url_suffix_for_quoted_text (const char *p) const 
ATTRIBUTE_UNUSED;
 
 private:
+  label_text get_url_suffix_for_option (const char *p, size_t sz) const;
+
   static char *
   make_doc_url (const char *doc_url_suffix);
+
+  unsigned int m_lang_mask;
 };
 
 /* class gcc_urlifier : public urlifier.  */
 
+/* Manage a hard-coded mapping from quoted string to URL suffixes
+   in gcc-urlifier.def  */
+
 #define DOC_URL(QUOTED_TEXT, URL_SUFFIX) \
   { (QUOTED_TEXT), (URL_SUFFIX) }
 
-const stru

[COMMITTED 1/4] libgrust: Add ChangeLog file

2023-12-14 Thread Arthur Cohen
From: Pierre-Emmanuel Patry 

libgrust/ChangeLog:

* ChangeLog: New file.

Signed-off-by: Pierre-Emmanuel Patry 
---
 libgrust/ChangeLog | 6 ++
 1 file changed, 6 insertions(+)
 create mode 100644 libgrust/ChangeLog

diff --git a/libgrust/ChangeLog b/libgrust/ChangeLog
new file mode 100644
index 000..97887c90552
--- /dev/null
+++ b/libgrust/ChangeLog
@@ -0,0 +1,6 @@
+
+Copyright (C) 2023 Free Software Foundation, Inc.
+
+Copying and distribution of this file, with or without modification,
+are permitted in any medium without royalty provided the copyright
+notice and this notice are preserved.
-- 
2.39.1



[PATCH 1/4; v3] options: add gcc/regenerate-opt-urls.py

2023-12-14 Thread David Malcolm
Changed in v3:
- Makefile.in: added OPT_URLS_HTML_DEPS and a comment

Changed in v2:
- added convenience targets to Makefile for regenerating the .opt.urls
  files, and for running unit tests for the generation code
- parse gdc and gfortran documentation, and create LangUrlSuffix_{lang}
directives for language-specific URLs.
- add documentation to sourcebuild.texi

gcc/ChangeLog:
* Makefile.in (OPT_URLS_HTML_DEPS): New.
(regenerate-opt-urls): New target.
(regenerate-opt-urls-unit-test): New target.
* doc/options.texi (Option properties): Add UrlSuffix and
description of regenerate-opt-urls.py.  Add LangUrlSuffix_*.
* doc/sourcebuild.texi (Anatomy of a Target Back End): Add
reference to regenerate-opt-urls.py's TARGET_SPECIFIC_PAGES.
* regenerate-opt-urls.py: New file.

Signed-off-by: David Malcolm 
---
 gcc/Makefile.in|  16 ++
 gcc/doc/options.texi   |  26 +++
 gcc/doc/sourcebuild.texi   |   4 +
 gcc/regenerate-opt-urls.py | 408 +
 4 files changed, 454 insertions(+)
 create mode 100755 gcc/regenerate-opt-urls.py

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f284c1387e27..d85953495ce8 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3611,6 +3611,22 @@ $(build_htmldir)/gccinstall/index.html: 
$(TEXI_GCCINSTALL_FILES)
DESTDIR=$(@D) \
$(SHELL) $(srcdir)/doc/install.texi2html
 
+# Regenerate the .opt.urls files from the generated html, and from the .opt
+# files.  Doing so requires all languages that have their own HTML manuals
+# to be enabled.
+.PHONY: regenerate-opt-urls
+OPT_URLS_HTML_DEPS = $(build_htmldir)/gcc/Option-Index.html \
+   $(build_htmldir)/gdc/Option-Index.html \
+   $(build_htmldir)/gfortran/Option-Index.html
+
+regenerate-opt-urls: $(srcdir)/regenerate-opt-urls.py $(OPT_URLS_HTML_DEPS)
+   $(srcdir)/regenerate-opt-urls.py $(build_htmldir) $(shell dirname 
$(srcdir))
+
+# Run the unit tests for regenerate-opt-urls.py
+.PHONY: regenerate-opt-urls-unit-test
+regenerate-opt-urls-unit-test: $(OPT_URLS_HTML_DEPS)
+   $(srcdir)/regenerate-opt-urls.py $(build_htmldir) $(shell dirname 
$(srcdir)) --unit-test
+
 MANFILES = doc/gcov.1 doc/cpp.1 doc/gcc.1 doc/gfdl.7 doc/gpl.7 \
doc/fsf-funding.7 doc/gcov-tool.1 doc/gcov-dump.1 \
   $(if $(filter yes,@enable_lto@),doc/lto-dump.1)
diff --git a/gcc/doc/options.texi b/gcc/doc/options.texi
index 715f0a1479c7..37d7ecc1477d 100644
--- a/gcc/doc/options.texi
+++ b/gcc/doc/options.texi
@@ -597,4 +597,30 @@ This warning option corresponds to @code{cpplib.h} warning 
reason code
 @var{CPP_W_Enum}.  This should only be used for warning options of the
 C-family front-ends.
 
+@item UrlSuffix(@var{url_suffix})
+Adjacent to each human-written @code{.opt} file in the source tree is
+a corresponding file with a @code{.opt.urls} extension.  These files
+contain @code{UrlSuffix} directives giving the ending part of the URL
+for the documentation of the option, such as:
+
+@smallexample
+Wabi-tag
+UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wabi-tag)
+@end smallexample
+
+These URL suffixes are relative to @code{DOCUMENTATION_ROOT_URL}.
+
+These files are generated from the @code{.opt} files and the generated
+HTML documentation by @code{regenerate-opt-urls.py}, and should be
+regenerated when adding new options by manually invoking
+@code{make regenerate-opt-urls}.
+
+@item LangUrlSuffix_@var{lang}(@var{url_suffix})
+In addition to @code{UrlSuffix} directives, @code{regenerate-opt-urls.py}
+can generate language-specific URLs, such as:
+
+@smallexample
+LangUrlSuffix_D(gdc/Code-Generation.html#index-MMD)
+@end smallexample
+
 @end table
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 26a7e9c35070..9a394b3e2c77 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -813,6 +813,10 @@ options supported by this target (@pxref{Run-time Target, 
, Run-time
 Target Specification}).  This means both entries in the summary table
 of options and details of the individual options.
 @item
+An entry in @file{gcc/regenerate-opt-urls.py}'s TARGET_SPECIFIC_PAGES
+dictionary mapping from target-specific HTML documentation pages
+to the target specific source directory.
+@item
 Documentation in @file{gcc/doc/extend.texi} for any target-specific
 attributes supported (@pxref{Target Attributes, , Defining
 target-specific uses of @code{__attribute__}}), including where the
diff --git a/gcc/regenerate-opt-urls.py b/gcc/regenerate-opt-urls.py
new file mode 100755
index ..b123fc57c7b9
--- /dev/null
+++ b/gcc/regenerate-opt-urls.py
@@ -0,0 +1,408 @@
+#!/usr/bin/env python3
+
+# Copyright (C) 2023 Free Software Foundation, Inc.
+#
+# Script to regenerate FOO.opt.urls files for each FOO.opt in the
+# source tree.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public L

[COMMITTED 2/4] libgrust: Add entry for maintainers

2023-12-14 Thread Arthur Cohen
From: Pierre-Emmanuel Patry 

ChangeLog:

* MAINTAINERS: Add maintainers for libgrust.

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Add libgrust.

Co-authored-by: Arthur Cohen 
Signed-off-by: Pierre-Emmanuel Patry 
---
 MAINTAINERS | 1 +
 contrib/gcc-changelog/git_commit.py | 1 +
 2 files changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e877396dc0e..343560c5b84 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -182,6 +182,7 @@ libgo   Ian Lance Taylor

 libgompJakub Jelinek   
 libgompTobias Burnus   

 libgomp (OpenACC)  Thomas Schwinge 
+libgrust   All Rust front end maintainers
 libiberty  Ian Lance Taylor
 libitm Torvald Riegel  
 libobjcNicola Pero 

diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index 9110317a759..4e601fa1f63 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -69,6 +69,7 @@ default_changelog_locations = {
 'libgfortran',
 'libgm2',
 'libgomp',
+'libgrust',
 'libhsail-rt',
 'libiberty',
 'libitm',
-- 
2.39.1



[COMMITTED 4/4] build: Add libgrust as compilation modules

2023-12-14 Thread Arthur Cohen
From: Pierre-Emmanuel Patry 

Define the libgrust directory as a host compilation module as well as
for targets. Disable target libgrust if we're not building target
libstdc++.

ChangeLog:

* Makefile.def: Add libgrust as host & target module.
* configure.ac: Add libgrust to host tools list. Add libgrust to
noconfigdirs if we're not building target libstdc++.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/rust/ChangeLog:

* config-lang.in: Add libgrust as a target module for the rust
language.

Co-authored-by: Thomas Schwinge 
Signed-off-by: Pierre-Emmanuel Patry 


[PATCH] tree-optimization/112793 - SLP of constant/external code-generated twice

2023-12-14 Thread Richard Biener
The following makes it well-formed to attempt code-generating a
constant/external SLP node twice, which can happen when partitioning
BB vectorization attempts, where we keep constants/externals
unpartitioned.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
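The idea behind the fix can be sketched roughly as follows (invented
structures, not the GCC internals): during a DFS code-generation walk
over a partitioned graph, a shared constant/external leaf may be
reached from several roots, so generating its defs has to be an
idempotent operation rather than an assertion failure.

```cpp
#include <vector>

// Toy stand-in for an SLP node; vec_defs is filled once, on the
// first visit, and a second visit becomes a no-op.
struct slp_node
{
  std::vector<slp_node *> children;
  std::vector<int> vec_defs;
  bool is_constant = false;
};

void schedule (slp_node *node)
{
  if (node->is_constant)
    {
      // A later DFS visit finds the defs already created and returns.
      if (node->vec_defs.empty ())
        node->vec_defs.push_back (42);  // materialize the constant once
      return;
    }
  for (slp_node *child : node->children)
    schedule (child);
}
```

With two roots sharing one constant leaf, scheduling both roots still
creates the leaf's defs exactly once.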

PR tree-optimization/112793
* tree-vect-slp.cc (vect_schedule_slp_node): Already
code-generated constant/external nodes are OK.

* g++.dg/vect/pr112793.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr112793.cc | 32 +++
 gcc/tree-vect-slp.cc  | 17 +++---
 2 files changed, 41 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr112793.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr112793.cc 
b/gcc/testsuite/g++.dg/vect/pr112793.cc
new file mode 100644
index 000..258d7c1b111
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr112793.cc
@@ -0,0 +1,32 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-additional-options "-march=znver2" { target x86_64-*-* i?86-*-* } }
+
+typedef double T;
+T c, s;
+T a[16];
+struct Matrix4 {
+  Matrix4(){}
+  Matrix4(T e, T f, T i, T j) {
+r[1] = r[4] = e;
+r[5] = f;
+r[8] = i;
+r[9] = j;
+  }
+  Matrix4 operator*(Matrix4 a) {
+return Matrix4(
+   r[0] * a.r[4] + r[4] + r[15] + r[6],
+   r[1] * a.r[4] + 1 + 2 + 3,  r[0] * r[8] + 1 + 2 + 3,
+   r[1] * r[8] + r[1] + r[14] + r[2] * r[3]);
+  }
+  T r[16] = {};
+};
+Matrix4 t1, t2;
+Matrix4 tt;
+Matrix4 getRotAltAzToEquatorial()
+{
+  t2.r[4] =  0;
+  t1.r[1] =  -s;
+  t1.r[8] = 0;
+  return t1 * t2;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b412ec38802..a82fca45161 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9062,13 +9062,6 @@ vect_schedule_slp_node (vec_info *vinfo,
   int i;
   slp_tree child;
 
-  /* For existing vectors there's nothing to do.  */
-  if (SLP_TREE_DEF_TYPE (node) == vect_external_def
-  && SLP_TREE_VEC_DEFS (node).exists ())
-return;
-
-  gcc_assert (SLP_TREE_VEC_DEFS (node).is_empty ());
-
   /* Vectorize externals and constants.  */
   if (SLP_TREE_DEF_TYPE (node) == vect_constant_def
   || SLP_TREE_DEF_TYPE (node) == vect_external_def)
@@ -9079,10 +9072,18 @@ vect_schedule_slp_node (vec_info *vinfo,
   if (!SLP_TREE_VECTYPE (node))
return;
 
-  vect_create_constant_vectors (vinfo, node);
+  /* There are two reasons vector defs might already exist.  The first
+is that we are vectorizing an existing vector def.  The second is
+when performing BB vectorization shared constant/external nodes
+are not split apart during partitioning so during the code-gen
+DFS walk we can end up visiting them twice.  */
+  if (! SLP_TREE_VEC_DEFS (node).exists ())
+   vect_create_constant_vectors (vinfo, node);
   return;
 }
 
+  gcc_assert (SLP_TREE_VEC_DEFS (node).is_empty ());
+
   stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
 
   gcc_assert (SLP_TREE_NUMBER_OF_VEC_STMTS (node) != 0);
-- 
2.35.3


Re: [committed] d: Merge upstream dmd, druntime 2bbf64907c, phobos b64bfbf91

2023-12-14 Thread Rainer Orth
Thomas Schwinge  writes:

> On 2023-12-11T11:13:28+0100, Iain Buclaw  wrote:
>> Excerpts from Iain Buclaw's message of Dezember 11, 2023 11:07 am:
>>> This patch merges the D front-end and runtime library with upstream dmd
>>> 2bbf64907c, and the standard library with phobos b64bfbf91.
>>>
>>> Synchronizing with the upstream release of v2.106.0.
>
>> [...] should
>> fix that linker problem when using gdc-9 for bootstrapping.
>
> Thanks, confirmed for x86_64-pc-linux-gnu bootstrap with GCC 9.

Also confirmed on i386-pc-solaris2.11 and sparc-sun-solaris2.11.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Middle-end: Do not model address cost for SELECT_VL style vectorization

2023-12-14 Thread Richard Biener
On Thu, 14 Dec 2023, Juzhe-Zhong wrote:

> Follow Richard's suggestions, we should not model address cost in the loop
> vectorizer for select_vl or decrement IV since other style vectorization 
> doesn't
> do that.
> 
> To make cost model comparison apple to apple.
> This patch set COST from 2 to 1 which turns out have better codegen
> in various codegen for RVV.
> 
> Ok for trunk ?

OK with me.

Richard.

>   PR target/53
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Remove 
> address cost for select_vl/decrement IV.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/costmodel/riscv/rvv/pr53.c: Moved to...
>   * gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c: ...here.
>   * gcc.dg/vect/costmodel/riscv/rvv/pr53-1.c: New test.
> 
> ---
>  .../vect/costmodel/riscv/rvv/pr53-1.c  | 18 ++
>  .../riscv/rvv/{pr53.c => pr11153-2.c}  |  4 ++--
>  gcc/tree-vect-loop.cc  | 10 --
>  3 files changed, 24 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53-1.c
>  rename gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/{pr53.c => 
> pr11153-2.c} (93%)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53-1.c 
> b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53-1.c
> new file mode 100644
> index 000..51c91f7410c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
> -mtune=generic-ooo -ffast-math" } */
> +
> +#define DEF_REDUC_PLUS(TYPE) 
>   \
> +  TYPE __attribute__ ((noinline, noclone))   
>   \
> +  reduc_plus_##TYPE (TYPE *__restrict a, int n)  
>   \
> +  {  
>   \
> +TYPE r = 0;  
>   \
> +for (int i = 0; i < n; ++i)  
>   \
> +  r += a[i]; 
>   \
> +return r;
>   \
> +  }
> +
> +#define TEST_PLUS(T) T (int) T (float)
> +
> +TEST_PLUS (DEF_REDUC_PLUS)
> +
> +/* { dg-final { scan-assembler-not {vsetivli\s+zero,\s*4} } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53.c 
> b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c
> similarity index 93%
> rename from gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53.c
> rename to gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c
> index 06e08ec5f2e..d361f1fc7fa 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr53.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
> -mtune=generic-ooo" } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
> -ffast-math" } */
>  
>  #define DEF_REDUC_PLUS(TYPE) 
>   \
>TYPE __attribute__ ((noinline, noclone))   
>   \
> @@ -11,7 +11,7 @@
>  return r;
>   \
>}
>  
> -#define TEST_PLUS(T) T (int)
> +#define TEST_PLUS(T) T (int) T (float)
>  
>  TEST_PLUS (DEF_REDUC_PLUS)
>  
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 19e38b8637b..7a3db5f098b 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4872,12 +4872,10 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>  
>   unsigned int length_update_cost = 0;
>   if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> -   /* For decrement IV style, we use a single SELECT_VL since
> -  beginning to calculate the number of elements need to be
> -  processed in current iteration, and a SHIFT operation to
> -  compute the next memory address instead of adding vectorization
> -  factor.  */
> -   length_update_cost = 2;
> +   /* For decrement IV style, each iteration only needs a single
> +  SELECT_VL or MIN to calculate the number of elements to be
> +  processed in the current iteration.  */
> +   length_update_cost = 1;
>   else
> /* For increment IV stype, Each may need two MINs and one MINUS to
>update lengths in body for next iteration.  */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-14 Thread Richard Earnshaw (lists)
On 09/12/2023 15:39, Lipeng Zhu wrote:
> This patch tries to introduce an rwlock and to split the read/write
> accesses to the unit_root tree and unit_cache, using the rwlock
> instead of the mutex, to increase CPU efficiency. In the get_gfc_unit
> function, the percentage of calls that step into the insert_unit
> function is around 30%; in most instances we can get the unit while
> merely reading the unit_cache or unit_root tree. So splitting the
> read and write phases with an rwlock is an approach to make it more
> parallel.
> 
> BTW, the IPC metrics can gain around 9x in our test
> server with 220 cores. The benchmark we used is
> https://github.com/rwesson/NEAT
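The read-mostly locking pattern described in the commit message can be
sketched as follows. This is an invented, minimal illustration — not
the libgfortran code, and the names (`get_unit`, `cache`) are made up:
take the rwlock for reading on the common lookup path, and only
upgrade to a write lock when the entry is missing and must be
inserted, re-checking after the upgrade.

```cpp
#include <pthread.h>
#include <cstddef>

static pthread_rwlock_t cache_rwlock = PTHREAD_RWLOCK_INITIALIZER;
static int cache[16];          // toy stand-in for the unit cache

static int *lookup_locked (int unit)
{
  for (size_t i = 0; i < 16; i++)
    if (cache[i] == unit)
      return &cache[i];
  return nullptr;
}

int *get_unit (int unit)
{
  pthread_rwlock_rdlock (&cache_rwlock);   // RDLOCK: most calls end here
  int *p = lookup_locked (unit);
  pthread_rwlock_unlock (&cache_rwlock);
  if (p)
    return p;

  pthread_rwlock_wrlock (&cache_rwlock);   // RD_TO_WRLOCK in the patch's terms
  p = lookup_locked (unit);                // re-check: another writer may have
                                           // inserted the unit meanwhile
  if (!p)
    for (size_t i = 0; i < 16; i++)
      if (cache[i] == 0)
        {
          cache[i] = unit;
          p = &cache[i];
          break;
        }
  pthread_rwlock_unlock (&cache_rwlock);
  return p;
}
```

Because readers no longer serialize against each other, the lookup
fast path scales with core count while writers still get exclusion.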
> 
> libgcc/ChangeLog:
> 
>   * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
>   (__gthrw): New function.
>   (__gthread_rwlock_rdlock): New function.
>   (__gthread_rwlock_tryrdlock): New function.
>   (__gthread_rwlock_wrlock): New function.
>   (__gthread_rwlock_trywrlock): New function.
>   (__gthread_rwlock_unlock): New function.
> 
> libgfortran/ChangeLog:
> 
>   * io/async.c (DEBUG_LINE): New macro.
>   * io/async.h (RWLOCK_DEBUG_ADD): New macro.
>   (CHECK_RDLOCK): New macro.
>   (CHECK_WRLOCK): New macro.
>   (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
>   (IN_RWLOCK_DEBUG_QUEUE): New macro.
>   (RDLOCK): New macro.
>   (WRLOCK): New macro.
>   (RWUNLOCK): New macro.
>   (RD_TO_WRLOCK): New macro.
>   (INTERN_RDLOCK): New macro.
>   (INTERN_WRLOCK): New macro.
>   (INTERN_RWUNLOCK): New macro.
>   * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
>   a comment.
>   (unit_lock): Remove including associated internal_proto.
>   (unit_rwlock): New declarations including associated internal_proto.
>   (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
>   instead of __gthread_mutex_lock and __gthread_mutex_unlock on
>   unit_lock.
>   * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
>   unit_rwlock instead of LOCK and UNLOCK on unit_lock.
>   (st_write_done_worker): Likewise.
>   * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
>   comment. Use unit_rwlock variable instead of unit_lock variable.
>   (get_gfc_unit_from_unit_root): New function.
>   (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
>   instead of LOCK and UNLOCK on unit_lock.
>   (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
>   LOCK and UNLOCK on unit_lock.
>   (close_units): Likewise.
>   (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
>   unit_lock.
>   * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
>   instead of LOCK and UNLOCK on unit_lock.
>   (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
>   of LOCK and UNLOCK on unit_lock.
> 

It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran
/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function 
‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error
: implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration]
 1023 |   WRLOCK (&unit_rwlock);
  |   ^~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error
: implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration]
 1025 |   RWUNLOCK (&unit_rwlock);
  |   ^~~~


R.

> ---
> v1 -> v2:
> Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined.
> 
> v2 -> v3:
> Rebase the patch with trunk branch.
> 
> v3 -> v4:
> Update the comments.
> 
> v4 -> v5:
> Fix typos and code formatter.
> 
> v5 -> v6:
> Add unit tests.
> 
> v6 -> v7:
> Update ChangeLog and code formatter.
> 
> Reviewed-by: Hongjiu Lu 
> Reviewed-by: Bernhard Reutner-Fischer 
> Reviewed-by: Thomas Koenig 
> Reviewed-by: Jakub Jelinek 
> Signed-off-by: Lipeng Zhu 
> ---
>  libgcc/gthr-posix.h   |  60 +++
>  libgfortran/io/async.c|   4 +
>  libgfortran/io/async.h| 151 ++
>  libgfortran/io/io.h   |  15 +-
>  libgfortran/io/transfer.c |   8 +-
>  libgfortran/io/unit.c | 117 +-
>  libgfortran/io/unix.c |  16 +-
>  .../testsuite/libgomp.fortran/rwlock_1.f90|  33 
>  .../testsuite/libgomp.fortran/rwlock_2.f90|  22 +++
>  .../testsuite/libgomp.fortran/rwlock_3.f90|  18 +++
>  10 files changed, 386 insertions(+), 58 deletions(-)
>  create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_1.f90
>  create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_2.f90
>  create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_3.f90
> 
> diff --git a/libgcc/

[PATCH] tree-optimization/113018 - ICE with BB reduction vectorization

2023-12-14 Thread Richard Biener
When BB reduction vectorization picks up a chain with an ASM def
in it that is inside the vectorized region, we fail to get its
LHS.  Instead of trying to get the correct def, the following
avoids vectorizing such a def and instead keeps it as a def to add
in the epilog.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
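The remainder-in-epilog idea can be illustrated with a small sketch
(invented names, not the tree-vect-slp.cc code): while building a
reduction chain, operands that cannot or should not be vectorized are
set aside in a `remain` list, and the epilogue folds them into the
reduction scalarly.

```cpp
#include <vector>
#include <numeric>

struct split_result
{
  std::vector<int> vectorized;  // operands handed to SLP discovery
  std::vector<int> remain;      // operands deferred to the epilogue
};

split_result split_chain (const std::vector<int> &chain,
                          bool (*vectorizable) (int))
{
  split_result r;
  for (int op : chain)
    (vectorizable (op) ? r.vectorized : r.remain).push_back (op);
  return r;
}

// Epilogue: reduce the vectorized part, then fold in the remainder.
int reduce (const split_result &r)
{
  int sum = std::accumulate (r.vectorized.begin (), r.vectorized.end (), 0);
  return std::accumulate (r.remain.begin (), r.remain.end (), sum);
}
```

Either way the final reduction value is the same; only where each
operand is accumulated differs.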

PR tree-optimization/113018
* tree-vect-slp.cc (vect_slp_check_for_roots): Only start
SLP discovery from stmts with a LHS.
---
 gcc/tree-vect-slp.cc | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index f8a168caa60..a82fca45161 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -7419,7 +7419,12 @@ vect_slp_check_for_roots (bb_vec_info bb_vinfo)
  invalid = true;
  break;
}
- if (chain[i].dt != vect_internal_def)
+ if (chain[i].dt != vect_internal_def
+ /* Avoid stmts where the def is not the LHS, like
+ASMs.  */
+ || (gimple_get_lhs (bb_vinfo->lookup_def
+ (chain[i].op)->stmt)
+ != chain[i].op))
remain_cnt++;
}
  if (!invalid && chain.length () - remain_cnt > 1)
@@ -7431,8 +7436,11 @@ vect_slp_check_for_roots (bb_vec_info bb_vinfo)
remain.create (remain_cnt);
  for (unsigned i = 0; i < chain.length (); ++i)
{
- if (chain[i].dt == vect_internal_def)
-   stmts.quick_push (bb_vinfo->lookup_def (chain[i].op));
+ stmt_vec_info stmt_info;
+ if (chain[i].dt == vect_internal_def
+ && ((stmt_info = bb_vinfo->lookup_def (chain[i].op)),
+ gimple_get_lhs (stmt_info->stmt) == chain[i].op))
+   stmts.quick_push (stmt_info);
  else
remain.quick_push (chain[i].op);
}
-- 
2.35.3


Re: [PATCH] expmed: Get vec_extract element mode from insn_data, [PR112999]

2023-12-14 Thread Robin Dapp
> It looks like:
> 
>   FOR_EACH_MODE_FROM (new_mode, new_mode)
>   if (known_eq (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (GET_MODE (op0)))
>   && known_eq (GET_MODE_UNIT_SIZE (new_mode), GET_MODE_SIZE (tmode))
>   && targetm.vector_mode_supported_p (new_mode)
>   && targetm.modes_tieable_p (GET_MODE (op0), new_mode))
> break;
> 
> should at least test whether the bitpos is a multiple of
> GET_MODE_UNIT_SIZE (new_mode), otherwise the new mode isn't really
> better.  Arguably it should also test whether bitnum is equal
> to GET_MODE_UNIT_SIZE (new_mode).
> 
> Not sure whether there'll be any fallout from that, but it seems
> worth trying.

Thanks, bootstrapped and regtested the attached v2 without fallout
on x86, aarch64 and power10.  Tested on riscv.

Regards
 Robin

Subject: [PATCH v2] expmed: Compare unit_precision for better mode.

In extract_bit_field_1 we try to get a better vector mode before
extracting from it.  Better refers to the case when the requested target
mode does not equal the inner mode of the vector to extract from and we
have an equivalent tieable vector mode with a fitting inner mode.

On riscv this triggered an ICE (PR112999) because we would take the
detour of extracting from a mask-mode vector via a vector integer mode.
One element of that mode could be subreg-punned with TImode which, in
turn, would need to be operated on in DImode chunks.

This patch adds

&& known_eq (bitsize, GET_MODE_UNIT_PRECISION (new_mode))
&& multiple_p (bitnum, GET_MODE_UNIT_PRECISION (new_mode))

to the list of criteria for a better mode.

gcc/ChangeLog:

PR target/112999

* expmed.cc (extract_bit_field_1):  Ensure better mode
has fitting unit_precision.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112999.c: New test.
---
 gcc/expmed.cc   |  2 ++
 .../gcc.target/riscv/rvv/autovec/pr112999.c | 17 +
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index d75314096b4..05331dd5d82 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -1745,6 +1745,8 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
poly_uint64 bitnum,
   FOR_EACH_MODE_FROM (new_mode, new_mode)
if (known_eq (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (GET_MODE (op0)))
&& known_eq (GET_MODE_UNIT_SIZE (new_mode), GET_MODE_SIZE (tmode))
+   && known_eq (bitsize, GET_MODE_UNIT_PRECISION (new_mode))
+   && multiple_p (bitnum, GET_MODE_UNIT_PRECISION (new_mode))
&& targetm.vector_mode_supported_p (new_mode)
&& targetm.modes_tieable_p (GET_MODE (op0), new_mode))
  break;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c
new file mode 100644
index 000..c049c5a0386
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl512b -mabi=lp64d 
--param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax -O3 
-fno-vect-cost-model -fno-tree-loop-distribute-patterns" } */
+
+int a[1024];
+int b[1024];
+
+_Bool
+fn1 ()
+{
+  _Bool tem;
+  for (int i = 0; i < 1024; ++i)
+{
+  tem = !a[i];
+  b[i] = tem;
+}
+  return tem;
+}
-- 
2.43.0



Re: [PATCH] expmed: Get vec_extract element mode from insn_data, [PR112999]

2023-12-14 Thread Richard Sandiford
Robin Dapp  writes:
>> It looks like:
>> 
>>   FOR_EACH_MODE_FROM (new_mode, new_mode)
>>  if (known_eq (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (GET_MODE (op0)))
>>  && known_eq (GET_MODE_UNIT_SIZE (new_mode), GET_MODE_SIZE (tmode))
>>  && targetm.vector_mode_supported_p (new_mode)
>>  && targetm.modes_tieable_p (GET_MODE (op0), new_mode))
>>break;
>> 
>> should at least test whether the bitpos is a multiple of
>> GET_MODE_UNIT_SIZE (new_mode), otherwise the new mode isn't really
>> better.  Arguably it should also test whether bitnum is equal
>> to GET_MODE_UNIT_SIZE (new_mode).
>> 
>> Not sure whether there'll be any fallout from that, but it seems
>> worth trying.
>
> Thanks, bootstrapped and regtested the attached v2 without fallout
> on x86, aarch64 and power10.  Tested on riscv.

Nice.  No fallout on three targets seems promising.

> Regards
>  Robin
>
> Subject: [PATCH v2] expmed: Compare unit_precision for better mode.
>
> In extract_bit_field_1 we try to get a better vector mode before
> extracting from it.  Better refers to the case when the requested target
> mode does not equal the inner mode of the vector to extract from and we
> have an equivalent tieable vector mode with a fitting inner mode.
>
> On riscv this triggered an ICE (PR112999) because we would take the
> detour of extracting from a mask-mode vector via a vector integer mode.
> One element of that mode could be subreg-punned with TImode which, in
> turn, would need to be operated on in DImode chunks.
>
> This patch adds
>
> && known_eq (bitsize, GET_MODE_UNIT_PRECISION (new_mode))
> && multiple_p (bitnum, GET_MODE_UNIT_PRECISION (new_mode))
>
> to the list of criteria for a better mode.
>
> gcc/ChangeLog:
>
>   PR target/112999
>
>   * expmed.cc (extract_bit_field_1):  Ensure better mode
>   has fitting unit_precision.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/rvv/autovec/pr112999.c: New test.

OK, thanks.

Richard

> ---
>  gcc/expmed.cc   |  2 ++
>  .../gcc.target/riscv/rvv/autovec/pr112999.c | 17 +
>  2 files changed, 19 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c
>
> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> index d75314096b4..05331dd5d82 100644
> --- a/gcc/expmed.cc
> +++ b/gcc/expmed.cc
> @@ -1745,6 +1745,8 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
> poly_uint64 bitnum,
>FOR_EACH_MODE_FROM (new_mode, new_mode)
>   if (known_eq (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (GET_MODE (op0)))
>   && known_eq (GET_MODE_UNIT_SIZE (new_mode), GET_MODE_SIZE (tmode))
> + && known_eq (bitsize, GET_MODE_UNIT_PRECISION (new_mode))
> + && multiple_p (bitnum, GET_MODE_UNIT_PRECISION (new_mode))
>   && targetm.vector_mode_supported_p (new_mode)
>   && targetm.modes_tieable_p (GET_MODE (op0), new_mode))
> break;
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c
> new file mode 100644
> index 000..c049c5a0386
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112999.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvl512b -mabi=lp64d 
> --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax 
> -O3 -fno-vect-cost-model -fno-tree-loop-distribute-patterns" } */
> +
> +int a[1024];
> +int b[1024];
> +
> +_Bool
> +fn1 ()
> +{
> +  _Bool tem;
> +  for (int i = 0; i < 1024; ++i)
> +{
> +  tem = !a[i];
> +  b[i] = tem;
> +}
> +  return tem;
> +}


Re: [PATCH] libstdc++: Make __gnu_debug::vector usable in constant expressions [PR109536]

2023-12-14 Thread Jonathan Wakely
On Wed, 6 Dec 2023 at 14:30, Jonathan Wakely wrote:
>
> Any comments on this approach?

Pushed to trunk now.

>
> -- >8 --
>
> This makes constexpr std::vector (mostly) work in Debug Mode. All safe
> iterator instrumentation and checking is disabled during constant
> evaluation, because it requires mutex locks and calls to non-inline
> functions defined in libstdc++.so. It should be OK to disable the safety
> checks, because most UB should be detected during constant evaluation
> anyway.
>
> We could try to enable the full checking in constexpr, but it would mean
> wrapping all the non-inline functions like _M_attach with an inline
> _M_constexpr_attach that does the iterator housekeeping inline without
> mutex locks when calling for constant evaluation, and calls the
> non-inline function at runtime. That could be done in future if we find
> that we've lost safety or useful checking by disabling the safe
> iterators.
>
> There are a few test failures in C++20 mode, which I'm unable to
> explain. The _Safe_iterator::operator++() member gives errors for using
> non-constexpr functions during constant evaluation, even though those
> functions are guarded by std::is_constant_evaluated() checks. The same
> code works fine for C++23 and up.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/109536
> * include/bits/c++config (__glibcxx_constexpr_assert): Remove
> macro.
> * include/bits/stl_algobase.h (__niter_base, __copy_move_a)
> (__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
> (__lexicographical_compare_aux): Add constexpr to overloads for
> debug mode iterators.
> * include/debug/helper_functions.h (__unsafe): Add constexpr.
> * include/debug/macros.h (_GLIBCXX_DEBUG_VERIFY_COND_AT): Remove
> macro, folding it into ...
> (_GLIBCXX_DEBUG_VERIFY_AT_F): ... here. Do not use
> __glibcxx_constexpr_assert.
> * include/debug/safe_base.h (_Safe_iterator_base): Add constexpr
> to some member functions. Omit attaching, detaching and checking
> operations during constant evaluation.
> * include/debug/safe_container.h (_Safe_container): Likewise.
> * include/debug/safe_iterator.h (_Safe_iterator): Likewise.
> * include/debug/safe_iterator.tcc (__niter_base, __copy_move_a)
> (__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
> (__lexicographical_compare_aux): Add constexpr.
> * include/debug/vector (_Safe_vector, vector): Add constexpr.
> Omit safe iterator operations during constant evaluation.
> * testsuite/23_containers/vector/bool/capacity/constexpr.cc:
> Remove dg-xfail-if for debug mode.
> * testsuite/23_containers/vector/bool/cmp_c++20.cc: Likewise.
> * testsuite/23_containers/vector/bool/cons/constexpr.cc:
> Likewise.
> * testsuite/23_containers/vector/bool/element_access/1.cc:
> Likewise.
> * testsuite/23_containers/vector/bool/element_access/constexpr.cc:
> Likewise.
> * testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc:
> Likewise.
> * testsuite/23_containers/vector/bool/modifiers/constexpr.cc:
> Likewise.
> * testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc:
> Likewise.
> * testsuite/23_containers/vector/capacity/constexpr.cc:
> Likewise.
> * testsuite/23_containers/vector/cmp_c++20.cc: Likewise.
> * testsuite/23_containers/vector/cons/constexpr.cc: Likewise.
> * testsuite/23_containers/vector/data_access/constexpr.cc:
> Likewise.
> * testsuite/23_containers/vector/element_access/constexpr.cc:
> Likewise.
> * testsuite/23_containers/vector/modifiers/assign/constexpr.cc:
> Likewise.
> * testsuite/23_containers/vector/modifiers/constexpr.cc:
> Likewise.
> * testsuite/23_containers/vector/modifiers/swap/constexpr.cc:
> Likewise.
> ---
>  libstdc++-v3/include/bits/c++config   |   9 -
>  libstdc++-v3/include/bits/stl_algobase.h  |  15 ++
>  libstdc++-v3/include/debug/helper_functions.h |   1 +
>  libstdc++-v3/include/debug/macros.h   |   9 +-
>  libstdc++-v3/include/debug/safe_base.h|  35 +++-
>  libstdc++-v3/include/debug/safe_container.h   |  15 +-
>  libstdc++-v3/include/debug/safe_iterator.h| 186 +++---
>  libstdc++-v3/include/debug/safe_iterator.tcc  |  15 ++
>  libstdc++-v3/include/debug/vector | 146 --
>  .../vector/bool/capacity/constexpr.cc |   1 -
>  .../23_containers/vector/bool/cmp_c++20.cc|   1 -
>  .../vector/bool/cons/constexpr.cc |   1 -
>  .../vector/bool/element_access/1.cc   |   1 -
>  .../vector/bool/element_access/constexpr.cc   |   1 -
>  .../vector/bool/modifiers/assign/constexpr.cc |   1 -
>  .../vector/bool/modifiers/constexpr.cc|   1 -

[PATCH] doc: Document AArch64-specific asm operand modifiers

2023-12-14 Thread Alex Coplan
Hi,

As it stands, GCC doesn't document any public AArch64-specific operand
modifiers for use in inline asm.  This patch fixes that by documenting
an initial set of public AArch64-specific operand modifiers.

Tested with make html and checking the output looks OK in a browser.

OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

* doc/extend.texi: Document AArch64 Operand Modifiers.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e8b5e771f7a..6ade36759ee 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -11723,6 +11723,31 @@ operand as if it were a memory reference.
 @tab @code{%l0}
 @end multitable
 
+@anchor{aarch64Operandmodifiers}
+@subsubsection AArch64 Operand Modifiers
+
+The following table shows the modifiers supported by AArch64 and their effects:
+
+@multitable @columnfractions .10 .90
+@headitem Modifier @tab Description
+@item @code{w} @tab Print a 32-bit general-purpose register name or, given a
+constant zero operand, the 32-bit zero register (@code{wzr}).
+@item @code{x} @tab Print a 64-bit general-purpose register name or, given a
+constant zero operand, the 64-bit zero register (@code{xzr}).
+@item @code{b} @tab Print an FP/SIMD register name with a @code{b} (byte, 8-bit)
+prefix.
+@item @code{h} @tab Print an FP/SIMD register name with an @code{h} (halfword,
+16-bit) prefix.
+@item @code{s} @tab Print an FP/SIMD register name with an @code{s} (single
+word, 32-bit) prefix.
+@item @code{d} @tab Print an FP/SIMD register name with a @code{d} (doubleword,
+64-bit) prefix.
+@item @code{q} @tab Print an FP/SIMD register name with a @code{q} (quadword,
+128-bit) prefix.
+@item @code{Z} @tab Print an FP/SIMD register name as an SVE register (i.e. with
+a @code{z} prefix).  This is a no-op for SVE register operands.
+@end multitable
+
 @anchor{x86Operandmodifiers}
 @subsubsection x86 Operand Modifiers
 

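For illustration, the modifiers in the table combine with GCC extended asm operands as below (an AArch64-only sketch that must be built with an AArch64 compiler; the @code{r} and @code{w} constraints are the usual general-purpose and FP/SIMD register constraints):

```cpp
// AArch64-only illustration of the documented operand modifiers.
#include <cstdint>

uint32_t add_w (uint32_t a, uint32_t b)
{
  uint32_t res;
  // %w prints the 32-bit general-purpose register name (w0, w1, ...).
  asm ("add %w0, %w1, %w2" : "=r" (res) : "r" (a), "r" (b));
  return res;
}

float mov_s (float x)
{
  float res;
  // %s prints the 32-bit FP/SIMD register name (s0, s1, ...).
  asm ("fmov %s0, %s1" : "=w" (res) : "w" (x));
  return res;
}
```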

Re: [PATCH v3] AArch64: Add inline memmove expansion

2023-12-14 Thread Richard Sandiford
Sorry, only just realised that I've never replied to this :(

Wilco Dijkstra  writes:
> Hi Richard,
>
>> +  rtx load[max_ops], store[max_ops];
>>
>> Please either add a comment explaining why 40 is guaranteed to be
>> enough, or (my preference) use:
>>
>>  auto_vec<std::pair<rtx, rtx>, ...> ops;
>
> I've changed to using auto_vec since that should help reduce conflicts
> with Alex' LDP changes. I double-checked maximum number of instructions,
> with a minor tweak to handle AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS
> it can now be limited to 12 if you also select -mstrict-align.
>
> v3: update after review, use auto_vec, tweak max_copy_size, add another test.
>
> Add support for inline memmove expansions.  The generated code is identical
> as for memcpy, except that all loads are emitted before stores rather than
> being interleaved.  The maximum size is 256 bytes which requires at most 16
> registers.
>
> Passes regress/bootstrap, OK for commit?
>
> gcc/ChangeLog/
> * config/aarch64/aarch64.opt (aarch64_mops_memmove_size_threshold):
> Change default.
> * config/aarch64/aarch64.md (cpymemdi): Add a parameter.
> (movmemdi): Call aarch64_expand_cpymem.
> * config/aarch64/aarch64.cc (aarch64_copy_one_block): Rename function,
> simplify, support storing generated loads/stores.
> (aarch64_expand_cpymem): Support expansion of memmove.
> * config/aarch64/aarch64-protos.h (aarch64_expand_cpymem): Add bool 
> arg.
>
> gcc/testsuite/ChangeLog/
> * gcc.target/aarch64/memmove.c: Add new test.
> * gcc.target/aarch64/memmove.c: Likewise.

OK, thanks.  I since added a comment:

 ??? Although it would be possible to use LDP/STP Qn in streaming mode
 (so using TARGET_BASE_SIMD instead of TARGET_SIMD), it isn't clear
 whether that would improve performance.  */

which now belongs...

>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> d2718cc87b306e9673b166cc40e0af2ba72aa17b..d958b181d79440ab1b4f274cc188559edc04c628
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -769,7 +769,7 @@ bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
>  tree aarch64_vector_load_decl (tree);
>  void aarch64_expand_call (rtx, rtx, rtx, bool);
>  bool aarch64_expand_cpymem_mops (rtx *, bool);
> -bool aarch64_expand_cpymem (rtx *);
> +bool aarch64_expand_cpymem (rtx *, bool);
>  bool aarch64_expand_setmem (rtx *);
>  bool aarch64_float_const_zero_rtx_p (rtx);
>  bool aarch64_float_const_rtx_p (rtx);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 748b313092c5af452e9526a0c6747c51e598e4ca..26d1485ff6b977caeeb780dfaee739069742e638
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23058,51 +23058,41 @@ aarch64_progress_pointer (rtx pointer)
>return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
>  }
>
> +typedef auto_vec<std::pair<rtx, rtx>, 12> copy_ops;
> +
>  /* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
>
>  static void
> -aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
> - machine_mode mode)
> +aarch64_copy_one_block (copy_ops &ops, rtx src, rtx dst,
> +   int offset, machine_mode mode)
>  {
> -  /* Handle 256-bit memcpy separately.  We do this by making 2 adjacent 
> memory
> - address copies using V4SImode so that we can use Q registers.  */
> -  if (known_eq (GET_MODE_BITSIZE (mode), 256))
> +  /* Emit explict load/store pair instructions for 32-byte copies.  */
> +  if (known_eq (GET_MODE_SIZE (mode), 32))
>  {
>mode = V4SImode;
> +  rtx src1 = adjust_address (src, mode, offset);
> +  rtx src2 = adjust_address (src, mode, offset + 16);
> +  rtx dst1 = adjust_address (dst, mode, offset);
> +  rtx dst2 = adjust_address (dst, mode, offset + 16);
>rtx reg1 = gen_reg_rtx (mode);
>rtx reg2 = gen_reg_rtx (mode);
> -  /* "Cast" the pointers to the correct mode.  */
> -  *src = adjust_address (*src, mode, 0);
> -  *dst = adjust_address (*dst, mode, 0);
> -  /* Emit the memcpy.  */
> -  emit_insn (aarch64_gen_load_pair (mode, reg1, *src, reg2,
> -   aarch64_progress_pointer (*src)));
> -  emit_insn (aarch64_gen_store_pair (mode, *dst, reg1,
> -aarch64_progress_pointer (*dst), 
> reg2));
> -  /* Move the pointers forward.  */
> -  *src = aarch64_move_pointer (*src, 32);
> -  *dst = aarch64_move_pointer (*dst, 32);
> +  rtx load = aarch64_gen_load_pair (mode, reg1, src1, reg2, src2);
> +  rtx store = aarch64_gen_store_pair (mode, dst1, reg1, dst2, reg2);
> +  ops.safe_push ({ load, store });
>return;
>  }
>
>rtx reg = gen_reg_rtx (mode);
> -
> -  /* "Cast" the pointers to the correct mode.  */
> -  *src = ad

Re: [PATCH] doc: Document AArch64-specific asm operand modifiers

2023-12-14 Thread Richard Sandiford
Alex Coplan  writes:
> Hi,
>
> As it stands, GCC doesn't document any public AArch64-specific operand
> modifiers for use in inline asm.  This patch fixes that by documenting
> an initial set of public AArch64-specific operand modifiers.
>
> Tested with make html and checking the output looks OK in a browser.
>
> OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   * doc/extend.texi: Document AArch64 Operand Modifiers.

OK.  And thanks for doing this.  The previous position of "only documented
AArch64 modifiers are supported in asms" combined with "no AArch64 modifiers
are documented" just wasn't defensible, especially given that many of
the modifiers in the patch are necessary for even basic usage.

Richard

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index e8b5e771f7a..6ade36759ee 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -11723,6 +11723,31 @@ operand as if it were a memory reference.
>  @tab @code{%l0}
>  @end multitable
>  
> +@anchor{aarch64Operandmodifiers}
> +@subsubsection AArch64 Operand Modifiers
> +
> +The following table shows the modifiers supported by AArch64 and their 
> effects:
> +
> +@multitable @columnfractions .10 .90
> +@headitem Modifier @tab Description
> +@item @code{w} @tab Print a 32-bit general-purpose register name or, given a
> +constant zero operand, the 32-bit zero register (@code{wzr}).
> +@item @code{x} @tab Print a 64-bit general-purpose register name or, given a
> +constant zero operand, the 64-bit zero register (@code{xzr}).
> +@item @code{b} @tab Print an FP/SIMD register name with a @code{b} (byte, 
> 8-bit)
> +prefix.
> +@item @code{h} @tab Print an FP/SIMD register name with an @code{h} 
> (halfword,
> +16-bit) prefix.
> +@item @code{s} @tab Print an FP/SIMD register name with an @code{s} (single
> +word, 32-bit) prefix.
> +@item @code{d} @tab Print an FP/SIMD register name with a @code{d} 
> (doubleword,
> +64-bit) prefix.
> +@item @code{q} @tab Print an FP/SIMD register name with a @code{q} (quadword,
> +128-bit) prefix.
> +@item @code{Z} @tab Print an FP/SIMD register name as an SVE register (i.e. 
> with
> +a @code{z} prefix).  This is a no-op for SVE register operands.
> +@end multitable
> +
>  @anchor{x86Operandmodifiers}
>  @subsubsection x86 Operand Modifiers
>  


RE: [PATCH] middle-end: Fix up constant handling in emit_conditional_move [PR111260]

2023-12-14 Thread Andrew Pinski (QUIC)

> -Original Message-
> From: Richard Biener 
> Sent: Thursday, December 14, 2023 5:23 AM
> To: Andrew Pinski (QUIC) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] middle-end: Fix up constant handling in
> emit_conditional_move [PR111260]
> 
> On Wed, Dec 13, 2023 at 5:51 PM Andrew Pinski
>  wrote:
> >
> > After r14-2667-gceae1400cf24f329393e96dd9720, we force a constant to
> a
> > register if it is shared with one of the other operands. The problem
> > is used the comparison mode for the register but that could be
> > different from the operand mode. This causes some issues on some targets.
> > To fix it, we either need to have the modes match or if it is an
> > integer mode, then we can use the lower part for the smaller mode.
> >
> > Bootstrapped and tested on both aarch64-linux-gnu and x86_64-linux.
> 
> I think to fulfil the original purpose requiring matching modes is enough, the
> x86 backend checks for equality here, a subreg wouldn't be enough.
> In fact the whole point preserving equality doesn't work then.
> 
> So please just check the modes are equal (I also see I used 'mode' for the 
> cost
> check - I've really split out the check done by prepare_cmp_insn here btw).
> This seemed to be the simplest solution at the time, rather than for example
> trying to postpone legitimizing the constants (so rtx_equal_p could continue
> to be lazy with slight mode mismatches).

I was originally deciding between that patch and this one.  I decided to go
with this one in the end as I was trying to keep the extra optimization.
Anyway, I'll test and commit the one with just the check that the modes are
equal.

Also it is not that rtx_equal_p is lazy on mode mismatches, but rather that
CONST_INTs don't have a mode at all (they all have VOIDmode); this is why you
have to pass around cmpmode and the outer mode of the conditional.

Thanks,
Andrew

> 
> Richard.
> 
> > PR middle-end/111260
> >
> > gcc/ChangeLog:
> >
> > * optabs.cc (emit_conditional_move): Fix up mode handling for
> > forcing the constant to a register.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.c-torture/compile/condmove-1.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/optabs.cc | 40 +--
> >  .../gcc.c-torture/compile/condmove-1.c|  9 +
> >  2 files changed, 45 insertions(+), 4 deletions(-)  create mode 100644
> > gcc/testsuite/gcc.c-torture/compile/condmove-1.c
> >
> > diff --git a/gcc/optabs.cc b/gcc/optabs.cc index
> > f0a048a6bdb..573cf22760e 100644
> > --- a/gcc/optabs.cc
> > +++ b/gcc/optabs.cc
> > @@ -5131,26 +5131,58 @@ emit_conditional_move (rtx target, struct
> rtx_comparison comp,
> >   /* If we are optimizing, force expensive constants into a register
> >  but preserve an eventual equality with op2/op3.  */
> >   if (CONSTANT_P (orig_op0) && optimize
> > + && (cmpmode == mode
> > + || (GET_MODE_CLASS (cmpmode) == MODE_INT
> > + && GET_MODE_CLASS (mode) == MODE_INT))
> >   && (rtx_cost (orig_op0, mode, COMPARE, 0,
> > optimize_insn_for_speed_p ())
> >   > COSTS_N_INSNS (1))
> >   && can_create_pseudo_p ())
> > {
> > + machine_mode new_mode;
> > + if (known_le (GET_MODE_PRECISION (cmpmode),
> GET_MODE_PRECISION (mode)))
> > +   new_mode = mode;
> > + else
> > +   new_mode = cmpmode;
> >   if (rtx_equal_p (orig_op0, op2))
> > -   op2p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
> > +   {
> > + rtx r = force_reg (new_mode, orig_op0);
> > + op2p = gen_lowpart (mode, r);
> > + XEXP (comparison, 0) = gen_lowpart (cmpmode, r);
> > +   }
> >   else if (rtx_equal_p (orig_op0, op3))
> > -   op3p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
> > +   {
> > + rtx r = force_reg (new_mode, orig_op0);
> > + op3p = gen_lowpart (mode, r);
> > + XEXP (comparison, 0) = gen_lowpart (cmpmode, r);
> > +   }
> > }
> >   if (CONSTANT_P (orig_op1) && optimize
> > + && (cmpmode == mode
> > + || (GET_MODE_CLASS (cmpmode) == MODE_INT
> > + && GET_MODE_CLASS (mode) == MODE_INT))
> >   && (rtx_cost (orig_op1, mode, COMPARE, 0,
> > optimize_insn_for_speed_p ())
> >   > COSTS_N_INSNS (1))
> >   && can_create_pseudo_p ())
> > {
> > + machine_mode new_mode;
> > + if (known_le (GET_MODE_PRECISION (cmpmode),
> GET_MODE_PRECISION (mode)))
> > +   new_mode = mode;
> > + else
> > +   new_mode = cmpmode;
> >   i

Re: [PATCH] c++: Implement P2582R1, CTAD from inherited constructors

2023-12-14 Thread Marek Polacek
On Wed, Dec 13, 2023 at 08:48:49PM -0500, Jason Merrill wrote:
> On 11/27/23 10:58, Patrick Palka wrote:
> > gcc/cp/ChangeLog:
> > 
> > * cp-tree.h (type_targs_deducible_from): Adjust return type.
> > * pt.cc (alias_ctad_tweaks): Handle C++23 inherited CTAD.
> > (inherited_ctad_tweaks): Define.
> > (type_targs_deducible_from): Return the deduced arguments or
> > NULL_TREE instead of a bool.  Handle 'tmpl' being a TREE_LIST
> > representing a synthetic alias template.
> > (ctor_deduction_guides_for): Do inherited_ctad_tweaks for each
> > USING_DECL in C++23 mode.
> > (deduction_guides_for): Add FIXME for stale cache entries in
> > light of inherited CTAD.
> 
> check_GNU_style.py notices a few too-long lines in comments:
> 
> > === ERROR type #2: lines should not exceed 80 characters (3 error(s)) ===
> > gcc/cp/pt.cc:30076:80:  /* FIXME this should mean they 
> > don't compare as equivalent.  */
> > gcc/cp/pt.cc:30138:80:   class template TMPL; adjust the base's guides be 
> > deduction guides for TMPL.  */
> > gcc/cp/pt.cc:30190:80:  /* If tmpl is a class template, this is 
> > trivial: it's deducible if TYPE is a
> 
> OK with those fixed.
 
> > index 4624794c4b7..74f92325d7a 100644
> > --- a/gcc/testsuite/g++.dg/cpp1z/class-deduction67.C
> > +++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction67.C
> > @@ -1,5 +1,4 @@
> > -// Deduction from inherited constructors isn't supported yet, but we 
> > shouldn't
> > -// crash.  It may well be supported in C++23.
> > +// Deduction from inherited constructors isn't supported before C++23.
> >   //{ dg-do compile { target c++17 } }
> > @@ -17,5 +16,5 @@ int main()
> >   {
> > B b = 42;   // { dg-line init }
> > // { dg-prune-output "no matching function" }
> > -  // { dg-error "class template argument deduction" "" { target *-*-* } 
> > init }
> > +  // { dg-error "class template argument deduction" "" { target c++23_down 
> > } init }
> >   }

I checked in this patch:

-- >8 --
The test says that CTAD from inherited constructors doesn't work
before C++23 so we should use c++20_down for the error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction67.C: Correct dg-error target.
---
 gcc/testsuite/g++.dg/cpp1z/class-deduction67.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction67.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction67.C
index 74f92325d7a..fa1523d99d5 100644
--- a/gcc/testsuite/g++.dg/cpp1z/class-deduction67.C
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction67.C
@@ -16,5 +16,5 @@ int main()
 {
   B b = 42;// { dg-line init }
   // { dg-prune-output "no matching function" }
-  // { dg-error "class template argument deduction" "" { target c++23_down } 
init }
+  // { dg-error "class template argument deduction" "" { target c++20_down } 
init }
 }

base-commit: e5e1999aa664333f766f3e6cc6996f769d50ae7a
-- 
2.43.0



RE: [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD

2023-12-14 Thread Tamar Christina
> I see you've changed it from:
> 
> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
> 
> to:
> 
> +  emit_jump_insn (gen_cbranchdi4 (operands[0], val, CONST0_RTX (DImode),
> +   operands[3]));
> 
> Was that to fix a specific problem?  The original looked OK to me
> for that part (it was the vector comparison that I was asking about).
> 

No, it was to be more consistent with the Arm and MVE patches.

Note that I may update the tests to disable scheduling.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-early-break-cbranch.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
c6f2d5828373f2a5272b9d1227bfe34365f9fd09..309ec9535294d6e9cdc530f71d9fe38bb916c966
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3911,6 +3911,45 @@ (define_expand "vcond_mask_<mode><v_cmp_result>"
   DONE;
 })
 
+;; Patterns comparing two vectors and conditionally jump
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+(if_then_else
+  (match_operator 0 "aarch64_equality_operator"
+[(match_operand:VDQ_I 1 "register_operand")
+ (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
+  (label_ref (match_operand 3 ""))
+  (pc)))]
+  "TARGET_SIMD"
+{
+  auto code = GET_CODE (operands[0]);
+  rtx tmp = operands[1];
+
+  /* If comparing against a non-zero vector we have to do a comparison first
+ so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (<MODE>mode))
+{
+      tmp = gen_reg_rtx (<MODE>mode);
+      emit_insn (gen_xor<mode>3 (tmp, operands[1], operands[2]));
+}
+
+  /* For 64-bit vectors we need no reductions.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+{
+  /* Always reduce using a V4SI.  */
+  rtx reduc = gen_lowpart (V4SImode, tmp);
+  rtx res = gen_reg_rtx (V4SImode);
+  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+}
+
+  rtx val = gen_reg_rtx (DImode);
+  emit_move_insn (val, gen_lowpart (DImode, tmp));
+
+  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  DONE;
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c 
b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
new file mode 100644
index 
..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+** ...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] > 0)
+   break;
+}
+}
+
+/*
+** f2:
+** ...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] >= 0)
+   break;
+}
+}
+
+/*
+** f3:
+** ...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] == 0)
+   break;
+}
+}
+
+/*
+** f4:
+** ...
+** cmtst   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] != 0)
+   break;
+}
+}
+
+/*
+** f5:
+** ...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] < 0)
+   break;
+}
+}
+
+/*
+** f6:
+** ...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-

[patch, fortran, committed] PR112873 F2023 degree trig functions

2023-12-14 Thread Jerry D

The following has been committed per discussion in the subject PR.

commit 95b70545331764c85079a1d0e1e19b605bda1456 (HEAD -> master, 
origin/master, origin/HEAD)

Author: Jerry DeLisle 
Date:   Wed Dec 13 19:04:50 2023 -0800

fortran: Add degree based trig functions for F2023

PR fortran/112873

gcc/fortran/ChangeLog:

* gfortran.texi: Update to reflect the changes.
* intrinsic.cc (add_functions): Update the standard that the
various  degree trigonometric functions have been described in.
(gfc_check_intrinsic_standard): Add an error string for F2023.
* intrinsic.texi: Update accordingly.

Thanks to Steve for the changes, which were checked by Harald and myself.
I know there is a way to add a co-author on a git commit; I will try to do
that next time it applies.


Jerry


RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-14 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, December 14, 2023 1:13 PM
> To: Tamar Christina 
> Cc: Richard Sandiford ; gcc-patches@gcc.gnu.org;
> nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> On Wed, 13 Dec 2023, Tamar Christina wrote:
> 
> > > > >   else if (vect_use_mask_type_p (stmt_info))
> > > > > {
> > > > >   unsigned int precision = stmt_info->mask_precision;
> > > > >   scalar_type = build_nonstandard_integer_type (precision, 1);
> > > > >   vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
> > > > > group_size);
> > > > >   if (!vectype)
> > > > > return opt_result::failure_at (stmt, "not vectorized: 
> > > > > unsupported"
> > > > >" data-type %T\n", 
> > > > > scalar_type);
> > > > >
> > > > > Richard, do you have any advice here?  I suppose
> vect_determine_precisions
> > > > > needs to handle the gcond case with bool != 0 somehow and for the
> > > > > extra mask producer we add here we have to emulate what it would have
> > > > > done, right?
> > > >
> > > > How about handling gconds directly in vect_determine_mask_precision?
> > > > In a sense it's not needed, since gconds are always roots, and so we
> > > > could calculate their precision on the fly instead.  But handling it in
> > > > vect_determine_mask_precision feels like it should reduce the number
> > > > of special cases.
> > >
> > > Yeah, that sounds worth trying.
> > >
> > > Richard.
> >
> > So here's a respin with this suggestion and the other issues fixed.
> > Note that the testcases still need to be updated with the right stanzas.
> >
> > The patch is much smaller, I still have a small change to
> > vect_get_vector_types_for_stmt  in case we get there on a gcond where
> > vect_recog_gcond_pattern couldn't apply due to the target missing an
> > appropriate vectype.  The change only gracefully rejects the gcond.
> >
> > Since patterns cannot apply to the same root twice I've had to also do
> > the split of the condition out of the gcond in bitfield lowering.
> 
> Bah.  Guess we want to fix that (next stage1).  Can you please add
> a comment to the split out done in vect_recog_bitfield_ref_pattern?

Done.

> 
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and
> no issues.
> >
> > Ok for master?
> 
> OK with the above change.
> 

Thanks!

That leaves one patch.  I'll have that for you Tuesday morning.  I'm currently
going over it to see if I can clean it up more (a day or two usually helps)
to minimize respins.

I'll then also send the final testsuite patches.

Thanks for all the reviews!

Cheers,
Tamar


Re: [PATCH 1/2] emit-rtl, lra: Move lra's emit_inc to emit-rtl.cc

2023-12-14 Thread Vladimir Makarov



On 12/13/23 16:00, Alex Coplan wrote:

Hi,

In PR112906 we ICE because we try to use force_reg to reload an
auto-increment address, but force_reg can't do this.

With the aim of fixing the PR by supporting reloading arbitrary
addresses in pre-RA splitters, this patch generalizes
lra-constraints.cc:emit_inc and makes it available to the rest of the
compiler by moving the generalized version to emit-rtl.cc.

We observe that the separate IN parameter to LRA's emit_inc is
redundant, since the function is static and is only (statically) called
once in lra-constraints.cc, with in == value.  As such, we drop the IN
parameter and simplify the code accordingly.

The function was initially adopted from reload1.cc:inc_for_reload.

We wrap the emit_inc code in a virtual class to allow LRA to override
how reload pseudos are created, thereby preserving the existing LRA
behaviour as much as possible.

We then add a second (higher-level) routine to emit-rtl.cc,
force_reload_address, which can reload arbitrary addresses.  This uses
the generalized emit_inc code to handle the RTX_AUTOINC case.  The
second patch in this series uses force_reload_address to fix PR112906.

Since we intend to call address_reload_context::emit_autoinc from within
splitters, and the code lifted from LRA calls recog, we have to avoid
clobbering recog_data.  We do this by introducing a new RAII class for
saving/restoring recog_data on the stack.

Bootstrapped/regtested on aarch64-linux-gnu, bootstrapped on
x86_64-linux-gnu, OK for trunk?

OK for me.  Thank you.

gcc/ChangeLog:

PR target/112906
* emit-rtl.cc (address_reload_context::emit_autoinc): New.
(force_reload_address): New.
* emit-rtl.h (struct address_reload_context): Declare.
(force_reload_address): Declare.
* lra-constraints.cc (class lra_autoinc_reload_context): New.
(emit_inc): Drop IN parameter, invoke
code moved to emit-rtl.cc:address_reload_context::emit_autoinc.
(curr_insn_transform): Drop redundant IN parameter in call to
emit_inc.
* recog.h (class recog_data_saver): New.




[PATCH] c++: abi_tag attribute on templates [PR109715]

2023-12-14 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Do we want to condition this on abi_check (19)?

-- >8 --

As with other declaration attributes, we need to look through
TEMPLATE_DECL when looking up the abi_tag attribute.

PR c++/109715

gcc/cp/ChangeLog:

* mangle.cc (get_abi_tags): Look through TEMPLATE_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/abi/abi-tag25.C: New test.
---
 gcc/cp/mangle.cc |  3 +++
 gcc/testsuite/g++.dg/abi/abi-tag25.C | 17 +
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/abi/abi-tag25.C

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 0684f0e6038..1fbd879c116 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -527,6 +527,9 @@ get_abi_tags (tree t)
   if (!t || TREE_CODE (t) == NAMESPACE_DECL)
 return NULL_TREE;
 
+  if (TREE_CODE (t) == TEMPLATE_DECL && DECL_TEMPLATE_RESULT (t))
+t = DECL_TEMPLATE_RESULT (t);
+
   if (DECL_P (t) && DECL_DECLARES_TYPE_P (t))
 t = TREE_TYPE (t);
 
diff --git a/gcc/testsuite/g++.dg/abi/abi-tag25.C 
b/gcc/testsuite/g++.dg/abi/abi-tag25.C
new file mode 100644
index 000..9847f0dccc8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/abi-tag25.C
@@ -0,0 +1,17 @@
+// PR c++/109715
+// { dg-do compile { target c++11 } }
+
+template
+[[gnu::abi_tag("foo")]] void fun() { }
+
+template void fun();
+
+#if __cpp_variable_templates
+template
+[[gnu::abi_tag("foo")]] int var = 0;
+
+template int var;
+#endif
+
+// { dg-final { scan-assembler "_Z3funB3fooIiEvv" } }
+// { dg-final { scan-assembler "_Z3varB3fooIiE" { target c++14 } } }
-- 
2.43.0.76.g1a87c842ec



[PATCH] c++: section attribute on templates [PR70435, PR88061]

2023-12-14 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

The section attribute currently has no effect on templates because the
call to set_decl_section_name only happens at parse time and not also at
instantiation time.  This patch fixes this by propagating the section
name from the template to the instantiation.

PR c++/70435
PR c++/88061

gcc/cp/ChangeLog:

* pt.cc (tsubst_function_decl): Call set_decl_section_name.
(tsubst_decl) : Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-section1.C: New test.
* g++.dg/ext/attr-section1a.C: New test.
* g++.dg/ext/attr-section2.C: New test.
* g++.dg/ext/attr-section2a.C: New test.
---
 gcc/cp/pt.cc  |  4 
 gcc/testsuite/g++.dg/ext/attr-section1.C  |  9 +
 gcc/testsuite/g++.dg/ext/attr-section1a.C | 11 +++
 gcc/testsuite/g++.dg/ext/attr-section2.C  |  9 +
 gcc/testsuite/g++.dg/ext/attr-section2a.C | 14 ++
 5 files changed, 47 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/attr-section1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/attr-section1a.C
 create mode 100644 gcc/testsuite/g++.dg/ext/attr-section2.C
 create mode 100644 gcc/testsuite/g++.dg/ext/attr-section2a.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 50e6f062c85..8c4174fb902 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -14607,6 +14607,8 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t 
complain,
= remove_attribute ("visibility", DECL_ATTRIBUTES (r));
 }
   determine_visibility (r);
+  if (DECL_SECTION_NAME (t))
+set_decl_section_name (r, t);
   if (DECL_DEFAULTED_OUTSIDE_CLASS_P (r)
   && !processing_template_decl)
 defaulted_late_check (r);
@@ -15423,6 +15425,8 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain,
  = remove_attribute ("visibility", DECL_ATTRIBUTES (r));
  }
determine_visibility (r);
+   if (!local_p && DECL_SECTION_NAME (t))
+ set_decl_section_name (r, t);
  }
 
if (!local_p)
diff --git a/gcc/testsuite/g++.dg/ext/attr-section1.C 
b/gcc/testsuite/g++.dg/ext/attr-section1.C
new file mode 100644
index 000..b8ac65baa93
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attr-section1.C
@@ -0,0 +1,9 @@
+// PR c++/70435
+// { dg-do compile { target { c++11 && named_sections } } }
+
+template
+[[gnu::section(".foo")]] void fun() { }
+
+template void fun();
+
+// { dg-final { scan-assembler {.section[ \t]+.foo} } }
diff --git a/gcc/testsuite/g++.dg/ext/attr-section1a.C 
b/gcc/testsuite/g++.dg/ext/attr-section1a.C
new file mode 100644
index 000..be24be2fc95
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attr-section1a.C
@@ -0,0 +1,11 @@
+// PR c++/70435
+// { dg-do compile { target { c++11 && named_sections } } }
+
+template
+struct A {
+  [[gnu::section(".foo")]] void fun() { }
+};
+
+template struct A;
+
+// { dg-final { scan-assembler {.section[ \t]+.foo} } }
diff --git a/gcc/testsuite/g++.dg/ext/attr-section2.C 
b/gcc/testsuite/g++.dg/ext/attr-section2.C
new file mode 100644
index 000..a76f43b346f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attr-section2.C
@@ -0,0 +1,9 @@
+// PR c++/88061
+// { dg-do compile { target { c++14 && named_sections } } }
+
+template
+[[gnu::section(".foo")]] int var = 42;
+
+template int var;
+
+// { dg-final { scan-assembler {.section[ \t]+.foo} } }
diff --git a/gcc/testsuite/g++.dg/ext/attr-section2a.C 
b/gcc/testsuite/g++.dg/ext/attr-section2a.C
new file mode 100644
index 000..a0b01cd8d93
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attr-section2a.C
@@ -0,0 +1,14 @@
+// PR c++/88061
+// { dg-do compile { target { c++11 && named_sections } } }
+
+template
+struct A {
+  [[gnu::section(".foo")]] static int var;
+};
+
+template
+int A::var = 42;
+
+template struct A;
+
+// { dg-final { scan-assembler {.section[ \t]+.foo} } }
-- 
2.43.0.76.g1a87c842ec



Re: [PATCH] c++: section attribute on templates [PR70435, PR88061]

2023-12-14 Thread Marek Polacek
On Thu, Dec 14, 2023 at 02:17:25PM -0500, Patrick Palka wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?

LGTM.
 
> -- >8 --
> 
> The section attribute currently has no effect on templates because the
> call to set_decl_section_name only happens at parse time and not also at
> instantiation time.  This patch fixes this by propagating the section
> name from the template to the instantiation.
> 
>   PR c++/70435
>   PR c++/88061
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (tsubst_function_decl): Call set_decl_section_name.
>   (tsubst_decl) : Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/ext/attr-section1.C: New test.
>   * g++.dg/ext/attr-section1a.C: New test.
>   * g++.dg/ext/attr-section2.C: New test.
>   * g++.dg/ext/attr-section2a.C: New test.
> ---
>  gcc/cp/pt.cc  |  4 
>  gcc/testsuite/g++.dg/ext/attr-section1.C  |  9 +
>  gcc/testsuite/g++.dg/ext/attr-section1a.C | 11 +++
>  gcc/testsuite/g++.dg/ext/attr-section2.C  |  9 +
>  gcc/testsuite/g++.dg/ext/attr-section2a.C | 14 ++
>  5 files changed, 47 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/ext/attr-section1.C
>  create mode 100644 gcc/testsuite/g++.dg/ext/attr-section1a.C
>  create mode 100644 gcc/testsuite/g++.dg/ext/attr-section2.C
>  create mode 100644 gcc/testsuite/g++.dg/ext/attr-section2a.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 50e6f062c85..8c4174fb902 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -14607,6 +14607,8 @@ tsubst_function_decl (tree t, tree args, 
> tsubst_flags_t complain,
>   = remove_attribute ("visibility", DECL_ATTRIBUTES (r));
>  }
>determine_visibility (r);
> +  if (DECL_SECTION_NAME (t))
> +set_decl_section_name (r, t);
>if (DECL_DEFAULTED_OUTSIDE_CLASS_P (r)
>&& !processing_template_decl)
>  defaulted_late_check (r);
> @@ -15423,6 +15425,8 @@ tsubst_decl (tree t, tree args, tsubst_flags_t 
> complain,
> = remove_attribute ("visibility", DECL_ATTRIBUTES (r));
> }
>   determine_visibility (r);
> + if (!local_p && DECL_SECTION_NAME (t))
> +   set_decl_section_name (r, t);
> }
>  
>   if (!local_p)
> diff --git a/gcc/testsuite/g++.dg/ext/attr-section1.C 
> b/gcc/testsuite/g++.dg/ext/attr-section1.C
> new file mode 100644
> index 000..b8ac65baa93
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/attr-section1.C
> @@ -0,0 +1,9 @@
> +// PR c++/70435
> +// { dg-do compile { target { c++11 && named_sections } } }
> +
> +template
> +[[gnu::section(".foo")]] void fun() { }
> +
> +template void fun();
> +
> +// { dg-final { scan-assembler {.section[ \t]+.foo} } }
> diff --git a/gcc/testsuite/g++.dg/ext/attr-section1a.C 
> b/gcc/testsuite/g++.dg/ext/attr-section1a.C
> new file mode 100644
> index 000..be24be2fc95
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/attr-section1a.C
> @@ -0,0 +1,11 @@
> +// PR c++/70435
> +// { dg-do compile { target { c++11 && named_sections } } }
> +
> +template
> +struct A {
> +  [[gnu::section(".foo")]] void fun() { }
> +};
> +
> +template struct A;
> +
> +// { dg-final { scan-assembler {.section[ \t]+.foo} } }
> diff --git a/gcc/testsuite/g++.dg/ext/attr-section2.C 
> b/gcc/testsuite/g++.dg/ext/attr-section2.C
> new file mode 100644
> index 000..a76f43b346f
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/attr-section2.C
> @@ -0,0 +1,9 @@
> +// PR c++/88061
> +// { dg-do compile { target { c++14 && named_sections } } }
> +
> +template
> +[[gnu::section(".foo")]] int var = 42;
> +
> +template int var;
> +
> +// { dg-final { scan-assembler {.section[ \t]+.foo} } }
> diff --git a/gcc/testsuite/g++.dg/ext/attr-section2a.C 
> b/gcc/testsuite/g++.dg/ext/attr-section2a.C
> new file mode 100644
> index 000..a0b01cd8d93
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/attr-section2a.C
> @@ -0,0 +1,14 @@
> +// PR c++/88061
> +// { dg-do compile { target { c++11 && named_sections } } }
> +
> +template
> +struct A {
> +  [[gnu::section(".foo")]] static int var;
> +};
> +
> +template
> +int A::var = 42;
> +
> +template struct A;
> +
> +// { dg-final { scan-assembler {.section[ \t]+.foo} } }
> -- 
> 2.43.0.76.g1a87c842ec
> 

Marek



Re: [PATCH] RISC-V: fix scalar crypto pattern

2023-12-14 Thread Jeff Law




On 12/14/23 02:48, Christoph Müllner wrote:

On Thu, Dec 14, 2023 at 1:40 AM Jeff Law  wrote:

On 12/13/23 02:03, Christoph Müllner wrote:

On Wed, Dec 13, 2023 at 9:22 AM Liao Shihua  wrote:


Some of the Scalar Crypto built-in functions require immediate parameters,
but register_operand is incorrectly used in their patterns.

E.g.:
 __builtin_riscv_aes64ks1i(rs1,1)
 Before:
li a5,1
aes64ks1i a0,a0,a5

Assembler messages:
Error: instruction aes64ks1i requires absolute expression

 After:
aes64ks1i a0,a0,1


Looks good to me (also tested with rv32 and rv64).
(I was actually surprised that the D03 constraint was not sufficient)

Reviewed-by: Christoph Muellner 
Tested-by: Christoph Muellner 

Nit: I would prefer to separate arguments with a comma followed by a space.
Even if the existing code was not written like that.
E.g. __builtin_riscv_sm4ed(rs1,rs2,1); -> __builtin_riscv_sm4ed(rs1, rs2, 1);

I propose to remove the builtin tests for scalar crypto and scalar bitmanip
as part of the patchset that adds the intrinsic tests (no value in
duplicated tests).


gcc/ChangeLog:

  * config/riscv/crypto.md: Use immediate_operand instead of 
register_operand.

You should mention the actual patterns changed.

I would strongly recommend adding some tests that out of range cases are
rejected (out of range constants as well as a variable for that last
argument).  I did that in my patch from June to fix this problem (which
was never acked/reviewed).


Sorry, I was not aware of this patch.
No worries.  I'd planned to ping it again as part of the stage3 
bugfixing effort ;-)  It wasn't until I started looking at Liao's patch 
that I realized he was fixing the same problem.



Since Jeff's patch was here first and also includes more tests, I
propose to move forward with his patch (but I'm not a maintainer!).
Therefore, I've reviewed Jeff's patch and replied to his email.
Thanks.  I think the combination of your review, the high overlap with 
Liao's work and my status as a global maintainer should be sufficient to 
move this forward.



Jeff


Re: [PATCH] RISC-V: fix scalar crypto pattern

2023-12-14 Thread Jeff Law




On 12/14/23 04:12, Liao Shihua wrote:

Sorry, I was not aware of this patch.
Since Jeff's patch was here first and also includes more tests, I
propose to move forward with his patch (but I'm not a maintainer!).
Therefore, I've reviewed Jeff's patch and replied to his email.

FWIW: Jeff's patch can be found here:
   https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622233.html


No problem.

And I would tend to remove the D03 constraint if we used const_0_3_operand.

Seems reasonable to me.  I'll remove it.

jeff


Re: [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD

2023-12-14 Thread Richard Sandiford
Tamar Christina  writes:
>> I see you've changed it from:
>> 
>> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
>> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
>> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
>> 
>> to:
>> 
>> +  emit_jump_insn (gen_cbranchdi4 (operands[0], val, CONST0_RTX (DImode),
>> +  operands[3]));
>> 
>> Was that to fix a specific problem?  The original looked OK to me
>> for that part (it was the vector comparison that I was asking about).
>> 
>
> No, it was to be more consistent with the Arm and MVE patch.  
>
> Note that I may update the tests to disable scheduling.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md (cbranch4): New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/vect-early-break-cbranch.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> c6f2d5828373f2a5272b9d1227bfe34365f9fd09..309ec9535294d6e9cdc530f71d9fe38bb916c966
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3911,6 +3911,45 @@ (define_expand "vcond_mask_"
>DONE;
>  })
>  
> +;; Patterns comparing two vectors and conditionally jump
> +
> +(define_expand "cbranch4"
> +  [(set (pc)
> +(if_then_else
> +  (match_operator 0 "aarch64_equality_operator"
> +[(match_operand:VDQ_I 1 "register_operand")
> + (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
> +  (label_ref (match_operand 3 ""))
> +  (pc)))]
> +  "TARGET_SIMD"
> +{
> +  auto code = GET_CODE (operands[0]);
> +  rtx tmp = operands[1];
> +
> +  /* If comparing against a non-zero vector we have to do a comparison first

...an EOR first

(or XOR)

OK with that change, thanks.

Richard

> + so we can have a != 0 comparison with the result.  */
> +  if (operands[2] != CONST0_RTX (mode))
> +{
> +  tmp = gen_reg_rtx (mode);
> +  emit_insn (gen_xor3 (tmp, operands[1], operands[2]));
> +}
> +
> +  /* For 64-bit vectors we need no reductions.  */
> +  if (known_eq (128, GET_MODE_BITSIZE (mode)))
> +{
> +  /* Always reduce using a V4SI.  */
> +  rtx reduc = gen_lowpart (V4SImode, tmp);
> +  rtx res = gen_reg_rtx (V4SImode);
> +  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
> +  emit_move_insn (tmp, gen_lowpart (mode, res));
> +}
> +
> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
> +  DONE;
> +})
> +
>  ;; Patterns comparing two vectors to produce a mask.
>  
>  (define_expand "vec_cmp"
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c 
> b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> new file mode 100644
> index 
> ..c0363c3787270507d7902bb2ac0e39faef63a852
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> @@ -0,0 +1,124 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +#define N 640
> +int a[N] = {0};
> +int b[N] = {0};
> +
> +
> +/*
> +** f1:
> +**   ...
> +**   cmgtv[0-9]+.4s, v[0-9]+.4s, #0
> +**   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**   fmovx[0-9]+, d[0-9]+
> +**   cbnzx[0-9]+, \.L[0-9]+
> +**   ...
> +*/
> +void f1 ()
> +{
> +  for (int i = 0; i < N; i++)
> +{
> +  b[i] += a[i];
> +  if (a[i] > 0)
> + break;
> +}
> +}
> +
> +/*
> +** f2:
> +**   ...
> +**   cmgev[0-9]+.4s, v[0-9]+.4s, #0
> +**   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**   fmovx[0-9]+, d[0-9]+
> +**   cbnzx[0-9]+, \.L[0-9]+
> +**   ...
> +*/
> +void f2 ()
> +{
> +  for (int i = 0; i < N; i++)
> +{
> +  b[i] += a[i];
> +  if (a[i] >= 0)
> + break;
> +}
> +}
> +
> +/*
> +** f3:
> +**   ...
> +**   cmeqv[0-9]+.4s, v[0-9]+.4s, #0
> +**   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**   fmovx[0-9]+, d[0-9]+
> +**   cbnzx[0-9]+, \.L[0-9]+
> +**   ...
> +*/
> +void f3 ()
> +{
> +  for (int i = 0; i < N; i++)
> +{
> +  b[i] += a[i];
> +  if (a[i] == 0)
> + break;
> +}
> +}
> +
> +/*
> +** f4:
> +**   ...
> +**   cmtst   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**   fmovx[0-9]+, d[0-9]+
> +**   cbnzx[0-9]+, \.L[0-9]+
> +**   ...
> +*/
> +void f4 ()
> +{
> +  for (int i = 0; i < N; i++)
> +{
> +  b[i] += a[i];
> +  if (a[i] != 0)
> + break;
> +}
> +}
> +
> +/*
> +** f5:
> +**   ...
> +**   cmltv[0-9]+.4s, v[0-9]+.4s, #0
> +**   umaxp   v

Re: [PATCH v2] aarch64: Fix +nopredres, +nols64 and +nomops

2023-12-14 Thread Richard Sandiford
Andrew Carlotti  writes:
> On Sat, Dec 09, 2023 at 07:22:49PM +, Richard Sandiford wrote:
>> Andrew Carlotti  writes:
>> > ...
>> 
>> This is the only use of native_detect_p, so it'd be good to remove
>> the field itself.
>
> Done
>  
>> > ...
>> >
>> > @@ -447,6 +451,13 @@ host_detect_local_cpu (int argc, const char **argv)
>> >if (tune)
>> >  return res;
>> >  
>> > +  if (!processed_exts)
>> > +goto not_found;
>> 
>> Could you explain this part?  It seems like more of a parsing change
>> (i.e. being more strict about what we accept).
>> 
>> If that's the intention, it probably belongs in:
>> 
>>   if (n_cores == 0
>>   || n_cores > 2
>>   || (n_cores == 1 && n_variants != 1)
>>   || imp == INVALID_IMP)
>> goto not_found;
>> 
>> But maybe it should be a separate patch.
>
> I added it because I realised that the parsing behaviour didn't make sense in
> that case, and my patch happens to change the behaviour as well (the outcome
> without the check would be no enabled features, whereas previously it would
> enable only the features with no native detection).

Ah, OK, thanks for the explanation.

> I agree that it makes sense to put it with the original check, so I've made 
> that change.
>
>> Looks good otherwise, thanks.
>> 
>> Richard
>
> New patch version below, ok for master?
>
> ---
>
> For native cpu feature detection, certain features have no entry in
> /proc/cpuinfo, so have to be assumed to be present whenever the detected
> cpu is supposed to support that feature.
>
> However, the logic for this was mistakenly implemented by excluding
> these features from part of aarch64_get_extension_string_for_isa_flags.
> This function is also used elsewhere when canonicalising explicit
> feature sets, which may require removing features that are normally
> implied by the specified architecture version.
>
> This change reenables generation of +nopredres, +nols64 and +nomops
> during canonicalisation, by relocating the misplaced native cpu
> detection logic.
>
> gcc/ChangeLog:
>
>   * common/config/aarch64/aarch64-common.cc
>   (struct aarch64_option_extension): Remove unused field.
>   (all_extensions): Ditto.
>   (aarch64_get_extension_string_for_isa_flags): Remove filtering
>   of features without native detection.
>   * config/aarch64/driver-aarch64.cc (host_detect_local_cpu):
>   Explicitly add expected features that lack cpuinfo detection.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/options_set_28.c: New test.

OK, thanks.

Richard

> diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
> b/gcc/common/config/aarch64/aarch64-common.cc
> index 
> c2a6d357c0bc17996a25ea5c3a40f69d745c7931..4d0431d3a2cad5414790646bce0c09877c0366b2
>  100644
> --- a/gcc/common/config/aarch64/aarch64-common.cc
> +++ b/gcc/common/config/aarch64/aarch64-common.cc
> @@ -149,9 +149,6 @@ struct aarch64_option_extension
>aarch64_feature_flags flags_on;
>/* If this feature is turned off, these bits also need to be turned off.  
> */
>aarch64_feature_flags flags_off;
> -  /* Indicates whether this feature is taken into account during native cpu
> - detection.  */
> -  bool native_detect_p;
>  };
>  
>  /* ISA extensions in AArch64.  */
> @@ -159,10 +156,9 @@ static constexpr aarch64_option_extension 
> all_extensions[] =
>  {
>  #define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, FEATURE_STRING) \
>{NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \
> -   feature_deps::get_flags_off (feature_deps::root_off_##IDENT), \
> -   FEATURE_STRING[0]},
> +   feature_deps::get_flags_off (feature_deps::root_off_##IDENT)},
>  #include "config/aarch64/aarch64-option-extensions.def"
> -  {NULL, 0, 0, 0, false}
> +  {NULL, 0, 0, 0}
>  };
>  
>  struct processor_name_to_arch
> @@ -358,8 +354,7 @@ aarch64_get_extension_string_for_isa_flags
>   /* If either crypto flag needs removing here, then both do.  */
>   flags = flags_crypto;
>  
> -  if (opt.native_detect_p
> -   && (flags & current_flags & ~isa_flags))
> +  if (flags & current_flags & ~isa_flags)
>   {
> current_flags &= ~opt.flags_off;
> outstr += "+no";
> diff --git a/gcc/config/aarch64/driver-aarch64.cc 
> b/gcc/config/aarch64/driver-aarch64.cc
> index 
> 8e318892b10aa2288421fad418844744a2f5a3b4..c18f065aa41e7328d71b45a53c82a3b703ae44d5
>  100644
> --- a/gcc/config/aarch64/driver-aarch64.cc
> +++ b/gcc/config/aarch64/driver-aarch64.cc
> @@ -262,6 +262,7 @@ host_detect_local_cpu (int argc, const char **argv)
>unsigned int n_variants = 0;
>bool processed_exts = false;
>aarch64_feature_flags extension_flags = 0;
> +  aarch64_feature_flags unchecked_extension_flags = 0;
>aarch64_feature_flags default_flags = 0;
>std::string buf;
>size_t sep_pos = -1;
> @@ -348,7 +349,10 @@ host_detect_local_cpu (int argc, const char **argv)
> /* If the feature contains no HWCAPS string then ignore it for the
>

[PATCH] hardened: use LD_PIE_SPEC only if defined

2023-12-14 Thread Alexandre Oliva


sol2.h may define LINK_PIE_SPEC and leave LD_PIE_SPEC undefined, but
gcc.cc will only provide a LD_PIE_SPEC definition if LINK_PIE_SPEC is
not defined, and then it uses LD_PIE_SPEC guarded by #ifdef HAVE_LD_PIE
only.  Add LD_PIE_SPEC to the guard.

Regstrapped on x86_64-linux-gnu; also testing on sparc-solaris2.11.3,
where I hit the problem and couldn't build a baseline to compare with.
Ok to install?


gcc/ChangeLog

* gcc.cc (process_command): Use LD_PIE_SPEC only if defined.
---
 gcc/gcc.cc |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 701f5cdfb59c8..d5e02c11cb05d 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -5008,7 +5008,7 @@ process_command (unsigned int decoded_options_count,
 {
   if (!any_link_options_p && !static_p)
{
-#ifdef HAVE_LD_PIE
+#if defined HAVE_LD_PIE && defined LD_PIE_SPEC
  save_switch (LD_PIE_SPEC, 0, NULL, /*validated=*/true, 
/*known=*/false);
 #endif
  /* These are passed straight down to collect2 so we have to break

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH] strub: avoid lto inlining

2023-12-14 Thread Alexandre Oliva


The strub builtins are not suited for cross-unit inlining, they should
only be inlined by the builtin expanders, if at all.  While testing on
sparc64, it occurred to me that, if libgcc was built with LTO enabled,
lto1 might inline them, and that would likely break things.  So, make
sure they're clearly marked as not inlinable.

Regstrapped on x86_64-linux-gnu, also testing on sparc-solaris2.11.3.
Ok to install?


for  libgcc/ChangeLog

* strub.c (ATTRIBUTE_NOINLINE): New.
(ATTRIBUTE_STRUB_CALLABLE): Add it.
(__strub_dummy_force_no_leaf): Drop it.
---
 libgcc/strub.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libgcc/strub.c b/libgcc/strub.c
index b0f990d9deebb..5062554d0e1e6 100644
--- a/libgcc/strub.c
+++ b/libgcc/strub.c
@@ -36,7 +36,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 # define TOPS <
 #endif
 
-#define ATTRIBUTE_STRUB_CALLABLE __attribute__ ((__strub__ ("callable")))
+/* Make sure these builtins won't be inlined, even with LTO.  */
+#define ATTRIBUTE_NOINLINE \
+  __attribute__ ((__noinline__, __noclone__))
+
+#define ATTRIBUTE_STRUB_CALLABLE \
+  __attribute__ ((__strub__ ("callable"))) ATTRIBUTE_NOINLINE
 
 /* Enter a stack scrubbing context, initializing the watermark to the caller's
stack address.  */
@@ -72,7 +77,6 @@ __strub_update (void **watermark)
 /* Dummy function, called to force the caller to not be a leaf function, so
that it can't use the red zone.  */
 static void ATTRIBUTE_STRUB_CALLABLE
-__attribute__ ((__noinline__, __noipa__))
 __strub_dummy_force_no_leaf (void)
 {
 }

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH] strub: use opt_for_fn during ipa

2023-12-14 Thread Alexandre Oliva


Instead of global optimization levels and flags, check per-function
ones.

Regstrapped on x86_64-linux-gnu, also testing on sparc-solaris2.11.3.
Ok to install?

(sorry, Richi, I dropped the ball and failed to fix this before the
monster commit)


for  gcc/ChangeLog

* ipa-strub.cc (gsi_insert_finally_seq_after_call): Likewise.
(pass_ipa_strub::adjust_at_calls_call): Likewise.
---
 gcc/ipa-strub.cc |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/ipa-strub.cc b/gcc/ipa-strub.cc
index 943bb60996fc1..32e2869cf7840 100644
--- a/gcc/ipa-strub.cc
+++ b/gcc/ipa-strub.cc
@@ -2132,7 +2132,7 @@ gsi_insert_finally_seq_after_call (gimple_stmt_iterator 
gsi, gimple_seq seq)
|| (call && gimple_call_nothrow_p (call))
|| (eh_lp <= 0
&& (TREE_NOTHROW (cfun->decl)
-   || !flag_exceptions)));
+   || !opt_for_fn (cfun->decl, flag_exceptions;
 
   if (noreturn_p && nothrow_p)
 return;
@@ -2470,9 +2470,11 @@ pass_ipa_strub::adjust_at_calls_call (cgraph_edge *e, 
int named_args,
   /* If we're already within a strub context, pass on the incoming watermark
  pointer, and omit the enter and leave calls around the modified call, as 
an
  optimization, or as a means to satisfy a tail-call requirement.  */
-  tree swmp = ((optimize_size || optimize > 2
+  tree swmp = ((opt_for_fn (e->caller->decl, optimize_size)
+   || opt_for_fn (e->caller->decl, optimize) > 2
|| gimple_call_must_tail_p (ocall)
-   || (optimize == 2 && gimple_call_tail_p (ocall)))
+   || (opt_for_fn (e->caller->decl, optimize) == 2
+   && gimple_call_tail_p (ocall)))
   ? strub_watermark_parm (e->caller->decl)
   : NULL_TREE);
   bool omit_own_watermark = swmp;

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] hardened: use LD_PIE_SPEC only if defined

2023-12-14 Thread Marek Polacek
On Thu, Dec 14, 2023 at 04:50:49PM -0300, Alexandre Oliva wrote:
> 
> sol2.h may define LINK_PIE_SPEC and leave LD_PIE_SPEC undefined, but
> gcc.cc will only provide a LD_PIE_SPEC definition if LINK_PIE_SPEC is
> not defined, and then it uses LD_PIE_SPEC guarded by #ifdef HAVE_LD_PIE
> only.  Add LD_PIE_SPEC to the guard.
> 
> Regstrapped on x86_64-linux-gnu; also testing on sparc-solaris2.11.3,
> where I hit the problem and couldn't build a baseline to compare with.
> Ok to install?
 
OK, thanks.  Jakub notified me of this problem a few days ago and
I forgot about it, sorry :(.
 
> gcc/ChangeLog
> 
>   * gcc.cc (process_command): Use LD_PIE_SPEC only if defined.
> ---
>  gcc/gcc.cc |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 701f5cdfb59c8..d5e02c11cb05d 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -5008,7 +5008,7 @@ process_command (unsigned int decoded_options_count,
>  {
>if (!any_link_options_p && !static_p)
>   {
> -#ifdef HAVE_LD_PIE
> +#if defined HAVE_LD_PIE && defined LD_PIE_SPEC
> save_switch (LD_PIE_SPEC, 0, NULL, /*validated=*/true, 
> /*known=*/false);
>  #endif
> /* These are passed straight down to collect2 so we have to break
> 
> -- 
> Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
>    Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive
> 

Marek



[PATCH #1/2] strub: sparc: omit frame in strub_leave [PR112917]

2023-12-14 Thread Alexandre Oliva


If we allow __strub_leave to allocate a frame on sparc, it will
overlap with a lot of the stack range we're supposed to scrub, because
of the large fixed-size outgoing args and register save area.
Unfortunately, setting up the PIC register seems to prevent the frame
pointer from being omitted.

Since the strub runtime doesn't issue calls or use global variables,
at least on sparc, disabling PIC to compile strub.c seems to do the
right thing.

Regstrapped on x86_64-linux-gnu, also testing on sparc-solaris2.11.3.
(but it will likely take forever on the cfarm machine; if someone with
faster sparc machines could give this a spin and confirm, that would be
appreciated.) Ok to install?

IIRC this patch gets 32-bit sparc to pass all strub tests, but sparc64
still fails many of them; there's another one for sparc64 that fixes
them, and that will improve sparc -m32 as well.


for  libgcc/ChangeLog

PR middle-end/112917
* config.host (sparc, sparc64): Enable...
* config/sparc/t-sparc: ... this new fragment.
---
 libgcc/config.host  |2 ++
 libgcc/config/sparc/t-sparc |4 
 2 files changed, 6 insertions(+)
 create mode 100644 libgcc/config/sparc/t-sparc

diff --git a/libgcc/config.host b/libgcc/config.host
index 694e3e9f54cad..54d06978a5d2c 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -199,9 +199,11 @@ riscv*-*-*)
;;
 sparc64*-*-*)
cpu_type=sparc
+   tmake_file="${tmake_file} sparc/t-sparc"
;;
 sparc*-*-*)
cpu_type=sparc
+   tmake_file="${tmake_file} sparc/t-sparc"
;;
 s390*-*-*)
cpu_type=s390
diff --git a/libgcc/config/sparc/t-sparc b/libgcc/config/sparc/t-sparc
new file mode 100644
index 0..fb1bf1fc29cc4
--- /dev/null
+++ b/libgcc/config/sparc/t-sparc
@@ -0,0 +1,4 @@
+# This is needed for __strub_leave to omit the frame pointer, without
+# which it will allocate a register save area on the stack and leave
+# it unscrubbed and most likely unused, because it's a leaf function.
+CFLAGS-strub.c += -fno-PIC -fomit-frame-pointer

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] c++: abi_tag attribute on templates [PR109715]

2023-12-14 Thread Jason Merrill

On 12/14/23 14:17, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Do we want to condition this on abi_check (19)?


I think we do, sadly.


-- >8 --

As with other declaration attributes, we need to look through
TEMPLATE_DECL when looking up the abi_tag attribute.

PR c++/109715

gcc/cp/ChangeLog:

* mangle.cc (get_abi_tags): Look through TEMPLATE_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/abi/abi-tag25.C: New test.
---
  gcc/cp/mangle.cc |  3 +++
  gcc/testsuite/g++.dg/abi/abi-tag25.C | 17 +
  2 files changed, 20 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/abi/abi-tag25.C

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 0684f0e6038..1fbd879c116 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -527,6 +527,9 @@ get_abi_tags (tree t)
if (!t || TREE_CODE (t) == NAMESPACE_DECL)
  return NULL_TREE;
  
+  if (TREE_CODE (t) == TEMPLATE_DECL && DECL_TEMPLATE_RESULT (t))
+    t = DECL_TEMPLATE_RESULT (t);
+
if (DECL_P (t) && DECL_DECLARES_TYPE_P (t))
  t = TREE_TYPE (t);
  
diff --git a/gcc/testsuite/g++.dg/abi/abi-tag25.C b/gcc/testsuite/g++.dg/abi/abi-tag25.C
new file mode 100644
index 000..9847f0dccc8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/abi-tag25.C
@@ -0,0 +1,17 @@
+// PR c++/109715
+// { dg-do compile { target c++11 } }
+
+template
+[[gnu::abi_tag("foo")]] void fun() { }
+
+template void fun();
+
+#if __cpp_variable_templates
+template
+[[gnu::abi_tag("foo")]] int var = 0;
+
+template int var;
+#endif
+
+// { dg-final { scan-assembler "_Z3funB3fooIiEvv" } }
+// { dg-final { scan-assembler "_Z3varB3fooIiE" { target c++14 } } }
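
An aside on the dg-final patterns above: the `B3foo` infix is the Itanium ABI's abi-tag encoding, `B<length><tag>`, inserted right after the unqualified name. A small Python sketch of that decomposition; it only handles the flat `_Z<len>name(B<len>tag)*` prefix shape used in this test, nothing more:

```python
import re

def split_abi_tags(mangled):
    """Split a flat Itanium-mangled name like _Z3funB3fooIiEvv into
    (unqualified name, [abi tags], remainder).  Only the simple
    _Z<len>name(B<len>tag)* prefix shape from this test is handled."""
    m = re.match(r"_Z(\d+)", mangled)
    if not m:
        raise ValueError("not a mangling this sketch understands")
    n = int(m.group(1))
    pos = m.end()
    name = mangled[pos:pos + n]
    pos += n
    tags = []
    while mangled.startswith("B", pos):
        t = re.match(r"B(\d+)", mangled[pos:])
        if not t:  # 'B' that does not start an abi tag
            break
        tag_len = int(t.group(1))
        start = pos + t.end()
        tags.append(mangled[start:start + tag_len])
        pos = start + tag_len
    return name, tags, mangled[pos:]

print(split_abi_tags("_Z3funB3fooIiEvv"))  # ('fun', ['foo'], 'IiEvv')
```

The remainder (`IiEvv` above) is the template arguments plus function signature, which this sketch leaves undecoded.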




Re: [PATCH] c++: section attribute on templates [PR70435, PR88061]

2023-12-14 Thread Jason Merrill

On 12/14/23 14:17, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


-- >8 --

The section attribute currently has no effect on templates because the
call to set_decl_section_name only happens at parse time and not also at
instantiation time.  This patch fixes this by propagating the section
name from the template to the instantiation.

PR c++/70435
PR c++/88061

gcc/cp/ChangeLog:

* pt.cc (tsubst_function_decl): Call set_decl_section_name.
(tsubst_decl) : Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-section1.C: New test.
* g++.dg/ext/attr-section1a.C: New test.
* g++.dg/ext/attr-section2.C: New test.
* g++.dg/ext/attr-section2a.C: New test.
---
  gcc/cp/pt.cc  |  4 
  gcc/testsuite/g++.dg/ext/attr-section1.C  |  9 +
  gcc/testsuite/g++.dg/ext/attr-section1a.C | 11 +++
  gcc/testsuite/g++.dg/ext/attr-section2.C  |  9 +
  gcc/testsuite/g++.dg/ext/attr-section2a.C | 14 ++
  5 files changed, 47 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-section1.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-section1a.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-section2.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-section2a.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 50e6f062c85..8c4174fb902 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -14607,6 +14607,8 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t 
complain,
= remove_attribute ("visibility", DECL_ATTRIBUTES (r));
  }
determine_visibility (r);
+  if (DECL_SECTION_NAME (t))
+set_decl_section_name (r, t);
if (DECL_DEFAULTED_OUTSIDE_CLASS_P (r)
&& !processing_template_decl)
  defaulted_late_check (r);
@@ -15423,6 +15425,8 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain,
  = remove_attribute ("visibility", DECL_ATTRIBUTES (r));
  }
determine_visibility (r);
+   if (!local_p && DECL_SECTION_NAME (t))
+ set_decl_section_name (r, t);
  }
  
  	if (!local_p)

diff --git a/gcc/testsuite/g++.dg/ext/attr-section1.C 
b/gcc/testsuite/g++.dg/ext/attr-section1.C
new file mode 100644
index 000..b8ac65baa93
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attr-section1.C
@@ -0,0 +1,9 @@
+// PR c++/70435
+// { dg-do compile { target { c++11 && named_sections } } }
+
+template
+[[gnu::section(".foo")]] void fun() { }
+
+template void fun();
+
+// { dg-final { scan-assembler {.section[ \t]+.foo} } }
diff --git a/gcc/testsuite/g++.dg/ext/attr-section1a.C 
b/gcc/testsuite/g++.dg/ext/attr-section1a.C
new file mode 100644
index 000..be24be2fc95
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attr-section1a.C
@@ -0,0 +1,11 @@
+// PR c++/70435
+// { dg-do compile { target { c++11 && named_sections } } }
+
+template
+struct A {
+  [[gnu::section(".foo")]] void fun() { }
+};
+
+template struct A;
+
+// { dg-final { scan-assembler {.section[ \t]+.foo} } }
diff --git a/gcc/testsuite/g++.dg/ext/attr-section2.C 
b/gcc/testsuite/g++.dg/ext/attr-section2.C
new file mode 100644
index 000..a76f43b346f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attr-section2.C
@@ -0,0 +1,9 @@
+// PR c++/88061
+// { dg-do compile { target { c++14 && named_sections } } }
+
+template
+[[gnu::section(".foo")]] int var = 42;
+
+template int var;
+
+// { dg-final { scan-assembler {.section[ \t]+.foo} } }
diff --git a/gcc/testsuite/g++.dg/ext/attr-section2a.C 
b/gcc/testsuite/g++.dg/ext/attr-section2a.C
new file mode 100644
index 000..a0b01cd8d93
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attr-section2a.C
@@ -0,0 +1,14 @@
+// PR c++/88061
+// { dg-do compile { target { c++11 && named_sections } } }
+
+template
+struct A {
+  [[gnu::section(".foo")]] static int var;
+};
+
+template
+int A::var = 42;
+
+template struct A;
+
+// { dg-final { scan-assembler {.section[ \t]+.foo} } }




Re: [PATCH 5/6] Allow poly_uint64 for group_size args to vector type query routines

2023-12-14 Thread Richard Sandiford
Richard Biener  writes:
> The following changes the unsigned group_size argument to a poly_uint64
> one to avoid too much special-casing in callers for VLA vectors when
> passing down the effective maximum desirable vector size to vector
> type query routines.  The intent is to be able to pass down
> the vectorization factor (times the SLP group size) eventually.
>
>   * tree-vectorizer.h (get_vectype_for_scalar_type,
>   get_mask_type_for_scalar_type, vect_get_vector_types_for_stmt):
>   Change group_size argument to poly_uint64 type.
>   (vect_get_mask_type_for_stmt): Remove prototype for no longer
>   existing function.
>   * tree-vect-stmts.cc (get_vectype_for_scalar_type): Change
>   group_size argument to poly_uint64.
>   (get_mask_type_for_scalar_type): Likewise.
>   (vect_get_vector_types_for_stmt): Likewise.

LGTM FWIW, although...

> ---
>  gcc/tree-vect-stmts.cc | 25 ++---
>  gcc/tree-vectorizer.h  |  7 +++
>  2 files changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 88401a2a00b..a5e26b746fb 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -13297,14 +13297,14 @@ get_related_vectype_for_scalar_type (machine_mode 
> prevailing_mode,
>  
>  tree
>  get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
> -  unsigned int group_size)
> +  poly_uint64 group_size)
>  {
>/* For BB vectorization, we should always have a group size once we've
>   constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
>   are tentative requests during things like early data reference
>   analysis and pattern recognition.  */
>if (is_a  (vinfo))
> -gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
> +gcc_assert (vinfo->slp_instances.is_empty () || known_ne (group_size, 
> 0));
>else
>  group_size = 0;
>  
> @@ -13320,9 +13320,11 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree 
> scalar_type,
>  
>/* If the natural choice of vector type doesn't satisfy GROUP_SIZE,
>   try again with an explicit number of elements.  */
> +  uint64_t cst_group_size;
>if (vectype
> -  && group_size
> -  && maybe_ge (TYPE_VECTOR_SUBPARTS (vectype), group_size))
> +  && group_size.is_constant (&cst_group_size)
> +  && cst_group_size != 0
> +  && maybe_ge (TYPE_VECTOR_SUBPARTS (vectype), cst_group_size))
>  {
>/* Start with the biggest number of units that fits within
>GROUP_SIZE and halve it until we find a valid vector type.

...it feels like this makes sense for VLA too in some form, if we
plan to keep it longer-term.  It's not a trivial adaptation though,
so would definitely be a separate patch.

Thanks,
Richard

> @@ -13336,7 +13338,7 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree 
> scalar_type,
>even though the group is not a multiple of that vector size.
>The BB vectorizer will then try to carve up the group into
>smaller pieces.  */
> -  unsigned int nunits = 1 << floor_log2 (group_size);
> +  unsigned int nunits = 1 << floor_log2 (cst_group_size);
>do
>   {
> vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
> @@ -13372,7 +13374,7 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree 
> scalar_type, slp_tree node)
>  
>  tree
>  get_mask_type_for_scalar_type (vec_info *vinfo, tree scalar_type,
> -unsigned int group_size)
> +poly_uint64 group_size)
>  {
>tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, 
> group_size);
>  
> @@ -14243,7 +14245,7 @@ opt_result
>  vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>   tree *stmt_vectype_out,
>   tree *nunits_vectype_out,
> - unsigned int group_size)
> + poly_uint64 group_size)
>  {
>gimple *stmt = stmt_info->stmt;
>  
> @@ -14252,7 +14254,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>   are tentative requests during things like early data reference
>   analysis and pattern recognition.  */
>if (is_a  (vinfo))
> -gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
> +gcc_assert (vinfo->slp_instances.is_empty () || known_ne (group_size, 
> 0));
>else
>  group_size = 0;
>  
> @@ -14281,7 +14283,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>  
>tree vectype;
>tree scalar_type = NULL_TREE;
> -  if (group_size == 0 && STMT_VINFO_VECTYPE (stmt_info))
> +  if (known_eq (group_size, 0U) && STMT_VINFO_VECTYPE (stmt_info))
>  {
>vectype = STMT_VINFO_VECTYPE (stmt_info);
>if (dump_enabled_p ())
> @@ -14310,10 +14312,11 @@ vect_get_vector_types_for_stmt (vec
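
To restate the fallback in the hunks above in isolation: once the poly_uint64 group size is known to be a compile-time constant, get_vectype_for_scalar_type starts from the largest power of two not exceeding it and halves until a usable vector type turns up. A minimal Python model of that search loop, with `valid` standing in for the backend's vector-mode query (an assumption for illustration, not the real API):

```python
def pick_nunits(group_size, valid):
    """Mimic the halving search in get_vectype_for_scalar_type:
    start at 2^floor(log2(group_size)) and halve until `valid`
    accepts a lane count, giving up once we reach 1."""
    nunits = 1 << (group_size.bit_length() - 1)  # floor_log2
    while nunits > 1:
        if valid(nunits):
            return nunits
        nunits //= 2
    return None

# e.g. a group of 6 when only 4- and 2-lane vectors are available:
print(pick_nunits(6, lambda n: n in (2, 4)))  # 4
```

This also shows why the BB vectorizer can end up with a vector type covering only part of the group: the chosen lane count need not divide the group size, and the group is carved up afterwards.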

Re: [V4] [C PATCH 1/4] c23: tag compatibility rules for struct and unions

2023-12-14 Thread Joseph Myers
On Mon, 27 Nov 2023, Martin Uecker wrote:

> Note that there is an additional change in parser_xref_tag
> to address the issue regarding completeness in redefinition
> which affects also structs / unions.  The test c23-tag-6.c
> was changed accordingly.
> 
> 
> c23: tag compatibility rules for struct and unions
> 
> Implement redeclaration and compatibility rules for
> structures and unions in C23.

This patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-12-14 Thread Di Zhao OS

> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, December 13, 2023 5:01 PM
> To: Di Zhao OS 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> get_reassociation_width
> 
> On Wed, Dec 13, 2023 at 9:14 AM Di Zhao OS
>  wrote:
> >
> > Hello Richard,
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Monday, December 11, 2023 7:01 PM
> > > To: Di Zhao OS 
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> > > get_reassociation_width
> > >
> > > On Wed, Nov 29, 2023 at 3:36 PM Di Zhao OS
> > >  wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Richard Biener 
> > > > > Sent: Tuesday, November 21, 2023 9:01 PM
> > > > > To: Di Zhao OS 
> > > > > Cc: gcc-patches@gcc.gnu.org
> > > > > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> > > > > get_reassociation_width
> > > > >
> > > > > On Thu, Nov 9, 2023 at 6:53 PM Di Zhao OS
> 
> > > > > wrote:
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Richard Biener 
> > > > > > > Sent: Tuesday, October 31, 2023 9:48 PM
> > > > > > > To: Di Zhao OS 
> > > > > > > Cc: gcc-patches@gcc.gnu.org
> > > > > > > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> > > > > > > get_reassociation_width
> > > > > > >
> > > > > > > On Sun, Oct 8, 2023 at 6:40 PM Di Zhao OS
> > > 
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Attached is a new version of the patch.
> > > > > > > >
> > > > > > > > > -Original Message-
> > > > > > > > > From: Richard Biener 
> > > > > > > > > Sent: Friday, October 6, 2023 5:33 PM
> > > > > > > > > To: Di Zhao OS 
> > > > > > > > > Cc: gcc-patches@gcc.gnu.org
> > > > > > > > > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider
> FMA in
> > > > > > > > > get_reassociation_width
> > > > > > > > >
> > > > > > > > > On Thu, Sep 14, 2023 at 2:43 PM Di Zhao OS
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > This is a new version of the patch on "nested FMA".
> > > > > > > > > > Sorry for updating this after so long, I've been studying
> and
> > > > > > > > > > writing micro cases to sort out the cause of the regression.
> > > > > > > > >
> > > > > > > > > Sorry for taking so long to reply.
> > > > > > > > >
> > > > > > > > > > First, following previous discussion:
> > > > > > > > > > (https://gcc.gnu.org/pipermail/gcc-patches/2023-
> > > > > September/629080.html)
> > > > > > > > > >
> > > > > > > > > > 1. From testing more altered cases, I don't think the
> > > > > > > > > > problem is that reassociation works locally. In that:
> > > > > > > > > >
> > > > > > > > > >   1) On the example with multiplications:
> > > > > > > > > >
> > > > > > > > > > tmp1 = a + c * c + d * d + x * y;
> > > > > > > > > > tmp2 = x * tmp1;
> > > > > > > > > > result += (a + c + d + tmp2);
> > > > > > > > > >
> > > > > > > > > >   Given "result" rewritten by width=2, the performance is
> > > > > > > > > >   worse if we rewrite "tmp1" with width=2. In contrast, if
> we
> > > > > > > > > >   remove the multiplications from the example (and make
> "tmp1"
> > > > > > > > > >   not singe used), and still rewrite "result" by width=2,
> then
> > > > > > > > > >   rewriting "tmp1" with width=2 is better. (Make sense
> because
> > > > > > > > > >   the tree's depth at "result" is still smaller if we
> rewrite
> > > > > > > > > >   "tmp1".)
> > > > > > > > > >
> > > > > > > > > >   2) I tried to modify the assembly code of the example
> without
> > > > > > > > > >   FMA, so the width of "result" is 4. On Ampere1 there's no
> > > > > > > > > >   obvious improvement. So although this is an interesting
> > > > > > > > > >   problem, it doesn't seem like the cause of the regression.
> > > > > > > > >
> > > > > > > > > OK, I see.
> > > > > > > > >
> > > > > > > > > > 2. From assembly code of the case with FMA, one problem is
> > > > > > > > > > that, rewriting "tmp1" to parallel didn't decrease the
> > > > > > > > > > minimum CPU cycles (taking MULT_EXPRs into account), but
> > > > > > > > > > increased code size, so the overhead is increased.
> > > > > > > > > >
> > > > > > > > > >a) When "tmp1" is not re-written to parallel:
> > > > > > > > > > fmadd d31, d2, d2, d30
> > > > > > > > > > fmadd d31, d3, d3, d31
> > > > > > > > > > fmadd d31, d4, d5, d31  //"tmp1"
> > > > > > > > > > fmadd d31, d31, d4, d3
> > > > > > > > > >
> > > > > > > > > >b) When "tmp1" is re-written to parallel:
> > > > > > > > > > fmul  d31, d4, d5
> > > > > > > > > > fmadd d27, d2, d2, d30
> > > > > > > > > > fmadd d31, d3, d3, d31
> > > > > > > > > > fadd  d31, d31, d27 //"tmp1"
> > > > > > > > > > fmadd d31, d31, d4, d3
> > > > > > > > > >
> > > > > > > > > > For version a), there are 3 dependent FMAs to calculate
> "tmp1".
> > > > > > > > > 
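
As an aside on the comparison of schedules a) and b) quoted above: a toy critical-path model makes the observation concrete. The single latency constant below is a made-up placeholder, not an Ampere1 figure; the point is only that with every float op given the same latency, rewriting "tmp1" in parallel leaves the dependence depth unchanged while adding an instruction:

```python
LAT = 4  # one hypothetical latency, in cycles, for fmul/fadd/fmadd alike

def depth(graph):
    """Longest dependence chain, in cycles, for {node: [predecessors]}."""
    memo = {}
    def d(n):
        if n not in memo:
            memo[n] = LAT + max((d(p) for p in graph[n]), default=0)
        return memo[n]
    return max(d(n) for n in graph)

# a) "tmp1" kept as a chain of FMAs, then the dependent trailing fmadd
chain = {"f1": [], "f2": ["f1"], "f3": ["f2"], "f4": ["f3"]}

# b) "tmp1" rewritten with width 2: the fmul runs in parallel, but the
#    fadd recombining the two halves keeps the path just as long
parallel = {"mul": [], "f1": [], "f2": ["mul"],
            "add": ["f1", "f2"], "f4": ["add"]}

print(depth(chain), len(chain))        # 16 4
print(depth(parallel), len(parallel))  # 16 5
```

Same 16-cycle critical path, five instructions instead of four: the extra code is pure overhead under this (simplified) model, matching the regression described in the thread.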

Re: [V4] [PATCH 2/4] c23: tag compatibility rules for enums

2023-12-14 Thread Joseph Myers
On Mon, 27 Nov 2023, Martin Uecker wrote:

> + enum B : short { M = 1 } *y2e = &x2;/* { dg-warning "incompatible" 
> } */

This probably now needs to be dg-error because 
-Wincompatible-pointer-types is now an error by default.  OK with that 
changed as needed (you may also need such a change in patch 1, so retest 
both patches against current master before applying).

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH v4] c++: fix ICE with sizeof in a template [PR112869]

2023-12-14 Thread Marek Polacek
On Wed, Dec 13, 2023 at 03:28:38PM -0500, Jason Merrill wrote:
> On 12/12/23 17:48, Marek Polacek wrote:
> > On Fri, Dec 08, 2023 at 11:09:15PM -0500, Jason Merrill wrote:
> > > On 12/8/23 16:15, Marek Polacek wrote:
> > > > On Fri, Dec 08, 2023 at 12:09:18PM -0500, Jason Merrill wrote:
> > > > > On 12/5/23 15:31, Marek Polacek wrote:
> > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > 
> > > > > > -- >8 --
> > > > > > This test shows that we cannot clear *walk_subtrees in
> > > > > > cp_fold_immediate_r when we're in_immediate_context, because that,
> > > > > > as the comment says, affects cp_fold_r as well.  Here we had an
> > > > > > expression with
> > > > > > 
> > > > > >  min ((long int) VIEW_CONVERT_EXPR > > > > > int>(bytecount), (long int) <<< Unknown tree: sizeof_expr
> > > > > >(int) <<< error >>> >>>)
> > > > > > 
> > > > > > as its sub-expression, and we never evaluated that into
> > > > > > 
> > > > > >  min ((long int) bytecount, 4)
> > > > > > 
> > > > > > so the SIZEOF_EXPR leaked into the middle end.
> > > > > > 
> > > > > > (There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but 
> > > > > > that
> > > > > > one should be OK.)
> > > > > > 
> > > > > > PR c++/112869
> > > > > > 
> > > > > > gcc/cp/ChangeLog:
> > > > > > 
> > > > > > * cp-gimplify.cc (cp_fold_immediate_r): Don't clear 
> > > > > > *walk_subtrees
> > > > > > for unevaluated operands.
> > > > > 
> > > > > I agree that we want this change for in_immediate_context (), but I 
> > > > > don't
> > > > > see why we want it for TYPE_P or unevaluated_p (code) or
> > > > > cp_unevaluated_operand?
> > > > 
> > > > No particular reason, just paranoia.  How's this?
> > > > 
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > This test shows that we cannot clear *walk_subtrees in
> > > > cp_fold_immediate_r when we're in_immediate_context, because that,
> > > > as the comment says, affects cp_fold_r as well.  Here we had an
> > > > expression with
> > > > 
> > > > min ((long int) VIEW_CONVERT_EXPR(bytecount), 
> > > > (long int) <<< Unknown tree: sizeof_expr
> > > >   (int) <<< error >>> >>>)
> > > > 
> > > > as its sub-expression, and we never evaluated that into
> > > > 
> > > > min ((long int) bytecount, 4)
> > > > 
> > > > so the SIZEOF_EXPR leaked into the middle end.
> > > > 
> > > > (There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
> > > > one should be OK.)
> > > > 
> > > > PR c++/112869
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * cp-gimplify.cc (cp_fold_immediate_r): Don't clear 
> > > > *walk_subtrees
> > > > for in_immediate_context.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/template/sizeof18.C: New test.
> > > > ---
> > > >gcc/cp/cp-gimplify.cc| 6 +-
> > > >gcc/testsuite/g++.dg/template/sizeof18.C | 8 
> > > >2 files changed, 13 insertions(+), 1 deletion(-)
> > > >create mode 100644 gcc/testsuite/g++.dg/template/sizeof18.C
> > > > 
> > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > index 5abb91bbdd3..6af7c787372 100644
> > > > --- a/gcc/cp/cp-gimplify.cc
> > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > @@ -1179,11 +1179,15 @@ cp_fold_immediate_r (tree *stmt_p, int 
> > > > *walk_subtrees, void *data_)
> > > >  /* No need to look into types or unevaluated operands.
> > > > NB: This affects cp_fold_r as well.  */
> > > > -  if (TYPE_P (stmt) || unevaluated_p (code) || in_immediate_context ())
> > > > +  if (TYPE_P (stmt) || unevaluated_p (code))
> > > >{
> > > >  *walk_subtrees = 0;
> > > >  return NULL_TREE;
> > > >}
> > > > +  else if (in_immediate_context ())
> > > > +/* Don't clear *walk_subtrees here: we still need to walk the 
> > > > subtrees
> > > > +   of SIZEOF_EXPR and similar.  */
> > > > +return NULL_TREE;
> > > >  tree decl = NULL_TREE;
> > > >  bool call_p = false;
> > > > diff --git a/gcc/testsuite/g++.dg/template/sizeof18.C 
> > > > b/gcc/testsuite/g++.dg/template/sizeof18.C
> > > > new file mode 100644
> > > > index 000..afba9946258
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/g++.dg/template/sizeof18.C
> > > > @@ -0,0 +1,8 @@
> > > > +// PR c++/112869
> > > > +// { dg-do compile }
> > > > +
> > > > +void min(long, long);
> > > > +template  void Binaryread(int &, T, unsigned long);
> > > > +template <> void Binaryread(int &, float, unsigned long bytecount) {
> > > > +  min(bytecount, sizeof(int));
> > > > +}
> > > 
> > > Hmm, actually, why does the above make a difference for this testcase?
> > > 
> > > ...
> > > 
> > > It seems that in_immediate_context always returns true in cp_fold_function
> > > because current_binding_level->kind == sk_template_parms.  That seems 
> > > like a
> > > problem.  Maybe for cp_fold_immediate_r we only want to check

Re: [PATCH] i386: Fix missed APX_NDD check for shift/rotate expanders [PR 112943]

2023-12-14 Thread FX Coudert
The testcase fails on darwin:

+FAIL: gcc.target/i386/pr112943.c (test for excess errors)

because it does not support _Decimal64.

/* { dg-do compile { target { ! ia32 } } } */

should be changed to:

/* { dg-do compile { target { dfp && { ! ia32 } } } } */


Thanks,
FX


[PATCH] c++: fix parsing with auto(x) at block scope [PR112482]

2023-12-14 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This is sort of like r14-5514, but at block scope.  Consider

  struct A { A(int, int); };
  void
  g (int a)
  {
A bar(auto(a), 42); // not a fn decl
  }

where we emit error: 'auto' parameter not permitted in this context
which is bogus -- bar doesn't declare a function, so the auto is OK,
but we don't know it till we've seen the second argument.  The error
comes from grokdeclarator invoked just after we've parsed the auto(a).

A possible approach seems to be to delay the auto parameter checking
and only check once we know we are indeed dealing with a function
declaration.  For tparms, we should still emit the error right away.
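
The delayed check can be illustrated with a deliberately crude model of the disambiguation: a parenthesized initializer declares a function only if every argument can be read as a parameter declaration, so an `auto` parameter can only be diagnosed once the whole clause has been seen. Everything below, including the literal-vs-identifier test, is a toy stand-in and not the real parser logic:

```python
def classify(args):
    """Toy vexing-parse model for `T t(args...)`: it declares a function
    only if *every* argument could be a parameter declaration."""
    def could_be_param_decl(a):
        # crude stand-in: "auto(x)" could open a parameter declaration,
        # a bare literal such as "42" cannot
        return not a.isdigit()
    return "function" if all(could_be_param_decl(a) for a in args) else "variable"

def delayed_auto_errors(args):
    """Mimic the patch: only once the whole clause is known to declare
    a function do the `auto` parameters become errors."""
    if classify(args) != "function":
        return []  # auto(x) was just a cast; no diagnostic
    return [a for a in args if a.startswith("auto(")]

print(delayed_auto_errors(["auto(a)", "42"]))        # []
print(delayed_auto_errors(["auto(a)", "auto(a2)"]))  # ['auto(a)', 'auto(a2)']
```

The two calls correspond to the accepted `A b2(auto(a), 42);` and the rejected `A b4(auto(a), auto(a2));` cases in the new test below.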

PR c++/112482

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Do not issue the auto parameter error while
tentatively parsing a function parameter.
* parser.cc (cp_parser_parameter_declaration_clause): Check it here.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/auto-fncast15.C: New test.
---
 gcc/cp/decl.cc | 13 +++--
 gcc/cp/parser.cc   | 17 +++--
 gcc/testsuite/g++.dg/cpp23/auto-fncast15.C | 21 +
 3 files changed, 47 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast15.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 4d17ead123a..1ffe4c82748 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -14203,6 +14203,7 @@ grokdeclarator (const cp_declarator *declarator,
   tree auto_node = type_uses_auto (type);
   if (auto_node && !(cxx_dialect >= cxx17 && template_parm_flag))
{
+ bool err_p = true;
  if (cxx_dialect >= cxx14)
{
  if (decl_context == PARM && AUTO_IS_DECLTYPE (auto_node))
@@ -14221,13 +14222,21 @@ grokdeclarator (const cp_declarator *declarator,
"abbreviated function template");
  inform (DECL_SOURCE_LOCATION (c), "%qD declared here", c);
}
- else
+ else if (decl_context == CATCHPARM || template_parm_flag)
error_at (typespec_loc,
  "% parameter not permitted in this context");
+ else
+   /* Do not issue an error while tentatively parsing a function
+  parameter: for T t(auto(a), 42);, when we just saw the 1st
+  parameter, we don't know yet that this construct won't be
+  a function declaration.  Defer the checking to
+  cp_parser_parameter_declaration_clause.  */
+   err_p = false;
}
  else
error_at (typespec_loc, "parameter declared %");
- type = error_mark_node;
+ if (err_p)
+   type = error_mark_node;
}
 
   /* A parameter declared as an array of T is really a pointer to T.
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 58e910d64af..e4fbab1bab5 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -25102,8 +25102,21 @@ cp_parser_parameter_declaration_clause (cp_parser* 
parser,
  committed yet, nor should we.  Pushing here will detect the error
  of redefining a parameter.  */
   if (cp_lexer_next_token_is (parser->lexer, CPP_CLOSE_PAREN))
-for (tree p : pending_decls)
-  pushdecl (p);
+{
+  for (tree p : pending_decls)
+   pushdecl (p);
+
+  /* Delayed checking of auto parameters.  */
+  if (!parser->auto_is_implicit_function_template_parm_p
+ && cxx_dialect >= cxx14)
+   for (tree p = parameters; p; p = TREE_CHAIN (p))
+ if (type_uses_auto (TREE_TYPE (TREE_VALUE (p))))
+   {
+ error_at (location_of (TREE_VALUE (p)),
+   "% parameter not permitted in this context");
+ TREE_TYPE (TREE_VALUE (p)) = error_mark_node;
+   }
+}
 
   /* Finish the parameter list.  */
   if (!ellipsis_p)
diff --git a/gcc/testsuite/g++.dg/cpp23/auto-fncast15.C 
b/gcc/testsuite/g++.dg/cpp23/auto-fncast15.C
new file mode 100644
index 000..deb1efcc46c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/auto-fncast15.C
@@ -0,0 +1,21 @@
+// PR c++/112482
+// { dg-do compile { target c++23 } }
+// { dg-options "-Wno-vexing-parse" }
+
+void foo (auto i, auto j);
+
+struct A {
+   A(int,int);
+};
+
+void
+g (int a)
+{
+  A b1(auto(42), auto(42));
+  A b2(auto(a), auto(42));
+  A b3(auto(42), auto(a));
+  A b4(auto(a), // { dg-error "13:'auto' parameter" }
+   auto(a2)); // { dg-error "13:'auto' parameter" }
+  int v1(auto(42));
+  int fn1(auto(a)); // { dg-error "16:'auto' parameter" }
+}

base-commit: 767e2674875139ac8f354ceee655c1a9561b9779
-- 
2.43.0



Re: [PATCH v4] c++: fix ICE with sizeof in a template [PR112869]

2023-12-14 Thread Jason Merrill

On 12/14/23 16:01, Marek Polacek wrote:

On Wed, Dec 13, 2023 at 03:28:38PM -0500, Jason Merrill wrote:

On 12/12/23 17:48, Marek Polacek wrote:

On Fri, Dec 08, 2023 at 11:09:15PM -0500, Jason Merrill wrote:

On 12/8/23 16:15, Marek Polacek wrote:

On Fri, Dec 08, 2023 at 12:09:18PM -0500, Jason Merrill wrote:

On 12/5/23 15:31, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that,
as the comment says, affects cp_fold_r as well.  Here we had an
expression with

  min ((long int) VIEW_CONVERT_EXPR(bytecount), (long int) 
<<< Unknown tree: sizeof_expr
(int) <<< error >>> >>>)

as its sub-expression, and we never evaluated that into

  min ((long int) bytecount, 4)

so the SIZEOF_EXPR leaked into the middle end.

(There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
one should be OK.)

PR c++/112869

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r): Don't clear *walk_subtrees
for unevaluated operands.


I agree that we want this change for in_immediate_context (), but I don't
see why we want it for TYPE_P or unevaluated_p (code) or
cp_unevaluated_operand?


No particular reason, just paranoia.  How's this?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that,
as the comment says, affects cp_fold_r as well.  Here we had an
expression with

 min ((long int) VIEW_CONVERT_EXPR(bytecount), (long int) 
<<< Unknown tree: sizeof_expr
   (int) <<< error >>> >>>)

as its sub-expression, and we never evaluated that into

 min ((long int) bytecount, 4)

so the SIZEOF_EXPR leaked into the middle end.

(There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
one should be OK.)

PR c++/112869

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r): Don't clear *walk_subtrees
for in_immediate_context.

gcc/testsuite/ChangeLog:

* g++.dg/template/sizeof18.C: New test.
---
gcc/cp/cp-gimplify.cc| 6 +-
gcc/testsuite/g++.dg/template/sizeof18.C | 8 
2 files changed, 13 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/g++.dg/template/sizeof18.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 5abb91bbdd3..6af7c787372 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1179,11 +1179,15 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, 
void *data_)
  /* No need to look into types or unevaluated operands.
 NB: This affects cp_fold_r as well.  */
-  if (TYPE_P (stmt) || unevaluated_p (code) || in_immediate_context ())
+  if (TYPE_P (stmt) || unevaluated_p (code))
{
  *walk_subtrees = 0;
  return NULL_TREE;
}
+  else if (in_immediate_context ())
+/* Don't clear *walk_subtrees here: we still need to walk the subtrees
+   of SIZEOF_EXPR and similar.  */
+return NULL_TREE;
  tree decl = NULL_TREE;
  bool call_p = false;
diff --git a/gcc/testsuite/g++.dg/template/sizeof18.C 
b/gcc/testsuite/g++.dg/template/sizeof18.C
new file mode 100644
index 000..afba9946258
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/sizeof18.C
@@ -0,0 +1,8 @@
+// PR c++/112869
+// { dg-do compile }
+
+void min(long, long);
+template  void Binaryread(int &, T, unsigned long);
+template <> void Binaryread(int &, float, unsigned long bytecount) {
+  min(bytecount, sizeof(int));
+}


Hmm, actually, why does the above make a difference for this testcase?

...

It seems that in_immediate_context always returns true in cp_fold_function
because current_binding_level->kind == sk_template_parms.  That seems like a
problem.  Maybe for cp_fold_immediate_r we only want to check
cp_unevaluated_operand or DECL_IMMEDIATE_CONTEXT (current_function_decl)?


Yeah, I suppose that could become an issue.  How about this, then?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
-- >8 --
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that,
as the comment says, affects cp_fold_r as well.  Here we had an
expression with

min ((long int) VIEW_CONVERT_EXPR(bytecount), (long int) 
<<< Unknown tree: sizeof_expr
  (int) <<< error >>> >>>)

as its sub-expression, and we never evaluated that into

min ((long int) bytecount, 4)

so the SIZEOF_EXPR leaked into the middle end.

(There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
one should be OK.)

PR c++/112869

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r): Don't clear *walk_subtrees
in an unevaluated operand or immediate function.

gcc/testsuite/ChangeLog:

* g++.dg/template/sizeof18.C: New test.
--
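
The shared-walk subtlety at the heart of this fix can be sketched outside of GCC: cp_fold_r and cp_fold_immediate_r hang off one tree walk, so pruning subtrees for the immediate-context check also hides them from folding. A toy model with dict-based trees and made-up node kinds:

```python
def walk(node, callback, out):
    """One shared pre-order walk; returning False from the callback
    prunes the subtree for *every* client of the walk."""
    if callback(node, out):
        for child in node.get("kids", []):
            walk(child, callback, out)

def fold_and_check(node, out):
    # cp_fold_r's share of the work: fold sizeof into a constant
    if node["kind"] == "sizeof_expr":
        out.append("folded sizeof(" + node["of"] + ")")
    # cp_fold_immediate_r's share: in an immediate context there is
    # nothing to check -- but returning False here (clearing
    # *walk_subtrees) would also keep the folder from ever seeing
    # the subtrees, leaking the unfolded sizeof
    return True

tree = {"kind": "call", "kids": [
          {"kind": "min_expr", "kids": [
             {"kind": "sizeof_expr", "of": "int", "kids": []}]}]}

out = []
walk(tree, fold_and_check, out)
print(out)  # ['folded sizeof(int)']
```

With the pre-fix behavior (returning False for immediate contexts), `out` would stay empty and the SIZEOF_EXPR would survive to the middle end, which is exactly the reported ICE scenario.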

Update 'gcc.dg/vect/vect-simd-clone-*.c' GCN 'dg-warning's (was: [PATCH] aarch64: enable mixed-types for aarch64 simdclones)

2023-12-14 Thread Thomas Schwinge
Hi!

On 2023-10-16T16:03:26+0100, "Andre Vieira (lists)" 
 wrote:
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
> @@ -12,8 +12,13 @@ int array[N];
>
>  #pragma omp declare simd simdlen(4) notinbranch
>  #pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
> +#ifdef __aarch64__
> +#pragma omp declare simd simdlen(2) notinbranch
> +#pragma omp declare simd simdlen(2) notinbranch uniform(b) linear(c:3)
> +#else
>  #pragma omp declare simd simdlen(8) notinbranch
>  #pragma omp declare simd simdlen(8) notinbranch uniform(b) linear(c:3)
> +#endif
>  __attribute__((noinline)) int
>  foo (int a, int b, int c)
>  {

These added lines run afoul of end-of-file GCN-specific DejaGnu
directives:

[...]
/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* 
} 18 } */
/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* 
} 18 } */

That, indeed, also has been suboptimal, to use absolute line numbers
here.  (..., and maybe, like aarch64 has now done, GCN also should
suitably parameterize the 'simdlen', to resolve this altogether?)
Until then, to resolve regressions, I've pushed to master branch
commit 7b15959f8e35b821ebfe832a36e5e712b708dae1
"Update 'gcc.dg/vect/vect-simd-clone-*.c' GCN 'dg-warning's", see
attached.


Regards
 Thomas


> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
> @@ -12,8 +12,13 @@ int array[N] __attribute__((aligned (32)));
>
>  #pragma omp declare simd simdlen(4) notinbranch aligned(a:16) uniform(a) 
> linear(b)
>  #pragma omp declare simd simdlen(4) notinbranch aligned(a:32) uniform(a) 
> linear(b)
> +#ifdef __aarch64__
> +#pragma omp declare simd simdlen(2) notinbranch aligned(a:16) uniform(a) 
> linear(b)
> +#pragma omp declare simd simdlen(2) notinbranch aligned(a:32) uniform(a) 
> linear(b)
> +#else
>  #pragma omp declare simd simdlen(8) notinbranch aligned(a:16) uniform(a) 
> linear(b)
>  #pragma omp declare simd simdlen(8) notinbranch aligned(a:32) uniform(a) 
> linear(b)
> +#endif
>  __attribute__((noinline)) void
>  foo (int *a, int b, int c)
>  {

> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
> @@ -12,7 +12,11 @@ float d[N];
>  int e[N];
>  unsigned short f[N];
>
> +#ifdef __aarch64__
> +#pragma omp declare simd simdlen(4) notinbranch uniform(b)
> +#else
>  #pragma omp declare simd simdlen(8) notinbranch uniform(b)
> +#endif
>  __attribute__((noinline)) float
>  foo (float a, float b, float c)
>  {

> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
> @@ -10,7 +10,11 @@
>
>  int d[N], e[N];
>
> +#ifdef __aarch64__
> +#pragma omp declare simd simdlen(2) notinbranch uniform(b) linear(c:3)
> +#else
>  #pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
> +#endif
>  __attribute__((noinline)) long long int
>  foo (int a, int b, int c)
>  {

> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
> @@ -12,14 +12,22 @@ int a[N], b[N];
>  long int c[N];
>  unsigned char d[N];
>
> +#ifdef __aarch64__
> +#pragma omp declare simd simdlen(2) notinbranch
> +#else
>  #pragma omp declare simd simdlen(8) notinbranch
> +#endif
>  __attribute__((noinline)) int
>  foo (long int a, int b, int c)
>  {
>return a + b + c;
>  }
>
> +#ifdef __aarch64__
> +#pragma omp declare simd simdlen(2) notinbranch
> +#else
>  #pragma omp declare simd simdlen(8) notinbranch
> +#endif
>  __attribute__((noinline)) long int
>  bar (int a, int b, long int c)
>  {


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 7b15959f8e35b821ebfe832a36e5e712b708dae1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 14 Dec 2023 10:47:35 +0100
Subject: [PATCH] Update 'gcc.dg/vect/vect-simd-clone-*.c' GCN 'dg-warning's

Recent commit f5fc001a84a7dbb942a6252b3162dd38b4aae311
"aarch64: enable mixed-types for aarch64 simdclones" added lines to those
test cases and GCN-specific line numbers got out of sync, which had
originally gotten added in commit b73c49f6f88dd7f7569f9a72c8ceb04598d4c15c
"amdgcn: OpenMP SIMD routine support".

	gcc/testsuite/
	* gcc.dg/vect/vect-simd-clone-1.c: Update GCN 'dg-warning's.
	* gcc.dg/vect/vect-simd-clone-2.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-3.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-4.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-5.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-8.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c | 5 ++---
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c | 5 ++---
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c | 3 +--
 gcc/testsuite/gcc.dg/vect/vect-simd-clone

Re: [PATCH] c++: fix parsing with auto(x) at block scope [PR112482]

2023-12-14 Thread Jason Merrill

On 12/14/23 16:02, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
This is sort of like r14-5514, but at block scope.  Consider

   struct A { A(int, int); };
   void
   g (int a)
   {
 A bar(auto(a), 42); // not a fn decl
   }

where we emit error: 'auto' parameter not permitted in this context
which is bogus -- bar doesn't declare a function, so the auto is OK,
but we don't know it till we've seen the second argument.  The error
comes from grokdeclarator invoked just after we've parsed the auto(a).

A possible approach seems to be to delay the auto parameter checking
and only check once we know we are indeed dealing with a function
declaration.  For tparms, we should still emit the error right away.

PR c++/112482

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Do not issue the auto parameter error while
tentatively parsing a function parameter.
* parser.cc (cp_parser_parameter_declaration_clause): Check it here.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/auto-fncast15.C: New test.
---
  gcc/cp/decl.cc | 13 +++--
  gcc/cp/parser.cc   | 17 +++--
  gcc/testsuite/g++.dg/cpp23/auto-fncast15.C | 21 +
  3 files changed, 47 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast15.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 4d17ead123a..1ffe4c82748 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -14203,6 +14203,7 @@ grokdeclarator (const cp_declarator *declarator,
tree auto_node = type_uses_auto (type);
if (auto_node && !(cxx_dialect >= cxx17 && template_parm_flag))
{
+ bool err_p = true;
  if (cxx_dialect >= cxx14)
{
  if (decl_context == PARM && AUTO_IS_DECLTYPE (auto_node))
@@ -14221,13 +14222,21 @@ grokdeclarator (const cp_declarator *declarator,
"abbreviated function template");
  inform (DECL_SOURCE_LOCATION (c), "%qD declared here", c);
}
- else
+ else if (decl_context == CATCHPARM || template_parm_flag)
error_at (typespec_loc,
  "%<auto%> parameter not permitted in this context");
+ else
+   /* Do not issue an error while tentatively parsing a function
+  parameter: for T t(auto(a), 42);, when we just saw the 1st
+  parameter, we don't know yet that this construct won't be
+  a function declaration.  Defer the checking to
+  cp_parser_parameter_declaration_clause.  */
+   err_p = false;
}
  else
error_at (typespec_loc, "parameter declared %<auto%>");
- type = error_mark_node;
+ if (err_p)
+   type = error_mark_node;
}
  
/* A parameter declared as an array of T is really a pointer to T.

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 58e910d64af..e4fbab1bab5 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -25102,8 +25102,21 @@ cp_parser_parameter_declaration_clause (cp_parser* 
parser,
   committed yet, nor should we.  Pushing here will detect the error
   of redefining a parameter.  */
if (cp_lexer_next_token_is (parser->lexer, CPP_CLOSE_PAREN))
-for (tree p : pending_decls)
-  pushdecl (p);
+{
+  for (tree p : pending_decls)
+   pushdecl (p);
+
+  /* Delayed checking of auto parameters.  */
+  if (!parser->auto_is_implicit_function_template_parm_p
+ && cxx_dialect >= cxx14)
+   for (tree p = parameters; p; p = TREE_CHAIN (p))
+ if (type_uses_auto (TREE_TYPE (TREE_VALUE (p))))
+   {
+ error_at (location_of (TREE_VALUE (p)),
+   "%<auto%> parameter not permitted in this context");
+ TREE_TYPE (TREE_VALUE (p)) = error_mark_node;
+   }
+}
  
/* Finish the parameter list.  */

if (!ellipsis_p)
diff --git a/gcc/testsuite/g++.dg/cpp23/auto-fncast15.C 
b/gcc/testsuite/g++.dg/cpp23/auto-fncast15.C
new file mode 100644
index 000..deb1efcc46c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/auto-fncast15.C
@@ -0,0 +1,21 @@
+// PR c++/112482
+// { dg-do compile { target c++23 } }
+// { dg-options "-Wno-vexing-parse" }
+
+void foo (auto i, auto j);
+
+struct A {
+   A(int,int);
+};
+
+void
+g (int a)
+{
+  A b1(auto(42), auto(42));
+  A b2(auto(a), auto(42));
+  A b3(auto(42), auto(a));
+  A b4(auto(a), // { dg-error "13:'auto' parameter" }
+   auto(a2)); // { dg-error "13:'auto' parameter" }
+  int v1(auto(42));
+  int fn1(auto(a)); // { dg-error "16:'auto' parameter" }
+}

base-commit: 767e2674875139ac8f354ceee655c1a9561b9779




Re: [PATCH] c++: abi_tag attribute on templates [PR109715]

2023-12-14 Thread Patrick Palka
On Thu, 14 Dec 2023, Jason Merrill wrote:

> On 12/14/23 14:17, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?  Do we want to condition this on abi_check (19)?
> 
> I think we do, sadly.

Sounds good, like so?  Bootstrap and regtest in progress.

-- >8 --

Subject: [PATCH] c++: abi_tag attribute on templates [PR109715]

As with other declaration attributes, we need to look through
TEMPLATE_DECL when looking up the abi_tag attribute.

PR c++/109715

gcc/cp/ChangeLog:

* mangle.cc (get_abi_tags): Look through TEMPLATE_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/abi/abi-tag25.C: New test.
* g++.dg/abi/abi-tag25a.C: New test.
---
 gcc/cp/mangle.cc  |  6 ++
 gcc/testsuite/g++.dg/abi/abi-tag25.C  | 17 +
 gcc/testsuite/g++.dg/abi/abi-tag25a.C | 11 +++
 3 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/abi/abi-tag25.C
 create mode 100644 gcc/testsuite/g++.dg/abi/abi-tag25a.C

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 0684f0e6038..e3383df1836 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -530,6 +530,12 @@ get_abi_tags (tree t)
   if (DECL_P (t) && DECL_DECLARES_TYPE_P (t))
 t = TREE_TYPE (t);
 
+  if (TREE_CODE (t) == TEMPLATE_DECL
+  && DECL_TEMPLATE_RESULT (t)
+  /* We used to ignore abi_tag on function and variable templates.  */
+  && abi_check (19))
+t = DECL_TEMPLATE_RESULT (t);
+
   tree attrs;
   if (TYPE_P (t))
 attrs = TYPE_ATTRIBUTES (t);
diff --git a/gcc/testsuite/g++.dg/abi/abi-tag25.C 
b/gcc/testsuite/g++.dg/abi/abi-tag25.C
new file mode 100644
index 000..9847f0dccc8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/abi-tag25.C
@@ -0,0 +1,17 @@
+// PR c++/109715
+// { dg-do compile { target c++11 } }
+
+template
+[[gnu::abi_tag("foo")]] void fun() { }
+
+template void fun();
+
+#if __cpp_variable_templates
+template
+[[gnu::abi_tag("foo")]] int var = 0;
+
+template int var;
+#endif
+
+// { dg-final { scan-assembler "_Z3funB3fooIiEvv" } }
+// { dg-final { scan-assembler "_Z3varB3fooIiE" { target c++14 } } }
diff --git a/gcc/testsuite/g++.dg/abi/abi-tag25a.C 
b/gcc/testsuite/g++.dg/abi/abi-tag25a.C
new file mode 100644
index 000..9499b5614cd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/abi-tag25a.C
@@ -0,0 +1,11 @@
+// PR c++/109715
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fabi-version=18 -fabi-compat-version=18 -Wabi=0" }
+
+#include "abi-tag25.C"
+
+// { dg-warning "mangled name" "" { target *-*-* } 5 }
+// { dg-warning "mangled name" "" { target *-*-* } 11 }
+
+// { dg-final { scan-assembler "_Z3funIiEvv" } }
+// { dg-final { scan-assembler "_Z3varIiE" { target c++14 } } }
-- 
2.43.0.76.g1a87c842ec




Re: [V4] [PATCH 3/4] c23: aliasing of compatible tagged types

2023-12-14 Thread Joseph Myers
On Mon, 27 Nov 2023, Martin Uecker wrote:

> diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
> index a5dd9a37944..ece5b6a5d26 100644
> --- a/gcc/c/c-tree.h
> +++ b/gcc/c/c-tree.h
> @@ -758,6 +758,7 @@ extern tree require_complete_type (location_t, tree);
>  extern bool same_translation_unit_p (const_tree, const_tree);
>  extern int comptypes (tree, tree);
>  extern bool comptypes_same_p (tree, tree);
> +extern int comptypes_equiv_p (tree, tree);

This function should return bool.

> @@ -1250,6 +1266,9 @@ comptypes_internal (const_tree type1, const_tree type2,
>  
>   if ((d1 == NULL_TREE) != (d2 == NULL_TREE))
> data->different_types_p = true;
> + /* Ignore size mismatches.  */
> + if (data->equiv)
> +   return 1;
>   /* Sizes must match unless one is missing or variable.  */
>   if (d1 == NULL_TREE || d2 == NULL_TREE || d1 == d2)
> return true;
> @@ -1467,6 +1486,9 @@ tagged_types_tu_compatible_p (const_tree t1, const_tree 
> t2,
>   if (list_length (TYPE_FIELDS (t1)) != list_length (TYPE_FIELDS (t2)))
> return false;
>  
> + if (data->equiv && (C_TYPE_VARIABLE_SIZE (t1) || C_TYPE_VARIABLE_SIZE 
> (t2)))
> +   return 0;
> +
>   for (s1 = TYPE_FIELDS (t1), s2 = TYPE_FIELDS (t2);
>s1 && s2;
>s1 = DECL_CHAIN (s1), s2 = DECL_CHAIN (s2))
> @@ -1486,6 +1508,15 @@ tagged_types_tu_compatible_p (const_tree t1, 
> const_tree t2,
>   && simple_cst_equal (DECL_FIELD_BIT_OFFSET (s1),
>DECL_FIELD_BIT_OFFSET (s2)) != 1)
> return false;
> +
> + tree st1 = TYPE_SIZE (TREE_TYPE (s1));
> + tree st2 = TYPE_SIZE (TREE_TYPE (s2));
> +
> + if (data->equiv
> + && st1 && TREE_CODE (st1) == INTEGER_CST
> + && st2 && TREE_CODE (st2) == INTEGER_CST
> + && !tree_int_cst_equal (st1, st2))
> +  return 0;

And these functions do return bool, so you should use true and false 
instead of 1 and 0.

> +/* The structs are incompatible so can be assumed not to
> + * alias, but this is not exploited.  So do not check for 
> + * this below but check the warning about incompatibility.  */
> +
> +int test_bar3(struct bar* a, void* b)
> +{
> + a->x = 1;
> +
> + struct bar { int x; int f[1]; }* p = b;
> + struct bar* q = a;  /* { dg-warning "incompatible" 
> } */

I expect you'll now need -fpermissive or 
-Wno-error=incompatible-pointer-types (this is an execution test so you 
need to stop this being an error, but see below).

> + // allow both results here
> + int r = test_bar3(&z, &z);
> + if ((r != 2) && (r != 1))
> + __builtin_abort();

I don't think you should really be executing this call at all (aliasing 
not allowed means undefined behavior at runtime); better to put this in a 
separate compile-only test (which would also avoid the need for 
-fpermissive or -Wno-error=incompatible-pointer-types because once it's no 
longer an execution test, having an error is OK).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fix tests for gomp

2023-12-14 Thread Thomas Schwinge
Hi!

On 2023-12-13T12:09:14+0100, Jakub Jelinek  wrote:
> On Wed, Dec 13, 2023 at 11:03:50AM +, Andre Vieira (lists) wrote:
>> Hmm I think I understand what you are saying, but I'm not sure I agree.
>> So before I enabled simdclone testing for aarch64, this test had no target
>> selectors. So it checked the same for 'all simdclone test targets'. Which
>> seem to be x86 and amdgcn:
>>
>> @@ -4321,7 +4321,8 @@ proc check_effective_target_vect_simd_clones { } {
>>  return [check_cached_effective_target_indexed vect_simd_clones {
>>expr { (([istarget i?86-*-*] || [istarget x86_64-*-*])
>>   && [check_effective_target_avx512f])
>> || [istarget amdgcn-*-*]
>> || [istarget aarch64*-*-*] }}]
>>  }
>>
>> I haven't checked what amdgcn does with this test, but I'd have to assume
>> they were passing before? Though I'm not sure how amdgcn would pass the
>> original:

>> --- a/libgomp/testsuite/libgomp.c/declare-variant-1.c
>> +++ b/libgomp/testsuite/libgomp.c/declare-variant-1.c

>>  -  /* At gimplification time, we can't decide yet which function to call.  
>> */
>>  -  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
>
> It can't really pass there.  amdgcn certainly doesn't create 4 different
> simd clones where one has avx512f isa and others don't.
> gcn creates just one simd clone with simdlen 64 and that clone will never
> support avx512f isa and we know that already at gimplification time.

For GCN target (and likewise, nvptx target) configurations, libgomp test
cases currently are a total mess -- the reason being that those target
configurations actually (largely) implement GCN or nvptx *offloading*
configuration functionality: they lower OMP constructs and implement
libgomp functions in a way that (largely) assumes that they're
*offloading* instead of *target* configurations, and therefore things go
horribly wrong.  (This certainly is something worth fixing, but...)
Therefore, currently, GCN or nvptx *target* configuration's
'check-target-libgomp' currently doesn't really have any value, and
certainly isn't maintained in any way.


Grüße
 Thomas


[PATCH #2/2] strub: sparc64: unbias the stack address [PR112917]

2023-12-14 Thread Alexandre Oliva


The stack pointer is biased by 2047 bytes on sparc64, so the range it
delimits is way off.  Unbias the addresses returned by
__builtin_stack_address (), so that the strub builtins, inlined or
not, can function correctly.  I've considered introducing a new target
macro, but using STACK_POINTER_OFFSET seems safe, and it enables the
register save areas to be scrubbed as well.

Because of the large fixed-size outgoing args area next to the
register save area on sparc, we still need __strub_leave to not
allocate its own frame, otherwise it won't be able to clear part of
the frame it should.

Regstrapped on x86_64-linux-gnu, also testing on sparc-solaris2.11.3.
Ok to install?



for  gcc/ChangeLog

PR middle-end/112917
* builtins.cc (expand_bultin_stack_address): Add
STACK_POINTER_OFFSET.
---
 gcc/builtins.cc |   34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 7c2732ab79e6f..4c8c514fe8618 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -5443,8 +5443,38 @@ expand_builtin_frame_address (tree fndecl, tree exp)
 static rtx
 expand_builtin_stack_address ()
 {
-  return convert_to_mode (ptr_mode, copy_to_reg (stack_pointer_rtx),
- STACK_UNSIGNED);
+  rtx ret = convert_to_mode (ptr_mode, copy_to_reg (stack_pointer_rtx),
+STACK_UNSIGNED);
+
+  /* Unbias the stack pointer, bringing it to the boundary between the
+ stack area claimed by the active function calling this builtin,
+ and stack ranges that could get clobbered if it called another
+ function.  It should NOT encompass any stack red zone, that is
+ used in leaf functions.
+
+ On SPARC, the register save area is *not* considered active or
+ used by the active function, but rather as akin to the area in
+ which call-preserved registers are saved by callees.  This
+ enables __strub_leave to clear what would otherwise overlap with
+ its own register save area.
+
+ If the address is computed too high or too low, parts of a stack
+ range that should be scrubbed may be left unscrubbed, scrubbing
+ may corrupt active portions of the stack frame, and stack ranges
+ may be doubly-scrubbed by caller and callee.
+
+ In order for it to be just right, the area delimited by
+ @code{__builtin_stack_address} and @code{__builtin_frame_address
+ (0)} should encompass caller's registers saved by the function,
+ local on-stack variables and @code{alloca} stack areas.
+ Accumulated outgoing on-stack arguments, preallocated as part of
+ a function's own prologue, are to be regarded as part of the
+ (caller) function's active area as well, whereas those pushed or
+ allocated temporarily for a call are regarded as part of the
+ callee's stack range, rather than the caller's.  */
+  ret = plus_constant (ptr_mode, ret, STACK_POINTER_OFFSET);
+
+  return force_reg (ptr_mode, ret);
 }
 
 /* Expand a call to builtin function __builtin_strub_enter.  */

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] RISC-V: Add -fno-vect-cost-model to pr112773 testcase

2023-12-14 Thread Patrick O'Neill
The testcase for pr112773 started passing after r14-6472-g8501edba91e
which was before the actual fix. This patch adds -fno-vect-cost-model
which prevents the testcase from passing due to the vls change.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/pr112773.c: Add
-fno-vect-cost-model.

Signed-off-by: Patrick O'Neill 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
index 5f7374b0040..57104c9ebec 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv_zvl256b -mabi=ilp32d -O3" } */
+/* { dg-additional-options "-march=rv32gcv_zvl256b -mabi=ilp32d -O3 
-fno-vect-cost-model" } */
 
 long long a;
 int b, c;
-- 
2.42.0



[COMMITTED] middle-end: Fix up constant handling in emit_conditional_move [PR111260]

2023-12-14 Thread Andrew Pinski
After r14-2667-gceae1400cf24f329393e96dd9720, we force a constant into a
register if it is shared with one of the other operands. The problem is
that we used the comparison mode for that register, but it could differ
from the operand mode, which causes issues on some targets.
To fix it, we need to make sure the mode of the comparison matches the
mode of the other operands before we compare the constants (a CONST_INT
has no mode, so compare_rtx returns true whenever the values match, even
if the uses are in different modes).

Bootstrapped and tested on both aarch64-linux-gnu and x86_64-linux.

PR middle-end/111260

gcc/ChangeLog:

* optabs.cc (emit_conditional_move): Change the modes to be
equal before forcing the constant to a register.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/condmove-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/optabs.cc| 2 ++
 gcc/testsuite/gcc.c-torture/compile/condmove-1.c | 9 +
 2 files changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/condmove-1.c

diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index f0a048a6bdb..6a34276c239 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -5131,6 +5131,7 @@ emit_conditional_move (rtx target, struct rtx_comparison 
comp,
  /* If we are optimizing, force expensive constants into a register
 but preserve an eventual equality with op2/op3.  */
  if (CONSTANT_P (orig_op0) && optimize
+ && cmpmode == mode
  && (rtx_cost (orig_op0, mode, COMPARE, 0,
optimize_insn_for_speed_p ())
  > COSTS_N_INSNS (1))
@@ -5142,6 +5143,7 @@ emit_conditional_move (rtx target, struct rtx_comparison 
comp,
op3p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
}
  if (CONSTANT_P (orig_op1) && optimize
+ && cmpmode == mode
  && (rtx_cost (orig_op1, mode, COMPARE, 0,
optimize_insn_for_speed_p ())
  > COSTS_N_INSNS (1))
diff --git a/gcc/testsuite/gcc.c-torture/compile/condmove-1.c 
b/gcc/testsuite/gcc.c-torture/compile/condmove-1.c
new file mode 100644
index 000..3fcc591af00
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/condmove-1.c
@@ -0,0 +1,9 @@
+/* PR middle-end/111260 */
+
+/* Used to ICE while expansion of the `(a == b) ? b : 0;` */
+int f1(long long a)
+{
+  int b = 822920;
+  int t = a == b;
+  return t * (int)b;
+}
-- 
2.39.3



[pushed] testsuite: move more analyzer test cases to c-c++-common (3) [PR96395]

2023-12-14 Thread David Malcolm
Move a further 268 tests from gcc.dg/analyzer to c-c++-common/analyzer.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-6564-gae034b9106fbdd.

gcc/testsuite/ChangeLog:
PR analyzer/96395
* c-c++-common/analyzer/analyzer-decls.h: New header.
* gcc.dg/analyzer/20020129-1.c: Move to...
* c-c++-common/analyzer/20020129-1.c: ...here.
* gcc.dg/analyzer/SARD-tc117-basic-1-min.c: Move to...
* c-c++-common/analyzer/SARD-tc117-basic-1-min.c: ...here.
* gcc.dg/analyzer/SARD-tc249-basic-00034-min.c: Move to...
* c-c++-common/analyzer/SARD-tc249-basic-00034-min.c: ...here.
* gcc.dg/analyzer/abort.c: Move to...
* c-c++-common/analyzer/abort.c: ...here.
* gcc.dg/analyzer/aliasing-1.c: Move to...
* c-c++-common/analyzer/aliasing-1.c: ...here.
* gcc.dg/analyzer/aliasing-2.c: Move to...
* c-c++-common/analyzer/aliasing-2.c: ...here.
* gcc.dg/analyzer/alloca-leak.c: Move to...
* c-c++-common/analyzer/alloca-leak.c: ...here.
* gcc.dg/analyzer/analyzer-debugging-fns-1.c: Move to...
* c-c++-common/analyzer/analyzer-debugging-fns-1.c: ...here.
* gcc.dg/analyzer/analyzer-verbosity-2a.c: Move to...
* c-c++-common/analyzer/analyzer-verbosity-2a.c: ...here.
* gcc.dg/analyzer/analyzer-verbosity-3a.c: Move to...
* c-c++-common/analyzer/analyzer-verbosity-3a.c: ...here.
* gcc.dg/analyzer/asm-x86-1.c: Move to...
* c-c++-common/analyzer/asm-x86-1.c: ...here.
* gcc.dg/analyzer/attr-alloc_size-3.c: Move to...
* c-c++-common/analyzer/attr-alloc_size-3.c: ...here.
* gcc.dg/analyzer/attr-const-1.c: Move to...
* c-c++-common/analyzer/attr-const-1.c: ...here.
* gcc.dg/analyzer/attr-const-2.c: Move to...
* c-c++-common/analyzer/attr-const-2.c: ...here.
* gcc.dg/analyzer/attr-const-3.c: Move to...
* c-c++-common/analyzer/attr-const-3.c: ...here.
* gcc.dg/analyzer/attr-malloc-2.c: Move to...
* c-c++-common/analyzer/attr-malloc-2.c: ...here.
* gcc.dg/analyzer/attr-malloc-4.c: Move to...
* c-c++-common/analyzer/attr-malloc-4.c: ...here.
* gcc.dg/analyzer/attr-malloc-5.c: Move to...
* c-c++-common/analyzer/attr-malloc-5.c: ...here.
* gcc.dg/analyzer/attr-malloc-misuses.c: Move to...
* c-c++-common/analyzer/attr-malloc-misuses.c: ...here.
* gcc.dg/analyzer/attr-tainted_args-misuses.c: Move to...
* c-c++-common/analyzer/attr-tainted_args-misuses.c: ...here.
* gcc.dg/analyzer/bzip2-arg-parse-1.c: Move to...
* c-c++-common/analyzer/bzip2-arg-parse-1.c: ...here.
* gcc.dg/analyzer/call-summaries-1.c: Move to...
* c-c++-common/analyzer/call-summaries-1.c: ...here.
* gcc.dg/analyzer/call-summaries-3.c: Move to...
* c-c++-common/analyzer/call-summaries-3.c: ...here.
* gcc.dg/analyzer/call-summaries-asm-x86.c: Move to...
* c-c++-common/analyzer/call-summaries-asm-x86.c: ...here.
* gcc.dg/analyzer/callbacks-1.c: Move to...
* c-c++-common/analyzer/callbacks-1.c: ...here.
* gcc.dg/analyzer/callbacks-2.c: Move to...
* c-c++-common/analyzer/callbacks-2.c: ...here.
* gcc.dg/analyzer/callbacks-3.c: Move to...
* c-c++-common/analyzer/callbacks-3.c: ...here.
* gcc.dg/analyzer/capacity-2.c: Move to...
* c-c++-common/analyzer/capacity-2.c: ...here.
* gcc.dg/analyzer/capacity-3.c: Move to...
* c-c++-common/analyzer/capacity-3.c: ...here.
* gcc.dg/analyzer/casts-1.c: Move to...
* c-c++-common/analyzer/casts-1.c: ...here.
* gcc.dg/analyzer/casts-2.c: Move to...
* c-c++-common/analyzer/casts-2.c: ...here.
* gcc.dg/analyzer/clobbers-1.c: Move to...
* c-c++-common/analyzer/clobbers-1.c: ...here.
* gcc.dg/analyzer/compound-assignment-4.c: Move to...
* c-c++-common/analyzer/compound-assignment-4.c: ...here.
* gcc.dg/analyzer/data-model-12.c: Move to...
* c-c++-common/analyzer/data-model-12.c: ...here.
* gcc.dg/analyzer/data-model-14.c: Move to...
* c-c++-common/analyzer/data-model-14.c: ...here.
* gcc.dg/analyzer/data-model-18.c: Move to...
* c-c++-common/analyzer/data-model-18.c: ...here.
* gcc.dg/analyzer/data-model-2.c: Move to...
* c-c++-common/analyzer/data-model-2.c: ...here.
* gcc.dg/analyzer/data-model-20.c: Move to...
* c-c++-common/analyzer/data-model-20.c: ...here.
* gcc.dg/analyzer/data-model-21.c: Move to...
* c-c++-common/analyzer/data-model-21.c: ...here.
* gcc.dg/analyzer/data-model-22.c: Move to...
* c-c++-common/analyzer/data-model-22.c: ...here.
* gcc.dg/analyzer/data-model-4.c: Move to...
* c-c++-common/analyzer/data-model-4.c: ...here.
* gcc.dg/a

Re: [PATCH] RISC-V: Add -fno-vect-cost-model to pr112773 testcase

2023-12-14 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2023-12-15 05:32
To: gcc-patches
CC: rdapp.gcc; juzhe.zhong; Patrick O'Neill
Subject: [PATCH] RISC-V: Add -fno-vect-cost-model to pr112773 testcase
The testcase for pr112773 started passing after r14-6472-g8501edba91e
which was before the actual fix. This patch adds -fno-vect-cost-model
which prevents the testcase from passing due to the vls change.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/partial/pr112773.c: Add
-fno-vect-cost-model.
 
Signed-off-by: Patrick O'Neill 
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
index 5f7374b0040..57104c9ebec 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv_zvl256b -mabi=ilp32d -O3" } */
+/* { dg-additional-options "-march=rv32gcv_zvl256b -mabi=ilp32d -O3 
-fno-vect-cost-model" } */
long long a;
int b, c;
-- 
2.42.0
 
 


Re: [PATCH] c++: abi_tag attribute on templates [PR109715]

2023-12-14 Thread Jason Merrill

On 12/14/23 16:08, Patrick Palka wrote:

On Thu, 14 Dec 2023, Jason Merrill wrote:


On 12/14/23 14:17, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Do we want to condition this on abi_check (19)?


I think we do, sadly.


Sounds good, like so?  Bootstrap and regtest in progress.

-- >8 --

Subject: [PATCH] c++: abi_tag attribute on templates [PR109715]

As with other declaration attributes, we need to look through
TEMPLATE_DECL when looking up the abi_tag attribute.

PR c++/109715

gcc/cp/ChangeLog:

* mangle.cc (get_abi_tags): Look through TEMPLATE_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/abi/abi-tag25.C: New test.
* g++.dg/abi/abi-tag25a.C: New test.
---
  gcc/cp/mangle.cc  |  6 ++
  gcc/testsuite/g++.dg/abi/abi-tag25.C  | 17 +
  gcc/testsuite/g++.dg/abi/abi-tag25a.C | 11 +++
  3 files changed, 34 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/abi/abi-tag25.C
  create mode 100644 gcc/testsuite/g++.dg/abi/abi-tag25a.C

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 0684f0e6038..e3383df1836 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -530,6 +530,12 @@ get_abi_tags (tree t)
if (DECL_P (t) && DECL_DECLARES_TYPE_P (t))
  t = TREE_TYPE (t);
  
+  if (TREE_CODE (t) == TEMPLATE_DECL

+  && DECL_TEMPLATE_RESULT (t)
+  /* We used to ignore abi_tag on function and variable templates.  */
+  && abi_check (19))
+t = DECL_TEMPLATE_RESULT (t);


Generally I try to call abi_check only when we know that there's 
something that will change the mangling, so here only if the template 
has ABI tags.  I suppose the only downside is a second mangling that 
produces the same name and gets ignored in mangle_decl so we don't need 
to be too strict about it, but it shouldn't be too hard to do that here?
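A rough sketch of that refinement (GCC-internals pseudocode, not the committed
change; it assumes the tags live in DECL_ATTRIBUTES of the template result):

```c
  if (TREE_CODE (t) == TEMPLATE_DECL && DECL_TEMPLATE_RESULT (t))
    {
      tree res = DECL_TEMPLATE_RESULT (t);
      /* Only pay for the ABI-version check (and the second mangling it
	 may trigger) when the template result actually carries tags.  */
      if (lookup_attribute ("abi_tag", DECL_ATTRIBUTES (res))
	  && abi_check (19))
	t = res;
    }
```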


Jason



[PATCH] lower-bitint: Fix .{ADD,SUB,MUL}_OVERFLOW with _BitInt large/huge INTEGER_CST arguments [PR113003]

2023-12-14 Thread Jakub Jelinek
Hi!

As shown in the testcase, .{ADD,SUB,MUL}_OVERFLOW calls are another
exception to the middle/large/huge _BitInt discovery through SSA_NAMEs
next to stores of INTEGER_CSTs to memory and their conversions to
floating point.
The calls can have normal COMPLEX_TYPE with INTEGER_TYPE elts return type
(or BITINT_TYPE with small precision) and one of the arguments can be
SSA_NAME with an INTEGER_TYPE or small BITINT_TYPE as well; still, when
there is an INTEGER_CST argument with large/huge BITINT_TYPE, we need to
lower it that way.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2023-12-14  Jakub Jelinek  

PR tree-optimization/113003
* gimple-lower-bitint.cc (arith_overflow_arg_kind): New function.
(gimple_lower_bitint): Use it to catch .{ADD,SUB,MUL}_OVERFLOW
calls with large/huge INTEGER_CST arguments.

* gcc.dg/bitint-54.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2023-12-13 11:36:21.636633517 +0100
+++ gcc/gimple-lower-bitint.cc  2023-12-14 13:28:17.660737468 +0100
@@ -5641,6 +5641,37 @@ build_bitint_stmt_ssa_conflicts (gimple
 def (live, lhs, graph);
 }
 
+/* If STMT is .{ADD,SUB,MUL}_OVERFLOW with INTEGER_CST arguments,
+   return the largest bitint_prec_kind of them, otherwise return
+   bitint_prec_small.  */
+
+static bitint_prec_kind
+arith_overflow_arg_kind (gimple *stmt)
+{
+  bitint_prec_kind ret = bitint_prec_small;
+  if (is_gimple_call (stmt) && gimple_call_internal_p (stmt))
+switch (gimple_call_internal_fn (stmt))
+  {
+  case IFN_ADD_OVERFLOW:
+  case IFN_SUB_OVERFLOW:
+  case IFN_MUL_OVERFLOW:
+   for (int i = 0; i < 2; ++i)
+ {
+   tree a = gimple_call_arg (stmt, i);
+   if (TREE_CODE (a) == INTEGER_CST
+   && TREE_CODE (TREE_TYPE (a)) == BITINT_TYPE)
+ {
+   bitint_prec_kind kind = bitint_precision_kind (TREE_TYPE (a));
+   ret = MAX (ret, kind);
+ }
+ }
+   break;
+  default:
+   break;
+  }
+  return ret;
+}
+
 /* Entry point for _BitInt(N) operation lowering during optimization.  */
 
 static unsigned int
@@ -5657,7 +5688,12 @@ gimple_lower_bitint (void)
continue;
   tree type = TREE_TYPE (s);
   if (TREE_CODE (type) == COMPLEX_TYPE)
-   type = TREE_TYPE (type);
+   {
+ if (arith_overflow_arg_kind (SSA_NAME_DEF_STMT (s))
+ != bitint_prec_small)
+   break;
+ type = TREE_TYPE (type);
+   }
   if (TREE_CODE (type) == BITINT_TYPE
  && bitint_precision_kind (type) != bitint_prec_small)
break;
@@ -5745,7 +5781,12 @@ gimple_lower_bitint (void)
continue;
   tree type = TREE_TYPE (s);
   if (TREE_CODE (type) == COMPLEX_TYPE)
-   type = TREE_TYPE (type);
+   {
+ if (arith_overflow_arg_kind (SSA_NAME_DEF_STMT (s))
+ >= bitint_prec_large)
+   has_large_huge = true;
+ type = TREE_TYPE (type);
+   }
   if (TREE_CODE (type) == BITINT_TYPE
  && bitint_precision_kind (type) >= bitint_prec_large)
{
@@ -6245,8 +6286,7 @@ gimple_lower_bitint (void)
  {
bitint_prec_kind this_kind
  = bitint_precision_kind (TREE_TYPE (t));
-   if (this_kind > kind)
- kind = this_kind;
+   kind = MAX (kind, this_kind);
  }
  if (is_gimple_assign (stmt) && gimple_store_p (stmt))
{
@@ -6255,8 +6295,7 @@ gimple_lower_bitint (void)
{
  bitint_prec_kind this_kind
= bitint_precision_kind (TREE_TYPE (t));
- if (this_kind > kind)
-   kind = this_kind;
+ kind = MAX (kind, this_kind);
}
}
  if (is_gimple_assign (stmt)
@@ -6268,21 +6307,22 @@ gimple_lower_bitint (void)
{
  bitint_prec_kind this_kind
= bitint_precision_kind (TREE_TYPE (t));
- if (this_kind > kind)
-   kind = this_kind;
+ kind = MAX (kind, this_kind);
}
}
  if (is_gimple_call (stmt))
{
  t = gimple_call_lhs (stmt);
- if (t
- && TREE_CODE (TREE_TYPE (t)) == COMPLEX_TYPE
- && TREE_CODE (TREE_TYPE (TREE_TYPE (t))) == BITINT_TYPE)
+ if (t && TREE_CODE (TREE_TYPE (t)) == COMPLEX_TYPE)
{
- bitint_prec_kind this_kind
-   = bitint_precision_kind (TREE_TYPE (TREE_TYPE (t)));
- if (this_kind > kind)
-   kind = this_kind;
+ bitint_prec_kind this_kind = arith_overflow_arg_kind (stmt);
+ kind = MAX (kind, this_kind);
+ if (TREE_CODE (TREE_TYPE (TREE_TYPE (t))) == BITINT_TYPE)
+   {
+

[PATCH] bitint: Introduce abi_limb_mode

2023-12-14 Thread Jakub Jelinek
Hi!

Given what I saw in the aarch64/arm psABIs for BITINT_TYPE, as I said
earlier I'm afraid we need to differentiate between the limb mode/precision
specified in the psABIs (what is used to decide how a _BitInt is actually
passed, how it is aligned and what size it has) and the limb mode/precision
that should be used during bitint lowering and in the libgcc bitint APIs.
In the x86_64 psABI a limb is 64-bit, a word size we can perform operations
in natively, which is perfect for both purposes.  aarch64, however, wants
128-bit limbs for alignment/sizing purposes; on the bitint lowering side
I believe that would result in terribly bad code, and on the libgcc side
it wouldn't work at all (libgcc relies on longlong.h support there).

So, the following patch makes it possible for aarch64 to use TImode
as abi_limb_mode for _BitInt(129) and larger, while using DImode as
limb_mode.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-14  Jakub Jelinek  

* target.h (struct bitint_info): Add abi_limb_mode member, adjust
comment.
* target.def (bitint_type_info): Mention abi_limb_mode instead of
limb_mode.
* varasm.cc (output_constant): Use abi_limb_mode rather than
limb_mode.
* stor-layout.cc (finish_bitfield_representative): Likewise.  Assert
that if precision is smaller or equal to abi_limb_mode precision or
if info.big_endian is different from WORDS_BIG_ENDIAN, info.limb_mode
must be the same as info.abi_limb_mode.
(layout_type): Use abi_limb_mode rather than limb_mode.
* gimple-fold.cc (clear_padding_bitint_needs_padding_p): Likewise.
(clear_padding_type): Likewise.
* config/i386/i386.cc (ix86_bitint_type_info): Also set
info->abi_limb_mode.
* doc/tm.texi: Regenerated.

--- gcc/target.h.jj 2023-09-06 17:28:24.228977486 +0200
+++ gcc/target.h2023-12-14 14:26:48.490047206 +0100
@@ -69,15 +69,23 @@ union cumulative_args_t { void *p; };
 #endif /* !CHECKING_P */
 
 /* Target properties of _BitInt(N) type.  _BitInt(N) is to be represented
-   as series of limb_mode CEIL (N, GET_MODE_PRECISION (limb_mode)) limbs,
-   ordered from least significant to most significant if !big_endian,
+   as series of abi_limb_mode CEIL (N, GET_MODE_PRECISION (abi_limb_mode))
+   limbs, ordered from least significant to most significant if !big_endian,
otherwise from most significant to least significant.  If extended is
false, the bits above or equal to N are undefined when stored in a register
or memory, otherwise they are zero or sign extended depending on if
-   it is unsigned _BitInt(N) or _BitInt(N) / signed _BitInt(N).  */
+   it is unsigned _BitInt(N) or _BitInt(N) / signed _BitInt(N).
+   limb_mode is either the same as abi_limb_mode, or some narrower mode
+   in which _BitInt lowering should actually perform operations in and
+   what libgcc _BitInt helpers should use.
+   E.g. abi_limb_mode could be TImode which is something some processor
+   specific ABI would specify to use, but it would be desirable to handle
+   it as an array of DImode instead for efficiency.
+   Note, abi_limb_mode can be different from limb_mode only if big_endian
+   matches WORDS_BIG_ENDIAN.  */
 
 struct bitint_info {
-  machine_mode limb_mode;
+  machine_mode abi_limb_mode, limb_mode;
   bool big_endian;
   bool extended;
 };
--- gcc/target.def.jj   2023-12-08 08:28:23.644171016 +0100
+++ gcc/target.def  2023-12-14 14:27:25.239537794 +0100
@@ -6357,8 +6357,8 @@ DEFHOOK
 (bitint_type_info,
  "This target hook returns true if @code{_BitInt(@var{N})} is supported and\n\
 provides details on it.  @code{_BitInt(@var{N})} is to be represented as\n\
-series of @code{info->limb_mode}\n\
-@code{CEIL (@var{N}, GET_MODE_PRECISION (info->limb_mode))} limbs,\n\
+series of @code{info->abi_limb_mode}\n\
+@code{CEIL (@var{N}, GET_MODE_PRECISION (info->abi_limb_mode))} limbs,\n\
 ordered from least significant to most significant if\n\
 @code{!info->big_endian}, otherwise from most significant to least\n\
 significant.  If @code{info->extended} is false, the bits above or equal to\n\
--- gcc/varasm.cc.jj2023-12-01 08:10:44.504299177 +0100
+++ gcc/varasm.cc   2023-12-14 14:55:45.821971713 +0100
@@ -5315,7 +5315,8 @@ output_constant (tree exp, unsigned HOST
  tree type = TREE_TYPE (exp);
  bool ok = targetm.c.bitint_type_info (TYPE_PRECISION (type), &info);
  gcc_assert (ok);
- scalar_int_mode limb_mode = as_a <scalar_int_mode> (info.limb_mode);
+ scalar_int_mode limb_mode
+   = as_a <scalar_int_mode> (info.abi_limb_mode);
  if (TYPE_PRECISION (type) <= GET_MODE_PRECISION (limb_mode))
{
  cst = expand_expr (exp, NULL_RTX, VOIDmode, EXPAND_INITIALIZER);
--- gcc/stor-layout.cc.jj   2023-10-08 16:37:31.780273377 +0200
+++ gcc/stor-layout.cc  2023-12-14 14:59:27.147904721 +0100
@@ -2154,7 +2154,8 @@ finish_bitfield_representative (tree rep
 

[Committed] RISC-V: Adjust test

2023-12-14 Thread Juzhe-Zhong
Since the middle-end patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640595.html

will change the vectorization code, adapt the tests for this patch.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr112988-1.c: Adapt test.

---
 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c
index 6f983ef8bb5..b37166b2973 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c
@@ -62,8 +62,7 @@ int main() {
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
-/* { dg-final { scan-assembler-times {vsetivli} 1 } } */
-/* { dg-final { scan-assembler-times 
{vsetivli\tzero,\s*4,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli} 4 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
 /* { dg-final { scan-assembler-times 
{vsetvli\tzero,\s*[a-x0-9]+,\s*e8,\s*m8,\s*t[au],\s*m[au]} 1 } } */
 /* { dg-final { scan-assembler-times {li\t[a-x0-9]+,\s*32} 1 } } */
-- 
2.36.3



Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2023-12-14 Thread Thomas Rodgers
I need to look at this a bit more (and not on my phone, at lunch).
Ultimately, C++26 expects to add predicate waits and returning a
‘tri-state’ result isn’t something that’s been considered or likely to be
approved.

On Mon, Dec 11, 2023 at 12:18 PM Jonathan Wakely 
wrote:

> CCing Tom's current address, as he's not @redhat.com now.
>
> On Mon, 11 Dec 2023, 19:24 Nate Eldredge, 
> wrote:
>
>> On Mon, 11 Dec 2023, Nate Eldredge wrote:
>>
>> > To fix, we need something like `__args._M_old = __val;` inside the loop
>> in
>> > __atomic_wait_address(), so that we always wait on the exact value that
>> the
>> > predicate __pred() rejected.  Again, there are similar instances in
>> > atomic_timed_wait.h.
>>
>> Thinking through this, there's another problem.  The main loop in
>> __atomic_wait_address() would then be:
>>
>>while (!__pred(__val))
>>  {
>>__args._M_old = __val;
>>__detail::__wait_impl(__wait_addr, &__args);
>>__val = __vfn();
>>  }
>>
>> The idea being that we only call __wait_impl to wait on a value that the
>> predicate said was unacceptable.  But looking for instance at its caller
>> __atomic_semaphore::_M_acquire() in bits/semaphore_base.h, the predicate
>> passed in is _S_do_try_acquire(), whose code is:
>>
>>  _S_do_try_acquire(__detail::__platform_wait_t* __counter,
>>__detail::__platform_wait_t __old) noexcept
>>  {
>>if (__old == 0)
>>  return false;
>>
>>return __atomic_impl::compare_exchange_strong(__counter,
>>  __old, __old - 1,
>>
>>  memory_order::acquire,
>>
>>  memory_order::relaxed);
>>  }
>>
>> It returns false if the value passed in was unacceptable (i.e. zero),
>> *or*
>> if it was nonzero (let's say 1) but the compare_exchange failed because
>> another thread swooped in and modified the semaphore counter.  In that
>> latter case, __atomic_wait_address() would pass 1 to __wait_impl(), which
>> is likewise bad.  If the counter is externally changed back to 1 just
>> before we call __platform_wait (that's the futex call), we would go to
>> sleep waiting on a semaphore that is already available: deadlock.
>>
>> I guess there's a couple ways to fix it.
>>
>> You could have the "predicate" callback instead return a tri-state value:
>> "all done, stop waiting" (like current true), "value passed is not
>> acceptable" (like current false), and "value was acceptable but something
>> else went wrong".  Only the second case should result in calling
>> __wait_impl().  In the third case, __atomic_wait_address() should
>> just reload the value (using __vfn()) and loop again.
>>
>> Or, make the callback __pred() a pure predicate that only tests its input
>> value for acceptability, without any other side effects.  Then have
>> __atomic_wait_address() simply return as soon as __pred(__val) returns
>> true.  It would be up to the caller to actually decrement the semaphore
>> or
>> whatever, and to call __atomic_wait_address() again if this fails.  In
>> that case, __atomic_wait_address should probably return the final value
>> that was read, so the caller can immediately do something like a
>> compare-exchange using it, and not have to do an additional load and
>> predicate test.
>>
>> Or, make __pred() a pure predicate as before, and give
>> __atomic_wait_address yet one more callback function argument, call it
>> __taker(), whose job is to acquire the semaphore etc, and have
>> __atomic_wait_address call it after __pred(__val) returns true.
>>
>> --
>> Nate Eldredge
>> n...@thatsmathematics.com
>>
>>

