Re: Re: [PATCH v1] RISC-V: Align the predictor style for define_insn_and_split

2023-06-14 Thread juzhe.zh...@rivai.ai
>> Yeah sure, we need to be able to run tests only for specific targets.
>> Why does {riscv_vector} && {rv64} not work?
I am not sure. These testcases were added by kito a long time ago.
Frankly, I am not familiar with the GCC test framework.

I think the highest priority is to fix the "real" compiler bugs which I have 
noticed yesterday:
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-3.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax execution test

@Li Pan could you verify whether your patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621610.html can fix these 2
issues?
If yes, please send a V2 patch with this information appended to the patch log.


Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-14 14:52
To: juzhe.zh...@rivai.ai; pan2.li; gcc-patches
CC: rdapp.gcc; jeffreyalaw; yanzhang.wang; kito.cheng
Subject: Re: [PATCH v1] RISC-V: Align the predictor style for 
define_insn_and_split
Yes, I agree with the general assessment (and didn't mean to insinuate
that the FAILs are the compiler's fault or a fault of the patch).
 
> So these 2 failures in RV32 are not the compiler's bugs. I have seen:
> /* { dg-do run { target { { {riscv_vector} && {rv64} } } } } */ in
> these testcases, which does not work to block execution in RV32 (since
> such a testcase only needs to be tested on RV64). I think this is the
> issue we need to figure out.
 
Yeah sure, we need to be able to run tests only for specific targets.
Why does {riscv_vector} && {rv64} not work?
 
For zvfh I'm testing something like the following:
 
proc check_effective_target_riscv_zvfh { } {
    if { ![istarget rv32*-*-*] && ![istarget rv64*-*-*] } then {
        return 0;
    }

    if !check_effective_target_riscv_vector then {
        return 0;
    }

    return [
        [check_runtime riscv_check_zvfh {
            int main (void)
            {
                asm ("vsetivli zero,8,e16,m1,ta,ma");
                asm ("vfadd.vv %%v8,%%v8,%%v16" : : : "%%v8");
                return 0;
            }
        } "-march=rv64gcv_zvfh" ]
        || ... ]
 
Regards
Robin
 


Re: [PATCH v1] RISC-V: Align the predictor style for define_insn_and_split

2023-06-14 Thread Robin Dapp via Gcc-patches
> I am not sure. These testcases were added by kito a long time ago.
> Frankly, I am not familiar with the GCC test framework.

Ok, I'm going to have a look.  Need to verify the zvfh things anyway.

Regards
 Robin


RE: Re: [PATCH v1] RISC-V: Align the predictor style for define_insn_and_split

2023-06-14 Thread Li, Pan2 via Gcc-patches
Sure, working on the V2 as well as the RV32 testing; will reply with the
bugfix PATCH once ready.

Pan




Re: [PATCH v2] [PR96339] Optimise svlast[ab]

2023-06-14 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 13 Jun 2023 at 12:38, Tejas Belagod via Gcc-patches
 wrote:
>
>
>
> From: Richard Sandiford 
> Date: Monday, June 12, 2023 at 2:15 PM
> To: Tejas Belagod 
> Cc: gcc-patches@gcc.gnu.org , Tejas Belagod 
> 
> Subject: Re: [PATCH v2] [PR96339] Optimise svlast[ab]
> Tejas Belagod  writes:
> > From: Tejas Belagod 
> >
> >   This PR optimizes an SVE intrinsics sequence where a scalar is
> >   selected based on a constant predicate and a variable vector.
> >   For example,
> > svlasta (svptrue_pat_b8 (SV_VL1), x)
> >   is optimized to return the corresponding element of a NEON vector
> >   and compiles to
> > umov w0, v0.b[1]
> >   Likewise,
> > svlastb (svptrue_pat_b8 (SV_VL1), x)
> >   compiles to
> > umov w0, v0.b[0]
> >   This optimization only works provided the constant predicate maps to
> >   a range that is within the bounds of a 128-bit NEON register.
> >
> > gcc/ChangeLog:
> >
> >PR target/96339
> >* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): 
> > Fold sve
> >calls that have a constant input predicate vector.
> >(svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
> >(svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
> >(svlast_impl::vect_all_same): Check if all vector elements are equal.
> >
> > gcc/testsuite/ChangeLog:
> >
> >PR target/96339
> >* gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
> >* gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
> >* gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
> >* gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
> >to expect optimized code for function body.
> >* gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): Likewise.
> >* gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): Likewise.
> >* gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): Likewise.
> >* gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): Likewise.
> >* gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): Likewise.
> >* gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
> >* gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): Likewise.
> >* gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): Likewise.
> >* gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): Likewise.
> >* gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): Likewise.
> >* gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): Likewise.
>
> OK, thanks.
>
> Applied on master, thanks.
Hi Tejas,
This seems to break the aarch64 bootstrap build with the following error,
due to a -Wsign-compare diagnostic:
00:18:19 
/home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/gcc/config/aarch64/aarch64-sve-builtins-base.cc:1133:35:
error: comparison of integer expressions of different signedness:
‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
00:18:19  1133 | for (i = npats; i < enelts; i += step_1)
00:18:19  | ~~^~~~
00:30:46 abe-debug-build: cc1plus: all warnings being treated as errors
00:30:46 abe-debug-build: make[3]: ***
[/home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/gcc/config/aarch64/t-aarch64:96:
aarch64-sve-builtins-base.o] Error 1

Thanks,
Prathamesh
>
> Tejas.
>
>
> Richard


[PATCH] RISC-V: Add (u)int8_t to binop tests.

2023-06-14 Thread Robin Dapp via Gcc-patches
Hi,

this patch adds the missing (u)int8_t types to the binop tests.
I suggest that in the future we have the testsuite pass -march=rv32gcv
as well as -march=rv64gcv as options to each test case instead of
essentially duplicating the files as we do now.

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-run.c: Adapt for
(u)int8_t.
* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/shift-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vand-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vand-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vand-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vand-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vor-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vor-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vor-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vor-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vxor-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vxor-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vxor-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vxor-template.h: Dito.
---
 .../gcc.target/riscv/rvv/autovec/binop/shift-run.c |  4 
 .../gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c | 10 +++---
 .../gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c |  6 +++---
 .../riscv/rvv/autovec/binop/shift-template.h   |  5 -
 .../gcc.target/riscv/rvv/autovec/binop/vadd-run.c  |  6 ++
 .../gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c  |  4 ++--
 .../gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c  |  4 ++--
 .../gcc.target/riscv/rvv/autovec/binop/vadd-template.h |  7 ++-
 .../gcc.target/riscv/rvv/autovec/binop/vand-run.c  |  6 ++
 .../gcc.target/riscv/rvv/autovec/binop/vand-rv32gcv.c  |  4 ++--
 .../gcc.target/riscv/rvv/autovec/binop/vand-rv64gcv.c  |  4 ++--
 .../gcc.target/riscv/rvv/autovec/binop/vand-template.h |  7 ++-
 .../gcc.target/riscv/rvv/autovec/binop/vdiv-run.c  |  4 
 .../gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c  |  6 +++---
 .../gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c  |  6 +++---
 .../gcc.target/riscv/rvv/autovec/binop/vdiv-template.h |  4 
 .../gcc.target/riscv/rvv/autovec/binop/vmax-run.c  |  4 
 .../gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c  |  4 ++--
 .../gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c  |  4 ++--
 .../gcc.target/riscv/rvv/autovec/binop/vmax-template.h |  3 ++-
 .../gcc.target/riscv/rvv/autovec/binop/vmin-run.c  |  4 
 .../gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c  |  4 ++--
 .../gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c  |  4 ++--
 .../gcc.target/riscv/rvv/autovec/binop/vmin-template.h |  3 ++-
 .../gcc.target/riscv/rvv/autovec/binop/vmul-run.c  |  4 
 .../gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c  |  2 +-
 .../gcc.target/riscv/rvv/autov

Re: [PATCH] RISC-V: Add (u)int8_t to binop tests.

2023-06-14 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 

[PATCH v2] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32

2023-06-14 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch would like to fix one bug exposed by the RV32 test case
multiple_rgroup_run-2.c. The mask should be restricted by the ELEN of the
vector unit, and the condition between vmv.s.x and vmv.v.x should use
inner_bits_size rather than constants.

After this patch, below failures on RV32 will be fixed.

FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-3.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax execution test

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-v.cc (rvv_builder::get_merge_scalar_mask):
Take elen instead of scalar BITS_PER_WORD.
(expand_vector_init_merge_repeating_sequence): Use inner_bits_size
instead of scalar BITS_PER_WORD.
---
 gcc/config/riscv/riscv-v.cc | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e07d5c2901a..db1a5529419 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -399,10 +399,19 @@ rvv_builder::get_merge_scalar_mask (unsigned int 
index_in_pattern) const
 {
   unsigned HOST_WIDE_INT mask = 0;
   unsigned HOST_WIDE_INT base_mask = (1ULL << index_in_pattern);
+  /* We restrict the limit to the elen of RVV. For example:
+ -march=zve32*, the ELEN is 32.
+ -march=zve64*, the ELEN is 64.
+ The related vmv.v.x/vmv.s.x is restricted to ELEN as above, we cannot
+ take care of case like below when ELEN=32
+ vsetvil e64,m1
+ vmv.v.x/vmv.s.x
+   */
+  unsigned int elen = TARGET_VECTOR_ELEN_64 ? 64 : 32;
 
-  gcc_assert (BITS_PER_WORD % npatterns () == 0);
+  gcc_assert (elen % npatterns () == 0);
 
-  int limit = BITS_PER_WORD / npatterns ();
+  int limit = elen / npatterns ();
 
   for (int i = 0; i < limit; i++)
 mask |= base_mask << (i * npatterns ());
@@ -1928,7 +1937,7 @@ expand_vector_init_merge_repeating_sequence (rtx target,
   rtx mask = gen_reg_rtx (mask_mode);
   rtx dup = gen_reg_rtx (dup_mode);
 
-  if (full_nelts <= BITS_PER_WORD) /* vmv.s.x.  */
+  if (full_nelts <= builder.inner_bits_size ()) /* vmv.s.x.  */
{
  rtx ops[] = {dup, gen_scalar_move_mask (dup_mask_mode),
RVV_VUNDEF (dup_mode), merge_mask};
@@ -1938,7 +1947,8 @@ expand_vector_init_merge_repeating_sequence (rtx target,
   else /* vmv.v.x.  */
{
  rtx ops[] = {dup, force_reg (GET_MODE_INNER (dup_mode), merge_mask)};
- rtx vl = gen_int_mode (CEIL (full_nelts, BITS_PER_WORD), Pmode);
+ rtx vl = gen_int_mode (CEIL (full_nelts, builder.inner_bits_size ()),
+Pmode);
  emit_nonvlmax_integer_move_insn (code_for_pred_broadcast (dup_mode),
   ops, vl);
}
-- 
2.34.1



RE: [PATCH v1] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32

2023-06-14 Thread Li, Pan2 via Gcc-patches
Thanks Juzhe for reviewing; the updated PATCH v2 is below.

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621645.html

Pan

From: juzhe.zh...@rivai.ai 
Sent: Wednesday, June 14, 2023 9:07 AM
To: Li, Pan2 ; gcc-patches 
Cc: Robin Dapp ; jeffreyalaw ; Li, 
Pan2 ; Wang, Yanzhang ; kito.cheng 

Subject: Re: [PATCH v1] RISC-V: Bugfix for vec_init repeating auto 
vectorization in RV32


>> unsigned int elen = TARGET_VECTOR_ELEN_64 ? 64 : 32;
Add a comment here to demonstrate why you use elen to set the LIMIT.
My understanding:
1. -march=zve32* ===> ELEN = 32
   -march=zve64* ===> ELEN = 64
2. Both vmv.v.x and vmv.s.x are restricted to the ELEN.
For example, when ELEN=32 (-march=zve32*):
vsetvli ...e64,m1
vmv.v.x/vmv.s.x
We can't support such a code sequence.

You should state this clearly in the comments.

Otherwise, this patch LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-14 08:58
To: gcc-patches
CC: juzhe.zhong; 
rdapp.gcc; 
jeffreyalaw; pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Bugfix for vec_init repeating auto vectorization in 
RV32
From: Pan Li 

This patch would like to fix one bug exposed by the RV32 test case
multiple_rgroup_run-2.c. The mask should be restricted by the ELEN of the
vector unit, and the condition between vmv.s.x and vmv.v.x should use
inner_bits_size rather than constants.

Passed both the rv32 and rv64 riscv/rvv tests.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-v.cc (rvv_builder::get_merge_scalar_mask):
Take elen instead of scalar BITS_PER_WORD.
(expand_vector_init_merge_repeating_sequence): Use inner_bits_size
instead of scalar BITS_PER_WORD.
---
gcc/config/riscv/riscv-v.cc | 10 ++
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index fb970344521..9270e258ca3 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -399,10 +399,11 @@ rvv_builder::get_merge_scalar_mask (unsigned int 
index_in_pattern) const
{
   unsigned HOST_WIDE_INT mask = 0;
   unsigned HOST_WIDE_INT base_mask = (1ULL << index_in_pattern);
+  unsigned int elen = TARGET_VECTOR_ELEN_64 ? 64 : 32;
-  gcc_assert (BITS_PER_WORD % npatterns () == 0);
+  gcc_assert (elen % npatterns () == 0);
-  int limit = BITS_PER_WORD / npatterns ();
+  int limit = elen / npatterns ();
   for (int i = 0; i < limit; i++)
 mask |= base_mask << (i * npatterns ());
@@ -1923,7 +1924,7 @@ expand_vector_init_merge_repeating_sequence (rtx target,
   rtx mask = gen_reg_rtx (mask_mode);
   rtx dup = gen_reg_rtx (dup_mode);
-  if (full_nelts <= BITS_PER_WORD) /* vmv.s.x.  */
+  if (full_nelts <= builder.inner_bits_size ()) /* vmv.s.x.  */
{
  rtx ops[] = {dup, gen_scalar_move_mask (dup_mask_mode),
RVV_VUNDEF (dup_mode), merge_mask};
@@ -1933,7 +1934,8 @@ expand_vector_init_merge_repeating_sequence (rtx target,
   else /* vmv.v.x.  */
{
  rtx ops[] = {dup, force_reg (GET_MODE_INNER (dup_mode), merge_mask)};
-   rtx vl = gen_int_mode (CEIL (full_nelts, BITS_PER_WORD), Pmode);
+   rtx vl = gen_int_mode (CEIL (full_nelts, builder.inner_bits_size ()),
+ Pmode);
  emit_nonvlmax_integer_move_insn (code_for_pred_broadcast (dup_mode),
   ops, vl);
}
--
2.34.1




Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-14 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches
 wrote:
>
> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
> need for VEX3 in the broadcast ones. When EVEX encoding is required the
> broadcast insns are always shorter.
>
> Add two new alternatives each, one covering the AVX2 case and one
> covering AVX512.
I think you can just change the assembly output of the first alternative:
when TARGET_AVX2, use vbroadcastss, else use vshufps, since
vbroadcastss only accepts a register operand with TARGET_AVX2. Then there
is no need to support 2 extra alternatives, which doesn't make sense and
just makes the RA more confused about different alternatives with the
same meaning.
>
> gcc/
>
> * config/i386/sse.md (vec_dupv4sf): New AVX2 and AVX512F
> alternatives using vbroadcastss.
> (*vec_dupv4si): New AVX2 and AVX512F alternatives using
> vpbroadcastd.
> ---
> I'm working from the assumption that the isa attributes to the original
> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
> or avx_noavx2 as applicable), as the new earlier alternatives cover all
> operand forms already when at least AVX2 is enabled.
>
> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_
> and elsewhere.)
Not sure about this part. I grepped for prefix_extra; it seems to be used
only by znver.md/znver4.md for scheduling, and only for comi instructions
(at least the reservation names suggest so).
>
> Is use of Yv for the source operand really necessary in *vec_dupv4si?
> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL
Yes, you can look at ix86_hard_regno_mode_ok: EXT_REX_SSE_REGNO is
allowed for scalar modes, but not for 128/256-bit vector modes.

20204  if (TARGET_AVX512F
20205  && (VALID_AVX512F_REG_OR_XI_MODE (mode)
20206  || VALID_AVX512F_SCALAR_MODE (mode)))
20207return true;


> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they
> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd
> alternative (or just m, as Yv is already covered by the 2nd one)?
I guess xm is more suitable, since we still want to allocate
operands[1] to a register for sse3_noavx.
It doesn't hit any error because, for AVX and above, alternative 1 (the
2nd one) always matches before alternative 2.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -25798,38 +25798,42 @@
> (const_int 1)))])
>
>  (define_insn "vec_dupv4sf"
> -  [(set (match_operand:V4SF 0 "register_operand" "=v,v,x")
> +  [(set (match_operand:V4SF 0 "register_operand" "=Yv,v,v,v,x")
> (vec_duplicate:V4SF
> - (match_operand:SF 1 "nonimmediate_operand" "Yv,m,0")))]
> + (match_operand:SF 1 "nonimmediate_operand" "v,vm,Yv,m,0")))]
>"TARGET_SSE"
>"@
> +   vbroadcastss\t{%1, %0|%0, %1}
> +   vbroadcastss\t{%1, %g0|%g0, %1}
> vshufps\t{$0, %1, %1, %0|%0, %1, %1, 0}
> vbroadcastss\t{%1, %0|%0, %1}
> shufps\t{$0, %0, %0|%0, %0, 0}"
> -  [(set_attr "isa" "avx,avx,noavx")
> -   (set_attr "type" "sseshuf1,ssemov,sseshuf1")
> -   (set_attr "length_immediate" "1,0,1")
> -   (set_attr "prefix_extra" "0,1,*")
> -   (set_attr "prefix" "maybe_evex,maybe_evex,orig")
> -   (set_attr "mode" "V4SF")])
> +  [(set_attr "isa" "avx2,avx512f,avx,avx,noavx")
> +   (set_attr "type" "ssemov,ssemov,sseshuf1,ssemov,sseshuf1")
> +   (set_attr "length_immediate" "0,0,1,0,1")
> +   (set_attr "prefix_extra" "*,*,0,1,*")
> +   (set_attr "prefix" "maybe_evex,evex,maybe_evex,maybe_evex,orig")
> +   (set_attr "mode" "V4SF,V16SF,V4SF,V4SF,V4SF")])
>
>  (define_insn "*vec_dupv4si"
> -  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
> +  [(set (match_operand:V4SI 0 "register_operand" "=Yv,v,v,v,x")
> (vec_duplicate:V4SI
> - (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
> + (match_operand:SI 1 "nonimmediate_operand" "vm,vm,Yv,m,0")))]
>"TARGET_SSE"
>"@
> +   vpbroadcastd\t{%1, %0|%0, %1}
> +   vpbroadcastd\t{%1, %g0|%g0, %1}
> %vpshufd\t{$0, %1, %0|%0, %1, 0}
> vbroadcastss\t{%1, %0|%0, %1}
> shufps\t{$0, %0, %0|%0, %0, 0}"
> -  [(set_attr "isa" "sse2,avx,noavx")
> -   (set_attr "type" "sselog1,ssemov,sselog1")
> -   (set_attr "length_immediate" "1,0,1")
> -   (set_attr "prefix_extra" "0,1,*")
> -   (set_attr "prefix" "maybe_vex,maybe_evex,orig")
> -   (set_attr "mode" "TI,V4SF,V4SF")
> +  [(set_attr "isa" "avx2,avx512f,sse2,avx,noavx")
> +   (set_attr "type" "ssemov,ssemov,sselog1,ssemov,sselog1")
> +   (set_attr "length_immediate" "0,0,1,0,1")
> +   (set_attr "prefix_extra" "*,*,0,1,*")
> +   (set_attr "prefix" "maybe_evex,evex,maybe_vex,maybe_evex,orig")
> +   (set_attr "mode" "TI,XI,TI,V4SF,V4SF")
> (set (attr "preferred_for_speed")
> - (cond [(eq_attr "alternative" "1"

Re: [PATCH v2] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32

2023-06-14 Thread juzhe.zh...@rivai.ai
+  /* We restrict the limit to the elen of RVV. For example:
+ -march=zve32*, the ELEN is 32.
+ -march=zve64*, the ELEN is 64.
+ The related vmv.v.x/vmv.s.x is restricted to ELEN as above, we cannot
+ take care of case like below when ELEN=32
+ vsetvil e64,m1
+ vmv.v.x/vmv.s.x
+   */

The comment is not clear enough.

How about:

According to the RVV ISA spec, ELEN = 32 when -march=zve32* and ELEN = 64
when -march=zve64*.  Since vmv.v.x/vmv.s.x can't broadcast/move a 64-bit
value into the vector when ELEN = 32, we restrict the LIMIT to the ELEN.

I am not a native English speaker; I'd like to see Jeff's or Robin's
comments on that.

Thanks. 


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-14 15:29
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Bugfix for vec_init repeating auto vectorization in 
RV32
From: Pan Li 
 
This patch would like to fix one bug exported by RV32 test case
multiple_rgroup_run-2.c. The mask should be restricted by elen in
vector, and the condition between the vmv.s.x and the vmv.v.x should
take inner_bits_size rather than constants.
 
After this patch, below failures on RV32 will be fixed.
 
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-3.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax execution test
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (rvv_builder::get_merge_scalar_mask):
Take elen instead of scalar BITS_PER_WORD.
(expand_vector_init_merge_repeating_sequence): Use inner_bits_size
instead of scaler BITS_PER_WORD.
---
gcc/config/riscv/riscv-v.cc | 18 ++
1 file changed, 14 insertions(+), 4 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e07d5c2901a..db1a5529419 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -399,10 +399,19 @@ rvv_builder::get_merge_scalar_mask (unsigned int 
index_in_pattern) const
{
   unsigned HOST_WIDE_INT mask = 0;
   unsigned HOST_WIDE_INT base_mask = (1ULL << index_in_pattern);
+  /* We restrict the limit to the elen of RVV. For example:
+ -march=zve32*, the ELEN is 32.
+ -march=zve64*, the ELEN is 64.
+ The related vmv.v.x/vmv.s.x is restricted to ELEN as above, we cannot
+ take care of case like below when ELEN=32
+ vsetvli e64,m1
+ vmv.v.x/vmv.s.x
+   */
+  unsigned int elen = TARGET_VECTOR_ELEN_64 ? 64 : 32;
-  gcc_assert (BITS_PER_WORD % npatterns () == 0);
+  gcc_assert (elen % npatterns () == 0);
-  int limit = BITS_PER_WORD / npatterns ();
+  int limit = elen / npatterns ();
   for (int i = 0; i < limit; i++)
 mask |= base_mask << (i * npatterns ());
@@ -1928,7 +1937,7 @@ expand_vector_init_merge_repeating_sequence (rtx target,
   rtx mask = gen_reg_rtx (mask_mode);
   rtx dup = gen_reg_rtx (dup_mode);
-  if (full_nelts <= BITS_PER_WORD) /* vmv.s.x.  */
+  if (full_nelts <= builder.inner_bits_size ()) /* vmv.s.x.  */
{
  rtx ops[] = {dup, gen_scalar_move_mask (dup_mask_mode),
RVV_VUNDEF (dup_mode), merge_mask};
@@ -1938,7 +1947,8 @@ expand_vector_init_merge_repeating_sequence (rtx target,
   else /* vmv.v.x.  */
{
  rtx ops[] = {dup, force_reg (GET_MODE_INNER (dup_mode), merge_mask)};
-   rtx vl = gen_int_mode (CEIL (full_nelts, BITS_PER_WORD), Pmode);
+   rtx vl = gen_int_mode (CEIL (full_nelts, builder.inner_bits_size ()),
+ Pmode);
  emit_nonvlmax_integer_move_insn (code_for_pred_broadcast (dup_mode),
   ops, vl);
}
-- 
2.34.1
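To see the effect of this change, here is a standalone sketch of the mask construction with the ELEN limit applied. This is a hypothetical re-implementation for illustration only, not the GCC source; the function name and parameters mirror the patch, and `elen` stands for the value derived from TARGET_VECTOR_ELEN_64.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of rvv_builder::get_merge_scalar_mask after the fix: replicate the
// bit at `index_in_pattern` every `npatterns` bits, but only within `elen`
// bits (32 for zve32*, 64 for zve64*), because vmv.v.x/vmv.s.x cannot
// broadcast a value wider than ELEN.
uint64_t get_merge_scalar_mask (unsigned index_in_pattern,
                                unsigned npatterns, unsigned elen)
{
  uint64_t mask = 0;
  uint64_t base_mask = 1ULL << index_in_pattern;
  assert (elen % npatterns == 0);
  unsigned limit = elen / npatterns;  // was BITS_PER_WORD / npatterns ()
  for (unsigned i = 0; i < limit; i++)
    mask |= base_mask << (i * npatterns);
  return mask;
}
```

With npatterns = 2 this yields 0x55555555 (pattern index 0) or 0xaaaaaaaa (pattern index 1) on an ELEN=32 target, instead of a 64-bit mask that vmv.v.x could not broadcast.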
 
 


[PATCH v2] RISC-V: Save and restore FCSR in interrupt functions to avoid program errors.

2023-06-14 Thread Jin Ma via Gcc-patches
In order to prevent interrupt functions from changing the FCSR, it needs to be
saved and restored at the beginning and end of the function.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_compute_frame_info): Allocate frame for 
FCSR.
(riscv_for_each_saved_reg): Save and restore FCSR in interrupt 
functions.
* config/riscv/riscv.md (riscv_frcsr): New patterns.
(riscv_fscsr): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/interrupt-fcsr-1.c: New test.
* gcc.target/riscv/interrupt-fcsr-2.c: New test.
* gcc.target/riscv/interrupt-fcsr-3.c: New test.
---
 gcc/config/riscv/riscv.cc | 48 +--
 gcc/config/riscv/riscv.md | 13 +
 .../gcc.target/riscv/interrupt-fcsr-1.c   | 15 ++
 .../gcc.target/riscv/interrupt-fcsr-2.c   | 15 ++
 .../gcc.target/riscv/interrupt-fcsr-3.c   | 14 ++
 5 files changed, 102 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/interrupt-fcsr-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/interrupt-fcsr-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/interrupt-fcsr-3.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2..9d71e5c9f72 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5095,12 +5095,15 @@ riscv_compute_frame_info (void)
 
   frame = &cfun->machine->frame;
 
-  /* In an interrupt function, if we have a large frame, then we need to
- save/restore t0.  We check for this before clearing the frame struct.  */
+  /* In an interrupt function, there are two cases in which t0 needs to be used:
+     1, If we have a large frame, then we need to save/restore t0.  We check for
+     this before clearing the frame struct.
+     2, Need to save and restore some CSRs in the frame.  */
   if (cfun->machine->interrupt_handler_p)
 {
   HOST_WIDE_INT step1 = riscv_first_stack_step (frame, frame->total_size);
-  if (! POLY_SMALL_OPERAND_P ((frame->total_size - step1)))
+  if (! POLY_SMALL_OPERAND_P ((frame->total_size - step1))
+ || (TARGET_HARD_FLOAT || TARGET_ZFINX))
interrupt_save_prologue_temp = true;
 }
 
@@ -5147,6 +5150,17 @@ riscv_compute_frame_info (void)
}
 }
 
+  /* In an interrupt function, we need extra space for the initial saves of CSRs.  */
+  if (cfun->machine->interrupt_handler_p
+      && ((TARGET_HARD_FLOAT && frame->fmask)
+          || (TARGET_ZFINX
+              /* Except for RISCV_PROLOGUE_TEMP_REGNUM.  */
+              && (frame->mask & ~(1 << RISCV_PROLOGUE_TEMP_REGNUM)))))
+    /* Save and restore FCSR.  */
+    /* TODO: When P or V extensions support interrupts, some of their CSRs
+       may also need to be saved and restored.  */
+    x_save_size += riscv_stack_align (1 * UNITS_PER_WORD);
+
   /* At the bottom of the frame are any outgoing stack arguments. */
   offset = riscv_stack_align (crtl->outgoing_args_size);
   /* Next are local stack variables. */
@@ -5392,6 +5406,34 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, 
riscv_save_restore_fn fn,
}
}
 
+  /* In an interrupt function, save and restore some necessary CSRs in the stack
+     to avoid changes in CSRs.  */
+  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
+      && cfun->machine->interrupt_handler_p
+      && ((TARGET_HARD_FLOAT && cfun->machine->frame.fmask)
+          || (TARGET_ZFINX
+              && (cfun->machine->frame.mask
+                  & ~(1 << RISCV_PROLOGUE_TEMP_REGNUM)))))
+    {
+ unsigned int fcsr_size = GET_MODE_SIZE (SImode);
+ if (!epilogue)
+   {
+ riscv_save_restore_reg (word_mode, regno, offset, fn);
+ offset -= fcsr_size;
+ emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
+ riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
+ offset, riscv_save_reg);
+   }
+ else
+   {
+ riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
+ offset - fcsr_size, riscv_restore_reg);
+ emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
+ riscv_save_restore_reg (word_mode, regno, offset, fn);
+ offset -= fcsr_size;
+   }
+ continue;
+   }
+
   riscv_save_restore_reg (word_mode, regno, offset, fn);
 }
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index d8e935cb934..565e8cd27cd 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -78,6 +78,8 @@ (define_c_enum "unspecv" [
   UNSPECV_GPR_RESTORE
 
   ;; Floating-point unspecs.
+  UNSPECV_FRCSR
+  UNSPECV_FSCSR
   UNSPECV_FRFLAGS
   UNSPECV_FSFLAGS
   UNSPECV_FSNVSNAN
@@ -3056,6 +3058,17 @@ (define_insn "gpr_restore_return"
   ""
   "")
 
+(define_insn "riscv_frcsr"
+  [(set (match_operand:SI 0 "register_operand" "=r"
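The save/restore discipline this patch emits around the interrupt-handler body can be modelled with a small sketch. This is a toy model for illustration, not the generated RTL; `Machine` and `interrupt_handler` are invented names, and a vector stands in for the stack frame slot.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of the FCSR discipline: the prologue reads FCSR (frcsr) and
// spills it to the frame, the epilogue reloads it and writes it back
// (fscsr), so any FCSR change made by the handler body is invisible to the
// interrupted code.
struct Machine {
  uint32_t fcsr = 0;
  std::vector<uint32_t> stack;  // stands in for the stack frame
};

void interrupt_handler (Machine &m)
{
  m.stack.push_back (m.fcsr);   // prologue: frcsr t0; sw t0, off(sp)
  m.fcsr = 0xff;                // body: FP code may change flags/rounding mode
  m.fcsr = m.stack.back ();     // epilogue: lw t0, off(sp); fscsr t0
  m.stack.pop_back ();
}
```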

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Richard Biener via Gcc-patches
On Wed, 14 Jun 2023, Jiufu Guo wrote:

> 
> Hi,
> 
> Segher Boessenkool  writes:
> 
> > Hi!
> >
> > As I said in a reply to the original patch: not okay.  Sorry.
> 
> Thanks a lot for your comments!
> I'm also thinking about other solutions:
> 1. "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])"
>   This is the existing pattern.  It may be read as an action
>   to clean an unknown-size memory block.
> 
> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
> UNSPEC_TIE".
>   Current patch is using this one.
> 
> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
> UNSPEC_TIE".
>This avoids using BLK on unspec, but using DI.

That gives the MEM a size which means we can interpret the (set ..)
as killing a specific area of memory, enabling DSE of earlier
stores.

AFAIU this special instruction is only supposed to prevent
code motion (of stack memory accesses?) across this instruction?
I'd say a

  (may_clobber (mem:BLK (reg:DI 1 1)))

might be more to the point?  I've used "may_clobber" which doesn't
exist since I'm not sure whether a clobber is considered a kill.
The docs say "Represents the storing or possible storing of an 
unpredictable..." - what is it?  Storing or possible storing?
I suppose stack_tie should be less strict than the documented
(clobber (mem:BLK (const_int 0))) (clobber all memory).

?

> 4. "set (mem/c:BLK (reg/f:DI 1 1) unspec (const_int 0 [0])
> UNSPEC_TIE"
>There is still a mode for the unspec.
> 
> 
> >
> > But some comments on this patch:
> >
> > On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote:
> >> +&& XINT (SET_SRC (set), 1) == UNSPEC_TIE
> >> +&& XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
> >
> > This makes it required that the operand of an UNSPEC_TIE unspec is a
> > const_int 0.  This should be documented somewhere.  Ideally you would
> > want no operand at all here, but every unspec has an operand.
> 
> Right!  Since checked UNSPEC_TIE arleady, we may not need to check
> the inner operand. Like " && XINT (SET_SRC (set), 1) == UNSPEC_TIE);".
> 
> >
> >> +  RTVEC_ELT (p, i)
> >> +  = gen_rtx_SET (mem, gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx),
> >> +  UNSPEC_TIE));
> >
> > If it is hard to indent your code, your code is trying to do too much.
> > Just have an extra temporary?
> >
> >   rtx un = gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx), 
> > UNSPEC_TIE);
> >   RTVEC_ELT (p, i) = gen_rtx_SET (mem, un);
> >
> > That is shorter even, and certainly more readable :-)
> 
> Yeap, thanks!
> 
> >
> >> @@ -10828,7 +10829,9 @@ (define_expand "restore_stack_block"
> >>operands[4] = gen_frame_mem (Pmode, operands[1]);
> >>p = rtvec_alloc (1);
> >>RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
> >> -const0_rtx);
> >> +gen_rtx_UNSPEC (BLKmode,
> >> +gen_rtvec (1, const0_rtx),
> >> +UNSPEC_TIE));
> >>operands[5] = gen_rtx_PARALLEL (VOIDmode, p);
> >
> > I have a hard time seeing how this could ever be seen as clearer or more
> > obvious or anything like that :-(
> 
> I was thinking about just invoking gen_stack_tie here.
> 
> BR,
> Jeff (Jiufu Guo)
> 
> >
> >
> > Segher
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Fix typo in 'libgomp.c/target-51.c' (was: [patch] OpenMP: Set default-device-var with OMP_TARGET_OFFLOAD=mandatory)

2023-06-14 Thread Thomas Schwinge
Hi!

On 2023-06-13T20:44:39+0200, Tobias Burnus  wrote:
> I intend to commit this tomorrow, unless there are comments.

I'm sorry I'm late.  ;-P

> It does as it says (see commit log): It initializes default-device-var
> to the value using the algorithm described in OpenMP 5.2, which
> depends on whether OMP_TARGET_OFFLOAD=mandatory was set.
>
> NOTE: With -foffload=disable there is no binary code but still
> devices get found - such that default-device-var == 0 (= first
> nonhost device). Thus, in that case, libgomp runs the code on that
> device but as no binary data is available, host fallback is used.
> (Even if there would be executable code for another device on
> the system.)
> With mandatory, this unintended host fallback is detected and an
> error is diagnosed. One can argue whether keeping the devices
> makes sense (e.g. because in a dynamic library device code will
> be loaded later) or not (don't list if no code is available).

This reminds me of the (unresolved) 
"Means to determine at runtime foffload targets specified at compile time".

> Note that TR11 (future OpenMP 6.0) extends OMP_DEFAULT_DEVICE and
> adds OMP_AVAILABLE_DEVICES which permit a finer-grained control about
> the device, including OMP_DEFAULT_DEVICE=initial (and 'invalid') which
> the current scheme does not permit. (Well, there is
> OMP_TARGET_OFFLOAD=disabled, but that's a too big hammer.)

> PS:  DejaGNU testing was done without offloading configured
> and with remote testing on a system having an offload device,
> which does not support setting environment variables.
> Manual testing was done with offloading enabled and depending
> on the testcase, running on a system with and/or without offloading
> hardware.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -150,7 +150,11 @@ resolve_device (int device_id, bool remapped)
>if (device_id == (remapped ? GOMP_DEVICE_HOST_FALLBACK
>: omp_initial_device))
>   return NULL;
> -  if (device_id == omp_invalid_device)
> +  if (gomp_target_offload_var == GOMP_TARGET_OFFLOAD_MANDATORY
> +   && gomp_get_num_devices () == 0)
> + gomp_fatal ("OMP_TARGET_OFFLOAD is set to MANDATORY but only the host "
> + "device is available");
> +  else if (device_id == omp_invalid_device)
>   gomp_fatal ("omp_invalid_device encountered");
>else if (gomp_target_offload_var == GOMP_TARGET_OFFLOAD_MANDATORY)
>   gomp_fatal ("OMP_TARGET_OFFLOAD is set to MANDATORY, "
|   "but device not found");
|
|return NULL;
|  }
|else if (device_id >= gomp_get_num_devices ())
|  {
|if (gomp_target_offload_var == GOMP_TARGET_OFFLOAD_MANDATORY
| && device_id != num_devices_openmp)
|   gomp_fatal ("OMP_TARGET_OFFLOAD is set to MANDATORY, "
|   "but device not found");
|
|return NULL;
|  }
|
|gomp_mutex_lock (&devices[device_id].lock);
|if (devices[device_id].state == GOMP_DEVICE_UNINITIALIZED)
|  gomp_init_device (&devices[device_id]);
|else if (devices[device_id].state == GOMP_DEVICE_FINALIZED)
|  {
|gomp_mutex_unlock (&devices[device_id].lock);
|
|if (gomp_target_offload_var == GOMP_TARGET_OFFLOAD_MANDATORY)
|   gomp_fatal ("OMP_TARGET_OFFLOAD is set to MANDATORY, "
|   "but device is finalized");
|
|return NULL;
|  }
|gomp_mutex_unlock (&devices[device_id].lock);
|
|return &devices[device_id];
|  }
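The dispatch order in resolve_device after the patch boils down to the following. This is a hypothetical distillation for illustration, not the libgomp source; the enum, function signature, and sentinel value are all invented, and device_id is assumed to be already remapped.

```cpp
#include <cassert>

// Distilled control flow: the new check -- mandatory offload but zero
// non-host devices -- fires before the omp_invalid_device diagnostic.
enum class Result { HostFallback, Device, FatalNoDevice, FatalInvalid,
                    FatalNotFound };

Result resolve (bool mandatory, int num_devices, int device_id,
                int invalid_device = -4 /* assumed sentinel */)
{
  if (mandatory && num_devices == 0)
    return Result::FatalNoDevice;   // "...MANDATORY but only the host device
                                    //  is available"
  if (device_id == invalid_device)
    return Result::FatalInvalid;    // "omp_invalid_device encountered"
  if (device_id < 0 || device_id >= num_devices)
    return mandatory ? Result::FatalNotFound  // "...but device not found"
                     : Result::HostFallback;
  return Result::Device;
}
```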

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/target-51.c
> @@ -0,0 +1,24 @@
> +/* Check OMP_TARGET_OFFLOAD on systems with no available non-host devices,
> +   which is enforced by using -foffload=disable.  */
> +
> +/* { dg-do run } */
> +/* { dg-additional-options "-foffload=disable" } */
> +/* { dg-set-target-env-var OMP_TARGET_OFFLOAD "mandatory" } */
> +
> +/* { dg-shouldfail "OMP_TARGET_OFFLOAD=mandatory and no available device" } 
> */
> +
> +/* See comment in target-50.c/target-50.c for why the output differs.  */
> +
> +/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY but only 
> the host device is available.*" { target { ! offload_device } } } */
> +/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY but device 
> not found.*" { target offload_device } } */

I intend to push the attached "Fix typo in 'libgomp.c/target-51.c'" after
testing.

Let me know if I should also adjust the new 'target { ! offload_device }'
diagnostic "[...] MANDATORY but only the host device is available" to
include a comma before 'but', for consistency with the other existing
diagnostics (cited above)?


Grüße
 Thomas


> +
> +int
> +main ()
> +{
> +  int x;
> +  #pragma omp target map(tofrom:x)
> +x = 5;
> +  if (x != 5)
> +__builtin_abort ();
> +  return 0;
> +}


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Re: [wwwdocs] gcc-14/changes.html + projects/gomp/: GCC 14 OpenMP update

2023-06-14 Thread Tobias Burnus

Now committed - with additional changes: two GCC 13 features on the
implementation-status list were missed in a previous update. Current
version:

→ https://gcc.gnu.org/projects/gomp/

→ https://gcc.gnu.org/gcc-14/changes.html (as first entry)

Tobias

On 13.06.23 20:45, Tobias Burnus wrote:

First update for OpenMP changes that made it into GCC 14.

Wording, technical and other comments are welcome as always.

I intend to commit the attached patch tomorrow.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit b286a7932a63eb192e5586f7534ed508fca0c7f0
Author: Tobias Burnus 
Date:   Wed Jun 14 10:03:16 2023 +0200

gcc-14/changes.html + projects/gomp/: GCC 14 OpenMP update
---
 htdocs/gcc-14/changes.html  | 15 +++
 htdocs/projects/gomp/index.html | 14 --
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 55d566b8..c403c94f 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -37,6 +37,21 @@ a work-in-progress.
 
 General Improvements
 
+
+  https://gcc.gnu.org/projects/gomp/";>OpenMP
+  
+
+  The requires directive's unified_address
+  requirement is now fulfilled by both AMD GCN and nvptx devices.
+
+
+  OpenMP 5.2: The OMP_TARGET_OFFLOAD=mandatory handling has
+  been updated for the clarifications and changes of the 5.2 specification.
+  For Fortran, the list of directives permitted in Fortran pure procedures
+  was extended.
+
+  
+
 
 New Languages and Language specific improvements
 
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 328d17bd..2df67403 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -142,7 +142,7 @@ filing a bug report.
 
 Implementation status in libgomp manual:
 https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Implementation-Status.html";
->Mainline (GCC 13),
+>Mainline (GCC 14),
 https://gcc.gnu.org/onlinedocs/gcc-13.1.0/libgomp/OpenMP-Implementation-Status.html";
 >GCC 13,
 https://gcc.gnu.org/onlinedocs/gcc-12.1.0/libgomp/OpenMP-Implementation-Status.html";
@@ -313,12 +313,14 @@ than listed, depending on resolved corner cases and optimizations.
   GCC 9
   GCC 12
   GCC 13
+  GCC 14
 
 
   (atomic_default_mem_order)
   (dynamic_allocators)
   complete but no non-host devices provides unified_address or
-  unified_shared_memory
+  unified_shared_memory
+  complete but no non-host devices provides unified_shared_memory
 
   
   
@@ -791,7 +793,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 Clauses on end directive can be on directive
-No
+GCC 13
 
   
   
@@ -801,7 +803,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 linear clause syntax changes and step modifier
-No
+GCC 13
 
   
   
@@ -836,7 +838,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 Extended list of directives permitted in Fortran pure procedures
-No
+GCC 14
 
   
   
@@ -926,7 +928,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 Initial value of default-device-var ICV with OMP_TARGET_OFFLOAD=mandatory
-No
+GCC 14
 
   
   


Re: [PATCH] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-14 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 14, 2023 at 1:59 PM Jan Beulich via Gcc-patches
 wrote:
>
> There's no reason to constrain this to AVX512VL, as the wider operation
> is not usable for more narrow operands only when the possible memory
But this may require more resources (on the AMD znver4 processor a zmm
instruction will also be split into 2 uops, right?), and on some Intel
processors (SKX/CLX) there will be a frequency reduction.
If it needs to be done, it is better guarded with
!TARGET_PREFER_AVX256: at least when the micro-architecture is AVX256_OPTIMAL
or the user explicitly uses -mprefer-vector-width=256, we don't want to
produce any zmm instruction by surprise. (Although
-mprefer-vector-width=256 is intended for the auto-vectorizer, backend
codegen also uses it in such cases, e.g. *movsf_internal
alternative 5 uses zmm only under TARGET_AVX512F && !TARGET_PREFER_AVX256.)
> source is a non-broadcast one. This way even the scalar copysign<mode>3
> can benefit from the operation being a single-insn one (leaving aside
> moves which the compiler decides to insert for unclear reasons, and
> leaving aside the fact that bcst_mem_operand() is too restrictive for
> broadcast to be embedded right into VPTERNLOG*).
>
> Along with this also request value duplication in
> ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
> excess space allocation in .rodata.*, filled with zeros which are never
> read.
>
> gcc/
>
> * config/i386/i386-expand.cc (ix86_expand_copysign): Request
> value duplication by ix86_build_signbit_mask() when AVX512F and
> not HFmode.
> * config/i386/sse.md (*<avx512>_vternlog<mode>_all): Convert to
> 2-alternative form. Adjust "mode" attribute. Add "enabled"
> attribute.
> (*<avx512>_vpternlog<mode>_1): Relax to just TARGET_AVX512F.
> (*<avx512>_vpternlog<mode>_2): Likewise.
> (*<avx512>_vpternlog<mode>_3): Likewise.
> ---
> I guess the underlying pattern, going along the lines of what
> one_cmpl2 uses, can be applied elsewhere
> as well.
>
> HFmode could use embedded broadcast too for copysign and alike, but that
> would need to be V2HF -> V8HF (for which I don't think there are any
> existing patterns).
>
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -2266,7 +2266,7 @@ ix86_expand_copysign (rtx operands[])
>else
>  dest = NULL_RTX;
>op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode);
> -  mask = ix86_build_signbit_mask (vmode, 0, 0);
> +  mask = ix86_build_signbit_mask (vmode, TARGET_AVX512F && mode != HFmode, 0);
>
>if (CONST_DOUBLE_P (operands[1]))
>  {
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -12399,11 +12399,11 @@
> (set_attr "mode" "")])
>
>  (define_insn "*<avx512>_vternlog<mode>_all"
> -  [(set (match_operand:V 0 "register_operand" "=v")
> +  [(set (match_operand:V 0 "register_operand" "=v,v")
> (unspec:V
> - [(match_operand:V 1 "register_operand" "0")
> -  (match_operand:V 2 "register_operand" "v")
> -  (match_operand:V 3 "bcst_vector_operand" "vmBr")
> + [(match_operand:V 1 "register_operand" "0,0")
> +  (match_operand:V 2 "register_operand" "v,v")
> +  (match_operand:V 3 "bcst_vector_operand" "vBr,m")
>(match_operand:SI 4 "const_0_to_255_operand")]
>   UNSPEC_VTERNLOG))]
>"TARGET_AVX512F
> @@ -12411,10 +12411,22 @@
> it's not real AVX512FP16 instruction.  */
>&& (GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) >= 4
>   || GET_CODE (operands[3]) != VEC_DUPLICATE)"
> -  "vpternlog<ternlogsuffix>\t{%4, %3, %2, %0|%0, %2, %3, %4}"
> +{
> +  if (TARGET_AVX512VL)
> +return "vpternlog<ternlogsuffix>\t{%4, %3, %2, %0|%0, %2, %3, %4}";
> +  else
> +return "vpternlog<ternlogsuffix>\t{%4, %g3, %g2, %g0|%g0, %g2, %g3, %4}";
> +}
>[(set_attr "type" "sselog")
> (set_attr "prefix" "evex")
> -   (set_attr "mode" "<sseinsnmode>")])
> +   (set (attr "mode")
> +(if_then_else (match_test "TARGET_AVX512VL")
> + (const_string "<sseinsnmode>")
> + (const_string "XI")))
> +   (set (attr "enabled")
> +   (if_then_else (eq_attr "alternative" "1")
> + (symbol_ref "<MODE_SIZE> == 64 || TARGET_AVX512VL")
> + (const_string "*")))])
>
>  ;; There must be lots of other combinations like
>  ;;
> @@ -12443,7 +12455,7 @@
>   (any_logic2:V
> (match_operand:V 3 "regmem_or_bitnot_regmem_operand")
> (match_operand:V 4 "regmem_or_bitnot_regmem_operand"]
> -  "(<MODE_SIZE> == 64 || TARGET_AVX512VL)
> +  "TARGET_AVX512F
> && ix86_pre_reload_split ()
> && (rtx_equal_p (STRIP_UNARY (operands[1]),
> STRIP_UNARY (operands[4]))
> @@ -12527,7 +12539,7 @@
>   (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
> (match_operand:V 3 "regmem_or_bitnot_regmem_operand"))
>   (match_operand:V 4 "regmem_or_bitnot_regmem_operand")))]
> -  "(<MODE_SIZE> == 64 || TARGET_AVX512VL)
> +  "TARGET_AVX512F
> && ix86_pre_reload_split ()
> && (rtx_equal_p (STRIP_UNARY (operands[
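For reference, the copysign kernel that the VPTERNLOG pattern evaluates per element is the classic bit-select: take the sign bit from one operand and the magnitude bits from the other. Below is a scalar sketch of that three-input boolean function under the assumption of IEEE-754 binary32; it illustrates the operation, not the GCC expansion itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Per-bit select (mask ? s : m), which VPTERNLOG encodes in a single
// instruction with the right immediate; here applied with the sign-bit mask.
float copysign_bits (float mag, float sgn)
{
  const uint32_t sign_mask = 0x80000000u;
  uint32_t m, s;
  std::memcpy (&m, &mag, sizeof m);
  std::memcpy (&s, &sgn, sizeof s);
  uint32_t r = (s & sign_mask) | (m & ~sign_mask);
  float out;
  std::memcpy (&out, &r, sizeof out);
  return out;
}
```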

Re: [PATCH 8/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE

2023-06-14 Thread Hongtao Liu via Gcc-patches
On Tue, Jun 13, 2023 at 10:07 AM Kewen Lin via Gcc-patches
 wrote:
>
> This patch adjusts the cost handling on
> VMAT_CONTIGUOUS_PERMUTE in function vectorizable_load.  We
> don't call function vect_model_load_cost for it any more.
>
> As the affected test case gcc.target/i386/pr70021.c shows,
> the previous costing can under-cost the total generated
> vector loads as for VMAT_CONTIGUOUS_PERMUTE function
> vect_model_load_cost doesn't consider the group size which
> is considered as vec_num during the transformation.
The original PR is for a correctness issue, and I'm not sure how
much of a performance impact the patch would have, but the change looks
reasonable, so the test change looks OK to me.
I'll track the performance impact on SPEC2017 to see if there's any
regression caused by the patch (I guess probably not).
>
> This patch makes the count of vector load in costing become
> consistent with what we generates during the transformation.
> To be more specific, for the given test case, for memory
> access b[i_20], it costed for 2 vector loads before,
> with this patch it costs 8 instead, it matches the final
> count of generated vector loads basing from b.  This costing
> change makes cost model analysis feel it's not profitable
> to vectorize the first loop, so this patch adjusts the test
> case without vect cost model any more.
>
> But note that this test case also exposes something we can
> improve further is that although the number of vector
> permutation what we costed and generated are consistent,
> but DCE can further optimize some unused permutation out,
> it would be good if we can predict that and generate only
> those necessary permutations.
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vect_model_load_cost): Assert this function only
> handle memory_access_type VMAT_CONTIGUOUS, remove some
> VMAT_CONTIGUOUS_PERMUTE related handlings.
> (vectorizable_load): Adjust the cost handling on 
> VMAT_CONTIGUOUS_PERMUTE
> without calling vect_model_load_cost.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr70021.c: Adjust with -fno-vect-cost-model.
> ---
>  gcc/testsuite/gcc.target/i386/pr70021.c |  2 +-
>  gcc/tree-vect-stmts.cc  | 88 ++---
>  2 files changed, 51 insertions(+), 39 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr70021.c 
> b/gcc/testsuite/gcc.target/i386/pr70021.c
> index 6562c0f2bd0..d509583601e 100644
> --- a/gcc/testsuite/gcc.target/i386/pr70021.c
> +++ b/gcc/testsuite/gcc.target/i386/pr70021.c
> @@ -1,7 +1,7 @@
>  /* PR target/70021 */
>  /* { dg-do run } */
>  /* { dg-require-effective-target avx2 } */
> -/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details 
> -mtune=skylake" } */
> +/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details 
> -mtune=skylake -fno-vect-cost-model" } */
>
>  #include "avx2-check.h"
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 7f8d9db5363..e7a97dbe05d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1134,8 +1134,7 @@ vect_model_load_cost (vec_info *vinfo,
>   slp_tree slp_node,
>   stmt_vector_for_cost *cost_vec)
>  {
> -  gcc_assert (memory_access_type == VMAT_CONTIGUOUS
> - || memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
> +  gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
>
>unsigned int inside_cost = 0, prologue_cost = 0;
>bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
> @@ -1174,26 +1173,6 @@ vect_model_load_cost (vec_info *vinfo,
>   once per group anyhow.  */
>bool first_stmt_p = (first_stmt_info == stmt_info);
>
> -  /* We assume that the cost of a single load-lanes instruction is
> - equivalent to the cost of DR_GROUP_SIZE separate loads.  If a grouped
> - access is instead being provided by a load-and-permute operation,
> - include the cost of the permutes.  */
> -  if (first_stmt_p
> -  && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
> -{
> -  /* Uses an even and odd extract operations or shuffle operations
> -for each needed permute.  */
> -  int group_size = DR_GROUP_SIZE (first_stmt_info);
> -  int nstmts = ncopies * ceil_log2 (group_size) * group_size;
> -  inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm,
> -  stmt_info, 0, vect_body);
> -
> -  if (dump_enabled_p ())
> -dump_printf_loc (MSG_NOTE, vect_location,
> - "vect_model_load_cost: strided group_size = %d .\n",
> - group_size);
> -}
> -
>vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
>   misalignment, first_stmt_p, &inside_cost, 
> &prologue_cost,
>   cost_vec, cost_vec, true);
> @@ -10652,11 +10631,22 @@ vectorizable_load (vec_info *vinfo,
>  alignment support schemes.  */
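The permute count that the quoted hunk removes from vect_model_load_cost (and that the patch accounts for in vectorizable_load instead) can be re-computed standalone. This is a hypothetical helper for illustration, with GCC's ceil_log2 open-coded.

```cpp
#include <cassert>

// The old costing charged ncopies * ceil_log2 (group_size) * group_size
// vec_perm operations for a load-and-permute group.
int permute_cost (int ncopies, int group_size)
{
  int l = 0;
  while ((1 << l) < group_size)
    l++;                        // l = ceil_log2 (group_size)
  return ncopies * l * group_size;
}
```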

Re: [PATCH v2] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32

2023-06-14 Thread Robin Dapp via Gcc-patches
Hi Pan,

> This patch would like to fix one bug exported by RV32 test case
> multiple_rgroup_run-2.c. The mask should be restricted by elen in
> vector, and the condition between the vmv.s.x and the vmv.v.x should
> take inner_bits_size rather than constants.

exported -> exposed.

How about something like:

"When constructing a vector mask from individual elements we wrongly
assumed that we can broadcast BITS_PER_WORD (i.e. XLEN).  The maximum
is actually the vector element length (i.e. ELEN).  This patch fixes
this."?

> +  /* We restrict the limit to the elen of RVV. For example:
> + -march=zve32*, the ELEN is 32.
> + -march=zve64*, the ELEN is 64.
> + The related vmv.v.x/vmv.s.x is restricted to ELEN as above, we cannot
> + take care of case like below when ELEN=32
> + vsetvli e64,m1
> + vmv.v.x/vmv.s.x
> +   */

/* Here we construct a mask pattern that will later be broadcast
   to a vector register.  The maximum broadcast size for vmv.v.x/vmv.s.x
   is determined by the length of a vector element (ELEN) and not by
   XLEN so make sure we do not exceed it.  One example is -march=zve32*
   which mandates ELEN == 32 but can be combined with -march=rv64
   with XLEN == 64.  */

Regards
 Robin


Re: [x86 PATCH] Convert ptestz of pandn into ptestc.

2023-06-14 Thread Uros Bizjak via Gcc-patches
On Tue, Jun 13, 2023 at 6:03 PM Roger Sayle  wrote:
>
>
> This patch is the next instalment in a set of backend patches around
> improvements to ptest/vptest.  A previous patch optimized the sequence
> t=pand(x,y); ptestz(t,t) into the equivalent ptestz(x,y), using the
> property that ZF is set to (X&Y) == 0.  This patch performs a similar
> transformation, converting t=pandn(x,y); ptestz(t,t) into the (almost)
> equivalent ptestc(y,x), using the property that the CF flags is set to
> (~X&Y) == 0.  The tricky bit is that this sets the CF flag instead of
> the ZF flag, so we can only perform this transformation when we can
> also convert the flags' consumer, as well as the producer.
>
> For the test case:
>
> int foo (__m128i x, __m128i y)
> {
>   __m128i a = x & ~y;
>   return __builtin_ia32_ptestz128 (a, a);
> }
>
> With -O2 -msse4.1 we previously generated:
>
> foo:pandn   %xmm0, %xmm1
> xorl%eax, %eax
> ptest   %xmm1, %xmm1
> sete%al
> ret
>
> with this patch we now generate:
>
> foo:xorl%eax, %eax
> ptest   %xmm0, %xmm1
> setc%al
> ret
>
> At the same time, this patch also provides alternative fixes for
> PR target/109973 and PR target/110118, by recognizing that ptestc(x,x)
> always sets the carry flag (X&~X is always zero).  This is achieved
> both by recognizing the special case in ix86_expand_sse_ptest and with
> a splitter to convert an eligible ptest into an stc.
>
> The next piece is, of course, STV of "if (x & ~y)..."
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
> 2023-06-13  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize
> expansion of ptestc with equal operands as returning const1_rtx.
> * config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost
> estimates of UNSPEC_PTEST, where the ptest performs the PAND
> or PAND of its operands.
> * config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST
> of reg_equal_p operands into an x86_stc instruction.
> (define_split): Split pandn/ptestz/setne into ptestc/setnc.
> (define_split): Split pandn/ptestz/sete into ptestc/setc.
> (define_split): Split pandn/ptestz/je into ptestc/jc.
> (define_split): Split pandn/ptestz/jne into ptestc/jnc.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/avx-vptest-4.c: New test case.
> * gcc.target/i386/avx-vptest-5.c: Likewise.
> * gcc.target/i386/avx-vptest-6.c: Likewise.
> * gcc.target/i386/pr109973-1.c: Update test case.
> * gcc.target/i386/pr109973-2.c: Likewise.
> * gcc.target/i386/sse4_1-ptest-4.c: New test case.
> * gcc.target/i386/sse4_1-ptest-5.c: Likewise.
> * gcc.target/i386/sse4_1-ptest-6.c: Likewise.
>
>
> Thanks in advance,
> Roger
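The flag equivalence the new splitters rely on can be checked with a scalar model. The 64-bit functions below are invented stand-ins for the 128-bit PTEST; they model ZF = ((x & y) == 0) and CF = ((~x & y) == 0) only.

```cpp
#include <cassert>
#include <cstdint>

// With a = x & ~y, the ZF of ptest(a, a) equals the CF of ptest(y, x),
// so pandn+ptest+sete can become ptest+setc.
bool ptest_zf (uint64_t x, uint64_t y) { return (x & y) == 0; }
bool ptest_cf (uint64_t x, uint64_t y) { return (~x & y) == 0; }

bool foo_before (uint64_t x, uint64_t y)
{
  uint64_t a = x & ~y;          // pandn
  return ptest_zf (a, a);       // ptest + sete
}

bool foo_after (uint64_t x, uint64_t y)
{
  return ptest_cf (y, x);       // ptest + setc
}
```

It also shows why ptestc(x, x) always sets the carry flag: ~x & x is zero for every x.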

+  /* ptest reg, reg sets the carry flag.  */
+  if (comparison == LTU
+  && (d->code == IX86_BUILTIN_PTESTC
+  || d->code == IX86_BUILTIN_PTESTC256)
+  && rtx_equal_p (op0, op1))
+return const1_rtx;

In this function, an RTX that sets a target reg should be emitted, and
a target register returned. I don't think the above code is correct.

+;; pandn/ptestz/setne -> ptestc/setnc
+(define_split
+  [(set (match_operand:QI 0 "register_operand")
+(ne:QI

Please note that setcc is a bit tricky on x86. You can actually set a
register in QI/HI/SI/DImode, and post-reload splitters will do the
correct extension (see the patterns in i386.md, after "For all sCOND
expanders ..." comment). But you have to account for all these modes
in the pre-reload splitter. Maybe you should use the
"int248_register_operand" predicate to avoid pattern explosion.

+  (unspec:CCZ [
+(and:V_AVX (not:V_AVX (match_operand:V_AVX 1 "register_operand"))
+   (match_operand:V_AVX 2 "register_operand"))
+(and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))]
+UNSPEC_PTEST)
+  (const_int 0)))]
+  "TARGET_SSE4_1"
+  [(set (reg:CCC FLAGS_REG)
+(unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST))
+   (set (strict_low_part (subreg:QI (match_dup 0) 0))
+(geu:QI (reg:CCC FLAGS_REG) (const_int 0)))])

No need to set strict_low_part, just set a register with EQ/NE of
CCCmode and post-reload splitters will do their magic. Please also
note that you emit a QI subreg of a QI register here, which doesn't
seem right.

+
+;; Changing the CCmode of FLAGS_REG requires updating both def and use.

Does the above comment also apply to the above pattern?

+;; pandn/ptestz/sete -> ptestc/setc
+(define_split
+  [(set (strict_low_part (subreg:QI (match_operand:SI 0 "register_operand") 0))
+(eq:QI
+  (unspec:CCZ [
+(and:V_AVX (not:V_AVX (match_operand:V_AVX 1 "register_operand"))
+   (match_operand:V_AVX 2 "regist

RE: [PATCH v2] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32

2023-06-14 Thread Li, Pan2 via Gcc-patches
Thanks Robin, that looks much better than the v2; let me update it in 
PATCH v3.

Pan

-Original Message-
From: Robin Dapp  
Sent: Wednesday, June 14, 2023 4:27 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; Wang, 
Yanzhang ; kito.ch...@gmail.com
Subject: Re: [PATCH v2] RISC-V: Bugfix for vec_init repeating auto 
vectorization in RV32

Hi Pan,

> This patch would like to fix one bug exported by RV32 test case
> multiple_rgroup_run-2.c. The mask should be restricted by elen in
> vector, and the condition between the vmv.s.x and the vmv.v.x should
> take inner_bits_size rather than constants.

exported -> exposed.

How about something like:

"When constructing a vector mask from individual elements we wrongly
assumed that we can broadcast BITS_PER_WORD (i.e. XLEN).  The maximum
is actually the vector element length (i.e. ELEN).  This patch fixes
this."?

> +  /* We restrict the limit to the elen of RVV. For example:
> + -march=zve32*, the ELEN is 32.
> + -march=zve64*, the ELEN is 64.
> + The related vmv.v.x/vmv.s.x is restricted to ELEN as above, we cannot
> + take care of case like below when ELEN=32
> + vsetvil e64,m1
> + vmv.v.x/vmv.s.x
> +   */

/* Here we construct a mask pattern that will later be broadcast
   to a vector register.  The maximum broadcast size for vmv.v.x/vmv.s.x
   is determined by the length of a vector element (ELEN) and not by
   XLEN so make sure we do not exceed it.  One example is -march=zve32*
   which mandates ELEN == 32 but can be combined with -march=rv64
   with XLEN == 64.  */

Regards
 Robin


Re: [committed] OpenMP: Cleanups related to the 'present' modifier

2023-06-14 Thread Thomas Schwinge
Hi Tobias!

On 2023-06-12T18:44:23+0200, Tobias Burnus  wrote:
> Cleanup follow up to
>r14-1579-g4ede915d5dde93 "openmp: Add support for the 'present' modifier"
> committed 6 days ago.
>
> Namely:
> * Replace for the program → libgomp ABI 
> GOMP_MAP_PRESENT_[ALLOC,TO,FROM,TOFROM]
>by the preexisting GOMP_MAP_FORCE_PRESENT but keep the other enum values
>(and use them until gimplification).
>
> * Improve wording if a non-existing/unsupported map-type modifier was used
>by not referring to 'omp target' as it could also be target (enter/exit) 
> data.
>+ Add a testcase for enter/exit data + data.
>
> * Unify + improve wording shown for 'present' when not present on the device.
>
> * Extend the testcases to check that data actually gets copied with
>'target update' and 'map' when the 'present' modifier is present.
>
> Committed as Rev. r14-1736-g38944ec2a6fa10

> OpenMP: Cleanups related to the 'present' modifier
>
> Reduce number of enum values passed to libgomp as
> GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} have the same semantic as
> GOMP_MAP_FORCE_PRESENT (i.e. abort if not present, otherwise ignore);
> that's different to GOMP_MAP_ALWAYS_PRESENT_{TO,TOFROM,FROM} which also
abort if not present but copy data when present. This is a follow-up to
> the commit r14-1579-g4ede915d5dde93 done 6 days ago.

Great, that matches how I thought this should be done (re our 2023-06-07
GCC IRC discussion).

> Additionally, the commit [...]
> extends testcases a tiny bit.

> gcc/testsuite/ChangeLog:

> * gfortran.dg/gomp/target-update-1.f90: Likewise.

That one fixed 
"gfortran.dg/gomp/target-update-1.f90 fails after r14-1579-g4ede915d5dde93".

> --- a/include/gomp-constants.h
> +++ b/include/gomp-constants.h

|  #define GOMP_MAP_FLAG_PRESENT(GOMP_MAP_FLAG_SPECIAL_5 \
|| GOMP_MAP_FLAG_SPECIAL_0)

Couldn't/shouldn't we now get rid of this 'GOMP_MAP_FLAG_PRESENT'...

|  #define GOMP_MAP_FLAG_ALWAYS_PRESENT (GOMP_MAP_FLAG_SPECIAL_2 \
|| GOMP_MAP_FLAG_PRESENT)

..., as it is only used in 'GOMP_MAP_FLAG_ALWAYS_PRESENT' here...

> @@ -136,14 +136,6 @@ enum gomp_map_kind
> device.  */
>  GOMP_MAP_ALWAYS_TOFROM = (GOMP_MAP_FLAG_SPECIAL_2
>| GOMP_MAP_TOFROM),
> -/* Must already be present.  */
> -GOMP_MAP_PRESENT_ALLOC = (GOMP_MAP_FLAG_PRESENT | 
> GOMP_MAP_ALLOC),
> -/* Must already be present, copy to device.  */
> -GOMP_MAP_PRESENT_TO =(GOMP_MAP_FLAG_PRESENT | GOMP_MAP_TO),
> -/* Must already be present, copy from device.  */
> -GOMP_MAP_PRESENT_FROM =  (GOMP_MAP_FLAG_PRESENT | GOMP_MAP_FROM),
> -/* Must already be present, copy to and from device.  */
> -GOMP_MAP_PRESENT_TOFROM =(GOMP_MAP_FLAG_PRESENT | 
> GOMP_MAP_TOFROM),
>  /* Must already be present, unconditionally copy to device.  */
>  GOMP_MAP_ALWAYS_PRESENT_TO = (GOMP_MAP_FLAG_ALWAYS_PRESENT
>| GOMP_MAP_TO),
> @@ -205,7 +197,13 @@ enum gomp_map_kind
>  /* An attach or detach operation.  Rewritten to the appropriate type 
> during
> gimplification, depending on directive (i.e. "enter data" or
> parallel/kernels region vs. "exit data").  */
> -GOMP_MAP_ATTACH_DETACH = (GOMP_MAP_LAST | 3)
> +GOMP_MAP_ATTACH_DETACH = (GOMP_MAP_LAST | 3),
> +/* Must already be present - all of following map to 
> GOMP_MAP_FORCE_PRESENT
> +   as no data transfer is needed.  */
> +GOMP_MAP_PRESENT_ALLOC = (GOMP_MAP_LAST | 4),
> +GOMP_MAP_PRESENT_TO =(GOMP_MAP_LAST | 5),
> +GOMP_MAP_PRESENT_FROM =  (GOMP_MAP_LAST | 6),
> +GOMP_MAP_PRESENT_TOFROM =(GOMP_MAP_LAST | 7)
>};
>
>  #define GOMP_MAP_COPY_TO_P(X) \
> @@ -243,7 +241,8 @@ enum gomp_map_kind
>(((X) & GOMP_MAP_FLAG_SPECIAL_BITS) == GOMP_MAP_FLAG_FORCE)
>
>  #define GOMP_MAP_PRESENT_P(X) \
> -  (((X) & GOMP_MAP_FLAG_PRESENT) == GOMP_MAP_FLAG_PRESENT)
> +  (((X) & GOMP_MAP_FLAG_PRESENT) == GOMP_MAP_FLAG_PRESENT \
> +   || (X) == GOMP_MAP_FORCE_PRESENT)

..., and this 'GOMP_MAP_PRESENT_P' should look for
'GOMP_MAP_FLAG_ALWAYS_PRESENT' instead of 'GOMP_MAP_FLAG_PRESENT' (plus
'GOMP_MAP_FORCE_PRESENT')?

Instead of the current effective 'GOMP_MAP_FLAG_ALWAYS_PRESENT':

GOMP_MAP_FLAG_SPECIAL_0
| GOMP_MAP_FLAG_SPECIAL_2
| GOMP_MAP_FLAG_SPECIAL_5

..., it could/should use a simpler flag combination?  (My idea is that
this later make usage of flag bits for other purposes easier -- but I've
not verified that in depth.)


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf

[PATCH v3] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32

2023-06-14 Thread Pan Li via Gcc-patches
From: Pan Li 

When constructing a vector mask from individual elements we wrongly
assumed that we can broadcast BITS_PER_WORD (i.e. XLEN).  The maximum is
actually the vector element length (i.e. ELEN).  This patch fixes this.

After this patch, below failures on RV32 will be fixed.

FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-3.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax execution test

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-v.cc (rvv_builder::get_merge_scalar_mask):
Take elen instead of scalar BITS_PER_WORD.
(expand_vector_init_merge_repeating_sequence): Use inner_bits_size
instead of scalar BITS_PER_WORD.
---
 gcc/config/riscv/riscv-v.cc | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e07d5c2901a..01f647bc0bd 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -399,10 +399,17 @@ rvv_builder::get_merge_scalar_mask (unsigned int 
index_in_pattern) const
 {
   unsigned HOST_WIDE_INT mask = 0;
   unsigned HOST_WIDE_INT base_mask = (1ULL << index_in_pattern);
+  /* Here we construct a mask pattern that will later be broadcast
+ to a vector register.  The maximum broadcast size for vmv.v.x/vmv.s.x
+ is determined by the length of a vector element (ELEN) and not by
+ XLEN so make sure we do not exceed it.  One example is -march=zve32*
+ which mandates ELEN == 32 but can be combined with -march=rv64
+ with XLEN == 64.  */
+  unsigned int elen = TARGET_VECTOR_ELEN_64 ? 64 : 32;
 
-  gcc_assert (BITS_PER_WORD % npatterns () == 0);
+  gcc_assert (elen % npatterns () == 0);
 
-  int limit = BITS_PER_WORD / npatterns ();
+  int limit = elen / npatterns ();
 
   for (int i = 0; i < limit; i++)
 mask |= base_mask << (i * npatterns ());
@@ -1928,7 +1935,7 @@ expand_vector_init_merge_repeating_sequence (rtx target,
   rtx mask = gen_reg_rtx (mask_mode);
   rtx dup = gen_reg_rtx (dup_mode);
 
-  if (full_nelts <= BITS_PER_WORD) /* vmv.s.x.  */
+  if (full_nelts <= builder.inner_bits_size ()) /* vmv.s.x.  */
{
  rtx ops[] = {dup, gen_scalar_move_mask (dup_mask_mode),
RVV_VUNDEF (dup_mode), merge_mask};
@@ -1938,7 +1945,8 @@ expand_vector_init_merge_repeating_sequence (rtx target,
   else /* vmv.v.x.  */
{
  rtx ops[] = {dup, force_reg (GET_MODE_INNER (dup_mode), merge_mask)};
- rtx vl = gen_int_mode (CEIL (full_nelts, BITS_PER_WORD), Pmode);
+ rtx vl = gen_int_mode (CEIL (full_nelts, builder.inner_bits_size ()),
+Pmode);
  emit_nonvlmax_integer_move_insn (code_for_pred_broadcast (dup_mode),
   ops, vl);
}
-- 
2.34.1



Re: [PATCH v3] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32

2023-06-14 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-14 17:00
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v3] RISC-V: Bugfix for vec_init repeating auto vectorization in 
RV32


Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-14 Thread Jan Beulich via Gcc-patches
On 14.06.2023 09:41, Hongtao Liu wrote:
> On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches
>  wrote:
>>
>> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
>> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
>> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
>> need for VEX3 in the broadcast ones. When EVEX encoding is required the
>> broadcast insns are always shorter.
>>
>> Add two new alternatives each, one covering the AVX2 case and one
>> covering AVX512.
> I think you can just change the assembly output for the first alternative:
> when TARGET_AVX2, use vbroadcastss, else use vshufps, since
> vbroadcastss only accepts a register operand when TARGET_AVX2. And no
> need to support 2 extra alternatives, which doesn't make sense and just
> makes the RA more confused about the same meaning of different
> alternatives.

You mean by switching from "@ ..." to C code using "switch
(which_alternative)"? I can do that, sure. Yet that'll make for a
more complicated "length_immediate" attribute then. Would be nice
if you could confirm that this is what you want, as I may well
have misunderstood you.

But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly
different.

>> ---
>> I'm working from the assumption that the isa attributes to the original
>> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
>> or avx_noavx2 as applicable), as the new earlier alternatives cover all
>> operand forms already when at least AVX2 is enabled.
>>
>> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
>> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_
>> and elsewhere.)
> Not sure about this part. I grep prefix_extra, seems only used by
> znver.md/znver4.md for schedule, and only for comi instructions(?the
> reservation name seems so).

define_attr "length_vex" and define_attr "length" use it, too.
Otherwise I would have asked whether the attribute couldn't be
purged from most insns.

My present understanding is that the attribute is wrong on
vec_dupv4sf (and hence wants dropping from there altogether), and it
should be "prefix_data16" instead on *vec_dupv4si, evaluating to 1
only for the non-AVX pshufd case. I suspect at least the latter
would be going to far for doing it "while here" right in this patch.
Plus I think I have seen various other questionable uses of that
attribute.

>> Is use of Yv for the source operand really necessary in *vec_dupv4si?
>> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL
> Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is
> allowed for scalar mode, but not for 128/256-bit vector modes.
> 
> 20204  if (TARGET_AVX512F
> 20205  && (VALID_AVX512F_REG_OR_XI_MODE (mode)
> 20206  || VALID_AVX512F_SCALAR_MODE (mode)))
> 20207return true;

Okay, so I need to switch input constraints for relevant new
alternatives to Yv (I actually wonder why I did use v in
vec_dupv4sf, as it was clear to me that SFmode can be in the high
16 xmm registers with just AVX512F).

>> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they
>> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd
>> alternative (or just m, as Yv is already covered by the 2nd one)?
> I guess xm is more suitable since we still want to allocate
> operands[1] to register when sse3_noavx.
> It didn't hit any error since for avx and above, alternative 1(2rd
> one) is always matched than alternative 2.

I'm afraid I don't follow: With just -mavx512f the source operand
can be in, say, %xmm16 (as per your clarification above). This
would not match Yv, but it would match vm. And hence wrongly
create an AVX512VL form of vmovddup. I didn't try it out earlier,
because unlike for SFmode / DFmode I thought it's not really clear
how to get the compiler to reliably put a DImode variable in an xmm
reg, but it just occurred to me that this can be done the same way
there. And voila,

typedef long long __attribute__((vector_size(16))) v2di;

v2di bcst(long long ll) {
register long long x asm("xmm16") = ll;

asm("nop %%esp" : "+v" (x));
return (v2di){x, x};
}

compiled with just -mavx512f (and -O2) produces an AVX512VL insn.
I'll make another patch, yet for that I'm then also not sure why
you say xm would be more suitable. Yvm allows for registers (with
or without AVX, merely SSE being required) just as much as vm
does, doesn't it? And I don't think I've found any combination of
destination being v and source being xm anywhere. Plus we want to
allow for the higher registers when AVX512VL is enabled.

Jan


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> AFAIU this special instruction is only supposed to prevent
> code motion (of stack memory accesses?) across this instruction?
> I'd say a
>
>   (may_clobber (mem:BLK (reg:DI 1 1)))
>
> might be more to the point?  I've used "may_clobber" which doesn't
> exist since I'm not sure whether a clobber is considered a kill.
> The docs say "Represents the storing or possible storing of an 
> unpredictable..." - what is it? Storing or possible storing?

I'd also understood it to be either.  As in, it is a may-clobber
that can be used for must-clobber.  Alternatively: the value stored
is unpredictable, and can therefore be the same as the current value.

I think the main difference between:

  (clobber (mem:BLK …))

and

  (set (mem:BLK …) (unspec:BLK …))

is that the latter must happen for correctness (unless something
that understands the unspec proves otherwise) whereas a clobber
can validly be dropped.  So for something like stack_tie, a set
seems more correct than a clobber.

Thanks,
Richard


Remove MFWRAP_SPEC remnant

2023-06-14 Thread Jivan Hakobyan via Gcc-patches
This patch removes a remnant of mudflap.

gcc/ChangeLog:
* config/moxie/uclinux.h (MFWRAP_SPEC): Remove.


-- 
With the best regards
Jivan Hakobyan
diff --git a/gcc/config/moxie/uclinux.h b/gcc/config/moxie/uclinux.h
index f7bb62e56c7..a7d371047c4 100644
--- a/gcc/config/moxie/uclinux.h
+++ b/gcc/config/moxie/uclinux.h
@@ -32,11 +32,3 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #undef TARGET_LIBC_HAS_FUNCTION
 #define TARGET_LIBC_HAS_FUNCTION no_c99_libc_has_function
-
-/* Like the definition in gcc.cc, but for purposes of uClinux, every link is
-   static.  */
-#define MFWRAP_SPEC " %{fmudflap|fmudflapth: \
- --wrap=malloc --wrap=free --wrap=calloc --wrap=realloc\
- --wrap=mmap --wrap=munmap --wrap=alloca\
- %{fmudflapth: --wrap=pthread_create\
-}} %{fmudflap|fmudflapth: --wrap=main}"


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-06-14 at 09:55 +0800, Jiufu Guo wrote:
> Hi,
> 
> Xi Ruoyao  writes:
> 
> > On Tue, 2023-06-13 at 20:23 +0800, Jiufu Guo via Gcc-patches wrote:
> > 
> > > Compare with previous version, this addes ChangeLog and removes
> > > const_anchor parts.
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621356.html.
> > 
> > [Off topic]
> > 
> > const_anchor is just broken now.  See
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104843 and the thread
> > beginning at
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591470.html.  If
> > you want to use it for rs6000 I guess you need to fix it first...
> 
> Thanks so much for pointing out this.  It seems about supporting
> negative value, right?
> 
> As you say: for 1. "g(0x8123, 0x81240001)", it would be fine.
> 
> The generated insns are:
> (insn 5 2 6 2 (set (reg:DI 117)
>     (const_int -2128347135 [0x81240001])) "negative.c":5:3 681 
> {*movdi_internal64}
>  (nil))
> (insn 6 5 7 2 (set (reg:DI 118)
>     (plus:DI (reg:DI 117)
>     (const_int -2 [0xfffe]))) "negative.c":5:3 66 
> {*adddi3}
>  (expr_list:REG_EQUAL (const_int -2128347137 [0x8123])
>     (nil)))
> 
> While for 2. "g (0x7fff, 0x8001)", the generated rtl insns:
> (insn 5 2 6 2 (set (reg:DI 117)
>     (const_int -2147483647 [0x8001])) "negative.c":5:3 681 
> {*movdi_internal64}
>  (nil))
> (insn 7 6 8 2 (set (reg:DI 3 3)
>     (const_int 2147483647 [0x7fff])) "negative.c":5:3 681 
> {*movdi_internal64}
>  (nil))
> 
> The current const_anchor does not generate sth like: "r3 = r117 - 2"
> But I would lean to say it is the limitation of current implementation:
> "0x8001" and "0x7fff" hit different anchors(even these
> two values are 'close' on some aspect.)

The generic issue here is to fix (not "papering over") the signed
overflow, we need to perform the addition in a target machine mode.  We
may always use Pmode (IIRC const_anchor was introduced for optimizing
some constant addresses), but can we do better?

Should we try addition in both DImode and SImode for a 64-bit capable
machine?

Or should we even try more operations than addition (for eg bit
operations like xor or shift)?  Doing so will need to create a new
target hook for const anchoring, this is the "complete rework" I meant.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Richard Biener via Gcc-patches
On Wed, 14 Jun 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> > AFAIU this special instruction is only supposed to prevent
> > code motion (of stack memory accesses?) across this instruction?
> > I'd say a
> >
> >   (may_clobber (mem:BLK (reg:DI 1 1)))
> >
> > might be more to the point?  I've used "may_clobber" which doesn't
> > exist since I'm not sure whether a clobber is considered a kill.
> > The docs say "Represents the storing or possible storing of an 
> > unpredictable..." - what is it? Storing or possible storing?
> 
> I'd also understood it to be either.  As in, it is a may-clobber
> that can be used for must-clobber.  Alternatively: the value stored
> is unpredictable, and can therefore be the same as the current value.
> 
> I think the main difference between:
> 
>   (clobber (mem:BLK …))
> 
> and
> 
>   (set (mem:BLK …) (unspec:BLK …))
> 
> is that the latter must happen for correctness (unless something
> that understands the unspec proves otherwise) whereas a clobber
> can validly be dropped.  So for something like stack_tie, a set
> seems more correct than a clobber.

How can a clobber be validly dropped?  For the case of stack
memory if there's no stack use after it it could be elided
and I suppose the clobber itself can be moved.  But then
the function return is a stack use as well.

Btw, with the same reason the (set (mem:...)) could be removed, no?
Or is the (unspec:) SET_SRC having implicit side-effects that
prevents the removal (so rs6000 could have its stack_tie removed)?

That said, I fail to see how a clobber is special here.

Richard.

> Thanks,
> Richard
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Wed, 14 Jun 2023, Jiufu Guo wrote:
>
>> 
>> Hi,
>> 
>> Segher Boessenkool  writes:
>> 
>> > Hi!
>> >
>> > As I said in a reply to the original patch: not okay.  Sorry.
>> 
>> Thanks a lot for your comments!
>> I'm also thinking about other solutions:
>> 1. "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])"
>>   This is the existing pattern.  It may be read as an action
>>   to clean an unknown-size memory block.
>> 
>> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
>> UNSPEC_TIE".
>>   Current patch is using this one.
>> 
>> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
>> UNSPEC_TIE".
>>This avoids using BLK on unspec, but using DI.
>
> That gives the MEM a size which means we can interpret the (set ..)
> as killing a specific area of memory, enabling DSE of earlier
> stores.

Oh, thanks!
With 'unspec:DI', I'm wondering whether it means this 'set' would
do something special to the memory beyond a pure 'set'. 

BR,
Jeff (Jiufu Guo)

>
> AFAIU this special instruction is only supposed to prevent
> code motion (of stack memory accesses?) across this instruction?
> I'd say a
>
>   (may_clobber (mem:BLK (reg:DI 1 1)))
>
> might be more to the point?  I've used "may_clobber" which doesn't
> exist since I'm not sure whether a clobber is considered a kill.
> The docs say "Represents the storing or possible storing of an 
> unpredictable..." - what is it?  Storing or possible storing?
> I suppose stack_tie should be less strict than the documented
> (clobber (mem:BLK (const_int 0))) (clobber all memory).
>
> ?
>
>> 4. "set (mem/c:BLK (reg/f:DI 1 1) unspec (const_int 0 [0])
>> UNSPEC_TIE"
>>There is still a mode for the unspec.
>> 
>> 
>> >
>> > But some comments on this patch:
>> >
>> > On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote:
>> >> +   && XINT (SET_SRC (set), 1) == UNSPEC_TIE
>> >> +   && XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
>> >
>> > This makes it required that the operand of an UNSPEC_TIE unspec is a
>> > const_int 0.  This should be documented somewhere.  Ideally you would
>> > want no operand at all here, but every unspec has an operand.
>> 
>> Right!  Since checked UNSPEC_TIE arleady, we may not need to check
>> the inner operand. Like " && XINT (SET_SRC (set), 1) == UNSPEC_TIE);".
>> 
>> >
>> >> +  RTVEC_ELT (p, i)
>> >> + = gen_rtx_SET (mem, gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx),
>> >> + UNSPEC_TIE));
>> >
>> > If it is hard to indent your code, your code is trying to do to much.
>> > Just have an extra temporary?
>> >
>> >   rtx un = gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx), 
>> > UNSPEC_TIE);
>> >   RTVEC_ELT (p, i) = gen_rtx_SET (mem, un);
>> >
>> > That is shorter even, and certainly more readable :-)
>> 
>> Yeap, thanks!
>> 
>> >
>> >> @@ -10828,7 +10829,9 @@ (define_expand "restore_stack_block"
>> >>operands[4] = gen_frame_mem (Pmode, operands[1]);
>> >>p = rtvec_alloc (1);
>> >>RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
>> >> -   const0_rtx);
>> >> +   gen_rtx_UNSPEC (BLKmode,
>> >> +   gen_rtvec (1, const0_rtx),
>> >> +   UNSPEC_TIE));
>> >>operands[5] = gen_rtx_PARALLEL (VOIDmode, p);
>> >
>> > I have a hard time to see how this could ever be seen as clearer or more
>> > obvious or anything like that :-(
>> 
>> I was thinking about just invoking gen_stack_tie here.
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>> >
>> >
>> > Segher
>> 


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Sandiford  writes:

> Richard Biener  writes:
>> AFAIU this special instruction is only supposed to prevent
>> code motion (of stack memory accesses?) across this instruction?
>> I'd say a
>>
>>   (may_clobber (mem:BLK (reg:DI 1 1)))
>>
>> might be more to the point?  I've used "may_clobber" which doesn't
>> exist since I'm not sure whether a clobber is considered a kill.
>> The docs say "Represents the storing or possible storing of an 
>> unpredictable..." - what is it? Storing or possible storing?
>
> I'd also understood it to be either.  As in, it is a may-clobber
> that can be used for must-clobber.  Alternatively: the value stored
> is unpredictable, and can therefore be the same as the current value.
>
> I think the main difference between:
>
>   (clobber (mem:BLK …))
>
> and
>
>   (set (mem:BLK …) (unspec:BLK …))
>
> is that the latter must happen for correctness (unless something
> that understands the unspec proves otherwise) whereas a clobber
> can validly be dropped.  So for something like stack_tie, a set
> seems more correct than a clobber.

Thanks a lot for all your helpful comments!

BR,
Jeff (Jiufu Guo)

>
> Thanks,
> Richard


Re: [PATCH] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-14 Thread Jan Beulich via Gcc-patches
On 14.06.2023 10:10, Hongtao Liu wrote:
> On Wed, Jun 14, 2023 at 1:59 PM Jan Beulich via Gcc-patches
>  wrote:
>>
>> There's no reason to constrain this to AVX512VL, as the wider operation
>> is not usable for more narrow operands only when the possible memory
> But this may require more resources (on the AMD znver4 processor a zmm
> instruction will also be split into 2 uops, right?). And on some Intel
> processors (SKX/CLX) there will be a frequency reduction.

I'm afraid I don't follow: Largely the same AVX512 code would be
generated when passing -mavx512vl, so how can power/performance
considerations matter here? All I'm doing here (and in a few more
patches I'm still in the process of testing) is relax when AVX512
insns can actually be used (reducing the copying between registers
and/or the number of insns needed). My understanding on the Intel
side is that it only matters whether AVX512 insns are used, not
what vector length they are. You may be right about znver4, though.

Nevertheless I agree ...

> If it needs to be done, it is better guarded with
> !TARGET_PREFER_AVX256, at least when micro-architecture AVX256_OPTIMAL
> or users explicitly uses -mprefer-vector-width=256, we don't want to
> produce any zmm instruction for surprise.(Although
> -mprefer-vector-width=256 is supposed for auto-vectorizer, but backend
> codegen also use it under such cases, i.e. in *movsf_internal
> alternative 5 use zmm only TARGET_AVX512F && !TARGET_PREFER_AVX256.)

... that respecting such overrides is probably desirable, so I'll
adjust.

Jan

>> source is a non-broadcast one. This way even the scalar copysign3
>> can benefit from the operation being a single-insn one (leaving aside
>> moves which the compiler decides to insert for unclear reasons, and
>> leaving aside the fact that bcst_mem_operand() is too restrictive for
>> broadcast to be embedded right into VPTERNLOG*).
>>
>> Along with this also request value duplication in
>> ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
>> excess space allocation in .rodata.*, filled with zeros which are never
>> read.
>>
>> gcc/
>>
>> * config/i386/i386-expand.cc (ix86_expand_copysign): Request
>> value duplication by ix86_build_signbit_mask() when AVX512F and
>> not HFmode.
>> * config/i386/sse.md (*_vternlog_all): Convert to
>> 2-alternative form. Adjust "mode" attribute. Add "enabled"
>> attribute.
>> (*_vpternlog_1): Relax to just TARGET_AVX512F.
>> (*_vpternlog_2): Likewise.
>> (*_vpternlog_3): Likewise.



Re: Fix typo in 'libgomp.c/target-51.c' (was: [patch] OpenMP: Set default-device-var with OMP_TARGET_OFFLOAD=mandatory)

2023-06-14 Thread Tobias Burnus

On 14.06.23 10:09, Thomas Schwinge wrote:

This reminds me of the (unresolved)https://gcc.gnu.org/PR81886
"Means to determine at runtime foffload targets specified at compile time".


I think there is the problem that we also support offloading in
libraries. Thus, if you compile the main program without offloading and
then link in a shared offloading-providing library (possibly with
dlopen), it comes (too) late. Thus, we either exclude devices which
could be later used – or we have to live with providing devices
(existing in hardware and with libgomp support) for which no executable
code is available.

As long as the number of devices is not a dynamic property, I guess we
can only handle one or the other.


I intend to push the attached "Fix typo in 'libgomp.c/target-51.c'"
after testing.
Let me know if I should also adjust the new 'target { ! offload_device }'
diagnostic "[...] MANDATORY but only the host device is available" to
include a comma before 'but', for consistency with the other existing
diagnostics (cited above)?


I think it makes sense to be consistent. Thus: Yes, please add the commas.

Thanks,

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, 14 Jun 2023, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > AFAIU this special instruction is only supposed to prevent
>> > code motion (of stack memory accesses?) across this instruction?
>> > I'd say a
>> >
>> >   (may_clobber (mem:BLK (reg:DI 1 1)))
>> >
>> > might be more to the point?  I've used "may_clobber" which doesn't
>> > exist since I'm not sure whether a clobber is considered a kill.
>> > The docs say "Represents the storing or possible storing of an 
>> > unpredictable..." - what is it? Storing or possible storing?
>> 
>> I'd also understood it to be either.  As in, it is a may-clobber
>> that can be used for must-clobber.  Alternatively: the value stored
>> is unpredictable, and can therefore be the same as the current value.
>> 
>> I think the main difference between:
>> 
>>   (clobber (mem:BLK …))
>> 
>> and
>> 
>>   (set (mem:BLK …) (unspec:BLK …))
>> 
>> is that the latter must happen for correctness (unless something
>> that understands the unspec proves otherwise) whereas a clobber
>> can validly be dropped.  So for something like stack_tie, a set
>> seems more correct than a clobber.
>
> How can a clobber be validly dropped?  For the case of stack
> memory if there's no stack use after it it could be elided
> and I suppose the clobber itself can be moved.  But then
> the function return is a stack use as well.
>
> Btw, with the same reason the (set (mem:...)) could be removed, no?
> Or is the (unspec:) SET_SRC having implicit side-effects that
> prevents the removal (so rs6000 could have its stack_tie removed)?
>
> That said, I fail to see how a clobber is special here.

Clobbers are for side-effects.  They don't start a def-use chain.
E.g. any use after a full clobber is an uninitialised read rather
than a read of the clobber “result”.

In contrast, a set of memory with an unspec source is in dataflow terms
the same as a set of memory with a specified source.  (some unspecs
actually have well-defined values, it's just that only the target code
knows what those well-defined values are.)

So a set of memory could only be removed if DSE proves that there are no
reads of the set bytes before the next set(s) to the same bytes of memory.
And memory is always live.
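The dataflow rule can be caricatured in a few lines; a toy model only, not GCC's actual DSE:

```cpp
// Toy model of the distinction above: a clobber is a side-effect that
// starts no def-use chain, so dropping it is always valid; a set of
// memory may only be dropped once DSE proves the stored bytes are never
// read before the next store to them.
enum mem_effect { MEM_CLOBBER, MEM_SET };

static bool
can_drop (mem_effect kind, bool proven_dead)
{
  if (kind == MEM_CLOBBER)
    return true;          // may-clobber: removal is always conservative
  return proven_dead;     // set: must stay unless proven dead
}
```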

Thanks,
Richard



[PATCH] [RFC] main loop masked vectorization with --param vect-partial-vector-usage=1

2023-06-14 Thread Richard Biener via Gcc-patches


Currently vect_determine_partial_vectors_and_peeling will decide
to apply fully masking to the main loop despite
--param vect-partial-vector-usage=1 when the currently analyzed
vector mode results in a vectorization factor that's bigger
than the number of scalar iterations.  That's undesirable for
targets where a vector mode can handle both partial vector and
non-partial vector vectorization.  I understand that for AARCH64
we have SVE and NEON but SVE can only do partial vector and
NEON only non-partial vector vectorization, plus the target
chooses to let cost comparison decide the vector mode to use.

For x86 and the upcoming AVX512 partial vector support the
story is different, the target chooses the first (and largest)
vector mode that can successfully be used for vectorization.  But
that means with --param vect-partial-vector-usage=1 we will
always choose AVX512 with partial vectors for the main loop
even if, for example, V4SI would be a perfect fit with full
vectors and no required epilog!

The following tries to find the appropriate condition for
this - I suppose simply refusing to set LOOP_VINFO_USING_PARTIAL_VECTORS_P
on the main loop when --param vect-partial-vector-usage=1 will
hurt AARCH64?  Incidentally looking up the docs for
vect-partial-vector-usage suggests that it's not supposed to
control epilog vectorization but instead
"1 allows partial vector loads and stores if vectorization removes the
need for the code to iterate".  That's probably OK in the end
but if there's a fixed size vector mode that allows the same thing
without using masking that would be better.

I wonder if we should special-case known niter (bounds) somehow
when analyzing the vector modes and override the targets sorting?

Maybe we want a new --param in addition to vect-epilogues-nomask
and vect-partial-vector-usage to say we want masked epilogues?
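A concrete shape of the problem (hypothetical source, not taken from the patch): with a compile-time trip count that a smaller fixed-size mode covers exactly, a full-vector V4SI loop needs neither masking nor an epilogue, yet the current logic would still pick 512-bit partial vectors for the main loop:

```cpp
// With the trip count fixed at 8, two full V4SI iterations cover the
// loop exactly -- no masking, no epilogue required -- so a masked
// 512-bit main loop is the wrong choice here.
void
add8 (int *a, const int *b)
{
  for (int i = 0; i < 8; i++)
    a[i] += b[i];
}
```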

* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
For non-VLA vectorization interpret param_vect_partial_vector_usage == 1
as only applying to epilogues.
---
 gcc/tree-vect-loop.cc | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9be66b8fbc5..9323aa572d4 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2478,7 +2478,15 @@ vect_determine_partial_vectors_and_peeling 
(loop_vec_info loop_vinfo,
  && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
  && !vect_known_niters_smaller_than_vf (loop_vinfo))
LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
-  else
+  /* Avoid using a large fixed size vectorization mode with masking
+for the main loop when we were asked to only use masking for
+the epilog.
+???  Ideally we'd start analysis with a better sized mode,
+the param_vect_partial_vector_usage == 2 case suffers from
+this as well.  But there's a catch-22.  */
+  else if (!(!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+&& param_vect_partial_vector_usage == 1
+&& LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
 }
 
-- 
2.35.3


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Richard Biener via Gcc-patches
On Wed, 14 Jun 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 14 Jun 2023, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > AFAIU this special instruction is only supposed to prevent
> >> > code motion (of stack memory accesses?) across this instruction?
> >> > I'd say a
> >> >
> >> >   (may_clobber (mem:BLK (reg:DI 1 1)))
> >> >
> >> > might be more to the point?  I've used "may_clobber" which doesn't
> >> > exist since I'm not sure whether a clobber is considered a kill.
> >> > The docs say "Represents the storing or possible storing of an 
> >> > unpredictable..." - what is it? Storing or possible storing?
> >> 
> >> I'd also understood it to be either.  As in, it is a may-clobber
> >> that can be used for must-clobber.  Alternatively: the value stored
> >> is unpredictable, and can therefore be the same as the current value.
> >> 
> >> I think the main difference between:
> >> 
> >>   (clobber (mem:BLK …))
> >> 
> >> and
> >> 
> >>   (set (mem:BLK …) (unspec:BLK …))
> >> 
> >> is that the latter must happen for correctness (unless something
> >> that understands the unspec proves otherwise) whereas a clobber
> >> can validly be dropped.  So for something like stack_tie, a set
> >> seems more correct than a clobber.
> >
> > How can a clobber be validly dropped?  For the case of stack
> > memory if there's no stack use after it it could be elided
> > and I suppose the clobber itself can be moved.  But then
> > the function return is a stack use as well.
> >
> > Btw, with the same reason the (set (mem:...)) could be removed, no?
> > Or is the (unspec:) SET_SRC having implicit side-effects that
> > prevents the removal (so rs6000 could have its stack_tie removed)?
> >
> > That said, I fail to see how a clobber is special here.
> 
> Clobbers are for side-effects.  They don't start a def-use chain.
> E.g. any use after a full clobber is an uninitialised read rather
> than a read of the clobber “result”.

I see.  So

(parallel
 (unspec stack_tie)
 (clobber (mem:BLK ...)))

then?  I suppose it needs to be an unspec_volatile?  It feels like
the stack_ties are a delicate hack preventing enough but not too
much optimization ...

> In contrast, a set of memory with an unspec source is in dataflow terms
> the same as a set of memory with a specified source.  (some unspecs
> actually have well-defined values, it's just that only the target code
> knows what those well-defined values are.)
> 
> So a set of memory could only be removed if DSE proves that there are no
> reads of the set bytes before the next set(s) to the same bytes of memory.
> And memory is always live.
> 
> Thanks,
> Richard
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'

2023-06-14 Thread Thomas Schwinge
Hi!

On 2023-06-13T13:11:38+0200, Tobias Burnus  wrote:
> On 13.06.23 12:42, Thomas Schwinge wrote:
>> On 2023-06-05T14:18:48+0200, I wrote:
>>> OK to push the attached
>>> "Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'"?
>>
>> Subject: [PATCH] Add
>>   'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'
>>
>>   gcc/testsuite/
>>   * gfortran.fortran-torture/execute/math.f90: Enhance for optional
>>   OpenACC, OpenMP 'target' usage.
>
> I think it is more readable with a linebreak here and with "OpenACC
> 'serial' and OpenMP ..." instead of "OpenACC, OpenMP".
>
> What I would like to see a hint somewhere in the commit log that the
> libgomp files include the gfortran.fortran-torture file. I don't care
> whether you add the hint before the changelog items as free text – or in
> the bullet above (e.g. "as it is included in libgomp/testsuite") – or
> after "New." in the following bullet list.
>
>>   libgomp/
>>   * testsuite/libgomp.fortran/fortran-torture_execute_math.f90: New.
>>   * testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90:
>>   Likewise.
>
>> ---
>>   .../gfortran.fortran-torture/execute/math.f90 | 23 +--
>>   .../fortran-torture_execute_math.f90  |  4 
>>   .../fortran-torture_execute_math.f90  |  5 
>>   3 files changed, 30 insertions(+), 2 deletions(-)
>>   create mode 100644 
>> libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90
>>   create mode 100644 
>> libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90
>>
>> diff --git a/gcc/testsuite/gfortran.fortran-torture/execute/math.f90 
>> b/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
>> index 17cc78f7a10..e71f669304f 100644
>> --- a/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
>> +++ b/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
>> @@ -1,9 +1,14 @@
>>   ! Program to test mathematical intrinsics
>> +
>> +! See also 
>> 'libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90'; thus 
>> the '!$omp' directives.
>> +! See also 
>> 'libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90'; 
>> thus the '!$acc' directives.
>
> Likewise here: it is not completely obvious that this file is 'include'd
> by the other testcases.
>
> Maybe add a line "! This file is also included in:" and remove the "See
> also" or some creative variant of it.
>
> Minor remark: The OpenMP part is OK, but strict reading of the spec
> requires an "omp declare target' if a subroutine is in a different
> compilation unit. And according the glossary, that's the case here. In
> practice, it also works without as it is in the same translation unit.
> (compilation unit = for C/C++: translation unit, for Fortran:
> subprogram). I think the HPE/Cray compiler will complain, but maybe only
> when used with modules and not with subroutine subprograms. (As many
> compilers write a .mod file for modules, a late change of attributes can
> be more problematic.)
>
> Otherwise LGTM.

Thanks for the review.  I've now pushed
commit e76af2162c7b768ef0a913d485c51a80b08a1020
"Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'", see
attached.

> PS: I assume that you have check it with both with an in-build-tree and
> an in-install-tree testsuite run.

I happened to have (..., but don't think it'd make a relevant difference
here?)


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From e76af2162c7b768ef0a913d485c51a80b08a1020 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 2 Jun 2023 23:11:00 +0200
Subject: [PATCH] Add
 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'

..., via 'include'ing the existing 'gfortran.fortran-torture/execute/math.f90',
which therefore is enhanced for optional OpenACC 'serial', OpenMP 'target'
usage.

	gcc/testsuite/
	* gfortran.fortran-torture/execute/math.f90: Enhance for optional
	OpenACC 'serial', OpenMP 'target' usage.
	libgomp/
	* testsuite/libgomp.fortran/fortran-torture_execute_math.f90: New.
	* testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90:
	Likewise.
---
 .../gfortran.fortran-torture/execute/math.f90 | 24 +--
 .../fortran-torture_execute_math.f90  |  4 
 .../fortran-torture_execute_math.f90  |  5 
 3 files changed, 31 insertions(+), 2 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90

diff --git a/gcc/testsuite/gfortran.fortran-torture/execute/math.f90 b/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
index 17cc78f7a105..6c97eba3f8ff 100644
--- a/gcc/testsuite/gfortran.fortran-torture/execute/math.f

Re: [committed] OpenMP: Cleanups related to the 'present' modifier

2023-06-14 Thread Tobias Burnus

On 14.06.23 10:42, Thomas Schwinge wrote:

Couldn't/shouldn't we now get rid of this 'GOMP_MAP_FLAG_PRESENT'...

...

  #define GOMP_MAP_PRESENT_P(X) \
-  (((X) & GOMP_MAP_FLAG_PRESENT) == GOMP_MAP_FLAG_PRESENT)
+  (((X) & GOMP_MAP_FLAG_PRESENT) == GOMP_MAP_FLAG_PRESENT \
+   || (X) == GOMP_MAP_FORCE_PRESENT)

..., and this 'GOMP_MAP_PRESENT_P' should look for
'GOMP_MAP_FLAG_ALWAYS_PRESENT' instead of 'GOMP_MAP_FLAG_PRESENT' (plus
'GOMP_MAP_FORCE_PRESENT')?

Instead of the current effective 'GOMP_MAP_FLAG_ALWAYS_PRESENT':

 GOMP_MAP_FLAG_SPECIAL_0
 | GOMP_MAP_FLAG_SPECIAL_2
 | GOMP_MAP_FLAG_SPECIAL_5

..., it could/should use a simpler flag combination?  (My idea is that
this would later make usage of flag bits for other purposes easier -- but I've
not verified that in depth.)


I concur that it would be useful to save that space. We do not fully
rule out other combinations as we can always move to check single values
instead of comparing bit patterns, but I concur, reserving flags would
be useful.

Can you propose some bit pattern to use? Attached are the currently used
ones (binary, hex, and decimal).

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
100000000  0x100  256  GOMP_MAP_LAST
000000000  0x000    0  GOMP_MAP_ALLOC
000000001  0x001    1  GOMP_MAP_TO
000000010  0x002    2  GOMP_MAP_FROM
000000011  0x003    3  GOMP_MAP_TOFROM
000000100  0x004    4  GOMP_MAP_POINTER
000000101  0x005    5  GOMP_MAP_TO_PSET
000000110  0x006    6  GOMP_MAP_FORCE_PRESENT
000000111  0x007    7  GOMP_MAP_DELETE
000001000  0x008    8  GOMP_MAP_FORCE_DEVICEPTR
000001001  0x009    9  GOMP_MAP_DEVICE_RESIDENT
000001010  0x00a   10  GOMP_MAP_LINK
000001011  0x00b   11  GOMP_MAP_IF_PRESENT
000001100  0x00c   12  GOMP_MAP_FIRSTPRIVATE
000001101  0x00d   13  GOMP_MAP_FIRSTPRIVATE_INT
000001110  0x00e   14  GOMP_MAP_USE_DEVICE_PTR
000001111  0x00f   15  GOMP_MAP_ZERO_LEN_ARRAY_SECTION
010000000  0x080  128  GOMP_MAP_FORCE_ALLOC
010000001  0x081  129  GOMP_MAP_FORCE_TO
010000010  0x082  130  GOMP_MAP_FORCE_FROM
010000011  0x083  131  GOMP_MAP_FORCE_TOFROM
000010000  0x010   16  GOMP_MAP_USE_DEVICE_PTR_IF_PRESENT
000010001  0x011   17  GOMP_MAP_ALWAYS_TO
000010010  0x012   18  GOMP_MAP_ALWAYS_FROM
000010011  0x013   19  GOMP_MAP_ALWAYS_TOFROM
010010101  0x095  149  GOMP_MAP_ALWAYS_PRESENT_TO
010010110  0x096  150  GOMP_MAP_ALWAYS_PRESENT_FROM
010010111  0x097  151  GOMP_MAP_ALWAYS_PRESENT_TOFROM
000011100  0x01c   28  GOMP_MAP_STRUCT
000011101  0x01d   29  GOMP_MAP_ALWAYS_POINTER
000011110  0x01e   30  GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION
000011111  0x01f   31  GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION
000010111  0x017   23  GOMP_MAP_RELEASE
001010000  0x050   80  GOMP_MAP_ATTACH
001010001  0x051   81  GOMP_MAP_DETACH
011010001  0x0d1  209  GOMP_MAP_FORCE_DETACH
001010010  0x052   82  GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION
100000001  0x101  257  GOMP_MAP_FIRSTPRIVATE_POINTER
100000010  0x102  258  GOMP_MAP_FIRSTPRIVATE_REFERENCE
100000011  0x103  259  GOMP_MAP_ATTACH_DETACH
100000100  0x104  260  GOMP_MAP_PRESENT_ALLOC
100000101  0x105  261  GOMP_MAP_PRESENT_TO
100000110  0x106  262  GOMP_MAP_PRESENT_FROM
100000111  0x107  263  GOMP_MAP_PRESENT_TOFROM
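As a cross-check on the table, the "always present" variants are just the base mapping plus the three SPECIAL bits named in the discussion. A small sketch (values copied from the table and the quoted flag combination, not from gomp-constants.h):

```cpp
// Bit values taken from the table above; flag names follow the
// discussion, not necessarily gomp-constants.h.
enum
{
  MAP_TO = 0x001,
  MAP_FROM = 0x002,
  MAP_TOFROM = 0x003,
  MAP_FORCE_PRESENT = 0x006,
  MAP_ALWAYS_PRESENT_TO = 0x095,
  MAP_ALWAYS_PRESENT_FROM = 0x096,
  MAP_ALWAYS_PRESENT_TOFROM = 0x097,
  FLAG_SPECIAL_0 = 0x004,
  FLAG_SPECIAL_2 = 0x010,
  FLAG_SPECIAL_5 = 0x080,
  // The effective GOMP_MAP_FLAG_ALWAYS_PRESENT discussed above.
  FLAG_ALWAYS_PRESENT = FLAG_SPECIAL_0 | FLAG_SPECIAL_2 | FLAG_SPECIAL_5,
};

// Analogue of GOMP_MAP_PRESENT_P's bit test: it misses
// GOMP_MAP_FORCE_PRESENT, hence the extra equality check in the macro.
static bool always_present_p (int x)
{
  return (x & FLAG_ALWAYS_PRESENT) == FLAG_ALWAYS_PRESENT;
}
```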


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, 14 Jun 2023, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Wed, 14 Jun 2023, Richard Sandiford wrote:
>> >
>> >> Richard Biener  writes:
>> >> > AFAIU this special instruction is only supposed to prevent
>> >> > code motion (of stack memory accesses?) across this instruction?
>> >> > I'd say a
>> >> >
>> >> >   (may_clobber (mem:BLK (reg:DI 1 1)))
>> >> >
>> >> > might be more to the point?  I've used "may_clobber" which doesn't
>> >> > exist since I'm not sure whether a clobber is considered a kill.
>> >> > The docs say "Represents the storing or possible storing of an 
>> >> > unpredictable..." - what is it? Storing or possible storing?
>> >> 
>> >> I'd also understood it to be either.  As in, it is a may-clobber
>> >> that can be used for must-clobber.  Alternatively: the value stored
>> >> is unpredictable, and can therefore be the same as the current value.
>> >> 
>> >> I think the main difference between:
>> >> 
>> >>   (clobber (mem:BLK …))
>> >> 
>> >> and
>> >> 
>> >>   (set (mem:BLK …) (unspec:BLK …))
>> >> 
>> >> is that the latter must happen for correctness (unless something
>> >> that understands the unspec proves otherwise) whereas a clobber
>> >> can validly be dropped.  So for something like stack_tie, a set
>> >> seems more correct than a clobber.
>> >
>> > How can a clobber be validly dropped?  For the case of stack
>> > memory if there's no stack use after it it could be elided
>> > and I suppose the clobber itself can be moved.  But then
>> > the function return is a stack use as well.
>> >
>> > Btw, with the same reason the (set (mem:...)) could be removed, no?
>> > Or is the (unspec:) SET_SRC having implicit side-effects that
>> > prevents the removal (so rs6000 could have its stack_tie removed)?
>> >
>> > That said, I fail to see how a clobber is special here.
>> 
>> Clobbers are for side-effects.  They don't start a def-use chain.
>> E.g. any use after a full clobber is an uninitialised read rather
>> than a read of the clobber “result”.
>
> I see.  So
>
> (parallel
>  (unspec stack_tie)
>  (clobber (mem:BLK ...)))
>
> then?  I suppose it needs to be an unspec_volatile?

Yeah, it would need to be unspec_volatile, at which point it becomes
quite a big hammer.

> It feels like the stack_ties are a delicate hack preventing enough but
> not too much optimization ...

Yup.  I think the only non-hacky way would be to have dedicated RTL for
memory becoming valid and becoming invalid.  Anything else is a compromise.

But TBH, I still think the (set (mem:BLK …) (unspec:BLK …)) strikes
the right balance, unless there's a specific argument otherwise.
The effect on memory isn't a side effect (contrary to what clobber
implies) but instead is the main purpose of allocating and deallocating
stack memory.

Thanks,
Richard


Re: [ping] driver: Forward '-lgfortran', '-lm' to offloading compilation

2023-06-14 Thread Tobias Burnus

On 13.06.23 12:44, Thomas Schwinge wrote:

On 2023-06-05T14:25:18+0200, I wrote:

OK to push the attached
"driver: Forward '-lgfortran', '-lm' to offloading compilation"?
(We didn't have a PR open for that, or did we?)


(It was approved by Joseph and pushed by Thomas as r14-1807-g4bcb46b3ade179 )

I wonder whether we should do the following for the example:

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2720 +2720 @@ Typical command lines are
--foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-O3
+-foffload-options=amdgcn-amdhsa=-march=gfx906

To my knowledge the merge_flto_options is also run for code only doing
offloading - such that a host-side -O2 still ends up as -O2 for the offloading
compiler.

Thus, adding -foffload-options=-O3 encourages bad practice, at least kind of.

Thoughts?

BTW: I think the changed linking behavior should be document in the release
notes and in the wiki, i.e. https://gcc.gnu.org/gcc-14/changes.html
and https://gcc.gnu.org/wiki/Offloading (anywhere else?)

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH] RTL: Merge rtx_equal_p and hash_rtx functions with their callback variants

2023-06-14 Thread Uros Bizjak via Gcc-patches
Use a default argument when the callback function is not required, merging
the rtx_equal_p and hash_rtx functions with their callback variants.

gcc/ChangeLog:

* cse.cc (hash_rtx_cb): Rename to hash_rtx.
(hash_rtx): Remove.
* early-remat.cc (remat_candidate_hasher::equal): Update
to call rtx_equal_p with rtx_equal_p_callback_function argument.
* rtl.cc (rtx_equal_p_cb): Rename to rtx_equal_p.
(rtx_equal_p): Remove.
* rtl.h (rtx_equal_p): Add rtx_equal_p_callback_function
argument with NULL default value.
(rtx_equal_p_cb): Remove function declaration.
(hash_rtx_cb): Ditto.
(hash_rtx): Add hash_rtx_callback_function argument
with NULL default value.
* sel-sched-ir.cc (free_nop_pool): Update function comment.
(skip_unspecs_callback): Ditto.
(vinsn_init): Update to call hash_rtx with
hash_rtx_callback_function argument.
(vinsn_equal_p): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for master?

Uros.
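The shape of the refactoring, reduced to a sketch (names here are illustrative, not GCC's): the _cb variant becomes the sole definition, and a defaulted null callback keeps existing callers source-compatible:

```cpp
// Merge pattern: the former toy_hash_cb survives as the only
// definition, renamed to toy_hash, with the callback parameter
// defaulted to nullptr so callback-free callers compile unchanged.
typedef int (*adjust_fn) (int);

static unsigned
toy_hash (int x, adjust_fn cb = nullptr)
{
  if (cb)                       // callback may rewrite the value first
    x = cb (x);
  return static_cast<unsigned> (x) * 2654435761u;  // multiplicative hash
}

static int negate (int x) { return -x; }
```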
diff --git a/gcc/cse.cc b/gcc/cse.cc
index 2bb63ac4105..6f2e0d43185 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -2208,13 +2208,26 @@ hash_rtx_string (const char *ps)
   return hash;
 }
 
-/* Same as hash_rtx, but call CB on each rtx if it is not NULL.
+/* Hash an rtx.  We are careful to make sure the value is never negative.
+   Equivalent registers hash identically.
+   MODE is used in hashing for CONST_INTs only;
+   otherwise the mode of X is used.
+
+   Store 1 in DO_NOT_RECORD_P if any subexpression is volatile.
+
+   If HASH_ARG_IN_MEMORY_P is not NULL, store 1 in it if X contains
+   a MEM rtx which does not have the MEM_READONLY_P flag set.
+
+   Note that cse_insn knows that the hash code of a MEM expression
+   is just (int) MEM plus the hash code of the address.
+
+   Call CB on each rtx if CB is not NULL.
When the callback returns true, we continue with the new rtx.  */
 
 unsigned
-hash_rtx_cb (const_rtx x, machine_mode mode,
- int *do_not_record_p, int *hash_arg_in_memory_p,
- bool have_reg_qty, hash_rtx_callback_function cb)
+hash_rtx (const_rtx x, machine_mode mode,
+ int *do_not_record_p, int *hash_arg_in_memory_p,
+ bool have_reg_qty, hash_rtx_callback_function cb)
 {
   int i, j;
   unsigned hash = 0;
@@ -2234,8 +2247,8 @@ hash_rtx_cb (const_rtx x, machine_mode mode,
   if (cb != NULL
   && ((*cb) (x, mode, &newx, &newmode)))
 {
-  hash += hash_rtx_cb (newx, newmode, do_not_record_p,
-   hash_arg_in_memory_p, have_reg_qty, cb);
+  hash += hash_rtx (newx, newmode, do_not_record_p,
+   hash_arg_in_memory_p, have_reg_qty, cb);
   return hash;
 }
 
@@ -2355,9 +2368,9 @@ hash_rtx_cb (const_rtx x, machine_mode mode,
for (i = 0; i < units; ++i)
  {
elt = CONST_VECTOR_ENCODED_ELT (x, i);
-   hash += hash_rtx_cb (elt, GET_MODE (elt),
- do_not_record_p, hash_arg_in_memory_p,
- have_reg_qty, cb);
+   hash += hash_rtx (elt, GET_MODE (elt),
+ do_not_record_p, hash_arg_in_memory_p,
+ have_reg_qty, cb);
  }
 
return hash;
@@ -2463,10 +2476,10 @@ hash_rtx_cb (const_rtx x, machine_mode mode,
{
  for (i = 1; i < ASM_OPERANDS_INPUT_LENGTH (x); i++)
{
- hash += (hash_rtx_cb (ASM_OPERANDS_INPUT (x, i),
-GET_MODE (ASM_OPERANDS_INPUT (x, i)),
-do_not_record_p, hash_arg_in_memory_p,
-have_reg_qty, cb)
+ hash += (hash_rtx (ASM_OPERANDS_INPUT (x, i),
+GET_MODE (ASM_OPERANDS_INPUT (x, i)),
+do_not_record_p, hash_arg_in_memory_p,
+have_reg_qty, cb)
   + hash_rtx_string
(ASM_OPERANDS_INPUT_CONSTRAINT (x, i)));
}
@@ -2502,16 +2515,16 @@ hash_rtx_cb (const_rtx x, machine_mode mode,
  goto repeat;
}
 
- hash += hash_rtx_cb (XEXP (x, i), VOIDmode, do_not_record_p,
-   hash_arg_in_memory_p,
-   have_reg_qty, cb);
+ hash += hash_rtx (XEXP (x, i), VOIDmode, do_not_record_p,
+   hash_arg_in_memory_p,
+   have_reg_qty, cb);
  break;
 
case 'E':
  for (j = 0; j < XVECLEN (x, i); j++)
-   hash += hash_rtx_cb (XVECEXP (x, i, j), VOIDmode, do_not_record_p,
- hash_arg_in_memory_p,
- have_reg_qty, cb);
+   hash += hash_rtx (XVECEXP (x, i, j), VOIDmode, do_not_record_p,
+ hash_arg_in_memory_p,
+ have_reg_qt

[Patch] libgomp.texi: Document allocator + affinity env vars

2023-06-14 Thread Tobias Burnus

Comments on the wording and/or the content?

I did notice that we missed to document three OMP_* environment
variables, hence, I added them.

(For OMP_ALLOCATOR, I expect an update once the 5.1 extensions have been
implemented.)

(Some cross references could be added if we had/once we have documented
the respective omp_(get,set)_... I think we lack about half of the API
functions.)

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
libgomp.texi: Document allocator + affinity env vars

libgomp/ChangeLog:

	* libgomp.texi (OMP_ALLOCATOR, OMP_AFFINITY_FORMAT,
	OMP_DISPLAY_AFFINITY): New.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 21d3582a665..70b090824bb 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1937,7 +1937,10 @@ section 4 of the OpenMP specification in version 4.5, while those
 beginning with @env{GOMP_} are GNU extensions.
 
 @menu
+* OMP_ALLOCATOR::   Set the default allocator
+* OMP_AFFINITY_FORMAT:: Set the format string used for affinity display
 * OMP_CANCELLATION::Set whether cancellation is activated
+* OMP_DISPLAY_AFFINITY::Display thread affinity information
 * OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
 * OMP_DEFAULT_DEVICE::  Set the device used in target regions
 * OMP_DYNAMIC:: Dynamic adjustment of threads
@@ -1962,6 +1965,87 @@ beginning with @env{GOMP_} are GNU extensions.
 @end menu
 
 
+@node OMP_ALLOCATOR
+@section @env{OMP_ALLOCATOR} -- Set the default allocator
+@cindex Environment Variable
+@table @asis
+@item @emph{Description}:
+Sets the default allocator that is used when no allocator has been specified
+in the @code{allocate} or @code{allocator} clause or when
+@code{omp_null_allocator} is used as allocator when invoking an OpenMP memory
+routine. The value should be one of the predefined allocators.
+If unset, @code{omp_default_mem_alloc} is used.
+
+@c @item @emph{See also}:
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.21
+@end table
+
+
+
+@node OMP_AFFINITY_FORMAT
+@section @env{OMP_AFFINITY_FORMAT} -- Set the format string used for affinity display
+@cindex Environment Variable
+@table @asis
+@item @emph{Description}:
+Sets the format string used when displaying OpenMP thread affinity information.
+Special values are output using @code{%} followed by an optional size
+specification and then either the single-character field type or its long
+name enclosed in curly braces; using @code{%%} will display a literal percent.
+The size specification consists of an optional @code{0.} or @code{.} followed
+by a positive integer, specifying the minimal width of the output.  With
+@code{0.} and numerical values, the output is padded with zeros on the left;
+with @code{.}, the output is padded by spaces on the left; otherwise, the
+output is padded by spaces on the right.  If unset, the value is
+``@code{level %L thread %i affinity %A}''.
+
+Supported field types are:
+
+@multitable @columnfractions .10 .25 .60
+@item t @tab team_num @tab value returned by @code{omp_get_team_num}
+@item T @tab num_teams @tab value returned by @code{omp_get_num_teams}
+@item L @tab nesting_level @tab value returned by @code{omp_get_level}
+@item n @tab thread_num @tab value returned by @code{omp_get_thread_num}
+@item N @tab num_threads @tab value returned by @code{omp_get_num_threads}
+@item a @tab ancestor_tnum
+  @tab value returned by
+   @code{omp_get_ancestor_thread_num(omp_get_level()-1)}
+@item H @tab host @tab name of the host that executes the thread
+@item P @tab process_id @tab process identifier
+@item i @tab native_thread_id @tab native thread identifier
+@item A @tab thread_affinity
+  @tab comma separated list of integer values or ranges, representing the
+   processors on which a process might execute, subject to affinity
+   mechanisms
+@end multitable
+
+For instance, after setting
+
+@smallexample
+OMP_AFFINITY_FORMAT="%0.2a!%n!%.4L!%N;%.2t;%0.2T;%@{team_num@};%@{num_teams@};%A"
+@end smallexample
+
+with either @code{OMP_DISPLAY_AFFINITY} being set or when calling
+@code{omp_display_affinity} with @code{NULL} or an empty string, the program
+might display the following:
+
+@smallexample
+00!0!   1!4; 0;01;0;1;0-11
+00!3!   1!4; 0;01;0;1;0-11
+00!2!   1!4; 0;01;0;1;0-11
+00!1!   1!4; 0;01;0;1;0-11
+@end smallexample
+
+@item @emph{See also}:
+@ref{OMP_DISPLAY_AFFINITY}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.14
+@end table
+
+
+
 @node OMP_CANCELLATION
 @section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
 @cindex Environment Variable
@@ -1979,6 +2063,26 @@ if unset, cancellation is disabled and the @co

[PATCH] RISC-V: Fix PR 110119

2023-06-14 Thread Lehua Ding
Hi,

This patch fixes PR 110119.

The reason for this bug is that when the vector register length is fixed
(with the `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1_t can be passed
through two scalar registers, but when GCC calls FUNCTION_VALUE (which calls
riscv_get_arg_info internally) it returns NULL_RTX.  These two hooks are
inconsistent.  The current treatment is to pass all vector arguments and
return values through the function stack; a new calling convention for vector
registers will be added in the future.

Best,
Lehua

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for 
vector mode.
(riscv_pass_by_reference): Return true for vector mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/p110119-1.c: New test.
* gcc.target/riscv/rvv/base/p110119-2.c: New test.

---
 gcc/config/riscv/riscv.cc | 19 -
 .../gcc.target/riscv/rvv/base/p110119-1.c | 27 +++
 .../gcc.target/riscv/rvv/base/p110119-2.c | 27 +++
 3 files changed, 67 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..be868c7b6127 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
 
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
-  if (riscv_v_ext_mode_p (mode))
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (mode) || riscv_v_ext_tuple_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const 
function_arg_info &arg)
return false;
 }
 
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
new file mode 100644
index ..3583e06f1a8d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" 
} */
+/* { dg-skip-if "test rvv intrinsic" { *-*-* } { "*" } { "-march=rv*v*" } } */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t * a, int8_t *out)
+{
+  vint32m1_t v = *(vint32m1_t*)a;
+  return v;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
new file mode 100644
index ..1d12a610b677
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x 
--param=riscv-autovec-preference=fixed-vlmax" } */
+/* { dg-skip-if "test rvv intrinsic" { *-*-* } { "*" } { "-march=rv*v*" } } */
+
+#include 
+#include "riscv_vector.h"
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo1 (int32_t *in, int vl)
+{
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
+
+__attribute__ ((noipa)) void
+foo2 (vint32m1x3_t a, int32_t *out, int vl)
+{
+  __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+}
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo3 (vint32m1x3_t a, int32_t *out, int32_t *in, int vl)
+{
+  __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
-- 
2.36.3



libgomp testsuite: Don't handle 'lang_link_flags'

2023-06-14 Thread Thomas Schwinge
Hi!

Any objections to pushing the attached
"libgomp testsuite: Don't handle 'lang_link_flags'"?


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From b3d33dc858fffeeed83735e55d86963e2297a78d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 5 Jun 2023 11:45:41 +0200
Subject: [PATCH] libgomp testsuite: Don't handle 'lang_link_flags'

..., which as of recent commit 4bcb46b3ade1796c5a57b294f5cca25f00671cac
"driver: Forward '-lgfortran', '-lm' to offloading compilation" is unused,
and we don't anticipate any new usage.

	libgomp/
	* testsuite/lib/libgomp.exp (libgomp_target_compile): Don't handle
	'lang_link_flags'.
---
 libgomp/testsuite/lib/libgomp.exp | 4 
 1 file changed, 4 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 1c4af9a8a2c..fb2bce38e28 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -277,10 +277,6 @@ proc libgomp_target_compile { source dest type options } {
 	lappend options "ldflags=-L${blddir}/${lang_library_path}"
 	}
 }
-global lang_link_flags
-if { [info exists lang_link_flags] } {
-	lappend options "ldflags=${lang_link_flags}"
-}
 
 if { [target_info needs_status_wrapper] != "" && [info exists gluefile] } {
 	lappend options "libs=${gluefile}"
-- 
2.34.1



Re: [PATCH] RISC-V: Fix PR 110119

2023-06-14 Thread juzhe.zh...@rivai.ai
Add PR target/pr110119



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-06-14 18:34
To: gcc-patches; juzhe.zhong
Subject: [PATCH] RISC-V: Fix PR 110119
Hi,
 
This patch fixes PR 110119.
 
The reason for this bug is that when the vector register length is fixed
(with the `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1_t can be passed
through two scalar registers, but when GCC calls FUNCTION_VALUE (which calls
riscv_get_arg_info internally) it returns NULL_RTX.  These two hooks are
inconsistent.  The current treatment is to pass all vector arguments and
return values through the function stack; a new calling convention for vector
registers will be added in the future.
 
Best,
Lehua
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for 
vector mode.
(riscv_pass_by_reference): Return true for vector mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/p110119-1.c: New test.
* gcc.target/riscv/rvv/base/p110119-2.c: New test.
 
---
gcc/config/riscv/riscv.cc | 19 -
.../gcc.target/riscv/rvv/base/p110119-1.c | 27 +++
.../gcc.target/riscv/rvv/base/p110119-2.c | 27 +++
3 files changed, 67 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..be868c7b6127 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
-  if (riscv_v_ext_mode_p (mode))
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (mode) || riscv_v_ext_tuple_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const 
function_arg_info &arg)
return false;
 }
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
new file mode 100644
index ..3583e06f1a8d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" 
} */
+/* { dg-skip-if "test rvv intrinsic" { *-*-* } { "*" } { "-march=rv*v*" } } */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t * a, int8_t *out)
+{
+  vint32m1_t v = *(vint32m1_t*)a;
+  return v;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
new file mode 100644
index ..1d12a610b677
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x 
--param=riscv-autovec-preference=fixed-vlmax" } */
+/* { dg-skip-if "test rvv intrinsic" { *-*-* } { "*" } { "-march=rv*v*" } } */
+
+#include 
+#include "riscv_vector.h"
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo1 (int32_t *in, int vl)
+{
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
+
+__attribute__ ((noipa)) void
+foo2 (vint32m1x3_t a, int32_t *out, int vl)
+{
+  __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+}
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo3 (vint32m1x3_t a, int32_t *out, int32_t *in, int vl)
+{
+  __riscv_

Align a 'OMP_TARGET_OFFLOAD=mandatory' diagnostic with others (was: Fix typo in 'libgomp.c/target-51.c' (was: [patch] OpenMP: Set default-device-var with OMP_TARGET_OFFLOAD=mandatory))

2023-06-14 Thread Thomas Schwinge
Hi!

On 2023-06-14T11:42:22+0200, Tobias Burnus  wrote:
> On 14.06.23 10:09, Thomas Schwinge wrote:
>> Let me know if I should also adjust the new 'target { ! offload_device }'
>> diagnostic "[...] MANDATORY but only the host device is available" to
>> include a comma before 'but', for consistency with the other existing
>> diagnostics (cited above)?
>
> I think it makes sense to be consistent. Thus: Yes, please add the commas.

I've pushed commit f2ef1dabbc18eb6efc0eb47bbb0eebbc6d72e09e
"Align a 'OMP_TARGET_OFFLOAD=mandatory' diagnostic with others", see
attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From f2ef1dabbc18eb6efc0eb47bbb0eebbc6d72e09e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 14 Jun 2023 12:44:05 +0200
Subject: [PATCH] Align a 'OMP_TARGET_OFFLOAD=mandatory' diagnostic with others

On 2023-06-14T11:42:22+0200, Tobias Burnus  wrote:
> On 14.06.23 10:09, Thomas Schwinge wrote:
>> Let me know if I should also adjust the new 'target { ! offload_device }'
>> diagnostic "[...] MANDATORY but only the host device is available" to
>> include a comma before 'but', for consistency with the other existing
>> diagnostics (cited above)?
>
> I think it makes sense to be consistent. Thus: Yes, please add the commas.

Fix-up for recent commit 18c8b56c7d67a9e37acf28822587786f0fc0efbc
"OpenMP: Set default-device-var with OMP_TARGET_OFFLOAD=mandatory".

	libgomp/
	* target.c (resolve_device): Align a
	'OMP_TARGET_OFFLOAD=mandatory' diagnostic with others.
	* testsuite/libgomp.c/target-51.c: Adjust.
---
 libgomp/target.c| 4 ++--
 libgomp/testsuite/libgomp.c/target-51.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index f1020fad601b..e39ef8f6e82a 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -152,8 +152,8 @@ resolve_device (int device_id, bool remapped)
 	return NULL;
   if (gomp_target_offload_var == GOMP_TARGET_OFFLOAD_MANDATORY
 	  && gomp_get_num_devices () == 0)
-	gomp_fatal ("OMP_TARGET_OFFLOAD is set to MANDATORY but only the host "
-		"device is available");
+	gomp_fatal ("OMP_TARGET_OFFLOAD is set to MANDATORY, "
+		"but only the host device is available");
   else if (device_id == omp_invalid_device)
 	gomp_fatal ("omp_invalid_device encountered");
   else if (gomp_target_offload_var == GOMP_TARGET_OFFLOAD_MANDATORY)
diff --git a/libgomp/testsuite/libgomp.c/target-51.c b/libgomp/testsuite/libgomp.c/target-51.c
index cf9e690263e9..bbe9ade6e24b 100644
--- a/libgomp/testsuite/libgomp.c/target-51.c
+++ b/libgomp/testsuite/libgomp.c/target-51.c
@@ -9,7 +9,7 @@
 
 /* See comment in target-50.c/target-50.c for why the output differs.  */
 
-/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY but only the host device is available.*" { target { ! offload_device } } } */
+/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is available.*" { target { ! offload_device } } } */
 /* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not found.*" { target offload_device } } */
 
 int
-- 
2.39.2



[PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Lehua Ding
Hi,

The reason for this bug is that when the vector register length is fixed
(with the `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1_t can be passed
through two scalar registers, but when GCC calls FUNCTION_VALUE (which calls
riscv_get_arg_info internally) it returns NULL_RTX.  These two hooks are
inconsistent.  The current treatment is to pass all vector arguments and
return values through the function stack; a new calling convention for vector
registers will be added in the future.

Best,
Lehua

  PR target/110119

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for 
vector mode.
(riscv_pass_by_reference): Return true for vector mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/p110119-1.c: New test.
* gcc.target/riscv/rvv/base/p110119-2.c: New test.

---
 gcc/config/riscv/riscv.cc | 19 +-
 .../gcc.target/riscv/rvv/base/p110119-1.c | 26 +++
 .../gcc.target/riscv/rvv/base/p110119-2.c | 26 +++
 3 files changed, 65 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..be868c7b6127 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
 
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
-  if (riscv_v_ext_mode_p (mode))
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (mode) || riscv_v_ext_tuple_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const 
function_arg_info &arg)
return false;
 }
 
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
new file mode 100644
index ..0edbb0626299
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" 
} */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t * a, int8_t *out)
+{
+  vint32m1_t v = *(vint32m1_t*)a;
+  return v;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
new file mode 100644
index ..b233ff1e9040
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x 
--param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include 
+#include "riscv_vector.h"
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo1 (int32_t *in, int vl)
+{
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
+
+__attribute__ ((noipa)) void
+foo2 (vint32m1x3_t a, int32_t *out, int vl)
+{
+  __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+}
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo3 (vint32m1x3_t a, int32_t *out, int32_t *in, int vl)
+{
+  __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
-- 
2.36.3



Re: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread juzhe.zh...@rivai.ai
Thanks for fixing this.

This patch lets RVV types (both vector and tuple) be returned in memory by 
default when there is no vector ABI support.
It makes sense to me.

CC'ing more RISC-V folks for comments.

Thanks.


juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-06-14 19:03
To: gcc-patches; juzhe.zhong
Subject: [PATCH] RISC-V: Ensure vector args and return use function stack to 
pass [PR110119]
Hi,
 
The reason for this bug is that when the vector register length is fixed
(with the `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1_t can be passed
through two scalar registers, but when GCC calls FUNCTION_VALUE (which calls
riscv_get_arg_info internally) it returns NULL_RTX.  These two hooks are
inconsistent.  The current treatment is to pass all vector arguments and
return values through the function stack; a new calling convention for vector
registers will be added in the future.
 
Best,
Lehua
 
  PR target/110119
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for 
vector mode.
(riscv_pass_by_reference): Return true for vector mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/p110119-1.c: New test.
* gcc.target/riscv/rvv/base/p110119-2.c: New test.
 
---
gcc/config/riscv/riscv.cc | 19 +-
.../gcc.target/riscv/rvv/base/p110119-1.c | 26 +++
.../gcc.target/riscv/rvv/base/p110119-2.c | 26 +++
3 files changed, 65 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..be868c7b6127 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
-  if (riscv_v_ext_mode_p (mode))
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (mode) || riscv_v_ext_tuple_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const 
function_arg_info &arg)
return false;
 }
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
new file mode 100644
index ..0edbb0626299
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" 
} */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t * a, int8_t *out)
+{
+  vint32m1_t v = *(vint32m1_t*)a;
+  return v;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
new file mode 100644
index ..b233ff1e9040
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x 
--param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include 
+#include "riscv_vector.h"
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo1 (int32_t *in, int vl)
+{
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
+
+__attribute__ ((noipa)) void
+foo2 (vint32m1x3_t a, int32_t *out, int vl)
+{
+  __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+}
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo3 (v

Re: [PATCH] RISC-V: Fix PR 110119

2023-06-14 Thread Lehua Ding
Resubmitted a new, more standardized patch (below is the new patch link), 
thanks.


https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621683.html


  

Re: [PATCH 1/2] Missed opportunity to use [SU]ABD

2023-06-14 Thread Richard Sandiford via Gcc-patches
Oluwatamilore Adebayo  writes:
> From: oluade01 
>
> This adds a recognition pattern for the non-widening
> absolute difference (ABD).
>
> gcc/ChangeLog:
>
>   * doc/md.texi (sabd, uabd): Document them.
>   * internal-fn.def (ABD): Use new optab.
>   * optabs.def (sabd_optab, uabd_optab): New optabs,
>   * tree-vect-patterns.cc (vect_recog_absolute_difference):
>   Recognize the following idiom abs (a - b).
>   (vect_recog_sad_pattern): Refactor to use
>   vect_recog_absolute_difference.
>   (vect_recog_abd_pattern): Use patterns found by
>   vect_recog_absolute_difference to build a new ABD
>   internal call.
> ---
>  gcc/doc/md.texi   |  10 ++
>  gcc/internal-fn.def   |   3 +
>  gcc/optabs.def|   2 +
>  gcc/tree-vect-patterns.cc | 245 +-
>  4 files changed, 230 insertions(+), 30 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> 6a435eb44610960513e9739ac9ac1e8a27182c10..e11b10d2fca11016232921bc85e47975f700e6c6
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5787,6 +5787,16 @@ Other shift and rotate instructions, analogous to the
>  Vector shift and rotate instructions that take vectors as operand 2
>  instead of a scalar type.
>  
> +@cindex @code{uabd@var{m}} instruction pattern
> +@cindex @code{sabd@var{m}} instruction pattern
> +@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
> +Signed and unsigned absolute difference instructions.  These
> +instructions find the difference between operands 1 and 2
> +then return the absolute value.  A C code equivalent would be:
> +@smallexample
> +op0 = op1 > op2 ? op1 - op2 : op2 - op1;
> +@end smallexample
> +
>  @cindex @code{avg@var{m}3_floor} instruction pattern
>  @cindex @code{uavg@var{m}3_floor} instruction pattern
>  @item @samp{avg@var{m}3_floor}
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 
> 3ac9d82aace322bd8ef108596e5583daa18c76e3..116965f4830cec8f60642ff011a86b6562e2c509
>  100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -191,6 +191,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
>  DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
>  DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
>  
> +DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
> +   sabd, uabd, binary)
> +
>  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
> savg_floor, uavg_floor, binary)
>  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 
> 6c064ff4993620067d38742a0bfe0a3efb511069..35b835a6ac56d72417dac8ddfd77a8a7e2475e65
>  100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, 
> "mask_fold_left_plus_$a")
>  OPTAB_D (extract_last_optab, "extract_last_$a")
>  OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
>  
> +OPTAB_D (uabd_optab, "uabd$a3")
> +OPTAB_D (sabd_optab, "sabd$a3")
>  OPTAB_D (savg_floor_optab, "avg$a3_floor")
>  OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
>  OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 
> dc102c919352a0328cf86eabceb3a38c41a7e4fd..4b63febc33e90b3caa854404a241afb2f09d755e
>  100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -782,6 +782,86 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info 
> stmt2_info, tree new_rhs,
>  }
>  }
>  
> +/* Look for the following pattern
> + X = x[i]
> + Y = y[i]
> + DIFF = X - Y
> + DAD = ABS_EXPR
> +
> +   ABS_STMT should point to a statement of code ABS_EXPR or ABSU_EXPR.
> +   HALF_TYPE and UNPROM will be set should the statement be found to
> +   be a widened operation.
> +   DIFF_STMT will be set to the MINUS_EXPR
> +   statement that precedes the ABS_STMT unless vect_widened_op_tree
> +   succeeds.
> + */
> +static bool
> +vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt,
> + tree *half_type,
> + vect_unpromoted_value unprom[2],
> + gassign **diff_stmt)
> +{
> +  if (!abs_stmt)
> +return false;
> +
> +  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a 
> phi
> + inside the loop (in case we are analyzing an outer-loop).  */
> +  enum tree_code code = gimple_assign_rhs_code (abs_stmt);
> +  if (code != ABS_EXPR && code != ABSU_EXPR)
> +return false;
> +
> +  tree abs_oprnd = gimple_assign_rhs1 (abs_stmt);
> +  tree abs_type = TREE_TYPE (abs_oprnd);
> +  if (!abs_oprnd)
> +return false;
> +  if (!ANY_INTEGRAL_TYPE_P (abs_type)
> +  || TYPE_OVERFLOW_WRAPS (abs_type)
> +  || TYPE_UNSIGNED (abs_type))
> +return false;
> +
> +  /* Peel off conversions from the ABS input.  This can involve sign
> + changes (e.g. from an 

Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 13, 2023 at 01:29:04PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > > +   else if (addc_subc)
> > > + {
> > > +   if (!integer_zerop (arg2))
> > > + ;
> > > +   /* x = y + 0 + 0; x = y - 0 - 0; */
> > > +   else if (integer_zerop (arg1))
> > > + result = arg0;
> > > +   /* x = 0 + y + 0; */
> > > +   else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> > > + result = arg1;
> > > +   /* x = y - y - 0; */
> > > +   else if (subcode == MINUS_EXPR
> > > +&& operand_equal_p (arg0, arg1, 0))
> > > + result = integer_zero_node;
> > > + }
> > 
> > So this all performs simplifications but also constant folding.  In
> > particular the match.pd re-simplification will invoke fold_const_call
> > on all-constant argument function calls but does not do extra folding
> > on partially constant arg cases but instead relies on patterns here.
> > 
> > Can you add all-constant arg handling to fold_const_call and
> > consider moving cases like y + 0 + 0 to match.pd?
> 
> The reason I've done this here is that this is the spot where all other
> similar internal functions are handled, be it the ubsan ones
> - IFN_UBSAN_CHECK_{ADD,SUB,MUL}, or __builtin_*_overflow ones
> - IFN_{ADD,SUB,MUL}_OVERFLOW, or these 2 new ones.  The code handles
> there 2 constant arguments as well as various patterns that can be
> simplified and has code to clean it up later, build a COMPLEX_CST,
> or COMPLEX_EXPR etc. as needed.  So, I think we want to handle those
> elsewhere, we should do it for all of those functions, but then
> probably incrementally.

The patch I've posted yesterday now fully tested on x86_64-linux and
i686-linux.

Here is an untested incremental patch that handles constant folding of these
in fold-const-call.cc rather than gimple-fold.cc.
I'm not really sure that is the way to go, because it replaces 28
lines of former code with 65 lines of new code, for the overall benefit that, say,
int
foo (long long *p)
{
  int one = 1;
  long long max = __LONG_LONG_MAX__;
  return __builtin_add_overflow (one, max, p);
}
can be now fully folded already in ccp1 pass while before it was only
cleaned up in forwprop1 pass right after it.
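To make the folding concrete, here is a self-written, runnable sketch of the same
constant case (the function name is mine, not from the patch):

```c
#include <limits.h>

/* Sketch of the constant case from the example above: the
   infinite-precision sum 1 + LLONG_MAX does not fit in long long, so
   __builtin_add_overflow returns true and stores the wrapped value
   LLONG_MIN.  With the incremental patch, the whole call folds to
   constants already in the ccp1 pass.  */
int
add_overflows (long long *p)
{
  int one = 1;
  long long max = LLONG_MAX;
  return __builtin_add_overflow (one, max, p);
}
```

Since both operands are compile-time constants after constant propagation, the
return value and the stored result are both known at compile time.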

As for doing some of this in match.pd, I'm afraid it would result in even more
significant growth; the advantage of gimple-fold.cc doing all of these in
one place is that the needed infrastructure can be shared.

--- gcc/gimple-fold.cc.jj   2023-06-14 12:21:38.657657759 +0200
+++ gcc/gimple-fold.cc  2023-06-14 12:52:04.335054958 +0200
@@ -5731,34 +5731,6 @@ gimple_fold_call (gimple_stmt_iterator *
result = arg0;
  else if (subcode == MULT_EXPR && integer_onep (arg0))
result = arg1;
- if (type
- && result == NULL_TREE
- && TREE_CODE (arg0) == INTEGER_CST
- && TREE_CODE (arg1) == INTEGER_CST
- && (!uaddc_usubc || TREE_CODE (arg2) == INTEGER_CST))
-   {
- if (cplx_result)
-   result = int_const_binop (subcode, fold_convert (type, arg0),
- fold_convert (type, arg1));
- else
-   result = int_const_binop (subcode, arg0, arg1);
- if (result && arith_overflowed_p (subcode, type, arg0, arg1))
-   {
- if (cplx_result)
-   overflow = build_one_cst (type);
- else
-   result = NULL_TREE;
-   }
- if (uaddc_usubc && result)
-   {
- tree r = int_const_binop (subcode, result,
-   fold_convert (type, arg2));
- if (r == NULL_TREE)
-   result = NULL_TREE;
- else if (arith_overflowed_p (subcode, type, result, arg2))
-   overflow = build_one_cst (type);
-   }
-   }
  if (result)
{
  if (result == integer_zero_node)
--- gcc/fold-const-call.cc.jj   2023-06-02 10:36:43.096967505 +0200
+++ gcc/fold-const-call.cc  2023-06-14 12:56:08.195631214 +0200
@@ -1669,6 +1669,7 @@ fold_const_call (combined_fn fn, tree ty
 {
   const char *p0, *p1;
   char c;
+  tree_code subcode;
   switch (fn)
 {
 case CFN_BUILT_IN_STRSPN:
@@ -1738,6 +1739,46 @@ fold_const_call (combined_fn fn, tree ty
 case CFN_FOLD_LEFT_PLUS:
   return fold_const_fold_left (type, arg0, arg1, PLUS_EXPR);
 
+case CFN_UBSAN_CHECK_ADD:
+case CFN_ADD_OVERFLOW:
+  subcode = PLUS_EXPR;
+  goto arith_overflow;
+
+case CFN_UBSAN_CHECK_SUB:
+case CFN_SUB_OVERFLOW:
+  subcode = MINUS_EXPR;
+  goto arith_overflow;
+
+case CFN_UBSAN_CHECK_MUL:
+case CFN_MUL_OVERFLOW:
+  subcode = MULT_EXPR;
+  goto arith_overflow;
+
+arith_overflow:
+  if (integer_cst_p (arg0) && integer_cst_p (arg1))
+   {
+ tree itype
+   = TREE_CODE (type) == CO

Re: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread juzhe.zh...@rivai.ai
Oh. I see.

Change

if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p (arg.mode))

into

if (riscv_v_ext_mode_p (arg.mode))

since riscv_v_ext_mode_p (arg.mode) already covers both
riscv_v_ext_vector_mode_p (arg.mode) and riscv_v_ext_tuple_mode_p (arg.mode),
so there is no need for the extra riscv_v_ext_tuple_mode_p check.


juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-06-14 19:03
To: gcc-patches; juzhe.zhong
Subject: [PATCH] RISC-V: Ensure vector args and return use function stack to 
pass [PR110119]
Hi,
 
The reason for this bug is that in the case where the vector register is set
to a fixed length (with `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1 can be passed
through two scalar registers, but when GCC calls FUNCTION_VALUE (call function
riscv_get_arg_info inside) it returns NULL_RTX. These two functions are not
unified. The current treatment is to pass all vector arguments and returns
through the function stack, and a new calling convention for vector registers
will be added in the future.
 
Best,
Lehua
 
  PR target/110119
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for 
vector mode
(riscv_pass_by_reference): Return true for vector mode
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/p110119-1.c: New test.
* gcc.target/riscv/rvv/base/p110119-2.c: New test.
 
---
gcc/config/riscv/riscv.cc | 19 +-
.../gcc.target/riscv/rvv/base/p110119-1.c | 26 +++
.../gcc.target/riscv/rvv/base/p110119-2.c | 26 +++
3 files changed, 65 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..be868c7b6127 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
-  if (riscv_v_ext_mode_p (mode))
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (mode) || riscv_v_ext_tuple_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const 
function_arg_info &arg)
return false;
 }
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
new file mode 100644
index ..0edbb0626299
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" 
} */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t * a, int8_t *out)
+{
+  vint32m1_t v = *(vint32m1_t*)a;
+  return v;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
new file mode 100644
index ..b233ff1e9040
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x 
--param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include 
+#include "riscv_vector.h"
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo1 (int32_t *in, int vl)
+{
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
+
+__attribute__ ((noipa)) void
+foo2 (vint32m1x3_t a, int32_t *out, int vl)
+{

Re: Re: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread juzhe.zh...@rivai.ai
Also, rename the test p110119-1.c to pr110119-1.c.


juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Sent: 2023-06-14 19:17
To: 丁乐华; gcc-patches
Cc: jeffreyalaw; Robin Dapp; palmer
Subject: Re: [PATCH] RISC-V: Ensure vector args and return use function stack to 
pass [PR110119]
Oh. I see.

Change

if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p (arg.mode))

into

if (riscv_v_ext_mode_p (arg.mode))

since riscv_v_ext_mode_p (arg.mode) already covers both
riscv_v_ext_vector_mode_p (arg.mode) and riscv_v_ext_tuple_mode_p (arg.mode),
so there is no need for the extra riscv_v_ext_tuple_mode_p check.


juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-06-14 19:03
To: gcc-patches; juzhe.zhong
Subject: [PATCH] RISC-V: Ensure vector args and return use function stack to 
pass [PR110119]
Hi,
 
The reason for this bug is that in the case where the vector register is set
to a fixed length (with `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1 can be passed
through two scalar registers, but when GCC calls FUNCTION_VALUE (call function
riscv_get_arg_info inside) it returns NULL_RTX. These two functions are not
unified. The current treatment is to pass all vector arguments and returns
through the function stack, and a new calling convention for vector registers
will be added in the future.
 
Best,
Lehua
 
  PR target/110119
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for 
vector mode
(riscv_pass_by_reference): Return true for vector mode
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/p110119-1.c: New test.
* gcc.target/riscv/rvv/base/p110119-2.c: New test.
 
---
gcc/config/riscv/riscv.cc | 19 +-
.../gcc.target/riscv/rvv/base/p110119-1.c | 26 +++
.../gcc.target/riscv/rvv/base/p110119-2.c | 26 +++
3 files changed, 65 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..be868c7b6127 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
-  if (riscv_v_ext_mode_p (mode))
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (mode) || riscv_v_ext_tuple_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const 
function_arg_info &arg)
return false;
 }
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
new file mode 100644
index ..0edbb0626299
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" 
} */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t * a, int8_t *out)
+{
+  vint32m1_t v = *(vint32m1_t*)a;
+  return v;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
new file mode 100644
index ..b233ff1e9040
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x 
--param=riscv-autovec-preference=fix

Re: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Lehua Ding
Fixed all comments from Juzhe, thanks. Below is the new patch. Please use the
attachment if there is a problem with the format of the patch below.



PR target/110119



gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for 
vector mode
(riscv_pass_by_reference): Return true for vector mode




gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr110119-1.c: New test.
* gcc.target/riscv/rvv/base/pr110119-2.c: New test.



---
 gcc/config/riscv/riscv.cc | 17 
 .../gcc.target/riscv/rvv/base/pr110119-1.c| 26 +++
 .../gcc.target/riscv/rvv/base/pr110119-2.c| 26 +++
 3 files changed, 64 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..e5ae4e81b7a5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
 
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
   if (riscv_v_ext_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const 
function_arg_info &arg)
return false;
 }
 
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
new file mode 100644
index ..0edbb0626299
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" 
} */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t * a, int8_t *out)
+{
+  vint32m1_t v = *(vint32m1_t*)a;
+  return v;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
new file mode 100644
index ..b233ff1e9040
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x 
--param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include 

0001-RISC-V-Ensure-vector-args-and-return-use-function-st.patch
Description: Binary data


Re: Re: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread juzhe.zh...@rivai.ai
\ No newline at end of file

Add a trailing newline at the end of each test file.


juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-06-14 19:33
To: 钟居哲; gcc-patches
CC: Jeff Law; Robin Dapp; palmer
Subject: Re: [PATCH] RISC-V: Ensure vector args and return use function stack 
to pass [PR110119]
Fixed all comments from Juzhe, thanks. Below is the new patch. Please use the
attachment if there is a problem with the format of the patch below.

PR target/110119

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for vector mode
(riscv_pass_by_reference): Return true for vector mode

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr110119-1.c: New test.
* gcc.target/riscv/rvv/base/pr110119-2.c: New test.

---
gcc/config/riscv/riscv.cc | 17 
.../gcc.target/riscv/rvv/base/pr110119-1.c | 26 +++
.../gcc.target/riscv/rvv/base/pr110119-2.c | 26 +++
3 files changed, 64 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..e5ae4e81b7a5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
riscv_pass_in_vector_p (type);
}
- /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly. */
+ /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance. */
if (riscv_v_ext_mode_p (mode))
return NULL_RTX;
+
if (named)
{
riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const 
function_arg_info &arg)
return false;
}
+ /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance. */
+ if (riscv_v_ext_mode_p (arg.mode))
+ return true;
+
/* Pass by reference if the data do not fit in two integer registers. */
return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
new file mode 100644
index ..0edbb0626299
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" 
} */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+ vnx2qi v = {a, b};
+ return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+ return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t * a, int8_t *out)
+{
+ vint32m1_t v = *(vint32m1_t*)a;
+ return v;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
new file mode 100644
index ..b233ff1e9040
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x 
--param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include 
+#include "riscv_vector.h"
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo1 (int32_t *in, int vl)
+{
+ vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+ return v;
+}
+
+__attribute__ ((noipa)) void
+foo2 (vint32m1x3_t a, int32_t *out, int vl)
+{
+ __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+}
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo3 (vint32m1x3_t a, int32_t *out, int32_t *in, int vl)
+{
+ __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+ vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+ return v;
+}
-- 
2.36.3
 
 
-- Original --
From:  "juzhe.zh...@rivai.ai";
Date:  Wed, Jun 14, 2023 07:20 PM
To:  "丁乐华"; "gcc-patches"; 
Cc:  "Jeff Law"; "Robin Dapp"; 
"palmer"; 
Subject:  Re: Re: [PATCH] RISC-V: Ensure vector args and return use function 
stack to pass [PR110119]
 
Also, rename the test p110119-1.c to pr110119-1.c.


juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Sent: 2023-06-14 19:17
To: 丁乐华; gcc-patches
Cc: jeffreyalaw; Robin Dapp; palmer
Subject: Re: [PATCH] RISC-V: Ensure vector args and return use function stack to 
pass [PR110119]
Oh. I see.

Change  if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p 
(arg.mode))

in

Re: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Robin Dapp via Gcc-patches
Hi,

> Thanks for fixing this.
> 
> This patch lets RVV types (both vector and tuple) be returned in memory by
> default when there is no vector ABI support. It makes sense to me.
> 
> CC more RISC-V folks to comments.

so this is intended to fix the PR as well as to unblock things while we
continue with the preliminary ABI separately?

If so, works for me.

Regards
 Robin


Re: [RFC] Add stdckdint.h header for C23

2023-06-14 Thread Florian Weimer via Gcc-patches
* Paul Eggert:

> I don't see how you could implement __has_include_next()
> for arbitrary non-GCC compilers, which is what we'd need for glibc
> users.

This is not a requirement for glibc in general.  For example, 
only works with compilers to which it has been ported.

Thanks,
Florian



[PATCH 1/3] Inline vect_get_max_nscalars_per_iter

2023-06-14 Thread Richard Biener via Gcc-patches
The function is only meaningful for LOOP_VINFO_MASKS processing so
inline it into the single use.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

* tree-vect-loop.cc (vect_get_max_nscalars_per_iter): Inline
into ...
(vect_verify_full_masking): ... this.
---
 gcc/tree-vect-loop.cc | 22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index ace9e759f5b..a9695e5b25d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1117,20 +1117,6 @@ can_produce_all_loop_masks_p (loop_vec_info loop_vinfo, 
tree cmp_type)
   return true;
 }
 
-/* Calculate the maximum number of scalars per iteration for every
-   rgroup in LOOP_VINFO.  */
-
-static unsigned int
-vect_get_max_nscalars_per_iter (loop_vec_info loop_vinfo)
-{
-  unsigned int res = 1;
-  unsigned int i;
-  rgroup_controls *rgm;
-  FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), i, rgm)
-res = MAX (res, rgm->max_nscalars_per_iter);
-  return res;
-}
-
 /* Calculate the minimum precision necessary to represent:
 
   MAX_NITERS * FACTOR
@@ -1210,8 +1196,6 @@ static bool
 vect_verify_full_masking (loop_vec_info loop_vinfo)
 {
   unsigned int min_ni_width;
-  unsigned int max_nscalars_per_iter
-= vect_get_max_nscalars_per_iter (loop_vinfo);
 
   /* Use a normal loop if there are no statements that need masking.
  This only happens in rare degenerate cases: it means that the loop
@@ -1219,6 +1203,12 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
   if (LOOP_VINFO_MASKS (loop_vinfo).is_empty ())
 return false;
 
+  /* Calculate the maximum number of scalars per iteration for every rgroup.  
*/
+  unsigned int max_nscalars_per_iter = 1;
+  for (auto rgm : LOOP_VINFO_MASKS (loop_vinfo))
+max_nscalars_per_iter
+  = MAX (max_nscalars_per_iter, rgm.max_nscalars_per_iter);
+
   /* Work out how many bits we need to represent the limit.  */
   min_ni_width
 = vect_min_prec_for_max_niters (loop_vinfo, max_nscalars_per_iter);
-- 
2.35.3



[PATCH 2/3] Add loop_vinfo argument to vect_get_loop_mask

2023-06-14 Thread Richard Biener via Gcc-patches
This adds a loop_vinfo argument for future use, making the next
patch smaller.

* tree-vectorizer.h (vect_get_loop_mask): Add loop_vec_info
argument.
* tree-vect-loop.cc (vect_get_loop_mask): Likewise.
(vectorize_fold_left_reduction): Adjust.
(vect_transform_reduction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-stmts.cc (vectorizable_call): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
---
 gcc/tree-vect-loop.cc  | 16 +---
 gcc/tree-vect-stmts.cc | 36 +++-
 gcc/tree-vectorizer.h  |  3 ++-
 3 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a9695e5b25d..1897e720389 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6637,7 +6637,7 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   gimple *new_stmt;
   tree mask = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-   mask = vect_get_loop_mask (gsi, masks, vec_num, vectype_in, i);
+   mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, 
i);
 
   /* Handle MINUS by adding the negative.  */
   if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
@@ -7950,8 +7950,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
  gcc_assert (commutative_binary_op_p (code, op.type));
  std::swap (vop[0], vop[1]);
}
- tree mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
- vectype_in, i);
+ tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
+ vec_num * ncopies, vectype_in, i);
  gcall *call = gimple_build_call_internal (cond_fn, 4, mask,
vop[0], vop[1], vop[0]);
  new_temp = make_ssa_name (vec_dest, call);
@@ -7967,8 +7967,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 
  if (masked_loop_p && mask_by_cond_expr)
{
- tree mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
- vectype_in, i);
+ tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
+ vec_num * ncopies, vectype_in, i);
  build_vect_cond_expr (code, vop, mask, gsi);
}
 
@@ -10075,7 +10075,8 @@ vectorizable_live_operation (vec_info *vinfo,
 the loop mask for the final iteration.  */
  gcc_assert (ncopies == 1 && !slp_node);
  tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
- tree mask = vect_get_loop_mask (gsi, &LOOP_VINFO_MASKS (loop_vinfo),
+ tree mask = vect_get_loop_mask (loop_vinfo, gsi,
+ &LOOP_VINFO_MASKS (loop_vinfo),
  1, vectype, 0);
  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
  mask, vec_lhs_phi);
@@ -10359,7 +10360,8 @@ vect_record_loop_mask (loop_vec_info loop_vinfo, 
vec_loop_masks *masks,
arrangement.  */
 
 tree
-vect_get_loop_mask (gimple_stmt_iterator *gsi, vec_loop_masks *masks,
+vect_get_loop_mask (loop_vec_info,
+   gimple_stmt_iterator *gsi, vec_loop_masks *masks,
unsigned int nvectors, tree vectype, unsigned int index)
 {
   rgroup_controls *rgm = &(*masks)[nvectors - 1];
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a7acc032d47..47baf35227f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3692,7 +3692,8 @@ vectorizable_call (vec_info *vinfo,
  unsigned int vec_num = vec_oprnds0.length ();
  /* Always true for SLP.  */
  gcc_assert (ncopies == 1);
- vargs[varg++] = vect_get_loop_mask (gsi, masks, vec_num,
+ vargs[varg++] = vect_get_loop_mask (loop_vinfo,
+ gsi, masks, vec_num,
  vectype_out, i);
}
  size_t k;
@@ -3733,7 +3734,8 @@ vectorizable_call (vec_info *vinfo,
  unsigned int vec_num = vec_oprnds0.length ();
  /* Always true for SLP.  */
  gcc_assert (ncopies == 1);
- tree mask = vect_get_loop_mask (gsi, masks, vec_num,
+ tree mask = vect_get_loop_mask (loop_vinfo,
+ gsi, masks, vec_num,
  vectype_out, i);
  vargs[mask_opno] = prep

Re: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Lehua Ding
> so this is intended to fix the PR as well as unblock while we continue
> with the preliminary ABI separately?


Yes, and I will send the new prerelease vector calling convention later.


Best,
Lehua

[PATCH 3/3] AVX512 fully masked vectorization

2023-06-14 Thread Richard Biener via Gcc-patches
This implements fully masked vectorization, or a masked epilog, for
AVX512-style masks, which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both much like GCN).

AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV like SVE has with while_ult.
Instead the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation, a suitable mask test instruction is available).
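As a portable, self-written model (not GCC internals): an AVX512-style mask
carries one bit per lane, set by comparing each lane's scalar index against
the trip count, which is what the vector compare computes. A minimal sketch:

```c
/* Portable model of an AVX512-style loop mask: bit L is set when lane L
   of the current vector iteration is still inside the loop.  Here 'i'
   is the scalar IV, 'n' the trip count, and 'vf' the number of lanes;
   all names are illustrative, not taken from the patch.  */
unsigned int
lane_mask (unsigned int i, unsigned int n, unsigned int vf)
{
  unsigned int mask = 0;
  for (unsigned int lane = 0; lane < vf; lane++)
    if (i + lane < n)
      mask |= 1u << lane;
  return mask;
}
```

A full vector iteration yields an all-ones mask; the final partial iteration
yields a mask with only the in-bounds low lanes set.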

Like RVV, code generation prefers a decrementing IV, though IVOPTs
messes things up in some cases by removing that IV and eliminating
it in favor of an incrementing one used for address generation.

One of the motivating testcases is from PR108410 which in turn
is extracted from x264 where large size vectorization shows
issues with small trip loops.  Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of less than 32 iterations.

size   scalar     128     256     512    512e    512f
    1    9.42   11.32    9.35   11.17   15.13   16.89
    2    5.72    6.53    6.66    6.66    7.62    8.56
    3    4.49    5.10    5.10    5.74    5.08    5.73
    4    4.10    4.33    4.29    5.21    3.79    4.25
    6    3.78    3.85    3.86    4.76    2.54    2.85
    8    3.64    1.89    3.76    4.50    1.92    2.16
   12    3.56    2.21    3.75    4.26    1.26    1.42
   16    3.36    0.83    1.06    4.16    0.95    1.07
   20    3.39    1.42    1.33    4.07    0.75    0.85
   24    3.23    0.66    1.72    4.22    0.62    0.70
   28    3.18    1.09    2.04    4.20    0.54    0.61
   32    3.16    0.47    0.41    0.41    0.47    0.53
   34    3.16    0.67    0.61    0.56    0.44    0.50
   38    3.19    0.95    0.95    0.82    0.40    0.45
   42    3.09    0.58    1.21    1.13    0.36    0.40

'size' specifies the number of actual iterations; 512e is for
a masked epilog and 512f for the fully masked loop.  From
4 scalar iterations on, the AVX512 masked-epilog code is clearly
the winner; the fully masked variant is clearly worse and
its size benefit is also tiny.
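The kind of small-trip-count kernel measured above can be sketched as follows.
This is my own minimal illustration, not the actual x264 loop, and the option
names in the comment are only an assumption of how one would request a masked
epilogue with this patch applied:

```c
/* A simple kernel whose runtime trip count n is often small.  With a
   masked AVX512 epilogue (assumed options: -O3 -mavx512f
   --param vect-partial-vector-usage=1) the tail iterations are handled
   by one masked vector step instead of a scalar remainder loop.  */
void
add_arrays (float *restrict dst, const float *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] += 2.0f * src[i];
}
```

For n well below the vector length times a few copies, the scalar tail would
otherwise dominate the runtime, which is what the table above measures.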

This patch does not enable using fully masked loops or
masked epilogues by default.  More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.

Implementation-wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
which could be exploited further to unify some of the flags
we have right now but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.

Mask requirements as registered by vect_record_loop_mask are kept in their
original form and recorded in a hash_set now instead of being
processed to a vector of rgroup_controls.  Instead that's now
left to the final analysis phase which tries forming the rgroup_controls
vector using while_ult and if that fails now tries AVX512 style
which needs a different organization and instead fills a hash_map
with the relevant info.  vect_get_loop_mask now has two implementations,
one for the two mask styles we then have.

I have decided against interweaving vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to
"duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.

I was split between making 'vec_loop_masks' a class with methods,
possibly merging in the _len stuff into a single registry.  It
seemed to be too many changes for the purpose of getting AVX512
working.  I'm going to play wait and see what happens with RISC-V
here since they are going to get both masks and lengths registered
I think.

The vect_prepare_for_masked_peels hunk might run into issues with
SVE, I didn't check yet but using LOOP_VINFO_RGROUP_COMPARE_TYPE
looked odd.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  I've run
the testsuite with --param vect-partial-vector-usage=2 with and
without -fno-vect-cost-model and filed two bugs, one ICE (PR110221)
and one latent wrong-code (PR110237).

There's followup work to be done to try enabling masked epilogues
for x86-64 by default (when AVX512 is enabled, possibly only when
-mprefer-vector-width=512).  Getting cost modeling and decision
right is going to be challenging.

Any comments?

OK?

Btw, testing on GCN would be welcome - the _avx512 paths could
work for it so in case the while_ult path fails (not sure if
it ever does) it could get _avx512 style masking.  Likewise
testing on ARM just to see I didn't break anything here.
I don't have SVE hardware so testing is probably meaningless.

Thanks,
Richard.

* tree-vectorizer.h (enum vect_partial_vector_style): New.
(_loop_vec_info::partial_vector_style): Likewise.
(LOOP_VINFO_PARTIAL_VECTORS_STYLE): Likewise.
(rgroup_controls::compare_type): Add.
(vec_loop_masks): Change from a typedef to auto_vec<>
to a structure.
* tree-vect-loop-manip.cc (vect_set

[PATCH V2] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Lehua Ding
The V2 patch addresses comments from Juzhe, thanks.

Hi,
 
The reason for this bug is that, when the vector register is set to a fixed
length (with the `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1_t can be
passed through two scalar registers, but when GCC calls FUNCTION_VALUE (which
calls riscv_get_arg_info internally) it returns NULL_RTX, so the two hooks
are inconsistent. The current treatment is to pass all vector arguments and
return values through the function stack; a new calling convention for vector
registers will be added in the future.
 
Best,
Lehua

PR target/110119

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for
vector mode.
(riscv_pass_by_reference): Return true for vector mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr110119-1.c: New test.
* gcc.target/riscv/rvv/base/pr110119-2.c: New test.

---
 gcc/config/riscv/riscv.cc | 17 
 .../gcc.target/riscv/rvv/base/pr110119-1.c| 26 +++
 .../gcc.target/riscv/rvv/base/pr110119-2.c| 26 +++
 3 files changed, 64 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..e5ae4e81b7a5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
 
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
   if (riscv_v_ext_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const 
function_arg_info &arg)
return false;
 }
 
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
new file mode 100644
index ..f16502bcfeec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" 
} */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t *a, int8_t *out)
+{
+  vint32m1_t v = *(vint32m1_t *) a;
+  return v;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
new file mode 100644
index ..b233ff1e9040
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x 
--param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include <stdint.h>
+#include "riscv_vector.h"
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo1 (int32_t *in, int vl)
+{
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
+
+__attribute__ ((noipa)) void
+foo2 (vint32m1x3_t a, int32_t *out, int vl)
+{
+  __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+}
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo3 (vint32m1x3_t a, int32_t *out, int32_t *in, int vl)
+{
+  __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
-- 
2.36.3



Re: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Lehua Ding
> \ No newline at end of file
> Add newline for each test.



Address this comment, below is the V2 patch link.


https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621698.html
 
Best,
Lehua


  

Re: [PATCH V2] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread juzhe.zh...@rivai.ai
LGTM now. Thanks for fixing it.

Good to see a fix patch for the ICE before the Vector ABI patch.
Let's wait for more comments.

Lehua Ding takes care of Vector ABI implementation and hopefully will send it 
soon.

Thanks.


juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-06-14 19:56
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; palmer
Subject: [PATCH V2] RISC-V: Ensure vector args and return use function stack to 
pass [PR110119]
The V2 patch addresses comments from Juzhe, thanks.
 
Hi,
The reason for this bug is that, when the vector register is set to a fixed
length (with the `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1_t can be
passed through two scalar registers, but when GCC calls FUNCTION_VALUE (which
calls riscv_get_arg_info internally) it returns NULL_RTX, so the two hooks
are inconsistent. The current treatment is to pass all vector arguments and
return values through the function stack; a new calling convention for vector
registers will be added in the future.
Best,
Lehua
 
PR target/110119
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for
vector mode.
(riscv_pass_by_reference): Return true for vector mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr110119-1.c: New test.
* gcc.target/riscv/rvv/base/pr110119-2.c: New test.
 
---
gcc/config/riscv/riscv.cc | 17 
.../gcc.target/riscv/rvv/base/pr110119-1.c| 26 +++
.../gcc.target/riscv/rvv/base/pr110119-2.c| 26 +++
3 files changed, 64 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..e5ae4e81b7a5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
   if (riscv_v_ext_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const 
function_arg_info &arg)
return false;
 }
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
new file mode 100644
index ..f16502bcfeec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" 
} */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t *a, int8_t *out)
+{
+  vint32m1_t v = *(vint32m1_t *) a;
+  return v;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
new file mode 100644
index ..b233ff1e9040
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x 
--param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include <stdint.h>
+#include "riscv_vector.h"
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo1 (int32_t *in, int vl)
+{
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
+
+__attribute__ ((noipa)) void
+foo2 (vint32m1x3_t a, int32_t *out, int vl)
+{
+  __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+}
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo3 (vint32m1x3_t a, int32_

Re: [PATCH V2] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread juzhe.zh...@rivai.ai
LGTM now. Thanks for fixing it.

Good to see a fix patch for the ICE before the Vector ABI patch.
Let's wait for more comments.

Lehua Ding takes care of Vector ABI implementation and hopefully will send it 
soon.

It seems Jeff's email address is wrong. CC'ing Jeff for you.

Thanks.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-06-14 19:56
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; palmer
Subject: [PATCH V2] RISC-V: Ensure vector args and return use function stack to 
pass [PR110119]
The V2 patch addresses comments from Juzhe, thanks.
 
Hi,
The reason for this bug is that, when the vector register is set to a fixed
length (with the `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1_t can be
passed through two scalar registers, but when GCC calls FUNCTION_VALUE (which
calls riscv_get_arg_info internally) it returns NULL_RTX, so the two hooks
are inconsistent. The current treatment is to pass all vector arguments and
return values through the function stack; a new calling convention for vector
registers will be added in the future.
Best,
Lehua
 
PR target/110119
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for
vector mode.
(riscv_pass_by_reference): Return true for vector mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr110119-1.c: New test.
* gcc.target/riscv/rvv/base/pr110119-2.c: New test.
 
---
gcc/config/riscv/riscv.cc | 17 
.../gcc.target/riscv/rvv/base/pr110119-1.c| 26 +++
.../gcc.target/riscv/rvv/base/pr110119-2.c| 26 +++
3 files changed, 64 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..e5ae4e81b7a5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
   if (riscv_v_ext_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const 
function_arg_info &arg)
return false;
 }
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
new file mode 100644
index ..f16502bcfeec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" 
} */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t *a, int8_t *out)
+{
+  vint32m1_t v = *(vint32m1_t *) a;
+  return v;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
new file mode 100644
index ..b233ff1e9040
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x 
--param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include <stdint.h>
+#include "riscv_vector.h"
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo1 (int32_t *in, int vl)
+{
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
+
+__attribute__ ((noipa)) void
+foo2 (vint32m1x3_t a, int32_t *out, int vl)
+{
+  __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+}
+
+__att

Re: [PATCH V2] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread juzhe.zh...@rivai.ai
LGTM now. Thanks for fixing it.

Good to see a fix patch for the ICE before the Vector ABI patch.
Let's wait for more comments.

Lehua Ding takes care of Vector ABI implementation and hopefully will send it 
soon.

It seems Jeff's email address is wrong. CC'ing Jeff for you.

Oh, I see Robin's email address is also wrong. CC'ing Robin too for you.

Thanks.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-06-14 19:56
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; palmer
Subject: [PATCH V2] RISC-V: Ensure vector args and return use function stack to 
pass [PR110119]
The V2 patch addresses comments from Juzhe, thanks.
 
Hi,
The reason for this bug is that, when the vector register is set to a fixed
length (with the `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1_t can be
passed through two scalar registers, but when GCC calls FUNCTION_VALUE (which
calls riscv_get_arg_info internally) it returns NULL_RTX, so the two hooks
are inconsistent. The current treatment is to pass all vector arguments and
return values through the function stack; a new calling convention for vector
registers will be added in the future.
Best,
Lehua
 
PR target/110119
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for
vector mode.
(riscv_pass_by_reference): Return true for vector mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr110119-1.c: New test.
* gcc.target/riscv/rvv/base/pr110119-2.c: New test.
 
---
gcc/config/riscv/riscv.cc | 17 
.../gcc.target/riscv/rvv/base/pr110119-1.c| 26 +++
.../gcc.target/riscv/rvv/base/pr110119-2.c| 26 +++
3 files changed, 64 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..e5ae4e81b7a5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
   if (riscv_v_ext_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const 
function_arg_info &arg)
return false;
 }
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
new file mode 100644
index ..f16502bcfeec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" 
} */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t *a, int8_t *out)
+{
+  vint32m1_t v = *(vint32m1_t *) a;
+  return v;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
new file mode 100644
index ..b233ff1e9040
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x 
--param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include <stdint.h>
+#include "riscv_vector.h"
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo1 (int32_t *in, int vl)
+{
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
+
+__attribute__ ((noipa)) void
+foo2 (vint32m1x3_t a, int32_t *out, i

Re: [PATCH V2] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Robin Dapp via Gcc-patches
> Oh. I see Robin's email is also wrong. CC Robin too for you 

It still arrived via the mailing list ;)

> Good to see a fix patch for the ICE before the Vector ABI patch.
> Let's wait for more comments.

LGTM, this way I don't even need to rewrite my tests.

Regards
 Robin


[wwwdocs] Broken URL to README.Portability

2023-06-14 Thread Jivan Hakobyan via Gcc-patches
This patch fixes the link to README.Portability in "GCC Coding Conventions"
page


-- 
With the best regards
Jivan Hakobyan
diff --git a/htdocs/codingconventions.html b/htdocs/codingconventions.html
index 9b6d243d..f5a356a8 100644
--- a/htdocs/codingconventions.html
+++ b/htdocs/codingconventions.html
@@ -252,7 +252,7 @@ and require at least an ANSI C89 or ISO C90 host compiler.
 C code should avoid pre-standard style function definitions, unnecessary
 function prototypes and use of the now deprecated PARAMS macro.
 See <a
-href="https://gcc.gnu.org/svn/gcc/trunk/gcc/README.Portability">README.Portability</a>
+href="https://gcc.gnu.org/git/?p=gcc.git;a=blob_plain;f=gcc/README.Portability">README.Portability</a>
 for details of some of the portability problems that may arise.  Some
 of these problems are warned about by gcc -Wtraditional,
 which is included in the default warning options in a bootstrap.


Re: [PATCH] Convert ipa_jump_func to use ipa_vr instead of a value_range.

2023-06-14 Thread Aldy Hernandez via Gcc-patches
PING

On Mon, May 22, 2023 at 8:56 PM Aldy Hernandez  wrote:
>
> This patch converts the ipa_jump_func code to use the type agnostic
> ipa_vr suitable for GC instead of value_range which is integer specific.
>
> I've disabled the range cacheing to simplify the patch for review, but
> it is handled in the next patch in the series.
>
> OK?
>
> gcc/ChangeLog:
>
> * ipa-cp.cc (ipa_vr_operation_and_type_effects): New.
> * ipa-prop.cc (ipa_get_value_range): Adjust for ipa_vr.
> (ipa_set_jfunc_vr): Take a range.
> (ipa_compute_jump_functions_for_edge): Pass range to
> ipa_set_jfunc_vr.
> (ipa_write_jump_function): Call streamer write helper.
> (ipa_read_jump_function): Call streamer read helper.
> * ipa-prop.h (class ipa_vr): Change m_vr to an ipa_vr.
> ---
>  gcc/ipa-cp.cc   | 15 +++
>  gcc/ipa-prop.cc | 70 ++---
>  gcc/ipa-prop.h  |  5 +++-
>  3 files changed, 44 insertions(+), 46 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index bdbc2184b5f..03273666ea2 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -1928,6 +1928,21 @@ ipa_vr_operation_and_type_effects (vrange &dst_vr,
>   && !dst_vr.undefined_p ());
>  }
>
> +/* Same as above, but the SRC_VR argument is an IPA_VR which must
> +   first be extracted onto a vrange.  */
> +
> +static bool
> +ipa_vr_operation_and_type_effects (vrange &dst_vr,
> +  const ipa_vr &src_vr,
> +  enum tree_code operation,
> +  tree dst_type, tree src_type)
> +{
> +  Value_Range tmp;
> +  src_vr.get_vrange (tmp);
> +  return ipa_vr_operation_and_type_effects (dst_vr, tmp, operation,
> +   dst_type, src_type);
> +}
> +
>  /* Determine range of JFUNC given that INFO describes the caller node or
> the one it is inlined to, CS is the call graph edge corresponding to JFUNC
> and PARM_TYPE of the parameter.  */
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index bbfe0f8aa45..c46a89f1b49 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -2287,9 +2287,10 @@ ipa_set_jfunc_bits (ipa_jump_func *jf, const 
> widest_int &value,
>  /* Return a pointer to a value_range just like *TMP, but either find it in
> ipa_vr_hash_table or allocate it in GC memory.  TMP->equiv must be NULL.  
> */
>
> -static value_range *
> -ipa_get_value_range (value_range *tmp)
> +static ipa_vr *
> +ipa_get_value_range (const vrange &tmp)
>  {
> +  /* FIXME: Add hashing support.
>value_range **slot = ipa_vr_hash_table->find_slot (tmp, INSERT);
>if (*slot)
>  return *slot;
> @@ -2297,40 +2298,27 @@ ipa_get_value_range (value_range *tmp)
>value_range *vr = new (ggc_alloc ()) value_range;
>*vr = *tmp;
>*slot = vr;
> +  */
> +  ipa_vr *vr = new (ggc_alloc ()) ipa_vr (tmp);
>
>return vr;
>  }
>
> -/* Return a pointer to a value range consisting of TYPE, MIN, MAX and an 
> empty
> -   equiv set. Use hash table in order to avoid creating multiple same copies 
> of
> -   value_ranges.  */
> -
> -static value_range *
> -ipa_get_value_range (enum value_range_kind kind, tree min, tree max)
> -{
> -  value_range tmp (TREE_TYPE (min),
> -  wi::to_wide (min), wi::to_wide (max), kind);
> -  return ipa_get_value_range (&tmp);
> -}
> -
> -/* Assign to JF a pointer to a value_range structure with TYPE, MIN and MAX 
> and
> -   a NULL equiv bitmap.  Use hash table in order to avoid creating multiple
> -   same value_range structures.  */
> +/* Assign to JF a pointer to a value_range just like TMP but either fetch a
> +   copy from ipa_vr_hash_table or allocate a new on in GC memory.  */
>
>  static void
> -ipa_set_jfunc_vr (ipa_jump_func *jf, enum value_range_kind type,
> - tree min, tree max)
> +ipa_set_jfunc_vr (ipa_jump_func *jf, const vrange &tmp)
>  {
> -  jf->m_vr = ipa_get_value_range (type, min, max);
> +  jf->m_vr = ipa_get_value_range (tmp);
>  }
>
> -/* Assign to JF a pointer to a value_range just like TMP but either fetch a
> -   copy from ipa_vr_hash_table or allocate a new on in GC memory.  */
> -
>  static void
> -ipa_set_jfunc_vr (ipa_jump_func *jf, value_range *tmp)
> +ipa_set_jfunc_vr (ipa_jump_func *jf, const ipa_vr &vr)
>  {
> -  jf->m_vr = ipa_get_value_range (tmp);
> +  Value_Range tmp;
> +  vr.get_vrange (tmp);
> +  ipa_set_jfunc_vr (jf, tmp);
>  }
>
>  /* Compute jump function for all arguments of callsite CS and insert the
> @@ -2392,8 +2380,8 @@ ipa_compute_jump_functions_for_edge (struct 
> ipa_func_body_info *fbi,
>
>   if (addr_nonzero)
> {
> - tree z = build_int_cst (TREE_TYPE (arg), 0);
> - ipa_set_jfunc_vr (jfunc, VR_ANTI_RANGE, z, z);
> + vr.set_nonzero (TREE_TYPE (arg));
> + ipa_set_jfunc_vr (jfunc, vr);
> }
>   else
> gcc_assert (!jfunc->m_vr);

Re: [PATCH] Convert remaining uses of value_range in ipa-*.cc to Value_Range.

2023-06-14 Thread Aldy Hernandez via Gcc-patches
PING

On Mon, May 22, 2023 at 8:56 PM Aldy Hernandez  wrote:
>
> Minor cleanups to get rid of value_range in IPA.  There's only one left,
> but it's in the switch code which is integer specific.
>
> OK?
>
> gcc/ChangeLog:
>
> * ipa-cp.cc (decide_whether_version_node): Adjust comment.
> * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Adjust
> for Value_Range.
> (set_switch_stmt_execution_predicate): Same.
> * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Same.
> ---
>  gcc/ipa-cp.cc|  3 +--
>  gcc/ipa-fnsummary.cc | 22 ++
>  gcc/ipa-prop.cc  |  9 +++--
>  3 files changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 03273666ea2..2e64415096e 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -6287,8 +6287,7 @@ decide_whether_version_node (struct cgraph_node *node)
> {
>   /* If some values generated for self-recursive calls with
>  arithmetic jump functions fall outside of the known
> -value_range for the parameter, we can skip them.  VR 
> interface
> -supports this only for integers now.  */
> +range for the parameter, we can skip them.  */
>   if (TREE_CODE (val->value) == INTEGER_CST
>   && !plats->m_value_range.bottom_p ()
>   && !ipa_range_contains_p (plats->m_value_range.m_vr,
> diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
> index 0474af8991e..1ce8501fe85 100644
> --- a/gcc/ipa-fnsummary.cc
> +++ b/gcc/ipa-fnsummary.cc
> @@ -488,19 +488,20 @@ evaluate_conditions_for_known_args (struct cgraph_node 
> *node,
>   if (vr.varying_p () || vr.undefined_p ())
> break;
>
> - value_range res;
> + Value_Range res (op->type);
>   if (!op->val[0])
> {
> + Value_Range varying (op->type);
> + varying.set_varying (op->type);
>   range_op_handler handler (op->code, op->type);
>   if (!handler
>   || !res.supports_type_p (op->type)
> - || !handler.fold_range (res, op->type, vr,
> - value_range (op->type)))
> + || !handler.fold_range (res, op->type, vr, varying))
> res.set_varying (op->type);
> }
>   else if (!op->val[1])
> {
> - value_range op0;
> + Value_Range op0 (op->type);
>   range_op_handler handler (op->code, op->type);
>
>   ipa_range_set_and_normalize (op0, op->val[0]);
> @@ -518,14 +519,14 @@ evaluate_conditions_for_known_args (struct cgraph_node 
> *node,
> }
>   if (!vr.varying_p () && !vr.undefined_p ())
> {
> - value_range res;
> - value_range val_vr;
> + int_range<2> res;
> + Value_Range val_vr (TREE_TYPE (c->val));
>   range_op_handler handler (c->code, boolean_type_node);
>
>   ipa_range_set_and_normalize (val_vr, c->val);
>
>   if (!handler
> - || !res.supports_type_p (boolean_type_node)
> + || !val_vr.supports_type_p (TREE_TYPE (c->val))
>   || !handler.fold_range (res, boolean_type_node, vr, 
> val_vr))
> res.set_varying (boolean_type_node);
>
> @@ -1687,12 +1688,17 @@ set_switch_stmt_execution_predicate (struct 
> ipa_func_body_info *fbi,
>int bound_limit = opt_for_fn (fbi->node->decl,
> param_ipa_max_switch_predicate_bounds);
>int bound_count = 0;
> -  value_range vr;
> +  // This can safely be an integer range, as switches can only hold
> +  // integers.
> +  int_range<2> vr;
>
>get_range_query (cfun)->range_of_expr (vr, op);
>if (vr.undefined_p ())
>  vr.set_varying (TREE_TYPE (op));
>tree vr_min, vr_max;
> +  // ?? This entire function could use a rewrite to use the irange
> +  // API, instead of trying to recreate its intersection/union logic.
> +  // Any use of get_legacy_range() is a serious code smell.
>value_range_kind vr_type = get_legacy_range (vr, vr_min, vr_max);
>wide_int vr_wmin = wi::to_wide (vr_min);
>wide_int vr_wmax = wi::to_wide (vr_max);
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index 6383bc11e0a..5f9e6dbbff2 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -2348,7 +2348,6 @@ ipa_compute_jump_functions_for_edge (struct 
> ipa_func_body_info *fbi,
>gcall *call = cs->call_stmt;
>int n, arg_num = gimple_call_num_args (call);
>bool useful_context = false;
> -  value_range vr;
>
>if (arg_num == 0 || args->jump_functi

Re: [PATCH] Implement ipa_vr hashing.

2023-06-14 Thread Aldy Hernandez via Gcc-patches
PING

On Sat, Jun 10, 2023 at 10:30 PM Aldy Hernandez  wrote:
>
>
>
> On 5/29/23 16:51, Martin Jambor wrote:
> > Hi,
> >
> > On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote:
> >> Implement hashing for ipa_vr.  When all is said and done, all these
> >> patches incurr a 7.64% slowdown for ipa-cp, with is entirely covered by
> >> the similar 7% increase in this area last week.  So we get type agnostic
> >> ranges with "infinite" range precision close to free.
> >
> > Do you know why/where this slow-down happens?  Do we perhaps want to
> > limit the "infiniteness" a little somehow?
>
> I addressed the slow down in another mail.
>
> >
> > Also, jump functions live for a long time, have you looked at how memory
> > hungry they become?  I hope that the hashing would be good at preventing
> > any issues.
>
> On a side-note, the caching does help.  On a (mistaken) hunch, I had
> played around with removing caching for everything but UNDEFINED/VARYING
> and zero/nonzero to simplify things, but the cache hit ratio was still
> surprisingly high (+80%).  So good job there :-).
>
> >
> > Generally, I think I OK with the patches if the impact on memory is not
> > too bad, though I guess they depend on the one I looked at last week, so
> > we may focus on that one first.
>
> I'm not sure whether this was an OK for the other patches, given you
> approved the first patch, so I'll hold off until you give the go-ahead.
>
> Thanks.
> Aldy



RE: Re: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Li, Pan2 via Gcc-patches
Nit for test.

+/* { dg-options "-march=rv64gczve32x --param=riscv-autovec-preference=fixed-vlmax" } */

To

+/* { dg-options "-march=rv64gc_zve32x --param=riscv-autovec-preference=fixed-vlmax" } */

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of juzhe.zh...@rivai.ai
Sent: Wednesday, June 14, 2023 7:21 PM
To: 丁乐华 ; gcc-patches 
Cc: jeffreyalaw ; Robin Dapp ; 
palmer 
Subject: Re: Re: [PATCH] RISC-V: Ensure vector args and return use function
stack to pass [PR110119]

Also, please change the name of the test from
p110119-1.c
to
pr110119-1.c


juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Sent: 2023-06-14 19:17
To: 丁乐华; gcc-patches
Cc: jeffreyalaw; Robin Dapp; palmer
Subject: Re: [PATCH] RISC-V: Ensure vector args and return use function stack to
pass [PR110119]

Oh. I see.

Change  if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p 
(arg.mode))

into 

if (riscv_v_ext_mode_p (arg.mode))

since riscv_v_ext_mode_p (arg.mode) already covers both riscv_v_ext_vector_mode_p
(arg.mode) and riscv_v_ext_tuple_mode_p (arg.mode), there is no need for the
separate riscv_v_ext_tuple_mode_p check.


juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-06-14 19:03
To: gcc-patches; juzhe.zhong
Subject: [PATCH] RISC-V: Ensure vector args and return use function stack to
pass [PR110119]

Hi,
 
The reason for this bug is that, when the vector register length is fixed (with the
`--param=riscv-autovec-preference=fixed-vlmax` option), TARGET_PASS_BY_REFERENCE
thinks that variables of type vint32m1 can be passed through two scalar registers,
but when GCC calls FUNCTION_VALUE (which calls riscv_get_arg_info internally) it
returns NULL_RTX. The two hooks are not consistent with each other. The current
treatment is to pass all vector arguments and return values through the function
stack; a new calling convention for vector registers will be added in the future.
 
Best,
Lehua
 
  PR target/110119
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for
vector mode.
(riscv_pass_by_reference): Return true for vector mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/p110119-1.c: New test.
* gcc.target/riscv/rvv/base/p110119-2.c: New test.
 
---
gcc/config/riscv/riscv.cc | 19 +-
.../gcc.target/riscv/rvv/base/p110119-1.c | 26 +++
.../gcc.target/riscv/rvv/base/p110119-2.c | 26 +++
3 files changed, 65 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..be868c7b6127 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
-  if (riscv_v_ext_mode_p (mode))
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (mode) || riscv_v_ext_tuple_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const function_arg_info &arg)
 return false;
 }
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
new file mode 100644
index ..0edbb0626299
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ 

Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Richard Biener via Gcc-patches
On Wed, 14 Jun 2023, Jakub Jelinek wrote:

> On Tue, Jun 13, 2023 at 01:29:04PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > > > + else if (addc_subc)
> > > > +   {
> > > > + if (!integer_zerop (arg2))
> > > > +   ;
> > > > + /* x = y + 0 + 0; x = y - 0 - 0; */
> > > > + else if (integer_zerop (arg1))
> > > > +   result = arg0;
> > > > + /* x = 0 + y + 0; */
> > > > + else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> > > > +   result = arg1;
> > > > + /* x = y - y - 0; */
> > > > + else if (subcode == MINUS_EXPR
> > > > +  && operand_equal_p (arg0, arg1, 0))
> > > > +   result = integer_zero_node;
> > > > +   }
> > > 
> > > So this all performs simplifications but also constant folding.  In
> > > particular the match.pd re-simplification will invoke fold_const_call
> > > on all-constant argument function calls but does not do extra folding
> > > on partially constant arg cases but instead relies on patterns here.
> > > 
> > > Can you add all-constant arg handling to fold_const_call and
> > > consider moving cases like y + 0 + 0 to match.pd?
> > 
> > The reason I've done this here is that this is the spot where all other
> > similar internal functions are handled, be it the ubsan ones
> > - IFN_UBSAN_CHECK_{ADD,SUB,MUL}, or __builtin_*_overflow ones
> > - IFN_{ADD,SUB,MUL}_OVERFLOW, or these 2 new ones.  The code handles
> > there 2 constant arguments as well as various patterns that can be
> > simplified and has code to clean it up later, build a COMPLEX_CST,
> > or COMPLEX_EXPR etc. as needed.  So, I think we want to handle those
> > elsewhere, we should do it for all of those functions, but then
> > probably incrementally.
> 
> The patch I've posted yesterday now fully tested on x86_64-linux and
> i686-linux.
> 
> Here is an untested incremental patch to handle constant folding of these
> in fold-const-call.cc rather than gimple-fold.cc.
> Not really sure if that is the way to go because it is replacing 28
> lines of former code with 65 of new code, for the overall benefit that say
> int
> foo (long long *p)
> {
>   int one = 1;
>   long long max = __LONG_LONG_MAX__;
>   return __builtin_add_overflow (one, max, p);
> }
> can be now fully folded already in ccp1 pass while before it was only
> cleaned up in forwprop1 pass right after it.

I think that's still very much desirable so this followup looks OK.
Maybe you can re-base it as a prerequisite though?

> As for doing some stuff in match.pd, I'm afraid it would result in even more
> significant growth, the advantage of gimple-fold.cc doing all of these in
> one place is that the needed infrastructure can be shared.

Yes, I saw that.

Richard.

> 
> --- gcc/gimple-fold.cc.jj 2023-06-14 12:21:38.657657759 +0200
> +++ gcc/gimple-fold.cc 2023-06-14 12:52:04.335054958 +0200
> @@ -5731,34 +5731,6 @@ gimple_fold_call (gimple_stmt_iterator *
>   result = arg0;
> else if (subcode == MULT_EXPR && integer_onep (arg0))
>   result = arg1;
> -   if (type
> -   && result == NULL_TREE
> -   && TREE_CODE (arg0) == INTEGER_CST
> -   && TREE_CODE (arg1) == INTEGER_CST
> -   && (!uaddc_usubc || TREE_CODE (arg2) == INTEGER_CST))
> - {
> -   if (cplx_result)
> - result = int_const_binop (subcode, fold_convert (type, arg0),
> -   fold_convert (type, arg1));
> -   else
> - result = int_const_binop (subcode, arg0, arg1);
> -   if (result && arith_overflowed_p (subcode, type, arg0, arg1))
> - {
> -   if (cplx_result)
> - overflow = build_one_cst (type);
> -   else
> - result = NULL_TREE;
> - }
> -   if (uaddc_usubc && result)
> - {
> -   tree r = int_const_binop (subcode, result,
> - fold_convert (type, arg2));
> -   if (r == NULL_TREE)
> - result = NULL_TREE;
> -   else if (arith_overflowed_p (subcode, type, result, arg2))
> - overflow = build_one_cst (type);
> - }
> - }
> if (result)
>   {
> if (result == integer_zero_node)
> --- gcc/fold-const-call.cc.jj 2023-06-02 10:36:43.096967505 +0200
> +++ gcc/fold-const-call.cc 2023-06-14 12:56:08.195631214 +0200
> @@ -1669,6 +1669,7 @@ fold_const_call (combined_fn fn, tree ty
>  {
>const char *p0, *p1;
>char c;
> +  tree_code subcode;
>switch (fn)
>  {
>  case CFN_BUILT_IN_STRSPN:
> @@ -1738,6 +1739,46 @@ fold_const_call (combined_fn fn, tree ty
>  case CFN_FOLD_LEFT_PLUS:
>return fold_const_fold_left (type, arg0, arg1, PLUS_EXPR);
>  
> +case CFN_UBSAN_CHECK_ADD:
> +case CFN_ADD_OVERFLOW:
> +  s

Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Richard Biener via Gcc-patches
On Tue, 13 Jun 2023, Jakub Jelinek wrote:

> On Tue, Jun 13, 2023 at 08:40:36AM +, Richard Biener wrote:
> > I suspect re-association can wreck things even more here.  I have
> > to say the matching code is very hard to follow, not sure if
> > splitting out a function matching
> > 
> >_22 = .{ADD,SUB}_OVERFLOW (_6, _5);
> >_23 = REALPART_EXPR <_22>;
> >_24 = IMAGPART_EXPR <_22>;
> > 
> > from _23 and _24 would help?
> 
> I've outlined 3 most often used sequences of statements or checks
> into 3 helper functions, hope that helps.
> 
> > > +  while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3])
> > > + {
> > > +   gimple *g = SSA_NAME_DEF_STMT (rhs[0]);
> > > +   if (has_single_use (rhs[0])
> > > +   && is_gimple_assign (g)
> > > +   && (gimple_assign_rhs_code (g) == code
> > > +   || (code == MINUS_EXPR
> > > +   && gimple_assign_rhs_code (g) == PLUS_EXPR
> > > +   && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST)))
> > > + {
> > > +   rhs[0] = gimple_assign_rhs1 (g);
> > > +   tree &r = rhs[2] ? rhs[3] : rhs[2];
> > > +   r = gimple_assign_rhs2 (g);
> > > +   if (gimple_assign_rhs_code (g) != code)
> > > + r = fold_build1 (NEGATE_EXPR, TREE_TYPE (r), r);
> > 
> > Can you use const_unop here?  In fact both will not reliably
> > negate all constants (ick), so maybe we want a force_const_negate ()?
> 
> It is unsigned type NEGATE_EXPR of INTEGER_CST, so I think it should
> work.  That said, changed it to const_unop and am just giving up on it
> as if it wasn't a PLUS_EXPR with INTEGER_CST addend if const_unop doesn't
> simplify.
> 
> > > +   else if (addc_subc)
> > > + {
> > > +   if (!integer_zerop (arg2))
> > > + ;
> > > +   /* x = y + 0 + 0; x = y - 0 - 0; */
> > > +   else if (integer_zerop (arg1))
> > > + result = arg0;
> > > +   /* x = 0 + y + 0; */
> > > +   else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> > > + result = arg1;
> > > +   /* x = y - y - 0; */
> > > +   else if (subcode == MINUS_EXPR
> > > +&& operand_equal_p (arg0, arg1, 0))
> > > + result = integer_zero_node;
> > > + }
> > 
> > So this all performs simplifications but also constant folding.  In
> > particular the match.pd re-simplification will invoke fold_const_call
> > on all-constant argument function calls but does not do extra folding
> > on partially constant arg cases but instead relies on patterns here.
> > 
> > Can you add all-constant arg handling to fold_const_call and
> > consider moving cases like y + 0 + 0 to match.pd?
> 
> The reason I've done this here is that this is the spot where all other
> similar internal functions are handled, be it the ubsan ones
> - IFN_UBSAN_CHECK_{ADD,SUB,MUL}, or __builtin_*_overflow ones
> - IFN_{ADD,SUB,MUL}_OVERFLOW, or these 2 new ones.  The code handles
> there 2 constant arguments as well as various patterns that can be
> simplified and has code to clean it up later, build a COMPLEX_CST,
> or COMPLEX_EXPR etc. as needed.  So, I think we want to handle those
> elsewhere, we should do it for all of those functions, but then
> probably incrementally.
> 
> > > +@cindex @code{addc@var{m}5} instruction pattern
> > > +@item @samp{addc@var{m}5}
> > > +Adds operands 2, 3 and 4 (where the last operand is guaranteed to have
> > > +only values 0 or 1) together, sets operand 0 to the result of the
> > > +addition of the 3 operands and sets operand 1 to 1 iff there was no
> > > +overflow on the unsigned additions, and to 0 otherwise.  So, it is
> > > +an addition with carry in (operand 4) and carry out (operand 1).
> > > +All operands have the same mode.
> > 
> > operand 1 set to 1 for no overflow sounds weird when specifying it
> > as carry out - can you double check?
> 
> Fixed.
> 
> > > +@cindex @code{subc@var{m}5} instruction pattern
> > > +@item @samp{subc@var{m}5}
> > > +Similarly to @samp{addc@var{m}5}, except subtracts operands 3 and 4
> > > +from operand 2 instead of adding them.  So, it is
> > > +a subtraction with carry/borrow in (operand 4) and carry/borrow out
> > > +(operand 1).  All operands have the same mode.
> > > +
> > 
> > I wonder if we want to name them uaddc and usubc?  Or is this supposed
> > to be simply the twos-complement "carry"?  I think the docs should
> > say so then (note we do have uaddv and addv).
> 
> Makes sense, I've actually renamed even the internal functions etc.
> 
> Here is only lightly tested patch with everything but gimple-fold.cc
> changed.
> 
> 2023-06-13  Jakub Jelinek  
> 
>   PR middle-end/79173
>   * internal-fn.def (UADDC, USUBC): New internal functions.
>   * internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
>   (commutative_ternary_fn_p): Return true also for IFN_UADDC.
>   * optabs.def (uaddc5_optab, usubc5_optab): New optabs.
>   * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
>   match_uaddc_usubc): New functi

Re: RE: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Lehua Ding
> Nit for test.
> +/* { dg-options "-march=rv64gczve32x --param=riscv-autovec-preference=fixed-vlmax" } */
> To
> +/* { dg-options "-march=rv64gc_zve32x --param=riscv-autovec-preference=fixed-vlmax" } */
Fixed in the V2 patch 
(https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621698.html), thank you.


Best,
Lehua
 

Re: [PATCH] [RFC] main loop masked vectorization with --param vect-partial-vector-usage=1

2023-06-14 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> Currently vect_determine_partial_vectors_and_peeling will decide
> to apply fully masking to the main loop despite
> --param vect-partial-vector-usage=1 when the currently analyzed
> vector mode results in a vectorization factor that's bigger
> than the number of scalar iterations.  That's undesirable for
> targets where a vector mode can handle both partial vector and
> non-partial vector vectorization.  I understand that for AARCH64
> we have SVE and NEON but SVE can only do partial vector and
> NEON only non-partial vector vectorization, plus the target
> chooses to let cost comparison decide the vector mode to use.

SVE can do both (and does non-partial for things that can't yet be
predicated, like reversing loads).  But yeah, NEON can only do
non-partial.

> For x86 and the upcoming AVX512 partial vector support the
> story is different, the target chooses the first (and largest)
> vector mode that can successfully be used for vectorization.  But
> that means with --param vect-partial-vector-usage=1 we will
> always choose AVX512 with partial vectors for the main loop
> even if, for example, V4SI would be a perfect fit with full
> vectors and no required epilog!

Sounds like a good candidate for VECT_COMPARE_COSTS.  Did you
try using that?

> The following tries to find the appropriate condition for
> this - I suppose simply refusing to set LOOP_VINFO_USING_PARTIAL_VECTORS_P
> on the main loop when --param vect-partial-vector-usage=1 will
> hurt AARCH64?

Yeah, I'd expect so.

> Incidentially looking up the docs for
> vect-partial-vector-usage suggests that it's not supposed to
> control epilog vectorization but instead
> "1 allows partial vector loads and stores if vectorization removes the
> need for the code to iterate".  That's probably OK in the end
> but if there's a fixed size vector mode that allows the same thing
> without using masking that would be better.
>
> I wonder if we should special-case known niter (bounds) somehow
> when analyzing the vector modes and override the targets sorting?
>
> Maybe we want a new --param in addition to vect-epilogues-nomask
> and vect-partial-vector-usage to say we want masked epilogues?
>
>   * tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
>   For non-VLA vectorization interpret param_vect_partial_vector_usage == 1
>   as only applying to epilogues.
> ---
>  gcc/tree-vect-loop.cc | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 9be66b8fbc5..9323aa572d4 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -2478,7 +2478,15 @@ vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
> && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> && !vect_known_niters_smaller_than_vf (loop_vinfo))
>   LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> -  else
> +  /* Avoid using a large fixed size vectorization mode with masking
> +  for the main loop when we were asked to only use masking for
> +  the epilog.
> +  ???  Ideally we'd start analysis with a better sized mode,
> +  the param_vect_partial_vector_usage == 2 case suffers from
> +  this as well.  But there's a catch-22.  */
> +  else if (!(!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +  && param_vect_partial_vector_usage == 1
> +  && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))

I don't think is_constant is a good thing to test here.  The way things
work for SVE is essentially the same for VL-agnostic and VL-specific.

Also, I think this hard-codes the assumption that the smallest mode
isn't maskable.  Wouldn't it spuriously fail vectorisation if there
was no smaller mode available?

Similarly, it looks like it could fail for AVX512 if processing 511
characters, whereas I'd have expected AVX512 to still be the right
choice there.

If VECT_COMPARE_COSTS seems too expensive, we could try to look
for cases where a vector mode later in the list gives a VF that
is exactly equal to the number of scalar iterations.  (Exactly
*divides* the number of scalar iterations would be less clear-cut IMO.)

But converting a vector mode into a VF isn't trivial with our
current vectoriser structures, so I'm not sure how much of a
saving it would be over VECT_COMPARE_COSTS.  And it would be much
more special-purpose than VECT_COMPARE_COSTS.

Thanks,
Richard


>   LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
>  }


Re: [PATCH 1/3] Inline vect_get_max_nscalars_per_iter

2023-06-14 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> The function is only meaningful for LOOP_VINFO_MASKS processing so
> inline it into the single use.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
>
>   * tree-vect-loop.cc (vect_get_max_nscalars_per_iter): Inline
>   into ...
>   (vect_verify_full_masking): ... this.

I think we did have a use for the separate function internally,
but obviously it was never submitted.  Personally I'd prefer
to keep things as they are though.



> ---
>  gcc/tree-vect-loop.cc | 22 ++
>  1 file changed, 6 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ace9e759f5b..a9695e5b25d 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1117,20 +1117,6 @@ can_produce_all_loop_masks_p (loop_vec_info loop_vinfo, tree cmp_type)
>return true;
>  }
>  
> -/* Calculate the maximum number of scalars per iteration for every
> -   rgroup in LOOP_VINFO.  */
> -
> -static unsigned int
> -vect_get_max_nscalars_per_iter (loop_vec_info loop_vinfo)
> -{
> -  unsigned int res = 1;
> -  unsigned int i;
> -  rgroup_controls *rgm;
> -  FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), i, rgm)
> -res = MAX (res, rgm->max_nscalars_per_iter);
> -  return res;
> -}
> -
>  /* Calculate the minimum precision necessary to represent:
>  
>MAX_NITERS * FACTOR
> @@ -1210,8 +1196,6 @@ static bool
>  vect_verify_full_masking (loop_vec_info loop_vinfo)
>  {
>unsigned int min_ni_width;
> -  unsigned int max_nscalars_per_iter
> -= vect_get_max_nscalars_per_iter (loop_vinfo);
>  
>/* Use a normal loop if there are no statements that need masking.
>   This only happens in rare degenerate cases: it means that the loop
> @@ -1219,6 +1203,12 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
>if (LOOP_VINFO_MASKS (loop_vinfo).is_empty ())
>  return false;
>  
> +  /* Calculate the maximum number of scalars per iteration for every rgroup.  */
> +  unsigned int max_nscalars_per_iter = 1;
> +  for (auto rgm : LOOP_VINFO_MASKS (loop_vinfo))
> +max_nscalars_per_iter
> +  = MAX (max_nscalars_per_iter, rgm.max_nscalars_per_iter);
> +
>/* Work out how many bits we need to represent the limit.  */
>min_ni_width
>  = vect_min_prec_for_max_niters (loop_vinfo, max_nscalars_per_iter);


[PATCH v2] c++: Accept elaborated-enum-base in system headers

2023-06-14 Thread Alex Coplan via Gcc-patches
Hi,

This is a v2 patch addressing feedback for:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621050.html

macOS SDK headers using the CF_ENUM macro can expand to invalid C++ code
of the form:

typedef enum T : BaseType T;

i.e. an elaborated-type-specifier with an additional enum-base.
Upstream LLVM can be made to accept the above construct with
-Wno-error=elaborated-enum-base.

This patch adds the -Welaborated-enum-base warning to GCC and adjusts
the C++ parser to emit this warning instead of rejecting this code
outright.

The macro expansion in the macOS headers occurs in the case that the
compiler declares support for enums with underlying type using
__has_feature, see
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618450.html

GCC rejecting this construct outright means that GCC fails to bootstrap
on Darwin in the case that it (correctly) implements __has_feature and
declares support for C++ enums with underlying type.

With this patch, GCC can bootstrap on Darwin in combination with the
(WIP) __has_feature patch posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html

Bootstrapped/regtested on aarch64-linux-gnu and x86_64-apple-darwin.
OK for trunk?

Thanks,
Alex

gcc/c-family/ChangeLog:

* c.opt (Welaborated-enum-base): New.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_enum_specifier): Don't reject
elaborated-type-specifier with enum-base, instead emit new
Welaborated-enum-base warning.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/enum40.C: Adjust expected diagnostics.
* g++.dg/cpp0x/forw_enum6.C: Likewise.
* g++.dg/cpp0x/elab-enum-base.C: New test.
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index cead1995561..f935665d629 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1488,6 +1488,13 @@ Wsubobject-linkage
 C++ ObjC++ Var(warn_subobject_linkage) Warning Init(1)
 Warn if a class type has a base or a field whose type uses the anonymous 
namespace or depends on a type with no linkage.
 
+Welaborated-enum-base
+C++ ObjC++ Var(warn_elaborated_enum_base) Warning Init(1)
+Warn if an additional enum-base is used in an elaborated-type-specifier.
+That is, if an enum with given underlying type and no enumerator list
+is used in a declaration other than just a standalone declaration of the
+enum.
+
 Wduplicate-decl-specifier
 C ObjC Var(warn_duplicate_decl_specifier) Warning LangEnabledBy(C ObjC,Wall)
 Warn when a declaration has duplicate const, volatile, restrict or _Atomic 
specifier.
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d77fbd20e56..4dd290717de 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -21024,11 +21024,13 @@ cp_parser_enum_specifier (cp_parser* parser)
 
   /* Check for the `:' that denotes a specified underlying type in C++0x.
  Note that a ':' could also indicate a bitfield width, however.  */
+  location_t colon_loc = UNKNOWN_LOCATION;
   if (cp_lexer_next_token_is (parser->lexer, CPP_COLON))
 {
   cp_decl_specifier_seq type_specifiers;
 
   /* Consume the `:'.  */
+  colon_loc = cp_lexer_peek_token (parser->lexer)->location;
   cp_lexer_consume_token (parser->lexer);
 
   auto tdf
@@ -21077,10 +21079,13 @@ cp_parser_enum_specifier (cp_parser* parser)
  && cp_lexer_next_token_is_not (parser->lexer, CPP_SEMICOLON))
{
  if (has_underlying_type)
-   cp_parser_commit_to_tentative_parse (parser);
- cp_parser_error (parser, "expected %<;%> or %<{%>");
- if (has_underlying_type)
-   return error_mark_node;
+   pedwarn (colon_loc,
+OPT_Welaborated_enum_base,
+"declaration of enumeration with "
+"fixed underlying type and no enumerator list is "
+"only permitted as a standalone declaration");
+ else
+   cp_parser_error (parser, "expected %<;%> or %<{%>");
}
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/elab-enum-base.C 
b/gcc/testsuite/g++.dg/cpp0x/elab-enum-base.C
new file mode 100644
index 000..57141f013bd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/elab-enum-base.C
@@ -0,0 +1,7 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+// Empty dg-options to override -pedantic-errors.
+
+typedef long CFIndex;
+typedef enum CFComparisonResult : CFIndex CFComparisonResult;
+// { dg-warning "declaration of enumeration with fixed underlying type" "" { 
target *-*-* } .-1 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/enum40.C 
b/gcc/testsuite/g++.dg/cpp0x/enum40.C
index cfdf2a4a18a..d3ffeb62d70 100644
--- a/gcc/testsuite/g++.dg/cpp0x/enum40.C
+++ b/gcc/testsuite/g++.dg/cpp0x/enum40.C
@@ -4,23 +4,25 @@
 void
 foo ()
 {
-  enum : int a alignas;// { dg-error "expected" }
+  enum : int a alignas;// { dg-error "declaration of enum" }
+  // { dg-error {expected '\(' before ';'} "" { target *-*-* } .-1 }
 }
 
 void
 bar ()
 {
-  enum : int a;   

Re: [PATCH] [RFC] main loop masked vectorization with --param vect-partial-vector-usage=1

2023-06-14 Thread Richard Biener via Gcc-patches
On Wed, 14 Jun 2023, Richard Sandiford wrote:

> Richard Biener via Gcc-patches  writes:
> > Currently vect_determine_partial_vectors_and_peeling will decide
> > to apply fully masking to the main loop despite
> > --param vect-partial-vector-usage=1 when the currently analyzed
> > vector mode results in a vectorization factor that's bigger
> > than the number of scalar iterations.  That's undesirable for
> > targets where a vector mode can handle both partial vector and
> > non-partial vector vectorization.  I understand that for AARCH64
> > we have SVE and NEON but SVE can only do partial vector and
> > NEON only non-partial vector vectorization, plus the target
> > chooses to let cost comparison decide the vector mode to use.
> 
> SVE can do both (and does non-partial for things that can't yet be
> predicated, like reversing loads).  But yeah, NEON can only do
> non-partial.
> 
> > For x86 and the upcoming AVX512 partial vector support the
> > story is different, the target chooses the first (and largest)
> > vector mode that can successfully be used for vectorization.  But
> > that means with --param vect-partial-vector-usage=1 we will
> > always choose AVX512 with partial vectors for the main loop
> > even if, for example, V4SI would be a perfect fit with full
> > vectors and no required epilog!
> 
> Sounds like a good candidate for VECT_COMPARE_COSTS.  Did you
> try using that?

Yeah, I didn't try that because we've never done that and I expect
unrelated "effects" ...

> > The following tries to find the appropriate condition for
> > this - I suppose simply refusing to set LOOP_VINFO_USING_PARTIAL_VECTORS_P
> > on the main loop when --param vect-partial-vector-usage=1 will
> > hurt AARCH64?
> 
> Yeah, I'd expect so.
> 
> > Incidentially looking up the docs for
> > vect-partial-vector-usage suggests that it's not supposed to
> > control epilog vectorization but instead
> > "1 allows partial vector loads and stores if vectorization removes the
> > need for the code to iterate".  That's probably OK in the end
> > but if there's a fixed size vector mode that allows the same thing
> > without using masking that would be better.
> >
> > I wonder if we should special-case known niter (bounds) somehow
> > when analyzing the vector modes and override the targets sorting?
> >
> > Maybe we want a new --param in addition to vect-epilogues-nomask
> > and vect-partial-vector-usage to say we want masked epilogues?
> >
> > * tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
> > For non-VLA vectorization interpret param_vect_partial_vector_usage == 1
> > as only applying to epilogues.
> > ---
> >  gcc/tree-vect-loop.cc | 10 +-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 9be66b8fbc5..9323aa572d4 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -2478,7 +2478,15 @@ vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
> >   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> >   && !vect_known_niters_smaller_than_vf (loop_vinfo))
> > LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> > -  else
> > +  /* Avoid using a large fixed size vectorization mode with masking
> > +for the main loop when we were asked to only use masking for
> > +the epilog.
> > +???  Ideally we'd start analysis with a better sized mode,
> > +the param_vect_partial_vector_usage == 2 case suffers from
> > +this as well.  But there's a catch-22.  */
> > +  else if (!(!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> > +&& param_vect_partial_vector_usage == 1
> > +&& LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
> 
> I don't think is_constant is a good thing to test here.  The way things
> work for SVE is essentially the same for VL-agnostic and VL-specific.
> 
> Also, I think this hard-codes the assumption that the smallest mode
> isn't maskable.  Wouldn't it spuriously fail vectorisation if there
> was no smaller mode available?
> 
> Similarly, it looks like it could fail for AVX512 if processing 511
> characters, whereas I'd have expected AVX512 to still be the right
> choice there.

Possibly, yes.

> If VECT_COMPARE_COSTS seems too expensive, we could try to look
> for cases where a vector mode later in the list gives a VF that
> is exactly equal to the number of scalar iterations.  (Exactly
> *divides* the number of scalar iterations would be less clear-cut IMO.)
> 
> But converting a vector mode into a VF isn't trivial with our
> current vectoriser structures, so I'm not sure how much of a
> saving it would be over VECT_COMPARE_COSTS.  And it would be much
> more special-purpose than VECT_COMPARE_COSTS.

It occurred to me that if NITER is constant or we have a constant bound
on it we could set max_vf to the next higher power-of-two
(and min_vf to the next lower power-of-two?) when doing the
"autodetect" run.  Unfortunately we 

Re: [PATCH 1/3] Inline vect_get_max_nscalars_per_iter

2023-06-14 Thread Richard Biener via Gcc-patches
On Wed, 14 Jun 2023, Richard Sandiford wrote:

> Richard Biener via Gcc-patches  writes:
> > The function is only meaningful for LOOP_VINFO_MASKS processing so
> > inline it into the single use.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> >
> > * tree-vect-loop.cc (vect_get_max_nscalars_per_iter): Inline
> > into ...
> > (vect_verify_full_masking): ... this.
> 
> I think we did have a use for the separate function internally,
> but obviously it was never submitted.  Personally I'd prefer
> to keep things as they are though.

OK - after 3/3 it's no longer "generic" (it wasn't before,
it doesn't inspect the _len groups either), it's only meaningful
for WHILE_ULT style analysis.

> 
> 
> > ---
> >  gcc/tree-vect-loop.cc | 22 ++++++----------------
> >  1 file changed, 6 insertions(+), 16 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index ace9e759f5b..a9695e5b25d 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -1117,20 +1117,6 @@ can_produce_all_loop_masks_p (loop_vec_info loop_vinfo, tree cmp_type)
> >    return true;
> >  }
> >  
> > -/* Calculate the maximum number of scalars per iteration for every
> > -   rgroup in LOOP_VINFO.  */
> > -
> > -static unsigned int
> > -vect_get_max_nscalars_per_iter (loop_vec_info loop_vinfo)
> > -{
> > -  unsigned int res = 1;
> > -  unsigned int i;
> > -  rgroup_controls *rgm;
> > -  FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), i, rgm)
> > -    res = MAX (res, rgm->max_nscalars_per_iter);
> > -  return res;
> > -}
> > -
> >  /* Calculate the minimum precision necessary to represent:
> >  
> >MAX_NITERS * FACTOR
> > @@ -1210,8 +1196,6 @@ static bool
> >  vect_verify_full_masking (loop_vec_info loop_vinfo)
> >  {
> >unsigned int min_ni_width;
> > -  unsigned int max_nscalars_per_iter
> > -= vect_get_max_nscalars_per_iter (loop_vinfo);
> >  
> >/* Use a normal loop if there are no statements that need masking.
> >   This only happens in rare degenerate cases: it means that the loop
> > @@ -1219,6 +1203,12 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
> >if (LOOP_VINFO_MASKS (loop_vinfo).is_empty ())
> >  return false;
> >  
> > +  /* Calculate the maximum number of scalars per iteration for every
> > +     rgroup.  */
> > +  unsigned int max_nscalars_per_iter = 1;
> > +  for (auto rgm : LOOP_VINFO_MASKS (loop_vinfo))
> > +    max_nscalars_per_iter
> > +      = MAX (max_nscalars_per_iter, rgm.max_nscalars_per_iter);
> > +
> >/* Work out how many bits we need to represent the limit.  */
> >min_ni_width
> >  = vect_min_prec_for_max_niters (loop_vinfo, max_nscalars_per_iter);
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH] middle-end: Move constant args folding of .UBSAN_CHECK_* and .*_OVERFLOW into fold-const-call.cc

2023-06-14 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 14, 2023 at 12:25:46PM +, Richard Biener wrote:
> I think that's still very much desirable so this followup looks OK.
> Maybe you can re-base it as prerequisite though?

Rebased then (of course with the UADDC/USUBC handling removed from this
first patch, will be added in the second one).

Ok for trunk if it passes bootstrap/regtest?

2023-06-14  Jakub Jelinek  

* gimple-fold.cc (gimple_fold_call): Move handling of arg0
as well as arg1 INTEGER_CSTs for .UBSAN_CHECK_{ADD,SUB,MUL}
and .{ADD,SUB,MUL}_OVERFLOW calls from here...
* fold-const-call.cc (fold_const_call): ... here.

--- gcc/gimple-fold.cc.jj   2023-06-13 18:23:37.199793275 +0200
+++ gcc/gimple-fold.cc  2023-06-14 15:41:51.090987708 +0200
@@ -5702,22 +5702,6 @@ gimple_fold_call (gimple_stmt_iterator *
result = arg0;
  else if (subcode == MULT_EXPR && integer_onep (arg0))
result = arg1;
- else if (TREE_CODE (arg0) == INTEGER_CST
-  && TREE_CODE (arg1) == INTEGER_CST)
-   {
- if (cplx_result)
-   result = int_const_binop (subcode, fold_convert (type, arg0),
- fold_convert (type, arg1));
- else
-   result = int_const_binop (subcode, arg0, arg1);
- if (result && arith_overflowed_p (subcode, type, arg0, arg1))
-   {
- if (cplx_result)
-   overflow = build_one_cst (type);
- else
-   result = NULL_TREE;
-   }
-   }
  if (result)
{
  if (result == integer_zero_node)
--- gcc/fold-const-call.cc.jj   2023-06-02 10:36:43.096967505 +0200
+++ gcc/fold-const-call.cc  2023-06-14 15:40:34.388064498 +0200
@@ -1669,6 +1669,7 @@ fold_const_call (combined_fn fn, tree ty
 {
   const char *p0, *p1;
   char c;
+  tree_code subcode;
   switch (fn)
 {
 case CFN_BUILT_IN_STRSPN:
@@ -1738,6 +1739,46 @@ fold_const_call (combined_fn fn, tree ty
 case CFN_FOLD_LEFT_PLUS:
   return fold_const_fold_left (type, arg0, arg1, PLUS_EXPR);
 
+case CFN_UBSAN_CHECK_ADD:
+case CFN_ADD_OVERFLOW:
+  subcode = PLUS_EXPR;
+  goto arith_overflow;
+
+case CFN_UBSAN_CHECK_SUB:
+case CFN_SUB_OVERFLOW:
+  subcode = MINUS_EXPR;
+  goto arith_overflow;
+
+case CFN_UBSAN_CHECK_MUL:
+case CFN_MUL_OVERFLOW:
+  subcode = MULT_EXPR;
+  goto arith_overflow;
+
+arith_overflow:
+  if (integer_cst_p (arg0) && integer_cst_p (arg1))
+   {
+ tree itype
+   = TREE_CODE (type) == COMPLEX_TYPE ? TREE_TYPE (type) : type;
+ bool ovf = false;
+ tree r = int_const_binop (subcode, fold_convert (itype, arg0),
+   fold_convert (itype, arg1));
+ if (!r || TREE_CODE (r) != INTEGER_CST)
+   return NULL_TREE;
+ if (arith_overflowed_p (subcode, itype, arg0, arg1))
+   ovf = true;
+ if (TREE_OVERFLOW (r))
+   r = drop_tree_overflow (r);
+ if (itype == type)
+   {
+ if (ovf)
+   return NULL_TREE;
+ return r;
+   }
+ else
+   return build_complex (type, r, build_int_cst (itype, ovf));
+   }
+  return NULL_TREE;
+
 default:
   return fold_const_call_1 (fn, type, arg0, arg1);
 }


Jakub



Re: [PATCH] middle-end: Move constant args folding of .UBSAN_CHECK_* and .*_OVERFLOW into fold-const-call.cc

2023-06-14 Thread Richard Biener via Gcc-patches
On Wed, 14 Jun 2023, Jakub Jelinek wrote:

> On Wed, Jun 14, 2023 at 12:25:46PM +, Richard Biener wrote:
> > I think that's still very much desirable so this followup looks OK.
> > Maybe you can re-base it as prerequisite though?
> 
> Rebased then (of course with the UADDC/USUBC handling removed from this
> first patch, will be added in the second one).
> 
> Ok for trunk if it passes bootstrap/regtest?

OK.

Thanks,
Richard.

> 2023-06-14  Jakub Jelinek  
> 
>   * gimple-fold.cc (gimple_fold_call): Move handling of arg0
>   as well as arg1 INTEGER_CSTs for .UBSAN_CHECK_{ADD,SUB,MUL}
>   and .{ADD,SUB,MUL}_OVERFLOW calls from here...
>   * fold-const-call.cc (fold_const_call): ... here.
> 
> --- gcc/gimple-fold.cc.jj 2023-06-13 18:23:37.199793275 +0200
> +++ gcc/gimple-fold.cc2023-06-14 15:41:51.090987708 +0200
> @@ -5702,22 +5702,6 @@ gimple_fold_call (gimple_stmt_iterator *
>   result = arg0;
> else if (subcode == MULT_EXPR && integer_onep (arg0))
>   result = arg1;
> -   else if (TREE_CODE (arg0) == INTEGER_CST
> -&& TREE_CODE (arg1) == INTEGER_CST)
> - {
> -   if (cplx_result)
> - result = int_const_binop (subcode, fold_convert (type, arg0),
> -   fold_convert (type, arg1));
> -   else
> - result = int_const_binop (subcode, arg0, arg1);
> -   if (result && arith_overflowed_p (subcode, type, arg0, arg1))
> - {
> -   if (cplx_result)
> - overflow = build_one_cst (type);
> -   else
> - result = NULL_TREE;
> - }
> - }
> if (result)
>   {
> if (result == integer_zero_node)
> --- gcc/fold-const-call.cc.jj 2023-06-02 10:36:43.096967505 +0200
> +++ gcc/fold-const-call.cc2023-06-14 15:40:34.388064498 +0200
> @@ -1669,6 +1669,7 @@ fold_const_call (combined_fn fn, tree ty
>  {
>const char *p0, *p1;
>char c;
> +  tree_code subcode;
>switch (fn)
>  {
>  case CFN_BUILT_IN_STRSPN:
> @@ -1738,6 +1739,46 @@ fold_const_call (combined_fn fn, tree ty
>  case CFN_FOLD_LEFT_PLUS:
>return fold_const_fold_left (type, arg0, arg1, PLUS_EXPR);
>  
> +case CFN_UBSAN_CHECK_ADD:
> +case CFN_ADD_OVERFLOW:
> +  subcode = PLUS_EXPR;
> +  goto arith_overflow;
> +
> +case CFN_UBSAN_CHECK_SUB:
> +case CFN_SUB_OVERFLOW:
> +  subcode = MINUS_EXPR;
> +  goto arith_overflow;
> +
> +case CFN_UBSAN_CHECK_MUL:
> +case CFN_MUL_OVERFLOW:
> +  subcode = MULT_EXPR;
> +  goto arith_overflow;
> +
> +arith_overflow:
> +  if (integer_cst_p (arg0) && integer_cst_p (arg1))
> + {
> +   tree itype
> + = TREE_CODE (type) == COMPLEX_TYPE ? TREE_TYPE (type) : type;
> +   bool ovf = false;
> +   tree r = int_const_binop (subcode, fold_convert (itype, arg0),
> + fold_convert (itype, arg1));
> +   if (!r || TREE_CODE (r) != INTEGER_CST)
> + return NULL_TREE;
> +   if (arith_overflowed_p (subcode, itype, arg0, arg1))
> + ovf = true;
> +   if (TREE_OVERFLOW (r))
> + r = drop_tree_overflow (r);
> +   if (itype == type)
> + {
> +   if (ovf)
> + return NULL_TREE;
> +   return r;
> + }
> +   else
> + return build_complex (type, r, build_int_cst (itype, ovf));
> + }
> +  return NULL_TREE;
> +
>  default:
>return fold_const_call_1 (fn, type, arg0, arg1);
>  }
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Jakub Jelinek via Gcc-patches
Hi!

On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote:
> At this point two pages of code without a comment - can you introduce
> some vertical spacing and comments as to what is matched now?  The
> split out functions help somewhat but the code is far from obvious :/
> 
> Maybe I'm confused by the loops and instead of those sth like
> 
>  if (match_x_y_z (op0)
>  || match_x_y_z (op1))
>...
> 
> would be easier to follow with the loop bodies split out?
> Maybe put just put them in lambdas even?
> 
> I guess you'll be around as long as myself so we can go with
> this code under the premise you're going to maintain it - it's
> not that I'm writing trivially to understand code myself ...

As I said on IRC, I don't really know how to split that into further
functions, the problem is that we need to pattern match a lot of
statements and it is hard to come up with names for each of them.
And we need quite a lot of variables for checking their interactions.

The code isn't that much different from say match_arith_overflow or
optimize_spaceship or other larger pattern recognizers.  And the
intent is that all the code paths in the recognizer are actually covered
by the testcases in the testsuite.

That said, I've added 18 new comments to the function, and rebased it
on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621717.html
patch with all constant arguments handling moved to fold-const-call.cc
even for the new ifns.

Ok for trunk like this if it passes bootstrap/regtest?

2023-06-13  Jakub Jelinek  

PR middle-end/79173
* internal-fn.def (UADDC, USUBC): New internal functions.
* internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
(commutative_ternary_fn_p): Return true also for IFN_UADDC.
* optabs.def (uaddc5_optab, usubc5_optab): New optabs.
* tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
match_uaddc_usubc): New functions.
(math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
other optimizations have been successful for those.
* gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
* fold-const-call.cc (fold_const_call): Likewise.
* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
* doc/md.texi (uaddc5, usubc5): Document new named
patterns.
* config/i386/i386.md (subborrow): Add alternative with
memory destination.
(uaddc5, usubc5): New define_expand patterns.
(*sub_3, @add3_carry, addcarry, @sub3_carry,
subborrow, *add3_cc_overflow_1): Add define_peephole2
TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
destination in these patterns.

* gcc.target/i386/pr79173-1.c: New test.
* gcc.target/i386/pr79173-2.c: New test.
* gcc.target/i386/pr79173-3.c: New test.
* gcc.target/i386/pr79173-4.c: New test.
* gcc.target/i386/pr79173-5.c: New test.
* gcc.target/i386/pr79173-6.c: New test.
* gcc.target/i386/pr79173-7.c: New test.
* gcc.target/i386/pr79173-8.c: New test.
* gcc.target/i386/pr79173-9.c: New test.
* gcc.target/i386/pr79173-10.c: New test.

--- gcc/internal-fn.def.jj  2023-06-13 18:23:37.208793152 +0200
+++ gcc/internal-fn.def 2023-06-14 12:21:38.650657857 +0200
@@ -416,6 +416,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
 DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (UADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (USUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
 DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
--- gcc/internal-fn.cc.jj   2023-06-13 18:23:37.206793179 +0200
+++ gcc/internal-fn.cc  2023-06-14 12:21:38.652657829 +0200
@@ -2776,6 +2776,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
   expand_arith_overflow (MULT_EXPR, stmt);
 }
 
+/* Expand UADDC STMT.  */
+
+static void
+expand_UADDC (internal_fn ifn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg1 = gimple_call_arg (stmt, 0);
+  tree arg2 = gimple_call_arg (stmt, 1);
+  tree arg3 = gimple_call_arg (stmt, 2);
+  tree type = TREE_TYPE (arg1);
+  machine_mode mode = TYPE_MODE (type);
+  insn_code icode = optab_handler (ifn == IFN_UADDC
+  ? uaddc5_optab : usubc5_optab, mode);
+  rtx op1 = expand_normal (arg1);
+  rtx op2 = expand_normal (arg2);
+  rtx op3 = expand_normal (arg3);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE

[PATCH] c++: tweak c++17 ctor/conversion tiebreaker [DR2327]

2023-06-14 Thread Jason Merrill via Gcc-patches
In discussion of this issue CWG decided that the change of behavior on
well-formed code like overload-conv-4.C is undesirable.  In further
discussion of possible resolutions, we discovered that we can avoid that
change while still getting the desired behavior on overload-conv-3.C by
making this a tiebreaker after comparing conversions, rather than before.
This also simplifies the implementation.

The issue resolution has not yet been finalized, but this seems like a clear
improvement.

DR 2327

gcc/cp/ChangeLog:

* call.cc (joust_maybe_elide_copy): Don't change cand.
(joust): Move the elided tiebreaker later.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/overload-conv-4.C: Remove warnings.
* g++.dg/cpp1z/elide7.C: New test.
---
 gcc/cp/call.cc   | 56 
 gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C |  5 +-
 gcc/testsuite/g++.dg/cpp1z/elide7.C  | 14 +
 3 files changed, 39 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/elide7.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 68cf878308e..15a3d6f2a1f 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -12560,11 +12560,11 @@ add_warning (struct z_candidate *winner, struct z_candidate *loser)
 }
 
 /* CAND is a constructor candidate in joust in C++17 and up.  If it copies a
-   prvalue returned from a conversion function, replace CAND with the candidate
-   for the conversion and return true.  Otherwise, return false.  */
+   prvalue returned from a conversion function, return true.  Otherwise, return
+   false.  */
 
 static bool
-joust_maybe_elide_copy (z_candidate *&cand)
+joust_maybe_elide_copy (z_candidate *cand)
 {
   tree fn = cand->fn;
   if (!DECL_COPY_CONSTRUCTOR_P (fn) && !DECL_MOVE_CONSTRUCTOR_P (fn))
@@ -12580,10 +12580,7 @@ joust_maybe_elide_copy (z_candidate *&cand)
   (conv->type, DECL_CONTEXT (fn)));
   z_candidate *uc = conv->cand;
   if (DECL_CONV_FN_P (uc->fn))
-   {
- cand = uc;
- return true;
-   }
+   return true;
 }
   return false;
 }
@@ -12735,27 +12732,6 @@ joust (struct z_candidate *cand1, struct z_candidate *cand2, bool warn,
}
 }
 
-  /* Handle C++17 copy elision in [over.match.ctor] (direct-init) context.  The
-     standard currently says that only constructors are candidates, but if one
-     copies a prvalue returned by a conversion function we want to treat the
-     conversion as the candidate instead.
-
-     Clang does something similar, as discussed at
-     http://lists.isocpp.org/core/2017/10/3166.php
-     http://lists.isocpp.org/core/2019/03/5721.php  */
-  int elided_tiebreaker = 0;
-  if (len == 1 && cxx_dialect >= cxx17
-  && DECL_P (cand1->fn)
-  && DECL_COMPLETE_CONSTRUCTOR_P (cand1->fn)
-  && !(cand1->flags & LOOKUP_ONLYCONVERTING))
-{
-  bool elided1 = joust_maybe_elide_copy (cand1);
-  bool elided2 = joust_maybe_elide_copy (cand2);
-  /* As a tiebreaker below we will prefer a constructor to a conversion
-     operator exposed this way.  */
-  elided_tiebreaker = elided2 - elided1;
-}
-
   for (i = 0; i < len; ++i)
 {
   conversion *t1 = cand1->convs[i + off1];
@@ -12917,11 +12893,6 @@ joust (struct z_candidate *cand1, struct z_candidate *cand2, bool warn,
   if (winner)
 return winner;
 
-  /* Put this tiebreaker first, so that we don't try to look at second_conv of
-     a constructor candidate that doesn't have one.  */
-  if (elided_tiebreaker)
-return elided_tiebreaker;
-
   /* DR 495 moved this tiebreaker above the template ones.  */
   /* or, if not that,
  the  context  is  an  initialization by user-defined conversion (see
@@ -12958,6 +12929,25 @@ joust (struct z_candidate *cand1, struct z_candidate *cand2, bool warn,
   }
   }
 
+  /* DR2327: C++17 copy elision in [over.match.ctor] (direct-init) context.
+     The standard currently says that only constructors are candidates, but if
+     one copies a prvalue returned by a conversion function we prefer that.
+
+     Clang does something similar, as discussed at
+     http://lists.isocpp.org/core/2017/10/3166.php
+     http://lists.isocpp.org/core/2019/03/5721.php  */
+  if (len == 1 && cxx_dialect >= cxx17
+  && DECL_P (cand1->fn)
+  && DECL_COMPLETE_CONSTRUCTOR_P (cand1->fn)
+  && !(cand1->flags & LOOKUP_ONLYCONVERTING))
+{
+  bool elided1 = joust_maybe_elide_copy (cand1);
+  bool elided2 = joust_maybe_elide_copy (cand2);
+  winner = elided1 - elided2;
+  if (winner)
+   return winner;
+}
+
   /* or, if not that,
  F1 is a non-template function and F2 is a template function
  specialization.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C b/gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C
index 6fcdbbaa6a4..d2663e6cb20 100644
--- a/gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C
+++ b/gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C

Re: [PATCH] libstdc++: Clarify manual demangle doc

2023-06-14 Thread Jonathan Wakely via Gcc-patches
On Sat, 10 Jun 2023 at 23:04, Jonny Grant wrote:
>
> libstdc++-v3/ChangeLog:
>
> * doc/xml/manual/extensions.xml: Remove demangle exception 
> description and include.

Thanks, pushed to trunk.

>
> ---
>  libstdc++-v3/doc/xml/manual/extensions.xml | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/doc/xml/manual/extensions.xml 
> b/libstdc++-v3/doc/xml/manual/extensions.xml
> index daa98f5cba7..d4fe2f509d4 100644
> --- a/libstdc++-v3/doc/xml/manual/extensions.xml
> +++ b/libstdc++-v3/doc/xml/manual/extensions.xml
> @@ -514,12 +514,10 @@ get_temporary_buffer(5, (int*)0);
>  you won't notice.)
>
>
> -Probably the only times you'll be interested in demangling at runtime
> -are when you're seeing typeid strings in RTTI, or when
> -you're handling the runtime-support exception classes.  For example:
> +Probably the only time you'll be interested in demangling at runtime
> +is when you're seeing typeid strings in RTTI.  For example:
>
> 
> -#include 
>  #include 
>  #include 
>  #include 
> --
> 2.37.2


[PATCH] RISC-V: testsuite: Add vector_hw and zvfh_hw checks.

2023-06-14 Thread Robin Dapp via Gcc-patches
Hi,

this introduces new checks for run tests.  Currently we have
riscv_vector as well as rv32 and rv64 which all check if GCC (with the
current configuration) can build (not execute) the respective tests.

Many tests specify e.g. a different -march for vector, though.  So the
check fails even though we could build as well as run the tests (i.e.
when qemu and binfmt are set up properly).

The new vector_hw now tries to compile, link and execute a simple
vector example.  If this succeeds the respective test can run.

Similarly we introduce a zvfh_hw check which will be used in the
upcoming floating-point unop/binop tests as well as rv32_hw and
rv64_hw checks that are currently unused.

I have already requested feedback from some of you individually and
would kindly ask whether this works for folks (or perhaps already does
without any changes?).
With my current gcc configuration (e.g. --target=riscv64-unknown-linux-gnu
--with-sysroot) the riscv_vector check fails and consequently
no vector test is run (UNSUPPORTED).  With the new riscv_vector_hw
check everything runs on my machine.

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-run.c: Use
riscv_vector_hw.
* gcc.target/riscv/rvv/autovec/binop/shift-scalar-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vand-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vor-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vxor-run.c: Dito.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: Dito.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-4.c: Dito.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c:
Dito.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-run.c: Dito.
* gcc.target/riscv/rvv/autovec/conversions/vsext-run.c: Dito.
* gcc.target/riscv/rvv/autovec/conversions/vzext-run.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: Dito.
* gcc.target/riscv/rvv/autovec/series_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: Dito.
* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Dito.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Dito.
* gcc.target/riscv/rvv/autovec/unop/vnot-run.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1-run.c:
Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c:
Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c:
Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c:
Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/insert_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/insert_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_ru

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-14 Thread Andrew Stubbs

On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:

This implements fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both is much like GCN).

AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV like SVE has with while_ult.
Instead the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation, a suitable mask test instruction is available).


This also sounds like GCN. We currently use WHILE_ULT in the middle 
end which expands to a vector compare against a vector of stepped 
values. This requires an additional instruction to prepare the 
comparison vector (compared to SVE), but the "while_ultv64sidi" pattern 
(for example) returns the DImode bitmask, so it works reasonably well.



Like RVV code generation prefers a decrementing IV though IVOPTs
messes things up in some cases removing that IV to eliminate
it with an incrementing one used for address generation.

One of the motivating testcases is from PR108410 which in turn
is extracted from x264 where large size vectorization shows
issues with small trip loops.  Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of less than 32 iterations.

size  scalar    128    256    512   512e   512f
   1    9.42  11.32   9.35  11.17  15.13  16.89
   2    5.72   6.53   6.66   6.66   7.62   8.56
   3    4.49   5.10   5.10   5.74   5.08   5.73
   4    4.10   4.33   4.29   5.21   3.79   4.25
   6    3.78   3.85   3.86   4.76   2.54   2.85
   8    3.64   1.89   3.76   4.50   1.92   2.16
  12    3.56   2.21   3.75   4.26   1.26   1.42
  16    3.36   0.83   1.06   4.16   0.95   1.07
  20    3.39   1.42   1.33   4.07   0.75   0.85
  24    3.23   0.66   1.72   4.22   0.62   0.70
  28    3.18   1.09   2.04   4.20   0.54   0.61
  32    3.16   0.47   0.41   0.41   0.47   0.53
  34    3.16   0.67   0.61   0.56   0.44   0.50
  38    3.19   0.95   0.95   0.82   0.40   0.45
  42    3.09   0.58   1.21   1.13   0.36   0.40

'size' specifies the number of actual iterations, 512e is for
a masked epilog and 512f for the fully masked loop.  From
4 scalar iterations on, the AVX512 masked epilog code is clearly
the winner; the fully masked variant is clearly worse and
its size benefit is also tiny.


Let me check I understand correctly. In the fully masked case, there is 
a single loop in which a new mask is generated at the start of each 
iteration. In the masked epilogue case, the main loop uses no masking 
whatsoever, thus avoiding the need for generating a mask, carrying the 
mask, inserting vec_merge operations, etc, and then the epilogue looks 
much like the fully masked case, but unlike smaller mode epilogues there 
is no loop because the epilogue vector size is the same. Is that right?


This scheme seems like it might also benefit GCN, in so much as it 
simplifies the hot code path.


GCN does not actually have smaller vector sizes, so there's no analogue 
to AVX2 (we pretend we have some smaller sizes, but that's because the 
middle end can't do masking everywhere yet, and it helps make some 
vector constants smaller, perhaps).



This patch does not enable using fully masked loops or
masked epilogues by default.  More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.

Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
which could be exploited further to unify some of the flags
we have right now but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.

Mask requirements as registered by vect_record_loop_mask are kept in their
original form and recorded in a hash_set now instead of being
processed to a vector of rgroup_controls.  Instead that's now
left to the final analysis phase which tries forming the rgroup_controls
vector using while_ult and if that fails now tries AVX512 style
which needs a different organization and instead fills a hash_map
with the relevant info.  vect_get_loop_mask now has two implementations,
one for the two mask styles we then have.

I have decided against interweaving vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to
"duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.

I was split between making 'vec_loop_masks' a class with methods,
possibly merging in the _len stuff into a single registry.  It
seemed to be too many changes for the purpose of getting AVX512
working.  I'm going to play wait and see what happens with RISC-V
here since they are going to get both masks and lengths registered
I think.

The vect_prepare_for_

Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Richard Biener via Gcc-patches



> On 14.06.2023 at 16:00, Jakub Jelinek wrote:
> 
> Hi!
> 
>> On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote:
>> At this point two pages of code without a comment - can you introduce
>> some vertical spacing and comments as to what is matched now?  The
>> split out functions help somewhat but the code is far from obvious :/
>> 
>> Maybe I'm confused by the loops and instead of those sth like
>> 
>> if (match_x_y_z (op0)
>> || match_x_y_z (op1))
>>   ...
>> 
>> would be easier to follow with the loop bodies split out?
>> Maybe put just put them in lambdas even?
>> 
>> I guess you'll be around as long as myself so we can go with
>> this code under the premise you're going to maintain it - it's
>> not that I'm writing trivially to understand code myself ...
> 
> As I said on IRC, I don't really know how to split that into further
> functions, the problem is that we need to pattern match a lot of
> statements and it is hard to come up with names for each of them.
> And we need quite a lot of variables for checking their interactions.
> 
> The code isn't that much different from say match_arith_overflow or
> optimize_spaceship or other larger pattern recognizers.  And the
> intent is that all the code paths in the recognizer are actually covered
> by the testcases in the testsuite.
> 
> That said, I've added 18 new comments to the function, and rebased it
> on top of the
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621717.html
> patch with all constant arguments handling moved to fold-const-call.cc
> even for the new ifns.
> 
> Ok for trunk like this if it passes bootstrap/regtest?

Ok.

Thanks,
Richard 

> 2023-06-13  Jakub Jelinek  
> 
> PR middle-end/79173
> * internal-fn.def (UADDC, USUBC): New internal functions.
> * internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
> (commutative_ternary_fn_p): Return true also for IFN_UADDC.
> * optabs.def (uaddc5_optab, usubc5_optab): New optabs.
> * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
> match_uaddc_usubc): New functions.
> (math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
> for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
> other optimizations have been successful for those.
> * gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
> * fold-const-call.cc (fold_const_call): Likewise.
> * gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
> * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
> * doc/md.texi (uaddc5, usubc5): Document new named
> patterns.
> * config/i386/i386.md (subborrow): Add alternative with
> memory destination.
> (uaddc5, usubc5): New define_expand patterns.
> (*sub_3, @add3_carry, addcarry, @sub3_carry,
> subborrow, *add3_cc_overflow_1): Add define_peephole2
> TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
> destination in these patterns.
> 
> * gcc.target/i386/pr79173-1.c: New test.
> * gcc.target/i386/pr79173-2.c: New test.
> * gcc.target/i386/pr79173-3.c: New test.
> * gcc.target/i386/pr79173-4.c: New test.
> * gcc.target/i386/pr79173-5.c: New test.
> * gcc.target/i386/pr79173-6.c: New test.
> * gcc.target/i386/pr79173-7.c: New test.
> * gcc.target/i386/pr79173-8.c: New test.
> * gcc.target/i386/pr79173-9.c: New test.
> * gcc.target/i386/pr79173-10.c: New test.
> 
> --- gcc/internal-fn.def.jj 2023-06-13 18:23:37.208793152 +0200
> +++ gcc/internal-fn.def 2023-06-14 12:21:38.650657857 +0200
> @@ -416,6 +416,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
> DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (UADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (USUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
> DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> --- gcc/internal-fn.cc.jj 2023-06-13 18:23:37.206793179 +0200
> +++ gcc/internal-fn.cc 2023-06-14 12:21:38.652657829 +0200
> @@ -2776,6 +2776,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
>   expand_arith_overflow (MULT_EXPR, stmt);
> }
> 
> +/* Expand UADDC STMT.  */
> +
> +static void
> +expand_UADDC (internal_fn ifn, gcall *stmt)
> +{
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree arg1 = gimple_call_arg (stmt, 0);
> +  tree arg2 = gimple_call_arg (stmt, 1);
> +  tree arg3 = gimple_call_arg (stmt, 2);
> +  tree type = TREE_TYPE (arg1);
> +  machine_mode mode = TYPE_MODE (type);
> +  insn_code icode = optab_handler (ifn == IFN_UADDC
> +   ? uaddc5_optab : usubc5_optab, mode);
> +  rtx op1 = expand_normal (arg1);
> +  rtx op2 = expand_normal (arg2);
> +  rtx op3 = ex

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-14 Thread Richard Biener via Gcc-patches



> Am 14.06.2023 um 16:27 schrieb Andrew Stubbs :
> 
> On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
>> This implemens fully masked vectorization or a masked epilog for
>> AVX512 style masks which single themselves out by representing
>> each lane with a single bit and by using integer modes for the mask
>> (both is much like GCN).
>> AVX512 is also special in that it doesn't have any instruction
>> to compute the mask from a scalar IV like SVE has with while_ult.
>> Instead the masks are produced by vector compares and the loop
>> control retains the scalar IV (mainly to avoid dependences on
>> mask generation, a suitable mask test instruction is available).
> 
> This also sounds like GCN. We currently use WHILE_ULT in the middle end 
> which expands to a vector compare against a vector of stepped values. This 
> requires an additional instruction to prepare the comparison vector (compared 
> to SVE), but the "while_ultv64sidi" pattern (for example) returns the DImode 
> bitmask, so it works reasonably well.
> 
>> Like RVV code generation prefers a decrementing IV though IVOPTs
>> messes things up in some cases removing that IV to eliminate
>> it with an incrementing one used for address generation.
>> One of the motivating testcases is from PR108410 which in turn
>> is extracted from x264 where large size vectorization shows
>> issues with small trip loops.  Execution time there improves
>> compared to classic AVX512 with AVX2 epilogues for the cases
>> of less than 32 iterations.
>> size   scalar     128     256     512    512e    512f
>>    1     9.42   11.32    9.35   11.17   15.13   16.89
>>    2     5.72    6.53    6.66    6.66    7.62    8.56
>>    3     4.49    5.10    5.10    5.74    5.08    5.73
>>    4     4.10    4.33    4.29    5.21    3.79    4.25
>>    6     3.78    3.85    3.86    4.76    2.54    2.85
>>    8     3.64    1.89    3.76    4.50    1.92    2.16
>>   12     3.56    2.21    3.75    4.26    1.26    1.42
>>   16     3.36    0.83    1.06    4.16    0.95    1.07
>>   20     3.39    1.42    1.33    4.07    0.75    0.85
>>   24     3.23    0.66    1.72    4.22    0.62    0.70
>>   28     3.18    1.09    2.04    4.20    0.54    0.61
>>   32     3.16    0.47    0.41    0.41    0.47    0.53
>>   34     3.16    0.67    0.61    0.56    0.44    0.50
>>   38     3.19    0.95    0.95    0.82    0.40    0.45
>>   42     3.09    0.58    1.21    1.13    0.36    0.40
>> 'size' specifies the number of actual iterations, 512e is for
>> a masked epilog and 512f for the fully masked loop.  From
>> 4 scalar iterations on the AVX512 masked epilog code is clearly
>> the winner, the fully masked variant is clearly worse and
>> its size benefit is also tiny.
> 
> Let me check I understand correctly. In the fully masked case, there is a 
> single loop in which a new mask is generated at the start of each iteration. 
> In the masked epilogue case, the main loop uses no masking whatsoever, thus 
> avoiding the need for generating a mask, carrying the mask, inserting 
> vec_merge operations, etc, and then the epilogue looks much like the fully 
> masked case, but unlike smaller mode epilogues there is no loop because the 
> epilogue vector size is the same. Is that right?

Yes.

> This scheme seems like it might also benefit GCN, in so much as it simplifies 
> the hot code path.
> 
> GCN does not actually have smaller vector sizes, so there's no analogue to 
> AVX2 (we pretend we have some smaller sizes, but that's because the middle 
> end can't do masking everywhere yet, and it helps make some vector constants 
> smaller, perhaps).
> 
>> This patch does not enable using fully masked loops or
>> masked epilogues by default.  More work on cost modeling
>> and vectorization kind selection on x86_64 is necessary
>> for this.
>> Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
>> which could be exploited further to unify some of the flags
>> we have right now but there didn't seem to be many easy things
>> to merge, so I'm leaving this for followups.
>> Mask requirements as registered by vect_record_loop_mask are kept in their
>> original form and recorded in a hash_set now instead of being
>> processed to a vector of rgroup_controls.  Instead that's now
>> left to the final analysis phase which tries forming the rgroup_controls
>> vector using while_ult and if that fails now tries AVX512 style
>> which needs a different organization and instead fills a hash_map
>> with the relevant info.  vect_get_loop_mask now has two implementations,
>> one for the two mask styles we then have.
>> I have decided against interweaving vect_set_loop_condition_partial_vectors
>> with conditions to do AVX512 style masking and instead opted to
>> "duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
>> Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.
>> I was split between making 'vec_loop_masks' a class with methods,
>> possibly merging in the _len stuff

Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 14, 2023 at 4:00 PM Jakub Jelinek  wrote:
>
> Hi!
>
> On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote:
> > At this point two pages of code without a comment - can you introduce
> > some vertical spacing and comments as to what is matched now?  The
> > split out functions help somewhat but the code is far from obvious :/
> >
> > Maybe I'm confused by the loops and instead of those sth like
> >
> >  if (match_x_y_z (op0)
> >  || match_x_y_z (op1))
> >...
> >
> > would be easier to follow with the loop bodies split out?
> > Maybe just put them in lambdas even?
> >
> > I guess you'll be around as long as myself so we can go with
> > this code under the premise you're going to maintain it - it's
> > not that I'm writing trivially to understand code myself ...
>
> As I said on IRC, I don't really know how to split that into further
> functions, the problem is that we need to pattern match a lot of
> statements and it is hard to come up with names for each of them.
> And we need quite a lot of variables for checking their interactions.
>
> The code isn't that much different from say match_arith_overflow or
> optimize_spaceship or other larger pattern recognizers.  And the
> intent is that all the code paths in the recognizer are actually covered
> by the testcases in the testsuite.
>
> That said, I've added 18 new comments to the function, and rebased it
> on top of the
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621717.html
> patch with all constant arguments handling moved to fold-const-call.cc
> even for the new ifns.
>
> Ok for trunk like this if it passes bootstrap/regtest?
>
> 2023-06-13  Jakub Jelinek  
>
> PR middle-end/79173
> * internal-fn.def (UADDC, USUBC): New internal functions.
> * internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
> (commutative_ternary_fn_p): Return true also for IFN_UADDC.
> * optabs.def (uaddc5_optab, usubc5_optab): New optabs.
> * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
> match_uaddc_usubc): New functions.
> (math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
> for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
> other optimizations have been successful for those.
> * gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
> * fold-const-call.cc (fold_const_call): Likewise.
> * gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
> * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
> * doc/md.texi (uaddc5, usubc5): Document new named
> patterns.
> * config/i386/i386.md (subborrow): Add alternative with
> memory destination.
> (uaddc5, usubc5): New define_expand patterns.
> (*sub_3, @add3_carry, addcarry, @sub3_carry,
> subborrow, *add3_cc_overflow_1): Add define_peephole2
> TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
> destination in these patterns.
>
> * gcc.target/i386/pr79173-1.c: New test.
> * gcc.target/i386/pr79173-2.c: New test.
> * gcc.target/i386/pr79173-3.c: New test.
> * gcc.target/i386/pr79173-4.c: New test.
> * gcc.target/i386/pr79173-5.c: New test.
> * gcc.target/i386/pr79173-6.c: New test.
> * gcc.target/i386/pr79173-7.c: New test.
> * gcc.target/i386/pr79173-8.c: New test.
> * gcc.target/i386/pr79173-9.c: New test.
> * gcc.target/i386/pr79173-10.c: New test.

LGTM for the x86 part. I did my best, but those peephole2 patterns are
real PITA to be reviewed thoroughly.

Maybe split out peephole2 pack to a separate patch, followed by a
testcase patch. This way, bisection would be able to point out if a
generic part or target-dependent part caused eventual regression.

Thanks,
Uros.

>
> --- gcc/internal-fn.def.jj  2023-06-13 18:23:37.208793152 +0200
> +++ gcc/internal-fn.def 2023-06-14 12:21:38.650657857 +0200
> @@ -416,6 +416,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
>  DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (UADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (USUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
>  DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> --- gcc/internal-fn.cc.jj   2023-06-13 18:23:37.206793179 +0200
> +++ gcc/internal-fn.cc  2023-06-14 12:21:38.652657829 +0200
> @@ -2776,6 +2776,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
>expand_arith_overflow (MULT_EXPR, stmt);
>  }
>
> +/* Expand UADDC STMT.  */
> +
> +static void
> +expand_UADDC (internal_

Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 14, 2023 at 4:00 PM Jakub Jelinek  wrote:
>
> Hi!
>
> On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote:
> > At this point two pages of code without a comment - can you introduce
> > some vertical spacing and comments as to what is matched now?  The
> > split out functions help somewhat but the code is far from obvious :/
> >
> > Maybe I'm confused by the loops and instead of those sth like
> >
> >  if (match_x_y_z (op0)
> >  || match_x_y_z (op1))
> >...
> >
> > would be easier to follow with the loop bodies split out?
> > Maybe just put them in lambdas even?
> >
> > I guess you'll be around as long as myself so we can go with
> > this code under the premise you're going to maintain it - it's
> > not that I'm writing trivially to understand code myself ...
>
> As I said on IRC, I don't really know how to split that into further
> functions, the problem is that we need to pattern match a lot of
> statements and it is hard to come up with names for each of them.
> And we need quite a lot of variables for checking their interactions.
>
> The code isn't that much different from say match_arith_overflow or
> optimize_spaceship or other larger pattern recognizers.  And the
> intent is that all the code paths in the recognizer are actually covered
> by the testcases in the testsuite.
>
> That said, I've added 18 new comments to the function, and rebased it
> on top of the
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621717.html
> patch with all constant arguments handling moved to fold-const-call.cc
> even for the new ifns.
>
> Ok for trunk like this if it passes bootstrap/regtest?
>
> 2023-06-13  Jakub Jelinek  
>
> PR middle-end/79173
> * internal-fn.def (UADDC, USUBC): New internal functions.
> * internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
> (commutative_ternary_fn_p): Return true also for IFN_UADDC.
> * optabs.def (uaddc5_optab, usubc5_optab): New optabs.
> * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
> match_uaddc_usubc): New functions.
> (math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
> for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
> other optimizations have been successful for those.
> * gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
> * fold-const-call.cc (fold_const_call): Likewise.
> * gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
> * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
> * doc/md.texi (uaddc5, usubc5): Document new named
> patterns.
> * config/i386/i386.md (subborrow): Add alternative with
> memory destination.
> (uaddc5, usubc5): New define_expand patterns.
> (*sub_3, @add3_carry, addcarry, @sub3_carry,
> subborrow, *add3_cc_overflow_1): Add define_peephole2
> TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
> destination in these patterns.
>
> * gcc.target/i386/pr79173-1.c: New test.
> * gcc.target/i386/pr79173-2.c: New test.
> * gcc.target/i386/pr79173-3.c: New test.
> * gcc.target/i386/pr79173-4.c: New test.
> * gcc.target/i386/pr79173-5.c: New test.
> * gcc.target/i386/pr79173-6.c: New test.
> * gcc.target/i386/pr79173-7.c: New test.
> * gcc.target/i386/pr79173-8.c: New test.
> * gcc.target/i386/pr79173-9.c: New test.
> * gcc.target/i386/pr79173-10.c: New test.

+;; Helper peephole2 for the addcarry and subborrow
+;; peephole2s, to optimize away nop which resulted from uaddc/usubc
+;; expansion optimization.
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+   (match_operand:SWI48 1 "memory_operand"))
+   (const_int 0)]
+  ""
+  [(set (match_dup 0) (match_dup 1))])

Is this (const_int 0) from a recent patch from Roger that introduced:

+;; Set the carry flag from the carry flag.
+(define_insn_and_split "*setccc"
+  [(set (reg:CCC FLAGS_REG)
+ (reg:CCC FLAGS_REG))]
+  "ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)])
+
+;; Set the carry flag from the carry flag.
+(define_insn_and_split "*setcc_qi_negqi_ccc_1_"
+  [(set (reg:CCC FLAGS_REG)
+ (ltu:CCC (reg:CC_CCC FLAGS_REG) (const_int 0)))]
+  "ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)])
+
+;; Set the carry flag from the carry flag.
+(define_insn_and_split "*setcc_qi_negqi_ccc_2_"
+  [(set (reg:CCC FLAGS_REG)
+ (unspec:CCC [(ltu:QI (reg:CC_CCC FLAGS_REG) (const_int 0))
+ (const_int 0)] UNSPEC_CC_NE))]
+  "ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)])

If this interferes with RTL stream, then instead of emitting
(const_int 0), the above patterns should simply emit:

{
  emit_note (NOTE_INSN_DELETED);
  DONE;
}

And there will be no (const_int 0) in the RTL stream.

Uros.


Re: [RFC] Add stdckdint.h header for C23

2023-06-14 Thread Joseph Myers
On Tue, 13 Jun 2023, Paul Eggert wrote:

> > There is always the possibility to have the header co-owned by both
> > the compiler and C library, limits.h style.
> > Just
> > #if __has_include_next()
> > # include_next 
> > #endif
> 
> I don't see how you could implement __has_include_next() for
> arbitrary non-GCC compilers, which is what we'd need for glibc users. For
> glibc internals we can use "#include_next" more readily, since we assume a
> new-enough GCC. I.e. we could do something like this:

Given the possibility of library functions being included in  
in future standard versions, it seems important to look at ways of 
splitting responsibility for the header between the compiler and library, 
whether with __has_include_next, or compiler version conditionals, or some 
other such variation.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 14, 2023 at 04:34:27PM +0200, Uros Bizjak wrote:
> LGTM for the x86 part. I did my best, but those peephole2 patterns are
> real PITA to be reviewed thoroughly.
> 
> Maybe split out peephole2 pack to a separate patch, followed by a
> testcase patch. This way, bisection would be able to point out if a
> generic part or target-dependent part caused eventual regression.

Ok.  Guess if it helps for bisection, I could even split the peephole2s
to one peephole2 addition per commit and then the final patch would add the
expanders and the generic code.

Jakub



Re: [PATCH] RISC-V: Use merge approach to optimize vector permutation

2023-06-14 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

the general method seems sane and useful (it's not very complicated).
I was just distracted by

> Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 25, 10, 27, 12, 29, 14, 31 }, the 
> common expression:
> { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ...  }
> 
> For this selector, we can use vmsltu + vmerge to optimize the codegen.

because it's actually { 0, nunits + 1, 2, nunits + 3, ... } or maybe
{ 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }.

Because of the ascending/monotonic? selector structure we can use vmerge
instead of vrgather.

> +/* Recognize the patterns that we can use merge operation to shuffle the
> +   vectors.  The value of each element (index i) in selector can only be
> +   either i or nunits + i.
> +
> +   E.g.
> +   v = VEC_PERM_EXPR (v0, v1, selector),
> +   selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ...  }

Same.

> +
> +   We can transform such pattern into:
> +
> +   v = vcond_mask (v0, v1, mask),
> +   mask = { 0, 1, 0, 1, 0, 1, ... }.  */
> +
> +static bool
> +shuffle_merge_patterns (struct expand_vec_perm_d *d)
> +{
> +  machine_mode vmode = d->vmode;
> +  machine_mode sel_mode = related_int_vector_mode (vmode).require ();
> +  int n_patterns = d->perm.encoding ().npatterns ();
> +  poly_int64 vec_len = d->perm.length ();
> +
> +  for (int i = 0; i < n_patterns; ++i)
> +if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i))
> +  return false;
> +
> +  for (int i = n_patterns; i < n_patterns * 2; i++)
> +if (!d->perm.series_p (i, n_patterns, i, n_patterns)
> + && !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
> +  return false;

Maybe add a comment that we check that the pattern is actually monotonic
or however you prefer to call it?

I didn't go through all tests in detail but skimmed several.  All in all
looks good to me.

Regards
 Robin



Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 14, 2023 at 4:56 PM Jakub Jelinek  wrote:
>
> On Wed, Jun 14, 2023 at 04:34:27PM +0200, Uros Bizjak wrote:
> > LGTM for the x86 part. I did my best, but those peephole2 patterns are
> > real PITA to be reviewed thoroughly.
> >
> > Maybe split out peephole2 pack to a separate patch, followed by a
> > testcase patch. This way, bisection would be able to point out if a
> > generic part or target-dependent part caused eventual regression.
>
> Ok.  Guess if it helps for bisection, I could even split the peephole2s
> to one peephole2 addition per commit and then the final patch would add the
> expanders and the generic code.

I don't think it is necessary to split it too much. Peephole2s can be
tricky, but if there is something wrong, it is easy to figure out
which one is problematic.

Uros.


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool
Hi!

On Wed, Jun 14, 2023 at 05:18:15PM +0800, Xi Ruoyao wrote:
> The generic issue here is to fix (not "papering over") the signed
> overflow: we need to perform the addition in a target machine mode.  We
> may always use Pmode (IIRC const_anchor was introduced for optimizing
> some constant addresses), but can we do better?

The main issue is that the machine description generated target code to
compute some constants, but the sanitizer treats it as if it was user
code that might do wrong things.

> Should we try addition in both DImode and SImode for a 64-bit capable
> machine?

Why?  At least on PowerPC there is only one insn, and it is 64 bits.
The SImode version just ignores all bits other than the low 32 bits, in
both inputs and output.

> Or should we even try more operations than addition (for eg bit
> operations like xor or shift)?  Doing so will need to create a new
> target hook for const anchoring, this is the "complete rework" I meant.

This might make const anchor useful for way more targets maybe,
including rs6000, yes :-)


Segher


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool
Hi!

On Wed, Jun 14, 2023 at 12:06:29PM +0800, Jiufu Guo wrote:
> Segher Boessenkool  writes:
> I'm also thinking about other solutions:
> 1. "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])"
>   This is the existing pattern.  It may be read as an action
>   to clean an unknown-size memory block.

Including a size zero memory block, yes.  BLKmode was originally to do
things like bcopy (before modern names like memcpy were more usually
used), and those very much need size zero as well.

> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
> UNSPEC_TIE".
>   The current patch uses this one.

What would be the semantics of that?  Just the same as the current stuff
I'd say, or less?  It cannot be more!

> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
> UNSPEC_TIE".
>This avoids using BLK on unspec, but using DI.

And is incorrect because of that.

> 4. "set (mem/c:BLK (reg/f:DI 1 1) unspec (const_int 0 [0])
> UNSPEC_TIE"
>There is still a mode for the unspec.

It has VOIDmode here, which is incorrect.

> > On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote:
> >> +&& XINT (SET_SRC (set), 1) == UNSPEC_TIE
> >> +&& XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
> >
> > This makes it required that the operand of an UNSPEC_TIE unspec is a
> > const_int 0.  This should be documented somewhere.  Ideally you would
> > want no operand at all here, but every unspec has an operand.
> 
> Right!  Since we checked UNSPEC_TIE already, we may not need to check
> the inner operand. Like " && XINT (SET_SRC (set), 1) == UNSPEC_TIE);".

Yes.  But we should write down somewhere (in a comment near the unspec
constant def for example) what the operand is -- so, "operand is usually
(const_int 0) because we have to put *something* there" or such.  The
clumsiness of this is enough for me to prefer some other solution
already ;-)


Segher


Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 14, 2023 at 04:45:48PM +0200, Uros Bizjak wrote:
> +;; Helper peephole2 for the addcarry and subborrow
> +;; peephole2s, to optimize away nop which resulted from uaddc/usubc
> +;; expansion optimization.
> +(define_peephole2
> +  [(set (match_operand:SWI48 0 "general_reg_operand")
> +   (match_operand:SWI48 1 "memory_operand"))
> +   (const_int 0)]
> +  ""
> +  [(set (match_dup 0) (match_dup 1))])
> 
> Is this (const_int 0) from a recent patch from Roger that introduced:

The first one I see is the one immediately above that:
;; Pre-reload splitter to optimize
;; *setcc_qi followed by *addqi3_cconly_overflow_1 with the same QI
;; operand and no intervening flags modifications into nothing.
(define_insn_and_split "*setcc_qi_addqi3_cconly_overflow_1_"
  [(set (reg:CCC FLAGS_REG)
(compare:CCC (neg:QI (geu:QI (reg:CC_CCC FLAGS_REG) (const_int 0)))
 (ltu:QI (reg:CC_CCC FLAGS_REG) (const_int 0))))]
  "ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(const_int 0)])

And you're right, the following incremental patch (I'd integrate it
into the full patch with
(*setcc_qi_addqi3_cconly_overflow_1_, *setccc,
*setcc_qi_negqi_ccc_1_, *setcc_qi_negqi_ccc_2_): Split
into NOTE_INSN_DELETED note rather than nop instruction.
added to ChangeLog) passes all the new tests as well:

--- gcc/config/i386/i386.md 2023-06-14 12:21:38.668657604 +0200
+++ gcc/config/i386/i386.md 2023-06-14 17:12:31.742625193 +0200
@@ -7990,16 +7990,6 @@
(set_attr "pent_pair" "pu")
(set_attr "mode" "")])
 
-;; Helper peephole2 for the addcarry and subborrow
-;; peephole2s, to optimize away nop which resulted from uaddc/usubc
-;; expansion optimization.
-(define_peephole2
-  [(set (match_operand:SWI48 0 "general_reg_operand")
-   (match_operand:SWI48 1 "memory_operand"))
-   (const_int 0)]
-  ""
-  [(set (match_dup 0) (match_dup 1))])
-
 (define_peephole2
   [(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
@@ -8641,7 +8631,8 @@
   "ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(const_int 0)])
+  [(const_int 0)]
+  "emit_note (NOTE_INSN_DELETED); DONE;")
 
 ;; Set the carry flag from the carry flag.
 (define_insn_and_split "*setccc"
@@ -8650,7 +8641,8 @@
   "ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(const_int 0)])
+  [(const_int 0)]
+  "emit_note (NOTE_INSN_DELETED); DONE;")
 
 ;; Set the carry flag from the carry flag.
 (define_insn_and_split "*setcc_qi_negqi_ccc_1_"
@@ -8659,7 +8651,8 @@
   "ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(const_int 0)])
+  [(const_int 0)]
+  "emit_note (NOTE_INSN_DELETED); DONE;")
 
 ;; Set the carry flag from the carry flag.
 (define_insn_and_split "*setcc_qi_negqi_ccc_2_"
@@ -8669,7 +8662,8 @@
   "ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(const_int 0)])
+  [(const_int 0)]
+  "emit_note (NOTE_INSN_DELETED); DONE;")
 
 ;; Overflow setting add instructions
 

Jakub



[PATCH 1/2] Missed opportunity to use [SU]ABD

2023-06-14 Thread Oluwatamilore Adebayo via Gcc-patches
From: oluade01 

This adds a recognition pattern for the non-widening
absolute difference (ABD).

gcc/ChangeLog:

* doc/md.texi (sabd, uabd): Document them.
* internal-fn.def (ABD): Use new optab.
* optabs.def (sabd_optab, uabd_optab): New optabs,
* tree-vect-patterns.cc (vect_recog_absolute_difference):
Recognize the following idiom abs (a - b).
(vect_recog_sad_pattern): Refactor to use
vect_recog_absolute_difference.
(vect_recog_abd_pattern): Use patterns found by
vect_recog_absolute_difference to build a new ABD
internal call.
---
 gcc/doc/md.texi   |  10 ++
 gcc/internal-fn.def   |   3 +
 gcc/optabs.def|   2 +
 gcc/tree-vect-patterns.cc | 233 +-
 4 files changed, 217 insertions(+), 31 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6a435eb44610960513e9739ac9ac1e8a27182c10..e11b10d2fca11016232921bc85e47975f700e6c6 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5787,6 +5787,16 @@ Other shift and rotate instructions, analogous to the
 Vector shift and rotate instructions that take vectors as operand 2
 instead of a scalar type.
 
+@cindex @code{uabd@var{m}} instruction pattern
+@cindex @code{sabd@var{m}} instruction pattern
+@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
+Signed and unsigned absolute difference instructions.  These
+instructions find the difference between operands 1 and 2
+then return the absolute value.  A C code equivalent would be:
+@smallexample
+op0 = op1 > op2 ? op1 - op2 : op2 - op1;
+@end smallexample
+
 @cindex @code{avg@var{m}3_floor} instruction pattern
 @cindex @code{uavg@var{m}3_floor} instruction pattern
 @item @samp{avg@var{m}3_floor}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 3ac9d82aace322bd8ef108596e5583daa18c76e3..116965f4830cec8f60642ff011a86b6562e2c509 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -191,6 +191,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
+ sabd, uabd, binary)
+
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
  savg_floor, uavg_floor, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 6c064ff4993620067d38742a0bfe0a3efb511069..35b835a6ac56d72417dac8ddfd77a8a7e2475e65 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, "mask_fold_left_plus_$a")
 OPTAB_D (extract_last_optab, "extract_last_$a")
 OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
 
+OPTAB_D (uabd_optab, "uabd$a3")
+OPTAB_D (sabd_optab, "sabd$a3")
 OPTAB_D (savg_floor_optab, "avg$a3_floor")
 OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
 OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dc102c919352a0328cf86eabceb3a38c41a7e4fd..e2392113bff4065c909aefc760b4c48978b73a5a 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -782,6 +782,83 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
 }
 }
 
+/* Look for the following pattern
+   X = x[i]
+   Y = y[i]
+   DIFF = X - Y
+   DAD = ABS_EXPR <DIFF>
+
+   ABS_STMT should point to a statement of code ABS_EXPR or ABSU_EXPR.
+   HALF_TYPE and UNPROM will be set should the statement be found to
+   be a widened operation.
+   DIFF_STMT will be set to the MINUS_EXPR
+   statement that precedes the ABS_STMT unless vect_widened_op_tree
+   succeeds.
+ */
+static bool
+vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt,
+				tree *half_type,
+				vect_unpromoted_value unprom[2],
+				gassign **diff_stmt)
+{
+  if (!abs_stmt)
+    return false;
+
+  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
+     inside the loop (in case we are analyzing an outer-loop).  */
+  enum tree_code code = gimple_assign_rhs_code (abs_stmt);
+  if (code != ABS_EXPR && code != ABSU_EXPR)
+    return false;
+
+  tree abs_oprnd = gimple_assign_rhs1 (abs_stmt);
+  tree abs_type = TREE_TYPE (abs_oprnd);
+  if (!abs_oprnd)
+    return false;
+  if (!ANY_INTEGRAL_TYPE_P (abs_type)
+      || TYPE_OVERFLOW_WRAPS (abs_type)
+      || TYPE_UNSIGNED (abs_type))
+    return false;
+
+  /* Peel off conversions from the ABS input.  This can involve sign
+ changes (e.g. from an unsigned subtraction to a signed ABS input)
+ or signed promotion, but it can't include unsigned promotion.
+ (Note that ABS of an unsigned promotion should have been folded
+ away before now anyway.)  */
+  vect_unpromoted_value unprom_d
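For context, the source-level idiom this recognizer (together with the new sabd/uabd optabs) is meant to match can be sketched as follows; the function and values are hypothetical and not part of the patch:

```c
#include <stdlib.h>

/* Hypothetical example: a sum-of-absolute-differences loop.  The body
   computes DIFF = X - Y followed by DAD = ABS_EXPR<DIFF>, which is the
   statement pair vect_recog_absolute_difference looks for; on targets
   providing sabd/uabd the vectorizer can then use a single vector
   absolute-difference operation instead of subtract-plus-abs.  */
int
sum_abs_diff (const signed char *x, const signed char *y, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    {
      int diff = x[i] - y[i];   /* DIFF = X - Y */
      sum += abs (diff);        /* DAD = ABS_EXPR<DIFF> */
    }
  return sum;
}
```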

[PATCH] RISC-V: Add autovec FP binary operations.

2023-06-14 Thread Robin Dapp via Gcc-patches
Hi,

this implements the floating-point autovec expanders for binary
operations: vfadd, vfsub, vfdiv, vfmul, vfmax, vfmin and adds
tests.

The existing tests are amended and split up into non-_Float16
and _Float16 flavors as we cannot rely on the zvfh extension
being present.

As long as we do not have full middle-end support, -ffast-math
is required for the tests.

In order to allow proper _Float16 support we need to disable
promotion to float.  This patch handles that similarly to
TARGET_ZFH and TARGET_ZINX which is not strictly accurate.
The zvfh extension only requires zfhmin on the scalar side
i.e. just conversion to float and no actual operations.
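
As a sketch of what this enables at the source level (hypothetical function, using float for brevity; the _Float16 flavors work the same way once promotion to float is disabled):

```c
/* Hypothetical example: an element-wise FP loop of the kind the new
   binop expanders (vfadd, vfsub, vfmul, vfdiv, vfmin, vfmax) cover.
   As noted above, -ffast-math is currently needed for such loops to
   vectorize.  */
void
vec_mul (const float *a, const float *b, float *restrict c, int n)
{
  for (int i = 0; i < n; i++)
    c[i] = a[i] * b[i];   /* expected to map to vfmul.vv */
}
```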

The *run tests rely on the testsuite changes sent earlier.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (<optab><mode>3): Implement binop
expander.
* config/riscv/riscv-protos.h (emit_vlmax_fp_insn): Declare.
(emit_vlmax_fp_minmax_insn): Declare.
(enum frm_field_enum): Rename this...
(enum rounding_mode): ...to this.
* config/riscv/riscv-v.cc (emit_vlmax_fp_insn): New function
(emit_vlmax_fp_minmax_insn): New function.
* config/riscv/riscv.cc (riscv_const_insns): Clarify const
vector handling.
(riscv_libgcc_floating_mode_supported_p): Adjust comment.
(riscv_excess_precision): Do not convert to float for ZVFH.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmax-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmin-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: New test.
---
 gcc/config/riscv/autovec.md   | 36 +
 gcc/config/riscv/riscv-protos.h   |  5 +-
 gcc/config/riscv/riscv-v.cc   | 76 ++-
 gcc/config/riscv/riscv.cc | 27 +--
 .../riscv/rvv/autovec/binop/vadd-run.c| 12 ++-
 .../riscv/rvv/autovec/binop/vadd-rv32gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vadd-rv64gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vadd-template.h   | 11 ++-
 .../riscv/rvv/autovec/binop/vadd-zvfh-run.c   | 54 +
 .../riscv/rvv/autovec/binop/vdiv-run.c|  8 +-
 .../riscv/rvv/autovec/binop/vdiv-rv32gcv.c|  7 +-
 .../riscv/rvv/autovec/binop/vdiv-rv64gcv.c|  7 +-
 .../riscv/rvv/autovec/binop/vdiv-template.h   |  8 +-
 .../riscv/rvv/autovec/binop/vdiv-zvfh-run.c   | 37 +
 .../riscv/rvv/autovec/binop/vmax-run.c|  9 ++-
 .../riscv/rvv/autovec/binop/vmax-rv32gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmax-rv64gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmax-template.h   |  8 +-
 .../riscv/rvv/autovec/binop/vmax-zvfh-run.c   | 38 ++
 .../riscv/rvv/autovec/binop/vmin-run.c| 10 ++-
 .../riscv/rvv/autovec/binop/vmin-rv32gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmin-rv64gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmin-template.h   |  8 +-
 .../riscv/rvv/autovec/binop/vmin-zvfh-run.c   | 37 +
 .../riscv/rvv/autovec/binop/vmul-run.c|  8 +-
 .../riscv/r

[PATCH] RISC-V: Add autovec FP unary operations.

2023-06-14 Thread Robin Dapp via Gcc-patches
Hi,

this patch adds floating-point autovec expanders for vfneg, vfabs as well as
vfsqrt and the accompanying tests.  vfrsqrt7 will be added at a later time.

Similarly to the binop tests, there are flavors for zvfh now.  Prerequisites
as before.
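
At the source level the new expanders correspond to loops like these (hypothetical functions, shown with float; a sqrtf loop maps to vfsqrt.v in the same way):

```c
/* Hypothetical examples: unary FP loops covered by the new expanders.
   vec_neg corresponds to vfneg.v and vec_abs to vfabs.v; vfsqrt.v is
   the analogous case for a sqrtf() loop.  */
void
vec_neg (const float *a, float *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    b[i] = -a[i];                       /* vfneg.v */
}

void
vec_abs (const float *a, float *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    b[i] = a[i] < 0.0f ? -a[i] : a[i];  /* what vfabs.v computes */
}
```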

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (<optab><mode>2): Add unop expanders.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
---
 gcc/config/riscv/autovec.md   | 36 ++-
 .../riscv/rvv/autovec/unop/abs-run.c  |  6 ++--
 .../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  3 +-
 .../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  3 +-
 .../riscv/rvv/autovec/unop/abs-template.h | 14 +++-
 .../riscv/rvv/autovec/unop/abs-zvfh-run.c | 35 ++
 .../riscv/rvv/autovec/unop/vfsqrt-run.c   | 29 +++
 .../riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c   | 10 ++
 .../riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c   | 10 ++
 .../riscv/rvv/autovec/unop/vfsqrt-template.h  | 31 
 .../riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c  | 32 +
 .../riscv/rvv/autovec/unop/vneg-run.c |  6 ++--
 .../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  3 +-
 .../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  3 +-
 .../riscv/rvv/autovec/unop/vneg-template.h|  5 ++-
 .../riscv/rvv/autovec/unop/vneg-zvfh-run.c| 26 ++
 16 files changed, 241 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 1c6d793cae0..72154400f1f 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -498,7 +498,7 @@ (define_expand "<optab><mode>2"
 })
 
 ;; -------------------------------------------------------------------------
-;; - ABS expansion to vmslt and vneg
+;; - [INT] ABS expansion to vmslt and vneg.
 ;; -------------------------------------------------------------------------
 
 (define_expand "abs<mode>2"
@@ -517,6 +517,40 @@ (define_expand "abs<mode>2"
   DONE;
 })
 
+;; -------------------------------------------------------------------------
+;; - [FP] Unary operations
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfneg.v/vfabs.v
+;; -------------------------------------------------------------------------
+(define_expand "<optab><mode>2"
+  [(set (match_operand:VF 0 "register_operand")
+    (any_float_unop_nofrm:VF
+     (match_operand:VF 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (<CODE>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
+;; -------------------------------------------------------------------------
+;; - [FP] Square root
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfsqrt.v
+;; -------------------------------------------------------------------------
+(define_expand "<optab><mode>2"
+  [(set (match_operand:VF 0 "register_operand")
+    (any_float_unop:VF
+     (match_operand:VF 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (<CODE>, <MODE>mode);
+  riscv_vector::emit_vlmax_fp_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
 ;; =========================================================================
 ;; == Ternary arithmetic
 ;; =========================================================================

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool
Hi!

On Wed, Jun 14, 2023 at 07:59:04AM +, Richard Biener wrote:
> On Wed, 14 Jun 2023, Jiufu Guo wrote:
> > 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
> > UNSPEC_TIE".
> >This avoids using BLK on unspec, but using DI.
> 
> That gives the MEM a size which means we can interpret the (set ..)
> as killing a specific area of memory, enabling DSE of earlier
> stores.

Or DSE can delete this tie even, if it can see some later store to the
same location without anything in between that can read what the tie
stores.

BLKmode avoids all of this.  You can call that elegant, you can call it
cheating, you can call it many things -- but it *works*.

> AFAIU this special instruction is only supposed to prevent
> code motion (of stack memory accesses?) across this instruction?

From rs6000.md:
; This is to explain that changes to the stack pointer should
; not be moved over loads from or stores to stack memory.
(define_insn "stack_tie"

and from rs6000-logue.cc:
/* This ties together stack memory (MEM with an alias set of frame_alias_set)
   and the change to the stack pointer.  */
static void
rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)

A big reason this is needed is because of all the hard frame pointer
stuff, which the generic parts of GCC require, but there is no register
for that in the Power architecture.  Nothing is an issue here in most
cases, but sometimes we need to do unusual things to the stack, say for
alloca.

> I'd say a
> 
>   (may_clobber (mem:BLK (reg:DI 1 1)))

"clobber" always means "may clobber".  (clobber X) means X is written
with some unspecified value, which may well be whatever value it
currently holds.  Via some magical means or whatever, there is no
mechanism specified, just the effects :-)

> might be more to the point?  I've used "may_clobber" which doesn't
> exist since I'm not sure whether a clobber is considered a kill.
> The docs say "Represents the storing or possible storing of an 
> unpredictable..." - what is it?  Storing or possible storing?

It is the same thing.  "clobber" means the same thing as "set", except
the value that is written is not specified.

> I suppose stack_tie should be less strict than the documented
> (clobber (mem:BLK (const_int 0))) (clobber all memory).

"clobber" is nicer than the set to (const_int 0).  Does it work though?
All this code is always fragile :-/  I'm all for this change, don't get
me wrong, but preferably things stay in working order.

We use "stack_tie" as a last resort heavy hammer anyway, in all normal
cases we explain the actual data flow explicitly and correctly, also
between the various registers used in the *logues.


Segher

