Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-04-19 Thread Jan Hubicka
> On Tue, Apr 8, 2025 at 3:52 AM H.J. Lu  wrote:
> >
> > Simplify memcpy and memset inline strategies to avoid branches for
> > -mtune=generic:
> >
> > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
> >load and store for up to 16 * 16 (256) bytes when the data size is
> >fixed and known.
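> >
> > For concreteness (an illustrative example, not from the patch), this is
> > the kind of copy the MOVE_RATIO limit above applies to:
> >
> >   struct buf { char data[256]; };
> >
> >   void
> >   copy_buf (struct buf *dst, const struct buf *src)
> >   {
> >     /* The size is a compile-time constant of 256 bytes, so with
> >        MOVE_RATIO == 17 GCC can expand this inline as integer/vector
> >        loads and stores instead of calling memcpy.  */
> >     __builtin_memcpy (dst, src, sizeof (struct buf));
> >   }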

Originally we set CLEAR_RATIO smaller than MOVE_RATIO because to store
zeros we use:

   0:   48 c7 07 00 00 00 00    movq   $0x0,(%rdi)
   7:   48 c7 47 08 00 00 00    movq   $0x0,0x8(%rdi)
   e:   00
   f:   48 c7 47 10 00 00 00    movq   $0x0,0x10(%rdi)
  16:   00
  17:   48 c7 47 18 00 00 00    movq   $0x0,0x18(%rdi)
  1e:   00

so about 8 bytes per instruction.  We could optimize it by loading 0
into a scratch register, but we don't.  The SSE variant is shorter:

   4:   0f 11 07    movups %xmm0,(%rdi)
   7:   0f 11 47 10 movups %xmm0,0x10(%rdi)

So I wonder if we care about code size with -mno-sse (i.e. for building
the kernel).
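
FWIW the sequences above are what you get from clearing a small
fixed-size block, e.g. (illustrative only):

  void
  clear32 (char *p)
  {
    /* 32 bytes, size known at compile time: without SSE this becomes the
       four movq $0x0 stores quoted above; with SSE it becomes the two
       movups stores.  */
    __builtin_memset (p, 0, 32);
  }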

> >  static stringop_algs generic_memcpy[2] = {
> > -  {libcall, {{32, loop, false}, {8192, rep_prefix_4_byte, false},
> > - {-1, libcall, false}}},
> > -  {libcall, {{32, loop, false}, {8192, rep_prefix_8_byte, false},
> > - {-1, libcall, false;
> > +  {libcall,
> > +   {{256, rep_prefix_1_byte, true},
> > +{256, loop, false},
False/true here is the stringop_algs->noalign field, which is used to control
enabling/disabling the alignment prologue.  For rep_prefix_1_byte it should be
a noop, except for pentiumpro which preferred an alignment of 8.

decide_alg picks the first usable algorithm with size greater than the expected
size of the block.  rep_prefix_1_byte may become unusable if the user fixes
AX/CX/SI/DI, but it won't pick loop if the size is known.
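
In case it helps, this is how I read such a table (the leading libcall
being, as far as I understand, the algorithm used when the size is not
known at compile time); annotating the old table that the patch removes:

  static stringop_algs generic_memcpy[2] = {
    {libcall,                           /* size unknown at compile time */
     {{32, loop, false},                /* blocks up to 32 bytes: small loop */
      {8192, rep_prefix_4_byte, false}, /* up to 8 KiB: rep movsd */
      {-1, libcall, false}}},           /* anything larger: call memcpy */
    {libcall,                           /* 64-bit variant */
     {{32, loop, false},
      {8192, rep_prefix_8_byte, false}, /* up to 8 KiB: rep movsq */
      {-1, libcall, false}}}};

so the new {256, rep_prefix_1_byte, true} entry reads as "blocks up to 256
bytes use rep movsb, with no alignment prologue".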

A reason why we use loop for small blocks is that Bulldozers were quite
poor at handling rep movsb for very small blocks.  We probably want to
retune generic without too many Bulldozer-specific considerations.
There is a simple microbenchmark in contrib/bench-stringops that cycles
through different algs and different average sizes.

on znver5 and memcpy I get:
memcpy
  block size  libcall rep1    noalg   rep4    noalg   rep8    noalg   loop    noalg   unrl    noalg   sse     noalg   byte    PGO     dynamic BEST
     8192000  0:00.07 0:00.11 0:00.11 0:00.11 0:00.09 0:00.11 0:00.08 0:00.15 0:00.15 0:00.10 0:00.11 0:00.10 0:00.08 0:01.18 0:00.07 0:00.07 0:00.07 libcall
      819200  0:00.07 0:00.07 0:00.07 0:00.07 0:00.07 0:00.07 0:00.07 0:00.15 0:00.15 0:00.09 0:00.10 0:00.09 0:00.09 0:01.17 0:00.07 0:00.07 0:00.07 libcall
       81920  0:00.08 0:00.08 0:00.11 0:00.05 0:00.06 0:00.07 0:00.06 0:00.20 0:00.20 0:00.12 0:00.13 0:00.10 0:00.10 0:01.57 0:00.08 0:00.08 0:00.05 rep4
       20480  0:00.06 0:00.05 0:00.05 0:00.05 0:00.05 0:00.05 0:00.05 0:00.15 0:00.15 0:00.08 0:00.10 0:00.06 0:00.06 0:01.18 0:00.05 0:00.06 0:00.05 rep1
        8192  0:00.05 0:00.04 0:00.04 0:00.03 0:00.05 0:00.03 0:00.05 0:00.15 0:00.16 0:00.09 0:00.10 0:00.06 0:00.07 0:01.17 0:00.03 0:00.05 0:00.03 rep4
        4096  0:00.03 0:00.04 0:00.04 0:00.04 0:00.05 0:00.04 0:00.05 0:00.16 0:00.17 0:00.09 0:00.10 0:00.07 0:00.07 0:01.18 0:00.04 0:00.04 0:00.03 libcall
        2048  0:00.03 0:00.05 0:00.05 0:00.05 0:00.06 0:00.05 0:00.05 0:00.18 0:00.18 0:00.08 0:00.10 0:00.09 0:00.08 0:01.20 0:00.05 0:00.05 0:00.03 libcall
        1024  0:00.04 0:00.07 0:00.06 0:00.07 0:00.07 0:00.07 0:00.06 0:00.19 0:00.19 0:00.09 0:00.10 0:00.09 0:00.10 0:01.23 0:00.07 0:00.07 0:00.04 libcall
         512  0:00.06 0:00.11 0:00.11 0:00.11 0:00.11 0:00.11 0:00.10 0:00.18 0:00.19 0:00.10 0:00.11 0:00.12 0:00.13 0:01.30 0:00.11 0:00.11 0:00.06 libcall
         256  0:00.11 0:00.18 0:00.18 0:00.17 0:00.17 0:00.17 0:00.16 0:00.17 0:00.19 0:00.12 0:00.13 0:00.17 0:00.20 0:01.40 0:00.17 0:00.17 0:00.11 libcall
         128  0:00.16 0:00.27 0:00.27 0:00.20 0:00.18 0:00.19 0:00.18 0:00.20 0:00.21 0:00.17 0:00.18 0:00.31 0:00.48 0:01.41 0:00.19 0:00.19 0:00.16 libcall
          64  0:00.24 0:00.23 0:00.23 0:00.39 0:00.36 0:00.37 0:00.34 0:00.26 0:00.26 0:00.26 0:00.27 0:00.68 0:00.81 0:01.57 0:00.36 0:00.37 0:00.23 rep1
          48  0:00.30 0:00.34 0:00.34 0:00.51 0:00.50 0:00.49 0:00.47 0:00.33 0:00.32 0:00.32 0:00.33 0:00.84 0:00.96 0:01.48 0:00.49 0:00.49 0:00.30 libcall
          32  0:00.40 0:00.46 0:00.47 0:00.76 0:00.71 0:00.71 0:00.65 0:00.43 0:00.42 0:00.43 0:00.42 0:01.26 0:01.13 0:01.26 0:00.71 0:00.43 0:00.40 libcall
          24  0:00.54 0:00.67 0:00.65 0:01.01 0:00.98 0:00.95 0:00.89 0:00.57 0:00.52 0:00.53 0:00.52 0:01.21 0:01.21 0:01.18 0:00.95 0:00.57 0:00.52 loopnoalign
          16  0:00.71 0:00.90 0:00.91 0:01.48 0:01.36 0:01.39 0:01.17 0:00.71 0:00.66 0:00.59 0:00.59 0:01.21 0:01.14 0:01.21 0:00.72 0:00.72 0:00.59 unrl
          14  0:00.86 0:01.13 0:01.15 0:01.73 0:01.64 0:01.62 0:01.41 0:00.83 0:00.74 0:00.70 0:00.66 0:01.40 0:01.40 0:01.29 0:00.83 0:00.8

RE: [PATCH] cobol: Allow for undefined NAME_MAX [PR119217]

2025-04-19 Thread Robert Dubner
> -Original Message-
> From: Jakub Jelinek 
> Sent: Friday, April 18, 2025 14:10
> To: Rainer Orth 
> Cc: Richard Biener ; Andreas Schwab
> ; gcc-patches@gcc.gnu.org; Robert Dubner
> ; James K. Lowden 
> Subject: Re: [PATCH] cobol: Allow for undefined NAME_MAX [PR119217]
> 
> On Fri, Apr 18, 2025 at 06:04:29PM +0200, Rainer Orth wrote:
> > That's one option, but maybe it's better the other way round: instead of
> > excluding known-bad targets, restrict cobol to known-good ones
> > (i.e. x86_64-*-linux* and aarch64-*-linux*) instead.
> >
> > I've been using the following for this (should be retested for safety).
> 
> I admit I don't really know what works and what doesn't out of the box now,
> but your patch looks reasonable to me for 15 branch.
> 
> Richard, Robert and/or James, do you agree?

I agree.  At the present time, I have access to only aarch64/x86_64-linux
machines, so those are the only ones I know work.  I seem to recall I
originally did it that way; only those configurations were white-listed.


> 
> > 2025-03-17  Rainer Orth  
> >
> > PR cobol/119217
> > * configure.ac: Restrict cobol to aarch64-*-linux*,
> > x86_64-*-linux*.
> > * configure: Regenerate.
> 
>   Jakub



[r16-39 Regression] FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c scan-assembler-times vcvtpd2phx[ \\t]+[^{\n]*[^\n\r]*%xmm[0-9]+(?:\n|[ \\t]+#) 1 on Linux/x86_64

2025-04-19 Thread haochen.jiang
On Linux/x86_64,

f6859fb621179ec9bf5631eb8902619ab8d4467b is the first bad commit
commit f6859fb621179ec9bf5631eb8902619ab8d4467b
Author: Jan Hubicka 
Date:   Sat Apr 19 18:51:27 2025 +0200

Add tables for SSE fp conversion costs

caused

FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c scan-assembler-times 
vcvtpd2phx[ \\t]+[^{\n]*[^\n\r]*%xmm[0-9]+(?:\n|[ \\t]+#) 1

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r16-39/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-trunc-extendvnhf.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-trunc-extendvnhf.c 
--target_board='unix{-m64}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you hit cascadelake-related problems, disabling AVX512F on the command line
might help.)
(However, please make sure that there are no potential problems with AVX512.)


[PATCH] libphobos: enable for sparc64-unknown-linux-gnu

2025-04-19 Thread Sam James
This bootstraps with some test failures but works well enough to build
11..15.

libphobos/ChangeLog:

* configure.tgt: Add sparc64-unknown-linux-gnu as a supported target.
---
As discussed on IRC. OK? I'd like to backport it to the branches in due course
once they're all open and it has had some time on trunk, as it would make life
easier for us in bootstrapping from 11.

 libphobos/configure.tgt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libphobos/configure.tgt b/libphobos/configure.tgt
index 16362534f9ed..198296310174 100644
--- a/libphobos/configure.tgt
+++ b/libphobos/configure.tgt
@@ -58,7 +58,7 @@ case "${target}" in
   s390*-linux*)
LIBPHOBOS_SUPPORTED=yes
;;
-  sparc*-*-solaris2.11*)
+  sparc64-*-linux* | sparc*-*-solaris2.11*)
LIBPHOBOS_SUPPORTED=yes
;;
   *-*-darwin9* | *-*-darwin1[01]*)
-- 
2.49.0



Re: [PATCH] cobol: Allow for undefined NAME_MAX [PR119217]

2025-04-19 Thread Sam James
Robert Dubner  writes:

>> -Original Message-
>> From: Jakub Jelinek 
>> Sent: Friday, April 18, 2025 14:10
>> To: Rainer Orth 
>> Cc: Richard Biener ; Andreas Schwab
>> ; gcc-patches@gcc.gnu.org; Robert Dubner
>> ; James K. Lowden 
>> Subject: Re: [PATCH] cobol: Allow for undefined NAME_MAX [PR119217]
>> 
>> On Fri, Apr 18, 2025 at 06:04:29PM +0200, Rainer Orth wrote:
>> > That's one option, but maybe it's better the other way round: instead of
>> > excluding known-bad targets, restrict cobol to known-good ones
>> > (i.e. x86_64-*-linux* and aarch64-*-linux*) instead.
>> >
>> > I've been using the following for this (should be retested for safety).
>> 
>> I admit I don't really know what works and what doesn't out of the box now,
>> but your patch looks reasonable to me for 15 branch.
>> 
>> Richard, Robert and/or James, do you agree?
>
> I agree.  At the present time, I have access to only aarch64/x86_64-linux
> machines, so those are the only ones I know work.  I seem to recall I
> originally did it that way; only those configurations were white-listed.

I think you may be mistaken. In r15-7941-g45c281deb7a2e2, aarch64 and
x86_64 were whitelisted as *architectures*, but the platform (including
the kernel - Linux) wasn't specified. Rainer is reporting an issue with
x86_64 Solaris.

thanks,
sam


Re: PING: [PATCH v2] x86: Add pcmpeq splitters

2025-04-19 Thread Uros Bizjak
On Sat, Apr 19, 2025 at 7:22 AM H.J. Lu  wrote:
>
> On Mon, Dec 2, 2024 at 6:27 AM H.J. Lu  wrote:
> >
> > Add pcmpeq splitters to split
> >
> > (insn 5 3 7 2 (set (reg:V4SI 100)
> > (eq:V4SI (reg:V4SI 98)
> > (reg:V4SI 98))) 7910 {*sse2_eqv4si3}
> >  (expr_list:REG_DEAD (reg:V4SI 98)
> > (expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [
> > (const_int -1 [0x]) repeated x4
> > ])
> > (const_vector:V4SI [
> > (const_int -1 [0x]) repeated x4
> > ]))
> > (nil
> >
> > to
> >
> > (insn 8 3 7 2 (set (reg:V4SI 100)
> > (const_vector:V4SI [
> > (const_int -1 [0x]) repeated x4
> > ])) -1
> >  (nil))
> >
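
(Source-level aside, not the actual testcase, which uses __RTL: the
identity the splitters rely on is that comparing a vector register with
itself sets every lane to all-ones, e.g.

  #include <emmintrin.h>

  __m128i
  all_ones (__m128i x)
  {
    /* pcmpeqd of a register with itself yields -1 in every element, so
       the whole comparison is equivalent to a constant all-ones vector.  */
    return _mm_cmpeq_epi32 (x, x);
  }

so an (eq:V4SI (reg) (reg)) of identical registers can simply be replaced
by the CONSTM1 vector, as in insn 8 above.)
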
> > gcc/
> >
> > PR target/117863
> > * config/i386/sse.md: Add pcmpeq splitters.
> >
> > gcc/testsuite/
> >
> > PR target/117863
> > * gcc.dg/rtl/i386/vector_eq-2.c: New test.
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  gcc/config/i386/sse.md  | 36 +++
> >  gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c | 71 +
> >  2 files changed, 107 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 498a42d6e1e..4b19bc22a83 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -17943,6 +17943,18 @@ (define_insn "*avx2_eq3"
> > (set_attr "prefix" "vex")
> > (set_attr "mode" "OI")])
> >
> > +;; Don't remove memory operand to keep volatile memory.

Perhaps we can use MEM_VOLATILE_P to also allow memory operands?

> > +(define_split
> > +  [(set (match_operand:VI_256 0 "register_operand")
> > +   (eq:VI_256
> > + (match_operand:VI_256 1 "register_operand")
> > + (match_operand:VI_256 2 "register_operand")))]
> > +  "TARGET_AVX2 && rtx_equal_p (operands[1], operands[2])"
> > +  [(set (match_dup 0) (match_dup 1))]
> > +{
> > +  operands[1] = CONSTM1_RTX (mode);
> > +})

Single preparation statements should use double quotes, here and in other cases.

Uros.

> > +
> >  (define_insn_and_split "*avx2_pcmp3_1"
> >   [(set (match_operand:VI_128_256  0 "register_operand")
> > (vec_merge:VI_128_256
> > @@ -18227,6 +18239,18 @@ (define_insn "*sse4_1_eqv2di3"
> > (set_attr "prefix" "orig,orig,vex")
> > (set_attr "mode" "TI")])
> >
> > +;; Don't remove memory operand to keep volatile memory.
> > +(define_split
> > +  [(set (match_operand:V2DI 0 "register_operand")
> > +   (eq:V2DI
> > + (match_operand:V2DI 1 "register_operand")
> > + (match_operand:V2DI 2 "register_operand")))]
> > +  "TARGET_SSE4_1 && rtx_equal_p (operands[1], operands[2])"
> > +  [(set (match_dup 0) (match_dup 1))]
> > +{
> > +  operands[1] = CONSTM1_RTX (V2DImode);
> > +})
> > +
> >  (define_insn "*sse2_eq3"
> >[(set (match_operand:VI124_128 0 "register_operand" "=x,x")
> > (eq:VI124_128
> > @@ -18243,6 +18267,18 @@ (define_insn "*sse2_eq3"
> > (set_attr "prefix" "orig,vex")
> > (set_attr "mode" "TI")])
> >
> > +;; Don't remove memory operand to keep volatile memory.
> > +(define_split
> > +  [(set (match_operand:VI124_128 0 "register_operand")
> > +   (eq:VI124_128
> > + (match_operand:VI124_128 1 "register_operand")
> > + (match_operand:VI124_128 2 "register_operand")))]
> > +  "TARGET_SSE2 && rtx_equal_p (operands[1], operands[2])"
> > +  [(set (match_dup 0) (match_dup 1))]
> > +{
> > +  operands[1] = CONSTM1_RTX (mode);
> > +})
> > +
> >  (define_insn "sse4_2_gtv2di3"
> >[(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
> > (gt:V2DI
> > diff --git a/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c 
> > b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
> > new file mode 100644
> > index 000..871d489b730
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
> > @@ -0,0 +1,71 @@
> > +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> > +/* { dg-additional-options "-O2 -march=x86-64-v3" } */
> > +
> > +typedef int v4si __attribute__((vector_size(16)));
> > +typedef int v8si __attribute__((vector_size(32)));
> > +typedef int v2di __attribute__((vector_size(16)));
> > +
> > +v4si __RTL (startwith ("vregs1")) foo1 (void)
> > +{
> > +(function "foo1"
> > +  (insn-chain
> > +(block 2
> > +  (edge-from entry (flags "FALLTHRU"))
> > +  (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
> > +  (cnote 2 NOTE_INSN_FUNCTION_BEG)
> > +  (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int -1) 
> > (const_int -1) (const_int -1) (const_int -1)])))
> > +  (cinsn 4 (set (reg:V4SI <1>) (const_vector:V4SI [(const_int -1) 
> > (const_int -1) (const_int -1) (const_int -1)])))
> > +  (cinsn 5 (set (reg:V4SI <2>)
> > +   (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>

Re: [PATCH v2] sh: libgcc: Implement fenv rounding and exceptions for soft-fp [PR118257]

2025-04-19 Thread Jeff Law




On 1/1/25 10:02 AM, Jiaxun Yang wrote:

Implement fenv rounding and exceptions for soft-fp, as per the SuperH
arch specification.

No new tests required, as it's already covered by many torture tests
with fenv_exceptions.

PR target/118257

libgcc/ChangeLog:

* config/sh/sfp-machine.h (_FPU_GETCW): Implement with builtin.
(_FPU_SETCW): Likewise.
(FP_EX_ENABLE_SHIFT): Derive from arch spec.
(FP_EX_CAUSE_SHIFT): Likewise.
(FP_RND_MASK): Likewise.
(FP_EX_INVALID): Likewise.
(FP_EX_DIVZERO): Likewise.
(FP_EX_ALL): Likewise.
(FP_EX_OVERFLOW): Likewise.
(FP_EX_UNDERFLOW): Likewise.
(FP_EX_INEXACT): Likewise.
(_FP_DECL_EX): Declare default FCSR value.
(FP_RND_NEAREST): Derive from arch spec.
(FP_RND_ZERO): Likewise.
(FP_INIT_ROUNDMODE): Likewise.
(FP_ROUNDMODE): Likewise.
(FP_TRAPPING_EXCEPTIONS): Likewise.
(FP_HANDLE_EXCEPTIONS): Implement with _FPU_SETCW.
I've pushed this to the trunk as well, on the assumption that you've got
these various constants right.  Oleg, hope I'm not stepping on your toes
here; I'm just starting to work through some of the safer gcc-16 stuff.


jeff


Re: [PATCH] recog: Handle some mode-changing hardreg propagations

2025-04-19 Thread Jeff Law




On 1/1/25 2:08 PM, Keith Packard wrote:

From: Richard Sandiford 
Date: Mon, 30 Dec 2024 12:18:40 +


...that could be handled by adding:

   && GET_MODE_INNER (from) != GET_MODE_INNER (to)


I'll let those of you who understand this code far better than I do
figure out whether that's the right plan. I figured that copying how it
works on i386 was the best plan as that target is so similar.
Well, I *think* Andreas's comment was suggesting that the patch caused
a build failure in libstdc++, so that needs to be addressed before this
could go forward.


jeff



Re: [GCC16/PATCH] combine: Better split point for `(and (not X))` [PR111949]

2025-04-19 Thread Jeff Law




On 1/20/25 9:38 PM, Andrew Pinski wrote:

In a similar way to how find_split_point handles `a+b*C`, this adds a
split point for `~a & b`.  This allows for better instruction selection
when the target has this instruction (aarch64, arm and x86_64 are
examples which have it).
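
For example (not from the testcase, just the shape of code this targets):

  unsigned int
  bic (unsigned int a, unsigned int b)
  {
    /* ~a & b maps to a single BIC on aarch64/arm (or ANDN with BMI on
       x86_64); the new split point lets combine fall back to this form
       when a larger combined pattern does not match.  */
    return ~a & b;
  }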

Built and tested for aarch64-linux-gnu.

PR rtl-optimization/111949

gcc/ChangeLog:

* combine.cc (find_split_point): Add a split point
for `(and (not X) Y)` if not in the outer set already.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/bic-1.c: New test.

OK.

FWIW we were already handling the first two cases well on risc-v, but 
the patch does help the 3rd case.


Jeff


Re: [PATCH v2] sh: Correct NaN signalling bit and propagation rules [PR111814]

2025-04-19 Thread Jeff Law




On 1/1/25 6:54 AM, Jiaxun Yang wrote:

As per the architecture, SuperH has a reversed NaN signalling bit
vs IEEE754-2008; it also has a NaN propagation rule similar to
the MIPS style.

Use the MIPS-style float format and mode for all float types, and
correct the sfp-machine header accordingly.

PR target/111814

gcc/ChangeLog:

* config/sh/sh-modes.def (RESET_FLOAT_FORMAT): Use mips format.
(FLOAT_MODE): Use mips mode.

libgcc/ChangeLog:

* config/sh/sfp-machine.h (_FP_NANFRAC_B): Reverse signaling bit.
(_FP_NANFRAC_H): Likewise.
(_FP_NANFRAC_S): Likewise.
(_FP_NANFRAC_D): Likewise.
(_FP_NANFRAC_Q): Likewise.
(_FP_KEEPNANFRACP): Enable for target.
(_FP_QNANNEGATEDP): Enable for target.
(_FP_CHOOSENAN): Port from MIPS.

gcc/testsuite/ChangeLog:

* gcc.target/sh/pr111814.c: New test.
I haven't seen an explicit ack from Oleg, but he did signal in the PR 
trail that he was generally on board.


In the PR trail Joseph noted some desirable changes to glibc.  We're in 
a bit of a chicken and the egg problem if I read things correctly.  But 
if someone doesn't go first it'll never untangle.


So I'll go ahead and ACK for the trunk.  We can backport to the release 
branches per Joseph's recommendation after it's been on the trunk a bit. 
 Ideally the glibc side of this would get wrapped up before that 
project's fall release.


Jeff



Re: [PATCH] tailcall: Support ERF_RETURNS_ARG for tailcall [PR67797]

2025-04-19 Thread Andrew Pinski
On Sat, Apr 19, 2025 at 8:32 PM Andrew Pinski  wrote:
>
> r15-6943-g9c4397cafc5ded added support to undo IPA-VRP return value
> optimization for tail calls; using the same code, ERF_RETURNS_ARG can be
> supported for functions which return one of their arguments.
> This allows for tail calling of memset/memcpy in some cases which were
> not handled before.
>
> Bootstrapped and tested on x86_64-linux-gnu.

For easier review, I also attached the diff ignoring whitespace,
since this is mostly reindenting the code.

Thanks,
Andrew

>
> PR tree-optimization/67797
>
> gcc/ChangeLog:
>
> * tree-tailcall.cc (find_tail_calls): Add support for ERF_RETURNS_ARG.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/tailcall-14.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c |  25 +
>  gcc/tree-tailcall.cc| 105 +++-
>  2 files changed, 84 insertions(+), 46 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c
> new file mode 100644
> index 000..6fadff8ea00
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-tailc-details" } */
> +
> +/* PR tree-optimization/67797 */
> +
> +void *my_func(void *s, int n)
> +{
> +  __builtin_memset(s, 0, n);
> +  return s;
> +}
> +void *my_func1(void *d, void *s, int n)
> +{
> +  __builtin_memcpy(d, s, n);
> +  return d;
> +}
> +void *my_func2(void *s, void *p1, int n)
> +{
> +  if (p1)
> +__builtin_memcpy(s, p1, n);
> +  else
> +__builtin_memset(s, 0, n);
> +  return s;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Found tail call" 4 "tailc"} } */
> diff --git a/gcc/tree-tailcall.cc b/gcc/tree-tailcall.cc
> index f593363dae4..1371454406f 100644
> --- a/gcc/tree-tailcall.cc
> +++ b/gcc/tree-tailcall.cc
> @@ -1083,57 +1083,70 @@ find_tail_calls (basic_block bb, struct tailcall 
> **ret, bool only_musttail,
>  {
>bool ok = false;
>value_range val;
> -  tree valr;
> -  /* If IPA-VRP proves called function always returns a singleton range,
> -the return value is replaced by the only value in that range.
> -For tail call purposes, pretend such replacement didn't happen.  */
>if (ass_var == NULL_TREE && !tail_recursion)
> -   if (tree type = gimple_range_type (call))
> - if (tree callee = gimple_call_fndecl (call))
> -   if ((INTEGRAL_TYPE_P (type)
> -|| SCALAR_FLOAT_TYPE_P (type)
> -|| POINTER_TYPE_P (type))
> -   && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)),
> - type)
> -   && useless_type_conversion_p (TREE_TYPE (ret_var), type)
> -   && ipa_return_value_range (val, callee)
> -   && val.singleton_p (&valr))
> +   {
> + tree other_value = NULL_TREE;
> + /* If we have a function call that we know the return value is the 
> same
> +as the argument, try the argument too. */
> + int flags = gimple_call_return_flags (call);
> + if ((flags & ERF_RETURNS_ARG) != 0
> + && (flags & ERF_RETURN_ARG_MASK) < gimple_call_num_args (call))
> +   other_value = gimple_call_arg (call, flags & ERF_RETURN_ARG_MASK);
> + /* If IPA-VRP proves called function always returns a singleton 
> range,
> +the return value is replaced by the only value in that range.
> +For tail call purposes, pretend such replacement didn't happen.  
> */
> + else if (tree type = gimple_range_type (call))
> +   if (tree callee = gimple_call_fndecl (call))
>   {
> -   tree rv = ret_var;
> -   unsigned int i = edges.length ();
> -   /* If ret_var is equal to valr, we can tail optimize.  */
> -   if (operand_equal_p (ret_var, valr, 0))
> - ok = true;
> -   else
> - /* Otherwise, if ret_var is a PHI result, try to find out
> -if valr isn't propagated through PHIs on the path from
> -call's bb to SSA_NAME_DEF_STMT (ret_var)'s bb.  */
> - while (TREE_CODE (rv) == SSA_NAME
> -&& gimple_code (SSA_NAME_DEF_STMT (rv)) == 
> GIMPLE_PHI)
> -   {
> - tree nrv = NULL_TREE;
> - gimple *g = SSA_NAME_DEF_STMT (rv);
> - for (; i; --i)
> -   {
> - if (edges[i - 1]->dest == gimple_bb (g))
> -   {
> - nrv
> -   = gimple_phi_arg_def_from_edge (g,
> +   tree valr;
> +

[PATCH] tailcall: Support ERF_RETURNS_ARG for tailcall [PR67797]

2025-04-19 Thread Andrew Pinski
r15-6943-g9c4397cafc5ded added support to undo IPA-VRP return value
optimization for tail calls; using the same code, ERF_RETURNS_ARG can be
supported for functions which return one of their arguments.
This allows for tail calling of memset/memcpy in some cases which were
not handled before.
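
Concretely, for the first function in the new test the pass can now treat

  __builtin_memset (s, 0, n);
  return s;

as if it were written 'return __builtin_memset (s, 0, n);' (memset returns
its first argument), and therefore mark the memset call as a tail call.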

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/67797

gcc/ChangeLog:

* tree-tailcall.cc (find_tail_calls): Add support for ERF_RETURNS_ARG.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/tailcall-14.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c |  25 +
 gcc/tree-tailcall.cc| 105 +++-
 2 files changed, 84 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c 
b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c
new file mode 100644
index 000..6fadff8ea00
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+/* PR tree-optimization/67797 */
+
+void *my_func(void *s, int n)
+{
+  __builtin_memset(s, 0, n);
+  return s;
+}
+void *my_func1(void *d, void *s, int n)
+{
+  __builtin_memcpy(d, s, n);
+  return d;
+}
+void *my_func2(void *s, void *p1, int n)
+{
+  if (p1)
+__builtin_memcpy(s, p1, n);
+  else
+__builtin_memset(s, 0, n);
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 4 "tailc"} } */
diff --git a/gcc/tree-tailcall.cc b/gcc/tree-tailcall.cc
index f593363dae4..1371454406f 100644
--- a/gcc/tree-tailcall.cc
+++ b/gcc/tree-tailcall.cc
@@ -1083,57 +1083,70 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
bool only_musttail,
 {
   bool ok = false;
   value_range val;
-  tree valr;
-  /* If IPA-VRP proves called function always returns a singleton range,
-the return value is replaced by the only value in that range.
-For tail call purposes, pretend such replacement didn't happen.  */
   if (ass_var == NULL_TREE && !tail_recursion)
-   if (tree type = gimple_range_type (call))
- if (tree callee = gimple_call_fndecl (call))
-   if ((INTEGRAL_TYPE_P (type)
-|| SCALAR_FLOAT_TYPE_P (type)
-|| POINTER_TYPE_P (type))
-   && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)),
- type)
-   && useless_type_conversion_p (TREE_TYPE (ret_var), type)
-   && ipa_return_value_range (val, callee)
-   && val.singleton_p (&valr))
+   {
+ tree other_value = NULL_TREE;
+ /* If we have a function call that we know the return value is the 
same
+as the argument, try the argument too. */
+ int flags = gimple_call_return_flags (call);
+ if ((flags & ERF_RETURNS_ARG) != 0
+ && (flags & ERF_RETURN_ARG_MASK) < gimple_call_num_args (call))
+   other_value = gimple_call_arg (call, flags & ERF_RETURN_ARG_MASK);
+ /* If IPA-VRP proves called function always returns a singleton range,
+the return value is replaced by the only value in that range.
+For tail call purposes, pretend such replacement didn't happen.  */
+ else if (tree type = gimple_range_type (call))
+   if (tree callee = gimple_call_fndecl (call))
  {
-   tree rv = ret_var;
-   unsigned int i = edges.length ();
-   /* If ret_var is equal to valr, we can tail optimize.  */
-   if (operand_equal_p (ret_var, valr, 0))
- ok = true;
-   else
- /* Otherwise, if ret_var is a PHI result, try to find out
-if valr isn't propagated through PHIs on the path from
-call's bb to SSA_NAME_DEF_STMT (ret_var)'s bb.  */
- while (TREE_CODE (rv) == SSA_NAME
-&& gimple_code (SSA_NAME_DEF_STMT (rv)) == GIMPLE_PHI)
-   {
- tree nrv = NULL_TREE;
- gimple *g = SSA_NAME_DEF_STMT (rv);
- for (; i; --i)
-   {
- if (edges[i - 1]->dest == gimple_bb (g))
-   {
- nrv
-   = gimple_phi_arg_def_from_edge (g,
+   tree valr;
+   if ((INTEGRAL_TYPE_P (type)
+|| SCALAR_FLOAT_TYPE_P (type)
+|| POINTER_TYPE_P (type))
+   && useless_type_conversion_p (TREE_TYPE (TREE_TYPE 
(callee)),
+ type)
+   && useless_type_conversion_p (TREE_TYPE (ret_var), type)
+   && ipa_return_value_range (val, callee)
+   && val

[PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-19 Thread pan2 . li
From: Pan Li 

This patch would like to combine vec_duplicate + vadd.vv into vadd.vx,
as in the example code below.  The related pattern will depend on the
cost of vec_duplicate from GR2VR; it will:

* The pattern matching will be inactive if GR2VR cost is zero.
* The cost of GR2VR will be added to the total cost of pattern, and
  the late-combine will decide to perform the replacement or not
  based on the cost value.

Assume we have example code like below; the GR2VR cost is 2 by default.

  #define DEF_VX_BINARY(T, OP)\
  void\
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {   \
for (unsigned i = 0; i < n; i++)  \
  out[i] = in[i] OP x;\
  }

  DEF_VX_BINARY(int32_t, +)

Before this patch:
  10   │ test_binary_vx_add:
  11   │ beq a3,zero,.L8
  12   │ vsetvli a5,zero,e32,m1,ta,ma // eliminated if GR2VR cost non-zero
  13   │ vmv.v.x v2,a2// Ditto.
  14   │ sllia3,a3,32
  15   │ srlia3,a3,32
  16   │ .L3:
  17   │ vsetvli a5,a3,e32,m1,ta,ma
  18   │ vle32.v v1,0(a1)
  19   │ sllia4,a5,2
  20   │ sub a3,a3,a5
  21   │ add a1,a1,a4
  22   │ vadd.vv v1,v2,v1
  23   │ vse32.v v1,0(a0)
  24   │ add a0,a0,a4
  25   │ bne a3,zero,.L3

After this patch:
  10   │ test_binary_vx_add:
  11   │ beq a3,zero,.L8
  12   │ sllia3,a3,32
  13   │ srlia3,a3,32
  14   │ .L3:
  15   │ vsetvli a5,a3,e32,m1,ta,ma
  16   │ vle32.v v1,0(a1)
  17   │ sllia4,a5,2
  18   │ sub a3,a3,a5
  19   │ add a1,a1,a4
  20   │ vadd.vx v1,v1,a2
  21   │ vse32.v v1,0(a0)
  22   │ add a0,a0,a4
  23   │ bne a3,zero,.L3

The below test suites passed for this patch.
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*_vx_): Add new
combine to convert vec_duplicate + vadd.vv to vaddvx on GR2VR
cost.
* config/riscv/riscv.cc (riscv_rtx_costs): Extract vector
cost into a separated func.
(riscv_vector_rtx_costs): Add new func to take care of the
cost of vector rtx, default to 1 and append GR2VR cost to
vec_duplicate rtx.
* config/riscv/vector-iterators.md: Add new iterator for vx.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec-opt.md  | 22 ++
 gcc/config/riscv/riscv.cc| 26 --
 gcc/config/riscv/vector-iterators.md |  4 
 3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 0c3b0cc7e05..1bc3985f1a3 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1673,3 +1673,25 @@ (define_insn_and_split "*vandn_"
 DONE;
   }
   [(set_attr "type" "vandn")])
+
+;; 
=
+;; Combine vec_duplicate + op.vv to op.vx
+;; Include
+;; - vadd.vx
+;; 
=
+(define_insn_and_split "*_vx_"
+ [(set (match_operand:V_VLSI0 "register_operand")
+   (any_int_binop_no_shift_vx:V_VLSI
+(vec_duplicate:V_VLSI
+  (match_operand: 1 "register_operand"))
+(match_operand:V_VLSI  2 "")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx ops[] = {operands[0], operands[2], operands[1]};
+riscv_vector::emit_vlmax_insn (code_for_pred_scalar (, mode),
+  riscv_vector::BINARY_OP, ops);
+  }
+  [(set_attr "type" "vialu")])
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d3656a7a430..31e9b06568a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3856,16 +3856,30 @@ riscv_extend_cost (rtx op, bool unsigned_p)
 #define SINGLE_SHIFT_COST 1
 
 static bool
-riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno 
ATTRIBUTE_UNUSED,
-int *total, bool speed)
+riscv_vector_rtx_costs (rtx x, machine_mode mode, int *total)
 {
+  gcc_assert (riscv_v_ext_mode_p (mode));
+
   /* TODO: We set RVV instruction cost as 1 by default.
  Cost Model need to be well analyzed and supported in the future. */
+  int cost_val = 1;
+  enum rtx_code rcode = GET_CODE (x);
+
+  /* Aka (vec_duplicate:RVVM1DI (reg/v:DI 143 [ x ]))  */
+  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0
+cost_val += get_vector_costs ()->regmove->GR2VR;
+
+  *total = COSTS_N_INSNS (cost_val);
+
+  return true;
+}
+
+static bool
+riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno 
ATTRIBUTE_UNUSED,
+int *total, bool speed)
+{
   if (riscv_v_ext_mo

[PATCH v2 2/3] RISC-V: Adjust the testcases after vec_duplicate + vadd.vv combine

2025-04-19 Thread pan2 . li
From: Pan Li 

After we support the vec_duplicate + vadd.vv combine to vadd.vx, the
existing testcases need some adjust for asm dump check times.

The below test suites passed for this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adjust
the asm dump check times.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Ditto.
* gcc.target/riscv/struct_vect_24.c: Ditto.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c  | 3 ++-
 .../gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c   | 3 ++-
 .../gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c  | 3 ++-
 .../gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c   | 3 ++-
 gcc/testsuite/gcc.target/riscv/struct_vect_24.c | 6 +++---
 5 files changed, 11 insertions(+), 7 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c
index 667f457d658..7db55b298d1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c
@@ -3,7 +3,8 @@
 
 #include "vadd-template.h"
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 16 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 10 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vx} 6 } } */
 /* { dg-final { scan-assembler-times {\tvadd\.vi} 8 } } */
 /* { dg-final { scan-assembler-times {\tvfadd\.vv} 3 } } */
 /* { dg-final { scan-assembler-times {\tvfadd\.vf} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c
index a3b012631be..65e569d9d1c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c
@@ -3,6 +3,7 @@
 
 #include "vadd-template.h"
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 16 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 10 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vx} 6 } } */
 /* { dg-final { scan-assembler-times {\tvadd\.vi} 8 } } */
 /* { dg-final { scan-assembler-times {\tvfadd\.vv} 9 } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c
index 1d8a19ce0b2..4a48fce435e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c
@@ -3,7 +3,8 @@
 
 #include "vadd-template.h"
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 16 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vx} 8 } } */
 /* { dg-final { scan-assembler-times {\tvadd\.vi} 8 } } */
 /* { dg-final { scan-assembler-times {\tvfadd\.vv} 3 } } */
 /* { dg-final { scan-assembler-times {\tvfadd\.vf} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c
index ef52f49657b..1cf6c06ecca 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c
@@ -3,6 +3,7 @@
 
 #include "vadd-template.h"
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 16 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vx} 8 } } */
 /* { dg-final { scan-assembler-times {\tvadd\.vi} 8 } } */
 /* { dg-final { scan-assembler-times {\tvfadd\.vv} 9 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/struct_vect_24.c 
b/gcc/testsuite/gcc.target/riscv/struct_vect_24.c
index 7c0852f1a55..9d36796f2ec 100644
--- a/gcc/testsuite/gcc.target/riscv/struct_vect_24.c
+++ b/gcc/testsuite/gcc.target/riscv/struct_vect_24.c
@@ -42,6 +42,6 @@ TEST (test)
 
 /* Check the vectorized loop for stack clash probing.  */
 
-/* { dg-final { scan-assembler-times {sd\tzero,1024\(sp\)} 6 } } */
-/* { dg-final { scan-assembler-times {bge\tt1,t0,.[^\\r\\n]*} 2 } } */
-/* { dg-final { scan-assembler-times {sub\s+t1,t1,t0} 2 } } */
+/* { dg-final { scan-assembler-times {sd\tzero,1024\(sp\)} 4 } } */
+/* { dg-final { scan-assembler-not {bge\tt1,t0,.[^\\r\\n]*} } } */
+/* { dg-final { scan-assembler-not {sub\s+t1,t1,t0} } } */
-- 
2.43.0



[PATCH v2 0/3] Introduce vec_dup + vadd.vv combine to vadd.vx

2025-04-19 Thread pan2 . li
From: Pan Li 

This patch series would like to introduce the vec_dup + vadd.vv combine
to vadd.vx, based on the cost of the GR2VR.  For example:
v1 = vec_dup(x2)
v2 = vec_add_vv(v3, v1)

will be optimized to below in late-combine

v2 = vec_add_vx(v3, x3)

If and only if the cost of (vec_dup + vec_add_vv) is greater than
the cost of vec_add_vx.

Pan Li (3):
  RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost
  RISC-V: Adjust the testcases after vec_duplicate + vadd.vv combine
  RISC-V: Add testcases for vec_duplicate + vadd.vv combine to vadd.vx

 gcc/config/riscv/autovec-opt.md   |  22 +
 gcc/config/riscv/riscv.cc |  26 +-
 gcc/config/riscv/vector-iterators.md  |   4 +
 .../rvv/autovec/binop/vadd-rv32gcv-nofm.c |   3 +-
 .../riscv/rvv/autovec/binop/vadd-rv32gcv.c|   3 +-
 .../rvv/autovec/binop/vadd-rv64gcv-nofm.c |   3 +-
 .../riscv/rvv/autovec/binop/vadd-rv64gcv.c|   3 +-
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  17 +
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 401 ++
 .../riscv/rvv/autovec/vx_vf/vx_binary_run.h   |  26 ++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c|   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c|   8 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i16.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i32.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i64.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i8.c  |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u16.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u32.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u64.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u8.c  |  14 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 .../gcc.target/riscv/struct_vect_24.c |   6 +-
 28 files changed, 679 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_run.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u8.c

-- 
2.43.0



[pushed] c++: minor EXPR_STMT cleanup

2025-04-19 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

I think it was around PR118574 that I noticed a few cases where we were
unnecessarily wrapping a statement tree in a further EXPR_STMT.  Let's avoid
that and also use finish_expr_stmt in a few places in the coroutines code
that were building EXPR_STMT directly.

gcc/cp/ChangeLog:

* coroutines.cc (coro_build_expr_stmt)
(coro_build_cvt_void_expr_stmt): Remove.
(build_actor_fn): Use finish_expr_stmt.
* semantics.cc (finish_expr_stmt): Avoid wrapping statement in
EXPR_STMT.
(finish_stmt_expr_expr): Add comment.
---
 gcc/cp/coroutines.cc | 21 ++---
 gcc/cp/semantics.cc  |  8 ++--
 2 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index b92d09fa4ea..743da068e35 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1852,21 +1852,6 @@ coro_build_frame_access_expr (tree coro_ref, tree 
member_id, bool preserve_ref,
   return expr;
 }
 
-/* Helpers to build EXPR_STMT and void-cast EXPR_STMT, common ops.  */
-
-static tree
-coro_build_expr_stmt (tree expr, location_t loc)
-{
-  return maybe_cleanup_point_expr_void (build_stmt (loc, EXPR_STMT, expr));
-}
-
-static tree
-coro_build_cvt_void_expr_stmt (tree expr, location_t loc)
-{
-  tree t = build1 (CONVERT_EXPR, void_type_node, expr);
-  return coro_build_expr_stmt (t, loc);
-}
-
 /* Helpers to build an artificial var, with location LOC, NAME and TYPE, in
CTX, and with initializer INIT.  */
 
@@ -2582,8 +2567,7 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
   tree hfa = build_new_method_call (ash, hfa_m, &args, NULL_TREE, 
LOOKUP_NORMAL,
NULL, tf_warning_or_error);
   r = cp_build_init_expr (ash, hfa);
-  r = coro_build_cvt_void_expr_stmt (r, loc);
-  add_stmt (r);
+  finish_expr_stmt (r);
   release_tree_vector (args);
 
   /* Now we know the real promise, and enough about the frame layout to
@@ -2678,8 +2662,7 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
  we must tail call them.  However, some targets do not support indirect
  tail calls to arbitrary callees.  See PR94359.  */
   CALL_EXPR_TAILCALL (resume) = true;
-  resume = coro_build_cvt_void_expr_stmt (resume, loc);
-  add_stmt (resume);
+  finish_expr_stmt (resume);
 
   r = build_stmt (loc, RETURN_EXPR, NULL);
   gcc_checking_assert (maybe_cleanup_point_expr_void (r) == r);
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 7f23efd4a11..1aa35d3861e 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -1180,10 +1180,13 @@ finish_expr_stmt (tree expr)
 expr = error_mark_node;
 
   /* Simplification of inner statement expressions, compound exprs,
-etc can result in us already having an EXPR_STMT.  */
+etc can result in us already having an EXPR_STMT or other statement
+tree.  Don't wrap them in EXPR_STMT.  */
   if (TREE_CODE (expr) != CLEANUP_POINT_EXPR)
{
- if (TREE_CODE (expr) != EXPR_STMT)
+ if (TREE_CODE (expr) != EXPR_STMT
+ && !STATEMENT_CLASS_P (expr)
+ && TREE_CODE (expr) != STATEMENT_LIST)
expr = build_stmt (loc, EXPR_STMT, expr);
  expr = maybe_cleanup_point_expr_void (expr);
}
@@ -3082,6 +3085,7 @@ finish_stmt_expr_expr (tree expr, tree stmt_expr)
}
   else if (processing_template_decl)
{
+ /* Not finish_expr_stmt because we don't want convert_to_void.  */
  expr = build_stmt (input_location, EXPR_STMT, expr);
  expr = add_stmt (expr);
  /* Mark the last statement so that we can recognize it as such at

base-commit: 1dd769b3d0d9251649dcb645d7ed6c4ba2202306
-- 
2.49.0



Re: [PATCH] avoid-store-forwarding: Fix reg init on load-elimination [PR119160]

2025-04-19 Thread Jeff Law




On 4/18/25 4:37 PM, Sam James wrote:

Philipp Tomsich  writes:


Applied to trunk (16.0.0), thank you!
Should this be backported to the GCC-15 release branch as well?


BTW, what's the plan for enabling this on trunk now by default? (I don't recall 
if
some other issues were left.)

There's already an approved patch to flip it on for gcc-16.

jeff



Re: [PING^5][PATCH] Alpha: Fix base block alignment calculation regression

2025-04-19 Thread Jeff Law




On 4/14/25 5:50 AM, Maciej W. Rozycki wrote:

On Tue, 25 Feb 2025, Maciej W. Rozycki wrote:


Address this issue by recursing into COMPONENT_REF tree nodes until the
outermost one has been reached, which is supposed to be a MEM_REF one,
accumulating the offset as we go, fixing a commit e0dae4da4c45 ("Alpha:
Also use tree information to get base block alignment") regression.


  Ping for:
.

OK.  Clearly this one slipped through the cracks.

jeff


Re: [PATCH] recog: Handle some mode-changing hardreg propagations

2025-04-19 Thread Andreas Schwab
On Apr 19 2025, Jeff Law wrote:

> Well, I *think* Andreas's comment was suggesting that the patch caused
> a build failure in libstdc++, so that needs to be addressed before this
> could go forward.

Yes, it breaks the -mlra build.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[to-be-committed][RISC-V][PR target/119865] Don't free ggc allocated memory

2025-04-19 Thread Jeff Law
Kaiweng's patch to stop freeing riscv_arch_string was correct, but 
incomplete as there's another path that was freeing that node, which is 
just plain wrong for a node allocated by the GC system.


This patch removes that call to free() which fixes the test.  I've spun 
it in my tester and will obviously wait for the pre-commit system to 
render a verdict before moving forward.


Jeff

PR target/119865
gcc/
* config/riscv/riscv.cc (parse_features_for_version): Do not
explicitly free the architecture string.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d3656a7a430..bad59e248d0 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -13136,9 +13136,6 @@ parse_features_for_version (tree decl,
  DECL_SOURCE_LOCATION (decl));
   gcc_assert (parse_res);
 
-  if (arch_string != default_opts->x_riscv_arch_string)
-free (CONST_CAST (void *, (const void *) arch_string));
-
   cl_target_option_restore (&global_options, &global_options_set,
&cur_target);
 }


Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-04-19 Thread H.J. Lu
On Sun, Apr 20, 2025 at 4:19 AM Jan Hubicka  wrote:
>
> > On Tue, Apr 8, 2025 at 3:52 AM H.J. Lu  wrote:
> > >
> > > Simplify memcpy and memset inline strategies to avoid branches for
> > > -mtune=generic:
> > >
> > > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
> > >load and store for up to 16 * 16 (256) bytes when the data size is
> > >fixed and known.
>
> Originally we set CLEAR_RATIO smaller than MOVE_RATIO because to store
> zeros we use:
>
>0:   48 c7 07 00 00 00 00movq   $0x0,(%rdi)
>7:   48 c7 47 08 00 00 00movq   $0x0,0x8(%rdi)
>e:   00
>f:   48 c7 47 10 00 00 00movq   $0x0,0x10(%rdi)
>   16:   00
>   17:   48 c7 47 18 00 00 00movq   $0x0,0x18(%rdi)
>   1e:   00
>
> so about 8 bytes per instruction.  We could optimize it by loading 0

This is orthogonal to this patch.

> into a scratch register, but we don't.  The SSE variant is shorter:
>
>4:   0f 11 07movups %xmm0,(%rdi)
>7:   0f 11 47 10 movups %xmm0,0x10(%rdi)
>
> So I wonder if we care about code size with -mno-sse (i.e. for building
> the kernel).

This patch doesn't change -Os behavior which uses x86_size_cost,
not generic_cost.

> > >  static stringop_algs generic_memcpy[2] = {
> > > -  {libcall, {{32, loop, false}, {8192, rep_prefix_4_byte, false},
> > > - {-1, libcall, false}}},
> > > -  {libcall, {{32, loop, false}, {8192, rep_prefix_8_byte, false},
> > > - {-1, libcall, false;
> > > +  {libcall,
> > > +   {{256, rep_prefix_1_byte, true},
> > > +{256, loop, false},
> False/true here is the stringop_algs->noalign field, which is used to control
> enabling/disabling the alignment prologue.  For rep_prefix_1_byte it should be
> a noop, except for pentiumpro which preferred an alignment of 8.
>
> decide_alg picks the first usable algorithm with size greater than the expected
> size of the block.  rep_prefix_1_byte may become unusable if the user fixes
> AX/CX/SI/DI, but it won't pick loop if the size is known.
>
> A reason why we use loop for small blocks is that Bulldozers were quite
> poor at handling rep movsb for very small blocks.  We probably want to
> retune generic without too many Bulldozer-specific considerations.
> There is a simple microbenchmark in contrib/bench-stringops that cycles
> through different algs and different average sizes.
>
> on znver5 and memcpy I get:
> memcpy
>   block size  libcall rep1    noalg   rep4    noalg   rep8    noalg   loop    noalg   unrl    noalg   sse     noalg   byte    PGO     dynamic BEST
>  8192000  0:00.07 0:00.11 0:00.11 0:00.11 0:00.09 0:00.11 0:00.08 0:00.15 0:00.15 0:00.10 0:00.11 0:00.10 0:00.08 0:01.18 0:00.07 0:00.07 0:00.07 libcall
>   819200  0:00.07 0:00.07 0:00.07 0:00.07 0:00.07 0:00.07 0:00.07 0:00.15 0:00.15 0:00.09 0:00.10 0:00.09 0:00.09 0:01.17 0:00.07 0:00.07 0:00.07 libcall
>    81920  0:00.08 0:00.08 0:00.11 0:00.05 0:00.06 0:00.07 0:00.06 0:00.20 0:00.20 0:00.12 0:00.13 0:00.10 0:00.10 0:01.57 0:00.08 0:00.08 0:00.05 rep4
>    20480  0:00.06 0:00.05 0:00.05 0:00.05 0:00.05 0:00.05 0:00.05 0:00.15 0:00.15 0:00.08 0:00.10 0:00.06 0:00.06 0:01.18 0:00.05 0:00.06 0:00.05 rep1
>     8192  0:00.05 0:00.04 0:00.04 0:00.03 0:00.05 0:00.03 0:00.05 0:00.15 0:00.16 0:00.09 0:00.10 0:00.06 0:00.07 0:01.17 0:00.03 0:00.05 0:00.03 rep4
>     4096  0:00.03 0:00.04 0:00.04 0:00.04 0:00.05 0:00.04 0:00.05 0:00.16 0:00.17 0:00.09 0:00.10 0:00.07 0:00.07 0:01.18 0:00.04 0:00.04 0:00.03 libcall
>     2048  0:00.03 0:00.05 0:00.05 0:00.05 0:00.06 0:00.05 0:00.05 0:00.18 0:00.18 0:00.08 0:00.10 0:00.09 0:00.08 0:01.20 0:00.05 0:00.05 0:00.03 libcall
>     1024  0:00.04 0:00.07 0:00.06 0:00.07 0:00.07 0:00.07 0:00.06 0:00.19 0:00.19 0:00.09 0:00.10 0:00.09 0:00.10 0:01.23 0:00.07 0:00.07 0:00.04 libcall
>      512  0:00.06 0:00.11 0:00.11 0:00.11 0:00.11 0:00.11 0:00.10 0:00.18 0:00.19 0:00.10 0:00.11 0:00.12 0:00.13 0:01.30 0:00.11 0:00.11 0:00.06 libcall
>      256  0:00.11 0:00.18 0:00.18 0:00.17 0:00.17 0:00.17 0:00.16 0:00.17 0:00.19 0:00.12 0:00.13 0:00.17 0:00.20 0:01.40 0:00.17 0:00.17 0:00.11 libcall
>      128  0:00.16 0:00.27 0:00.27 0:00.20 0:00.18 0:00.19 0:00.18 0:00.20 0:00.21 0:00.17 0:00.18 0:00.31 0:00.48 0:01.41 0:00.19 0:00.19 0:00.16 libcall
>       64  0:00.24 0:00.23 0:00.23 0:00.39 0:00.36 0:00.37 0:00.34 0:00.26 0:00.26 0:00.26 0:00.27 0:00.68 0:00.81 0:01.57 0:00.36 0:00.37 0:00.23 rep1
>       48  0:00.30 0:00.34 0:00.34 0:00.51 0:00.50 0:00.49 0:00.47 0:00.33 0:00.32 0:00.32 0:00.33 0:00.84 0:00.96 0:01.48 0:00.49 0:00.49 0:00.30 libcall
>       32  0:00.40 0:00.46 0:00.47 0:00.76 0:00.71 0:00.71 0:00.65 0:00.43 0:00.42 0:00.43 0:00.42 0:01.26 0:01.13 0:01.26 0:00.71 0:00.43 0:00.40 libcall
>       24  0:00.54 0:00.67 0:00.65 0:01.01 0:00.98 0:00.95 0:00.89 0:00.57 0:00.52 0:00.53 0:00.52 0

[PATCH] libstdc++: Finalize GCC 15 ABI

2025-04-19 Thread Andreas Schwab
Disallow adding new symbols to GLIBCXX_3.4.34 and CXXABI_1.3.16 versions.

* testsuite/util/testsuite_abi.cc (check_version): Update latestp
to use GLIBCXX_3.4.35 and CXXABI_1.3.17.
---
 libstdc++-v3/testsuite/util/testsuite_abi.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc 
b/libstdc++-v3/testsuite/util/testsuite_abi.cc
index 1b4044c9518..90cda2fbca8 100644
--- a/libstdc++-v3/testsuite/util/testsuite_abi.cc
+++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc
@@ -258,8 +258,8 @@ check_version(symbol& test, bool added)
test.version_status = symbol::incompatible;
 
   // Check that added symbols are added in the latest pre-release version.
-  bool latestp = (test.version_name == "GLIBCXX_3.4.34"
-|| test.version_name == "CXXABI_1.3.16"
+  bool latestp = (test.version_name == "GLIBCXX_3.4.35"
+|| test.version_name == "CXXABI_1.3.17"
 || test.version_name == "CXXABI_FLOAT128"
 || test.version_name == "CXXABI_TM_1");
   if (added && !latestp)
-- 
2.49.0


-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[PATCH v2 3/3] RISC-V: Add testcases for vec_duplicate + vadd.vv combine to vadd.vx

2025-04-19 Thread pan2 . li
From: Pan Li 

Add asm dump checks and run tests for the vec_duplicate + vadd.vv combine
to vadd.vx.  Introduce a new folder to hold all related testcases.

The below test suites passed for this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add new folder vx_vf for all
vec_dup + vv to vx testcases.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_run.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  17 +
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 401 ++
 .../riscv/rvv/autovec/vx_vf/vx_binary_run.h   |  26 ++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c|   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c|   8 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i16.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i32.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i64.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i8.c  |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u16.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u32.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u64.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u8.c  |  14 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 20 files changed, 622 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_run.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
new file mode 100644
index 000..66654eb9022
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
@@ -0,0 +1,17 @@
+#ifndef HAVE_DEFINED_VX_VF_BINARY_H
+#define HAVE_DEFINED_VX_VF_BINARY_H
+
+#include 
+
+#define DEF_VX_BI

[PATCH] strlen: Handle empty constructor as memset for combining with malloc to calloc [PR87900]

2025-04-19 Thread Andrew Pinski
This was noticed when turning memset (with constant size) into a store of an 
empty constructor
but can be reproduced without that.
In this case we have the following IR:
```
  p_3 = __builtin_malloc (4096);
  *p_3 = {};
```

We can treat that store as a memset.
So this patch adds the same optimization that already exists for memset/malloc,
now for malloc/constructor.
This patch is on top of 
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681439.html
(it calls allow_memset_malloc_to_calloc but that can be removed if that patch 
is rejected).

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/87900

gcc/ChangeLog:

* tree-ssa-strlen.cc  (strlen_pass::handle_assign): Add RHS argument.
For an empty constructor RHS, see if it can be combined with a previous
malloc into a calloc.
(strlen_pass::check_and_optimize_call): Update call to handle_assign;
passing NULL_TREE for RHS.
(strlen_pass::check_and_optimize_stmt): Update call to handle_assign.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/calloc-10.c: New test.
* gcc.dg/tree-ssa/calloc-11.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/tree-ssa/calloc-10.c | 19 
 gcc/testsuite/gcc.dg/tree-ssa/calloc-11.c | 21 +
 gcc/tree-ssa-strlen.cc| 56 +--
 3 files changed, 91 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/calloc-10.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/calloc-11.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/calloc-10.c 
b/gcc/testsuite/gcc.dg/tree-ssa/calloc-10.c
new file mode 100644
index 000..6d91563dc15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/calloc-10.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* PR tree-optimization/87900 */
+
+/* zeroing out via a CONSTRUCTOR should be treated similarly to a memset and
+   be combined with the malloc below.  */
+
+struct S { int a[1024]; };
+struct S *foo ()
+{
+  struct S *p = (struct S *)__builtin_malloc (sizeof (struct S));
+  *p = (struct S){};
+  return p;
+}
+
+/* { dg-final { scan-tree-dump-times "calloc " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "malloc " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "memset " "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/calloc-11.c 
b/gcc/testsuite/gcc.dg/tree-ssa/calloc-11.c
new file mode 100644
index 000..397d7fa1fe8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/calloc-11.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* PR tree-optimization/87900 */
+
+/* zeroing out via a CONSTRUCTOR should be treated similarly to a memset and
+   be combined with the malloc below.  */
+typedef int type;
+
+#define size (1025)
+type *foo ()
+{
+  type *p = (type *)__builtin_malloc (size*sizeof(type));
+  type tmp[size] = {};
+  __builtin_memcpy(p,tmp,sizeof(tmp));
+  return p;
+}
+
+/* { dg-final { scan-tree-dump-times "calloc " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "malloc " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "memset " "optimized" } } */
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index e69ceeffb03..f56570c3b78 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -249,7 +249,7 @@ public:
 
   bool check_and_optimize_stmt (bool *cleanup_eh);
   bool check_and_optimize_call (bool *zero_write);
-  bool handle_assign (tree lhs, bool *zero_write);
+  bool handle_assign (tree lhs, tree rhs, bool *zero_write);
   bool handle_store (bool *zero_write);
   void handle_pointer_plus ();
   void handle_builtin_strlen ();
@@ -5524,7 +5524,7 @@ strlen_pass::check_and_optimize_call (bool *zero_write)
}
 
   if (tree lhs = gimple_call_lhs (stmt))
-   handle_assign (lhs, zero_write);
+   handle_assign (lhs, NULL_TREE, zero_write);
 
   /* Proceed to handle user-defined formatting functions.  */
 }
@@ -5743,15 +5743,61 @@ strlen_pass::handle_integral_assign (bool *cleanup_eh)
 }
 
 /* Handle assignment statement at *GSI to LHS.  Set *ZERO_WRITE if
-   the assignment stores all zero bytes.  */
+   the assignment stores all zero bytes.  RHS is the rhs of the
+   statement, or NULL_TREE if the statement is a call.  */
 
 bool
-strlen_pass::handle_assign (tree lhs, bool *zero_write)
+strlen_pass::handle_assign (tree lhs, tree rhs, bool *zero_write)
 {
   tree type = TREE_TYPE (lhs);
   if (TREE_CODE (type) == ARRAY_TYPE)
 type = TREE_TYPE (type);
 
+  if (rhs && TREE_CODE (rhs) == CONSTRUCTOR
+  && TREE_CODE (lhs) == MEM_REF
+  && TREE_CODE (TREE_OPERAND (lhs, 0)) == SSA_NAME
+  && integer_zerop (TREE_OPERAND (lhs, 1)))
+{
+  /* Set to the non-constant offset added to PTR.  */
+  wide_int offrng[2];
+  gcc_assert (CONSTRUCTOR_NELTS (rhs) == 0);
+  tree ptr = TREE_OPERAND (lhs, 0);
+  tree len = TYPE_SIZE_UNIT (TREE_TYPE (lhs));
+  in

Re: [PING^5][PATCH] Alpha: Fix base block alignment calculation regression

2025-04-19 Thread Maciej W. Rozycki
On Sat, 19 Apr 2025, Jeff Law wrote:

> > > Address this issue by recursing into COMPONENT_REF tree nodes until the
> > > outermost one has been reached, which is supposed to be a MEM_REF one,
> > > accumulating the offset as we go, fixing a commit e0dae4da4c45 ("Alpha:
> > > Also use tree information to get base block alignment") regression.
> > 
> >   Ping for:
> > .
> OK.  Clearly this one slipped through the cracks.

 Good timing, I've applied it now just as I'm about to head away for some 
holiday time.  I'll take care of the other outstanding stuff in this area 
once GCC 16 has opened and work on the Linux kernel side meanwhile, when
I'm back.  Thank you for your review.

  Maciej


Re: [PATCH v4 3/4] libstdc++: Implement std::extents [PR107761].

2025-04-19 Thread Luc Grosheintz




On 4/18/25 7:47 PM, Luc Grosheintz wrote:


On 4/18/25 2:00 PM, Tomasz Kaminski wrote:




On Fri, Apr 18, 2025 at 1:43 PM Luc Grosheintz wrote:


    This implements std::extents from <mdspan> according to N4950 and
    contains partial progress towards PR107761.

    If an extent changes its type, there's a precondition in the 
standard,

    that the value is representable in the target integer type. This
    precondition is not checked at runtime.

    The precondition for 'extents::{static_,}extent' is that '__r < 
rank()'.

    For extents this precondition is always violated and results in
    calling __builtin_trap. For all other specializations it's checked 
via

    __glibcxx_assert.

         PR libstdc++/107761

    libstdc++-v3/ChangeLog:

         * include/std/mdspan (extents): New class.
         * src/c++23/std.cc.in: Add 'using std::extents'.

    Signed-off-by: Luc Grosheintz
    ---

LGTM.
Below, I shared one idea that I found interesting, and you could look into, but it's not necessary.
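
For reference, a minimal usage sketch of std::extents (not part of the patch; it assumes a C++23 standard library that already ships <mdspan>):

```
#include <mdspan>
#include <span>     // std::dynamic_extent
#include <cassert>

int main()
{
  // Rank-2 extents: the first extent (3) is static and encoded in the type,
  // the second is dynamic and stored in the object.
  std::extents<int, 3, std::dynamic_extent> e(4);

  static_assert (decltype(e)::rank() == 2);
  static_assert (decltype(e)::rank_dynamic() == 1);
  static_assert (decltype(e)::static_extent(0) == 3);

  assert (e.extent(0) == 3);  // static extent, also queryable at run time
  assert (e.extent(1) == 4);  // dynamic extent supplied to the constructor
}
```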

  libstdc++-v3/include/std/mdspan  | 249 + 
++

  libstdc++-v3/src/c++23/std.cc.in  |   6 +-
  2 files changed, 254 insertions(+), 1 deletion(-)

    diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/
    std/mdspan
    index 4094a416d1e..f7a47552485 100644
    --- a/libstdc++-v3/include/std/mdspan
    +++ b/libstdc++-v3/include/std/mdspan
    @@ -33,6 +33,11 @@
  #pragma GCC system_header
  #endif

    +#include 
    +#include 
    +#include 
    +#include 
    +
  #define __glibcxx_want_mdspan
  #include 

    @@ -41,6 +46,250 @@
  namespace std _GLIBCXX_VISIBILITY(default)
  {
  _GLIBCXX_BEGIN_NAMESPACE_VERSION
    +  namespace __mdspan
    +  {
    +    template
    +      class _ExtentsStorage
    +      {
    +      public:
    +       static constexpr bool
    +       _S_is_dyn(size_t __ext) noexcept
    +       { return __ext == dynamic_extent; }
    +
    +       template
    +         static constexpr _IndexType
    +         _S_int_cast(const _OIndexType& __other) noexcept
    +         { return _IndexType(__other); }
    +
    +       static constexpr size_t _S_rank = _Extents.size();
    +
    +       // For __r in [0, _S_rank], _S_dynamic_index[__r] is the 
number

    +       // of dynamic extents up to (and not including) __r.
    +       //
    +       // If __r is the index of a dynamic extent, then
    +       // _S_dynamic_index[__r] is the index of that extent in
    +       // _M_dynamic_extents.
    +       static constexpr auto _S_dynamic_index = [] consteval
    +       {
    +         array __ret;
    +         size_t __dyn = 0;
    +         for(size_t __i = 0; __i < _S_rank; ++__i)
    +           {
    +             __ret[__i] = __dyn;
    +             __dyn += _S_is_dyn(_Extents[__i]);
    +           }
    +         __ret[_S_rank] = __dyn;
    +         return __ret;
    +       }();
    +
    +       static constexpr size_t _S_rank_dynamic =
    _S_dynamic_index[_S_rank];
    +
    +       // For __r in [0, _S_rank_dynamic),
    _S_dynamic_index_inv[__r] is the
    +       // index of the __r-th dynamic extent in _Extents.
    +       static constexpr auto _S_dynamic_index_inv = [] consteval
    +       {
    +         array __ret;
    +         for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
    +           if (_S_is_dyn(_Extents[__i]))
    +             __ret[__r++] = __i;
    +         return __ret;
    +       }();
    +
    +       static constexpr size_t
    +       _S_static_extent(size_t __r) noexcept
    +       { return _Extents[__r]; }
    +
    +       constexpr _IndexType
    +       _M_extent(size_t __r) const noexcept
    +       {
    +         auto __se = _Extents[__r];
    +         if (__se == dynamic_extent)
    +           return _M_dynamic_extents[_S_dynamic_index[__r]];
    +         else
    +           return __se;
    +       }
    +
    +       template
    +         constexpr void
    +         _M_init_dynamic_extents(_GetOtherExtent __get_extent) 
noexcept

    +         {
    +           for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
    +             {
    +               size_t __di = __i;
    +               if constexpr (_OtherRank != _S_rank_dynamic)
    +                 __di = _S_dynamic_index_inv[__i];
    +               _M_dynamic_extents[__i] =
    _S_int_cast(__get_extent(__di));
    +             }
    +         }
    +
    +       constexpr
    +       _ExtentsStorage() noexcept = default;
    +
    +       template
    +         constexpr
    +         _ExtentsStorage(const _ExtentsStorage<_OIndexType, 
_OExtents>&

    +                         __other) noexcept
    +         {
    +           _M_init_dynamic_extents<_S_rank>([&__other](size_t __i)
    +             { return __other._M_extent(__i); });
    +         }
  

[to-be-committed][RISC-V][PR target/118410] Improve code generation for some logical ops

2025-04-19 Thread Jeff Law
I'm posting this on behalf of Shreya Munnangi who is working as an 
intern with me.  I've got her digging into prerequisites for removing 
mvconst_internal and would prefer she focus on that rather than our 
patch process at this time.


--



We can use the orn, xnor, andn instructions on RISC-V to improve the
code generated for logical operations when one operand is a constant C where
synthesizing ~C is cheaper than synthesizing C.


This is going to be an N -> N - 1 splitter rather than a 
define_insn_and_split.  A define_insn_and_split can obviously work, but 
has multiple undesirable effects in general.


As a result of implementing this as a simple define_split, we're not
supporting AND at this time.  We need to clean up the mvconst_internal
situation first, after which supporting AND is trivial.
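
To make the benefit concrete, a hedged sketch (RV64 with Zbb assumed; the constant and the instruction sequences in the comment are illustrative reasoning, not taken from the patch):

```
/* For C = 0x00ffffff, i.e. (1L << 24) - 1, synthesizing C takes two
   instructions (lui + addi), but ~C = 0xffffffffff000000 has its low 12 bits
   clear and sign-extends from bit 31, so it is a single lui.  With Zbb the
   OR can then be emitted as
       lui  t0, 0xff000    # t0 = ~C
       orn  a0, a0, t0     # a0 = x | ~t0 = x | C
   instead of lui + addi + or.  XOR via xnor is analogous.  */
long
or_low24 (long x)
{
  return x | ((1L << 24) - 1);
}
```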


This has been tested in Ventana's CI system as well as my tester. 
Obviously we'll wait for the pre-commit tester to run before moving forward.


Jeff


diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 2a3884cfde0..fd49d6bf8e0 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -1263,3 +1263,41 @@ (define_expand "crc4"
   expand_crc_using_clmul (mode, mode, operands);
   DONE;
 })
+
+;; If we have an XOR/IOR with a constant operand (C) and we can
+;; synthesize ~C more efficiently than C, then synthesize ~C and use
+;; xnor/orn instead.
+;;
+;; The same can be done for AND, but mvconst_internal's issues get in
+;; the way.  That's future work.
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (any_or:X (match_operand:X 1 "register_operand")
+ (match_operand:X 2 "const_int_operand")))
+   (clobber (match_operand:X 3 "register_operand"))]
+  "TARGET_ZBB
+   && (riscv_const_insns (operands[2], true)
+   > riscv_const_insns (GEN_INT (~INTVAL (operands[2])), true))"
+  [(const_int 0)]
+{
+  /* Get the inverted constant into the temporary register.  */
+  riscv_emit_move (operands[3], GEN_INT (~INTVAL (operands[2])));
+
+  /* For xnor, the NOT operation is in a different position.  So
+ we have to customize the split code we generate a bit. 
+
+ It is expected that AND will be handled like IOR in the future. */
+  if ( == XOR)
+{
+  rtx x = gen_rtx_XOR (mode, operands[1], operands[3]);
+  x = gen_rtx_NOT (mode, x);
+  emit_insn (gen_rtx_SET (operands[0], x));
+}
+  else
+{
+  rtx x = gen_rtx_NOT (mode, operands[3]);
+  x = gen_rtx_IOR (mode, x, operands[1]);
+  emit_insn (gen_rtx_SET (operands[0], x));
+}
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/riscv/pr118410-1.c 
b/gcc/testsuite/gcc.target/riscv/pr118410-1.c
new file mode 100644
index 000..4a8b847d4f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr118410-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+/* { dg-options "-march=rv64gcb -mabi=lp64d" { target { rv64} } } */
+/* { dg-options "-march=rv32gcb -mabi=ilp32" { target { rv32} } } */
+
+long orlow(long x) { return x | ((1L << 24) - 1); }
+
+/* { dg-final { scan-assembler-times "orn\t" 1 } } */
+/* { dg-final { scan-assembler-not "addi\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr118410-2.c 
b/gcc/testsuite/gcc.target/riscv/pr118410-2.c
new file mode 100644
index 000..b63a1d9c465
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr118410-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+/* { dg-options "-march=rv64gcb -mabi=lp64d" { target { rv64} } } */
+/* { dg-options "-march=rv32gcb -mabi=ilp32" { target { rv32} } } */
+
+long xorlow(long x) { return x ^ ((1L << 24) - 1); }
+
+/* { dg-final { scan-assembler-times "xnor\t" 1 } } */
+/* { dg-final { scan-assembler-not "addi\t" } } */


Re: [PATCH 01/61] Multilib changes

2025-04-19 Thread Jeff Law




On 1/31/25 10:13 AM, Aleksandar Rakic wrote:

From: Robert Suchanek 

Remove single-float and short-double axes from multilib spec.

The single-float/short-double combination is not immediately supportable
from GCC 6 as the -fshort-double option has been removed and we do not
have backend logic to implement a direct replacement. If/when we do this
then it needs appropriate ABI markers to describe the additional
variant.

Remove final remnant of single/short config.

Add the mips32r2 mips16 little endian soft-float multilib.

Add big-endian, MIPS64R6, soft-float, N32/N64 Linux libs.

Add MIPS32R1 HF LE Linux libraries.

Add big endian microMIPSr2 hard/soft float support.

Disable microMIPSr6 multilib configs.

Cherry-picked 2b2481cc71284ad9db3dff60bd6cab2be678e87e,
0e3416279af1417b85d1a09b1e74327c31899a5d,
e50ab07265fd8188bd4275c14b744ed2dc39116d,
32f7098d7d5bee9754c7728639a0e1cdb24d63f7,
24e261b2c9a9bea1c205cfab761c218ad50f938e, and
796ddebed418e953ba7cd5de1da42311fb1fe096
from https://github.com/MIPS/gcc

Signed-off-by: Robert Suchanek 
Signed-off-by: Matthew Fortune 
Signed-off-by: Chao-ying Fu 
Signed-off-by: Faraz Shahbazker 
Signed-off-by: Aleksandar Rakic 
---
  config-ml.in |  25 ++-
  configure|  25 +++
  configure.ac |  25 +++
  gcc/Makefile.in  |  20 ++
  gcc/config.gcc   |  12 +-
  gcc/config/mips/ml-img-elf   |  12 +
  gcc/config/mips/ml-img-linux |  10 +
  gcc/config/mips/ml-mti-elf   |  31 +++
  gcc/config/mips/ml-mti-linux |  27 +++
  gcc/config/mips/mti-elf.h|   2 +
  gcc/config/mips/mti-linux.h  |   2 +
  gcc/config/mips/t-img-elf|  33 ---
  gcc/config/mips/t-img-linux  |  38 
  gcc/config/mips/t-mips-multi | 409 +++
  gcc/config/mips/t-mti-elf|  48 
  gcc/config/mips/t-mti-linux  | 158 --
  gcc/configure|   8 +-
  gcc/configure.ac |   3 +
  gcc/genmultilib  |   3 -
  19 files changed, 604 insertions(+), 287 deletions(-)
  create mode 100644 gcc/config/mips/ml-img-elf
  create mode 100644 gcc/config/mips/ml-img-linux
  create mode 100644 gcc/config/mips/ml-mti-elf
  create mode 100644 gcc/config/mips/ml-mti-linux
  delete mode 100644 gcc/config/mips/t-img-elf
  delete mode 100644 gcc/config/mips/t-img-linux
  create mode 100644 gcc/config/mips/t-mips-multi
  delete mode 100644 gcc/config/mips/t-mti-elf
  delete mode 100644 gcc/config/mips/t-mti-linux
So I'm not at all concerned about the mips specific bits of this patch. 
After all, they only affect mips ports and the changes seem sensible. 
They would need a ChangeLog entry to go forward though.


What is concerning is the config.ml change which has no comments about 
what it's doing or justification in the cover letter.


Similarly it's not clear why we need a blob of mips specific code in 
configure.ac and the files autogenerated from that.


Jeff


Re: [PATCH v2] sh: libgcc: Implement fenv rounding and exceptions for soft-fp [PR118257]

2025-04-19 Thread Oleg Endo
On Sat, 2025-04-19 at 08:13 -0600, Jeff Law wrote:
> 
> On 1/1/25 10:02 AM, Jiaxun Yang wrote:
> > Implement fenv rounding and exceptions for soft-fp, as per SuperH
> > arch specification.
> > 
> > No new tests required, as it's already covered by many torture tests
> > with fenv_exceptions.
> > 
> > PR target/118257
> > 
> > libgcc/ChangeLog:
> > 
> > * config/sh/sfp-machine.h (_FPU_GETCW): Implement with builtin.
> > (_FPU_SETCW): Likewise.
> > (FP_EX_ENABLE_SHIFT): Derive from arch spec.
> > (FP_EX_CAUSE_SHIFT): Likewise.
> > (FP_RND_MASK): Likewise.
> > (FP_EX_INVALID): Likewise.
> > (FP_EX_DIVZERO): Likewise.
> > (FP_EX_ALL): Likewise.
> > (FP_EX_OVERFLOW): Likewise.
> > (FP_EX_UNDERFLOW): Likewise.
> > (FP_EX_INEXACT): Likewise.
> > (_FP_DECL_EX): Declare default FCSR value.
> > (FP_RND_NEAREST): Derive from arch spec.
> > (FP_RND_ZERO): Likewise.
> > (FP_INIT_ROUNDMODE): Likewise.
> > (FP_ROUNDMODE): Likewise.
> > (FP_TRAPPING_EXCEPTIONS): Likewise.
> > (FP_HANDLE_EXCEPTIONS): Implement with _FPU_SETCW.
> I've pushed to the trunk as well, on the assumption that you've got these 
> various constants right.  Oleg, hope I'm not stepping on your toes here, 
> I'm just starting to work through some of the safer gcc-16 stuff.
> 

No, not at all.  Thanks for keeping an eye on it!
I think the committed v2 patch looked good.

Best regards,
Oleg Endo


Re: [PATCH 04/61] Enable LSAN and TSAN for mips with the 64-bit abi

2025-04-19 Thread Jeff Law




On 1/31/25 10:13 AM, Aleksandar Rakic wrote:

From: Chao-ying Fu 

Cherry-picked b9fd138826394dd188936c8031dec676e2d16b47
from https://github.com/MIPS/gcc

Signed-off-by: Chao-ying Fu 
Signed-off-by: Aleksandar Rakic 
---
  libsanitizer/configure.tgt | 5 +
  1 file changed, 5 insertions(+)
This is probably OK, but it's unclear to me if it's dependent upon any 
of the earlier changes.  If it's independent of other changes, then it 
could go in now with a suitable ChangeLog entry.


jeff




Re: [PATCH v2] sh: Correct NaN signalling bit and propagation rules [PR111814]

2025-04-19 Thread Oleg Endo



On Sat, 2025-04-19 at 08:29 -0600, Jeff Law wrote:
> 
> On 1/1/25 6:54 AM, Jiaxun Yang wrote:
> > As per architecture, SuperH has a reversed NaN signalling bit
> > vs IEEE754-2008; it also has a NaN propagation rule similar to
> > MIPS style.
> > 
> > Use mips style float format and mode for all float types, and
> > correct sfp-machine header accordingly.
> > 
> > PR target/111814
> > 
> > gcc/ChangeLog:
> > 
> > * config/sh/sh-modes.def (RESET_FLOAT_FORMAT): Use mips format.
> > (FLOAT_MODE): Use mips mode.
> > 
> > libgcc/ChangeLog:
> > 
> > * config/sh/sfp-machine.h (_FP_NANFRAC_B): Reverse signaling bit.
> > (_FP_NANFRAC_H): Likewise.
> > (_FP_NANFRAC_S): Likewise.
> > (_FP_NANFRAC_D): Likewise.
> > (_FP_NANFRAC_Q): Likewise.
> > (_FP_KEEPNANFRACP): Enable for target.
> > (_FP_QNANNEGATEDP): Enable for target.
> > (_FP_CHOOSENAN): Port from MIPS.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/sh/pr111814.c: New test.
> I haven't seen an explicit ack from Oleg, but he did signal in the PR 
> trail that he was generally on board.
> 
> In the PR trail Joseph noted some desirable changes to glibc.  We're in 
> a bit of a chicken and the egg problem if I read things correctly.  But 
> if someone doesn't go first it'll never untangle.
> 
> So I'll go ahead and ACK for the trunk.  We can backport to the release 
> branches per Joseph's recommendation after it's been on the trunk a bit. 
>   Ideally the glibc side of this would get wrapped up before that 
> project's fall release.
> 

Yes, all good.  Thanks again!

Best regards,
Oleg Endo


Re: [PATCH 07/61] Testsuite: Fix tests properly for compact-branches

2025-04-19 Thread Jeff Law




On 1/31/25 10:13 AM, Aleksandar Rakic wrote:

From: abennett 

Cherry-picked 4420f953c31daf1991011d306a56ab74c39b44ee
and 83c13cb19cb1e87a25326024943b95930a17e86b
from https://github.com/MIPS/gcc

Signed-off-by: Andrew Bennett 
Signed-off-by: Matthew Fortune 
Signed-off-by: Faraz Shahbazker 
Signed-off-by: Aleksandar Rakic 
---
  gcc/testsuite/gcc.target/mips/near-far-1.c | 10 +-
  gcc/testsuite/gcc.target/mips/near-far-2.c | 10 +-
  gcc/testsuite/gcc.target/mips/near-far-3.c | 10 +-
  gcc/testsuite/gcc.target/mips/near-far-4.c | 10 +-

This likely needs to be updated for the trunk.

Before:


=== gcc Summary ===

# of expected passes95
# of unexpected failures25


After:
=== gcc Summary ===

# of expected passes70
# of unexpected failures50

Clearly not going in the right direction.  Configured as 
mips64el-linux-gnuabi64.  Running just the near-far-?.c tests.


jeff



Re: [PATCH 11/61] Fix unsafe comparison against stack_pointer_rtx

2025-04-19 Thread Jeff Law




On 1/31/25 10:13 AM, Aleksandar Rakic wrote:

From: Andrew Bennett 

GCC can modify an rtx which was created using stack_pointer_rtx.
This means that just doing a straight address comparison of an rtx
against stack_pointer_rtx to see whether it is the stack pointer
register will not be correct in all cases.
Umm, no.  There is one and only one stack_pointer_rtx.  If something is 
modifying stack_pointer_rtx, then that's a bug.  This feels like it's 
papering over a problem elsewhere.  At the least it would need a better 
explanation of how/why you're getting addresses that reference the same 
hard register as the stack pointer, but which aren't stack_pointer_rtx.


I vaguely recall a problem in this space from regrename.cc, but I 
thought we fixed that long ago.


jeff



[PATCH] Add assert to array_slice::begin/end

2025-04-19 Thread Andrew Pinski
So while debugging PR 118320, I found it was useful to have
an assert inside array_slice::begin/end that the array slice is valid
rather than getting a segfault.  This adds an assert that is only
enabled for checking.
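
The pattern, as a self-contained sketch (a generic illustration of the idea, not GCC's actual array_slice class):

```
#include <cassert>
#include <cstddef>

/* A span-like view that checks validity in begin ()/end () so that a bogus
   slice fails loudly at the point of use instead of segfaulting later.  */
template <typename T>
struct slice
{
  T *base = nullptr;
  std::size_t size = 0;

  bool is_valid () const { return base != nullptr || size == 0; }

  T *begin () { assert (is_valid ()); return base; }
  T *end ()   { assert (is_valid ()); return base + size; }
};

int main ()
{
  slice<int> bad{nullptr, 4};   /* no storage but nonzero size: invalid */
  for (int x : bad)             /* asserts in begin () when assertions are on */
    (void) x;
}
```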

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* vec.h (array_slice::begin): Assert that the
slice is valid.
(array_slice::end): Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/vec.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/vec.h b/gcc/vec.h
index 915df06f03e..eae4b0feb4b 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -2395,11 +2395,11 @@ public:
   array_slice (vec *v)
 : m_base (v ? v->address () : nullptr), m_size (v ? v->length () : 0) {}
 
-  iterator begin () { return m_base; }
-  iterator end () { return m_base + m_size; }
+  iterator begin () {  gcc_checking_assert (is_valid ()); return m_base; }
+  iterator end () {  gcc_checking_assert (is_valid ()); return m_base + 
m_size; }
 
-  const_iterator begin () const { return m_base; }
-  const_iterator end () const { return m_base + m_size; }
+  const_iterator begin () const { gcc_checking_assert (is_valid ()); return 
m_base; }
+  const_iterator end () const { gcc_checking_assert (is_valid ()); return 
m_base + m_size; }
 
   value_type &front ();
   value_type &back ();
-- 
2.43.0



Re: [PATCH 55/61] Performance drop in mips-img-linux-gnu-gcc 7.x

2025-04-19 Thread Jeff Law




On 2/3/25 2:40 AM, Richard Biener wrote:

On Fri, Jan 31, 2025 at 7:10 PM Aleksandar Rakic
 wrote:


From: Mihailo Stojanovic 


This looks like a target-specific hack; it should be addressed generally
instead of opening up gcse internals to a target hook.

This should also at least come with a testcase.
Agreed across the board.  If we had a testcase then we might even be 
able to suggest avenues for improvement that aren't a hack.  But as it 
stands there's no way for this to go forward as-is.



Jeff



[PUSHED] Fix pr118947-1.c and pr78408-3.c on targets where 32 bytes memcpy uses a vector

2025-04-19 Thread Andrew Pinski
The problem here is that on targets where a 32-byte memcpy will use an integral
(vector) type to do the copy, the code will be optimized in a different way than
expected.  This changes the testcases to use a size of 1025 instead, to make sure
there is no target that will use an integral (vector) type for the memcpy, so the
code is optimized via the method that was just added.

Pushed as obvious after a test run.

gcc/testsuite/ChangeLog:

* gcc.dg/pr118947-1.c: Use 1025 as the size of the buf.
* gcc.dg/pr78408-3.c: Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/pr118947-1.c | 4 ++--
 gcc/testsuite/gcc.dg/pr78408-3.c  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr118947-1.c 
b/gcc/testsuite/gcc.dg/pr118947-1.c
index 70b7f800065..8733e8d7f5c 100644
--- a/gcc/testsuite/gcc.dg/pr118947-1.c
+++ b/gcc/testsuite/gcc.dg/pr118947-1.c
@@ -6,10 +6,10 @@
 void* aaa();
 void* bbb()
 {
-char buf[32] = {};
+char buf[1025] = {};
 /*  Tha call to aaa should not matter and clobber buf. */
 void* ret = aaa();
-__builtin_memcpy(ret, buf, 32);
+__builtin_memcpy(ret, buf, sizeof(buf));
 return ret;
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr78408-3.c b/gcc/testsuite/gcc.dg/pr78408-3.c
index 3de90d02392..5ea545868ad 100644
--- a/gcc/testsuite/gcc.dg/pr78408-3.c
+++ b/gcc/testsuite/gcc.dg/pr78408-3.c
@@ -7,8 +7,8 @@ void* aaa();
 void* bbb()
 {
 void* ret = aaa();
-char buf[32] = {};
-__builtin_memcpy(ret, buf, 32);
+char buf[1025] = {};
+__builtin_memcpy(ret, buf, sizeof(buf));
 return ret;
 }
 
-- 
2.43.0



[PUSHED] Disable parallel testing for 'rust/compile/nr2/compile.exp' [PR119508]

2025-04-19 Thread Thomas Schwinge
..., using the standard idiom.  This '*.exp' file doesn't adhere to the
parallel testing protocol as defined in 'gcc/testsuite/lib/gcc-defs.exp'.

This also restores proper behavior for '*.exp' files executing after (!) this
one, which erroneously caused hundreds or even thousands of individual test
cases to get duplicated vs. skipped, randomly, depending on the '-jN' level.

PR testsuite/119508
gcc/testsuite/
* rust/compile/nr2/compile.exp: Disable parallel testing.
---
 gcc/testsuite/rust/compile/nr2/compile.exp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/testsuite/rust/compile/nr2/compile.exp 
b/gcc/testsuite/rust/compile/nr2/compile.exp
index 4d91dd004a3..9e15cdd7253 100644
--- a/gcc/testsuite/rust/compile/nr2/compile.exp
+++ b/gcc/testsuite/rust/compile/nr2/compile.exp
@@ -19,6 +19,15 @@
 # Load support procs.
 load_lib rust-dg.exp
 
+# These tests don't run runtest_file_p consistently if it
+# doesn't return the same values, so disable parallelization
+# of this *.exp file.  The first parallel runtest to reach
+# this will run all the tests serially.
+if ![gcc_parallel_test_run_p compile] {
+return
+}
+gcc_parallel_test_enable 0
+
 # Initialize `dg'.
 dg-init
 
@@ -136,3 +145,5 @@ namespace eval rust-nr2-ns {
 
 # All done.
 dg-finish
+
+gcc_parallel_test_enable 1
-- 
2.34.1



[PATCH,LRA] Do inheritance transformations for any optimization [PR118591]

2025-04-19 Thread Denis Chertykov


Bugfix for PR118591

This bug occurs only with the '-Os' option.

The function 'inherit_reload_reg ()' has a wrong condition:

static bool
inherit_reload_reg (bool def_p, int original_regno,
enum reg_class cl, rtx_insn *insn, rtx next_usage_insns)
{
  if (optimize_function_for_size_p (cfun))
--  
return false;

It's wrong because we need the inheritance and we need to undo it after an
unsuccessful pass.


I applied the following patch:

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 7dbc7fe1e00..af2d2793159 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5884,7 +5884,11 @@ inherit_reload_reg (bool def_p, int original_regno,
enum reg_class cl, rtx_insn *insn, rtx next_usage_insns)
 {
   if (optimize_function_for_size_p (cfun))
-return false;
+{
+  if (lra_dump_file != NULL)
+   fprintf (lra_dump_file,
+"<< inheritance for -Os <\n");
+}


Debug output from patched gcc:
--- Fragment from t.c.323r.reload --
** Inheritance #1: **

EBB 2
EBB 5
EBB 3
<< inheritance for -Os <
<<
Use smallest class of LD_REGS and GENERAL_REGS
  Creating newreg=59 from oldreg=43, assigning class LD_REGS to inheritance 
r59
Original reg change 43->59 (bb3):
   58: r59:SI=r57:SI
Add original<-inheritance after:
   60: r43:SI=r59:SI

Inheritance reuse change 43->59 (bb3):
   59: r58:SI=r59:SI
  
<< inheritance for -Os <
<<
Use smallest class of ALL_REGS and GENERAL_REGS
  Creating newreg=60 from oldreg=43, assigning class ALL_REGS to 
inheritance r60
Original reg change 43->60 (bb3):
   56: r56:QI=r60:SI#0
Add inheritance<-original before:
   61: r60:SI=r43:SI

Inheritance reuse change 43->60 (bb3):
   57: r57:SI=r60:SI
  
<< inheritance for -Os <
<<
Use smallest class of ALL_REGS and GENERAL_REGS
  Creating newreg=61 from oldreg=43, assigning class ALL_REGS to 
inheritance r61
Original reg change 43->61 (bb3):
   55: r55:QI=r61:SI#1
Add inheritance<-original before:
   62: r61:SI=r43:SI

Inheritance reuse change 43->61 (bb3):
   61: r60:SI=r61:SI
  
<< inheritance for -Os <
<<
Use smallest class of ALL_REGS and GENERAL_REGS
  Creating newreg=62 from oldreg=43, assigning class ALL_REGS to 
inheritance r62
Original reg change 43->62 (bb3):
   54: r54:QI=r62:SI#2
Add inheritance<-original before:
   63: r62:SI=r43:SI

Inheritance reuse change 43->62 (bb3):
   62: r61:SI=r62:SI
  
<< inheritance for -Os <
<<
Use smallest class of ALL_REGS and GENERAL_REGS
  Creating newreg=63 from oldreg=43, assigning class ALL_REGS to 
inheritance r63
Original reg change 43->63 (bb3):
   53: r53:QI=r63:SI#3
Add inheritance<-original before:
   64: r63:SI=r43:SI

Inheritance reuse change 43->63 (bb3):
   63: r62:SI=r63:SI
  
EBB 4

** Pseudo live ranges #1: **

  BB 4
   Insn 43: point = 0, n_alt = -1


[...]

   Assign 24 to reload r56 (freq=2000)
  Reassigning non-reload pseudos
   Assign 24 to r43 (freq=3000)

** Undoing inheritance #1: **

Inherit 5 out of 5 (100.00%)

** Local #2: **

[...]



So, we need 'Inheritance' and we need 'Undoing inheritance'.


The patch:

PR rtl-optimization/118591
gcc/
* lra-constraints.cc (inherit_reload_reg): Do inheritance for any
optimization.


diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 7dbc7fe1e00..bfdab4adc34 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5883,8 +5883,6 @@ static bool
 inherit_reload_reg (bool def_p, int original_regno,
enum reg_class cl, rtx_insn *insn, rtx next_usage_insns)
 {
-  if (optimize_function_for_size_p (cfun))
-return false;
 
   enum reg_class rclass = lra_get_allocno_class (original_