RE: [PATCH] i386: Add AVX10.1 related macros

2024-01-10 Thread Liu, Hongtao



> -----Original Message-----
> From: Jiang, Haochen 
> Sent: Wednesday, January 10, 2024 3:35 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com; burnus@net-
> b.de; san...@codesourcery.com
> Subject: [PATCH] i386: Add AVX10.1 related macros
> 
> Hi all,
> 
> This patch aims to add AVX10.1-related macros for libgomp's request.  The
> request came up in the following thread:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642025.html
> 
> Ok for trunk?
> 
> Thx,
> Haochen
> 
> gcc/ChangeLog:
> 
>   PR target/113288
>   * config/i386/i386-c.cc (ix86_target_macros_internal):
>   Add __AVX10_1__, __AVX10_1_256__ and __AVX10_1_512__.
> ---
>  gcc/config/i386/i386-c.cc | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc index
> c3ae984670b..366b560158a 100644
> --- a/gcc/config/i386/i386-c.cc
> +++ b/gcc/config/i386/i386-c.cc
> @@ -735,6 +735,13 @@ ix86_target_macros_internal (HOST_WIDE_INT
> isa_flag,
>  def_or_undef (parse_in, "__EVEX512__");
>if (isa_flag2 & OPTION_MASK_ISA2_USER_MSR)
>  def_or_undef (parse_in, "__USER_MSR__");
> +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_256)
> +{
> +  def_or_undef (parse_in, "__AVX10_1_256__");
> +  def_or_undef (parse_in, "__AVX10_1__");
I think this is not needed, others LGTM.
> +}
> +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_512)
> +def_or_undef (parse_in, "__AVX10_1_512__");
>if (TARGET_IAMCU)
>  {
>def_or_undef (parse_in, "__iamcu");
> --
> 2.31.1



Re: [PATCH] i386: [APX] Document inline asm behavior and new switch for APX

2024-01-10 Thread Hongtao Liu
On Tue, Jan 9, 2024 at 3:09 PM Hongyu Wang  wrote:
>
> Hi,
>
> For APX, the inline asm behavior was not mentioned in any document
> before. Add description for it.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.opt: Adjust document.
> * doc/invoke.texi: Add description for
> -mapx-inline-asm-use-gpr32.
> ---
>  gcc/config/i386/i386.opt | 3 +--
>  gcc/doc/invoke.texi  | 7 +++
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index a38e92baf92..5b4f1bff25f 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1357,8 +1357,7 @@ Enum(apx_features) String(all) Value(apx_all) Set(1)
>
>  mapx-inline-asm-use-gpr32
>  Target Var(ix86_apx_inline_asm_use_gpr32) Init(0)
> -Enable GPR32 in inline asm when APX_EGPR enabled, do not
> -hook reg or mem constraint in inline asm to GPR16.
> +Enable GPR32 in inline asm when APX_F enabled.
>
>  mevex512
>  Target Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 68d1f364ac0..47fd96648d8 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -35272,6 +35272,13 @@ r8-r15 registers so that the call and jmp 
> instruction length is 6 bytes
>  to allow them to be replaced with @samp{lfence; call *%r8-r15} or
>  @samp{lfence; jmp *%r8-r15} at run-time.
>
> +@opindex mapx-inline-asm-use-gpr32
> +@item -mapx-inline-asm-use-gpr32
> +When APX_F enabled, EGPR usage was by default disabled to prevent
> +unexpected EGPR generation in instructions that does not support it.
> +To invoke EGPR usage in inline asm, use this switch to allow EGPR in
> +inline asm, while user should ensure the asm actually supports EGPR.
Please align with
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642228.html.
Ok after changing that.
> +
>  @end table
>
>  These @samp{-m} switches are supported in addition to the above
> --
> 2.31.1
>


-- 
BR,
Hongtao


Re: [PATCH] i386: [APX] Document inline asm behavior and new switch for APX

2024-01-10 Thread Hongyu Wang
Thanks, this is the patch I'm going to check-in

Hongtao Liu  于2024年1月10日周三 16:02写道:
>
> On Tue, Jan 9, 2024 at 3:09 PM Hongyu Wang  wrote:
> >
> > Hi,
> >
> > For APX, the inline asm behavior was not mentioned in any document
> > before. Add description for it.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.opt: Adjust document.
> > * doc/invoke.texi: Add description for
> > -mapx-inline-asm-use-gpr32.
> > ---
> >  gcc/config/i386/i386.opt | 3 +--
> >  gcc/doc/invoke.texi  | 7 +++
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > index a38e92baf92..5b4f1bff25f 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -1357,8 +1357,7 @@ Enum(apx_features) String(all) Value(apx_all) Set(1)
> >
> >  mapx-inline-asm-use-gpr32
> >  Target Var(ix86_apx_inline_asm_use_gpr32) Init(0)
> > -Enable GPR32 in inline asm when APX_EGPR enabled, do not
> > -hook reg or mem constraint in inline asm to GPR16.
> > +Enable GPR32 in inline asm when APX_F enabled.
> >
> >  mevex512
> >  Target Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 68d1f364ac0..47fd96648d8 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -35272,6 +35272,13 @@ r8-r15 registers so that the call and jmp 
> > instruction length is 6 bytes
> >  to allow them to be replaced with @samp{lfence; call *%r8-r15} or
> >  @samp{lfence; jmp *%r8-r15} at run-time.
> >
> > +@opindex mapx-inline-asm-use-gpr32
> > +@item -mapx-inline-asm-use-gpr32
> > +When APX_F enabled, EGPR usage was by default disabled to prevent
> > +unexpected EGPR generation in instructions that does not support it.
> > +To invoke EGPR usage in inline asm, use this switch to allow EGPR in
> > +inline asm, while user should ensure the asm actually supports EGPR.
> Please align with
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642228.html.
> Ok after changing that.
> > +
> >  @end table
> >
> >  These @samp{-m} switches are supported in addition to the above
> > --
> > 2.31.1
> >
>
>
> --
> BR,
> Hongtao
From 7de5bb642c1265ff57a009dd889ab435b098bfca Mon Sep 17 00:00:00 2001
From: Hongyu Wang 
Date: Tue, 9 Jan 2024 15:00:21 +0800
Subject: [PATCH] i386: [APX] Document inline asm behavior and new switch for
 APX

For APX, the inline asm behavior was not mentioned in any document
before. Add description for it.

gcc/ChangeLog:

	* config/i386/i386.opt: Adjust document.
	* doc/invoke.texi: Add description for
	-mapx-inline-asm-use-gpr32.
---
 gcc/config/i386/i386.opt | 3 +--
 gcc/doc/invoke.texi  | 8 
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index a38e92baf92..5b4f1bff25f 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1357,8 +1357,7 @@ Enum(apx_features) String(all) Value(apx_all) Set(1)
 
 mapx-inline-asm-use-gpr32
 Target Var(ix86_apx_inline_asm_use_gpr32) Init(0)
-Enable GPR32 in inline asm when APX_EGPR enabled, do not
-hook reg or mem constraint in inline asm to GPR16.
+Enable GPR32 in inline asm when APX_F enabled.
 
 mevex512
 Target Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a494420e24e..216e2f594d1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -35272,6 +35272,14 @@ r8-r15 registers so that the call and jmp instruction length is 6 bytes
 to allow them to be replaced with @samp{lfence; call *%r8-r15} or
 @samp{lfence; jmp *%r8-r15} at run-time.
 
+@opindex mapx-inline-asm-use-gpr32
+@item -mapx-inline-asm-use-gpr32
+For inline asm support with APX, by default the EGPR feature was
+disabled to prevent potential illegal instruction with EGPR occurs.
+To invoke egpr usage in inline asm, use new compiler option
+-mapx-inline-asm-use-gpr32 and user should ensure the instruction
+supports EGPR.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
-- 
2.31.1



Re: [RFC] Either fix or disable SME feature for `aarch64-w64-mingw32` target?

2024-01-10 Thread Jonathan Yong

On 1/9/24 19:37, Radek Barton wrote:

Hello.

I forgot to add the target maintainers to the CC. My apologies for that.

Furthermore, I am also adding the relevant changes in `libgcc/config/aarch64/lse.S` 
file to the patch. Originally we wanted to submit those changes separately but 
after the feedback from Andrew Pinski, it makes sense to add them here. I 
needed to rename `HIDDEN`, `TYPE`, and `SIZE` macros to `HIDDEN_PO`, `TYPE_PO`, 
and `SIZE_PO` (pseudo-op) because there is a collision with other macro named 
`SIZE` in the `lse.S` file.

Best regards,

Radek


Looks fine to me, but is __ELF__ correct? I am not familiar with 
pseudo-ops, OK if it is ELF specific when PE is targeted.




Re: [PATCH] PR target/112886, Add %S to print_operand for vector pair support

2024-01-10 Thread Michael Meissner
On Tue, Jan 09, 2024 at 04:35:22PM -0600, Peter Bergner wrote:
> On 1/5/24 4:18 PM, Michael Meissner wrote:
> > @@ -14504,13 +14504,17 @@ print_operand (FILE *file, rtx x, int code)
> > print_operand (file, x, 0);
> >return;
> >  
> > +case 'S':
> >  case 'x':
> > -  /* X is a FPR or Altivec register used in a VSX context.  */
> > +  /* X is a FPR or Altivec register used in a VSX context.  %x 
> > prints
> > +the VSX register number, %S prints the 2nd register number for
> > +vector pair, decimal 128-bit floating and IBM 128-bit binary floating
> > +values.  */
> >if (!REG_P (x) || !VSX_REGNO_P (REGNO (x)))
> > -   output_operand_lossage ("invalid %%x value");
> > +   output_operand_lossage ("invalid %%%c value", (code == 'S' ? 'S' : 
> > 'x'));
> >else
> > {
> > - int reg = REGNO (x);
> > + int reg = REGNO (x) + (code == 'S' ? 1 : 0);
> >   int vsx_reg = (FP_REGNO_P (reg)
> >  ? reg - 32
> >  : reg - FIRST_ALTIVEC_REGNO + 32);
> 
> The above looks good to me.  However:
> 
> 
> > +  : "=v" (*p)
> > +  : "v" (*q), "v" (*r));
> 
> These really should use "wa" rather than "v", since these are
> VSX instructions... or did you use those to ensure you got
> Altivec registers numbers assigned?

Yes, in real code you would typically use "wa" instead of "v".  I used them in
the test to ensure that I was getting a register to show the problem.

But I can imagine circumstances where you are doing extended asm with 2 or more
instructions: one that uses the VSX encoding (where you would use %S) and
another that uses the Altivec encoding (where you would use %L), and there you
would use the "v" constraint.

> > +/* { dg-final { scan-assembler-times {\mxvadddp 
> > (3[2-9]|[45][0-9]|6[0-3]),(3[2-9]|[45][0-9]|6[0-3]),(3[2-9]|[45][0-9]|6[0-3])\M}
> >  2 } } */
> 
> ...and this is really ugly and hard to read/understand.  Can't we use
> register variables to make it simpler?  Something like the following
> which tests having both FPR and Altivec reg numbers assigned?
> 
> ...
> void
> test (__vector_pair *ptr)
> {
>   register __vector_pair p asm ("vs10");
>   register __vector_pair q asm ("vs42");
>   register __vector_pair r asm ("vs44");
>   q = ptr[1];
>   r = ptr[2];
>   __asm__ ("xvadddp %x0,%x1,%x2\n\txvadddp %S0,%S1,%S2"
>  : "=wa" (p)
>  : "wa" (q), "wa" (r));
>   ptr[2] = p;
> }
> 
> /* { dg-final { scan-assembler-times {\mxvadddp 10,42,44\M} 1 } } */
> /* { dg-final { scan-assembler-times {\mxvadddp 11,43,45\M} 1 } } */

Yes that probably will work.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [RFC] Either fix or disable SME feature for `aarch64-w64-mingw32` target?

2024-01-10 Thread Iain Sandoe



> On 10 Jan 2024, at 08:49, Jonathan Yong <10wa...@gmail.com> wrote:
> 
> On 1/9/24 19:37, Radek Barton wrote:
>> Hello.
>> I forgot to add the target maintainers to the CC. My apologies for that.
>> Furthermore, I am adding also relevant changes in 
>> `libgcc/config/aarch64/lse.S` file to the patch. Originally we wanted to 
>> submit those changes separately but after the feedback from Andrew Pinski, 
>> it makes sense to add them here. I needed to rename `HIDDEN`, `TYPE`, and 
>> `SIZE` macros to `HIDDEN_PO`, `TYPE_PO`, and `SIZE_PO` (pseudo-op) because 
>> there is a collision with other macro named `SIZE` in the `lse.S` file.
>> Best regards,
>> Radek
> 
> Looks fine to me, but is __ELF__ correct? I am not familiar with pseudo-ops, 
> OK if it is ELF specific when PE is targeted.
> 

I suspect that, in the end, we really need to generalize this so that ELF, XCOFF, 
Mach-O etc. are handled.  In other places in the tree, typically an “asm.h” (or 
similar name) is included which contains macros that adjust:

global symbol
local symbol
type
size

(and sometimes .cfi_-related)

Then the asm sources are adjusted to use those macros throughout, which means 
that they build correctly for the different object file formats.

You should be able to find a suitable example in other ports which could be 
updated to cater for aarch64-specific cases.

0.02GBP only.  (I do not have cycles to tackle this myself right now, although 
I have temporary workarounds in my Darwin branch)

Iain



Re: [PATCH] PR target/112886, Add %S to print_operand for vector pair support

2024-01-10 Thread Kewen.Lin
Hi Mike,

on 2024/1/6 06:18, Michael Meissner wrote:
> In looking at support for load vector pair and store vector pair for the
> PowerPC in GCC, I noticed that we were missing a print_operand output modifier
> if you are dealing with vector pairs to print the 2nd register in the vector
> pair.
> 
> If the instruction inside of the asm used the Altivec encoding, then we could
> use the %L modifier:

It seems there is no Power-specific documentation on operand modifiers like this
"%L"?

> 
>   __vector_pair *p, *q, *r;
>   // ...
>   __asm__ ("vaddudm %0,%1,%2\n\tvaddudm %L0,%L1,%L2"
>: "=v" (*p)
>: "v" (*q), "v" (*r));
> 
> Likewise if we know the value to be in a traditional FPR register, %L will
> work for instructions that use the VSX encoding:
> 
>   __vector_pair *p, *q, *r;
>   // ...
>   __asm__ ("xvadddp %x0,%x1,%x2\n\txvadddp %L0,%L1,%L2"
>: "=f" (*p)
>: "f" (*q), "f" (*r));
> 
> But if you have a value that is in a traditional Altivec register, and the
> instruction uses the VSX encoding, %L will give a value between 0 and 31,
> when it should give a value between 32 and 63.
> 
> This patch adds %S that acts like %x, except that it adds 1 to the
> register number.

Excepting for Peter's comments, since the existing "%L" has different handlings
on REG_P and MEM_P:

case 'L':
  /* Write second word of DImode or DFmode reference.  Works on register
 or non-indexed memory only.  */
  if (REG_P (x))
fputs (reg_names[REGNO (x) + 1], file);
  else if (MEM_P (x))
...

, maybe we can extend the existing '%X' for this similarly (as it's the capital
of %x, so easier to remember, and it's only used for MEM_P now) instead of
introducing a new "%S".  But one argument for a new character is that it's
more clear.  Thoughts?

BR,
Kewen

> 
> I have tested this on power10 and power9 little endian systems and on a power9
> big endian system.  There were no regressions in the patch.  Can I apply it to
> the trunk?
> 
> It would be nice if I could apply it to the open branches.  Can I backport it
> after a burn-in period?
> 
> 2024-01-04  Michael Meissner  
> 
> gcc/
> 
>   PR target/112886
>   * config/rs6000/rs6000.cc (print_operand): Add %S output modifier.
>   * doc/md.texi (Modifiers): Mention %S can be used like %x.
> 
> gcc/testsuite/
> 
>   PR target/112886
>   * /gcc.target/powerpc/pr112886.c: New test.
> ---
>  gcc/config/rs6000/rs6000.cc | 10 +++---
>  gcc/doc/md.texi |  5 +++--
>  gcc/testsuite/gcc.target/powerpc/pr112886.c | 19 +++
>  3 files changed, 29 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr112886.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 5a7e00b03d1..ba89377c9ec 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14504,13 +14504,17 @@ print_operand (FILE *file, rtx x, int code)
>   print_operand (file, x, 0);
>return;
>  
> +case 'S':
>  case 'x':
> -  /* X is a FPR or Altivec register used in a VSX context.  */
> +  /* X is a FPR or Altivec register used in a VSX context.  %x prints
> +  the VSX register number, %S prints the 2nd register number for
> +  vector pair, decimal 128-bit floating and IBM 128-bit binary floating
> +  values.  */
>if (!REG_P (x) || !VSX_REGNO_P (REGNO (x)))
> - output_operand_lossage ("invalid %%x value");
> + output_operand_lossage ("invalid %%%c value", (code == 'S' ? 'S' : 
> 'x'));
>else
>   {
> -   int reg = REGNO (x);
> +   int reg = REGNO (x) + (code == 'S' ? 1 : 0);
> int vsx_reg = (FP_REGNO_P (reg)
>? reg - 32
>: reg - FIRST_ALTIVEC_REGNO + 32);
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 47a87d6ceec..53ec957cb23 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -3386,8 +3386,9 @@ A VSX register (VSR), @code{vs0}@dots{}@code{vs63}.  
> This is either an
>  FPR (@code{vs0}@dots{}@code{vs31} are @code{f0}@dots{}@code{f31}) or a VR
>  (@code{vs32}@dots{}@code{vs63} are @code{v0}@dots{}@code{v31}).
>  
> -When using @code{wa}, you should use the @code{%x} output modifier, so that
> -the correct register number is printed.  For example:
> +When using @code{wa}, you should use either the @code{%x} or @code{%S}
> +output modifier, so that the correct register number is printed.  For
> +example:
>  
>  @smallexample
>  asm ("xvadddp %x0,%x1,%x2"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr112886.c 
> b/gcc/testsuite/gcc.target/powerpc/pr112886.c
> new file mode 100644
> index 000..07196bdc220
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr112886.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=powe

Re: [RFC] Either fix or disable SME feature for `aarch64-w64-mingw32` target?

2024-01-10 Thread Iain Sandoe



> On 10 Jan 2024, at 09:02, Iain Sandoe  wrote:

>> On 10 Jan 2024, at 08:49, Jonathan Yong <10wa...@gmail.com> wrote:
>> 
>> On 1/9/24 19:37, Radek Barton wrote:
>>> Hello.
>>> I forgot to add the target maintainers to the CC. My apologies for that.
>>> Furthermore, I am adding also relevant changes in 
>>> `libgcc/config/aarch64/lse.S` file to the patch. Originally we wanted to 
>>> submit those changes separately but after the feedback from Andrew Pinski, 
>>> it makes sense to add them here. I needed to rename `HIDDEN`, `TYPE`, and 
>>> `SIZE` macros to `HIDDEN_PO`, `TYPE_PO`, and `SIZE_PO` (pseudo-op) because 
>>> there is a collision with other macro named `SIZE` in the `lse.S` file.
>>> Best regards,
>>> Radek
>> 
>> Looks fine to me, but is __ELF__ correct? I am not familiar with pseudo-ops, 
>> OK if it is ELF specific when PE is targeted.
>> 
> 
> I suspect that, in the end, we really need to generalize this so that ELF, 
> XCOFF, Mach-O etc. are handled.  In other places in the tree, typically an 
> “asm.h” (or similar name) is included which contains macros that adjust:
> 
> global symbol
> local symbol
> type
> size
> 
> (and sometimes .cfi_-related)
> 
> Then the asm sources are adjusted to use those macros throughout, which 
> means that they build correctly for the different object file formats.
> 
> You should be able to find a suitable example in other ports which could be 
> updated to cater for aarch64-specific cases.

duh, I was not looking hard enough - it seems that there is already such a file
libgcc/config/aarch64/aarch64-asm.h 
It has just not been used in the SME stuff.

Iain



Re: [PATCH 1/8] OpenMP: lvalue parsing for map/to/from clauses (C++)

2024-01-10 Thread Jakub Jelinek
On Fri, Jan 05, 2024 at 12:23:26PM +, Julian Brown wrote:
> * g++.dg/gomp/bad-array-section-10.C: New test.

This test FAILs in C++23/C++26 modes; just try
make check-g++ GXX_TESTSUITE_STDS=98,11,14,17,20,23,26 
RUNTESTFLAGS=gomp.exp=bad-array-section-10.C
While in C++20 the comma in array references was deprecated, in C++23 we
implement multidimensional array subscripts, so the diagnostics there are
different.
See https://wg21.link/p2036r3

Jakub



RE: [PATCH]middle-end: Don't apply copysign optimization if target does not implement optab [PR112468]

2024-01-10 Thread Tamar Christina
ping

> -----Original Message-----
> From: Tamar Christina 
> Sent: Friday, January 5, 2024 1:31 PM
> To: Xi Ruoyao ; Palmer Dabbelt 
> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de; Jeff Law
> 
> Subject: RE: [PATCH]middle-end: Don't apply copysign optimization if target 
> does
> not implement optab [PR112468]
> 
> > On Fri, 2024-01-05 at 11:02 +, Tamar Christina wrote:
> > > Ok, so something like:
> > >
> > > > > ([istarget loongarch*-*-*] &&
> > > > > ([check_effective_target_loongarch_sx] ||
> > > > > [check_effective_target_hard_float]))
> > > ?
> >
> > We don't need "[check_effective_target_loongarch_sx] ||" because SIMD
> > requires hard float.
> >
> 
> Cool, thanks!
> 
> --
> 
> Hi All,
> 
> Currently GCC does not treat IFN_COPYSIGN the same as the copysign tree expr.
> The latter has a libcall fallback and the IFN can only do optabs.
> 
> Because of this the change I made to optimize copysign only works if the
> target has implemented the optab, but it should work for those that have the
> libcall too.
> 
> More annoyingly if a target has vector versions of ABS and NEG but not 
> COPYSIGN
> then the change made them lose vectorization.
> 
> The proper fix for this is to treat the IFN the same as the tree EXPR and to
> enhance expand_COPYSIGN to also support vector calls.
> 
> I have such a patch for GCC 15 but it's quite big and too invasive for 
> stage-4.
> As such this is a minimal fix, just don't apply the transformation and leave
> targets which don't have the optab unoptimized.
> 
> Targets list for check_effective_target_ifn_copysign was gotten by grepping 
> for
> copysign and looking at the optab.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Tests ran in x86_64-pc-linux-gnu -m32 and tests no longer fail.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/112468
>   * doc/sourcebuild.texi: Document ifn_copysign.
>   * match.pd: Only apply transformation if target supports the IFN.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/112468
>   * gcc.dg/fold-copysign-1.c: Modify tests based on if target supports
>   IFN_COPYSIGN.
>   * gcc.dg/pr55152-2.c: Likewise.
>   * gcc.dg/tree-ssa/abs-4.c: Likewise.
>   * gcc.dg/tree-ssa/backprop-6.c: Likewise.
>   * gcc.dg/tree-ssa/copy-sign-2.c: Likewise.
>   * gcc.dg/tree-ssa/mult-abs-2.c: Likewise.
>   * lib/target-supports.exp (check_effective_target_ifn_copysign): New.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index
> 4be67daedb20d394857c02739389cabf23c0d533..f4847dafe65cbbf8c9de3490
> 5f614ef6957658b4 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2664,6 +2664,10 @@ Target requires a command line argument to enable a
> SIMD instruction set.
>  @item xorsign
>  Target supports the xorsign optab expansion.
> 
> +@item ifn_copysign
> +Target supports the IFN_COPYSIGN optab expansion for both scalar and vector
> +types.
> +
>  @end table
> 
>  @subsubsection Environment attributes
> diff --git a/gcc/match.pd b/gcc/match.pd
> index
> d57e29bfe1d68afd4df4dda20fecc2405ff05332..87d13e7e3e1aa6d89119142b6
> 14890dc4729b521 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1159,13 +1159,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (simplify
>(copysigns @0 REAL_CST@1)
>(if (!REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
> -   (abs @0
> +   (abs @0)
> +#if GIMPLE
> +   (if (!direct_internal_fn_supported_p (IFN_COPYSIGN, type,
> +  OPTIMIZE_FOR_BOTH))
> +(negate (abs @0)))
> +#endif
> +   )))
> 
> +#if GIMPLE
>  /* Transform fneg (fabs (X)) -> copysign (X, -1).  */
>  (simplify
>   (negate (abs @0))
> - (IFN_COPYSIGN @0 { build_minus_one_cst (type); }))
> -
> + (if (direct_internal_fn_supported_p (IFN_COPYSIGN, type,
> +   OPTIMIZE_FOR_BOTH))
> +   (IFN_COPYSIGN @0 { build_minus_one_cst (type); })))
> +#endif
>  /* copysign(copysign(x, y), z) -> copysign(x, z).  */
>  (for copysigns (COPYSIGN_ALL)
>   (simplify
> diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c 
> b/gcc/testsuite/gcc.dg/fold-
> copysign-1.c
> index
> f9cafd14ab05f5e8ab2f6f68e62801d21c2df6a6..96b80c733794fffada1b08274ef
> 39cc8f6e442ce 100644
> --- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
> +++ b/gcc/testsuite/gcc.dg/fold-copysign-1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O -fdump-tree-cddce1" } */
> +/* { dg-additional-options "-msse -mfpmath=sse" { target { { i?86-*-* 
> x86_64-*-*
> } && ilp32 } } } */
> 
>  double foo (double x)
>  {
> @@ -12,5 +13,7 @@ double bar (double x)
>return __builtin_copysign (x, minuszero);
>  }
> 
> -/* { dg-final { scan-tree-dump-times "__builtin_copysign" 1 "cddce1" } } */
> -/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "cddce1" } } */
> +/* { dg-final { scan-tree-dump-times "__builtin_copysign" 1 "cddce1" { targe

[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.

2024-01-10 Thread Jun Sha (Joshua)
This patch only involves the generation of xtheadvector
special load/store instructions and vext instructions.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class th_loadstore_width): Define new builtin bases.
(BASE): Define new builtin bases.
* config/riscv/riscv-vector-builtins-bases.h:
Define new builtin class.
* config/riscv/riscv-vector-builtins-functions.def (vlsegff):
Include thead-vector-builtins-functions.def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct th_loadstore_width_def): Define new builtin shapes.
(struct th_indexed_loadstore_width_def):
Define new builtin shapes.
(SHAPE): Define new builtin shapes.
* config/riscv/riscv-vector-builtins-shapes.h:
Define new builtin shapes.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
(vint8m1_t): Add datatypes for XTheadVector.
(vint8m2_t): Likewise.
(vint8m4_t): Likewise.
(vint8m8_t): Likewise.
(vint16m1_t): Likewise.
(vint16m2_t): Likewise.
(vint16m4_t): Likewise.
(vint16m8_t): Likewise.
(vint32m1_t): Likewise.
(vint32m2_t): Likewise.
(vint32m4_t): Likewise.
(vint32m8_t): Likewise.
(vint64m1_t): Likewise.
(vint64m2_t): Likewise.
(vint64m4_t): Likewise.
(vint64m8_t): Likewise.
(vuint8m1_t): Likewise.
(vuint8m2_t): Likewise.
(vuint8m4_t): Likewise.
(vuint8m8_t): Likewise.
(vuint16m1_t): Likewise.
(vuint16m2_t): Likewise.
(vuint16m4_t): Likewise.
(vuint16m8_t): Likewise.
(vuint32m1_t): Likewise.
(vuint32m2_t): Likewise.
(vuint32m4_t): Likewise.
(vuint32m8_t): Likewise.
(vuint64m1_t): Likewise.
(vuint64m2_t): Likewise.
(vuint64m4_t): Likewise.
(vuint64m8_t): Likewise.
* config/riscv/riscv-vector-builtins.cc
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
* config/riscv/thead-vector-builtins-functions.def: New file.
* config/riscv/thead-vector.md: Add new patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: New test.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 .../riscv/riscv-vector-builtins-bases.cc  | 139 
 .../riscv/riscv-vector-builtins-bases.h   |  31 ++
 .../riscv/riscv-vector-builtins-shapes.cc |  98 ++
 .../riscv/riscv-vector-builtins-shapes.h  |   3 +
 .../riscv/riscv-vector-builtins-types.def | 120 +++
 gcc/config/riscv/riscv-vector-builtins.cc | 311 ++
 gcc/config/riscv/riscv-vector-builtins.h  |   3 +
 gcc/config/riscv/t-riscv  |   1 +
 .../riscv/thead-vector-builtins-functions.def |  39 +++
 gcc/config/riscv/thead-vector.md  | 253 ++
 .../riscv/rvv/xtheadvector/vlb-vsb.c  |  68 
 .../riscv/rvv/xtheadvector/vlbu-vsb.c |  68 
 .../riscv/rvv/xtheadvector/vlh-vsh.c  |  68 
 .../riscv/rvv/xtheadvector/vlhu-vsh.c |  68 
 .../riscv/rvv/xtheadvector/vlw-vsw.c  |  68 
 .../riscv/rvv/xtheadvector/vlwu-vsw.c |  68 
 16 files changed, 1406 insertions(+)
 create mode 100644 gcc/config/riscv/thead-vector-builtins-functions.def
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 46f1a1da3


Re: [committed] libgomp: Use absolute pathname to testsuite/flock [PR113192]

2024-01-10 Thread Rainer Orth
Hi Jakub,

> When flock program doesn't exist, libgomp configure attempts to
> offer a fallback version using a perl script, but we weren't using
> absolute filename to that, so it apparently failed to work correctly.
>
> The following patch arranges for it to get the absolute filename.
>
> Tested by John David in the PR.

This patch completely broke parallel libgomp testing on Solaris:

ERROR: couldn't execute "\$(abs_top_srcdir)/testsuite/flock": no such file or 
directory

FLOCK is also substituted into testsuite/libgomp-site-extra.exp.in,
which gets included into site.exp.  That one has

## Begin content included from file libgomp-site-extra.exp.  Do not modify. ##
set FLOCK {$(abs_top_srcdir)/testsuite/flock}

So expect tries to literally execute '$(abs_top_srcdir)/testsuite/flock'
which cannot work.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix debug info for enumeration types with reverse Scalar_Storage_Order

2024-01-10 Thread Richard Biener
On Tue, Jan 9, 2024 at 9:18 PM Eric Botcazou  wrote:
>
> Hi,
>
> this is not really a regression but the patch was written last week and is
> quite straightforward, so hopefully it can nevertheless be OK.  It implements the
> support of DW_AT_endianity for enumeration types because they are scalar and,
> therefore, reverse Scalar_Storage_Order is supported for them, but only when
> the -gstrict-dwarf switch is not passed because this is an extension.
>
> There is an associated GDB patch to be submitted by Tom to grok the new DWARF.
>
> Tested on x86-64/Linux, OK for the mainline?  It may also help the GDB side to
> backport it for the upcoming 13.3 release.

Can you elaborate on the DIE order constraint and why it was chosen?  That is,

+  /* The DIE with DW_AT_endianity is placed right after the naked DIE.  */
+  if (reverse)
+   {
+ gcc_assert (type_die);
...

and

+  /* The DIE with DW_AT_endianity is placed right after the naked DIE.  */
+  if (reverse_type)
+   {
+ dw_die_ref after_die
+   = modified_type_die (type, cv_quals, false, context_die);
+ gen_type_die (type, context_die, true);
+ gcc_assert (after_die->die_sib
+ && get_AT_unsigned (after_die->die_sib, DW_AT_endianity));
+ return after_die->die_sib;

?

Likewise the extra argument to the functions is odd - is that not available
on the tree type?

Richard.

>
> 2024-01-09  Eric Botcazou  
>
> * dwarf2out.cc (modified_type_die): Extend the support of reverse
> storage order to enumeration types if -gstrict-dwarf is not passed.
> (gen_enumeration_type_die): Add REVERSE parameter and generate the
> DIE immediately after the existing one if it is true.
> (gen_tagged_type_die): Add REVERSE parameter and pass it in the
> call to gen_enumeration_type_die.
> (gen_type_die_with_usage): Add REVERSE parameter and pass it in the
> first recursive call as well as the call to gen_tagged_type_die.
> (gen_type_die): Add REVERSE parameter and pass it in the call to
> gen_type_die_with_usage.
>
> --
> Eric Botcazou


[PATCH] sra: Partial fix for BITINT_TYPEs [PR113120]

2024-01-10 Thread Jakub Jelinek
Hi!

As changed in other parts of the compiler, using
build_nonstandard_integer_type is not appropriate for arbitrary precisions,
especially if the precision comes from a BITINT_TYPE or something based on
that, build_nonstandard_integer_type relies on some integral mode being
supported that can support the precision.

The following patch uses build_bitint_type instead for BITINT_TYPE
precisions.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, it would be good if we were able to punt on the optimization
(but this code doesn't seem to be able to punt, so it needs to be done
somewhere earlier) at least in cases where building it would be invalid.
E.g. right now BITINT_TYPE can support precisions up to 65535 (inclusive),
but 65536 will not work anymore (we can't have > 16-bit TYPE_PRECISION).
I've tried to replace 513 with 65532 in the testcase and it didn't ICE,
so maybe it ran into some other SRA limit.

2024-01-10  Jakub Jelinek  

PR tree-optimization/113120
* tree-sra.cc (analyze_access_subtree): For BITINT_TYPE
with root->size TYPE_PRECISION don't build anything new.
Otherwise, if root->type is a BITINT_TYPE, use build_bitint_type
rather than build_nonstandard_integer_type.

* gcc.dg/bitint-63.c: New test.

--- gcc/tree-sra.cc.jj  2024-01-03 11:51:35.054682295 +0100
+++ gcc/tree-sra.cc 2024-01-09 19:50:42.911500487 +0100
@@ -2733,7 +2733,8 @@ analyze_access_subtree (struct access *r
  For integral types this means the precision has to match.
 Avoid assumptions based on the integral type kind, too.  */
   if (INTEGRAL_TYPE_P (root->type)
- && (TREE_CODE (root->type) != INTEGER_TYPE
+ && ((TREE_CODE (root->type) != INTEGER_TYPE
+  && TREE_CODE (root->type) != BITINT_TYPE)
  || TYPE_PRECISION (root->type) != root->size)
  /* But leave bitfield accesses alone.  */
  && (TREE_CODE (root->expr) != COMPONENT_REF
@@ -2742,8 +2743,11 @@ analyze_access_subtree (struct access *r
  tree rt = root->type;
  gcc_assert ((root->offset % BITS_PER_UNIT) == 0
  && (root->size % BITS_PER_UNIT) == 0);
- root->type = build_nonstandard_integer_type (root->size,
-  TYPE_UNSIGNED (rt));
+ if (TREE_CODE (root->type) == BITINT_TYPE)
+   root->type = build_bitint_type (root->size, TYPE_UNSIGNED (rt));
+ else
+   root->type = build_nonstandard_integer_type (root->size,
+TYPE_UNSIGNED (rt));
  root->expr = build_ref_for_offset (UNKNOWN_LOCATION, root->base,
 root->offset, root->reverse,
 root->type, NULL, false);
--- gcc/testsuite/gcc.dg/bitint-63.c.jj 2024-01-09 20:08:04.831720434 +0100
+++ gcc/testsuite/gcc.dg/bitint-63.c2024-01-09 20:07:43.045029421 +0100
@@ -0,0 +1,24 @@
+/* PR tree-optimization/113120 */
+/* { dg-do compile { target bitint } } */
+/* { dg-require-stack-check "generic" } */
+/* { dg-options "-std=c23 -O -fno-tree-fre --param=large-stack-frame=1024 
-fstack-check=generic" } */
+
+#if __BITINT_MAXWIDTH__ >= 513
+typedef _BitInt(513) B;
+#else
+typedef int B;
+#endif
+
+static inline __attribute__((__always_inline__)) void
+bar (B x)
+{
+  B y = x;
+  if (y)
+__builtin_abort ();
+}
+
+void
+foo (void)
+{
+  bar (0);
+}

Jakub



Re: [PATCH] Add -mevex512 into invoke.texi

2024-01-10 Thread Richard Biener
On Wed, Jan 10, 2024 at 3:35 AM Haochen Jiang  wrote:
>
> Hi Richard,
>
> It seems that I sent out an outdated patch. This patch is what I
> wanted to send.

OK

> Thx,
> Haochen
>
> gcc/ChangeLog:
>
> * doc/invoke.texi: Add -mevex512.
> ---
>  gcc/doc/invoke.texi | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 68d1f364ac0..6d4f92f1101 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1463,7 +1463,7 @@ See RS/6000 and PowerPC Options.
>  -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni
>  -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd -mamx-fp16
>  -mprefetchi -mraoint -mamx-complex -mavxvnniint16 -msm3 -msha512 -msm4 -mapxf
> --musermsr -mavx10.1 -mavx10.1-256 -mavx10.1-512
> +-musermsr -mavx10.1 -mavx10.1-256 -mavx10.1-512 -mevex512
>  -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops
>  -minline-stringops-dynamically  -mstringop-strategy=@var{alg}
>  -mkl -mwidekl
> @@ -35272,6 +35272,11 @@ r8-r15 registers so that the call and jmp 
> instruction length is 6 bytes
>  to allow them to be replaced with @samp{lfence; call *%r8-r15} or
>  @samp{lfence; jmp *%r8-r15} at run-time.
>
> +@opindex mevex512
> +@item -mevex512
> +@itemx -mno-evex512
+Enables/disables 512-bit vectors. It is on by default if AVX512F is enabled.
> +
>  @end table
>
>  These @samp{-m} switches are supported in addition to the above
> --
> 2.31.1
>


Re: [PATCH] i386: Add AVX10.1 related macros

2024-01-10 Thread Richard Biener
On Wed, Jan 10, 2024 at 9:01 AM Liu, Hongtao  wrote:
>
>
>
> > -Original Message-
> > From: Jiang, Haochen 
> > Sent: Wednesday, January 10, 2024 3:35 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Liu, Hongtao ; ubiz...@gmail.com; burnus@net-
> > b.de; san...@codesourcery.com
> > Subject: [PATCH] i386: Add AVX10.1 related macros
> >
> > Hi all,
> >
> > This patch aims to add AVX10.1 related macros for libgomp's request. The
> > request comes following:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642025.html
> >
> > Ok for trunk?
> >
> > Thx,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> >   PR target/113288
> >   * config/i386/i386-c.cc (ix86_target_macros_internal):
> >   Add __AVX10_1__, __AVX10_1_256__ and __AVX10_1_512__.
> > ---
> >  gcc/config/i386/i386-c.cc | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc index
> > c3ae984670b..366b560158a 100644
> > --- a/gcc/config/i386/i386-c.cc
> > +++ b/gcc/config/i386/i386-c.cc
> > @@ -735,6 +735,13 @@ ix86_target_macros_internal (HOST_WIDE_INT
> > isa_flag,
> >  def_or_undef (parse_in, "__EVEX512__");
> >if (isa_flag2 & OPTION_MASK_ISA2_USER_MSR)
> >  def_or_undef (parse_in, "__USER_MSR__");
> > +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_256)
> > +{
> > +  def_or_undef (parse_in, "__AVX10_1_256__");
> > +  def_or_undef (parse_in, "__AVX10_1__");
> I think this is not needed, others LGTM.

So __AVX10_1_256__ and __AVX10_1_512__ are redundant
with __AVX10_1__ and __EVEX512__, right?

> > +}
> > +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_512)
> > +def_or_undef (parse_in, "__AVX10_1_512__");
> >if (TARGET_IAMCU)
> >  {
> >def_or_undef (parse_in, "__iamcu");
> > --
> > 2.31.1
>


Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.

2024-01-10 Thread joshua
And revise th_loadstore_width, append the name according to TYPE_UNSIGNED and 
GET_MODE_BITSIZE (GET_MODE_INNER (TYPE_MODE 
(instance.op_info->args[i].get_tree_type (instance.type.index

What do you mean by it? I'm a bit confused.

Changing i8_v_scalar_const_ptr_ops into all_v_scalar_const_ptr_ops
will expand the datatypes that can be used in th_vlb. Can we restrict
again in th_loadstore_width?




--
From: juzhe.zh...@rivai.ai
Sent: Wednesday, January 10, 2024 17:35
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.


+DEF_RVV_FUNCTION (th_vlb, th_loadstore_width, full_preds, 
i8_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlh, th_loadstore_width, full_preds, 
i16_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlw, th_loadstore_width, full_preds, 
i32_v_scalar_const_ptr_ops)



I think we should remove those many data structure you added like: 
i8_v_scalar_const_ptr_ops
Instead, you should use all_v_scalar_const_ptr_ops


And revise th_loadstore_width, append the name according to TYPE_UNSIGNED and 
GET_MODE_BITSIZE (GET_MODE_INNER (TYPE_MODE 
(instance.op_info->args[i].get_tree_type (instance.type.index




juzhe.zh...@rivai.ai

 
From: Jun Sha (Joshua)
Date: 2024-01-10 17:27
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.

This patch only involves the generation of xtheadvector
special load/store instructions and vext instructions.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(class th_loadstore_width): Define new builtin bases.
(BASE): Define new builtin bases.
* config/riscv/riscv-vector-builtins-bases.h:
Define new builtin class.
* config/riscv/riscv-vector-builtins-functions.def (vlsegff):
Include thead-vector-builtins-functions.def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct th_loadstore_width_def): Define new builtin shapes.
(struct th_indexed_loadstore_width_def):
Define new builtin shapes.
(SHAPE): Define new builtin shapes.
* config/riscv/riscv-vector-builtins-shapes.h:
Define new builtin shapes.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
(vint8m1_t): Add datatypes for XTheadVector.
(vint8m2_t): Likewise.
(vint8m4_t): Likewise.
(vint8m8_t): Likewise.
(vint16m1_t): Likewise.
(vint16m2_t): Likewise.
(vint16m4_t): Likewise.
(vint16m8_t): Likewise.
(vint32m1_t): Likewise.
(vint32m2_t): Likewise.
(vint32m4_t): Likewise.
(vint32m8_t): Likewise.
(vint64m1_t): Likewise.
(vint64m2_t): Likewise.
(vint64m4_t): Likewise.
(vint64m8_t): Likewise.
(vuint8m1_t): Likewise.
(vuint8m2_t): Likewise.
(vuint8m4_t): Likewise.
(vuint8m8_t): Likewise.
(vuint16m1_t): Likewise.
(vuint16m2_t): Likewise.
(vuint16m4_t): Likewise.
(vuint16m8_t): Likewise.
(vuint32m1_t): Likewise.
(vuint32m2_t): Likewise.
(vuint32m4_t): Likewise.
(vuint32m8_t): Likewise.
(vuint64m1_t): Likewise.
(vuint64m2_t): Likewise.
(vuint64m4_t): Likewise.
(vuint64m8_t): Likewise.
* config/riscv/riscv-vector-builtins.cc
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
* config/riscv/thead-vector-builtins-functions.def: New file.
* config/riscv/thead-vector.md: Add new patterns.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: New test.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Chris

Re: [PATCH] sra: Partial fix for BITINT_TYPEs [PR113120]

2024-01-10 Thread Richard Biener
On Wed, 10 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> As changed in other parts of the compiler, using
> build_nonstandard_integer_type is not appropriate for arbitrary precisions,
> especially if the precision comes from a BITINT_TYPE or something based on
> that, build_nonstandard_integer_type relies on some integral mode being
> supported that can support the precision.
> 
> The following patch uses build_bitint_type instead for BITINT_TYPE
> precisions.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

LGTM, see below for a question.

> Note, it would be good if we were able to punt on the optimization
> (but this code doesn't seem to be able to punt, so it needs to be done
> somewhere earlier) at least in cases where building it would be invalid.
> E.g. right now BITINT_TYPE can support precisions up to 65535 (inclusive),
> but 65536 will not work anymore (we can't have > 16-bit TYPE_PRECISION).
> I've tried to replace 513 with 65532 in the testcase and it didn't ICE,
> so maybe it ran into some other SRA limit.

I think SRA has a size limit, --param 
sra-max-scalarization-size-O{size,speed}, not sure if that is all or
the one that's hit.

> 2024-01-10  Jakub Jelinek  
> 
>   PR tree-optimization/113120
>   * tree-sra.cc (analyze_access_subtree): For BITINT_TYPE
>   with root->size TYPE_PRECISION don't build anything new.
>   Otherwise, if root->type is a BITINT_TYPE, use build_bitint_type
>   rather than build_nonstandard_integer_type.
> 
>   * gcc.dg/bitint-63.c: New test.
> 
> --- gcc/tree-sra.cc.jj2024-01-03 11:51:35.054682295 +0100
> +++ gcc/tree-sra.cc   2024-01-09 19:50:42.911500487 +0100
> @@ -2733,7 +2733,8 @@ analyze_access_subtree (struct access *r
>   For integral types this means the precision has to match.
>Avoid assumptions based on the integral type kind, too.  */
>if (INTEGRAL_TYPE_P (root->type)
> -   && (TREE_CODE (root->type) != INTEGER_TYPE
> +   && ((TREE_CODE (root->type) != INTEGER_TYPE
> +&& TREE_CODE (root->type) != BITINT_TYPE)
> || TYPE_PRECISION (root->type) != root->size)
> /* But leave bitfield accesses alone.  */
> && (TREE_CODE (root->expr) != COMPONENT_REF
> @@ -2742,8 +2743,11 @@ analyze_access_subtree (struct access *r
> tree rt = root->type;
> gcc_assert ((root->offset % BITS_PER_UNIT) == 0
> && (root->size % BITS_PER_UNIT) == 0);
> -   root->type = build_nonstandard_integer_type (root->size,
> -TYPE_UNSIGNED (rt));
> +   if (TREE_CODE (root->type) == BITINT_TYPE)
> + root->type = build_bitint_type (root->size, TYPE_UNSIGNED (rt));

I suppose we don't exactly need to preserve BITINT-ness, say if
root->size fits the largest supported integer mode?  It's OK as-is
for now.

> +   else
> + root->type = build_nonstandard_integer_type (root->size,
> +  TYPE_UNSIGNED (rt));
> root->expr = build_ref_for_offset (UNKNOWN_LOCATION, root->base,
>root->offset, root->reverse,
>root->type, NULL, false);
> --- gcc/testsuite/gcc.dg/bitint-63.c.jj   2024-01-09 20:08:04.831720434 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-63.c  2024-01-09 20:07:43.045029421 +0100
> @@ -0,0 +1,24 @@
> +/* PR tree-optimization/113120 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-require-stack-check "generic" } */
> +/* { dg-options "-std=c23 -O -fno-tree-fre --param=large-stack-frame=1024 
> -fstack-check=generic" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 513
> +typedef _BitInt(513) B;
> +#else
> +typedef int B;
> +#endif
> +
> +static inline __attribute__((__always_inline__)) void
> +bar (B x)
> +{
> +  B y = x;
> +  if (y)
> +__builtin_abort ();
> +}
> +
> +void
> +foo (void)
> +{
> +  bar (0);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread Juzhe-Zhong
This is a preparatory patch for the following cost model tweak.

Since we don't have a vector cost model in the default tune info (rocket),
we make the generic cost model the default.

The reason we want to switch to the generic vector cost model is that the
default cost model generates inferior codegen for various benchmarks.

For example, in PR113247 we have a performance bug that results in an over 70%
performance drop on SHA256.  Currently, no matter how we adapt the cost model,
we are not able to fix the performance bug, since the default cost model is
always used.

Also, tweak the generic cost model back to the default cost model, since we
have some FAILs in the current tests.

After this patch, we (me and Robin) can work on cost model tuning together to
improve performance in various benchmarks.

Tested on both RV32 and RV64, ok for trunk?

gcc/ChangeLog:

* config/riscv/riscv.cc (get_common_costs): Switch RVV cost model.
(get_vector_costs): Ditto.
(riscv_builtin_vectorization_cost): Ditto.

---
 gcc/config/riscv/riscv.cc | 117 --
 1 file changed, 61 insertions(+), 56 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 32183d63180..d72058039ce 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -358,13 +358,13 @@ static const common_vector_cost generic_vls_vector_cost = 
{
   1, /* fp_stmt_cost  */
   1, /* gather_load_cost  */
   1, /* scatter_store_cost  */
-  2, /* vec_to_scalar_cost  */
+  1, /* vec_to_scalar_cost  */
   1, /* scalar_to_vec_cost  */
-  2, /* permute_cost  */
+  1, /* permute_cost  */
   1, /* align_load_cost  */
   1, /* align_store_cost  */
-  1, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
+  2, /* unalign_load_cost  */
+  2, /* unalign_store_cost  */
 };
 
 /* Generic costs for VLA vector operations.  */
@@ -374,13 +374,13 @@ static const scalable_vector_cost generic_vla_vector_cost 
= {
 1, /* fp_stmt_cost  */
 1, /* gather_load_cost  */
 1, /* scatter_store_cost  */
-2, /* vec_to_scalar_cost  */
+1, /* vec_to_scalar_cost  */
 1, /* scalar_to_vec_cost  */
-2, /* permute_cost  */
+1, /* permute_cost  */
 1, /* align_load_cost  */
 1, /* align_store_cost  */
-1, /* unalign_load_cost  */
-1, /* unalign_store_cost  */
+2, /* unalign_load_cost  */
+2, /* unalign_store_cost  */
   },
 };
 
@@ -10372,11 +10372,10 @@ riscv_frame_pointer_required (void)
   return riscv_save_frame_pointer && !crtl->is_leaf;
 }
 
-/* Return the appropriate common costs for vectors of type VECTYPE.  */
+/* Return the appropriate common costs according to VECTYPE from COSTS.  */
 static const common_vector_cost *
-get_common_costs (tree vectype)
+get_common_costs (const cpu_vector_cost *costs, tree vectype)
 {
-  const cpu_vector_cost *costs = tune_param->vec_costs;
   gcc_assert (costs);
 
   if (vectype && riscv_v_ext_vls_mode_p (TYPE_MODE (vectype)))
@@ -10384,78 +10383,84 @@ get_common_costs (tree vectype)
   return costs->vla;
 }
 
+/* Return the CPU vector costs according to -mtune if tune info has non-NULL
+   vector cost.  Otherwise, return the default generic vector costs.  */
+static const cpu_vector_cost *
+get_vector_costs ()
+{
+  const cpu_vector_cost *costs = tune_param->vec_costs;
+  if (!costs)
+return &generic_vector_cost;
+  return costs;
+}
+
 /* Implement targetm.vectorize.builtin_vectorization_cost.  */
 
 static int
 riscv_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
  tree vectype, int misalign ATTRIBUTE_UNUSED)
 {
-  unsigned elements;
-  const cpu_vector_cost *costs = tune_param->vec_costs;
+  const cpu_vector_cost *costs = get_vector_costs ();
   bool fp = false;
 
   if (vectype != NULL)
 fp = FLOAT_TYPE_P (vectype);
 
-  if (costs != NULL)
+  const common_vector_cost *common_costs = get_common_costs (costs, vectype);
+  gcc_assert (common_costs != NULL);
+  switch (type_of_cost)
 {
-  const common_vector_cost *common_costs = get_common_costs (vectype);
-  gcc_assert (common_costs != NULL);
-  switch (type_of_cost)
-   {
-   case scalar_stmt:
- return fp ? costs->scalar_fp_stmt_cost : costs->scalar_int_stmt_cost;
+case scalar_stmt:
+  return fp ? costs->scalar_fp_stmt_cost : costs->scalar_int_stmt_cost;
 
-   case scalar_load:
- return costs->scalar_load_cost;
+case scalar_load:
+  return costs->scalar_load_cost;
 
-   case scalar_store:
- return costs->scalar_store_cost;
+case scalar_store:
+  return costs->scalar_store_cost;
 
-   case vector_stmt:
- return fp ? common_costs->fp_stmt_cost : common_costs->int_stmt_cost;
+case vector_stmt:
+  return fp ? common_costs->fp_stmt_cost : common_costs->int_stmt_cost;
 
-   case vector_load:
- return common_costs->align_load_cost;
+case vector_load:
+  return common_costs->

Re: [EXTERNAL] Re: [RFC] Either fix or disable SME feature for `aarch64-w64-mingw32` target?

2024-01-10 Thread Radek Barton
Originally we used `!__MINGW64__` but changed it to `__ELF__` upon feedback 
received. Should I change it back to `!__MINGW64__`? Or introduce `__COFF__` 
and then use `!__COFF__`? What would be the minimal acceptable change? We are 
currently probably not able to provide the generic solution that Iain Sandoe 
has implied. Note that we have already moved the pseudo-op wrapper macros to 
the `libgcc/config/aarch64/aarch64-asm.h` file.

Thank you all for your valuable feedback.

Radek


Re: [PATCH] sra: Partial fix for BITINT_TYPEs [PR113120]

2024-01-10 Thread Jakub Jelinek
On Wed, Jan 10, 2024 at 10:51:32AM +0100, Richard Biener wrote:
> > @@ -2742,8 +2743,11 @@ analyze_access_subtree (struct access *r
> >   tree rt = root->type;
> >   gcc_assert ((root->offset % BITS_PER_UNIT) == 0
> >   && (root->size % BITS_PER_UNIT) == 0);
> > - root->type = build_nonstandard_integer_type (root->size,
> > -  TYPE_UNSIGNED (rt));
> > + if (TREE_CODE (root->type) == BITINT_TYPE)
> > +   root->type = build_bitint_type (root->size, TYPE_UNSIGNED (rt));
> 
> I suppose we don't exactly need to preserve BITINT-ness, say if
> root->size fits the largest supported integer mode?  It's OK as-is

Sure, we could use INTEGER_TYPE in that case, but if we use BITINT_TYPE,
it won't do harm either, worst case it will be lowered to those
INTEGER_TYPEs later again.
What is IMHO important is not to introduce BITINT_TYPEs where they weren't
used before, we didn't need to use them before either.  And to use
BITINT_TYPEs for large ones which can't be expressed in INTEGER_TYPEs.

Jakub



RE: [PATCH][wwwdoc] gcc-14: Add arm cortex-m52 cpu support

2024-01-10 Thread Kyrylo Tkachov


> -Original Message-
> From: Chung-Ju Wu 
> Sent: Wednesday, January 10, 2024 7:07 AM
> To: Gerald Pfeifer ; gcc-patches  patc...@gcc.gnu.org>
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Sudakshina Das ;
> jason...@anshingtek.com.tw
> Subject: [PATCH][wwwdoc] gcc-14: Add arm cortex-m52 cpu support
> 
> Hi Gerald,
> 
> The Arm Cortex-M52 CPU has been added to the upstream:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642230.html
> 
> I would like to document this on the gcc-14 changes.html page.
> Attached is the patch for gcc-wwwdocs repository.
> 
> Is it OK?

I can approve these as port maintainer. The entry is okay.
Thanks,
Kyrill

> 
> Regards,
> jasonwucj


Re:Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.

2024-01-10 Thread joshua
The key difference between vlb/vlh/vlw is not the output type either.
They differ in the range of accepted data types, not in one specific type.
We have dug into the xtheadvector-specific intrinsics and are
sure about that.






--
From: juzhe.zh...@rivai.ai
Sent: Wednesday, January 10, 2024 19:00
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.


instance.op_info->args[i].get_tree_type (instance.type.index)  is output type.


You can use GDB debug it .


juzhe.zh...@rivai.ai

 
From: joshua
Sent: 2024-01-10 18:57
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.

Hi Juzhe,

Perhaps things are not as simple as imagined. 
The difference between vlb/vlh/vlw is not the same
as for vle8/vle16/vle32. The "8", "16" or "32" in vle8/vle16/vle32
can be appended to "vle" according to the input type.
But vlb/vlh/vlw do not differ in input type.







--
From: juzhe.zh...@rivai.ai
Sent: Wednesday, January 10, 2024 18:03
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.


I mean change these:
+DEF_RVV_FUNCTION (th_vlb, th_loadstore_width, full_preds, 
i8_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlh, th_loadstore_width, full_preds, 
i16_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlw, th_loadstore_width, full_preds, 
i32_v_scalar_const_ptr_ops)



into a single:
+DEF_RVV_FUNCTION (th_vl, th_loadstore_width, full_preds, 
all_v_scalar_const_ptr_ops)


and append "h", "w", or "b" according to 
TYPE_UNSIGNED and
GET_MODE_BITSIZE (GET_MODE_INNER (TYPE_MODE 
(instance.op_info->args[i].get_tree_type (instance.type.index



in th_loadstore_width.


It should definitely work; I allowed for this flexibility in the design of the framework.




juzhe.zh...@rivai.ai

 
From: joshua
Sent: 2024-01-10 17:55
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.

And revise th_loadstore_width, append the name according to TYPE_UNSIGNED and 
GET_MODE_BITSIZE (GET_MODE_INNER (TYPE_MODE 
(instance.op_info->args[i].get_tree_type (instance.type.index
 
What do you mean by it? I'm a bit confused.
 
Changing i8_v_scalar_const_ptr_ops into all_v_scalar_const_ptr_ops
will expand the datatypes that can be used in th_vlb. Can we restrict
again in th_loadstore_width?
 
 
 
 
--
From: juzhe.zh...@rivai.ai
Sent: Wednesday, January 10, 2024 17:35
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
 
 
+DEF_RVV_FUNCTION (th_vlb, th_loadstore_width, full_preds, 
i8_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlh, th_loadstore_width, full_preds, 
i16_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlw, th_loadstore_width, full_preds, 
i32_v_scalar_const_ptr_ops)
 
 
 
I think we should remove those many data structure you added like: 
i8_v_scalar_const_ptr_ops
Instead, you should use all_v_scalar_const_ptr_ops
 
 
And revise th_loadstore_width, append the name according to TYPE_UNSIGNED and 
GET_MODE_BITSIZE (GET_MODE_INNER (TYPE_MODE 
(instance.op_info->args[i].get_tree_type (instance.type.index
 
 
 
 
juzhe.zh...@rivai.ai
 
 
From: Jun Sha (Joshua)
Date: 2024-01-10 17:27
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
 
This patch only involves the generation of xtheadvector
special load/store instructions and vext instructions.
 
gcc/ChangeLog:
 
 * config/riscv/riscv-vector-builtins-bases.cc
 (class th_loadstore_width): Define new builtin bases.
 (BASE): Define new builtin bases.
 * config/riscv/riscv-vector-builtins-bases.h:
 Define new builtin class.
 * config/riscv/riscv-vector-builtins-functions.def (vlsegff):
 Include thead-vector-builtins-functions.def.
 * config/riscv/riscv-vector-builtins-shapes.cc
 (struct th_loadstore_width_def): Define new builtin shapes.
 (struct th_indexed_loadstore_width_def):
 Define new builtin shapes.
 (SHAPE): Define new builtin shapes.
 * config/riscv/riscv-vector-builtins-shapes.h:
 Define new builtin shapes.
 * config/riscv/riscv-vector-builtins-types

Re:Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.

2024-01-10 Thread joshua
Can you see the images that I sent to you in the last email?
If not, maybe you can refer to the last chapter in the thead spec.






--
From: joshua
Date: January 10, 2024 (Wednesday) 19:06
To: "juzhe.zh...@rivai.ai"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re:Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.


The key difference between vlb/vlh/vlw is not the output type either.
Their difference is the range of data types, not one specific type.
We have dug into the xtheadvector special intrinsics and are
sure about that.






--
From: juzhe.zh...@rivai.ai
Date: January 10, 2024 (Wednesday) 19:00
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.


instance.op_info->args[i].get_tree_type (instance.type.index) is the output type.


You can use GDB to debug it.


juzhe.zh...@rivai.ai

 
From: joshua
Date: 2024-01-10 18:57
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.

Hi Juzhe,

Perhaps things are not as simple as imagined. 
The differences between vlb/vlh/vlw are not the same
as those between vle8/vle16/vle32. The "8", "16" or "32" in vle8/vle16/vle32
can be appended to "vle" according to the input type.
But vlb/vlh/vlw do not differ in the input type.







--
From: juzhe.zh...@rivai.ai
Date: January 10, 2024 (Wednesday) 18:03
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.


I mean change these:
+DEF_RVV_FUNCTION (th_vlb, th_loadstore_width, full_preds, 
i8_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlh, th_loadstore_width, full_preds, 
i16_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlw, th_loadstore_width, full_preds, 
i32_v_scalar_const_ptr_ops)



into a single:
+DEF_RVV_FUNCTION (th_vl, th_loadstore_width, full_preds, 
all_v_scalar_const_ptr_ops)


and append "h", "w", or "b" according to 
TYPE_UNSIGNED and
GET_MODE_BITSIZE (GET_MODE_INNER (TYPE_MODE 
(instance.op_info->args[i].get_tree_type (instance.type.index



in th_loadstore_width.


It should definitely work; I allowed for this flexibility in the design of the framework.




juzhe.zh...@rivai.ai

 
From: joshua
Date: 2024-01-10 17:55
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.

And revise th_loadstore_width to append the name according to TYPE_UNSIGNED and 
GET_MODE_BITSIZE (GET_MODE_INNER (TYPE_MODE 
(instance.op_info->args[i].get_tree_type (instance.type.index
 
What do you mean by it? I'm a bit confused.
 
Changing i8_v_scalar_const_ptr_ops into all_v_scalar_const_ptr_ops
will expand the datatypes that can be used in th_vlb. Can we restrict
again in th_loadstore_width?
 
 
 
 
--
From: juzhe.zh...@rivai.ai
Date: January 10, 2024 (Wednesday) 17:35
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
 
 
+DEF_RVV_FUNCTION (th_vlb, th_loadstore_width, full_preds, 
i8_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlh, th_loadstore_width, full_preds, 
i16_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlw, th_loadstore_width, full_preds, 
i32_v_scalar_const_ptr_ops)
 
 
 
I think we should remove the many data structures you added, like
i8_v_scalar_const_ptr_ops.
Instead, you should use all_v_scalar_const_ptr_ops
 
 
And revise th_loadstore_width to append the name according to TYPE_UNSIGNED and 
GET_MODE_BITSIZE (GET_MODE_INNER (TYPE_MODE 
(instance.op_info->args[i].get_tree_type (instance.type.index
 
 
 
 
juzhe.zh...@rivai.ai
 
 
From: Jun Sha (Joshua)
Date: 2024-01-10 17:27
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
 
This patch only involves the generation of xtheadvector
special load/store instructions and vext instructions.
 
gcc/ChangeLog:
 
 * config/riscv/riscv-vector-builtins-bases.cc
 (class th_loadstore_width): Define new builtin bases.
 (BASE): Define new builtin bases.
 * config/riscv/riscv-vector-builtins-bases.h

Re: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.

2024-01-10 Thread juzhe.zh...@rivai.ai
So vlb does not have only sew = 8?

But why do you add intrinsics as follows?

+DEF_RVV_FUNCTION (th_vlb, th_loadstore_width, full_preds, 
i8_v_scalar_const_ptr_ops)

Why is it not:

DEF_RVV_FUNCTION (th_vlb, th_loadstore_width, full_preds, 
all_v_scalar_const_ptr_ops)
? 

juzhe.zh...@rivai.ai
 
From: joshua
Date: 2024-01-10 19:06
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
The key difference between vlb/vlh/vlw is not the output type either.
Their difference is the range of data types, not one specific type.
We have dug into the xtheadvector special intrinsics and are
sure about that.
 
 
 
 
 
 
--
From: juzhe.zh...@rivai.ai
Date: January 10, 2024 (Wednesday) 19:00
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
 
 
instance.op_info->args[i].get_tree_type (instance.type.index) is the output type.


You can use GDB to debug it.
 
 
juzhe.zh...@rivai.ai
 
 
From: joshua
Date: 2024-01-10 18:57
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
 
Hi Juzhe,
 
Perhaps things are not as simple as imagined.
The differences between vlb/vlh/vlw are not the same
as those between vle8/vle16/vle32. The "8", "16" or "32" in vle8/vle16/vle32
can be appended to "vle" according to the input type.
But vlb/vlh/vlw do not differ in the input type.
 
 
 
 
 
 
 
--
From: juzhe.zh...@rivai.ai
Date: January 10, 2024 (Wednesday) 18:03
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
 
 
I mean change these:
+DEF_RVV_FUNCTION (th_vlb, th_loadstore_width, full_preds, 
i8_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlh, th_loadstore_width, full_preds, 
i16_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlw, th_loadstore_width, full_preds, 
i32_v_scalar_const_ptr_ops)
 
 
 
into a single:
+DEF_RVV_FUNCTION (th_vl, th_loadstore_width, full_preds, 
all_v_scalar_const_ptr_ops)
 
 
and append "h", "w", or "b" according to
TYPE_UNSIGNED and
GET_MODE_BITSIZE (GET_MODE_INNER (TYPE_MODE 
(instance.op_info->args[i].get_tree_type (instance.type.index
 
 
 
in th_loadstore_width.
 
 
It should definitely work; I allowed for this flexibility in the design of the framework.
 
 
 
 
juzhe.zh...@rivai.ai
 
From: joshua
Date: 2024-01-10 17:55
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
 
And revise th_loadstore_width to append the name according to TYPE_UNSIGNED and
GET_MODE_BITSIZE (GET_MODE_INNER (TYPE_MODE 
(instance.op_info->args[i].get_tree_type (instance.type.index
What do you mean by it? I'm a bit confused.
Changing i8_v_scalar_const_ptr_ops into all_v_scalar_const_ptr_ops
will expand the datatypes that can be used in th_vlb. Can we restrict
again in th_loadstore_width?
--
From: juzhe.zh...@rivai.ai
Date: January 10, 2024 (Wednesday) 17:35
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
+DEF_RVV_FUNCTION (th_vlb, th_loadstore_width, full_preds, 
i8_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlh, th_loadstore_width, full_preds, 
i16_v_scalar_const_ptr_ops)
+DEF_RVV_FUNCTION (th_vlw, th_loadstore_width, full_preds, 
i32_v_scalar_const_ptr_ops)
I think we should remove the many data structures you added, like
i8_v_scalar_const_ptr_ops.
Instead, you should use all_v_scalar_const_ptr_ops
And revise th_loadstore_width to append the name according to TYPE_UNSIGNED and
GET_MODE_BITSIZE (GET_MODE_INNER (TYPE_MODE 
(instance.op_info->args[i].get_tree_type (instance.type.index
juzhe.zh...@rivai.ai
From: Jun Sha (Joshua)
Date: 2024-01-10 17:27
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
This patch only involves the generation of xtheadvector
special load/store instructions and vext instructions.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(class th_loadstore_width): Define new builtin bases.
(BASE): Define new 

Re:Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.

2024-01-10 Thread joshua
vlb can accept sew=8/16/32/64.
vlh can accept sew=16/32/64.
vlw can accept sew=32/64.

vint8m1_t __riscv_th_vlb_v_i8m1 (const int8_t *a, size_t vl);
vint8m2_t __riscv_th_vlb_v_i8m2 (const int8_t *a, size_t vl);
vint8m4_t __riscv_th_vlb_v_i8m4 (const int8_t *a, size_t vl);
vint8m8_t __riscv_th_vlb_v_i8m8 (const int8_t *a, size_t vl);
vint16m1_t __riscv_th_vlb_v_i16m1 (const int16_t *a, size_t vl);
vint16m2_t __riscv_th_vlb_v_i16m2 (const int16_t *a, size_t vl);
vint16m4_t __riscv_th_vlb_v_i16m4 (const int16_t *a, size_t vl);
vint16m8_t __riscv_th_vlb_v_i16m8 (const int16_t *a, size_t vl);
vint32m1_t __riscv_th_vlb_v_i32m1 (const int32_t *a, size_t vl);
vint32m2_t __riscv_th_vlb_v_i32m2 (const int32_t *a, size_t vl);
vint32m4_t __riscv_th_vlb_v_i32m4 (const int32_t *a, size_t vl);
vint32m8_t __riscv_th_vlb_v_i32m8 (const int32_t *a, size_t vl);
vint64m1_t __riscv_th_vlb_v_i64m1 (const int64_t *a, size_t vl);
vint64m2_t __riscv_th_vlb_v_i64m2 (const int64_t *a, size_t vl);
vint64m4_t __riscv_th_vlb_v_i64m4 (const int64_t *a, size_t vl);
vint64m8_t __riscv_th_vlb_v_i64m8 (const int64_t *a, size_t vl);
vint16m1_t __riscv_th_vlh_v_i16m1 (const int16_t *a, size_t vl);
vint16m2_t __riscv_th_vlh_v_i16m2 (const int16_t *a, size_t vl);
vint16m4_t __riscv_th_vlh_v_i16m4 (const int16_t *a, size_t vl);
vint16m8_t __riscv_th_vlh_v_i16m8 (const int16_t *a, size_t vl);
vint32m1_t __riscv_th_vlh_v_i32m1 (const int32_t *a, size_t vl);
vint32m2_t __riscv_th_vlh_v_i32m2 (const int32_t *a, size_t vl);
vint32m4_t __riscv_th_vlh_v_i32m4 (const int32_t *a, size_t vl);
vint32m8_t __riscv_th_vlh_v_i32m8 (const int32_t *a, size_t vl);
vint64m1_t __riscv_th_vlh_v_i64m1 (const int64_t *a, size_t vl);
vint64m2_t __riscv_th_vlh_v_i64m2 (const int64_t *a, size_t vl);
vint64m4_t __riscv_th_vlh_v_i64m4 (const int64_t *a, size_t vl);
vint64m8_t __riscv_th_vlh_v_i64m8 (const int64_t *a, size_t vl);
vint32m1_t __riscv_th_vlw_v_i32m1 (const int32_t *a, size_t vl);
vint32m2_t __riscv_th_vlw_v_i32m2 (const int32_t *a, size_t vl);
vint32m4_t __riscv_th_vlw_v_i32m4 (const int32_t *a, size_t vl);
vint32m8_t __riscv_th_vlw_v_i32m8 (const int32_t *a, size_t vl);
vint64m1_t __riscv_th_vlw_v_i64m1 (const int64_t *a, size_t vl);
vint64m2_t __riscv_th_vlw_v_i64m2 (const int64_t *a, size_t vl);
vint64m4_t __riscv_th_vlw_v_i64m4 (const int64_t *a, size_t vl);
vint64m8_t __riscv_th_vlw_v_i64m8 (const int64_t *a, size_t vl);

With the existing framework, I cannot come up with a better way to differentiate
between vlb/vlh/vlw.




--
From: juzhe.zh...@rivai.ai
Date: January 10, 2024 (Wednesday) 19:09
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.


So vlb does not have only sew = 8?

But why do you add intrinsics as follows?

+DEF_RVV_FUNCTION (th_vlb, th_loadstore_width, full_preds, 
i8_v_scalar_const_ptr_ops)

Why is it not:

DEF_RVV_FUNCTION (th_vlb, th_loadstore_width, full_preds, 
all_v_scalar_const_ptr_ops)
? 

juzhe.zh...@rivai.ai
 
From: joshua
Date: 2024-01-10 19:06
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
The key difference between vlb/vlh/vlw is not the output type either.
Their difference is the range of data types, not one specific type.
We have dug into the xtheadvector special intrinsics and are
sure about that.
 
 
 
 
 
 
--
From: juzhe.zh...@rivai.ai
Date: January 10, 2024 (Wednesday) 19:00
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
 
 
instance.op_info->args[i].get_tree_type (instance.type.index) is the output type.


You can use GDB to debug it.
 
 
juzhe.zh...@rivai.ai
 
 
From: joshua
Date: 2024-01-10 18:57
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intrinsics.
 
Hi Juzhe,
 
Perhaps things are not as simple as imagined.
The differences between vlb/vlh/vlw are not the same
as those between vle8/vle16/vle32. The "8", "16" or "32" in vle8/vle16/vle32
can be appended to "vle" according to the input type.
But vlb/vlh/vlw do not differ in the input type.
 
 
 
 
 
 
 
--
From: juzhe.zh...@rivai.ai
Date: January 10, 2024 (Wednesday) 18:03
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v5] RISC-V: Add support for xtheadvector-specific intr

[PATCH] libgomp, v2: Use absolute pathname to testsuite/flock [PR113192]

2024-01-10 Thread Jakub Jelinek
On Wed, Jan 10, 2024 at 10:32:56AM +0100, Rainer Orth wrote:
> > When flock program doesn't exist, libgomp configure attempts to
> > offer a fallback version using a perl script, but we weren't using
> > absolute filename to that, so it apparently failed to work correctly.
> >
> > The following patch arranges for it to get the absolute filename.
> >
> > Tested by John David in the PR.
> 
> This patch completely broke parallel libgomp testing on Solaris:
> 
> ERROR: couldn't execute "\$(abs_top_srcdir)/testsuite/flock": no such file or 
> directory

Sorry for that.

> FLOCK is also substituted into testsuite/libgomp-site-extra.exp.in,
> which gets included into site.exp.  That one has
> 
> ## Begin content included from file libgomp-site-extra.exp.  Do not modify. ##
> set FLOCK {$(abs_top_srcdir)/testsuite/flock}
> 
> So expect tries to literally execute '$(abs_top_srcdir)/testsuite/flock'
> which cannot work.

Does the following work then?

Using autoconf's internal _AC_SRCDIRS macro doesn't seem to be a good idea
to me, so I've copied what e.g. libobjc configure does instead.

2024-01-10  Jakub Jelinek  

PR libgomp/113192
* configure.ac (FLOCK): Use $libgomp_abs_srcdir/testsuite/flock
instead of \$(abs_top_srcdir)/testsuite/flock.
* configure: Regenerated.

--- libgomp/configure.ac.jj 2024-01-09 09:54:03.398011788 +0100
+++ libgomp/configure.ac2024-01-10 12:09:05.558162522 +0100
@@ -343,7 +343,16 @@ AC_MSG_NOTICE([checking for flock implem
 AC_CHECK_PROGS(FLOCK, flock)
 # Fallback if 'perl' is available.
 if test -z "$FLOCK"; then
-  AC_CHECK_PROG(FLOCK, perl, \$(abs_top_srcdir)/testsuite/flock)
+  # These need to be absolute paths, yet at the same time need to
+  # canonicalize only relative paths, because then amd will not unmount
+  # drives. Thus the use of PWDCMD: set it to 'pawd' or 'amq -w' if using amd.
+  case $srcdir in
+changequote(,)dnl
+[\\/$]* | ?:[\\/]*) libgomp_abs_srcdir=${srcdir} ;;
+changequote([,])dnl
+*) libgomp_abs_srcdir=`cd "$srcdir" && ${PWDCMD-pwd} || echo "$srcdir"` ;;
+  esac
+  AC_CHECK_PROG(FLOCK, perl, $libgomp_abs_srcdir/testsuite/flock)
 fi
 
 AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)
--- libgomp/configure.jj2024-01-09 09:54:03.486010551 +0100
+++ libgomp/configure   2024-01-10 12:09:15.960016006 +0100
@@ -16638,6 +16638,13 @@ done
 
 # Fallback if 'perl' is available.
 if test -z "$FLOCK"; then
+  # These need to be absolute paths, yet at the same time need to
+  # canonicalize only relative paths, because then amd will not unmount
+  # drives. Thus the use of PWDCMD: set it to 'pawd' or 'amq -w' if using amd.
+  case $srcdir in
+[\\/$]* | ?:[\\/]*) libgomp_abs_srcdir=${srcdir} ;;
+*) libgomp_abs_srcdir=`cd "$srcdir" && ${PWDCMD-pwd} || echo "$srcdir"` ;;
+  esac
   # Extract the first word of "perl", so it can be a program name with args.
 set dummy perl; ac_word=$2
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
@@ -16655,7 +16662,7 @@ do
   test -z "$as_dir" && as_dir=.
 for ac_exec_ext in '' $ac_executable_extensions; do
   if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
-ac_cv_prog_FLOCK="\$(abs_top_srcdir)/testsuite/flock"
+ac_cv_prog_FLOCK="$libgomp_abs_srcdir/testsuite/flock"
 $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" 
>&5
 break 2
   fi


Jakub
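As a standalone illustration of the case-statement canonicalization used in the patch above (a sketch with hypothetical paths, not the configure code itself; the real code also honors PWDCMD for amd users):

```shell
#!/bin/sh
# Standalone model of the srcdir canonicalization: values that are
# already absolute (or DOS drive-style, or begin with a literal '$')
# are left alone; relative paths are resolved with cd + pwd.
abs_dir () {
  case "$1" in
    [\\/$]* | ?:[\\/]*) printf '%s\n' "$1" ;;
    *) (cd "$1" && pwd) || printf '%s\n' "$1" ;;
  esac
}

abs_dir /usr/lib        # already absolute, printed unchanged
abs_dir 'C:\src'        # drive-style path, printed unchanged
abs_dir .               # relative, resolved via cd + pwd
```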



Re: [PATCH, OpenACC 2.7] Implement reductions for arrays and structs

2024-01-10 Thread Julian Brown
On Tue, 2 Jan 2024 23:21:21 +0800
Chung-Lin Tang  wrote:

> To Julian, there is a patch to the middle-end neutering, a hack
> actually, that detects SSA_NAMEs used in reduction array MEM_REFs,
> and avoids single->parallel copying (by moving those definitions
> before BUILT_IN_GOACC_SINGLE_COPY_START). This appears to work
> because reductions do their own initializing of the private copy.

It looks OK to me, I think (bearing in mind your following paragraph, of
course!). I wonder though if maybe non-SSA (i.e. addressable) variables
need to be handled also, i.e. parts like this:

+  /* For accesses of variables used in array reductions, instead of
+ propagating the value for the main thread to all other worker threads
+ (which doesn't make sense as a reduction private var), move the defs
+ of such SSA_NAMEs to before the copy block and leave them alone (each
+ thread should access their own local copy).  */
+  for (gimple_stmt_iterator i = gsi_after_labels (from); !gsi_end_p (i);)
+{
+  gimple *stmt = gsi_stmt (i);
+  if (gimple_assign_single_p (stmt)
+ && def_escapes_block->contains (gimple_assign_lhs (stmt))
+ && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)

are only handling SSA-converted variables. But maybe that's OK?

> As we discussed in our internal calls, the real proper way is to
> create the private array in a more appropriate stage, but that is too
> long a shot for now. The changes here are needed at least for some
> -O0 cases (when under optimization, propagation of the private
> copies' local address eliminates the SSA_NAME and things actually just
> work in that case). So please bear with this hack.

HTH,

Julian


Re: [PATCH v2] libgfortran: Bugfix if not define HAVE_ATOMIC_FETCH_ADD

2024-01-10 Thread Richard Earnshaw

On 05/01/2024 01:43, Lipeng Zhu wrote:

This patch tries to fix the bug when HAVE_ATOMIC_FETCH_ADD is
not defined in the dec_waiting_unlocked function. As io.h does
not include async.h, the WRLOCK and RWUNLOCK macros are
undefined.

libgfortran/ChangeLog:

* io/io.h (dec_waiting_unlocked): Use
__gthread_rwlock_wrlock/__gthread_rwlock_unlock or
__gthread_mutex_lock/__gthread_mutex_unlock functions
to replace WRLOCK and RWUNLOCK macros.

Signed-off-by: Lipeng Zhu 


Has this been committed yet?

R.

---
  libgfortran/io/io.h | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 15daa0995b1..c7f0f7d7d9e 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u)
  #ifdef HAVE_ATOMIC_FETCH_ADD
(void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
  #else
-  WRLOCK (&unit_rwlock);
+#ifdef __GTHREAD_RWLOCK_INIT
+  __gthread_rwlock_wrlock (&unit_rwlock);
+  u->waiting--;
+  __gthread_rwlock_unlock (&unit_rwlock);
+#else
+  __gthread_mutex_lock (&unit_rwlock);
u->waiting--;
-  RWUNLOCK (&unit_rwlock);
+  __gthread_mutex_unlock (&unit_rwlock);
+#endif
  #endif
  }
  


Re: [PATCH] Fix debug info for enumeration types with reverse Scalar_Storage_Order

2024-01-10 Thread Eric Botcazou
> Can you elaborate on the DIE order constraint and why it was chosen?  That
> is,
> 
> +  /* The DIE with DW_AT_endianity is placed right after the naked DIE. 
> */ +  if (reverse)
> +   {
> + gcc_assert (type_die);
> ...
> 
> and
> 
> +  /* The DIE with DW_AT_endianity is placed right after the naked DIE. 
> */ +  if (reverse_type)
> +   {
> + dw_die_ref after_die
> +   = modified_type_die (type, cv_quals, false, context_die);
> + gen_type_die (type, context_die, true);
> + gcc_assert (after_die->die_sib
> + && get_AT_unsigned (after_die->die_sib,
> DW_AT_endianity)); + return after_die->die_sib;
> 
> ?

That's preexisting though, see line 13730 where there is a small blurb.

The crux of the matter is that there is no scalar *_TYPE node with a reverse 
SSO, so there is nothing to equate with for the DIE carrying DW_AT_endianity, 
unlike for type variants (the reverse SSO is on the enclosing aggregate type 
instead but this does not match the way DWARF describes it).

Therefore, in order to avoid building a new DIE with DW_AT_endianity each 
time, the DIE with DW_AT_endianity is placed right after the naked DIE, so 
that the lookup done at line 13730 for reverse SSO is immediate.

> Likewise the extra argument to the functions is odd - is that not available
> on the tree type?

No, for the reason described above, so the extra parameter is preexisting for 
base_type_die, modified_type_die and add_type_attribute.

-- 
Eric Botcazou




Re: [PATCH] Fix debug info for enumeration types with reverse Scalar_Storage_Order

2024-01-10 Thread Richard Biener
On Wed, Jan 10, 2024 at 12:53 PM Eric Botcazou  wrote:
>
> > Can you elaborate on the DIE order constraint and why it was chosen?  That
> > is,
> >
> > +  /* The DIE with DW_AT_endianity is placed right after the naked DIE.
> > */ +  if (reverse)
> > +   {
> > + gcc_assert (type_die);
> > ...
> >
> > and
> >
> > +  /* The DIE with DW_AT_endianity is placed right after the naked DIE.
> > */ +  if (reverse_type)
> > +   {
> > + dw_die_ref after_die
> > +   = modified_type_die (type, cv_quals, false, context_die);
> > + gen_type_die (type, context_die, true);
> > + gcc_assert (after_die->die_sib
> > + && get_AT_unsigned (after_die->die_sib,
> > DW_AT_endianity)); + return after_die->die_sib;
> >
> > ?
>
> That's preexisting though, see line 13730 where there is a small blurb.
>
> The crux of the matter is that there is no scalar *_TYPE node with a reverse
> SSO, so there is nothing to equate with for the DIE carrying DW_AT_endianity,
> unlike for type variants (the reverse SSO is on the enclosing aggregate type
> instead but this does not match the way DWARF describes it).
>
> Therefore, in order to avoid building a new DIE with DW_AT_endianity each
> time, the DIE with DW_AT_endianity is placed right after the naked DIE, so
> that the lookup done at line 13730 for reverse SSO is immediate.

Hmm, I see.  The patch is OK then.

Thanks,
Richard.

> > Likewise the extra argument to the functions is odd - is that not available
> > on the tree type?
>
> No, for the reason described above, so the extra parameter is preexisting for
> base_type_die, modified_type_die and add_type_attribute.
>
> --
> Eric Botcazou
>
>


Re: [PATCH] reassoc vs uninitialized variable {PR112581]

2024-01-10 Thread Richard Biener
On Sat, Dec 23, 2023 at 7:35 PM Andrew Pinski  wrote:
>
> Like r14-2293-g11350734240dba and r14-2289-gb083203f053f16,
> reassociation can combine across a few bbs, and one of the usages
> can be an uninitialized variable; going from a conditional
> usage to an unconditional usage can cause wrong code.
> This uses maybe_undef_p like other passes where this can happen.
>
> Note if-to-switch uses the function (init_range_entry) provided
> by reassociation, so we need to call mark_ssa_maybe_undefs there;
> otherwise we assume almost all ssa names are uninitialized.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/112581
> * gimple-if-to-switch.cc (pass_if_to_switch::execute): Call
> mark_ssa_maybe_undefs.
> * tree-ssa-reassoc.cc (can_reassociate_op_p): Uninitialized
> variables can not be reassociated.
> (init_range_entry): Check for uninitialized variables too.
> (init_reassoc): Call mark_ssa_maybe_undefs.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/112581
> * gcc.c-torture/execute/pr112581-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-if-to-switch.cc|  3 ++
>  .../gcc.c-torture/execute/pr112581-1.c| 37 +++
>  gcc/tree-ssa-reassoc.cc   |  7 +++-
>  3 files changed, 46 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr112581-1.c
>
> diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc
> index 7792a6024cd..af8d6684d32 100644
> --- a/gcc/gimple-if-to-switch.cc
> +++ b/gcc/gimple-if-to-switch.cc
> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "alloc-pool.h"
>  #include "tree-switch-conversion.h"
>  #include "tree-ssa-reassoc.h"
> +#include "tree-ssa.h"
>
>  using namespace tree_switch_conversion;
>
> @@ -494,6 +495,8 @@ pass_if_to_switch::execute (function *fun)
>auto_vec all_candidates;
>hash_map conditions_in_bbs;
>
> +  mark_ssa_maybe_undefs ();
> +
>basic_block bb;
>FOR_EACH_BB_FN (bb, fun)
>  find_conditions (bb, &conditions_in_bbs);
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr112581-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr112581-1.c
> new file mode 100644
> index 000..14081c96d58
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr112581-1.c
> @@ -0,0 +1,37 @@
> +/* { dg-require-effective-target int32plus } */
> +/* PR tree-optimization/112581 */
> +/* reassociation, used to combine 2 bb to together,
> +   that made an unitialized variable unconditional used
> +   which then at runtime would cause an infinite loop.  */
> +int a = -1, b = 2501896061, c, d, e, f = 3, g;
> +int main() {
> +  unsigned h;
> +  int i;
> +  d = 0;
> +  for (; d < 1; d++) {
> +int j = ~-((6UL ^ a) / b);
> +if (b)
> +L:
> +  if (!f)
> +continue;
> +if (c)
> +  i = 1;
> +if (j) {
> +  i = 0;
> +  while (e)
> +;
> +}
> +g = -1 % b;
> +h = ~(b || h);
> +f = g || 0;
> +a = a || 0;
> +if (!a)
> +  h = 0;
> +while (h > 4294967294)
> +  if (i)
> +break;
> +if (c)
> +  goto L;
> +  }
> +  return 0;
> +}
> diff --git a/gcc/tree-ssa-reassoc.cc b/gcc/tree-ssa-reassoc.cc
> index cdef9f7cdc3..94873745928 100644
> --- a/gcc/tree-ssa-reassoc.cc
> +++ b/gcc/tree-ssa-reassoc.cc
> @@ -647,6 +647,9 @@ can_reassociate_op_p (tree op)
>  {
>if (TREE_CODE (op) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
>  return false;
> +  /* Uninitialized variables can't participate in reassociation. */
> +  if (TREE_CODE (op) == SSA_NAME && ssa_name_maybe_undef_p (op))
> +return false;
>/* Make sure asm goto outputs do not participate in reassociation since
>   we have no way to find an insertion place after asm goto.  */
>if (TREE_CODE (op) == SSA_NAME
> @@ -2600,7 +2603,8 @@ init_range_entry (struct range_entry *r, tree exp, 
> gimple *stmt)
> }
>
>if (TREE_CODE (arg0) != SSA_NAME
> - || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (arg0))
> + || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (arg0)
> + || ssa_name_maybe_undef_p (arg0))
> break;
>loc = gimple_location (stmt);
>switch (code)
> @@ -7418,6 +7422,7 @@ init_reassoc (void)
>free (bbs);
>calculate_dominance_info (CDI_POST_DOMINATORS);
>plus_negates = vNULL;
> +  mark_ssa_maybe_undefs ();
>  }
>
>  /* Cleanup after the reassociation pass, and print stats if
> --
> 2.39.3
>


Re: [PATCH] Optimize A < B ? A : B to MIN_EXPR.

2024-01-10 Thread Richard Biener
On Tue, Jan 9, 2024 at 11:48 AM liuhongt  wrote:
>
> > I wonder if you can amend the existing patterns instead by iterating
> > over cond/vec_cond.  There are quite some (look for uses of
> > minmax_from_comparison) that could be adapted to vectors.
> >
> > The ones matching the simple form you match are
> >
> > #if GIMPLE
> > /* A >= B ? A : B -> max (A, B) and friends.  The code is still
> >in fold_cond_expr_with_comparison for GENERIC folding with
> >some extra constraints.  */
> > (for cmp (eq ne le lt unle unlt ge gt unge ungt uneq ltgt)
> >  (simplify
> >   (cond (cmp:c (nop_convert1?@c0 @0) (nop_convert2?@c1 @1))
> > (convert3? @0) (convert4? @1))
> >   (if (!HONOR_SIGNED_ZEROS (type)
> > ...
> This pattern is a conditional operation that treats a vector as a complete
> unit; it's more like cbranchm, which is different from vec_cond_expr.
> So I added my patterns after this.
> >
> > I think.  Consider at least placing the new patterns next to that.
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?

OK.

Richard.

> Similarly for A < B ? B : A to MAX_EXPR.
> There is code in the frontend to optimize such patterns, but it failed to
> handle the testcase in the PR since it's exposed at the gimple level when
> folding backend builtins.
>
> pr95906 can now be optimized to MAX_EXPR, as the comment in the
> testcase notes.
>
> // FIXME: this should further optimize to a MAX_EXPR
>  typedef signed char v16i8 __attribute__((vector_size(16)));
>  v16i8 f(v16i8 a, v16i8 b)
>
> gcc/ChangeLog:
>
> PR target/104401
> * match.pd (VEC_COND_EXPR: A < B ? A : B -> MIN_EXPR): New patten 
> match.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr104401.c: New test.
> * gcc.dg/tree-ssa/pr95906.c: Adjust testcase.
> ---
>  gcc/match.pd | 21 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr95906.c  |  3 +--
>  gcc/testsuite/gcc.target/i386/pr104401.c | 27 
>  3 files changed, 49 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104401.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7b4b15acc41..d8e2009a83f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5672,6 +5672,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(if (VECTOR_TYPE_P (type))
> (view_convert @c0)
> (convert @c0
> +
> +/* This is for VEC_COND_EXPR
> +   Optimize A < B ? A : B to MIN (A, B)
> +   A > B ? A : B to MAX (A, B).  */
> +(for cmp (lt le ungt unge gt ge unlt unle)
> + minmax (min min min min max max max max)
> + MINMAX (MIN_EXPR MIN_EXPR MIN_EXPR MIN_EXPR MAX_EXPR MAX_EXPR MAX_EXPR 
> MAX_EXPR)
> + (simplify
> +  (vec_cond (cmp @0 @1) @0 @1)
> +   (if (VECTOR_INTEGER_TYPE_P (type)
> +   && target_supports_op_p (type, MINMAX, optab_vector))
> +(minmax @0 @1
> +
> +(for cmp (lt le ungt unge gt ge unlt unle)
> + minmax (max max max max min min min min)
> + MINMAX (MAX_EXPR MAX_EXPR MAX_EXPR MAX_EXPR MIN_EXPR MIN_EXPR MIN_EXPR 
> MIN_EXPR)
> + (simplify
> +  (vec_cond (cmp @0 @1) @1 @0)
> +   (if (VECTOR_INTEGER_TYPE_P (type)
> +   && target_supports_op_p (type, MINMAX, optab_vector))
> +(minmax @0 @1
>  #endif
>
>  (for cnd (cond vec_cond)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c
> index 3d820a58e93..d15670f3e9e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c
> @@ -1,7 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fdump-tree-forwprop3-raw -w -Wno-psabi" } */
>
> -// FIXME: this should further optimize to a MAX_EXPR
>  typedef signed char v16i8 __attribute__((vector_size(16)));
>  v16i8 f(v16i8 a, v16i8 b)
>  {
> @@ -10,4 +9,4 @@ v16i8 f(v16i8 a, v16i8 b)
>  }
>
>  /* { dg-final { scan-tree-dump-not "bit_(and|ior)_expr" "forwprop3" } } */
> -/* { dg-final { scan-tree-dump-times "vec_cond_expr" 1 "forwprop3" } } */
> +/* { dg-final { scan-tree-dump-times "max_expr" 1 "forwprop3" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr104401.c 
> b/gcc/testsuite/gcc.target/i386/pr104401.c
> new file mode 100644
> index 000..8ce7ff88d9e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr104401.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse4.1" } */
> +/* { dg-final { scan-assembler-times "pminsd" 2 } } */
> +/* { dg-final { scan-assembler-times "pmaxsd" 2 } } */
> +
> +#include 
> +
> +__m128i min32(__m128i value, __m128i input)
> +{
> +  return _mm_blendv_epi8(input, value, _mm_cmplt_epi32(value, input));
> +}
> +
> +__m128i max32(__m128i value, __m128i input)
> +{
> +  return _mm_blendv_epi8(input, value, _mm_cmpgt_epi32(value, input));
> +}
> +
> +__m128i min32_1(__m128i value, __m128i input)
> +{
> +  return _mm_blendv_epi8(input, value, _mm_cmpgt_epi32(input, value));
> +}
> +
> +__m128i max32_1(__m128i value, __m128i input)
> +{
> +  return _mm_bl
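The fold the patch adds can be modelled per-lane in plain C; the following is a minimal scalar sketch (function names are illustrative, not part of the patch) of why the cmp/blendv sequence in the testcase is just a min:

```c
#include <stdint.h>

/* Scalar model of one vector lane: the compare produces an all-ones
   or all-zero mask, and the blend (the bit_and/bit_ior pair mentioned
   in pr95906) then selects one of the two inputs.  The whole sequence
   is therefore equivalent to a plain MIN_EXPR, which is what the new
   match.pd pattern recognizes.  */
static int8_t lane_blend_min (int8_t a, int8_t b)
{
  int8_t mask = (a < b) ? -1 : 0;               /* cmplt lane result */
  return (int8_t) ((a & mask) | (b & ~mask));   /* blendv lane */
}

static int8_t lane_min (int8_t a, int8_t b)
{
  return a < b ? a : b;                         /* MIN_EXPR */
}
```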

Re:[pushed] [PATCH v1] LoongArch: testsuite:Fixed a bug that added a target check error.

2024-01-10 Thread chenglulu

Pushed to r14-7096.

在 2024/1/10 下午3:24, chenxiaolong 写道:

After the commit of r14-6948, GCC regression testing on some
architectures produces the following error:

"error executing dg-final: unknown effective target keyword `loongarch*-*-*'"

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Remove the invalid `loongarch*-*-*'
effective-target keyword checks that caused the error.
---
  gcc/testsuite/lib/target-supports.exp | 2 --
  1 file changed, 2 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 5c6bb602cc0..dbc4f016091 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7994,7 +7994,6 @@ proc check_effective_target_vect_widen_mult_qi_to_hi { } {
  || ([istarget aarch64*-*-*]
  && ![check_effective_target_aarch64_sve])
  || [is-effective-target arm_neon]
- || [is-effective-target loongarch*-*-*]
  || ([istarget s390*-*-*]
  && [check_effective_target_s390_vx]))
  || [istarget amdgcn-*-*] }}]
@@ -8019,7 +8018,6 @@ proc check_effective_target_vect_widen_mult_hi_to_si { } {
 && ![check_effective_target_aarch64_sve])
 || [istarget i?86-*-*] || [istarget x86_64-*-*]
 || [is-effective-target arm_neon]
-|| [is-effective-target loongarch*-*-*]
 || ([istarget s390*-*-*]
 && [check_effective_target_s390_vx]))
 || [istarget amdgcn-*-*] }}]




Re:[pushed] [PATCH v2] LoongArch: testsuite:Added support for loongarch.

2024-01-10 Thread chenglulu

Pushed to r14-7097.

在 2024/1/10 下午3:25, chenxiaolong 写道:

The function of this test is to check that the compiler supports vectorization
using SLP and vec_{load/store/*}_lanes.  However, vec_{load/store/*}_lanes are
not supported on LoongArch, which has no equivalent of the "st4/ld4"
instructions provided on aarch64.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-21.c: Add loongarch.
---
  gcc/testsuite/gcc.dg/vect/slp-21.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c 
b/gcc/testsuite/gcc.dg/vect/slp-21.c
index 712a73b69d7..58751688414 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-21.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
@@ -213,7 +213,7 @@ int main (void)
  
 Not all vect_perm targets support that, and it's a bit too specific to have

 its own effective-target selector, so we just test targets directly.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { 
target { powerpc64*-*-* s390*-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { 
vect_strided4 && { ! { powerpc64*-*-* s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { 
target { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { 
vect_strided4 && { ! { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } } } */
  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  
{ target { ! { vect_strided4 } } } } } */





Re: [PATCH] libgomp, v2: Use absolute pathname to testsuite/flock [PR113192]

2024-01-10 Thread Rainer Orth
Hi Jakub,

>> FLOCK is also substituted into testsuite/libgomp-site-extra.exp.in,
>> which gets included into site.exp.  That one has
>> 
>> ## Begin content included from file libgomp-site-extra.exp.  Do not modify. 
>> ##
>> set FLOCK {$(abs_top_srcdir)/testsuite/flock}
>> 
>> So expect tries to literally execute '$(abs_top_srcdir)/testsuite/flock'
>> which cannot work.
>
> Does the following work then?
>
> Using autoconf's internal _AC_SRCDIRS macro doesn't seem to be a good idea
> to me, so I've copied what e.g. libobjc configure does instead.

The patch worked just fine (tested on i386-pc-solaris2.11 by
rebuilding/testing libgomp from scratch).  site.exp is as expected:

set FLOCK {/vol/gcc/src/hg/master/local/libgomp/testsuite/flock}

Thanks a lot.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH v2] aarch64: Fix dwarf2cfi ICEs due to recent CFI note changes [PR113077]

2024-01-10 Thread Alex Coplan
This is a v2 which addresses feedback from v1, posted here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642313.html

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

-- >8 --

In r14-6604-gd7ee988c491cde43d04fe25f2b3dbad9d85ded45 we changed the CFI notes
attached to callee saves (in aarch64_save_callee_saves).  That patch changed
the ldp/stp representation to use unspecs instead of PARALLEL moves.  This meant
that we needed to attach CFI notes to all frame-related pair saves such that
dwarf2cfi could still emit the appropriate CFI (it cannot interpret the unspecs
directly).  The patch also attached REG_CFA_OFFSET notes to individual saves so
that the ldp/stp pass could easily preserve them when forming stps.

In that change I chose to use REG_CFA_OFFSET, but as the PR shows, that
choice was problematic in that REG_CFA_OFFSET requires the attached
store to be expressed in terms of the current CFA register at all times.
This means that even scheduling of frame-related insns can break this
invariant, leading to ICEs in dwarf2cfi.

The old behaviour (before that change) allowed dwarf2cfi to interpret the RTL
directly for sp-relative saves.  This change restores that behaviour by using
REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET.  REG_FRAME_RELATED_EXPR
effectively just gives a different pattern for dwarf2cfi to look at instead of
the main insn pattern.  That allows us to attach the old-style PARALLEL move
representation in a REG_FRAME_RELATED_EXPR note and means we are free to always
express the save addresses in terms of the stack pointer.

Since the ldp/stp fusion pass can combine frame-related stores, this patch also
updates it to preserve REG_FRAME_RELATED_EXPR notes, and additionally gives it
the ability to synthesize those notes when combining sp-relative saves into an
stp (the latter always needs a note due to the unspec representation, the former
does not).

gcc/ChangeLog:

PR target/113077
* config/aarch64/aarch64-ldp-fusion.cc (filter_notes): Add fr_expr 
param to
extract REG_FRAME_RELATED_EXPR notes.
(combine_reg_notes): Handle REG_FRAME_RELATED_EXPR notes, and
synthesize these if needed.  Update caller ...
(ldp_bb_info::fuse_pair): ... here.
* config/aarch64/aarch64.cc (aarch64_save_callee_saves): Use
REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET.

gcc/testsuite/ChangeLog:

PR target/113077
* gcc.target/aarch64/pr113077.c: New test.
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 2fe1b1d4d84..324d28797da 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -904,9 +904,11 @@ aarch64_operand_mode_for_pair_mode (machine_mode mode)
 // Go through the reg notes rooted at NOTE, dropping those that we should drop,
 // and preserving those that we want to keep by prepending them to (and
 // returning) RESULT.  EH_REGION is used to make sure we have at most one
-// REG_EH_REGION note in the resulting list.
+// REG_EH_REGION note in the resulting list.  FR_EXPR is used to return any
+// REG_FRAME_RELATED_EXPR note we find, as these can need special handling in
+// combine_reg_notes.
 static rtx
-filter_notes (rtx note, rtx result, bool *eh_region)
+filter_notes (rtx note, rtx result, bool *eh_region, rtx *fr_expr)
 {
   for (; note; note = XEXP (note, 1))
 {
@@ -940,6 +942,10 @@ filter_notes (rtx note, rtx result, bool *eh_region)
   copy_rtx (XEXP (note, 0)),
   result);
  break;
+   case REG_FRAME_RELATED_EXPR:
+ gcc_assert (!*fr_expr);
+ *fr_expr = copy_rtx (XEXP (note, 0));
+ break;
default:
  // Unexpected REG_NOTE kind.
  gcc_unreachable ();
@@ -951,13 +957,52 @@ filter_notes (rtx note, rtx result, bool *eh_region)
 
 // Return the notes that should be attached to a combination of I1 and I2, 
where
 // *I1 < *I2.
+//
+// LOAD_P is true for loads, REVERSED is true if the insns in program order are
+// not in offset order, and PATS gives the final RTL patterns for the accesses.
 static rtx
-combine_reg_notes (insn_info *i1, insn_info *i2)
+combine_reg_notes (insn_info *i1, insn_info *i2, bool load_p, bool reversed,
+  rtx pats[2])
 {
+  // Temporary storage for REG_FRAME_RELATED_EXPR notes.
+  rtx fr_expr[2] = {};
+
   bool found_eh_region = false;
   rtx result = NULL_RTX;
-  result = filter_notes (REG_NOTES (i2->rtl ()), result, &found_eh_region);
-  return filter_notes (REG_NOTES (i1->rtl ()), result, &found_eh_region);
+  result = filter_notes (REG_NOTES (i2->rtl ()), result,
+&found_eh_region, fr_expr);
+  result = filter_notes (REG_NOTES (i1->rtl ()), result,
+&found_eh_region, fr_expr + 1);
+
+  if (!load_p)
+{
+  // Simple frame-related sp-relative saves don't need CFI note

RE: [PATCH]middle-end: Don't apply copysign optimization if target does not implement optab [PR112468]

2024-01-10 Thread Richard Biener
On Fri, 5 Jan 2024, Tamar Christina wrote:

> > On Fri, 2024-01-05 at 11:02 +, Tamar Christina wrote:
> > > Ok, so something like:
> > >
> > > > > ([istarget loongarch*-*-*] &&
> > > > > ([check_effective_target_loongarch_sx] ||
> > > > > [check_effective_target_hard_float]))
> > > ?
> > 
> > We don't need "[check_effective_target_loongarch_sx] ||" because SIMD
> > requires hard float.
> > 
> 
> Cool, thanks! 
> 
> --
> 
> Hi All,
> 
> currently GCC does not treat IFN_COPYSIGN the same as the copysign tree expr.
> The latter has a libcall fallback and the IFN can only do optabs.
> 
> Because of this the change I made to optimize copysign only works if the
> target has implemented the optab, but it should work for those that have the
> libcall too.
> 
> More annoyingly if a target has vector versions of ABS and NEG but not 
> COPYSIGN
> then the change made them lose vectorization.
> 
> The proper fix for this is to treat the IFN the same as the tree EXPR and to
> enhance expand_COPYSIGN to also support vector calls.

I don't think that will work - you'd still need to check for the
availability of the function, otherwise you'll end up with link
errors.  I think you instead want to verify that fallback expansion
with expand_copysign_absneg or expand_copysign_bit will work, thus
we'll never emit a libcall.  In fact I think we might want to require
that all targets either implement a copysign optab or allow such
fallback expansion given its such a core functionality.

> I have such a patch for GCC 15 but it's quite big and too invasive for 
> stage-4.
> As such this is a minimal fix, just don't apply the transformation and leave
> targets which don't have the optab unoptimized.
> 
> Targets list for check_effective_target_ifn_copysign was gotten by grepping 
> for
> copysign and looking at the optab.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> Tests were also run on x86_64-pc-linux-gnu with -m32 and no longer fail.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/112468
>   * doc/sourcebuild.texi: Document ifn_copysign.
>   * match.pd: Only apply transformation if target supports the IFN.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/112468
>   * gcc.dg/fold-copysign-1.c: Modify tests based on if target supports
>   IFN_COPYSIGN.
>   * gcc.dg/pr55152-2.c: Likewise.
>   * gcc.dg/tree-ssa/abs-4.c: Likewise.
>   * gcc.dg/tree-ssa/backprop-6.c: Likewise.
>   * gcc.dg/tree-ssa/copy-sign-2.c: Likewise.
>   * gcc.dg/tree-ssa/mult-abs-2.c: Likewise.
>   * lib/target-supports.exp (check_effective_target_ifn_copysign): New.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 
> 4be67daedb20d394857c02739389cabf23c0d533..f4847dafe65cbbf8c9de34905f614ef6957658b4
>  100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2664,6 +2664,10 @@ Target requires a command line argument to enable a 
> SIMD instruction set.
>  @item xorsign
>  Target supports the xorsign optab expansion.
>  
> +@item ifn_copysign
> +Target supports the IFN_COPYSIGN optab expansion for both scalar and vector
> +types.

Target supports the copysign optab expansion for both scalar and vector
modes.

Note this leaves the actual modes required unspecified - can we
restrict this to float and double?

> +
>  @end table
>  
>  @subsubsection Environment attributes
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> d57e29bfe1d68afd4df4dda20fecc2405ff05332..87d13e7e3e1aa6d89119142b614890dc4729b521
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1159,13 +1159,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (simplify
>(copysigns @0 REAL_CST@1)
>(if (!REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
> -   (abs @0
> +   (abs @0)
> +#if GIMPLE
> +   (if (!direct_internal_fn_supported_p (IFN_COPYSIGN, type,
> +  OPTIMIZE_FOR_BOTH))
> +(negate (abs @0)))
> +#endif
> +   )))
>  
> +#if GIMPLE
>  /* Transform fneg (fabs (X)) -> copysign (X, -1).  */
>  (simplify
>   (negate (abs @0))
> - (IFN_COPYSIGN @0 { build_minus_one_cst (type); }))
> -
> + (if (direct_internal_fn_supported_p (IFN_COPYSIGN, type,
> +   OPTIMIZE_FOR_BOTH))
> +   (IFN_COPYSIGN @0 { build_minus_one_cst (type); })))
> +#endif

I think we want to update the comments as well to note that
copysign (x, -1) is canonical for -abs(x) when copysign
is implemented as optab.
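The equivalence behind that canonicalization can be checked directly; a minimal sketch, using GCC builtins so no libm link is needed (helper names are illustrative and assume a GCC-compatible compiler):

```c
/* -fabs(x) and copysign(x, -1) compute the same value for every x,
   including signed zeros; that is what makes copysign (x, -1) a safe
   canonical form for -abs(x).  */
static double neg_abs (double x)      { return -__builtin_fabs (x); }
static double via_copysign (double x) { return __builtin_copysign (x, -1.0); }
```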

OK with these changes.

Richard.

>  /* copysign(copysign(x, y), z) -> copysign(x, z).  */
>  (for copysigns (COPYSIGN_ALL)
>   (simplify
> diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c 
> b/gcc/testsuite/gcc.dg/fold-copysign-1.c
> index 
> f9cafd14ab05f5e8ab2f6f68e62801d21c2df6a6..96b80c733794fffada1b08274ef39cc8f6e442ce
>  100644
> --- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
> +++ b/gcc/testsuite/gcc.dg/fold-copysign-1.c
> @@ -1,5 +1,6 @@
>  

Re: [PATCH 6/4] libbacktrace: Add loaded dlls after initialize

2024-01-10 Thread Eli Zaretskii
> Date: Tue, 9 Jan 2024 21:02:44 +0100
> Cc: i...@google.com, gcc-patches@gcc.gnu.org, g...@gcc.gnu.org
> From: Björn Schäpers 
> 
> Am 07.01.2024 um 18:03 schrieb Eli Zaretskii:
> > In that case, you an call either GetModuleHandeExA or
> > GetModuleHandeExW, the difference is minor.
> 
> Here an updated version without relying on TEXT or TCHAR, directly calling 
> GetModuleHandleExW.

Thanks, this LGTM (but I couldn't test it, I just looked at the
source code).


Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-10 Thread Richard Sandiford
Just a note that, following discussion on IRC, I'll pull this for
GCC 14 and resubmit for GCC 15.

There was also pushback on IRC about making the pass opt-in.
Enabling it for x86_64 would mean fixing RPAD to use a representation
that is more robust against recombination, but as you can imagine, it's
kind-of difficult for me to justify spending significant time fixing an
issue in the x86_64 port.  Jeff's testing suggested that there are also
latent issues in the older, less maintained ports.

So to get an idea for expectations: would it be a requirement that a
GCC 15 submission is enabled unconditionally and all known issues in
the ports fixed?

Thanks,
Richard



[PATCH] OpenMP: Fix new lvalue-parsing map/to/from tests for 32-bit targets

2024-01-10 Thread Julian Brown
This patch fixes several tests introduced by the commit
r14-7033-g1413af02d62182 for 32-bit targets.

I will commit as obvious.

2024-01-10  Julian Brown  

gcc/testsuite/
* g++.dg/gomp/array-section-1.C: Fix scan output for 32-bit target.
* g++.dg/gomp/array-section-2.C: Likewise.
* g++.dg/gomp/bad-array-section-4.C: Adjust error output for 32-bit
target.
---
 gcc/testsuite/g++.dg/gomp/array-section-1.C | 8 
 gcc/testsuite/g++.dg/gomp/array-section-2.C | 8 
 gcc/testsuite/g++.dg/gomp/bad-array-section-4.C | 2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/g++.dg/gomp/array-section-1.C 
b/gcc/testsuite/g++.dg/gomp/array-section-1.C
index 023706b15c5..562475ab80e 100644
--- a/gcc/testsuite/g++.dg/gomp/array-section-1.C
+++ b/gcc/testsuite/g++.dg/gomp/array-section-1.C
@@ -8,10 +8,10 @@ void foo()
 {
   int arr1[40];
 #pragma omp target map(arr1[x ? C : D])
-// { dg-final { scan-tree-dump {map\(tofrom:arr1\[SAVE_EXPR \] \[len: [0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: \(long 
int\) &arr1\[SAVE_EXPR \] - \(long int\) &arr1\]\)} "original" 
} }
+// { dg-final { scan-tree-dump {map\(tofrom:arr1\[SAVE_EXPR \] \[len: [0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: \((?:long 
)?int\) &arr1\[SAVE_EXPR \] - \((?:long )?int\) &arr1\]\)} 
"original" } }
   { }
 #pragma omp target map(arr1[x ? C : D : D])
-// { dg-final { scan-tree-dump {map\(tofrom:arr1\[SAVE_EXPR \] \[len: [0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: \(long 
int\) &arr1\[SAVE_EXPR \] - \(long int\) &arr1\]\)} "original" 
} }
+// { dg-final { scan-tree-dump {map\(tofrom:arr1\[SAVE_EXPR \] \[len: [0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: \((?:long 
)?int\) &arr1\[SAVE_EXPR \] - \((?:long )?int\) &arr1\]\)} 
"original" } }
   { }
 #pragma omp target map(arr1[1 : x ? C : D])
 // { dg-final { scan-tree-dump {map\(tofrom:arr1\[1\] \[len: x != 0 \? [0-9]+ 
: [0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: [0-9]+\]\)} 
"original" } }
@@ -22,10 +22,10 @@ int main()
 {
   int arr1[40];
 #pragma omp target map(arr1[x ? 3 : 5])
-// { dg-final { scan-tree-dump {map\(tofrom:arr1\[SAVE_EXPR \] \[len: [0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: \(long 
int\) &arr1\[SAVE_EXPR \] - \(long int\) &arr1\]\)} "original" 
} }
+// { dg-final { scan-tree-dump {map\(tofrom:arr1\[SAVE_EXPR \] \[len: [0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: \((?:long 
)?int\) &arr1\[SAVE_EXPR \] - \((?:long )?int\) &arr1\]\)} 
"original" } }
   { }
 #pragma omp target map(arr1[x ? 3 : 5 : 5])
-// { dg-final { scan-tree-dump {map\(tofrom:arr1\[SAVE_EXPR \] \[len: [0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: \(long 
int\) &arr1\[SAVE_EXPR \] - \(long int\) &arr1\]\)} "original" 
} }
+// { dg-final { scan-tree-dump {map\(tofrom:arr1\[SAVE_EXPR \] \[len: [0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: \((?:long 
)?int\) &arr1\[SAVE_EXPR \] - \((?:long )?int\) &arr1\]\)} 
"original" } }
   { }
 #pragma omp target map(arr1[1 : x ? 3 : 5])
 // { dg-final { scan-tree-dump {map\(tofrom:arr1\[1\] [len: x != 0 ? [0-9]+ : 
[0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: [0-9]+\]\)} 
"original" } }
diff --git a/gcc/testsuite/g++.dg/gomp/array-section-2.C 
b/gcc/testsuite/g++.dg/gomp/array-section-2.C
index 072108d1f89..e2be9791e81 100644
--- a/gcc/testsuite/g++.dg/gomp/array-section-2.C
+++ b/gcc/testsuite/g++.dg/gomp/array-section-2.C
@@ -16,10 +16,10 @@ int C::foo()
   /* There is a parsing ambiguity here without the space.  We don't try to
  resolve that automatically (though maybe we could, in theory).  */
 #pragma omp target map(arr1[::x: ::y])
-// { dg-final { scan-tree-dump {map\(tofrom:arr1\[SAVE_EXPR \] \[len: 
\(sizetype\) y \* [0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: 
\(long int\) &arr1\[SAVE_EXPR \] - \(long int\) &arr1\]\)} "original" } }
+// { dg-final { scan-tree-dump {map\(tofrom:arr1\[SAVE_EXPR \] \[len: 
\(sizetype\) y \* [0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: 
\((?:long )?int\) &arr1\[SAVE_EXPR \] - \((?:long )?int\) &arr1\]\)} 
"original" } }
   { }
 #pragma omp target map(arr1[::x:])
-// { dg-final { scan-tree-dump {map\(tofrom:arr1\[SAVE_EXPR \] \[len: \(40 
- \(sizetype\) SAVE_EXPR \) \* [0-9]+\]\) map\(firstprivate:arr1 \[pointer 
assign, bias: \(long int\) &arr1\[SAVE_EXPR \] - \(long int\) &arr1\]\)} 
"original" } }
+// { dg-final { scan-tree-dump {map\(tofrom:arr1\[SAVE_EXPR \] \[len: \(40 
- \(sizetype\) SAVE_EXPR \) \* [0-9]+\]\) map\(firstprivate:arr1 \[pointer 
assign, bias: \((?:long )?int\) &arr1\[SAVE_EXPR \] - \((?:long )?int\) 
&arr1\]\)} "original" } }
   { }
 #pragma omp target map(arr1[: ::y])
 // { dg-final { scan-tree-dump {map\(tofrom:arr1\[0\] \[len: \(sizetype\) y \* 
[0-9]+\]\) map\(firstprivate:arr1 \[pointer assign, bias: 0\]\)} "original" } }
@@ -40,10 +40,10 @@ void Ct::foo()
 {
   int arr1[40];
 #pragma 

[PATCH] OpenMP: Fix g++.dg/gomp/bad-array-section-10.C for C++23 and up

2024-01-10 Thread Julian Brown
This patch adjusts diagnostic output for C++23 and above for the test
case mentioned in the commit title.

I will apply shortly as obvious.

2024-01-10  Julian Brown  

gcc/testsuite/
* g++.dg/gomp/bad-array-section-10.C: Adjust diagnostics for C++23 and
up.
---
 gcc/testsuite/g++.dg/gomp/bad-array-section-10.C | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/testsuite/g++.dg/gomp/bad-array-section-10.C 
b/gcc/testsuite/g++.dg/gomp/bad-array-section-10.C
index 393b0fefe51..286e72e9f64 100644
--- a/gcc/testsuite/g++.dg/gomp/bad-array-section-10.C
+++ b/gcc/testsuite/g++.dg/gomp/bad-array-section-10.C
@@ -6,12 +6,15 @@ void foo()
   int arr1[40];
 #pragma omp target map(arr1[4,C:])
 // { dg-warning "top-level comma expression in array subscript is deprecated" 
"" { target c++20_only } .-1 }
+// { dg-error "cannot use multidimensional subscript in OpenMP array section" 
"" { target c++23 } .-2 }
   { }
 #pragma omp target map(arr1[4,5:C,7])
 // { dg-warning "top-level comma expression in array subscript is deprecated" 
"" { target c++20_only } .-1 }
+// { dg-error "cannot use multidimensional subscript in OpenMP array section" 
"" { target c++23 } .-2 }
   { }
 #pragma omp target map(arr1[:8,C,10])
 // { dg-warning "top-level comma expression in array subscript is deprecated" 
"" { target c++20_only } .-1 }
+// { dg-error "cannot use multidimensional subscript in OpenMP array section" 
"" { target c++23 } .-2 }
   { }
 }
 
@@ -20,12 +23,15 @@ int main()
   int arr1[40];
 #pragma omp target map(arr1[4,5:])
 // { dg-warning "top-level comma expression in array subscript is deprecated" 
"" { target c++20_only } .-1 }
+// { dg-error "cannot use multidimensional subscript in OpenMP array section" 
"" { target c++23 } .-2 }
   { }
 #pragma omp target map(arr1[4,5:6,7])
 // { dg-warning "top-level comma expression in array subscript is deprecated" 
"" { target c++20_only } .-1 }
+// { dg-error "cannot use multidimensional subscript in OpenMP array section" 
"" { target c++23 } .-2 }
   { }
 #pragma omp target map(arr1[:8,9,10])
 // { dg-warning "top-level comma expression in array subscript is deprecated" 
"" { target c++20_only } .-1 }
+// { dg-error "cannot use multidimensional subscript in OpenMP array section" 
"" { target c++23 } .-2 }
   { }
 
   foo<6, 9> ();
-- 
2.25.1



Re: [PATCH 1/8] OpenMP: lvalue parsing for map/to/from clauses (C++)

2024-01-10 Thread Julian Brown
On Wed, 10 Jan 2024 10:14:41 +0100
Jakub Jelinek  wrote:

> On Fri, Jan 05, 2024 at 12:23:26PM +, Julian Brown wrote:
> > * g++.dg/gomp/bad-array-section-10.C: New test.  
> 
> This test FAILs in C++23/C++26 modes, just try
> make check-g++ GXX_TESTSUITE_STDS=98,11,14,17,20,23,26
> RUNTESTFLAGS=gomp.exp=bad-array-section-10.C While in C++20 comma in
> array references was deprecated, in C++23 we implement
> multidimensional arrays, so the diagnostics there is different. See
> https://wg21.link/p2036r3

Thanks -- I've pushed a patch to fix this.  The bad-array-section-11.C
test covered the C++23 case already, but I don't think normal testing
iterates the newer language standards, hence missing this.

Julian


Re: [PATCH v5] RISC-V: Fix register overlap issue for some xtheadvector instructions

2024-01-10 Thread Robin Dapp
Hi Joshua,

> For th.vmadc/th.vmsbc as well as narrowing arithmetic instructions
> and floating-point compare instructions, an illegal instruction
> exception will be raised if the destination vector register overlaps
> a source vector register group.
> 
> To handle this issue, we use "group_overlap" and "enabled" attribute
> to disable some alternatives for xtheadvector.

>  ;; Widening instructions have group-overlap constraints.  Those are only
>  ;; valid for certain register-group sizes.  This attribute marks the
>  ;; alternatives not matching the required register-group size as disabled.
> -(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0"
> +(define_attr "group_overlap" 
> "none,W21,W42,W84,W43,W86,W87,W0,thv_disabled,rvv_disabled"
>(const_string "none"))

I realize there have been some discussions before but I find the naming
misleading.  The group_overlap attribute is supposed to specify whether
groups overlap (and mark the respective alternatives accepting
only this overlap).
Then we check if the groups overlap and disable all non-matching
alternatives.  "none" i.e. "no overlap" always matches.

Your first goal seems to be to disable existing non-early-clobber
alternatives for thv.  For this, maybe "full", "same" (or "any"?) would
work?  Please also add a comment in group_overlap_valid then that we
need not actually check for register equality.

For the other insns, I wonder if we could get away with not really
disabling the newly added early-clobber alternatives for RVV but
just disparaging ("?") them?  That way we could re-use "full" for
the thv-disabled alternatives and "none" for the newly added ones.
("none" will still be misleading then, though :/)

If this doesn't work or others feel the separation is not strict
enough, I'd prefer a separate attribute rather than overloading
group_overlap.  Maybe something like "spec_restriction" or similar
with two values "rvv" and "thv"?

Regards
 Robin



Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-10 Thread Richard Biener
On Wed, 10 Jan 2024, Richard Sandiford wrote:

> Just a note that, following discussion on IRC, I'll pull this for
> GCC 14 and resubmit for GCC 15.
> 
> There was also pushback on IRC about making the pass opt-in.
> Enabling it for x86_64 would mean fixing RPAD to use a representation
> that is more robust against recombination, but as you can imagine, it's
> kind-of difficult for me to justify spending significant time fixing an
> issue in the x86_64 port.  Jeff's testing suggested that there are also
> latent issues in the older, less maintained ports.
> 
> So to get an idea for expectations: would it be a requirement that a
> GCC 15 submission is enabled unconditionally and all known issues in
> the ports fixed?

Can you open a bugreport with the issue in RPAD, maybe outlining
what would need to be done?

I think x86 maintainers could opt to disable the pass - so it would
be opt-out.  It's reasonable to expect them to fix the backend given
there's nothing really wrong with the new pass, it just does
something that wasn't done before at that point?

Richard.


[pushed 1/3] pretty-print: add selftest coverage for numbered args

2024-01-10 Thread David Malcolm
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-7104-g7daa935c7997f3.

gcc/ChangeLog:
* pretty-print.cc (selftest::test_pp_format): Add selftest
coverage for numbered args.

Signed-off-by: David Malcolm 
---
 gcc/pretty-print.cc | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc
index fd4c38ed3eb4..859ae2a273db 100644
--- a/gcc/pretty-print.cc
+++ b/gcc/pretty-print.cc
@@ -2605,6 +2605,20 @@ test_pp_format ()
   assert_pp_format (SELFTEST_LOCATION, "item 3 of 7", "item %i of %i", 3, 7);
   assert_pp_format (SELFTEST_LOCATION, "problem with `bar' at line 10",
"problem with %qs at line %i", "bar", 10);
+
+  /* Verify numbered args.  */
+  assert_pp_format (SELFTEST_LOCATION,
+   "foo: second bar: first",
+   "foo: %2$s bar: %1$s",
+   "first", "second");
+  assert_pp_format (SELFTEST_LOCATION,
+   "foo: 1066 bar: 1776",
+   "foo: %2$i bar: %1$i",
+   1776, 1066);
+  assert_pp_format (SELFTEST_LOCATION,
+   "foo: second bar: 1776",
+   "foo: %2$s bar: %1$i",
+   1776, "second");
 }
 
 /* A subclass of pretty_printer for use by test_prefixes_and_wrapping.  */
-- 
2.26.3
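The %N$ syntax exercised by these selftests follows the positional-argument convention of POSIX printf; a standalone sketch (assumes a POSIX-conformant libc such as glibc for the positional conversions):

```c
#include <stdio.h>

/* "%2$s" consumes the second variadic argument and "%1$s" the first,
   mirroring the pretty-printer cases added above.  All conversions in
   a format string must be numbered once any of them is.  */
static void format_numbered (char *buf, size_t sz)
{
  snprintf (buf, sz, "foo: %2$s bar: %1$s", "first", "second");
}
```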



[pushed 3/3] gcc-urlifier: handle option prefixes such as '-fno-'

2024-01-10 Thread David Malcolm
Given e.g. this misspelled option (omitting the trailing 's'):
$ LANG=C ./xgcc -B. -fno-inline-small-function
xgcc: error: unrecognized command-line option '-fno-inline-small-function'; did 
you mean '-fno-inline-small-functions'?

we weren't providing a documentation URL for the suggestion.

The issue is the URLification code uses find_opt, which doesn't consider
the various '-fno-' prefixes.

This patch adds a way to find the pertinent prefix remapping and uses it
when determining URLs.
With this patch, the suggestion '-fno-inline-small-functions' now gets a
documentation link (to that of '-finline-small-functions').
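The remapping idea can be sketched outside GCC as follows (the table contents and function name are illustrative, not GCC's actual option_map, which also handles "--no-" and language-specific prefixes):

```c
#include <stdio.h>
#include <string.h>

/* Minimal sketch of prefix remapping: rewrite a negated option such as
   "-fno-inline" to its positive spelling "-finline" before looking it
   up in the option table.  */
static const char *
remap_option_prefix (const char *opt, char *buf, size_t bufsz)
{
  static const struct { const char *from, *to; } map[] = {
    { "-fno-", "-f" }, { "-Wno-", "-W" }, { "-mno-", "-m" },
  };
  for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
    {
      size_t n = strlen (map[i].from);
      if (strncmp (opt, map[i].from, n) == 0)
        {
          /* Splice the positive prefix onto the option suffix.  */
          snprintf (buf, bufsz, "%s%s", map[i].to, opt + n);
          return buf;
        }
    }
  return opt;  /* no remapping needed */
}
```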

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-7106-gbe2bf5dc93ca1e.

gcc/ChangeLog:
* gcc-urlifier.cc (gcc_urlifier::get_url_suffix_for_option):
Handle prefix mappings before calling find_opt.
(selftest::gcc_urlifier_cc_tests): Add example of urlifying a
"-fno-"-prefixed command-line option.
* opts-common.cc (get_option_prefix_remapping): New.
* opts.h (get_option_prefix_remapping): New decl.

Signed-off-by: David Malcolm 
---
 gcc/gcc-urlifier.cc | 49 -
 gcc/opts-common.cc  | 22 
 gcc/opts.h  |  3 +++
 3 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/gcc/gcc-urlifier.cc b/gcc/gcc-urlifier.cc
index 6bd176fc2483..be6459e8d7c1 100644
--- a/gcc/gcc-urlifier.cc
+++ b/gcc/gcc-urlifier.cc
@@ -154,11 +154,46 @@ gcc_urlifier::get_url_suffix_for_option (const char *p, 
size_t sz) const
  and skipping the leading '-'.
 
  We have a (pointer,size) pair that doesn't necessarily have a
- terminator, so create a 0-terminated clone of the string.  */
-  gcc_assert (sz > 0);
-  char *tmp = xstrndup (p + 1, sz - 1); // skip the leading '-'
-  size_t opt = find_opt (tmp, m_lang_mask);
-  free (tmp);
+ terminator.
+ Additionally, we could have one of the e.g. "-Wno-" variants of
+ the option, which find_opt doesn't handle.
+
+ Hence we need to create input for find_opt in a temporary buffer.  */
+  char *option_buffer;
+
+  const char *new_prefix;
+  if (const char *old_prefix = get_option_prefix_remapping (p, sz, 
&new_prefix))
+{
+  /* We have one of the variants; generate a buffer containing a copy
+that maps from the old prefix to the new prefix
+e.g. given "-Wno-suffix", generate "-Wsuffix".  */
+  gcc_assert (old_prefix[0] == '-');
+  gcc_assert (new_prefix);
+  gcc_assert (new_prefix[0] == '-');
+
+  const size_t old_prefix_len = strlen (old_prefix);
+  gcc_assert (old_prefix_len <= sz);
+  const size_t suffix_len = sz - old_prefix_len;
+  const size_t new_prefix_len = strlen (new_prefix);
+  const size_t new_sz = new_prefix_len + suffix_len + 1;
+
+  option_buffer = (char *)xmalloc (new_sz);
+  memcpy (option_buffer, new_prefix, new_prefix_len);
+  /* Copy suffix.  */
+  memcpy (option_buffer + new_prefix_len, p + old_prefix_len, suffix_len);
+  /* Terminate.  */
+  option_buffer[new_prefix_len + suffix_len] = '\0';
+}
+  else
+{
+  /* Otherwise we can simply create a 0-terminated clone of the string.  */
+  gcc_assert (sz > 0);
+  gcc_assert (p[0] == '-');
+  option_buffer = xstrndup (p, sz);
+}
+
+  size_t opt = find_opt (option_buffer + 1, m_lang_mask);
+  free (option_buffer);
 
   if (opt >= N_OPTS)
 /* Option not recognized.  */
@@ -221,6 +256,10 @@ gcc_urlifier_cc_tests ()
   /* Check an option.  */
   ASSERT_STREQ (u.get_url_suffix_for_quoted_text ("-fpack-struct").get (),
"gcc/Code-Gen-Options.html#index-fpack-struct");
+
+  /* Check a "-fno-" variant of an option.  */
+  ASSERT_STREQ (u.get_url_suffix_for_quoted_text ("-fno-inline").get (),
+   "gcc/Optimize-Options.html#index-finline");
 }
 
 } // namespace selftest
diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index 73126cb74e0e..4a2dff243b0c 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts-common.cc
@@ -468,6 +468,28 @@ static const struct option_map option_map[] =
 { "--no-", NULL, "-f", false, true }
   };
 
+/* Given buffer P of size SZ, look for a prefix within OPTION_MAP;
+   if found, return the prefix and write the new prefix to *OUT_NEW_PREFIX.
+   Otherwise return nullptr.  */
+
+const char *
+get_option_prefix_remapping (const char *p, size_t sz,
+const char **out_new_prefix)
+{
+  for (unsigned i = 0; i < ARRAY_SIZE (option_map); i++)
+{
+  const char * const old_prefix = option_map[i].opt0;
+  const size_t old_prefix_len = strlen (old_prefix);
+  if (old_prefix_len <= sz
+ && !memcmp (p, old_prefix, old_prefix_len))
+   {
+ *out_new_prefix = option_map[i].new_prefix;
+ return old_prefix;
+   }
+}
+  return nullptr;
+}
+
 /* Helper function for gcc.cc's driver::suggest_option, for populating the
vec

[pushed 2/3] pretty-print: support urlification in phase 3

2024-01-10 Thread David Malcolm
TL;DR: for the case when the user misspells a command-line option
and we suggest one, with this patch we now provide a documentation URL
for the suggestion.

In r14-5118-gc5db4d8ba5f3de I added a mechanism to automatically add
URLs to quoted strings in diagnostics, and in r14-6920-g9e49746da303b8
through r14-6923-g4ded42c2c5a5c9 wired this up so that any time
we mention a command-line option in a diagnostic message in quotes,
the user gets a URL to the HTML documentation for that option.

However this only worked for quoted strings that were fully within
a single "chunk" within the pretty-printer implementation, such as:

* "%<-foption%>" (handled in phase 1)
* "%qs", "-foption" (handled in phase 2)

but not where the quoted string straddled multiple chunks, in
particular for this important case in the gcc.cc:

  error ("unrecognized command-line option %<-%s%>;"
 " did you mean %<-%s%>?",
 switches[i].part1, hint);

e.g. for:
$ LANG=C ./xgcc -B. -finling-small-functions
xgcc: error: unrecognized command-line option '-finling-small-functions'; did 
you mean '-finline-small-functions'?

which within pp_format becomes these chunks:

* chunk 0: "unrecognized command-line option `-"
* chunk 1: switches[i].part1  (e.g. "finling-small-functions")
* chunk 2: "'; did you mean `-"
* chunk 3: hint (e.g. "finline-small-functions")
* chunk 4: "'?"

where the first quoted run is in chunks 1-3 and the second in
chunks 2-4.

Hence we were not attempting to provide a URL for the two quoted runs,
and, in particular not for the hint.
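The chunk-straddling problem can be reduced to a small sketch (this is only an illustration of the semantics; the pretty-printer's actual phase-3 code is considerably more involved):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Join the chunks and record the [start, end) offsets of each `...'
// quoted run in the combined output.  Because a run can begin in one
// chunk and end in another (as in the example above, where chunk 0 ends
// with "`-" and the option name is chunk 1), run boundaries can only be
// determined once the chunks are considered together -- which is why
// the urlification must also happen in phase 3.
std::vector<std::pair<size_t, size_t>>
quoted_runs (const std::vector<std::string> &chunks)
{
  std::string out;
  for (const std::string &c : chunks)
    out += c;
  std::vector<std::pair<size_t, size_t>> runs;
  size_t pos = 0;
  while ((pos = out.find ('`', pos)) != std::string::npos)
    {
      size_t close = out.find ('\'', pos + 1);
      if (close == std::string::npos)
	break;
      runs.push_back ({pos + 1, close});  // the text between the quotes
      pos = close + 1;
    }
  return runs;
}
```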

This patch refactors the urlification mechanism in pretty-print.cc so
that it checks for quoted runs that appear in phase 3 (as well as in
phases 1 and 2, as before).  With this, the quoted text runs
"-finling-small-functions" and "-finline-small-functions" are passed
to the urlifier, which successfully finds a documentation URL for
the latter.

As before, the urlification code is only run if the URL escapes are
enabled, and only for messages from diagnostic.cc (error, warn, inform,
etc), not for all pretty_printer usage.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-7105-g5daf9104ed5d4e.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_context::report_diagnostic): Pass
m_urlifier to pp_output_formatted_text.
* pretty-print.cc: Add #define of INCLUDE_VECTOR.
(obstack_append_string): New overload, taking a length.
(urlify_quoted_string): Pass in an obstack ptr, rather than using
that of the pp's buffer.  Generalize to handle trailing text in
the buffer beyond the run of quoted text.
(class quoting_info): New.
(on_begin_quote): New.
(on_end_quote): New.
(pp_format): Refactor phase 1 and phase 2 quoting support, moving
it to calls to on_begin_quote and on_end_quote.
(struct auto_obstack): New.
(quoting_info::handle_phase_3): New.
(pp_output_formatted_text): Add urlifier param.  Use it if there
is deferred urlification.  Delete m_quotes.
(selftest::pp_printf_with_urlifier): Pass urlifier to
pp_output_formatted_text.
(selftest::test_urlification): Update results for the existing
case of quoted text stradding chunks; add more such test cases.
* pretty-print.h (class quoting_info): New forward decl.
(chunk_info::m_quotes): New field.
(pp_output_formatted_text): Add optional urlifier param.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic.cc   |   2 +-
 gcc/pretty-print.cc | 444 
 gcc/pretty-print.h  |   9 +-
 3 files changed, 379 insertions(+), 76 deletions(-)

diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index cc2b1840c661..f5411b1ede0d 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -1603,7 +1603,7 @@ diagnostic_context::report_diagnostic (diagnostic_info 
*diagnostic)
 
   pp_format (this->printer, &diagnostic->message, m_urlifier);
   m_output_format->on_begin_diagnostic (*diagnostic);
-  pp_output_formatted_text (this->printer);
+  pp_output_formatted_text (this->printer, m_urlifier);
   if (m_show_cwe)
 print_any_cwe (*diagnostic);
   if (m_show_rules)
diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc
index 859ae2a273db..de454ab7a401 100644
--- a/gcc/pretty-print.cc
+++ b/gcc/pretty-print.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "intl.h"
@@ -1031,7 +1032,16 @@ obstack_append_string (obstack *ostack, const char *str)
   obstack_grow (ostack, str, strlen (str));
 }
 
-/* Given quoted text starting at QUOTED_TEXT_START_IDX within PP's buffer,
+/* Append STR to OSTACK, without a null-terminator.  */
+
+static void
+obstack_append_string (obstack *ostack, const char *str, size_t len)
+{
+  obstack_grow (ostack, str, len);
+}

Re: Re: [PATCH v5] RISC-V: Fix register overlap issue for some xtheadvector instructions

2024-01-10 Thread 钟居哲
>> For the other insns, I wonder if we could get away with not really
>>disabling the newly added early-clobber alternatives for RVV but
>>just disparaging ("?") them?  That way we could re-use "full" for
>>the thv-disabled alternatives and "none" for the newly added ones.
>>("none" will still be misleading then, though :/)

I prefer to disable the early-clobber alternatives added for theadvector when
compiling for RVV, since disparaging them still leaves the RA able to reach
those early-clobber alternatives.

>>If this doesn't work or others feel the separation is not strict
>>enough, I'd prefer a separate attribute rather than overloading
>>group_overlap.  Maybe something like "spec_restriction" or similar
>>with two values "rvv" and "thv"?

I like this idea; it makes more sense to me. So I think it's better to add an
attribute that disables alternatives for theadvector or RVV 1.0.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-01-10 21:36
To: Jun Sha (Joshua); gcc-patches
CC: rdapp.gcc; jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jin Ma; Xianmiao Qu
Subject: Re: [PATCH v5] RISC-V: Fix register overlap issue for some 
xtheadvector instructions
Hi Joshua,
 
> For th.vmadc/th.vmsbc as well as narrowing arithmetic instructions
> and floating-point compare instructions, an illegal instruction
> exception will be raised if the destination vector register overlaps
> a source vector register group.
> 
> To handle this issue, we use "group_overlap" and "enabled" attribute
> to disable some alternatives for xtheadvector.
 
>  ;; Widening instructions have group-overlap constraints.  Those are only
>  ;; valid for certain register-group sizes.  This attribute marks the
>  ;; alternatives not matching the required register-group size as disabled.
> -(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0"
> +(define_attr "group_overlap" 
> "none,W21,W42,W84,W43,W86,W87,W0,thv_disabled,rvv_disabled"
>(const_string "none"))
 
I realize there have been some discussions before but I find the naming
misleading.  The group_overlap attribute is supposed to specify whether
groups overlap (and mark the respective alternatives accepting
only this overlap).
Then we check if the groups overlap and disable all non-matching
alternatives.  "none" i.e. "no overlap" always matches.
 
Your first goal seems to be to disable existing non-early-clobber
alternatives for thv.  For this, maybe "full", "same" (or "any"?) would
work?  Please also add a comment in group_overlap_valid then that we
need not actually check for register equality.
 
For the other insns, I wonder if we could get away with not really
disabling the newly added early-clobber alternatives for RVV but
just disparaging ("?") them?  That way we could re-use "full" for
the thv-disabled alternatives and "none" for the newly added ones.
("none" will still be misleading then, though :/)
 
If this doesn't work or others feel the separation is not strict
enough, I'd prefer a separate attribute rather than overloading
group_overlap.  Maybe something like "spec_restriction" or similar
with two values "rvv" and "thv"?
 
Regards
Robin
 
 


[PATCH][committed][c++ frontend]: initialize ivdep value

2024-01-10 Thread Tamar Christina
Hi All,

If control enters the switch through one of the cases other than
the IVDEP one, the variable remains uninitialized.

This fixes it by initializing it to false.
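The class of bug being fixed reduces to a small sketch (the enum and function here are hypothetical analogues of the parser code, not the real cp_parser_pragma):

```cpp
#include <cassert>

enum pragma_kind { PRAGMA_IVDEP, PRAGMA_UNROLL, PRAGMA_NOVECTOR };

// Several case labels share one block, but only the IVDEP path assigns
// `ivdep`.  Without the "= false" initializer, entering through the
// UNROLL or NOVECTOR label reads an indeterminate value.
bool parse_pragma (pragma_kind kind)
{
  bool ivdep = false;	// the fix: well-defined on every entry path
  switch (kind)
    {
    case PRAGMA_IVDEP:
      ivdep = true;
      /* FALLTHROUGH */
    case PRAGMA_UNROLL:
    case PRAGMA_NOVECTOR:
      return ivdep;
    }
  return false;
}
```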

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues

Committed as obvious.

Thanks,
Tamar

gcc/cp/ChangeLog:

* parser.cc (cp_parser_pragma): Initialize to false.

--- inline copy of patch -- 
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 
379aeb56b152b9b29606ba4d75ad4c49dfe92aac..1b4ce1497e893d6463350eecf5ef4e88957f5f00
 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -50625,7 +50625,7 @@ cp_parser_pragma (cp_parser *parser, enum 
pragma_context context, bool *if_p)
 case PRAGMA_UNROLL:
 case PRAGMA_NOVECTOR:
   {
-   bool ivdep;
+   bool ivdep = false;
tree unroll = NULL_TREE;
bool novector = false;
const char *pragma_str;









Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread Robin Dapp
Hi Juzhe,

> The reason we want to switch to generic vector cost model is the default
> cost model generates inferior codegen for various benchmarks.
> 
> For example, PR113247, we have performance bug that we end up having over 70%
> performance drop of SHA256.  Currently, no matter how we adapt cost model,
> we are not able to fix the performance bug since we always use default cost 
> model by default.
> 
> Also, tweak the generic cost model back to default cost model since we have 
> some FAILs in
> current tests.

So to recap:

 - Our current default tune model is rocket which does not have a vector
   cost model.  No other tune model except generic-ooo has one.

 - We want tune models with no vector cost model to fall back to the
   default vector cost model for now, later possibly the generic RVV
   cost model.

 - You're seeing inferior codegen for dynamic-lmul2-7.c with our generic
   RVV (not default) vector cost model (built with -mtune=generic-ooo?).

Therefore the suggestion is to start over fresh with the default
vector cost model?

>  /* Generic costs for VLA vector operations.  */
> @@ -374,13 +374,13 @@ static const scalable_vector_cost 
> generic_vla_vector_cost = {
>  1, /* fp_stmt_cost  */
>  1, /* gather_load_cost  */
>  1, /* scatter_store_cost  */
> -2, /* vec_to_scalar_cost  */
> +1, /* vec_to_scalar_cost  */
>  1, /* scalar_to_vec_cost  */
> -2, /* permute_cost  */
> +1, /* permute_cost  */
>  1, /* align_load_cost  */
>  1, /* align_store_cost  */
> -1, /* unalign_load_cost  */
> -1, /* unalign_store_cost  */
> +2, /* unalign_load_cost  */
> +2, /* unalign_store_cost  */
>},
>  };

So is the idea here to just revert the values to the defaults for now
and change them again soon?  And not to keep this as another default
and add others?

I'm a bit confused here :)  How does this help?  Can't we continue to
fall back to the default vector cost model when a tune model does not
specify a vector cost model?  If generic-ooo using the generic vector
cost model is the problem, then let's just change it to NULL for now?

I suppose at some point we will not want to fall back to the default
vector cost model anymore but always use the generic RVV cost model.
Once we reach the costing part we need to fall back to something
if nothing was defined and generic RVV is supposed to always be better 
than default.

Regards
 Robin



Re: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread 钟居哲
>> So is the idea here to just revert the values to the defaults for now
>> and change them again soon?  And not to keep this as another default
>> and add others?

My idea is to revert to the default for now. Then we can refine the costs gradually.

>> I'm a bit confused here :)  How does this help?  Can't we continue to
>> fall back to the default vector cost model when a tune model does not
>> specify a vector cost model?  If generic-ooo using the generic vector
>> cost model is the problem, then let's just change it to NULL for now?

If you still want to fall back to the default vector cost model,
could you tell me how to fix the XFAILs of the slp-*.c tests?



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-01-10 22:11
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model
Hi Juzhe,
 
> The reason we want to switch to generic vector cost model is the default
> cost model generates inferior codegen for various benchmarks.
> 
> For example, PR113247, we have performance bug that we end up having over 70%
> performance drop of SHA256.  Currently, no matter how we adapt cost model,
> we are not able to fix the performance bug since we always use default cost 
> model by default.
> 
> Also, tweak the generic cost model back to default cost model since we have 
> some FAILs in
> current tests.
 
So to recap:
 
- Our current default tune model is rocket which does not have a vector
   cost model.  No other tune model except generic-ooo has one.
 
- We want tune models with no vector cost model to fall back to the
   default vector cost model for now, later possibly the generic RVV
   cost model.
 
- You're seeing inferior codegen for dynamic-lmul2-7.c with our generic
   RVV (not default) vector cost model (built with -mtune=generic-ooo?).
 
Therefore the suggestion is to start over fresh with the default
vector cost model?
 
>  /* Generic costs for VLA vector operations.  */
> @@ -374,13 +374,13 @@ static const scalable_vector_cost 
> generic_vla_vector_cost = {
>  1, /* fp_stmt_cost  */
>  1, /* gather_load_cost  */
>  1, /* scatter_store_cost  */
> -2, /* vec_to_scalar_cost  */
> +1, /* vec_to_scalar_cost  */
>  1, /* scalar_to_vec_cost  */
> -2, /* permute_cost  */
> +1, /* permute_cost  */
>  1, /* align_load_cost  */
>  1, /* align_store_cost  */
> -1, /* unalign_load_cost  */
> -1, /* unalign_store_cost  */
> +2, /* unalign_load_cost  */
> +2, /* unalign_store_cost  */
>},
>  };
 
So is the idea here to just revert the values to the defaults for now
and change them again soon?  And not to keep this as another default
and add others?
 
I'm a bit confused here :)  How does this help?  Can't we continue to
fall back to the default vector cost model when a tune model does not
specify a vector cost model?  If generic-ooo using the generic vector
cost model is the problem, then let's just change it to NULL for now?
 
I suppose at some point we will not want to fall back to the default
vector cost model anymore but always use the generic RVV cost model.
Once we reach the costing part we need to fall back to something
if nothing was defined and generic RVV is supposed to always be better 
than default.
 
Regards
Robin
 
 


[PATCH]middle-end: correctly identify the edge taken when condition is true. [PR113287]

2024-01-10 Thread Tamar Christina
Hi All,

The vectorizer needs to know during early break vectorization whether the edge
that will be taken if the condition is true stays or leaves the loop.

This is because the code assumes that if you take the true branch you exit the
loop.  If you don't exit the loop it has to generate a different condition.

Basically it uses this information to decide whether it's generating a
"any element" or an "all element" check.
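In scalar terms, the two shapes of check look like this (a sketch of the semantics only, not the vectorizer's code):

```cpp
#include <cassert>
#include <cstddef>

// "exists"/"any element": leave the loop as soon as ANY element
// satisfies the condition -- vectorized as an OR-reduction of the
// comparison mask.
bool any_nonzero (const int *a, size_t n)
{
  bool acc = false;
  for (size_t i = 0; i < n; i++)
    acc = acc || (a[i] != 0);
  return acc;
}

// "forall"/"all element": the loop may only continue while ALL elements
// satisfy the condition -- vectorized as an AND-reduction, i.e. the
// inverted test.  Which form is needed depends on whether the true edge
// of the branch leaves the loop, which is what the patch now determines
// from the edge flags instead of BRANCH_EDGE.
bool all_nonzero (const int *a, size_t n)
{
  bool acc = true;
  for (size_t i = 0; i < n; i++)
    acc = acc && (a[i] != 0);
  return acc;
}
```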

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues with --enable-lto --with-build-config=bootstrap-O3
--enable-checking=release,yes,rtl,extra.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/113287
* tree-vect-stmts.cc (vectorizable_early_exit): Check the flags on edge
instead of using BRANCH_EDGE to determine true edge.

gcc/testsuite/ChangeLog:

PR tree-optimization/113287
* gcc.dg/vect/vect-early-break_100-pr113287.c: New test.
* gcc.dg/vect/vect-early-break_99-pr113287.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
new file mode 100644
index 
..f908e5bc60779c148dc95bda3e200383d12b9e1e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
@@ -0,0 +1,35 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target bitint } */
+
+__attribute__((noipa)) void
+bar (unsigned long *p)
+{
+  __builtin_memset (p, 0, 142 * sizeof (unsigned long));
+  p[17] = 0x500UL;
+}
+
+__attribute__((noipa)) int
+foo (void)
+{
+  unsigned long r[142];
+  bar (r);
+  unsigned long v = ((long) r[0] >> 31);
+  if (v + 1 > 1)
+return 1;
+  for (unsigned long i = 1; i <= 140; ++i)
+if (r[i] != v)
+  return 1;
+  unsigned long w = r[141];
+  if ((unsigned long) (((long) (w << 60)) >> 60) != v)
+return 1;
+  return 0;
+}
+
+int
+main ()
+{
+  if (foo () != 1)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
new file mode 100644
index 
..b92a8a268d803ab1656b4716b1a319ed4edc87a3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
@@ -0,0 +1,32 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target bitint } */
+
+_BitInt(998) b;
+char c;
+char d;
+char e;
+char f;
+char g;
+char h;
+char i;
+char j;
+
+void
+foo(char y, _BitInt(9020) a, char *r)
+{
+  char x = __builtin_mul_overflow_p(a << sizeof(a), y, 0);
+  x += c + d + e + f + g + h + i + j + b;
+  *r = x;
+}
+
+int
+main(void)
+{
+  char x;
+  foo(5, 5, &x);
+  if (x != 1)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
1333d8934783acdb5277e3a03c2b4021fec4777b..da004b0e9e2696cd2ce358d3b221851c7b60b448
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12870,13 +12870,18 @@ vectorizable_early_exit (vec_info *vinfo, 
stmt_vec_info stmt_info,
  rewrite conditions to always be a comparison against 0.  To do this it
  sometimes flips the edges.  This is fine for scalar,  but for vector we
  then have to flip the test, as we're still assuming that if you take the
- branch edge that we found the exit condition.  */
+ branch edge that we found the exit condition.  i.e. we need to know 
whether
+ we are generating a `forall` or an `exist` condition.  */
   auto new_code = NE_EXPR;
   auto reduc_optab = ior_optab;
   auto reduc_op = BIT_IOR_EXPR;
   tree cst = build_zero_cst (vectype);
+  edge exit_true_edge = EDGE_SUCC (gimple_bb (cond_stmt), 0);
+  if (exit_true_edge->flags & EDGE_FALSE_VALUE)
+exit_true_edge = EDGE_SUCC (gimple_bb (cond_stmt), 1);
+  gcc_assert (exit_true_edge->flags & EDGE_TRUE_VALUE);
   if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
-BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
+exit_true_edge->dest))
 {
   new_code = EQ_EXPR;
   reduc_optab = and_optab;





[PATCH] tree-optimization/113078 - conditional subtraction reduction vectorization

2024-01-10 Thread Richard Biener
When if-conversion was changed to use .COND_ADD/SUB for conditional
reduction it was forgotten to update reduction path handling to
canonicalize .COND_SUB to .COND_ADD for vectorizable_reduction
similar to what we do for MINUS_EXPR.  The following adds this
and testcases exercising this at runtime and looking for the
appropriate masked subtraction in the vectorized code on x86.
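The canonicalization rests on the identity that a conditional subtraction is a conditional addition of the negated operand. In scalar terms, using the data from the new runtime testcase (a sketch of the equivalence, not the vectorizer's internal representation):

```cpp
#include <cassert>

// .COND_SUB shape: matches the reduction in the runtime testcase.
int sum_cond_sub (int n, const int *p, const int *pi)
{
  int sum = 0;
  for (int i = 0; i != n; i++)
    if (pi[i] > 0)
      sum -= p[i];
  return sum;
}

// Equivalent .COND_ADD shape after canonicalization: the same reduction
// expressed as a conditional add of the negated value, tracking the
// negation -- the form check_reduction_path now recognizes.
int sum_cond_add_neg (int n, const int *p, const int *pi)
{
  int sum = 0;
  for (int i = 0; i != n; i++)
    if (pi[i] > 0)
      sum += -p[i];
  return sum;
}
```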

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/113078
* tree-vect-loop.cc (check_reduction_path): Canonicalize
.COND_SUB to .COND_ADD.

* gcc.dg/vect/vect-reduc-cond-sub.c: New testcase.
* gcc.target/i386/vect-pr113078.c: Likewise.
---
 .../gcc.dg/vect/vect-reduc-cond-sub.c | 29 +++
 gcc/testsuite/gcc.target/i386/vect-pr113078.c | 16 ++
 gcc/tree-vect-loop.cc |  7 +
 3 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-cond-sub.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr113078.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-cond-sub.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-cond-sub.c
new file mode 100644
index 000..0213a0ab4fd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-cond-sub.c
@@ -0,0 +1,29 @@
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+
+int __attribute__((noipa))
+foo (int n, int* p, int* pi)
+{
+  int sum = 0;
+  for (int i = 0; i != n; i++)
+{
+  if (pi[i] > 0)
+sum -= p[i];
+}
+  return sum;
+}
+
+int p[16] __attribute__((aligned(__BIGGEST_ALIGNMENT__)))
+  = { 7, 3, 1, 4, 9, 10, 14, 7, -10, -55, 20, 9, 1, 2, 0, -17 };
+int pi[16] __attribute__((aligned(__BIGGEST_ALIGNMENT__)))
+  = { 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1 };
+int
+main()
+{
+  check_vect ();
+
+  if (foo (16, p, pi) != 57)
+abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/vect-pr113078.c 
b/gcc/testsuite/gcc.target/i386/vect-pr113078.c
new file mode 100644
index 000..e7666054324
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-pr113078.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512vl" } */
+
+int
+foo (int n, int* p, int* pi)
+{
+  int sum = 0;
+  for (int i = 0; i != n; i++)
+{
+  if (pi[i] > 0)
+   sum -= p[i];
+}
+  return sum;
+}
+
+/* { dg-final { scan-assembler-times "vpsub\[^\r\n\]*%k" 2 } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c5b2799be23..1bdad0fbe0f 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4116,6 +4116,13 @@ pop:
  if (op.ops[1] == op.ops[opi])
neg = ! neg;
}
+  else if (op.code == IFN_COND_SUB)
+   {
+ op.code = IFN_COND_ADD;
+ /* Track whether we negate the reduction value each iteration.  */
+ if (op.ops[2] == op.ops[opi])
+   neg = ! neg;
+   }
   if (CONVERT_EXPR_CODE_P (op.code)
  && tree_nop_conversion_p (op.type, TREE_TYPE (op.ops[0])))
;
-- 
2.35.3


Re: [PATCH]middle-end: correctly identify the edge taken when condition is true. [PR113287]

2024-01-10 Thread Richard Biener
On Wed, 10 Jan 2024, Tamar Christina wrote:

> Hi All,
> 
> The vectorizer needs to know during early break vectorization whether the edge
> that will be taken if the condition is true stays or leaves the loop.
> 
> This is because the code assumes that if you take the true branch you exit the
> loop.  If you don't exit the loop it has to generate a different condition.
> 
> Basically it uses this information to decide whether it's generating a
> "any element" or an "all element" check.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues with --enable-lto --with-build-config=bootstrap-O3
> --enable-checking=release,yes,rtl,extra.
> 
> Ok for master?

OK.

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/113287
>   * tree-vect-stmts.cc (vectorizable_early_exit): Check the flags on edge
>   instead of using BRANCH_EDGE to determine true edge.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/113287
>   * gcc.dg/vect/vect-early-break_100-pr113287.c: New test.
>   * gcc.dg/vect/vect-early-break_99-pr113287.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c 
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
> new file mode 100644
> index 
> ..f908e5bc60779c148dc95bda3e200383d12b9e1e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
> @@ -0,0 +1,35 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target bitint } */
> +
> +__attribute__((noipa)) void
> +bar (unsigned long *p)
> +{
> +  __builtin_memset (p, 0, 142 * sizeof (unsigned long));
> +  p[17] = 0x500UL;
> +}
> +
> +__attribute__((noipa)) int
> +foo (void)
> +{
> +  unsigned long r[142];
> +  bar (r);
> +  unsigned long v = ((long) r[0] >> 31);
> +  if (v + 1 > 1)
> +return 1;
> +  for (unsigned long i = 1; i <= 140; ++i)
> +if (r[i] != v)
> +  return 1;
> +  unsigned long w = r[141];
> +  if ((unsigned long) (((long) (w << 60)) >> 60) != v)
> +return 1;
> +  return 0;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo () != 1)
> +__builtin_abort ();
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c 
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
> new file mode 100644
> index 
> ..b92a8a268d803ab1656b4716b1a319ed4edc87a3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
> @@ -0,0 +1,32 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target bitint } */
> +
> +_BitInt(998) b;
> +char c;
> +char d;
> +char e;
> +char f;
> +char g;
> +char h;
> +char i;
> +char j;
> +
> +void
> +foo(char y, _BitInt(9020) a, char *r)
> +{
> +  char x = __builtin_mul_overflow_p(a << sizeof(a), y, 0);
> +  x += c + d + e + f + g + h + i + j + b;
> +  *r = x;
> +}
> +
> +int
> +main(void)
> +{
> +  char x;
> +  foo(5, 5, &x);
> +  if (x != 1)
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 
> 1333d8934783acdb5277e3a03c2b4021fec4777b..da004b0e9e2696cd2ce358d3b221851c7b60b448
>  100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12870,13 +12870,18 @@ vectorizable_early_exit (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>   rewrite conditions to always be a comparison against 0.  To do this it
>   sometimes flips the edges.  This is fine for scalar,  but for vector we
>   then have to flip the test, as we're still assuming that if you take the
> - branch edge that we found the exit condition.  */
> + branch edge that we found the exit condition.  i.e. we need to know 
> whether
> + we are generating a `forall` or an `exist` condition.  */
>auto new_code = NE_EXPR;
>auto reduc_optab = ior_optab;
>auto reduc_op = BIT_IOR_EXPR;
>tree cst = build_zero_cst (vectype);
> +  edge exit_true_edge = EDGE_SUCC (gimple_bb (cond_stmt), 0);
> +  if (exit_true_edge->flags & EDGE_FALSE_VALUE)
> +exit_true_edge = EDGE_SUCC (gimple_bb (cond_stmt), 1);
> +  gcc_assert (exit_true_edge->flags & EDGE_TRUE_VALUE);
>if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> -  BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
> +  exit_true_edge->dest))
>  {
>new_code = EQ_EXPR;
>reduc_optab = and_optab;
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] libgccjit: Fix ira cost segfault

2024-01-10 Thread David Malcolm
On Thu, 2023-11-16 at 17:28 -0500, Antoni Boucher wrote:
> Hi.
> This patch fixes a segfault that happens when compiling librsvg (more
> specifically its dependency aho-corasick) with rustc_codegen_gcc (bug
> 112575).
> I was not able to create a reproducer for this bug: I'm assuming I
> might need to concat all the reproducers together in the same file in
> order to be able to reproduce the issue.

Hi Antoni

Thanks for the patch; sorry for missing it before.

CCing the i386 maintainers; quoting the patch here to give them
context:

> From e0f4f51682266bc9f507afdb64908ed3695a2f5e Mon Sep 17 00:00:00 2001
> From: Antoni Boucher 
> Date: Thu, 2 Nov 2023 17:18:35 -0400
> Subject: [PATCH] libgccjit: Fix ira cost segfault
> 
> gcc/ChangeLog:
>   PR jit/112575
>   * config/i386/i386-options.cc (ix86_option_override_internal):
>   Cleanup target_attribute_cache.
> ---
>  gcc/config/i386/i386-options.cc | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index df7d24352d1..f596c0fb53c 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -3070,6 +3070,12 @@ ix86_option_override_internal (bool main_args_p,
>   = opts->x_flag_unsafe_math_optimizations;
>target_option_default_node = target_option_current_node
>  = build_target_option_node (opts, opts_set);
> +  /* TODO: check if this is the correct location.  It should probably be 
> in
> +  some finalizer function, but I don't
> +  know if there's one.  */
> +  target_attribute_cache[0] = NULL;
> +  target_attribute_cache[1] = NULL;
> +  target_attribute_cache[2] = NULL;
>  }
>  
>if (opts->x_flag_cf_protection != CF_NONE)
> -- 
> 2.42.1
> 

Presumably this happens when there's more than one in-process
invocation of the compiler code (via libgccjit).

> 
> I'm also not sure I put the cleanup in the correct location.
> Is there any finalizer function for target specific code?

As you know (but the i386 maintainers might not), to allow multiple in-
process invocations of the compiler code (for libgccjit) we've been
putting code to reset global state in various {filename_cc}_finalize
functions called from toplev::finalize (see the end of toplev.cc).

There doesn't seem to be any kind of hook at this time for calling
target-specific cleanups from toplev::finalize.

However, as of r14-4003-geaa8e8541349df ggc_common_finalize zeroes
everything marked with GTY.  The array target_attribute_cache does have
a GTY marking, so perhaps as of that commit this patch isn't necessary?

Otherwise, if special-casing this is required, sorry: I'm not familiar
enough with i386-options.cc to know if the patch is correct.

> 
> Thanks to fix this issue.

Dave



[PATCH] middle-end/112740 - vector boolean CTOR expansion issue

2024-01-10 Thread Richard Biener
The optimization to expand uniform boolean vectors by sign-extension
works only for dense masks but it failed to check that.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, I've
checked aarch64 RTL expansion for the testcase.  Will push tomorrow.

Richard.

PR middle-end/112740
* expr.cc (store_constructor): Check the integer vector
mask has a single bit per element before using sign-extension
to expand an uniform vector.

* gcc.dg/pr112740.c: New testcase.
---
 gcc/expr.cc |  8 +---
 gcc/testsuite/gcc.dg/pr112740.c | 19 +++
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr112740.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index dc816bc20fa..0bf80832fe5 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7841,10 +7841,12 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
break;
  }
/* Use sign-extension for uniform boolean vectors with
-  integer modes.  Effectively "vec_duplicate" for bitmasks.  */
-   if (!TREE_SIDE_EFFECTS (exp)
+  integer modes and single-bit mask entries.
+  Effectively "vec_duplicate" for bitmasks.  */
+   if (elt_size == 1
+   && !TREE_SIDE_EFFECTS (exp)
&& VECTOR_BOOLEAN_TYPE_P (type)
-   && SCALAR_INT_MODE_P (mode)
+   && SCALAR_INT_MODE_P (TYPE_MODE (type))
&& (elt = uniform_vector_p (exp))
&& !VECTOR_TYPE_P (TREE_TYPE (elt)))
  {
diff --git a/gcc/testsuite/gcc.dg/pr112740.c b/gcc/testsuite/gcc.dg/pr112740.c
new file mode 100644
index 000..8250cafd2ff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr112740.c
@@ -0,0 +1,19 @@
+/* { dg-do run { target { int128 } } } */
+/* { dg-options "" } */
+
+typedef unsigned __int128 __attribute__((__vector_size__ (16))) V;
+
+V
+foo (unsigned c, V v)
+{
+  return (V) (c <= v) == 0;
+}
+
+int
+main (void)
+{
+  V x = foo (0, (V) { });
+  if (x[0])
+__builtin_abort ();
+  return 0;
+}
-- 
2.35.3


Re: [PATCH] libgccjit: Fix ira cost segfault

2024-01-10 Thread David Malcolm
On Wed, 2024-01-10 at 09:30 -0500, David Malcolm wrote:
> On Thu, 2023-11-16 at 17:28 -0500, Antoni Boucher wrote:
> > Hi.
> > This patch fixes a segfault that happens when compiling librsvg
> > (more
> > specifically its dependency aho-corasick) with rustc_codegen_gcc
> > (bug
> > 112575).
> > I was not able to create a reproducer for this bug: I'm assuming I
> > might need to concat all the reproducers together in the same file
> > in
> > order to be able to reproduce the issue.
> 
> Hi Antoni
> 
> Thanks for the patch; sorry for missing it before.
> 
> CCing the i386 maintainers; quoting the patch here to give them
> context:

Oops; actually adding them to the CC this time; sorry.

> 
> > From e0f4f51682266bc9f507afdb64908ed3695a2f5e Mon Sep 17 00:00:00
> > 2001
> > From: Antoni Boucher 
> > Date: Thu, 2 Nov 2023 17:18:35 -0400
> > Subject: [PATCH] libgccjit: Fix ira cost segfault
> > 
> > gcc/ChangeLog:
> > PR jit/112575
> > * config/i386/i386-options.cc
> > (ix86_option_override_internal):
> > Cleanup target_attribute_cache.
> > ---
> >  gcc/config/i386/i386-options.cc | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/gcc/config/i386/i386-options.cc
> > b/gcc/config/i386/i386-options.cc
> > index df7d24352d1..f596c0fb53c 100644
> > --- a/gcc/config/i386/i386-options.cc
> > +++ b/gcc/config/i386/i386-options.cc
> > @@ -3070,6 +3070,12 @@ ix86_option_override_internal (bool
> > main_args_p,
> > = opts->x_flag_unsafe_math_optimizations;
> >    target_option_default_node = target_option_current_node
> >  = build_target_option_node (opts, opts_set);
> > +  /* TODO: check if this is the correct location.  It should probably be in
> > +    some finalizer function, but I don't
> > +    know if there's one.  */
> > +  target_attribute_cache[0] = NULL;
> > +  target_attribute_cache[1] = NULL;
> > +  target_attribute_cache[2] = NULL;
> >  }
> >  
> >    if (opts->x_flag_cf_protection != CF_NONE)
> > -- 
> > 2.42.1
> > 
> 
> Presumably this happens when there's more than one in-process
> invocation of the compiler code (via libgccjit).
> 
> > 
> > I'm also not sure I put the cleanup in the correct location.
> > Is there any finalizer function for target specific code?
> 
> As you know (but the i386 maintainers might not), to allow multiple
> in-
> process invocations of the compiler code (for libgccjit) we've been
> putting code to reset global state in various {filename_cc}_finalize
> functions called from toplev::finalize (see the end of toplev.cc).
> 
> There doesn't seem to be any kind of hook at this time for calling
> target-specific cleanups from toplev::finalize.
> 
> However, as of r14-4003-geaa8e8541349df ggc_common_finalize zeroes
> everything marked with GTY.  The array target_attribute_cache does
> have
> a GTY marking, so perhaps as of that commit this patch isn't
> necessary?
> 
> Otherwise, if special-casing this is required, sorry: I'm not
> familiar
> enough with i386-options.cc to know if the patch is correct.
> 
> > 
> > Thanks to fix this issue.
> 
> Dave



Re: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread 钟居哲
I need to add these costs for segment load/stores:

/* Generic costs for VLA vector operations.  */
static const scalable_vector_cost generic_vla_vector_cost = {
  {
1, /* int_stmt_cost  */
1, /* fp_stmt_cost  */
1, /* gather_load_cost  */
1, /* scatter_store_cost  */
1, /* vec_to_scalar_cost  */
1, /* scalar_to_vec_cost  */
1, /* permute_cost  */
1, /* align_load_cost  */
1, /* align_store_cost  */
2, /* unalign_load_cost  */
2, /* unalign_store_cost  */
  },
  2, /* vlseg2_vsseg2_permute_cost  */
  2, /* vlseg3_vsseg3_permute_cost  */
  3, /* vlseg4_vsseg4_permute_cost  */
  3, /* vlseg5_vsseg5_permute_cost  */
  4, /* vlseg6_vsseg6_permute_cost  */
  4, /* vlseg7_vsseg7_permute_cost  */
  4, /* vlseg8_vsseg8_permute_cost  */
};

to fix the SLP issues in the following patches.

If you don't allow me to switch to the generic vector cost model and tune it,
how can I fix the FAILs of the slp-*.c cases?

Currently, I let all the slp-*.c tests XFAIL, which is definitely incorrect.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-01-10 22:11
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model
Hi Juzhe,
 
> The reason we want to switch to generic vector cost model is the default
> cost model generates inferior codegen for various benchmarks.
> 
> For example, PR113247, we have performance bug that we end up having over 70%
> performance drop of SHA256.  Currently, no matter how we adapt cost model,
> we are not able to fix the performance bug since we always use default cost 
> model by default.
> 
> Also, tweak the generic cost model back to default cost model since we have 
> some FAILs in
> current tests.
 
So to recap:
 
- Our current default tune model is rocket which does not have a vector
   cost model.  No other tune model except generic-ooo has one.
 
- We want tune models with no vector cost model to fall back to the
   default vector cost model for now, later possibly the generic RVV
   cost model.
 
- You're seeing inferior codegen for dynamic-lmul2-7.c with our generic
   RVV (not default) vector cost model (built with -mtune=generic-ooo?).
 
Therefore the suggestion is to start over freshly with the default
vector cost model?
 
>  /* Generic costs for VLA vector operations.  */
> @@ -374,13 +374,13 @@ static const scalable_vector_cost generic_vla_vector_cost = {
>  1, /* fp_stmt_cost  */
>  1, /* gather_load_cost  */
>  1, /* scatter_store_cost  */
> -2, /* vec_to_scalar_cost  */
> +1, /* vec_to_scalar_cost  */
>  1, /* scalar_to_vec_cost  */
> -2, /* permute_cost  */
> +1, /* permute_cost  */
>  1, /* align_load_cost  */
>  1, /* align_store_cost  */
> -1, /* unalign_load_cost  */
> -1, /* unalign_store_cost  */
> +2, /* unalign_load_cost  */
> +2, /* unalign_store_cost  */
>},
>  };
 
So is the idea here to just revert the values to the defaults for now
and change them again soon?  And not to keep this as another default
and add others?
 
I'm a bit confused here :)  How does this help?  Can't we continue to
fall back to the default vector cost model when a tune model does not
specify a vector cost model?  If generic-ooo using the generic vector
cost model is the problem, then let's just change it to NULL for now?
 
I suppose at some point we will not want to fall back to the default
vector cost model anymore but always use the generic RVV cost model.
Once we reach the costing part we need to fall back to something
if nothing was defined and generic RVV is supposed to always be better 
than default.
 
Regards
Robin
 
 


Re: [PATCH]middle-end: correctly identify the edge taken when condition is true. [PR113287]

2024-01-10 Thread Jakub Jelinek
Hi!

Thanks for fixing it, just testsuite nits.

On Wed, Jan 10, 2024 at 03:22:53PM +0100, Richard Biener wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
> > @@ -0,0 +1,35 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-require-effective-target bitint } */

This test doesn't need bitint effective target.
But relies on long being 64-bit, otherwise e.g.
0x500UL doesn't need to fit or shifting it by 60 is invalid.
So, maybe use lp64 effective target instead.

> > +
> > +__attribute__((noipa)) void
> > +bar (unsigned long *p)
> > +{
> > +  __builtin_memset (p, 0, 142 * sizeof (unsigned long));
> > +  p[17] = 0x500UL;
> > +}
> > +
> > +  if ((unsigned long) (((long) (w << 60)) >> 60) != v)

> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
> > @@ -0,0 +1,32 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-require-effective-target bitint } */

bitint effective target just ensures there is some _BitInt support,
but not necessarily 998 or 9020 bit ones.

There are some specific precision bitint effective targets (e.g. bitint575),
what I generally use though is just guard the stuff on __BITINT_MAXWIDTH__ >= NNN.
Perhaps for this testcase you could
#if __BITINT_MAXWIDTH__ >= 998
typedef _BitInt(998) B998;
#else
typedef long long B998;
#endif
#if __BITINT_MAXWIDTH__ >= 9020
typedef _BitInt(9020) B9020;
#else
typedef long long B9020;
#endif
and use the new typedefs in the rest of the test.

Jakub



RE: [PATCH]middle-end: correctly identify the edge taken when condition is true. [PR113287]

2024-01-10 Thread Tamar Christina
> -Original Message-
> From: Jakub Jelinek 
> Sent: Wednesday, January 10, 2024 2:42 PM
> To: Tamar Christina ; Richard Biener
> 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH]middle-end: correctly identify the edge taken when condition is true. [PR113287]
> 
> Hi!
> 
> Thanks for fixing it, just testsuite nits.
> 
> On Wed, Jan 10, 2024 at 03:22:53PM +0100, Richard Biener wrote:
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
> > > @@ -0,0 +1,35 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-require-effective-target bitint } */
> 
> This test doesn't need bitint effective target.
> But relies on long being 64-bit, otherwise e.g.
> 0x500UL doesn't need to fit or shifting it by 60 is invalid.
> So, maybe use lp64 effective target instead.

I was thinking about it. Would using effective-target longlong and
changing the constant to ULL instead work?

Thanks,
Tamar


[PATCH] aarch64: Make ldp/stp pass off by default

2024-01-10 Thread Alex Coplan
As discussed on IRC, this makes the aarch64 ldp/stp pass off by default.  This
should stabilize the trunk and give some time to address the P1 regressions.

Sorry for the breakage.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Alex

gcc/ChangeLog:

* config/aarch64/aarch64.opt (-mearly-ldp-fusion): Set default
to 0.
(-mlate-ldp-fusion): Likewise.
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index ceed5cdb201..c495cb34fbf 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -290,12 +290,12 @@ Target Var(aarch64_track_speculation)
 Generate code to track when the CPU might be speculating incorrectly.
 
 mearly-ldp-fusion
-Target Var(flag_aarch64_early_ldp_fusion) Optimization Init(1)
+Target Var(flag_aarch64_early_ldp_fusion) Optimization Init(0)
 Enable the copy of the AArch64 load/store pair fusion pass that runs before
 register allocation.
 
 mlate-ldp-fusion
-Target Var(flag_aarch64_late_ldp_fusion) Optimization Init(1)
+Target Var(flag_aarch64_late_ldp_fusion) Optimization Init(0)
 Enable the copy of the AArch64 load/store pair fusion pass that runs after
 register allocation.
 


Re: [PATCH] aarch64: Make ldp/stp pass off by default

2024-01-10 Thread Richard Sandiford
Alex Coplan  writes:
> As discussed on IRC, this makes the aarch64 ldp/stp pass off by default.  This
> should stabilize the trunk and give some time to address the P1 regressions.
>
> Sorry for the breakage.
>
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
>
> Alex
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.opt (-mearly-ldp-fusion): Set default
>   to 0.
>   (-mlate-ldp-fusion): Likewise.

OK.  Thanks for doing this, and for working through the PRs.

Richard

> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index ceed5cdb201..c495cb34fbf 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -290,12 +290,12 @@ Target Var(aarch64_track_speculation)
>  Generate code to track when the CPU might be speculating incorrectly.
>  
>  mearly-ldp-fusion
> -Target Var(flag_aarch64_early_ldp_fusion) Optimization Init(1)
> +Target Var(flag_aarch64_early_ldp_fusion) Optimization Init(0)
>  Enable the copy of the AArch64 load/store pair fusion pass that runs before
>  register allocation.
>  
>  mlate-ldp-fusion
> -Target Var(flag_aarch64_late_ldp_fusion) Optimization Init(1)
> +Target Var(flag_aarch64_late_ldp_fusion) Optimization Init(0)
>  Enable the copy of the AArch64 load/store pair fusion pass that runs after
>  register allocation.
>  


Re: [PATCH] libgccjit Fix a RTL bug for libgccjit

2024-01-10 Thread David Malcolm
On Mon, 2023-12-11 at 09:06 -0700, Jeff Law wrote:
> 
> 
> On 11/20/23 16:54, David Malcolm wrote:
> > On Mon, 2023-11-20 at 16:38 -0700, Jeff Law wrote:
> > > 
> > > 
> > > On 11/20/23 15:46, David Malcolm wrote:
> > > > On Fri, 2023-11-17 at 14:09 -0700, Jeff Law wrote:
> > > > > 
> > > > > 
> > > > > On 11/17/23 14:08, Antoni Boucher wrote:
> > > > > > In contrast with the other frontends, libgccjit can be
> > > > > > executed
> > > > > > multiple times in a row in the same process.
> > > > > Yup.  I'm aware of that.  Even so calling init_emit_once more
> > > > > than
> > > > > one
> > > > > time still seems wrong.
> > > > 
> > > > There are two approaches we follow when dealing with state
> > > > stored
> > > > in
> > > > global variables:
> > > > (a) clean it all up via the various functions called from
> > > > toplev::finalize
> > > > (b) make it effectively constant once initialized, with
> > > > idempotent
> > > > initialization
> > > > 
> > > > The multiple in-process executions of libgccjit could pass in
> > > > different
> > > > code-generation options.  Does the RTL-initialization logic
> > > > depend
> > > > anywhere on flags passed in, because if so, we're probably
> > > > going to
> > > > need to re-run the initialization.
> > > The INIT_EXPANDERS code would be the most concerning as it's
> > > implementation is totally hidden and provided by the target. I
> > > wouldn't
> > > be at all surprised if one or more do something evil in there. 
> > > That
> > > probably needs to be evaluated on a target by target basis.
> > > 
> > > The rest really do look like single init, even in a JIT
> > > environment
> > > kinds of things -- ie all the shared constants in RTL.
> > 
> > I think Antoni's patch can we described as implementing "single
> > init",
> > in that it ensures that at least part of init_emit_once is single
> > init.
> > 
> > Is the posted patch OK by you, or do we need to rework things, and
> > if
> > the latter, what would be the goal?
> What I'm struggling with is perhaps a problem of naming.  Conceptually
> "init_emit_once" in my mind should be called once and only once.  If I
> read Antoni's change correctly, we call it more than once.

I'm afraid we're already doing that, Antoni's proposed patch doesn't
change that.

In toplev::finalize we try to clean up as much global state as possible
to allow toplev::main to be runnable again.  From that point of view
"once" could mean "once within an invocation of toplev::main" (if that
makes it feel any less gross).

>   That just feels conceptually wrong -- add to it the opaqueness of
> INIT_EXPANDERS and it feels even more wrong -- we don't know what's
> going on behind the scenes in there.

Given these various concerns, I think we should go with approach (a)
from above: add an emit_rtl_cc::finalizer function, and have it reset
all of the globals in emit-rtl.cc.  That seems like the most clear way
to handle this awkward situation.

Dave



Re: [PATCH]middle-end: correctly identify the edge taken when condition is true. [PR113287]

2024-01-10 Thread Jakub Jelinek
On Wed, Jan 10, 2024 at 02:45:41PM +, Tamar Christina wrote:
> > -Original Message-
> > From: Jakub Jelinek 
> > Sent: Wednesday, January 10, 2024 2:42 PM
> > To: Tamar Christina ; Richard Biener
> > 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: Re: [PATCH]middle-end: correctly identify the edge taken when condition is true. [PR113287]
> > 
> > Hi!
> > 
> > Thanks for fixing it, just testsuite nits.
> > 
> > On Wed, Jan 10, 2024 at 03:22:53PM +0100, Richard Biener wrote:
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
> > > > @@ -0,0 +1,35 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-require-effective-target bitint } */
> > 
> > This test doesn't need bitint effective target.
> > But relies on long being 64-bit, otherwise e.g.
> > 0x500UL doesn't need to fit or shifting it by 60 is invalid.
> > So, maybe use lp64 effective target instead.
> 
> I was thinking about it. Would using effective-target longlong and
> changing the constant to ULL instead work?

You mean vect_long_long ?  Sure, if you change all the longs in the
test to long longs and UL to ULL...

Jakub



Re: [PATCH V2] RISC-V: Minor tweak dynamic cost model

2024-01-10 Thread Robin Dapp
LGTM.

Regards
 Robin


Re: [PATCH] aarch64: Make ldp/stp pass off by default

2024-01-10 Thread Jakub Jelinek
On Wed, Jan 10, 2024 at 02:47:29PM +, Richard Sandiford wrote:
> Alex Coplan  writes:
> > As discussed on IRC, this makes the aarch64 ldp/stp pass off by default.  This
> > should stabilize the trunk and give some time to address the P1 regressions.
> >
> > Sorry for the breakage.
> >
> > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> >
> > Alex
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.opt (-mearly-ldp-fusion): Set default
> > to 0.
> > (-mlate-ldp-fusion): Likewise.
> 
> OK.  Thanks for doing this, and for working through the PRs.

Thanks.  I'll repeat what I said on IRC: it is ok to re-enable it, say, within
2-3 weeks if the fusion-related P1s are addressed by then (or even earlier if
they are addressed sooner).

Jakub



Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread Robin Dapp
On 1/10/24 15:40, 钟居哲 wrote:
> I need to add these costs for segment load/stores:
> 
> /* Generic costs for VLA vector operations.  */
> static const scalable_vector_cost generic_vla_vector_cost = {
>   {
>     1,/* int_stmt_cost  */
>     1,/* fp_stmt_cost  */
>     1,/* gather_load_cost  */
>     1,/* scatter_store_cost  */
>     1,/* vec_to_scalar_cost  */
>     1,/* scalar_to_vec_cost  */
>     1,/* permute_cost  */
>     1,/* align_load_cost  */
>     1,/* align_store_cost  */
>     2,/* unalign_load_cost  */
>     2,/* unalign_store_cost  */
>   },
>   2,/* vlseg2_vsseg2_permute_cost  */
>   2,/* vlseg3_vsseg3_permute_cost  */
>   3,/* vlseg4_vsseg4_permute_cost  */
>   3,/* vlseg5_vsseg5_permute_cost  */
>   4,/* vlseg6_vsseg6_permute_cost  */
>   4,/* vlseg7_vsseg7_permute_cost  */
>   4,/* vlseg8_vsseg8_permute_cost  */
> };
> 
> to fix the SLP issues in the following patches.
> 
> If you don't allow me to switch to generic vector cost model and tune it.
> How can I fix the FAILs of slp-*.c cases ?
> 
> Currently, I let all the slp-*.c tests XFAIL, which is definitely incorrect.

Of course we don't want those XFAILs.  It's not a matter of "allowing"
or not but rather that I'd like to understand the reasoning.  The patch
itself seems reasonable to me apart from not really getting the
intention.

Your main point seems to be

> +  const cpu_vector_cost *costs = tune_param->vec_costs;
> +  if (!costs)
> +return &generic_vector_cost
and that is fine.  What's not clear is whether changing the actual
costs is a temporary thing or whether it is supposed to be another
fallback.  If they are going to be changed anyway, why do we need
to revert to the default model now?  As discussed yesterday
increased permute costs and vec_to_scalar costs make sense, to first
order.  Is that because of dynamic-lmul2-7.c?

Generally we need to make the costs dependent on the
type or mode of course, just as we started to do with the latencies.
Permute is particularly sensitive as you already gathered.

Regards
 Robin



Re: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread 钟居哲
Current generic cost model makes dynamic-lmul2-7.c generate inferior codegen.

I found if I tweak the cost a little bit then dynamic-lmul2-7.c codegen can be 
recovered.
However, it makes other tests fail.
It's a complicated story.

So, I'd rather set it as the default cost model and switch to it.
Then we can tune the costs gradually: not only can we fix the issues we faced
(e.g. SHA256), but also, no matter how we tweak the costs later, it won't hurt
the codegen of the current tests.

It's true that we could keep the current default_builtin_vectorization_cost
model and tweak the generic cost model, for example by adding a testcase for
SHA256 and adding -mtune=generic-ooo to test it.
But the question is: how do you know whether there is a regression on the
current testsuite with -mtune=generic-ooo?

Note that we can tweak the generic vector cost model to fix the SHA256 issue
easily, but we should also make sure we don't introduce regressions on the
current testsuite with the new cost model.  So I switched the cost model.




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-01-10 23:04
To: 钟居哲; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; Jeff Law
Subject: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model
On 1/10/24 15:40, 钟居哲 wrote:
> I need to add these costs for segment load/stores:
> 
> /* Generic costs for VLA vector operations.  */
> static const scalable_vector_cost generic_vla_vector_cost = {
>   {
> 1,/* int_stmt_cost  */
> 1,/* fp_stmt_cost  */
> 1,/* gather_load_cost  */
> 1,/* scatter_store_cost  */
> 1,/* vec_to_scalar_cost  */
> 1,/* scalar_to_vec_cost  */
> 1,/* permute_cost  */
> 1,/* align_load_cost  */
> 1,/* align_store_cost  */
> 2,/* unalign_load_cost  */
> 2,/* unalign_store_cost  */
>   },
>   2,/* vlseg2_vsseg2_permute_cost  */
>   2,/* vlseg3_vsseg3_permute_cost  */
>   3,/* vlseg4_vsseg4_permute_cost  */
>   3,/* vlseg5_vsseg5_permute_cost  */
>   4,/* vlseg6_vsseg6_permute_cost  */
>   4,/* vlseg7_vsseg7_permute_cost  */
>   4,/* vlseg8_vsseg8_permute_cost  */
> };
> 
> to fix the SLP issues in the following patches.
> 
> If you don't allow me to switch to generic vector cost model and tune it.
> How can I fix the FAILs of slp-*.c cases ?
> 
> Currently, I let all the slp-*.c tests XFAIL, which is definitely incorrect.
 
Of course we don't want those XFAILs.  It's not a matter of "allowing"
or not but rather that I'd like to understand the reasoning.  The patch
itself seems reasonable to me apart from not really getting the
intention.
 
Your main point seems to be
 
> +  const cpu_vector_cost *costs = tune_param->vec_costs;
> +  if (!costs)
> +return &generic_vector_cost
and that is fine.  What's not clear is whether changing the actual
costs is a temporary thing or whether it is supposed to be another
fallback.  If they are going to be changed anyway, why do we need
to revert to the default model now?  As discussed yesterday
increased permute costs and vec_to_scalar costs make sense, to first
order.  Is that because of dynamic-lmul2-7.c?
 
Generally we need to make the costs dependent on the
type or mode of course, just as we started to do with the latencies.
Permute is particularly sensitive as you already gathered.
 
Regards
Robin
 
 


Re: [PATCH] libgccjit: Fix GGC segfault when using -flto

2024-01-10 Thread David Malcolm
On Mon, 2023-12-11 at 19:20 -0500, Antoni Boucher wrote:
> I'm not sure how to do this. I tried the following commands, but this
> fails even on master:
> 
> ../../gcc/configure --enable-host-shared --enable-
> languages=c,jit,c++,fortran,objc,lto --enable-checking=release --
> disable-werror --prefix=/opt/gcc
> 
> make bootstrap -j24
> make -k check -j24
> 
> From what I can understand, the unexpected failures are in g++:
> 
> === g++ Summary ===
> 
# of expected passes            72790
# of unexpected failures        1
# of expected failures          1011
# of unsupported tests          3503
> 
> === g++ Summary ===
> 
# of expected passes            4750
# of unexpected failures        27
# of expected failures          16
# of unsupported tests          43
> 
> 
> Am I doing something wrong?

I normally do a pair of bootstrap/tests: a "control" build with a
pristine copy of the source tree, and an "experiment" build containing
the patch(s) of interest, then compare the results.  FWIW given that
each one takes 2 hours on my machine, I normally just do one control
build on a Monday, rebase all my working copies to that revision, and
then use that control build throughout the week for comparison when
testing patches.

I can have a go at testing an updated patch if you like; presumably the
latest version is this one:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638841.html
right?

Dave



> 
> On Fri, 2023-12-01 at 12:49 -0500, David Malcolm wrote:
> > On Thu, 2023-11-30 at 17:13 -0500, Antoni Boucher wrote:
> > > Here's the updated patch.
> > > The failure was due to the test being in the test array while it
> > > should
> > > not have been there since it changes the context.
> > 
> > Thanks for the updated patch.
> > 
> > Did you do a full bootstrap and regression test with this one, or
> > do
> > you want me to?
> > 
> > Dave
> > 
> 



Re: [PATCH][GCC][Arm] Define __ARM_FEATURE_BF16 when +bf16 feature is enabled

2024-01-10 Thread Richard Earnshaw




On 08/01/2024 17:21, Matthieu Longo wrote:

Hi,

Arm GCC backend does not define __ARM_FEATURE_BF16 when +bf16 is 
specified (via -march option, or target pragma) whereas it is supposed 
to be tested before including arm_bf16.h (as specified in ACLE document: 
https://arm-software.github.io/acle/main/acle.html#arm_bf16h).


gcc/ChangeLog:

     * config/arm/arm-c.cc (arm_cpu_builtins): Define __ARM_FEATURE_BF16.

     * config/arm/arm.h: Define TARGET_BF16.

Ok for master ?

Matthieu
index 2e181bf7f36bab1209d5358e65d9513541683632..21ca22ac71119eda4ff01709aa95002ca13b1813 100644

--- a/gcc/config/arm/arm-c.cc
+++ b/gcc/config/arm/arm-c.cc
@@ -425,12 +425,14 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   arm_arch_cde_coproc);

   def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
+
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16", TARGET_BF16);
+  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
+ TARGET_BF16_FP);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
  TARGET_BF16_FP);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC",
  TARGET_BF16_SIMD);
-  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
- TARGET_BF16_FP || TARGET_BF16_SIMD);

Why is the definition of __ARM_BF16_FORMAT_ALTERNATIVE changed?  And why is
there no explanation of that change?  It doesn't seem directly related to
$subject.


R.

 }

 void


Re: [PATCH] libgccjit: Fix GGC segfault when using -flto

2024-01-10 Thread Antoni Boucher
On Wed, 2024-01-10 at 10:19 -0500, David Malcolm wrote:
> On Mon, 2023-12-11 at 19:20 -0500, Antoni Boucher wrote:
> > I'm not sure how to do this. I tried the following commands, but
> > this
> > fails even on master:
> > 
> > ../../gcc/configure --enable-host-shared --enable-
> > languages=c,jit,c++,fortran,objc,lto --enable-checking=release --
> > disable-werror --prefix=/opt/gcc
> > 
> > make bootstrap -j24
> > make -k check -j24
> > 
> > From what I can understand, the unexpected failures are in g++:
> > 
> > === g++ Summary ===
> > 
> > # of expected passes            72790
> > # of unexpected failures        1
> > # of expected failures          1011
> > # of unsupported tests          3503
> > 
> > === g++ Summary ===
> > 
> > # of expected passes            4750
> > # of unexpected failures        27
> > # of expected failures          16
> > # of unsupported tests          43
> > 
> > 
> > Am I doing something wrong?
> 
> I normally do a pair of bootstrap/tests: a "control" build with a
> pristine copy of the source tree, and an "experiment" build
> containing
> the patch(s) of interest, then compare the results.  FWIW given that
> each one takes 2 hours on my machine, I normally just do one control
> build on a Monday, rebase all my working copies to that revision, and
> then use that control build throughout the week for comparison when
> testing patches.
> 
> I can have a go at testing an updated patch if you like; presumably
> the
> latest version is this one:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638841.html
> right?

Thanks. I would appreciate if you do it.
Yes, this is the latest patch.

> 
> Dave
> 
> 
> 
> > 
> > On Fri, 2023-12-01 at 12:49 -0500, David Malcolm wrote:
> > > On Thu, 2023-11-30 at 17:13 -0500, Antoni Boucher wrote:
> > > > Here's the updated patch.
> > > > The failure was due to the test being in the test array while
> > > > it
> > > > should
> > > > not have been there since it changes the context.
> > > 
> > > Thanks for the updated patch.
> > > 
> > > Did you do a full bootstrap and regression test with this one, or
> > > do
> > > you want me to?
> > > 
> > > Dave
> > > 
> > 
> 



Re: [libatomic PATCH] Fix testsuite regressions on ARM [raspberry pi].

2024-01-10 Thread Richard Earnshaw




On 08/01/2024 16:07, Roger Sayle wrote:


Bootstrapping GCC on arm-linux-gnueabihf with --with-arch=armv6 currently
has a large number of FAILs in libatomic (regressions since last time I
attempted this).  The failure mode is related to IFUNC handling with the
file tas_8_2_.o containing an unresolved reference to the function
libat_test_and_set_1_i2.

Bearing in mind I've no idea what's going on, the following one line
change, to build tas_1_2_.o when building tas_8_2_.o, resolves the problem
for me and restores the libatomic testsuite to 44 expected passes and 5
unsupported tests [from 22 unexpected failures and 22 unresolved testcases].

If this looks like the correct fix: I'm not confident about rebuilding
Makefile.in with the correct version of automake, so I'd very much appreciate
it if someone (the reviewer/maintainer) could please check this in for me.
Thanks in advance.


2024-01-08  Roger Sayle  

libatomic/ChangeLog
 * Makefile.am: Build tas_1_2_.o on ARCH_ARM_LINUX
 * Makefile.in: Regenerate.


Roger
--



Hi Roger,

I don't really understand all this make foo :( so I'm not sure if this 
is the right fix either.  If this is, as you say, a regression, have you 
been able to track down when it first started to occur?  That might also 
help me to understand what changed to cause this.


Perhaps we should have a PR for this, to make tracking the fixes easier.

R.


Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread Robin Dapp
> Current generic cost model makes dynamic-lmul2-7.c generate inferior codegen.
> 
> I found that if I tweak the cost a little bit then dynamic-lmul2-7.c codegen
> can be recovered.
> However, it makes other tests fail.
> It's a complicated story.

Ok, makes sense.  So the plan seems to be:

 (1) Fall back to the generic cost model if the tune model didn't
 specify one, i.e. make sure we always use the generic cost
 model rather than the default one.
 (2) Change this generic (fallback) cost model so we don't have
 regressions on the current trunk, as it's now always used.
 (3) Adjust it piece by piece.

Sure this makes sense and is also what I had in mind.

> It's true that: we can keep current cost model 
> default_builtin_vectorization_cost
> And tweak the generic cost model, for example, add a testcase for SHA256 and add 
> -mtune=generic-ooo to test it.

> But the question, how do you know whether there is a regression on current 
> testsuite with -mtune=generic-ooo ?

That's a valid question and not easily solved.  Ideally the
generic model is generic enough to be a good base for most
uarchs.  Then the uarchs would only do minor adjustments and
have their own tests for that while the bulk of the generic
tests would still pass.

Generally, normal tests should be pretty independent of the
cost model with the exception of checking instruction sequences.
Those that are not should either specify their own -mtune and/or
disable scheduling.  Of course that's easier said than done...

Back to the patch:

I would suggest either renaming generic_vl[sa]_vector_cost to
rvv_vl[sa]_vector_cost (I find generic a bit too close to default)
and/or add comments that those are supposed to be the vector cost models
used by default if no other cost model was specified.

After understanding (2) of the plan the patch is OK to me with
that changed.

Regards
 Robin



[PATCH V2] RISC-V: Switch RVV cost model.

2024-01-10 Thread Juzhe-Zhong
This patch is a preparation patch for the following cost model tweak.

Since we don't have a vector cost model in the default tune info (rocket),
we use the generic cost model by default.

The reason we want to switch to the generic vector cost model is that the
default cost model generates inferior codegen for various benchmarks.

For example, in PR113247 we have a performance bug where SHA256 suffers an
over 70% performance drop.  Currently, no matter how we adapt the cost model,
we are not able to fix the performance bug since the default cost model is
always used.

Also, tweak the generic cost model back to match the default cost model since
we have some FAILs in current tests.

After this patch, we (Robin and I) can work on cost model tuning together to
improve performance in various benchmarks.

Tested on both RV32 and RV64, ok for trunk?

gcc/ChangeLog:

* config/riscv/riscv.cc (get_common_costs): Switch RVV cost model.
(get_vector_costs): Ditto.
(riscv_builtin_vectorization_cost): Ditto.

---
 gcc/config/riscv/riscv.cc | 144 --
 1 file changed, 75 insertions(+), 69 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 32183d63180..cca01fd54d9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -352,48 +352,49 @@ const enum reg_class 
riscv_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   VD_REGS, VD_REGS,VD_REGS,VD_REGS,
 };
 
-/* Generic costs for VLS vector operations.   */
-static const common_vector_cost generic_vls_vector_cost = {
+/* RVV costs for VLS vector operations.   */
+static const common_vector_cost rvv_vls_vector_cost = {
   1, /* int_stmt_cost  */
   1, /* fp_stmt_cost  */
   1, /* gather_load_cost  */
   1, /* scatter_store_cost  */
-  2, /* vec_to_scalar_cost  */
+  1, /* vec_to_scalar_cost  */
   1, /* scalar_to_vec_cost  */
-  2, /* permute_cost  */
+  1, /* permute_cost  */
   1, /* align_load_cost  */
   1, /* align_store_cost  */
-  1, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
+  2, /* unalign_load_cost  */
+  2, /* unalign_store_cost  */
 };
 
-/* Generic costs for VLA vector operations.  */
-static const scalable_vector_cost generic_vla_vector_cost = {
+/* RVV costs for VLA vector operations.  */
+static const scalable_vector_cost rvv_vla_vector_cost = {
   {
 1, /* int_stmt_cost  */
 1, /* fp_stmt_cost  */
 1, /* gather_load_cost  */
 1, /* scatter_store_cost  */
-2, /* vec_to_scalar_cost  */
+1, /* vec_to_scalar_cost  */
 1, /* scalar_to_vec_cost  */
-2, /* permute_cost  */
+1, /* permute_cost  */
 1, /* align_load_cost  */
 1, /* align_store_cost  */
-1, /* unalign_load_cost  */
-1, /* unalign_store_cost  */
+2, /* unalign_load_cost  */
+2, /* unalign_store_cost  */
   },
 };
 
-/* Generic costs for vector insn classes.  */
+/* Generic costs for vector insn classes.  It is supposed to be the vector cost
+   models used by default if no other cost model was specified.  */
 static const struct cpu_vector_cost generic_vector_cost = {
-  1,   /* scalar_int_stmt_cost  */
-  1,   /* scalar_fp_stmt_cost  */
-  1,   /* scalar_load_cost  */
-  1,   /* scalar_store_cost  */
-  3,   /* cond_taken_branch_cost  */
-  1,   /* cond_not_taken_branch_cost  */
-  &generic_vls_vector_cost, /* vls  */
-  &generic_vla_vector_cost, /* vla */
+  1,   /* scalar_int_stmt_cost  */
+  1,   /* scalar_fp_stmt_cost  */
+  1,   /* scalar_load_cost  */
+  1,   /* scalar_store_cost  */
+  3,   /* cond_taken_branch_cost  */
+  1,   /* cond_not_taken_branch_cost  */
+  &rvv_vls_vector_cost, /* vls  */
+  &rvv_vla_vector_cost, /* vla */
 };
 
 /* Costs to use when optimizing for rocket.  */
@@ -10372,11 +10373,10 @@ riscv_frame_pointer_required (void)
   return riscv_save_frame_pointer && !crtl->is_leaf;
 }
 
-/* Return the appropriate common costs for vectors of type VECTYPE.  */
+/* Return the appropriate common costs according to VECTYPE from COSTS.  */
 static const common_vector_cost *
-get_common_costs (tree vectype)
+get_common_costs (const cpu_vector_cost *costs, tree vectype)
 {
-  const cpu_vector_cost *costs = tune_param->vec_costs;
   gcc_assert (costs);
 
   if (vectype && riscv_v_ext_vls_mode_p (TYPE_MODE (vectype)))
@@ -10384,78 +10384,84 @@ get_common_costs (tree vectype)
   return costs->vla;
 }
 
+/* Return the CPU vector costs according to -mtune if tune info has non-NULL
+   vector cost.  Otherwide, return the default generic vector costs.  */
+static const cpu_vector_cost *
+get_vector_costs ()
+{
+  const cpu_vector_cost *costs = tune_param->vec_costs;
+  if (!costs)
+return &generic_vector_cost;
+  return costs;
+}
+
 /* Implement targetm.vectorize.builtin_vector

Re: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread 钟居哲
>> (1) Fall back to the generic cost model if the tune model didn't
>> specify one, i.e. make sure we always use the generic cost
>> model rather than the default one.
>> (2) Change this generic (fallback) cost model so we don't have
>> regressions on the current trunk, as it's now always used.
>> (3) Adjust it piece by piece.

>> Sure this makes sense and is also what I had in mind.

Yes, that's my plan.

Send in V2:
[PATCH V2] RISC-V: Switch RVV cost model. (gnu.org)



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-01-10 23:36
To: 钟居哲; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; Jeff Law
Subject: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model
> Current generic cost model makes dynamic-lmul2-7.c generate inferior codegen.
> 
> I found that if I tweak the cost a little bit then dynamic-lmul2-7.c codegen
> can be recovered.
> However, it makes other tests fail.
> It's a complicated story.
 
Ok, makes sense.  So the plan seems to be:
 
(1) Fall back to the generic cost model if the tune model didn't
 specify one, i.e. make sure we always use the generic cost
 model rather than the default one.
(2) Change this generic (fallback) cost model so we don't have
 regressions on the current trunk, as it's now always used.
(3) Adjust it piece by piece.
 
Sure this makes sense and is also what I had in mind.
 
> It's true that: we can keep current cost model 
> default_builtin_vectorization_cost
> And tweak the generic cost model, for example, add a testcase for SHA256 and add 
> -mtune=generic-ooo to test it.
 
> But the question, how do you know whether there is a regression on current 
> testsuite with -mtune=generic-ooo ?
 
That's a valid question and not easily solved.  Ideally the
generic model is generic enough to be a good base for most
uarchs.  Then the uarchs would only do minor adjustments and
have their own tests for that while the bulk of the generic
tests would still pass.
 
Generally, normal tests should be pretty independent of the
cost model with the exception of checking instruction sequences.
Those that are not should either specify their own -mtune and/or
disable scheduling.  Of course that's easier said than done...
 
Back to the patch:
 
I would suggest either renaming generic_vl[sa]_vector_cost to
rvv_vl[sa]_vector_cost (I find generic a bit too close to default)
and/or add comments that those are supposed to be the vector cost models
used by default if no other cost model was specified.
 
After understanding (2) of the plan the patch is OK to me with
that changed.
 
Regards
Robin
 
 


Re: [PATCH V2] RISC-V: Switch RVV cost model.

2024-01-10 Thread Robin Dapp
LGTM.

Regards
 Robin



[PATCH] AArch64: Reassociate CONST in address expressions [PR112573]

2024-01-10 Thread Wilco Dijkstra
GCC tends to optimistically create CONST of globals with an immediate offset.
However, it is almost always better to CSE addresses of globals and add
immediate offsets separately (the offset could be merged later in single-use
cases).  Splitting CONST expressions with an index in
aarch64_legitimize_address fixes part of PR112573.

Passes regress & bootstrap, OK for commit?

gcc/ChangeLog:
PR target/112573
* config/aarch64/aarch64.cc (aarch64_legitimize_address): Reassociate 
badly
formed CONST expressions.

gcc/testsuite/ChangeLog:
PR target/112573
* gcc.target/aarch64/pr112573.c: Add new test.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
0909b319d16b9a1587314bcfda0a8112b42a663f..9fbc8b62455f48baec533d3dd5e2d9ea995d5a8f
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12608,6 +12608,20 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
machine_mode mode)
  not to split a CONST for some forms of address expression, otherwise
  it will generate sub-optimal code.  */
 
+  /* First split X + CONST (base, offset) into (base + X) + offset.  */
+  if (GET_CODE (x) == PLUS && GET_CODE (XEXP (x, 1)) == CONST)
+{
+  poly_int64 offset;
+  rtx base = strip_offset_and_salt (XEXP (x, 1), &offset);
+
+  if (offset.is_constant ())
+  {
+ base = expand_binop (Pmode, add_optab, base, XEXP (x, 0),
+  NULL_RTX, true, OPTAB_DIRECT);
+ x = plus_constant (Pmode, base, offset);
+  }
+}
+
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
 {
   rtx base = XEXP (x, 0);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr112573.c 
b/gcc/testsuite/gcc.target/aarch64/pr112573.c
new file mode 100644
index 
..be04c0ca86ad9f33975a85f497549955d6d1236d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr112573.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-section-anchors" } */
+
+char a[100];
+
+void f1 (int x, int y)
+{
+  *((a + y) + 3) = x;
+  *((a + y) + 2) = x;
+  *((a + y) + 1) = x;
+  *((a + y) + 0) = x;
+}
+
+/* { dg-final { scan-assembler-times "strb" 4 } } */
+/* { dg-final { scan-assembler-times "adrp" 1 } } */



Re: [PATCH V2 2/4][RFC] RISC-V: Add vector related reservations

2024-01-10 Thread Robin Dapp
Hi Edwin,

> This patch copies the vector reservations from generic-ooo.md and
> inserts them into generic.md and sifive.md. Creates new vector crypto related
> insn reservations.

In principle, the changes look good to me but I wonder if we could
split off the vector parts from generic-ooo into their own md file
(generic-vector-ooo or so?) and include this in the others?  Or is
there a reason why you decided against this?

A recurring question in vector cost model discussions seems to be how
to handle the situation when a tune model does not specify a "vector tune
model".  The problem exists for the scheduler descriptions and the
normal vector cost model (and possibly insn_costs as well).

Juzhe just implemented a fallback so we always use the "generic rvv" cost
model.  Your changes would be in the same vein, and if we could split
them off then we'd be able to exchange one scheduler description
for another more easily (say if one tune model wants to use an in-order vector
model).

There is also still the question of whether to set all latencies
to 1 for an OOO core but this question should be settled separately
as soon as we have proper hardware benchmark results.  If so we
would probably rename generic-vector-ooo into
generic-vector-in-order ;)

Regards
 Robin



Re: [PATCH] Update documents for fcf-protection=

2024-01-10 Thread H.J. Lu
On Tue, Jan 9, 2024 at 6:02 PM liuhongt  wrote:
>
> After r14-2692-g1c6231c05bdcca, the option is defined as EnumSet and
> -fcf-protection=branch won't unset any others bits since they're in
> different groups. So to override -fcf-protection, an explicit
> -fcf-protection=none needs to be added first, followed by
> -fcf-protection=XXX.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:

We should mention:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113039

> * doc/invoke.texi (fcf-protection=): Update documents.
> ---
>  gcc/doc/invoke.texi | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 68d1f364ac0..d1e6fafb98c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17734,6 +17734,9 @@ function.  The value @code{full} is an alias for 
> specifying both
>  @code{branch} and @code{return}. The value @code{none} turns off
>  instrumentation.
>
> +To override @option{-fcf-protection}, @option{-fcf-protection=none}
> +needs to be explicitly added and then with @option{-fcf-protection=xxx}.
> +
>  The value @code{check} is used for the final link with link-time
>  optimization (LTO).  An error is issued if LTO object files are
>  compiled with different @option{-fcf-protection} values.  The
> --
> 2.31.1
>


-- 
H.J.


[committed] testsuite: Add testcase for already fixed PR [PR112734]

2024-01-10 Thread Jakub Jelinek
Hi!

This test was already fixed by r14-6051 aka PR112770 fix.

Tested on x86_64-linux, committed to trunk as obvious.

2024-01-10  Jakub Jelinek  

PR tree-optimization/112734
* gcc.dg/bitint-64.c: New test.

--- gcc/testsuite/gcc.dg/bitint-64.c.jj 2024-01-10 17:17:08.438466886 +0100
+++ gcc/testsuite/gcc.dg/bitint-64.c2024-01-10 17:15:20.431019135 +0100
@@ -0,0 +1,16 @@
+/* PR tree-optimization/112734 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=c23 -fnon-call-exceptions -ftrapv" } */
+
+#if __BITINT_MAXWIDTH__ >= 128
+_BitInt(128) out;
+#else
+int out;
+#endif
+
+int
+main ()
+{
+  _BitInt(8) q[1];
+  out -= 1;
+}

Jakub



Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-10 Thread Jeff Law




On 1/10/24 06:35, Richard Biener wrote:


I think x86 maintainers could opt to disable the pass - so it would
be opt-out.  It's reasonable to expect them to fix the backend given
there's nothing really wrong with the new pass, it just does
something that wasn't done before at that point?
That's been both Richard S and my experience so far -- it's exposing 
latent target issues (which we're pushing forward as independent fixes) 
as well as a few latent issues in various generic RTL bits (which I'll 
leave to Richard S to submit).  Nothing major though.


I'm a bit disappointed it's not going forward for gcc-14, but understand 
and will support the decision.


Jeff


Re: [PATCH] RISC-V: Also handle sign extension in branch costing

2024-01-10 Thread Maciej W. Rozycki
On Tue, 9 Jan 2024, Jeff Law wrote:

> >   Depending on how you look at it you may qualify this as a bug fix (for
> > the commit referred; it's surely rare enough a case I missed in original
> > testing) or a missed optimisation.  Either way it's a narrow-scoped very
> > small change, almost an obviously correct one.  I'll be very happy to get
> > it off my plate now, but if it has to wait for GCC 15, I'll accept the
> > decision.
> > 
> >   OK to apply then or shall I wait?
> OK to apply.

 Thank you for your review, I have now pushed this change.

  Maciej


Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-10 Thread Jeff Law




On 1/10/24 06:01, Richard Sandiford wrote:


So to get an idea for expectations: would it be a requirement that a
GCC 15 submission is enabled unconditionally and all known issues in
the ports fixed?
I don't think we need to fix those latent port issues as a hard 
requirement.  I try to balance the complexity of the fix, overall state 
of the port, value of having the port test the feature, etc.


So something like the mn103 or ephiphany where the fix was clear after a 
bit of debugging, we just fix.  Others, like the long-standing c6x faults 
or the rl78 assembler complaints, which we're also seeing without the 
late-combine work and which don't have clearly identifiable fixes, I'd 
say we leave to the port maintainers (if any) to address.


So I tend to want to understand a regression reported by the tester, 
then we determine a reasonable course of action.  I don't think that a 
no regression policy on all those old ports is a reasonable requirement.


Jeff


[committed] RISC-V/testsuite: Fix comment termination in pr105314.c

2024-01-10 Thread Maciej W. Rozycki
Add terminating `/' character missing from one of the test harness 
command clauses in pr105314.c.  This causes no issue with compilation 
owing to another comment immediately following, but would cause a:

pr105314.c:3:1: warning: "/*" within comment [-Wcomment]

message if warnings were enabled.

gcc/testsuite/
* gcc.target/riscv/pr105314.c: Fix comment termination.
---
Hi,

 Committed as obvious.

  Maciej
---
 gcc/testsuite/gcc.target/riscv/pr105314.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

gcc-test-riscv-pr105314-comment.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/pr105314.c
===
--- gcc.orig/gcc/testsuite/gcc.target/riscv/pr105314.c
+++ gcc/gcc/testsuite/gcc.target/riscv/pr105314.c
@@ -1,5 +1,5 @@
 /* PR rtl-optimization/105314 */
-/* { dg-do compile } *
+/* { dg-do compile } */
 /* { dg-options "-O2" } */
 /* { dg-final { scan-assembler-not "\tbeq\t" } } */
 


Re: [PATCH] config: delete unused CYG_AC_PATH_LIBERTY macro

2024-01-10 Thread Jeff Law




On 1/9/24 19:04, Mike Frysinger wrote:

Nothing uses this, so delete it to avoid confusion.

config/ChangeLog:

* acinclude.m4 (CYG_AC_PATH_LIBERTY): Delete.

OK
jeff


[PATCH][testsuite]: Make bitint early vect test more accurate

2024-01-10 Thread Tamar Christina
Hi All,

This changes the tests I committed for PR113287 to also
run on targets that don't support bitint.

Regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues; the tests run on both.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

PR tree-optimization/113287
* gcc.dg/vect/vect-early-break_100-pr113287.c: Support non-bitint.
* gcc.dg/vect/vect-early-break_99-pr113287.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
index 
f908e5bc60779c148dc95bda3e200383d12b9e1e..05fb84e1d36d4d05f39e48e41fc70703074ecabd
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
@@ -1,28 +1,29 @@
 /* { dg-add-options vect_early_break } */
 /* { dg-require-effective-target vect_early_break } */
-/* { dg-require-effective-target vect_int } */
-/* { dg-require-effective-target bitint } */
+/* { dg-require-effective-target vect_long_long } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
 
 __attribute__((noipa)) void
-bar (unsigned long *p)
+bar (unsigned long long *p)
 {
-  __builtin_memset (p, 0, 142 * sizeof (unsigned long));
-  p[17] = 0x500UL;
+  __builtin_memset (p, 0, 142 * sizeof (unsigned long long));
+  p[17] = 0x500ULL;
 }
 
 __attribute__((noipa)) int
 foo (void)
 {
-  unsigned long r[142];
+  unsigned long long r[142];
   bar (r);
-  unsigned long v = ((long) r[0] >> 31);
+  unsigned long long v = ((long) r[0] >> 31);
   if (v + 1 > 1)
 return 1;
-  for (unsigned long i = 1; i <= 140; ++i)
+  for (unsigned long long i = 1; i <= 140; ++i)
 if (r[i] != v)
   return 1;
-  unsigned long w = r[141];
-  if ((unsigned long) (((long) (w << 60)) >> 60) != v)
+  unsigned long long w = r[141];
+  if ((unsigned long long) (((long) (w << 60)) >> 60) != v)
 return 1;
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
index 
b92a8a268d803ab1656b4716b1a319ed4edc87a3..fb99ef39402ee7b3c6c564e7db5f5543a5f0c2e0
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
@@ -1,9 +1,18 @@
 /* { dg-add-options vect_early_break } */
 /* { dg-require-effective-target vect_early_break } */
-/* { dg-require-effective-target vect_int } */
-/* { dg-require-effective-target bitint } */
+/* { dg-require-effective-target vect_long_long } */
 
-_BitInt(998) b;
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#if __BITINT_MAXWIDTH__ >= 9020
+typedef _BitInt(9020) B9020;
+typedef _BitInt(998) B998;
+#else
+typedef long long B998;
+typedef long long B9020;
+#endif
+
+B998 b;
 char c;
 char d;
 char e;
@@ -14,7 +23,7 @@ char i;
 char j;
 
 void
-foo(char y, _BitInt(9020) a, char *r)
+foo(char y, B9020 a, char *r)
 {
   char x = __builtin_mul_overflow_p(a << sizeof(a), y, 0);
   x += c + d + e + f + g + h + i + j + b;
@@ -26,7 +35,12 @@ main(void)
 {
   char x;
   foo(5, 5, &x);
+#if __BITINT_MAXWIDTH__ >= 9020
   if (x != 1)
 __builtin_abort();
+#else
+  if (x != 0)
+__builtin_abort();
+#endif
   return 0;
 }




-- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
index 
f908e5bc60779c148dc95bda3e200383d12b9e1e..05fb84e1d36d4d05f39e48e41fc70703074ecabd
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
@@ -1,28 +1,29 @@
 /* { dg-add-options vect_early_break } */
 /* { dg-require-effective-target vect_early_break } */
-/* { dg-require-effective-target vect_int } */
-/* { dg-require-effective-target bitint } */
+/* { dg-require-effective-target vect_long_long } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
 
 __attribute__((noipa)) void
-bar (unsigned long *p)
+bar (unsigned long long *p)
 {
-  __builtin_memset (p, 0, 142 * sizeof (unsigned long));
-  p[17] = 0x500UL;
+  __builtin_memset (p, 0, 142 * sizeof (unsigned long long));
+  p[17] = 0x500ULL;
 }
 
 __attribute__((noipa)) int
 foo (void)
 {
-  unsigned long r[142];
+  unsigned long long r[142];
   bar (r);
-  unsigned long v = ((long) r[0] >> 31);
+  unsigned long long v = ((long) r[0] >> 31);
   if (v + 1 > 1)
 return 1;
-  for (unsigned long i = 1; i <= 140; ++i)
+  for (unsigned long long i = 1; i <= 140; ++i)
 if (r[i] != v)
   return 1;
-  unsigned long w = r[141];
-  if ((unsigned long) (((long) (w << 60)) >> 60) != v)
+  unsigned long long w = r[141];
+  if ((unsigned long long) (((long) (w << 60)) >> 60) != v)
 return 1;
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
index 
b92a

Re: [PATCH][testsuite]: Make bitint early vect test more accurate

2024-01-10 Thread Jakub Jelinek
On Wed, Jan 10, 2024 at 04:55:00PM +, Tamar Christina wrote:
>   PR tree-optimization/113287
>   * gcc.dg/vect/vect-early-break_100-pr113287.c: Support non-bitint.

This part is ok.

> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
> @@ -1,9 +1,18 @@
>  /* { dg-add-options vect_early_break } */
>  /* { dg-require-effective-target vect_early_break } */
> -/* { dg-require-effective-target vect_int } */
> -/* { dg-require-effective-target bitint } */
> +/* { dg-require-effective-target vect_long_long } */
>  
> -_BitInt(998) b;
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#if __BITINT_MAXWIDTH__ >= 9020
> +typedef _BitInt(9020) B9020;
> +typedef _BitInt(998) B998;
> +#else
> +typedef long long B998;
> +typedef long long B9020;
> +#endif
> +
> +B998 b;
>  char c;
>  char d;
>  char e;
> @@ -14,7 +23,7 @@ char i;
>  char j;
>  
>  void
> -foo(char y, _BitInt(9020) a, char *r)
> +foo(char y, B9020 a, char *r)
>  {
>char x = __builtin_mul_overflow_p(a << sizeof(a), y, 0);

But I'm afraid I have no idea how this is supposed to work on
non-bitint targets or where __BITINT_MAXWIDTH__ is smaller than 9020.
There is no loop at all there, so what should be vectorized?

I'd say introduce 
# Return 1 if the target supports _BitInt(65535), 0 otherwise.

proc check_effective_target_bitint65535 { } {
return [check_no_compiler_messages bitint65535 object {
_BitInt (2) a = 1wb;
unsigned _BitInt (65535) b = 0uwb;
} "-std=c23"]
}

after bitint575 effective target and use it in the test.

Jakub



Re: [PATCH v8 1/4] c++: P0847R7 (deducing this) - prerequisite changes. [PR102609]

2024-01-10 Thread Patrick Palka
Congratulations on landing this impressive work in GCC 14!

On Sun, 7 Jan 2024, waffl3x wrote:

> Bootstrapped and tested on x86_64-linux with no regressions.
> 
> I'm considering this finished, I have CWG2586 working but I have not
> included it in this version of the patch. I was not happy with the
> amount of work I had done on it. I will try to get it finished before
> we get cut off, and I'm pretty sure I can. I just don't want to risk
> missing the boat for the whole patch just for that.
> 
> There aren't too many changes from v7, it's mostly just cleaned up.
> There are a few though, so do take a look, if there's anything severe I
> can rush to fix it if necessary.
> 
> That's all, hopefully all is good, fingers crossed.
> 
> Alex



Re: [PATCH v2] RISC-V: T-HEAD: Add support for the XTheadInt ISA extension

2024-01-10 Thread Christoph Müllner
On Tue, Jan 9, 2024 at 6:59 PM Jeff Law  wrote:
>
>
>
> On 11/17/23 00:33, Jin Ma wrote:
> > The XTheadInt ISA extension provides acceleration interruption
> > instructions as defined in T-Head-specific:
> > * th.ipush
> > * th.ipop
> >
> > Ref:
> > https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.3.0/xthead-2023-11-10-2.3.0.pdf
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-protos.h (th_int_get_mask): New prototype.
> >   (th_int_get_save_adjustment): Likewise.
> >   (th_int_adjust_cfi_prologue): Likewise.
> >   * config/riscv/riscv.cc (TH_INT_INTERRUPT): New macro.
> >   (riscv_expand_prologue): Add the processing of XTheadInt.
> >   (riscv_expand_epilogue): Likewise.
> >   * config/riscv/riscv.md: New unspec.
> >   * config/riscv/thead.cc (BITSET_P): New macro.
> >   * config/riscv/thead.md (th_int_push): New pattern.
> >   (th_int_pop): New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/xtheadint-push-pop.c: New test.
> Thanks for the ping earlier today.  I've looked at this patch repeatedly
> over the last few weeks, but never enough to give it a full review.
>
>
> > diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
> > index 2babfafb23c..4d6e16c0edc 100644
> > --- a/gcc/config/riscv/thead.md
> > +++ b/gcc/config/riscv/thead.md
>
> > +(define_insn "th_int_pop"
> > +  [(unspec_volatile [(const_int 0)] UNSPECV_XTHEADINT_POP)
> > +   (clobber (reg:SI RETURN_ADDR_REGNUM))
> > +   (clobber (reg:SI T0_REGNUM))
> > +   (clobber (reg:SI T1_REGNUM))
> > +   (clobber (reg:SI T2_REGNUM))
> > +   (clobber (reg:SI A0_REGNUM))
> > +   (clobber (reg:SI A1_REGNUM))
> > +   (clobber (reg:SI A2_REGNUM))
> > +   (clobber (reg:SI A3_REGNUM))
> > +   (clobber (reg:SI A4_REGNUM))
> > +   (clobber (reg:SI A5_REGNUM))
> > +   (clobber (reg:SI A6_REGNUM))
> > +   (clobber (reg:SI A7_REGNUM))
> > +   (clobber (reg:SI T3_REGNUM))
> > +   (clobber (reg:SI T4_REGNUM))
> > +   (clobber (reg:SI T5_REGNUM))
> > +   (clobber (reg:SI T6_REGNUM))
> > +   (return)]
> > +  "TARGET_XTHEADINT && !TARGET_64BIT"
> > +  "th.ipop"
> > +  [(set_attr "type"  "ret")
> > +   (set_attr "mode"  "SI")])
> I probably would have gone with a load type since it's the loads that are
> most likely to interact with existing code in the pipeline.  But I doubt it
> really matters in practice.
>
>
> OK for the trunk.  Thanks for your patience.

I've retested this locally (no regressions), completed the ChangeLog
in the commit message and committed.

Thanks,
Christoph


Re: [PATCH] AArch64: Reassociate CONST in address expressions [PR112573]

2024-01-10 Thread Richard Sandiford
Wilco Dijkstra  writes:
> GCC tends to optimistically create CONST of globals with an immediate offset.
> However it is almost always better to CSE addresses of globals and add 
> immediate
> offsets separately (the offset could be merged later in single-use cases).
> Splitting CONST expressions with an index in aarch64_legitimize_address fixes 
> part
> of PR112573.
>
> Passes regress & bootstrap, OK for commit?
>
> gcc/ChangeLog:
> PR target/112573
> * config/aarch64/aarch64.cc (aarch64_legitimize_address): Reassociate 
> badly
> formed CONST expressions.
>
> gcc/testsuite/ChangeLog:
> PR target/112573
> * gcc.target/aarch64/pr112573.c: Add new test.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 0909b319d16b9a1587314bcfda0a8112b42a663f..9fbc8b62455f48baec533d3dd5e2d9ea995d5a8f
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -12608,6 +12608,20 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
> */, machine_mode mode)
>   not to split a CONST for some forms of address expression, otherwise
>   it will generate sub-optimal code.  */
>
> +  /* First split X + CONST (base, offset) into (base + X) + offset.  */
> +  if (GET_CODE (x) == PLUS && GET_CODE (XEXP (x, 1)) == CONST)
> +{
> +  poly_int64 offset;
> +  rtx base = strip_offset_and_salt (XEXP (x, 1), &offset);

This should be just strip_offset, so that we don't lose the salt
during optimisation.

> +
> +  if (offset.is_constant ())

I'm not sure this is really required.  Logically the same thing
would apply to SVE, although admittedly:

/* { dg-do compile } */
/* { dg-options "-O2 -fno-section-anchors" } */

#include <arm_sve.h>

char a[2048];

void f1 (svint8_t x, int y)
{
  *(svint8_t *)((a + y) + svcntb() * 3) = x;
  *(svint8_t *)((a + y) + svcntb() * 2) = x;
  *(svint8_t *)((a + y) + svcntb() * 1) = x;
  *(svint8_t *)((a + y) + 0) = x;
}

/* { dg-final { scan-assembler-times "strb" 4 } } */
/* { dg-final { scan-assembler-times "adrp" 1 } } */

doesn't get arranged into the same form for other reasons (and already
produces somewhat decent code).

The patch is OK from my POV without the offset.is_constant check and
with s/strip_offset_and_salt/strip_offset/.  Please say if there's
a reason to keep the offset check though.

Thanks,
Richard

> +  {
> + base = expand_binop (Pmode, add_optab, base, XEXP (x, 0),
> +  NULL_RTX, true, OPTAB_DIRECT);
> + x = plus_constant (Pmode, base, offset);
> +  }
> +}
> +
>if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>  {
>rtx base = XEXP (x, 0);
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr112573.c 
> b/gcc/testsuite/gcc.target/aarch64/pr112573.c
> new file mode 100644
> index 
> ..be04c0ca86ad9f33975a85f497549955d6d1236d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr112573.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-section-anchors" } */
> +
> +char a[100];
> +
> +void f1 (int x, int y)
> +{
> +  *((a + y) + 3) = x;
> +  *((a + y) + 2) = x;
> +  *((a + y) + 1) = x;
> +  *((a + y) + 0) = x;
> +}
> +
> +/* { dg-final { scan-assembler-times "strb" 4 } } */
> +/* { dg-final { scan-assembler-times "adrp" 1 } } */
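
A toy sketch (in Python, with invented names; GCC's real code works on RTL) of
the reassociation the patch performs, rewriting X + CONST (base, offset) into
(base + X) + offset so the base address can be CSE'd across references and the
immediate offset handled separately:

```python
# Expressions are modeled as tuples: ('plus', a, b) or ('const', base, offset).
# This mirrors only the shape of the transform, not aarch64_legitimize_address.

def reassociate(expr):
    """Rewrite ('plus', X, ('const', base, offset)) into
    ('plus', ('plus', base, X), offset)."""
    if (isinstance(expr, tuple) and len(expr) == 3 and expr[0] == 'plus'
            and isinstance(expr[2], tuple) and expr[2][0] == 'const'):
        x = expr[1]
        _, base, offset = expr[2]
        return ('plus', ('plus', base, x), offset)
    return expr  # no CONST on the right: leave unchanged

print(reassociate(('plus', 'y', ('const', 'a', 3))))
# ('plus', ('plus', 'a', 'y'), 3)
```

After the rewrite, the inner (base + X) is the part that repeats across the
four stores in the testcase, which is why a single adrp suffices.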


RE: [PATCH][testsuite]: Make bitint early vect test more accurate

2024-01-10 Thread Tamar Christina
> But I'm afraid I have no idea how is this supposed to work on
> non-bitint targets or where __BITINT_MAXWIDTH__ is smaller than 9020.
> There is no loop at all there, so what should be vectorized?
> 

Yeah, it was giving an UNRESOLVED result and I didn't notice it in the diff.

> I'd say introduce
> # Return 1 if the target supports _BitInt(65535), 0 otherwise.
> 
> proc check_effective_target_bitint65535 { } {
> return [check_no_compiler_messages bitint65535 object {
> _BitInt (2) a = 1wb;
> unsigned _BitInt (65535) b = 0uwb;
> } "-std=c23"]
> }
> 
> after bitint575 effective target and use it in the test.
>

Sure, how's:

--

This changes the tests I committed for PR113287 to also
run on targets that don't support bitint.

Regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu with no issues;
the tests now run on both.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/sourcebuild.texi (check_effective_target_bitint65535): New.

gcc/testsuite/ChangeLog:

PR tree-optimization/113287
* gcc.dg/vect/vect-early-break_100-pr113287.c: Support non-bitint.
* gcc.dg/vect/vect-early-break_99-pr113287.c: Likewise.
* lib/target-supports.exp (bitint, bitint128, bitint575, bitint65535):
Document them.

---inline copy of patch ---

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 
bd62b21f3b725936eae34c22159ccbc9db40873f..6fbb102f9971d54d66d77dcee8f10a1b57aa6e5a
 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2864,6 +2864,18 @@ Target supports Graphite optimizations.
 @item fixed_point
 Target supports fixed-point extension to C.
 
+@item bitint
+Target supports _BitInt(N).
+
+@item bitint128
+Target supports _BitInt(128).
+
+@item bitint575
+Target supports _BitInt(575).
+
+@item bitint65535
+Target supports _BitInt(65535).
+
 @item fopenacc
 Target supports OpenACC via @option{-fopenacc}.
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
index 
f908e5bc60779c148dc95bda3e200383d12b9e1e..05fb84e1d36d4d05f39e48e41fc70703074ecabd
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_100-pr113287.c
@@ -1,28 +1,29 @@
 /* { dg-add-options vect_early_break } */
 /* { dg-require-effective-target vect_early_break } */
-/* { dg-require-effective-target vect_int } */
-/* { dg-require-effective-target bitint } */
+/* { dg-require-effective-target vect_long_long } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
 
 __attribute__((noipa)) void
-bar (unsigned long *p)
+bar (unsigned long long *p)
 {
-  __builtin_memset (p, 0, 142 * sizeof (unsigned long));
-  p[17] = 0x500UL;
+  __builtin_memset (p, 0, 142 * sizeof (unsigned long long));
+  p[17] = 0x500ULL;
 }
 
 __attribute__((noipa)) int
 foo (void)
 {
-  unsigned long r[142];
+  unsigned long long r[142];
   bar (r);
-  unsigned long v = ((long) r[0] >> 31);
+  unsigned long long v = ((long) r[0] >> 31);
   if (v + 1 > 1)
 return 1;
-  for (unsigned long i = 1; i <= 140; ++i)
+  for (unsigned long long i = 1; i <= 140; ++i)
 if (r[i] != v)
   return 1;
-  unsigned long w = r[141];
-  if ((unsigned long) (((long) (w << 60)) >> 60) != v)
+  unsigned long long w = r[141];
+  if ((unsigned long long) (((long) (w << 60)) >> 60) != v)
 return 1;
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
index 
b92a8a268d803ab1656b4716b1a319ed4edc87a3..e141e8a9277f89527e8aff809fe101fdd91a4c46
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_99-pr113287.c
@@ -1,7 +1,8 @@
 /* { dg-add-options vect_early_break } */
 /* { dg-require-effective-target vect_early_break } */
-/* { dg-require-effective-target vect_int } */
-/* { dg-require-effective-target bitint } */
+/* { dg-require-effective-target bitint65535 } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
 
 _BitInt(998) b;
 char c;
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 
a9c76e0b290b19fd07574805bb2b87c86a5e9cf7..1ddcb3926a8d549b6a17b61e29e1d9836ecce897
 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3850,6 +3850,15 @@ proc check_effective_target_bitint575 { } {
 } "-std=c23"]
 }
 
+# Return 1 if the target supports _BitInt(65535), 0 otherwise.
+
+proc check_effective_target_bitint65535 { } {
+return [check_no_compiler_messages bitint65535 object {
+_BitInt (2) a = 1wb;
+unsigned _BitInt (65535) b = 0uwb;
+} "-std=c23"]
+}
+
 # Return 1 if the target supports compiling decimal floating point,
 # 0 otherwise.





Re: [PATCH v4] AArch64: Cleanup memset expansion

2024-01-10 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Hi Richard,
>
>>> +#define MAX_SET_SIZE(speed) (speed ? 256 : 96)
>>
>> Since this isn't (AFAIK) a standard macro, there doesn't seem to be
>> any need to put it in the header file.  It could just go at the head
>> of aarch64.cc instead.
>
> Sure, I've moved it in v4.
>
>>> +  if (len <= 24 || (aarch64_tune_params.extra_tuning_flags
>>> +   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
>>> +set_max = 16;
>>
>> I think we should take the tuning parameter into account when applying
>> the MAX_SET_SIZE limit for -Os.  Shouldn't it be 48 rather than 96 in
>> that case?  (Alternatively, I suppose it would make sense to ignore
>> the param for -Os, although we don't seem to do that elsewhere.)
>
> That tune is only used by an obsolete core. I ran the memcpy and memset
> benchmarks from Optimized Routines on xgene-1 with and without LDP/STP.
> There is no measurable penalty for using LDP/STP. I'm not sure why it was
> ever added given it does not do anything useful. I'll post a separate patch
> to remove it to reduce the maintenance overhead.

Is that enough to justify removing it though?  It sounds from:

  https://gcc.gnu.org/pipermail/gcc-patches/2018-June/500017.html

like the problem was in more balanced code, rather than memory-limited
things like memset/memcpy.

But yeah, I'm not sure if the intuition was supported by numbers
in the end.  If SPEC also shows no change then we can probably drop it
(unless someone objects).

Let's leave this patch until that's resolved though, since I think as it
stands the patch does leave -Os -mtune=xgene1 worse off (bigger code).
Handling the tune in the meantime would also be OK.

BTW, just noticed, but...

>
> Cheers,
> Wilco
>
>
> Here is v4 (move MAX_SET_SIZE definition to aarch64.cc):
>
> Cleanup memset implementation.  Similar to memcpy/memmove, use an offset and
> bytes throughout.  Simplify the complex calculations when optimizing for size
> by using a fixed limit.
>
> Passes regress/bootstrap, OK for commit?
>
> gcc/ChangeLog:
> * config/aarch64/aarch64.cc (MAX_SET_SIZE): New define.
> (aarch64_progress_pointer): Remove function.
> (aarch64_set_one_block_and_progress_pointer): Simplify and clean up.
> (aarch64_expand_setmem): Clean up implementation, use byte offsets,
> simplify size calculation.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> a5a6b52730d6c5013346d128e89915883f1707ae..62f4eee429c1c5195d54604f1d341a8a5a499d89
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -101,6 +101,10 @@
>  /* Defined for convenience.  */
>  #define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT)
>
> +/* Maximum bytes set for an inline memset expansion.  With -Os use 3 STP
> +   and 1 MOVI/DUP (same size as a call).  */
> +#define MAX_SET_SIZE(speed) (speed ? 256 : 96)
> +
>  /* Flags that describe how a function shares certain architectural state
> with its callers.
>
> @@ -26321,15 +26325,6 @@ aarch64_move_pointer (rtx pointer, poly_int64 amount)
> next, amount);
>  }
>
> -/* Return a new RTX holding the result of moving POINTER forward by the
> -   size of the mode it points to.  */
> -
> -static rtx
> -aarch64_progress_pointer (rtx pointer)
> -{
> -  return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
> -}
> -
>  typedef auto_vec, 12> copy_ops;
>
>  /* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
> @@ -26484,45 +26479,21 @@ aarch64_expand_cpymem (rtx *operands, bool 
> is_memmove)
>return true;
>  }
>
> -/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
> -   SRC is a register we have created with the duplicated value to be set.  */
> +/* Set one block of size MODE at DST at offset OFFSET to value in SRC.  */
>  static void
> -aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
> -   machine_mode mode)
> +aarch64_set_one_block (rtx src, rtx dst, int offset, machine_mode mode)
>  {
> -  /* If we are copying 128bits or 256bits, we can do that straight from
> - the SIMD register we prepared.  */
> -  if (known_eq (GET_MODE_BITSIZE (mode), 256))
> -{
> -  mode = GET_MODE (src);
> -  /* "Cast" the *dst to the correct mode.  */
> -  *dst = adjust_address (*dst, mode, 0);
> -  /* Emit the memset.  */
> -  emit_insn (aarch64_gen_store_pair (*dst, src, src));
> -
> -  /* Move the pointers forward.  */
> -  *dst = aarch64_move_pointer (*dst, 32);
> -  return;
> -}
> -  if (known_eq (GET_MODE_BITSIZE (mode), 128))
> +  /* Emit explicit store pair instructions for 32-byte writes.  */
> +  if (known_eq (GET_MODE_SIZE (mode), 32))
>  {
> -  /* "Cast" the *dst to the correct mode.  */
> -  *dst = adjust_address (*dst, GET_MODE (src), 0);
> -  /* Emit the memset.  */
> -  emit_move_insn (*dst, src)
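
A hypothetical model (in Python, greedy and non-overlapping; the real
expansion also uses overlapping stores and honours tuning flags) of the size
budgeting in the cleaned-up aarch64_expand_setmem: inline expansion is used
only up to MAX_SET_SIZE(speed) bytes, and the length is covered with stores of
32/16/8/4/2/1 bytes:

```python
def max_set_size(speed):
    # -Os limit of 96 = 3 STP + 1 MOVI/DUP, the same size as a call to memset.
    return 256 if speed else 96

def setmem_plan(length, speed):
    """Return a list of (offset, size) stores, or None to fall back to memset."""
    if length > max_set_size(speed):
        return None
    plan, offset = [], 0
    for size in (32, 16, 8, 4, 2, 1):
        while length - offset >= size:
            plan.append((offset, size))
            offset += size
    return plan

print(setmem_plan(96, False))
# [(0, 32), (32, 32), (64, 32)]
```

The 32-byte entries correspond to the explicit store-pair path in
aarch64_set_one_block above.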

Re: [PATCH V2 2/4][RFC] RISC-V: Add vector related reservations

2024-01-10 Thread Edwin Lu

Hi Robin,
On 1/10/2024 8:00 AM, Robin Dapp wrote:

Hi Edwin,


This patch copies the vector reservations from generic-ooo.md and
inserts them into generic.md and sifive.md. Creates new vector crypto related
insn reservations.


In principle, the changes look good to me but I wonder if we could
split off the vector parts from generic-ooo into their own md file
(generic-vector-ooo or so?) and include this in the others?  Or is
there a reason why you decided against this?

I forgot we could include other md files into another file (I'll double 
check that there isn't anything fancy for including other pipelines), 
but I also thought that eventually all the tunes would have their own 
vector cost pipelines. Since all the pipelines should be tuned to their 
cost model, they would be different anyway. If it would be simpler for 
now, I could separate the files out.



A recurring question in vector cost model discussions seems to be how
to handle the situation when a tune model does not specify a "vector tune
model".  The problem exists for the scheduler descriptions and the
normal vector cost model (and possibly insn_costs as well).

Juzhe just implemented a fallback so we always use the "generic rvv" cost
model.  Your changes would be in the same vein and if we could split
them off then we'd be able to easier exchange one scheduler descriptions
for another one (say if one tune model wants to use an in-order vector
model).

I think I'm getting a bit confused. Is there a reason why we would want 
to exchange scheduler descriptions like the example you provided? I'm 
just thinking why a in-order model would want to use an ooo vector model 
and vice versa. Please correct me if I got the wrong idea.


I also want to double check, isn't forcing all typed instructions to be 
part of a dfa pipeline in effect removing a situation where a tune model 
does not specify a "vector tune model"? At least from my testing with 
the assert statement, I get ICEs when trying to run the testsuite 
without the vector tune model even on gc.



There is also still the question of whether to set all latencies
to 1 for an OOO core but this question should be settled separately
as soon as we have proper hardware benchmark results.  If so we
would probably rename generic-vector-ooo into
generic-vector-in-order ;)

Regards
  Robin



I agree the latencies can be tweaked after we get those benchmarks :)

Edwin



Re: [Bug libstdc++/112477] [13/14 Regression] Assignment of value-initialized iterators differs from value-initialization

2024-01-10 Thread François Dumont
libstdc++: [_GLIBCXX_DEBUG] Fix assignment of value-initialized iterator 
[PR112477]


Now that _M_detach does not reset the iterator's _M_version value, we need
to reset it when the iterator is attached to a new sequence, even if this
sequence is null, as when assigning a value-initialized iterator. In this
case _M_version shall be reset to 0.


libstdc++-v3/ChangeLog:

    PR libstdc++/112477
    * src/c++11/debug.cc
    (_Safe_iterator_base::_M_attach): Reset _M_version to 0 if
    attaching to null sequence.
    (_Safe_iterator_base::_M_attach_single): Likewise.
    (_Safe_local_iterator_base::_M_attach): Likewise.
    (_Safe_local_iterator_base::_M_attach_single): Likewise.
    * testsuite/23_containers/map/debug/112477.cc: New test case.

Tested under Linux x64 _GLIBCXX_DEBUG mode.

Ok to commit and backport to gcc 13 ?

François

On 09/01/2024 22:47, fdumont at gcc dot gnu.org wrote:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112477

François Dumont  changed:

What|Removed |Added

Assignee|unassigned at gcc dot gnu.org  |fdumont at gcc dot gnu.org

--- Comment #8 from François Dumont  ---
Hi
I'm going to have a look but if you wish to contribute do not hesitate.
Thanks for the report.
diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index bb0d0db6679..cb2cbf9d312 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -437,6 +437,8 @@ namespace __gnu_debug
_M_version = _M_sequence->_M_version;
_M_sequence->_M_attach(this, __constant);
   }
+else
+  _M_version = 0;
   }
 
   void
@@ -452,6 +454,8 @@ namespace __gnu_debug
_M_version = _M_sequence->_M_version;
_M_sequence->_M_attach_single(this, __constant);
   }
+else
+  _M_version = 0;
   }
 
   void
@@ -528,6 +532,8 @@ namespace __gnu_debug
_M_version = _M_sequence->_M_version;
_M_get_container()->_M_attach_local(this, __constant);
   }
+else
+  _M_version = 0;
   }
 
   void
@@ -543,6 +549,8 @@ namespace __gnu_debug
_M_version = _M_sequence->_M_version;
_M_get_container()->_M_attach_local_single(this, __constant);
   }
+else
+  _M_version = 0;
   }
 
   void
diff --git a/libstdc++-v3/testsuite/23_containers/map/debug/112477.cc 
b/libstdc++-v3/testsuite/23_containers/map/debug/112477.cc
new file mode 100644
index 000..bde613b8905
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/map/debug/112477.cc
@@ -0,0 +1,20 @@
+// { dg-do run { target c++11 } }
+// { dg-require-debug-mode "" }
+
+// PR libstdc++/112477
+
+#include <map>
+
+int main()
+{
+  using M = std::map<int, int>;
+  using I = M::iterator;
+
+  M map{ {1, 1}, {2, 2} };
+
+  I it1 = map.begin();
+  it1 = I{};
+
+  I it2{};
+  (void)(it1 == it2);
+}
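
A Python model (class names invented for illustration) of what the fix does:
on attach, _M_version must be reset to 0 when the new sequence is null,
otherwise an iterator assigned from a value-initialized iterator keeps a stale
version and compares unequal to a genuinely value-initialized one:

```python
class Sequence:
    def __init__(self):
        self.version = 1  # bumped on container modification

class SafeIterator:
    def __init__(self, seq=None):
        self.sequence = None
        self.version = 0
        self.attach(seq)

    def attach(self, seq):
        self.sequence = seq
        if seq is not None:
            self.version = seq.version
        else:
            self.version = 0  # the fix: without this, a stale version survives

    def assign(self, other):
        self.attach(other.sequence)

    def same_state(self, other):
        return (self.sequence is other.sequence
                and self.version == other.version)

seq = Sequence()
seq.version = 7
it1 = SafeIterator(seq)
it1.assign(SafeIterator())  # mirrors `it1 = I{};` in the testcase
it2 = SafeIterator()        # mirrors `I it2{};`
print(it1.same_state(it2))  # True
```

Without the `else` branch in attach, it1 would keep version 7 and the
comparison in the testcase would diagnose two singular iterators as differing.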


Re: [committed 2/2] libstdc++: Implement P2918R0 "Runtime format strings II" for C++26

2024-01-10 Thread Daniel Krügler
On Mon, Jan 8, 2024 at 03:25, Jonathan Wakely wrote:
>
> Tested x86_64-linux and aarch64-linux. Pushed to trunk.
>
> -- >8 --
>
> This adds std::runtime_format for C++26. These new overloaded functions
> enhance the std::format API so that it isn't necessary to use the less
> ergonomic std::vformat and std::make_format_args (which are meant to be
> implementation details). This was approved in Kona 2023 for C++26.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/format (__format::_Runtime_format_string): Define
> new class template.
> (basic_format_string): Add non-consteval constructor for runtime
> format strings.
> (runtime_format): Define new function for C++26.
> * testsuite/std/format/runtime_format.cc: New test.
> ---
>  libstdc++-v3/include/std/format   | 22 +++
>  .../testsuite/std/format/runtime_format.cc| 37 +++
>  2 files changed, 59 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/std/format/runtime_format.cc
>
> diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
> index 160efa5155c..b3b5a0bbdbc 100644
> --- a/libstdc++-v3/include/std/format
> +++ b/libstdc++-v3/include/std/format
> @@ -81,6 +81,9 @@ namespace __format
>
>template
>  using __format_context = basic_format_context<_Sink_iter<_CharT>, 
> _CharT>;
> +
> +  template
> +struct _Runtime_format_string { basic_string_view<_CharT> _M_str; };
>  } // namespace __format
>  /// @endcond
>
> @@ -115,6 +118,11 @@ namespace __format
> consteval
> basic_format_string(const _Tp& __s);
>
> +  [[__gnu__::__always_inline__]]
> +  basic_format_string(__format::_Runtime_format_string<_CharT>&& __s)
> +  : _M_str(__s._M_str)
> +  { }
> +

My understanding is that this constructor should be noexcept according to N4971.

>[[__gnu__::__always_inline__]]
>constexpr basic_string_view<_CharT>
>get() const noexcept
> @@ -133,6 +141,20 @@ namespace __format
>= basic_format_string...>;
>  #endif
>
> +#if __cplusplus > 202302L
> +  [[__gnu__::__always_inline__]]
> +  inline __format::_Runtime_format_string<char>
> +  runtime_format(string_view __fmt)
> +  { return {__fmt}; }
> +
> +#ifdef _GLIBCXX_USE_WCHAR_T
> +  [[__gnu__::__always_inline__]]
> +  inline __format::_Runtime_format_string<wchar_t>
> +  runtime_format(wstring_view __fmt)
> +  { return {__fmt}; }
> +#endif
> +#endif // C++26
> +

These runtime_format overloads should also be noexcept.

- Daniel
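
A sketch of the API shape P2918R0 adds, restated in Python for illustration
(names other than runtime_format are invented): a literal format string is
validated eagerly, standing in for the consteval check, while runtime_format
wraps a string so it is accepted unchecked, replacing the
vformat/make_format_args detour:

```python
class RuntimeFormatString:
    """Stand-in for __format::_Runtime_format_string: just carries the string."""
    def __init__(self, s):
        self.str = s

def runtime_format(s):
    return RuntimeFormatString(s)

def do_format(fmt, *args):
    if isinstance(fmt, RuntimeFormatString):
        s = fmt.str       # deferred: any error surfaces only at run time
    else:
        s = fmt
        s.format(*args)   # eager validation, standing in for consteval checking
    return s.format(*args)

print(do_format("{}-{}", 1, 2))                 # 1-2
print(do_format(runtime_format("{}!"), "hi"))   # hi!
```

The noexcept point in the review maps to the wrapper constructor doing nothing
that can throw: it only stores the view.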


Re: [PATCH][testsuite]: Make bitint early vect test more accurate

2024-01-10 Thread Jakub Jelinek
On Wed, Jan 10, 2024 at 06:07:16PM +, Tamar Christina wrote:
> This changes the tests I committed for PR113287 to also
> run on targets that don't support bitint.
> 
> Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues and
> tests run on both.
> 
> Ok for master?

Yes, thanks.

> gcc/ChangeLog:
> 
>   * doc/sourcebuild.texi (check_effective_target_bitint65535): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/113287
>   * gcc.dg/vect/vect-early-break_100-pr113287.c: Support non-bitint.
>   * gcc.dg/vect/vect-early-break_99-pr113287.c: Likewise.
>   * lib/target-supports.exp (bitint, bitint128, bitint575, bitint65535):
>   Document them.

Jakub



[RFC] aarch64: Add support for __BitInt

2024-01-10 Thread Andre Vieira (lists)

Hi,

This patch is still work in progress, but I am posting it to show a failure
with the bitint-7 test, where handle_stmt called from lower_mergeable_stmt
ICEs because the idx (3) is out of range for __BitInt(135) with a
limb_prec of 64.


I hacked gcc locally to work around this issue and still have one 
outstanding failure, so will look to resolve that failure before posting 
a new version.


Kind Regards,
Andrediff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
a5a6b52730d6c5013346d128e89915883f1707ae..15fb0ece5256f25c2ca8bb5cb82fc61488d0393e
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -6534,7 +6534,7 @@ aarch64_return_in_memory_1 (const_tree type)
   machine_mode ag_mode;
   int count;
 
-  if (!AGGREGATE_TYPE_P (type)
+  if (!(AGGREGATE_TYPE_P (type) || TREE_CODE (type) == BITINT_TYPE)
   && TREE_CODE (type) != COMPLEX_TYPE
   && TREE_CODE (type) != VECTOR_TYPE)
 /* Simple scalar types always returned in registers.  */
@@ -6618,6 +6618,10 @@ aarch64_function_arg_alignment (machine_mode mode, 
const_tree type,
 
   gcc_assert (TYPE_MODE (type) == mode);
 
+  if (TREE_CODE (type) == BITINT_TYPE
+  && int_size_in_bytes (type) > 16)
+return GET_MODE_ALIGNMENT (TImode);
+
   if (!AGGREGATE_TYPE_P (type))
 {
   /* The ABI alignment is the natural alignment of the type, without
@@ -21773,6 +21777,11 @@ aarch64_composite_type_p (const_tree type,
   if (type && (AGGREGATE_TYPE_P (type) || TREE_CODE (type) == COMPLEX_TYPE))
 return true;
 
+  if (type
+  && TREE_CODE (type) == BITINT_TYPE
+  && int_size_in_bytes (type) > 16)
+return true;
+
   if (mode == BLKmode
   || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
   || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)
@@ -28265,6 +28274,29 @@ aarch64_excess_precision (enum excess_precision_type 
type)
   return FLT_EVAL_METHOD_UNPREDICTABLE;
 }
 
+/* Implement TARGET_C_BITINT_TYPE_INFO.
+   Return true if _BitInt(N) is supported and fill its details into *INFO.  */
+bool
+aarch64_bitint_type_info (int n, struct bitint_info *info)
+{
+  if (n <= 8)
+info->limb_mode = QImode;
+  else if (n <= 16)
+info->limb_mode = HImode;
+  else if (n <= 32)
+info->limb_mode = SImode;
+  else
+info->limb_mode = DImode;
+
+  if (n > 128)
+info->abi_limb_mode = TImode;
+  else
+info->abi_limb_mode = info->limb_mode;
+  info->big_endian = TARGET_BIG_END;
+  info->extended = false;
+  return true;
+}
+
 /* Implement TARGET_SCHED_CAN_SPECULATE_INSN.  Return true if INSN can be
scheduled for speculative execution.  Reject the long-running division
and square-root instructions.  */
@@ -30374,6 +30406,9 @@ aarch64_run_selftests (void)
 #undef TARGET_C_EXCESS_PRECISION
 #define TARGET_C_EXCESS_PRECISION aarch64_excess_precision
 
+#undef TARGET_C_BITINT_TYPE_INFO
+#define TARGET_C_BITINT_TYPE_INFO aarch64_bitint_type_info
+
 #undef  TARGET_EXPAND_BUILTIN
 #define TARGET_EXPAND_BUILTIN aarch64_expand_builtin
 
diff --git a/libgcc/config/aarch64/t-softfp b/libgcc/config/aarch64/t-softfp
index 
2e32366f891361e2056c680b2e36edb1871c7670..4302ad52eb881825d0fb65b9ebd21031781781f5
 100644
--- a/libgcc/config/aarch64/t-softfp
+++ b/libgcc/config/aarch64/t-softfp
@@ -4,7 +4,8 @@ softfp_extensions := sftf dftf hftf bfsf
 softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
 softfp_exclude_libgcc2 := n
 softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
-floatdibf floatundibf floattibf floatuntibf
+floatdibf floatundibf floattibf floatuntibf \
+fixtfbitint floatbitinttf
 
 TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
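
A Python restatement of the limb-mode choice in the RFC's
aarch64_bitint_type_info, with modes expressed as byte sizes (QI=1, HI=2,
SI=4, DI=8, TI=16). The bitint-7 ICE mentioned above fits this model:
_BitInt(135) with 64-bit limbs needs ceil(135/64) = 3 limbs, so valid limb
indices are 0 through 2:

```python
def bitint_limb_bytes(n):
    """Return (limb_bytes, abi_limb_bytes) for _BitInt(n), mirroring the RFC."""
    if n <= 8:
        limb = 1
    elif n <= 16:
        limb = 2
    elif n <= 32:
        limb = 4
    else:
        limb = 8
    abi_limb = 16 if n > 128 else limb  # > 128 bits: TImode ABI limb
    return limb, abi_limb

print(bitint_limb_bytes(135))
# (8, 16)
```

This is only a model of the proposed hook's arithmetic, not of the final ABI,
which is still being settled.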
 


Re: [PATCH V2 2/4][RFC] RISC-V: Add vector related reservations

2024-01-10 Thread Robin Dapp
> Since all the pipelines should be tuned to their cost model, they
> would be different anyway. If it would be simpler for now, I could
> separate the files out.
> I think I'm getting a bit confused. Is there a reason why we would
> want to exchange scheduler descriptions like the example you
> provided? I'm just thinking why a in-order model would want to use an
> ooo vector model and vice versa. Please correct me if I got the wrong
> idea.

Yeah, the confusion is understandable as it's all in flow and several
things I mentioned are artifacts of us not yet being stabilized (or
actually having hard data to base our decisions on).

Usually, once a uarch has settled there is no reason to exchange
anything, just smaller tweaks might be done.  I was more thinking of
the near to mid-term future where larger changes like ripping out
one thing and using another one altogether might still happen.

Regarding out of order vs in order - for in-order pipelines we will
always want to get latencies right.  For out of order it is a balancing
act (proper latencies often mean more spilling and the processor will
reorder correctly anyway).

So you're mostly right that the argument is not very strong as soon
as we really know what to do and not to do.

> I also want to double check, isn't forcing all typed instructions to
> be part of a dfa pipeline in effect removing a situation where a tune
> model does not specify a "vector tune model"? At least from my
> testing with the assert statement, I get ICEs when trying to run the
> testsuite without the vector tune model even on gc.

There are (at least) three parts of the "tune model":
 - vector cost model, specifying the cost of generic vector operations,
   not necessarily corresponding to an insn
 - insn cost, specifying the cost of an individual insn, usually close
   to latency but sometimes also "complexity" or other things.
 - insn latency and other hardware scheduler properties.

We can leave out any of those which will make us fall back to default
values.  Even if we forced a scheduler description we could still have
the default fallback for the other two and generate unfavorable code
as a result.

However, this is of course not desirable and we will soon have a
reasonable vector cost model that corresponds to the non-uarch
specific properties of the vector spec.  Once this is in place
we will also want a somewhat generic vector scheduler description
that goes hand in hand with that.  Despite the name, the vector
part of generic-ooo could be used for in-order vector uarchs and
we might want to define a different description for out-of-order
uarchs.  That's a separate discussion but at least for that
contingency it would make sense to easily interchange the scheduler
description ;)

Regards
 Robin
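
A hypothetical sketch of the fallback behaviour described above: a tune model
has (at least) three independent parts, and any part left unset falls back to
a generic default (all names below are invented for illustration):

```python
GENERIC_DEFAULTS = {
    "vector_cost": "generic-rvv-costs",
    "insn_cost": "default-insn-costs",
    "sched": "generic-vector-ooo",
}

def resolve_tune(tune):
    """Fill unset tune-model slots from the generic defaults."""
    return {key: tune.get(key, GENERIC_DEFAULTS[key]) for key in GENERIC_DEFAULTS}

print(resolve_tune({"sched": "sifive-7-series"}))
# {'vector_cost': 'generic-rvv-costs', 'insn_cost': 'default-insn-costs',
#  'sched': 'sifive-7-series'}
```

The point being that even a forced scheduler description can coexist with
default values for the other two parts, which is why splitting the vector
reservations into their own md file keeps them interchangeable.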

