date:20250207


On 2/7/25 4:04 PM, Simon Martin wrote:

Hi Jason,

On 7 Feb 2025, at 14:21, Jason Merrill wrote:


On 2/6/25 3:05 PM, Simon Martin wrote:

Hi Jason,

On 6 Feb 2025, at 16:48, Jason Merrill wrote:


On 2/5/25 2:21 PM, Simon Martin wrote:

Hi Jason,

On 4 Feb 2025, at 21:23, Jason Merrill wrote:


On 2/4/25 3:03 PM, Jason Merrill wrote:

On 2/4/25 11:45 AM, Simon Martin wrote:

On 4 Feb 2025, at 17:17, Jason Merrill wrote:


On 2/4/25 10:56 AM, Simon Martin wrote:

Hi Jason,

On 4 Feb 2025, at 16:39, Jason Merrill wrote:


On 1/15/25 9:56 AM, Jason Merrill wrote:

On 1/15/25 7:24 AM, Simon Martin wrote:

Hi,

On 14 Jan 2025, at 23:31, Jason Merrill wrote:


On 1/14/25 2:13 PM, Simon Martin wrote:

On 10 Jan 2025, at 19:10, Andrew Pinski wrote:

On Fri, Jan 10, 2025 at 3:18 AM Simon Martin wrote:


We currently accept the following invalid code (EDG and MSVC do as well)


clang does too:
https://github.com/llvm/llvm-project/issues/121706 .




Note it might be useful if a testcase with multiple `*` is included too:
```
struct A {
         A ();
};
```

Thanks, makes sense to add those. Done in the attached updated revision,
successfully tested on x86_64-pc-linux-gnu.



+/* Check that it's OK to declare a function at ID_LOC with the indicated TYPE,
+   TYPE_QUALS and DECLARATOR.  SFK indicates the kind of special function (if
+   any) that this function is.  OPTYPE is the type given in a conversion
    operator declaration, or the class type for a constructor/destructor.
    Returns the actual return type of the function; that may be different
    than TYPE if an error occurs, or for certain special functions.  */
@@ -12361,8 +12362,19 @@ check_special_function_return_type (special_function_kind sfk,
                                     tree type,
                                     tree optype,
                                     int type_quals,
+                                    const cp_declarator *declarator,
+                                    location_t id_loc,

id_loc should be the same as declarator->id_loc?

You’re right.


                                     const location_t* locations)
 {
+  /* If TYPE is unspecified, DECLARATOR, if set, should not represent a pointer
+     or a reference type.  */
+  if (type == NULL_TREE
+      && declarator
+      && (declarator->kind == cdk_pointer
+          || declarator->kind == cdk_reference))
+    error_at (id_loc, "expected unqualified-id before %qs token",
+              declarator->kind == cdk_pointer ? "*" : "&");


...and id_loc isn't the location of the ptr-operator, it's the location of the
identifier, so this indicates the wrong column.  I think using
declarator->id_loc makes sense, just not pretending it's the location of the *.

Good catch, thanks.


Let's give diagnostics more like the others later in the function instead of
trying to emulate cp_parser_error.

Makes sense. This is what the updated patch does, successfully tested on
x86_64-pc-linux-gnu. OK for GCC 16?


OK.


Does this also fix 118304?  If so, let's go ahead and apply it to GCC 15.

I have checked just now, and we still ICE for 118304’s testcase with that fix.


Why doesn't the preexisting

type = void_type_node;

in check_special_function_return_type fix the return type and avoid the ICE?

We hit the gcc_assert at method.cc:3593, that Marek’s fix bypasses.


Yes, but why doesn't check_special_function_return_type prevent that?


Ah, because we call it before walking the declarator.  We need to check again
later, perhaps in grokfndecl, that the type is correct.  Perhaps instead of
your patch.

One “issue” with adding another check in or close to grokfndecl is that
DECLARATOR will have “been moved to the ID”, and the fact that we had a
CDK_POINTER kind is “lost”. We could obviously somehow propagate this
information, but there might be something easier.


The information isn't lost: it's now reflected in the (wrong) return type.  One
place it would make sense to check would be


  if (ctype && (sfk == sfk_constructor
                || sfk == sfk_destructor))
    {
      /* We are within a class's scope. If our declarator name
         is the same as the class name, and we are defining
         a function, then it is a constructor/destructor, and
         therefore returns a void type.  */


Here 'type' is still the return type, we haven't gotten to
build_function_type yet.

That’s true. However, doesn’t it make sense to cram all the checks about the
return type of special functions in check_special_function_return_type, and
return an error if that return type is invalid?


This error seems easily recoverable since we know what the type needs
to be, there's no need for error return from grokdeclarator.

ACK.


However, an alternative to my suggestion above would be to build on
your patch by making check_special_function_

[PATCH, COMMITTED] rs6000: Add cast to avoid pointer to integer comparison warning [PR117674]

2025-02-07 Thread Peter Bergner

I pushed the following fix as obvious after testing the build and verifying
the warning was silenced.

Peter



rs6000: Add cast to avoid pointer to integer comparison warning [PR117674]

libgcc/
PR target/117674
* config/rs6000/linux-unwind.h (ppc_backchain_fallback): Add cast to
avoid comparison between pointer and integer warning.

diff --git a/libgcc/config/rs6000/linux-unwind.h b/libgcc/config/rs6000/linux-unwind.h
index 97a9fbd2d1a..6fd3c908ae8 100644
--- a/libgcc/config/rs6000/linux-unwind.h
+++ b/libgcc/config/rs6000/linux-unwind.h
@@ -395,7 +395,7 @@ ppc_backchain_fallback (struct _Unwind_Context *context, void *a)
   current = context->cfa;
 
   /* If the trace CFA is not the context CFA the backtrace is done.  */
-  if (arg == NULL || arg->cfa != current)
+  if (arg == NULL || arg->cfa != (_Unwind_Word) current)
     return;
 
   /* Start with next address.  */
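
The warning silenced here comes from comparing an integer-typed field with a pointer. A minimal sketch of the same pattern follows; `word_t`, `struct trace_arg` and `same_frame` are hypothetical names chosen for illustration and are not the real libgcc unwinder types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for _Unwind_Word: an integer type wide enough
   to hold a pointer value.  */
typedef uintptr_t word_t;

/* Hypothetical stand-in for the trace argument; only the field that
   takes part in the comparison is modelled.  */
struct trace_arg
{
  word_t cfa;
};

/* Return nonzero when ARG records the same frame address as CURRENT.
   Writing `arg->cfa == current' would compare an integer with a
   pointer, which compilers diagnose; the cast makes both sides
   integers.  */
int
same_frame (const struct trace_arg *arg, void *current)
{
  return arg != NULL && arg->cfa == (word_t) current;
}
```

The cast changes only the type the comparison is done in, so the sketch behaves the same as the uncast comparison would on targets where the conversion is value-preserving.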

Re: [PATCH, V2] Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.

2025-02-07 Thread Michael Meissner

On Fri, Feb 07, 2025 at 05:51:19PM -0600, Peter Bergner wrote:
> On 2/7/25 4:02 PM, Michael Meissner wrote:
> >  (define_predicate "invert_fpmask_comparison_operator"
> > -  (match_code "ne,unlt,unle"))
> > +  (ior (match_code "ne")
> > +   (and (match_code "unlt,unle")
> > +   (match_test "!HONOR_NANS (DFmode) || !TARGET_P9_VECTOR"
> 
> Is it always safe to use DFmode here in the HONOR_NANS macro?
> Meaning does it always give the same answer as using SFmode, TFmode,
> IFmode and KFmode?  Ditto for the other use of HONOR_NANS (DFmode).

The problem is we do not have the original floating point mode.  The mode in
question is CCFPmode, i.e. the mode used for setting the CR registers.  The
code originally just used !flag_finite_math.  I could go back to that, I
thought HONOR_NANS was clearer.

> >  enum rtx_code
> > -rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
> > +rs6000_reverse_condition (machine_mode mode,
> > + enum rtx_code code,
> > + bool no_ordered)
> 
> I'm not sure I'm a fan of the no_ordered name.  Maybe use not_ordered
> instead?  Or maybe even better, rename it to "ordered" and modify the
> code that uses it to handle the reversed meaning?

I can do that, but I would probably name it something like
ordered_compare_ok.  Just ordered by itself would say to me that only ordered
comparisons are allowed, when the majority of comparisons that the function
sees are the normal unordered comparisons (i.e. EQ, NE, GT, GE, LE, and LT).

> >/* Reversal of FP compares takes care -- an ordered compare
> > - becomes an unordered compare and vice versa.  */
> > + becomes an unordered compare and vice versa.
> > +
> > + However, this is not safe for ordered comparisons (i.e. for isgreater,
> > + etc.)  starting with the power9 because ifcvt.cc will want to create a fp
> > + cmove, and the x{s,v}cmp{eq,gt,ge}{dp,qp} instructions will trap if one of
> > + the arguments is a signalling NaN.  */
> 
> I think this could use a little wordsmithing and there is some stray
> whitespace.  You also explicitly mention signalling NaN, but the code
> uses HONOR_NANS, not HONOR_SNANS.  Is that intentional?

As I said in my previous reply, it is intentional.  The option -fsignaling-nans
is not normally set.  However, I felt that if the user explicitly used one of
the ordered comparisons (i.e. isgreater in this case) that the compiler should
explicitly generate code that is safe for using signalling NaNs because that is
how those functions are described.

If the original user (and the library writers, etc.) all use -fsignaling-nans,
then I would be happy to change it to HONOR_SNANS.

Or we can just go back to using !flag_finite_math.

> > +/* { dg-do compile } */
> > +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
> > +/* { dg-require-effective-target powerpc_vsx } */
> > +
> > +/* PR target/118541 says that the ordered comparison functions like isgreater
> > +   should not optimize floating point conditional moves to use
> > +   x{s,v}cmp{eq,gt,ge}{dp,qp} and xxsel since that instruction can cause traps
> > +   if one of the arguments is a signaling NaN.  */
> > +
> > +/* Verify isgreater does not generate xscmpgtdp.  */
> [snip]
> > +/* { dg-final { scan-assembler-times {\mxscmpg[te]dp\M}   1 } } */
> > +/* { dg-final { scan-assembler-times {\mxxsel\M}  1 } } */
> > +/* { dg-final { scan-assembler-times {\mxscmpudp\M|\mfcmpu\M} 1 } } */
> 
> I think this would be safer if we split this into two test cases, one with
> each of the functions.  I'm worried that if we were to somehow accidentally
> swap the results of your new code, we'd still produce one each of the
> instructions above and we wouldn't notice.  I think it's safer to have one
> test case for each function here (ordered and normal) and explicitly look
> for the insns you want, while at the same time using scan-assembler-not for
> the insns you don't want to see.

Sounds reasonable.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

[PATCH] RISC-V: Vector pseudo insns with x0 operand to use imm 0. (toggle)

2025-02-07 Thread 钟居哲

>> But I don't think offering the -mvec-elide-x0 option is beneficial.
>> I'd just enable this change unconditionally.  Or, in the unlikely
>> event there's a uarch that benefits from the old code generation, this
>> would be better handled as a consequence of -mtune than as a new
>> top-level option.
I agree with Andrew. I think the change should be unconditional instead of
providing the -mvec-elide-x0 option.

Thanks.


juzhe.zh...@rivai.ai

Re: [PATCH, V2] Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.

2025-02-07 Thread Michael Meissner

On Fri, Feb 07, 2025 at 05:51:19PM -0600, Peter Bergner wrote:
> On 2/7/25 4:02 PM, Michael Meissner wrote:
> >  (define_predicate "invert_fpmask_comparison_operator"
> > -  (match_code "ne,unlt,unle"))
> > +  (ior (match_code "ne")
> > +   (and (match_code "unlt,unle")
> > +   (match_test "!HONOR_NANS (DFmode) || !TARGET_P9_VECTOR"
> 
> Is it always safe to use DFmode here in the HONOR_NANS macro?
> Meaning does it always give the same answer as using SFmode, TFmode,
> IFmode and KFmode?  Ditto for the other use of HONOR_NANS (DFmode).

I forgot to mention that rs6000_reverse_condition is called in several
contexts.

One is the case that ifcvt.cc calls REVERSE_CONDITION, and that is the case
that we don't want UNLT to be converted to GE or UNLE to be converted to GT,
because that is the place where we are creating floating point conditional moves.

The second case is around line 13454 of rs6000.md where we are reversing a
branch after setting a CR register.  In this case, we explicitly want to
reverse UNLT to GE because we used FCMPU or XSCMPUDP to set the condition code,
and that does not cause traps.

The third case is rs6000_emit_sCOND in rs6000.cc.  There we do not want ordered
comparisons converted to floating point conditional moves.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
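
The three call contexts described above can be modelled in a small sketch. Everything in it is an assumption made for illustration: the `cmp_code` enum, `eval_cmp` and `reverse_fp_condition` are hypothetical names, not GCC's rtx codes or the patched `rs6000_reverse_condition`, and only the two reversals named in the message (UNLT to GE, UNLE to GT) are gated by the flag.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the comparison codes discussed above;
   these are not GCC's real definitions.  */
enum cmp_code
{
  CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE,
  CMP_UNLT, CMP_UNLE, CMP_UNGT, CMP_UNGE, CMP_UNKNOWN
};

/* Evaluate CODE on A and B.  The UN* codes and NE are also true when
   either operand is a NaN.  */
bool
eval_cmp (enum cmp_code code, double a, double b)
{
  bool un = isunordered (a, b);
  switch (code)
    {
    case CMP_EQ:   return !un && a == b;
    case CMP_NE:   return un || a != b;
    case CMP_LT:   return isless (a, b);
    case CMP_LE:   return islessequal (a, b);
    case CMP_GT:   return isgreater (a, b);
    case CMP_GE:   return isgreaterequal (a, b);
    case CMP_UNLT: return un || isless (a, b);
    case CMP_UNLE: return un || islessequal (a, b);
    case CMP_UNGT: return un || isgreater (a, b);
    case CMP_UNGE: return un || isgreaterequal (a, b);
    default:       return false;
    }
}

/* Reverse CODE so that the result is true exactly when CODE is false,
   NaN operands included.  When ORDERED_OK is false, the two reversals
   that would produce GE or GT from an unordered code are refused by
   returning CMP_UNKNOWN (assumed behaviour for the cmove context).  */
enum cmp_code
reverse_fp_condition (enum cmp_code code, bool ordered_ok)
{
  switch (code)
    {
    case CMP_EQ:   return CMP_NE;
    case CMP_NE:   return CMP_EQ;
    case CMP_LT:   return CMP_UNGE;
    case CMP_LE:   return CMP_UNGT;
    case CMP_GT:   return CMP_UNLE;
    case CMP_GE:   return CMP_UNLT;
    case CMP_UNLT: return ordered_ok ? CMP_GE : CMP_UNKNOWN;
    case CMP_UNLE: return ordered_ok ? CMP_GT : CMP_UNKNOWN;
    case CMP_UNGT: return CMP_LE;
    case CMP_UNGE: return CMP_LT;
    default:       return CMP_UNKNOWN;
    }
}
```

In this model a caller in the branch-reversal context would pass `true` for `ordered_ok`, while a caller that may end up forming a floating point conditional move would pass `false` and treat `CMP_UNKNOWN` as "cannot reverse".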

Re: [PATCH] x86: Verify that PUSH/POP can be skipped

2025-02-07 Thread Hongtao Liu

On Fri, Feb 7, 2025 at 1:57 PM H.J. Lu  wrote:
>
> For
>
> ---
> int f(int);
>
> int advance(int dz)
> {
>     if (dz > 0)
>         return (dz + dz) * dz;
>     else
>         return dz * f(dz);
> }
> ---
>
> Before r15-1619-g3b9b8d6cfdf593
>
> advance(int):
>         push    rbx
>         mov     ebx, edi
>         test    edi, edi
>         jle     .L2
>         imul    ebx, edi
>         lea     eax, [rbx+rbx]
>         pop     rbx
>         ret
> .L2:
>         call    f(int)
>         imul    eax, ebx
>         pop     rbx
>         ret
>
> After
>
> advance(int):
>         test    edi, edi
>         jle     .L2
>         imul    edi, edi
>         lea     eax, [rdi+rdi]
>         ret
> .L2:
>         sub     rsp, 24
>         mov     DWORD PTR [rsp+12], edi
>         call    f(int)
>         imul    eax, DWORD PTR [rsp+12]
>         add     rsp, 24
>         ret
>
> There's no call in the if branch, so it's not optimal to push rbx at the entry
> of the function; the push can be sunk into the else branch. When "jle .L2" is
> not taken, this saves one push instruction.  Update pr111673.c to verify
> that this optimization isn't turned off.
Ok
>
> PR rtl-optimization/111673
> * gcc.target/i386/pr111673.c: Verify that PUSH/POP can be
> skipped.
>
>
> --
> H.J.



-- 
BR,
Hongtao

[PATCH, V2] Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.

2025-02-07 Thread Michael Meissner

This is version 2 of the patch.

Changes from the V1 patch:

1: I added a test in invert_fpmask_comparison_operator to not allow UNLE and
UNLT unless fast math is in force.  Both invert_fpmask_comparison_operator and
fpmask_comparison_operator are used to form floating point conditional moves on
Power9 and beyond.

2: I reworked rs6000_reverse_condition to be a bit clearer when we are rejecting
reversing IEEE comparisons that guarantee they don't trap.

I have built bootstrap compilers on big endian power9 systems and little endian
power9/power10 systems and there were no regressions.  Can I check this patch
into the GCC trunk, and after a waiting period, can I check this into the active
older branches?

In bug PR target/118541 on power9, power10, and power11 systems, for the
function:

extern double __ieee754_acos (double);

double
__acospi (double x)
{
  double ret = __ieee754_acos (x) / 3.14;
  return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
}

GCC currently generates the following code:

Power9                          Power10 and Power11
======                          ===================
bl __ieee754_acos               bl __ieee754_acos@notoc
nop                             plfd 0,.LC0@pcrel
addis 9,2,.LC2@toc@ha           xxspltidp 12,1065353216
addi 1,1,32                     addi 1,1,32
lfd 0,.LC2@toc@l(9)             ld 0,16(1)
addis 9,2,.LC0@toc@ha           fdiv 0,1,0
ld 0,16(1)                      mtlr 0
lfd 12,.LC0@toc@l(9)            xscmpgtdp 1,0,12
fdiv 0,1,0                      xxsel 1,0,12,1
mtlr 0                          blr
xscmpgtdp 1,0,12
xxsel 1,0,12,1
blr

This is because ifcvt.cc optimizes the conditional floating point move to use the
XSCMPGTDP instruction.

However, the XSCMPGTDP instruction will generate an interrupt if one of the
arguments is a signalling NaN and signalling NaNs are enabled to generate an
interrupt.  The IEEE comparison functions (isgreater, etc.) require that the
comparison not raise an interrupt.

The following patch changes the PowerPC back end so that ifcvt.cc will not change
the if/then test and move into a conditional move if the comparison is one of
the comparisons that do not raise an error with signalling NaNs and -Ofast is
not used.  If a normal comparison is used or -Ofast is used, GCC will continue
to generate XSCMPGTDP and XXSEL.

For the following code:

double
ordered_compare (double a, double b, double c, double d)
{
  return __builtin_isgreater (a, b) ? c : d;
}

/* Verify normal > does generate xscmpgtdp.  */

double
normal_compare (double a, double b, double c, double d)
{
  return a > b ? c : d;
}

with the following patch, GCC generates the following for power9, power10, and
power11:

ordered_compare:
        fcmpu 0,1,2
        fmr 1,4
        bnglr 0
        fmr 1,3
        blr

normal_compare:
        xscmpgtdp 1,1,2
        xxsel 1,4,3,1
        blr

2025-02-06  Michael Meissner  

gcc/

PR target/118541
* config/rs6000/predicates.md (invert_fpmask_comparison_operator): Do
not allow UNLT and UNLE unless -ffast-math.
* config/rs6000/rs6000-protos.h (REVERSE_COND_ORDERED_OK): New macro.
(REVERSE_COND_NO_ORDERED): Likewise.
(rs6000_reverse_condition): Add argument.
* config/rs6000/rs6000.cc (rs6000_reverse_condition): Do not allow
ordered comparisons to be reversed for floating point cmoves.
(rs6000_emit_sCOND): Adjust rs6000_reverse_condition call.
* config/rs6000/rs6000.h (REVERSE_CONDITION): Likewise.
* config/rs6000/rs6000.md (reverse_branch_comparison): Name insn.
Adjust rs6000_reverse_condition call.

gcc/testsuite/

PR target/118541
* gcc.target/powerpc/pr118541.c: New test.
---
 gcc/config/rs6000/predicates.md |  4 +-
 gcc/config/rs6000/rs6000-protos.h   |  6 ++-
 gcc/config/rs6000/rs6000.cc | 23 ---
 gcc/config/rs6000/rs6000.h  | 10 -
 gcc/config/rs6000/rs6000.md | 24 +++-
 gcc/testsuite/gcc.target/powerpc/pr118541.c | 43 +
 6 files changed, 92 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541.c

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 647e89afb6a..4c79b5da595 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1467,7 +1467,9 @@ (define_predicate "fpmask_comparison_operator"
 ;; comparisons that generate a 0/-1 mask (i.e. the inverse of
 ;; fpmask_comparison_operator).
 (define_predicate "invert_fpmask_comparison_operator"
-  (match_code "ne,unlt,unle"))
+  (ior (

[PATCH v3] c++: Reject cdtors and conversion operators with a single * as return type [PR118306]

2025-02-07 Thread Simon Martin

Hi Jason,

On 7 Feb 2025, at 14:21, Jason Merrill wrote:

> On 2/6/25 3:05 PM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 6 Feb 2025, at 16:48, Jason Merrill wrote:
>>
>>> On 2/5/25 2:21 PM, Simon Martin wrote:
 Hi Jason,

 On 4 Feb 2025, at 21:23, Jason Merrill wrote:

> On 2/4/25 3:03 PM, Jason Merrill wrote:
>> On 2/4/25 11:45 AM, Simon Martin wrote:
>>> On 4 Feb 2025, at 17:17, Jason Merrill wrote:
>>>
 On 2/4/25 10:56 AM, Simon Martin wrote:
> Hi Jason,
>
> On 4 Feb 2025, at 16:39, Jason Merrill wrote:
>
>> On 1/15/25 9:56 AM, Jason Merrill wrote:
>>> On 1/15/25 7:24 AM, Simon Martin wrote:
 Hi,

 On 14 Jan 2025, at 23:31, Jason Merrill wrote:

> On 1/14/25 2:13 PM, Simon Martin wrote:
>> On 10 Jan 2025, at 19:10, Andrew Pinski wrote:
>>> On Fri, Jan 10, 2025 at 3:18 AM Simon Martin
>>> 
>>> wrote:

 We currently accept the following invalid code (EDG and

 MSVC
 do
 as
 well)
>>>
>>> clang does too:
>>> https://github.com/llvm/llvm-project/issues/121706 .

>>>
>>> Note it might be useful if a testcase with multiply `*` 
>>> is
>>> included
>>
>>> too:
>>> ```
>>> struct A {
>>>         A ();
>>> };
>>> ```
>> Thanks, makes sense to add those. Done in the attached
>> updated
>> revision,
>> successfully tested on x86_64-pc-linux-gnu.
>
>> +/* Check that it's OK to declare a function at ID_LOC 

>> with
>> the
>> indicated TYPE,
>> +   TYPE_QUALS and DECLARATOR.  SFK indicates the kind
>> of
>> special
>> function (if
>> +   any) that this function is.  OPTYPE is the type
>> given
>> in
>> a
>> conversion
>>      operator declaration, or the class type for 
>> a
>> constructor/destructor.
>>      Returns the actual return type of the
>> function;
>> that
>> may
>> be
>> different
>>      than TYPE if an error occurs, or for 
>> certain
>> special
>> functions.
>> */
>> @@ -12361,8 +12362,19 @@ 
>> check_special_function_return_type
>> (special_function_kind sfk,
>>       tree 
>> type,
>>       tree 
>> optype,
>>       int
>> type_quals,
>> +    const 
>> cp_declarator
>> *declarator,
>> +    location_t 
>> id_loc,
>
> id_loc should be the same as declarator->id_loc?
 You’re right.

>>       const
>> location_t*
>> locations)
>>       {
>> +  /* If TYPE is unspecified, DECLARATOR, if set, should
>> not
>> represent a pointer
>> + or a reference type.  */
>> +  if (type == NULL_TREE
>> +  && declarator
>> +  && (declarator->kind == cdk_pointer
>> +  || declarator->kind == cdk_reference))
>> +    error_at (id_loc, "expected unqualified-id before
>> %qs
>> token",
>> +  declarator->kind == cdk_pointer ? "*" 
>> :
>> "&");
>
> ...and id_loc isn't the location of the ptr-operator, it's

> the

> location of the identifier, so this indicates the wrong
> column.
> I
> think using declarator->id_loc makes sense, just not
> pretending
> it's
> the location of the *.
 Good catch, thanks.

> Let's give diagnostics more like the others later in the
> function
> instead of trying to emulate cp_parser_error.
 Makes sense. This is what the updated patch does,
 successfully
 tested on
 x86_64-pc-linux-gnu. OK for GCC 16?
>>>
>>> OK.
>>
>> Does this also fix 118304?  If so, let's go ahead and apply 
>> it
>> to
>> GCC
>> 15.
> I have checked just now, and we still ICE for 118304’s

> testcase

Re: [PATCH v1 04/16] Remove unecessary `record` argument from maybe_version_functions.

2025-02-07 Thread Andrew Carlotti

On Mon, Feb 03, 2025 at 01:04:08PM +, Alfie Richards wrote:
> 
> The `record` argument in maybe_version_functions was intended to allow
> controlling recording the relationship of versions. However, it only
> exercised this if both input functions were already marked as versioned,
> and this same logic is repeated in maybe_version_functions itself so the
> argument is unnecessary.

I think this commit message is inaccurate, but it's quite confusing to me. How
about the following instead:

Previously, the `record` argument in maybe_version_functions allowed the call to
cgraph_node::record_function_versions to be skipped.  However, this was only
skipped when both decls were already marked as versioned, in which case we will 
now trigger the early exit in record_function_versions instead.

The patch otherwise looks fine to me.

> 
> gcc/cp/ChangeLog:
> 
>   * class.cc (add_method): Remove argument.
>   * cp-tree.h (maybe_version_functions): Ditto.
>   * decl.cc (decls_match): Ditto.
>   (maybe_version_functions): Ditto.
> ---
>  gcc/cp/class.cc  | 2 +-
>  gcc/cp/cp-tree.h | 2 +-
>  gcc/cp/decl.cc   | 9 +++--
>  3 files changed, 5 insertions(+), 8 deletions(-)
> 

> diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
> index f2f81a44718..a9a80d1b4be 100644
> --- a/gcc/cp/class.cc
> +++ b/gcc/cp/class.cc
> @@ -1402,7 +1402,7 @@ add_method (tree type, tree method, bool via_using)
>/* If these are versions of the same function, process and
>move on.  */
>if (TREE_CODE (fn) == FUNCTION_DECL
> -   && maybe_version_functions (method, fn, true))
> +   && maybe_version_functions (method, fn))
>   continue;
>  
>if (DECL_INHERITED_CTOR (method))
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index ec976928f5f..8eba8d455be 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7114,7 +7114,7 @@ extern void determine_local_discriminator   (tree, tree = NULL_TREE);
>  extern bool member_like_constrained_friend_p (tree);
>  extern bool fns_correspond   (tree, tree);
>  extern int decls_match   (tree, tree, bool = true);
> -extern bool maybe_version_functions  (tree, tree, bool);
> +extern bool maybe_version_functions  (tree, tree);
>  extern bool validate_constexpr_redeclaration (tree, tree);
>  extern bool merge_default_template_args  (tree, tree, bool);
>  extern tree duplicate_decls  (tree, tree,
> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> index cf5e055e146..3b3b4481964 100644
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -1215,9 +1215,7 @@ decls_match (tree newdecl, tree olddecl, bool record_versions /* = true */)
> && targetm.target_option.function_versions (newdecl, olddecl))
>   {
> if (record_versions)
> - maybe_version_functions (newdecl, olddecl,
> -  (!DECL_FUNCTION_VERSIONED (newdecl)
> -   || !DECL_FUNCTION_VERSIONED (olddecl)));
> + maybe_version_functions (newdecl, olddecl);
> return 0;
>   }
>  }
> @@ -1288,7 +1286,7 @@ maybe_mark_function_versioned (tree decl)
> If RECORD is set to true, record function versions.  */
>  
>  bool
> -maybe_version_functions (tree newdecl, tree olddecl, bool record)
> +maybe_version_functions (tree newdecl, tree olddecl)
>  {
>if (!targetm.target_option.function_versions (newdecl, olddecl))
>  return false;
> @@ -1311,8 +1309,7 @@ maybe_version_functions (tree newdecl, tree olddecl, bool record)
>maybe_mark_function_versioned (newdecl);
>  }
>  
> -  if (record)
> -cgraph_node::record_function_versions (olddecl, newdecl);
> +  cgraph_node::record_function_versions (olddecl, newdecl);
>  
>return true;
>  }

[wwwdocs] More AVR news

2025-02-07 Thread Georg-Johann Lay


Applied this one.

Johann

--

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 362f345c..6a41ac97 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -376,6 +376,12 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
   Code generation for the 32-bit integer shifts with constant
 offset has been improved. The code size may slightly increase even
 when optimizing for code size with -Os.
+  Support has been added for a compact vector table as supported
+by some AVR devices.  It can be activated by the new command-line option
+href="https://gcc.gnu.org/onlinedocs/gcc/AVR-Options.html#index-mcvt";

+   >-mcvt.
+It links crtmcu-cvt.o as startup code that
+is supported since AVR-LibC v2.3.
   New AVR specific optimization passes have been added.
 They run after register allocation and can be controlled by the new
 command-line options

Re: [PATCH, V2] Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.

2025-02-07 Thread Peter Bergner

On 2/7/25 4:02 PM, Michael Meissner wrote:
>  (define_predicate "invert_fpmask_comparison_operator"
> -  (match_code "ne,unlt,unle"))
> +  (ior (match_code "ne")
> +   (and (match_code "unlt,unle")
> + (match_test "!HONOR_NANS (DFmode) || !TARGET_P9_VECTOR"

Is it always safe to use DFmode here in the HONOR_NANS macro?
Meaning does it always give the same answer as using SFmode, TFmode,
IFmode and KFmode?  Ditto for the other use of HONOR_NANS (DFmode).

>  enum rtx_code
> -rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
> +rs6000_reverse_condition (machine_mode mode,
> +   enum rtx_code code,
> +   bool no_ordered)

I'm not sure I'm a fan of the no_ordered name.  Maybe use not_ordered
instead?  Or maybe even better, rename it to "ordered" and modify the
code that uses it to handle the reversed meaning?

>/* Reversal of FP compares takes care -- an ordered compare
> - becomes an unordered compare and vice versa.  */
> + becomes an unordered compare and vice versa.
> +
> + However, this is not safe for ordered comparisons (i.e. for isgreater,
> + etc.)  starting with the power9 because ifcvt.cc will want to create a fp
> + cmove, and the x{s,v}cmp{eq,gt,ge}{dp,qp} instructions will trap if one of
> + the arguments is a signalling NaN.  */

I think this could use a little wordsmithing and there is some stray
whitespace.  You also explicitly mention signalling NaN, but the code
uses HONOR_NANS, not HONOR_SNANS.  Is that intentional?

>  /* Can the condition code MODE be safely reversed?  This is safe in
> all cases on this port, because at present it doesn't use the
> -   trapping FP comparisons (fcmpo).  */
> +   trapping FP comparisons (fcmpo).
> +
> +   However, this is not safe for ordered comparisons (i.e. for isgreater, etc.)
> +   starting with the power9 because ifcvt.cc will want to create a fp cmove,
> +   and the x{s,v}cmp{eq,gt,ge}{dp,qp} instructions will trap if one of the
> +   arguments is a signalling NaN.  */

Likewise.

> +/* { dg-do compile } */
> +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +
> +/* PR target/118541 says that the ordered comparison functions like isgreater
> +   should not optimize floating point conditional moves to use
> +   x{s,v}cmp{eq,gt,ge}{dp,qp} and xxsel since that instruction can cause traps
> +   if one of the arguments is a signaling NaN.  */
> +
> +/* Verify isgreater does not generate xscmpgtdp.  */
[snip]
> +/* { dg-final { scan-assembler-times {\mxscmpg[te]dp\M}   1 } } */
> +/* { dg-final { scan-assembler-times {\mxxsel\M}  1 } } */
> +/* { dg-final { scan-assembler-times {\mxscmpudp\M|\mfcmpu\M} 1 } } */

I think this would be safer if we split this into two test cases, one with
each of the functions.  I'm worried that if we were to somehow accidentally
swap the results of your new code, we'd still produce one each of the
instructions above and we wouldn't notice.  I think it's safer to have one
test case for each function here (ordered and normal) and explicitly look
for the insns you want, while at the same time using scan-assembler-not for
the insns you don't want to see.

Peter
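
One way the suggested split could look for the ordered half is sketched below. The directives at the top and the `ordered_compare` function are taken from the patch under review; the `scan-assembler-not` patterns are derived from the quoted `scan-assembler-times` lines, and whether the final test uses exactly these patterns (or this file layout) is an assumption.

```c
/* { dg-do compile } */
/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
/* { dg-require-effective-target powerpc_vsx } */

/* Ordered half of the split test: isgreater must not be turned into a
   floating point conditional move.  */

double
ordered_compare (double a, double b, double c, double d)
{
  return __builtin_isgreater (a, b) ? c : d;
}

/* The trapping compare and the select must be absent...  */
/* { dg-final { scan-assembler-not {\mxscmpg[te]dp\M} } } */
/* { dg-final { scan-assembler-not {\mxxsel\M} } } */
/* ...and exactly one non-trapping compare must be present.  */
/* { dg-final { scan-assembler-times {\mxscmpudp\M|\mfcmpu\M} 1 } } */
```

A companion file for `normal_compare` would invert the checks: expect one `xscmpg[te]dp` and one `xxsel`, and use `scan-assembler-not` for `xscmpudp` and `fcmpu`. With the two functions in separate files, swapping the code generated for them can no longer satisfy the counts by accident.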

RE: [PATCH] i386: Fix ICE with conditional QI/HI vector maxmin [PR118776]

2025-02-07 Thread Liu, Hongtao




> -Original Message-
> From: Jakub Jelinek 
> Sent: Friday, February 7, 2025 4:08 PM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH] i386: Fix ICE with conditional QI/HI vector maxmin
> [PR118776]
> 
> Hi!
> 
> The following testcase ICEs starting with GCC 12 since r12-4526 although the
> bug has been introduced already in r12-2751.
> The problem was in the addition of cond_ define_expand
> which uses nonimmediate_operand predicates for both maxmin operands for
> all VI1248_AVX512VLBW modes.  It works fine with VI48_AVX512VL modes
> because the 3_mask VI48_AVX512VL define_expand uses
> ix86_fixup_binary_operands_no_copy and the
> *avx512f_3 VI48_AVX512VL define_insn uses %
> in constraint and !(MEM_P && MEM_P) check in condition (and
> 3 define_expand with VI124_256_AVX512F_AVX512BW
> iterator does that too), but even though the 8-bit and 16-bit element maxmin
> is commutative too, the 3
> define_insn with VI12_AVX512VL iterator didn't use % in constraint to make it
> commutative.  So, e.g. cond_umaxv32qi define_expand allowed
> nonimmediate_operand for both umax operands, but used
> gen_umaxv32qi_mask which wasn't commutative and only allowed
> nonimmediate_operand for the second operand.
> 
> The following patch fixes it by keeping the 3
> VI124_256_AVX512F_AVX512BW define_expand as is (it does
> ix86_fixup_binary_operands_no_copy) but extending the
> 3_mask define_expand from VI48_AVX512VL to
> VI1248_AVX512VLBW which keeps the current modes with their ISA
> conditions and adds the VI12_AVX512VL modes under additional
> TARGET_AVX512BW condition, and turning the actual define_insn into an *
> prefixed name (which it was before just for the non-masked
> case) and having the same commutative operand handling as in other
> define_insns.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Ok, thanks.
> 
> 2025-02-07  Jakub Jelinek  
> 
>   PR target/118776
>   * config/i386/sse.md (3_mask): Use
> VI1248_AVX512VLBW
>   iterator rather than VI48_AVX512VL.
>   (3): Rename to ...
>   (*avx512bw_3): ... this.  Use
>   nonimmediate_operand rather than register_operand predicate
> and %v
>   rather than v constraint for operand 1 and adjust condition to reject
>   MEMs in both operand 1 and 2.
> 
>   * gcc.target/i386/pr118776.c: New test.
> 
> --- gcc/config/i386/sse.md.jj 2025-01-23 15:54:53.160911648 +0100
> +++ gcc/config/i386/sse.md2025-02-07 00:16:49.155363094 +0100
> @@ -17703,12 +17703,12 @@ (define_expand "cond_"
>  })
> 
>  (define_expand "3_mask"
> -  [(set (match_operand:VI48_AVX512VL 0 "register_operand")
> - (vec_merge:VI48_AVX512VL
> -   (maxmin:VI48_AVX512VL
> - (match_operand:VI48_AVX512VL 1 "nonimmediate_operand")
> - (match_operand:VI48_AVX512VL 2 "nonimmediate_operand"))
> -   (match_operand:VI48_AVX512VL 3 "nonimm_or_0_operand")
> +  [(set (match_operand:VI1248_AVX512VLBW 0 "register_operand")
> + (vec_merge:VI1248_AVX512VLBW
> +   (maxmin:VI1248_AVX512VLBW
> + (match_operand:VI1248_AVX512VLBW 1
> "nonimmediate_operand")
> + (match_operand:VI1248_AVX512VLBW 2
> "nonimmediate_operand"))
> +   (match_operand:VI1248_AVX512VLBW 3
> "nonimm_or_0_operand")
> (match_operand: 4 "register_operand")))]
>"TARGET_AVX512F"
>"ix86_fixup_binary_operands_no_copy (, mode,
> operands);") @@ -17724,12 +17724,12 @@ (define_insn
> "*avx512f_3 (set_attr "prefix" "maybe_evex")
> (set_attr "mode" "")])
> 
> -(define_insn "3"
> +(define_insn "*avx512bw_3"
>[(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
>  (maxmin:VI12_AVX512VL
> -  (match_operand:VI12_AVX512VL 1 "register_operand" "v")
> +  (match_operand:VI12_AVX512VL 1 "nonimmediate_operand" "%v")
>(match_operand:VI12_AVX512VL 2 "nonimmediate_operand" "vm")))]
> -  "TARGET_AVX512BW"
> +  "TARGET_AVX512BW && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> 
> "vp\t{%2, %1, %0|%0 _operand3>, %1, %2}"
>[(set_attr "type" "sseiadd")
> (set_attr "prefix" "evex")
> --- gcc/testsuite/gcc.target/i386/pr118776.c.jj   2025-02-07 08:41:46.054157905 +0100
> +++ gcc/testsuite/gcc.target/i386/pr118776.c  2025-02-07 08:40:30.508196302 +0100
> @@ -0,0 +1,23 @@
> +/* PR target/118776 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512bw -mavx512vl" } */
> +
> +void bar (unsigned char *);
> +
> +void
> +foo (unsigned char *x)
> +{
> +  unsigned char b[32];
> +  bar (b);
> +  for (int i = 0; i < 32; i++)
> +{
> +  unsigned char c = 8;
> +  if (i > 3)
> + {
> +   unsigned char d = b[i];
> +   d = 1 > d ? 1 : d;
> +   c = d;
> + }
> +  x[i] = c;
> +}
> +}
> 
>   Jakub

Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg





On 2/7/25 11:01 AM, Georg-Johann Lay wrote:

On 07.02.25 at 17:12, Jeff Law wrote:

On 2/3/25 2:09 AM, Richard Sandiford wrote:

Jeff Law  writes:

So pulling on this thread leads me into the code that sets up
ALLOCNO_WMODE in create_insn_allocnos:


      if ((a = ira_curr_regno_allocno_map[regno]) == NULL)
        {
          a = ira_create_allocno (regno, false, ira_curr_loop_tree_node);

          if (outer != NULL && GET_CODE (outer) == SUBREG)
            {
              machine_mode wmode = GET_MODE (outer);
              if (partial_subreg_p (ALLOCNO_WMODE (a), wmode))
                ALLOCNO_WMODE (a) = wmode;
            }
        }

Note how we set ALLOCNO_WMODE only at allocno creation, so it'll
work as intended if and only if the first reference is via a SUBREG.


Huh, yeah, I agree that that looks wrong.


ISTM the fix here is to always do the check and set ALLOCNO_WMODE.


[ Snipped discussion on a non-issue. ]



So ISTM that moving the code out of the "if (... == NULL)" should be
enough on its own.
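
To make the proposed change concrete, here is a minimal toy model in C, not the actual IRA code.  The names toy_allocno and note_reference are made up for illustration, and modes are reduced to plain byte widths.  The point is only that the widening check sits outside the "allocno does not exist yet" branch, so a paradoxical SUBREG seen on a later reference still widens the recorded mode.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGS 8

/* Toy stand-in for an allocno: only the widest access width is modelled.  */
struct toy_allocno
{
  int regno;
  unsigned wmode_bytes;
};

static struct toy_allocno pool[MAX_REGS];
static struct toy_allocno *allocno_map[MAX_REGS];

/* Record one reference to REGNO.  NATURAL_BYTES is the width of the
   register's own mode; OUTER_BYTES is the width of an enclosing SUBREG,
   or 0 if the reference is not wrapped in a SUBREG.  */
struct toy_allocno *
note_reference (int regno, unsigned natural_bytes, unsigned outer_bytes)
{
  assert (regno >= 0 && regno < MAX_REGS);

  struct toy_allocno *a = allocno_map[regno];
  if (a == NULL)
    {
      /* First reference: create the allocno with its natural width.  */
      a = &pool[regno];
      a->regno = regno;
      a->wmode_bytes = natural_bytes;
      allocno_map[regno] = a;
    }

  /* Done for every reference, not just the first one: widen the
     recorded mode if this reference is through a wider SUBREG.  */
  if (outer_bytes > a->wmode_bytes)
    a->wmode_bytes = outer_bytes;

  return a;
}
```

With the check inside the creation branch, a register first referenced in its natural 4-byte mode and later through an 8-byte SUBREG would keep a recorded width of 4; with the check hoisted as above it ends up as 8.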


And it all makes sense that you caught this.  You and another colleague
at ARM were trying to address this exact problem ~11 years ago ;-)


Heh, thought it sounded familiar :)


So attached is the updated patch that adjusts IRA to avoid this problem.

Georg-Johann, this may explain an issue you were running into as well 
where you got an invalid allocation.  I think yours was at the higher 
end of the register file, but the core issue is potentially the same 
(looking at the first use rather than all of them for paradoxical 
subregs).


You mean https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116389 ?

As far as I can tell that only occurred with reload but not with LRA.
Right.  The change is in IRA so it would affect reload targets as well 
as LRA targets.


jeff

Re: [PATCH] RISC-V: Vector pseudoinsns with x0 operand to use imm 0. (toggle)

2025-02-07 Thread Andrew Waterman

Replacing x0 with 0 when possible is fine; it should never hurt and
might help on some uarches.  (Of course, future versions of those
uarches will eventually be forced to improve handling of x0, anyway,
since as Vineet notes, some of the interesting cases don't have
immediate forms.)
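
For reference, the functional equivalence can be sketched per element in plain C.  This is only a scalar model under stated assumptions, not compiler or simulator code: the helper names are hypothetical, a 32-bit source and 16-bit result are picked arbitrarily for the narrowing shift, and the shift amount is masked to the low 5 bits as for a 32-bit source element.

```c
#include <assert.h>
#include <stdint.h>

/* x0 is hard-wired to zero, so reading it always yields 0.  */
static int32_t
read_x0 (void)
{
  return 0;
}

/* Per-element model of vrsub: result = scalar - element.  */
int32_t
rsub_elem (int32_t elem, int32_t scalar)
{
  return (int32_t) ((uint32_t) scalar - (uint32_t) elem);
}

/* Per-element model of vnsrl with a 32-bit source and a 16-bit result:
   shift right logically, then keep the low half.  */
uint16_t
nsrl_elem (uint32_t wide, uint32_t shamt)
{
  return (uint16_t) (wide >> (shamt & 31));
}

/* The x0 form and the immediate-0 form compute the same value.  */
void
check_x0_equivalence (int32_t v, uint32_t w)
{
  assert (rsub_elem (v, read_x0 ()) == rsub_elem (v, 0));
  assert (nsrl_elem (w, (uint32_t) read_x0 ()) == nsrl_elem (w, 0));
}
```

In this model rsub_elem (v, 0) is simply the negation of v, and nsrl_elem (w, 0) is the truncation of w to its low 16 bits, whichever way the zero is supplied.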

But I don't think offering the -mvec-elide-x0 option is beneficial.
I'd just enable this change unconditionally.  Or, in the unlikely
event there's a uarch that benefits from the old code generation, this
would be better handled as a consequence of -mtune than as a new
top-level option.

On Fri, Feb 7, 2025 at 8:23 AM Vineet Gupta  wrote:
>
> A couple of Vector pseudoinstructions use the x0 scalar register which,
> being a regfile crosser, could be inefficient on certain wider uarches.
>
> Use the imm 0 form, which should be functionally equivalent.
>
>  pseudoinsn             orig insn with x0       this patch
>  ---------------------  ----------------------  ---------------------
>  vncvt.x.x.w vd,vs,vm   vnsrl.wx vd,vs,x0,vm    vnsrl.wi vd,vs,0,vm
>  vneg.v vd,vs           vrsub.vx vd,vs,x0       vrsub.vi vd,vs,0
>  vwcvt.x.x.v vd,vs,vm   vwadd.vx vd,vs,x0,vm    (imm not supported)
>
> New toggle -mvec-elide-x0 gates the transformation and is enabled by default.
> It is not strictly necessary, given the functional equivalence, but is
> provided nonetheless so that the original VNEG or VNCVT assembler mnemonics
> remain available to any asm output parsing scripts and such in the wild.
>
> This passes my testsuite run but obviously wait for the CI tester to
> confirm the same.
>
> gcc/ChangeLog:
> * config/riscv/riscv.opt: Add new Toggle.
> * config/riscv/vector.md: vncvt substitute vnsrl.
> vnsrl with x0 replace with immediate 0.
> vneg substitute vrsub.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: Change
> expected pattern.
> * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
> * gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Ditto.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c: Ditto.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c: Ditto.
> * gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/convert-11.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/convert-12.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/neg-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/trunc-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/trunc-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/trunc-3.c: Ditto.
> * gcc.target/riscv/rvv/base/simplify-vdiv.c: Ditto.
> * gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/conversions/vnsrl0-rv32gcv.c: New test.
> * gcc.target/riscv/rvv/autovec/conversions/vnsrl0-rv64gcv.c: New test.
> * gcc.target/riscv/rvv/autovec/unop/vrsub0-rv32gcv.c: New test.
> * gcc.target/riscv/rvv/autovec/unop/vrsub0-rv64gcv.c: New test.
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/config/riscv/riscv.opt|  4 
>  gcc/config/riscv/vector.md| 20 +---
>  .../cond/cond_convert_int2int-rv32-1.c|  4 ++--
>  .../cond/cond_convert_int2int-rv32-2.c|  4 ++--
>  .../cond/cond_convert_int2int-rv64-1.c|  4 ++-

Re: [PATCH] testsuite: LoongArch: Remove from btrunc, ceil, and floor effective target allowlist

2025-02-07 Thread Lulu Cheng




On 2025/2/7 at 7:51 PM, Xi Ruoyao wrote:

Now that the default C standard is C23, we can no longer use LSX/LASX
instructions for these operations, as the standard disallows raising
INEXACT exceptions.  So LoongArch is no longer suitable for these
effective targets.

Fix the test failures on gcc.dg/vect/vect-rounding-*.c.  For the old
standards or -ffp-int-builtin-inexact we already provide test coverage
with gcc.target/loongarch/vect-ftint.c.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_vect_call_btrunc): Drop LoongArch.
(check_effective_target_vect_call_btruncf): Likewise.
(check_effective_target_vect_call_ceil): Likewise.
(check_effective_target_vect_call_ceilf): Likewise.
(check_effective_target_vect_call_floor): Likewise.
(check_effective_target_vect_call_floorf): Likewise.
(check_effective_target_vect_call_lfloor): Likewise.
(check_effective_target_vect_call_lfloorf): Likewise.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  The test failures
on gcc.dg/vect/vect-rounding-*.c are fixed.  Ok for trunk?


LGTM!

Thanks.



  gcc/testsuite/lib/target-supports.exp | 24 
  1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 60e24129bd5..432e1862c7e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9708,8 +9708,7 @@ proc check_effective_target_vect_call_lrint { } {
  proc check_effective_target_vect_call_btrunc { } {
  return [check_cached_effective_target_indexed vect_call_btrunc {
expr { [istarget aarch64*-*-*]
-|| [istarget amdgcn-*-*]
-|| [istarget loongarch*-*-*] }}]
+|| [istarget amdgcn-*-*] }}]
  }
  
  # Return 1 if the target supports vector btruncf calls.

@@ -9717,8 +9716,7 @@ proc check_effective_target_vect_call_btrunc { } {
  proc check_effective_target_vect_call_btruncf { } {
  return [check_cached_effective_target_indexed vect_call_btruncf {
expr { [istarget aarch64*-*-*]
-|| [istarget amdgcn-*-*]
-|| [istarget loongarch*-*-*] }}]
+|| [istarget amdgcn-*-*] }}]
  }
  
  # Return 1 if the target supports vector ceil calls.

@@ -9726,8 +9724,7 @@ proc check_effective_target_vect_call_btruncf { } {
  proc check_effective_target_vect_call_ceil { } {
  return [check_cached_effective_target_indexed vect_call_ceil {
expr { [istarget aarch64*-*-*]
-|| [istarget amdgcn-*-*]
-|| [istarget loongarch*-*-*] }}]
+|| [istarget amdgcn-*-*] }}]
  }
  
  # Return 1 if the target supports vector ceilf calls.

@@ -9735,8 +9732,7 @@ proc check_effective_target_vect_call_ceil { } {
  proc check_effective_target_vect_call_ceilf { } {
  return [check_cached_effective_target_indexed vect_call_ceilf {
expr { [istarget aarch64*-*-*]
-|| [istarget amdgcn-*-*]
-|| [istarget loongarch*-*-*] }}]
+|| [istarget amdgcn-*-*] }}]
  }
  
  # Return 1 if the target supports vector floor calls.

@@ -9744,8 +9740,7 @@ proc check_effective_target_vect_call_ceilf { } {
  proc check_effective_target_vect_call_floor { } {
  return [check_cached_effective_target_indexed vect_call_floor {
expr { [istarget aarch64*-*-*]
-|| [istarget amdgcn-*-*]
-|| [istarget loongarch*-*-*] }}]
+|| [istarget amdgcn-*-*] }}]
  }
  
  # Return 1 if the target supports vector floorf calls.

@@ -9753,24 +9748,21 @@ proc check_effective_target_vect_call_floor { } {
  proc check_effective_target_vect_call_floorf { } {
  return [check_cached_effective_target_indexed vect_call_floorf {
expr { [istarget aarch64*-*-*]
-|| [istarget amdgcn-*-*]
-|| [istarget loongarch*-*-*] }}]
+|| [istarget amdgcn-*-*] }}]
  }
  
  # Return 1 if the target supports vector lceil calls.
  
  proc check_effective_target_vect_call_lceil { } {

  return [check_cached_effective_target_indexed vect_call_lceil {
-  expr { [istarget aarch64*-*-*]
-|| [istarget loongarch*-*-*] }}]
+  expr { [istarget aarch64*-*-*] }}]
  }
  
  # Return 1 if the target supports vector lfloor calls.
  
  proc check_effective_target_vect_call_lfloor { } {

  return [check_cached_effective_target_indexed vect_call_lfloor {
-  expr { [istarget aarch64*-*-*]
-|| [istarget loongarch*-*-*] }}]
+  expr { [istarget aarch64*-*-*] }}]
  }
  
  # Return 1 if the target supports vector nearbyint calls.

[committed 1/2] arm: fix ICE due to fix for POP {PC} change

2025-02-07 Thread Richard Earnshaw

My earlier change for making the compiler prefer

POP {PC}

over

LDR PC, [SP], #4

had a slightly unexpected consequence in that we now also call
arm_emit_multi_reg_pop to handle single register pops when the
register is not PC.  This exposed a latent bug in this function where
the dwarf unwinding notes on the single-register POP were not being
set correctly.

gcc/
PR target/118089
* config/arm/arm.cc (arm_emit_multi_reg_pop): Add a CFA adjust
note to single-register POP instructions.
---
 gcc/config/arm/arm.cc | 51 +++
 1 file changed, 27 insertions(+), 24 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 7e2082101d8..503401544cb 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -22563,7 +22563,8 @@ arm_emit_multi_reg_pop (unsigned long saved_regs_mask)
 
   /* The parallel needs to hold num_regs SETs
  and one SET for the stack update.  */
-  par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (num_regs + emit_update + offset_adj));
+  par = gen_rtx_PARALLEL (VOIDmode,
+ rtvec_alloc (num_regs + emit_update + offset_adj));
 
   if (return_in_pc)
 XVECEXP (par, 0, 0) = ret_rtx;
@@ -22571,11 +22572,11 @@ arm_emit_multi_reg_pop (unsigned long saved_regs_mask)
   if (emit_update)
 {
   /* Increment the stack pointer, based on there being
- num_regs 4-byte registers to restore.  */
+num_regs 4-byte registers to restore.  */
   tmp = gen_rtx_SET (stack_pointer_rtx,
- plus_constant (Pmode,
-stack_pointer_rtx,
-4 * num_regs));
+plus_constant (Pmode,
+   stack_pointer_rtx,
+   4 * num_regs));
   RTX_FRAME_RELATED_P (tmp) = 1;
   XVECEXP (par, 0, offset_adj) = tmp;
 }
@@ -22587,31 +22588,33 @@ arm_emit_multi_reg_pop (unsigned long saved_regs_mask)
rtx dwarf_reg = reg = gen_rtx_REG (SImode, i);
if (arm_current_function_pac_enabled_p () && i == IP_REGNUM)
  dwarf_reg = gen_rtx_REG (SImode, RA_AUTH_CODE);
-if ((num_regs == 1) && emit_update && !return_in_pc)
-  {
-/* Emit single load with writeback.  */
-tmp = gen_frame_mem (SImode,
- gen_rtx_POST_INC (Pmode,
-   stack_pointer_rtx));
-tmp = emit_insn (gen_rtx_SET (reg, tmp));
+   if ((num_regs == 1) && emit_update && !return_in_pc)
+ {
+   /* Emit single load with writeback.  */
+   tmp = gen_frame_mem (SImode,
+gen_rtx_POST_INC (Pmode,
+  stack_pointer_rtx));
+   tmp = emit_insn (gen_rtx_SET (reg, tmp));
REG_NOTES (tmp) = alloc_reg_note (REG_CFA_RESTORE, dwarf_reg,
  dwarf);
-return;
-  }
+   arm_add_cfa_adjust_cfa_note (tmp, UNITS_PER_WORD,
+stack_pointer_rtx, stack_pointer_rtx);
+   return;
+ }
 
-tmp = gen_rtx_SET (reg,
-   gen_frame_mem
-   (SImode,
-plus_constant (Pmode, stack_pointer_rtx, 4 * j)));
-RTX_FRAME_RELATED_P (tmp) = 1;
-XVECEXP (par, 0, j + emit_update + offset_adj) = tmp;
+   tmp = gen_rtx_SET (reg,
+  gen_frame_mem
+  (SImode,
+   plus_constant (Pmode, stack_pointer_rtx, 4 * j)));
+   RTX_FRAME_RELATED_P (tmp) = 1;
+   XVECEXP (par, 0, j + emit_update + offset_adj) = tmp;
 
-/* We need to maintain a sequence for DWARF info too.  As dwarf info
-   should not have PC, skip PC.  */
-if (i != PC_REGNUM)
+   /* We need to maintain a sequence for DWARF info too.  As dwarf info
+  should not have PC, skip PC.  */
+   if (i != PC_REGNUM)
  dwarf = alloc_reg_note (REG_CFA_RESTORE, dwarf_reg, dwarf);
 
-j++;
+   j++;
   }
 
   if (return_in_pc)
-- 
2.34.1

[committed 2/2] arm: Prefer POP {lo-reg} over LDR lo-reg, ... for thumb2 [PR118089]

2025-02-07 Thread Richard Earnshaw

For thumb2, popping a single low register off the stack should prefer
POP over LDR to mirror the behaviour of the PUSH on entry.  This saves
a couple of bytes in the resulting image.  This is a relatively niche
case as it's rare to push a single low register onto the stack, but
still worth getting right.

Whilst fixing this I've also restructured the code here somewhat to
fix a bug I observed by inspection and to improve the code slightly.

Firstly, the single register case is hoisted above the main loop.
This not only avoids creating some RTL that immediately becomes
garbage but also avoids us needing to check for this case in every
iteration of the main loop body.

Secondly, we iterate over just the non-zero bits in the reg mask
rather than every bit and then checking if there's work to do for that
bit.
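
As a standalone illustration of that idiom (a sketch only; the helper names are hypothetical and a simple loop stands in for exact_log2), clearing the lowest set bit on each iteration visits exactly the registers present in the mask:

```c
#include <assert.h>

/* Count the registers in MASK by clearing the lowest set bit each time
   round the loop; the body runs once per set bit rather than once per
   bit position.  */
int
count_saved_regs (unsigned long mask)
{
  int n = 0;
  for (unsigned long bits = mask; bits; bits &= ~(bits & -bits))
    n++;
  return n;
}

/* Return the index of the only set bit in MASK.  */
int
single_reg_index (unsigned long mask)
{
  /* Exactly one bit must be set.  */
  assert (mask != 0 && (mask & (mask - 1)) == 0);

  int i = 0;
  while (!(mask & 1UL))
    {
      mask >>= 1;
      i++;
    }
  return i;
}
```

Here bits & -bits isolates the lowest set bit, so a mask with two registers set costs two iterations regardless of which registers they are.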

Finally, when emitting a pop that also pops SP off the stack we
shouldn't be emitting a stack-adjust CFA note.  The new SP value comes
from the popped value, not from an adjustment of the previous SP
value.

gcc:
PR target/118089
* config/arm/arm.cc (arm_emit_multi_reg_pop): Restructure.
Don't emit LDR on thumb2 when POP can be used for smaller code.
Don't add a CFA adjust note when SP is popped off the stack.

gcc/testsuite:
PR target/118089
* gcc.target/arm/thumb2-pop-loreg.c: New test.
---
 gcc/config/arm/arm.cc | 99 +++
 .../gcc.target/arm/thumb2-pop-loreg.c | 18 
 2 files changed, 75 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb2-pop-loreg.c

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 503401544cb..a95ddf8201f 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -22543,24 +22543,50 @@ static void
 arm_emit_multi_reg_pop (unsigned long saved_regs_mask)
 {
   int num_regs = 0;
-  int i, j;
   rtx par;
   rtx dwarf = NULL_RTX;
   rtx tmp, reg;
   bool return_in_pc = saved_regs_mask & (1 << PC_REGNUM);
   int offset_adj;
   int emit_update;
+  unsigned long reg_bits;
 
   offset_adj = return_in_pc ? 1 : 0;
-  for (i = 0; i <= LAST_ARM_REGNUM; i++)
-if (saved_regs_mask & (1 << i))
-  num_regs++;
+  for (reg_bits = saved_regs_mask; reg_bits;
+   reg_bits &= ~(reg_bits & -reg_bits))
+num_regs++;
 
   gcc_assert (num_regs && num_regs <= 16);
 
   /* If SP is in reglist, then we don't emit SP update insn.  */
   emit_update = (saved_regs_mask & (1 << SP_REGNUM)) ? 0 : 1;
 
+  /* If popping just one register, use LDR reg, [SP], #4, unless
+ we're generating Thumb code and reg is a low reg.  */
+  if (num_regs == 1
+  && emit_update
+  && !return_in_pc
+  && (TARGET_ARM
+ /* For Thumb we want to use POP for a single low register.  */
+ || (saved_regs_mask & ~0xff)))
+{
+  int i = exact_log2 (saved_regs_mask);
+
+  rtx dwarf_reg = reg = gen_rtx_REG (SImode, i);
+  if (arm_current_function_pac_enabled_p () && i == IP_REGNUM)
+   dwarf_reg = gen_rtx_REG (SImode, RA_AUTH_CODE);
+  /* Emit single load with writeback.   */
+  tmp = gen_frame_mem (SImode,
+  gen_rtx_POST_INC (Pmode,
+stack_pointer_rtx));
+  tmp = emit_insn (gen_rtx_SET (reg, tmp));
+  REG_NOTES (tmp) = alloc_reg_note (REG_CFA_RESTORE, dwarf_reg,
+   dwarf);
+  arm_add_cfa_adjust_cfa_note (tmp, UNITS_PER_WORD,
+  stack_pointer_rtx, stack_pointer_rtx);
+  return;
+}
+
   /* The parallel needs to hold num_regs SETs
  and one SET for the stack update.  */
   par = gen_rtx_PARALLEL (VOIDmode,
@@ -22582,50 +22608,39 @@ arm_emit_multi_reg_pop (unsigned long saved_regs_mask)
 }
 
   /* Now restore every reg, which may include PC.  */
-  for (j = 0, i = 0; j < num_regs; i++)
-if (saved_regs_mask & (1 << i))
-  {
-   rtx dwarf_reg = reg = gen_rtx_REG (SImode, i);
-   if (arm_current_function_pac_enabled_p () && i == IP_REGNUM)
- dwarf_reg = gen_rtx_REG (SImode, RA_AUTH_CODE);
-   if ((num_regs == 1) && emit_update && !return_in_pc)
- {
-   /* Emit single load with writeback.  */
-   tmp = gen_frame_mem (SImode,
-gen_rtx_POST_INC (Pmode,
-  stack_pointer_rtx));
-   tmp = emit_insn (gen_rtx_SET (reg, tmp));
-   REG_NOTES (tmp) = alloc_reg_note (REG_CFA_RESTORE, dwarf_reg,
- dwarf);
-   arm_add_cfa_adjust_cfa_note (tmp, UNITS_PER_WORD,
-stack_pointer_rtx, stack_pointer_rtx);
-   return;
- }
-
-   tmp = gen_rtx_SET (reg,
-  gen_frame_mem
-  (SImode,
-   plus_constant (Pmode, stack_pointer_rtx, 4 * j)));
-   RTX_FRAME_RELATED

Re: [PATCH] ipa-cp: Perform operations in the appropriate types (PR 118097)

2025-02-07 Thread Jan Hubicka

> > gcc/ChangeLog:
> >
> > 2025-01-20  Martin Jambor  
> >
> > PR ipa/118097
> > * ipa-cp.cc (ipa_get_jf_arith_result): Adjust comment.
> > (ipa_get_jf_pass_through_result): Removed.
> > (ipa_value_from_jfunc): Use directly ipa_get_jf_arith_result, do
> > not specify operation type but make sure we check and possibly
> > convert the result.
> > (get_val_across_arith_op): Remove the last parameter, always pass
> > NULL_TREE to ipa_get_jf_arith_result in its last argument.
> > (propagate_vals_across_arith_jfunc): Do not pass res_type to
> > get_val_across_arith_op.
> > (propagate_vals_across_pass_through): Add checking assert that
> > parm_type is not NULL.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2025-01-24  Martin Jambor  
> >
> > PR ipa/118097
> > * gcc.dg/ipa/pr118097.c: New test.
> > * gcc.dg/ipa/pr118535.c: Likewise.
> > * gcc.dg/ipa/ipa-notypes-1.c: Likewise.

OK, thanks!

Honza

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

Richard Sandiford  writes:
> Really nice analysis!  Thanks for writing this up.
>
> Sorry for the big quote below, but:
>
> Jan Hubicka  writes:
>> [...]
>> PR117081 is about a regression in povray. The reduced testcase:
>>
>> void foo (void);
>> void bar (void);
>>
>> int
>> test (int a)
>> {
>>   int r;
>>
>>   if (r = -a)
>> foo ();
>>   else
>> bar ();
>>
>>   return r;
>> }
>>
>> shows that we now use caller saved register (EAX) to hold the return value 
>> which yields longer code.  The costs are
>> Popping a0(r98,l0)  -- (0=13000,13000) (3=15000,15000) (6=15000,15000) 
>> (40=15000,15000) (41=15000,15000) (42=15000,15000) (43=15000,15000)
>>
>> here 15000 is 11000+4000 where I think 4000 is cost of 2 reg-reg moves
>> multiplied by REG_FREQ_MAX.   This seems correct. GCC 13 uses callee
>> saved register and produces:
>>
>>  :
>>    0:  53              push   %rbx              <--- callee save
>>    1:  89 fb           mov    %edi,%ebx         <--- move 1
>>    3:  f7 db           neg    %ebx
>>    5:  74 09           je     10
>>    7:  e8 00 00 00 00  call   c
>>    c:  89 d8           mov    %ebx,%eax         <--- callee restore
>>    e:  5b              pop    %rbx
>>    f:  c3              ret
>>   10:  e8 00 00 00 00  call   15
>>   15:  89 d8           mov    %ebx,%eax         <--- move 2
>>   17:  5b              pop    %rbx              <--- callee restore
>>   18:  c3              ret
>>
>> Mainline used EAX since it has costs 13000.  It is not 100% clear to me
>> why.
>>  - 12000 is the spilling (which is emitted twice but executed just once)
>>  - I would have expected 2000 for the move from edi to eax.
>> However even if cost is 14000 we will choose EAX.  The code is:
>>
>>    0:  89 f8           mov    %edi,%eax         <--- move1
>>    2:  48 83 ec 18     sub    $0x18,%rsp        <--- stack frame creation
>>    6:  f7 d8           neg    %eax
>>    8:  89 44 24 0c     mov    %eax,0xc(%rsp)    <--- spill out
>>    c:  85 ff           test   %edi,%edi
>>    e:  74 10           je     20
>>   10:  e8 00 00 00 00  call   15
>>   15:  8b 44 24 0c     mov    0xc(%rsp),%eax    <--- spill in
>>   19:  48 83 c4 18     add    $0x18,%rsp        <--- stack frame
>>   1d:  c3              ret
>>   1e:  66 90           xchg   %ax,%ax
>>   20:  e8 00 00 00 00  call   25
>>   25:  8b 44 24 0c     mov    0xc(%rsp),%eax    <--- spill in
>>   29:  48 83 c4 18     add    $0x18,%rsp        <--- stack frame
>>   2d:  c3              ret
>>
>> This sequence really saves one move at the expense of stack frame
>> allocation (which is not modelled by the cost model) and longer spill
>> code (also not modelled).
>> [...]
>> PR117082 is about noreturn function:
>> __attribute__ ((noreturn))
>> void
>> f3 (void)
>> {
>>   int y0 = x0;
>>   int y1 = x1;
>>   f1 ();
>>   f2 (y0, y1);
>>   while (1);
>> }
>>
>> Here the cost model is really wrong by assuming that entry and exit
>> block have same frequencies.  This can be fixed quite easily (though it
>> is a rare case)
>
> Yeah (and nice example).
>
>> PR118497 seems to be fixed.
>>
>> So overall I think
>>  1) we can fix scaling of epilogue by exit block frequency
>> to get noreturns right.
>>  2) we should drop the check for optimize_size.  Since with -Os
>> REG_FREQ_FROM_BB always returns 1000 everything should be scaled
>> same way
>>  3) we currently have a hard-wired "-1" to bias the cost metric for callee
>> saved registers.
>> It may make sense to allow targets to control this, since e.g. x86
>> has push/pop that is shorter. -3 would solve the testcase with neg
>> and would express that push/pop is still cheaper with extra reg-reg
>> move.
>>  4) cost model misses shrink wrapping, the fact that if register is
>> callee saved it may be used by multiple allocnos and also that
>> push/pop sequence may avoid need for manual RSP adjustments.
>>
>> Those seem a bit harder things to fit in though.
>>
>> So if we want to go with the target hook, I think it should adjust the
>> cost before scaling (since targets may have special tricks for
>> prologues) rather than the scale factor (which is target independent
>> part of cost model).
>
> Like you say, one of the missing pieces appears to be the allocation/
> deallocation overhead for callee saves.  Could we try to add a hook to
> model that cost, based on certain parameters?
>
> In particular, one thing that the examples above have in common is that
> they don't need to allocate a frame for local variables.  That seems
> like it ought to be part of the mix.  If we need to allo

Re: [PATCH v2 4/7] Add a cache of recent lines

2025-02-07 Thread David Malcolm

On Sun, 2025-02-02 at 21:47 -0800, Andi Kleen wrote:
> > 
> > If I'm reading this right, calls to get_next_line lead to insertions
> > into
> > the ring buffer whilst the buffer is empty or the last line in the
> > ring
> > buffer cache is m_line_num - 1.
> > 
> > There are a few places where we update m_line_num, but this caching
> > code doesn't seem to touch those places.  Should it?  I don't know
> > if
> > the lack of a reset is an issue, but it's an aspect of the patch
> > that's
> > a bit hazy to me; sorry.
> 
> The idea was that there is only a single cursor for this cache,
> assuming the main parser algorithm goes linearly through the file.
> 
> So even if m_line_num changes temporarily for some warning 
> it will eventually go back to where that cursor was. So
> I didn't bother to refill the cache on these changes.
> 
> There is a good chance that the changed position hits the cache
> if it's within its range.
> 
> If there is an access pattern where this assumption is not true
> the cache will not work, but the normal anchors still do of course.
> 
> I will add a comment describing this.
> 
> > If I'm reading this right, the caching that this adds is only for
> > the
> > final 256 lines read so far in the file, and lets us use a line
> > offset
> > relative to the most recent entry to go direct to such a recent
> > line in
> > the file.
> 
> Right.

Thanks; the patch is OK for trunk (with the comment added).
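
For anyone following along, the scheme as discussed above can be sketched roughly as follows.  This is a simplified model, not the code from the patch: the struct and function names are hypothetical, and only a line number and a byte offset are stored per entry.  A line is recorded only when the cache is empty or the line directly follows the newest entry, so the ring always holds a run of consecutive lines and a lookup is just an index relative to the newest one.

```c
#include <assert.h>
#include <stddef.h>

#define RECENT_LINES 256

struct recent_line
{
  size_t line_num;
  size_t start_offset;
};

struct line_cache
{
  struct recent_line ring[RECENT_LINES];
  size_t count;   /* Number of valid entries.  */
  size_t head;    /* Slot holding the most recent entry.  */
};

/* Record LINE_NUM if the cache is empty or LINE_NUM directly follows
   the newest cached line.  Return 1 if recorded, 0 if skipped.  */
int
cache_note_line (struct line_cache *c, size_t line_num, size_t start_offset)
{
  assert (c != NULL);
  if (c->count != 0)
    {
      if (c->ring[c->head].line_num + 1 != line_num)
        return 0;
      c->head = (c->head + 1) % RECENT_LINES;
    }
  c->ring[c->head].line_num = line_num;
  c->ring[c->head].start_offset = start_offset;
  if (c->count < RECENT_LINES)
    c->count++;
  return 1;
}

/* Return the cached entry for LINE_NUM, or NULL on a miss.  */
const struct recent_line *
cache_lookup (const struct line_cache *c, size_t line_num)
{
  assert (c != NULL);
  if (c->count == 0)
    return NULL;
  size_t newest = c->ring[c->head].line_num;
  if (line_num > newest || newest - line_num >= c->count)
    return NULL;
  /* Entries are consecutive, so step back from the newest slot.  */
  size_t back = newest - line_num;
  return &c->ring[(c->head + RECENT_LINES - back) % RECENT_LINES];
}
```

A request for a line outside the cached run is simply a miss in this model, which corresponds to falling back to the normal anchors.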

Dave

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-07 Thread Vladimir Makarov




On 2/6/25 5:35 PM, Jan Hubicka wrote:


Register 3 (first callee saved) has cost 11000.  This comes from:
  add_cost = ((ira_memory_move_cost[mode][rclass][0]
               + ira_memory_move_cost[mode][rclass][1])
              * saved_nregs / hard_regno_nregs (hard_regno, mode) - 1)
                                                                  ^^^
                                                                  here
             * (optimize_size ? 1 :
                REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));

There is no comment why -1, but I suppose it is there to bias costs to
use prologue/epilogue instead of caller save sequence when runtime cost
estimate is even.
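
The arithmetic behind the 11000 can be replayed with a small standalone function.  This is a sketch, not IRA code: the function name and the bias parameter are hypothetical, and the memory move costs of 6 each are an assumption inferred from the "12000 is the spilling" figure in the analysis, together with a block frequency of 1000.

```c
#include <assert.h>

/* Standalone version of the add_cost expression quoted above, with the
   hard-wired "- 1" turned into an explicit BIAS parameter.  */
long
callee_save_add_cost (int mem_move_cost_0, int mem_move_cost_1,
                      int saved_nregs, int nregs, int bias, int freq)
{
  assert (nregs > 0);
  return (long) ((mem_move_cost_0 + mem_move_cost_1)
                 * saved_nregs / nregs - bias) * freq;
}
```

Under those assumptions, callee_save_add_cost (6, 6, 1, 1, 1, 1000) gives 11000, i.e. the 12000 save/restore cost minus one unit of bias scaled by the frequency; a bias of 3 would give 9000.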


It is very old code.  As I remember, without this RA avoided using a 
callee-saved reg at all in cases when it could be used for more than one 
pseudo.  The idea was also that on some targets the typical saves/restores 
are cheaper (pop/push, multiple-reg loads/stores) and this should be taken 
into account somehow.





 :
    0:  53              push   %rbx              <--- callee save
    1:  89 fb           mov    %edi,%ebx         <--- move 1
    3:  f7 db           neg    %ebx
    5:  74 09           je     10
    7:  e8 00 00 00 00  call   c
    c:  89 d8           mov    %ebx,%eax         <--- callee restore
    e:  5b              pop    %rbx
    f:  c3              ret
   10:  e8 00 00 00 00  call   15
   15:  89 d8           mov    %ebx,%eax         <--- move 2
   17:  5b              pop    %rbx              <--- callee restore
   18:  c3              ret

Mainline used EAX since it has costs 13000.  It is not 100% clear to me
why.
In many cases it is hard to find why this particular cost occurs, as the 
costs are updated dynamically and assigning a pseudo to a hard reg can 
affect the cost for another pseudo which is not involved in a move with the 
first pseudo (but is involved through a chain of moves). Those are 
complicated heuristics, changed several times and verified by visible 
SPEC performance improvements.


So overall I think
  1) we can fix scaling of epilogue by exit block frequency
 to get noreturns right.
  2) we should drop the check for optimize_size.  Since with -Os
 REG_FREQ_FROM_BB always returns 1000 everything should be scaled
 same way
  3) we currently have a hard-wired "-1" to bias the cost metric for callee
 saved registers.
 It may make sense to allow targets to control this, since e.g. x86
 has push/pop that is shorter. -3 would solve the testcase with neg
 and would express that push/pop is still cheaper with extra reg-reg
 move.
  4) cost model misses shrink wrapping, the fact that if register is
 callee saved it may be used by multiple allocnos and also that
 push/pop sequence may avoid need for manual RSP adjustments.
Shrink wrapping was a later addition to RA and I guess nobody thought about 
how to update the cost model to take it into account.

 Those seem a bit harder things to fit in though.

So if we want to go with the target hook, I think it should adjust the
cost before scaling (since targets may have special tricks for
prologues) rather than the scale factor (which is target independent
part of cost model).
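
Point 1) can be pictured with a tiny model in which the save is weighted by the entry block frequency and the restore by the exit block frequency.  This is only a sketch of the idea, not what IRA does today; the function name and the per-instruction costs of 6 are assumptions made for illustration.

```c
#include <assert.h>

/* Weight the prologue save by the entry block frequency and the
   epilogue restore by the exit block frequency, instead of scaling
   both by the entry frequency.  */
long
split_save_restore_cost (int save_cost, int restore_cost,
                         int entry_freq, int exit_freq)
{
  assert (entry_freq >= 0 && exit_freq >= 0);
  return (long) save_cost * entry_freq + (long) restore_cost * exit_freq;
}
```

For an ordinary function with equal frequencies this matches the current scaling (6 * 1000 + 6 * 1000 = 12000), while for a noreturn function with an exit frequency of 0 only the save is charged (6000).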


Very nice analysis, Honza.  I believe we still need a hook and I'll work 
on the target hook improvement.

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-07 Thread Andrew Pinski

On Fri, Feb 7, 2025 at 9:20 AM Richard Sandiford
 wrote:
>
> Richard Sandiford  writes:
> > Really nice analysis!  Thanks for writing this up.
> >
> > Sorry for the big quote below, but:
> >
> > Jan Hubicka  writes:
> >> [...]
> >> PR117081 is about a regression in povray. The reduced testcase:
> >>
> >> void foo (void);
> >> void bar (void);
> >>
> >> int
> >> test (int a)
> >> {
> >>   int r;
> >>
> >>   if (r = -a)
> >> foo ();
> >>   else
> >> bar ();
> >>
> >>   return r;
> >> }
> >>
> >> shows that we now use caller saved register (EAX) to hold the return value 
> >> which yields longer code.  The costs are
> >> Popping a0(r98,l0)  -- (0=13000,13000) (3=15000,15000) (6=15000,15000) 
> >> (40=15000,15000) (41=15000,15000) (42=15000,15000) (43=15000,15000)
> >>
> >> here 15000 is 11000+4000 where I think 4000 is cost of 2 reg-reg moves
> >> multiplied by REG_FREQ_MAX.   This seems correct. GCC 13 uses callee
> >> saved register and produces:
> >>
> >>  :
> >>    0:  53              push   %rbx              <--- callee save
> >>    1:  89 fb           mov    %edi,%ebx         <--- move 1
> >>    3:  f7 db           neg    %ebx
> >>    5:  74 09           je     10
> >>    7:  e8 00 00 00 00  call   c
> >>    c:  89 d8           mov    %ebx,%eax         <--- callee restore
> >>    e:  5b              pop    %rbx
> >>    f:  c3              ret
> >>   10:  e8 00 00 00 00  call   15
> >>   15:  89 d8           mov    %ebx,%eax         <--- move 2
> >>   17:  5b              pop    %rbx              <--- callee restore
> >>   18:  c3              ret
> >>
> >> Mainline used EAX since it has costs 13000.  It is not 100% clear to me
> >> why.
> >>  - 12000 is the spilling (which is emitted twice but executed just once)
> >>  - I would have expected 2000 for the move from edi to eax.
> >> However even if cost is 14000 we will choose EAX.  The code is:
> >>
> >>    0:  89 f8           mov    %edi,%eax         <--- move1
> >>    2:  48 83 ec 18     sub    $0x18,%rsp        <--- stack frame creation
> >>    6:  f7 d8           neg    %eax
> >>    8:  89 44 24 0c     mov    %eax,0xc(%rsp)    <--- spill out
> >>    c:  85 ff           test   %edi,%edi
> >>    e:  74 10           je     20
> >>   10:  e8 00 00 00 00  call   15
> >>   15:  8b 44 24 0c     mov    0xc(%rsp),%eax    <--- spill in
> >>   19:  48 83 c4 18     add    $0x18,%rsp        <--- stack frame
> >>   1d:  c3              ret
> >>   1e:  66 90           xchg   %ax,%ax
> >>   20:  e8 00 00 00 00  call   25
> >>   25:  8b 44 24 0c     mov    0xc(%rsp),%eax    <--- spill in
> >>   29:  48 83 c4 18     add    $0x18,%rsp        <--- stack frame
> >>   2d:  c3              ret
> >>
> >> This sequence really saves one move at the expense of stack frame
> >> allocation (which is not modelled by the cost model) and longer spill
> >> code (also not modelled).
> >> [...]
> >> PR117082 is about noreturn function:
> >> __attribute__ ((noreturn))
> >> void
> >> f3 (void)
> >> {
> >>   int y0 = x0;
> >>   int y1 = x1;
> >>   f1 ();
> >>   f2 (y0, y1);
> >>   while (1);
> >> }
> >>
> >> Here the cost model is really wrong in assuming that the entry and exit
> >> blocks have the same frequencies.  This can be fixed quite easily (though
> >> it is a rare case)
> >
> > Yeah (and nice example).
> >
> >> PR118497 seems to be fixed.
> >>
> >> So overall I think
> >>  1) we can fix scaling of epilogue by exit block frequency
> >> to get noreturns right.
> >>  2) we should drop the check for optimize_size.  Since with -Os
> >> REG_FREQ_FROM_BB always returns 1000, everything should be scaled
> >> the same way
> >>  3) we currently hard-wire "-1" to bias the cost metric for callee
> >> saved registers.
> >> It may make sense to allow targets to control this, since e.g. x86
> >> has push/pop that is shorter. -3 would solve the testcase with neg
> >> and would express that push/pop is still cheaper with an extra reg-reg
> >> move.
> >>  4) the cost model misses shrink wrapping, the fact that if a register
> >> is callee saved it may be used by multiple allocnos, and also that a
> >> push/pop sequence may avoid the need for manual RSP adjustments.
> >>
> >> Those seem a bit harder to fit in, though.
> >>
> >> So if we want to go with the target hook, I think it should adjust the
> >> cost before scaling (since targets may have special tricks for
> >> prologues) rather than the scale factor (which is the target-independent
> >> part of the cost model).
> >
> > Like you say, one of the missing pieces appears to be the allocation/
> > dea

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

Richard Sandiford  writes:
> FWIW, here's a very rough initial version of the kind of thing
> I was thinking about.  Hopefully the hook documentation describes
> the approach.  It's deliberately (overly?) flexible.

Argh!  I forgot:

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index de34be31f48..4b1e7eff41f 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -5253,6 +5253,7 @@ color (void)
 {
   allocno_stack_vec.create (ira_allocnos_num);
   memset (allocated_hardreg_p, 0, sizeof (allocated_hardreg_p));
+  CLEAR_HARD_REG_SET (allocated_callee_save_regs);
   ira_initiate_assign ();
   do_coloring ();
   ira_finish_assign ();

> This still needs a lot of clean-up and testing, but I thought I might
> as well send what I have before leaving for the weekend.  Does it look
> reasonable in principle?

To be clear, "testing" of course includes performance testing
(SPEC, but also other things).

Thanks,
Richard

Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg

2025-02-07 Thread Georg-Johann Lay


Am 07.02.25 um 17:12 schrieb Jeff Law:

On 2/3/25 2:09 AM, Richard Sandiford wrote:

Jeff Law  writes:

So pulling on this thread leads me into the code that sets up
ALLOCNO_WMODE in create_insn_allocnos:


   if ((a = ira_curr_regno_allocno_map[regno]) == NULL)
     {
       a = ira_create_allocno (regno, false, ira_curr_loop_tree_node);

       if (outer != NULL && GET_CODE (outer) == SUBREG)
         {
           machine_mode wmode = GET_MODE (outer);
           if (partial_subreg_p (ALLOCNO_WMODE (a), wmode))
             ALLOCNO_WMODE (a) = wmode;
         }
     }

Note how we set ALLOCNO_WMODE only at allocno creation, so it'll
work as intended if and only if the first reference is via a SUBREG.


Huh, yeah, I agree that that looks wrong.


ISTM the fix here is to always do the check and set ALLOCNO_WMODE.


[ Snipped discussion on a non-issue. ]



So ISTM that moving the code out of the "if (... == NULL)" should be
enough on its own.
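
To make the difference concrete, here is a minimal sketch in C of the two behaviors discussed above. It is not IRA code: `toy_allocno`, `toy_ref`, `toy_record_ref` and `toy_widest` are hypothetical names, and modes are reduced to a byte size. The only thing taken from the thread is the rule itself: the widening check either runs on the creating reference only, or on every reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an allocno: only the widest mode size seen so far.
   Nothing here is GCC's real data structure.  */
struct toy_allocno
{
  bool created;
  unsigned wmode_size;   /* size in bytes of the widest mode recorded */
};

/* One reference to the pseudo.  outer_size is the size of an enclosing
   SUBREG's mode, or 0 when the reference is not wrapped in a SUBREG.  */
struct toy_ref
{
  unsigned outer_size;
};

/* Record REF.  When CHECK_EVERY_REF is false, the widening check runs only
   on the reference that creates the allocno (the behavior described as
   wrong above); when true, it runs on every reference.  */
static void
toy_record_ref (struct toy_allocno *a, unsigned natural_size,
                struct toy_ref ref, bool check_every_ref)
{
  bool first = !a->created;
  if (first)
    {
      a->created = true;
      a->wmode_size = natural_size;
    }
  if ((first || check_every_ref) && ref.outer_size > a->wmode_size)
    a->wmode_size = ref.outer_size;
}

/* Walk all references and return the widest mode size recorded.  */
static unsigned
toy_widest (const struct toy_ref *refs, size_t n, unsigned natural_size,
            bool check_every_ref)
{
  struct toy_allocno a = { false, 0 };
  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    toy_record_ref (&a, natural_size, refs[i], check_every_ref);
  return a.wmode_size;
}
```

With a 4-byte pseudo whose second reference is an 8-byte paradoxical SUBREG, the creation-only variant keeps a width of 4 while the every-reference variant records 8. If the SUBREG happens to be the first reference, both variants agree.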


And it all makes sense that you caught this.  You and another colleague
at ARM were trying to address this exact problem ~11 years ago ;-)


Heh, thought it sounded familiar :)


So attached is the updated patch that adjusts IRA to avoid this problem.

Georg-Johann, this may explain an issue you were running into as well 
where you got an invalid allocation.  I think yours was at the higher 
end of the register file, but the core issue is potentially the same 
(looking at the first use rather than all of them for paradoxical subregs).


You mean https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116389 ?

As far as I can tell that only occurred with reload but not with LRA.

Johann

I've had this in my tester about a week.  So it's been through the 
crosses as well as various native bootstraps, including but not limited 
to m68k, ppc, s390, hppa, sh4, etc.  And just for good measure I 
bootstrapped & regression tested it on x86_64 a few minutes ago.


Pushing to the trunk.

Jeff

Re: [PATCH] Fortran: fix initialization of allocatable non-deferred character [PR59252]

2025-02-07 Thread Harald Anlauf


Hi Steve,

Am 07.02.25 um 21:39 schrieb Steve Kargl:

On Fri, Feb 07, 2025 at 09:31:12PM +0100, Harald Anlauf wrote:


Regtested on x86_64-pc-linux-gnu.  OK for mainline?



Looks reasonable.


Although this is a really old bug, it is wrong code, so I'd like to backport
this to at least the 14-branch.  Any reservations?


If it passes regression testing, no reservations.


Will do, as usual.




-  else if (init && cm->attr.allocatable && expr->expr_type == EXPR_NULL)
+  else if (cm->attr.allocatable && expr->expr_type == EXPR_NULL
+  && (init
+  || (cm->ts.type == BT_CHARACTER
+  && !(cm->ts.deferred || cm->attr.pdt_string))))
  {
-  /* NULL initialization for allocatable components.  */
+  /* NULL initialization for allocatable components.
+Deferred-length character is dealth with later.  */


s/dealth/dealt


Oops, now I see it, too.

Fixed and pushed as r15-7433-g818c36a85e3fae .

Thanks for the review!

Harald

Re: PING^2 [RFC] Prevent the scheduler from moving prefetch instructions when expanding __builtin_prefetch [PR 116713]

2025-02-07 Thread Oleg Endo



On Fri, 2025-02-07 at 07:04 -0700, Jeff Law wrote:
> 
> On 2/7/25 5:51 AM, Oleg Endo wrote:
> > > Hi,
> > > 
> > > Can the issue be resolved in a target independent manner as suggested 
> > > below?
> > > Or is it better to deal with this in the target code?

> That seems like a pretty heavy hammer though.  For that reason alone I 
> think this is going to need some discussion and I believe the folks most 
> needed for that discussion are focused on release related issues.
> 

Thanks for your attention.

As Andrew pointed out in the PR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116713#c8

this issue has been around for 15 years.  So I guess nobody really cares
about that niche thing.

Anyway, will try to ping it again later.

Best regards,
Oleg Endo

[PATCH] Fortran: fix initialization of allocatable non-deferred character [PR59252]

2025-02-07 Thread Harald Anlauf


Dear all,

when an allocatable, non-deferred-length character component of a
derived type was initialized with NULL, the wrong code path was taken:
instead of simply assigning a null pointer, we produced really
weird code.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Although this is a really old bug, it is wrong code, so I'd like to backport
this to at least the 14-branch.  Any reservations?

Thanks,
Harald

From f90b21d89c206507c4383e349db12546b793ce31 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 7 Feb 2025 21:21:10 +0100
Subject: [PATCH] Fortran: fix initialization of allocatable non-deferred
 character [PR59252]

	PR fortran/59252

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_trans_subcomponent_assign): Initialize
	allocatable non-deferred character with NULL properly.

gcc/testsuite/ChangeLog:

	* gfortran.dg/allocatable_char_1.f90: New test.
---
 gcc/fortran/trans-expr.cc |  8 +++-
 .../gfortran.dg/allocatable_char_1.f90| 47 +++
 2 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/allocatable_char_1.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index f923aeb9460..a7af67cb441 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -9836,9 +9836,13 @@ gfc_trans_subcomponent_assign (tree dest, gfc_component * cm,
   tmp = gfc_trans_alloc_subarray_assign (tmp, cm, expr);
   gfc_add_expr_to_block (&block, tmp);
 }
-  else if (init && cm->attr.allocatable && expr->expr_type == EXPR_NULL)
+  else if (cm->attr.allocatable && expr->expr_type == EXPR_NULL
+	   && (init
+	   || (cm->ts.type == BT_CHARACTER
+		   && !(cm->ts.deferred || cm->attr.pdt_string))))
 {
-  /* NULL initialization for allocatable components.  */
+  /* NULL initialization for allocatable components.
+	 Deferred-length character is dealth with later.  */
   gfc_add_modify (&block, dest, fold_convert (TREE_TYPE (dest),
 		  null_pointer_node));
 }
diff --git a/gcc/testsuite/gfortran.dg/allocatable_char_1.f90 b/gcc/testsuite/gfortran.dg/allocatable_char_1.f90
new file mode 100644
index 000..1d6c25c4942
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/allocatable_char_1.f90
@@ -0,0 +1,47 @@
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-original" }
+!
+! PR fortran/59252
+
+module mod
+  implicit none
+
+  type t1
+ character(256), allocatable :: label
+  end type t1
+
+  type t2
+ type(t1),   allocatable :: appv(:)
+  end type t2
+
+contains
+  subroutine construct(res)
+type(t2), allocatable, intent(inout) :: res
+if (.not. allocated(res)) allocate(res)
+  end subroutine construct
+
+  subroutine construct_appv(appv)
+type(t1), allocatable, intent(inout) :: appv(:)
+if (.not. allocated(appv)) allocate(appv(20))
+  end subroutine construct_appv
+
+  type(t1) function foo () result (res)
+  end function foo
+end module mod
+
+program testy
+  use mod
+  implicit none
+  type(t2), allocatable :: res
+  type(t1)  :: s
+
+  ! original test from pr59252
+  call construct (res)
+  call construct_appv(res%appv)
+  deallocate (res)
+
+  ! related test from pr118747 comment 2:
+  s = foo ()
+end program testy
+
+! { dg-final { scan-tree-dump-not "__builtin_memmove" "original" } }
-- 
2.43.0
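
For readers who do not have trans-expr.cc open, the condition change in the patch above can be read as a plain boolean predicate. The sketch below is in C and only mirrors the shape of the test: `toy_component` and the two functions are hypothetical, and the struct flattens the `cm->...` fields named in the diff into booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the fields tested by the patch.  */
struct toy_component
{
  bool allocatable;    /* cm->attr.allocatable */
  bool is_character;   /* cm->ts.type == BT_CHARACTER */
  bool deferred;       /* cm->ts.deferred */
  bool pdt_string;     /* cm->attr.pdt_string */
};

/* Condition before the patch: NULL initialization only when INIT is set.  */
static bool
null_init_old (bool init, struct toy_component cm, bool expr_is_null)
{
  return init && cm.allocatable && expr_is_null;
}

/* Condition after the patch: additionally take this path for character
   components that are neither deferred-length nor PDT strings.  */
static bool
null_init_new (bool init, struct toy_component cm, bool expr_is_null)
{
  return cm.allocatable && expr_is_null
         && (init || (cm.is_character && !(cm.deferred || cm.pdt_string)));
}

/* The new condition accepts everything the old one accepted.  */
static void
check_new_covers_old (bool init, struct toy_component cm, bool expr_is_null)
{
  assert (!null_init_old (init, cm, expr_is_null)
          || null_init_new (init, cm, expr_is_null));
}
```

The case from the PR is an allocatable `character(256)` component with `init` false: the old predicate rejects it, the new one accepts it, and deferred-length character is still left to the later code, as the comment in the patch says.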

Re: [PATCH] Fortran: fix initialization of allocatable non-deferred character [PR59252]

2025-02-07 Thread Steve Kargl

On Fri, Feb 07, 2025 at 09:31:12PM +0100, Harald Anlauf wrote:
> 
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
> 

Looks reasonable.

> Although this is a really old bug, it is wrong code, so I'd like to backport
> this to at least the 14-branch.  Any reservations?

If it passes regression testing, no reservations.


> -  else if (init && cm->attr.allocatable && expr->expr_type == EXPR_NULL)
> +  else if (cm->attr.allocatable && expr->expr_type == EXPR_NULL
> +&& (init
> +|| (cm->ts.type == BT_CHARACTER
> +&& !(cm->ts.deferred || cm->attr.pdt_string))))
>  {
> -  /* NULL initialization for allocatable components.  */
> +  /* NULL initialization for allocatable components.
> +  Deferred-length character is dealth with later.  */

s/dealth/dealt

-- 
Steve

[PATCH] tree-optimization/115538 - possible wrong-code with SLP conversion

The following fixes a latent issue where we use ranges to verify
correctness of a vector conversion optimization.  We rely on ranges
from 'op0' which for SLP is extracted from the representative stmt
which does not necessarily correspond to any actual scalar operation.
We also do not verify the range of all scalar lanes in the SLP
operand match.  The following rectifies this, restricting the support
to single-lane SLP nodes at this point - on branches we'd simply
not perform this optimization with SLP.
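
As a rough illustration of the restriction described above, here is a small C sketch. It does not use the vectorizer's types: `toy_lane` and `toy_range_fits` are hypothetical, and what the range is used for is simplified to a "fits in a signed N-bit type" test, which is an assumption made for the example. What it does mirror is the conservative rule: only a single-lane operand with a known scalar definition and a known range is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one scalar lane of an SLP operand.  Not GCC's types.  */
struct toy_lane
{
  bool has_def;      /* a scalar definition exists for this lane */
  bool has_range;    /* range information is known for that definition */
  int64_t min, max;  /* valid only when has_range is true */
};

/* Return true if the values may be handled in a signed integer type of
   BITS bits.  Multi-lane operands are rejected outright rather than
   merging the per-lane ranges.  */
static bool
toy_range_fits (const struct toy_lane *lanes, size_t nlanes, unsigned bits)
{
  assert (bits >= 2 && bits <= 63);
  if (nlanes != 1 || !lanes[0].has_def || !lanes[0].has_range)
    return false;
  int64_t lo = -((int64_t) 1 << (bits - 1));
  int64_t hi = ((int64_t) 1 << (bits - 1)) - 1;
  return lanes[0].min >= lo && lanes[0].max <= hi;
}
```

Merging the ranges of several lanes would be the natural extension; the sketch follows the patch in simply giving up for more than one lane.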

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/115538
* tree-vectorizer.h (vect_get_slp_scalar_def): Declare.
* tree-vect-slp.cc (vect_get_slp_scalar_def): New helper.
* tree-vect-generic.cc (expand_vector_conversion): Adjust.
* tree-vect-stmts.cc (vectorizable_conversion): For SLP
correctly look at ranges of the scalar defs of the SLP operand.
(supportable_indirect_convert_operation): Likewise.
---
 gcc/tree-vect-generic.cc |  6 ++
 gcc/tree-vect-slp.cc | 19 +++
 gcc/tree-vect-stmts.cc   | 37 +++--
 gcc/tree-vectorizer.h|  4 +++-
 4 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index c2f7a29d539..173ebd9a7ba 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -1755,10 +1755,8 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
 modifier = WIDEN;
 
   auto_vec<std::pair<tree, tree_code> > converts;
-  if (supportable_indirect_convert_operation (code,
- ret_type, arg_type,
- converts,
- arg))
+  if (supportable_indirect_convert_operation (code, ret_type, arg_type,
+ converts))
 {
   new_rhs = arg;
   for (unsigned int i = 0; i < converts.length () - 1; i++)
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index ac1733004b6..8ed746ea5a9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10199,6 +10199,25 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
   SLP_TREE_VEC_DEFS (op_node).quick_push (vop);
 }
 
+/* Get the scalar definition of the Nth lane from SLP_NODE or NULL_TREE
+   if there is no definition for it in the scalar IL or it is not known.  */
+
+tree
+vect_get_slp_scalar_def (slp_tree slp_node, unsigned n)
+{
+  if (SLP_TREE_DEF_TYPE (slp_node) == vect_internal_def)
+{
+  if (!SLP_TREE_SCALAR_STMTS (slp_node).exists ())
+   return NULL_TREE;
+  stmt_vec_info def = SLP_TREE_SCALAR_STMTS (slp_node)[n];
+  if (!def)
+   return NULL_TREE;
+  return gimple_get_lhs (STMT_VINFO_STMT (def));
+}
+  else
+return SLP_TREE_SCALAR_OPS (slp_node)[n];
+}
+
 /* Get the Ith vectorized definition from SLP_NODE.  */
 
 tree
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 1b639ae3b17..c815dd3a5b9 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5610,10 +5610,8 @@ vectorizable_conversion (vec_info *vinfo,
return false;
   gcc_assert (code.is_tree_code ());
   if (supportable_indirect_convert_operation (code,
- vectype_out,
- vectype_in,
- converts,
- op0))
+ vectype_out, vectype_in,
+ converts, op0, slp_op0))
{
  gcc_assert (converts.length () <= 2);
  if (converts.length () == 1)
@@ -5750,7 +5748,16 @@ vectorizable_conversion (vec_info *vinfo,
   else if (code == FLOAT_EXPR)
{
  wide_int op_min_value, op_max_value;
- if (!vect_get_range_info (op0, &op_min_value, &op_max_value))
+ if (slp_node)
+   {
+ tree def;
+ /* ???  Merge ranges in case of more than one lane.  */
+ if (SLP_TREE_LANES (slp_op0) != 1
+ || !(def = vect_get_slp_scalar_def (slp_op0, 0))
+ || !vect_get_range_info (def, &op_min_value, &op_max_value))
+   goto unsupported;
+   }
+ else if (!vect_get_range_info (op0, &op_min_value, &op_max_value))
goto unsupported;
 
  cvt_type
@@ -15197,7 +15204,7 @@ supportable_indirect_convert_operation (code_helper code,
tree vectype_out,
tree vectype_in,
vec<std::pair<tree, tree_code> > &converts,
-   tree op0)
+   tree op0, slp_tree slp_op0)
 {
   bool found_mode = false;
   scalar_mode lhs_mode = GET_MODE_INNER (TYPE_MODE (vectype_out));
@@ -15269,10 +15

[PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-07 Thread pan2 . li

From: Pan Li 

Inspired by PR118103, the VXRM register should be treated almost the
same as the FRM register, i.e. as a cooperatively-managed global register.
Thus, add VXRM to global_regs to keep the late-combine pass from
eliminating writes to it.

For example, take the code below:

  void compute ()
  {
    size_t vl = __riscv_vsetvl_e16m1 (N);
    vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
    vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
    vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, __RISCV_VXRM_RDN, vl);

    __riscv_vse16_v_u16m1 (c, vc, vl);
  }

  int main ()
  {
    initialize ();
    compute();

    return 0;
  }

After compiling with -march=rv64gcv -O3, we get:

  compute:
          csrwi   vxrm,2
          lui     a3,%hi(a)
          lui     a4,%hi(b)
          addi    a4,a4,%lo(b)
          vsetivli        zero,4,e16,m1,ta,ma
          addi    a3,a3,%lo(a)
          vle16.v v2,0(a4)
          vle16.v v1,0(a3)
          lui     a4,%hi(c)
          addi    a4,a4,%lo(c)
          vaaddu.vv       v1,v1,v2
          vse16.v v1,0(a4)
          ret
          .size   compute, .-compute
          .section        .text.startup,"ax",@progbits
          .align  1
          .globl  main
          .type   main, @function
  main:
          // csrwi   vxrm,2 deleted after inline
          addi    sp,sp,-16
          sd      ra,8(sp)
          call    initialize
          lui     a3,%hi(a)
          lui     a4,%hi(b)
          vsetivli        zero,4,e16,m1,ta,ma
          addi    a4,a4,%lo(b)
          addi    a3,a3,%lo(a)
          vle16.v v2,0(a4)
          vle16.v v1,0(a3)
          lui     a4,%hi(c)
          addi    a4,a4,%lo(c)
          li      a0,0
          vaaddu.vv       v1,v1,v2

The following test suite passes with this patch:
* The full rv64gcv regression test.

PR target/118103

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_conditional_register_usage): Add
the VXRM as the global_regs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr118103-2.c: New test.
* gcc.target/riscv/rvv/base/pr118103-run-2.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc |  4 +-
 .../gcc.target/riscv/rvv/base/pr118103-2.c| 40 +
 .../riscv/rvv/base/pr118103-run-2.c   | 44 +++
 3 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-run-2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 439cc12f93d..819e1538741 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10900,7 +10900,9 @@ riscv_conditional_register_usage (void)
call_used_regs[regno] = 1;
 }
 
-  if (!TARGET_VECTOR)
+  if (TARGET_VECTOR)
+global_regs[VXRM_REGNUM] = 1;
+  else
 {
   for (int regno = V_REG_FIRST; regno <= V_REG_LAST; regno++)
fixed_regs[regno] = call_used_regs[regno] = 1;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-2.c
new file mode 100644
index 000..d6e3aa09077
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-2.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d" } */
+
+#include "riscv_vector.h"
+
+#define N 4
+uint16_t a[N];
+uint16_t b[N];
+uint16_t c[N];
+
+void initialize ()
+{
+  uint16_t tmp_0[N] = { 0xfff, 3213, 238, 275, };
+
+  for (int i = 0; i < N; ++i)
+a[i] = b[i] = tmp_0[i];
+
+  for (int i = 0; i < N; ++i)
+c[i] = 0;
+}
+
+void compute ()
+{
+  size_t vl = __riscv_vsetvl_e16m1 (N);
+  vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
+  vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
+  vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, __RISCV_VXRM_RDN, vl);
+
+  __riscv_vse16_v_u16m1 (c, vc, vl);
+}
+
+int main ()
+{
+  initialize ();
+  compute();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {csrwi\s+vxrm,\s*[01234]} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-run-2.c
new file mode 100644
index 000..89150d4f6b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-run-2.c
@@ -0,0 +1,44 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-options "-O3" } */
+
+#include "riscv_vector.h"
+
+#define N 4
+uint16_t a[N];
+uint16_t b[N];
+uint16_t c[N];
+
+void initialize () {
+  uint16_t tmp_0[N] = { 0xfff, 3213, 238, 275, };
+  uint16_t tmp_1[N] = { 0x2,  823,  39,   9, };
+
+  for (int i = 0; i < N; ++i)
+{
+  a[i] = tmp_0[i];
+  b[i] = tmp_1[i];
+}
+
+  for (int i = 0; i < N; ++i)
+c[i] = 0;
+}
+
+void compute (

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

Really nice analysis!  Thanks for writing this up.

Sorry for the big quote below, but:

Jan Hubicka  writes:
>> > +/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
>> > + 
>> > +static int
>> > +ix86_ira_callee_saved_register_cost_scale (int)
>> > +{
>> > +  return 1;
>> > +}
>> > +
>
>> > return cl;
>> >   }
>> > +int
>> > +default_ira_callee_saved_register_cost_scale (int)
>> > +{
>> > +  return (optimize_size
>> > +? 1
>> > +: REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>> > +}
>> > +
>
> I am not sure how this makes sense - why x86 would be significantly
> different from other targets?
>
> I think the only slightly non-standard thing is that prologue/epilogue
> code can use push/pop, which is shorter than the mov used to save caller
> saved registers.
>
> I went through a few testcases:
>
> void d();
> void a()
> {   
> int b;
> asm ("use %0":"=r" (b));
> d();
> asm volatile (""::"r" (b));
> }
> compiled with -O2 -fira-verbose=1 gives:
>
> Popping a0(r99,l0)  -- (0=12000,12000) (1=12000,12000) (2=12000,12000) 
> (4=12000,12000) (5=12000,12000) (36=12000,12000) (37=12000,12000) 
> (38=12000,12000) (39=12000,12000) (3=11000,11000) (6=11000,11000) 
> (40=11000,11000) (41=11000,11000) (42=11000,11000) (43=11000,11000)
>
> load and save costs are 6. So spill pair is 12 weighted by 1000 that is
> REG_FREQ_MAX.
>
> Register 0 (EAX) has cost 12000 which makes sense to me:
>   - load and save costs are 6, combined spill pair is 12
>   - REG_FREQ_MAX is 1000 and since function has only one BB, it has
> maximal frequency, so we get 12000.
>
> Register 3 (first callee saved) has cost 11000.  This comes from:
> add_cost = ((ira_memory_move_cost[mode][rclass][0]
>              + ira_memory_move_cost[mode][rclass][1])
>             * saved_nregs / hard_regno_nregs (hard_regno, mode) - 1)
>                                                                 ^^^
>                                                                 here
>            * (optimize_size ? 1 :
>               REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>
> There is no comment explaining the -1, but I suppose it is there to bias
> costs towards prologue/epilogue instead of a caller save sequence when the
> runtime cost estimate is even.
>
> Now for
> void d();
> void a()
> {
> for (int i = 0; i < 100; i++)
>   d();
> int b;
> asm ("use %0":"=r" (b));
> d();
> asm volatile (""::"r" (b));
> }
>
> I get
>
>   Popping a0(r100,l0)  -- (0=120,120) (1=120,120) (2=120,120) (4=120,120) 
> (5=120,120) (36=120,120) (37=120,120) (38=120,120) (39=120,120) (3=0,0) 
> (6=110,110) (40=110,110) (41=110,110) (42=110,110) (43=110,110)
>
> This also makes sense to me: since there is a loop, the basic block has
> a lower frequency of 10, thus costs are scaled down.
>
> void d();
> int cnd;
> void a()
> {
> int b;
> asm ("use %0":"=r" (b));
>
> if (__builtin_expect_with_probability (cnd, 1, 0.8))
>   d();
> asm volatile (""::"r" (b));
> }
>
> I get
>
>  Popping a0(r100,l0)  -- (0=9600,9600) (1=9600,9600) (2=9600,9600) 
> (4=9600,9600) (5=9600,9600) (36=9600,9600) (37=9600,9600) (38=9600,9600) 
> (39=9600,9600) (3=11000,11000) (6=11000,11000) (40=11000,11000) 
> (41=11000,11000) (42=11000,11000) (43=11000,11000)
>
> which also seems correct.  It is better to use a caller saved register
> since the call to d() has lower frequency than the entry basic block. This
> is what gcc 13 and this patch get wrong
>
>  Popping a0(r100,l0)  -- (1=9600,9600) (2=9600,9600) (4=9600,9600) 
> (5=9600,9600) (36=9600,9600) (37=9600,9600) (38=9600,9600) (39=9600,9600) 
> (3=11,11) (6=11,11) (40=11,11) (41=11,11) (42=11,11) (43=11,11)
>
> Due to the missing scaling factor we think that using a callee saved
> register is a win while it is not.  GCC13 gets this wrong even for
> probability 0.
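
For anyone following the arithmetic in the dumps above, here is a minimal C sketch of the two costs being compared. It is not IRA code: the function names and parameters are made up for illustration, and it assumes a single hard register. Only the shape of the formula is taken from the quoted text: a store/load pair, the hard-wired -1 bias on the callee-saved side, and scaling by a block frequency out of REG_FREQ_MAX = 1000.

```c
#include <assert.h>

/* Illustrative only: these helpers are not part of GCC.  */
enum { TOY_REG_FREQ_MAX = 1000 };

/* Cost of keeping a value in a call-clobbered register across a call:
   one store plus one load, weighted by the frequency of the call's block.  */
static int
toy_caller_save_cost (int store_cost, int load_cost, int call_freq)
{
  assert (call_freq >= 0 && call_freq <= TOY_REG_FREQ_MAX);
  return (store_cost + load_cost) * call_freq;
}

/* Cost of using a callee-saved register: one save/restore pair in the
   prologue/epilogue, minus the bias of 1, weighted by SCALE.  SCALE is the
   entry block frequency, or 1 when no scaling is applied.  */
static int
toy_callee_save_cost (int store_cost, int load_cost, int scale)
{
  assert (scale >= 1 && scale <= TOY_REG_FREQ_MAX);
  return (store_cost + load_cost - 1) * scale;
}
```

With store and load costs of 6, a call in a block of frequency 1000 gives 12000 against 11000; a call executed 80% of the time gives 9600 against 11000; and dropping the scale on the callee-saved side gives 11, which is the unscaled case shown in the last dump.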
>
> Looking into PRs referenced in the patch:
> PR111673 is the original bug that motivated correcting the cost (adding
>  the scale by entry block frequency)
> PR115932 is cris-elf, which I don't know how to benchmark easily.
> PR116028 seems to be about shrink wrapping in
>
> void f(int *i)
> {
> if (!i)
> return;
> else
> {
> __builtin_printf("Hi");
> *i=0;
> }
> }
>
> here I see that the cost model misses the fact that the epilogue will be
> shrink-wrapped so both caller and callee saving will result in one spill
> after the early exit.
>
> PR117081 is about a regression in povray. The reduced testcase:
>
> void foo (void);
> void bar (void);
>
> int
> test (int a)
> {
>   int r;
>
>   if (r = -a)
> foo ();
>   else
> bar ();
>
>   return r;
> }
>
> shows that we now use caller saved register (EAX) to hold the return value 
> which yields longer code.  The costs are
> Popping

RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]

2025-02-07 Thread Tamar Christina

> -----Original Message-----
> From: Richard Biener 
> Sent: Wednesday, February 5, 2025 1:15 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: RE: [PATCH]middle-end: delay checking for alignment to load 
> [PR118464]
> 
> On Wed, 5 Feb 2025, Tamar Christina wrote:
> 
> [...]
> 
> > >
> 6eda40267bd1382938a77826d11f20dcc959a166..d0148d4938cafc3c59edfa6a
> > > > > 60002933f384f65b 100644
> > > > > > --- a/gcc/tree-vect-data-refs.cc
> > > > > > +++ b/gcc/tree-vect-data-refs.cc
> > > > > > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences
> > > (loop_vec_info
> > > > > loop_vinfo)
> > > > > >   if (is_gimple_debug (stmt))
> > > > > > continue;
> > > > > >
> > > > > > - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > > > > + stmt_vec_info stmt_vinfo
> > > > > > +   = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > > > > >   auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > > > >   if (!dr_ref)
> > > > > > continue;
> > > > > > @@ -748,26 +749,15 @@ vect_analyze_early_break_dependences
> > > > > (loop_vec_info loop_vinfo)
> > > > > >  bounded by VF so accesses are within range.  We only need 
> > > > > > to
> check
> > > > > >  the reads since writes are moved to a safe place where if 
> > > > > > we get
> > > > > >  there we know they are safe to perform.  */
> > > > > > - if (DR_IS_READ (dr_ref)
> > > > > > - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > > > > > + if (DR_IS_READ (dr_ref))
> > > > > > {
> > > > > > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > > > > > - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > > > > > -   {
> > > > > > - const char *msg
> > > > > > -   = "early break not supported: cannot peel "
> > > > > > - "for alignment, vectorization would read out of "
> > > > > > - "bounds at %G";
> > > > > > - return opt_result::failure_at (stmt, msg, stmt);
> > > > > > -   }
> > > > > > -
> > > > > >   dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > > > > >   dr_info->need_peeling_for_alignment = true;
> > > > >
> > > > > You're setting the flag on any DR of a DR group here ...
> > > > >
> > > > > >   if (dump_enabled_p ())
> > > > > > dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > -"marking DR (read) as needing peeling 
> > > > > > for
> "
> > > > > > -"alignment at %G", stmt);
> > > > > > +"marking DR (read) as possibly needing
> peeling "
> > > > > > +"for alignment at %G", stmt);
> > > > > > }
> > > > > >
> > > > > >   if (DR_IS_READ (dr_ref))
> > > > > > @@ -1326,9 +1316,6 @@ vect_record_base_alignments (vec_info *vinfo)
> > > > > > Compute the misalignment of the data reference DR_INFO when
> vectorizing
> > > > > > with VECTYPE.
> > > > > >
> > > > > > -   RESULT is non-NULL iff VINFO is a loop_vec_info.  In that case, 
> > > > > > *RESULT
> will
> > > > > > -   be set appropriately on failure (but is otherwise left 
> > > > > > unchanged).
> > > > > > -
> > > > > > Output:
> > > > > > 1. initialized misalignment info for DR_INFO
> > > > > >
> > > > > > @@ -1337,7 +1324,7 @@ vect_record_base_alignments (vec_info *vinfo)
> > > > > >
> > > > > >  static void
> > > > > >  vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info 
> > > > > > *dr_info,
> > > > > > -tree vectype, opt_result *result = 
> > > > > > nullptr)
> > > > > > +tree vectype)
> > > > > >  {
> > > > > >stmt_vec_info stmt_info = dr_info->stmt;
> > > > > >vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > > > > > @@ -1365,66 +1352,6 @@ vect_compute_data_ref_alignment (vec_info
> > > *vinfo,
> > > > > dr_vec_info *dr_info,
> > > > > >  = exact_div (targetm.vectorize.preferred_vector_alignment 
> > > > > > (vectype),
> > > > > >  BITS_PER_UNIT);
> > > > > >
> > > > > > -  /* If this DR needs peeling for alignment for correctness, we 
> > > > > > must
> > > > > > - ensure the target alignment is a constant power-of-two 
> > > > > > multiple of the
> > > > > > - amount read per vector iteration (overriding the above hook 
> > > > > > where
> > > > > > - necessary).  */
> > > > > > -  if (dr_info->need_peeling_for_alignment)
> > > > > > -{
> > > > > > -  /* Vector size in bytes.  */
> > > > > > -  poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT
> > > (vectype));
> > > > > > -
> > > > > > -  /* We can only peel for loops, of course.  */
> > > > > > -  gcc_checking_assert (loop_vinfo);
> > > > > > -
> > > > > > -  /* Calculate the number of vectors read per vector 
> > > > > > iteration.  If
> > > > > > -it is a power of two, mul

Re: [Patch] [gcn] Fix gfx906's sramecc setting


On 06/02/2025 22:09, Tobias Burnus wrote:

ROCm 6.3.2 does not like my patch for reasons that I do not understand;
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675200.html

Until that's sorted, I decided to split off two obvious fixes;
I might suggest some further changes, but the full patch has to
wait until generic really works.

* * *

The attached patch adds '...' to the device macro to avoid touching
this when adding new fields.

And it fixes an issue with gfx906, which shows up when compiling,
e.g. as
   gcc -g -fopenmp -foffload-options=amdgcn-amdhsa=-march=gfx906 file.c
(with offloading code in file.c).

The reason is that '-g' causes mkoffload.cc to create a .o file with
debugging symbols - and that .o file is linked with the GCN device
files. While that file does not contain executable code, the ELF
header still needs to match the GCN .o files as otherwise the linker
complains that there is a mismatch. For the line above, it complains
about: "ld: error: incompatible sramecc: /tmp/ccLhwZle.o".

And in line with the GCC 14 code in mkoffload.cc and with the entries
in https://llvm.org/docs/AMDGPUUsage.html#amdgpu-processor-table for
gfx906 + the llvm-mc / lld implementation, that means that the sramecc
type is 'any' and not unsupported.

OK for mainline?


OK.

Andrew

[PATCH] i386: Fix ICE with conditional QI/HI vector maxmin [PR118776]

2025-02-07 Thread Jakub Jelinek

Hi!

The following testcase ICEs starting with GCC 12 since r12-4526
although the bug has been introduced already in r12-2751.
The problem was in the addition of cond_ define_expand
which uses nonimmediate_operand predicates for both maxmin operands
for all VI1248_AVX512VLBW modes.  It works fine with
VI48_AVX512VL modes because the 3_mask VI48_AVX512VL
define_expand uses ix86_fixup_binary_operands_no_copy and the
*avx512f_3 VI48_AVX512VL define_insn uses
% in constraint and !(MEM_P && MEM_P) check in condition (and
3 define_expand with VI124_256_AVX512F_AVX512BW iterator
does that too), but even though the 8-bit and 16-bit element maxmin
is commutative too, the 3
define_insn with VI12_AVX512VL iterator didn't use % in constraint
to make it commutative.  So, e.g. cond_umaxv32qi define_expand
allowed nonimmediate_operand for both umax operands, but used
gen_umaxv32qi_mask which wasn't commutative and only allowed
nonimmediate_operand for the second operand.

The following patch fixes it by keeping the 3
VI124_256_AVX512F_AVX512BW define_expand as is (it does
ix86_fixup_binary_operands_no_copy) but extending the
3_mask define_expand from VI48_AVX512VL to
VI1248_AVX512VLBW which keeps the current modes with their
ISA conditions and adds the VI12_AVX512VL modes under additional
TARGET_AVX512BW condition, and turning the actual define_insn
into an * prefixed name (which it was before just for the non-masked
case) and having the same commutative operand handling as in other
define_insns.
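
The operand constraint at play here can be illustrated with a toy model in C. This is not the backend's code and does not reproduce ix86_fixup_binary_operands_no_copy: `toy_kind`, `toy_binop`, `toy_insn_ok` and `toy_fixup` are hypothetical, and the model is deliberately stricter than the real insn in that it always leaves operand 1 in a register. It only shows why a commutative operation can absorb a memory first operand by swapping, and why two memory operands need one of them loaded first.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_kind { TOY_REG, TOY_MEM };

struct toy_binop
{
  enum toy_kind op1, op2;
  bool swapped;   /* the two inputs were exchanged */
  bool loaded;    /* operand 1 was copied into a fresh register */
};

/* What the insn condition in the patch demands: not both in memory.  */
static bool
toy_insn_ok (struct toy_binop b)
{
  return !(b.op1 == TOY_MEM && b.op2 == TOY_MEM);
}

/* Bring the operands into a form the insn accepts.  A commutative
   operation may swap a memory operand 1 with a register operand 2;
   otherwise a memory operand 1 is loaded into a register.  */
static struct toy_binop
toy_fixup (enum toy_kind op1, enum toy_kind op2, bool commutative)
{
  struct toy_binop b = { op1, op2, false, false };
  if (commutative && b.op1 == TOY_MEM && b.op2 == TOY_REG)
    {
      b.op1 = TOY_REG;
      b.op2 = TOY_MEM;
      b.swapped = true;
    }
  if (b.op1 == TOY_MEM)
    {
      b.op1 = TOY_REG;
      b.loaded = true;
    }
  assert (toy_insn_ok (b));
  return b;
}
```

In this model a (mem, reg) pair costs nothing extra when the operation is commutative, while a (mem, mem) pair always needs one load.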

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-02-07  Jakub Jelinek  

PR target/118776
* config/i386/sse.md (3_mask): Use VI1248_AVX512VLBW
iterator rather than VI48_AVX512VL.
(3): Rename to ...
(*avx512bw_3): ... this.  Use
nonimmediate_operand rather than register_operand predicate and %v
rather than v constraint for operand 1 and adjust condition to reject
MEMs in both operand 1 and 2.

* gcc.target/i386/pr118776.c: New test.

--- gcc/config/i386/sse.md.jj   2025-01-23 15:54:53.160911648 +0100
+++ gcc/config/i386/sse.md  2025-02-07 00:16:49.155363094 +0100
@@ -17703,12 +17703,12 @@ (define_expand "cond_"
 })
 
 (define_expand "3_mask"
-  [(set (match_operand:VI48_AVX512VL 0 "register_operand")
-   (vec_merge:VI48_AVX512VL
- (maxmin:VI48_AVX512VL
-   (match_operand:VI48_AVX512VL 1 "nonimmediate_operand")
-   (match_operand:VI48_AVX512VL 2 "nonimmediate_operand"))
- (match_operand:VI48_AVX512VL 3 "nonimm_or_0_operand")
+  [(set (match_operand:VI1248_AVX512VLBW 0 "register_operand")
+   (vec_merge:VI1248_AVX512VLBW
+ (maxmin:VI1248_AVX512VLBW
+   (match_operand:VI1248_AVX512VLBW 1 "nonimmediate_operand")
+   (match_operand:VI1248_AVX512VLBW 2 "nonimmediate_operand"))
+ (match_operand:VI1248_AVX512VLBW 3 "nonimm_or_0_operand")
  (match_operand: 4 "register_operand")))]
   "TARGET_AVX512F"
   "ix86_fixup_binary_operands_no_copy (, mode, operands);")
@@ -17724,12 +17724,12 @@ (define_insn "*avx512f_3")])
 
-(define_insn "3"
+(define_insn "*avx512bw_3"
   [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
 (maxmin:VI12_AVX512VL
-  (match_operand:VI12_AVX512VL 1 "register_operand" "v")
+  (match_operand:VI12_AVX512VL 1 "nonimmediate_operand" "%v")
   (match_operand:VI12_AVX512VL 2 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX512BW"
+  "TARGET_AVX512BW && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vp\t{%2, %1, 
%0|%0, %1, %2}"
   [(set_attr "type" "sseiadd")
(set_attr "prefix" "evex")
--- gcc/testsuite/gcc.target/i386/pr118776.c.jj 2025-02-07 08:41:46.054157905 +0100
+++ gcc/testsuite/gcc.target/i386/pr118776.c 2025-02-07 08:40:30.508196302 +0100
@@ -0,0 +1,23 @@
+/* PR target/118776 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512bw -mavx512vl" } */
+
+void bar (unsigned char *);
+
+void
+foo (unsigned char *x)
+{
+  unsigned char b[32];
+  bar (b);
+  for (int i = 0; i < 32; i++)
+{
+  unsigned char c = 8;
+  if (i > 3)
+   {
+ unsigned char d = b[i];
+ d = 1 > d ? 1 : d;
+ c = d;
+   }
+  x[i] = c;
+}
+}

Jakub

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-07 Thread Jan Hubicka

> On Thu, Feb 6, 2025 at 11:40 PM Vladimir Makarov  wrote:
> >
> >
> > On 2/6/25 4:54 PM, Richard Sandiford wrote:
> >
> > Vladimir Makarov  writes:
> >
> > This is a complicated problem that has resulted in many attempts to fix it
> > in some general way.
> >
> > In general I agree with Surya's approach to scale the cost of reg
> > saves/restores somehow.  But the general approach, although it solved some
> > problems, also created a lot of new ones.  Maybe because IRA does not
> > take into account some other aspects of using callee saved regs.  And some
> > of them were addressed by other patches, e.g. the ones recently proposed
> > by Surya and one for PR118497.
> >
> > I also agree with Richard Sandiford's comment that we should avoid
> > introducing new hooks for RA, and I actually tried to stick to this
> > policy for a long time.  But I don't see a solution other than introducing
> > the new hook in this case.  It is hard to figure out generally in RA
> > that saves/restores will be insns different from ld/st (e.g. x86
> > push/pop) and that they will be cheaper.
> >
> > So after some time to think about the patch I decided to approve the RA
> > part of the patch.  I also hope that the work on this problem will
> > continue (e.g. improving default and target hook implementations and
> > documentation on how to better use it).
> >
> > For the record, I strongly object to this.  The hook just seems like a
> > complete hack to me.  Even if we accept that there is target-specific
> > information in play, the hook isn't providing that information.
> >
> > In contrast to what you said above, my objection isn't to having hooks
> > -- those are often needed and good.  What bothers me is that the hook
> > isn't well designed.  Hooks should provide information rather than
> > override code by brute force.
> >
> > On the other hand, I accept that you're (rightly!) the maintainer.
> >
> > I also don't like the hook implementation for x86-64 (although this is a
> > matter for target maintainers).  All these costs look voodoo and random to
> > me.
> >
> > But this problem has been lingering for more than half a year.  I spent a
> > lot of time on this too.  Patches were submitted and reverted and nobody
> > has so far found any solution satisfying all GCC tests.  If somebody finds
> > a solution without the hook, I will be glad to get rid of the hook.  Also
> > the related PRs are marked as P1 ones, which means people think they are
> > important (I am not sure about this myself).  Without fixing them (or
> > downgrading them) there will be no GCC release.  So I am in a difficult
> > situation with these PRs and need some resolution.
> 
> Just to chime in as the one who likely made those PRs P1.  'P1' here
> is really about the testsuite FAIL,
> how to resolve it is up to target maintainers - P1 should make them
> look, and adjusting the testcase
> (or even XFAILing it) is a valid resolution of the P1 regression.

I certainly understand the pain with heuristics changes disturbing the
testsuite and bringing some regressions and some improvements, making it
difficult to weigh the overall impact.  With inliner heuristics this
happens often.

We are good at tracking regressions, but if something improves, it goes
unmentioned. If some heuristic was formerly wrong and fixing it
introduces some problems, we are kind of biased towards keeping the old,
broken one. Consistency is good, but I think in this specific case we
should see overall improvements.

The testcases where we previously shrink-wrapped and now use caller
saved registers are IMO not necessarily a problem.  We had a bug in the cost
model that made us use callee saved registers too often, and
shrink-wrapping helped to mitigate it. Using a caller saved register
has the same effect if the frequency of caller saving sums to 1.  So perhaps
adjusting the testcases to have more calls will restore the original
behaviour in cases where they are intended to verify that shrink-wrapping
works.
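
The break-even point mentioned above can be written out as simple arithmetic. This is only a sketch under stated assumptions: the unit cost `kSaveRestoreCost` and both function names are made up for illustration, and IRA's real cost model is considerably more involved.

```cpp
#include <cassert>
#include <vector>

// Cost of one save plus one restore, in arbitrary units (an assumption).
constexpr double kSaveRestoreCost = 2.0;

// Callee-saved register: saved and restored once in prologue/epilogue,
// which execute with frequency 1 relative to function entry.
double callee_saved_cost() {
  return 1.0 * kSaveRestoreCost;
}

// Caller-saved register: saved and restored around every call the value is
// live across, weighted by each call's frequency relative to function entry.
double caller_saved_cost(const std::vector<double>& call_freqs) {
  double total = 0.0;
  for (double freq : call_freqs)
    total += freq * kSaveRestoreCost;
  return total;
}
```

With call frequencies summing to 1 the two strategies cost the same; below that the caller-saved choice is cheaper, which is why a testcase with few or rarely executed calls no longer needs shrink-wrapping to look good.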

Naturally it would be nice to make the cost model anticipate the
shrink-wrapping, but that looks like quite a hard problem to me, since
precise conditions are known only post-reload.  Perhaps one can special
case some early exit conditions where shrink-wrapping is most important.

I will prepare a patch for noreturns.  They are probably not too important
in practice but easy to get right.

Honza

Re: [PATCH v2] RISC-V: Fix wrong LMUL when only implict zve32f.

2025-02-07 Thread Monk Chiang

Hi Robin,
Thanks for your comment. I think your point is correct, especially the part
about SEWmin.
I will revise this patch again.

On Wed, Feb 5, 2025 at 4:18 PM Robin Dapp  wrote:

> > Hi Robin,
> > Sorry, I should have simplified the problem by presenting it in terms of
> > Zve32x, because Zve32f implies Zve32x.
> > As the specification states, the requirement is to support LMUL ≥
> SEW/ELEN.
> > Regarding the implementation,
>
> But the spec requirement mentions SEW_min not SEW?
>
> "In general, the requirement is to support LMUL ≥ SEWMIN/ELEN, where
> SEWMIN is
> the narrowest supported SEW value and ELEN is the widest supported SEW
> value"
>
> Further it states:
> "For standard vector extensions with ELEN=32, fractional LMULs of 1/2 and
> 1/4
> must be supported."
>
> > I followed this rule to fix the problem.
> > In this link: https://godbolt.org/z/j59oTW371, there is a vsetivli
> > zero,2,e32,mf2,ta,ma.
> > Here, SEW=32, and Zve32x has ELEN=32, which makes LMUL=1/2 illegal.
> >
> > According to the rule LMUL ≥ SEW/ELEN => LMUL ≥ 32 / 32 => LMUL ≥  1.
>
> As you're specifying VLEN=128 (with zvl128b) we enable the respective
> modes,
> i.e. V2SI and V4SI.  With VLEN=32 those wouldn't be available.
> RVVMF2SI already has the requirement TARGET_MIN_VLEN > 32 so wouldn't be
> chosen
> either.
>
> If LMUL <= 1 were illegal for all zve32 we couldn't vectorize anything that
> doesn't fit a full vector?  That can't be correct and would severely limit
> us.
>
> I think the only necessary change is to make sure we're not emitting
> LMUL = 1/8 for ELEN=32/zve32.  That should be a far less invasive change.
>
> >> In particular, how would the same LMUL for AVL=2 and AVL=4 and the same
> data
> >> type be correct?
> >
> > That's right. The case just allocates more space, but storing 2 and 4
> > elements remains the same.
>
> Even if a V2SI with LMUL=1 on VLEN=128 doesn't lead to a SIGILL right away
> it would surely modify the overlap constraints and such.  To me that
> doesn't
> look right.
>
> --
> Regards
>  Robin
>
>
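
The quoted rule "LMUL ≥ SEWMIN/ELEN" can be checked with a few lines of arithmetic. This is a minimal sketch, assuming SEWMIN = 8 as in the standard vector extensions; the function names are hypothetical and nothing here is taken from the RISC-V backend.

```cpp
#include <cassert>

// Returns D such that the smallest LMUL an implementation is required to
// support is 1/D, following the quoted rule LMUL >= SEWMIN/ELEN.
constexpr unsigned min_lmul_denominator(unsigned sew_min, unsigned elen) {
  return elen / sew_min;
}

// True if the fractional setting LMUL = 1/denominator is within the set the
// rule requires an implementation to support.
constexpr bool fractional_lmul_required(unsigned denominator,
                                        unsigned sew_min, unsigned elen) {
  return denominator <= min_lmul_denominator(sew_min, elen);
}

// ELEN=32 (zve32*): 1/2 and 1/4 are required, 1/8 is not.
static_assert(min_lmul_denominator(8, 32) == 4, "smallest required LMUL is 1/4");
static_assert(!fractional_lmul_required(8, 8, 32), "LMUL=1/8 is outside the rule");
```

This agrees with the quoted statement that fractional LMULs of 1/2 and 1/4 must be supported for ELEN=32, and with the suggestion to avoid only LMUL = 1/8 there.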

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

Richard Sandiford  writes:
> In particular, one thing that the examples above have in common is that
> they don't need to allocate a frame for local variables.  That seems
> like it ought to be part of the mix.  If we need to allocate a frame
> using addition anyway, then presumably one of the advantages of push/pop
> over callee saves goes away.

Gah, of course I meant caller saves.

Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c


This patch is part of the following series (all unreviewed so far)
but can be independently applied:

* [Patch] [gcn] Fix gfx906's sramecc setting,
  https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675251.html

* "[gcn] Add gfx9-generic and generic-associated gfx*"
  (email subject: "Re: [Patch] [GCN] Handle generic ISA names in libgomp's 
plugin-gcn.c";
   this thread), 
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675259.html

* * *

This patch permits loading generic ISA code objects by simply
trying whether the runtime accepts them.  If not, it fails with
an error.  The error messages should also be a bit more helpful in
some cases than before.
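
For orientation, a minimal sketch of how a generic code object can be recognised from its ELF header, assuming the generic version occupies the top byte of `e_flags` (mask 0xff000000, shift 24) and that EI_ABIVERSION is byte 8 of `e_ident`. `MiniElfHeader` is a hypothetical stand-in for the two `Elf64_Ehdr` fields that matter here; the constants follow the names used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Constants named after the #defines in the patch (values are assumptions
// as stated in the lead-in).
constexpr std::uint8_t kElfAbiVersionAmdgpuHsaV6 = 4;
constexpr std::uint32_t kGenericVersionMask = 0xff000000u;
constexpr unsigned kGenericVersionOffset = 24;
constexpr unsigned kEiAbiVersion = 8;  // Index of EI_ABIVERSION in e_ident.

// Hypothetical stand-in for the relevant Elf64_Ehdr fields.
struct MiniElfHeader {
  std::uint8_t e_ident[16];
  std::uint32_t e_flags;
};

// Extracts the generic-version field; 0 means "not a generic code object".
constexpr std::uint32_t generic_version(std::uint32_t e_flags) {
  return (e_flags & kGenericVersionMask) >> kGenericVersionOffset;
}

// A code object is treated as generic if it uses the V6 ABI and carries a
// non-zero generic version.
bool is_generic_code_object(const MiniElfHeader& header) {
  return header.e_ident[kEiAbiVersion] == kElfAbiVersionAmdgpuHsaV6
         && generic_version(header.e_flags) != 0;
}
```

Whether the runtime actually accepts such an object is a separate question, which is why the patch defers that decision to the load attempt.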


OK for mainline?


This becomes useful when configuring a gfx*-generic multilib,
once ROCR ("ROCm") supports it; thus, this is a future-proof
patch.

* * *

Note: This currently fails with all ROCm <= 6.3.2 as those
either don't recognize the generic ISA code or do not support
generic resolution. However, it looks as if one of the next
ROCm (AOMP?) releases will do.

Note 2: As the generic ISA is not yet supported, this patch does
not suggest compiling with -march=gfx*-generic, yet.

Note 3: The patch series is based on it (with some modifications):
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675200.html
that also has this -march= diagnostic.

Tobias
[GCN] Handle generic ISA names in libgomp's plugin-gcn.c

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (ELFABIVERSION_AMDGPU_HSA_V6,
	EF_AMDGPU_GENERIC_VERSION_V, EF_AMDGPU_GENERIC_VERSION_OFFSET,
	GET_GENERIC_VERSION): New #define.
	(elf_gcn_isa_is_generic): New.
	(isa_matches_agent): Accept all generic code objects on the first
	go; extend the diagnostic and handle runtime-failed case.
	(create_and_finalize_hsa_program): Call it also after loading
	the code failed, pass the status.

 libgomp/plugin/plugin-gcn.c | 116 ++--
 1 file changed, 90 insertions(+), 26 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 8015a6f80f3..54d11478635 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -66,6 +66,14 @@
 #define R_AMDGPU_RELATIVE64	13	/* B + A  */
 #endif
 
+#define ELFABIVERSION_AMDGPU_HSA_V6		4
+
+#define EF_AMDGPU_GENERIC_VERSION_V		0xff000000  /* Mask.  */
+#define EF_AMDGPU_GENERIC_VERSION_OFFSET	24
+
+#define GET_GENERIC_VERSION(VAR) ((VAR & EF_AMDGPU_GENERIC_VERSION_V) \
+  >> EF_AMDGPU_GENERIC_VERSION_OFFSET)
+
 /* GCN specific definitions for asynchronous queues.  */
 
 #define ASYNC_QUEUE_SIZE 64
@@ -242,7 +250,7 @@ struct kernel_dispatch
 };
 
 /* Structure of the kernargs segment, supporting console output.
- 
+
This needs to match the definitions in Newlib, and the expectations
in libgomp target code.  */
 
@@ -1668,6 +1676,13 @@ elf_gcn_isa_field (Elf64_Ehdr *image)
   return image->e_flags & EF_AMDGPU_MACH_MASK;
 }
 
+static int
+elf_gcn_isa_is_generic (Elf64_Ehdr *image)
+{
+  return (image->e_ident[8] == ELFABIVERSION_AMDGPU_HSA_V6
+	  && GET_GENERIC_VERSION (image->e_flags));
+}
+
 /* Returns the name that the HSA runtime uses for the ISA or NULL if we do not
support the ISA. */
 
@@ -2399,38 +2414,87 @@ init_basic_kernel_info (struct kernel_info *kernel,
   return true;
 }
 
-/* Check that the GCN ISA of the given image matches the ISA of the agent. */
+/* If status is SUCCESS, assume that the code runs if either the ISA of agent
+   and code is the same - or it is generic code.
+   Otherwise, execution failed with the provided status code; try to give
+   some useful diagnostic.  */
 
 static bool
-isa_matches_agent (struct agent_info *agent, Elf64_Ehdr *image)
+isa_matches_agent (struct agent_info *agent, Elf64_Ehdr *image,
+		   hsa_status_t status)
 {
+  /* Generic image - assume that it works and only return to here
+ when it fails, i.e. fatal == true.  */
+  if (status == HSA_STATUS_SUCCESS && elf_gcn_isa_is_generic (image))
+return true;
+
   int isa_field = elf_gcn_isa_field (image);
-  const char* isa_s = isa_name (isa_field);
-  if (!isa_s)
+  if (status == HSA_STATUS_SUCCESS && isa_field == agent->device_isa)
+return true;
+
+  /* Either nongeneric and mismatch of the ISA - or generic but
+ not handled by the ROCm (e.g. because it is too old).  */
+
+  char msg[340];
+  char agent_isa_xs[8];
+  char device_isa_xs[8];
+  const char *agent_isa_s = isa_name (agent->device_isa);
+  const char *device_isa_s = isa_name (isa_field);
+  if (agent_isa_s == NULL)
 {
-  hsa_error ("Unsupported ISA in GCN code object.", HSA_STATUS_ERROR);
-  return false;
+  snprintf (agent_isa_xs, sizeof agent_isa_xs,
+		"0x%X", agent->device_isa);
+  agent_isa_s = agent_isa_xs;
 }
-
-  if (isa_field != agent->device_isa)
+  if (device_isa_s == NULL)
 {
-  char msg[204];
-  const char *agent_isa_s = isa_name (agent->device_isa);
-  assert (agent_isa_s);
-
-  snprintf (msg, sizeof msg,
-		"GCN code object ISA '%s' do

Re: [PATCH] aarch64: Fix bootstrap with --enable-checking=release [PR118771]

2025-02-07 Thread Kyrylo Tkachov




> On 7 Feb 2025, at 01:04, Andrew Pinski  wrote:
> 
> With release checking we get an uninitialization warning
> inside aarch64_split_move because of jump threading for the case of
> `npieces==0`, but `npieces` is never 0 (though there is no way the compiler
> can know that).
> So this fixes the issue by adding a `gcc_assert` to the function which asserts
> that `npieces > 0` and fixes the uninitialization warning.
> 
> Bootstrapped and tested on aarch64-linux-gnu (with and without 
> --enable-checking=release).
> 
> The warning:
> 
> aarch64.cc: In function 'void aarch64_split_move(rtx, rtx, machine_mode)':
> aarch64.cc:3418:31: error: '*(rtx_def**)((char*)&dst_pieces + 
> offsetof(auto_vec,auto_vec::m_data[0]))' may be 
> used uninitialized [-Werror=maybe-uninitialized]
> 3418 |   if (reg_overlap_mentioned_p (dst_pieces[0], src))
>  |   ^~~~
> aarch64.cc:3408:20: note: 'dst_pieces' declared here
> 3408 |   auto_vec dst_pieces, src_pieces;
>  |^~
> 
> PR target/118771
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64.cc (aarch64_split_move): Assert that npieces is
> greater than 0.
> 

Ok.
Thanks,
Kyrill

> Signed-off-by: Andrew Pinski 
> ---
> gcc/config/aarch64/aarch64.cc | 3 +++
> 1 file changed, 3 insertions(+)
> 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index c1e40200806..f5f23f6ff4b 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -3407,6 +3407,9 @@ aarch64_split_move (rtx dst, rtx src, machine_mode 
> single_mode)
>GET_MODE_SIZE (single_mode)).to_constant ();
>   auto_vec dst_pieces, src_pieces;
> 
> +  /* There should be at least one piece. */
> +  gcc_assert (npieces > 0);
> +
>   for (unsigned int i = 0; i < npieces; ++i)
> {
>   auto off = i * GET_MODE_SIZE (single_mode);
> -- 
> 2.43.0
>
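
The pattern in the patch above, stating an invariant so that the zero-iteration path is ruled out before the first element is read, can be sketched in standalone form. Plain `assert` stands in for `gcc_assert` here (the two do not behave identically when checking is disabled), and `split_offsets` is a hypothetical function, not the aarch64 code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits a value of total_size bytes into pieces of piece_size bytes and
// returns the byte offset of each piece.
std::vector<std::size_t> split_offsets(std::size_t total_size,
                                       std::size_t piece_size) {
  assert(piece_size > 0 && total_size % piece_size == 0);
  const std::size_t npieces = total_size / piece_size;

  // There should be at least one piece.  Without stating this, nothing
  // rules out the path where the loop below never runs and a later read
  // of the first element sees nothing initialized.
  assert(npieces > 0);

  std::vector<std::size_t> offsets;
  for (std::size_t i = 0; i < npieces; ++i)
    offsets.push_back(i * piece_size);
  return offsets;
}
```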

[PATCH] c++: Properly support null pointer constants in conditional operators [PR118282]

2025-02-07 Thread Simon Martin

We've been rejecting the following valid code since GCC 4

=== cut here ===
struct A {
  explicit A (int);
  operator void* () const;
};
void foo (const A& x) {
  auto res = 0 ? x : 0;
}
int main () {
  A a{5};
  foo(a);
}
=== cut here ===

The problem is that for COND_EXPR, add_builtin_candidate has an early
return, taken when the true and false values are not pointers, that does
not take null pointer constants into account. This causes us to not find
any valid conversion, and to fail to compile.

This patch fixes the condition to also pass if the true/false values are
not pointers but null pointer constants, which resolves the PR.
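
For readers hitting this on a compiler without the fix, here is a minimal sketch of one way to sidestep the problem, assuming an explicit conversion is acceptable in the calling code: casting the class-type arm to the pointer type first leaves an ordinary "pointer versus null pointer constant" pair, which does not depend on the builtin candidate discussed above. `pick` is a hypothetical helper, and `A` here has a data member only so the result can be observed.

```cpp
#include <cassert>

struct A {
  explicit A(int v) : value(v) {}
  // Conversion to void*, as in the PR's testcase; returns the address of
  // the member so callers can tell which arm was selected.
  operator void*() const { return const_cast<int*>(&value); }
  int value;
};

// The class-type arm is converted explicitly, so the conditional operator
// only has to unify `void*` with the null pointer constant `0`.
void* pick(bool use_object, const A& x) {
  return use_object ? static_cast<void*>(x) : 0;
}
```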

Successfully tested on x86_64-pc-linux-gnu. Given this regression's age,
I don't think it makes much sense to fix it during stage 4 (let me know
if you disagree), so OK for GCC 16?

PR c++/118282

gcc/cp/ChangeLog:

* call.cc (add_builtin_candidate): Also check for null_ptr_cst_p
operands.

gcc/testsuite/ChangeLog:

* g++.dg/conversion/op8.C: New test.

---
 gcc/cp/call.cc|  3 +-
 gcc/testsuite/g++.dg/conversion/op8.C | 75 +++
 2 files changed, 77 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/conversion/op8.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index c08bd0c8634..e440d58141b 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -3272,7 +3272,8 @@ add_builtin_candidate (struct z_candidate **candidates, enum tree_code code,
break;
 
   /* Otherwise, the types should be pointers.  */
-  if (!TYPE_PTR_OR_PTRMEM_P (type1) || !TYPE_PTR_OR_PTRMEM_P (type2))
+  if (!((TYPE_PTR_OR_PTRMEM_P (type1) || null_ptr_cst_p (args[0]))
+   && (TYPE_PTR_OR_PTRMEM_P (type2) || null_ptr_cst_p (args[1]))))
return;
 
   /* We don't check that the two types are the same; the logic
diff --git a/gcc/testsuite/g++.dg/conversion/op8.C b/gcc/testsuite/g++.dg/conversion/op8.C
new file mode 100644
index 000..eac958776c9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/op8.C
@@ -0,0 +1,75 @@
+// PR c++/118282
+// { dg-do "compile" }
+
+#if __cplusplus >= 201103L
+# include  // Only available from c++11 onwards.
+#endif
+
+struct A {
+  explicit A (int);
+  operator void* () const;
+};
+
+struct B {
+  explicit B (int);
+  operator char* () const;
+};
+
+struct C {
+  explicit C (int);
+  operator int () const;
+};
+
+struct BothWays {
+  BothWays (int);
+  operator void*() const;
+};
+
+extern bool my_bool;
+
+void foo (const A& a, const B& b, const C& c, const BothWays& d) {
+  void *res_a_1 = 0  ? 0 : a;
+  void *res_a_2 = 1  ? 0 : a;
+  void *res_a_3 = my_bool ? 0 : a;
+  void *res_a_4 = 0  ? a : 0;
+  void *res_a_5 = 1  ? a : 0;
+  void *res_a_6 = my_bool ? a : 0;
+
+  void *res_b_1 = 0  ? 0 : b;
+  void *res_b_2 = 1  ? 0 : b;
+  void *res_b_3 = my_bool ? 0 : b;
+  void *res_b_4 = 0  ? b : 0;
+  void *res_b_5 = 1  ? b : 0;
+  void *res_b_6 = my_bool ? b : 0;
+
+  //
+  // 0 valued constants that are NOT null pointer constants - this worked already.
+  //
+  char zero_char  = 0;
+  void *res_ko1  = 0 ? zero_char : a; // { dg-error "different types" }
+
+#if __cplusplus >= 201103L
+  // Those are only available starting with c++11.
+  int8_t zero_i8  = 0;
+  void *res_ko2  = 0 ? zero_i8   : a; // { dg-error "different types" "" { target c++11 }  }
+  uintptr_t zerop = 0;
+  void *res_ko3  = 0 ? zerop : a; // { dg-error "different types" "" { target c++11 }  }
+#endif
+
+  // Conversion to integer - this worked already.
+  int res_int= 0 ? 0 : c;
+
+  // Case where one arm is of class type that can be constructed from an
+  // integer and the other arm is a null pointer constant (inspired by
+  // g++.dg/template/cond5.C).
+  0 ? d : 0;
+  0 ? 0 : d;
+}
+
+int main(){
+  A a (5);
+  B b (42);
+  C c (43);
+  BothWays d (1982);
+  foo (a, b, c, d);
+}
-- 
2.44.0

[PATCH][v2] tree-optimization/115538 - possible wrong-code with SLP conversion

The following fixes a latent issue where we use ranges to verify
correctness of a vector conversion optimization.  We rely on ranges
from 'op0' which for SLP is extracted from the representative stmt
which does not necessarily correspond to any actual scalar operation.
We also do not verify the range of all scalar lanes in the SLP
operand match.  The following rectifies this, restricting the support
to single-lane SLP nodes at this point - on branches we'd simply
not perform this optimization with SLP.
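
To make the multi-lane concern concrete, here is a toy sketch of what merging per-lane ranges would involve. It is not vectorizer code and not what the patch does (the patch restricts itself to single-lane nodes); `LaneRange`, `merge_lane_ranges` and `fits_signed` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Value range of one scalar lane; `known` is false when no range is available.
struct LaneRange {
  bool known;
  std::int64_t min;
  std::int64_t max;
};

// Union of all lane ranges.  The result is unknown if there are no lanes or
// if any single lane has no known range.
LaneRange merge_lane_ranges(const std::vector<LaneRange>& lanes) {
  if (lanes.empty())
    return LaneRange{false, 0, 0};
  LaneRange merged = lanes.front();
  for (const LaneRange& lane : lanes) {
    if (!lane.known)
      return LaneRange{false, 0, 0};
    merged.min = std::min(merged.min, lane.min);
    merged.max = std::max(merged.max, lane.max);
  }
  return merged;
}

// True if every value in the range fits a signed integer of `bits` bits.
bool fits_signed(const LaneRange& range, unsigned bits) {
  assert(bits >= 1 && bits <= 63);
  const std::int64_t lo = -(std::int64_t{1} << (bits - 1));
  const std::int64_t hi = (std::int64_t{1} << (bits - 1)) - 1;
  return range.known && range.min >= lo && range.max <= hi;
}
```

A decision based on one lane alone (for example the representative stmt) could accept a narrowing that another lane's range forbids, which is the latent issue described above.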

Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.

v2 adds a missing check for NULL op0 as now passed from
expand_vector_conversion.

PR tree-optimization/115538
* tree-vectorizer.h (vect_get_slp_scalar_def): Declare.
* tree-vect-slp.cc (vect_get_slp_scalar_def): New helper.
* tree-vect-generic.cc (expand_vector_conversion): Adjust.
* tree-vect-stmts.cc (vectorizable_conversion): For SLP
correctly look at ranges of the scalar defs of the SLP operand.
(supportable_indirect_convert_operation): Likewise.
---
 gcc/tree-vect-generic.cc |  6 ++
 gcc/tree-vect-slp.cc | 19 +++
 gcc/tree-vect-stmts.cc   | 38 --
 gcc/tree-vectorizer.h|  4 +++-
 4 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index c2f7a29d539..173ebd9a7ba 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -1755,10 +1755,8 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
 modifier = WIDEN;
 
   auto_vec > converts;
-  if (supportable_indirect_convert_operation (code,
- ret_type, arg_type,
- converts,
- arg))
+  if (supportable_indirect_convert_operation (code, ret_type, arg_type,
+ converts))
 {
   new_rhs = arg;
   for (unsigned int i = 0; i < converts.length () - 1; i++)
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index ac1733004b6..8ed746ea5a9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10199,6 +10199,25 @@ vect_create_constant_vectors (vec_info *vinfo, 
slp_tree op_node)
   SLP_TREE_VEC_DEFS (op_node).quick_push (vop);
 }
 
+/* Get the scalar definition of the Nth lane from SLP_NODE or NULL_TREE
+   if there is no definition for it in the scalar IL or it is not known.  */
+
+tree
+vect_get_slp_scalar_def (slp_tree slp_node, unsigned n)
+{
+  if (SLP_TREE_DEF_TYPE (slp_node) == vect_internal_def)
+{
+  if (!SLP_TREE_SCALAR_STMTS (slp_node).exists ())
+   return NULL_TREE;
+  stmt_vec_info def = SLP_TREE_SCALAR_STMTS (slp_node)[n];
+  if (!def)
+   return NULL_TREE;
+  return gimple_get_lhs (STMT_VINFO_STMT (def));
+}
+  else
+return SLP_TREE_SCALAR_OPS (slp_node)[n];
+}
+
 /* Get the Ith vectorized definition from SLP_NODE.  */
 
 tree
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 1b639ae3b17..6bbb16beff2 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5610,10 +5610,8 @@ vectorizable_conversion (vec_info *vinfo,
return false;
   gcc_assert (code.is_tree_code ());
   if (supportable_indirect_convert_operation (code,
- vectype_out,
- vectype_in,
- converts,
- op0))
+ vectype_out, vectype_in,
+ converts, op0, slp_op0))
{
  gcc_assert (converts.length () <= 2);
  if (converts.length () == 1)
@@ -5750,7 +5748,16 @@ vectorizable_conversion (vec_info *vinfo,
   else if (code == FLOAT_EXPR)
{
  wide_int op_min_value, op_max_value;
- if (!vect_get_range_info (op0, &op_min_value, &op_max_value))
+ if (slp_node)
+   {
+ tree def;
+ /* ???  Merge ranges in case of more than one lane.  */
+ if (SLP_TREE_LANES (slp_op0) != 1
+ || !(def = vect_get_slp_scalar_def (slp_op0, 0))
+ || !vect_get_range_info (def, &op_min_value, &op_max_value))
+   goto unsupported;
+   }
+ else if (!vect_get_range_info (op0, &op_min_value, &op_max_value))
goto unsupported;
 
  cvt_type
@@ -15197,7 +15204,7 @@ supportable_indirect_convert_operation (code_helper 
code,
tree vectype_out,
tree vectype_in,
vec > 
&converts,
-   tree op0)
+   tree op0, slp_tree slp_op0)
 {
   bool found_mode = fal

Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-07 Thread Robin Dapp

> Inspired by PR118103, the VXRM register should be treated almost the
> same as the FRM register, aka cooperatively-managed global register.
> Thus, add the VXRM to global_regs to avoid the elimination by the
> late-combine pass.

OK (unless the CI complains of course).

-- 
Regards
 Robin

Re: [PATCH v4] [ifcombine] avoid creating out-of-bounds BIT_FIELD_REFs [PR118514]

On Fri, Feb 7, 2025 at 11:00 AM Alexandre Oliva  wrote:
>
> On Feb  6, 2025, Sam James  wrote:
>
> > Richard Biener  writes:
> >> On Thu, Feb 6, 2025 at 2:41 PM Alexandre Oliva  wrote:
> >>>
> >>> On Jan 27, 2025, Richard Biener  wrote:
> >>> > (I see the assert is no longer in the patch).
> >>>
> >>> That's because the assert went in as part of an earlier patch.  I take
> >>> it it should be backed out along with the to-be-split-out bits above,
> >>> right?
> >>
> >> Yes.
> >>
> >> (IIRC there's also a PR tripping over this or a similar assert)
>
> > Right, PR118706.
>
> Thanks.  I've added its testcase to the patch below, reverted the
> assert, and dropped the other unwanted bits.  Regstrapped on
> x86_64-linux-gnu.  Ok to install?

OK.

Thanks,
Richard.
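
The bounds test at the heart of the quoted patch below can be sketched in standalone form. This is a simplified model on plain 64-bit integers (the real helper works on `poly_uint64` values and a tree type); `access_in_bounds` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// True if the bit range [offset, offset + size) lies entirely within an
// object of type_size bits.  Written so that offset + size is never
// computed directly and therefore cannot wrap around.
constexpr bool access_in_bounds(std::uint64_t type_size, std::uint64_t size,
                                std::uint64_t offset) {
  return offset <= type_size && size <= type_size - offset;
}

// A 32-bit access into a 96-bit object is fine at bit 64 but not at bit 96.
static_assert(access_in_bounds(96, 32, 64), "last element is in bounds");
static_assert(!access_in_bounds(96, 32, 96), "one past the end is rejected");
```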

>
>
> If decode_field_reference finds a load that accesses past the inner
> object's size, bail out.
>
> Drop the too-strict assert.
>
>
> for  gcc/ChangeLog
>
> PR tree-optimization/118514
> PR tree-optimization/118706
> * gimple-fold.cc (decode_field_reference): Refuse to consider
> merging out-of-bounds BIT_FIELD_REFs.
> (make_bit_field_load): Drop too-strict assert.
> * tree-eh.cc (bit_field_ref_in_bounds_p): Rename to...
> (access_in_bounds_of_type_p): ... this.  Change interface,
> export.
> (tree_could_trap_p): Adjust.
> * tree-eh.h (access_in_bounds_of_type_p): Declare.
>
> for  gcc/testsuite/ChangeLog
>
> PR tree-optimization/118514
> PR tree-optimization/118706
> * gcc.dg/field-merge-25.c: New.
> ---
>  gcc/gimple-fold.cc|   11 ++-
>  gcc/testsuite/gcc.dg/field-merge-25.c |   15 +++
>  gcc/tree-eh.cc|   25 +
>  gcc/tree-eh.h |1 +
>  4 files changed, 31 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-25.c
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 45485782cdf91..29191685a43c5 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -7686,10 +7686,8 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT 
> *pbitsize,
>|| bs <= shiftrt
>|| offset != 0
>|| TREE_CODE (inner) == PLACEHOLDER_EXPR
> -  /* Reject out-of-bound accesses (PR79731).  */
> -  || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
> - && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
> -  bp + bs) < 0)
> +  /* Reject out-of-bound accesses (PR79731, PR118514).  */
> +  || !access_in_bounds_of_type_p (TREE_TYPE (inner), bs, bp)
>|| (INTEGRAL_TYPE_P (TREE_TYPE (inner))
>   && !type_has_mode_precision_p (TREE_TYPE (inner))))
>  return NULL_TREE;
> @@ -7859,11 +7857,6 @@ make_bit_field_load (location_t loc, tree inner, tree 
> orig_inner, tree type,
>gimple *new_stmt = gsi_stmt (i);
>if (gimple_has_mem_ops (new_stmt))
> gimple_set_vuse (new_stmt, reaching_vuse);
> -  gcc_checking_assert (! (gimple_assign_load_p (point)
> - && gimple_assign_load_p (new_stmt))
> -  || (tree_could_trap_p (gimple_assign_rhs1 (point))
> -  == tree_could_trap_p (gimple_assign_rhs1
> -(new_stmt))));
>  }
>
>gimple_stmt_iterator gsi = gsi_for_stmt (point);
> diff --git a/gcc/testsuite/gcc.dg/field-merge-25.c 
> b/gcc/testsuite/gcc.dg/field-merge-25.c
> new file mode 100644
> index 0..e769b0ae7b846
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-25.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fno-tree-fre" } */
> +
> +/* PR tree-optimization/118706 */
> +
> +int a[1][1][3], b;
> +int main() {
> +  int c = -1;
> +  while (b) {
> +if (a[c][c][6])
> +  break;
> +if (a[0][0][0])
> +  break;
> +  }
> +}
> diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc
> index 7015189a2de83..a4d59954c0597 100644
> --- a/gcc/tree-eh.cc
> +++ b/gcc/tree-eh.cc
> @@ -2646,24 +2646,22 @@ range_in_array_bounds_p (tree ref)
>return true;
>  }
>
> -/* Return true iff EXPR, a BIT_FIELD_REF, accesses a bit range that is known 
> to
> -   be in bounds for the referred operand type.  */
> +/* Return true iff a BIT_FIELD_REF <(TYPE)???, SIZE, OFFSET> would access a 
> bit
> +   range that is known to be in bounds for TYPE.  */
>
> -static bool
> -bit_field_ref_in_bounds_p (tree expr)
> +bool
> +access_in_bounds_of_type_p (tree type, poly_uint64 size, poly_uint64 offset)
>  {
> -  tree size_tree;
> -  poly_uint64 size_max, min, wid, max;
> +  tree type_size_tree;
> +  poly_uint64 type_size_max, min = offset, wid = size, max;
>
> -  size_tree = TYPE_SIZE (TREE_TYPE (TREE_OPERAND (expr, 0)));
> -  if (!size_tree || !poly_int_tree_p (size_tree, &size_max))
> +  type_size_tree = TYPE_SIZE (type);
> +  if (!type_size_tree || !poly_int_tree_p (type_si

Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c


On 07/02/2025 09:40, Tobias Burnus wrote:

This patch is part of the following series (all unreviewed so far)
but can be independently applied:

* [Patch] [gcn] Fix gfx906's sramecc setting,
   https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675251.html

* "[gcn] Add gfx9-generic and generic-associated gfx*"
   (email subject: "Re: [Patch] [GCN] Handle generic ISA names in 
libgomp's plugin-gcn.c";
    this thread), https://gcc.gnu.org/pipermail/gcc-patches/2025- 
February/675259.html


* * *

This patch permits loading generic ISA code objects by simply
trying whether the runtime accepts them.  If not, it fails with
an error.  The error messages should also be a bit more helpful in
some cases than before.


OK for mainline?


This becomes useful when configuring a gfx*-generic multilib,
once ROCR ("ROCm") supports it; thus, this is a future-proof
patch.

* * *

Note: This currently fails with all ROCm <= 6.3.2 as those
either don't recognize the generic ISA code or do not support
generic resolution. However, it looks as if one of the next
ROCm (AOMP?) releases will do.

Note 2: As the generic ISA is not yet supported, this patch does
not suggest compiling with -march=gfx*-generic, yet.

Note 3: The patch series is based on it (with some modifications):
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675200.html
that also has this -march= diagnostic.

Tobias



+
+  /* Either nongeneric and mismatch of the ISA - or generic but
+ not handled by the ROCm (e.g. because it is too old).  */
+


/* If we get here, either the binary is non-generic 


+snprintf (msg, sizeof msg,
+ "GCN code object ISA '%s' is incompatile to GPU ISA '%s' "
+ "(device %d).\n"


Spelling of "incompatible", and s/to/with/. In two places.

Also I think all the sentences should finish with '.'.


-  if (!isa_matches_agent (agent, image))
+  if (!isa_matches_agent (agent, image, HSA_STATUS_SUCCESS))
goto fail;


/* Check the ISA early because older ROCm had unhelpful errors.  */

Andrew

[Patch][v2] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c


Andrew Stubbs wrote:

On 07/02/2025 09:40, Tobias Burnus wrote:

This patch permits loading generic ISA code objects by simply
trying whether the runtime accepts them.  If not, it fails with
an error.  The error messages should also be a bit more helpful in
some cases than before.


...


Also I think all the sentences should finish with '.'.


Thanks for proofreading. Updated patch attached.

I also added the final sentence-end period, for consistency. But I note
that this is a plugin-gcn-ism; there is even a GCC warning when a
'warning_at'/'error_at' diagnostic ends with a full stop.


OK for mainline?

Tobias

PS: Pending patches:

* mkoffload.cc: switch -march= to generic version if it has a multilib 
and the specific one hasn't


* amdhsa.version fix

And otherwise to do:

* [Waiting for ROCm update] plugin-gcn: Suggest -march=gfx*-generic 
besides -march=gfx.


* Update install.texi – could be done now or once ROCm supports it?

Cf. original patch 
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675200.html for 
the last two.
[GCN] Handle generic ISA names in libgomp's plugin-gcn.c

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (ELFABIVERSION_AMDGPU_HSA_V6,
	EF_AMDGPU_GENERIC_VERSION_V, EF_AMDGPU_GENERIC_VERSION_OFFSET,
	GET_GENERIC_VERSION): New #define.
	(elf_gcn_isa_is_generic): New.
	(isa_matches_agent): Accept all generic code objects on the first
	go; extend the diagnostic and handle runtime-failed case.
	(create_and_finalize_hsa_program): Call it also after loading
	the code failed, pass the status.

 libgomp/plugin/plugin-gcn.c | 118 ++--
 1 file changed, 92 insertions(+), 26 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 8015a6f80f3..5c65778191a 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -66,6 +66,14 @@
 #define R_AMDGPU_RELATIVE64	13	/* B + A  */
 #endif
 
+#define ELFABIVERSION_AMDGPU_HSA_V6		4
+
+#define EF_AMDGPU_GENERIC_VERSION_V		0xff000000  /* Mask.  */
+#define EF_AMDGPU_GENERIC_VERSION_OFFSET	24
+
+#define GET_GENERIC_VERSION(VAR) ((VAR & EF_AMDGPU_GENERIC_VERSION_V) \
+  >> EF_AMDGPU_GENERIC_VERSION_OFFSET)
+
 /* GCN specific definitions for asynchronous queues.  */
 
 #define ASYNC_QUEUE_SIZE 64
@@ -242,7 +250,7 @@ struct kernel_dispatch
 };
 
 /* Structure of the kernargs segment, supporting console output.
- 
+
This needs to match the definitions in Newlib, and the expectations
in libgomp target code.  */
 
@@ -1668,6 +1676,13 @@ elf_gcn_isa_field (Elf64_Ehdr *image)
   return image->e_flags & EF_AMDGPU_MACH_MASK;
 }
 
+static int
+elf_gcn_isa_is_generic (Elf64_Ehdr *image)
+{
+  return (image->e_ident[8] == ELFABIVERSION_AMDGPU_HSA_V6
+	  && GET_GENERIC_VERSION (image->e_flags));
+}
+
 /* Returns the name that the HSA runtime uses for the ISA or NULL if we do not
support the ISA. */
 
@@ -2399,38 +2414,88 @@ init_basic_kernel_info (struct kernel_info *kernel,
   return true;
 }
 
-/* Check that the GCN ISA of the given image matches the ISA of the agent. */
+/* If status is SUCCESS, assume that the code runs if either the ISA of agent
+   and code is the same - or it is generic code.
+   Otherwise, execution failed with the provided status code; try to give
+   some useful diagnostic.  */
 
 static bool
-isa_matches_agent (struct agent_info *agent, Elf64_Ehdr *image)
+isa_matches_agent (struct agent_info *agent, Elf64_Ehdr *image,
+		   hsa_status_t status)
 {
+  /* Generic image - assume that it works and only return to here
+ when it fails, i.e. fatal == true.  */
+  if (status == HSA_STATUS_SUCCESS && elf_gcn_isa_is_generic (image))
+return true;
+
   int isa_field = elf_gcn_isa_field (image);
-  const char* isa_s = isa_name (isa_field);
-  if (!isa_s)
+  if (status == HSA_STATUS_SUCCESS && isa_field == agent->device_isa)
+return true;
+
+  /* If we get here, either the binary is non-generic and has a mismatch of
+ the ISA - or is generic but not handled by the ROCm (e.g. because ROCm
+ is too old).  */
+
+  char msg[340];
+  char agent_isa_xs[8];
+  char device_isa_xs[8];
+  const char *agent_isa_s = isa_name (agent->device_isa);
+  const char *device_isa_s = isa_name (isa_field);
+  if (agent_isa_s == NULL)
 {
-  hsa_error ("Unsupported ISA in GCN code object.", HSA_STATUS_ERROR);
-  return false;
+  snprintf (agent_isa_xs, sizeof agent_isa_xs,
+		"0x%X", agent->device_isa);
+  agent_isa_s = agent_isa_xs;
 }
-
-  if (isa_field != agent->device_isa)
+  if (device_isa_s == NULL)
 {
-  char msg[204];
-  const char *agent_isa_s = isa_name (agent->device_isa);
-  assert (agent_isa_s);
-
-  snprintf (msg, sizeof msg,
-		"GCN code object ISA '%s' does not match GPU ISA '%s' "
-		"(device %d).\n"
-		"Try to recompile with '-foffload-options=-march=%s',\n"
-		"or use ROCR_VISIBLE_DEVICES to disable incompatible "
-		"devices.\n",
-		isa_s, agent_isa_s, agent->device_id, agent_isa_

[PATCH] testsuite: LoongArch: Remove from btrunc, ceil, and floor effective target allowlist

Now that the C default is C23, we can no longer use LSX/LASX instructions
for these operations, as the standard disallows raising INEXACT
exceptions.  So LoongArch is no longer suitable for these effective
targets.

Fix the test failures on gcc.dg/vect/vect-rounding-*.c.  For the old
standards or -ffp-int-builtin-inexact we already provide test coverage
with gcc.target/loongarch/vect-ftint.c.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_vect_call_btrunc): Drop LoongArch.
(check_effective_target_vect_call_btruncf): Likewise.
(check_effective_target_vect_call_ceil): Likewise.
(check_effective_target_vect_call_ceilf): Likewise.
(check_effective_target_vect_call_floor): Likewise.
(check_effective_target_vect_call_floorf): Likewise.
(check_effective_target_vect_call_lceil): Likewise.
(check_effective_target_vect_call_lfloor): Likewise.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  The test failures
on gcc.dg/vect/vect-rounding-*.c are fixed.  Ok for trunk?

 gcc/testsuite/lib/target-supports.exp | 24 
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 60e24129bd5..432e1862c7e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9708,8 +9708,7 @@ proc check_effective_target_vect_call_lrint { } {
 proc check_effective_target_vect_call_btrunc { } {
 return [check_cached_effective_target_indexed vect_call_btrunc {
   expr { [istarget aarch64*-*-*]
-|| [istarget amdgcn-*-*]
-|| [istarget loongarch*-*-*] }}]
+|| [istarget amdgcn-*-*] }}]
 }
 
 # Return 1 if the target supports vector btruncf calls.
@@ -9717,8 +9716,7 @@ proc check_effective_target_vect_call_btrunc { } {
 proc check_effective_target_vect_call_btruncf { } {
 return [check_cached_effective_target_indexed vect_call_btruncf {
   expr { [istarget aarch64*-*-*]
-|| [istarget amdgcn-*-*]
-|| [istarget loongarch*-*-*] }}]
+|| [istarget amdgcn-*-*] }}]
 }
 
 # Return 1 if the target supports vector ceil calls.
@@ -9726,8 +9724,7 @@ proc check_effective_target_vect_call_btruncf { } {
 proc check_effective_target_vect_call_ceil { } {
 return [check_cached_effective_target_indexed vect_call_ceil {
   expr { [istarget aarch64*-*-*]
-|| [istarget amdgcn-*-*]
-|| [istarget loongarch*-*-*] }}]
+|| [istarget amdgcn-*-*] }}]
 }
 
 # Return 1 if the target supports vector ceilf calls.
@@ -9735,8 +9732,7 @@ proc check_effective_target_vect_call_ceil { } {
 proc check_effective_target_vect_call_ceilf { } {
 return [check_cached_effective_target_indexed vect_call_ceilf {
   expr { [istarget aarch64*-*-*]
-|| [istarget amdgcn-*-*]
-|| [istarget loongarch*-*-*] }}]
+|| [istarget amdgcn-*-*] }}]
 }
 
 # Return 1 if the target supports vector floor calls.
@@ -9744,8 +9740,7 @@ proc check_effective_target_vect_call_ceilf { } {
 proc check_effective_target_vect_call_floor { } {
 return [check_cached_effective_target_indexed vect_call_floor {
   expr { [istarget aarch64*-*-*]
-|| [istarget amdgcn-*-*]
-|| [istarget loongarch*-*-*] }}]
+|| [istarget amdgcn-*-*] }}]
 }
 
 # Return 1 if the target supports vector floorf calls.
@@ -9753,24 +9748,21 @@ proc check_effective_target_vect_call_floor { } {
 proc check_effective_target_vect_call_floorf { } {
 return [check_cached_effective_target_indexed vect_call_floorf {
   expr { [istarget aarch64*-*-*]
-|| [istarget amdgcn-*-*]
-|| [istarget loongarch*-*-*] }}]
+|| [istarget amdgcn-*-*] }}]
 }
 
 # Return 1 if the target supports vector lceil calls.
 
 proc check_effective_target_vect_call_lceil { } {
 return [check_cached_effective_target_indexed vect_call_lceil {
-  expr { [istarget aarch64*-*-*]
-|| [istarget loongarch*-*-*] }}]
+  expr { [istarget aarch64*-*-*] }}]
 }
 
 # Return 1 if the target supports vector lfloor calls.
 
 proc check_effective_target_vect_call_lfloor { } {
 return [check_cached_effective_target_indexed vect_call_lfloor {
-  expr { [istarget aarch64*-*-*]
-|| [istarget loongarch*-*-*] }}]
+  expr { [istarget aarch64*-*-*] }}]
 }
 
 # Return 1 if the target supports vector nearbyint calls.
-- 
2.48.1

[PATCH] aarch64: Update fp8 dependencies

2025-02-07 Thread Andrew Carlotti

We agreed with the LLVM developers not to enforce the architectural
dependencies between fp8 multiplication features, and they have already
been removed from LLVM and Binutils.  Remove them from GCC as well.
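
As a rough illustration of what dropping these dependencies means, the following sketch computes the set of features implied by a request under the old and the new dependency tables. It is only a model under stated assumptions: the `SK_*` names, the `sk_deps` struct and `sk_enable` are hypothetical, only the non-streaming features are covered, and further dependencies of SIMD are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per modelled feature; the bit position indexes the tables.  */
enum
{
  SK_SIMD = 1 << 0,
  SK_FP8 = 1 << 1,
  SK_FP8FMA = 1 << 2,
  SK_FP8DOT4 = 1 << 3,
  SK_FP8DOT2 = 1 << 4,
  SK_COUNT = 5
};

struct sk_deps
{
  uint32_t needs[SK_COUNT];  /* Direct dependencies of each feature.  */
};

/* Old scheme: fp8dot2 -> fp8dot4 -> fp8fma -> fp8 -> simd.  */
static const struct sk_deps sk_old
  = { { 0, SK_SIMD, SK_FP8, SK_FP8FMA, SK_FP8DOT4 } };

/* New scheme: each fp8 multiplication feature depends on fp8 only.  */
static const struct sk_deps sk_new
  = { { 0, SK_SIMD, SK_FP8, SK_FP8, SK_FP8 } };

/* Return REQUESTED plus everything it transitively depends on.  */
static uint32_t
sk_enable (const struct sk_deps *deps, uint32_t requested)
{
  assert (deps != NULL);
  uint32_t enabled = requested;
  uint32_t prev;
  do
    {
      prev = enabled;
      for (int i = 0; i < SK_COUNT; i++)
	if (enabled & (1u << i))
	  enabled |= deps->needs[i];
    }
  while (enabled != prev);
  return enabled;
}
```

Under the old table, requesting only `SK_FP8DOT2` also turns on `SK_FP8DOT4` and `SK_FP8FMA`; under the new one it turns on just `SK_FP8` and `SK_SIMD`, which is why the expected predefines in the test change.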



I have bootstrapped and regression tested this.  There are no test
result changes between GCC+Binutils with old feature dependencies and
GCC+Binutils with new feature dependencies, and some improvements
compared to old GCC with new Binutils.

Ok for master?


gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def
(SSVE_FP8FMA): Adjust formatting.
(FP8DOT4): Replace FP8FMA dependency with FP8.
(SSVE_FP8DOT4): Replace SSVE_FP8FMA dependency with SME2+FP8.
(FP8DOT2): Replace FP8DOT4 dependency with FP8.
(SSVE_FP8DOT2): Replace SSVE_FP8DOT4 dependency with SME2+FP8.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pragma_cpp_predefs_4.c: Adjust expected
defines.
* gcc.target/aarch64/simd/vmla_lane_indices_1.c: Modify target
pragmas.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c:
Ditto.
* 
gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
Ditto.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Ditto.
* gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Ditto.


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 
cc42bd518dca5e4b947c81f06e543133b4f25440..aa8d315c240fbd25b49008b131cc09f04001eb80
 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -261,17 +261,17 @@ AARCH64_OPT_EXTENSION("fp8", FP8, (SIMD), (), (), "f8cvt")
 
 AARCH64_OPT_EXTENSION("fp8fma", FP8FMA, (FP8), (), (), "f8fma")
 
-AARCH64_OPT_EXTENSION("ssve-fp8fma", SSVE_FP8FMA, (SME2,FP8), (), (), 
"smesf8fma")
+AARCH64_OPT_EXTENSION("ssve-fp8fma", SSVE_FP8FMA, (SME2, FP8), (), (), 
"smesf8fma")
 
 AARCH64_OPT_EXTENSION("faminmax", FAMINMAX, (SIMD), (), (), "faminmax")
 
-AARCH64_OPT_EXTENSION("fp8dot4", FP8DOT4, (FP8FMA), (), (), "f8dp4")
+AARCH64_OPT_EXTENSION("fp8dot4", FP8DOT4, (FP8), (), (), "f8dp4")
 
-AARCH64_OPT_EXTENSION("ssve-fp8dot4", SSVE_FP8DOT4, (SSVE_FP8FMA), (), (), 
"smesf8dp4")
+AARCH64_OPT_EXTENSION("ssve-fp8dot4", SSVE_FP8DOT4, (SME2, FP8), (), (), 
"smesf8dp4")
 
-AARCH64_OPT_EXTENSION("fp8dot2", FP8DOT2, (FP8DOT4), (), (), "f8dp2")
+AARCH64_OPT_EXTENSION("fp8dot2", FP8DOT2, (FP8), (), (), "f8dp2")
 
-AARCH64_OPT_EXTENSION("ssve-fp8dot2", SSVE_FP8DOT2, (SSVE_FP8DOT4), (), (), 
"smesf8dp2")
+AARCH64_OPT_EXTENSION("ssve-fp8dot2", SSVE_FP8DOT2, (SME2, FP8), (), (), 
"smesf8dp2")
 
 AARCH64_OPT_EXTENSION("lut", LUT, (SIMD), (), (), "lut")
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c 
b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
index 
0dcfbec05bad5f446c9f169051c9b86b9844946d..97d68b94512e1ffdd5ceb484a6378b3a1ec9d115
 100644
--- a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
@@ -292,7 +292,7 @@
 #ifndef __ARM_FEATURE_FP8
 #error Foo
 #endif
-#ifndef __ARM_FEATURE_FP8FMA
+#ifdef __ARM_FEATURE_FP8FMA
 #error Foo
 #endif
 #ifndef __ARM_FEATURE_FP8DOT4
@@ -306,10 +306,10 @@
 #ifndef __ARM_FEATURE_FP8
 #error Foo
 #endif
-#ifndef __ARM_FEATURE_FP8FMA
+#ifdef __ARM_FEATURE_FP8FMA
 #error Foo
 #endif
-#ifndef __ARM_FEATURE_FP8DOT4
+#ifdef __ARM_FEATURE_FP8DOT4
 #error Foo
 #endif
 #ifndef __ARM_FEATURE_FP8DOT2
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmla_lane_indices_1.c 
b/gcc/testsuite/gcc.target/aarch64/simd/vmla_lane_indices_1.c
index 
d1a69f4ba54133a5d6d19b5fb73c2768ec29e60b..739ff4c6a75a8014637b2b48d8121127ad6a8539
 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vmla_lane_indices_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vmla_lane_indices_1.c
@@ -2,7 +2,7 @@
 
 #include "arm_neon.h"
 
-#pragma GCC target "+fp8dot4+fp8dot2"
+#pragma GCC target "+fp8fma"
 
 void
 test(float16x4_t f16, float16x8_t f16q, float32x2_t f32,
diff --git 
a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
index 
9ad789a8ad2c5df109d6471a7ca22355ba26edea..fa0df46db2262a5a3e17bec974fb4807886708e9
 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
@@ -2,7 +2,7 @@
 
 #include <arm_sve.h>
 
-#pragma GCC target ("arch=armv8.2-a+sve2+fp8dot2")
+#pragma GCC target ("arch=armv8.2-a+sve2+fp8fma+fp8dot4+fp8dot2")
 
 void
 test (svfloat16_t f16, svmfloat8_t f8, fpm_t fpm, 
diff --git 
a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c
 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c
index 
dec00e3abf15e054fbd3f0964c00732f71de14ea..f6fce2f5c40f3da214d

[PATCH 1/8] LoongArch: Try harder using vrepli instructions to materialize const vectors

For

  a = (v4si){0xdddddddd, 0xdddddddd, 0xdddddddd, 0xdddddddd}

we just want

  vrepli.b $vr0, 0xdd

but the compiler actually produces a load:

  la.local $r14,.LC0
  vld  $vr0,$r14,0

It's because we only tried vrepli.d which wouldn't work.  Try all vrepli
instructions for const int vector materializing to fix it.
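
The idea of trying every element width can be illustrated outside of GCC with a small sketch that operates on the raw bytes of a constant. This is not the backend code: the `sk_*` helpers are hypothetical, a little-endian byte layout is assumed, and the range [-512, 511] is taken from the existing check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Load a little-endian signed integer of WIDTH bytes (1, 2, 4 or 8).  */
static int64_t
sk_load_signed (const uint8_t *p, size_t width)
{
  uint64_t v = 0;
  for (size_t i = 0; i < width; i++)
    v |= (uint64_t) p[i] << (8 * i);
  /* Sign-extend narrower values.  */
  if (width < 8 && ((v >> (8 * width - 1)) & 1))
    v |= ~(uint64_t) 0 << (8 * width);
  return (int64_t) v;
}

/* Try element widths of 1, 2, 4 and 8 bytes in turn.  Return the first
   width for which all elements are equal and fit in [-512, 511], and
   store that element in *IMM.  Return 0 if no width works.  */
static size_t
sk_vrepli_width (const uint8_t *bytes, size_t size, int64_t *imm)
{
  static const size_t widths[] = { 1, 2, 4, 8 };
  assert (bytes != NULL && imm != NULL);

  for (size_t w = 0; w < sizeof widths / sizeof widths[0]; w++)
    {
      size_t width = widths[w];
      if (size == 0 || size % width != 0)
	continue;

      int64_t first = sk_load_signed (bytes, width);
      if (first < -512 || first > 511)
	continue;

      bool same = true;
      for (size_t off = width; off < size && same; off += width)
	same = sk_load_signed (bytes + off, width) == first;

      if (same)
	{
	  *imm = first;
	  return width;
	}
    }
  return 0;
}
```

For a 16-byte constant consisting only of 0xdd bytes, the 1-byte width already succeeds, which corresponds to the vrepli.b form, whereas viewing the same constant as 64-bit elements would put the value out of range.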

gcc/ChangeLog:

* config/loongarch/loongarch-protos.h
(loongarch_const_vector_vrepli): New function prototype.
* config/loongarch/loongarch.cc (loongarch_const_vector_vrepli):
Implement.
(loongarch_const_insns): Call loongarch_const_vector_vrepli
instead of loongarch_const_vector_same_int_p.
(loongarch_split_vector_move_p): Likewise.
(loongarch_output_move): Use loongarch_const_vector_vrepli to
pun operands[1] into a better mode if it's a const int vector,
and decide the suffix of [x]vrepli with the new mode.
* config/loongarch/constraints.md (YI): Call
loongarch_const_vector_vrepli instead of
loongarch_const_vector_same_int_p.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vrepli.c: New test.
---
 gcc/config/loongarch/constraints.md |  2 +-
 gcc/config/loongarch/loongarch-protos.h |  1 +
 gcc/config/loongarch/loongarch.cc   | 34 ++---
 gcc/testsuite/gcc.target/loongarch/vrepli.c | 15 +
 4 files changed, 46 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vrepli.c

diff --git a/gcc/config/loongarch/constraints.md 
b/gcc/config/loongarch/constraints.md
index a7c31c2c4e0..97a4e4e35d3 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -301,7 +301,7 @@ (define_constraint "YI"
A replicated vector const in which the replicated value is in the range
[-512,511]."
   (and (match_code "const_vector")
-   (match_test "loongarch_const_vector_same_int_p (op, mode, -512, 511)")))
+   (match_test "loongarch_const_vector_vrepli (op, mode)")))
 
 (define_constraint "YC"
   "@internal
diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index b99f949a004..20acca690c8 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -121,6 +121,7 @@ extern bool loongarch_const_vector_same_int_p (rtx, 
machine_mode,
 extern bool loongarch_const_vector_shuffle_set_p (rtx, machine_mode);
 extern bool loongarch_const_vector_bitimm_set_p (rtx, machine_mode);
 extern bool loongarch_const_vector_bitimm_clr_p (rtx, machine_mode);
+extern rtx loongarch_const_vector_vrepli (rtx, machine_mode);
 extern rtx loongarch_lsx_vec_parallel_const_half (machine_mode, bool);
 extern rtx loongarch_gen_const_int_vector (machine_mode, HOST_WIDE_INT);
 extern enum reg_class loongarch_secondary_reload_class (enum reg_class,
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e9978370e8c..e036f802fde 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1846,6 +1846,28 @@ loongarch_const_vector_shuffle_set_p (rtx op, 
machine_mode mode)
   return true;
 }
 
+rtx
+loongarch_const_vector_vrepli (rtx x, machine_mode mode)
+{
+  int size = GET_MODE_SIZE (mode);
+
+  if (GET_CODE (x) != CONST_VECTOR
+  || GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
+return NULL_RTX;
+
+  for (scalar_int_mode elem_mode: {QImode, HImode, SImode, DImode})
+{
+  machine_mode new_mode =
+   mode_for_vector (elem_mode, size / GET_MODE_SIZE (elem_mode))
+ .require ();
+  rtx op = lowpart_subreg (new_mode, x, mode);
+  if (loongarch_const_vector_same_int_p (op, new_mode, -512, 511))
+   return op;
+}
+
+  return NULL_RTX;
+}
+
 /* Return true if rtx constants of mode MODE should be put into a small
data section.  */
 
@@ -2501,7 +2523,7 @@ loongarch_const_insns (rtx x)
 case CONST_VECTOR:
   if ((LSX_SUPPORTED_MODE_P (GET_MODE (x))
   || LASX_SUPPORTED_MODE_P (GET_MODE (x)))
- && loongarch_const_vector_same_int_p (x, GET_MODE (x), -512, 511))
+ && loongarch_const_vector_vrepli (x, GET_MODE (x)))
return 1;
   /* Fall through.  */
 case CONST_DOUBLE:
@@ -4656,7 +4678,7 @@ loongarch_split_vector_move_p (rtx dest, rtx src)
   /* Check for vector set to an immediate const vector with valid replicated
  element.  */
   if (FP_REG_RTX_P (dest)
-  && loongarch_const_vector_same_int_p (src, GET_MODE (src), -512, 511))
+  && loongarch_const_vector_vrepli (src, GET_MODE (src)))
 return false;
 
   /* Check for vector load zero immediate.  */
@@ -4792,13 +4814,15 @@ loongarch_output_move (rtx *operands)
   && src_code == CONST_VECTOR
   && CONST_INT_P (CONST_VECTOR_ELT (src, 0)))
 {
-  gcc_assert (loongarch_const_vector_same_int_p (src, mode, -512, 511));
+  operands[1] = loongarch_const_vector_vrepli (src, mode);
+  gcc_assert (operands[1]);

[PATCH 4/8] LoongArch: Simplify {lsx_, lasx_x}hv{add, sub}w description

Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.

gcc/ChangeLog:

* config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove.
(UNSPEC_LASX_XVHSUBW_Q_D): Remove.
(UNSPEC_LASX_XVHADDW_QU_DU): Remove.
(UNSPEC_LASX_XVHSUBW_QU_DU): Remove.
(lasx_xvhw_h_b): Remove.
(lasx_xvhw_w_h): Remove.
(lasx_xvhw_d_w): Remove.
(lasx_xvhaddw_q_d): Remove.
(lasx_xvhsubw_q_d): Remove.
(lasx_xvhaddw_qu_du): Remove.
(lasx_xvhsubw_qu_du): Remove.
(reduc_plus_scal_v4di): Call gen_lasx_haddw_q_d_punned instead
of gen_lasx_xvhaddw_q_d.
(reduc_plus_scal_v8si): Likewise.
* config/loongarch/lsx.md (UNSPEC_LSX_VHADDW_Q_D): Remove.
(UNSPEC_LSX_VHSUBW_Q_D): Remove.
(UNSPEC_LSX_VHADDW_QU_DU): Remove.
(UNSPEC_LSX_VHSUBW_QU_DU): Remove.
(lsx_vhw_h_b): Remove.
(lsx_vhw_w_h): Remove.
(lsx_vhw_d_w): Remove.
(lsx_vhaddw_q_d): Remove.
(lsx_vhsubw_q_d): Remove.
(lsx_vhaddw_qu_du): Remove.
(lsx_vhsubw_qu_du): Remove.
(reduc_plus_scal_v2di): Change the temporary register mode to
V1TI, and pun the mode calling gen_vec_extractv2didi.
(reduc_plus_scal_v4si): Change the temporary register mode to
V1TI.
* config/loongarch/predicates.md (vect_par_cnst_even_half): New
define_special_predicate.
(vect_par_cnst_even_or_odd_half): Likewise.
* config/loongarch/simd.md (simd_hw__): New
define_insn.
(_vhw__): New
define_expand.
(_hw_q_d_punned): New define_expand.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vhaddw_q_d): Define as a macro to override with
punned expand.
(CODE_FOR_lsx_vhaddw_qu_du): Likewise.
(CODE_FOR_lsx_vhsubw_q_d): Likewise.
(CODE_FOR_lsx_vhsubw_qu_du): Likewise.
(CODE_FOR_lasx_xvhaddw_q_d): Likewise.
(CODE_FOR_lasx_xvhaddw_qu_du): Likewise.
(CODE_FOR_lasx_xvhsubw_q_d): Likewise.
(CODE_FOR_lasx_xvhsubw_qu_du): Likewise.
---
 gcc/config/loongarch/lasx.md   | 126 +
 gcc/config/loongarch/loongarch-builtins.cc |  10 ++
 gcc/config/loongarch/lsx.md| 108 +-
 gcc/config/loongarch/predicates.md |  16 +++
 gcc/config/loongarch/simd.md   |  48 
 5 files changed, 81 insertions(+), 227 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 640fa028f1e..1dc11840187 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -100,10 +100,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVMADDWOD
   UNSPEC_LASX_XVMADDWOD2
   UNSPEC_LASX_XVMADDWOD3
-  UNSPEC_LASX_XVHADDW_Q_D
-  UNSPEC_LASX_XVHSUBW_Q_D
-  UNSPEC_LASX_XVHADDW_QU_DU
-  UNSPEC_LASX_XVHSUBW_QU_DU
   UNSPEC_LASX_XVADD_Q
   UNSPEC_LASX_XVSUB_Q
   UNSPEC_LASX_XVREPLVE
@@ -1407,76 +1403,6 @@ (define_insn "fixuns_trunc2"
(set_attr "cnv_mode" "")
(set_attr "mode" "")])
 
-(define_insn "lasx_xvhw_h_b"
-  [(set (match_operand:V16HI 0 "register_operand" "=f")
-   (addsub:V16HI
- (any_extend:V16HI
-   (vec_select:V16QI
- (match_operand:V32QI 1 "register_operand" "f")
- (parallel [(const_int 1) (const_int 3)
-(const_int 5) (const_int 7)
-(const_int 9) (const_int 11)
-(const_int 13) (const_int 15)
-(const_int 17) (const_int 19)
-(const_int 21) (const_int 23)
-(const_int 25) (const_int 27)
-(const_int 29) (const_int 31)])))
- (any_extend:V16HI
-   (vec_select:V16QI
- (match_operand:V32QI 2 "register_operand" "f")
- (parallel [(const_int 0) (const_int 2)
-(const_int 4) (const_int 6)
-(const_int 8) (const_int 10)
-(const_int 12) (const_int 14)
-(const_int 16) (const_int 18)
-(const_int 20) (const_int 22)
-(const_int 24) (const_int 26)
-(const_int 28) (const_int 30)])]
-  "ISA_HAS_LASX"
-  "xvhw.h.b\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "V16HI")])
-
-(define_insn "lasx_xvhw_w_h"
-  [(set (match_operand:V8SI 0 "register_operand" "=f")
-   (addsub:V8SI
- (any_extend:V8SI
-   (vec_select:V8HI
- (match_operand:V16HI 1 "register_operand" "f")
- (parallel [(const_int 1) (const_int 3)
-(const_int 5) (const_int 7)
-(const_int 9) (const_int 11)
-(const_int 13) (const_int 15)])))
- (any_extend:V8SI
-   (vec_selec

[PATCH 2/8] LoongArch: Allow moving TImode vectors

We have some vector instructions for operations on 128-bit integer, i.e.
TImode, vectors.  Previously they had been modeled with unspecs, but
it's more natural to just model them with TImode vector RTL expressions.

In preparation, allow moving V1TImode and V2TImode vectors in LSX
and LASX registers so we won't get a reload failure when we start to
save TImode vectors in these registers.

This implicitly depends on the vrepli optimization: without it we'd try
"vrepli.q" which does not really exist and trigger an ICE.

gcc/ChangeLog:

* config/loongarch/lsx.md (mov): Remove.
(movmisalign): Remove.
(mov_lsx): Remove.
* config/loongarch/lasx.md (mov): Remove.
(movmisalign): Remove.
(mov_lasx): Remove.
* config/loongarch/simd.md (ALLVEC_TI): New mode iterator.
(mov): Likewise.
(mov_simd): New define_insn_and_split.
---
 gcc/config/loongarch/lasx.md | 40 --
 gcc/config/loongarch/lsx.md  | 36 ---
 gcc/config/loongarch/simd.md | 42 
 3 files changed, 42 insertions(+), 76 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index a37c85a25a4..d82ad61be60 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -699,46 +699,6 @@ (define_expand "lasx_xvrepli"
   DONE;
 })
 
-(define_expand "mov"
-  [(set (match_operand:LASX 0)
-   (match_operand:LASX 1))]
-  "ISA_HAS_LASX"
-{
-  if (loongarch_legitimize_move (mode, operands[0], operands[1]))
-DONE;
-})
-
-
-(define_expand "movmisalign"
-  [(set (match_operand:LASX 0)
-   (match_operand:LASX 1))]
-  "ISA_HAS_LASX"
-{
-  if (loongarch_legitimize_move (mode, operands[0], operands[1]))
-DONE;
-})
-
-;; 256-bit LASX modes can only exist in LASX registers or memory.
-(define_insn "mov_lasx"
-  [(set (match_operand:LASX 0 "nonimmediate_operand" "=f,f,R,*r,*f")
-   (match_operand:LASX 1 "move_operand" "fYGYI,R,f,*f,*r"))]
-  "ISA_HAS_LASX"
-  { return loongarch_output_move (operands); }
-  [(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert")
-   (set_attr "mode" "")
-   (set_attr "length" "8,4,4,4,4")])
-
-
-(define_split
-  [(set (match_operand:LASX 0 "nonimmediate_operand")
-   (match_operand:LASX 1 "move_operand"))]
-  "reload_completed && ISA_HAS_LASX
-   && loongarch_split_move_p (operands[0], operands[1])"
-  [(const_int 0)]
-{
-  loongarch_split_move (operands[0], operands[1]);
-  DONE;
-})
 
 ;; LASX
 (define_insn "add3"
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index ca0066a21ed..bcc5ae85fb3 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -575,42 +575,6 @@ (define_insn "lsx_vshuf_"
   [(set_attr "type" "simd_sld")
(set_attr "mode" "")])
 
-(define_expand "mov"
-  [(set (match_operand:LSX 0)
-   (match_operand:LSX 1))]
-  "ISA_HAS_LSX"
-{
-  if (loongarch_legitimize_move (mode, operands[0], operands[1]))
-DONE;
-})
-
-(define_expand "movmisalign"
-  [(set (match_operand:LSX 0)
-   (match_operand:LSX 1))]
-  "ISA_HAS_LSX"
-{
-  if (loongarch_legitimize_move (mode, operands[0], operands[1]))
-DONE;
-})
-
-(define_insn "mov_lsx"
-  [(set (match_operand:LSX 0 "nonimmediate_operand" "=f,f,R,*r,*f,*r")
-   (match_operand:LSX 1 "move_operand" "fYGYI,R,f,*f,*r,*r"))]
-  "ISA_HAS_LSX"
-{ return loongarch_output_move (operands); }
-  [(set_attr "type" 
"simd_move,simd_load,simd_store,simd_copy,simd_insert,simd_copy")
-   (set_attr "mode" "")])
-
-(define_split
-  [(set (match_operand:LSX 0 "nonimmediate_operand")
-   (match_operand:LSX 1 "move_operand"))]
-  "reload_completed && ISA_HAS_LSX
-   && loongarch_split_move_p (operands[0], operands[1])"
-  [(const_int 0)]
-{
-  loongarch_split_move (operands[0], operands[1]);
-  DONE;
-})
 
 ;; Integer operations
 (define_insn "add3"
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 7605b17d21e..61fc1ab20ad 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -130,6 +130,48 @@ (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3")
 ;; instruction here so we can avoid duplicating logics.
 ;; ===
 
+
+;; Move
+
+;; Some immediate values in V1TI or V2TI may be stored in LSX or LASX
+;; registers, thus we need to allow moving them for reload.
+(define_mode_iterator ALLVEC_TI [ALLVEC
+(V1TI "ISA_HAS_LSX")
+(V2TI "ISA_HAS_LASX")])
+
+(define_expand "mov"
+  [(set (match_operand:ALLVEC_TI 0)
+   (match_operand:ALLVEC_TI 1))]
+  ""
+{
+  if (loongarch_legitimize_move (mode, operands[0], operands[1]))
+DONE;
+})
+
+(define_expand "movmisalign"
+  [(set (match_operand:ALLVEC_TI 0)
+   (match_operand:ALLVEC_TI 1))]
+  ""
+{
+  if (loongarch_legitimize_move (mode, operands[0], operands[1])

[PATCH 0/8] LoongArch: SIMD odd/even/horizontal widening arithmetic cleanup and optimization

This series is intended to fix some test failures on
vect-reduc-chain-*.c by adding the [su]dot_prod* expand for LSX and LASX
vector modes.  But the code base of the related instructions was not
readable, so clean it up first (using the approach learnt from AArch64)
before adding the expands.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Xi Ruoyao (8):
  LoongArch: Try harder using vrepli instructions to materialize const
vectors
  LoongArch: Allow moving TImode vectors
  LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description
  LoongArch: Simplify {lsx_,lasx_x}hv{add,sub}w description
  LoongArch: Simplify {lsx_,lasx_x}maddw description
  LoongArch: Simplify {lsx,lasx_x}vpick description
  LoongArch: Implement vec_widen_mult_{even,odd}_* for LSX and LASX
modes
  LoongArch: Implement [su]dot_prod* for LSX and LASX modes

 gcc/config/loongarch/constraints.md   |2 +-
 gcc/config/loongarch/lasx.md  | 1222 +
 gcc/config/loongarch/loongarch-builtins.cc|   60 +
 gcc/config/loongarch/loongarch-modes.def  |2 +
 gcc/config/loongarch/loongarch-protos.h   |3 +
 gcc/config/loongarch/loongarch.cc |   50 +-
 gcc/config/loongarch/loongarch.md |2 +-
 gcc/config/loongarch/lsx.md   |  984 +
 gcc/config/loongarch/predicates.md|   43 +
 gcc/config/loongarch/simd.md  |  408 +-
 gcc/testsuite/gcc.target/loongarch/vrepli.c   |   15 +
 .../gcc.target/loongarch/wide-mul-reduc-1.c   |   18 +
 .../gcc.target/loongarch/wide-mul-reduc-2.c   |   18 +
 13 files changed, 619 insertions(+), 2208 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vrepli.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c

-- 
2.48.1

[PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description

Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates instead of hard-coded const vectors.

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_xvpickev_b): Remove.
(lasx_xvpickev_h): Remove.
(lasx_xvpickev_w): Remove.
(lasx_xvpickev_w_f): Remove.
(lasx_xvpickod_b): Remove.
(lasx_xvpickod_h): Remove.
(lasx_xvpickod_w): Remove.
(lasx_xvpickod_w_f): Remove.
* config/loongarch/lsx.md (lsx_vpickev_b): Remove.
(lsx_vpickev_h): Remove.
(lsx_vpickev_w): Remove.
(lsx_vpickev_w_f): Remove.
(lsx_vpickod_b): Remove.
(lsx_vpickod_h): Remove.
(lsx_vpickod_w): Remove.
(lsx_vpickod_w_f): Remove.
* config/loongarch/simd.md (LVEC): New define_mode_attr.
(simdfmt_as_i): Make it same as simdfmt for integer vector
modes.
(_f): New define_mode_attr.
(simd_pick_evod_): New define_insn.
(_vpick_<_f>): New
define_expand.
---
 gcc/config/loongarch/lasx.md | 152 ---
 gcc/config/loongarch/lsx.md  | 120 ---
 gcc/config/loongarch/simd.md |  52 +++-
 3 files changed, 50 insertions(+), 274 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 4ac85b7fcf9..c31aefa892a 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1804,158 +1804,6 @@ (define_insn "lasx_xvnor_"
   [(set_attr "type" "simd_logic")
(set_attr "mode" "")])
 
-(define_insn "lasx_xvpickev_b"
-  [(set (match_operand:V32QI 0 "register_operand" "=f")
-   (vec_select:V32QI
- (vec_concat:V64QI
-   (match_operand:V32QI 1 "register_operand" "f")
-   (match_operand:V32QI 2 "register_operand" "f"))
- (parallel [(const_int 0) (const_int 2)
-(const_int 4) (const_int 6)
-(const_int 8) (const_int 10)
-(const_int 12) (const_int 14)
-(const_int 32) (const_int 34)
-(const_int 36) (const_int 38)
-(const_int 40) (const_int 42)
-(const_int 44) (const_int 46)
-(const_int 16) (const_int 18)
-(const_int 20) (const_int 22)
-(const_int 24) (const_int 26)
-(const_int 28) (const_int 30)
-(const_int 48) (const_int 50)
-(const_int 52) (const_int 54)
-(const_int 56) (const_int 58)
-(const_int 60) (const_int 62)])))]
-  "ISA_HAS_LASX"
-  "xvpickev.b\t%u0,%u2,%u1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V32QI")])
-
-(define_insn "lasx_xvpickev_h"
-  [(set (match_operand:V16HI 0 "register_operand" "=f")
-   (vec_select:V16HI
- (vec_concat:V32HI
-   (match_operand:V16HI 1 "register_operand" "f")
-   (match_operand:V16HI 2 "register_operand" "f"))
- (parallel [(const_int 0) (const_int 2)
-(const_int 4) (const_int 6)
-(const_int 16) (const_int 18)
-(const_int 20) (const_int 22)
-(const_int 8) (const_int 10)
-(const_int 12) (const_int 14)
-(const_int 24) (const_int 26)
-(const_int 28) (const_int 30)])))]
-  "ISA_HAS_LASX"
-  "xvpickev.h\t%u0,%u2,%u1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V16HI")])
-
-(define_insn "lasx_xvpickev_w"
-  [(set (match_operand:V8SI 0 "register_operand" "=f")
-   (vec_select:V8SI
- (vec_concat:V16SI
-   (match_operand:V8SI 1 "register_operand" "f")
-   (match_operand:V8SI 2 "register_operand" "f"))
- (parallel [(const_int 0) (const_int 2)
-(const_int 8) (const_int 10)
-(const_int 4) (const_int 6)
-(const_int 12) (const_int 14)])))]
-  "ISA_HAS_LASX"
-  "xvpickev.w\t%u0,%u2,%u1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V8SI")])
-
-(define_insn "lasx_xvpickev_w_f"
-  [(set (match_operand:V8SF 0 "register_operand" "=f")
-   (vec_select:V8SF
- (vec_concat:V16SF
-   (match_operand:V8SF 1 "register_operand" "f")
-   (match_operand:V8SF 2 "register_operand" "f"))
- (parallel [(const_int 0) (const_int 2)
-(const_int 8) (const_int 10)
-(const_int 4) (const_int 6)
-(const_int 12) (const_int 14)])))]
-  "ISA_HAS_LASX"
-  "xvpickev.w\t%u0,%u2,%u1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V8SF")])
-
-(define_insn "lasx_xvpickod_b"
-  [(set (match_operand:V32QI 0 "register_operand" "=f")
-   (vec_select:V32QI
- (vec_concat:V64QI
-   (match_operand:V32QI 1 "register_operand" "f")
-   (match_operand:V32QI 2 "register_operand" "f"))
- (parallel [(const_int 1

[PATCH 3/8] LoongArch: Simplify {lsx_, lasx_x}v{add, sub, mul}l{ev, od} description

These pattern definitions are tediously long, invoking 32 UNSPECs and
many hard-coded long const vectors.  To simplify them, at first we use
the TImode vector operations instead of the UNSPECs, then we adopt an
approach in AArch64: using a special predicate to match the const
vectors for odd/even indices for define_insn's, and generate those
vectors in define_expand's.
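
To illustrate the odd/even index idea, here is a minimal standalone sketch in plain C++, not the RTL code from the patch.  The helper names and signatures (stepped_indices, is_even_or_odd_half) are hypothetical; they only model what a generated selector and a matching predicate would check.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the index list a define_expand would build:
// COUNT indices starting at BASE and advancing by STEP.
std::vector<int> stepped_indices(std::size_t count, int base, int step) {
  std::vector<int> sel(count);
  for (std::size_t i = 0; i < count; ++i)
    sel[i] = base + static_cast<int>(i) * step;
  return sel;
}

// Hypothetical model of the predicate: SEL picks exactly the even
// (0, 2, ...) or the odd (1, 3, ...) elements of a vector with NUNITS
// elements.
bool is_even_or_odd_half(const std::vector<int>& sel, std::size_t nunits) {
  if (sel.empty() || sel.size() * 2 != nunits)
    return false;
  if (sel[0] != 0 && sel[0] != 1)
    return false;
  for (std::size_t i = 0; i < sel.size(); ++i)
    if (sel[i] != sel[0] + 2 * static_cast<int>(i))
      return false;
  return true;
}
```

With this model a single check replaces each hard-coded list of const_int's: the selector for an 8-element vector is either {0, 2, 4, 6} or {1, 3, 5, 7}.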

For "backward compatibility" we need to provide a "punned" version for
the operations invoking TImode vectors as the intrinsics still expect
DImode vectors.

The stat is "201 insertions, 905 deletions."

gcc/ChangeLog:

* config/loongarch/lasx.md (UNSPEC_LASX_XVADDWEV): Remove.
(UNSPEC_LASX_XVADDWEV2): Remove.
(UNSPEC_LASX_XVADDWEV3): Remove.
(UNSPEC_LASX_XVSUBWEV): Remove.
(UNSPEC_LASX_XVSUBWEV2): Remove.
(UNSPEC_LASX_XVMULWEV): Remove.
(UNSPEC_LASX_XVMULWEV2): Remove.
(UNSPEC_LASX_XVMULWEV3): Remove.
(UNSPEC_LASX_XVADDWOD): Remove.
(UNSPEC_LASX_XVADDWOD2): Remove.
(UNSPEC_LASX_XVADDWOD3): Remove.
(UNSPEC_LASX_XVSUBWOD): Remove.
(UNSPEC_LASX_XVSUBWOD2): Remove.
(UNSPEC_LASX_XVMULWOD): Remove.
(UNSPEC_LASX_XVMULWOD2): Remove.
(UNSPEC_LASX_XVMULWOD3): Remove.
(lasx_xvwev_h_b): Remove.
(lasx_xvwev_w_h): Remove.
(lasx_xvwev_d_w): Remove.
(lasx_xvaddwev_q_d): Remove.
(lasx_xvsubwev_q_d): Remove.
(lasx_xvmulwev_q_d): Remove.
(lasx_xvwod_h_b): Remove.
(lasx_xvwod_w_h): Remove.
(lasx_xvwod_d_w): Remove.
(lasx_xvaddwod_q_d): Remove.
(lasx_xvsubwod_q_d): Remove.
(lasx_xvmulwod_q_d): Remove.
(lasx_xvaddwev_q_du): Remove.
(lasx_xvsubwev_q_du): Remove.
(lasx_xvmulwev_q_du): Remove.
(lasx_xvaddwod_q_du): Remove.
(lasx_xvsubwod_q_du): Remove.
(lasx_xvmulwod_q_du): Remove.
(lasx_xvwev_h_bu_b): Remove.
(lasx_xvwev_w_hu_h): Remove.
(lasx_xvwev_d_wu_w): Remove.
(lasx_xvwod_h_bu_b): Remove.
(lasx_xvwod_w_hu_h): Remove.
(lasx_xvwod_d_wu_w): Remove.
(lasx_xvaddwev_q_du_d): Remove.
(lasx_xvsubwev_q_du_d): Remove.
(lasx_xvmulwev_q_du_d): Remove.
(lasx_xvaddwod_q_du_d): Remove.
(lasx_xvsubwod_q_du_d): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_XVADDWEV): Remove.
(UNSPEC_LSX_VADDWEV2): Remove.
(UNSPEC_LSX_VADDWEV3): Remove.
(UNSPEC_LSX_VSUBWEV): Remove.
(UNSPEC_LSX_VSUBWEV2): Remove.
(UNSPEC_LSX_VMULWEV): Remove.
(UNSPEC_LSX_VMULWEV2): Remove.
(UNSPEC_LSX_VMULWEV3): Remove.
(UNSPEC_LSX_VADDWOD): Remove.
(UNSPEC_LSX_VADDWOD2): Remove.
(UNSPEC_LSX_VADDWOD3): Remove.
(UNSPEC_LSX_VSUBWOD): Remove.
(UNSPEC_LSX_VSUBWOD2): Remove.
(UNSPEC_LSX_VMULWOD): Remove.
(UNSPEC_LSX_VMULWOD2): Remove.
(UNSPEC_LSX_VMULWOD3): Remove.
(lsx_vwev_h_b): Remove.
(lsx_vwev_w_h): Remove.
(lsx_vwev_d_w): Remove.
(lsx_vaddwev_q_d): Remove.
(lsx_vsubwev_q_d): Remove.
(lsx_vmulwev_q_d): Remove.
(lsx_vwod_h_b): Remove.
(lsx_vwod_w_h): Remove.
(lsx_vwod_d_w): Remove.
(lsx_vaddwod_q_d): Remove.
(lsx_vsubwod_q_d): Remove.
(lsx_vmulwod_q_d): Remove.
(lsx_vaddwev_q_du): Remove.
(lsx_vsubwev_q_du): Remove.
(lsx_vmulwev_q_du): Remove.
(lsx_vaddwod_q_du): Remove.
(lsx_vsubwod_q_du): Remove.
(lsx_vmulwod_q_du): Remove.
(lsx_vwev_h_bu_b): Remove.
(lsx_vwev_w_hu_h): Remove.
(lsx_vwev_d_wu_w): Remove.
(lsx_vwod_h_bu_b): Remove.
(lsx_vwod_w_hu_h): Remove.
(lsx_vwod_d_wu_w): Remove.
(lsx_vaddwev_q_du_d): Remove.
(lsx_vsubwev_q_du_d): Remove.
(lsx_vmulwev_q_du_d): Remove.
(lsx_vaddwod_q_du_d): Remove.
(lsx_vsubwod_q_du_d): Remove.
(lsx_vmulwod_q_du_d): Remove.
* config/loongarch/loongarch-modes.def: Add V1TI and V4TI.
* config/loongarch/loongarch-protos.h
(loongarch_gen_stepped_int_parallel): New function prototype.
* config/loongarch/loongarch.cc (loongarch_print_operand):
Accept 'O' for printing "ev" or "od."
(loongarch_gen_stepped_int_parallel): Implement.
* config/loongarch/loongarch.md (mode): Add V1TI and V2TI.
* config/loongarch/predicates.md
(vect_par_cnst_even_or_odd_half): New define_predicate.
* config/loongarch/simd.md (WVEC_HALF): New define_mode_attr.
(simdfmt_w): Likewise.
(zero_one): New define_int_iterator.
(ev_od): New define_int_attr.
(simd_w_evod__): New define_insn.
(_vw__): New
define_expand.
(simd_w_evod__hetero): New define_insn.
(_vw__u_):
New define_expand.
(DIVEC): New define_mode_iterator.
(_w_q_d_punne

[PATCH 8/8] LoongArch: Implement [su]dot_prod* for LSX and LASX modes

Although this is just a special case of "a widening product whose
result is used for reduction," having these standard names allows the
dot product pattern to be recognized earlier, which may benefit
optimization.  Also fix some test failures in the following test cases:

- gcc.dg/vect/vect-reduc-chain-2.c
- gcc.dg/vect/vect-reduc-chain-3.c
- gcc.dg/vect/vect-reduc-chain-dot-slp-3.c
- gcc.dg/vect/vect-reduc-chain-dot-slp-4.c
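
As a rough scalar model of what the expansion computes (a sketch only, assuming 32-bit inputs widened to 64-bit lanes; dot_prod_model is a hypothetical name, not part of the patch), each wide lane accumulates the product of the even element pair and then the product of the odd element pair:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model: ACC has half as many lanes as A and B, each twice as wide.
// A and B must have 2 * acc.size() elements.
std::vector<std::int64_t> dot_prod_model(const std::vector<std::int32_t>& a,
                                         const std::vector<std::int32_t>& b,
                                         const std::vector<std::int64_t>& acc) {
  std::vector<std::int64_t> out(acc);
  for (std::size_t i = 0; i < out.size(); ++i) {
    // Even elements: the vmulwev / vmaddwev step.
    out[i] += static_cast<std::int64_t>(a[2 * i]) * b[2 * i];
    // Odd elements: the vmaddwod step.
    out[i] += static_cast<std::int64_t>(a[2 * i + 1]) * b[2 * i + 1];
  }
  return out;
}
```

When the accumulator is zero the first step degenerates to a plain widening multiply, which matches the vmulwev branch in the expander.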

gcc/ChangeLog:

* config/loongarch/simd.md (wvec_half): New define_mode_attr.
(dot_prod): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/wide-mul-reduc-2.c (dg-final): Scan
DOT_PROD_EXPR in optimized tree.
---
 gcc/config/loongarch/simd.md  | 29 +++
 .../gcc.target/loongarch/wide-mul-reduc-2.c   |  3 +-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index a888c7090ce..611d1f87dd2 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -84,6 +84,12 @@ (define_mode_attr WVEC_HALF [(V2DI "V1TI") (V4DI "V2TI")
 (V8HI "V4SI") (V16HI "V8SI")
 (V16QI "V8HI") (V32QI "V16HI")])
 
+;; Lower-case version.
+(define_mode_attr wvec_half [(V2DI "v1ti") (V4DI "v2ti")
+(V4SI "v2di") (V8SI "v4di")
+(V8HI "v4si") (V16HI "v8si")
+(V16QI "v8hi") (V32QI "v16hi")])
+
 ;; Integer vector modes with the same length and unit size as a mode.
 (define_mode_attr VIMODE [(V2DI "V2DI") (V4SI "V4SI")
  (V8HI "V8HI") (V16QI "V16QI")
@@ -804,6 +810,29 @@ (define_expand 
"_vmaddw__"
   DONE;
 })
 
+(define_expand "dot_prod"
+  [(match_operand: 0 "register_operand" "=f,f")
+   (match_operand:IVEC   1 "register_operand" " f,f")
+   (match_operand:IVEC   2 "register_operand" " f,f")
+   (match_operand: 3 "reg_or_0_operand" " 0,YG")
+   (any_extend (const_int 0))]
+  ""
+{
+  auto [op0, op1, op2, op3] = operands;
+
+  if (op3 == CONST0_RTX (mode))
+emit_insn (
+  gen__vmulwev__ (op0, op1, op2));
+  else
+emit_insn (
+  gen__vmaddwev__ (op0, op3, op1,
+  op2));
+
+  emit_insn (
+gen__vmaddwod__ (op0, op0, op1, op2));
+  DONE;
+})
+
 (define_insn "simd_maddw_evod__hetero"
   [(set (match_operand: 0 "register_operand" "=f")
(plus:
diff --git a/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c 
b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c
index 07a7601888a..61e92e58fc3 100644
--- a/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c
+++ b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mlasx" } */
+/* { dg-options "-O2 -mlasx -fdump-tree-optimized" } */
 /* { dg-final { scan-assembler "xvmaddw(ev|od)\\.d\\.w" } } */
+/* { dg-final { scan-tree-dump "DOT_PROD_EXPR" "optimized" } } */
 
 typedef __INT32_TYPE__ i32;
 typedef __INT64_TYPE__ i64;
-- 
2.48.1

Re: [Patch] [gcn] mkoffload.cc: Automatically use gfx*-generic if -march has no multilib but generic has


On 07/02/2025 10:17, Tobias Burnus wrote:

This patch is part of the following series (not yet in mainline);
this patch depends on the first one, but only makes sense if both are in:

* "[gcn] Add gfx9-generic and generic-associated gfx*"
   (email subject: "Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c";
    this thread), https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675259.html


* "[Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c",
   https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675274.html

* * *

This patch handles the following case (in mkoffload.cc):

* If no (multi)lib is available for the specified device but one is
   available for its generic ISA, automatically choose the generic ISA
   (with a warning).

Due to the double condition (not available for specified, available for
generic), it shouldn't cause surprises. Additionally, when not using
mkoffload, i.e. when compiling directly with the GCN compiler, the
specific ISA is still used.
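
The double condition can be sketched as follows.  This is only an illustration in standalone C++, not the mkoffload.cc code; pick_multilib_arch and its parameters are hypothetical names, and the warning is reduced to a comment.

```cpp
#include <set>
#include <string>

// Hypothetical helper: return the arch whose multilib should be used.
// GENERIC is the generic ISA associated with REQUESTED, or "" if none.
std::string pick_multilib_arch(const std::string& requested,
                               const std::string& generic,
                               const std::set<std::string>& available) {
  // A multilib for the requested device exists: use it unchanged.
  if (available.count(requested))
    return requested;
  // Not available for the requested device, but available for its
  // generic ISA: fall back (the patch also emits a warning here).
  if (!generic.empty() && available.count(generic))
    return generic;
  // Neither is available: keep the user's choice, nothing is rewritten.
  return requested;
}
```

The fallback only triggers when both conditions hold, so a toolchain built without any generic multilib behaves exactly as before.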

As currently no one will build a multilib for gfx*generic and ship it,
the change should not cause surprises to users. And once ROCm supports
it, rebuilding GCC with the added multilib is enough.

Thus, like the libgomp change, it makes GCC future-proof and aids
deployment by Linux distros.

OK for mainline?

Tobias


I think the correct place for this whole concept might be in the 
MULTILIB_MATCHES configuration option, not in mkoffload.  We can perhaps 
do something with an awk script.


In general, I think there's an argument for only having generic arch 
libraries, where possible, once the prerequisites are a non-issue.


What's the motivation for adding the warning? I don't think any of the 
restrictions are so interesting for library code. In theory there are 
some restricted instructions that might be used in libm, perhaps, at 
some future time, but that's all. The register count restrictions are 
not interesting at all, since that restricts occupancy, not usage (which 
is already limited by the ABI).


This business of changing the -march flag from what the user specified 
is also questionable.


Andrew

Re: [Patch] [gcn] Fix the output amdhsa.version


On 07/02/2025 11:16, Tobias Burnus wrote:

Andrew Stubbs wrote:
Otherwise, this patch seems fine (I have not reviewed the new magic 
numbers and settings.)


As Andrew mentioned via chat, we also have to update the 'amdhsa.version'.

Well, that's what the attached patch does. (I have no idea which tool / 
library relies on it, but it makes sense to use the right value.)


OK for mainline? (*)

Tobias

(*) Loosely tested with offloading with a ROCm 6.0.2 and 6.3.2 runtime; 
however, the 1.0 was accepted by ROCm also for v4 and as generic does 
not work, v6' 1.1 could not really be tested. However, looking at the 
ROCR code and during debugging, I did not spot any issue with it. 
Actually the string 'amdhsa' does not appear at all in ROCR.


PS: If you wonder where V6 is set: that's a few lines up in the .awk file.


OK

Andrew

RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]

On Fri, 7 Feb 2025, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Wednesday, February 5, 2025 1:15 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: RE: [PATCH]middle-end: delay checking for alignment to load 
> > [PR118464]
> > 
> > On Wed, 5 Feb 2025, Tamar Christina wrote:
> > 
> > [...]
> > 
> > > >
> > 6eda40267bd1382938a77826d11f20dcc959a166..d0148d4938cafc3c59edfa6a
> > > > > > 60002933f384f65b 100644
> > > > > > > --- a/gcc/tree-vect-data-refs.cc
> > > > > > > +++ b/gcc/tree-vect-data-refs.cc
> > > > > > > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences
> > > > (loop_vec_info
> > > > > > loop_vinfo)
> > > > > > > if (is_gimple_debug (stmt))
> > > > > > >   continue;
> > > > > > >
> > > > > > > -   stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > > > > > +   stmt_vec_info stmt_vinfo
> > > > > > > + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > > > > > > auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > > > > > if (!dr_ref)
> > > > > > >   continue;
> > > > > > > @@ -748,26 +749,15 @@ vect_analyze_early_break_dependences
> > > > > > (loop_vec_info loop_vinfo)
> > > > > > >bounded by VF so accesses are within range.  We only need 
> > > > > > > to
> > check
> > > > > > >the reads since writes are moved to a safe place where if 
> > > > > > > we get
> > > > > > >there we know they are safe to perform.  */
> > > > > > > -   if (DR_IS_READ (dr_ref)
> > > > > > > -   && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > > > > > > +   if (DR_IS_READ (dr_ref))
> > > > > > >   {
> > > > > > > -   if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > > > > > > -   || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > > > > > > - {
> > > > > > > -   const char *msg
> > > > > > > - = "early break not supported: cannot peel "
> > > > > > > -   "for alignment, vectorization would read out of "
> > > > > > > -   "bounds at %G";
> > > > > > > -   return opt_result::failure_at (stmt, msg, stmt);
> > > > > > > - }
> > > > > > > -
> > > > > > > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > > > > > > dr_info->need_peeling_for_alignment = true;
> > > > > >
> > > > > > You're setting the flag on any DR of a DR group here ...
> > > > > >
> > > > > > > if (dump_enabled_p ())
> > > > > > >   dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > -  "marking DR (read) as needing peeling 
> > > > > > > for
> > "
> > > > > > > -  "alignment at %G", stmt);
> > > > > > > +  "marking DR (read) as possibly needing
> > peeling "
> > > > > > > +  "for alignment at %G", stmt);
> > > > > > >   }
> > > > > > >
> > > > > > > if (DR_IS_READ (dr_ref))
> > > > > > > @@ -1326,9 +1316,6 @@ vect_record_base_alignments (vec_info 
> > > > > > > *vinfo)
> > > > > > > Compute the misalignment of the data reference DR_INFO when
> > vectorizing
> > > > > > > with VECTYPE.
> > > > > > >
> > > > > > > -   RESULT is non-NULL iff VINFO is a loop_vec_info.  In that 
> > > > > > > case, *RESULT
> > will
> > > > > > > -   be set appropriately on failure (but is otherwise left 
> > > > > > > unchanged).
> > > > > > > -
> > > > > > > Output:
> > > > > > > 1. initialized misalignment info for DR_INFO
> > > > > > >
> > > > > > > @@ -1337,7 +1324,7 @@ vect_record_base_alignments (vec_info 
> > > > > > > *vinfo)
> > > > > > >
> > > > > > >  static void
> > > > > > >  vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info 
> > > > > > > *dr_info,
> > > > > > > -  tree vectype, opt_result *result = 
> > > > > > > nullptr)
> > > > > > > +  tree vectype)
> > > > > > >  {
> > > > > > >stmt_vec_info stmt_info = dr_info->stmt;
> > > > > > >vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > > > > > > @@ -1365,66 +1352,6 @@ vect_compute_data_ref_alignment (vec_info
> > > > *vinfo,
> > > > > > dr_vec_info *dr_info,
> > > > > > >  = exact_div (targetm.vectorize.preferred_vector_alignment 
> > > > > > > (vectype),
> > > > > > >BITS_PER_UNIT);
> > > > > > >
> > > > > > > -  /* If this DR needs peeling for alignment for correctness, we 
> > > > > > > must
> > > > > > > - ensure the target alignment is a constant power-of-two 
> > > > > > > multiple of the
> > > > > > > - amount read per vector iteration (overriding the above hook 
> > > > > > > where
> > > > > > > - necessary).  */
> > > > > > > -  if (dr_info->need_peeling_for_alignment)
> > > > > > > -{
> > > > > > > -  /* Vector size in bytes.  */
> > > > > > > -  poly_uint64 safe_align = tree_to_poly_uint64 
> > > > > > > (TYPE_SIZE_UNIT
> > > > (vectype));
> > > > > > > -
> > > > > > > -  /* We can

[PATCH] testsuite: Fix g++.dg/modules/adl-5

2025-02-07 Thread Nathaniel Shead

Tested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

This testcase wasn't running, because adl-5_a had the wrong extension.
adl-5_d should have been reporting an error because 'frob' is only
visible from within the 'hidden' module but this was missed.

gcc/testsuite/ChangeLog:

* g++.dg/modules/adl-5_a.c: Move to...
* g++.dg/modules/adl-5_a.C: ...here.
* g++.dg/modules/adl-5_d.C: Add errors.

Signed-off-by: Nathaniel Shead 
---
 gcc/testsuite/g++.dg/modules/{adl-5_a.c => adl-5_a.C} | 0
 gcc/testsuite/g++.dg/modules/adl-5_d.C| 5 +++--
 2 files changed, 3 insertions(+), 2 deletions(-)
 rename gcc/testsuite/g++.dg/modules/{adl-5_a.c => adl-5_a.C} (100%)

diff --git a/gcc/testsuite/g++.dg/modules/adl-5_a.c 
b/gcc/testsuite/g++.dg/modules/adl-5_a.C
similarity index 100%
rename from gcc/testsuite/g++.dg/modules/adl-5_a.c
rename to gcc/testsuite/g++.dg/modules/adl-5_a.C
diff --git a/gcc/testsuite/g++.dg/modules/adl-5_d.C 
b/gcc/testsuite/g++.dg/modules/adl-5_d.C
index 9c75b6d14a7..09760c5ad01 100644
--- a/gcc/testsuite/g++.dg/modules/adl-5_d.C
+++ b/gcc/testsuite/g++.dg/modules/adl-5_d.C
@@ -7,10 +7,11 @@ int main ()
 {
   X x (2);
 
-  if (frob (x) != 2)
+  if (frob (x) != 2)  // { dg-error "not declared in" }
 return 1;
 
-  if (TPL (x) != 2)
+  // { dg-regexp "\n\[^\n]*adl-5_a.C:9:15: error: 'frob' was not declared in 
this scope$" }
+  if (TPL (x) != 2)  // { dg-message "required from here" }
 return 2;
 
   return 0;
-- 
2.47.0

Re: [PATCH] RISC-V: Fix ratio in vsetvl fuse rule [PR115703].

2025-02-07 Thread 钟居哲

LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2025-02-07 00:36
To: gcc-patches
CC: pal...@dabbelt.com; kito.ch...@gmail.com; juzhe.zh...@rivai.ai; 
jeffreya...@gmail.com; pan2...@intel.com; rdapp@gmail.com
Subject: [PATCH] RISC-V: Fix ratio in vsetvl fuse rule [PR115703].
Hi,
 
in PR115703 we fuse two vsetvls:
 
Fuse curr info since prev info compatible with it:
  prev_info: VALID (insn 438, bb 2)
Demand fields: demand_ge_sew demand_non_zero_avl
SEW=32, VLMUL=m1, RATIO=32, MAX_SEW=64
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(reg:DI 0 zero)
VL=(reg:DI 9 s1 [312])
  curr_info: VALID (insn 92, bb 20)
Demand fields: demand_ratio_and_ge_sew demand_avl
SEW=64, VLMUL=m1, RATIO=64, MAX_SEW=64
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(const_int 4 [0x4])
VL=(nil)
  prev_info after fused: VALID (insn 438, bb 2)
Demand fields: demand_ratio_and_ge_sew demand_avl
SEW=64, VLMUL=mf2, RATIO=64, MAX_SEW=64
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(const_int 4 [0x4])
VL=(nil).
 
The result is vsetvl zero, zero, e64, mf2, ta, ma.  The previous vsetvl
set vl = 4 but here we wrongly set it to vl = 2.  As all the following
vsetvls only ever change the ratio we never recover.
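 
The arithmetic behind this can be checked with a small standalone model (a sketch only, not the riscv-vsetvl.cc code; the Lmul struct and the helpers lmul_for and vlmax are hypothetical, and VLEN = 256 is assumed from the test's zvl256b).  It uses ratio = SEW / LMUL and VLMAX = VLEN * LMUL / SEW:

```cpp
#include <cassert>

// Hypothetical model: LMUL as a fraction num/den (mf2 is 1/2, m1 is 1/1).
struct Lmul {
  int num;
  int den;
};

// ratio = SEW / LMUL, hence LMUL = SEW / ratio.
Lmul lmul_for(int sew, int ratio) {
  assert(sew > 0 && ratio > 0);
  if (sew >= ratio)
    return {sew / ratio, 1};
  return {1, ratio / sew};
}

// VLMAX = VLEN * LMUL / SEW.
int vlmax(int vlen, int sew, Lmul lmul) {
  return vlen * lmul.num / (lmul.den * sew);
}
```

Using prev's SEW=32 with next's RATIO=64 yields mf2, and e64,mf2 gives VLMAX = 2 for VLEN = 256.  Using the maximum SEW of 64 yields m1 and VLMAX = 4, which is consistent with RATIO=64.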
 
The issue is quite difficult to trigger.  The last known bad commit is
r15-3458-g5326306e7d9d36.  With that commit the output is wrong but
-fno-schedule-insns makes it correct.  From the next commit on the issue is
latent.  I still added the PR's test as scan and run check even if they don't
trigger right now (and I'm not sure the run test will ever fail, but well).
I verified that the patch fixes the issue when applied on top of
r15-3458-g5326306e7d9d36.
 
Regtested on rv64gcv_zvl512b.  Let's see what the CI says.
 
Regards
Robin
 
PR target/115703
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc: Use max_sew for calculating the
new LMUL.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/pr115703-run.c: New test.
* gcc.target/riscv/rvv/autovec/pr115703.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc  |  3 +-
.../riscv/rvv/autovec/pr115703-run.c  | 44 +++
.../gcc.target/riscv/rvv/autovec/pr115703.c   | 38 
3 files changed, 84 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115703-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115703.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 72c4c59514e..82284624a24 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1756,7 +1756,8 @@ private:
   inline void use_max_sew_and_lmul_with_next_ratio (vsetvl_info &prev,
const vsetvl_info &next)
   {
-prev.set_vlmul (calculate_vlmul (prev.get_sew (), next.get_ratio ()));
+int max_sew = MAX (prev.get_sew (), next.get_sew ());
+prev.set_vlmul (calculate_vlmul (max_sew, next.get_ratio ()));
 use_max_sew (prev, next);
 prev.set_ratio (next.get_ratio ());
   }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115703-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115703-run.c
new file mode 100644
index 000..0c2c3d7d4fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115703-run.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-require-effective-target rvv_zvl256b_ok } */
+/* { dg-options "-O3 -march=rv64gcv_zvl256b -mabi=lp64d -fwhole-program 
-fwrapv" } */
+
+int a, i;
+unsigned long b;
+unsigned c, f;
+long long d = 1;
+short e, m;
+long g, h;
+
+__attribute__ ((noipa))
+void check (unsigned long long x)
+{
+  if (x != 13667643351234938049ull)
+__builtin_abort ();
+}
+
+int main() {
+  for (int q = 0; q < 2; q += 1) {
+for (short r = 0; r < 2; r += 1)
+  for (char s = 0; s < 6; s++)
+for (short t = 0; t < 011; t += 12081 - 12080)
+  for (short u = 0; u < 11; u++) {
+a = ({ a > 1 ? a : 1; });
+b = ({ b > 5 ? b : 5; });
+for (short j = 0; j < 2; j = 2080)
+  c = ({ c > 030 ? c : 030; });
+for (short k = 0; k < 2; k += 2080)
+  d *= 7;
+e *= 10807;
+f = ({ f > 3 ? f : 3; });
+  }
+for (int l = 0; l < 21; l += 1)
+  for (int n = 0; n < 16; n++) {
+g = ({ m ? g : m; });
+for (char o = 0; o < 7; o += 1)
+  h *= 3;
+i = ({ i < 0 ? i : 0; });
+  }
+  }
+
+  check (d);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115703.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115703.c
new file mode 100644
index 000..fc147fefa98
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115703.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv_zvl256b -fwhole-program -fwrapv" } */
+
+int a, i;
+unsigned long b;
+unsigned c, f;
+lon

Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-07 Thread Robin Dapp

> You mean the "rivoscibot/toolchain-ci-rivos-test" from the patchwork ? That 
> looks great!
>
> https://patchwork.sourceware.org/project/gcc/patch/20250207082032.1450527-1-pan2...@intel.com/

Yes that one.

-- 
Regards
 Robin

Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c


On 07/02/2025 00:25, Tobias Burnus wrote:

After spending some time with the debugger, I am now convinced that
ROCm 6.3.2 does not yet support generic. The amd-staging branch at
https://github.com/ROCm/ROCR-Runtime/ does, albeit only after
the tag rocm-6.3.2. However, the released ROCm 6.3.2 does not match
that tagged commit as it seems to contain at least the much newer
commit eec21304 (but not the generic support that appeared in an
in-between commit.) [AOMP is also different; it seems as if 20.0-1
does not support it, but 20.0-2 might; AOMP does not seem to have
commit eec21304 to confuse things.]

* * *

The attached patch now adds gfx9-generic - alongside the existing
gfx{10-3,11}-generic and all gfx* that are enabled by those.

See previous thread for the related discussions.

OK for mainline?


What happened to the documentation patch with the "Experimental" 
markers? I'm still uncomfortable with adding so many untested devices, 
so the documentation is important.


Otherwise, this patch seems fine (I have not reviewed the new magic 
numbers and settings.)


Andrew



(This patch depends on the just submitted patch:
"[Patch] [gcn] Fix gfx906's sramecc setting",
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675251.html )

* * *

RFC:
The patch currently does not document any gfx*-generic nor the
gfx902 etc. I wonder whether a follow-up patch should just add
the non-generic ones with "(Experimental)" to invoke.texi?

Or any better ideas regarding what to make available now
and what only later, once a generic-supporting ROCm is available?

In other words: What bits of:
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675200.html ?

Tobias

[PATCH v4] [ifcombine] avoid creating out-of-bounds BIT_FIELD_REFs [PR118514]

2025-02-07 Thread Alexandre Oliva

On Feb  6, 2025, Sam James  wrote:

> Richard Biener  writes:
>> On Thu, Feb 6, 2025 at 2:41 PM Alexandre Oliva  wrote:
>>> 
>>> On Jan 27, 2025, Richard Biener  wrote:
>>> > (I see the assert is no longer in the patch).
>>> 
>>> That's because the assert went in as part of an earlier patch.  I take
>>> it it should be backed out along with the to-be-split-out bits above,
>>> right?
>> 
>> Yes.
>> 
>> (IIRC there's also a PR tripping over this or a similar assert)

> Right, PR118706.

Thanks.  I've added its testcase to the patch below, reverted the
assert, and dropped the other unwanted bits.  Regstrapped on
x86_64-linux-gnu.  Ok to install?



If decode_field_reference finds a load that accesses past the inner
object's size, bail out.
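
For illustration, the bounds test amounts to the following, shown here with plain 64-bit integers.  This is a simplified sketch: the function in the patch works on poly_uint64 with maybe_lt and takes a tree type, while access_in_bounds below is a hypothetical standalone model that takes the type size in bits directly.

```cpp
#include <cstdint>

// Return true iff a SIZE-bit access at bit OFFSET stays within an object
// of TYPE_SIZE_BITS bits.
bool access_in_bounds(std::uint64_t type_size_bits, std::uint64_t size,
                      std::uint64_t offset) {
  std::uint64_t max = offset + size;
  // OFFSET + SIZE wrapped around: treat it as out of bounds.
  if (max < offset)
    return false;
  // The access must end at or before the end of the object.
  return max <= type_size_bits;
}
```

For a 96-bit object such as int[3] with 32-bit int, a 32-bit access at bit offset 64 is accepted and one at bit offset 96 is rejected.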

Drop the too-strict assert.


for  gcc/ChangeLog

PR tree-optimization/118514
PR tree-optimization/118706
* gimple-fold.cc (decode_field_reference): Refuse to consider
merging out-of-bounds BIT_FIELD_REFs.
(make_bit_field_load): Drop too-strict assert.
* tree-eh.cc (bit_field_ref_in_bounds_p): Rename to...
(access_in_bounds_of_type_p): ... this.  Change interface,
export.
(tree_could_trap_p): Adjust.
* tree-eh.h (access_in_bounds_of_type_p): Declare.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118514
PR tree-optimization/118706
* gcc.dg/field-merge-25.c: New.
---
 gcc/gimple-fold.cc|   11 ++-
 gcc/testsuite/gcc.dg/field-merge-25.c |   15 +++
 gcc/tree-eh.cc|   25 +
 gcc/tree-eh.h |1 +
 4 files changed, 31 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-25.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 45485782cdf91..29191685a43c5 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -7686,10 +7686,8 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT 
*pbitsize,
   || bs <= shiftrt
   || offset != 0
   || TREE_CODE (inner) == PLACEHOLDER_EXPR
-  /* Reject out-of-bound accesses (PR79731).  */
-  || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
- && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
-  bp + bs) < 0)
+  /* Reject out-of-bound accesses (PR79731, PR118514).  */
+  || !access_in_bounds_of_type_p (TREE_TYPE (inner), bs, bp)
   || (INTEGRAL_TYPE_P (TREE_TYPE (inner))
  && !type_has_mode_precision_p (TREE_TYPE (inner
 return NULL_TREE;
@@ -7859,11 +7857,6 @@ make_bit_field_load (location_t loc, tree inner, tree 
orig_inner, tree type,
   gimple *new_stmt = gsi_stmt (i);
   if (gimple_has_mem_ops (new_stmt))
gimple_set_vuse (new_stmt, reaching_vuse);
-  gcc_checking_assert (! (gimple_assign_load_p (point)
- && gimple_assign_load_p (new_stmt))
-  || (tree_could_trap_p (gimple_assign_rhs1 (point))
-  == tree_could_trap_p (gimple_assign_rhs1
-(new_stmt;
 }
 
   gimple_stmt_iterator gsi = gsi_for_stmt (point);
diff --git a/gcc/testsuite/gcc.dg/field-merge-25.c 
b/gcc/testsuite/gcc.dg/field-merge-25.c
new file mode 100644
index 0..e769b0ae7b846
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-25.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-fre" } */
+
+/* PR tree-optimization/118706 */
+
+int a[1][1][3], b;
+int main() {
+  int c = -1;
+  while (b) {
+if (a[c][c][6])
+  break;
+if (a[0][0][0])
+  break;
+  }
+}
diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc
index 7015189a2de83..a4d59954c0597 100644
--- a/gcc/tree-eh.cc
+++ b/gcc/tree-eh.cc
@@ -2646,24 +2646,22 @@ range_in_array_bounds_p (tree ref)
   return true;
 }
 
-/* Return true iff EXPR, a BIT_FIELD_REF, accesses a bit range that is known to
-   be in bounds for the referred operand type.  */
+/* Return true iff a BIT_FIELD_REF <(TYPE)???, SIZE, OFFSET> would access a bit
+   range that is known to be in bounds for TYPE.  */
 
-static bool
-bit_field_ref_in_bounds_p (tree expr)
+bool
+access_in_bounds_of_type_p (tree type, poly_uint64 size, poly_uint64 offset)
 {
-  tree size_tree;
-  poly_uint64 size_max, min, wid, max;
+  tree type_size_tree;
+  poly_uint64 type_size_max, min = offset, wid = size, max;
 
-  size_tree = TYPE_SIZE (TREE_TYPE (TREE_OPERAND (expr, 0)));
-  if (!size_tree || !poly_int_tree_p (size_tree, &size_max))
+  type_size_tree = TYPE_SIZE (type);
+  if (!type_size_tree || !poly_int_tree_p (type_size_tree, &type_size_max))
 return false;
 
-  min = bit_field_offset (expr);
-  wid = bit_field_size (expr);
   max = min + wid;
   if (maybe_lt (max, min)
-  || maybe_lt (size_max, max))
+  || maybe_lt (type_size_max, max))
 return false;
 
   return true;
@@ -2712,7 +2710,10 @@ tree_could_trap_p (tree expr)
   swi

[Patch, v2] [gcn] Add gfx9-generic and generic-associated gfx* (was: [GCN] Handle generic ISA names in libgomp's plugin-gcn.c)


Andrew Stubbs wrote:

The attached patch now adds gfx9-generic - alongside the existing
gfx{10-3,11}-generic and all gfx* that are enabled by those.


What happened to the documentation patch with the "Experimental" 
markers? I'm still uncomfortable with adding so many untested devices, 
so the documentation is important.


I was a bit unsure whether/how/which ones to document.

But I think it makes sense to have them, including the gfx*-generic 
ones, also 'experimental'.


Otherwise, this patch seems fine (I have not reviewed the new magic 
numbers and settings.)


Updated patch attached - unchanged except for the .texi.

I guess, I can go ahead and commit it now?

Tobias
[gcn] Add gfx9-generic and generic-associated gfx*

This patch adds gfx9-generic, completing the gfx*-generic support.
It also adds all gfx* devices that are part of any of the gfx*-generic,
i.e. gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034,
gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152, and gfx1153.

gcc/ChangeLog:

	* config/gcn/gcn-devices.def (GCN_DEVICE): Add gfx9-generic,
	gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034,
	gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152, and gfx1153.
	Add a currently unused column linking a specific ISA to a generic
	one (if it exists).
	* config/gcn/gcn-tables.opt: Regenerate.
	* doc/invoke.texi (AMD GCN): Add the new gfx... and the older
	gfx{10-3,11}-generic to -march= as 'experimental'.

 gcc/config/gcn/gcn-devices.def | 202 ++---
 gcc/config/gcn/gcn-tables.opt  |  45 +
 gcc/doc/invoke.texi|  53 +++
 3 files changed, 289 insertions(+), 11 deletions(-)

diff --git a/gcc/config/gcn/gcn-devices.def b/gcc/config/gcn/gcn-devices.def
index a8b21a358b4..af1420382e2 100644
--- a/gcc/config/gcn/gcn-devices.def
+++ b/gcc/config/gcn/gcn-devices.def
@@ -71,6 +71,10 @@
 	generated by the used llvm-mc assembler.
   10 "Architecture Family Name"  (string, external)
 	Used to #define '__GFX<...>__'.
+  11 "GENERIC NAME" (text, external)
+	The name of the generic ISA this device is compatible with or "NONE",
+	where the generic name is the NAME (field 2) of the associated
+	generic device.
 
 Fields marked "external", above, have values defined elsewhere (HSA, ROCM,
 LLVM, ELF, etc.) and must have matching definitions here.  Fields marked
@@ -86,7 +90,30 @@ GCN_DEVICE(gfx900, GFX900, 0x2c, ISA_GCN5,
 	   /* CU mode */ HSACO_ATTR_UNSUPPORTED,
 	   /* Max ISA VGPRs */ 256,
 	   /* Generic code obj version */ 0,  /* non-generic */
-	   /* Architecture Family */ GFX9
+	   /* Architecture Family */ GFX9,
+	   /* Generic Name */ GFX9_GENERIC
+	   )
+
+GCN_DEVICE(gfx902, GFX902, 0x2d, ISA_GCN5,
+	   /* XNACK default */ HSACO_ATTR_OFF,
+	   /* SRAM_ECC default */ HSACO_ATTR_UNSUPPORTED,
+	   /* WAVE64 mode */ HSACO_ATTR_UNSUPPORTED,
+	   /* CU mode */ HSACO_ATTR_UNSUPPORTED,
+	   /* Max ISA VGPRs */ 256,
+	   /* Generic code obj version */ 0,  /* non-generic */
+	   /* Architecture Family */ GFX9,
+	   /* Generic Name */ GFX9_GENERIC
+	   )
+
+GCN_DEVICE(gfx904, GFX904, 0x2e, ISA_GCN5,
+	   /* XNACK default */ HSACO_ATTR_OFF,
+	   /* SRAM_ECC default */ HSACO_ATTR_UNSUPPORTED,
+	   /* WAVE64 mode */ HSACO_ATTR_UNSUPPORTED,
+	   /* CU mode */ HSACO_ATTR_UNSUPPORTED,
+	   /* Max ISA VGPRs */ 256,
+	   /* Generic code obj version */ 0,  /* non-generic */
+	   /* Architecture Family */ GFX9,
+	   /* Generic Name */ GFX9_GENERIC
 	   )
 
 GCN_DEVICE(gfx906, GFX906, 0x2f, ISA_GCN5,
@@ -96,7 +123,8 @@ GCN_DEVICE(gfx906, GFX906, 0x2f, ISA_GCN5,
 	   /* CU mode */ HSACO_ATTR_UNSUPPORTED,
 	   /* Max ISA VGPRs */ 256,
 	   /* Generic code obj version */ 0,  /* non-generic */
-	   /* Architecture Family */ GFX9
+	   /* Architecture Family */ GFX9,
+	   /* Generic Name */ GFX9_GENERIC
 	   )
 
 GCN_DEVICE(gfx908, GFX908, 0x30, ISA_CDNA1,
@@ -106,7 +134,19 @@ GCN_DEVICE(gfx908, GFX908, 0x30, ISA_CDNA1,
 	   /* CU mode */ HSACO_ATTR_UNSUPPORTED,
 	   /* Max ISA VGPRs */ 256,
 	   /* Generic code obj version */ 0,  /* non-generic */
-	   /* Architecture Family */ GFX9
+	   /* Architecture Family */ GFX9,
+	   /* Generic Name */ NONE
+	   )
+
+GCN_DEVICE(gfx909, GFX909, 0x31, ISA_GCN5,
+	   /* XNACK default */ HSACO_ATTR_ANY,
+	   /* SRAM_ECC default */ HSACO_ATTR_UNSUPPORTED,
+	   /* WAVE64 mode */ HSACO_ATTR_UNSUPPORTED,
+	   /* CU mode */ HSACO_ATTR_UNSUPPORTED,
+	   /* Max ISA VGPRs */ 256,
+	   /* Generic code obj version */ 0,  /* non-generic */
+	   /* Architecture Family */ GFX9,
+	   /* Generic Name */ GFX9_GENERIC
 	   )
 
 GCN_DEVICE(gfx90a, GFX90A, 0x3f, ISA_CDNA2,
@@ -116,7 +156,8 @@ GCN_DEVICE(gfx90a, GFX90A, 0x3f, ISA_CDNA2,
 	   /* CU mode */ HSACO_ATTR_UNSUPPORTED,
 	   /* Max ISA VGPRs */ 512,
 	   /* Generic code obj version */ 0,  /* non-generic */
-	   /* Architecture Family */ GFX9
+	   /* Architecture Family */ GFX9,
+	   /* Generic Name */ NONE
 	   )
 
 GCN_DEVICE(gfx90c, GFX90C, 0x32

[Patch] [gcn] Fix the output amdhsa.version (was: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c)


Andrew Stubbs wrote:
Otherwise, this patch seems fine (I have not reviewed the new magic 
numbers and settings.)


As Andrew mentioned via chat, we also have to update the 'amdhsa.version'.

Well, that's what the attached patch does. (I have no idea which tool / 
library relies on it, but it makes sense to use the right value.)


OK for mainline? (*)

Tobias

(*) Loosely tested with offloading with a ROCm 6.0.2 and a 6.3.2 runtime; 
however, as ROCm also accepted the 1.0 for V4 and as generic does not 
work, V6's 1.2 could not really be tested. However, looking at the ROCR 
code and while debugging, I did not spot any issue with it. Actually, the 
string 'amdhsa' does not appear at all in ROCR.


PS: If you wonder where V6 is set: that's a few lines up in the .awk file.
[gcn] Fix the output amdhsa.version

The amdhsa.version depends on the code object version: V3 had 1.0,
V4 has 1.1, and V5 (and V6) have 1.2. GCC used 1.0 but has for a while
generated either V4 or, with -march=gfx...-generic, V6. Now it uses the
proper version again.
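
As a minimal sketch of the mapping described above (this is not the code in gcn.cc; the function name `amdhsa_version_for_code_object` is hypothetical and exists only to illustrate the table), the relation between code object version and 'amdhsa.version' could be written as:

```cpp
#include <stdexcept>
#include <utility>

// Hypothetical helper, for illustration only: maps an AMD HSA code object
// version to the (major, minor) pair emitted as 'amdhsa.version'.
// Mapping as stated in the commit message: V3 = 1.0, V4 = 1.1, V5/V6 = 1.2.
std::pair<int, int>
amdhsa_version_for_code_object (int code_object_version)
{
  switch (code_object_version)
    {
    case 3:
      return {1, 0};
    case 4:
      return {1, 1};
    case 5:
    case 6:
      return {1, 2};
    default:
      throw std::invalid_argument ("unhandled code object version");
    }
}
```

GCC itself only has to distinguish V4 from V6 (generic), which is why the patch merely selects between 1 and 2 for the minor number.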

gcc/ChangeLog:

	* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Update
	'amdhsa.version' output to match used code version.
	* config/gcn/gen-gcn-device-macros.awk: Add a comment to
	crosslink.

 gcc/config/gcn/gcn.cc| 17 +++--
 gcc/config/gcn/gen-gcn-device-macros.awk |  4 +++-
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 82fc6ff1e41..b0c06d5e632 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -6668,18 +6668,23 @@ gcn_hsa_declare_function_name (FILE *file, const char *name,
 fprintf (file,
 	 "\t  .amdhsa_tg_split\t0\n");
   fputs ("\t.end_amdhsa_kernel\n", file);
 
 #if 1
   /* The following is YAML embedded in assembler; tabs are not allowed.  */
-  fputs (".amdgpu_metadata\n"
-	 "amdhsa.version:\n"
-	 "  - 1\n"
-	 "  - 0\n"
-	 "amdhsa.kernels:\n"
-	 "  - .name: ", file);
+
+  /* 'amdhsa.version': code object V3 = 1.0, V4 = 1.1, V5/V6 = 1.2.  */
+  /* Keep in sync with 'amdhsa-code-object' in gen-gcn-device-macros.awk.  */
+  fprintf (file,
+	   ".amdgpu_metadata\n"
+	   "amdhsa.version:\n"
+	   "  - 1\n"
+	   "  - %d\n"
+	   "amdhsa.kernels:\n"
+	   "  - .name: ",
+	   gcn_devices[gcn_arch].generic_version ? 2 /* V6 */ : 1 /* V4 */);
   assemble_name (file, name);
   fputs ("\n.symbol: ", file);
   assemble_name (file, name);
   fprintf (file,
 	   ".kd\n"
 	   ".kernarg_segment_size: %i\n"
diff --git a/gcc/config/gcn/gen-gcn-device-macros.awk b/gcc/config/gcn/gen-gcn-device-macros.awk
index aa271004c27..d227e6fcedf 100644
--- a/gcc/config/gcn/gen-gcn-device-macros.awk
+++ b/gcc/config/gcn/gen-gcn-device-macros.awk
@@ -114,13 +114,15 @@ BEGIN {
 # ABI Version: In principle, the LLVM default would work. However,
 # however, when debugging symbols are turned on, mkoffload.cc
 # writes a new AMD GPU object file and the ABI version needs to be the
 # same. - LLVM <= 17 defaults to 4 while LLVM >= 18 defaults to 5.
 # GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5.
 # Code object V6 is supported since LLVM 19.
-
+#
+# Keep in sync with 'amdhsa.version' in gcn.cc
+#
 END {
   print ""
   print ""
   printf "#define ABI_VERSION_OPT \"%%{\"%s \"!march=*|march=*:--amdhsa-code-object-version=4} \"\n", generic_list
   printf "#define XNACKOPT \"%%{\"%s \":%%eexpected march\\n} \"\n", gensub (/OPT/, "XNACK", "g", list)
   printf "#define SRAMOPT \"%%{\"%s \":%%eexpected march\\n} \"\n", gensub (/OPT/, "SRAM", "g", list)

RE: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-07 Thread Li, Pan2

Thanks Robin.

> OK (unless the CI complains of course).

You mean the "rivoscibot/toolchain-ci-rivos-test" check from patchwork? That 
looks great!

https://patchwork.sourceware.org/project/gcc/patch/20250207082032.1450527-1-pan2...@intel.com/

Pan

-Original Message-
From: Robin Dapp  
Sent: Friday, February 7, 2025 5:50 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin 
Dapp 
Subject: Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

> Inspired by PR118103, the VXRM register should be treated almost the
> same as the FRM register, aka cooperatively-managed global register.
> Thus, add the VXRM to global_regs to avoid the elimination by the
> late-combine pass.

OK (unless the CI complains of course).

-- 
Regards
 Robin

Re: [Patch, v2] [gcn] Add gfx9-generic and generic-associated gfx*


On 07/02/2025 10:37, Tobias Burnus wrote:

Andrew Stubbs wrote:

The attached patch now adds gfx9-generic - alongside the existing
gfx{10-3,11}-generic and all gfx* that are enabled by those.


What happened to the documentation patch with the "Experimental" 
markers? I'm still uncomfortable with adding so many untested devices, 
so the documentation is important.


I was a bit unsure whether/how/which ones to document.

But I think it makes sense to have them, including the gfx*-generic 
ones, also 'experimental'.


Otherwise, this patch seems fine (I have not reviewed the new magic 
numbers and settings.)


Updated patch attached - unchanged except for the .texi.

I guess, I can go ahead and commit it now?


OK.

Andrew

[Patch] [gcn] mkoffload.cc: Automatically use gfx*-generic if -march has no multilib but generic has (was: [GCN] Handle generic ISA names in libgomp's plugin-gcn.c)


This patch is part of the following series (not yet in mainline);
this patch depends on the first one, but only makes sense if both are in:

* "[gcn] Add gfx9-generic and generic-associated gfx*"
  (email subject: "Re: [Patch] [GCN] Handle generic ISA names in libgomp's 
plugin-gcn.c";
   this thread), 
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675259.html

* "[Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c",
  https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675274.html

* * *

This patch handles the following case (in mkoffload.cc):

* If no (multi)lib is available for the specified specific device but one
  is available for its generic ISA, automatically choose the generic ISA
  (with a warning).

Due to the double condition (not available for specified, available for
generic), it shouldn't cause surprises. Additionally, when not using
mkoffload, i.e. when compiling directly with the GCN compiler, the
specific ISA is still used.

As currently no one will build a multilib for gfx*-generic and ship it,
the change should not cause surprises to users. And once ROCm supports
it, rebuilding GCC with the added multilib is enough.

Thus, like the libgomp change, it makes GCC future-proof and aids
deployment by Linux distros.
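
The selection rule described above can be sketched roughly as follows. This is a simplified model, not the mkoffload.cc implementation: the function `choose_march` and its two lookup tables are hypothetical, and the warning the patch emits is only mentioned in a comment.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical sketch of the selection rule, not the mkoffload.cc code.
// GENERIC_OF maps a specific device to its generic ISA name (devices whose
// generic name is "NONE" are simply absent); MULTILIBS is the set of -march=
// values for which a (multi)lib was built.
std::string
choose_march (const std::string &requested,
              const std::map<std::string, std::string> &generic_of,
              const std::set<std::string> &multilibs)
{
  // A (multi)lib for the requested device exists: keep it.
  if (multilibs.count (requested))
    return requested;

  // Otherwise fall back to the generic ISA, but only if that one has a
  // (multi)lib; the real patch also emits a warning at this point.
  auto it = generic_of.find (requested);
  if (it != generic_of.end () && multilibs.count (it->second))
    return it->second;

  // Neither is available: leave the choice unchanged.
  return requested;
}
```

With only a gfx9-generic multilib configured, a request for gfx902 would be redirected to gfx9-generic, while a device without a generic ISA (such as gfx908 in the .def file) keeps its original -march= value.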

OK for mainline?

Tobias
[gcn] mkoffload.cc: Automatically use gfx*-generic if -march has no multilib but generic has

Assume that a distro has configured, e.g., a gfx9-generic multilib but not
one for gfx902. In that case, mkoffload would fail to link; hence,
automatically use the generic ISA instead.

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (enum elf_arch_code): Add
	EF_AMDGPU_MACH_AMDGCN_NONE.
	(elf_arch): Use enum elf_arch_code as type.
	(tool_cleanup): Silence warning by removing trailing '.' from error.
	(get_arch_name): Return enum elf_arch_code.
	(elf_arch_generic_update): New; replace specific device by
	generic device if only the latter has a multilib.
	(main): Call it; replace -march= as needed.

 gcc/config/gcn/mkoffload.cc | 115 
 1 file changed, 105 insertions(+), 10 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 92e8fe70c12..39be6630fd0 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -53,6 +53,7 @@
 
 /* Extract the EF_AMDGPU_MACH_AMDGCN_GFXnnn from the def file.  */
 enum elf_arch_code {
+  EF_AMDGPU_MACH_AMDGCN_NONE = -1,  /* For generic handling.  */
 #define GCN_DEVICE(name, NAME, ELF_ARCH, ...) \
   EF_AMDGPU_MACH_AMDGCN_ ## NAME = ELF_ARCH,
 #include "gcn-devices.def"
@@ -135,9 +136,8 @@ static struct obstack files_to_cleanup;
 enum offload_abi offload_abi = OFFLOAD_ABI_UNSET;
 const char *offload_abi_host_opts = NULL;
 
-uint32_t elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX900;  // Default GPU architecture.
+enum elf_arch_code elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX900;  // Default GPU architecture.
 uint32_t elf_flags = EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4;
-
 static int gcn_stack_size = 0;  /* Zero means use default.  */
 
 /* Delete tempfiles.  */
@@ -782,7 +782,7 @@ compile_native (const char *infile, const char *outfile, const char *compiler,
   obstack_ptr_grow (&argv_obstack, ".c");
   if (!offload_abi_host_opts)
 fatal_error (input_location,
-		 "%<-foffload-abi-host-opts%> not specified.");
+		 "%<-foffload-abi-host-opts%> not specified");
   obstack_ptr_grow (&argv_obstack, offload_abi_host_opts);
   obstack_ptr_grow (&argv_obstack, infile);
   obstack_ptr_grow (&argv_obstack, "-c");
@@ -796,16 +796,15 @@ compile_native (const char *infile, const char *outfile, const char *compiler,
   obstack_free (&argv_obstack, NULL);
 }
 
-static int
+static enum elf_arch_code
 get_arch (const char *str, const char *with_arch_str)
 {
   /* Use the def file to map the name to the elf_arch_code.  */
   if (!str) ;
 #define GCN_DEVICE(name, NAME, ELF, ...) \
   else if (strcmp (str, #name) == 0) \
-return ELF;
+return (enum elf_arch_code) ELF;
 #include "gcn-devices.def"
-#undef GCN_DEVICE
 
   /* else */
   error ("unrecognized argument in option %<-march=%s%>", str);
@@ -839,7 +838,92 @@ get_arch (const char *str, const char *with_arch_str)
 
   exit (FATAL_EXIT_CODE);
 
-  return 0;
+  return EF_AMDGPU_MACH_AMDGCN_NONE;
+}
+
+static const char*
+get_arch_name (enum elf_arch_code arch_code)
+{
+  switch (arch_code)
+{
+#define GCN_DEVICE(name, NAME, ELF, ...) \
+case EF_AMDGPU_MACH_AMDGCN_ ## NAME: \
+  return #name;
+#include "../../gcc/config/gcn/gcn-devices.def"
+default: return NULL;
+}
+}
+
+/* If a generic arch exists and for the chosen arch no (multi)lib is
+   available, default to the generic version, if that has a (multi)lib
+   configured for.  */
+
+static enum elf_arch_code
+elf_arch_generic_update (enum elf_arch_code elf_arch,
+			 enum elf_arch_code default_arch)
+{
+  enum elf_arch_code generic_arch;
+  switch (elf_arch)
+{
+#define GCN_DEVICE(name, NAME, ELF, ISA, XNACK, SRAM, WAVE64, CU, \
+		   MAX_ISA_VGPRS, GEN_VER, ARCH_

Re: [PATCH 1/3] c++: Fix mangling of lambdas in static member template initializers [PR107741]


On 1/31/25 8:44 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

My fix for this issue in r15-7147 turns out to not be quite sufficient;
static member templates apparently go down a different code path and
need their own handling.
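
For illustration, the kind of construct at issue looks roughly like the sketch below (assuming C++17 or later; the names `S`, `x` and `sum_of_two_specializations` are made up for this example and are not taken from the testsuite files listed in the ChangeLog):

```cpp
// Illustration only: a lambda in the initializer of a static member
// (variable) template.  Each specialization S::x<I> has its own closure
// type, which needs a mangling scope tied to the member template.
struct S
{
  template <int I>
  static constexpr auto x = [] { return I; };
};

// Hypothetical helper that forces two instantiations of the member template.
inline int
sum_of_two_specializations ()
{
  return S::x<1> () + S::x<41> ();
}
```

Because the closure types of `S::x<1>` and `S::x<41>` may appear in symbols shared across translation units, their mangled names must be derived consistently from the member template.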

PR c++/107741

gcc/cp/ChangeLog:

* decl2.cc (start_initialized_static_member): Push the
TEMPLATE_DECL when appropriate.
* parser.cc (cp_parser_init_declarator): Start the member decl
early for static members so that lambda scope is set.
(cp_parser_template_declaration_after_parameters): Don't
register static members here.

gcc/testsuite/ChangeLog:

* g++.dg/abi/lambda-ctx2-19.C: Add tests for template members.
* g++.dg/abi/lambda-ctx2-19vs20.C: Likewise.
* g++.dg/abi/lambda-ctx2-20.C: Likewise.
* g++.dg/abi/lambda-ctx2.h: Likewise.
* g++.dg/cpp0x/static-member-init-1.C: Likewise.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/decl2.cc   | 15 --
  gcc/cp/parser.cc  | 30 +++
  gcc/testsuite/g++.dg/abi/lambda-ctx2-19.C |  3 ++
  gcc/testsuite/g++.dg/abi/lambda-ctx2-19vs20.C |  3 ++
  gcc/testsuite/g++.dg/abi/lambda-ctx2-20.C |  3 ++
  gcc/testsuite/g++.dg/abi/lambda-ctx2.h| 16 ++
  .../g++.dg/cpp0x/static-member-init-1.C   |  5 
  7 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 9e61afd359f..994a459c79c 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -1295,6 +1295,8 @@ start_initialized_static_member (const cp_declarator 
*declarator,
gcc_checking_assert (VAR_P (value));
  
DECL_CONTEXT (value) = current_class_type;

+  DECL_INITIALIZED_IN_CLASS_P (value) = true;
+
if (processing_template_decl)
  {
value = push_template_decl (value);
@@ -1305,8 +1307,17 @@ start_initialized_static_member (const cp_declarator 
*declarator,
if (attrlist)
  cplus_decl_attributes (&value, attrlist, 0);
  
-  finish_member_declaration (value);

-  DECL_INITIALIZED_IN_CLASS_P (value) = true;
+  /* When defining a template we need to register the TEMPLATE_DECL.  */
+  tree maybe_template = value;
+  if (template_parm_scope_p ())
+{
+  if (!DECL_TEMPLATE_SPECIALIZATION (value))
+   maybe_template = DECL_TI_TEMPLATE (value);
+  else
+   maybe_template = NULL_TREE;
+}
+  if (maybe_template)
+finish_member_declaration (maybe_template);


Sigh, this all seems increasingly fragile.  Perhaps it would have been 
preferable to break up grokfield for all members rather than just 
initialized variables.  Or at least for all static data members.  Now is 
not the time for that, of course.



return value;
  }
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 7ddb7f119a4..af1c3774f74 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -24179,8 +24179,17 @@ cp_parser_init_declarator (cp_parser* parser,
 here we only handle the latter two.  */
  bool has_lambda_scope = false;
  
+	  if (member_p && decl_specifiers->storage_class == sc_static)

+   {
+ gcc_checking_assert (!decl);
+ tree all_attrs = attr_chainon (attributes, prefix_attributes);
+ decl = start_initialized_static_member (declarator,
+ decl_specifiers,
+ all_attrs);
+   }


Could we do this sooner, near the start_decl call?  And adjust the 
comment there that says we wait until after the initializer for members...



  if (decl != error_mark_node
- && !member_p
+ && (!member_p || decl)


I think this line can just be "decl" now?


  && (processing_template_decl || DECL_NAMESPACE_SCOPE_P (decl)))
has_lambda_scope = true;
  
@@ -33739,7 +33752,12 @@ cp_parser_template_declaration_after_parameters (cp_parser* parser,

  }
  
/* Register member declarations.  */

-  if (member_p && !friend_p && decl && !DECL_CLASS_TEMPLATE_P (decl))
+  if (member_p && !friend_p && decl && !DECL_CLASS_TEMPLATE_P (decl)
+  /* But this is not needed for initialised static members, that were
+registered early to be able to be used in their own definition.  */
+  && !(variable_template_p (decl)
+  && DECL_CLASS_SCOPE_P (decl)
+  && DECL_INITIALIZED_IN_CLASS_P (DECL_TEMPLATE_RESULT (decl


This should be a predicate.

Jason

Re: [PATCH 2/3] c++: Clear lambda scope for unattached member template lambdas


On 1/31/25 8:45 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

In r15-7202 we made lambdas between a template parameter scope and a
class/function/initializer be considered TU-local, in lieu of working
out how to mangle them to the succeeding declaration.

I neglected to clear any existing mangling on the template declaration
however; this means that such lambdas can occasionally get a lambda
scope, and will in general inherit the lambda scope of their
instantiation context (whatever that might be).

This patch ensures that the scope is cleared on the template declaration
as well.

gcc/cp/ChangeLog:

* lambda.cc (record_lambda_scope): Clear mangling scope for
otherwise unattached lambdas in class member templates.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-uneval22.C: Add check that the primary
specialisation of the lambda is TU-local.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/lambda.cc | 11 +++
  gcc/testsuite/g++.dg/cpp2a/lambda-uneval22.C |  3 ++-
  2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index 5593636eaf8..73cf816b6e1 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -1575,6 +1575,17 @@ record_lambda_scope (tree lambda)
}
  }
  
+  /* An otherwise unattached class-scope lambda in a member template

+ should not have a mangling scope, as the mangling scope will not
+ correctly inherit on instantiation.  */
+  tree ctx = TYPE_CONTEXT (closure);
+  if (scope
+  && ctx
+  && CLASS_TYPE_P (ctx)
+  && ctx == TREE_TYPE (scope)
+  && current_template_depth > template_class_depth (ctx))
+scope = NULL_TREE;
+
LAMBDA_EXPR_EXTRA_SCOPE (lambda) = scope;
if (scope)
  maybe_key_decl (scope, TYPE_NAME (closure));
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-uneval22.C 
b/gcc/testsuite/g++.dg/cpp2a/lambda-uneval22.C
index 9c0e8128f10..1a25a0255fc 100644
--- a/gcc/testsuite/g++.dg/cpp2a/lambda-uneval22.C
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-uneval22.C
@@ -5,7 +5,7 @@ struct S {
using T = decltype([]{ return I; });
  
template 

-  decltype([]{ return I; }) f() { return {}; }
+  decltype([]{ return I; }) f();  // { dg-error "declared using local type" }
  };
  
  void a(S::T<0>*);  // { dg-error "declared using local type" }

@@ -18,4 +18,5 @@ int main() {
b(nullptr);
c(nullptr);
d(nullptr);
+  S{}.f<2>()();
  }

Re: [Patch][v2] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c


On 07/02/2025 11:44, Tobias Burnus wrote:

Andrew Stubbs wrote:

On 07/02/2025 09:40, Tobias Burnus wrote:

This patch permits loading generic ISA code objects - by just
trying whether the runtime accepts it.  If not, it fails with
an error. - The error messages should be a bit more helpful in
some cases than before.


...


Also I think all the sentences should finish with '.'.


Thanks for proof reading. Updated patch attached.

I also added the final sentence-ending period, for consistency. But I note 
that this is a plugin-gcn-ism; there is even a GCC warning for a 
'warning_at'/'error_at' diagnostic that ends with a full stop.


OK for mainline?

Tobias

PS: Pending patches:

* mkoffload.cc: switch -march= to generic version if it has a multilib 
and the specific one hasn't


* amdhsa.version fix

And otherwise to do:

* [Waiting for ROCm update] plugin-gcn: Suggest -march=gfx*-generic 
besides -march=gfx.


* Update install.texi – could be done now or once ROCm supports it?

Cf. original patch 
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675200.html for the 
last two.


OK

Andrew

Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-07 Thread Andrew Waterman

This patch runs counter to the ABI spec, which states that vxrm is not
preserved across calls and is volatile upon function entry [1].  vxrm
does not play the same role as frm plays in the calling convention.
(I won't get into the rationale in this email, but the rationale isn't
especially important: we should follow the ABI.)

[1] 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120


On Fri, Feb 7, 2025 at 12:21 AM  wrote:
>
> From: Pan Li 
>
> Inspired by PR118103, the VXRM register should be treated almost the
> same as the FRM register, aka cooperatively-managed global register.
> Thus, add the VXRM to global_regs to avoid the elimination by the
> late-combine pass.
>
> For example, consider the code below:
>
> void compute ()
> {
>   size_t vl = __riscv_vsetvl_e16m1 (N);
>   vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
>   vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
>   vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, __RISCV_VXRM_RDN, vl);
>
>   __riscv_vse16_v_u16m1 (c, vc, vl);
> }
>
> int main ()
> {
>   initialize ();
>   compute();
>
>   return 0;
> }
> After compiling with -march=rv64gcv -O3, we get:
>
> compute:
>     csrwi   vxrm,2
>     lui     a3,%hi(a)
>     lui     a4,%hi(b)
>     addi    a4,a4,%lo(b)
>     vsetivli zero,4,e16,m1,ta,ma
>     addi    a3,a3,%lo(a)
>     vle16.v v2,0(a4)
>     vle16.v v1,0(a3)
>     lui     a4,%hi(c)
>     addi    a4,a4,%lo(c)
>     vaaddu.vv v1,v1,v2
>     vse16.v v1,0(a4)
>     ret
>     .size   compute, .-compute
>     .section .text.startup,"ax",@progbits
>     .align  1
>     .globl  main
>     .type   main, @function
> main:
>     // csrwi   vxrm,2 deleted after inlining
>     addi    sp,sp,-16
>     sd      ra,8(sp)
>     call    initialize
>     lui     a3,%hi(a)
>     lui     a4,%hi(b)
>     vsetivli zero,4,e16,m1,ta,ma
>     addi    a4,a4,%lo(b)
>     addi    a3,a3,%lo(a)
>     vle16.v v2,0(a4)
>     vle16.v v1,0(a3)
>     lui     a4,%hi(c)
>     addi    a4,a4,%lo(c)
>     li      a0,0
>     vaaddu.vv v1,v1,v2
>
> The following test suite passes with this patch:
> * The full rv64gcv regression test.
>
> PR target/118103
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_conditional_register_usage): Add
> the VXRM as the global_regs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr118103-2.c: New test.
> * gcc.target/riscv/rvv/base/pr118103-run-2.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv.cc |  4 +-
>  .../gcc.target/riscv/rvv/base/pr118103-2.c| 40 +
>  .../riscv/rvv/base/pr118103-run-2.c   | 44 +++
>  3 files changed, 87 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-run-2.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 439cc12f93d..819e1538741 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -10900,7 +10900,9 @@ riscv_conditional_register_usage (void)
> call_used_regs[regno] = 1;
>  }
>
> -  if (!TARGET_VECTOR)
> +  if (TARGET_VECTOR)
> +global_regs[VXRM_REGNUM] = 1;
> +  else
>  {
>for (int regno = V_REG_FIRST; regno <= V_REG_LAST; regno++)
> fixed_regs[regno] = call_used_regs[regno] = 1;
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-2.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-2.c
> new file mode 100644
> index 000..d6e3aa09077
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr118103-2.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d" } */
> +
> +#include "riscv_vector.h"
> +
> +#define N 4
> +uint16_t a[N];
> +uint16_t b[N];
> +uint16_t c[N];
> +
> +void initialize ()
> +{
> +  uint16_t tmp_0[N] = { 0xfff, 3213, 238, 275, };
> +
> +  for (int i = 0; i < N; ++i)
> +a[i] = b[i] = tmp_0[i];
> +
> +  for (int i = 0; i < N; ++i)
> +c[i] = 0;
> +}
> +
> +void compute ()
> +{
> +  size_t vl = __riscv_vsetvl_e16m1 (N);
> +  vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
> +  vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
> +  vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, __RISCV_VXRM_RDN, vl);
> +
> +  __riscv_vse16_v_u16m1 (c, vc, vl);
> +}
> +
> +int main ()
> +{
> +  initialize ();
> +  compute();
> +
> +  return 0;
> +}
> +
>

Re: [PATCH v2] c++: Reject cdtors and conversion operators with a single * as return type [PR118306]


On 2/6/25 3:05 PM, Simon Martin wrote:

Hi Jason,

On 6 Feb 2025, at 16:48, Jason Merrill wrote:


On 2/5/25 2:21 PM, Simon Martin wrote:

Hi Jason,

On 4 Feb 2025, at 21:23, Jason Merrill wrote:


On 2/4/25 3:03 PM, Jason Merrill wrote:

On 2/4/25 11:45 AM, Simon Martin wrote:

On 4 Feb 2025, at 17:17, Jason Merrill wrote:


On 2/4/25 10:56 AM, Simon Martin wrote:

Hi Jason,

On 4 Feb 2025, at 16:39, Jason Merrill wrote:


On 1/15/25 9:56 AM, Jason Merrill wrote:

On 1/15/25 7:24 AM, Simon Martin wrote:

Hi,

On 14 Jan 2025, at 23:31, Jason Merrill wrote:


On 1/14/25 2:13 PM, Simon Martin wrote:

On 10 Jan 2025, at 19:10, Andrew Pinski wrote:

On Fri, Jan 10, 2025 at 3:18 AM Simon Martin

wrote:


We currently accept the following invalid code (EDG and MSVC do as
well)


clang does too:
https://github.com/llvm/llvm-project/issues/121706 .




Note it might be useful if a testcase with multiple `*` is included
too:
```
struct A {
        A ();
};
```

Thanks, makes sense to add those. Done in the attached updated
revision, successfully tested on x86_64-pc-linux-gnu.



+/* Check that it's OK to declare a function at ID_LOC with the indicated TYPE,
+   TYPE_QUALS and DECLARATOR.  SFK indicates the kind of special function (if
+   any) that this function is.  OPTYPE is the type given in a conversion
    operator declaration, or the class type for a constructor/destructor.
    Returns the actual return type of the function; that may be different
    than TYPE if an error occurs, or for certain special functions.  */
@@ -12361,8 +12362,19 @@ check_special_function_return_type (special_function_kind sfk,
                                     tree type,
                                     tree optype,
                                     int type_quals,
+                                    const cp_declarator *declarator,
+                                    location_t id_loc,


id_loc should be the same as declarator->id_loc?

You’re right.


                                     const location_t* locations)
 {
+  /* If TYPE is unspecified, DECLARATOR, if set, should not represent a pointer
+     or a reference type.  */
+  if (type == NULL_TREE
+      && declarator
+      && (declarator->kind == cdk_pointer
+          || declarator->kind == cdk_reference))
+    error_at (id_loc, "expected unqualified-id before %qs token",
+              declarator->kind == cdk_pointer ? "*" : "&");


...and id_loc isn't the location of the ptr-operator, it's the
location of the identifier, so this indicates the wrong column.  I
think using declarator->id_loc makes sense, just not pretending it's
the location of the *.

Good catch, thanks.


Let's give diagnostics more like the others later in the function
instead of trying to emulate cp_parser_error.

Makes sense. This is what the updated patch does, successfully tested on
x86_64-pc-linux-gnu. OK for GCC 16?


OK.


Does this also fix 118304?  If so, let's go ahead and apply it to GCC
15.

I have checked just now, and we still ICE for 118304’s testcase with
that fix.


Why doesn't the preexisting

type = void_type_node;

in check_special_function_return_type fix the return type and avoid
the ICE?


We hit the gcc_assert at method.cc:3593, that Marek’s fix bypasses.


Yes, but why doesn't check_special_function_return_type prevent
that?


Ah, because we call it before walking the declarator.  We need to
check again later, perhaps in grokfndecl, that the type is correct.
Perhaps instead of your patch.

One “issue” with adding another check in or close to grokfndecl is
that DECLARATOR will have “been moved to the ID”, and the fact that
we had a CDK_POINTER kind is “lost”. We could obviously somehow
propagate this information, but there might be something easier.


The information isn't lost: it's now reflected in the (wrong) return
type.  One place it would make sense to check would be


 if (ctype && (sfk == sfk_constructor
               || sfk == sfk_destructor))
   {
     /* We are within a class's scope. If our declarator name
        is the same as the class name, and we are defining
        a function, then it is a constructor/destructor, and
        therefore returns a void type.  */


Here 'type' is still the return type, we haven't gotten to
build_function_type yet.

That’s true. However, doesn’t it make sense to cram all the checks
about the return type of special functions in
check_special_function_return_type, and return an error if that return
type is invalid?


This error seems easily recoverable since we know what the type needs to 
be, there's no need for error return from grokdeclarator.


However, an alternative to my suggestion above would be to build on your 
patch by making check_special_function_return_type actually strip the 
invalid declarators, not just complain about them.


Jason

Re: [PATCH] c++: Use cplus_decl_attributes rather than decl_attributes in grokdecl [PR118773]


On 2/6/25 1:49 PM, Jakub Jelinek wrote:

Hi!

My r15-3046 change regressed the first half of the following testcase.
When it calls decl_attributes, it doesn't handle attributes with
dependent arguments correctly, and so the testcase is now rejected
during template parsing with a complaint that N is not a constant integer.

I've actually followed the pointer/reference case which did that
too and that one has been failing for a couple of years on the
second part of the testcase.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


Note, there is also
   if (decl_context != PARM && decl_context != TYPENAME)
 /* Assume that any attributes that get applied late to
templates will DTRT when applied to the declaration
as a whole.  */
 late_attrs = splice_template_attributes (&attrs, type);
   returned_attrs = decl_attributes (&type,
 attr_chainon (returned_attrs,
   attrs),
 attr_flags);
   returned_attrs = attr_chainon (late_attrs, returned_attrs);
call directly to decl_attributes in grokdeclarator, but this one handles
the splicing manually, so maybe it is ok as is (and I don't have a testcase
of anything misbehaving for that).

2025-02-06  Jakub Jelinek  

PR c++/118773
* decl.cc (grokdeclarator): Use cplus_decl_attributes rather than
decl_attributes for std_attributes on pointer and array types.

* g++.dg/cpp0x/gen-attrs-87.C: New test.
* g++.dg/gomp/attrs-3.C: Adjust expected diagnostics.

--- gcc/cp/decl.cc.jj   2025-02-06 09:01:25.600721993 +0100
+++ gcc/cp/decl.cc  2025-02-06 15:59:20.707070240 +0100
@@ -13846,7 +13846,7 @@ grokdeclarator (const cp_declarator *dec
  
  	   The optional attribute-specifier-seq appertains to the

   array type.  */
-   decl_attributes (&type, declarator->std_attributes, 0);
+   cplus_decl_attributes (&type, declarator->std_attributes, 0);
  break;
  
  	case cdk_function:

@@ -14522,8 +14522,7 @@ grokdeclarator (const cp_declarator *dec
 [the optional attribute-specifier-seq (7.6.1) appertains
  to the pointer and not to the object pointed to].  */
  if (declarator->std_attributes)
-   decl_attributes (&type, declarator->std_attributes,
-0);
+   cplus_decl_attributes (&type, declarator->std_attributes, 0);
  
  	  ctype = NULL_TREE;

  break;
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-87.C.jj2025-02-06 
16:14:27.247387432 +0100
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-87.C   2025-02-06 16:13:57.413804728 
+0100
@@ -0,0 +1,10 @@
+// PR c++/118773
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+template 
+using T = char[4] [[gnu::aligned (N)]];
+T<2> t;
+template 
+using U = char *[[gnu::aligned (N)]]*;
+U<__alignof (char *)> u;
--- gcc/testsuite/g++.dg/gomp/attrs-3.C.jj  2024-08-30 16:09:01.230290254 
+0200
+++ gcc/testsuite/g++.dg/gomp/attrs-3.C 2025-02-06 19:23:02.331719653 +0100
@@ -32,10 +32,10 @@ foo ()
  i++;
auto a = [] () [[omp::directive (threadprivate (t1))]] {};  // { dg-error "'omp::directive' not allowed to be specified in this context" }
int [[omp::directive (threadprivate (t2))]] b;  // { dg-warning "attribute ignored" }
-  int *[[omp::directive (threadprivate (t3))]] c;  // { dg-warning "'omp::directive' scoped attribute directive ignored" }
-  int &[[omp::directive (threadprivate (t4))]] d = b;  // { dg-warning "'omp::directive' scoped attribute directive ignored" }
+  int *[[omp::directive (threadprivate (t3))]] c;  // { dg-error "'omp::directive' not allowed to be specified in this context" }
+  int &[[omp::directive (threadprivate (t4))]] d = b;  // { dg-error "'omp::directive' not allowed to be specified in this context" }
typedef int T [[omp::directive (threadprivate (t5))]];  // { dg-error "'omp::directive' not allowed to be specified in this context" }
int e [[omp::directive (threadprivate (t6))]] [10]; // { dg-error "'omp::directive' not allowed to be specified in this context" }
-  int f[10] [[omp::directive (threadprivate (t6))]];   // { dg-warning "'omp::directive' scoped attribute directive ignored" }
+  int f[10] [[omp::directive (threadprivate (t6))]];   // { dg-error "'omp::directive' not allowed to be specified in this context" }
struct [[omp::directive (threadprivate (t7))]] S {};// { dg-error "'omp::directive' not allowed to be specified in this context" }
  }

Jakub

Re: [PATCH] c++: ICE with unparsed noexcept [PR117106]


On 2/6/25 1:48 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
In a member-specification of a class, a noexcept-specifier is
a complete-class context.  Thus we delay parsing until the end of
the class via our DEFERRED_PARSE mechanism; see cp_parser_save_noexcept
and cp_parser_late_noexcept_specifier.

We also attempt to defer instantiation of noexcept-specifiers in order
to reduce the number of instantiations; this is done via DEFERRED_NOEXCEPT.

We can even have both, as in noexcept65.C: a DEFERRED_PARSE wrapped in
DEFERRED_NOEXCEPT, which uses the DEFPARSE_INSTANTIATIONS mechanism.
noexcept65.C works, because when we really need the noexcept, which is
when parsing the body of S::A::A(), the noexcept will have been parsed
already; noexcepts are parsed before bodies of member functions.

But in this test we have:

   struct A {
   int x;
   template
   void foo() noexcept(noexcept(x)) {}
   auto bar() -> decltype(foo()) {} // #1
   };

and I think the decltype in #1 needs the unparsed noexcept before it
could have been parsed.  clang++ rejects the test and I suppose we
should reject it as well, rather than crashing on a DEFERRED_PARSE
in tsubst_expr.
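
To illustrate the complete-class-context rule described above, here is a
minimal sketch. The names Widget, touch, value and touch_is_noexcept are
hypothetical and not taken from the PR. The noexcept-specifier may refer to a
member declared later in the class, and asking about it once the class is
complete is fine. Asking from inside the class body, as #1 does, is the case
the patch diagnoses.

```cpp
#include <cassert>
#include <utility>

// Hypothetical example type, not from the PR.
struct Widget {
  // The noexcept-specifier is a complete-class context, so it can name
  // `value` even though that member is declared further down.
  void touch() noexcept(noexcept(value + 1)) {}
  int value = 0;
};

// The class is complete here, so the deferred noexcept-specifier has
// already been parsed and can be queried safely.
bool touch_is_noexcept() {
  return noexcept(std::declval<Widget &>().touch());
}
```

Moving the query into the class body (for example into a trailing return type
of another member) would need the specifier before the end of the class, which
is the situation rejected above.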

PR c++/117106
PR c++/118190

gcc/cp/ChangeLog:

* pt.cc (maybe_instantiate_noexcept): Give an error if the noexcept
hasn't been parsed yet.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept89.C: New test.
* g++.dg/cpp0x/noexcept90.C: New test.
---
  gcc/cp/pt.cc                            | 16 +++++++++++-----
  gcc/testsuite/g++.dg/cpp0x/noexcept89.C |  9 +++++++++
  gcc/testsuite/g++.dg/cpp0x/noexcept90.C | 12 ++++++++++++
  3 files changed, 32 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept89.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept90.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 39232b5e67f..8108bf5de65 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -27453,7 +27453,8 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain)
  {
static hash_set<tree>* fns = new hash_set<tree>;
bool added = false;
-  if (DEFERRED_NOEXCEPT_PATTERN (noex) == NULL_TREE)
+  tree pattern = DEFERRED_NOEXCEPT_PATTERN (noex);
+  if (pattern == NULL_TREE)
{
  spec = get_defaulted_eh_spec (fn, complain);
  if (spec == error_mark_node)
@@ -27464,13 +27465,19 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain)
else if (!(added = !fns->add (fn)))
{
  /* If hash_set::add returns true, the element was already there.  */
- location_t loc = cp_expr_loc_or_loc (DEFERRED_NOEXCEPT_PATTERN (noex),
-   DECL_SOURCE_LOCATION (fn));
+ location_t loc = cp_expr_loc_or_loc (pattern,
+  DECL_SOURCE_LOCATION (fn));
  error_at (loc,
"exception specification of %qD depends on itself",
fn);
  spec = noexcept_false_spec;
}
+  else if (TREE_CODE (pattern) == DEFERRED_PARSE)
+   {
+ error ("exception specification of %qD is not available "
+"until end of class definition", fn);
+ spec = noexcept_false_spec;
+   }
else if (push_tinst_level (fn))
{
  const bool push_to_top = maybe_push_to_top_level (fn);
@@ -27497,8 +27504,7 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain)
++processing_template_decl;
  
  	  /* Do deferred instantiation of the noexcept-specifier.  */

- noex = tsubst_expr (DEFERRED_NOEXCEPT_PATTERN (noex),
- DEFERRED_NOEXCEPT_ARGS (noex),
+ noex = tsubst_expr (pattern, DEFERRED_NOEXCEPT_ARGS (noex),
  tf_warning_or_error, fn);
  /* Build up the noexcept-specification.  */
  spec = build_noexcept_spec (noex, tf_warning_or_error);
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept89.C b/gcc/testsuite/g++.dg/cpp0x/noexcept89.C
new file mode 100644
index 000..308abf6fb45
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept89.C
@@ -0,0 +1,9 @@
+// PR c++/117106
+// { dg-do compile { target c++11 } }
+
+struct A {
+int x;
+template
+void foo() noexcept(noexcept(x)) {}
+auto bar() -> decltype(foo()) {} // { dg-error "not available until end of class" }
+};
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept90.C b/gcc/testsuite/g++.dg/cpp0x/noexcept90.C
new file mode 100644
index 000..6d403f66e72
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept90.C
@@ -0,0 +1,12 @@
+// PR c++/118190
+// { dg-do compile { target c++11 } }
+
+struct S {
+  template
+  struct S5 {
+void f1() noexcept(noexcept(i)) { }
+int i;
+  };
+  S5 s5;
+  static_assert (noexcept(s5.f1()), ""); // { dg-error "not available until end of class|static assertion failed" }
+};


RE: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-07 Thread Li, Pan2

Thanks Jeff and Andrew, committed as the CI passed.

Pan

-----Original Message-----
From: Andrew Waterman  
Sent: Friday, February 7, 2025 9:54 PM
To: Jeff Law 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

On Fri, Feb 7, 2025 at 5:51 AM Jeff Law  wrote:
>
>
>
> On 2/7/25 5:59 AM, Andrew Waterman wrote:
> > This patch runs counter to the ABI spec, which states that vxrm is not
> > preserved across calls and is volatile upon function entry [1].  vxrm
> > does not play the same role as frm plays in the calling convention.
> > (I won't get into the rationale in this email, but the rationale isn't
> > especially important: we should follow the ABI.)
> >
> > [1] 
> > https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120
> Pan's patch doesn't change the basic property that VXRM has no known
> state at function entry or upon return from a function call.

Ah, GCC-internal notion of global register versus the conventional
understanding of the term.  My mistake.

>
> Jeff
>
>

[PATCH 5/8] LoongArch: Simplify {lsx_,lasx_x}maddw description

Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.

Also reorder two operands of the outer plus in the template, so combine
will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}.
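
As a reading aid, the arithmetic these patterns express can be modelled in
scalar C++. This is only a sketch of the byte-to-halfword LSX case, based on
the plus/mult/any_extend/vec_select shape of the removed patterns; the function
name maddw_h_b and the type aliases are hypothetical, the sign-extending form
is shown, and the odd variant is assumed to select elements 1, 3, 5, ...

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes = std::array<std::int8_t, 16>;   // one 128-bit vector of bytes
using Halves = std::array<std::int16_t, 8>;  // one 128-bit vector of halfwords

// Scalar model: acc[i] += sext(a[2*i + k]) * sext(b[2*i + k]),
// where k is 0 for the even form and 1 for the odd form.
Halves maddw_h_b(Halves acc, const Bytes &a, const Bytes &b, bool odd) {
  const std::size_t k = odd ? 1 : 0;
  for (std::size_t i = 0; i < acc.size(); ++i) {
    // Widen both inputs before multiplying so the product is not truncated.
    const int product = int(a[2 * i + k]) * int(b[2 * i + k]);
    // Wrap the sum to 16 bits, as a vector add would.
    acc[i] = static_cast<std::int16_t>(
        static_cast<std::uint16_t>(acc[i] + product));
  }
  return acc;
}
```

The accumulator is the first operand of the sum here, which mirrors the
original operand order; the reordering mentioned above concerns how combine
matches the RTL, not the arithmetic result.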

gcc/ChangeLog:

* config/loongarch/lasx.md (UNSPEC_LASX_XVMADDWEV): Remove.
(UNSPEC_LASX_XVMADDWEV2): Remove.
(UNSPEC_LASX_XVMADDWEV3): Remove.
(UNSPEC_LASX_XVMADDWOD): Remove.
(UNSPEC_LASX_XVMADDWOD2): Remove.
(UNSPEC_LASX_XVMADDWOD3): Remove.
(lasx_xvmaddwev_h_b): Remove.
(lasx_xvmaddwev_w_h): Remove.
(lasx_xvmaddwev_d_w): Remove.
(lasx_xvmaddwev_q_d): Remove.
(lasx_xvmaddwod_h_b): Remove.
(lasx_xvmaddwod_w_h): Remove.
(lasx_xvmaddwod_d_w): Remove.
(lasx_xvmaddwod_q_d): Remove.
(lasx_xvmaddwev_q_du): Remove.
(lasx_xvmaddwod_q_du): Remove.
(lasx_xvmaddwev_h_bu_b): Remove.
(lasx_xvmaddwev_w_hu_h): Remove.
(lasx_xvmaddwev_d_wu_w): Remove.
(lasx_xvmaddwev_q_du_d): Remove.
(lasx_xvmaddwod_h_bu_b): Remove.
(lasx_xvmaddwod_w_hu_h): Remove.
(lasx_xvmaddwod_d_wu_w): Remove.
(lasx_xvmaddwod_q_du_d): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_VMADDWEV): Remove.
(UNSPEC_LSX_VMADDWEV2): Remove.
(UNSPEC_LSX_VMADDWEV3): Remove.
(UNSPEC_LSX_VMADDWOD): Remove.
(UNSPEC_LSX_VMADDWOD2): Remove.
(UNSPEC_LSX_VMADDWOD3): Remove.
(lsx_vmaddwev_h_b): Remove.
(lsx_vmaddwev_w_h): Remove.
(lsx_vmaddwev_d_w): Remove.
(lsx_vmaddwev_q_d): Remove.
(lsx_vmaddwod_h_b): Remove.
(lsx_vmaddwod_w_h): Remove.
(lsx_vmaddwod_d_w): Remove.
(lsx_vmaddwod_q_d): Remove.
(lsx_vmaddwev_q_du): Remove.
(lsx_vmaddwod_q_du): Remove.
(lsx_vmaddwev_h_bu_b): Remove.
(lsx_vmaddwev_w_hu_h): Remove.
(lsx_vmaddwev_d_wu_w): Remove.
(lsx_vmaddwev_q_du_d): Remove.
(lsx_vmaddwod_h_bu_b): Remove.
(lsx_vmaddwod_w_hu_h): Remove.
(lsx_vmaddwod_d_wu_w): Remove.
(lsx_vmaddwod_q_du_d): Remove.
* config/loongarch/simd.md (simd_maddw_evod__):
New define_insn.
(_vmaddw__): New
define_expand.
(simd_maddw_evod__hetero): New define_insn.
(_vmaddw__u_):
New define_expand.
(_maddw_q_d_punned): New define_expand.
(_maddw_q_du_d_punned): New define_expand.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vmaddwev_q_d): Define as a macro to override it
with the punned expand.
(CODE_FOR_lsx_vmaddwev_q_du): Likewise.
(CODE_FOR_lsx_vmaddwev_q_du_d): Likewise.
(CODE_FOR_lsx_vmaddwod_q_d): Likewise.
(CODE_FOR_lsx_vmaddwod_q_du): Likewise.
(CODE_FOR_lsx_vmaddwod_q_du_d): Likewise.
(CODE_FOR_lasx_xvmaddwev_q_d): Likewise.
(CODE_FOR_lasx_xvmaddwev_q_du): Likewise.
(CODE_FOR_lasx_xvmaddwev_q_du_d): Likewise.
(CODE_FOR_lasx_xvmaddwod_q_d): Likewise.
(CODE_FOR_lasx_xvmaddwod_q_du): Likewise.
(CODE_FOR_lasx_xvmaddwod_q_du_d): Likewise.
---
 gcc/config/loongarch/lasx.md               | 400 -
 gcc/config/loongarch/loongarch-builtins.cc |  14 +
 gcc/config/loongarch/lsx.md                | 320 -
 gcc/config/loongarch/simd.md               | 104 ++
 4 files changed, 118 insertions(+), 720 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 1dc11840187..4ac85b7fcf9 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -94,12 +94,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVPERMI_Q
   UNSPEC_LASX_XVPERMI_D
 
-  UNSPEC_LASX_XVMADDWEV
-  UNSPEC_LASX_XVMADDWEV2
-  UNSPEC_LASX_XVMADDWEV3
-  UNSPEC_LASX_XVMADDWOD
-  UNSPEC_LASX_XVMADDWOD2
-  UNSPEC_LASX_XVMADDWOD3
   UNSPEC_LASX_XVADD_Q
   UNSPEC_LASX_XVSUB_Q
   UNSPEC_LASX_XVREPLVE
@@ -3122,400 +3116,6 @@ (define_insn "lasx_xvldrepl__insn_0"
(set_attr "mode" "")
(set_attr "length" "4")])
 
-;;XVMADDWEV.H.B   XVMADDWEV.H.BU
-(define_insn "lasx_xvmaddwev_h_b"
-  [(set (match_operand:V16HI 0 "register_operand" "=f")
-   (plus:V16HI
- (match_operand:V16HI 1 "register_operand" "0")
- (mult:V16HI
-   (any_extend:V16HI
- (vec_select:V16QI
-   (match_operand:V32QI 2 "register_operand" "%f")
-   (parallel [(const_int 0) (const_int 2)
-  (const_int 4) (const_int 6)
-  (const_int 8) (const_int 10)
-  (const_int 12) (const_int 14)
-  (const_int 16) (const_int 18)
-  (const_int 20) (const_int 22)
-  (const_int 24) (const_int 26)
-  (const_int 28) (const_int 30)])))
-

PING^2 [RFC] Prevent the scheduler from moving prefetch instructions when expanding __builtin_prefetch [PR 116713]

2025-02-07 Thread Oleg Endo

> Hi,
> 
> Can the issue be resolved in a target independent manner as suggested below?
> Or is it better to deal with this in the target code?
> 
> Best regards,
> Oleg Endo
> 
> On Fri, 2024-09-27 at 00:26 -0400, Pietro Monteiro wrote:
> > The prefetch instruction that is emitted by __builtin_prefetch is
> > re-ordered on GCC, but not on clang[0]. GCC's behavior is surprising
> > because when using the builtin you want the instruction to be placed at
> > the exact point where you put it. Moving it around, especially across
> > loads/stores, may end up being a pessimization. Adding a blockage
> > instruction before the prefetch prevents the scheduler from moving it.
> > 
> > [0] https://godbolt.org/z/Ycjr7Tq8b
> > 
> > 
> > -- 8< --
> > 
> > 
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index 37c7c98e5c..fec751e0d6 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -1329,7 +1329,12 @@ expand_builtin_prefetch (tree exp)
> >create_integer_operand (&ops[1], INTVAL (op1));
> >create_integer_operand (&ops[2], INTVAL (op2));
> >if (maybe_expand_insn (targetm.code_for_prefetch, 3, ops))
> > -   return;
> > +{
> > +  /* Prevent the prefetch from being moved.  */
> > +  rtx_insn *last = get_last_insn ();
> > +  emit_insn_before (gen_blockage (), last);
> > +  return;
> > +}
> >  }
> >  
> >/* Don't do anything with direct references to volatile memory, but

Re: [Patch] [gcn] mkoffload.cc: Automatically use gfx*-generic if -march has no multilib but generic has


On 07/02/2025 12:53, Tobias Burnus wrote:

Hi Andrew,

Andrew Stubbs wrote:
I think the correct place for this whole concept might be in the 
MULTILIB_MATCHES configuration option, not in mkoffload.


In any case, mkoffload needs to know about this; if only the driver
('gcc') knows about it, it comes too late for the early debug file
writing. And if only the compiler itself (lto1, cc1, f951) knows about
it, it comes too late for mkoffload, 'as' (llvm-mc) and the
collect/(l)ld run.


That's not how MULTILIB_MATCHES works. The debug .o file would use the 
same arch as the user specified.


I just realized that I'm assuming that -march=gfx1100 object files will 
link with -march=gfx11-generic libraries, and produce gfx1100 binaries. 
Is this not the case?


The mkoffload.cc ELF-writing issue is actually the reason we already
check whether the default version (gfx900) has been overridden at
compile time - and, hence, already include the required config/multilib
include files.


(Maybe not a compelling reason, but when invoking
amdgcn-amdhsa-{gcc,gfortran}, there is no need to disallow any version as
no library is linked.)



What's the motivation for adding the warning?


I don't like silently changing the specified -march=. But I also want to 
make it easy for a user to find the option when compiling with, e.g.,
-march=gfx1100 and only -march=gfx11-generic is available as 
lib{c,m,gomp,gfortran}.


Thus, I thought having a warning would be useful: by default it does
the right thing, but it informs the user why the compiler did something
different.


So, the recommended way to silence the warning would be to use the 
generic arch explicitly?


I don't think any of the restrictions are so interesting for library 
code.
In theory there are some restricted instructions that might be used in 
libm, perhaps, at some future time, but that's all. The register count 
restrictions are not interesting at all, since that restricts 
occupancy, not usage (which is already limited by the ABI).


For now, there is also the issue that only ROCm > 6.3.2 supports it, i.e.
recompiling with a new distro version would fail with an odd error while 
it worked in the version before. Using a warning prevents all this.


And otherwise, it is not only library code – it is also hot offloading 
code. Mixing a generic-code library with a specific-code runtime is not 
permitted (rejected by lld). And in some cases, I could imagine that 
some operations could matter.


Library code does not have metadata to conflict (that only comes from 
entry points - although I just realized that init/fini might break that 
assumption), so as long as the generic ISA is a strict subset of the 
specific GPU ISA, it ought to work. But if it doesn't then I guess we're 
out of luck.


But admittedly, the restrictions aren't that hard. For gfx115x, the
missing scalar ALU floating-point instructions and the lack of SGPR
support for src1 in data parallel processing (DPP) instructions could
matter in theory, but I don't think that we would exploit this, and there
are other things to optimize for first.


For AI-style applications, the FP8/BF8/XF32 restrictions could matter 
with gfx9-4-generic, but we don't support gfx94x yet and, again, we 
should start with other type of optimizations first.



* * *


This business of changing the -march flag from what the user specified 
is also questionable.



I concur, but it is the simplest way to permit a user to link the code,
point them to the existence of the new -march= flag and avoid gotchas,
while also making clear why the flag was changed.


That's based on Richard's comment ...


For distributors it might be good to just ship -generic multilibs and
have all specific -march=gfxXYZ to map to their respective -generic
variant.  That is, consider the configured multilibs when interpreting
-march=gfxXYZ which probably means always configuring the -generic
multilibs (and back to dependence on llvm19 and recent ROCm for the
runtime ...).

That said, I'm happy about -generic, and I hope it ends up in GCC 15
in some way.


... and trying to come up with something that solves this issue but 
avoids surprises.


I think Richard was assuming that MULTILIB_MATCHES linking would work.



I use locally
--with-multilib-list=gfx906,gfx908,gfx90a,gfx90c,gfx1030,gfx1036,gfx1100,gfx1103,gfx9-generic,gfx11-generic
(i.e. no gfx900 and no gfx10-3-generic + none of the newly added ones.)


And when playing around for testing all the patches, it works rather 
smoothly.


Except you have warnings...

Andrew

Re: [PATCH] c++: Don't use CLEANUP_EH_ONLY for new expression cleanup [PR118763]


On 2/6/25 1:44 PM, Jakub Jelinek wrote:

Hi!

The following testcase is miscompiled since r12-6325 stopped
preevaluating the initializers for new expression.
If evaluating the initializers throws, there is a correct cleanup
for that, but it is marked CLEANUP_EH_ONLY.  While in standard
C++ that is just fine, if it has statement expressions, it can
return or goto out of the expression and we should delete the
pointer in that case too.

There is already a sentry variable initialized to true and
set to false after everything is initialized and used as a guard
for the cleanup, so just removing the CLEANUP_EH_ONLY flag does
everything we need.  And in the normal case of the initializer
not using statement expressions, at least with -O2 we get the same code:
while the change turns
try { sentry = true; ... sentry = false; } catch { if (sentry) delete ...; }
into
try { sentry = true; ... sentry = false; } finally { if (sentry) delete ...; }
optimizations will see that sentry is false when reaching the finally
other than through an exception.
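
The sentry idea can be pictured in plain C++ with a guard object whose
destructor runs on every exit path, not only on exceptions. This is only an
analogy for the generated cleanup, not what build_new_1 emits; SentryGuard,
make_int and live_allocations are hypothetical names, and an early return
stands in for the statement-expression return in the testcase.

```cpp
#include <cassert>

static int live_allocations = 0;  // bookkeeping for the example only

// Frees the allocation on scope exit unless the sentry has been cleared.
struct SentryGuard {
  int *ptr;
  bool sentry;
  explicit SentryGuard(int *p) : ptr(p), sentry(true) { ++live_allocations; }
  ~SentryGuard() {
    if (sentry) {  // initialization did not complete
      delete ptr;
      --live_allocations;
    }
  }
};

int *make_int(bool bail_out) {
  SentryGuard guard(new int(0));
  if (bail_out)
    return nullptr;      // non-exceptional early exit: guard still frees
  *guard.ptr = 1;        // "initialization" finished
  guard.sentry = false;  // disarm: the caller now owns the allocation
  return guard.ptr;
}
```

A cleanup that ran only during exception unwinding would leak in the
bail_out case, which corresponds to the miscompilation described above.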

Though, wonder what other CLEANUP_EH_ONLY cleanups might be an issue
with statement expressions.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2025-02-06  Jakub Jelinek  

PR c++/118763
* init.cc (build_new_1): Don't set CLEANUP_EH_ONLY.

* g++.dg/asan/pr118763.C: New test.

--- gcc/cp/init.cc.jj   2025-02-04 21:54:35.102087948 +0100
+++ gcc/cp/init.cc  2025-02-06 12:25:17.624810169 +0100
@@ -3842,7 +3842,6 @@ build_new_1 (vec **placemen
  tree end, sentry, begin;
  
  	  begin = get_target_expr (boolean_true_node);

- CLEANUP_EH_ONLY (begin) = 1;
  
  	  sentry = TARGET_EXPR_SLOT (begin);
  
--- gcc/testsuite/g++.dg/asan/pr118763.C.jj 2025-02-06 12:23:50.724022482 +0100
+++ gcc/testsuite/g++.dg/asan/pr118763.C    2025-02-06 12:23:29.407319860 +0100
@@ -0,0 +1,15 @@
+// PR c++/118763
+// { dg-do run }
+
+int *
+foo (bool x)
+{
+  return new int (({ if (x) return nullptr; 1; }));
+}
+
+int
+main ()
+{
+  delete foo (true);
+  delete foo (false);
+}

Jakub

Re: [PATCH] c++, v2: Allow constexpr reads from volatile std::nullptr_t objects [PR118661]


On 2/6/25 1:52 PM, Jakub Jelinek wrote:

On Thu, Feb 06, 2025 at 01:45:59PM -0500, Marek Polacek wrote:

--- gcc/cp/constexpr.cc.jj  2025-02-05 13:14:34.771198185 +0100
+++ gcc/cp/constexpr.cc 2025-02-06 09:53:03.236587121 +0100
@@ -9717,7 +9717,8 @@ potential_constant_expression_1 (tree t,
  return true;
  
if (TREE_THIS_VOLATILE (t) && want_rval

-  && !FUNC_OR_METHOD_TYPE_P (TREE_TYPE (t)))
+  && !FUNC_OR_METHOD_TYPE_P (TREE_TYPE (t))
+  && TREE_CODE (TREE_TYPE (t)) != NULLPTR_TYPE)


Patch looks good but we should use NULLPTR_TYPE_P.


You're right.  Here is the patch adjusted:


OK.


2025-02-06  Jakub Jelinek  

PR c++/118661
* constexpr.cc (potential_constant_expression_1): Don't diagnose
lvalue-to-rvalue conversion of volatile lvalue if it has NULLPTR_TYPE.
* decl2.cc (decl_maybe_constant_var_p): Return true for constexpr
decls with NULLPTR_TYPE even if they are volatile.

* g++.dg/cpp0x/constexpr-volatile4.C: New test.
* g++.dg/cpp0x/constexpr-union9.C: New test.

--- gcc/cp/constexpr.cc.jj  2025-02-05 13:14:34.771198185 +0100
+++ gcc/cp/constexpr.cc 2025-02-06 09:53:03.236587121 +0100
@@ -9717,7 +9717,8 @@ potential_constant_expression_1 (tree t,
  return true;
  
if (TREE_THIS_VOLATILE (t) && want_rval

-  && !FUNC_OR_METHOD_TYPE_P (TREE_TYPE (t)))
+  && !FUNC_OR_METHOD_TYPE_P (TREE_TYPE (t))
+  && !NULLPTR_TYPE_P (TREE_TYPE (t)))
  {
if (flags & tf_error)
constexpr_error (loc, fundef_p, "lvalue-to-rvalue conversion of "
--- gcc/cp/decl2.cc.jj  2025-01-27 16:45:45.455970792 +0100
+++ gcc/cp/decl2.cc 2025-02-06 09:53:53.150890600 +0100
@@ -4985,7 +4985,8 @@ decl_maybe_constant_var_p (tree decl)
tree type = TREE_TYPE (decl);
if (!VAR_P (decl))
  return false;
-  if (DECL_DECLARED_CONSTEXPR_P (decl) && !TREE_THIS_VOLATILE (decl))
+  if (DECL_DECLARED_CONSTEXPR_P (decl)
+  && (!TREE_THIS_VOLATILE (decl) || NULLPTR_TYPE_P (type)))
  return true;
if (DECL_HAS_VALUE_EXPR_P (decl))
  /* A proxy isn't constant.  */
--- gcc/testsuite/g++.dg/cpp0x/constexpr-volatile4.C.jj 2025-02-06 09:50:43.339539282 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-volatile4.C    2025-02-06 09:50:16.071919784 +0100
@@ -0,0 +1,20 @@
+// PR c++/118661
+// { dg-do compile { target c++11 } }
+
+using nullptr_t = decltype (nullptr);
+constexpr volatile nullptr_t a = {};
+constexpr nullptr_t b = a;
+
+constexpr nullptr_t
+foo ()
+{
+#if __cplusplus >= 201402L
+  volatile nullptr_t c = {};
+  return c;
+#else
+  return nullptr;
+#endif
+}
+
+static_assert (b == nullptr, "");
+static_assert (foo () == nullptr, "");
--- gcc/testsuite/g++.dg/cpp0x/constexpr-union9.C.jj    2025-02-06 09:57:46.149639270 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-union9.C   2025-02-06 10:01:08.472815988 +0100
@@ -0,0 +1,16 @@
+// PR c++/118661
+// { dg-do compile { target c++11 } }
+
+using nullptr_t = decltype (nullptr);
+union U { int i; nullptr_t n; };
+constexpr U u = { 42 };
+static_assert (u.n == nullptr, "");
+
+#if __cplusplus >= 201402L
+constexpr nullptr_t
+foo ()
+{
+  union U { int i; nullptr_t n; } u = { 42 };
+  return u.n;
+}
+#endif


Jakub

[PATCH] jit/118780 - make sure to include dlfcn.h when plugin support is disabled

The following makes the inclusion of dlfcn.h explicitly requested, which
avoids a build failure when JIT is enabled but plugin support is disabled,
since the include is currently conditional on plugin support.

I've built GCC with JIT enabled and plugin support disabled as well
as the other way around successfully with this patch.

OK for trunk and branches (after a while)?

Thanks,
Richard.

PR jit/118780
gcc/
* system.h: Check INCLUDE_DLFCN_H for including dlfcn.h instead
of ENABLE_PLUGIN.
* plugin.cc: Define INCLUDE_DLFCN_H.

gcc/jit/
* jit-playback.cc: Define INCLUDE_DLFCN_H.
* jit-result.cc: Likewise.
---
 gcc/jit/jit-playback.cc | 1 +
 gcc/jit/jit-result.cc   | 1 +
 gcc/plugin.cc   | 1 +
 gcc/system.h| 2 +-
 4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/jit/jit-playback.cc b/gcc/jit/jit-playback.cc
index c9fcebc4730..6946f100d5c 100644
--- a/gcc/jit/jit-playback.cc
+++ b/gcc/jit/jit-playback.cc
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "config.h"
 #define INCLUDE_MUTEX
+#define INCLUDE_DLFCN_H
 #include "libgccjit.h"
 #include "system.h"
 #include "coretypes.h"
diff --git a/gcc/jit/jit-result.cc b/gcc/jit/jit-result.cc
index 1c793aef062..2ad6deb1da8 100644
--- a/gcc/jit/jit-result.cc
+++ b/gcc/jit/jit-result.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_DLFCN_H
 #include "system.h"
 #include "coretypes.h"
 
diff --git a/gcc/plugin.cc b/gcc/plugin.cc
index 6d3394908fc..0de2cc2dd2c 100644
--- a/gcc/plugin.cc
+++ b/gcc/plugin.cc
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
APIs described in doc/plugin.texi.  */
 
 #include "config.h"
+#define INCLUDE_DLFCN_H
 #include "system.h"
 #include "coretypes.h"
 #include "options.h"
diff --git a/gcc/system.h b/gcc/system.h
index 39d28ba0bb4..e516b49d04a 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -694,7 +694,7 @@ extern int vsnprintf (char *, size_t, const char *, va_list);
 # endif
 #endif
 
-#if defined (ENABLE_PLUGIN) && defined (HAVE_DLFCN_H)
+#if defined (INCLUDE_DLFCN_H) && defined (HAVE_DLFCN_H)
 /* If plugin support is enabled, we could use libdl.  */
 #include <dlfcn.h>
 #endif
-- 
2.43.0

Re: [PATCH] [testsuite] tolerate later success [PR108357]





On 2/6/25 3:50 AM, Alexandre Oliva wrote:


On leon3-elf and presumably on other targets, the test fails due to
differences in calling conventions and other reasons, that add extra
gimple stmts that prevent the expected optimization at the expected
point.  The optimization takes place anyway, just a little later, so
tolerate that.

Regstrapped on x86_64-linux-gnu, also tested with gcc-14 targeting
leon3-elf.  Ok to install?


for  gcc/testsuite/ChangeLog

PR tree-optimization/108357
* gcc.dg/tree-ssa/pr108357.c: Tolerate later optimization.

OK
jeff

Re: [PATCH 1/3] c++: Fix mangling of lambdas in static member template initializers [PR107741]

2025-02-07 Thread Nathaniel Shead

On Fri, Feb 07, 2025 at 08:05:54AM -0500, Jason Merrill wrote:
> On 1/31/25 8:44 AM, Nathaniel Shead wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > 
> > -- >8 --
> > 
> > My fix for this issue in r15-7147 turns out to not be quite sufficient;
> > static member templates apparently go down a different code path and
> > need their own handling.
> > 
> > PR c++/107741
> > 
> > gcc/cp/ChangeLog:
> > 
> > * decl2.cc (start_initialized_static_member): Push the
> > TEMPLATE_DECL when appropriate.
> > * parser.cc (cp_parser_init_declarator): Start the member decl
> > early for static members so that lambda scope is set.
> > (cp_parser_template_declaration_after_parameters): Don't
> > register static members here.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/abi/lambda-ctx2-19.C: Add tests for template members.
> > * g++.dg/abi/lambda-ctx2-19vs20.C: Likewise.
> > * g++.dg/abi/lambda-ctx2-20.C: Likewise.
> > * g++.dg/abi/lambda-ctx2.h: Likewise.
> > * g++.dg/cpp0x/static-member-init-1.C: Likewise.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/decl2.cc   | 15 --
> >   gcc/cp/parser.cc  | 30 +++
> >   gcc/testsuite/g++.dg/abi/lambda-ctx2-19.C |  3 ++
> >   gcc/testsuite/g++.dg/abi/lambda-ctx2-19vs20.C |  3 ++
> >   gcc/testsuite/g++.dg/abi/lambda-ctx2-20.C |  3 ++
> >   gcc/testsuite/g++.dg/abi/lambda-ctx2.h| 16 ++
> >   .../g++.dg/cpp0x/static-member-init-1.C   |  5 
> >   7 files changed, 67 insertions(+), 8 deletions(-)
> > 
> > diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
> > index 9e61afd359f..994a459c79c 100644
> > --- a/gcc/cp/decl2.cc
> > +++ b/gcc/cp/decl2.cc
> > @@ -1295,6 +1295,8 @@ start_initialized_static_member (const cp_declarator *declarator,
> > gcc_checking_assert (VAR_P (value));
> > DECL_CONTEXT (value) = current_class_type;
> > +  DECL_INITIALIZED_IN_CLASS_P (value) = true;
> > +
> > if (processing_template_decl)
> >   {
> > value = push_template_decl (value);
> > @@ -1305,8 +1307,17 @@ start_initialized_static_member (const cp_declarator *declarator,
> > if (attrlist)
> >   cplus_decl_attributes (&value, attrlist, 0);
> > -  finish_member_declaration (value);
> > -  DECL_INITIALIZED_IN_CLASS_P (value) = true;
> > +  /* When defining a template we need to register the TEMPLATE_DECL.  */
> > +  tree maybe_template = value;
> > +  if (template_parm_scope_p ())
> > +{
> > +  if (!DECL_TEMPLATE_SPECIALIZATION (value))
> > +   maybe_template = DECL_TI_TEMPLATE (value);
> > +  else
> > +   maybe_template = NULL_TREE;
> > +}
> > +  if (maybe_template)
> > +finish_member_declaration (maybe_template);
> 
> Sigh, this all seems increasingly fragile.  Perhaps it would have been
> preferable to break up grokfield for all members rather than just
> initialized variables.  Or at least for all static data members.  Now is not
> the time for that, of course.
> 

Maybe, yeah; I might look into seeing what can be untangled here for
GCC 16.

> > return value;
> >   }
> > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > index 7ddb7f119a4..af1c3774f74 100644
> > --- a/gcc/cp/parser.cc
> > +++ b/gcc/cp/parser.cc
> > @@ -24179,8 +24179,17 @@ cp_parser_init_declarator (cp_parser* parser,
> >  here we only handle the latter two.  */
> >   bool has_lambda_scope = false;
> > + if (member_p && decl_specifiers->storage_class == sc_static)
> > +   {
> > + gcc_checking_assert (!decl);
> > + tree all_attrs = attr_chainon (attributes, prefix_attributes);
> > + decl = start_initialized_static_member (declarator,
> > + decl_specifiers,
> > + all_attrs);
> > +   }
> 
> Could we do this sooner, near the start_decl call?  And adjust the comment
> there that says we wait until after the initializer for members...
>

Done.

> >   if (decl != error_mark_node
> > - && !member_p
> > + && (!member_p || decl)
> 
> I think this line can just be "decl" now?
> 

Done.

> >   && (processing_template_decl || DECL_NAMESPACE_SCOPE_P (decl)))
> > has_lambda_scope = true;
> > @@ -33739,7 +33752,12 @@ cp_parser_template_declaration_after_parameters (cp_parser* parser,
> >   }
> > /* Register member declarations.  */
> > -  if (member_p && !friend_p && decl && !DECL_CLASS_TEMPLATE_P (decl))
> > +  if (member_p && !friend_p && decl && !DECL_CLASS_TEMPLATE_P (decl)
> > +  /* But this is not needed for initialised static members, that were
> > +registered early to be able to be used in their own definition.  */
> > +  && !(variable_template_p (decl)
> > +  && DECL_CLASS_SCOPE_P (decl)
> > +  && DECL_INITIALIZED_IN_CLASS_P (DECL_TEMPLATE_RESULT (decl
> 
> This

Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg





On 2/3/25 2:09 AM, Richard Sandiford wrote:



So yeah, I think the first question is why ira_build_conflicts isn't
kicking in for this register or (if it is) why we still get register 0.

So pulling on this thread leads me into the code that sets up
ALLOCNO_WMODE in create_insn_allocnos:


   if ((a = ira_curr_regno_allocno_map[regno]) == NULL)
 {
   a = ira_create_allocno (regno, false, ira_curr_loop_tree_node);
   if (outer != NULL && GET_CODE (outer) == SUBREG)
 {
   machine_mode wmode = GET_MODE (outer);
   if (partial_subreg_p (ALLOCNO_WMODE (a), wmode))
 ALLOCNO_WMODE (a) = wmode;
 }
 }

Note how we set ALLOCNO_WMODE only at allocno creation, so it'll
work as intended if and only if the first reference is via a SUBREG.


Huh, yeah, I agree that that looks wrong.


ISTM the fix here is to always do the check and set ALLOCNO_WMODE.

The other bug I see is that we may potentially have paradoxicals in
different modes.  ie, on a 32 bit target, we could in theory have a
paradoxical in DI and another in TI.  So in addition to pulling that
code out of the conditional so that it executes every time, the
assignment would look like

if (partial_subreg_p (ALLOCNO_WMODE (a), wmode)
  && wmode > ALLOCNO_WMODE (a))
ALLOCNO_WMODE (a) = wmode;

Or something along those lines.


Not sure about this part though.  The construct:

   if (partial_subreg_p (ALLOCNO_WMODE (a), wmode))
 ALLOCNO_WMODE (a) = wmode;

is effectively:

   ALLOCNO_WMODE (a) = MAX_SIZE (ALLOCNO_WMODE (a), wmode);

You're right.



and so already picks the single widest mode, if there is one.
For things like DI vs DF, it will use the existing mode as a tie-breaker.

So ISTM that moving the code out of the "if (... == NULL)" should be
enough on its own.

It is and that's actually the patch that's been in my tester for the 
last week or so.  The other (non)issue wasn't necessary to fix the 
problem, just something that looked odd/wrong as I was preparing the 
email and I included it in the discussion.
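
To make the difference concrete, here is a minimal sketch of the two update policies. It is not the IRA code: modes are modelled as plain bit widths, Ref, narrower_p and both functions are hypothetical names, and it assumes a freshly created allocno starts with the pseudo's own mode as its WMODE.

```cpp
#include <cstddef>

// Hypothetical stand-in for one reference to a pseudo register.
struct Ref {
  int inner_bits;   // width of the pseudo's own mode
  int outer_bits;   // width of an enclosing SUBREG's mode, or 0 if none
};

// Models partial_subreg_p (outer, inner): true when OUTER is narrower.
bool narrower_p (int outer_bits, int inner_bits)
{
  return outer_bits < inner_bits;
}

// Creation-only policy: the widening check runs for the first reference only.
int wmode_creation_only (const Ref *refs, std::size_t n)
{
  if (n == 0)
    return 0;
  int wmode = refs[0].inner_bits;   // assumed initial WMODE
  if (refs[0].outer_bits && narrower_p (wmode, refs[0].outer_bits))
    wmode = refs[0].outer_bits;
  return wmode;
}

// Every-reference policy: the same check, hoisted out of the creation path.
int wmode_every_ref (const Ref *refs, std::size_t n)
{
  if (n == 0)
    return 0;
  int wmode = refs[0].inner_bits;   // assumed initial WMODE
  for (std::size_t i = 0; i < n; i++)
    if (refs[i].outer_bits && narrower_p (wmode, refs[i].outer_bits))
      wmode = refs[i].outer_bits;   // keeps the widest mode seen so far
  return wmode;
}

// A 32-bit pseudo first seen bare, then under 64-bit and 128-bit subregs.
const Ref example_refs[] = { {32, 0}, {32, 64}, {32, 128} };
```

With example_refs, the creation-only policy stays at 32 bits because the first reference is not a SUBREG, while the every-reference policy ends at 128 bits.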


Jeff

Re: [PATCH 3/3] c++/modules: Handle exposures of TU-local types in uninstantiated member templates

2025-02-07 Thread Nathaniel Shead

On Fri, Feb 07, 2025 at 08:14:23AM -0500, Jason Merrill wrote:
> On 1/31/25 8:46 AM, Nathaniel Shead wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > 
> > Happy to remove the custom inform for lambdas, but I felt that the
> > original message (which suggests that defining it within a class should
> > make it OK) was unhelpful here.
> > 
> > Similarly the 'is_exposure_of_member_type' function is not necessary to
> > fix the bug, and is just for slightly nicer diagnostics.
> > 
> > -- >8 --
> > 
> > Previously, 'is_tu_local_entity' wouldn't detect the exposure of the (in
> > practice) TU-local lambda in the following example, unless instantiated:
> > 
> >struct S {
> >  template 
> >  static inline decltype([]{}) x = {};
> >};
> > 
> > This is for two reasons.  Firstly, when traversing the TYPE_FIELDS of S
> > we only see the TEMPLATE_DECL, and never end up building a dependency on
> > its DECL_TEMPLATE_RESULT (due to not being instantiated).  This patch
> > fixes this by stripping any templates before checking for unnamed types.
> > 
> > The second reason is that we currently assume all class-scope entities
> > are not TU-local.  Despite this being unambiguous in the standard, this
> > is not actually true in our implementation just yet, due to issues with
> > mangling lambdas in some circumstances.  Allowing these lambdas to be
> > exported can cause issues in importers with apparently conflicting
> > declarations, so this patch treats them as TU-local as well.
> > 
> > After these changes, we now get double diagnostics from the two ways
> > that we can see the above lambda being exposed, via 'S' (through
> > TYPE_FIELDS) or via 'S::x'.  To workaround this we hide diagnostics from
> > the first case, so we only get errors from 'S::x' which will be closer
> > to the point the offending lambda is declared.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * module.cc (trees_out::type_node): Adjust assertion.
> > (depset::hash::is_tu_local_entity): Handle unnamed template
> > types, treat lambdas specially.
> > (is_exposure_of_member_type): New function.
> > (depset::hash::add_dependency): Use it.
> > (depset::hash::finalize_dependencies): Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/internal-10.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/module.cc   | 67 ++
> >   gcc/testsuite/g++.dg/modules/internal-10.C | 25 
> >   2 files changed, 81 insertions(+), 11 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/modules/internal-10.C
> > 
> > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > index c89834c1abd..59b7270f4a5 100644
> > --- a/gcc/cp/module.cc
> > +++ b/gcc/cp/module.cc
> > @@ -9261,7 +9261,9 @@ trees_out::type_node (tree type)
> > /* We'll have either visited this type or have newly discovered
> >that it's TU-local; either way we won't need to visit it again.  */
> > -   gcc_checking_assert (TREE_VISITED (type) || has_tu_local_dep (name));
> > +   gcc_checking_assert (TREE_VISITED (type)
> > +|| has_tu_local_dep (TYPE_NAME (type))
> > +|| has_tu_local_dep (TYPE_TI_TEMPLATE (type)));
> 
> Why doesn't the template having a TU-local dep imply that the TYPE_NAME
> does?
> 
> Jason
> 

I may not have written this the most clearly; the type doesn't
necessarily even have a template, but if it's not visited and its
TYPE_NAME hasn't had a TU-local dep made then we must instead have seen
a TYPE_TI_TEMPLATE that does have a TU-local dep.

Would you prefer me to write it like this?

  gcc_checking_assert (TREE_VISITED (type)
   || has_tu_local_dep (TYPE_NAME (type))
   || (TYPE_TEMPLATE_INFO (type)
   && TYPE_TI_TEMPLATE (type)
   && has_tu_local_dep (TYPE_TI_TEMPLATE (type))));

Alternatively, I suppose we could attempt to optimise the assertion to
only call has_tu_local_dep a maximum of one time like

  gcc_checking_assert (TREE_VISITED (type)
   || has_tu_local_dep ((TYPE_TEMPLATE_INFO (type)
 && TYPE_TI_TEMPLATE (type))
? TYPE_TI_TEMPLATE (type)
: TYPE_NAME (type)));

but I don't think this is worth it, has_tu_local_dep is basically just a
couple of conditions and a hash table lookup anyway.
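
For illustration only, the guarded form of the check can be modelled outside of GCC as below. Decl, TemplateInfo, Type, DepSet and assertion_holds are hypothetical stand-ins for the tree accessors and for has_tu_local_dep, not the module.cc implementation.

```cpp
#include <unordered_set>

struct Decl { int id; };

// Stand-in for TYPE_TEMPLATE_INFO: may be absent, and may carry no template.
struct TemplateInfo { const Decl *tmpl; };

struct Type {
  bool visited;                // stands in for TREE_VISITED
  const Decl *name;            // stands in for TYPE_NAME
  const TemplateInfo *tinfo;   // null for types without template info
};

// Set of declarations that already have a TU-local dependency recorded.
using DepSet = std::unordered_set<const Decl *>;

bool has_dep (const DepSet &deps, const Decl *d)
{
  return deps.count (d) != 0;
}

// Short-circuit evaluation means the template is only looked at when the
// template info and the template itself both exist.
bool assertion_holds (const Type &t, const DepSet &tu_local)
{
  return t.visited
         || has_dep (tu_local, t.name)
         || (t.tinfo && t.tinfo->tmpl && has_dep (tu_local, t.tinfo->tmpl));
}
```

The point of the sketch is only that a type without template info never reaches the third lookup, so the extra operand is safe for non-template types.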

Nathaniel

[PATCH 7/8] LoongArch: Implement vec_widen_mult_{even, odd}_* for LSX and LASX modes

Since PR116142 has been fixed, now we can add the standard names so the
compiler will generate better code if the result of a widening
product is reduced.

gcc/ChangeLog:

* config/loongarch/simd.md (even_odd): New define_int_attr.
(vec_widen_mult__): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/wide-mul-reduc-1.c: New test.
* gcc.target/loongarch/wide-mul-reduc-2.c: New test.
---
 gcc/config/loongarch/simd.md   | 16 
 .../gcc.target/loongarch/wide-mul-reduc-1.c| 18 ++
 .../gcc.target/loongarch/wide-mul-reduc-2.c| 17 +
 3 files changed, 51 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c

diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 1b78c754a12..a888c7090ce 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -624,6 +624,7 @@ (define_expand "cbranch4"
 ;; Operations on elements at even/odd indices.
 (define_int_iterator zero_one [0 1])
 (define_int_attr ev_od [(0 "ev") (1 "od")])
+(define_int_attr even_odd [(0 "even") (1 "odd")])
 
 ;; Picking even/odd elements.
 (define_insn "simd_pick_evod_"
@@ -687,6 +688,21 @@ (define_expand "_vw__"
   DONE;
 })
 
+(define_expand "vec_widen_mult__"
+  [(match_operand: 0 "register_operand" "=f")
+   (match_operand:IVEC   1 "register_operand" " f")
+   (match_operand:IVEC   2 "register_operand" " f")
+   (any_extend (const_int 0))
+   (const_int zero_one)]
+  ""
+{
+  emit_insn (
+gen__vmulw__ (operands[0],
+operands[1],
+operands[2]));
+  DONE;
+})
+
 (define_insn "simd_w_evod__hetero"
   [(set (match_operand: 0 "register_operand" "=f")
(addsubmul:
diff --git a/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c
new file mode 100644
index 000..d6e0da59dc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlasx -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump "WIDEN_MULT_EVEN_EXPR" "optimized" } } */
+/* { dg-final { scan-tree-dump "WIDEN_MULT_ODD_EXPR" "optimized" } } */
+
+typedef __INT32_TYPE__ i32;
+typedef __INT64_TYPE__ i64;
+
+i32 x[8], y[8];
+
+i64
+test (void)
+{
+  i64 ret = 0;
+  for (int i = 0; i < 8; i++)
+ret ^= (i64) x[i] * y[i];
+  return ret;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c
new file mode 100644
index 000..07a7601888a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlasx" } */
+/* { dg-final { scan-assembler "xvmaddw(ev|od)\\.d\\.w" } } */
+
+typedef __INT32_TYPE__ i32;
+typedef __INT64_TYPE__ i64;
+
+i32 x[8], y[8];
+
+i64
+test (void)
+{
+  i64 ret = 0;
+  for (int i = 0; i < 8; i++)
+ret += (i64) x[i] * y[i];
+  return ret;
+}
-- 
2.48.1
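
As a scalar illustration of why an even/odd split suits these reductions (a sketch only, not what the vectorizer emits; reduce_scalar and reduce_even_odd are hypothetical names): since the reduction operator is associative and commutative, the widened products at even and odd indices can be accumulated separately and combined at the end.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 8;

// Reference loop, as in wide-mul-reduc-2.c: widen each product, then add.
std::int64_t reduce_scalar (const std::int32_t *x, const std::int32_t *y)
{
  std::int64_t ret = 0;
  for (std::size_t i = 0; i < N; i++)
    ret += (std::int64_t) x[i] * y[i];
  return ret;
}

// Same reduction with the even-indexed and odd-indexed products kept in
// separate accumulators, mirroring the even/odd widening multiply split.
std::int64_t reduce_even_odd (const std::int32_t *x, const std::int32_t *y)
{
  std::int64_t even = 0, odd = 0;
  for (std::size_t i = 0; i < N; i += 2)
    {
      even += (std::int64_t) x[i] * y[i];
      odd += (std::int64_t) x[i + 1] * y[i + 1];
    }
  return even + odd;
}
```

Both functions return the same value for any input whose partial sums fit in 64 bits; the same reasoning applies to the ^= reduction in wide-mul-reduc-1.c.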

Re: [Patch] [gcn] mkoffload.cc: Automatically use gfx*-generic if -march has no multilib but generic has

Hi Andrew,

Andrew Stubbs wrote:
I think the correct place for this whole concept might be in the
MULTILIB_MATCHES configuration option, not in mkoffload.

In any case, mkoffload needs to know about this; if only the driver
('gcc') knows about it, it comes too late for the early debug file
writing. — And if only the compiler itself (lto1, cc1, f951) knows about
it, it comes too late for mkoffload, 'as' (llvm-mc) and the
collect/(l)ld run.

The mkoffload.cc ELF-writing issue is actually the reason we already
check whether the default version (gfx900) has been overridden at
compile time - and, hence, already include the required config/multilib
include files.

(Maybe not a compelling reason, but when invoking
amdgcn-amdhsa-{gcc,gfortran}, there is no need to disallow any version
as no library is linked.)

What's the motivation for adding the warning?

I don't like silently changing the specified -march=. But I also want to
make it easy for a user to find the option when compiling with, e.g.
-march=gfx1100 and only -march=gfx11-generic is available as
lib{c,m,gomp,gfortran}.

Thus, I thought having a warning would be useful: – it does by default
do the right thing but informs the user why the compiler did something
different.
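
Roughly, the intended selection logic could be sketched as follows. This is not the mkoffload.cc code: ArchChoice, generic_for and choose_arch are hypothetical names, and the mapping table is an illustrative subset containing only the pairs mentioned in this thread.

```cpp
#include <map>
#include <set>
#include <string>

struct ArchChoice {
  std::string arch;   // value to use for -march=
  bool warn;          // true when it differs from what the user specified
};

// Illustrative subset only: specific ISA -> corresponding generic ISA.
static const std::map<std::string, std::string> generic_for = {
  {"gfx1100", "gfx11-generic"},
  {"gfx942", "gfx9-4-generic"},
};

ArchChoice choose_arch (const std::string &requested,
                        const std::set<std::string> &multilibs)
{
  // A multilib for the requested ISA exists: use it unchanged, no warning.
  if (multilibs.count (requested))
    return {requested, false};

  // Otherwise fall back to the generic ISA if that multilib was built,
  // and tell the user why the flag was changed.
  auto it = generic_for.find (requested);
  if (it != generic_for.end () && multilibs.count (it->second))
    return {it->second, true};

  // Nothing suitable: keep the request so later stages diagnose it as before.
  return {requested, false};
}
```

For example, with multilibs {gfx906, gfx11-generic}, a request for gfx1100 yields gfx11-generic with the warning set, while gfx906 is passed through silently.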

I don't think any of the restrictions are so interesting for library code.
In theory there are some restricted instructions that might be used in
libm, perhaps, at some future time, but that's all. The register count
restrictions are not interesting at all, since that restricts
occupancy, not usage (which is already limited by the ABI).

For now, there is also the issue that only ROCm > 6.3.2 supports it, i.e.
recompiling with a new distro version would fail with an odd error while
it worked in the version before. Using a warning prevents all this.

And otherwise, it is not only library code – it is also hot offloading
code. Mixing a generic-code library with a specific-code runtime is not
permitted (rejected by lld). And in some cases, I could imagine that
some operations could matter.

But admittedly, the restrictions aren't that hard. For gfx115x, the fact
that the scalar ALU floating point instructions, and SGPRs for src1 in
data parallel processing (dpp) instructions, are not supported could
matter in theory, but I don't think that we would exploit this – and
there are other things to first optimize for.

For AI-style applications, the FP8/BF8/XF32 restrictions could matter
with gfx9-4-generic, but we don't support gfx94x yet and, again, we
should start with other types of optimizations first.

* * *

This business of changing the -march flag from what the user specified
is also questionable.

I concur – but it is the simplest way to permit a user to link the code,
point them to the existence of the new -march= flag, and avoid gotchas,
while also making clear why the flag was changed.

That's based on Richard's comment ...

For distributors it might be good to just ship -generic multilibs and
have all specific -march=gfxXYZ to map to their respective -generic
variant. That is, consider the configured multilibs when interpreting
-march=gfxXYZ which probably means always configuring the -generic
multilibs (and back to dependence on llvm19 and recent ROCm for the
runtime ...).

That said, I'm happy about -generic, and I hope it ends up in GCC 15
in some way.

... and trying to come up with something that solves this issue but
avoids surprises.

I use locally
--with-multilib-list=gfx906,gfx908,gfx90a,gfx90c,gfx1030,gfx1036,gfx1100,gfx1103,gfx9-generic,gfx11-generic
(i.e. no gfx900 and no gfx10-3-generic + none of the newly added ones.)

And when playing around for testing all the patches, it works rather
smoothly.

Tobias

Re: [PATCH 3/3] c++/modules: Handle exposures of TU-local types in uninstantiated member templates


On 1/31/25 8:46 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

Happy to remove the custom inform for lambdas, but I felt that the
original message (which suggests that defining it within a class should
make it OK) was unhelpful here.

Similarly the 'is_exposure_of_member_type' function is not necessary to
fix the bug, and is just for slightly nicer diagnostics.

-- >8 --

Previously, 'is_tu_local_entity' wouldn't detect the exposure of the (in
practice) TU-local lambda in the following example, unless instantiated:

   struct S {
 template 
 static inline decltype([]{}) x = {};
   };

This is for two reasons.  Firstly, when traversing the TYPE_FIELDS of S
we only see the TEMPLATE_DECL, and never end up building a dependency on
its DECL_TEMPLATE_RESULT (due to not being instantiated).  This patch
fixes this by stripping any templates before checking for unnamed types.

The second reason is that we currently assume all class-scope entities
are not TU-local.  Despite this being unambiguous in the standard, this
is not actually true in our implementation just yet, due to issues with
mangling lambdas in some circumstances.  Allowing these lambdas to be
exported can cause issues in importers with apparently conflicting
declarations, so this patch treats them as TU-local as well.

After these changes, we now get double diagnostics from the two ways
that we can see the above lambda being exposed, via 'S' (through
TYPE_FIELDS) or via 'S::x'.  To workaround this we hide diagnostics from
the first case, so we only get errors from 'S::x' which will be closer
to the point the offending lambda is declared.

gcc/cp/ChangeLog:

* module.cc (trees_out::type_node): Adjust assertion.
(depset::hash::is_tu_local_entity): Handle unnamed template
types, treat lambdas specially.
(is_exposure_of_member_type): New function.
(depset::hash::add_dependency): Use it.
(depset::hash::finalize_dependencies): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/internal-10.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc   | 67 ++
  gcc/testsuite/g++.dg/modules/internal-10.C | 25 
  2 files changed, 81 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/internal-10.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index c89834c1abd..59b7270f4a5 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -9261,7 +9261,9 @@ trees_out::type_node (tree type)
  
  	/* We'll have either visited this type or have newly discovered
   that it's TU-local; either way we won't need to visit it again.  */
-   gcc_checking_assert (TREE_VISITED (type) || has_tu_local_dep (name));
+   gcc_checking_assert (TREE_VISITED (type)
+|| has_tu_local_dep (TYPE_NAME (type))
+|| has_tu_local_dep (TYPE_TI_TEMPLATE (type)));


Why doesn't the template having a TU-local dep imply that the TYPE_NAME 
does?


Jason

Re: [PATCH] c++: Properly support null pointer constants in conditional operators [PR118282]


On 2/7/25 4:41 AM, Simon Martin wrote:

We've been rejecting the following valid code since GCC 4

=== cut here ===
struct A {
   explicit A (int);
   operator void* () const;
};
void foo (const A& x) {
   auto res = 0 ? x : 0;
}
int main () {
   A a{5};
   foo(a);
}
=== cut here ===

The problem is that for COND_EXPR, add_builtin_candidate has an early
return if the true and false values are not pointers that does not take
null pointer constants into account. This causes us to not find any valid
conversion, and to fail to compile.

This patch fixes the condition to also pass if the true/false values are
not pointers but null pointer constants, which resolves the PR.
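
The change in the early-return condition can be modelled in isolation as below. This is a sketch, not the call.cc code: Arm and the two functions are hypothetical, with the two flags standing in for TYPE_PTR_OR_PTRMEM_P and null_ptr_cst_p on each operand.

```cpp
// Hypothetical summary of one arm of the conditional expression.
struct Arm {
  bool is_pointer;        // stands in for TYPE_PTR_OR_PTRMEM_P (type)
  bool is_null_ptr_cst;   // stands in for null_ptr_cst_p (arg)
};

// Old check: give up unless both arms have pointer type.
constexpr bool old_early_return (Arm a, Arm b)
{
  return !a.is_pointer || !b.is_pointer;
}

// Patched check: an arm is also acceptable if it is a null pointer constant.
constexpr bool new_early_return (Arm a, Arm b)
{
  return !((a.is_pointer || a.is_null_ptr_cst)
           && (b.is_pointer || b.is_null_ptr_cst));
}

constexpr Arm void_ptr {true, false};       // e.g. the void* conversion result
constexpr Arm literal_zero {false, true};   // the literal 0
constexpr Arm zero_char {false, false};     // a char variable holding 0

// Pointer vs. literal 0: previously dropped, now kept as a candidate.
static_assert (old_early_return (void_ptr, literal_zero), "old check bails out");
static_assert (!new_early_return (void_ptr, literal_zero), "new check continues");

// Pointer vs. a zero-valued non-constant: still rejected by both checks.
static_assert (old_early_return (void_ptr, zero_char), "old check bails out");
static_assert (new_early_return (void_ptr, zero_char), "new check bails out");
```

The last two assertions correspond to the res_ko cases in the new test, where a zero-valued char is not a null pointer constant.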

Successfully tested on x86_64-pc-linux-gnu. Given this regression's age,
I don't think it makes much sense to fix it during stage 4 (let me know
if you disagree), so OK for GCC16?


This looks safe enough, OK for GCC 15.


PR c++/118282

gcc/cp/ChangeLog:

* call.cc (add_builtin_candidate): Also check for null_ptr_cst_p
operands.

gcc/testsuite/ChangeLog:

* g++.dg/conversion/op8.C: New test.

---
  gcc/cp/call.cc|  3 +-
  gcc/testsuite/g++.dg/conversion/op8.C | 75 +++
  2 files changed, 77 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/conversion/op8.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index c08bd0c8634..e440d58141b 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -3272,7 +3272,8 @@ add_builtin_candidate (struct z_candidate **candidates, enum tree_code code,
break;
  
/* Otherwise, the types should be pointers.  */
-  if (!TYPE_PTR_OR_PTRMEM_P (type1) || !TYPE_PTR_OR_PTRMEM_P (type2))
+  if (!((TYPE_PTR_OR_PTRMEM_P (type1) || null_ptr_cst_p (args[0]))
+   && (TYPE_PTR_OR_PTRMEM_P (type2) || null_ptr_cst_p (args[1]))))
return;
  
/* We don't check that the two types are the same; the logic

diff --git a/gcc/testsuite/g++.dg/conversion/op8.C b/gcc/testsuite/g++.dg/conversion/op8.C
new file mode 100644
index 000..eac958776c9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/op8.C
@@ -0,0 +1,75 @@
+// PR c++/118282
+// { dg-do "compile" }
+
+#if __cplusplus >= 201103L
+# include  // Only available from c++11 onwards.
+#endif
+
+struct A {
+  explicit A (int);
+  operator void* () const;
+};
+
+struct B {
+  explicit B (int);
+  operator char* () const;
+};
+
+struct C {
+  explicit C (int);
+  operator int () const;
+};
+
+struct BothWays {
+  BothWays (int);
+  operator void*() const;
+};
+
+extern bool my_bool;
+
+void foo (const A& a, const B& b, const C& c, const BothWays& d) {
+  void *res_a_1 = 0  ? 0 : a;
+  void *res_a_2 = 1  ? 0 : a;
+  void *res_a_3 = my_bool ? 0 : a;
+  void *res_a_4 = 0  ? a : 0;
+  void *res_a_5 = 1  ? a : 0;
+  void *res_a_6 = my_bool ? a : 0;
+
+  void *res_b_1 = 0  ? 0 : b;
+  void *res_b_2 = 1  ? 0 : b;
+  void *res_b_3 = my_bool ? 0 : b;
+  void *res_b_4 = 0  ? b : 0;
+  void *res_b_5 = 1  ? b : 0;
+  void *res_b_6 = my_bool ? b : 0;
+
+  //
+  // 0 valued constants that are NOT null pointer constants - this worked already.
+  //
+  char zero_char  = 0;
+  void *res_ko1  = 0 ? zero_char : a; // { dg-error "different types" }
+
+#if __cplusplus >= 201103L
+  // Those are only available starting with c++11.
+  int8_t zero_i8  = 0;
+  void *res_ko2  = 0 ? zero_i8   : a; // { dg-error "different types" "" { target c++11 }  }
+  uintptr_t zerop = 0;
+  void *res_ko3  = 0 ? zerop : a; // { dg-error "different types" "" { target c++11 }  }
+#endif
+
+  // Conversion to integer - this worked already.
+  int res_int= 0 ? 0 : c;
+
+  // Case where one arm is of class type that can be constructed from an
+  // integer and the other arm is a null pointer constant (inspired by
+  // g++.dg/template/cond5.C).
+  0 ? d : 0;
+  0 ? 0 : d;
+}
+
+int main(){
+  A a (5);
+  B b (42);
+  C c (43);
+  BothWays d (1982);
+  foo (a, b, c, d);
+}

Re: [Patch] [gcn] mkoffload.cc: Automatically use gfx*-generic if -march has no multilib but generic has


Hi Andrew,

Andrew Stubbs wrote:
I just realized that I'm assuming that -march=gfx1100 object files 
will link with -march=gfx11-generic libraries, and produce gfx1100 
binaries. Is this not the case?


Currently not: lld checks that the ELF ISA and flags (xnack, sramecc, …) 
are identical and, if not, it fails to link.



So, the recommended way to silence the warning would be to use the 
generic arch explicitly?


Yes – or, in special cases, to reconfigure + build GCC to have the 
specific ISA available.


I think the latter would be the case with gfx942 (MI300A/MI300X) vs. 
gfx9-4-generic when being compiled by an HPC center.


But for normal users, just specifying the generic instead (or ignoring 
the warning) would be the default choice.


* * *



I use locally
--with-multilib-list=gfx906,gfx908,gfx90a,gfx90c,gfx1030,gfx1036,gfx1100,gfx1103,gfx9-generic,gfx11-generic
(i.e. no gfx900 and no gfx10-3-generic + none of the newly added ones.)


And when playing around for testing all the patches, it works rather 
smoothly.


Except you have warnings...


... but I also know what happens.

We could also classify the warning under [-Wopenmp] or … and permit 
-Werror=… and -Wno-… for it. (I actually do not know whether '-w' and/or 
'-Werror' are honored or not, but I bet either not or only limited.)


Tobias

Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]





On 2/7/25 5:59 AM, Andrew Waterman wrote:

This patch runs counter to the ABI spec, which states that vxrm is not
preserved across calls and is volatile upon function entry [1].  vxrm
does not play the same role as frm plays in the calling convention.
(I won't get into the rationale in this email, but the rationale isn't
especially important: we should follow the ABI.)

[1] 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120

Pan's patch doesn't change the basic property that VXRM has no known 
state at function entry or upon return from a function call.


Jeff

Re: [PATCH] jit/118780 - make sure to include dlfcn.h when plugin support is disabled

2025-02-07 Thread Jakub Jelinek

On Fri, Feb 07, 2025 at 02:48:22PM +0100, Richard Biener wrote:
> The following makes the dlfcn.h explicitly requested which avoids
> build failure when JIT is enabled but plugin support disabled as
> currently the include is conditional on plugin support.
> 
> I've built GCC with JIT enabled and plugin support disabled as well
> as the other way around successfully with this patch.
> 
> OK for trunk and branches (after a while)?
> 
> Thanks,
> Richard.
> 
>   PR jit/118780
> gcc/
>   * system.h: Check INCLUDE_DLFCN_H for including dlfcn.h instead
>   of ENABLE_PLUGIN.
>   * plugin.cc: Define INCLUDE_DLFCN_H.
> 
> gcc/jit/
>   * jit-playback.cc: Define INCLUDE_DLFCN_H.
>   * jit-result.cc: Likewise.

Okay.

Jakub

Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-07 Thread Andrew Waterman

On Fri, Feb 7, 2025 at 5:51 AM Jeff Law  wrote:
>
>
>
> On 2/7/25 5:59 AM, Andrew Waterman wrote:
> > This patch runs counter to the ABI spec, which states that vxrm is not
> > preserved across calls and is volatile upon function entry [1].  vxrm
> > does not play the same role as frm plays in the calling convention.
> > (I won't get into the rationale in this email, but the rationale isn't
> > especially important: we should follow the ABI.)
> >
> > [1] 
> > https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120
> Pan's patch doesn't change the basic property that VXRM has no known
> state at function entry or upon return from a function call.

Ah, GCC-internal notion of global register versus the conventional
understanding of the term.  My mistake.

>
> Jeff
>
>

Re: [PATCH] c++: Properly support null pointer constants in conditional operators [PR118282]

2025-02-07 Thread Simon Martin

On 7 Feb 2025, at 14:17, Jason Merrill wrote:

> On 2/7/25 4:41 AM, Simon Martin wrote:
>> We've been rejecting the following valid code since GCC 4
>>
>> === cut here ===
>> struct A {
>>explicit A (int);
>>operator void* () const;
>> };
>> void foo (const A& x) {
>>auto res = 0 ? x : 0;
>> }
>> int main () {
>>A a{5};
>>foo(a);
>> }
>> === cut here ===
>>
>> The problem is that for COND_EXPR, add_builtin_candidate has an early
>> return if the true and false values are not pointers that does not take
>> null pointer constants into account. This causes us to not find any valid
>> conversion, and to fail to compile.
>>
>> This patch fixes the condition to also pass if the true/false values are
>> not pointers but null pointer constants, which resolves the PR.
>>
>> Successfully tested on x86_64-pc-linux-gnu. Given this regression's age,
>> I don't think it makes much sense to fix it during stage 4 (let me know
>> if you disagree), so OK for GCC16?
>
> This looks safe enough, OK for GCC 15.
Cool, thanks. Merged as r15-7416-g0b2f34ca19edf2.

Simon

Re: PING^2 [RFC] Prevent the scheduler from moving prefetch instructions when expanding __builtin_prefetch [PR 116713]