date:20250109

Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive

2025-01-09 Thread Richard Biener

On Wed, Jan 8, 2025 at 5:34 PM Qing Zhao  wrote:
>
>
>
> > On Jan 7, 2025, at 07:29, Richard Biener  wrote:
> >
> > On Mon, Jan 6, 2025 at 5:40 PM Qing Zhao  wrote:
> >>
> >>
> >>
> >>> On Jan 6, 2025, at 11:01, Richard Biener  
> >>> wrote:
> >>>
> >>> On Mon, Jan 6, 2025 at 3:43 PM Qing Zhao  wrote:
> 
> 
> 
> > On Jan 6, 2025, at 09:21, Jeff Law  wrote:
> >
> >
> >
> > On 1/6/25 7:11 AM, Qing Zhao wrote:
> >>>
> >>> Given it doesn't cause user visible UB, we could insert the trap 
> >>> *before* the UB inducing statement.  That would then make the 
> >>> statement unreachable and it'd get removed avoiding the false 
> >>> positive diagnostic.
> >> Yes, that’s a good idea.
> >> However, in order to distinguish user-visible UB from UB in the IL 
> >> that is introduced purely by the compiler, we might need some new marking 
> >> in the IR?
> > I don't think we've ever really tackled that question; the closest I 
> > can think of would be things like integer overflow which we try to 
> > avoid allowing the compiler to introduce.  If we take the integer 
> > overflow as the model, then that would say we should be tackling this 
> > during loop unrolling.
> 
>  UB introduced by compiler transformations is one important cause 
>  of false positive warnings.
> 
>  From my understanding, there are two approaches to tackle this problem:
> 
>  1. Avoid generating such UB in the first place, i.e., for every compiler 
>  transformation that might introduce such UB, add a check to 
>  avoid generating it.
> 
>  2. Mark the IR portions that were generated by compiler 
>  transformations, then check whether the UB is compiler-generated when 
>  issuing static checker warnings.
> 
>  Are there other approaches?
> >>>
> >>> Note unrolling doesn't introduce UB - it makes conditional UB
> >>> "obvious".
> >>
> >> So, you mean this is the same issue as PR109071 (and PR85788, PR88771, 
> >> etc.), i.e., the compiler optimization makes the conditional UB that’s 
> >> originally in the source code “obvious” after code duplication?
> >>
> >> (I need to study the test case in PR92539 more carefully to make sure 
> >> this is the case...)
> >>
> >> If so, then the claimed false positive warning in PR92539 is actually a 
> >> real bug in the original source code, and my patch that introduced the 
> >> new option “-fdiagnostics-details” should also handle loop unrolling to 
> >> provide more details on warnings exposed by loop unrolling.
> >>
> >>
> >>> Note -Warray-bounds wants to diagnose UB, so doing path isolation and
> >>> removing the UB would make -Warray-bounds useless.
> >>>
> >>> So unless the condition guarding the UB unrolling exposes is visibly
> >>> false to the compiler but we fail to exploit that (missed optimization)
> >>> there's not much that we can do.  I think "folding" away the UB like
> >>> what Jeff proposes trades false negatives for the false positive
> >>> diagnostics.
> >>>
> >>> Note the unroller knows UB that effectively bounds the number of
> >>> iterations, even on conditional paths and it uses this to limit the
> >>> number of copies _and_ to prune unreachable paths (exploiting UB,
> >>> avoiding diagnostics).  But one of the limitations is that it only
> >>> prunes paths in the last unrolled copy which can be insufficient (ISTR
> >>> some PR where I noticed this).
> >>>
> >>> That said - I think for these unroller exposed cases of apparent false
> >>> positives we should improve the path pruning in the unroller itself.
> >>> For the other cases the path diagnostic might help clarify that the UB
> >>> happens on the 'n-th' iteration of the loop when some additional
> >>> condition is true/false.
> >>
> >> So, the “other cases” refer to situations similar to PR109071, i.e., 
> >> “conditional UB” in the original source code is made obvious after loop 
> >> unrolling?
> >> Yes, for such cases, the new option I have been trying to add, 
> >> “-fdiagnostics-details”, should be able to track and provide more details on 
> >> the conditions that lead to the UB.
> >> Is this understanding correct?
> >
> > I think so, but I didn't look into the testcase of the referenced PR.
>
> I studied the test case of PR92539 in detail yesterday.  The following 
> is a brief summary:
>
> 1. The pass that causes the issue is cunrolli.
>  Adding -fdisable-tree-cunrolli eliminates the false positive warnings.
>
> 2. The IR before cunrolli:
>
> const char *local_iterator = beginning address of string "aa";
> const char *last = last address of string "aa";
>
> for (int i = 0; i < 3; ++i)
>   if (local_iterator != last)   // pointer comparison 1
>     {
>       local_iterator++;
>       if (local_iterator != last)   // pointer comparison 2
>         local_iterator++;
>     }
>
> I think that the IR has NO UB at t

Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive

2025-01-09 Thread Sam James

Richard Biener  writes:


[COMMITTED 1/2] ada: Cleanup preanalysis of static expressions (part 3)

2025-01-09 Thread Marc Poulhiès

From: Javier Miranda 

Avoid reporting spurious errors.

gcc/ada/ChangeLog:

* freeze.adb (Freeze_Expr_Types): Reverse patch; that is, restore
calls to Preanalyze_Spec_Expression instead of Preanalyze_And_Resolve
for the sake of consistency with Analyze_Expression_Function. Patch
suggested by Eric Botcazou.
* exp_put_image.adb (Image_Should_Call_Put_Image): Ensure that
function Defining_Identifier is called with a proper node to
avoid internal assertion failure.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_put_image.adb | 2 ++
 gcc/ada/freeze.adb| 8 +---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/exp_put_image.adb b/gcc/ada/exp_put_image.adb
index 38bde44ff5a..ae5fa40fa38 100644
--- a/gcc/ada/exp_put_image.adb
+++ b/gcc/ada/exp_put_image.adb
@@ -1190,6 +1190,8 @@ package body Exp_Put_Image is
  --  aspects, not just for Put_Image?
 
  if Is_Itype (U_Type)
+   and then Nkind (Associated_Node_For_Itype (U_Type)) in
+  N_Full_Type_Declaration | N_Subtype_Declaration
and then Has_Aspect (Defining_Identifier
   (Associated_Node_For_Itype (U_Type)),
 Aspect_Put_Image)
diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index 10f0de78d9d..54b620214e8 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -9389,14 +9389,16 @@ package body Freeze is
   --  pre/postconditions during expansion of the subprogram body, the
   --  subprogram is already installed.
 
+  --  Call Preanalyze_Spec_Expression instead of Preanalyze_And_Resolve
+  --  for the sake of consistency with Analyze_Expression_Function.
+
   if Def_Id /= Current_Scope then
  Push_Scope (Def_Id);
  Install_Formals (Def_Id);
-
- Preanalyze_And_Resolve (Dup_Expr, Typ);
+ Preanalyze_Spec_Expression (Dup_Expr, Typ);
  End_Scope;
   else
- Preanalyze_And_Resolve (Dup_Expr, Typ);
+ Preanalyze_Spec_Expression (Dup_Expr, Typ);
   end if;
 
   --  Restore certain attributes of Def_Id since the preanalysis may
-- 
2.43.0

[COMMITTED 2/2] ada: Error on Disable_Controlled aspect in Multiway_Trees

2025-01-09 Thread Marc Poulhiès

From: squirek 

This patch fixes an issue in the compiler whereby instantiating Multiway_Trees
with a formal type leads to a compile-time error because the expression
supplied for aspect Disable_Controlled, specified on types declared within
the body of Multiway_Trees, is not static.

gcc/ada/ChangeLog:

* libgnat/a-comutr.adb, libgnat/a-comutr.ads:
Move the declarations of iterator types into the specification and
add additional comments.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/a-comutr.adb | 49 
 gcc/ada/libgnat/a-comutr.ads | 39 
 2 files changed, 39 insertions(+), 49 deletions(-)

diff --git a/gcc/ada/libgnat/a-comutr.adb b/gcc/ada/libgnat/a-comutr.adb
index e866e2ff895..df37410 100644
--- a/gcc/ada/libgnat/a-comutr.adb
+++ b/gcc/ada/libgnat/a-comutr.adb
@@ -41,55 +41,6 @@ is
pragma Warnings (Off, "variable ""Lock*"" is not referenced");
--  See comment in Ada.Containers.Helpers
 
-   
-   --  Root_Iterator --
-   
-
-   type Root_Iterator is abstract new Limited_Controlled and
- Tree_Iterator_Interfaces.Forward_Iterator with
-   record
-  Container : Tree_Access;
-  Subtree   : Tree_Node_Access;
-   end record
- with Disable_Controlled => not T_Check;
-
-   overriding procedure Finalize (Object : in out Root_Iterator);
-
-   ---
-   --  Subtree_Iterator --
-   ---
-
-   --  ??? these headers are a bit odd, but for sure they do not substitute
-   --  for documenting things, what *is* a Subtree_Iterator?
-
-   type Subtree_Iterator is new Root_Iterator with null record;
-
-   overriding function First (Object : Subtree_Iterator) return Cursor;
-
-   overriding function Next
- (Object   : Subtree_Iterator;
-  Position : Cursor) return Cursor;
-
-   -
-   --  Child_Iterator --
-   -
-
-   type Child_Iterator is new Root_Iterator and
- Tree_Iterator_Interfaces.Reversible_Iterator with null record
-   with Disable_Controlled => not T_Check;
-
-   overriding function First (Object : Child_Iterator) return Cursor;
-
-   overriding function Next
- (Object   : Child_Iterator;
-  Position : Cursor) return Cursor;
-
-   overriding function Last (Object : Child_Iterator) return Cursor;
-
-   overriding function Previous
- (Object   : Child_Iterator;
-  Position : Cursor) return Cursor;
-
---
-- Local Subprograms --
---
diff --git a/gcc/ada/libgnat/a-comutr.ads b/gcc/ada/libgnat/a-comutr.ads
index b6d006fd626..adc2cad8e5e 100644
--- a/gcc/ada/libgnat/a-comutr.ads
+++ b/gcc/ada/libgnat/a-comutr.ads
@@ -491,6 +491,45 @@ private
 
for Reference_Type'Write use Write;
 
+   --  Base iterator type for shared functionality between Child_Iterator
+   --  and Subtree_Iterator - namely finalization.
+   type Root_Iterator is abstract new Limited_Controlled and
+ Tree_Iterator_Interfaces.Forward_Iterator with
+   record
+  Container : Tree_Access;
+  Subtree   : Tree_Node_Access;
+   end record
+ with Disable_Controlled => not T_Check;
+
+   overriding procedure Finalize (Object : in out Root_Iterator);
+
+   --  Iterator to handle traversal within a specific subtree.
+   type Subtree_Iterator is new Root_Iterator with null record;
+
+   overriding function First (Object : Subtree_Iterator) return Cursor;
+
+   overriding function Next
+ (Object   : Subtree_Iterator;
+  Position : Cursor) return Cursor;
+
+   --  Iterator to handle bidirectional traversal of a node's immediate
+   --  children for operations like reverse enumeration and selective
+   --  insertion.
+   type Child_Iterator is new Root_Iterator and
+ Tree_Iterator_Interfaces.Reversible_Iterator with null record
+   with Disable_Controlled => not T_Check;
+
+   overriding function First (Object : Child_Iterator) return Cursor;
+
+   overriding function Next
+ (Object   : Child_Iterator;
+  Position : Cursor) return Cursor;
+
+   overriding function Last (Object : Child_Iterator) return Cursor;
+
+   overriding function Previous
+ (Object   : Child_Iterator;
+  Position : Cursor) return Cursor;
--  See Ada.Containers.Vectors for documentation on the following
 
function Pseudo_Reference
-- 
2.43.0

Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive

2025-01-09 Thread Richard Biener

On Thu, Jan 9, 2025 at 9:08 AM Richard Biener
 wrote:

[PATCH] [ifcombine] reuse left-hand mask to decode right-hand xor operand

2025-01-09 Thread Alexandre Oliva



If fold_truth_andor_for_ifcombine applies a mask to an xor, say
because the result of the xor is compared with a power of two [minus
one], we have to apply the same mask when processing both the left-
and right-hand xor paths for the transformation to be sound.  Arrange
for decode_field_reference to propagate the incoming mask along with
the expression to the right-hand operand.

Don't require the right-hand xor operand to be a constant; that was a
cut&pasto.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* gimple-fold.cc (decode_field_reference): Add xor_pand_mask.
Propagate pand_mask to the right-hand xor operand.  Don't
require the right-hand xor operand to be a constant.
(fold_truth_andor_for_ifcombine): Pass right-hand mask when
appropriate.
---
 gcc/gimple-fold.cc |   23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index d95f04213ee40..0ad92de3a218f 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -7519,8 +7519,9 @@ gimple_binop_def_p (enum tree_code code, tree t, tree op[2])
 
*XOR_P is to be FALSE if EXP might be a XOR used in a compare, in which
case, if XOR_CMP_OP is a zero constant, it will be overridden with *PEXP,
-   *XOR_P will be set to TRUE, and the left-hand operand of the XOR will be
-   decoded.  If *XOR_P is TRUE, XOR_CMP_OP is supposed to be NULL, and then the
+   *XOR_P will be set to TRUE, *XOR_PAND_MASK will be copied from *PAND_MASK,
+   and the left-hand operand of the XOR will be decoded.  If *XOR_P is TRUE,
+   XOR_CMP_OP and XOR_PAND_MASK are supposed to be NULL, and then the
right-hand operand of the XOR will be decoded.
 
*LOAD is set to the load stmt of the innermost reference, if any,
@@ -7537,7 +7538,7 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
HOST_WIDE_INT *pbitpos,
bool *punsignedp, bool *preversep, bool *pvolatilep,
wide_int *pand_mask, bool *psignbit,
-   bool *xor_p, tree *xor_cmp_op,
+   bool *xor_p, tree *xor_cmp_op, wide_int *xor_pand_mask,
gimple **load, location_t loc[4])
 {
   tree exp = *pexp;
@@ -7599,15 +7600,14 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
 and_mask = *pand_mask;
 
   /* Turn (a ^ b) [!]= 0 into a [!]= b.  */
-  if (xor_p && gimple_binop_def_p (BIT_XOR_EXPR, exp, res_ops)
-  && uniform_integer_cst_p (res_ops[1]))
+  if (xor_p && gimple_binop_def_p (BIT_XOR_EXPR, exp, res_ops))
 {
   /* No location recorded for this one, it's entirely subsumed by the
 compare.  */
   if (*xor_p)
{
  exp = res_ops[1];
- gcc_checking_assert (!xor_cmp_op);
+ gcc_checking_assert (!xor_cmp_op && !xor_pand_mask);
}
   else if (!xor_cmp_op)
/* Not much we can do when xor appears in the right-hand compare
@@ -7618,6 +7618,7 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
  *xor_p = true;
  exp = res_ops[0];
  *xor_cmp_op = *pexp;
+ *xor_pand_mask = *pand_mask;
}
 }
 
@@ -8152,19 +8153,21 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
   bool l_xor = false, r_xor = false;
   ll_inner = decode_field_reference (&ll_arg, &ll_bitsize, &ll_bitpos,
 &ll_unsignedp, &ll_reversep, &volatilep,
-&ll_and_mask, &ll_signbit, &l_xor, &lr_arg,
+&ll_and_mask, &ll_signbit,
+&l_xor, &lr_arg, &lr_and_mask,
 &ll_load, ll_loc);
   lr_inner = decode_field_reference (&lr_arg, &lr_bitsize, &lr_bitpos,
 &lr_unsignedp, &lr_reversep, &volatilep,
-&lr_and_mask, &lr_signbit, &l_xor, 0,
+&lr_and_mask, &lr_signbit, &l_xor, 0, 0,
 &lr_load, lr_loc);
   rl_inner = decode_field_reference (&rl_arg, &rl_bitsize, &rl_bitpos,
 &rl_unsignedp, &rl_reversep, &volatilep,
-&rl_and_mask, &rl_signbit, &r_xor, &rr_arg,
+&rl_and_mask, &rl_signbit,
+&r_xor, &rr_arg, &rr_and_mask,
 &rl_load, rl_loc);
   rr_inner = decode_field_reference (&rr_arg, &rr_bitsize, &rr_bitpos,
 &rr_unsignedp, &rr_reversep, &volatilep,
-&rr_and_mask, &rr_signbit, &r_xor, 0,
+&rr_and_mask, &rr_signbit, &r_xor, 0, 0,
 &rr_load, rr_loc);
 
   /* It must be true that the inner operation on the

Re: [PATCH] s390: Add testcase for just fixed PR118362

2025-01-09 Thread Stefan Schulze Frielinghaus

On Thu, Jan 09, 2025 at 07:21:53PM +0100, Jakub Jelinek wrote:
> On Thu, Jan 09, 2025 at 01:29:27PM +0100, Stefan Schulze Frielinghaus wrote:
> > Optimization s390_constant_via_vgbm_p() should only apply to constant
> > vectors which can be expressed by the hardware, i.e., which have a size
> > of at most 16 bytes, similar to what is done for s390_constant_via_vgm_p()
> > and s390_constant_via_vrepi_p().
> > 
> > gcc/ChangeLog:
> > 
> > PR target/118362
> > * config/s390/s390.cc (s390_constant_via_vgbm_p): Allow at most
> > 16-byte vectors.
> > ---
> >  Bootstrap and regtest are still running.  If both are successful, I
> >  will push this one promptly.
> 
> This was committed without a testcase; adding one IMHO shouldn't hurt.
> 
> Ok for trunk?

Ok.

Thanks,
Stefan

Re: [PATCH] c++: Ignore default arguments for friend functions that cannot have any [PR118319]

2025-01-09 Thread Jason Merrill


On 1/9/25 8:25 AM, Simon Martin wrote:

We segfault upon the following invalid code:

=== cut here ===
template <int> struct S {
   friend void foo (int a = []{}());
};
void foo (int a) {}
int main () {
   S<0> t;
   foo ();
}
=== cut here ===

The problem is that we end up with a LAMBDA_EXPR callee in
set_flags_from_callee, and dereference its NULL_TREE
TREE_TYPE (TREE_TYPE ( )).

This patch simply sets the default argument to error_mark_node for
friend functions that do not meet the requirement in C++17 11.3.6/4.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/118319

gcc/cp/ChangeLog:

* decl.cc (grokfndecl): Inspect all friend function parameters,
and set them to error_mark_node if invalid.

gcc/testsuite/ChangeLog:

* g++.dg/parse/defarg18.C: New test.

---
  gcc/cp/decl.cc| 13 +---
  gcc/testsuite/g++.dg/parse/defarg18.C | 48 +++
  2 files changed, 57 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/parse/defarg18.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 503ecd9387e..b2761c23d3e 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -11134,14 +11134,19 @@ grokfndecl (tree ctype,
   expression, that declaration shall be a definition..."  */
if (friendp && !funcdef_flag)
  {
+  bool has_permerrored = false;
for (tree t = FUNCTION_FIRST_USER_PARMTYPE (decl);
   t && t != void_list_node; t = TREE_CHAIN (t))
if (TREE_PURPOSE (t))
  {
-   permerror (DECL_SOURCE_LOCATION (decl),
-  "friend declaration of %qD specifies default "
-  "arguments and isn%'t a definition", decl);
-   break;
+   if (!has_permerrored)
+ {
+   has_permerrored = true;
+   permerror (DECL_SOURCE_LOCATION (decl),
+  "friend declaration of %qD specifies default "
+  "arguments and isn%'t a definition", decl);
+ }
+   TREE_PURPOSE (t) = error_mark_node;


If we're going to unconditionally change TREE_PURPOSE, then the permerror 
needs to be strengthened to an error.  But I'd think we could leave the 
current state in a non-template class, only changing the template case.


Jason

[PATCH] Refactor ix86_expand_vecop_qihi2.

2025-01-09 Thread liuhongt

Since using vpermq causes a regression and that path is already manually
disabled by !TARGET_AVX512BW, I removed the code related to vpermq and made
ix86_expand_vecop_qihi2 handle only the vpmovbw + op + vpmovwb case.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Refactor to avoid redundant TARGET_AVX512BW in many places.
---
 gcc/config/i386/i386-expand.cc | 39 +-
 1 file changed, 5 insertions(+), 34 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 2ab57874234..da030832bba 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -24864,11 +24864,9 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
  generic permutation to merge the data back into the right place.  This
  permutation results in VPERMQ, which is slow, so better fall back to
  ix86_expand_vecop_qihi.  */
-  if (!TARGET_AVX512BW)
-return false;
-
-  if ((qimode == V16QImode && !TARGET_AVX2)
-  || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
+  if (!TARGET_AVX512BW
+  || (qimode == V16QImode && !TARGET_AVX512VL)
+  || (qimode == V32QImode && !TARGET_EVEX512)
   /* There are no V64HImode instructions.  */
   || qimode == V64QImode)
  return false;
@@ -24883,8 +24881,7 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
 {
 case E_V16QImode:
   himode = V16HImode;
-  if (TARGET_AVX512VL && TARGET_AVX512BW)
-   gen_truncate = gen_truncv16hiv16qi2;
+  gen_truncate = gen_truncv16hiv16qi2;
   break;
 case E_V32QImode:
   himode = V32HImode;
@@ -24926,33 +24923,7 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
 hdest = expand_simple_binop (himode, code, hop1, hop2,
 NULL_RTX, 1, OPTAB_DIRECT);
 
-  if (gen_truncate)
-emit_insn (gen_truncate (dest, hdest));
-  else
-{
-  struct expand_vec_perm_d d;
-  rtx wqdest = gen_reg_rtx (wqimode);
-  rtx wqres = gen_lowpart (wqimode, hdest);
-  bool ok;
-  int i;
-
-  /* Merge the data back into the right place.  */
-  d.target = wqdest;
-  d.op0 = d.op1 = wqres;
-  d.vmode = wqimode;
-  d.nelt = GET_MODE_NUNITS (wqimode);
-  d.one_operand_p = false;
-  d.testing_p = false;
-
-  for (i = 0; i < d.nelt; ++i)
-   d.perm[i] = i * 2;
-
-  ok = ix86_expand_vec_perm_const_1 (&d);
-  gcc_assert (ok);
-
-  emit_move_insn (dest, gen_lowpart (qimode, wqdest));
-}
-
+  emit_insn (gen_truncate (dest, hdest));
   return true;
 }
 
-- 
2.34.1

Re: [pushed] [PATCH v1] LoongArch: Generate the final immediate for lu12i.w, lu32i.d and lu52i.d

2025-01-09 Thread Lulu Cheng


Pushed to r15-6755.

On 2025/1/6 4:16 PM, mengqinggang wrote:

Generate 0x1010 instead of 0x101>>12 for lu12i.w. lu32i.d and lu52i.d are
handled the same way.

gcc/ChangeLog:

* config/loongarch/lasx.md: Use new loongarch_output_move.
* config/loongarch/loongarch-protos.h (loongarch_output_move):
Change parameters from (rtx, rtx) to (rtx *).
* config/loongarch/loongarch.cc (loongarch_output_move):
Generate final immediate for lu12i.w and lu52i.d.
* config/loongarch/loongarch.md:
Generate final immediate for lu32i.d and lu52i.d.
* config/loongarch/lsx.md: Use new loongarch_output_move.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/imm-load.c: New tests for lu12i.w, lu32i.d
and lu52i.d.
---
  gcc/config/loongarch/lasx.md  |  2 +-
  gcc/config/loongarch/loongarch-protos.h   |  2 +-
  gcc/config/loongarch/loongarch.cc | 14 ++--
  gcc/config/loongarch/loongarch.md | 34 ---
  gcc/config/loongarch/lsx.md   |  2 +-
  gcc/testsuite/gcc.target/loongarch/imm-load.c |  3 ++
  6 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index edaf64eeb95..a37c85a25a4 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -723,7 +723,7 @@ (define_insn "mov_lasx"
[(set (match_operand:LASX 0 "nonimmediate_operand" "=f,f,R,*r,*f")
(match_operand:LASX 1 "move_operand" "fYGYI,R,f,*f,*r"))]
"ISA_HAS_LASX"
-  { return loongarch_output_move (operands[0], operands[1]); }
+  { return loongarch_output_move (operands); }
[(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert")
 (set_attr "mode" "")
 (set_attr "length" "8,4,4,4,4")])
diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index fb544ad75ca..6601f767dab 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -86,7 +86,7 @@ extern void loongarch_split_move (rtx, rtx);
  extern bool loongarch_addu16i_imm12_operand_p (HOST_WIDE_INT, machine_mode);
  extern void loongarch_split_plus_constant (rtx *, machine_mode);
  extern void loongarch_split_vector_move (rtx, rtx);
-extern const char *loongarch_output_move (rtx, rtx);
+extern const char *loongarch_output_move (rtx *);
  #ifdef RTX_CODE
  extern void loongarch_expand_scc (rtx *);
  extern void loongarch_expand_vec_cmp (rtx *);
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 89237c377e7..f26c1346acc 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4721,8 +4721,10 @@ loongarch_split_vector_move (rtx dest, rtx src)
 that SRC is operand 1 and DEST is operand 0.  */
  
  const char *
-loongarch_output_move (rtx dest, rtx src)
+loongarch_output_move (rtx *operands)
  {
+  rtx src = operands[1];
+  rtx dest = operands[0];
enum rtx_code dest_code = GET_CODE (dest);
enum rtx_code src_code = GET_CODE (src);
machine_mode mode = GET_MODE (dest);
@@ -4875,13 +4877,19 @@ loongarch_output_move (rtx dest, rtx src)
if (src_code == CONST_INT)
{
  if (LU12I_INT (src))
-   return "lu12i.w\t%0,%1>>12\t\t\t# %X1";
+   {
+ operands[1] = GEN_INT (INTVAL (operands[1]) >> 12);
+ return "lu12i.w\t%0,%1\t\t\t# %X1";
+   }
  else if (IMM12_INT (src))
return "addi.w\t%0,$r0,%1\t\t\t# %X1";
  else if (IMM12_INT_UNSIGNED (src))
return "ori\t%0,$r0,%1\t\t\t# %X1";
  else if (LU52I_INT (src))
-   return "lu52i.d\t%0,$r0,%X1>>52\t\t\t# %1";
+   {
+ operands[1] = GEN_INT (INTVAL (operands[1]) >> 52);
+ return "lu52i.d\t%0,$r0,%X1\t\t\t# %1";
+   }
  else
gcc_unreachable ();
}
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 3eff4077160..59f45770311 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2209,7 +2209,7 @@ (define_insn_and_split "*movdi_32bit"
"!TARGET_64BIT
 && (register_operand (operands[0], DImode)
 || reg_or_0_operand (operands[1], DImode))"
-  { return loongarch_output_move (operands[0], operands[1]); }
+  { return loongarch_output_move (operands); }
"CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
(operands[0]))"
[(const_int 0)]
@@ -2228,7 +2228,9 @@ (define_insn_and_split "*movdi_64bit"
"TARGET_64BIT
 && (register_operand (operands[0], DImode)
 || reg_or_0_operand (operands[1], DImode))"
-  { return loongarch_output_move (operands[0], operands[1]); }
+  {
+return loongarch_output_move (operands);
+  }
"CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
(operands[0]))"
[(const_int 0)]
@@ -2315,7 +2

Re: [PATCH V4 0/2] RISC-V: Add intrinsics support and testcases for SiFive Xsfvcp extension.

2025-01-09 Thread Kito Cheng

Could you rebase and send the patch set again? I can't apply the patch set:

[kitoc@hsinchu18 gcc]$ git am /tmp/git-pw8sm7zbop/RISC-V-Add-intrinsics-support-and-testcases-for-SiFive-Xsfvcp-extension..patch
Applying: RISC-V: Add intrinsics support for SiFive Xsfvcp extensions.
error: patch failed: gcc/config/riscv/riscv-vector-builtins-types.def:369
error: gcc/config/riscv/riscv-vector-builtins-types.def: patch does not apply
error: patch failed: gcc/config/riscv/riscv-vector-builtins.cc:3600
error: gcc/config/riscv/riscv-vector-builtins.cc: patch does not apply
error: patch failed: gcc/config/riscv/riscv-vector-builtins.def:729
error: gcc/config/riscv/riscv-vector-builtins.def: patch does not apply
error: patch failed: gcc/config/riscv/riscv-vector-builtins.h:297
error: gcc/config/riscv/riscv-vector-builtins.h: patch does not apply
error: patch failed: gcc/config/riscv/vector-iterators.md:4814
error: gcc/config/riscv/vector-iterators.md: patch does not apply
error: patch failed: gcc/config/riscv/vector.md:56
error: gcc/config/riscv/vector.md: patch does not apply
Patch failed at 0001 RISC-V: Add intrinsics support for SiFive Xsfvcp extensions.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
[kitoc@hsinchu18 gcc]$

On Wed, Jan 8, 2025 at 5:04 PM  wrote:
>
> From: yulong 
>
> This patch adds GCC support for the SiFive vendor extension Xsfvcp [1],
> which provides a flexible mechanism to extend application processors
> with custom coprocessors and variable-latency arithmetic units,
> including the corresponding intrinsics.
>
> [1] 
> https://www.sifive.com/document-file/sifive-vector-coprocessor-interface-vcix-software
>
> Co-Authored by: Jiawei Chen 
> Co-Authored by: Shihua Liao 
> Co-Authored by: Yixuan Chen 
>
> Diff with V3: Add a new RTL mode and the sifive_vector.h file, and change
> the testcase include file.
>
> yulong (2):
>   RISC-V: Add intrinsics support for SiFive Xsfvcp extensions.
>   RISC-V: Add intrinsics testcases for SiFive Xsfvcp extensions.
>
>  gcc/config.gcc|   2 +-
>  gcc/config/riscv/constraints.md   |  10 +
>  gcc/config/riscv/generic-vector-ooo.md|   4 +
>  gcc/config/riscv/genrvv-type-indexer.cc   |   9 +
>  gcc/config/riscv/riscv-c.cc   |   3 +-
>  .../riscv/riscv-vector-builtins-shapes.cc |  48 +
>  .../riscv/riscv-vector-builtins-shapes.h  |   2 +
>  .../riscv/riscv-vector-builtins-types.def |  40 +
>  gcc/config/riscv/riscv-vector-builtins.cc | 362 +++-
>  gcc/config/riscv/riscv-vector-builtins.def|  30 +-
>  gcc/config/riscv/riscv-vector-builtins.h  |   8 +
>  gcc/config/riscv/riscv.md |   5 +-
>  .../riscv/sifive-vector-builtins-bases.cc |  78 ++
>  .../riscv/sifive-vector-builtins-bases.h  |   3 +
>  .../sifive-vector-builtins-functions.def  |  45 +
>  gcc/config/riscv/sifive-vector.md | 871 ++
>  gcc/config/riscv/sifive_vector.h  |  47 +
>  gcc/config/riscv/vector-iterators.md  |  48 +
>  gcc/config/riscv/vector.md|   3 +-
>  .../gcc.target/riscv/rvv/xsfvector/sf_vc_f.c  |  88 ++
>  .../gcc.target/riscv/rvv/xsfvector/sf_vc_i.c  | 132 +++
>  .../gcc.target/riscv/rvv/xsfvector/sf_vc_v.c  | 107 +++
>  .../gcc.target/riscv/rvv/xsfvector/sf_vc_x.c  | 138 +++
>  23 files changed, 2074 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/config/riscv/sifive_vector.h
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_f.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_i.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_v.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_x.c
>
> --
> 2.34.1
>

RE: [PATCH] COBOL 3/8 gen: GENERIC interface

2025-01-09 Thread Richard Biener

On Thu, 9 Jan 2025, Robert Dubner wrote:

> I am going to trim back some of the older stuff.
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, January 7, 2025 08:32
> > To: Robert Dubner 
> > Cc: jklow...@symas.com; Joseph Myers ; gcc-
> > patc...@gcc.gnu.org
> > Subject: RE: [PATCH] COBOL 3/8 gen: GENERIC interface
> >
> > On Mon, 23 Dec 2024, Robert Dubner wrote:
> >
> > > Richard, a bunch of things you address are in my bailiwick.
> > >
> > > When Jim and I set out to create a COBOL front end, I knew *NOTHING*
> > > about, well, anything vis-à-vis GCC.  I barely knew how it worked.
> >
> > I guess that's expected - we always hope people doing new frontends have
> > spare time left to fill gaps in documentation with knowledge they gained
> > ;)  Or maybe write a blog post about how to do a new GCC frontend (there
> > are multiple such for backends).  But I know time is scarce.
> >
> 
> I don't think it's just time.  There are so many layers.  By analogy: I am
> imagining a simple machine that levitates a steel ball bearing with an
> electromagnet above and an optical sensor: sensor sees the bearing is too
> low, so more current is sent to the coil, which raises the bearing and so
> on.  Somebody wants to know how to build such a thing.  It's simple.  You
> just need to know how to build a sensor, and how to wind an electromagnet,
> and build an amplifier, and you need to know physics and feedback control
> theory, which means you need to know calculus, which means you need to
> know trigonometry, which means you need to know algebra.  I could
> publish "Popular Mechanics" plans for such a gadget that a kid could
> build, but they wouldn't know how to do it themselves.
> 
> The front end seems to be a lot like that.  It took me weeks to figure out
> the relationship between tree.h and tree.def and GENERIC tags and the
> build_ routines and...  After I built my GENERIC dumper, I spent many days
> drawing the directed cyclic graphs of functions, starting with "void
> foo(void){}", to figure out what the middle end expected from me.  And I
> built up from there.  I have routines that do the hard work, so much so
> that I rarely work with individual GENERIC tags any more.  (I personally
> call individual trees "tags" when they are in isolation, and "nodes" when
> they are part of a tree, because otherwise the word "tree" gets so
> overworked it becomes meaningless).  I have macro-like routines that I
> have created to do the work.  I suspect every front end does, too.
> 
> And the thought of trying to document that in a way that's more meaningful
> than a do-it-yourself "Popular Mechanics" project plan "GCC Front Ends For
> Dummies" is exhausting.  How far down do you go?  ("First, find a deposit
> of iron ore, and a seam of coal.  Then, use a pile of rocks to build a
> forge...")
> 
> I'll give it some thought, though.  I would have found a "hello, world"
> front end incredibly useful.
> 
> > > COBOL sections and paragraphs (a section is a group of paragraphs) are
> > > conceptually similar to C functions. Given a paragraph named FOO, you
> > > can PERFORM FOO and the group of sentences (yes, a paragraph is made
> > > up of sentences; I remind you that COBOL was originally designed to be
> > > readable by non-programmers) are executed and control then returns to
> > > the statement after the PERFORM.
> > >
> > > For various reasons, execution into, through, and back from sections
> > > and paragraphs must be implemented with GOTO statements, and cannot be
> > > implemented with calls.
> >
> > Uh, that's awkward (if only for the fact that the big functions you
> > end up with will be slow to compile).
> 
> I am not sure I was clear.  Those GOTO statements are implemented in the
> run-time executable, so the executable contents of a paragraph are laid
> down only once in the generated executable.  I am not jumping all over
> creation in the front end.  (That's a horrid thought, isn't it?)
> 
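An illustrative C sketch of the PERFORM-via-GOTO idea described above (not gcobol's implementation, which works at the GENERIC level; all names here are hypothetical): the paragraph body is emitted exactly once, and each PERFORM records where to resume and jumps into it. It uses GCC's labels-as-values extension (`&&label`, `goto *ptr`), so it needs GCC or a compatible compiler.

```c
#include <assert.h>

#define STACK_DEPTH 16

/* Run paragraph FOO twice via two PERFORMs.  FOO's code appears once; each
   PERFORM pushes the address of the label to resume at and jumps to FOO,
   and FOO's exit jumps back through the saved address.  */
static int
run_perform_demo (void)
{
  void *return_stack[STACK_DEPTH];
  int sp = 0;
  int counter = 0;

  /* PERFORM FOO (first time).  */
  assert (sp < STACK_DEPTH);
  return_stack[sp++] = &&after_first;
  goto foo;
after_first:

  /* PERFORM FOO (second time).  */
  assert (sp < STACK_DEPTH);
  return_stack[sp++] = &&after_second;
  goto foo;
after_second:
  return counter;

  /* Paragraph FOO: reached only through PERFORM.  */
foo:
  counter += 1;
  assert (sp > 0);
  goto *return_stack[--sp];
}
```

Using a stack of resume addresses rather than a single slot lets one PERFORMed paragraph PERFORM another.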
> >
> > > I nonetheless attempted, at one point, to implement PERFORM via calls,
> > > and the ".global" you noticed is a vestige of that effort.  The
> > > routine it was used in has a boolean variable 'global' that defaults
> > > to false, and the routine was never called with that parameter set to
> > > true.  It is unnecessary and has been deleted.
> > >
> > > gg_insert_into_assembler() does indeed use ASM_EXPR. I sometimes use
> > > it to generate #-delimited comments into the generated assembly
> > > language so that I can see what's going on.
> > >
> > > But I also use it to generate labeled locations in the executables.
> > >
> > > I am also developing a GDB-COBOL version of GDB, one that understands
> > > the executables GCOBOL is generating.  We need the GDB NEXT
> > > instruction to execute through a PROC that is the subject of a PERFORM
> > > PROC statement.
> > > And I need to be able to set a breakpoint with "(gdb)break PROC".
> > >
> > > I have not yet figured out how to use GC

[PATCH] c/c++: UX improvements to 'too {few, many} arguments' errors (v3) [PR118112]

2025-01-09 Thread David Malcolm

On Thu, 2025-01-09 at 14:21 -0500, Jason Merrill wrote:

Thanks for taking a look...

> > On 1/9/25 2:11 PM, David Malcolm wrote:
> > 
> > @@ -4743,7 +4769,38 @@ convert_arguments (tree typelist, vec > va_gc> **values, tree fndecl,
> > if (typetail && typetail != void_list_node)
> > {
> >   if (complain & tf_error)
> > -   error_args_num (input_location, fndecl, /*too_many_p=*/false);
> > +   {
> > + /* Not enough args.
> > +Determine minimum number of arguments required.  */
> > + int min_expected_num = 0;
> > + bool at_least_p = false;
> > + tree iter = typelist;
> > + while (true)
> > +   {
> > + if (!iter)
> > +   {
> > + /* Variadic arguments; stop iterating.  */
> > + at_least_p = true;
> > + break;
> > +   }
> > + if (iter == void_list_node)
> > +   /* End of arguments; stop iterating.  */
> > +   break;
> > + if (fndecl && TREE_PURPOSE (iter)
> > + && TREE_CODE (TREE_PURPOSE (iter)) != DEFERRED_PARSE)
> > 
> 
> Why are you checking DEFERRED_PARSE?  That indicates a default argument,
> even if it isn't parsed yet.  For that case we should get the error in
> convert_default_arg rather than pretend there's no default argument.

I confess that the check for DEFERRED_PARSE was a rather mindless copy
and paste by me from the "See if there are default arguments that can be
used" logic earlier in the function.

I've removed it in the latest version of the patch.

> > +   {
> > + /* Found a default argument; skip this one when
> > +counting minimum required.  */
> > + at_least_p = true;
> > + iter = TREE_CHAIN (iter);
> > + continue;
> 
> We could break here; once you have a default arg, the rest of the
> parms need to have them as well.

Indeed; I've updated this in the latest version of the patch, so
we break out as soon as we see an arg with a non-null TREE_PURPOSE.

> 
> > +   }
> > + ++min_expected_num;
> > + iter = TREE_CHAIN (iter);
> > +   }
> > + error_args_num (input_location, fndecl,
> > + min_expected_num, actual_num, at_least_p);
> > +   }
> >   return -1;
> > }
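Stripped of GCC's tree types, the counting loop above (with the early break on the first default argument suggested in review) amounts to the following C sketch. The `param` array is a hypothetical stand-in for the TREE_LIST typelist: `has_default` models a non-null TREE_PURPOSE, `is_end` models void_list_node, and running off the end of the array models the NULL terminator of a variadic list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one node of a function's parameter type list.  */
struct param
{
  bool has_default;  /* TREE_PURPOSE (iter) is non-null */
  bool is_end;       /* iter == void_list_node: fixed parameters end here */
};

/* Return the minimum number of arguments a call must supply; set
   *AT_LEAST_P when more arguments than that would also be accepted.  */
static int
min_expected_args (const struct param *parms, size_t n, bool *at_least_p)
{
  int min_expected = 0;
  *at_least_p = false;
  for (size_t i = 0; ; i++)
    {
      if (i == n)
        {
          /* Variadic arguments; stop iterating.  */
          *at_least_p = true;
          break;
        }
      if (parms[i].is_end)
        /* End of arguments; stop iterating.  */
        break;
      if (parms[i].has_default)
        {
          /* All later parameters must have defaults too, so stop here.  */
          *at_least_p = true;
          break;
        }
      ++min_expected;
    }
  return min_expected;
}
```

For `void callee (const char *, ...)` this yields 1 with the "at least" flag set, matching the new diagnostic wording.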

Here's a v3 version of the patch, which is currently going through
my tester.

OK for trunk if it passes bootstrap&regtesting?

Thanks
Dave

Consider this case of a bad call to a callback function (perhaps
due to C23 changing the meaning of () in function decls):

struct p {
int (*bar)();
};

void baz() {
struct p q;
q.bar(1);
}

Before this patch the C frontend emits:

t.c: In function 'baz':
t.c:7:5: error: too many arguments to function 'q.bar'
7 | q.bar(1);
  | ^

and the C++ frontend emits:

t.c: In function 'void baz()':
t.c:7:10: error: too many arguments to function
7 | q.bar(1);
  | ~^~~

neither of which give the user much help in terms of knowing what
was expected, and where the relevant declaration is.

With this patch the C frontend emits:

t.c: In function 'baz':
t.c:7:5: error: too many arguments to function 'q.bar'; expected 0, have 1
7 | q.bar(1);
  | ^ ~
t.c:2:15: note: declared here
2 | int (*bar)();
  |   ^~~

(showing the expected vs actual counts, the pertinent field decl, and
underlining the first extraneous argument at the callsite)

and the C++ frontend emits:

t.c: In function 'void baz()':
t.c:7:10: error: too many arguments to function; expected 0, have 1
7 | q.bar(1);
  | ~^~~

(showing the expected vs actual counts; the other data was not accessible
without a more invasive patch)

Similarly, the patch updates the "too few arguments" case to show
expected vs actual counts as well.  Doing so requires a tweak to the
wording to say "at least" for the case of variadic fns, and for C++ fns
with default args, where e.g. previously the C FE emitted:

s.c: In function 'test':
s.c:5:3: error: too few arguments to function 'callee'
5 |   callee ();
  |   ^~
s.c:1:6: note: declared here
1 | void callee (const char *, ...);
  |  ^~

with this patch it emits:

s.c: In function 'test':
s.c:5:3: error: too few arguments to function 'callee'; expected at least 1, have 0
5 |   callee ();
  |   ^~
s.c:1:6: note: declared here
1 | void callee (const char *, ...);
  |  ^~
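The wording rule described above (add "at least" when the function is variadic or has default arguments) can be sketched with snprintf. This is a hypothetical helper for illustration only; the patch itself reports the counts through GCC's diagnostic machinery (error_args_num), not through a function like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format the "too few/many arguments" message with expected vs actual
   counts, inserting "at least " when AT_LEAST_P is set.  */
static int
format_arg_count_error (char *buf, size_t size, const char *fn,
                        bool too_many_p, int expected, int actual,
                        bool at_least_p)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size,
                   "too %s arguments to function '%s'; expected %s%d, have %d",
                   too_many_p ? "many" : "few", fn,
                   at_least_p ? "at least " : "", expected, actual);
}
```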

gcc/c/ChangeLog:
PR c/118112
* c-typeck.cc (inform_declaration): Add "function_expr" param and
use it for cases where we couldn't show the function decl to show
field decls for callbacks.
(build_function_call_vec): Add missing auto_diagnostic_group.
Update for new param of inform_declaration.
(convert_argum

[PATCH] [ifcombine] fix mask variable test to match use [PR118344]

2025-01-09 Thread Alexandre Oliva



There was a cut&pasto in the rr_and_mask's adjustment to match the
combined type: the test on whether there was a mask already was
testing the wrong variable, and then it might crash or otherwise fail
accessing an undefined mask.  This only hit with checking enabled,
and rarely at that.
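A simplified C sketch of the pattern being fixed (hypothetical types; the real gimple-fold.cc code uses wide_int and wi::shifted_mask): a mask slot with precision 0 means "no AND mask was seen", and the presence test must look at the same slot that is then used. The bug tested rl_and_mask while using rr_and_mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical optional mask: precision == 0 means "no AND mask".  */
struct opt_mask
{
  unsigned precision;
  uint64_t bits;
};

/* Contiguous mask of WIDTH bits starting at bit START.  */
static uint64_t
shifted_mask (unsigned start, unsigned width)
{
  assert (start < 64 && width <= 64 - start);
  uint64_t low = width == 64 ? ~UINT64_C (0) : (UINT64_C (1) << width) - 1;
  return low << start;
}

/* Build the mask for one operand.  The test and the use both refer to
   AND_MASK, which is what the fix restores for rr_and_mask.  */
static uint64_t
operand_mask (struct opt_mask and_mask, unsigned bitpos, unsigned bitsize)
{
  if (and_mask.precision)
    return and_mask.bits << bitpos;
  return shifted_mask (bitpos, bitsize);
}
```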

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR tree-optimization/118344
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Fix typo in
rr_and_mask's type adjustment test.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118344
* gcc.dg/field-merge-19.c: New.
---
 gcc/gimple-fold.cc|2 +-
 gcc/testsuite/gcc.dg/field-merge-19.c |   41 +
 2 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-19.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 0ad92de3a218f..20b5024d861db 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -8644,7 +8644,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
  xlr_bitpos);
   else
lr_mask = wi::shifted_mask (xlr_bitpos, lr_bitsize, false, rnprec);
-  if (rl_and_mask.get_precision ())
+  if (rr_and_mask.get_precision ())
rr_mask = wi::lshift (wide_int::from (rr_and_mask, rnprec, UNSIGNED),
  xrr_bitpos);
   else
diff --git a/gcc/testsuite/gcc.dg/field-merge-19.c b/gcc/testsuite/gcc.dg/field-merge-19.c
new file mode 100644
index 0..5622baa52b0a3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-19.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fchecking" } */
+
+/* PR tree-optimization/118344 */
+
+/* This used to ICE attempting to extend a mask variable after testing the
+   wrong mask variable.  */
+
+int d, e, g, h, i, c, j;
+static short k;
+char o;
+static int *p;
+static long *a;
+int b[0];
+int q(int s, int t, int *u, int *v) {
+  for (int f = 0; f < s; f++)
+if ((t & v[f]) != u[f])
+  return 0;
+  return 1;
+}
+int w(int s, int t) {
+  int l[] = {t, t, t, t}, m[] = {e, e, 3, 1};
+  int n = q(s, d, l, m);
+  return n;
+}
+int x(unsigned s) {
+  unsigned r;
+  if (s >= -1)
+return 1;
+  r = 1000;
+  while (s > 1 / r)
+r /= 2;
+  return g ? 2 : 0;
+}
+void y() {
+  for (;;) {
+b[w(8, *p)] = h;
+for (; a + k; j = o)
+  i &= c = x(6) < 0;
+  }
+}

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Re: [PATCH] c++: Avoid infinite recursion when deducing template arguments for invalid code [PR118078]

2025-01-09 Thread Jason Merrill


On 12/20/24 2:37 PM, Simon Martin wrote:

We currently fail due to "infinite recursion" on the following invalid
code with -std=c++20 and above:

=== cut here ===
template  struct S { struct U { const S s; } u; };
S t{2};
=== cut here ===

The problem is that reshape_init_class for S calls reshape_init_r for
its field S::u, that calls reshape_init_class for S::U, that calls
reshape_init_r for field S::U::s that calls reshape_init_class for its
type S, etc.

This patch fixes the issue by erroring out in reshape_init_class if we
detect that we're about to call reshape_init_r for a type that's part of
our "TYPE_CONTEXT chain".
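A minimal C model of the check the patch adds (hypothetical types; only the TYPE_CONTEXT link of a class is represented): walk the chain of classes enclosing the type being reshaped and report whether the field's type appears in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a class type: just its name and enclosing class.  */
struct class_type
{
  const char *name;
  const struct class_type *context;  /* NULL at namespace scope */
};

/* Does FIELD_TYPE appear among the classes enclosing TYPE?  */
static bool
field_type_encloses (const struct class_type *type,
                     const struct class_type *field_type)
{
  assert (type != NULL);
  for (const struct class_type *ctx = type->context; ctx; ctx = ctx->context)
    if (ctx == field_type)
      return true;
  return false;
}
```

As modelled, the check depends only on nesting: it flags a member of type S inside S::U whether or not S itself also has a member of type U, which is the concern raised in the review of this patch.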

An alternative was to change the check in grokdeclarator that rejects
fields with an incomplete type to check for the field type's
TYPE_BEING_DECLARED (which sounds like the right thing to do), however
erroring for the definition of U::s breaks a bunch of existing test
cases, and it would also make us diverge from clang and MSVC (but behave
like EDG) - see https://godbolt.org/z/hT8d1GWa3. It feels to me like we
don't want that (I'm happy to revisit if people think it's OK).


We do error for U::s at instantiation time, because we need to 
instantiate U in order to instantiate S.


Erroring at template parse time is a bit tricky; we can't error 
immediately on the declaration of U::s; what makes it a problem is the 
later declaration of S::u.  Without that we could instantiate U later, 
after S is already complete.



Successfully tested on x86_64-pc-linux-gnu.

PR c++/118078

gcc/cp/ChangeLog:

* decl.cc (reshape_init_class): Don't trigger infinite recursion
for invalid "recursive types".

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction118.C: New test.
* g++.dg/cpp2a/class-deduction-aggr16.C: New test.

---
  gcc/cp/decl.cc| 21 ---
  .../g++.dg/cpp1z/class-deduction118.C |  8 +++
  .../g++.dg/cpp2a/class-deduction-aggr16.C |  7 +++
  3 files changed, 33 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction118.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr16.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 42e83f880f9..ea55ea4c0e5 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -7315,9 +7315,24 @@ reshape_init_class (tree type, reshape_iter *d, bool first_initializer_p,
  d->cur++;
}
else
-   field_init = reshape_init_r (TREE_TYPE (field), d,
-/*first_initializer_p=*/NULL_TREE,
-complain);
+   {
+ /* Make sure that we won't be calling ourselves recursively, which
+could happen with "recursive template types" (PR c++/118078).  */
+ tree ctx = TYPE_CONTEXT (type);
+ while (ctx && CLASS_TYPE_P (ctx))
+   {
+ if (ctx == TYPE_MAIN_VARIANT (TREE_TYPE (field))) {
+ if (complain & tf_error)
+   error ("field %qD has incomplete type %qT", field,
+  TREE_TYPE (field));
+ return error_mark_node;
+ }
+ ctx = TYPE_CONTEXT (ctx);
+   }


I'm concerned that this will break the situation I mentioned, of 
initializing an S::U in the case where S doesn't actually have a member 
of type U.  That the member has the enclosing class type isn't the 
problem; the problem is that combined with the enclosing class having a 
member of the nested class type.  It's the recursion, not the nesting.



+ field_init = reshape_init_r (TREE_TYPE (field), d,
+  /*first_initializer_p=*/NULL_TREE,
+  complain);
+   }
  
if (field_init == error_mark_node)
return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction118.C b/gcc/testsuite/g++.dg/cpp1z/class-deduction118.C
new file mode 100644
index 000..b14d62df793
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction118.C
@@ -0,0 +1,8 @@
+// PR c++/118078
+// { dg-do "compile" { target c++11 } }
+
+template 
+struct S { struct U { const S s; } u; };
+S t{2};   // { dg-error "invalid use of template-name 'S' without an argument list" "" { target c++14_down } }
+ // { dg-error "class template argument deduction failed" "" { target c++17 } .-1 }
+ // { dg-error "no matching function for call to 'S\\\(int\\\)'" "" { target c++17 } .-2 }
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr16.C b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr16.C
new file mode 100644
index 000..feab11927c1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr16.C
@@ -0,0 +1,7 @@
+// PR c++/118078
+// { dg-do "compile" { target c++20 } }
+
+template 
+struct S { const struct U { struct V { volatile S s; } v; } u; };
+S t{2};   // { dg-error "class template argument ded

[r15-6751 Regression] FAIL: gcc.target/i386/pr118017.c (test for excess errors) on Linux/x86_64

2025-01-09 Thread haochen.jiang

On Linux/x86_64,

fab96de044f1f023f52d43af866205d17d8895fb is the first bad commit
commit fab96de044f1f023f52d43af866205d17d8895fb
Author: Vladimir N. Makarov 
Date:   Thu Jan 9 16:22:02 2025 -0500

[PR118017][LRA]: Don't inherit reg of non-uniform reg class

caused

FAIL: gcc.target/i386/pr118017.c (test for excess errors)

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-6751/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr118017.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr118017.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you encounter problems related to cascadelake, disabling AVX512F on the
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)

[PATCH v2] testsuite: arm: Use -std=c17 and effective-target arm_arch_v5te_thumb

2025-01-09 Thread Torbjörn SVENSSON

Changes since v1:

- Added dg-add-options arm_arch_v5te_thumb
- Added -std=c17 to dg-options.
- Removed -march=armv5te -mfloat-abi=soft -mthumb from dg-options
- Updated the commit message to reflect the new changes

Note: This changes from armv5te to armv5te+fp and from soft to softfp.
Does this matter? If so, I can override it in a new
dg-additional-options line after the dg-add-options.

Ok for trunk?

--

With -std=c23, the following errors are now emitted as the function
prototype and implementation do not match:

.../pr59858.c: In function 're_search_internal':
.../pr59858.c:95:17: error: too many arguments to function 'check_matching'
.../pr59858.c:75:12: note: declared here
.../pr59858.c: At top level:
.../pr59858.c:100:1: error: conflicting types for 'check_matching'; have 'int(re_match_context_t *, int *)'
.../pr59858.c:75:12: note: previous declaration of 'check_matching' with type 'int(void)'
.../pr59858.c: In function 'check_matching':
.../pr59858.c:106:14: error: too many arguments to function 'transit_state'
.../pr59858.c:77:23: note: declared here
.../pr59858.c: At top level:
.../pr59858.c:111:1: error: conflicting types for 'transit_state'; have 're_dfastate_t *(re_match_context_t *, re_dfastate_t *)'
.../pr59858.c:77:23: note: previous declaration of 'transit_state' with type 're_dfastate_t *(void)'
.../pr59858.c: In function 'transit_state':
.../pr59858.c:116:7: error: too many arguments to function 'build_trtable'
.../pr59858.c:79:12: note: declared here
.../pr59858.c: At top level:
.../pr59858.c:121:1: error: conflicting types for 'build_trtable'; have 'int(const re_dfa_t *, re_dfastate_t *)'
.../pr59858.c:79:12: note: previous declaration of 'build_trtable' with type 'int(void)'

Adding -std=c17 removes these errors.

Also, updated test case to use -mcpu=unset/-march=unset feature
introduced in r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr59858.c: Use -std=c17 and effective-target
arm_arch_v5te_thumb.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/pr59858.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c b/gcc/testsuite/gcc.target/arm/pr59858.c
index 9336edfce27..8fc63b57af4 100644
--- a/gcc/testsuite/gcc.target/arm/pr59858.c
+++ b/gcc/testsuite/gcc.target/arm/pr59858.c
@@ -1,8 +1,8 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC -w -fpermissive" } */
+/* { dg-options "-std=c17 -fno-builtin -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC -w -fpermissive" } */
 /* { dg-require-effective-target fpic } */
-/* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft -mfloat-abi=hard" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
 /* { dg-require-effective-target arm_arch_v5te_thumb_ok } */
+/* { dg-add-options arm_arch_v5te_thumb } */
 
 typedef enum {
  REG_ENOSYS = -1,
-- 
2.25.1

Re: [PATCH] c++/modules: Handle chaining already-imported local types [PR114630]

2025-01-09 Thread Jason Merrill


On 1/9/25 9:31 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/14?


OK.


-- >8 --

In the linked testcase, an ICE occurs because when reading the
(duplicate) function definition for _M_do_parse from module Y, the local
type definitions have already been streamed from module X and set up as
regular backreferences, rather than being found with find_duplicate,
causing issues with managing DECL_CHAIN.

It is tempting to just skip setting up the DECL_CHAIN for this case.
However, for the future it would be best to ensure that the block vars
for the duplicate definition are accurate, so that we could implement
ODR checking on function definitions at some point.

So to solve this, this patch creates a copy of the streamed-in local
type and chains that; it will be discarded along with the rest of the
duplicate function after we've finished processing.
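A minimal C sketch of the chaining idea (hypothetical `struct decl`, not GCC's tree nodes): when the node being appended already has a non-null chain link, link a copy instead, so the chain it already belongs to is left intact. Unlike the patch, which relies on copy_node, the sketch clears the copy's link explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DECL and its DECL_CHAIN link.  */
struct decl
{
  int id;
  struct decl *chain;
};

/* Append DECL at *TAIL and return the new tail link.  If DECL already has a
   non-null chain link it belongs to another chain, so a shallow copy is
   linked instead.  Copies are never freed in this sketch.  */
static struct decl **
append_decl (struct decl **tail, struct decl *decl)
{
  if (decl->chain)
    {
      struct decl *copy = malloc (sizeof *copy);
      assert (copy != NULL);
      *copy = *decl;
      copy->chain = NULL;
      decl = copy;
    }
  *tail = decl;
  return &decl->chain;
}
```

Building a second chain that reuses an already-chained node leaves the first chain untouched.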

A couple of suggested implementations from the discussion on the PR that
don't work:

- Replacing the `DECL_CHAIN` assertion with `(*chain && *chain != decl)`
   doesn't handle the case where type definitions are followed by regular
   local variables, since those won't have been imported as separate
   backreferences and so the chains will diverge.

- Correcting the purviewness of GMF template instantiations to force Y
   to emit copies of the local types rather than backreferences into X is
   insufficient, as it's still possible that the local types got streamed
   in a separate cluster to the function definition, and so will be again
   referred to via regular backreferences when importing.

- Likewise, preventing the emission of function definitions where an
   import has already provided that same definition also is insufficient,
   for much the same reason.

PR c++/114630

gcc/cp/ChangeLog:

* module.cc (trees_in::core_vals) : Chain a new node if
DECL_CHAIN already is set.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr114630.h: New test.
* g++.dg/modules/pr114630_a.C: New test.
* g++.dg/modules/pr114630_b.C: New test.
* g++.dg/modules/pr114630_c.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc  | 14 +-
  gcc/testsuite/g++.dg/modules/pr114630.h   | 11 +++
  gcc/testsuite/g++.dg/modules/pr114630_a.C |  7 +++
  gcc/testsuite/g++.dg/modules/pr114630_b.C |  8 
  gcc/testsuite/g++.dg/modules/pr114630_c.C |  4 
  5 files changed, 43 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr114630.h
  create mode 100644 gcc/testsuite/g++.dg/modules/pr114630_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/pr114630_b.C
  create mode 100644 gcc/testsuite/g++.dg/modules/pr114630_c.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 5350e6c4bad..ff2683de73e 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -6928,11 +6928,23 @@ trees_in::core_vals (tree t)
   body) anyway.  */
decl = maybe_duplicate (decl);
  
-	if (!DECL_P (decl) || DECL_CHAIN (decl))
+   if (!DECL_P (decl))
  {
set_overrun ();
break;
  }
+
+   /* If DECL_CHAIN is already set then this was a backreference to a
+  local type or enumerator from a previous read (PR c++/114630).
+  Let's copy the node so we can keep building the chain for ODR
+  checking later.  */
+   if (DECL_CHAIN (decl))
+ {
+   gcc_checking_assert (TREE_CODE (decl) == TYPE_DECL
+&& find_duplicate (DECL_CONTEXT (decl)));
+   decl = copy_node (decl);
+ }
+
*chain = decl;
chain = &DECL_CHAIN (decl);
  }
diff --git a/gcc/testsuite/g++.dg/modules/pr114630.h b/gcc/testsuite/g++.dg/modules/pr114630.h
new file mode 100644
index 000..8730007f59f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr114630.h
@@ -0,0 +1,11 @@
+template 
+void _M_do_parse() {
+  struct A {};
+  struct B {};
+  int x;
+}
+
+template  struct formatter;
+template <> struct formatter {
+  void parse() { _M_do_parse(); }
+};
diff --git a/gcc/testsuite/g++.dg/modules/pr114630_a.C b/gcc/testsuite/g++.dg/modules/pr114630_a.C
new file mode 100644
index 000..ecfd7ca0b28
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr114630_a.C
@@ -0,0 +1,7 @@
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi X }
+
+module;
+#include "pr114630.h"
+export module X;
+formatter a;
diff --git a/gcc/testsuite/g++.dg/modules/pr114630_b.C b/gcc/testsuite/g++.dg/modules/pr114630_b.C
new file mode 100644
index 000..52fe04e2ce0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr114630_b.C
@@ -0,0 +1,8 @@
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi Y }
+
+module;
+#include "pr114630.h"
+export module Y;
+import X;
+formatter b;
diff --git a/gcc/testsuite/g++.dg/modules/pr1

Re: [PATCH] c++/modules: Handle chaining already-imported local types [PR114630]

2025-01-09 Thread Patrick Palka

On Fri, 10 Jan 2025, Nathaniel Shead wrote:

> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/14?

Nice approach, thanks for fixing this!

> 
> -- >8 --
> 
> In the linked testcase, an ICE occurs because when reading the
> (duplicate) function definition for _M_do_parse from module Y, the local
> type definitions have already been streamed from module X and set up as
> regular backreferences, rather than being found with find_duplicate,
> causing issues with managing DECL_CHAIN.
> 
> It is tempting to just skip setting up the DECL_CHAIN for this case.
> However, for the future it would be best to ensure that the block vars
> for the duplicate definition are accurate, so that we could implement
> ODR checking on function definitions at some point.
> 
> So to solve this, this patch creates a copy of the streamed-in local
> type and chains that; it will be discarded along with the rest of the
> duplicate function after we've finished processing.
> 
> A couple of suggested implementations from the discussion on the PR that
> don't work:
> 
> - Replacing the `DECL_CHAIN` assertion with `(*chain && *chain != decl)`
>   doesn't handle the case where type definitions are followed by regular
>   local variables, since those won't have been imported as separate
>   backreferences and so the chains will diverge.
> 
> - Correcting the purviewness of GMF template instantiations to force Y
>   to emit copies of the local types rather than backreferences into X is
>   insufficient, as it's still possible that the local types got streamed
>   in a separate cluster to the function definition, and so will be again
>   referred to via regular backreferences when importing.
> 
> - Likewise, preventing the emission of function definitions where an
>   import has already provided that same definition also is insufficient,
>   for much the same reason.
> 
>   PR c++/114630
> 
> gcc/cp/ChangeLog:
> 
>   * module.cc (trees_in::core_vals) : Chain a new node if
>   DECL_CHAIN already is set.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/pr114630.h: New test.
>   * g++.dg/modules/pr114630_a.C: New test.
>   * g++.dg/modules/pr114630_b.C: New test.
>   * g++.dg/modules/pr114630_c.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/module.cc  | 14 +-
>  gcc/testsuite/g++.dg/modules/pr114630.h   | 11 +++
>  gcc/testsuite/g++.dg/modules/pr114630_a.C |  7 +++
>  gcc/testsuite/g++.dg/modules/pr114630_b.C |  8 
>  gcc/testsuite/g++.dg/modules/pr114630_c.C |  4 
>  5 files changed, 43 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/pr114630.h
>  create mode 100644 gcc/testsuite/g++.dg/modules/pr114630_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/pr114630_b.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/pr114630_c.C
> 
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index 5350e6c4bad..ff2683de73e 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -6928,11 +6928,23 @@ trees_in::core_vals (tree t)
>  body) anyway.  */
>   decl = maybe_duplicate (decl);
>  
> - if (!DECL_P (decl) || DECL_CHAIN (decl))
> + if (!DECL_P (decl))
> {
>   set_overrun ();
>   break;
> }
> +
> + /* If DECL_CHAIN is already set then this was a backreference to a
> +local type or enumerator from a previous read (PR c++/114630).
> +Let's copy the node so we can keep building the chain for ODR
> +checking later.  */
> + if (DECL_CHAIN (decl))
> +   {
> + gcc_checking_assert (TREE_CODE (decl) == TYPE_DECL
> +  && find_duplicate (DECL_CONTEXT (decl)));
> + decl = copy_node (decl);

Shall we use copy_decl here instead so that any DECL_LANG_SPECIFIC node
is copied as well?  IIUC we usually don't share DECL_LANG_SPECIFIC
between decls.

> +   }
> +
>   *chain = decl;
>   chain = &DECL_CHAIN (decl);
> }
> diff --git a/gcc/testsuite/g++.dg/modules/pr114630.h b/gcc/testsuite/g++.dg/modules/pr114630.h
> new file mode 100644
> index 000..8730007f59f
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/pr114630.h
> @@ -0,0 +1,11 @@
> +template 
> +void _M_do_parse() {
> +  struct A {};
> +  struct B {};
> +  int x;
> +}
> +
> +template  struct formatter;
> +template <> struct formatter {
> +  void parse() { _M_do_parse(); }
> +};
> diff --git a/gcc/testsuite/g++.dg/modules/pr114630_a.C b/gcc/testsuite/g++.dg/modules/pr114630_a.C
> new file mode 100644
> index 000..ecfd7ca0b28
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/pr114630_a.C
> @@ -0,0 +1,7 @@
> +// { dg-additional-options "-fmodules" }
> +// { dg-module-cmi X }
> +
> +module;
> +#include "pr114630.h"
> +export module X;
> +formatter a;
> diff --git a/gcc/testsuite/g++.dg/modules/pr11

[pushed][PR118017][LRA]: Don't inherit reg of non-uniform reg class

2025-01-09 Thread Vladimir Makarov


The patch in the attachment solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118017

The patch was successfully bootstrapped and tested on x86-64 and aarch64.

commit 6ffaed8d8713874b7c4ee112249ed8a91ff9
Author: Vladimir N. Makarov 
Date:   Thu Jan 9 16:22:02 2025 -0500

[PR118017][LRA]: Don't inherit reg of non-uniform reg class

In the PR case, LRA inherited the value of a register of class INT_SSE_REGS,
which resulted in LRA cycling when it tried different move alternatives
with SSE/general regs and memory.  The patch refuses inheritance for such
(non-uniform) classes to prevent cycling.
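The uniformity test compares, for the value's mode, the minimum and maximum number of hard registers that any member of the class needs. Below is a minimal C++ sketch of that predicate. `RegClass` and its per-register `nregs` vector are hypothetical, standing in for the `ira_reg_class_min_nregs`/`ira_reg_class_max_nregs` tables. The numbers in the usage assume, for illustration only, a 128-bit value on x86-64, which needs two general registers but one SSE register, so a union class like INT_SSE_REGS comes out non-uniform.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: nregs[i] is how many hard registers a value of the
// mode in question occupies when allocated to the i-th member of the class.
struct RegClass {
  std::vector<int> nregs;  // Assumed non-empty.
};

// Uniform means every member needs the same number of registers, i.e. the
// min and max counts agree (the patch compares the IRA min/max tables).
bool uniform_p(const RegClass &rc) {
  auto [lo, hi] = std::minmax_element(rc.nregs.begin(), rc.nregs.end());
  return *lo == *hi;
}

// Mirrors the new early exit in inherit_reload_reg: no inheritance into a
// non-uniform class.
bool may_inherit_p(const RegClass &rc) { return uniform_p(rc); }
```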

gcc/ChangeLog:

PR target/118017
* lra-constraints.cc (inherit_reload_reg): Check reg class on uniformity.

gcc/testsuite/ChangeLog:

PR target/118017
* gcc.target/i386/pr118017.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index a0f05b290dd..8f32e98f1c4 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5878,6 +5878,20 @@ inherit_reload_reg (bool def_p, int original_regno,
 	}
   return false;
 }
+  if (ira_reg_class_min_nregs[rclass][GET_MODE (original_reg)]
+  != ira_reg_class_max_nregs[rclass][GET_MODE (original_reg)])
+{
+  if (lra_dump_file != NULL)
+	{
+	  fprintf (lra_dump_file,
+		   "Rejecting inheritance for %d "
+		   "because of requiring non-uniform class %s\n",
+		   original_regno, reg_class_names[rclass]);
+	  fprintf (lra_dump_file,
+		   ">>\n");
+	}
+  return false;
+}
   new_reg = lra_create_new_reg (GET_MODE (original_reg), original_reg,
 rclass, NULL, "inheritance");
   start_sequence ();
diff --git a/gcc/testsuite/gcc.target/i386/pr118017.c b/gcc/testsuite/gcc.target/i386/pr118017.c
new file mode 100644
index 000..c82d71e8d29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr118017.c
@@ -0,0 +1,21 @@
+/* PR target/118017 */
+/* { dg-do compile } */
+/* { dg-options "-Og -frounding-math -mno-80387 -mno-mmx -Wno-psabi" } */
+
+typedef __attribute__((__vector_size__ (64))) _Float128 F;
+typedef __attribute__((__vector_size__ (64))) _Decimal64 G;
+typedef __attribute__((__vector_size__ (64))) _Decimal128 H;
+
+void
+bar(_Float32, _BitInt(1025), _BitInt(1025), _Float128, __int128, __int128,  F,
+int, int, G, _Float64, __int128, __int128, H, F);
+
+
+void
+foo ()
+{
+  bar ((__int128)68435455, 0, 0, 0, 0, 0, (F){}, 0, 0, (G){3689348814741910323},
+   0, 0, 0, (H){0, (_Decimal128) ((__int128) 860933398830926 << 64),
+   (_Decimal128) ((__int128) 966483857959145 << 64), 4},
+   (F){(__int128) 3689348814741910323 << 64 | 3});
+}

Re: [PATCH] testsuite: arm: Use -Os in memset-inline-8* tests

2025-01-09 Thread Torbjorn SVENSSON





On 2025-01-09 12:42, Richard Earnshaw (lists) wrote:

On 22/12/2024 15:27, Torbjörn SVENSSON wrote:

Ok for trunk and releases/gcc-14?

--

When the test was initially created, -fcommon was the default, but in
commit r10-4867-g6271dd984d7 the default value changed to -fno-common.
This change made the test start failing. To counter the over-alignment
caused by 'a' no longer being common, use -Os.

gcc/testsuite/ChangeLog:

* gcc.target/arm/memset-inline-8.c: Use -Os and prefix assembler
instructions with a tab to improve test stability.
* gcc.target/arm/memset-inline-8-exe.c: Use -Os.

Signed-off-by: Torbjörn SVENSSON 


OK.


Pushed as r15-6746-g681934aead9 and r14.2.0-645-g1f509da6d7c.

Kind regards,
Torbjörn



R.

[PATCH RFA (diagnostic)] c++: modules and #pragma diagnostic

2025-01-09 Thread Jason Merrill

Tested x86_64-pc-linux-gnu.  Is the diagnostic.h change OK for trunk?

-- 8< --

To respect the #pragma diagnostic lines in libstdc++ headers when compiling
with module std, we need to represent them in the module.

I think it's reasonable to make module_state a friend of
diagnostic_option_classifier to allow direct access to the data.  This is a
different approach from how Jakub made PCH streaming members of
diagnostic_option_classifier, but it seems to me that modules handling
belongs in module.cc.
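The .dgc section written by write_diagnostic_classification is a count followed by (location, option, kind) triples, and the reader appends them to the importer's existing history. Consistent with the `opt += offset` adjustment for DK_POP, the sketch below assumes a pop entry's option field is an index into that history, so it must be rebased by the number of entries already present. This is a minimal C++ sketch with hypothetical `Change`/`Kind` types and a plain `std::vector<unsigned>` in place of bytes_out/bytes_in; location remapping (write_location/read_location) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum Kind : unsigned { kWarning, kError, kIgnored, kPop };  // Hypothetical.

struct Change {
  unsigned location;
  int option;  // For kPop: index into the history to jump back to.
  Kind kind;
};

// Serialize as: count, then (location, option, kind) per entry.
std::vector<unsigned> write_history(const std::vector<Change> &changes) {
  std::vector<unsigned> out;
  out.push_back(static_cast<unsigned>(changes.size()));
  for (const Change &c : changes) {
    out.push_back(c.location);
    out.push_back(static_cast<unsigned>(c.option));
    out.push_back(c.kind);
  }
  return out;
}

// Append streamed entries to CHANGES, shifting pop indices by the number
// of entries that were already there.
void read_history(const std::vector<unsigned> &in, std::vector<Change> &changes) {
  std::size_t pos = 0;
  unsigned len = in[pos++];
  int offset = static_cast<int>(changes.size());
  changes.reserve(changes.size() + len);
  for (unsigned i = 0; i < len; ++i) {
    Change c;
    c.location = in[pos++];
    c.option = static_cast<int>(in[pos++]);
    c.kind = static_cast<Kind>(in[pos++]);
    if (c.kind == kPop)
      c.option += offset;
    changes.push_back(c);
  }
}
```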

gcc/ChangeLog:

* diagnostic.h: Add friends.

gcc/cp/ChangeLog:

* module.cc (module_state::write_diagnostic_classification): New.
(module_state::write_begin): Call it.
(module_state::read_diagnostic_classification): New.
(module_state::read_initial): Call it.

gcc/testsuite/ChangeLog:

* g++.dg/modules/warn-spec-3_a.C: New test.
* g++.dg/modules/warn-spec-3_b.C: New test.
---
 gcc/diagnostic.h |  4 +
 gcc/cp/module.cc | 86 +++-
 gcc/testsuite/g++.dg/modules/warn-spec-3_a.C | 20 +
 gcc/testsuite/g++.dg/modules/warn-spec-3_b.C |  8 ++
 4 files changed, 117 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/warn-spec-3_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/warn-spec-3_b.C

diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index 202760b2f85..91bde3ff06c 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -299,6 +299,8 @@ private:
 
   /* For pragma push/pop.  */
   vec m_push_list;
+
+  friend class module_state;
 };
 
 /* A bundle of options relating to printing the user's source code
@@ -807,6 +809,8 @@ private:
   /* The stack of sets of overridden diagnostic option severities.  */
   diagnostic_option_classifier m_option_classifier;
 
+  friend class module_state;
+
   /* True if we should print any CWE identifiers associated with
  diagnostics.  */
   bool m_show_cwe;
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 78fb21dc22f..49c9c092163 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3876,6 +3876,9 @@ class GTY((chain_next ("%h.parent"), for_user)) module_state {
   void write_macro_maps (elf_out *to, range_t &, unsigned *crc_ptr);
   bool read_macro_maps (line_map_uint_t);
 
+  void write_diagnostic_classification (elf_out *, diagnostic_context *, unsigned *);
+  bool read_diagnostic_classification (diagnostic_context *);
+
  private:
   void write_define (bytes_out &, const cpp_macro *);
   cpp_macro *read_define (bytes_in &, cpp_reader *) const;
@@ -17637,6 +17640,78 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
   dump.outdent ();
 }
 
+/* Write out any #pragma GCC diagnostic info to the .dgc section.  */
+
+void
+module_state::write_diagnostic_classification (elf_out *to,
+  diagnostic_context *dc,
+  unsigned *crc_p)
+{
+  auto &changes = dc->m_option_classifier.m_classification_history;
+
+  dump () && dump ("Writing diagnostic change locations");
+  dump.indent ();
+
+  bytes_out sec (to);
+  if (sec.streaming_p ())
+sec.begin ();
+
+  unsigned len = changes.length ();
+  dump () && dump ("Diagnostic changes: %u", len);
+  if (sec.streaming_p ())
+sec.u (len);
+
+  for (const auto &c: changes)
+{
+  write_location (sec, c.location);
+  if (sec.streaming_p ())
+   {
+ sec.u (c.option);
+ sec.u (c.kind);
+   }
+}
+
+  if (sec.streaming_p ())
+sec.end (to, to->name (MOD_SNAME_PFX ".dgc"), crc_p);
+  dump.outdent ();
+}
+
+/* Read any #pragma GCC diagnostic info from the .dgc section.  */
+
+bool
+module_state::read_diagnostic_classification (diagnostic_context *dc)
+{
+  bytes_in sec;
+
+  if (!sec.begin (loc, from (), MOD_SNAME_PFX ".dgc"))
+return false;
+
+  dump () && dump ("Reading diagnostic change locations");
+  dump.indent ();
+
+  unsigned len = sec.u ();
+  dump () && dump ("Diagnostic changes: %u", len);
+
+  auto &changes = dc->m_option_classifier.m_classification_history;
+  unsigned offset = changes.length ();
+  changes.reserve (len);
+  for (unsigned i = 0; i < len; ++i)
+{
+  location_t loc = read_location (sec);
+  int opt = sec.u ();
+  diagnostic_t kind = (diagnostic_t) sec.u ();
+  if (kind == DK_POP)
+   opt += offset;
+  changes.quick_push ({ loc, opt, kind });
+}
+
+  dump.outdent ();
+  if (!sec.end (from ()))
+return false;
+
+  return true;
+}
+
 void
 module_state::write_macro_maps (elf_out *to, range_t &info, unsigned *crc_p)
 {
@@ -19231,6 +19306,8 @@ module_state::write_begin (elf_out *to, cpp_reader *reader,
   if (is_header ())
 macros = prepare_macros (reader);
 
+  write_diagnostic_classification (nullptr, global_dc, nullptr);
+
   config.num_imports = mod_hwm;
   config.num_partitions = modules->length () - mod_hwm;
   auto map_info = write_prepare_maps (&config, bo

Re: [PATCH] c++: ICE with pack indexing and partial inst [PR117937]

2025-01-09 Thread Jason Merrill


On 12/20/24 12:54 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK


-- >8 --
Here we ICE in expand_expr_real_1:

   if (exp)
 {
   tree context = decl_function_context (exp);
   gcc_assert (SCOPE_FILE_SCOPE_P (context)
   || context == current_function_decl

on something like this test:

   void
   f (auto... args)
   {
 [&](seq) {
g(args...[i]...);
 }(seq<0>());
   }

because while current_function_decl is:

   f(int)::)> [with long unsigned int ...i = {0}]

(correct), context is:

   f(int)::)>

which is only the partial instantiation.

I think that when tsubst_pack_index gets a partial instantiation, e.g.
{*args#0} as the pack, we should still tsubst it.  The args#0's value-expr
can be __closure->__args#0 where the closure's context is the partially
instantiated operator().  So we should let retrieve_local_specialization
find the right args#0.

PR c++/117937

gcc/cp/ChangeLog:

* pt.cc (tsubst_pack_index): tsubst the pack even when it's not
PACK_EXPANSION_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/pack-indexing13.C: New test.
* g++.dg/cpp26/pack-indexing14.C: New test.
---
  gcc/cp/pt.cc |  8 +++
  gcc/testsuite/g++.dg/cpp26/pack-indexing13.C | 23 
  gcc/testsuite/g++.dg/cpp26/pack-indexing14.C | 18 +++
  3 files changed, 49 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp26/pack-indexing13.C
  create mode 100644 gcc/testsuite/g++.dg/cpp26/pack-indexing14.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 7fa286698ef..c40b147e837 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -14063,6 +14063,14 @@ tsubst_pack_index (tree t, tree args, tsubst_flags_t complain, tree in_decl)
tree pack = PACK_INDEX_PACK (t);
if (PACK_EXPANSION_P (pack))
  pack = tsubst_pack_expansion (pack, args, complain, in_decl);
+  else
+{
+  /* PACK can be {*args#0} whose args#0's value-expr refers to
+a partially instantiated closure.  Let tsubst find the
+fully-instantiated one.  */
+  gcc_assert (TREE_CODE (pack) == TREE_VEC);
+  pack = tsubst (pack, args, complain, in_decl);
+}
if (TREE_CODE (pack) == TREE_VEC && TREE_VEC_LENGTH (pack) == 0)
  {
if (complain & tf_error)
diff --git a/gcc/testsuite/g++.dg/cpp26/pack-indexing13.C b/gcc/testsuite/g++.dg/cpp26/pack-indexing13.C
new file mode 100644
index 000..e0dd9c21c67
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp26/pack-indexing13.C
@@ -0,0 +1,23 @@
+// PR c++/117937
+// { dg-do compile { target c++26 } }
+
+using size_t = decltype(sizeof(0));
+
+template
+struct seq {};
+
+void g(auto...) {}
+
+void
+f (auto... args)
+{
+  [&](seq) {
+  g(args...[i]...);
+  }(seq<0>());
+}
+
+int
+main ()
+{
+  f(0);
+}
diff --git a/gcc/testsuite/g++.dg/cpp26/pack-indexing14.C b/gcc/testsuite/g++.dg/cpp26/pack-indexing14.C
new file mode 100644
index 000..c8a67ee16ed
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp26/pack-indexing14.C
@@ -0,0 +1,18 @@
+// PR c++/117937
+// { dg-do compile { target c++26 } }
+
+void operate_one(const int) {}
+
+template
+void operate_multi(T... args)
+{
+[&]()
+{
+   ::operate_one(args...[idx]);
+}.template operator()<0>();
+}
+
+int main()
+{
+::operate_multi(0);
+}

base-commit: 219ddae16f9d724baeff86934f8981aa5ef7b95f

Re: [PATCH] testsuite: arm: Verify asm per function for armv8_2-fp16-conv-1.c

2025-01-09 Thread Torbjorn SVENSSON





On 2025-01-09 12:56, Richard Earnshaw (lists) wrote:

On 27/12/2024 17:01, Torbjörn SVENSSON wrote:

Ok for trunk?

--

This change will enforce that the expected instructions are generated
per function rather than allowing some other function to use the
expected instructions.

gcc/testsuite/ChangeLog:

* gcc.target/arm/armv8_2-fp16-conv-1.c: Convert
scan-assembler-times to check-function-bodies.

Signed-off-by: Torbjörn SVENSSON 


I'd recommend that you also add "-fno-schedule-insns -fno-schedule-insns2" to 
dg-options to avoid the risk of the scheduler moving code around and breaking the 
sequences.

OK with that change.


Pushed as r15-6745-g794f6721e0e (and the typo fix in r15-6749-g424a9ac45ab).

Note: This commit may appear to cause a "regression", as these 2 failures:

FAIL: gcc.target/arm/armv8_2-fp16-conv-1.c scan-assembler-times vcvt\\.s32\\.f64\\ts[0-9]+, d[0-9]+ 1
FAIL: gcc.target/arm/armv8_2-fp16-conv-1.c scan-assembler-times vcvt\\.s32\\.f64\\ts[0-9]+, d[0-9]+ 1

are replaced with

FAIL: gcc.target/arm/armv8_2-fp16-conv-1.c check-function-bodies f64_to_s16
FAIL: gcc.target/arm/armv8_2-fp16-conv-1.c check-function-bodies f64_to_u16

when testing using march=armv8-m.main+dsp+fp/float-abi=hard/fpu=fpv5-sp-d16.
The reason for the failures is that __aeabi_d2iz and __aeabi_d2uiz are used.

Note 2: Before my 2 changes there was no check of what f64_to_u16 would
generate, and I overlooked that when I did this patch. Sorry!


Kind regards,
Torbjörn



R.


---
  .../gcc.target/arm/armv8_2-fp16-conv-1.c  | 99 ---
  1 file changed, 83 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
index c9639a542ae..279aafbc7b4 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
@@ -2,100 +2,167 @@
  /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
  /* { dg-options "-O2" }  */
  /* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-final { check-function-bodies "**" "" } } */
  
  /* Test ARMv8.2 FP16 conversions.  */

  #include 
  
+/*

+** f16_to_f32:
+** ...
+** vcvtb\.f32\.f16 (s[0-9]+), \1
+** ...
+*/
  float
  f16_to_f32 (__fp16 a)
  {
return (float)a;
  }
  
+/*

+** f16_to_pf32:
+** ...
+** vcvtb\.f32\.f16 (s[0-9]+), \1
+** ...
+*/
  float
  f16_to_pf32 (__fp16* a)
  {
return (float)*a;
  }
  
+/*

+** f16_to_s16:
+** ...
+** vcvtb\.f32\.f16 (s[0-9]+), \1
+** vcvt\.s32\.f32  \1, \1
+** ...
+*/
  short
  f16_to_s16 (__fp16 a)
  {
return (short)a;
  }
  
+/*

+** pf16_to_s16:
+** ...
+** vcvtb\.f32\.f16 (s[0-9]+), \1
+** vcvt\.s32\.f32  \1, \1
+** ...
+*/
  short
  pf16_to_s16 (__fp16* a)
  {
return (short)*a;
  }
  
-/* { dg-final { scan-assembler-times {vcvtb\.f32\.f16\ts[0-9]+, s[0-9]+} 4 } }  */

-
+/*
+** f32_to_f16:
+** ...
+** vcvtb\.f16\.f32 (s[0-9]+), \1
+** ...
+*/
  __fp16
  f32_to_f16 (float a)
  {
return (__fp16)a;
  }
  
+/*

+** f32_to_pf16:
+** ...
+** vcvtb\.f16\.f32 (s[0-9]+), \1
+** ...
+*/
  void
  f32_to_pf16 (__fp16* x, float a)
  {
*x = (__fp16)a;
  }
  
+/*

+** s16_to_f16:
+** ...
+** vcvt\.f32\.s32  (s[0-9]+), \1
+** vcvtb\.f16\.f32 \1, \1
+** ...
+*/
  __fp16
  s16_to_f16 (short a)
  {
return (__fp16)a;
  }
  
+/*

+** s16_to_pf16:
+** ...
+** vcvt\.f32\.s32  (s[0-9]+), \1
+** vcvtb\.f16\.f32 \1, \1
+** ...
+*/
  void
  s16_to_pf16 (__fp16* x, short a)
  {
*x = (__fp16)a;
  }
  
-/* { dg-final { scan-assembler-times {vcvtb\.f16\.f32\ts[0-9]+, s[0-9]+} 4 } }  */

-
+/*
+** s16_to_f32:
+** ...
+** vcvt\.f32\.s32  (s[0-9]+), \1
+** ...
+*/
  float
  s16_to_f32 (short a)
  {
return (float)a;
  }
  
-/* { dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+, s[0-9]+} 3 } }  */

-
+/*
+** f32_to_s16:
+** ...
+** vcvt\.s32\.f32  (s[0-9]+), \1
+** ...
+*/
  short
  f32_to_s16 (float a)
  {
return (short)a;
  }
  
-/* { dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+, s[0-9]+} 3 } }  */

-
+/*
+** f32_to_u16:
+** ...
+** vcvt\.u32\.f32  (s[0-9]+), \1
+** ...
+*/
  unsigned short
  f32_to_u16 (float a)
  {
return (unsigned short)a;
  }
  
-/* { dg-final { scan-assembler-times {vcvt\.u32\.f32\ts[0-9]+, s[0-9]+} 1 } }  */

-
+/*
+** f64_to_s16:
+** ...
+** vcvt\.s32\.f64  s[0-9]+, d[0-9]+
+** ...
+*/
  short
  f64_to_s16 (double a)
  {
return (short)a;
  }
  
-/* { dg-final { scan-assembler-times {vcvt\.s32\.f64\ts[0-9]+, d[0-9]+} 1 } }  */

-
+/*
+** f64_to_s16:
+** ...
+** vcvt\.s32\.f64  s[0-9]+, d[0-9]+
+** ...
+*/
  unsigned short
  f64_to_u16 (double a)
  {
return (unsigned short)a;
  }
-
-/* { dg-final { scan-assembler-times {vcvt\.s32\.f64\ts[0-9]+, d[0-9]+} 1 } }  */
-
-

[PATCH] c++: be permissive about eh spec mismatch for op new

2025-01-09 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

r15-3532 made us more strict about exception-specification mismatches with
the standard library, but let's still be permissive about operator new,
since previously you needed to say throw(std::bad_alloc).

gcc/cp/ChangeLog:

* decl.cc (check_redeclaration_exception_specification): Be more
lenient about ::operator new.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept88.C: New test.
---
 gcc/cp/decl.cc  | 11 +--
 gcc/testsuite/g++.dg/cpp0x/noexcept88.C |  9 +
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept88.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 288da65fd8d..5c6a4996a89 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1398,7 +1398,14 @@ check_redeclaration_exception_specification (tree new_decl,
   location_t new_loc = DECL_SOURCE_LOCATION (new_decl);
   auto_diagnostic_group d;
 
-  if (DECL_IN_SYSTEM_HEADER (old_decl) && DECL_EXTERN_C_P (old_decl))
+  /* Be permissive about C++98 vs C++11 operator new declarations.  */
+  bool global_new = (IDENTIFIER_NEW_OP_P (DECL_NAME (new_decl))
+&& CP_DECL_CONTEXT (new_decl) == global_namespace
+&& (nothrow_spec_p (new_exceptions)
+== nothrow_spec_p (old_exceptions)));
+
+  if (DECL_IN_SYSTEM_HEADER (old_decl)
+ && (global_new || DECL_EXTERN_C_P (old_decl)))
/* Don't fuss about the C library; the C library functions are not
   specified to have exception specifications (just behave as if they
   have them), but some implementations include them.  */
@@ -1407,7 +1414,7 @@ check_redeclaration_exception_specification (tree new_decl,
/* We used to silently permit mismatched eh specs with
   -fno-exceptions, so only complain if -pedantic.  */
complained = pedwarn (new_loc, OPT_Wpedantic, msg, new_decl);
-  else if (!new_exceptions)
+  else if (!new_exceptions || global_new)
/* Reduce to pedwarn for omitted exception specification.  No warning
   flag for this; silence the warning by correcting the code.  */
complained = pedwarn (new_loc, 0, msg, new_decl);
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept88.C b/gcc/testsuite/g++.dg/cpp0x/noexcept88.C
new file mode 100644
index 000..9b57fddbf0a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept88.C
@@ -0,0 +1,9 @@
+// { dg-options -Wsystem-headers }
+
+#include 
+#include 
+
+void *operator new (std::size_t) throw (std::bad_alloc); // { dg-line decl }
+// { dg-error "dynamic exception spec" "" { target c++17 } decl }
+// { dg-warning "dynamic exception spec" "" { target { c++11 && { ! c++17 } } } decl }
+// { dg-warning "different exception spec" "" { target { c++11 && { ! c++17 } } } decl }

base-commit: 424a9ac45ab1a80359b22ee30d2815fb0f2c5149
-- 
2.47.1

Re: [PATCH] c/c++: UX improvements to 'too {few,many} arguments' errors (v3) [PR118112]

2025-01-09 Thread David Malcolm

On Thu, 2025-01-09 at 21:15 -0500, Jason Merrill wrote:
> On 1/9/25 7:00 PM, David Malcolm wrote:
> > On Thu, 2025-01-09 at 14:21 -0500, Jason Merrill wrote:
> > 
> > Thanks for taking a look...
> > 
> > > > On 1/9/25 2:11 PM, David Malcolm wrote:
> > > > 
> > > > @@ -4743,7 +4769,38 @@ convert_arguments (tree typelist, vec<tree, va_gc> **values, tree fndecl,
> > > >  if (typetail && typetail != void_list_node)
> > > >     {
> > > >       if (complain & tf_error)
> > > > -       error_args_num (input_location, fndecl, /*too_many_p=*/false);
> > > > +       {
> > > > +     /* Not enough args.
> > > > +Determine minimum number of arguments required.  */
> > > > +     int min_expected_num = 0;
> > > > +     bool at_least_p = false;
> > > > +     tree iter = typelist;
> > > > +     while (true)
> > > > +   {
> > > > +     if (!iter)
> > > > +       {
> > > > +     /* Variadic arguments; stop iterating.  */
> > > > +     at_least_p = true;
> > > > +     break;
> > > > +       }
> > > > +     if (iter == void_list_node)
> > > > +       /* End of arguments; stop iterating.  */
> > > > +       break;
> > > > +     if (fndecl && TREE_PURPOSE (iter)
> > > > +     && TREE_CODE (TREE_PURPOSE (iter)) != DEFERRED_PARSE)
> > > > 
> > > 
> > > Why are you checking DEFERRED_PARSE?  That indicates a default argument,
> > > even if it isn't parsed yet.  For that case we should get the error in
> > > convert_default_arg rather than pretend there's no default argument.
> > 
> > I confess that the check for DEFERRED_PARSE was a rather mindless copy
> > and paste by me from the "See if there are default arguments that can be
> > used" logic earlier in the function.
> > 
> > I've removed it in the latest version of the patch.
> >   
> > > > +       {
> > > > +     /* Found a default argument; skip this one when
> > > > +counting minimum required.  */
> > > > +     at_least_p = true;
> > > > +     iter = TREE_CHAIN (iter);
> > > > +     continue;
> > > 
> > > We could break here, once you have a default arg the rest of the parms
> > > need to have them as well.
> > 
> > Indeed; I've updated this in the latest version of the patch, so
> > we break out as soon as we see an arg with a non-null TREE_PURPOSE.
> > 
> > > 
> > > > +       }
> > > > +     ++min_expected_num;
> > > > +     iter = TREE_CHAIN (iter);
> > > > +   }
> > > > +     error_args_num (input_location, fndecl,
> > > > +     min_expected_num, actual_num, at_least_p);
> > > > +       }
> > > >       return -1;
> > > >     }
> > 
> > Here's a v3 version of the patch, which is currently going through
> > my tester.
> > 
> > OK for trunk if it passes bootstrap&regrtesting?
> 
> OK.

Thanks.  However, it turns out that I may have misspoken, and the v2
patch might have been the correct approach: if we bail out on the first
arg with a TREE_PURPOSE then in
gcc/testsuite/g++.dg/cpp0x/variadic169.C

   1   │ // DR 2233
   2   │ // { dg-do compile { target c++11 } }
   3   │ 
   4   │ template void f(int n = 0, T ...t);
   5   │ 
   6   │ int main()
   7   │ {
   8   │   f(); // { dg-error "too few arguments to function '\[^\n\r\]*'; expected at least 1, have 0" }
   9   │ }

we instead emit the nonsensical "expected at least 0, have 0":

error: too few arguments to function ‘void f(int, T ...) [with T = {int}]’; expected at least 0, have 0
8 |   f(); // { dg-error "too few arguments to function '\[^\n\r\]*'; expected at least 1, have 0" }
  |   ~~^~
../../src/gcc/testsuite/g++.dg/cpp0x/variadic169.C:4:30: note: declared here
4 | template void f(int n = 0, T ...t);
  |  ^

whereas with the v2 patch we count the trailing arg after the default
arg, and we emit "expected at least 1, have 0", which seems correct to
me.

I'm testing a version of the patch that continues to iterate after a
TREE_PURPOSE (as v2, but dropping the check on DEFERRED_PARSE).
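The counting described here keeps scanning past a parameter that has a default argument, so a later parameter without one (the expanded pack in the DR 2233 testcase) is still counted. A minimal C++ sketch of that rule; `Param`, `ArgCount` and `min_args` are hypothetical, and the `variadic` flag stands in for a parameter list that does not end in void_list_node.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one parameter: does it have a default argument?
struct Param {
  bool has_default;
};

struct ArgCount {
  int min_expected;
  bool at_least;  // True if more than MIN_EXPECTED arguments may be accepted.
};

// Count the parameters that need an explicit argument.  A defaulted
// parameter sets AT_LEAST but does not stop the scan, so a non-defaulted
// parameter after it is still counted.
ArgCount min_args(const std::vector<Param> &params, bool variadic) {
  ArgCount r{0, variadic};
  for (const Param &p : params) {
    if (p.has_default) {
      r.at_least = true;
      continue;
    }
    ++r.min_expected;
  }
  return r;
}
```

For `f(int n = 0, T ...t)` with `T = {int}` this gives "at least 1", matching the expectation in variadic169.C; breaking at the first default would give the "at least 0" shown above.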

Thanks
Dave

[PATCH] testsuite: arm: Fix typo in gcc.target/arm/armv8_2-fp16-conv-1.c

2025-01-09 Thread Torbjörn SVENSSON

While writing the summary for my push of r15-6745-g794f6721e0e, I
noticed the following typo.

Pushed this patch as obvious.

--

gcc/testsuite/ChangeLog:

* gcc.target/arm/armv8_2-fp16-conv-1.c: Fix typo.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
index 517ffd7e123..e5145b993ed 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
@@ -156,9 +156,9 @@ f64_to_s16 (double a)
 }
 
 /*
-** f64_to_s16:
+** f64_to_u16:
 ** ...
-** vcvt\.s32\.f64  s[0-9]+, d[0-9]+
+** vcvt\.u32\.f64  s[0-9]+, d[0-9]+
 ** ...
 */
 unsigned short
-- 
2.25.1

Re: [Committed] RISC-V: testsuite: fix target selector for sync_char_short

2025-01-09 Thread Edwin Lu


Thanks! Committed.

Edwin

On 1/9/2025 1:04 PM, Jeff Law wrote:



On 1/9/25 11:33 AM, Edwin Lu wrote:

The effective-target selector for riscv on sync_char_short did not
check whether atomics were enabled. As a result, these test cases were
run on targets without the 'a' extension. Add checks for the zalrsc
or zabha extensions.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Fix effective target sync_char_short
for riscv*-*-*

OK
jeff

Re: [PATCH] c++, v3: Fix ICEs with large initializer lists or ones including #embed [PR118124]

2025-01-09 Thread Jason Merrill


On 12/20/24 4:24 AM, Jakub Jelinek wrote:

On Thu, Dec 19, 2024 at 07:01:39PM +0100, Jakub Jelinek wrote:

So far lightly tested, ok for trunk this way if it passes bootstrap & testing?


Bootstrap/regtest found an issue, warning about
   if ()
 for ()
   if ()
   else if ()
   else
so I've added {}s around it (no other changes from the previous patch).
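The warning is presumably GCC's -Wdangling-else ("suggest explicit braces to avoid ambiguous 'else'", enabled by -Wparentheses), which fires when an unbraced `if` contains, through the `for`, an `if`/`else` chain. A small C++ illustration of the braced shape; `Counts` and `classify` are hypothetical and unrelated to call.cc.

```cpp
#include <cassert>
#include <vector>

struct Counts {
  int neg = 0, zero = 0, pos = 0;
};

// Same shape as the code in question: if () for () if () else if () else.
// With the braces there is no question which `if` the `else` belongs to,
// so -Wdangling-else has nothing to report.
Counts classify(bool enabled, const std::vector<int> &v) {
  Counts c;
  if (enabled)
    {
      for (int x : v)
        {
          if (x < 0)
            ++c.neg;
          else if (x == 0)
            ++c.zero;
          else
            ++c.pos;
        }
    }
  return c;
}
```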

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-12-20  Jakub Jelinek  

PR c++/118124
* call.cc (convert_like_internal): Handle RAW_DATA_CST in
ck_list handling.  Formatting fixes.

* g++.dg/cpp/embed-15.C: New test.
* g++.dg/cpp/embed-16.C: New test.
* g++.dg/cpp0x/initlist-opt3.C: New test.
* g++.dg/cpp0x/initlist-opt4.C: New test.
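The core of the new RAW_DATA_CST handling is a per-byte narrowing check: when the element type is signed and the source type is unsigned or wider than char, every byte that reads as negative when reinterpreted as signed char (that is, every byte above 127) is a narrowing conversion. A minimal C++ sketch of that scan over a plain byte vector; `first_narrowing_byte` is hypothetical, and the real code diagnoses through permerror_opt (returning error_mark_node once an error is counted) rather than returning an index.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <optional>
#include <vector>

// Return the index of the first byte whose value does not fit in signed
// char (i.e. exceeds SCHAR_MAX), or std::nullopt if every byte fits.
std::optional<std::size_t>
first_narrowing_byte(const std::vector<unsigned char> &data) {
  for (std::size_t i = 0; i < data.size(); ++i)
    if (data[i] > SCHAR_MAX)
      return i;
  return std::nullopt;
}
```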

--- gcc/cp/call.cc.jj   2024-12-11 17:27:52.481221310 +0100
+++ gcc/cp/call.cc  2024-12-19 18:50:52.478315892 +0100
@@ -8766,8 +8766,8 @@ convert_like_internal (conversion *convs
  
  	if (tree init = maybe_init_list_as_array (elttype, expr))

  {
-   elttype = cp_build_qualified_type
- (elttype, cp_type_quals (elttype) | TYPE_QUAL_CONST);
+   elttype = cp_build_qualified_type (elttype, cp_type_quals (elttype)
+   | TYPE_QUAL_CONST);


Emacs won't preserve this formatting, it will move the | left to line up 
with the (.



array = build_array_of_n_type (elttype, len);
array = build_vec_init_expr (array, init, complain);
array = get_target_expr (array);
@@ -8775,13 +8775,85 @@ convert_like_internal (conversion *convs
  }
else if (len)
  {
-   tree val; unsigned ix;
-
+   tree val;
+   unsigned ix;
tree new_ctor = build_constructor (init_list_type_node, NULL);
  
  	/* Convert all the elements.  */
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (expr), ix, val)
  {
+   if (TREE_CODE (val) == RAW_DATA_CST)
+ {
+   tree elt_type;
+   conversion *next;
+   /* For conversion to initializer_list or
+  initializer_list or initializer_list
+  we can optimize and keep RAW_DATA_CST with adjusted
+  type if we report narrowing errors if needed, for
+  others this converts each element separately.  */
+   if (convs->u.list[ix]->kind == ck_std
+   && (elt_type = convs->u.list[ix]->type)
+   && (TREE_CODE (elt_type) == INTEGER_TYPE
+   || is_byte_access_type (elt_type))
+   && TYPE_PRECISION (elt_type) == CHAR_BIT
+   && (next = next_conversion (convs->u.list[ix]))
+   && next->kind == ck_identity)
+ {
+   if (!TYPE_UNSIGNED (elt_type)
+   && (TYPE_UNSIGNED (TREE_TYPE (val))
+   || (TYPE_PRECISION (TREE_TYPE (val))
+   > CHAR_BIT)))


Is it possible to have a RAW_DATA_CST with elements larger than char?


+ for (int i = 0; i < RAW_DATA_LENGTH (val); ++i)
+   {
+ if (RAW_DATA_SCHAR_ELT (val, i) >= 0)
+   continue;
+ else if (complain & tf_error)
+   {
+ location_t loc
+   = cp_expr_loc_or_input_loc (val);
+ int savederrorcount = errorcount;
+ permerror_opt (loc, OPT_Wnarrowing,
+"narrowing conversion of "
+"%qd from %qH to %qI",
+RAW_DATA_UCHAR_ELT (val, i),
+TREE_TYPE (val), elt_type);
+ if (errorcount != savederrorcount)
+   return error_mark_node;
+   }
+ else
+   return error_mark_node;
+   }
+   tree sub = copy_node (val);
+   TREE_TYPE (sub) = elt_type;
+   CONSTRUCTOR_APPEND_ELT (CONSTRUCTOR_ELTS (new_ctor),
+   NULL_TREE, sub);
+ }
+   else
+ {
+   for (int i = 0; i < RAW_DATA_LENGTH (val); ++i)
+ {
+   tree elt
+ = build_int_cst (TREE_TYPE (val),
+  RAW_DATA_UCHAR_ELT (val, i));
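To make the narrowing loop above concrete, here is a stand-alone sketch outside GCC. The helper name `first_narrowing_byte` is hypothetical, and the sketch assumes 8-bit chars (`SCHAR_MAX == 127`). A byte from `#embed` that initializes a signed 8-bit element narrows exactly when its value exceeds `SCHAR_MAX`, which is what the patch's `RAW_DATA_SCHAR_ELT (val, i) >= 0` test detects from the other direction.

```cpp
#include <climits>
#include <cstddef>
#include <initializer_list>

// Hypothetical model of the narrowing check: BYTES are raw data in the range
// 0..255 being converted to a signed 8-bit element type.  A byte narrows
// when it is greater than SCHAR_MAX (i.e. its signed-char view would be
// negative).  Returns the index of the first such byte, or bytes.size ()
// if every byte fits.
constexpr std::size_t
first_narrowing_byte (std::initializer_list<unsigned char> bytes)
{
  std::size_t i = 0;
  for (unsigned char b : bytes)
    {
      if (b > SCHAR_MAX)
        return i;
      ++i;
    }
  return bytes.size ();
}
```

For example, `first_narrowing_byte ({1, 2, 200})` yields 2, the index of the first byte that would trigger the `-Wnarrowing` diagnostic.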

Re: [PATCH] c++, v2: Fix up maybe_init_list_as_array for RAW_DATA_CST [PR118124]

2025-01-09 Thread Jason Merrill


On 12/19/24 1:38 PM, Jakub Jelinek wrote:

On Thu, Dec 19, 2024 at 11:44:54AM -0500, Jason Merrill wrote:

--- gcc/cp/call.cc.jj   2024-12-19 16:10:12.977071898 +0100
+++ gcc/cp/call.cc  2024-12-19 16:55:40.953546502 +0100
@@ -4386,7 +4386,13 @@ maybe_init_list_as_array (tree elttype,
 if (!is_xible (INIT_EXPR, elttype, copy_argtypes))
   return NULL_TREE;
-  tree arr = build_array_of_n_type (init_elttype, CONSTRUCTOR_NELTS (init));
+  unsigned int len = CONSTRUCTOR_NELTS (init);
+  if (INTEGRAL_TYPE_P (init_elttype))
+    for (constructor_elt &e: CONSTRUCTOR_ELTS (init))
+      if (TREE_CODE (e.value) == RAW_DATA_CST)
+	len += RAW_DATA_LENGTH (e.value) - 1;


Really seems like we could use a function to ask how many elements a
CONSTRUCTOR initializes, perhaps as a wrapper around
categorize_ctor_elements?


I think categorize_ctor_elements is heavy-weight and computes tons of info
we don't need here, but more importantly does something different, it recurses
into each of the elements as well.  True, { { 0, 1 }, { 2, 3 } } would likely
fail braced_init_element_type, but still...  And it would be upset about ctors
with yet to be determined types.
Another question is if the function for this purpose should count
RANGE_EXPRs or not, e.g. I think convert_like_internal will simply not
handle them, when the loop is FOR_EACH_CONSTRUCTOR_VALUE it simply doesn't
consider them at all.  I think we currently reject
A a { 1, 2, [ 2 ... 6 ] = 3 };
And I'm not really sure a function like that would be useful for other FEs
or middle-end, given that C designated initializers can skip or initialize
elements out of order, so simply counting them isn't good enough, one would
need to find the highest index (implicit or explicit) or something similar.

This version just adds a static function so far used just here.

Also, I'm worried about build_array_of_n_type, which currently takes int
argument.  It is true that vectors, CONSTRUCTOR_ELTS etc. usually count
stuff in unsigned int or sometimes even in int, so without #embed
or RAW_DATA_CST it isn't really possible to have 2GB+ or 4GB+ elt
initializers (and without them it would be a compile time nightmare anyway).
With #embed/RAW_DATA_CST it isn't that hard to cross that boundary though,
so the patch changes it to uhwi.  One still needs to be careful not to
cause breakup of the huge RAW_DATA_CSTs, but at least simple
std::initializer_list from say 16GB #embed should work.


OK.


2024-12-19  Jakub Jelinek  

PR c++/118124
* cp-tree.h (build_array_of_n_type): Change second argument type
from int to unsigned HOST_WIDE_INT.
* tree.cc (build_array_of_n_type): Likewise.
* call.cc (count_ctor_elements): New function.
(maybe_init_list_as_array): Use it instead of CONSTRUCTOR_NELTS.
(convert_like_internal): Use length from init's type instead of
len when handling the maybe_init_list_as_array case.

* g++.dg/cpp0x/initlist-opt5.C: New test.

--- gcc/cp/cp-tree.h.jj 2024-12-19 18:47:01.98895 +0100
+++ gcc/cp/cp-tree.h	2024-12-19 19:12:10.252716265 +0100
@@ -8156,7 +8156,7 @@ extern tree build_aggr_init_expr  (tree,
  extern tree get_target_expr   (tree,
  					 tsubst_flags_t = tf_warning_or_error);
  extern tree build_cplus_array_type(tree, tree, int is_dep = -1);
-extern tree build_array_of_n_type  (tree, int);
+extern tree build_array_of_n_type  (tree, unsigned HOST_WIDE_INT);
  extern bool array_of_runtime_bound_p  (tree);
  extern bool vla_type_p(tree);
  extern tree build_array_copy  (tree);
--- gcc/cp/tree.cc.jj   2024-12-19 11:35:59.227312977 +0100
+++ gcc/cp/tree.cc  2024-12-19 19:12:34.291385303 +0100
@@ -1207,7 +1207,7 @@ build_cplus_array_type (tree elt_type, t
  /* Return an ARRAY_TYPE with element type ELT and length N.  */
  
  tree
-build_array_of_n_type (tree elt, int n)
+build_array_of_n_type (tree elt, unsigned HOST_WIDE_INT n)
  {
return build_cplus_array_type (elt, build_index_type (size_int (n - 1)));
  }
--- gcc/cp/call.cc.jj   2024-12-19 18:50:52.478315892 +0100
+++ gcc/cp/call.cc  2024-12-19 19:13:15.528817544 +0100
@@ -4325,6 +4325,20 @@ has_non_trivial_temporaries (tree expr)
return false;
  }
  
+/* Return number of initialized elements in CTOR.  */
+
+static unsigned HOST_WIDE_INT
+count_ctor_elements (tree ctor)
+{
+  unsigned HOST_WIDE_INT len = 0;
+  for (constructor_elt &e: CONSTRUCTOR_ELTS (ctor))
+    if (TREE_CODE (e.value) == RAW_DATA_CST)
+      len += RAW_DATA_LENGTH (e.value);
+    else
+      ++len;
+  return len;
+}
+
  /* We're initializing an array of ELTTYPE from INIT.  If it seems useful,
 return INIT as an array (of its own type) so the caller can initialize the
 target array in a loop.  */
@@ -4386,7 +4400,8 @@ maybe_init_list_as_array (tree elttype,
if (!is_xibl
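The element count that `count_ctor_elements` computes can be modeled outside the compiler roughly as below. `ctor_elt_model` and `count_elements` are hypothetical stand-ins for `constructor_elt` and the new static function; the point is only that a RAW_DATA_CST run contributes its full length, and that a 64-bit counter (mirroring the switch to `unsigned HOST_WIDE_INT`) can represent the length of a 16GB `#embed`, which an `int` could not.

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical model of one constructor element: either a single value or
// a RAW_DATA_CST-like run standing for RUN_LENGTH consecutive byte elements.
struct ctor_elt_model
{
  bool is_raw_data;
  std::uint64_t run_length;  // only meaningful when is_raw_data is true
};

// A raw-data run counts as its full length; any other element counts as one.
// The counter is 64-bit so multi-gigabyte runs do not overflow it.
constexpr std::uint64_t
count_elements (std::initializer_list<ctor_elt_model> elts)
{
  std::uint64_t len = 0;
  for (const ctor_elt_model &e : elts)
    len += e.is_raw_data ? e.run_length : 1;
  return len;
}
```

For instance, a value, a 64-byte run and another value give 66 elements, whereas `CONSTRUCTOR_NELTS` would report 3.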

Re: [PATCH] c++, gimplify: Clear zero padding in empty types [PR118002]

2025-01-09 Thread Jason Merrill


On 12/12/24 4:41 AM, Jakub Jelinek wrote:

Hi!

I believe we need to clear padding bits even in empty types when using
zero initialization,
https://eel.is/c++draft/dcl.init.general#6.2
doesn't have an exception for empty types.
I came to this when playing with an optimization for PR116416 to improve
tree-ssa/pr78687.C testcase back.

Initially I had in the patch also
--- gcc/cp/cp-gimplify.cc.jj	2024-12-11 12:46:32.958466985 +0100
+++ gcc/cp/cp-gimplify.cc   2024-12-11 16:23:11.598860505 +0100
@@ -674,7 +674,10 @@ cp_gimplify_expr (tree *expr_p, gimple_s
   TREE_OPERAND (*expr_p, 1) = build1 (VIEW_CONVERT_EXPR,
   TREE_TYPE (op0), op1);
-   else if (simple_empty_class_p (TREE_TYPE (op0), op1, code))
+   else if (simple_empty_class_p (TREE_TYPE (op0), op1, code)
+&& (TREE_CODE (*expr_p) != INIT_EXPR
+|| TREE_CODE (op1) != CONSTRUCTOR
+|| !CONSTRUCTOR_ZERO_PADDING_BITS (op1)))
   {
 while (TREE_CODE (op1) == TARGET_EXPR)
   /* We're disconnecting the initializer from its target,
hunk but that regressed the g++.dg/init/empty1.C testcase, where
the empty bases are overlaid with other data members and the test
wants to ensure that the non-static data members aren't overwritten
when initializing the base padding.
On the other side, with this patch and the cp-gimplify.cc hunk plus
the optimization I'm going to post we change
-  D.10177 = {};
+  D.10177._storage.D.9582.D.9163._tail.D.9221._tail.D.9280._head = {};
in the gimple dump (option_2 is zero initialized there), while with
just this patch and the optimization and no cp-gimplify.cc hunk
   D.10177 = {};
is simply removed altogether and no clearing done.
So, I'm not 100% sure if what the patch below does is 100% safe not to
overwrite the overlaid stuff, but at least testsuite doesn't reveal
anything further, and on the other side clears padding in everything it
should.

Earlier version of this patch (with the cp-gimplify.cc hunk and
without the TYPE_SIZE/integer_zerop subconditions) has been bootstrapped
and regtested on x86_64-linux and i686-linux, this version just tested
on the set of tests which regressed.

2024-12-12  Jakub Jelinek  

PR c++/118002
gcc/
* gimplify.cc (gimplify_init_constructor, gimplify_modify_expr):
Don't optimize away INIT_EXPRs of empty classes with rhs CONSTRUCTOR
with CONSTRUCTOR_ZERO_PADDING_BITS.
gcc/testsuite/
* g++.dg/cpp0x/zero-init2.C: New test.

--- gcc/gimplify.cc.jj  2024-12-07 11:35:49.475439705 +0100
+++ gcc/gimplify.cc 2024-12-12 09:38:03.865543272 +0100
@@ -6094,8 +6094,15 @@ gimplify_init_constructor (tree *expr_p,
   not emitted an assignment, do so now.   */
if (*expr_p
/* If the type is an empty type, we don't need to emit the
-assignment. */
-  && !is_empty_type (TREE_TYPE (TREE_OPERAND (*expr_p, 0
+assignment.  Except when rhs is a CONSTRUCTOR with
+CONSTRUCTOR_ZERO_PADDING_BITS.  */
+  && (!is_empty_type (TREE_TYPE (TREE_OPERAND (*expr_p, 0)))
+ || (is_init_expr
+ && TREE_CODE (TREE_OPERAND (*expr_p, 1)) == CONSTRUCTOR
+ && CONSTRUCTOR_ZERO_PADDING_BITS (TREE_OPERAND (*expr_p, 1))
+ && TYPE_SIZE (TREE_TYPE (TREE_OPERAND (*expr_p, 0)))
+ && !integer_zerop (TYPE_SIZE (TREE_TYPE (TREE_OPERAND (*expr_p,
+0)))
  {
tree lhs = TREE_OPERAND (*expr_p, 0);
tree rhs = TREE_OPERAND (*expr_p, 1);
@@ -6685,7 +6692,14 @@ gimplify_modify_expr (tree *expr_p, gimp
/* Don't do this for calls that return addressable types, expand_call
 relies on those having a lhs.  */
&& !(TREE_ADDRESSABLE (TREE_TYPE (*from_p))
-  && TREE_CODE (*from_p) == CALL_EXPR))
+  && TREE_CODE (*from_p) == CALL_EXPR)
+  /* And similarly don't do that for rhs being CONSTRUCTOR with
+CONSTRUCTOR_ZERO_PADDING_BITS set.  */
+  && !(TREE_CODE (*expr_p) == INIT_EXPR
+  && TREE_CODE (*to_p) == CONSTRUCTOR
+  && CONSTRUCTOR_ZERO_PADDING_BITS (*to_p)


Shouldn't these two *to_p be *from_p?  Is this hunk actually doing 
anything as is?



+  && TYPE_SIZE (TREE_TYPE (*from_p))
+  && !integer_zerop (TYPE_SIZE (TREE_TYPE (*from_p)
  {
gimplify_stmt (from_p, pre_p);
gimplify_stmt (to_p, pre_p);
--- gcc/testsuite/g++.dg/cpp0x/zero-init2.C.jj	2024-12-11 16:50:26.513845473 +0100
+++ gcc/testsuite/g++.dg/cpp0x/zero-init2.C	2024-12-11 16:50:45.879572789 +0100
@@ -0,0 +1,37 @@
+// PR c++/118002
+// { dg-do run { target c++11 } }
+// { dg-options "-O0" }
+
+struct S {};
+struct T { S a, b, c, d, e, f, g, h; };
+struct U { T i, j, k, l, m, n, o, p; };
+
+[[gnu::noipa]] void
+foo (struct U *)
+{
+}
+
+[[gnu::noipa]] void
+bar ()
+{
+  U u[4];
+  __builtin_mem
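To see why an "empty" type still has bytes for zero-initialization to clear, the sketch below reuses the types from the new zero-init2.C test. It assumes the usual layout (one byte per empty subobject and no extra padding, as on x86_64-linux); the assertions are only an illustration, not part of the patch.

```cpp
// Types copied from the new zero-init2.C test.
struct S {};
struct T { S a, b, c, d, e, f, g, h; };
struct U { T i, j, k, l, m, n, o, p; };

// Members of the same type must have distinct addresses, so each S member
// of T occupies a byte of its own and T is 8 bytes; U holds eight Ts.
// None of these bytes carries a value: all of them are padding, which
// zero-initialization ([dcl.init.general]/6.2) is still required to clear.
static_assert (sizeof (T) == 8, "one byte per S subobject");
static_assert (sizeof (U) == 64, "eight 8-byte T subobjects");
```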

[PATCH] c++: modules, generic lambda, constexpr if

2025-01-09 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In std/ranges/concat/1.cc we end up instantiating
concat_view::iterator::operator-, which has nested generic lambdas, where
the innermost is all constexpr if.  tsubst_lambda_expr propagates
the returns_* flags for generic lambdas since we might not substitute into
the whole function, as in this case with constexpr if.  But the module
wasn't preserving those flags, and so the importer gave a bogus "no return
statement" diagnostic.

gcc/cp/ChangeLog:

* module.cc (trees_out::write_function_def): Write returns* flags.
(struct post_process_data): Add returns_* flags.
(trees_in::read_function_def): Set them.
(module_state::read_cluster): Use them.

gcc/testsuite/ChangeLog:

* g++.dg/modules/constexpr-if-1_a.C: New test.
* g++.dg/modules/constexpr-if-1_b.C: New test.
---
 gcc/cp/module.cc  | 24 ---
 .../g++.dg/modules/constexpr-if-1_a.C | 14 +++
 .../g++.dg/modules/constexpr-if-1_b.C |  8 +++
 3 files changed, 43 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/constexpr-if-1_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/constexpr-if-1_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 5350e6c4bad..0533a2bcf2c 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -2929,6 +2929,10 @@ struct post_process_data {
   tree decl;
   location_t start_locus;
   location_t end_locus;
+  bool returns_value;
+  bool returns_null;
+  bool returns_abnormally;
+  bool infinite_loop;
 };
 
 /* Tree stream reader.  Note that reading a stream doesn't mark the
@@ -12263,10 +12267,16 @@ trees_out::write_function_def (tree decl)
 {
   unsigned flags = 0;
 
+  flags |= 1 * DECL_NOT_REALLY_EXTERN (decl);
   if (f)
-   flags |= 2;
-  if (DECL_NOT_REALLY_EXTERN (decl))
-   flags |= 1;
+   {
+ flags |= 2;
+ /* These flags are needed in tsubst_lambda_expr.  */
+ flags |= 4 * f->language->returns_value;
+ flags |= 8 * f->language->returns_null;
+ flags |= 16 * f->language->returns_abnormally;
+ flags |= 32 * f->language->infinite_loop;
+   }
 
   u (flags);
 }
@@ -12314,6 +12324,10 @@ trees_in::read_function_def (tree decl, tree maybe_template)
 {
   pdata.start_locus = state->read_location (*this);
   pdata.end_locus = state->read_location (*this);
+  pdata.returns_value = flags & 4;
+  pdata.returns_null = flags & 8;
+  pdata.returns_abnormally = flags & 16;
+  pdata.infinite_loop = flags & 32;
 }
 
   if (get_overrun ())
@@ -16232,6 +16246,10 @@ module_state::read_cluster (unsigned snum)
   cfun->language->base.x_stmt_tree.stmts_are_full_exprs_p = 1;
   cfun->function_start_locus = pdata.start_locus;
   cfun->function_end_locus = pdata.end_locus;
+  cfun->language->returns_value = pdata.returns_value;
+  cfun->language->returns_null = pdata.returns_null;
+  cfun->language->returns_abnormally = pdata.returns_abnormally;
+  cfun->language->infinite_loop = pdata.infinite_loop;
 
   if (abstract)
;
diff --git a/gcc/testsuite/g++.dg/modules/constexpr-if-1_a.C b/gcc/testsuite/g++.dg/modules/constexpr-if-1_a.C
new file mode 100644
index 000..80a064f4d39
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/constexpr-if-1_a.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++20 } }
+// { dg-additional-options -fmodules }
+
+export module M;
+
+export
+template 
+inline void f()
+{
+  []() -> int {
+if constexpr (M > 0) { return true; }
+else { return false; }
+  };
+}
diff --git a/gcc/testsuite/g++.dg/modules/constexpr-if-1_b.C b/gcc/testsuite/g++.dg/modules/constexpr-if-1_b.C
new file mode 100644
index 000..af285da79ac
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/constexpr-if-1_b.C
@@ -0,0 +1,8 @@
+// { dg-additional-options -fmodules }
+
+import M;
+
+int main()
+{
+  f();
+}

base-commit: fab96de044f1f023f52d43af866205d17d8895fb
-- 
2.47.1
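The flag word that `write_function_def` writes and `read_function_def` decodes can be sketched as a round trip. `fn_flags`, `pack` and `unpack` are hypothetical names; the bit assignments (1, 2, 4, 8, 16, 32) follow the patch, including the detail that the returns_* bits are only written when the function has a body (the `if (f)` branch).

```cpp
// Hypothetical stand-in for the per-function state streamed by module.cc.
struct fn_flags
{
  bool not_really_extern;
  bool has_body;             // corresponds to the `if (f)` branch
  bool returns_value;
  bool returns_null;
  bool returns_abnormally;
  bool infinite_loop;
};

// Multiplying a bool (0 or 1) by a power of two sets that bit only when the
// bool is true, as in `flags |= 4 * f->language->returns_value`.
constexpr unsigned
pack (fn_flags f)
{
  unsigned flags = 0;
  flags |= 1 * f.not_really_extern;
  if (f.has_body)
    {
      flags |= 2;
      flags |= 4 * f.returns_value;
      flags |= 8 * f.returns_null;
      flags |= 16 * f.returns_abnormally;
      flags |= 32 * f.infinite_loop;
    }
  return flags;
}

// Decode the way read_function_def does: `flags & 4` is nonzero exactly
// when bit 2 was set.
constexpr fn_flags
unpack (unsigned flags)
{
  return fn_flags{ (flags & 1) != 0, (flags & 2) != 0, (flags & 4) != 0,
                   (flags & 8) != 0, (flags & 16) != 0, (flags & 32) != 0 };
}
```

For example, a function with a body that returns a value and contains an infinite loop packs to 1 + 2 + 4 + 32 = 39 when DECL_NOT_REALLY_EXTERN is also set.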

Re: [PATCH] RISC-V: testsuite: fix target selector for sync_char_short

2025-01-09 Thread Jeff Law





On 1/9/25 11:33 AM, Edwin Lu wrote:

The effective-target selector for riscv on sync_char_short did not
check to see if atomics were enabled. As a result, these test cases were
run on targets without the A extension. Add additional checks for zalrsc
or zabha extensions.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Fix effective target sync_char_short
for riscv*-*-*

OK
jeff

RE: [PATCH 2/2][libstdc++]: Adjust probabilities of hashmap loop conditions

2025-01-09 Thread Tamar Christina

Hi,

Sorry for the slow reply.

I ran the numbers flipping the probabilities.  Most of it was in the noise and
not statistically relevant.
However there were some outliers:

+-----------+-----------+---------------+-------+------------+
| benchmark | container | Type          | Size  | difference |
+-----------+-----------+---------------+-------+------------+
| find      | unordered | string        | 13    | 42.00%     |
| find      | unordered | string        | 13    | 41.30%     |
| find      | unordered | uint64_t      | 11253 | 21.40%     |
| find      | unordered | uint64_t      | 11253 | 20.90%     |
| find      | unordered | shared string | 11253 | 18.60%     |
| find      | unordered | uint64_t      | 11253 | 15.60%     |
| find      | unordered | uint64_t      | 13    | 13.50%     |
| find many | unordered | string        | 345   | 11.20%     |
| find many | unordered | string        | 345   | 8.60%      |
| find      | vector    | uint64_t      | 11253 | -23.00%    |
| find      | custom    | uint64_t      | 11253 | -18.40%    |
+-----------+-----------+---------------+-------+------------+

It looks like this happens for unordered maps when the number of entries is
either small or the element is placed in what I assume to be a crowded bucket.

It does seem to be beneficial for some user-defined datatypes, I assume due to
some IPA shenanigans.  But overall there were more and larger wins using a
probability of 0 rather than 1.

Kind regards,
Tamar

From: Tamar Christina 
Sent: Thursday, January 2, 2025 11:02 PM
To: François Dumont ; Jonathan Wakely 
Cc: gcc-patches@gcc.gnu.org; nd ; libstd...@gcc.gnu.org
Subject: RE: [PATCH 2/2][libstdc++]: Adjust probabilities of hashmap loop conditions

Hi,

> It means that we consider that hasher is not perfect, we have several entries
> in the same bucket. Shouldn't we reward those that are spending time on their
> hasher to make it as perfect as possible?

I don’t think it makes much of a difference for a perfect hashtable as you’re
exiting on the first iteration anyway.  So taking the branch on iteration 0
shouldn’t be an issue.

> Said differently, does using 1 in the __builtin_expect change the produced
> figures a lot?

I expect it to be slower since the entire loop is no longer in a single fetch 
block. But I’ll run the numbers with this change.

Thanks,
Tamar

From: François Dumont <frs.dum...@gmail.com>
Sent: Monday, December 30, 2024 5:08 PM
To: Jonathan Wakely <jwak...@redhat.com>
Cc: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org; nd <n...@arm.com>; libstd...@gcc.gnu.org
Subject: Re: [PATCH 2/2][libstdc++]: Adjust probabilities of hashmap loop conditions

Sorry to react so late on this patch.

I'm only surprised by the expected result of the added __builtin_expect which 
is 0.

It means that we consider that hasher is not perfect, we have several entries
in the same bucket. Shouldn't we reward those that are spending time on their
hasher to make it as perfect as possible?

Said differently, does using 1 in the __builtin_expect change the produced
figures a lot?

François

On Wed, Dec 18, 2024 at 5:01 PM Jonathan Wakely <jwak...@redhat.com> wrote:
On Wed, 18 Dec 2024 at 14:14, Tamar Christina <tamar.christ...@arm.com> wrote:
>
> > > > e791e52ec329277474f3218d8a44cd37ded14ac3..8101d868d0c5f7ac4f97931affcf71d826c88094 100644
> > > > --- a/libstdc++-v3/include/bits/hashtable.h
> > > > +++ b/libstdc++-v3/include/bits/hashtable.h
> > > > @@ -2171,7 +2171,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >   if (this->_M_equals(__k, __code, *__p))
> > > > return __prev_p;
> > > >
> > > > - if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> > > > + if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt, 0))
> > > > break;
> > > >   __prev_p = __p;
> > > > }
> > > > @@ -2201,7 +2201,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > > if (this->_M_equals_tr(__k, __code, *__p))
> > > >   return __prev_p;
> > > >
> > > > -   if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> > > > +   if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt, 0))
> > > >   break;
> > > > __prev_p = __p;
> > > >   }
> > > > @@ -2228,7 +2228,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > > pointer_to(const_cast<__node_base&>(_M_before_begin));
> > > >   while (__loc._M_before->_M_nxt)
> > > > {
> > > > - if (this->_M_key_equals(__k, *__loc._M_node()))
> > > > + if (__builtin_expect (this->_M_key_equals(__k, *__loc._M_node()), 1))
> > > > return __loc;
> > > >   __loc._M_before = __loc._M_before->_M_nxt;
> > > > }
> > >
> > >

Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-09 Thread Richard Sandiford

Akram Ahmad  writes:
> Hi Kyrill,
>
> Thanks for the feedback on V2. I found a pattern which works for
> the open-coded signed arithmetic, and I've implemented the other
> feedback you provided as well.
>
> I've sent the modified patch in this thread as the SVE patch [2/2]
> hasn't been changed, but I'm happy to send the entire V3 patch
> series as a new thread if that's easier. Patch continues below.
>
> If this is OK, please could you commit on my behalf?
>
> Many thanks,
>
> Akram
>
> ---
>
> This renames the existing {s,u}q{add,sub} instructions to use the
> standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
> IFN_SAT_SUB.
>
> The NEON intrinsics for saturating arithmetic and their corresponding
> builtins are changed to use these standard names too.
>
> Using the standard names for the instructions causes 32 and 64-bit
> unsigned scalar saturating arithmetic to use the NEON instructions,
> resulting in an additional (and inefficient) FMOV to be generated when
> the original operands are in GP registers. This patch therefore also
> restores the original behaviour of using the adds/subs instructions
> in this circumstance.
>
> Furthermore, this patch introduces a new optimisation for signed 32
> and 64-bit scalar saturating arithmetic which uses adds/subs in place
> of the NEON instruction.
>
> Addition, before:
>   fmov    d0, x0
>   fmov    d1, x1
>   sqadd   d0, d0, d1
>   fmov    x0, d0
>
> Addition, after:
>   asr     x2, x1, 63
>   adds    x0, x0, x1
>   eor     x2, x2, 0x8000
>   csinv   x0, x0, x2, vc
>
> In the above example, subtraction replaces the adds with subs and the
> csinv with csel. The 32-bit case follows the same approach. Arithmetic
> with a constant operand is simplified further by directly storing the
> saturating limit in the temporary register, resulting in only three
> instructions being used. It is important to note that this only works
> when early-ra is disabled due to an early-ra bug which erroneously
> assigns FP registers to the operands; if early-ra is enabled, then the
> original behaviour (NEON instruction) occurs.

This can be fixed by changing:

    case CT_REGISTER:
      if (REG_P (op) || SUBREG_P (op))
        return true;
      break;

to:

    case CT_REGISTER:
      if (REG_P (op) || SUBREG_P (op) || GET_CODE (op) == SCRATCH)
        return true;
      break;

But I can test & post that as a follow-up if you prefer.

> Additional tests are written for the scalar and Adv. SIMD cases to
> ensure that the correct instructions are used. The NEON intrinsics are
> already tested elsewhere. The signed scalar case is also tested with
> an execution test to check the results.

It looks like this is based on a relatively old version of trunk.
(Probably from the same time as v1?  Sorry that this has been in review
for a while.)

Otherwise it mostly LGTM too, but some additional comments on top of Kyrill's:

> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index e456f693d2f..ef5e2823673 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -5230,15 +5230,225 @@
>  )
>  ;; q
>  
> -(define_insn "aarch64_q"
> -  [(set (match_operand:VSDQ_I 0 "register_operand" "=w")
> - (BINQOPS:VSDQ_I (match_operand:VSDQ_I 1 "register_operand" "w")
> - (match_operand:VSDQ_I 2 "register_operand" "w")))]
> +(define_insn "s3"
> +  [(set (match_operand:VSDQ_I_QI_HI 0 "register_operand" "=w")
> +	(BINQOPS:VSDQ_I_QI_HI (match_operand:VSDQ_I_QI_HI 1 "register_operand" "w")
> +			      (match_operand:VSDQ_I_QI_HI 2 "register_operand" "w")))]

Very minor, wouldn't have raised it if it wasn't for the comments
below, but: it'd be good to put the input match_operands on their own
line now that this overflows 80 chars.

>"TARGET_SIMD"
>"q\\t%0, %1, %2"
>[(set_attr "type" "neon_q")]
>  )
>  
> +(define_expand "s3"
> +  [(parallel [(set (match_operand:GPI 0 "register_operand")
> + (SBINQOPS:GPI (match_operand:GPI 1 "register_operand")
> +   (match_operand:GPI 2 "aarch64_plus_operand")))
> +(clobber (scratch:GPI))
> +(clobber (reg:CC CC_REGNUM))])]

Likewise very minor, but I think a more usual formatting would be:

  [(parallel
 [(set (match_operand:GPI 0 "register_operand")
   (SBINQOPS:GPI (match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")))
 (clobber (scratch:GPI))
 (clobber (reg:CC CC_REGNUM))])]

> +)
> +
> +;; Introducing a temporary GP reg allows signed saturating arithmetic with GPR
> +;; operands to be calculated without the use of costly transfers to and from FP
> +;; registers.  For example, saturating addition usually uses three FMOVs:
> +;;
> +;;   fmov    d0, x0
> +;;   fmov    d1, x1
> +;;   sqadd   d0, d0, d1
> +;;   fmov    x0, d0
> +;;
>
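The "after" sequence from the cover letter can be modeled in C++ to check the saturation logic. This sketch assumes the `eor` constant is the 64-bit sign bit (0x8000000000000000), which is what makes the inverted limit come out as INT64_MAX or INT64_MIN; `sat_add_s64` is a hypothetical name, and the function models the four instructions, not the expander itself.

```cpp
#include <cstdint>

// Scalar model of asr/adds/eor/csinv for signed 64-bit saturating addition.
constexpr std::int64_t
sat_add_s64 (std::int64_t a, std::int64_t b)
{
  const std::uint64_t ua = std::uint64_t (a), ub = std::uint64_t (b);
  // adds: wrapping sum; the V flag is set iff A and B have the same sign
  // and the sum's sign differs from A's.
  const std::uint64_t sum = ua + ub;
  const bool overflow = ((~(ua ^ ub) & (ua ^ sum)) >> 63) != 0;
  // asr x2, x1, 63: all ones if B is negative, otherwise zero.
  const std::uint64_t mask = b < 0 ? ~std::uint64_t (0) : 0;
  // eor with the sign bit: INT64_MIN's bit pattern if B >= 0,
  // INT64_MAX's if B < 0.
  const std::uint64_t limit = mask ^ (std::uint64_t (1) << 63);
  // csinv ..., vc: keep the sum when V is clear, else take ~limit, which is
  // INT64_MAX for positive overflow and INT64_MIN for negative overflow.
  return std::int64_t (overflow ? ~limit : sum);
}
```

Only operand B's sign is needed to pick the limit, because signed addition can overflow only when both operands have the same sign.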

[PATCH]AArch64: correct Cortex-X4 MIDR

2025-01-09 Thread Tamar Christina

Hi All,

The Parts Num field for the MIDR for Cortex-X4 is wrong.  It's currently the
parts number for a Cortex-A720 (which does have the right number).

The correct number can be found in the Cortex-X4 Technical Reference Manual [1]
on page 382 in Issue Number 5.

[1] https://developer.arm.com/documentation/102484/latest/

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? and backport to GCC-14?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Fix cortex-x4 parts
num.

---
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index caf61437d1805254b7453e74ea27d2ca8f55d32b..5ac81332b67c9612acf9dde144aee5b0db8d9f7a 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -193,7 +193,7 @@ AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, I8M
 
 AARCH64_CORE("cortex-x3",  cortexx3, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, I8MM, BF16), neoversev2, 0x41, 0xd4e, -1)
 
-AARCH64_CORE("cortex-x4",  cortexx4, cortexa57, V9_2A,  (SVE2_BITPERM, MEMTAG, PROFILE), neoversev3, 0x41, 0xd81, -1)
+AARCH64_CORE("cortex-x4",  cortexx4, cortexa57, V9_2A,  (SVE2_BITPERM, MEMTAG, PROFILE), neoversev3, 0x41, 0xd82, -1)
 AARCH64_CORE("cortex-x925", cortexx925, cortexa57, V9_2A,  (SVE2_BITPERM, MEMTAG, PROFILE), cortexx925, 0x41, 0xd85, -1)
 
 AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)





Re: [PATCH] c/c++: UX improvements to 'too {few,many} arguments' errors (v3) [PR118112]

2025-01-09 Thread Jason Merrill


On 1/9/25 7:00 PM, David Malcolm wrote:

On Thu, 2025-01-09 at 14:21 -0500, Jason Merrill wrote:

Thanks for taking a look...


On 1/9/25 2:11 PM, David Malcolm wrote:

@@ -4743,7 +4769,38 @@ convert_arguments (tree typelist, vec **values, tree fndecl,
 if (typetail && typetail != void_list_node)
{
  if (complain & tf_error)
-   error_args_num (input_location, fndecl, /*too_many_p=*/false);
+   {
+ /* Not enough args.
+Determine minimum number of arguments required.  */
+ int min_expected_num = 0;
+ bool at_least_p = false;
+ tree iter = typelist;
+ while (true)
+   {
+ if (!iter)
+   {
+ /* Variadic arguments; stop iterating.  */
+ at_least_p = true;
+ break;
+   }
+ if (iter == void_list_node)
+   /* End of arguments; stop iterating.  */
+   break;
+ if (fndecl && TREE_PURPOSE (iter)
+ && TREE_CODE (TREE_PURPOSE (iter)) != DEFERRED_PARSE)



Why are you checking DEFERRED_PARSE?  That indicates a default argument,
even if it isn't parsed yet.  For that case we should get the error in
convert_default_arg rather than pretend there's no default argument.


I confess that the check for DEFERRED_PARSE was a rather mindless copy
and paste by me from the "See if there are default arguments that can be
used" logic earlier in the function.

I've removed it in the latest version of the patch.
  

+   {
+ /* Found a default argument; skip this one when
+counting minimum required.  */
+ at_least_p = true;
+ iter = TREE_CHAIN (iter);
+ continue;


We could break here, once you have a default arg the rest of the
parms need to have them as well.


Indeed; I've updated this in the latest version of the patch, so
we break out as soon as we see an arg with a non-null TREE_PURPOSE.




+   }
+ ++min_expected_num;
+ iter = TREE_CHAIN (iter);
+   }
+ error_args_num (input_location, fndecl,
+ min_expected_num, actual_num, at_least_p);
+   }
  return -1;
}


Here's a v3 version of the patch, which is currently going through
my tester.

OK for trunk if it passes bootstrap & regtesting?


OK.


Thanks
Dave



Consider this case of a bad call to a callback function (perhaps
due to C23 changing the meaning of () in function decls):

struct p {
 int (*bar)();
};

void baz() {
 struct p q;
 q.bar(1);
}

Before this patch the C frontend emits:

t.c: In function 'baz':
t.c:7:5: error: too many arguments to function 'q.bar'
 7 | q.bar(1);
   | ^

and the C++ frontend emits:

t.c: In function 'void baz()':
t.c:7:10: error: too many arguments to function
 7 | q.bar(1);
   | ~^~~

neither of which give the user much help in terms of knowing what
was expected, and where the relevant declaration is.

With this patch the C frontend emits:

t.c: In function 'baz':
t.c:7:5: error: too many arguments to function 'q.bar'; expected 0, have 1
 7 | q.bar(1);
   | ^ ~
t.c:2:15: note: declared here
 2 | int (*bar)();
   |   ^~~

(showing the expected vs actual counts, the pertinent field decl, and
underlining the first extraneous argument at the callsite)

and the C++ frontend emits:

t.c: In function 'void baz()':
t.c:7:10: error: too many arguments to function; expected 0, have 1
 7 | q.bar(1);
   | ~^~~

(showing the expected vs actual counts; the other data was not accessible
without a more invasive patch)

Similarly, the patch updates the "too few arguments" case to also
show expected vs actual counts.  Doing so requires a tweak to the
wording to say "at least" for the case of variadic fns, and for C++ fns
with default args, where e.g. previously the C FE emitted:

s.c: In function 'test':
s.c:5:3: error: too few arguments to function 'callee'
 5 |   callee ();
   |   ^~
s.c:1:6: note: declared here
 1 | void callee (const char *, ...);
   |  ^~

with this patch it emits:

s.c: In function 'test':
s.c:5:3: error: too few arguments to function 'callee'; expected at least 1, have 0
 5 |   callee ();
   |   ^~
s.c:1:6: note: declared here
 1 | void callee (const char *, ...);
   |  ^~
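The new message shape (plain "expected N" versus "expected at least N" when the callee is variadic or has default arguments) can be illustrated with a small formatting helper. This is a hypothetical stand-alone sketch, not the code in c-typeck.cc: `format_arg_count_msg` and its parameters are invented for illustration, and the real front ends emit these messages through GCC's diagnostic machinery rather than snprintf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not GCC's API): format an argument-count error in
   the style shown above.  CALLEE may be NULL or empty when no name is
   available (as in the C++ output).  AT_LEAST is true when EXPECTED is
   only a lower bound, i.e. the callee is variadic or has default
   arguments.  Returns the snprintf result.  */
static int
format_arg_count_msg (char *buf, size_t size, bool too_many,
                      const char *callee, int expected, bool at_least,
                      int have)
{
  char name[128] = "";

  assert (buf != NULL && size > 0);
  if (callee != NULL && strlen (callee) > 0)
    snprintf (name, sizeof name, " '%s'", callee);
  return snprintf (buf, size,
                   "too %s arguments to function%s; expected %s%d, have %d",
                   too_many ? "many" : "few", name,
                   at_least ? "at least " : "", expected, have);
}
```

Called with `too_many = false`, `callee = "callee"`, `expected = 1`, `at_least = true` and `have = 0`, it produces the same text as the new C diagnostic for s.c.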

gcc/c/ChangeLog:
PR c/118112
* c-typeck.cc (inform_declaration): Add "function_expr" param and
use it for cases where we couldn't show the function decl to show
field decls for callbacks.
(build_function_call_vec): Add missing auto_diagnostic_group.
Update for new param of inform_declaration.

Re: [PATCH] c++, v2: Fix up ICEs on constexpr inline asm strings in templates [PR118277]

2025-01-09 Thread Jason Merrill


On 1/8/25 9:19 AM, Jakub Jelinek wrote:

On Tue, Jan 07, 2025 at 03:20:35PM -0800, Andi Kleen wrote:

There is one case I didn't handle and I'd like to discuss.
The initial commit to enable this new extension also changed
cp_parser_asm_specification_opt to use cp_parser_asm_string_expression.
That function doesn't have anything to do with asm statements though,
it is about asm redirection of declarations.


I don't know of a use case for this, so I guess it can be rejected.


Ok.


@@ -30067,7 +30063,11 @@ cp_parser_asm_specification_opt (cp_parser* parser)
parens.require_open (parser);
  
/* Look for the string-literal.  */

+  token = cp_lexer_peek_token (parser->lexer);
tree asm_specification = cp_parser_asm_string_expression (parser);
+  if (TREE_CODE (asm_specification) != STRING_CST)
+error_at (token->location,
+ "% specification for declaration must be string");


Much easier to just revert your 2024-06-24 changes in that function.


Since you add a return of error_mark_node to finish_asm_stmt you
also need this patchlet:


You're right (just in case also changed it in parser.cc).


It needs a test case with constexpr errors too. In my version I had
a lot of trouble with them.


I've ported my c++26/static_assert1.C testcase for this, just left out

#if  __cpp_constexpr_dynamic_alloc >= 201907L
struct T {
   const char *d = init ();
   constexpr int size () const { return 4; }
   constexpr const char *data () const { return d; }
   constexpr const char *init () const { return new char[4] { 't', 'e', 's', 't' }; }
   constexpr ~T () { delete[] d; }
};
#endif

void
foo ()
{
#if  __cpp_constexpr_dynamic_alloc >= 201907L
   asm ((T{}));
#endif
}

part which I think should work (it works with static_assert) but doesn't
really work with asm.  Will look at it incrementally, just need to work on
some mass rebuild ICEs and backporting now.

So far just lightly tested, ok for trunk if it passes full
bootstrap/regtest?


OK.


2025-01-08  Jakub Jelinek  

PR c++/118277
* cp-tree.h (finish_asm_string_expression): Declare.
* semantics.cc (finish_asm_string_expression): New function.
(finish_asm_stmt): Use it.
* parser.cc (cp_parser_asm_string_expression): Likewise.
Wrap string into PAREN_EXPR in the ("") case.
(cp_parser_asm_definition): Don't ICE if finish_asm_stmt
returns error_mark_node.
(cp_parser_asm_specification_opt): Revert 2024-06-24 changes.
* pt.cc (tsubst_stmt): Don't ICE if finish_asm_stmt returns
error_mark_node.

* g++.dg/cpp1z/constexpr-asm-4.C: New test.
* g++.dg/cpp1z/constexpr-asm-5.C: New test.

--- gcc/cp/cp-tree.h.jj 2025-01-07 13:13:31.042023537 +0100
+++ gcc/cp/cp-tree.h	2025-01-08 12:17:32.349673711 +0100
@@ -7946,6 +7946,7 @@ enum {
  extern tree begin_compound_stmt   (unsigned int);
  
  extern void finish_compound_stmt		(tree);
+extern tree finish_asm_string_expression   (location_t, tree);
  extern tree finish_asm_stmt   (location_t, int, tree, tree,
 tree, tree, tree, bool, bool);
  extern tree finish_label_stmt (tree);
--- gcc/cp/semantics.cc.jj  2025-01-07 13:13:31.079023023 +0100
+++ gcc/cp/semantics.cc 2025-01-08 14:51:52.583352223 +0100
@@ -2133,6 +2133,29 @@ finish_compound_stmt (tree stmt)
add_stmt (stmt);
  }
  
+/* Finish an asm string literal, which can be a string literal
+   or parenthesized constant expression.  Extract the string literal
+   from the latter.  */
+
+tree
+finish_asm_string_expression (location_t loc, tree string)
+{
+  if (string == error_mark_node
+      || TREE_CODE (string) == STRING_CST
+      || processing_template_decl)
+    return string;
+  string = cxx_constant_value (string, tf_error);
+  if (TREE_CODE (string) == STRING_CST)
+    string = build1_loc (loc, PAREN_EXPR, TREE_TYPE (string),
+                         string);
+  cexpr_str cstr (string);
+  if (!cstr.type_check (loc))
+    return error_mark_node;
+  if (!cstr.extract (loc, string))
+    string = error_mark_node;
+  return string;
+}
+
  /* Finish an asm-statement, whose components are a STRING, some
 OUTPUT_OPERANDS, some INPUT_OPERANDS, some CLOBBERS and some
 LABELS.  Also note whether the asm-statement should be
@@ -2159,6 +2182,26 @@ finish_asm_stmt (location_t loc, int vol
  
oconstraints = XALLOCAVEC (const char *, noutputs);
  
+  string = finish_asm_string_expression (cp_expr_loc_or_loc (string, loc),
+                                         string);
+  if (string == error_mark_node)
+    return error_mark_node;
+  for (int i = 0; i < 2; ++i)
+    for (t = i ? input_operands : output_operands; t; t = TREE_CHAIN (t))
+      {
+        tree s = TREE_VALUE (TREE_PURPOSE (t));
+        s = finish_asm_string_expression (cp_expr_loc_or_loc (s, loc), s);
+        if (s == error_mark_nod

Re: [PATCH] c++: Fix up modules handling of namespace scope structured bindings

2025-01-09 Thread Jason Merrill


On 1/7/25 2:48 PM, Jakub Jelinek wrote:

Hi!

With the following patch I actually get a simple namespace scope structured
binding working with modules.

The core_vals change ensures we actually save/restore DECL_VALUE_EXPR even
for namespace scope vars.  The get_merge_kind change is based on the
assumption that structured bindings are always unique (one can't redeclare
them); without it we ICE because their base vars have no name.
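The core_vals change works because the writer and the reader test the same condition: trees_out writes DECL_VALUE_EXPR only when DECL_HAS_VALUE_EXPR_P holds, and trees_in reads it back under the identical test. A minimal sketch of that symmetric pattern in plain C follows; `struct var`, `struct stream`, `write_var` and `read_var` are hypothetical stand-ins, not GCC's streaming API, and the flag is assumed to have been streamed earlier, as decl flags are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STREAM_CAP 16

/* Hypothetical stand-in for a VAR_DECL with an optional value expr.  */
struct var
{
  bool has_value_expr;  /* assumed to be streamed before the payload */
  int value_expr;       /* optional payload */
};

struct stream
{
  int buf[STREAM_CAP];
  size_t pos;
};

/* Writer: emit the payload only when the flag is set.  */
static void
write_var (struct stream *out, const struct var *v)
{
  if (v->has_value_expr)
    {
      assert (out->pos < STREAM_CAP);
      out->buf[out->pos++] = v->value_expr;
    }
}

/* Reader: must test the same flag, otherwise it would consume data that
   belongs to the next record (or miss data that was written).  */
static void
read_var (struct stream *in, struct var *v)
{
  if (v->has_value_expr)
    {
      assert (in->pos < STREAM_CAP);
      v->value_expr = in->buf[in->pos++];
    }
}
```

Before the patch the namespace-scope case broke out of both functions early, which is equivalent to neither side ever testing the flag: symmetric, but the value expression was silently lost.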

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2025-01-07  Jakub Jelinek  

* module.cc (trees_out::core_vals): Note DECL_VALUE_EXPR even for
vars outside of functions.
(trees_in::core_vals): Read in DECL_VALUE_EXPR even for vars outside
of functions.
(trees_out::get_merge_kind): Make DECL_DECOMPOSITION_P MK_unique.

* g++.dg/modules/decomp-2_b.C: New test.
* g++.dg/modules/decomp-2_a.H: New file.

--- gcc/cp/module.cc.jj 2025-01-03 17:54:12.411905971 +0100
+++ gcc/cp/module.cc	2025-01-07 13:42:45.495623879 +0100
@@ -6313,7 +6313,11 @@ trees_out::core_vals (tree t)
  case VAR_DECL:
if (DECL_CONTEXT (t)
  && TREE_CODE (DECL_CONTEXT (t)) != FUNCTION_DECL)
-   break;
+   {
+ if (DECL_HAS_VALUE_EXPR_P (t))
+   WT (DECL_VALUE_EXPR (t));
+ break;
+   }
/* FALLTHROUGH  */
  
  case RESULT_DECL:

@@ -6843,7 +6847,14 @@ trees_in::core_vals (tree t)
  case VAR_DECL:
if (DECL_CONTEXT (t)
  && TREE_CODE (DECL_CONTEXT (t)) != FUNCTION_DECL)
-   break;
+   {
+ if (DECL_HAS_VALUE_EXPR_P (t))
+   {
+ tree val = tree_node ();
+ SET_DECL_VALUE_EXPR (t, val);
+   }
+ break;
+   }
/* FALLTHROUGH  */
  
  case RESULT_DECL:

@@ -10985,6 +10996,12 @@ trees_out::get_merge_kind (tree decl, de
break;
  }
  
+	if (DECL_DECOMPOSITION_P (decl))
+	  {
+	    mk = MK_unique;
+	    break;
+	  }
+
if (IDENTIFIER_ANON_P (DECL_NAME (decl)))
  {
if (RECORD_OR_UNION_TYPE_P (ctx))
--- gcc/testsuite/g++.dg/modules/decomp-2_b.C.jj	2025-01-07 13:27:32.352323501 +0100
+++ gcc/testsuite/g++.dg/modules/decomp-2_b.C	2025-01-07 13:27:32.352323501 +0100
@@ -0,0 +1,11 @@
+// { dg-do run }
+// { dg-additional-options "-fmodules-ts" }
+
+import "decomp-2_a.H";
+
+int
+main ()
+{
+  if (a != 1 || b != 2 || c != 3)
+    __builtin_abort ();
+}
--- gcc/testsuite/g++.dg/modules/decomp-2_a.H.jj	2025-01-07 13:27:32.352323501 +0100
+++ gcc/testsuite/g++.dg/modules/decomp-2_a.H	2025-01-07 13:27:32.352323501 +0100
@@ -0,0 +1,11 @@
+// { dg-additional-options -fmodule-header }
+// { dg-module-cmi {} }
+
+struct A {
+  int a, b, c;
+};
+
+namespace {
+A d = { 1, 2, 3 };
+auto [a, b, c] = d;
+}

Jakub

[PATCH v2] LoongArch: Generate the final immediate for lu12i.w, lu32i.d and lu52i.d

2025-01-09 Thread mengqinggang

Generate 0x1010 instead of 0x101>>12 for lu12i.w. lu32i.d and lu52i.d use
the same processing.
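For reference, `lu12i.w rd, si20` sets rd to the sign-extended value si20 << 12, and `lu52i.d rd, rj, si12` replaces bits 63:52 of rj with si12, so the final operands are the constant shifted right by 12 and by 52 respectively. The sketch below computes them; `fits_lu12i` is a simplified, hypothetical stand-in for the port's LU12I_INT predicate, not its exact definition, and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for LU12I_INT: the low 12 bits must
   be zero and the value must be a sign-extended 32-bit quantity, since
   lu12i.w writes si20 << 12 and sign-extends it to 64 bits.  */
static bool
fits_lu12i (int64_t val)
{
  return (val & 0xfff) == 0 && val >= INT32_MIN && val <= INT32_MAX;
}

/* Final si20 operand for lu12i.w.  Like INTVAL (x) >> 12 in GCC, this
   relies on >> being an arithmetic shift for negative values, which is
   what GCC does for signed types.  */
static int64_t
lu12i_operand (int64_t val)
{
  assert (fits_lu12i (val));
  return val >> 12;
}

/* Final si12 operand for "lu52i.d rd, $r0, si12", which sets bits 63:52
   and leaves bits 51:0 zero.  */
static int64_t
lu52i_operand (int64_t val)
{
  assert ((val & ((INT64_C (1) << 52) - 1)) == 0);
  return val >> 52;
}
```

Emitting the already-shifted value means the assembler no longer has to evaluate a `>>` expression in the operand, which is what the updated imm-load.c test checks.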

gcc/ChangeLog:

* config/loongarch/lasx.md: Use new loongarch_output_move.
* config/loongarch/loongarch-protos.h (loongarch_output_move):
Change parameters from (rtx, rtx) to (rtx *).
* config/loongarch/loongarch.cc (loongarch_output_move):
Generate final immediate for lu12i.w and lu52i.d.
* config/loongarch/loongarch.md:
Generate final immediate for lu32i.d and lu52i.d.
* config/loongarch/lsx.md: Use new loongarch_output_move.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/imm-load.c: Check that ">>" is not generated.
---
Changes in v2:
- Change imm-load test to scan-assembler-not >>.

 gcc/config/loongarch/lasx.md  |  2 +-
 gcc/config/loongarch/loongarch-protos.h   |  2 +-
 gcc/config/loongarch/loongarch.cc | 14 ++--
 gcc/config/loongarch/loongarch.md | 34 ---
 gcc/config/loongarch/lsx.md   |  2 +-
 gcc/testsuite/gcc.target/loongarch/imm-load.c |  1 +
 6 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index edaf64eeb95..a37c85a25a4 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -723,7 +723,7 @@ (define_insn "mov_lasx"
   [(set (match_operand:LASX 0 "nonimmediate_operand" "=f,f,R,*r,*f")
(match_operand:LASX 1 "move_operand" "fYGYI,R,f,*f,*r"))]
   "ISA_HAS_LASX"
-  { return loongarch_output_move (operands[0], operands[1]); }
+  { return loongarch_output_move (operands); }
   [(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert")
(set_attr "mode" "")
(set_attr "length" "8,4,4,4,4")])
diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index fb544ad75ca..6601f767dab 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -86,7 +86,7 @@ extern void loongarch_split_move (rtx, rtx);
 extern bool loongarch_addu16i_imm12_operand_p (HOST_WIDE_INT, machine_mode);
 extern void loongarch_split_plus_constant (rtx *, machine_mode);
 extern void loongarch_split_vector_move (rtx, rtx);
-extern const char *loongarch_output_move (rtx, rtx);
+extern const char *loongarch_output_move (rtx *);
 #ifdef RTX_CODE
 extern void loongarch_expand_scc (rtx *);
 extern void loongarch_expand_vec_cmp (rtx *);
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 89237c377e7..f26c1346acc 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4721,8 +4721,10 @@ loongarch_split_vector_move (rtx dest, rtx src)
that SRC is operand 1 and DEST is operand 0.  */
 
 const char *
-loongarch_output_move (rtx dest, rtx src)
+loongarch_output_move (rtx *operands)
 {
+  rtx src = operands[1];
+  rtx dest = operands[0];
   enum rtx_code dest_code = GET_CODE (dest);
   enum rtx_code src_code = GET_CODE (src);
   machine_mode mode = GET_MODE (dest);
@@ -4875,13 +4877,19 @@ loongarch_output_move (rtx dest, rtx src)
   if (src_code == CONST_INT)
{
  if (LU12I_INT (src))
-   return "lu12i.w\t%0,%1>>12\t\t\t# %X1";
+   {
+ operands[1] = GEN_INT (INTVAL (operands[1]) >> 12);
+ return "lu12i.w\t%0,%1\t\t\t# %X1";
+   }
  else if (IMM12_INT (src))
return "addi.w\t%0,$r0,%1\t\t\t# %X1";
  else if (IMM12_INT_UNSIGNED (src))
return "ori\t%0,$r0,%1\t\t\t# %X1";
  else if (LU52I_INT (src))
-   return "lu52i.d\t%0,$r0,%X1>>52\t\t\t# %1";
+   {
+ operands[1] = GEN_INT (INTVAL (operands[1]) >> 52);
+ return "lu52i.d\t%0,$r0,%X1\t\t\t# %1";
+   }
  else
gcc_unreachable ();
}
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 3eff4077160..59f45770311 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2209,7 +2209,7 @@ (define_insn_and_split "*movdi_32bit"
   "!TARGET_64BIT
&& (register_operand (operands[0], DImode)
|| reg_or_0_operand (operands[1], DImode))"
-  { return loongarch_output_move (operands[0], operands[1]); }
+  { return loongarch_output_move (operands); }
   "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
   (operands[0]))"
   [(const_int 0)]
@@ -2228,7 +2228,9 @@ (define_insn_and_split "*movdi_64bit"
   "TARGET_64BIT
&& (register_operand (operands[0], DImode)
|| reg_or_0_operand (operands[1], DImode))"
-  { return loongarch_output_move (operands[0], operands[1]); }
+  {
+    return loongarch_output_move (operands);
+  }
   "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
   (operands[0]))"
   [(const_int 0)]
@@ -2315,7 +2317,7 @@ (define_insn_and_split "*movsi_internal"
(match_op

Re: [pushed] [PATCH v1] LoongArch: Generate the final immediate for lu12i.w, lu32i.d and lu52i.d

2025-01-09 Thread Lulu Cheng




On 2025/1/10 10:03 AM, Lulu Cheng wrote:

Pushed to r15-6755.

Sorry, I replied to the wrong email.


On 2025/1/6 4:16 PM, mengqinggang wrote:
Generate 0x1010 instead of 0x101>>12 for lu12i.w. lu32i.d and
lu52i.d use the same processing.

gcc/ChangeLog:

* config/loongarch/lasx.md: Use new loongarch_output_move.
* config/loongarch/loongarch-protos.h (loongarch_output_move):
Change parameters from (rtx, rtx) to (rtx *).
* config/loongarch/loongarch.cc (loongarch_output_move):
Generate final immediate for lu12i.w and lu52i.d.
* config/loongarch/loongarch.md:
Generate final immediate for lu32i.d and lu52i.d.
* config/loongarch/lsx.md: Use new loongarch_output_move.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/imm-load.c: New tests for lu12i.w, lu32i.d
and lu52i.d.
---
  gcc/config/loongarch/lasx.md  |  2 +-
  gcc/config/loongarch/loongarch-protos.h   |  2 +-
  gcc/config/loongarch/loongarch.cc | 14 ++--
  gcc/config/loongarch/loongarch.md | 34 ---
  gcc/config/loongarch/lsx.md   |  2 +-
  gcc/testsuite/gcc.target/loongarch/imm-load.c |  3 ++
  6 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index edaf64eeb95..a37c85a25a4 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -723,7 +723,7 @@ (define_insn "mov_lasx"
    [(set (match_operand:LASX 0 "nonimmediate_operand" "=f,f,R,*r,*f")
  (match_operand:LASX 1 "move_operand" "fYGYI,R,f,*f,*r"))]
    "ISA_HAS_LASX"
-  { return loongarch_output_move (operands[0], operands[1]); }
+  { return loongarch_output_move (operands); }
    [(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert")
 (set_attr "mode" "")
 (set_attr "length" "8,4,4,4,4")])
diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index fb544ad75ca..6601f767dab 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -86,7 +86,7 @@ extern void loongarch_split_move (rtx, rtx);
  extern bool loongarch_addu16i_imm12_operand_p (HOST_WIDE_INT, machine_mode);
  extern void loongarch_split_plus_constant (rtx *, machine_mode);
  extern void loongarch_split_vector_move (rtx, rtx);
-extern const char *loongarch_output_move (rtx, rtx);
+extern const char *loongarch_output_move (rtx *);
  #ifdef RTX_CODE
  extern void loongarch_expand_scc (rtx *);
  extern void loongarch_expand_vec_cmp (rtx *);
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 89237c377e7..f26c1346acc 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4721,8 +4721,10 @@ loongarch_split_vector_move (rtx dest, rtx src)
 that SRC is operand 1 and DEST is operand 0.  */
    const char *
-loongarch_output_move (rtx dest, rtx src)
+loongarch_output_move (rtx *operands)
  {
+  rtx src = operands[1];
+  rtx dest = operands[0];
    enum rtx_code dest_code = GET_CODE (dest);
    enum rtx_code src_code = GET_CODE (src);
    machine_mode mode = GET_MODE (dest);
@@ -4875,13 +4877,19 @@ loongarch_output_move (rtx dest, rtx src)
    if (src_code == CONST_INT)
  {
    if (LU12I_INT (src))
-    return "lu12i.w\t%0,%1>>12\t\t\t# %X1";
+    {
+  operands[1] = GEN_INT (INTVAL (operands[1]) >> 12);
+  return "lu12i.w\t%0,%1\t\t\t# %X1";
+    }
    else if (IMM12_INT (src))
  return "addi.w\t%0,$r0,%1\t\t\t# %X1";
    else if (IMM12_INT_UNSIGNED (src))
  return "ori\t%0,$r0,%1\t\t\t# %X1";
    else if (LU52I_INT (src))
-    return "lu52i.d\t%0,$r0,%X1>>52\t\t\t# %1";
+    {
+  operands[1] = GEN_INT (INTVAL (operands[1]) >> 52);
+  return "lu52i.d\t%0,$r0,%X1\t\t\t# %1";
+    }
    else
  gcc_unreachable ();
  }
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 3eff4077160..59f45770311 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2209,7 +2209,7 @@ (define_insn_and_split "*movdi_32bit"
    "!TARGET_64BIT
 && (register_operand (operands[0], DImode)
 || reg_or_0_operand (operands[1], DImode))"
-  { return loongarch_output_move (operands[0], operands[1]); }
+  { return loongarch_output_move (operands); }
    "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
    (operands[0]))"
    [(const_int 0)]
@@ -2228,7 +2228,9 @@ (define_insn_and_split "*movdi_64bit"
    "TARGET_64BIT
 && (register_operand (operands[0], DImode)
 || reg_or_0_operand (operands[1], DImode))"
-  { return loongarch_output_move (operands[0], operands[1]); }
+  {
+    return loongarch_output_move (operands);
+  }
    "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
    (operands[0]))"
    [(const_int 0)]
@@ -2315,7 +2317,7 @@

Re: [pushed] [PATCH v2] LoongArch: Optimize the cost of vec_construct.

2025-01-09 Thread Lulu Cheng


Pushed to r15-6755.

On 2025/1/7 9:04 PM, chenxiaolong wrote:

   When analyzing 525 on the LoongArch architecture, it was found that
the for loop of the hotspot function x264_pixel_satd_8x4 could not be
vectorized with 256-bit vectors because of the vec_construct cost
setting.  After re-adjusting the vec_construct cost, the performance of
the 525 program improved by 16.57%.  This function can already be
vectorized on the aarch64 and x86 architectures; see [PR98138].
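The vec_construct change in the patch below boils down to two small formulas. This sketch restates them as plain C functions with hypothetical names; `lasx_only_mode` stands for `LASX_SUPPORTED_MODE_P (mode) && !LSX_SUPPORTED_MODE_P (mode)` and `isa_has_lasx` for `ISA_HAS_LASX`.

```c
#include <assert.h>
#include <stdbool.h>

/* Cost before the patch: one extra unit whenever LASX is enabled,
   whatever the vector mode being constructed.  */
static int
old_vec_construct_cost (int elements, bool isa_has_lasx)
{
  assert (elements > 0);
  return isa_has_lasx ? elements + 1 : elements;
}

/* Cost after the patch: about half the element count, plus 3 for modes
   that exist only as 256-bit LASX vectors and plus 1 otherwise.  */
static int
new_vec_construct_cost (int elements, bool lasx_only_mode)
{
  assert (elements > 0);
  return lasx_only_mode ? elements / 2 + 3 : elements / 2 + 1;
}
```

For a 256-bit vector of eight 32-bit elements the cost drops from 9 to 7; for a 128-bit vector of four elements built with LASX enabled it drops from 5 to 3. A lower construction cost makes the vectorizer more willing to build vectors from scalars, as in the SLP test below.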

Co-Authored-By: Deng Jianbo .

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Modify the
 construction cost of the vec_construct vector.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-slp-two-operator.c: New test.
---
  gcc/config/loongarch/loongarch.cc |  6 +--
  .../loongarch/vect-slp-two-operator.c | 38 +++
  2 files changed, 41 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-slp-two-operator.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 89237c377e7..ff27b96c31e 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4127,10 +4127,10 @@ loongarch_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
  
case vec_construct:
elements = TYPE_VECTOR_SUBPARTS (vectype);
-   if (ISA_HAS_LASX)
- return elements + 1;
+   if (LASX_SUPPORTED_MODE_P (mode) && !LSX_SUPPORTED_MODE_P (mode))
+ return elements / 2 + 3;
else
- return elements;
+ return elements / 2 + 1;
  
default:
gcc_unreachable ();
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-slp-two-operator.c b/gcc/testsuite/gcc.target/loongarch/vect-slp-two-operator.c
new file mode 100644
index 000..43b46759902
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-slp-two-operator.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlasx -ftree-vectorize -fdump-tree-vect -fdump-tree-vect-details" } */
+
+typedef unsigned char uint8_t;
+typedef unsigned int uint32_t;
+
+#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) \
+  {   \
+int t0 = s0 + s1; \
+int t1 = s0 - s1; \
+int t2 = s2 + s3; \
+int t3 = s2 - s3; \
+d0 = t0 + t2; \
+d1 = t1 + t3; \
+d2 = t0 - t2; \
+d3 = t1 - t3; \
+  }
+
+void sink (uint32_t tmp[4][4]);
+
+void
+x264_pixel_satd_8x4 (uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2)
+{
+  uint32_t tmp[4][4];
+  int sum = 0;
+  for (int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2)
+{
+  uint32_t a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
+  uint32_t a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
+  uint32_t a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
+  uint32_t a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
+  HADAMARD4 (tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0, a1, a2, a3);
+}
+  sink (tmp);
+}
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */

Re: [PATCH] c++/modules: Handle chaining already-imported local types [PR114630]

2025-01-09 Thread Nathaniel Shead

On Thu, Jan 09, 2025 at 05:41:07PM -0500, Patrick Palka wrote:
> On Fri, 10 Jan 2025, Nathaniel Shead wrote:
> 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/14?
> 
> Nice approach, thanks for fixing this!
> 
> > 
> > -- >8 --
> > 
> > In the linked testcase, an ICE occurs because when reading the
> > (duplicate) function definition for _M_do_parse from module Y, the local
> > type definitions have already been streamed from module X and set up as
> > regular backreferences, rather than being found with find_duplicate,
> > causing issues with managing DECL_CHAIN.
> > 
> > It is tempting to just skip setting up the DECL_CHAIN for this case.
> > However, for the future it would be best to ensure that the block vars
> > for the duplicate definition are accurate, so that we could implement
> > ODR checking on function definitions at some point.
> > 
> > So to solve this, this patch creates a copy of the streamed-in local
> > type and chains that; it will be discarded along with the rest of the
> > duplicate function after we've finished processing.
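The approach amounts to "never rewrite the chain link of a node that is already on another chain; chain a copy of it instead". A minimal sketch on a hypothetical singly linked list follows: struct assignment stands in for copy_node/copy_decl, and an explicit `on_chain` flag stands in for testing DECL_CHAIN. This is not GCC code, and the copies are deliberately not freed in the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical list node; ON_CHAIN plays the role of "DECL_CHAIN is
   already set" in the patch.  */
struct node
{
  int id;
  bool on_chain;
  struct node *next;
};

/* Link N at *TAIL and return the new tail link.  If N already belongs to
   another chain, link a shallow copy instead, so that the other chain's
   next pointers are never overwritten.  */
static struct node **
chain_node (struct node **tail, struct node *n)
{
  if (n->on_chain)
    {
      struct node *copy = malloc (sizeof *copy);
      assert (copy != NULL);
      *copy = *n;
      copy->next = NULL;
      n = copy;
    }
  n->on_chain = true;
  *tail = n;
  return &n->next;
}
```

Building a second chain that reuses an existing node leaves the first chain intact, which is the property the duplicate function body needs while its block vars are being rebuilt.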
> > 
> > A couple of suggested implementations from the discussion on the PR that
> > don't work:
> > 
> > - Replacing the `DECL_CHAIN` assertion with `(*chain && *chain != decl)`
> >   doesn't handle the case where type definitions are followed by regular
> >   local variables, since those won't have been imported as separate
> >   backreferences and so the chains will diverge.
> > 
> > - Correcting the purviewness of GMF template instantiations to force Y
> >   to emit copies of the local types rather than backreferences into X is
> >   insufficient, as it's still possible that the local types got streamed
> >   in a separate cluster to the function definition, and so will be again
> >   referred to via regular backreferences when importing.
> > 
> > - Likewise, preventing the emission of function definitions where an
> >   import has already provided that same definition also is insufficient,
> >   for much the same reason.
> > 
> > PR c++/114630
> > 
> > gcc/cp/ChangeLog:
> > 
> > * module.cc (trees_in::core_vals) : Chain a new node if
> > DECL_CHAIN already is set.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/pr114630.h: New test.
> > * g++.dg/modules/pr114630_a.C: New test.
> > * g++.dg/modules/pr114630_b.C: New test.
> > * g++.dg/modules/pr114630_c.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >  gcc/cp/module.cc  | 14 +-
> >  gcc/testsuite/g++.dg/modules/pr114630.h   | 11 +++
> >  gcc/testsuite/g++.dg/modules/pr114630_a.C |  7 +++
> >  gcc/testsuite/g++.dg/modules/pr114630_b.C |  8 
> >  gcc/testsuite/g++.dg/modules/pr114630_c.C |  4 
> >  5 files changed, 43 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/g++.dg/modules/pr114630.h
> >  create mode 100644 gcc/testsuite/g++.dg/modules/pr114630_a.C
> >  create mode 100644 gcc/testsuite/g++.dg/modules/pr114630_b.C
> >  create mode 100644 gcc/testsuite/g++.dg/modules/pr114630_c.C
> > 
> > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > index 5350e6c4bad..ff2683de73e 100644
> > --- a/gcc/cp/module.cc
> > +++ b/gcc/cp/module.cc
> > @@ -6928,11 +6928,23 @@ trees_in::core_vals (tree t)
> >body) anyway.  */
> > decl = maybe_duplicate (decl);
> >  
> > -   if (!DECL_P (decl) || DECL_CHAIN (decl))
> > +   if (!DECL_P (decl))
> >   {
> > set_overrun ();
> > break;
> >   }
> > +
> > +   /* If DECL_CHAIN is already set then this was a backreference to a
> > +  local type or enumerator from a previous read (PR c++/114630).
> > +  Let's copy the node so we can keep building the chain for ODR
> > +  checking later.  */
> > +   if (DECL_CHAIN (decl))
> > + {
> > +   gcc_checking_assert (TREE_CODE (decl) == TYPE_DECL
> > +&& find_duplicate (DECL_CONTEXT (decl)));
> > +   decl = copy_node (decl);
> 
> Shall we use copy_decl here instead so that any DECL_LANG_SPECIFIC node
> is copied as well?  IIUC we usually don't share DECL_LANG_SPECIFIC
> between decls.
> 

Happy to use copy_decl instead; I'll retest tonight and push tomorrow if
there's no issues.  Thanks!

> > + }
> > +
> > *chain = decl;
> > chain = &DECL_CHAIN (decl);
> >   }
> > diff --git a/gcc/testsuite/g++.dg/modules/pr114630.h b/gcc/testsuite/g++.dg/modules/pr114630.h
> > new file mode 100644
> > index 000..8730007f59f
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/pr114630.h
> > @@ -0,0 +1,11 @@
> > +template 
> > +void _M_do_parse() {
> > +  struct A {};
> > +  struct B {};
> > +  int x;
> > +}
> > +
> > +template  struct formatter;
> > +template <> struct formatter {
> > +  void parse() { _M_do_parse(); }
> > +};
> > diff --git a/gcc/testsuite/g++.dg/modules/pr114630_a.C b/gcc/testsuite/g++.dg

[PATCH] [testsuite] rearrange requirements for dfp bitint run tests

2025-01-09 Thread Alexandre Oliva



dfp.exp sets the default to compile when dfprt is not available, but
some dfp bitint tests override the default without that requirement,
and try to run even when dfprt is not available.

Instead of overriding the default, rewrite the requirements so that
they apply even when compiling, since the absence of bitint or of
int128 would presumably cause compile failures.

Regstrapped on x86_64-linux-gnu.  Also tested with aarch64-elf and
arm-eabi on gcc-14, with dfp support (implicitly) disabled in libgcc.
Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.dg/dfp/bitint-1.c: Rewrite requirements to retain dfprt.
* gcc.dg/dfp/bitint-2.c: Likewise.
* gcc.dg/dfp/bitint-3.c: Likewise.
* gcc.dg/dfp/bitint-4.c: Likewise.
* gcc.dg/dfp/bitint-5.c: Likewise.
* gcc.dg/dfp/bitint-6.c: Likewise.
* gcc.dg/dfp/bitint-7.c: Likewise.
* gcc.dg/dfp/bitint-8.c: Likewise.
* gcc.dg/dfp/int128-1.c: Likewise.
* gcc.dg/dfp/int128-2.c: Likewise.
* gcc.dg/dfp/int128-3.c: Likewise.
* gcc.dg/dfp/int128-4.c: Likewise.
---
 gcc/testsuite/gcc.dg/dfp/bitint-1.c |2 +-
 gcc/testsuite/gcc.dg/dfp/bitint-2.c |2 +-
 gcc/testsuite/gcc.dg/dfp/bitint-3.c |2 +-
 gcc/testsuite/gcc.dg/dfp/bitint-4.c |2 +-
 gcc/testsuite/gcc.dg/dfp/bitint-5.c |2 +-
 gcc/testsuite/gcc.dg/dfp/bitint-6.c |2 +-
 gcc/testsuite/gcc.dg/dfp/bitint-7.c |2 +-
 gcc/testsuite/gcc.dg/dfp/bitint-8.c |2 +-
 gcc/testsuite/gcc.dg/dfp/int128-1.c |3 ++-
 gcc/testsuite/gcc.dg/dfp/int128-2.c |3 ++-
 gcc/testsuite/gcc.dg/dfp/int128-3.c |3 ++-
 gcc/testsuite/gcc.dg/dfp/int128-4.c |3 ++-
 12 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/dfp/bitint-1.c b/gcc/testsuite/gcc.dg/dfp/bitint-1.c
index ab826e16ba390..1493bf3c52f02 100644
--- a/gcc/testsuite/gcc.dg/dfp/bitint-1.c
+++ b/gcc/testsuite/gcc.dg/dfp/bitint-1.c
@@ -1,5 +1,5 @@
 /* PR c/102989 */
-/* { dg-do run { target bitint } } */
+/* { dg-require-effective-target bitint } */
 /* { dg-options "-O2 -std=c23 -pedantic-errors" } */
 
 #if __BITINT_MAXWIDTH__ >= 192
diff --git a/gcc/testsuite/gcc.dg/dfp/bitint-2.c b/gcc/testsuite/gcc.dg/dfp/bitint-2.c
index 68cce0e66521c..1ed5be8929f2d 100644
--- a/gcc/testsuite/gcc.dg/dfp/bitint-2.c
+++ b/gcc/testsuite/gcc.dg/dfp/bitint-2.c
@@ -1,5 +1,5 @@
 /* PR c/102989 */
-/* { dg-do run { target bitint } } */
+/* { dg-require-effective-target bitint } */
 /* { dg-options "-O2 -std=c23 -pedantic-errors" } */
 
 #if __BITINT_MAXWIDTH__ >= 192
diff --git a/gcc/testsuite/gcc.dg/dfp/bitint-3.c b/gcc/testsuite/gcc.dg/dfp/bitint-3.c
index 911bf8afb3083..11997ddbea698 100644
--- a/gcc/testsuite/gcc.dg/dfp/bitint-3.c
+++ b/gcc/testsuite/gcc.dg/dfp/bitint-3.c
@@ -1,5 +1,5 @@
 /* PR c/102989 */
-/* { dg-do run { target bitint } } */
+/* { dg-require-effective-target bitint } */
 /* { dg-options "-O2 -std=c23 -pedantic-errors" } */
 
 #if __BITINT_MAXWIDTH__ >= 192
diff --git a/gcc/testsuite/gcc.dg/dfp/bitint-4.c b/gcc/testsuite/gcc.dg/dfp/bitint-4.c
index 0b6011055786e..0e600160752be 100644
--- a/gcc/testsuite/gcc.dg/dfp/bitint-4.c
+++ b/gcc/testsuite/gcc.dg/dfp/bitint-4.c
@@ -1,5 +1,5 @@
 /* PR c/102989 */
-/* { dg-do run { target bitint } } */
+/* { dg-require-effective-target bitint } */
 /* { dg-options "-O2 -std=c23 -pedantic-errors" } */
 
 #if __BITINT_MAXWIDTH__ >= 192
diff --git a/gcc/testsuite/gcc.dg/dfp/bitint-5.c b/gcc/testsuite/gcc.dg/dfp/bitint-5.c
index 37d373cdf320a..b7f7484d225b7 100644
--- a/gcc/testsuite/gcc.dg/dfp/bitint-5.c
+++ b/gcc/testsuite/gcc.dg/dfp/bitint-5.c
@@ -1,5 +1,5 @@
 /* PR c/102989 */
-/* { dg-do run { target bitint } } */
+/* { dg-require-effective-target bitint } */
 /* { dg-options "-O2 -std=c23 -pedantic-errors" } */
 
 #if __BITINT_MAXWIDTH__ >= 192
diff --git a/gcc/testsuite/gcc.dg/dfp/bitint-6.c b/gcc/testsuite/gcc.dg/dfp/bitint-6.c
index eb137a60e4b7d..e9c538015f4a0 100644
--- a/gcc/testsuite/gcc.dg/dfp/bitint-6.c
+++ b/gcc/testsuite/gcc.dg/dfp/bitint-6.c
@@ -1,5 +1,5 @@
 /* PR c/102989 */
-/* { dg-do run { target bitint } } */
+/* { dg-require-effective-target bitint } */
 /* { dg-options "-O2 -std=c23 -pedantic-errors" } */
 
 #if __BITINT_MAXWIDTH__ >= 192
diff --git a/gcc/testsuite/gcc.dg/dfp/bitint-7.c b/gcc/testsuite/gcc.dg/dfp/bitint-7.c
index 49e8103723cb2..530a26c47e51d 100644
--- a/gcc/testsuite/gcc.dg/dfp/bitint-7.c
+++ b/gcc/testsuite/gcc.dg/dfp/bitint-7.c
@@ -1,6 +1,6 @@
 /* PR c/102989 */
 /* Test non-canonical BID significands.  */
-/* { dg-do run { target bitint } } */
+/* { dg-require-effective-target bitint } */
 /* { dg-require-effective-target dfp_bid } */
 /* { dg-options "-std=gnu23 -O2" } */
 
diff --git a/gcc/testsuite/gcc.dg/dfp/bitint-8.c b/gcc/testsuite/gcc.dg/dfp/bitint-8.c
index 18263e2bd7533..2990877a2fbd0 100644
--- a/gcc/testsuite/gcc.dg/dfp/bitint-8.c
+++ b/gcc/testsuite/gcc.dg/dfp/bitint-8.c
@@ -1,5 +1,5 @@
 /* PR c/102989 */
-/* { dg-do run { target

Re: [PATCH] rtl: Remove invalid compare simplification [PR117186]

2025-01-09 Thread Richard Biener

On Mon, Jan 6, 2025 at 2:12 PM Richard Sandiford
 wrote:
>
> g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at
> https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html ,
> added code to treat:
>
>   (set (reg:CC cc) (compare:CC (gt:M (reg:CC cc) 0) (lt:M (reg:CC cc) 0)))
>
> as a nop.  This PR shows that that isn't always correct.
> The compare in the set above is between two 0/1 booleans (at least
> on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that
> produced the incoming (reg:CC cc) is unconstrained; it could be between
> arbitrary integers, or even floats.  The fold is therefore replacing a
> cc that is valid for both signed and unsigned comparisons with one that
> is only known to be valid for signed comparisons.
>
>   (gt (compare (gt cc 0) (lt cc 0)) 0)
>
> does simplify to:
>
>   (gt cc 0)
>
> but:
>
>   (gtu (compare (gt cc 0) (lt cc 0)) 0)
>
> does not simplify to:
>
>   (gtu cc 0)
>
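A brute-force check over all pairs of 8-bit values makes the asymmetry concrete. In this sketch the incoming cc is modelled as the result of comparing x with y; `(gt cc 0)` and `(lt cc 0)` read it as a signed comparison, `(gtu cc 0)` as an unsigned one, and the inner compare is between the two 0/1 results (assuming STORE_FLAG_VALUE == 1). The function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Outer "greater than" applied to (compare (gt cc 0) (lt cc 0)), where
   the original cc came from comparing X with Y.  For 0/1 operands the
   signed and unsigned forms (gt vs gtu) give the same answer.  */
static bool
outer_gt_after_fold (int8_t x, int8_t y)
{
  int p = x > y;   /* (gt cc 0) */
  int q = x < y;   /* (lt cc 0) */
  return p > q;
}

/* Count the (x, y) pairs where the folded form disagrees with using the
   original cc directly, as gt (UNSIGNED_USE false) or gtu (true).  */
static int
count_mismatches (bool unsigned_use)
{
  int mismatches = 0;
  for (int x = INT8_MIN; x <= INT8_MAX; x++)
    for (int y = INT8_MIN; y <= INT8_MAX; y++)
      {
        bool folded = outer_gt_after_fold ((int8_t) x, (int8_t) y);
        bool original = unsigned_use ? (uint8_t) x > (uint8_t) y
                                     : (int8_t) x > (int8_t) y;
        mismatches += folded != original;
      }
  return mismatches;
}
```

Under this model the signed use never disagrees, while the unsigned use disagrees for every pair whose signs differ (32768 of the 65536 pairs), for example x = -1, y = 0.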
> The optimisation didn't come with a testcase, but it was added for
> i386's cmpstrsi, now cmpstrnsi.  That probably doesn't matter as much
> as it once did, since it's now conditional on -minline-all-stringops.
> But the patch is almost 25 years old, so whatever the original
> motivation was, it seems likely that other things now rely on it.
>
> It therefore seems better to try to preserve the optimisation on rtl
> rather than get rid of it.  To do that, we need to look at how the
> result of the outer compare is used.  We'd therefore be looking at four
> instructions (the gt, the lt, the compare, and the use of the compare),
> but combine already allows that for 3-instruction combinations thanks
> to:
>
>   /* If the source is a COMPARE, look for the use of the comparison result
>  and try to simplify it unless we already have used undobuf.other_insn.  */
>
> When applied to boolean inputs, a comparison operator is
> effectively a boolean logical operator (AND, ANDNOT, XOR, etc.).
> simplify_logical_relational_operation already had code to simplify
> logical operators between two comparison results, but:
>
> * It only handled IOR, which doesn't cover all the cases needed here.
>   The others are easily added.
>
> * It treated comparisons of integers as having an ORDERED/UNORDERED result.
>   Therefore:
>
>   * it would not treat "true for LT + EQ + GT" as "always true" for
> comparisons between integers, because the mask excluded the UNORDERED
> condition.
>
>   * it would try to convert "true for LT + GT" into LTGT even for comparisons
> between integers.  To prevent an ICE later, the code used:
>
>     /* Many comparison codes are only valid for certain mode classes.  */
>     if (!comparison_code_valid_for_mode (code, mode))
>       return 0;
>
> However, this used the wrong mode, since "mode" is here the integer
> result of the comparisons (and the mode of the IOR), not the mode of
> the things being compared.  Thus the effect was to reject all
> floating-point-only codes, even when comparing floats.
>
>   I think instead the code should detect whether the comparison is between
>   integer values and remove UNORDERED from consideration if so.  It then
>   always produces a valid comparison (or an always true/false result),
>   and so comparison_code_valid_for_mode is not needed.  In particular,
>   "true for LT + GT" becomes NE for comparisons between integers but
>   remains LTGT for comparisons between floats.
>
> * There was a missing check for whether the comparison inputs had
>   side effects.
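The mask view of comparisons described in these points can be sketched in a few lines of C. This is an illustration only: the bit assignments (LT = 1, EQ = 2, GT = 4, UNORDERED = 8), `combine_masks` and `mask_to_name` are hypothetical names chosen here, not the helpers in simplify-rtx.cc. Combining two comparison results with IOR, AND or XOR becomes the same operation on their masks, and dropping UNORDERED for integer comparisons turns "LT + GT" into NE and "LT + EQ + GT" into an always-true result, while floats keep LTGT and ORDERED.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h> /* strcmp, used by the usage checks */

/* Hypothetical outcome bits for a comparison between two values.  */
enum { M_LT = 1, M_EQ = 2, M_GT = 4, M_UNORD = 8 };

enum logical_op { OP_IOR, OP_AND, OP_XOR };

/* A logical operation on two comparison results is the same operation
   on their outcome masks.  */
static unsigned combine_masks (enum logical_op op, unsigned m1, unsigned m2)
{
  switch (op)
    {
    case OP_IOR: return m1 | m2;
    case OP_AND: return m1 & m2;
    case OP_XOR: return m1 ^ m2;
    }
  return 0;
}

/* Map a mask back to a comparison name.  For integer operands the
   UNORDERED bit is ignored, so every mask has a valid integer code.  */
static const char *mask_to_name (unsigned mask, bool integer_p)
{
  static const char *const float_names[16] = {
    "false", "LT", "EQ", "LE", "GT", "LTGT", "GE", "ORDERED",
    "UNORDERED", "UNLT", "UNEQ", "UNLE", "UNGT", "NE", "UNGE", "true"
  };
  static const char *const int_names[8] = {
    "false", "LT", "EQ", "LE", "GT", "NE", "GE", "true"
  };
  if (integer_p)
    return int_names[mask & (M_LT | M_EQ | M_GT)];
  return float_names[mask & 15];
}
```

For instance, LE AND GE gives mask 2 (EQ), and LE XOR GE gives mask 5, which is NE for integers but LTGT for floats.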
>
> While there, it also seemed worth extending
> simplify_logical_relational_operation to unsigned comparisons, since
> that makes the testing easier.
>
> As far as that testing goes: the patch exhaustively tests all
> combinations of integer comparisons in:
>
>   (cmp1 (cmp2 X Y) (cmp3 X Y))
>
> for the 10 integer comparisons, giving 1000 fold attempts in total.
> It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1})
> on the result of the fold, giving 9 checks per fold, or 9000 in total.
> That's probably more than is typical for self-tests, but it seems to
> complete in negligible time, even for -O0 builds.
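The exhaustive check described above can be sketched the same way, restricted here to the six signed comparisons (the patch also covers the four unsigned ones, which would need a richer outcome set). Assuming a hypothetical mask encoding of the outcome of comparing X and Y, `fold_nested` folds (cmp1 (cmp2 X Y) (cmp3 X Y)) into a single mask, and `check_all_folds` compares every fold against direct evaluation at X, Y in {-1, 0, 1}: 6 x 6 x 6 = 216 folds with 9 checks each.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits for the three outcomes of an integer comparison.  */
enum { M_LT = 1, M_EQ = 2, M_GT = 4 };

/* The six signed integer comparison codes, as outcome masks.  */
static const unsigned signed_codes[6] = {
  M_LT,          /* LT */
  M_LT | M_EQ,   /* LE */
  M_EQ,          /* EQ */
  M_LT | M_GT,   /* NE */
  M_GT | M_EQ,   /* GE */
  M_GT           /* GT */
};

/* Evaluate the comparison whose outcome mask is MASK on A and B.  */
static bool eval_mask (unsigned mask, int a, int b)
{
  unsigned outcome = a < b ? M_LT : a == b ? M_EQ : M_GT;
  return (mask & outcome) != 0;
}

/* Fold (cmp1 (cmp2 X Y) (cmp3 X Y)) into one mask over the outcome of
   comparing X and Y: for each outcome, compute the two inner 0/1
   results and apply cmp1 to them.  */
static unsigned fold_nested (unsigned m1, unsigned m2, unsigned m3)
{
  static const unsigned outcomes[3] = { M_LT, M_EQ, M_GT };
  unsigned result = 0;
  for (int i = 0; i < 3; i++)
    {
      int b2 = (m2 & outcomes[i]) != 0;
      int b3 = (m3 & outcomes[i]) != 0;
      if (eval_mask (m1, b2, b3))
        result |= outcomes[i];
    }
  return result;
}

/* Check every fold against X, Y in {-1, 0, 1}; return the check count.  */
static int check_all_folds (void)
{
  int checks = 0;
  for (int i = 0; i < 6; i++)
    for (int j = 0; j < 6; j++)
      for (int k = 0; k < 6; k++)
        {
          unsigned folded = fold_nested (signed_codes[i], signed_codes[j],
                                         signed_codes[k]);
          for (int x = -1; x <= 1; x++)
            for (int y = -1; y <= 1; y++)
              {
                bool b2 = eval_mask (signed_codes[j], x, y);
                bool b3 = eval_mask (signed_codes[k], x, y);
                bool expected = eval_mask (signed_codes[i], b2, b3);
                assert (eval_mask (folded, x, y) == expected);
                checks++;
              }
        }
  return checks;
}
```

For example, `fold_nested (M_GT, M_GT, M_LT)` yields `M_GT`, mirroring the signed simplification of (gt (compare (gt cc 0) (lt cc 0)) 0) to (gt cc 0).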
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

OK.

> The patch isn't exactly a spot fix, and the bug is ancient, so I suppose
> the patch probably isn't suitable for backports.

Maybe for GCC 14, but not without some soaking time of course.

Thanks,
Richard.

> Richard
>
>
> gcc/
> PR rtl-optimization/117186
> * rtl.h (simplify_context::simplify_logical_relational_operation): Add
> an invert0_p parameter.
> * simplify-rtx.cc (unsigned_comparison_to_mask): New function.
> (mask_to_unsigned_comparison): Likewise.
> (comparison_code_valid_for_mode): Delete.
> (simplify_context::simplify_logical_relational_operation): Add
> an invert0_p parameter.  Handle AND and XOR.  Handle unsigned
> comparisons.  Handle always-false results.  Ignore t

Re: [PATCH] arm: [MVE intrinsics] Another fix for moves of tuples (PR target/118131)

2025-01-09 Thread Richard Earnshaw (lists)

On 20/12/2024 22:53, Christophe Lyon wrote:
> Commit r15-6389-g670df03e5294a3 only partially fixed support for moves
> of large modes: despite the introduction of V2x* and V4x* modes in
> r15-6245-g4f4e13dd235b to support MVE tuples, we still need to support
> TI, OI and XI modes, which appear for instance in gcc.dg/pr100887.c.
> 
> The problem was noticed when running the testsuite with
> -mthumb/-march=armv8.1-m.main+mve.fp+fp.dp/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto
> where several tests would ICE in output_move_neon.
> 
> gcc/ChangeLog:
> 
>   PR target/118131
>   * config/arm/arm.h (VALID_MVE_STRUCT_MODE): Accept TI, OI and XI
>   modes again.

OK.

R.

Re: [PATCH] ifcombine field-merge: improve handling of dwords

2025-01-09 Thread Richard Biener

On Sat, Dec 21, 2024 at 6:05 AM Alexandre Oliva  wrote:
>
> On Dec 20, 2024, Jakub Jelinek  wrote:
>
> > On Wed, Dec 18, 2024 at 12:59:11AM -0300, Alexandre Oliva wrote:
> >> * gcc.dg/field-merge-16.c: New.
>
> > Note the test FAILs on i686-linux or on x86_64-linux with -m32.
>
> Indeed, thanks.  Here's a fix.
>
>
> On 32-bit hosts, data types with 64-bit alignment aren't getting
> treated as desired by ifcombine field-merging: we limit the choice of
> modes to BITS_PER_WORD sizes, but when deciding the boundary for a
> split, we'd limit the choice only by the alignment, so we wouldn't
> even consider a split at an odd 32-bit boundary.  Fix that by limiting
> the boundary choice by word size as well.
>
> Now, this would still leave misaligned 64-bit fields in 64-bit-aligned
> data structures unhandled by ifcombine on 32-bit hosts.  We already
> need to load them as double words, and if they're not byte-aligned,
> the code gets really ugly, but ifcombine could improve it if it allows
> double-word loads as a last resort.  I've added that.
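A quick way to see which fields the double-word fallback is aimed at is to count how many 32-bit words each field of the packed structure in the new test gcc.dg/field-merge-17.c touches. The sketch below assumes 8-bit bytes, 16-bit short, 32-bit int and 64-bit long long (as on i686 and x86_64), and `words_touched` is a hypothetical helper, not part of gimple-fold.cc. With 32-bit words, b (bits 16 to 79) and d (bits 112 to 175) each span three words, whereas with 64-bit accesses each spans only two.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the structure in field-merge-17.c.  */
struct s {
  short a;
  long long b;
  int c;
  long long d;
  short e;
} __attribute__ ((packed, aligned (8)));

/* Hypothetical helper: number of WORD_BITS-sized words touched by a
   field that starts at bit POS and is BITSIZE bits wide.  */
static int words_touched (long pos, long bitsize, long word_bits)
{
  long first_word = pos / word_bits;
  long last_word = (pos + bitsize - 1) / word_bits;
  return (int) (last_word - first_word + 1);
}

/* Bit position and bit size of a member of struct s (assumes 8-bit bytes).  */
#define FIELD_BITPOS(f) ((long) offsetof (struct s, f) * 8)
#define FIELD_BITSIZE(f) ((long) sizeof (((struct s *) 0)->f) * 8)
```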
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?

OK.

Thanks,
Richard.

>
> for  gcc/ChangeLog
>
> * gimple-fold.cc (fold_truth_andor_for_ifcombine): Limit
> boundary choice by word size as well.  Try aligned double-word
> loads as a last resort.
>
> for  gcc/testsuite/ChangeLog
>
> * gcc.dg/field-merge-17.c: New.
> ---
>  gcc/gimple-fold.cc|   30 +++---
>  gcc/testsuite/gcc.dg/field-merge-17.c |   46 +
>  2 files changed, 73 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-17.c
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 2d6e2074416f5..0e832158a47b3 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -8381,16 +8381,40 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
>  {
>/* Consider the possibility of recombining loads if any of the
>  fields straddles across an alignment boundary, so that either
> -part can be loaded along with the other field.  */
> +part can be loaded along with the other field.  Since we
> +limit access modes to BITS_PER_WORD, don't exceed that,
> +otherwise on a 32-bit host and a 64-bit-aligned data
> +structure, we'll fail the above for a field that straddles
> +across two words, and would fail here for not even trying to
> +split it at between 32-bit words.  */
>HOST_WIDE_INT boundary = compute_split_boundary_from_align
> -   (ll_align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
> +   (MIN (ll_align, BITS_PER_WORD),
> +ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
>
>if (boundary < 0
>   || !get_best_mode (boundary - first_bit, first_bit, 0, ll_end_region,
>  ll_align, BITS_PER_WORD, volatilep, &lnmode)
>   || !get_best_mode (end_bit - boundary, boundary, 0, ll_end_region,
>  ll_align, BITS_PER_WORD, volatilep, &lnmode2))
> -   return 0;
> +   {
> + if (ll_align <= BITS_PER_WORD)
> +   return 0;
> +
> + /* As a last resort, try double-word access modes.  This
> +enables us to deal with misaligned double-word fields
> +that straddle across 3 separate words.  */
> + boundary = compute_split_boundary_from_align
> +   (MIN (ll_align, 2 * BITS_PER_WORD),
> +ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
> + if (boundary < 0
> + || !get_best_mode (boundary - first_bit, first_bit,
> +0, ll_end_region, ll_align, 2 * BITS_PER_WORD,
> +volatilep, &lnmode)
> + || !get_best_mode (end_bit - boundary, boundary,
> +0, ll_end_region, ll_align, 2 * BITS_PER_WORD,
> +volatilep, &lnmode2))
> +   return 0;
> +   }
>
>/* If we can't have a single load, but can with two, figure out whether
>  the two compares can be separated, i.e., whether the entirety of the
> diff --git a/gcc/testsuite/gcc.dg/field-merge-17.c b/gcc/testsuite/gcc.dg/field-merge-17.c
> new file mode 100644
> index 0..06c8ec16e86c6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-17.c
> @@ -0,0 +1,46 @@
> +/* { dg-do run } */
> +/* { dg-options "-O -fdump-tree-ifcombine-details" } */
> +
> +/* Check that we can optimize misaligned double-words.  */
> +
> +struct s {
> +  short a;
> +  long long b;
> +  int c;
> +  long long d;
> +  short e;
> +} __attribute__ ((packed, aligned (8)));
> +
> +struct s p = { 0, 0, 0, 0, 0 };
> +
> +__attribute__ ((__noinline__, __noipa__, __noclone__))
> +int fp ()
> +{
> +  if (p.a
> +  || p.b
> +  || p.c
> +  || p.d
> +  || p.e)
> +return 1;
> +  else
> +retur

[PATCH v2] Add warning for non-spec compliant FMV in Aarch64

2025-01-09 Thread alfie.richards


This patch adds a warning when FMV is used for AArch64.

The reason for this is that the ACLE [1] spec for FMV has diverged
significantly from the current implementation, and we want to prevent
potential future compatibility issues.

There is a patch for an ACLE-compliant version of target_version and
target_clones in progress, but it won't make gcc-15.

This has been bootstrapped and regression tested for AArch64.
Is this okay for master and for backport to gcc-14?

[1] 
https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_mangle_decl_assembler_name): Add experimental warning.
* config/aarch64/aarch64.opt: Add command line option to disable
warning.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-1.C: Add CLI flag
* g++.target/aarch64/mv-symbols1.C: Add CLI flag
* g++.target/aarch64/mv-symbols2.C: Add CLI flag
* g++.target/aarch64/mv-symbols3.C: Add CLI flag
* g++.target/aarch64/mv-symbols4.C: Add CLI flag
* g++.target/aarch64/mv-symbols5.C: Add CLI flag
* g++.target/aarch64/mvc-symbols1.C: Add CLI flag
* g++.target/aarch64/mvc-symbols2.C: Add CLI flag
* g++.target/aarch64/mvc-symbols3.C: Add CLI flag
* g++.target/aarch64/mvc-symbols4.C: Add CLI flag
* g++.target/aarch64/mv-warning1.C: New test.
---
 gcc/config/aarch64/aarch64.cc   |  4 
 gcc/config/aarch64/aarch64.opt  |  4 
 gcc/doc/invoke.texi | 11 ++-
 gcc/testsuite/g++.target/aarch64/mv-1.C |  1 +
 gcc/testsuite/g++.target/aarch64/mv-symbols1.C  |  1 +
 gcc/testsuite/g++.target/aarch64/mv-symbols2.C  |  1 +
 gcc/testsuite/g++.target/aarch64/mv-symbols3.C  |  1 +
 gcc/testsuite/g++.target/aarch64/mv-symbols4.C  |  1 +
 gcc/testsuite/g++.target/aarch64/mv-symbols5.C  |  1 +
 gcc/testsuite/g++.target/aarch64/mv-warning1.C  |  9 +
 gcc/testsuite/g++.target/aarch64/mvc-symbols1.C |  1 +
 gcc/testsuite/g++.target/aarch64/mvc-symbols2.C |  1 +
 gcc/testsuite/g++.target/aarch64/mvc-symbols3.C |  1 +
 gcc/testsuite/g++.target/aarch64/mvc-symbols4.C |  1 +
 14 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-warning1.C

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 91de13159cb..7d64e99b76b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -20347,6 +20347,10 @@ aarch64_mangle_decl_assembler_name (tree decl, tree id)
   if (TREE_CODE (decl) == FUNCTION_DECL
   && DECL_FUNCTION_VERSIONED (decl))
 {
+  warning_at (DECL_SOURCE_LOCATION(decl),  OPT_Wexperimental_fmv_target,
+		  "Function Multi Versioning support is experimental, and the "
+		  "behavior is likely to change");
+
   aarch64_fmv_feature_mask feature_mask = get_feature_mask_for_version (decl);
 
   std::string name = IDENTIFIER_POINTER (id);
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 36bc719b822..2a8dd8ea66c 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -431,3 +431,7 @@ handling.  One means we try to form pairs involving one or more existing
 individual writeback accesses where possible.  A value of two means we
 also try to opportunistically form writeback opportunities by folding in
 trailing destructive updates of the base register used by a pair.
+
+Wexperimental-fmv-target
+Target Var(warn_experimental_fmv) Warning Init(1)
+Warn about usage of experimental Function Multi Versioning.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 51dc871e6bc..bdf9ee1bc0c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -822,7 +822,8 @@ Objective-C and Objective-C++ Dialects}.
 -moverride=@var{string}  -mverbose-cost-dump
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg}
 -mstack-protector-guard-offset=@var{offset} -mtrack-speculation
--moutline-atomics -mearly-ldp-fusion -mlate-ldp-fusion}
+-moutline-atomics -mearly-ldp-fusion -mlate-ldp-fusion
+-Wexperimental-fmv-target}
 
 @emph{Adapteva Epiphany Options}
 @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs
@@ -22087,6 +22088,14 @@ which specify use of that register as a fixed register,
 and @samp{none}, which means that no register is used for this
 purpose.  The default is @option{-m1reg-none}.
 
+@opindex Wexperimental-fmv-target
+@opindex Wno-experimental-fmv-target
+@item -Wexperimental-fmv-target
+Warn about use of experimental Function Multi Versioning.
+The Arm C Language Extension specification for Function Multi Versioning
+is beta and subject to change.  Any use of FMV comes with the caveat that
+future behavior changes and incompatibilities are likely.
+
 @end table
 
 @node AMD GCN Options
diff --git a/gcc/testsuite/g++.target/aarch64/mv-1.C b/gcc/testsuite/g++.target/aarch64/mv-1.C
index b4b0e5e3fea..b1003

nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+ [PR65181] (was: Raise nvptx code generation to default PTX ISA 7.3, sm_52, therefore CUDA 11.3 (released 2021-04))

2025-01-09 Thread Thomas Schwinge

Hi!

On 2024-09-20T18:49:46+0200, I wrote:
> We'd like to raise nvptx code generation from PTX ISA 6.0, sm_30 "Kepler"
> to default PTX ISA 7.3, sm_52 "Maxwell", therefore CUDA 11.3 (2021-04).
> This is, primarily, so that we're able to use 'alloca' and related stack
> manipulation instructions, and improve upon the current:
>
> sorry ("target cannot support alloca");

Pushed to trunk branch commit 3861d362ec7e3c50742fc43833fe9d8674f4070e
"nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+ [PR65181]", see
attached.


Grüße
 Thomas


From 3861d362ec7e3c50742fc43833fe9d8674f4070e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sat, 7 Dec 2024 00:17:49 +0100
Subject: [PATCH] nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+
 [PR65181]

..., and use it for '-mno-soft-stack': PTX "native" stacks.

	PR target/65181
	gcc/
	* config/nvptx/nvptx.cc (nvptx_get_drap_rtx): Handle
	'!TARGET_SOFT_STACK'.
	* config/nvptx/nvptx.md (define_c_enum "unspec"): Add
	'UNSPEC_STACKSAVE', 'UNSPEC_STACKRESTORE'.
	(define_expand "allocate_stack", define_expand "save_stack_block")
	(define_expand "restore_stack_block"): Handle '!TARGET_SOFT_STACK',
	PTX 'alloca'.
	(define_insn "@nvptx_alloca_")
	(define_insn "@nvptx_stacksave_")
	(define_insn "@nvptx_stackrestore_"): New.
	* doc/invoke.texi (Nvidia PTX Options): Update '-msoft-stack',
	'-mno-soft-stack'.
	* doc/sourcebuild.texi (nvptx-specific attributes): Document
	'nvptx_runtime_alloca_ptx'.
	(Add Options): Document 'nvptx_alloca_ptx'.
	gcc/testsuite/
	* gcc.target/nvptx/alloca-1.c: Evolve into...
	* gcc.target/nvptx/alloca-1-O0.c: ... this, ...
	* gcc.target/nvptx/alloca-1-O1.c: ... this, and...
	* gcc.target/nvptx/alloca-1-sm_30.c: ... this.
	* gcc.target/nvptx/vla-1.c: Evolve into...
	* gcc.target/nvptx/vla-1-O0.c: ... this, ...
	* gcc.target/nvptx/vla-1-O1.c: ... this, and...
	* gcc.target/nvptx/vla-1-sm_30.c: ... this.
	* gcc.c-torture/execute/pr36321.c: Adjust.
	* gcc.target/nvptx/__builtin_alloca_0-1-O0.c: Likewise.
	* gcc.target/nvptx/__builtin_alloca_0-1-O1.c: Likewise.
	* gcc.target/nvptx/__builtin_stack_save___builtin_stack_restore-1.c:
	Likewise.
	* gcc.target/nvptx/softstack.c: Likewise.
	* gcc.target/nvptx/__builtin_stack_save___builtin_stack_restore-1-sm_30.c:
	New.
	* gcc.target/nvptx/alloca-2-O0.c: Likewise.
	* gcc.target/nvptx/alloca-3-O1.c: Likewise.
	* gcc.target/nvptx/alloca-4-O3.c: Likewise.
	* gcc.target/nvptx/alloca-5.c: Likewise.
	* lib/target-supports.exp (check_effective_target_alloca): Adjust.
	(check_nvptx_default_ptx_isa_target_architecture_at_least)
	(check_nvptx_runtime_ptx_isa_target_architecture_at_least)
	(check_effective_target_nvptx_runtime_alloca_ptx)
	(add_options_for_nvptx_alloca_ptx): New.
	libgomp/
	* fortran.c (omp_get_device_from_uid_): Adjust.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
---
 gcc/config/nvptx/nvptx.cc |   4 +-
 gcc/config/nvptx/nvptx.md |  92 ---
 gcc/doc/invoke.texi   |  13 ++-
 gcc/doc/sourcebuild.texi  |   6 +
 gcc/testsuite/gcc.c-torture/execute/pr36321.c |   3 +
 .../nvptx/__builtin_alloca_0-1-O0.c   |   2 +
 .../nvptx/__builtin_alloca_0-1-O1.c   |   2 +
 ...ack_save___builtin_stack_restore-1-sm_30.c |  28 +
 ...tin_stack_save___builtin_stack_restore-1.c |   8 +-
 gcc/testsuite/gcc.target/nvptx/alloca-1-O0.c  |  49 
 gcc/testsuite/gcc.target/nvptx/alloca-1-O1.c  |  33 ++
 .../nvptx/{alloca-1.c => alloca-1-sm_30.c}|   1 +
 gcc/testsuite/gcc.target/nvptx/alloca-2-O0.c  |  12 ++
 gcc/testsuite/gcc.target/nvptx/alloca-3-O1.c  |  40 +++
 gcc/testsuite/gcc.target/nvptx/alloca-4-O3.c  |  55 +
 gcc/testsuite/gcc.target/nvptx/alloca-5.c | 107 ++
 gcc/testsuite/gcc.target/nvptx/softstack.c|   2 +
 gcc/testsuite/gcc.target/nvptx/vla-1-O0.c |  29 +
 gcc/testsuite/gcc.target/nvptx/vla-1-O1.c |  40 +++
 .../nvptx/{vla-1.c => vla-1-sm_30.c}  |   1 +
 gcc/testsuite/lib/target-supports.exp | 105 -
 libgomp/fortran.c |   4 +-
 .../libgomp.oacc-fortran/privatized-ref-2.f90 |  10 --
 23 files changed, 611 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/__builtin_stack_save___builtin_stack_restore-1-sm_30.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/alloca-1-O0.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/alloca-1-O1.c
 rename gcc/testsuite/gcc.target/nvptx/{alloca-1.c => alloca-1-sm_30.c} (83%)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/alloca-2-O0.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/alloca-3-O1.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/alloca-4-O3.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/alloca-5.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/vla-1-O0.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/vla-1-O1.c
 rename gcc/testsuite/gcc.target/nvptx/{vla

Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive

2025-01-09 Thread Qing Zhao



> On Jan 9, 2025, at 03:17, Sam James  wrote:
> 
> Richard Biener  writes:
> 
>> On Wed, Jan 8, 2025 at 5:34 PM Qing Zhao  wrote:
>>> 
>>> 
>>> 
 On Jan 7, 2025, at 07:29, Richard Biener  
 wrote:
 
 On Mon, Jan 6, 2025 at 5:40 PM Qing Zhao  wrote:
> 
> 
> 
>> On Jan 6, 2025, at 11:01, Richard Biener  
>> wrote:
>> 
>> On Mon, Jan 6, 2025 at 3:43 PM Qing Zhao  wrote:
>>> 
>>> 
>>> 
 On Jan 6, 2025, at 09:21, Jeff Law  wrote:
 
 
 
 On 1/6/25 7:11 AM, Qing Zhao wrote:
>> 
>> Given it doesn't cause user visible UB, we could insert the
>> trap *before* the UB inducing statement.  That would then
>> make the statement unreachable and it'd get removed avoiding
>> the false positive diagnostic.
> Yes, that’s a good idea.
> However, in order to distinguish a user visible UB and a UB in the IL 
> that is introduced purely by compiler, we might need some new marking 
> in the IR?
 I don't think we've ever really tackled that question; the
 closest I can think of would be things like integer overflow
 which we try to avoid allowing the compiler to introduce.  If
 we take the integer overflow as the model, then that would say
 we should be tackling this during loop unrolling.
>>> 
>>> UB that is introduced by compiler transformations is one important cause
>>> of false positive warnings.
>>> 
>>> There are two approaches to tackle this problem from my understanding:
>>> 
>>> 1. Avoid generating such UB from the beginning, i.e., for every compiler
>>> transformation that might introduce such UB, we should add a check to
>>> avoid generating it.
>>> 
>>> 2. Mark the IR portions that were generated by compiler
>>> transformations, then check whether the UB is compiler-generated when
>>> issuing static checker warnings.
>>> 
>>> Are there other approaches?
>> 
>> Note unrolling doesn't introduce UB - it makes conditional UB
>> "obvious”.
> 
> So, you mean this is the same issue as PR109071 (and PR85788,
> PR88771, etc), i.e, the compiler optimization make the
> conditional UB that’s originally in the source code “obvious”
> after code duplication?
> 
> (I need to study the testing case in PR92539 more carefully to make sure 
> this is the case...)
> 
> If so, then the claimed false positive warning in PR92539
> actually is a real bug in the original source code, and my patch
> that introduced the new option “--fdiagnostics-details” should
> also include loop unrolling to provide more details on the
> warning introduced by loop unrolling.
> 
> 
>> Note -Warray-bounds wants to
>> diagnose UB, so doing path isolation and removing the UB would make
>> -Warray-bounds useless.
>> 
>> So unless the condition guarding the UB unrolling exposes is visibly
>> false to the compiler but we fail
>> to exploit that (missed optimization) there's not much that we can do.
>> I think "folding" away the UB
>> like what Jeff proposes trades false negatives for the false positive
>> diagnostics.
>> 
>> Note the unroller knows UB that effectively bounds the number of
>> iterations, even on conditional
>> paths and it uses this to limit the number of copies _and_ to prune
>> unreachable paths (exploiting
>> UB, avoiding diagnostics).  But one of the limitations is that it only
>> prunes paths in the last unrolled
>> copy which can be insufficient (ISTR some PR where I noticed this).
>> 
>> That said - I think for these unroller exposed cases of apparent false
>> positives we should improve
>> the path pruning in the unroller itself.  For the other cases the path
>> diagnostic might help clarify
>> that the UB happens on the 'n-th' iteration of the loop when some
>> additional condition is true/false.
> 
> So, the “other cases” refer to situations similar to PR109071, i.e.,
> “conditional UB” in the original source code is made obvious after loop
> unrolling?
> Yes, for such cases, the new option I have been trying to add, 
> “-fdiagnostic-details” should be able to track and provide more details 
> on the conditions that lead to the UB.
> Is this understanding correct?
 
 I think so, but I didn't look into the testcase of the referenced PR.
>>> 
>>> I studied the test case of PR92539 in detail yesterday.  The
>>> following is a brief summary:
>>> 
>>> 1. The pass that caused the issue is: cunrolli.
>>> Adding -fdisable-tree-cunrolli eliminates the false positive warnings.
>>> 
>>> 2. The IR before cunrolli:
>>> 
>>> const char *local_iterator = beginning address of string "aa";
>>> const char *last = last address of string "aa";
>>> 
>>> for (int i = 0

Re: [PATCH] c++: Suppress note linked to error suppressed by -Wno-template-body [PR118163]

2025-01-09 Thread Simon Martin

On 9 Jan 2025, at 18:05, Patrick Palka wrote:

> On Wed, 8 Jan 2025, Jason Merrill wrote:
>
>> On 12/21/24 11:35 AM, Simon Martin wrote:
>>> When erroring out due to an incomplete type, we add a contextual note
>>> about the type. However, when the error is suppressed by
>>> -Wno-template-body, the note remains, making the compiler output quite
>>> puzzling.
>>>
>>> This patch makes sure the note is suppressed if we're processing a
>>> template declaration body with -Wno-template-body.
>>>
>>> Successfully tested on x86_64-pc-linux-gnu.
>>>
>>> PR c++/118163
>>>
>>> gcc/cp/ChangeLog:
>>>
>>> * cp-tree.h (get_current_template): Declare.
>>> * error.cc (get_current_template): Make non static.
>>> * typeck2.cc (cxx_incomplete_type_inform): Suppress note when
>>> parsing a template declaration with -Wno-template-body.
>>
>> I think rather than adding this sort of thing in lots of places where an error
>> is followed by an inform, we should change error to return bool like other
>> diagnostic functions, and check its return value before calling
>> cxx_incomplete_type_inform or plain inform.  This likely involves the same
>> number of changes, but they should be smaller.
>>
>> Patrick, what do you think?
>
> That makes sense to me, it's consistent with the 'warning' API and how
> we handle issuing a warning followed by a note.  But since the
> -Wtemplate-body mechanism is really only useful for compiling legacy
> code where you don't really care about any diagnostics anyway, and
> the intended way to use it is -fpermissive / -Wno-error=template-body
> rather than -Wno-template-body, I'd prefer a less invasive solution that
> doesn't change the API of 'error' if possible.
What I like in Jason’s suggestion is that even though it’s a bit 
more invasive, you know exactly what you get when you call error.

Right now you know in >99% of the cases, and it’s great. But when the
diagnostics machinery changes things under you, the UX becomes weird for
the end user (this PR) or the GCC developer (see PR 118388 with
seen_error and permerror).

This PR is clearly not a burning issue, and it’s fine to only fix it 
in GCC 16 with a more robust fix.
>
> I wonder if we can work around this by taking advantage of the fact that
> notes that follow an error are expected to be linked via an active
> auto_diagnostic_group?  Roughly, if we issued a -Wtemplate-body
> diagnostic from an active auto_diagnostic_group then all other
> diagnostics from that auto_diagnostic_group should also be associated
> with -Wtemplate-body, including notes.  That way -Wno-template-body will
> effectively suppress subsequent notes followed by an eligible error, and
> no 'error' callers need to be changed (unless to use
> auto_diagnostic_group).
>
> Another simpler approach, maybe for templates that have already been
> deemed erroneous, and -Wno-template-body is active, we could suppress
> all subsequent diagnostics, including warnings/notes.
>
>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * g++.dg/diagnostic/incomplete-type-2.C: New test.
>>> * g++.dg/diagnostic/incomplete-type-2a.C: New test.
>>>
>>> ---
>>>   gcc/cp/cp-tree.h|  1 +
>>>   gcc/cp/error.cc |  2 +-
>>>   gcc/cp/typeck2.cc   |  6 ++
>>>   gcc/testsuite/g++.dg/diagnostic/incomplete-type-2.C |  7 +++
>>>   .../g++.dg/diagnostic/incomplete-type-2a.C  | 13 +
>>>   5 files changed, 28 insertions(+), 1 deletion(-)
>>>   create mode 100644 gcc/testsuite/g++.dg/diagnostic/incomplete-type-2.C
>>>   create mode 100644 gcc/testsuite/g++.dg/diagnostic/incomplete-type-2a.C
>>>
>>> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
>>> index 6de8f64b5ee..52f954b63d9 100644
>>> --- a/gcc/cp/cp-tree.h
>>> +++ b/gcc/cp/cp-tree.h
>>> @@ -7297,6 +7297,7 @@ struct decl_location_traits
>>>   typedef hash_map
>>> erroneous_templates_t;
>>>   extern GTY((cache)) erroneous_templates_t *erroneous_templates;
>>>   +extern tree get_current_template ();
>>>   extern bool cp_seen_error ();
>>>   #define seen_error() cp_seen_error ()
>>>   diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
>>> index 8c0644fba7e..7fd03dd6d12 100644
>>> --- a/gcc/cp/error.cc
>>> +++ b/gcc/cp/error.cc
>>> @@ -197,7 +197,7 @@ class cxx_format_postprocessor : public format_postprocessor
>>>   /* Return the in-scope template that's currently being parsed, or
>>>  NULL_TREE otherwise.  */
>>>   -static tree
>>> +tree
>>>   get_current_template ()
>>>   {
>>> if (scope_chain && in_template_context && !current_instantiation ())
>>> diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
>>> index fce687e83b3..505c143dae7 100644
>>> --- a/gcc/cp/typeck2.cc
>>> +++ b/gcc/cp/typeck2.cc
>>> @@ -273,6 +273,12 @@ cxx_incomplete_type_inform (const_tree type)
>>> if (!TYPE_MAIN_DECL (type))
>>>   return;
>>>   +  /* When processing a templa

Re: [Patch, Fortran, PR118337, v1] Fortran: Fix Fortran *.mod compatibility [PR118337]

2025-01-09 Thread Mikael Morin


Le 09/01/2025 à 18:12, Andre Vehreschild a écrit :

Hi Jakub,

Yes, that is what I had in mind. Being German I don't see any problem with the
explanation, but that is better judged by a native English speaker.

Is the send patch hunk intentional where only indentation is changed? I haven't
applied it though.

Thanks for the patch,
Andre


Fine with me as well, thanks.

Re: [PATCH] c++: Suppress note linked to error suppressed by -Wno-template-body [PR118163]

2025-01-09 Thread Jason Merrill


On 1/9/25 12:05 PM, Patrick Palka wrote:

On Wed, 8 Jan 2025, Jason Merrill wrote:


On 12/21/24 11:35 AM, Simon Martin wrote:

When erroring out due to an incomplete type, we add a contextual note
about the type. However, when the error is suppressed by
-Wno-template-body, the note remains, making the compiler output quite
puzzling.

This patch makes sure the note is suppressed if we're processing a
template declaration body with -Wno-template-body.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/118163

gcc/cp/ChangeLog:

* cp-tree.h (get_current_template): Declare.
* error.cc (get_current_template): Make non static.
* typeck2.cc (cxx_incomplete_type_inform): Suppress note when
parsing a template declaration with -Wno-template-body.


I think rather than adding this sort of thing in lots of places where an error
is followed by an inform, we should change error to return bool like other
diagnostic functions, and check its return value before calling
cxx_incomplete_type_inform or plain inform.  This likely involves the same
number of changes, but they should be smaller.

Patrick, what do you think?


That makes sense to me, it's consistent with the 'warning' API and how
we handle issuing a warning followed by a note.  But since the
-Wtemplate-body mechanism is really only useful for compiling legacy
code where you don't really care about any diagnostics anyway, and
the intended way to use it is -fpermissive / -Wno-error=template-body
rather than -Wno-template-body, I'd prefer a less invasive solution that
doesn't change the API of 'error' if possible.

I wonder if we can work around this by taking advantage of the fact that
notes that follow an error are expected to be linked via an active
auto_diagnostic_group?  Roughly, if we issued a -Wtemplate-body
diagnostic from an active auto_diagnostic_group then all other
diagnostics from that auto_diagnostic_group should also be associated
with -Wtemplate-body, including notes.  That way -Wno-template-body will
effectively suppress subsequent notes followed by an eligible error, and
no 'error' callers need to be changed (unless to use
auto_diagnostic_group).


I like the idea that if the first diagnostic in an auto_diagnostic_group 
is suppressed, the others are as well.
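A minimal sketch of that rule (if the first diagnostic in a group is suppressed, the rest of the group follows) is below. It uses a hypothetical `struct diag_group` and `group_report`, not GCC's actual auto_diagnostic_group internals. It also returns bool, in the spirit of the alternative of having error return bool like warning, so a caller could test the result before emitting a note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an active diagnostic group.  */
struct diag_group
{
  bool started;     /* Has the first diagnostic been reported?  */
  bool suppressed;  /* Was that first diagnostic suppressed?  */
  int emitted;      /* Number of diagnostics actually printed.  */
};

/* Report MSG within group G.  ENABLED says whether the controlling
   option (e.g. a -Wtemplate-body-like flag) allows this diagnostic;
   notes pass true.  Returns true if the diagnostic was printed.  */
static bool group_report (struct diag_group *g, bool enabled, const char *msg)
{
  if (!g->started)
    {
      g->started = true;
      g->suppressed = !enabled;
    }
  /* Once the first diagnostic is suppressed, so is the rest of the group.  */
  if (g->suppressed || !enabled)
    return false;
  printf ("%s\n", msg);
  g->emitted++;
  return true;
}
```

With the option disabled, an error followed by a note in the same group prints nothing; with it enabled, both are printed.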



Another simpler approach, maybe for templates that have already been
deemed erroneous, and -Wno-template-body is active, we could suppress
all subsequent diagnostics, including warnings/notes.


That would also make sense. Relatedly, see the handling of 
current_tinst_level->errors and setting CLASSTYPE_ERRONEOUS.



gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/incomplete-type-2.C: New test.
* g++.dg/diagnostic/incomplete-type-2a.C: New test.

---
   gcc/cp/cp-tree.h|  1 +
   gcc/cp/error.cc |  2 +-
   gcc/cp/typeck2.cc   |  6 ++
   gcc/testsuite/g++.dg/diagnostic/incomplete-type-2.C |  7 +++
   .../g++.dg/diagnostic/incomplete-type-2a.C  | 13 +
   5 files changed, 28 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/diagnostic/incomplete-type-2.C
   create mode 100644 gcc/testsuite/g++.dg/diagnostic/incomplete-type-2a.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 6de8f64b5ee..52f954b63d9 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7297,6 +7297,7 @@ struct decl_location_traits
   typedef hash_map
erroneous_templates_t;
   extern GTY((cache)) erroneous_templates_t *erroneous_templates;
   +extern tree get_current_template ();
   extern bool cp_seen_error ();
   #define seen_error() cp_seen_error ()
   diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 8c0644fba7e..7fd03dd6d12 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -197,7 +197,7 @@ class cxx_format_postprocessor : public format_postprocessor
   /* Return the in-scope template that's currently being parsed, or
  NULL_TREE otherwise.  */
   -static tree
+tree
   get_current_template ()
   {
 if (scope_chain && in_template_context && !current_instantiation ())
diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index fce687e83b3..505c143dae7 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -273,6 +273,12 @@ cxx_incomplete_type_inform (const_tree type)
 if (!TYPE_MAIN_DECL (type))
   return;
   +  /* When processing a template declaration body, the error generated by the
+     caller (if any) might have been suppressed by -Wno-template-body. If that
+     is the case, suppress the inform as well.  */
+  if (!warn_template_body && get_current_template ())
+    return;
+
 location_t loc = DECL_SOURCE_LOCATION (TYPE_MAIN_DECL (type));
 tree ptype = strip_top_quals (CONST_CAST_TREE (type));
   diff --git a/gcc/testsuite/g++.dg/diagnostic/incomplete-type-2.C b/gcc/testsuite/g++.dg/diagnostic/incomplete-type-2.C
new file mode 100644
index 000..e2fb20a4ae8
--- /d

Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive

2025-01-09 Thread Qing Zhao



> On Jan 9, 2025, at 03:08, Richard Biener  wrote:
> 
> On Wed, Jan 8, 2025 at 5:34 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Jan 7, 2025, at 07:29, Richard Biener  wrote:
>>> 
>>> On Mon, Jan 6, 2025 at 5:40 PM Qing Zhao  wrote:
 
 
 
> On Jan 6, 2025, at 11:01, Richard Biener  
> wrote:
> 
> On Mon, Jan 6, 2025 at 3:43 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Jan 6, 2025, at 09:21, Jeff Law  wrote:
>>> 
>>> 
>>> 
>>> On 1/6/25 7:11 AM, Qing Zhao wrote:
> 
> Given it doesn't cause user visible UB, we could insert the trap 
> *before* the UB inducing statement.  That would then make the 
> statement unreachable and it'd get removed avoiding the false 
> positive diagnostic.
 Yes, that’s a good idea.
 However, in order to distinguish a user visible UB and a UB in the IL 
 that is introduced purely by compiler, we might need some new marking 
 in the IR?
>>> I don't think we've ever really tackled that question; the closest I 
>>> can think of would be things like integer overflow which we try to 
>>> avoid allowing the compiler to introduce.  If we take the integer 
>>> overflow as the model, then that would say we should be tackling this 
>>> during loop unrolling.
>> 
>> UB that is introduced by compiler transformation is one important cause 
>> of false positive warnings.
>> 
>> There are two approaches to tackle this problem from my understanding:
>> 
>> 1. Avoid generating such UB in the first place, i.e., for every compiler
>> transformation that might introduce such UB, add a check to
>> avoid generating it.
>> 
>> 2. Mark the IR portions that were generated by compiler
>> transformations, then check whether the UB is compiler-generated when
>> issuing static checker warnings.
>> 
>> Are there other approaches?
> 
> Note unrolling doesn't introduce UB - it makes conditional UB
> “obvious”.
 
 So, you mean this is the same issue as PR109071 (and PR85788, PR88771,
 etc.), i.e., the compiler optimization makes the conditional UB that is
 originally in the source code “obvious” after code duplication?
 
 (I need to study the testing case in PR92539 more carefully to make sure 
 this is the case...)
 
 If so, then the claimed false positive warning in PR92539 is actually a
 real bug in the original source code, and my patch that introduced the
 new option “-fdiagnostics-details” should also cover loop unrolling, to
 provide more details on warnings exposed by loop unrolling.
 
 
> Note -Warray-bounds wants to diagnose UB, so doing path isolation and
> removing the UB would make -Warray-bounds useless.
> 
> So unless the condition guarding the UB that unrolling exposes is
> visibly false to the compiler but we fail to exploit that (missed
> optimization), there's not much that we can do.  I think "folding" away
> the UB like what Jeff proposes trades false negatives for the false
> positive diagnostics.
> 
> Note the unroller knows UB that effectively bounds the number of
> iterations, even on conditional paths, and it uses this to limit the
> number of copies _and_ to prune unreachable paths (exploiting UB,
> avoiding diagnostics).  But one of the limitations is that it only
> prunes paths in the last unrolled copy, which can be insufficient (ISTR
> some PR where I noticed this).
> 
> That said - I think for these unroller-exposed cases of apparent false
> positives we should improve the path pruning in the unroller itself.
> For the other cases the path diagnostic might help clarify that the UB
> happens on the 'n-th' iteration of the loop when some additional
> condition is true/false.
 
 So, the “other cases” refer to situations similar to PR109071, i.e.,
 “conditional UB” in the original source code that is made obvious after
 loop unrolling?
 Yes, for such cases, the new option I have been trying to add,
 “-fdiagnostics-details”, should be able to track and provide more details
 on the conditions that lead to the UB.
 Is this understanding correct?
>>> 
>>> I think so, but I didn't look into the testcase of the referenced PR.
>> 
>> I studied the test case of PR92539 in detail yesterday.  The
>> following is a brief summary:
>> 
>> 1. The pass that caused the issue is: cunrolli.
>> Adding -fdisable-tree-cunrolli eliminates the false positive warnings.
>> 
>> 2. The IR before cunrolli:
>> 
>> const char *local_iterator = beginning address of string "aa";
>> const char *last = last address of string "aa";
>> 
>> for (int i = 0; i < 3; ++i)
>>  if (local_iterator != last)   // pointer comparison 1
>>{
>>  local_iterator++;
>>  if (local_iterator != l
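The PR92539 IR summary above is cut off, so here is a separate, hypothetical C illustration of Richard's point that unrolling does not introduce UB but makes conditional UB visible. The array, bounds and function names are invented for the example and are not the PR's testcase.

```c
#include <assert.h>

static int buf[3];

/* Rolled form: the loop always runs 4 times, but the store is guarded by
   i < n, so buf[3] is only written (UB) on the path where n > 3.  */
static void
store_rolled (int n)
{
  for (int i = 0; i < 4; i++)
    if (i < n)
      buf[i] = i + 1;
}

/* What complete unrolling conceptually produces: the last copy now has the
   constant index 3, so the out-of-bounds store is syntactically obvious
   even though it is still only reachable when n > 3.  */
static void
store_unrolled (int n)
{
  if (0 < n) buf[0] = 1;
  if (1 < n) buf[1] = 2;
  if (2 < n) buf[2] = 3;
  if (3 < n) buf[3] = 4;  /* Guarded: only UB if n > 3 at run time.  */
}
```

Both functions are well defined for n <= 3. A bounds warning on the last unrolled copy is "correct" only if the n > 3 path is feasible. This is why pruning such paths in the unroller, rather than folding the UB away, is the direction suggested above.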

Ping #2: [PATCH V4 0/5] Add more user friendly TARGET_ names for PowerPC

2025-01-09 Thread Michael Meissner

Ping patches 1-5 to Add more user friendly TARGET_ names for PowerPC:

Message-ID 

Information for patch set:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669067.html

Patch #1: Change TARGET_POPCNTB to TARGET_POWER5:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669068.html

Patch #2: Change TARGET_FPRND to TARGET_POWER5X:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669069.html

Patch #3: Change TARGET_CMPB to TARGET_POWER6:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669070.html

Patch #4: Change TARGET_POPCNTD to TARGET_POWER7:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669071.html

Patch #5: Change TARGET_MODULO to TARGET_POWER9:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669072.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping #2: [PATCH V4] Do not allow -mvsx to boost the cpu to power7

2025-01-09 Thread Michael Meissner

Ping patch to not allow -mvsx to boost the cpu to power7

Message-ID 

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669106.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Re: [PATCH 10/10] aarch64: Try to avoid passing new flags to assembler

2025-01-09 Thread Richard Sandiford

Richard Sandiford  writes:
> Andrew Carlotti  writes:
>> On Mon, Nov 25, 2024 at 11:26:39PM +, Richard Sandiford wrote:
>>> Sorry for the slow review.
>>> 
>>> Andrew Carlotti  writes:
>>> > These new flags (+fcma, +jscvt, +rcpc2, +frintts, +wfxt and +xs)
>>> > were only recently added to the assembler.  To improve compatibility
>>> > with older assemblers, we try to avoid passing these new flags to the
>>> > assembler if we can express the targeted architecture without them. We
>>> > do so by using an almost-equivalent architecture string with a higher
>>> > architecture version.
>>> >
>>> > This should never reduce the set of instructions accepted by the
>>> > assembler.  It will make it more lenient in two cases:
>>> >
>>> > 1. Many system registers are currently gated behind architecture
>>> > versions instead of specific feature flags.  Increasing the base
>>> > architecture version may cause more system register accesses to be
>>> > accepted.
>>> >
>>> > 2. FEAT_XS doesn't have an HWCAP bit or cpuinfo entry.  We still want to
>>> > avoid passing +wfxt or +noxs to the assembler if possible, so we'll
>>> > instruct the assembler to accept FEAT_XS instructions as well whenever
>>> > the rest of the new features are enabled.
>>> >
>>> > gcc/ChangeLog:
>>> >
>>> >   * common/config/aarch64/aarch64-common.cc
>>> >   (aarch64_get_arch_string_for_assembler): New.
>>> >   (aarch64_rewrite_march): New.
>>> >   (aarch64_rewrite_selected_cpu): Call new function.
>>> >   * config/aarch64/aarch64-elf.h (ASM_SPEC): Remove identity mapping.
>>> >   * config/aarch64/aarch64-protos.h
>>> >   (aarch64_get_arch_string_for_assembler): New.
>>> >   * config/aarch64/aarch64.cc
>>> >   (aarch64_declare_function_name): Call new function.
>>> >   (aarch64_start_file): Ditto.
>>> >   * config/aarch64/aarch64.h
>>> >   (EXTRA_SPEC_FUNCTIONS): Use new macro name.
>>> >   (MCPU_TO_MARCH_SPEC): Rename to...
>>> >   (MARCH_REWRITE_SPEC): ...this, and add new spec rule.
>>> >   (aarch64_rewrite_march): New declaration.
>>> >   (MCPU_TO_MARCH_SPEC_FUNCTIONS): Rename to...
>>> >   (MARCH_REWRITE_SPEC_FUNCTIONS): ...this, and add new function.
>>> >   (ASM_CPU_SPEC): Use new macro name.
>>> >
>>> > gcc/testsuite/ChangeLog:
>>> >
>>> >   * gcc.target/aarch64/cpunative/native_cpu_21.c: Update check.
>>> >   * gcc.target/aarch64/cpunative/native_cpu_22.c: Update check.
>>> >   * gcc.target/aarch64/cpunative/info_27: New test.
>>> >   * gcc.target/aarch64/cpunative/info_28: New test.
>>> >   * gcc.target/aarch64/cpunative/info_29: New test.
>>> >   * gcc.target/aarch64/cpunative/native_cpu_27.c: New test.
>>> >   * gcc.target/aarch64/cpunative/native_cpu_28.c: New test.
>>> >   * gcc.target/aarch64/cpunative/native_cpu_29.c: New test.
>>> >
>>> >
>>> > diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
>>> > index 2bfc597e333b6018970a9ee6e370a66b6d0960ef..717b3238be16f39a6fd1b4143662eb540ccf292d 100644
>>> > --- a/gcc/common/config/aarch64/aarch64-common.cc
>>> > +++ b/gcc/common/config/aarch64/aarch64-common.cc
>>> > @@ -371,6 +371,119 @@ aarch64_get_extension_string_for_isa_flags
>>> >return outstr;
>>> >  }
>>> >  
>>> > +/* Generate an arch string to be passed to the assembler.
>>> > +
>>> > +   Several flags were added retrospectively for features that were previously
>>> > +   enabled only by specifying an architecture version.  We want to avoid
>>> > +   passing these flags to the assembler if possible, to improve compatibility
>>> > +   with older assemblers.  */
>>> > +
>>> > +std::string
>>> > +aarch64_get_arch_string_for_assembler (aarch64_arch arch,
>>> > +                                       aarch64_feature_flags flags)
>>> > +{
>>> > +  if (!(flags & AARCH64_FL_FCMA) || !(flags & AARCH64_FL_JSCVT))
>>> > +    goto done;
>>> > +
>>> > +  if (arch == AARCH64_ARCH_V8A
>>> > +      || arch == AARCH64_ARCH_V8_1A
>>> > +      || arch == AARCH64_ARCH_V8_2A)
>>> > +    arch = AARCH64_ARCH_V8_3A;
>>> > +
>>> > +  if (!(flags & AARCH64_FL_RCPC2))
>>> > +    goto done;
>>> > +
>>> > +  if (arch == AARCH64_ARCH_V8_3A)
>>> > +    arch = AARCH64_ARCH_V8_4A;
>>> > +
>>> > +  if (!(flags & AARCH64_FL_FRINTTS) || !(flags & AARCH64_FL_FLAGM2))
>>> > +    goto done;
>>> > +
>>> > +  if (arch == AARCH64_ARCH_V8_4A)
>>> > +    arch = AARCH64_ARCH_V8_5A;
>>> > +
>>> > +  if (!(flags & AARCH64_FL_WFXT))
>>> > +    goto done;
>>> > +
>>> > +  if (arch == AARCH64_ARCH_V8_5A || arch == AARCH64_ARCH_V8_6A)
>>> > +    {
>>> > +      arch = AARCH64_ARCH_V8_7A;
>>> > +      /* We don't support native detection for FEAT_XS, so we'll assume it's
>>> > +         present if the rest of these features are also present.  If we don't
>>> > +         do this, then we would end up passing +noxs to the assembler.  */
>>> > +      flags |= AARCH64_FL_XS;
>>> > +    }
>>> > +done:
>>> > +
>>> > +  const struct arch_to_arch_name* a_to_an;
>>>
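The promotion ladder in aarch64_get_arch_string_for_assembler can be modelled on its own, as below. This is a sketch under assumptions. The enum and flag macros are hypothetical stand-ins for aarch64_arch and aarch64_feature_flags. The gotos become early returns. Only the arch/flags computation is shown, not the string building.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for aarch64_arch and aarch64_feature_flags.  */
enum arch_version
{
  ARCH_V8A, ARCH_V8_1A, ARCH_V8_2A, ARCH_V8_3A,
  ARCH_V8_4A, ARCH_V8_5A, ARCH_V8_6A, ARCH_V8_7A
};

#define FL_FCMA    (1u << 0)
#define FL_JSCVT   (1u << 1)
#define FL_RCPC2   (1u << 2)
#define FL_FRINTTS (1u << 3)
#define FL_FLAGM2  (1u << 4)
#define FL_WFXT    (1u << 5)
#define FL_XS      (1u << 6)

/* Raise ARCH one step at a time while every feature implied by the next
   base architecture is present in *FLAGS; stop at the first missing group.
   May add FL_XS to *FLAGS, mirroring the FEAT_XS assumption in the patch.  */
static enum arch_version
promote_arch (enum arch_version arch, uint32_t *flags)
{
  assert (flags != NULL);
  uint32_t f = *flags;

  if (!(f & FL_FCMA) || !(f & FL_JSCVT))
    return arch;
  if (arch == ARCH_V8A || arch == ARCH_V8_1A || arch == ARCH_V8_2A)
    arch = ARCH_V8_3A;

  if (!(f & FL_RCPC2))
    return arch;
  if (arch == ARCH_V8_3A)
    arch = ARCH_V8_4A;

  if (!(f & FL_FRINTTS) || !(f & FL_FLAGM2))
    return arch;
  if (arch == ARCH_V8_4A)
    arch = ARCH_V8_5A;

  if (!(f & FL_WFXT))
    return arch;
  if (arch == ARCH_V8_5A || arch == ARCH_V8_6A)
    {
      arch = ARCH_V8_7A;
      *flags |= FL_XS;
    }
  return arch;
}
```

For example, armv8.2-a with fcma and jscvt but nothing newer becomes armv8.3-a. If rcpc2 and frintts are also present but flagm2 is missing, it stops at armv8.4-a.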

Ping #2: [PATCH V4 0/4] Add support for -mcpu=future in the PowerPC

2025-01-09 Thread Michael Meissner

Ping patches 1-4 to add support for -mcpu=future in the PowerPC

Message-ID 

Information about the patch set:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669099.html

Patch #1, Add support for -mcpu=future in the PowerPC
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669101.html

Patch #2, Add tuning support for -mcpu=future
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669102.html

Patch #3, Add -mcpu=future tests
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669103.html

Patch #4, Use vector pair load/store for memcpy with -mcpu=future
Note: the second file changes the test condition for the new future-3.c test
to exclude 32-bit targets.

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669104.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669132.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping #2: [PATCH repost] PR target/99293 Optimize splat of a V2DF/V2DI extract with constant element

2025-01-09 Thread Michael Meissner

Ping patch to fix PR target/99293, Optimize splat of a V2DF/V2DI extract with
constant element:

Message-ID 

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669136.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping #2: [PATCH V4 0/2] Separate PowerPC ISA bits from architecture bits set by -mcpu=

2025-01-09 Thread Michael Meissner

Ping patches 1-2 to separate PowerPC ISA bits from architecture bits set by
-mcpu=.

Message-ID 

Explanation of the patch set:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669108.html

Patch #1, add rs6000 architecture masks:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669109.html

Patch #2, use architecture flags for defining _ARCH_PWR macros:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669110.html


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping #2: [PATCH] PR target/117487 Add power9/power10 float to logical operations

2025-01-09 Thread Michael Meissner

Ping patch to fix PR target/117487, Add power9/power10 float to logical
operations

Message-ID 

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669137.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping #2: [PATCH] PR target/108958: Use mtvsrdd to zero extend GPR DImode to VSX TImode

2025-01-09 Thread Michael Meissner

Ping patch for PR target/108958, Use mtvsrdd to zero extend GPR DImode to VSX
TImode

Message-ID 

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669242.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Ping #2: [PATCH repost] PR target/117251 Add PowerPC XXEVAL support for fusion optimization in power10

2025-01-09 Thread Michael Meissner

Ping patch to fix PR target/117251, Add PowerPC XXEVAL support for fusion
optimization in power10

Message-ID 

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669138.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

[PATCH] s390: Add testcase for just fixed PR118362

2025-01-09 Thread Jakub Jelinek

On Thu, Jan 09, 2025 at 01:29:27PM +0100, Stefan Schulze Frielinghaus wrote:
> Optimization s390_constant_via_vgbm_p() should only apply to constant
> vectors which can be expressed by the hardware, i.e., which have a size
> of at most 16 bytes, similar to what is done for s390_constant_via_vgm_p()
> and s390_constant_via_vrepi_p().
> 
> gcc/ChangeLog:
> 
>   PR target/118362
>   * config/s390/s390.cc (s390_constant_via_vgbm_p): Allow at most
>   16-byte vectors.
> ---
>  Bootstrap and regtest are still running.  If both are successful, I
>  will push this one promptly.

This was committed without a testcase; IMHO adding one shouldn't hurt.

Ok for trunk?

2025-01-09  Jakub Jelinek  

PR target/118362
* gcc.c-torture/compile/pr118362.c: New test.
* gcc.target/s390/pr118362.c: New test.

--- gcc/testsuite/gcc.c-torture/compile/pr118362.c.jj	2025-01-09 19:13:10.536029180 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr118362.c	2025-01-09 19:13:03.377128620 +0100
@@ -0,0 +1,19 @@
+/* PR target/118362 */
+
+int a, b, c[18];
+
+void
+foo (void)
+{
+  for (int i = 0; i < 8; i++)
+    if (b)
+      {
+        c[i * 2 + 1] = a;
+        c[i * 2 + 2] = 200;
+      }
+    else
+      {
+        c[i * 2 + 1] = 100;
+        c[i * 2 + 2] = 200;
+      }
+}
--- gcc/testsuite/gcc.target/s390/pr118362.c.jj	2025-01-09 19:13:47.865510663 +0100
+++ gcc/testsuite/gcc.target/s390/pr118362.c	2025-01-09 19:14:25.691985248 +0100
@@ -0,0 +1,5 @@
+/* PR target/118362 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=z14" } */
+
+#include "../../gcc.c-torture/compile/pr118362.c"


Jakub
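For readers unfamiliar with VGBM: as I understand the z/Architecture instruction, it takes a 16-bit immediate. It sets byte i of the 16-byte vector register to 0xff when bit i (counting from the most significant bit) is set, and to 0x00 otherwise. The hypothetical helper below works in that spirit and shows why anything wider than 16 bytes must be rejected. Its name and signature are invented and do not match s390_constant_via_vgbm_p.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: can the NBYTES-byte constant BYTES be produced by a
   single VGBM?  On success, store the 16-bit immediate in *MASK.  */
static bool
constant_via_vgbm (const uint8_t *bytes, size_t nbytes, uint16_t *mask)
{
  assert (mask != NULL);

  /* The point of the PR118362 fix: a vector wider than the 16-byte
     register cannot be described by one 16-bit byte mask.  */
  if (nbytes > 16)
    return false;

  uint16_t m = 0;
  for (size_t i = 0; i < nbytes; i++)
    {
      if (bytes[i] == 0xff)
        m |= (uint16_t) (0x8000u >> i);  /* Bit i, counted from the MSB.  */
      else if (bytes[i] != 0x00)
        return false;                    /* Only 0x00/0xff bytes allowed.  */
    }
  *mask = m;
  return true;
}
```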

[committed] testsuite: Require trampolines for gcc.dg/pr118325.c

2025-01-09 Thread Dimitar Dimitrov

The test case uses a nested function, which is not supported by some
targets.

This fixes a spurious error for pru-unknown-elf, where nested functions
are not supported.  Pushed to trunk as obvious.

gcc/testsuite/ChangeLog:

* gcc.dg/pr118325.c: Require effective target trampolines.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/testsuite/gcc.dg/pr118325.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/pr118325.c b/gcc/testsuite/gcc.dg/pr118325.c
index 74f92cc2bb6..7129bc9b9be 100644
--- a/gcc/testsuite/gcc.dg/pr118325.c
+++ b/gcc/testsuite/gcc.dg/pr118325.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target trampolines } */
 /* { dg-options "-std=gnu17 -fchecking" } */
 
 void f(void*);
-- 
2.47.1
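Why nested functions need trampolines: a direct call to a GNU C nested function needs no trampoline. Once its address escapes, though, the code pointer must also carry the enclosing frame (the static chain), and GCC typically implements that with a small trampoline built at run time. A minimal sketch of the pattern follows, using invented names. Like pr118325.c, it cannot be compiled on targets without trampoline support.

```c
#include <assert.h>

/* Calls FN through a plain function pointer.  */
static int
apply (int (*fn) (int), int x)
{
  return fn (x);
}

static int
add_offset (int offset, int x)
{
  /* GNU C nested function: ADD reads OFFSET from the enclosing frame.  */
  int add (int y) { return y + offset; }

  /* Taking ADD's address is what requires a trampoline.  */
  return apply (add, x);
}
```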

Re: [PATCH] [gcc-14] arm: [MVE intrinsics] Fix support for predicate constants [PR target/114801]

2025-01-09 Thread Christophe Lyon

On Thu, 9 Jan 2025 at 12:25, Richard Earnshaw (lists)
 wrote:
>
> On 09/01/2025 08:58, Christophe Lyon wrote:
> > OK for gcc-14?
> >
> > This backport is a cherry pick of commit
> > 2089009210a1774c37e527ead8bbcaaa1a7a9d2d, with a small change needed
> > because force_lowpart_subreg does not exist in gcc-14: the patch
> > replaces it with the equivalent:
> >
> > -x = force_lowpart_subreg (mode, x, GET_MODE (x));
> > +{
> > +  auto byte = subreg_lowpart_offset (mode, GET_MODE (x));
> > +  x = force_subreg (mode, x, GET_MODE (x), byte);
> > +}
>
> I think it would be OK to backport force_lowpart_subreg() to gcc-14 (nothing 
> else will call it, so it can't change the behaviour elsewhere).
>
> But this is OK too.  Your call.

Thanks, I pushed my version, because I think it's slightly less intrusive.

Christophe

>
> R.
>
> >
> > In this PR, we have to handle a case where MVE predicates are supplied
> > as a const_int, where individual predicates have illegal boolean
> > values (such as 0xc for a 4-bit boolean predicate).  To avoid the ICE,
> > fix the constant (any non-zero value is converted to all 1s) and emit
> > a warning.
> >
> > On MVE, V8BI and V4BI multi-bit masks are interpreted byte-by-byte at
> > instruction level, but end-users should describe lanes rather than
> > bytes (so all bytes of a true-predicated lane should be '1'), see the
> > section on MVE intrinsics in the Arm ACLE specification.
> >
> > Since force_lowpart_subreg cannot handle const_int (because they have VOID
> > mode), use gen_lowpart on them, force_lowpart_subreg otherwise.
> >
> > 2024-11-20  Christophe Lyon  
> >   Jakub Jelinek  
> >
> >   PR target/114801
> >   gcc/
> >   * config/arm/arm-mve-builtins.cc
> >   (function_expander::add_input_operand): Handle CONST_INT
> >   predicates.
> >
> >   gcc/testsuite/
> >   * gcc.target/arm/mve/pr108443.c: Update predicate constant.
> >   * gcc.target/arm/mve/pr108443-run.c: Likewise.
> >   * gcc.target/arm/mve/pr114801.c: New test.
> >
> > (cherry picked from commit 2089009210a1774c37e527ead8bbcaaa1a7a9d2d)
> > ---
> >  gcc/config/arm/arm-mve-builtins.cc| 35 -
> >  .../gcc.target/arm/mve/pr108443-run.c |  2 +-
> >  gcc/testsuite/gcc.target/arm/mve/pr108443.c   |  4 +-
> >  gcc/testsuite/gcc.target/arm/mve/pr114801.c   | 39 +++
> >  4 files changed, 76 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr114801.c
> >
> > diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
> > index e1826ae4052..c57bf0844b0 100644
> > --- a/gcc/config/arm/arm-mve-builtins.cc
> > +++ b/gcc/config/arm/arm-mve-builtins.cc
> > @@ -2107,7 +2107,40 @@ function_expander::add_input_operand (insn_code icode, rtx x)
> >mode = GET_MODE (x);
> >  }
> >else if (VALID_MVE_PRED_MODE (mode))
> > -x = gen_lowpart (mode, x);
> > +{
> > +  if (CONST_INT_P (x))
> > + {
> > +   if (mode == V8BImode || mode == V4BImode)
> > + {
> > +   /* In V8BI or V4BI each element has 2 or 4 bits, if those bits
> > +  aren't all the same, gen_lowpart might ICE.  Canonicalize all
> > +  the 2 or 4 bits to all ones if any of them is non-zero.  V8BI
> > +  and V4BI multi-bit masks are interpreted byte-by-byte at
> > +  instruction level, but such constants should describe lanes,
> > +  rather than bytes.  See the section on MVE intrinsics in the
> > +  Arm ACLE specification.  */
> > +   unsigned HOST_WIDE_INT xi = UINTVAL (x);
> > +   xi |= ((xi & 0x) << 1) | ((xi & 0x) >> 1);
> > +   if (mode == V4BImode)
> > + xi |= ((xi & 0x) << 2) | ((xi & 0x) >> 2);
> > +   if (xi != UINTVAL (x))
> > + warning_at (location, 0, "constant predicate argument %d"
> > + " (%wx) does not map to %d lane numbers,"
> > + " converted to %wx",
> > + opno, UINTVAL (x) & 0x,
> > + mode == V8BImode ? 8 : 4,
> > + xi & 0x);
> > +
> > +   x = gen_int_mode (xi, HImode);
> > + }
> > +   x = gen_lowpart (mode, x);
> > + }
> > +  else
> > + {
> > +   auto byte = subreg_lowpart_offset (mode, GET_MODE (x));
> > +   x = force_subreg (mode, x, GET_MODE (x), byte);
> > + }
> > +}
> >
> >m_ops.safe_grow (m_ops.length () + 1, true);
> >create_input_operand (&m_ops.last (), x, mode);
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c b/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c
> > index cb4b45bd305..b894f019b8b 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c
> > @@ -16,7 +16,7 @@ __attribute__ ((noipa)) partial_w
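The lane canonicalization described in the commit message can be sketched on its own. The hexadecimal mask constants in the quoted diff above have lost their digits in this archive. The alternating-bit masks below were chosen here to implement the described behaviour (any non-zero bit sets its whole lane) and are not copied from the patch. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Canonicalize a 16-bit MVE predicate whose lanes are BITS_PER_LANE wide
   (2 for V8BI, 4 for V4BI): a lane with any bit set becomes all ones.  */
static uint16_t
canonicalize_predicate (uint16_t p, int bits_per_lane)
{
  assert (bits_per_lane == 2 || bits_per_lane == 4);
  uint32_t x = p;

  /* Within each 2-bit group, copy each bit onto its neighbour.  */
  x |= ((x & 0x5555) << 1) | ((x & 0xaaaa) >> 1);

  /* For 4-bit lanes, then copy each 2-bit half onto the other half.  */
  if (bits_per_lane == 4)
    x |= ((x & 0x3333) << 2) | ((x & 0xcccc) >> 2);

  return (uint16_t) (x & 0xffff);
}
```

With this, the 0xc example from the commit message (a 4-bit lane with only its upper two bits set) becomes 0xf.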

Re: [patch, Fortran] -fc-prototypes fixes.

2025-01-09 Thread Andre Vehreschild

You forgot to add the patch!

On Thu, 9 Jan 2025 14:34:50 +0100
Thomas Koenig  wrote:

> Hello world,
>
> This patch fixes and reorganizes dumping C prototypes.  It makes the
> following changes:
>
>   - BIND(C) types are now always output before any global symbols
>   - CFI_cdesc_t is issued for assumed shape and assumed rank arguments.
>   - BIND(C,NAME="...") entities were not always issued.
>
> gcc/fortran/ChangeLog:
>
>   PR fortran/118359
>   * dump-parse-tree.cc (show_external_symbol): New function.
>   (write_type): Add prototype, put in restrictions on what not to dump.
>   (has_cfi_cdesc): New function.
>   (need_iso_fortran_binding): New function.
>   (gfc_dump_c_prototypes): Adjust to take only a file output.  Add
>   "#includeTraverse global namespaces to dump types and the globalsymol list
>   to dump external symbols.
>   (gfc_dump_external_c_prototypes): Traverse global namespaces.
>   (get_c_type_name): Handle CFI_cdesc_t.
>   (write_proc): Also pass array spec to get_c_type_name.
>   * gfortran.h (gfc_dump_c_prototypes): Adjust prototype.
>   * parse.cc (gfc_parse_file): Adjust call to gfc_dump_c_prototypes.
>
>
> Regression-tested.  As for testcases... as the dump is to
> standard output, I don't know how to do that. If anybody
> has an idea, please let me know.
>
> OK for trunk?
>
> Best regards
>
>   Thomas
>
>
>


--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [patch, Fortran] -fc-prototypes fixes.

2025-01-09 Thread Thomas Koenig


Am 09.01.25 um 14:45 schrieb Andre Vehreschild:

You forgot to add the patch!


Sent two minutes later :-)

https://gcc.gnu.org/pipermail/fortran/2025-January/061540.html

[PATCH] RISC-V: testsuite: fix target selector for sync_char_short

2025-01-09 Thread Edwin Lu

The effective-target selector for riscv on sync_char_short did not
check whether atomics were enabled. As a result, these test cases were
run on targets without the A extension. Add additional checks for the
zalrsc or zabha extensions.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Fix effective target sync_char_short
for riscv*-*-*

Signed-off-by: Edwin Lu 
---
 gcc/testsuite/lib/target-supports.exp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a89f531f887..939ef3a4119 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -10080,7 +10080,9 @@ proc check_effective_target_sync_char_short { } {
 || ([istarget sparc*-*-*] && [check_effective_target_sparc_v9])
 || ([istarget arc*-*-*] && [check_effective_target_arc_atomic])
 || [istarget loongarch*-*-*]
-|| [istarget riscv*-*-*]
+|| ([istarget riscv*-*-*]
+&& ([check_effective_target_riscv_zalrsc]
+|| [check_effective_target_riscv_zabha]))
 || [check_effective_target_mips_llsc] }}]
 }
 
-- 
2.34.1
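As a rough illustration (not taken from the testsuite), sync_char_short guards tests of atomic read-modify-write operations on 8- and 16-bit objects, such as the following. The point of the fix is that on RISC-V such tests should only run when an atomics extension (zalrsc or zabha, per the new check) is enabled. The variable and function names are invented.

```c
#include <assert.h>

static char c;
static short s;

/* Atomic read-modify-write on sub-word objects: the kind of operation
   that sync_char_short-gated tests exercise.  */
static void
bump_both (void)
{
  __sync_fetch_and_add (&c, 1);
  __sync_fetch_and_add (&s, 2);
}
```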

[PATCH] [ifcombine] adjust for narrowing converts before shifts [PR118206]

2025-01-09 Thread Alexandre Oliva



A narrowing conversion and a shift both drop bits from the loaded
value, but we need to take into account which one comes first to get
the right number of bits and mask.

Fold when applying masks to parts, comparing the parts, and combining
the results, on the off chance that either mask happens to be zero.
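To make the ordering issue concrete, here is a small hypothetical C example (not the field-merge-18.c testcase). Narrowing before or after the right shift keeps a different window of bits from the same 16-bit load. This is why decode_field_reference has to know which came first.

```c
#include <assert.h>
#include <stdint.h>

/* Shift first, then narrow: keeps bits 4..11 of X (8 bits).  */
static uint8_t
shift_then_narrow (uint16_t x)
{
  return (uint8_t) (x >> 4);
}

/* Narrow first, then shift: the conversion already dropped bits 8..15,
   so only bits 4..7 of X survive (4 bits).  */
static uint8_t
narrow_then_shift (uint16_t x)
{
  return (uint8_t) ((uint8_t) x >> 4);
}
```

For x = 0xABCD the first yields 0xBC and the second 0x0C. The bit position and the field size both differ.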

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR tree-optimization/118206
* gimple-fold.cc (decode_field_reference): Account for upper
bits dropped by narrowing conversions whether before or after
a right shift.
(fold_truth_andor_for_ifcombine): Fold masks, compares, and
combined results.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118206
* gcc.dg/field-merge-18.c: New.
---
 gcc/gimple-fold.cc|   39 
 gcc/testsuite/gcc.dg/field-merge-18.c |   46 +
 2 files changed, 79 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-18.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index c8a726e0ae3f3..d95f04213ee40 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -7547,6 +7547,7 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
   int shiftrt = 0;
   tree res_ops[2];
   machine_mode mode;
+  bool convert_before_shift = false;
 
   *load = NULL;
   *psignbit = false;
@@ -7651,6 +7652,12 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
   if (*load)
loc[3] = gimple_location (*load);
   exp = res_ops[0];
+  /* This looks backwards, but we're going back the def chain, so if we
+find the conversion here, after finding a shift, that's because the
+convert appears before the shift, and we should thus adjust the bit
+pos and size because of the shift after adjusting it due to type
+conversion.  */
+  convert_before_shift = true;
 }
 
   /* Identify the load, if there is one.  */
@@ -7693,6 +7700,15 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
   *pvolatilep = volatilep;
 
   /* Adjust shifts...  */
+  if (convert_before_shift
+  && outer_type && *pbitsize > TYPE_PRECISION (outer_type))
+{
+  HOST_WIDE_INT excess = *pbitsize - TYPE_PRECISION (outer_type);
+  if (*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+   *pbitpos += excess;
+  *pbitsize -= excess;
+}
+
   if (shiftrt)
 {
   if (!*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
@@ -7701,7 +7717,8 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
 }
 
   /* ... and bit position.  */
-  if (outer_type && *pbitsize > TYPE_PRECISION (outer_type))
+  if (!convert_before_shift
+  && outer_type && *pbitsize > TYPE_PRECISION (outer_type))
 {
   HOST_WIDE_INT excess = *pbitsize - TYPE_PRECISION (outer_type);
   if (*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
@@ -8377,6 +8394,8 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
   if (get_best_mode (end_bit - first_bit, first_bit, 0, ll_end_region,
 ll_align, BITS_PER_WORD, volatilep, &lnmode))
 l_split_load = false;
+  /* ??? If ll and rl share the same load, reuse that?
+ See PR 118206 -> gcc.dg/field-merge-18.c  */
   else
 {
   /* Consider the possibility of recombining loads if any of the
@@ -8757,11 +8776,11 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
   /* Apply masks.  */
   for (int j = 0; j < 2; j++)
if (mask[j] != wi::mask (0, true, mask[j].get_precision ()))
- op[j] = build2_loc (locs[j][2], BIT_AND_EXPR, type,
- op[j], wide_int_to_tree (type, mask[j]));
+ op[j] = fold_build2_loc (locs[j][2], BIT_AND_EXPR, type,
+  op[j], wide_int_to_tree (type, mask[j]));
 
-  cmp[i] = build2_loc (i ? rloc : lloc, wanted_code, truth_type,
-  op[0], op[1]);
+  cmp[i] = fold_build2_loc (i ? rloc : lloc, wanted_code, truth_type,
+   op[0], op[1]);
 }
 
   /* Reorder the compares if needed.  */
@@ -8773,7 +8792,15 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
   if (parts == 1)
 result = cmp[0];
   else if (!separatep || !maybe_separate)
-result = build2_loc (rloc, orig_code, truth_type, cmp[0], cmp[1]);
+{
+  /* Only fold if any of the cmp is known, otherwise we may lose the
+sequence point, and that may prevent further optimizations.  */
+  if (TREE_CODE (cmp[0]) == INTEGER_CST
+ || TREE_CODE (cmp[1]) == INTEGER_CST)
+   result = fold_build2_loc (rloc, orig_code, truth_type, cmp[0], cmp[1]);
+  else
+   result = build2_loc (rloc, orig_code, truth_type, cmp[0], cmp[1]);
+}
   else
 {
   result = cmp[0];
diff --git a/gcc/testsuite/gcc.dg/field-merge-18.c b/gcc/testsuite/gcc.dg/field-merge-18.c
ne

Re: [PATCH] [ifcombine] adjust for narrowing converts before shifts [PR118206]

2025-01-09 Thread Richard Biener

On Fri, 10 Jan 2025, Alexandre Oliva wrote:

> 
> A narrowing conversion and a shift both drop bits from the loaded
> value, but we need to take into account which one comes first to get
> the right number of bits and mask.
> 
> Fold when applying masks to parts, comparing the parts, and combining
> the results, in the odd chance either mask happens to be zero.
> 
> Regstrapped on x86_64-linux-gnu.  Ok to install?

OK.

Richard.

> 
> for  gcc/ChangeLog
> 
>   PR tree-optimization/118206
>   * gimple-fold.cc (decode_field_reference): Account for upper
>   bits dropped by narrowing conversions whether before or after
>   a right shift.
>   (fold_truth_andor_for_ifcombine): Fold masks, compares, and
>   combined results.
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR tree-optimization/118206
>   * gcc.dg/field-merge-18.c: New.
> ---
>  gcc/gimple-fold.cc|   39 
>  gcc/testsuite/gcc.dg/field-merge-18.c |   46 
> +
>  2 files changed, 79 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-18.c
> 
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index c8a726e0ae3f3..d95f04213ee40 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -7547,6 +7547,7 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT 
> *pbitsize,
>int shiftrt = 0;
>tree res_ops[2];
>machine_mode mode;
> +  bool convert_before_shift = false;
>  
>*load = NULL;
>*psignbit = false;
> @@ -7651,6 +7652,12 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT 
> *pbitsize,
>if (*load)
>   loc[3] = gimple_location (*load);
>exp = res_ops[0];
> +  /* This looks backwards, but we're going back the def chain, so if we
> +  find the conversion here, after finding a shift, that's because the
> +  convert appears before the shift, and we should thus adjust the bit
> +  pos and size because of the shift after adjusting it due to type
> +  conversion.  */
> +  convert_before_shift = true;
>  }
>  
>/* Identify the load, if there is one.  */
> @@ -7693,6 +7700,15 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
>*pvolatilep = volatilep;
>  
>/* Adjust shifts...  */
> +  if (convert_before_shift
> +  && outer_type && *pbitsize > TYPE_PRECISION (outer_type))
> +{
> +  HOST_WIDE_INT excess = *pbitsize - TYPE_PRECISION (outer_type);
> +  if (*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> + *pbitpos += excess;
> +  *pbitsize -= excess;
> +}
> +
>if (shiftrt)
>  {
>if (!*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> @@ -7701,7 +7717,8 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
>  }
>  
>/* ... and bit position.  */
> -  if (outer_type && *pbitsize > TYPE_PRECISION (outer_type))
> +  if (!convert_before_shift
> +  && outer_type && *pbitsize > TYPE_PRECISION (outer_type))
>  {
>HOST_WIDE_INT excess = *pbitsize - TYPE_PRECISION (outer_type);
>if (*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> @@ -8377,6 +8394,8 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
>if (get_best_mode (end_bit - first_bit, first_bit, 0, ll_end_region,
>ll_align, BITS_PER_WORD, volatilep, &lnmode))
>  l_split_load = false;
> +  /* ??? If ll and rl share the same load, reuse that?
> + See PR 118206 -> gcc.dg/field-merge-18.c  */
>else
>  {
>/* Consider the possibility of recombining loads if any of the
> @@ -8757,11 +8776,11 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
>/* Apply masks.  */
>for (int j = 0; j < 2; j++)
>   if (mask[j] != wi::mask (0, true, mask[j].get_precision ()))
> -   op[j] = build2_loc (locs[j][2], BIT_AND_EXPR, type,
> -   op[j], wide_int_to_tree (type, mask[j]));
> +   op[j] = fold_build2_loc (locs[j][2], BIT_AND_EXPR, type,
> +op[j], wide_int_to_tree (type, mask[j]));
>  
> -  cmp[i] = build2_loc (i ? rloc : lloc, wanted_code, truth_type,
> -op[0], op[1]);
> +  cmp[i] = fold_build2_loc (i ? rloc : lloc, wanted_code, truth_type,
> + op[0], op[1]);
>  }
>  
>/* Reorder the compares if needed.  */
> @@ -8773,7 +8792,15 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
>if (parts == 1)
>  result = cmp[0];
>else if (!separatep || !maybe_separate)
> -result = build2_loc (rloc, orig_code, truth_type, cmp[0], cmp[1]);
> +{
> +  /* Only fold if any of the cmp is known, otherwise we may lose the
> +  sequence point, and that may prevent further optimizations.  */
> +  if (TREE_CODE (cmp[0]) == INTEGER_CST
> +   || TREE_CODE (cmp[1]) == INTEGER_CST)
> + result = fold_build2_loc (rloc, orig_code

Re: [PATCH] [ifcombine] reuse left-hand mask to decode right-hand xor operand

2025-01-09 Thread Richard Biener

On Fri, 10 Jan 2025, Alexandre Oliva wrote:

> 
> If fold_truth_andor_for_ifcombine applies a mask to an xor, say
> because the result of the xor is compared with a power of two [minus
> one], we have to apply the same mask when processing both the left-
> and right-hand xor paths for the transformation to be sound.  Arrange
> for decode_field_reference to propagate the incoming mask along with
> the expression to the right-hand operand.
> 
> Don't require the right-hand xor operand to be a constant, that was a
> cut&pasto.
> 
> Regstrapped on x86_64-linux-gnu.  Ok to install?

OK.

Richard.
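Here is a minimal C sketch of the soundness argument, using hypothetical helper names. ((a ^ b) & mask) == 0 holds exactly when (a & mask) == (b & mask), so the mask has to reach the right-hand XOR operand as well. Applying it to the left-hand operand alone is not equivalent. The mask 0x0F stands in for the "power of two minus one" case mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Original form: mask the XOR result and compare it with zero.  */
static bool masked_xor_is_zero (uint32_t a, uint32_t b, uint32_t mask)
{
  return ((a ^ b) & mask) == 0;
}

/* Sound rewrite: the same mask applied to both XOR operands.  */
static bool both_masked_equal (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & mask) == (b & mask);
}

/* Unsound rewrite: the mask applied only to the left-hand operand.  */
static bool left_masked_equal (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & mask) == b;
}

/* Check the sound rewrite exhaustively over all 8-bit operand pairs.  */
static bool rewrite_holds_for_bytes (uint32_t mask)
{
  for (uint32_t a = 0; a < 256; a++)
    for (uint32_t b = 0; b < 256; b++)
      if (masked_xor_is_zero (a, b, mask) != both_masked_equal (a, b, mask))
        return false;
  return true;
}
```

With a = 0x12, b = 0xF2 and mask 0x0F, the first two functions return true and left_masked_equal returns false.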

> 
> for  gcc/ChangeLog
> 
>   * gimple-fold.cc (decode_field_reference): Add xor_pand_mask.
>   Propagate pand_mask to the right-hand xor operand.  Don't
>   require the right-hand xor operand to be a constant.
>   (fold_truth_andor_for_ifcombine): Pass right-hand mask when
>   appropriate.
> ---
>  gcc/gimple-fold.cc |   23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index d95f04213ee40..0ad92de3a218f 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -7519,8 +7519,9 @@ gimple_binop_def_p (enum tree_code code, tree t, tree op[2])
>  
> *XOR_P is to be FALSE if EXP might be a XOR used in a compare, in which
> case, if XOR_CMP_OP is a zero constant, it will be overridden with *PEXP,
> -   *XOR_P will be set to TRUE, and the left-hand operand of the XOR will be
> -   decoded.  If *XOR_P is TRUE, XOR_CMP_OP is supposed to be NULL, and then the
> +   *XOR_P will be set to TRUE, *XOR_PAND_MASK will be copied from *PAND_MASK,
> +   and the left-hand operand of the XOR will be decoded.  If *XOR_P is TRUE,
> +   XOR_CMP_OP and XOR_PAND_MASK are supposed to be NULL, and then the
> right-hand operand of the XOR will be decoded.
>  
> *LOAD is set to the load stmt of the innermost reference, if any,
> @@ -7537,7 +7538,7 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
>   HOST_WIDE_INT *pbitpos,
>   bool *punsignedp, bool *preversep, bool *pvolatilep,
>   wide_int *pand_mask, bool *psignbit,
> - bool *xor_p, tree *xor_cmp_op,
> + bool *xor_p, tree *xor_cmp_op, wide_int *xor_pand_mask,
>   gimple **load, location_t loc[4])
>  {
>tree exp = *pexp;
> @@ -7599,15 +7600,14 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
>  and_mask = *pand_mask;
>  
>/* Turn (a ^ b) [!]= 0 into a [!]= b.  */
> -  if (xor_p && gimple_binop_def_p (BIT_XOR_EXPR, exp, res_ops)
> -  && uniform_integer_cst_p (res_ops[1]))
> +  if (xor_p && gimple_binop_def_p (BIT_XOR_EXPR, exp, res_ops))
>  {
>/* No location recorded for this one, it's entirely subsumed by the
>compare.  */
>if (*xor_p)
>   {
> exp = res_ops[1];
> -   gcc_checking_assert (!xor_cmp_op);
> +   gcc_checking_assert (!xor_cmp_op && !xor_pand_mask);
>   }
>else if (!xor_cmp_op)
>   /* Not much we can do when xor appears in the right-hand compare
> @@ -7618,6 +7618,7 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
> *xor_p = true;
> exp = res_ops[0];
> *xor_cmp_op = *pexp;
> +   *xor_pand_mask = *pand_mask;
>   }
>  }
>  
> @@ -8152,19 +8153,21 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
>bool l_xor = false, r_xor = false;
>ll_inner = decode_field_reference (&ll_arg, &ll_bitsize, &ll_bitpos,
>&ll_unsignedp, &ll_reversep, &volatilep,
> -  &ll_and_mask, &ll_signbit, &l_xor, &lr_arg,
> +  &ll_and_mask, &ll_signbit,
> +  &l_xor, &lr_arg, &lr_and_mask,
>&ll_load, ll_loc);
>lr_inner = decode_field_reference (&lr_arg, &lr_bitsize, &lr_bitpos,
>&lr_unsignedp, &lr_reversep, &volatilep,
> -  &lr_and_mask, &lr_signbit, &l_xor, 0,
> +  &lr_and_mask, &lr_signbit, &l_xor, 0, 0,
>&lr_load, lr_loc);
>rl_inner = decode_field_reference (&rl_arg, &rl_bitsize, &rl_bitpos,
>&rl_unsignedp, &rl_reversep, &volatilep,
> -  &rl_and_mask, &rl_signbit, &r_xor, &rr_arg,
> +  &rl_and_mask, &rl_signbit,
> +  &r_xor, &rr_arg, &rr_and_mask,
>&rl_load, rl_loc);
>rr_inner = decode_field_reference (&rr_arg, &rr_bitsize, &rr_bitpos,
>&rr_unsignedp, &rr_reversep, &volatilep,
> -  &rr_and_mask, &rr_signbit, &r_xor, 0,
>

Re: [PATCH] [ifcombine] fix mask variable test to match use [PR118344]

2025-01-09 Thread Richard Biener

On Fri, 10 Jan 2025, Alexandre Oliva wrote:

> 
> There was a cut&pasto in the rr_and_mask's adjustment to match the
> combined type: the test on whether there was a mask already was
> testing the wrong variable, and then it might crash or otherwise fail
> accessing an undefined mask.  This only hit with checking enabled,
> and rarely at that.
> 
> Regstrapped on x86_64-linux-gnu.  Ok to install?

OK.

Richard.

> 
> for  gcc/ChangeLog
> 
>   PR tree-optimization/118344
>   * gimple-fold.cc (fold_truth_andor_for_ifcombine): Fix typo in
>   rr_and_mask's type adjustment test.
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR tree-optimization/118344
>   * gcc.dg/field-merge-19.c: New.
> ---
>  gcc/gimple-fold.cc|2 +-
>  gcc/testsuite/gcc.dg/field-merge-19.c |   41 
> +
>  2 files changed, 42 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-19.c
> 
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 0ad92de3a218f..20b5024d861db 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -8644,7 +8644,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
> xlr_bitpos);
>else
>   lr_mask = wi::shifted_mask (xlr_bitpos, lr_bitsize, false, rnprec);
> -  if (rl_and_mask.get_precision ())
> +  if (rr_and_mask.get_precision ())
>   rr_mask = wi::lshift (wide_int::from (rr_and_mask, rnprec, UNSIGNED),
> xrr_bitpos);
>else
> diff --git a/gcc/testsuite/gcc.dg/field-merge-19.c b/gcc/testsuite/gcc.dg/field-merge-19.c
> new file mode 100644
> index 0..5622baa52b0a3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-19.c
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fchecking" } */
> +
> +/* PR tree-optimization/118344 */
> +
> +/* This used to ICE attempting to extend a mask variable after testing the
> +   wrong mask variable.  */
> +
> +int d, e, g, h, i, c, j;
> +static short k;
> +char o;
> +static int *p;
> +static long *a;
> +int b[0];
> +int q(int s, int t, int *u, int *v) {
> +  for (int f = 0; f < s; f++)
> +if ((t & v[f]) != u[f])
> +  return 0;
> +  return 1;
> +}
> +int w(int s, int t) {
> +  int l[] = {t, t, t, t}, m[] = {e, e, 3, 1};
> +  int n = q(s, d, l, m);
> +  return n;
> +}
> +int x(unsigned s) {
> +  unsigned r;
> +  if (s >= -1)
> +return 1;
> +  r = 1000;
> +  while (s > 1 / r)
> +r /= 2;
> +  return g ? 2 : 0;
> +}
> +void y() {
> +  for (;;) {
> +b[w(8, *p)] = h;
> +for (; a + k; j = o)
> +  i &= c = x(6) < 0;
> +  }
> +}
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] ree: Skip extension on stack pointer

2025-01-09 Thread Jeff Law





On 1/8/25 4:14 PM, H.J. Lu wrote:

On Thu, Jan 9, 2025 at 5:35 AM Jeff Law  wrote:




On 1/8/25 1:53 PM, H.J. Lu wrote:

Skip extension on stack pointer since we can't turn

(insn 27 26 139 2 (parallel [
  (set (reg/f:SI 7 sp)
  (plus:SI (reg/f:SI 7 sp)
  (const_int 16 [0x10])))
  (clobber (reg:CC 17 flags))
  ]) "x.ii":14:17 discrim 1 283 {*addsi_1}
   (expr_list:REG_ARGS_SIZE (const_int 0 [0])
  (nil)))
...
(insn 43 125 74 2 (set (reg/f:DI 6 bp [145])
  (zero_extend:DI (reg/f:SI 7 sp))) "x.ii":15:9 175 {*zero_extendsidi2}
   (nil))

into

(insn 27 26 155 2 (parallel [
  (set (reg:DI 6 bp)
  (zero_extend:DI (plus:SI (reg/f:SI 7 sp)
  (const_int 16 [0x10]))))
  (clobber (reg:CC 17 flags))
  ]) "x.ii":14:17 discrim 1 296 {addsi_1_zext}
   (expr_list:REG_ARGS_SIZE (const_int 0 [0])
  (nil)))
(insn 155 27 139 2 (set (reg:DI 7 sp)
  (reg:DI 6 bp)) "x.ii":14:17 discrim 1 -1
   (nil))

without updating stack frame info.

FWIW, I think the stack update was a canary for a bigger issue here.




gcc/

   PR rtl-optimization/118266
   * ree.cc (add_removable_extension): Skip extension on stack
   pointer.

gcc/testsuite/

   PR rtl-optimization/118266
   * gcc.target/i386/pr118266.c: New test.

Presumably there were no other uses of sp?


There are many uses of sp, including sp update, push and pop.
Thanks.  I can speculate about multiple possible bad scenarios around 
fixed registers, so thanks for adjusting.


jeff

Re: [PATCH v2] AArch64: Block combine_and_move from creating FP literal loads

2025-01-09 Thread Richard Sandiford

Wilco Dijkstra  writes:
> Hi Richard,
>
>> The patch below is what I meant.  It passes bootstrap & regression-test
>> on aarch64-linux-gnu (and so produces the same results for the tests
>> that you changed).  Do you see any problems with this version?
>> If not, I think we should go with it.
>
> Thanks for the detailed example - unfortunately there are issues with it.
> Early expansion means more instructions to deal with in RTL and fewer
> optimizations - it even affects inlining (I see more calls/returns in the
> instruction frequencies).

Do you have an example?  It seems odd that rtl changes after cfgexpand
would affect inlining.

> Worse, this change completely disables rematerialization of FP immediates
> which implies extra spilling. A basic example goes like this:
>
> void g(void);
> double bad_remat (double x)
> {
>   x += 5.347897294;
>   g();
>   x *= 5.347897294;
>   return x;
> }
>
> which with -O2 -fomit-frame-pointer -ffixed-d8 -ffixed-d9 -ffixed-d10 
> -ffixed-d11 -ffixed-d12 -ffixed-d13 -ffixed-d14 now compiles to:
>
> adrp x0, .LC0
> str x30, [sp, -32]!
> ldr d31, [x0, #:lo12:.LC0]
> str d15, [sp, 8]
> fadd d15, d0, d31
> str d31, [sp, 24]
> bl  g
> ldr d31, [sp, 24]
> fmul d0, d15, d31
> ldr d15, [sp, 8]
> ldr x30, [sp], 32
> ret

Hmm, interesting.  Thanks for the example.

The reason this works with your patch and not mine seems to be a direct
consequence of the fix for PR37273:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37273

It means that, if we start with:

   (set (reg R) (mem const_pool))
 REG_EQUIV: legitimate_const

then the final costing pass starts out with a significantly negative
memory cost for R.  Assigning memory is taken to be a saving:

  if (set != 0 && REG_P (SET_DEST (set)) && MEM_P (SET_SRC (set))
  && (note = find_reg_note (insn, REG_EQUIV, NULL_RTX)) != NULL_RTX
  && ((MEM_P (XEXP (note, 0))
   && !side_effects_p (SET_SRC (set)))
  || (CONSTANT_P (XEXP (note, 0))
  && targetm.legitimate_constant_p (GET_MODE (SET_DEST (set)),
XEXP (note, 0))
  && REG_N_SETS (REGNO (SET_DEST (set))) == 1))
  && general_operand (SET_SRC (set), GET_MODE (SET_SRC (set)))
  /* LRA does not use equiv with a symbol for PIC code.  */
  && (! ira_use_lra_p || ! pic_offset_table_rtx
  || ! contains_symbol_ref_p (XEXP (note, 0))))

Thus for your patch IRA assigns a call-clobbered register while for
my patch it assumes that memory is cheaper than using caller saves.
If I disable the CONSTANT_P part of the check, to mimic what IRA
does with your patch, then I get the expected output.

The patch for PR37273 predates LRA and so was aimed at reload's
version of caller saves.  AIUI, the problem there was that the
caller saves didn't take advantage of equivalences and so would
always save to the stack.  LRA is smarter than that, so I suspect
the check could be dropped.  We should then get better code in
cases where the rematerialised pseudo is used multiple times
between two calls.

Even so, the code above is clearly assuming that the equivalence
will be replaced by a memory where necessary.  That doesn't happen
due to the following code in lra_constraints:

/* If it is not a reverse equivalence, we check that a
   pseudo in rhs of the init insn is not dying in the
   insn.  Otherwise, the live info at the beginning of
   the corresponding BB might be wrong after we
   removed the insn.  When the equiv can be a
   constant, the right hand side of the init insn can
   be a pseudo.  */
|| (! reverse_equiv_p (i)
&& (init_insn_rhs_dead_pseudo_p (i)
/* If we reloaded the pseudo in an equivalence
   init insn, we cannot remove the equiv init
   insns and the init insns might write into
   const memory in this case.  */
|| contains_reloaded_insn_p (i)))

This specifically mentions constants, but the behaviour seems odd
for them.  We don't use init_insns directly to rematerialise the
constant.  We simply replace the pseudo with the constant and reload
it afresh.

So I wouldn't have expected the check above to be necessary for
constants.  In particular, we check elsewhere whether the constant
is legitimate, and things like that.  Adding !CONSTANT_P (x)
to that condition also "fixes" the testcase.

Note that if you change the constant in your example to (say) 7902,
the output with current trunk is similarly bad:

str x30, [sp, -32]!
mov x0, 244091581366272
movk x0, 0x40be, lsl 48
str d15, [sp, 8]
fmov d15, x0
fadd d0, d0, d

[PATCH v2] arm: [MVE intrinsics] Fix tuples field name (PR 118332)

2025-01-09 Thread Christophe Lyon

The previous fix only worked for C, for C++ we need to add more
information to the underlying type so that
finish_class_member_access_expr accepts it.

We use the same logic as in aarch64's register_tuple_type for AdvSIMD
tuples.

This patch makes gcc.target/arm/mve/intrinsics/pr118332.c pass in C++
mode.
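If you are unfamiliar with the ACLE tuple layout, here is a plain C sketch of the shape being registered: a structure with a single array field named val. The vector type is a hypothetical stand-in (my_int32x4_t), not the real MVE int32x4_t. The sketch does not show how GCC builds the type internally. It only shows the member access (t.val[1]) that both C and C++ have to accept.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector of four int32 lanes; the real
   MVE type is a GCC vector type, not a plain struct.  */
typedef struct { int32_t lane[4]; } my_int32x4_t;

/* Tuple of two vectors: a structure with a single array field named
   'val', mirroring the layout ACLE mandates for tuple types.  */
typedef struct { my_int32x4_t val[2]; } my_int32x4x2_t;

/* Access through the 'val' field, the kind of member access the C++
   front end must accept for the built-in tuple types.  */
static int32_t second_vector_first_lane (my_int32x4x2_t t)
{
  return t.val[1].lane[0];
}
```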

gcc/ChangeLog:

PR target/118332
* config/arm/arm-mve-builtins.cc (wrap_type_in_struct): Delete.
(register_type_decl): Delete.
(register_builtin_tuple_types): Use
lang_hooks.types.simulate_record_decl.
---
 gcc/config/arm/arm-mve-builtins.cc | 52 +-
 1 file changed, 8 insertions(+), 44 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index 846cd773c0b..b37c91c541b 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -449,47 +449,6 @@ register_vector_type (vector_type_index type)
   acle_vector_types[0][type] = vectype;
 }
 
-/* Return a structure type that contains a single field of type FIELD_TYPE.
-   The field is called 'val', as mandated by ACLE.  */
-static tree
-wrap_type_in_struct (tree field_type)
-{
-  tree field = build_decl (input_location, FIELD_DECL,
-  get_identifier ("val"), field_type);
-  tree struct_type = lang_hooks.types.make_type (RECORD_TYPE);
-  DECL_FIELD_CONTEXT (field) = struct_type;
-  TYPE_FIELDS (struct_type) = field;
-  layout_type (struct_type);
-  return struct_type;
-}
-
-/* Register a built-in TYPE_DECL called NAME for TYPE.  This is used/needed
-   when TYPE is a structure type.  */
-static void
-register_type_decl (tree type, const char *name)
-{
-  tree decl = build_decl (input_location, TYPE_DECL,
- get_identifier (name), type);
-  TYPE_NAME (type) = decl;
-  TYPE_STUB_DECL (type) = decl;
-  lang_hooks.decls.pushdecl (decl);
-  /* ??? Undo the effect of set_underlying_type for C.  The C frontend
- doesn't recognize DECL as a built-in because (as intended) the decl has
- a real location instead of BUILTINS_LOCATION.  The frontend therefore
- treats the decl like a normal C "typedef struct foo foo;", expecting
- the type for tag "struct foo" to have a dummy unnamed TYPE_DECL instead
- of the named one we attached above.  It then sets DECL_ORIGINAL_TYPE
- on the supposedly unnamed decl, creating a circularity that upsets
- dwarf2out.
-
- We don't want to follow the normal C model and create "struct foo"
- tags for tuple types since (a) the types are supposed to be opaque
- and (b) they couldn't be defined as a real struct anyway.  Treating
- the TYPE_DECLs as "typedef struct foo foo;" without creating
- "struct foo" would lead to confusing error messages.  */
-  DECL_ORIGINAL_TYPE (decl) = NULL_TREE;
-}
-
 /* Register tuple types of element type TYPE under their arm_mve_types.h
names.  */
 static void
@@ -518,13 +477,18 @@ register_builtin_tuple_types (vector_type_index type)
  && TYPE_MODE_RAW (arrtype) == TYPE_MODE (arrtype)
  && TYPE_ALIGN (arrtype) == 64);
 
-  tree tuple_type = wrap_type_in_struct (arrtype);
+  /* Build a structure type that contains a single field of type ARRTYPE.
+The field is called 'val', as mandated by ACLE.  */
+  tree field = build_decl (input_location, FIELD_DECL,
+  get_identifier ("val"), arrtype);
+  tree tuple_type
+   = lang_hooks.types.simulate_record_decl (input_location,
+buffer,
+make_array_slice (&field, 1));
   gcc_assert (VECTOR_MODE_P (TYPE_MODE (tuple_type))
  && TYPE_MODE_RAW (tuple_type) == TYPE_MODE (tuple_type)
  && TYPE_ALIGN (tuple_type) == 64);
 
-  register_type_decl (tuple_type, buffer);
-
   acle_vector_types[num_vectors >> 1][type] = tuple_type;
 }
 }
-- 
2.34.1

Re: [PATCH]AArch64: Fix costing of emulated gathers/scatters [PR118188]

2025-01-09 Thread Richard Sandiford

Tamar Christina  writes:
>> > + After the final loads are done it issues a
>> > + vec_construct to recreate the vector from the scalar.  For costing when
>> > + we see a vec_to_scalar on a stmt with VMAT_GATHER_SCATTER we are dealing
>> > + with an emulated instruction and should adjust costing properly.  */
>> > +  if (kind == vec_to_scalar
>> > +  && (m_vec_flags & VEC_ADVSIMD)
>> > +  && vect_mem_access_type (stmt_info, node) == VMAT_GATHER_SCATTER)
>> > +{
>> > +  auto dr = STMT_VINFO_DATA_REF (stmt_info);
>> > +  tree dr_ref = DR_REF (dr);
>> > +  /* Only really expect MEM_REF or ARRAY_REF here.  Ignore the rest.  */
>> > +  switch (TREE_CODE (dr_ref))
>> > +  {
>> > +  case MEM_REF:
>> > +  case ARRAY_REF:
>> > +  case TARGET_MEM_REF:
>> > +{
>> > +  tree offset = TREE_OPERAND (dr_ref, 1);
>> > +  if (SSA_VAR_P (offset)
>> > +  && gimple_vuse (SSA_NAME_DEF_STMT (offset)))
>> > +{
>> > +  if (STMT_VINFO_TYPE (stmt_info) == load_vec_info_type)
>> > +ops->loads += count - 1;
>> > +  else
>> > +/* Stores want to count both the index to array and data to
>> > +   array using vec_to_scalar.  However we have index stores in
>> > +   Adv.SIMD and so we only want to adjust the index loads.  */
>> > +ops->loads += count / 2;
>> > +  return;
>> > +}
>> > +  break;
>> > +}
>> > +  default:
>> > +break;
>> > +  }
>> > +}
>> 
>> Operand 1 of MEM_REF and TARGET_MEM_REF are always constant, so the
>> handling of those codes looks redundant.  Perhaps we should instead use:
>> 
>>while (handled_component_p (dr_ref))
>>  {
>>if (TREE_CODE (dr_ref) == ARRAY_REF)
>>  {
>>  ...early return or break if SSA offset found...
>>  }
>>dr_ref = TREE_OPERAND (dr_ref, 0);
>>  }
>> 
>> A gather load could reasonably be to COMPONENT_REFs or BIT_FIELD_REFs of
>> an ARRAY_REF, so the ARRAY_REF might not be the outermost code.
>> 
>
> Ah, I tried to find their layouts but couldn't readily find one.  Interesting, I would have
> thought a scatter/gather on either of those is more efficiently done through
> bit masking.

It can happen in things like:

#define iterations 10
#define LEN_1D 32000

typedef struct { float f; } floats;
float a[LEN_1D];
floats b[LEN_1D][2];

float
s4115 (int *ip)
{
float sum = 0.;
for (int i = 0; i < LEN_1D; i++)
  {
sum += a[i] * b[ip[i]][0].f;
  }
return sum;
}

where the outer component reference is the ".f" and the array reference
is hidden inside.  This example also shows why I think we should only
break on SSA offsets: the outer array reference has a constant index,
but the inner one is variable and is loaded from memory.
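Richard's suggested loop can be modelled outside GCC with a toy reference chain. The enum, struct ref and both functions below are hypothetical and only loosely mirror handled_component_p and TREE_OPERAND (..., 0). The flag "index loaded from memory" stands in for the patch's SSA-name-with-VUSE test. The test case models b[ip[i]][0].f from s4115: the outermost node is a component reference, the outer array index is constant, and only the inner index is loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a reference expression; these names are hypothetical
   and only loosely mirror GCC's tree codes.  */
enum ref_code { REF_VAR, REF_ARRAY, REF_COMPONENT, REF_BIT_FIELD };

struct ref
{
  enum ref_code code;
  const struct ref *base;   /* operand 0: the object being indexed/selected */
  bool index_from_memory;   /* REF_ARRAY only: the index is defined by a load */
};

/* Rough analogue of handled_component_p.  */
static bool is_handled_component (const struct ref *r)
{
  return r->code == REF_ARRAY || r->code == REF_COMPONENT
         || r->code == REF_BIT_FIELD;
}

/* Walk from the outermost reference towards the base and report whether
   any array reference on the way has an index loaded from memory.  */
static bool has_loaded_array_index (const struct ref *r)
{
  while (r && is_handled_component (r))
    {
      if (r->code == REF_ARRAY && r->index_from_memory)
        return true;
      r = r->base;
    }
  return false;
}
```

Stopping only on a loaded index, rather than at the first ARRAY_REF, is what lets the walk skip the constant [0] index and find the ip[i] index beneath it.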

So:

> Here's updated version of patch.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/118188
>   * config/aarch64/aarch64.cc (aarch64_vector_costs::count_ops): Adjust
>   throughput of emulated gather and scatters.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/118188
>   * gcc.target/aarch64/sve/gather_load_12.c: New test.
>   * gcc.target/aarch64/sve/gather_load_13.c: New test.
>   * gcc.target/aarch64/sve/gather_load_14.c: New test.
>
> -- inline copy of patch --
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 3e700ed41e97a98dc844cad1c8a66a3555d82221..6a7abceda466f5569d743b0d3a0eed1bbeb79a2c 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -17378,6 +17378,46 @@ aarch64_vector_costs::count_ops (unsigned int count, vect_cost_for_stmt kind,
>   return;
>  }
>  
> +  /* Detect the case where we are using an emulated gather/scatter.  When a
> + target does not support gathers and scatters directly the vectorizer
> + emulates these by constructing an index vector and then issuing an
> + extraction for every lane in the vector.  If the index vector is loaded
> + from memory, the vector load and extractions are subsequently lowered by
> + veclower into a series of scalar index loads.  After the final loads are
> + done it issues a vec_construct to recreate the vector from the scalar.  For
> + costing when we see a vec_to_scalar on a stmt with VMAT_GATHER_SCATTER we
> + are dealing with an emulated instruction and should adjust costing
> + properly.  */
> +  if (kind == vec_to_scalar
> +  && (m_vec_flags & VEC_ADVSIMD)
> +  && vect_mem_access_type (stmt_info, node) == VMAT_GATHER_SCATTER)
> +{
> +  auto dr = STMT_VINFO_DATA_REF (stmt_info);
> +  tree dr_ref = DR_REF (dr);
> +  while (handled_component_p (dr_ref))
> + {
> +   if (TREE_CODE (dr_ref) == ARRAY_REF)
> + {
> +   tree offset = TREE_OPERAND (dr_ref, 1);
>
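Here is a rough scalar picture of what the comment above describes. It is an illustration only, not the vectorizer's actual output. An emulated 4-lane gather extracts each index lane, performs one scalar load per lane and then reassembles the vector. When the index vector itself comes from memory, veclower turns each extraction into a scalar index load, so each lane costs roughly two scalar loads. LANES and emulated_gather are hypothetical names.

```c
#include <assert.h>

#define LANES 4

/* Scalar emulation of a 4-lane gather load: with no gather instruction,
   each lane's index is extracted, used for a scalar load, and the
   results are assembled back into a "vector" (here, an array).  */
static void emulated_gather (float out[LANES], const float *base,
                             const int idx[LANES])
{
  for (int lane = 0; lane < LANES; lane++)
    {
      int i = idx[lane];        /* vec_to_scalar: extract one index lane */
      out[lane] = base[i];      /* one scalar data load per lane */
    }
  /* Filling OUT lane by lane plays the role of the final vec_construct.  */
}
```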

Re: [PATCH] LoongArch: Adjust the cost of ADDRESS_REG_REG [PR114978].

2025-01-09 Thread Lulu Cheng




On 2025/1/8 11:16 PM, Xi Ruoyao wrote:

On Tue, 2025-01-07 at 10:44 +0800, Lulu Cheng wrote:

After changing this cost from 1 to 3, the performance of SPEC CPU2006
benchmarks 401, 473, 416, 465 and 482 improves by about 2% on LA664.

Would this fix https://gcc.gnu.org/PR114978 (or at least make it
latent)?


With the code at revision r14-9540, test 548 shows some improvement,
but its performance still falls below that of r14-9539.

There is virtually no change in the performance of test 548 with the 
code based on revision r14-6015.


I will test the latest upstream code again.



Add option '-maddr-reg-reg-cost='.

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in: Add
option '-maddr-reg-reg-cost='.
* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Initialize
addr_reg_reg_cost to 3.
* config/loongarch/loongarch-opts.cc
(loongarch_target_option_override): If '-maddr-reg-reg-cost='
is not used, set it to the initial value.
* config/loongarch/loongarch-tune.h
(struct loongarch_rtx_cost_data): Add the member
addr_reg_reg_cost and its assignment function to the structure
loongarch_rtx_cost_data.
* config/loongarch/loongarch.cc (loongarch_address_insns):
Use la_addr_reg_reg_cost to set the cost of ADDRESS_REG_REG.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.opt.urls: Regenerate.
* doc/invoke.texi: Add description of '-maddr-reg-reg-cost='.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/const-double-zero-stx.c: Add
'-maddr-reg-reg-cost=1'.
* gcc.target/loongarch/stack-check-alloca-1.c: Likewise.
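The option handling follows the usual "explicit option wins, otherwise use the tuning default" pattern seen in loongarch_target_option_override. Below is a simplified C model of that logic. The struct and function names are hypothetical stand-ins for GCC's opts/opts_set machinery, not the real types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the tuning table, the option
   values and the "was this option given explicitly" flags.  */
struct tune_costs { unsigned short addr_reg_reg_cost; };
struct options { unsigned addr_reg_reg_cost; };
struct options_set { bool addr_reg_reg_cost; };

/* Use the tuning default only if the user did not pass
   -maddr-reg-reg-cost= on the command line.  */
static void
apply_default_cost (struct options *opts, const struct options_set *opts_set,
                    const struct tune_costs *tune)
{
  if (!opts_set->addr_reg_reg_cost)
    opts->addr_reg_reg_cost = tune->addr_reg_reg_cost;
}
```

The testsuite changes above pass -maddr-reg-reg-cost=1 so that those tests keep the previous cost of 1 instead of the new default of 3.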

---
  gcc/config/loongarch/genopts/loongarch.opt.in  | 4 
  gcc/config/loongarch/loongarch-def.cc  | 1 +
  gcc/config/loongarch/loongarch-opts.cc | 3 +++
  gcc/config/loongarch/loongarch-tune.h  | 7 +++
  gcc/config/loongarch/loongarch.cc  | 2 +-
  gcc/config/loongarch/loongarch.opt | 4 
  gcc/config/loongarch/loongarch.opt.urls    | 3 +++
  gcc/doc/invoke.texi    | 7 ++-
  gcc/testsuite/gcc.target/loongarch/const-double-zero-stx.c | 2 +-
  gcc/testsuite/gcc.target/loongarch/stack-check-alloca-1.c  | 2 +-
  10 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index 8c292c8600d..39c1545e540 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -177,6 +177,10 @@ mbranch-cost=
  Target RejectNegative Joined UInteger Var(la_branch_cost) Save
  -mbranch-cost=COST  Set the cost of branches to roughly COST instructions.
  
+maddr-reg-reg-cost=
+Target RejectNegative Joined UInteger Var(la_addr_reg_reg_cost) Save
+-maddr-reg-reg-cost=COST  Set the cost of ADDRESS_REG_REG to the value calculated by COST.
+
  mcheck-zero-division
  Target Mask(CHECK_ZERO_DIV) Save
  Trap on integer divide by zero.
diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index b0271eb3b9a..5f235a04ef2 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -136,6 +136,7 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
  movcf2gr (COSTS_N_INSNS (7)),
  movgr2cf (COSTS_N_INSNS (15)),
  branch_cost (6),
+    addr_reg_reg_cost (3),
  memory_latency (4) {}
  
  /* The following properties cannot be looked up directly using "cpucfg".

diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc
index 36342cc9373..c2a63f75fc2 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -1010,6 +1010,9 @@ loongarch_target_option_override (struct loongarch_target *target,
    if (!opts_set->x_la_branch_cost)
  opts->x_la_branch_cost = loongarch_cost->branch_cost;
  
+  if (!opts_set->x_la_addr_reg_reg_cost)
+    opts->x_la_addr_reg_reg_cost = loongarch_cost->addr_reg_reg_cost;
+
    /* other stuff */
    if (ABI_LP64_P (target->abi.base))
  opts->x_flag_pcc_struct_return = 0;
diff --git a/gcc/config/loongarch/loongarch-tune.h b/gcc/config/loongarch/loongarch-tune.h
index e69173ebf79..f7819fe7678 100644
--- a/gcc/config/loongarch/loongarch-tune.h
+++ b/gcc/config/loongarch/loongarch-tune.h
@@ -38,6 +38,7 @@ struct loongarch_rtx_cost_data
    unsigned short movcf2gr;
    unsigned short movgr2cf;
    unsigned short branch_cost;
+  unsigned short addr_reg_reg_cost;
    unsigned short memory_latency;
  
    /* Default RTX cost initializer, implemented in loongarch-def.cc.  */

@@ -115,6 +116,12 @@ struct loongarch_rtx_cost_data
  return *this;
    }
  
+  loongarch_rtx_cost_data addr_reg_reg_cost_ (unsigned short _addr_

Re: [PATCH v2] AArch64: Block combine_and_move from creating FP literal loads

2025-01-09 Thread Wilco Dijkstra

Hi Richard,

> The patch below is what I meant.  It passes bootstrap & regression-test
> on aarch64-linux-gnu (and so produces the same results for the tests
> that you changed).  Do you see any problems with this version?
> If not, I think we should go with it.

Thanks for the detailed example - unfortunately there are issues with it.
Early expansion means more instructions to deal with in RTL and fewer
optimizations - it even affects inlining (I see more calls/returns in the
instruction frequencies).

Worse, this change completely disables rematerialization of FP immediates
which implies extra spilling. A basic example goes like this:

void g(void);
double bad_remat (double x)
{
  x += 5.347897294;
  g();
  x *= 5.347897294;
  return x;
}

which with -O2 -fomit-frame-pointer -ffixed-d8 -ffixed-d9 -ffixed-d10 
-ffixed-d11 -ffixed-d12 -ffixed-d13 -ffixed-d14 now compiles to:

adrp x0, .LC0
str x30, [sp, -32]!
ldr d31, [x0, #:lo12:.LC0]
str d15, [sp, 8]
fadd d15, d0, d31
str d31, [sp, 24]
bl  g
ldr d31, [sp, 24]
fmul d0, d15, d31
ldr d15, [sp, 8]
ldr x30, [sp], 32
ret

Recent changes have been moving in the opposite direction - keeping
high-level constructs (like GOT accesses) as a single operation works out
better for register allocation and allows more optimization.

So keeping FP immediates as standard move instructions until regalloc
is best. Supporting MOV/FMOV in regalloc would require another secondary
reload (and would then allow rematerialization of these constants).

Cheers,
Wilco

[PATCH] [gcc-14] arm: [MVE intrinsics] Fix support for predicate constants [PR target/114801]

2025-01-09 Thread Christophe Lyon

OK for gcc-14?

This backport is a cherry pick of commit
2089009210a1774c37e527ead8bbcaaa1a7a9d2d, with a small change needed
because force_lowpart_subreg does not exist in gcc-14: the patch
replaces it with the equivalent:

-x = force_lowpart_subreg (mode, x, GET_MODE (x));
+{
+  auto byte = subreg_lowpart_offset (mode, GET_MODE (x));
+  x = force_subreg (mode, x, GET_MODE (x), byte);
+}

In this PR, we have to handle a case where MVE predicates are supplied
as a const_int, where individual predicates have illegal boolean
values (such as 0xc for a 4-bit boolean predicate).  To avoid the ICE,
fix the constant (any non-zero value is converted to all 1s) and emit
a warning.

On MVE, V8BI and V4BI multi-bit masks are interpreted byte-by-byte at
instruction level, but end-users should describe lanes rather than
bytes (so all bytes of a true-predicated lane should be '1'), see the
section on MVE intrinsics in the Arm ACLE specification.

Since force_lowpart_subreg cannot handle const_int (because they have
VOIDmode), use gen_lowpart on them, force_lowpart_subreg otherwise.
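Here is a standalone sketch of the canonicalization described above. It assumes a 16-bit predicate with lane 0 in the low-order bits. The mask constants (0x5555/0xAAAA for bit pairs, 0x3333/0xCCCC for bit pairs within nibbles) were chosen for this sketch. The helper names are hypothetical, and this is not the arm-mve-builtins.cc code itself.

```c
#include <assert.h>
#include <stdint.h>

/* 8 lanes, 2 bits per lane: OR each bit with its partner in the pair so
   that every lane becomes either 0b00 or 0b11.  */
static uint16_t canon_pred_8lanes (uint16_t p)
{
  return p | ((p & 0x5555u) << 1) | ((p & 0xAAAAu) >> 1);
}

/* 4 lanes, 4 bits per lane: first make each bit pair uniform, then
   spread each pair across the other pair of its nibble, so every lane
   becomes either 0x0 or 0xF.  */
static uint16_t canon_pred_4lanes (uint16_t p)
{
  p = canon_pred_8lanes (p);
  return p | ((p & 0x3333u) << 2) | ((p & 0xCCCCu) >> 2);
}
```

For a 4-lane predicate, 0xc (only the upper two bits of lane 0 set) becomes 0xf. This matches the "any non-zero value is converted to all 1s" rule, while an already canonical value such as 0x00ff is left unchanged.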

2024-11-20  Christophe Lyon  
Jakub Jelinek  

PR target/114801
gcc/
* config/arm/arm-mve-builtins.cc
(function_expander::add_input_operand): Handle CONST_INT
predicates.

gcc/testsuite/
* gcc.target/arm/mve/pr108443.c: Update predicate constant.
* gcc.target/arm/mve/pr108443-run.c: Likewise.
* gcc.target/arm/mve/pr114801.c: New test.

(cherry picked from commit 2089009210a1774c37e527ead8bbcaaa1a7a9d2d)
---
 gcc/config/arm/arm-mve-builtins.cc| 35 -
 .../gcc.target/arm/mve/pr108443-run.c |  2 +-
 gcc/testsuite/gcc.target/arm/mve/pr108443.c   |  4 +-
 gcc/testsuite/gcc.target/arm/mve/pr114801.c   | 39 +++
 4 files changed, 76 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr114801.c

diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index e1826ae4052..c57bf0844b0 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -2107,7 +2107,40 @@ function_expander::add_input_operand (insn_code icode, rtx x)
   mode = GET_MODE (x);
 }
   else if (VALID_MVE_PRED_MODE (mode))
-x = gen_lowpart (mode, x);
+{
+  if (CONST_INT_P (x))
+   {
+ if (mode == V8BImode || mode == V4BImode)
+   {
+ /* In V8BI or V4BI each element has 2 or 4 bits, if those bits
+aren't all the same, gen_lowpart might ICE.  Canonicalize all
+the 2 or 4 bits to all ones if any of them is non-zero.  V8BI
+and V4BI multi-bit masks are interpreted byte-by-byte at
+instruction level, but such constants should describe lanes,
+rather than bytes.  See the section on MVE intrinsics in the
+Arm ACLE specification.  */
+ unsigned HOST_WIDE_INT xi = UINTVAL (x);
+ xi |= ((xi & 0x) << 1) | ((xi & 0x) >> 1);
+ if (mode == V4BImode)
+   xi |= ((xi & 0x) << 2) | ((xi & 0x) >> 2);
+ if (xi != UINTVAL (x))
+   warning_at (location, 0, "constant predicate argument %d"
+   " (%wx) does not map to %d lane numbers,"
+   " converted to %wx",
+   opno, UINTVAL (x) & 0x,
+   mode == V8BImode ? 8 : 4,
+   xi & 0x);
+
+ x = gen_int_mode (xi, HImode);
+   }
+ x = gen_lowpart (mode, x);
+   }
+  else
+   {
+ auto byte = subreg_lowpart_offset (mode, GET_MODE (x));
+ x = force_subreg (mode, x, GET_MODE (x), byte);
+   }
+}
 
   m_ops.safe_grow (m_ops.length () + 1, true);
   create_input_operand (&m_ops.last (), x, mode);
diff --git a/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c b/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c
index cb4b45bd305..b894f019b8b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c
+++ b/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c
@@ -16,7 +16,7 @@ __attribute__ ((noipa)) partial_write (uint32_t *a, uint32x4_t v, unsigned short
 
 int main (void)
 {
-  unsigned short p = 0x00CC;
+  unsigned short p = 0x00FF;
   uint32_t a[] = {0, 0, 0, 0};
   uint32_t b[] = {0, 0, 0, 0};
   uint32x4_t v = vdupq_n_u32 (0xU);
diff --git a/gcc/testsuite/gcc.target/arm/mve/pr108443.c b/gcc/testsuite/gcc.target/arm/mve/pr108443.c
index c5fbfa4a1bb..0c0e2dd6eb8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/pr108443.c
+++ b/gcc/testsuite/gcc.target/arm/mve/pr108443.c
@@ -7,8 +7,8 @@
 void
 __attribute__ ((noipa)) partial_write_cst (uint32_t *a, uint32x4_t v)
 {
-  vstrwq_p_u32 (a, v, 0x00CC);
+  vstrwq_p_u32 (a, v, 0x00FF);
 }
 
-/* { dg-final { scan-assembler {mov\tr[0-9]+, #204} } } */
+/* { dg-final

Re: [PATCH] arm: [MVE intrinsics] Fix tuples field name (PR 118332)

2025-01-09 Thread Richard Earnshaw (lists)

On 08/01/2025 18:54, Christophe Lyon wrote:
> The previous fix only worked for C, for C++ we need to add more
> information to the underlying type so that
> finish_class_member_access_expr accepts it.
> 
> This patch makes gcc.target/arm/mve/intrinsics/pr118332.c pass in C++
> mode.
> 
> gcc/ChangeLog:
> 
>   PR target/118332
>   * config/arm/arm-mve-builtins.cc (wrap_type_in_struct): Handle C++
>   case.
> ---
>  gcc/config/arm/arm-mve-builtins.cc | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
> index 846cd773c0b..2cf81853cfa 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -457,6 +457,22 @@ wrap_type_in_struct (tree field_type)
>tree field = build_decl (input_location, FIELD_DECL,
>  get_identifier ("val"), field_type);
>tree struct_type = lang_hooks.types.make_type (RECORD_TYPE);
> +
> +  /* In C++ we need more info to comply with CLASS_TYPE_P and lookup_member in
> + finish_class_member_access_expr.  */
> +  if (lang_GNU_CXX ())
> +{
> +  /* Equivalent to SET_CLASS_TYPE_P (struct_type, 1); but SET_CLASS_TYPE_P
> +  is not available here.  */
> +  struct_type->type_common.lang_flag_5 = 1;
> +
> +  /* Extracted from xref_basetypes.  */
> +  tree binfo = make_tree_binfo (0);
> +  TYPE_BINFO (struct_type) = binfo;
> +  BINFO_OFFSET (binfo) = size_zero_node;
> +  BINFO_TYPE (binfo) = struct_type;
> +}
> +
>DECL_FIELD_CONTEXT (field) = struct_type;
>TYPE_FIELDS (struct_type) = field;
>layout_type (struct_type);

Can't this be handled via lang_hooks.types.simulate_record_decl() rather than 
having to have lang-specific code directly in the back-end?

R.

Re: [PATCH v5 05/10] OpenMP: Add C support for metadirectives and dynamic selectors.

2025-01-09 Thread Jakub Jelinek

On Thu, Jan 09, 2025 at 01:17:24PM +0100, Tobias Burnus wrote:
> A case where the 'omp error' diagnostic should be delayed, and (here) suppressed:
> 
> program_control/sources/error.1.c:15:23: error: ‘pragma omp error’ encountered: GNU compiler required.
>15 | otherwise(error at(compilation) severity(fatal) \
>   |   ^
> 
> which is odd given that we have the GNU compiler:
> 
> #pragma omp metadirective \
> when(implementation={vendor(gnu)}: nothing )   \
> otherwise(error at(compilation) severity(fatal) \
> message("GNU compiler required."))

This isn't going to be fun if it is supposed to be resolved only very late
in the compilation process.  I guess we'll then need some tree/statement
holding the error directive, perhaps with the guarding condition from the
metadirective, and defer diagnostics until the metadirective is resolved.

Jakub

[COMMITTED 2/2] ada: Fix missing detection of late equality operator returning subtype of Boolean

2025-01-09 Thread Marc Poulhiès

From: Eric Botcazou 

In Ada 2012, the compiler fails to check that a primitive equality operator
for an untagged record type must appear before the type is frozen, when the
operator returns a subtype of Boolean.  This plugs the legality loophole but
adds the debug switch -gnatd_q to go back to the previous state.

gcc/ada/ChangeLog:

PR ada/18765
* debug.adb (d_q): Document new usage.
* sem_ch6.adb (New_Overloaded_Entity): Apply the special processing
to all equality operators whose base result type is Boolean, but do
not enforce the new Ada 2012 freezing rule if the result type is a
proper subtype of it and the -gnatd_q switch is specified.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/debug.adb   |  6 +-
 gcc/ada/sem_ch6.adb | 12 ++--
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/debug.adb b/gcc/ada/debug.adb
index 7b95fa87f02..ac3ce41dcc5 100644
--- a/gcc/ada/debug.adb
+++ b/gcc/ada/debug.adb
@@ -154,7 +154,7 @@ package body Debug is
--  d_n
--  d_o
--  d_p  Ignore assertion pragmas for elaboration
-   --  d_q
+   --  d_q  Do not enforce freezing for equality operator of boolean subtype
--  d_r  Disable the use of the return slot in functions
--  d_s  Stop elaboration checks on synchronous suspension
--  d_t  In LLVM-based CCG, dump LLVM IR after transformations are done
@@ -999,6 +999,10 @@ package body Debug is
--   semantics of invariants and postconditions in both the static and
--   dynamic elaboration models.
 
+   --  d_q  The compiler does not enforce the new freezing rule introduced for
+   --   primitive equality operators in Ada 2012 when the operator returns
+   --   a subtype of Boolean.
+
--  d_r  The compiler does not make use of the return slot in the expansion
--   of functions returning a by-reference type. If this use is required
--   for these functions to return on the primary stack, then they are
diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
index 49d83f8d5e0..80e0c9c634c 100644
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -12880,12 +12880,20 @@ package body Sem_Ch6 is
 
   <>
  if Chars (S) = Name_Op_Eq
-   and then Etype (S) = Standard_Boolean
+   and then Base_Type (Etype (S)) = Standard_Boolean
and then Present (Parent (S))
and then not Is_Dispatching_Operation (S)
  then
 Make_Inequality_Operator (S);
-Check_Untagged_Equality (S);
+
+--  The freezing rule introduced in Ada 2012 was historically
+--  not enforced for operators returning a subtype of Boolean.
+
+if Etype (S) = Standard_Boolean
+  or else not Debug_Flag_Underscore_Q
+then
+   Check_Untagged_Equality (S);
+end if;
  end if;
end New_Overloaded_Entity;
 
-- 
2.43.0

[COMMITTED 1/2] ada: Accept predefined multiply operator for fixed point in expression function

2025-01-09 Thread Marc Poulhiès

From: Eric Botcazou 

The RM 4.5.5(19.1/2) subclause says that the predefined multiply operator
for universal_fixed is still available, despite the declaration of a user-
defined primitive multiply operator for the fixed-point type at stake, if
it is identified using an expanded name with prefix denoting Standard, but
this is currently not the case in the context of an expression function.

gcc/ada/ChangeLog:

PR ada/118274
* sem_ch4.adb (Check_Arithmetic_Pair.Has_Fixed_Op): Use the original
node of the operator to identify the case of an expanded name whose
prefix is the package Standard.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch4.adb | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
index 5b9456bed0a..406983995f3 100644
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -6619,18 +6619,20 @@ package body Sem_Ch4 is
   --
 
   function Has_Fixed_Op (Typ : Entity_Id; Op : Entity_Id) return Boolean is
- Bas : constant Entity_Id := Base_Type (Typ);
+ Bas    : constant Entity_Id := Base_Type (Typ);
+ Orig_N : constant Node_Id   := Original_Node (N);
+
  Ent : Entity_Id;
  F1  : Entity_Id;
  F2  : Entity_Id;
 
   begin
- --  If the universal_fixed operation is given explicitly the rule
+ --  If the universal_fixed operation is given explicitly, the rule
  --  concerning primitive operations of the type do not apply.
 
- if Nkind (N) = N_Function_Call
-   and then Nkind (Name (N)) = N_Expanded_Name
-   and then Entity (Prefix (Name (N))) = Standard_Standard
+ if Nkind (Orig_N) = N_Function_Call
+   and then Nkind (Name (Orig_N)) = N_Expanded_Name
+   and then Entity (Prefix (Name (Orig_N))) = Standard_Standard
  then
 return False;
  end if;
-- 
2.43.0

Re: [PATCH] Add warning for use of non-spec FMV in Aarch64

2025-01-09 Thread Alfie Richards


Hi Kyrylo,

(resending due to missing CC)

On 09/01/2025 10:41, Kyrylo Tkachov wrote:
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 91de13159cb..afc0749fd67 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -20347,6 +20347,9 @@ aarch64_mangle_decl_assembler_name (tree decl, tree id)
> >
> > if (TREE_CODE (decl) == FUNCTION_DECL
> > && DECL_FUNCTION_VERSIONED (decl))
> > {
> > + warning_at(DECL_SOURCE_LOCATION(decl), OPT_Wexperimental_fmv_target,
> > + "function multi-versioning support is experimental");
> > +
>
> Some wording nits.
>
> Space before the “(”.

Thank you, I will fix.

> I think there should be no ‘-’ here to keep consistent with the ACLE
> wording. Not sure whether Function Multi Versioning should be
> capitalised. What do you think?

Ah yes, best to follow the ACLE for formatting, I suppose. I will
capitalise and remove the hyphen to match.

> > diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> > index 36bc719b822..55670eeb74f 100644
> > --- a/gcc/config/aarch64/aarch64.opt
> > +++ b/gcc/config/aarch64/aarch64.opt
> > @@ -431,3 +431,7 @@ handling. One means we try to form pairs involving one or more existing
> > individual writeback accesses where possible. A value of two means we
> > also try to opportunistically form writeback opportunities by folding in
> > trailing destructive updates of the base register used by a pair.
> > +
> > +Wexperimental-fmv-target
> > +Target Var(warn_experimental_fmv) Warning Init(1)
> > +Warn about usage of experimental function multi versioning
>
> Should this have aarch64 in the name somehow? It feels awkward to have
> aarch64 in the name, but the option is not generic.

I was following Wopenacc-dims in designing this; it is a target-specific
option for gcn that isn't named overly specifically.

I also don't think this necessarily has to be limited to AArch64. I'm
aware that other ports have specified the behaviour of target_version,
as AArch64 has, in ways GCC currently doesn't conform to, and so may
want to issue the same warning.

> In any case, this should be documented in invoke.texi.

Will send a patch adding this.

> I also think that from a user experience POV, if they get this warning
> they may ask what the recommendation is.
> Should they change their source? Be prepared that the spec will change
> in future releases?
>
> The documentation should give some guidance.

I have added to the warning (in the upcoming patch) that the behaviour is
likely to change.

I will also add something similar to the documentation.

Thank you for the feedback! I will get an updated patch to you shortly.
Alfie Richards

[PATCH] Avoid PHI node re-allocation in loop copying

2025-01-09 Thread Richard Biener

duplicate_loop_body_to_header_edge redirects the original loop entry
edge to the loop copy header and the copied loop exit to the old
loop header.  But it does so in the order that requires temporary
space for an extra edge on the original loop header, causing
unnecessary re-allocations.  The following avoids this by swapping
the order of the redirects.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

This originally surfaced with the location_t work, which effectively
changed PHI allocations and thus made some passes unexpectedly
get their PHI nodes re-allocated when calling loop_version.  Mitigation
for this was provided, which this change might or might not make
completely unnecessary (for the vectorizer change it does, as far as my
testing goes).

Pushed to trunk.

I don't plan to revert the location_t changes in this area though,
not at this stage at least.

Richard.

* cfgloopmanip.cc (duplicate_loop_body_to_header_edge): When
copying to the header edge first redirect the entry to the
new loop and then the exit to the old to avoid PHI node
re-allocation.
---
 gcc/cfgloopmanip.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 534e556e1e4..17bcf9f4acc 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -1447,9 +1447,9 @@ duplicate_loop_body_to_header_edge (class loop *loop, edge e,
}
   else
{
+ redirect_edge_and_branch_force (e, new_bbs[0]);
  redirect_edge_and_branch_force (new_spec_edges[SE_LATCH],
  loop->header);
- redirect_edge_and_branch_force (e, new_bbs[0]);
  set_immediate_dominator (CDI_DOMINATORS, new_bbs[0], e->src);
  e = new_spec_edges[SE_LATCH];
}
-- 
2.43.0

[PATCH] Fix SLP scalar costing with stmts also used in externals

2025-01-09 Thread Richard Biener

When an external SLP node is permuted, the scalar stmts recorded in
the permute node do not mean the scalar computation can be removed.
We remove those stmts from vectorized_scalar_stmts for this reason,
but we fail to check this set when we cost scalar stmts.

The following fixes this.

This shows in PR115777 when we avoid vectorizing the load, but
on its own it doesn't help the PR yet.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/115777
* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Do not
cost a scalar stmt that needs to be preserved.
---
 gcc/tree-vect-slp.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 337506419d9..152ca433b0e 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -8687,7 +8687,12 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
   ssa_op_iter op_iter;
   def_operand_p def_p;
 
-  if (!stmt_info || (*life)[i])
+  if (!stmt_info
+ || (*life)[i]
+ /* Defs also used in external nodes are not in the
+vectorized_scalar_stmts set as they need to be preserved.
+Honor that.  */
+ || !vectorized_scalar_stmts.contains (stmt_info))
continue;
 
   stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
-- 
2.43.0

[PATCH] Add warning for use of non-spec FMV in Aarch64

2025-01-09 Thread alfie.richards

This patch adds a warning whenever FMV is used for AArch64.

The reason for this is that the ACLE [1] spec for FMV has diverged
significantly from the current implementation, and we want to prevent
future compatibility issues.

There is a patch for an ACLE-compliant version of target_version and
target_clone coming eventually, but it won't make gcc-15.

This has been bootstrapped and regression tested for AArch64.
Is this okay for master and backport to gcc-14?

[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_mangle_decl_assembler_name): Add experimental warning.
* config/aarch64/aarch64.opt: Add command line option to disable
warning.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-1.C: Add CLI flag
* g++.target/aarch64/mv-symbols1.C: Add CLI flag
* g++.target/aarch64/mv-symbols2.C: Add CLI flag
* g++.target/aarch64/mv-symbols3.C: Add CLI flag
* g++.target/aarch64/mv-symbols4.C: Add CLI flag
* g++.target/aarch64/mv-symbols5.C: Add CLI flag
* g++.target/aarch64/mvc-symbols1.C: Add CLI flag
* g++.target/aarch64/mvc-symbols2.C: Add CLI flag
* g++.target/aarch64/mvc-symbols3.C: Add CLI flag
* g++.target/aarch64/mvc-symbols4.C: Add CLI flag
* g++.target/aarch64/mv-warning1.C: New test.
---
 gcc/config/aarch64/aarch64.cc   | 3 +++
 gcc/config/aarch64/aarch64.opt  | 4 
 gcc/testsuite/g++.target/aarch64/mv-1.C | 1 +
 gcc/testsuite/g++.target/aarch64/mv-symbols1.C  | 1 +
 gcc/testsuite/g++.target/aarch64/mv-symbols2.C  | 1 +
 gcc/testsuite/g++.target/aarch64/mv-symbols3.C  | 1 +
 gcc/testsuite/g++.target/aarch64/mv-symbols4.C  | 1 +
 gcc/testsuite/g++.target/aarch64/mv-symbols5.C  | 1 +
 gcc/testsuite/g++.target/aarch64/mv-warning1.C  | 9 +
 gcc/testsuite/g++.target/aarch64/mvc-symbols1.C | 1 +
 gcc/testsuite/g++.target/aarch64/mvc-symbols2.C | 1 +
 gcc/testsuite/g++.target/aarch64/mvc-symbols3.C | 1 +
 gcc/testsuite/g++.target/aarch64/mvc-symbols4.C | 1 +
 13 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-warning1.C

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 91de13159cb..afc0749fd67 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -20347,6 +20347,9 @@ aarch64_mangle_decl_assembler_name (tree decl, tree id)
   if (TREE_CODE (decl) == FUNCTION_DECL
   && DECL_FUNCTION_VERSIONED (decl))
 {
+  warning_at(DECL_SOURCE_LOCATION(decl),  OPT_Wexperimental_fmv_target,
+		 "function multi-versioning support is experimental");
+
   aarch64_fmv_feature_mask feature_mask = get_feature_mask_for_version (decl);
 
   std::string name = IDENTIFIER_POINTER (id);
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 36bc719b822..55670eeb74f 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -431,3 +431,7 @@ handling.  One means we try to form pairs involving one or more existing
 individual writeback accesses where possible.  A value of two means we
 also try to opportunistically form writeback opportunities by folding in
 trailing destructive updates of the base register used by a pair.
+
+Wexperimental-fmv-target
+Target Var(warn_experimental_fmv) Warning Init(1)
+Warn about usage of experimental function multi versioning
diff --git a/gcc/testsuite/g++.target/aarch64/mv-1.C b/gcc/testsuite/g++.target/aarch64/mv-1.C
index b4b0e5e3fea..b10037f1b9b 100644
--- a/gcc/testsuite/g++.target/aarch64/mv-1.C
+++ b/gcc/testsuite/g++.target/aarch64/mv-1.C
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-ifunc "" } */
 /* { dg-options "-O0" } */
+/* { dg-additional-options "-Wno-experimental-fmv-target" } */
 
 __attribute__((target_version("default")))
 int foo ()
diff --git a/gcc/testsuite/g++.target/aarch64/mv-symbols1.C b/gcc/testsuite/g++.target/aarch64/mv-symbols1.C
index 53e0abcd9b4..73cde42fa34 100644
--- a/gcc/testsuite/g++.target/aarch64/mv-symbols1.C
+++ b/gcc/testsuite/g++.target/aarch64/mv-symbols1.C
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-ifunc "" } */
 /* { dg-options "-O0" } */
+/* { dg-additional-options "-Wno-experimental-fmv-target" } */
 
 int foo ()
 {
diff --git a/gcc/testsuite/g++.target/aarch64/mv-symbols2.C b/gcc/testsuite/g++.target/aarch64/mv-symbols2.C
index f0c7967a97a..6da88ddfb48 100644
--- a/gcc/testsuite/g++.target/aarch64/mv-symbols2.C
+++ b/gcc/testsuite/g++.target/aarch64/mv-symbols2.C
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-ifunc "" } */
 /* { dg-options "-O0" } */
+/* { dg-additional-options "-Wno-experimental-fmv-target" } */
 
 __attribute__((target_version("default")))
 int foo ()
diff --git a/gcc/testsuite/g++.target/aarch64/mv-symbols3.C b/gcc/testsuite/g++.target/aarch64/mv-symbols3.C
index 3d30e27deb8..5d

Ping (was: rs6000: Add -msplit-patch-nops (PR112980))

2025-01-09 Thread Michael Matz

Hello,

On Wed, 13 Nov 2024, Michael Matz wrote:

> Hello,
> 
> this is essentially 
> 
>   https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html
> 
> from Kewen in functionality.  When discussing this with Segher at the 
> Cauldron he expressed reservations about changing the default 
> implementation of -fpatchable-function-entry.  So, to move forward, let's 
> move it under a new target option -msplit-patch-nops (expressing the 
> important deviation from the default behaviour, namely that all the 
> patching nops form a consecutive sequence normally).
> 
> Regstrapping on power9 ppc64le in progress.  Okay if that passes?

Is there anything I can do to move forward with this one?  Please?


Ciao,
Michael.


> 
> 
> Ciao,
> Michael.
> 
> ---
> 
> As the bug report details, some uses of -fpatchable-function-entry
> aren't happy with the "before" NOPs being inserted between the global
> and local entry points on powerpc.  We want the before NOPs to be in
> front of the global entry point.  That means that the patching NOPs
> aren't consecutive for dual-entry-point functions, but for these use
> cases that's not a problem.  But let us support both under the control
> of a new target option: -msplit-patch-nops.
> 
>   gcc/
> 
> PR target/112980
> * config/rs6000/rs6000.opt (msplit-patch-nops): New option.
> * doc/invoke.texi (RS/6000 and PowerPC Options): Document it.
> * config/rs6000/rs6000.h (machine_function.stop_patch_area_print):
> New member.
> * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
> Emit split nops under control of that one.
> * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
> Add handling of split patch nops.
> ---
>  gcc/config/rs6000/rs6000-logue.cc | 15 +--
>  gcc/config/rs6000/rs6000.cc   | 27 +++
>  gcc/config/rs6000/rs6000.h|  6 ++
>  gcc/config/rs6000/rs6000.opt  |  4 
>  gcc/doc/invoke.texi   | 17 +++--
>  5 files changed, 57 insertions(+), 12 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-logue.cc b/gcc/config/rs6000/rs6000-logue.cc
> index c87058b435e..aa1e0442f2b 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -4005,8 +4005,8 @@ rs6000_output_function_prologue (FILE *file)
>  
>unsigned short patch_area_size = crtl->patch_area_size;
>unsigned short patch_area_entry = crtl->patch_area_entry;
> -  /* Need to emit the patching area.  */
> -  if (patch_area_size > 0)
> +  /* Emit non-split patching area now.  */
> +  if (!TARGET_SPLIT_PATCH_NOPS && patch_area_size > 0)
>   {
> cfun->machine->global_entry_emitted = true;
> /* As ELFv2 ABI shows, the allowable bytes between the global
> @@ -4027,7 +4027,6 @@ rs6000_output_function_prologue (FILE *file)
>  patch_area_entry);
> rs6000_print_patchable_function_entry (file, patch_area_entry,
>true);
> -   patch_area_size -= patch_area_entry;
>   }
>   }
>  
> @@ -4037,9 +4036,13 @@ rs6000_output_function_prologue (FILE *file)
>assemble_name (file, name);
>fputs ("\n", file);
>/* Emit the nops after local entry.  */
> -  if (patch_area_size > 0)
> - rs6000_print_patchable_function_entry (file, patch_area_size,
> -patch_area_entry == 0);
> +  if (patch_area_size > patch_area_entry)
> + {
> +   patch_area_size -= patch_area_entry;
> +   cfun->machine->stop_patch_area_print = false;
> +   rs6000_print_patchable_function_entry (file, patch_area_size,
> +  patch_area_entry == 0);
> + }
>  }
>  
>else if (rs6000_pcrel_p ())
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 950fd947fda..6427e6913ba 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -15226,11 +15226,25 @@ rs6000_print_patchable_function_entry (FILE *file,
>  {
>bool global_entry_needed_p = rs6000_global_entry_point_prologue_needed_p ();
>/* For a function which needs global entry point, we will emit the
> - patchable area before and after local entry point under the control of
> - cfun->machine->global_entry_emitted, see the handling in function
> - rs6000_output_function_prologue.  */
> -  if (!global_entry_needed_p || cfun->machine->global_entry_emitted)
> + patchable area when it isn't split before and after local entry point
> + under the control of cfun->machine->global_entry_emitted, see the
> + handling in function rs6000_output_function_prologue.  */
> +  if (!TARGET_SPLIT_PATCH_NOPS
> +  && (!global_entry_needed_p || cfun->machine->global_entry_emitted))
>  default_print_patchable_function_entry (file, patch_are

Re: [PATCH v2] arm: [MVE intrinsics] Fix tuples field name (PR 118332)

2025-01-09 Thread Richard Earnshaw (lists)

On 09/01/2025 14:50, Christophe Lyon wrote:
> The previous fix only worked for C, for C++ we need to add more
> information to the underlying type so that
> finish_class_member_access_expr accepts it.
> 
> We use the same logic as in aarch64's register_tuple_type for AdvSIMD
> tuples.
> 
> This patch makes gcc.target/arm/mve/intrinsics/pr118332.c pass in C++
> mode.
> 
> gcc/ChangeLog:
> 
>   PR target/118332
>   * config/arm/arm-mve-builtins.cc (wrap_type_in_struct): Delete.
>   (register_type_decl): Delete.
>   (register_builtin_tuple_types): Use
>   lang_hooks.types.simulate_record_decl.

Much nicer.

OK, but please give Richard S 24 hours to comment.

R.

[PING 1] [PATCH v2] rs6000: Inefficient vector splat of small V2DI constants [PR107757]

2025-01-09 Thread Surya Kumari Jangala

Ping

On 02/12/24 2:20 pm, Surya Kumari Jangala wrote:
> I have incorporated review comments in this patch.
> 
> Regards,
> Surya
> 
> 
> rs6000: Inefficient vector splat of small V2DI constants [PR107757]
> 
> On P8, for vector splat of double word constants, specifically -1 and 1,
> gcc generates inefficient code. For -1, gcc generates two instructions
> (vspltisw and vupkhsw) whereas only one instruction (vspltisw) is
> sufficient. For constant 1, gcc generates a load of the constant from
> .rodata instead of the instructions vspltisw and vupkhsw.
> 
> The routine vspltisw_vupkhsw_constant_p() returns true if the constant
> can be synthesized with instructions vspltisw and vupkhsw. However, for
> constant 1, this routine returns false.
> 
> For constant -1, this routine returns true. Vector splat of -1 can be
> done with only one instruction, i.e., vspltisw. We do not need two
> instructions. Hence this routine should return false for -1.
> 
> With this patch, gcc generates only one instruction (vspltisw)
> for -1, and two instructions (vspltisw and vupkhsw) for
> constant 1.
> 
> 2024-11-20  Surya Kumari Jangala  
> 
> gcc/
>   PR target/107757
>   * config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p):
>   Return false for -1 and return true for 1.
> 
> gcc/testsuite/
>   PR target/107757
>   * gcc.target/powerpc/pr107757-1.c: New.
>   * gcc.target/powerpc/pr107757-2.c: New.
> ---
>  gcc/config/rs6000/rs6000.cc   |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr107757-1.c | 14 ++
>  gcc/testsuite/gcc.target/powerpc/pr107757-2.c | 13 +
>  3 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107757-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107757-2.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 02a2f1152db..d0c528f4d5f 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -6652,7 +6652,7 @@ vspltisw_vupkhsw_constant_p (rtx op, machine_mode mode, int *constant_ptr)
>  return false;
>  
>value = INTVAL (elt);
> -  if (value == 0 || value == 1
> +  if (value == 0 || value == -1
>|| !EASY_VECTOR_15 (value))
>  return false;
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr107757-1.c b/gcc/testsuite/gcc.target/powerpc/pr107757-1.c
> new file mode 100644
> index 000..49076fba255
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr107757-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-final { scan-assembler {\mvspltisw\M} } } */
> +/* { dg-final { scan-assembler {\mvupkhsw\M} } } */
> +/* { dg-final { scan-assembler-not {\mlvx\M} } } */
> +
> +#include 
> +
> +vector long long
> +foo ()
> +{
> +   return vec_splats (1LL);
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr107757-2.c b/gcc/testsuite/gcc.target/powerpc/pr107757-2.c
> new file mode 100644
> index 000..4955696f11d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr107757-2.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-final { scan-assembler {\mvspltisw\M} } } */
> +/* { dg-final { scan-assembler-not {\mvupkhsw\M} } } */
> +
> +#include 
> +
> +vector long long
> +foo ()
> +{
> +  return vec_splats (~0LL);
> +}

Re: [Patch, Fortran, PR118337, v1] Fortran: Fix Fortran *.mod compatibility [PR118337]

2025-01-09 Thread Andre Vehreschild

Hi Mikael,

merged only patch #2 as gcc-15-6729-gd1071402055. 

Thanks for the ok and regards,
Andre

On Wed, 8 Jan 2025 22:46:15 +0100
Mikael Morin  wrote:

> Le 08/01/2025 à 18:23, Andre Vehreschild a écrit :
> > 
> > First of all, the recursive attr must not be set on vtypes, neither on
> > module ones nor anywhere else. Strictly speaking, a vtype is recursive,
> > because through its extends member it references itself via a pointer.
> > But it is guaranteed that the base type is never the same as the
> > extended one, so no cycle can occur. Furthermore, vtypes are never freed
> > or copied (yet). So the flag is not needed, which the patch starting with
> > 0002 ensures.
> >   
> 
> > From d0b43ccb141dbec998e81fd437f7f1a02bd74731 Mon Sep 17 00:00:00 2001
> > From: Andre Vehreschild 
> > Date: Wed, 8 Jan 2025 14:58:35 +0100
> > Subject: [PATCH 2/3] Fortran: Cycle detection for non vtypes only.
> > [PR118337]
> > 
> > gcc/fortran/ChangeLog:
> > 
> > PR fortran/118337
> > 
> > * resolve.cc (resolve_fl_derived0): Exempt vtypes from cycle
> > detection.
> > ---
> >  gcc/fortran/resolve.cc | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
> > index 6dcda70679f..dab0c3af601 100644
> > --- a/gcc/fortran/resolve.cc
> > +++ b/gcc/fortran/resolve.cc
> > @@ -16840,7 +16840,8 @@ resolve_fl_derived0 (gfc_symbol *sym)
> > 
> >    /* Resolving components below, may create vtabs for which the cyclic type
> >       information needs to be present.  */
> > -  resolve_cyclic_derived_type (sym);
> > +  if (!sym->attr.vtype)
> > +    resolve_cyclic_derived_type (sym);
> > 
> >    c = (sym->attr.is_class) ? CLASS_DATA (sym->components)
> >                            : sym->components;
> > --
> > 2.47.1  
> 
> OK.


-- 
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [PATCH] [gcc-14] arm: [MVE intrinsics] Fix support for predicate constants [PR target/114801]

2025-01-09 Thread Richard Earnshaw (lists)

On 09/01/2025 08:58, Christophe Lyon wrote:
> OK for gcc-14?
> 
> This backport is a cherry pick of commit
> 2089009210a1774c37e527ead8bbcaaa1a7a9d2d, with a small change needed
> because force_lowpart_subreg does not exist in gcc-14: the patch
> replaces it with the equivalent:
> 
> -x = force_lowpart_subreg (mode, x, GET_MODE (x));
> +{
> +  auto byte = subreg_lowpart_offset (mode, GET_MODE (x));
> +  x = force_subreg (mode, x, GET_MODE (x), byte);
> +}

I think it would be OK to backport force_lowpart_subreg() to gcc-14 (nothing 
else will call it, so it can't change the behaviour elsewhere).

But this is OK too.  Your call.

R.

> 
> In this PR, we have to handle a case where MVE predicates are supplied
> as a const_int, where individual predicates have illegal boolean
> values (such as 0xc for a 4-bit boolean predicate).  To avoid the ICE,
> fix the constant (any non-zero value is converted to all 1s) and emit
> a warning.
> 
> On MVE, V8BI and V4BI multi-bit masks are interpreted byte-by-byte at
> instruction level, but end-users should describe lanes rather than
> bytes (so all bytes of a true-predicated lane should be '1'), see the
> section on MVE intrinsics in the Arm ACLE specification.
> 
> Since force_lowpart_subreg cannot handle const_int (because they have VOID
> mode), use gen_lowpart on them, force_lowpart_subreg otherwise.
> 
> 2024-11-20  Christophe Lyon  
>   Jakub Jelinek  
> 
>   PR target/114801
>   gcc/
>   * config/arm/arm-mve-builtins.cc
>   (function_expander::add_input_operand): Handle CONST_INT
>   predicates.
> 
>   gcc/testsuite/
>   * gcc.target/arm/mve/pr108443.c: Update predicate constant.
>   * gcc.target/arm/mve/pr108443-run.c: Likewise.
>   * gcc.target/arm/mve/pr114801.c: New test.
> 
> (cherry picked from commit 2089009210a1774c37e527ead8bbcaaa1a7a9d2d)
> ---
>  gcc/config/arm/arm-mve-builtins.cc| 35 -
>  .../gcc.target/arm/mve/pr108443-run.c |  2 +-
>  gcc/testsuite/gcc.target/arm/mve/pr108443.c   |  4 +-
>  gcc/testsuite/gcc.target/arm/mve/pr114801.c   | 39 +++
>  4 files changed, 76 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr114801.c
> 
> diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
> index e1826ae4052..c57bf0844b0 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -2107,7 +2107,40 @@ function_expander::add_input_operand (insn_code icode, rtx x)
>        mode = GET_MODE (x);
>      }
>    else if (VALID_MVE_PRED_MODE (mode))
> -    x = gen_lowpart (mode, x);
> +    {
> +      if (CONST_INT_P (x))
> +        {
> +          if (mode == V8BImode || mode == V4BImode)
> +            {
> +              /* In V8BI or V4BI each element has 2 or 4 bits, if those bits
> +                 aren't all the same, gen_lowpart might ICE.  Canonicalize all
> +                 the 2 or 4 bits to all ones if any of them is non-zero.  V8BI
> +                 and V4BI multi-bit masks are interpreted byte-by-byte at
> +                 instruction level, but such constants should describe lanes,
> +                 rather than bytes.  See the section on MVE intrinsics in the
> +                 Arm ACLE specification.  */
> +              unsigned HOST_WIDE_INT xi = UINTVAL (x);
> +              xi |= ((xi & 0x) << 1) | ((xi & 0x) >> 1);
> +              if (mode == V4BImode)
> +                xi |= ((xi & 0x) << 2) | ((xi & 0x) >> 2);
> +              if (xi != UINTVAL (x))
> +                warning_at (location, 0, "constant predicate argument %d"
> +                            " (%wx) does not map to %d lane numbers,"
> +                            " converted to %wx",
> +                            opno, UINTVAL (x) & 0x,
> +                            mode == V8BImode ? 8 : 4,
> +                            xi & 0x);
> +
> +              x = gen_int_mode (xi, HImode);
> +            }
> +          x = gen_lowpart (mode, x);
> +        }
> +      else
> +        {
> +          auto byte = subreg_lowpart_offset (mode, GET_MODE (x));
> +          x = force_subreg (mode, x, GET_MODE (x), byte);
> +        }
> +    }
>  
>    m_ops.safe_grow (m_ops.length () + 1, true);
>    create_input_operand (&m_ops.last (), x, mode);
> diff --git a/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c b/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c
> index cb4b45bd305..b894f019b8b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/pr108443-run.c
> @@ -16,7 +16,7 @@ __attribute__ ((noipa)) partial_write (uint32_t *a, uint32x4_t v, unsigned short
>  
>  int main (void)
>  {
> -  unsigned short p = 0x00CC;
> +  unsigned short p = 0x00FF;
>uint32_t a[] = {0, 0, 0, 0};
>uint32_t b[] = {0, 0, 0, 0};
>uint32x4_t v = vdupq_n_u32 (0xU);
> diff --git a/gcc/testsuite/gcc.target/arm/mve/pr108443.c b/gcc/testsuite/gcc.target/arm/mve/pr108443.c
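The lane canonicalization described in the commit message above (any non-zero bit within a V8BI/V4BI lane turns the whole lane on) can be illustrated outside GCC. The hex masks in the archived copy of the diff appear truncated, so the masks below (0x5555/0xAAAA for alternating bits, 0x3333/0xCCCC for alternating bit pairs) are this sketch's own choice, and `canonicalize_mve_pred` and `mve_pred_is_lane_aligned` are hypothetical helpers, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: make every lane of a 16-bit MVE
   predicate either all zeros or all ones.  LANE_BITS is 2 for a V8BI
   predicate (8 lanes) and 4 for a V4BI predicate (4 lanes).  */
static uint16_t
canonicalize_mve_pred (uint16_t pred, int lane_bits)
{
  assert (lane_bits == 2 || lane_bits == 4);
  uint32_t x = pred;

  /* Within each 2-bit pair, copy bit 0 into bit 1 and bit 1 into bit 0,
     so a pair becomes 0b11 if either of its bits was set.  */
  x |= ((x & 0x5555u) << 1) | ((x & 0xAAAAu) >> 1);

  /* For 4-bit lanes, repeat one level up: within each nibble, copy the
     low pair into the high pair and vice versa.  */
  if (lane_bits == 4)
    x |= ((x & 0x3333u) << 2) | ((x & 0xCCCCu) >> 2);

  return (uint16_t) (x & 0xFFFFu);
}

/* Return nonzero if PRED already describes whole lanes, i.e. the
   canonicalization leaves it unchanged (mirroring the patch's
   xi != UINTVAL (x) test that decides whether to warn).  */
static int
mve_pred_is_lane_aligned (uint16_t pred, int lane_bits)
{
  return canonicalize_mve_pred (pred, lane_bits) == pred;
}
```

For instance, `canonicalize_mve_pred (0x00CC, 4)` yields 0x00FF, which matches the updated predicate in pr108443-run.c (four 32-bit lanes, so 4 predicate bits per lane), while the same 0x00CC would already be lane-aligned if the lanes were only 2 bits wide.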

Re: [Patch, Fortran, PR118337, v1] Fortran: Fix Fortran *.mod compatibility [PR118337]

2025-01-09 Thread Andre Vehreschild

Hi all,

I am sorry, I don't get it. So we are trying to help the compiler, i.e. the C++
one, to create a fast gfortran binary. But we don't care about devs who stumble
across the code and ask themselves, "Why is this done (without a comment) so
oddly?"! Furthermore, the code again relies on convention or insider knowledge:

  intmod_sym symbol[] = {
#define NAMED_INTCST(a,b,c,d) { a, b, 0, d },
#define NAMED_UINTCST(a,b,c,d) { a, b, 0, d },
#define NAMED_KINDARRAY(a,b,c,d) { a, b, 0, d },
#define NAMED_DERIVED_TYPE(a,b,c,d) { a, b, 0, d },
#define NAMED_FUNCTION(a,b,c,d) { a, b, c, d },
#define NAMED_SUBROUTINE(a,b,c,d) { a, b, c, d },

So in function and subroutine definitions `c` may become non-constant in the
future and we neither detect it nor care. 

#include "iso-fortran-env.def"
{ ISOFORTRANENV_INVALID, NULL, -1234, 0 } };

  i = 0;
#define NAMED_INTCST(a,b,c,d) symbol[i++].value = c;
#define NAMED_UINTCST(a,b,c,d) symbol[i++].value = c;
#define NAMED_KINDARRAY(a,b,c,d) i++;
#define NAMED_DERIVED_TYPE(a,b,c,d) i++;
#define NAMED_FUNCTION(a,b,c,d) i++;
#define NAMED_SUBROUTINE(a,b,c,d) i++;
#include "iso-fortran-env.def"

So why not at least add a comment to document this design decision? I would
also propose being consistent and setting `c` for functions and subroutines in
the second part as well, just to keep both parts similar. You know the design
"principle of least confusion"?!

A confused
Andre


On Wed, 8 Jan 2025 22:41:13 +0100
Mikael Morin  wrote:

> Le 08/01/2025 à 18:37, Jakub Jelinek a écrit :
> > On Wed, Jan 08, 2025 at 06:23:40PM +0100, Andre Vehreschild wrote:  
> >> gcc/fortran/ChangeLog:
> >>
> >>PR fortran/118337
> >>* module.cc (use_iso_fortran_env_module): Prevent additional run
> >>over (un-)signed ints and assign kind directly.
> >> ---
> >>   gcc/fortran/module.cc | 13 ++---
> >>   1 file changed, 2 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/gcc/fortran/module.cc b/gcc/fortran/module.cc
> >> index de8df05781d..8dc10e1d349 100644
> >> --- a/gcc/fortran/module.cc
> >> +++ b/gcc/fortran/module.cc
> >> @@ -7113,8 +7113,8 @@ use_iso_fortran_env_module (void)
> >> int i, j;
> >>
> >> intmod_sym symbol[] = {
> >> -#define NAMED_INTCST(a,b,c,d) { a, b, 0, d },
> >> -#define NAMED_UINTCST(a,b,c,d) { a, b, 0, d },
> >> +#define NAMED_INTCST(a, b, c, d) {a, b, c, d},
> >> +#define NAMED_UINTCST(a, b, c, d) {a, b, c, d},
> >>   #define NAMED_KINDARRAY(a,b,c,d) { a, b, 0, d },
> >>   #define NAMED_DERIVED_TYPE(a,b,c,d) { a, b, 0, d },
> >>   #define NAMED_FUNCTION(a,b,c,d) { a, b, c, d },
> >> @@ -7122,15 +7122,6 @@ use_iso_fortran_env_module (void)
> >>   #include "iso-fortran-env.def"
> >>   { ISOFORTRANENV_INVALID, NULL, -1234, 0 } };
> >>
> >> -  i = 0;
> >> -#define NAMED_INTCST(a,b,c,d) symbol[i++].value = c;
> >> -#define NAMED_UINTCST(a,b,c,d) symbol[i++].value = c;
> >> -#define NAMED_KINDARRAY(a,b,c,d) i++;
> >> -#define NAMED_DERIVED_TYPE(a,b,c,d) i++;
> >> -#define NAMED_FUNCTION(a,b,c,d) i++;
> >> -#define NAMED_SUBROUTINE(a,b,c,d) i++;
> >> -#include "iso-fortran-env.def"
> >> -  
> > 
> > I thought the reason was that NAMED_{,U}INTCST c is non-constant
> > while everything else is constant and the source trying to help
> > the C++ compiler to emit decent code for it.  
> 
> > Though, I think g++ will end up doing pretty much the same thing,
> > split the non-constant parts of the initializer into statements overwriting
> > values in the variable and using 0 for that in the initializer before
> > it is overwritten.
> >   
> I have double-checked that, and it doesn't seem to be the case, at least 
> according to the .optimized dump.  So I'm inclined to prefer the 
> two-stage initialization version.
> Maybe we can add an assert making sure that i remains synced through the 
> second stage?
> That would be something like gcc_checking_assert (symbol[i].id == a);
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [PATCH] Add warning for use of non-spec FMV in Aarch64

2025-01-09 Thread Kyrylo Tkachov

Hi Alfie,

> On 9 Jan 2025, at 10:58, alfie.richa...@arm.com wrote:
> 
> This patch adds a warning whenever FMV is used for Aarch64.
> 
> The reasoning for this is that the ACLE [1] spec for FMV has diverged
> significantly from the current implementation and we want to prevent
> future compatibility issues.
> 
> There is a patch for an ACLE-compliant version of target_version and
> target_clone coming eventually, but it won't make gcc-15.
> 
> This has been bootstrapped and regression tested for Aarch64.
> Is this okay for master and backport to gcc-14?
> 
> [1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64.cc
> (aarch64_mangle_decl_assembler_name): Add experimental warning.
> * config/aarch64/aarch64.opt: Add command line option to disable
> warning.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.target/aarch64/mv-1.C: Add CLI flag
> * g++.target/aarch64/mv-symbols1.C: Add CLI flag
> * g++.target/aarch64/mv-symbols2.C: Add CLI flag
> * g++.target/aarch64/mv-symbols3.C: Add CLI flag
> * g++.target/aarch64/mv-symbols4.C: Add CLI flag
> * g++.target/aarch64/mv-symbols5.C: Add CLI flag
> * g++.target/aarch64/mvc-symbols1.C: Add CLI flag
> * g++.target/aarch64/mvc-symbols2.C: Add CLI flag
> * g++.target/aarch64/mvc-symbols3.C: Add CLI flag
> * g++.target/aarch64/mvc-symbols4.C: Add CLI flag
> * g++.target/aarch64/mv-warning1.C: New test.
> ---
> gcc/config/aarch64/aarch64.cc   | 3 +++
> gcc/config/aarch64/aarch64.opt  | 4 
> gcc/testsuite/g++.target/aarch64/mv-1.C | 1 +
> gcc/testsuite/g++.target/aarch64/mv-symbols1.C  | 1 +
> gcc/testsuite/g++.target/aarch64/mv-symbols2.C  | 1 +
> gcc/testsuite/g++.target/aarch64/mv-symbols3.C  | 1 +
> gcc/testsuite/g++.target/aarch64/mv-symbols4.C  | 1 +
> gcc/testsuite/g++.target/aarch64/mv-symbols5.C  | 1 +
> gcc/testsuite/g++.target/aarch64/mv-warning1.C  | 9 +
> gcc/testsuite/g++.target/aarch64/mvc-symbols1.C | 1 +
> gcc/testsuite/g++.target/aarch64/mvc-symbols2.C | 1 +
> gcc/testsuite/g++.target/aarch64/mvc-symbols3.C | 1 +
> gcc/testsuite/g++.target/aarch64/mvc-symbols4.C | 1 +
> 13 files changed, 26 insertions(+)
> create mode 100644 gcc/testsuite/g++.target/aarch64/mv-warning1.C
> 
> <0001-Add-warning-for-use-of-non-spec-FMV-in-Aarch64.patch>

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 91de13159cb..afc0749fd67 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -20347,6 +20347,9 @@ aarch64_mangle_decl_assembler_name (tree decl, tree id)
if (TREE_CODE (decl) == FUNCTION_DECL
&& DECL_FUNCTION_VERSIONED (decl))
{
+ warning_at(DECL_SOURCE_LOCATION(decl), OPT_Wexperimental_fmv_target,
+ "function multi-versioning support is experimental");
+
Some wording nits.

Space before the “(”.
I think there should be no ‘-’ here to keep consistent with the ACLE wording.
Not sure whether Function Multi Versioning should be capitalised. What do you
think?

diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 36bc719b822..55670eeb74f 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -431,3 +431,7 @@ handling. One means we try to form pairs involving one or more existing
individual writeback accesses where possible. A value of two means we
also try to opportunistically form writeback opportunities by folding in
trailing destructive updates of the base register used by a pair.
+
+Wexperimental-fmv-target
+Target Var(warn_experimental_fmv) Warning Init(1)
+Warn about usage of experimental function multi versioning

Should this have aarch64 in the name somehow? It feels awkward to have aarch64
in the name, but the option is not generic.
In any case, this should be documented in invoke.texi.
I also think that, from a user experience POV, users who get this warning may
ask what the recommendation is.
Should they change their source? Be prepared for the spec to change in future
releases?
The documentation should give some guidance.

Thanks,
Kyrill

Re: [PATCH] testsuite: arm: Verify asm per function for armv8_2-fp16-conv-1.c

2025-01-09 Thread Richard Earnshaw (lists)

On 27/12/2024 17:01, Torbjörn SVENSSON wrote:
> Ok for trunk?
> 
> --
> 
> This change will enforce that the expected instructions are generated
> per function rather than allowing some other function to use the
> expected instructions.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/armv8_2-fp16-conv-1.c: Convert
>   scan-assembler-times to check-function-bodies.
> 
> Signed-off-by: Torbjörn SVENSSON 

I'd recommend that you also add "-fno-schedule-insns -fno-schedule-insns2" to 
dg-options to avoid the risk of the scheduler moving code around and breaking 
the sequences.

OK with that change.

R.

> ---
>  .../gcc.target/arm/armv8_2-fp16-conv-1.c  | 99 ---
>  1 file changed, 83 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
> index c9639a542ae..279aafbc7b4 100644
> --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
> +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
> @@ -2,100 +2,167 @@
>  /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
>  /* { dg-options "-O2" }  */
>  /* { dg-add-options arm_v8_2a_fp16_scalar }  */
> +/* { dg-final { check-function-bodies "**" "" } } */
>  
>  /* Test ARMv8.2 FP16 conversions.  */
>  #include 
>  
> +/*
> +** f16_to_f32:
> +** ...
> +**   vcvtb\.f32\.f16 (s[0-9]+), \1
> +** ...
> +*/
>  float
>  f16_to_f32 (__fp16 a)
>  {
>return (float)a;
>  }
>  
> +/*
> +** f16_to_pf32:
> +** ...
> +**   vcvtb\.f32\.f16 (s[0-9]+), \1
> +** ...
> +*/
>  float
>  f16_to_pf32 (__fp16* a)
>  {
>return (float)*a;
>  }
>  
> +/*
> +** f16_to_s16:
> +** ...
> +**   vcvtb\.f32\.f16 (s[0-9]+), \1
> +**   vcvt\.s32\.f32  \1, \1
> +** ...
> +*/
>  short
>  f16_to_s16 (__fp16 a)
>  {
>return (short)a;
>  }
>  
> +/*
> +** pf16_to_s16:
> +** ...
> +**   vcvtb\.f32\.f16 (s[0-9]+), \1
> +**   vcvt\.s32\.f32  \1, \1
> +** ...
> +*/
>  short
>  pf16_to_s16 (__fp16* a)
>  {
>return (short)*a;
>  }
>  
> -/* { dg-final { scan-assembler-times {vcvtb\.f32\.f16\ts[0-9]+, s[0-9]+} 4 } }  */
> -
> +/*
> +** f32_to_f16:
> +** ...
> +**   vcvtb\.f16\.f32 (s[0-9]+), \1
> +** ...
> +*/
>  __fp16
>  f32_to_f16 (float a)
>  {
>return (__fp16)a;
>  }
>  
> +/*
> +** f32_to_pf16:
> +** ...
> +**   vcvtb\.f16\.f32 (s[0-9]+), \1
> +** ...
> +*/
>  void
>  f32_to_pf16 (__fp16* x, float a)
>  {
>*x = (__fp16)a;
>  }
>  
> +/*
> +** s16_to_f16:
> +** ...
> +**   vcvt\.f32\.s32  (s[0-9]+), \1
> +**   vcvtb\.f16\.f32 \1, \1
> +** ...
> +*/
>  __fp16
>  s16_to_f16 (short a)
>  {
>return (__fp16)a;
>  }
>  
> +/*
> +** s16_to_pf16:
> +** ...
> +**   vcvt\.f32\.s32  (s[0-9]+), \1
> +**   vcvtb\.f16\.f32 \1, \1
> +** ...
> +*/
>  void
>  s16_to_pf16 (__fp16* x, short a)
>  {
>*x = (__fp16)a;
>  }
>  
> -/* { dg-final { scan-assembler-times {vcvtb\.f16\.f32\ts[0-9]+, s[0-9]+} 4 } }  */
> -
> +/*
> +** s16_to_f32:
> +** ...
> +**   vcvt\.f32\.s32  (s[0-9]+), \1
> +** ...
> +*/
>  float
>  s16_to_f32 (short a)
>  {
>return (float)a;
>  }
>  
> -/* { dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+, s[0-9]+} 3 } }  */
> -
> +/*
> +** f32_to_s16:
> +** ...
> +**   vcvt\.s32\.f32  (s[0-9]+), \1
> +** ...
> +*/
>  short
>  f32_to_s16 (float a)
>  {
>return (short)a;
>  }
>  
> -/* { dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+, s[0-9]+} 3 } }  */
> -
> +/*
> +** f32_to_u16:
> +** ...
> +**   vcvt\.u32\.f32  (s[0-9]+), \1
> +** ...
> +*/
>  unsigned short
>  f32_to_u16 (float a)
>  {
>return (unsigned short)a;
>  }
>  
> -/* { dg-final { scan-assembler-times {vcvt\.u32\.f32\ts[0-9]+, s[0-9]+} 1 } }  */
> -
> +/*
> +** f64_to_s16:
> +** ...
> +**   vcvt\.s32\.f64  s[0-9]+, d[0-9]+
> +** ...
> +*/
>  short
>  f64_to_s16 (double a)
>  {
>return (short)a;
>  }
>  
> -/* { dg-final { scan-assembler-times {vcvt\.s32\.f64\ts[0-9]+, d[0-9]+} 1 } }  */
> -
> +/*
> +** f64_to_s16:
> +** ...
> +**   vcvt\.s32\.f64  s[0-9]+, d[0-9]+
> +** ...
> +*/
>  unsigned short
>  f64_to_u16 (double a)
>  {
>return (unsigned short)a;
>  }
> -
> -/* { dg-final { scan-assembler-times {vcvt\.s32\.f64\ts[0-9]+, d[0-9]+} 1 } }  */
> -
> -

Re: [PATCH] c++: Suppress note linked to error suppressed by -Wno-template-body [PR118163]

2025-01-09 Thread Simon Martin

Hi Jason,

On 8 Jan 2025, at 22:56, Jason Merrill wrote:

> On 12/21/24 11:35 AM, Simon Martin wrote:
>> When erroring out due to an incomplete type, we add a contextual note
>> about the type. However, when the error is suppressed by
>> -Wno-template-body, the note remains, making the compiler output quite
>> puzzling.
>>
>> This patch makes sure the note is suppressed if we're processing a
>> template declaration body with -Wno-template-body.
>>
>> Successfully tested on x86_64-pc-linux-gnu.
>>
>>  PR c++/118163
>>
>> gcc/cp/ChangeLog:
>>
>>  * cp-tree.h (get_current_template): Declare.
>>  * error.cc (get_current_template): Make non static.
>>  * typeck2.cc (cxx_incomplete_type_inform): Suppress note when
>>  parsing a template declaration with -Wno-template-body.
>
> I think rather than adding this sort of thing in lots of places where 
> an error is followed by an inform, we should change error to return 
> bool like other diagnostic functions, and check its return value 
> before calling cxx_incomplete_type_inform or plain inform.  This 
> likely involves the same number of changes, but they should be 
> smaller.
That’d be more future-proof for sure. I can work on this change for 
GCC 16 if there’s consensus it’s the right thing to do.
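Jason's suggestion amounts to the usual pattern of gating a follow-up note on whether the primary diagnostic was actually emitted. A minimal sketch, assuming hypothetical stand-ins (`emit_error`, `emit_note`, `report_incomplete_field` and the `errors_suppressed` flag are illustrative only, not GCC's diagnostic API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the diagnostic machinery; not GCC's API.  */
static bool errors_suppressed = false;  /* e.g. set by -Wno-template-body */
static int diagnostics_printed = 0;

/* Emit an error unless it is suppressed, and report whether it was
   actually emitted, the way GCC's warning functions already do.  */
static bool
emit_error (const char *msg)
{
  if (errors_suppressed)
    return false;
  fprintf (stderr, "error: %s\n", msg);
  diagnostics_printed++;
  return true;
}

static void
emit_note (const char *msg)
{
  fprintf (stderr, "note: %s\n", msg);
  diagnostics_printed++;
}

/* The caller gates the note on the error's return value, so the note
   can never be printed on its own.  */
static void
report_incomplete_field (const char *type_name)
{
  assert (type_name != NULL);
  char buf[128];
  snprintf (buf, sizeof buf, "field has incomplete type '%s'", type_name);
  if (emit_error (buf))
    emit_note ("the type is not complete until the closing brace");
}
```

Compared with testing `!warn_template_body && get_current_template ()` inside `cxx_incomplete_type_inform`, the callee no longer needs to know which option suppressed the error; the cost, as Jason notes, is touching each error/inform call site.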

> Patrick, what do you think?
>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.dg/diagnostic/incomplete-type-2.C: New test.
>>  * g++.dg/diagnostic/incomplete-type-2a.C: New test.
>>
>> ---
>>   gcc/cp/cp-tree.h|  1 +
>>   gcc/cp/error.cc |  2 +-
>>   gcc/cp/typeck2.cc   |  6 ++
>>   gcc/testsuite/g++.dg/diagnostic/incomplete-type-2.C |  7 +++
>>   .../g++.dg/diagnostic/incomplete-type-2a.C  | 13 +
>>   5 files changed, 28 insertions(+), 1 deletion(-)
>>   create mode 100644 gcc/testsuite/g++.dg/diagnostic/incomplete-type-2.C
>>   create mode 100644 gcc/testsuite/g++.dg/diagnostic/incomplete-type-2a.C
>>
>> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
>> index 6de8f64b5ee..52f954b63d9 100644
>> --- a/gcc/cp/cp-tree.h
>> +++ b/gcc/cp/cp-tree.h
>> @@ -7297,6 +7297,7 @@ struct decl_location_traits
>>   typedef hash_map 
>> erroneous_templates_t;
>>   extern GTY((cache)) erroneous_templates_t *erroneous_templates;
>>  
>> +extern tree get_current_template ();
>>   extern bool cp_seen_error ();
>>   #define seen_error() cp_seen_error ()
>>  
>> diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
>> index 8c0644fba7e..7fd03dd6d12 100644
>> --- a/gcc/cp/error.cc
>> +++ b/gcc/cp/error.cc
>> @@ -197,7 +197,7 @@ class cxx_format_postprocessor : public format_postprocessor
>>   /* Return the in-scope template that's currently being parsed, or
>>  NULL_TREE otherwise.  */
>>  
>> -static tree
>> +tree
>>   get_current_template ()
>>   {
>> if (scope_chain && in_template_context && !current_instantiation ())
>> diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
>> index fce687e83b3..505c143dae7 100644
>> --- a/gcc/cp/typeck2.cc
>> +++ b/gcc/cp/typeck2.cc
>> @@ -273,6 +273,12 @@ cxx_incomplete_type_inform (const_tree type)
>>     if (!TYPE_MAIN_DECL (type))
>>       return;
>>  
>> +  /* When processing a template declaration body, the error generated by the
>> +     caller (if any) might have been suppressed by -Wno-template-body. If that
>> +     is the case, suppress the inform as well.  */
>> +  if (!warn_template_body && get_current_template ())
>> +    return;
>> +
>>     location_t loc = DECL_SOURCE_LOCATION (TYPE_MAIN_DECL (type));
>>     tree ptype = strip_top_quals (CONST_CAST_TREE (type));
>>  
>> diff --git a/gcc/testsuite/g++.dg/diagnostic/incomplete-type-2.C b/gcc/testsuite/g++.dg/diagnostic/incomplete-type-2.C
>> new file mode 100644
>> index 000..e2fb20a4ae8
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/diagnostic/incomplete-type-2.C
>> @@ -0,0 +1,7 @@
>> +// PR c++/118163
>> +// { dg-do "compile" }
>> +
>> +template
>> +struct S {  // { dg-note "until the closing brace" }
>> +  S s;  // { dg-error "has incomplete type" }
>> +};
>> diff --git a/gcc/testsuite/g++.dg/diagnostic/incomplete-type-2a.C b/gcc/testsuite/g++.dg/diagnostic/incomplete-type-2a.C
>> new file mode 100644
>> index 000..d13021d0b68
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/diagnostic/incomplete-type-2a.C
>> @@ -0,0 +1,13 @@
>> +// PR c++/118163
>> +// { dg-do "compile" }
>> +// { dg-additional-options "-Wno-template-body" }
>> +
>> +template
>> +struct S {  // { dg-bogus "until the closing brace" }
>> +  S s;  // { dg-bogus "has incomplete type" }
>> +};
>> +
>> +// Check that we don't suppress errors outside of the body.
>> +struct forward_decl;// { dg-note "forward declaration" }
>> +template
>> +void foo (forward_decl) {}  // { dg-error "has incomplete type" }

Re: [PATCH v5 05/10] OpenMP: Add C support for metadirectives and dynamic selectors.

2025-01-09 Thread Tobias Burnus


Hi Sandra,

I had a first glance at your patch, albeit a very superficial one. I found one
issue reading the code and observed some issues when running it against
some existing external tests.

I will have a deeper look later; I will now first do another iteration
on your main/middle-end patch #2, which I still have to finish studying.

But first, comments on this patch (or on C testing, as at least one comment
relates to code in patch #2).

Sandra Loosemore wrote:


Additional shared C/C++ testcases are included in a subsequent patch in this
series.


I started by playing around with (well, just running) some existing testcases.
It seems that the OpenMP Examples document contains cases which aren't handled yet.

I think those should eventually work, but that can (and probably should) be
deferred to a follow-up patch. (We might consider collecting those in a PR or
some PRs, if we won't fix them soon.)

Only tested with C, but C++ and Fortran are likely affected by most of
them alike (even if each FE has its own implementation).

* * *

A case where the 'omp error' diagnostic should be delayed and (here) suppressed:

program_control/sources/error.1.c:15:23: error: ‘pragma omp error’ encountered: GNU compiler required.
   15 | otherwise(error at(compilation) severity(fatal) \
  |   ^

which is odd given that we have the GNU compiler:

#pragma omp metadirective \
when(implementation={vendor(gnu)}: nothing )   \
otherwise(error at(compilation) severity(fatal) \
message("GNU compiler required."))

https://github.com/OpenMP/Examples/blob/main/program_control/sources/error.1.c

(Clang-19 handles this: with vendor(llvm) no error is printed.)

* * *

Here, it seems that the check that 'target' and its nested 'teams' have no
intervening code has to be updated:

program_control/sources/metadirective.1.c:16:12: error: ‘target’ construct with nested ‘teams’ construct contains directives outside of the ‘teams’ construct
   16 |#pragma omp target map(to:v1,v2) map(from:v3) device(0)
  |^~~

for

   #pragma omp target map(to:v1,v2) map(from:v3) device(0)
   #pragma omp metadirective \
   when( device={arch("nvptx")}: teams loop) \
   otherwise( parallel loop)
 for (int i= 0; i< N; i++)  v3[i] = v1[i] * v2[i];

https://github.com/OpenMP/Examples/blob/main/program_control/sources/metadirective.1.c

Same issue with one example in OpenMP_VV (formerly: sollve_vv):
https://github.com/OpenMP-Validation-and-Verification/OpenMP_VV/blob/master/tests/5.0/metadirective/test_metadirective_arch_is_nvidia.c

* * *

Regarding the following:

program_control/sources/metadirective.3.c:34:7: warning: direct calls to an offloadable function containing metadirectives with a ‘construct={target}’ selector may produce unexpected results

https://github.com/OpenMP/Examples/blob/main/program_control/sources/metadirective.3.c

First, it seems it shouldn't be printed unconditionally but only with -Wopenmp,
which is enabled by default (i.e. change the 0 to OPT_Wopenmp in the warning_at call).

That's for:

#pragma omp begin declare target
void exp_pi_diff(double *d, double my_pi){
   #pragma omp metadirective \
   when(   construct={target}: distribute parallel for ) \
   otherwise(  parallel for simd

I think the warning makes perfect sense, and the description in the examples
document is either wrong or at least misleading. For GCC, however, I am
wondering whether we should add the following, which might help the user:

inform (loc, "consider whether the context selector % should be used instead");

(needs to be implemented such that -Wno-openmp does not print it, e.g. if 
(warning_at ...))

Disclaimer: that's not exactly the same for 'target device(omp_initial_device)':
when running inside a target region with host fallback, the nteams-var and
nthreads-var ICVs are probably different from when running outside of it, so
the choice between 'when' and 'otherwise' still might matter. But I bet that
in most cases 'kind(nohost)' is what the user actually wants.


On the example document side, I have filed the OpenMP Example Issue #483 as
the testcase seems to be either wrong or misleading, judging from the
testcase and its description.

* * *

Continuing with OpenMP_VV: I think the programmer meant 'device' and not
'target_device', but I think GCC overdoes the diagnostic, given that there
are only two 'target_device' selectors:

 
tests/5.1/metadirective/test_metadirective_target_device.c:30:15: sorry, unimplemented: ‘target_device’ selector set inside of ‘target’ directive
   30 |   #pragma omp metadirective \
  |   ^~~
tests/5.1/metadirective/test_metadirective_target_device.c:30:15: sorry, unimplemented: ‘target_device’ selector set inside of ‘target’ directive
tests/5.1/metadirective/test_metadirective_target_device.c:30:15: sorry, 
u

[PATCH] s390: Fix s390_constant_via_vgbm_p() [PR118362]

2025-01-09 Thread Stefan Schulze Frielinghaus

Optimization s390_constant_via_vgbm_p() should only apply to constant
vectors which can be expressed by the hardware, i.e., which have a size
of at most 16 bytes, similar to what is done for s390_constant_via_vgm_p()
and s390_constant_via_vrepi_p().

gcc/ChangeLog:

PR target/118362
* config/s390/s390.cc (s390_constant_via_vgbm_p): Allow at most
16-byte vectors.
---
 Bootstrap and regtest are still running.  If both are successful, I
 will push this one promptly.

 gcc/config/s390/s390.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 918a2cd6c6d..08acb69de3e 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -2818,7 +2818,7 @@ s390_constant_via_vgbm_p (rtx op, unsigned *mask)
   unsigned tmp_mask = 0;
   int nunit, unit_size;
 
-  if (GET_CODE (op) == CONST_VECTOR)
+  if (GET_CODE (op) == CONST_VECTOR && GET_MODE_SIZE (GET_MODE (op)) <= 16)
 {
   if (GET_MODE_INNER (GET_MODE (op)) == TImode
  || GET_MODE_INNER (GET_MODE (op)) == TFmode)
-- 
2.47.0
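For context, VGBM (vector generate byte mask) builds a 128-bit vector from a 16-bit immediate, one immediate bit per byte, each byte becoming 0x00 or 0xFF. A constant wider than 16 bytes therefore cannot be described by a single mask, which is what the new size check enforces. The following is a simplified model, not the GCC function: `via_byte_mask_p` is a hypothetical name, and the bit order (leftmost immediate bit selecting byte 0) is an assumption based on z/Architecture's left-to-right bit numbering:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not GCC's s390_constant_via_vgbm_p: decide whether
   a constant vector, given as its bytes in memory order, could be built
   by a single byte-mask instruction with a 16-bit immediate, and if so
   store that immediate in *MASK.  */
static int
via_byte_mask_p (const uint8_t *bytes, size_t nbytes, unsigned *mask)
{
  unsigned m = 0;

  /* One mask bit per byte: anything wider than 16 bytes (e.g. a 32-byte
     vector) cannot be encoded.  This mirrors the limit the patch adds.  */
  if (nbytes > 16)
    return 0;

  for (size_t i = 0; i < nbytes; i++)
    {
      if (bytes[i] == 0xFF)
        m |= 1u << (15 - i);    /* Assumed: leftmost bit selects byte 0.  */
      else if (bytes[i] != 0x00)
        return 0;               /* Each byte must be all zeros or all ones.  */
    }

  *mask = m;
  return 1;
}
```

Only the `nbytes > 16` early return corresponds to the one-line fix; the per-byte check stands in for the existing logic in a much simplified form.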

Re: [Patch, Fortran, PR118337, v1] Fortran: Fix Fortran *.mod compatibility [PR118337]

2025-01-09 Thread Jakub Jelinek

On Thu, Jan 09, 2025 at 11:32:35AM +0100, Andre Vehreschild wrote:
> I am sorry, I don't get it. So we are trying to help the compiler, i.e. the
> C++ one, to create a fast gfortran binary. But we don't care about devs who
> stumble across the code and ask themselves, "Why is this done (without a
> comment) so oddly?"! Furthermore, the code again relies on convention or
> insider knowledge:
> 
>   intmod_sym symbol[] = {
> #define NAMED_INTCST(a,b,c,d) { a, b, 0, d },
> #define NAMED_UINTCST(a,b,c,d) { a, b, 0, d },
> #define NAMED_KINDARRAY(a,b,c,d) { a, b, 0, d },
> #define NAMED_DERIVED_TYPE(a,b,c,d) { a, b, 0, d },
> #define NAMED_FUNCTION(a,b,c,d) { a, b, c, d },
> #define NAMED_SUBROUTINE(a,b,c,d) { a, b, c, d },
> 
> So in function and subroutine definitions `c` may become non-constant in the
> future and we neither detect it nor care. 
> 
> #include "iso-fortran-env.def"
> { ISOFORTRANENV_INVALID, NULL, -1234, 0 } };
> 
>   i = 0;
> #define NAMED_INTCST(a,b,c,d) symbol[i++].value = c;
> #define NAMED_UINTCST(a,b,c,d) symbol[i++].value = c;
> #define NAMED_KINDARRAY(a,b,c,d) i++;
> #define NAMED_DERIVED_TYPE(a,b,c,d) i++;
> #define NAMED_FUNCTION(a,b,c,d) i++;
> #define NAMED_SUBROUTINE(a,b,c,d) i++;
> #include "iso-fortran-env.def"
> 
> So why not at least adding a comment to document this design decision and I

So like this?
I've also added an assert; I don't think it is really necessary to assert that
symbol[i].id == a in each case, just that we increment i for all the cases.

2025-01-09  Jakub Jelinek  

PR fortran/118337
* module.cc (use_iso_fortran_env_module): Add a comment explaining
the optimization performed.  Add gcc_checking_assert that i was
incremented for all the elements.  Formatting fix.

--- gcc/fortran/module.cc.jj	2025-01-09 08:25:45.648324540 +0100
+++ gcc/fortran/module.cc   2025-01-09 15:12:07.611842917 +0100
@@ -7122,6 +7122,13 @@ use_iso_fortran_env_module (void)
 #include "iso-fortran-env.def"
 { ISOFORTRANENV_INVALID, NULL, -1234, 0 } };
 
+  /* We could have used c in the NAMED_{,U}INTCST macros
+     instead of 0, but then current g++ expands the initialization
+     as clearing the whole object followed by explicit stores of
+     all the non-zero elements (over 150), while by using 0s for
+     the non-constant initializers and initializing them afterwards
+     g++ will often copy everything from .rodata and then only override
+     over 30 non-constant ones.  */
   i = 0;
 #define NAMED_INTCST(a,b,c,d) symbol[i++].value = c;
 #define NAMED_UINTCST(a,b,c,d) symbol[i++].value = c;
@@ -7130,6 +7137,7 @@ use_iso_fortran_env_module (void)
 #define NAMED_FUNCTION(a,b,c,d) i++;
 #define NAMED_SUBROUTINE(a,b,c,d) i++;
 #include "iso-fortran-env.def"
+  gcc_checking_assert (i == (int) ARRAY_SIZE (symbol) - 1);
 
   /* Generate the symbol for the module itself.  */
   mod_symtree = gfc_find_symtree (gfc_current_ns->sym_root, mod);
@@ -7288,12 +7296,11 @@ use_iso_fortran_env_module (void)
break;
 
 #define NAMED_FUNCTION(a,b,c,d) \
-   case a:
+ case a:
 #include "iso-fortran-env.def"
- create_intrinsic_function (symbol[i].name, symbol[i].id, mod,
-INTMOD_ISO_FORTRAN_ENV, false,
-NULL);
- break;
+   create_intrinsic_function (symbol[i].name, symbol[i].id, mod,
+  INTMOD_ISO_FORTRAN_ENV, false, NULL);
+   break;
 
  default:
gcc_unreachable ();


Jakub
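The two-stage initialization discussed in this thread is an X-macro idiom: the same entry list is expanded twice with different macro definitions. A minimal self-contained sketch, assuming a made-up three-entry `ENTRIES` list in place of iso-fortran-env.def and a hypothetical `runtime_kind ()` standing in for the non-constant kind values; it mirrors the structure of use_iso_fortran_env_module, not its contents:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a value that is not a compile-time constant.  */
static int runtime_kind (int k) { return k * 2; }

/* Hypothetical entry list: NAMED_INTCST values are non-constant,
   NAMED_FUNCTION values are constant.  */
#define ENTRIES                                  \
  NAMED_INTCST (1, "int8", runtime_kind (1))     \
  NAMED_FUNCTION (2, "compiler_version", 7)      \
  NAMED_INTCST (3, "int16", runtime_kind (2))

struct sym { int id; const char *name; int value; };

static size_t
init_symbols (struct sym *out, size_t n)
{
  /* Stage 1: an entirely constant initializer, with 0 as a placeholder
     for every non-constant value.  */
  struct sym symbol[] = {
#define NAMED_INTCST(a, b, c) { a, b, 0 },
#define NAMED_FUNCTION(a, b, c) { a, b, c },
    ENTRIES
#undef NAMED_INTCST
#undef NAMED_FUNCTION
    { 0, NULL, -1234 }  /* Sentinel, like ISOFORTRANENV_INVALID.  */
  };

  /* Stage 2: walk the same list again, overwriting only the placeholders
     and merely counting the other entries.  */
  size_t i = 0;
#define NAMED_INTCST(a, b, c) symbol[i++].value = c;
#define NAMED_FUNCTION(a, b, c) i++;
  ENTRIES
#undef NAMED_INTCST
#undef NAMED_FUNCTION

  /* Like the gcc_checking_assert above: stage 2 visited every entry.  */
  assert (i == sizeof symbol / sizeof symbol[0] - 1);

  size_t count = i < n ? i : n;
  for (size_t k = 0; k < count; k++)
    out[k] = symbol[k];
  return count;
}
```

The final assert only checks that `i` reached the sentinel; Mikael's stronger variant would compare `symbol[i].id` with `a` inside each stage-2 macro. Whether a compiler actually copies the constant part from .rodata, as the comment in the patch describes for g++, depends on the compiler and optimization level.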
