Re: [PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-05-01 Thread Richard Sandiford
Uros Bizjak  writes:
> On Wed, Apr 30, 2025 at 11:31 PM H.J. Lu  wrote:
>>
>> On Wed, Apr 30, 2025 at 7:37 PM Uros Bizjak  wrote:
>> >
>> > On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
>> > >
>> > > AREG, DREG, CREG and AD_REGS are kept in ix86_class_likely_spilled_p to
>> > > avoid the following regressions with
>> > >
>> > > $ make check RUNTESTFLAGS="--target_board='unix{-m32,}'"
>> > >
>> > > FAIL: gcc.dg/pr105911.c (internal compiler error: in 
>> > > lra_split_hard_reg_for, at lra-assigns.cc:1863)
>> > > FAIL: gcc.dg/pr105911.c (test for excess errors)
>> > > FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times 
>> > > vpro[lr]q 29
>> > > FAIL: gcc.target/i386/bt-7.c scan-assembler-not and[lq][ \t]
>> > > FAIL: gcc.target/i386/naked-4.c scan-assembler-not %[re]bp
>> > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-not addl
>> > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times \tv?movd\t 3
>> > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times v?paddd 6
>> > > FAIL: gcc.target/i386/pr107548-2.c scan-assembler-not \taddq\t
>> > > FAIL: gcc.target/i386/pr107548-2.c scan-assembler-times v?paddq 2
>> > > FAIL: gcc.target/i386/pr119171-1.c (test for excess errors)
>> > > FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
>> > > FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
>> > > FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]andb
>> > > FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]orb
>> > > FAIL: gcc.target/i386/pr78904-7b.c scan-assembler-not movzbl
>> > > FAIL: gcc.target/i386/pr78904-7b.c scan-assembler [ \t]orb
>> > > FAIL: gcc.target/i386/pr91188-2c.c scan-assembler [ \t]andw
>> > >
>> > > Tested with glibc master branch at
>> > >
>> > > commit ccdb68e829a31e4cda8339ea0d2dc3e51fb81ba5
>> > > Author: Samuel Thibault 
>> > > Date:   Sun Mar 2 15:16:45 2025 +0100
>> > >
>> > > htl: move pthread_once into libc
>> > >
>> > > and built Linux kernel 6.13.5 on x86-64.
>> > >
>> > > PR target/119083
>> > > * config/i386/i386.cc (ix86_class_likely_spilled_p): Remove CREG
>> > > and BREG.
>> >
>> > The commit message doesn't reflect what the patch does.
>> >
>> > OTOH, this is a very delicate part of the compiler. You are risking RA
>> > failures, the risk/benefit ratio is very high, so I wouldn't touch it
>> > without clear benefits. Do you have a concrete example where declaring
>> > BREG as spilled hurts?
>> >
>>
>> I can find a testcase to show the improvement.  But I am not sure if
>> it is what you were asking for.  ix86_class_likely_spilled_p was
>> CLASS_LIKELY_SPILLED_P which was added by
>>
>> commit f5316dfe88b8d1b8d3012c1f75349edf2ba1bdde
>> Author: Michael Meissner 
>> Date:   Thu Sep 8 17:59:18 1994 +
>>
>> Add support for -mreg-alloc=
>>
>> This option is long gone and there is no test in GCC testsuite to show
>> that BX should be in ix86_class_likely_spilled_p.  On x86-64, BX is just
>> another callee-saved register, not different from R12.  On i386, BX is
>> used as the PIC register.  But today RA may pick a different register if
>> PLT isn't involved.  This patch gives RA a little bit more freedom.
>
> In the past *decades* CLASS_LIKELY_SPILLED_P was repurposed to signal
> the compiler that some extra care is needed with listed classes. On
> i386 and x86_64 these include single register classes that represent
> "architectural" registers (registers with assigned role). The compiler
> does take care not to extend lifetimes of CLASS_LIKELY_SPILLED_P
> classes too much to avoid reload failures in cases where an instruction
> with a C_L_S_P class (e.g. shifts with the %cl register) is emitted
> between an unrelated register def and use.
>
> Registers in these classes won't disappear from the pool of available
> registers and RA can still use them, but with some extra care. So,
> without clear and noticeable benefits, single register classes remain
> declared as CLASS_LIKELY_SPILLED_P to avoid possible reload failures.

I agree with what you say.  But even so, part of me thinks it might be
interesting to see how much breakage there would be if we stopped using
likely-spilled for requirements that the RA can see for itself (through
constraints).  As the commits quoted show, this is an ancient mechanism
and a lot of received wisdom dates from reload and pre-IRA RA.

I would expect IRA to handle the situation better (e.g. because it
replaces scratches with new pseudos, so that it can allocate scratches
directly rather than leaving it to reload/LRA).  And I would expect
LRA to avoid fatal spill failures in cases that reload couldn't.

Thanks,
Richard


[PATCH] c++: Implement C++26 P1061R10 - Structured Bindings can introduce a Pack [PR117783]

2025-05-01 Thread Jakub Jelinek
Hi!

The following patch implements the C++26
P1061R10 - Structured Bindings can introduce a Pack
paper.
One thing unresolved in the patch is mangling; I've raised
https://github.com/itanium-cxx-abi/cxx-abi/issues/200
for that, but there are no comments there yet.  One question is whether
it is ok not to mention the fact that there is a structured binding pack
in the mangling of the structured bindings; more important is that in case
of std::tuple* we might need to mangle individual structured binding
pack elements separately (each might need an exported name for the
var itself and perhaps its guard variable as well).  The patch just
uses the normal mangling for the whole structured binding and emits
a sorry if we need to mangle the structured binding pack elements.
The patch marks the structured binding pack specially (I considered
e.g. using some bit on it, but in the end I'm identifying it using
a made-up type which causes DECL_PACK_P to be true; it is a kind of
self-referential solution, because the type on the pack mentions the
DECL_DECOMPOSITION_P VAR_DECL to which the type is attached as its pack,
so it needs to be handled carefully during instantiation to avoid infinite
recursion, but it is the type that should be used if something else actually
needs to use the same type as the structured binding pack, e.g. a capture
proxy).  When the pack is actually processed through cp_finish_decomp with
a non-dependent initializer, its elements are stored into a TREE_VEC used
as the DECL_VALUE_EXPR of the pack; though because several spots use the
DECL_VALUE_EXPR and assume it is an ARRAY_REF from which they can find out
the base variable and the index, the TREE_VEC stores the base variable and
index in the first 2 elts and has the structured binding elements only
after that.
https://eel.is/c++draft/temp.dep.expr#3.6 says the packs are type dependent
regardless of whether the initializer of the structured binding is type
dependent or not, so I hope having a dependent type on the structured
binding VAR_DECL is ok.
The paper also has an exception for sizeof... which is then not value
dependent when the structured bindings are initialized with non-dependent
initializer: https://eel.is/c++draft/temp.dep.constexpr#4
The patch special-cases that in 3 spots (I've been wondering if e.g. during
parsing I couldn't just fold the sizeof... to the INTEGER_CST right away,
but I guess I'd need to repeat that also during partial instantiation).
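
As a simplified illustration of the feature itself (my own example, not
from the testsuite or the patch): a structured binding pack expands like
any other pack and is usable in fold-expressions:

#include <tuple>

template <typename T>
auto sum_tail (T t)
{
  auto [head, ...tail] = t;     // "tail" is a structured binding pack
  return (head + ... + tail);
}

int test = sum_tail (std::tuple{1, 2, 3});  // 6

Here sizeof...(tail) is not value-dependent because the initializer is
non-dependent, which is the [temp.dep.constexpr] special case mentioned
above.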

And one thing still unresolved is debug info; for now I've just added
DECL_IGNORED_P on the structured binding pack VAR_DECL because there were
ICEs with -g.  I hope it can be fixed incrementally, but I am not sure
what exactly we should emit in the debug info for that.

Speaking of which, I see
DW_TAG_GNU_template_parameter_pack
DW_TAG_GNU_formal_parameter_pack
etc. DIEs emitted regardless of DWARF version; shouldn't we try to upstream
those into DWARF 6 or check what other compilers emit for the packs?
And I bet we'd need DW_TAG_GNU_structured_binding_pack as well.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-05-01  Jakub Jelinek  

PR c++/117783
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Change __cpp_structured_bindings
predefined value for C++26 from 202403L to 202411L.
gcc/cp/
* parser.cc: Implement C++26 P1061R10 - Structured Bindings can
introduce a Pack.
(cp_parser_range_for): Also handle TREE_VEC as DECL_VALUE_EXPR
instead of ARRAY_REF.
(cp_parser_decomposition_declaration): Use sb-identifier-list instead
of identifier-list in comments.  Parse structured bindings with
structured binding pack.  Don't emit pedwarn about structured
binding attributes in structured bindings inside of a condition.
(cp_convert_omp_range_for): Also handle TREE_VEC as DECL_VALUE_EXPR
instead of ARRAY_REF.
* decl.cc (get_tuple_element_type): Change i argument type from
unsigned to unsigned HOST_WIDE_INT.
(get_tuple_decomp_init): Likewise.
(set_sb_pack_name): New function.
(cp_finish_decomp): Handle structured binding packs.
* pt.cc (tsubst_pack_expansion): Handle structured binding packs
and capture proxies for them.  Formatting fixes.
(tsubst_decl): For structured binding packs don't tsubst TREE_TYPE
first, instead recreate the type after r is created.
(tsubst_omp_for_iterator): Also handle TREE_VEC as DECL_VALUE_EXPR
instead of ARRAY_REF.
(tsubst_expr): Handle sizeof... on non-dependent structured binding
packs.
(value_dependent_expression_p): Return false for sizeof... on
non-dependent structured binding packs.
(instantiation_dependent_r): Don't recurse on sizeof... on
non-dependent structured binding packs.
* constexpr.cc (potential_constant_expression_1): Also handle
TREE_VEC on DECL_VALUE_EXPR of structured binding packs.
gcc/testsuite/
* g++.dg/cpp

[PATCH v2 1/1] Fix BZ 119317: named loops (C2y) with debug info

2025-05-01 Thread Christopher Bazley
Named loops (C2y) could not previously be compiled with
-O1 and -ggdb2 or higher because the label preceding
a loop (or switch) could not be found when using such
command lines.

This could be observed by compiling
gcc/gcc/testsuite/gcc.dg/c2y-named-loops-1.c with
the provoking command line (or any minimal example such
as that cited in the bug report).

The fix was simply to ignore the tree nodes inserted
for debugging information.
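
For context, a simplified sketch of the construct involved (hypothetical;
the real reproducer is gcc.dg/c2y-named-loops-1.c built with
-std=c2y -O1 -ggdb2):

void
f (int n)
{
 outer:
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      if (j == i)
        continue outer;  /* continues the loop named by the label */
}

With -g enabled, DEBUG_BEGIN_STMT markers can sit between the label and
the loop statement, which is what the search previously stopped at.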

Base commit is 79aa2a283a8d3327ff4d6dca77e81d5b1ac3a01e

PR c/119317

gcc/c/ChangeLog:

* c-decl.cc (c_get_loop_names): Do not prematurely
end the search for a label that names a loop or
switch statement upon encountering a DEBUG_BEGIN_STMT.
Instead, ignore any instances of DEBUG_BEGIN_STMT.

gcc/testsuite/ChangeLog:

* gcc.dg/c2y-named-loops-8.c: New test.
---
 gcc/c/c-decl.cc  | 3 ++-
 gcc/testsuite/gcc.dg/c2y-named-loops-8.c | 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/c2y-named-loops-8.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 4e200f91107..ad66d7d258b 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -13861,7 +13861,8 @@ c_get_loop_names (tree before_labels, bool switch_p, tree *last_p)
  ++ret;
}
}
-  else if (TREE_CODE (stmt) != CASE_LABEL_EXPR)
+  else if (TREE_CODE (stmt) != CASE_LABEL_EXPR
+  && TREE_CODE (stmt) != DEBUG_BEGIN_STMT)
break;
 }
   if (last)
diff --git a/gcc/testsuite/gcc.dg/c2y-named-loops-8.c b/gcc/testsuite/gcc.dg/c2y-named-loops-8.c
new file mode 100644
index 000..8d69db45f77
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2y-named-loops-8.c
@@ -0,0 +1,5 @@
+/* PR c/119317 - Named loops (C2y) did not compile with -O1 and -ggdb2 or higher */
+/* { dg-do compile } */
+/* { dg-options "-std=c2y -O1 -ggdb2" } */
+
+#include "c2y-named-loops-1.c"
-- 
2.43.0



[PATCH v2 0/1] Fix BZ 119317: named loops (C2y) with debug info

2025-05-01 Thread Christopher Bazley
Fixed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119317

Tested v2 on AArch64 using a new regression test,
which is simply a wrapper around an existing test
for named loops that sets the provoking options.

Confirmed that the new test passes with the fix
and fails if the fix is reverted. e.g.,
FAIL: gcc.dg/c2y-named-loops-8.c (test for excess errors)

Changes in v2:
 - Fixed a formatting issue.
 - Added a test.

Link to v1:
https://inbox.sourceware.org/gcc-patches/20250429124531.103475-1-chris.baz...@arm.com/

Christopher Bazley (1):
  Fix BZ 119317: named loops (C2y) with debug info

 gcc/c/c-decl.cc  | 3 ++-
 gcc/testsuite/gcc.dg/c2y-named-loops-8.c | 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/c2y-named-loops-8.c

-- 
2.43.0



Re: [PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-05-01 Thread Uros Bizjak
On Thu, May 1, 2025 at 1:21 PM Richard Sandiford
 wrote:
>
> Uros Bizjak  writes:
> > On Wed, Apr 30, 2025 at 11:31 PM H.J. Lu  wrote:
> >>
> >> On Wed, Apr 30, 2025 at 7:37 PM Uros Bizjak  wrote:
> >> >
> >> > On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
> >> > >
> >> > > AREG, DREG, CREG and AD_REGS are kept in ix86_class_likely_spilled_p to
> >> > > avoid the following regressions with
> >> > >
> >> > > $ make check RUNTESTFLAGS="--target_board='unix{-m32,}'"
> >> > >
> >> > > FAIL: gcc.dg/pr105911.c (internal compiler error: in 
> >> > > lra_split_hard_reg_for, at lra-assigns.cc:1863)
> >> > > FAIL: gcc.dg/pr105911.c (test for excess errors)
> >> > > FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times 
> >> > > vpro[lr]q 29
> >> > > FAIL: gcc.target/i386/bt-7.c scan-assembler-not and[lq][ \t]
> >> > > FAIL: gcc.target/i386/naked-4.c scan-assembler-not %[re]bp
> >> > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-not addl
> >> > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times \tv?movd\t 3
> >> > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times v?paddd 6
> >> > > FAIL: gcc.target/i386/pr107548-2.c scan-assembler-not \taddq\t
> >> > > FAIL: gcc.target/i386/pr107548-2.c scan-assembler-times v?paddq 2
> >> > > FAIL: gcc.target/i386/pr119171-1.c (test for excess errors)
> >> > > FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
> >> > > FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
> >> > > FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]andb
> >> > > FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]orb
> >> > > FAIL: gcc.target/i386/pr78904-7b.c scan-assembler-not movzbl
> >> > > FAIL: gcc.target/i386/pr78904-7b.c scan-assembler [ \t]orb
> >> > > FAIL: gcc.target/i386/pr91188-2c.c scan-assembler [ \t]andw
> >> > >
> >> > > Tested with glibc master branch at
> >> > >
> >> > > commit ccdb68e829a31e4cda8339ea0d2dc3e51fb81ba5
> >> > > Author: Samuel Thibault 
> >> > > Date:   Sun Mar 2 15:16:45 2025 +0100
> >> > >
> >> > > htl: move pthread_once into libc
> >> > >
> >> > > and built Linux kernel 6.13.5 on x86-64.
> >> > >
> >> > > PR target/119083
> >> > > * config/i386/i386.cc (ix86_class_likely_spilled_p): Remove 
> >> > > CREG
> >> > > and BREG.
> >> >
> >> > The commit message doesn't reflect what the patch does.
> >> >
> >> > OTOH, this is a very delicate part of the compiler. You are risking RA
> >> > failures, the risk/benefit ratio is very high, so I wouldn't touch it
> >> > without clear benefits. Do you have a concrete example where declaring
> >> > BREG as spilled hurts?
> >> >
> >>
> >> I can find a testcase to show the improvement.  But I am not sure if
> >> it is what you were asking for.  ix86_class_likely_spilled_p was
> >> CLASS_LIKELY_SPILLED_P which was added by
> >>
> >> commit f5316dfe88b8d1b8d3012c1f75349edf2ba1bdde
> >> Author: Michael Meissner 
> >> Date:   Thu Sep 8 17:59:18 1994 +
> >>
> >> Add support for -mreg-alloc=
> >>
> >> This option is long gone and there is no test in GCC testsuite to show
> >> that BX should be in ix86_class_likely_spilled_p.  On x86-64, BX is just
> >> another callee-saved register, not different from R12.  On i386, BX is
> >> used as the PIC register.  But today RA may pick a different register if
> >> PLT isn't involved.  This patch gives RA a little bit more freedom.
> >
> > In the past *decades* CLASS_LIKELY_SPILLED_P was repurposed to signal
> > the compiler that some extra care is needed with listed classes. On
> > i386 and x86_64 these include single register classes that represent
> > "architectural" registers (registers with assigned role). The compiler
> > does take care not to extend lifetimes of CLASS_LIKELY_SPILLED_P
> > classes too much to avoid reload failures in cases where an instruction
> > with a C_L_S_P class (e.g. shifts with the %cl register) is emitted
> > between an unrelated register def and use.
> >
> > Registers in these classes won't disappear from the pool of available
> > registers and RA can still use them, but with some extra care. So,
> > without clear and noticeable benefits, single register classes remain
> > declared as CLASS_LIKELY_SPILLED_P to avoid possible reload failures.
>
> I agree with what you say.  But even so, part of me thinks it might be
> interesting to see how much breakage there would be if we stopped using
> likely-spilled for requirements that the RA can see for itself (through
> constraints).  As the commits quoted show, this is an ancient mechanism
> and a lot of received wisdom dates from reload and pre-IRA RA.

A quick test for improved RA capabilities would be to remove CREG from
C_L_S_P. Integer shift instructions use %cl as the "architectural"
count register, so the allocator has to take extra care with its
allocation. These instructions are heavily used, so there would be
plenty of cases that exercise this functionality.
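
As a concrete illustration of the %cl situation (a hypothetical example,
not taken from the thread):

/* The variable shift count must end up in %cl on i386/x86-64, so the
   allocator must have %ecx free at the shift even though the pseudo
   holding "count" is defined earlier and used later.  */
unsigned
f (unsigned value, unsigned count)
{
  unsigned before = value + count;          /* earlier unrelated use */
  unsigned shifted = value << (count & 31); /* count forced into %cl */
  return before ^ shifted;                  /* later unrelated use */
}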

We should also not forget inline asms; these are also heavy users of
single register classes.

Re: [PATCH] aarch64: Optimize SVE extract last to Neon lane extract for 128-bit VLS.

2025-05-01 Thread Jennifer Schmitz


> On 28 Apr 2025, at 16:29, Richard Sandiford  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Jennifer Schmitz  writes:
>>> On 27 Apr 2025, at 08:42, Tamar Christina  wrote:
>>> 
>>> External email: Use caution opening links or attachments
>>> 
>>> 
 -Original Message-
 From: Richard Sandiford 
 Sent: Friday, April 25, 2025 4:45 PM
 To: Jennifer Schmitz 
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] aarch64: Optimize SVE extract last to Neon lane 
 extract for
 128-bit VLS.
 
 Jennifer Schmitz  writes:
> For the test case
> int32_t foo (svint32_t x)
> {
> svbool_t pg = svpfalse ();
> return svlastb_s32 (pg, x);
> }
> compiled with -O3 -mcpu=grace -msve-vector-bits=128, GCC produced:
> foo:
>   pfalse  p3.b
>   lastb   w0, p3, z0.s
>   ret
> when it could use a Neon lane extract instead:
> foo:
>   umov    w0, v0.s[3]
>   ret
> 
> We implemented this optimization by guarding the emission of
> pfalse+lastb in the pattern vec_extract<mode><Vel> by
> known_gt (BYTES_PER_SVE_VECTOR, 16). Thus, for a last-extract operation
> in 128-bit VLS, the pattern *vec_extract<mode><Vel>_v128 is used instead.
> 
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for mainline?
> 
> Signed-off-by: Jennifer Schmitz 
> 
> gcc/
> * config/aarch64/aarch64-sve.md (vec_extract<mode><Vel>):
>   Prevent the emission of pfalse+lastb for 128-bit VLS.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/sve/extract_last_128.c: New test.
 
 OK, thanks.
 
>>> 
>>> Hi Both,
>>> 
>>> If I may, how about viewing this transformation differently.
>>> 
>>> How about instead make pfalse + lastb become rev + umov, which is
>>> faster on all Arm micro-architectures for both VLS and VLA and then
>>> the optimization becomes lowering rev + umov for VL128 into umov[3]?
>>> 
>>> This would speed up this case for everything.
>>> 
>>> It would also be good to have a test that checks that
>>> 
>>> #include 
>>> 
>>> int foo (svint32_t a)
>>> {
>>>   return svrev (a)[0];
>>> }
>>> 
>>> Works. At the moment, with -msve-vector-bits=128 GCC transforms
>>> this back into lastb, which is a de-optimization. Your current patch does 
>>> fix it
>>> so would be good to test this too.
>> Hi Richard and Tamar,
>> thanks for the feedback.
>> Tamar’s suggestion sounds like a more general solution.
>> I think we could either change define_expand "vec_extract<mode><Vel>"
>> to not emit pfalse+lastb in case the last element is extracted, but emit 
>> rev+umov instead,
>> or we could add a define_insn_and_split to change pfalse+lastb to rev+umov 
>> (and just umov for VL128).
> 
> I think we should do it in the expand.  But ISTM that there are
> two separate things here:
> 
> (1) We shouldn't use the VLA expansion for VLS cases if the VLS case
>can do something simpler.
> 
> (2) We should improve the expansion based on what uarches have actually
>implemented.  (The current code predates h/w.)
> 
> I think we want (1) even if (2) involves using REV+UMOV, since for
> (1) we still want a single UMOV for the top of the range.  And we'd
> still want the tests in the patch.
> 
> So IMO, from a staging perspective, it makes sense to do (1) first
> and treat (2) as a follow-on.
> 
> The REV approach isn't restricted to the last element.  It should be able
> to handle:
> 
>  int8_t foo(svint8_t x) { return x[svcntb()-2]; }
> 
> as well.  In other words, if:
> 
>   auto idx = GET_MODE_NUNITS (mode) - val - 1;
> 
> is a constant in the range [0, 255], we can do a reverse and then
> fall through to the normal CONST_INT expansion with idx as the index.
> 
> But I suppose that does mean that, rather than:
> 
>&& known_gt (BYTES_PER_SVE_VECTOR, 16))
> 
> we should instead check:
> 
>&& !val.is_constant ()
> 
> For 256-bit SVE we should use DUP instead of PFALSE+LASTB or REV+UMOV.
I adjusted the patch to use && !val.is_constant () as the check, such that
the extraction of the last element is matched by other patterns for VLS of
all vector widths.
256-bit and 512-bit VLS are now matched by *vec_extract<mode><Vel>_dup
and 1024-bit and 2048-bit by *vec_extract<mode><Vel>_ext.
There were existing tests for 256-bit, 512-bit, 1024-bit and 2048-bit that 
could be adjusted.
128-bit VLS is covered by the new test in the patch. Let me know if I shall 
change it to be
more analogous to the other extract_*.c tests.

About implementing point (2): Do we only want the REV approach you outlined for 
VLA,
now that VLS is redirected to other patterns?

Thanks,
Jennifer

For the test case
int32_t foo (svint32_t x)
{
  svbool_t pg = svpfalse ();
  return svlastb_s32 (pg, x);
}
compiled with -O3 -mcpu=grace -msve-vector-bits=128, GCC produced:
foo:
pfalse  p3.b
lastb   w0, p3, z0.s
ret
when it could use a Neon lane extract instead:
foo:
umov    w0, v0.s[3]
ret

Similar

Re: [PATCH v2 0/1] Fix BZ 119317: named loops (C2y) with debug info

2025-05-01 Thread Joseph Myers
On Thu, 1 May 2025, Christopher Bazley wrote:

> Changes in v2:
>  - Fixed a formatting issue.
>  - Added a test.

Thanks.  I have no further comments; I hope someone more familiar with 
debug info handling can review this version.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] x86: Update TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P

2025-05-01 Thread Uros Bizjak
On Thu, May 1, 2025 at 12:49 AM H.J. Lu  wrote:
>
> On Wed, Apr 30, 2025 at 7:48 PM Uros Bizjak  wrote:
> >
> > On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
> > >
> > > SMALL_REGISTER_CLASSES was added by
> > >
> > > commit c98f874233428d7e6ba83def7842fd703ac0ddf1
> > > Author: James Van Artsdalen 
> > > Date:   Sun Feb 9 13:28:48 1992 +
> > >
> > > Initial revision
> > >
> > > which became TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P.  It is false from
> > > day 1 for i386.  Since x86-64 doubles the number of registers, change
> > > TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P to return false for x86-64
> > > and update decrease_live_ranges_number to skip a hard register if
> > > targetm.class_likely_spilled_p returns true.  These changes extend
> > > the live ranges of the rbp, r8-r31 and xmm1-xmm31 registers.
> > >
> > > PR target/118996
> > > * ira.cc (decrease_live_ranges_number): Skip hard register if
> > > targetm.class_likely_spilled_p returns true.
> > > * config/i386/i386.cc (ix86_small_register_classes_for_mode_p):
> > > New.
> > > (TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P): Use it.
> >
> > Redeclaring X86_64 is too risky. We can perhaps enable integer
> > registers for APX because it is a relatively new development. I'm
> > undecided about AVX-512, perhaps this is worth a try.
>
> There are many GCC targets with 16 GPRs which don't return false
> for TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P.

They don't have half of their available registers declared as
"architectural" registers with assigned roles.

Also, register allocation for x86_64 is fine-tuned with
TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P; we are risking regressions
if this is redefined.

Uros.


Re: [PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-05-01 Thread H.J. Lu
On Thu, May 1, 2025 at 2:56 PM Uros Bizjak  wrote:
>
> On Wed, Apr 30, 2025 at 11:31 PM H.J. Lu  wrote:
> >
> > On Wed, Apr 30, 2025 at 7:37 PM Uros Bizjak  wrote:
> > >
> > > On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
> > > >
> > > > AREG, DREG, CREG and AD_REGS are kept in ix86_class_likely_spilled_p to
> > > > avoid the following regressions with
> > > >
> > > > $ make check RUNTESTFLAGS="--target_board='unix{-m32,}'"
> > > >
> > > > FAIL: gcc.dg/pr105911.c (internal compiler error: in 
> > > > lra_split_hard_reg_for, at lra-assigns.cc:1863)
> > > > FAIL: gcc.dg/pr105911.c (test for excess errors)
> > > > FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times 
> > > > vpro[lr]q 29
> > > > FAIL: gcc.target/i386/bt-7.c scan-assembler-not and[lq][ \t]
> > > > FAIL: gcc.target/i386/naked-4.c scan-assembler-not %[re]bp
> > > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-not addl
> > > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times \tv?movd\t 3
> > > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times v?paddd 6
> > > > FAIL: gcc.target/i386/pr107548-2.c scan-assembler-not \taddq\t
> > > > FAIL: gcc.target/i386/pr107548-2.c scan-assembler-times v?paddq 2
> > > > FAIL: gcc.target/i386/pr119171-1.c (test for excess errors)
> > > > FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
> > > > FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
> > > > FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]andb
> > > > FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]orb
> > > > FAIL: gcc.target/i386/pr78904-7b.c scan-assembler-not movzbl
> > > > FAIL: gcc.target/i386/pr78904-7b.c scan-assembler [ \t]orb
> > > > FAIL: gcc.target/i386/pr91188-2c.c scan-assembler [ \t]andw
> > > >
> > > > Tested with glibc master branch at
> > > >
> > > > commit ccdb68e829a31e4cda8339ea0d2dc3e51fb81ba5
> > > > Author: Samuel Thibault 
> > > > Date:   Sun Mar 2 15:16:45 2025 +0100
> > > >
> > > > htl: move pthread_once into libc
> > > >
> > > > and built Linux kernel 6.13.5 on x86-64.
> > > >
> > > > PR target/119083
> > > > * config/i386/i386.cc (ix86_class_likely_spilled_p): Remove CREG
> > > > and BREG.
> > >
> > > The commit message doesn't reflect what the patch does.
> > >
> > > OTOH, this is a very delicate part of the compiler. You are risking RA
> > > failures, the risk/benefit ratio is very high, so I wouldn't touch it
> > > without clear benefits. Do you have a concrete example where declaring
> > > BREG as spilled hurts?
> > >
> >
> > I can find a testcase to show the improvement.  But I am not sure if
> > it is what you were asking for.  ix86_class_likely_spilled_p was
> > CLASS_LIKELY_SPILLED_P which was added by
> >
> > commit f5316dfe88b8d1b8d3012c1f75349edf2ba1bdde
> > Author: Michael Meissner 
> > Date:   Thu Sep 8 17:59:18 1994 +
> >
> > Add support for -mreg-alloc=
> >
> > This option is long gone and there is no test in GCC testsuite to show
> > that BX should be in ix86_class_likely_spilled_p.  On x86-64, BX is just
> > another callee-saved register, not different from R12.  On i386, BX is
> > used as the PIC register.  But today RA may pick a different register if
> > PLT isn't involved.  This patch gives RA a little bit more freedom.
>
> In the past *decades* CLASS_LIKELY_SPILLED_P was repurposed to signal
> the compiler that some extra care is needed with listed classes. On
> i386 and x86_64 these include single register classes that represent
> "architectural" registers (registers with assigned role). The compiler
> does take care not to extend lifetimes of CLASS_LIKELY_SPILLED_P
> classes too much to avoid reload failures in cases where an instruction
> with a C_L_S_P class (e.g. shifts with the %cl register) is emitted
> between an unrelated register def and use.

But there is no assigned role for RBX.

> Registers in these classes won't disappear from the pool of available
> registers and RA can still use them, but with some extra care. So,
> without clear and noticeable benefits, single register classes remain
> declared as CLASS_LIKELY_SPILLED_P to avoid possible reload failures.
>
> Uros.



-- 
H.J.


[PATCH] c++: Small build_vec_init improvement [PR117827]

2025-05-01 Thread Jakub Jelinek
Hi!

As discussed in the
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674492.html
thread, the following patch attempts to improve build_vec_init generated
code.  E.g. on g++.dg/eh/aggregate1.C test the patch has differences like:
 D.2988 = &D.2950->e1;
 D.2989 = D.2988;
 D.2990 = 1;
 try
   {
 goto ;
 :
 A::A (D.2989);
 D.2990 = D.2990 + -1;
 D.2989 = D.2989 + 1;
 :
 if (D.2990 >= 0) goto ; else goto ;
 :
 retval.4 = D.2988;
 _13 = &D.2950->e2;
 A::A (_13);
-D.2990 = 1;
+D.2988 = 0B;
 D.2951 = D.2951 + -1;
   }
 catch
   {
 {
   struct A * D.2991;
 
   if (D.2988 != 0B) goto ; else goto 
;
   :
   _11 = 1 - D.2990;
   _12 = (sizetype) _11;
   D.2991 = D.2988 + _12;
   :
   if (D.2991 == D.2988) goto ; else goto 
;
   :
   D.2991 = D.2991 + 18446744073709551615;
   A::~A (D.2991);
   goto ;
   :
   goto ;
   :
   :
 }
   }
in 3 spots.  As you can see, both setting D.2990 (i.e. iterator) to
maxindex and setting D.2988 (i.e. rval) to nullptr have the same effect of
not actually destructing anything anymore in the cleanup; the
advantage of clearing rval is that setting something to zero is often less
expensive than a potentially huge maxindex, and that the cleanup tests that
value first.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-04-30  Jakub Jelinek  

PR c++/117827
* init.cc (build_vec_init): Push to *cleanup_flags clearing of rval
instead of setting of iterator to maxindex.

--- gcc/cp/init.cc.jj   2025-01-24 19:26:31.980193087 +0100
+++ gcc/cp/init.cc  2025-01-24 19:45:03.324702001 +0100
@@ -4749,7 +4749,8 @@ build_vec_init (tree base, tree maxindex
 itself.  But that breaks when gimplify_target_expr adds a clobber
 cleanup that runs before the build_vec_init cleanup.  */
   if (cleanup_flags)
-   vec_safe_push (*cleanup_flags, build_tree_list (iterator, maxindex));
+   vec_safe_push (*cleanup_flags,
+  build_tree_list (rval, build_zero_cst (ptype)));
 }
 
   /* Should we try to create a constant initializer?  */

Jakub



Re: [PATCH] x86: Remove SSE_FIRST_REG from ix86_class_likely_spilled_p

2025-05-01 Thread Uros Bizjak
On Wed, Apr 30, 2025 at 11:43 PM H.J. Lu  wrote:
>
> On Wed, Apr 30, 2025 at 8:12 PM Uros Bizjak  wrote:
> >
> > On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
> > >
> > > SSE_FIRST_REG was added to CLASS_LIKELY_SPILLED_P, which became
> > > TARGET_CLASS_LIKELY_SPILLED_P, for
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40470
> > >
> > > Since RA has been improved and xmm0 is a commonly used register, remove
> > > SSE_FIRST_REG from ix86_class_likely_spilled_p to improve xmm0 codegen:
> >
> > While the AVX version of pblendvb doesn't use XMM0 as an architectural
> > register, there are still plenty of other cases, e.g. PCMPESTM and
> > PCMPISTRM, SHA and KEYLOCKER insns, so you are risking RA failures
> > with these insn if the life of XMM0 is extended.
>
> This is no different from
>
> void
> foo1 (double x)
> {
>   register double xmm1 __asm ("xmm1") = x;
>   asm volatile ("# %0" : "+v" (xmm1));
> }
>
> We don't add XMM1 to ix86_class_likely_spilled_p.

I don't see the relation between the above example and

xor %xmm0,%xmm0
...
movaps val,%xmm0  <- needed for pcmpistrm, fails reload due to already
live %xmm0
pcmpistrm
...
movaps %xmm0, %xmm2

> > OTOH, the (unrelated?) ternlog change changes the expander, where the
> > expander does subreg tricks on a memory operand, which doesn't look
> > correct to me. adjust_address should be used instead.
> >
> >
>
> Should I drop the ternlog change and submit a different patch?

No, please submit a ternlog change that will avoid subreg with memory.

Uros.


[committed htdocs v2] bugs: improve "ABI changes" subsection

2025-05-01 Thread Sam James
The C++ ABI for C++ standards fully supported by GCC (rather than those
marked as experimental per https://gcc.gnu.org/projects/cxx-status.html)
should be stable.  It's certainly not the case in 2025 that one needs a
full world rebuild for C++ libraries using e.g. the default standard
or any other fully supported standard, unless the standard is marked
experimental, in which case we provide no guarantees.
---
Changed the ... text to a suggestion from Jason on IRC.

Pushed.

 htdocs/bugs/index.html | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
index c4907c38..c7c040e8 100644
--- a/htdocs/bugs/index.html
+++ b/htdocs/bugs/index.html
@@ -647,14 +647,14 @@ GCC maintains a 'Porting to' resource for new versions:
 components: the first defines how the elements of classes are laid
 out, how functions are called, how function names are mangled, etc;
 the second part deals with the internals of the objects in libstdc++.
-Although we strive for a non-changing ABI, so far we have had to
-modify it with each major release.  If you change your compiler to a
-different major release you must recompile all libraries that
-contain C++ code.  If you fail to do so you risk getting linker
-errors or malfunctioning programs.
-It should not be necessary to recompile if you have changed
-to a bug-fix release of the same version of the compiler; bug-fix
-releases are careful to avoid ABI changes. See also the
+For C++ standards marked as
+<a href="https://gcc.gnu.org/projects/cxx-status.html">experimental</a>,
+stable ABI is not guaranteed: for these, if you change your compiler to a
+different major release you must recompile any libraries that were built
+using a C++ -std= flag that was still experimental.  If you fail
+to do so, you risk getting linker errors or malfunctioning programs.
+It should not be necessary to recompile for C++ standards supported fully
+by GCC, such as the default standard.  See also the
 <a href="https://gcc.gnu.org/onlinedocs/gcc/Compatibility.html">compatibility
section</a> of the GCC manual.
 

base-commit: 745430b2e92b32951dcd7e8572d5e54e189c74b7
-- 
2.49.0



Re: [PATCH v5 05/10] libstdc++: Implement layout_left from mdspan.

2025-05-01 Thread Tomasz Kaminski
On Wed, Apr 30, 2025 at 7:13 AM Tomasz Kaminski  wrote:

> Hi,
>
> As we will be landing patches for extents, this will become a separate
> patch series.
> I would prefer if you could commit per layout, and start with
> layout_right (the default).
> I try to provide prompt responses, so if that works better for you, you
> can post a patch
> only with this layout first, as most of the comments will apply to all of
> them.
>
> For the general design we have constructors that allow conversion between
> rank-0 and rank-1 layouts left and right.  This is done because they
> essentially represent the same layout.  I think we could benefit from
> that in code by having base classes for rank0 and rank1 mapping:
> template<typename _Extents>
> struct _Rank0_mapping_base
> {
>    static_assert(_Extents::rank() == 0);
>
>    template<typename _OExtents>
>    // explicit, requires goes here
>    _Rank0_mapping_base(_Rank0_mapping_base<_OExtents> const&);
>
> // All members of layout_type go here
> };
>
> template<typename _Extents>
> struct _Rank1_mapping_base
> {
>    static_assert(_Extents::rank() == 1);
>   // Static assert for product is much simpler here, as we need to check
> one
>
>    template<typename _OExtents>
>    // explicit, requires goes here
>    _Rank1_mapping_base(_Rank1_mapping_base<_OExtents> const&);
>
>   // Call operator can also be simplified
>   index_type operator()(index_type i) const; // conversion happens at user
> side
>
>   // constructor from strided_layout of Rank1 goes here.
>
> // All members of layout_type go here
> };
> Then we will specialize layout_left/right/stride to use
> _Rank0_mapping_base as a base for rank() == 0
> and layout_left/right to use _Rank1_mapping_base as base for rank() == 1:
> template<typename _IndexType, size_t... _Ext>
> struct extents {};
>
> struct layout_left
> {
> template<typename _Extents>
> struct mapping
> {
> // static assert that _Extents must be a specialization of extents goes
> here.
> };
> };
>
> template<typename _IndexType>
> struct layout_left::mapping<extents<_IndexType>>
> : _Rank0_mapping_base<extents<_IndexType>>
> {
> using layout_type = layout_left;
> // Provides converting constructor.
> using _Rank0_mapping_base<extents<_IndexType>>::_Rank0_mapping_base;
> // This one is implicit;
> mapping(_Rank0_mapping_base<extents<_IndexType>> const&);
> };
>
> template<typename _IndexType, size_t _Ext>
> struct layout_left::mapping<extents<_IndexType, _Ext>>
> : _Rank1_mapping_base<extents<_IndexType, _Ext>>
> {
> using layout_type = layout_left;
> // Provides converting constructor.
> using _Rank1_mapping_base<extents<_IndexType, _Ext>>::_Rank1_mapping_base;
> // This one is implicit, allows construction from layout_right
> mapping(_Rank1_mapping_base<extents<_IndexType, _Ext>> const&);
> };
>
> template<typename _IndexType, size_t... _Ext>
> requires (sizeof...(_Ext) >= 2)
> struct layout_left::mapping<extents<_IndexType, _Ext...>>
>
> The last one is a generic implementation that you can use in yours.
> Please also include a comment explaining that we are deviating from
> standard text here.
>
>
> On Tue, Apr 29, 2025 at 2:56 PM Luc Grosheintz 
> wrote:
>
>> Implements the parts of layout_left that don't depend on any of the
>> other layouts.
>>
>> libstdc++/ChangeLog:
>>
>> * include/std/mdspan (layout_left): New class.
>>
>> Signed-off-by: Luc Grosheintz 
>> ---
>>  libstdc++-v3/include/std/mdspan | 179 
>>  1 file changed, 179 insertions(+)
>>
>> diff --git a/libstdc++-v3/include/std/mdspan
>> b/libstdc++-v3/include/std/mdspan
>> index 39ced1d6301..e05048a5b93 100644
>> --- a/libstdc++-v3/include/std/mdspan
>> +++ b/libstdc++-v3/include/std/mdspan
>> @@ -286,6 +286,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>
>>namespace __mdspan
>>{
>> +    template<typename _Extents>
>> +  constexpr typename _Extents::index_type
>> +  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
>> +  {
>> +   typename _Extents::index_type __fwd = 1;
>> +   for(size_t __i = 0; __i < __r; ++__i)
>> + __fwd *= __exts.extent(__i);
>> +   return __fwd;
>> +  }
>>
> As we are inside the standard library implementation, we can do some
> tricks here,
> and provide two functions:
> // Returns the std::span(_ExtentsStorage::_Ext).substr(f, l);
> // For extents forward to __static_exts
> span __static_exts(size_t f, size_t l);
> // Returns the
> std::span(_ExtentsStorage::_M_dynamic_extents).substr(_S_dynamic_index[f],
> _S_dynamic_index[l]);
> span __dynamic_exts(Extents const& c);
> Then you can befriend this function both to extents and _ExtentsStorage.
> Also add index_type members to _ExtentsStorage.
>
> Then instead of having fwd-prod and rev-prod I would have:
> template<typename _Extents>
> consteval size_t __static_ext_prod(size_t f, size_t l)
> {
>   // multiply E != dynamic_ext from __static_exts
> }
> constexpr size_t __ext_prod(const _Extents& __exts, size_t f, size_t l)
> {
>    // multiply __static_ext_prod<_Extents>(f, l) and each element of
> __dynamic_exts(__exts, f, l);
> }
>
> Then fwd-prod(e, n) would be __ext_prod(e, 0, n), and rev-prod(e, n) would
> be __ext_prod(e, __exts.rank() - n, __exts.rank())
>
>
>> +
>> +    template<typename _Extents>
>> +  constexpr typename _Extents::index_type
>> +  __rev_prod(const _Extents& __exts, size_t __r) noexcept
>> +  {
>> +   typename _Extents::index_type __rev = 1;
>> +   for(size_t __i = __r + 1; __i < __exts.rank(); ++__i)
>> + __rev *= __exts.extent(__i)

Re: [PATCH v4] RISC-V: Fix missing implied Zicsr from Zve32x

2025-05-01 Thread Liao Shihua

Seems CI still fails:

Patch Status 46835-v4_RISCV_Fix_missing_implied_Zicsr_from_Zve32x-1 · 
Issue #3283 · ewlu/gcc-precommit-ci 



/home/ewlu/precommit-03/_work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/predef-19.c: 
In function 'main':
/home/ewlu/precommit-03/_work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/predef-19.c:31:2: 
error: #error "__riscv_zmmul"


riscv-gcc doesn't provide __riscv_zmmul.

@kito, there are __riscv_mul, __riscv_div, and __riscv_muldiv in GCC. 
Should we enable __riscv_mul when we use the zmmul extension?




The Zve32x extension depends on the Zicsr extension.
Currently, enabling Zve32x alone does not automatically imply Zicsr in GCC.

gcc/ChangeLog:
 * common/config/riscv/riscv-common.cc: Add Zve32x depends on Zicsr

gcc/testsuite/ChangeLog:
 * gcc.target/riscv/predef-19.c: set the march to rv64im_zve32x
   instead of rv64gc_zve32x to avoid Zicsr being implied by g.  Extra m
   is added because the current 'V' extension requires the 'M' extension

Signed-off-by: Jerry Zhang Jian
---
  gcc/common/config/riscv/riscv-common.cc|  1 +
  gcc/testsuite/gcc.target/riscv/predef-19.c | 34 +-
  2 files changed, 8 insertions(+), 27 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 15df22d5377..145a0f2bd95 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -137,6 +137,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
{"zve64f", "f"},
{"zve64d", "d"},
  
+  {"zve32x", "zicsr"},
{"zve32x", "zvl32b"},
{"zve32f", "zve32x"},
{"zve32f", "zvl32b"},
diff --git a/gcc/testsuite/gcc.target/riscv/predef-19.c 
b/gcc/testsuite/gcc.target/riscv/predef-19.c
index 2b90702192b..ca3d57abca9 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-19.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-19.c
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
-/* { dg-options "-O2 -march=rv64gc_zve32x -mabi=lp64d -mcmodel=medlow -misa-spec=2.2" } */
+/* { dg-options "-O2 -march=rv64im_zve32x -mabi=lp64 -mcmodel=medlow -misa-spec=2.2" } */
  
  int main () {
  
@@ -15,50 +15,30 @@ int main () {

  #error "__riscv_i"
  #endif
  
-#if !defined(__riscv_c)
-#error "__riscv_c"
-#endif
-
  #if defined(__riscv_e)
  #error "__riscv_e"
  #endif
  
-#if !defined(__riscv_a)
-#error "__riscv_a"
-#endif
-
  #if !defined(__riscv_m)
  #error "__riscv_m"
  #endif
  
-#if !defined(__riscv_f)
-#error "__riscv_f"
-#endif
-
-#if !defined(__riscv_d)
-#error "__riscv_d"
-#endif
-
-#if defined(__riscv_v)
-#error "__riscv_v"
+#if !defined(__riscv_zicsr)
+#error "__riscv_zicsr"
  #endif
  
-#if defined(__riscv_zvl128b)
-#error "__riscv_zvl128b"
+#if !defined(_riscv_zmmul)
+#error "__riscv_zmmul"
  #endif
  
-#if defined(__riscv_zvl64b)
-#error "__riscv_zvl64b"
+#if !defined(__riscv_zve32x)
+#error "__riscv_zve32x"
  #endif
  
  #if !defined(__riscv_zvl32b)
  #error "__riscv_zvl32b"
  #endif
  
-#if !defined(__riscv_zve32x)
-#error "__riscv_zve32x"
-#endif
-
  #if !defined(__riscv_vector)
  #error "__riscv_vector"
  #endif
-- 2.49.0


[PATCH] get_known_nonzero_bits_1 should use wi::bit_and_not [PR118659]

2025-05-01 Thread Andrew Pinski
While looking into bitwise optimizations, I noticed that
get_known_nonzero_bits_1 does `bm.value () & ~bm.mask ()` which
is ok except it creates a temporary wide_int. Instead if we
use wi::bit_and_not, we can avoid the temporary and on some
targets use the andn/bic instruction.
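
A minimal sketch of the equivalence (a hypothetical helper of mine, using
the wide-int.h API):

/* Both forms compute the bits that are set in A and clear in B; the
   first materializes a temporary wide_int for ~B, the second doesn't.  */
static wide_int
known_set_bits (const wide_int &a, const wide_int &b)
{
  // return a & ~b;                // creates a temporary for ~b
  return wi::bit_and_not (a, b);   // one call, no temporary
}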

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssanames.cc (get_known_nonzero_bits_1): Use
wi::bit_and_not instead of `a & ~b`.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssanames.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index d7865f29f0b..de7b9b79f94 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -576,7 +576,7 @@ get_known_nonzero_bits_1 (const_tree name)
   if (tmp.undefined_p ())
 return wi::shwi (0, precision);
   irange_bitmask bm = tmp.get_bitmask ();
-  return bm.value () & ~bm.mask ();
+  return wi::bit_and_not (bm.value (), bm.mask ());
 }
 
 /* Return a wide_int with known non-zero bits in SSA_NAME
-- 
2.43.0



Re: [Stage 1][Middle-end][PATCH v5 0/3] Provide more contexts for -Warray-bounds and -Wstringop-* warning messages

2025-05-01 Thread Kees Cook
On Mon, Apr 07, 2025 at 03:04:56PM +, Qing Zhao wrote:
> Adding -fdiagnostics-details into GCC to provide more hints to the
> end users on how the warnings come from, in order to help the user
> to locate the exact location in source code on the specific warnings
> due to compiler optimizations.

FWIW, I've been very happily using this to find and squash tricky bugs
in Linux for a few months now. This is finally getting us close to being
able to enable -Warray-bounds globally. :)

https://lore.kernel.org/all/?q=%22-fdiagnostic%22+%22-details%22

-- 
Kees Cook


[PATCH v2] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-05-01 Thread LIU Hao
For i686-w64-mingw32, both the incoming and preferred stack boundaries are decreased to 4 bytes (32 
bits), same as MSVC. The change has no effect on 64-bit targets.


Documentation of the `force_align_arg_pointer` attribute seems to say that it realigns the stack to 16-byte 
boundaries, so it now has the side effect of increasing the preferred boundary to at least 16 bytes (128 bits).
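
A hypothetical example of the attribute's intended use on this target:

/* A callback that may be reached from MSVC-compiled code, where the
   stack may be only 4-byte aligned; the attribute forces a runtime
   realignment, with this patch to at least 16 bytes.  */
__attribute__((force_align_arg_pointer))
void on_event (void *ctx)
{
  double scratch[2];  /* locals get their full alignment again */
  (void) ctx;
  (void) scratch;
}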


I have bootstrapped GCC 15.1 with this patch on i686-w64-mingw32, and verified that ESP is properly 
aligned with all these configurations:


   ----------------------------------------------------------------
    -O?   local variable   force_align_arg_pointer   runtime
          alignment                                  realign to
   ----------------------------------------------------------------
     0          4                    no               4 (no-op)
     0          8                    no               8
     0         16                    no              16
     0         32                    no              32
     0          4                    yes             16
     0          8                    yes             16
     0         16                    yes             16
     0         32                    yes             32
     2          4                    no               4 (no-op)
     2          8                    no               8
     2         16                    no              16
     2         32                    no              32
     2          4                    yes             16
     2          8                    yes             16
     2         16                    yes             16
     2         32                    yes             32
     s          4                    no               4 (no-op)
     s          8                    no               8
     s         16                    no              16
     s         32                    no              32
     s          4                    yes             16
     s          8                    yes             16
     s         16                    yes             16
     s         32                    yes             32
   ----------------------------------------------------------------

This patch may affect other 32-bit targets where the stack is not always aligned to 16 bytes, but I don't 
have any system with such a configuration, so can't test that for now.




--
Best regards,
LIU Hao



From 1c101f4903a9be7d56efa8d97be603284f6bd4d4 Mon Sep 17 00:00:00 2001
From: LIU Hao 
Date: Tue, 29 Apr 2025 10:43:06 +0800
Subject: [PATCH] i386/cygming: Decrease default preferred stack boundary for
 32-bit targets

This commit decreases the default preferred stack boundary to 4.

In i386-options.cc, there's

   ix86_default_incoming_stack_boundary = PREFERRED_STACK_BOUNDARY;

which sets the default incoming stack boundary to this value, if it's not
overridden by other options or attributes.

Previously, GCC preferred 16-byte alignment like other platforms, unless
`-miamcu` was specified. However, the Microsoft x86 ABI only requires the
stack be aligned to 4-byte boundaries. Callback functions invoked from MSVC
code may break this assumption made by GCC (see reference below), causing
local variables to be misaligned.

For compatibility reasons, when the attribute `force_align_arg_pointer` is
attached to a function, it continues to ensure the stack is at least aligned
to a 16-byte boundary, as the documentation seems to suggest.

Reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=07#c9
Signed-off-by: LIU Hao 

gcc/ChangeLog:

PR 07
* config/i386/cygming.h (PREFERRED_STACK_BOUNDARY_DEFAULT): Override
definition from i386.h.
* config/i386/i386.cc (ix86_compute_frame_layout): Force minimum
128-bit alignment if `force_align_arg_pointer`.

Signed-off-by: LIU Hao 
---
 gcc/config/i386/cygming.h | 4 
 gcc/config/i386/i386.cc   | 9 +
 2 files changed, 13 insertions(+)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 3ddcbecb22fd..b8c396d35793 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -28,6 +28,10 @@ along with GCC; see the file COPYING3.  If not see
 #undef TARGET_SEH
 #define TARGET_SEH  (TARGET_64BIT_MS_ABI && flag_unwind_tables)

+#undef PREFERRED_STACK_BOUNDARY_DEFAULT
+#define PREFERRED_STACK_BOUNDARY_DEFAULT \
+  (TARGET_64BIT ? 128 : MIN_STACK_BOUNDARY)
+
 /* Win64 with SEH cannot represent DRAP stack frames.  Disable its use.
Force the use of different mechanisms to allocate aligned local data.  */
 #undef MAX_STACK_ALIGNMENT
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 38df84f7db24..61e2acc53b7d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -7920,6 +7920,15 @@ ix86_update_stack_boundary (void)
   if (ix86_tls_descriptor_calls_expanded_in_cfun
   &&

Re: [PATCH] c++: Small build_vec_init improvement [PR117827]

2025-05-01 Thread Jason Merrill

On 5/1/25 3:00 AM, Jakub Jelinek wrote:

Hi!

As discussed in the
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674492.html
thread, the following patch attempts to improve build_vec_init generated
code.  E.g. on g++.dg/eh/aggregate1.C test the patch has differences like:
  D.2988 = &D.2950->e1;
  D.2989 = D.2988;
  D.2990 = 1;
  try
{
  goto ;
  :
  A::A (D.2989);
  D.2990 = D.2990 + -1;
  D.2989 = D.2989 + 1;
  :
  if (D.2990 >= 0) goto ; else goto 
;
  :
  retval.4 = D.2988;
  _13 = &D.2950->e2;
  A::A (_13);
-D.2990 = 1;
+D.2988 = 0B;
  D.2951 = D.2951 + -1;
}
  catch
{
  {
struct A * D.2991;
  
if (D.2988 != 0B) goto ; else goto ;

:
_11 = 1 - D.2990;
_12 = (sizetype) _11;
D.2991 = D.2988 + _12;
:
if (D.2991 == D.2988) goto ; else goto 
;
:
D.2991 = D.2991 + 18446744073709551615;
A::~A (D.2991);
goto ;
:
goto ;
:
:
  }
}
in 3 spots.  As you can see, both setting D.2990 (i.e. iterator) to
maxindex and setting D.2988 (i.e. rval) to nullptr have the same effect of
not actually destructing anything anymore in the cleanup; the
advantage of clearing rval is that setting something to zero is often less
expensive than a potentially huge maxindex, and that the cleanup tests that
value first.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2025-04-30  Jakub Jelinek  

PR c++/117827
* init.cc (build_vec_init): Push to *cleanup_flags clearing of rval
instead of setting of iterator to maxindex.

--- gcc/cp/init.cc.jj   2025-01-24 19:26:31.980193087 +0100
+++ gcc/cp/init.cc  2025-01-24 19:45:03.324702001 +0100
@@ -4749,7 +4749,8 @@ build_vec_init (tree base, tree maxindex
 itself.  But that breaks when gimplify_target_expr adds a clobber
 cleanup that runs before the build_vec_init cleanup.  */
if (cleanup_flags)
-   vec_safe_push (*cleanup_flags, build_tree_list (iterator, maxindex));
+   vec_safe_push (*cleanup_flags,
+  build_tree_list (rval, build_zero_cst (ptype)));
  }
  
/* Should we try to create a constant initializer?  */


Jakub





[Patch fortran] PR119948 - Source allocation of pure function result rejected

2025-05-01 Thread Paul Richard Thomas
Committed as r16-329-g0abc77da9d704bba55a376bb5c162a54826ab94a following OK
by Jerry on PR.

I am ready to backport in a few weeks time if that is wanted.

Regards

Paul

Fortran: Source allocation of pure function result rejected [PR119948]

2025-05-01  Paul Thomas  

gcc/fortran
PR fortran/119948
* resolve.cc (gfc_impure_variable): The result of a module
procedure with an interface declaration is not impure even if
the current namespace is not the same as the symbol's.

gcc/testsuite/
PR fortran/119948
* gfortran.dg/pr119948.f90: New test.


Re: [GCC16,RFC,V2 08/14] aarch64: memtag: implement target hooks

2025-05-01 Thread Richard Sandiford
Indu Bhagat  writes:
> MEMTAG sanitizer, which is based on the HWASAN sanitizer, will invoke
> the target-specific hooks to create a random tag, add tag to memory
> address, and finally tag and untag memory.
>
> Implement the target hooks to emit MTE instructions if MEMTAG sanitizer
> is in effect.  Continue to use the default target hook if HWASAN is
> being used.  Following target hooks are implemented:
>- TARGET_MEMTAG_INSERT_RANDOM_TAG
>- TARGET_MEMTAG_ADD_TAG
>- TARGET_MEMTAG_TAG_MEMORY
>
> Apart from the target-specific hooks, set the following to values
> defined by the Memory Tagging Extension (MTE) in aarch64:
>- TARGET_MEMTAG_TAG_SIZE
>- TARGET_MEMTAG_GRANULE_SIZE

How about using "bits"/"bytes" rather than "size", since the two hooks
use different units?  Alternatively, we could follow the machine_mode
convention of using "size" for bytes and "bitsize" for bits.

> As noted earlier, TARGET_MEMTAG_TAG_MEMORY is a target-specific hook,
> the  _only_ use of which is done by the MEMTAG sanitizer.  On aarch64,
> TARGET_MEMTAG_TAG_MEMORY will emit MTE instructions to tag/untag memory
> of a given size.  TARGET_MEMTAG_TAG_MEMORY target hook implementation
> may emit an actual loop to tag/untag memory when size of memory block is
> an expression to be evaluated.  Both aarch64_memtag_tag_memory () and
> aarch64_memtag_tag_memory_via_loop () may generate stg or st2g,
> depending on the number of iterations.
>
> TBD:
> - rtx generation in the target-hooks not tested well.  WIP.
> - See how AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD is defined and then
> used to generate a loop to tag/untag memory.  Is there a better way
> to do this ?
>
> gcc/ChangeLog:
>
>   * asan.cc (memtag_sanitize_p): New definition.
>   * asan.h (memtag_sanitize_p): New declaration.
>   * config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
>   Define.
>   (AARCH64_MEMTAG_TAG_SIZE): Define.
>   (aarch64_can_tag_addresses): Add MEMTAG specific handling.
>   (aarch64_memtag_tag_size): Likewise.
>   (aarch64_memtag_granule_size): Likewise.
>   (aarch64_memtag_insert_random_tag): Generate irg insn.
>   (aarch64_memtag_add_tag): Generate addg/subg insn.
>   (AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD): Define.
>   (aarch64_memtag_tag_memory_via_loop): New definition.
>   (aarch64_memtag_tag_memory): Likewise.
> (aarch64_gen_tag_memory_postindex): Likewise.
>   (TARGET_MEMTAG_TAG_SIZE): Define target-hook.
>   (TARGET_MEMTAG_GRANULE_SIZE): Likewise.
>   (TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
>   (TARGET_MEMTAG_ADD_TAG): Likewise.
>   (TARGET_MEMTAG_TAG_MEMORY): Likewise.
>
> [...]
> +/* Implement TARGET_MEMTAG_INSERT_RANDOM_TAG.  */
> +rtx
> +aarch64_memtag_insert_random_tag (rtx untagged, rtx target)
> +{
> +  rtx ret;
> +  if (memtag_sanitize_p ())
> +{
> +  gcc_assert (param_memtag_instrument_stack || 
> param_memtag_instrument_allocas);

It might be better to drop this assert.  The hook should be prepared
to do what it's told without inquiring into the reason.

> +  if (!target)
> + target = gen_reg_rtx (Pmode);
> +
> +  rtx insn = gen_irg (target, untagged, untagged);
> +  emit_insn (insn);

Usually, this kind of hook would allow "untagged" to be any reasonable
operand, and similarly "target".  They might not match the predicates
on irg.

So I think this should use the expand_insn machinery, with
create_output_operand for the result and create_input_operand
for the input.  This will also avoid the need to call gen_reg_rtx
explicitly.

Same for the other uses of gen_* in the patch.
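
A minimal sketch of that shape, keeping the three irg operands from the
quoted code (the modes here are my assumption):

  expand_operand ops[3];
  create_output_operand (&ops[0], target, Pmode);
  create_input_operand (&ops[1], untagged, Pmode);
  create_input_operand (&ops[2], untagged, Pmode);
  expand_insn (CODE_FOR_irg, 3, ops);
  return ops[0].value;

create_output_operand also accepts a null rtx, in which case the expander
creates the result register itself.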

> +
> +  ret = XEXP (insn, 0);
> +}
> +  else
> +ret = default_memtag_insert_random_tag (untagged, target);
> +
> +  return ret;
> +}
> +
> +/* Implement TARGET_MEMTAG_ADD_TAG.  */
> +rtx
> +aarch64_memtag_add_tag (rtx base, poly_int64 offset, uint8_t tag_offset)
> +{
> +  rtx offset_rtx, tag_offset_rtx;
> +  rtx target, insn;
> +  poly_int64 abs_offset;
> +  enum rtx_code code;
> +  bool neg_p;
> +  rtx ret;
> +
> +  if (memtag_sanitize_p ())
> +{
> +  target = gen_reg_rtx (Pmode);
> +  tag_offset_rtx = gen_int_mode (tag_offset, DImode);
> +  gcc_assert (aarch64_memtag_tag_offset (tag_offset_rtx, DImode));
> +
> +  neg_p = known_lt (offset, 0);
> +  abs_offset = neg_p ? offset * (-1) : offset;
> +  offset_rtx = gen_int_mode (abs_offset, DImode);
> +
> +  if (!aarch64_granule16_uimm6 (offset_rtx, DImode))
> + {
> +   /* Emit addr arithmetic prior to addg/subg.  */
> +   code = neg_p ? MINUS : PLUS;
> +   insn = expand_simple_binop (Pmode, code, base, offset_rtx,
> +   target, true, OPTAB_LIB_WIDEN);
> +   offset_rtx = const0_rtx;
> + }
> +
> +  /* Addr offset offset must be within bounds at this time.  */
> +  gcc_assert (aarch64_granule16_uimm6 (offset_rtx, DImode));
> +
> +  /* Even if tag_offset_rtx is CONST0_RTX, ge

Re: [GCC16, RFC, V2 12/14] memtag: doc: add new option -fsanitize-memtag-mode=

2025-05-01 Thread Richard Sandiford
Indu Bhagat  writes:
> Add new command line option -fsanitize-memtag-mode with three possible
> values:
>   - sync (default)
>   - async
>   - asymm
> This allows the user to select the fault conveyance model when using MTE
> instructions for their applications.

Not sure about the name "asymm": it doesn't say which way it's
asymmetrical.  Maybe syncread would have been more explicit.
But I agree we should go with what Clang has established.

> This option is not (sanity checked) processed in GCC at all currently.  If bad
> args / unsupported args are passed, ld will complain.
>
> TBD:
>   - This option is not checked / processed in GCC at all currently.
>   - asymm is not specified in Memtag ABI...
>   - clang has -fsanitize-memtag-mode=
> Sets default MTE mode to 'sync' (default) or 'async'
>
> gcc/ChangeLog:
>
> * doc/invoke.texi: Document -fsanitize-memtag-mode.
>
> ---
> [New in RFC V2]
> ---
>  gcc/doc/invoke.texi | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index de651183a703..b33585430e6a 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -18032,6 +18032,18 @@ Use Memory Tagging Extension instructions instead of 
> instrumentation to allow
>  the detection of memory errors.  This option is available only on those 
> AArch64
>  architectures that support Memory Tagging Extensions.
>  
> +@opindex -fsanitize-memtag-mode
> +@item -fsanitize-memtag-mode=@r{[}sync@r{|}async@r{|}asymm@r{]}
> +Control the fault conveyance model of MTE instructions.  Mismatched logical 
> and
> +allocation tags are detected during a load/store operation.

We'd need to phrase this in a way that isn't AArch64-specific.
Users might also not be familiar with the terms "logical tags" and
"allocation tags".

Otherwise this LGTM apart from a nit:

>  In @code{sync}
> +mode, exceptions are precise, providing the exact instruction where the fault
> +occurred, and the exact faulting address.  The @code{async} mode allows
> +imprecise detection that a fault has occurred, at the benefit of increased

s/at the benefit/with the benefit/

Thanks,
Richard

> +performance over the synchronous mode.  The @code{asymm} mode provides
> +synchronous checking on memory reads, and asynchronous checking of memory
> +writes.  The selection of fault conveyance model does not alter code
> +generation.
> +
>  @opindex fsanitize=pointer-compare
>  @item -fsanitize=pointer-compare
>  Instrument comparison operation (<, <=, >, >=) with pointer operands.


[PATCH 0/4] gimple: Rework the bit-test switch lowering algorithm

2025-05-01 Thread Filip Kastl
Hi,

While developing GCC 15, we found out that the bit-test switch lowering
algorithm is quadratic and that it is possible to construct a switch that
causes huge compile time (PR117091).

Andi Kleen came up with a fast bit-test lowering algorithm which was
1) linear
2) in some cases better than the original algorithm
3) in some cases worse than the original algorithm

For GCC 15 we decided to use the original algorithm when the given switch is
small and the fast algorithm when the switch is too big.

In this patch set I combine both algorithms.  The new algorithm gives strictly
better solutions than Andi's and the original one and runs in linear time.

More details below and in commit messages.  Bootstrapped and regtested on
x86_64 linux.

Ok for trunk?
Filip Kastl


Better solutions


When I say "better solutions", I mean the number of clusters.  Cluster is a
concept from switch lowering pass.  It is basically a group of cases that can
be handled by constantly many operations.
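
For example (an illustration only, not taken from the patch set), consider:

  switch (a)
    {
    case 0: case 2: case 4: case 6: return f0 ();
    case 8: return f1 ();
    }

All of these case values fit into one BITS_PER_WORD-sized window, so they can
be covered by a single bit-test cluster: one range check plus one constant
bitmask test per target, no matter how many cases there are.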

On every switch the new algorithm should output a clustering of the cases such
that the number of clusters is the smallest possible.  This should be
guaranteed by how the algorithm works.

To be extra sure, I tested this by compiling GCC itself.  I bootstrapped GCC
once for each of the three algorithms (the original one, Andi's and the new
one) and I counted the number of clusters for each function (I didn't find a
way to easily compare individual switches).  Indeed the new algorithm either
gave a function with fewer clusters or a function with more switches
(if-to-switch conversion converts if-else chains into switches based on whether
switch lowering can lower the resulting switch -- I think that this is also a
win).


A note about --param switch-lower-slow-alg-max-cases


The param switch-lower-slow-alg-max-cases specified how big a switch had to be
so that GCC would switch to the fast algorithm.  After this patch set it will
serve no purpose.  I kept it though.  There is the jump-table switch lowering
algorithm.  That algorithm is still quadratic.  Nobody ran into any real issues
with that yet, but it would be nice to come up with a fast algorithm for
jump-table lowering and set GCC to fall back to it when a switch is too big
(pr118353).  Then the param would be relevant again.  I plan to look into that
in this Stage 1.


Testing
---

- I compiled the testcase from PR117091 to confirm that the new algorithm
  spends reasonable time on it.

> gcc.sh -O2 -ftime-report big-switch.c 
Time variable  wall   GGC
 ipa SRA:  12.58 (  5%)   536  (  0%)
 dominator optimization :  55.13 ( 22%)  4687k (  1%)
 tree FRE   :  71.05 ( 29%)   744  (  0%)
 tree switch lowering   :  27.41 ( 11%)   13M (  3%)
 tree modref:  13.24 (  5%)   480  (  0%)
 rest of compilation:  23.09 (  9%)  2816  (  0%)
 TOTAL  : 248.27  413M
(some entries omitted)

The original bug report claimed that this testcase ran over 30 minutes.  That's
not the case here.  Switch lowering doesn't even dominate the compilation time.

- I also ran a few SPEC CPU 2017 benchmarks (x86_64 linux).

with this patch set:

O2 generic
500.perlbench_r   NR   1   237   6.72  *
502.gcc_r         NR   1   216   6.57  *

Ofast native
500.perlbench_r   NR   1   251   6.33  *
502.gcc_r         NR   1   210   6.73  *

without this patch set:

O2 generic
500.perlbench_r   NR   1   242   6.57  *
502.gcc_r         NR   1   216   6.56  *

Ofast native
500.perlbench_r   NR   1   253   6.29  *
502.gcc_r         NR   1   210   6.73  *

The fourth column is the exec time in seconds.  There are only small differences
(2% max) and they favor the new algorithm.  They could just be noise, btw.

I've selected the perl and gcc benchmarks because I expect that those contain
many switches.

- I bootstrapped and regtested this on x86_64 linux.


Filip Kastl (4):
  gimple: Merge slow and fast bit-test switch lowering [PR117091]
  gimple: Make bit-test switch lowering more powerful
  gimple: Don't warn about using different algs for big switch lowering
[PR117091]
  gimple: Switch bit-test lowering testcases for the more powerful alg

 gcc/testsuite/gcc.dg/tree-ssa/switch-5.c |  60 ++
 gcc/testsuite/gcc.dg/tree-ssa/switch-6.c |  51 +
 gcc/tree-switch-conversion.cc| 245 +++
 gcc/tree-switch-conversion.h |  10 -
 4 files changed, 184 insertions(+), 182 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/switch-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/switch-6.c

[PATCH 1/4] gimple: Merge slow and fast bit-test switch lowering [PR117091]

2025-05-01 Thread Filip Kastl
PR117091 showed that bit-test switch lowering can take a lot of time.
The algorithm was O(n^2).  We therefore came up with a faster algorithm
(O(n * BITS_IN_WORD)) and made GCC choose between the slow and the fast
algorithm based on how big the switch is.

Here I combine the algorithms so that we get the results of the slower
algorithm in the faster asymptotic time.
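
In outline, the combined algorithm keeps the dynamic programming recurrence
of the slow algorithm but bounds its inner loop, since one bit-test cluster
can cover at most a word's worth of case values.  A sketch (not the actual
GCC code):

  /* min[i] = fewest clusters covering the first i input clusters.  */
  min[0] = 0;
  for (int i = 1; i <= n; i++)
    {
      /* Worst case: input cluster i-1 stays on its own.  */
      min[i] = min[i - 1] + 1;
      /* One bit-test cluster covers at most BITS_PER_WORD case values,
         so only the last BITS_PER_WORD positions can start one.  */
      for (int j = i - 2; j >= 0 && j >= i - BITS_PER_WORD; j--)
        if (can_be_handled (clusters, j, i - 1) && min[j] + 1 < min[i])
          min[i] = min[j] + 1;
    }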

PR middle-end/117091

gcc/ChangeLog:

* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests_fast):
Remove function.
(bit_test_cluster::find_bit_tests_slow): Remove function.
(bit_test_cluster::find_bit_tests): We don't need to decide
between slow and fast so just put the modified (no longer) slow
algorithm here.

Signed-off-by: Filip Kastl 
---
 gcc/tree-switch-conversion.cc | 107 ++
 1 file changed, 17 insertions(+), 90 deletions(-)

diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 39a8a893edd..a70274b0337 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1773,92 +1773,38 @@ jump_table_cluster::is_beneficial (const vec 
&,
   return end - start + 1 >= case_values_threshold ();
 }
 
-/* Find bit tests of given CLUSTERS, where all members of the vector are of
-   type simple_cluster.  Use a fast algorithm that might not find the optimal
-   solution (minimal number of clusters on the output).  New clusters are
-   returned.
-
-   You should call find_bit_tests () instead of calling this function
-   directly.  */
-
-vec
-bit_test_cluster::find_bit_tests_fast (vec &clusters)
-{
-  unsigned l = clusters.length ();
-  vec output;
-
-  output.create (l);
-
-  /* Look at sliding BITS_PER_WORD sized windows in the switch value space
- and determine if they are suitable for a bit test cluster.  Worst case
- this can examine every value BITS_PER_WORD-1 times.  */
-  unsigned k;
-  for (unsigned i = 0; i < l; i += k)
-{
-  hash_set targets;
-  cluster *start_cluster = clusters[i];
-
-  /* Find the biggest k such that clusters i to i+k-1 can be turned into a
-one big bit test cluster.  */
-  k = 0;
-  while (i + k < l)
-   {
- cluster *end_cluster = clusters[i + k];
-
- /* Does value range fit into the BITS_PER_WORD window?  */
- HOST_WIDE_INT w = cluster::get_range (start_cluster->get_low (),
-   end_cluster->get_high ());
- if (w == 0 || w > BITS_PER_WORD)
-   break;
-
- /* Check for max # of targets.  */
- if (targets.elements () == m_max_case_bit_tests
- && !targets.contains (end_cluster->m_case_bb))
-   break;
-
- targets.add (end_cluster->m_case_bb);
- k++;
-   }
-
-  if (is_beneficial (k, targets.elements ()))
-   {
- output.safe_push (new bit_test_cluster (clusters, i, i + k - 1,
- i == 0 && k == l));
-   }
-  else
-   {
- output.safe_push (clusters[i]);
- /* ??? Might be able to skip more.  */
- k = 1;
-   }
-}
-
-  return output;
-}
-
 /* Find bit tests of given CLUSTERS, where all members of the vector
-   are of type simple_cluster.  Use a slow (quadratic) algorithm that always
-   finds the optimal solution (minimal number of clusters on the output).  New
-   clusters are returned.
-
-   You should call find_bit_tests () instead of calling this function
-   directly.  */
+   are of type simple_cluster.  MAX_C is the approx max number of cases per
+   label.  New clusters are returned.  */
 
 vec
-bit_test_cluster::find_bit_tests_slow (vec &clusters)
+bit_test_cluster::find_bit_tests (vec &clusters, int max_c)
 {
+  if (!is_enabled () || max_c == 1)
+return clusters.copy ();
+
   unsigned l = clusters.length ();
   auto_vec min;
   min.reserve (l + 1);
 
   min.quick_push (min_cluster_item (0, 0, 0));
 
+  unsigned bits_in_word = GET_MODE_BITSIZE (word_mode);
+
   for (unsigned i = 1; i <= l; i++)
 {
   /* Set minimal # of clusters with i-th item to infinite.  */
   min.quick_push (min_cluster_item (INT_MAX, INT_MAX, INT_MAX));
 
-  for (unsigned j = 0; j < i; j++)
+  /* Since each cluster contains at least one case number and one bit test
+cluster can cover at most bits_in_word case numbers, we don't need to
+look farther than bits_in_word clusters back.  */
+  unsigned j;
+  if (i - 1 >= bits_in_word)
+   j = i - 1 - bits_in_word;
+  else
+   j = 0;
+  for (; j < i; j++)
{
  if (min[j].m_count + 1 < min[i].m_count
  && can_be_handled (clusters, j, i - 1))
@@ -1900,25 +1846,6 @@ bit_test_cluster::find_bit_tests_slow (vec 
&clusters)
   return output;
 }
 
-/* Find bit tests of given CLUSTERS, where all members of the vector
-   are of type simple_cluster.  MAX_C is the approx max number of cases per
-   lab

[PATCH 2/4] gimple: Make bit-test switch lowering more powerful

2025-05-01 Thread Filip Kastl
A reasonable goal for bit-test lowering is to produce the least amount
of clusters for a given switch (a cluster is basically a group of cases
that can be handled by a constant number of operations).

The current algorithm doesn't always give optimal solutions in that
sense.  This patch should fix that.  The important change is basically
just to ask whether a cluster is_beneficial() more proactively.

The patch also fixes a mistake which made bit-test lowering only create
clusters at most BITS_IN_WORD - 1 wide.  There are also some new comments
that go into more detail on the dynamic programming algorithm.

gcc/ChangeLog:

* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Modify the dynamic programming algorithm to take is_beneficial()
into account earlier.  To do this efficiently, copy some logic
from is_beneficial() here.  Add detailed comments about how the
DP algorithm works.
(bit_test_cluster::can_be_handled): Check that the cluster range
is >, not >= BITS_IN_WORD.  Remove the
"vec &, unsigned, unsigned" overloaded variant since
we no longer need it.
(bit_test_cluster::is_beneficial): Add a comment that this
function is closely tied to m_max_case_bit_tests.  Remove the
"vec &, unsigned, unsigned" overloaded variant since
we no longer need it.
* tree-switch-conversion.h: Remove the vec overloaded variants
of bit_test_cluster::is_beneficial and
bit_test_cluster::can_be_handled.

Signed-off-by: Filip Kastl 
---
 gcc/tree-switch-conversion.cc | 153 +++---
 gcc/tree-switch-conversion.h  |  10 ---
 2 files changed, 67 insertions(+), 96 deletions(-)

diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index a70274b0337..4f0be8c43f0 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1783,58 +1783,98 @@ bit_test_cluster::find_bit_tests (vec 
&clusters, int max_c)
   if (!is_enabled () || max_c == 1)
 return clusters.copy ();
 
+  /* Dynamic programming algorithm.
+
+ In: List of simple clusters
+ Out: List of simple clusters and bit test clusters such that each bit test
+ cluster can_be_handled() and is_beneficial()
+
+ Tries to merge consecutive clusters into bigger (bit test) ones.  Tries to
+ end up with as few clusters as possible.  */
+
   unsigned l = clusters.length ();
   auto_vec min;
   min.reserve (l + 1);
 
-  min.quick_push (min_cluster_item (0, 0, 0));
+  gcc_checking_assert (l > 0);
+  gcc_checking_assert (l <= INT_MAX);
 
-  unsigned bits_in_word = GET_MODE_BITSIZE (word_mode);
+  int bits_in_word = GET_MODE_BITSIZE (word_mode);
 
-  for (unsigned i = 1; i <= l; i++)
+  /* First phase: Compute the minimum number of clusters for each prefix of the
+ input list incrementally
+
+ min[i] = (count, j, _) means that the prefix ending with the (i-1)-th
+ element can be made to contain as few as count clusters and that in such
+ clustering the last cluster is made up of input clusters [j, i-1]
+ (inclusive).  */
+  min.quick_push (min_cluster_item (0, 0, INT_MAX));
+  min.quick_push (min_cluster_item (1, 0, INT_MAX));
+  for (int i = 2; i <= (int) l; i++)
 {
-  /* Set minimal # of clusters with i-th item to infinite.  */
-  min.quick_push (min_cluster_item (INT_MAX, INT_MAX, INT_MAX));
+  auto_vec unique_labels;
 
   /* Since each cluster contains at least one case number and one bit test
 cluster can cover at most bits_in_word case numbers, we don't need to
 look farther than bits_in_word clusters back.  */
-  unsigned j;
-  if (i - 1 >= bits_in_word)
-   j = i - 1 - bits_in_word;
-  else
-   j = 0;
-  for (; j < i; j++)
+  for (int j = i - 1; j >= 0 && j >= i - bits_in_word; j--)
{
- if (min[j].m_count + 1 < min[i].m_count
- && can_be_handled (clusters, j, i - 1))
-   min[i] = min_cluster_item (min[j].m_count + 1, j, INT_MAX);
-   }
+ /* Consider creating a bit test cluster from input clusters [j, i-1]
+(inclusive)  */
 
-  gcc_checking_assert (min[i].m_count != INT_MAX);
+ simple_cluster *sc = static_cast (clusters[j]);
+ unsigned label = sc->m_case_bb->index;
+ if (!unique_labels.contains (label))
+   {
+ if (unique_labels.length () >= m_max_case_bit_tests)
+   /* is_beneficial() will be false for this and the following
+  iterations.  */
+   break;
+ unique_labels.quick_push (label);
+   }
+
+ unsigned new_count = min[j].m_count + 1;
+
+ if (j == i - 1)
+   {
+ min.quick_push (min_cluster_item (new_count, j, INT_MAX));
+ continue;
+   }
+
+ unsigned HOST_WIDE_INT range
+   = get_range (clusters[j]->get_low (), clusters[i-1]->get_high ());
+   

[PATCH 3/4] gimple: Don't warn about using different algs for big switch lowering [PR117091]

2025-05-01 Thread Filip Kastl
With the previous patches applied, GCC no longer switches to a different,
faster algorithm when a switch is too big, so this patch removes the
now-obsolete warning about that.

PR middle-end/117091

gcc/ChangeLog:

* tree-switch-conversion.cc 
(switch_decision_tree::analyze_switch_statement):
Remove warning about using different algorithms.

Signed-off-by: Filip Kastl 
---
 gcc/tree-switch-conversion.cc | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 4f0be8c43f0..dea217a01ef 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -2257,13 +2257,6 @@ switch_decision_tree::analyze_switch_statement ()
 
   reset_out_edges_aux (m_switch);
 
-  if (l > (unsigned) param_switch_lower_slow_alg_max_cases)
-warning_at (gimple_location (m_switch), OPT_Wdisabled_optimization,
-  "Using faster switch lowering algorithms. "
-  "Number of switch cases (%d) exceeds "
-  "%<--param=switch-lower-slow-alg-max-cases=%d%> limit.",
-  l, param_switch_lower_slow_alg_max_cases);
-
   /* Find bit-test clusters.  */
   vec output = bit_test_cluster::find_bit_tests (clusters, max_c);
 
-- 
2.48.1



[PATCH 4/4] gimple: Switch bit-test lowering testcases for the more powerful alg

2025-05-01 Thread Filip Kastl
This patch adds 2 testcases.  One tests that GCC is able to create
bit-test clusters of size 64.  The other one contains two switches which
GCC wouldn't completely cover with bit-test clusters before the changes
from this patch set.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/switch-5.c: New test.
* gcc.dg/tree-ssa/switch-6.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/testsuite/gcc.dg/tree-ssa/switch-5.c | 60 
 gcc/testsuite/gcc.dg/tree-ssa/switch-6.c | 51 
 2 files changed, 111 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/switch-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/switch-6.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/switch-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/switch-5.c
new file mode 100644
index 000..b05742cf153
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/switch-5.c
@@ -0,0 +1,60 @@
+/* { dg-do compile { target { { x86_64-*-* aarch64-*-* ia64-*-* powerpc64-*-* 
} && lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-switchlower1" } */
+
+int f0();
+int f1();
+int f2();
+int f3();
+int f4();
+ 
+int foo(int a)
+{
+switch (a)
+{
+case 0:
+case 2:
+case 4:
+case 6:
+return f0();
+case 8:
+return f1();
+case 10:
+case 14:
+case 16:
+case 18:
+return f2();
+case 12:
+return f3();
+case 20:
+return f4();
+}
+return -1;
+}
+
+/* { dg-final { scan-tree-dump ";; GIMPLE switch case clusters: BT:0-8 
BT:10-20" "switchlower1" } } */
+
+int bar(int a)
+{
+switch (a)
+{
+case 20:
+case 18:
+case 16:
+case 14:
+return f0();
+case 12:
+return f1();
+case 10:
+case 6:
+case 4:
+case 2:
+return f2();
+case 8:
+return f3();
+case 0:
+return f4();
+}
+return -1;
+}
+
+/* { dg-final { scan-tree-dump ";; GIMPLE switch case clusters: BT:0-10 
BT:12-20" "switchlower1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/switch-6.c 
b/gcc/testsuite/gcc.dg/tree-ssa/switch-6.c
new file mode 100644
index 000..bbbc87462c4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/switch-6.c
@@ -0,0 +1,51 @@
+/* { dg-do compile { target { { x86_64-*-* aarch64-*-* ia64-*-* powerpc64-*-* 
} && lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-switchlower1 -fno-jump-tables" } */
+
+/* Test that bit-test switch lowering can create cluster of size 64 (there was
+   an off-by-one error causing it to only do 63 before).  */
+
+int f();
+
+int foo(int a)
+{
+switch (a)
+{
+case 0:
+case 3:
+case 5:
+case 7:
+case 9:
+case 11:
+case 13:
+case 15:
+case 17:
+case 19:
+case 21:
+case 23:
+case 25:
+case 27:
+case 29:
+case 31:
+case 33:
+case 35:
+case 37:
+case 39:
+case 41:
+case 43:
+case 45:
+case 47:
+case 49:
+case 51:
+case 53:
+case 55:
+case 57:
+case 59:
+case 61:
+case 63:
+return f();
+default:
+return -1;
+}
+}
+
+/* { dg-final { scan-tree-dump ";; GIMPLE switch case clusters: BT:0-63" 
"switchlower1" } } */
-- 
2.48.1



[PATCH] Fixup vect_remove_slp_scalar_calls

2025-05-01 Thread Richard Biener
There's a logic error in vect_remove_slp_scalar_calls where it
simply ignores pattern stmts but it should instead look at the
original stmt.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-slp.cc (vect_remove_slp_scalar_calls): Look
at the original stmt.
---
 gcc/tree-vect-slp.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b5a9604d074..9bf142d0faf 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -11361,12 +11361,12 @@ vect_remove_slp_scalar_calls (vec_info *vinfo,
 {
   if (!stmt_info)
continue;
+  if (!PURE_SLP_STMT (stmt_info))
+   continue;
+  stmt_info = vect_orig_stmt (stmt_info);
   gcall *stmt = dyn_cast  (stmt_info->stmt);
   if (!stmt || gimple_bb (stmt) == NULL)
continue;
-  if (is_pattern_stmt_p (stmt_info)
- || !PURE_SLP_STMT (stmt_info))
-   continue;
   lhs = gimple_call_lhs (stmt);
   if (lhs)
new_stmt = gimple_build_assign (lhs, build_zero_cst (TREE_TYPE (lhs)));
-- 
2.43.0


[PATCH] Fix gcc.dg/tree-ssa/ssa-dom-thread-7.c for aarch64

2025-05-01 Thread Richard Biener
So on another machine with a cross I see 17 jumps threaded, so I adjusted
the test accordingly.

Pushed.

PR tree-optimization/120003
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust aarch64 expected
thread2 number of threads.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 8be9878e0cf..59891f29132 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -12,7 +12,7 @@
jump threading opportunities.  Skip the later tests on aarch64.  */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" { target { ! 
aarch64*-*-* } } } } */
 /* { dg-final { scan-tree-dump "Jumps threaded: 10"  "thread2" { target { ! 
aarch64*-*-* } } } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 14"  "thread2" { target { 
aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 17"  "thread2" { target { 
aarch64*-*-* } } } } */
 
 enum STATE {
   S0=0,
-- 
2.43.0


Re: [PATCH RFA (fold)] c++: remove TREE_STATIC from constexpr heap vars [PR119162]

2025-05-01 Thread Patrick Palka
On Mon, 21 Apr 2025, Jason Merrill wrote:

> Tested x86_64-pc-linux-gnu, OK for trunk?
> 
> -- 8< --
> 
> While working on PR119162 it occurred to me that it would be simpler to
> detect the problem of a value referring to a heap allocation if we stopped
> setting TREE_STATIC on them so they naturally are not considered to have a
> constant address.  With that change we no longer need to specifically avoid
> caching a value that refers to a deleted pointer.
> 
> But with this change maybe_nonzero_address is not sure whether the variable
> could have address zero.  I don't understand why it returns 1 only for
> variables in the current function, rather than all non-symtab decls; an auto
> variable from some other function also won't have address zero.  Maybe this
> made more sense when it was in tree_single_nonzero_warnv_p before r7-5868?
> 
> But assuming there is some reason for the current behavior, this patch only
> changes the handling of non-symtab decls when folding_cxx_constexpr.

FWIW the maybe_nonzero_address change seems to fix PR85944, yay!  Except for
the PR115207 dup, which we still reject with

115207.C:11:18: error: ‘(((const value*)(&)) != ((const value*)(& 
test.array<4>::_items)))’ is not a constant expression
   11 | if (this != &other) {
  | ~^

Could this mean another spot in the middle-end needs relaxing in order to
handle address comparisons of context-less local vars?

> 
>   PR c++/119162
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (find_deleted_heap_var): Remove.
>   (cxx_eval_call_expression): Don't call it.  Don't set TREE_STATIC on
>   heap vars.
>   (cxx_eval_outermost_constant_expr): Don't mess with varpool.
> 
> gcc/ChangeLog:
> 
>   * fold-const.cc (maybe_nonzero_address): Return 1 for non-symtab
>   vars if folding_cxx_constexpr.
> ---
>  gcc/cp/constexpr.cc | 29 -
>  gcc/fold-const.cc   | 25 -
>  2 files changed, 16 insertions(+), 38 deletions(-)
> 
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index 8a11e6265f2..5b7b70f7e65 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -1550,7 +1550,6 @@ static tree cxx_eval_bare_aggregate (const 
> constexpr_ctx *, tree,
>  static tree cxx_fold_indirect_ref (const constexpr_ctx *, location_t, tree, 
> tree,
>  bool * = NULL);
>  static tree find_heap_var_refs (tree *, int *, void *);
> -static tree find_deleted_heap_var (tree *, int *, void *);
>  
>  /* Attempt to evaluate T which represents a call to a builtin function.
> We assume here that all builtin functions evaluate to scalar types
> @@ -2975,14 +2974,6 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
> tree t,
>: heap_uninit_identifier,
>type);
> DECL_ARTIFICIAL (var) = 1;
> -   TREE_STATIC (var) = 1;
> -   // Temporarily register the artificial var in varpool,
> -   // so that comparisons of its address against NULL are folded
> -   // through nonzero_address even with
> -   // -fno-delete-null-pointer-checks or that comparison of
> -   // addresses of different heap artificial vars is folded too.
> -   // See PR98988 and PR99031.
> -   varpool_node::finalize_decl (var);
> ctx->global->heap_vars.safe_push (var);
> ctx->global->put_value (var, NULL_TREE);
> return fold_convert (ptr_type_node, build_address (var));
> @@ -3454,11 +3445,6 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
> tree t,
> cacheable = false;
> break;
>   }
> -   /* And don't cache a ref to a deleted heap variable (119162).  */
> -   if (cacheable
> -   && (cp_walk_tree_without_duplicates
> -   (&result, find_deleted_heap_var, NULL)))
> - cacheable = false;
>   }
>  
>   /* Rewrite all occurrences of the function's RESULT_DECL with the
> @@ -9025,20 +9011,6 @@ find_heap_var_refs (tree *tp, int *walk_subtrees, void 
> */*data*/)
>return NULL_TREE;
>  }
>  
> -/* Look for deleted heap variables in the expression *TP.  */
> -
> -static tree
> -find_deleted_heap_var (tree *tp, int *walk_subtrees, void */*data*/)
> -{
> -  if (VAR_P (*tp)
> -  && DECL_NAME (*tp) == heap_deleted_identifier)
> -return *tp;
> -
> -  if (TYPE_P (*tp))
> -*walk_subtrees = 0;
> -  return NULL_TREE;
> -}
> -
>  /* Find immediate function decls in *TP if any.  */
>  
>  static tree
> @@ -9275,7 +9247,6 @@ cxx_eval_outermost_constant_expr (tree t, bool 
> allow_non_constant,
> r = t;
> non_constant_p = true;
>   }
> -   varpool_node::get (heap_var)->remove ();
>   }
>  }
>  
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index c9471ea44b0..35fcf5087fb 100644
> --- a

Re: [PATCH] sreal.h: fix typo in the comment for sreal::max

2025-05-01 Thread Filip Kastl
On Wed 2025-04-30 08:44:11, Vojtěch Káně wrote:
> ---
>  gcc/sreal.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/sreal.h b/gcc/sreal.h
> index 8700807a131..c5aef1f3a82 100644
> --- a/gcc/sreal.h
> +++ b/gcc/sreal.h
> @@ -118,7 +118,7 @@ public:
>  return min;
>}
>  
> -  /* Global minimum sreal can hold.  */
> +  /* Global maximum sreal can hold.  */
>inline static sreal max ()
>{
>  sreal max;
> -- 
> 2.30.2

Hi Vojta!

I believe that even for small patches you have to include a ChangeLog entry in
the commit message.  See https://gcc.gnu.org/contribute.html#patches.
Basically you run `contrib/gcc-git-customization.sh` on your local repo and
then use `git gcc-commit-mklog` instead of `git commit` which gives you a
template that you then fill out.

Cheers,
Filip


Re: [PATCH v4 2/2] Aarch64: Add __sqrt and __sqrtf intrinsics and corresponding tests

2025-05-01 Thread Kyrylo Tkachov


> On 1 May 2025, at 14:02, Ayan Shafqat  wrote:
> 
> On Thu, May 01, 2025 at 08:09:18AM +, Kyrylo Tkachov wrote:
>> 
>> I was going to ask why not use the standard __builtin_sqrt builtins but I 
>> guess those don’t guarantee that we avoid a libcall in all cases.
>> So this is ok.
>> 
> 
> Yes, __builtin_sqrt will generate calls to sqrt(3) with default compiler
> flags. Only with -fno-math-errno, or -ffast-math, __builtin_sqrt will
> generate sqrt instruction. Therefore, a new instruction was added in
> Aarch64's GCC backend to avoid libc call for sqrt(3).
> 
>> 
>> Do you want me to push the patches for you?
>> 
> 
> Yes, please. Thank you very much for the support and feedback.
> 

Ok, I’ve pushed the two patches to trunk.
Thanks,
Kyrill

> Best Regards,
> Ayan



Re: [GCC16, RFC, V2 01/14] opts: use unsigned HOST_WIDE_INT for sanitizer flags

2025-05-01 Thread Richard Sandiford
Indu Bhagat  writes:
> Currently, the data type of sanitizer flags is unsigned int, with
> SANITIZE_SHADOW_CALL_STACK (1UL << 31) being highest individual
> enumerator for enum sanitize_code.  Use 'unsigned HOST_WIDE_INT' data
> type to allow for more distinct instrumentation modes be added when
> needed.
>
> FIXME:
> 1. Is using d_ulong_type for build_int_cst in gcc/d/d-attribs.cc, and
>uint64_type_node in gcc/c-family/c-attribs.cc OK ? To get type
>associated with unsigned HOST_WIDE_INT ?
>
> gcc/ChangeLog:
>
> * asan.h (sanitize_flags_p): Use 'unsigned HOST_WIDE_INT'
>   instead of 'unsigned int'.
> * common.opt: Likewise.
> * dwarf2asm.cc (dw2_output_indirect_constant_1): Likewise.
> * opts.cc (find_sanitizer_argument): Likewise.
> (report_conflicting_sanitizer_options): Likewise.
> (parse_sanitizer_options): Likewise.
> (parse_no_sanitize_attribute): Likewise.
> * opts.h (parse_sanitizer_options): Likewise.
> (parse_no_sanitize_attribute): Likewise.
> * tree-cfg.cc (print_no_sanitize_attr_value): Likewise.
>
> gcc/c-family/ChangeLog:
>
> * c-attribs.cc (add_no_sanitize_value): Likewise.
> (handle_no_sanitize_attribute): Likewise.
> (handle_no_sanitize_address_attribute): Likewise.
> (handle_no_sanitize_thread_attribute): Likewise.
> (handle_no_address_safety_analysis_attribute): Likewise.
> * c-common.h (add_no_sanitize_value): Likewise.
>
> gcc/c/ChangeLog:
>
> * c-parser.cc (c_parser_declaration_or_fndef): Likewise.
>
> gcc/cp/ChangeLog:
>
> * typeck.cc (get_member_function_from_ptrfunc): Likewise.
>
> gcc/d/ChangeLog:
>
> * d-attribs.cc (d_handle_no_sanitize_attribute): Likewise.

I don't have any good suggestions here.  But if possible, I think
it would be good to introduce a typedef for the flags, rather than
simply hard-coding HOST_WIDE_INT.  It might be even more forward-looking
to use a bbitmap.

Then we could have target-independent functions to convert the flags
to and from an attribute argument.  I would hope that these attribute
arguments are well enough shielded from the usual frontend type system
that we could use build_nonstandard_integer_type.  But this isn't
really my area.
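
Something along these lines, for instance (purely illustrative; the typedef
name is made up):

  /* Hypothetical: a single place to widen the representation later,
     e.g. to a bbitmap, without touching every signature again.  */
  typedef unsigned HOST_WIDE_INT sanitize_code_flags_t;

  extern bool sanitize_flags_p (sanitize_code_flags_t flags, const_tree fn);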

Thanks,
Richard


Re: [PATCH] (not just) AArch64: Fold unsigned ADD + LSR by 1 to UHADD

2025-05-01 Thread Pengfei Li
Thank you for the comments.

> I don't think we can use an unbounded recursive walk, since that
> would become quadratic if we ever used it when optimising one
> AND in a chain of ANDs.  (And using this function for ANDs
> seems plausible.)  Maybe we should be handling the information
> in a similar way to Ranger.

I'm trying to get rid of the recursion by reusing the code in 
get_nonzero_bits().

> Rather than handle the built-in case entirely in target code, how about
> having a target hook into nonzero_element_bits (or whatever replaces it)
> for machine-dependent builtins?

From the perspective of necessity, do you think it's worth checking the
"svand" call inside, or worth handling the whole built-in case?  Operations
with ACLE SVE types can also be folded as long as we use C/C++ general
operators, which has been supported since GCC 15.

Thanks,
Pengfei


[PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-05-01 Thread LIU Hao
Remove `STACK_REALIGN_DEFAULT` for this target, because now the default value of 
`incoming_stack_boundary` equals `MIN_STACK_BOUNDARY` and it doesn't have an effect any more.



--
Best regards,
LIU Hao



From eeb30bf621baa3af1a73e8e91bff297ef478 Mon Sep 17 00:00:00 2001
From: LIU Hao 
Date: Tue, 29 Apr 2025 10:43:06 +0800
Subject: [PATCH] i386/cygming: Decrease default preferred stack boundary for
 32-bit targets

This commit decreases the default preferred stack boundary to 4.

In i386-options.cc, there's

   ix86_default_incoming_stack_boundary = PREFERRED_STACK_BOUNDARY;

which sets the default incoming stack boundary to this value, if it's not
overridden by other options or attributes.

Previously, GCC preferred 16-byte alignment like other platforms, unless
`-miamcu` was specified. However, the Microsoft x86 ABI only requires the
stack be aligned to 4-byte boundaries. Callback functions from MSVC code may
break this assumption by GCC (see reference below), causing local variables
to be misaligned.

For compatibility reasons, when the attribute `force_align_arg_pointer` is
attached to a function, it continues to ensure the stack is at least aligned
to a 16-byte boundary, as the documentation seems to suggest.

After this change, `STACK_REALIGN_DEFAULT` no longer has an effect on this
target, so it is removed.

Reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=07#c9
Signed-off-by: LIU Hao 

gcc/ChangeLog:

PR 07
* config/i386/cygming.h (PREFERRED_STACK_BOUNDARY_DEFAULT): Override
definition from i386.h.
(STACK_REALIGN_DEFAULT): Undefine, as it no longer has an effect.
* config/i386/i386.cc (ix86_update_stack_boundary): Force minimum
128-bit alignment if `force_align_arg_pointer`.

Signed-off-by: LIU Hao 
---
 gcc/config/i386/cygming.h | 9 -
 gcc/config/i386/i386.cc   | 9 +
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 3ddcbecb22fd..743cc38f5852 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -28,16 +28,15 @@ along with GCC; see the file COPYING3.  If not see
 #undef TARGET_SEH
 #define TARGET_SEH  (TARGET_64BIT_MS_ABI && flag_unwind_tables)

+#undef PREFERRED_STACK_BOUNDARY_DEFAULT
+#define PREFERRED_STACK_BOUNDARY_DEFAULT \
+  (TARGET_64BIT ? 128 : MIN_STACK_BOUNDARY)
+
 /* Win64 with SEH cannot represent DRAP stack frames.  Disable its use.
Force the use of different mechanisms to allocate aligned local data.  */
 #undef MAX_STACK_ALIGNMENT
 #define MAX_STACK_ALIGNMENT  (TARGET_SEH ? 128 : MAX_OFILE_ALIGNMENT)

-/* 32-bit Windows aligns the stack on a 4-byte boundary but SSE instructions
-   may require 16-byte alignment.  */
-#undef STACK_REALIGN_DEFAULT
-#define STACK_REALIGN_DEFAULT TARGET_SSE
-
 /* Support hooks for SEH.  */
 #undef  TARGET_ASM_UNWIND_EMIT
 #define TARGET_ASM_UNWIND_EMIT  i386_pe_seh_unwind_emit
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 38df84f7db24..61e2acc53b7d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -7920,6 +7920,15 @@ ix86_update_stack_boundary (void)
   if (ix86_tls_descriptor_calls_expanded_in_cfun
   && crtl->preferred_stack_boundary < 128)
 crtl->preferred_stack_boundary = 128;
+
+  /* For 32-bit MS ABI, both the incoming and preferred stack boundaries
+ are 32 bits, but if force_align_arg_pointer is specified, it should
+ prefer 128 bits for a backward-compatibility reason, which is also
+ what the doc suggests.  */
+  if (lookup_attribute ("force_align_arg_pointer",
+   TYPE_ATTRIBUTES (TREE_TYPE (current_function_decl)))
+  && crtl->preferred_stack_boundary < 128)
+crtl->preferred_stack_boundary = 128;
 }

 /* Handle the TARGET_GET_DRAP_RTX hook.  Return NULL if no DRAP is
--
2.49.0



Re: [PATCH v4 1/2] Aarch64: Use BUILTIN_VHSDF_HSDF for vector and scalar sqrt builtins

2025-05-01 Thread Kyrylo Tkachov



> On 28 Apr 2025, at 21:27, Ayan Shafqat  wrote:
> 
> Rebased with gcc 15.1
> 
> This patch changes the `sqrt` builtin definition from `BUILTIN_VHSDF_DF`
> to `BUILTIN_VHSDF_HSDF` in `aarch64-simd-builtins.def`, ensuring the
> builtin covers half, single, and double precision variants. The redundant
> `VAR1 (UNOP, sqrt, 2, FP, hf)` lines are removed, as they are no longer
> needed now that `BUILTIN_VHSDF_HSDF` handles those cases.
> 

Ok.
Thanks,
Kyrill

> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-simd-builtins.def: Change
> BUILTIN_VHSDF_DF to BUILTIN_VHSDF_HSDF
> 
> Signed-off-by: Ayan Shafqat 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/config/aarch64/aarch64-simd-builtins.def | 5 +
> 1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 6cc45b18a72..685bf0dc408 100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -57,7 +57,7 @@
>   VAR1 (BINOPP, pmull, 0, DEFAULT, v8qi)
>   VAR1 (BINOPP, pmull_hi, 0, DEFAULT, v16qi)
>   BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0, FP)
> -  BUILTIN_VHSDF_DF (UNOP, sqrt, 2, FP)
> +  BUILTIN_VHSDF_HSDF (UNOP, sqrt, 2, FP)
>   BUILTIN_VDQ_I (BINOP, addp, 0, DEFAULT)
>   BUILTIN_VDQ_I (BINOPU, addp, 0, DEFAULT)
>   BUILTIN_VDQ_BHSI (UNOP, clrsb, 2, DEFAULT)
> @@ -848,9 +848,6 @@
>   BUILTIN_VHSDF_HSDF (BINOP_USS, facgt, 0, FP)
>   BUILTIN_VHSDF_HSDF (BINOP_USS, facge, 0, FP)
> 
> -  /* Implemented by sqrt2.  */
> -  VAR1 (UNOP, sqrt, 2, FP, hf)
> -
>   /* Implemented by hf2.  */
>   VAR1 (UNOP, floatdi, 2, FP, hf)
>   VAR1 (UNOP, floatsi, 2, FP, hf)
> -- 
> 2.43.0
> 



Re: [PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-05-01 Thread Richard Biener



> Am 01.05.2025 um 09:53 schrieb Uros Bizjak :
> 
> On Thu, May 1, 2025 at 9:10 AM H.J. Lu  wrote:
>> 
>>> On Thu, May 1, 2025 at 2:56 PM Uros Bizjak  wrote:
>>> 
>>> On Wed, Apr 30, 2025 at 11:31 PM H.J. Lu  wrote:
 
 On Wed, Apr 30, 2025 at 7:37 PM Uros Bizjak  wrote:
> 
> On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
>> 
>> AREG, DREG, CREG and AD_REGS are kept in ix86_class_likely_spilled_p to
>> avoid the following regressions with
>> 
>> $ make check RUNTESTFLAGS="--target_board='unix{-m32,}'"
>> 
>> FAIL: gcc.dg/pr105911.c (internal compiler error: in 
>> lra_split_hard_reg_for, at lra-assigns.cc:1863)
>> FAIL: gcc.dg/pr105911.c (test for excess errors)
>> FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times 
>> vpro[lr]q 29
>> FAIL: gcc.target/i386/bt-7.c scan-assembler-not and[lq][ \t]
>> FAIL: gcc.target/i386/naked-4.c scan-assembler-not %[re]bp
>> FAIL: gcc.target/i386/pr107548-1.c scan-assembler-not addl
>> FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times \tv?movd\t 3
>> FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times v?paddd 6
>> FAIL: gcc.target/i386/pr107548-2.c scan-assembler-not \taddq\t
>> FAIL: gcc.target/i386/pr107548-2.c scan-assembler-times v?paddq 2
>> FAIL: gcc.target/i386/pr119171-1.c (test for excess errors)
>> FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
>> FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
>> FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]andb
>> FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]orb
>> FAIL: gcc.target/i386/pr78904-7b.c scan-assembler-not movzbl
>> FAIL: gcc.target/i386/pr78904-7b.c scan-assembler [ \t]orb
>> FAIL: gcc.target/i386/pr91188-2c.c scan-assembler [ \t]andw
>> 
>> Tested with glibc master branch at
>> 
>> commit ccdb68e829a31e4cda8339ea0d2dc3e51fb81ba5
>> Author: Samuel Thibault 
>> Date:   Sun Mar 2 15:16:45 2025 +0100
>> 
>>htl: move pthread_once into libc
>> 
>> and built Linux kernel 6.13.5 on x86-64.
>> 
>>PR target/119083
>>* config/i386/i386.cc (ix86_class_likely_spilled_p): Remove CREG
>>and BREG.
> 
> The commit message doesn't reflect what the patch does.
> 
> OTOH, this is a very delicate part of the compiler. You are risking RA
> failures, the risk/benefit ratio is very high, so I wouldn't touch it
> without clear benefits. Do you have a concrete example where declaring
> BREG as spilled hurts?
> 
 
 I can find a testcase to show the improvement.  But I am not sure if
 it is what you were asking for.  ix86_class_likely_spilled_p was
 CLASS_LIKELY_SPILLED_P which was added by
 
 commit f5316dfe88b8d1b8d3012c1f75349edf2ba1bdde
 Author: Michael Meissner 
 Date:   Thu Sep 8 17:59:18 1994 +
 
Add support for -mreg-alloc=
 
 This option is long gone and there is no test in GCC testsuite to show
 that BX should be in ix86_class_likely_spilled_p.  On x86-64, BX is just
 another callee-saved register, not different from R12.  On i386, BX is
 used as the PIC register.  But today RA may pick a different register if
 PLT isn't involved.  This patch gives RA a little bit more freedom.
>>> 
>>> In the past *decades* CLASS_LIKELY_SPILLED_P was repurposed to signal
>>> the compiler that some extra care is needed with listed classes. On
>>> i386 and x86_64 these include single register classes that represent
>>> "architectural" registers (registers with assigned role). The compiler
>>> does take care to not extend life times of CLASS_LIKELY_SPILLED_P
>>> classes too much to avoid reload failures in cases where instruction
>>> with C_L_S_P class (e.g. shifts with %cl register) is emitted between
>>> unrelated register def and use.
>> 
>> But there is no assigned role for RBX.
> 
> If no instruction uses "b" constraint and BREG class is not otherwise
> used for x86_64, then CLASS_LIKELY_SPILLED_P change is irrelevant for
> x86_64.

There might be a minor difference in instruction size when rbx is chosen over
r9-r15, assuming reg alloc order prefers lower regs otherwise, of course.

Richard 

> 
> Uros.


Re: [PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-05-01 Thread Uros Bizjak
On Thu, May 1, 2025 at 9:10 AM H.J. Lu  wrote:
>
> On Thu, May 1, 2025 at 2:56 PM Uros Bizjak  wrote:
> >
> > On Wed, Apr 30, 2025 at 11:31 PM H.J. Lu  wrote:
> > >
> > > On Wed, Apr 30, 2025 at 7:37 PM Uros Bizjak  wrote:
> > > >
> > > > On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu  wrote:
> > > > >
> > > > > AREG, DREG, CREG and AD_REGS are kept in ix86_class_likely_spilled_p 
> > > > > to
> > > > > avoid the following regressions with
> > > > >
> > > > > $ make check RUNTESTFLAGS="--target_board='unix{-m32,}'"
> > > > >
> > > > > FAIL: gcc.dg/pr105911.c (internal compiler error: in 
> > > > > lra_split_hard_reg_for, at lra-assigns.cc:1863)
> > > > > FAIL: gcc.dg/pr105911.c (test for excess errors)
> > > > > FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times 
> > > > > vpro[lr]q 29
> > > > > FAIL: gcc.target/i386/bt-7.c scan-assembler-not and[lq][ \t]
> > > > > FAIL: gcc.target/i386/naked-4.c scan-assembler-not %[re]bp
> > > > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-not addl
> > > > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times \tv?movd\t 3
> > > > > FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times v?paddd 6
> > > > > FAIL: gcc.target/i386/pr107548-2.c scan-assembler-not \taddq\t
> > > > > FAIL: gcc.target/i386/pr107548-2.c scan-assembler-times v?paddq 2
> > > > > FAIL: gcc.target/i386/pr119171-1.c (test for excess errors)
> > > > > FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
> > > > > FAIL: gcc.target/i386/pr57189.c scan-assembler-not movaps
> > > > > FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]andb
> > > > > FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]orb
> > > > > FAIL: gcc.target/i386/pr78904-7b.c scan-assembler-not movzbl
> > > > > FAIL: gcc.target/i386/pr78904-7b.c scan-assembler [ \t]orb
> > > > > FAIL: gcc.target/i386/pr91188-2c.c scan-assembler [ \t]andw
> > > > >
> > > > > Tested with glibc master branch at
> > > > >
> > > > > commit ccdb68e829a31e4cda8339ea0d2dc3e51fb81ba5
> > > > > Author: Samuel Thibault 
> > > > > Date:   Sun Mar 2 15:16:45 2025 +0100
> > > > >
> > > > > htl: move pthread_once into libc
> > > > >
> > > > > and built Linux kernel 6.13.5 on x86-64.
> > > > >
> > > > > PR target/119083
> > > > > * config/i386/i386.cc (ix86_class_likely_spilled_p): Remove 
> > > > > CREG
> > > > > and BREG.
> > > >
> > > > The commit message doesn't reflect what the patch does.
> > > >
> > > > OTOH, this is a very delicate part of the compiler. You are risking RA
> > > > failures, the risk/benefit ratio is very high, so I wouldn't touch it
> > > > without clear benefits. Do you have a concrete example where declaring
> > > > BREG as spilled hurts?
> > > >
> > >
> > > I can find a testcase to show the improvement.  But I am not sure if
> > > it is what you were asking for.  ix86_class_likely_spilled_p was
> > > CLASS_LIKELY_SPILLED_P which was added by
> > >
> > > commit f5316dfe88b8d1b8d3012c1f75349edf2ba1bdde
> > > Author: Michael Meissner 
> > > Date:   Thu Sep 8 17:59:18 1994 +
> > >
> > > Add support for -mreg-alloc=
> > >
> > > This option is long gone and there is no test in GCC testsuite to show
> > > that BX should be in ix86_class_likely_spilled_p.  On x86-64, BX is just
> > > another callee-saved register, not different from R12.  On i386, BX is
> > > used as the PIC register.  But today RA may pick a different register if
> > > PLT isn't involved.  This patch gives RA a little bit more freedom.
> >
> > In the past *decades* CLASS_LIKELY_SPILLED_P was repurposed to signal
> > the compiler that some extra care is needed with listed classes. On
> > i386 and x86_64 these include single register classes that represent
> > "architectural" registers (registers with assigned role). The compiler
> > does take care to not extend life times of CLASS_LIKELY_SPILLED_P
> > classes too much to avoid reload failures in cases where instruction
> > with C_L_S_P class (e.g. shifts with %cl register) is emitted between
> > unrelated register def and use.
>
> But there is no assigned role for RBX.

If no instruction uses "b" constraint and BREG class is not otherwise
used for x86_64, then CLASS_LIKELY_SPILLED_P change is irrelevant for
x86_64.

Uros.


Re: [PATCH v4 2/2] Aarch64: Add __sqrt and __sqrtf intrinsics and corresponding tests

2025-05-01 Thread Kyrylo Tkachov


> On 28 Apr 2025, at 21:29, Ayan Shafqat  wrote:
> 
> Rebased with gcc 15.1
> 
> This patch introduces two new inline functions, __sqrt and __sqrtf, in
> arm_acle.h for Aarch64 targets. These functions wrap the new builtins
> __builtin_aarch64_sqrtdf and __builtin_aarch64_sqrtsf, respectively,
> providing direct access to hardware instructions without relying on the
> standard math library or optimization levels.
> 
> This patch also introduces acle_sqrt.c in the AArch64 testsuite,
> verifying that the new __sqrt and __sqrtf intrinsics emit the expected
> fsqrt instructions for double and float arguments.
> 
> Coverage for new intrinsics ensures that __sqrt and __sqrtf are
> correctly expanded to hardware instructions and do not fall back to
> library calls, regardless of optimization levels.
> 
> gcc/ChangeLog:
> 
> * config/aarch64/arm_acle.h (__sqrt, __sqrtf): New function.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/acle/acle_sqrt.c: New test.
> 
> Signed-off-by: Ayan Shafqat 
> ---
> gcc/config/aarch64/arm_acle.h | 14 ++
> .../gcc.target/aarch64/acle/acle_sqrt.c   | 19 +++
> 2 files changed, 33 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/acle_sqrt.c
> 
> diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
> index d9e2401ea9f..507b6e72bcb 100644
> --- a/gcc/config/aarch64/arm_acle.h
> +++ b/gcc/config/aarch64/arm_acle.h
> @@ -118,6 +118,20 @@ __revl (unsigned long __value)
> return __rev (__value);
> }
> 
> +__extension__ extern __inline double
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__sqrt (double __x)
> +{
> +  return __builtin_aarch64_sqrtdf (__x);
> +}
> +
> +__extension__ extern __inline float
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__sqrtf (float __x)
> +{
> +  return __builtin_aarch64_sqrtsf (__x);
> +}

I was going to ask why not use the standard __builtin_sqrt builtins but I 
guess those don’t guarantee that we avoid a libcall in all cases.
So this is ok.

Do you want me to push the patches for you?

Thanks,
Kyrill


> +
> #pragma GCC push_options
> #pragma GCC target ("+nothing+jscvt")
> __extension__ extern __inline int32_t
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/acle_sqrt.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/acle_sqrt.c
> new file mode 100644
> index 000..482351fa7e6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/acle_sqrt.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "arm_acle.h"
> +
> +double
> +test_acle_sqrt (double x)
> +{
> +  return __sqrt (x);
> +}
> +
> +float
> +test_acle_sqrtf (float x)
> +{
> +  return __sqrtf (x);
> +}
> +
> +/* { dg-final { scan-assembler-times "fsqrt\td\[0-9\]" 1 } } */
> +/* { dg-final { scan-assembler-times "fsqrt\ts\[0-9\]" 1 } } */
> -- 
> 2.43.0
> 



Re: [PATCH v4 2/2] Aarch64: Add __sqrt and __sqrtf intrinsics and corresponding tests

2025-05-01 Thread Ayan Shafqat
On Thu, May 01, 2025 at 08:09:18AM +, Kyrylo Tkachov wrote:
> 
> I was going to ask why not use the standard __builtin_sqrt builtins but I 
> guess those don’t guarantee that we avoid a libcall in all cases.
> So this is ok.
>

Yes, __builtin_sqrt will generate calls to sqrt(3) with default compiler
flags. Only with -fno-math-errno, or -ffast-math, __builtin_sqrt will
generate sqrt instruction. Therefore, a new instruction was added in
Aarch64's GCC backend to avoid libc call for sqrt(3).
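
To illustrate the difference (a sketch of the behaviour described above, not
part of the patch):

  double
  f (double x)
  {
    /* With default flags this may remain a call to sqrt(3) so that errno
       can be set for negative inputs; with -fno-math-errno or -ffast-math
       it compiles to a single fsqrt instruction.  */
    return __builtin_sqrt (x);
  }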

> 
> Do you want me to push the patches for you?
> 

Yes, please. Thank you very much for the support and feedback.

Best Regards,
Ayan


Re: [PING^5][PATCH] Alpha: Fix base block alignment calculation regression

2025-05-01 Thread Maciej W. Rozycki
On Sat, 19 Apr 2025, Maciej W. Rozycki wrote:

> > > > Address this issue by recursing into COMPONENT_REF tree nodes until the
> > > > outermost one has been reached, which is supposed to be a MEM_REF one,
> > > > accumulating the offset as we go, fixing a commit e0dae4da4c45 ("Alpha:
> > > > Also use tree information to get base block alignment") regression.
> > > 
> > >   Ping for:
> > > .
> > OK.  Clearly this one slipped through the cracks.
> 
>  Good timing, I've applied it now just as I'm about to head away for some 
> holiday time.  I'll take care of the other outstanding stuff in this area 
> once GCC 16 has opened and work on the Linux kernel side meanwhile, when
> I'm back.  Thank you for your review.

 Umm, I have now become aware that GCC 15 had already branched by the time 
of your approval; OK for GCC 15 then as it's a regression fix?

  Maciej


[COMMITTED, OBVIOUS] OpenMP: Restore lost Fortran testcase for 'omp allocate'

2025-05-01 Thread Sandra Loosemore
Tobias asked me to push the attached patch on his behalf.  I verified 
that the testcase still does pass on mainline.  :-)


-Sandra

From 08ce1b9f6707e00089c4d77d2bb82963d531bb1d Mon Sep 17 00:00:00 2001
From: Tobias Burnus 
Date: Thu, 1 May 2025 15:39:42 +
Subject: [COMMITTED, OBVIOUS] OpenMP: Restore lost Fortran testcase for 'omp allocate'

This testcase, which is present on the OG13 and OG14 branches, was
overlooked when the Fortran support for 'omp allocate' was added to
mainline (commit d4b6d147920b93297e621124a99ed01e7e310d92 from
December 2023).

libgomp/ChangeLog

	* testsuite/libgomp.fortran/allocate-8a.f90: New test.
---
 .../testsuite/libgomp.fortran/allocate-8a.f90 | 45 +++
 1 file changed, 45 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-8a.f90

diff --git a/libgomp/testsuite/libgomp.fortran/allocate-8a.f90 b/libgomp/testsuite/libgomp.fortran/allocate-8a.f90
new file mode 100644
index 000..5f6c8c1e271
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/allocate-8a.f90
@@ -0,0 +1,45 @@
+! { dg-additional-options "-fopenmp-allocators" }
+! { dg-additional-options "-fdump-tree-omplower" }
+program main
+  use iso_c_binding
+  use omp_lib
+  implicit none (type, external)
+  integer(omp_allocator_handle_kind):: alloc_h
+  integer :: i, N
+  integer(c_intptr_t) :: intptr
+  integer, allocatable :: A(:)
+  type(omp_alloctrait):: traits(1) = [omp_alloctrait(omp_atk_alignment, 128)]
+
+  N = 10
+  alloc_h = omp_init_allocator(omp_default_mem_space, 1, traits)
+
+  !$omp allocate(A) allocator(alloc_h)
+  allocate(A(N))
+  a(:) = [(i, i=1,N)]
+  if (mod (transfer (loc(a), intptr),128) /= 0) &
+stop 1
+  if (any (a /= [(i, i=1,N)])) &
+stop 2
+  deallocate(A)
+  !$omp allocate(A) allocator(alloc_h) align(512)
+  allocate(A(N))
+  block
+integer, allocatable :: B(:)
+!$omp allocators allocate(allocator(alloc_h), align(256) : B)
+allocate(B(N))
+B(:) = [(2*i, i=1,N)]
+A(:) = B
+if (mod (transfer (loc(B), intptr), 256) /= 0) &
+  stop 1
+! end of scope deallocation
+  end block
+  if (mod (transfer (loc(a), intptr),512) /= 0) &
+stop 1
+  if (any (a /= [(2*i, i=1,N)])) &
+stop 2
+  deallocate(A) ! Must deallocate here - before deallocator is destroyed
+  call omp_destroy_allocator(alloc_h)
+  ! No auto dealloc of A because it is SAVE
+end
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_alloc \\(" 3 "omplower" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_free \\(" 3 "omplower" } }
-- 
2.34.1



Re: [GCC16,RFC,V2 02/14] aarch64: add new define_insn for subg

2025-05-01 Thread Richard Sandiford
Sorry for the late reply.

Indu Bhagat  writes:
> On 4/22/25 11:00 AM, Indu Bhagat wrote:
>> On 4/15/25 8:35 AM, Richard Sandiford wrote:
>>> Hi,
>>>
>>> Indu Bhagat  writes:
 subg (Subtract with Tag) is an Armv8.5-A memory tagging (MTE)
 instruction.  It can be used to subtract an immediate value scaled by
 the tag granule from the address in the source register.

 gcc/ChangeLog:

 * config/aarch64/aarch64.md (subg): New definition.
>>>
>>> In my previous comment about this patch:
>>>
>>>    https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668669.html
>>>
>>> I hadn't realised that the pattern follows the existing "addg" pattern.
>>> But...
>>>
>> 
>> And I just realized that it seems some of my edits to this commit got 
>> lost in a rebase or such in my work branch.
>> 
>> I had addressed the review comments earlier on this patch, but they 
>> didn't make it out in RFC V2.
>> 
 ---
   gcc/config/aarch64/aarch64.md | 17 +
   1 file changed, 17 insertions(+)

 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/ 
 aarch64.md
 index 031e621c98a1..0c7aebb838cd 100644
 --- a/gcc/config/aarch64/aarch64.md
 +++ b/gcc/config/aarch64/aarch64.md
 @@ -8416,6 +8416,23 @@
     [(set_attr "type" "memtag")]
   )
 +(define_insn "subg"
 +  [(set (match_operand:DI 0 "register_operand" "=rk")
 +    (ior:DI
 + (and:DI (minus:DI (match_operand:DI 1 "register_operand" "rk")
 +  (match_operand:DI 2 "aarch64_granule16_uimm6" "i"))
 + (const_int -1080863910568919041)) ;; 0xf0ff...
 + (ashift:DI
 +  (unspec:QI
 +   [(and:QI (lshiftrt:DI (match_dup 1) (const_int 56)) 
 (const_int 15))
 +    (match_operand:QI 3 "aarch64_memtag_tag_offset" "i")]
 +   UNSPEC_GEN_TAG)
 +  (const_int 56]
 +  "TARGET_MEMTAG"
 +  "subg\\t%0, %1, #%2, #%3"
 +  [(set_attr "type" "memtag")]
 +)
 +
   (define_insn "subp"
     [(set (match_operand:DI 0 "register_operand" "=r")
   (minus:DI
>>>
>>> ...subtractions of constants are canonically expressed using (plus ...)
>>> of a negative number, rather than (minus ...) of a positive number.
>>> So I think we should instead add subg to the existing addg pattern.
>>> That is, in:
>>>
>>> (define_insn "addg"
>>>    [(set (match_operand:DI 0 "register_operand" "=rk")
>>> (ior:DI
>>>  (and:DI (plus:DI (match_operand:DI 1 "register_operand" "rk")
>>>   (match_operand:DI 2 "aarch64_granule16_uimm6" "i"))
>>>  (const_int -1080863910568919041)) ;; 0xf0ff...
>>>  (ashift:DI
>>>   (unspec:QI
>>>    [(and:QI (lshiftrt:DI (match_dup 1) (const_int 56)) (const_int 
>>> 15))
>>>     (match_operand:QI 3 "aarch64_memtag_tag_offset" "i")]
>>>    UNSPEC_GEN_TAG)
>>>   (const_int 56]
>>>    "TARGET_MEMTAG"
>>>    "addg\\t%0, %1, #%2, #%3"
>>>    [(set_attr "type" "memtag")]
>>> )
>>>
>>> the aarch64_granule16_uimm6 would be replaced with a predicate that
>>> accepts all multiples of 16 in the range [-1008, 1008].  Then the
>>> output pattern would generate an addg or subg instruction based on
>>> whether operand 2 is negative.
>>>
>> 
>> ;; These constants are used as a const_int in MTE instructions
>> (define_constants
>>    [; 0xf0ff...
>>     ; Tag mask for the 4-bit tag stored in the top 8 bits of a pointer.
>>     (MEMTAG_TAG_MASK -1080863910568919041)])
>> 
>> Above was requested by Andrew.  If what is shown here (both defining the 
>> constant and using it below) in the email is OK with you, I can submit a 
>> separate patch to then use this const in existing irg, ldg..
>> 
>> with predicates.md saying:
>> 
>> (define_predicate "aarch64_granule16_imm6"
>>    (and (match_code "const_int")
>>     (match_test "IN_RANGE (INTVAL (op), -1008, 1008)

The minimum can be -1020 instead of -1008.

>>      && !(INTVAL (op) & 0xf)")))
>> 
>> addg pattern as follows:
>> 
>> (define_insn "addg"
>>    [(set (match_operand:DI 0 "register_operand")
>>      (ior:DI
>>   (and:DI (plus:DI (match_operand:DI 1 "register_operand")
>>    (match_operand:DI 2 "aarch64_granule16_imm6"))
>>   (const_int MEMTAG_TAG_MASK))
>>   (ashift:DI
>>     (unspec:DI [(match_dup 1)
>>     (match_operand:QI 3 "aarch64_memtag_tag_offset")]
>>     UNSPEC_GEN_TAG)
>>    (const_int 56))))]
>>    "TARGET_MEMTAG"
>>    {@ [ cons: =0 , 1  , 2 , 3 ; attrs: type ]
>>   [ rk   , rk , I , I ; memtag   ] addg\t%0, %1, #%2, #%3
>>   [ rk   , rk , J , I ; memtag   ] subg\t%0, %1, #%2, #%3
>>    }
>> )
>> 
>
> Sorry, it should be subg\t%0, %1, #%n2, #%3.

Yeah.

> Also I need to define new Ix and Jx constraints suiting imm6...

They would need to start with something other than "I" or "J",
since the number of characters in a constraint is determined by its first letter.

[pushed] c++: avoid weird #line paths in std-name-hint.h

2025-05-01 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

etags was getting confused by the #line pathnames in std-name-hint.h that
don't match my directory layout; let's avoid encoding information about
a particular developer's $(srcdir) in the generated file.

gcc/cp/ChangeLog:

* Make-lang.in: Don't pass the full path to gperf.
* std-name-hint.h: Regenerate.
---
 gcc/cp/std-name-hint.h | 974 -
 gcc/cp/Make-lang.in|   4 +-
 2 files changed, 489 insertions(+), 489 deletions(-)

diff --git a/gcc/cp/std-name-hint.h b/gcc/cp/std-name-hint.h
index dde2f9bd717..c5f800400ea 100644
--- a/gcc/cp/std-name-hint.h
+++ b/gcc/cp/std-name-hint.h
@@ -144,979 +144,979 @@ std_name_hint_lookup::find (const char *str, size_t len)
 
   static const struct std_name_hint wordlist[] =
 {
-#line 130 "../../src/gcc/cp/std-name-hint.gperf"
+#line 130 "std-name-hint.gperf"
   {"regular", "", cxx20},
-#line 292 "../../src/gcc/cp/std-name-hint.gperf"
+#line 292 "std-name-hint.gperf"
   {"reverse_iterator", "", cxx98},
-#line 454 "../../src/gcc/cp/std-name-hint.gperf"
+#line 454 "std-name-hint.gperf"
   {"range_error", "", cxx98},
-#line 408 "../../src/gcc/cp/std-name-hint.gperf"
+#line 408 "std-name-hint.gperf"
   {"set", "", cxx98},
-#line 231 "../../src/gcc/cp/std-name-hint.gperf"
+#line 231 "std-name-hint.gperf"
   {"setbase", "", cxx98},
-#line 325 "../../src/gcc/cp/std-name-hint.gperf"
+#line 325 "std-name-hint.gperf"
   {"reinterpret_pointer_cast", "", cxx17},
-#line 288 "../../src/gcc/cp/std-name-hint.gperf"
+#line 288 "std-name-hint.gperf"
   {"next", "", cxx11},
-#line 171 "../../src/gcc/cp/std-name-hint.gperf"
+#line 171 "std-name-hint.gperf"
   {"format", "", cxx20},
-#line 181 "../../src/gcc/cp/std-name-hint.gperf"
+#line 181 "std-name-hint.gperf"
   {"formatter", "", cxx20},
-#line 196 "../../src/gcc/cp/std-name-hint.gperf"
+#line 196 "std-name-hint.gperf"
   {"basic_filebuf", "", cxx98},
-#line 575 "../../src/gcc/cp/std-name-hint.gperf"
+#line 575 "std-name-hint.gperf"
   {"pair", "", cxx98},
-#line 276 "../../src/gcc/cp/std-name-hint.gperf"
+#line 276 "std-name-hint.gperf"
   {"begin", "", cxx11},
-#line 179 "../../src/gcc/cp/std-name-hint.gperf"
+#line 179 "std-name-hint.gperf"
   {"formattable", "", cxx23},
-#line 541 "../../src/gcc/cp/std-name-hint.gperf"
+#line 541 "std-name-hint.gperf"
   {"bad_cast", "", cxx98},
-#line 233 "../../src/gcc/cp/std-name-hint.gperf"
+#line 233 "std-name-hint.gperf"
   {"setiosflags", "", cxx98},
-#line 393 "../../src/gcc/cp/std-name-hint.gperf"
+#line 393 "std-name-hint.gperf"
   {"print", "", cxx23},
-#line 221 "../../src/gcc/cp/std-name-hint.gperf"
+#line 221 "std-name-hint.gperf"
   {"promise", "", cxx11},
-#line 581 "../../src/gcc/cp/std-name-hint.gperf"
+#line 581 "std-name-hint.gperf"
   {"bad_variant_access", "", cxx17},
-#line 328 "../../src/gcc/cp/std-name-hint.gperf"
+#line 328 "std-name-hint.gperf"
   {"to_address", "", cxx20},
-#line 420 "../../src/gcc/cp/std-name-hint.gperf"
+#line 420 "std-name-hint.gperf"
   {"basic_spanbuf", "", cxx23},
-#line 106 "../../src/gcc/cp/std-name-hint.gperf"
+#line 106 "std-name-hint.gperf"
   {"same_as", "", cxx20},
-#line 336 "../../src/gcc/cp/std-name-hint.gperf"
+#line 336 "std-name-hint.gperf"
   {"pmr", "", cxx17},
-#line 180 "../../src/gcc/cp/std-name-hint.gperf"
+#line 180 "std-name-hint.gperf"
   {"formatted_size", "", cxx20},
-#line 275 "../../src/gcc/cp/std-name-hint.gperf"
+#line 275 "std-name-hint.gperf"
   {"back_inserter", "", cxx98},
-#line 251 "../../src/gcc/cp/std-name-hint.gperf"
+#line 251 "std-name-hint.gperf"
   {"nouppercase", "", cxx98},
-#line 250 "../../src/gcc/cp/std-name-hint.gperf"
+#line 250 "std-name-hint.gperf"
   {"nounitbuf", "", cxx98},
-#line 433 "../../src/gcc/cp/std-name-hint.gperf"
+#line 433 "std-name-hint.gperf"
   {"basic_stringbuf", "", cxx98},
-#line 592 "../../src/gcc/cp/std-name-hint.gperf"
+#line 592 "std-name-hint.gperf"
   {"vector", "", cxx98},
-#line 246 "../../src/gcc/cp/std-name-hint.gperf"
+#line 246 "std-name-hint.gperf"
   {"noshowbase", "", cxx98},
-#line 219 "../../src/gcc/cp/std-name-hint.gperf"
+#line 219 "std-name-hint.gperf"
   {"future", "", cxx11},
-#line 340 "../../src/gcc/cp/std-name-hint.gperf"
+#line 340 "std-name-hint.gperf"
   {"pmr::new_delete_resource", "", cxx17},
-#line 337 "../../src/gcc/cp/std-name-hint.gperf"
+#line 337 "std-name-hint.gperf"
   {"pmr::get_default_resource", "", cxx17},
-#line 343 "../../src/gcc/cp/std-name-hint.gperf"
+#line 343 "std-name-hint.gperf"
   {"pmr::set_default_resource", "", cxx17},
-#line 455 "../../src/gcc/cp/std-name-hint.gperf"
+#line 455 "std-name-hint.gperf"
   {"runtime_error", "", cxx98},
-#line 516 "../../src/gcc/cp/std-name-hint.gperf"
+#line 516 "std-name-hint.gperf"
   {"tuple", "", cxx11},
-#line 132 "../../src/gcc/cp/std-na

Re: [GCC16, RFC, V2 06/14] opts: doc: aarch64: add new memtag sanitizer

2025-05-01 Thread Richard Sandiford
Indu Bhagat  writes:
> [...]
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index ffde9df85fd3..de651183a703 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17079,6 +17079,14 @@ and @option{-fsanitize=kernel-hwaddress}.
>  To disable instrumentation of builtin functions use
>  @option{--param hwasan-instrument-mem-intrinsics=0}.
>  
> +@item memtag-instrument-stack
> +Enable MTE tagging of statically sized stack-allocated variables.  This kind of
> +code generation is enabled by default when using @option{-fsanitize=memtag}.
> +
> +@item memtag-instrument-allocas
> +Enable MTE tagging of dynamically sized stack-allocated variables.  This kind of
> +code generation is enabled by default when using @option{-fsanitize=memtag}.
> +

Since this is a target-independent parameter, it might be better
to use something more neutral than "MTE", such as "hardware memory
tagging".  Same elsewhere.

> [...]
> diff --git a/gcc/opts.cc b/gcc/opts.cc
> index 86c6691ecec4..00db662c32ef 100644
> --- a/gcc/opts.cc
> +++ b/gcc/opts.cc
> [...]
> @@ -2780,6 +2788,13 @@ common_handle_option (struct gcc_options *opts,
> SET_OPTION_IF_UNSET (opts, opts_set,
>  param_hwasan_instrument_allocas, 0);
>   }
> +  /* Memtag sanitizer implies HWASAN but with tags always generated by the
> +  hardware randomly.  */
> +  if (opts->x_flag_sanitize & SANITIZE_MEMTAG)
> + {
> +   SET_OPTION_IF_UNSET (opts, opts_set,
> +param_hwasan_random_frame_tag, 1);
> + }

Does this have any effect in practice?  The default seems to be 1,
so I would expect this to be a nop.  The pattern elsewhere in the
sanitiser code seems to be to use SET_OPTION_IF_UNSET only to turn
features off.

Thanks,
Richard


Ping: Re: [Stage 1][Middle-end][PATCH v5 0/3] Provide more contexts for -Warray-bounds and -Wstringop-* warning messages

2025-05-01 Thread Qing Zhao
Hi, 

A gentle ping on review of the Middle-end of change of this patch set.
The diagnostic part has been reviewed and approved by David last year already. 

The 4th version of the patch set has been sent for review since Nov 5, 2024. 
Pinged 5 times since then. 

The Linux kernel has been using this feature for a while and has found it very useful.
Kees has been asking about the status of this patch many times.  :-)

We are hoping to get this into GCC16; it should be a nice improvement in general.

Please take a look and let me know whether it's ready for GCC16.

Thanks a lot.

Qing.

For your convenience, the links to the latest version are:

https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680336.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680337.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680339.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680338.html


> On Apr 7, 2025, at 11:04, Qing Zhao  wrote:
> 
> Hi,
> 
> These are the patches for fixing PR109071 for GCC16 stage1:
> 
> Add -fdiagnostics-details to GCC to provide more hints to the
> end users about where the warnings come from, in order to help the user
> locate the exact location in the source code of the specific warnings
> due to compiler optimizations.
> 
> They are based on the following 4th version of the patch, rebased
> on the latest trunk.
> 
> Bootstrapped and regression-tested on both x86 and aarch64.
> 
> Kees and Sam have been using this option for a while in the Linux kernel
> and other applications, and both found it very helpful.
> 
> They asked me several times about the status of this work and hope
> the functionality can be available in GCC as soon as possible.
> 
> The diagnostic part of the patch had been reviewed and approved by
> David already last year. 
> 
> Please review the middle-end part of the change.
> 
> thanks a lot.
> 
> Qing
> 
> ===
> 
> The latest version of(4th version) is:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667613.html
> https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667614.html
> https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667615.html
> https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667616.html
> 
> The major improvements to this patch compared to version 3 are:
> 
> 1. Divide the patch into 3 parts:
>Part 1: Add new data structure move_history, record move_history during
>transformation;
>Part 2: In warning analysis, Use the new move_history to form a rich
>location with a sequence of events, to report more context info
>of the warnings.
>Part 3: Add debugging mechanism for move_history.
> 
> 2. Major change to the above Part 2, completely rewritten based on David's
>   new class lazy_diagnostic_path. 
> 
> 3. Fix all issues identied By Sam;
>   A. fix PR117375 (Bug in tree-ssa-sink.cc);
>   B. documentation clarification;
>   C. Add all the duplicated PRs in the commit comments;
> 
> 4. Bootstrap GCC with the new -fdiagnostics-details on by default (Init (1)).
>   exposed some ICE similar as PR117375 in tree-ssa-sink.cc, fixed.
> 
> Qing Zhao (3):
>  Provide more contexts for -Warray-bounds, -Wstringop-*warning messages
>due to code movements from compiler transformation (Part 1)
>[PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
>  Provide more contexts for -Warray-bounds, -Wstringop-*warning messages
>due to code movements from compiler transformation (Part 2)
>[PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
>  Provide more contexts for -Warray-bounds, -Wstringop-* warning
>messages due to code movements from compiler transformation (Part 3)
>[PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
> 
> gcc/Makefile.in   |   2 +
> gcc/common.opt|   4 +
> gcc/diagnostic-move-history.cc| 332 ++
> gcc/diagnostic-move-history.h |  94 +
> gcc/doc/invoke.texi   |  11 +
> gcc/gimple-array-bounds.cc|  39 ++--
> gcc/gimple-array-bounds.h |   2 +-
> gcc/gimple-iterator.cc|   3 +
> gcc/gimple-pretty-print.cc|   4 +
> gcc/gimple-ssa-isolate-paths.cc   |  21 ++
> gcc/gimple-ssa-warn-access.cc | 131 +++-
> gcc/gimple-ssa-warn-restrict.cc   |  25 ++-
> gcc/move-history-rich-location.cc |  56 +
> gcc/move-history-rich-location.h  |  65 ++
> gcc/testsuite/gcc.dg/pr109071.c   |  43 
> gcc/testsuite/gcc.dg/pr109071_1.c |  36 
> gcc/testsuite/gcc.dg/pr109071_2.c |  50 +
> gcc/testsuite/gcc.dg/pr109071_3.c |  42 
> gcc/testsuite/gcc.dg/pr109071_4.c |  41 
> gcc/testsuite/gcc.dg/pr109071_5.c |  33 +++
> gcc/testsuite/gcc.dg/pr109071_6.c |  49 +
> gcc/testsuite/gcc.dg/pr117375.c   |  13 ++
> gcc/toplev.cc |   3 +
> gcc/tree-ssa-sink.cc  |  65 ++
> gcc/tree-ssa-threadupdate.cc  |  43 
> 25 files changed, 1124

[PATCH v4] libstdc++: Implement C++26 <debugging> features (P2546R5)

2025-05-01 Thread Jonathan Wakely
This includes the P2810R4 (is_debugger_present is_replaceable) changes,
allowing std::is_debugger_present to be replaced by the program.

It would be good to provide a macOS definition of is_debugger_present as
per https://developer.apple.com/library/archive/qa/qa1361/_index.html
but that isn't included in this change.
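
For reference, the QA1361 approach boils down to asking the kernel whether
the current process carries the P_TRACED flag.  A minimal sketch, assuming
the interfaces documented in that Technote (not part of this patch):

  #include <sys/types.h>
  #include <sys/sysctl.h>
  #include <unistd.h>

  static bool
  is_debugger_present_macos ()
  {
    // Ask the kernel for our own kinfo_proc record.
    int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_PID, getpid () };
    struct kinfo_proc info = { };
    size_t size = sizeof info;
    if (sysctl (mib, 4, &info, &size, nullptr, 0) != 0)
      return false;  // be conservative on failure
    // P_TRACED is set while a debugger is attached via ptrace.
    return (info.kp_proc.p_flag & P_TRACED) != 0;
  }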

The src/c++26/debugging.cc file defines a global volatile int which can
be set by debuggers to indicate when they are attached and detached from
a running process. This allows std::is_debugger_present() to give a
reliable answer, and additionally allows a debugger to choose how
std::breakpoint() should behave. Setting the global to a positive value
will cause std::breakpoint() to use that value as an argument to
std::raise, so debuggers that prefer SIGABRT for breakpoints can select
that. By default std::breakpoint() will use a platform-specific action
such as the INT3 instruction on x86, or GCC's __builtin_trap().
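
Concretely, the dispatch described above amounts to something like the
following sketch.  It is simplified from the real src/c++26/debugging.cc
(the variable's exact name, linkage and the set of per-target trap
instructions are implementation details):

  #include <csignal>

  // Debuggers write a positive signal number here; 0 means "use the trap".
  volatile int debugger_signal_for_breakpoint = 0;

  void
  breakpoint_sketch () noexcept
  {
    int sig = debugger_signal_for_breakpoint;
    if (sig > 0)
      std::raise (sig);           // debugger asked for a signal, e.g. SIGABRT
    else
  #if defined(__i386__) || defined(__x86_64__)
      __asm__ volatile ("int3");  // x86 breakpoint instruction
  #else
      __builtin_trap ();
  #endif
  }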

On Linux the std::is_debugger_present() function checks whether the
process is being traced by a process named "gdb", "gdbserver" or
"lldb-server", to try to avoid interpreting other tracing processes
(such as strace) as a debugger. There have been comments suggesting this
isn't desirable and that std::is_debugger_present() should just return
true for any tracing process (which is the case for non-Linux targets
that support the ptrace system call).
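
On Linux that check can be done entirely from /proc.  A rough illustration
of the logic described above (the real file may parse things differently,
e.g. match on the tracer's basename):

  #include <fstream>
  #include <string>

  static bool
  is_debugger_present_sketch ()
  {
    // TracerPid in /proc/self/status is non-zero while we are ptraced.
    std::ifstream status ("/proc/self/status");
    std::string line;
    long tracer = 0;
    while (std::getline (status, line))
      if (line.rfind ("TracerPid:", 0) == 0)
        {
          tracer = std::stol (line.substr (10));
          break;
        }
    if (tracer == 0)
      return false;  // not being traced at all
    // Only treat known debuggers as "present", not e.g. strace.
    std::ifstream comm ("/proc/" + std::to_string (tracer) + "/comm");
    std::string name;
    std::getline (comm, name);
    return name == "gdb" || name == "gdbserver" || name == "lldb-server";
  }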

libstdc++-v3/ChangeLog:

* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Check for facilities needed by <debugging>.
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/version.def (debugging): Add.
* include/bits/version.h: Regenerate.
* include/precompiled/stdc++.h: Add new header.
* src/c++26/Makefile.am: Add new file.
* src/c++26/Makefile.in: Regenerate.
* include/std/debugging: New file.
* src/c++26/debugging.cc: New file.
* testsuite/19_diagnostics/debugging/breakpoint.cc: New test.
* testsuite/19_diagnostics/debugging/breakpoint_if_debugging.cc:
New test.
* testsuite/19_diagnostics/debugging/is_debugger_present.cc: New
test.
* testsuite/19_diagnostics/debugging/is_debugger_present-2.cc:
New test.
---

Patch v4 replaces the attach_debugger(bool) function with a global
variable, debugger_signal_for_breakpoint, which indicates whether a
debugger is present and also lets the debugger choose which signal (if
any) std::breakpoint() should raise.

On the LLVM discourse there were concerns about requiring debuggers to
execute a function in the inferior process, and I got similar feedback
from John DelSignore of TotalView. Setting an int value is simpler, and
what's proposed in this patch also allows customizing std::breakpoint()
for debuggers that would prefer to deal with signals rather than trap
instructions.

Tested x86_64-linux.

 libstdc++-v3/config.h.in  |   9 +
 libstdc++-v3/configure|  22 +++
 libstdc++-v3/configure.ac |   9 +
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/version.def |   8 +
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/precompiled/stdc++.h |   1 +
 libstdc++-v3/include/std/debugging|  77 
 libstdc++-v3/src/c++26/Makefile.am|   4 +-
 libstdc++-v3/src/c++26/Makefile.in|   7 +-
 libstdc++-v3/src/c++26/debugging.cc   | 174 ++
 .../19_diagnostics/debugging/breakpoint.cc|  13 ++
 .../debugging/breakpoint_if_debugging.cc  |  13 ++
 .../debugging/is_debugger_present-2.cc|  19 ++
 .../debugging/is_debugger_present.cc  |  14 ++
 16 files changed, 379 insertions(+), 3 deletions(-)
 create mode 100644 libstdc++-v3/include/std/debugging
 create mode 100644 libstdc++-v3/src/c++26/debugging.cc
 create mode 100644 libstdc++-v3/testsuite/19_diagnostics/debugging/breakpoint.cc
 create mode 100644 libstdc++-v3/testsuite/19_diagnostics/debugging/breakpoint_if_debugging.cc
 create mode 100644 libstdc++-v3/testsuite/19_diagnostics/debugging/is_debugger_present-2.cc
 create mode 100644 libstdc++-v3/testsuite/19_diagnostics/debugging/is_debugger_present.cc

diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in
index ed067717a20..6c468973b93 100644
--- a/libstdc++-v3/config.h.in
+++ b/libstdc++-v3/config.h.in
@@ -70,6 +70,9 @@
 /* Define to 1 if you have the `cosl' function. */
 #undef HAVE_COSL
 
+/* Define to 1 if you have the <debugapi.h> header file. */
+#undef HAVE_DEBUGAPI_H
+
 /* Define to 1 if you have the declaration of `strnlen', and to 0 if you
don't. */
 #undef HAVE_DECL_STRNLEN
@@ -439,6 +442,9 @@
 /* Define to 1 if you have the <sys/param.h> header file. */
 #undef HAVE_SYS_PARAM_H
 
+/* Define to 1 if you have the <sys/ptrace.h> header file. */
+#undef HAVE_SYS_PTRACE_H
+
 /* Define to 1 if you have the <sys/resource.h> header file. */
 #undef HAVE_SYS_RESOURCE_H

Re: [PATCH] phiopt: Remove special case for a sequence after match and simplify for early phiopt

2025-05-01 Thread Richard Biener
On Thu, May 1, 2025 at 6:26 AM Andrew Pinski  wrote:
>
> r16-189-g99aa410f5e0a72 fixed the case where match-and-simplify had an
> extra assignment happening inside the returned sequence.
> phiopt_early_allow had code to work around that issue; that workaround
> can now be removed, simplifying the check to allow only a sequence with
> one MIN/MAX when the outer code is also MIN/MAX.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.

Richard.

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (phiopt_early_allow): Only allow a sequence
> with one statement for MIN/MAX and the op was MIN/MAX.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-ssa-phiopt.cc | 16 ++--
>  1 file changed, 2 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index e27166c55a5..54ecd93495a 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -549,8 +549,7 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
>tree_code code = (tree_code)op.code;
>
>/* For non-empty sequence, only allow one statement
> - except for MIN/MAX, allow max 2 statements,
> - each with MIN/MAX.  */
> + a MIN/MAX and an original MIN/MAX.  */
>if (!gimple_seq_empty_p (seq))
>  {
>if (code == MIN_EXPR || code == MAX_EXPR)
> @@ -565,18 +564,7 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
>   code = gimple_assign_rhs_code (stmt);
>   return code == MIN_EXPR || code == MAX_EXPR;
> }
> -  /* Check to make sure op was already a SSA_NAME.  */
> -  if (code != SSA_NAME)
> -   return false;
> -  if (!gimple_seq_singleton_p (seq))
> -   return false;
> -  gimple *stmt = gimple_seq_first_stmt (seq);
> -  /* Only allow assignments.  */
> -  if (!is_gimple_assign (stmt))
> -   return false;
> -  if (gimple_assign_lhs (stmt) != op.ops[0])
> -   return false;
> -  code = gimple_assign_rhs_code (stmt);
> +  return false;
>  }
>
>switch (code)
> --
> 2.43.0
>


Re: [GCC16,RFC,V2 04/14] aarch64: add new definition for post-index st2g

2025-05-01 Thread Richard Sandiford
Indu Bhagat  writes:
> On 4/15/25 11:52 AM, Richard Sandiford wrote:
>> Indu Bhagat  writes:
>>> Using post-index st2g is a faster way of memory tagging/untagging.
>>> Because a post-index 'st2g tag, [addr], #32' is equivalent to:
>>> stg tag, addr, #0
>>> stg tag, addr, #16
>>> add addr, addr, #32
>>>
>>> TBD:
>>>- Currently generated in the aarch64 backend.  Not sure if this is
>>>  the right way to do it.
>> 
>> If we do go for the "aarch64_granule_memory_operand" approach that
>> I described for patch 3, then that predicate (and the associated constraint)
>> could handle PRE_MODIFY and POST_MODIFY addresseses, which would remove
>> the need for separate patterns.
>> 
>
> I think I understand :)  I will try it out.  I guess one of the unknowns 
> for me is whether the PRE_MODIFY / POST_MODIFY will be generated as 
> expected, even when the involved instructions have an unspec...

I wouldn't expect an ordinary unspec to matter, but yeah, perhaps the
unspec_volatile would cause issues.  If it turns out to be necessary,
we could add a target hook that says that auto inc/dec address adjustments
are allowed for a given insn.

>>>- Also not clear how to weave in the generation of stz2g.
>> 
>> I think stz2g could be:
>> 
>> (set (match_operand:OI 0 "aarch64_granule_memory_operand" "+<new constraint>")
>>   (unspec_volatile:OI
>> [(const_int 0)
>>  (match_operand:DI 1 "register_operand" "rk")]
>> UNSPECV...))
>> 
>
> The question I have is what changes will be necessary to have the 
> compiler DTRT:
>
>i.e. for the zero-init case, instead of
>  stg x1, [x1, #0]
>  str wzr, [x1]
>generate
>  stzg x0, [x0]
>
>Similarly for the value init case, instead of
>  stg x0, [x0, #0]
>  mov w1, 42
>  str w1, [x0]
> generate
>  mov  w1, #42
>  stgp x1, xzr, [x0]
>
> I guess once I have worked out the patterns for above, I should see the 
> combiner in action DTRT, but I dont know for sure if something else in 
> the compiler will also need adjustments for these new MTE insns.

Yeah, I'd expect combine to handle the STZG example.  But the STGP
example might need some changes to aarch64-ldp-fusion.cc.

Thanks,
Richard


Re: [GCC16, RFC, V2 13/14] gcc: aarch64: memtag: update link spec to pass arguments to linker

2025-05-01 Thread Richard Sandiford
Indu Bhagat  writes:
> In context of stack tagging, the AArch64 Memtag ABI Extension to ELF
> specifies the usage of two dynamic tags for the dynamic loader to do the
> necessary tasks:
>   - If DT_AARCH64_MEMTAG_MODE is present, the dynamic loader should (in
> a platform-specific way) enable MTE for the process.
>   - If DT_AARCH64_MEMTAG_STACK is present, the dynamic loader should
> enable tagging for the main stack and thread stacks.
>
> Make changes in the link spec so appropriate command line options can be
> passed to ld.
>
> The two (proposed) command line options added to ld are:
>   -z memtag-mode=
>   -z memtag-stack
>
> On the GCC side, the user can:
>   - Enable MTE stack tagging using -fsanitize=memtag
>   - Select the MTE mode by using -fsanitize-memtag-mode=mode.
>
> TBD:
>  - We need to check explicitly for stack tagging; sanitize(memtag) does
>not appear to be enough, because -fsanitize=memtag will also be used
>for MTE tagging of globals later.  On a related note, clang has two
>explicit options: -fsanitize=memtag-stack and -fsanitize=memtag-globals.

Yeah, I think we should aim for option compatibility with Clang unless
there's a specific reason not to.  In a way, getting rid of the associated
--params is a feature, since --params are supposed to be developer options
that can go away at any time.

Thanks,
Richard

> gcc/ChangeLog:
>
> * config/aarch64/aarch64-linux.h: Update LINUX_TARGET_LINK_SPEC
>   macro.
> * gcc.cc (sanitize_spec_function): Add check for memtag.
>
> ---
> [New in RFC V2]
> ---
>  gcc/config/aarch64/aarch64-linux.h | 4 +++-
>  gcc/gcc.cc | 2 ++
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
> index 116bb4e69f37..a5e5f8bb5ac5 100644
> --- a/gcc/config/aarch64/aarch64-linux.h
> +++ b/gcc/config/aarch64/aarch64-linux.h
> @@ -48,7 +48,9 @@
> %{static-pie:-Bstatic -pie --no-dynamic-linker -z text} \
> -X\
> %{mbig-endian:-EB} %{mlittle-endian:-EL} \
> -   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}"
> +   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b} \
> +   %{%:sanitize(memtag):%{!fsanitize-memtag-mode:-z memtag-stack -z memtag-mode=sync}} \
> +   %{%:sanitize(memtag):%{fsanitize-memtag-mode=*:-z memtag-stack -z memtag-mode=%*}}"
>  
>  
>  #define LINK_SPEC LINUX_TARGET_LINK_SPEC AARCH64_ERRATA_LINK_SPEC
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index aac33e91a9a0..5beb793b075c 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -10443,6 +10443,8 @@ sanitize_spec_function (int argc, const char **argv)
>  return (flag_sanitize & SANITIZE_KERNEL_ADDRESS) ? "" : NULL;
>if (strcmp (argv[0], "kernel-hwaddress") == 0)
>  return (flag_sanitize & SANITIZE_KERNEL_HWADDRESS) ? "" : NULL;
> +  if (strcmp (argv[0], "memtag") == 0)
> +return (flag_sanitize & SANITIZE_MEMTAG) ? "" : NULL;
>if (strcmp (argv[0], "thread") == 0)
>  return (flag_sanitize & SANITIZE_THREAD) ? "" : NULL;
>if (strcmp (argv[0], "undefined") == 0)


Re: [PATCH v2 0/1] Fix BZ 119317: named loops (C2y) with debug info

2025-05-01 Thread Jakub Jelinek
On Thu, May 01, 2025 at 04:30:56PM +, Joseph Myers wrote:
> On Thu, 1 May 2025, Christopher Bazley wrote:
> 
> > Changes in v2:
> >  - Fixed a formatting issue.
> >  - Added a test.
> 
> Thanks.  I have no further comments; I hope someone more familiar with 
> debug info handling can review this version.

LGTM as well.

Jakub



Re: [GCC16, RFC, V2 09/14] hwasan: add support for generating MTE instructions for memory tagging

2025-05-01 Thread Richard Sandiford
Indu Bhagat  writes:
> Memory tagging is used for detecting memory safety bugs.  On AArch64, the
> memory tagging extension (MTE) helps in reducing the overheads of memory
> tagging:
>  - CPU: MTE instructions for efficiently tagging and untagging memory.
>  - Memory: New memory type, Normal Tagged Memory, added to the Arm
>Architecture.
>
> The MEMory TAGging (MEMTAG) sanitizer uses the same infrastructure as
> HWASAN.  MEMTAG and HWASAN are both hardware-assisted solutions, and
> rely on the same sanitizer machinery in parts.  So, define new
> constructs that allow MEMTAG and HWASAN to share the infrastructure:
>
>   - hwassist_sanitize_p () is true when either SANITIZE_MEMTAG or
> SANITIZE_HWASAN is true.
>   - hwassist_sanitize_stack_p () is true when hwassist_sanitize_p () is
> true and stack variables are to be sanitized.
>
> MEMTAG and HWASAN do have differences, however, and hence, the need to
> conditionalize using memtag_sanitize_p () in the relevant places. E.g.,
>
>   - Instead of generating the libcall __hwasan_tag_memory, MEMTAG needs
> to invoke the target-specific hook TARGET_MEMTAG_TAG_MEMORY to tag
> memory.  Similar approach can be seen for handling
> handle_builtin_alloca, where instead of doing the gimple
> transformations, target hooks are used.
>
>   - Add a new internal function HWASAN_ALLOCA_POISON to handle
> dynamically allocated stack when MEMTAG sanitizer is enabled. At
> expansion, this allows to, in turn, invoke target-hooks to increment
> tag, and use the generated tag to finally tag the dynamically allocated
> memory.
>
>   - HWASAN uses HWASAN_STACK_BACKGROUND as the background color.  In
> case of MEMTAG, we simply pick stack_pointer_rtx.  See
> hwasan_emit_untag_frame ().
>
>   - HWASAN relies on CFN_HWASAN_CHOOSE_TAG to extract the tag if
> necessary.  For MEMTAG, however, since there is no hardware
> instruction to extract tag from register, we make the implementation
> diverge a bit.
>
> The usual pattern:
> irg x0, x0, x0
> subg    x0, x0, #16, #0
> creates a tag in x0 and so on.  For alloca, we need to apply the
> generated tag to the new sp.  In the absence of an extract-tag insn, the
> implementation in expand_HWASAN_ALLOCA_POISON resorts to invoking irg
> again.

This is probably one of those where the reason will click as soon as
I hit send, but I'm not sure why this is necessary.  Can't the tag be
extracted using normal base Armv8-A instructions?

AIUI, the point of hwasan_frame_tag_offset is to "guarantee" that
adjoining data in the same frame has a different tag.  Wouldn't reusing
IRG be weaker, given the randomness?
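
(For context: the logical tag lives in bits [59:56] of the pointer, so
reading it back needs only base instructions -- a single UBFX, or
LSR+AND.  In C++ terms, sketched outside any particular patch:)

  #include <cstdint>

  // Extract the 4-bit MTE logical tag from a tagged pointer; on AArch64
  // this typically compiles to one ubfx instruction.
  static inline unsigned
  mte_tag_of (const void *p)
  {
    return (unsigned) (((std::uintptr_t) p >> 56) & 0xf);
  }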

> TBD:
>  - Not sure if we really need param_memtag_instrument_mem_intrinsics
>explicitly.
>  - Conditionalizing using hwassist_sanitize_p (), memtag_sanitize_p ()
>etc looks unappetizing in some cases.  Not sure if there is a better
>way.  Is this generally the right thing to do, or is there some
>desirable refactorings.
>  - adding decl to hwasan_stack_var. double check if this is necessary.
>See how we update the RTL for decl at expand_one_stack_var_at. And
>then use the RTL for decl in hwasan_emit_prologue.  Add testcases
>around this.
>  - In hwasan_frame_base (), see if checking for memtag_sanitize_p () for
>force_reg etc is really necessary.  Revisit to see what gives, fix or
>add documentation.
>  - Error out if user specifies stack alloc alignment not a factor of 16 ?

I think it's up to gcc to round up the alignment to STACK_BOUNDARY.
Specifying a smaller alignment doesn't seem like an error; it's
a minimum rather than a maximum.

Thanks,
Richard

>


>
> gcc/ChangeLog:
>
>   * asan.cc (struct hwasan_stack_var):
>   (handle_builtin_stack_restore): Accommodate MEMTAG sanitizer.
>   (handle_builtin_alloca): Expand differently if MEMTAG sanitizer.
>   (get_mem_refs_of_builtin_call): Include MEMTAG along with
>   HWASAN.
>   (memtag_sanitize_stack_p): New definition.
>   (memtag_sanitize_allocas_p): Likewise.
>   (memtag_memintrin): Likewise.
>   (hwassist_sanitize_p): Likewise.
>   (hwassist_sanitize_stack_p): Likewise.
>   (report_error_func): Include MEMTAG along with HWASAN.
>   (build_check_stmt): Likewise.
>   (instrument_derefs): MEMTAG too does not deal with globals yet.
>   (instrument_builtin_call):
>   (maybe_instrument_call): Include MEMTAG along with HWASAN.
>   (asan_expand_mark_ifn): Likewise.
>   (asan_expand_check_ifn): Likewise.
>   (asan_expand_poison_ifn): Expand differently if MEMTAG sanitizer.
>   (asan_instrument):
>   (hwasan_frame_base):
>   (hwasan_record_stack_var):
>   (hwasan_emit_prologue): Expand differently if MEMTAG sanitizer.
>   (hwasan_emit_untag_frame): Likewise.
>   * asan.h (hwasan_record_stack_var):
>   (memtag_sanitize_stack_p): New declaration.
>   (memtag_sanitize_allocas_p): Like

Re: [GCC16,RFC,V2 03/14] aarch64: add new insn definition for st2g

2025-05-01 Thread Richard Sandiford
Indu Bhagat  writes:
> On 4/15/25 9:21 AM, Richard Sandiford wrote:
>> Indu Bhagat  writes:
>>> Store Allocation Tags (st2g) is an Armv8.5-A memory tagging (MTE)
>>> instruction. It stores an allocation tag to two tag granules of memory.
>>>
>>> TBD:
>>>- Not too sure what is the best way to generate the st2g yet; a
>>>  subsequent patch will emit them in one of the target hooks.
>> 
>> Regarding the previous thread about this:
>> 
>>  https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668671.html
>> 
>
> Apologies for this one again slipping through the cracks.
>
>> and your question about whether all types of store tag instructions
>> should be volatile: if we went for that approach, then yeah, I think so.
>> 
>> As I mentioned there, I don't think we should use (unspec ...) memory
>> addresses.
>> 
>> But thinking more about it: can we guarantee that GCC will only use
>> these instruction patterns with base registers that are aligned to
>> 16 bytes?  If so, then perhaps an alternative would be to model
>> them as read-modify-write operations to the whole granule (even though
>> the actual instructions leave normal memory untouched and only change
>> the tags).  That is, rather than:
>> 
>
> I think so. Each stack var is assigned a granule of its own (16 bytes); 
> for alloca, we align explicitly; and each pointer is based off the SP 
> (which is also 16-byte aligned for AAPCS64).  IIRC, FEAT_MTE*  are 
> available in AArch64 only.
>
> BTW, the aarch64.md file has this comment for stg:
> ";; STG doesn't align the address but aborts with alignment fault
> ;; when the address is not 16-byte aligned."

Ah, yeah.  So using a read-modify-write TImode memory operation
should be fine in that case.

> Just curious: the same should be true for all MTE load/store tag 
> instructions, isn't it?  I can't seem to find the above specific 
> information in the ARM MTE instruction specification  "Arm Architecture 
> Reference Manual for A-profile architecture."...

In the pseudocode, this is done in the AArch64.MemTag accesses.  E.g:

AArch64.MemTag[bits(64) address, AccessDescriptor accdesc_in] = bits(4) value
assert accdesc_in.tagaccess && !accdesc_in.tagchecked;

AddressDescriptor memaddrdesc;
AccessDescriptor accdesc = accdesc_in;

constant boolean aligned = IsAligned(address, TAG_GRANULE);

// Stores of allocation tags must be aligned
if !aligned then
constant FaultRecord fault = AlignmentFault(accdesc, address);
AArch64.Abort(fault);

Thanks,
Richard


Re: [PATCH] libgomp: Update SVE test

2025-05-01 Thread Richard Sandiford
Tejas Belagod  writes:
> Fix the udr-sve.c target test to check for the correct results based on the
> OpenMP clauses used.  The test was first written with a misunderstood
> functionality of the reduction clause.
>
> Tested with aarch64-linux-gnu. OK for trunk?
>
> libgomp/ChangeLog:
>
>   * testsuite/libgomp.c-target/aarch64/udr-sve.c: Fix test.

Thanks for the update.  OK if Jakub has no further comments by Monday.

Richard

> ---
>  .../libgomp.c-target/aarch64/udr-sve.c| 58 +++
>  1 file changed, 47 insertions(+), 11 deletions(-)
>
> diff --git a/libgomp/testsuite/libgomp.c-target/aarch64/udr-sve.c b/libgomp/testsuite/libgomp.c-target/aarch64/udr-sve.c
> index 03d93cc44b2..02e02dc04b6 100644
> --- a/libgomp/testsuite/libgomp.c-target/aarch64/udr-sve.c
> +++ b/libgomp/testsuite/libgomp.c-target/aarch64/udr-sve.c
> @@ -9,8 +9,8 @@
>  void __attribute__ ((noipa))
>  parallel_reduction ()
>  {
> -  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
> -  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
> +  int a[8] = {1, 1, 1, 1, 1, 1, 1, 1};
> +  int b[8] = {0, 0, 0, 0, 0, 0, 0, 0};
>svint32_t va = svld1_s32 (svptrue_b32 (), b);
>int i = 0;
>int64_t res;
> @@ -30,8 +30,8 @@ parallel_reduction ()
>  void __attribute__ ((noipa))
>  for_reduction ()
>  {
> -  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
> -  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
> +  int a[8] = {1, 1, 1, 1, 1, 1, 1, 1};
> +  int b[8] = {0, 0, 0, 0, 0, 0, 0, 0};
>svint32_t va = svld1_s32 (svptrue_b32 (), b);
>int j;
>int64_t res;
> @@ -58,13 +58,13 @@ simd_reduction ()
>for (j = 0; j < 8; j++)
>  a[j] = 1;
>  
> -  #pragma omp simd reduction (+:va, i)
> +  #pragma omp simd reduction (+:va)
>for (j = 0; j < 16; j++)
> -va = svld1_s32 (svptrue_b32 (), a);
> +va += svld1_s32 (svptrue_b32 (), a);
>  
>res = svaddv_s32 (svptrue_b32 (), va);
>  
> -  if (res != 8)
> +  if (res != 128)
>  __builtin_abort ();
>  }
>  
> @@ -72,22 +72,57 @@ void __attribute__ ((noipa))
>  inscan_reduction_incl ()
>  {
>svint32_t va = svindex_s32 (0, 0);
> +  int a[8] = {1, 1, 1, 1, 1, 1, 1, 1};
> +  int b[64] = { 0 };
>int j;
>int64_t res = 0;
>  
> -  #pragma omp parallel
> -  #pragma omp for reduction (inscan,+:va) firstprivate (res) lastprivate (res)
> +  #pragma omp parallel for reduction (inscan, +:va)
>for (j = 0; j < 8; j++)
>  {
> -  va = svindex_s32 (1, 0);
> +  va += svld1_s32 (svptrue_b32 (), a);
>#pragma omp scan inclusive (va)
> -  res += svaddv_s32 (svptrue_b32 (), va);
> +  svst1_s32 (svptrue_b32 (), b + j * 8, va);
> +}
> +
> +  res = svaddv_s32 (svptrue_b32 (), va);
> +
> +  if (res != 64)
> +__builtin_abort ();
> +
> +  for (j = 0; j < 64; j+=8)
> +if (b[j] != (j / 8 + 1))
> +  __builtin_abort ();
> +}
> +
> +void __attribute__ ((noipa))
> +inscan_reduction_excl ()
> +{
> +  svint32_t va = svindex_s32 (0, 0);
> +  int a[8] = {1, 1, 1, 1, 1, 1, 1, 1};
> +  int b[64] = { 0 };
> +  int j;
> +  int64_t res = 0;
> +
> +  #pragma omp parallel for reduction (inscan, +:va)
> +  for (j = 0; j < 8; j++)
> +{
> +  svst1_s32 (svptrue_b32 (), b + j * 8, va);
> +  #pragma omp scan exclusive (va)
> +  va += svld1_s32 (svptrue_b32 (), a);
>  }
>  
> +  res = svaddv_s32 (svptrue_b32 (), va);
> +
>if (res != 64)
>  __builtin_abort ();
> +
> +  for (j = 0; j < 64; j+=8)
> +if (b[j] != j / 8)
> +  __builtin_abort ();
>  }
>  
> +
>  int
>  main ()
>  {
> @@ -95,4 +130,5 @@ main ()
>for_reduction ();
>simd_reduction ();
>inscan_reduction_incl ();
> +  inscan_reduction_excl ();
>  }


Re: [PATCH v2 0/1] Fix BZ 119317: named loops (C2y) with debug info

2025-05-01 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Thu, May 01, 2025 at 04:30:56PM +, Joseph Myers wrote:
>> On Thu, 1 May 2025, Christopher Bazley wrote:
>> 
>> > Changes in v2:
>> >  - Fixed a formatting issue.
>> >  - Added a test.
>> 
>> Thanks.  I have no further comments; I hope someone more familiar with 
>> debug info handling can review this version.
>
> LGTM as well.

Thanks for the reviews.  I've pushed to the patch to trunk.

Richard


[PATCH PING] OpenMP: Implement "begin declare variant"

2025-05-01 Thread Sandra Loosemore

Hi all,

The first two pieces of this patch series

https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675464.html

were approved and committed for GCC 15, but the remaining 5 parts are 
still awaiting review.  I was hoping that I could get these in early in 
stage 1, or at least be told early in stage 1 that this is all bogus and 
the patches need to be totally re-worked :-P so that there is time to do 
that work before stage 1 ends.


The linked patches still apply cleanly to mainline head and test without 
problems.


-Sandra


[PATCH] expand: Remove unsignedp argument from get_compare_parts [PR118090]

2025-05-01 Thread Andrew Pinski
While helping Eikansh with a patch to ccmp, it was noticed that the
result stored in the up pointer passed to get_compare_parts is unused
at all call sites.
It has been unused ever since get_compare_parts was added in
r8-1717-gf580a969d7fbab.  It seems nobody noticed it became redundant
once rcode was computed by get_compare_parts itself: in RTL the
signedness is already part of the comparison code.
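
(Illustration of the point above, assuming GCC's internal get_rtx_code
API; tree comparison codes are signedness-neutral, RTX codes are not,
so the RTX code alone is enough:)

  /* Not part of the patch -- just showing that the flag is folded in:  */
  rtx_code signed_lt   = get_rtx_code (LT_EXPR, /*unsignedp=*/0);  /* LT  */
  rtx_code unsigned_lt = get_rtx_code (LT_EXPR, /*unsignedp=*/1);  /* LTU */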

PR middle-end/118090
gcc/ChangeLog:

* ccmp.cc (get_compare_parts): Remove the up argument.
(expand_ccmp_next): Update call to get_compare_parts.
(expand_ccmp_expr_1): Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/ccmp.cc | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
index 67efe7d4f5a..e49bafa85c5 100644
--- a/gcc/ccmp.cc
+++ b/gcc/ccmp.cc
@@ -133,23 +133,22 @@ ccmp_candidate_p (gimple *g, bool outer = false)
 
 /* Extract the comparison we want to do from the tree.  */
 void
-get_compare_parts (tree t, int *up, rtx_code *rcode,
+get_compare_parts (tree t, rtx_code *rcode,
   tree *rhs1, tree *rhs2)
 {
   tree_code code;
   gimple *g = get_gimple_for_ssa_name (t);
   if (g && is_gimple_assign (g))
 {
-  *up = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (g)));
+  int up = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (g)));
   code = gimple_assign_rhs_code (g);
-  *rcode = get_rtx_code (code, *up);
+  *rcode = get_rtx_code (code, up);
   *rhs1 = gimple_assign_rhs1 (g);
   *rhs2 = gimple_assign_rhs2 (g);
 }
   else
 {
   /* If g is not a comparison operator create a compare to zero.  */
-  *up = 1;
   *rcode = NE;
   *rhs1 = t;
   *rhs2 = build_zero_cst (TREE_TYPE (t));
@@ -167,10 +166,9 @@ expand_ccmp_next (tree op, tree_code code, rtx prev,
  rtx_insn **prep_seq, rtx_insn **gen_seq)
 {
   rtx_code rcode;
-  int unsignedp;
   tree rhs1, rhs2;
 
-  get_compare_parts(op, &unsignedp, &rcode, &rhs1, &rhs2);
+  get_compare_parts (op, &rcode, &rhs1, &rhs2);
   return targetm.gen_ccmp_next (prep_seq, gen_seq, prev, rcode,
rhs1, rhs2, get_rtx_code (code, 0));
 }
@@ -204,7 +202,6 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, 
rtx_insn **gen_seq)
 {
   if (ccmp_tree_comparison_p (op1, bb))
{
- int unsignedp0, unsignedp1;
  rtx_code rcode0, rcode1;
  tree logical_op0_rhs1, logical_op0_rhs2;
  tree logical_op1_rhs1, logical_op1_rhs2;
@@ -214,10 +211,10 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, 
rtx_insn **gen_seq)
  unsigned cost1 = MAX_COST;
  unsigned cost2 = MAX_COST;
 
- get_compare_parts (op0, &unsignedp0, &rcode0,
+ get_compare_parts (op0, &rcode0,
 &logical_op0_rhs1, &logical_op0_rhs2);
 
- get_compare_parts (op1, &unsignedp1, &rcode1,
+ get_compare_parts (op1, &rcode1,
 &logical_op1_rhs1, &logical_op1_rhs2);
 
  rtx_insn *prep_seq_1, *gen_seq_1;
-- 
2.43.0



[PATCH] ctf: emit CTF_K_ARRAY for GNU vector types

2025-05-01 Thread Bruce McCulloch
Currently, there is a check in gen_ctf_array_type that prevents GNU vectors
generated by the vector attribute from being emitted (e.g. typedef int v8si
__attribute__ ((vector_size (32)));). Because this check happens in
dwarf2ctf.cc, this prevents GNU vectors from being emitted not only in CTF,
but also in BTF. This is a problem, as there are a handful of GNU vectors
present in the kernel that are not being accurately represented in the
vmlinux.{ctfa,btfa}. Additionally, BTF generated by clang emits these vectors
as arrays.

This patch solves the issue by simply removing the check that prevents
these types from being appropriately emitted. Additionally, a new test is
included that checks for the appropriate asm emission when generating CTF.

gcc/ChangeLog:

* dwarf2ctf.cc (gen_ctf_array_type): Remove check for DW_AT_GNU_vector.

gcc/testsuite/ChangeLog:

* gcc.dg/debug/ctf/ctf-vector.c: New test.


Signed-off-by: Bruce McCulloch 
---
 gcc/dwarf2ctf.cc|  4 ---
 gcc/testsuite/gcc.dg/debug/ctf/ctf-vector.c | 32 +
 2 files changed, 32 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-vector.c

diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc
index fd326b320af..a3497d58504 100644
--- a/gcc/dwarf2ctf.cc
+++ b/gcc/dwarf2ctf.cc
@@ -417,10 +417,6 @@ gen_ctf_array_type (ctf_container_ref ctfc,
   dw_die_ref first, last, array_elems_type;
   ctf_dtdef_ref array_dtd, elem_dtd;
 
-  int vector_type_p = get_AT_flag (array_type, DW_AT_GNU_vector);
-  if (vector_type_p)
-return NULL;
-
   /* Find the first and last array dimension DIEs.  */
   last = dw_get_die_child (array_type);
   first = dw_get_die_sib (last);
diff --git a/gcc/testsuite/gcc.dg/debug/ctf/ctf-vector.c b/gcc/testsuite/gcc.dg/debug/ctf/ctf-vector.c
new file mode 100644
index 000..368046db214
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/ctf/ctf-vector.c
@@ -0,0 +1,32 @@
+/* Tests for CTF SIMD vector type.
+   - Verify that there is a record of:
+ + int
+ + void
+ + int[8] -> int
+ + v8si -> int[8] -> int.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O0 -gctf -dA" } */
+
+/* Check for presence of strings:  */
+/* { dg-final { scan-assembler-times "ascii \"int.0\"\[\t 
\]+\[^\n\]*ctf_string" 1 } } */
+/* { dg-final { scan-assembler-times "ascii \"void.0\"\[\t 
\]+\[^\n\]*ctf_string" 1 } } */
+/* { dg-final { scan-assembler-times "ascii \"v8si.0\"\[\t 
\]+\[^\n\]*ctf_string" 1 } } */
+
+/* Check for information about int.  */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0x600\[ \t\]+\[^\n\]*# 
ctt_info" 2 } } */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0x4\[ \t\]+\[^\n\]*# 
ctt_size or ctt_type" 1 } } */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0x120\[ \t\]+\[^\n\]*# 
ctf_encoding_data" 1 } } */
+
+/* Check for information about void.  */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0\[ \t\]+\[^\n\]*# ctt_size 
or ctt_type" 2 } } */
+
+/* Check for information about int[8] array.  */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0x1200\[ \t\]+\[^\n\]*# 
ctt_info" 1 } } */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0x8\[ \t\]+\[^\n\]*# 
cta_nelems" 1 } } */
+
+/* Check for information about v8si.  */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0x2a00\[ \t\]+\[^\n\]*# 
ctt_info" 1 } } */
+
+typedef int v8si __attribute__ ((vector_size (32)));
+v8si foo;
-- 
2.43.5



Re: [PATCH] x86-64: Don't expand UNSPEC_TLS_LD_BASE to a call

2025-05-01 Thread H.J. Lu
On Wed, Apr 30, 2025 at 7:40 PM Uros Bizjak  wrote:
>
> On Tue, Apr 29, 2025 at 12:22 PM H.J. Lu  wrote:
> >
> > On Tue, Apr 29, 2025 at 5:30 PM Uros Bizjak  wrote:
> > >
> > > On Tue, Apr 29, 2025 at 9:56 AM H.J. Lu  wrote:
> > > >
> > > > Don't expand UNSPEC_TLS_LD_BASE to a call so that the RTL local copy
> > > > propagation pass can eliminate multiple __tls_get_addr calls.
> > >
> > > __tls_get_addr needs to be called with 16-byte aligned stack, I don't
> > > think the compiler will correctly handle required call alignment if
> > > you emit the call without emit_libcall_block.
> >
> > ix86_split_tls_local_dynamic_base_64 generates the same sequence
> > as emit_libcall_block.  Stack alignment is handled by
> >
> > (define_expand "@tls_local_dynamic_base_64_"
> >   [(set (match_operand:P 0 "register_operand")
> > (unspec:P
> >  [(match_operand 1 "constant_call_address_operand")
> >   (reg:P SP_REG)]
> >  UNSPEC_TLS_LD_BASE))]
> >   "TARGET_64BIT"
> >   "ix86_tls_descriptor_calls_expanded_in_cfun = true;")
>
> The above is to align the initial %rsp at the beginning of the
> function. When PUSH instructions in the function misalign %rsp, there
> will be nothing to keep %rsp aligned before the call to
> __tls_get_addr.
>
> We have been bitten by this in the past.
>

True:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

which was fixed by

commit 272325bd6abba598a8f125dab36b626acb648b03
Author: Wei Mi 
Date:   Thu May 8 16:44:52 2014 +

re PR target/58066 (__tls_get_addr is called with misaligned stack on x86-64)

__tls_get_addr doesn't take an argument and my patch still aligns the stack
properly.

-- 
H.J.


Re: [PATCH v2] Change __builtin_unreachable to __builtin_trap if only thing in function [PR109267]

2025-05-01 Thread Andrew Pinski
On Tue, Apr 29, 2025 at 11:49 PM Richard Biener
 wrote:
>
> On Tue, Apr 29, 2025 at 4:25 PM Andrew Pinski  
> wrote:
> >
> > When we have an empty function, things can go wrong with
> > cfi_startproc/cfi_endproc and a few other things like exceptions. So if
> > the only thing the function does is a call to __builtin_unreachable,
> > let's expand that to a __builtin_trap instead. For most targets that
> > is one instruction wide, so it won't hurt things that much, and we get
> > correct behavior for exceptions; some linkers will also be better off for it.
> >
> > The only thing I have a concern about is that some targets still
> > don't define a trap instruction. I tried to emit a nop instead of
> > an abort but that nop is removed during RTL DCE.
> > Should we just push targets to define a trap instead?
> > E.g. BPF, avr and sh are the 3 semi active targets which still don't
> > have a trap defined.
>
> Do any of those targets have the cfi_startproc/cfi_endproc issue
> or exceptions are relevant on those?
>
> I'd say guard this with targetm.have_trap (), there's the chance that
> say on avr the expansion to abort() might fail to link in a
> freestanding environment.
>
> As for the nop, if you mark it volatile does it prevail?

What is interesting is that we are having the same discussion 25 years
later about targets not having a trap defined:
https://gcc.gnu.org/legacy-ml/gcc-patches/2000-01/msg00323.html
I think it is time to stop kicking the can down the road and start
requiring targets to define a trap.

Thanks,
Andrew

>
> > The QOI idea for basic block reorder is recorded as PR 120004.
> >
> > Changes since v1:
> > * v2: Move to final gimple cfg cleanup instead of expand and use
> >   BUILT_IN_UNREACHABLE_TRAP.
> >
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> > PR middle-end/109267
> >
> > gcc/ChangeLog:
> >
> > * tree-cfgcleanup.cc (execute_cleanup_cfg_post_optimizing): If the 
> > first
> > non debug statement in the first (and only) basic block is a call
> > to __builtin_unreachable change it to a call to __builtin_trap.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/pr109267-1.c: New test.
> > * gcc.dg/pr109267-2.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/testsuite/gcc.dg/pr109267-1.c | 14 ++
> >  gcc/testsuite/gcc.dg/pr109267-2.c | 14 ++
> >  gcc/tree-cfgcleanup.cc| 14 ++
> >  3 files changed, 42 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr109267-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/pr109267-2.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/pr109267-1.c 
> > b/gcc/testsuite/gcc.dg/pr109267-1.c
> > new file mode 100644
> > index 000..d6df2c3b49a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr109267-1.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* PR middle-end/109267 */
> > +
> > +int f(void)
> > +{
> > +  __builtin_unreachable();
> > +}
> > +
> > +/* This unreachable should be changed to be a trap. */
> > +
> > +/* { dg-final { scan-tree-dump-times "__builtin_unreachable trap \\\(" 1 
> > "optimized"} } */
> > +/* { dg-final { scan-tree-dump-not "__builtin_unreachable \\\(" 
> > "optimized"} } */
> > diff --git a/gcc/testsuite/gcc.dg/pr109267-2.c 
> > b/gcc/testsuite/gcc.dg/pr109267-2.c
> > new file mode 100644
> > index 000..6cd1419a1e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr109267-2.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* PR middle-end/109267 */
> > +void g(void);
> > +int f(int *t)
> > +{
> > +  g();
> > +  __builtin_unreachable();
> > +}
> > +
> > +/* The unreachable should stay an unreachable. */
> > +/* { dg-final { scan-tree-dump-not "__builtin_unreachable trap \\\(" 
> > "optimized"} } */
> > +/* { dg-final { scan-tree-dump-times "__builtin_unreachable \\\(" 1 
> > "optimized"} } */
> > diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc
> > index 9a8a668e12b..38a62499f93 100644
> > --- a/gcc/tree-cfgcleanup.cc
> > +++ b/gcc/tree-cfgcleanup.cc
> > @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "cgraph.h"
> >  #include "tree-into-ssa.h"
> >  #include "tree-cfgcleanup.h"
> > +#include "target.h"
> >
> >
> >  /* The set of blocks in that at least one of the following changes 
> > happened:
> > @@ -1530,6 +1531,19 @@ execute_cleanup_cfg_post_optimizing (void)
> >cleanup_dead_labels ();
> >if (group_case_labels ())
> >  todo |= TODO_cleanup_cfg;
> > +
> > +  basic_block bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > +  gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb);
> > +  /* If the first (and only) bb and the only non debug
> > + statement is a __builtin_unreachable call, then replace it with a trap
> > + so the function is at least one instruction in size.  */
> > + 

Re: [PATCH] libgomp: Update SVE test

2025-05-01 Thread Jakub Jelinek
On Thu, May 01, 2025 at 08:38:57PM +0100, Richard Sandiford wrote:
> Tejas Belagod  writes:
> > Fix udr-sve.c target test that to check for the correct results based on the
> > OpenMP clauses used.  The test was first written with a misunderstood
> > functionality of the reduction clause.
> >
> > Tested with aarch64-linux-gnu. OK for trunk?
> >
> > libgomp/ChangeLog:
> >
> > * testsuite/libgomp.c-target/aarch64/udr-sve.c: Fix test.
> 
> Thanks for the update.  OK if Jakub has no further comments by Monday.

LGTM.

Jakub



Re: [PATCH][stage1] middle-end/60779 - LTO vs. -fcx-fortran-rules and -fcx-limited-range

2025-05-01 Thread Mark Wielaard
Hi Richard,

On Mon, Apr 28, 2025 at 11:26:50AM +0200, Richard Biener wrote:
> On Tue, Feb 18, 2025 at 1:44 PM Richard Biener  wrote:
> >
> > The following changes how flag_complex_method is managed, working
> > towards recording it in the optimization set so we can stream and
> > restore it per function.
> > -fcx-limited-range are separate recorded options but saving/restoring
> > does not restore flag_complex_method which is later used in the
> > middle-end.
> >
> > The solution is to make -fcx-fortran-rules and -fcx-limited-range
> > aliases of a new -fcx-method= switch that represents flag_complex_method
> > directly so we can save and restore it.
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.  How do
> > we go about documenting Aliased flags?  I'm hoping for test coverage
> > of language-specific defaults.
> >
> > We allowed inlining of -fcx-limited-range into -fno-cx-limited-range
> > (but failed to check -fcx-fortran-rules).  Such inlining would
> > pessimize complex multiplication/division, but I've preserved this
> > behavior and properly based it on flag_complex_method.
> >
> > OK for stage1?
> 
> I have pushed this now.

This breaks the autoregen bot because gcc/common.opt.urls wasn't
regenerated. https://builder.sourceware.org/buildbot/#/builders/269

Fix is trivial:

diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index 00775118e6f..c1085608735 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -502,6 +502,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-fcse-follow-jumps)
 fcse-skip-blocks
 UrlSuffix(gcc/Optimize-Options.html#index-fcse-skip-blocks)
 
+fcx-method=
+UrlSuffix(gcc/Optimize-Options.html#index-fcx-method)
+
 fcx-limited-range
 UrlSuffix(gcc/Optimize-Options.html#index-fcx-limited-range)
 

Cheers,

Mark


[pushed] config-list.mk: Update FreeBSD targets to version 14

2025-05-01 Thread Gerald Pfeifer
FreeBSD 13 has another year of life, but the majority of (deployment and
user) action should be on FreeBSD 14 by now.  Switch config-list.mk
accordingly.

Pushed.

Gerald


contrib:
* config-list.mk: Update FreeBSD targets to version 14.
---
 contrib/config-list.mk | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index 58bb4c5c186..27aabf421b1 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -35,7 +35,7 @@ OPT_IN_LANGUAGES=
 #
 
 LIST = \
-  aarch64-elf aarch64-freebsd13 aarch64-linux-gnu aarch64-rtems \
+  aarch64-elf aarch64-freebsd14 aarch64-linux-gnu aarch64-rtems \
   alpha-linux-gnu alpha-netbsd alpha-openbsd \
   alpha64-dec-vms alpha-dec-vms \
   amdgcn-amdhsa \
@@ -54,7 +54,7 @@ LIST = \
   hppa64-hpux11.3 \
   hppa64-hpux11.0OPT-enable-sjlj-exceptions=yes \
   i686-apple-darwin9 i686-apple-darwin13 i686-apple-darwin17 \
-  i686-freebsd13 i686-kfreebsd-gnu \
+  i686-freebsd14 i686-kfreebsd-gnu \
   i686-netbsdelf9 \
   i686-openbsd i686-elf i686-kopensolaris-gnu i686-gnu \
   i686-pc-linux-gnu i686-pc-msdosdjgpp i686-lynxos i686-nto-qnx \
@@ -82,7 +82,7 @@ LIST = \
   or1k-elf or1k-linux-uclibc or1k-linux-musl or1k-rtems \
   pdp11-aout \
   powerpc-apple-darwin9 powerpc64-apple-darwin9 powerpc-apple-darwin8 \
-  powerpc-freebsd13 powerpc-netbsd \
+  powerpc-freebsd14 powerpc-netbsd \
   powerpc-eabisimaltivec powerpc-eabisim ppc-elf \
   powerpc-eabialtivec powerpc-xilinx-eabi powerpc-eabi \
   powerpc-rtems \
@@ -105,7 +105,7 @@ LIST = \
   vax-netbsdelf visium-elf \
   x86_64-apple-darwin10 x86_64-apple-darwin15 x86_64-apple-darwin21 \
   x86_64-gnu x86_64-pc-linux-gnuOPT-with-fpmath=avx \
-  x86_64-elfOPT-with-fpmath=sse x86_64-freebsd13 x86_64-netbsd \
+  x86_64-elfOPT-with-fpmath=sse x86_64-freebsd14 x86_64-netbsd \
   x86_64-w64-mingw32 \
   x86_64-mingw32OPT-enable-sjlj-exceptions=yes x86_64-rtems \
   xstormy16-elf xtensa-elf \
-- 
2.49.0


Re: [PATCH] vect: Use internal storage for converts for call into supportable_indirect_convert_operation [PR118617]

2025-05-01 Thread H.J. Lu
On Fri, May 2, 2025 at 7:28 AM Andrew Pinski  wrote:
>
> While looking into PR 118616, I noticed that
> supportable_indirect_convert_operation only ever pushes up to 2 elements
> into its vec.  The 2 places which call
> supportable_indirect_convert_operation use an auto_vec without internal
> storage.  Internal storage of 2 elements saves a heap allocation, and
> with it both memory and a little compile time.
>
> Bootstrapped and tested on x86_64-linux-gnu.
>
> PR tree-optimization/118617
> gcc/ChangeLog:
>
> * tree-vect-generic.cc (expand_vector_conversion):
> * tree-vect-stmts.cc (vectorizable_conversion):

Need more descriptions.

> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-vect-generic.cc | 2 +-
>  gcc/tree-vect-stmts.cc   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
> index 80c2d31776b..3c68361870b 100644
> --- a/gcc/tree-vect-generic.cc
> +++ b/gcc/tree-vect-generic.cc
> @@ -1754,7 +1754,7 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
>else if (ret_elt_bits > arg_elt_bits)
>  modifier = WIDEN;
>
> -  auto_vec > converts;
> +  auto_vec, 2> converts;
>if (supportable_indirect_convert_operation (code, ret_type, arg_type,
>   converts))
>  {
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 42b6059520a..537ae6c2f61 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -5706,7 +5706,7 @@ vectorizable_conversion (vec_info *vinfo,
>scalar_mode lhs_mode = SCALAR_TYPE_MODE (lhs_type);
>scalar_mode rhs_mode = SCALAR_TYPE_MODE (rhs_type);
>opt_scalar_mode rhs_mode_iter;
> -  auto_vec > converts;
> +  auto_vec, 2> converts;
>
>/* Supportable by target?  */
>switch (modifier)
> --
> 2.43.0
>


-- 
H.J.


Re: [PATCH] vect: Use internal storage for converts for call into supportable_indirect_convert_operation [PR118617]

2025-05-01 Thread Andrew Pinski
On Thu, May 1, 2025 at 4:28 PM Andrew Pinski  wrote:
>
> While looking into PR 118616, I noticed that
> supportable_indirect_convert_operation only pushes up to 2 into its vec.
> And the 2 places which call supportable_indirect_convert_operation,
> use an auto_vec but without any internal storage. In this case internal
> storage of 2 elements would save both memory and a little compile time.
>
> Bootstrapped and tested on x86_64-linux-gnu.
>
> PR tree-optimization/118617
> gcc/ChangeLog:
>
> * tree-vect-generic.cc (expand_vector_conversion):
> * tree-vect-stmts.cc (vectorizable_conversion):

Whoops, I messed up the ChangeLog; it should read:
* tree-vect-generic.cc (expand_vector_conversion): Have 2 elements as
internal storage for converts.
* tree-vect-stmts.cc (vectorizable_conversion): Likewise.


>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-vect-generic.cc | 2 +-
>  gcc/tree-vect-stmts.cc   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
> index 80c2d31776b..3c68361870b 100644
> --- a/gcc/tree-vect-generic.cc
> +++ b/gcc/tree-vect-generic.cc
> @@ -1754,7 +1754,7 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
>else if (ret_elt_bits > arg_elt_bits)
>  modifier = WIDEN;
>
> -  auto_vec > converts;
> +  auto_vec, 2> converts;
>if (supportable_indirect_convert_operation (code, ret_type, arg_type,
>   converts))
>  {
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 42b6059520a..537ae6c2f61 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -5706,7 +5706,7 @@ vectorizable_conversion (vec_info *vinfo,
>scalar_mode lhs_mode = SCALAR_TYPE_MODE (lhs_type);
>scalar_mode rhs_mode = SCALAR_TYPE_MODE (rhs_type);
>opt_scalar_mode rhs_mode_iter;
> -  auto_vec > converts;
> +  auto_vec, 2> converts;
>
>/* Supportable by target?  */
>switch (modifier)
> --
> 2.43.0
>


[PATCH] vect: Use internal storage for converts for call into supportable_indirect_convert_operation [PR118617]

2025-05-01 Thread Andrew Pinski
While looking into PR 118616, I noticed that
supportable_indirect_convert_operation only pushes up to 2 into its vec.
And the 2 places which call supportable_indirect_convert_operation,
use an auto_vec but without any internal storage. In this case internal
storage of 2 elements would save both memory and a little compile time.
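
As a rough sketch of the idea in plain C (a toy analogue, not GCC's
actual auto_vec template from vec.h): a vector with N inline slots only
falls back to the heap once the element count exceeds N, so with N == 2
and at most 2 conversion steps no allocation ever happens.

```c
#include <stdlib.h>
#include <string.h>

#define N 2  /* inline capacity, mirroring the "2" in the patch */

struct small_vec {
  int inline_buf[N];  /* internal storage, part of the object itself */
  int *data;          /* points at inline_buf or at a heap block */
  size_t len, cap;
};

static void
small_vec_init (struct small_vec *v)
{
  v->data = v->inline_buf;
  v->len = 0;
  v->cap = N;
}

static void
small_vec_push (struct small_vec *v, int x)
{
  if (v->len == v->cap)
    {
      /* Outgrew the inline buffer: move to the heap (error handling
	 omitted in this sketch).  */
      int *p = malloc (2 * v->cap * sizeof (int));
      memcpy (p, v->data, v->len * sizeof (int));
      if (v->data != v->inline_buf)
	free (v->data);
      v->data = p;
      v->cap *= 2;
    }
  v->data[v->len++] = x;
}
```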

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/118617
gcc/ChangeLog:

* tree-vect-generic.cc (expand_vector_conversion):
* tree-vect-stmts.cc (vectorizable_conversion):

Signed-off-by: Andrew Pinski 
---
 gcc/tree-vect-generic.cc | 2 +-
 gcc/tree-vect-stmts.cc   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index 80c2d31776b..3c68361870b 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -1754,7 +1754,7 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
   else if (ret_elt_bits > arg_elt_bits)
 modifier = WIDEN;
 
-  auto_vec > converts;
+  auto_vec, 2> converts;
   if (supportable_indirect_convert_operation (code, ret_type, arg_type,
  converts))
 {
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 42b6059520a..537ae6c2f61 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5706,7 +5706,7 @@ vectorizable_conversion (vec_info *vinfo,
   scalar_mode lhs_mode = SCALAR_TYPE_MODE (lhs_type);
   scalar_mode rhs_mode = SCALAR_TYPE_MODE (rhs_type);
   opt_scalar_mode rhs_mode_iter;
-  auto_vec > converts;
+  auto_vec, 2> converts;
 
   /* Supportable by target?  */
   switch (modifier)
-- 
2.43.0



[PATCH] dwarf2out: Propagate dtprel into the .debug_addr table in resolve_addr_in_expr

2025-05-01 Thread Kyle Huey
For a debugger to display statically-allocated[0] TLS variables, the compiler
must communicate information[1] that can be used in conjunction with knowledge
of the runtime environment[2] to calculate a location for the variable for
each thread. That need gives rise to dw_loc_dtprel in dwarf2out, a flag tracking
whether the location description is dtprel, or relative to the
"dynamic thread pointer". Location descriptions in the .debug_info section for
TLS variables need to be relocated by the static linker accordingly, and
dw_loc_dtprel controls emission of the needed relocations.

This is further complicated by -gsplit-dwarf. -gsplit-dwarf is designed to allow
as much debugging information as possible to bypass the static linker to improve
linking performance. One of the ways that is done is by introducing a layer of
indirection for relocatable values[3]. That gives rise to addr_index_table which
ultimately results in the .debug_addr section.

While the code handling addr_index_table clearly contemplates the existence of
dtprel entries[4], resolve_addr_in_expr does not, and the result is that when
using -gsplit-dwarf the DWARF for TLS variables contains an address[5] rather
than an offset, and debuggers can't work with that.

This is visible on a trivial example. Compile

```
static __thread int tls_var;

int main(void) {
  tls_var = 42;
  return 0;
}
```

with -g and -g -gsplit-dwarf. Run the program under gdb. When examining the
value of tls_var before and after the assignment, -g behaves as one would
expect but -g -gsplit-dwarf does not. If the user is lucky and the miscalculated
address is not mapped, gdb will print "Cannot access memory at address ...".
If the user is unlucky and the miscalculated address is mapped, gdb will simply
give the wrong value. You can further confirm that the issue is the address
calculation by asking gdb for the address of tls_var and comparing that to what
one would expect.[6]

Thankfully this is trivial to fix by modifying resolve_addr_in_expr to propagate
the dtprel character of the location where necessary. gdb begins working as
expected and the diff in the generated assembly is clear.

```
.section.debug_addr,"",@progbits
.long   0x14
.value  0x5
.byte   0x8
.byte   0
 .Ldebug_addr0:
-   .quad   tls_var
+   .long   tls_var@dtpoff, 0
.quad   .LFB0
```

[0] Referring to e.g. __thread as statically-allocated vs. e.g. a
dynamically-allocated pthread_key_create() call.
[1] Generally an offset in a TLS block.
[2] With glibc, provided by libthread_db.so.
[3] Relocatable values are moved to a table in the .debug_addr section, those
values in .debug_info are replaced with special values that look up indexes
in that table, and then the static linker elsewhere assigns a single per-CU
starting index in the .debug_addr section, allowing those special values to
remain permanently fixed and the resulting data to be ignored by the linker.
[4] ate_kind_rtx_dtprel exists, after all, and new_addr_loc_descr does produce
it where appropriate.
[5] e.g. an address in the .tbss/.tdata section.
[6] e.g. on x86-64 by examining %fsbase and the offset in the assembly
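
To make [1] concrete: each thread resolves the same TLS symbol to a
different address, so no single link-time address can describe the
variable; only its offset within the TLS block is fixed. A minimal
demonstration, assuming POSIX threads:

```c
/* Build with -pthread.  Each thread prints a distinct address for
   the same __thread variable.  */
#include <pthread.h>
#include <stdio.h>

static __thread int tls_var;

static void *
show (void *name)
{
  printf ("%s: &tls_var = %p\n", (char *) name, (void *) &tls_var);
  return NULL;
}

int
main (void)
{
  pthread_t t;
  show ("main");
  pthread_create (&t, NULL, show, "child");
  pthread_join (t, NULL);
  return 0;
}
```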

2025-05-01  Kyle Huey  

* dwarf2out.cc (resolve_addr_in_expr): Propagate dtprel into the address
table when appropriate.
---
 gcc/dwarf2out.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 34ffeed86ff..9aecdb9fd5a 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -31068,7 +31068,8 @@ resolve_addr_in_expr (dw_attr_node *a, dw_loc_descr_ref 
loc)
   return false;
 remove_addr_table_entry (loc->dw_loc_oprnd1.val_entry);
loc->dw_loc_oprnd1.val_entry
- = add_addr_table_entry (rtl, ate_kind_rtx);
+ = add_addr_table_entry (rtl, loc->dw_loc_dtprel
+ ? ate_kind_rtx_dtprel : ate_kind_rtx);
   }
break;
   case DW_OP_const4u:
-- 
2.43.0



Re: [PATCH v6 1/2] RISC-V: Add intrinsics support for SiFive Xsfvcp extensions.

2025-05-01 Thread Mark Wielaard
Hi Kito,

Unfortunately this breaks bootstrap for riscv:

../../gcc/gcc/config/riscv/genrvv-type-indexer.cc: In function ‘int main(int, 
const char**)’:
../../gcc/gcc/config/riscv/genrvv-type-indexer.cc:302:7: error: this ‘for’ 
clause does not guard... [-Werror=misleading-indentation]
  302 |   for (unsigned eew : EEW_SIZE_LIST)
  |   ^~~
../../gcc/gcc/config/riscv/genrvv-type-indexer.cc:306:9: note: ...this 
statement, but the latter is misleadingly indented as if it were guarded by the 
‘for’
  306 | fprintf (fp, "  /*X2*/ INVALID,\n");
  | ^~~
cc1plus: all warnings being treated as errors

https://builder.sourceware.org/buildbot/#/builders/338/builds/150

On Wed, Apr 30, 2025 at 05:25:34PM +0800, Kito Cheng wrote:
[...]
> > diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
> > b/gcc/config/riscv/genrvv-type-indexer.cc
> > index 6de23cb6e1c..2fd429ad734 100644
> > --- a/gcc/config/riscv/genrvv-type-indexer.cc
> > +++ b/gcc/config/riscv/genrvv-type-indexer.cc
> > @@ -303,6 +303,8 @@ main (int argc, const char **argv)
> > fprintf (fp, "  /*UNSIGNED_EEW%d_LMUL1_INTERPRET*/ %s,\n", eew,
> >  inttype (eew, LMUL1_LOG2, /* unsigned_p */true).c_str ());
> >
> > +   fprintf (fp, "  /*X2*/ INVALID,\n");
> > +
> >for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
> > {
> >   unsigned multiple_of_lmul = 1 << lmul_log2_offset;

That fprintf line is indented too much.
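
For reference, a minimal reproducer of the diagnostic (a hypothetical
standalone example, not the genrvv code):

```c
/* Compile with -Wmisleading-indentation (enabled by -Wall).  */
int
f (int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += i;
    sum += 1;  /* warned: indented as if guarded by the 'for', but is not */
  return sum;
}
```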

Cheers,

Mark


Re: [PATCH] c: Suppress -Wdeprecated-non-prototype warnings for builtins

2025-05-01 Thread Richard Sandiford
Florian Weimer  writes:
> Builtins defined with BT_FN_INT_VAR etc. show as functions without
> a prototype and trigger the warning.
>
> gcc/c/
>
>   PR c/119950
>   * c-typeck.cc (convert_arguments): Check for built-in
>   function declaration before warning.
>
> gcc/testsuite/
>
>   * gcc.dg/Wdeprecated-non-prototype-5.c: New test.
>
> ---
>  gcc/c/c-typeck.cc  |  3 ++-
>  gcc/testsuite/gcc.dg/Wdeprecated-non-prototype-5.c | 14 ++
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 55d896e02df..f8d1f711513 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -4342,7 +4342,8 @@ convert_arguments (location_t loc, vec 
> arg_loc, tree fntype,
> builtin_typetail = NULL_TREE;
>   }
>  
> -  if (!typetail && parmnum == 0 && !TYPE_NO_NAMED_ARGS_STDARG_P (fntype))
> +  if (!typetail && parmnum == 0 && !TYPE_NO_NAMED_ARGS_STDARG_P (fntype)
> +   && !fndecl_built_in_p (fundecl))

This is causing gcc.c-compile/callind.c and others to segfault on
aarch64, since fundecl can be null for indirect calls.

Thanks,
Richard

>   {
> auto_diagnostic_group d;
> bool warned;
> diff --git a/gcc/testsuite/gcc.dg/Wdeprecated-non-prototype-5.c 
> b/gcc/testsuite/gcc.dg/Wdeprecated-non-prototype-5.c
> new file mode 100644
> index 000..b231a74cebe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/Wdeprecated-non-prototype-5.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wdeprecated-non-prototype" } */
> +
> +static inline
> +int f (int x)
> +{
> +  return __builtin_constant_p (x);
> +}
> +
> +static inline
> +int g (double x)
> +{
> +  return __builtin_isfinite (x);
> +}
>
> base-commit: 0035613389a939043d654aea7a76faae95f69422


[PATCH] ctf: use CTF_FP_LDOUBLE encoding for 128-bit floats

2025-05-01 Thread Bruce McCulloch
Currently, gen_ctf_base_type will obtain the bit_size of a given DWARF DIE based
on the system GCC is compiling for. For DIEs with a DW_ATE_float encoding, this
is used to determine whether to classify a given DIE as a single, double, or
long double. However, on some systems, a long double will not have a bit_size
of 128 (usually it will be 80). This means that a __float128 (_Float128) type
will not be classified as a long double, and will actually be dropped entirely.

This patch makes an explicit check for a float with a bit_size of 128, in order
to handle __float128 types. A test has also been added to ensure that this type
generates appropriate CTF information.

gcc/ChangeLog:

* dwarf2ctf.cc (gen_ctf_base_type): Encode CTF_FP_LDOUBLE if
bit_size == 128.

gcc/testsuite/ChangeLog:

* gcc.dg/debug/ctf/ctf-float128.c: New test.

Signed-off-by: Bruce McCulloch 
---
 gcc/dwarf2ctf.cc  |  2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-float128.c | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-float128.c

diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc
index 7de3696a4d7..fd326b320af 100644
--- a/gcc/dwarf2ctf.cc
+++ b/gcc/dwarf2ctf.cc
@@ -262,7 +262,7 @@ gen_ctf_base_type (ctf_container_ref ctfc, dw_die_ref type)
  ctf_encoding.cte_format = CTF_FP_SINGLE;
else if (bit_size == double_bit_size)
  ctf_encoding.cte_format = CTF_FP_DOUBLE;
-   else if (bit_size == long_double_bit_size)
+   else if ((bit_size == long_double_bit_size) || (bit_size == 128))
  ctf_encoding.cte_format = CTF_FP_LDOUBLE;
else
  /* CTF does not have representation for other types.  Skip them.  */
diff --git a/gcc/testsuite/gcc.dg/debug/ctf/ctf-float128.c 
b/gcc/testsuite/gcc.dg/debug/ctf/ctf-float128.c
new file mode 100644
index 000..50240d2d6cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/ctf/ctf-float128.c
@@ -0,0 +1,11 @@
+/* Tests for CTF __float128 type.
+   - Verify that there is a long double record.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O0 -gctf -dA" } */
+/* { dg-additional-options "-m32" { target { i?86-*-* x86_64-*-* } } }*/
+/* { dg-require-effective-target __float128 } */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0xa00\[ \t\]+\[^\n\]*# 
ctt_info" 1} } */
+/* { dg-final { scan-assembler-times "ascii \"_Float128.0\"\[\t 
\]+\[^\n\]*ctf_string" 1 } } */
+
+__float128 a = 100;
-- 
2.43.5



Re: [GCC16, RFC, V2 01/14] opts: use unsigned HOST_WIDE_INT for sanitizer flags

2025-05-01 Thread Richard Biener
On Thu, May 1, 2025 at 3:40 PM Richard Sandiford
 wrote:
>
> Indu Bhagat  writes:
> > Currently, the data type of sanitizer flags is unsigned int, with
> > SANITIZE_SHADOW_CALL_STACK (1UL << 31) being the highest individual
> > enumerator for enum sanitize_code.  Use 'unsigned HOST_WIDE_INT' data
> > type to allow more distinct instrumentation modes to be added when
> > needed.
> >
> > FIXME:
> > 1. Is using d_ulong_type for build_int_cst in gcc/d/d-attribs.cc, and
> >uint64_type_node in gcc/c-family/c-attribs.cc OK ? To get type
> >associated with unsigned HOST_WIDE_INT ?
> >
> > gcc/ChangeLog:
> >
> > * asan.h (sanitize_flags_p): Use 'unsigned HOST_WIDE_INT'
> >   instead of 'unsigned int'.
> > * common.opt: Likewise.
> > * dwarf2asm.cc (dw2_output_indirect_constant_1): Likewise.
> > * opts.cc (find_sanitizer_argument): Likewise.
> > (report_conflicting_sanitizer_options): Likewise.
> > (parse_sanitizer_options): Likewise.
> > (parse_no_sanitize_attribute): Likewise.
> > * opts.h (parse_sanitizer_options): Likewise.
> > (parse_no_sanitize_attribute): Likewise.
> > * tree-cfg.cc (print_no_sanitize_attr_value): Likewise.
> >
> > gcc/c-family/ChangeLog:
> >
> > * c-attribs.cc (add_no_sanitize_value): Likewise.
> > (handle_no_sanitize_attribute): Likewise.
> > (handle_no_sanitize_address_attribute): Likewise.
> > (handle_no_sanitize_thread_attribute): Likewise.
> > (handle_no_address_safety_analysis_attribute): Likewise.
> > * c-common.h (add_no_sanitize_value): Likewise.
> >
> > gcc/c/ChangeLog:
> >
> > * c-parser.cc (c_parser_declaration_or_fndef): Likewise.
> >
> > gcc/cp/ChangeLog:
> >
> > * typeck.cc (get_member_function_from_ptrfunc): Likewise.
> >
> > gcc/d/ChangeLog:
> >
> > * d-attribs.cc (d_handle_no_sanitize_attribute): Likewise.
>
> I don't have any good suggestions here.  But if possible, I think
> it would be good to introduce a typedef for the flags, rather than
> simply hard-coding HOST_WIDE_INT.  It might be even more forward-looking
> to use a bbitmap.

Also note HOST_WIDE_INT is now always 64bit and we can rely on uint64_t
being present which is much better in a non-tree/RTX (aka legacy) context.
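
A minimal sketch of that direction (all names hypothetical, not from the
patch): spell the mask width once via a typedef, leaving the bits above
31 free for new modes.

```c
#include <stdint.h>

/* Hypothetical flag-mask type; uint64_t per the observation above.  */
typedef uint64_t sanitize_flags_t;

#define SANITIZE_SHADOW_CALL_STACK (((sanitize_flags_t) 1) << 31)
#define SANITIZE_SOME_NEW_MODE	   (((sanitize_flags_t) 1) << 32)

/* Check that every bit in CHECK is set in FLAGS.  */
static inline int
flags_all_set_p (sanitize_flags_t flags, sanitize_flags_t check)
{
  return (flags & check) == check;
}
```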

Richard.

>
> Then we could have target-independent functions to convert the flags
> to and from an attribute argument.  I would hope that these attribute
> arguments are well enough shielded from the usual frontend type system
> that we could use build_nonstandard_integer_type.  But this isn't
> really my area.
>
> Thanks,
> Richard


Re: [PATCH] get_known_nonzero_bits_1 should use wi::bit_and_not [PR118659]

2025-05-01 Thread Richard Biener
On Thu, May 1, 2025 at 5:02 PM Andrew Pinski  wrote:
>
> While looking into bitwise optimizations, I noticed that
> get_known_nonzero_bits_1 does `bm.value () & ~bm.mask ()` which
> is ok except it creates a temporary wide_int. Instead if we
> use wi::bit_and_not, we can avoid the temporary and on some
> targets use the andn/bic instruction.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.

> gcc/ChangeLog:
>
> * tree-ssanames.cc (get_known_nonzero_bits_1): Use
> wi::bit_and_not instead of `a & ~b`.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-ssanames.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
> index d7865f29f0b..de7b9b79f94 100644
> --- a/gcc/tree-ssanames.cc
> +++ b/gcc/tree-ssanames.cc
> @@ -576,7 +576,7 @@ get_known_nonzero_bits_1 (const_tree name)
>if (tmp.undefined_p ())
>  return wi::shwi (0, precision);
>irange_bitmask bm = tmp.get_bitmask ();
> -  return bm.value () & ~bm.mask ();
> +  return wi::bit_and_not (bm.value (), bm.mask ());
>  }
>
>  /* Return a wide_int with known non-zero bits in SSA_NAME
> --
> 2.43.0
>
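
The point generalizes: on a multi-word value type, `a & ~b` first
materializes ~b as a temporary, while a fused and-not folds the
complement into the single combining pass. A toy sketch (hypothetical
type, not GCC's wide_int):

```c
#include <stdint.h>

#define WORDS 4

struct wide_val { uint64_t w[WORDS]; };

/* a & ~b via two passes: wi_not builds a full temporary.  */
static struct wide_val
wi_not (struct wide_val b)
{
  struct wide_val r;
  for (int i = 0; i < WORDS; i++)
    r.w[i] = ~b.w[i];
  return r;
}

static struct wide_val
wi_and (struct wide_val a, struct wide_val b)
{
  struct wide_val r;
  for (int i = 0; i < WORDS; i++)
    r.w[i] = a.w[i] & b.w[i];
  return r;
}

/* Fused form: same result, one pass, no temporary; this shape is also
   what maps directly onto an andn/bic-style instruction.  */
static struct wide_val
wi_and_not (struct wide_val a, struct wide_val b)
{
  struct wide_val r;
  for (int i = 0; i < WORDS; i++)
    r.w[i] = a.w[i] & ~b.w[i];
  return r;
}
```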


Re: [PATCH] expand: Remove unsignedp argument from get_compare_parts [PR118090]

2025-05-01 Thread Richard Biener
On Thu, May 1, 2025 at 9:45 PM Andrew Pinski  wrote:
>
> While helping Eikansh with a patch to ccmp, it was noticed that the
> result stored in the up pointer that gets passed to get_compare_parts
> was unused on all call sites.
> It was always unused since get_compare_parts was added in
> r8-1717-gf580a969d7fbab. It seems it was not noticed that it became unused
> when rcode was set via get_compare_parts, since in RTL the signedness is
> part of the comparison.

OK.

> PR middle-end/118090
> gcc/ChangeLog:
>
> * ccmp.cc (get_compare_parts): Remove the up argument.
> (expand_ccmp_next): Update call to get_compare_parts.
> (expand_ccmp_expr_1): Likewise.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/ccmp.cc | 15 ++-
>  1 file changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> index 67efe7d4f5a..e49bafa85c5 100644
> --- a/gcc/ccmp.cc
> +++ b/gcc/ccmp.cc
> @@ -133,23 +133,22 @@ ccmp_candidate_p (gimple *g, bool outer = false)
>
>  /* Extract the comparison we want to do from the tree.  */
>  void
> -get_compare_parts (tree t, int *up, rtx_code *rcode,
> +get_compare_parts (tree t, rtx_code *rcode,
>tree *rhs1, tree *rhs2)
>  {
>tree_code code;
>gimple *g = get_gimple_for_ssa_name (t);
>if (g && is_gimple_assign (g))
>  {
> -  *up = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (g)));
> +  int up = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (g)));
>code = gimple_assign_rhs_code (g);
> -  *rcode = get_rtx_code (code, *up);
> +  *rcode = get_rtx_code (code, up);
>*rhs1 = gimple_assign_rhs1 (g);
>*rhs2 = gimple_assign_rhs2 (g);
>  }
>else
>  {
>/* If g is not a comparison operator create a compare to zero.  */
> -  *up = 1;
>*rcode = NE;
>*rhs1 = t;
>*rhs2 = build_zero_cst (TREE_TYPE (t));
> @@ -167,10 +166,9 @@ expand_ccmp_next (tree op, tree_code code, rtx prev,
>   rtx_insn **prep_seq, rtx_insn **gen_seq)
>  {
>rtx_code rcode;
> -  int unsignedp;
>tree rhs1, rhs2;
>
> -  get_compare_parts(op, &unsignedp, &rcode, &rhs1, &rhs2);
> +  get_compare_parts (op, &rcode, &rhs1, &rhs2);
>return targetm.gen_ccmp_next (prep_seq, gen_seq, prev, rcode,
> rhs1, rhs2, get_rtx_code (code, 0));
>  }
> @@ -204,7 +202,6 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, 
> rtx_insn **gen_seq)
>  {
>if (ccmp_tree_comparison_p (op1, bb))
> {
> - int unsignedp0, unsignedp1;
>   rtx_code rcode0, rcode1;
>   tree logical_op0_rhs1, logical_op0_rhs2;
>   tree logical_op1_rhs1, logical_op1_rhs2;
> @@ -214,10 +211,10 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, 
> rtx_insn **gen_seq)
>   unsigned cost1 = MAX_COST;
>   unsigned cost2 = MAX_COST;
>
> - get_compare_parts (op0, &unsignedp0, &rcode0,
> + get_compare_parts (op0, &rcode0,
>  &logical_op0_rhs1, &logical_op0_rhs2);
>
> - get_compare_parts (op1, &unsignedp1, &rcode1,
> + get_compare_parts (op1, &rcode1,
>  &logical_op1_rhs1, &logical_op1_rhs2);
>
>   rtx_insn *prep_seq_1, *gen_seq_1;
> --
> 2.43.0
>


Re: [PATCH][stage1] middle-end/60779 - LTO vs. -fcx-fortran-rules and -fcx-limited-range

2025-05-01 Thread Richard Biener
On Thu, May 1, 2025 at 10:35 PM Mark Wielaard  wrote:
>
> Hi Richard,
>
> On Mon, Apr 28, 2025 at 11:26:50AM +0200, Richard Biener wrote:
> > On Tue, Feb 18, 2025 at 1:44 PM Richard Biener  wrote:
> > >
> > > The following changes how flag_complex_method is managed towards
> > > being able to record that in the optimization set so we can stream
> > > and restore it per function.  Currently -fcx-fortran-rules and
> > > -fcx-limited-range are separate recorded options but saving/restoring
> > > does not restore flag_complex_method which is later used in the
> > > middle-end.
> > >
> > > The solution is to make -fcx-fortran-rules and -fcx-limited-range
> > > aliases of a new -fcx-method= switch that represents flag_complex_method
> > > directly so we can save and restore it.
> > >
> > > Bootstrap and regtest running on x86_64-unknown-linux-gnu.  How do
> > > we go about documenting Aliased flags?  I'm hoping for test coverage
> > > of language-specific defaults.
> > >
> > > We allowed inlining of -fcx-limited-range into -fno-cx-limited-range
> > > (but failed to check -fcx-fortran-rules).  Such inlining would
> > > pessimize complex multiplication/division, but I've preserved this
> > > behavior and properly based it on flag_complex_method.
> > >
> > > OK for stage1?
> >
> > I have pushed this now.
>
> This breaks the autoregen bot because gcc/common.opt.urls wasn't
> regenerated. https://builder.sourceware.org/buildbot/#/builders/269

Fixed.

I wonder if we can somehow arrange for the build to complain, like we do
for tm.texi{,.in}.

Richard.

> Fix is trivial:
>
> diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
> index 00775118e6f..c1085608735 100644
> --- a/gcc/common.opt.urls
> +++ b/gcc/common.opt.urls
> @@ -502,6 +502,9 @@ 
> UrlSuffix(gcc/Optimize-Options.html#index-fcse-follow-jumps)
>  fcse-skip-blocks
>  UrlSuffix(gcc/Optimize-Options.html#index-fcse-skip-blocks)
>
> +fcx-method=
> +UrlSuffix(gcc/Optimize-Options.html#index-fcx-method)
> +
>  fcx-limited-range
>  UrlSuffix(gcc/Optimize-Options.html#index-fcx-limited-range)
>
>
> Cheers,
>
> Mark


Re: [PATCH v2] Change __builtin_unreachable to __builtin_trap if only thing in function [PR109267]

2025-05-01 Thread Richard Biener
On Fri, May 2, 2025 at 12:56 AM Andrew Pinski  wrote:
>
> On Tue, Apr 29, 2025 at 11:49 PM Richard Biener
>  wrote:
> >
> > On Tue, Apr 29, 2025 at 4:25 PM Andrew Pinski  
> > wrote:
> > >
> > > When we have an empty function, things can go wrong with
> > > cfi_startproc/cfi_endproc and a few other things like exceptions. So if
> > > the only thing the function does is a call to __builtin_unreachable,
> > > let's expand that to a __builtin_trap instead. For most targets that
> > > is one instruction wide so it won't hurt things that much and we get
> > > correct behavior for exceptions and some linkers will be better for it.
> > >
> > > The only thing I have a concern about is that some targets still
> > > don't define a trap instruction. I tried to emit a nop instead of
> > > an abort but that nop is removed during RTL DCE.
> > > Should we just push targets to define a trap instead?
> > > E.g. BPF, avr and sh are the 3 semi active targets which still don't
> > > have a trap defined.
> >
> > Do any of those targets have the cfi_startproc/cfi_endproc issue
> > or exceptions are relevant on those?
> >
> > I'd say guard this with targetm.have_trap (), there's the chance that
> > say on avr the expansion to abort() might fail to link in a
> > freestanding environment.
> >
> > As for the nop, if you mark it volatile does it prevail?
>
> What is interesting is we are having the same discussion 25 years
> later about targets not having a trap being defined.
> https://gcc.gnu.org/legacy-ml/gcc-patches/2000-01/msg00323.html
> I think it is time to stop kicking the can down the road and start
> requiring targets to define a trap.

I tried to check for SH and could not find a 'trap' instruction.  So I think
we need to suggest somewhat common idioms that targets can use
as replacements? Like division by zero (when that traps), a volatile store
to NULL (when that reliably traps), a jump to a misaligned address,
etc. - and the question then is whether such a thing will always exist,
and if not, whether "halting" is then as good as trapping or if a
call to abort() is required.
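
For instance, the null-store idiom sketched in C, with the explicit
assumption that address 0 is unmapped and the store actually faults on
the target:

```c
/* Possible fallback for targets without a trap instruction.  The
   volatile qualifier keeps the store from being optimized away; the
   loop "halts" if the store does not fault after all.  */
static inline void
fallback_trap (void)
{
  *(volatile int *) 0 = 0;
  for (;;)
    ;
}
```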

But yes, I kind-of agree.

Are there bugzillas for all the targets that do not define a trap?

Richard.

>
> Thanks,
> Andrew
>
> >
> > > The QOI idea for basic block reorder is recorded as PR 120004.
> > >
> > > Changes since v1:
> > > * v2: Move to final gimple cfg cleanup instead of expand and use
> > >   BUILT_IN_UNREACHABLE_TRAP.
> > >
> > > Bootstrapped and tested on x86_64-linux-gnu.
> > >
> > > PR middle-end/109267
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-cfgcleanup.cc (execute_cleanup_cfg_post_optimizing): If 
> > > the first
> > > non debug statement in the first (and only) basic block is a call
> > > to __builtin_unreachable change it to a call to __builtin_trap.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/pr109267-1.c: New test.
> > > * gcc.dg/pr109267-2.c: New test.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > >  gcc/testsuite/gcc.dg/pr109267-1.c | 14 ++
> > >  gcc/testsuite/gcc.dg/pr109267-2.c | 14 ++
> > >  gcc/tree-cfgcleanup.cc| 14 ++
> > >  3 files changed, 42 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.dg/pr109267-1.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/pr109267-2.c
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/pr109267-1.c 
> > > b/gcc/testsuite/gcc.dg/pr109267-1.c
> > > new file mode 100644
> > > index 000..d6df2c3b49a
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr109267-1.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > +
> > > +/* PR middle-end/109267 */
> > > +
> > > +int f(void)
> > > +{
> > > +  __builtin_unreachable();
> > > +}
> > > +
> > > +/* This unreachable should be changed to be a trap. */
> > > +
> > > +/* { dg-final { scan-tree-dump-times "__builtin_unreachable trap \\\(" 1 
> > > "optimized"} } */
> > > +/* { dg-final { scan-tree-dump-not "__builtin_unreachable \\\(" 
> > > "optimized"} } */
> > > diff --git a/gcc/testsuite/gcc.dg/pr109267-2.c 
> > > b/gcc/testsuite/gcc.dg/pr109267-2.c
> > > new file mode 100644
> > > index 000..6cd1419a1e3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr109267-2.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > +
> > > +/* PR middle-end/109267 */
> > > +void g(void);
> > > +int f(int *t)
> > > +{
> > > +  g();
> > > +  __builtin_unreachable();
> > > +}
> > > +
> > > > +/* The unreachable should stay an unreachable. */
> > > +/* { dg-final { scan-tree-dump-not "__builtin_unreachable trap \\\(" 
> > > "optimized"} } */
> > > +/* { dg-final { scan-tree-dump-times "__builtin_unreachable \\\(" 1 
> > > "optimized"} } */
> > > diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc
> > > index 9a8a668e12b..38a62499f93 100644
> > > --- a/gcc/tree-cfgcleanup.cc
> >

Re: [PATCH] vect: Use internal storage for converts for call into supportable_indirect_convert_operation [PR118617]

2025-05-01 Thread Richard Biener
On Fri, May 2, 2025 at 2:01 AM Andrew Pinski  wrote:
>
> On Thu, May 1, 2025 at 4:28 PM Andrew Pinski  wrote:
> >
> > While looking into PR 118616, I noticed that
> > supportable_indirect_convert_operation only pushes up to 2 into its vec.
> > And the 2 places which call supportable_indirect_convert_operation,
> > use an auto_vec but without any internal storage. In this case internal
> > storage of 2 elements would save both memory and a little compile time.
> >
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> > PR tree-optimization/118617
> > gcc/ChangeLog:
> >
> > * tree-vect-generic.cc (expand_vector_conversion):
> > * tree-vect-stmts.cc (vectorizable_conversion):
>
> Whoops, I messed up the ChangeLog; it should read:
> * tree-vect-generic.cc (expand_vector_conversion): Have 2 elements as
> internal storage for converts.
> * tree-vect-stmts.cc (vectorizable_conversion): Likewise.

OK with that fix.

Richard.

>
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/tree-vect-generic.cc | 2 +-
> >  gcc/tree-vect-stmts.cc   | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
> > index 80c2d31776b..3c68361870b 100644
> > --- a/gcc/tree-vect-generic.cc
> > +++ b/gcc/tree-vect-generic.cc
> > @@ -1754,7 +1754,7 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
> >else if (ret_elt_bits > arg_elt_bits)
> >  modifier = WIDEN;
> >
> > -  auto_vec > converts;
> > +  auto_vec, 2> converts;
> >if (supportable_indirect_convert_operation (code, ret_type, arg_type,
> >   converts))
> >  {
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 42b6059520a..537ae6c2f61 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -5706,7 +5706,7 @@ vectorizable_conversion (vec_info *vinfo,
> >scalar_mode lhs_mode = SCALAR_TYPE_MODE (lhs_type);
> >scalar_mode rhs_mode = SCALAR_TYPE_MODE (rhs_type);
> >opt_scalar_mode rhs_mode_iter;
> > -  auto_vec > converts;
> > +  auto_vec, 2> converts;
> >
> >/* Supportable by target?  */
> >switch (modifier)
> > --
> > 2.43.0
> >


Re: [PATCH v2] Change __builtin_unreachable to __builtin_trap if only thing in function [PR109267]

2025-05-01 Thread Andrew Pinski
On Thu, May 1, 2025 at 11:18 PM Richard Biener
 wrote:
>
> On Fri, May 2, 2025 at 12:56 AM Andrew Pinski  wrote:
> >
> > On Tue, Apr 29, 2025 at 11:49 PM Richard Biener
> >  wrote:
> > >
> > > On Tue, Apr 29, 2025 at 4:25 PM Andrew Pinski  
> > > wrote:
> > > >
> > > > When we have an empty function, things can go wrong with
> > > > cfi_startproc/cfi_endproc and a few other things like exceptions. So if
> > > > the only thing the function does is a call to __builtin_unreachable,
> > > > let's expand that to a __builtin_trap instead. For most targets that
> > > > is one instruction wide so it won't hurt things that much and we get
> > > > correct behavior for exceptions and some linkers will be better for it.
> > > >
> > > > The only thing I have a concern about is that some targets still
> > > > don't define a trap instruction. I tried to emit a nop instead of
> > > > an abort but that nop is removed during RTL DCE.
> > > > Should we just push targets to define a trap instead?
> > > > E.g. BPF, avr and sh are the 3 semi active targets which still don't
> > > > have a trap defined.
> > >
> > > Do any of those targets have the cfi_startproc/cfi_endproc issue
> > > or exceptions are relevant on those?
> > >
> > > I'd say guard this with targetm.have_trap (), there's the chance that
> > > say on avr the expansion to abort() might fail to link in a
> > > freestanding environment.
> > >
> > > As for the nop, if you mark it volatile does it prevail?
> >
> > What is interesting is we are having the same discussion 25 years
> > later about targets not having a trap being defined.
> > https://gcc.gnu.org/legacy-ml/gcc-patches/2000-01/msg00323.html
> > I think it is time to stop kicking the can down the road and start
> > requiring targets to define a trap.
>
> I tried to check for SH and could not find a 'trap' instruction.  So I think
> we need to suggest somewhat common idioms that targets can use
> as replacements? Like division by zero (when that traps), a volatile store
> to NULL (when that reliably traps), a jump to a misaligned address,
> etc. - and the question then is whether such a thing will always exist,
> and if not, whether "halting" is then as good as trapping or if a
> call to abort() is required.
>
> But yes, I kind-of agree.
>
> Are there bugzillas for all the targets that do not define a trap?


Only one for sh, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70216 .
I will file one for the other targets tomorrow.

Thanks,
Andrew

>
> Richard.
>
> >
> > Thanks,
> > Andrew
> >
> > >
> > > > The QOI idea for basic block reorder is recorded as PR 120004.
> > > >
> > > > Changes since v1:
> > > > * v2: Move to final gimple cfg cleanup instead of expand and use
> > > >   BUILT_IN_UNREACHABLE_TRAP.
> > > >
> > > > Bootstrapped and tested on x86_64-linux-gnu.
> > > >
> > > > PR middle-end/109267
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * tree-cfgcleanup.cc (execute_cleanup_cfg_post_optimizing): If 
> > > > the first
> > > > non debug statement in the first (and only) basic block is a 
> > > > call
> > > > to __builtin_unreachable change it to a call to __builtin_trap.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/pr109267-1.c: New test.
> > > > * gcc.dg/pr109267-2.c: New test.
> > > >
> > > > Signed-off-by: Andrew Pinski 
> > > > ---
> > > >  gcc/testsuite/gcc.dg/pr109267-1.c | 14 ++
> > > >  gcc/testsuite/gcc.dg/pr109267-2.c | 14 ++
> > > >  gcc/tree-cfgcleanup.cc| 14 ++
> > > >  3 files changed, 42 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/pr109267-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/pr109267-2.c
> > > >
> > > > diff --git a/gcc/testsuite/gcc.dg/pr109267-1.c 
> > > > b/gcc/testsuite/gcc.dg/pr109267-1.c
> > > > new file mode 100644
> > > > index 000..d6df2c3b49a
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr109267-1.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > +
> > > > +/* PR middle-end/109267 */
> > > > +
> > > > +int f(void)
> > > > +{
> > > > +  __builtin_unreachable();
> > > > +}
> > > > +
> > > > +/* This unreachable should be changed to be a trap. */
> > > > +
> > > > +/* { dg-final { scan-tree-dump-times "__builtin_unreachable trap \\\(" 
> > > > 1 "optimized"} } */
> > > > +/* { dg-final { scan-tree-dump-not "__builtin_unreachable \\\(" 
> > > > "optimized"} } */
> > > > diff --git a/gcc/testsuite/gcc.dg/pr109267-2.c 
> > > > b/gcc/testsuite/gcc.dg/pr109267-2.c
> > > > new file mode 100644
> > > > index 000..6cd1419a1e3
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr109267-2.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > +
> > > > +/* PR middle-end/109267 */
> > > > +void g(void);
> > > > +int f(int *t)
> > > > +{
> > > > +  g();
> > > > +

Re: [PATCH] dwarf2out: Propagate dtprel into the .debug_addr table in resolve_addr_in_expr

2025-05-01 Thread Richard Biener
On Fri, May 2, 2025 at 2:14 AM Kyle Huey  wrote:
>
> For a debugger to display statically-allocated[0] TLS variables, the compiler
> must communicate information[1] that can be used in conjunction with knowledge
> of the runtime environment[2] to calculate a location for the variable for
> each thread. That need gives rise to dw_loc_dtprel in dwarf2out, a flag 
> tracking
> whether the location description is dtprel, or relative to the
> "dynamic thread pointer". Location descriptions in the .debug_info section for
> TLS variables need to be relocated by the static linker accordingly, and
> dw_loc_dtprel controls emission of the needed relocations.
>
> This is further complicated by -gsplit-dwarf. -gsplit-dwarf is designed to 
> allow
> as much debugging information as possible to bypass the static linker to 
> improve
> linking performance. One of the ways that is done is by introducing a layer of
> indirection for relocatable values[3]. That gives rise to addr_index_table 
> which
> ultimately results in the .debug_addr section.
>
> While the code handling addr_index_table clearly contemplates the existence of
> dtprel entries[4], resolve_addr_in_expr does not, and the result is that when
> using -gsplit-dwarf the DWARF for TLS variables contains an address[5] rather
> than an offset, and debuggers can't work with that.
>
> This is visible on a trivial example. Compile
>
> ```
> static __thread int tls_var;
>
> int main(void) {
>   tls_var = 42;
>   return 0;
> }
> ```
>
> with -g and -g -gsplit-dwarf. Run the program under gdb. When examining the
> value of tls_var before and after the assignment, -g behaves as one would
> expect but -g -gsplit-dwarf does not. If the user is lucky and the 
> miscalculated
> address is not mapped, gdb will print "Cannot access memory at address ...".
> If the user is unlucky and the miscalculated address is mapped, gdb will 
> simply
> give the wrong value. You can further confirm that the issue is the address
> calculation by asking gdb for the address of tls_var and comparing that to 
> what
> one would expect.[6]
>
> Thankfully this is trivial to fix by modifying resolve_addr_in_expr to 
> propagate
> the dtprel character of the location where necessary. gdb begins working as
> expected and the diff in the generated assembly is clear.
>
> ```
> .section.debug_addr,"",@progbits
> .long   0x14
> .value  0x5
> .byte   0x8
> .byte   0
>  .Ldebug_addr0:
> -   .quad   tls_var
> +   .long   tls_var@dtpoff, 0
> .quad   .LFB0
> ```
>
> [0] Referring to e.g. __thread as statically-allocated vs. e.g. a
> dynamically-allocated pthread_key_create() call.
> [1] Generally an offset in a TLS block.
> [2] With glibc, provided by libthread_db.so.
> [3] Relocatable values are moved to a table in the .debug_addr section, those
> values in .debug_info are replaced with special values that look up 
> indexes
> in that table, and then the static linker elsewhere assigns a single 
> per-CU
> starting index in the .debug_addr section, allowing those special values 
> to
> remain permanently fixed and the resulting data to be ignored by the 
> linker.
> [4] ate_kind_rtx_dtprel exists, after all, and new_addr_loc_descr does produce
> it where appropriate.
> [5] e.g. an address in the .tbss/.tdata section.
> [6] e.g. on x86-64 by examining %fsbase and the offset in the assembly

OK.  Feel free to backport as appropriate.

Thanks,
Richard.

> 2025-05-01  Kyle Huey  
>
> * dwarf2out.cc (resolve_addr_in_expr): Propagate dtprel into the 
> address
> table when appropriate.
> ---
>  gcc/dwarf2out.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index 34ffeed86ff..9aecdb9fd5a 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -31068,7 +31068,8 @@ resolve_addr_in_expr (dw_attr_node *a, 
> dw_loc_descr_ref loc)
>return false;
>  remove_addr_table_entry (loc->dw_loc_oprnd1.val_entry);
> loc->dw_loc_oprnd1.val_entry
> - = add_addr_table_entry (rtl, ate_kind_rtx);
> + = add_addr_table_entry (rtl, loc->dw_loc_dtprel
> + ? ate_kind_rtx_dtprel : ate_kind_rtx);
>}
> break;
>case DW_OP_const4u:
> --
> 2.43.0
>


Re: [PATCH] c: Fix crash in c-typeck.cc convert_arguments with indirect calls

2025-05-01 Thread Richard Biener
On Fri, May 2, 2025 at 5:50 AM Florian Weimer  wrote:
>
> gcc/c/
>
> PR c/120055
> * c-typeck.cc (convert_arguments): Check if fundecl is null
> before checking for builtin function declaration.
>
> ---
>  gcc/c/c-typeck.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index c7a13bf2b2f..9fd568feee3 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -4337,7 +4337,7 @@ convert_arguments (location_t loc, vec 
> arg_loc, tree fntype,
> }
>
>if (!typetail && parmnum == 0 && !TYPE_NO_NAMED_ARGS_STDARG_P (fntype)
> - && !fndecl_built_in_p (fundecl))
> + && fundecl && !fndecl_built_in_p (fundecl))

I think this should be && (!fundecl || !fndecl_built_in_p (fundecl)); otherwise
you also restrict the diagnostic to direct calls, which it wasn't previously.
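
For illustration, a hypothetical reduction of the crashing case (not the
actual testsuite file): an indirect call through a pointer to an
unprototyped function type (C17 and earlier, where () declares no
prototype), for which there is no FUNCTION_DECL and so fundecl is null.

```c
/* fundecl is null here: the callee is a function pointer, not a
   declared function, yet the unprototyped-call path is still taken.  */
void (*fp) ();

void
g (void)
{
  fp (1);
}
```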

Richard.

> {
>   auto_diagnostic_group d;
>   bool warned;
>
> base-commit: fd013e3fe47f2623b581213c6d7c08bdbd5b1614
>


[pushed] c++: add missing -fabi-version docs

2025-05-01 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Looks like I've forgotten to update the docs for -fabi-version for a couple
of my changes.

gcc/ChangeLog:

* doc/invoke.texi: Add -fabi-version detail.
* common.opt: Likewise.
---
 gcc/doc/invoke.texi | 9 +++--
 gcc/common.opt  | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d1925c98c2f..e7a9a03bace 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3006,12 +3006,17 @@ in C++14 and up.
 Version 18, which first appeared in G++ 13, fixes manglings of lambdas
 that have additional context.
 
-Version 19, which first appeared in G++ 14, fixes manglings of structured
-bindings to include ABI tags.
+Version 19, which first appeared in G++ 14, fixes manglings of
+structured bindings to include ABI tags, handling of cv-qualified
+[[no_unique_address]] members, and adds mangling of C++20 constraints
+on function templates.
 
 Version 20, which first appeared in G++ 15, fixes manglings of lambdas
 in static data member initializers.
 
+Version 21, which first appeared in G++ 16, fixes unnecessary captures
+in noexcept lambdas (c++/119764).
+
 See also @option{-Wabi}.
 
 @opindex fabi-compat-version
diff --git a/gcc/common.opt b/gcc/common.opt
index 3edc5907ec8..8a5b69d0767 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1055,7 +1055,7 @@ Driver Undocumented
 ; 20: Fix mangling of lambdas in static data member initializers.
 ; Default in G++ 15.
 ;
-; 21:
+; 21: Fix noexcept lambda capture pruning.
 ; Default in G++ 16.
 ;
 ; Additional positive integers will be assigned as new versions of

base-commit: b6d37ec1dd2a228d94e7b5b438f3aa53684316bc
-- 
2.49.0



[PATCH RFA] i386: -Wabi false positive with indirect call

2025-05-01 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, OK for trunk?

-- 8< --

This warning relies on the TRANSLATION_UNIT_WARN_EMPTY_P flag (set in
cxx_init_decl_processing) to decide whether we want to warn about the GCC 8
empty class parameter passing fix, but in a call through a function pointer
we don't have a translation unit and so complain for any -Wabi flag, even
now long after this was likely to be relevant.

In that situation, let's check the TU for current_function_decl instead.
And if we still can't come up with a TU, default to not warning.

PR c++/60336

gcc/ChangeLog:

* config/i386/i386.cc (ix86_warn_parameter_passing_abi):
If no target, check the current TU.

gcc/testsuite/ChangeLog:

* g++.dg/abi/pr60336-8a.C: New test.
---
 gcc/config/i386/i386.cc   | 11 ---
 gcc/testsuite/g++.dg/abi/pr60336-8a.C | 15 +++
 2 files changed, 23 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/abi/pr60336-8a.C

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 2f840338138..cb348cb9cfb 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -18084,9 +18084,14 @@ ix86_warn_parameter_passing_abi (cumulative_args_t 
cum_v, tree type)
   if (cum->decl && !TREE_PUBLIC (cum->decl))
 return;
 
-  const_tree ctx = get_ultimate_context (cum->decl);
-  if (ctx != NULL_TREE
-  && !TRANSLATION_UNIT_WARN_EMPTY_P (ctx))
+  tree decl = cum->decl;
+  if (!decl)
+/* If we don't know the target, look at the current TU.  */
+decl = current_function_decl;
+
+  const_tree ctx = get_ultimate_context (decl);
+  if (ctx == NULL_TREE
+  || !TRANSLATION_UNIT_WARN_EMPTY_P (ctx))
 return;
 
   /* If the actual size of the type is zero, then there is no change
diff --git a/gcc/testsuite/g++.dg/abi/pr60336-8a.C 
b/gcc/testsuite/g++.dg/abi/pr60336-8a.C
new file mode 100644
index 000..a0518436e0e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/pr60336-8a.C
@@ -0,0 +1,15 @@
+// { dg-do compile }
+// { dg-options "-O2 -Wabi=12" }
+
+struct dummy { struct{} a[7][3]; };
+
+extern void test1 (struct dummy, ...);
+extern void (*test2) (struct dummy, ...);
+
+void
+foo ()
+{
+  struct dummy a0;
+  test1 (a0, 42);
+  test2 (a0, 42);
+}

base-commit: 87c4460024dadef0aa1c767be146ad3831857ebe
-- 
2.49.0



Re: [PATCH v2 0/1] Fix BZ 119317: named loops (C2y) with debug info

2025-05-01 Thread Jakub Jelinek
On Thu, May 01, 2025 at 10:02:06PM +0100, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > On Thu, May 01, 2025 at 04:30:56PM +, Joseph Myers wrote:
> >> On Thu, 1 May 2025, Christopher Bazley wrote:
> >> 
> >> > Changes in v2:
> >> >  - Fixed a formatting issue.
> >> >  - Added a test.
> >> 
> >> Thanks.  I have no further comments; I hope someone more familiar with 
> >> debug info handling can review this version.
> >
> > LGTM as well.
> 
> Thanks for the reviews.  I've pushed to the patch to trunk.

I think it should eventually be backported to the 15 branch as well. It
isn't a regression, as it is a new feature, but keeping it broken is not a
good idea.

Jakub



[PATCH] c: Fix crash in c-typeck.cc convert_arguments with indirect calls

2025-05-01 Thread Florian Weimer
gcc/c/

PR c/120055
* c-typeck.cc (convert_arguments): Check if fundecl is null
before checking for builtin function declaration.

---
 gcc/c/c-typeck.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index c7a13bf2b2f..9fd568feee3 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -4337,7 +4337,7 @@ convert_arguments (location_t loc, vec 
arg_loc, tree fntype,
}
 
   if (!typetail && parmnum == 0 && !TYPE_NO_NAMED_ARGS_STDARG_P (fntype)
- && !fndecl_built_in_p (fundecl))
+ && fundecl && !fndecl_built_in_p (fundecl))
{
  auto_diagnostic_group d;
  bool warned;

base-commit: fd013e3fe47f2623b581213c6d7c08bdbd5b1614