Re: [PATCH] i386: Fix vfpclassph non-optimizied intrin

2024-09-03 Thread Hongtao Liu
On Tue, Sep 3, 2024 at 2:24 PM Haochen Jiang wrote: > > Hi all, > > The intrin for non-optimized got a typo in mask type, which will cause > the high bits of __mmask32 being unexpectedly zeroed. > > The test does not fail under O0 with current 1b since the testcase is > wrong. We need to include a

[PATCH] tree-optimization/116575 - avoid ICE with SLP mask_load_lane

2024-09-03 Thread Richard Biener
The following avoids performing re-discovery with single lanes in the attempt to force the use of mask_load_lane as rediscovery will fail since a single lane of a mask load will appear permuted which isn't supported. Bootstrap and regtest running on x86_64-unknown-linux-gnu. PR tree-optimiz

Re: [PATCH] lower-bitint: Fix up __builtin_{add,sub}_overflow{,_p} bitint lowering [PR116501]

2024-09-03 Thread Richard Biener
On Mon, 2 Sep 2024, Jakub Jelinek wrote: > Hi! > > The following testcase is miscompiled. The problem is in the last_ovf step. > The second operand has signed _BitInt(513) type but has the MSB clear, > so range_to_prec returns 512 for it (i.e. it fits into unsigned > _BitInt(512)). Because of t

Re: [patch][v2] LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]

2024-09-03 Thread Richard Biener
On Mon, 2 Sep 2024, Tobias Burnus wrote: > Hi Richard, > > Am 02.09.24 um 13:58 schrieb Richard Biener: > > Hmm, I can't really follow how and where it's currently decided whether to > > output offload tables for the LTRANS units > > Before the patch, output_offload_tables is called unconditiona

RE: [PATCH 2/8] i386: Optimize ordered and nonequal

2024-09-03 Thread Hu, Lin1
> -Original Message- > From: Hu, Lin1 > Sent: Tuesday, September 3, 2024 2:05 PM > To: Jakub Jelinek ; Andrew Pinski ; > Liu, Hongtao > Cc: Jiang, Haochen ; Richard Biener > ; gcc-patches@gcc.gnu.org; ubiz...@gmail.com > Subject: RE: [PATCH 2/8] i386: Optimize ordered and nonequal > > >

[r15-3391 Regression] FAIL: gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c (test for excess errors) on Linux/x86_64

2024-09-03 Thread haochen.jiang
On Linux/x86_64, 8e16f26ca9fad685b9b723da7112ffcc99e81593 is the first bad commit commit 8e16f26ca9fad685b9b723da7112ffcc99e81593 Author: Levy Hsu Date: Mon Aug 26 10:46:30 2024 +0930 i386: Support partial vectorized V2BF/V4BF plus/minus/mult/div/sqrt caused FAIL: gcc.target/i386/avx10_2

RE: [gimplify.cc] Avoid ICE when passing VLA vector to accelerator

2024-09-03 Thread Prathamesh Kulkarni
> -Original Message- > From: Richard Biener > Sent: Monday, September 2, 2024 12:47 PM > To: Prathamesh Kulkarni > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [gimplify.cc] Avoid ICE when passing VLA vector to > accelerator > > External email: Use caution opening links or attachments > >

[COMMITTED 01/10] ada: Fix Finalize_Storage_Only bug in b-i-p calls

2024-09-03 Thread Marc Poulhiès
From: Bob Duff Do not pass null for the Collection parameter when Finalize_Storage_Only is in effect. If the collection is null in that case, we will blow up later when we deallocate the object. gcc/ada/ * exp_ch6.adb (Add_Collection_Actual_To_Build_In_Place_Call): Remove Finali

[COMMITTED 02/10] ada: Reject illegal array aggregates as per AI22-0106.

2024-09-03 Thread Marc Poulhiès
From: Steve Baird Implement the new legality rules of AI22-0106 which (as discussed in the AI) are needed to disallow constructs whose semantics would otherwise be poorly defined. gcc/ada/ * sem_aggr.adb (Resolve_Array_Aggregate): Implement the two new legality rules of AI11-010

[COMMITTED 03/10] ada: Do not warn for partial access to Atomic Volatile_Full_Access objects

2024-09-03 Thread Marc Poulhiès
From: Eric Botcazou The initial implementation of the GNAT aspect/pragma Volatile_Full_Access made it incompatible with Atomic, because it was not decided whether the read-modify-write sequences generated by Volatile_Full_Access would need to be implemented atomically when Atomic was also specifi

[COMMITTED 04/10] ada: Transform Length attribute references for non-Strict overflow mode.

2024-09-03 Thread Marc Poulhiès
From: Steve Baird The non-strict overflow checking code does a better job of eliminating overflow checks if given an expression consisting only of predefined operators (including relationals), literals, identifiers, and conditional expressions. If it is both feasible and useful, rewrite a Length

[COMMITTED 08/10] ada: Fix internal error with Atomic Volatile_Full_Access object

2024-09-03 Thread Marc Poulhiès
From: Eric Botcazou The initial implementation of the GNAT aspect/pragma Volatile_Full_Access made it incompatible with Atomic, because it was not decided whether the read-modify-write sequences generated by Volatile_Full_Access would need to be implemented atomically when Atomic was also specifi

[COMMITTED 10/10] ada: Add kludge for quirk of ancient 32-bit ABIs to previous change

2024-09-03 Thread Marc Poulhiès
From: Eric Botcazou Some ancient 32-bit ABIs, most notably that of x86/Linux, misalign double scalars in record types, so comparing DECL_ALIGN with TYPE_ALIGN directly may give the wrong answer for them. gcc/ada/ * gcc-interface/trans.cc (addressable_p) : Add kludge to cope with

[COMMITTED 05/10] ada: Simplify Note_Uplevel_Bound procedure

2024-09-03 Thread Marc Poulhiès
The procedure Note_Uplevel_Bound was implemented as a custom expression tree walk. This change replaces this custom tree traversal by a more idiomatic use of Traverse_Proc. gcc/ada/ * exp_unst.adb (Check_Static_Type::Note_Uplevel_Bound): Refactor to use the generic Traverse_Proc.

[COMMITTED 07/10] ada: Pass unaligned record components by copy in calls on all platforms

2024-09-03 Thread Marc Poulhiès
From: Eric Botcazou This has historically been done only on platforms requiring the strict alignment of memory references, but this can arguably be considered as being mandated by the language on all of them. gcc/ada/ * gcc-interface/trans.cc (addressable_p) : Take into account

[COMMITTED 06/10] ada: Fix internal error on pragma pack with discriminated record component

2024-09-03 Thread Marc Poulhiès
From: Eric Botcazou When updating the size after making a packable type in gnat_to_gnu_field, we fail to clear it again when it is not constant. gcc/ada/ * gcc-interface/decl.cc (gnat_to_gnu_field): Clear again gnu_size after updating it if it is not constant. Tested on x86_64-

[COMMITTED 09/10] ada: Plug loophole exposed by previous change

2024-09-03 Thread Marc Poulhiès
From: Eric Botcazou The change causes more temporaries to be created at call sites for unaligned actual parameters, thus revealing that the machinery does not properly deal with unconstrained nominal subtypes for them. gcc/ada/ * gcc-interface/trans.cc (create_temporary): Deal with type

Re: [PATCH 1/3] SVE intrinsics: Fold constant operands.

2024-09-03 Thread Andrew Pinski
On Fri, Aug 30, 2024 at 4:41 AM Jennifer Schmitz wrote: > > This patch implements constant folding of binary operations for SVE intrinsics > by calling the constant-folding mechanism of the middle-end for a given > tree_code. > In fold-const.cc, the code for folding vector constants was moved from

RE: [gimplify.cc] Avoid ICE when passing VLA vector to accelerator

2024-09-03 Thread Richard Biener
On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote: > > -Original Message- > > From: Richard Biener > > Sent: Monday, September 2, 2024 12:47 PM > > To: Prathamesh Kulkarni > > Cc: gcc-patches@gcc.gnu.org > > Subject: Re: [gimplify.cc] Avoid ICE when passing VLA vector to > > accelerator > >

Re: [PATCH 1/3] SVE intrinsics: Fold constant operands.

2024-09-03 Thread Richard Biener
On Tue, 3 Sep 2024, Andrew Pinski wrote: > On Fri, Aug 30, 2024 at 4:41 AM Jennifer Schmitz wrote: > > > > This patch implements constant folding of binary operations for SVE > > intrinsics > > by calling the constant-folding mechanism of the middle-end for a given > > tree_code. > > In fold-con

[PATCH] Do not assert NUM_POLY_INT_COEFFS != 1 early

2024-09-03 Thread Richard Biener
The following moves the assert on NUM_POLY_INT_COEFFS != 1 after INTEGER_CST processing. Bootstrap and regtest running on x86_64-unknown-linux-gnu, pushed as obvious after getting into stage3. * fold-const.cc (poly_int_binop): Move assert on NUM_POLY_INT_COEFFS after INTEGER_CST p

Re: PING: [PATCH] ipa: Don't disable function parameter analysis for fat LTO streaming

2024-09-03 Thread Sam James
"H.J. Lu" writes: > On Tue, Aug 27, 2024 at 1:11 PM H.J. Lu wrote: >> >> Update analyze_parms not to disable function parameter analysis for >> -ffat-lto-objects. Tested on x86-64, there are no differences in zstd >> with "-O2 -flto=auto" -g "vs -O2 -flto=auto -g -ffat-lto-objects". >> >>

Re: [PATCH] Do not assert NUM_POLY_INT_COEFFS != 1 early

2024-09-03 Thread Jakub Jelinek
On Tue, Sep 03, 2024 at 10:42:34AM +0200, Richard Biener wrote: > The following moves the assert on NUM_POLY_INT_COEFFS != 1 after > INTEGER_CST processing. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu, pushed > as obvious after getting into stage3. > > * fold-const.cc (pol

Re: PING: [PATCH] ipa: Don't disable function parameter analysis for fat LTO streaming

2024-09-03 Thread Richard Biener
On Mon, Sep 2, 2024 at 4:23 AM H.J. Lu wrote: > > On Tue, Aug 27, 2024 at 1:11 PM H.J. Lu wrote: > > > > Update analyze_parms not to disable function parameter analysis for > > -ffat-lto-objects. Tested on x86-64, there are no differences in zstd > > with "-O2 -flto=auto" -g "vs -O2 -flto=auto -

[r15-3392 Regression] FAIL: gcc.target/i386/avx10_2-partial-bf-vector-smaxmin-1.c (test for excess errors) on Linux/x86_64

2024-09-03 Thread haochen.jiang
On Linux/x86_64, 62df24e50039ae04aa3b940e680cffd9041ef5bf is the first bad commit commit 62df24e50039ae04aa3b940e680cffd9041ef5bf Author: Levy Hsu Date: Tue Aug 27 14:22:20 2024 +0930 i386: Support partial vectorized V2BF/V4BF smaxmin caused FAIL: gcc.target/i386/avx10_2-512-bf-vector-sm

[PATCH v1 3/9] aarch64: Add minimal C++ support

2024-09-03 Thread Evgeny Karpov
Monday, September 2, 2024 3:15 PM Kyrylo Tkachov wrote: >> libstdc++-v3/ChangeLog: >> >>        * src/c++17/fast_float/fast_float.h (defined): Adjust a condition >>        for AArch64. > > libstdc++ is reviewed on its own list (CC’ed here) so I’d suggest splitting > the libstdc++-v3 hunk into its

Re: [PATCH 1/3] SVE intrinsics: Fold constant operands.

2024-09-03 Thread Jennifer Schmitz
> On 3 Sep 2024, at 10:39, Richard Biener wrote: > > External email: Use caution opening links or attachments > > > On Tue, 3 Sep 2024, Andrew Pinski wrote: > >> On Fri, Aug 30, 2024 at 4:41 AM Jennifer Schmitz wrote: >>> >>> This patch implements constant folding of binary operations for S

[PATCH] testsuite: Sanitize pacbti test cases for Cortex-M

2024-09-03 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14? -- Some of the test cases were scanning for "bti", but it would, incorrectly, match the ".arch_extension pacbti". Also, keep test cases active if a supported Cortex-M core is supplied. gcc/testsuite/ChangeLog: * gcc.target/arm/bti-1.c: Enable for Cor

[PATCH] RISC-V: Also lower SLP grouped loads with just one consumer

2024-09-03 Thread Richard Biener
This makes sure to produce interleaving schemes or load-lanes for single-element interleaving and other permutes that otherwise would use more than three vectors. It exposes the latent issue that single-element interleaving with large gaps can be inefficient - the mitigation in get_group_load_stor

[Patch, rs6000, middle-end] v10: Add implementation for different targets for pair mem fusion

2024-09-03 Thread Ajit Agarwal
Hello Richard: This patch addresses all the review comments. It also fixes the arm build failure. Common infrastructure using generic code for pair mem fusion of different targets. rs6000 target-specific code implements virtual functions defined by generic code. Target-specific code is added in r

[committed] MAINTAINERS: Update my email address

2024-09-03 Thread Szabolcs Nagy
* MAINTAINERS: Update my email address and add myself to DCO. --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 07ea5f5b6e1..cfd96c9f33e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -676,7 +676,7 @@ Christoph Müllner

Re: PING: [PATCH] ipa: Don't disable function parameter analysis for fat LTO streaming

2024-09-03 Thread Jan Hubicka
> > > > > > PR ipa/116410 > > > * ipa-modref.cc (analyze_parms): Always analyze function parameter > > > for LTO streaming. > > > > > > Signed-off-by: H.J. Lu > > > --- > > > gcc/ipa-modref.cc | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff

[PATCH][v2] RISC-V: Also lower SLP grouped loads with just one consumer

2024-09-03 Thread Richard Biener
This makes sure to produce interleaving schemes or load-lanes for single-element interleaving and other permutes that otherwise would use more than three vectors. It exposes the latent issue that single-element interleaving with large gaps can be inefficient - the mitigation in get_group_load_stor

Zen5 tuning part 1: avoid FMA chains

2024-09-03 Thread Jan Hubicka
Hi, testing matrix multiplication benchmarks shows that FMA on a critical chain is a performance loss over separate multiply and add. While the latency of 4 is lower than multiply + add (3+2) the problem is that all values need to be ready before computation starts. While on znver4 AVX512 code fa
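
A minimal sketch of the two shapes being compared, assuming a plain dot-product reduction; the function names are hypothetical and are not taken from the patch. In the first loop the fused multiply-add needs the previous accumulator before it can start, so the whole operation sits on the loop-carried chain. In the second, the multiply does not depend on the accumulator, so only the add is on the chain. Whether a compiler keeps the second form unfused depends on its floating-point contraction settings.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: each std::fma consumes the previous acc, so the
// full FMA latency is paid once per iteration on the dependency chain.
double dot_fma_chain(const double* a, const double* b, std::size_t n) {
  double acc = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    acc = std::fma(a[i], b[i], acc);
  return acc;
}

// Hypothetical helper: the product is independent of acc and can be
// computed ahead of time; only the addition is loop-carried.
double dot_mul_add(const double* a, const double* b, std::size_t n) {
  double acc = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    double prod = a[i] * b[i];
    acc = acc + prod;
  }
  return acc;
}
```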

[PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD

2024-09-03 Thread pan2 . li
From: Pan Li This patch would like to support the form 2 of the scalar signed integer .SAT_ADD. Aka below example: Form 2: #define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \ T __attribute__((noinline)) \ sat_s_add_##T##_fmt_2 (T x, T y) \ {
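
For orientation, a generic sketch of what a signed saturating add computes. This is not the form 2 macro from the patch (the preview above is truncated); the name `sat_s_add` is hypothetical, and the sketch relies on the GCC/Clang builtin `__builtin_add_overflow`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, not the patch's DEF_SAT_S_ADD_FMT_2 macro.
// Returns x + y, clamped to the range of T when the exact sum does not fit.
template <typename T>
T sat_s_add(T x, T y) {
  T sum;
  if (!__builtin_add_overflow(x, y, &sum))
    return sum;  // no overflow: the exact sum fits in T
  // Overflow only happens when x and y have the same sign, so the sign
  // of x tells which bound was crossed.
  return x < 0 ? std::numeric_limits<T>::min()
               : std::numeric_limits<T>::max();
}
```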

Re: [PATCH] testsuite: Sanitize pacbti test cases for Cortex-M

2024-09-03 Thread Christophe Lyon
Hi Torbjörn, On 9/3/24 11:30, Torbjörn SVENSSON wrote: Ok for trunk and releases/gcc-14? -- Some of the test cases were scanning for "bti", but it would, incorrectly, match the ".arch_extenssion pacbti". Also, keep test cases active if a supported Cortex-M core is supplied. gcc/testsuite/Ch

Zen5 tuning part 2: disable gather and scatter

2024-09-03 Thread Jan Hubicka
Hi, We disable gathers for zen4. It seems that gather has improved a bit compared to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions when the indices are known ahead of time. Vector loads followed by shuffles result in a higher load bandwidth." however the situation seems to
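
As an illustration of the access pattern under discussion, a minimal sketch with a hypothetical function name: the indexed load `table[idx[i]]` is the kind of access a vectorizer may implement with a gather instruction when gathers are enabled for the target.

```cpp
#include <cstddef>

// Hypothetical kernel: each iteration loads from an address computed
// from idx[i], which is the shape that maps to a vector gather.
void gather_copy(float* out, const float* table, const int* idx,
                 std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = table[idx[i]];
}
```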

Re: Zen5 tuning part 2: disable gather and scatter

2024-09-03 Thread Richard Biener
On Tue, Sep 3, 2024 at 3:07 PM Jan Hubicka wrote: > > Hi, > We disable gathers for zen4. It seems that gather has improved a bit compared > to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions when > the indices are known ahead of time. Vector loads followed by shuffles result

[PATCH] Fix missed peeling for gaps with SLP load-lanes

2024-09-03 Thread Richard Biener
The following disables peeling for gap avoidance with using smaller vector accesses when using load-lanes. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. * tree-vect-stmts.cc (get_group_load_store_type): Only disable peeling for gaps by using smaller vect

[PATCH] Dump whether a SLP node represents load/store-lanes

2024-09-03 Thread Richard Biener
This makes it easier to discover whether SLP load or store nodes participate in load/store-lanes accesses. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. * tree-vect-slp.cc (vect_print_slp_tree): Annotate load and store-lanes nodes. --- gcc/tree-vect-slp

Re: [PATCH v1] RISC-V: Allow IMM operand for unsigned scalar .SAT_ADD

2024-09-03 Thread Jeff Law
On 9/2/24 5:27 AM, pan2...@intel.com wrote: From: Pan Li This patch would like to allow the IMM operand of the unsigned scalar .SAT_ADD. Like the operand 0, the operand 1 of .SAT_ADD will be zero extended to Xmode before underlying code generation. The below test suites are passed for this
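
A generic sketch of unsigned saturating add, including a call with a constant second operand. The name `sat_u_add` is hypothetical, and this is not claimed to be one of the forms matched by the patch.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: unsigned saturating add. If the wrapped sum is
// smaller than x, the addition overflowed and the result saturates.
template <typename T>
T sat_u_add(T x, T y) {
  static_assert(std::is_unsigned<T>::value, "T must be unsigned");
  T sum = static_cast<T>(x + y);  // wraps modulo 2^N for an N-bit T
  return sum < x ? std::numeric_limits<T>::max() : sum;
}

// Hypothetical example with an immediate (constant) operand.
inline std::uint8_t add_nine(std::uint8_t x) {
  return sat_u_add<std::uint8_t>(x, 9);
}
```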

Re: [PATCH v1 1/2] Match: Add int type fits check for form 1 of .SAT_SUB imm operand

2024-09-03 Thread Jeff Law
On 9/1/24 11:52 PM, pan2...@intel.com wrote: From: Pan Li This patch would like to add a strict check for the imm operand of .SAT_SUB matching. We previously had no type checking for the imm operand, which may result in unexpected IL being caught by the .SAT_SUB pattern. We leverage the int_fits_type

Ping * 4: [PATCH v2] Provide more contexts for -Warray-bounds warning messages

2024-09-03 Thread Qing Zhao
Hi, Richard, I’d like to ping this patch again. For the convenience, the original 2nd version of the patch is at: https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657150.html The diagnostic part has been reviewed by David. Could you please take a look at the middle end implementation and le

Re: Ping: [PATCH v2] Explicitly document that the "counted_by" attribute is only supported in C.

2024-09-03 Thread Qing Zhao
Hi, Jakub, I’d like to ping this simple patch again. It’s based on your suggestion in PR116016 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116016#c28 Could you please take a look at the patch and let me know whether it’s okay for committing to trunk? thanks. Qing > On Aug 12, 2024, at 09:5

Zen5 tuning part 3: scheduler tweaks

2024-09-03 Thread Jan Hubicka
Hi, this patch adds support for new fusion in znver5 documented in the optimization manual: The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions with certain ALU instructions. The following conditions need to be met for fusion to happen: - The MOV should be reg-r

[committed] libstdc++: Simplify std::any to fix -Wdeprecated-declarations warning

2024-09-03 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. -- >8 -- We don't need to use std::aligned_storage in std::any. We just need a POD type of the right size. The void* union member already ensures the alignment will be correct. Avoiding std::aligned_storage means we don't need to suppress a -Wdeprecated-decla
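
A minimal sketch of the idea described above, assuming a small-buffer layout of one pointer's worth of bytes. The type and member names are illustrative and are not libstdc++'s actual definitions.

```cpp
#include <cstddef>

// Hypothetical stand-in for the storage inside a type-erased holder.
union Storage {
  void* ptr;                            // heap-allocated object; this member also fixes the alignment
  unsigned char buffer[sizeof(void*)];  // raw bytes for small objects stored in place
};

// The plain byte array takes its alignment from the pointer member of
// the union, so no std::aligned_storage is needed.
static_assert(sizeof(Storage) == sizeof(void*), "same size as a pointer");
static_assert(alignof(Storage) == alignof(void*), "aligned like a pointer");
```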

Re: Ping: [PATCH v2] Explicitly document that the "counted_by" attribute is only supported in C.

2024-09-03 Thread Jakub Jelinek
On Tue, Sep 03, 2024 at 01:59:45PM +, Qing Zhao wrote: > Hi, Jakub, > > I’d like to ping this simple patch again. It’s based on your suggestion in > PR116016 > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116016#c28 > > Could you please take a look at the patch and let me know whether it

[PATCH] libcpp: Implement the strict reading of the #embed expansion rules

2024-09-03 Thread Jakub Jelinek
Hi! The following patch attempts to implement the current wording of the C23 #embed expansion rules on top of the https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661901.html patch (haven't yet adjusted the rest of the series, but I expect only minor tweaks). After parsing #embed it first che

[PATCH v1 4/9] aarch64: Exclude symbols using GOT from code models

2024-09-03 Thread Evgeny Karpov
Monday, September 2, 2024 5:00 PM Richard Sandiford wrote: > I think we should instead patch the callers that are using > aarch64_symbol_binds_local_p for GOT decisions. The function itself > is checking for a more general property (and one that could be useful > in other contexts). The patch h

[PATCH] d, ada/spec: only sub nostd{inc, lib} rather than nostd{inc, lib}*

2024-09-03 Thread Arsen Arsenović
Tested on x86_64-pc-linux-gnu. OK for trunk? -- >8 -- This prevents the gcc driver erroneously accepting -nostdlib++ when it should not when Ada was enabled. Also, similarly, -nostdinc* (where * is nonempty) is unhandled by either the Ada or D compiler, so the spec should not subs

[PATCH v1 9/9] aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT

2024-09-03 Thread Evgeny Karpov
Monday, September 2, 2024 5:36 PM Richard Sandiford wrote: >> In some cases, the alignment can be bigger than BIGGEST_ALIGNMENT. >> The patch handles these cases. >> >> gcc/ChangeLog: >> >>* config/aarch64/aarch64-coff.h (ASM_OUTPUT_ALIGNED_LOCAL): >>Change alignment. > > Can you

[committed] libstdc++: Specialize std::disable_sized_sentinel_for for std::move_iterator [PR116549]

2024-09-03 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. -- >8 -- LWG 3736 added a partial specialization of this variable template for two std::move_iterator types. This is needed for the case where the types satisfy std::sentinel_for and are subtractable, but do not model the semantics requirements of std::sized_

[committed] libstdc++: Fix error handling in fs::hard_link_count for Windows

2024-09-03 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. -- >8 -- The recent change to use auto_win_file_handle for std::filesystem::hard_link_count caused a regression. The std::error_code argument should be cleared if no error occurs, but this no longer happens. Add a call to ec.clear() in fs::hard_link_count to
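
The contract being restored can be sketched as follows. The function below is hypothetical and is not the libstdc++ implementation; it only shows the convention that an overload taking `std::error_code&` sets the code on failure and clears it on success, so a stale error from an earlier call is not left behind.

```cpp
#include <cstdint>
#include <system_error>

// Hypothetical function: `ok` and `n` stand in for the result of the
// underlying system query.
std::uintmax_t link_count_sketch(bool ok, std::uintmax_t n,
                                 std::error_code& ec) noexcept {
  if (!ok) {
    ec = std::make_error_code(std::errc::no_such_file_or_directory);
    return static_cast<std::uintmax_t>(-1);
  }
  ec.clear();  // success: reset any error the caller passed in
  return n;
}
```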

[PATCH v8 0/2] aarch64: Add support for AdvSIMD faminmax.

2024-09-03 Thread saurabh.jha
From: Saurabh Jha This series is a revised version of: https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661860.html. The first patch of the series is updated to address these comments: https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661866.html All comments are addressed exactly as s

[PATCH v8 1/2] aarch64: Add AdvSIMD faminmax intrinsics

2024-09-03 Thread saurabh.jha
The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and mandatory from Armv9.5-a. It introduces instructions for computing the floating point absolute maximum and minimum of the two vectors element-wise. This patch introduces AdvSIMD faminmax intrinsics. The intrinsics of this extensio

[PATCH v8 2/2] aarch64: Add codegen support for AdvSIMD faminmax

2024-09-03 Thread saurabh.jha
The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and mandatory from Armv9.5-a. It introduces instructions for computing the floating point absolute maximum and minimum of the two vectors element-wise. This patch adds code generation support for famax and famin in terms of existing R

Re: [PATCH] lto: Don't check obj.found for offload section

2024-09-03 Thread H.J. Lu
On Fri, Aug 23, 2024 at 5:50 AM Richard Biener wrote: > > On Fri, Aug 23, 2024 at 2:36 PM H.J. Lu wrote: > > > > obj.found is the number of LTO symbols. We should include the offload > > section when it is used by linker even if there are no LTO symbols. > > OK. > > > PR lto/116361 > >

Zen5 tuning part 4: update reassociation width

2024-09-03 Thread Jan Hubicka
Hi, Zen5 has 6 instead of 4 ALUs and the integer multiplication can now execute in 3 of them. FP units can do 2 additions and 2 multiplications with latency 2 and 3. This patch updates reassociation width accordingly. This has potential of increasing register pressure but unlike while benchmarki

[PATCH][testsuite]: remove -fwrapv from signbit-5.c

2024-09-03 Thread Tamar Christina
Hi All, The meaning of the testcase was changed by passing it -fwrapv. The reason for the test failures on some platform was because the test was testing some implementation defined behavior wrt INT_MIN in generic code. Instead of using -fwrapv this just removes the border case from the test so

[PATCH][docs]: [committed] remove double mention of armv9-a.

2024-09-03 Thread Tamar Christina
Hi All, The list of available architectures for Arm is incorrectly listing armv9-a twice. This removes the duplicate armv9-a enumeration from the part of the list having M-profile targets. committed under the obvious rule. Thanks, Tamar gcc/ChangeLog: * doc/invoke.texi: Remove duplicate

[PATCH v2 0/5] openmp: Add support for iterators in OpenMP mapping clauses

2024-09-03 Thread Kwok Cheung Yeung
This is an improved version of the previous series that was posted at: https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652680.html Compared to the previous version, this version delays the gimplification of iterators until the very end of gimplify_adjust_omp_clauses (instead of doing it in

[PATCH v2 1/5] openmp: Refactor handling of iterators

2024-09-03 Thread Kwok Cheung Yeung
This patch factors out the code to calculate the number of iterations required and to generate the iteration loop into separate functions from gimplify_omp_depend for reuse later. I have also replaced the 'TREE_CODE (*tp) == TREE_LIST && ...' checks used for detecting an iterator clause with a ma

[PATCH v2 2/5] openmp: Add support for iterators in map clauses (C/C++)

2024-09-03 Thread Kwok Cheung Yeung
This patch modifies the C and C++ parsers to accept an iterator as a map type modifier, encoded in the same way as the depend and affinity clauses. When finishing the clauses, clauses with iterators are treated separately from ones without to avoid clashes (e.g. iterating over x[i] will likely gen

[PATCH v2 3/5] openmp: Add support for iterators in 'target update' clauses (C/C++)

2024-09-03 Thread Kwok Cheung Yeung
This patch extends the previous patch to cover to/from clauses in 'target update'. From c3dfc4a792610530a4ab729c3f250917b828e469 Mon Sep 17 00:00:00 2001 From: Kwok Cheung Yeung Date: Mon, 2 Sep 2024 19:34:09 +0100 Subject: [PATCH 3/5] openmp: Add support for iterators in 'target update' clauses

[PATCH v2 4/5] openmp, fortran: Add support for map iterators in OpenMP target construct (Fortran)

2024-09-03 Thread Kwok Cheung Yeung
This patch adds support for iterators in the map clause of OpenMP target constructs. The parsing and translation of iterators in the front-end works the same as for the affinity and depend clauses. The iterator gimplification needed to be modified slightly to handle Fortran. The difference i

[PATCH v2 5/5] openmp, fortran: Add support for iterators in OpenMP 'target update' constructs (Fortran)

2024-09-03 Thread Kwok Cheung Yeung
This patch adds parsing and translation of the 'to' and 'from' clauses for the 'target update' construct in Fortran. From cfb6b76da5bba038d854d510a4fd44ddf4fa8f1f Mon Sep 17 00:00:00 2001 From: Kwok Cheung Yeung Date: Mon, 2 Sep 2024 19:34:29 +0100 Subject: [PATCH 5/5] openmp, fortran: Add support

Re: [PATCH] RISC-V: Optimize branches with shifted immediate operands

2024-09-03 Thread Jeff Law
On 9/2/24 7:52 AM, Jovan Vukic wrote: The patch adds a new instruction pattern to handle conditional branches with equality checks between shifted arithmetic operands. This pattern optimizes the use of shifted constants (with trailing zeros), making it more efficient. For the C code: void

[PATCH 1/4]middle-end: have vect_recog_cond_store_pattern use pattern statement for cond if available

2024-09-03 Thread Tamar Christina
Hi All, When vectorizing a conditional operation we rely on the bool_recog pattern to hit and convert the bool of the operand to a valid mask. However we are currently not using the converted operand as this is in a pattern statement. This change updates it to look at the actual statement to be

[PATCH 2/4]middle-end: lower COND_EXPR into gimple form in vect_recog_bool_pattern

2024-09-03 Thread Tamar Christina
Hi All, Currently the vectorizer cheats when lowering COND_EXPR during bool recog. In the cases where the conditional is loop invariant or non-boolean it instead converts the operation back into GENERIC and hides much of the operation from the analysis part of the vectorizer. i.e. a ? b : c is

[PATCH 3/4][rtl]: simplify boolean vector EQ and NE comparisons

2024-09-03 Thread Tamar Christina
Hi All, This adds vector constant simplification for EQ and NE. This is useful since the vectorizer generates a lot more vector compares now, in particular NE and EQ and so these help us optimize cases where the values were not known at GIMPLE but instead only at RTL. Bootstrapped Regtested on a

[PATCH 4/4]AArch64: Define VECTOR_STORE_FLAG_VALUE.

2024-09-03 Thread Tamar Christina
Hi All, This defines VECTOR_STORE_FLAG_VALUE to CONST1_RTX for AArch64 so we simplify vector comparisons in AArch64. With this enabled res: movi v0.4s, 0 cmeq v0.4s, v0.4s, v0.4s ret is simplified to: res: mvni v0.4s, 0 ret NOTE: I don't really

[pushed] c++: add fixed test [PR109095]

2024-09-03 Thread Marek Polacek
Tested x86_64-pc-linux-gnu, applying to trunk. -- >8 -- Fixed by r13-6693. PR c++/109095 gcc/testsuite/ChangeLog: * g++.dg/cpp2a/nontype-class66.C: New test. --- gcc/testsuite/g++.dg/cpp2a/nontype-class66.C | 19 +++ 1 file changed, 19 insertions(+) create mode

[PATCH] split-path: Improve ifcvt heurstic for split path [PR112402]

2024-09-03 Thread Andrew Pinski
This simplifies the heuristic for split path to see if the join bb is an ifcvt candidate. For the predecessors bbs need either to be empty or only have one statement in them which could be a decent ifcvt candidate. The previous heuristics would miss that: ``` if (a) goto B else goto C; B: goto C; C:

[PATCH] coros: mark .CO_YIELD as LEAF [PR106973]

2024-09-03 Thread Arsen Arsenović
Tested on x86_64-pc-linux-gnu. OK for trunk? -- >8 -- We rely on .CO_YIELD calls being followed by an assignment (optionally) and then a switch/if in the same basic block. This implies that a .CO_YIELD can never end a block. However, since a call to .CO_YIELD is still a call, if

[PATCH 1/2] split-paths: Move check for # of statements in join earlier

2024-09-03 Thread Andrew Pinski
This moves the check for # of statements to copy in join to be the first check. This check is the cheapest check so it should be first. Plus add a print to the dump file since there was none beforehand. gcc/ChangeLog: * gimple-ssa-split-paths.cc (is_feasible_trace): Move check for

[PATCH 2/2] split-path: Improve ifcvt heurstic for split path [PR112402]

2024-09-03 Thread Andrew Pinski
This simplifies the heuristic for split path to see if the join bb is an ifcvt candidate. For the predecessors bbs need either to be empty or only have one statement in them which could be a decent ifcvt candidate. The previous heuristics would miss that: ``` if (a) goto B else goto C; B: goto C; C:

Re: [PATCH][testsuite]: remove -fwrapv from signbit-5.c

2024-09-03 Thread Richard Biener
> Am 03.09.2024 um 19:00 schrieb Tamar Christina : > > Hi All, > > The meaning of the testcase was changed by passing it -fwrapv. The reason for > the test failures on some platform was because the test was testing some > implementation defined behavior wrt INT_MIN in generic code. > > Inst

Re: [PING] [PATCH] rust: avoid clobbering LIBS

2024-09-03 Thread Marc
Richard Biener writes: > On Wed, Aug 28, 2024 at 11:10 AM Marc wrote: >> >> Hello, >> >> Gentle reminder for this simple autoconf patch :) > > OK. > > Note that completely wiping LIBS might remove requirements detected earlier, > like some systems require explicit -lc for example. I would inste

[PATCH] c++: noexcept and pointer to member function type [PR113108]

2024-09-03 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14? -- >8 -- We ICE in nothrow_spec_p because it got a DEFERRED_NOEXCEPT. This DEFERRED_NOEXCEPT was created in implicitly_declare_fn when declaring Foo& operator=(Foo&&) = default; in the test. The problem is that in resolve_overloa

[PATCH] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]

2024-09-03 Thread Pengxuan Zheng
This is similar to the recent improvements to the Advanced SIMD popcount expansion by using SVE. We can utilize SVE to generate more efficient code for scalar mode popcount too. PR target/113860 gcc/ChangeLog: * config/aarch64/aarch64-simd.md (popcount2): Update pattern to

[pushed 1/3] pretty-print: naming cleanups

2024-09-03 Thread David Malcolm
This patch is a followup to r15-3311-ge31b6176996567 making some cleanups to pretty-printing to reflect those changes: - renaming "chunk_info" to "pp_formatted_chunks" - renaming "cur_chunk_array" to "m_cur_fomatted_chunks" - rewording/clarifying comments and taking the opportunity to add a "m_" pr

[pushed 2/3] pretty-print: add selftest of pp_format's stack

2024-09-03 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r15-3430-gd0891f3aa75d31. gcc/ChangeLog: * pretty-print-format-impl.h (pp_formatted_chunks::get_prev): New accessor. * pretty-print.cc (selftest::push_pp_format): New. (ASSERT_TEXT_TOK

[pushed 3/3] pretty-print: split up pretty_printer::format into subroutines

2024-09-03 Thread David Malcolm
The body of pretty_printer::format is almost 500 lines long, mostly comprising two distinct phases. This patch splits it up so that there are explicit subroutines for the two different phases, reducing the scope of various locals, and making it easier to e.g. put a breakpoint on phase 2. No funct

Re: [PING^3] [PATCH] PR116080: Fix test suite checks for musttail

2024-09-03 Thread Mike Stump
On Sep 2, 2024, at 4:23 PM, Andi Kleen wrote: > > Andi Kleen writes: > > PING^3 Ok. >> Andi Kleen writes: >> >> PING^2 for https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658602.html >> >> This fixes some musttail related test suite failures that cause noise on >> various targets. >>

[pushed] c++: support C++11 attributes in C++98

2024-09-03 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk. -- 8< -- I don't see any reason why we can't allow the [[]] attribute syntax in C++98 mode with a pedwarn just like many other C++11 features. In fact, we already do support it in some places in the grammar, but not in places that check cp_nth_token
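For readers unfamiliar with the syntax under discussion, here is a minimal sketch of the C++11 `[[...]]` attribute form on a function declaration. The names `die` and `pick` are hypothetical and do not come from the patch; whether a given compiler accepts this under `-std=c++98`, and with which diagnostic, depends on the compiler version and is an assumption here.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: [[noreturn]] tells the compiler that control
// never comes back from this function.
[[noreturn]] void die() { std::exit(1); }

// Hypothetical caller: because die() is marked [[noreturn]], the
// compiler knows the fall-through path after it is unreachable.
int pick(int x) {
  if (x > 0)
    return x;
  die();
}

// Small self-check that exercises only the returning path.
void check() { assert(pick(3) == 3); }
```

Per the message above, the patch is about accepting this syntax in C++98 mode with a pedwarn rather than rejecting it; the sketch only illustrates the syntax itself.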

[PATCH] c++: ICE with TTP [PR96097]

2024-09-03 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14? -- >8 -- We crash when dependent_type_p gets a TEMPLATE_TYPE_PARM outside a template. That happens here because in template typename X> void func() {} template struct Y {}; void g() { func(); } when performing overload

Re: [PATCH 1/2] split-paths: Move check for # of statements in join earlier

2024-09-03 Thread Jeff Law
On 9/3/24 12:11 PM, Andrew Pinski wrote: This moves the check for # of statements to copy in join to be the first check. This check is the cheapest check so it should be first. Plus add a print to the dump file since there was none beforehand. gcc/ChangeLog: * gimple-ssa-split-paths.

Re: [PATCH 2/2] split-path: Improve ifcvt heurstic for split path [PR112402]

2024-09-03 Thread Jeff Law
On 9/3/24 12:11 PM, Andrew Pinski wrote: This simplifies the heuristic for split path to see if the join bb is an ifcvt candidate. The predecessor bbs need either to be empty or to have only one statement in them, which could be a decent ifcvt candidate. The previous heuristics would miss that:

Re: [PATCH 2/2] split-path: Improve ifcvt heurstic for split path [PR112402]

2024-09-03 Thread Jeff Law
On 9/3/24 12:11 PM, Andrew Pinski wrote: This simplifies the heuristic for split path to see if the join bb is an ifcvt candidate. The predecessor bbs need either to be empty or to have only one statement in them, which could be a decent ifcvt candidate. The previous heuristics would miss that:

Re: Ping: [PATCH v2] Explicitly document that the "counted_by" attribute is only supported in C.

2024-09-03 Thread Qing Zhao
thanks. Updated per your suggestion and pushed: https://gcc.gnu.org/pipermail/gcc-cvs/2024-September/408749.html Qing > On Sep 3, 2024, at 10:09, Jakub Jelinek wrote: > > On Tue, Sep 03, 2024 at 01:59:45PM +, Qing Zhao wrote: >> Hi, Jakub, >> >> I’d like to ping this simple patch again.

Re: [PATCH v4] RISC-V: Supports Profiles in '-march' option.

2024-09-03 Thread Palmer Dabbelt
On Tue, 20 Aug 2024 23:18:36 PDT (-0700), jia...@iscas.ac.cn wrote: On 2024/8/21 3:23, Palmer Dabbelt wrote: On Mon, 19 Aug 2024 21:53:54 PDT (-0700), jia...@iscas.ac.cn wrote: Supports RISC-V profiles[1] in -march option. Default input set the profile before other formal extensions. V2: Fixes s

Re: [pushed] c++: support C++11 attributes in C++98

2024-09-03 Thread Andrew Pinski
On Tue, Sep 3, 2024 at 3:01 PM Jason Merrill wrote: > > Tested x86_64-pc-linux-gnu, applying to trunk. > > -- 8< -- > > I don't see any reason why we can't allow the [[]] attribute syntax in C++98 > mode with a pedwarn just like many other C++11 features. In fact, we > already do support it in so

[PATCH] object-size: Use simple_dce_from_worklist in object-size pass

2024-09-03 Thread Andrew Pinski
While trying to see if there was a way to improve object-size pass to use the ranger (for pointer plus), I noticed that it leaves around the statement containing __builtin_object_size if it was reduced to a constant. This fixes that by using simple_dce_from_worklist. Bootstrapped and tested on x86

[PUSHED] aarch64: Fix testcase vec-init-22-speed.c [PR116589]

2024-09-03 Thread Andrew Pinski
For this testcase, the trunk produces:
```
f_s16:
        fmov    s31, w0
        fmov    s0, w1
```
While the testcase was expecting what was produced in GCC 14:
```
f_s16:
        sxth    w0, w0
        sxth    w1, w1
        fmov    d31, x0
        fmov    d0, x1
```
After r15-1575-gea8061f46a

Re: [PATCH v4] RISC-V: Supports Profiles in '-march' option.

2024-09-03 Thread Kito Cheng
I don't see there is conflict if we want to support both gnu2024 and RVI profiles? also I am not sure what the usage scenarios for the gnu2024 and how we defined that? On Wed, Sep 4, 2024 at 6:49 AM Palmer Dabbelt wrote: > > On Tue, 20 Aug 2024 23:18:36 PDT (-0700), jia...@iscas.ac.cn wrote: > >

Re: [PATCH v4] RISC-V: Supports Profiles in '-march' option.

2024-09-03 Thread Andrew Waterman
As is normally the case when it comes to matters of RISC-V International, Palmer is taking the least-charitable interpretation and then adding a generous dollop of falsehoods. The RVA23U64 profile is set to be ratified soon, and that's our intended target for apps processors. On Tue, Sep 3, 2024

Re: [pushed] c++: support C++11 attributes in C++98

2024-09-03 Thread Jason Merrill
On 9/3/24 7:00 PM, Andrew Pinski wrote: On Tue, Sep 3, 2024 at 3:01 PM Jason Merrill wrote: Tested x86_64-pc-linux-gnu, applying to trunk. -- 8< -- I don't see any reason why we can't allow the [[]] attribute syntax in C++98 mode with a pedwarn just like many other C++11 features. In fact,

Re: [PATCH v4] RISC-V: Supports Profiles in '-march' option.

2024-09-03 Thread Palmer Dabbelt
On Tue, 03 Sep 2024 18:05:42 PDT (-0700), Kito Cheng wrote: I don't see there is conflict if we want to support both gnu2024 and RVI profiles? Ya, they'd just be two different things aimed at solving the same set of problems. I'm just tired of users coming and complaining that stuff is broke

[PATCH] i386: Integrate BFmode for Enhanced Vectorization in ix86_preferred_simd_mode

2024-09-03 Thread Levy Hsu
Hi, This change adds BFmode support to the ix86_preferred_simd_mode function, enhancing SIMD vectorization for BF16 operations. The update ensures optimized usage of SIMD capabilities, improving performance and aligning vector sizes with processor capabilities. Bootstrapped and tested on x86-64-pc-l

[PATCH] expand: Add dump for costing of positive divides

2024-09-03 Thread Andrew Pinski
While trying to understand PR 115910 I found it was useful to print out the two costs of doing a signed and unsigned division just like was added in r15-3272-g3c89c41991d8e8 for popcount==1. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * expr.cc (expand_expr_divmod): Add d
