PING^5: [PATCH] sibcall: Adjust BLKmode argument size for alignment padding

2024-11-13 Thread H.J. Lu
On Thu, Nov 7, 2024 at 1:40 PM H.J. Lu wrote: > > On Sat, Nov 2, 2024 at 6:48 AM H.J. Lu wrote: > > > > On Sat, Oct 26, 2024 at 7:25 AM H.J. Lu wrote: > > > > > > On Sun, Oct 20, 2024 at 6:42 AM H.J. Lu wrote: > > > > > > > > On Sun, Oct 13, 2024, 10:07 AM H.J. Lu wrote: > > > >> > > > >> Adju

Re: [PATCH v1] RISC-V: Rearrange the test files for scalar SAT_ADD [NFC]

2024-11-13 Thread Kito Cheng
Pre-approved for that change, so you don't need to wait for another response :) Just a reminder that this requires either adding a new exp file or adding a few new lines in riscv.exp. On Thu, Nov 14, 2024 at 3:28 PM Li, Pan2 wrote: > > Make sense and sure thing, let me file another patch for thi

RE: [PATCH v1] RISC-V: Rearrange the test files for scalar SAT_ADD [NFC]

2024-11-13 Thread Li, Pan2
Make sense and sure thing, let me file another patch for this. Pan -Original Message- From: Kito Cheng Sent: Thursday, November 14, 2024 3:22 PM To: 钟居哲 Cc: Li, Pan2 ; gcc-patches ; jeffreyalaw ; Robin Dapp Subject: Re: [PATCH v1] RISC-V: Rearrange the test files for scalar SAT_ADD

Re: [PATCH v1] RISC-V: Rearrange the test files for scalar SAT_ADD [NFC]

2024-11-13 Thread Kito Cheng
Hi Pan: Could you create a sub folder in RISC-V to contain all saturation related testcase? e.g. gcc/testsuite/gcc.target/riscv/sat/ On Thu, Nov 14, 2024 at 2:48 PM 钟居哲 wrote: > > LGTM > > > juzhe.zh...@rivai.ai > > > From: pan2.li > Date: 2024-11-14 14:42 > To:

[PATCH v1] RISC-V: Rearrange the test files for scalar SAT_ADD [NFC]

2024-11-13 Thread pan2 . li
From: Pan Li The test files of scalar SAT_ADD have only numbers as the suffix. Rearrange the file names to -{form number}-{target-type}. For example, test form 3 for uint32_t SAT_ADD will have -3-u32.c for asm check and -run-3-u32.c for the run test. The below test suites are passed for this patc

Re: [PATCH] libstdc++: Simplify _Hashtable merge functions

2024-11-13 Thread François Dumont
Sounds like a very good idea. Moreover friend declaration could be limited to another _Hashtable<> type with same _Key, _Value and _Alloc types to be compatible. On 08/11/2024 11:33, Jonathan Wakely wrote: On Thu, 7 Nov 2024 at 22:18, Jonathan Wakely wrote: I realised that _M_merge_unique an

Re: [PATCH] i386: Fix cstorebf4 fp comparison operand [PR117495]

2024-11-13 Thread Hongtao Liu
On Wed, Nov 13, 2024 at 10:00 AM Hongyu Wang wrote: > > Hi, > > For cstorebf4 it uses comparison_operator for BFmode compare, which is > incorrect when directly uses ix86_expand_setcc as it does not canonicalize > the input comparison to correct the compare code by swapping operands. > Since the o

[GCC13 PATCH] testsuite: Correct dg-error to dg-warning for cmpccxadd testcase in GCC13

2024-11-13 Thread Haochen Jiang
Hi all, In GCC13, the error for GCC14+ is actually a warning for the pointer type. Correct that in testcase. Commit as obvious. Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/cmpccxadd-1b.c: Change to dg-warning. --- gcc/testsuite/gcc.target/i386/cmpccxadd-1b.c | 4 ++-- 1 fi

[PATCH v4 2/2] [APX CFCMOV] Support APX CFCMOV in backend

2024-11-13 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_int_cfmovcc): Expand to cfcmov pattern. * config/i386/i386-opts.h (enum apx_features): New. * config/i386/i386-protos.h (ix86_expand_int_cfmovcc): Define. * config/i386/i386.cc (ix86_rtx_costs): Add U

[PATCH v4 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-11-13 Thread Kong, Lingling
Hi, Many thanks to Richard for the suggestion that conditional load is like a scalar instance of maskload_optab. So this version uses the maskload and maskstore optabs to expand and generate cfcmov in the ifcvt pass. All the changes passed bootstrap & regtest x86-64-pc-linux-gnu. We also tested spec

Re: [PATCH] RISC-V: Add VLS modes to strided loads.

2024-11-13 Thread 钟居哲
LGTM. juzhe.zh...@rivai.ai From: Robin Dapp Date: 2024-11-14 00:57 To: gcc-patches CC: pal...@dabbelt.com; kito.ch...@gmail.com; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; pan2...@intel.com; rdapp@gmail.com Subject: [PATCH] RISC-V: Add VLS modes to strided loads. Hi, this patch adds V

Re: match.pd: Add pattern to simplify `(a - 1) & -a` to `0`

2024-11-13 Thread Andrew Pinski
On Wed, Nov 13, 2024 at 5:06 AM Jovan Vukic wrote: > > The patch simplifies expressions (a - 1) & -a, (a - 1) | -a, and (a - 1) ^ -a > to the constants 0, -1, and -1, respectively. > > Currently, GCC does not perform these simplifications. > > Bootstrapped and tested on x86-linux-gnu with no regre
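
The three identities quoted above follow from -a == ~(a - 1) in two's complement. A minimal sketch to check them, not taken from the patch: it uses unsigned 32-bit arithmetic so that wraparound is well defined, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned arithmetic wraps, so (0u - a) is well defined and equals ~(a - 1u).
constexpr std::uint32_t and_form(std::uint32_t a) { return (a - 1u) & (0u - a); }
constexpr std::uint32_t or_form(std::uint32_t a)  { return (a - 1u) | (0u - a); }
constexpr std::uint32_t xor_form(std::uint32_t a) { return (a - 1u) ^ (0u - a); }

// True when all three expressions fold to the constants named in the patch
// description: 0, -1 and -1 (all bits set, i.e. UINT32_MAX as unsigned).
constexpr bool identities_hold(std::uint32_t a) {
  return and_form(a) == 0u
      && or_form(a) == UINT32_MAX
      && xor_form(a) == UINT32_MAX;
}

// Compile-time spot checks, including the edge cases 0 and the top bit.
static_assert(identities_hold(0u), "a == 0");
static_assert(identities_hold(1u), "a == 1");
static_assert(identities_hold(0x80000000u), "a == top bit only");

// Runtime check over [first, last); asserts on the first failing value.
inline void check_range(std::uint32_t first, std::uint32_t last) {
  for (std::uint32_t a = first; a != last; ++a)
    assert(identities_hold(a));
}
```

Calling check_range(0u, 1000u) walks a small range at run time; the static_asserts cover the corner values at compile time.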

Re: [PATCH] Fortran: fix passing of NULL() actual argument to character dummy [PR104819]

2024-11-13 Thread Jerry D
On 11/13/24 2:26 PM, Harald Anlauf wrote: Dear all, the attached patch is the third part of a series to fix the handling of NULL() passed to pointer dummy arguments. This one addresses character dummy arguments (scalar, assumed-shape, assumed-rank) for various uses in the caller. The patch is

Re: [PATCH] Add new hardreg PRE pass

2024-11-13 Thread Andrew Carlotti
On Wed, Nov 13, 2024 at 07:03:44PM +, Richard Sandiford wrote: > Andrew Carlotti writes: > > On Tue, Nov 12, 2024 at 10:42:50PM +, Richard Sandiford wrote: > >> Sorry for the slow review. I think Jeff's much better placed to comment > >> on this than I am, but here's a stab. Mostly it lo

Re: match.pd: Add pattern to simplify `((X - 1) & ~X) < 0` to `X == 0`

2024-11-13 Thread Andrew Pinski
On Wed, Nov 13, 2024 at 5:14 AM Jovan Vukic wrote: > > The patch makes the following simplifications: > ((X - 1) & ~X) < 0 -> X == 0 > ((X - 1) & ~X) >= 0 -> X != 0 > > On x86, the number of instructions is reduced from 4 to 3, > but on platforms like RISC-V, it reduces to a single instruction. >
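
The equivalence quoted above can be spot-checked with a small sketch. This is not the patch's code: it evaluates the left-hand side in unsigned arithmetic (to avoid signed overflow in X - 1) and then looks at the sign bit, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// (X - 1) & ~X, computed on the unsigned bit pattern of X.
constexpr std::uint32_t low_mask(std::uint32_t ux) { return (ux - 1u) & ~ux; }

// True when ((X - 1) & ~X), viewed as a signed 32-bit value, is negative,
// i.e. when its sign bit is set.
constexpr bool lhs_is_negative(std::int32_t x) {
  return (low_mask(static_cast<std::uint32_t>(x)) >> 31) != 0u;
}

// The sign bit of the result is set only if the sign bit of X is clear and
// the sign bit of X - 1 is set, which happens only for X == 0.
static_assert(lhs_is_negative(0), "X == 0 gives a negative result");
static_assert(!lhs_is_negative(1), "X == 1 does not");
static_assert(!lhs_is_negative(-1), "X == -1 does not");
static_assert(!lhs_is_negative(INT32_MIN), "X == INT32_MIN does not");

// Runtime check that the left-hand side agrees with X == 0 on [lo, hi].
inline void check_range(std::int32_t lo, std::int32_t hi) {
  for (std::int32_t x = lo; x <= hi; ++x)
    assert(lhs_is_negative(x) == (x == 0));
}
```

The >= 0 form is just the negation, so the same check covers the second simplification (X != 0).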

Re: [RFC 3/9] aarch64: add new insn definition for st2g

2024-11-13 Thread Richard Sandiford
Indu Bhagat writes: > Store Allocation Tags (st2g) is an Armv8.5-A memory tagging (MTE) > instruction. It stores an allocation tag to two tag granules of memory. > > TBD: > - Not too sure what is the best way to generate the st2g yet; A > subsequent patch will emit them in one of the target

Re: [RFC 2/9] aarch64: add new define_insn for subg

2024-11-13 Thread Andrew Pinski
On Thu, Nov 7, 2024 at 1:41 PM Indu Bhagat wrote: > > subg (Subtract with Tag) is an Armv8.5-A memory tagging (MTE) > instruction. It can be used to subtract an immediate value scaled by > the tag granule from the address in the source register. > > gcc/ChangeLog: > > * config/aarch64/aar

Re: [RFC 2/9] aarch64: add new define_insn for subg

2024-11-13 Thread Richard Sandiford
Indu Bhagat writes: > subg (Subtract with Tag) is an Armv8.5-A memory tagging (MTE) > instruction. It can be used to subtract an immediate value scaled by > the tag granule from the address in the source register. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (subg): New definition. T

Ping #4: [PATCH] PR 99293: Optimize splat of a V2DF/V2DI extract with constant element

2024-11-13 Thread Michael Meissner
This patch seems to have been overlooked. https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663101.html I ran a set of spec 2017 benchmarks with this patch applied and compared it to a run without the patch applied. There were no regressions, but 3 benchmarks had slight improvement in ru

[PATCH] Fortran: fix passing of NULL() actual argument to character dummy [PR104819]

2024-11-13 Thread Harald Anlauf
Dear all, the attached patch is the third part of a series to fix the handling of NULL() passed to pointer dummy arguments. This one addresses character dummy arguments (scalar, assumed-shape, assumed-rank) for various uses in the caller. The patch is a little larger than I expected, due to corn

[PATCH V3, 06/11] Change TARGET_CMPB to TARGET_POWER6

2024-11-13 Thread Michael Meissner
Note, in the V2 patch series, I forgot to post this patch. As part of the architecture flags patches, this patch changes the use of TARGET_CMPB to TARGET_POWER6. The CMPB instruction was added in power6 (ISA 2.05). I have built both big endian and little endian bootstrap compilers and there were

[PATCH] libgcc: Fix COPY_ARG_VAL initializer (PR 117537)

2024-11-13 Thread Christophe Lyon
We recently forced -Werror when building libgcc for aarch64, to make sure we'd catch and fix the kind of problem described in the PR. In this case, when building for aarch64_be (so, big endian), gcc emits this warning/error: libgcc/config/libbid/bid_conf.h:847:25: error: missing braces around ini

Re: [PATCH 3/5] ctf: translate annotation DIEs to internal ctf

2024-11-13 Thread Indu Bhagat
On 10/30/24 11:31 AM, David Faust wrote: Translate DW_TAG_GNU_annotation DIEs created for C attributes btf_decl_tag and btf_type_tag into an in-memory representation in the CTF/BTF container. They will be output in BTF as BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG records. The new CTF kinds used t

Ping: [PATCH] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

2024-11-13 Thread Michael Meissner
This patch seems to have been overlooked: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666393.html -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com

Re: [PATCH v3 0/8] SMALL code model fixes, optimization fixes, LTO and minimal C++ enablement

2024-11-13 Thread Martin Storsjö
On Tue, 12 Nov 2024, Richard Sandiford wrote: Evgeny Karpov writes: Hello, Thank you for reviewing v2! v3 addresses all comments on v2. Changes in v3: - Refactor implementation for the offset limit extension in "symbol + offset" from 1MB to 16MB. - Apply HOST_WIDE_INT_PRINT_UNSIGNED in ASM

Re: [PATCH] libstdc++: Refactor std::hash specializations

2024-11-13 Thread Jonathan Wakely
I've pushed this now. On Wed, 6 Nov 2024 at 15:50, Jonathan Wakely wrote: > > This attempts to simplify and clean up our std::hash code. The primary > benefit is improved diagnostics for users when they do something wrong > involving std::hash or unordered containers. An additional benefit is > t

Re: [PATCH 0/12] libstdc++: Refactor _Hashtable class

2024-11-13 Thread Jonathan Wakely
I've pushed this series now. On Fri, 8 Nov 2024 at 15:46, Jonathan Wakely wrote: > > This patch series attempts to remove some unnecessary complexity in the > internals of std::unordered_xxx containers. There is a lot of overloading, tag > dispatching, and inheritance that can be removed by using

[PATCH V3, 11/11] Add -mcpu=future tuning support.

2024-11-13 Thread Michael Meissner
This patch makes -mtune=future use the same tuning decision as -mtune=power11. 2024-11-13 Michael Meissner gcc/ * config/rs6000/power10.md (all reservations): Add future as an alternative to power10 and power11. --- gcc/config/rs6000/power10.md | 144 +-

Ping: [PATCH 0/6] PowerPC Future support (Dense Math Registers)

2024-11-13 Thread Michael Meissner
Ping the following patch series to add PowerPC Future support for Dense Math Registers: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/62.html https://gcc.gnu.org/pipermail/gcc-patches/2024-October/63.html https://gcc.gnu.org/pipermail/gcc-patches/2024-October/64.html https://g

[committed] libstdc++: Fix calculation of system time in performance tests

2024-11-13 Thread Jonathan Wakely
The system_time() function used the wrong element of the splits array. Also add a comment about the units for time measurements. libstdc++-v3/ChangeLog: * testsuite/util/testsuite_performance.h (time_counter): Add comment about times. (time_counter::system_time): Use corr

[committed] libstdc++: Stop using std::unary_function in perf tests

2024-11-13 Thread Jonathan Wakely
This fixes some -Wdeprecated-declarations warnings. libstdc++-v3/ChangeLog: * testsuite/performance/ext/pb_ds/hash_int_erase_mem.cc: Replace std::unary_function with result_type and argument_type typedefs. * testsuite/util/performance/assoc/multimap_common_type.hpp:
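
For readers unfamiliar with the change described above, here is a minimal sketch of the general pattern: a functor that used to inherit from std::unary_function declares the two nested typedefs itself. The int_hash type is hypothetical and is not one of the functors touched by the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Before (deprecated since C++11, removed in C++17):
//   struct int_hash : std::unary_function<int, std::size_t> { ... };
// After: provide the typedefs that the base class used to supply.
struct int_hash {
  typedef std::size_t result_type;
  typedef int argument_type;

  // Trivial identity hash, only here to make the example callable.
  result_type operator()(argument_type v) const {
    return static_cast<result_type>(v);
  }
};

// Code that looks up the nested names keeps working unchanged.
static_assert(std::is_same<int_hash::result_type, std::size_t>::value,
              "result_type is still available");
static_assert(std::is_same<int_hash::argument_type, int>::value,
              "argument_type is still available");
```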

[committed] libstdc++: Write timestamp to libstdc++-performance.sum file

2024-11-13 Thread Jonathan Wakely
The results of 'make check-performance' are appended to the .sum file, with no indication where one set of results ends and the next begins. We could just remove the file when starting a new run, but appending makes it a little easier to compare with previous runs, without having to copy and store

[committed] libstdc++: Use __is_single_threaded() in performance tests

2024-11-13 Thread Jonathan Wakely
With recent glibc releases the __gthread_active_p() function is always true, so we always append "-thread" onto performance benchmark names. Use the __gnu_cxx::__is_single_threaded() function instead. libstdc++-v3/ChangeLog: * testsuite/util/testsuite_performance.h: Use __gnu_cxx

[committed] libstdc++: Fix nodiscard warnings in perf test for memory pools

2024-11-13 Thread Jonathan Wakely
The use of unnamed std::lock_guard temporaries was intentional here, as they were used like barriers (but std::barrier isn't available until C++20). But that gives nodiscard warnings, because unnamed temporary locks are usually unintentional. Use named variables in new block scopes instead. libstd
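
A minimal sketch of the idiom described above, assuming a plain std::mutex; the function names are made up and this is not the test's actual code. A named lock_guard in its own block scope acquires the mutex and releases it at the closing brace, which gives the same barrier-like effect as the unnamed temporary without the nodiscard warning.

```cpp
#include <cassert>
#include <mutex>

// Blocks until `m` can be acquired, then releases it straight away.
// Equivalent in effect to the unnamed temporary
//   std::lock_guard<std::mutex>(m);   // hypothetical "before" form
// but written with a named variable in a new block scope.
inline void wait_until_released(std::mutex& m) {
  {
    std::lock_guard<std::mutex> wait(m);  // held only inside this block
  }                                       // unlocked here
}

// Single-threaded sanity check: after the helper returns, the mutex
// must be free again, so try_lock succeeds.
inline bool released_after_wait() {
  std::mutex m;
  wait_until_released(m);
  const bool acquired = m.try_lock();
  if (acquired)
    m.unlock();
  return acquired;
}
```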

Ping: [PATCH, V2] PowerPC vector pair support

2024-11-13 Thread Michael Meissner
This patch appears to be overlooked: The first link is the long explanation of the patch, and the second link is the patch itself. https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667451.html https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667452.html -- Michael Meissner, IBM PO

Re: [PATCH V2 1/11] Add rs6000 architecture masks.

2024-11-13 Thread Michael Meissner
On Fri, Nov 08, 2024 at 02:28:11PM -0600, Peter Bergner wrote: > On 11/8/24 1:44 PM, Michael Meissner wrote: > > diff --git a/gcc/config/rs6000/rs6000-arch.def > > b/gcc/config/rs6000/rs6000-arch.def > > new file mode 100644 > > index 000..e5b6e958133 > > --- /dev/null > > +++ b/gcc/config

[PATCH V3, 10/11] Add support for -mcpu=future

2024-11-13 Thread Michael Meissner
This patch adds the support that can be used in developing GCC support for future PowerPC processors. 2024-11-13 Michael Meissner * config.gcc (powerpc*-*-*): Add support for --with-cpu=future. * config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=future. * conf

[PATCH V3, 09/11] Update tests to work with architecture flags changes.

2024-11-13 Thread Michael Meissner
Two tests used -mvsx to raise the processor level to at least power7. These tests were rewritten to add cpu=power7 support. I have built both big endian and little endian bootstrap compilers and there were no regressions. In addition, I constructed a test case that used every architecture define

[PATCH V3, 08/11] Change TARGET_MODULO to TARGET_POWER9

2024-11-13 Thread Michael Meissner
As part of the architecture flags patches, this patch changes the use of TARGET_MODULO to TARGET_POWER9. The modulo instructions were added in power9 (ISA 3.0). Note, I did not change the uses of TARGET_MODULO where it was explicitly generating different code if the machine had a modulo instruct

[PATCH V3, 03/11] Do not allow -mvsx to boost processor to power7.

2024-11-13 Thread Michael Meissner
This patch restructures the code so that -mvsx for example will not silently convert the processor to power7. The user must now use -mcpu=power7 or higher. This means if the user does -mvsx and the default processor does not have VSX support, it will be an error. I have built both big endian and

[PATCH V3, 07/11] Change TARGET_POPCNTD to TARGET_POWER7

2024-11-13 Thread Michael Meissner
As part of the architecture flags patches, this patch changes the use of TARGET_POPCNTD to TARGET_POWER7. The POPCNTD instruction was added in power7 (ISA 2.06). I have built both big endian and little endian bootstrap compilers and there were no regressions. In addition, I constructed a test ca

[PATCH V3, 05/11] Change TARGET_FPRND to TARGET_POWER5X

2024-11-13 Thread Michael Meissner
As part of the architecture flags patches, this patch changes the use of TARGET_FPRND to TARGET_POWER5X. The FPRND instruction was added in power5+. I have built both big endian and little endian bootstrap compilers and there were no regressions. In addition, I constructed a test case that used

[PATCH V3, 04/11] Change TARGET_POPCNTB to TARGET_POWER5

2024-11-13 Thread Michael Meissner
As part of the architecture flags patches, this patch changes the use of TARGET_POPCNTB to TARGET_POWER5. The POPCNTB instruction was added in ISA 2.02 (power5). I have built both big endian and little endian bootstrap compilers and there were no regressions. In addition, I constructed a test ca

[PATCH V3, 00/11] Separate PowerPC architecture bits from ISA flags that use command line option.

2024-11-13 Thread Michael Meissner
These patches replace the first patch in the 11 patch set that separates PowerPC architecture bits from ISA flags that use command line options. The V2 patch thread starts at: https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668177.html There are two differences from the previous patches:

[PATCH V3, 02/11] Use architecture flags for defining _ARCH_PWR macros.

2024-11-13 Thread Michael Meissner
For the newer architectures, this patch changes GCC to define the _ARCH_PWR macros using the new architecture flags instead of relying on isa options like -mpower10. The -mpower8-internal, -mpower10, and -mpower11 options were removed. The -mpower11 option was removed completely, since it was jus

[PATCH V3, 01/11] Add rs6000 architecture masks.

2024-11-13 Thread Michael Meissner
Note, this patch fixes the attribution and the copyright year from the previous V2 page. This patch begins the journey to move architecture bits that are not user ISA options from rs6000_isa_flags to a new target variable rs6000_arch_flags. The intention is to remove switches that are currently is

[PATCH 4/8] libdiagnostics v4: add C++ wrapper API

2024-11-13 Thread David Malcolm
Unchanged in v4 Changed in v3: * Moved the testsuite to a separate patch * Updated copyright year * class text_sink: New. * class file: Add default ctor, copy ctor, move ctor; make m_inner non-const * class physical_location: Add default ctor * class logical_location: Make m_inner non-const * cl

[PATCH 6/8] libdiagnostics v4: test suite

2024-11-13 Thread David Malcolm
Changed in v4: * Fix SARIF schema URL * Various changes to help with API docs Changed in v3: * split out the C and C++ API tests into this patch * heavily rewritten libdiagnostics.exp; added support for Python tests * tests updated for API changes, rewritten and extended gcc/testsuite/ChangeLog:

[PATCH 7/8] json: add json parsing support

2024-11-13 Thread David Malcolm
This patch implements JSON parsing support. It's based on the parsing parts of the patch I posted here: https://gcc.gnu.org/legacy-ml/gcc-patches/2017-08/msg00417.html with the parsing moved to a separate source file and header, heavily rewritten to capture source location information for JSON val

[PATCH 1/8] libdiagnostics v4: header

2024-11-13 Thread David Malcolm
Changed in v4: * added DIAGNOSTIC_SARIF_VERSION_2_2_PRERELEASE Changed in v3: * Added support for execution paths * Moved the test cases to another patch * diagnostic_manager_add_sarif_sink: add param "main_input_file" * Added diagnostic_text_sink_set_colorize * Added DIAGNOSTIC_LEVEL_SORRY * Upda

[PATCH 5/8] testsuite: move dg-test cleanup code from gcc-dg.exp to its own file

2024-11-13 Thread David Malcolm
I need to use this cleanup logic for the testsuite for libdiagnostics where it's too awkward to directly use gcc-dg.exp itself. No functional change intended. gcc/testsuite/ChangeLog: * lib/dg-test-cleanup.exp: New file, from material moved from lib/gcc-dg.exp. * lib/gcc-d

[PATCH 3/8] libdiagnostics: add API docs

2024-11-13 Thread David Malcolm
gcc/ChangeLog: * doc/libdiagnostics/Makefile: New file. * doc/libdiagnostics/conf.py: New file. * doc/libdiagnostics/index.rst: New file. * doc/libdiagnostics/make.bat: New file. * doc/libdiagnostics/topics/diagnostic-manager.rst: New file. * doc/libd

[PATCH 2/8] libdiagnostics v4: implementation

2024-11-13 Thread David Malcolm
Changed in v4: * Updated for the various changes to diagnostics in trunk * Reimplement FAIL_IF_NULL to stop checks being optimized away Changed in v3: * Added a --enable-libdiagnostics to configure.ac. It is disabled by default, and requires --enable-host-shared. * Split out gcc/testsuite/libdi

[PATCH 0/8] v4 of libdiagnostics

2024-11-13 Thread David Malcolm
Here's v4 of my patch kit for "libdiagnostics", which makes GCC's diagnostics subsystem available as a shared library; see: https://gcc.gnu.org/wiki/libdiagnostics New in v4: * tutorial and API documentation (see patch 4) * added DIAGNOSTIC_SARIF_VERSION_2_2_PRERELEASE * reimplemented FAIL_IF_NU

[patch,lra] PR117191 remove unnecessary CLOBBER insns after LRA

2024-11-13 Thread Denis Chertykov
The fix for PR117191 Wrong code appears after dse2 pass because it removes necessary insns. (ie insn 554 - store to frame spill slot) This happened because LRA pass doesn't cleanup the code exactly like reload does. The reload1.c has a special pass for such cleanup. The reload removes CLOBBER in

Re: [PATCH] Add new hardreg PRE pass

2024-11-13 Thread Richard Sandiford
Andrew Carlotti writes: > On Tue, Nov 12, 2024 at 10:42:50PM +, Richard Sandiford wrote: >> Sorry for the slow review. I think Jeff's much better placed to comment >> on this than I am, but here's a stab. Mostly it looks really good to me >> FWIW. >> >> Andrew Carlotti writes: >> > This pa

Re: [PATCH 0/7] v3 of libdiagnostics

2024-11-13 Thread David Malcolm
On Wed, 2024-08-21 at 10:34 +0200, Richard Biener wrote: > On Wed, Aug 21, 2024 at 2:01 AM David Malcolm > wrote: > > > > On Tue, 2024-08-20 at 11:49 +0200, Richard Biener wrote: > > > On Thu, Aug 15, 2024 at 8:13 PM David Malcolm > > > > > > wrote: > > > > > > > > Here's v3 of my patch kit for

Re: [PATCH] Add new hardreg PRE pass

2024-11-13 Thread Richard Sandiford
Richard Biener writes: > On Tue, 12 Nov 2024, Richard Sandiford wrote: > >> Sorry for the slow review. I think Jeff's much better placed to comment >> on this than I am, but here's a stab. Mostly it looks really good to me >> FWIW. >> >> Andrew Carlotti writes: >> > This pass is used to optimi

Re: [PATCH] Add new hardreg PRE pass

2024-11-13 Thread Andrew Carlotti
On Tue, Nov 12, 2024 at 10:42:50PM +, Richard Sandiford wrote: > Sorry for the slow review. I think Jeff's much better placed to comment > on this than I am, but here's a stab. Mostly it looks really good to me > FWIW. > > Andrew Carlotti writes: > > This pass is used to optimise assignment

[pushed] aarch64: Relax add_overloaded_function assert

2024-11-13 Thread Richard Sandiford
There are some SVE intrinsics that support one set of suffixes for one extension (E1, say) and another set of suffixes for another extension (E2, say). It is usually the case that, mutatis mutandis, E2 extends E1. Listing E1 first would then ensure that the manual C overload would also require E1

[PATCH v2 4/4] aarch64: add SVE2 FP8 multiply accumulate intrinsics

2024-11-13 Thread Claudio Bantaloukas
This patch adds support for the following intrinsics: - svmlalb[_f16_mf8]_fpm - svmlalb[_n_f16_mf8]_fpm - svmlalt[_f16_mf8]_fpm - svmlalt[_n_f16_mf8]_fpm - svmlalb_lane[_f16_mf8]_fpm - svmlalt_lane[_f16_mf8]_fpm - svmlallbb[_f32_mf8]_fpm - svmlallbb[_n_f32_mf8]_fpm - svmlallbt[_f32_mf8]_fpm - svml

[PATCH] cfgexpand: Skip doing conflicts if there is only 1 variable

2024-11-13 Thread Andrew Pinski
This is a small speed up. If there is only one known stack variable, there is no reason to figure out the scope conflicts as there are none. So don't go through all the live range calculations just to see there are none. Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog:

[PATCH] ada: PR target/117538 Traceback includes load address if executable is PIE.

2024-11-13 Thread Simon Wright
If s-trasym.adb (System.Traceback.Symbolic, used as a renaming by GNAT.Traceback.Symbolic) is given a traceback from a position-independent executable, it does not include the executable's load address in the report. This is necessary in order to decode the traceback report. Note, this has already

[PATCH] RISC-V: Add VLS modes to strided loads.

2024-11-13 Thread Robin Dapp
Hi, this patch adds VLS modes to the strided load expanders. Regtested on rv64gcv and handing it over to the CI. Regards Robin gcc/ChangeLog: * config/riscv/autovec.md: Add VLS modes. * config/riscv/vector-iterators.md: Ditto. * config/riscv/vector.md: Ditto. --- gcc/

[PATCH v3 2/4] aarch64: specify fpm mode in function instances and groups

2024-11-13 Thread Claudio Bantaloukas
Some intrinsics require setting the fpm register before calling the specific asm opcode required. In order to simplify review, this patch: - adds the fpm_mode_index attribute to function_group_info and function_instance objects - updates existing initialisations and call sites. - updates equalit

[PATCH v2 0/4] aarch64: Add fp8 sve foundation

2024-11-13 Thread Claudio Bantaloukas
The ACLE defines a new set of fp8 vector types and intrinsics that operate on these, some of them operating on the vectors as if they were bags of bits and some requiring an additional argument of type fpm_t. The following patches introduce: - the types - intrinsics that operate without the fpm_

[PATCH v2 3/4] aarch64: add svcvt* FP8 intrinsics

2024-11-13 Thread Claudio Bantaloukas
This patch adds the following intrinsics: - svcvt1_bf16[_mf8]_fpm - svcvt1_f16[_mf8]_fpm - svcvt2_bf16[_mf8]_fpm - svcvt2_f16[_mf8]_fpm - svcvtlt1_bf16[_mf8]_fpm - svcvtlt1_f16[_mf8]_fpm - svcvtlt2_bf16[_mf8]_fpm - svcvtlt2_f16[_mf8]_fpm - svcvtn_mf8[_f16_x2]_fpm (unpredicated) - svcvtnb_mf8[_f32_

[PATCH] RISC-V: Tie MUL and DIV masks to the M extension

2024-11-13 Thread Dimitar Dimitrov
When configuring GCC for RV32EC with: ./configure \ --target=riscv32-none-elf \ --with-multilib-generator="rv32ec-ilp32e--" \ --with-abi=ilp32e \ --with-arch=rv32ec Then the build fails becaus

[PATCH v3 4/4] aarch64: add SVE2 FP8 multiply accumulate intrinsics

2024-11-13 Thread Claudio Bantaloukas
This patch adds support for the following intrinsics: - svmlalb[_f16_mf8]_fpm - svmlalb[_n_f16_mf8]_fpm - svmlalt[_f16_mf8]_fpm - svmlalt[_n_f16_mf8]_fpm - svmlalb_lane[_f16_mf8]_fpm - svmlalt_lane[_f16_mf8]_fpm - svmlallbb[_f32_mf8]_fpm - svmlallbb[_n_f32_mf8]_fpm - svmlallbt[_f32_mf8]_fpm - svml

[PATCH v3 3/4] aarch64: add svcvt* FP8 intrinsics

2024-11-13 Thread Claudio Bantaloukas
This patch adds the following intrinsics: - svcvt1_bf16[_mf8]_fpm - svcvt1_f16[_mf8]_fpm - svcvt2_bf16[_mf8]_fpm - svcvt2_f16[_mf8]_fpm - svcvtlt1_bf16[_mf8]_fpm - svcvtlt1_f16[_mf8]_fpm - svcvtlt2_bf16[_mf8]_fpm - svcvtlt2_f16[_mf8]_fpm - svcvtn_mf8[_f16_x2]_fpm (unpredicated) - svcvtnb_mf8[_f32_

[PATCH v3 0/4] aarch64: Add fp8 sve foundation

2024-11-13 Thread Claudio Bantaloukas
The ACLE defines a new set of fp8 vector types and intrinsics that operate on these, some of them operating on the vectors as if they were bags of bits and some requiring an additional argument of type fpm_t. The following patches introduce: - the types - intrinsics that operate without the fpm_

Re: [PATCH v2 0/4] aarch64: Add fp8 sve foundation

2024-11-13 Thread Claudio Bantaloukas
Please disregard this series, posted as v2 by mistake. Cheers, Claudio On 11/13/2024 4:34 PM, Claudio Bantaloukas wrote: The ACLE defines a new set of fp8 vector types and intrinsics that operate on these, some of them operating on the vectors as if they were bags of bits and some requiring an

[PATCH v2 2/4] aarch64: specify fpm mode in function instances and groups

2024-11-13 Thread Claudio Bantaloukas
Some intrinsics require setting the fpm register before calling the specific asm opcode required. In order to simplify review, this patch: - adds the fpm_mode_index attribute to function_group_info and function_instance objects - updates existing initialisations and call sites. - updates equalit

Re: [PATCH v2] xtensa: Fix the issue in "*extzvsi-1bit_addsubx"

2024-11-13 Thread Alexey Lapshin
Takayuki, thank you for the quick fix! It seems to work well now, except for one degradation. Instead of generating two instructions: 7 ptr += (i & 1); 0x40078564 <+12>:extui a9, a8, 0, 1 0x40078567 <+15>:addx2 a2, a9, a2 Now it generates three: 7 ptr

Re: [PATCH] v2: Run selftests for C++ as well as C

2024-11-13 Thread Thomas Schwinge
Hi! I'd like to add selftests for an aspect of the GCC/nvptx back end's multilib configuration, outside of the language front ends: at Makefile/shell level. Looking into GCC's selftest implementation, I found one issue to potentially refactor: On 2018-10-13T09:12:03-0400, David Malcolm wrote: >

Re: rs6000: Add -msplit-patch-nops (PR112980)

2024-11-13 Thread Andreas Schwab
On Nov 13 2024, Michael Matz wrote: > @@ -31658,6 +31660,17 @@ requires @code{.plt} and @code{.got} > sections that are both writable and executable. > This is a PowerPC 32-bit SYSV ABI option. > > +@opindex msplit-patch-nops > +@item -msplit-patch-nops > +When adding NOPs for a patchable area

rs6000: Add -msplit-patch-nops (PR112980)

2024-11-13 Thread Michael Matz
Hello, this is essentially https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html from Kewen in functionality. When discussing this with Segher at the Cauldron he expressed reservations about changing the default implementation of -fpatchable-function-entry. So, to move forward, l

Re: [PATCH] i386: Add -mveclibabi=aocl [PR56504]

2024-11-13 Thread Filip Kastl
On Wed 2024-11-13 15:18:32, Jan Hubicka wrote: > > - sincos and all functions working with arrays ... Because these > > functions have pointer arguments and that would require a bigger > > rework of ix86_veclibabi_aocl(). Also, I'm not sure if GCC even ever > > generates calls to these funct

[committed] hppa: Remove inner `fix:SF/DF` from fixed-point patterns

2024-11-13 Thread John David Anglin
Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11. Committed to all active branches. Dave --- hppa: Remove inner `fix:SF/DF` from fixed-point patterns 2024-11-13 John David Anglin gcc/ChangeLog: PR target/117525 * config/pa/pa.md (fix_truncsfsi2): Remove inner `fix:S

Re: [PATCH] RISC-V: Bugfix for unrecognizable insn for XTheadVector

2024-11-13 Thread Robin Dapp
OK. For your other patch I suggest you resubmit with the RISC-V typo fixed so the CI can pick it up. Generally, it looks reasonable. -- Regards Robin

Re: [PATCH v3 11/23] aarch64: Add GCS support for nonlocal stack save

2024-11-13 Thread Richard Sandiford
Yury Khrustalev writes: > From: Szabolcs Nagy > > Nonlocal stack save and restore has to also save and restore the GCS > pointer. This is used in __builtin_setjmp/longjmp and nonlocal goto. > > The GCS specific code is only emitted if GCS branch-protection is > enabled and the code always checks

Re: Implement removal of malloc/free pairs with NULL check

2024-11-13 Thread Richard Biener
On Wed, 6 Nov 2024, Jan Hubicka wrote: > Hi, > this is updated patch which adds -fmalloc-dce flag to control malloc/free > removal. I ended up copying what -fallocation-dse does so -fmalloc-dce=1 > enables malloc/free removal provided return value is unused otherwise and > -fmalloc-dce=2 allows a

Re: [PATCH] i386: Add -mveclibabi=aocl [PR56504]

2024-11-13 Thread Jan Hubicka
> - sincos and all functions working with arrays ... Because these > functions have pointer arguments and that would require a bigger > rework of ix86_veclibabi_aocl(). Also, I'm not sure if GCC even ever > generates calls to these functions. GCC is able to recognize sin and cos calls and tu

Re: Add testcase that we optimize away empty std::vector

2024-11-13 Thread Jan Hubicka
> On Tue, Nov 12, 2024 at 04:00:03PM +0100, Jan Hubicka wrote: > > Hi, > > with __builtin_operator_new we now can optimize away unused std::vectors. > > This adds testcases mentioned in the PR. > > > > Regtested x86_64-linux and committed. > > > > PR tree-optimization/96945 > > > > gcc/testsu

Re: [RFC/RFA][PATCH v6 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-11-13 Thread Mariam Arutunian
On Tue, Nov 12, 2024 at 2:15 AM Jeff Law wrote: > > + > > + > > +/* Generate assembly to calculate CRC using clmul instruction. > > + The following code will be generated when the CRC and data sizes are > equal: > > + li a4,quotient > > + li a5,polynomial > > + xor a0,

[PATCH] tree-optimization/117554 - correct single-element interleaving check

2024-11-13 Thread Richard Biener
In addition to a single DR we also require a single lane, not a splat. Bootstrap and regtest running on x86_64-unknown-linux-gnu. PR tree-optimization/117554 * tree-vect-stmts.cc (get_group_load_store_type): We can use gather/scatter only for a single-lane single element gr

[PATCH] tree-optimization/117556 - SLP of live stmts from load-lanes

2024-11-13 Thread Richard Biener
The following fixes SLP live lane generation for load-lanes which fails to analyze for gcc.dg/vect/vect-live-slp-3.c because the VLA division doesn't work out but it would also wrongly use the transposed vector defs I think. The following properly disables the actual load-lanes SLP node from live

Re: [PATCH] c++: Add __builtin_operator_{new,delete} support

2024-11-13 Thread Jan Hubicka
> Hi! > > clang++ adds __builtin_operator_{new,delete} builtins which as documented > work similarly to ::operator {new,delete}, except that it is an error > if the called ::operator {new,delete} is not a replaceable global operator > and allow optimizations which C++ normally allows just when tho

Re: [PATCH v4 4/7] OpenMP: C++ front-end support for dispatch + adjust_args

2024-11-13 Thread Tobias Burnus
Hi PA, thanks for the updated patch! Paul-Antoine Arras wrote: OpenMP: C++ front-end support for dispatch + adjust_args This patch adds C++ support for the `dispatch` construct and the `adjust_args` clause. It relies on the c-family bits comprised in the corresponding C f

Re: [PATCH v2] AArch64: Block combine_and_move from creating FP literal loads

2024-11-13 Thread Wilco Dijkstra
Hi Richard, > ...I still think we should avoid testing can_create_pseudo_p. > Does it work with the last part replaced by: > >  if (!DECIMAL_FLOAT_MODE_P (mode)) >    { >  if (aarch64_can_const_movi_rtx_p (src, mode) >  || aarch64_float_const_representable_p (src) >  || aarch64

[pushed: r15-5202] diagnostics: avoid using global_dc in path-printing

2024-11-13 Thread David Malcolm
Successfully bootstrapped & regtested on x86_64-pc-linux-gnu. Pushed to trunk as r15-5202-g5ace2b23199f42. gcc/analyzer/ChangeLog: * checker-path.cc (checker_path::debug): Explicitly use global_dc's reference printer. * diagnostic-manager.cc (diagnostic_manager::pr

Re: [PATCH v3 08/23] aarch64: Add GCS builtins

2024-11-13 Thread Richard Sandiford
Richard Sandiford writes: > Yury Khrustalev writes: >> From: Szabolcs Nagy >> >> Add new builtins for GCS: >> >> void *__builtin_aarch64_gcspr (void) >> uint64_t __builtin_aarch64_gcspopm (void) >> void *__builtin_aarch64_gcsss (void *) >> >> The builtins are always enabled, but should be

match.pd: Add pattern to simplify `((X - 1) & ~X) < 0` to `X == 0`

2024-11-13 Thread Jovan Vukic
The patch makes the following simplifications: ((X - 1) & ~X) < 0 -> X == 0 ((X - 1) & ~X) >= 0 -> X != 0 On x86, the number of instructions is reduced from 4 to 3, but on platforms like RISC-V, it reduces to a single instruction. Bootstrapped and tested on x86-linux-gnu with no regressions. gcc
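The equivalence stated above can be checked with a small sketch. The helper names `sign_test_lt`, `simplified_lt`, and `check_sample` are hypothetical and not taken from the patch; the arithmetic is done on `uint32_t` so that the `X - 1` step wraps instead of overflowing a signed type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Original form: ((X - 1) & ~X) < 0, i.e. the sign bit of
   (X - 1) & ~X.  Computed in unsigned arithmetic to keep the
   subtraction well defined for every input.  */
static bool sign_test_lt(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t v = (ux - 1u) & ~ux;
    return (v >> 31) != 0;
}

/* Simplified form from the subject line: X == 0.  */
static bool simplified_lt(int32_t x)
{
    return x == 0;
}

/* Assert that both forms agree for one sample value.  */
static void check_sample(int32_t x)
{
    assert(sign_test_lt(x) == simplified_lt(x));
}
```

The sign bit of the result is set only when `X - 1` is negative while `X` is non-negative, which happens only for `X == 0`; the `>= 0` variant is the negation of the same test.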

[PATCH] tree-optimization/117559 - avoid hybrid SLP for masked load/store lanes

2024-11-13 Thread Richard Biener
Hybrid analysis is confused by the mask_conversion pattern making a uniform mask non-uniform. As load/store lanes only uses a single lane to mask all data lanes the SLP graph doesn't cover the alternate (redundant) mask lanes and thus their pattern defs. The following adds a hack to mark them cov

match.pd: Add pattern to simplify `(a - 1) & -a` to `0`

2024-11-13 Thread Jovan Vukic
The patch simplifies expressions (a - 1) & -a, (a - 1) | -a, and (a - 1) ^ -a to the constants 0, -1, and -1, respectively. Currently, GCC does not perform these simplifications. Bootstrapped and tested on x86-linux-gnu with no regressions. gcc/ChangeLog: * match.pd: New pattern. gcc/t
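The three identities follow from `-a == ~(a - 1)` in two's complement, so each expression combines a value with its own bitwise complement. A minimal sketch, using hypothetical helper names and `uint32_t` so that negation and subtraction wrap:

```c
#include <stdint.h>

/* (a - 1) & -a: a value ANDed with its complement is 0.  */
static uint32_t and_form(uint32_t a)
{
    return (a - 1u) & (0u - a);
}

/* (a - 1) | -a: a value ORed with its complement is all ones (-1).  */
static uint32_t or_form(uint32_t a)
{
    return (a - 1u) | (0u - a);
}

/* (a - 1) ^ -a: a value XORed with its complement is all ones (-1).  */
static uint32_t xor_form(uint32_t a)
{
    return (a - 1u) ^ (0u - a);
}
```

Here `UINT32_MAX` stands in for the constant -1 mentioned in the message, since the sketch works on an unsigned type.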

Re: [PATCH] i386: Add -mveclibabi=aocl [PR56504]

2024-11-13 Thread Filip Kastl
Hi Honza, Here is the second version of the patch. On Mon 2024-11-11 18:31:47, Jan Hubicka wrote: > > We currently support generating vectorized math calls to the AMD core > > math library (ACML) (-mveclibabi=acml). That library is end-of-life and > > its successor is the math library from AMD O

Re: [PATCH v3 21/23] aarch64: Add tests and docs for indirect_return attribute

2024-11-13 Thread Richard Sandiford
Yury Khrustalev writes: > From: Richard Ball > > This patch adds a new testcase and docs for indirect_return > attribute. > > gcc/ChangeLog: > > * doc/extend.texi: Add AArch64 docs for indirect_return > attribute. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/indirect_re

Re: [PATCH v3 20/23] aarch64: Introduce indirect_return attribute

2024-11-13 Thread Richard Sandiford
Yury Khrustalev writes: > From: Szabolcs Nagy > > Tail calls of indirect_return functions from non-indirect_return > functions are disallowed even if BTI is disabled, since the call > site may have BTI enabled. > > Following x86, mismatching attribute on function pointers is not > a type error ev

Re: [PATCH] SVE intrinsics: Fold svmul and svdiv by -1 to svneg for unsigned types

2024-11-13 Thread Richard Sandiford
Jennifer Schmitz writes: > As follow-up to > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html, > this patch implements folding of svmul and svdiv by -1 to svneg for > unsigned SVE vector types. The key idea is to reuse the existing code that > does this fold for signed types and

Re: [PATCH v2] contrib/: Configure git-format-patch(1) to add To: gcc-patches@gcc.gnu.org

2024-11-13 Thread Alejandro Colomar
Hi Eric, On Thu, Oct 17, 2024 at 03:20:11PM GMT, Eric Gallager wrote: > On Thu, Oct 17, 2024 at 10:54 AM Alejandro Colomar wrote: > > > > Just like we already do for git-send-email(1). In some cases, patches > > are prepared with git-format-patch(1), but are sent with a different > > program, or
