Re: [PATCH] Match: Fold pow calls to ldexp when possible [PR57492]

2024-11-13 Thread Richard Biener
On Wed, 13 Nov 2024, Soumya AR wrote: > > > > On 12 Nov 2024, at 6:19 PM, Richard Biener wrote: > > > > > > On Mon, 11 Nov 2024, Soumya AR wrote: > > > >> Hi Richard, > >> > >>> On 7 Nov 2024, at 6:10 PM, Richard Biener wrot

RE: [PATCH 1/2] Add suggested_epilogue_mode to vector costs

2024-11-13 Thread Richard Biener
On Wed, 13 Nov 2024, Tamar Christina wrote: > > -Original Message- > > From: Richard Biener > > Sent: Monday, November 11, 2024 12:17 PM > > To: gcc-patches@gcc.gnu.org > > Cc: Richard Sandiford ; Tamar Christina > > > > Subject: [PATCH 1/2] Add suggested_epilogue_mode to vector costs >

Re: [PATCH] Add support for nonnull_if_nonzero attribute [PR117023]

2024-11-13 Thread Richard Biener
On Tue, 12 Nov 2024, Jakub Jelinek wrote: > Hi! > > As mentioned in an earlier thread, C2Y voted in a change which made > various library APIs callable with NULL arguments in certain cases, > e.g. > memcpy (NULL, NULL, 0); > is now valid, although > memcpy (NULL, NULL, 1); > remains invalid. Thi
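For readers skimming the thread, here is a minimal sketch of how such an attribute might be used on a user function. The wrapper name copy_bytes, the helper copy_bytes_demo, and the (pointer-index, size-index) argument form of the attribute are assumptions for illustration and are not taken from the patch text above. The attribute is only applied where the compiler reports support for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Use the attribute only when the compiler knows it.  The argument
   form (pointer index, size index) is an assumption of this sketch.  */
#if defined(__has_attribute)
# if __has_attribute(nonnull_if_nonzero)
#  define NONNULL_IF_NONZERO(p, n) __attribute__((nonnull_if_nonzero(p, n)))
# endif
#endif
#ifndef NONNULL_IF_NONZERO
# define NONNULL_IF_NONZERO(p, n)
#endif

/* Hypothetical wrapper: dst and src may be NULL only when n == 0,
   mirroring the memcpy (NULL, NULL, 0) case mentioned above.  */
NONNULL_IF_NONZERO(1, 3) NONNULL_IF_NONZERO(2, 3)
void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n != 0)
        memcpy(dst, src, n);
    return dst;
}

/* Exercises both the empty and the non-empty case; returns 1 on success.  */
int copy_bytes_demo(void)
{
    char out[4] = {0};
    assert(copy_bytes(NULL, NULL, 0) == NULL);
    copy_bytes(out, "abc", 4);
    return strcmp(out, "abc") == 0;
}
```

On compilers without the attribute the macro expands to nothing, so the wrapper still builds and behaves the same; only the extra diagnostics and optimization hints are lost.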

[PATCH] RISC-V: Bugfix for unrecognizable insn for XTheadVector

2024-11-13 Thread Jin Ma
error: unrecognizable insn: (insn 35 34 36 2 (set (subreg:RVVM1SF (reg/v:RVVM1x4SF 142 [ _r ]) 0) (unspec:RVVM1SF [ (const_vector:RVVM1SF repeat [ (const_double:SF 0.0 [0x0.0p+0]) ]) (reg:DI 0 zero)

Re: [PATCH] Add support for nonnull_if_nonzero attribute [PR117023]

2024-11-13 Thread Jakub Jelinek
On Wed, Nov 13, 2024 at 10:13:23AM +0100, Richard Biener wrote: > > /* Return true if OP can be inferred to be a non-NULL after STMT > > - executes by using attributes. */ > > + executes by using attributes. If OP2 is non-NULL and nonnull_if_nonzero > > + is the only attribute implying OP

Re: [PATCH v2] contrib/: Configure git-format-patch(1) to add To: gcc-patches@gcc.gnu.org

2024-11-13 Thread Alejandro Colomar
Hi Eric, On Thu, Oct 17, 2024 at 03:20:11PM GMT, Eric Gallager wrote: > On Thu, Oct 17, 2024 at 10:54 AM Alejandro Colomar wrote: > > > > Just like we already do for git-send-email(1). In some cases, patches > > are prepared with git-format-patch(1), but are sent with a different > > program, or

Re: [PATCH] i386: Add -mveclibabi=aocl [PR56504]

2024-11-13 Thread Filip Kastl
Hi Honza, Here is the second version of the patch. On Mon 2024-11-11 18:31:47, Jan Hubicka wrote: > > We currently support generating vectorized math calls to the AMD core > > math library (ACML) (-mveclibabi=acml). That library is end-of-life and > > its successor is the math library from AMD O

match.pd: Add pattern to simplify `(a - 1) & -a` to `0`

2024-11-13 Thread Jovan Vukic
The patch simplifies expressions (a - 1) & -a, (a - 1) | -a, and (a - 1) ^ -a to the constants 0, -1, and -1, respectively. Currently, GCC does not perform these simplifications. Bootstrapped and tested on x86-linux-gnu with no regressions. gcc/ChangeLog: * match.pd: New pattern. gcc/t
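The three identities follow from -a being the bitwise complement of a - 1 in two's complement arithmetic. The sketch below checks them in plain C; it is not the match.pd pattern itself. The function names are made up for illustration, and unsigned 32-bit arithmetic is used so that wraparound at a == 0 is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* The three source-level forms.  0u - a is the unsigned spelling of -a.  */
uint32_t and_form(uint32_t a) { return (a - 1u) & (0u - a); }
uint32_t or_form(uint32_t a)  { return (a - 1u) | (0u - a); }
uint32_t xor_form(uint32_t a) { return (a - 1u) ^ (0u - a); }

/* Returns 1 when all three forms yield the constants named above:
   0 for &, and all-ones (-1) for | and ^.  */
int identities_hold(uint32_t a)
{
    return and_form(a) == 0u
        && or_form(a) == UINT32_MAX
        && xor_form(a) == UINT32_MAX;
}

/* Spot-check a few interesting values, including both extremes.  */
void check_identities(void)
{
    assert(identities_hold(0u));
    assert(identities_hold(1u));
    assert(identities_hold(0x80000000u));
    assert(identities_hold(UINT32_MAX));
}
```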

[PATCH] tree-optimization/117559 - avoid hybrid SLP for masked load/store lanes

2024-11-13 Thread Richard Biener
Hybrid analysis is confused by the mask_conversion pattern making a uniform mask non-uniform. As load/store lanes only use a single lane to mask all data lanes, the SLP graph doesn't cover the alternate (redundant) mask lanes and thus their pattern defs. The following adds a hack to mark them cov

Re: [PATCH v3 ] i386: Add ix86_expand_integer_cst_argument

2024-11-13 Thread Jakub Jelinek
On Wed, Nov 13, 2024 at 09:22:45AM +0100, Richard Biener wrote: > While I'm far from an expert here this doesn't look right and instead the > const_0_to_255_operand looks bogus to me in not properly taking into > account 'mode'. I think the bug is in use of const_0_to_255* predicates with QImode o

Re: [PATCH] SVE intrinsics: Fold svmul and svdiv by -1 to svneg for unsigned types

2024-11-13 Thread Richard Sandiford
Jennifer Schmitz writes: > As follow-up to > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html, > this patch implements folding of svmul and svdiv by -1 to svneg for > unsigned SVE vector types. The key idea is to reuse the existing code that > does this fold for signed types and

Re: [PATCH v3 20/23] aarch64: Introduce indirect_return attribute

2024-11-13 Thread Richard Sandiford
Yury Khrustalev writes: > From: Szabolcs Nagy > > Tail calls of indirect_return functions from non-indirect_return > functions are disallowed even if BTI is disabled, since the call > site may have BTI enabled. > > Following x86, mismatching attribute on function pointers is not > a type error ev

Re: [PATCH v3 21/23] aarch64: Add tests and docs for indirect_return attribute

2024-11-13 Thread Richard Sandiford
Yury Khrustalev writes: > From: Richard Ball > > This patch adds a new testcase and docs for indirect_return > attribute. > > gcc/ChangeLog: > > * doc/extend.texi: Add AArch64 docs for indirect_return > attribute. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/indirect_re

match.pd: Add pattern to simplify `((X - 1) & ~X) < 0` to `X == 0`

2024-11-13 Thread Jovan Vukic
The patch makes the following simplifications: ((X - 1) & ~X) < 0 -> X == 0 ((X - 1) & ~X) >= 0 -> X != 0 On x86, the number of instructions is reduced from 4 to 3, but on platforms like RISC-V, it reduces to a single instruction. Bootstrapped and tested on x86-linux-gnu with no regressions. gcc
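The equivalence can be checked exhaustively on a narrow type. The sketch below is an illustration only, not the match.pd pattern: the function names are invented, and the computation is modelled on 16-bit values with unsigned wraparound, since X - 1 for the most negative value would overflow in signed C arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* ((X - 1) & ~X) < 0, modelled on 16 bits with wrapping arithmetic.
   "Less than zero" is read off the sign bit of the 16-bit result.  */
int original_lt(int16_t x)
{
    uint16_t ux = (uint16_t)x;
    uint16_t t = (uint16_t)((uint16_t)(ux - 1u) & (uint16_t)~ux);
    return (t & 0x8000u) != 0;
}

/* The simplified form proposed for the < 0 comparison.  */
int simplified_lt(int16_t x) { return x == 0; }

/* Counts inputs where either the < 0 or the >= 0 rewrite disagrees
   with the original expression.  */
int count_mismatches(void)
{
    int bad = 0;
    for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
        int16_t x = (int16_t)v;
        if (original_lt(x) != simplified_lt(x))
            ++bad;
        if (!original_lt(x) != (x != 0))
            ++bad;
    }
    return bad;
}

/* Aborts if any 16-bit input breaks the equivalence.  */
void check_all_int16(void)
{
    assert(count_mismatches() == 0);
}
```

The sign bit of (X - 1) & ~X is set only when X - 1 is negative while X is not, which happens for X == 0 alone under wrapping semantics.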

Re: [PATCH v3 08/23] aarch64: Add GCS builtins

2024-11-13 Thread Richard Sandiford
Richard Sandiford writes: > Yury Khrustalev writes: >> From: Szabolcs Nagy >> >> Add new builtins for GCS: >> >> void *__builtin_aarch64_gcspr (void) >> uint64_t __builtin_aarch64_gcspopm (void) >> void *__builtin_aarch64_gcsss (void *) >> >> The builtins are always enabled, but should be

[pushed: r15-5202] diagnostics: avoid using global_dc in path-printing

2024-11-13 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r15-5202-g5ace2b23199f42. gcc/analyzer/ChangeLog: * checker-path.cc (checker_path::debug): Explicitly use global_dc's reference printer. * diagnostic-manager.cc (diagnostic_manager::pr

Re: [PATCH v2] AArch64: Block combine_and_move from creating FP literal loads

2024-11-13 Thread Wilco Dijkstra
Hi Richard, > ...I still think we should avoid testing can_create_pseudo_p. > Does it work with the last part replaced by: > >  if (!DECIMAL_FLOAT_MODE_P (mode)) >    { >  if (aarch64_can_const_movi_rtx_p (src, mode) >  || aarch64_float_const_representable_p (src) >  || aarch64

Re: [PATCH] Match: Fold pow calls to ldexp when possible [PR57492]

2024-11-13 Thread Soumya AR
> On 12 Nov 2024, at 6:19 PM, Richard Biener wrote: > > > On Mon, 11 Nov 2024, Soumya AR wrote: > >> Hi Richard, > >> >>> On 7 Nov 2024, at 6:10 PM, Richard Biener wrote: >>> >>>

[PATCH] v2: Add support for nonnull_if_nonzero attribute [PR117023]

2024-11-13 Thread Jakub Jelinek
On Tue, Nov 12, 2024 at 06:34:39PM +0100, Jakub Jelinek wrote: > What do you think about this? So far lightly tested. Unfortunately bootstrap/regtest revealed some issues in the patch, the tree-ssa-ccp.cc changes break bootstrap because fntype in there may be NULL and that is what get_nonnull_arg

RE: [PATCH 1/2] Add suggested_epilogue_mode to vector costs

2024-11-13 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Monday, November 11, 2024 12:17 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford ; Tamar Christina > > Subject: [PATCH 1/2] Add suggested_epilogue_mode to vector costs > > The following enables targets to suggest the vector mode

Re: [PATCH 5/5] doc: document btf_type_tag and btf_decl_tag attributes

2024-11-13 Thread Indu Bhagat
On 10/30/24 11:31 AM, David Faust wrote: gcc/ * doc/extend.texi (Common Variable Attributes): Document new btf_decl_tag attribute. (Common Type Attributes): Document new btf_type_tag attribute. --- gcc/doc/extend.texi | 68 +

Re: [PATCH 4/5] btf: generate and output DECL_TAG and TYPE_TAG records

2024-11-13 Thread Indu Bhagat
On 10/30/24 11:31 AM, David Faust wrote: Support the btf_decl_tag and btf_type_tag attributes in BTF by creating and emitting BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG records, respectively, for them. Some care is required when -gprune-btf is in effect to avoid emitting decl or type tags for decla

Re: [RFC/RFA][PATCH v6 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-11-13 Thread Mariam Arutunian
On Tue, Nov 12, 2024 at 2:15 AM Jeff Law wrote: > > + > > + > > +/* Generate assembly to calculate CRC using clmul instruction. > > + The following code will be generated when the CRC and data sizes are > equal: > > + li a4,quotient > > + li a5,polynomial > > + xor a0,

Re: Add testcase that we optimize away empty std::vector

2024-11-13 Thread Jan Hubicka
> On Tue, Nov 12, 2024 at 04:00:03PM +0100, Jan Hubicka wrote: > > Hi, > > with __builtin_operator_new we now can optimize away unused std::vectors. > > This adds testcases mentioned in the PR. > > > > Regtested x86_64-linux and comitted. > > > > PR tree-optimization/96945 > > > > gcc/testsu

Re: [PATCH] i386: Add -mveclibabi=aocl [PR56504]

2024-11-13 Thread Jan Hubicka
> - sincos and all functions working with arrays ... Because these > functions have pointer arguments and that would require a bigger > rework of ix86_veclibabi_aocl(). Also, I'm not sure if GCC even ever > generates calls to these functions. GCC is able to recognize sin and cos calls and tu

Re: [PATCH v3 11/23] aarch64: Add GCS support for nonlocal stack save

2024-11-13 Thread Richard Sandiford
Yury Khrustalev writes: > From: Szabolcs Nagy > > Nonlocal stack save and restore has to also save and restore the GCS > pointer. This is used in __builtin_setjmp/longjmp and nonlocal goto. > > The GCS specific code is only emitted if GCS branch-protection is > enabled and the code always checks

rs6000: Add -msplit-patch-nops (PR112980)

2024-11-13 Thread Michael Matz
Hello, this is essentially https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html from Kewen in functionality. When discussing this with Segher at the Cauldron he expressed reservations about changing the default implementation of -fpatchable-function-entry. So, to move forward, l

Re: rs6000: Add -msplit-patch-nops (PR112980)

2024-11-13 Thread Andreas Schwab
On Nov 13 2024, Michael Matz wrote: > @@ -31658,6 +31660,17 @@ requires @code{.plt} and @code{.got} > sections that are both writable and executable. > This is a PowerPC 32-bit SYSV ABI option. > > +@opindex msplit-patch-nops > +@item -msplit-patch-nops > +When adding NOPs for a patchable area

[PATCH v2 2/4] aarch64: specify fpm mode in function instances and groups

2024-11-13 Thread Claudio Bantaloukas
Some intrinsics require setting the fpm register before calling the specific asm opcode required. In order to simplify review, this patch: - adds the fpm_mode_index attribute to function_group_info and function_instance objects - updates existing initialisations and call sites. - updates equalit

Re: [PATCH v2 0/4] aarch64: Add fp8 sve foundation

2024-11-13 Thread Claudio Bantaloukas
Please disregard this series, posted as v2 by mistake. Cheers, Claudio On 11/13/2024 4:34 PM, Claudio Bantaloukas wrote: The ACLE defines a new set of fp8 vector types and intrinsics that operate on these, some of them operating on the vectors as if they were bags of bits and some requiring an

[PATCH v3 0/4] aarch64: Add fp8 sve foundation

2024-11-13 Thread Claudio Bantaloukas
The ACLE defines a new set of fp8 vector types and intrinsics that operate on these, some of them operating on the vectors as if they were bags of bits and some requiring an additional argument of type fpm_t. The following patches introduce: - the types - intrinsics that operate without the fpm_

[PATCH v3 2/4] aarch64: specify fpm mode in function instances and groups

2024-11-13 Thread Claudio Bantaloukas
Some intrinsics require setting the fpm register before calling the specific asm opcode required. In order to simplify review, this patch: - adds the fpm_mode_index attribute to function_group_info and function_instance objects - updates existing initialisations and call sites. - updates equalit

[PATCH v2 0/4] aarch64: Add fp8 sve foundation

2024-11-13 Thread Claudio Bantaloukas
The ACLE defines a new set of fp8 vector types and intrinsics that operate on these, some of them operating on the vectors as if they were bags of bits and some requiring an additional argument of type fpm_t. The following patches introduce: - the types - intrinsics that operate without the fpm_

[PATCH v2 3/4] aarch64: add svcvt* FP8 intrinsics

2024-11-13 Thread Claudio Bantaloukas
This patch adds the following intrinsics: - svcvt1_bf16[_mf8]_fpm - svcvt1_f16[_mf8]_fpm - svcvt2_bf16[_mf8]_fpm - svcvt2_f16[_mf8]_fpm - svcvtlt1_bf16[_mf8]_fpm - svcvtlt1_f16[_mf8]_fpm - svcvtlt2_bf16[_mf8]_fpm - svcvtlt2_f16[_mf8]_fpm - svcvtn_mf8[_f16_x2]_fpm (unpredicated) - svcvtnb_mf8[_f32_

[PATCH] RISC-V: Tie MUL and DIV masks to the M extension

2024-11-13 Thread Dimitar Dimitrov
When configuring GCC for RV32EC with: ./configure \ --target=riscv32-none-elf \ --with-multilib-generator="rv32ec-ilp32e--" \ --with-abi=ilp32e \ --with-arch=rv32ec Then the build fails becaus

[PATCH v3 3/4] aarch64: add svcvt* FP8 intrinsics

2024-11-13 Thread Claudio Bantaloukas
This patch adds the following intrinsics: - svcvt1_bf16[_mf8]_fpm - svcvt1_f16[_mf8]_fpm - svcvt2_bf16[_mf8]_fpm - svcvt2_f16[_mf8]_fpm - svcvtlt1_bf16[_mf8]_fpm - svcvtlt1_f16[_mf8]_fpm - svcvtlt2_bf16[_mf8]_fpm - svcvtlt2_f16[_mf8]_fpm - svcvtn_mf8[_f16_x2]_fpm (unpredicated) - svcvtnb_mf8[_f32_

[PATCH v3 4/4] aarch64: add SVE2 FP8 multiply accumulate intrinsics

2024-11-13 Thread Claudio Bantaloukas
This patch adds support for the following intrinsics: - svmlalb[_f16_mf8]_fpm - svmlalb[_n_f16_mf8]_fpm - svmlalt[_f16_mf8]_fpm - svmlalt[_n_f16_mf8]_fpm - svmlalb_lane[_f16_mf8]_fpm - svmlalt_lane[_f16_mf8]_fpm - svmlallbb[_f32_mf8]_fpm - svmlallbb[_n_f32_mf8]_fpm - svmlallbt[_f32_mf8]_fpm - svml

[PATCH] RISC-V: Add VLS modes to strided loads.

2024-11-13 Thread Robin Dapp
Hi, this patch adds VLS modes to the strided load expanders. Regtested on rv64gcv and handing it over to the CI. Regards Robin gcc/ChangeLog: * config/riscv/autovec.md: Add VLS modes. * config/riscv/vector-iterators.md: Ditto. * config/riscv/vector.md: Ditto. --- gcc/

[PATCH] ada: PR target/117538 Traceback includes load address if executable is PIE.

2024-11-13 Thread Simon Wright
If s-trasym.adb (System.Traceback.Symbolic, used as a renaming by GNAT.Traceback.Symbolic) is given a traceback from a position-independent executable, it does not include the executable's load address in the report. This is necessary in order to decode the traceback report. Note, this has already

[PATCH] cfgexpand: Skip doing conflicts if there is only 1 variable

2024-11-13 Thread Andrew Pinski
This is a small speed up. If there is only one known stack variable, there is no reason to figure out the scope conflicts as there are none. So don't go through all the live range calculations just to see there are none. Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog:

[PATCH v2 4/4] aarch64: add SVE2 FP8 multiply accumulate intrinsics

2024-11-13 Thread Claudio Bantaloukas
This patch adds support for the following intrinsics: - svmlalb[_f16_mf8]_fpm - svmlalb[_n_f16_mf8]_fpm - svmlalt[_f16_mf8]_fpm - svmlalt[_n_f16_mf8]_fpm - svmlalb_lane[_f16_mf8]_fpm - svmlalt_lane[_f16_mf8]_fpm - svmlallbb[_f32_mf8]_fpm - svmlallbb[_n_f32_mf8]_fpm - svmlallbt[_f32_mf8]_fpm - svml

[pushed] aarch64: Relax add_overloaded_function assert

2024-11-13 Thread Richard Sandiford
There are some SVE intrinsics that support one set of suffixes for one extension (E1, say) and another set of suffixes for another extension (E2, say). It is usually the case that, mutatis mutandis, E2 extends E1. Listing E1 first would then ensure that the manual C overload would also require E1

Re: [PATCH] Match: Fold pow calls to ldexp when possible [PR57492]

2024-11-13 Thread Soumya AR
> On 13 Nov 2024, at 2:49 PM, Richard Biener wrote: > > > On Wed, 13 Nov 2024, Soumya AR wrote: > >> >> >>> On 12 Nov 2024, at 6:19 PM, Richard Biener wrote: >>> >>>

[PATCH 3/6] aarch64: Improve early_ra dump information

2024-11-13 Thread Richard Sandiford
The early-ra pass often didn't print a dump message when aborting the allocation. This patch uses a similar helper to the previous patch. gcc/ * config/aarch64/aarch64-early-ra.cc (early_ra::record_allocation_failure): New member function. (early_ra::get_allocno_subgroup):

[PATCH 2/6] aarch64: Add early_ra::record_live_range_failure

2024-11-13 Thread Richard Sandiford
So far, early_ra has used a single m_allocation_successful bool to record whether the current region is still being allocated. But there are (at least) two reasons why we might pull out of attempting an allocation: (1) We can't track the liveness of individual FP allocnos, due to some awkward

[PATCH 6/6] aarch64: Improve early-ra handling of reductions

2024-11-13 Thread Richard Sandiford
At the moment, early-ra ducks out of allocating any region that contains a register with both a strong FPR affinity and a strong GPR affinity. The proper allocators are much better at handling that situation. But this means that early-ra tends not to allocate a region of vector code that ends in

[PATCH 4/4] Remove dead code related to VEC_COND_EXPR expansion from ISEL

2024-11-13 Thread Richard Biener
ISEL was introduced to translate vector comparison and vector condition combinations back to internal function calls mapping to one of the vcond[u][_eq][_mask] and vec_cmp[_eq] optabs. With removing the legacy non-mask vcond expanders we now rely on all vector comparisons and vector conditions to

Re: [PATCH v3 05/23] aarch64: Add ACLE __chkfeat intrinsic

2024-11-13 Thread Richard Sandiford
Yury Khrustalev writes: > Note that compared to __builtin_aarch64_chkfeat (x) the ACLE __chkfeat(x) > flips the bits to be more intuitive (xor the input to output). > > gcc/ChangeLog: > * config/aarch64/arm_acle.h (__chkfeat): New. > --- > gcc/config/aarch64/arm_acle.h | 13 + >
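The xor relationship described above can be modelled in portable C. This is only a sketch under assumptions: the real builtin is AArch64-specific, so both functions below are hypothetical stand-ins, and the model assumes the raw builtin clears each queried bit whose feature is enabled (the enabled_features mask is an invented parameter for the simulation).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the raw builtin: each queried bit whose
   feature is enabled comes back cleared (assumption of this sketch).  */
uint64_t model_builtin_chkfeat(uint64_t query, uint64_t enabled_features)
{
    return query & ~enabled_features;
}

/* Model of the ACLE-style wrapper: xor the input with the raw output,
   so a set bit in the result means "feature enabled".  */
uint64_t model_acle_chkfeat(uint64_t query, uint64_t enabled_features)
{
    return model_builtin_chkfeat(query, enabled_features) ^ query;
}

/* Query bits 0 and 1 while bits 0 and 2 are enabled in the model.  */
void check_chkfeat_model(void)
{
    assert(model_builtin_chkfeat(0x3u, 0x5u) == 0x2u);
    assert(model_acle_chkfeat(0x3u, 0x5u) == 0x1u);
}
```

Under this model the wrapper reduces to query & enabled_features, which is the more intuitive reading the message refers to.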

[PATCH 2/4] Avoid expand_vec_cond_expr_p with comparison code

2024-11-13 Thread Richard Biener
This removes the obsolete API use by vector divmod lowering. Bootstrapped and tested on x86_64-unknown-linux-gnu. * tree-vect-generic.cc (expand_vector_divmod): Query vector comparison and vec_cond_mask capability. --- gcc/tree-vect-generic.cc | 4 +++- 1 file changed, 3 insertio

[PATCH 1/4] Remove last comparison-code expand_vec_cond_expr_p call from vectorizer

2024-11-13 Thread Richard Biener
The following refactors the check with the last remaining expand_vec_cond_expr_p call with a comparison code to make it obvious we are not relying on those anymore. Bootstrapped and tested on x86_64-unknown-linux-gnu. * tree-vect-stmts.cc (vectorizable_condition): Refactor target

[PATCH 0/6] aarch64: Some tweaks to early-ra

2024-11-13 Thread Richard Sandiford
This series makes some minor tweaks to early-ra. The main patch is really the last one, which tries to apply early-ra to a situation that it currently avoids handling. It removes some MOVs from x264, for a very minor speed improvement. Bootstrapped & regression-tested on aarch64-linux-gnu. Also

[PATCH 4/6] aarch64: Relax early_ra treatment of modes_tieable_p

2024-11-13 Thread Richard Sandiford
At least on aarch64, modes_tieable_p is a stricter condition than can_change_mode_class. can_change_mode_class tells us whether the subreg rules produce a sensible result for a particular mode change. modes_tieable_p in addition tells us whether a mode change is reasonable for optimisation purpose

[PATCH 1/6] aarch64: Split early_ra::record_insn_refs

2024-11-13 Thread Richard Sandiford
record_insn_refs has three distinct phases: model the definitions, model any call, and model the uses. This patch splits each phase out into its own function. This isn't beneficial on its own, but it helps with later patches. gcc/ * config/aarch64/aarch64-early-ra.cc (early_ra::r

[PATCH 5/6] aarch64: Extend early-ra splitting of single-block regions

2024-11-13 Thread Richard Sandiford
When early-ra treats a block as an isolated allocation region, it opportunistically splits the block into smaller regions at points where no FPRs or FPR allocnos are live. Previously it only did this if m_allocation_successful, since the contrary included cases in which the live range information

[PATCH] Do not consider overrun for VMAT_ELEMENTWISE

2024-11-13 Thread Richard Biener
When we classify an SLP access as VMAT_ELEMENTWISE we still consider overrun - the reset of it is later overwritten. The following fixes this, resolving a few RISC-V FAILs with --param vect-force-slp=1. Bootstrap and regtest running on x86_64-unknown-linux-gnu. * tree-vect-stmts.cc (get_

Re: [PATCH v3 06/23] aarch64: Add __builtin_aarch64_chkfeat and __chkfeat tests

2024-11-13 Thread Richard Sandiford
Yury Khrustalev writes: > From: Szabolcs Nagy > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/acle/chkfeat-1.c: New test. > * gcc.target/aarch64/chkfeat-1.c: New test. > * gcc.target/aarch64/chkfeat-2.c: New test. > > Co-authored-by: Yury Khrustalev > Co-authored-by: Rich

Re: [PATCH v3 08/23] aarch64: Add GCS builtins

2024-11-13 Thread Richard Sandiford
Yury Khrustalev writes: > From: Szabolcs Nagy > > Add new builtins for GCS: > > void *__builtin_aarch64_gcspr (void) > uint64_t __builtin_aarch64_gcspopm (void) > void *__builtin_aarch64_gcsss (void *) > > The builtins are always enabled, but should be used behind runtime > checks in case t

[PATCH 3/4] Streamline vector lowering of VEC_COND_EXPRs

2024-11-13 Thread Richard Biener
The following makes sure to lower all VEC_COND_EXPRs that we cannot trivially expand. Bootstrapped and tested on x86_64-unknown-linux-gnu. * tree-vect-generic.cc (expand_vector_condition): Lower vector conditions that we cannot trivially expand. --- gcc/tree-vect-generic.cc | 28

Re: [PATCH v4 5/7] OpenMP: common C/C++ testcases for dispatch + adjust_args

2024-11-13 Thread Paul-Antoine Arras
Here is an updated version of the patch following earlier reviews in the series. -- PA commit 8f67de476decf151f853d68eb26223200535cc57 Author: Paul-Antoine Arras Date: Fri May 24 19:04:35 2024 +0200 OpenMP: common C/C++ testcases for dispatch + adjust_args gcc/testsuite/ChangeLog:

[PATCH] RISC-V: Fix ICE for unrecognizable insn `UNSPEC_VSETVL` for XTheadVector

2024-11-13 Thread Jin Ma
Since XTheadvector does not support vsetivli, vl needs to be put into registers during the expand phase. PR 116593 gcc/ChangeLog: * config/riscv/riscv-vector-builtins.cc (function_expander::add_input_operand): Put const to GPR for vl * config/riscv/thead-vector.m

Re: [RFC/RFA] [PATCH v7 01/12] Implement internal functions for efficient CRC computation.

2024-11-13 Thread Mariam Arutunian
On Tue, Nov 12, 2024 at 12:31 AM Jeff Law wrote: > > > On 11/9/24 12:43 PM, Mariam Arutunian wrote: > > Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster > > CRC generation. > > One performs bit-forward and the other bit-reversed CRC computation. > > If CRC optabs are suppo

Re: [PATCH] Add new hardreg PRE pass

2024-11-13 Thread Richard Biener
On Tue, 12 Nov 2024, Richard Sandiford wrote: > Sorry for the slow review. I think Jeff's much better placed to comment > on this than I am, but here's a stab. Mostly it looks really good to me > FWIW. > > Andrew Carlotti writes: > > This pass is used to optimise assignments to the FPMR regist

Re: [PATCH] AArch64: Switch off early scheduling

2024-11-13 Thread Kyrylo Tkachov
> On 12 Nov 2024, at 18:55, Richard Sandiford wrote: > > Wilco Dijkstra writes: >> Hi, >> > What do you think about disabling late scheduling as well? I think this would definitely need separate consideration and evaluation given the above. Another thing to con

Re: [PATCH v5 0/8] RISC-V: Add Function Multi-Versioning support

2024-11-13 Thread Kito Cheng
Pushed, thanks! On Tue, Nov 5, 2024 at 11:21 AM Yangyu Chen wrote: > > This patch series adds support for Function Multi-Versioning (FMV) to > RISC-V. The FMV feature allows users to specify multiple versions of a > function and select the version at runtime based on the target_clones > and targe

Re: [PATCH v3 ] i386: Add ix86_expand_integer_cst_argument

2024-11-13 Thread Richard Biener
On Wed, Nov 13, 2024 at 6:22 AM H.J. Lu wrote: > > On Wed, Nov 13, 2024 at 11:25 AM H.J. Lu wrote: > > > > On Wed, Nov 13, 2024 at 10:23 AM Hongtao Liu wrote: > > > > > > On Wed, Nov 13, 2024 at 8:29 AM H.J. Lu wrote: > > > > > > > > On Wed, Nov 13, 2024 at 5:57 AM H.J. Lu wrote: > > > > > > >

Re: [PATCH] Add new hardreg PRE pass

2024-11-13 Thread Andrew Carlotti
On Tue, Nov 12, 2024 at 10:42:50PM +, Richard Sandiford wrote: > Sorry for the slow review. I think Jeff's much better placed to comment > on this than I am, but here's a stab. Mostly it looks really good to me > FWIW. > > Andrew Carlotti writes: > > This pass is used to optimise assignment

Re: [PATCH 0/7] v3 of libdiagnostics

2024-11-13 Thread David Malcolm
On Wed, 2024-08-21 at 10:34 +0200, Richard Biener wrote: > On Wed, Aug 21, 2024 at 2:01 AM David Malcolm > wrote: > > > > On Tue, 2024-08-20 at 11:49 +0200, Richard Biener wrote: > > > On Thu, Aug 15, 2024 at 8:13 PM David Malcolm > > > > > > wrote: > > > > > > > > Here's v3 of my patch kit for

Re: [PATCH] Add new hardreg PRE pass

2024-11-13 Thread Richard Sandiford
Andrew Carlotti writes: > On Tue, Nov 12, 2024 at 10:42:50PM +, Richard Sandiford wrote: >> Sorry for the slow review. I think Jeff's much better placed to comment >> on this than I am, but here's a stab. Mostly it looks really good to me >> FWIW. >> >> Andrew Carlotti writes: >> > This pa

[PATCH 0/8] v4 of libdiagnostics

2024-11-13 Thread David Malcolm
Here's v4 of my patch kit for "libdiagnostics", which makes GCC's diagnostics subsystem available as a shared library; see: https://gcc.gnu.org/wiki/libdiagnostics New in v4: * tutorial and API documentation (see patch 4) * added DIAGNOSTIC_SARIF_VERSION_2_2_PRERELEASE * reimplemented FAIL_IF_NU

[PATCH 2/8] libdiagnostics v4: implementation

2024-11-13 Thread David Malcolm
Changed in v4: * Updated for the various changes to diagnostics in trunk * Reimplement FAIL_IF_NULL to stop checks being optimized away Changed in v3: * Added a --enable-libdiagnostics to configure.ac. It is disabled by default, and requires --enable-host-shared. * Split out gcc/testsuite/libdi

[patch,lra] PR117191 remove unnecessary CLOBBER insns after LRA

2024-11-13 Thread Denis Chertykov
The fix for PR117191. Wrong code appears after the dse2 pass because it removes necessary insns (i.e. insn 554, a store to a frame spill slot). This happens because the LRA pass doesn't clean up the code exactly as reload does. reload1.c has a special pass for such cleanup. Reload removes CLOBBER in

[PATCH 3/8] libdiagnostics: add API docs

2024-11-13 Thread David Malcolm
gcc/ChangeLog: * doc/libdiagnostics/Makefile: New file. * doc/libdiagnostics/conf.py: New file. * doc/libdiagnostics/index.rst: New file. * doc/libdiagnostics/make.bat: New file. * doc/libdiagnostics/topics/diagnostic-manager.rst: New file. * doc/libd

Re: [PATCH] Add new hardreg PRE pass

2024-11-13 Thread Richard Sandiford
Richard Biener writes: > On Tue, 12 Nov 2024, Richard Sandiford wrote: > >> Sorry for the slow review. I think Jeff's much better placed to comment >> on this than I am, but here's a stab. Mostly it looks really good to me >> FWIW. >> >> Andrew Carlotti writes: >> > This pass is used to optimi

[PATCH 1/8] libdiagnostics v4: header

2024-11-13 Thread David Malcolm
Changed in v4: * added DIAGNOSTIC_SARIF_VERSION_2_2_PRERELEASE Changed in v3: * Added support for execution paths * Moved the test cases to another patch * diagnostic_manager_add_sarif_sink: add param "main_input_file" * Added diagnostic_text_sink_set_colorize * Added DIAGNOSTIC_LEVEL_SORRY * Upda

[PATCH 5/8] testsuite: move dg-test cleanup code from gcc-dg.exp to its own file

2024-11-13 Thread David Malcolm
I need to use this cleanup logic for the testsuite for libdiagnostics where it's too awkward to directly use gcc-dg.exp itself. No functional change intended. gcc/testsuite/ChangeLog: * lib/dg-test-cleanup.exp: New file, from material moved from lib/gcc-dg.exp. * lib/gcc-d

[PATCH 6/8] libdiagnostics v4: test suite

2024-11-13 Thread David Malcolm
Changed in v4: * Fix SARIF schema URL * Various changes to help with API docs Changed in v3: * split out the C and C++ API tests into this patch * heavily rewritten libdiagnostics.exp; added support for Python tests * tests updated for API changes, rewritten and extended gcc/testsuite/ChangeLog:

[PATCH 7/8] json: add json parsing support

2024-11-13 Thread David Malcolm
This patch implements JSON parsing support. It's based on the parsing parts of the patch I posted here: https://gcc.gnu.org/legacy-ml/gcc-patches/2017-08/msg00417.html with the parsing moved to a separate source file and header, heavily rewritten to capture source location information for JSON val

[PATCH 4/8] libdiagnostics v4: add C++ wrapper API

2024-11-13 Thread David Malcolm
Unchanged in v4 Changed in v3: * Moved the testsuite to a separate patch * Updated copyright year * class text_sink: New. * class file: Add default ctor, copy ctor, move ctor; make m_inner non-const * class physical_location: Add default ctor * class logical_location: Make m_inner non-const * cl

[PATCH V3, 04/11] Change TARGET_POPCNTB to TARGET_POWER5

2024-11-13 Thread Michael Meissner
As part of the architecture flags patches, this patch changes the use of TARGET_POPCNTB to TARGET_POWER5. The POPCNTB instruction was added in ISA 2.02 (power5). I have built both big endian and little endian bootstrap compilers and there were no regressions. In addition, I constructed a test ca

[PATCH V3, 05/11] Change TARGET_FPRND to TARGET_POWER5X

2024-11-13 Thread Michael Meissner
As part of the architecture flags patches, this patch changes the use of TARGET_FPRND to TARGET_POWER5X. The FPRND instruction was added in power5+. I have built both big endian and little endian bootstrap compilers and there were no regressions. In addition, I constructed a test case that used

[PATCH V3, 07/11] Change TARGET_POPCNTD to TARGET_POWER7

2024-11-13 Thread Michael Meissner
As part of the architecture flags patches, this patch changes the use of TARGET_POPCNTD to TARGET_POWER7. The POPCNTD instruction was added in power7 (ISA 2.06). I have built both big endian and little endian bootstrap compilers and there were no regressions. In addition, I constructed a test ca

[PATCH V3, 03/11] Do not allow -mvsx to boost processor to power7.

2024-11-13 Thread Michael Meissner
This patch restructures the code so that -mvsx for example will not silently convert the processor to power7. The user must now use -mcpu=power7 or higher. This means if the user does -mvsx and the default processor does not have VSX support, it will be an error. I have built both big endian and

[PATCH V3, 02/11] Use architecture flags for defining _ARCH_PWR macros.

2024-11-13 Thread Michael Meissner
For the newer architectures, this patch changes GCC to define the _ARCH_PWR macros using the new architecture flags instead of relying on isa options like -mpower10. The -mpower8-internal, -mpower10, and -mpower11 options were removed. The -mpower11 option was removed completely, since it was jus

[PATCH V3, 01/11] Add rs6000 architecture masks.

2024-11-13 Thread Michael Meissner
Note, this patch fixes the attribution and the copyright year from the previous V2 page. This patch begins the journey to move architecture bits that are not user ISA options from rs6000_isa_flags to a new target variable rs6000_arch_flags. The intention is to remove switches that are currently is

[PATCH V3, 00/11] Separate PowerPC architecture bits from ISA flags that use command line option.

2024-11-13 Thread Michael Meissner
These patches replace the first patch in the 11 patch set that separates PowerPC architecture bits from ISA flags that use command line options. The V2 patch thread starts at: https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668177.html There are two differences from the previous patches:

[PATCH V3, 08/11] Change TARGET_MODULO to TARGET_POWER9

2024-11-13 Thread Michael Meissner
As part of the architecture flags patches, this patch changes the use of TARGET_MODULO to TARGET_POWER9. The modulo instructions were added in power9 (ISA 3.0). Note, I did not change the uses of TARGET_MODULO where it was explicitly generating different code if the machine had a modulo instruct

[PATCH V3, 09/11] Update tests to work with architecture flags changes.

2024-11-13 Thread Michael Meissner
Two tests used -mvsx to raise the processor level to at least power7. These tests were rewritten to add cpu=power7 support. I have built both big endian and little endian bootstrap compilers and there were no regressions. In addition, I constructed a test case that used every architecture define

[PATCH V3, 10/11] Add support for -mcpu=future

2024-11-13 Thread Michael Meissner
This patch adds the support that can be used in developing GCC support for future PowerPC processors. 2024-11-13 Michael Meissner * config.gcc (powerpc*-*-*): Add support for --with-cpu=future. * config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=future. * conf

Re: [PATCH v4 4/7] OpenMP: C++ front-end support for dispatch + adjust_args

2024-11-13 Thread Tobias Burnus
Hi PA, thanks for the updated patch! Paul-Antoine Arras wrote: OpenMP: C++ front-end support for dispatch + adjust_args This patch adds C++ support for the `dispatch` construct and the `adjust_args` clause. It relies on the c-family bits comprised in the corresponding C f

Re: [PATCH] c++: Add __builtin_operator_{new,delete} support

2024-11-13 Thread Jan Hubicka
> Hi! > > clang++ adds __builtin_operator_{new,delete} builtins which as documented > work similarly to ::operator {new,delete}, except that it is an error > if the called ::operator {new,delete} is not a replaceable global operator > and allow optimizations which C++ normally allows just when tho

[PATCH] tree-optimization/117556 - SLP of live stmts from load-lanes

2024-11-13 Thread Richard Biener
The following fixes SLP live lane generation for load-lanes which fails to analyze for gcc.dg/vect/vect-live-slp-3.c because the VLA division doesn't work out but it would also wrongly use the transposed vector defs I think. The following properly disables the actual load-lanes SLP node from live

[PATCH] tree-optimization/117554 - correct single-element interleaving check

2024-11-13 Thread Richard Biener
In addition to a single DR we also require a single lane, not a splat. Bootstrap and regtest running on x86_64-unknown-linux-gnu. PR tree-optimization/117554 * tree-vect-stmts.cc (get_group_load_store_type): We can use gather/scatter only for a single-lane single element gr

Re: [PATCH] i386: Add -mveclibabi=aocl [PR56504]

2024-11-13 Thread Filip Kastl
On Wed 2024-11-13 15:18:32, Jan Hubicka wrote: > > - sincos and all functions working with arrays ... Because these > > functions have pointer arguments and that would require a bigger > > rework of ix86_veclibabi_aocl(). Also, I'm not sure if GCC even ever > > generates calls to these funct

Re: [PATCH] RISC-V: Bugfix for unrecognizable insn for XTheadVector

2024-11-13 Thread Robin Dapp
OK. For your other patch I suggest you resubmit with the RISC-V typo fixed so the CI can pick it up. Generally, it looks reasonable. -- Regards Robin

[committed] hppa: Remove inner `fix:SF/DF` from fixed-point patterns

2024-11-13 Thread John David Anglin
Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11. Committed to all active branches. Dave --- hppa: Remove inner `fix:SF/DF` from fixed-point patterns 2024-11-13 John David Anglin gcc/ChangeLog: PR target/117525 * config/pa/pa.md (fix_truncsfsi2): Remove inner `fix:S

Re: Implement removal of malloc/free pairs with NULL check

2024-11-13 Thread Richard Biener
On Wed, 6 Nov 2024, Jan Hubicka wrote: > Hi, > this is updated patch which adds -fmalloc-dce flag to control malloc/free > removal. I ended up copying what -fallocation-dse does so -fmalloc-dce=1 > enables malloc/free removal provided return value is unused otherwise and > -fmalloc-dce=2 allows a
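
To make the discussion concrete, here is a small sketch of the two source shapes the thread title and the quoted text appear to distinguish: an allocation whose result is only freed, and one whose result is also compared against NULL. The function names are hypothetical, and since the quoted description is cut off, which -fmalloc-dce level covers which shape is not stated here.

```c
#include <stdlib.h>

/* The malloc result is never used except as the argument to free.  */
static int
alloc_unused (size_t n)
{
  void *p = malloc (n);
  free (p);
  return 1;
}

/* The malloc result is additionally tested against NULL before being
   freed, so removing the pair also means deciding the NULL check.  */
static int
alloc_probe (size_t n)
{
  void *p = malloc (n);
  if (p == NULL)
    return 0;
  free (p);
  return 1;
}
```

In both functions the allocated memory is never read or written, which is what makes the malloc/free pair a candidate for removal in the first place.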

Re: [PATCH] v2: Run selftests for C++ as well as C

2024-11-13 Thread Thomas Schwinge
Hi! I'd like to add selftests for an aspect of the GCC/nvptx back end's multilib configuration, outside of the language front ends: at Makefile/shell level. Looking into GCC's selftest implementation, I found one issue to potentially refactor: On 2018-10-13T09:12:03-0400, David Malcolm wrote: >

Re: [PATCH v2] xtensa: Fix the issue in "*extzvsi-1bit_addsubx"

2024-11-13 Thread Alexey Lapshin
Takayuki, thank you for the quick fix! It seems to work well now, except for one degradation. Instead of generating two instructions: 7 ptr += (i & 1); 0x40078564 <+12>:extui a9, a8, 0, 1 0x40078567 <+15>:addx2 a2, a9, a2 Now it generates three: 7 ptr

[committed] libstdc++: Fix calculation of system time in performance tests

2024-11-13 Thread Jonathan Wakely
The system_time() function used the wrong element of the splits array. Also add a comment about the units for time measurements. libstdc++-v3/ChangeLog: * testsuite/util/testsuite_performance.h (time_counter): Add comment about times. (time_counter::system_time): Use corr
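
As background for the kind of mix-up described above, the following is a minimal sketch of keeping user and system CPU time separate using the POSIX struct tms. It is not the code from testsuite_performance.h; the type cpu_split and the function cpu_time_between are hypothetical names chosen for illustration.

```c
#include <sys/times.h>

/* User and system CPU time consumed between two snapshots, both in
   clock ticks (see sysconf (_SC_CLK_TCK) for ticks per second).  */
struct cpu_split
{
  clock_t user;
  clock_t system;
};

/* Subtract two snapshots taken with times (), keeping the user and
   system components in separate fields so that one cannot be read
   in place of the other by accident.  */
static struct cpu_split
cpu_time_between (const struct tms *begin, const struct tms *end)
{
  struct cpu_split s;
  s.user = end->tms_utime - begin->tms_utime;
  s.system = end->tms_stime - begin->tms_stime;
  return s;
}
```

Named fields rather than array indices are used here deliberately, since indexing the wrong slot of an array is exactly the sort of error that is easy to miss in review.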

Ping: [PATCH 0/6] PowerPC Future support (Dense Math Registers)

2024-11-13 Thread Michael Meissner
Ping the following patch series to add PowerPC Future support for Dense Math Registers: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/62.html https://gcc.gnu.org/pipermail/gcc-patches/2024-October/63.html https://gcc.gnu.org/pipermail/gcc-patches/2024-October/64.html https://g
