Re: [PATCH v2 7/8] i386: Add else operand to masked loads.

2024-10-28 Thread Robin Dapp
CC'ing/pinging i386 maintainers before posting a third revision. > This patch adds a zero else operand to masked loads, in particular the > masked gather load builtins that are used for gather vectorization. > > gcc/ChangeLog: > > * config/i386/i386-expand.cc (ix86_expand_special_args_builti

[PATCH] testcase: Add testcase for tree-optimization/117341

2024-10-28 Thread Andrew Pinski
Even though PR 117341 was a duplicate of PR 116768, another testcase, this time in C++, does not hurt to have. The testcase is self-contained and does not use libstdc++ directly except for operator new (it does not even call delete). Tested on x86_64-linux-gnu with it working. PR tree-optimi

Re: [RFC][PATCH] Adjust param_vect_max_version_for_alias_checks

2024-10-28 Thread Kugan Vivekanandarajah
Hi Richard, Thanks for the review. > On 25 Oct 2024, at 8:53 pm, Richard Biener wrote: > > On Fri, Oct 25, 2024 at 12:22 AM Kugan Vivekanandarajah > wrote: >> >> Hi, >> >> This patch sets param_vect_max_version_for_alias_check

[PATCH] ifcombine: For short circuit case, allow 2 defining statements [PR85605]

2024-10-28 Thread Andrew Pinski
r0-126134-g5d2a9da9a7f7c1 added support for short-circuiting and combining the ifs into using either AND or OR. But it only allowed the inner condition basic block to contain the conditional alone. This change allows up to 2 defining statements as long as they are just nop conversions for either the lhs or r

Re: [PATCH 0/7] Support Intel Diamond Rapid new features

2024-10-28 Thread Hongtao Liu
On Tue, Oct 22, 2024 at 2:31 PM Haochen Jiang wrote: > > Hi all, > > ISE054 has just been released and you can find doc from here: > > https://cdrdv2.intel.com/v1/dl/getContent/671368 > > Diamond Rapids features are added in this ISE, including AMX > related instructions, SM4 EVEX extension and MO

[pushed: r15-4727] testsuite: drop the "test-" prefix from sarif-output python scripts

2024-10-28 Thread David Malcolm
Drop the "test-" prefix from the various gcc.dg/sarif-output/test-*.py scripts so that the scripts are close to the .c files they are used by when the files are sorted by name. Successfully regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r15-4727-ga67594d1815272. gcc/testsuite/ChangeLog:

Re: [PATCH ver2 0/4] rs6000, remove redundant built-ins and add more test cases

2024-10-28 Thread Carl Love
Ping 3 On 10/17/24 1:31 PM, Carl Love wrote: Ping 2 On 10/9/24 7:44 AM, Carl Love wrote: Ping On 10/1/24 8:12 AM, Carl Love wrote: GCC maintainers: The following version 2 of a series of patches for PowerPC removes some built-ins that are covered by existing overloaded built-ins. Add

Re: [PATCH] rs6000, fix test builtins-1-p10-runnable.c

2024-10-28 Thread Carl Love
Ping 3 On 10/17/24 1:31 PM, Carl Love wrote: Ping 2 On 10/9/24 7:43 AM, Carl Love wrote: Ping, FYI this is a fairly simple fix to a testcase. On 10/3/24 8:11 AM, Carl Love wrote: GCC maintainers: The builtins-1-p10-runnable.c test has debugging inadvertently enabled. The test uses #ifdef

Re: [PATCH] c, v3: Add __builtin_stdc_rotate_{left,right} builtins [PR117030]

2024-10-28 Thread Joseph Myers
On Mon, 28 Oct 2024, Jakub Jelinek wrote: > Ok, here is an updated patch, which for ubsan checks just for negative > count and nothing else, does that check before using TRUNC_MOD_EXPR on > the argument and uses it on unsigned types in all cases. > The c_fully_fold_internal new wording is removed

Re: [PATCH 0/4] sched1 improvements

2024-10-28 Thread Jeff Law
On 10/28/24 4:24 PM, Vineet Gupta wrote: Ping ! Pong. I've got a response to the first patch partially written :-) Exec summary is I don't have a problem with functionality in that patch, just naming/comments stuff. Still trying to figure out how to express it clearly. jeff

Re: [PATCH 0/4] sched1 improvements

2024-10-28 Thread Vineet Gupta
Ping ! On 10/20/24 12:40, Vineet Gupta wrote: > Hi, > > PFA patch series which improves sched1 spilling. This all started with > SPEC2017 507.Cactu dynamic icounts on RISC-V being double those of > aarch64 (~2.6 trillion vs. ~1.4 trillion). Robin/Jeff hinted that the > issue could be sched1 w

[PATCH v2 2/3] Only do switch bit test clustering when multiple labels point to same bb

2024-10-28 Thread Andi Kleen
From: Andi Kleen The bit cluster code generation strategy is only beneficial when multiple case labels point to the same code. Do a quick check if that is the case before trying to cluster. This fixes the switch part of PR117091 where all case labels are unique; however, it doesn't address the per
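To illustrate the situation the snippet describes, here is a minimal sketch of a switch in which several case labels share one body, next to a hand-written bit test that computes the same result. The function names is_vowel_switch and is_vowel_bittest are hypothetical, and the second function only approximates what a bit-test lowering computes; it is not GCC's actual output.

```c
/* Several case labels point at the same code, so one mask can
   stand in for all of them.  */
int is_vowel_switch(char c)
{
    switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 1;
    default:
        return 0;
    }
}

/* Hand-written equivalent: map the character to a bit index and
   test that bit in a constant mask.  */
int is_vowel_bittest(char c)
{
    const unsigned mask = (1u << ('a' - 'a')) | (1u << ('e' - 'a'))
                        | (1u << ('i' - 'a')) | (1u << ('o' - 'a'))
                        | (1u << ('u' - 'a'));
    /* Characters below 'a' wrap to a large unsigned index and are
       rejected by the range check.  */
    unsigned idx = (unsigned)((unsigned char)c - 'a');
    return idx < 32 && ((mask >> idx) & 1u);
}
```

If every label had its own distinct body, there would be no shared target to fold into a mask, which is presumably why the patch checks for shared targets first.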

[PATCH v2 3/3] Simplify switch bit test clustering algorithm

2024-10-28 Thread Andi Kleen
From: Andi Kleen The current switch bit test clustering enumerates all possible case cluster combinations to find ones that fit the bit test constraints best. This causes performance problems with very large switches. For bit test clustering which happens naturally in word-sized chunks I don't

[PATCH v2 1/3] Disable -fbit-tests and -fjump-tables at -O0

2024-10-28 Thread Andi Kleen
From: Andi Kleen gcc/ChangeLog: * common.opt: Enable -fbit-tests and -fjump-tables only at -O1. * opts.cc (default_options_table): Ditto. --- gcc/common.opt | 4 ++-- gcc/opts.cc| 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/common.opt b/gcc/common

[committed] testsuite: Require atomic operations for pr47333_0

2024-10-28 Thread Dimitar Dimitrov
Since the test uses __sync_fetch_and_add, add a requirement for target to support atomic operations on int and long types. This fixes a spurious test failure on pru-unknown-elf, which lacks atomic ops. The test still passes on x86_64-linux-gnu. Pushed to trunk as obvious. gcc/testsuite/ChangeLog

Re: [PATCH v2 6/8] gcn: Add else operand to masked loads.

2024-10-28 Thread Robin Dapp
> I'm not sure how this is different to just deleting the > zero-initializer, which is what I already tested and found some random > behaviour? The difference is in the else-operand predicate. So unless there are more bugs we should only have added VCOND_EXPRs for the cases where they are absol

[PUSHED] testcase: Add testcase for PR 117330 [PR117330]

2024-10-28 Thread Andrew Pinski
This testcase was causing an ICE during vectorization due to r15-4695-gd17e672ce82e69 but was fixed with r15-4713-g0942bb85fc5573. Pushed as obvious after a quick test on x86_64-linux-gnu to make sure the testcase passes. PR tree-optimization/117330 gcc/testsuite/ChangeLog: * gc

Re: [PATCH v2 5/8] aarch64: Add masked-load else operands.

2024-10-28 Thread Robin Dapp
>> For the lack of a better idea I used a function call property to specify >> whether a builtin needs an else operand or not. Somebody with better >> knowledge of the aarch64 target can surely improve that. > > Yeah, those flags are really for source-level/gimple-level attributes. > Would it work

[RFC PATCH 1/5] vect: Force alignment peeling to vectorize more early break loops

2024-10-28 Thread Alex Coplan
This allows us to vectorize more loops with early exits by forcing peeling for alignment to make sure that we're guaranteed to be able to safely read an entire vector iteration without crossing a page boundary. To make this work for VLA architectures we have to allow compile-time non-constant targ

[PATCH 0/6] PowerPC Future support (Dense Math Registers)

2024-10-28 Thread Michael Meissner
This patch was posted a year or so ago during the GCC 14 patches, and I'm posting it again in the hope that I can get this into GCC 15. In the GCC 14 time frame, 1,024 bit registers were not supported due to the bit length in internal structures. In GCC 15, 1,024 bit registers are now supported.

[PATCH 3/6] Add support for dense math registers.

2024-10-28 Thread Michael Meissner
The MMA subsystem added the notion of accumulator registers as an optional feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with the VSX registers 0..31, but logically the accumulator registers were separate from the FPR registers. In ISA 3.1, it was anticipated that in fut

[PATCH 6/6] Add support for 1,024 bit Dense Math Registers

2024-10-28 Thread Michael Meissner
This patch is a preliminary patch to add the full 1,024 bit dense math registers (DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the DMR register. This patch only adds the new 1,024 bit register support. It does not add support for any instructions that need 1,024 bit re

[PATCH 5/6] Add dense math test for new instruction names

2024-10-28 Thread Michael Meissner
This patch adds a test for the new dense math support. 2024-10-28 Michael Meissner gcc/testsuite/ * gcc.target/powerpc/dm-double-test.c: New test. * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New target test. --- .../gcc.target/powerpc/dm-double-tes

[PATCH 4/6] Switch to dense math names for all MMA operations with -mcpu=future

2024-10-28 Thread Michael Meissner
This patch changes the assembler instruction names for MMA instructions from the original name used in power10 to the new name when used with the dense math system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the same bits for either spelling. For the non-prefixed MMA instructi

[PATCH 2/6] Add wD constraint.

2024-10-28 Thread Michael Meissner
This patch adds a new constraint ('wD') that matches the accumulator registers that overlap with VSX registers 0..31 on power10. Future patches will add the support for a separate accumulator register class that will be used when the support for dense math registers is added. 2024-10-22 Michael

[PATCH 1/6] Use vector pair load/store for memcpy with -mcpu=future

2024-10-28 Thread Michael Meissner
In the development for the power10 processor, GCC did not enable using the load vector pair and store vector pair instructions when optimizing things like memory copy. This patch enables using those instructions if -mcpu=future is used. 2024-10-22 Michael Meissner gcc/ * config/rs600

[PATCH] gcc: fix 'statements' comment typo

2024-10-28 Thread Sam James
gcc/ChangeLog: * opts-common.cc (prune_options): Fix typo. --- Pushed as obvious. gcc/opts-common.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc index 22774457bf0f..ac2e77b16590 100644 --- a/gcc/opts-common.cc +++ b/gcc/opts

Re: [PATCH] testsuite: add testcase for fixed PR107467

2024-10-28 Thread Sam James
Mike Stump writes: > On Oct 25, 2024, at 12:47 PM, Sam James wrote: >> >> PR107467 ended up being fixed by the fix for PR115110, but let's >> add the testcase on top. >> >> gcc/testsuite/ChangeLog: >> PR tree-optimization/107467 >> PR middle-end/115110 >> >> * g++.dg/lto/pr1074

[COMMITTED] - Fix bitwise_or logic for prange.

2024-10-28 Thread Andrew MacLeod
Fixed. Bootstrapped on with no regressions.  Pushed. Andrew On 10/28/24 10:25, Mikael Morin wrote: Le 28/10/2024 à 14:38, Andrew MacLeod a écrit : On 10/26/24 15:08, Mikael Morin wrote: Hello, Le 24/10/2024 à 14:53, Andrew MacLeod a écrit : diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-

Re: [PATCH] testsuite: add testcase for fixed PR107467

2024-10-28 Thread Mike Stump
On Oct 25, 2024, at 12:47 PM, Sam James wrote: > > PR107467 ended up being fixed by the fix for PR115110, but let's > add the testcase on top. > > gcc/testsuite/ChangeLog: > PR tree-optimization/107467 > PR middle-end/115110 > > * g++.dg/lto/pr107467_0.C: New test. > --- > OK?

Re: [PATCH] libstdc++: Fix complexity of drop_view::begin const [PR112641]

2024-10-28 Thread Jonathan Wakely
On Fri, 25 Oct 2024 at 16:24, Patrick Palka wrote: > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk/backports? OK for all (also approved on the forge). > Also available in PR form at https://forge.sourceware.org/gcc/gcc-TEST/pulls/8 > > -- >8 -- > > Views are required to have a amo

[PATCH v2 2/2] Match: make SAT_ADD case 7 commutative

2024-10-28 Thread Akram Ahmad
Case 7 of unsigned scalar saturating addition defines SAT_ADD = X <= (X + Y) ? (X + Y) : -1. This is the same as SAT_ADD = Y <= (X + Y) ? (X + Y) : -1 due to usadd_left_part_1 being commutative. The pattern for case 7 currently does not accept the alternative where Y is used in the condition. Ther
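The equivalence claimed in the snippet can be checked with a small C sketch. The function names sat_add_x and sat_add_y are hypothetical and uint32_t is just one example width; this only illustrates the two source-level forms, not the match.pd pattern itself.

```c
#include <stdint.h>

/* Form quoted in the snippet: the condition tests X.  */
uint32_t sat_add_x(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;   /* wraps modulo 2^32 on overflow */
    return x <= sum ? sum : (uint32_t)-1;
}

/* Alternative form: the condition tests Y instead.  */
uint32_t sat_add_y(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;
    return y <= sum ? sum : (uint32_t)-1;
}
```

When the addition wraps, the sum is smaller than both operands, and when it does not wrap, the sum is at least as large as both, so either operand can serve in the comparison.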

[PATCH v2 1/2] Match: support new case of unsigned scalar SAT_SUB

2024-10-28 Thread Akram Ahmad
This patch adds a new case for unsigned scalar saturating subtraction using a branch with a greater-than-or-equal condition. For example, X >= (X - Y) ? (X - Y) : 0 is transformed into SAT_SUB (X, Y) when X and Y are unsigned scalars, which therefore correctly matches more cases of IFN SAT_SUB. N
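As a rough illustration of the source form described in the snippet, the following C sketch writes the branch with a greater-than-or-equal condition. The name sat_sub and the choice of uint32_t are assumptions made for the example only.

```c
#include <stdint.h>

/* X >= (X - Y) ? (X - Y) : 0 for unsigned operands.  */
uint32_t sat_sub(uint32_t x, uint32_t y)
{
    uint32_t diff = x - y;  /* wraps modulo 2^32 when y > x */
    return x >= diff ? diff : 0;
}
```

If y is larger than x, the wrapped difference exceeds x, so the condition fails and the result saturates to 0; otherwise the plain difference is returned.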

[PATCH v2 0/2] Match: support additional cases of unsigned scalar arithmetic

2024-10-28 Thread Akram Ahmad
Hi all, This patch series adds support for 2 new cases of unsigned scalar saturating arithmetic (one addition, one subtraction). This results in more valid patterns being recognised, which results in a call to .SAT_ADD or .SAT_SUB where relevant. Regression tests for aarch64-none-linux-gnu all

[PATCH][committed] aarch64: Use implementation namespace for vxarq_u64 immediate argument

2024-10-28 Thread Kyrylo Tkachov
Hi all, Looks like this immediate variable was missed out when I last fixed the namespace issues in arm_neon.h. Fixed in the obvious manner. Bootstrapped and tested on aarch64-none-linux-gnu. Pushing to trunk. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov * config/aarch64/arm_neon.h (vxarq_u

[RFC PATCH 5/5] vect: Also cost gconds for scalar

2024-10-28 Thread Alex Coplan
Currently we only cost gconds for the vector loop while we omit costing them when analyzing the scalar loop; this unfairly penalizes the vector loop in the case of loops with early exits. This (together with the previous patches) enables us to vectorize std::find with 64-bit element sizes. gcc/Ch

[RFC PATCH 4/5] vect: Ensure we add vector skip guard even when versioning for aliasing

2024-10-28 Thread Alex Coplan
This fixes a latent wrong code issue whereby vect_do_peeling determined the wrong condition for inserting the vector skip guard. Specifically in the case where the loop niters are unknown at compile time we used to check: !LOOP_REQUIRES_VERSIONING (loop_vinfo) but LOOP_REQUIRES_VERSIONING is t

[RFC PATCH 2/5] vect: Don't guard scalar epilogue for inverted loops

2024-10-28 Thread Alex Coplan
For loops with LOOP_VINFO_EARLY_BREAKS_VECT_PEELED we should always enter the scalar epilogue, so avoid emitting a guard on entry to the epilogue. gcc/ChangeLog: * tree-vect-loop-manip.cc (vect_do_peeling): Avoid emitting an epilogue guard for inverted early-exit loops. --- gcc/t

[RFC PATCH 3/5] vect: Fix dominators when adding a guard to skip the vector loop

2024-10-28 Thread Alex Coplan
From: Tamar Christina The alignment peeling changes exposed a latent missing dominator update with early break vectorization, specifically when inserting the vector skip edge, since the new edge bypasses the prolog skip block and thus has the potential to subvert its dominance. This patch fixes

[RFC PATCH 0/5] vect: Force peeling for alignment to handle more early break loops

2024-10-28 Thread Alex Coplan
This patch series allows us to vectorize more loops with early exits by forcing peeling for alignment to make sure that we're guaranteed to be able to safely read an entire vector iteration without crossing a page boundary. The motivation is to vectorize search loops such as std::find. This shows
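For readers unfamiliar with the loop shape in question, here is a minimal C analogue of a std::find-style search loop with an early exit. The name find_u64 is hypothetical; the series itself targets the C++ std::find, and this sketch only shows the kind of loop involved.

```c
#include <stdint.h>

/* Returns a pointer to the first element equal to value, or last
   if there is none.  The return inside the loop is the early
   break that makes vectorization harder.  */
const uint64_t *find_u64(const uint64_t *first, const uint64_t *last,
                         uint64_t value)
{
    for (; first != last; ++first)
        if (*first == value)
            return first;
    return last;
}
```

A vectorized version would read a whole vector of elements per iteration, possibly past the matching element, which is why the snippet stresses not crossing a page boundary.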

Re: [COMMITTED 4/4] - Implement pointer_or_operator.

2024-10-28 Thread Mikael Morin
Le 28/10/2024 à 14:38, Andrew MacLeod a écrit : On 10/26/24 15:08, Mikael Morin wrote: Hello, Le 24/10/2024 à 14:53, Andrew MacLeod a écrit : diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc index dd312a80366..ef2b2cce516 100644 --- a/gcc/range-op-ptr.cc +++ b/gcc/range-op-ptr.cc (...)

[committed] libstdc++: Fix tests for std::vector range operations

2024-10-28 Thread Jonathan Wakely
The commit I pushed was not the one I'd tested, so it had older versions of the tests, with bugs that I'd already fixed locally. This commit has the fixed tests that I'd intended to push in the first place. libstdc++-v3/ChangeLog: * testsuite/23_containers/vector/bool/cons/from_range.cc:

Re: counted_by attribute and type compatibility

2024-10-28 Thread Qing Zhao
> On Oct 27, 2024, at 05:02, Martin Uecker wrote: > >> > > For standard attributes, there is a policy that the attribute should > be ignorable, i.e. removing it from a valid program should not cause > any change in semantics. > > For GCC's attributes this is not necessarily the case, but I

Re: [COMMITTED 4/4] - Implement pointer_or_operator.

2024-10-28 Thread Andrew MacLeod
On 10/26/24 15:08, Mikael Morin wrote: Hello, Le 24/10/2024 à 14:53, Andrew MacLeod a écrit : diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc index dd312a80366..ef2b2cce516 100644 --- a/gcc/range-op-ptr.cc +++ b/gcc/range-op-ptr.cc (...) -void -pointer_or_operator::wi_fold (irange &r

[r15-4711 Regression] FAIL: gcc.target/i386/pr53533-3.c scan-assembler-times add(?:l|q)[ \t] 1 on Linux/x86_64

2024-10-28 Thread haochen.jiang
On Linux/x86_64, f1823d8037e355cd755087e695051d190ffe755e is the first bad commit commit f1823d8037e355cd755087e695051d190ffe755e Author: H.J. Lu Date: Sat Oct 12 05:53:49 2024 +0800 gcc.target/i386/pr53533-[13].c: Adjust assembly scan caused FAIL: gcc.target/i386/pr53533-1.c scan-assemb

Re: [PATCH 2/2] Match: make SAT_ADD case 7 commutative

2024-10-28 Thread Akram Ahmad
On 24/10/2024 16:06, Richard Biener wrote: Can you check whether removing the :c from the (plus in usadd_left_part_1 keeps things working? Hi Richard, Thanks for the feedback. I've written some tests and can confirm that they pass as expected with these two changes being made (removal of :c in

[committed][target/117316] Fix initializer for riscv code alignment handling

2024-10-28 Thread Jeff Law
The construct used for initializing the code alignments in a recent change is causing bootstrap problems on riscv64 as seen in the referenced bugzilla. This patch adjusts the initializer by pushing the NULL down into each uarch clause. Bootstrapped on riscv64, regression test in flight, but

Re: [PATCH v2 5/5] arm: [MVE intrinsics] Rework MVE vld/vst intrinsics

2024-10-28 Thread Richard Earnshaw (lists)
On 25/10/2024 19:47, Christophe Lyon wrote: > From: Alfie Richards > > Implement the mve vld and vst intrinsics using the MVE builtins framework. > > The main part of the patch is to reimplement the vstr/vldr patterns > such that we now have far fewer of them: > - non-truncating stores > - predi

Re: [PATCH] config: add -Werror=lto-type-mismatch,odr to bootstrap-lto*

2024-10-28 Thread Sam James
Sam James writes: > Sam James writes: > >> Add -Werror=lto-type-mismatch,odr to bootstrap-lto* configurations to >> help stop LTO breakage/correctness issues sneaking in. >> >> We discussed -Werror=strict-aliasing but it runs early and doesn't >> give better diagnostics with LTO so left it out.

Re: [PATCH 2/6] aarch64: Use canonical RTL representation for SVE2 XAR and extend it to fixed-width modes

2024-10-28 Thread Richard Sandiford
Kyrylo Tkachov writes: > Hi all, > > The MD pattern for the XAR instruction in SVE2 is currently expressed with > non-canonical RTL by using a ROTATERT code with a constant rotate amount. > Fix it by using the left ROTATE code. This necessitates adjusting the rotate > amount during expand. > > A
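The relationship between a right rotate and a left rotate that the quoted text relies on can be sketched in C. The helper names rotl64 and rotr64 are hypothetical, and the exact adjustment the patch performs during expand is not shown in the snippet; this only demonstrates the general identity for 64-bit values.

```c
#include <stdint.h>

/* Rotate left by n bits; the masks keep every shift count in
   the range 0..63 so no shift is undefined.  */
uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x << n) | (x >> ((64 - n) & 63));
}

/* Rotate right by n bits.  */
uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x >> n) | (x << ((64 - n) & 63));
}
```

Rotating right by n gives the same value as rotating left by 64 - n, so a constant right-rotate amount can be rewritten as a left-rotate amount.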

[PATCH] tree-optimization/117307 - STMT_VINFO_SLP_VECT_ONLY mis-computation

2024-10-28 Thread Richard Biener
STMT_VINFO_SLP_VECT_ONLY isn't properly computed as union of all group members and when the group is later split due to duplicates not all sub-groups inherit the flag. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/117307 * tree-vect-data-refs.cc

Re: [PATCH] Allow BB vectorisation of scalar loop when ifcvt versioned loop is not vectorized

2024-10-28 Thread Richard Biener
On Mon, Oct 28, 2024 at 9:35 AM Kugan Vivekanandarajah wrote: > > Hi, > > When ifcvt versions a loop, it sets dont_vectorize on the scalar loop. If the > vector loop is not vectorized and removed, the scalar loop is still left with > dont_vectorize. As a result, BB vectorization will not happen. >

[PATCH] Allow BB vectorisation of scalar loop when ifcvt versioned loop is not vectorized

2024-10-28 Thread Kugan Vivekanandarajah
Hi, When ifcvt versions a loop, it sets dont_vectorize on the scalar loop. If the vector loop is not vectorized and removed, the scalar loop is still left with dont_vectorize. As a result, BB vectorization will not happen. This patch adds a new attribute called dont_loop_vectorize (that is differe

Re: [PATCH] vec-lowering: Fix ABSU lowering [PR111285]

2024-10-28 Thread Richard Biener
On Mon, Oct 28, 2024 at 12:41 AM Andrew Pinski wrote: > > ABSU_EXPR lowering incorrectly used the resulting type > for the new expression but in the case of ABSU the resulting > type is an unsigned type and with ABSU is folded away. The fix > is to use a signed type for the expression instead. > >
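As background for why the operand type matters here, the following C sketch models the usual meaning of ABSU: take a signed input and produce its absolute value in the corresponding unsigned type. The function name absu_int is hypothetical, and this is only a source-level model, not the vector lowering code under discussion.

```c
#include <limits.h>

/* Absolute value of a signed int, returned as unsigned.  Negating
   after the conversion to unsigned avoids signed overflow for
   INT_MIN.  */
unsigned absu_int(int x)
{
    return x < 0 ? 0u - (unsigned)x : (unsigned)x;
}
```

Applied to a value that is already unsigned, an absolute value is the identity, which is consistent with the quoted remark that the operation gets folded away when built on the unsigned result type.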

Re: [PATCH] phiopt: Move check for maybe_undef_p slightly earlier

2024-10-28 Thread Richard Biener
On Mon, Oct 28, 2024 at 12:42 AM Andrew Pinski wrote: > > This moves the check for maybe_undef_p in match_simplify_replacement > slightly earlier before figuring out the true/false arg using arg0/arg1 > instead. > In most cases this makes no difference in compile time; just in the case > there is an

[PATCH] c, v3: Add __builtin_stdc_rotate_{left,right} builtins [PR117030]

2024-10-28 Thread Jakub Jelinek
On Fri, Oct 25, 2024 at 08:06:36PM +, Joseph Myers wrote: > If sanitizing makes sense for these built-in functions, surely it should > check for all negative shifts, including those that are multiples of the > width (and there should be tests for it in the testsuite). So sanitizing > would

[PATCH] RISC-V:Bugfix for vlmul_ext and vlmul_trunc with NULL return value[pr117286]

2024-10-28 Thread Li Xu
From: xuli This patch fixes following ICE: test.c: In function 'func': test.c:37:24: internal compiler error: Segmentation fault 37 | vfloat16mf2_t vc = __riscv_vlmul_trunc_v_f16m1_f16mf2(vb); |^~ The root cause is that vl

Re: [PATCH] RISC-V:Bugfix for vlmul_ext and vlmul_trunc with NULL return value[pr117286]

2024-10-28 Thread 钟居哲
LGTM. Thanks for fixing it. juzhe.zh...@rivai.ai From: Li Xu Date: 2024-10-28 14:28 To: gcc-patches CC: kito.cheng; palmer; juzhe.zhong; pan2.li; xuli Subject: [PATCH] RISC-V:Bugfix for vlmul_ext and vlmul_trunc with NULL return value[pr117286] From: xuli This patch fixes following ICE: