CC'ing/pinging i386 maintainers before posting a third revision.
> This patch adds a zero else operand to masked loads, in particular the
> masked gather load builtins that are used for gather vectorization.
>
> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc (ix86_expand_special_args_builti
Even though PR 117341 was a duplicate of PR 116768, another
testcase, this time in C++, does not hurt to have.
The testcase is self-contained and does not directly use libstdc++
except for operator new (it does not even call delete).
Tested on x86_64-linux-gnu, where it passes.
PR tree-optimi
Hi Richard,
Thanks for the review.
> On 25 Oct 2024, at 8:53 pm, Richard Biener wrote:
>
> On Fri, Oct 25, 2024 at 12:22 AM Kugan Vivekanandarajah
> wrote:
>>
>> Hi,
>>
>> This patch sets param_vect_max_version_for_alias_check
r0-126134-g5d2a9da9a7f7c1 added support for short-circuiting and combining the ifs
into using either AND or OR. But it only allowed the inner condition
basic block to contain just the conditional. This change allows up to 2 defining
statements as long as they are just nop conversions for either the lhs or r
On Tue, Oct 22, 2024 at 2:31 PM Haochen Jiang wrote:
>
> Hi all,
>
> ISE054 has just been released and you can find doc from here:
>
> https://cdrdv2.intel.com/v1/dl/getContent/671368
>
> Diamond Rapids features are added in this ISE, including AMX
> related instructions, SM4 EVEX extension and MO
Drop the "test-" prefix from the various gcc.dg/sarif-output/test-*.py
scripts so that the scripts are close to the .c files they are used by
when the files are sorted by name.
Successfully regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-4727-ga67594d1815272.
gcc/testsuite/ChangeLog:
Ping 3
On 10/17/24 1:31 PM, Carl Love wrote:
Ping 2
On 10/9/24 7:44 AM, Carl Love wrote:
Ping
On 10/1/24 8:12 AM, Carl Love wrote:
GCC maintainers:
The following version 2 of a series of patches for PowerPC removes
some built-ins that are covered by existing overloaded built-ins.
Add
Ping 3
On 10/17/24 1:31 PM, Carl Love wrote:
Ping 2
On 10/9/24 7:43 AM, Carl Love wrote:
Ping, FYI this is a fairly simple fix to a testcase.
On 10/3/24 8:11 AM, Carl Love wrote:
GCC maintainers:
The builtins-1-10-runnable.c test has debugging inadvertently
enabled. The test uses #ifdef
On Mon, 28 Oct 2024, Jakub Jelinek wrote:
> Ok, here is an updated patch, which for ubsan checks just for negative
> count and nothing else, does that check before using TRUNC_MOD_EXPR on
> the argument and uses it on unsigned types in all cases.
> The c_fully_fold_internal new wording is removed
On 10/28/24 4:24 PM, Vineet Gupta wrote:
Ping !
Pong. I've got a response to the first patch partially written :-)
Exec summary is I don't have a problem with functionality in that patch,
just naming/comments stuff. Still trying to figure out how to express
it clearly.
jeff
Ping !
On 10/20/24 12:40, Vineet Gupta wrote:
> Hi,
>
> PFA patch series which improves sched1 spilling. This all started with
> SPEC2017 507.Cactu dynamic icounts on RISC-V being double those of
> aarch64 (~2.6 trillion vs. ~1.4 trillion). Robin/Jeff hinted that the
> issue could be sched1 w
From: Andi Kleen
The bit cluster code generation strategy is only beneficial when
multiple case labels point to the same code. Do a quick check if
that is the case before trying to cluster.
This fixes the switch part of PR117091, where all case labels are unique;
however, it doesn't address the per
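To make the distinction above concrete, here is a minimal sketch in C. The function names and case values are hypothetical and are not taken from the patch or from PR117091. The first function has several case labels sharing one body, which is the shape where a bit test can pay off; the second is a hand-written approximation of what such a bit test computes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: five case labels point to the same code. */
int is_vowel_switch(unsigned c)
{
    switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 1;
    default:
        return 0;
    }
}

/* Hand-written approximation of a bit test: one mask for the shared body. */
int is_vowel_bittest(unsigned c)
{
    const uint32_t mask = (1u << ('a' - 'a')) | (1u << ('e' - 'a'))
                        | (1u << ('i' - 'a')) | (1u << ('o' - 'a'))
                        | (1u << ('u' - 'a'));
    unsigned idx = c - 'a';      /* wraps to a large value when c < 'a' */
    return idx < 32 && ((mask >> idx) & 1u);
}

/* Small self-check: both forms agree on every 8-bit input. */
void bittest_demo(void)
{
    for (unsigned c = 0; c < 256; c++)
        assert(is_vowel_switch(c) == is_vowel_bittest(c));
}
```

When every label has its own body, there is no shared mask to build, which is why checking for shared targets first can avoid wasted work.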
From: Andi Kleen
The current switch bit test clustering enumerates all possible case
cluster combinations to find the ones that best fit the bit test
constraints. This causes performance problems with very large switches.
For bit test clustering which happens naturally in word sized chunks
I don't
From: Andi Kleen
gcc/ChangeLog:
* common.opt: Enable -fbit-tests and -fjump-tables only at -O1.
* opts.cc (default_options_table): Ditto.
---
gcc/common.opt | 4 ++--
gcc/opts.cc | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/gcc/common.opt b/gcc/common
Since the test uses __sync_fetch_and_add, add a requirement for
target to support atomic operations on int and long types.
This fixes a spurious test failure on pru-unknown-elf, which lacks
atomic ops. The test still passes on x86_64-linux-gnu.
Pushed to trunk as obvious.
gcc/testsuite/ChangeLog
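For readers unfamiliar with the builtin mentioned above, a minimal sketch of __sync_fetch_and_add on int and long follows. The wrapper names are made up for illustration and the actual test body is not shown in this message. On a target without atomic operations for these types this would presumably not build or link, which is what the added requirement guards against.

```c
#include <assert.h>

/* Hypothetical wrappers: each returns the value held before the addition. */
int fetch_add_int(int *p, int n)
{
    return __sync_fetch_and_add(p, n);
}

long fetch_add_long(long *p, long n)
{
    return __sync_fetch_and_add(p, n);
}

/* Small self-check using local counters. */
void sync_demo(void)
{
    int hits = 0;
    long bytes = 0;
    assert(fetch_add_int(&hits, 1) == 0 && hits == 1);
    assert(fetch_add_long(&bytes, 4096L) == 0 && bytes == 4096L);
}
```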
> I'm not sure how this is different to just deleting the
> zero-initializer, which is what I already tested and found some random
> behaviour?
The difference is in the else-operand predicate. So unless there are
more bugs we should only have added VCOND_EXPRs for the cases where
they are absol
This testcase was causing an ICE during vectorization
due to r15-4695-gd17e672ce82e69 but was fixed with
r15-4713-g0942bb85fc5573.
Pushed as obvious after a quick test on x86_64-linux-gnu to
make sure the testcase passes.
PR tree-optimization/117330
gcc/testsuite/ChangeLog:
* gc
>> For the lack of a better idea I used a function call property to specify
>> whether a builtin needs an else operand or not. Somebody with better
>> knowledge of the aarch64 target can surely improve that.
>
> Yeah, those flags are really for source-level/gimple-level attributes.
> Would it work
This allows us to vectorize more loops with early exits by forcing
peeling for alignment to make sure that we're guaranteed to be able to
safely read an entire vector iteration without crossing a page boundary.
To make this work for VLA architectures we have to allow compile-time
non-constant targ
This patch was posted a year or so ago during GCC 14 development, and I'm posting
it again in the hope that I can get it into GCC 15. In the GCC 14 time
frame, 1,024 bit registers were not supported due to the bit length in internal
structures. In GCC 15, 1,024 bit registers are now supported.
The MMA subsystem added the notion of accumulator registers as an optional
feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with
the VSX registers 0..31, but logically the accumulator registers were separate
from the FPR registers. In ISA 3.1, it was anticipated that in fut
This patch is a preliminary patch to add the full 1,024 bit dense math registers
(DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the
DMR register.
This patch only adds the new 1,024 bit register support. It does not add
support for any instructions that need 1,024 bit re
This patch adds a test for the new dense math support.
2024-10-28 Michael Meissner
gcc/testsuite/
* gcc.target/powerpc/dm-double-test.c: New test.
* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
target test.
---
.../gcc.target/powerpc/dm-double-tes
This patch changes the assembler instruction names for MMA instructions from
the original name used in power10 to the new name when used with the dense math
system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the
same bits for either spelling.
For the non-prefixed MMA instructi
This patch adds a new constraint ('wD') that matches the accumulator registers
that overlap with VSX registers 0..31 on power10. Future patches will add the
support for a separate accumulator register class that will be used when the
support for dense math registers is added.
2024-10-22 Michael
In the development for the power10 processor, GCC did not enable using the load
vector pair and store vector pair instructions when optimizing things like
memory copy. This patch enables using those instructions if -mcpu=future is
used.
2024-10-22 Michael Meissner
gcc/
* config/rs600
gcc/ChangeLog:
* opts-common.cc (prune_options): Fix typo.
---
Pushed as obvious.
gcc/opts-common.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index 22774457bf0f..ac2e77b16590 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts
Mike Stump writes:
> On Oct 25, 2024, at 12:47 PM, Sam James wrote:
>>
>> PR107467 ended up being fixed by the fix for PR115110, but let's
>> add the testcase on top.
>>
>> gcc/testsuite/ChangeLog:
>> PR tree-optimization/107467
>> PR middle-end/115110
>>
>> * g++.dg/lto/pr1074
Fixed.
Bootstrapped on with no regressions. Pushed.
Andrew
On 10/28/24 10:25, Mikael Morin wrote:
On 28/10/2024 at 14:38, Andrew MacLeod wrote:
On 10/26/24 15:08, Mikael Morin wrote:
Hello,
On 24/10/2024 at 14:53, Andrew MacLeod wrote:
diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-
On Oct 25, 2024, at 12:47 PM, Sam James wrote:
>
> PR107467 ended up being fixed by the fix for PR115110, but let's
> add the testcase on top.
>
> gcc/testsuite/ChangeLog:
> PR tree-optimization/107467
> PR middle-end/115110
>
> * g++.dg/lto/pr107467_0.C: New test.
> ---
> OK?
On Fri, 25 Oct 2024 at 16:24, Patrick Palka wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/backports?
OK for all (also approved on the forge).
> Also available in PR form at https://forge.sourceware.org/gcc/gcc-TEST/pulls/8
>
> -- >8 --
>
> Views are required to have an amo
Case 7 of unsigned scalar saturating addition defines
SAT_ADD = X <= (X + Y) ? (X + Y) : -1. This is the same as
SAT_ADD = Y <= (X + Y) ? (X + Y) : -1 due to usadd_left_part_1
being commutative.
The pattern for case 7 currently does not accept the alternative
where Y is used in the condition. Ther
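As an illustration of why the two forms quoted above agree, here is a small C sketch on 32-bit unsigned values. The function names are hypothetical, and this is source-level C rather than the match.pd pattern itself. If X + Y wraps, the wrapped sum is smaller than both X and Y, so either comparison detects the overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Form with X in the condition: X <= (X + Y) ? (X + Y) : -1. */
uint32_t sat_add_x(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;        /* wraps modulo 2^32 on overflow */
    return x <= sum ? sum : (uint32_t)-1;
}

/* Alternative with Y in the condition: Y <= (X + Y) ? (X + Y) : -1. */
uint32_t sat_add_y(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;
    return y <= sum ? sum : (uint32_t)-1;
}

/* Small self-check on a few boundary values. */
void sat_add_demo(void)
{
    assert(sat_add_x(1, 2) == 3 && sat_add_y(1, 2) == 3);
    assert(sat_add_x(UINT32_MAX, 1) == UINT32_MAX);
    assert(sat_add_y(UINT32_MAX, 1) == UINT32_MAX);
}
```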
This patch adds a new case for unsigned scalar saturating subtraction
using a branch with a greater-than-or-equal condition. For example,
X >= (X - Y) ? (X - Y) : 0
is transformed into SAT_SUB (X, Y) when X and Y are unsigned scalars,
which therefore correctly matches more cases of IFN SAT_SUB. N
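A minimal C sketch of the branch form quoted above, using 32-bit unsigned values. The function name is hypothetical, and this shows only the source-level idiom, not the pattern added by the patch. When Y is larger than X, the subtraction wraps to a value greater than X, so the comparison fails and the result saturates to 0.

```c
#include <assert.h>
#include <stdint.h>

/* X >= (X - Y) ? (X - Y) : 0, written out for uint32_t. */
uint32_t sat_sub_ge(uint32_t x, uint32_t y)
{
    uint32_t diff = x - y;       /* wraps modulo 2^32 when y > x */
    return x >= diff ? diff : 0;
}

/* Small self-check on a few boundary values. */
void sat_sub_demo(void)
{
    assert(sat_sub_ge(5, 3) == 2);
    assert(sat_sub_ge(3, 5) == 0);
    assert(sat_sub_ge(0, UINT32_MAX) == 0);
}
```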
Hi all,
This patch series adds support for 2 new cases of unsigned scalar saturating
arithmetic (one addition, one subtraction). This results in more valid patterns
being recognised, which results in a call to .SAT_ADD or .SAT_SUB where relevant.
Regression tests for aarch64-none-linux-gnu all
Hi all,
Looks like this immediate variable was missed out when I last fixed the
namespace issues in arm_neon.h. Fixed in the obvious manner.
Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
* config/aarch64/arm_neon.h (vxarq_u
Currently we only cost gconds for the vector loop while we omit costing
them when analyzing the scalar loop; this unfairly penalizes the vector
loop in the case of loops with early exits.
This (together with the previous patches) enables us to vectorize
std::find with 64-bit element sizes.
gcc/Ch
This fixes a latent wrong code issue whereby vect_do_peeling determined
the wrong condition for inserting the vector skip guard. Specifically
in the case where the loop niters are unknown at compile time we used to
check:
!LOOP_REQUIRES_VERSIONING (loop_vinfo)
but LOOP_REQUIRES_VERSIONING is t
For loops with LOOP_VINFO_EARLY_BREAKS_VECT_PEELED we should always
enter the scalar epilogue, so avoid emitting a guard on entry to the
epilogue.
gcc/ChangeLog:
* tree-vect-loop-manip.cc (vect_do_peeling): Avoid emitting an
epilogue guard for inverted early-exit loops.
---
gcc/t
From: Tamar Christina
The alignment peeling changes exposed a latent missing dominator update
with early break vectorization, specifically when inserting the vector
skip edge, since the new edge bypasses the prolog skip block and thus
has the potential to subvert its dominance. This patch fixes
This patch series allows us to vectorize more loops with early exits by
forcing peeling for alignment to make sure that we're guaranteed to be
able to safely read an entire vector iteration without crossing a page
boundary.
The motivation is to vectorize search loops such as std::find. This
shows
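For context, the kind of search loop meant above can be sketched in C as follows. The function name is hypothetical, and this is a plain C analogue of std::find over 64-bit elements rather than code from the series. The loop may exit before the end of the range, so a vectorized version that loads a whole vector per iteration could read elements past the match; that is where the page-boundary concern comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linear search in the style of std::find: returns a pointer to the first
   element equal to value, or last if there is none. */
const uint64_t *find_u64(const uint64_t *first, const uint64_t *last,
                         uint64_t value)
{
    for (; first != last; ++first)
        if (*first == value)
            break;               /* early exit: later elements are never read */
    return first;
}

/* Small self-check. */
void find_demo(void)
{
    static const uint64_t data[] = { 4, 8, 15, 16, 23, 42 };
    const size_t n = sizeof data / sizeof data[0];
    assert(find_u64(data, data + n, 15) == data + 2);
    assert(find_u64(data, data + n, 99) == data + n);
}
```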
On 28/10/2024 at 14:38, Andrew MacLeod wrote:
On 10/26/24 15:08, Mikael Morin wrote:
Hello,
On 24/10/2024 at 14:53, Andrew MacLeod wrote:
diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc
index dd312a80366..ef2b2cce516 100644
--- a/gcc/range-op-ptr.cc
+++ b/gcc/range-op-ptr.cc
(...)
The commit I pushed was not the one I'd tested, so it had older versions
of the tests, with bugs that I'd already fixed locally. This commit has
the fixed tests that I'd intended to push in the first place.
libstdc++-v3/ChangeLog:
* testsuite/23_containers/vector/bool/cons/from_range.cc:
> On Oct 27, 2024, at 05:02, Martin Uecker wrote:
>
>>
>
> For standard attributes, there is a policy that the attribute should
> be ignorable, i.e. removing it from a valid program should not cause
> any change in semantics.
>
> For GCC's attributes this is not necessarily the case, but I
On 10/26/24 15:08, Mikael Morin wrote:
Hello,
On 24/10/2024 at 14:53, Andrew MacLeod wrote:
diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc
index dd312a80366..ef2b2cce516 100644
--- a/gcc/range-op-ptr.cc
+++ b/gcc/range-op-ptr.cc
(...)
-void
-pointer_or_operator::wi_fold (irange &r
On Linux/x86_64,
f1823d8037e355cd755087e695051d190ffe755e is the first bad commit
commit f1823d8037e355cd755087e695051d190ffe755e
Author: H.J. Lu
Date: Sat Oct 12 05:53:49 2024 +0800
gcc.target/i386/pr53533-[13].c: Adjust assembly scan
caused
FAIL: gcc.target/i386/pr53533-1.c scan-assemb
On 24/10/2024 16:06, Richard Biener wrote:
Can you check whether removing the :c from the (plus in
usadd_left_part_1 keeps things
working?
Hi Richard,
Thanks for the feedback. I've written some tests and can confirm that they
pass as expected with these two changes being made (removal of :c in
The construct used for initializing the code alignments in a recent
change is causing bootstrap problems on riscv64 as seen in the
referenced bugzilla.
This patch adjusts the initializer by pushing the NULL down into each
uarch clause. Bootstrapped on riscv64, regression test in flight, but
On 25/10/2024 19:47, Christophe Lyon wrote:
> From: Alfie Richards
>
> Implement the mve vld and vst intrinsics using the MVE builtins framework.
>
> The main part of the patch is to reimplement the vstr/vldr patterns
> such that we now have far fewer of them:
> - non-truncating stores
> - predi
Sam James writes:
> Sam James writes:
>
>> Add -Werror=lto-type-mismatch,odr to bootstrap-lto* configurations to
>> help stop LTO breakage/correctness issues sneaking in.
>>
>> We discussed -Werror=strict-aliasing but it runs early and doesn't
>> give better diagnostics with LTO so left it out.
Kyrylo Tkachov writes:
> Hi all,
>
> The MD pattern for the XAR instruction in SVE2 is currently expressed with
> non-canonical RTL by using a ROTATERT code with a constant rotate amount.
> Fix it by using the left ROTATE code. This necessitates adjusting the rotate
> amount during expand.
>
> A
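The identity behind switching from a right rotate to a left rotate can be sketched in C. The helper names are hypothetical, and the exact adjustment done at expand time is not shown in this message. For a 64-bit value, rotating right by n is the same as rotating left by (64 - n) mod 64.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits (n reduced modulo 64 to avoid undefined shifts). */
uint64_t rotl64(uint64_t v, unsigned n)
{
    n &= 63;
    return (v << n) | (v >> ((64 - n) & 63));
}

/* Rotate right by n bits. */
uint64_t rotr64(uint64_t v, unsigned n)
{
    n &= 63;
    return (v >> n) | (v << ((64 - n) & 63));
}

/* Self-check: a right rotate equals a left rotate by the adjusted amount. */
void rotate_demo(void)
{
    const uint64_t v = 0x0123456789abcdefULL;
    for (unsigned n = 0; n < 64; n++)
        assert(rotr64(v, n) == rotl64(v, (64 - n) & 63));
}
```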
STMT_VINFO_SLP_VECT_ONLY isn't properly computed as the union of all
group members and when the group is later split due to duplicates
not all sub-groups inherit the flag.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/117307
* tree-vect-data-refs.cc
On Mon, Oct 28, 2024 at 9:35 AM Kugan Vivekanandarajah
wrote:
>
> Hi,
>
> When ifcvt versions a loop, it sets dont_vectorize on the scalar loop. If the
> vector loop is not vectorized and removed, the scalar loop is still left with
> dont_vectorize. As a result, BB vectorization will not happen.
>
Hi,
When ifcvt versions a loop, it sets dont_vectorize on the scalar loop. If the
vector loop is not vectorized and removed, the scalar loop is still left with
dont_vectorize. As a result, BB vectorization will not happen.
This patch adds a new attribute called dont_loop_vectorize (that is differe
On Mon, Oct 28, 2024 at 12:41 AM Andrew Pinski wrote:
>
> ABSU_EXPR lowering incorrectly used the resulting type
> for the new expression, but in the case of ABSU the resulting
> type is an unsigned type, and with that the ABSU is folded away. The fix
> is to use a signed type for the expression instead.
>
>
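To show what the signed/unsigned distinction looks like at the source level, here is a minimal C sketch. The function name is hypothetical, and this is not the lowering code from the patch. The operand is signed and the result is unsigned, and the negation is done in the unsigned type so that INT_MIN is handled without signed overflow.

```c
#include <assert.h>
#include <limits.h>

/* Absolute value of a signed int, returned as unsigned. */
unsigned absu_int(int x)
{
    /* The comparison must be done on the signed value; after conversion to
       unsigned, "x < 0" could never be true. */
    return x < 0 ? 0u - (unsigned)x : (unsigned)x;
}

/* Small self-check, including the INT_MIN corner case. */
void absu_demo(void)
{
    assert(absu_int(-5) == 5u);
    assert(absu_int(5) == 5u);
    assert(absu_int(INT_MIN) == (unsigned)INT_MAX + 1u);
}
```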
On Mon, Oct 28, 2024 at 12:42 AM Andrew Pinski wrote:
>
> This moves the check for maybe_undef_p in match_simplify_replacement
> slightly earlier before figuring out the true/false arg using arg0/arg1
> instead.
> In most cases this makes no difference in compile time; just in the case
> there is an
On Fri, Oct 25, 2024 at 08:06:36PM +, Joseph Myers wrote:
> If sanitizing makes sense for these built-in functions, surely it should
> check for all negative shifts, including those that are multiples of the
> width (and there should be tests for it in the testsuite). So sanitizing
> would
From: xuli
This patch fixes the following ICE:
test.c: In function 'func':
test.c:37:24: internal compiler error: Segmentation fault
37 | vfloat16mf2_t vc = __riscv_vlmul_trunc_v_f16m1_f16mf2(vb);
|^~
The root cause is that vl
LGTM. Thanks for fixing it.
juzhe.zh...@rivai.ai
From: Li Xu
Date: 2024-10-28 14:28
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; pan2.li; xuli
Subject: [PATCH] RISC-V:Bugfix for vlmul_ext and vlmul_trunc with NULL return
value[pr117286]
From: xuli
This patch fixes the following ICE: