Re: [PATCH] libgccjit: Make new_array_type take unsigned long

2024-06-27 Thread Antoni Boucher
Thanks for the review. I'm a bit concerned about using unsigned long. Would it be OK if I change the type to uint64_t? I could rename the function to gcc_jit_context_new_array_type_u64. Regards. On 2024-06-26 at 11:34, David Malcolm wrote: On Fri, 2024-02-23 at 09:55 -0500, Antoni Boucher wr

[RFC PATCH] cse: Add another CSE pass after split1

2024-06-27 Thread Palmer Dabbelt
This is really more of a question than a patch. Looking at PR/115687 I managed to convince myself there's a general class of problems here: splitting might produce constant subexpressions, but as far as I can tell there's nothing to eliminate those constant subexpressions. So I very quickly threw

Re: [PATCH 2/3] libstdc++: Optimize __uninitialized_default using memset

2024-06-27 Thread Jonathan Wakely
On Thu, 27 Jun 2024 at 14:27, Maciej Cencora wrote: > > I think going the bit_cast way would be the best because it enables the > optimization for many more classes including common wrappers like optional, > variant, pair, tuple and std::array. This isn't tested but seems to work on simple case

Re: nvptx vs. [PATCH] Add a late-combine pass [PR106594]

2024-06-27 Thread Thomas Schwinge
Hi! On 2024-06-27T23:20:18+0200, I wrote: > On 2024-06-27T22:27:21+0200, I wrote: >> On 2024-06-27T18:49:17+0200, I wrote: >>> On 2023-10-24T19:49:10+0100, Richard Sandiford >>> wrote: This patch adds a combine pass that runs late in the pipeline. >> >> [After sending, I realized I replied

Re: [PATCH v2] MIPS: Output $0 for conditional trap if !ISA_HAS_COND_TRAPI

2024-06-27 Thread YunQiang Su
Maciej W. Rozycki wrote on Fri, Jun 28, 2024 at 01:01: > > On Thu, 27 Jun 2024, YunQiang Su wrote: > > > > The missed optimisation in GAS, which used not to trigger pre-R6, is > > > irrelevant from this change's point of view and just adds noise. I'm > > > surprised that it worked even in the first place, as

[PATCH] Testsuite/MIPS: Fix msa.c: test7_v2f64, test7_v4f32, test43_v2i64

2024-06-27 Thread YunQiang Su
BNEGI.W/D are used for test7_v2f64 and test7_v4f32 now. It is an improvement, since it saves an instruction. ILVR.D is used for test43_v2i64 now, instead of INSVE.D. gcc/testsuite gcc.target/mips/msa.c: Fix test7_v2f64, test7_v4f32 and test43_v2i64. --- gcc/testsuite/gcc.ta

Re: [PATCH] preprocessor: Create the parser before handling command-line includes [PR115312]

2024-06-27 Thread Marek Polacek
On Thu, Jun 27, 2024 at 05:06:14PM -0400, Lewis Hyatt wrote: > Hello- > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115312 > > This fixes a 14.1 regression with PCH for MinGW and other platforms that don't > use stdc-predef.h. Bootstrap + regtest all languages on x86-64 Linux; > bootstrap + re

Re: [PATCH] libgccjit: Fix get_size of size_t

2024-06-27 Thread Antoni Boucher
On 2024-06-26 at 18:01, David Malcolm wrote: On Wed, 2024-02-21 at 14:16 -0500, Antoni Boucher wrote: On Thu, 2023-12-07 at 19:57 -0500, David Malcolm wrote: On Thu, 2023-12-07 at 17:26 -0500, Antoni Boucher wrote: Hi. This patch fixes getting the size of size_t (bug 112910). There's o

[PATCH v6] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-27 Thread Pengxuan Zheng
This patch improves GCC’s vectorization of __builtin_popcount for aarch64 target by adding popcount patterns for vector modes besides QImode, i.e., HImode, SImode and DImode. With this patch, we now generate the following for V8HI: cnt v1.16b, v0.16b uaddlp v2.8h, v1.16b For V4HI, we gen
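As a rough illustration of the kind of source loop this patch targets, here is a minimal sketch. The function name popcount_u16 is hypothetical and does not come from the patch or its testsuite; whether the loop is actually vectorized into cnt/uaddlp depends on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: per-element population count over 16-bit lanes.
   On aarch64, a loop of this shape is a candidate for the HImode vector
   popcount patterns described above (assumption: built with optimization
   and auto-vectorization enabled). */
void popcount_u16(uint16_t *dst, const uint16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* __builtin_popcount takes unsigned int; the uint16_t value is
           promoted, and the result (0..16) fits back into uint16_t. */
        dst[i] = (uint16_t)__builtin_popcount(src[i]);
    }
}
```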

RE: [PATCH v5] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-27 Thread Pengxuan Zheng (QUIC)
Thanks, Richard! I've updated the patch accordingly. https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655912.html Please let me know if any other changes are needed. Thanks, Pengxuan > Sorry for the slow reply. > > Pengxuan Zheng writes: > > This patch improves GCC’s vectorization of __buil

Re: [x86 SSE PATCH] Some additional ternlog refinements.

2024-06-27 Thread Hongtao Liu
On Thu, Jun 27, 2024 at 4:29 PM Roger Sayle wrote: > > > This patch is another round of refinements to fine tune the new ternlog > infrastructure in i386's sse.md. This patch tweaks ix86_ternlog_idx > to allow multiple MEM/CONST_VECTOR/VEC_DUPLICATE operands prior to > splitting (before reload),

[PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

2024-06-27 Thread liuhongt
For the testcase in PR115406, here is part of the dump. char D.4882; vector(1) _1; vector(1) signed char _2; char _5; : _1 = { -1 }; When assigning { -1 } to vector(1) {signed-boolean:8}, since TYPE_PRECISION (itype) <= BITS_PER_UNIT, it sets each bit of dest with each vector el

[PATCH] MIPS: Support more cases with alien mode of SHF.DF

2024-06-27 Thread YunQiang Su
Currently, we support only the cases that strictly fit the instructions. For example, for V16QImode, we only support shuffles like (here 0 <= N0, N1, N2, N3 <= 3): N0, N1, N2, N3, N0+4, N1+4, N2+4, N3+4, N0+8, N1+8, N2+8, N3+8, N0+12, N1+12, N2+12, N

[PATCH] MIPS/testsuite: Add -mfpxx to call-clobbered-1.c

2024-06-27 Thread YunQiang Su
The scan-assembler-times rules only fit for -mfp32 and -mfpxx. It fails if we are configured as FP64 by default, as it has one less sdc1/ldc1 pair. gcc/testsuite * gcc.target/mips/call-clobbered-1.c: Add -mfpxx. --- gcc/testsuite/gcc.target/mips/call-clobbered-1.c | 2 +- 1 file changed,

[PATCH v1] Match: Support imm form for unsigned scalar .SAT_ADD

2024-06-27 Thread pan2 . li
From: Pan Li This patch adds support for the unsigned scalar .SAT_ADD form where one of the operands is an immediate (IMM). For example: Form IMM: #define DEF_SAT_U_ADD_IMM_FMT_1(T) \ T __attribute__((noinline)) \ sat_u_add_imm_##T##_fmt_1 (T x) \ {
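The macro above is cut off in this listing, so its body is not reproduced here. As a generic sketch only, an unsigned saturating add with a constant operand can be written as below. The function name sat_u_add_imm9_u8 and the constant 9 are illustrative assumptions, not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical example: unsigned saturating add of x and the constant 9.
   Not the patch's macro body; just one common way to express the idea. */
uint8_t sat_u_add_imm9_u8(uint8_t x)
{
    /* The sum wraps modulo 256 when it overflows. */
    uint8_t sum = (uint8_t)(x + 9u);
    /* If wraparound happened, sum is smaller than x, so clamp to the max. */
    return sum >= x ? sum : UINT8_MAX;
}
```

The comparison of the wrapped sum against the original operand is the usual overflow check for unsigned addition.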

[PATCH 1/3] [avx512 testsuite] Define mask as extern instead of uninitialized local variables.

2024-06-27 Thread liuhongt
The testcases are supposed to scan for vpopcnt{b,w,d,q} operations with a k mask, but the mask is defined as an uninitialized local variable, which is set to 0 at the RTL expand phase. It is then simplified away by late_combine, which causes the scan-assembler failure. Move the definition of mask outside to

[PATCH 0/3][x86] Enable pass_late_combine for x86.

2024-06-27 Thread liuhongt
Because of the issue described in PR115610, late_combine is disabled by default. This series tries to resolve the regressions and enable late_combine. There are 4 regressions observed. 1. The first one is related to pass_stv2, because late_combine will undo the transformation done in that pass. Move the pas

[PATCH 3/3] [x86] Enable flate-combine.

2024-06-27 Thread liuhongt
Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, and define target_insn_cost to prevent post_reload pass_late_combine from reverting the optimization done in pass_rpad. Adjust testcases, since pass_late_combine generates better code but breaks scan assembly, i.e., under a 32-bit target, gcc

[PATCH 2/3] Extend lshifrtsi3_1_zext to ?k alternative.

2024-06-27 Thread liuhongt
late_combine will combine lshift + zero into *lshifrtsi3_1_zext, which causes an extra mov between gpr and kmask; add ?k to the pattern. gcc/ChangeLog: PR target/115610 * config/i386/i386.md (<*insnsi3_zext): Add alternative ?k, enable it only for lshiftrt and under avx512bw.

Re: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-27 Thread Richard Biener
On Thu, Jun 27, 2024 at 4:45 PM Li, Pan2 wrote: > > Hi Richard, > > As mentioned by tamar in previous, would like to try even more optimization > based on this patch. > Assume we take zip benchmark as example, we may have gimple similar as below > > unsigned int _1, _2; > unsigned short int _9; >

Re: [PATCH] vect: Fix shift-by-induction for single-lane slp

2024-06-27 Thread Richard Biener
On Thu, Jun 27, 2024 at 5:15 PM Feng Xue OS wrote: > > I added two test cases for the examples your mentioned. OK, thanks. > BTW: would you please look over another 3 lane-reducing patches that have > been updated? If ok, I would consider to check them in. Sorry, I've been distracted by other

Re: [x86 PATCH] Handle sign_extend like zero_extend in *concatditi3_[346]

2024-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2024 at 9:40 PM Roger Sayle wrote: > > > This patch generalizes some of the patterns in i386.md that recognize > double word concatenation, so they handle sign_extend the same way that > they handle zero_extend in appropriate contexts. > > As a motivating example consider the follo

Re: [PATCH 2/3] Extend lshifrtsi3_1_zext to ?k alternative.

2024-06-27 Thread Uros Bizjak
On Fri, Jun 28, 2024 at 7:29 AM liuhongt wrote: > > late_combine will combine lshift + zero into *lshifrtsi3_1_zext which > cause extra mov between gpr and kmask, add ?k to the pattern. > > gcc/ChangeLog: > > PR target/115610 > * config/i386/i386.md (<*insnsi3_zext): Add alternativ

Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

2024-06-27 Thread Richard Biener
On Fri, Jun 28, 2024 at 3:15 AM liuhongt wrote: > > for the testcase in the PR115406, here is part of the dump. > > char D.4882; > vector(1) _1; > vector(1) signed char _2; > char _5; > >: > _1 = { -1 }; > > When assign { -1 } to vector(1} {signed-boolean:8}, > Since TYPE_PRECISION

Re: [PATCH 3/3] [x86] Enable flate-combine.

2024-06-27 Thread Uros Bizjak
On Fri, Jun 28, 2024 at 7:29 AM liuhongt wrote: > > Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, also > define target_insn_cost to prevent post_reload pass_late_combine to > revert the optimziation did in pass_rpad. > > Adjust testcases since pass_late_combine generates better

RE: nvptx vs. [PATCH] Add a late-combine pass [PR106594]

2024-06-27 Thread Roger Sayle
Hi Thomas, There are two things I think I can contribute to this discussion. The first is that I have a patch (from a year or two ago) for adding rtx_costs to the nvptx backend that I will respin, which will provide more backend control over combine-like pass decisions. The second is in res

Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

2024-06-27 Thread Richard Biener
On Fri, Jun 28, 2024 at 8:01 AM Richard Biener wrote: > > On Fri, Jun 28, 2024 at 3:15 AM liuhongt wrote: > > > > for the testcase in the PR115406, here is part of the dump. > > > > char D.4882; > > vector(1) _1; > > vector(1) signed char _2; > > char _5; > > > >: > > _1 = { -1 };

[PATCH] Use move-aware auto_vec in map

2024-06-27 Thread Jørgen Kvalsvik
Using auto_vec rather than vec means the vectors are released automatically upon return, which stops the leak. The problem seems to be that auto_vec is not really move-aware, only the specialization is. This is actually Jan's original suggestion https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655

Re: [PATCH v6] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-27 Thread Tejas Belagod
On 6/28/24 6:18 AM, Pengxuan Zheng wrote: This patch improves GCC’s vectorization of __builtin_popcount for aarch64 target by adding popcount patterns for vector modes besides QImode, i.e., HImode, SImode and DImode. With this patch, we now generate the following for V8HI: cnt v1.16b, v0.

Re: [PATCH 2/3] libstdc++: Optimize __uninitialized_default using memset

2024-06-27 Thread Maciej Cencora
But the constexpr-ness of bit_cast has additional limitations, and e.g. providing a union as _Tp would be a hard error. So we have two options: - before bitcasting, check whether the type can be bit-cast at compile time, - change the 'if constexpr' to a regular 'if'. If we go with the second solution then we
