[PATCH v3] x86: Properly find the maximum stack slot alignment

2025-04-27 Thread H.J. Lu
R target/109093 * g++.target/i386/pr109780-1.C: New test. * gcc.target/i386/pr109093-1.c: Likewise. * gcc.target/i386/pr109780-1.c: Likewise. * gcc.target/i386/pr109780-2.c: Likewise. * gcc.target/i386/pr109780-3.c: Likewise. -- H.J. From 2233834e398711b65c8b8eeefbf6fa830a6c2974 Mon Sep 17 00:00:00

[PATCH] i386: Add ix86_expand_unsigned_small_int_cst_argument

2025-04-26 Thread H.J. Lu
When passing 0xff as an unsigned char function argument with the C frontend promotion, expand_normal used to get constant 255 and returned the rtx value using the sign-extended representation: (const_int 255 [0xff]) But after commit a670ebde3995481225ec62b29686ec07a21e5c10 Author: H.J.

RFC v2: Add TARGET_STORE_BY_PIECES_ICODE

2025-04-21 Thread H.J. Lu
On Mon, Apr 21, 2025 at 9:38 PM H.J. Lu wrote: > > On Mon, Apr 21, 2025 at 6:34 PM Jan Hubicka wrote: > ... > > We originally put CLEAR_RATIO < MOVE_RATIO based on observation that > > mov $0, mem > > is longer in encoding than > > mov mem, mem

RFC: Add TARGET_STORE_BY_PIECES_ICODE

2025-04-21 Thread H.J. Lu
moves, but it did not materialize (yet). With SSE this problem > disappears since SSE stores does not have immediates anyway. Here is a patch to implement it with UNSPEC_STORE_BY_PIECES. How does it look? -- H.J. From c021053a4fea121a3c4a593b2907701c42a626bc Mon Sep 17 00:00:00 2001 From: "H.J. L

Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-04-21 Thread H.J. Lu
On Mon, Apr 21, 2025 at 6:34 PM Jan Hubicka wrote: > > > On Mon, Apr 21, 2025 at 7:24 AM H.J. Lu wrote: > > > > > > On Sun, Apr 20, 2025 at 6:31 PM Jan Hubicka wrote: > > > > > > > > > PR target/102294 > > > > >

[PATCH v2] x86: Properly find the maximum stack slot alignment

2025-04-21 Thread H.J. Lu
On Mon, Apr 21, 2025 at 3:06 PM Uros Bizjak wrote: > > On Sun, Apr 20, 2025 at 11:26 PM H.J. Lu wrote: > > > > Don't assume that stack slots can only be accessed by stack or frame > > registers. We first find all registers defined by stack or frame > > regist

Re: PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-21 Thread H.J. Lu
On Mon, Apr 21, 2025 at 11:29 AM Hongtao Liu wrote: > > On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote: > > > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > > > For all different modes of all 0s/1s vectors, we can use the single widest > >

Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-04-20 Thread H.J. Lu
On Mon, Apr 21, 2025 at 7:24 AM H.J. Lu wrote: > > On Sun, Apr 20, 2025 at 6:31 PM Jan Hubicka wrote: > > > > > PR target/102294 > > > PR target/119596 > > > * config/i386/x86-tune-costs.h (generic_memcpy): Updated.

Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-04-20 Thread H.J. Lu
On Sun, Apr 20, 2025 at 6:31 PM Jan Hubicka wrote: > > > PR target/102294 > > PR target/119596 > > * config/i386/x86-tune-costs.h (generic_memcpy): Updated. > > (generic_memset): Likewise. > > (generic_cost): Change CLEAR_RATIO to 17. > > * config/i386/x86-tune.

[PATCH] x86: Properly find the maximum stack slot alignment

2025-04-20 Thread H.J. Lu
ewise. * gcc.target/i386/pr109780-3.c: Likewise. Signed-off-by: H.J. Lu --- gcc/config/i386/i386.cc| 174 ++--- gcc/testsuite/g++.target/i386/pr109780-1.C | 72 + gcc/testsuite/gcc.target/i386/pr109093-1.c | 33 gcc/testsuite/gcc.target/i386/pr109780-1.c

Re: PING: [PATCH v2] x86: Add pcmpeq splitters

2025-04-20 Thread H.J. Lu
On Sat, Apr 19, 2025 at 4:16 PM Uros Bizjak wrote: > > On Sat, Apr 19, 2025 at 7:22 AM H.J. Lu wrote: > > > > On Mon, Dec 2, 2024 at 6:27 AM H.J. Lu wrote: > > > > > > Add pcmpeq splitters to split > > > > > > (insn 5 3 7 2 (

Re: [PATCH] simplify-rtx: Fix shortcut for vector eq/ne

2025-04-20 Thread H.J. Lu
eop1) > + || CONST_VECTOR_P (trueop1))) >&& (tem = simplify_binary_operation (MINUS, mode, op0, op1)) != 0 >/* We cannot do this if tem is a nonzero address. */ >&& ! nonzero_address_p (tem)) > return simplify_const_relatio

Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-04-19 Thread H.J. Lu
On Sun, Apr 20, 2025 at 4:19 AM Jan Hubicka wrote: > > > On Tue, Apr 8, 2025 at 3:52 AM H.J. Lu wrote: > > > > > > Simplify memcpy and memset inline strategies to avoid branches for > > > -mtune=generic: > > > > > > 1. With MOVE_RA

PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-18 Thread H.J. Lu
On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > For all different modes of all 0s/1s vectors, we can use the single widest > all 0s/1s vector register for all 0s/1s vector uses in the whole function. > Add a pass to generate a single widest all 0s/1s vector set instruction at

PING: [PATCH v2] x86: Add pcmpeq splitters

2025-04-18 Thread H.J. Lu
On Mon, Dec 2, 2024 at 6:27 AM H.J. Lu wrote: > > Add pcmpeq splitters to split > > (insn 5 3 7 2 (set (reg:V4SI 100) > (eq:V4SI (reg:V4SI 98) > (reg:V4SI 98))) 7910 {*sse2_eqv4si3} > (expr_list:REG_DEAD (reg:V4SI 98) > (expr

[PATCH] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-04-18 Thread H.J. Lu
-29.c: Likewise. * gcc.target/i386/preserve-none-30a.c: Likewise. * gcc.target/i386/preserve-none-30b.c: Likewise. Signed-off-by: H.J. Lu --- gcc/config/i386/i386-expand.cc| 6 +- gcc/config/i386/i386-options.cc | 90 -- gcc/config/i386/i386-protos.h

Re: [PATCH] x86: Update gcc.target/i386/apx-interrupt-1.c

2025-04-16 Thread H.J. Lu
On Tue, Apr 15, 2025 at 12:19 PM Uros Bizjak wrote: > > On Tue, Apr 15, 2025 at 2:23 PM H.J. Lu wrote: > > > > On Tue, Apr 15, 2025 at 12:45 AM Uros Bizjak wrote: > > > > > > On Tue, Apr 15, 2025 at 1:06 AM H.J. Lu wrote: > > > > > > >

[PATCH][GCC14] Extend check-function-bodies to allow label and directives

2025-04-15 Thread H.J. Lu
tarts with ".L". Signed-off-by: H.J. Lu (cherry picked from commit d6bb1e257fc414d21bc31faa7ddecbc93a197e3c) --- gcc/doc/sourcebuild.texi | 9 ++--- gcc/testsuite/gcc.target/i386/pr116174.c | 18 +++--- gcc/testsuite/lib/scanasm.exp| 15 +++

[PATCH] x86: Update gcc.target/i386/apx-interrupt-1.c

2025-04-15 Thread H.J. Lu
ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers pushed in red-zone. Since commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801 Author: H.J. Lu Date: Sun Apr 13 12:20:42 2025 -0700 APX: Don't use red-zone with 32 GPRs and no caller-saved registers disabled red

Re: [PATCH] x86: Update gcc.target/i386/apx-interrupt-1.c

2025-04-15 Thread H.J. Lu
On Tue, Apr 15, 2025 at 12:45 AM Uros Bizjak wrote: > > On Tue, Apr 15, 2025 at 1:06 AM H.J. Lu wrote: > > > > ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers > > pushed in red-zone. Since > > > > commit 0a074b8c7e79f9d9359d044f

Re: [PATCH] APX: Don't use red-zone with APX and no caller-saved registers

2025-04-14 Thread H.J. Lu
On Mon, Apr 14, 2025 at 2:39 AM Uros Bizjak wrote: > > On Mon, Apr 14, 2025 at 8:54 AM Hongtao Liu wrote: > > > > On Mon, Apr 14, 2025 at 7:36 AM H.J. Lu wrote: > > > > > > Don't use red-zone when there are no caller-saved registers and APX is > >

[PATCH] APX: Don't use red-zone with APX and no caller-saved registers

2025-04-13 Thread H.J. Lu
/testsuite/ PR target/119784 * gcc.target/i386/pr119784a.c: New test. * gcc.target/i386/pr119784b.c: Likewise. Signed-off-by: H.J. Lu --- gcc/config/i386/i386.cc | 6 ++ gcc/testsuite/gcc.target/i386/pr119784a.c | 96 +++ gcc/testsuite/

Re: [PATCH v4 1/2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-09 Thread H.J. Lu
On Wed, Apr 9, 2025 at 1:53 AM Ard Biesheuvel wrote: > > From: Ard Biesheuvel > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > __fentry__") updated the logic that emits mcount() / __fentry__() calls > into function prologues when profiling is enabled, to avoid GOT-based

Re: [PATCH v4 1/2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-09 Thread H.J. Lu
On Wed, Apr 9, 2025 at 8:54 AM Ard Biesheuvel wrote: > > On Wed, 9 Apr 2025 at 16:46, H.J. Lu wrote: > > > > On Wed, Apr 9, 2025 at 1:53 AM Ard Biesheuvel wrote: > > > > > > From: Ard Biesheuvel > > > > > > Commit bde21de1205 ("i386:

Re: [PATCH v3] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-08 Thread H.J. Lu
On Tue, Apr 8, 2025 at 9:59 AM Ard Biesheuvel wrote: > > On Tue, 8 Apr 2025 at 18:44, H.J. Lu wrote: > > > > On Tue, Apr 8, 2025 at 9:39 AM Ard Biesheuvel wrote: > > > > > > On Tue, 8 Apr 2025 at 15:33, H.J. Lu wrote: > > > > >

Re: [PATCH v3] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-08 Thread H.J. Lu
On Tue, Apr 8, 2025 at 9:39 AM Ard Biesheuvel wrote: > > On Tue, 8 Apr 2025 at 15:33, H.J. Lu wrote: > > > > On Tue, Apr 8, 2025 at 3:46 AM Ard Biesheuvel wrote: > > > > > > From: Ard Biesheuvel > > > > > > Commit bde21de1205 ("i386:

Re: [PATCH v3] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-08 Thread H.J. Lu
On Tue, Apr 8, 2025 at 3:46 AM Ard Biesheuvel wrote: > > From: Ard Biesheuvel > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > __fentry__") updated the logic that emits mcount() / __fentry__() calls > into function prologues when profiling is enabled, to avoid GOT-based

Re: [PATCH v2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-08 Thread H.J. Lu
On Tue, Apr 8, 2025 at 3:15 AM Ard Biesheuvel wrote: > > From: Ard Biesheuvel > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > __fentry__") updated the logic that emits mcount() / __fentry__() calls > into function prologues when profiling is enabled, to avoid GOT-based

[PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-04-07 Thread H.J. Lu
get/i386/sw-1.c: Also pass -mstringop-strategy=rep_byte. Signed-off-by: H.J. Lu --- gcc/config/i386/x86-tune-costs.h | 31 --- gcc/config/i386/x86-tune.def | 2 +- .../gcc.target/i386/auto-init-padding-3.c | 7 ++--- .../gcc.target/i386/auto

Re: [PATCH] lto: lto-opts fixes [PR119625]

2025-04-07 Thread H.J. Lu
On Mon, Apr 7, 2025 at 2:53 AM Richard Biener wrote: > > On Fri, 4 Apr 2025, Jakub Jelinek wrote: > > > On Fri, Apr 04, 2025 at 08:52:10PM +0200, Richard Biener wrote: > > > > Or do you want something further (like > > > > switch (global_options.x_flag_cf_protection & ~CF_SET) > > > > )? > > > > >

Re: [Patch, fortran] PR119460 - gfortran.dg/reduce_1.f90 FAILs

2025-04-06 Thread H.J. Lu
On Sun, Apr 6, 2025 at 5:39 AM Paul Richard Thomas wrote: > > Hi All, > > As far as I can tell, the attached patch fixes the problems with the reduce > intrinsic. I would be grateful to the reporters if they would confirm that > this is the case. > > The key to the fix appears in reduce_3.f90, w

Re: [PATCH] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-06 Thread H.J. Lu
On Sun, Apr 6, 2025 at 8:54 AM H.J. Lu wrote: > > On Fri, Apr 4, 2025 at 12:01 AM Ard Biesheuvel wrote: > > > > From: Ard Biesheuvel > > > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > > __fentry__") updated the logic tha

Re: [PATCH] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-06 Thread H.J. Lu
On Fri, Apr 4, 2025 at 12:01 AM Ard Biesheuvel wrote: > > From: Ard Biesheuvel > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > __fentry__") updated the logic that emits mcount() / __fentry__() calls > into function prologues when profiling is enabled, to avoid GOT-base

Re: [Patch, fortran] PR85836: Implement the F2018 reduce intrinsic

2025-03-30 Thread H.J. Lu
On Wed, Mar 19, 2025 at 11:23 AM Paul Richard Thomas wrote: > > Hi Andre, > > Thanks for the review - I'll act on the points that you raised. > > The Linaro people reported a failure in reduce_1.f90 execution, which I > believe is due to incorrect casting of 'dim' and a wrong specification of its

[PATCH] gcc.dg/pr90838-2.c: Replace long with long long

2025-03-17 Thread H.J. Lu
Since gcc.dg/pr90838-2.c is only for 64-bit integer, replace long with long long for ILP32 targets. * gcc.dg/pr90838-2.c (ctz4): Replace long with long long. Signed-off-by: H.J. Lu --- gcc/testsuite/gcc.dg/pr90838-2.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff
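The point about ILP32 can be illustrated with a small sketch. It is not the actual ctz4 from gcc.dg/pr90838-2.c; the name ctz64_sketch and the loop body are assumptions made for illustration. The only claim relied on is that C guarantees long long to be at least 64 bits wide, while long is 32 bits on ILP32 targets.

```c
#include <limits.h>

/* long long is at least 64 bits on every conforming implementation;
   long is only guaranteed 32 bits and is 32 bits wide on ILP32.  */
_Static_assert(sizeof(long long) * CHAR_BIT >= 64,
               "long long must hold a 64-bit value");

/* Hypothetical helper, not the testcase's ctz4: count trailing zero
   bits of a 64-bit value with a plain loop.  Returns 64 for zero.  */
int ctz64_sketch(unsigned long long v)
{
  int n = 0;

  if (v == 0)
    return 64;
  while ((v & 1ULL) == 0) {
    v >>= 1;
    n++;
  }
  return n;
}
```

Declaring the parameter as unsigned long would truncate the upper 32 bits of the argument on an ILP32 target, which is why a test written for 64-bit integers has to spell out long long.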

Re: [Patch] Fortran: Store OpenMP's 'declare variant' in module file [PR115271]

2025-03-16 Thread H.J. Lu
On Sat, Mar 15, 2025 at 3:46 AM Tobias Burnus wrote: > > I wonder why sometimes my line breaks are preserved and at other times all > eaten. > > Next try ... > > Tobias Burnus wrote: > > Hi Thomas, > > Thomas Koenig wrote: > > Just one question - as this will change the module file, will we still

Re: [PATCH] middle-end/119204 - ICE with strcspn folding

2025-03-15 Thread H.J. Lu
On Tue, Mar 11, 2025 at 2:58 AM Richard Biener wrote: > > The following makes sure to convert the folded expression to the > original expression type. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK? > > Thanks, > Richard. > > PR middle-end/119204 > * builtins.cc (fold_

[PATCH stage1 2/3] x86: Add a pass to fold tail call

2025-03-15 Thread H.J. Lu
53-5.c: Likewise. * gcc.target/i386/pr47253-6.c: Likewise. * gcc.target/i386/pr47253-7a.c: Likewise. * gcc.target/i386/pr47253-7b.c: Likewise. * gcc.target/i386/pr47253-8.c: Likewise. Signed-off-by: H.J. Lu --- gcc/config/i386/i386-features.cc | 204

[PATCH stage1 6/6] ssa-fre-4.c: Enable for all targets and adjust scan match

2025-03-14 Thread H.J. Lu
Since the C frontend no longer promotes char argument, enable ssa-fre-4.c for all targets and adjust scan match. PR middle-end/112877 * gcc.dg/tree-ssa/ssa-fre-4.c: Enable for all targets and adjust scan match. Signed-off-by: H.J. Lu --- gcc/testsuite/gcc.dg/tree-ssa

[PATCH stage1 2/6] Drop targetm.promote_prototypes from C, C++ and Ada frontends

2025-03-14 Thread H.J. Lu
/112877 * call.cc (type_passed_as): Remove the targetm.calls.promote_prototypes call. (convert_for_arg_passing): Likewise. * typeck.cc (cxx_safe_arg_type_equiv_p): Likewise. Signed-off-by: H.J. Lu --- gcc/ada/gcc-interface/utils.cc | 24 gcc/c/c

[PATCH stage1 1/6] Honor TARGET_PROMOTE_PROTOTYPES during RTL expand

2025-03-14 Thread H.J. Lu
/112877 * gfortran.dg/pr112877-1.f90: New test. Signed-off-by: H.J. Lu --- gcc/calls.cc | 9 + gcc/testsuite/gfortran.dg/pr112877-1.f90 | 17 + 2 files changed, 26 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/pr112877-1.f90

[PATCH stage1 0/6] Correct outgoing integer argument promotion

2025-03-14 Thread H.J. Lu
/bugzilla/show_bug.cgi?id=108357 H.J. Lu (6): Honor TARGET_PROMOTE_PROTOTYPES during RTL expand Drop targetm.promote_prototypes from C, C++ and Ada frontends i386: Adjust apx-ndd.c for frontend promotion removal vect-simd-clone-1[6-8][cd].c: Expect in-branch clones for x86 scev-cast.c: Enable

[PATCH stage1 4/6] vect-simd-clone-1[6-8][cd].c: Expect in-branch clones for x86

2025-03-14 Thread H.J. Lu
-simd-clone-17c.c: Likewise. * gcc.dg/vect/vect-simd-clone-17d.c: Likewise. * gcc.dg/vect/vect-simd-clone-18c.c: Likewise. * gcc.dg/vect/vect-simd-clone-18d.c: Likewise. Signed-off-by: H.J. Lu --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c | 5 + gcc/testsuite

[PATCH stage1 5/6] scev-cast.c: Enable for all targets and adjust scan matches

2025-03-14 Thread H.J. Lu
Since the C frontend no longer promotes char argument, enable scev-cast.c for all targets and adjust scan matches. PR middle-end/112877 * gcc.dg/tree-ssa/scev-cast.c: Enable for all targets and adjust scan match. Signed-off-by: H.J. Lu --- gcc/testsuite/gcc.dg/tree-ssa

[PATCH stage1 3/6] i386: Adjust apx-ndd.c for frontend promotion removal

2025-03-14 Thread H.J. Lu
@@ foo4_rol_uint64_t: foo1_imul_short: .LFB92: .cfi_startproc - imull %edi, %esi, %eax + imull %esi, %edi, %eax ret .cfi_endproc .LFE92: Adjust the assembler scans. PR middle-end/112877 * gcc.target/i386/apx-ndd.c: Adjusted. Signed-off-by: H.J. Lu --- gcc

[PATCH stage1 0/3] x86: Add a pass to fold tail call

2025-03-14 Thread H.J. Lu
c block. If the jump table entry points to a target basic block with only a direct sibcall, change the entry to point to the sibcall target, decrement the target basic block entry label use count and redirect the edge to the exit basic block. Call delete_unreachable_blocks to delete the unreachable basi

[PATCH stage1 3/3] x86: Fold sibcall targets into jump table

2025-03-14 Thread H.J. Lu
ise. * gcc.target/i386/pr14721-2b.c: Likewise. * gcc.target/i386/pr14721-2c.c: Likewise. * gcc.target/i386/pr14721-3c.c: Likewise. * gcc.target/i386/pr14721-3b.c: Likewise. * gcc.target/i386/pr14721-3c.c: Likewise. Signed-off-by: H.J. Lu --- gc

[PATCH stage1 1/3] Support symbol reference in jump label and jump table

2025-03-14 Thread H.J. Lu
x argument. * rtl.h (condsibcall_p): New. * rtlanal.cc (tablejump_p): Return false if JUMP_LABEL is a symbol reference. * config/i386/i386-expand.cc (ix86_notrack_prefixed_insn_p): Likewise. * doc/rtl.texi (addr_vec): Also allow symbol reference.

[PATCH v2] i386: Verify that argument registers are spilled properly

2025-03-09 Thread H.J. Lu
pass arguments in 32-bit mode. But there is no coverage in the GCC testsuite. Add tests to verify that argument registers are spilled properly. PR target/119171 * gcc.target/i386/pr119171-1.c: New test. * gcc.target/i386/pr119171-2.c: Likewise. Signed-off-by: H.J. Lu

Re: [PATCH] i386: Verify that argument registers are spilled properly

2025-03-09 Thread H.J. Lu
On Sun, Mar 9, 2025 at 2:54 PM Sam James wrote: > > Uros Bizjak writes: > > > On Sun, Mar 9, 2025 at 3:05 PM H.J. Lu wrote: > >> > >> RDI, RSI, RDX and RCX registers are used to pass arguments in 64-bit > >> mode. EAX, EDX and ECX registers are used t

[PATCH] i386: Verify that argument registers are spilled properly

2025-03-09 Thread H.J. Lu
. * gcc.target/i386/pr119171-2.c: Likewise. Signed-off-by: H.J. Lu --- gcc/testsuite/gcc.target/i386/pr119171-1.c | 14 ++ gcc/testsuite/gcc.target/i386/pr119171-2.c | 14 ++ 2 files changed, 28 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr119171-1.c

Re: [PATCH v3] ira: Add new hooks for callee-save vs spills [PR117477]

2025-03-07 Thread H.J. Lu
On Fri, Mar 7, 2025 at 8:42 PM Jan Hubicka wrote: > > > > This is OK. In general, I think we could also go with assert on > > > mem_cost <= 2, since that is kind of bogus setting (I don't think we > > > will ever need to support x86 CPU with memory stores being as cheap as > > > reg-reg moves), b

[PATCH v3] ira: Add new hooks for callee-save vs spills [PR117477]

2025-03-07 Thread H.J. Lu
On Fri, Mar 7, 2025 at 7:04 PM Richard Biener wrote: > > On Fri, Mar 7, 2025 at 11:30 AM H.J. Lu wrote: > > > > On Tue, Mar 4, 2025 at 6:18 PM Richard Sandiford > > wrote: > > > > > > Richard Sandiford writes: > > > > Jan Hubicka writes

[PATCH v2] ira: Add new hooks for callee-save vs spills [PR117477]

2025-03-07 Thread H.J. Lu
o model the cost of using new callee-saved registers. Apply the exit rather than entry frequency to the cost of restoring a register or deallocating the frame. Update the new variables above. (improve_allocation): Use record_allocation. (color): Initialize allocated_callee_save_regs. (ira_

[PATCH] x86: Move TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P to i386.cc

2025-02-25 Thread H.J. Lu
Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Wed, 26 Feb 2025 05:57:13 +0800 Subject: [PATCH] x86: Move TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P to i386.cc Move the TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P target hook from i386.h to i386.cc. * config/i

[PATCH] x86: Add tests for PR tree-optimization/82142

2025-02-23 Thread H.J. Lu
Verify that PR tree-optimization/82142 testcase is properly optimized. PR tree-optimization/82142 * gcc.target/i386/pr82142a.c: New file. * gcc.target/i386/pr82142b.c: Likewise. I am checking it in. -- H.J. From e4f44c9f33f70a5053ea817e0dcc7a5d3fa3eec1 Mon Sep 17 00:00:00 2001 From: "H.

[PATCH] Append a newline in debug_edge

2025-02-20 Thread H.J. Lu
Append a newline in debug_edge so that we get (gdb) call debug_edge (e) edge (bb_9, bb_1) (gdb) instead of (gdb) call debug_edge (e) edge (bb_9, bb_1)(gdb) * sese.cc (debug_edge): Append a newline. -- H.J. From 9d209112f37f681cd1e214a3412336476ca18527 Mon Sep 17 00:00:00 2001 From: "H.
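A minimal sketch of the effect described above, assuming a toy formatter rather than the real debug_edge in sese.cc. The function format_edge and its parameters are hypothetical; it only shows that ending the output with a newline keeps the debugger prompt on its own line.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in, not GCC's debug_edge: write "edge (bb_S, bb_D)"
   followed by a newline into BUF and return snprintf's result.  */
int format_edge(char *buf, size_t size, int src, int dest)
{
  assert(buf != NULL && size > 0);
  return snprintf(buf, size, "edge (bb_%d, bb_%d)\n", src, dest);
}
```

Calling format_edge (buf, sizeof buf, 9, 1) yields the text "edge (bb_9, bb_1)" terminated by a newline, matching the first of the two gdb transcripts in the message.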

Re: [PATCH v4] x86: Check the stack access register for stack access

2025-02-19 Thread H.J. Lu
On Thu, Feb 20, 2025 at 3:11 PM Uros Bizjak wrote: > > On Thu, Feb 20, 2025 at 3:17 AM H.J. Lu wrote: > > > > On Thu, Feb 20, 2025 at 5:37 AM H.J. Lu wrote: > > > > > > On Wed, Feb 19, 2025 at 10:09 PM Uros Bizjak wrote: > > > > > > > .

[PATCH v4] x86: Check the stack access register for stack access

2025-02-19 Thread H.J. Lu
On Thu, Feb 20, 2025 at 5:37 AM H.J. Lu wrote: > > On Wed, Feb 19, 2025 at 10:09 PM Uros Bizjak wrote: > > > ... > > > My algorithm keeps a list of registers which can access the stack > > > starting with SP and FP. If any registers are derived from the list,
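The quoted description of the algorithm is cut off, so the following is only a sketch under stated assumptions: it assumes that a register derived from one already in the list is itself added to the list, and it repeats until nothing changes. The register numbers, struct toy_def and stack_access_regs are all hypothetical; none of this is GCC's RTL or the code in i386.cc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register numbers, for illustration only.  */
enum { REG_AX = 0, REG_CX = 1, REG_DX = 2, REG_BX = 3,
       REG_BP = 6, REG_SP = 7 };

/* A toy "dest is computed from src" definition.  */
struct toy_def {
  unsigned dest;
  unsigned src;
};

/* Return a bitmask of registers that may hold a stack address.
   Start with SP and FP (BP here), then keep adding any register
   defined from a register already in the set until a full pass
   over the definitions adds nothing new.  */
uint32_t stack_access_regs(const struct toy_def *defs, size_t n)
{
  uint32_t set = (1u << REG_SP) | (1u << REG_BP);
  bool changed = true;

  while (changed) {
    changed = false;
    for (size_t i = 0; i < n; i++) {
      uint32_t src_bit = 1u << defs[i].src;
      uint32_t dest_bit = 1u << defs[i].dest;

      if ((set & src_bit) != 0 && (set & dest_bit) == 0) {
        set |= dest_bit;
        changed = true;
      }
    }
  }
  return set;
}
```

With the definitions BX from AX and AX from SP, the set grows to SP, BP, AX and BX regardless of the order in which the definitions are listed, while a definition such as CX from DX leaves both registers out.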

[PATCH v3] x86: Check the stack access register for stack access

2025-02-19 Thread H.J. Lu
386.cc:8486 8486 stack_access_data *p = (stack_access_data *) data; (set (reg/f:DI 20 xmm0 [orig:126 _53 ] [126]) (reg/f:DI 0 ax [orig:126 _53 ] [126])) (gdb) FOR_EACH_SUBRTX is needed to check for the memory operand referenced by the stack access register. Here is the v3 patch. OK for master? -

[PATCH v2] x86: Check register and GENERAL_REG_P for stack access

2025-02-19 Thread H.J. Lu
On Wed, Feb 19, 2025 at 8:16 PM Uros Bizjak wrote: > > On Wed, Feb 19, 2025 at 12:53 PM H.J. Lu wrote: > > > > Since stack can only be accessed by GPR, check GENERAL_REG_P, instead of > > REG_P, in ix86_find_all_reg_use_1. > > > > gcc/ > > >

[PATCH] x86: Check GENERAL_REG_P for stack access

2025-02-19 Thread H.J. Lu
master? Thanks. -- H.J. From 30d6c36b86030712c1f243a3440502baa5c56f87 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Wed, 19 Feb 2025 19:48:07 +0800 Subject: [PATCH] x86: Check GENERAL_REG_P for stack access Since stack can only be accessed by GPR, check GENERAL_REG_P, instead of

Re: [PATCH] arm: Increment LABEL_NUSES when using minipool_vector_label

2025-02-17 Thread H.J. Lu
On Mon, Feb 17, 2025 at 7:08 PM Richard Earnshaw (lists) wrote: > > On 13/02/2025 21:43, H.J. Lu wrote: > > Increment LABEL_NUSES when using minipool_vector_label to avoid the zero > > use count on minipool_vector_label. > > > > PR target/118866 > > * conf

Re: [PATCH v3] x86: Properly find the maximum stack slot alignment

2025-02-14 Thread H.J. Lu
On Fri, Feb 14, 2025 at 10:08 PM Richard Biener wrote: > > On Fri, 14 Feb 2025, Uros Bizjak wrote: > > > On Fri, Feb 14, 2025 at 4:56 AM H.J. Lu wrote: > > > > > > On Thu, Feb 13, 2025 at 5:17 PM Uros Bizjak wrote: > > > > > >

[PATCH v3] x86: Properly find the maximum stack slot alignment

2025-02-13 Thread H.J. Lu
On Thu, Feb 13, 2025 at 5:17 PM Uros Bizjak wrote: > > On Thu, Feb 13, 2025 at 9:31 AM H.J. Lu wrote: > > > > Don't assume that stack slots can only be accessed by stack or frame > > registers. We first find all registers defined by stack or frame > > regist

[PATCH] arm: Increment LABEL_NUSES when using minipool_vector_label

2025-02-13 Thread H.J. Lu
From: "H.J. Lu" Date: Fri, 14 Feb 2025 05:25:47 +0800 Subject: [PATCH] arm: Increment LABEL_NUSES when using minipool_vector_label Increment LABEL_NUSES when using minipool_vector_label to avoid the zero use count on minipool_vector_label. PR target/118866 * config/arm/arm.cc

Re: [PATCH 0/2] x86: Add a pass to fold tail call

2025-02-13 Thread H.J. Lu
On Thu, Feb 13, 2025 at 5:31 PM Uros Bizjak wrote: > > On Thu, Feb 13, 2025 at 1:58 AM H.J. Lu wrote: > > > > x86 conditional branch (jcc) target can be either a label or a symbol. > > Add a pass to fold tail call with jcc by turning: > > > > jcc

[PATCH v2] x86: Properly find the maximum stack slot alignment

2025-02-13 Thread H.J. Lu
780-1.C: New test. * gcc.target/i386/pr109093-1.c: Likewise. * gcc.target/i386/pr109780-1.c: Likewise. * gcc.target/i386/pr109780-2.c: Likewise. * gcc.target/i386/pr109780-3.c: Likewise. -- H.J. From 820f939a024fc71e4e37b509a3aa0290e8c4e9df Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Da

[PATCH 0/2] x86: Add a pass to fold tail call

2025-02-12 Thread H.J. Lu
upport symbol reference in jump table. Update create_trace_edges to skip symbol reference in jump table. H.J. Lu (2): x86: Add a pass to fold tail call x86: Fold sibcall targets into jump table gcc/config/i386/i386-features.cc | 274 + gcc/config/i386/i386-passes

[PATCH 1/2] x86: Add a pass to fold tail call

2025-02-12 Thread H.J. Lu
c: Likewise. * gcc.target/i386/pr47253-5.c: Likewise. * gcc.target/i386/pr47253-6.c: Likewise. * gcc.target/i386/pr47253-7a.c: Likewise. * gcc.target/i386/pr47253-7b.c: Likewise. Signed-off-by: H.J. Lu --- gcc/config/i386/i386-features.cc | 208

[PATCH 2/2] x86: Fold sibcall targets into jump table

2025-02-12 Thread H.J. Lu
Likewise. * gcc.target/i386/pr14721-3b.c: Likewise. * gcc.target/i386/pr14721-3c.c: Likewise. Signed-off-by: H.J. Lu --- gcc/config/i386/i386-features.cc | 70 +- gcc/dwarf2cfi.cc | 7 ++- gcc/final.cc

Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-12 Thread H.J. Lu
On Wed, Feb 12, 2025 at 4:03 PM Sam James wrote: > > "H.J. Lu" writes: > > > Don't assume that stack slots can only be accessed by stack or frame > > registers. We first find all registers defined by stack or frame > > registers. Then check me

Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-12 Thread H.J. Lu
On Wed, Feb 12, 2025 at 5:28 PM Uros Bizjak wrote: > > On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu wrote: > > > > Don't assume that stack slots can only be accessed by stack or frame > > registers. We first find all registers defined by stack or frame > > regist

Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-11 Thread H.J. Lu
On Wed, Feb 12, 2025 at 3:16 PM Uros Bizjak wrote: > > On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu wrote: > > > > Don't assume that stack slots can only be accessed by stack or frame > > registers. We first find all registers defined by stack or frame > > regist

[PATCH] x86: Properly find the maximum stack slot alignment

2025-02-11 Thread H.J. Lu
/i386/pr109093-1.c: Likewise. * gcc.target/i386/pr109780-1.c: Likewise. * gcc.target/i386/pr109780-2.c: Likewise. -- H.J. From 13da9e9be612333b7df7f66cf4b4c1396a64d89d Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Tue, 14 Mar 2023 11:41:51 -0700 Subject: [PATCH] x86: Properly find the max

Re: [PATCH] x86: Correct ASM_OUTPUT_SYMBOL_REF

2025-02-11 Thread H.J. Lu
On Tue, Feb 11, 2025 at 3:12 PM Uros Bizjak wrote: > > On Tue, Feb 11, 2025 at 7:13 AM H.J. Lu wrote: > > > > x is not a macro argument. It just happens to work as final.cc passes > > x for 2nd argument: > > > > final.cc: ASM_OUTPUT_SYMBOL_REF

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread H.J. Lu
On Tue, Feb 11, 2025 at 4:38 PM Hongtao Liu wrote: > > On Tue, Feb 11, 2025 at 4:27 PM H.J. Lu wrote: > > > > On Tue, Feb 11, 2025 at 4:13 PM Hongtao Liu wrote: > > > > > > > PR117081 is about regression in povray. The reducted testcase: > >

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread H.J. Lu
save registers, in the benchmark > using caller saved registers is much better). > Sorry, I may not have been clear in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081#c9 > My patch doesn't change the codegen for that code as shown by commit 846837c2406ae7a52d9123b29c13e4b

[PATCH] x86: Correct ASM_OUTPUT_SYMBOL_REF

2025-02-10 Thread H.J. Lu
:00 2001 From: "H.J. Lu" Date: Tue, 11 Feb 2025 13:47:54 +0800 Subject: [PATCH] x86: Correct ASM_OUTPUT_SYMBOL_REF x is not a macro argument. It just happens to work as final.cc passes x for 2nd argument: final.cc: ASM_OUTPUT_SYMBOL_REF (file, x); PR target/118825 * config/i
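The class of bug described here can be shown with a toy macro. This is a sketch, not the ASM_OUTPUT_SYMBOL_REF definition from the i386 backend: the macros DOUBLE_BUGGY and DOUBLE_FIXED and the three callers are hypothetical. The buggy macro body names x even though x is not one of its parameters, so it silently picks up whatever x is in scope at the call site.

```c
/* Buggy: the body uses `x`, which is not a macro parameter, so the
   argument passed as VALUE is ignored.  */
#define DOUBLE_BUGGY(VALUE) (2 * (x))

/* Fixed: the body uses the parameter it was given.  */
#define DOUBLE_FIXED(VALUE) (2 * (VALUE))

/* Works only by accident: the caller's variable happens to be
   named x, the same name the macro body refers to.  */
int call_with_x(int x)
{
  return DOUBLE_BUGGY(x);
}

/* Goes wrong as soon as the argument has another name: the macro
   doubles the local x instead of y.  */
int call_with_y(int y)
{
  int x = 100;

  (void) y;
  return DOUBLE_BUGGY(y);
}

/* The fixed macro doubles its argument whatever it is called.  */
int call_fixed(int y)
{
  int x = 100;

  (void) x;
  return DOUBLE_FIXED(y);
}
```

In this sketch call_with_x (5) gives 10, call_with_y (5) gives 200, and call_fixed (5) gives 10, which mirrors why the original macro appeared to work while every caller passed a variable named x.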

[PATCH] x86: Verify that PUSH/POP can be skipped

2025-02-06 Thread H.J. Lu
.L2" is not taken, it can save one push instruction. Update pr111673.c to verify that this optimization isn't turned off. PR rtl-optimization/111673 * gcc.target/i386/pr111673.c: Verify that PUSH/POP can be skipped. -- H.J. From 6606ec5573e724295bdceb572ddc2813f021709f Mon Sep 17

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-03 Thread H.J. Lu
On Mon, Feb 3, 2025 at 6:29 PM Richard Sandiford wrote: > > Richard Biener writes: > > On Mon, Feb 3, 2025 at 7:23 AM H.J. Lu wrote: > >> > >> commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > >> Author: Surya Kumari Jangala > >> Date: Tue Jun

Re: [PATCH] ira: Cap callee-saved register cost scale to 300

2025-02-03 Thread H.J. Lu
On Mon, Feb 3, 2025 at 5:21 PM Richard Biener wrote: > > On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu wrote: > > > > On Sun, Feb 2, 2025 at 4:20 PM Richard Biener > > wrote: > > > > > > > > > > > > > Am 02.02.2025 um 08:59 schrieb H.

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-03 Thread H.J. Lu
On Mon, Feb 3, 2025 at 5:27 PM Richard Biener wrote: > > On Mon, Feb 3, 2025 at 7:23 AM H.J. Lu wrote: > > > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > > Author: Surya Kumari Jangala > > Date: Tue Jun 25 08:37:49 2024 -0500 > > > > ir

[PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-02 Thread H.J. Lu
(ix86_ira_callee_saved_register_cost_scale): New. (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise. * doc/tm.texi: Regenerated. * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): New. Signed-off-by: H.J. Lu --- gcc/config/i386/i386.cc | 11 +++ gcc/doc/tm.texi

Re: [PATCH] ira: Cap callee-saved register cost scale to 300

2025-02-02 Thread H.J. Lu
On Sun, Feb 2, 2025 at 4:20 PM Richard Biener wrote: > > > > > Am 02.02.2025 um 08:59 schrieb H.J. Lu : > > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener > > wrote: > >> > >> > >> > >>>> Am 02.02.2025 um 08:00 schrie

Re: [PATCH] ira: Cap callee-saved register cost scale to 300

2025-02-02 Thread H.J. Lu
On Sun, Feb 2, 2025 at 3:33 PM Richard Biener wrote: > > > > > Am 02.02.2025 um 08:00 schrieb H.J. Lu : > > > > Don't increase callee-saved register cost by 1000x, which leads to that > > callee-saved registers aren't used to preserve local variable va

[PATCH] ira: Cap callee-saved register cost scale to 300

2025-02-01 Thread H.J. Lu
ion/116028 PR rtl-optimization/117081 PR rtl-optimization/118497 * ira-color.cc (assign_hard_reg): Cap callee-saved register cost scale to 300. Signed-off-by: H.J. Lu --- gcc/ira-color.cc | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git

[PATCH, COMMITTED] x86: Add a test for PR rtl-optimization/111673

2025-02-01 Thread H.J. Lu
Add a test for the target independent bug, PR rtl-optimization/111673. PR rtl-optimization/111673 * gcc.target/i386/pr111673.c: New file. -- H.J. From 149da4e8927509a4e72eb01ee8277b6952757c7c Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Sun, 2 Feb 2025 06:46:29 +0800 Subject: [

[PATCH, COMMITTED] x86: Change "if (TARGET_X32 ...)" back to "else if (TARGET_X32 ...)"

2025-02-01 Thread H.J. Lu
Update commit dd6247cb8fc11a15e23e949092f89d24ff329209 Author: H.J. Lu Date: Fri Jan 31 12:29:04 2025 +0800 x86: Handle TARGET_INDIRECT_BRANCH_REGISTER for -fno-plt to change "if (TARGET_X32 ...)" back to "else if (TARGET_X32 ...)". PR target/118713 * config

Re: [PATCH v3] x86: Handle TARGET_INDIRECT_BRANCH_REGISTER for -fno-plt

2025-02-01 Thread H.J. Lu
On Sat, Feb 1, 2025 at 6:33 PM H.J. Lu wrote: > > On Sat, Feb 1, 2025 at 5:52 PM Uros Bizjak wrote: > > > > On Sat, Feb 1, 2025 at 9:51 AM H.J. Lu wrote: > > > > > > If TARGET_INDIRECT_BRANCH_REGISTER is true, indirect call and jump should > > > us

Re: [PATCH v3] x86: Handle TARGET_INDIRECT_BRANCH_REGISTER for -fno-plt

2025-02-01 Thread H.J. Lu
On Sat, Feb 1, 2025 at 5:52 PM Uros Bizjak wrote: > > On Sat, Feb 1, 2025 at 9:51 AM H.J. Lu wrote: > > > > If TARGET_INDIRECT_BRANCH_REGISTER is true, indirect call and jump should > > use register, not memory. Update Bs, Bw and Bz constraints to disable > >

[PATCH] x86: Add a -mstack-protector-guard=global test

2025-02-01 Thread H.J. Lu
:00:00 2001 From: "H.J. Lu" Date: Sat, 1 Feb 2025 18:06:33 +0800 Subject: [PATCH] x86: Add a -mstack-protector-guard=global test Verify that -mstack-protector-guard=global works on x86. Default stack protector uses TLS. -mstack-protector-guard=global uses a global variable, __stack

[PATCH v3] x86: Handle TARGET_INDIRECT_BRANCH_REGISTER for -fno-plt

2025-02-01 Thread H.J. Lu
. * gcc.target/i386/pr118713-12-x32.c: Likewise. * gcc.target/i386/pr118713-12.c: Likewise. Signed-off-by: H.J. Lu --- gcc/config/i386/constraints.md| 24 +-- gcc/config/i386/i386-expand.cc| 22 +++-- gcc/config/i386/i386-protos.h

Re: [PATCH v2] x86: Handle -mindirect-branch-register for -fno-plt

2025-01-31 Thread H.J. Lu
On Fri, Jan 31, 2025 at 10:09 PM Uros Bizjak wrote: > > On Fri, Jan 31, 2025 at 2:54 PM Uros Bizjak wrote: > > > > On Fri, Jan 31, 2025 at 2:36 PM H.J. Lu wrote: > > > > > > -fno-plt forces external call to indirect call via GOT memory. But > > >

[PATCH v2] x86: Handle -mindirect-branch-register for -fno-plt

2025-01-31 Thread H.J. Lu
/pr118713-12.c: Likewise. Co-Authored-By: Uros Bizjak Signed-off-by: H.J. Lu --- gcc/config/i386/i386-expand.cc| 20 ++-- gcc/config/i386/i386.md | 98 +-- .../gcc.target/i386/pr118713-1-x32.c | 8 ++ gcc/testsuite/gcc.target/i386

Re: [PATCH] x86: Handle -mindirect-branch-register for indirect calls

2025-01-31 Thread H.J. Lu
On Fri, Jan 31, 2025 at 8:44 PM Uros Bizjak wrote: > > On Fri, Jan 31, 2025 at 12:09 PM H.J. Lu wrote: > > > > -mindirect-branch-register requires indirect call and jump via register. > > For -mindirect-branch-register, expanding indirect call via register and >

[PATCH] x86: Handle -mindirect-branch-register for indirect calls

2025-01-31 Thread H.J. Lu
. * gcc.target/i386/pr115673-12.c: Likewise. Co-Authored-By: Uros Bizjak Signed-off-by: H.J. Lu --- gcc/config/i386/i386-expand.cc| 20 +-- gcc/config/i386/i386.md | 118 -- .../gcc.target/i386/pr115673-1-x32.c | 8 ++ gcc

[PATCH] force-indirect-call-2.c: Allow indirect branch via GOT

2025-01-31 Thread H.J. Lu
. Signed-off-by: H.J. Lu --- gcc/testsuite/gcc.target/i386/force-indirect-call-2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/i386/force-indirect-call-2.c b/gcc/testsuite/gcc.target/i386/force-indirect-call-2.c index 2f702363041..405c97c8000 100644

Re: [PATCH] ree: Skip extension on stack pointer

2025-01-08 Thread H.J. Lu
On Thu, Jan 9, 2025 at 5:35 AM Jeff Law wrote: > > > > On 1/8/25 1:53 PM, H.J. Lu wrote: > > Skip extension on stack pointer since we can't turn > > > > (insn 27 26 139 2 (parallel [ > > (set (reg/f:SI 7 sp) >

[PATCH] ree: Skip extension on stack pointer

2025-01-08 Thread H.J. Lu
target/i386/pr118266.c: New test. Signed-off-by: H.J. Lu --- gcc/ree.cc | 12 +++ gcc/testsuite/gcc.target/i386/pr118266.c | 27 2 files changed, 39 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr118266.c diff --gi
