RE: [PATCH v1 0/3] RISC-V: Combine vec_duplicate + vsaddu.vv to vsaddu.vx on GR2VR cost

2025-06-22 Thread Li, Pan2
I see, thanks Jeff, will make sure the online CI is OK before commit. Pan -Original Message- From: Jeff Law Sent: Saturday, June 21, 2025 10:32 PM To: Robin Dapp ; Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Chen, Ken ; Liu, Hongtao Subject: Re:

Re: Improve static and AFDO profile combination

2025-06-22 Thread Jan Hubicka
> In addition to working with you on the issues of profile being lost with > LTO, cloning and other cases, my plan is to > 1) finish the VPT reorganization > 2) make AFD reader to scale up the profile since at least in data from > SPEC or profiledbootstrap the counters are quite small integers w

Handle functions with 0 profile in auto-profile

2025-06-22 Thread Jan Hubicka
Hi, This is the last part of the infrastructure to allow functions with local profiles and 0 global autofdo counts. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * auto-profile.cc (afdo_set_bb_count): Dump inline stacks and reasons when lookup failed. (afd

Re: [PATCH] xtensa: Make use of DEPBITS instruction

2025-06-22 Thread Max Filippov
Regtested for target=xtensa-linux-uclibc, no new regressions. Committed to master. On Mon, Jun 16, 2025 at 11:56 PM Takayuki 'January June' Suwa wrote: > > This patch implements bitfield insertion MD pattern using the DEPBITS > machine instruction, the counterpart of the EXTUI instruction, if > av
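For readers unfamiliar with the operation, bitfield insertion (what the quoted patch maps onto DEPBITS) and its counterpart, bitfield extraction (EXTUI), can be modeled in portable C roughly as follows. This is only a sketch of the semantics: the helper names `field_mask`, `insert_bits` and `extract_bits` are hypothetical, and nothing here reflects the actual MD pattern or operand constraints in the patch.

```c
#include <stdint.h>

/* Mask with the low `width` bits set; width is assumed to be in 1..32. */
static uint32_t field_mask(unsigned width)
{
    return width >= 32 ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
}

/* Replace the `width`-bit field of dst that starts at bit `pos` with the
   low bits of src.  Assumes pos < 32 and pos + width <= 32.  */
static uint32_t insert_bits(uint32_t dst, uint32_t src,
                            unsigned pos, unsigned width)
{
    uint32_t mask = field_mask(width) << pos;
    return (dst & ~mask) | ((src << pos) & mask);
}

/* Read the `width`-bit field of x that starts at bit `pos`, zero-extended.  */
static uint32_t extract_bits(uint32_t x, unsigned pos, unsigned width)
{
    return (x >> pos) & field_mask(width);
}
```

A single insertion instruction would presumably replace the and/shift/or sequence a compiler otherwise emits for `insert_bits`; whether that is profitable on a given Xtensa configuration is for the patch itself to decide.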

Re: [RFC] Implement mdspan.

2025-06-22 Thread Luc Grosheintz
Hi Tomasz and others, please don't review this. I found some prerequisites I'm not checking. While implementing the checks for std::extents, I got saved by one of the tests that exercises the code with an IntLike (instead of an int). Therefore, the first task has to be to tighten up the testing

Do not drop discriminator when inlining

2025-06-22 Thread Jan Hubicka
Hi, auto-fdo is currently confused by the fact that all inlined functions get locators with 0 discriminator, so it is not able to distinguish multiple inlined calls on a single line. The discriminator is lost by calling LOCATION_LOCUS before copying it from the former call statement. I believe this is only
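A minimal C illustration of the situation described in this message, under the assumption that both calls are inlined: the two call sites share one source line, so without distinct discriminators a sample-based profile has no way to attribute counts to one inlined copy rather than the other. The function names are hypothetical and chosen only for the example.

```c
/* Small helper that an optimizing compiler is likely to inline.  */
static inline int square(int x)
{
    return x * x;
}

int sum_of_squares(int a, int b)
{
    /* Two calls on one source line: after inlining, both copies of
       square's body carry the same file and line.  A per-call
       discriminator is what would let a profile reader tell them apart.  */
    return square(a) + square(b);
}
```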

Re: [PATCH] x86: Enable *mov_(and|or) for TARGET_SPLIT_LONG_MOVES

2025-06-22 Thread H.J. Lu
On Sun, Jun 22, 2025 at 1:32 PM Jan Hubicka wrote: > > > Since there is > > > > /* X86_TUNE_SPLIT_LONG_MOVES: Avoid instructions moving immediates > >directly to memory. */ > > DEF_TUNE (X86_TUNE_SPLIT_LONG_MOVES, "split_long_moves", m_PPRO) > > If I recall correctly, this tune was added for

Re: [PATCH] x86: Enable *mov_(and|or) for TARGET_SPLIT_LONG_MOVES

2025-06-22 Thread Jan Hubicka
> Since there is > > /* X86_TUNE_SPLIT_LONG_MOVES: Avoid instructions moving immediates >directly to memory. */ > DEF_TUNE (X86_TUNE_SPLIT_LONG_MOVES, "split_long_moves", m_PPRO) If I recall correctly, this tune was added for PentiumPro which had problem decoding moves with long immediate an

Re: [PATCH] x86: Enable *mov_(and|or) for TARGET_SPLIT_LONG_MOVES

2025-06-22 Thread Jan Hubicka
> > Since read-modify-write is enabled for PentiumPro: > > /* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions >such as "add $1, mem". */ > DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write", > ~(m_PENT | m_LAKEMONT)) > > should this > > /* Generate

Re: [PATCH] x86: Enable *mov_(and|or) for TARGET_SPLIT_LONG_MOVES

2025-06-22 Thread Jan Hubicka
> This contradicts > > /* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions >such as "add $1, mem". */ > DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write", > ~(m_PENT | m_LAKEMONT)) > > which enables "andl $0, (%edx)" for PentiumPro. "andl $0, (%edx
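As background for this thread, the equivalence that makes `andl $0, (%edx)` usable as a store can be written out in C. The sketch below assumes the patterns under discussion cover storing 0 via AND and storing all-ones via OR, which is what the `*mov_(and|or)` name suggests; the function names are hypothetical. The C code only shows that the results match, not which encoding a given tuning should prefer.

```c
#include <stdint.h>

/* Plain stores of the two constants.  */
void store_zero_mov(uint32_t *p) { *p = 0; }
void store_ones_mov(uint32_t *p) { *p = UINT32_MAX; }

/* Read-modify-write forms that leave the same value in memory:
   x & 0 is always 0, and x | ~0 is always all-ones.  */
void store_zero_and(uint32_t *p) { *p &= 0; }
void store_ones_or(uint32_t *p)  { *p |= UINT32_MAX; }
```

The read-modify-write forms use a shorter immediate but read memory and modify the flags, which is why the choice is a tuning question and not a correctness one.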

Re: [PATCH] x86: Enable *mov_(and|or) for TARGET_SPLIT_LONG_MOVES

2025-06-22 Thread H.J. Lu
On Sun, Jun 22, 2025 at 2:12 PM Jan Hubicka wrote: > > > > > Since read-modify-write is enabled for PentiumPro: > > > > /* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions > >such as "add $1, mem". */ > > DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write", > >

[PATCH] match: Unwrap non-lvalue as unary or binary operand

2025-06-22 Thread Mikael Morin
From: Mikael Morin See the description in the ChangeLog entry below. The testcases are best effort; for some operators the fortran frontend generates a temporary variable, so the simplification doesn't happen. Those cases are not tested. Regression tested on x86_64-linux. OK for master? -- 8<

Re: [PATCH] fortran: Mention user variable in SELECT TYPE temporary variable names

2025-06-22 Thread Harald Anlauf
Hi Mikael! Am 20.06.25 um 12:08 schrieb Mikael Morin: From: Mikael Morin Regression-tested on x86_64-pc-linux-gnu. Ok for master? -- >8 -- The temporary variables that are generated to implement SELECT TYPE and TYPE IS statements have (before this change) a name depending only on the typ

Re: [PATCH] xtensa: Remove TARGET_PROMOTE_FUNCTION_MODE

2025-06-22 Thread Takayuki 'January June' Suwa
On 2025/06/23 6:20, H.J. Lu wrote: On Sun, Jun 22, 2025 at 9:54 PM Max Filippov wrote: On Sun, Jun 22, 2025 at 5:49 AM Takayuki 'January June' Suwa wrote: On 2025/06/22 6:41, Max Filippov wrote: On Sat, Jun 21, 2025 at 2:12 PM Takayuki 'January June' Suwa wrote: That hook has since been

[PATCH] [genoutput] mark scratch outputs as eliminable [PR120424]

2025-06-22 Thread Alexandre Oliva
acats' fdd2a00.read is miscompiled on arm-linux-gnu with -O2 -fstack-clash-protection -march=armv7-a -marm: a clobbered scratch register in a *iorsi3_compare0_scratch pattern gets initially assigned to the frame pointer register, but at some point during lra the frame size grows to nonzero, arm_f

[RFC] [lra] catch all to-sp eliminations [PR120424]

2025-06-22 Thread Alexandre Oliva
An x86_64-linux-gnu native with ix86_frame_pointer_required modified to return true for nonzero frames, to exercise lra_update_fp2sp_elimination, reveals in stage1 testing that wrong code is generated for gcc.c-torture/execute/ieee/fp-cmp-8l.c: argp-to-sp eliminations are used for one_test to pas

Re: [PATCH] x86: Add PROCESSOR_XXX comments to processor_cost_table

2025-06-22 Thread Jan Hubicka
> Add a PROCESSOR_XXX comment to each entry in processor_cost_table to > describe which processor the cost entry is applied to. > > * config/i386/i386-options.cc (processor_cost_table): Add a > PROCESSOR_XXX comment to each entry. > > > -- > H.J. > From 8b37db60ec21c1c673eb1e336208dc10a5d86d5c

[PATCH] [lra] reorder operations in lra_update_fp2sp_elimination [PR120424]

2025-06-22 Thread Alexandre Oliva
And here's a followup to clean up the mess I made in lra_update_fp2sp_elimination, without any functional changes. The various recent additions to lra_update_fp2sp_elimination rendered it somewhat confusing, with intermixed groups of statements pertaining to three different major actions: disabli

[PATCH] [lra] rework deactivation of fp2sp elimination [PR120424]

2025-06-22 Thread Alexandre Oliva
On Jun 13, 2025, Vladimir Makarov wrote: >> * lra-eliminations.cc (lra_update_fp2sp_elimination): >> Inactivate the unused fp2sp elimination right away. Alas, this seems to cause trouble on arm-linux-gnueabihf bootstraps. Here's an alternate approach that builds on it to solve the earlier prob

[PATCH] [lra] apply elimination offsets to MEM in autoinc address [PR120424]

2025-06-22 Thread Alexandre Oliva
When attempting to bootstrap arm-linux-gnueabihf with {BOOT_C,T}FLAGS='-g -O2 -fnon-call-exceptions -fstack-clash-protection', gmp fails to build in stage2: gen-fac's mpz_and gets miscompiled. A pseudo is initialized before a loop and used in a PRE_INC load inside a loop. It gets spilled just a

Re: [PATCH] xtensa: Remove TARGET_PROMOTE_FUNCTION_MODE

2025-06-22 Thread H.J. Lu
On Sun, Jun 22, 2025 at 9:54 PM Max Filippov wrote: > > On Sun, Jun 22, 2025 at 5:49 AM Takayuki 'January June' Suwa > wrote: > > > > On 2025/06/22 6:41, Max Filippov wrote: > > > On Sat, Jun 21, 2025 at 2:12 PM Takayuki 'January June' Suwa > > > wrote: > > >> > > >> That hook has since been dep

[PATCH] x86: Add PROCESSOR_XXX comments to processor_cost_table

2025-06-22 Thread H.J. Lu
Add a PROCESSOR_XXX comment to each entry in processor_cost_table to describe which processor the cost entry is applied to. * config/i386/i386-options.cc (processor_cost_table): Add a PROCESSOR_XXX comment to each entry. -- H.J. From 8b37db60ec21c1c673eb1e336208dc10a5d86d5c Mon Sep 17 00:00:00 2

Re: [PATCH] x86: Add PROCESSOR_XXX comments to processor_cost_table

2025-06-22 Thread Hongtao Liu
On Mon, Jun 23, 2025 at 11:03 AM H.J. Lu wrote: > > Add a PROCESSOR_XXX comment to each entry in processor_cost_table to > describe which processor the cost entry is applied to. Ok as obvious. > > * config/i386/i386-options.cc (processor_cost_table): Add a > PROCESSOR_XXX comment to each entry. > >

Re: [PATCH v2] x86: Don't use vmovdqu16/vmovdqu8 with non-EVEX registers

2025-06-22 Thread Hongtao Liu
On Sat, Jun 21, 2025 at 11:09 PM H.J. Lu wrote: > > On Fri, Jun 20, 2025 at 4:12 PM H.J. Lu wrote: > > > > Don't use vmovdqu16/vmovdqu8 with non-EVEX registers even if AVX512BW is > > available. > > > > gcc/ > > > > PR target/120728 > > * config/i386/i386.cc (ix86_get_ssemov): Use vmovdqu16/vmovd

Re: Do not drop discriminator when inlining

2025-06-22 Thread Richard Biener
On Sun, 22 Jun 2025, Jan Hubicka wrote: > Hi, > auto-fdo is currently confused by a fact that all inlined functions get > locators with 0 discriminator, so it is not able to distinguish multiple > inlined calls from single line. > > Discriminator is lost by calling LOCATION_LOCUS before copying i

Re: [PATCH] xtensa: Remove TARGET_PROMOTE_FUNCTION_MODE

2025-06-22 Thread Max Filippov
On Sat, Jun 21, 2025 at 3:54 PM H.J. Lu wrote: > On Sun, Jun 22, 2025 at 6:35 AM Max Filippov wrote: > > On Sat, Jun 21, 2025 at 2:41 PM Max Filippov wrote: > > > On Sat, Jun 21, 2025 at 2:12 PM Takayuki 'January June' Suwa > > > wrote: > > > > > > > > That hook has since been deprecated > > >

Add GUESSED_GLOBAL0_AFDO profile quality

2025-06-22 Thread Jan Hubicka
Hi, This patch adds GUESSED_GLOBAL0_AFDO profile quality. It can be used to preserve local counts of functions which have 0 AFDO profile. I originally did not include it as it was not clear it will be useful and it turns quality from 3bits to 4bits which means that we need to steal another bit fro
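The cost of going from a 3-bit to a 4-bit quality tag can be seen in a small packing sketch. It assumes, purely for illustration, that the counter and its quality tag share one 64-bit word; the message is cut off before saying where the extra bit is taken from, and the names `pack_count`, `unpack_count` and `unpack_quality` are hypothetical and not GCC's.

```c
#include <stdint.h>

/* Assumed layout: quality tag in the low bits, counter in the rest.  */
enum { QUALITY_BITS = 4, COUNT_BITS = 64 - QUALITY_BITS };

/* Pack a counter and a quality tag into one word.  Counters that do
   not fit in COUNT_BITS are saturated instead of wrapping.  */
static uint64_t pack_count(uint64_t count, unsigned quality)
{
    uint64_t max_count = (UINT64_C(1) << COUNT_BITS) - 1;
    if (count > max_count)
        count = max_count;
    return (count << QUALITY_BITS)
           | (uint64_t)(quality & ((1u << QUALITY_BITS) - 1));
}

static uint64_t unpack_count(uint64_t packed)
{
    return packed >> QUALITY_BITS;
}

static unsigned unpack_quality(uint64_t packed)
{
    return (unsigned)(packed & ((1u << QUALITY_BITS) - 1));
}
```

With this layout, every bit added to the tag halves the largest representable counter, which is the trade-off the message alludes to.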

Re: [PATCH] xtensa: Implement TARGET_ZERO_CALL_USED_REGS

2025-06-22 Thread Max Filippov
On Mon, Jun 16, 2025 at 6:33 AM Takayuki 'January June' Suwa wrote: > > This patch implements the target-specific ZERO_CALL_USED_REGS hook, since > if -fzero-call-used-regs=all the default hook tries to assign 0 to B0 > (bit 0 of the BR register) and the ICE will be thrown. > > gcc/ChangeLog: > >

Re: [PATCH] i386: Remove CLDEMOTE for clients

2025-06-22 Thread Hongtao Liu
On Fri, Jun 20, 2025 at 10:04 AM Haochen Jiang wrote: > > Hi all, > > CLDEMOTE is not enabled on clients according to SDM. SDM only mentioned > it will be enabled on Xeon and Atom servers, not clients. Remove them > since Alder Lake (where it is introduced). > > Also will backport this patch to GC

C++ patch ping

2025-06-22 Thread Jakub Jelinek
Hi! I'd like to ping some C family patches: https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681741.html - PR44677 - c, c++: Extend -Wunused-but-set-* warnings https://gcc.gnu.org/pipermail/gcc-patches/2025-June/685543.html - PR120520 - Extend nonnull_if_nonzero attribute plus a questio

Fix some problems with afdo propagation

2025-06-22 Thread Jan Hubicka
Hi, This patch fixes problems I noticed by exploring profiles of some hot functions in GCC. In particular the propagation sometimes changed precise 0 to afdo 0 for paths calling abort and sometimes we could propagate more when we accept that some paths have 0 count. Finally there was an important bug

Re: [PATCH] xtensa: Remove TARGET_PROMOTE_FUNCTION_MODE

2025-06-22 Thread Max Filippov
On Sun, Jun 22, 2025 at 5:49 AM Takayuki 'January June' Suwa wrote: > > On 2025/06/22 6:41, Max Filippov wrote: > > On Sat, Jun 21, 2025 at 2:12 PM Takayuki 'January June' Suwa > > wrote: > >> > >> That hook has since been deprecated > >> (commit a670ebde3995481225ec62b29686ec07a21e5c10) and has

[to-be-committed][RISC-V][PR target/119830] Fix RISC-V codegen on 32bit hosts

2025-06-22 Thread Jeff Law
So this is Andrew's patch from the PR. We weren't clean for a 32bit host in some of the arithmetic for constant synthesis. I confirmed the bug on a 32bit linux host, then confirmed that Andrew's patch from the PR fixes the problem, then ran Andrew's patch through my tester successfully. Nat
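The message does not say which expressions were affected. One common source of this class of bug, offered here only as an assumption, is host arithmetic done in `long`, which is 32 bits wide on many 32-bit hosts. The sketch below shows the kind of constant-splitting arithmetic involved, written with fixed-width types so it behaves the same on any host; the helper names are hypothetical and this is not the code from the PR.

```c
#include <stdint.h>

/* Sign-extend the low 12 bits of v, the way a 12-bit signed
   immediate would be interpreted.  */
static int64_t low12_signed(int64_t v)
{
    return ((v & 0xfff) ^ 0x800) - 0x800;
}

/* What remains to be materialized by other instructions once the
   low 12-bit immediate has been accounted for.  */
static int64_t upper_part(int64_t v)
{
    return v - low12_signed(v);
}

/* Bit n of a 64-bit constant, n in 0..63.  The shift is performed in
   a 64-bit type, so it does not depend on the width of host long.  */
static uint64_t bit64(unsigned n)
{
    return UINT64_C(1) << n;
}
```

Writing `1L << n` in place of `UINT64_C(1) << n` would be undefined for n >= 32 on a host with 32-bit `long`, while still working on a 64-bit host, which is how such problems can go unnoticed.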

Re: [PATCH] x86: Enable *mov_(and|or) for TARGET_SPLIT_LONG_MOVES

2025-06-22 Thread H.J. Lu
On Sun, Jun 22, 2025 at 2:57 PM Jan Hubicka wrote: > > > This contradicts > > > > /* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions > >such as "add $1, mem". */ > > DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write", > > ~(m_PENT | m_LAKEMONT)) > > >

Re: [PATCH] xtensa: Remove TARGET_PROMOTE_FUNCTION_MODE

2025-06-22 Thread Takayuki 'January June' Suwa
On 2025/06/22 6:41, Max Filippov wrote: On Sat, Jun 21, 2025 at 2:12 PM Takayuki 'January June' Suwa wrote: That hook has since been deprecated (commit a670ebde3995481225ec62b29686ec07a21e5c10) and has led to incorrect results on Xtensa: /* example */ #define uint32_t __att

[committed][PR rtl-optimization/120550] Drop REG_EQUAL note after ext-dce transformation

2025-06-22 Thread Jeff Law
This bug was found by Edwin's fuzzing efforts on RISC-V, though it likely affects other targets. In simplest terms when ext-dce converts an extension into a (possibly simplified) subreg copy it may make an attached REG_EQUAL note invalid. In the case Edwin found the note was an extension, but
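The hazard can be modeled in C, with the caveat that this is an analogy for the RTL situation and not the RTL itself. If only the low byte of a value is ever read, a sign extension can be replaced by a plain copy; a side note claiming the result equals the sign-extended value is then no longer true of the whole register. The function names below are hypothetical.

```c
#include <stdint.h>

/* Before: the destination holds the sign extension of the low 8 bits.
   (Narrowing to int8_t wraps on the compilers of interest here.)  */
static uint32_t with_extension(uint32_t x)
{
    return (uint32_t)(int32_t)(int8_t)x;
}

/* After: the extension has been turned into a plain copy, which is
   fine as long as consumers read only the low 8 bits.  */
static uint32_t with_copy(uint32_t x)
{
    return x;
}

/* A consumer that looks only at the low 8 bits.  */
static uint8_t low_byte(uint32_t r)
{
    return (uint8_t)r;
}
```

For an input of 0x80 both versions agree on `low_byte`, but the full values are 0xFFFFFF80 and 0x00000080, so a later pass that trusted a stale "equals the sign extension" note could substitute the wrong value.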