On Thu, 8 May 2025, H.J. Lu wrote:

> On Mon, Apr 28, 2025 at 8:57 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On x86, both stores with 32-bit immediate and register are supported:
> >
> >    0: 48 c7 40 10 00 00 00 00   movq $0x0,0x10(%rax)
> >    8: 48 89 50 10               movq %rdx,0x10(%rax)
> >
> > But store with 32-bit immediate is 4 byte longer.
> >
> > Add UNSPEC_STORE_BY_PIECES to x86 backend for register store to avoid
> > store with 32-bit immediate for shorter encoding and add a target hook to
> > select the store instruction used by the store by_pieces infrastructure
> > so that a target can choose a specific instruction for shorter encoding.
> > When optimizing on x86, we choose register store:
> >
> > 1. If length-changing prefix (LCP) stall is avoided with 16-bit register
> > store. Or
> > 2. If more than 2 stores with 32-bit immediate will be used.
> >
> > gcc/
> >
> > * expr.c (store_by_pieces_d::prepare_mode): Call
> > targetm.store_by_pieces_icode to get store by_pieces insn code.
> > * target.def (store_by_pieces_icode): New hook.
> > * targhooks.cc (default_store_by_pieces_icode): New.
> > targhooks.h (default_store_by_pieces_icode): Likewise.
> > * config/i386/i386.cc (ix86_store_by_pieces_icode): New.
> > (TARGET_STORE_BY_PIECES_ICODE): Likewise.
> > * config/i386/i386.md (UNSPEC_STORE_BY_PIECES): New.
> > (store_by_pieces_mov<mode>): Likewise.
> > (store_by_pieces_mov<mode>_1): Likewise.
> > * config/i386/x86-tune.def (X86_TUNE_USE_REGISTER_STORE_BY_PIECES):
> > Likewise.
> > * doc/tm.texi: Regenerated.
> > * doc/tm.texi.in: Add TARGET_STORE_BY_PIECES_ICODE.
> >
> > gcc/testsuite/
> >
> > * gcc.target/i386/memset-strategy-10.c: New test.
> > * gcc.target/i386/memset-strategy-11.c: Likewise.
> > * gcc.target/i386/memset-strategy-12.c: Likewise.
> > * gcc.target/i386/memset-strategy-13.c: Likewise.
> > * gcc.target/i386/memset-strategy-14.c: Likewise.
> > * gcc.target/i386/memset-strategy-15.c: Likewise.
> > * gcc.target/i386/memset-strategy-16.c: Likewise.
> > * gcc.target/i386/memset-strategy-17.c: Likewise.
> > * gcc.target/i386/memset-strategy-18.c: Likewise.
> > * gcc.target/i386/memset-strategy-19.c: Likewise.
> > * gcc.target/i386/memset-strategy-20.c: Likewise.
> > * gcc.target/i386/memset-strategy-21.c: Likewise.
> > * gcc.target/i386/pr72839.c: Scan for register store.
> >
> > OK for master?
> >
> > Thanks.
> >
> > --
> > H.J.
>
> PING:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/682007.html

IMO it's better to address the underlying issue, the lack of "CSE" of
immediates, either in generic code or in a machine-dependent pass, since
this comes up in more than just the store-by-pieces context. Didn't you
do such a machine pass recently?

Using an UNSPEC in RTL for this will very likely pessimize optimization
there. I wonder if we should consider only allowing (large) immediates
after reload?

Richard.
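
For reference, the code-size trade-off behind the quoted encodings can be worked through with a small sketch. The 8-byte and 4-byte store sizes come from the disassembly quoted above. Everything else is an assumption made for illustration only: the helper names are hypothetical, the stored value is taken to be zero, and it is assumed to be materialized once with `xorl %edx,%edx` (2 bytes, `31 d2`). The sketch counts encoded bytes only; it does not model register pressure, LCP stalls, or the patch's tuning threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Encoded sizes in bytes for a 64-bit store to 0x10(%rax).  */
enum {
  IMM_STORE_BYTES = 8, /* movq $0x0,0x10(%rax): 48 c7 40 10 00 00 00 00 */
  REG_STORE_BYTES = 4, /* movq %rdx,0x10(%rax): 48 89 50 10 */
  /* Assumption, not from the thread: the constant is zero and is
     loaded once with xorl %edx,%edx (31 d2).  */
  ZERO_REG_BYTES = 2
};

/* Total bytes for N stores that each carry their own 32-bit immediate.  */
size_t imm_store_size(size_t n)
{
  return n * IMM_STORE_BYTES;
}

/* Total bytes for one register load followed by N register stores.
   With no stores at all, no register needs to be loaded.  */
size_t reg_store_size(size_t n)
{
  return n ? ZERO_REG_BYTES + n * REG_STORE_BYTES : 0;
}

/* Bytes saved by sharing the constant through a register.  */
size_t bytes_saved(size_t n)
{
  size_t imm = imm_store_size(n);
  size_t reg = reg_store_size(n);
  assert(imm >= reg); /* holds for every n under the sizes above */
  return imm - reg;
}
```

Under these assumptions three stores take 24 bytes with immediates and 14 bytes through a register. The saving grows by 4 bytes per additional store, which is the same effect a general "CSE" of immediates would be after.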