On Thu, 8 May 2025, H.J. Lu wrote:

> On Mon, Apr 28, 2025 at 8:57 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On x86, both stores with 32-bit immediate and register are supported:
> >
> >    0: 48 c7 40 10 00 00 00 00 movq   $0x0,0x10(%rax)
> >    8: 48 89 50 10          movq   %rdx,0x10(%rax)
> >
> > But store with 32-bit immediate is 4 bytes longer.
> >
> > Add UNSPEC_STORE_BY_PIECES to x86 backend for register store to avoid
> > store with 32-bit immediate for shorter encoding and add a target hook to
> > select the store instruction used by the store by_pieces infrastructure
> > so that a target can choose a specific instruction for shorter encoding.
> > When optimizing on x86, we choose register store:
> >
> > 1. If length-changing prefix (LCP) stall is avoided with 16-bit register
> > store. Or
> > 2. If more than 2 stores with 32-bit immediate will be used.
> >
> > gcc/
> >
> > * expr.cc (store_by_pieces_d::prepare_mode): Call
> > targetm.store_by_pieces_icode to get store by_pieces insn code.
> > * target.def (store_by_pieces_icode): New hook.
> > * targhooks.cc (default_store_by_pieces_icode): New.
> > * targhooks.h (default_store_by_pieces_icode): Likewise.
> > * config/i386/i386.cc (ix86_store_by_pieces_icode): New.
> > (TARGET_STORE_BY_PIECES_ICODE): Likewise.
> > * config/i386/i386.md (UNSPEC_STORE_BY_PIECES): New.
> > (store_by_pieces_mov<mode>): Likewise.
> > (store_by_pieces_mov<mode>_1): Likewise.
> > * config/i386/x86-tune.def (X86_TUNE_USE_REGISTER_STORE_BY_PIECES):
> > Likewise.
> > * doc/tm.texi: Regenerated.
> > * doc/tm.texi.in: Add TARGET_STORE_BY_PIECES_ICODE.
> >
> > gcc/testsuite/
> >
> > * gcc.target/i386/memset-strategy-10.c: New test.
> > * gcc.target/i386/memset-strategy-11.c: Likewise.
> > * gcc.target/i386/memset-strategy-12.c: Likewise.
> > * gcc.target/i386/memset-strategy-13.c: Likewise.
> > * gcc.target/i386/memset-strategy-14.c: Likewise.
> > * gcc.target/i386/memset-strategy-15.c: Likewise.
> > * gcc.target/i386/memset-strategy-16.c: Likewise.
> > * gcc.target/i386/memset-strategy-17.c: Likewise.
> > * gcc.target/i386/memset-strategy-18.c: Likewise.
> > * gcc.target/i386/memset-strategy-19.c: Likewise.
> > * gcc.target/i386/memset-strategy-20.c: Likewise.
> > * gcc.target/i386/memset-strategy-21.c: Likewise.
> > * gcc.target/i386/pr72839.c: Scan for register store.
> >
> > OK for master?
> >
> > Thanks.
> >
> > --
> > H.J.
> 
> PING:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/682007.html

IMO it's better to address the underlying issue, the lack of "CSE"
of immediates, either in generic code or in a machine-dependent
pass, since this comes up not only in the store-by-pieces context.
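
For illustration only (not part of the patch): a minimal C sketch of the
size arithmetic behind sharing one immediate across several stores. The
store sizes are taken from the disassembly quoted above; the helper names
and the 2-byte xorl used to zero the register are assumptions made for
this example.

```c
/* Encoding sizes in bytes for 64-bit stores through a base register
   with an 8-bit displacement, as in the quoted disassembly.  */
enum
{
  STORE_IMM32_BYTES = 8, /* 48 c7 40 10 00 00 00 00  movq $0x0,0x10(%rax) */
  STORE_REG_BYTES = 4,   /* 48 89 50 10              movq %rdx,0x10(%rax) */
  ZERO_REG_BYTES = 2     /* 31 d2  xorl %edx,%edx (assumed setup insn) */
};

/* Hypothetical helper: code size when each of the N stores carries
   its own 32-bit immediate.  */
static unsigned
imm_store_bytes (unsigned n)
{
  return n * STORE_IMM32_BYTES;
}

/* Hypothetical helper: code size when the constant is materialized
   once in a register and reused by all N stores.  */
static unsigned
reg_store_bytes (unsigned n)
{
  return n ? ZERO_REG_BYTES + n * STORE_REG_BYTES : 0;
}
```

With these numbers, three stores take 24 bytes with immediates and 14
bytes with a shared register. This counts bytes only; it says nothing
about register pressure or which pass should do the sharing.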

Didn't you do such a machine pass recently?

Using an UNSPEC in RTL for this will very likely pessimize
optimization there.

I wonder if we should consider only allowing (large) immediates
after reload?

Richard.
