On Sun, Jun 22, 2025 at 2:12 PM Jan Hubicka <[email protected]> wrote:
>
> >
> > Since read-modify-write is enabled for PentiumPro:
> >
> > /* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions
> > such as "add $1, mem". */
> > DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write",
> > ~(m_PENT | m_LAKEMONT))
> >
> > should this
> >
> > /* Generate "and $0,mem" and "or $-1,mem", instead of "mov $0,mem" and
> > "mov $-1,mem" with shorter encoding for TARGET_SPLIT_LONG_MOVES with
> > TARGET_READ_MODIFY_WRITE or -Oz. */
> > #define TARGET_USE_AND0_ORM1_STORE \
> > ((TARGET_SPLIT_LONG_MOVES && TARGET_READ_MODIFY_WRITE) \
> > || (optimize_insn_for_size_p () && optimize_size > 1))
>
> I really think we are mixing performance and code size optimizations.
> I may be misremembering, but I believe that on PPro
>
> movl $0, (%edx)
>
> is slower than
>
> xorl %eax, %eax
> movl %eax, (%edx)
>
> due to hardware limitations on decoding instructions with long encoding.
> However
>
> andl $0, (%edx)
>
> is even slower than both above since it is a read-modify-write instruction

This contradicts

/* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions
   such as "add $1, mem".  */
DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write",
          ~(m_PENT | m_LAKEMONT))

which enables "andl $0, (%edx)" for PentiumPro. "andl $0, (%edx)" works
well on PentiumPro.

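
The selector can be checked mechanically. Below is a minimal sketch, not GCC
code: the bit positions are made up for illustration (the real m_* masks are
defined in the i386 backend and differ), and tune_enabled is a hypothetical
helper. It only shows that a selector written as ~(m_PENT | m_LAKEMONT) covers
every processor except those two, PentiumPro included.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the real m_* masks in GCC differ.  */
enum { PROC_PENT = 0, PROC_LAKEMONT = 1, PROC_PPRO = 2 };

#define m_PENT      (UINT64_C (1) << PROC_PENT)
#define m_LAKEMONT  (UINT64_C (1) << PROC_LAKEMONT)
#define m_PPRO      (UINT64_C (1) << PROC_PPRO)

/* Selector written the same way as in the DEF_TUNE entry quoted above.  */
#define READ_MODIFY_WRITE_MASK (~(m_PENT | m_LAKEMONT))

/* A tuning is on when the selector contains the bit of the processor
   being tuned for.  */
static bool
tune_enabled (uint64_t selector, int proc)
{
  return (selector & (UINT64_C (1) << proc)) != 0;
}
```

With these assumed bits, tune_enabled (READ_MODIFY_WRITE_MASK, PROC_PPRO) is
true, while PROC_PENT and PROC_LAKEMONT give false.
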
> while both variants above only write. I do not think hardware
> special cases this.
>
> The situation is different when you actually do a read-modify-write.
>
> If read_modify_write is set we produce:
>
> andl $1, (%edx)
>
> While if it is unset we will do:
>
> movl (%edx), %eax
> andl $1, %eax
> movl %eax,(%edx)
>
> which schedules better on the original Pentium, provided an extra
> register is available.
>
> Honza
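
To keep the size side of the argument separate from the speed side, here are
the encodings under discussion. This is a sketch that assumes 32-bit mode (no
address-size prefix) and bytes assembled by hand. The array names are mine,
not GCC's. It says nothing about how fast any of these run on PentiumPro.

```c
#include <stddef.h>

/* movl $0, (%edx): C7 /0 with a 32-bit immediate.  */
static const unsigned char mov_imm0_mem[] =
  { 0xC7, 0x02, 0x00, 0x00, 0x00, 0x00 };

/* andl $0, (%edx): 83 /4 with a sign-extended 8-bit immediate.  */
static const unsigned char and_imm0_mem[] = { 0x83, 0x22, 0x00 };

/* orl $-1, (%edx): 83 /1 with a sign-extended 8-bit immediate.  */
static const unsigned char or_immm1_mem[] = { 0x83, 0x0A, 0xFF };

/* xorl %eax, %eax followed by movl %eax, (%edx).  */
static const unsigned char xor_then_mov[] = { 0x31, 0xC0, 0x89, 0x02 };

/* Encoded lengths in bytes, computed from the arrays above.  */
enum
{
  MOV_IMM0_LEN = sizeof mov_imm0_mem,
  AND_IMM0_LEN = sizeof and_imm0_mem,
  OR_IMMM1_LEN = sizeof or_immm1_mem,
  XOR_THEN_MOV_LEN = sizeof xor_then_mov
};
```

So the and/or forms are 3 bytes, the xor plus register store pair is 4 bytes
and needs a scratch register, and the plain immediate store is 6 bytes.
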
--
H.J.