On Sun, Jun 22, 2025 at 2:12 PM Jan Hubicka <hubi...@ucw.cz> wrote:
>
> >
> > Since read-modify-write is enabled for PentiumPro:
> >
> > /* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions
> >    such as "add $1, mem".  */
> > DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write",
> >           ~(m_PENT | m_LAKEMONT))
> >
> > should this
> >
> > /* Generate "and $0,mem" and "or $-1,mem", instead of "mov $0,mem" and
> >    "mov $-1,mem" with shorter encoding for TARGET_SPLIT_LONG_MOVES with
> >    TARGET_READ_MODIFY_WRITE or -Oz.  */
> > #define TARGET_USE_AND0_ORM1_STORE \
> >   ((TARGET_SPLIT_LONG_MOVES && TARGET_READ_MODIFY_WRITE) \
> >    || (optimize_insn_for_size_p () && optimize_size > 1))
>
> I really think we are mixing performance and code size optimizations.
> I may be misremembering, but I believe that on PPro
>
>  movl $0, (%edx)
>
> is slower than
>
>  xorl %eax, %eax
>  movl %eax, (%edx)
>
> due to hardware limitations on decoding instructions with long encoding.
> However
>
>  andl $0, (%edx)
>
> is even slower than both above since it is a read-modify-write instruction

This contradicts

/* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions
   such as "add $1, mem".  */
DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write",
          ~(m_PENT | m_LAKEMONT))

which enables read-modify-write instructions such as "andl $0, (%edx)"
for PentiumPro.  "andl $0, (%edx)" works well on PentiumPro.

> while both variants above do only a write.  I do not think the hardware
> special-cases this.
>
> The situation is different when you actually do a read-modify-write.
>
> If read_modify_write is set we produce:
>
>  andl $1, (%edx)
>
> While if it is unset we will do:
>
>  movl (%edx), %eax
>  andl $1, %eax
>  movl %eax,(%edx)
>
> which schedules better on the original Pentium, provided an extra
> register is available.
>
> Honza



-- 
H.J.
