On Wed, Jul 3, 2024 at 4:25 PM Iain Sandoe <idsan...@googlemail.com> wrote:
>
>
>
> > On 28 Jun 2024, at 07:03, Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Fri, Jun 28, 2024 at 7:29 AM liuhongt <hongtao....@intel.com> wrote:
> >>
> >> Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, also
> >> define target_insn_cost to prevent post_reload pass_late_combine to
> >> revert the optimziation did in pass_rpad.
> >>
> >> Adjust testcases since pass_late_combine generates better code but
> >> break scan assembly.
> >>
> >> .i.e
> >> Under 32-bit target, gcc used to generate broadcast from stack and
> >> then do the real operation.
> >> After flate_combine, they're combined into embeded broadcast
> >> operations.
> >>
> >> gcc/ChangeLog:
> >>
> >>        * config/i386/i386-features.cc (ix86_rpad_gate): New function.
> >>        * config/i386/i386-options.cc (ix86_override_options_after_change):
> >>        Don't disable flate_combine.
> >>        * config/i386/i386-passes.def: Move pass_stv2 and pass_rpad
> >>        after pre_reload pas_late_combine.
> >>        * config/i386/i386-protos.h (ix86_rpad_gate): New declare.
> >>        * config/i386/i386.cc (ix86_insn_cost): New function.
> >>        (TARGET_INSN_COST): Define.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>        * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Adjus
> >>        testcase.
> >>        * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Ditto.
> >>        * gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Ditto.
> >>        * gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Ditto.
> >>        * gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Ditto.
> >>        * gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Ditto.
> >>        * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Ditto.
> >>        * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Ditto.
> >>        * gcc.target/i386/pr91333.c: Ditto.
> >>        * gcc.target/i386/vect-strided-4.c: Ditto.
> >
> > LGTM.
>
> Unfortunately, this breaks bootstrap on every x86_64 Darwin with a 32b 
> multilib
> as well as on all x86 Darwin 32b hosts.
>
> I have some analysis - and will raise a BZ (also going to see if it can be 
> reproduced
> on a linux - darwin cross).
>
> What’s happening is that the Darwin picbase load is being pushed from  a 
> single
> instance in the prolog to multiple instances at the start of other blocks (I 
> cannot
> tell yet if this is shrink wrapping or some other effect).
>
> Anyway - there is supposed to be only one picbase load per function - and some
> code-gen (e.g. non-local-gotos and similar) might well depend on that 
> assumption.
>
> If the reason for the changed codegen is that the picbase load costing has 
> been
> over-estimated in the past, then that means we’d need to adjust both the 
> generation
> of picbase (to cater for multiple instances) and anything that depends on 
> assuming
> there’s only one.
>
> If the reason for the codegen change is that the picbase load cost is now 
> under-
> estimated - that’s an easier fix.

Most likely you just need to define a TARGET_CANNOT_COPY_INSN_P and
return true for that insn.

Thanks,
Andrew Pinski

>
> Initial investigation only, I’ll try to raise a BZ tomorrow - it took a while 
> to bisect.
> Iain
>

Reply via email to