On Wed, Jul 3, 2024 at 4:25 PM Iain Sandoe <idsan...@googlemail.com> wrote: > > > > > On 28 Jun 2024, at 07:03, Uros Bizjak <ubiz...@gmail.com> wrote: > > > > On Fri, Jun 28, 2024 at 7:29 AM liuhongt <hongtao....@intel.com> wrote: > >> > >> Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, also > >> define target_insn_cost to prevent post_reload pass_late_combine to > >> revert the optimziation did in pass_rpad. > >> > >> Adjust testcases since pass_late_combine generates better code but > >> break scan assembly. > >> > >> .i.e > >> Under 32-bit target, gcc used to generate broadcast from stack and > >> then do the real operation. > >> After flate_combine, they're combined into embeded broadcast > >> operations. > >> > >> gcc/ChangeLog: > >> > >> * config/i386/i386-features.cc (ix86_rpad_gate): New function. > >> * config/i386/i386-options.cc (ix86_override_options_after_change): > >> Don't disable flate_combine. > >> * config/i386/i386-passes.def: Move pass_stv2 and pass_rpad > >> after pre_reload pas_late_combine. > >> * config/i386/i386-protos.h (ix86_rpad_gate): New declare. > >> * config/i386/i386.cc (ix86_insn_cost): New function. > >> (TARGET_INSN_COST): Define. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Adjus > >> testcase. > >> * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Ditto. > >> * gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Ditto. > >> * gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Ditto. > >> * gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Ditto. > >> * gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Ditto. > >> * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Ditto. > >> * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Ditto. > >> * gcc.target/i386/pr91333.c: Ditto. > >> * gcc.target/i386/vect-strided-4.c: Ditto. > > > > LGTM. > > Unfortunately, this breaks bootstrap on every x86_64 Darwin with a 32b > multilib > as well as on all x86 Darwin 32b hosts. > > I have some analysis - and will raise a BZ (also going to see if it can be > reproduced > on a linux - darwin cross). > > What’s happening is that the Darwin picbase load is being pushed from a > single > instance in the prolog to multiple instances at the start of other blocks (I > cannot > tell yet if this is shrink wrapping or some other effect). > > Anyway - there is supposed to be only one picbase load per function - and some > code-gen (e.g. non-local-gotos and similar) might well depend on that > assumption. > > If the reason for the changed codegen is that the picbase load costing has > been > over-estimated in the past, then that means we’d need to adjust both the > generation > of picbase (to cater for multiple instances) and anything that depends on > assuming > there’s only one. > > If the reason for the codegen change is that the picbase load cost is now > under- > estimated - that’s an easier fix.
Most likely you just need to define a TARGET_CANNOT_COPY_INSN_P and return true for that insn. Thanks, Andrew Pinski > > Initial investigation only, I’ll try to raise a BZ tomorrow - it took a while > to bisect. > Iain >