Hi!

On 2024-06-27T23:20:18+0200, I wrote:
> On 2024-06-27T22:27:21+0200, I wrote:
>> On 2024-06-27T18:49:17+0200, I wrote:
>>> On 2023-10-24T19:49:10+0100, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>> This patch adds a combine pass that runs late in the pipeline.
>>
>> [After sending, I realized I replied to a previous thread of this work.]
>>
>>> I've been looking a bit through recent nvptx target code generation
>>> changes for GCC target libraries, and thought I'd also share here my
>>> findings for the "late-combine" changes in isolation, for nvptx target.
>>>
>>> First the unexpected thing:
>>
>> So much for "unexpected thing" -- next level of unexpected here...
>> Appreciated if anyone feels like helping me find my way through this, but
>> I totally understand if you've got other things to do.
>
> OK, I found something already.  (Unexpectedly quickly...)  ;-)
>
>>> there are a few cases where we now see unused
>>> registers get declared
> But in fact, for both cases

Now tested: 's%both%all'.  :-)

> the unexpected difference goes away if after
> 'pass_late_combine' I inject a 'pass_fast_rtl_dce'.  That's normally run
> as part of 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' -- but that's
> all not active for nvptx target given '!reload_completed', given nvptx is
> 'targetm.no_register_allocation'.  Maybe we need to enable a few more
> passes, or is there anything in 'pass_late_combine' to change, so that we
> don't run into this?  Does it inadvertently mark registers live or
> something like that?

Basically, is 'pass_late_combine' potentially doing things that depend on
later clean-up?  (..., or shouldn't it be doing these things in the first
place?)

> The following makes these two cases work, but evidently needs a lot more
> analysis: a lot of other passes are enabled that may be anything between
> beneficial and harmful for 'targetm.no_register_allocation'/nvptx.
>
> --- gcc/passes.cc
> +++ gcc/passes.cc
> @@ -676,17 +676,17 @@ const pass_data pass_data_postreload =
>  class pass_postreload : public rtl_opt_pass
>  {
>  public:
>    pass_postreload (gcc::context *ctxt)
>      : rtl_opt_pass (pass_data_postreload, ctxt)
>    {}
>
>    /* opt_pass methods: */
> -  bool gate (function *) final override { return reload_completed; }
> +  bool gate (function *) final override { return reload_completed || targetm.no_register_allocation; }
> --- gcc/regcprop.cc
> +++ gcc/regcprop.cc
> @@ -1305,17 +1305,17 @@ class pass_cprop_hardreg : public rtl_opt_pass
>  public:
>    pass_cprop_hardreg (gcc::context *ctxt)
>      : rtl_opt_pass (pass_data_cprop_hardreg, ctxt)
>    {}
>
>    /* opt_pass methods: */
>    bool gate (function *) final override
>      {
> -      return (optimize > 0 && (flag_cprop_registers));
> +      return (optimize > 0 && flag_cprop_registers && !targetm.no_register_allocation);
>      }

Also, that quickly ICEs; more '[...] && !targetm.no_register_allocation'
are needed elsewhere, at least.
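To make the gating argument concrete, here is a toy model in C++.  It is hypothetical: 'TargetState', 'ToyPass', 'run_passes', 'current_pipeline' and 'ran_pass' are made up for illustration and are not GCC's pass manager API, and only a small slice of the real pipeline is shown.  It assumes, as GCC's pass manager does as far as I can tell, that sub-passes registered via 'PUSH_INSERT_PASSES_WITHIN' only run if their container's gate returns true:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for GCC's global state (not GCC's API).
struct TargetState {
  bool reload_completed;  // never becomes true for nvptx
};

// Toy pass: a name, a gate, and sub-passes that only run when this
// pass's gate is true (modelling PUSH_INSERT_PASSES_WITHIN).
struct ToyPass {
  std::string name;
  std::function<bool(const TargetState&)> gate;
  std::vector<ToyPass> subpasses;
};

// Walk the pipeline, recording the names of the passes that execute.
void run_passes(const std::vector<ToyPass>& passes, const TargetState& st,
                std::vector<std::string>& executed) {
  for (const ToyPass& p : passes) {
    assert(p.gate && "every toy pass needs a gate");
    if (!p.gate(st))
      continue;  // a false gate also skips every sub-pass
    executed.push_back(p.name);
    run_passes(p.subpasses, st, executed);
  }
}

// Small slice of the current pipeline: fast_rtl_dce lives inside postreload.
std::vector<ToyPass> current_pipeline() {
  auto always = [](const TargetState&) { return true; };
  auto after_reload = [](const TargetState& st) { return st.reload_completed; };
  return {
      {"late_combine", always, {}},
      {"postreload", after_reload,
       {{"cprop_hardreg", always, {}},
        {"fast_rtl_dce", always, {}},
        {"reorder_blocks", always, {}}}},
  };
}

// True if a pass with this name executed.
bool ran_pass(const std::vector<std::string>& executed, const std::string& name) {
  return std::find(executed.begin(), executed.end(), name) != executed.end();
}
```

With 'TargetState{false}' (nvptx: 'reload_completed' stays false), 'late_combine' runs but 'fast_rtl_dce' never does, so nothing cleans up after it; with 'TargetState{true}', 'fast_rtl_dce' runs as part of 'postreload'.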
The following simpler thing, however, does work: move 'pass_fast_rtl_dce'
out of 'pass_postreload':

--- gcc/passes.cc
+++ gcc/passes.cc
@@ -677,14 +677,15 @@ class pass_postreload : public rtl_opt_pass
 {
 public:
   pass_postreload (gcc::context *ctxt)
     : rtl_opt_pass (pass_data_postreload, ctxt)
   {}

   /* opt_pass methods: */
+  opt_pass * clone () final override { return new pass_postreload (m_ctxt); }
   bool gate (function *) final override { return reload_completed; }
 }; // class pass_postreload
--- gcc/passes.def
+++ gcc/passes.def
@@ -529,7 +529,10 @@ along with GCC; see the file COPYING3.  If not see
           NEXT_PASS (pass_regrename);
           NEXT_PASS (pass_fold_mem_offsets);
           NEXT_PASS (pass_cprop_hardreg);
-          NEXT_PASS (pass_fast_rtl_dce);
+      POP_INSERT_PASSES ()
+      NEXT_PASS (pass_fast_rtl_dce);
+      NEXT_PASS (pass_postreload);
+      PUSH_INSERT_PASSES_WITHIN (pass_postreload)
           NEXT_PASS (pass_reorder_blocks);
           NEXT_PASS (pass_leaf_regs);
           NEXT_PASS (pass_split_before_sched2);

This (only) cleans up "the mess that 'pass_late_combine' created"; there
are no further changes in GCC target libraries for nvptx.  (For the
avoidance of doubt: "mess" is a great exaggeration here.)


Regards
 Thomas


>> But: should we expect '-fno-late-combine-instructions' vs.
>> '-flate-combine-instructions' to behave in the same way?  (After all,
>> '%r22' remains unused also with '-flate-combine-instructions', and
>> doesn't need to be emitted.)  This could, of course, also be an nvptx back
>> end issue?
>>
>> I'm happy to supply any dump files etc.  Also, 'tmp-libc_a-lnumeric.i.xz'
>> is attached if you'd like to reproduce this with your own nvptx target
>> 'cc1':
>>
>>     $ [...]/configure --target=nvptx-none --enable-languages=c
>>     $ make -j12 all-gcc
>>     $ gcc/cc1 -fpreprocessed tmp-libc_a-lnumeric.i -quiet -dumpbase tmp-libc_a-lnumeric.c -dumpbase-ext .c -misa=sm_30 -g -O2 -fno-builtin -o tmp-libc_a-lnumeric.s -fdump-rtl-all # -fno-late-combine-instructions
>>
>>
>> Regards
>>  Thomas
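A toy model of the 'passes.def' change above, again with hypothetical names ('TargetState', 'ToyPass', 'run_passes', 'patched_pipeline', 'position_of') that are not GCC's API.  It sketches the idea that splitting 'pass_postreload' into two instances with 'pass_fast_rtl_dce' between them lets DCE run regardless of 'reload_completed', while targets that do allocate registers still see the same relative pass order.  The second 'NEXT_PASS (pass_postreload)' is presumably also why the patch adds 'clone ()': as far as I know, GCC's pass manager creates repeated instances of a pass via 'clone ()', and the default 'opt_pass::clone' doesn't support that.  In the toy model, the second instance is simply a second 'ToyPass' value:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for GCC's global state (not GCC's API).
struct TargetState {
  bool reload_completed;  // never becomes true for nvptx
};

// Toy pass: sub-passes only run when the container's gate is true.
struct ToyPass {
  std::string name;
  std::function<bool(const TargetState&)> gate;
  std::vector<ToyPass> subpasses;
};

// Walk the pipeline, recording the names of the passes that execute.
void run_passes(const std::vector<ToyPass>& passes, const TargetState& st,
                std::vector<std::string>& executed) {
  for (const ToyPass& p : passes) {
    assert(p.gate && "every toy pass needs a gate");
    if (!p.gate(st))
      continue;  // a false gate also skips every sub-pass
    executed.push_back(p.name);
    run_passes(p.subpasses, st, executed);
  }
}

// Small slice of the patched pipeline: postreload is split in two and
// fast_rtl_dce sits between the instances, outside any reload gate.
std::vector<ToyPass> patched_pipeline() {
  auto always = [](const TargetState&) { return true; };
  auto after_reload = [](const TargetState& st) { return st.reload_completed; };
  return {
      {"late_combine", always, {}},
      {"postreload", after_reload, {{"cprop_hardreg", always, {}}}},
      {"fast_rtl_dce", always, {}},
      // Second instance of postreload (a clone () in GCC terms).
      {"postreload", after_reload, {{"reorder_blocks", always, {}}}},
  };
}

// Index of the first execution of `name`, or -1 if it never ran.
int position_of(const std::vector<std::string>& executed, const std::string& name) {
  auto it = std::find(executed.begin(), executed.end(), name);
  return it == executed.end() ? -1 : static_cast<int>(it - executed.begin());
}
```

For nvptx ('TargetState{false}') this runs 'late_combine' followed by 'fast_rtl_dce' and skips both 'postreload' instances; with 'TargetState{true}', 'cprop_hardreg', 'fast_rtl_dce' and 'reorder_blocks' still execute in their original order.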