https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789

--- Comment #31 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #29)
> (In reply to Hongtao.liu from comment #28)
> > > Probably you can try to tweak it in ix86_add_stmt_cost? when the statement
> >
> > Yes, it's the place.
> >
> > > is UB to UH conversion statement, further check if the def of the input UB
> > > is MEM.
> >
> > Only if there's no multi-use for UB. More generally, it's quite difficult to
> > guess later optimizations for the purpose of a more accurate vectorization
> > cost model, :(.
>
> Yeah, it's hard, sadly. The generic cost modeling is rough;
> ix86_add_stmt_cost is more fine-grained (at least than what we have on Power
> :)). If you want to check it more, it seems doable in the target-specific hook
> finish_cost, where you can get the whole vinfo object, but it could end up
> as a very heavy analysis and might not be worth it.
>
> Would you mind checking whether running FRE and DSE just after cunroll also
> fixes this degradation on x86? I found it worked for Power and hope it can
> help there too.

Btw, we could try something like adding a TODO_force_next_scalar_cleanup flag,
returned from passes that see cleanup opportunities, and have the pass manager
queue that request up and look for a specially marked pass to enable. Then we
could have

  NEXT_PASS (pass_predcom);
  NEXT_PASS (pass_complete_unroll);
  NEXT_PASS (pass_scalar_cleanup);
  PUSH_INSERT_PASSES_WITHIN (pass_scalar_cleanup)
      NEXT_PASS (pass_fre, false /* may_iterate */);
      NEXT_PASS (pass_dse);
  POP_INSERT_PASSES ()

with the gate () of pass_scalar_cleanup returning false unless such a cleanup
was queued. Eventually pass properties (or something else) might model this
better.

That said, running a cleanup on the whole function should be done via a
separate pass, while running a cleanup on a sub-CFG can be done from within
another pass. But mind that a sub-CFG cleanup really has to be
O(size-of-sub-CFG), otherwise it doesn't help.