On Tue, Jul 18, 2023 at 9:38 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Manolis Tsamis <manolis.tsa...@vrull.eu> writes: > > On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >> > >> Manolis Tsamis <manolis.tsa...@vrull.eu> writes: > >> > noce_convert_multiple_sets has been introduced and extended over time to > >> > handle > >> > if conversion for blocks with multiple sets. Currently this is focused on > >> > register moves and rejects any sort of arithmetic operations. > >> > > >> > This series is an extension to allow more sequences to take part in if > >> > conversion. The first patch is a required change to emit correct code > >> > and the > >> > second patch whitelists a larger number of operations through > >> > bb_ok_for_noce_convert_multiple_sets. > >> > > >> > For targets that have a rich selection of conditional instructions, > >> > like aarch64, I have seen an ~5x increase of profitable if conversions > >> > for > >> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety > >> > of > >> > benchmarks and I have not seen performance regressions on either x64 / > >> > aarch64. > >> > >> Interesting results. Are you free to say which target you used for > >> aarch64? > >> > >> If I've understood the cost heuristics correctly, we'll allow a > >> "predictable" > >> branch to be replaced by up to 5 simple conditional instructions and an > >> "unpredictable" branch to be replaced by up to 10 simple conditional > >> instructions. That seems pretty high. And I'm not sure how well we > >> guess predictability in the absence of real profile information. > >> > >> So my gut instinct was that the limitations of the current code might > >> be saving us from overly generous limits. It sounds from your results > >> like that might not be the case though. > >> > >> Still, if it does turn out to be the case in future, I agree we should > >> fix the costs rather than hamstring the code. > >> > > > > My writing may have been confusing, but with "~5x increase of > > profitable if conversions" I just meant that ifcvt considers these > > profitable, not that they actually are when executed in particular > > hardware. > > Yeah, sorry, I'd read that part as measuring the number of if-converisons. > But... > > > But at the same time I haven't yet seen any obvious performance > > regressions in some benchmarks that I have ran. > > ...it was a pleasant surprise that doing so much more if-conversion > didn't make things noticeably worse. :) > > > In any case it could be interesting to microbenchmark branches vs > > conditional instructions and see how sane these numbers are. > > I think for this we really do need the real workload, since it's > hard to measure realistic branch mispredict penalties with a > microbenchmark. > Yes indeed. I'm still trying to get properly analyze the effects of this change. I'll share when I have something interesting on the benchmarks side.
> > [...] > >> (2) Don't you also need to update the "rewiring" mechanism, to cope > >> with cases where the then block has something like: > >> > >> if (a == 0) { > >> a = b op c; -> a' = a == 0 ? b op c : a; > >> d = a op b; -> d = a == 0 ? a' op b : d; > >> } a = a' > >> > >> At the moment the code only handles regs and subregs, whereas but IIUC > >> it should now iterate over all the regs in the SET_SRC. And I suppose > >> that creates the need for multiple possible rewirings in the same insn, > >> so that it isn't a simple insn -> index mapping any more. > >> > > > > Indeed, I believe this current patch cannot properly handle these. I > > will create testcases for this and see what changes need to be done in > > the next iteration so that correct code is generated. > > Perhaps we should change the way that the rewiring is done. > At the moment, need_cmov_or_rewire detects the renumbering > ahead of time. But it might be easier to: > > - have noce_convert_multiple_sets_1 keep track of which > SET_DESTs it has replaced with temporaries. > > - for each subsequent instruction, go through that list in order > and use insn_propagation (from recog.h) to apply each replacement. > > That might be simpler, and should also be more robust, since the > insn_propagation routines return false on failure. > Thanks, I've tried various designs with these ideas, including moving the rewiring code to noce_convert_multiple_sets_1 but there was always some issue. For example we cannot remove the need_cmov_or_rewire function because the part that calculates need_no_cmov needs to iterate once before the conversion, otherwise it wouldn't work. Also noce_convert_multiple_sets_1 is ran twice each time so doing the new rewiring logic which is somewhat expensive two times felt like a regression without making things much easier. In the end I opted to keep need_cmov_or_rewire (albait renamed) and introduce a new struct noce_multiple_sets_info, which I thing made things much nicer. That includes taking care of the very long signatures of the helper functions and improving efficiency by removing a lot of vectors/hash set/hash maps. I've also added a SCALAR_INT_MODE_P check that takes care of the force_reg x86-64 issue. These changes can be found in the two last commits of the new series. Any feedback on the new implementation would be welcome: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628820.html > Thanks, > Richard > Thanks, Manolis