On Tue, Jul 18, 2023 at 9:38 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Manolis Tsamis <manolis.tsa...@vrull.eu> writes:
> > On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Manolis Tsamis <manolis.tsa...@vrull.eu> writes:
> >> > noce_convert_multiple_sets has been introduced and extended over time to 
> >> > handle
> >> > if conversion for blocks with multiple sets. Currently this is focused on
> >> > register moves and rejects any sort of arithmetic operations.
> >> >
> >> > This series is an extension to allow more sequences to take part in if
> >> > conversion. The first patch is a required change to emit correct code 
> >> > and the
> >> > second patch whitelists a larger number of operations through
> >> > bb_ok_for_noce_convert_multiple_sets.
> >> >
> >> > For targets that have a rich selection of conditional instructions,
> >> > like aarch64, I have seen an ~5x increase of profitable if conversions 
> >> > for
> >> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety 
> >> > of
> >> > benchmarks and I have not seen performance regressions on either x64 / 
> >> > aarch64.
> >>
> >> Interesting results.  Are you free to say which target you used for 
> >> aarch64?
> >>
> >> If I've understood the cost heuristics correctly, we'll allow a 
> >> "predictable"
> >> branch to be replaced by up to 5 simple conditional instructions and an
> >> "unpredictable" branch to be replaced by up to 10 simple conditional
> >> instructions.  That seems pretty high.  And I'm not sure how well we
> >> guess predictability in the absence of real profile information.
> >>
> >> So my gut instinct was that the limitations of the current code might
> >> be saving us from overly generous limits.  It sounds from your results
> >> like that might not be the case though.
> >>
> >> Still, if it does turn out to be the case in future, I agree we should
> >> fix the costs rather than hamstring the code.
> >>
> >
> > My writing may have been confusing, but with "~5x increase of
> > profitable if conversions" I just meant that ifcvt considers these
> > profitable, not that they actually are when executed in particular
> > hardware.
>
> Yeah, sorry, I'd read that part as measuring the number of if-converisons.
> But...
>
> > But at the same time I haven't yet seen any obvious performance
> > regressions in some benchmarks that I have ran.
>
> ...it was a pleasant surprise that doing so much more if-conversion
> didn't make things noticeably worse. :)
>
> > In any case it could be interesting to microbenchmark branches vs
> > conditional instructions and see how sane these numbers are.
>
> I think for this we really do need the real workload, since it's
> hard to measure realistic branch mispredict penalties with a
> microbenchmark.
>
Yes indeed. I'm still trying to get properly analyze the effects of this change.
I'll share when I have something interesting on the benchmarks side.

> > [...]
> >> (2) Don't you also need to update the "rewiring" mechanism, to cope
> >>     with cases where the then block has something like:
> >>
> >>       if (a == 0) {
> >>         a = b op c;       ->    a' = a == 0 ? b op c : a;
> >>         d = a op b;       ->    d = a == 0 ? a' op b : d;
> >>       }                         a = a'
> >>
> >>     At the moment the code only handles regs and subregs, whereas but IIUC
> >>     it should now iterate over all the regs in the SET_SRC.  And I suppose
> >>     that creates the need for multiple possible rewirings in the same insn,
> >>     so that it isn't a simple insn -> index mapping any more.
> >>
> >
> > Indeed, I believe this current patch cannot properly handle these. I
> > will create testcases for this and see what changes need to be done in
> > the next iteration so that correct code is generated.
>
> Perhaps we should change the way that the rewiring is done.
> At the moment, need_cmov_or_rewire detects the renumbering
> ahead of time.  But it might be easier to:
>
> - have noce_convert_multiple_sets_1 keep track of which
>   SET_DESTs it has replaced with temporaries.
>
> - for each subsequent instruction, go through that list in order
>   and use insn_propagation (from recog.h) to apply each replacement.
>
> That might be simpler, and should also be more robust, since the
> insn_propagation routines return false on failure.
>
Thanks, I've tried various designs with these ideas, including moving
the rewiring code to noce_convert_multiple_sets_1  but there was
always some issue.
For example we cannot remove the need_cmov_or_rewire function because
the part that calculates need_no_cmov needs to iterate once before the
conversion, otherwise it wouldn't work.
Also noce_convert_multiple_sets_1 is ran twice each time so doing the
new rewiring logic which is somewhat expensive two times felt like a
regression without making things much easier.

In the end I opted to keep need_cmov_or_rewire (albait renamed) and
introduce a new struct noce_multiple_sets_info, which I thing made
things much nicer.
That includes taking care of the very long signatures of the helper
functions and improving efficiency by removing a lot of vectors/hash
set/hash maps.

I've also added a SCALAR_INT_MODE_P check that takes care of the
force_reg x86-64 issue.

These changes can be found in the two last commits of the new series.
Any feedback on the new implementation would be welcome:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628820.html

> Thanks,
> Richard
>

Thanks,
Manolis

Reply via email to