Hi all,

> On 30/07/2019 10:31, Ramana Radhakrishnan wrote:
>> On 30/07/2019 10:08, Christophe Lyon wrote:
>>> Hi Wilco,
>>>
>>> Do you know which benchmarks were used when this was checked-in?
>>> It isn't clear from
>>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html
>>
>> It was from my time in Linaro and thus would have been a famous embedded
>> benchmark, coremark, spec2000 - all tested probably on cortex-a9 and
>> Cortex-A15. In addition to this I would like to see what the impact of
>> this is on something like Cortex-A53 as the issue rates are likely to be
>> different on the schedulers causing different behaviour.

Obviously there are differences between various schedulers, but the
general issue is that register pressure is increased many times beyond
the spilling limit (a few cases I looked at had a pressure well over 120
when there are only 14 integer registers - this causes panic spilling in
the register allocator).

In fact the spilling overhead between the 2 algorithms is almost
identical on Cortex-A53 and Cortex-A57, so the issue isn't directly
related to the pipeline model used. It seems more related to the
scheduler being too aggressive and not caring about register pressure at
all (for example lifting a load 100 instructions before its use so it
must be spilled).

>> I don't have all the notes today for that - maybe you can look into the
>> linaro wiki.
>>
>> I am concerned about taking this patch in without some more data across
>> a variety of cores.
>>
>
> My concern is the original patch
> (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in
> any real detail as to the reasons for the choice of the second algorithm
> over the first.
>
> - It's not clear what the win was
> - It's not clear what outliers there were and whether they were significant.
>
> And finally, it's not clear if, 7 years later, this is still the best
> choice.
>
> If the second algorithm really is better, why is no other target using
> it by default?
>
> I think we need a bit more information (both ways).  In particular I'm
> concerned not just by the overall benchmark average, but also the amount
> of variance across the benchmarks.  I think the default needs to avoid
> significant outliers if at all possible, even if it is marginally less
> good on the average.

The results clearly show that algorithm 1 works best on Arm today - I
haven't seen a single benchmark where algorithm 2 results in less
spilling. We could tune algorithm 2 so it switches back to algorithm 1
when register pressure is high or a basic block is large. However until
it is fixed, the evidence is that algorithm 1 is the best choice for
current cores.

Wilco
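
To make the register pressure argument above concrete, here is a minimal
Python sketch. It is not GCC code and does not model either scheduler
algorithm: the instruction representation and the names peak_pressure and
make_schedules are hypothetical, chosen only for illustration. It assumes
each value is defined once and counts how many values are live at the same
time for two orderings of the same work: one with every load hoisted to the
top of the block, and one with each load placed just before its use. The
budget of 14 integer registers is the figure quoted in the mail.

```python
NUM_INT_REGS = 14  # integer register count quoted in the mail above


def peak_pressure(schedule):
    """Return the maximum number of simultaneously live values.

    schedule is a list of (dest, sources) tuples in issue order.
    Assumption: every dest name is defined exactly once.
    """
    # Record the index of the last instruction that reads each value.
    last_use = {}
    for i, (_dest, srcs) in enumerate(schedule):
        for s in srcs:
            last_use[s] = i

    live = set()
    peak = 0
    for i, (dest, srcs) in enumerate(schedule):
        # A source dies at its last use, freeing its register.
        for s in srcs:
            if last_use[s] == i:
                live.discard(s)
        # A result only occupies a register if something reads it later.
        if dest in last_use:
            live.add(dest)
        peak = max(peak, len(live))
    return peak


def make_schedules(n_loads):
    """Build two orderings of n_loads loads feeding a chain of adds."""
    loads = [(f"l{i}", ()) for i in range(n_loads)]
    adds = []
    prev = "l0"
    for i in range(1, n_loads):
        adds.append((f"a{i}", (prev, f"l{i}")))
        prev = f"a{i}"

    hoisted = loads + adds          # all loads lifted to the top
    late = [loads[0]]               # each load just before its use
    for i in range(1, n_loads):
        late += [loads[i], adds[i - 1]]
    return hoisted, late


if __name__ == "__main__":
    hoisted, late = make_schedules(20)
    for name, sched in (("hoisted", hoisted), ("late", late)):
        p = peak_pressure(sched)
        # Excess over the budget is only a rough proxy for spilling.
        print(name, "peak:", p, "excess:", max(0, p - NUM_INT_REGS))
```

In this toy model the hoisted ordering keeps every loaded value live until
the add chain consumes it, so the peak grows with the number of loads, while
the late ordering stays at a small constant. A real scheduler weighs latency
hiding against this cost; the sketch only shows why lifting loads far above
their uses inflates pressure.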