https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78120
--- Comment #7 from Bernd Schmidt <bernds at gcc dot gnu.org> ---
Sorry, James, I think these two got mixed up in my memory. I've attached a candidate patch I'm testing. It tries to make a better effort to calculate before/after costs for the speed case, so that we don't rely entirely on max_seq_cost. I'd be interested in whether it produces good results on ARM (or even on x86) if someone is set up to run benchmarks.
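
For context, here is a minimal standalone sketch of the kind of before/after comparison described above. The struct, field, and function names are illustrative assumptions, not taken from the patch, and the exact way the before/after comparison is combined with the max_seq_cost cap is also an assumption:

  /* Illustrative sketch only: models a profitability check that compares
     the cost of the original (branchy) sequence against the converted
     (branchless) one, instead of relying solely on a max_seq_cost cap.
     All names here are hypothetical.  */
  #include <stdbool.h>
  #include <stdio.h>

  struct if_cost_info
  {
    unsigned before_cost;   /* cost of the original sequence */
    unsigned after_cost;    /* cost of the converted sequence */
    unsigned max_seq_cost;  /* cap derived from the branch-cost heuristic */
    bool speed_p;           /* optimizing for speed rather than size */
  };

  /* For the speed case, require the converted sequence to be no more
     expensive than the original; otherwise fall back to the cap alone.  */
  static bool
  conversion_profitable_p (const struct if_cost_info *info)
  {
    if (info->speed_p)
      return info->after_cost <= info->before_cost
             && info->after_cost <= info->max_seq_cost;
    return info->after_cost <= info->max_seq_cost;
  }

  int
  main (void)
  {
    struct if_cost_info ex = { 12, 8, 16, true };
    printf ("profitable: %s\n", conversion_profitable_p (&ex) ? "yes" : "no");
    return 0;
  }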