https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964
--- Comment #4 from PeteVine <tulipawn at gmail dot com> ---
> I'm not sure what you're trying to measure here - it's very confusing with
> multiple overlapping options (O3/Ofast/tree-vectorize), -mcpu/-march. Is it
> related to -fipa-pta or is that not relevant?

All the relevant flags have been kept constant (-Ofast -mcpu), so you should
only look at this result side by side with the previous one.

I'll summarise the findings for you:

To get the best c-ray performance out of gcc7, it's necessary to use either
-mcpu/-mtune=cortex-a57 or -mcpu=cortex-a53 -frename-registers (depessimizing
with -mno-fix-cortex-a53-843419 if necessary).

However, in gcc8, neither produces the expected best performance. No
combination does, which is a clear regression.