https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99912
--- Comment #11 from Erik Schnetter <schnetter at gmail dot com> ---
The number of active local variables is likely much larger than the number of registers, and I expect there to be a lot of spilling. I hope that the compiler is clever about changing the order in which expressions are evaluated to reduce spilling as much as possible.

Because the loop is so large, I split it into two, each calculating about half of the output variables. The code here looks at one of the loops. To simplify the code, each loop still loads all variables (via masked loads), but may not use all of them. The unused masked loads do not surprise me per se, but I expect the compiler to remove them.
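
For readers who have not seen the pattern, here is a minimal sketch of a split loop that still loads every input through a masked load but uses only some of them. It is not the code from this report: `Vec`, `Mask`, `masked_load`, `masked_store`, `tail_mask` and `loop_first_half` are hypothetical names, and the masked operations are emulated with plain scalar code rather than real SIMD intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumption: a 4-lane vector emulated with std::array, standing in for a SIMD register.
constexpr std::size_t kLanes = 4;
using Vec = std::array<double, kLanes>;
using Mask = std::array<bool, kLanes>;

// Mask that enables only the lanes that are still inside [0, n).
Mask tail_mask(std::size_t i, std::size_t n) {
    Mask m{};
    for (std::size_t l = 0; l < kLanes; ++l) m[l] = (i + l < n);
    return m;
}

// Emulated masked load: disabled lanes are not read and yield 0.
Vec masked_load(const double* p, const Mask& m) {
    Vec v{};
    for (std::size_t l = 0; l < kLanes; ++l)
        if (m[l]) v[l] = p[l];
    return v;
}

// Emulated masked store: disabled lanes are not written.
void masked_store(double* p, const Mask& m, const Vec& v) {
    for (std::size_t l = 0; l < kLanes; ++l)
        if (m[l]) p[l] = v[l];
}

// One of the two split loops: it loads a, b, c and d, but only a and b
// contribute to the output. The loads of c and d are dead.
void loop_first_half(const double* a, const double* b,
                     const double* c, const double* d,
                     double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; i += kLanes) {
        const Mask m = tail_mask(i, n);
        const Vec va = masked_load(a + i, m);
        const Vec vb = masked_load(b + i, m);
        const Vec vc = masked_load(c + i, m);  // loaded but unused
        const Vec vd = masked_load(d + i, m);  // loaded but unused
        (void)vc;
        (void)vd;
        Vec r{};
        for (std::size_t l = 0; l < kLanes; ++l) r[l] = va[l] * vb[l] + va[l];
        masked_store(out + i, m, r);
    }
}

// Small self-check: out[k] must equal a[k] * b[k] + a[k] for every k < n.
void self_check() {
    const double a[6] = {1, 2, 3, 4, 5, 6};
    const double b[6] = {2, 2, 2, 2, 2, 2};
    const double c[6] = {0, 0, 0, 0, 0, 0};
    const double d[6] = {0, 0, 0, 0, 0, 0};
    double out[6] = {0, 0, 0, 0, 0, 0};
    loop_first_half(a, b, c, d, out, 6);
    for (std::size_t k = 0; k < 6; ++k) assert(out[k] == a[k] * b[k] + a[k]);
}
```

In this emulation the loads of `c` and `d` have no observable effect, so an optimizer is free to delete them. Whether the same happens with real masked-load intrinsics in a much larger loop is the question raised above.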