https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712
--- Comment #6 from PeteVine <tulipawn at gmail dot com> ---
The difference between clang and gcc is even greater on an ARMv7 Cortex-A5, and there's no way to close the gap through unrolling (it has no effect). Here clang is roughly 25% faster:

gcc version 7.0.1 20170225: 1227.2 Kpos/sec
clang 3.6:                  1540.4 Kpos/sec