https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773
--- Comment #5 from PeteVine ---
The issue seems to be purely about soft division. (I was using either no -mcpu
or -mcpu=cortex-a5.)
Compiling for e.g. Cortex-A7 doesn't need to lower any library calls and even
though hardware division is not u
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773
--- Comment #4 from Richard Biener ---
It possibly does value profiling, figuring out a common division/modulo value
and then making all other values unlikely (and thus cold).
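
To illustrate the kind of transformation described above, here is a minimal sketch in C. It is hand-written, not GCC output: the function names, the constant 9 and the use of __builtin_expect are assumptions chosen for illustration, and the code GCC actually generates for this bug may differ.

```c
/* Hypothetical illustration only: the names and the constant are assumed,
   not taken from the test case in this bug. */
#define COMMON_DIVISOR 9   /* divisor the profile run saw most often (assumed) */

/* Before: a single general-purpose division. */
int div_plain(int pos, int d)
{
    return pos / d;
}

/* After: roughly the shape a value-profile-guided specialization gives. */
int div_specialized(int pos, int d)
{
    /* Hot path: the divisor is a known constant, so the compiler can
       replace the division with a multiply/shift sequence. */
    if (__builtin_expect(d == COMMON_DIVISOR, 1))
        return pos / COMMON_DIVISOR;

    /* Cold path: general division. On an ARM core without a hardware
       divide instruction this is lowered to a library call such as
       __aeabi_idiv. */
    return pos / d;
}
```

Both functions return the same result for any non-zero divisor; only the expected cost of each path changes, which would be consistent with a library division call showing up in a single, supposedly cold, place.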
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773
--- Comment #3 from PeteVine ---
Oh, a divmod issue. At least it's not using modsi3 ;) (llvm #26450)
BTW, the attached assembly files were generated with LTO and NEON enabled, but
the 20% difference stayed the same (1s vs 1.2s).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773
--- Comment #2 from Andrew Pinski ---
Looks like, for some reason, with profiling __aeabi_idiv/__aeabi_idivmod is being
used in one place.
Most likely for pos / 9.