https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773
--- Comment #5 from PeteVine <tulipawn at gmail dot com> ---
The issue seems to be purely about soft division. (I was using either no -mcpu option or -mcpu=cortex-a5.) Compiling for e.g. Cortex-A7 doesn't need to lower division to library calls, and even though hardware division is not used in the vanilla case, it is after profiling. The profiled binary even shows a benefit under qemu-arm, so I can safely guess the issue is non-existent when targeting similar ARM parts.
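
For anyone wanting to check which form of division a given -mcpu setting produces, a minimal sketch follows. It is not taken from the report: the function names and the file name div.c are hypothetical, and the cross-compiler name arm-linux-gnueabihf-gcc is an assumption about the local toolchain. Cortex-A5 has no integer divide instructions, so on an EABI target the division should become a call into the runtime library, whereas Cortex-A7 should get sdiv/udiv.

```c
/* div.c: hypothetical test case, not from the original report.
 *
 * Assumed commands (the toolchain name may differ on your system):
 *   arm-linux-gnueabihf-gcc -O2 -mcpu=cortex-a5 -S div.c
 *   arm-linux-gnueabihf-gcc -O2 -mcpu=cortex-a7 -S div.c
 *
 * With cortex-a5 the generated div.s is expected to call EABI helpers
 * such as __aeabi_idiv and __aeabi_uidivmod (soft division).
 * With cortex-a7 it is expected to use the sdiv and udiv instructions.
 */

/* Signed division: the operation that is either lowered to a library
 * call or emitted as a single hardware instruction. */
int quotient(int a, int b)
{
    return a / b;
}

/* Unsigned remainder: exercises the unsigned divide path as well. */
unsigned remainder_u(unsigned a, unsigned b)
{
    return a % b;
}
```

Searching the two assembly outputs for "__aeabi" and "div" should show which path was taken; the same comparison can be repeated on a profile-optimized build.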