https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85366
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For arm64, it is obvious why it is not optimized into one divide:

.L2:
        udiv    w2, w0, w3
        msub    w2, w2, w3, w0
        cbnz    w2, .L5
        sdiv    w2, w0, w3
        .p2align 2
.L4:
        mov     w0, w2
        str     w3, [x1], 4
        sdiv    w2, w2, w3
        msub    w4, w2, w3, w0
        cbz     w4, .L4
.L5:
        add     w3, w3, 1
        cmp     w3, w0
        ble     .L2
.L1:
        ret

--- CUT ---
In the first case we have an unsigned division while in the second case we have a signed division.