https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Unrolling of the inner loop accounts for the rest (both the conditional moves generated when if-conversion is applied and the branchy code generated when it is not seem to put too heavy a load on the branch predictor(?) when inside another loop).
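
For illustration only, here is a minimal C sketch of the three code shapes mentioned above: a branchy inner loop, an if-converted (select-style) inner loop, and a manually unrolled inner loop, each nested inside an outer loop. This is not the test case from the bug report; the function names, the threshold parameter `t`, and the unroll factor of 2 are all hypothetical. Whether GCC actually emits branches or conditional moves for these depends on the target and options, so the source-level forms only hint at the intended code shape.

```c
#include <stddef.h>

/* Hypothetical kernel: count elements below threshold t in a
   rows x cols matrix stored row-major in a[]. */

/* Branchy form: the inner loop body contains a data-dependent branch. */
size_t count_below_branchy(const int *a, size_t rows, size_t cols, int t)
{
    size_t count = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            if (a[i * cols + j] < t)
                count++;
    return count;
}

/* If-converted form: the condition is turned into a value, so the
   compiler may use a setcc/cmov-style sequence instead of a branch. */
size_t count_below_select(const int *a, size_t rows, size_t cols, int t)
{
    size_t count = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            count += (a[i * cols + j] < t);
    return count;
}

/* Inner loop unrolled by 2 by hand (assumed factor), with a remainder
   loop for an odd number of columns. */
size_t count_below_unrolled(const int *a, size_t rows, size_t cols, int t)
{
    size_t count = 0;
    for (size_t i = 0; i < rows; i++) {
        const int *row = a + i * cols;
        size_t j = 0;
        for (; j + 1 < cols; j += 2) {
            count += (row[j] < t);
            count += (row[j + 1] < t);
        }
        for (; j < cols; j++)
            count += (row[j] < t);
    }
    return count;
}
```

Comparing the assembly for these variants (for example with and without -funroll-loops) is one way to check which shape the compiler actually produced before attributing a slowdown to the branch predictor.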