https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103802
--- Comment #1 from luoxhu at gcc dot gnu.org ---
MOVE_MAX_PIECES is 4 on m32 but 8 on m64, so estimate_move_cost returns different values on the two targets (2 vs. 1) for "((size + MOVE_MAX_PIECES - 1) / MOVE_MAX_PIECES)".

recip-3.m32.c.172t.cunroll:

 BB: 11, after_exit: 0
 BB: 7, after_exit: 0
  size: 2 _4 = F[i_23];
  size: 1 _5 = _4 + iftmp.1_10;
  size: 2 F[i_23] = _5;
 BB: 5, after_exit: 0
  size: 1 _2 = d_14 + 1.00000000000000088817841970012523233890533447265625e-1;
  size: 1 reciptmp_12 = 1.0e+0 / d_14;
  size: 1 iftmp.1_18 = reciptmp_12 * _2;
 BB: 6, after_exit: 0
  size: 1 _3 = -1.00000000000000088817841970012523233890533447265625e-1 - d_14;
  size: 1 reciptmp_25 = 1.0e+0 / d_14;
  size: 1 iftmp.1_17 = reciptmp_25 * _3;
 BB: 4, after_exit: 0
  size: 2 if (e.0_1 < -5.00000000000000444089209850062616169452667236328125e-2)
size: 19-4, last_iteration: 19-4
  Loop size: 19
  Estimated size after unrolling: 20
Not unrolling loop 1: size would grow.

But recip-3.m64.c.172t.cunroll:

 BB: 11, after_exit: 0
 BB: 7, after_exit: 0
  size: 1 _4 = F[i_23];
  size: 1 _5 = _4 + iftmp.1_10;
  size: 1 F[i_23] = _5;
 BB: 5, after_exit: 0
  size: 1 _2 = d_14 + 1.00000000000000088817841970012523233890533447265625e-1;
  size: 1 reciptmp_12 = 1.0e+0 / d_14;
  size: 1 iftmp.1_18 = reciptmp_12 * _2;
 BB: 6, after_exit: 0
  size: 1 _3 = -1.00000000000000088817841970012523233890533447265625e-1 - d_14;
  size: 1 reciptmp_25 = 1.0e+0 / d_14;
  size: 1 iftmp.1_17 = reciptmp_25 * _3;
 BB: 4, after_exit: 0
  size: 2 if (e.0_1 < -5.00000000000000444089209850062616169452667236328125e-2)
size: 17-4, last_iteration: 17-4
  Loop size: 17
  Estimated size after unrolling: 17
Making edge 18->9 impossible by redistributing probability to other edges.
Making edge 8->10 impossible by redistributing probability to other edges.
/home/luoxhu/workspace/gcc-master/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c:16:14: optimized: loop with 1 iterations completely unrolled (header execution count 357878154)
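
For reference, the 2 vs. 1 cost can be reproduced with a small stand-alone sketch of the quoted expression. This is not GCC's estimate_move_cost itself: the helper name move_pieces is hypothetical, and it assumes the elements of F are 8 bytes wide (as a double would be), which the dumps do not state directly. Under that assumption the load and the store of F[i_23] each cost 2 on m32 and 1 on m64, which matches the difference of 2 between the two reported loop sizes (19 vs. 17).

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: evaluates only the quoted
   expression ((size + MOVE_MAX_PIECES - 1) / MOVE_MAX_PIECES), i.e. the
   number of MOVE_MAX_PIECES-sized pieces needed to move SIZE bytes
   (a ceiling division).  */
long
move_pieces (long size, long move_max_pieces)
{
  /* Guard against a zero or negative divisor and negative sizes.  */
  assert (size >= 0 && move_max_pieces > 0);
  return (size + move_max_pieces - 1) / move_max_pieces;
}
```

With the assumed 8-byte element, move_pieces (8, 4) gives (8 + 3) / 4 = 2 for m32 and move_pieces (8, 8) gives (8 + 7) / 8 = 1 for m64.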