https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580
--- Comment #22 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For the original testcase in comment #0 we produce (in GCC 11+):

        movl    %edi, %eax
        mull    %esi
        seto    %dl
        xorl    %r8d, %r8d
        movzbl  %dl, %edx
        testl   %eax, %eax
        jle     .L1
        testl   %edx, %edx
        sete    %r8b
.L1:
        movl    %r8d, %eax
        ret

------- CUT ----
I have a patch which I think improves the code even further. At the GIMPLE
level the function now correctly looks like this:

  x.0_1 = (unsigned int) x_6(D);
  y.1_2 = (unsigned int) y_7(D);
  _11 = .MUL_OVERFLOW (x.0_1, y.1_2);
  tmp_8 = REALPART_EXPR <_11>;
  tmp.3_3 = (int) tmp_8;
  if (tmp.3_3 > 0)
    goto <bb 3>; [59.00%]
  else
    goto <bb 4>; [41.00%]

  <bb 3> [local count: 633507680]:
  _12 = IMAGPART_EXPR <_11>;
  _10 = _12 == 0;

  <bb 4> [local count: 1073741824]:
  # iftmp.2_5 = PHI <_10(3), 0(2)>

Notice there is no divide. The _12 == 0 part really should just be _12 ^ 1.
After my patch (which I still need to finish up) we get:

        movl    %edi, %eax
        mull    %esi
        seto    %dl
        xorl    %r8d, %r8d
        movzbl  %dl, %edx
        xorl    $1, %edx
        testl   %eax, %eax
        cmovg   %edx, %r8d
        movl    %r8d, %eax
        ret

That should be exactly what you wanted, or very close. A few
micro-optimizations still appear to be needed.
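
For context, here is a sketch of the kind of source-level check being
discussed. This is only an approximation of the shape of the comment #0
testcase, not a verbatim copy, and the function names are made up for
illustration. The division-based form is the classic idiom that GCC can
rewrite into the .MUL_OVERFLOW internal call shown in the GIMPLE above
(hence "no divide"), and __builtin_mul_overflow maps onto that internal
call directly:

  /* Approximate shape of the pattern under discussion: return 1 when the
     product x * y (computed in unsigned arithmetic) is positive and did
     not overflow.  The divide-back test is the source-level idiom that
     gets folded to .MUL_OVERFLOW, so no divide survives in the output.  */
  int
  product_positive_no_overflow (int x, int y)
  {
    unsigned int tmp = (unsigned int) x * (unsigned int) y;
    if ((int) tmp > 0)
      /* tmp > 0 implies x != 0, so the division is safe.  */
      return tmp / (unsigned int) x == (unsigned int) y;
    return 0;
  }

  /* The same check written with the builtin that lowers to the
     .MUL_OVERFLOW internal function seen in the GIMPLE dump.  */
  int
  product_positive_no_overflow_builtin (int x, int y)
  {
    unsigned int tmp;
    int overflowed = __builtin_mul_overflow ((unsigned int) x,
                                             (unsigned int) y, &tmp);
    if ((int) tmp > 0)
      return !overflowed;
    return 0;
  }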