https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580
--- Comment #22 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For the original testcase in comment #0 we produce (in GCC 11+):
        movl    %edi, %eax
        mull    %esi
        seto    %dl
        xorl    %r8d, %r8d
        movzbl  %dl, %edx
        testl   %eax, %eax
        jle     .L1
        testl   %edx, %edx
        sete    %r8b
.L1:
        movl    %r8d, %eax
        ret
------- CUT ----
I have a patch which I think improves the code even more.
At the GIMPLE level this now correctly looks like:
  x.0_1 = (unsigned int) x_6(D);
  y.1_2 = (unsigned int) y_7(D);
  _11 = .MUL_OVERFLOW (x.0_1, y.1_2);
  tmp_8 = REALPART_EXPR <_11>;
  tmp.3_3 = (int) tmp_8;
  if (tmp.3_3 > 0)
    goto <bb 3>; [59.00%]
  else
    goto <bb 4>; [41.00%]

  <bb 3> [local count: 633507680]:
  _12 = IMAGPART_EXPR <_11>;
  _10 = _12 == 0;

  <bb 4> [local count: 1073741824]:
  # iftmp.2_5 = PHI <_10(3), 0(2)>
Notice there is no divide any more. The _12 == 0 part really should just be _12 ^ 1.
After my patch (which I need to finish up) we get:
        movl    %edi, %eax
        mull    %esi
        seto    %dl
        xorl    %r8d, %r8d
        movzbl  %dl, %edx
        xorl    $1, %edx
        testl   %eax, %eax
        cmovg   %edx, %r8d
        movl    %r8d, %eax
        ret
That should be exactly what you wanted, or very close to it.
There still seem to be a few micro-optimizations left to do, though.