https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580

--- Comment #22 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For the original testcase in comment #0 we produce (in GCC 11+):
        movl    %edi, %eax
        mull    %esi
        seto    %dl
        xorl    %r8d, %r8d
        movzbl  %dl, %edx
        testl   %eax, %eax
        jle     .L1
        testl   %edx, %edx
        sete    %r8b
.L1:
        movl    %r8d, %eax
        ret
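
For reference, the source presumably looks roughly like the following.  This is
only a sketch of a divide-based multiplication overflow check, an assumption
about the shape of the comment #0 testcase rather than a copy of it (names are
made up):

  /* Sketch only: the product is formed in unsigned arithmetic and the
     divide checks that it round-trips, i.e. that x*y did not overflow.
     GCC now folds the divide-based check into .MUL_OVERFLOW.  */
  int
  mul_did_not_overflow (int x, int y)
  {
    int tmp = (unsigned int) x * (unsigned int) y;
    if (tmp > 0)
      return tmp / x == y;
    return 0;
  }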

------- CUT ----
I have a patch which I think improves the code even more.

At the GIMPLE level we correctly get:
  x.0_1 = (unsigned int) x_6(D);
  y.1_2 = (unsigned int) y_7(D);
  _11 = .MUL_OVERFLOW (x.0_1, y.1_2);
  tmp_8 = REALPART_EXPR <_11>;
  tmp.3_3 = (int) tmp_8;
  if (tmp.3_3 > 0)
    goto <bb 3>; [59.00%]
  else
    goto <bb 4>; [41.00%]

  <bb 3> [local count: 633507680]:
  _12 = IMAGPART_EXPR <_11>;
  _10 = _12 == 0;

  <bb 4> [local count: 1073741824]:
  # iftmp.2_5 = PHI <_10(3), 0(2)>

Notice there is no divide.  The _12 == 0 part really should just be _12 ^ 1.
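Since the IMAGPART_EXPR result (_12) is always 0 or 1, the compare and the xor
compute the same value.  A minimal sketch of the equivalence (helper names are
hypothetical):

  #include <assert.h>

  /* flag is the 0/1 overflow bit taken from .MUL_OVERFLOW.  */
  static int via_compare (unsigned int flag) { return flag == 0; }
  static int via_xor     (unsigned int flag) { return flag ^ 1; }

  int
  main (void)
  {
    assert (via_compare (0) == via_xor (0));  /* both 1: no overflow */
    assert (via_compare (1) == via_xor (1));  /* both 0: overflow    */
    return 0;
  }

The xor form is what lets the sete/branch pair collapse into the xorl $1 plus
cmovg sequence shown below.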

After my patch (which I need to finish up) we get:
        movl    %edi, %eax
        mull    %esi
        seto    %dl
        xorl    %r8d, %r8d
        movzbl  %dl, %edx
        xorl    $1, %edx
        testl   %eax, %eax
        cmovg   %edx, %r8d
        movl    %r8d, %eax
        ret
That should be exactly what you wanted, or very close to it.
There still look to be a few micro-optimizations needed, though.
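
For comparison, the same check written directly against __builtin_mul_overflow
(which is what .MUL_OVERFLOW corresponds to at the GIMPLE level) should give
essentially the same code; again just a sketch, not the patch's testcase:

  /* Sketch: __builtin_mul_overflow stores the product and returns the
     overflow flag, mirroring the REALPART/IMAGPART pair above.  */
  int
  mul_did_not_overflow_builtin (int x, int y)
  {
    unsigned int prod;
    int overflowed = __builtin_mul_overflow ((unsigned int) x,
                                             (unsigned int) y, &prod);
    if ((int) prod > 0)
      return !overflowed;
    return 0;
  }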
