[Bug tree-optimization/88603] optimization missed for saturation arithmetic add

rguenth at gcc dot gnu.org Wed, 02 Jan 2019 02:42:26 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88603


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-02
                 CC|                            |rguenth at gcc dot gnu.org
            Version|unknown                     |8.2.1
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  GCC "optimized" the code to

  <bb 2> [local count: 1073741825]:
  _1 = (long unsigned int) a_4(D);
  _2 = (long unsigned int) b_5(D);
  tmp_6 = _1 + _2;
  if (tmp_6 > 4294967295)
    goto <bb 4>; [21.72%]
  else
    goto <bb 3>; [78.28%]

  <bb 3> [local count: 840525100]:
  _7 = a_4(D) + b_5(D);

  <bb 4> [local count: 1073741825]:
  # _3 = PHI <_7(3), 4294967295(2)>
  return _3;
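
Read back into C, the dump above corresponds roughly to the following.
This is a reconstruction from the GIMPLE, not the testcase attached to
the bug; the function name is taken from the assembly label further
down and the parameter names from the SSA names a_4 and b_5.

```c
#include <stdint.h>

/* Reconstructed from the GIMPLE dump; an assumption, not the
   reporter's original source.  */
uint32_t
saturation_add (uint32_t a, uint32_t b)
{
  /* _1 + _2: the add is done in 64 bits so it cannot wrap.  */
  uint64_t tmp = (uint64_t) a + (uint64_t) b;

  /* bb 2: a sum above 4294967295 goes straight to bb 4 with the
     constant as the PHI argument.  */
  if (tmp > UINT32_MAX)
    return UINT32_MAX;

  /* bb 3: the truncated result, which the dump computes as the
     32-bit add _7 = a + b.  */
  return (uint32_t) tmp;
}
```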

forwprop1 already does this (the narrowed 32-bit add in _7) by means of
narrowing patterns in match.pd:

/* Narrowing of arithmetic and logical operations.

   These are conceptually similar to the transformations performed for
   the C/C++ front-ends by shorten_binary_op and shorten_compare.  Long
   term we want to move all that code out of the front-ends into here.  */

/* If we have a narrowing conversion of an arithmetic operation where
   both operands are widening conversions from the same type as the outer
   narrowing conversion.  Then convert the innermost operands to a suitable
   unsigned type (to avoid introducing undefined behavior), perform the
   operation and convert the result to the desired type.  */
(for op (plus minus)
  (simplify
    (convert (op:s (convert@2 @0) (convert?@3 @1)))
...
         (convert (op (convert:utype @0)
                      (convert:utype @1))))))))

Here :s is not effective for your testcase since there are no extra
conversions in the end.  Disabling this pattern yields

  <bb 2> [local count: 1073741825]:
  _1 = (long unsigned int) a_4(D);
  _2 = (long unsigned int) b_5(D);
  tmp_6 = _1 + _2;
  _10 = MIN_EXPR <tmp_6, 4294967295>;
  _3 = (uint32_t) _10;

and not much better code in the end:

saturation_add:
.LFB4:
        .cfi_startproc
        movl    %edi, %edi
        movl    %esi, %eax
        movl    $4294967295, %edx
        addq    %rdi, %rax
        cmpq    %rdx, %rax
        cmova   %rdx, %rax
        ret
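
For comparison, a saturating add can also be written without widening
to 64 bits.  The two variants below are only sketches and are not
proposed anywhere in this bug: the first relies on unsigned wraparound,
the second on __builtin_add_overflow (available since GCC 5).  The
helper names are made up for illustration, and whether either form
gives better code depends on the compiler version and target.

```c
#include <stdint.h>

/* Hypothetical helper: detect the carry by checking for wraparound.  */
uint32_t
sat_add_wrap (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;		/* wraps modulo 2^32 */
  return sum < a ? UINT32_MAX : sum;
}

/* Hypothetical helper: let the compiler report the overflow.  */
uint32_t
sat_add_builtin (uint32_t a, uint32_t b)
{
  uint32_t sum;
  if (__builtin_add_overflow (a, b, &sum))
    return UINT32_MAX;
  return sum;
}
```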
