https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87954

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Aldy Hernandez from comment #1)
> Indeed, if you compile imul() with -fdump-tree-all-details-alias -O2 and
> look at the vrp1 dump, one can see:
> 
>   # RANGE [0, 1] NONZERO 1
>   is_rec_12 = (int) _4;
> ...
>   # RANGE [0, 1] NONZERO 1
>   _6 = (int) _15;
>   # RANGE [0, 1] NONZERO 1
>   _7 = _6 * is_rec_12;
> 
> This pattern persists throughout the optimization pipeline, so any remaining
> optimizer could potentially see the range of the operands and strength
> reduce this.
> 
> What would be the best place to do this?

The best place to do this is a pattern in match.pd:

(simplify (mult @1 @2)
 (if (INTEGRAL_TYPE_P (type)
      && wi::eq_p (get_nonzero_bits (@1), wi::one (TYPE_PRECISION (type)))
      && wi::eq_p (get_nonzero_bits (@2), wi::one (TYPE_PRECISION (type))))
  (bit_and @1 @2)))
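
To illustrate why the replacement is safe, here is a minimal C sketch (not GCC code) that compares the product with the bitwise AND for every pair of operands whose nonzero bits are exactly 1, i.e. values in [0, 1]. The function names are made up for this example.

```c
#include <assert.h>

/* The original form: multiply two values known to be 0 or 1. */
unsigned mul_bool(unsigned a, unsigned b)
{
    assert(a <= 1u && b <= 1u);   /* precondition: range [0, 1] */
    return a * b;
}

/* The form the pattern would produce: a bitwise AND. */
unsigned and_bool(unsigned a, unsigned b)
{
    assert(a <= 1u && b <= 1u);   /* same precondition */
    return a & b;
}

/* Returns the number of (a, b) pairs on which the two forms disagree. */
int count_mismatches_and(void)
{
    int bad = 0;
    for (unsigned a = 0; a <= 1u; a++)
        for (unsigned b = 0; b <= 1u; b++)
            if (mul_bool(a, b) != and_bool(a, b))
                bad++;
    return bad;
}
```

With only four input pairs the check is exhaustive, so a zero mismatch count covers the whole domain the pattern's condition allows.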

Maybe even think of tricks like requiring only one operand to be [0,1]
and sign-extending that from bit 0 before the and?  Thus a * b with a in
[0,1] -> (a ? -1 : 0) & b.  But that's probably something to do at RTL
expansion time because whether it pays off depends on the cost of the
sign extension.
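
A rough C sketch of that one-operand trick, assuming 32-bit unsigned operands so that the negation wraps instead of overflowing: subtracting a from 0 yields an all-ones mask when a is 1 and zero when a is 0, which is the (a ? -1 : 0) term. The names and sample values are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* a is known to be 0 or 1; b is arbitrary. */
uint32_t mul_by_bit(uint32_t a, uint32_t b)
{
    assert(a <= 1u);
    uint32_t mask = (uint32_t)0 - a;  /* a == 1 -> 0xFFFFFFFF, a == 0 -> 0 */
    return mask & b;
}

/* Returns the number of sampled inputs where mask-and differs from a * b. */
int count_mismatches_mask(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 2u, 7u, 123456789u, 0x80000000u, 0xFFFFFFFFu
    };
    int bad = 0;
    for (uint32_t a = 0; a <= 1u; a++)
        for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
            if (mul_by_bit(a, samples[i]) != a * samples[i])
                bad++;
    return bad;
}
```

This only shows that the two forms agree in value. Whether the negate-and-mask sequence is cheaper than a multiply is target dependent, which is the cost question raised above.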

If nonzero-bits is 2 then we can do b + (a == 2 ? 1 : -1) * b.  Thus
transfer the bit to the sign bit and do an add.  This is similarly something
for RTL expansion.  I guess any power-of-two or zero duality can be
optimized a bit.
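
The nonzero-bits == 2 case can be checked the same way. The sketch below uses wrapping 32-bit unsigned arithmetic so the -1 multiplier is well defined. The second function is one possible reading of the "power-of-two or zero" generalization (mask, then shift) and is an assumption made for this example, not something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* a is known to be 0 or 2: b + (a == 2 ? 1 : -1) * b. */
uint32_t mul_zero_or_two(uint32_t a, uint32_t b)
{
    assert(a == 0u || a == 2u);
    uint32_t sign = (a == 2u) ? 1u : UINT32_MAX;  /* +1 or -1 mod 2^32 */
    return b + sign * b;                           /* 2*b or 0 */
}

/* Hypothetical generalization: a is 0 or (1 << k), with k < 32. */
uint32_t mul_zero_or_pow2(uint32_t a, uint32_t b, unsigned k)
{
    assert(k < 32u && (a == 0u || a == (UINT32_C(1) << k)));
    uint32_t mask = (uint32_t)0 - (a >> k);  /* all ones iff a != 0 */
    return (mask & b) << k;
}

/* Returns the number of sampled inputs where either form differs from a * b. */
int count_mismatches_pow2(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 3u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu
    };
    int bad = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t b = samples[i];
        if (mul_zero_or_two(0u, b) != 0u) bad++;
        if (mul_zero_or_two(2u, b) != 2u * b) bad++;
        for (unsigned k = 0; k < 32u; k++) {
            uint32_t a = UINT32_C(1) << k;
            if (mul_zero_or_pow2(0u, b, k) != 0u) bad++;
            if (mul_zero_or_pow2(a, b, k) != a * b) bad++;
        }
    }
    return bad;
}
```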
