https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87954
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Aldy Hernandez from comment #1)
> Indeed, if you compile imul() with -fdump-tree-all-details-alias -O2 and
> look at the vrp1 dump, one can see:
>
>   # RANGE [0, 1] NONZERO 1
>   is_rec_12 = (int) _4;
> ...
>   # RANGE [0, 1] NONZERO 1
>   _6 = (int) _15;
>   # RANGE [0, 1] NONZERO 1
>   _7 = _6 * is_rec_12;
>
> This pattern persists throughout the optimization pipeline, so any remaining
> optimizer could potentially see the range of the operands and strength
> reduce this.
>
> What would be the best place to do this?

The best place to do this is a pattern in match.pd

(simplify
 (mult @1 @2)
 (if (INTEGRAL_TYPE_P (type)
      && wi::eq_p (get_nonzero_bits (@1), wi::one (TYPE_PRECISION (type)))
      && wi::eq_p (get_nonzero_bits (@2), wi::one (TYPE_PRECISION (type))))
  (bit_and @1 @2)))

Maybe even think of tricks like requiring only one operand to be [0,1]
and sign-extending that from bit 0 before the and?  Thus, for a in [0,1],
a * b -> (a ? -1 : 0) & b.  But that's probably something to do at RTL
expansion time, because it depends on the cost of the sign extension.

If nonzero-bits is 2 then we can do b + (a == 2 ? 1 : -1) * b.  Thus
transfer the bit to the sign bit and do an add.  This is similarly
something for RTL expansion.

I guess any power-of-two-or-zero duality can be optimized a bit.
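
To make the three rewrites above concrete, here is a minimal C sketch that checks each one against a plain multiply. It is only an illustration of the arithmetic, not GCC code: the function names are made up for this example, and it uses uint32_t rather than int so that the negation and the doubling wrap instead of overflowing, which is an assumption the comment itself does not make.

```c
#include <assert.h>
#include <stdint.h>

/* Both operands known to be 0 or 1: the product is just the AND. */
static uint32_t mul_bool_bool(uint32_t a, uint32_t b)
{
    return a & b;
}

/* Only a known to be 0 or 1: 0u - a is 0 or all-ones, i.e. bit 0
   sign-extended, which then masks b. */
static uint32_t mul_bool_any(uint32_t a, uint32_t b)
{
    return (0u - a) & b;
}

/* a known to be 0 or 2: add either b or its negation to b. */
static uint32_t mul_zero_or_two(uint32_t a, uint32_t b)
{
    return b + (a == 2u ? b : 0u - b);
}

/* Compare each rewrite against a plain multiply on a few sample values. */
static void check_identities(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 2u, 7u, 0x80000000u, 0xffffffffu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t b = samples[i];
        for (uint32_t a = 0; a <= 1u; a++) {
            assert(mul_bool_any(a, b) == a * b);
            assert(mul_zero_or_two(2u * a, b) == (2u * a) * b);
            for (uint32_t c = 0; c <= 1u; c++)
                assert(mul_bool_bool(a, c) == a * c);
        }
    }
}
```

Calling check_identities() from a test driver exercises all three forms. Whether the second and third forms are actually cheaper than a multiply depends on the target, which is why they are suggested for RTL expansion rather than match.pd.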