Richard Biener wrote:
> Hurugalawadi, Naveen wrote:
> > The code (m1 > m2) * d should be optimized as m1 > m2 ? d : 0.

> What's the reason of this transform?  I expect that the HW multiplier
> is quite fast given one operand is either zero or one and a multiplication
> is a gimple operation that's better handled in optimizations than
> COND_EXPRs which eventually expand to conditional code which
> would be much slower.

Even really fast multipliers have a latency of several cycles, and this is
generally fixed irrespective of the inputs. Maybe you were thinking about division?

Additionally, integer multiply typically has much lower throughput than other
ALU operations like conditional move - a modern CPU may have 4 ALUs
but only 1 multiplier, so removing redundant integer multiplies is always good.

Note that (m1 > m2) is also a conditional expression, which will result in
branches for floating point expressions and on some targets even for integers.
Moving the multiply into the conditional expression generates the best code:

Integer version (f1 is the conditional form, f2 the multiply form):
f1:
        cmp    w0, 100
        csel   w0, w1, wzr, gt
        ret
f2:
        cmp    w0, 100
        cset   w0, gt
        mul    w0, w0, w1
        ret

Float version (f3 is the conditional form, f4 the multiply form):
f3:
        movi   v1.2s, #0
        cmp    w0, 100
        fcsel  s0, s0, s1, gt
        ret
f4:
        cmp    w0, 100
        bgt    .L8
        movi   v1.2s, #0
        fmul   s0, s0, s1  // eh???
.L8:
        ret
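
For reference, C source along the following lines would plausibly give the
listings above. This is a reconstruction from the assembly, not the code that
was actually compiled: the parameter names x and d and the signatures are
assumptions, and only the constant 100 and the int/float split are taken from
the instructions shown.

```c
/* Hypothetical reconstruction: the original C source was not posted.
   Names and signatures are assumed; 100 comes from "cmp w0, 100". */

/* Conditional form, integer: select d or zero (csel). */
int f1(int x, int d) { return x > 100 ? d : 0; }

/* Multiply form, integer: comparison yields 0 or 1, then mul. */
int f2(int x, int d) { return (x > 100) * d; }

/* Conditional form, float: select d or zero (fcsel). */
float f3(int x, float d) { return x > 100 ? d : 0.0f; }

/* Multiply form, float: the 0/1 result is converted and multiplied. */
float f4(int x, float d) { return (x > 100) * d; }
```

The integer pair returns the same value for all inputs. The float pair agrees
only for finite d: under IEEE 754, 0 * inf and 0 * NaN are NaN, which is
presumably why the fmul in f4 is not folded away.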

Wilco
