On Mon, May 30, 2016 at 9:14 AM, Alexander Monakov <amona...@ispras.ru> wrote:
> On Sun, 29 May 2016, Marc Glisse wrote:
>> On Sat, 28 May 2016, Alexander Monakov wrote:
>>
>> > For unsigned A, B, 'A > -1 / B' is a nice predicate for checking whether
>> > 'A*B' overflows (or 'B && A > -1 / B' if B may be zero). Let's optimize
>> > it to an invocation of __builtin_mul_overflow to avoid the divide
>> > operation.
>>
>> I forgot to ask earlier: what does this give for modes / platforms where
>> umulv4 does not have a specific implementation? Is the generic implementation
>> worse than A>-1/B, in which case we may want to check optab_handler before
>> doing the transformation? Or is it always at least as good?
>
> If umulv<mode>4 is unavailable (which today is everywhere except x86), gcc
> falls back as follows. First, it tries to see if doing a multiplication in a
> 2x wider type is possible (which it usually is, as gcc supports __int128_t on
> 64-bit platforms and 64-bit long long on 32-bit platforms), then it looks at
> high bits of the 2x wide product. This should boil down to doing a 'high
> multiply' instruction if original operands' type matches register size, and a
> normal multiply + masking high bits if the type is smaller than register.
>
> Second, if the above fails (e.g. with 64-bit operands on a 32-bit platform),
> then gcc emits a sequence that performs the multiplication by parts in a 2x
> narrower type.
>
> I think the first, more commonly taken, fallback path results in an
> always-good code. In the second case, the eliminated 64-bit divide is unlikely
> to have a direct hw support; e.g., on i386 it's a library call to __udivdi3.
> This makes the transformation a likely loss for code size, a likely win for
> performance. It could be better if GCC could CSE REALPART (IFN_MUL_OVERFLOW)
> with A*B on gimple.
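
For reference, a minimal sketch of the three forms discussed in the quoted text, assuming 32-bit unsigned operands and a compiler that provides __builtin_mul_overflow (GCC 5 or later, or Clang). The function names are made up for illustration, and the "wide" variant only mimics at the source level what the first fallback path is described as doing; it is not the code GCC actually emits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Divide-based predicate from the thread: for unsigned a and b, a * b
   overflows iff b != 0 and a > MAX / b.  (uint32_t)-1 is the all-ones
   value, i.e. what '-1' becomes after conversion to the unsigned type.  */
static bool mul_overflows_div(uint32_t a, uint32_t b)
{
    return b && a > (uint32_t)-1 / b;
}

/* The form the proposed transformation canonicalizes to: the builtin
   returns true when the mathematical product does not fit in *product.  */
static bool mul_overflows_builtin(uint32_t a, uint32_t b)
{
    uint32_t product;
    return __builtin_mul_overflow(a, b, &product);
}

/* Source-level analogue of the first fallback: multiply in a type twice
   as wide and test whether any of the high bits are set.  */
static bool mul_overflows_wide(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a * b;
    return (wide >> 32) != 0;
}
```

All three agree on every input; only the first needs a division.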

CCing Jakub, who wrote the tree-ssa-math-opts.c code last year. I remember
we discussed using match.pd but ended up not doing it there; I don't
remember the exact reason. It would be nice to have these in one place
rather than split.

As for umulv<mode>4 handling: users can write overflow checks using the
builtins, so we had better expand them to the optimal form for each target.
Canonicalizing these checks to the IFN therefore looks reasonable to me. The
plus/minus case in tree-ssa-math-opts.c _does_ disable itself if no uaddv is
available, though.

As for the division by zero thing: division by zero invokes undefined
behavior. Yes, with -fnon-call-exceptions you could catch this as an
exception, but you shouldn't be able to without a -fdivision-by-zero. And
yes, we're quite inconsistent in folding of, say, 0/0, but that is due to
warning code (historically). I'd rather ignore that bit in folding and make
us consistent here (hopefully without regressing in diagnostics).

Richard.

> Thanks.
> Alexander
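
A minimal sketch of the addition counterpart, assuming the plus/minus case mentioned above refers to idioms of this shape (that is an assumption, not something the messages spell out). The function names are hypothetical and the operands are taken to be 32-bit unsigned; __builtin_add_overflow requires GCC 5 or later, or Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hand-written idiom: unsigned addition wraps modulo 2^32, so the sum
   ends up smaller than an operand exactly when the addition overflowed.  */
static bool add_overflows_idiom(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    return sum < a;
}

/* Builtin form: returns true when a + b does not fit in *sum.  */
static bool add_overflows_builtin(uint32_t a, uint32_t b)
{
    uint32_t sum;
    return __builtin_add_overflow(a, b, &sum);
}
```

Unlike the multiplication predicate, the hand-written form here involves no division, so there is less to gain from the rewrite on targets without a dedicated pattern.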