On Mon, 17 Jun 2013, Kugan wrote:

> Hi,
>
> I am attempting to fix Bug 43721 - Failure to optimise (a/b) and (a%b)
> into single __aeabi_idivmod call
> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43721).
>
> The execute_cse_sincos tree-level pass does similar CSE, so I attempted
> to use a similar approach here.  Div/mod CSE is not really using
> built-in functions at this level, though.
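
For reference, the kind of source the PR is about looks roughly like the
sketch below (illustrative only; the use() sink is a hypothetical helper
just to keep both results live).  Both operations currently expand
separately even though a single __aeabi_idivmod call on ARM EABI returns
the quotient and the remainder together:

  /* Minimal sketch of the PR 43721 pattern: the quotient and the
     remainder of the same operands are computed separately, so one
     combined division would be enough.  */
  extern void use (int, int);   /* hypothetical sink */

  void
  f (int a, int b)
  {
    use (a / b, a % b);
  }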
The issue with performing the transform at the same time as we transform
SINCOS is that the vectorizer will then no longer be able to vectorize
these loops.  It would need to be taught how to handle the builtin calls
(basically undo the transformation; I don't know of any ISA that can do
a vectorized combined div/mod).  Which means it should rather be done at
the point we CSE reciprocals (which also replaces computations with
builtin target function calls).

> For the case of div and mod operations, after CSE is performed, there
> isn't a way to represent the resulting statement in GIMPLE.  We will
> end up with divmod taking two arguments and returning double the size
> of one argument in the three-address format (divmod returns the
> remainder and the quotient, so the return type is double the size of
> the argument type).
>
> Since GIMPLE_ASSIGN will result in a type-checking failure in this
> case, I attempted to use built-in functions (a GIMPLE_CALL to represent
> the runtime library call).  The name of the function here is target
> specific and can be obtained from sdivmod_optab, so the builtin
> function name defined at the tree level is not used.  I am not entirely
> sure this is the right approach, so I am attaching the first cut of the
> patch to get your feedback and understand the right approach to this
> problem.

If we don't want to expose new builtins to the user (I'm not sure we
want that), then using "internal functions" is an easier way to avoid
these issues (see gimple.h and internal-fn.(def|h)).

Generally the transform looks useful to me, as it moves forward with the
general idea of moving pattern recognition done during RTL expansion to
an earlier place.

As for the use of a larger integer type and shifts to represent the
modulo and division results, I don't think that's the very best idea.
Instead, resorting to a complex integer type as the return value looks
more appealing (similar to sincos using cexpi here).  That way you also
avoid the ugly hard-coding of bit sizes.

+  if (HAVE_divsi3
+      || (GET_MODE_BITSIZE (TYPE_MODE (type)) != 32)

Watch out for types whose TYPE_PRECISION is not the bitsize of their
mode.  Also, it should be GET_MODE_PRECISION here.

+      || !optab_libfunc (TYPE_UNSIGNED (type)? udivmod_optab : sdivmod_optab,
+                         TYPE_MODE (type)))

Targets that use a libfunc should also get this optimization, as it
always removes computations.  I think the proper test is for whether the
target can do division and/or modulus without using a libfunc, not
whether there is a divmod optab/libfunc.  Others knowing this piece of
the compiler better may want to comment here, of course.

Thanks,
Richard.
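
P.S.  A rough GNU C sketch of the complex-integer return value shape, to
make the suggestion above a bit more concrete.  This is illustrative
only: the actual transform would emit an internal-function GIMPLE_CALL
rather than user-visible C, divmod_sketch is a made-up name, and which
half carries the quotient versus the remainder is purely an assumption
of this sketch (mirroring how cse_sincos folds a sin/cos pair into one
cexpi call returning a complex value):

  /* Sketch: return both results of one division in a complex integer
     (a GNU C extension).  The quotient-in-real-part /
     remainder-in-imaginary-part split is only an assumption here.  */
  static _Complex int
  divmod_sketch (int a, int b)
  {
    _Complex int t = a / b;   /* quotient in the real part */
    __imag__ t = a % b;       /* remainder in the imaginary part; a real
                                 lowering would use a single combined
                                 division instead of two operations */
    return t;
  }

  void
  g (int a, int b, int *q, int *r)
  {
    _Complex int t = divmod_sketch (a, b);
    *q = __real__ t;
    *r = __imag__ t;
  }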