On Mon, 17 Jun 2013, Kugan wrote:

> Hi,
>
> I am attempting to fix Bug 43721 - Failure to optimise (a/b) and (a%b)
> into single __aeabi_idivmod call
> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43721).
>
> The execute_cse_sincos tree-level pass does similar CSE, so I attempted
> to use a similar approach here.  Div/mod CSE is not really using
> built-in functions at this level, though.
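
For reference, the kind of source the PR is about looks roughly like the
sketch below (illustrative only; the use() sink is a hypothetical helper
just to keep both results live).  Both operations currently expand
separately even though a single __aeabi_idivmod call on ARM EABI returns
the quotient and the remainder together:

  /* Minimal sketch of the PR 43721 pattern: the quotient and the
     remainder of the same operands are computed separately, so one
     combined division would be enough.  */
  extern void use (int, int);   /* hypothetical sink */

  void
  f (int a, int b)
  {
    use (a / b, a % b);
  }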
The issue with performing the transform at the same time as we transform
SINCOS is that the vectorizer will then no longer be able to vectorize
these loops.  It would need to be taught how to handle the builtin calls
(basically undo the transformation; I don't know of any ISA that can do
a vectorized combined div/mod).  Which means it should rather be done at
the point we CSE reciprocals (which also replaces computations with
builtin target function calls).

> For the case of div and mod operations, after CSE is performed, there
> isn't a way to represent the resulting statement in GIMPLE.  We will
> end up with divmod taking two arguments and returning double the size
> of one argument in the three-address format (divmod returns the
> remainder and the quotient, so the return type is double the size of
> the argument type).
>
> Since GIMPLE_ASSIGN will result in a type-checking failure in this
> case, I attempted to use built-in functions (a GIMPLE_CALL to represent
> the runtime library call).  The name of the function here is target
> specific and can be obtained from sdivmod_optab, so the builtin
> function name defined at the tree level is not used.  I am not entirely
> sure this is the right approach, so I am attaching the first cut of the
> patch to get your feedback and understand the right approach to this
> problem.

If we don't want to expose new builtins to the user (I'm not sure we
want that), then using "internal functions" is an easier way to avoid
these issues (see gimple.h and internal-fn.(def|h)).

Generally the transform looks useful to me, as it moves forward with the
general idea of moving pattern recognition done during RTL expansion to
an earlier place.

As for the use of a larger integer type and shifts to represent the
modulo and division results, I don't think that's the very best idea.
Instead, resorting to a complex integer type as the return value looks
more appealing (similar to sincos using cexpi here).  That way you also
avoid the ugly hard-coding of bit sizes.

+  if (HAVE_divsi3
+      || (GET_MODE_BITSIZE (TYPE_MODE (type)) != 32)

Watch out for types whose TYPE_PRECISION is not the bitsize of their
mode.  Also, it should be GET_MODE_PRECISION here.

+      || !optab_libfunc (TYPE_UNSIGNED (type)? udivmod_optab : sdivmod_optab,
+                         TYPE_MODE (type)))

Targets that use a libfunc should also get this optimization, as it
always removes computations.  I think the proper test is for whether the
target can do division and/or modulus without using a libfunc, not
whether there is a divmod optab/libfunc.  Others knowing this piece of
the compiler better may want to comment here, of course.

Thanks,
Richard.
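
P.S.  A rough GNU C sketch of the complex-integer return value shape, to
make the suggestion above a bit more concrete.  This is illustrative
only: the actual transform would emit an internal-function GIMPLE_CALL
rather than user-visible C, divmod_sketch is a made-up name, and which
half carries the quotient versus the remainder is purely an assumption
of this sketch (mirroring how cse_sincos folds a sin/cos pair into one
cexpi call returning a complex value):

  /* Sketch: return both results of one division in a complex integer
     (a GNU C extension).  The quotient-in-real-part /
     remainder-in-imaginary-part split is only an assumption here.  */
  static _Complex int
  divmod_sketch (int a, int b)
  {
    _Complex int t = a / b;   /* quotient in the real part */
    __imag__ t = a % b;       /* remainder in the imaginary part; a real
                                 lowering would use a single combined
                                 division instead of two operations */
    return t;
  }

  void
  g (int a, int b, int *q, int *r)
  {
    _Complex int t = divmod_sketch (a, b);
    *q = __real__ t;
    *r = __imag__ t;
  }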