Kenneth Zadeck <[email protected]> writes:
>>> + There are two useful preprocessor defines for use by maintainers:
>>> +
>>> + #define LOG_COSTS
>>> +
>>> + if you wish to see the actual cost estimates that are being used
>>> + for each mode wider than word mode and the cost estimates for zero
>>> + extension and the shifts. This can be useful when port maintainers
>>> + are tuning insn rtx costs.
>>> +
>>> + #define FORCE_LOWERING
>>> +
>>> + if you wish to test the pass with all the transformation forced on.
>>> + This can be useful for finding bugs in the transformations.
>> Must admit I'm not keen on these kinds of macro, but it's Ian's call.
>>
>> Idea for the future (i.e. not this patch) is to have a dump file for
>> target initialisation.
> Imagine my horror when i did all of this as you had privately suggested
> and discovered that there was no way to log what i was doing. This is
> good enough until someone wants to fix the general problem.
>
>>> +/* This pass can transform 4 different operations: move, ashift,
>>> + lshiftrt, and zero_extend. There is a boolean vector for move
>>> + splitting that is indexed by mode and is true for each mode that is
>>> + to have its copies split. The other three operations are only done
>>> + for one mode so they are only controlled by a single boolean. */
>> As mentioned privately, whether this is profitable for shifts depends
>> to some extent on the shift amount. GCC already supports targets where
>> this transformation would be OK for some shift amounts but not others.
>> So for shifts, I think this should be an array of HOST_BITS_PER_WIDE_INT
>> booleans rather than just one.
>>
>> More comments below about how this filters through your other changes.
> I think that you actually are missing what i am doing with this. I
> look at 3 representative values that "should" discover any non
> uniformities. If any of them are profitable, i set this bit. Then at
> the point where i really have to pull the trigger on a real instance, i
> check the shift amount used at that spot to see if the individual shift
> is profitable.
No, I got that. I just think it's an unnecessary complication.
> I did this for two reasons. One of them was that i was a little
> concerned that HOST_BITS_PER_WIDE_INT on the smallest host was not as
> big as the bitsize of word_mode on the largest target (it could be
> but this knowledge is above my pay grade).
Ah, yes, sorry, I meant an array of BITS_PER_WORD booleans. I had
HOST_WIDE_INT on the brain after Mike's patch.
> The other reason was that i did not see this as a common operation and
> checking it on demand seemed like the winner.
But (at least after the other changes I mentioned) these overall
booleans cut out only a very small portion of find_decomposable_shift_zext.
I.e.:

  op = SET_SRC (set);
  if (GET_CODE (op) != ASHIFT
      && GET_CODE (op) != LSHIFTRT
      && GET_CODE (op) != ZERO_EXTEND)    <-- unified booleans checked here
    return 0;

  op_operand = XEXP (op, 0);
  if (!REG_P (SET_DEST (set)) || !REG_P (op_operand)
      || HARD_REGISTER_NUM_P (REGNO (SET_DEST (set)))
      || HARD_REGISTER_NUM_P (REGNO (op_operand))
      || !SCALAR_INT_MODE_P (GET_MODE (op)))
    return 0;

  if (GET_CODE (op) == ZERO_EXTEND)
    {
      if (GET_MODE (op_operand) != word_mode
          || GET_MODE_BITSIZE (GET_MODE (op)) != 2 * BITS_PER_WORD)
        return 0;
    }
  else /* left or right shift */
    {
      <--- specific booleans checked here
      if (!CONST_INT_P (XEXP (op, 1))
          || INTVAL (XEXP (op, 1)) < BITS_PER_WORD
          || GET_MODE_BITSIZE (GET_MODE (op_operand)) != 2 * BITS_PER_WORD)
        return 0;
    }

It seems better (and simpler) not to prejudge which shift amounts are
interesting and instead cache the "win or no win" flag for each value.
As I say, this is all in the context of this pass not being interesting
for modes where the split move is strictly more expensive than the
unified move, regardless of shift & zext costs.
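
To make the caching idea concrete, here is a rough standalone sketch, not
code from the patch. Everything prefixed sketch_/SKETCH_ is hypothetical:
the word size is hard-coded where GCC would use BITS_PER_WORD, and the two
cost functions are invented stand-ins for whatever the target's rtx costs
would report. Since the pass only looks at shift amounts in
[BITS_PER_WORD, 2 * BITS_PER_WORD), one flag per amount in that range is
enough.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for BITS_PER_WORD on some 32-bit target.  */
#define SKETCH_BITS_PER_WORD 32

/* Invented cost of doing the shift in the double-word mode.  */
static int
sketch_wide_shift_cost (int amount)
{
  (void) amount;
  return 4;
}

/* Invented cost of the split sequence.  Deliberately non-uniform:
   a shift by exactly one word is just a move, small residual shifts
   are cheap, large residual shifts are expensive.  */
static int
sketch_split_shift_cost (int amount)
{
  if (amount == SKETCH_BITS_PER_WORD)
    return 1;
  return amount < SKETCH_BITS_PER_WORD + 16 ? 3 : 5;
}

/* One "win or no win" flag per shift amount, indexed by
   AMOUNT - SKETCH_BITS_PER_WORD.  */
static bool sketch_split_shift_wins[SKETCH_BITS_PER_WORD];

/* Fill the cache once, e.g. at target initialisation.  */
static void
sketch_init_shift_cache (void)
{
  for (int i = 0; i < SKETCH_BITS_PER_WORD; i++)
    {
      int amount = SKETCH_BITS_PER_WORD + i;
      sketch_split_shift_wins[i]
        = sketch_split_shift_cost (amount) < sketch_wide_shift_cost (amount);
    }
}

/* Return true if splitting a double-word shift by AMOUNT is a win.
   Amounts outside the range the pass handles are never split.  */
static bool
sketch_shift_is_profitable (int amount)
{
  if (amount < SKETCH_BITS_PER_WORD || amount >= 2 * SKETCH_BITS_PER_WORD)
    return false;

  int index = amount - SKETCH_BITS_PER_WORD;
  assert (index >= 0 && index < SKETCH_BITS_PER_WORD);
  return sketch_split_shift_wins[index];
}
```

With something like this, the check at the "specific booleans" point is a
single array lookup, and there is no need for a separate summary boolean
computed from a few representative amounts.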
Richard