Richard Henderson <[email protected]> writes:
> On 12/22/2011 11:33 AM, Richard Sandiford wrote:
>> was it simply the mode punning from HI to V4HI that stops us from
>> using the loongson_pinsr<V_suffix>_N insns? E.g. something like:
>
> Mostly...
>
>> I realise this isn't exactly simpler in code terms, but it avoids the
>> unspec and avoids the dual mode on match_operand 2 (HImode for matching,
>> SImode for generation). And with HImode not being valid for FPRs,
>> I'd have expected:
>>
>> (subreg:V4HI (reg:HI R) 0)
>>
>> to be as "bad" as:
>>
>> (subreg:SI (reg:HI R) 0)
>
> Well, for one thing, note that I actually copy that subreg into a real
> SImode pseudo. So that sub-reggy-ness is fairly well forced to happen
> in the integer registers. At which point the SImode value is perfectly
> happy to be copied into the fp regs.
You mean the register allocator will prefer a GPR for the SImode pseudo,
so the need for one won't be hidden in an input reload of the subreg?
Not sure -- maybe it would get a GPR, but with one definition as "d"
and one use as "f", I'm not sure we can really rely on that.
> The only other alternative that I see is to do the same with DImode
> and then subreg from there to V4HImode. While I only tested mips64el,
> I would guess that using DImode would perform worse in 32-bit mode.
Well, I wasn't thinking of using DImode, more going direct from
HImode to V4HImode, but I suppose the same thing would apply for
the subreg reload. Never mind then.
>>> /* Vector modes. */
>>> -VECTOR_MODES (INT, 8); /* V8QI V4HI V2SI */
>>> -VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */
>>> -VECTOR_MODES (INT, 4); /* V4QI V2HI */
>>> +VECTOR_MODES (INT, 8); /* V8QI V4HI V2SI */
>>> +VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */
>>> +VECTOR_MODES (INT, 4); /* V4QI V2HI */
>>
>> Not sure about this bit.
>
> That's just re-aligning the comment columns so that V16QI fits.
OK, but there wasn't a V16QI comment that I could see.
>>> +/* Double-sized vector modes for vec_concat. */
>>> +VECTOR_MODE (INT, QI, 16);
>>> +VECTOR_MODE (INT, HI, 8);
>>> +VECTOR_MODE (INT, SI, 4);
>>> +VECTOR_MODE (FLOAT, SF, 4);
>>
>> It occured to me later that there might be an ABI impact with this
>> on n32 and n64, due to the historical mistake of defining things
>> in terms of modes (partly for the sake of libcalls):
>
> Hmm. Except that since mips_vector_mode_supported_p disallows
> V4SFmode, you'll never wind up with variables of that type.
>
> If the mips folk ever come up with another isa extension that
> operates on larger register sets, this will make a difference...
I'd still rather be safe than sorry when it comes to ABI stuff.
We've defined the mode, and we've long allowed users to define:
typedef float v4sf __attribute__((vector_size(16)));
regardless of backend support. I realise we shouldn't be matching
the two up in the absence of bugs, but...
>> Is there an explicit target-independent requirement that
>> TARGET_VECTORIZE_VEC_PERM_CONST_OK is only called in a
>> context where insns can be emitted?
>
> We start a sequence inside vec_perm_const_ok that prevents emit_insn
> from actually doing anything. It works everwhere else...
OK.
>>> + /* The backend (vec_select (vec_concat)) patterns are not duplicated
>>> + for single-operand. Try once with the original un-folded selector. */
>>> + if (mips_expand_vec_perm_const_1 (&d))
>>> + return true;
>>
>> What sort of input does this handle, and why don't we want the same thing
>> for case 1 & 2 single vectors?
>
> V4HI a;
> __builtin_select (a, a, (V4HI){ 0, 4, 1, 5 })
>
> When we called vec_perm_const_ok, we looked at the elements of the selector
> and saw that they come from two different operands. So we performed the test
> with op0 != op1 and one_operand_p = false.
>
> Here in expand_vec_perm_const, we see that a == a and want to fold to
> { 0, 0, 1, 1 }. Well, this caused us problems in the i386 port where
> we had some double-operand patterns that didn't match a single-operand
> pattern.
OK, but just so that I understand: does that mean we're missing some
single-operand cases that ought to be there, and this code is just
stopping us from ICEing when we hit them? Would it be worth having
a gcc_checking_assert there?
Richard