Richard Henderson <r...@redhat.com> writes: > On 12/22/2011 11:33 AM, Richard Sandiford wrote: >> was it simply the mode punning from HI to V4HI that stops us from >> using the loongson_pinsr<V_suffix>_N insns? E.g. something like: > > Mostly... > >> I realise this isn't exactly simpler in code terms, but it avoids the >> unspec and avoids the dual mode on match_operand 2 (HImode for matching, >> SImode for generation). And with HImode not being valid for FPRs, >> I'd have expected: >> >> (subreg:V4HI (reg:HI R) 0) >> >> to be as "bad" as: >> >> (subreg:SI (reg:HI R) 0) > > Well, for one thing, note that I actually copy that subreg into a real > SImode pseudo. So that sub-reggy-ness is fairly well forced to happen > in the integer registers. At which point the SImode value is perfectly > happy to be copied into the fp regs.
You mean the register allocator will prefer a GPR for the SImode pseudo, so the need for one won't be hidden in an input reload of the subreg? Not sure -- maybe it would get a GPR, but with one definition as "d" and one use as "f", I'm not sure we can really rely on that. > The only other alternative that I see is to do the same with DImode > and then subreg from there to V4HImode. While I only tested mips64el, > I would guess that using DImode would perform worse in 32-bit mode. Well, I wasn't thinking of using DImode, more going direct from HImode to V4HImode, but I suppose the same thing would apply for the subreg reload. Never mind then. >>> /* Vector modes. */ >>> -VECTOR_MODES (INT, 8); /* V8QI V4HI V2SI */ >>> -VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */ >>> -VECTOR_MODES (INT, 4); /* V4QI V2HI */ >>> +VECTOR_MODES (INT, 8); /* V8QI V4HI V2SI */ >>> +VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */ >>> +VECTOR_MODES (INT, 4); /* V4QI V2HI */ >> >> Not sure about this bit. > > That's just re-aligning the comment columns so that V16QI fits. OK, but there wasn't a V16QI comment that I could see. >>> +/* Double-sized vector modes for vec_concat. */ >>> +VECTOR_MODE (INT, QI, 16); >>> +VECTOR_MODE (INT, HI, 8); >>> +VECTOR_MODE (INT, SI, 4); >>> +VECTOR_MODE (FLOAT, SF, 4); >> >> It occured to me later that there might be an ABI impact with this >> on n32 and n64, due to the historical mistake of defining things >> in terms of modes (partly for the sake of libcalls): > > Hmm. Except that since mips_vector_mode_supported_p disallows > V4SFmode, you'll never wind up with variables of that type. > > If the mips folk ever come up with another isa extension that > operates on larger register sets, this will make a difference... I'd still rather be safe than sorry when it comes to ABI stuff. We've defined the mode, and we've long allowed users to define: typedef float v4sf __attribute__((vector_size(16))); regardless of backend support. I realise we shouldn't be matching the two up in the absence of bugs, but... >> Is there an explicit target-independent requirement that >> TARGET_VECTORIZE_VEC_PERM_CONST_OK is only called in a >> context where insns can be emitted? > > We start a sequence inside vec_perm_const_ok that prevents emit_insn > from actually doing anything. It works everwhere else... OK. >>> + /* The backend (vec_select (vec_concat)) patterns are not duplicated >>> + for single-operand. Try once with the original un-folded selector. */ >>> + if (mips_expand_vec_perm_const_1 (&d)) >>> + return true; >> >> What sort of input does this handle, and why don't we want the same thing >> for case 1 & 2 single vectors? > > V4HI a; > __builtin_select (a, a, (V4HI){ 0, 4, 1, 5 }) > > When we called vec_perm_const_ok, we looked at the elements of the selector > and saw that they come from two different operands. So we performed the test > with op0 != op1 and one_operand_p = false. > > Here in expand_vec_perm_const, we see that a == a and want to fold to > { 0, 0, 1, 1 }. Well, this caused us problems in the i386 port where > we had some double-operand patterns that didn't match a single-operand > pattern. OK, but just so that I understand: does that mean we're missing some single-operand cases that ought to be there, and this code is just stopping us from ICEing when we hit them? Would it be worth having a gcc_checking_assert there? Richard