Looks good, but with this: Richard Henderson <r...@redhat.com> writes: > +(define_insn "*vec_setv4hi" > + [(set (match_operand:V4HI 0 "register_operand" "=f") > + (unspec:V4HI [(match_operand:V4HI 1 "register_operand" "f") > + (match_operand:SI 2 "register_operand" "f") > + (match_operand:SI 3 "const_0_to_3_operand" "")] > + UNSPEC_LOONGSON_PINSRH))] > "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS" > - "pinsr<V_suffix>_3\t%0,%1,%2" > + "pinsrh_%3\t%0,%1,%2" > [(set_attr "type" "fdiv")]) > > +(define_expand "vec_setv4hi" > + [(set (match_operand:V4HI 0 "register_operand" "=f") > + (unspec:V4HI [(match_operand:V4HI 1 "register_operand" "f") > + (match_operand:HI 2 "register_operand" "f") > + (match_operand:SI 3 "const_0_to_3_operand" "")] > + UNSPEC_LOONGSON_PINSRH))] > + "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS" > +{ > + rtx ext = gen_reg_rtx (SImode); > + emit_move_insn (ext, gen_lowpart (SImode, operands[1])); > + operands[1] = ext; > +})
was it simply the mode punning from HI to V4HI that stops us from using the loongson_pinsr<V_suffix>_N insns? E.g. something like: (define_expand "vec_setv4hi" [(set (match_operand:V4HI 0 "register_operand" "=f") (unspec:V4HI [(match_operand:V4HI 1 "register_operand" "f") (match_operand:HI 2 "register_operand" "f") (match_operand:SI 3 "const_0_to_3_operand" "")] UNSPEC_LOONGSON_PINSRH))] "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS" { unsigned char perm[4]; int i; for (i = 0; i < 4; i++) perm[i] = (i == INTVAL (operands[3]) ? 4 : i); operands[2] = copy_to_reg (gen_lowpart (V4HImode, operands[2])); if (!mips_expand_vselect_vconcat (operands[0], operands[1], operands[2], perm, 4)) gcc_unreachable (); DONE; }) I realise this isn't exactly simpler in code terms, but it avoids the unspec and avoids the dual mode on match_operand 2 (HImode for matching, SImode for generation). And with HImode not being valid for FPRs, I'd have expected: (subreg:V4HI (reg:HI R) 0) to be as "bad" as: (subreg:SI (reg:HI R) 0) > diff --git a/gcc/config/mips/mips-modes.def b/gcc/config/mips/mips-modes.def > index b9c508b..85861a9 100644 > --- a/gcc/config/mips/mips-modes.def > +++ b/gcc/config/mips/mips-modes.def > @@ -26,9 +26,15 @@ RESET_FLOAT_FORMAT (DF, mips_double_format); > FLOAT_MODE (TF, 16, mips_quad_format); > > /* Vector modes. */ > -VECTOR_MODES (INT, 8); /* V8QI V4HI V2SI */ > -VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */ > -VECTOR_MODES (INT, 4); /* V4QI V2HI */ > +VECTOR_MODES (INT, 8); /* V8QI V4HI V2SI */ > +VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */ > +VECTOR_MODES (INT, 4); /* V4QI V2HI */ Not sure about this bit. > +/* Double-sized vector modes for vec_concat. */ > +VECTOR_MODE (INT, QI, 16); > +VECTOR_MODE (INT, HI, 8); > +VECTOR_MODE (INT, SI, 4); > +VECTOR_MODE (FLOAT, SF, 4); It occured to me later that there might be an ABI impact with this on n32 and n64, due to the historical mistake of defining things in terms of modes (partly for the sake of libcalls): /* Scalar, complex and vector floating-point types are passed in floating-point registers, as long as this is a named rather than a variable argument. */ info->fpr_p = (named && (type == 0 || FLOAT_TYPE_P (type)) && (GET_MODE_CLASS (mode) == MODE_FLOAT || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) && GET_MODE_UNIT_SIZE (mode) <= UNITS_PER_FPVALUE); UNITS_PER_FPVALUE is 16 because of the 128-bit long double. I think every: GET_MODE_CLASS (...) == MODE_VECTOR_FLOAT in mips.c needs to be changed to "mode == V2SFmode". > +/* Construct (set target (vec_select op0 (parallel perm))) and > + return true if that's a valid instruction in the active ISA. */ > + > +static bool > +expand_vselect (rtx target, rtx op0, const unsigned char *perm, unsigned > nelt) Stupid nitpick, but there's a mixture of statics without "mips_" and statics with. Would be nice to have them all as all "mips_", just in case. > +/* Construct (set target (vec_select op0 (parallel perm))) and > + return true if that's a valid instruction in the active ISA. */ > + > +static bool > +expand_vselect (rtx target, rtx op0, const unsigned char *perm, unsigned > nelt) > +{ > + rtx rperm[MAX_VECT_LEN], x; > + unsigned i; > + > + for (i = 0; i < nelt; ++i) > + rperm[i] = GEN_INT (perm[i]); > + > + x = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (nelt, rperm)); > + x = gen_rtx_VEC_SELECT (GET_MODE (target), op0, x); > + x = gen_rtx_SET (VOIDmode, target, x); > + > + x = emit_insn (x); > + if (recog_memoized (x) < 0) > + { > + remove_insn (x); > + return false; > + } > + return true; > +} Is there an explicit target-independent requirement that TARGET_VECTORIZE_VEC_PERM_CONST_OK is only called in a context where insns can be emitted? Reload seems to get by with make_insn_raw. Maybe we could add a "testing?" parameter and only call add_insn if the parameter is false, and if recog_memoized succeeds. > + /* The backend (vec_select (vec_concat)) patterns are not duplicated > + for single-operand. Try once with the original un-folded selector. */ > + if (mips_expand_vec_perm_const_1 (&d)) > + return true; What sort of input does this handle, and why don't we want the same thing for case 1 & 2 single vectors? Richard