On Tue, Aug 30, 2011 at 7:01 PM, Artem Shinkarov <artyom.shinkar...@gmail.com> wrote: > On Tue, Aug 30, 2011 at 2:03 PM, Richard Guenther > <richard.guent...@gmail.com> wrote: >> On Tue, Aug 30, 2011 at 4:31 AM, Artem Shinkarov >> <artyom.shinkar...@gmail.com> wrote: >>> Hi >>> >>> This is a patch for the explicit vector shuffling we have discussed a >>> long time ago here: >>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01092.html >>> >>> The new patch introduces the new tree code, as we agreed, and expands >>> this code by checking the vshuffle pattern in the backend. >>> >>> The patch at the moment lacks of some examples, but mainly it works >>> fine for me. It would be nice if i386 gurus could look into the way I >>> am doing the expansion. >>> >>> Middle-end parts seems to be more or less fine, they have not changed >>> much from the previous time. >> >> +@code{__builtin_shuffle (vec, mask)} and >> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct >> >> the latter would be __builtin_shuffle2. > > Why?? > That was the syntax we agreed on that elegantly handles both cases in one > place.
Ah, then there was a case below that mentions __builtin_shuffle2 that needs adjusting then. >> +bool >> +expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0, >> + tree v1, tree mask) >> +{ >> +#define inner_type_size(vec) \ >> + GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (vec)))) >> >> missing comment. No #defines like this please, just initialize >> two temporary variables. >> >> + >> +rtx >> +expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target) >> +{ >> >> comment. >> >> +vshuffle: >> + gcc_assert (v1 == v0); >> + >> + icode = direct_optab_handler (vshuffle_optab, mode); >> >> hmm, so we don't have a vshuffle2 optab but always go via the >> builtin function, but only for constant masks there? I wonder >> if we should arrange for targets to only support a vshuffle >> optab (thus, transition away from the builtin) and so >> unconditionally have a vshuffle2 optab only (with possibly >> equivalent v1 and v0?) > > I have only implemented the case with non-constant mask that supports > only one argument. I think that it would be enough for the first > version. Later we can introduce vshuffle2 pattern and reuse the code > that expands vshuffle at the moment. Ok. >> I suppose Richard might remember what he had in mind back >> when we discussed this. >> >> Index: gcc/c-typeck.c >> =================================================================== >> --- gcc/c-typeck.c (revision 177758) >> +++ gcc/c-typeck.c (working copy) >> @@ -2815,6 +2815,68 @@ build_function_call_vec (location_t loc, >> && !check_builtin_function_arguments (fundecl, nargs, argarray)) >> return error_mark_node; >> >> + /* Typecheck a builtin function which is declared with variable >> + argument list. */ >> + if (fundecl && DECL_BUILT_IN (fundecl) >> + && DECL_BUILT_IN_CLASS (fundecl) == BUILT_IN_NORMAL) >> >> just add to check_builtin_function_arguments which is called right >> in front of your added code. >> >> + /* Here we change the return type of the builtin function >> + from int f(...) --> t f(...) where t is a type of the >> + first argument. */ >> + fundecl = copy_node (fundecl); >> + TREE_TYPE (fundecl) = build_function_type (TREE_TYPE (firstarg), >> + TYPE_ARG_TYPES (TREE_TYPE >> (fundecl))); >> + function = build_fold_addr_expr (fundecl); >> >> oh, hum - now I remember ;) Eventually the C frontend should handle >> this not via the function call mechanism but similar to how Joseph >> added __builtin_complex support with >> >> 2011-08-19 Joseph Myers <jos...@codesourcery.com> >> >> * c-parser.c (c_parser_postfix_expression): Handle >> RID_BUILTIN_COMPLEX. >> * doc/extend.texi (__builtin_complex): Document. >> >> and then emit VEC_SHUFFLE_EXPRs directly from the frontend. Joseph? >> >> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, >> value) >> - if (!CONSTANT_CLASS_P (value)) >> + if (!CONSTANT_CLASS_P (value)) >> >> watch out for spurious whitespace changes. >> >> Index: gcc/gimplify.c >> =================================================================== >> --- gcc/gimplify.c (revision 177758) >> +++ gcc/gimplify.c (working copy) >> @@ -7050,6 +7050,7 @@ gimplify_expr (tree *expr_p, gimple_seq >> break; >> >> case BIT_FIELD_REF: >> + case VEC_SHUFFLE_EXPR: >> >> I don't think that's quite the right place given the is_gimple_lvalue >> predicate on the first operand. More like >> >> case VEC_SHUFFLE_EXPR: >> goto expr_3; >> >> +/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks> >> >> typo, mask >> >> + means >> + >> + freach i in length (mask): >> + A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]] >> +*/ >> +DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3) >> >> what is the (is there any?) constraint on the operand types, especially >> the mask type? >> >> Index: gcc/gimple.c >> =================================================================== >> --- gcc/gimple.c (revision 177758) >> +++ gcc/gimple.c (working copy) >> @@ -2623,6 +2623,7 @@ get_gimple_rhs_num_ops (enum tree_code c >> || (SYM) == ADDR_EXPR \ >> || (SYM) == WITH_SIZE_EXPR \ >> || (SYM) == SSA_NAME \ >> + || (SYM) == VEC_SHUFFLE_EXPR \ >> || (SYM) == VEC_COND_EXPR) ? GIMPLE_SINGLE_RHS \ >> : GIMPLE_INVALID_RHS), >> #define END_OF_BASE_TREE_CODES (unsigned char) GIMPLE_INVALID_RHS, >> >> please make it GIMPLE_TERNARY_RHS instead. >> >> which requires adjustment at least here: >> >> Index: gcc/tree-ssa-operands.c >> =================================================================== >> --- gcc/tree-ssa-operands.c (revision 177758) >> +++ gcc/tree-ssa-operands.c (working copy) >> @@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex >> >> case COND_EXPR: >> case VEC_COND_EXPR: >> + case VEC_SHUFFLE_EXPR: >> >> >> I think it would be nicer if the builtin would be handled by the frontend >> not as builtin but like __builtin_complex and we'd just deal with >> VEC_SHUFFLE_EXPR throughout the middle-end, eventually >> lowering it in tree-vect-generic.c. So I didn't look at the lowering >> code in detail because that would obviously change then. >> >> Defering to Joseph for a decision here and to x86 maintainers for >> the target specific bits. >> >> Thanks, >> Richard. > > I'll go and see how the __builtin_complex are treated, and try to > adjust the patch. Thanks, also see Josephs comments on this. Richard. > > Thanks, > Artem. > >>> ChangeLog: >>> 2011-08-30 Artjoms Sinkarovs <artyom.shinkar...@gmailc.com> >>> >>> gcc/ >>> * optabs.c (expand_vec_shuffle_expr_p): New function. Checks >>> if given expression can be expanded by the target. >>> (expand_vec_shuffle_expr): New function. Expand VEC_SHUFFLE_EXPR >>> using target vector instructions. >>> * optabs.h: New optab vshuffle. >>> (expand_vec_shuffle_expr_p): New prototype. >>> (expand_vec_shuffle_expr): New prototype. >>> * genopinit.c: Adjust to support vecshuffle. >>> * builtins.def: New builtin __builtin_shuffle. >>> * c-typeck.c (build_function_call_vec): Typecheck >>> __builtin_shuffle, allowing only two or three arguments. >>> Change the type of builtin depending on the arguments. >>> (digest_init): Warn when constructor has less elements than >>> vector type. >>> * gimplify.c (gimplify_exp): Adjusted to support VEC_SHUFFLE_EXPR. >>> * tree.def: New tree code VEC_SHUFFLE_EXPR. >>> * tree-vect-generic.c (vector_element): New function. Returns an >>> element of the vector at the given position. >>> (lower_builtin_shuffle): Change builtin_shuffle with VEC_SHUFLLE_EXPR >>> or expand an expression piecewise. >>> (expand_vector_operations_1): Adjusted. >>> (gate_expand_vector_operations_noop): New gate function. >>> * gimple.c (get_gimple_rhs_num_ops): Adjust. >>> * passes.c: Move veclower down. >>> * tree-pretty-print.c (dump_generic_node): Recognize >>> VEC_SHUFFLE_EXPR as valid expression. >>> * tree-ssa-operands: Adjust. >>> >>> gcc/config/i386 >>> * sse.md: (sseshuffint) New mode_attr. Correspondence between the >>> vector and the type of the mask when shuffling. >>> (vecshuffle<mode>): New expansion. >>> * i386-protos.h (ix86_expand_vshuffle): New prototype. >>> * i386.c (ix86_expand_vshuffle): Expand vshuffle using pshufb. >>> (ix86_vectorize_builtin_vec_perm_ok): Adjust. >>> >>> gcc/doc >>> * extend.texi: Adjust. >>> >>> gcc/testsuite >>> * gcc.c-torture/execute/vect-shuffle-2.c: New test. >>> * gcc.c-torture/execute/vect-shuffle-4.c: New test. >>> * gcc.c-torture/execute/vect-shuffle-1.c: New test. >>> * gcc.c-torture/execute/vect-shuffle-3.c: New test. >>> >>> bootstrapped on x86_64-unknown-linux-gnu. The AVX parts are not >>> tested, because I don't have actual hardware. It works with -mavx, the >>> assembler code looks fine to me. I'll test it on a real hardware in >>> couple of days. >>> >>> >>> >>> Thanks, >>> Artem Shinkarov. >>> >> >