On Tue, Aug 30, 2011 at 7:01 PM, Artem Shinkarov
<[email protected]> wrote:
> On Tue, Aug 30, 2011 at 2:03 PM, Richard Guenther
> <[email protected]> wrote:
>> On Tue, Aug 30, 2011 at 4:31 AM, Artem Shinkarov
>> <[email protected]> wrote:
>>> Hi
>>>
>>> This is a patch for the explicit vector shuffling we have discussed a
>>> long time ago here:
>>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01092.html
>>>
>>> The new patch introduces the new tree code, as we agreed, and expands
>>> this code by checking the vshuffle pattern in the backend.
>>>
>>> The patch at the moment lacks of some examples, but mainly it works
>>> fine for me. It would be nice if i386 gurus could look into the way I
>>> am doing the expansion.
>>>
>>> Middle-end parts seems to be more or less fine, they have not changed
>>> much from the previous time.
>>
>> +@code{__builtin_shuffle (vec, mask)} and
>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>>
>> the latter would be __builtin_shuffle2.
>
> Why??
> That was the syntax we agreed on that elegantly handles both cases in one
> place.
Ah, then there was a case below that mentions __builtin_shuffle2 that
needs adjusting then.
>> +bool
>> +expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0,
>> + tree v1, tree mask)
>> +{
>> +#define inner_type_size(vec) \
>> + GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (vec))))
>>
>> missing comment. No #defines like this please, just initialize
>> two temporary variables.
>>
>> +
>> +rtx
>> +expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
>> +{
>>
>> comment.
>>
>> +vshuffle:
>> + gcc_assert (v1 == v0);
>> +
>> + icode = direct_optab_handler (vshuffle_optab, mode);
>>
>> hmm, so we don't have a vshuffle2 optab but always go via the
>> builtin function, but only for constant masks there? I wonder
>> if we should arrange for targets to only support a vshuffle
>> optab (thus, transition away from the builtin) and so
>> unconditionally have a vshuffle2 optab only (with possibly
>> equivalent v1 and v0?)
>
> I have only implemented the case with non-constant mask that supports
> only one argument. I think that it would be enough for the first
> version. Later we can introduce vshuffle2 pattern and reuse the code
> that expands vshuffle at the moment.
Ok.
>> I suppose Richard might remember what he had in mind back
>> when we discussed this.
>>
>> Index: gcc/c-typeck.c
>> ===================================================================
>> --- gcc/c-typeck.c (revision 177758)
>> +++ gcc/c-typeck.c (working copy)
>> @@ -2815,6 +2815,68 @@ build_function_call_vec (location_t loc,
>> && !check_builtin_function_arguments (fundecl, nargs, argarray))
>> return error_mark_node;
>>
>> + /* Typecheck a builtin function which is declared with variable
>> + argument list. */
>> + if (fundecl && DECL_BUILT_IN (fundecl)
>> + && DECL_BUILT_IN_CLASS (fundecl) == BUILT_IN_NORMAL)
>>
>> just add to check_builtin_function_arguments which is called right
>> in front of your added code.
>>
>> + /* Here we change the return type of the builtin function
>> + from int f(...) --> t f(...) where t is a type of the
>> + first argument. */
>> + fundecl = copy_node (fundecl);
>> + TREE_TYPE (fundecl) = build_function_type (TREE_TYPE (firstarg),
>> + TYPE_ARG_TYPES (TREE_TYPE
>> (fundecl)));
>> + function = build_fold_addr_expr (fundecl);
>>
>> oh, hum - now I remember ;) Eventually the C frontend should handle
>> this not via the function call mechanism but similar to how Joseph
>> added __builtin_complex support with
>>
>> 2011-08-19 Joseph Myers <[email protected]>
>>
>> * c-parser.c (c_parser_postfix_expression): Handle
>> RID_BUILTIN_COMPLEX.
>> * doc/extend.texi (__builtin_complex): Document.
>>
>> and then emit VEC_SHUFFLE_EXPRs directly from the frontend. Joseph?
>>
>> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix,
>> value)
>> - if (!CONSTANT_CLASS_P (value))
>> + if (!CONSTANT_CLASS_P (value))
>>
>> watch out for spurious whitespace changes.
>>
>> Index: gcc/gimplify.c
>> ===================================================================
>> --- gcc/gimplify.c (revision 177758)
>> +++ gcc/gimplify.c (working copy)
>> @@ -7050,6 +7050,7 @@ gimplify_expr (tree *expr_p, gimple_seq
>> break;
>>
>> case BIT_FIELD_REF:
>> + case VEC_SHUFFLE_EXPR:
>>
>> I don't think that's quite the right place given the is_gimple_lvalue
>> predicate on the first operand. More like
>>
>> case VEC_SHUFFLE_EXPR:
>> goto expr_3;
>>
>> +/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>
>>
>> typo, mask
>>
>> + means
>> +
>> + freach i in length (mask):
>> + A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
>> +*/
>> +DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
>>
>> what is the (is there any?) constraint on the operand types, especially
>> the mask type?
>>
>> Index: gcc/gimple.c
>> ===================================================================
>> --- gcc/gimple.c (revision 177758)
>> +++ gcc/gimple.c (working copy)
>> @@ -2623,6 +2623,7 @@ get_gimple_rhs_num_ops (enum tree_code c
>> || (SYM) == ADDR_EXPR \
>> || (SYM) == WITH_SIZE_EXPR \
>> || (SYM) == SSA_NAME \
>> + || (SYM) == VEC_SHUFFLE_EXPR \
>> || (SYM) == VEC_COND_EXPR) ? GIMPLE_SINGLE_RHS \
>> : GIMPLE_INVALID_RHS),
>> #define END_OF_BASE_TREE_CODES (unsigned char) GIMPLE_INVALID_RHS,
>>
>> please make it GIMPLE_TERNARY_RHS instead.
>>
>> which requires adjustment at least here:
>>
>> Index: gcc/tree-ssa-operands.c
>> ===================================================================
>> --- gcc/tree-ssa-operands.c (revision 177758)
>> +++ gcc/tree-ssa-operands.c (working copy)
>> @@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
>>
>> case COND_EXPR:
>> case VEC_COND_EXPR:
>> + case VEC_SHUFFLE_EXPR:
>>
>>
>> I think it would be nicer if the builtin would be handled by the frontend
>> not as builtin but like __builtin_complex and we'd just deal with
>> VEC_SHUFFLE_EXPR throughout the middle-end, eventually
>> lowering it in tree-vect-generic.c. So I didn't look at the lowering
>> code in detail because that would obviously change then.
>>
>> Defering to Joseph for a decision here and to x86 maintainers for
>> the target specific bits.
>>
>> Thanks,
>> Richard.
>
> I'll go and see how the __builtin_complex are treated, and try to
> adjust the patch.
Thanks, also see Josephs comments on this.
Richard.
>
> Thanks,
> Artem.
>
>>> ChangeLog:
>>> 2011-08-30 Artjoms Sinkarovs <[email protected]>
>>>
>>> gcc/
>>> * optabs.c (expand_vec_shuffle_expr_p): New function. Checks
>>> if given expression can be expanded by the target.
>>> (expand_vec_shuffle_expr): New function. Expand VEC_SHUFFLE_EXPR
>>> using target vector instructions.
>>> * optabs.h: New optab vshuffle.
>>> (expand_vec_shuffle_expr_p): New prototype.
>>> (expand_vec_shuffle_expr): New prototype.
>>> * genopinit.c: Adjust to support vecshuffle.
>>> * builtins.def: New builtin __builtin_shuffle.
>>> * c-typeck.c (build_function_call_vec): Typecheck
>>> __builtin_shuffle, allowing only two or three arguments.
>>> Change the type of builtin depending on the arguments.
>>> (digest_init): Warn when constructor has less elements than
>>> vector type.
>>> * gimplify.c (gimplify_exp): Adjusted to support VEC_SHUFFLE_EXPR.
>>> * tree.def: New tree code VEC_SHUFFLE_EXPR.
>>> * tree-vect-generic.c (vector_element): New function. Returns an
>>> element of the vector at the given position.
>>> (lower_builtin_shuffle): Change builtin_shuffle with VEC_SHUFLLE_EXPR
>>> or expand an expression piecewise.
>>> (expand_vector_operations_1): Adjusted.
>>> (gate_expand_vector_operations_noop): New gate function.
>>> * gimple.c (get_gimple_rhs_num_ops): Adjust.
>>> * passes.c: Move veclower down.
>>> * tree-pretty-print.c (dump_generic_node): Recognize
>>> VEC_SHUFFLE_EXPR as valid expression.
>>> * tree-ssa-operands: Adjust.
>>>
>>> gcc/config/i386
>>> * sse.md: (sseshuffint) New mode_attr. Correspondence between the
>>> vector and the type of the mask when shuffling.
>>> (vecshuffle<mode>): New expansion.
>>> * i386-protos.h (ix86_expand_vshuffle): New prototype.
>>> * i386.c (ix86_expand_vshuffle): Expand vshuffle using pshufb.
>>> (ix86_vectorize_builtin_vec_perm_ok): Adjust.
>>>
>>> gcc/doc
>>> * extend.texi: Adjust.
>>>
>>> gcc/testsuite
>>> * gcc.c-torture/execute/vect-shuffle-2.c: New test.
>>> * gcc.c-torture/execute/vect-shuffle-4.c: New test.
>>> * gcc.c-torture/execute/vect-shuffle-1.c: New test.
>>> * gcc.c-torture/execute/vect-shuffle-3.c: New test.
>>>
>>> bootstrapped on x86_64-unknown-linux-gnu. The AVX parts are not
>>> tested, because I don't have actual hardware. It works with -mavx, the
>>> assembler code looks fine to me. I'll test it on a real hardware in
>>> couple of days.
>>>
>>>
>>>
>>> Thanks,
>>> Artem Shinkarov.
>>>
>>
>