On Thu, Aug 25, 2011 at 2:00 PM, Richard Guenther
<[email protected]> wrote:
> On Thu, Aug 25, 2011 at 2:45 PM, Artem Shinkarov
> <[email protected]> wrote:
>> On Thu, Aug 25, 2011 at 12:39 PM, Richard Guenther
>> <[email protected]> wrote:
>>> On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov
>>> <[email protected]> wrote:
>>>> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther
>>>> <[email protected]> wrote:
>>>>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
>>>>> <[email protected]> wrote:
>>>>>> Here is a cleaned-up patch without the hook. Mostly it works in the way
>>>>>> we discussed.
>>>>>>
>>>>>> So I think it is the right time to do something about the vcond
>>>>>> patterns, which would allow me to get rid of the conversions that I
>>>>>> currently need to put all over the code.
>>>>>>
>>>>>> Also, at the moment the patch breaks the LTO frontend with a simple example:
>>>>>> #define vector(elcount, type) \
>>>>>> __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>>>
>>>>>> int main (int argc, char *argv[]) {
>>>>>> vector (4, float) f0;
>>>>>> vector (4, float) f1;
>>>>>>
>>>>>> f0 = f1 != f0
>>>>>> ? (vector (4, float)){-1,-1,-1,-1} : (vector (4,
>>>>>> float)){0,0,0,0};
>>>>>>
>>>>>> return (int)f0[argc];
>>>>>> }
>>>>>>
>>>>>> test-lto.c:8:14: internal compiler error: in convert, at
>>>>>> lto/lto-lang.c:1244
>>>>>>
>>>>>> I looked into the file; the conversion function is defined as
>>>>>> gcc_unreachable (). I am not very familiar with LTO, so I don't really
>>>>>> know what the right way to treat the conversions is.
>>>>>>
>>>>>> And I seriously need help with backend patterns.
>>>>>
>>>>> On the patch.
>>>>>
>>>>> The documentation needs review by a native English speaker, but here
>>>>> are some factual comments:
>>>>>
>>>>> +In C vector comparison is supported within standard comparison operators:
>>>>>
>>>>> it should read 'In GNU C' here and everywhere else, as this is a GNU
>>>>> extension.
>>>>>
>>>>> The result of the
>>>>> +comparison is a signed integer-type vector where the size of each
>>>>> +element must be the same as the size of compared vectors element.
>>>>>
>>>>> The result type of the comparison is determined by the C frontend;
>>>>> it isn't under the control of the user. What you are implying here is
>>>>> restrictions on vector assignments, which are documented elsewhere.
>>>>> I'd just say
>>>>>
>>>>> 'The result of the comparison is a vector of the same width and number
>>>>> of elements as the comparison operands with a signed integral element
>>>>> type.'
>>>>>
>>>>> +In addition to the vector comparison C supports conditional expressions
>>>>>
>>>>> See above.
>>>>>
>>>>> +For the convenience condition in the vector conditional can be just a
>>>>> +vector of signed integer type.
>>>>>
>>>>> 'of integer type.' I don't see a reason to disallow unsigned integers;
>>>>> they can be compared against zero equally well.
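>>>>>
>>>>> (E.g., with a hypothetical unsigned element type:
>>>>>
>>>>> typedef unsigned int v4ui __attribute__ ((vector_size (16)));
>>>>>
>>>>> a mask m of type v4ui still holds all-ones or all-zeros per element,
>>>>> so m ? c : d can compare m against {0,0,0,0} just as well as a
>>>>> signed mask could.)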
>>>>
>>>> I'll have a final go at the documentation; it is untouched from the old
>>>> patches.
>>>>
>>>>> Index: gcc/targhooks.h
>>>>> ===================================================================
>>>>> --- gcc/targhooks.h (revision 177665)
>>>>> +++ gcc/targhooks.h (working copy)
>>>>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization
>>>>> extern tree default_builtin_reciprocal (unsigned int, bool, bool);
>>>>>
>>>>> extern bool default_builtin_vector_alignment_reachable (const_tree,
>>>>> bool);
>>>>> +
>>>>> extern bool
>>>>> default_builtin_support_vector_misalignment (enum machine_mode mode,
>>>>> const_tree,
>>>>>
>>>>> spurious whitespace change.
>>>>
>>>> Yes, thanks.
>>>>
>>>>> Index: gcc/optabs.c
>>>>> ===================================================================
>>>>> --- gcc/optabs.c (revision 177665)
>>>>> +++ gcc/optabs.c (working copy)
>>>>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
>>>>> ...
>>>>> + else
>>>>> + {
>>>>> + rtx rtx_op0;
>>>>> + rtx vec;
>>>>> +
>>>>> + rtx_op0 = expand_normal (op0);
>>>>> + comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX);
>>>>> + vec = CONST0_RTX (mode);
>>>>> +
>>>>> + create_output_operand (&ops[0], target, mode);
>>>>> + create_input_operand (&ops[1], rtx_op1, mode);
>>>>> + create_input_operand (&ops[2], rtx_op2, mode);
>>>>> + create_input_operand (&ops[3], comparison, mode);
>>>>> + create_input_operand (&ops[4], rtx_op0, mode);
>>>>> + create_input_operand (&ops[5], vec, mode);
>>>>>
>>>>> this still builds the fake(?) != comparison, but as you said you need help
>>>>> with the .md part if we want to use a machine-specific pattern for this
>>>>> case (which we eventually want, for the sake of using XOP vcond).
>>>>
>>>> Yes, I am waiting for it. This is the only way at the moment to make
>>>> sure that in
>>>> m = a > b;
>>>> r = m ? c : d;
>>>>
>>>> m in the vcond is not transformed into m != 0.
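>>>>
>>>> In GNU C terms the case I care about is roughly this (a hypothetical
>>>> example, not from the patch testsuite):
>>>>
>>>> typedef int v4si __attribute__ ((vector_size (16)));
>>>>
>>>> v4si f (v4si a, v4si b, v4si c, v4si d)
>>>> {
>>>>   v4si m = a > b;    /* mask: -1 or 0 in each element */
>>>>   return m ? c : d;  /* should reuse m directly, not rebuild m != {0,0,0,0} */
>>>> }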
>>>>
>>>>> Index: gcc/target.h
>>>>> ===================================================================
>>>>> --- gcc/target.h (revision 177665)
>>>>> +++ gcc/target.h (working copy)
>>>>> @@ -51,6 +51,7 @@
>>>>> #define GCC_TARGET_H
>>>>>
>>>>> #include "insn-modes.h"
>>>>> +#include "gimple.h"
>>>>>
>>>>> #ifdef ENABLE_CHECKING
>>>>>
>>>>> spurious change.
>>>>
>>>> Old stuff, fixed.
>>>>
>>>>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
>>>>> floating-point, we can only do some of these simplifications.) */
>>>>> if (operand_equal_p (arg0, arg1, 0))
>>>>> {
>>>>> + tree arg0_type = TREE_TYPE (arg0);
>>>>> +
>>>>> switch (code)
>>>>> {
>>>>> case EQ_EXPR:
>>>>> - if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>>>>> - || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>>>>> + if (! FLOAT_TYPE_P (arg0_type)
>>>>> + || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>>>>> ...
>>>>
>>>> Ok.
>>>>
>>>>>
>>>>> Likewise.
>>>>>
>>>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>>>> case UNGE_EXPR:
>>>>> case UNEQ_EXPR:
>>>>> case LTGT_EXPR:
>>>>> + if (TREE_CODE (ops->type) == VECTOR_TYPE)
>>>>> + {
>>>>> + enum tree_code code = ops->code;
>>>>> + tree arg0 = ops->op0;
>>>>> + tree arg1 = ops->op1;
>>>>>
>>>>> move this code to do_store_flag (we really store a flag value). It should
>>>>> also simply do what expand_vec_cond_expr does; probably simply
>>>>> calling that with the {-1,...} and {0,...} extra args should work.
>>>>
>>>> I started to do that, but the code in do_store_flag is completely
>>>> different from what I am doing, and it looks confusing. I just call
>>>> expand_vec_cond_expr and that is it. I can write a separate function,
>>>> but the code is quite small.
>>>
>>> Hm? I see in your patch
>>>
>>> Index: gcc/expr.c
>>> ===================================================================
>>> --- gcc/expr.c (revision 177665)
>>> +++ gcc/expr.c (working copy)
>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>> case UNGE_EXPR:
>>> case UNEQ_EXPR:
>>> case LTGT_EXPR:
>>> + if (TREE_CODE (ops->type) == VECTOR_TYPE)
>>> + {
>>> + enum tree_code code = ops->code;
>>> + tree arg0 = ops->op0;
>>> + tree arg1 = ops->op1;
>>> + tree arg_type = TREE_TYPE (arg0);
>>> + tree el_type = TREE_TYPE (arg_type);
>>> + tree t, ifexp, if_true, if_false;
>>> +
>>> + el_type = lang_hooks.types.type_for_size (TYPE_PRECISION (el_type), 0);
>>> +
>>> +
>>> + ifexp = build2 (code, type, arg0, arg1);
>>> + if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
>>> + if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
>>> +
>>> + if (arg_type != type)
>>> + {
>>> + if_true = convert (arg_type, if_true);
>>> + if_false = convert (arg_type, if_false);
>>> + t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
>>> + t = convert (type, t);
>>> + }
>>> + else
>>> + t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
>>> +
>>> + return expand_expr (t,
>>> + modifier != EXPAND_STACK_PARM ? target : NULL_RTX,
>>> + tmode != VOIDmode ? tmode : mode,
>>> + modifier);
>>> + }
>>>
>>> that's not exactly "calling expand_vec_cond_expr".
>>
>> Well, actually it is. Keep in mind that a clean backend would imply
>> removing the conversions. But I'll make a function.
>
> Why does
>
> return expand_vec_cond_expr (build2 (ops->code, type, ops->op0, ops->op1),
>                              build_vector_from_val (type, build_int_cst (el_type, -1)),
>                              build_vector_from_val (type, build_int_cst (el_type, 0)));
>
> not work? If you push the conversions to expand_vec_cond_expr
> by doing them on RTL, you simplify things here and remove the requirement
> of doing them in the C frontend for VEC_COND_EXPR as well.
It does not work because vcond <a > b, c, d> requires a, b, c and d to
have the same type. Now, here we are dealing only with comparisons, so in
the case of floats we have vcond <f0 > f1, {-1,...}, {0,...}>, which we
have to transform into
(vsi) (vcond <f0 > f1, (vsf){-1,...}, (vsf){0,...}>).
Ok, so is it fine to make this conversion here for the real types?
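
Concretely, the case that needs the conversion is a comparison of float
vectors whose result is a signed integer vector, e.g. (a hypothetical
example, not from the patch):

typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

v4si g (v4sf f0, v4sf f1)
{
  return f0 > f1;  /* float operands, integer-vector result */
}
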
>>>>>
>>>>> As for the still required conversions, you should be able to delay those
>>>>> from the C frontend (and here) to expand_vec_cond_expr by, after
>>>>> expanding op1 and op2, wrapping a subreg around them with the proper mode
>>>>> (using convert_mode (GET_MODE (comparison), rtx_op1) should work),
>>>>> and then converting the result back to the original mode.
>>>>>
>>>>> I'll leave the C frontend pieces of the patch for review by Joseph, but
>>>>
>>>> The conversions are there until we fix the backend. When the backend is
>>>> able to digest f0 > f1 ? int0 : int1, all the conversions will go
>>>> away.
>>>>
>>>>> +static tree
>>>>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
>>>>>
>>>>> is missing a function comment.
>>>>
>>>> fixed.
>>>>
>>>>> +static tree
>>>>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
>>>>> + tree bitpos, tree bitsize, enum tree_code code)
>>>>> +{
>>>>> + tree cond;
>>>>> + tree comp_type;
>>>>> +
>>>>> + a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
>>>>> + b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
>>>>> +
>>>>> + comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
>>>>> +
>>>>>
>>>>> Use
>>>>>
>>>>> comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0);
>>>>>
>>>>> instead. But I think you don't want to use TYPE_PRECISION on
>>>>> FP types. Instead you want a signed integer type of the same (mode)
>>>>> size as the vector element type, thus
>>>>>
>>>>> comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
>>>>
>>>> Hm, I thought that at this stage we don't want to know anything about
>>>> modes. I mean, here I am really building the same integer type as the
>>>> operands of the comparison have. But I can use GET_MODE_BITSIZE as well;
>>>> I don't think it could happen that the size of the mode is
>>>> different from the size of the type. Or could it?
>>>
>>> The comparison could be on floating-point types where TYPE_PRECISION
>>> can be, for example, 80 for x87 doubles. You want an integer type
>>> of the same width, so yes, GET_MODE_BITSIZE is the correct thing
>>> to use here.
>>
>> Ok.
>>
>>>>> + cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>>>
>>>>> the result type of a comparison is boolean_type_node, not comp_type.
>>>>>
>>>>> + cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>>> + return gimplify_build3 (gsi, COND_EXPR, comp_type, cond,
>>>>> + build_int_cst (comp_type, -1),
>>>>> + build_int_cst (comp_type, 0));
>>>>>
>>>>> writing this as
>>>>>
>>>>> + return gimplify_build3 (gsi, COND_EXPR, comp_type, fold_build2 (code, boolean_type_node, a, b),
>>>>> + build_int_cst (comp_type, -1),
>>>>> + build_int_cst (comp_type, 0));
>>>>>
>>>>> will give the gimplifier a better chance at simplification.
>>>>>
>>>>> + if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>>>
>>>>> From looking at the single existing caller, I think we are expecting
>>>>> the scalar type and the vector mode here. It probably doesn't make
>>>>> a difference (we only check TYPE_UNSIGNED of it, which should
>>>>> also work for vector types), but let's be consistent. Thus,
>>>>
>>>> Ok.
>>>>
>>>>> if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
>>>>>
>>>>> + if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>>> + t = expand_vector_piecewise (gsi, do_compare, type,
>>>>> + TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
>>>>> + else
>>>>> + t = gimplify_build2 (gsi, code, type, op0, op1);
>>>>>
>>>>> the else case looks odd. Why re-build a stmt that already exists?
>>>>> Simply return NULL_TREE instead?
>>>>
>>>> I can adjust. The reason it is written that way is that
>>>> expand_vector_operations_1 uses the result of the function to
>>>> update the rhs.
>>>
>>> Ok, so it should check whether there was any lowering done then.
>>>
>>>>> +static tree
>>>>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
>>>>> +{
>>>>> ...
>>>>> + /* Expand vector condition inside of VEC_COND_EXPR. */
>>>>> + if (! expand_vec_cond_expr_p (TREE_TYPE (cond),
>>>>> + TYPE_MODE (TREE_TYPE (cond))))
>>>>> + {
>>>>> ...
>>>>> + new_rhs = expand_vector_piecewise (gsi, do_compare,
>>>>> + TREE_TYPE (cond),
>>>>> + TREE_TYPE (TREE_TYPE (op1)),
>>>>> + op0, op1, TREE_CODE (cond));
>>>>>
>>>>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to
>>>>>
>>>>> tem = { a[0] < b[0] ? -1 : 0, ... }
>>>>> v0 & tem | v1 & ~tem;
>>>>>
>>>>> instead of
>>>>>
>>>>> { a[0] < b[0] ? v0[0] : v1[0], ... }
>>>>>
>>>>> even if the bitwise operations could be carried out using vectors.
>>>>> It's definitely beneficial to do the first if the CPU can create the
>>>>> bitmask.
>>>>>
>>>>
>>>> o_O
>>>>
>>>> I thought you always wanted to do (m & v0) | (~m & v1).
>>>> Do you want to have two cases of the expansion then -- when we have
>>>> a mask available and when we don't? But it is really unlikely that we
>>>> can get the mask but cannot get vcond, because the condition is actually
>>>> a vcond. So once again -- do we always expand to {a[0] > b[0] ? v0[0] :
>>>> v1[0], ...}?
>>>
>>> Hm, yeah. I suppose with the current setup it's hard to only
>>> get the mask but not the full vcond ;) So it probably makes
>>> sense to always expand to {a[0] > b[0] ? v0[0] : v1[0], ...} as a
>>> fallback. Sorry for the confusion ;)
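>>>
>>> (I.e. the fallback is the obvious per-element form; as a scalar sketch
>>> for a 4-element vector:
>>>
>>>   for (i = 0; i < 4; i++)
>>>     r[i] = a[i] > b[i] ? v0[i] : v1[i];
>>>
>>> with the element extracts being BIT_FIELD_REFs at the tree level.)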
>>
>> Ok.
>>
>>>>> + /* Run veclower on the expressions we have introduced. */
>>>>> + for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
>>>>> + expand_vector_operations_1 (&gsi_tmp);
>>>>>
>>>>> do not use gsi.ptr directly; use gsi_stmt (gsi_tmp) != gsi_stmt (gsi)
>>>>>
>>>>> +static bool
>>>>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
>>>>> +{
>>>>>
>>>>> This function is lacking a comment.
>>>>>
>>>>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
>>>>> ...
>>>>> + /* Try to get rid of the useless vector comparison
>>>>> + x != {0,0,...} which is inserted by the typechecker. */
>>>>> + if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)
>>>>>
>>>>> how and why? You simply drop that comparison - that doesn't look
>>>>> correct. And in fact TREE_OPERAND (cond, 0) will never be a
>>>>> comparison - that wouldn't be valid gimple. Please leave this
>>>>> optimization to SSA-based forward propagation (I can help you here
>>>>> once the patch is in).
>>>>
>>>> No-no-no. This is the second part of handling
>>>> m = a > b;
>>>> r = m ? v0 : v1;
>>>>
>>>> to prevent m from being expanded to m != {0}.
>>>>
>>>> I do not _simply_ drop the comparison. I drop it only if
>>>> is_vector_comparison returned true. That means we can never get
>>>> into a situation where we are actually dropping a comparison inserted
>>>> by the user. What I really want to achieve here is to drop the
>>>> comparison that the frontend inserts every time it sees an
>>>> expression there.
>>>>
>>>> As I said earlier, tree forward propagation kicks in only with -On, and
>>>> I would really like to make sure that I can get rid of the useless != {0}
>>>> at any optimization level.
>>
>>> Please don't. If the language extension forces a != 0, then it should
>>> appear at -O0. The code is fishy anyway in the way it walks stmts
>>> in is_vector_comparison. At least I don't like to see this optimization
>>> done here for the sake of -O0 in this initial patch - you could try
>>> arguing for it as a follow-up improvement (well, probably with not
>>> much luck). -O0 is about compile speed and debugging; doing
>>> data-flow by walking stmts backwards is slow.
>>
>> Ok, then I seriously don't see any motivation to support the
>> VEC_COND_EXPR. The following code:
>>
>> m = a > b;
>> r = (m & v0) | (~m & v1);
>>
>> gives me much more flexibility and control. What is the VEC_COND_EXPR
>> good for? Syntactic sugar?
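>>
>> Written out in full, the open-coded form I mean is (a sketch, with a
>> hypothetical typedef):
>>
>> typedef int v4si __attribute__ ((vector_size (16)));
>>
>> v4si blend (v4si a, v4si b, v4si v0, v4si v1)
>> {
>>   v4si m = a > b;              /* all-ones or all-zeros per element */
>>   return (m & v0) | (~m & v1); /* select v0 where m is set, else v1 */
>> }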
>>
>> How about throwing away all the VEC_COND_EXPR parts and supporting only
>> conditions (implicitly expressed using vconds)? If we can agree on
>> implicit conversions for real types, then this is functionality that
>> perfectly satisfies my needs.
>>
>> I don't see any interest from the backend people, and I cannot wait
>> forever, so why don't we start with a simple thing?
>
> But the simple thing is already what the backend supports.
>
> Richard.
>
Well, it is not "what", it is "how" -- that is what we have been
discussing for three weeks already.
Ok, so the question now is whether it is fine to have the conversions
inside expand_expr_real_2. If we agree that it is ok to do, then I can
adjust the patch.
Artem.