On Tue, 20 Aug 2024, Tamar Christina wrote: > Hi, > > As you know I've been working on removing the code that demotes GIMPLE > COND_EXPR to GENERIC during vect_recog_bool_pattern. > > To restate why, The issue we currently have today is that the mask (boolean > argument of a COND_EXPR) is not always available during pattern matching. > > This is a problem as it means we can't do mask optimization early on and the > it prevents us from writing patterns that consume these masks, such as my > vect_recog_cond_store_pattern. > > What's happening here is that normally in GIMPLE we would have this sequence: > > a = <bool expression> > d = a ? b : c > > Here a will be expected to be the mask. vect_recog_mask_conversion_pattern > will > handle this and we can refer to a and get the appropriate mask out. > > But there are some cases where in gimple a is not a mask, or even a boolean > type. An example is when a is an external. > > The vectorizer then just sees > > d = a ? b : c > > and in order to create a mask lowers this into > > d = a == 0 ? b : c > > Two interesting things happen in this case. > > 1. the operand is no longer a mask, so it can't be used by any optimization. > 2. it bypasses most of the validation in the vectorizer. Specifically it > skips > the checks imposed by vect_maybe_update_slp_op_vectype and > vect_create_constant_vectors creates the right invariant conversions at the > end of vectorization. > > The goal is to make this all explicit. It means that the mask lowering should > stay in GIMPLE and explicitly say how to convert the argument to a mask. > > For non-boolean this is simple, essentially we transform > > d = a ? b : c
I see we don't verify this but 'a' should be effectively a boolean even for scalar? Note that almost all cases result from loop if-conversion so we might "mess up" there in this regard. > into > > a1 = a != 0 > d = a1 ? b : c > > This works, but does have a downside, we cost a1, and we vectorize a1. But > it's loop invariant so we didn't really need to. i.e. we could have done a > scalar compare and split, which is what vect_create_constant_vectors currently > does. The issue is that we do need to vectorize (but invariant). We already support invariant loads (VMAT_INVARIANT), it would be possible to support invariant code generation in the preheader (SLP codegen would already naturally do this but I think we forcefully tie its hands). A bigger problem might be that the vectorization of a1 = a != 0 might influence the vectorization factor. > The bigger issue is boolean arguments. > canonically the way we can transform the boolean into a mask is by > transforming > > > d = a ? b : c > > into > > a1 = a ? -1 : 0 > a2 = a1 != 0 > d = a2 ? b : c > > But this introduces a chicken and egg problem.. It re-introduces a COND_EXPR > and so vectorization would fail as we have no way to vectorize a1. Hmm. Why's that? ISTR we only need to be able to distinguish flag/mask uses of 'bool' from data uses of 'bool'. For flag/mask uses we replace it with a compare - which of course we likely fold away when the argument is boolean. I would suggest to instead produce a1 = a != 0; d = a1 ? b : c; similar as for non-boolean and simply avoid folding. ISTR most of the bool patterns deal with properly widening/trucating the operand for the mask-use so we get signed-bool:16 or signed-bool:1, we need to preserve that. So the above would be a1 = a != 0; a2 = (signed-bool:16) a1; d = a2 ? b : c; which I think we try to "optimize" as a2 = a != 0 ? (signed-bool:16)-1 : 0; but then (signed-bool:16) a2 = a != 0; should work as well? I might misremember all of this of course, it's a minefield for sure. > My current code for this is: > > tree lhs_var = vect_recog_temp_ssa_var (boolean_type_node, NULL); > tree cond_type = TREE_TYPE (var); > > /* If the condition is already a boolean then manually convert it to a > mask of > the given integer type but don't set a vectype. */ > if (TREE_CODE (cond_type) == BOOLEAN_TYPE > && TREE_CODE (var) == SSA_NAME) > { > tree lhs_ivar = vect_recog_temp_ssa_var (type, NULL); > tree true_val = build_all_ones_cst (type); > tree false_val = build_zero_cst (type); > pattern_stmt = gimple_build_assign (lhs_ivar, COND_EXPR, var, > true_val, false_val); > append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt); > var = lhs_ivar; > cond_type = type; > } > > pattern_stmt = gimple_build_assign (lhs_var, NE_EXPR, var, > build_zero_cst (cond_type)); > > Now to address this, without creating hacks into the vectorizer, would the > right > approach be to teach the vectorizer to recognize loop invariant computations? I would defer this, but I'd see to compute an alternate def type, vect_invariant_def maybe? Encoding this in relevancy doesn't look right. Note we also get testcases from time to time where invariant motion failed to move invariant code. We could possibly simply leave SLP scheduling to code-generate in the preheader (as said, that worked at some point). Richard. > This would then not need to be vectorizered and can simply be lifted into the > loop pre-header pred similar to what vect_create_constant_vectors does. > > If you agree, what's the best way? Should I teach analysis this through a new > vect_used_invariant_scope or something? > > Or should I add a new gimple_seq to VINFO? or something else? > > Thanks, > Tamar > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)