On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <[email protected]> wrote:
> On 27 Aug 09:55, Richard Biener wrote:
>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <[email protected]>
>> wrote:
>> >
>> > Yes, I want to try it. But getting rid of bool patterns would mean
>> > support for all targets currently supporting vec_cond. Would it be OK
>> > to have vector<bool> mask co-exist with bool patterns for some time?
>>
>> No, I'd like to remove the bool patterns anyhow - the vectorizer should be
>> able
>> to figure out the correct vector type (or mask type) from the uses.
>> Currently
>> it simply looks at the stmts LHS type but as all stmt operands already have
>> vector types it can as well compute the result type from those. We'd want to
>> have a helper function that does this result type computation as I figure it
>> will be needed in multiple places.
>>
>> This is now on my personal TODO list (but that's already quite long for GCC
>> 6),
>> so if you manage to get to that... see
>> tree-vect-loop.c:vect_determine_vectorization_factor
>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
>> vector type set from data-ref analysis already - there 'bool' loads
>> correctly get
>> VNQImode). There is a basic-block / SLP part as well that would need to use
>> the helper function (eventually with some SLP discovery order issue).
>>
>> > Thus first step would be to require vector<bool> for MASK_LOAD and
>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
>> > MASK_STORE).
>>
>> You can certainly try that first, but as soon as you hit complications with
>> needing to adjust bool patterns then I'd rather get rid of them.
>>
>> >
>> > I can directly build a vector type with specified mode to avoid it. Smth.
>> > like:
>> >
>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
>>
>> Hmm, indeed, that might be a (good) solution. Btw, in this case
>> target attribute
>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
>> on the active target). There would also be no way for the user to
>> declare vector<bool>
>> in source (which is good because of that target attribute issue...).
>>
>> So yeah. Adding a tree.c:build_truth_vector_type (unsigned nunits)
>> and adjusting
>> truth_type_for is the way to go.
>>
>> I suggest you try modifying those parts first according to this scheme
>> that will most
>> likely uncover issues we missed.
>>
>> Thanks,
>> Richard.
>>
>
> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE.
> There were no major issues (for now).
>
> build_truth_vector_type and get_mask_type_for_scalar_type were added to build
> a mask type. It is always a vector of bools but its mode is determined by a
> target using number of units and currently used vector length.
>
> As previously I fixed if-conversion to apply boolean masks for loads and
> stores which automatically disables bool patterns for them and flow goes by a
> mask path. Vectorization factor computation is fixed to have a separate
> computation for mask types. Comparison is now handled separately by
> vectorizer and is vectorized into vector comparison.
>
> Optabs for masked loads and stores were transformed into convert optabs. Now
> it is checked using both value and mask modes.
>
> Optabs for comparison were added. These are also convert optabs checking
> value and result type.
>
> I had to introduce significant number of new patterns in i386 target to
> support new optabs. The reason was vector compare was never expanded
> separately and always was a part of a vec_cond expansion.
Indeed.
> As a result it's possible to use the sage GIMPLE representation for both
> vector and scalar masks target type. Here is an example I used as a simple
> test:
>
> for (i=0; i<N; i++)
> {
> float t = a[i];
> if (t > 0.0f && t < 1.0e+2f)
> if (c[i] != 0)
> c[i] = 1;
> }
>
> Produced vector GIMPLE (before expand):
>
> vect_t_5.22_105 = MEM[base: _256, offset: 0B];
> mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
> };
> mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2,
> 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
> mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
> vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
> mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
> mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
> MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1
> });
Looks good to me.
> Produced assembler on AVX-512:
>
> vmovups (%rdi), %zmm0
> vcmpps $25, %zmm5, %zmm0, %k1
> vcmpps $22, %zmm3, %zmm0, %k1{%k1}
> vmovdqa32 -64(%rdx), %zmm2{%k1}
> vpcmpd $4, %zmm1, %zmm2, %k1{%k1}
> vmovdqa32 %zmm4, (%rcx){%k1}
>
> Produced assembler on AVX-2:
>
> vmovups (%rdx), %xmm1
> vinsertf128 $0x1, -16(%rdx), %ymm1, %ymm1
> vcmpltps %ymm1, %ymm3, %ymm0
> vcmpltps %ymm5, %ymm1, %ymm1
> vpand %ymm0, %ymm1, %ymm0
> vpmaskmovd -32(%rcx), %ymm0, %ymm1
> vpcmpeqd %ymm2, %ymm1, %ymm1
> vpcmpeqd %ymm2, %ymm1, %ymm1
> vpand %ymm0, %ymm1, %ymm0
> vpmaskmovd %ymm4, %ymm0, (%rax)
>
> BTW AVX-2 code produced by trunk compiler is 4 insns longer:
>
> vmovups (%rdx), %xmm0
> vinsertf128 $0x1, -16(%rdx), %ymm0, %ymm0
> vcmpltps %ymm0, %ymm6, %ymm1
> vcmpltps %ymm7, %ymm0, %ymm0
> vpand %ymm1, %ymm5, %ymm2
> vpand %ymm0, %ymm2, %ymm1
> vpcmpeqd %ymm3, %ymm1, %ymm0
> vpandn %ymm4, %ymm0, %ymm0
> vpmaskmovd -32(%rcx), %ymm0, %ymm0
> vpcmpeqd %ymm3, %ymm0, %ymm0
> vpandn %ymm1, %ymm0, %ymm0
> vpcmpeqd %ymm3, %ymm0, %ymm0
> vpandn %ymm4, %ymm0, %ymm0
> vpmaskmovd %ymm5, %ymm0, (%rax)
>
>
> For now I still don't disable bool patterns, thus new masks apply to masked
> loads and stores only. Patch is also not tested and tried on several small
> tests only. Could you please look at what I currently have and say if it's
> in sync with your view on vector masking?
So apart from bool patterns and maybe implementation details (didn't
look too closely at the patch yet, maybe tomorrow), there is
+ /* Or a boolean vector type with the same element count
+ as the comparison operand types. */
+ else if (TREE_CODE (type) == VECTOR_TYPE
+ && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
+ {
so we now allow both, integer typed and boolean typed comparison
results? I was hoping that on GIMPLE
we can canonicalize to a single form, the boolean one and for the
"old" style force the use of VEC_COND exprs
(which we did anyway, AFAIK). The comparison in the VEC_COND would
still have vector bool result type.
I expect the vectorization factor changes to "vanish" if we remove
bool patterns and re-org vector type deduction
Richard.
> Thanks,
> Ilya
> --
> gcc/
>
> 2015-09-01 Ilya Enkovich <[email protected]>
>
> * config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
> (ix86_expand_int_vec_cmp): New.
> (ix86_expand_fp_vec_cmp): New.
> * config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
> op_true and op_false.
> (ix86_int_cmp_code_to_pcmp_immediate): New.
> (ix86_fp_cmp_code_to_pcmp_immediate): New.
> (ix86_cmp_code_to_pcmp_immediate): New.
> (ix86_expand_mask_vec_cmp): New.
> (ix86_expand_fp_vec_cmp): New.
> (ix86_expand_int_sse_cmp): New.
> (ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
> (ix86_expand_int_vec_cmp): New.
> (ix86_get_mask_mode): New.
> (TARGET_VECTORIZE_GET_MASK_MODE): New.
> * config/i386/sse.md (avx512fmaskmodelower): New.
> (vec_cmp<mode><avx512fmaskmodelower>): New.
> (vec_cmp<mode><sseintvecmodelower>): New.
> (vec_cmpv2div2di): New.
> (vec_cmpu<mode><avx512fmaskmodelower>): New.
> (vec_cmpu<mode><sseintvecmodelower>): New.
> (vec_cmpuv2div2di): New.
> (maskload<mode>): Rename to ...
> (maskload<mode><sseintvecmodelower>): ... this.
> (maskstore<mode>): Rename to ...
> (maskstore<mode><sseintvecmodelower>): ... this.
> (maskload<mode><avx512fmaskmodelower>): New.
> (maskstore<mode><avx512fmaskmodelower>): New.
> * doc/tm.texi: Regenerated.
> * doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
> * expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
> * internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
> (expand_MASK_STORE): Likewise.
> * optabs.c (vector_compare_rtx): Add OPNO arg.
> (expand_vec_cond_expr): Adjust to vector_compare_rtx change.
> (get_vec_cmp_icode): New.
> (expand_vec_cmp_expr_p): New.
> (expand_vec_cmp_expr): New.
> (can_vec_mask_load_store_p): Add MASK_MODE arg.
> * optabs.def (vec_cmp_optab): New.
> (vec_cmpu_optab): New.
> (maskload_optab): Transform into convert optab.
> (maskstore_optab): Likewise.
> * optabs.h (expand_vec_cmp_expr_p): New.
> (expand_vec_cmp_expr): New.
> (can_vec_mask_load_store_p): Add MASK_MODE arg.
> * target.def (get_mask_mode): New.
> * targhooks.c (default_vector_alignment): Use mode alignment
> for vector masks.
> (default_get_mask_mode): New.
> * targhooks.h (default_get_mask_mode): New.
> * tree-cfg.c (verify_gimple_comparison): Support vector mask.
> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
> can_vec_mask_load_store_p signature change.
> (predicate_mem_writes): Use boolean mask.
> * tree-vect-data-refs.c (vect_get_new_vect_var): Support
> vect_mask_var.
> (vect_create_destination_var): Likewise.
> * tree-vect-generic.c (expand_vector_comparison): Use
> expand_vec_cmp_expr_p for comparison availability.
> (expand_vector_operations_1): Ignore statements with scalar mode.
> * tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
> operations for VF. Add mask type computation.
> * tree-vect-stmts.c (vect_get_vec_def_for_operand): Support mask
> constant.
> (vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
> signature change.
> (vectorizable_comparison): New.
> (vect_analyze_stmt): Add vectorizable_comparison.
> (vect_transform_stmt): Likewise.
> (get_mask_type_for_scalar_type): New.
> * tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var
> (enum stmt_vec_info_type): Add comparison_vec_info_type.
> (get_mask_type_for_scalar_type): New.
> * tree.c (build_truth_vector_type): New.
> (truth_type_for): Use build_truth_vector_type for vectors.
> * tree.h (build_truth_vector_type): New.