On Thu, 20 Oct 2011, Jakub Jelinek wrote:

> On Thu, Oct 20, 2011 at 11:42:01AM +0200, Richard Guenther wrote:
> > > +  if (TREE_CODE (scalar_dest) == VIEW_CONVERT_EXPR
> > > +      && is_pattern_stmt_p (stmt_info))
> > > +    scalar_dest = TREE_OPERAND (scalar_dest, 0);
> > >    if (TREE_CODE (scalar_dest) != ARRAY_REF
> > >        && TREE_CODE (scalar_dest) != INDIRECT_REF
> > >        && TREE_CODE (scalar_dest) != COMPONENT_REF
> >
> > Just change the if () stmt to
> >
> >  if (!handled_component_p (scalar_dest)
> >      && TREE_CODE (scalar_dest) != MEM_REF)
> >    return false;
>
> That will accept BIT_FIELD_REF and ARRAY_RANGE_REF (as well as VCE outside of
> pattern stmts).
> The VCEs I hope don't appear, but the first two might, and I'm not sure
> we are prepared to handle them.  Certainly not BIT_FIELD_REFs.
>
> > > +      rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, stmts);
> > > +      if (TREE_CODE (lhs) == MEM_REF || TREE_CODE (lhs) == TARGET_MEM_REF)
> > > +        {
> > > +          lhs = copy_node (lhs);
> >
> > We don't handle TARGET_MEM_REF in vectorizable_store, so no need to
> > do it here.  In fact, just unconditionally do ...
> >
> > > +          TREE_TYPE (lhs) = TREE_TYPE (vectype);
> > > +        }
> > > +      else
> > > +        lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs);
> >
> > ... this (wrap it in a V_C_E).  No need to special-case any
> > MEM_REFs.
>
> Ok.  After all it seems vectorizable_store pretty much ignores it
> (except for the scalar_dest check above).  For aliasing it uses the type
> from DR_REF and otherwise it uses the vectorized type.
>
> > > +      if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
> >
> > This should never be false, so you can as well unconditionally build
> > the conversion stmt.
>
> You mean because currently adjust_bool_pattern will prefer signed types
> over unsigned while here lhs will be unsigned?  I guess I should
> change it to use signed type for the memory store too to avoid the extra
> cast instead.
> Both types can certainly be the same precision, e.g. for:
> unsigned char a[N], b[N];
> unsigned int d[N], e[N];
> bool c[N];
> ...
> for (i = 0; i < N; ++i)
>   c[i] = a[i] < b[i];
> or different precision, e.g. for:
> for (i = 0; i < N; ++i)
>   c[i] = d[i] < e[i];
>
> > > @@ -347,6 +347,28 @@ vect_determine_vectorization_factor (loo
> > >                gcc_assert (STMT_VINFO_DATA_REF (stmt_info)
> > >                            || is_pattern_stmt_p (stmt_info));
> > >                vectype = STMT_VINFO_VECTYPE (stmt_info);
> > > +              if (STMT_VINFO_DATA_REF (stmt_info))
> > > +                {
> > > +                  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
> > > +                  tree scalar_type = TREE_TYPE (DR_REF (dr));
> > > +                  /* vect_analyze_data_refs will allow bool writes through,
> > > +                     in order to allow vect_recog_bool_pattern to transform
> > > +                     those.  If they couldn't be transformed, give up now.  */
> > > +                  if (((TYPE_PRECISION (scalar_type) == 1
> > > +                        && TYPE_UNSIGNED (scalar_type))
> > > +                       || TREE_CODE (scalar_type) == BOOLEAN_TYPE)
> >
> > Shouldn't it be always possible to vectorize those?  For loads
> > we can assume the memory contains only 1 or 0 (we assume that for
> > scalar loads), for stores we can mask out all other bits explicitly
> > if you add support for truncating conversions to non-mode precision
> > (in fact, we could support non-mode precision vectorization that way,
> > if not support bitfield loads or extending conversions).
>
> Not without the pattern recognizer transforming it into something.
> That is something we've discussed on IRC before I started working on the
> first vect_recog_bool_pattern patch, we'd need to special case bool and
> one-bit precision types in way too many places all around the vectorizer.
> Another reason for that was that what vect_recog_bool_pattern does currently
> is certainly way faster than what we would end up with if we just handled
> bool as unsigned (or signed?) char with masking on casts and stores
> - the ability to use any integer type for the bools rather than char
> as appropriate means we can avoid many VEC_PACK_TRUNC_EXPRs and
> corresponding VEC_UNPACK_{LO,HI}_EXPRs.
> So the chosen solution was to attempt to transform some of the bool patterns
> into something the vectorizer can handle easily.
> And what it handles can be extended over time.
>
> The above just reflects it, probably just me trying to be too cautious,
> the vectorization would likely fail on the stmt feeding the store, because
> get_vectype_for_scalar_type would fail on it.
>
> If we wanted to support general TYPE_PRECISION != GET_MODE_BITSIZE (TYPE_MODE)
> vectorization (hopefully with still preserving the pattern bool recognizer
> for the above stated reasons), we'd start with changing
> get_vectype_for_scalar_type to handle those types (then the
> tree-vect-data-refs.c and tree-vect-loop.c changes from this patch would
> be unnecessary), but then we'd need to handle it in other places too
> (I guess loads would be fine (unless BIT_FIELD_REF loads), but then
> casts and stores need extra code).
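For reference, the two comparison loops from the quoted discussion can be written out as a standalone C file. This is only a sketch: the array length `N`, the function names and the `count_true_after` helper are assumptions made for the illustration and are not part of any patch or of the testsuite.

```c
#include <stdbool.h>

#define N 16  /* assumed length; the quoted mail leaves N unspecified */

unsigned char a[N], b[N];
unsigned int d[N], e[N];
bool c[N];

/* Comparison operands are unsigned char, the result is stored as bool.  */
void cmp_same_width (void)
{
  for (int i = 0; i < N; ++i)
    c[i] = a[i] < b[i];
}

/* Comparison operands are unsigned int, wider than the bool result.  */
void cmp_wider (void)
{
  for (int i = 0; i < N; ++i)
    c[i] = d[i] < e[i];
}

/* Hypothetical helper: fill the inputs so that element i compares
   "less than" exactly when i is even, run FN, and return the number
   of true results.  */
int count_true_after (void (*fn) (void))
{
  int n = 0;
  for (int i = 0; i < N; ++i)
    {
      a[i] = d[i] = (i % 2 == 0) ? 1 : 3;
      b[i] = e[i] = 2;
      c[i] = false;
    }
  fn ();
  for (int i = 0; i < N; ++i)
    n += c[i];
  return n;
}
```

Both functions produce the same bool results; they differ only in the width of the compared operands, which is the distinction the quoted text draws.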
This is what I have right now, bootstrapped and tested on
x86_64-unknown-linux-gnu.  I do see

FAIL: gfortran.dg/logical_dot_product.f90  -O3 -fomit-frame-pointer  (internal compiler error)
FAIL: gfortran.dg/mapping_1.f90  -O3 -fomit-frame-pointer  (internal compiler error)
FAIL: gfortran.fortran-torture/execute/pr43390.f90,  -O3 -g  (internal compiler error)

so there is some fallout, but somebody broke dejagnu enough that
I can't easily debug this right now, so I'm postponing it until
that is fixed.  It doesn't seem to break any testcases for Bool
vectorization.

I probably should factor out the precision test.

Thanks,
Richard.

2011-10-24  Richard Guenther  <rguent...@suse.de>

	* tree-vect-stmts.c (vectorizable_assignment): Bail out for
	non-mode-precision operations.
	(vectorizable_shift): Likewise.
	(vectorizable_operation): Likewise.
	(vectorizable_type_demotion): Likewise.
	(vectorizable_type_promotion): Likewise.
	(vectorizable_store): Handle non-mode-precision stores.
	(vectorizable_load): Handle non-mode-precision loads.
	(get_vectype_for_scalar_type_and_size): Return a vector type
	for non-mode-precision integers.

	* gcc.dg/vect/vect-bool-1.c: New testcase.

Index: gcc/tree-vect-stmts.c
===================================================================
*** gcc/tree-vect-stmts.c	(revision 180380)
--- gcc/tree-vect-stmts.c	(working copy)
*************** vectorizable_assignment (gimple stmt, gi
*** 2173,2178 ****
--- 2173,2197 ----
                != GET_MODE_SIZE (TYPE_MODE (vectype_in)))))
      return false;

+   /* We do not handle bit-precision changes.  */
+   if ((CONVERT_EXPR_CODE_P (code)
+        || code == VIEW_CONVERT_EXPR)
+       && INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
+       && ((TYPE_PRECISION (TREE_TYPE (scalar_dest))
+            != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (scalar_dest))))
+           || ((TYPE_PRECISION (TREE_TYPE (op))
+                != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op))))))
+       /* But a conversion that does not change the bit-pattern is ok.  */
+       && !((TYPE_PRECISION (TREE_TYPE (scalar_dest))
+             > TYPE_PRECISION (TREE_TYPE (op)))
+            && TYPE_UNSIGNED (TREE_TYPE (op))))
+     {
+       if (vect_print_dump_info (REPORT_DETAILS))
+         fprintf (vect_dump, "type conversion to/from bit-precision "
+                  "unsupported.");
+       return false;
+     }
+
    if (!vec_stmt) /* transformation not required.  */
      {
        STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
*************** vectorizable_shift (gimple stmt, gimple_
*** 2326,2331 ****
--- 2345,2357 ----

    scalar_dest = gimple_assign_lhs (stmt);
    vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+   if (TYPE_PRECISION (TREE_TYPE (scalar_dest))
+       != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (scalar_dest))))
+     {
+       if (vect_print_dump_info (REPORT_DETAILS))
+         fprintf (vect_dump, "bit-precision shifts not supported.");
+       return false;
+     }

    op0 = gimple_assign_rhs1 (stmt);
    if (!vect_is_simple_use_1 (op0, loop_vinfo, bb_vinfo,
*************** vectorizable_operation (gimple stmt, gim
*** 2660,2665 ****
--- 2686,2706 ----

    scalar_dest = gimple_assign_lhs (stmt);
    vectype_out = STMT_VINFO_VECTYPE (stmt_info);

+   /* Most operations cannot handle bit-precision types without extra
+      truncations.  */
+   if ((TYPE_PRECISION (TREE_TYPE (scalar_dest))
+        != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (scalar_dest))))
+       /* Exceptions are bitwise operations.  */
+       && code != BIT_IOR_EXPR
+       && code != BIT_XOR_EXPR
+       && code != BIT_AND_EXPR
+       && code != BIT_NOT_EXPR)
+     {
+       if (vect_print_dump_info (REPORT_DETAILS))
+         fprintf (vect_dump, "bit-precision arithmetic not supported.");
+       return false;
+     }
+
    op0 = gimple_assign_rhs1 (stmt);
    if (!vect_is_simple_use_1 (op0, loop_vinfo, bb_vinfo,
                               &def_stmt, &def, &dt[0], &vectype))
*************** vectorizable_type_demotion (gimple stmt,
*** 3082,3090 ****
    if (! ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
            && INTEGRAL_TYPE_P (TREE_TYPE (op0)))
           || (SCALAR_FLOAT_TYPE_P (TREE_TYPE (scalar_dest))
!              && SCALAR_FLOAT_TYPE_P (TREE_TYPE (op0))
!              && CONVERT_EXPR_CODE_P (code))))
      return false;
    if (!vect_is_simple_use_1 (op0, loop_vinfo, bb_vinfo,
                               &def_stmt, &def, &dt[0], &vectype_in))
      {
--- 3123,3142 ----
    if (! ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
            && INTEGRAL_TYPE_P (TREE_TYPE (op0)))
           || (SCALAR_FLOAT_TYPE_P (TREE_TYPE (scalar_dest))
!              && SCALAR_FLOAT_TYPE_P (TREE_TYPE (op0)))))
      return false;
+
+   if (INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
+       && ((TYPE_PRECISION (TREE_TYPE (scalar_dest))
+            != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (scalar_dest))))
+           || ((TYPE_PRECISION (TREE_TYPE (op0))
+                != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op0)))))))
+     {
+       if (vect_print_dump_info (REPORT_DETAILS))
+         fprintf (vect_dump, "type demotion to/from bit-precision unsupported.");
+       return false;
+     }
+
    if (!vect_is_simple_use_1 (op0, loop_vinfo, bb_vinfo,
                               &def_stmt, &def, &dt[0], &vectype_in))
      {
*************** vectorizable_type_promotion (gimple stmt
*** 3365,3370 ****
--- 3417,3435 ----
               && SCALAR_FLOAT_TYPE_P (TREE_TYPE (op0))
               && CONVERT_EXPR_CODE_P (code))))
      return false;
+
+   if (INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
+       && ((TYPE_PRECISION (TREE_TYPE (scalar_dest))
+            != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (scalar_dest))))
+           || ((TYPE_PRECISION (TREE_TYPE (op0))
+                != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op0)))))))
+     {
+       if (vect_print_dump_info (REPORT_DETAILS))
+         fprintf (vect_dump, "type promotion to/from bit-precision "
+                  "unsupported.");
+       return false;
+     }
+
    if (!vect_is_simple_use_1 (op0, loop_vinfo, bb_vinfo,
                               &def_stmt, &def, &dt[0], &vectype_in))
      {
*************** vectorizable_store (gimple stmt, gimple_
*** 3673,3689 ****
        return false;
      }

-   /* The scalar rhs type needs to be trivially convertible to the vector
-      component type.  This should always be the case.  */
    elem_type = TREE_TYPE (vectype);
-   if (!useless_type_conversion_p (elem_type, TREE_TYPE (op)))
-     {
-       if (vect_print_dump_info (REPORT_DETAILS))
-         fprintf (vect_dump, "???  operands of different types");
-       return false;
-     }
-
    vec_mode = TYPE_MODE (vectype);
    /* FORNOW. In some cases can vectorize even if data-type not supported
       (e.g. - array initialization with 0).  */
    if (optab_handler (mov_optab, vec_mode) == CODE_FOR_nothing)
--- 3738,3746 ----
        return false;
      }

    elem_type = TREE_TYPE (vectype);
    vec_mode = TYPE_MODE (vectype);
+
    /* FORNOW. In some cases can vectorize even if data-type not supported
       (e.g. - array initialization with 0).  */
    if (optab_handler (mov_optab, vec_mode) == CODE_FOR_nothing)
*************** vectorizable_load (gimple stmt, gimple_s
*** 4117,4123 ****
    bool strided_load = false;
    bool load_lanes_p = false;
    gimple first_stmt;
-   tree scalar_type;
    bool inv_p;
    bool negative;
    bool compute_in_loop = false;
--- 4174,4179 ----
*************** vectorizable_load (gimple stmt, gimple_s
*** 4192,4198 ****
        return false;
      }

!   scalar_type = TREE_TYPE (DR_REF (dr));
    mode = TYPE_MODE (vectype);

    /* FORNOW. In some cases can vectorize even if data-type not supported
--- 4248,4254 ----
        return false;
      }

!   elem_type = TREE_TYPE (vectype);
    mode = TYPE_MODE (vectype);

    /* FORNOW. In some cases can vectorize even if data-type not supported
*************** vectorizable_load (gimple stmt, gimple_s
*** 4204,4219 ****
        return false;
      }

-   /* The vector component type needs to be trivially convertible to the
-      scalar lhs.  This should always be the case.  */
-   elem_type = TREE_TYPE (vectype);
-   if (!useless_type_conversion_p (TREE_TYPE (scalar_dest), elem_type))
-     {
-       if (vect_print_dump_info (REPORT_DETAILS))
-         fprintf (vect_dump, "???  operands of different types");
-       return false;
-     }
-
    /* Check if the load is a part of an interleaving chain.  */
    if (STMT_VINFO_STRIDED_ACCESS (stmt_info))
      {
--- 4260,4265 ----
*************** vectorizable_load (gimple stmt, gimple_s
*** 4560,4566 ****
                    msq = new_temp;

                    bump = size_binop (MULT_EXPR, vs_minus_1,
!                                      TYPE_SIZE_UNIT (scalar_type));
                    ptr = bump_vector_ptr (dataref_ptr, NULL, gsi, stmt, bump);
                    new_stmt = gimple_build_assign_with_ops
                                (BIT_AND_EXPR, NULL_TREE, ptr,
--- 4606,4612 ----
                    msq = new_temp;

                    bump = size_binop (MULT_EXPR, vs_minus_1,
!                                      TYPE_SIZE_UNIT (elem_type));
                    ptr = bump_vector_ptr (dataref_ptr, NULL, gsi, stmt, bump);
                    new_stmt = gimple_build_assign_with_ops
                                (BIT_AND_EXPR, NULL_TREE, ptr,
*************** get_vectype_for_scalar_type_and_size (tr
*** 5441,5453 ****
    if (nbytes < TYPE_ALIGN_UNIT (scalar_type))
      return NULL_TREE;

!   /* If we'd build a vector type of elements whose mode precision doesn't
!      match their types precision we'll get mismatched types on vector
!      extracts via BIT_FIELD_REFs.  This effectively means we disable
!      vectorization of bool and/or enum types in some languages.  */
    if (INTEGRAL_TYPE_P (scalar_type)
        && GET_MODE_BITSIZE (inner_mode) != TYPE_PRECISION (scalar_type))
!     return NULL_TREE;

    if (GET_MODE_CLASS (inner_mode) != MODE_INT
        && GET_MODE_CLASS (inner_mode) != MODE_FLOAT)
--- 5487,5500 ----
    if (nbytes < TYPE_ALIGN_UNIT (scalar_type))
      return NULL_TREE;

!   /* For vector types of elements whose mode precision doesn't
!      match their type's precision we use an element type of mode
!      precision.  The vectorization routines will have to make sure
!      they support the proper result truncation/extension.  */
    if (INTEGRAL_TYPE_P (scalar_type)
        && GET_MODE_BITSIZE (inner_mode) != TYPE_PRECISION (scalar_type))
!     scalar_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (inner_mode),
!                                                   TYPE_UNSIGNED (scalar_type));

    if (GET_MODE_CLASS (inner_mode) != MODE_INT
        && GET_MODE_CLASS (inner_mode) != MODE_FLOAT)
Index: gcc/testsuite/gcc.dg/vect/vect-bool-1.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-bool-1.c	(revision 0)
--- gcc/testsuite/gcc.dg/vect/vect-bool-1.c	(revision 0)
***************
*** 0 ****
--- 1,15 ----
+ /* { dg-do compile } */
+ /* { dg-require-effective-target vect_int } */
+
+ _Bool a[1024];
+ _Bool b[1024];
+ _Bool c[1024];
+ void foo (void)
+ {
+   unsigned i;
+   for (i = 0; i < 1024; ++i)
+     a[i] = b[i] | c[i];
+ }
+
+ /* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
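
The reasoning behind the bitwise exception in vectorizable_operation can be illustrated outside the compiler with a small model. The sketch below assumes a 1-bit unsigned value kept in an 8-bit container, roughly how a _Bool element would sit in a byte-sized vector lane; every name in it is made up for the illustration and none of it is GCC code.

```c
#include <stdint.h>

/* Toy model: a 1-bit unsigned value held in an 8-bit container.
   A container is "canonical" when at most bit 0 is set.  */
int canonical_p (uint8_t x)
{
  return (x & ~1u) == 0;
}

/* Explicit truncation back to 1-bit precision.  */
uint8_t truncate_1bit (uint8_t x)
{
  return x & 1u;
}

/* Return 1 if IOR, XOR and AND of canonical inputs always yield a
   canonical result, i.e. no truncation is needed afterwards.  */
int bitwise_binops_closed (void)
{
  for (uint8_t x = 0; x <= 1; ++x)
    for (uint8_t y = 0; y <= 1; ++y)
      if (!canonical_p (x | y)
          || !canonical_p (x ^ y)
          || !canonical_p (x & y))
        return 0;
  return 1;
}

/* Container-wide addition: 1 + 1 leaves bit 1 set.  */
uint8_t add_in_container (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y);
}

/* Container-wide complement: sets all the upper bits.  */
uint8_t not_in_container (uint8_t x)
{
  return (uint8_t) ~x;
}
```

In this model IOR, XOR and AND need no fix-up, while addition and a plain container-wide complement both leave bits outside the 1-bit precision until `truncate_1bit` is applied. How that relates to the BIT_NOT_EXPR case in the patch depends on how the upper bits are consumed later, which the model does not capture.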