On Tue, 18 May 2021, Uros Bizjak wrote:

> Add infrastructure, logic and arithmetic support for 4-byte vectors.
> These can be used with SSE2 targets, where movd instructions from/to
> XMM registers are available. x86_64 ABI passes 4-byte vectors in
> integer registers, so also add logic operations with integer registers.
>
> 2021-05-18  Uroš Bizjak  <ubiz...@gmail.com>
>
> gcc/
>     PR target/100637
>     * config/i386/i386.h (VALID_SSE2_REG_MODE):
>     Add V4QI and V2HI modes.
>     (VALID_INT_MODE_P): Ditto.
>     * config/i386/mmx.md (VI_32): New mode iterator.
>     (mmxvecsize): Handle V4QI and V2HI.
>     (Yv_Yw): Ditto.
>     (mov<VI_32:mode>): New expander.
>     (*mov<mode>_internal): New insn pattern.
>     (movmisalign<VI_32:mode>): New expander.
>     (neg<VI_32:mode>): New expander.
>     (<plusminus:insn><VI_32:mode>3): New expander.
>     (*<plusminus:insn><VI_32:mode>3): New insn pattern.
>     (mulv2hi3): New expander.
>     (*mulv2hi3): New insn pattern.
>     (one_cmpl<VI_32:mode>2): New expander.
>     (*andnot<VI_32:mode>3): New insn pattern.
>     (<any_logic:code><VI_32:mode>3): New expander.
>     (*<any_logic:code><VI_32:mode>3): New insn pattern.
>
> gcc/testsuite/
>
>     PR target/100637
>     * gcc.target/i386/pr100637-1b.c: New test.
>     * gcc.target/i386/pr100637-1w.c: Ditto.
>
>     * gcc.target/i386/pr92658-avx2-2.c: Do not XFAIL scan for pmovsxbq.
>     * gcc.target/i386/pr92658-avx2.c: Do not XFAIL scan for pmovzxbq.
>     * gcc.target/i386/pr92658-avx512vl.c: Do not XFAIL scan for vpmovdb.
>     * gcc.target/i386/pr92658-sse4-2.c: Do not XFAIL scan for
>     pmovsxbd and pmovsxwq.
>     * gcc.target/i386/pr92658-sse4.c: Do not XFAIL scan for
>     pmovzxbd and pmovzxwq.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> There is one scan-tree failure introduced into the vectorizer testsuite:
>
> FAIL: gcc.dg/vect/pr71264.c
>
> Where we want to vectorize a copy loop of 4 * vector_size(4). The
> target now handles 4-byte vectors, and the loop fails to vectorize for
> some reason. IMO, the vectorization should not depend on the handling
> of the underlying type, so it looks like something needs to be fixed
> in the middle-end. Adding Richi to CC.

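For readers without the testsuite at hand, the kind of loop being discussed can be sketched roughly as below. This is an illustration only, not the contents of gcc.dg/vect/pr71264.c. The helper name copy_4x4 is hypothetical, and the element type is assumed to be a 4-byte vector of uint8_t declared with the GNU C vector_size attribute.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 4-byte vector of bytes, declared with the GNU C vector extension. */
typedef uint8_t footype __attribute__((vector_size(4)));

static_assert(sizeof(footype) == 4, "footype is expected to be 4 bytes wide");

/* Hypothetical helper: copy 16 bytes as four 4-byte vectors.  A loop of
   this shape is a candidate for being re-vectorized into one wider copy.  */
void copy_4x4(uint8_t *dst, const uint8_t *src)
{
    for (size_t i = 0; i < 16; i += sizeof(footype)) {
        footype tmp;
        memcpy(&tmp, &src[i], sizeof(tmp));   /* load one 4-byte vector */
        memcpy(&dst[i], &tmp, sizeof(tmp));   /* store it unchanged */
    }
}
```

Whether the compiler gives footype an integer mode or a vector mode should not change what this function computes; it only affects which path the vectorizer takes.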
So before the change the 'footype' which is

  typedef uint8_t footype __attribute__((vector_size(4)));

had SImode, now it has V4QImode. That makes us run into
vect_get_vector_types_for_stmt which does

  if (VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))))
    return opt_result::failure_at (stmt,
                                   "not vectorized: vector stmt in loop:%G",
                                   stmt);

note that vector _types_ slip through here and get handled in
get_related_vectype_for_scalar_type by passing the integral mode
checks and then

  /* We shouldn't end up building VECTOR_TYPEs of non-scalar components.
     When the component mode passes the above test simply use a type
     corresponding to that mode.  The theory is that any use that
     would cause problems with this will disable vectorization anyway.  */
  else if (!SCALAR_FLOAT_TYPE_P (scalar_type)
           && !INTEGRAL_TYPE_P (scalar_type))
    scalar_type = lang_hooks.types.type_for_mode (inner_mode, 1);

in the end we're choosing V4SImode vectors as vector type here on the
GCC 11 branch. For operations other than bit-operations we're later
failing with checks like

  scalar_dest = gimple_assign_lhs (stmt);
  vectype_out = STMT_VINFO_VECTYPE (stmt_info);
  if (!type_has_mode_precision_p (TREE_TYPE (scalar_dest)))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "bit-precision shifts not supported.\n");
      return false;
    }

now for vector types type_has_mode_precision_p has odd semantics:

  inline bool
  type_has_mode_precision_p (const_tree t)
  {
    return known_eq (TYPE_PRECISION (t),
                     GET_MODE_PRECISION (TYPE_MODE (t)));
  }

where TYPE_PRECISION is log(TYPE_VECTOR_SUBPARTS) and the mode
precision here is the full vector width for vector modes. So it seems
we're just a bit lucky that things go correct here. In the end we only
have limited support to re-vectorize existing vector code.

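The odd semantics can be made concrete with a small stand-alone model. This is a sketch only: the toy_* names are hypothetical and are not GCC internals. It takes the description above at face value, so a vector type's precision field is assumed to hold log2 of the number of subparts and the mode precision is assumed to be the full vector width in bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two quantities the predicate compares. */
struct toy_type {
    unsigned type_precision;   /* models TYPE_PRECISION (t) */
    unsigned mode_precision;   /* models GET_MODE_PRECISION (TYPE_MODE (t)) */
};

/* Integer log2 for powers of two. */
static unsigned toy_log2(unsigned n)
{
    assert(n != 0);
    unsigned r = 0;
    while (n > 1) {
        n >>= 1;
        r++;
    }
    return r;
}

/* Scalar integer type whose precision fills its mode, e.g. 32 bits in SImode. */
struct toy_type toy_scalar_type(unsigned bits)
{
    struct toy_type t = { bits, bits };
    return t;
}

/* Vector type: precision field assumed to be log2 (subparts), mode
   precision assumed to be the whole vector width.  */
struct toy_type toy_vector_type(unsigned nunits, unsigned elem_bits)
{
    struct toy_type t = { toy_log2(nunits), nunits * elem_bits };
    return t;
}

/* Models the comparison done by type_has_mode_precision_p. */
bool toy_has_mode_precision(struct toy_type t)
{
    return t.type_precision == t.mode_precision;
}
```

Under these assumptions toy_scalar_type (32) passes the check, while toy_vector_type (4, 8), the V4QI-like case, compares 2 against 32 and fails. The comparison is not meaningful for vector types, so any correct result there is incidental.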
To reinstate the vectorization for the testcase we could try to do the get_related_vectype_for_scalar_type "trick" in vect_get_vector_types_for_stmt and make the scalar_type a scalar type matching the vector size. But I'm not sure what can of worms we'll open when doing this given the above. Richard?