On Tue, 18 May 2021, Uros Bizjak wrote:
> Add infrastructure, logic and arithmetic support for 4-byte vectors.
> These can be used with SSE2 targets, where movd instructions from/to
> XMM registers are available. x86_64 ABI passes 4-byte vectors in
> integer registers, so also add logic operations with integer registers.
>
> 2021-05-18 Uroš Bizjak <[email protected]>
>
> gcc/
> PR target/100637
> * config/i386/i386.h (VALID_SSE2_REG_MODE):
> Add V4QI and V2HI modes.
> (VALID_INT_MODE_P): Ditto.
> * config/i386/mmx.md (VI_32): New mode iterator.
> (mmxvecsize): Handle V4QI and V2HI.
> (Yv_Yw): Ditto.
> (mov<VI_32:mode>): New expander.
> (*mov<mode>_internal): New insn pattern.
> (movmisalign<VI_32:mode>): New expander.
> (neg<VI_32:mode>): New expander.
> (<plusminus:insn><VI_32:mode>3): New expander.
> (*<plusminus:insn><VI_32:mode>3): New insn pattern.
> (mulv2hi3): New expander.
> (*mulv2hi3): New insn pattern.
> (one_cmpl<VI_32:mode>2): New expander.
> (*andnot<VI_32:mode>3): New insn pattern.
> (<any_logic:code><VI_32:mode>3): New expander.
> (*<any_logic:code><VI_32:mode>3): New insn pattern.
>
> gcc/testsuite/
>
> PR target/100637
> * gcc.target/i386/pr100637-1b.c: New test.
> * gcc.target/i386/pr100637-1w.c: Ditto.
>
> * gcc.target/i386/pr92658-avx2-2.c: Do not XFAIL scan for pmovsxbq.
> * gcc.target/i386/pr92658-avx2.c: Do not XFAIL scan for pmovzxbq.
> * gcc.target/i386/pr92658-avx512vl.c: Do not XFAIL scan for vpmovdb.
> * gcc.target/i386/pr92658-sse4-2.c: Do not XFAIL scan for
> pmovsxbd and pmovsxwq.
> * gcc.target/i386/pr92658-sse4.c: Do not XFAIL scan for
> pmovzxbd and pmovzxwq.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> There is one scan-tree failure introduced into the vectorizer testsuite:
>
> FAIL: gcc.dg/vect/pr71264.c
>
> Where we want to vectorize a copy loop of 4 * vector_size(4). The
> target now handles 4-byte vectors, and the loop fails to vectorize for
> some reason. IMO, the vectorization should not depend on the handling
> of the underlying type, so it looks like something needs to be fixed
> in the middle-end. Adding Richi to CC.
So before the change, 'footype', which is

  typedef uint8_t footype __attribute__((vector_size(4)));

had SImode; now it has V4QImode.  That makes us run into the check in
vect_get_vector_types_for_stmt, which does

  if (VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))))
    return opt_result::failure_at (stmt,
                                   "not vectorized: vector stmt in loop:%G",
                                   stmt);

Note that vector _types_ slip through here (those whose TYPE_MODE is
not a vector mode, such as SImode before this change) and get handled
in get_related_vectype_for_scalar_type by passing the integral
mode checks and then

      /* We shouldn't end up building VECTOR_TYPEs of non-scalar components.
         When the component mode passes the above test simply use a type
         corresponding to that mode.  The theory is that any use that
         would cause problems with this will disable vectorization anyway.  */
      else if (!SCALAR_FLOAT_TYPE_P (scalar_type)
               && !INTEGRAL_TYPE_P (scalar_type))
        scalar_type = lang_hooks.types.type_for_mode (inner_mode, 1);

In the end we're choosing V4SImode vectors as the vector type here
on the GCC 11 branch.
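
For readers without the testcase at hand, here is a minimal sketch of the
kind of loop being discussed: a loop whose statements already operate on
a 4-byte vector type.  It is not a copy of pr71264.c; the function names
xor_mask_16 and xor_mask_16_selfcheck are made up for illustration, and
the sketch makes no claim about whether a given GCC version vectorizes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same typedef as quoted above: a 4-byte vector of four uint8_t lanes
   (GCC/Clang vector extension).  */
typedef uint8_t footype __attribute__((vector_size(4)));

/* Hypothetical helper, not taken from the testsuite: XOR a 16-byte
   buffer with a 4-byte mask, one 4-byte vector at a time.  */
void
xor_mask_16 (uint8_t *ptr, const uint8_t *mask)
{
  footype mv;
  memcpy (&mv, mask, sizeof (mv));
  for (size_t i = 0; i < 16; i += sizeof (footype))
    {
      footype temp;
      memcpy (&temp, &ptr[i], sizeof (temp));
      temp ^= mv;                       /* lane-wise bit operation */
      memcpy (&ptr[i], &temp, sizeof (temp));
    }
}

/* Compare the vector-typed loop against a plain byte-wise XOR.
   Returns 1 if every byte matches.  */
int
xor_mask_16_selfcheck (void)
{
  uint8_t buf[16];
  const uint8_t mask[4] = { 0xff, 0x0f, 0xf0, 0x00 };
  for (size_t i = 0; i < 16; i++)
    buf[i] = (uint8_t) i;
  xor_mask_16 (buf, mask);
  for (size_t i = 0; i < 16; i++)
    if (buf[i] != (uint8_t) (i ^ mask[i % 4]))
      return 0;
  return 1;
}
```

Each statement in the loop body has type footype, so whether
TYPE_MODE (footype) is SImode or V4QImode decides which side of the
VECTOR_MODE_P check above the statement lands on.
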
For operations other than bit operations we're later failing
with checks like

  scalar_dest = gimple_assign_lhs (stmt);
  vectype_out = STMT_VINFO_VECTYPE (stmt_info);
  if (!type_has_mode_precision_p (TREE_TYPE (scalar_dest)))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "bit-precision shifts not supported.\n");
      return false;
    }

Now, for vector types type_has_mode_precision_p has odd semantics:

  inline bool
  type_has_mode_precision_p (const_tree t)
  {
    return known_eq (TYPE_PRECISION (t), GET_MODE_PRECISION (TYPE_MODE (t)));
  }

where, for a vector type, TYPE_PRECISION is log2(TYPE_VECTOR_SUBPARTS)
and the mode precision is the full vector width for vector modes.  So it
seems we're just a bit lucky that things work out here.
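
To make the mismatch concrete, here is a toy model of that comparison.
It is not GCC code: struct toy_type and the toy_* functions are
hypothetical stand-ins, and the model assumes only what is stated above
(the precision field of a vector type holds log2 of the lane count, and
the mode precision is the full width in bits).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a type node; NOT GCC's tree representation.  */
struct toy_type
{
  unsigned precision_field;  /* what a TYPE_PRECISION-like accessor returns */
  unsigned mode_bits;        /* precision of the type's mode, in bits */
};

/* Integer log2 for powers of two.  */
unsigned
toy_log2 (unsigned n)
{
  unsigned r = 0;
  while (n > 1)
    {
      n >>= 1;
      r++;
    }
  return r;
}

/* A scalar integer type: field and mode precision agree.  */
struct toy_type
toy_scalar (unsigned bits)
{
  struct toy_type t = { bits, bits };
  return t;
}

/* A vector type: the field holds log2 (lane count), while the mode
   precision covers the whole vector.  */
struct toy_type
toy_vector (unsigned nunits, unsigned elem_bits)
{
  struct toy_type t = { toy_log2 (nunits), nunits * elem_bits };
  return t;
}

/* Model of the comparison done by type_has_mode_precision_p.  */
bool
toy_has_mode_precision (struct toy_type t)
{
  return t.precision_field == t.mode_bits;
}
```

In this model toy_vector (4, 8) compares 2 against 32, so the predicate
is false and the caller bails out.  The rejection happens because two
unrelated quantities differ, not because anything about bit precision
was actually checked.
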
In the end we only have limited support for re-vectorizing existing
vector code.

To reinstate vectorization for the testcase we could try
the get_related_vectype_for_scalar_type "trick" in
vect_get_vector_types_for_stmt and make the scalar_type a
scalar type matching the vector size.  But I'm not sure what
can of worms we'll open by doing this, given the above.
Richard?