Re: [PATCH] i386: Implement 4-byte vector support [PR100637]

Richard Biener Wed, 19 May 2021 01:49:21 -0700

On Tue, 18 May 2021, Uros Bizjak wrote:

> Add infrastructure, logic and arithmetic support for 4-byte vectors.
> These can be used with SSE2 targets, where movd instructions from/to
> XMM registers are available.  x86_64 ABI passes 4-byte vectors in
> integer registers, so also add logic operations with integer registers.
> 
> 2021-05-18  Uroš Bizjak  <ubiz...@gmail.com>
> 
> gcc/
>     PR target/100637
>     * config/i386/i386.h (VALID_SSE2_REG_MODE):
>     Add V4QI and V2HI modes.
>     (VALID_INT_MODE_P): Ditto.
>     * config/i386/mmx.md (VI_32): New mode iterator.
>     (mmxvecsize): Handle V4QI and V2HI.
>     (Yv_Yw): Ditto.
>     (mov<VI_32:mode>): New expander.
>     (*mov<mode>_internal): New insn pattern.
>     (movmisalign<VI_32:mode>): New expander.
>     (neg<VI_32:mode>): New expander.
>     (<plusminus:insn><VI_32:mode>3): New expander.
>     (*<plusminus:insn><VI_32:mode>3): New insn pattern.
>     (mulv2hi3): New expander.
>     (*mulv2hi3): New insn pattern.
>     (one_cmpl<VI_32:mode>2): New expander.
>     (*andnot<VI_32:mode>3): New insn pattern.
>     (<any_logic:code><VI_32:mode>3): New expander.
>     (*<any_logic:code><VI_32:mode>3): New insn pattern.
> 
> gcc/testsuite/
> 
>     PR target/100637
>     * gcc.target/i386/pr100637-1b.c: New test.
>     * gcc.target/i386/pr100637-1w.c: Ditto.
> 
>     * gcc.target/i386/pr92658-avx2-2.c: Do not XFAIL scan for pmovsxbq.
>     * gcc.target/i386/pr92658-avx2.c: Do not XFAIL scan for pmovzxbq.
>     * gcc.target/i386/pr92658-avx512vl.c: Do not XFAIL scan for vpmovdb.
>     * gcc.target/i386/pr92658-sse4-2.c: Do not XFAIL scan for
>     pmovsxbd and pmovsxwq.
>     * gcc.target/i386/pr92658-sse4.c: Do not XFAIL scan for
>     pmovzxbd and pmovzxwq.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> There is one scan-tree failure introduced into the vectorizer testsuite:
> 
> FAIL: gcc.dg/vect/pr71264.c
> 
> Where we want to vectorize a copy loop of 4 * vector_size(4). The
> target now handles 4-byte vectors, and the loop fails to vectorize for
> some reason. IMO, the vectorization should not depend on the handling
> of the underlying type, so it looks like something needs to be fixed
> in the middle-end. Adding Richi to CC.


So before the change, the 'footype', which is

typedef uint8_t footype __attribute__((vector_size(4)));

had SImode; now it has V4QImode.  That makes us run into the check in
vect_get_vector_types_for_stmt, which does

  if (VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))))
    return opt_result::failure_at (stmt,
                                   "not vectorized: vector stmt in loop:%G",
                                   stmt);

Note that vector _types_ slip through here, since the check only looks
at the mode.  They get handled in get_related_vectype_for_scalar_type
by passing the integral mode checks and then reaching

  /* We shouldn't end up building VECTOR_TYPEs of non-scalar components.
     When the component mode passes the above test simply use a type
     corresponding to that mode.  The theory is that any use that
     would cause problems with this will disable vectorization anyway.  */
  else if (!SCALAR_FLOAT_TYPE_P (scalar_type)
           && !INTEGRAL_TYPE_P (scalar_type))
    scalar_type = lang_hooks.types.type_for_mode (inner_mode, 1);

In the end we're choosing V4SImode vectors as the vector type here
on the GCC 11 branch.
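
For reference, a loop of the shape under discussion might look like the
sketch below.  This is not the content of gcc.dg/vect/pr71264.c; the
function name xor_chunks and the choice of XOR as the per-chunk operation
are assumptions made for illustration.  Only the footype typedef is taken
from above.

```c
#include <stdint.h>
#include <string.h>

/* The 4-byte vector type from the discussion.  */
typedef uint8_t footype __attribute__((vector_size(4)));

/* Hypothetical helper: process 16 bytes as 4 chunks of footype.
   Each iteration loads one 4-byte vector, applies a bit operation,
   and stores it back.  */
void
xor_chunks (uint8_t *ptr, const uint8_t *mask)
{
  footype mv;
  memcpy (&mv, mask, sizeof (mv));
  for (unsigned i = 0; i < 16; i += 4)
    {
      footype temp;
      memcpy (&temp, &ptr[i], sizeof (temp));
      temp ^= mv;   /* statement whose type is already a vector type */
      memcpy (&ptr[i], &temp, sizeof (temp));
    }
}
```

With footype in SImode the loop body looks like scalar integer code to
the check quoted above; with V4QImode it is rejected as a vector stmt.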

For operations other than bit operations we're later failing
with checks like

  scalar_dest = gimple_assign_lhs (stmt);
  vectype_out = STMT_VINFO_VECTYPE (stmt_info);
  if (!type_has_mode_precision_p (TREE_TYPE (scalar_dest)))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "bit-precision shifts not supported.\n");
      return false;
    }

Now, for vector types, type_has_mode_precision_p has odd semantics:

inline bool
type_has_mode_precision_p (const_tree t)
{
  return known_eq (TYPE_PRECISION (t), GET_MODE_PRECISION (TYPE_MODE (t)));
}

where, for a vector type, TYPE_PRECISION is log2(TYPE_VECTOR_SUBPARTS)
and the mode precision is the full vector width in bits for vector
modes.  The two values measure different things, so it seems we're
just a bit lucky that things work out here.
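
To make the mismatch concrete, here is a small stand-alone model of that
comparison.  It is only a sketch: struct vec_type_model and
model_has_mode_precision_p are hypothetical names, not GCC data
structures or functions, and the model assumes a fixed (non-poly) number
of subparts.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a vector type; not a GCC structure.  */
struct vec_type_model
{
  unsigned log2_subparts;   /* what TYPE_PRECISION holds, per the text */
  unsigned mode_bits;       /* GET_MODE_PRECISION of the type's mode */
};

/* Same shape as the comparison in type_has_mode_precision_p.  */
static bool
model_has_mode_precision_p (struct vec_type_model t)
{
  return t.log2_subparts == t.mode_bits;
}

/* footype in V4QImode: 4 subparts -> 2, mode precision 32 bits.  */
static const struct vec_type_model footype_v4qi = { 2, 32 };

/* A 16 x 8-bit vector: 16 subparts -> 4, mode precision 128 bits.  */
static const struct vec_type_model v16qi_model = { 4, 128 };
```

In this model both types compare 2 against 32 and 4 against 128, so the
predicate is false for reasons unrelated to padding bits in the elements.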

In the end we only have limited support to re-vectorize existing
vector code.

To reinstate the vectorization for the testcase we could try
to do the get_related_vectype_for_scalar_type "trick" in
vect_get_vector_types_for_stmt and make the scalar_type a
scalar type matching the vector size.  But I'm not sure what
can of worms we'll open when doing this, given the above.
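
A user-level illustration of why treating the 4-byte vector as a
same-sized scalar is only safe for some operations (a sketch, not
vectorizer code; as_u32, xor_matches_scalar and add_matches_scalar are
hypothetical helper names):

```c
#include <stdint.h>
#include <string.h>

typedef uint8_t footype __attribute__((vector_size(4)));

/* Reinterpret the 4 bytes of the vector as one 32-bit integer.  */
static uint32_t
as_u32 (footype v)
{
  uint32_t r;
  memcpy (&r, &v, sizeof (r));
  return r;
}

/* Bit operations act on each bit independently, so the lane-wise
   result and the 32-bit scalar result have identical bytes.  */
int
xor_matches_scalar (footype a, footype b)
{
  return as_u32 (a ^ b) == (as_u32 (a) ^ as_u32 (b));
}

/* Addition wraps per lane in the vector, but carries across byte
   boundaries in the scalar, so the results can differ.  */
int
add_matches_scalar (footype a, footype b)
{
  return as_u32 (a + b) == as_u32 (a) + as_u32 (b);
}
```

For a = {0, 0xff, 0, 0} and b = {0, 1, 0, 0} the XOR results agree,
while the lane-wise add wraps to zero and the scalar add carries into
the neighbouring byte.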

Richard?
