"yangyang (ET)" <[email protected]> writes:
> Although Richard mentioned in the PR that poly_uint64 will naturally
> decay to a uint64_t in i386 target files, it seems that the /= operator
> is not supported yet, so I changed "clonei->simdlen /= GET_MODE_BITSIZE
> (TYPE_MODE (base_type));" into "clonei->simdlen = clonei->simdlen
> / GET_MODE_BITSIZE (TYPE_MODE (base_type));".
Ah, I don't remember encountering that one. But yeah, expanding the
/= seems like the best approach for now; poly_int doesn't define
operator/=, so the division has to be written out as a plain assignment
(or go through helpers like exact_div).
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a8cc545c370..c630c0c7f81 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -23044,18 +23044,23 @@ aarch64_simd_clone_compute_vecsize_and_simdlen
> (struct cgraph_node *node,
> tree base_type, int num)
> {
> tree t, ret_type, arg_type;
> - unsigned int elt_bits, vec_bits, count;
> + unsigned int elt_bits, count;
> + unsigned HOST_WIDE_INT const_simdlen;
> + poly_uint64 vec_bits;
>
> if (!TARGET_SIMD)
> return 0;
>
> - if (clonei->simdlen
> - && (clonei->simdlen < 2
> - || clonei->simdlen > 1024
> - || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
> +  /* For now, SVE simdclones won't produce illegal simdlen, so only check
> +     const simdlens here. */
> + if (maybe_ne (clonei->simdlen, 0U)
> + && (clonei->simdlen.is_constant (&const_simdlen))
Very minor, but GCC style is (mostly!) not to wrap a condition like this
in parentheses if it fits on a single line, so just:
&& clonei->simdlen.is_constant (&const_simdlen)
> else
> {
> count = 1;
> vec_bits = clonei->simdlen * elt_bits;
> - if (vec_bits != 64 && vec_bits != 128)
> +      /* For now, SVE simdclones won't produce illegal simdlen, so only check
> +         const simdlens here. */
> + if (clonei->simdlen.is_constant (&const_simdlen)
> + && known_ne (vec_bits, 64U) && known_ne (vec_bits, 128U))
Although it won't make a difference in context due to the is_constant
check, in principle this should be “maybe_ne” rather than “known_ne”.
E.g. when testing SVE conditions, known_ne (2 + 2 * (VQ - 1), 2)
is false but maybe_ne (2 + 2 * (VQ - 1), 2) is true.
Alternatively:
!(known_eq (vec_bits, 64U) || known_eq (vec_bits, 128U))
if that seems more natural (either's fine).
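If it helps, here's a toy model of the semantics (illustration only:
the struct and names below are made up, not the real poly-int.h
machinery).  A one-indeterminate poly value is A + B*x, where x is a
runtime parameter known only to be a nonnegative integer:

  #include <stdint.h>

  /* Toy stand-in for a one-indeterminate poly_uint64: value = a + b*x.  */
  struct toy_poly { uint64_t a, b; };

  /* Could A + B*x equal CST for some runtime x >= 0?  */
  static bool
  toy_maybe_eq (toy_poly p, uint64_t cst)
  {
    if (p.b == 0)
      return p.a == cst;
    return cst >= p.a && (cst - p.a) % p.b == 0;
  }

  /* known_ne: the values differ for every x.  maybe_ne: for some x.  */
  static bool toy_known_ne (toy_poly p, uint64_t c) { return !toy_maybe_eq (p, c); }
  static bool toy_maybe_ne (toy_poly p, uint64_t c) { return p.b != 0 || p.a != c; }

With {2, 2} standing for 2 + 2 * (VQ - 1): toy_known_ne ({2, 2}, 2) is
false because VQ == 1 gives exactly 2, while toy_maybe_ne ({2, 2}, 2)
is true because any VQ > 1 gives more.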
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 54c2cdaf060..0ef037e5e55 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -22140,7 +22140,7 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct
> cgraph_node *node,
> || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
> {
> warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> - "unsupported simdlen %d", clonei->simdlen);
> + "unsupported simdlen %ld", clonei->simdlen.to_constant ());
I think this should be %wd instead: to_constant () yields a
HOST_WIDE_INT-sized value, and %wd is the diagnostic modifier for
HOST_WIDE_INT, whereas %ld breaks on hosts where long is 32 bits.
> @@ -22267,7 +22268,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct
> cgraph_node *node,
> if (cnt > (TARGET_64BIT ? 16 : 8))
> {
> warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> - "unsupported simdlen %d", clonei->simdlen);
> + "unsupported simdlen %ld",
> + clonei->simdlen.to_constant ());
Same here.
> @@ -502,17 +504,18 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
> veclen = node->simdclone->vecsize_int;
> else
> veclen = node->simdclone->vecsize_float;
> - veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
> - if (veclen > node->simdclone->simdlen)
> + veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
> + if (known_gt (veclen, node->simdclone->simdlen))
> veclen = node->simdclone->simdlen;
Although again it probably doesn't make a difference in practice,
the known/maybe situation is similar here. When comparing:
- an SVE vector of 2 + 2 * (VQ - 1) doubles and
- an Advanced SIMD vector of 2 doubles
the Advanced SIMD version is conceptually ordered <= the SVE one,
in the sense that the SVE vector always contains a whole number of
Advanced SIMD vectors whereas the Advanced SIMD vector might not
contain a whole number of SVE vectors.
In other words, the number of lanes in the Advanced SIMD vector
is known_le the number of lanes in the SVE vector, and the number
of lanes in the SVE vector is known_ge and maybe_gt (but not known_gt)
the number of lanes in the Advanced SIMD vector. So for these kinds of
comparison, known_gt can look a bit unexpected, even if (as here) it's
probably fine in practice.
There's currently a hard-coded assumption in this code and in the
vectoriser that both constant-length software vectors and constant-length
hardware vectors are a power of 2 in size. This means that the > above
is effectively testing whether veclen contains a whole number of
node->simdclone->simdlens, not just whether the veclen is bigger.
So when adding the initial autovec support for SVE, we generally rewrote
these kinds of test to use multiple_p instead of comparisons. I think
that makes the intent more obvious (but others might disagree :-)).
That was a long-winded way of saying that I suggest we use:
if (multiple_p (veclen, node->simdclone->simdlen))
veclen = node->simdclone->simdlen;
instead.
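To make that concrete with the same example values:

  multiple_p (2 + 2x, 2)  -> true   (both coefficients are multiples of 2)
  multiple_p (2, 2 + 2x)  -> false  (fails for any VQ > 1)

so multiple_p (veclen, node->simdclone->simdlen) spells out “veclen
contains a whole number of simdlens” directly.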
> if (POINTER_TYPE_P (t))
> t = pointer_sized_int_node;
> - if (veclen == node->simdclone->simdlen)
> + if (known_eq (veclen, node->simdclone->simdlen))
> t = build_vector_type (t, node->simdclone->simdlen);
> else
> {
> t = build_vector_type (t, veclen);
> - t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
> + t = build_array_type_nelts (t, exact_div (node->simdclone->simdlen,
> + veclen));
Another minor formatting thing, but the “veclen” should be indented under
“node->simdclone->simdlen”.
> }
> TREE_TYPE (TREE_TYPE (fndecl)) = t;
> if (!node->definition)
> @@ -663,8 +669,9 @@ simd_clone_adjust_argument_types (struct cgraph_node
> *node)
> veclen = sc->vecsize_int;
> else
> veclen = sc->vecsize_float;
> - veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
> - if (veclen > sc->simdlen)
> + veclen = exact_div (veclen,
> + GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
> + if (known_gt (veclen, sc->simdlen))
> veclen = sc->simdlen;
Similarly here.
> if (sc->mask_mode != VOIDmode)
> adj.type
> @@ -675,7 +682,8 @@ simd_clone_adjust_argument_types (struct cgraph_node
> *node)
> adj.type = build_vector_type (base_type, veclen);
> vec_safe_push (new_params, adj);
>
> - for (j = veclen; j < sc->simdlen; j += veclen)
> + k = vector_unroll_factor (sc->simdlen, veclen);
> + for (j = 1; j < k; j++)
> vec_safe_push (new_params, adj);
>
> /* We have previously allocated one extra entry for the mask. Use
> @@ -690,9 +698,9 @@ simd_clone_adjust_argument_types (struct cgraph_node
> *node)
> if (sc->mask_mode == VOIDmode)
> sc->args[i].simd_array
> = create_tmp_simd_array ("mask", base_type, sc->simdlen);
> - else if (veclen < sc->simdlen)
> + else if (known_lt (veclen, sc->simdlen))
I think k > 1 might be clearer, in the sense that it avoids a choice
between “maybe” and “known”: k == 1 exactly when a single mask vector
covers the whole simdlen.
> sc->args[i].simd_array
> - = create_tmp_simd_array ("mask", adj.type, sc->simdlen / veclen);
> + = create_tmp_simd_array ("mask", adj.type, k);
> else
> sc->args[i].simd_array = NULL_TREE;
> }
> […]
> @@ -981,8 +992,11 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
> iter, NULL_TREE, NULL_TREE);
> adjustments->register_replacement (&(*adjustments->m_adj_params)[j],
> r);
>
> - if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
> - j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
> + if (known_lt (simd_clone_subparts (vectype), node->simdclone->simdlen))
> + {
> + j += vector_unroll_factor (node->simdclone->simdlen,
> + simd_clone_subparts (vectype)) - 1;
> + }
And similarly here:
if (multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)))
> diff --git a/gcc/poly-int-types.h b/gcc/poly-int-types.h
> index 5e04e63ebf2..78083098baa 100644
> --- a/gcc/poly-int-types.h
> +++ b/gcc/poly-int-types.h
> @@ -81,6 +81,14 @@ typedef poly_int<NUM_POLY_INT_COEFFS, widest_int>
> poly_widest_int;
> #define vector_element_size(SIZE, NELTS) \
> (exact_div (SIZE, NELTS).to_constant ())
>
> +/* Return the number of unroll times when a vector has NELTS1 elements
s/vector has/vector that has/
> + is unrolled to vectors has NELTS2 elements.
s/vectors has/vectors that have/.
> +
> + to_constant () is safe in this situation because the multiples of the
> + NELTS of two vectors are always constant-size scalars. */
> +#define vector_unroll_factor(NELTS1, NELTS2) \
> + (exact_div (NELTS1, NELTS2).to_constant ())
> +
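For a concrete case of why the to_constant () there is safe:
exact_div (8, 2) == 4 and exact_div (4 + 4x, 2 + 2x) == 2 are both
compile-time constants, whereas exact_div (2 + 2x, 2) == 1 + x is not
(to_constant () would ICE on it).  The macro relies on callers never
producing that last kind of pairing.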
> /* Wrapper for poly_int arguments to target macros, so that if a target
> doesn't need polynomial-sized modes, its header file can continue to
> treat the argument as a normal constant. This should go away once
> @@ -3902,12 +3902,12 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> stmt_vec_info stmt_info,
> n = n->simdclone->next_clone)
> {
> unsigned int this_badness = 0;
> - if (n->simdclone->simdlen > vf
> + if (known_gt (n->simdclone->simdlen, vf)
I think !constant_multiple_p rather than known_gt here. Until we add
support for vectorising simd clones with partial vectors, a whole number
of simdlens must fit within VF. So maybe:
unsigned int num_calls;
if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
and then…
> || n->simdclone->nargs != nargs)
> continue;
> - if (n->simdclone->simdlen < vf)
> - this_badness += (exact_log2 (vf)
> - - exact_log2 (n->simdclone->simdlen)) * 1024;
> + if (known_lt (n->simdclone->simdlen, vf))
> + this_badness += exact_log2
> + (vector_unroll_factor (vf, n->simdclone->simdlen)) * 1024;
…use num_calls instead of vector_unroll_factor here.
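i.e. assembled, something like (just a sketch of the same suggestion):

	    unsigned int num_calls;
	    if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
		|| n->simdclone->nargs != nargs)
	      continue;
	    if (num_calls != 1)
	      this_badness += exact_log2 (num_calls) * 1024;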
> if (n->simdclone->inbranch)
> this_badness += 2048;
> int target_badness = targetm.simd_clone.usable (n);
> @@ -3988,19 +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> stmt_vec_info stmt_info,
> arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
> slp_node);
> if (arginfo[i].vectype == NULL
> - || (simd_clone_subparts (arginfo[i].vectype)
> - > bestn->simdclone->simdlen))
> + || (known_gt (simd_clone_subparts (arginfo[i].vectype),
> + bestn->simdclone->simdlen)))
Here too I think we want constant_multiple_p:
|| !constant_multiple_p (bestn->simdclone->simdlen,
simd_clone_subparts (arginfo[i].vectype))
> return false;
> }
>
> fndecl = bestn->decl;
> nunits = bestn->simdclone->simdlen;
> - ncopies = vf / nunits;
> + ncopies = vector_unroll_factor (vf, nunits);
>
> /* If the function isn't const, only allow it in simd loops where user
> has asserted that at least nunits consecutive iterations can be
> performed using SIMD instructions. */
> - if ((loop == NULL || (unsigned) loop->safelen < nunits)
> + if ((loop == NULL || known_lt ((unsigned) loop->safelen, nunits))
This should be maybe_lt: we can only safely optimise when we know that
safelen is at least nunits, so we have to bail out whenever safelen
might be smaller.
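For example, with safelen 4 and an SVE nunits of 2 + 2 * (VQ - 1):

  known_lt (4, 2 + 2x)  -> false  (VQ == 1 gives nunits == 2)
  maybe_lt (4, 2 + 2x)  -> true   (VQ == 3 gives nunits == 6)

so only maybe_lt gives the conservative bail-out when the runtime
vector length could exceed safelen.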
> && gimple_vuse (stmt))
> return false;
>
> @@ -4078,15 +4078,16 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> stmt_vec_info stmt_info,
> {
> case SIMD_CLONE_ARG_TYPE_VECTOR:
> atype = bestn->simdclone->args[i].vector_type;
> - o = nunits / simd_clone_subparts (atype);
> + o = vector_unroll_factor (nunits,
> + simd_clone_subparts (atype));
> for (m = j * o; m < (j + 1) * o; m++)
> {
> - if (simd_clone_subparts (atype)
> - < simd_clone_subparts (arginfo[i].vectype))
> + if (known_lt (simd_clone_subparts (atype),
> + simd_clone_subparts (arginfo[i].vectype)))
I think we should leave this and…
> {
> poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
> - k = (simd_clone_subparts (arginfo[i].vectype)
> - / simd_clone_subparts (atype));
> + k = simd_clone_subparts (arginfo[i].vectype)
> + / simd_clone_subparts (atype);
…this until the simd_clone_subparts calls are removed. FWIW, the original
formatting of the division is the preferred one (with the justification
that it helps emacs to indent the second line correctly :-)).
> gcc_assert ((k & (k - 1)) == 0);
> if (m == 0)
> {
> @@ -4116,8 +4117,8 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> stmt_vec_info stmt_info,
> }
> else
> {
> - k = (simd_clone_subparts (atype)
> - / simd_clone_subparts (arginfo[i].vectype));
> + k = simd_clone_subparts (atype)
> + / simd_clone_subparts (arginfo[i].vectype);
Similarly here, the original formatting was correct.
> gcc_assert ((k & (k - 1)) == 0);
> vec<constructor_elt, va_gc> *ctor_elts;
> if (k != 1)
> @@ -4250,7 +4251,8 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> stmt_vec_info stmt_info,
> gcall *new_call = gimple_build_call_vec (fndecl, vargs);
> if (vec_dest)
> {
> - gcc_assert (ratype || simd_clone_subparts (rtype) == nunits);
> + gcc_assert (ratype || known_eq (simd_clone_subparts (rtype),
> + nunits));
Another formatting nit, sorry, but now that it needs to span multiple lines,
the preferred formatting is:
gcc_assert (ratype
|| known_eq (simd_clone_subparts (rtype), nunits));
> if (ratype)
> new_temp = create_tmp_var (ratype);
> else if (useless_type_conversion_p (vectype, rtype))
> @@ -4264,12 +4266,13 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> stmt_vec_info stmt_info,
>
> if (vec_dest)
> {
> - if (simd_clone_subparts (vectype) < nunits)
> + if (known_lt (simd_clone_subparts (vectype), nunits))
multiple_p (nunits, …) here too. It would also be possible to do:
if (constant_multiple_p (nunits, simd_clone_subparts (vectype), &k)
and avoid the separate vector_unroll_factor.
> {
> unsigned int k, l;
> poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
> poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
> - k = nunits / simd_clone_subparts (vectype);
> + k = vector_unroll_factor (nunits,
> + simd_clone_subparts (vectype));
> gcc_assert ((k & (k - 1)) == 0);
> for (l = 0; l < k; l++)
> {
> @@ -4295,16 +4298,18 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> stmt_vec_info stmt_info,
> vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
> continue;
> }
> - else if (simd_clone_subparts (vectype) > nunits)
> + else if (known_gt (simd_clone_subparts (vectype), nunits))
multiple_p here too.
> {
> - unsigned int k = (simd_clone_subparts (vectype)
> - / simd_clone_subparts (rtype));
> + unsigned int k = simd_clone_subparts (vectype)
> + / simd_clone_subparts (rtype);
Same comment about formatting.
The patch looks really good though, thanks. The above comments are only
really repetitions of a small number of minor things.
Richard