Steve Ellcey <[email protected]> writes:
> Here is an updated version of the GCC patch to enable SIMD functions on
> Aarch64. There are a number of changes from the last patch.
>
> I reduced the shared code changes, there is still one change in shared code
> (omp-simd-clone.c) to call targetm.simd_clone.adjust from expand_simd_clones
> but it now uses the same argument as the existing call. This new call allows
> Aarch64 to add the aarch64_vector_pcs attribute to SIMD clone definitions
> which in turn ensures they use the correct ABI. Previously this target
> function was only called on declarations, not definitions. This change
> affects
> the x86 target so I modified ix86_simd_clone_adjust to return and do nothing
> when called with a definition. This means there is no change in behaviour
> on x86. I did a build and GCC testsuite run on x86 to verify this.
>
> Most of the changes from the previous patch are in the
> aarch64_simd_clone_compute_vecsize_and_simdlen function.
>
> The previous version was heavily based on the x86 function, this one has
> changes to address the issues that were raised in the earlier patch
> and so it no longer looks like the x86 version. I use types instead of modes
> to check for what we can/cannot vectorize and I (try to) differentiate
> between vectors that we are not currently handling (but could later) and
> those that won't ever be handled.
>
> I have also added a testsuite patch to fix regressions in the gcc.dg/gomp
> and g++.dg/gomp tests. There are no regressions with this patch applied.
>
> Steve Ellcey
> [email protected]
>
> 2018-12-19 Steve Ellcey <[email protected]>
>
> * config/aarch64/aarch64.c (cgraph.h): New include.
> (supported_simd_type): New function.
> (currently_supported_simd_type): Ditto.
> (aarch64_simd_clone_compute_vecsize_and_simdlen): Ditto.
> (aarch64_simd_clone_adjust): Ditto.
> (aarch64_simd_clone_usable): Ditto.
> (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New macro.
> (TARGET_SIMD_CLONE_ADJUST): Ditto.
> (TARGET_SIMD_CLONE_USABLE): Ditto.
> * config/i386/i386.c (ix86_simd_clone_adjust): Add definition check.
> * omp-simd-clone.c (expand_simd_clones): Add targetm.simd_clone.adjust
> call.
>
> 2018-12-19 Steve Ellcey <[email protected]>
>
> * g++.dg/gomp/declare-simd-1.C: Add aarch64 specific
> warning checks and assembler scans.
> * g++.dg/gomp/declare-simd-3.C: Ditto.
> * g++.dg/gomp/declare-simd-4.C: Ditto.
> * g++.dg/gomp/declare-simd-7.C: Ditto.
> * gcc.dg/gomp/declare-simd-1.c: Ditto.
> * gcc.dg/gomp/declare-simd-3.c: Ditto.
Sorry, hadn't realised this was still unreviewed.
> @@ -18064,6 +18066,138 @@ aarch64_estimated_poly_value (poly_int64 val)
> return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
> }
>
> +
> +/* Return true for types that could be supported as SIMD return or
> + argument types. */
> +
> +static bool supported_simd_type (tree t)
> +{
> + return (FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t));
We should also check that the size is 1, 2, 4 or 8 bytes.
> +}
> +
> +/* Return true for types that currently are supported as SIMD return
> + or argument types. */
> +
> +static bool currently_supported_simd_type (tree t)
> +{
> + if (COMPLEX_FLOAT_TYPE_P (t))
> + return false;
> +
> + return supported_simd_type (t);
> +}
> +
> +/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN. */
> +
> +static int
> +aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
> + struct cgraph_simd_clone *clonei,
> + tree base_type,
> + int num ATTRIBUTE_UNUSED)
> +{
> + const char *wmsg;
> + int vsize;
> + tree t, ret_type, arg_type;
> +
> + if (!TARGET_SIMD)
> + return 0;
> +
> + if (clonei->simdlen
> + && (clonei->simdlen < 2
> + || clonei->simdlen > 1024
> + || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
> + {
> + warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> + "unsupported simdlen %d", clonei->simdlen);
> + return 0;
> + }
> +
> + ret_type = TREE_TYPE (TREE_TYPE (node->decl));
> + if (TREE_CODE (ret_type) != VOID_TYPE
> + && !currently_supported_simd_type (ret_type))
> + {
> + if (supported_simd_type (ret_type))
> + wmsg = G_("GCC does not currently support return type %qT for simd");
> + else
> + wmsg = G_("unsupported return type return type %qT for simd");
Typo: doubled "return type".
Maybe s/for simd/for %<simd%> functions/ in both messages.
Since that will blow the line limit...
> + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, wmsg, ret_type);
> + return 0;
> + }
...it's probably worth just calling warning_at in each arm of the "if".
We'll then get proper format checking in bootstraps.
> + for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t))
> + {
> + arg_type = TREE_TYPE (t);
> + if (POINTER_TYPE_P (arg_type))
> + arg_type = TREE_TYPE (arg_type);
> + if (!currently_supported_simd_type (arg_type))
> + {
> + if (supported_simd_type (arg_type))
> + wmsg = G_("GCC does not currently support argument type %qT "
> + "for simd");
> + else
> + wmsg = G_("unsupported argument type %qT for simd");
> + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, wmsg, arg_type);
> + return 0;
> + }
> + }
The ABI supports all argument types, so this should only check
current_supported_simd_type and should always use the "GCC does not..."
message.
Could you explain why the POINTER_TYPE_P handling is correct?
Think it's worth a comment. Dropping it is also fine if that's easier.
(Just in case: the point about PBV in my previous reply was that if
an argument isn't a natural vector element, the ABI says that you
should implicitly convert it to a vector of pointers instead, which is
how it can handle any argument type. I didn't mean that we should add
a pointer check here.)
> + if (clonei->simdlen == 0)
> + {
> + if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
> + clonei->simdlen = clonei->vecsize_int;
> + else
> + clonei->simdlen = clonei->vecsize_float;
> + clonei->simdlen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
> + return 1;
> + }
I should have noticed this last time, but base_type is the CDT in the
Intel ABI. That isn't always right for the AArch64 ABI.
I think for now currently_supported_simd_type should take base_type
as a second parameter and check that the given type has the same size.
> + /* Restrict ourselves to vectors that fit in a single register */
> +
> + gcc_assert (tree_fits_shwi_p (TYPE_SIZE (base_type)));
> + vsize = clonei->simdlen * tree_to_shwi (TYPE_SIZE (base_type));
> + if (vsize > 128)
> + {
> + warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> + "GCC does not currently support simdlen %d for type %qT",
> + clonei->simdlen, base_type);
> + return 0;
> + }
nit: block contents indented too far.
> + return 0;
> +}
Doesn't this mean that we always silently fail for an explicit and
correct simdlen? If so, that suggests we might not have enough
testsuite coverage :-)
Thanks,
Richard