[+arm maintainers]

Christophe Lyon <christophe.l...@linaro.org> writes:
> On 18 September 2017 at 15:57, Richard Sandiford
> <richard.sandif...@linaro.org> wrote:
>> Richard Biener <richard.guent...@gmail.com> writes:
>>> On Mon, Sep 18, 2017 at 1:58 PM, Richard Sandiford
>>> <richard.sandif...@linaro.org> wrote:
>>>> The vectoriser aligned vectors to TYPE_ALIGN unconditionally, although
>>>> there was also a hard-coded assumption that this was equal to the type
>>>> size.  This was inconvenient for SVE for two reasons:
>>>>
>>>> - When compiling for a specific power-of-2 SVE vector length, we might
>>>>   want to align to a full vector.  However, the TYPE_ALIGN is governed
>>>>   by the ABI alignment, which is 128 bits regardless of size.
>>>>
>>>> - For vector-length-agnostic code it doesn't usually make sense to align,
>>>>   since the runtime vector length might not be a power of two.  Even for
>>>>   power of two sizes, there's no guarantee that aligning to the previous
>>>>   16 bytes will be an improvement.
>>>>
>>>> This patch therefore adds a target hook to control the preferred
>>>> vectoriser (as opposed to ABI) alignment.
>>>>
>>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>>>> Also tested by comparing the testsuite assembly output on at least one
>>>> target per CPU directory.  OK to install?
>>>
>>> Did you specifically choose to pass the hook a vector type rather than
>>> a mode?
>>
>> It seemed like the safest thing to do for the default implementation,
>> e.g. in case we're vectorising "without SIMD" and thus without an
>> underlying vector mode.  I agree it probably doesn't make much
>> difference for non-default implementations.
>>
>>> I suppose in peeling for alignment the target should be able to
>>> prevent peeling by returning element alignment from the hook?
>>
>> Yeah.  This is what the SVE port does in the default vector-length
>> agnostic mode, and might also make sense in fixed-length mode.
>> Maybe it would be useful for other targets too, if unaligned accesses
>> have a negligible penalty for them.
>>
>
> Since this was committed (r253101), I've noticed a regression
> on arm-none-linux-gnueabihf:
> FAIL:    gcc.dg/vect/vect-multitypes-1.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "Vectorizing an unaligned access" 4
> FAIL:    gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect
> "Vectorizing an unaligned access" 4
>
> for instance, with
> --target arm-none-linux-gnueabihf
> --with-mode arm
> --with-cpu cortex-a9
> --with-fpu neon-fp16

Thanks for letting me know.  I hadn't realised that AArch32 had an
ABI alignment of 64 bits for 128-bit vectors.

The problem is that, before the patch, we were inconsistent about
using the type alignment and type size for alignment calculations.
vect_compute_data_ref_alignment used the type alignment, so for
128-bit vectors it would calculate the misalignment relative to 64 bits
rather than 128 bits.  But vect_enhance_data_refs_alignment
calculated the number of peels based on the type size rather
than the type alignment, so would act as though the misalignment
calculated by vect_compute_data_ref_alignment was relative to
128 bits rather than 64 bits.

So for one loop in the test case, we pick a "vector short (8)".
vect_compute_data_ref_alignment realised that one of the original scalar
short accesses was misaligned by 6 bytes (3 elements) relative to the
ABI-defined 8-byte boundary.  However, vect_enhance_data_refs_alignment
applied that misalignment to the vector size of 16 bytes.  We would
then peel 5 iterations to achieve alignment, even though only one
iteration was needed to achieve 64-bit alignment.

(Peeling 5 iterations doesn't achieve 128-bit alignment, since for this
loop we can't compute the misalignment relative to 128 bits at compile
time.  All we were doing was peeling an extra vector iteration to
achieve 64-bit alignment.)

After the patch we base everything on the ABI's 64-bit alignment,
so peel once rather than 5 times.  The combination of these effects
means that we end up with fewer accesses being reported as unaligned.
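
To make the peel-count arithmetic concrete, here is a rough sketch of
the two calculations (illustrative only, not the actual vectoriser
code, and the variable names are made up):

  /* 2-byte scalar shorts, misaligned by 6 bytes (3 elements) relative
     to the ABI-defined 8-byte boundary.  */
  int elem_size = 2;
  int misalign = 6;

  /* Pre-patch: the peel count was computed against the 16-byte
     type size.  */
  int npeel_old = (16 - misalign) / elem_size;  /* (16 - 6) / 2 = 5 */

  /* Post-patch: the peel count is computed against the 8-byte
     ABI alignment.  */
  int npeel_new = (8 - misalign) / elem_size;   /* (8 - 6) / 2 = 1 */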

In that sense the patch is behaving as designed.  But the question is:
what should the preferred alignment for vectorisation purposes be on Arm?
The options seem to be:

1) Prefer to align to the ABI alignment, as current trunk does.
2) Prefer to align to the vector size.

Both are changes from the pre-patch behaviour for this testcase,
since, as I mentioned above, we cannot compute the misalignment
wrt the vector size at compile time (but before the patch we
acted as though we could).
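
For reference, option 1) is what the default hook already gives;
it boils down to something like:

  /* Rough sketch of the default hook: prefer the ABI alignment
     of the vector type.  */
  static HOST_WIDE_INT
  default_preferred_vector_alignment (const_tree type)
  {
    return TYPE_ALIGN (type);
  }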

The attached patch would do 2).  (Minus changelog since it's
just an illustration.)  Changes to the testsuite markup are
needed both ways.

Thanks,
Richard


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3e5438a..6cee000 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -273,6 +273,7 @@ static bool arm_array_mode_supported_p (machine_mode,
 static machine_mode arm_preferred_simd_mode (scalar_mode);
 static bool arm_class_likely_spilled_p (reg_class_t);
 static HOST_WIDE_INT arm_vector_alignment (const_tree type);
+static HOST_WIDE_INT arm_preferred_vector_alignment (const_tree);
 static bool arm_vector_alignment_reachable (const_tree type, bool is_packed);
 static bool arm_builtin_support_vector_misalignment (machine_mode mode,
                                                     const_tree type,
@@ -718,6 +719,9 @@ static const struct attribute_spec arm_attribute_table[] =
 
 #undef TARGET_VECTOR_ALIGNMENT
 #define TARGET_VECTOR_ALIGNMENT arm_vector_alignment
+#undef TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT
+#define TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT \
+  arm_preferred_vector_alignment
 
 #undef TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
 #define TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE \
@@ -27922,6 +27926,16 @@ arm_vector_alignment (const_tree type)
   return align;
 }
 
+/* Implement TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT.  We prefer to
+   align 128-bit vectors to 128 bits even when the ABI alignment is
+   clamped to 64 bits.  */
+
+static HOST_WIDE_INT
+arm_preferred_vector_alignment (const_tree type)
+{
+  return tree_to_shwi (TYPE_SIZE (type));
+}
+
 static unsigned int
 arm_autovectorize_vector_sizes (void)
 {
