On Mon, 1 Nov 2010 15:57:11 +0200
Ira Rosen <i...@il.ibm.com> wrote:

> It looks like it's enough to implement targetm.vectorize.
> autovectorize_vector_sizes for NEON in order to enable initial
> auto-detection of vector size. With the attached patch and
> -mvectorize-with-neon-quad flag, the vectorizer first tries to
> vectorize for 128 bit, and if this fails, it tries to vectorize for
> 64 bit. For example, in the attached testcase number of iterations is
> too small for 128 bit (first 2 iterations have to be peeled in order
> to align the array accesses), but is sufficient for 64 bit (the
> accesses are aligned here).

Cool, I hadn't found out about that hook yet (nor that
UNITS_PER_SIMD_WORD had disappeared in favour of
TARGET_VECTORIZE_PREFERRED_SIMD_MODE!).

> I'd appreciate your comments on the patch, and I also have a few
> questions: 1. Why the default vector size is 64?

Ah, well, that was a decision made several years ago, and may no longer
be the optimal choice. There are several possible reasons:

1. NEON hardware available at the time (Cortex-A8) only processed data
in 64-bit chunks, so Q-reg operations weren't necessarily any faster
than D-reg operations (that may still be true).

2. Initially misaligned operations were not supported, so larger
vectors increase the chances that loop versioning would be required.

3. Generally, setup/tear-down code is assumed to be larger for wider
vectors, and ARM hardware generally isn't very well-endowed with cache.

The choice wasn't supported by benchmarking, since IIRC we didn't
have real hardware when we implemented initial NEON support, and
nobody's revisited the decision since.

> 2. Where is the place of NEON vectorization tests? I found NEON tests
> with intrinsics at gcc.target/arm, is that the right place?

Looks like a good place to me. It might be an idea to avoid putting
stuff in gcc.target/arm/neon, since most of those tests are
auto-generated.

> 3. According to gcc.dg/vect/vect.exp the only flag that is used for
> NEON (in addition to target independent flags) is -ffast-math. Is
> that enough?

I'm not sure what you're asking here: there aren't any other
target-specific flags to tweak NEON behaviour. Generally to enable NEON
you need to set the float ABI and FPU variant (if they're not
configured-in to the compiler), e.g.:

 -mfloat-abi=softfp/-mfloat-abi=hard -mfpu=neon* [-march=armv7-a]

* there are several variants for this, e.g. neon, neon-fp16,
neon-vfpv4 ... generally -mfpu=neon should do though, for Cortex-A8
chips at least.

>       * config/arm/arm.c (arm_autovectorize_vector_sizes): New
>       function.
>       (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Define.

Looks fine to me (though I'm not an ARM maintainer, so can't approve it
upstream). It's a bit of a pity the hook is defined in such a way that
it doesn't support non-power-of-two sized vectors, but never mind...

I wonder if this allows us to remove -mvectorize-with-neon-quad
already (or, perhaps, wire it on but make the option a no-op, for
possible backward-compatibility reasons)?

Thanks,

Julian

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain

Reply via email to