On Mon, 1 Nov 2010 15:57:11 +0200 Ira Rosen <i...@il.ibm.com> wrote: > It looks like it's enough to implement targetm.vectorize. > autovectorize_vector_sizes for NEON in order to enable initial > auto-detection of vector size. With the attached patch and > -mvectorize-with-neon-quad flag, the vectorizer first tries to > vectorize for 128 bit, and if this fails, it tries to vectorize for > 64 bit. For example, in the attached testcase number of iterations is > too small for 128 bit (first 2 iterations have to be peeled in order > to align the array accesses), but is sufficient for 64 bit (the > accesses are aligned here).
Cool, I hadn't found out about that hook yet (nor that UNITS_PER_SIMD_WORD had disappeared in favour of TARGET_VECTORIZE_PREFERRED_SIMD_MODE!). > I'd appreciate your comments on the patch, and I also have a few > questions: 1. Why the default vector size is 64? Ah, well, that was a decision made several years ago, and may no longer be the optimal choice. There are several possible reasons: 1. NEON hardware available at the time (Cortex-A8) only processed data in 64-bit chunks, so Q-reg operations weren't necessarily any faster than D-reg operations (that may still be true). 2. Initially misaligned operations were not supported, so larger vectors increase the chances that loop versioning would be required. 3. Generally, setup/tear-down code is assumed to be larger for wider vectors, and ARM hardware generally isn't very well-endowed with cache. The choice wasn't supported by benchmarking, since IIRC we didn't have real hardware when we implemented initial NEON support, and nobody's revisited the decision since. > 2. Where is the place of NEON vectorization tests? I found NEON tests > with intrinsics at gcc.target/arm, is that the right place? Looks like a good place to me. It might be an idea to avoid putting stuff in gcc.target/arm/neon, since most of those tests are auto-generated. > 3. According to gcc.dg/vect/vect.exp the only flag that is used for > NEON (in addition to target independent flags) is -ffast-math. Is > that enough? I'm not sure what you're asking here: there aren't any other target-specific flags to tweak NEON behaviour. Generally to enable NEON you need to set the float ABI and FPU variant (if they're not configured-in to the compiler), e.g.: -mfloat-abi=softfp/-mfloat-abi=hard -mfpu=neon* [-march=armv7-a] * there are several variants for this, e.g. neon, neon-fp16, neon-vfpv4 ... generally -mfpu=neon should do though, for Cortex-A8 chips at least. > * config/arm/arm.c (arm_autovectorize_vector_sizes): New > function. > (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Define. Looks fine to me (though I'm not an ARM maintainer, so can't approve it upstream). It's a bit of a pity the hook is defined in such a way that it doesn't support non-power-of-two sized vectors, but never mind... I wonder if this allows us to remove -mvectorize-with-neon-quad already (or, perhaps, wire it on but make the option a no-op, for possible backward-compatibility reasons)? Thanks, Julian _______________________________________________ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain