Hi folks,

I found an issue while fixing a test that was using the wrong VMUL.f32, and
I'd like to know where we should stand on a topic that is slightly
controversial.

Basically, LLVM chooses to lower single-precision FMUL to NEON's VMUL.f32
instead of VFP's version because, on some cores (A8, A5 and Apple's Swift),
the VFP variant is really slow.
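
Just to make the choice concrete, here is roughly what the two lowerings
boil down to for a plain f32 multiply (the register numbers below are only
illustrative, the instruction selection is the point):

    float mul(float a, float b) {
        /* VFP:  vmul.f32 s0, s0, s1  ("fmuls" in pre-UAL syntax) - IEEE 754   */
        /* NEON: vmul.f32 d0, d0, d1  (scalar kept in a d register) - non-IEEE */
        return a * b;
    }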

This is all cool and dandy, but NEON is not IEEE 754 compliant (it flushes
denormals to zero), so the result can be slightly different. So slightly
that only one test, one that was really pushing the boundaries (i.e. going
below FLT_MIN), actually caught it.
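
Here is a minimal sketch of the kind of case that catches it (my own
example, not the actual failing test): the exact product is below FLT_MIN,
so an IEEE-compliant multiply keeps a denormal while a flush-to-zero
multiply gives +0.0f:

    #include <float.h>
    #include <stdio.h>

    int main(void) {
        volatile float a = FLT_MIN;   /* smallest normal float, ~1.18e-38     */
        volatile float b = 0.5f;
        float p = a * b;              /* exact result is a denormal, ~5.9e-39 */

        /* VFP (IEEE 754): p is a non-zero denormal, so this prints 0.
           NEON VMUL.f32 (flush-to-zero): p is +0.0f, so it prints 1. */
        printf("p = %g, flushed = %d\n", (double)p, p == 0.0f);
        return 0;
    }

Whether p comes out as zero depends, of course, on which of the two
instructions the compiler picked for the multiply (and, for the VFP case,
on the FPSCR flush-to-zero bit staying off).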

There are two ways we can go here:

1. Strict IEEE compliance by default, and *only* lower to NEON's VMUL when
unsafe-math is on. This makes generic single-precision code slower on those
cores, but you can always turn unsafe-math on if you want more speed.

2. Continue using NEON for f32 by default and document somewhere that
people should turn this option (FeatureNEONForFP) off on A5/A8 if they
*really* care about full IEEE compliance.

Apple has already said that, for Darwin, option 2 is still the way to go.
Do we agree and ignore this issue, or do we want strict conformance by
default for GNU/EABI?

GCC uses fmuls (the VFP instruction), by the way...

cheers,
--renato
