On 9 October 2012 10:37, Jubi Taneja <jubitan...@gmail.com> wrote:
> Hi All,
>
> I wanted to see the difference in objdump of an application where I can make
> the difference between the VFPV3 and VFPV4 support. I tried enabling the
> flag -mfpu=vfpv3 and -mfpu=vfpv4 for ARM Cortex A15 toolchain in my test
> code but cannot see the difference in two objdumps.

Try the following (tested against FSF GCC):

/* arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o- /tmp/fma.c -mfloat-abi=hard -O2 */
float f(float a, float b, float c)
{
  return a * b + c;
}
/* end of fma.c */

(Note that -mfloat-abi=softfp will also work in this example. Which one you
want to use depends on whether your system is configured for the hard-float
or soft-float ABI.)

With -mfpu=vfpv4 you should see a vfma.f32 in the output; rebuild with
-mfpu=vfpv3 and you should see a vmla.f32 in its place.

> According to my survey, the fused multiply and accumulate is the only
> instruction that can create the difference in two. Can any one provide the
> sample test code for the same? Precisely, I wish to see the difference in
> performance for vfpv3 and vfpv4.

I would be surprised if you see much difference at all. VFPv3 has the VMLA
(non-fused multiply-accumulate) instruction, which performs an extra rounding
step, but I expect it to have similar performance characteristics to VFMA.

Note that between -mfpu=vfpv3 and -mfpu=vfpv4 there is also
-mfpu=vfpv3-fp16, which added support for loading and storing half-precision
floating-point values. Again, this won't make a performance difference unless
you use half-precision as your storage format.

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org