On 9 October 2012 10:37, Jubi Taneja <jubitan...@gmail.com> wrote:
> Hi All,
>
> I wanted to see the difference in the objdump output of an application
> built with VFPv3 support and with VFPv4 support. I tried the flags
> -mfpu=vfpv3 and -mfpu=vfpv4 with an ARM Cortex-A15 toolchain on my test
> code, but I cannot see any difference between the two objdumps.

Try the following (tested against FSF GCC):

/* arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o-
/tmp/fma.c -mfloat-abi=hard -O2 */
float f(float a, float b, float c)
{
  return a * b + c;
}
/* end of fma.c */

(Note that -mfloat-abi=softfp will also work in this example.  Which
one you want depends on whether your system is configured for the
hard-float or the soft-float ABI.)

> According to my survey, the fused multiply-accumulate is the only
> instruction that can create a difference between the two. Can anyone
> provide sample test code for it? More precisely, I wish to see the
> difference in performance between vfpv3 and vfpv4.

I would be surprised if you see much difference at all.  VFPv3 has the
VMLA (non-fused multiply-accumulate) instruction, which rounds the
product before the addition and so does one more rounding step than
VFMA, but I expect it to have similar performance characteristics.

Note that between -mfpu=vfpv3 and -mfpu=vfpv4 there is also
-mfpu=vfpv3-fp16, which adds support for loading and storing
half-precision floating-point values.  Again, this won't make a
performance difference unless you use half-precision as your storage
format.
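
To make "half-precision as your storage format" concrete, here is a
rough sketch of what that means: values are kept in memory as 16-bit
IEEE half-precision bit patterns and widened to float before any
arithmetic.  The function below is a hypothetical software version of
that widening step, written in portable C so it runs anywhere; on a
target with the fp16 extension I would expect the compiler to do the
same job with conversion instructions (for example through GCC's
__fp16 type on ARM), so treat this only as an illustration of the
format.

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE 754 half-precision bit pattern to float.
   Layout: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits. */
float half_bits_to_float(uint16_t h)
{
  uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
  uint32_t exp  = (h >> 10) & 0x1Fu;
  uint32_t frac = h & 0x3FFu;
  uint32_t bits;
  float result;

  if (exp == 0x1Fu) {
    /* Infinity or NaN: all-ones exponent, keep the fraction bits. */
    bits = sign | 0x7F800000u | (frac << 13);
  } else if (exp != 0) {
    /* Normal number: rebias the exponent from 15 to 127. */
    bits = sign | ((exp + 112u) << 23) | (frac << 13);
  } else if (frac == 0) {
    /* Signed zero. */
    bits = sign;
  } else {
    /* Subnormal half: normalise it, since float can represent it
       as a normal number. */
    int shift = 0;
    while ((frac & 0x400u) == 0) {
      frac <<= 1;
      shift++;
    }
    frac &= 0x3FFu;
    bits = sign | ((uint32_t)(113 - shift) << 23) | (frac << 13);
  }

  memcpy(&result, &bits, sizeof result);  /* reinterpret bits as float */
  return result;
}
```

For example, half_bits_to_float(0x3C00) gives 1.0f and
half_bits_to_float(0x7BFF) gives 65504.0f, the largest finite
half-precision value.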

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
