Re: Improving the code generated for vld and vst intrinsics

2011-02-22 Thread Richard Sandiford
Julian Brown writes:
>> > 2. Builtins (__builtin_neon_*) which previously used "big" integer
>> > modes to pass/return values, are initialised such that they
>> > directly pass/return the struct types above instead. The intrinsic
>> > wrappers in arm_neon.h no longer need to use unions to pun the

Re: Improving the code generated for vld and vst intrinsics

2011-02-22 Thread Julian Brown
On Tue, 22 Feb 2011 09:42:15 + Richard Sandiford wrote:
> Julian Brown writes:
> > Richard Sandiford wrote:
> > 1. Struct (tree) types are defined via hard-wired code in the ARM
> > backend rather than in arm_neon.h. The "type mode" of those struct
> > types is overridden to be an extra-wid

Re: Improving the code generated for vld and vst intrinsics

2011-02-22 Thread Richard Sandiford
Julian Brown writes:
> Richard Sandiford wrote:
>> One of the vectorisation discussions from last year was about the poor
>> code GCC generates for vld{2,3,4}_*() and vst{2,3,4}_*(). It forces
>> the result of the loads onto the stack, then loads the individual
>> pieces from there. It does the

Improving the code generated for vld and vst intrinsics

2011-02-21 Thread Richard Sandiford
One of the vectorisation discussions from last year was about the poor code GCC generates for vld{2,3,4}_*() and vst{2,3,4}_*(). It forces the result of the loads onto the stack, then loads the individual pieces from there. It does the same thing in reverse for stores. I think there are two majo