Richard Earnshaw wrote:

> Hmm, this is going to cause bottlenecks on Cortex-A15: writing a Neon
> single-precision register and then reading it back as a double-precision
> value will cause scheduling problems.

Ok, that is a problem ...

> The awkward thing here is that the shift only uses the bottom 8 bits of
> the register, even though the instruction takes a 64-bit register, so we
> don't want to go to the trouble of sign-extending the value all the way
> out to 64-bits.

We don't really care what the upper bits are set to.  Would a
  vdup.32 Dn, Rm
(instead of the vmov) help here, or does this likewise have
performance issues?
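
To illustrate why the upper bits should not matter, here is a rough C model
of the register-shift semantics as I understand them: only the least
significant byte of the shift register is consulted, as a signed count.
The name model_vshl_u64 is made up for illustration; this is a sketch of
the behaviour, not the architectural definition.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit unsigned Neon register shift
   (VSHL.U64 Dd, Dm, Dn).  Assumption: only bits 0..7 of the shift
   register are used, interpreted as a signed count. */
static uint64_t model_vshl_u64(uint64_t value, uint64_t shift_reg)
{
    int count = (int)(shift_reg & 0xff);   /* bits 8..63 are ignored */

    if (count >= 128)
        count -= 256;                      /* reinterpret byte as signed */

    if (count >= 64 || count <= -64)
        return 0;                          /* all bits shifted out */
    if (count >= 0)
        return value << count;             /* non-negative: left shift */
    return value >> -count;                /* negative: right shift */
}
```

With this model, a shift register holding 4 in its low byte gives the same
result whatever the remaining 56 bits contain, so duplicating the 32-bit
value into both lanes would be as good as a proper 64-bit extension.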
 
> A solution to this is to have the set of the shifter register done as a
> lane-set operation rather than as a set of the lower register, but it
> probably needs some thought as to how to achieve this without creating
> other overheads.

What instruction are you referring to here?  Loads from memory?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com
