Following on from last night's performance call, I had a look at how
64-bit integer operations are mapped to NEON instructions.  The
summary is:

 * add - fine
 * subtract - fine
 * bitwise and - fine
 * bitwise or - fine
 * bitwise xor - fine
 * multiply - can't, as the instruction tops out at 32 bits.  Might be
   able to compose it using VMLAL
 * div, mod - no instruction
 * negate - instruction tops out at 32 bits, but could be turned into
   vmov #0; vsub
 * left shift constant - missing
 * right shift constant - missing
 * right arithmetic shift constant - missing
 * left shift register - missing
 * right shift register - tricky, as you do this as a left shift by
   the negated register
 * logical not - no instruction, but could be done through a vceq #0?
 * bitwise not - missing
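
To make the multiply, negate and right-shift-by-register ideas a bit
more concrete, here is a rough scalar model in portable C.  It is not
NEON code and not what the compiler emits; it only shows the
arithmetic for a single 64-bit lane.  All function names are made up
for illustration.  I'm assuming the widening 32x32->64 multiply
behaves like one lane of VMULL.U32/VMLAL.U32, and that a register
shift treats a negative count as a right shift.

```c
#include <stdint.h>

/* Scalar stand-in for one lane of a widening 32x32 -> 64 multiply. */
uint64_t widening_mul32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Low 64 bits of a 64x64 multiply, built from three 32-bit multiplies.
 * The hi*hi term is dropped because it only affects bits 64 and up. */
uint64_t mul64_from_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Cross terms, accumulated the way a multiply-accumulate would.
     * Wrap-around is harmless: only the low 32 bits survive the shift. */
    uint64_t cross = widening_mul32(a_lo, b_hi);
    cross += widening_mul32(a_hi, b_lo);

    return widening_mul32(a_lo, b_lo) + (cross << 32);
}

/* Negate as "load zero, then subtract" (the vmov #0; vsub idea). */
uint64_t neg64(uint64_t x)
{
    uint64_t zero = 0;
    return zero - x;
}

/* Model of an unsigned shift by a signed per-lane count:
 * positive counts shift left, negative counts shift right. */
uint64_t shl_signed_count(uint64_t x, int8_t count)
{
    if (count >= 64 || count <= -64)
        return 0;
    if (count >= 0)
        return x << count;
    return x >> -count;
}

/* Logical right shift by n (assumed 0..63), done as a left shift by
 * the negated count. */
uint64_t shr64_via_negated_shl(uint64_t x, unsigned n)
{
    return shl_signed_count(x, (int8_t)-(int)n);
}
```

In real code the negation of the shift count would be an extra
instruction ahead of the shift, which is what makes that case
tricky.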

I also noticed that the replicated constants aren't being used.  A
pre-increment is currently a load from the constant pool followed by
a vadd, but could be done as vmov #-1; vsub.  The same goes for
pre-decrement: it could be done as vmov #-1; vadd.
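
A small C sketch of why -1 works where +1 doesn't.  As far as I
understand the encoding, a 64-bit vmov immediate can only be built
from bytes that are each 0x00 or 0xff, so all-ones fits but 1 does
not.  The helper names below are hypothetical and the code is a
scalar model of one lane only.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if every byte of imm is 0x00 or 0xff, which is my assumption
 * for what a 64-bit vmov immediate can represent. */
bool fits_vmov_i64(uint64_t imm)
{
    for (int i = 0; i < 8; i++) {
        uint8_t byte = (uint8_t)(imm >> (8 * i));
        if (byte != 0x00 && byte != 0xFF)
            return false;
    }
    return true;
}

/* x + 1 computed as x - (-1): the vmov #-1; vsub sequence. */
uint64_t inc_via_sub(uint64_t x)
{
    uint64_t minus_one = UINT64_MAX;   /* all bits set */
    return x - minus_one;
}

/* x - 1 computed as x + (-1): the vmov #-1; vadd sequence. */
uint64_t dec_via_add(uint64_t x)
{
    uint64_t minus_one = UINT64_MAX;   /* all bits set */
    return x + minus_one;
}
```

Both helpers rely on ordinary wrap-around arithmetic modulo 2^64, so
they agree with a plain add or subtract of 1 for every input.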

This seems worth blueprinting.

-- Michael

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
