On 09/28/2016 10:24 PM, Richard Henderson wrote:
On 09/27/2016 10:45 PM, Rajalakshmi Srinivasaraghavan wrote:
+#if defined(HOST_WORDS_BIGENDIAN)
+#define VEXTULX_DO(name, elem) \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b) \
+{ \
+    target_ulong r = 0; \
+    int i; \
+    int index = a & 0xf; \
+    for (i = 0; i < elem; i++) { \
+        r = r << 8; \
+        if (index + i <= 15) { \
+            r = r | b->u8[index + i]; \
+        } \
+    } \
+    return r; \
+}
+#else
+#define VEXTULX_DO(name, elem) \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b) \
+{ \
+    target_ulong r = 0; \
+    int i; \
+    int index = 15 - (a & 0xf); \
+    for (i = 0; i < elem; i++) { \
+        r = r << 8; \
+        if (index - i >= 0) { \
+            r = r | b->u8[index - i]; \
+        } \
+    } \
+    return r; \
+}
+#endif
+
+VEXTULX_DO(vextublx, 1)
+VEXTULX_DO(vextuhlx, 2)
+VEXTULX_DO(vextuwlx, 4)
+#undef VEXTULX_DO

Ew. This should be one 128-bit shift and one AND. Since the shift amount is a multiple of 8, the 128-bit shift for vextub[lr]x does not need to cross a double-word boundary, and so can be decomposed into one 64-bit shift of (count & 64 ? hi : lo). For vextu[hw][lr]x, you'd need to do the whole left-shift, right-shift, OR thing. But still, fantastically better than a loop.
Ack. Will send an updated patch.
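
For reference, a rough sketch of the shift-based approach suggested above, not the updated patch itself. It assumes the register is viewed as two 64-bit halves in left-indexed (architectural) order, so host endianness is left out. The names vec128, shr128_low, extract_byte_left, extract_left, extract_left_loop and count_mismatches are hypothetical and only for illustration. Fields that run past byte 15 are zero-filled, as in the loop in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ppc_avr_t: hi holds architectural bytes 0..7,
 * lo holds bytes 8..15, byte 0 being the most significant. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} vec128;

/* Low 64 bits of (v.hi:v.lo) >> sh, for 0 <= sh < 128. */
static uint64_t shr128_low(vec128 v, unsigned sh)
{
    if (sh >= 64) {
        return v.hi >> (sh - 64);
    }
    if (sh == 0) {
        return v.lo;    /* avoid the undefined 64-bit shift by 64 below */
    }
    return (v.lo >> sh) | (v.hi << (64 - sh));
}

/* Byte case: one byte never straddles the halves, so pick a half
 * with (count & 64) and do a single 64-bit shift. */
static uint64_t extract_byte_left(uint64_t a, vec128 v)
{
    unsigned sh = 8 * (15 - (unsigned)(a & 0xf));    /* 0, 8, ..., 120 */
    uint64_t half = (sh & 64) ? v.hi : v.lo;
    return (half >> (sh & 63)) & 0xff;
}

/* Halfword/word case: a 128-bit right shift plus a mask. */
static uint64_t extract_left(uint64_t a, vec128 v, unsigned elem)
{
    unsigned index = (unsigned)(a & 0xf);
    uint64_t mask = (UINT64_C(1) << (8 * elem)) - 1;

    assert(elem == 1 || elem == 2 || elem == 4);
    if (index + elem <= 16) {
        return shr128_low(v, 8 * (16 - index - elem)) & mask;
    }
    /* Field runs past byte 15: shift left so the missing bytes are zero. */
    return (v.lo << (8 * (index + elem - 16))) & mask;
}

/* Reference: the byte-at-a-time loop from the patch, left-indexed. */
static uint64_t extract_left_loop(uint64_t a, vec128 v, unsigned elem)
{
    unsigned index = (unsigned)(a & 0xf);
    uint64_t r = 0;

    for (unsigned i = 0; i < elem; i++) {
        unsigned k = index + i;
        r <<= 8;
        if (k <= 7) {
            r |= (v.hi >> (8 * (7 - k))) & 0xff;
        } else if (k <= 15) {
            r |= (v.lo >> (8 * (15 - k))) & 0xff;
        }
    }
    return r;
}

/* Count disagreements between the shift version and the loop. */
int count_mismatches(vec128 v)
{
    static const unsigned sizes[] = { 1, 2, 4 };
    int bad = 0;

    for (unsigned s = 0; s < 3; s++) {
        for (uint64_t a = 0; a < 16; a++) {
            bad += extract_left(a, v, sizes[s]) != extract_left_loop(a, v, sizes[s]);
        }
    }
    for (uint64_t a = 0; a < 16; a++) {
        bad += extract_byte_left(a, v) != extract_left_loop(a, v, 1);
    }
    return bad;
}
```

Calling count_mismatches() on a few register values is a cheap way to check that the shift version agrees with the loop for every index and element size.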
r~
--
Thanks
Rajalakshmi S
