[PATCH 5/6] aarch64: Emit XAR for vector rotates where possible

2024-10-27 Thread Kyrylo Tkachov
Hi all, We can make use of the integrated rotate step of the XAR instruction to implement most vector integer rotates, as long as we zero out one of the input registers for it. This allows for a lower-latency sequence than the fallback SHL+USRA, especially when we can hoist the zeroing operation away…

Re: [PATCH 5/6] aarch64: Emit XAR for vector rotates where possible

2024-10-23 Thread Richard Sandiford
Kyrylo Tkachov writes:
> Hi all,
>
> We can make use of the integrated rotate step of the XAR instruction
> to implement most vector integer rotates, as long as we zero out one
> of the input registers for it. This allows for a lower-latency sequence
> than the fallback SHL+USRA, especially when we…

[PATCH 5/6] aarch64: Emit XAR for vector rotates where possible

2024-10-22 Thread Kyrylo Tkachov
Hi all, We can make use of the integrated rotate step of the XAR instruction to implement most vector integer rotates, as long as we zero out one of the input registers for it. This allows for a lower-latency sequence than the fallback SHL+USRA, especially when we can hoist the zeroing operation away…