
[PATCH 5/6] aarch64: Emit XAR for vector rotates where possible

2024-10-27 Thread Kyrylo Tkachov

Hi all,

We can make use of the integrated rotate step of the XAR instruction to implement most vector integer rotates, as long as we zero out one of the input registers for it. This allows for a lower-latency sequence than the fallback SHL+USRA, especially when we can hoist the zeroing operation away…
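To make the equivalence concrete, here is a minimal scalar C model of a single 64-bit lane. It assumes the Advanced SIMD XAR semantics from the Arm architecture: XOR the two source vectors, then rotate each 64-bit element right by a 6-bit immediate. With one operand zeroed, the XOR is a no-op and only the rotate remains. The helper names `xar_lane`, `rotl_via_xar` and `rotl_via_shl_usra` are hypothetical; this illustrates the idea and is not GCC's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 64-bit lane of XAR:
   result = (n ^ m) rotated right by imm, with imm taken modulo 64. */
static uint64_t xar_lane(uint64_t n, uint64_t m, unsigned imm)
{
    uint64_t x = n ^ m;
    imm &= 63;
    if (imm == 0)
        return x;                       /* avoid an undefined shift by 64 */
    return (x >> imm) | (x << (64 - imm));
}

/* Rotate left by r using XAR with a zeroed second operand:
   rotl(x, r) == rotr(x, 64 - r), and x ^ 0 == x. */
static uint64_t rotl_via_xar(uint64_t x, unsigned r)
{
    return xar_lane(x, 0, (64 - r) & 63);
}

/* Model of the SHL+USRA fallback: SHL by r, then USRA (unsigned shift
   right by 64 - r and accumulate). The two partial results share no set
   bits, so the addition acts like an OR. Valid for 1 <= r <= 63. */
static uint64_t rotl_via_shl_usra(uint64_t x, unsigned r)
{
    assert(r >= 1 && r <= 63);
    uint64_t acc = x << r;              /* SHL  */
    acc += x >> (64 - r);               /* USRA */
    return acc;
}
```

For example, `rotl_via_xar(0x0123456789abcdefULL, 8)` and `rotl_via_shl_usra(0x0123456789abcdefULL, 8)` both give `0x23456789abcdef01`. The model only checks that the two sequences compute the same value; it says nothing about latency, which is what the patch is about.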

Re: [PATCH 5/6] aarch64: Emit XAR for vector rotates where possible

2024-10-23 Thread Richard Sandiford

Kyrylo Tkachov writes:
> Hi all,
>
> We can make use of the integrated rotate step of the XAR instruction
> to implement most vector integer rotates, as long as we zero out one
> of the input registers for it. This allows for a lower-latency sequence
> than the fallback SHL+USRA, especially when we…

[PATCH 5/6] aarch64: Emit XAR for vector rotates where possible

2024-10-22 Thread Kyrylo Tkachov

Hi all,

We can make use of the integrated rotate step of the XAR instruction to implement most vector integer rotates, as long as we zero out one of the input registers for it. This allows for a lower-latency sequence than the fallback SHL+USRA, especially when we can hoist the zeroing operation away…
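For context, a vector rotate usually reaches the compiler as a shift/or idiom. The sketch below assumes the GCC/Clang vector extensions (`vector_size`), and the names `v2u64`, `rotl8_lanes` and `rotl8_all` are hypothetical. Which instructions are emitted (XAR, SHL+USRA or something else) depends on the compiler version and `-march` flags and is not checked here. The loop shows the kind of context where a zeroed register could be set up once, outside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two 64-bit lanes: the width of one Advanced SIMD Q register. */
typedef uint64_t v2u64 __attribute__((vector_size(16)));

/* Rotate each lane left by 8 using the shift/or rotate idiom. */
static v2u64 rotl8_lanes(v2u64 x)
{
    return (x << 8) | (x >> 56);
}

/* Rotate every vector in an array in place. */
static void rotl8_all(v2u64 *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        v[i] = rotl8_lanes(v[i]);
}
```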
