https://github.com/RKSimon requested changes to this pull request.
We really shouldn't need separate 128-bit and 256-bit variants. This is a
general horizontal-op (hop) pattern that you should be able to adapt:
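```cpp
#include <immintrin.h> // provides the __v16hi vector typedef

using T = __v16hi; // swap in __v8hi for the 128-bit case; the body is unchanged
T hop(T x, T y) {
  unsigned NumElems = sizeof(x) / sizeof(x[0]);
  unsigned NumLanes = sizeof(x) / 16; // 128-bit (16-byte) lanes
  unsigned NumElemsPerLane = NumElems / NumLanes;
  unsigned HalfElemsPerLane = NumElemsPerLane / 2;
  T r;
  // L is the index of the first element of each 128-bit lane.
  for (unsigned L = 0; L != NumElems; L += NumElemsPerLane) {
    // Low half of the result lane: adjacent pairs taken from x.
    for (unsigned E = 0; E != HalfElemsPerLane; ++E) {
      r[L + E] = x[L + (2 * E) + 0] - x[L + (2 * E) + 1];
    }
    // High half of the result lane: adjacent pairs taken from y.
    for (unsigned E = 0; E != HalfElemsPerLane; ++E) {
      r[L + E + HalfElemsPerLane] = y[L + (2 * E) + 0] - y[L + (2 * E) + 1];
    }
  }
  return r;
}
```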
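To illustrate why a single routine can cover both widths, here is a minimal portable sketch of the same loop structure. It is not the code from the patch: `hsub_lanes` is a hypothetical name, and `std::array` stands in for the `__v8hi` / `__v16hi` vector types so that the example builds without compiler vector extensions. The lane count is derived from the total size, so nothing in the body depends on the vector width.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (assumption, not part of the PR): std::array is used
// here as a stand-in for the real vector types.
template <typename Elt, std::size_t NumElems>
std::array<Elt, NumElems> hsub_lanes(const std::array<Elt, NumElems> &x,
                                     const std::array<Elt, NumElems> &y) {
  constexpr std::size_t NumBytes = NumElems * sizeof(Elt);
  static_assert(NumBytes % 16 == 0, "expects whole 128-bit lanes");
  constexpr std::size_t NumLanes = NumBytes / 16;
  constexpr std::size_t NumElemsPerLane = NumElems / NumLanes;
  constexpr std::size_t HalfElemsPerLane = NumElemsPerLane / 2;

  std::array<Elt, NumElems> r{};
  // L is the index of the first element of each 128-bit lane.
  for (std::size_t L = 0; L != NumElems; L += NumElemsPerLane) {
    for (std::size_t E = 0; E != HalfElemsPerLane; ++E) {
      // Low half of the lane comes from x, high half from y.
      r[L + E] = static_cast<Elt>(x[L + 2 * E] - x[L + 2 * E + 1]);
      r[L + E + HalfElemsPerLane] =
          static_cast<Elt>(y[L + 2 * E] - y[L + 2 * E + 1]);
    }
  }
  return r;
}
```

Calling it with `std::array<short, 8>` models the 128-bit case (one lane) and `std::array<short, 16>` models the 256-bit case (two lanes). For example, `x = {10, 1, 20, 2, 30, 3, 40, 4}` and `y = {5, 1, 6, 2, 7, 3, 8, 4}` give `{9, 18, 27, 36, 4, 4, 4, 4}`.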
https://github.com/llvm/llvm-project/pull/156822