https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102055

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The choice between ldr/tbl and rev64/ext is debatable and depends on whether the
code is inside a loop. Inside a loop, with enough free registers to keep the
index vector loaded, TBL is better on many (though not all) micro-architectures,
as its latency is similar to rev64's.
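The following is a scalar C model of the two candidate sequences, reversing each 16-byte block of a buffer. All `model_*` and `reverse_blocks_*` names are hypothetical. The instruction semantics are a simplified reading of the single-table-register, 16-byte forms of TBL, REV64 and EXT, not real intrinsics. The model only checks that both sequences give the same result. It does not model latency; the comments just mark where the extra register and the extra instruction come from.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed semantics of TBL with one table register:
   out[i] = tab[idx[i]], or 0 when idx[i] is out of range (>= 16). */
static void model_tbl(uint8_t out[16], const uint8_t tab[16], const uint8_t idx[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = idx[i] < 16 ? tab[idx[i]] : 0;
}

/* Assumed semantics of REV64 Vd.16b: reverse the bytes within each 64-bit half. */
static void model_rev64(uint8_t out[16], const uint8_t in[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = in[(i & ~7) + (7 - (i & 7))];
}

/* Assumed semantics of EXT Vd.16b, Vn.16b, Vm.16b, #imm:
   bytes imm..15 of Vn followed by bytes 0..imm-1 of Vm. */
static void model_ext(uint8_t out[16], const uint8_t n[16], const uint8_t m[16], int imm)
{
    for (int i = 0; i < 16; i++)
        out[i] = (i + imm < 16) ? n[i + imm] : m[i + imm - 16];
}

/* ldr/tbl: the index vector is set up once, outside the loop, and must stay
   live for the whole loop (one extra vector register). One TBL per block. */
static void reverse_blocks_tbl(uint8_t *buf, size_t nblocks)
{
    static const uint8_t idx[16] = {15, 14, 13, 12, 11, 10, 9, 8,
                                    7, 6, 5, 4, 3, 2, 1, 0};
    uint8_t tmp[16];
    for (size_t b = 0; b < nblocks; b++) {
        model_tbl(tmp, buf + 16 * b, idx);
        memcpy(buf + 16 * b, tmp, 16);
    }
}

/* rev64/ext: no constant and no extra register, but two instructions per block. */
static void reverse_blocks_rev64_ext(uint8_t *buf, size_t nblocks)
{
    uint8_t t[16], r[16];
    for (size_t b = 0; b < nblocks; b++) {
        model_rev64(t, buf + 16 * b);
        model_ext(r, t, t, 8); /* swap the two 64-bit halves */
        memcpy(buf + 16 * b, r, 16);
    }
}
```

Both functions map byte `i` of each block to byte `15 - i`. Which one is faster on real hardware is the micro-architecture question discussed above.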

Though I should note that clang/LLVM implements it as rev64/ext.

E.g.:
```
#define vector __attribute__((vector_size(16)))

/* Reverse all 16 bytes in a single shuffle. */
vector char g(vector char a)
{
    return __builtin_shufflevector (a,a,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
}

/* Same result in two steps: reverse the bytes within each 64-bit half,
   then swap the two 64-bit halves. */
vector char g1(vector char a)
{
    vector char t = __builtin_shufflevector (a,a,7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8);
    vector long long t1 = (vector long long)t;
    t1 = __builtin_shufflevector(t1,t1, 1,0);
    return (vector char)t1;
}
```

For both functions, clang produces:
```
        rev64   v0.16b, v0.16b
        ext     v0.16b, v0.16b, v0.16b, #8
```
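One way to see why both functions can compile to the same two instructions is to compose the shuffle masks. Below is a minimal sketch in plain C. The names `mask_rev64`, `mask_swap64` and `compose_masks` are hypothetical. The first mask is g1's first shuffle, which matches what rev64 does. The second mask is the swap of the two `long long` lanes written as a byte mask, which matches ext #8. Composing them gives g's mask, 15..0.

```c
/* g1's first shuffle: reverse bytes within each 8-byte half (rev64). */
static const int mask_rev64[16] = {7, 6, 5, 4, 3, 2, 1, 0,
                                   15, 14, 13, 12, 11, 10, 9, 8};

/* g1's second shuffle (swap the two 64-bit lanes), as a byte mask (ext #8). */
static const int mask_swap64[16] = {8, 9, 10, 11, 12, 13, 14, 15,
                                    0, 1, 2, 3, 4, 5, 6, 7};

/* Shuffling by `first` and then by `second` gives
   r[i] = t[second[i]] = a[first[second[i]]],
   so the combined mask is first[second[i]]. */
static void compose_masks(int out[16], const int first[16], const int second[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = first[second[i]];
}
```

Calling `compose_masks(m, mask_rev64, mask_swap64)` yields `m[i] == 15 - i` for every `i`, which is exactly the mask used in `g`.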
