On 17 September 2012 20:04, Richard Earnshaw <rearn...@arm.com> wrote: > On 17/09/12 16:50, Christophe Lyon wrote: >> On 17 September 2012 17:21, Richard Earnshaw <rearn...@arm.com> wrote: >>> On 17/09/12 16:13, Christophe Lyon wrote: >>>> On 17 September 2012 14:56, Richard Earnshaw <rearn...@arm.com> wrote: >>>>> On 05/09/12 23:14, Christophe Lyon wrote: >>>>>> Hello, >>>>>> >>>>>> Although the recent optimization I have committed to use Neon vext >>>>>> instruction for suitable builtin_shuffle calls does not support >>>>>> big-endian yet, I have written a patch to the existing testcases such >>>>>> they now support big-endian mode. >>>>>> >>>>>> I think it's worth improving these tests since writing the right masks >>>>>> for big-endian (such that the program computes the same results as in >>>>>> little-endian) is not always straightforward. >>>>>> >>>>>> In particular: >>>>>> * I have added some comments in a few tests were it took me a while to >>>>>> find the right mask. >>>>>> * In the case of the test which is executed, I had to force the >>>>>> noinline attribute on the helper functions, otherwise the computed >>>>>> results are wrong in big-endian. It is probably an overkill workaround >>>>>> but it works :-) >>>>>> I am going to file a bugzilla for this problem. >>>>>> >>>>>> I have checked that replacing calls to builtin_shuffle by the expected >>>>>> Neon vext variant produces the expected results in big-endian mode, >>>>>> and I arranged the big-endian masks to get the same results. >>>>>> >>>>>> Christophe.= >>>>>> >>>>>> >>>>>> neon-vext-big-endian-tests.patch >>>>>> >>>>>> >>>>>> N ¬n‡r¥ªíÂ)emçhÂyhi× ¢w^™©Ý >>>>>> >>>>> >>>>> >>>>> I'm not sure about this. Looking at the documentation in the manual for >>>>> builtin_suffle makes no mention of the results/behaviour being endian >>>>> dependent, which makes me wonder why this test needs to be. >>>>> >>>>> R. >>>> >>>> >>>> Indeed, but I had to modify the mask value in order to get the same >>>> results in big and little-endian. >>>> >>>> If the mask should be the same (it would be much more confortable for >>>> the developers indeed), then GCC needs to be changed/fixed. >>>> >>> >>> That's what I'm trying to establish. I suspect that there is a bug in >>> GCC for all big-endian code here. >>> >>> What happens for a test of uint8x8_t? >>> >> >> Well, in my sample testcase in little-endian, I used mask = {2, 3, 4, >> 5, 6, 7, 8, 9}, which can be optimized into vext #2. >> >> In big-endian mode, explicitly forcing use of vext #2 leads to the >> right result, but to achieve it using builtin_shuffle, I had to change >> the mask into {14, 15, 0, 1, 2, 3, 4, 5}. >> >> I did read the thread starting at >> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01133.html and the >> threads it references, and I must admit that I got a bit confused :-) >> >> IMHO, it's currently impossible for a GCC user to write code using >> vector initializers that would be portable on big and little endian >> targets. It's too much of a headache.... >> >> It was also a purpose of this patch: have someone react if it looked >> inappropriate. >> >> Thanks for the review, >> >> Christophe. >> > > I think for big-endian, __builtin_shuffle needs to expand to (for 64-bit > vectors) > > vrev64.<size> mask > vext > > and for 128-bit vectors > > vrev64.<size> mask > vswap mask<top-dword>, mask<low-dword> > vext ... > > Obviously, if you've got a literal you can simplify the operations down > to something that doesn't need the extra instructions. > > R. >
Does this mean that the currently generated code is wrong (I mean when no optimization is performed by the compiler, as it is currently the case with GCC in big-endian) ? When the input mask is known by the compiler, it generates a series a moves to perform the shuffle operation. Christophe.