On 17 September 2012 20:04, Richard Earnshaw <rearn...@arm.com> wrote:
> On 17/09/12 16:50, Christophe Lyon wrote:
>> On 17 September 2012 17:21, Richard Earnshaw <rearn...@arm.com> wrote:
>>> On 17/09/12 16:13, Christophe Lyon wrote:
>>>> On 17 September 2012 14:56, Richard Earnshaw <rearn...@arm.com> wrote:
>>>>> On 05/09/12 23:14, Christophe Lyon wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Although the recent optimization I have committed to use Neon vext
>>>>>> instruction for suitable builtin_shuffle calls does not support
>>>>>> big-endian yet, I have written a patch to the existing testcases such
>>>>>> they now support big-endian mode.
>>>>>>
>>>>>> I think it's worth improving these tests since writing the right masks
>>>>>> for big-endian (such that the program computes the same results as in
>>>>>> little-endian) is not always straightforward.
>>>>>>
>>>>>> In particular:
>>>>>> * I have added some comments in a few tests were it took me a while to
>>>>>> find the right mask.
>>>>>> * In the case of the test which is executed, I had to force the
>>>>>> noinline attribute on the helper functions, otherwise the computed
>>>>>> results are wrong in big-endian. It is probably an overkill workaround
>>>>>> but it works :-)
>>>>>>   I am going to file a bugzilla for this problem.
>>>>>>
>>>>>> I have checked that replacing calls to builtin_shuffle by the expected
>>>>>> Neon vext variant produces the expected results in big-endian mode,
>>>>>> and I arranged the big-endian masks to get the same results.
>>>>>>
>>>>>> Christophe.=
>>>>>>
>>>>>>
>>>>>> neon-vext-big-endian-tests.patch
>>>>>>
>>>>>>
>>>>>> N ¬n‡r¥ªíÂ)emçhÂyhi× ¢w^™©Ý
>>>>>>
>>>>>
>>>>>
>>>>> I'm not sure about this.  Looking at the documentation in the manual for
>>>>> builtin_suffle makes no mention of the results/behaviour being endian
>>>>> dependent, which makes me wonder why this test needs to be.
>>>>>
>>>>> R.
>>>>
>>>>
>>>> Indeed, but I had to modify the mask value in order to get the same
>>>> results in big and little-endian.
>>>>
>>>> If the mask should be the same (it would be much more confortable for
>>>> the developers indeed), then GCC needs to be changed/fixed.
>>>>
>>>
>>> That's what I'm trying to establish.  I suspect that there is a bug in
>>> GCC for all big-endian code here.
>>>
>>> What happens for a test of uint8x8_t?
>>>
>>
>> Well, in my sample testcase in little-endian, I used mask = {2, 3, 4,
>> 5, 6, 7, 8, 9}, which can be optimized into vext #2.
>>
>> In big-endian mode, explicitly forcing use of vext #2 leads to the
>> right result, but to achieve it using builtin_shuffle, I had to change
>> the mask into {14, 15, 0, 1, 2, 3, 4, 5}.
>>
>> I did read the thread starting at
>> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01133.html and the
>> threads it references, and I must admit that I got a bit confused :-)
>>
>> IMHO, it's currently impossible for a GCC user to write code using
>> vector initializers that would be portable on big and little endian
>> targets. It's too much of a headache....
>>
>> It was also a purpose of this patch: have someone react if it looked
>> inappropriate.
>>
>> Thanks for the review,
>>
>> Christophe.
>>
>
> I think for big-endian, __builtin_shuffle needs to expand to (for 64-bit
> vectors)
>
>         vrev64.<size> mask
>         vext
>
> and for 128-bit vectors
>
>         vrev64.<size> mask
>         vswap  mask<top-dword>, mask<low-dword>
>         vext ...
>
> Obviously, if you've got a literal you can simplify the operations down
> to something that doesn't need the extra instructions.
>
> R.
>

Does this mean that the currently generated code is wrong (I mean when
no optimization is performed by the compiler, as it is currently the
case with GCC in big-endian) ?
When the input mask is known by the compiler, it generates a series a
moves to perform the shuffle operation.

Christophe.

Reply via email to