On Fri, Feb 26, 2016 at 1:44 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Fri, Feb 26, 2016 at 7:50 AM, James Greenhalgh
> <james.greenha...@arm.com> wrote:
>> On Thu, Feb 25, 2016 at 09:25:45AM +0000, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> In this wrong-code PR we get bad code when synthesising a TImode right shift
>>> by variable amount using DImode shifts during expand.
>>>
>>> The expand_double_word_shift function expands two paths: one where the
>>> variable amount is greater than GET_MODE_BITSIZE (DImode) (word_mode for
>>> aarch64) at runtime and one where it's less than that, and performs a
>>> conditional select between the two at the end.
>>>
>>> The second path is the one that goes wrong here.
>>> The algorithm it uses for calculating a TImode shift when the variable 
>>> shift amount is <64 is:
>>>
>>>
>>>   ------DImode-----   ------DImode-----        ------DImode-----   ------DImode-----
>>> { |__ target-hi ___| |___ target-lo ___| } = { |___ source-hi ___| |___ source-lo ___| } >> var_amnt
>>>
>>> 1) carry = source-hi << 1
>>> 2) tmp = ~var_amnt  // either that or BITS_PER_WORD - 1 - var_amnt,
>>>                     // depending on shift_truncation_mask
>>> 3) carry = carry << tmp  // net effect: carry is source-hi shifted left
>>>                          // by BITS_PER_WORD - var_amnt
>>> 4) target-lo = source-lo >>u var_amnt  // unsigned shift
>>> 5) target-lo = target-lo | carry
>>> 6) target-hi = source-hi >> var_amnt
>>>
>>> where the two DImode words source-hi and source-lo are the two words of the
>>> source TImode value and var_amnt is the register holding the variable shift
>>> amount. This is performed by the expand_subword_shift function in optabs.c.
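For reference, the six steps above can be sketched in plain C (my own illustration, assuming a 64-bit word and 0 < var_amnt < 64, and using the BITS_PER_WORD - 1 - var_amnt form of step 2; the function name di_pair_rshift is hypothetical, this is not the optabs.c code itself):

```c
#include <stdint.h>

/* Arithmetic right shift of the double-word value { hi, lo } by n,
   for 0 < n < 64 -- the "amount less than word size" path.  */
static void
di_pair_rshift (int64_t hi, uint64_t lo, unsigned n,
                int64_t *target_hi, uint64_t *target_lo)
{
  uint64_t carry = (uint64_t) hi << 1;  /* step 1 */
  unsigned tmp = 64 - 1 - n;            /* step 2, the safe form */
  carry <<= tmp;                        /* step 3: hi << (64 - n) */
  *target_lo = (lo >> n) | carry;       /* steps 4 and 5 */
  *target_hi = hi >> n;                 /* step 6, arithmetic shift */
}
```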
>>>
>>> Step 2) is valid only if the target truncates the shift amount by the width
>>> of the type it's shifting, that is, SHIFT_COUNT_TRUNCATED is true and
>>> TARGET_SHIFT_TRUNCATION_MASK is 63 in this case.
>>>
>>> Step 3) is the one that goes wrong.  On aarch64 a left shift is usually
>>> implemented using the LSL instruction, but we also have alternatives that
>>> can use the SIMD registers and the USHL instruction.  In this case we end up
>>> using the USHL instruction.  However, although the USHL instruction does
>>> a DImode shift, it doesn't truncate the shift amount with a mask of 63, but
>>> rather with a mask of 255.  Furthermore, the shift amount is interpreted as
>>> a 2's complement signed integer, and if it's negative the instruction
>>> performs a right shift.  This is in contrast with the normal LSL
>>> instruction, which just performs an unsigned shift by an amount truncated
>>> with a mask of 63.
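The difference between the two truncation behaviours can be illustrated with a small standalone sketch (my own example, not part of the patch; the helper names effective_lsl_count and effective_ushl_count are hypothetical):

```c
#include <stdint.h>

/* An LSL-style shifter truncates the count modulo 64, so for
   0 <= n < 64 the count ~n behaves exactly like 63 - n.  */
static unsigned
effective_lsl_count (unsigned n)
{
  return ~n & 63u;
}

/* A USHL-style shifter instead reads the low byte of the count as a
   2's complement signed value; ~n then looks like -(n + 1), i.e. a
   right shift, not the intended left shift by 63 - n.  */
static int
effective_ushl_count (unsigned n)
{
  return (int8_t) (~n & 0xffu);
}
```

So under LSL semantics the ~var_amnt of step 2) is harmless, but under USHL semantics it turns the intended left shift into a right one.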
>>>
>>> Now, on aarch64 SHIFT_COUNT_TRUNCATED is defined as !TARGET_SIMD, so we
>>> don't assume shift amounts are truncated unless we know we can only ever
>>> use the LSL instructions for variable shifts.
>>>
>>> However, we still return 63 as the TARGET_SHIFT_TRUNCATION_MASK for DImode,
>>> so expand_subword_shift assumes that since the mask is non-zero it's a valid
>>> shift truncation mask.  The documentation for TARGET_SHIFT_TRUNCATION_MASK
>>> says:
>>> " The default implementation of this function returns
>>>   @code{GET_MODE_BITSIZE (@var{mode}) - 1} if @code{SHIFT_COUNT_TRUNCATED}
>>>   and 0 otherwise."
>>>
>>> So since for TARGET_SIMD we cannot guarantee that all shifts truncate their
>>> amount, we should be returning 0 in TARGET_SHIFT_TRUNCATION_MASK when
>>> SHIFT_COUNT_TRUNCATED is false.  This is what the patch does, and with it
>>> step 2) becomes:
>>> 2) tmp = BITS_PER_WORD - 1 - var_amnt
>>> which behaves correctly on aarch64.
>>>
>>> Unfortunately, the testcase from the PR has very recently gone latent on
>>> trunk because it depends on register allocation picking a particular
>>> alternative from the *aarch64_ashl_sisd_or_int_<mode>3 pattern in 
>>> aarch64.md.
>>> I tried to do some inline asm tricks to get it to force the correct
>>> constraints but was unsuccessful. Nevertheless I've included the testcase in
>>> the patch. I suppose it's better than nothing.
>>
>> Thanks for the great write-up. This level of detail is very helpful for
>> review.
>>
>> OK for trunk.
>>
>> Thanks,
>> James
>>
>>>
>>> Bootstrapped and tested on aarch64.
>>>
>>> Ok for trunk?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>> 2016-02-25  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>     PR target/69613
>>>     * config/aarch64/aarch64.c (aarch64_shift_truncation_mask):
>>>     Return 0 if !SHIFT_COUNT_TRUNCATED.
>>>
>>> 2016-02-25  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>     PR target/69613
>>>     * gcc.dg/torture/pr69613.c: New test.
>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index d0d15b4feee70a5ca6af8dd16c7667cbcd844dbf..2e69e345545e591d1de76c2d175aac476e6e1107 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -11171,7 +11171,8 @@ static unsigned HOST_WIDE_INT
>>>  aarch64_shift_truncation_mask (machine_mode mode)
>>>  {
>>>    return
>>> -    (aarch64_vector_mode_supported_p (mode)
>>> +    (!SHIFT_COUNT_TRUNCATED
>>> +     || aarch64_vector_mode_supported_p (mode)
>>>       || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1);
>>>  }
>>>
>>> diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c b/gcc/testsuite/gcc.dg/torture/pr69613.c
>>> new file mode 100644
>>> index 0000000000000000000000000000000000000000..44f2b0cc91ac4b12c4d255b608f95bc8bf016923
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/torture/pr69613.c
>>> @@ -0,0 +1,40 @@
>>> +/* PR target/69613.  */
>>> +/* { dg-do run { target int128 } } */
>>> +/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This is incorrect.  You need to check for AVX at run time to run
> it on x86.
>

Something like this?


-- 
H.J.
diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c b/gcc/testsuite/gcc.dg/torture/pr69613.c
index 44f2b0c..99375f1 100644
--- a/gcc/testsuite/gcc.dg/torture/pr69613.c
+++ b/gcc/testsuite/gcc.dg/torture/pr69613.c
@@ -1,6 +1,7 @@
 /* PR target/69613.  */
 /* { dg-do run { target int128 } } */
-/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-Wno-psabi" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
 
 typedef unsigned short u16;
 typedef unsigned short v32u16 __attribute__ ((vector_size (32)));
