On Fri, Feb 26, 2016 at 1:44 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Fri, Feb 26, 2016 at 7:50 AM, James Greenhalgh
> <james.greenha...@arm.com> wrote:
>> On Thu, Feb 25, 2016 at 09:25:45AM +0000, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> In this wrong-code PR we get bad code when synthesising a TImode right
>>> shift by a variable amount using DImode shifts during expand.
>>>
>>> The expand_double_word_shift function expands two paths: one where the
>>> variable amount is greater than GET_MODE_BITSIZE (DImode) (word_mode for
>>> aarch64) at runtime and one where it's less than that, and performs a
>>> conditional select between the two at the end.
>>>
>>> The second path is the one that goes wrong here.
>>> The algorithm it uses for calculating a TImode shift when the variable
>>> shift amount is < 64 is:
>>>
>>>   ------DImode-----  ------DImode-----       ------DImode-----  ------DImode-----
>>> { |__ target-hi ___||___ target-lo ___| } = { |__ source-hi ___||___ source-lo ___| } >> var_amnt
>>>
>>> 1) carry = source-hi << 1
>>> 2) tmp = ~var_amnt       // either that or BITS_PER_WORD - 1 - var_amnt,
>>>                          // depending on shift_truncation_mask
>>> 3) carry = carry << tmp  // net effect is that carry is source-hi shifted
>>>                          // left by BITS_PER_WORD - var_amnt
>>> 4) target-lo = source-lo >>u var_amnt  // unsigned shift
>>> 5) target-lo = target-lo | carry
>>> 6) target-hi = source-hi >> var_amnt
>>>
>>> where the two DImode words source-hi and source-lo are the two words of
>>> the source TImode value and var_amnt is the register holding the variable
>>> shift amount. This is performed by the expand_subword_shift function in
>>> optabs.c.
>>>
>>> Step 2) is valid only if the target truncates the shift amount by the
>>> width of the type it is shifting, that is, SHIFT_COUNT_TRUNCATED is true
>>> and TARGET_SHIFT_TRUNCATION_MASK is 63 in this case.
>>>
>>> Step 3) is the one that goes wrong.
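[For readers following along, the six steps above can be sketched in plain C. This is an illustrative model of the expansion for a 128-bit unsigned right shift with 0 < amt < 64, not GCC's actual code; the function and variable names are invented for the example, and step 2) is written in its always-correct `63 - amt` form.]

```c
#include <stdint.h>

/* Double-word unsigned right shift: { *dst_hi, *dst_lo } =
   { src_hi, src_lo } >> amt, for 0 < amt < 64 (illustrative sketch).  */
static void
dword_lshiftrt (uint64_t src_hi, uint64_t src_lo, unsigned amt,
                uint64_t *dst_hi, uint64_t *dst_lo)
{
  /* Step 1: pre-shift by one so step 3 never needs a shift of 64.  */
  uint64_t carry = src_hi << 1;
  /* Step 2: with a truncation mask of 63, ~amt & 63 == 63 - amt; here we
     use the explicit form that is correct on any target.  */
  unsigned tmp = 63 - amt;
  /* Step 3: net effect is carry = src_hi << (64 - amt), i.e. the bits of
     the high word that cross into the low word.  */
  carry = carry << tmp;
  /* Steps 4-5: low word combines its own shifted bits with the carry.  */
  *dst_lo = (src_lo >> amt) | carry;
  /* Step 6: high word is a plain shift.  */
  *dst_hi = src_hi >> amt;
}
```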
>>> On aarch64 a left shift is usually implemented using the LSL instruction,
>>> but we also have alternatives that can use the SIMD registers and the
>>> USHL instruction. In this case we end up using the USHL instruction.
>>> However, although the USHL instruction does a DImode shift, it doesn't
>>> truncate the shift amount by 64, but rather by 255. Furthermore, the
>>> shift amount is interpreted as a 2's complement signed integer, and if
>>> it's negative USHL performs a right shift. This is in contrast with the
>>> normal LSL instruction, which just performs an unsigned shift by an
>>> amount truncated by 64.
>>>
>>> Now, on aarch64 SHIFT_COUNT_TRUNCATED is defined as !TARGET_SIMD, so we
>>> don't assume shift amounts are truncated unless we know we can only ever
>>> use the LSL instructions for variable shifts.
>>>
>>> However, we still return 63 as the TARGET_SHIFT_TRUNCATION_MASK for
>>> DImode, so expand_subword_shift assumes that since the mask is non-zero
>>> it's a valid shift truncation mask. The documentation for
>>> TARGET_SHIFT_TRUNCATION_MASK says:
>>> "The default implementation of this function returns
>>> @code{GET_MODE_BITSIZE (@var{mode}) - 1} if @code{SHIFT_COUNT_TRUNCATED}
>>> and 0 otherwise."
>>>
>>> So since for TARGET_SIMD we cannot guarantee that all shifts truncate
>>> their amount, we should be returning 0 from TARGET_SHIFT_TRUNCATION_MASK
>>> when SHIFT_COUNT_TRUNCATED is false. This is what the patch does, and
>>> with it step 2) becomes:
>>> 2) tmp = BITS_PER_WORD - 1 - var_amnt
>>> which behaves correctly on aarch64.
>>>
>>> Unfortunately, the testcase from the PR has very recently gone latent on
>>> trunk because it depends on register allocation picking a particular
>>> alternative from the *aarch64_ashl_sisd_or_int_<mode>3 pattern in
>>> aarch64.md. I tried some inline asm tricks to force the correct
>>> constraints but was unsuccessful. Nevertheless, I've included the
>>> testcase in the patch.
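[To make the failure mode concrete: below is a rough C model of the variable-shift behaviour described above, i.e. the amount is reduced to its low byte and treated as signed, with negative amounts shifting right. This is a simplified illustration written for this thread, not an architectural reference. With amt = 8, the `~amt` trick from step 2) yields a low byte of 0xF7 = -9, so a USHL-style shift goes right by 9 instead of left by 55; the explicit `63 - amt` form works under both semantics.]

```c
#include <stdint.h>

/* Simplified model of a USHL-style DImode variable shift: the amount is
   truncated to its low 8 bits and interpreted as a signed byte; negative
   amounts shift right.  Illustrative only.  */
static uint64_t
ushl_model (uint64_t x, uint64_t amt)
{
  unsigned b = (unsigned) (amt & 0xff);
  int s = b < 128 ? (int) b : (int) b - 256;  /* sign-extend the low byte */
  if (s >= 0)
    return s >= 64 ? 0 : x << s;
  return -s >= 64 ? 0 : x >> -s;
}

/* LSL-style shift: amount truncated by 64 (i.e. masked with 63).  */
static uint64_t
lsl_model (uint64_t x, uint64_t amt)
{
  return x << (amt & 63);
}
```

With amt = 8, `lsl_model (x, ~amt)` gives x << 55 as the expansion expects, while `ushl_model (x, ~amt)` gives x >> 9; using `63 - amt` makes both models agree.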
>>> I suppose it's better than nothing.
>>
>> Thanks for the great write-up. This level of detail is very helpful for
>> review.
>>
>> OK for trunk.
>>
>> Thanks,
>> James
>>
>>>
>>> Bootstrapped and tested on aarch64.
>>>
>>> Ok for trunk?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>> 2016-02-25  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>     PR target/69613
>>>     * config/aarch64/aarch64.c (aarch64_shift_truncation_mask):
>>>     Return 0 if !SHIFT_COUNT_TRUNCATED.
>>>
>>> 2016-02-25  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>     PR target/69613
>>>     * gcc.dg/torture/pr69613.c: New test.
>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index d0d15b4feee70a5ca6af8dd16c7667cbcd844dbf..2e69e345545e591d1de76c2d175aac476e6e1107 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -11171,7 +11171,8 @@ static unsigned HOST_WIDE_INT
>>>  aarch64_shift_truncation_mask (machine_mode mode)
>>>  {
>>>    return
>>> -    (aarch64_vector_mode_supported_p (mode)
>>> +    (!SHIFT_COUNT_TRUNCATED
>>> +     || aarch64_vector_mode_supported_p (mode)
>>>      || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1);
>>>  }
>>>
>>> diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c b/gcc/testsuite/gcc.dg/torture/pr69613.c
>>> new file mode 100644
>>> index 0000000000000000000000000000000000000000..44f2b0cc91ac4b12c4d255b608f95bc8bf016923
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/torture/pr69613.c
>>> @@ -0,0 +1,40 @@
>>> +/* PR target/69613.  */
>>> +/* { dg-do run { target int128 } } */
>>> +/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This is incorrect. You need to check AVX run-time to run
> it on x86.
>
Something like this?

--
H.J.
diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c b/gcc/testsuite/gcc.dg/torture/pr69613.c
index 44f2b0c..99375f1 100644
--- a/gcc/testsuite/gcc.dg/torture/pr69613.c
+++ b/gcc/testsuite/gcc.dg/torture/pr69613.c
@@ -1,6 +1,7 @@
 /* PR target/69613. */
 /* { dg-do run { target int128 } } */
-/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-Wno-psabi" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
 typedef unsigned short u16;
 typedef unsigned short v32u16 __attribute__ ((vector_size (32)));