On 10/11/24 20:00, Jonathan Wakely wrote: > > On Fri, 11 Oct 2024 at 19:52, Christophe Lyon > <christophe.l...@linaro.org> wrote: >> >> On Fri, 11 Oct 2024 at 17:52, Jonathan Wakely <jwak...@redhat.com> wrote: >>> >>> On Wed, 9 Oct 2024 at 10:41, Ricardo Jesus <r...@nvidia.com> wrote: >>>> >>>> This patch modifies the implementation of the vectorized Mersenne >>>> Twister random number generator to use __builtin_shufflevector instead >>>> of __builtin_shuffle. This makes it (almost) compatible with Clang. >>>> >>>> To make the implementation fully compatible with Clang, Clang will need >>>> to support internal Neon types like __Uint8x16_t and __Uint32x4_t, which >>>> currently it does not. This looks like an oversight in Clang and so will >>>> be addressed separately. >>>> >>>> I see no codegen change with this patch. >>> >>> I'm not qualified to review this myself, but I'd at least like to see >>> the CI checks passing: >>> https://patchwork.sourceware.org/project/gcc/patch/c911a45e-5924-4a4b-9b6b-bb3af0cc7...@nvidia.com/ >>> Apparently the patch couldn't be applied. >>> >>> Please configure your email client (thunderbird?) to not munge the >>> patch, or attach it rather than sending inline. Or just use >>> git-send-email :-) >>> >> Hi! >> >> The problem is not with how the patch was sent: patchwork managed to >> see it as a patch, and the CI tried to apply it. >> The problem is that for some reason, git was not able to apply the >> patch to our current baseline. >> Unfortunately, we do not go as far as calling 'git am >> --show-current-patch=diff' or something else to provide more info in >> our CI logs, so we can only guess that something went wrong. Maybe >> your patch is based against a too old revision of GCC? > > No, that file hasn't changed anywhere near the patch. The problem is > that the patch was munged by thunderbird, adding line breaks where > they corrupt the patch: > > > Applying: Use shufflevector instead of shuffle in opt_random.h > error: git diff header lacks filename information when removing 1 > leading pathname component (line 6) > Patch failed at 0001 Use shufflevector instead of shuffle in opt_random.h > > I fixed that manually, but it still fails: > > Applying: Use shufflevector instead of shuffle in opt_random.h > error: patch failed: libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h:35 > error: libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h: patch > does not apply > Patch failed at 0001 Use shufflevector instead of shuffle in opt_random.h > > That's because of an incorrect number of space characters on the > unchanged context lines around the +/- diffs. I fixed that manually, > and failed at the next chunk: > > Applying: Use shufflevector instead of shuffle in opt_random.h > error: patch failed: libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h:52 > error: libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h: patch > does not apply > Patch failed at 0001 Use shufflevector instead of shuffle in opt_random.h > > So the problem is how the patch was sent. >
Hi, I'm sorry about this. I've resubmitted the patch with git-send-email, please let me know if that still doesn't work. Thanks, Ricardo >> >> Thanks, >> >> Christophe >> >> >>> >>>> >>>> Bootstrapped and tested on aarch64-none-linux-gnu. >>>> >>>> Signed-off-by: Ricardo Jesus <r...@nvidia.com> >>>> >>>> 2024-09-05 Ricardo Jesus <r...@nvidia.com> >>>> >>>> * config/cpu/aarch64/opt/ext/opt_random.h (__VEXT): Replace uses >>>> of __builtin_shuffle with __builtin_shufflevector. >>>> (__aarch64_lsl_128): Move shift amount to a template parameter. >>>> (__aarch64_lsr_128): Move shift amount to a template parameter. >>>> (__aarch64_recursion): Update call sites of __aarch64_lsl_128 >>>> and __aarch64_lsr_128. >>>> --- >>>> .../config/cpu/aarch64/opt/ext/opt_random.h | 28 +++++++++++-------- >>>> 1 file changed, 16 insertions(+), 12 deletions(-) >>>> >>>> diff --git a/libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h >>>> b/libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h >>>> index 7f756d1572f..7eb816abcd0 100644 >>>> --- a/libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h >>>> +++ b/libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h >>>> @@ -35,13 +35,13 @@ >>>> #ifdef __ARM_NEON >>>> >>>> #ifdef __ARM_BIG_ENDIAN >>>> -# define __VEXT(_A,_B,_C) __builtin_shuffle (_A, _B, (__Uint8x16_t) \ >>>> - {16-_C, 17-_C, 18-_C, 19-_C, 20-_C, 21-_C, 22-_C, 23-_C, \ >>>> - 24-_C, 25-_C, 26-_C, 27-_C, 28-_C, 29-_C, 30-_C, 31-_C}) >>>> +# define __VEXT(_A,_B,_C) __builtin_shufflevector (_A, _B, \ >>>> + 16-_C, 17-_C, 18-_C, 19-_C, 20-_C, 21-_C, 22-_C, 23-_C, \ >>>> + 24-_C, 25-_C, 26-_C, 27-_C, 28-_C, 29-_C, 30-_C, 31-_C) >>>> #else >>>> -# define __VEXT(_A,_B,_C) __builtin_shuffle (_B, _A, (__Uint8x16_t) \ >>>> - {_C, _C+1, _C+2, _C+3, _C+4, _C+5, _C+6, _C+7, \ >>>> - _C+8, _C+9, _C+10, _C+11, _C+12, _C+13, _C+14, _C+15}) >>>> +# define __VEXT(_A,_B,_C) __builtin_shufflevector (_B, _A, \ >>>> + _C, _C+1, _C+2, _C+3, _C+4, _C+5, _C+6, _C+7, \ >>>> + _C+8, _C+9, _C+10, _C+11, _C+12, _C+13, _C+14, _C+15) >>>> #endif >>>> >>>> #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ >>>> @@ -52,9 +52,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>>> namespace { >>>> // Logical Shift right 128-bits by c * 8 bits >>>> >>>> - __extension__ extern __inline __Uint32x4_t >>>> + __extension__ >>>> + template<int __c> >>>> + extern __inline __Uint32x4_t >>>> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) >>>> - __aarch64_lsr_128 (__Uint8x16_t __a, __const int __c) >>>> + __aarch64_lsr_128 (__Uint8x16_t __a) >>>> { >>>> const __Uint8x16_t __zero = {0, 0, 0, 0, 0, 0, 0, 0, >>>> 0, 0, 0, 0, 0, 0, 0, 0}; >>>> @@ -64,9 +66,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>>> >>>> // Logical Shift left 128-bits by c * 8 bits >>>> >>>> - __extension__ extern __inline __Uint32x4_t >>>> + __extension__ >>>> + template<int __c> >>>> + extern __inline __Uint32x4_t >>>> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) >>>> - __aarch64_lsl_128 (__Uint8x16_t __a, __const int __c) >>>> + __aarch64_lsl_128 (__Uint8x16_t __a) >>>> { >>>> const __Uint8x16_t __zero = {0, 0, 0, 0, 0, 0, 0, 0, >>>> 0, 0, 0, 0, 0, 0, 0, 0}; >>>> @@ -82,14 +86,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>>> __Uint32x4_t __e) >>>> { >>>> __Uint32x4_t __y = (__b >> __sr1); >>>> - __Uint32x4_t __z = __aarch64_lsr_128 ((__Uint8x16_t) __c, __sr2); >>>> + __Uint32x4_t __z = __aarch64_lsr_128<__sr2> ((__Uint8x16_t) __c); >>>> >>>> __Uint32x4_t __v = __d << __sl1; >>>> >>>> __z = __z ^ __a; >>>> __z = __z ^ __v; >>>> >>>> - __Uint32x4_t __x = __aarch64_lsl_128 ((__Uint8x16_t) __a, __sl2); >>>> + __Uint32x4_t __x = __aarch64_lsl_128<__sl2> ((__Uint8x16_t) __a); >>>> >>>> __y = __y & __e; >>>> __z = __z ^ __x; >>>> -- >>>> 2.44.0 >>>> >>> >> >